Awk is a powerful text processing tool that is commonly used for manipulating and analyzing data in Unix and Linux environments. One of the key features of awk is its ability to manipulate strings using a wide variety of built-in functions.
In this article, we will explore some of the most commonly used string manipulation functions in awk.
length(string): Returns the length of the specified string.
substr(string, start, length): Returns the substring of the specified string that begins at position start and is length characters long.
index(string, substring): Returns the position of the first occurrence of substring in string, or 0 if it does not occur.
split(string, array, separator): Splits the specified string into the elements of array, using separator to decide where to split, and returns the number of elements.
sub(regexp, replacement, string): Replaces the first match of the regular expression in string with replacement.
gsub(regexp, replacement, string): Replaces every match of the regular expression in string with replacement.
match(string, regexp): Searches string for the first match of the regular expression, returns its starting position (0 if there is none), and sets the built-in variables RSTART and RLENGTH to the position and length of the match.
tolower(string) and toupper(string): Convert all uppercase characters in the string to lowercase, or all lowercase characters to uppercase, respectively.
Let’s go through each of these functions one by one, with an example of each:
1. length(string)
The length(string) function returns the length of the specified string. For example, if we want to find the length of the string “Hello, World!”, we can use the following code:
awk 'BEGIN{print length("Hello, World!")}'
This will output “13”, since the string “Hello, World!” has 13 characters.
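In practice, length is usually applied to input lines rather than string literals. The sketch below assumes a POSIX-compatible awk and uses made-up sample input; it prints each line followed by its character count, where $0 is the whole current line.

```shell
# Sample input is illustrative.
# Print every input line followed by its length in characters.
printf 'apple\nbanana\nkiwi\n' | awk '{ print $0, length($0) }'
# prints:
# apple 5
# banana 6
# kiwi 4
```

In POSIX awk, calling length with no argument also measures $0.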
2. substr(string, start, length)
The substr(string, start, length) function returns a substring of the specified string, starting at the specified position and with the specified length. For example, if we want to extract the first 5 characters of the string “Hello, World!”, we can use the following code:
awk 'BEGIN{print substr("Hello, World!", 1, 5)}'
This will output “Hello”, since the first 5 characters of the string are “Hello”.
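The length argument is optional: when it is omitted, substr returns everything from start to the end of the string. A short sketch, assuming a POSIX awk and an illustrative date string, shows both forms:

```shell
# Omitting the length returns everything from position 8 to the end.
awk 'BEGIN { print substr("Hello, World!", 8) }'      # prints: World!

# Extract the year from an illustrative ISO-style date on each input line.
echo "2024-01-15" | awk '{ print substr($0, 1, 4) }'  # prints: 2024
```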
3. index(string, substring)
The index(string, substring) function returns the position of the first occurrence of the specified substring in the specified string. For example, if we want to find the position of the substring “World” in the string “Hello, World!”, we can use the following code:
awk 'BEGIN{print index("Hello, World!", "World")}'
This will output “8”, since the substring “World” starts at the 8th position in the string.
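Because index returns 0 when the substring is absent, it also works as a test. The sketch below, assuming a POSIX awk, shows that case and then combines index with substr to split an illustrative key=value string (the variable names s and i are hypothetical) at its first "=" sign.

```shell
# index() returns 0 when the substring does not occur.
awk 'BEGIN { print index("Hello, World!", "xyz") }'   # prints: 0

# Split an illustrative "key=value" string at the first "=".
awk 'BEGIN {
    s = "timeout=30"
    i = index(s, "=")
    if (i > 0)
        print substr(s, 1, i - 1), substr(s, i + 1)
}'                                                    # prints: timeout 30
```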
4. split(string, array, separator)
The split(string, array, separator) function splits the specified string into an array of substrings, using the specified separator to determine where to split the string. For example, if we want to split the string “apple,banana,orange” into an array of substrings using the comma as the separator, we can use the following code:
awk 'BEGIN{n = split("apple,banana,orange", a, ","); for (i = 1; i <= n; i++) print a[i]}'
This will output:
apple
banana
orange
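split also returns the number of elements it created, and a separator longer than one character is treated as an extended regular expression. The sketch below assumes a POSIX awk; the input string and the names n and parts are illustrative.

```shell
# A multi-character separator is a regular expression:
# here any run of digits separates the pieces.
awk 'BEGIN {
    n = split("a1b22c", parts, "[0-9]+")
    print n                          # number of elements: 3
    for (i = 1; i <= n; i++)
        print parts[i]               # a, b, c in their original order
}'
```

Looping from 1 to the returned count keeps the elements in order; a for (i in a) loop visits array elements in an unspecified order.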
5. sub(regexp, replacement, string)
The sub(regexp, replacement, string) function replaces the first occurrence of the specified regular expression in the specified string with the specified replacement string. The third argument must be a variable or a field, because sub modifies it in place; if it is omitted, sub works on the current input line, $0. For example, if we want to replace only the first occurrence of the letter “o” in the string “Hello, World!” with the letter “a”, we can use the following code:
awk 'BEGIN{s = "Hello, World!"; sub(/o/, "a", s); print s}'
This will output “Hella, World!”, since only the first occurrence of the letter “o” has been replaced with the letter “a”.
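Two more behaviors are worth seeing in action: without a third argument sub edits $0, and an & in the replacement stands for the matched text. A sketch assuming a POSIX awk, with illustrative input strings:

```shell
# No target given: sub() edits $0, replacing only the first "-".
echo "2024-01-15" | awk '{ sub(/-/, "/"); print }'    # prints: 2024/01-15

# "&" in the replacement inserts the text that was matched.
awk 'BEGIN { s = "error: disk full"; sub(/error/, "[&]", s); print s }'   # prints: [error]: disk full
```

sub returns 1 if it made a replacement and 0 otherwise.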
6. gsub(regexp, replacement, string)
The gsub(regexp, replacement, string) function replaces all occurrences of the specified regular expression in the specified string with the specified replacement string. For example, if we want to replace all occurrences of the letter “o” in the string “Hello, World!” with the letter “a”, we can use the following code:
awk 'BEGIN{s = "Hello, World!"; gsub(/o/, "a", s); print s}'
This will output “Hella, Warld!”, since all occurrences of the letter “o” have been replaced with the letter “a”.
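gsub returns the number of substitutions it made, so it can count matches while it replaces them, and like sub it edits $0 when no target is given. A sketch assuming a POSIX awk; the names s and n and the sample input are illustrative.

```shell
# gsub() returns how many replacements were made.
awk 'BEGIN { s = "Hello, World!"; n = gsub(/o/, "a", s); print n, s }'   # prints: 2 Hella, Warld!

# Replace every space on each input line with a comma.
echo "red green blue" | awk '{ gsub(/ /, ","); print }'                 # prints: red,green,blue
```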
7. match(string, regexp)
The match(string, regexp) function searches the specified string for the first occurrence of the specified regular expression. It returns the starting position of the match (or 0 if there is no match) and sets the built-in variables RSTART and RLENGTH to the position and length of the matched substring. For example, if we want to find the position and length of the first occurrence of the word “World” in the string “Hello, World!”, we can use the following code:
awk 'BEGIN{match("Hello, World!", /World/); print RSTART, RLENGTH}'
This will output “8 5”, since the word “World” starts at the 8th position in the string and has a length of 5 characters.
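Since match sets RSTART and RLENGTH, it pairs naturally with substr to pull out the matched text. When nothing matches, match returns 0, RSTART is set to 0 and RLENGTH to -1. A sketch assuming a POSIX awk, with an illustrative input string:

```shell
# Extract the first run of digits using RSTART and RLENGTH.
awk 'BEGIN {
    s = "order 4821 shipped"
    if (match(s, /[0-9]+/))
        print substr(s, RSTART, RLENGTH)    # prints: 4821
}'

# No match: match() returns 0, RSTART is 0, RLENGTH is -1.
awk 'BEGIN { r = match("abc", /[0-9]/); print r, RSTART, RLENGTH }'   # prints: 0 0 -1
```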
8. tolower(string) and toupper(string)
The tolower(string) function converts all uppercase characters in the specified string to lowercase characters, while the toupper(string) function converts all lowercase characters in the specified string to uppercase characters. For example, if we want to convert the string “Hello, World!” to all lowercase letters, we can use the following code:
awk 'BEGIN{print tolower("Hello, World!")}'
This will output “hello, world!”.
Similarly, if we want to convert the same string to all uppercase letters, we can use the following code:
awk 'BEGIN{print toupper("Hello, World!")}'
This will output “HELLO, WORLD!”.
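A common use of these functions is case-insensitive matching: normalize each line with tolower before comparing it. The sketch below assumes a POSIX awk and made-up input; a pattern with no action prints the matching line.

```shell
# Print lines equal to "apple" in any mix of upper and lower case.
printf 'Apple\nAPPLE\nbanana\n' | awk 'tolower($0) == "apple"'
# prints:
# Apple
# APPLE
```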
Conclusion
In this article, we have explored some of the most commonly used string manipulation functions in awk. These functions allow us to perform a wide variety of tasks, such as finding the length of a string, extracting substrings, searching for patterns, replacing text, splitting strings into arrays, and converting text to different cases. By mastering these functions, we can become more proficient at working with text data in Unix and Linux environments and increase our productivity as data analysts and programmers.