    TecAdmin

    Using Regular Expressions in Awk

    By Rahul | March 4, 2023 | 4 Mins Read

    Regular expressions are a powerful tool for text processing in awk. They allow you to search for patterns in a text file and manipulate the data based on those patterns. In this article, we will explore how to use regular expressions in awk with examples.


    Regular Expression Basics

    Regular expressions are patterns that match a specific set of characters. The following table lists some of the basic regular expression metacharacters that you can use in awk:

    Metacharacter   Description
    .               Matches any single character
    [ ]             Matches any single character listed within the brackets
    ^               Matches the beginning of the string being tested (the line, when matching $0)
    $               Matches the end of the string being tested (the line, when matching $0)
    *               Matches zero or more occurrences of the preceding character or group
    +               Matches one or more occurrences of the preceding character or group
    ?               Matches zero or one occurrence of the preceding character or group
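
    As a quick illustration of the metacharacters above, the following sketch filters lines with the ~ matching operator. The helper name match_lines and the sample input are assumptions made for this example only.

```shell
# Hypothetical helper: print only the lines of stdin that match
# the extended regular expression passed as the first argument.
match_lines() {
  awk -v re="$1" '$0 ~ re'
}

# ^ and $ anchor the match to the whole line; + requires at least one digit.
printf 'abc\n123\n12a\n' | match_lines '^[0-9]+$'
# prints: 123

# ? makes the preceding "u" optional.
printf 'color\ncolour\ncolr\n' | match_lines '^colou?r$'
# prints: color
#         colour
```

    Passing the pattern with -v keeps the awk program itself fixed, which makes it easy to try different expressions against the same input.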

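    Awk applies regular expressions in several ways: as patterns (/regex/), with the ~ and !~ matching operators, and through built-in functions such as match(), sub(), and gsub(). The match() function finds the leftmost occurrence of a regular expression in a string and records its position and length in the RSTART and RLENGTH variables, while sub() replaces the first occurrence of a regular expression in a string (gsub() replaces every occurrence). Here are some examples: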

    Example 1: Matching a Regular Expression

    Let’s say we have a file containing a list of email addresses, and we want to find all email addresses that end with “.com”. We can use the match() function to accomplish this task as follows:

    awk '{
      if (match($0, /\.com$/)) {
        print $0
      }
    }' email.txt

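    Here, we use the match() function to search for the regular expression /\.com$/ (which matches any string that ends with “.com”; the backslash makes the dot literal) in each line of the file. match() returns the position of the match, or 0 if there is none, so the line is printed only when a match is found.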
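
    Besides returning a true or false result, match() also sets RSTART and RLENGTH. The sketch below prints both values for each matching line; the helper name show_match and the sample addresses are assumptions for illustration.

```shell
# Hypothetical helper: report where the ".com" suffix starts and how long it is.
show_match() {
  awk '{
    if (match($0, /\.com$/)) {
      # RSTART is the 1-based position of the match, RLENGTH its length
      print $0 ": RSTART=" RSTART ", RLENGTH=" RLENGTH
    }
  }'
}

printf 'alice@example.com\nbob@example.org\n' | show_match
# prints: alice@example.com: RSTART=14, RLENGTH=4
```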

    Example 2: Replacing a Regular Expression

    Let’s say we have a file containing a list of phone numbers, and we want to replace the first occurrence of “555” on each line with “666”. We can use the sub() function to accomplish this task as follows:

    awk '{
      sub(/555/, "666", $0)
      print $0
    }' phone.txt

    Here, we use the sub() function to search for the regular expression /555/ (which matches any string containing “555”) in each line of the file, and replace the first occurrence on that line with “666”. To replace every occurrence on a line, use gsub() instead. We then print the modified line.
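
    The difference between sub() and gsub() is easiest to see on a line that contains the pattern more than once. In this sketch, the helper names first_only and every_one and the sample number are assumptions for illustration.

```shell
# Hypothetical helpers: sub() changes only the first match on a line,
# gsub() changes every match.
first_only() { awk '{ sub(/555/, "666"); print }'; }
every_one()  { awk '{ gsub(/555/, "666"); print }'; }

printf '555-555-0100\n' | first_only
# prints: 666-555-0100

printf '555-555-0100\n' | every_one
# prints: 666-666-0100
```

    When the third argument is omitted, both functions operate on $0.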

    Advanced Regular Expression Techniques

    In addition to the basic regular expression metacharacters, awk supports several advanced regular expression techniques that can help you accomplish more complex text processing tasks. These include:

    1. Grouping:

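    You can group parts of a regular expression together using parentheses. This allows you to apply a quantifier to the group as a whole. In standard awk, groups do not capture text; GNU awk (gawk) extends match() with an optional array argument that lets you extract the part of the string matched by each group.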

    Let’s say we have a file containing a list of employee names and salaries, and we want to extract the names and salaries separately. We can use grouping to accomplish this task as follows:

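    gawk '{
      # gawk extension: the third argument receives the captured groups
      if (match($0, /^([[:alnum:]_]+)[[:space:]]+([[:digit:]]+)$/, m)) {
        name = m[1]      # text matched by the first group
        salary = m[2]    # text matched by the second group
        print name
        print salary
      }
    }' employees.txt

    Here, we use grouping to match the regular expression /^([[:alnum:]_]+)[[:space:]]+([[:digit:]]+)$/ (which matches a line containing one or more word characters, followed by one or more whitespace characters, followed by one or more digits) and extract the name and salary separately from the array m. POSIX awk does not define the \w, \s, and \d shorthands (gawk accepts \w and \s, but not \d), so the pattern uses bracket expressions instead. The array argument to match() is specific to gawk, which is why the command is run as gawk.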
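
    Applying a quantifier to a whole group works in any POSIX awk. The sketch below accepts dotted numbers such as version strings by repeating the group “a dot followed by digits”; the helper name dotted_numbers and the sample input are assumptions for illustration.

```shell
# Hypothetical helper: keep lines made of digits followed by one or more
# ".digits" parts. The + applies to the whole parenthesized group.
dotted_numbers() {
  awk '/^[0-9]+(\.[0-9]+)+$/'
}

printf '1.2.3\n42\n1.\n10.0\n' | dotted_numbers
# prints: 1.2.3
#         10.0
```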

    2. Backreferences:

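    You can use backreferences (i.e., \1, \2, etc.) to refer to the text matched by a group, which allows you to reuse matched substrings in the replacement string. The standard sub() and gsub() functions do not support numbered backreferences in the replacement (only &, which stands for the whole match), but GNU awk (gawk) provides the gensub() function, which does.

    Let’s say we have a file containing a list of phone numbers in the format (XXX) XXX-XXXX, and we want to change the format to XXX-XXX-XXXX. We can use backreferences with gensub() to accomplish this task as follows:

    gawk '{
      # gensub() returns the modified string and leaves $0 unchanged
      print gensub(/\(([0-9]{3})\) ([0-9]{3})-([0-9]{4})/, "\\1-\\2-\\3", 1)
    }' phone.txt

    Here, we use backreferences (i.e., \\1, \\2, and \\3, written with a doubled backslash inside the awk string) to refer to the three groups of digits matched by the regular expression /\(([0-9]{3})\) ([0-9]{3})-([0-9]{4})/ (which matches a phone number in the format (XXX) XXX-XXXX) and replace the format with XXX-XXX-XXXX. The third argument, 1, tells gensub() to replace only the first match on each line.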
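
    In sub() and gsub(), an unescaped & in the replacement string stands for the entire matched text, which covers many cases where the match needs to be reused and works in any POSIX awk. A minimal sketch, with the helper name bracket_numbers and the sample input assumed for illustration:

```shell
# Hypothetical helper: wrap every run of digits in square brackets.
# & in the replacement is replaced by the text that matched.
bracket_numbers() {
  awk '{ gsub(/[0-9]+/, "[&]"); print }'
}

printf 'room 12, floor 3\n' | bracket_numbers
# prints: room [12], floor [3]
```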

    3. Lookahead and Lookbehind:

    Awk uses POSIX extended regular expressions, which do not support lookahead (?=) or lookbehind (?<=). To match a pattern only when it is followed or preceded by another pattern, include the surrounding context in the regular expression and then trim it from the matched text using RSTART and RLENGTH.

    Let’s say we have a file containing a list of URLs, and we want to extract only the domain names (i.e., the text between “http://” and the next “/” character). We can match the “http://” prefix together with the domain and then skip the prefix as follows:

    awk '{
      # match the prefix plus the domain, then drop the 7-character "http://"
      if (match($0, /http:\/\/[^\/]+/)) {
        print substr($0, RSTART + 7, RLENGTH - 7)
      }
    }' urls.txt

    Here, the regular expression /http:\/\/[^\/]+/ matches “http://” followed by all characters up to the next “/” character. Because the prefix is part of the match, we add its length (7) to RSTART and subtract it from RLENGTH so that only the domain name is printed.
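
    The same trimming idea gives the effect of a lookahead: match the trailing context as part of the pattern, then leave it out of the substring. In this sketch, the helper name px_value and the sample input are assumptions for illustration.

```shell
# Hypothetical helper: print a number only when it is followed by "px".
# The "px" suffix is matched and then excluded by shortening RLENGTH by 2.
px_value() {
  awk '{
    if (match($0, /[0-9]+px/)) {
      print substr($0, RSTART, RLENGTH - 2)
    }
  }'
}

printf 'width: 120px\nheight: 45em\n' | px_value
# prints: 120
```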

    4. Negated character classes:

    Let’s say we have a file containing a list of email addresses, and we want to extract only the addresses that belong to a specific domain (e.g., example.com). We can use negated character classes to accomplish this task as follows:

    awk '{
      if (match($0, /^[^@]+@example\.com$/)) {
        print $0
      }
    }' emails.txt

    Here, we use a negated character class ([^@]+) to match one or more characters that are not “@” (the username part), and then match the literal string “@example.com” to ensure that the address belongs to the specified domain. The ^ and $ anchors require the whole line to match.
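
    A negated character class can also be combined with match() and substr() to pull out the username itself. A minimal sketch, with the helper name usernames and the sample input assumed for illustration:

```shell
# Hypothetical helper: print the part of each address before the "@".
# [^@]+ consumes everything up to the first "@", which is then trimmed off.
usernames() {
  awk '{
    if (match($0, /^[^@]+@/)) {
      print substr($0, 1, RLENGTH - 1)
    }
  }'
}

printf 'alice@example.com\nnot-an-email\n' | usernames
# prints: alice
```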

    5. Alternation:

    Let’s say we have a file containing a list of phone numbers, and we want to extract only the numbers that are either in the format “(XXX) XXX-XXXX” or “XXX-XXX-XXXX”. We can use alternation to accomplish this task as follows:

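    awk '{
      if (match($0, /\([0-9]{3}\) [0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{3}-[0-9]{4}/)) {
        print substr($0, RSTART, RLENGTH)
      }
    }' phones.txt

    Here, we use alternation (|) to match either the regular expression /\([0-9]{3}\) [0-9]{3}-[0-9]{4}/ (which matches a phone number in the format (XXX) XXX-XXXX) or the regular expression /[0-9]{3}-[0-9]{3}-[0-9]{4}/ (which matches a phone number in the format XXX-XXX-XXXX). The pattern uses [0-9] because awk does not support the \d shorthand. Interval expressions such as {3} are part of POSIX extended regular expressions, but some older awk implementations do not support them; in that case, write [0-9][0-9][0-9] instead.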
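
    Alternation has the lowest precedence of the regular expression operators, so anchors and quantifiers bind to only one branch unless the alternatives are grouped. The sketch below contrasts the two forms; the helper names grouped_alt and ungrouped_alt and the sample words are assumptions for illustration.

```shell
# Hypothetical helpers: the same alternatives with and without grouping.
grouped_alt()   { awk '/^(cat|dog)s?$/'; }   # whole line is cat, cats, dog, or dogs
ungrouped_alt() { awk '/^cat|dog$/'; }       # starts with "cat" OR ends with "dog"

printf 'cats\ndog\ncatalog\nhotdog\n' | grouped_alt
# prints: cats
#         dog

printf 'cats\ndog\ncatalog\nhotdog\n' | ungrouped_alt
# prints all four lines
```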

    Conclusion

    Regular expressions are a powerful tool for text processing in awk. They allow you to search for patterns in a text file, and manipulate the data based on those patterns. By mastering regular expressions in awk, you can become more effective and efficient in your text processing tasks, and accomplish complex data manipulation with ease.
