One way of working with files in Linux is using a scripting language to manage the automation of repeated tasks. An example of a good scripting language is awk which makes the extracting of data and working with patterns easy. The GNU implementation of the awk scripting language is gawk. If you are yet to come to terms with its usage, you are in luck. This post presents the different examples of the use of gawk in Linux, and by the end of this guide, you will have a solid understanding of working with it.

Getting Started with Gawk

If you are using the latest Linux versions, gawk should be installed by default. You can verify by checking its version.

There are different ways to use gawk. The common functionalities are the following:

  • Format output lines
  • Transforming data files
  • Scanning files per line
  • Splitting input line into fields
  • Producing formatted reports
  • Sorting data

The basic syntax for gawk is:

$ gawk [POSIX / GNU style options] -f program_file [ ] file

$ gawk [POSIX / GNU style options] [ ] ‘program’ file

To use gawk, you either use the -f option to specify a script file or specify the script on the command line directly.

There are three important options to note when using gawk:

1. -f file, –file=file: Used when you want to use gawk and read commands from a file. The file is the script.

2. -v var=val, –assign=var=val: Used when you need to assign a value to a variable before executing a script.

3. -F fs, –field-separator=fs: The value of the predefined variable FS gets used as the separator for the input field.

Built-In Variables

Gawk offers built-in variables such as:

FS: Used when dividing files and contains the field separator character.

RS: Contains the current character separator.

OFS: Contains the output field separator that separates the fields that AWK prints.

NF: The number of fields for the input record gets stored in the NF.

ORS: Contains the output field separator that separates the output lines printed by AWK.

NR: Contains the total number of input lines.

In the previous example, we use * as the separator for the input lines in the file.

Example Usage of Gawk

1. -F

For sorting a text file and printing the first three colon-separated fields, use the following command. Note that we use the passwd as our file here:

$ gawk -F: ‘{print $1; print $2; print $3}/etc/passwd

Here, our separator is a full colon. Since we want the first three fields, you specify them as shown in the previous example. You can tweak it around and use a different separator and a different number of fields.

2. -f

To specify the awk program source from a file, use the -f flag followed by the file:

$ gawk -F: -f file1.txt /etc/passwd

3. Printing Contents of a File

Using gawk on a file prints all the data lines in the file.

4. Working with Patterns

You can also use gawk and print only the lines matching a given pattern. For instance, to print a line containing a particular word, in our case the word is kim, the command would be:

$ gawk/kim/ {print}’ file-name

Here, the pattern can also be a character. For instance, to print all the lines that contain a colon, the command would be as in the following image:

You can also specify the specific lines to print. For instance, to print the lines containing specific characters such as greater than 6, the syntax is:

$ gawk ‘length ($0) > 6’ file-name

5. Splitting Lines Into Fields

Gawk, by default, prints every field when printing lines in a file. However, you can specify which field to print. The first field gets stored in the $1 and the whole line is represented as $0. By default, the entire line gets printed unless you specify to separate the fields based on the whitespace.

For instance, to separate the lines and print only the second field of each line, the command would be:

$ gawk{print $2}’ file-name

To add the line numbers, add the NR variable.

The new command would be:

$ gawk{print NR, $2}’ file-name

6. Get the Number of Lines

You may need to get the total number of lines for large files, and you can achieve that using the following syntax:

$ gawk ‘END {print NR}’ file-name


Knowing how to use gawk in Linux is fun and helpful, especially when dealing with text data. You can use the different patterns to extract and manipulate the lines of data. Hopefully, the examples covered in this article give you a head start and open your eyes in using gawk for different activities.

Source link