Linux Conquering Cloud

Tuesday, February 26, 2013

Linux Split and Join Command to Manage Large Files

Linux Split and Join Command to Manage Large Files

Join and split command syntax:

join [OPTION]… FILE1 FILE2
split [OPTION]… [INPUT [PREFIX]]

Use the split command to do this:

split --bytes=1024m bigfile.iso small_file_

That command will split bigfile.iso into files that are 1024 MB in size (1GB) and name the various parts small_file_aa, small_file_ab, etc. You can specify b for bytes, k for Kilobytes and m for Megabytes to specify sizes.

To join the files back together on Linux:

cat small_file_* > joined_file.iso

Linux Split Command Examples
1. Basic Split Example

Here is a basic example of split command.

$ split split.zip

$ ls
split.zip xab xad xaf xah xaj xal xan xap xar xat xav xax xaz xbb xbd xbf xbh xbj xbl xbn
xaa        xac xae xag xai xak xam xao xaq xas xau xaw xay xba xbc xbe xbg xbi xbk xbm xbo

So we see that the file split.zip was split into smaller files with x** as file names. Where ** is the two character suffix that is added by default. Also, by default each x** file would contain 1000 lines.

$ wc -l *
40947 split.zip
1000 xaa
1000 xab
1000 xac
1000 xad
1000 xae
1000 xaf
1000 xag
1000 xah
1000 xai
...
...
...

So the output above confirms that by default each x** file contains 1000 lines.

2.Change the Suffix Length using -a option

As discussed in example 1 above, the default suffix length is 2. But this can be changed by using -a option.

As you see in the following example, it is using suffix of length 5 on the split files.

$ split -a5 split.zip
$ ls
split.zip xaaaac xaaaaf xaaaai xaaaal xaaaao xaaaar xaaaau xaaaax xaaaba xaaabd xaaabg xaaabj xaaabm
xaaaaa     xaaaad xaaaag xaaaaj xaaaam xaaaap xaaaas xaaaav xaaaay xaaabb xaaabe xaaabh xaaabk xaaabn
xaaaab     xaaaae xaaaah xaaaak xaaaan xaaaaq xaaaat xaaaaw xaaaaz xaaabc xaaabf xaaabi xaaabl xaaabo

Note: Earlier we also discussed about other file manipulation utilities – tac, rev, paste.
3.Customize Split File Size using -b option

Size of each output split file can be controlled using -b option.

In this example, the split files were created with a size of 200000 bytes.

$ split -b200000 split.zip

$ ls -lart
total 21084
drwxrwxr-x 3 himanshu himanshu     4096 Sep 26 21:20 ..
-rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xad
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xac
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xab
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaa
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xah
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xag
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaf
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xae
-rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xar
...
...
...

4. Create Split Files with Numeric Suffix using -d option

As seen in examples above, the output has the format of x** where ** are alphabets. You can change this to number using -d option.

Here is an example. This has numeric suffix on the split files.

$ split -d split.zip
$ ls
split.zip x01 x03 x05 x07 x09 x11 x13 x15 x17 x19 x21 x23 x25 x27 x29 x31 x33 x35 x37 x39
x00        x02 x04 x06 x08 x10 x12 x14 x16 x18 x20 x22 x24 x26 x28 x30 x32 x34 x36 x38 x40

5. Customize the Number of Split Chunks using -C option

To get control over the number of chunks, use the -C option.

This example will create 50 chunks of split files.

$ split -n50 split.zip
$ ls
split.zip xac xaf xai xal xao xar xau xax xba xbd xbg xbj xbm xbp xbs xbv
xaa        xad xag xaj xam xap xas xav xay xbb xbe xbh xbk xbn xbq xbt xbw
xab        xae xah xak xan xaq xat xaw xaz xbc xbf xbi xbl xbo xbr xbu xbx

6. Avoid Zero Sized Chunks using -e option

While splitting a relatively small file in large number of chunks, its good to avoid zero sized chunks as they do not add any value. This can be done using -e option.

Here is an example:

$ split -n50 testfile

$ split -n50 -e testfile

$ ls
split.zip testfile xaa xab xac xad xae xaf

So we see that no zero sized chunk was produced in the above output.

7. Customize Number of Lines using -l option

Number of lines per output split file can be customized using the -l option.

As seen in the example below, split files are created with 20000 lines.

$ split -l20000 split.zip

$ ls
split.zip testfile xaa xab xac

$ wc -l x*
20000 xaa
20000 xab
947 xac
40947 total

Get Detailed Information using –verbose option

To get a diagnostic message each time a new split file is opened, use –verbose option as shown below.

$ split -l20000 --verbose split.zip
creating file `xaa'
creating file `xab'
creating file `xac'

Linux Join Command Examples

8. Basic Join Example

Join command works on first field of the two files (supplied as input) by matching the first fields.

Here is an example :

$ cat testfile1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
4 UK London
5 Canada Toronto

So we see that a file containing countries was joined with another file containing capitals on the basis of first field.
9. Join works on Sorted List

If any of the two files supplied to join command is not sorted then it shows up a warning in output and that particular entry is not joined.

In this example, since the input file is not sorted, it will display a warning/error message.

$ cat testfile1
1 India
2 US
3 Ireland
5 Canada
4 UK

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
join: testfile1:5: is not sorted: 4 UK
5 Canada Toronto

10. Ignore Case using -i option

When comparing fields, the difference in case can be ignored using -i option as shown below.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
a NewDelhi
B Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
c Ireland Dublin
d UK London
e Canada Toronto

$ join -i testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

11. Verify that Input is Sorted using –check-order option

Here is an example. Since testfile1 was unsorted towards the end so an error was produced in the output.

$ cat testfile1
a India
b US
c Ireland
d UK
f Australia
e Canada

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join --check-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
join: testfile1:6: is not sorted: e Canada

12. Do not Check the Sortness using –nocheck-order option

This is the opposite of the previous example. No check for sortness is done in this example, and it will not display any error message.

$ join --nocheck-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London

13. Print Unpairable Lines using -a option

If both the input files cannot be mapped one to one then through -a[FILENUM] option we can have those lines that cannot be paired while comparing. FILENUM is the file number (1 or 2).

In the following example, we see that using -a1 produced the last line in testfile1 (marked as bold below) which had no pair in testfile2.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada
f Australia

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

$ join -a1 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
f Australia

14. Print Only Unpaired Lines using -v option

In the above example both paired and unpaired lines were produced in the output. But, if only unpaired output is desired then use -v option as shown below.

$ join -v1 testfile1 testfile2
f Australia

15. Join Based on Different Columns from Both Files using -1 and -2 option

By default the first columns in both the files is used for comparing before joining. You can change this behavior using -1 and -2 option.

In the following example, the first column of testfile1 was compared with the second column of testfile2 to produce the join command output.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
Dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

Rsync in detail

Important features of rsync

Speed: First time, rsync replicates the whole content between the source and destination directories. Next time, rsync transfers only the changed blocks or bytes to the destination location, which makes the transfer really fast.
Security: rsync allows encryption of data using ssh protocol during transfer.
Less Bandwidth: rsync uses compression and decompression of data block by block at the sending and receiving end respectively. So the bandwidth used by rsync will be always less compared to other file transfer protocols.
Privileges: No special privileges are required to install and execute rsync

Syntax

$ rsync options source destination

Source and destination could be either local or remote. In case of remote, specify the login name, remote server name and location.
Example 1. Synchronize Two Directories in a Local Server

To sync two directories in a local computer, use the following rsync -zvr command.

$ rsync -zvr /var/opt/installation/inventory/ /root/temp
building file list ... done
sva.xml
svB.xml
.
sent 26385 bytes received 1098 bytes 54966.00 bytes/sec
total size is 44867 speedup is 1.63
$

In the above rsync example:

-z is to enable compression
-v verbose
-r indicates recursive

Now let us see the timestamp on one of the files that was copied from source to destination. As you see below, rsync didn’t preserve timestamps during sync.

$ ls -l /var/opt/installation/inventory/sva.xml /root/temp/sva.xml
-r--r--r-- 1 bin bin 949 Jun 18 2009 /var/opt/installation/inventory/sva.xml
-r--r--r-- 1 root bin 949 Sep 2 2009 /root/temp/sva.xml

Example 2. Preserve timestamps during Sync using rsync -a

rsync option -a indicates archive mode. -a option does the following,

Recursive mode
Preserves symbolic links
Preserves permissions
Preserves timestamp
Preserves owner and group

Now, executing the same command provided in example 1 (But with the rsync option -a) as shown below:

$ rsync -azv /var/opt/installation/inventory/ /root/temp/
building file list ... done
./
sva.xml
svB.xml
.
sent 26499 bytes received 1104 bytes 55206.00 bytes/sec
total size is 44867 speedup is 1.63
$

As you see below, rsync preserved timestamps during sync.

$ ls -l /var/opt/installation/inventory/sva.xml /root/temp/sva.xml
-r--r--r-- 1 root bin 949 Jun 18 2009 /var/opt/installation/inventory/sva.xml
-r--r--r-- 1 root bin 949 Jun 18 2009 /root/temp/sva.xml

Example 3. Synchronize Only One File

To copy only one file, specify the file name to rsync command, as shown below.

$ rsync -v /var/lib/rpm/Pubkeys /root/temp/
Pubkeys

sent 42 bytes received 12380 bytes 3549.14 bytes/sec
total size is 12288 speedup is 0.99

Example 4. Synchronize Files From Local to Remote

rsync allows you to synchronize files/directories between the local and remote system.

$ rsync -avz /root/temp/ thegeekstuff@192.168.200.10:/home/thegeekstuff/temp/
Password:
building file list ... done
./
rpm/
rpm/Basenames
rpm/Conflictname

sent 15810261 bytes received 412 bytes 2432411.23 bytes/sec
total size is 45305958 speedup is 2.87

While doing synchronization with the remote server, you need to specify username and ip-address of the remote server. You should also specify the destination directory on the remote server. The format is username@machinename:path

As you see above, it asks for password while doing rsync from local to remote server.

Sometimes you don’t want to enter the password while backing up files from local to remote server. For example, If you have a backup shell script, that copies files from local to remote server using rsync, you need the ability to rsync without having to enter the password.

To do that, setup ssh password less login as we explained earlier.
Example 5. Synchronize Files From Remote to Local

When you want to synchronize files from remote to local, specify remote path in source and local path in target as shown below.

$ rsync -avz thegeekstuff@192.168.200.10:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
rpm/Basenames
.
sent 406 bytes received 15810230 bytes 2432405.54 bytes/sec
total size is 45305958 speedup is 2.87

Example 6. Remote shell for Synchronization

rsync allows you to specify the remote shell which you want to use. You can use rsync ssh to enable the secured remote connection.

Use rsync -e ssh to specify which remote shell to use. In this case, rsync will use ssh.

$ rsync -avz -e ssh thegeekstuff@192.168.200.10:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/
rpm/Basenames

sent 406 bytes received 15810230 bytes 2432405.54 bytes/sec
total size is 45305958 speedup is 2.87

Example 7. Do Not Overwrite the Modified Files at the Destination

In a typical sync situation, if a file is modified at the destination, we might not want to overwrite the file with the old file from the source.

Use rsync -u option to do exactly that. (i.e do not overwrite a file at the destination, if it is modified). In the following example, the file called Basenames is already modified at the destination. So, it will not be overwritten with rsync -u.

$ ls -l /root/temp/Basenames
total 39088
-rwxr-xr-x 1 root root        4096 Sep 2 11:35 Basenames

$ rsync -avzu thegeekstuff@192.168.200.10:/var/lib/rpm /root/temp
Password:
receiving file list ... done
rpm/

sent 122 bytes received 505 bytes 114.00 bytes/sec
total size is 45305958 speedup is 72258.31

$ ls -lrt
total 39088
-rwxr-xr-x 1 root root        4096 Sep 2 11:35 Basenames

Example 8. Synchronize only the Directory Tree Structure (not the files)

Use rsync -d option to synchronize only directory tree from source to the destination. The below example, synchronize only directory tree in recursive manner, not the files in the directories.

$ rsync -v -d thegeekstuff@192.168.200.10:/var/lib/ .
Password:
receiving file list ... done
logrotate.status
CAM/
YaST2/
acpi/

sent 240 bytes received 1830 bytes 318.46 bytes/sec
total size is 956 speedup is 0.46

Example 9. View the rsync Progress during Transfer

When you use rsync for backup, you might want to know the progress of the backup. i.e how many files are copies, at what rate it is copying the file, etc.

rsync –progress option displays detailed progress of rsync execution as shown below.

$ rsync -avz --progress thegeekstuff@192.168.200.10:/var/lib/rpm/ /root/temp/
Password:
receiving file list ...
19 files to consider
./
Basenames
5357568 100%   14.98MB/s    0:00:00 (xfer#1, to-check=17/19)
Conflictname
12288 100%   35.09kB/s    0:00:00 (xfer#2, to-check=16/19)
.
.
.
sent 406 bytes received 15810211 bytes 2108082.27 bytes/sec
total size is 45305958 speedup is 2.87

You can also use rsnapshot utility (that uses rsync) to backup local linux server, or backup remote linux server.
Example 10. Delete the Files Created at the Target

If a file is not present at the source, but present at the target, you might want to delete the file at the target during rsync.

In that case, use –delete option as shown below. rsync delete option deletes files that are not there in source directory.

# Source and target are in sync. Now creating new file at the target.
$ > new-file.txt

$ rsync -avz --delete thegeekstuff@192.168.200.10:/var/lib/rpm/ .
Password:
receiving file list ... done
deleting new-file.txt
./

sent 26 bytes received 390 bytes 48.94 bytes/sec
total size is 45305958 speedup is 108908.55

Target has the new file called new-file.txt, when synchronize with the source with –delete option, it removed the file new-file.txt
Example 11. Do not Create New File at the Target

If you like, you can update (Sync) only the existing files at the target. In case source has new files, which is not there at the target, you can avoid creating these new files at the target. If you want this feature, use –existing option with rsync command.

First, add a new-file.txt at the source.

[/var/lib/rpm ]$ > new-file.txt

Next, execute the rsync from the target.

$ rsync -avz --existing root@192.168.1.2:/var/lib/rpm/ .
root@192.168.1.2's password:
receiving file list ... done
./

sent 26 bytes received 419 bytes 46.84 bytes/sec
total size is 88551424 speedup is 198991.96

If you see the above output, it didn’t receive the new file new-file.txt
Example 12. View the Changes Between Source and Destination

This option is useful to view the difference in the files or directories between source and destination.

At the source:

$ ls -l /var/lib/rpm
-rw-r--r-- 1 root root 5357568 2010-06-24 08:57 Basenames
-rw-r--r-- 1 root root    12288 2008-05-28 22:03 Conflictname
-rw-r--r-- 1 root root 1179648 2010-06-24 08:57 Dirnames

At the destination:

$ ls -l /root/temp
-rw-r--r-- 1 root root    12288 May 28 2008 Conflictname
-rw-r--r-- 1 bin bin   1179648 Jun 24 05:27 Dirnames
-rw-r--r-- 1 root root        0 Sep 3 06:39 Basenames

In the above example, between the source and destination, there are two differences. First, owner and group of the file Dirname differs. Next, size differs for the file Basenames.

Now let us see how rsync displays this difference. -i option displays the item changes.

$ rsync -avzi thegeekstuff@192.168.200.10:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
>f.st.... Basenames
.f....og. Dirnames

sent 48 bytes received 2182544 bytes 291012.27 bytes/sec
total size is 45305958 speedup is 20.76

In the output it displays some 9 letters in front of the file name or directory name indicating the changes.

In our example, the letters in front of the Basenames (and Dirnames) says the following:

> specifies that a file is being transferred to the local host.
f represents that it is a file.
s represents size changes are there.
t represents timestamp changes are there.
o owner changed
g group changed.

Example 13. Include and Exclude Pattern during File Transfer

rsync allows you to give the pattern you want to include and exclude files or directories while doing synchronization.

$ rsync -avz --include 'P*' --exclude '*' thegeekstuff@192.168.200.10:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
./
Packages
Providename
Provideversion
Pubkeys

sent 129 bytes received 10286798 bytes 2285983.78 bytes/sec
total size is 32768000 speedup is 3.19

In the above example, it includes only the files or directories starting with ‘P’ (using rsync include) and excludes all other files. (using rsync exclude ‘*’ )
Example 14. Do Not Transfer Large Files

You can tell rsync not to transfer files that are greater than a specific size using rsync –max-size option.

$ rsync -avz --max-size='100K' thegeekstuff@192.168.200.10:/var/lib/rpm/ /root/temp/
Password:
receiving file list ... done
./
Conflictname
Group
Installtid
Name
Sha1header
Sigmd5
Triggername

sent 252 bytes received 123081 bytes 18974.31 bytes/sec
total size is 45305958 speedup is 367.35

max-size=100K makes rsync to transfer only the files that are less than or equal to 100K. You can indicate M for megabytes and G for gigabytes.
Example 15. Transfer the Whole File

One of the main feature of rsync is that it transfers only the changed block to the destination, instead of sending the whole file.

If network bandwidth is not an issue for you (but CPU is), you can transfer the whole file, using rsync -W option. This will speed-up the rsync process, as it doesn’t have to perform the checksum at the source and destination.

# rsync -avzW thegeekstuff@192.168.200.10:/var/lib/rpm/ /root/temp
Password:
receiving file list ... done
./
Basenames
Conflictname
Dirnames
Filemd5s
Group
Installtid
Name

sent 406 bytes received 15810211 bytes 2874657.64 bytes/sec
total size is 45305958 speedup is 2.87

Awk Introduction and Printing Operations

Awk Introduction and Printing Operations

Awk is a programming language which allows easy manipulation of structured data and the generation of formatted reports. Awk stands for the names of its authors “Aho, Weinberger, and Kernighan”

The Awk is mostly used for pattern scanning and processing. It searches one or more files to see if they contain lines that matches with the specified patterns and then perform associated actions.

Some of the key features of Awk are:

Awk views a text file as records and fields.
Like common programming language, Awk has variables, conditionals and loops
Awk has arithmetic and string operators.
Awk can generate formatted reports

Awk reads from a file or from its standard input, and outputs to its standard output. Awk does not get along with non-text files.

Syntax:

awk '/search pattern1/ {Actions}
/search pattern2/ {Actions}' file

In the above awk syntax:

search pattern is a regular expression.
Actions – statement(s) to be performed.
several patterns and actions are possible in Awk.
file – Input file.
Single quotes around program is to avoid shell not to interpret any of its special characters.

Awk Working Methodology

Awk reads the input files one line at a time.
For each line, it matches with given pattern in the given order, if matches performs the corresponding action.
If no pattern matches, no action will be performed.
In the above syntax, either search pattern or action are optional, But not both.
If the search pattern is not given, then Awk performs the given actions for each line of the input.
If the action is not given, print all that lines that matches with the given patterns which is the default action.
Empty braces with out any action does nothing. It wont perform default printing operation.
Each statement in Actions should be delimited by semicolon.

Let us create employee.txt file which has the following content, which will be used in the
examples mentioned below.

$cat employee.txt
100 Thomas Manager    Sales       $5,000
200 Jason   Developer Technology $5,500
300 Sanjay Sysadmin   Technology $7,000
400 Nisha   Manager    Marketing   $9,500
500 Randy   DBA        Technology $6,000

Awk Example 1. Default behavior of Awk

By default Awk prints every line from the file.

$ awk '{print;}' employee.txt
100 Thomas Manager    Sales       $5,000
200 Jason   Developer Technology $5,500
300 Sanjay Sysadmin   Technology $7,000
400 Nisha   Manager    Marketing   $9,500
500 Randy   DBA        Technology $6,000

In the above example pattern is not given. So the actions are applicable to all the lines.
Action print with out any argument prints the whole line by default. So it prints all the
lines of the file with out fail. Actions has to be enclosed with in the braces.
Awk Example 2. Print the lines which matches with the pattern.

$ awk '/Thomas/
> /Nisha/' employee.txt
100 Thomas Manager    Sales       $5,000
400 Nisha   Manager    Marketing   $9,500

In the above example it prints all the line which matches with the ‘Thomas’ or ‘Nisha’. It has two patterns. Awk accepts any number of patterns, but each set (patterns and its corresponding actions) has to be separated by newline.
Awk Example 3. Print only specific field.

Awk has number of built in variables. For each record i.e line, it splits the record delimited by whitespace character by default and stores it in the $n variables. If the line has 4 words, it will be stored in $1, $2, $3 and $4. $0 represents whole line. NF is a built in variable which represents total number of fields in a record.

$ awk '{print $2,$5;}' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000

$ awk '{print $2,$NF;}' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000

In the above example $2 and $5 represents Name and Salary respectively. We can get the Salary using $NF also, where $NF represents last field. In the print statement ‘,’ is a concatenator.
Awk Example 4. Initialization and Final Action

Awk has two important patterns which are specified by the keyword called BEGIN and END.

Syntax:

BEGIN { Actions}
{ACTION} # Action for everyline in a file
END { Actions }

# is for comments in Awk

Actions specified in the BEGIN section will be executed before starts reading the lines from the input.
END actions will be performed after completing the reading and processing the lines from the input.

$ awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";}
> {print $2,"\t",$3,"\t",$4,"\t",$NF;}
> END{print "Report Generated\n--------------";
> }' employee.txt
Name   Designation   Department   Salary
Thomas    Manager    Sales              $5,000
Jason    Developer    Technology    $5,500
Sanjay    Sysadmin    Technology    $7,000
Nisha    Manager    Marketing    $9,500
Randy    DBA       Technology    $6,000
Report Generated
--------------

In the above example, it prints headline and last file for the reports.
Awk Example 5. Find the employees who has employee id greater than 200

$ awk '$1 >200' employee.txt
300 Sanjay Sysadmin   Technology $7,000
400 Nisha   Manager    Marketing   $9,500
500 Randy   DBA        Technology $6,000

In the above example, first field ($1) is employee id. So if $1 is greater than 200, then just do the default print action to print the whole line.
Awk Example 6. Print the list of employees in Technology department

Now department name is available as a fourth field, so need to check if $4 matches with the string “Technology”, if yes print the line.

$ awk '$4 ~/Technology/' employee.txt
200 Jason   Developer Technology $5,500
300 Sanjay Sysadmin   Technology $7,000
500 Randy   DBA        Technology $6,000

Operator ~ is for comparing with the regular expressions. If it matches the default action i.e print whole line will be performed.
Awk Example 7. Print number of employees in Technology department

The below example, checks if the department is Technology, if it is yes, in the Action, just increment the count variable, which was initialized with zero in the BEGIN section.

$ awk 'BEGIN { count=0;}
$4 ~ /Technology/ { count++; }
END { print "Number of employees in Technology Dept =",count;}' employee.txt
Number of employees in Tehcnology Dept = 3

Then at the end of the process, just print the value of count which gives you the number of employees in Technology departme

will print all but very first column:

cat somefile | awk '{$1=""; print $0}'

will print all but two first columns:

cat somefile | awk '{$1=$2=""; print $0}'

Pages

Tuesday, February 26, 2013

Linux Split and Join Command to Manage Large Files

Rsync in detail

Awk Introduction and Printing Operations