Saturday, January 12, 2013

Linux: Comparing 2 files - 'comm'

Many times I have ended up with 2 files, each holding a list of values, and needed both the records common to the two files and the records unique to each file.

This is how I do it, and it is one of the easiest ways.

Below are the files to be compared:



# cat files1.txt
1
2
3
4
5
6
7
8
9
0


# cat files2.txt
0
2
9
11
24
46
8
7
5



First sort both files. A unique sort ('sort -u') is better, because it also removes duplicate values within each file:

# sort -u files1.txt > unq_sortd_files1.txt
# sort -u files2.txt > unq_sortd_files2.txt


Now the files are sorted:

# cat unq_sortd_files1.txt
0
1
2
3
4
5
6
7
8
9
# cat unq_sortd_files2.txt
0
11
2
24
46
5
7
8
9

 

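Don't worry about the order in the output above. By default, the 'sort' command compares lines character by character as text, not as numbers, so 11 comes before 2. That is exactly the order 'comm' expects, so it won't affect our goal!

Now use the 'comm' command to get what you need: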
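A minimal sketch of the difference between the two sort orders, assuming the second sample file is recreated in a scratch directory (the `workdir`, `lexical` and `numeric` names are only for illustration). Setting `LC_ALL=C` is an extra precaution, not part of the original steps: it pins the collation so that `sort` and `comm` agree on the order.

```shell
# Recreate the second sample file in a scratch directory (illustrative path).
workdir=$(mktemp -d)
printf '%s\n' 0 2 9 11 24 46 8 7 5 > "$workdir/files2.txt"

# Plain sort compares lines as text, so "11" lands before "2".
# This is the order comm expects.
lexical=$(LC_ALL=C sort -u "$workdir/files2.txt" | paste -s -d ' ' -)

# Numeric sort gives the order a human expects, but not the one comm expects.
numeric=$(LC_ALL=C sort -n -u "$workdir/files2.txt" | paste -s -d ' ' -)

echo "lexical: $lexical"   # lexical: 0 11 2 24 46 5 7 8 9
echo "numeric: $numeric"   # numeric: 0 2 5 7 8 9 11 24 46

rm -r "$workdir"
```

So a file sorted with 'sort -n' should not be fed to 'comm' as it is; sort it the plain way first.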

# comm unq_sortd_files1.txt unq_sortd_files2.txt
                0
1
        11
                2
        24
3
4
        46
                5
6
                7
                8
                9


Confused..?!
By default, comm prints its output in 3 columns. The 1st column holds the lines found only in the 1st file, the 2nd column holds the lines found only in the 2nd file, and the 3rd column holds the lines common to both files.
Simple, isn't it..?!

You can make it even simpler, as below:

# comm -12 unq_sortd_files1.txt unq_sortd_files2.txt
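If the indentation is hard to read, the columns can be labelled instead. The following is only a sketch: it assumes the values themselves contain no tab characters, it relies on comm separating its columns with tabs (one leading tab for the 2nd column, two for the 3rd), and the `a.txt`, `b.txt` and `labelled` names are made up for the example.

```shell
# Recreate both sample files, already unique-sorted, in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 0 | LC_ALL=C sort -u > "$workdir/a.txt"
printf '%s\n' 0 2 9 11 24 46 8 7 5 | LC_ALL=C sort -u > "$workdir/b.txt"

# Split each comm line on tabs: the number of fields tells us the column.
labelled=$(LC_ALL=C comm "$workdir/a.txt" "$workdir/b.txt" |
  awk -F '\t' '
    NF == 1 { print "only in file 1: " $1 }
    NF == 2 { print "only in file 2: " $2 }
    NF == 3 { print "in both files: " $3 }
  ')

echo "$labelled"
rm -r "$workdir"
```

The first lines printed are "in both files: 0", "only in file 1: 1" and "only in file 2: 11", in the same order as the plain comm output.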
0
2
5
7
8
9


# comm -23 unq_sortd_files1.txt unq_sortd_files2.txt
1
3
4
6
 

# comm -13 unq_sortd_files1.txt unq_sortd_files2.txt
11
24
46


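Yes, you can suppress whichever columns you don't need by passing their numbers as options. '-1' suppresses the 1st column, '-2' the 2nd and '-3' the 3rd, and they can be combined: '-12' leaves only the common records, '-23' only the records unique to the 1st file, and '-13' only the records unique to the 2nd file.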
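The sort and compare steps can also be wrapped together. This is just one possible sketch: `common_lines` is a made-up helper name, not a standard command, and it uses temporary files from `mktemp` so the original files stay untouched.

```shell
# common_lines FILE1 FILE2: print the records present in both files.
# (Hypothetical helper; the name is not part of any standard tool.)
common_lines() {
  tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
  LC_ALL=C sort -u "$1" > "$tmp1"      # unique sort of the 1st file
  LC_ALL=C sort -u "$2" > "$tmp2"      # unique sort of the 2nd file
  LC_ALL=C comm -12 "$tmp1" "$tmp2"    # keep only the 3rd column
  rm -f "$tmp1" "$tmp2"
}

# Try it on the unsorted sample files from this post.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 0 > "$workdir/files1.txt"
printf '%s\n' 0 2 9 11 24 46 8 7 5 > "$workdir/files2.txt"

common=$(common_lines "$workdir/files1.txt" "$workdir/files2.txt" | paste -s -d ' ' -)
echo "common: $common"   # common: 0 2 5 7 8 9

rm -r "$workdir"
```

Changing '-12' to '-23' or '-13' inside the function would give the records unique to the 1st or the 2nd file instead.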

Any doubts?!
