Our online publishing system has a text file that contains entries, one per line. Some entries were duplicates, and we wanted to remove them.
This can be done with the sort and uniq commands.
sort /foo/bar | uniq > /new/bar
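As a side note, if sorted output is acceptable, sort's -u flag does the same thing in a single command:

sort -u /foo/bar > /new/bar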
But we wanted to retain the original order of the lines, so we didn't want to sort the file. I found a solution using awk.
awk '!x[$0]++' /foo/bar > /new/bar
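To see why this works: x is an associative array keyed by the whole line. The first time a line appears, x[$0] is 0, so !x[$0]++ is true and awk prints the line (printing is the default action); on every later occurrence the counter is nonzero and the line is skipped. The same logic spelled out as a longer script (just for illustration; the one-liner is what I actually use):

awk '{
    if (x[$0] == 0)   # first time we have seen this exact line
        print         # the implicit default action of the one-liner
    x[$0]++           # remember the line for next time
}' /foo/bar > /new/bar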
And how do you check whether the file contains duplicate lines in the first place? The -d option of the uniq command is helpful here.
sort /foo/bar | uniq -d
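If you want to spot the duplicates without sorting, the same awk idiom can be inverted (a variation on the one-liner above, not part of our original workflow): x[$0]++ as a pattern is true from the second occurrence onward, so this prints each repeated line as it recurs, in file order.

awk 'x[$0]++' /foo/bar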