Just another mundane earthling : अशाच एखाद्याची बखर: November 2008

Monday, November 17, 2008

How to count occurrences of a tag in an XML file

How can you find the total number of occurrences of a particular tag in an XML file? Say I want to find out how many property tags an XML file contains.

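grep alone is not sufficient for this task: grep -c counts matching lines, not individual matches, so several tags on one line are counted only once. Here is a Perl script that does the job.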

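#!/usr/bin/perl
# count_xml_tags.pl
# Counts opening (or self-closing) occurrences of a tag in an XML file.
# This is a regex-based count, not a real XML parse.
use strict;
use warnings;

my $xml_tag  = shift;
my $filename = shift;
die "Usage: $0 tag filename\n" unless defined $xml_tag && defined $filename;

open(my $fh, '<', $filename) or die "Failed to read file $filename : $!";
my $content = do { local $/; <$fh> };    # undefine $/ to slurp the whole file at once
close($fh);

# Match "<tag" followed by whitespace, "/" or ">", so that tags with
# attributes and self-closing tags are counted, but longer names are not.
my $count = () = $content =~ m{<\Q$xml_tag\E(?=[\s/>])}g;
print "$count $xml_tag tag(s) found.\n";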

Run this script as:
% perl count_xml_tags.pl some_tag filename.xml
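
To see why plain grep falls short, here is a small shell sketch on a made-up sample file (the file contents and variable names are only for illustration). It also shows a rough workaround using grep -o, which GNU and BSD grep support but POSIX does not require, so treat it as an assumption about your system rather than a portable answer.

```shell
#!/bin/sh
# Hypothetical sample: two <property> tags on the first line, one on the second.
sample=$(mktemp)
printf '%s\n' '<root><property name="a"/><property>b</property>' \
              '<property>c</property></root>' > "$sample"

# grep -c counts matching LINES, so two tags on one line count once.
line_count=$(grep -c '<property' "$sample")

# grep -o prints every match on its own line; wc -l then counts the matches.
# tr strips the padding that some wc implementations add.
tag_count=$(grep -o '<property[ />]' "$sample" | wc -l | tr -d ' ')

echo "lines=$line_count tags=$tag_count"
rm -f "$sample"
```

On this sample the two numbers differ (2 lines, 3 tags), which is the gap the Perl script is meant to close.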

How to remove duplicate lines from a file

Our on-line publishing system has a text file that contains certain entries, one per line. Some entries were duplicated, and we wanted to remove the duplicates.

This can be done using the sort and uniq commands.
sort /foo/bar | uniq > /new/bar

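But we wanted to retain the order of the lines, so sorting the file was not an option. I found a solution using awk.
awk '!x[$0]++' /foo/bar > /new/bar
Here the array x counts how often each line ($0) has been seen. The expression is true only on a line's first appearance, so awk prints each distinct line once, in its original position.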
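
The awk one-liner is terse, so here is a long-hand sketch of the same idea run on a made-up input file. The sample data and the names input, seen and deduped are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input containing duplicates.
input=$(mktemp)
printf '%s\n' pear apple pear fig apple > "$input"

# Long-hand form of awk '!x[$0]++':
# print a line only the first time it is seen, then bump its counter.
deduped=$(awk '{ if (seen[$0] == 0) print; seen[$0]++ }' "$input")

echo "$deduped"
rm -f "$input"
```

The first occurrences (pear, apple, fig) come out in their original order, which is what sort | uniq cannot give you.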

And how do I check whether the file contains duplicate lines? The -d option of the uniq command helps here: it prints one copy of each line that is repeated in its (sorted) input.
sort /foo/bar | uniq -d
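
If the check needs to go into a script, one possible sketch is to capture the output of uniq -d and test whether it is empty. The sample file and variable names below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input; "pear" and "apple" each appear twice.
input=$(mktemp)
printf '%s\n' pear apple pear fig apple > "$input"

# uniq -d prints one copy of each line that occurs more than once.
dupes=$(sort "$input" | uniq -d)

if [ -n "$dupes" ]; then
    echo "Duplicate lines found:"
    echo "$dupes"
else
    echo "No duplicate lines."
fi
rm -f "$input"
```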
