Monday, November 17, 2008

How to count occurrences of a tag in an XML file

How can you find the total number of occurrences of a particular tag in an XML file? Say I want to know how many property tags an XML file contains.

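grep alone is not sufficient for this task because it works line by line: grep -c counts matching lines rather than occurrences, and it cannot match an element that spans several lines. Here is a Perl script that does the job.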

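#!/usr/bin/perl
# count_xml_tags.pl: count <tag>...</tag> pairs in an XML file
use strict;
use warnings;

my $xml_tag  = shift;
my $filename = shift;
die "Usage: $0 tag filename\n" unless defined $xml_tag && defined $filename;

my $count = 0;
open (my $x_file, '<', $filename) or die "Failed to read file $filename: $!";
{
    local $/;    # slurp mode: read the whole file in one go
    while (<$x_file>) {
        # \Q...\E quotes any regex metacharacters in the tag name.
        # /s lets . match newlines, so an element may span several lines.
        # Only opening tags without attributes are matched.
        while (m#<\Q$xml_tag\E>(.*?)</\Q$xml_tag\E>#gs) {
            $count++;
        }
    }
}
close ($x_file);
print "$count $xml_tag tag(s) found.\n";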

Run this script as:
% perl count_xml_tags.pl some_tag filename.xml
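
To see why grep falls short here, the following is a small sketch on a made-up sample file (the file contents and variable names are assumptions for illustration, and mktemp is assumed to be available). The sample holds three property elements, two on one line and one split across two lines.

```shell
# Hypothetical sample: two property elements on the first line,
# and a third one that spans the second and third lines.
sample=$(mktemp)
printf '%s\n' \
    '<config><property>a</property><property>b</property>' \
    '<property>c' \
    '</property></config>' > "$sample"

# grep -c reports the number of matching LINES, not the number of matches.
line_count=$(grep -c '<property>' "$sample")
echo "lines containing <property>: $line_count"

# A pattern for a complete element can only match within a single line,
# so the element split across two lines is never seen.
pair_count=$(grep -c '<property>.*</property>' "$sample")
echo "lines containing a complete element: $pair_count"

rm -f "$sample"
```

Neither number is the three elements actually present, which is why the script above reads the whole file at once and counts every match.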

How to remove duplicate lines from a file

Our on-line publishing system has a text file that contains certain entries, one per line. Some entries were duplicated, and we wanted to remove the duplicates.

This can be done using the sort and uniq commands.
sort /foo/bar | uniq > /new/bar

But we wanted to retain the order of lines, and so didn't want to sort the file. I found a solution using awk.
awk '!x[$0]++' /foo/bar > /new/bar
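
The one-liner is terse, so here is a longer sketch of what I understand it to do. The array name seen and the sample contents are my own choices for illustration. For each line, awk looks up the line text ($0) in an array. The first time a line appears its counter is still zero, so the line is printed. The counter is then incremented, so later copies are skipped.

```shell
# Hypothetical sample file with repeated lines, in a deliberate order.
sample=$(mktemp)
printf '%s\n' beta alpha beta gamma alpha > "$sample"

# Long form of awk '!x[$0]++':
# print a line only if it has not been seen before, then count it as seen.
deduped=$(awk '{ if (seen[$0] == 0) print; seen[$0]++ }' "$sample")
printf '%s\n' "$deduped"

rm -f "$sample"
```

The first occurrence of each line is kept and the original order is preserved, which is exactly what sort and uniq cannot give you.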

And how do I check whether the file contains duplicate lines? The -d option of the uniq command helps here: it prints one copy of each repeated line, so empty output means the file has no duplicates.
sort /foo/bar | uniq -d
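
If the check needs to go into a script, one possible sketch is to wrap it in a small function that reports the answer through its exit status. The name has_duplicates is made up for this example, and the sample file is only there to show the call.

```shell
# has_duplicates is a hypothetical helper name, not a standard command.
# It succeeds (exit status 0) when the file named in $1 has a repeated line.
has_duplicates() {
    [ -n "$(sort "$1" | uniq -d)" ]
}

# Try it on a small temporary sample file.
sample=$(mktemp)
printf '%s\n' one two one > "$sample"
if has_duplicates "$sample"; then
    result="duplicates found"
else
    result="no duplicates"
fi
echo "$result"
rm -f "$sample"
```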