Thursday, September 2, 2010

Validating numbers in a perl script

This morning one of my perlish friends asked me for help. He had a regular expression for validating strings that contained numbers. The regex was failing and he wanted to know how to fix it. The strings he had were "12.3%", "1.25%" and similar. The numbers came from disk space usage, as percentages varying from 0% to 100%, similar to what we see in the output of the df command.
He wanted to extract the numeric part, to be used in numeric calculations.

His regex was not taking into account that these numbers could have "optional" parts. They could be "100%", "12%", "0.8%", "0%" and so on. Here is the regex I gave him.

[root@sonash5 tmp]# perl -e '$str = "1.23%"; print "matches" if ($str =~ m/^(\d{1,3}(?:\.\d+)?|\.\d+)\s*%$/); print ", and the match is $1 \n"'
matches, and the match is 1.23
[root@sonash5 tmp]#

Are you thinking, why not just remove the % character from the end of the string and use the rest of it as the number?

That's a bad idea. What if some machine gives you "75 %"? Or even "75", assuming that they are percentages? The string obtained could even be " 34.5%", causing the numeric calculations to go wrong. When those strings come from a source that you don't know, or that may vary with the operating system, the hardware and other things, it is a good idea to validate the string against the numeric pattern that you are looking for. If the pattern is found, extract it and use it further.

Tuesday, May 25, 2010

How do I I find doubled words in a file

Noticed that mistake in the title? Or missed it? Here's how to catch those in your text files.

perl -nl -e 'print if m/\b(\w+)\s+\1\b/' filename

Instead of the whole line, you just want to see the line number and the word that is doubled?

perl -nl -e 'print "$. : $1" if m/\b(\w+)\s+\1\b/' filename

And you want to correct these?

perl -pi -e 's/\b(\w+)\s+\1\b/$1/g' filename

Friday, May 21, 2010

How do I get difference between two dates that are in yyyy-mm-dd format

A few days ago one of my friends asked me how to get the difference, in number of days, between two dates stored in yyyy-mm-dd format. I was in a hurry and told him how to go about it, but couldn't show him the exact steps.

Yesterday I posed this question to the interns in our company, thinking they would benefit from solving it. I told them to read the date values from shell environment variables, and to use the programming language or tools of their choice to prepare the solution.

One guy wrote neat C code to get the answer. He had to maintain a data structure storing the number of days in each month of the year, and also a function to find whether a given year is a leap year. That resulted in the code growing to about 100 lines.

I asked them if they could think of some other approach, and if they were aware of how time is maintained in UNIX systems.

A perl script that uses the Time::Local module makes the task easier to accomplish.

#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;

# Both dates are read from the environment in yyyy-mm-dd format.
my $date1 = $ENV{'date1'};
my $date2 = $ENV{'date2'};

my @date1_breakup = split(/-/, $date1);
my @date2_breakup = split(/-/, $date2);

# Time::Local counts months from 0 (January), hence the "- 1".
# timegm works in UTC, so a daylight saving shift cannot leave a fractional day.
my $date1_unix = timegm(0, 0, 0, $date1_breakup[2], $date1_breakup[1] - 1, $date1_breakup[0]);
my $date2_unix = timegm(0, 0, 0, $date2_breakup[2], $date2_breakup[1] - 1, $date2_breakup[0]);

my $diffSeconds = $date2_unix - $date1_unix;
my $diffDays = $diffSeconds / (60 * 60 * 24);
print "difference in days : $diffDays\n";



A plain Bourne shell script can also do the job for us, if the date command supports the -d option.

#!/bin/sh
date1_unix=`date -d "$date1" +%s`
date2_unix=`date -d "$date2" +%s`
diff=`expr $date2_unix - $date1_unix`
diff_days=`expr $diff / 86400` # There are 86400 seconds in a day
echo $diff_days


yogeshs@yogesh-laptop:~/temp$ export date1=2010-04-22
yogeshs@yogesh-laptop:~/temp$ export date2=2010-05-23
yogeshs@yogesh-laptop:~/temp$ ./date_diff.sh
31
yogeshs@yogesh-laptop:~/temp$


Here's what happened inside the shell script.
yogeshs@yogesh-laptop:~/temp$ sh -x date_diff.sh
+ date -d 2010-04-22 +%s
+ date1_unix=1271874600
+ date -d 2010-05-23 +%s
+ date2_unix=1274553000
+ expr 1274553000 - 1271874600
+ diff=2678400
+ expr 2678400 / 86400
+ diff_days=31
+ echo 31
31
yogeshs@yogesh-laptop:~/temp$
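
The timestamps in the trace are local-time midnights, and dividing by 86400 assumes every day in between is exactly 24 hours long. In a time zone with daylight saving that may not hold, and expr would then round the answer down. Here is a minimal sketch that sidesteps this by asking date for UTC timestamps. It assumes a date command that supports both -u and -d (GNU date does), and the function name days_between is made up for this illustration.

```shell
#!/bin/sh
# days_between (hypothetical helper): whole days from $1 to $2,
# both given in yyyy-mm-dd format. Returns 1 if date rejects an argument.
days_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    # In UTC every day has 86400 seconds, so the division is exact.
    echo $(( ($end - $start) / 86400 ))
}

days_between 2010-04-22 2010-05-23   # prints 31
```

Taking the dates as arguments instead of environment variables is only a convenience here; the arithmetic is the same as in the script above.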



You still want to see how this is done using C? Do let me know and I'll share that piece of code with you.