Thursday, September 2, 2010

Validating numbers in a Perl script

This morning one of my perlish friends asked me for help. He had a regular expression for validating strings that contained numbers. The regex was failing and he wanted to know how to fix it. The strings he had were "12.3%", "1.25%" and similar. The numbers were disk space usage percentages, ranging from 0% to 100%, similar to what we see in the output of the df command.
He wanted to extract the numeric part of each string, to be used in numeric calculations.

His regex did not take into account that parts of these numbers are optional. They could be "100%", "12%", "0.8%", "0%" and so on. Here is the regex I gave him.

[root@sonash5 tmp]# perl -e '$str = "1.23%"; print "matches, and the match is $1\n" if ($str =~ m/^(\d{1,3}(?:\.\d+)?|\.\d+)\s*%$/);'
matches, and the match is 1.23
[root@sonash5 tmp]#

Are you wondering why not just remove the % character from the end of the string and use the rest of it as a number?

That's a bad idea. What if some machine gives you "75 %"? Or even "75", assuming that the values are percentages? The string could even be " 34.5%", which can make the numeric calculations go wrong. When those strings come from a source that you don't know, or that may vary with the operating system, the hardware and other things, it is a good idea to validate the string against the numeric pattern that you are looking for. If the pattern is found, extract it and use it further.
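To make the "validate first, then extract" idea concrete, here is a minimal sketch of how it might look inside a script. The helper name parse_percent is made up for this illustration, and two choices in it are my own assumptions rather than anything my friend's script required: it tolerates surrounding whitespace and a missing % sign, and it rejects anything above 100.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the numeric
# part of a percentage string, or undef if the string does not look like
# a percentage between 0 and 100.
sub parse_percent {
    my ($str) = @_;
    return unless defined $str;

    # Assumed format: optional whitespace, one to three digits with an
    # optional fraction (or a bare fraction such as ".5"), optional
    # whitespace, optional "%", optional trailing whitespace.
    return unless $str =~ m/^\s*(\d{1,3}(?:\.\d+)?|\.\d+)\s*%?\s*$/;

    my $value = $1;

    # The pattern alone would still accept "999%", so check the range too.
    return if $value > 100;

    return $value + 0;    # hand back a plain number
}

for my $input ("1.23%", "75 %", "75", " 34.5%", "%", "abc", "250%") {
    my $value = parse_percent($input);
    printf "%-10s => %s\n", "'$input'", defined $value ? $value : "rejected";
}
```

Because the function returns undef for anything it does not recognise, the caller can skip or log bad input instead of silently doing arithmetic on garbage.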
