pboyd.io

The regex [,-.]

[,-.]

I stumbled on this regex recently: \d{2}[,-.]\d{2}.

The intention is clear enough: match two sets of two digits separated by a comma, a dash, or a period. Of course, it shouldn’t work. Dashes in character classes are special because they’re used for ranges (like [a-z] to match lower-case ASCII letters). If you want - in a character class you put it at the beginning, or the end, never the middle. So this should be [-,.] not [,-.].

I assumed that [,-.] was a typo and it wouldn’t match -, but I couldn’t find a bug. In fact, it works fine, you can try it yourself:

$ perl -E 'say "ok" if "12-34" =~ /\d{2}[,-.]\d{2}/'
ok

Try a few variations with other characters, if you want. It won’t match - unless it’s at the beginning or end of the character class, except for [,-.]:

$ perl -E 'say "ok" if "12-34" =~ /\d{2}[*-,]\d{2}/'
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[a-z]\d{2}/'
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[-*,]\d{2}/'
ok
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[-a-z]\d{2}/'
ok
$ perl -E 'say "ok" if "12-34" =~ /\d{2}[,-.]\d{2}/'
ok

What’s going on? Well, if you haven’t figured it out already, the comma, the dash, and the period are right next to each other in ASCII:

$ man ascii
...
       054   44    2C    ,
       055   45    2D    -
       056   46    2E    .
...

So grabbing all characters from , to . also includes - and nothing else. [,-.] is the only possible character class with a - in the middle that only matches -.

I can’t tell if the author was being clever or if it was a particularly lucky typo. Either way, I will continue putting - at the beginning: [-,.]. But if you feel the need to send someone down a rabbit trail [,-.] is an option for you.

Discussions about this post: