Victus Spiritus

Regular expressions blues

01 Feb 2011

Or why I hate regular expressions: they're not designed for readability

Early this morning I caught up on a recent changelog covering Happy.js, a lightweight form validation plugin. My first reaction: nifty, I can use a handy form validator for web app inputs. As I was skimming through the example validation functions, the regex reached out of the page and slapped me in the face. It was only then that I realized how unintuitive regular expressions are.

I can handle this regex for phone numbers. It takes a little familiarity to follow, but at least it fits in a pre tag:

/^\(?(\d{3})\)?[\- ]?(\d{3})[\- ]?(\d{4})$/.test(val)
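One way to make that pattern easier to follow is to assemble it from named pieces. This is only a sketch: the variable names and the `isPhoneNumber` helper are my own labels for illustration, not anything Happy.js provides.

```javascript
// The same phone pattern, built from commented fragments.
// All names here are hypothetical and chosen only for readability.
var areaCode = '\\(?(\\d{3})\\)?'; // three digits, parentheses optional
var separator = '[\\- ]?';          // optional hyphen or space
var exchange = '(\\d{3})';          // next three digits
var lineNumber = '(\\d{4})';        // last four digits

var phonePattern = new RegExp(
  '^' + areaCode + separator + exchange + separator + lineNumber + '$'
);

function isPhoneNumber(val) {
  return phonePattern.test(val);
}
```

With this in place, `isPhoneNumber('(555) 123-4567')` and `isPhoneNumber('555-123-4567')` both pass, while `isPhoneNumber('555-1234')` fails. The price is doubled backslashes, since the fragments are strings rather than regex literals.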

Then came this beast for email validation. Go ahead, keep on scrolling right...

What I'd prefer is an expressive syntax that fits patterns as I think of them. For phone number validation it would be #(###)###-####, where # stands for a single digit, or an equivalent abbreviated form, #({3}#){3}#-{4}#. The same goes for email: *@*.EXTs, where * is any character sequence and EXTs is a user-defined list of acceptable extensions. Explicit character matches can be done by writing the characters out, e.g. *@gmail.com for all Gmail addresses. Of course this syntax would need an escape character for reserved symbols, plus the standard flags for case-insensitive matching and for global versus first-match replacement (the i and g flags in regular expressions). Unfortunately, it appears all I've done is invent another flavor of regular expression.
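To see how far the idea goes, here is a rough sketch of a translator from that syntax into an ordinary RegExp. The function names are hypothetical, and the rules are my assumptions filling in what the description leaves open: # is one digit, {n}# is n digits, * is one or more characters, a backslash escapes the next character, and everything else matches literally. The EXTs list is left out.

```javascript
// Hypothetical helper: escape a character so it matches literally in a RegExp.
function escapeLiteral(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical translator for the pattern syntax sketched above.
// Assumed rules: '#' = one digit, '{n}#' = n digits, '*' = one or more
// characters, '\' escapes the next character, anything else is literal.
function patternToRegExp(pattern, flags) {
  var out = '';
  var i = 0;
  while (i < pattern.length) {
    var ch = pattern[i];
    var repeat = /^\{(\d+)\}#/.exec(pattern.slice(i));
    if (repeat) {
      out += '\\d{' + repeat[1] + '}'; // '{3}#' becomes \d{3}
      i += repeat[0].length;
    } else if (ch === '#') {
      out += '\\d';
      i += 1;
    } else if (ch === '*') {
      out += '.+';
      i += 1;
    } else if (ch === '\\' && i + 1 < pattern.length) {
      out += escapeLiteral(pattern[i + 1]); // escaped reserved symbol
      i += 2;
    } else {
      out += escapeLiteral(ch);
      i += 1;
    }
  }
  // Anchor both ends so the whole input has to match.
  return new RegExp('^' + out + '$', flags || '');
}
```

Under those assumptions, `patternToRegExp('#(###)###-####')` and `patternToRegExp('#({3}#){3}#-{4}#')` both accept `1(555)123-4567`, and `patternToRegExp('*@gmail.com', 'i')` accepts `Someone@GMAIL.com`. The translation is nearly one-to-one, which rather supports the suspicion that this is just another regex flavor.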

Another tactic that may make the syntax easier on the eyes is rearranging the validation string into rows:
#
({3}#)
{3}#
-
{4}#
Not exactly a thing of beauty. I don't have any more time at the moment, but I'd like to revisit the problem of making regular expressions easy to read.