Thursday, March 11

Regular Expression Matching in the Wild:

I spent the summer of 2006 building Code Search, which lets programmers search for source code using regular expressions. That is, it lets you grep through the world's public source code. We originally planned to use PCRE for the regular expression search, until we realized that it used a backtracking algorithm, meaning it is easy to make searches take exponential time or arbitrary stack depth. Since Code Search accepts regular expressions from anyone on the Internet, using PCRE would have left it open to easy denial of service attacks. As an alternative to PCRE, I wrote a new, carefully reviewed regular expression parser wrapped around Ken Thompson's open source grep implementation, which uses a fast DFA.

Russ Cox's home page:

What's grey? A melted penguin.

tags: topics/regex, topics/technical

p1k3 / 2010 / 3 / 11