For every ‘evil’ regex, does there exist a non-evil alternative, or is the devil in the grammar?

Apparently, ReDos attacks exploit characteristics of some (otherwise useful) regular expressions … essentially causing an explosion of possible paths through the graph defined by the NFA.

Is it possible to avoid such problems by writing an equivalent ‘non-evil’ regex? If not (thus, the grammar can’t be handled in practical space/time by an NFA), what parsing approaches would be better? Why?

Answer

It depends upon whether you’ve got a regular expression or a regexp: regexps are evil, but regular expressions are a thing of beauty and will never turn evil on you.

By regexp, I mean a modern regular expression: i.e., a regular expression with additional modern features such as backreferences — e.g., a Perl-compatible regular expression. This is more powerful than a classical regular expression from a formal languages / automata theory textbook, as classical regular expressions don’t allow backreferences, lookahead, lookbehind, and so on.

.. In comparison, some modern regexps are unavoidably evil. If you have a modern regexp, then matching can require exponential time. In particular, regexps with backreferences can recognize NP-hard languages. Consequently, under plausible assumptions, there exists a class of evil regexps where testing for a match takes exponential time. Thus, some modern regexps are unavoidably evil: there is no feasible way to find an equivalent regexp that won’t cause exponential blowup in running time to match.