Hierarchical File Systems are Dead

For over forty years, we have assumed hierarchical file system namespaces. These namespaces were a rudimentary attempt at simple organization. As users have begun to interact with increasing amounts of data and are increasingly demanding search capability, such a simple hierarchical model has outlasted its usefulness. For this reason, we should design file systems whose organizations map to the ways we access and manipulate data now. We present a new file system architecture in which we replace the hierarchical namespace with a tagged, search-based one.

Regular expression Denial of Service – ReDoS

The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time.

For every ‘evil’ regex, does there exist a non-evil alternative, or is the devil in the grammar?

Apparently, ReDos attacks exploit characteristics of some (otherwise useful) regular expressions … essentially causing an explosion of possible paths through the graph defined by the NFA.

Is it possible to avoid such problems by writing an equivalent ‘non-evil’ regex? If not (thus, the grammar can’t be handled in practical space/time by an NFA), what parsing approaches would be better? Why?

Answer

It depends upon whether you’ve got a regular expression or a regexp: regexps are evil, but regular expressions are a thing of beauty and will never turn evil on you.

By regexp, I mean a modern regular expression: i.e., a regular expression with additional modern features such as backreferences — e.g., a Perl-compatible regular expression. This is more powerful than a classical regular expression from a formal languages / automata theory textbook, as classical regular expressions don’t allow backreferences, lookahead, lookbehind, and so on.

.. In comparison, some modern regexps are unavoidably evil. If you have a modern regexp, then matching can require exponential time. In particular, regexps with backreferences can recognize NP-hard languages. Consequently, under plausible assumptions, there exists a class of evil regexps where testing for a match takes exponential time. Thus, some modern regexps are unavoidably evil: there is no feasible way to find an equivalent regexp that won’t cause exponential blowup in running time to match.