[langsec-discuss] langsec implications on Re-DOS Attacks
h.lampesberger at cdcc.faw.jku.at
Sun Oct 27 07:43:11 UTC 2013
> Imagining there are no back references.
> For example , How do we fix simple evil regex's like the one mentioned here say (a+)+
> Does this simple regular expression match Regular grammar ? or CFG or CSG ?
> Since they are causing ReDOS attacks , can we say they are CSG or above ?
In my opinion there is no evil (backref-free) regex, there are only implementations that (may) suck. The given expression is regular in the formal sense because there are no backreferences. The evilness comes from the implementation of the matcher. Because it is regular, we know that there is a unique deterministic finite state automaton, but the number of states can be exponential in the length of the expression. This implies that the conversion from expression to matcher can take exponential time in the worst case.
A backtracking implementation now safes time by skipping this conversion part; the expression is interpreted like a program and nondeterminism is shifted to runtime. A matcher starts of with a single thread datastructure in memory and with every input to the matcher, the thread moves along the expression from left to right. In the case of nondeterminism (*,+), the thread splits: the current thread datastructure becomes two with different progressions in the expression.
The given evil expression now splits with every input, therefore the number of thread datastructures in memory grows exponentially with every input and so does the time necessary to progress an input. This is your DoS! No CFG or CSG magic, just complexity.
Why do implementations backtrack instead of using nice automata? Because you need backtracking for backreferences in PCRE! Libs like google RE2 drop backreferences from PCRE, so all the cool automata-theoretic concepts work.
> Also how can we determine (programmatically) whether given a regex it's equivalent grammar is CSG or above (thus undecidable) and can potentially cause REDOS ?
Backreferences. The moment you need them, you should step back and ask yourself why you need them. Either the expression can be rewritten to stay regular, or you should think more about the problem you want to solve and use an appropriate parser instead.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the langsec-discuss