[langsec-discuss] langsec implications on Re-DOS Attacks

Sashank Dara krishna.sashank at gmail.com
Mon Oct 28 02:53:59 UTC 2013

I suspect there are other forms of regex's too other than backlashes that
can cause problem .

For example , the context sensitive grammar example given in the link
here .<http://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html>does
not have back lashes !
Since the acceptance of CSG is UnDecidable this could cause  attacks too
according to langsec principles .

Is my understanding correct ?


On Sun, Oct 27, 2013 at 1:13 PM, Harald Lampesberger <
h.lampesberger at cdcc.faw.jku.at> wrote:

> Hey Sashank,
> Imagining there are no back references.
> For example , How do we fix simple evil regex's<https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS>like  the one mentioned here say (a+)+
> Does this simple regular expression match Regular grammar ? or CFG or CSG
> ?
> Since they are causing ReDOS attacks , can we say they are CSG or above ?
> In my opinion there is no evil (backref-free) regex, there are only
> implementations that (may) suck. The given expression is regular in the
> formal sense because there are no backreferences. The evilness comes from
> the implementation of the matcher. Because it is regular, we know that
> there is a unique deterministic finite state automaton, but the number of
> states can be exponential in the length of the expression. This implies
> that the conversion from expression to matcher can take exponential time in
> the worst case.
> A backtracking implementation now safes time by skipping this conversion
> part; the expression is interpreted like a program and nondeterminism is
> shifted to runtime. A matcher starts of with a single thread datastructure
> in memory and with every input to the matcher, the thread moves along the
> expression from left to right. In the case of nondeterminism (*,+), the
> thread splits: the current thread datastructure becomes two with different
> progressions in the expression.
> The given evil expression now splits with every input, therefore the
> number of thread datastructures in memory grows exponentially with every
> input and so does the time necessary to progress an input. This is your
> DoS! No CFG or CSG magic, just complexity.
> Why do implementations backtrack instead of using nice automata? Because
> you need backtracking for backreferences in PCRE! Libs like google RE2 drop
> backreferences from PCRE, so all the cool automata-theoretic concepts work.
> Also how can we determine (programmatically) whether given a regex it's
> equivalent grammar is CSG or above (thus undecidable) and can potentially
> cause REDOS ?
> Backreferences. The moment you need them, you should step back and ask
> yourself why you need them. Either the expression can be rewritten to stay
> regular, or you should think more about the problem you want to solve and
> use an appropriate parser instead.
> Regards, Harald
> _______________________________________________
> langsec-discuss mailing list
> langsec-discuss at mail.langsec.org
> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.langsec.org/pipermail/langsec-discuss/attachments/20131028/c012d50b/attachment.html>

More information about the langsec-discuss mailing list