[langsec-discuss] Computational Power of Restricted Languages

Jacob Torrey jacob at jacobtorrey.com
Tue Nov 11 21:47:23 UTC 2014


I will leave your main concern to the experts on this list, but I think I
can address your second question. (I think in LangSec, a restricted
language is one that expresses a restricted semantic model, i.e. one that
is not undecidable to determine halting).

I think you are conflating parsing with a parser. LangSec posits that by
not formally parsing and understanding the data the program is operating
on, you leave yourself vulnerable to unintended computations. If you have a
well-defined set of possible inputs to a program, and each maps to a valid
execution path, then there is no way for a user to "exploit" a bug as long
as the input is in the expected set. Certainly if you write a program that
always misbehaves regardless of input, you will show a counter-example to
this. The issue with the *vast* majority of software is that the input set
is NOT well-defined and as such, can drive the program to states that were
not intended by the programmer. This is commonly seen with return-oriented
programming, in which the input data (return addresses) are transformed to
unintended computation via a "weird machine".

Hope this helps. In short, if you have a bug that is not dependent on any
input (so it is always triggered), then you would be correct, otherwise you
are looking too far downstream of the issue at the symptom, rather than the
cause.

- Jacob

On Tue, Nov 11, 2014 at 2:31 PM, Taylor Hornby <havoc at defuse.ca> wrote:

> Hi,
>
> I'm doing a LangSec binge and noticing a frequent error in LangSec
> material (papers, talks). I'm not sure if it's just an "abuse of
> notation" or an actual error, so hopefully someone on this list can
> explain what is actually meant. I'm new to LangSec, so please forgive me
> if I'm missing something obvious.
>
> Here's one example. Quoting the footnote on page 5 of "Security
> Applications of Formal Language Theory"...
>
>     "...HTML5+CSS3 (shown to be UNDECIDABLE by virtue of its ability to
>     implement Rule 110)..."
>
> The fact that HTML5+CSS3 can specify computation that is as powerful as
> a Turing machine does not mean the language itself is undecidable or
> even requires a Turing machine to decide.
>
> For example, the Turing-complete language "brainfuck" is as powerful as
> a Turing machine, but the language specifying a brainfuck program is
> context-free.
>
>     Brainfuck -> OPS
>     OP -> > | < | + | - | . | ,
>     OPS -> OPS OP | OPS Loop
>     Loop -> '[' OPS ']'
>
> Similarly, any reasonable encoding of a Turing machine is context-free.
> Most generally, the set of all strings is a regular language, and
> clearly you can encode Turing machines with that language! The fact that
> a language is, say, regular, does not guarantee that its semantics are
> not Turing-complete.
>
> So: What is meant by this? Is it abuse of notation for "Not only should
> your syntax be regular or context-free, but your semantics should be as
> well?"
>
> Another claim I'm skeptical of (for the same reason) is that having
> a correct parser will make it less likely for input to trigger bugs that
> are "behind" the parser. That's not true, because if you have a bug that
> is not related to parsing, that bug can be exploited without assuming
> anything about the parser, so changing the parser won't affect the
> exploitability of that bug. Given that, LangSec seems to boil down to
> "Don't write code when you can have it generated for you."
>
> Thanks,
> --
> Taylor Hornby
> Disclaimer: I'm wrong a lot of the time
> _______________________________________________
> langsec-discuss mailing list
> langsec-discuss at mail.langsec.org
> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.langsec.org/pipermail/langsec-discuss/attachments/20141111/de9d65e8/attachment.html>


More information about the langsec-discuss mailing list