[langsec-discuss] embedded languages

Harald Lampesberger h.lampesberger at cdcc.faw.jku.at
Tue Dec 4 16:01:04 UTC 2012

Hello langsec,

The list has gotten pretty silent in recent months, so let's start a
thread again. I would be very interested to know who is actively
researching langsec topics, perhaps for discussion or collaboration.

In my case, I want to specialize the language-theoretic vulnerability
problem to the problems we face with cloud (web) services. The result
should give some insight into potential countermeasures: whether they
are feasible and whether they make sense at all. So far, I have focused
on semi-structured data (XML) and its theoretical properties. Some of
my current problems and findings are:

* The Chomsky hierarchy classifies string languages, but semi-structured
data is in many cases a regular tree language encoded as a context-free
string language.

* If the underlying structure is a tree, secure interpretation boils
down to assigning a precise, unambiguous type to every node. If the
type is clear, the node's semantics are clear.

* Data nodes make this complicated. The data could be practically any
kind of structure encoded as a string. The node may fix the type on a
syntactic level (e.g. primitive string), but not the semantics. We have
an "embedded language" that depends on the node type, e.g. JavaScript
within an XHTML script tag.

* Embedded languages are everywhere as a result of layered protocol
design. For example, a web service parses a submitted HTTP request. The
language accepted by the web server covers only the header and does not
necessarily restrict the body, yet the service handler it calls expects
a body in a specific language.
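To make the second and third points concrete, here is a minimal sketch, assuming a DTD-like (local) tree grammar, which is a restricted form of regular tree grammar. The element names, node types, and the GRAMMAR and EMBEDDED tables are hypothetical. Each node gets its type by checking the sequence of its child element names against a regular content model; this is what keeps the language a regular tree language even though its serialization is a context-free string. The type then tells us that a script node carries JavaScript, but the tree grammar itself only ever sees a string.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical local tree grammar: element name -> (content model, node type).
# A content model is a regular expression over the child element names,
# each name followed by a single space.
GRAMMAR = {
    "html":   (r"head body ", "Document"),
    "head":   (r"(title )?(script )*", "Head"),
    "body":   (r"(p |script )*", "Body"),
    "title":  (r"", "Text"),
    "p":      (r"", "Text"),
    "script": (r"", "Script"),
}

# Hypothetical mapping from node type to the language its character data
# is supposed to be in. This is the "embedded language": the tree grammar
# only knows that the content is a string.
EMBEDDED = {"Text": "plain text", "Script": "JavaScript"}


def assign_types(elem, path=""):
    """Return (path, type, embedded language) for every node in preorder,
    or raise ValueError if the tree is not in the tree language.
    Character data is not inspected at all in this sketch."""
    path = f"{path}/{elem.tag}"
    if elem.tag not in GRAMMAR:
        raise ValueError(f"unknown element at {path}")
    model, node_type = GRAMMAR[elem.tag]
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        raise ValueError(f"children of {path} do not match {model!r}")
    result = [(path, node_type, EMBEDDED.get(node_type))]
    for child in elem:
        result.extend(assign_types(child, path))
    return result


if __name__ == "__main__":
    doc = ET.fromstring(
        "<html><head><title>t</title></head>"
        "<body><p>hi</p><script>alert(1)</script></body></html>"
    )
    for row in assign_types(doc):
        print(row)
```

For the document above, the script node is typed as ('/html/body/script', 'Script', 'JavaScript'), while a tree with head and body swapped raises ValueError. Whether the string inside the script node is actually valid JavaScript is a separate question that the tree grammar cannot answer.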

What I want to do is formalize the whole thing as an abstract model of
communicating agents and relate known attacks and vulnerabilities to
it. Based on this model, I can, for example, present my own
countermeasure and argue what it can and cannot solve.

My problem is how to approach embedded languages. Some type theory is
definitely required, and I suspect the problem is decidable (?) if the
"embedding" is visible. So what do you think? I would very much
appreciate feedback or suggestions on my idea. :)
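One way to picture a "visible" embedding is the HTTP example: the Content-Type header names the body language, so the receiver can pick the matching recognizer before the handler ever sees the body. The sketch below is simplified and hypothetical (the request grammar, the BODY_LANGUAGES table, and the recognizers are assumptions, not a conforming HTTP parser). In this toy setting, if every component language is decidable and the header deterministically selects the body language, the composite check is just a sequence of decidable checks; when the embedding is not visible, the sketch rejects rather than guessing.

```python
import json
import re

# Simplified, hypothetical header language (regular).
REQUEST_LINE = re.compile(r"[A-Z]+ \S+ HTTP/1\.[01]")
HEADER_LINE = re.compile(r"([!#$%&'*+.^_`|~0-9A-Za-z-]+):[ \t]*(.*?)[ \t]*")
# A regular body language: name=value pairs joined by '&'.
FORM_BODY = re.compile(
    r"(?:[A-Za-z0-9._~%+-]+=[A-Za-z0-9._~%+-]*"
    r"(?:&[A-Za-z0-9._~%+-]+=[A-Za-z0-9._~%+-]*)*)?"
)


def recognize_json(body):
    """Context-free body language: delegate to the standard JSON parser."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def recognize_form(body):
    """Regular body language: application/x-www-form-urlencoded pairs."""
    return FORM_BODY.fullmatch(body) is not None


# Hypothetical dispatch table: the header makes the embedding visible.
BODY_LANGUAGES = {
    "application/json": recognize_json,
    "application/x-www-form-urlencoded": recognize_form,
}


def accept_request(raw):
    """Accept only if the header language AND the body language selected
    by Content-Type both accept the respective parts of the input."""
    head, sep, body = raw.partition("\r\n\r\n")
    if not sep:
        return False
    request_line, *header_lines = head.split("\r\n")
    if REQUEST_LINE.fullmatch(request_line) is None:
        return False
    headers = {}
    for line in header_lines:
        m = HEADER_LINE.fullmatch(line)
        if m is None:
            return False
        name = m.group(1).lower()
        if name in headers:
            return False  # reject duplicates instead of guessing which wins
        headers[name] = m.group(2)
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    recognizer = BODY_LANGUAGES.get(media_type)
    if recognizer is None:
        return False  # embedding not visible (or unknown): reject
    return recognizer(body)
```

The design choice here is that the outer parser never hands an unrecognized body to the handler; an attacker who omits or forges the header only moves the input into a language that is checked, or gets it rejected.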
