[langsec-discuss] How do i prove a Domain Specific Language is of particular formal grammar

Harald Lampesberger h.lampesberger at cdcc.faw.jku.at
Thu Sep 6 09:18:52 UTC 2012


Hello Sashank,

I am currently working on security topics involving XML and maybe I can
help you.

Pure XML (only tags) is a first-child-next-sibling serialization of an
unranked tree. Chomsky hierarchy is about string languages, but now we
talk about trees. The structure of XML can be described with DTD, or in
your case of SAML with XML Schema, which are in fact XML grammars. DTD
and XML Schema have some determinism constraints (1-unambigiousness,
UPA) in the standards, so they represent a subset of regular tree
grammars. The resulting XML language is a regular tree language, so far
so good! Regular tree languages have all the decidability properties you
want.

In my opinion the troubles with regularity start with:
 - data --> could be practically any language
 - attributes: you could encode them as unordered set of tree nodes, but
without order --> exponential blowup of determ. grammars (still regular)
 - ID/IDREF --> cross-reference = context-sensitivity?

If you can show that SAML has a closed set of data languages and no
IDREFs, than it is clearly a regular tree grammar.

For good literature about regular trees, see the TATA book:
http://tata.gforge.inria.fr/
For theoretical aspects of XML, a good starting point is Prof. Thomas
Schwentick's paper: http://www.springerlink.com/content/e23n24u4230w1162/
(for non-springers) http://fox7.eu/wp-content/uploads/xmlsnapshot2.pdf


Greetings,
Harald


On 06.09.2012 09:53, Krishna Sashank O V (MS2011006) wrote:
> Hi ,
> 
> I have come across this interesting paper
> <http://www.cs.dartmouth.edu/%7Esergey/langsec/papers/langsec-tr.pdf> on
> langsec, that talks about "Security applications of formal language
> theory" ,
> 
> At the page number 8 states that HTTP is context-sensitive , javascript
> is non deterministic context-free etc .
> 
> 
> Now, we have a domain specific language that is XML based i.e SAML (
> Secure Assertion Mark Up Language )
> 
> http://en.wikipedia.org/wiki/Security_Assertion_Markup_Language
> 
> 
> I would like to analyze and prove the class of chomsky hierarchy of
> languages it falls under , like whether it is regular, deterministic
> context-free , etc ?
> 
> 
> How do I start about this exercise , are there any references of similar
> exercises ? any directions ?
> 
> 
> 
> Regards
> 
> Sashank
> 
> 
> 
> 
> _______________________________________________
> langsec-discuss mailing list
> langsec-discuss at lists.langsec.org
> https://lists.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
> 



More information about the langsec-discuss mailing list