[langsec-discuss] two parsing questions

Sven M. Hallberg pesco at khjk.org
Tue May 22 13:05:26 UTC 2018


Hi List,

I've been revisiting the work I did with the DNP3 parser, aiming to
produce a proper context-free grammar for the application layer
messages. Two questions have popped up and I would appreciate pointers
to any previous work in these areas.

1. Parsing binary (i.e. bitwise) structures below the byte boundary,
   e.g. flags and 4-bit numbers. Has there been any work on adapting,
   say, LR parsing to efficiently work on a (theoretical) alphabet of
   {0,1,$}? I'm assuming that simply using the standard algorithms with
   that alphabet would ruin performance. I've spent some thought on
   optimizations or ways of side-stepping this issue but my literature
   searches have come up empty so far.

2. Classes of finite languages. A lot of practical languages and
   structures are actually finite (DNP3 messages have a maximum size,
   for instance), but enumerating them for the purpose of parsing or
   even pure recognition is impractical/inefficient/undesirable.
   Are there any formal categorizations here, maybe along the lines of
   "admits a grammar with the number of rules linear/logarithmic/... in
   the maximum word length/size of language/..."?

Regards,
Sven


More information about the langsec-discuss mailing list