[langsec-discuss] Supporting length fields in grammar formalisms (was: extending JSON to be more friendly towards carrying arbitrary data)

Sven Moritz Hallberg pesco at khjk.org
Thu Jun 14 15:17:19 UTC 2012


On Thu, 14 Jun 2012 02:51:48 +0100, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:
> >> The idea is to break binary data into chunks of uniform size. I chose
> >> 4096 bytes rather arbitrarily. Allow one final chunk of variable
> >> length and encode that one in Base64. So every 4kB, there is one
> >> character (#) which means "another 4k coming". There need not be any
> >> such "raw chunks"; they are always followed by exactly one (possibly
> >> empty) Base64 string enclosed in %. Examples:
> >>
> >>     #.....#.....%ZGFmdXFp%
> >>     #.....#.....%%
> >>     %ZGFmdXFp%
> > 
> > I'm eager to know whether you think this is a good idea or not.
> 
> That's an interesting way to avoid length fields. But I'm not so sure that the mild
> context-sensitivity introduced by length fields is a problem, *provided* it is
> supported by the parser framework. [...]
> 
> - repetition  =  [repeat] element
> + repetition  =  [repeat] [lengthfieldspec] element
> 
> + lengthfieldspec  =  "@" lengthfieldsize ["+" lengthoffset]
> + lengthfieldsize  =  1*DIGIT
> + lengthoffset     =  1*DIGIT
> 
> [...]
> 
> This would mean that the element is preceded by a big-endian binary length field
> of size (in bytes) given by lengthfieldsize. The length field specifies the total
> number of bytes in the representation of the element, optionally with an added
> offset. This is probably easier to see with some examples.

Assuming we're OK with a mildly context-sensitive grammar in principle,
this does make a number of assumptions, e.g. that length fields always
directly precede the repeated sequence, consist of raw big-endian bytes,
etc. The details seem to become complicated, so I'd consider the uniform
chunk approach more elegant for a from-scratch design.
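For concreteness, the chunk scheme above can be implemented in a few
lines; this is a sketch in Python, where the 4096-byte chunk size and
the function names are my own choices, not part of any spec:

```python
import base64

CHUNK = 4096  # arbitrary uniform chunk size, per the original proposal

def encode(data: bytes) -> bytes:
    """Encode bytes as '#'-prefixed raw 4 KiB chunks followed by one
    (possibly empty) Base64 tail enclosed in '%' characters."""
    out = bytearray()
    while len(data) >= CHUNK:
        out += b"#" + data[:CHUNK]
        data = data[CHUNK:]
    out += b"%" + base64.b64encode(data) + b"%"
    return bytes(out)

def decode(buf: bytes) -> tuple[bytes, bytes]:
    """Decode one encoded value; return (payload, remaining input)."""
    out = bytearray()
    pos = 0
    while buf[pos:pos + 1] == b"#":  # '#' means "another 4k coming"
        out += buf[pos + 1:pos + 1 + CHUNK]
        pos += 1 + CHUNK
    if buf[pos:pos + 1] != b"%":
        raise ValueError("expected Base64 tail delimiter '%'")
    end = buf.index(b"%", pos + 1)
    out += base64.b64decode(buf[pos + 1:end])
    return bytes(out), buf[end + 1:]
```

Note that the decoder needs no length arithmetic at all: the single
lookahead character '#' vs. '%' decides whether another fixed-size
chunk follows, which is exactly what keeps the format context-free.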

It might be worth it for dealing with the length fields in existing
protocols, though.
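To illustrate what the proposed "@lengthfieldsize[+lengthoffset]"
notation would mean at parse time, here is a sketch in Python; the
function name is mine, and I'm reading the quoted text as saying the
stored field holds the element's byte length plus the offset:

```python
def parse_length_prefixed(buf: bytes, fieldsize: int,
                          offset: int = 0) -> tuple[bytes, bytes]:
    """Parse an element preceded by a big-endian binary length field
    of `fieldsize` bytes. The field value is the element's size in
    bytes plus `offset`. Returns (element bytes, remaining input)."""
    if len(buf) < fieldsize:
        raise ValueError("truncated length field")
    field, rest = buf[:fieldsize], buf[fieldsize:]
    length = int.from_bytes(field, "big") - offset
    if length < 0 or length > len(rest):
        raise ValueError("length field out of range")
    return rest[:length], rest[length:]
```

The range check is where the context sensitivity shows up: the parser
must carry the decoded length as state while consuming the element,
which a plain context-free recognizer cannot express.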


Thanks!
pesco

