[langsec-discuss] Supporting length fields in grammar formalisms (was: extending JSON to be more friendly towards carrying arbitrary data)

David-Sarah Hopwood david-sarah at jacaranda.org
Thu Jun 14 01:51:48 UTC 2012

On 13/06/12 14:30, Sven Moritz Hallberg wrote:
> Hello list!


> I'd like to hear your thoughts and comments on a little project I
> undertook recently. I saw Meredith and Sergey's CCC talk on "The
> Science of Insecurity" and also met them at Berlinsides a few weeks
> ago. What I wanted to do was try to come up with a data serialization
> format that could transport large binary chunks without recoding every
> byte, yet still avoid length fields. Both of them seemed hesitant to
> give my idea (which is nothing special) their immediate "seal of langsec
> approval", but did encourage me to report any results to this list.
> So here goes. I completed the grammar and brought my demo parser to a
> point where I can show it. For extra fun, I had decided to use the
> opportunity to try out the hammer parser library which Meredith and TQ
> presented at Berlinsides. It went pretty well and I'm eager to see where
> they take that project.
> I put some more info into a blogpost:
> http://www.khjk.org/log/2012/jun/datalang.html
> Here is the main point:
>> The idea is to break binary data into chunks of uniform size. I chose
>> 4096 bytes rather arbitrarily. Allow one final chunk of variable
>> length and encode that one in Base64. So every 4kB, there is one
>> character (#) which means "another 4k coming". There need not be any
>> such "raw chunks"; they are always followed by exactly one (possibly
>> empty) Base64 string enclosed in %. Examples:
>>     #.....#.....%ZGFmdXFp%
>>     #.....#.....%%
>>     %ZGFmdXFp%
> I'm eager to know whether you think this is a good idea or not.
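For readers who want to experiment with the quoted scheme, here is a minimal
Python sketch of it. It is not Sven's parser (that one uses the hammer library);
the function names are made up for illustration, and it assumes standard padded
Base64 for the tail, which the quoted post does not spell out.

```python
import base64

CHUNK = 4096  # raw chunk size chosen in the quoted post


def encode(data: bytes) -> bytes:
    """Emit '#' + 4096 raw bytes per full chunk, then the rest as %base64%."""
    full = len(data) - len(data) % CHUNK
    out = bytearray()
    for i in range(0, full, CHUNK):
        out += b"#" + data[i:i + CHUNK]
    # Exactly one (possibly empty) Base64 tail always terminates the value.
    out += b"%" + base64.b64encode(data[full:]) + b"%"
    return bytes(out)


def decode(blob: bytes) -> bytes:
    """Inverse of encode(); raises ValueError on malformed input."""
    out = bytearray()
    pos = 0
    while blob[pos:pos + 1] == b"#":
        chunk = blob[pos + 1:pos + 1 + CHUNK]
        if len(chunk) != CHUNK:
            raise ValueError("truncated raw chunk")
        out += chunk
        pos += 1 + CHUNK
    if blob[pos:pos + 1] != b"%" or len(blob) < pos + 2 or blob[-1:] != b"%":
        raise ValueError("expected %base64% tail")
    # validate=True rejects any byte outside the Base64 alphabet, including '%'.
    out += base64.b64decode(blob[pos + 1:-1], validate=True)
    return bytes(out)
```

Note that the decoder never reads a length from the input: after a '#' it always
consumes exactly 4096 bytes, and the tail is delimited by '%', which cannot occur
inside Base64 text.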

That's an interesting way to avoid length fields. But I'm not so sure that the mild
context-sensitivity introduced by length fields is a problem, *provided* it is
supported by the parser framework. (I agree without qualification that hand-rolling
protocol-specific parsers that handle length fields is a bad idea.)

For instance, we can imagine an extension to ABNF that supports fixed-size binary
length fields, something like this (diff to the ABNF grammar in

- repetition  =  [repeat] element
+ repetition  =  [repeat] [lengthfieldspec] element

+ lengthfieldspec  =  "@" lengthfieldsize ["+" lengthoffset]
+ lengthfieldsize  =  1*DIGIT
+ lengthoffset     =  1*DIGIT

(I think that's added in a reasonable place, but haven't thought very carefully
about precedence. The examples below use explicit parens, which are allowed for
element, to make the precedence clear.)

This would mean that the element is preceded by a big-endian binary length field
whose size in bytes is given by lengthfieldsize. The value of the length field is
the total number of bytes in the representation of the element, plus lengthoffset
if one is given. This is probably easier to see with some examples.
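As a rough illustration of what a parser framework would do for @N+K(element),
here is a small Python sketch. The helper name and its signature are invented
for this example, and it assumes the reading that the field value equals the
element's length plus the offset.

```python
def read_length_prefixed(buf: bytes, pos: int, field_size: int, offset: int = 0):
    """Read '@field_size+offset(element)' starting at buf[pos].

    Returns (element_bytes, next_pos). The caller would then run the parser
    for the element on exactly element_bytes.
    """
    body_start = pos + field_size
    if body_start > len(buf):
        raise ValueError("truncated length field")
    # The length field is big-endian and field_size bytes wide.
    value = int.from_bytes(buf[pos:body_start], "big")
    # Assumption: field value = element length + offset.
    body_len = value - offset
    if body_len < 0:
        raise ValueError("length field smaller than offset")
    body_end = body_start + body_len
    if body_end > len(buf):
        raise ValueError("element extends past end of input")
    return buf[body_start:body_end], body_end
```

For instance, with a 1-byte field and offset 2, the input bytes 04 05 B4 yield
the two-byte element 05 B4.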

(Note: I am not putting forward TCPv4 or SSL/TLS as good instances of how to design
protocols for ease of parsing -- just as reasonably well-known examples to illustrate
the above grammar extension.)

Example 1 -- TCPv4 options as defined in http://tools.ietf.org/html/rfc793#section-3.1,
plus the "Packet Mood" option from http://tools.ietf.org/html/rfc5841 [*]:

For simplicity we take the End-Of-Option option's Kind field as being part of the
padding:

  OptionsAndPadding  =  *Option *Padding
  Option             =  NoOpOption / KnownOption / UnknownOption
  Padding            =  %x00
  NoOpOption         =  %x01
  KnownOption        =  MSSOption / MoodOption

  MSSOption          =  %x02 @1+2(MaxSegmentSize)
  MaxSegmentSize     =  2BYTE

  MoodOption         =  %x19 @1+2(Emoticon)
  Emoticon           =  *(%x00-7F)

  UnknownOption      =  (%x03-18 / %x1A-FF) @1+2(*BYTE)

Note that the length field of MSSOption is constrained to be 4, so it could
equivalently have been defined as %x02 %x04 2BYTE. But that wouldn't work for
variable-length options like MoodOption, and in any case the definition above
is clearer (you don't have to manually encode the length in the grammar).
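To make the intended reading of the grammar concrete, here is a hand-written
Python sketch of a recognizer for it. This is exactly the kind of code a
framework should generate rather than something to hand-roll in production;
the function name and the result tuples are made up for illustration.

```python
def parse_options(buf: bytes):
    """Recognize OptionsAndPadding = *Option *Padding; return parsed options."""
    options = []
    pos = 0
    # *Option: %x00 can only start Padding, so stop at the first zero byte.
    while pos < len(buf) and buf[pos] != 0x00:
        kind = buf[pos]
        if kind == 0x01:                      # NoOpOption has no length field
            options.append(("nop", None))
            pos += 1
            continue
        # Every other option is Kind followed by @1+2(...).
        if pos + 1 >= len(buf):
            raise ValueError("missing length field")
        body_len = buf[pos + 1] - 2           # the field counts Kind and itself
        if body_len < 0:
            raise ValueError("length field smaller than offset")
        body = buf[pos + 2:pos + 2 + body_len]
        if len(body) != body_len:
            raise ValueError("option extends past end of input")
        if kind == 0x02:                      # MSSOption: exactly two bytes
            if body_len != 2:
                raise ValueError("bad MSS option length")
            options.append(("mss", int.from_bytes(body, "big")))
        elif kind == 0x19:                    # MoodOption: *(%x00-7F)
            if any(b > 0x7F for b in body):
                raise ValueError("non-ASCII byte in Emoticon")
            options.append(("mood", body.decode("ascii")))
        else:                                 # UnknownOption: skipped via length
            options.append(("unknown", (kind, body)))
        pos += 2 + body_len
    # *Padding: everything that remains must be %x00.
    if any(b != 0x00 for b in buf[pos:]):
        raise ValueError("non-zero byte in padding")
    return options
```

The unknown-option branch is where the length field earns its keep: without it
there would be no way to skip an option whose contents we do not understand.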

This notation is not expressive enough to state the requirement that OptionsAndPadding
must be a multiple of 4 bytes, but that *is* expressible if you also extend ABNF
with an And-predicate (&) operator, as in Parsing Expression Grammars:

  OptionsAndPadding  =  &(*(4BYTE)) *Option *Padding

Example 2 -- TLS handshake messages as defined in
https://tools.ietf.org/html/rfc4346#appendix-A.4 :

  HelloRequestHandshake        =  %d0  @3(HelloRequest)
  ClientHelloHandshake         =  %d1  @3(ClientHello)
  ServerHelloHandshake         =  %d2  @3(ServerHello)
  CertificateHandshake         =  %d11 @3(Certificate)
  ServerKeyExchangeHandshake   =  %d12 @3(ServerKeyExchange)
  CertificateRequestHandshake  =  %d13 @3(CertificateRequest)
  ServerHelloDoneHandshake     =  %d14 @3(ServerHelloDone)
  CertificateVerifyHandshake   =  %d15 @3(CertificateVerify)
  ClientKeyExchangeHandshake   =  %d16 @3(ClientKeyExchange)
  FinishedHandshake            =  %d20 @3(Finished)

  Handshake  =  HelloRequestHandshake / ClientHelloHandshake / ServerHelloHandshake /
                CertificateHandshake / ServerKeyExchangeHandshake /
                CertificateRequestHandshake / ServerHelloDoneHandshake /
                CertificateVerifyHandshake / ClientKeyExchangeHandshake /
                FinishedHandshake

  ClientHello  =  ProtocolVersion


[I haven't included an UnknownHandshake alternative in Handshake because the specified
behaviour on receiving an unknown handshake message is to abort, with the same fatal
alert (unexpected_message) as for an unparseable message. This is unlike the TCP example
where we needed to ignore unknown options.]
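A short Python sketch of the framing this grammar describes, with the "abort on
unknown type" behaviour made explicit. The names are hypothetical; the type
codes are the ones used in the rules above, plus 20 for Finished as given in
the HandshakeType enum of RFC 4346. The message bodies are returned unparsed.

```python
HANDSHAKE_TYPES = {
    0: "HelloRequest", 1: "ClientHello", 2: "ServerHello",
    11: "Certificate", 12: "ServerKeyExchange", 13: "CertificateRequest",
    14: "ServerHelloDone", 15: "CertificateVerify", 16: "ClientKeyExchange",
    20: "Finished",
}


class UnexpectedMessage(Exception):
    """Stands in for the fatal unexpected_message alert."""


def parse_handshake(buf: bytes, pos: int = 0):
    """Read one '%dT @3(Body)' message; return (name, body_bytes, next_pos)."""
    if pos + 4 > len(buf):
        raise UnexpectedMessage("truncated handshake header")
    name = HANDSHAKE_TYPES.get(buf[pos])
    if name is None:
        # No UnknownHandshake alternative: an unknown type is a parse failure.
        raise UnexpectedMessage(f"unknown handshake type {buf[pos]}")
    # @3 with no offset: a 3-byte big-endian count of the body bytes.
    length = int.from_bytes(buf[pos + 1:pos + 4], "big")
    end = pos + 4 + length
    if end > len(buf):
        raise UnexpectedMessage("handshake body extends past end of input")
    return name, buf[pos + 4:end], end
```

Unlike the TCP sketch, there is no branch that skips over a message it does not
recognize, because unknown and unparseable messages are handled identically.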

[*] which should get some kind of award for the funniest reference to DSM-IV :-)

David-Sarah Hopwood ⚥
