[langsec-discuss] TJSON: Tagged JSON with Rich Types

Tony Arcieri bascule at gmail.com
Fri Nov 4 23:15:37 UTC 2016


On Fri, Nov 4, 2016 at 5:38 AM, Sven M. Hallberg <pesco at khjk.org> wrote:

> For one thing, it's harder to write a parser now that reuses an existing
> JSON framework. Before you could do this:
>
> 1) Full recognition by an automaton autogenerated from a CFG.
>    (Also yay decidable equivalence.)
> 2) Interpretation by existing JSON parser.
> 3) Simple visitor pattern on result to convert tagged strings to their
>    native representations.
>

My understanding is parsers like Hammer can still handle these cases in one
pass (I think?). Would love to know!

Some quick BNF describing <member> and <tagged-string> according to:
https://tjson.org/spec/#rfc.section.2.1

    <member> ::= <tagged-string> <name-separator> <value>
    <tagged-string> ::= '"' *<char> ':' <tag> '"'

Unfortunately I don't have a well-defined grammar for <value>, as my
current definitions are somewhat entangled with the ABNF definition of JSON
in RFC 7159. I should definitely produce a full grammar! But you can
imagine it as being a sort of toplevel symbol.
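
As a rough illustration of the <tagged-string> rule above, here is a
minimal Python sketch that splits a member name into its name and tag.
The function name split_tagged_name is hypothetical, and the sketch
assumes the tag itself never contains a colon, so the split happens at the
last ':' in the string. It does not validate the tag against the spec.

```python
def split_tagged_name(name):
    """Split a tagged member name such as 'example:s' into (name, tag).

    Assumption for this sketch: the tag never contains ':', so the
    last colon in the string separates the name from the tag.
    """
    base, sep, tag = name.rpartition(":")
    if not sep or not tag:
        # No colon at all, or nothing after the final colon.
        raise ValueError(f"member name {name!r} has no tag")
    return base, tag
```

Splitting at the last colon means a name that itself contains colons
(say "a:b:s") still yields an unambiguous tag.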

Parsing and typechecking TJSON in one pass would involve obtaining the
parse tree for the LHS of a particular nonterminal and passing it to
the pushdown automaton parsing the RHS as a sort of parametric argument,
along with the remaining unconsumed tokens.

At each frame of the stack, the pushdown automaton continues its way
towards the terminals, but you unwrap a bit of the parse tree parameter and
pass it along with whatever the pushdown automaton consumes next, so long as
the type signature is for a non-scalar value.

When the pushdown automaton has reached the terminals and has almost
finished extracting a node of the parse tree, then before returning the
parsed node we call a small guard/validation function which takes two nodes
of the parse tree as arguments: one is the type signature for the current
node, and the other is the parsed value.

A tl;dr version:

- For a particular nonterminal, I want to have a "parameterized" pushdown
automaton that uses LHS to assist parsing RHS, by passing the parse result
for LHS to the parser for RHS
- I want to add what are effectively "postconditions" to that pushdown
automaton which use something approaching boolean algebra to ensure the
result is valid
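
The two points above can be sketched in Python as a hand-written
recursive-descent parser (not a generated pushdown automaton, and not
Hammer). Everything here is an assumption made for illustration: the tag
names ("s" for strings, "i" for integers carried as strings, "A<...>" for
arrays, "O" for objects), the function names, and the restriction to
strings without escape sequences. The tag parsed from the LHS is passed as
a parameter to the parser for the RHS, and guard() plays the role of the
postcondition.

```python
import re

_WS = re.compile(r"\s*")
_STR = re.compile(r'"([^"\\]*)"')        # sketch only: strings without escapes
_INT = re.compile(r"-?(0|[1-9][0-9]*)")


def skip_ws(src, pos):
    return _WS.match(src, pos).end()


def expect(src, pos, ch):
    pos = skip_ws(src, pos)
    if src[pos:pos + 1] != ch:
        raise ValueError(f"expected {ch!r} at offset {pos}")
    return pos + 1


def parse_string(src, pos):
    pos = skip_ws(src, pos)
    m = _STR.match(src, pos)
    if m is None:
        raise ValueError(f"expected string at offset {pos}")
    return m.group(1), m.end()


def guard(tag, text):
    """Postcondition: does the parsed scalar satisfy its (assumed) tag?"""
    if tag == "s":
        return True
    if tag == "i":
        return _INT.fullmatch(text) is not None
    return False                          # unknown tags are rejected


def parse_value(src, pos, tag):
    """Parse one value; `tag` is the LHS result handed to the RHS parser."""
    if tag == "O":
        return parse_object(src, pos)
    if tag.startswith("A<") and tag.endswith(">"):
        # Non-scalar signature: unwrap one layer and pass the rest down.
        return parse_array(src, pos, tag[2:-1])
    text, pos = parse_string(src, pos)
    if not guard(tag, text):
        raise ValueError(f"value {text!r} does not satisfy tag {tag!r}")
    return (int(text) if tag == "i" else text), pos


def parse_array(src, pos, elem_tag):
    pos = skip_ws(src, expect(src, pos, "["))
    items = []
    if src[pos:pos + 1] == "]":
        return items, pos + 1
    while True:
        item, pos = parse_value(src, pos, elem_tag)
        items.append(item)
        pos = skip_ws(src, pos)
        if src[pos:pos + 1] != ",":
            return items, expect(src, pos, "]")
        pos += 1


def parse_object(src, pos):
    pos = skip_ws(src, expect(src, pos, "{"))
    obj = {}
    if src[pos:pos + 1] == "}":
        return obj, pos + 1
    while True:
        name, pos = parse_string(src, pos)          # LHS: the tagged string
        base, sep, tag = name.rpartition(":")
        if not sep or not tag:
            raise ValueError(f"member name {name!r} has no tag")
        pos = expect(src, pos, ":")                 # name separator
        value, pos = parse_value(src, pos, tag)     # RHS, parameterized by tag
        obj[base] = value
        pos = skip_ws(src, pos)
        if src[pos:pos + 1] != ",":
            return obj, expect(src, pos, "}")
        pos += 1
```

Calling parse_object(text, 0) returns the decoded object together with the
offset where parsing stopped. A value that fails its guard raises
ValueError before the enclosing node is returned.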

This sounds context-sensitive to me, I guess. But even if it is, all it's
doing is using type information on LHS to enrich the parsing of RHS.
Certainly there's ample precedent for doing that sort of thing in the
innumerable statically typed languages out there? If it's
context-sensitive, it seems like a very boring kind of context sensitivity.
But IANAL (I Am Not A Linguist).

These are exactly the kind of cases I think parser combinator libraries are
made for.

If not, making a second pass to typecheck the parse tree doesn't seem so
bad either.
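
For comparison, a minimal sketch of the two-pass variant in Python: the
standard json module does the parsing, and a recursive walk over the
result does the typechecking and conversion. As before, the tag names
("s", "i", "A<...>", "O") and the function names are assumptions made for
illustration, not definitions taken from the spec.

```python
import json
import re

_INT = re.compile(r"-?(0|[1-9][0-9]*)")


def split_name(name):
    base, sep, tag = name.rpartition(":")
    if not sep or not tag:
        raise ValueError(f"member name {name!r} has no tag")
    return base, tag


def typecheck(tag, node):
    """Second pass: check `node` against `tag` and convert it."""
    if tag == "O":
        if not isinstance(node, dict):
            raise ValueError("expected an object")
        out = {}
        for name, child in node.items():
            base, child_tag = split_name(name)
            out[base] = typecheck(child_tag, child)
        return out
    if tag.startswith("A<") and tag.endswith(">"):
        if not isinstance(node, list):
            raise ValueError("expected an array")
        return [typecheck(tag[2:-1], item) for item in node]
    if not isinstance(node, str):
        raise ValueError(f"expected a string for tag {tag!r}")
    if tag == "s":
        return node
    if tag == "i":
        if _INT.fullmatch(node) is None:
            raise ValueError(f"{node!r} is not an integer")
        return int(node)
    raise ValueError(f"unknown tag {tag!r}")


def loads(text):
    # First pass: plain JSON parse. Second pass: typecheck from the root,
    # which this sketch assumes is always an object.
    return typecheck("O", json.loads(text))
```

One caveat with this route: json.loads keeps only the last value when a
member name is repeated, so duplicate names would have to be caught
separately (for example with an object_pairs_hook).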

There's a completely different approach I'll be using in the Ruby
implementation. It's a bit wacky, but I think it works out.

-- 
Tony Arcieri