[langsec-discuss] proven-correct parsers

Matthew Wilson diakopter at gmail.com
Thu Jun 2 03:07:45 UTC 2016

> e.g., do we really need a CFL in a given context or would a RL suffice?

Well, as I understand it, the only reason to prefer CFLs is because
their (efficient!) parsers are (traditionally!) harder to verify than
those for regular languages...

Here's a very hand-wavey speculation: Since a Turing-complete bytecode
can be encoded into a datastream that is encoded (at a lower level) by
a regular language, what really matters is how the resulting data/AST
is interpreted after being "parsed" - at its highest level.
Therefore, all programs/functions should be verified as formalized
parser-interpreters of formal languages, in order to solve the problem
holistically.  Taken to the extreme, we would then view all machines
(and their sub-machines) as interfaces whose inputs (both human and
programmatic) are formally type-checked and implementations are
formally verified, top to bottom, all the way back to system-wide
security and resource-consumption properties/constraints.  This would,
of course, require a total re-think on how to interact with computers,
since our every input (including shell command-lines and GUI command
sequences) would need to be type-checked to effectively the same
extent as every other bit of program code in the system/network. In
other words, every input language is an embedded DSL of dependent type
theory, with a verified implementation... and the process of learning
computer programming transitions to learning how to express all your
problems in such formalizations.

The artifacts (that I've run across in my secondary/tertiary research)
that approach this goal are those formalized in dependent
intuitionistic type theory, including this one:
http://keldysh.ru/papers/2013/prep2013_73_eng.pdf and
https://github.com/ilya-klyuchnikov/ttlite   ... and for those whose
curiosity that paper tickles, here's another project based on the same
foundational formalization (but building from the opposite direction
of the implementation stack):
and https://gitlab.com/joaopizani/piware

Apologies if my suggestion is too hand-wavey for anyone..  For a far
more informative and useful explanation of this theory of progress
toward holistic formalization, see the upcoming book by Edwin Brady:
Note: The current implementation of Idris is not itself formalized
(and it doesn't emit proofs) so it doesn't yet fit into the same
strain of development as the other projects, but it's a superb
learning environment for principles of formalization in type theory.

On Wed, Jun 1, 2016 at 6:30 PM, Guilherme Salazar <gmesalazar at gmail.com> wrote:
> Hey,
> On Wed, Jun 1, 2016 at 9:57 PM, Matthew Wilson <diakopter at gmail.com> wrote:
>> I hadn't seen this paper/repo referenced on this list or any of the
>> links from langsec.org, so I thought it might be helpful to point out.
>> I realize the answer to my question is generally agreed upon by the
>> list participants...
>> On second thought, it occurs to me that perhaps you didn't see the
>> link I included (since you didn't quote it) to the parsing-parses
>> Github repo of JasonGross?
> I did see it; thanks for sharing : )
> The reason I asked was that, the way I see it, LangSec's core idea is
> not only to build correct parsers, but to reason if the language the
> parser parses is what we really need to deal with -- e.g., do we
> really need a CFL in a given context or would a RL suffice?
> Cheers,
> G. Salazar

Sent by an Internet

More information about the langsec-discuss mailing list