[langsec-discuss] Langsec and Java Object Serialization

Sven Moritz Hallberg pesco at khjk.org
Tue Nov 10 15:49:34 UTC 2015

Will Sargent <will.sargent at gmail.com> writes:
> Basically, the only way I know of to securely check the goodness of java
> serialization is to check the class name.  I have no real idea if that can
> be faked or not (I wouldn't be surprised), but anything involving any kind
> of internal query into the structure of a message seems inherently
> doomed.

I, too, read the Foxglove blog post you reference. [1]

I find it relatively troubling/telling that the term "parsing" isn't
used anywhere and that the message seems to be "don't deserialize
untrusted data".

Deserialization is parsing. In the author's own words:

> Most programming languages provide built-in ways for users to output
> application data to disk or stream it over the network. The process of
> converting application data to another format (usually binary) suitable
> for transportation is called serialization. The process of reading data
> back in after it has been serialized is called unserialization.

Clearly, given some input, parsing is exactly what one should be doing
with it, "untrusted" or not. The issue is that Java deserialization maps
onto the set of objects that are of *any* serializable class.
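
To make this concrete, here is a minimal sketch. It uses only standard java.io, and the class names MyObject and Gadget are hypothetical. The caller expects a MyObject, but the stream carries some other serializable class. ObjectInputStream instantiates whatever class the stream names and runs that class's private readObject hook. The cast that encodes the caller's expectation runs only afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CastAfterParse {
    // Set once a Gadget's deserialization hook has executed.
    static boolean gadgetHookRan = false;

    // The class the application actually expects (hypothetical).
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    // Some other serializable class that happens to be on the classpath (hypothetical).
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        // Called by ObjectInputStream while it is still parsing the stream.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            gadgetHookRan = true;
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Mirrors the blog's pattern: parse anything, then cast.
    // Returns true if the cast rejected the input.
    static boolean castRejects(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Object parsed = ois.readObject(); // any serializable class is accepted here
            try {
                MyObject objectFromDisk = (MyObject) parsed; // the only "validation"
                return false;
            } catch (ClassCastException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        boolean rejected = castRejects(serialize(new Gadget()));
        System.out.println("cast rejected the input: " + rejected);
        System.out.println("Gadget hook had already run: " + gadgetHookRan);
    }
}
```

Running main prints true on both lines. The input is rejected, but only after code chosen by the stream has already run.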

The example code then goes:

    MyObject objectFromDisk = (MyObject)ois.readObject();

As evidenced by the cast, the intended input language is the set of
serializations of objects of class MyObject. Unfortunately, this is
validated only after parsing. As you note, this is a spectacular
counter-example to the Langsec war cry of "full recognition before
processing".

What the above should really read is:

    MyObject objectFromDisk = ois.readObject(MyObject.class);

ObjectInputStream has no such method, so this isn't valid Java as it
stands, but the point is that the parser should be restricted to the
expected class.
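
One way to approximate that with the existing API is sketched below. It assumes the common "look-ahead" trick of overriding ObjectInputStream.resolveClass. RestrictedObjectInputStream, TypedRead.readObject, MyObject and Gadget are hypothetical names, not a JDK API. resolveClass is consulted for each class descriptor before any instance of that class is created, so an unexpected class is refused during parsing rather than after it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

class TypedRead {
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical stream that refuses to resolve any class except the expected one.
    static class RestrictedObjectInputStream extends ObjectInputStream {
        private final Class<?> expected;

        RestrictedObjectInputStream(InputStream in, Class<?> expected) throws IOException {
            super(in);
            this.expected = expected;
        }

        // Runs before an instance of the described class is created.
        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!desc.getName().equals(expected.getName())) {
                throw new InvalidClassException(desc.getName(), "not the expected class");
            }
            return super.resolveClass(desc);
        }
    }

    // Approximates the proposed ois.readObject(MyObject.class).
    static <T> T readObject(InputStream in, Class<T> type)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new RestrictedObjectInputStream(in, type)) {
            return type.cast(ois.readObject());
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        MyObject ok = readObject(new ByteArrayInputStream(serialize(new MyObject())), MyObject.class);
        System.out.println("value = " + ok.value);
        try {
            readObject(new ByteArrayInputStream(serialize(new Gadget())), MyObject.class);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This sketch accepts exactly one class. If MyObject had fields of other serializable types, those classes would also have to be admitted, which turns it back into a (small) whitelist.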

Obviously, whitelisting is still much better than nothing.
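
For comparison, the JDK later gained built-in deserialization filters in Java 9 (JEP 290, java.io.ObjectInputFilter). The following minimal sketch assumes Java 9 or newer and uses hypothetical class names. It installs a pattern-based whitelist on a single stream. As with the resolveClass approach, a rejected class causes readObject to throw InvalidClassException before an instance is created.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FilterDemo {
    static class MyObject implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Returns true if the whitelist let the stream through, false if it was rejected.
    static boolean accepted(byte[] data) throws IOException, ClassNotFoundException {
        // Allow exactly FilterDemo$MyObject; "!*" rejects every other class.
        ObjectInputFilter whitelist =
                ObjectInputFilter.Config.createFilter("FilterDemo$MyObject;!*");
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            ois.setObjectInputFilter(whitelist); // must be set before the first read
            ois.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepted(serialize(new MyObject())));
        System.out.println(accepted(serialize(new Gadget())));
    }
}
```

main should print true and then false.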

I find two things interesting in this issue:

1. It props up a suspicion I've long held of data formats that purport
   to be "self-describing": If you know what you are parsing you don't
   need the data to describe itself. If on the other hand you don't know
   what you're parsing and invent some way for the data (i.e. the
   adversary) to tell you, you are building nothing other than a
   language backdoor.

2. It shows us that

       Object readObject()

   is objectively bad API design. An obvious alternative would be

       <T> T readObject(Class<T> type)

   but I suspect many a Java designer would wave it off as naive and
   impractical for one reason or another. My intuition, in contrast, is
   that a "naive" formulation not being practical often indicates deeper
   design flaws.
