[langsec-discuss] WYSINWYG - a (mostly) langsec vulnerability category

travis+ml-langsec at subspacefield.org
Tue Dec 2 18:01:09 UTC 2014

So what do these attacks have in common?

NIDS evasion (Ptacek's paper)
IDN homograph attacks
A/V evasion (e.g. Veil-evasion, executable packers)
Double encoding (and other encoding attacks)
Computationally-inequivalent endpoints

What I've been mulling over, and finally teased out, is that they are
all related to a general pattern, namely:

What You See Is Not What You Get

Where the recognizer/enforcement point does not see the same semantic
content as the processor.
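To make that concrete, here is a minimal sketch (mine, not from the thread) of the double-encoding case from the list above: the enforcement point decodes a URL once, but the processor downstream decodes it twice.

```python
from urllib.parse import unquote

def naive_filter(path: str) -> bool:
    # Enforcement point: decode once, then look for traversal.
    return "../" not in unquote(path)

def backend(path: str) -> str:
    # Processor: the request gets decoded a second time downstream
    # (e.g. a front-end proxy plus an application server).
    return unquote(unquote(path))

payload = "%252e%252e%252fetc%252fpasswd"   # "../etc/passwd", encoded twice
assert naive_filter(payload)                 # filter sees "%2e%2e%2f..." and passes it
assert backend(payload) == "../etc/passwd"   # processor acts on the traversal
```

Neither component is "wrong" in isolation; they simply disagree about what the bytes mean.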

Broken down into a hierarchy, these weaknesses (or attacks, which are
sometimes easier to name, and which tease out slight variations in the
underlying problem) include:

1 TOCTOU vulns
1.1 Bait-and-switch (the pre-computer scam)
1.2 Unix symlink race conditions
1.3 XML validation attack - external entity processing
2 Evasion Techniques
2.1 Turing Complete Languages & Undecidable Problems
2.1.1 A/V evasion & packers
2.1.2 shellcode encoders
2.1.3 Steganography (in the general sense of infinite contextual interpretations of a message)
2.2 Insufficient Validation, Divergent Implementations & Do What I Mean clients
2.2.1 NIDS evasion (Ptacek 1998)
2.2.2 Encoding Attacks: URL encoding for path traversal (late 1990s?), Double-Encoding (date?)
2.2.3 IDN homograph attacks (glyph parsing vulnerability, 2002)
2.2.4 Injection filter evasion/bypass: XSS filter bypasses (Samy worm 2006, Reddit comment bomb 2009, IE8 Anti-XSS filter bypass 2010), SQLi filter evasion (e.g. Unicode half-quote - date?)
2.2.6 SSL Null Byte Flaw (2005)
2.2.7 MIME Content Sniffing attacks (2009)
2.2.8 Frankencerts (2014)

A super-great example is:

2.2.9 XML Digital Signature weaknesses (2012,2013)

And an old-as-dirt example is the use of "terms of art" in legal
documents, where the term means one thing to a court and another to a
naive person.
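The IDN homograph entry (2.2.3) is the same "looks alike, means differently" pattern in code: two distinct code points share a glyph. A quick sketch of my own:

```python
import unicodedata

latin = "apple.com"
mixed = "\u0430pple.com"   # leading letter is Cyrillic, not Latin

# The two strings render identically in many fonts, yet they are
# different strings and therefore name different domains.
assert latin != mixed
assert unicodedata.name(latin[0]) == "LATIN SMALL LETTER A"
assert unicodedata.name(mixed[0]) == "CYRILLIC SMALL LETTER A"
```

The human reader (the recognizer) sees one name; the resolver (the processor) sees another.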

The weakness categories are generally less obvious as the numbers
increase, hence the generally-increasing dates.  First we dealt with
TOCTOU vulns, and then the anti-virus never-ending battle, then we got
into the game of browsers trying to correct web design flaws, and now
we're in the situation where the standards are ambiguous and there are
multiple, incompatible parsers for (for example) rendering web
content.  An interesting point about this is that the more popular the
data format, the more likely there are to be multiple implementations,
and therefore multiple interpretations.
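Ptacek-style NIDS evasion (2.2.1) is exactly this divergence: the monitor and the endpoint reassemble the same overlapping TCP segments under different policies. A deliberately simplified toy reassembler (my sketch, not from the paper):

```python
def reassemble(segments, policy):
    # Toy byte-offset reassembler; segments are (offset, bytes) pairs.
    # policy picks the winner when segments overlap: "first" or "last".
    buf = {}
    for off, data in segments:
        for i, byte in enumerate(data):
            pos = off + i
            if policy == "last" or pos not in buf:
                buf[pos] = byte
    return bytes(buf[pos] for pos in sorted(buf))

# Overlapping segments: monitor and endpoint resolve the overlap differently.
segs = [(0, b"GET /inde"), (9, b"Z.html"), (9, b"x.html")]
assert reassemble(segs, "first") == b"GET /indeZ.html"  # a first-wins monitor's view
assert reassemble(segs, "last") == b"GET /index.html"   # a last-wins endpoint's view
```

If the monitor's policy does not match the endpoint's, the attacker gets to pick which of the two "messages" each side sees.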

For example, on the matter of divergent implementations, consider
C parsing:


Musings: what is the relation between WYSINWYG and shotgun parsing?
How about injection attacks?  Is there an overarching category?

I should add to this:

The diversity of implementations is a good thing in security when the
systems are arranged such that each is independent:

|    |
A    B
goal goal

Or defeating the system involves defeating all of them:

|
A
|
B
|
goal
And quite another when failure of one system leads to failure of all:

|  |
A  B
 \ /
goal

This is described a bit, in primitive form, in my book:


Or in this case, one is a "policy enforcement point" and the other
is a target:

A
|
B (goal)

And there are a few countermeasures.

One involves converting to canonical form at the border:
Split a packed field and I am there; parse a line of text and you will find me.
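In code, that border canonicalization might look like this (a sketch, assuming percent-encoded input):

```python
import unicodedata
from urllib.parse import unquote

def canonicalize(path: str) -> str:
    # Decode percent-encoding to a fixed point, then Unicode-normalize,
    # so every later component sees the same canonical string.
    prev = None
    while prev != path:
        prev, path = path, unquote(path)
    return unicodedata.normalize("NFKC", path)

def border_filter(path: str) -> bool:
    return "../" not in canonicalize(path)

assert not border_filter("%252e%252e%252fetc%252fpasswd")  # double-encoded traversal caught
assert border_filter("%2fimages%2flogo.png")               # benign path still allowed
```

Decoding to a fixed point is a blunt instrument (it mangles legitimate literal percent signs), so a stricter border would instead reject any input that is not already in canonical form.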

