[langsec-discuss] LZO, subtle bugs, theorem provers

Sergey Bratus sergey at cs.dartmouth.edu
Mon Jul 7 21:54:05 UTC 2014

    It would be great to have a verification track at next year's LangSec
workshop (or conference, as we hope to grow to one, attendance-wise).

    For this to happen, posing a set of verification problems that align 
well with the LangSec approach would help a lot.

    For example, in LangSec we observe that reducing the complexity of an 
input language generally makes verification of its processing code, the 
recognizer, an easier task (and is, therefore, the preferred way of 
constructing less error-prone, more verifiable input-handlng modules). 
Exploring the effects of input language simplification on verification
in practice, on existing verifiers would be great.



On Mon, 7 Jul 2014, Thomas Dullien wrote:

> My 2c, but my view at the moment is that none of our static analyzers /
> theorem provers can
> be usefully applied to legacy code.
> The successes of formal verification are usually when the checkers & the
> code are co-developed;
> e.g. the code is checked on each build while it is written.
> I haven't had time to look at the LZ bug in detail yet, realistically I may
> get around to it in August;
> once I get around I'll write down how hard I think it would be to detect
> this with an automated checker.
> Cheers,
> Halvar
> On Mon, Jul 7, 2014 at 2:04 PM, Meredith L. Patterson <clonearmy at gmail.com>
> wrote:
>> No promises, but the samples look like ones that Escher (
>> http://www.eschertech.com/index.php) could find. I'm evaluating Escher
>> for verifying parts of Hammer (since there are sound principles for what
>> "correct" means in regard to the parsing algorithms it implements), and can
>> tell you more after I've had some more experience with it.
>> Cheers,
>> --mlp
>> On Mon, Jul 7, 2014 at 1:56 PM, Dan Kaminsky <dan at doxpara.com> wrote:
>>> Do we have a theorem prover / static analyzer that finds this bug?
>>> We sure have a lot that miss it.
>>> On Monday, July 7, 2014, <dan at geer.org> wrote:
>>>> This is a verbatim transcript of what appeared on Peter Neumann's
>>>> RISKS Digest; note Henry Baker's leadoff comment.
>>>> -----------------8<------------cut-here------------8<-----------------
>>>> Date: Fri, 27 Jun 2014 06:11:41 -0700
>>>> From: Henry Baker <hbaker1 at pipeline.com>
>>>> Subject: Buffer overflows in 20-year-old LZ decompression code
>>>>   (Don A. Bailey)
>>>> FYI -- It's time to start using theorem provers on *all* code; if you
>>>> can't
>>>> convince the theorem prover re buffer overflows, you'll have to insert
>>>> executable code to explicitly check.  HB
>>>> Thursday, June 26, 2014
>>>> Raising Lazarus - The 20 Year Old Bug that Went to Mars
>>>> http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html
>>>> It's rare that you come across a bug so subtle that it can last for two
>>>> decades.  But, that's exactly what has happened with the
>>>> Lempel-Ziv-Oberhumer (LZO) algorithm.  Initially written in 1994, Markus
>>>> Oberhumer designed a sophisticated and extremely efficient compression
>>>> algorithm so elegant and well architected that it outperforms zlib and
>>>> bzip
>>>> by four or five times their decompression speed.
>>>> As a result, Markus has made a successful and well deserved career out of
>>>> optimizing code for various platforms.  I was impressed to find out that
>>>> his
>>>> LZO algorithm has gone to the planet Mars on NASA devices multiple times!
>>>> Most recently, LZO has touched down on the red planet within the Mars
>>>> Curiosity Rover, which just celebrated its first martian anniversary on
>>>> Tuesday.
>>>> Because of the speed and efficiency of the algorithm, LZO has made its
>>>> way
>>>> into both proprietary and open source projects world-wide.  It's has
>>>> lived
>>>> in automotive systems, airplanes, and other embedded systems for over a
>>>> decade.  The algorithm has even made its way into projects we use on a
>>>> daily
>>>> basis, such as OpenVPN, MPlayer2, Libav, FFmpeg, the Linux kernel,
>>>> Juniper
>>>> Junos, and much, much, more.
>>>> In the past few years, LZO has gained traction in file systems as well.
>>>>  LZO
>>>> can be used in the Linux kernel within btrfs, squashfs, jffs2, and
>>>> ubifs.  A
>>>> recent variant of the algorithm, LZ4, is used for compression in ZFS for
>>>> Solaris, Illumos, and FreeBSD.
>>>> LZO is even enabled in kernels for Samsung Android devices to increase
>>>> kernel loading speed and improve the user experience, as noted in the
>>>> Android Hacker's Handbook.
>>>> With its popularity increasing, Lempel-Ziv-Oberhumer has been rewritten
>>>> by
>>>> many engineering firms for both closed and open systems.  These rewrites,
>>>> however, have always been based on Oberhumer's core open source
>>>> implementation.  As a result, they all inherited a subtle integer
>>>> overflow.
>>>> Even LZ4 has the same exact bug, but changed very slightly.
>>>> Engineered Genetics
>>>> Code reuse is a normal part of engineering, and is something we do every
>>>> day.  But, it can be dangerous.  By reusing code that is known to work
>>>> well,
>>>> especially in highly optimized algorithms, projects can become subject to
>>>> vulnerabilities in what is perceived as trusted code.  Auditing highly
>>>> optimized algorithms is a fragile endeavor.  It is very easy to break
>>>> these
>>>> types of algorithms.  Therefore, reused code that is highly specialized
>>>> is
>>>> often presumed safe because of its age, its proven efficiency, and its
>>>> fragility.
>>>> This creates a sort of digital DNA, a digital genetic footprint that can
>>>> be
>>>> traced over time.  Though there are certainly many instances of
>>>> proprietary
>>>> variants of LZO and LZ4, the following six implementations are available
>>>> in
>>>> open source software
>>>>     Oberhumer LZO (core/reference open source implementation)
>>>>     Linux kernel's LZO implementation
>>>>     Libav's LZO implementation
>>>>     FFmpeg's LZO implementation
>>>>     Linux kernel's LZ4 implementation
>>>>     LZ4 core/reference implementation
>>>> Despite each implementation of the algorithm being noticeably different,
>>>> each variant is vulnerable in the exact same way.  Let's take a look at a
>>>> version of the algorithm that is easy to read online, the Linux kernel
>>>> implementation found here.
>>>> In all variants of LZ[O4], the vulnerability occurs when processing a
>>>> Literal Run.  This is a chunk of compressed data that isn't compressed at
>>>> all.  Literals are uncompressed bytes that the user decided, for whatever
>>>> reason, should not be compressed.  A Literal Run is signaled by a state
>>>> machine in LZO, and by a Mask in LZ4.
>>>>  56                         if (likely(state == 0)) {
>>>>  57                                 if (unlikely(t == 0)) {
>>>>  58                                         while (unlikely(*ip == 0)) {
>>>>  59                                                 t += 255;
>>>>  60                                                 ip++;
>>>>  61                                                 NEED_IP(1);
>>>>  62                                         }
>>>>  63                                         t += 15 + *ip++;
>>>>  64                                 }
>>>>  65                                 t += 3;
>>>> In the above sample, the integer overflow is evident.  The variable 't'
>>>> is
>>>> incremented by 255 every time the compression payload contains a nil byte
>>>> (0x00) when a Literal Run is detected.  Regardless of whether the
>>>> variable
>>>> 't' is signed or unsigned, 255 will be added to it.  The only check is to
>>>> ensure that the input buffer contains another byte.  This means that 't'
>>>> can
>>>> accumulate until it is a very large unsigned integer.  If 't' is a 32bit
>>>> integer, it only takes approximately sixteen (16) megabytes of zeroes to
>>>> generate a sufficiently large value for 't'.  Though 't' can overflow
>>>> here,
>>>> this is not where the attack occurs.  There is another more important
>>>> overflow just below this chunk of code.
>>>> 66 copy_literal_run:
>>>> 68                                 if (likely(HAVE_IP(t + 15) &&
>>>> HAVE_OP(t + 15))) {
>>>> 69                                         const unsigned char *ie = ip
>>>> + t;
>>>> 70                                         unsigned char *oe = op + t;
>>>> 71                                         do {
>>>> 72                                                 COPY8(op, ip);
>>>> 73                                                 op += 8;
>>>> 74                                                 ip += 8;
>>>> 75                                                 COPY8(op, ip);
>>>> 76                                                 op += 8;
>>>> 77                                                 ip += 8;
>>>> 78                                         } while (ip < ie);
>>>> 79                                         ip = ie;
>>>> 80                                         op = oe;
>>>> 81                                 } else
>>>> 82 #endif
>>>> Above, we see the "copy_literal_run" chunk of code.  This is the section
>>>> of
>>>> the LZO algorithm that uses the variable 't' as a size parameter.  On
>>>> line
>>>> 68, the code ensures that the input buffer (IP) and output buffer (OP)
>>>> are
>>>> large enough to contain 't' bytes.  However, in the Linux kernel
>>>> implementation, they pad by 15 bytes to ensure the 16 byte copy does not
>>>> overflow either buffer.  This is where things fail.
>>>> The macros HAVE_IP and HAVE_OP validate that 't' bytes are available in
>>>> the
>>>> respective buffer.  But, before the macro is called, the expression (t +
>>>> 15)
>>>> is evaluated.  If the value of 't' is large enough, this expression will
>>>> cause an integer overflow.  The attacker can make this expression result
>>>> in
>>>> a value of zero (0) through fourteen (14) by forcing 't' to equal the
>>>> values
>>>> -15 to -1, respectively.  This means that the HAVE macros will always
>>>> believe that enough space is available in both input and output buffers.
>>>> On line 70, the pointer 'oe' will now point to before the 'op' buffer,
>>>> potentially pointing to memory prior to the start of the output buffer.
>>>>  The
>>>> subsequent code will copy sixteen (16) bytes from the input pointer to
>>>> the
>>>> output pointer, which does nothing as these pointers should point to a
>>>> "safe" location in memory.  However, there are two side effects here that
>>>> the attacker must abuse: lines 78 and 80.
>>>> Because 'ie' will always have an address lower in memory than 'ip', the
>>>> loop
>>>> is immediately broken after the first sixteen (16) byte copy.  This means
>>>> that the value 't' did not cause a crash in the copy loop, making this
>>>> copy
>>>> essentially a no-op from the attacker's point of view.  Most
>>>> importantly, on
>>>> line 80 (and 79), the buffer pointer is set to the overflown pointer.
>>>>  This
>>>> means that now, the output pointer points to memory outside of the
>>>> bounds of
>>>> the output buffer.  The attacker now has the capability to corrupt
>>>> memory,
>>>> or at least cause a Denial of Service (DoS) by writing to an invalid
>>>> memory
>>>> page.
>>>> The Impact of Raising Dead Code
>>>> Each variant of the LZO and LZ4 implementation is vulnerable in slightly
>>>> different ways.  The attacker must construct a malicious payload to fit
>>>> each
>>>> particular implementation.  One payload cannot be used to trigger more
>>>> than
>>>> a DoS on each implementation.  Because of the slightly different overflow
>>>> requirements, state machine subtleties, and overflow checks that must be
>>>> bypassed, even a worldwide DoS is not a simple task.
>>>> This results in completely different threats depending on the
>>>> implementation
>>>> of the algorithm, the underlying architecture, and the memory layout of
>>>> the
>>>> target application.  Remote Code Execution (RCE) is possible on multiple
>>>> architectures and platforms, but absolutely not all.  Denial of Service
>>>> is
>>>> possible on most implementations, but not all.  Adjacent Object
>>>> Over-Write
>>>> (OOW) is possible on many architectures.
>>>> Lazarus raised from the dead
>>>> Because the LZO algorithm is considered a library function, each specific
>>>> implementation must be evaluated for risk, regardless of whether the
>>>> algorithm used has been patched.  Why?  We are talking about code that
>>>> has
>>>> existed in the wild for two decades.  The scope of this algorithm touches
>>>> everything from embedded microcontrollers on the Mars Rover, mainframe
>>>> operating systems, modern day desktops, and mobile phones.  Engineers
>>>> that
>>>> have used LZO must evaluate the use case to identify whether or not the
>>>> implementation is vulnerable, and in what format.
>>>> Here is a list of impact based on each library. Implementations, or use
>>>> cases of each library may change the threat model enough to warrant
>>>> reclassification.  So, please have a variant audited by a skilled third
>>>> party, such as <shameless plug>.
>>>>     Oberhumer LZO
>>>>         RCE: Impractical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: 64bit platforms are impractical for all attacks
>>>>     Linux kernel LZO
>>>>         RCE: Impractical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: Only i386/PowerPC are impacted at this time
>>>>     Libav LZO
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>     FFmpeg LZO
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>     Linux kernel LZ4
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: 64bit architectures are NOT considered practical
>>>>     LZ4
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: 64bit architectures are NOT considered practical
>>>> For a bug report on each implementation, please visit the Lab Mouse
>>>> Security's vulnerability site.
>>>> How Do You Know If You're Vulnerable
>>>> Projects Using LZO/LZ4
>>>> The easiest way to identify whether your specific implementation is
>>>> vulnerable is to determine the maximum chunk size that is passed to the
>>>> decompress routine.  If buffers of sixteen (16) megabytes or more can be
>>>> passed to the LZO or LZ4 decompress routine in one call, then
>>>> exploitation
>>>> of the integer overflow is possible.  For example, ZFS constrains buffer
>>>> sizes to 128k.  So, even though they use a vulnerable implementation of
>>>> LZ4,
>>>> an attack is not possible without a second bug to bypass the buffer size
>>>> constraint.
>>>> The second easiest way is to identify the bit size of the count variable.
>>>> If the count variable (for example, named 't' in the Linux kernel code
>>>> shown
>>>> above) is 64bit, it would take such a massive amount of data to trigger
>>>> the
>>>> overflow that the attack would likely be infeasible, regardless of how
>>>> much
>>>> data can be passed to the vulnerable function in one call.  This is due
>>>> to
>>>> the fact that even modern computers do not have enough RAM available to
>>>> store the data required to implement such an attack.
>>>> However, there is a specific issue with the previous check.  Validate
>>>> that
>>>> even if the count variable is 64bit in size, the value used is still
>>>> 64bit
>>>> when a length value is checked.  If the actual length value is truncated
>>>> to
>>>> 32bits, the attack will still work with only sixteen (16) megabytes of
>>>> data.
>>>> Users
>>>> All users of FFmpeg, Libav, and projects that depend on them, should
>>>> consider themselves at risk to remote code execution.  Period.  Please
>>>> update your software from the FFmpeg and Libav websites, or refrain from
>>>> using these applications until your distribution has an adequate patch.
>>>> It should be noted that certain Linux distributions package Mplayer2 with
>>>> the base system by default.  MPlayer2 is vulnerable to RCE "out of the
>>>> box".
>>>> If your distribution packages MPlayer2 by default, be sure to disable the
>>>> embedded media player plugin (gecko-mediaplayer) for your browser.
>>>> Firefox/Iceweasel, Chromium, Opera, Konqueror, and other Linux-based
>>>> browsers are vulnerable to RCE regardless of the platform/architecture
>>>> when
>>>> an MPlayer2 plugin is enabled.
>>>> Vendor Status
>>>> Lab Mouse has reached out to and worked with each vendor of the
>>>> vulnerable
>>>> algorithm.  As of today, June 26th, 2014, all LZO vendors have patches
>>>> either available online, or will later today.  Please update as soon as
>>>> possible to minimize the existing threat surface.
>>>> In the near future, Lab Mouse will publish a more technical blog on why
>>>> and
>>>> how RCE is possible using this bug.  We consider that information to be
>>>> imperative for both auditors and engineers, as it assists in identifying,
>>>> classifying, and prioritizing a threat.  However, that report will be
>>>> released once the patches have been widely distributed for a sufficient
>>>> amount of time.
>>>> For more information, please visit our contact page.  We are more than
>>>> happy
>>>> to help your team with their use case, or implementation of these
>>>> algorithms.
>>>> Summary
>>>> Overall, this is how this bug release breaks down.
>>>>   Vendors have patches ready or released
>>>>   Distributions have been notified
>>>>   Vendors of proprietary variants have been notified (where they could
>>>> be found)
>>>>   All bug reports can be found here
>>>>   RCE is not only possible but practical on all Libav/FFmpeg based
>>>> projects
>>>>   All others are likely impractical to RCE, but still possible given a
>>>>     sufficiently skilled attacker
>>>> It is always exciting to uncover a vulnerability as subtle as this issue,
>>>> especially one that has persisted and propagated for two decades.  But,
>>>> it
>>>> makes me pause and consider the way we look at engineering as a model.
>>>> Speed and efficiency are imperatives for modern projects.  We're building
>>>> technology that touches our lives like never before.  I know that most
>>>> engineers strive to build not only elegant, but safe code.  But, we still
>>>> see security as a disparate discipline from engineering.  Security and
>>>> engineering could not be more tightly bound.  Without engineering, you
>>>> can't
>>>> provide security to users.  Without security, engineering cannot provide
>>>> a
>>>> stable and provable platform.
>>>> Neil deGrasse Tyson famously claimed, God is in the gaps.  There is a
>>>> similar issue in engineering.  The individual often sees stability where
>>>> the
>>>> individual doesn't have expertise.  Our God is the algorithm.  We "bless"
>>>> certain pieces of code because we don't have the time or knowledge to
>>>> evaluate it.  When we, as engineers and analysts, take that perspective,
>>>> we
>>>> are doing a disservice to the people that use our projects and services.
>>>> Often the best eyes are fresh or untrained eyes.  The more we stop
>>>> telling
>>>> ourselves to step over the gaps in our code bases, the more holes we'll
>>>> be
>>>> able to fill.  All it takes is one set of eyes to find a vulnerability,
>>>> there is no level of expertise required to look and ask questions.  Just
>>>> look.  Maybe you'll find the next 20 year old vulnerability.
>>>> Thanks
>>>> I'd like to thank the following people for their great assistance
>>>> patching,
>>>> coordinating, and advising on this issue:
>>>>     Greg Kroah-Hartman (Linux)
>>>>     Linus Torvalds (Linux)
>>>>     Kees Cook (Google)
>>>>     Xin LI (FreeBSD)
>>>>     Michael Niedermayer (FFmpeg)
>>>>     Luca Barbato (Libav/Gentoo)
>>>>     Markus Oberhumer
>>>>     Christopher J. Dorros (NASA MSL)
>>>>     Dan McDonald (Omniti)
>>>>     Yves-Alexis Perez (Debian)
>>>>     Kurt Seifried (Red Hat)
>>>>     Willy Tarreau (Linux)
>>>>     Solar Designer (Openwall)
>>>>     The US-CERT team
>>>>     The Oracle security team
>>>>     The GE security team
>>>>     Kelly Jackson Higgins (UBM)
>>>>     Steve Ragan (IDG Enterprise)
>>>>     Elinor Mills
>>>> Feeling Guilty?
>>>> Are you reading this post, thinking about all the administrators and
>>>> engineers that are going to have to patch the LZO/LZ4 issue in your
>>>> team's
>>>> systems?  Take some time to tell them how you feel with our hand crafted
>>>> Lab
>>>> Mouse Security custom Sympathy Card!
>>>> Hand crafted with the finest bits and bytes, our Sympathy Card shows your
>>>> engineer what they mean to you and your team.  This is a limited run of
>>>> cards, and will proudly display the Linux kernel LZO exploit written by
>>>> Lab
>>>> Mouse on the card.
>>>> Best wishes,
>>>> Don A. Bailey, Founder / CEO, @InfoSecMouse, Lab Mouse Security, 26 Jun
>>>> 2014
>>>> _______________________________________________
>>>> langsec-discuss mailing list
>>>> langsec-discuss at mail.langsec.org
>>>> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
>>> _______________________________________________
>>> langsec-discuss mailing list
>>> langsec-discuss at mail.langsec.org
>>> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
>> _______________________________________________
>> langsec-discuss mailing list
>> langsec-discuss at mail.langsec.org
>> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss

More information about the langsec-discuss mailing list