[langsec-discuss] LZO, subtle bugs, theorem provers

Dan Kaminsky dan at doxpara.com
Tue Jul 8 03:16:16 UTC 2014

It seems likely then that verification only works on the subset of
expressiveness that can be cowritten in the free and the constrained
language.  i.e. we aren't verifying c at all.

On Mon, Jul 7, 2014 at 2:17 PM, Thomas Dullien <thomasdullien at google.com>

> My 2c, but my view at the moment is that none of our static analyzers /
> theorem provers can
> be usefully applied to legacy code.
> The successes of formal verification are usually when the checkers & the
> code are co-developed;
> e.g. the code is checked on each build while it is written.
> I haven't had time to look at the LZ bug in detail yet, realistically I
> may get around to it in August;
> once I get around I'll write down how hard I think it would be to detect
> this with an automated checker.
> Cheers,
> Halvar
> On Mon, Jul 7, 2014 at 2:04 PM, Meredith L. Patterson <clonearmy at gmail.com
> > wrote:
>> No promises, but the samples look like ones that Escher (
>> http://www.eschertech.com/index.php) could find. I'm evaluating Escher
>> for verifying parts of Hammer (since there are sound principles for what
>> "correct" means in regard to the parsing algorithms it implements), and can
>> tell you more after I've had some more experience with it.
>> Cheers,
>> --mlp
>> On Mon, Jul 7, 2014 at 1:56 PM, Dan Kaminsky <dan at doxpara.com> wrote:
>>> Do we have a theorem prover / static analyzer that finds this bug?
>>> We sure have a lot that miss it.
>>> On Monday, July 7, 2014, <dan at geer.org> wrote:
>>>> This is a verbatim transcript of what appeared on Peter Neumann's
>>>> RISKS Digest; note Henry Baker's leadoff comment.
>>>> -----------------8<------------cut-here------------8<-----------------
>>>> Date: Fri, 27 Jun 2014 06:11:41 -0700
>>>> From: Henry Baker <hbaker1 at pipeline.com>
>>>> Subject: Buffer overflows in 20-year-old LZ decompression code
>>>>   (Don A. Bailey)
>>>> FYI -- It's time to start using theorem provers on *all* code; if you
>>>> can't
>>>> convince the theorem prover re buffer overflows, you'll have to insert
>>>> executable code to explicitly check.  HB
>>>> Thursday, June 26, 2014
>>>> Raising Lazarus - The 20 Year Old Bug that Went to Mars
>>>> http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html
>>>> It's rare that you come across a bug so subtle that it can last for two
>>>> decades.  But, that's exactly what has happened with the
>>>> Lempel-Ziv-Oberhumer (LZO) algorithm.  Initially written in 1994, Markus
>>>> Oberhumer designed a sophisticated and extremely efficient compression
>>>> algorithm so elegant and well architected that it outperforms zlib and
>>>> bzip
>>>> by four or five times their decompression speed.
>>>> As a result, Markus has made a successful and well deserved career out
>>>> of
>>>> optimizing code for various platforms.  I was impressed to find out
>>>> that his
>>>> LZO algorithm has gone to the planet Mars on NASA devices multiple
>>>> times!
>>>> Most recently, LZO has touched down on the red planet within the Mars
>>>> Curiosity Rover, which just celebrated its first martian anniversary on
>>>> Tuesday.
>>>> Because of the speed and efficiency of the algorithm, LZO has made its
>>>> way
>>>> into both proprietary and open source projects world-wide.  It's has
>>>> lived
>>>> in automotive systems, airplanes, and other embedded systems for over a
>>>> decade.  The algorithm has even made its way into projects we use on a
>>>> daily
>>>> basis, such as OpenVPN, MPlayer2, Libav, FFmpeg, the Linux kernel,
>>>> Juniper
>>>> Junos, and much, much, more.
>>>> In the past few years, LZO has gained traction in file systems as well.
>>>>  LZO
>>>> can be used in the Linux kernel within btrfs, squashfs, jffs2, and
>>>> ubifs.  A
>>>> recent variant of the algorithm, LZ4, is used for compression in ZFS for
>>>> Solaris, Illumos, and FreeBSD.
>>>> LZO is even enabled in kernels for Samsung Android devices to increase
>>>> kernel loading speed and improve the user experience, as noted in the
>>>> Android Hacker's Handbook.
>>>> With its popularity increasing, Lempel-Ziv-Oberhumer has been rewritten
>>>> by
>>>> many engineering firms for both closed and open systems.  These
>>>> rewrites,
>>>> however, have always been based on Oberhumer's core open source
>>>> implementation.  As a result, they all inherited a subtle integer
>>>> overflow.
>>>> Even LZ4 has the same exact bug, but changed very slightly.
>>>> Engineered Genetics
>>>> Code reuse is a normal part of engineering, and is something we do every
>>>> day.  But, it can be dangerous.  By reusing code that is known to work
>>>> well,
>>>> especially in highly optimized algorithms, projects can become subject
>>>> to
>>>> vulnerabilities in what is perceived as trusted code.  Auditing highly
>>>> optimized algorithms is a fragile endeavor.  It is very easy to break
>>>> these
>>>> types of algorithms.  Therefore, reused code that is highly specialized
>>>> is
>>>> often presumed safe because of its age, its proven efficiency, and its
>>>> fragility.
>>>> This creates a sort of digital DNA, a digital genetic footprint that
>>>> can be
>>>> traced over time.  Though there are certainly many instances of
>>>> proprietary
>>>> variants of LZO and LZ4, the following six implementations are
>>>> available in
>>>> open source software
>>>>     Oberhumer LZO (core/reference open source implementation)
>>>>     Linux kernel's LZO implementation
>>>>     Libav's LZO implementation
>>>>     FFmpeg's LZO implementation
>>>>     Linux kernel's LZ4 implementation
>>>>     LZ4 core/reference implementation
>>>> Despite each implementation of the algorithm being noticeably different,
>>>> each variant is vulnerable in the exact same way.  Let's take a look at
>>>> a
>>>> version of the algorithm that is easy to read online, the Linux kernel
>>>> implementation found here.
>>>> In all variants of LZ[O4], the vulnerability occurs when processing a
>>>> Literal Run.  This is a chunk of compressed data that isn't compressed
>>>> at
>>>> all.  Literals are uncompressed bytes that the user decided, for
>>>> whatever
>>>> reason, should not be compressed.  A Literal Run is signaled by a state
>>>> machine in LZO, and by a Mask in LZ4.
>>>>  56                         if (likely(state == 0)) {
>>>>  57                                 if (unlikely(t == 0)) {
>>>>  58                                         while (unlikely(*ip == 0)) {
>>>>  59                                                 t += 255;
>>>>  60                                                 ip++;
>>>>  61                                                 NEED_IP(1);
>>>>  62                                         }
>>>>  63                                         t += 15 + *ip++;
>>>>  64                                 }
>>>>  65                                 t += 3;
>>>> In the above sample, the integer overflow is evident.  The variable 't'
>>>> is
>>>> incremented by 255 every time the compression payload contains a nil
>>>> byte
>>>> (0x00) when a Literal Run is detected.  Regardless of whether the
>>>> variable
>>>> 't' is signed or unsigned, 255 will be added to it.  The only check is
>>>> to
>>>> ensure that the input buffer contains another byte.  This means that
>>>> 't' can
>>>> accumulate until it is a very large unsigned integer.  If 't' is a 32bit
>>>> integer, it only takes approximately sixteen (16) megabytes of zeroes to
>>>> generate a sufficiently large value for 't'.  Though 't' can overflow
>>>> here,
>>>> this is not where the attack occurs.  There is another more important
>>>> overflow just below this chunk of code.
>>>> 66 copy_literal_run:
>>>> 68                                 if (likely(HAVE_IP(t + 15) &&
>>>> HAVE_OP(t + 15))) {
>>>> 69                                         const unsigned char *ie = ip
>>>> + t;
>>>> 70                                         unsigned char *oe = op + t;
>>>> 71                                         do {
>>>> 72                                                 COPY8(op, ip);
>>>> 73                                                 op += 8;
>>>> 74                                                 ip += 8;
>>>> 75                                                 COPY8(op, ip);
>>>> 76                                                 op += 8;
>>>> 77                                                 ip += 8;
>>>> 78                                         } while (ip < ie);
>>>> 79                                         ip = ie;
>>>> 80                                         op = oe;
>>>> 81                                 } else
>>>> 82 #endif
>>>> Above, we see the "copy_literal_run" chunk of code.  This is the
>>>> section of
>>>> the LZO algorithm that uses the variable 't' as a size parameter.  On
>>>> line
>>>> 68, the code ensures that the input buffer (IP) and output buffer (OP)
>>>> are
>>>> large enough to contain 't' bytes.  However, in the Linux kernel
>>>> implementation, they pad by 15 bytes to ensure the 16 byte copy does not
>>>> overflow either buffer.  This is where things fail.
>>>> The macros HAVE_IP and HAVE_OP validate that 't' bytes are available in
>>>> the
>>>> respective buffer.  But, before the macro is called, the expression (t
>>>> + 15)
>>>> is evaluated.  If the value of 't' is large enough, this expression will
>>>> cause an integer overflow.  The attacker can make this expression
>>>> result in
>>>> a value of zero (0) through fourteen (14) by forcing 't' to equal the
>>>> values
>>>> -15 to -1, respectively.  This means that the HAVE macros will always
>>>> believe that enough space is available in both input and output buffers.
>>>> On line 70, the pointer 'oe' will now point to before the 'op' buffer,
>>>> potentially pointing to memory prior to the start of the output buffer.
>>>>  The
>>>> subsequent code will copy sixteen (16) bytes from the input pointer to
>>>> the
>>>> output pointer, which does nothing as these pointers should point to a
>>>> "safe" location in memory.  However, there are two side effects here
>>>> that
>>>> the attacker must abuse: lines 78 and 80.
>>>> Because 'ie' will always have an address lower in memory than 'ip', the
>>>> loop
>>>> is immediately broken after the first sixteen (16) byte copy.  This
>>>> means
>>>> that the value 't' did not cause a crash in the copy loop, making this
>>>> copy
>>>> essentially a no-op from the attacker's point of view.  Most
>>>> importantly, on
>>>> line 80 (and 79), the buffer pointer is set to the overflown pointer.
>>>>  This
>>>> means that now, the output pointer points to memory outside of the
>>>> bounds of
>>>> the output buffer.  The attacker now has the capability to corrupt
>>>> memory,
>>>> or at least cause a Denial of Service (DoS) by writing to an invalid
>>>> memory
>>>> page.
>>>> The Impact of Raising Dead Code
>>>> Each variant of the LZO and LZ4 implementation is vulnerable in slightly
>>>> different ways.  The attacker must construct a malicious payload to fit
>>>> each
>>>> particular implementation.  One payload cannot be used to trigger more
>>>> than
>>>> a DoS on each implementation.  Because of the slightly different
>>>> overflow
>>>> requirements, state machine subtleties, and overflow checks that must be
>>>> bypassed, even a worldwide DoS is not a simple task.
>>>> This results in completely different threats depending on the
>>>> implementation
>>>> of the algorithm, the underlying architecture, and the memory layout of
>>>> the
>>>> target application.  Remote Code Execution (RCE) is possible on multiple
>>>> architectures and platforms, but absolutely not all.  Denial of Service
>>>> is
>>>> possible on most implementations, but not all.  Adjacent Object
>>>> Over-Write
>>>> (OOW) is possible on many architectures.
>>>> Lazarus raised from the dead
>>>> Because the LZO algorithm is considered a library function, each
>>>> specific
>>>> implementation must be evaluated for risk, regardless of whether the
>>>> algorithm used has been patched.  Why?  We are talking about code that
>>>> has
>>>> existed in the wild for two decades.  The scope of this algorithm
>>>> touches
>>>> everything from embedded microcontrollers on the Mars Rover, mainframe
>>>> operating systems, modern day desktops, and mobile phones.  Engineers
>>>> that
>>>> have used LZO must evaluate the use case to identify whether or not the
>>>> implementation is vulnerable, and in what format.
>>>> Here is a list of impact based on each library. Implementations, or use
>>>> cases of each library may change the threat model enough to warrant
>>>> reclassification.  So, please have a variant audited by a skilled third
>>>> party, such as <shameless plug>.
>>>>     Oberhumer LZO
>>>>         RCE: Impractical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: 64bit platforms are impractical for all attacks
>>>>     Linux kernel LZO
>>>>         RCE: Impractical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: Only i386/PowerPC are impacted at this time
>>>>     Libav LZO
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>     FFmpeg LZO
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>     Linux kernel LZ4
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: 64bit architectures are NOT considered practical
>>>>     LZ4
>>>>         RCE: Practical
>>>>         DoS: Practical
>>>>         OOW: Practical
>>>>         NOTE: 64bit architectures are NOT considered practical
>>>> For a bug report on each implementation, please visit the Lab Mouse
>>>> Security's vulnerability site.
>>>> How Do You Know If You're Vulnerable
>>>> Projects Using LZO/LZ4
>>>> The easiest way to identify whether your specific implementation is
>>>> vulnerable is to determine the maximum chunk size that is passed to the
>>>> decompress routine.  If buffers of sixteen (16) megabytes or more can be
>>>> passed to the LZO or LZ4 decompress routine in one call, then
>>>> exploitation
>>>> of the integer overflow is possible.  For example, ZFS constrains buffer
>>>> sizes to 128k.  So, even though they use a vulnerable implementation of
>>>> LZ4,
>>>> an attack is not possible without a second bug to bypass the buffer size
>>>> constraint.
>>>> The second easiest way is to identify the bit size of the count
>>>> variable.
>>>> If the count variable (for example, named 't' in the Linux kernel code
>>>> shown
>>>> above) is 64bit, it would take such a massive amount of data to trigger
>>>> the
>>>> overflow that the attack would likely be infeasible, regardless of how
>>>> much
>>>> data can be passed to the vulnerable function in one call.  This is due
>>>> to
>>>> the fact that even modern computers do not have enough RAM available to
>>>> store the data required to implement such an attack.
>>>> However, there is a specific issue with the previous check.  Validate
>>>> that
>>>> even if the count variable is 64bit in size, the value used is still
>>>> 64bit
>>>> when a length value is checked.  If the actual length value is
>>>> truncated to
>>>> 32bits, the attack will still work with only sixteen (16) megabytes of
>>>> data.
>>>> Users
>>>> All users of FFmpeg, Libav, and projects that depend on them, should
>>>> consider themselves at risk to remote code execution.  Period.  Please
>>>> update your software from the FFmpeg and Libav websites, or refrain from
>>>> using these applications until your distribution has an adequate patch.
>>>> It should be noted that certain Linux distributions package Mplayer2
>>>> with
>>>> the base system by default.  MPlayer2 is vulnerable to RCE "out of the
>>>> box".
>>>> If your distribution packages MPlayer2 by default, be sure to disable
>>>> the
>>>> embedded media player plugin (gecko-mediaplayer) for your browser.
>>>> Firefox/Iceweasel, Chromium, Opera, Konqueror, and other Linux-based
>>>> browsers are vulnerable to RCE regardless of the platform/architecture
>>>> when
>>>> an MPlayer2 plugin is enabled.
>>>> Vendor Status
>>>> Lab Mouse has reached out to and worked with each vendor of the
>>>> vulnerable
>>>> algorithm.  As of today, June 26th, 2014, all LZO vendors have patches
>>>> either available online, or will later today.  Please update as soon as
>>>> possible to minimize the existing threat surface.
>>>> In the near future, Lab Mouse will publish a more technical blog on why
>>>> and
>>>> how RCE is possible using this bug.  We consider that information to be
>>>> imperative for both auditors and engineers, as it assists in
>>>> identifying,
>>>> classifying, and prioritizing a threat.  However, that report will be
>>>> released once the patches have been widely distributed for a sufficient
>>>> amount of time.
>>>> For more information, please visit our contact page.  We are more than
>>>> happy
>>>> to help your team with their use case, or implementation of these
>>>> algorithms.
>>>> Summary
>>>> Overall, this is how this bug release breaks down.
>>>>   Vendors have patches ready or released
>>>>   Distributions have been notified
>>>>   Vendors of proprietary variants have been notified (where they could
>>>> be found)
>>>>   All bug reports can be found here
>>>>   RCE is not only possible but practical on all Libav/FFmpeg based
>>>> projects
>>>>   All others are likely impractical to RCE, but still possible given a
>>>>     sufficiently skilled attacker
>>>> It is always exciting to uncover a vulnerability as subtle as this
>>>> issue,
>>>> especially one that has persisted and propagated for two decades.  But,
>>>> it
>>>> makes me pause and consider the way we look at engineering as a model.
>>>> Speed and efficiency are imperatives for modern projects.  We're
>>>> building
>>>> technology that touches our lives like never before.  I know that most
>>>> engineers strive to build not only elegant, but safe code.  But, we
>>>> still
>>>> see security as a disparate discipline from engineering.  Security and
>>>> engineering could not be more tightly bound.  Without engineering, you
>>>> can't
>>>> provide security to users.  Without security, engineering cannot
>>>> provide a
>>>> stable and provable platform.
>>>> Neil deGrasse Tyson famously claimed, God is in the gaps.  There is a
>>>> similar issue in engineering.  The individual often sees stability
>>>> where the
>>>> individual doesn't have expertise.  Our God is the algorithm.  We
>>>> "bless"
>>>> certain pieces of code because we don't have the time or knowledge to
>>>> evaluate it.  When we, as engineers and analysts, take that
>>>> perspective, we
>>>> are doing a disservice to the people that use our projects and services.
>>>> Often the best eyes are fresh or untrained eyes.  The more we stop
>>>> telling
>>>> ourselves to step over the gaps in our code bases, the more holes we'll
>>>> be
>>>> able to fill.  All it takes is one set of eyes to find a vulnerability,
>>>> there is no level of expertise required to look and ask questions.  Just
>>>> look.  Maybe you'll find the next 20 year old vulnerability.
>>>> Thanks
>>>> I'd like to thank the following people for their great assistance
>>>> patching,
>>>> coordinating, and advising on this issue:
>>>>     Greg Kroah-Hartman (Linux)
>>>>     Linus Torvalds (Linux)
>>>>     Kees Cook (Google)
>>>>     Xin LI (FreeBSD)
>>>>     Michael Niedermayer (FFmpeg)
>>>>     Luca Barbato (Libav/Gentoo)
>>>>     Markus Oberhumer
>>>>     Christopher J. Dorros (NASA MSL)
>>>>     Dan McDonald (Omniti)
>>>>     Yves-Alexis Perez (Debian)
>>>>     Kurt Seifried (Red Hat)
>>>>     Willy Tarreau (Linux)
>>>>     Solar Designer (Openwall)
>>>>     The US-CERT team
>>>>     The Oracle security team
>>>>     The GE security team
>>>>     Kelly Jackson Higgins (UBM)
>>>>     Steve Ragan (IDG Enterprise)
>>>>     Elinor Mills
>>>> Feeling Guilty?
>>>> Are you reading this post, thinking about all the administrators and
>>>> engineers that are going to have to patch the LZO/LZ4 issue in your
>>>> team's
>>>> systems?  Take some time to tell them how you feel with our hand
>>>> crafted Lab
>>>> Mouse Security custom Sympathy Card!
>>>> Hand crafted with the finest bits and bytes, our Sympathy Card shows
>>>> your
>>>> engineer what they mean to you and your team.  This is a limited run of
>>>> cards, and will proudly display the Linux kernel LZO exploit written by
>>>> Lab
>>>> Mouse on the card.
>>>> Best wishes,
>>>> Don A. Bailey, Founder / CEO, @InfoSecMouse, Lab Mouse Security, 26 Jun
>>>> 2014
>>>> _______________________________________________
>>>> langsec-discuss mailing list
>>>> langsec-discuss at mail.langsec.org
>>>> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
>>> _______________________________________________
>>> langsec-discuss mailing list
>>> langsec-discuss at mail.langsec.org
>>> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
>> _______________________________________________
>> langsec-discuss mailing list
>> langsec-discuss at mail.langsec.org
>> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.langsec.org/pipermail/langsec-discuss/attachments/20140707/6d2442a0/attachment-0001.html>

More information about the langsec-discuss mailing list