[langsec-discuss] LZO, subtle bugs, theorem provers

Dan Kaminsky dan at doxpara.com
Mon Jul 7 20:56:56 UTC 2014

Do we have a theorem prover / static analyzer that finds this bug?

We sure have a lot that miss it.

On Monday, July 7, 2014, <dan at geer.org> wrote:

> This is a verbatim transcript of what appeared on Peter Neumann's
> RISKS Digest; note Henry Baker's leadoff comment.
> -----------------8<------------cut-here------------8<-----------------
> Date: Fri, 27 Jun 2014 06:11:41 -0700
> From: Henry Baker <hbaker1 at pipeline.com <javascript:;>>
> Subject: Buffer overflows in 20-year-old LZ decompression code
>   (Don A. Bailey)
> FYI -- It's time to start using theorem provers on *all* code; if you can't
> convince the theorem prover re buffer overflows, you'll have to insert
> executable code to explicitly check.  HB
> Thursday, June 26, 2014
> Raising Lazarus - The 20 Year Old Bug that Went to Mars
> http://blog.securitymouse.com/2014/06/raising-lazarus-20-year-old-bug-that.html
> It's rare that you come across a bug so subtle that it can last for two
> decades.  But, that's exactly what has happened with the
> Lempel-Ziv-Oberhumer (LZO) algorithm.  Initially written in 1994, Markus
> Oberhumer designed a sophisticated and extremely efficient compression
> algorithm so elegant and well architected that it outperforms zlib and bzip
> by four or five times their decompression speed.
> As a result, Markus has made a successful and well deserved career out of
> optimizing code for various platforms.  I was impressed to find out that
> his
> LZO algorithm has gone to the planet Mars on NASA devices multiple times!
> Most recently, LZO has touched down on the red planet within the Mars
> Curiosity Rover, which just celebrated its first martian anniversary on
> Tuesday.
> Because of the speed and efficiency of the algorithm, LZO has made its way
> into both proprietary and open source projects world-wide.  It's has lived
> in automotive systems, airplanes, and other embedded systems for over a
> decade.  The algorithm has even made its way into projects we use on a
> daily
> basis, such as OpenVPN, MPlayer2, Libav, FFmpeg, the Linux kernel, Juniper
> Junos, and much, much, more.
> In the past few years, LZO has gained traction in file systems as well.
>  LZO
> can be used in the Linux kernel within btrfs, squashfs, jffs2, and ubifs.
>  A
> recent variant of the algorithm, LZ4, is used for compression in ZFS for
> Solaris, Illumos, and FreeBSD.
> LZO is even enabled in kernels for Samsung Android devices to increase
> kernel loading speed and improve the user experience, as noted in the
> Android Hacker's Handbook.
> With its popularity increasing, Lempel-Ziv-Oberhumer has been rewritten by
> many engineering firms for both closed and open systems.  These rewrites,
> however, have always been based on Oberhumer's core open source
> implementation.  As a result, they all inherited a subtle integer overflow.
> Even LZ4 has the same exact bug, but changed very slightly.
> Engineered Genetics
> Code reuse is a normal part of engineering, and is something we do every
> day.  But, it can be dangerous.  By reusing code that is known to work
> well,
> especially in highly optimized algorithms, projects can become subject to
> vulnerabilities in what is perceived as trusted code.  Auditing highly
> optimized algorithms is a fragile endeavor.  It is very easy to break these
> types of algorithms.  Therefore, reused code that is highly specialized is
> often presumed safe because of its age, its proven efficiency, and its
> fragility.
> This creates a sort of digital DNA, a digital genetic footprint that can be
> traced over time.  Though there are certainly many instances of proprietary
> variants of LZO and LZ4, the following six implementations are available in
> open source software
>     Oberhumer LZO (core/reference open source implementation)
>     Linux kernel's LZO implementation
>     Libav's LZO implementation
>     FFmpeg's LZO implementation
>     Linux kernel's LZ4 implementation
>     LZ4 core/reference implementation
> Despite each implementation of the algorithm being noticeably different,
> each variant is vulnerable in the exact same way.  Let's take a look at a
> version of the algorithm that is easy to read online, the Linux kernel
> implementation found here.
> In all variants of LZ[O4], the vulnerability occurs when processing a
> Literal Run.  This is a chunk of compressed data that isn't compressed at
> all.  Literals are uncompressed bytes that the user decided, for whatever
> reason, should not be compressed.  A Literal Run is signaled by a state
> machine in LZO, and by a Mask in LZ4.
>  56                         if (likely(state == 0)) {
>  57                                 if (unlikely(t == 0)) {
>  58                                         while (unlikely(*ip == 0)) {
>  59                                                 t += 255;
>  60                                                 ip++;
>  61                                                 NEED_IP(1);
>  62                                         }
>  63                                         t += 15 + *ip++;
>  64                                 }
>  65                                 t += 3;
> In the above sample, the integer overflow is evident.  The variable 't' is
> incremented by 255 every time the compression payload contains a nil byte
> (0x00) when a Literal Run is detected.  Regardless of whether the variable
> 't' is signed or unsigned, 255 will be added to it.  The only check is to
> ensure that the input buffer contains another byte.  This means that 't'
> can
> accumulate until it is a very large unsigned integer.  If 't' is a 32bit
> integer, it only takes approximately sixteen (16) megabytes of zeroes to
> generate a sufficiently large value for 't'.  Though 't' can overflow here,
> this is not where the attack occurs.  There is another more important
> overflow just below this chunk of code.
> 66 copy_literal_run:
> 68                                 if (likely(HAVE_IP(t + 15) && HAVE_OP(t
> + 15))) {
> 69                                         const unsigned char *ie = ip +
> t;
> 70                                         unsigned char *oe = op + t;
> 71                                         do {
> 72                                                 COPY8(op, ip);
> 73                                                 op += 8;
> 74                                                 ip += 8;
> 75                                                 COPY8(op, ip);
> 76                                                 op += 8;
> 77                                                 ip += 8;
> 78                                         } while (ip < ie);
> 79                                         ip = ie;
> 80                                         op = oe;
> 81                                 } else
> 82 #endif
> Above, we see the "copy_literal_run" chunk of code.  This is the section of
> the LZO algorithm that uses the variable 't' as a size parameter.  On line
> 68, the code ensures that the input buffer (IP) and output buffer (OP) are
> large enough to contain 't' bytes.  However, in the Linux kernel
> implementation, they pad by 15 bytes to ensure the 16 byte copy does not
> overflow either buffer.  This is where things fail.
> The macros HAVE_IP and HAVE_OP validate that 't' bytes are available in the
> respective buffer.  But, before the macro is called, the expression (t +
> 15)
> is evaluated.  If the value of 't' is large enough, this expression will
> cause an integer overflow.  The attacker can make this expression result in
> a value of zero (0) through fourteen (14) by forcing 't' to equal the
> values
> -15 to -1, respectively.  This means that the HAVE macros will always
> believe that enough space is available in both input and output buffers.
> On line 70, the pointer 'oe' will now point to before the 'op' buffer,
> potentially pointing to memory prior to the start of the output buffer.
>  The
> subsequent code will copy sixteen (16) bytes from the input pointer to the
> output pointer, which does nothing as these pointers should point to a
> "safe" location in memory.  However, there are two side effects here that
> the attacker must abuse: lines 78 and 80.
> Because 'ie' will always have an address lower in memory than 'ip', the
> loop
> is immediately broken after the first sixteen (16) byte copy.  This means
> that the value 't' did not cause a crash in the copy loop, making this copy
> essentially a no-op from the attacker's point of view.  Most importantly,
> on
> line 80 (and 79), the buffer pointer is set to the overflown pointer.  This
> means that now, the output pointer points to memory outside of the bounds
> of
> the output buffer.  The attacker now has the capability to corrupt memory,
> or at least cause a Denial of Service (DoS) by writing to an invalid memory
> page.
> The Impact of Raising Dead Code
> Each variant of the LZO and LZ4 implementation is vulnerable in slightly
> different ways.  The attacker must construct a malicious payload to fit
> each
> particular implementation.  One payload cannot be used to trigger more than
> a DoS on each implementation.  Because of the slightly different overflow
> requirements, state machine subtleties, and overflow checks that must be
> bypassed, even a worldwide DoS is not a simple task.
> This results in completely different threats depending on the
> implementation
> of the algorithm, the underlying architecture, and the memory layout of the
> target application.  Remote Code Execution (RCE) is possible on multiple
> architectures and platforms, but absolutely not all.  Denial of Service is
> possible on most implementations, but not all.  Adjacent Object Over-Write
> (OOW) is possible on many architectures.
> Lazarus raised from the dead
> Because the LZO algorithm is considered a library function, each specific
> implementation must be evaluated for risk, regardless of whether the
> algorithm used has been patched.  Why?  We are talking about code that has
> existed in the wild for two decades.  The scope of this algorithm touches
> everything from embedded microcontrollers on the Mars Rover, mainframe
> operating systems, modern day desktops, and mobile phones.  Engineers that
> have used LZO must evaluate the use case to identify whether or not the
> implementation is vulnerable, and in what format.
> Here is a list of impact based on each library. Implementations, or use
> cases of each library may change the threat model enough to warrant
> reclassification.  So, please have a variant audited by a skilled third
> party, such as <shameless plug>.
>     Oberhumer LZO
>         RCE: Impractical
>         DoS: Practical
>         OOW: Practical
>         NOTE: 64bit platforms are impractical for all attacks
>     Linux kernel LZO
>         RCE: Impractical
>         DoS: Practical
>         OOW: Practical
>         NOTE: Only i386/PowerPC are impacted at this time
>     Libav LZO
>         RCE: Practical
>         DoS: Practical
>         OOW: Practical
>     FFmpeg LZO
>         RCE: Practical
>         DoS: Practical
>         OOW: Practical
>     Linux kernel LZ4
>         RCE: Practical
>         DoS: Practical
>         OOW: Practical
>         NOTE: 64bit architectures are NOT considered practical
>     LZ4
>         RCE: Practical
>         DoS: Practical
>         OOW: Practical
>         NOTE: 64bit architectures are NOT considered practical
> For a bug report on each implementation, please visit the Lab Mouse
> Security's vulnerability site.
> How Do You Know If You're Vulnerable
> Projects Using LZO/LZ4
> The easiest way to identify whether your specific implementation is
> vulnerable is to determine the maximum chunk size that is passed to the
> decompress routine.  If buffers of sixteen (16) megabytes or more can be
> passed to the LZO or LZ4 decompress routine in one call, then exploitation
> of the integer overflow is possible.  For example, ZFS constrains buffer
> sizes to 128k.  So, even though they use a vulnerable implementation of
> LZ4,
> an attack is not possible without a second bug to bypass the buffer size
> constraint.
> The second easiest way is to identify the bit size of the count variable.
> If the count variable (for example, named 't' in the Linux kernel code
> shown
> above) is 64bit, it would take such a massive amount of data to trigger the
> overflow that the attack would likely be infeasible, regardless of how much
> data can be passed to the vulnerable function in one call.  This is due to
> the fact that even modern computers do not have enough RAM available to
> store the data required to implement such an attack.
> However, there is a specific issue with the previous check.  Validate that
> even if the count variable is 64bit in size, the value used is still 64bit
> when a length value is checked.  If the actual length value is truncated to
> 32bits, the attack will still work with only sixteen (16) megabytes of
> data.
> Users
> All users of FFmpeg, Libav, and projects that depend on them, should
> consider themselves at risk to remote code execution.  Period.  Please
> update your software from the FFmpeg and Libav websites, or refrain from
> using these applications until your distribution has an adequate patch.
> It should be noted that certain Linux distributions package Mplayer2 with
> the base system by default.  MPlayer2 is vulnerable to RCE "out of the
> box".
> If your distribution packages MPlayer2 by default, be sure to disable the
> embedded media player plugin (gecko-mediaplayer) for your browser.
> Firefox/Iceweasel, Chromium, Opera, Konqueror, and other Linux-based
> browsers are vulnerable to RCE regardless of the platform/architecture when
> an MPlayer2 plugin is enabled.
> Vendor Status
> Lab Mouse has reached out to and worked with each vendor of the vulnerable
> algorithm.  As of today, June 26th, 2014, all LZO vendors have patches
> either available online, or will later today.  Please update as soon as
> possible to minimize the existing threat surface.
> In the near future, Lab Mouse will publish a more technical blog on why and
> how RCE is possible using this bug.  We consider that information to be
> imperative for both auditors and engineers, as it assists in identifying,
> classifying, and prioritizing a threat.  However, that report will be
> released once the patches have been widely distributed for a sufficient
> amount of time.
> For more information, please visit our contact page.  We are more than
> happy
> to help your team with their use case, or implementation of these
> algorithms.
> Summary
> Overall, this is how this bug release breaks down.
>   Vendors have patches ready or released
>   Distributions have been notified
>   Vendors of proprietary variants have been notified (where they could be
> found)
>   All bug reports can be found here
>   RCE is not only possible but practical on all Libav/FFmpeg based projects
>   All others are likely impractical to RCE, but still possible given a
>     sufficiently skilled attacker
> It is always exciting to uncover a vulnerability as subtle as this issue,
> especially one that has persisted and propagated for two decades.  But, it
> makes me pause and consider the way we look at engineering as a model.
> Speed and efficiency are imperatives for modern projects.  We're building
> technology that touches our lives like never before.  I know that most
> engineers strive to build not only elegant, but safe code.  But, we still
> see security as a disparate discipline from engineering.  Security and
> engineering could not be more tightly bound.  Without engineering, you
> can't
> provide security to users.  Without security, engineering cannot provide a
> stable and provable platform.
> Neil deGrasse Tyson famously claimed, God is in the gaps.  There is a
> similar issue in engineering.  The individual often sees stability where
> the
> individual doesn't have expertise.  Our God is the algorithm.  We "bless"
> certain pieces of code because we don't have the time or knowledge to
> evaluate it.  When we, as engineers and analysts, take that perspective, we
> are doing a disservice to the people that use our projects and services.
> Often the best eyes are fresh or untrained eyes.  The more we stop telling
> ourselves to step over the gaps in our code bases, the more holes we'll be
> able to fill.  All it takes is one set of eyes to find a vulnerability,
> there is no level of expertise required to look and ask questions.  Just
> look.  Maybe you'll find the next 20 year old vulnerability.
> Thanks
> I'd like to thank the following people for their great assistance patching,
> coordinating, and advising on this issue:
>     Greg Kroah-Hartman (Linux)
>     Linus Torvalds (Linux)
>     Kees Cook (Google)
>     Xin LI (FreeBSD)
>     Michael Niedermayer (FFmpeg)
>     Luca Barbato (Libav/Gentoo)
>     Markus Oberhumer
>     Christopher J. Dorros (NASA MSL)
>     Dan McDonald (Omniti)
>     Yves-Alexis Perez (Debian)
>     Kurt Seifried (Red Hat)
>     Willy Tarreau (Linux)
>     Solar Designer (Openwall)
>     The US-CERT team
>     The Oracle security team
>     The GE security team
>     Kelly Jackson Higgins (UBM)
>     Steve Ragan (IDG Enterprise)
>     Elinor Mills
> Feeling Guilty?
> Are you reading this post, thinking about all the administrators and
> engineers that are going to have to patch the LZO/LZ4 issue in your team's
> systems?  Take some time to tell them how you feel with our hand crafted
> Lab
> Mouse Security custom Sympathy Card!
> Hand crafted with the finest bits and bytes, our Sympathy Card shows your
> engineer what they mean to you and your team.  This is a limited run of
> cards, and will proudly display the Linux kernel LZO exploit written by Lab
> Mouse on the card.
> Best wishes,
> Don A. Bailey, Founder / CEO, @InfoSecMouse, Lab Mouse Security, 26 Jun
> 2014
> _______________________________________________
> langsec-discuss mailing list
> langsec-discuss at mail.langsec.org <javascript:;>
> https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.langsec.org/pipermail/langsec-discuss/attachments/20140707/e58275d6/attachment-0001.html>

More information about the langsec-discuss mailing list