The little-known side of vulnerabilities in open source dependencies of applications

in collaboration with: Wolfram Fischer

Digital criminals love easy-to-exploit vulnerabilities in widespread libraries. In December 2021 it was almost possible to hear the champagne corks pop when one of the most critical security vulnerabilities was found in Log4j 2, a logging library for Java. Such a vulnerability has both ingredients: it affects a library widely used in both open source and commercial software, and attack kits are available to exploit it easily, simply by passing carefully crafted strings (which could come from the web). This vulnerability was named Log4Shell, and many experts agreed that it is one of the most severe and dangerous security vulnerabilities of recent years.

Nowadays the overwhelming majority of software depends on open source components. Using open source software provides many benefits, e.g., speeding up development thanks to community-developed components implementing widely used functionality. Of course, any company benefiting from open source must also consider development and maintenance costs for the components, contributing back fixes and improvements. Still, one of the main challenges with the use of open source components is managing security vulnerabilities. Any time a vulnerability is reported and fixed in an open source component, every application making use of it must be patched in a timely manner.

Vulnerabilities in open source software are a very attractive attack vector, especially for widespread open source components. Once you have a working attack (exploit) for an open source component, it is often possible to reuse it against any application of any company, provided that it depends on the vulnerable library. As exploits are often available, convenient, and affordable, exploiting known vulnerabilities in open source software is an excellent business for digital criminals.

This is widely known and understood in the IT security community. The Open Web Application Security Project (OWASP) has listed “Using components with known vulnerabilities” in its top 10 security risks since 2013, for almost 10 years now, and there are reasons for it to stay.

More and more scanners are available to support the detection, assessment and mitigation of publicly known vulnerabilities affecting open source dependencies of software applications. Once a vulnerability is detected, it can be fixed by upgrading the vulnerable dependency or applying other mitigation strategies. To detect vulnerabilities, most scanners check the dependencies’ versions against a list of known vulnerable libraries. In the case of Log4Shell, a vulnerability scanner would thus detect dependencies on a vulnerable version of log4j-core (the core library of Log4j 2). Developers could then upgrade the reported vulnerable dependencies, thereby solving the problem. But is it really solved?

While a lot of approaches focus on the detection of vulnerable components, less attention is given to the culprit of the vulnerabilities, i.e., the actual code responsible for carrying out the attack.
Reasoning in terms of vulnerable libraries is very convenient, as the list of application dependencies in terms of library versions can be easily and automatically obtained (e.g., relying on build systems like Maven or npm). However, it is dangerous because it does not account for libraries other than the original one that still contain the vulnerable code. Does the code of a library end up in other ones? Of course, the answer is yes.

The rest of this post will focus on Java, the Java platform and its dependencies. The Java platform is widely used across companies. Moreover, its dynamic nature makes it an attractive target for attackers. In fact, Java applications can load new code at runtime to alter their behavior. This is often used in modern applications and unfortunately opens doors for potential attackers: a common way to attack dynamic platforms is to trick them into loading code dynamically, which attackers use to steal data, download malware from the internet, encrypt data for ransomware, etc.

Re-bundling is the very common practice of packaging a dependency (or part of it) within a project on the Java platform. Software developers often do this to obtain a single, self-executable archive (often named using the suffix `jar-with-dependencies`). Whenever a vulnerable library is re-bundled, the vulnerable code ends up in the new archive. Failing to identify vulnerable code in re-bundles results in applications being vulnerable even after upgrading the “original” vulnerable dependency. Let’s see why.

On the Java platform, the code to be executed is found in the so-called classpath. The classpath is a list of locations in which the Java platform performs an ordered search for the classes that an application wants to load. The Java classloader loads the first occurrence it finds of the required class: whenever a re-bundle containing vulnerable code is found first, the vulnerable code is loaded and an attacker can exploit the vulnerability.

Figure 1. Application loads classes from classpath

Figure 1 shows an application which depends on three libraries: commons-lang, log4j-core, and spring-core. Whenever a class from any of the dependencies needs to be loaded, it is always searched for in the classpath in order: first in commons-lang, then in log4j-core, and finally in spring-core.

Figure 2. Application loads vulnerable classes from classpath

Figure 2 shows that if an application depends on a vulnerable version of log4j-core, the vulnerable code is loaded, thereby making the application vulnerable.

Figure 3. Application loads re-bundled vulnerable classes

The application in Figure 3 depends both on a fixed version of `log4j-core` and on a library re-bundling the vulnerable code of Log4Shell. Whenever a class belonging to `log4j-core` needs to be loaded, it is searched for, in order, in the classpath. If the re-bundle appears first (as in Figure 3), the class is loaded from there, and thus in its vulnerable form. So, even if the application depends on a fixed version of the library, the application may still be vulnerable, depending on the classpath order.
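To make the effect of the classpath order concrete, here is a minimal sketch. It is not taken from any real application: the two temporary jars, the entry name `demo/Shared.txt` and the class `ClasspathOrderDemo` are made up for illustration, and a plain text entry stands in for a class file. It assumes Java 9 or later. A `URLClassLoader` searches its URLs in the order given, so the same request resolves differently depending on which archive comes first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class ClasspathOrderDemo {

    // Hypothetical entry that both archives contain (stands in for a class file).
    public static final String ENTRY = "demo/Shared.txt";

    // Writes a temporary jar holding a single entry with the given marker text.
    public static Path writeJar(String prefix, String marker) throws IOException {
        Path jar = Files.createTempFile(prefix, ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(ENTRY));
            jos.write(marker.getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    // Returns the content of ENTRY as seen by a loader that searches the jars in the given order.
    public static String resolve(Path... searchOrder) throws IOException {
        URL[] urls = new URL[searchOrder.length];
        for (int i = 0; i < searchOrder.length; i++) {
            urls[i] = searchOrder[i].toUri().toURL();
        }
        // Parent is null so that only the listed jars (and the bootstrap loader) are searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null);
             InputStream in = loader.getResourceAsStream(ENTRY)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path rebundle = writeJar("rebundle", "vulnerable");
        Path fixed = writeJar("log4j-core-fixed", "fixed");
        System.out.println(resolve(rebundle, fixed)); // prints "vulnerable"
        System.out.println(resolve(fixed, rebundle)); // prints "fixed"
    }
}
```

Nothing about the "fixed" archive changes between the two calls; only its position in the search order does.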

Scanners relying on a mapping of vulnerabilities to affected archives should be able to detect the usage of a vulnerable version of log4j-core as shown in Figure 2. However, they struggle to identify vulnerabilities in re-bundles (see Figure 3). This results in so-called false negatives. The same happens whenever vulnerable code is re-bundled without a direct (or transitive) dependency on the original library, or manually copied from an open source library and pasted into another project (usually a one-time operation where nobody follows up or even notices if a security vulnerability is detected in the original source code at a later point in time).
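The false negative can be illustrated with a deliberately naive sketch of a metadata-based check. This is not how any particular scanner is implemented; the class `MetadataScanner` and the coordinates `com.example:fat-lib:1.0` are hypothetical. The only real-world facts used are that `log4j-core` 2.14.1 is affected by Log4Shell and 2.17.1 is not.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class MetadataScanner {

    // Known-vulnerable coordinates in the form group:artifact:version.
    public static final Set<String> KNOWN_VULNERABLE =
            Set.of("org.apache.logging.log4j:log4j-core:2.14.1");

    // Flags only those declared dependencies whose coordinates are on the list.
    public static List<String> findings(List<String> declaredDependencies) {
        return declaredDependencies.stream()
                .filter(KNOWN_VULNERABLE::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Direct dependency on a vulnerable version: reported.
        System.out.println(findings(List.of(
                "org.apache.logging.log4j:log4j-core:2.14.1")));

        // Fixed log4j-core plus a hypothetical re-bundle: nothing is reported,
        // whatever code com.example:fat-lib actually contains.
        System.out.println(findings(List.of(
                "org.apache.logging.log4j:log4j-core:2.17.1",
                "com.example:fat-lib:1.0")));
    }
}
```

The second call returns an empty list because the check never looks inside the archives; it only compares names and versions.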

Scanners relying on the mapping of vulnerabilities to affected libraries may miss the affected re-bundles. Maintaining a detailed mapping that includes re-bundles (assuming it is possible to identify them all) is cumbersome and error-prone, considering the number of archives re-bundling vulnerable code and the pace of growth of repositories like Maven Central. How would one know that the `JndiLookup` class, responsible for Log4Shell, is present in archives other than `log4j-core`, like `pax-logging-log4j2` or the executable of `minecraft-local-map-tool`? As the vulnerable code may end up in arbitrary artifacts and new artifacts become available every day, it is clear that maintaining a mapping of vulnerabilities to affected artifacts does not scale.

Re-bundles highlight the advantage of a code-centric approach like the one implemented in Eclipse Steady, which reports vulnerabilities based on the presence of vulnerable code, no matter where it is contained. Eclipse Steady relies on a knowledge base of vulnerabilities with fix-commits, i.e., commits fixing the vulnerabilities, available at project “KB”. Fix-commits can be used to narrow down the vulnerable code inside library archives, assuming that the code modified to fix a vulnerability is also the code responsible for its existence. Eclipse Steady examines compiled Java code for the presence of vulnerable code, independently of the library containing it.
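As a rough illustration of looking at archive contents rather than archive names, the sketch below checks whether a jar contains an entry for a given class. It is far cruder than what Eclipse Steady does and is not its implementation: the class `VulnerableClassFinder` is made up for this post, the check is purely name-based, it misses classes that were relocated to another package, and it cannot tell whether the class it finds is in its vulnerable or its fixed form. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipFile;

public class VulnerableClassFinder {

    // Fully qualified name of the class involved in Log4Shell.
    public static final String JNDI_LOOKUP =
            "org.apache.logging.log4j.core.lookup.JndiLookup";

    // Converts a class name into the entry name used inside a jar.
    public static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the archive has an entry for the class, whatever the archive is called.
    public static boolean containsClass(Path archive, String className) throws IOException {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            return zip.getEntry(toEntryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java VulnerableClassFinder first.jar second.jar ...
        for (String arg : args) {
            Path archive = Path.of(arg);
            boolean found = containsClass(archive, JNDI_LOOKUP);
            System.out.println(archive + (found ? " contains " : " does not contain ") + JNDI_LOOKUP);
        }
    }
}
```

A hit only says that the class is present and deserves a closer look; deciding whether it is actually the vulnerable code is where fix-commit information comes in.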

It seems so far that managing known vulnerabilities in OSS dependencies is mostly a reactive game: A vulnerability is disclosed, scanners are used to detect it and then the vulnerability is assessed and mitigated.

Can we do better than playing catch-up with newly discovered security vulnerabilities?

One of the most promising solutions currently being researched is debloating.

The idea of debloating is to remove the parts of the dependencies that are not needed by the application, i.e., to remove as much code as possible from the dependencies without altering the behavior of the application. As every dependency class may be subject to known or yet unknown vulnerabilities, removing classes that are not used decreases the application's attack surface and makes it more secure before vulnerabilities are even found.

As an example, Log4Shell uses several classes in the Log4j 2 library which most probably have no functional use in most applications. One of the first mitigations proposed when Log4Shell became public was to remove a specific class (i.e., `JndiLookup`) from the `log4j-core` component, as it contained the code required for exploiting the vulnerability. Removing that class makes it impossible to exploit Log4Shell. If an application does not need any functionality implemented in the `JndiLookup` class, which is often the case, a debloating tool could automatically remove it, thereby making the vulnerability not exploitable even when a vulnerable version of Log4j 2 is declared as a dependency. The same holds true for other vulnerabilities, whether already discovered or not.
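The class-removal mitigation can be sketched in a few lines. This is only an illustration of removing one named entry from an archive, not a debloating tool: a real debloater first has to work out which classes the application never uses. The class `StripClass` is made up for this post, the target must be a different file from the source, and the copy drops per-entry metadata and would invalidate the signature of a signed jar. It assumes Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StripClass {

    // Jar entry of the class involved in Log4Shell.
    public static final String JNDI_LOOKUP =
            "org/apache/logging/log4j/core/lookup/JndiLookup.class";

    // Copies every entry of source into target except entryToDrop.
    // Returns how many entries were dropped.
    public static int copyWithout(Path source, Path target, String entryToDrop) throws IOException {
        int dropped = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(entryToDrop)) {
                    dropped++;
                    continue; // skip the unwanted class
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                in.transferTo(out); // copies the bytes of the current entry only
                out.closeEntry();
            }
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StripClass input.jar output.jar
        int dropped = copyWithout(Path.of(args[0]), Path.of(args[1]), JNDI_LOOKUP);
        System.out.println("Dropped " + dropped + " entries");
    }
}
```

Because the entry is matched by name, the same sketch works on a re-bundle as long as the class kept its original package.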

Known vulnerabilities in OSS dependencies are still a challenge. Critical vulnerabilities keep being discovered, and software projects make heavy use of open source dependencies. Re-bundles are a juicy attack vector: the more popular a library is, the more probable it is that someone re-bundles vulnerable code that digital criminals can use to attack the application.

Using vulnerability scanners is an important step in the right direction, and scanning at code level is important for detecting vulnerabilities in re-bundles. Though a code-based approach goes in the direction of detecting vulnerabilities no matter where they occur, the study “Identifying Challenges for OSS Vulnerability Scanners – A Study & Test Suite” (see Useful links) shows that all existing scanners still need to improve to support the detection of vulnerabilities in modified artifacts. Finally, debloating applications by removing dependencies (or parts of them) that are not needed is a promising way to reduce the attack surface.

Feel free to share your feedback and thoughts in a comment!

Useful links

OWASP Top 10, https://owasp.org/www-project-top-ten/

Eclipse Steady, https://eclipse.github.io/steady/

project “KB”, https://github.com/SAP/project-kb

Identifying Challenges for OSS Vulnerability Scanners – A Study & Test Suite, https://ieeexplore.ieee.org/document/9506931

Contacts

Discover how SAP Security Research serves as a security thought leader at SAP, continuously transforming SAP by improving security.

Serena Ponta, senior researcher at SAP Security Research

Wolfram Fischer, senior researcher at SAP Security Research