Out Of Memory Error Analysis on SAP BTP Cloud Foundry

In part 1 of this series we talked about Java memory management and how it is different in containers. In part 2 we will focus on different out of memory errors that may occur with Java apps in containerized environments and how to mitigate them.

Out Of Memory Errors

Generally, we distinguish between two different out of memory situations:

  • Firstly, the aforementioned container out of memory error, which happens when the container consumes more memory than allowed by its quota.
  • Secondly, the Java Virtual Machine can still run into the ordinary java.lang.OutOfMemoryError situation, where there’s just not enough memory available to hold the required data in one of its memory areas, the heap being the most prominent one.

Since troubleshooting for both situations is significantly different, it’s crucial to understand which out of memory error you are facing to apply the correct countermeasures.

Container Out Of Memory

As mentioned before, a container out of memory situation occurs when your JVM tries to use more memory than your container quota allows. This can happen when the JVM was given memory flags that configure it to use more memory than is available in the container. Another situation we have observed is a JVM scheduling more threads than the buildpack reserved stack memory for: by default, the Java buildpack expects at most 250 threads and reserves 250 * 1M (the default value for -Xss) of stack memory for them.

Symptom

The easiest way to identify a container out of memory situation is via the Cloud Foundry command line tool. When calling cf logs APP_NAME --recent, you see something like this:

2021-05-03T15:54:37.33+0200 [APP/PROC/WEB/0] OUT Exit status 137 (out of memory)

and cf events APP_NAME shows you:

Getting events for app spring-music-oom in org cp-performance / space demo as tim.gerlach@sap.com...
time event actor description
2021-05-03T15:54:42.00+0200 app.crash spring-music-oom index: 0, reason: CRASHED, cell_id: 3663701e-98b5-419d-bc51-e4895b767361, instance: 6d36618b-941e-4a03-42f5-a906, exit_description: APP/PROC/WEB: Exited with status 137 (out of memory)
2021-05-03T15:54:42.00+0200 audit.app.process.crash web index: 0, reason: CRASHED, cell_id: 3663701e-98b5-419d-bc51-e4895b767361, instance: 6d36618b-941e-4a03-42f5-a906, exit_description: APP/PROC/WEB: Exited with status 137 (out of memory)

Note the (out of memory) at the end, which indicates that the container itself ran out of memory (and not the JVM).
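Since the distinction hinges on whether the (out of memory) suffix is present, a quick way to check is to filter the recent logs and events. A minimal sketch, assuming APP_NAME is a placeholder for your application's name:

```shell
# Skip gracefully if the cf CLI is not installed
if command -v cf >/dev/null; then
  # Container OOM: exit status 137 annotated with "(out of memory)"
  cf logs APP_NAME --recent | grep -i "out of memory"
  # Crash events: only container OOMs carry the "(out of memory)" suffix;
  # a JVM OOM shows a plain "Exited with status 137"
  cf events APP_NAME | grep -i "status 137"
fi
```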

Mitigation

Cause 1 – Inappropriate memory flags: Are there any JVM memory flags provided to the Java application which are incompatible with the container? Recommendation: If memory flags were provided manually to the JVM, you should review and evaluate them. For example, it doesn’t make sense to explicitly request a 1G heap if the overall container is only 1G in size, because the JVM also needs memory for thread stacks, metaspace, code cache and other native allocations on top of the heap.
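To review the flags currently in effect, you can inspect the application's environment. A sketch (APP_NAME is a placeholder; the variable names to look for depend on how the flags were provided):

```shell
# Skip gracefully if the cf CLI is not installed
if command -v cf >/dev/null; then
  # Show environment variables that typically carry JVM memory flags,
  # e.g. JAVA_OPTS or the buildpack's JBP_CONFIG_* variables
  cf env APP_NAME | grep -i -E "JAVA_OPTS|JBP_CONFIG"
fi
```

As a rough sanity check, the sum of the configured areas (heap via -Xmx, thread stacks, metaspace, code cache) must stay below the container quota; -Xmx1g alone already exhausts a 1G container.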

Cause 2 – High thread count: A very high thread count can lead to memory errors at the container level when the sum of memory consumed by every thread (via -Xss) exceeds the available memory of the container. Especially under a high load, a high number of threads might be processing requests. Recommendation: Verify the maximum number of threads used by your JVM (as explained above) and provide a hint to the Java buildpack with the number of threads to be expected. You can add this hint to the SAP Java buildpack via the JBP_CONFIG_JAVA_OPTS environment variable and the stack_threads property as documented here.
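As a sketch of providing such a hint, assuming an expected maximum of 500 threads. Note that the exact property path inside JBP_CONFIG_JAVA_OPTS is an assumption here; please verify the syntax against the SAP Java buildpack documentation linked above:

```shell
# Skip gracefully if the cf CLI is not installed
if command -v cf >/dev/null; then
  # Hint the buildpack's memory calculation at ~500 threads instead of
  # the default 250 (property syntax per the SAP Java buildpack docs)
  cf set-env APP_NAME JBP_CONFIG_JAVA_OPTS "[stack_threads: 500]"
  # Environment changes only take effect after a restage
  cf restage APP_NAME
fi
```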

Cause 3 – High native memory consumption: In some cases the JVM requires more native memory than the memory calculation of the buildpack accounted for. You can explicitly request more memory headroom to be taken into account by adding a hint to the SAP Java buildpack via the JBP_CONFIG_JAVA_OPTS environment variable and the headroom property as documented here. The headroom describes the percentage of available container memory that should be reserved and not assigned to the JVM.
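A sketch of requesting 10% headroom; as with the thread hint, the exact property syntax is an assumption and should be checked against the linked SAP documentation:

```shell
# Skip gracefully if the cf CLI is not installed
if command -v cf >/dev/null; then
  # Reserve ~10% of the container memory for native allocations
  # instead of assigning it to the JVM's managed memory areas
  cf set-env APP_NAME JBP_CONFIG_JAVA_OPTS "[headroom: 10]"
  cf restage APP_NAME
fi
```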

Java Virtual Machine Out Of Memory

When you’ve properly configured the Java Virtual Machine, it knows exactly how much memory it is allowed to allocate for each memory area within the boundaries of the overall container quota. If it is unable to allocate more memory for any of its memory areas, as configured via the memory flags, it crashes with a java.lang.OutOfMemoryError: <space>.

Symptom

When checking the events of the app via cf events <APP_NAME>, this shows up as an app.crash event with exit status 137:

$ cf events APP_NAME
Getting events for app spring-music in org cp-performance / space demo as tim.gerlach@sap.com...
time event actor description
2021-04-30T17:37:47.00+0200 app.crash spring-music-oom index: 0, reason: CRASHED, cell_id: 44f68af5-c10d-4c4a-81c9-8d4027dd04fe, instance: 4fcde600-b711-4499-6092-edef, exit_description: APP/PROC/WEB: Exited with status 137
2021-04-30T17:37:47.00+0200 audit.app.process.crash web index: 0, reason: CRASHED, cell_id: 44f68af5-c10d-4c4a-81c9-8d4027dd04fe, instance: 4fcde600-b711-4499-6092-edef, exit_description: APP/PROC/WEB: Exited with status 137

In order to understand which memory area caused the memory exhaustion, check the logs via cf logs APP_NAME --recent. At the top, the log contains information about the memory area which was exhausted:

$ cf logs APP_NAME --recent
2021-04-30T17:37:35.69+0200 [APP/PROC/WEB/0] ERR Resource exhaustion event: the JVM was unable to allocate memory from the heap.

In addition, the log output contains a heap histogram which allows you to find out which object types were found in the heap at the point in time the application crashed:

2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | Instance Count | Total Bytes | Class Name |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 6102 | 125167864 | [Ljava/lang/Object; |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 50985 | 5097648 | [C |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 13288 | 1574216 | Ljava/lang/Class; |
2021-04-30T17:37:37.46+0200 [APP/PROC/WEB/0] OUT | 48268 | 1544576 | Ljava/util/concurrent/ConcurrentHashMap$Node; |
2021-04-30T17:37:37.47+0200 [APP/PROC/WEB/0] OUT | 50709 | 1217016 | Ljava/lang/String; |

You also find a detailed dump of the memory usage per memory pool:

2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Memory usage:
2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Heap memory: init 12582912, used 146688144, committed 381157376, max 381157376
2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Hint: Heap memory is over 95% full. To increase it, increase the container size.
2021-04-30T17:37:37.57+0200 [APP/PROC/WEB/0] OUT Non-heap memory: init 7667712, used 95216776, committed 99704832, max 554958848
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Memory pool usage:
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT CodeHeap 'non-nmethods': init 2555904, used 1402752, committed 2555904, max 6291456
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT CodeHeap 'profiled nmethods': init 2555904, used 8860160, committed 8912896, max 121634816
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT CodeHeap 'non-profiled nmethods': init 2555904, used 2087936, committed 2555904, max 123731968
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Metaspace: init 0, used 73458568, committed 75587584, max 155844608
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Compressed Class Space: init 0, used 9407360, committed 10092544, max 147456000
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Eden Space: init 4390912, used 2258512, committed 105185280, max 105185280
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Survivor Space: init 524288, used 0, committed 13107200, max 13107200
2021-04-30T17:37:37.58+0200 [APP/PROC/WEB/0] OUT Tenured Gen: init 7143424, used 144429632, committed 262864896, max 262864896

Mitigation

If you find your Java application in this situation, it’s often an indicator of a problem in the application design: you’re trying to hold more data in memory than your JVM can accommodate. A typical example is querying a large set of objects from the database without chunking or streaming. While it sounds appealing to simply give the Java application more memory, it’s important to understand that scaling up vertically (more memory) doesn’t solve an architectural problem. Another possible cause is a memory leak in the application. If the heap histogram is not sufficient to find the memory leak, take a full heap dump of the Java process (covered in part 3 of this blog series).
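That said, while you analyze the root cause, a temporary scale-up can keep the application alive. A sketch, using the cf CLI (APP_NAME is a placeholder):

```shell
# Skip gracefully if the cf CLI is not installed
if command -v cf >/dev/null; then
  # Stop-gap only: raise the container quota to 2G; on restart the
  # buildpack recalculates the JVM memory areas for the new quota
  cf scale APP_NAME -m 2G
fi
```

Keep in mind this only buys time: if the application leaks memory or loads unbounded result sets, a larger container merely delays the next crash.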

Next

In the next blog post, part 3, we will show you how heap dumps and thread dumps can be created from running Java applications on SAP BTP Cloud Foundry.