Chaos Engineering – Blocked Threads

In this series of chaos engineering articles, we have been learning to simulate various performance problems. In this post, let’s discuss how to make threads go into the BLOCKED state.

Sample Program

Here is a sample program from the open source BuggyApp application, which makes threads go into the BLOCKED state. A thread enters the BLOCKED state when it cannot acquire a lock on an object because another thread already holds the lock on the same object and doesn’t release it. Review the program carefully.

public class BlockedAppDemo {

    public static void start() {
        // Launch 10 threads.
        for (int counter = 0; counter < 10; ++counter) {
            new AppThread().start();
        }
    }
}

public class AppThread extends Thread {

    @Override
    public void run() {
        AppObject.getSomething();
    }
}

public class AppObject {

    // Static synchronized: only one thread at a time can hold the AppObject class lock.
    public static synchronized void getSomething() {
        while (true) {
            try {
                // Sleep for 10 minutes, again and again, without releasing the lock.
                Thread.sleep(10 * 60 * 1000);
            } catch (Exception e) {}
        }
    }
}

The sample program contains the ‘BlockedAppDemo’ class. This class has a start() method, in which 10 new threads are created. The AppThread class has a run() method that invokes the getSomething() method on AppObject. In getSomething(), the thread is put to continuous sleep, i.e. the thread sleeps for 10 minutes again and again. But if you notice, getSomething() is a synchronized method. A synchronized method can be executed by only one thread at a time, and Thread.sleep() does not release the lock while the thread sleeps. If any other thread tries to execute getSomething() while the previous thread is still inside the method, the new thread is put in the BLOCKED state.

In this case, 10 threads are launched to execute the getSomething() method. However, only one thread will acquire the lock and execute the method; the remaining 9 threads will be put in the BLOCKED state.
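To check the "1 holder, 9 blocked" outcome without any external tool, the thread states can be inspected in-process with Thread.getState(). The following is a minimal sketch, not BuggyApp code: the class name BlockedStateProbe, the thread names, and the 5-second polling window are assumptions made for illustration, and a synchronized block on a private lock object stands in for the synchronized static method. The threads are daemon threads so the JVM can exit while they are still sleeping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of BuggyApp): starts threads that contend for
// one lock and counts how many of them end up in the BLOCKED state.
class BlockedStateProbe {

    static int countBlocked(int threadCount) {
        final Object lock = new Object(); // stands in for the AppObject class lock
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                synchronized (lock) {
                    try {
                        Thread.sleep(10 * 60 * 1000L); // sleep() keeps the lock held
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "probe-" + i);
            t.setDaemon(true); // daemon threads let the JVM exit while they sleep
            t.start();
            threads.add(t);
        }

        // Poll for up to 5 seconds until all threads but the lock holder are BLOCKED.
        long deadline = System.currentTimeMillis() + 5_000;
        int blocked = 0;
        while (System.currentTimeMillis() < deadline) {
            blocked = 0;
            for (Thread t : threads) {
                if (t.getState() == Thread.State.BLOCKED) {
                    blocked++;
                }
            }
            if (blocked == threadCount - 1) {
                break;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED threads: " + countBlocked(10));
    }
}
```

With 10 threads, the count should settle at 9: the thread that won the lock is sleeping (TIMED_WAITING), and every other thread is waiting to enter the synchronized block.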

NOTE: If threads remain in the BLOCKED state for a prolonged period, the application may become unresponsive.

How to diagnose ‘blocked thread’?

You can diagnose blocked threads through either a manual or an automated approach.

Manual approach

In the manual approach, you need to capture thread dumps as the first step. A thread dump shows all the threads that are in memory and their code execution paths. You can capture a thread dump using one of the 8 options mentioned here. But an important criterion is: you need to capture the thread dump right when the problem is happening. Once the thread dump is captured, you need to copy it from your production servers to your local machine. You can then use thread dump analysis tools like fastThread or Samurai to analyze the thread dump on your local machine.
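For illustration, a thread dump can also be produced from inside the JVM with the standard java.lang.management API. This is a minimal sketch under stated assumptions: the class name InProcessThreadDump is hypothetical, and this in-process route is shown only as one possible way to get a dump, not as the method the article's linked options prescribe. Note that ThreadInfo.toString() prints only the first few stack frames of each thread, so a full dump from a dedicated tool is still preferable for deep stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a text thread dump of the current JVM.
class InProcessThreadDump {

    static String capture() {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // true, true: also report locked monitors and ownable synchronizers.
        for (ThreadInfo info : threadBean.dumpAllThreads(true, true)) {
            // toString() includes the thread name, its state, the lock it is
            // waiting on (if any), the lock owner, and a truncated stack trace.
            dump.append(info);
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

In a dump like this, a blocked thread shows the state BLOCKED together with the lock it is waiting for and the name of the thread that owns it.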

Automated approach

You can use root cause analysis tools like yCrash, which automatically captures application-level data (thread dump, heap dump, Garbage Collection log) and system-level data (netstat, vmstat, iostat, top, top -H, dmesg, …). Besides capturing the data automatically, it marries application-level data and system-level data and generates an instant root cause analysis report. Below is the report generated by the yCrash tool when the above sample program is executed:

Fig: yCrash reporting the line of code in which 9 threads are in the BLOCKED state

You can see the yCrash tool reporting that 9 threads are in the BLOCKED state, and it also points out the stack trace in which they are stuck. From the stack trace you can observe that the threads are stuck on the ‘com.buggyapp.blockedapp.AppObject#getSomething()’ method.

Fig: yCrash transitive graph showing BLOCKED threads

yCrash prints a transitive dependency graph that shows which thread is blocking which threads. In this transitive graph you can see ‘Thread-19’ blocking 9 other threads. If you click on the thread names in the graph, you can see the stack trace of that particular thread. When you click on ‘Thread-19’, you will notice that the thread is stuck on the sleep() method in java.lang.Thread. The stack trace of ‘Thread-19’ also points out that before getting stuck, this thread obtained 1 lock, due to which 9 threads are put in the BLOCKED state.
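The "which thread blocks which" relationship can be approximated in plain Java by grouping BLOCKED threads by the owner of the lock they are waiting for. The sketch below is not how yCrash builds its graph; it only illustrates the idea with the standard ThreadInfo API. The class name BlockerGraph, the helper startContendingThreads, and the ‘demo-’ thread names are assumptions made for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps each lock-owning thread to the threads it blocks.
class BlockerGraph {

    static Map<String, List<String>> blockedByOwner() {
        Map<String, List<String>> graph = new TreeMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            String owner = info.getLockOwnerName(); // null if no thread owns the awaited lock
            if (info.getThreadState() == Thread.State.BLOCKED && owner != null) {
                graph.computeIfAbsent(owner, k -> new ArrayList<>()).add(info.getThreadName());
            }
        }
        return graph;
    }

    // Demo setup (assumption): daemon threads contending for one private lock.
    static void startContendingThreads(int count) {
        final Object lock = new Object();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                synchronized (lock) {
                    try {
                        Thread.sleep(10 * 60 * 1000L); // holds the lock while sleeping
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "demo-" + i);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        // Wait up to 5 seconds until all threads but the lock holder are BLOCKED.
        long deadline = System.currentTimeMillis() + 5_000;
        while (System.currentTimeMillis() < deadline
                && threads.stream().filter(th -> th.getState() == Thread.State.BLOCKED).count() < count - 1) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        startContendingThreads(10);
        blockedByOwner().forEach((owner, blocked) ->
                System.out.println(owner + " is blocking " + blocked.size() + " threads: " + blocked));
    }
}
```

Which ‘demo-’ thread wins the lock varies from run to run, so the owner name in the output is not fixed; the owner's own state (sleeping inside Thread.sleep) can be confirmed from its stack trace in a thread dump.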