…This is the first in a series of in-depth answers to user questions about RTI features.
A user wrote, “Once I figured out how to turn on the Thread Dump Snapshot, I was really impressed with all the info that’s presented there. I’d like to use it to find a deadlock/hang in my Eclipse/RCP Java application. How would I do that?”
Here’s our answer…
RTI thread dumps collect information about thread states, CPU usage, wait time, and blocked time, as well as monitor/lock information. The first thing to do is to make sure thread dumps are enabled in the RTI Collector: edit the collector configuration and check the BTO Options. The THREADDUMP option controls the collection interval.
The default thread dump interval of 1 minute is designed to give a good overview with low overhead. For finer-grained thread issues you may need to set the collection interval to a short value, such as 5 seconds. This adds a fair bit of overhead, so use it judiciously, and be sure to set the interval back once you have collected the information you need.
Once you have thread dump snapshots, start by viewing the Thread Dump Timeline to familiarize yourself with the kinds of threads in your application or application server. If you are lucky, your threads will have useful names; otherwise you will have to work that out yourself. You can examine thread stacks by clicking on a snapshot cell, and examining stack traces can help you figure out what unnamed threads are doing.
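If you want to explore stacks programmatically as well, much the same information RTI shows in a snapshot cell is available from the standard Thread.getAllStackTraces() API. This minimal sketch (our own illustration, not RTI-specific) prints each live thread's name, state, and stack, which is often enough to guess what an unnamed thread is for:

```java
import java.util.Map;

public class StackDump {
    public static void main(String[] args) {
        // One snapshot of every live thread's stack, like a single thread-dump cell
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.println(t.getName() + " [" + t.getState() + "]");
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```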
If you have enabled the option for the JVM to detect deadlocks, deadlocked threads will be marked in red. The JVM option incurs extra runtime overhead, so you might leave it off in a production environment. Even with the JVM option off, RTI will still calculate thread deadlocks and mark them in red, provided lock information is available. If the JVM option is off and no lock information is available, you will have to identify deadlocked threads manually.
To identify potential deadlocks manually, look for threads that hold locked resources while also waiting for resources: examine each thread's stack for those that are both holding a lock and waiting for a lock. The next two images show the thread details for the deadlocked threads identified above.
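To make the pattern concrete, here is a minimal, deliberately deadlocked sketch (the names and structure are our own, not taken from the RTI example). Each worker holds one lock while waiting for the other — exactly the "holding a lock and waiting for a lock" signature to look for in the stacks — and the standard ThreadMXBean API confirms the deadlock the same way the JVM's own detection does:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockDemo {
    static final Object lockA = new Object();
    static final Object lockB = new Object();

    // Each worker takes its first lock, pauses, then tries the second
    static void spawn(String name, Object first, Object second) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
                synchronized (second) { /* never reached once deadlocked */ }
            }
        }, name);
        t.setDaemon(true);  // daemons let the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) throws Exception {
        spawn("worker-1", lockA, lockB);  // opposite lock order guarantees
        spawn("worker-2", lockB, lockA);  // a cycle once both hold one lock
        Thread.sleep(500);               // let the deadlock form

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();  // null if no deadlock
        if (ids != null) {
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                System.out.println(info.getThreadName()
                        + " waiting on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
    }
}
```

Each reported line shows a thread that holds one monitor and waits on the other, which is the same hold-and-wait signature you would read out of the stack traces by hand.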
One way to diagnose potential thread deadlocks is to look for a set of threads that are blocked for a long period of time. Even more suspicion is warranted if the threads make no progress and show the same stack trace snapshot after snapshot. The Thread Dump Timeline will generally mark these threads in orange, and uses the less-than ("<") character to indicate that a thread's state is unchanged from the previous snapshot. Checking the Thread Blocked Timeline can confirm that threads are blocking; they will be marked in orange there as well. (See the resource starvation example below for screenshots of what this might look like.)
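The "same stack trace repeatedly" heuristic can also be checked by hand with two stack snapshots. In this sketch (again our own illustration, using only standard APIs), one thread is blocked on a monitor another thread holds; two snapshots taken a half-second apart show it still BLOCKED with an identical stack, which is what the "<" marker in the timeline represents:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RepeatedStack {
    public static void main(String[] args) throws Exception {
        Object lock = new Object();

        Thread holder = new Thread(() -> {
            synchronized (lock) {  // hold the lock for a long time
                try { Thread.sleep(10_000); } catch (InterruptedException e) { }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();
        Thread.sleep(100);  // let holder grab the lock

        Thread stuck = new Thread(() -> {
            synchronized (lock) { }  // blocks as long as holder sleeps
        }, "stuck-worker");
        stuck.setDaemon(true);
        stuck.start();
        Thread.sleep(100);

        Map<Thread, StackTraceElement[]> snap1 =
                new HashMap<>(Thread.getAllStackTraces());
        Thread.sleep(500);
        Map<Thread, StackTraceElement[]> snap2 = Thread.getAllStackTraces();

        for (Map.Entry<Thread, StackTraceElement[]> e : snap2.entrySet()) {
            Thread t = e.getKey();
            // Blocked AND making no progress: identical stack in both snapshots
            if (t.getState() == Thread.State.BLOCKED
                    && Arrays.equals(e.getValue(), snap1.get(t))) {
                System.out.println("suspect: " + t.getName());
            }
        }
    }
}
```

Note that "holder" is not flagged: it is TIMED_WAITING (sleeping), not BLOCKED, mirroring the distinction the Thread Blocked Timeline draws.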
Another thread issue is resource starvation. In this scenario, a number of threads compete for a scarce resource. Generally one or more threads acquire the resource while a number of other threads wait for it. A single higher-priority thread may hold the resource continuously, or the threads may trade control of it. Again, the Thread Timeline View and the Thread Blocked and Waited Views will help.
In this example (the classic Dining Philosophers problem), three threads contend for the same set of resources. The higher-priority thread (Plato) can make progress, but the other two (Aristotle and Socrates) are blocked waiting for the resources Plato holds. (Plato is shown as waiting because it uses Thread.sleep() to simulate processing.)
Checking the thread state of Aristotle or Socrates shows the resources in contention:
And checking the monitors directly shows that the Chopsticks are contended (they are locked and have waiting threads):
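The shape of this example can be sketched in plain Java (our own reconstruction — the thread names come from the article, everything else is assumed). Each philosopher needs two chopstick monitors; acquiring them in a fixed global order keeps the sketch from deadlocking, so what a profiler sees is pure contention: the sleeping eater holds both locks while its neighbors sit BLOCKED.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Philosophers {
    static final Object[] chopsticks = { new Object(), new Object(), new Object() };
    static final Map<String, Integer> meals = new ConcurrentHashMap<>();

    static Thread philosopher(String name, int left, int right, int priority) {
        // Fixed global lock order (lower index first) avoids deadlock,
        // leaving contention and starvation as the only behaviors on show.
        int first = Math.min(left, right), second = Math.max(left, right);
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                synchronized (chopsticks[first]) {
                    synchronized (chopsticks[second]) {
                        meals.merge(name, 1, Integer::sum);
                        // "eating": shows as Waiting in a dump, while the
                        // neighbors show as Blocked on the held chopsticks
                        try { Thread.sleep(10); }
                        catch (InterruptedException e) { return; }
                    }
                }
            }
        }, name);
        t.setPriority(priority);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws Exception {
        philosopher("Plato", 0, 1, Thread.MAX_PRIORITY).start();
        philosopher("Aristotle", 1, 2, Thread.NORM_PRIORITY).start();
        philosopher("Socrates", 2, 0, Thread.NORM_PRIORITY).start();
        Thread.sleep(1000);
        System.out.println(meals);  // meal counts per philosopher
    }
}
```

Run this under a profiler or take a few thread dumps while it runs and you should see the pattern the screenshots above describe: one thread sleeping inside both monitors, the others blocked on chopstick locks.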
A final thread issue is a CPU hog. In this scenario, one or more threads consume all the CPU for a time slice, starving other threads. The Thread CPU Timeline can help spot these: threads that use a large percentage of the available CPU are highlighted. Threads that continually consume lots of CPU time are good candidates for review.
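What the Thread CPU Timeline measures can be approximated with the standard ThreadMXBean per-thread CPU counters. This sketch (an illustration, not RTI's implementation) spins a hog thread, then compares its CPU time to the wall-clock interval — the same CPU-percentage calculation the snapshot details report:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuHog {
    public static void main(String[] args) throws Exception {
        Thread hog = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += x ^ System.nanoTime();  // busy work the JIT cannot drop
            }
        }, "hog");
        hog.setDaemon(true);
        hog.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Per-thread CPU time is optional in the JMX spec; HotSpot supports it
        if (!mx.isThreadCpuTimeSupported()) return;

        long wallStart = System.nanoTime();
        long cpuStart = mx.getThreadCpuTime(hog.getId());
        Thread.sleep(500);  // sample over a half-second interval
        long cpuUsed = mx.getThreadCpuTime(hog.getId()) - cpuStart;
        long wall = System.nanoTime() - wallStart;
        System.out.printf("hog used %.0f%% of one core%n",
                100.0 * cpuUsed / wall);
    }
}
```

A thread that reports near 100% of a core interval after interval is the kind of candidate the timeline highlights for review.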
The following example shows “request” threads that block waiting for work, then process the request and terminate. They show high CPU for the duration of the processing.
Checking their thread state shows that they were only using a small percentage of CPU on an unloaded system.
The details say that since the previous snapshot this thread used 150 milliseconds of CPU, which was only 2.26% of the 6.64 seconds available, and that the thread has used a total of 700 milliseconds since its creation.
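The percentage in that details line is just CPU time divided by the wall-clock interval between snapshots; a quick check of the numbers quoted above:

```java
public class SnapshotMath {
    public static void main(String[] args) {
        double cpuMs = 150;        // CPU used since the previous snapshot
        double intervalMs = 6640;  // 6.64 seconds between snapshots
        // CPU% = CPU time / wall-clock interval
        System.out.printf("%.2f%%%n", 100.0 * cpuMs / intervalMs);  // prints 2.26%
    }
}
```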
To summarize:
- Run your application under RTI and add its collector to your RTI Console.
- Enable and configure thread dumps in the BTO tab of the Configuration Editor.
- Run your application such that it shows the problem(s) you’re looking for.
- Examine the Thread Timeline for red, indicating deadlock; select those threads and examine the details, then find the referenced methods and resources in your code.
- If there’s no red, look for orange in the Thread Timeline or Thread Blocked tabs, indicating long blocked periods; again, the Thread/Monitor Details will point to the code involved.
- Also, look at the Thread CPU tab to see whether other threads may be stealing CPU from the apparently stalled thread(s).
…written by Steve North (RTI Product Manager, OC Systems)