Shining a (Strobe) Light on your Java Application

Since it’s near Halloween, you might have experienced this recently: you’re walking through a strange, dark place. You don’t know what to expect. You’re a bit nervous. Every so often, a red light comes on, illuminating a ghoul as it jumps in your path and scares the #&@% out of you. Then it’s dark again. Typical haunted house. Now a strobe light starts flashing. Slowly at first, so you just get glimpses of what you’re moving through. Then it starts flashing more frequently, several times per second. Now, you can put together the successive snapshots and form a coherent picture of what’s around you. It’s not so scary.

Is your application a haunted house? Would you like to illuminate it with a strobe light? Use statistical profiling. It’s not just for performance analysis. The idea is very simple. Using Java’s hprof utility with “cpu=samples” (or even better, RTI’s statprof support), you can sample what the program is doing every second, or every 100th of a second. Not just what method is executing, but the entire stack trace.

This kind of “profiling” is the lowest-overhead approach I know of for Java. Everything executes at full speed except for brief interruptions while the monitoring thread updates the in-memory counts and tracebacks. But the point here is that even if you’re not focused on a performance problem, the stack traces give you lots of information about how the program works. Furthermore, I find it’s much easier to read and understand source code when following a known execution path.

Let’s get down to specifics.

Enabling CPU sampling (statistical profiling)

Using Java 5 or newer, add:

-agentlib:hprof=cpu=samples,interval=10,depth=50

to the java options you’re already using, such as the JAVA_OPTS environment variable used by Tomcat and JBoss.

Here, “interval=10” means “take a sample every 10 milliseconds”, or 100 per second. “depth=50” means “record up to 50 call levels for each sample.” These settings will impact your program’s performance, but can produce the best data in the shortest time. You can “turn it down” to “interval=1000,depth=10” to sample once per second and record ten call levels, but you will have to run your program for much longer before you collect enough samples. More on this below.
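
For example, if your server reads the JAVA_OPTS environment variable (as Tomcat and JBoss do), you might set:

export JAVA_OPTS="$JAVA_OPTS -agentlib:hprof=cpu=samples,interval=10,depth=50"

or, for a standalone program (myapp.jar here is just a placeholder for your own application):

java -agentlib:hprof=cpu=samples,interval=10,depth=50 -jar myapp.jar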

Dumping/Resetting Sampling

The way hprof works is that you request a dump of the data collected so far by sending a signal to the JVM process. A side-effect of dumping a report is that the data is reset.

To force a dump on Linux,

PID=$(ps -wwfC java | grep hprof | awk '{ print $2 }')
kill -QUIT $PID

or, if you have jmap (included with JDK but not JRE):

jmap -dump:file=hprof-dump.txt $PID

Understanding hprof=cpu=samples output

There are two parts to the output: a list of method names, ordered by sample count with the highest first; and the stack traces associated with those samples:

CPU SAMPLES BEGIN (total = 126) Fri Oct 22 12:12:14 2004
rank   self  accum   count trace method
   1 53.17% 53.17%      67 300027 java.util.zip.ZipFile.getEntry
   2 17.46% 70.63%      22 300135 java.util.zip.ZipFile.getNextEntry
   ... (and many more) ... 
TRACE 300027:
java.util.zip.ZipFile.getEntry(ZipFile.java:Unknown line)
java.util.zip.ZipFile.getEntry(ZipFile.java:253)
java.util.jar.JarFile.getEntry(JarFile.java:197)
java.util.jar.JarFile.getJarEntry(JarFile.java:180)
... (length depends on depth= value) ... 

TRACE 300135:
java.util.zip.ZipFile.getNextEntry(ZipFile.java:Unknown line)
java.util.zip.ZipFile.access$700(ZipFile.java:35)
java.util.zip.ZipFile$3.nextElement(ZipFile.java:419)
java.util.zip.ZipFile$3.nextElement(ZipFile.java:413)
...

RTI Statprof View

RTI collects samples and stack traces just as Java’s hprof does, but the RTI Console provides several interactive views of this data that make sorting and analysis easier. The stack trace tab below is most similar to the output provided by hprof, showing the trace for each CPU sample.

The call graph provides an interactive view of the call trees, and merges overlapping stack traces together:

And the Method Summary provides a flat list of all methods independent of their callers:

RTI Statprof sampling is always enabled at a “background” level, so you can just take a snapshot at any time to explore for yourself.

Subtleties and Complications

Experience using statistical profiling on server applications has led me to the following guidelines:

  • Take samples more frequently, or run longer.
    As with any statistical analysis, more samples mean better accuracy. Above I recommend a 10ms interval, but when executing a billion instructions per second you’re still missing a lot. Conversely, spending all your time sampling can skew performance and, in a time-sensitive application, actually change the executed path, so you need to be aware of that when looking at the data. Either way, the longer you run and the more samples you collect, the more your samples will reflect what your test is doing. But…
  • Keep the application busy, because you’re measuring wait time, too.
    If your application spends 90% of its time waiting to read messages or http requests, then your statistical sample will show that. This is generally not useful information: you want to know what it’s doing when it’s actually processing a request. So, flood your application with requests while you’re sampling, to keep it busy. But…
  • Your data is only as good as your test run.
    Not all requests or program input are equally useful. If you want to know what the application does when it receives new-user requests, you need to do lots of those, and they need to be valid. More than once I have been misled because the input message I was testing was erroneous and so was finishing quickly without ever executing the paths of interest. So…
  • Don’t treat it as gospel; it’s statistics.
    Given the complications mentioned above, it’s very possible that the samples you get will not cover all the interesting or slow paths your users or testers are experiencing. I’ve found that the best use of statistical profiling is as a guide to walk through the source code, understanding what is happening, and to get a “short list” of methods to investigate further.
  • Use actual instrumentation to get detailed performance data.
    When you feel you’ve gotten what you can from statistical profiling, it’s time to actually instrument your program. You can do this with “hprof=cpu=times” (shown below), but that instruments every method and really slows things down. I prefer to use RTI to identify exactly the classes and methods I’m focusing on. This quickly provides a low-overhead summary of actual calls with min/max/average elapsed and CPU time values. I can then enable a deep-dive trace of each call to identify specific parameter values and SQL queries.
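
For reference, timed instrumentation is enabled the same way as sampling, just with a different cpu mode:

-agentlib:hprof=cpu=times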

Bottom Line

Statistical profiling, using RTI’s Statprof view or Java’s built-in hprof=cpu=samples mechanism, provides a quick, low-overhead start toward understanding the behavior and performance of an unfamiliar application.


Tom Fleck
RTI Support

Debugging Java Thread Problems using RTI

…This is the first in a series of in-depth answers to user questions about RTI features.

A user wrote, “Once I figured out how to turn on the Thread Dump Snapshot, I was really impressed with all the info that’s presented there.   I’d like to use it to find a deadlock/hang in my Eclipse/RCP Java application.   How would I do that?”

Here’s our answer…

The RTI thread dumps collect information about the thread states, CPU usage, wait time, and blocking time, as well as monitor/lock information.  The first thing to do is to make sure thread dumps are enabled in the RTI Collector.  To do this you need to edit the collector configuration and check the BTO Options.  The THREADDUMP option controls the collection interval.

1st pic

The default thread dump interval of 1 minute is designed to give a good overview, with low overhead.  For finer-grained thread issues you may need to set the thread dump collection interval to a short value, like 5 seconds.  This can cause a fair bit of overhead, so use it judiciously, and be certain to set the interval back after collecting some information.

Once you have thread dump snapshots, you should start by viewing the Thread Dump Timeline to familiarize yourself with the kinds of threads in your application or application server.  If you are lucky, your threads will have useful names; otherwise you have to figure that out for yourself.  You can examine thread stacks by clicking on a snapshot cell.  Examining stack traces can help you figure out what “unnamed” threads are doing.

2nd pic

If you have enabled the option for the JVM to detect deadlocks, the threads that are deadlocked will be marked in red.  Using the JVM option incurs extra runtime overhead so you might leave that off in a production environment. Even if the JVM option is off, RTI will always calculate thread deadlocks and mark those in red, if lock information is available.  If the JVM option is off and no lock information is available, you will have to identify deadlocked threads manually.
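
For context, the JVM exposes this kind of deadlock detection through the standard ThreadMXBean API; here is a minimal sketch of the check a tool can perform (illustrative only, not RTI’s implementation):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative only: report any deadlocked threads the JVM knows about.
public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();   // null when no deadlock is detected
        if (ids == null) {
            System.out.println("No deadlocked threads");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName()
                    + " is blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}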

3rd pic

4th pic

To identify potential deadlocks manually, look for threads that have locked resources and are also waiting for resources: examine each thread stack for threads that are both holding a lock and waiting for a lock.  The next two images show the thread details for the deadlocked threads identified above.

5th pic

6th pic

One way to diagnose potential thread deadlocks is to look for a set of threads that are blocked for a long period of time.  Even more suspicion is warranted if the threads don’t make any progress and have the same stack trace repeatedly.  The Thread Dump Timeline will generally mark these threads in orange, and uses the less-than (“<”) character to indicate that the thread state is the same as in the previous snapshot.  Checking the Thread Blocked Timeline can confirm that threads are blocking; they will be marked in orange.  (See the resource starvation example below for screenshots of what this might look like.)

Another thread issue is resource starvation.  In this scenario, a number of threads compete for a “scarce” resource.  Generally this leads to one or more threads that get the resource and a number of other threads that wait for it.  A single higher-priority thread may always control the resource, or the set of threads may trade control of it.  Again, the Thread Timeline View and the Thread Blocked and Waited Views will help.

In this example (the Dining Philosophers) three threads are contending for the same set of resources.  The higher-priority thread (Plato) can make progress, but the other two (Aristotle and Socrates) are blocked waiting for the resources Plato has.  (Plato is shown as waiting because it uses Thread.sleep() to simulate processing.)
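
The shape of that contention can be sketched in a few lines of Java (thread names follow the example; this is illustrative only, not RTI’s demo code):

public class DiningPhilosophers {
    // Two shared "chopsticks"; every philosopher needs both to make progress.
    static final Object CHOPSTICK_A = new Object();
    static final Object CHOPSTICK_B = new Object();

    static Thread philosopher(String name, int priority) {
        Thread t = new Thread(() -> {
            while (true) {
                synchronized (CHOPSTICK_A) {        // grab the first chopstick
                    synchronized (CHOPSTICK_B) {    // others block here while we hold both
                        try {
                            Thread.sleep(100);      // simulate processing (shows as waiting)
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }
        }, name);
        t.setPriority(priority);
        return t;
    }

    public static void main(String[] args) {
        philosopher("Plato", Thread.MAX_PRIORITY).start();      // tends to hold the chopsticks
        philosopher("Aristotle", Thread.NORM_PRIORITY).start(); // mostly blocked
        philosopher("Socrates", Thread.NORM_PRIORITY).start();  // mostly blocked
    }
}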

7th pic

Checking the thread state of Aristotle or Socrates shows the resources in contention:

8th pic

And checking the monitors directly shows that the Chopsticks are contended (locked, with waiting threads):

9th pic

A final thread issue is a CPU hog.  In this scenario a single thread (or a few threads) consumes all the CPU for a time slice, starving other threads.  The Thread CPU Timeline can help spot these.  Here, threads that use a large percentage of the available CPU are highlighted.  Threads that continually consume lots of CPU time are good candidates for review.

The following example shows “request” threads that block waiting for work, then process the request and terminate.  They show high CPU for the duration of the processing.

10th pic

11th pic

Checking their thread state shows that they were only using a small percentage of CPU on an unloaded system.

The details say that since the previous snapshot, this thread used 150 milliseconds of CPU, which was only 2.26% of the 6.64 seconds available, and that the thread has used a total of 700 milliseconds of CPU since its creation.
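
That percentage is simply CPU time divided by the wall-clock time between snapshots; here is a minimal sketch of the same calculation using the standard ThreadMXBean API (illustrative only, not RTI’s implementation):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Illustrative only: measure one thread's CPU usage over an interval.
public class ThreadCpuSample {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();

        long cpuBefore = mx.getThreadCpuTime(threadId);  // nanoseconds of CPU used so far
        long wallBefore = System.nanoTime();

        Thread.sleep(1000);                               // the interval between "snapshots"

        long cpuUsed = mx.getThreadCpuTime(threadId) - cpuBefore;
        long wallElapsed = System.nanoTime() - wallBefore;

        // e.g. 150 ms of CPU over 6.64 s of wall-clock time is about 2.26%
        System.out.printf("%.2f%% of available CPU%n", 100.0 * cpuUsed / wallElapsed);
    }
}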

In summary:

  • Run your application under RTI and add its collector to your RTI Console.
  • Enable and configure thread dumps in the BTO tab of the Configuration Editor.
  • Run your application such that it shows the problem(s) you’re looking for.
  • Examine the Thread Timeline for red, indicating deadlock; select those threads and examine the details, then find the referenced methods and resources in your code.
  • If there’s no red, look for orange in the Thread Timeline or Thread Blocked tabs, indicating long blocked periods; again, the Thread/Monitor Details will point to where the problem is.
  • Also, look at the Thread CPU tab to see where other threads may be stealing CPU from the apparently stalled thread(s).

 …written by Steve North (RTI Product Manager, OC Systems)

Performance checkpoint…

APM (Application Performance Management).  Small word, big task.

Some organizations have it and use it.  Some say “sounds like something my organization needs”.  Some can’t identify problems in their applications because they don’t have the resources or the manpower to gain that insight.  Plus, they have their own job to do.  Then something happens, and all of a sudden they need a performance management solution in place – yesterday.

Time…money…manpower.  Does this ring a bell?

Like when you are trying to modernize your system, and you are given an SLA requiring a 2-second response time for many thousands of internal users.  You need the ability to identify where the slowness is occurring on an ongoing basis to maintain response time requirements; you have to measure the actual end-user response time and decompose where slow transactions are spending their time.  You need a complete understanding of the user experience and the ability to quickly correct slowness – check.

What if your organization experiences intermittent application crashes?  You suspect database problems.  Until you identify the source, you can’t deploy the update into production.  The crashes and delayed releases cause a loss of revenue.  Ugh – what a nightmare.  I can hear management knocking on my door.  You need to find and focus on the actual problem and resolve the issue – check.

Customer support is a major concern.  Not only is it a big cost, but the more you can drive users to online support, the less money you spend on call centers.  Ongoing good response time is key, and with low-overhead transaction monitoring and lightweight production profiling you can define monitoring and alerts on any code in your application – legacy, COTS, POJO, or JEE framework.  You can also respond to alerts by enabling diagnostic snapshots, giving you diagnostic-level traces of specific transactions.  Happy customers = returning customers – check.

If you have adopted and embraced the methodology and practice of Application Performance Management, you really can save money and catch correctness problems early in the software lifecycle.  Developers directly connect to local “sandbox” servers to collect performance metrics and profiles for application code under test.  The earlier a problem is detected in the software lifecycle, the less expensive it is to resolve.  Let your developers make sure their code performs well before it is delivered.  Have them analyze transaction profiles, export and share data, import diagnostic snapshots, correlate metrics to business transaction performance, and connect directly to the server.  Yes, application performance does impact the bottom line.  Be proactive, not reactive – check.

Migrations.  Large, tangled webs – and lots of work.  Public-facing, supplier, and internal applications moving to various platforms.  You know you will have application performance problems and want a tool in place to deal with these issues as early in the migration as possible.  Monitor and profile migrated applications and deep dive to track specific performance problems.  Be prepared – check.

Too many cooks in the kitchen?  With software, there are always multiple people working in an application.  Analyze the performance of third-party, open-source code.  Profile and trace the execution of web services in a third-party, open-source system and then evaluate the performance of back-end storage strategies. Leave the open-source code unmodified and still collect valuable performance information from lightweight profiles and deep dive traces.   Don’t avoid Pandora’s box, open it up – check.

Georgia Ferretti handles all RTI sales, marketing and partnerships – contact gbf@ocsystems.com.