Category Archives: Performance

Performance Problems Hiding in the Open

Several years ago I worked with a prominent bank with approximately 1700 branches, making it one of the largest in its country. They were receiving seemingly random complaints about an application used to determine the creditworthiness of commercial loan applicants. A common scenario was that someone would walk into a branch to discuss a loan, the loan officer would look up their information in the application, and a pop-up window with the applicant's history would look like this:



Not exactly what they wanted to see, to say the least. Some reports from the field indicated that if you waited long enough (reportedly somewhere between 20 and 60 minutes) it would finally render. In other cases it would eventually time out the session. Some people claimed a refresh brought it back right away; others said refreshing did nothing. The only additional information we had was that some branches reported consistent application slowness.

Twelve months earlier, a member of the application support team had travelled to many of the branches to try to observe the problem. Using basic capture tools like Fiddler (an HTTP debugging proxy) and Wireshark (a packet analyzer), he was able to capture some baseline performance metrics, but found nothing conclusive. When I looked at the results, plus HTTP session information captured using HP Real User Monitor, two things jumped out:

1) HTTP 404 and 500 errors

2) Network spikes that corresponded closely to the complaining branches.

We also used a combination of CA Wily Introscope and Dynatrace to determine that server-side application performance was excellent, averaging sub-second response times.

What was interesting was the way the problems were hiding in plain sight. We had previously been reassured that the network was excellent. Upon presenting our findings, though, the capacity planning team sheepishly admitted that many of the complaining branches were already on an upgrade schedule, but it would be another six months before that could happen. Essentially it was a political football at that point.

The Fiddler capture from twelve months prior to our engagement had also shown HTTP errors. Some of these had been fixed, but the rest were believed to be unimportant. The thinking went that these were just image files or static reference content rather than anything of substance to the application. What had been overlooked was that many of these 404s related to JavaScript files. Thus, under certain circumstances, portions of the application functionality rendered by asynchronous HTTP requests (i.e., AJAX) were failing. All the data relevant to solving the problem was available, but no one had interpreted it successfully.

This is a really common scenario, where seemingly innocuous HTTP errors or application exceptions are ignored because the application “seems to work.” I’ve yet to find a scenario where we couldn’t significantly improve real-world performance just by fixing some “unimportant” errors.
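If you have access logs or HTTP captures from such a system, surfacing those “unimportant” errors takes only a few lines of code. Here is a minimal Java sketch; the log format, field positions, and paths are illustrative, not from the engagement described above:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ErrorScan {
    // Extracts the request path (group 1) and HTTP status (group 2) from a
    // common-log-format line. The exact format is an assumption for this sketch.
    private static final Pattern LINE =
            Pattern.compile("\"(?:GET|POST) (\\S+)[^\"]*\" (\\d{3})");

    /** Returns every failing (4xx/5xx) request -- the errors "hiding in the open." */
    public static List<String> failures(List<String> accessLogLines) {
        return accessLogLines.stream()
                .map(LINE::matcher)
                .filter(Matcher::find)
                .filter(m -> m.group(2).charAt(0) == '4' || m.group(2).charAt(0) == '5')
                .map(m -> m.group(2) + " " + m.group(1))
                .collect(Collectors.toList());
    }
}
```

A day of production logs filtered this way makes it much harder to dismiss 404s on JavaScript files as cosmetic.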



Basic Caching Math

I’ve seen this really simple problem now at two software companies, both filled with really bright developers. In both cases, the customer wanted to cache commonly-viewed images in memory at application startup. The number of images was known or could be approximated and would have well-defined categories. For example, perhaps they would want to cache pictures of Animals, Sports teams, etc. In both cases, the customers were complaining of long application startup time and significant memory issues, causing application outages. Can you guess what was broken? It’s simple math:

- A host with 2 GB of RAM.

- A Java process consuming 1 GB of RAM.

- 600 images requiring approximately 2 MB of RAM each.

See the problem? 600 images at 2 MB each is roughly 1.2 GB, but at most 1 GB of RAM remains free on the host. If your cache loading “algorithm” is to blindly load all 600 images, your host will either run out of RAM or thrash about trying to fit them into insufficient memory. Now, to be fair, this gets complicated when we can’t accurately estimate the size of the images in RAM. Even in this era of virtualization and container-based deployment, understanding the physical limits of systems determines architectural choices.
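The fix is to make the budget explicit in the cache itself rather than hoping the heap is big enough. Below is a minimal sketch (plain Java; the byte counts in the test are made up) of an LRU cache bounded by total bytes rather than entry count, so that loading 600 two-megabyte images can never consume more memory than you have decided to spend:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** An LRU cache bounded by an explicit byte budget rather than an entry count. */
public class BoundedImageCache extends LinkedHashMap<String, byte[]> {
    private final long maxBytes;      // the memory budget we are willing to spend
    private long currentBytes = 0;

    public BoundedImageCache(long maxBytes) {
        super(16, 0.75f, true);       // access-order = least-recently-used eviction
        this.maxBytes = maxBytes;
    }

    @Override
    public byte[] put(String key, byte[] image) {
        currentBytes += image.length; // assumes each key is inserted only once
        return super.put(key, image);
    }

    // Called once per put; with roughly equal-sized images, evicting one
    // entry at a time is enough to stay under budget.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        if (currentBytes > maxBytes) {
            currentBytes -= eldest.getValue().length;
            return true;
        }
        return false;
    }

    public long currentBytes() { return currentBytes; }
}
```

With a 2 GB host and a 1 GB process, a budget of a few hundred megabytes keeps the most popular images hot while the rest are loaded on demand.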


Simple Application Performance Problem Example

Let’s say you are using some kind of application performance tool (HP Diagnostics, New Relic, etc) and you see a graph that looks like this:

[Screenshot: transaction breakdown graph showing a 20-second server request]


What do we know by looking at this? That the reason the request is taking 20 seconds is four linear, synchronous HTTP requests. I see this kind of thing frequently when assessing the performance of customer applications, where subsystems are tested and considered to have acceptable performance in isolation. Then someone comes along and wants to tie several services together and doesn’t stop to reflect that four synchronous calls of 4 seconds each is automatically 16 seconds. How do you fix this? You have two basic choices:

1) Find a way to significantly improve the performance of each subsystem.

2) Find a way to call the subsystems asynchronously such that the overall execution time is reduced.
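When the subsystem calls are independent of each other, option 2 is often the cheaper fix. A minimal Java sketch using CompletableFuture (fetchFromSubsystem stands in for the real 4-second HTTP calls, shortened here to 200 ms) shows how the total wall-clock time approaches the slowest single call rather than the sum of all four:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ParallelCalls {
    // Stand-in for one of the real subsystem HTTP calls.
    static String fetchFromSubsystem(String name) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name + "-result";
    }

    /** Sequential: total time is the SUM of the call durations. */
    static List<String> callSequentially(List<String> subsystems) {
        return subsystems.stream()
                .map(ParallelCalls::fetchFromSubsystem)
                .collect(Collectors.toList());
    }

    /** Concurrent: total time approaches the SLOWEST single call. */
    static List<String> callConcurrently(List<String> subsystems) {
        List<CompletableFuture<String>> futures = subsystems.stream()
                .map(s -> CompletableFuture.supplyAsync(() -> fetchFromSubsystem(s)))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

In a real application you would also want per-call timeouts, since running in parallel means one slow subsystem now gates the whole response.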

Application Instrumentation Made Simple

There are many good application performance tools on the market today, supporting a variety of languages in both software-as-a-service and traditional shrinkwrap forms. Whether you use HP Diagnostics, CA Wily Introscope, Dynatrace, AppDynamics, New Relic, or something else, it is worth understanding where and why to apply instrumentation.

While I have Java on my mind as I write this, the rules stay the same as we approach C# or even Python or Ruby. With tooling, too much information can be as much of a problem as too little.

What not to Instrument

My recommendation not to instrument these cases is not a definitive rule. There are legitimate scenarios where instrumenting the items below could make sense, but generally as a secondary step once the location of a problem becomes clear.

1) Model / domain / “getter-setter” objects are seldom a source of performance overhead; they are simple data carriers.

2) Primitive types / built-in types. Imagine instrumenting java.lang.String. It will produce a firehose of data, and even in the unlikely event that you find a legitimate issue, how will you get it fixed?

3) Virtual Machine code. If your focus is on the performance of your application, instrumenting the (Java | ruby | python | etc) VM itself is likely to artificially degrade performance and produce copious, unusable data.

4) Vendor code / libraries. This isn’t an absolute “don’t” but be aware that you are taking on a challenge. If you find a problem in your vendor’s code, you will need to take it all the way through to convincing them that the problem is real and requires a fix.

5) Stay away from utility behavior unless you have a really good reason to apply instrumentation. Case in point, logging. Logging involves I/O operations that are already a potential performance drain, so the last thing you want to do is make it worse (unless you’ve got a really good indication that logging is the problem).


What to Instrument
1) Reflect on the generic application diagram to the left. The first thing to understand is that with many tools, your effort is reduced because common frameworks and APIs are instrumented out of the box.

2) Focus on the application’s business logic – the heart of its custom-built functionality.

3) Within the application, focus on Verb rather than Noun behaviors. Look for not only classes, but also specific methods where there is transactional behavior. Focus on specific classes and methods that interact with external systems or where there is a transition between the modules of the application – those are the places where things break.

4) Both when applying instrumentation and doing your analysis, don’t get too hung up on calculations, memory, or threads until you have an indication that they are the source of a problem. Recognize too that a profiler is different than a static analyzer.

Despite vendor warnings, you can get by with a lot of instrumentation if you know where to apply it. The main thing is to keep it focused on actual transaction components – all those classes in a system that control the workflow.
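When no commercial agent is available, the same principle (instrument the verbs, the transaction boundaries, and the module transitions) can be applied by hand. This is a tool-agnostic sketch, not the syntax of any of the products named above:

```java
import java.util.function.Supplier;

public class Instrument {
    /**
     * Times one unit of transactional work and reports it. Wrap boundary
     * methods (service calls, module transitions), not getters and setters.
     */
    public static <T> T timed(String operation, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // A real agent would ship this to a collector; here we just log it.
            System.out.println(operation + " took " + elapsedMs + " ms");
        }
    }
}
```

A call site might look like Instrument.timed("creditHistory.lookup", () -> service.fetchHistory(applicantId)); the names here are hypothetical.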

Installing the HP Diagnostics Java Agent

HP Diagnostics has agents for Java, .NET, and Python. The Java and Python agents support multiple operating systems, for which separate installers are available. In the following example we will install the Java agent on Windows using Oracle WebLogic.

Step 1: Accept the License Agreement

Be prepared to hit the “enter” key many times if you are installing using the command-line installer.





Step 2: Choose Installation Directory

After this point, the Setup Module will automatically load, enabling configuration of the agent. If you are installing the agent from the command-line, you will need to navigate to the <installation directory>/bin directory and manually launch the setupModule script for your platform.





Step 3: Choose Configuration Options

Profiler Mode: The Agent can be run in a stand-alone configuration (no integration with the Commander), free of charge.

AD License: Use this if you intend to integrate the Commander only with LoadRunner or Performance Center.

AM License: The Diagnostics Agent is also used as a data collector for HP TransactionVision. Diagnostics can also be deployed in an HP SaaS configuration. If you are installing Diagnostics in your environment and are not using TransactionVision, select only the “Diagnostics” option.
Step 4: Enter an Agent Name and Agent Group

This step is important, as the names used here will appear in the Diagnostics Commander interface. Agent Group is used where you have multiple agents all performing a similar task, for example “Production,” “Application123,” or “Cluster456.” Both Agent Name and Agent Group will be used by default for any Agent instances executed on this host. Appending “%” to your agent name causes a unique, incrementing number to be appended to each instance name.






Step 5: Agent Configuration

In this step we configure the Agent to send its data to the Mediator. This may or may not involve a proxy server, depending on your environment. In many cases, the Agent and Mediator will be on the same subnet (a good idea), with the firewall configured so that the Mediator and Commander can connect.







Step 6: Complete Installation

We will run the JREInstrumenter in the next step, so there is no need to run it here. If we were to select the checkbox in this step, the JREInstrumenter would run against the first JRE/JDK discovered on the system, which may or may not be the one used by our application. By manually executing it in the next step, we explicitly identify which JRE/JDK we intend to use.






Step 7: Proceed Using the JREInstrumenter

The JREInstrumenter is a separate application, accessible from the Windows program group. If you are installing from the command-line, you will need to navigate to the <installation directory>/bin directory and manually launch the JREInstrumenter script for your platform.



Using the JREInstrumenter, we select the JRE/JDK being used by the application to be monitored. The entire output of the JREInstrumenter is a string parameter we will append to our application startup.

Is running the JREInstrumenter required? It depends on the version of your JRE/JDK and Agent. HP strongly recommends running the JREInstrumenter, as they reserve the right to add further Agent initialization steps to it.



Step 8: Modify the Application Bootclasspath
This step is application-specific, but in summary you will append the parameter produced by the JREInstrumenter to your application’s bootclasspath. In some cases, such as IBM WebSphere Application Server, you may be able to use a graphical user interface. For Oracle WebLogic, there is a startup script where you can append the parameter. This step may take several attempts to get right.
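For the WebLogic case, the change typically amounts to one extra line in the domain’s startup script. The fragment below is illustrative only; use the exact string printed by the JREInstrumenter for your JRE in place of the placeholder:

```shell
# In the WebLogic startup script (e.g. startWebLogic.sh), before the server starts.
# The -Xbootclasspath value below is a placeholder; paste the parameter emitted
# by the JREInstrumenter for the JRE/JDK you instrumented in Step 7.
JAVA_OPTIONS="${JAVA_OPTIONS} -Xbootclasspath/p:<parameter from JREInstrumenter>"
export JAVA_OPTIONS
```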

Step 9: View the Probe

You will know the probe is functioning when you can view its user interface at http://<host>:35000. Default username and password are both “admin.”