Friday 20 February 2015

Starting out with jHiccup

After writing my post on 'How to detect and diagnose slow code in production' I was encouraged by a reader to try out jHiccup from Azul Systems.

Last year I went to a talk by jHiccup's creator Gil Tene on the correct way to measure latency, where, amongst other things, he introduced us to jHiccup. It had been on my to-do list of products to investigate, and this gave me the impetus to finally get on with my investigation.

jHiccup measures the latency of the system over and above your actual program code: for example, GC pauses and other OS and hardware events that add latency spikes to the smooth running of your program. It is critical to understand these because your program can never have better latencies than the underlying environment on which it is running.

To cut a long story short, I love the tool and think it's going to be really useful now that I've started using it. This post is not about teaching you everything about jHiccup; I refer you to the documentation for that. This post is a 'starting out with jHiccup' guide, showing how I got it running and hopefully whetting your appetite to try it out on your own code.

Step 1: Download the product

Download the code; it's open source and you can get it from here. Unzip the file and you will see jHiccup.jar, which we will use in the next step.

Step 2: Run your program with jHiccup

There are a number of ways of running jHiccup but this is how I did it. All you need to do is add this VM option to your command line:

-javaagent:jHiccup.jar="-d 0 -i 1000 -l hiccuplog -c"

There are lots of configuration options; the ones chosen here mean:

  • -d The delay before starting to record latencies - this allows you to ignore any code warm-up period. (default 30s)
  • -i Interval data, how often the data is recorded. (default every 5s)
  • -l The name of the log file to which data is recorded.
  • -c Launches a control JVM and records data to logFile.c. Super useful to compare against your actual program to see if there was a global event on the machine.

Step 3: Process the log file

Run this command on your log file (you can process both the program log file and the .c control log file):

jHiccupLogProcessor -i hiccuplog -o myhlog

This produces two files; we are interested in the one called myhlog (not myhlog.hgram), which we will use in the final step.

Step 4: Produce graph in Excel

Now for the really nice bit.  Open the spreadsheet jHiccupPlotter.xls and make sure you enable the macros.

You will see a sheet like this:

Just select your file from step 3 and choose a chart title (a really useful feature when you come to compare your graphs) and in a matter of seconds you will have your latency distribution graph.


I had a program (not particularly latency sensitive) and wanted to understand the effect that the different garbage collectors had on latency.

All I had to do was run my program with different garbage collector settings and compare the graphs. Of course this was a slightly manufactured example I happened to have to hand, but you get the point: it's easy to change JVM settings or code and get comparable results. The control program is also critical to understanding what else is happening on your machine that might be affecting the latency of your program.

These are my results. It's interesting to see how the different GCs affect the latency, and this is demonstrated beautifully by jHiccup.

Using the serial collector:

Using the G1 collector

Using the G1 collector - max pause set to 1ms

Using the CMS collector

Using the parallel GC

Wednesday 18 February 2015

Java 8 pitfall - Beware of Files.lines()

There's a really nice new feature in Java 8 which allows you to get a stream of Strings from a file in a one-liner.

List<String> lines = Files.lines(path).collect(Collectors.toList());

You can manipulate the Stream as you would with any other Stream for example you might want to filter() or map() or limit() or skip() etc.
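For instance, a quick sketch of chaining a few of those operations (the file contents and names here are my own invention):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamOfLines {
    // Skip the first line, keep lines containing 't', upper-case them,
    // and take at most two results.
    static List<String> process(Path path) throws IOException {
        return Files.lines(path)
                .skip(1)
                .filter(s -> s.contains("t"))
                .map(String::toUpperCase)
                .limit(2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, Arrays.asList("alpha", "beta", "gamma", "delta"));
        System.out.println(process(path)); // prints [BETA, DELTA]
    }
}
```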

I started using this all over my code until I was hit with this exception,

Caused by: java.nio.file.FileSystemException: /tmp/date.txt: Too many open files in system
at sun.nio.fs.UnixException.translateToIOException(...)
at sun.nio.fs.UnixException.rethrowAsIOException(...)
at sun.nio.fs.UnixException.rethrowAsIOException(...)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(...)
at java.nio.file.Files.newByteChannel(...)
at java.nio.file.Files.newByteChannel(...)
at java.nio.file.spi.FileSystemProvider.newInputStream(...)
at java.nio.file.Files.newInputStream(...)
at java.nio.file.Files.newBufferedReader(...)
at java.nio.file.Files.lines(...)
at java.nio.file.Files.lines(...)

For some reason I had too many open files! Odd, doesn't Files.lines() close the file?

See the code below (run3()), where I've reproduced the issue:

My code looked something like run3(), which produced the exception. I proved this by running the Unix command lsof (which lists open files) and noticing many, many instances of date.txt open. To check that the problem was indeed with Files.lines() I made sure that the code ran fine with run1(), using a BufferedReader, which it did. By reading through the source code for Files I realised that the Stream needs to be created in a try-with-resources block. When I implemented that in run2() the code ran fine again.
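The original listing isn't reproduced here; based on the description, the three variants presumably looked something like this sketch (the method names run1/run2/run3 come from the text, the rest is my assumption):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilesLinesDemo {
    // run1: BufferedReader in try-with-resources - the file is always closed.
    static List<String> run1(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // run2: Files.lines() wrapped in try-with-resources - closing the Stream
    // closes the underlying file, so this is also safe.
    static List<String> run2(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.collect(Collectors.toList());
        }
    }

    // run3: the tempting one-liner - the Stream (and hence the file handle)
    // is never closed, which eventually exhausts the open-file limit.
    static List<String> run3(Path path) throws IOException {
        return Files.lines(path).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("date", ".txt");
        Files.write(path, Arrays.asList("line1", "line2"));
        System.out.println(run2(path)); // prints [line1, line2]
    }
}
```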

In my opinion this is not particularly intuitive. It really spoils the one-liner when you have to use try-with-resources. I guess the code does need a signal for when to close the file, but somehow it would be nice if that were hidden from us. At the very least it should be highlighted in the JavaDoc, which it is not :-)

Tuesday 17 February 2015

The Optimum Method to Concatenate Strings in Java

Recently I was asked this question - Is it bad for performance to use the + operator to concatenate Strings in Java?

This got me thinking about the different ways in Java to concatenate Strings and how they would all perform against each other. These are the methods I'm going to investigate:
  1. Using the + operator
  2. Using a StringBuilder
  3. Using a StringBuffer
  4. Using String.concat()
  5. Using String.join() (new in Java 8)
I also experimented with String.format() but that is so hideously slow that I will leave it out of this post for now.

Before we go any further we should separate two use cases:
  1. Concatenating two Strings together as a single call, for example in a logging message. Because this is only one call you would have thought that performance is hardly an issue but the results are still interesting and shed light on the subject.
  2. Concatenating two Strings in a loop.  Here performance is much more of an issue especially if your loops are large.
My initial thoughts and questions were as follows:
  1. The + operator is implemented with StringBuilder, so at least in the case of concatenating two Strings it should produce similar results to StringBuilder. What exactly is going on under the covers? 
  2. StringBuilder should be the most efficient method, after all the class was designed for the very purpose of concatenating Strings and supersedes StringBuffer. But what is the overhead of creating the StringBuilder when compared with String.concat()?
  3. StringBuffer was the original class for concatenating Strings - unfortunately its methods are synchronized. There really is no need for the synchronization and it was subsequently replaced by StringBuilder which is not synchronized.  The question is, does the JIT optimise away the synchronisation?
  4. String.concat() ought to work well for 2 strings but does it work well in a loop?
  5. String.join() has more functionality than StringBuilder; how does it affect performance if we instruct it to join Strings using an empty delimiter?
The first question I wanted to get out of the way was how the + operator works. I'd always understood that it used a StringBuilder under the covers but to prove this we need to examine the byte code.

The easiest way to look at byte code these days is with JITWatch which is a really excellent tool created to understand how your code is compiled by the JIT.  It has a great view where you can view your source code side by side with byte code (also machine code if you want to go to that level).

Here's the byte code for a really simple method plus2() and we can see that indeed on line 6 a StringBuilder is created and appends the variables a (line 14) and b (line 18).

I thought it would be interesting to compare this against a handcrafted use of the StringBuilder, so I created another method build2() with the results below.
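The byte code listings appeared as images in the original post; the Java source they were compiled from presumably looked something like this sketch (the field names a and b and the variable cache are taken from the surrounding text, the rest is my assumption):

```java
public class Concat2 {
    private String a = "abc";
    private String b = "def";

    // Compiles to byte code that creates a StringBuilder on the stack
    // and appends a and b.
    public String plus2() {
        return a + b;
    }

    // Handcrafted equivalent; the builder is stored in a local variable
    // rather than just left on the stack.
    public String build2() {
        StringBuilder cache = new StringBuilder();
        cache.append(a).append(b);
        return cache.toString();
    }
}
```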

The byte code generated here is not quite as compact as that of the plus2() method. The StringBuilder is stored into the variable cache (line 13) rather than just left on the stack. I'm not sure why this should be, but the JIT might be able to do something with it; we'll have to see how the timings look. In any case it would be very surprising if the results of concatenating two Strings with the plus operator and with the StringBuilder were significantly different.

I wrote a small JMH test to determine how the different methods performed. Let's first look at the two Strings test. See code below:

The results look like this:

Benchmark        Score        Score Error (99.9%)        Unit

The clear winner here is String.concat(). Not really surprising, as it doesn't have to pay the performance penalty of creating a StringBuilder / StringBuffer for each call. It does, though, have to create a new String each time (which will be significant later), but for the very simple case of joining two Strings it is faster.

Another point is that, as we expected, plus and StringBuilder are equivalent despite the extra byte code produced. StringBuffer is only marginally slower than StringBuilder, which is interesting and shows that the JIT must be doing some magic to optimise away the synchronisation.

The next test creates an array of 100 Strings with 10 characters each. The benchmark compares how long it takes for the different methods to concatenate the 100 Strings together. See code below:
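A plain-Java sketch of the strategies being compared (class and method names are mine; the real benchmark wraps code like this in JMH @Benchmark methods):

```java
import java.util.Arrays;

public class LoopConcat {
    // +: the compiler creates a new StringBuilder on every iteration.
    static String withPlus(String[] parts) {
        String s = "";
        for (String p : parts) {
            s = s + p;
        }
        return s;
    }

    // StringBuilder: one builder reused across the whole loop.
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // String.concat(): no builder, but a new String is allocated per iteration.
    static String withConcat(String[] parts) {
        String s = "";
        for (String p : parts) {
            s = s.concat(p);
        }
        return s;
    }

    // String.join() with an empty delimiter.
    static String withJoin(String[] parts) {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        String[] parts = new String[100];
        Arrays.fill(parts, "0123456789");
        System.out.println(withBuilder(parts).length()); // prints 1000
        System.out.println(withPlus(parts).equals(withJoin(parts))); // prints true
    }
}
```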

The results look quite different this time:

Benchmark        Score        Score Error (99.9%)        Unit

Here the plus method really suffers.  The overhead of creating a StringBuilder every time you go round the loop is crippling. You can see this clearly in the byte code:

You can see that a new StringBuilder is created (line 30) every time the loop is executed. It is arguable that the JIT ought to spot this and be able to optimise it away, but it doesn't, and using + becomes very slow.

Again StringBuilder and StringBuffer perform almost exactly the same, but this time they are both faster than String.concat(). The price that String.concat() pays for creating a new String on each iteration of the loop eventually mounts up and StringBuilder becomes more efficient.

String.join() does pretty well given all the extra functionality you can add to this method but, as expected, for pure concatenation it is not the best option.


If you are concatenating Strings in a single line of code I would use the + operator, as it is the most readable and performance really doesn't matter that much for a single call. Also beware that with String.concat() you will almost certainly need to carry out a null check, which is not necessary with the other methods.

When you are concatenating Strings in a loop you should use a StringBuilder.  You could use a StringBuffer but I wouldn't necessarily trust the JIT in all circumstances to optimise away the synchronization as efficiently as it would in a benchmark. 

All my results were achieved using JMH and they come with the usual health warning.  

Wednesday 11 February 2015

How to detect and diagnose slow code in production

One of the more difficult tasks that a developer faces is finding and diagnosing slow running code in production.

Firstly how do you monitor your production code without slowing it down? Certainly you can't run production code through a profiler.  [EDIT] It has since been pointed out to me that adaptive profilers such as Satoris can be run in production. Even if you have a mechanism for timing your code, how then do you diagnose the issue? This is especially the case if you can't reproduce the problem in a development environment. Ideally you want to be notified when a problem happens in production and be supplied with enough information to have a reasonable chance of being able to fix or at least diagnose the issue.

Here's a mechanism suggested by my colleague Peter Lawrey that you can use to do exactly that. (Full code listing can be found here).

What you do is to create a Monitor class as below:

public class Monitor implements Runnable{
  private final Thread thread;
  private final AtomicLong startTime 
                  = new AtomicLong(Long.MAX_VALUE);
  private final int thresholdMS;

  public Monitor(Thread thread, int thresholdMS){
    this.thread = thread;
    this.thresholdMS = thresholdMS;
  }

  public void reset(){
    startTime.set(System.currentTimeMillis());
  }

  public void run(){
    while(true){
      long timeTaken = System.currentTimeMillis()-startTime.get();
      if(timeTaken > thresholdMS){
        System.out.println(timeTaken + "-------------------------");
        for(StackTraceElement element : thread.getStackTrace()){
          System.out.println("\tat " + element);
        }
      }
      try {
        Thread.sleep(thresholdMS / 2);
      } catch (InterruptedException e) {
        return;
      }
    }
  }
}
This class will dump the stack trace of the running thread if the thread fails to reset within the threshold time.

Here's an example program demonstrating how the Monitor would be called.

Monitor monitor = new Monitor(Thread.currentThread(), 8);
Thread thread = new Thread(monitor, "MonitorThread");
thread.setDaemon(true);
thread.start();

while(true) {
   monitor.reset();
   double x = 0;
   for (int i = 0; i < 10_000; i++) {
     x += Math.sqrt(i);
     Logger.getLogger(getClass().getName()).fine("x=" + x);
   }
}
This 'critical' piece of code is observed by the Monitor, which will dump out a stack trace if it is not reset within 8ms.

If you have a Monitor watching your critical sections of code you can be sure that they are performing within the given constraint. If the code does break the constraint you will have a good idea of where the problem lies by examining the stack trace. You can also use this to see how many times your critical code was not performant during its run time.

You can alter the wait time if you don't have the luxury of a dedicated spare CPU for monitoring. You might also want to change the wait strategy to allow for GC pauses, which would affect all threads, or make the timing finer-grained by using System.nanoTime() rather than working in milliseconds.

Monday 2 February 2015

JMH: How to setup and run a JMH benchmark

Health Warning!
This post describes how to setup and run a simple JMH benchmark.  Micro benchmarks are notoriously difficult to get right and even when you do get them right (by using tools such as JMH) they can still be misleading. Just because your code runs in a certain way in an extremely isolated artificial situation does not mean it will run in the same way inside your production code. To name but a few issues, in a real program the CPU caches will be subject to pressures from other parts of your code, any object creation will have a downstream effect on GC and the JIT may have inlined and compiled code from other parts of your code that conflict with the code you have benchmarked. Nevertheless micro benchmarks do have their place and if you are going to do them you might as well do them properly with JMH.

In a recent post I was asked to execute my tests as a JMH performance benchmark.

JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM. See full documentation here.

Amongst other things JMH is great because it takes care of warm-up iterations, forking JVM processes so that benchmarks don't interfere with each other, and collating results and presenting them in a uniform manner. And there's much, much more.

I'd heard a lot about JMH and seen many JMH results but never actually run one myself. It was surprisingly easy! This is how I did it.

There are two ways to run a benchmark:

  1. Add the JMH Maven dependencies to your pom file and then add a main method to your code using the Runner object. This is useful if you want to run in your IDE. There is some discussion about whether running inside your IDE makes a difference to the results; see here for more details. The reported difference is 2.2%.
  2. The recommended way is to generate a pom file and use that to create a jar. The mvn install uses the shade plugin to create a jar file so that you don't have to create a main method.

Method 1 - For running in your IDE

Add these dependencies to your Maven pom.xml file:
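Something along these lines should work (the version number here is an assumption - substitute the latest JMH release):

```xml
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.21</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.21</version>
    <scope>provided</scope>
</dependency>
```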


Then decide which methods you want benchmarked and add the annotation @Benchmark to them. If you need any initialisation code, add it in a method marked @Setup.

The easiest way to run the benchmark is by adding this implementation into your main method. (See here for other ways to run your tests).

public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
}

Alternatively, even more simply use this code:

public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
}
Then just run as you would any normal program and you will get all the JMH goodness!

Method 2 - The recommended way

Generate a pom file using this mvn command:
$ mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-java-benchmark-archetype \
          -DgroupId=org.sample \
          -DartifactId=test \
          -Dversion=1.0
This will create a project called test with an empty benchmark in it called MyBenchmark.
To build the project just use mvn clean install. This will build a jar called benchmark.jar. It is benchmark.jar that should be run to run the benchmark, not any of the other jars produced along the way in your target folder.

To run use the command java -jar benchmark.jar - that's it.

As an example to see the format of a JMH benchmark, this is what my results looked like:

Benchmark                                         Mode  Cnt     Score    Error  Units
CompTestBenchmark.bmCustomComparator             thrpt   20  2598.617 ± 67.634  ops/s
CompTestBenchmark.bmJDKComparator                thrpt   20   751.110 ± 14.835  ops/s
CompTestBenchmark.bmNoVTLComparator              thrpt   20  1348.750 ± 30.382  ops/s
CompTestBenchmark.bmNoVTLOrAutoBoxingComparator  thrpt   20  2202.995 ± 43.673  ops/s

There are an enormous number of bells and whistles to fine tune your benchmarks which I'm not going into here but hopefully this will get you up and running.

For a full code listing of my test see here.

To see all the options you have available use the command:

java -jar benchmark.jar -h

One of the most useful options is to run the benchmarks with a profiler. To list the available profilers on your system (for example perf is only available on Linux), use the command:

java -jar benchmark.jar -lprof

To run the test with a simple stack profiler (available on all systems) use

java -jar benchmark.jar -prof stack

There are loads of options, including annotations to switch off inlining, to vary iterations and many more. I encourage you to have a look at them - here's some documentation to start with.

Maven Tip: All about executable jars

An executable jar is an extremely useful artefact when it comes to distributing your code. It means that, as long as Java is installed on the client machine (on Windows and Mac at least), your users can just double-click the jar and the program will launch. Alternatively the program can easily be launched from the command line with java -jar xxx.jar. No fussing with classpaths and dependent jars.

Creating an executable jar from Maven is pretty straightforward and involves using the maven-assembly-plugin. This can be configured and added to your pom.xml as below:
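A sketch of such a configuration, matching the elements explained underneath (com.example.Main stands in for whatever your entry point is):

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <executions>
        <execution>
            <phase>install</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.example.Main</mainClass>
            </manifest>
        </archive>
    </configuration>
</plugin>
```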


Let's drill into the details:
<goal>single</goal> tells the plugin that we want to execute the single goal. For more documentation see here.
<phase>install</phase> tells the plugin that we want this task to be run as part of the install lifecycle event.
<descriptorRef>jar-with-dependencies</descriptorRef> tells the plugin that we want the jar to include all dependencies.
<mainClass>com.example.Main</mainClass> tells the plugin which class should be launched when the jar is executed.

In this case I set up the plugin to produce an executable jar during the install life-cycle, but of course you can change that setting if you want it to be part of a different phase of the life-cycle.

Another task you will want Maven to do for you is to create your executable jar as part of the release.

In order to do this you will need to configure the maven-release-plugin to create the executable jar. This can be done as below:
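A sketch of what that configuration might look like, keeping the default preparationGoals and adding install (as explained underneath):

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-release-plugin</artifactId>
    <configuration>
        <preparationGoals>clean verify install</preparationGoals>
        <goals>install</goals>
    </configuration>
</plugin>
```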


The key thing here is the configuration of preparationGoals, which are called by release:prepare. When install is configured as a preparationGoal it is executed before committing (the default preparationGoals are clean verify). By adding install to the preparationGoals we ensure that our executable jar will be built before the commit. It will be tagged with the same number as the release: if we are moving from version 1.3 to 1.4 the executable jar will be named xxx-1.4-jar-with-dependencies.jar.

Compare what happens when you configure install as a completionGoal instead of a preparationGoal. In the same example as above, moving from version 1.3 to 1.4, install would be run after the commit and the resulting executable jar would be called xxx-1.5-SNAPSHOT-jar-with-dependencies.jar.

The goals configuration (as opposed to preparationGoals and completionGoals) is run by release:perform. By default it calls deploy; see here as to why I configured it to call install.

In summary I hope you can see how easy it is to build an executable jar with Maven and to have it released with the correct release tag every time you release your project.