Tuesday, 17 February 2015

The Optimum Method to Concatenate Strings in Java

Recently I was asked this question - Is it bad for performance to use the + operator to concatenate Strings in Java?

This got me thinking about the different ways in Java to concatenate Strings and how they would all perform against each other. These are the methods I'm going to investigate:
  1. Using the + operator
  2. Using a StringBuilder
  3. Using a StringBuffer
  4. Using String.concat()
  5. Using String.join (new in Java 8)
I also experimented with String.format() but that is so hideously slow that I will leave it out of this post for now.

Before we go any further we should separate two use cases:
  1. Concatenating two Strings together as a single call, for example in a logging message. Because this is only one call you would have thought that performance is hardly an issue but the results are still interesting and shed light on the subject.
  2. Concatenating two Strings in a loop.  Here performance is much more of an issue especially if your loops are large.
My initial thoughts and questions were as follows:
  1. The + operator is implemented with StringBuilder, so at least in the case of concatenating two Strings it should produce similar results to StringBuilder. What exactly is going on under the covers? 
  2. StringBuilder should be the most efficient method; after all, the class was designed for the very purpose of concatenating Strings and supersedes StringBuffer. But what is the overhead of creating the StringBuilder when compared with String.concat()?
  3. StringBuffer was the original class for concatenating Strings - unfortunately its methods are synchronized. There really is no need for the synchronization and it was subsequently replaced by StringBuilder which is not synchronized.  The question is, does the JIT optimise away the synchronisation?
  4. String.concat() ought to work well for 2 strings but does it work well in a loop?
  5. String.join() has more functionality than StringBuilder. How does it affect performance if we instruct it to join Strings using an empty delimiter?
The first question I wanted to get out of the way was how the + operator works. I'd always understood that it used a StringBuilder under the covers but to prove this we need to examine the byte code.

The easiest way to look at byte code these days is with JITWatch which is a really excellent tool created to understand how your code is compiled by the JIT.  It has a great view where you can view your source code side by side with byte code (also machine code if you want to go to that level).


Here's the byte code for a really simple method plus2() and we can see that indeed on line 6 a StringBuilder is created and appends the variables a (line 14) and b (line 18).
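To make that concrete, here is a minimal sketch of plus2() next to a hand-written version of roughly what the Java 8 compiler turns it into. The class name PlusDesugar and the method plus2Desugared() are my own illustrative names, and the second method is an approximation of the byte code, not a decompilation of it.

```java
// Sketch only: PlusDesugar and plus2Desugared are illustrative names.
class PlusDesugar {

    // The method as it is written in source code.
    static String plus2(String a, String b) {
        return a + b;
    }

    // Approximately what javac 8 generates for the method above:
    // create a StringBuilder, append a, append b, call toString().
    static String plus2Desugared(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(plus2("foo", "bar"));
        System.out.println(plus2Desugared("foo", "bar"));
    }
}
```

If you don't have JITWatch to hand, running javap -c on the compiled class shows the same byte code. Note that this applies to the Java 8 compiler used in this post; from Java 9 onwards javac compiles + to an invokedynamic call instead, so the byte code will look different there.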

I thought it would be interesting to compare this against a handcrafted use of the StringBuilder so I created another method build2() with results below.


The byte code generated here is not quite as compact as the plus2() method.  The StringBuilder is stored into the variable cache (line 13) rather than just left on the stack.  I'm not sure why this should be, but the JIT might be able to do something with this; we'll have to see how the timings look. In any case it would be very surprising if the results of concatenating 2 strings with the plus operator and the StringBuilder were significantly different.

I wrote a small JMH test to determine how the different methods performed. Let's first look at the two Strings test. See code below:
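As a rough illustration of the five two-String variants being compared, here is a plain Java sketch. It is not the benchmark source and deliberately leaves out the JMH harness; the class name TwoStringVariants and the method names are assumptions made for this example.

```java
// Sketch only: the five ways of joining two Strings, without any JMH annotations.
class TwoStringVariants {

    static String plus(String a, String b) {
        return a + b;
    }

    static String stringBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    static String stringBuffer(String a, String b) {
        // Same API as StringBuilder, but append() is synchronized.
        return new StringBuffer().append(a).append(b).toString();
    }

    static String concat(String a, String b) {
        return a.concat(b);
    }

    static String join(String a, String b) {
        // An empty delimiter turns join into plain concatenation.
        return String.join("", a, b);
    }

    public static void main(String[] args) {
        System.out.println(plus("Hello", "World"));
        System.out.println(stringBuilder("Hello", "World"));
        System.out.println(stringBuffer("Hello", "World"));
        System.out.println(concat("Hello", "World"));
        System.out.println(join("Hello", "World"));
    }
}
```

In a real JMH test each of these would be a separate benchmark method that returns its result, so that the JIT cannot discard the work as dead code.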




The results look like this:


Benchmark            Score  Score Error (99.9%)     Unit
testPlus             15750720.32957703.6388         ops/s
testStringBuffer     14545063.2812623.9396          ops/s
testStringBuilder    15671930.21436265.5796         ops/s
testStringConcat     24124041.472498000.326         ops/s
testStringJoiner     10749530.45388130.9845         ops/s



The clear winner here is String.concat().  Not really surprising as it doesn't have to pay the performance penalty of creating a StringBuilder / StringBuffer for each call. It does, though, have to create a new String each time (which will be significant later) but for the very simple case of joining two Strings it is faster.

Another point is that, as we expected, plus and StringBuilder are equivalent despite the extra byte code produced. StringBuffer is only marginally slower than StringBuilder, which is interesting and suggests that the JIT must be doing some magic to optimise away the synchronisation.

The next test creates an array of 100 Strings with 10 characters each. The benchmark compares how long it takes for the different methods to concatenate the 100 Strings together. See code below:
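For illustration, the loop variants might look something like the sketch below. Again this is not the benchmark source and has no JMH harness; the class name LoopVariants, the method names and the contents of the input Strings are all made up for the example. Only the shape of the input (100 Strings of 10 characters) comes from the description above.

```java
// Sketch only: the five ways of concatenating an array of Strings in a loop.
class LoopVariants {

    // 100 Strings of 10 characters each; the actual contents are arbitrary.
    static String[] makeInput() {
        String[] strings = new String[100];
        for (int i = 0; i < strings.length; i++) {
            strings[i] = String.format("%010d", i); // zero-padded to 10 characters
        }
        return strings;
    }

    static String plus(String[] strings) {
        String combined = "";
        for (String s : strings) {
            combined = combined + s;
        }
        return combined;
    }

    static String stringBuilder(String[] strings) {
        StringBuilder sb = new StringBuilder();
        for (String s : strings) {
            sb.append(s);
        }
        return sb.toString();
    }

    static String stringBuffer(String[] strings) {
        StringBuffer sb = new StringBuffer();
        for (String s : strings) {
            sb.append(s);
        }
        return sb.toString();
    }

    static String concat(String[] strings) {
        String combined = "";
        for (String s : strings) {
            // Strings are immutable, so the result of concat() must be assigned back.
            combined = combined.concat(s);
        }
        return combined;
    }

    static String join(String[] strings) {
        return String.join("", strings);
    }

    public static void main(String[] args) {
        String[] input = makeInput();
        System.out.println(stringBuilder(input).length());
        System.out.println(plus(input).equals(concat(input)));
    }
}
```

The important detail in the concat() variant is the assignment: calling combined.concat(s) without using the return value does no useful work, which would make that variant look faster than it really is.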




The results look quite different this time:

Benchmark            Score  Score Error (99.9%)     Unit
testPlus             82297.26461496.838588          ops/s
testStringBuffer     501613.337514461.60235         ops/s
testStringBuilder    507697.90589510.921128         ops/s
testStringConcat     403378.15917458.6318           ops/s
testStringJoiner     381805.45696572.704663         ops/s


Here the plus method really suffers.  The overhead of creating a StringBuilder every time you go round the loop is crippling. You can see this clearly in the byte code:


You can see that a new StringBuilder is created (line 30) every time the loop is executed. It is arguable that the JIT ought to spot this and be able to optimise, but it doesn't and using + becomes very slow.
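Written out by hand, the loop behaves roughly like the second method in this sketch. The names PlusInLoop, plusLoop() and plusLoopDesugared() are illustrative, and the desugared version is my approximation of the Java 8 byte code rather than a decompilation.

```java
// Sketch only: what += in a loop amounts to on a Java 8 compiler.
class PlusInLoop {

    // The loop as written in source code.
    static String plusLoop(String[] strings) {
        String combined = "";
        for (String s : strings) {
            combined += s;
        }
        return combined;
    }

    // Approximately what the compiler generates for the loop above.
    static String plusLoopDesugared(String[] strings) {
        String combined = "";
        for (String s : strings) {
            // A fresh StringBuilder on every iteration. It copies everything
            // accumulated so far, appends s, then copies again in toString().
            combined = new StringBuilder().append(combined).append(s).toString();
        }
        return combined;
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(plusLoop(parts));
        System.out.println(plusLoopDesugared(parts));
    }
}
```

Because the accumulated text is copied on every pass, the amount of copying grows with the square of the number of Strings, whereas a single StringBuilder declared outside the loop only appends each String once.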

Again StringBuilder and StringBuffer perform virtually the same but this time they are both faster than String.concat().  The price that String.concat() pays for creating a new String on each iteration of the loop eventually mounts up and a StringBuilder becomes more efficient.

String.join() does pretty well given all the extra functionality you can add to this method but, as expected, for pure concatenation it is not the best option.

Summary

If you are concatenating Strings in a single line of code I would use the + operator as it is the most readable and performance really doesn't matter that much for a single call. Also beware of String.concat() as you will almost certainly need to carry out a null check which is not necessary with the other methods. 
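The null behaviour is easy to demonstrate. In the sketch below the class NullHandling and the helper safeConcat() are hypothetical names of my own; safeConcat() shows just one possible policy (treat null as empty), which is not the same as what + does.

```java
// Sketch only: how + and String.concat() differ when an argument is null.
class NullHandling {

    // The + operator converts null to the text "null".
    static String plus(String a, String b) {
        return a + b;
    }

    // Hypothetical helper: the kind of null check concat() forces on you.
    // Here null is treated as an empty String.
    static String safeConcat(String a, String b) {
        String left = (a == null) ? "" : a;
        String right = (b == null) ? "" : b;
        return left.concat(right);
    }

    public static void main(String[] args) {
        String missing = null;

        System.out.println(plus("a", missing)); // prints "anull"

        try {
            System.out.println("a".concat(missing));
        } catch (NullPointerException e) {
            System.out.println("concat threw NullPointerException");
        }

        System.out.println(safeConcat("a", missing)); // prints "a"
    }
}
```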

When you are concatenating Strings in a loop you should use a StringBuilder.  You could use a StringBuffer but I wouldn't necessarily trust the JIT in all circumstances to optimise away the synchronization as efficiently as it would in a benchmark. 

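All my results were achieved using JMH and they come with the usual microbenchmark health warning: the numbers are specific to my JVM and hardware, so measure your own code before drawing firm conclusions.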

4 comments:

  1. Nice work, but I think results would differ with different JVMs.

    1. Yes, absolutely correct, and in fact the results might not only differ between major versions but minor versions as well. For example escape analysis has greatly improved in Java 8 to support efficient use of lambda expressions. Any changes to the JIT might make a material difference to the results of my tests.

      Nevertheless I think the basis and conclusions from my tests are sound. The source code for the tests is included so feel free to run against different JVMs and I would be very interested to see how you get on.

      All my tests were done against this version:
      java version "1.8.0_25"
      Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
      Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

  2. The concat test has an error. You have to use

    combined = combined.concat(s);

  3. What about Intel Turbo Boost technology that changes CPU frequency during high load? Did you disable it before testing?
