Monday, 11 January 2016

Writing 2 Characters into a Single Java char

Here's another nice trick we used when creating the ultra low latency Chronicle FIX-Engine.

When reading data off a stream of bytes it is far more efficient, where possible, to store the data in a char rather than read it into a String. (At the very least you avoid creating a String object. That cost can be mitigated by using a cache or by working with CharSequence rather than String, but that's the subject of another post.)

Using JMH benchmarks I've found these timings: (I haven't included the source code for this as this is going to be the subject of another post where I describe the different methodologies in more detail).

Reading 2 ASCII characters off a byte stream into:

String - 34.48ns
Pooled String - 28.57ns
StringBuilder - 21.27ns
char (using 2 chars method) - 6.75ns

(As a baseline, reading a single char takes 3.27ns.)

The point is that it takes at least 3 times longer to read data into a String than a char, and that doesn't even take into account the garbage created.

So it goes without saying that when you know that you are expecting data that is always a single character, rather than reading that data into a String variable you should read it into a char.

Now what if you know that the data you are expecting on the stream is no more than 2 characters? (You find this situation, for example, in FIX 5.0 tag 35, msgType.) Do you have to use a String so that you can accommodate the extra character? At first thought it appears so; after all, a char can only contain a single character.

Or can it? 

A Java char is made up of 2 bytes, not one. Therefore, if you know that your data is made up of ASCII characters, you know that only a single byte (of the 2 bytes in the char) will be used. For example, 'A' is 65 and the letters run through to 'z', which is 122.

You can print out the values that fit into a single byte with this simple loop:

for (int i = 0; i < 256; i++) {
    char c = (char) i;
    System.out.println(i + ":" + c);
}

You are now free to use the other byte of the char to hold the second ASCII character.

This is the way to do it: 

In this example you have read 2 bytes 'a' and 'b' and want to store them in a single char.

byte a = (byte)'a';
byte b = (byte)'b';
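// Now place a and b into a single char: a goes in the top 8 bits, b in the bottom 8 bits.
// (Using + is safe here because ASCII bytes are never negative.)
char ab = (char)((a << 8) + b);
// To retrieve the characters individually, shift down for the first and mask for the second
System.out.println((char)(ab >> 8) + "" + (char)(ab & 0xff)); // prints "ab"

To better understand this let's look at the binary: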

byte a = (byte)'a'; // 01100001
byte b = (byte)'b'; // 01100010

As you can see below, when viewed as a char, the top 8 bits are not being used:

char ca = 'a'; // 00000000 01100001
char cb = 'b'; // 00000000 01100010

Combine the characters, with a taking the top 8 bits and b the bottom 8 bits:

char ab = (char)((a << 8) + b); // 01100001 01100010
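
The packing and unpacking steps can be wrapped up in a few small helpers. This is only a sketch: the class and method names (TwoCharPacker, pack, high, low) are made up for illustration and are not taken from Chronicle FIX. It also masks each byte with 0xFF and uses | instead of +, an extra precaution of mine so that a byte above 127 (negative as a Java byte) cannot sign-extend into the other half. For pure ASCII input it gives the same result as the code above.

```java
public class TwoCharPacker {

    // Hypothetical helper: stores 'first' in the top 8 bits and 'second' in the bottom 8 bits.
    // Masking with 0xFF guards against sign extension for byte values above 127.
    public static char pack(byte first, byte second) {
        return (char) (((first & 0xFF) << 8) | (second & 0xFF));
    }

    // Recovers the character held in the top 8 bits.
    public static char high(char packed) {
        return (char) (packed >> 8);
    }

    // Recovers the character held in the bottom 8 bits.
    public static char low(char packed) {
        return (char) (packed & 0xFF);
    }

    public static void main(String[] args) {
        char ab = pack((byte) 'a', (byte) 'b');
        System.out.println("" + high(ab) + low(ab)); // prints "ab"
    }
}
```

Because the result is still an ordinary char, it can be compared with == or used in a switch without allocating anything.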


To summarise: it's more efficient to read data into a char than into a String. If you know that you have a maximum of 2 ASCII characters, they can be combined into a single Java char. Of course, only use this technique if you really are worried about ultra low latency!
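
As a rough illustration of how this might be applied to a field like FIX tag 35, here is a sketch that reads a one- or two-character value from a byte array into a single char and then switches on it. Everything in it is an assumption made for the example: the class MsgTypeReader, the methods readMsgType and describe, and the constants are hypothetical, and this is not how Chronicle FIX itself is implemented. A one-character value such as "D" simply ends up in the bottom 8 bits with the top 8 bits left at zero.

```java
import java.nio.charset.StandardCharsets;

public class MsgTypeReader {

    static final byte SOH = 1; // FIX field delimiter

    // Packed constants: a single character sits in the low byte, two characters fill both bytes.
    static final char NEW_ORDER_SINGLE = 'D';                               // "D"
    static final char TRADE_CAPTURE_REPORT = (char) (('A' << 8) | 'E');     // "AE"

    // Hypothetical helper: reads up to 2 ASCII bytes starting at 'offset', stopping at SOH
    // (or the end of the buffer), and packs them into one char without creating a String.
    public static char readMsgType(byte[] buf, int offset) {
        char result = 0;
        for (int i = offset; i < buf.length && buf[i] != SOH; i++) {
            if (i - offset >= 2) {
                throw new IllegalArgumentException("value longer than 2 characters");
            }
            // Shift what we have so far into the top byte and add the new byte at the bottom.
            result = (char) ((result << 8) | (buf[i] & 0xFF));
        }
        return result;
    }

    public static String describe(char msgType) {
        switch (msgType) {
            case NEW_ORDER_SINGLE:     return "NewOrderSingle";
            case TRADE_CAPTURE_REPORT: return "TradeCaptureReport";
            default:                   return "other";
        }
    }

    public static void main(String[] args) {
        byte[] msg = "35=AE\u0001".getBytes(StandardCharsets.US_ASCII);
        char msgType = readMsgType(msg, 3); // the value starts after "35="
        System.out.println(describe(msgType)); // prints "TradeCaptureReport"
    }
}
```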


  1. Hi Daniel,

    Thanks for the post.

    I wonder why you don't read plain bytes from a byte stream.

    1. Yes, for ultimate speed you would keep them as bytes, but more often than not the bytes off a stream need to go into some data model so that they can be manipulated further, whether as doubles, ints, Strings, etc.
