Monday, 11 January 2016

Writing 2 Characters into a Single Java char

Here's another nice trick we used when creating the ultra low latency Chronicle FIX-Engine.

When reading data off a stream of bytes it is far more efficient, where possible, to store the data in a char than to read it into a String. (At the very least you avoid creating a String object, although this can be mitigated by using a cache or by working with CharSequence rather than String, but that's the subject of another post.)

Using JMH benchmarks I found the timings below. (I haven't included the source code, as that will be the subject of another post where I describe the different methodologies in more detail.)

Reading 2 ASCII characters off a byte stream into:

String - 34.48ns
Pooled String - 28.57ns
StringBuilder - 21.27ns
char (using 2 chars method) - 6.75ns

(As a baseline, reading a single char takes 3.27ns.)

The point is that reading the data into a String, a pooled String or a StringBuilder takes between 3 and 5 times longer than reading it into a char, and that doesn't even take into account the garbage created.

So it goes without saying that when you know that you are expecting data that is always a single character, rather than reading that data into a String variable you should read it into a char.

Now what if you know that the data you are expecting on the stream is no more than 2 characters? (You find this situation, for example, in FIX 5.0 tag 35, MsgType.) Do you have to use a String so that you can accommodate the extra character? At first thought it appears so; after all, a char can only contain a single character.

Or can it? 

A Java char is made up of 2 bytes, not one. ASCII code points only run from 0 to 127, so if you know that your data is made up of ASCII characters, you know that only a single byte (of the 2 bytes in the char) will be used. For example, 'A' is 65, through to 'z', which is 122.

You can print out the values that fit into a single byte with this simple loop:

for (int i = 0; i < 256; i++) {
    char c = (char)i;
    System.out.println(i+ ":" + c);
}
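
As a quick check of the claim above, here is a minimal sketch that confirms the upper byte of a char is zero for every ASCII character. The class and method names are purely illustrative and are not taken from the Chronicle FIX-Engine.

```java
final class AsciiHighByteCheck {
    // Returns the upper 8 bits of a 16-bit Java char.
    static int highByte(char c) {
        return c >> 8;
    }

    // Returns true if the upper byte is zero for every ASCII code point (0 to 127).
    static boolean asciiLeavesHighByteFree() {
        for (char c = 0; c < 128; c++) {
            if (highByte(c) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(asciiLeavesHighByteFree()); // true
        System.out.println(highByte('\u20AC'));        // 32: the euro sign needs both bytes
    }
}
```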

You are now free to use the other byte of the char to hold the second ASCII character.

This is the way to do it: 

In this example you have read 2 bytes 'a' and 'b' and want to store them in a single char.


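byte a = (byte) 'a';
byte b = (byte) 'b';
// Now place a and b into a single char: a in the top 8 bits, b in the bottom 8 bits.
// (This is safe for ASCII, where both bytes are non-negative.)
char ab = (char) ((a << 8) + b);
// To retrieve the two characters individually, shift and mask:
System.out.println((char) (ab >> 8) + "" + (char) (ab & 0xff)); // prints "ab"

To better understand this let's look at the binary: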

byte a = (byte) 'a'; // 01100001
byte b = (byte) 'b'; // 01100010

As you can see below, when viewed as a char, the top 8 bits are not being used:

char ca = 'a'; // 00000000 01100001
char cb = 'b'; // 00000000 01100010

Combine the characters, with a taking the top 8 bits and b the bottom 8 bits:

char ab = (char) ((a << 8) + b); // 01100001 01100010
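
The packing and unpacking steps above can be wrapped in a small helper. This is only a sketch under my own assumptions: the class name TwoCharCodec and its methods are hypothetical, a single-character value is assumed to live in the bottom byte with the top byte left at zero, and `| (x & 0xFF)` is swapped in for `+` so that a stray non-ASCII (negative) byte cannot sign-extend into the other half. For pure ASCII input both forms give the same result.

```java
final class TwoCharCodec {
    // Packs two bytes into one char: 'high' in the top 8 bits, 'low' in the bottom 8 bits.
    static char pack(byte high, byte low) {
        // Masking with 0xFF stops a negative byte from sign-extending into the upper bits.
        return (char) (((high & 0xFF) << 8) | (low & 0xFF));
    }

    // Stores a single byte in the bottom 8 bits, leaving the top byte zero.
    static char pack(byte only) {
        return (char) (only & 0xFF);
    }

    // The character held in the top 8 bits (0 if the value is a single character).
    static char high(char packed) {
        return (char) (packed >> 8);
    }

    // The character held in the bottom 8 bits.
    static char low(char packed) {
        return (char) (packed & 0xFF);
    }

    // Builds a String for logging or debugging only; this allocates, so keep it off the hot path.
    static String unpack(char packed) {
        char hi = high(packed);
        return hi == 0 ? String.valueOf(low(packed)) : "" + hi + low(packed);
    }

    public static void main(String[] args) {
        char ab = pack((byte) 'a', (byte) 'b');
        System.out.println(unpack(ab));               // ab
        System.out.println(ab == 0x6162);             // true
        System.out.println(unpack(pack((byte) 'D'))); // D
    }
}
```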

Summary

It's more efficient to read data into a char than into a String. If you know that you have a maximum of 2 ASCII characters, they can be combined into a single Java char. Of course, only use this technique if you really are worried about ultra low latency!
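
To show how this might look for the FIX MsgType case, here is a rough sketch that reads a 1 or 2 character value from a ByteBuffer up to the SOH field delimiter and switches on the packed char without ever creating a String. It is not the Chronicle FIX-Engine's code: the class MsgTypeReader, its constants and its methods are hypothetical, and it assumes the buffer is already positioned just after "35=".

```java
import java.nio.ByteBuffer;

final class MsgTypeReader {
    static final byte SOH = 0x01; // FIX field delimiter

    // Packed constants (illustrative names). A single character sits in the bottom byte.
    static final char NEW_ORDER_SINGLE = 'D';                            // "D"
    static final char TRADE_CAPTURE_REPORT = (char) (('A' << 8) | 'E');  // "AE"

    // Reads 1 or 2 ASCII bytes up to SOH and packs them into a char.
    // Throws BufferUnderflowException if the buffer ends before SOH is seen.
    static char readMsgType(ByteBuffer in) {
        int packed = 0;
        int count = 0;
        byte b;
        while ((b = in.get()) != SOH) {
            if (++count > 2) {
                throw new IllegalArgumentException("value longer than 2 characters");
            }
            // Shift what we have so far into the top byte and add the new byte below it.
            packed = (packed << 8) | (b & 0xFF);
        }
        return (char) packed;
    }

    // The packed constants are compile-time constants, so they can be used as case labels.
    static String describe(char msgType) {
        switch (msgType) {
            case NEW_ORDER_SINGLE:     return "NewOrderSingle";
            case TRADE_CAPTURE_REPORT: return "TradeCaptureReport";
            default:                   return "other";
        }
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] {'A', 'E', SOH});
        System.out.println(describe(readMsgType(in))); // TradeCaptureReport
    }
}
```

The String returned by describe is only there to make the example printable; in latency-sensitive code you would branch on the char directly.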

3 comments:

  1. Hi Daniel,

    Thanks for the post.

    I wonder why you don't read plain bytes from a byte stream.

    Replies
    1. Yes, for ultimate speed you would keep them as bytes, but more often than not the bytes off a stream need to go into some data model so that they can be manipulated further as doubles, ints, Strings etc.
