Rational Java: January 2016

Introduction

This post will introduce you to ChronicleMap and explain how, by using off heap memory, you can do things you might never expect a Java program to be able to do. Using off heap memory is very un-Javaish. If you're following the debate about UNSAFE in Java 9 you will appreciate just how controversial it is. Nevertheless it provides the power to do some really useful things. ChronicleMap harnesses the power of UNSAFE, taming it and making it 'SAFE' for the rest of us to use.

Let's demonstrate using a really simple use case for ChronicleMap.

Scenario: Your server accepts requests from many clients. You want to keep a count of the requests from each client. You must ensure that every time the request count is incremented the number is persisted to disk, to protect against your process crashing and the data being lost.

Traditional approach: Create a table in a database with two columns, clientID and requestCount. Every time you get a request from a client, fetch the client's requestCount from the database, increment it and save the result back into the database. The actual database interaction can be abstracted somewhat by having data beans using Hibernate or the equivalent. But then you have to deal with configuration and you still have to set up a database...

Wouldn't it be nice if: We could just save the data to a java.util.Map and that would be all the configuration you would need.

Enter ChronicleMap: ChronicleMap is just like a normal Java HashMap except that rather than saving its data to on heap memory it saves its data to off heap memory. Since the off heap memory is backed by a memory mapped file, all the data gets flushed to disk and persisted by the OS.

What's the difference between on and off heap memory: On heap memory is memory 'managed' by the JVM. Most importantly all data that is allocated on heap is subject to garbage collection. (This has its pros and cons but more of that another time). This memory is entirely local to the JVM and data in on heap memory will be lost when the JVM dies. Off heap memory is unmanaged memory. It can be shared between processes and persisted through memory mapped files.
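To make "off heap memory backed by a memory mapped file" concrete, here is a minimal sketch that uses only the JDK's MappedByteBuffer. It is not how ChronicleMap is implemented, and the class name MappedCounter and the file name request-count.dat are made up for illustration. It keeps a single long counter in a mapped file, so the value lives outside the Java heap and is still there after the JVM exits.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of ChronicleMap.
final class MappedCounter {

    // Increments a long stored at offset 0 of a memory mapped file and returns the new value.
    static long increment(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A file shorter than 8 bytes holds no counter yet, so start from zero.
            boolean fresh = channel.size() < Long.BYTES;
            // Map the first 8 bytes of the file into memory outside the Java heap.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            long next = (fresh ? 0L : buffer.getLong(0)) + 1;
            buffer.putLong(0, next); // the write goes to the mapped memory, not to a heap object
            buffer.force();          // ask the OS to flush the change to the file now
            return next;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "request-count.dat");
        System.out.println("count = " + increment(file));
    }
}
```

As long as the file is left in place between runs, each run should print a count one higher than the previous run.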

Enough of the theory: Let's see the code for this in practice.

To add Chronicle-Map to your project just add this Maven dependency:

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>2.4.12</version>
</dependency>

It takes exactly 2 lines to write the code for our scenario - have a look at the code below:

The first line creates the map and the second updates it as you would any other java.util.concurrent.ConcurrentMap.

Each time you run the program you will notice that the number of user requests for "user1" is incremented.

A note to explain the parameters to the constructor: To achieve the greatest performance with ChronicleMap the Map does not resize. (Apart from anything else ChronicleMap was written as a performant data store for low latency systems but that's not what we are really concentrating on in this post). Therefore, to enable the Map to reserve the correct amount of disk space we must specify the maximum number of entries and also give it a rough idea of the length of the Strings in the key. (You shouldn't worry too much about overestimating the number of entries as disk space is only taken passively and will only be used if required.)

A deeper look into off heap memory: Hopefully that was simple enough. We've seen how you can use ChronicleMap as an implementation of java.util.Map, and that all inserts and updates to that map are saved to disk because ChronicleMap is backed by off heap memory.

But let's say you don't want to write the number of requests back to the map each time you want it saved to disk. What you might want is to update the userRequests and for that number to be persisted without having to update the Map. (It is the equivalent of having an AtomicLong stored in a HashMap.)
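The AtomicLong comparison can be sketched with plain on heap collections. This is only the analogy, not ChronicleMap code, and the class name OnHeapCounterAnalogy is invented for the example: once you hold a reference to the mutable value, you can change it without calling put() on the map again.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical on heap analogy: a mutable counter stored as a map value.
final class OnHeapCounterAnalogy {

    static long demo() {
        Map<String, AtomicLong> requests = new HashMap<>();

        // Fetch (or create) the counter once...
        AtomicLong userRequests = requests.computeIfAbsent("user1", key -> new AtomicLong());

        // ...then update it in place; no put() back into the map is needed.
        userRequests.incrementAndGet();
        userRequests.incrementAndGet();

        // The map sees the updates because it holds the same object.
        return requests.get("user1").get();
    }

    public static void main(String[] args) {
        System.out.println("user1 requests = " + demo());
    }
}
```

The difference in the off heap case is that the value being updated in place is also backed by the memory mapped file, so the update is persisted as well.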

To do this, rather than using a Long to hold the user requests you should use LongValue, where LongValue is just a wrapper class for a Long. Crucially though, LongValue must be backed by off heap memory. For this reason you can't just create an instance of LongValue using new as you would with normal on heap memory; you need to use a factory provided by Chronicle.

It's much easier to understand this by examining the code below:

The Map is built exactly the same way as in the simple example above except that the value type is LongValue not Long.

The variable userRequests is created by generating a direct (off heap) instance using the call DataValueClasses.newDirectInstance(). (If you want to understand more about the internals of this code run the program in a debugger and step into addValue() on line 28.) All the data written into this instance variable is saved off heap and therefore persisted.

Because userRequests is backed by off heap memory we can increment it on line 28 and do not have to store it back into the ChronicleMap for the data to be persisted.

One thing to point out is the method call acquireUsing() on the Map on line 25, which allows us to fetch the data in the Map using a pre-existing instance variable. This means that no allocation takes place, which allows for a zero GC program. (This is important because GC is the enemy of predictable low latency systems).

Summary

ChronicleMap uses off heap memory to enable:

persistence (as we have seen in this post)
IPC (inter process communication): you can share the ChronicleMap between more than one JVM.
Zero GC (important for real-time systems)

If you need any of the above you might consider using ChronicleMap in your code!

Here's another nice trick we used when creating the ultra low latency Chronicle FIX-Engine.

When it comes to reading data off a stream of bytes it's way more efficient, if possible, to store data in a char rather than having to read it into a String. (At the very least you are avoiding creating a String object, although this can be mitigated by using a cache or working with CharSequence rather than String but that's the subject of another post.)

Using JMH benchmarks I've found these timings: (I haven't included the source code for this as this is going to be the subject of another post where I describe the different methodologies in more detail).

Reading 2 ASCII characters off a byte stream into:

String - 34.48ns
Pooled String - 28.57ns
StringBuilder - 21.27ns
char (using 2 chars method) - 6.75ns

(As a benchmark reading a single char takes 3.27ns.)

The point is that it takes at least 3 times longer to read data into a String than a char, and that doesn't even take into account the garbage created.

So it goes without saying that when you know that you are expecting data that is always a single character, rather than reading that data into a String variable you should read it into a char.
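As a rough illustration of the two approaches for a single character (this is not the benchmarked code, and the class and method names are made up), reading one ASCII byte from a java.nio.ByteBuffer into a char needs no objects at all, whereas the String version allocates on every call:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch contrasting the two ways of reading one ASCII character.
final class SingleCharRead {

    // Reads one byte and widens it to a char; nothing is allocated.
    static char readChar(ByteBuffer in) {
        return (char) (in.get() & 0xFF);
    }

    // Reads one byte into a String; allocates a byte[] and a String on each call.
    static String readString(ByteBuffer in) {
        byte[] one = { in.get() };
        return new String(one, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.wrap("D8".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readChar(in));   // consumes the first byte
        System.out.println(readString(in)); // consumes the second byte
    }
}
```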

Now what if you know that the data you are expecting on the stream is no more than 2 characters? (You find this situation, for example, in FIX 5.0 tag 35 msgType.) Do you have to use a String so that you can accommodate the extra character? At first thought it appears so; after all, a char can only contain a single character.

Or can it?

A Java char is made up of 2 bytes, not one. Therefore if you know that your data is made up of ASCII characters you know that only a single byte (of the 2 bytes in the char) will be used. For example, the letters run from 'A', which is 65, through to 'z', which is 122.

You can print out the values that fit into a single byte with this simple loop:

for (int i = 0; i < 256; i++) {
    char c = (char)i;
    System.out.println(i+ ":" + c);
}

You are now free to use the other byte of the char to hold the second ASCII character.

This is the way to do it:

In this example you have read 2 bytes 'a' and 'b' and want to store them in a single char.

byte a = (byte)'a';
byte b = (byte)'b';
//Now place a and b into a single char
char ab = (char)((a << 8) + b);
//To retrieve the bytes individually
System.out.println((char)(ab>>8) +""+ (char)(ab & 0xff));

To better understand this let's look at the binary:

byte a  = (byte)'a'; // 01100001
byte b  = (byte)'b'; // 01100010

As you can see below, when viewed as a char, the top 8 bits are not being used

char ca = 'a'; // 00000000 01100001
char cb = 'b'; // 00000000 01100010

Combine the characters with a taking the top 8 bits and b the bottom 8 bits.

char ab = (char)((a << 8) + b); // 01100001 01100010
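The same packing can be wrapped in a small helper. What follows is a sketch and not the code used in the Chronicle FIX-Engine; the class TwoCharCodec, its method names and the describe() example are invented here. It masks each byte with 0xFF so that a byte with its top bit set (i.e. not ASCII) cannot sign-extend into the upper half of the char. It also assumes, purely as a convention of this sketch, that a one character value is stored in the low byte with the high byte left as zero, so it compares equal to the plain char.

```java
// Hypothetical helper for packing up to two ASCII bytes into one char.
final class TwoCharCodec {

    // A two character msgType expressed as a compile-time constant, usable in a switch.
    static final char TRADE_CAPTURE_REPORT = ('A' << 8) | 'E';

    // First byte goes into the top 8 bits, second byte into the bottom 8 bits.
    static char pack(byte first, byte second) {
        return (char) (((first & 0xFF) << 8) | (second & 0xFF));
    }

    // Top 8 bits of the packed value.
    static char first(char packed) {
        return (char) (packed >> 8);
    }

    // Bottom 8 bits of the packed value.
    static char second(char packed) {
        return (char) (packed & 0xFF);
    }

    // Dispatching on the packed value needs no String at all.
    static String describe(char msgType) {
        switch (msgType) {
            case 'D':                  return "NewOrderSingle";      // single character type
            case TRADE_CAPTURE_REPORT: return "TradeCaptureReport";  // two character type "AE"
            default:                   return "other";
        }
    }

    public static void main(String[] args) {
        char ab = pack((byte) 'a', (byte) 'b');                      // 0x6162
        System.out.println(first(ab) + "" + second(ab));             // prints "ab"
        System.out.println(describe(pack((byte) 'A', (byte) 'E')));  // prints "TradeCaptureReport"
    }
}
```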

Summary

It's more efficient to read data into a char than into a String. If you know that you have a maximum of 2 ASCII characters they can be combined into a single Java char. Of course, only use this technique if you really are worried about ultra low latency!

Wednesday 27 January 2016

Starting out with ChronicleMap: Using Off Heap Memory

Monday 11 January 2016

Writing 2 Characters into a Single Java char