Monday 30 March 2015

ChronicleMap - Java Architecture using Off Heap Memory

My last post was written a couple of weeks ago and after some valid feedback I'd like to clarify a couple of points as a preface to this article.

The main takeaway from 'Creating millions of objects with Zero Garbage' should be that with Chronicle you are not 'limited' to using JVM-allocated on-heap memory when writing a Java program. Maybe the article would have been more aptly titled 'Creating Millions of Objects using Zero Heap'. Another point I wanted to bring out was that when you allocate no heap memory you cause no GC activity.

A source of confusion came from the fact I used the term 'garbage' to describe the objects allocated on the heap. The objects allocated were actually not garbage though they caused GC activity. 

I contrived an example to demonstrate, one, that ChronicleMap does not use heap memory whilst ConcurrentHashMap does, and two, that when you use heap memory you can't ignore the GC. At the very least you need to tune your system carefully to ensure that you don't end up suffering from long GC pauses. This does not mean that there are no issues with allocating from off heap (see the end of this post) and it also does not mean that you can't tune your way through an on heap solution to eliminate GC. Going off heap is by no means a panacea to all Java performance issues but for very specific solutions it can deliver interesting opportunities some of which I will discuss in this post.

Let's examine this problem:

You have multiple JVMs on your machine and need to share data between them. You would also like that data to be persisted so that when the JVMs exit the data does not disappear.

Let's simplify for now and say that you have two JVMs running on the same machine, either or both of which would like to see updates from the other. Each Java program has a ConcurrentHashMap which it updates, those updates are stored and are of course available to it later. But how does the program get the updates applied by the other Java program to its map? And how do we prevent the data from evaporating when the programs exit?

Fundamentally, JDK on-heap collections such as HashMap and ConcurrentHashMap can't be shared directly between JVMs. This is because heap memory is contained by the JVM through which it was allocated. Additionally, when the JVM exits the memory is released and the data is no longer available; there is no implicit way of persisting the memory outside the lifetime of the JVM. So you need to find some other mechanism to share the data between the JVMs and persist the data.

From an architectural perspective you need to accomplish these two tasks.
  1. A mechanism for sharing the data between the JVMs so that when one process updates the other will be informed of the action.
  2. A mechanism for storing the data so that when one or both JVMs exit the data does not just disappear.
Typically you might use a database as an external sharable store and messaging service to send the data updates to other processes to notify them that some data has been updated.

This results in the following architecture:



The problem with this architecture is that you lose the in-memory speeds of a HashMap, especially if writing to your database is not that fast and you want the write to be persisted before you send the message out over the messaging service. Also many solutions will involve TCP calls which can again be a source of latency. There are of course much faster ways to persist data than writing to a fully fledged database, using mechanisms like journaling to disk, for example with a product like ChronicleQueue or similar. But if you did use a journal you'd still have to build all the logic to recreate a Map data structure on restart, not to mention having to keep a Map type structure up-to-date on another JVM. In addition to the latency introduced by this architecture there is the complication of having to deal with the extra code and configuration for the database and messaging service.

Even accepting that this sort of functionality can be wrapped up in frameworks, wouldn't it be great if your in-memory Map was actually visible outside your JVM? The Map should be able to implicitly persist the data so that its data is available independently of the lifetime of the JVM. It should allow access with the same 'memory' speeds as you might achieve using an on-heap Map.

This is where ChronicleMap comes in.  ChronicleMap is an implementation of java.util.ConcurrentMap but critically it uses off heap memory which is visible outside the JVM to any other process running on the machine. (For a discussion about on-heap vs off-heap memory see here). 

Each JVM will create a ChronicleMap pointing at the same memory mapped files. When one process writes into its ChronicleMap the other process can instantly (~40 nanoseconds) see the update in its ChronicleMap. Since the data is stored in memory outside the JVM, a JVM exit will not cause any data to be lost. The data will be held in memory (assuming there was no need for it to be paged out) and when the JVM restarts it can map it back in extremely quickly. The only way data can be lost is if the OS crashes whilst it has dirty pages that haven't been persisted to disk. The solution to this is to use replication, which Chronicle supports but is beyond the scope of this post.

The architecture for this is simply this:



For a code example to get started with ChronicleMap see my last post or see the official ChronicleMap tutorial here.
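To make the idea of two processes pointing at the same memory mapped file a little more concrete, here is a minimal sketch using only the JDK's own FileChannel mapping. It is not ChronicleMap and not how ChronicleMap is implemented; the class name, the temp file and the single 8-byte slot are all assumptions made for illustration, and the two mappings live in one JVM purely so the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: plain JDK memory mapping, not ChronicleMap's implementation.
final class SharedMappingSketch {

    // Create a scratch file; the name and location are arbitrary.
    static Path newTempFile() {
        try {
            Path file = Files.createTempFile("shared-map-sketch", ".dat");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Map the first 8 bytes of the file read-write. Each call stands in for one JVM.
    static MappedByteBuffer map(Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write through one mapping and read the value back through another.
    static long writeThenReadViaSecondMapping(Path file, long value) {
        MappedByteBuffer writer = map(file); // stands in for JVM 1
        MappedByteBuffer reader = map(file); // stands in for JVM 2
        writer.putLong(0, value);
        return reader.getLong(0);
    }

    public static void main(String[] args) {
        Path file = newTempFile();
        System.out.println("second mapping sees: " + writeThenReadViaSecondMapping(file, 42L));
        // A fresh mapping, as a restarted JVM would create, still finds the value in the file.
        System.out.println("after remapping: " + map(file).getLong(0));
    }
}
```

The sketch stores a single long and offers none of the map semantics, locking or serialisation that a real shared map would need.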

There are a number of caveats and trade-offs to consider before diving into ChronicleMap.
  • The ChronicleMap entries have to be Serializable.  For systems that are very sensitive to performance you will need to implement the custom serialisation provided by Chronicle known as BytesMarshallable. Whilst this is pretty easy to implement it is not something that is necessary with an on-heap map. (Having said that storing data into a database will of course also require some method of serialisation.) 
  • Even with BytesMarshallable serialisation, the overhead of any serialisation might be significant to some systems. In such a scenario it is possible to employ a zero copy technique supported by Chronicle (see my last blog post for more details) to minimise the costs of serialisation. It is however a little trickier to implement than using 'normal' Java.  On the other hand in latency sensitive programs it will have the huge benefit of not creating any objects that might then later need to be cleaned up by the GC.
  • A ChronicleMap does not resize and must therefore be sized up front. This might be an issue if you have no idea how many items to expect.  It should be noted however, that oversizing, at least on Linux, is not a huge problem as Linux passively allocates memory. 
  • Chronicle relies on the OS to asynchronously flush to disk. If you want to be absolutely sure that data has actually been written to disk (as opposed to just being held in memory) you will need to replicate to another machine. In truth any mission critical system should be replicating to another machine so this might not be a big issue in adopting Chronicle.
  • ChronicleMap will be subject to OS memory paging issues. If memory is paged out and has to be swapped back in, latency will be introduced into the system. Therefore even though you will be able to create ChronicleMaps with sizes well in excess of main memory, you will have to be aware that paging might occur depending on your access patterns on the data.
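Two of the caveats above, hand-written serialisation and sizing up front, can be illustrated with a small sketch over a direct (off-heap) ByteBuffer. This uses only the JDK and is not Chronicle's BytesMarshallable API; the record layout (an int quantity followed by a double price) and all names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustration only: a hypothetical fixed-size record store, not Chronicle's API.
final class OffHeapRecordsSketch {
    // Assumed layout per record: int quantity (4 bytes) then double price (8 bytes).
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES;

    private final ByteBuffer store; // direct buffer: allocated outside the Java heap
    private final int capacity;

    OffHeapRecordsSketch(int capacity) {
        this.capacity = capacity;
        // The store cannot grow, so it has to be sized up front.
        this.store = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, RECORD_SIZE));
    }

    // "Serialise" the fields by writing them at the record's fixed offset.
    void put(int index, int quantity, double price) {
        int offset = offsetOf(index);
        store.putInt(offset, quantity);
        store.putDouble(offset + Integer.BYTES, price);
    }

    // Read fields straight from the buffer without creating an object per record.
    int quantity(int index) {
        return store.getInt(offsetOf(index));
    }

    double price(int index) {
        return store.getDouble(offsetOf(index) + Integer.BYTES);
    }

    private int offsetOf(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index + " outside fixed capacity " + capacity);
        }
        return index * RECORD_SIZE;
    }

    // Fill the store and sum the quantities, reading primitives only.
    static long totalQuantity(int records) {
        OffHeapRecordsSketch sketch = new OffHeapRecordsSketch(records);
        for (int i = 0; i < records; i++) {
            sketch.put(i, i, i * 1.5);
        }
        long total = 0;
        for (int i = 0; i < records; i++) {
            total += sketch.quantity(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total quantity: " + totalQuantity(1000));
    }
}
```

Writing past the fixed capacity fails rather than resizing, which is the behaviour you would need to plan for when sizing a store like this.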



Monday 16 March 2015

Creating Millions of Objects with Zero Garbage

As noted in First rule of performance optimisation, garbage is the enemy of fast code. Not only can it destroy any sort of deterministic performance by employing the services of the garbage collector but we start filling our CPU caches with garbage that will cause expensive cache misses for our program.

So, can we use Java without creating garbage? Is it possible, for example, in natural Java, to solve this problem:

Create 10m financial instrument objects, store them in a map, retrieve them and perform a calculation using each object without creating any garbage at all.

It is if you use Chronicle!  Chronicle provides libraries so that you can easily use off heap storage in the form of memory mapped files for your objects. (For full source code for this article see here.)

Let's look at implementing a solution for the above problem.

First let's have a look at how you might do this in normal Java so that we make sure we understand the problem and what happens if we use the standard Java libraries for our implementation.
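The original listing is not reproduced here, so the following is only a minimal sketch of what such an on-heap approach could look like. The Instrument class, its fields, the key format and the much smaller default count are all assumptions; only ConcurrentHashMap and the idea of summing a total quantity come from the article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: an assumed shape for the on-heap version, not the original program.
final class OnHeapSketch {

    // Hypothetical stand-in for a financial instrument; the fields are assumptions.
    static final class Instrument {
        final String id;
        final int quantity;
        final double price;

        Instrument(String id, int quantity, double price) {
            this.id = id;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // Create one heap object per instrument, store it, then read each back and sum.
    static long populateAndSum(int count) {
        Map<String, Instrument> map = new ConcurrentHashMap<>();
        for (int i = 0; i < count; i++) {
            String id = "ID" + i;
            map.put(id, new Instrument(id, i, i * 1.5));
        }
        long totalQuantity = 0;
        for (int i = 0; i < count; i++) {
            totalQuantity += map.get("ID" + i).quantity;
        }
        return totalQuantity;
    }

    public static void main(String[] args) {
        // The article uses 10m objects; a smaller count keeps the sketch quick to run.
        System.out.println("totalQuantity = " + populateAndSum(100_000));
    }
}
```

Running something like this with a GC logging option such as -verbose:gc and a count in the millions should make the collector's activity visible, since every entry lives on the heap.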

This is the output from the program:


*** Entering critical section ***
[GC (Allocation Failure)  98816K->92120K(125952K), 0.0317021 secs]
[Full GC (Ergonomics)  92120K->91917K(216576K), 0.2510530 secs]
[GC (Allocation Failure)  125197K->125430K(224256K), 0.0449051 secs]
[GC (Allocation Failure)  166390K->166686K(244224K), 0.0504341 secs]
[Full GC (Ergonomics)  166686K->165777K(387072K), 0.6243385 secs]
[GC (Allocation Failure)  226705K->226513K(388096K), 0.0785121 secs]
[GC (Allocation Failure)  293073K->293497K(392704K), 0.0825828 secs]
[Full GC (Ergonomics)  293497K->292649K(591872K), 1.2479519 secs]
[GC (Allocation Failure)  359209K->359433K(689664K), 0.0666344 secs]
[GC (Allocation Failure)  449033K->449417K(695296K), 0.1759746 secs]
[GC (Allocation Failure)  539017K->539385K(747008K), 0.1907760 secs]
[GC (Allocation Failure)  632569K->633009K(786944K), 0.2293778 secs]
[Full GC (Ergonomics)  633009K->631584K(1085952K), 2.1328028 secs]
[GC (Allocation Failure)  724768K->723368K(1146368K), 0.3092297 secs]
[GC (Allocation Failure)  827816K->825088K(1174016K), 0.3156138 secs]
[GC (Allocation Failure)  929536K->929952K(1207296K), 0.3891754 secs]
[GC (Allocation Failure)  1008800K->1009560K(1273856K), 0.4149915 secs]
[Full GC (Ergonomics)  1009560K->1007636K(1650688K), 3.4521240 secs]
[GC (Allocation Failure)  1086484K->1087425K(1671680K), 0.3884906 secs]
[GC (Allocation Failure)  1195969K->1196129K(1694208K), 0.2905121 secs]
[GC (Allocation Failure)  1304673K->1305257K(1739776K), 0.4291658 secs]
[GC (Allocation Failure)  1432745K->1433137K(1766912K), 0.4470582 secs]
[GC (Allocation Failure)  1560625K->1561697K(1832960K), 0.6003558 secs]
[Full GC (Ergonomics)  1561697K->1558537K(2343936K), 4.9359721 secs]
[GC (Allocation Failure)  1728009K->1730019K(2343936K), 0.7616385 secs]
[GC (Allocation Failure)  1899491K->1901139K(2413056K), 0.5187234 secs]
[Full GC (Ergonomics)  1901139K->1897477K(3119616K), 5.7177263 secs]
[GC (Allocation Failure)  2113029K->2114505K(3119616K), 0.6768888 secs]
[GC (Allocation Failure)  2330057K->2331441K(3171840K), 0.4812436 secs]
[Full GC (Ergonomics)  2331441K->2328578K(3530240K), 6.3054896 secs]
[GC (Allocation Failure)  2600962K->2488834K(3528704K), 0.1580837 secs]
*** Exiting critical section ***
Time for putting 32088
Time for getting 454
[GC (System.gc())  2537859K->2488834K(3547136K), 0.1599314 secs]
[Full GC (System.gc())  2488834K->2488485K(3547136K), 6.2759293 secs]
[GC (System.gc())  2488485K->2488485K(3559936K), 0.0060901 secs]
[Full GC (System.gc())  2488485K->2488485K(3559936K), 6.0975322 secs]
Memory(heap) used 2.6 GB


The two main points that jump out of this output are, one, the number and expense of the garbage collections (clearly this could be tuned) and, two, the amount of heap used: 2.6GB. In short, there's no getting away from it, this program produces masses of garbage.

Let's try exactly the same thing, this time using ChronicleMap.

This is the code to solve the problem:

This is the output from the program:


[GC (Allocation Failure)  33280K->6595K(125952K), 0.0072065 secs]
[GC (Allocation Failure)  39875K->12177K(125952K), 0.0106678 secs]
[GC (Allocation Failure)  45457K->15289K(125952K), 0.0068434 secs]
[GC (Allocation Failure)  48569K->18357K(159232K), 0.0098287 secs]
[GC (Allocation Failure)  84917K->21008K(159232K), 0.0156393 secs]
*** Entering critical section ***
*** Exiting critical section ***
Time for putting 8554
Time for getting 4351
[GC (System.gc())  36921K->21516K(230400K), 0.0331916 secs]
[Full GC (System.gc())  21516K->15209K(230400K), 0.0630483 secs]
[GC (System.gc())  15209K->15209K(230912K), 0.0006491 secs]
[Full GC (System.gc())  15209K->15209K(230912K), 0.0234045 secs]
Memory(heap) used 18.2 MB


The main point here is obviously that there were no GCs in the critical section and that the whole program only used 18MB of heap. We have managed to create a program that ordinarily would have produced gigabytes of garbage without producing any garbage at all. 

A note on timings

ChronicleMap is clearly not a drop-in replacement for ConcurrentHashMap; they have very different uses and it is beyond the scope of this post to go too much further into that line of discussion. But the main differences in functionality are that ChronicleMap is persisted and can be shared amongst many JVMs. (ChronicleMap can also be replicated over TCP.) Nevertheless it is interesting to quickly compare timings, if for no other reason than to make sure we are in the same ball park. ChronicleMap was faster for putting, 8.5s compared to 32s. But most of the time in ConcurrentHashMap was spent in GC and that might be tuned away to some extent. ConcurrentHashMap was faster for getting, 0.5s compared to 4.3s. Nevertheless on other runs I've seen ConcurrentHashMap taking over 7s because of a GC that occurred in that section. Even though ChronicleMap is doing significantly more work, the lack of garbage produced actually makes the timings comparable with ConcurrentHashMap.

Restarting the program   

Where ChronicleMap really comes into its own is on a restart.  Let's say your program goes down and you need to recalculate the same computation we did earlier.  In the case of ConcurrentHashMap we would have to repopulate the map in exactly the same way we did earlier. With ChronicleMap, since the map is persistent it is just a matter of pointing the map at the existing file and rerunning the calculation to produce the totalQuantity.

Summary  


                    ConcurrentHashMap   ChronicleMap
gc pauses           Many                None
update time         32s                 8s
reads allowing gc   7s                  4s
reads no gc         0.5s                4s
heap size           2.6GB               18MB
persistence         No                  Yes
fast restart        No                  Yes

Wednesday 11 March 2015

Java Flight Recorder since jdk 1.8.0_40 / Further comments on safe points

I refer you to my first blog post on this subject, where I went through the fundamentals of Java Flight Recorder.

The really annoying thing about the tool is that in order to be able to run JFR against your program, the program had to be started with the JVM options:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder
This is not always convenient or even possible in certain runtime environments or where the process has been running a long time and is in a particular state that you would like to profile rather than restart.

Enter the latest jdk 1.8.0_40. With this build of the jdk you no longer need to have those options enabled when you started the java program. Just launch jmc and select your process as per normal and you will be presented with this dialog box.


You will be able to enable those command line options dynamically.  An excellent improvement to this excellent product!

Whilst on the topic of command line options for JFR, I've started using this JVM option when running my programs: -XX:+DebugNonSafepoints

According to the documentation:
One nice property of the JFR method profiler is that it does not require threads to be at safe points in order for stacks to be sampled. However, since the common case is that stacks will only be walked at safe points, HotSpot normally does not provide metadata for non-safe point parts of the code, which means that such samples will not be properly resolved to the correct line number and BCI. That is, unless you specify:
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints 
With DebugNonSafepoints, the compiler will generate the necessary metadata for the parts of the code not at safe points as well. 

To my mind the whole point of using JFR as opposed to YourKit etc. is that it doesn't have to respect safe points and you can get more accurate stats about where time is actually being spent. I have seen evidence of this even without using this option. But with this option enabled I expect the debug stats are even more accurate. I had a situation where I was looking for a hotspot and JFR pointed me to a high level method which didn't help me at all. I then enabled DebugNonSafepoints and JFR pointed me to the exact point in the code where the hotspot actually lay - fantastic!
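If you want to confirm which of these options a JVM was actually launched with, one possible check is to read the launch arguments through the standard RuntimeMXBean. The class and helper names below are made up for this sketch; it only reports options passed at startup and says nothing about anything enabled later from jmc.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustration only: reports whether given options were on the JVM's launch command line.
final class JvmOptionCheck {

    // True if the exact option string appears among the JVM input arguments.
    static boolean hasOption(List<String> jvmArgs, String option) {
        return jvmArgs.contains(option);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the program's own arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] options = {"-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"};
        for (String option : options) {
            System.out.println(option + " present at startup: " + hasOption(jvmArgs, option));
        }
    }
}
```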

Monday 9 March 2015

Take the IntelliJ Challenge!

It's always a great idea to try and master the tools with which you work. I'm on a mission to improve my IntelliJ skills and have been watching and reading tutorials on IntelliJ tips and tricks. One of the very best I've come across so far is this video by Hadi Hariri. As well as being very informative it is also entertaining and Hariri's presentation skills make the video anything but boring.  I encourage you to watch the video when you have a spare hour.

The challenge

Hariri basically challenges IntelliJ users to take on these challenges:
  • Use IntelliJ without touching your mouse.
  • Use IntelliJ with only one open tab. 
To use IntelliJ without a mouse requires mastery of keyboard shortcuts. Once you've learnt them you will find that it's way more productive than taking your hands off the keyboard and fiddling with your mouse. I've tried really hard to manage completely without a mouse forcing myself to learn the keyboard short-cuts. It's actually really difficult - but worth it.  I can't say I'll ever get to the point where I won't use the mouse at all but if I can get my most frequent mouse operations down to fast keyboard short-cuts I would consider that mission accomplished.  

Using IntelliJ with just one tab (this can be configured in preferences - search for 'tab limit') is a very interesting idea. I've been working on this for a couple of weeks and it actually makes a lot of sense. My preferred coding configuration is to maximise the editing area (CMD SHIFT F12) and then split the screen vertically (right click on the tab header).  In this way I can see the code I'm working on as well as another class.



The following table summarises the tips, tricks and keyboard shortcuts that were presented in the video, many of which I use frequently when attempting not to use my mouse. Some of these will be known to pretty much anyone who is not an IntelliJ novice but others are more obscure and all are pretty useful.

All the key bindings in this table are for Mac OS X 10.5+. To bring up the key bindings menu enter CNTRL ~; for obvious reasons this command is the same for all key bindings.





Modifiers    Key          Command Description                       Comment
SHIFT CMD    a            Find keyboard shortcut
CMD          o            Open a type                               Type in Camel case letters for search
CMD SHIFT    o            Open a file
CMD ALT      o            Open a symbol
CMD          1            Focus project explorer                    Esc focus back in editor
SHIFT SHIFT               Search everywhere
CMD          F12          Show class members                        CMD F12 again to show inherited members
             F4           Move from project window to editor
CMD          e            Recently viewed files
SHIFT CMD    e            Recently edited files
CNTRL        TAB          Switch to last viewed window forwards
CNTRL SHIFT  TAB          Switch to last viewed window backwards
CNTRL        b            Go to type declaration
CMD          y            View type declaration
             F1           Javadoc
ALT CMD      b            View all implementations
CMD          [            Go back to last cursor point
CMD          ]            Go forward to cursor point
CMD SHIFT    F12          Maximise editor                           CMD SHIFT F12 to toggle back
CMD CMD                   Show all windows (if they are hidden)
SHIFT        F4           Extract the editor as new window
CMD SHIFT    RIGHT ARROW  resize project window wider
CMD SHIFT    LEFT ARROW   resize project window narrower
ALT          UP ARROW     semantic selection increase
ALT          DOWN ARROW   semantic selection decrease
ALT SHIFT    UP ARROW     move highlighted code up
ALT SHIFT    DOWN ARROW   move highlighted code down
CMD          d            duplicate current line
CMD          BACKSPACE    delete current line
SHIFT        ENTER        insert new line under current
ALT SHIFT    ENTER        insert new line above current
ALT SHIFT    MOUSE CLICK  multi caret support
CNTRL CMD    g            multi caret selecting each instance
CNTRL        g            move to next instance of highlighted
CNTRL SHIFT  g            move to previous instance of highlighted
ALT CMD      l            code reformatting
CMD          w            closes a tab
CNTRL        SPACE        code completion
CNTRL SHIFT  SPACE        smart code completion
CNTRL ALT    v            introduce a variable
ALT          ENTER        General help and useful completions       e.g. create Test and fix errors
             F2           Go to next error or warning
SHIFT        F2           Go to previous error or warning
CMD          f            Search
CMD          r            Search and replace
ALT          F12          Bring up a terminal
CNTRL        v            Git menu
CNTRL SHIFT  r            Run this class
CNTRL SHIFT  d            Debug this class

If there are any other useful ones that you use frequently please let me know and I'll add them to the list!

A few other notes of interest:
  • Help -> Productivity Guide keeps track of what you do and by sorting on the functionality that you have not used can show you functions you probably didn't realise were available.
  • Help -> Default keyboard reference gives you a full list of keyboard shortcuts.
  • Postfix completion is fantastic, e.g. if you have an int i, type i.fori to generate a for loop. Or if you have an Object obj, type obj.notnull to generate a null check.  There are loads of others to investigate, like try, which generates a try catch.
  • If you are demoing or pair programming and want people watching to see your keyboard shortcuts use the presentation assist plugin which will highlight your keyboard shortcuts on the screen.
  • View -> PresentationMode makes your code much bigger and also useful for demos.
  • The gears icon in the project explorer allows you to automatically scroll from the project window to the editor view and vice versa.  I always select those options.  You can also select to see methods as well as classes in the project window.
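For readers who have not used postfix completion, this sketch shows roughly the shape of code that the three templates mentioned in the notes above leave you with. The method names and bodies are made up and filled in by hand, and the exact text IntelliJ generates (variable names, the caught exception type) may differ between versions.

```java
// Illustration only: approximate results of postfix templates, with hand-written bodies.
final class PostfixSketch {

    // Typing "count.fori" on an int expression yields an indexed for loop like this one.
    static int sumBelow(int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += i;
        }
        return sum;
    }

    // Typing "obj.notnull" yields an "if (obj != null)" block like this one.
    static String describe(Object obj) {
        if (obj != null) {
            return obj.toString();
        }
        return "null";
    }

    // Typing ".try" after a statement wraps it in a try/catch along these lines.
    static int parseOrZero(String text) {
        try {
            return Integer.parseInt(text);
        } catch (Exception e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumBelow(4));
        System.out.println(describe(null));
        System.out.println(parseOrZero("not a number"));
    }
}
```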

Hope you found this useful and that taking on the IntelliJ challenge improves your productivity.  Please use the comments section for your favourite IntelliJ tips.

Creating Web Services and a Rest Server with JAX-RS and Jetty

Creating a WebService in Java is remarkably easy. To add it to a ServletContainer and deploy it to an embedded WebServer is only a few more lines of code. 

Let's create a simple calculator with a couple of functions as an example of a WebService. The calculator will compute the squareRoot and square of any number.  It will return a simple JSON response with the name of the action, the input and the output.

Before we start this is the Gradle configuration you will need:


This is the code for the Calculator:

The annotations determine the type of REST action to be applied to the method (@GET, @PUT etc.). The @Path annotation determines the URI of the request and the @Produces annotation determines how the response will be returned.  In our case we choose JSON, the conversion of which is all handled seamlessly.
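The Calculator listing itself is not reproduced here, so the following is only a plain-Java sketch of the calculator logic and one possible response shape. It deliberately avoids the JAX-RS dependency so that it runs on its own; comments mark where the annotations described above would typically sit. The Result class, its field names and the hand-built JSON string are assumptions, not the original code (in the real service the JSON conversion is handled by the framework).

```java
// Illustration only: calculator logic without JAX-RS; annotation placement shown in comments.
// In a JAX-RS resource the class would carry @Path("calculator").
final class CalculatorSketch {

    // Hypothetical response type; the field names action, input and output are assumptions.
    static final class Result {
        final String action;
        final double input;
        final double output;

        Result(String action, double input, double output) {
            this.action = action;
            this.input = input;
            this.output = output;
        }

        // Hand-rolled JSON purely for the sketch.
        String toJson() {
            return "{\"action\":\"" + action + "\",\"input\":" + input + ",\"output\":" + output + "}";
        }
    }

    // With JAX-RS: @GET, @Path("squareRoot"), @Produces(MediaType.APPLICATION_JSON),
    // and the parameter bound with @QueryParam("input").
    static Result squareRoot(double input) {
        return new Result("squareRoot", input, Math.sqrt(input));
    }

    // With JAX-RS: @GET, @Path("square"), @Produces(MediaType.APPLICATION_JSON).
    static Result square(double input) {
        return new Result("square", input, input * input);
    }

    public static void main(String[] args) {
        System.out.println(squareRoot(16).toJson());
        System.out.println(square(4).toJson());
    }
}
```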

In order to deploy our WebService we need a ServletContainer, for which we will use Jersey, and a WebServer into which we can drop the container, for which we will use Jetty.

This is the code for the RestServer:



Once you have run the RestServer you will be able to test it with this URL.


http://localhost:8080/calculator/squareRoot?input=16


A really nice way to run queries from IntelliJ is to use the inbuilt REST Client which can be found under the tools menu.


  
When you run the REST Client you will get this Response:



It's a really easy way to test a RESTful server.