Tuesday 22 December 2015

How long does it take the jvm to effect escape analysis? Maybe longer than you think.

This post looks at escape analysis, in particular how long it takes for the jvm to effect escape analysis in a running program. I make some observations but don't have a full explanation at this point.

By way of introduction, let's take a detour to look at a little-known and even less used flag in the jvm (which, as we'll see, is a good thing): -Xcomp.

The behaviour for this flag is defined in the jvm documentation as:




-Xcomp





Forces compilation of methods on first invocation. By default, the Client VM (-client) performs 1,000 interpreted method invocations and the Server VM (-server) performs 10,000 interpreted method invocations to gather information for efficient compilation. Specifying the -Xcomp option disables interpreted method invocations to increase compilation performance at the expense of efficiency.


At first sight this seems to be an excellent option.  A shortcut to warming up the jvm through 10,000 cycles - we can get the code to compile straight away. Shouldn't we always enable this option by default?

But the documentation does warn that this will be 'at the expense of efficiency'.

The jvm learns about code behaviour in the 10,000 warmup cycles so that when it comes to compiling it compiles in the most efficient way possible.  Compiling the code right away will mean that yes the code is indeed compiled but that the compiled code may not be the most efficient. You can read more about it in this blogpost - but that's not really the subject of this post.

Something else that doesn't happen if you use -Xcomp is escape analysis. This is actually rather surprising as the jvm shouldn't need to learn about whether escape analysis is possible by running the program.  This should be evident by a static analysis of the code.

Have a look at this code (I was inspired by the ideas in this blog):
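
A minimal sketch of the kind of program I mean is below. Treat it as an illustration rather than the exact listing behind the numbers that follow: the class name, the shared constant and the loop size of 1m per iteration are assumptions. Each pass wraps a shared value in a new Optional that never leaves the loop, then waits for input so that a histogram can be taken.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Optional;

class EscapeAnalysisDemo {
    // Shared value (an assumption), so each pass allocates only the Optional wrapper.
    private static final String VALUE = "x";

    /** Wraps VALUE in a fresh Optional 'count' times; no Optional outlives its loop pass. */
    static long sumLengths(int count) {
        long total = 0;
        for (int i = 0; i < count; i++) {
            Optional<String> wrapped = Optional.of(VALUE);
            total += wrapped.get().length();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (int iteration = 1; iteration <= 5; iteration++) {
            long total = sumLengths(1_000_000);
            System.out.println("iteration " + iteration + " done (total=" + total
                    + "); take a histogram, then press Enter");
            in.readLine(); // blocks until a line of input (or end of input) arrives
        }
    }
}
```

If escape analysis removes the allocation, the Optional count in the histogram stops growing between iterations; if it doesn't, the count grows by the loop size each time.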


We need to make sure that the program runs without a gc (I suggest these flags):

-verbosegc -Xmx4g -Xms4g

When the program waits for input, take a heap histogram to see how many Optional objects have been created. Then hit any key to resume the program.

To take the histogram, first run jps to identify the pid of the program, then run:

jmap -histo pid | head
  
Do this once without the -Xcomp flag and once with the -Xcomp flag.

Without the -Xcomp flag:

After first iteration:


num     #instances         #bytes  class name
----------------------------------------------
   1:           644      123241360  [I
   2:        234496        3751936  java.util.Optional
   3:          6582         791968  [C
   4:          2717         540712  [B
   5:          4786         114864  java.lang.String
   6:           662          75144  java.lang.Class
   7:          1349          63912  [Ljava.lang.Object;

After second iteration:

num     #instances         #bytes  class name
----------------------------------------------
   1:           644      116687472  [I
   2:        644095       10305520  java.util.Optional
   3:          6583         792056  [C
   4:          2717         540712  [B
   5:          4787         114888  java.lang.String
   6:           662          75144  java.lang.Class
   7:          1349          63912  [Ljava.lang.Object;

All subsequent iterations are the same; no further objects are created.

Escape analysis is clearly kicking in after ~234k iterations. I'm not sure why it should take so long; usually (for example with compiling code) 10k iterations is enough. Also, in the second iteration it creates another ~400k objects before escape analysis kicks in, which is also a bit mysterious.

With the -Xcomp flag:

After the first iteration:


num     #instances         #bytes  class name
----------------------------------------------
   1:           653      153880352  [I
   2:       1000001       16000016  java.util.Optional
   3:          7397         834744  [C
   4:          2717         540728  [B
   5:          5685         136440  java.lang.String
   6:           672          76208  java.lang.Class
   7:          1349          63912  [Ljava.lang.Object;

After the second iteration:


num     #instances         #bytes  class name
----------------------------------------------
   1:           654      159354896  [I
   2:       2000001       32000016  java.util.Optional
   3:          7398         834832  [C
   4:          2717         540728  [B
   5:          5686         136464  java.lang.String
   6:           672          76208  java.lang.Class
   7:          1349          63912  [Ljava.lang.Object;

After each iteration the number of Optional objects goes up by 1m.



Summary

  • -Xcomp is a switch that should almost certainly never be used in production. I can imagine some scenarios where you might want to play around with disabling the interpreter but those would be very specific edge cases. 
  • It seems to take at least 200k iterations for escape analysis to become effective, so you need to allow longer than the 10k iterations for a full warm up.
  • There also seems to be a second phase: after the allocations have been eliminated once, the jvm has to repeat the process (the ~400k extra objects in the second iteration). This needs further understanding.
  • If you slow the program down a bit by doing some work in between the calls to create the Optional the number of objects reduces.  For example I found that a call to Math.sin reduces the Optional objects by about 50%.

Monday 14 December 2015

One liner JUnit Productivity Tip

There's nothing clever or complicated about this tip - nevertheless it's one that once discovered saved me a lot of time.

The tip is for the case when you're comparing two long strings.

Say you have the following code which compares two strings:

String expect = "a,aah,aahed,aahing,aahs,aardvark,aardvarks,aardwolf,ab,abaci,aback,abacus,abacuses,abaft,abalone,abalones,abandon,abandoned,abandonedly,abandonee,abandoner,abandoners,abandoning,abandonment,abandonments,abandons,abase,abased,abasedly,abasement,abaser,abasers,abases,abash,abashed,abashedly,abashes,abashing,abashment,abashments,abasing,abatable,abate,abated,abatement,abatements,abater,abaters,abates,abating,abatis,abatises,abator,abattoir,abattoirs,abbacies,abbacy,abbatial,abbe,abbes,abbess,abbesses";
String actual = "a,aah,aahed,aahing,aahs,aardvark,aardvarks,aardwolf,ab,abaci,aback,abacus,abacuses,abaft,abalone,abalones,abandon,abandoned,abandonedly,abandonee,abandoner,abandoners,abandoning,abandonment,abandonments,abandons,abase,abased,abasedly,abasement,abaser,abasers,abases,abash,abashed,abashedly,abashes,abashing,abashment,abashments,abasing,abatable,abate,abated,abatement,abatements,abater,abaters,abates,abating,abatis,abatises,abator,abattoir,abattoirs,abbacies,abbacy ,abbatial,abbe,abbes,abbess,abbesses";

assertEquals(expect, actual);

As it happens these strings are not exactly the same so you get this failure message when you run the test in IntelliJ:

org.junit.ComparisonFailure: 
Expected :a,aah,aahed,aahing,aahs,aardvark,aardvarks,aardwolf,ab,abaci,aback,abacus,abacuses,abaft,abalone,abalones,abandon,abandoned,abandonedly,abandonee,abandoner,abandoners,abandoning,abandonment,abandonments,abandons,abase,abased,abasedly,abasement,abaser,abasers,abases,abash,abashed,abashedly,abashes,abashing,abashment,abashments,abasing,abatable,abate,abated,abatement,abatements,abater,abaters,abates,abating,abatis,abatises,abator,abattoir,abattoirs,abbacies,abbacy,abbatial,abbe,abbes,abbess,abbesses
Actual   :a,aah,aahed,aahing,aahs,aardvark,aardvarks,aardwolf,ab,abaci,aback,abacus,abacuses,abaft,abalone,abalones,abandon,abandoned,abandonedly,abandonee,abandoner,abandoners,abandoning,abandonment,abandonments,abandons,abase,abased,abasedly,abasement,abaser,abasers,abases,abash,abashed,abashedly,abashes,abashing,abashment,abashments,abasing,abatable,abate,abated,abatement,abatements,abater,abaters,abates,abating,abatis,abatises,abator,abattoir,abattoirs,abbacies,abbacy ,abbatial,abbe,abbes,abbess,abbesses

 

The error message isn't particularly helpful, to say the least.

Fortunately IntelliJ provides us with a 'click to see the difference dialog' - but when you click and open the dialog, although it will highlight the difference it's really not immediately obvious where the differences are and you end up scrolling a very long way to find them. And it's certainly impossible to see all the differences at a glance.



So here's the tip. Instead of this:

assertEquals(expect, actual);

Insert new lines into your strings as below:

assertEquals(expect.replace(',', '\n'), actual.replace(',', '\n'));

Then when you click to see the difference you get a dialog that looks like this and you 
are able to see the issues at a glance. Simple but effective.


 





Thursday 10 December 2015

7 Tips for Successful Code Generation in Java

By way of introduction, I've been a bit quiet recently and part of the reason for that is that I've been busy working on Chronicle-FIX. This is a new ultra low latency library in the Chronicle-Enterprise suite where we have proven that we can parse and store messages in low single digit microseconds.  Of course it leverages our open source products Chronicle-Queue, Chronicle-Network and Chronicle-Bytes.

One of the secrets of the low latency we achieve is that each implementation can generate a custom built fix engine based exactly on the schema it requires. In order to achieve this I've built a whole lot of code generation code.

So I thought I would share some of the lessons I learnt during this process.

1. Use a code generation library

This is by far the most important tip. I can't stress enough how important this is and how much time and complexity you will save.  You really don't want to be messing around trying to get the correct spacing in your code or doubly and triply escaping quotes when creating string literals!

I used JavaPoet. It's open source (Apache 2) and it is excellent. It handled everything I wanted to do including some quite complex generics. The only thing it didn't support was declaring static imports but that was easy to work around and an insignificant quibble for what was an excellent library.

JavaPoet decomposes Java code into objects: TypeSpec (the class or interface), MethodSpec (methods), FieldSpec (fields), CodeBlock (blocks of code).  It cleverly provides a rich syntax for building up statements (lines of code) based on the printf pattern. You will find that following this paradigm will leave you with cleaner and more succinct code than trying to do this yourself, where it's easy to stray down the procedural coding route for this type of problem.

2. Write the code by hand first

Don't try and code generate without having written some example code first. It's not easy coding through the rear view mirror, and doubly complicated if you're working it out as you go along.

Spend a little bit of time writing the code by hand first and then produce the code generation for it.

Additionally you will be wanting to produce the most optimal code possible which can only be done if you spend the time and effort writing it by hand first.  

3. Generate as little code as possible

Generate only as much code as you need. So for example, if all your generated classes need to implement a certain method use a helper class that can be called by the generated code rather than generating that method multiple times. Alternatively get your code to extend a static base class with the method on the base class.

4. Make sure you can blow away all generated code in one go

Don't mix your static code with your generated code. What you want to do is to have a package like com.test.generatedcode.xx.xx.  At the end of each test/development run you should be able to delete the whole folder com/test/generatedcode which means that you have no static code in that folder at all.

You will find that arranging your code in this way will make the testing cycle that much faster.

5. Start with a small verifiable set of code

The point of code generation is often to produce lots of code. It can be hard to test whether the code you have produced is correct or not.  For this reason start with a small but complex example that you can check both by reading the code and for which you have a test case.

Hopefully for cases after that it will be more of the same.

6. Generate test cases 

As mentioned in point 5, verifying that your code generator is correct for all cases can be difficult.  For this reason you should try and generate test cases as part of the code that is generated. You can then generate many variants of complicated code from your generator and have it test itself.

7. Be verbose

Code to generate code, by its nature, can be extremely difficult to read and understand. Code generating code will never have to be fast so don't worry about making it verbose. Use lots of comments but more importantly try and make it as explicit as possible so that you will be able to look back at it in a year and still be able to maintain it.

Wednesday 28 October 2015

Let's pause for a Microsecond

A lot of benchmarks in low latency Java applications involve having to measure a system under a certain load. This requires maintaining a steady throughput of events into the system as opposed to pumping events into a system at full throttle with no control whatsoever.

One of the tasks I often have to do is pause a producer thread for a short period in between events. Typically this amount of time will be single digit microseconds.

So how do you pause a Thread for this amount of time?  Most Java developers think instantly of Thread.sleep(). But that's not going to work because Thread.sleep() only goes down to milliseconds, and that's three orders of magnitude longer than the pause we need in microseconds.

I saw an answer on StackOverflow pointing the user to TimeUnit.MICROSECONDS.sleep() in order to sleep for less than a millisecond.  This is clearly incorrect. To quote from the JavaDoc:

Performs a Thread.sleep using this time unit. This is a convenience method that converts time arguments into the form required by the Thread.sleep method.

So you're not going to be able to get better than a 1 millisecond pause, the same as Thread.sleep(1). (You can prove this by trying it with the code below.)

The reason for this is that this method of pausing, namely putting a thread to sleep and waking it up, is never going to be fast or accurate enough to go lower than a millisecond.

Another question we should be introducing at this point is how accurate is Thread.sleep(1) anyway? We'll come back to this later.

Another option when we want to pause for a microsecond is to use LockSupport.parkNanos(x).  Using the following code to park for 1 microsecond actually takes ~10us.  It's way better than TimeUnit.sleep() / Thread.sleep() but not really fit for purpose.  After 100us it does get into the same ball park with only a 50% variation.
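
A sketch of one way to time this, using nothing beyond the JDK, is below. The class and method names are made up for illustration, and the figure you get will depend on your OS and hardware.

```java
import java.util.concurrent.locks.LockSupport;

class ParkNanosTimer {
    /** Average observed duration, in nanoseconds, of a parkNanos(requestedNanos) call. */
    static long averageParkNanos(long requestedNanos, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            LockSupport.parkNanos(requestedNanos);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        averageParkNanos(1_000, 1_000); // warm up before measuring
        long average = averageParkNanos(1_000, 1_000);
        System.out.println("parkNanos(1000) took on average " + average + " ns");
    }
}
```

Averaging over many calls hides the spread between individual pauses, so treat the result as a rough guide only.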


The answer to our problems is to use System.nanoTime(). By busy waiting on a call to System.nanoTime() we will be able to pause for a single microsecond.  We'll see the code for this in a second but first let's understand the accuracy of System.nanoTime(). Critically, how long does it take to perform the call to System.nanoTime()?

Here's some code that will do exactly this:
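
A minimal sketch of such a measurement is below (the names are illustrative, not taken from a library). It calls System.nanoTime() in a tight loop and divides the elapsed time by the number of calls.

```java
class NanoTimeCost {
    // Written once per measurement so the loop's result is observably used.
    static volatile long sink;

    /** Average cost, in nanoseconds, of one System.nanoTime() call. */
    static double averageCallNanos(int calls) {
        long last = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            last = System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        sink = last;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        averageCallNanos(1_000_000); // warm up before measuring
        System.out.println("System.nanoTime() costs ~" + averageCallNanos(10_000_000) + " ns per call");
    }
}
```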



The numbers will vary from one machine to another; on my MBP I get ~40 nanoseconds.

That tells us that we should be able to measure to an accuracy of around 40 nanoseconds. Therefore, measuring 1 microsecond (1000 nanoseconds) should easily be possible.

This is the busy waiting approach 'pausing' for a microsecond:
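
In sketch form (class and method names are my own for illustration), the idea is to spin on System.nanoTime() until the requested time has passed and to return how long the pause really took:

```java
class BusyWait {
    /** Spins until at least 'nanos' nanoseconds have elapsed; returns the actual elapsed time. */
    static long pauseNanos(long nanos) {
        long start = System.nanoTime();
        long now;
        do {
            now = System.nanoTime();
        } while (now - start < nanos);
        return now - start;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            pauseNanos(1_000); // warm up before measuring
        }
        System.out.println("asked for 1000 ns, waited " + pauseNanos(1_000) + " ns");
    }
}
```

The pause can never come back early, only late, and the overshoot is bounded by roughly the cost of one System.nanoTime() call plus whatever the scheduler does to the thread.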


The code waits for a microsecond and then times how long it has waited.  On my machine I get 1,115 nanoseconds which is ~90% accurate.

As you wait longer the accuracy increases, 10 microseconds takes 10,267 which is ~97% accurate and 100 microseconds takes 100,497 nanoseconds which is ~99.5% accurate.

What about Thread.sleep(1), how accurate is that?

Here's the code for that:
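
A sketch along these lines, with made-up class and method names, times both Thread.sleep(1) and the TimeUnit.MICROSECONDS.sleep(1) call discussed earlier. What it prints depends on the JDK version and operating system.

```java
import java.util.concurrent.TimeUnit;

class SleepAccuracy {
    /** Average observed duration, in nanoseconds, of Thread.sleep(1). */
    static long averageThreadSleepNanos(int iterations) {
        return average(iterations, false);
    }

    /** Average observed duration, in nanoseconds, of TimeUnit.MICROSECONDS.sleep(1). */
    static long averageTimeUnitSleepNanos(int iterations) {
        return average(iterations, true);
    }

    private static long average(int iterations, boolean useTimeUnit) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                if (useTimeUnit) {
                    TimeUnit.MICROSECONDS.sleep(1);
                } else {
                    Thread.sleep(1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag; the average is then unreliable
                break;
            }
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        System.out.println("Thread.sleep(1): " + averageThreadSleepNanos(1_000) + " ns");
        System.out.println("TimeUnit.MICROSECONDS.sleep(1): " + averageTimeUnitSleepNanos(1_000) + " ns");
    }
}
```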


The average time in nanoseconds for a 1 millisecond sleep is 1,295,509.  That's only ~75% accurate.  It's probably good enough for nearly everything, but if you want an exact millisecond pause you are far better off with a busy wait.  Of course you need to remember that busy waiting, by definition, keeps your thread busy and will cost you a CPU.

Summary Table

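(Columns are the requested pause; values are the measured pause in microseconds.)

Pause Method                 1us    10us   100us   1000us/1ms   10,000us/10ms
TimeUnit.sleep()          1284.6  1293.8  1295.7       1292.7         11865.3
LockSupport.parkNanos()      8.1    28.4   141.8       1294.3         11834.2
Busy waiting                 1.1    10.1   100.2       1000.2         10000.2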


Conclusions

  • The only way to pause for a microsecond is by busy waiting
  • If you want to pause for anything less than a millisecond accurately you need to busy wait
  • LockSupport only begins to get accurate at 100us
  • System.nanoTime() takes ~40ns
  • Thread.sleep(1) is only 75% accurate
  • Busy waiting for 10us or more is almost 100% accurate
  • Busy waiting will tie up a CPU 





Friday 16 October 2015

Dynamic Java Code Injection

In this post we're going to look at how to dynamically load Java code into a running jvm. The code might be completely new or we might want to change the functionality of some existing code within our program.

(Before we start you might be wondering why on earth anyone might want to do this. The obvious example is for something like a rules engine. A rules engine would want to offer the ability for users to add or change rules without having to restart the system. You could do this by injecting DSL scripts as rules which would be called by your rules engine. The real problem with such an approach is that the DSL scripts would have to be interpreted making them exceedingly slow to run. Injecting actual Java code which can then be compiled and run in the same way as any other code in your program will be orders of magnitude more efficient.

At Chronicle we are using this very idea at the heart of our new microsecond micro-services/algo container).


The library we are going to use is the open source Chronicle library Java-Runtime-Compiler.

As you will see from the code below, the library is exceedingly simple to use - in fact it really only takes a couple of lines. Create a CachedCompiler and then call loadFromJava. (See the documentation here for the actual simplest use case.)

The program listed below does the following:
  1. Creates a thread which calls compute on a Strategy every second. The inputs to the Strategy are 10 and 20.
  2. Loads a strategy which adds two numbers together
  3. Waits 3s
  4. Loads a strategy which deducts one number from the other
This is the full code listing:


This is the output (explanatory comments are prefixed with //):

// The strategy has not been loaded yet. underlying in the StrategyProxy is null so Integer.MIN_VALUE is returned
-2147483648
// The adding strategy has been loaded 10+20=30
30
30
30
// After 3s the subtracting strategy is loaded. It replaces the adding strategy. 10-20=-10
-10
-10
-10
-10
-10

Note that in the code we created a new ClassLoader and a new CachedCompiler each time we loaded the Strategy.  The reason for this is that a ClassLoader can only have one instance of a particular class loaded at any one time.

If you were only using this library to load new code you would do it like this, without creating a ClassLoader (i.e. using the default ClassLoader) and using the CachedCompiler.


Class aClass = CompilerUtils.CACHED_COMPILER.loadFromJava(className, javaCode);
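
The swapping side of the program can be sketched without the compiler library at all. In the sketch below the Strategy interface and the setUnderlying method are assumptions on my part (only the names Strategy and StrategyProxy, the underlying field and the Integer.MIN_VALUE fallback come from the output above), and plain lambdas stand in for the freshly compiled classes.

```java
class HotSwapSketch {
    /** Assumed shape of the strategy interface. */
    interface Strategy {
        int compute(int a, int b);
    }

    /** Delegates to whichever Strategy is currently installed. */
    static final class StrategyProxy implements Strategy {
        // volatile: written by the loading thread, read by the calling thread
        private volatile Strategy underlying;

        void setUnderlying(Strategy underlying) {
            this.underlying = underlying;
        }

        @Override
        public int compute(int a, int b) {
            Strategy current = underlying; // read once so a concurrent swap cannot interleave
            return current == null ? Integer.MIN_VALUE : current.compute(a, b);
        }
    }

    public static void main(String[] args) {
        StrategyProxy proxy = new StrategyProxy();
        System.out.println(proxy.compute(10, 20)); // nothing loaded yet
        proxy.setUnderlying((a, b) -> a + b);       // stands in for the loaded adding strategy
        System.out.println(proxy.compute(10, 20));
        proxy.setUnderlying((a, b) -> a - b);       // stands in for the loaded subtracting strategy
        System.out.println(proxy.compute(10, 20));
    }
}
```

Callers only ever hold the proxy, so replacing the implementation is a single field write and needs no restart.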

Friday 9 October 2015

Chronicle-Wire Tutorial (Part 3): Serialising Code

Before reading this, Part 3 of the Chronicle-Wire tutorial, I'm assuming that you are comfortable with at least Part 1 (the basics) of the tutorial.

So far we've looked at how to serialise data as objects as well as serialising data in the form of documents. Now we're going to look at how we can serialise code.

There are two ways this can be done: with lambdas and with enums.

Serialising code with lambdas

One of the coolest features of Java 8 is lambdas, which provide a means to pass functions (essentially code) from one part of your application to be run in other parts of your application.

Being able to serialise code is really useful, for example, if you want to write some code on a client which gets executed on the server. Or, for map reduce type problems where you break up a problem and distribute the code so that it can be run on many machines. 

This example demonstrates how this can be done, in this case using the class SerializableFunction. The lambda String::toUpperCase is serialised to Bytes using both TextWire and BinaryWire.  The Bytes are then deserialised into a Function which is applied to a string "hello world".



The output of this program is:

----------Testing with TextWire--------------------
Text Wire representation of serialised function
toUpperCase: !SerializedLambda {
  cc: !type net.openhft.engine.chronicle.demo.WireDemoLambdas,
  fic: net/openhft/chronicle/core/util/SerializableFunction,
  fimn: apply,
  fims: (Ljava/lang/Object;)Ljava/lang/Object;,
  imk: 5,
  ic: java/lang/String,
  imn: toUpperCase,
  ims: ()Ljava/lang/String;,
  imt: (Ljava/lang/String;)Ljava/lang/String;,
  ca: [
  ]
}

hello world -> HELLO WORLD

----------Testing with BinaryWire--------------------
Binary Wire representation of serialised function
00000000 CB 74 6F 55 70 70 65 72  43 61 73 65 B6 10 53 65 ·toUpper Case··Se
00000010 72 69 61 6C 69 7A 65 64  4C 61 6D 62 64 61 82 1E rialized Lambda··
00000020 01 00 00 C2 63 63 BC 31  6E 65 74 2E 6F 70 65 6E ····cc·1 net.open
00000030 68 66 74 2E 65 6E 67 69  6E 65 2E 63 68 72 6F 6E hft.engi ne.chron
00000040 69 63 6C 65 2E 64 65 6D  6F 2E 57 69 72 65 44 65 icle.dem o.WireDe
00000050 6D 6F 4C 61 6D 62 64 61  73 C3 66 69 63 B8 34 6E moLambda s·fic·4n
00000060 65 74 2F 6F 70 65 6E 68  66 74 2F 63 68 72 6F 6E et/openh ft/chron
00000070 69 63 6C 65 2F 63 6F 72  65 2F 75 74 69 6C 2F 53 icle/cor e/util/S
00000080 65 72 69 61 6C 69 7A 61  62 6C 65 46 75 6E 63 74 erializa bleFunct
00000090 69 6F 6E C4 66 69 6D 6E  E5 61 70 70 6C 79 C4 66 ion·fimn ·apply·f
000000a0 69 6D 73 B8 26 28 4C 6A  61 76 61 2F 6C 61 6E 67 ims·&(Lj ava/lang
000000b0 2F 4F 62 6A 65 63 74 3B  29 4C 6A 61 76 61 2F 6C /Object; )Ljava/l
000000c0 61 6E 67 2F 4F 62 6A 65  63 74 3B C3 69 6D 6B 05 ang/Obje ct;·imk·
000000d0 C2 69 63 F0 6A 61 76 61  2F 6C 61 6E 67 2F 53 74 ·ic·java /lang/St
000000e0 72 69 6E 67 C3 69 6D 6E  EB 74 6F 55 70 70 65 72 ring·imn ·toUpper
000000f0 43 61 73 65 C3 69 6D 73  F4 28 29 4C 6A 61 76 61 Case·ims ·()Ljava
00000100 2F 6C 61 6E 67 2F 53 74  72 69 6E 67 3B C3 69 6D /lang/St ring;·im
00000110 74 B8 26 28 4C 6A 61 76  61 2F 6C 61 6E 67 2F 53 t·&(Ljav a/lang/S
00000120 74 72 69 6E 67 3B 29 4C  6A 61 76 61 2F 6C 61 6E tring;)L java/lan
00000130 67 2F 53 74 72 69 6E 67  3B C2 63 61 82 00 00 00 g/String ;·ca····
00000140 00                                               ·                

hello world -> HELLO WORLD
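
For comparison, the same round trip can be sketched with nothing but the JDK's built-in serialisation, which is also where the SerializedLambda type seen above comes from. This is not Chronicle-Wire code: SerFunction is a hypothetical stand-in for a serialisable function type, and the bytes produced are the JDK's own format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

class JdkLambdaRoundTrip {
    /** Hypothetical stand-in: a Function that is also Serializable. */
    interface SerFunction<T, R> extends Function<T, R>, Serializable {
    }

    static byte[] serialise(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Function<String, String> deserialise(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Function<String, String>) in.readObject();
        }
    }

    /** Serialises String::toUpperCase, reads it back and applies it to "hello world". */
    static String demo() {
        SerFunction<String, String> toUpper = String::toUpperCase;
        try {
            return deserialise(serialise(toUpper)).apply("hello world");
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hello world -> " + demo());
    }
}
```

The receiving side still needs the class that created the lambda on its classpath; what travels is a description of the method to call, not the bytecode itself.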

Serialising code with enums

Using lambdas to serialise code gives the developer maximum flexibility in terms of what can be serialised, but there are also a couple of drawbacks.
  1. The serialised lambda is very bulky (see the print out of the bytes above)
  2. There is no control over what can be serialised.  If you are using serialisation to send messages across the wire between client and server you might want to have more control over the code that can be executed by the client on the server. 
Using enums addresses these shortcomings.

The code below does exactly the same as the code we saw in the lambda example in the first section. It takes the code to transform a string to upper case and serialises it. The code to transform the string is stored in an enum which implements Function.




This is the output from that program

----------Testing with TextWire--------------------

Text Wire representation of serialised function
toUpperCase: !StringFunctions TO_UPPER_CASE

hello world -> HELLO WORLD

----------Testing with BinaryWire--------------------

Binary Wire representation of serialised function
00000000 CB 74 6F 55 70 70 65 72  43 61 73 65 B6 0F 53 74 ·toUpper Case··St
00000010 72 69 6E 67 46 75 6E 63  74 69 6F 6E 73 ED 54 4F ringFunc tions·TO
00000020 5F 55 50 50 45 52 5F 43  41 53 45                _UPPER_C ASE     


hello world -> HELLO WORLD

Straight away we see the difference in verbosity between the self-describing lambda and the predefined enum.

Enum ->  toUpperCase: !StringFunctions TO_UPPER_CASE
Lambda -> toUpperCase: !SerializedLambda {
  cc: !type net.openhft.engine.chronicle.demo.WireDemoLambdas,
  fic: net/openhft/chronicle/core/util/SerializableFunction,
  fimn: apply,
  fims: (Ljava/lang/Object;)Ljava/lang/Object;,
  imk: 5,
  ic: java/lang/String,
  imn: toUpperCase,
  ims: ()Ljava/lang/String;,
  imt: (Ljava/lang/String;)Ljava/lang/String;,
  ca: [
  ]
}

Since all the enums have been pre-defined there are no nasty surprises as to what code is going to be run.

On the downside of course you lose the flexibility of ad hoc lambdas.
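
The trade-off can be illustrated without Chronicle-Wire at all. In this sketch only the enum name StringFunctions and the constant TO_UPPER_CASE come from the output above; the second constant and the encode/decode helpers are hypothetical. All that needs to travel is the constant's name, and a name that isn't pre-defined is rejected.

```java
import java.util.function.Function;

class EnumFunctionSketch {
    /** The complete set of functions a peer is allowed to request. */
    enum StringFunctions implements Function<String, String> {
        TO_UPPER_CASE {
            @Override
            public String apply(String s) {
                return s.toUpperCase();
            }
        },
        TO_LOWER_CASE { // hypothetical second entry, for illustration
            @Override
            public String apply(String s) {
                return s.toLowerCase();
            }
        }
    }

    /** The "serialised" form is just the constant's name. */
    static String encode(StringFunctions function) {
        return function.name();
    }

    /** Throws IllegalArgumentException for any name that is not pre-defined. */
    static Function<String, String> decode(String name) {
        return StringFunctions.valueOf(name);
    }

    public static void main(String[] args) {
        String wireForm = encode(StringFunctions.TO_UPPER_CASE);
        System.out.println(wireForm + ": hello world -> " + decode(wireForm).apply("hello world"));
    }
}
```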

Note the line:

ClassAliasPool.CLASS_ALIASES.addAlias(StringFunctions.class, "StringFunctions");

As explained earlier this means that "StringFunctions" is mapped to the class StringFunctions.

Without this line (without class aliasing) the output:

toUpperCase: !StringFunctions TO_UPPER_CASE

would be

toUpperCase: !net.openhft.engine.chronicle.demo.WireDemoEnums$StringFunctions TO_UPPER_CASE

Apart from being more verbose and harder to read it would also be a problem when passing the message between languages which is one of the reasons to use Chronicle-Wire serialisation.

Summary

Use enums when you know you need maximum efficiency and/or control. Use lambdas when you need flexibility. In practice you will probably mix both for different parts of your application.

Chronicle-Wire Tutorial (Part 2): Working with Documents

In Part 1 we saw the basics of how to use Chronicle-Wire to serialise and deserialise objects and some of the benefits of using Wire.

In this post I want to show you some more advanced features around serialising data in the form of documents.

Serialising documents

Rather than serialising a whole object with Wire (which is what we did in the previous post with Person) it is possible that you might want to group a number of data items together in an ad hoc object and serialise that with Wire.

You can do this with the document feature of Chronicle.

This should be clearer by looking at this example.



The output from this program is:


--------TextWire Demo--------------
Data serialised with TextWire:
!data: {
  name: dan,
  age: 44
}

Data deserialised:
Name:dan
Age:44

---------BinaryWire Demo--------------
Data serialised with BinaryWire:
00000020 34 34 0A 7D 0A 18 00 00  00 C4 64 61 74 61 82 0E 44·}···· ··data··
00000030 00 00 00 C4 6E 61 6D 65  E3 64 61 6E C3 61 67 65 ····name ·dan·age
00000040 2C                                               ,                
Data deserialised:
Name:dan

Age:44

So you have the ability to create ad hoc objects in the form of documents.

But you can go a step further with this:

Creating real objects on the fly

When writing a document you have the ability to give it a 'type'.  This is done by calling the method typePrefix() as you can see in the code below.



This is the output from the program:

---------TextWire Demo--------------
Data serialised with TextWire:
Kdata: !chronicle.demo.Person {
  name: dan,
  age: 44
}

Data deserialised:
Person{name='dan', age=44}

---------BinaryWire Demo--------------
Data serialised with BinaryWire:
00000040 6E 2C 0A 20 20 61 67 65  3A 20 34 34 0A 7D 0A 42 n,·  age : 44·}·B
00000050 00 00 00 C4 64 61 74 61  B6 28 6E 65 74 2E 6F 70 ····data ·(net.op
00000060 65 6E 68 66 74 2E 65 6E  67 69 6E 65 2E 63 68 72 enhft.en gine.chr
00000070 6F 6E 69 63 6C 65 2E 64  65 6D 6F 2E 50 65 72 73 onicle.d emo.Pers
00000080 6F 6E 82 0E 00 00 00 C4  6E 61 6D 65 E3 64 61 6E on······ name·dan
00000090 C3 61 67 65 2C                                   ·age,            
Data deserialised:
Person{name='dan', age=44}


The code for the serialisation is almost the same as the one we used in the last example except that this time it sets typePrefix() to the Person class.  This is the same Person we saw in the previous post. Code listing below:




Because we know the type of the object, we are able to deserialise using the method typedMarshallable() into a Java object.  In this example we have created a Person object.

Note: Wire has the concept of a ClassAliasPool which allows you to use shortened names or aliases rather than the fully qualified class name. This is important as it can make your data shorter and easier to read, both of which are goals of Chronicle-Wire.

Deserialising documents without creating objects

One of the goals of Chronicle in general and Chronicle-Wire in particular is to aim for zero object creation.  Reusing objects is key to achieving this goal.

In the code below we see how to deserialise data from the document directly into an existing object. (For the sake of brevity I'm only going to use TextWire).




Working with deserialised data

Along the same lines as we saw above you can also deserialise directly into a lambda that can be used to manipulate or use that data.

Take a look at this example that tests the value of the deserialised data:




The interesting thing to note here is how, when deserialising, the object() method can be employed to use the data. In this case we are asserting to prove we have the correct data but in a real application other more meaningful tasks would be performed.

Summary

Hopefully this tutorial has introduced you to the power of using documents within Chronicle-Wire.  Creating ad hoc objects, parts of objects, and real objects from their constituent data parts are features that can make your code easier to write, easier to debug and most of all, make your code faster. 

Thursday 8 October 2015

Chronicle-Wire Tutorial (Part 1): The Basics

Chronicle-Wire is a library written by Chronicle that allows a Java developer to serialise Java objects. It's a library we developed to support our higher level products like Chronicle Queue and Chronicle Map. However the library has applications in any code that uses serialisation.

At this point you're probably wondering what's new about serialisation and why we need another library. Serialisation is hardly a novel concept in Java. In fact Serializable has been in Java since jdk1.1 - so almost forever :)

The real innovation behind Wire is that it abstracts away the implementation of the serialisation to a pluggable Wire implementation.  The idea is that your objects need only describe what is to be serialised not how it should be serialised. This is done by the objects (the POJOs that are to be serialised) implementing the Marshallable interface.  

It is only at the point of serialisation that you decide how that serialisation is actually implemented by selecting a particular Wire implementation to provide to the process.

Let's look at an example of this which will hopefully make this concept much clearer:


This is the output from the program:

-----------TEXT WIRE------------

Person to serialise: Person{name='dan', age=44}
Text Wire prints:
name: dan
age: 44
Deserialised person: Person{name='dan', age=44}

-----------BINARY WIRE------------

Person to serialise: Person{name='dan', age=44}
Text Wire prints:
00000000 C4 6E 61 6D 65 E3 64 61  6E C3 61 67 65 2C       ·name·da n·age,  
Deserialised person: Person{name='dan', age=44}

-----------RAW WIRE------------

Person to serialise: Person{name='dan', age=44}
Text Wire prints:
00000000 03 64 61 6E 2C                                   ·dan,            

Deserialised person: Person{name='dan', age=44}

What should be clear here is that the class Person is in no way responsible for how its data is serialised.  That is done by the various implementations of Wire.

  • TextWire - serialises to text for a humanly readable format
  • BinaryWire - serialises to a self describing binary format
  • RawWire - serialises to a compact binary format

Person is only responsible for choosing the data that is to be serialised and describing the type of that data.  You will notice that Wire has a very large list of types (e.g. int8(), int16()), which allows certain Wire implementations to encode each value as efficiently as possible.
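To make the split between "what" and "how" concrete, here is a minimal, self-contained sketch. It does not use Chronicle-Wire itself: SketchWire, SketchMarshallable, TextSketchWire and RawSketchWire are hypothetical stand-ins invented for this illustration, and this Person is a simplified version of the one in the example. The byte layout of the raw variant is only an assumption, chosen to resemble the RAW WIRE dump above (a one-byte length, the string bytes, then the age as a single byte).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Wire: it decides HOW each field is encoded.
interface SketchWire {
    void text(String field, String value);
    void int8(String field, int value);
}

// Hypothetical stand-in for Marshallable: the object only says WHAT to write.
interface SketchMarshallable {
    void writeTo(SketchWire wire);
}

class Person implements SketchMarshallable {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public void writeTo(SketchWire wire) {
        // Describes the fields and their types, nothing about the format.
        wire.text("name", name);
        wire.int8("age", age);
    }
}

// Human-readable "field: value" lines.
class TextSketchWire implements SketchWire {
    final StringBuilder out = new StringBuilder();

    public void text(String field, String value) {
        out.append(field).append(": ").append(value).append('\n');
    }

    public void int8(String field, int value) {
        out.append(field).append(": ").append(value).append('\n');
    }
}

// Compact binary: field names are dropped and strings are length-prefixed.
class RawSketchWire implements SketchWire {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public void text(String field, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.write(utf8.length);            // single length byte: short strings only
        out.write(utf8, 0, utf8.length);
    }

    public void int8(String field, int value) {
        out.write(value);                  // only the low 8 bits are written
    }
}

class WireSketchDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Person dan = new Person("dan", 44);

        TextSketchWire text = new TextSketchWire();
        dan.writeTo(text);
        System.out.print(text.out);

        RawSketchWire raw = new RawSketchWire();
        dan.writeTo(raw);
        System.out.println(hex(raw.out.toByteArray()));
    }
}
```

Person.writeTo is identical for both formats; only the object passed in changes. Adding another format means writing another SketchWire implementation, without touching Person.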

Whilst in the example above Person is serialised to Bytes (for more information on Bytes see here) you can actually serialise to whatever format you want. All you have to do is to implement the Wire interface.  (You don't need to provide an implementation for methods that are not appropriate to your format.)

As well as obvious serialisation formats such as JSON, YAML and CSV you can also create some rather bizarre ones. For example I created an LDAPWire which serialises objects into Attributes that can be stored in an LDAP database.  In this case I only implemented the text() methods of Wire.

This chart (scroll down) compares a couple of the Wire implementations with the competition. As you can see, they compare favourably with SBE and Cap'n Proto.

Thursday 24 September 2015

Java to LDAP Tutorial (Including How to Install an LDAP Server / Client)

This tutorial will show you how to write Java code to interact with an LDAP server. But before we can do that we will need to set up an LDAP server and client on our machine.

If at this point you are not sure of exactly what LDAP is, I recommend this post which provides an excellent definition with examples. (In a nutshell it helps to think of an LDAP server as a specialised database).

Installing an LDAP Server

I'm running on a MacBook Pro. After looking around for a while I found that the easiest LDAP server to install was ApacheDS (Apache Directory Server), which you can download from here. (Installing and starting the server should take less than 5 minutes.)

Once it's installed, the daemon starts automatically. You can also start the server yourself with this command:

sudo launchctl start org.apache.directory.server 

For further installation instructions see here.

(If you need to uninstall, you will find the application installed at /usr/local/apacheds-2.0.0-M20. Just delete that directory and it will be gone.)

LDAP Client

You will want to view the contents of your LDAP Server.  The easiest LDAP client to install is Apache Directory Studio which can be downloaded from here.

Once it is downloaded you need to create a connection to the server - the instructions for which are contained here.

When connected your Apache Directory Studio should look something like this:




Now to access LDAP from a Java program. The best way to show you how to do this is through an example program. The program will perform the following tasks:
  • Create a new LDAP object
  • View an LDAP object
  • Add a new attribute to an LDAP object
  • Modify an attribute on an LDAP object
  • Remove an attribute on an LDAP object
  • Delete an LDAP object
  • Search for all LDAP objects in a specific domain

Note:  This class cleans up after itself, i.e. it leaves the LDAP server in the state in which it was found.  If you want to see the various tasks in action, run just one of the tasks and take a look at the LDAP object through the LDAP client. Don't forget you can also modify the object in the LDAP client and test that way.

The code is below and should be self-explanatory.
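As a rough sketch of how a few of the tasks listed above map onto the JDK's built-in JNDI API (an illustration only, not the class described in the note): it assumes ApacheDS is listening on port 10389 with the default admin account uid=admin,ou=system and password secret, and it uses a hypothetical entry cn=dan,ou=users,ou=system. Adjust all of these to match your own installation.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.ModificationItem;

class LdapSketch {
    // Assumed connection details; change them for your server.
    static final String URL = "ldap://localhost:10389";
    static final String ADMIN_DN = "uid=admin,ou=system";
    static final String PASSWORD = "secret";
    // Hypothetical entry used purely for illustration.
    static final String ENTRY_DN = "cn=dan,ou=users,ou=system";

    // Connection settings for the JDK's LDAP provider.
    static Hashtable<String, Object> environment() {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, URL);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, ADMIN_DN);
        env.put(Context.SECURITY_CREDENTIALS, PASSWORD);
        return env;
    }

    // Attributes for a new "person" entry (cn and sn are mandatory for that class).
    static Attributes personAttributes(String cn, String sn) {
        Attributes attrs = new BasicAttributes(true); // true = ignore attribute-name case
        Attribute objectClass = new BasicAttribute("objectClass");
        objectClass.add("top");
        objectClass.add("person");
        attrs.put(objectClass);
        attrs.put("cn", cn);
        attrs.put("sn", sn);
        return attrs;
    }

    // A single "add attribute" modification.
    static ModificationItem[] addDescription(String text) {
        return new ModificationItem[] {
            new ModificationItem(DirContext.ADD_ATTRIBUTE,
                                 new BasicAttribute("description", text))
        };
    }

    public static void main(String[] args) throws NamingException {
        DirContext ctx = new InitialDirContext(environment());
        try {
            ctx.createSubcontext(ENTRY_DN, personAttributes("dan", "smith"));    // create
            ctx.modifyAttributes(ENTRY_DN, addDescription("created from Java")); // add attribute
            System.out.println(ctx.getAttributes(ENTRY_DN));                     // view
            ctx.destroySubcontext(ENTRY_DN);                                     // delete
        } finally {
            ctx.close();
        }
    }
}
```

Replacing or removing an attribute follows the same pattern as addDescription, using DirContext.REPLACE_ATTRIBUTE or DirContext.REMOVE_ATTRIBUTE, and searching goes through DirContext.search.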

Monday 21 September 2015

JAXB - XML to Java in 2 lines of Code!

A couple of weeks ago I was presented with a bunch of xml files for which I was also given the xsd document.  The task was to calculate some metrics based on properties of the xml files.

Not too complicated.  All I had to do was:

  1. Convert the xml files to Java objects.
  2. Write some code to perform a calculation on said objects.

I hadn't actually tried this before but I challenged myself to do this (at least the boilerplate of point 1) in 2 lines of code. Here's how I did it:

Step 1 - Create a Java data model

I created a project in IntelliJ and dropped my xsd document in a resources directory.

I then highlighted the xsd file and clicked Tools->JAXB->Generate Java Code


I was presented with a dialogue box as below


Click OK and, hey presto, your whole data model has been generated as Java classes.

Step 2 - Deserialise the xml file into Java Objects  

//imports needed: javax.xml.bind.JAXBContext, javax.xml.bind.Unmarshaller, java.io.FileReader
//line 1 create an Unmarshaller for the object type you are reading from the xml file
Unmarshaller um = JAXBContext.newInstance(DataObject.class).createUnmarshaller();
//line 2 deserialise the xml file into a java object
DataObject dataObject = (DataObject)um.unmarshal(new FileReader("DataObject.xml"));
//now run the calculation on dataObject

So that's it - only took 2 lines!