Thursday 10 December 2015

7 Tips for Successful Code Generation in Java

By way of introduction, I've been a bit quiet recently, partly because I've been busy working on Chronicle-FIX. This is a new ultra-low-latency library in the Chronicle-Enterprise suite, where we have proven that we can parse and store messages in low single-digit microseconds. Of course, it leverages our open source products Chronicle-Queue, Chronicle-Network and Chronicle-Bytes.

One of the secrets of the low latency we achieve is that each implementation can generate a custom-built FIX engine based exactly on the schema it requires. To achieve this, I've built a whole lot of code generation code.

So I thought I would share some of the lessons I learnt during this process.

1. Use a code generation library

This is by far the most important tip. I can't stress enough how important this is and how much time and complexity you will save. You really don't want to be messing around trying to get the correct spacing in your code, or doubly and triply escaping quotes when creating string literals!

I used JavaPoet. It's open source (Apache 2) and it is excellent. It handled everything I wanted to do, including some quite complex generics. The only thing it didn't support was declaring static imports, but that was easy to work around and is an insignificant quibble about what was otherwise an excellent library.

JavaPoet decomposes Java code into objects: TypeSpec (the class or interface), MethodSpec (methods), FieldSpec (fields) and CodeBlock (blocks of code). It cleverly provides a rich syntax, based on the printf pattern, for building up statements (lines of code). You will find that following this paradigm leaves you with cleaner and more succinct code than trying to do this yourself, where it's easy to stray down the procedural coding route for this type of problem.
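To make those building blocks concrete, here is a minimal sketch, assuming the com.squareup:javapoet artifact is on the classpath. The class name NewOrderSingle, the price field and the describe method are made up for illustration and are not taken from Chronicle-FIX. The $S placeholder emits a correctly escaped string literal and $N emits the name of another spec.

```java
import com.squareup.javapoet.FieldSpec;
import com.squareup.javapoet.JavaFile;
import com.squareup.javapoet.MethodSpec;
import com.squareup.javapoet.TypeSpec;
import javax.lang.model.element.Modifier;

public class JavaPoetSketch {

    /** Builds a tiny class from specs and returns its source as a String. */
    static String generate() {
        // FieldSpec: a private double field (hypothetical schema field).
        FieldSpec price = FieldSpec.builder(double.class, "price", Modifier.PRIVATE).build();

        // MethodSpec: the statement is built printf-style; JavaPoet adds
        // the indentation, the quotes and the trailing semicolon.
        MethodSpec describe = MethodSpec.methodBuilder("describe")
                .addModifiers(Modifier.PUBLIC)
                .returns(String.class)
                .addStatement("return $S + $N", "price=", price)
                .build();

        // TypeSpec: the class that owns the field and the method.
        TypeSpec message = TypeSpec.classBuilder("NewOrderSingle")
                .addModifiers(Modifier.PUBLIC, Modifier.FINAL)
                .addField(price)
                .addMethod(describe)
                .build();

        // JavaFile adds the package declaration and any imports.
        return JavaFile.builder("com.test.generatedcode", message).build().toString();
    }

    public static void main(String[] args) {
        System.out.println(generate());
    }
}
```

Note that nothing in the generator deals with spacing or escaping by hand; the specs carry the structure and JavaPoet does the formatting.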

2. Write the code by hand first

Don't try to generate code without having written some example code first. It's not easy coding through the rear-view mirror, and doubly complicated if you're working it out as you go along.

Spend a little time writing the code by hand first, and then write the code generation for it.

Additionally, you will want to produce the most efficient code possible, which can only be done if you spend the time and effort writing it by hand first.

3. Generate as little code as possible

Generate only as much code as you need. For example, if all your generated classes need to implement a certain method, put it in a hand-written helper class that the generated code calls, rather than generating that method multiple times. Alternatively, have your generated classes extend a static (hand-written) base class, with the method on the base class.
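As a rough sketch of the base class approach: the shared logic is written once by hand, and the generator only has to emit the schema-specific parts. All names here (AbstractMessage, GeneratedHeartbeat, describe) are hypothetical, and the "generated" class is written by hand to stand in for generator output.

```java
public class BaseClassSketch {

    /** Hand-written; would live outside the generated package. */
    abstract static class AbstractMessage {
        abstract String name();

        abstract int fieldCount();

        // Shared behaviour written once instead of being generated per class.
        String describe() {
            return name() + "[" + fieldCount() + " fields]";
        }
    }

    /** Stand-in for generator output: only the schema-specific parts. */
    static final class GeneratedHeartbeat extends AbstractMessage {
        @Override
        String name() {
            return "Heartbeat";
        }

        @Override
        int fieldCount() {
            return 2;
        }
    }

    public static void main(String[] args) {
        System.out.println(new GeneratedHeartbeat().describe());
    }
}
```

The generator now has two trivial methods to emit per class, and a bug in describe is fixed in one hand-written place rather than in the generator.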

4. Make sure you can blow away all generated code in one go

Don't mix your static (hand-written) code with your generated code. What you want is a dedicated package like com.test.generatedcode.xx.xx. At the end of each test/development run you should be able to delete the whole folder com/test/generatedcode, which means that you have no static code in that folder at all.

You will find that arranging your code in this way will make the testing cycle that much faster.
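With that layout, wiping the generated code can be a single call at the start of each generator run. The following is one possible sketch using only the JDK; the class and method names are made up, and the demo method builds a throwaway folder in a temporary directory so that it doesn't touch a real source tree.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class GeneratedCodeCleaner {

    /** Deletes root and everything beneath it; does nothing if root is absent. */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway generated-code tree, wipes it, and reports whether it is gone. */
    static boolean demo() {
        try {
            Path sourceRoot = Files.createTempDirectory("src");
            Path generated = sourceRoot.resolve("com/test/generatedcode");
            Files.createDirectories(generated.resolve("messages"));
            Files.write(generated.resolve("messages/Heartbeat.java"),
                    "class Heartbeat {}".getBytes());
            deleteRecursively(generated);
            return Files.notExists(generated) && Files.exists(sourceRoot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("generated folder removed: " + demo());
    }
}
```

This is only safe because nothing hand-written lives under the generated folder, which is the whole point of keeping the two apart.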

5. Start with a small verifiable set of code

The point of code generation is often to produce lots of code, and it can be hard to test whether the code you have produced is correct. For this reason, start with a small but complex example that you can check by reading the code and for which you also have a test case.

Hopefully, the cases after that will be more of the same.

6. Generate test cases 

As mentioned in tip 5, verifying that your code generator is correct for all cases can be difficult. For this reason, you should try to generate test cases as part of the code that is generated. You can then generate many variations of complicated code from your generator and have it test itself.
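One way this could look with JavaPoet is sketched below, again assuming com.squareup:javapoet is on the classpath. It emits one JUnit 4 test method per schema field. The assumption that each generated message class has String setters and getters named after its fields is purely illustrative, as are all the names used.

```java
import com.squareup.javapoet.ClassName;
import com.squareup.javapoet.JavaFile;
import com.squareup.javapoet.MethodSpec;
import com.squareup.javapoet.TypeSpec;
import java.util.Arrays;
import java.util.List;
import javax.lang.model.element.Modifier;

public class TestGeneratorSketch {

    /** Returns the source of a JUnit 4 test class with one test per field. */
    static String generateTests(String messageName, List<String> fields) {
        // References to types that exist only in the generated or test classpath.
        ClassName message = ClassName.get("com.test.generatedcode", messageName);
        ClassName junitTest = ClassName.get("org.junit", "Test");
        ClassName junitAssert = ClassName.get("org.junit", "Assert");

        TypeSpec.Builder testClass = TypeSpec.classBuilder(messageName + "Test")
                .addModifiers(Modifier.PUBLIC);

        for (String field : fields) {
            // symbol -> Symbol, to form setSymbol / getSymbol (assumed accessors).
            String cap = Character.toUpperCase(field.charAt(0)) + field.substring(1);
            testClass.addMethod(MethodSpec.methodBuilder(field + "RoundTrips")
                    .addAnnotation(junitTest)
                    .addModifiers(Modifier.PUBLIC)
                    .addStatement("$T msg = new $T()", message, message)
                    .addStatement("msg.set$L($S)", cap, "x")
                    .addStatement("$T.assertEquals($S, msg.get$L())", junitAssert, "x", cap)
                    .build());
        }
        return JavaFile.builder("com.test.generatedcode", testClass.build())
                .build()
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(generateTests("NewOrderSingle", Arrays.asList("symbol", "account")));
    }
}
```

Because the tests are driven by the same schema as the classes, every new schema you feed the generator arrives with its own checks.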

7. Be verbose

Code to generate code, by its nature, can be extremely difficult to read and understand. Code-generating code will never have to be fast, so don't worry about making it verbose. Use lots of comments but, more importantly, try to make it as explicit as possible so that you can look back at it in a year and still be able to maintain it.
