Thursday 30 April 2015

Cheating with Exceptions - Java 8 Lambdas

Leaving aside the religious debate about Checked vs Runtime exceptions, there are times when, due to poorly constructed libraries, dealing with checked exceptions can drive you insane.

Consider this snippet of code which you might want to write:

public void createTempFileForKey(String key) {
  Map<String, File> tempFiles = new ConcurrentHashMap<>();
  //does not compile because it throws an IOException!!
  tempFiles.computeIfAbsent(key, k -> File.createTempFile(key, ".tmp"));
}

For it to compile you need to catch the exception which leaves you with this code:

public void createTempFileForKey(String key) {
    Map<String, File> tempFiles = new ConcurrentHashMap<>();
    tempFiles.computeIfAbsent(key, k -> {
        try {
            return File.createTempFile(key, ".tmp");
        }catch(IOException e) {
            e.printStackTrace();
            return null;
        }
    });
}

Although this compiles, the IOException has effectively been swallowed.  The user of this method should be informed that an Exception has been thrown.

To address this you could wrap the IOException in a generic RuntimeException as below:

public void createTempFileForKey(String key) throws RuntimeException {
    Map<String, File> tempFiles = new ConcurrentHashMap<>();
    tempFiles.computeIfAbsent(key, k -> {
        try {
            return File.createTempFile(key, ".tmp");
        }catch(IOException e) {
            throw new RuntimeException(e);
        }
    });
}

This code does throw an exception, but not the actual IOException that the code was intended to throw. It's possible that those in favour of RuntimeExceptions only would be happy with this code, especially if the solution could be refined to create a customised IORuntimeException.  Nevertheless, most people would expect their method to be able to throw the checked IOException from the File.createTempFile method.
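As a sketch of the "customised IORuntimeException" idea: Java 8 already ships java.io.UncheckedIOException, which exists for exactly this kind of wrapping. The class name TempFiles and the field layout below are illustrative assumptions, not part of the original code.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder class, used only for illustration.
class TempFiles {
    private final Map<String, File> tempFiles = new ConcurrentHashMap<>();

    // Any IOException raised inside the lambda is rethrown wrapped in the
    // JDK's UncheckedIOException, so the cause is preserved with its type.
    File createTempFileForKey(String key) {
        return tempFiles.computeIfAbsent(key, k -> {
            try {
                File file = File.createTempFile(k, ".tmp");
                file.deleteOnExit();
                return file;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

A caller that cares can catch UncheckedIOException and call getCause(), which is declared to return an IOException. Note that File.createTempFile requires a prefix of at least three characters.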

The natural way to do this is a little convoluted and looks like this:

public void createTempFileForKey(String key) throws IOException{
        Map<String, File> tempFiles = new ConcurrentHashMap<>();
        try {
            tempFiles.computeIfAbsent(key, k -> {
                try {
                    return File.createTempFile(key, ".tmp");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }catch(RuntimeException e){
            if(e.getCause() instanceof IOException){
                throw (IOException)e.getCause();
            }
            // not a wrapped IOException: rethrow rather than swallow it
            throw e;
        }
}

From inside the lambda, you have to catch the IOException, wrap it in a RuntimeException and throw that RuntimeException. The enclosing method then has to catch the RuntimeException, unwrap it and rethrow the IOException. All very ugly indeed!

In an ideal world what we need is to be able to throw the checked exception from within the lambda without having to change the declaration of computeIfAbsent. In other words, to throw a checked exception as if it were a runtime exception. But unfortunately Java doesn't let us do that...

That is not unless we cheat! Here are two methods for doing precisely what we want, throwing a checked exception as if it were a runtime exception.

Method 1 - Using generics:

    public static void main(String[] args){
        doThrow(new IOException());
    }

    static void doThrow(Exception e) {
        CheckedException.<RuntimeException> doThrow0(e);
    }

    static <E extends Exception>
      void doThrow0(Exception e) throws E {
          throw (E) e;
    }

Note that we have created and thrown an IOException without it being declared in the main method.

Method 2 - Using Unsafe:

    public static void main(String[] args){
        getUnsafe().throwException(new IOException());
    }

    private static Unsafe getUnsafe(){
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }

Again we have managed to throw an IOException without having declared it in the method.

Whichever method you prefer we are now free to write the original code in this way:

public void createTempFileForKey(String key) throws IOException{
        Map<String, File> tempFiles = new ConcurrentHashMap<>();

        tempFiles.computeIfAbsent(key, k -> {
            try {
                return File.createTempFile(key, ".tmp");
            } catch (IOException e) {
                throw doThrow(e);
            }
        });
    }
    
    private RuntimeException doThrow(Exception e){
        getUnsafe().throwException(e);
        return new RuntimeException();
    }

    private static Unsafe getUnsafe(){
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }

The doThrow() method would obviously be encapsulated in some utility class leaving your code in createTempFileForKey() pretty clean.
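A minimal sketch of such a utility class, built on the generics trick from Method 1 rather than on Unsafe. The names Sneaky and TempFileCache are made up for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical utility class; the name is an assumption.
final class Sneaky {
    private Sneaky() {}

    // Declared to return RuntimeException only so callers can write
    // "throw Sneaky.doThrow(e);" and keep the compiler's flow analysis
    // happy. It never actually returns.
    static RuntimeException doThrow(Throwable t) {
        return Sneaky.<RuntimeException>doThrow0(t);
    }

    // E is inferred as RuntimeException at the call site, so the compiler
    // asks for no throws clause; the cast is erased at runtime.
    @SuppressWarnings("unchecked")
    private static <E extends Throwable> RuntimeException doThrow0(Throwable t) throws E {
        throw (E) t;
    }
}

// Hypothetical caller showing how clean the lambda becomes.
class TempFileCache {
    private final Map<String, File> tempFiles = new ConcurrentHashMap<>();

    File createTempFileForKey(String key) throws IOException {
        return tempFiles.computeIfAbsent(key, k -> {
            try {
                return File.createTempFile(k, ".tmp");
            } catch (IOException e) {
                throw Sneaky.doThrow(e);
            }
        });
    }
}
```

Keeping "throws IOException" on createTempFileForKey is what tells callers about the exception; the compiler can no longer verify it for you.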

Lessons Learnt Translating 25k Lines of C# into Java

For various reasons I've recently completed a project converting a complex financial application from C# to Java. The reasons for the port were for the most part non-technical, rather, it was a strategic move for the business concerned. 

It was an interesting experience and I learnt a few lessons along the way that might be useful to share.

1. Construct language neutral tests over the existing system.

I'll start with perhaps the most important lesson of all. When porting a system, and this could be any port for any reason, there must be criteria to determine whether the port has been successful. The best way to do this is to construct a full set of tests around the original system that can be 'exported without change' to the new system.  So for example, it's no good having a suite of JUnit tests if you want to move a system from Java to a different language that doesn't support JUnit.  I can't stress enough how important it was that the tests could literally be copied from the old system to the new system without intervention.

Another problem with JUnit tests is that they are often firmly linked to the existing implementation. Since the implementation is going to be rewritten, the tests are not portable between implementations.

The strategy we chose and which worked extremely well was to use Cucumber tests.  There are bindings for Cucumber in nearly all languages, it is well supported by IDEs (at least by both IntelliJ and Visual Studio) and as a bonus the tests are human readable. In this way you can involve non-technical users in building up the tests in preparation for the port. (As an aside, we had an attempt at getting the users to define the requirements for the new system by documenting everything the old system did and building tests round those requirements, but that unsurprisingly was completely unsuccessful.  It's far better building up test cases based off your existing implementation than trying to invent them for the new system!).

Using Cucumber was a real success and we created a new test every time there was a discrepancy between the systems. By the time we had finished we had around 1000 scenarios and we felt confident that the new system was correct.  It gave us the solid foundations we needed to continue developing the additional features and refactorings in the new system.

2. Try and automate as much of the translation as possible.

When faced with 25k+ lines of C# it's a pretty daunting task to think of hand translating every line into Java. Fortunately there are tools out there that are enormously helpful.  The product we used was from Tangible Software Solutions. For a couple of hundred dollars it saved literally hundreds of man hours.  It's not perfect by any means but it will give you the structure of the Java code (bear in mind that C# partial classes allow the code for a class to be split across more than one file) and make a pretty good attempt at giving you workable Java.  In our case hardly any of the generated code actually compiled but it was a really good head start. My analogy would be to early attempts at OCR. You could scan in a document but when you opened it in an editor you would find red underlining against many words which had not been recognised correctly. It was a matter of going through all the red underlining and working out what each word should have been. Much the same is true of the code produced by the automated translation: when it was pulled into an IDE there were many compiler errors. Sometimes the tool left in the original C# and said that the translation could not be done automatically.  To its credit the tool always erred on the side of caution; it never made mistakes with the Java it did produce, which was important.

3. Don't rush the translation

After you have run automated translation you will need to go back to the code and fix the compile errors by hand. If I had my time again I would spend 10 times longer making sure that every change I made to the code was absolutely correct.  Since I wasn't an expert in C# I sometimes made assumptions as to how the C# libraries worked.  Those assumptions were not always correct and I sometimes paid a heavy penalty debugging scenarios where, had I been more careful in the original translation there would never have been a problem. It's definitely worth spending time reading through the C# API of the classes you are translating. I found this especially important when using Date and DateTime objects.

It's also worth spending time learning the Visual Studio IDE.  When debugging side by side it will save time in the long run if you know how to use your IDE properly.

4. Use Java 8

Apart from all the obvious reasons to use Java 8 (it's the latest version of Java so why not use it...) the Stream API maps nicely onto C# Linq.  The syntax is a little different, for example Java uses '->' and C# uses '=>', but using the new Java 8 features really helps keep the code comparable, which all helps when debugging further down the line.
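To give a flavour of how closely the two line up, here is a small sketch with the C# Linq equivalent in a comment. The class, method and data are invented for illustration and are not from the ported application.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class.
class LinqToStreams {
    // C#:  prices.Where(p => p > 100).Select(p => p * 2).ToList();
    static List<Integer> doubledAbove100(List<Integer> prices) {
        return prices.stream()
                .filter(p -> p > 100)           // Where
                .map(p -> p * 2)                // Select
                .collect(Collectors.toList());  // ToList
    }
}
```

Keeping the shape of each pipeline the same on both sides makes it much easier to step through the two implementations side by side.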

5. Be careful of unintended behaviour  

There are certain features of languages that you shouldn't rely on but might work all the same. Let me demonstrate with an example on which I spent far too much time.  The C# code was using a Dictionary which the code generator correctly translated to a HashMap. Both are unordered Maps.  However, even though Dictionary is unordered by contract (there is also an OrderedDictionary) when iterating through the Dictionary it seemed to preserve the insertion order.  This was not the case with HashMap, and since the order of elements was material to the result, we found discrepancies which were hard to debug. The solution was to replace all instances of HashMap with LinkedHashMap which does preserve the order.
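The difference is easy to see in a few lines. This is a stand-alone sketch with made-up keys, not code from the port:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class.
class MapOrderDemo {
    // Inserts three keys and reports the order in which the map iterates them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap guarantees insertion order: [zebra, apple, mango]
        System.out.println(keysInIterationOrder(new LinkedHashMap<>()));
        // HashMap makes no ordering guarantee; do not rely on what this prints
        System.out.println(keysInIterationOrder(new HashMap<>()));
    }
}
```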

6. Don't refactor too early

The code produced from the code generator is not pretty.  In fact it's pretty horrific to look at, breaking nearly every rule regarding naming conventions etc.  It's tempting to tidy up as you go along.  Resist that temptation until all your unit tests have passed.  You can always tidy up later.  Refactoring, even renaming, can introduce bugs, especially in a code base with which you are, by definition, not familiar. Also you might decide to re-run the code generator somewhere down the line and all your tidying up will at best need to be merged and at worst have been a waste of time.

Conclusion

Translating even a fairly complicated program from C# to Java is not impossible even if you're not that familiar with C#.  Using the correct tools and techniques and critically having reliable and repeatable tests will make all the difference to the success of your project. 

See follow up article describing the performance gains achieved by the code port. 

Monday 27 April 2015

FileSystemMap: A File System backed Map

As part of a project I'm working on at the moment I've been looking at creating a FileSystemMap. I've started a very small GitHub project here to host the code. 

Essentially this map implementation will allow the user to interact with a directory on their file system as if it were a java.util.Map. Each entry in the map will be a file in that directory, the key will be the file name and the value will be the contents of the file.

This code builds a FileSystemMap and adds five entries:

Map map = new FileSystemMap("/tmp/filetests");
map.put("one", "one");
map.put("two", "two");
map.put("three", "three");
map.put("four", "four");
map.put("five", "five");

This results in a directory structure like this:

/tmp/filetests/
 |----- five
 |----- four
 |----- one
 |----- three
 |----- two

Adding and removing entries will change the files in your directory. Changing the value of the entry will cause the file to be re-written with the new value as its contents. For more examples see the code in testMapMethods.
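To make the mapping concrete, here is a deliberately tiny sketch of the idea: keys are file names and values are file contents. It is not the FileSystemMap code from the project; the class DirectoryAsMap and its methods are invented for illustration, and it has no event support at all.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a file system backed map.
class DirectoryAsMap {
    private final Path dir;

    DirectoryAsMap(Path dir) {
        this.dir = dir;
    }

    // Creates the file named key, or rewrites it with the new value.
    void put(String key, String value) {
        try {
            Files.write(dir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Removing an entry deletes the corresponding file.
    void remove(String key) {
        try {
            Files.deleteIfExists(dir.resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the directory back as key -> contents, skipping dot files.
    Map<String, String> snapshot() {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                if (name.startsWith(".") || !Files.isRegularFile(file)) {
                    continue;
                }
                result.put(name, new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```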

Additionally the FileSystemMap has been designed for two way interaction. Any programmatic updates to it are reflected on the file system and any updates to the file system will be picked up by the map and fired as events.  

This code registers changes to the file system and prints them out:

FileSystemMap map = new FileSystemMap("/tmp/filetests");
map.registerForEvents(System.out::println);

This is some example output:

FPMEvent{eventType=NEW, programmatic=true, key='one', value='one'}

The eventType is one of: 

  • NEW - a file has been created
  • UPDATE - a file has been modified
  • DELETE - a file has been deleted

The programmatic flag indicates whether it was the FileSystemMap itself that caused the event to be fired, e.g. if put() is called, a file will be created which in turn will cause an event to be fired. To avoid feedback it can be useful to know whether it was an operation on the FileSystemMap that triggered the event.

The key is the name of the file that changed.

The value is the latest value associated with the file that has changed.  Note: this may or may not be the value that actually triggered the change.  For example if there were two very fast changes to the entry it is entirely possible that the value for the first event will pick up the value after the second update has already taken place.

e.g. 

map.put("one", "1");
map.put("one", "2");

could produce this output:

FPMEvent{eventType=NEW, programmatic=true, key='one', value='2'}


The first event (triggered by setting "one" to '1') is picked up, but by the time the program has checked the contents of the file the file has changed to '2'. The second event is then picked up (triggered by setting "one" to '2') but this is suppressed because the value has not changed.

Some notes:

  1. Files starting with a '.' are ignored. The reason for this is that many editors (including vi) create temporary files which should not be picked up by the FileSystemMap.
  2. If you look at the code you will notice that the WatchService (available since Java 7) was used to monitor changes to the file system. It is important to understand that the WatchService is OS specific. In particular it doesn't work that well on Mac OS X. This thread discusses some of the issues.  Updates from the WatchService are slow and fast flowing events can be dropped. In my testing Ubuntu performed significantly better than Mac OS X.  However if you're mainly interested in changes to the file system carried out by hand even Mac OS X will be fine.
  3. The Map only supports Strings.

It goes without saying that this class is designed for its specific utility rather than any sort of performance. 

All contributions to this project are welcome!


Thursday 16 April 2015

Dealing with Interruptions

I was just watching the VJUG interview with Heinz Kabutz which inspired me to write a post about Interruptions. By the way I would recommend subscribing to the VJUG YouTube channel - very informative indeed.

Heinz is always good value and it's difficult to watch any of his presentations without learning a lot. He raised the topic of how to deal with an InterruptedException and postulated that few Java programmers deal correctly with it.  The best explanation of thread interruptions I've read is contained in my favourite book on Java - Java Concurrency In Practice (p138-144). If you've read these pages you will know how to deal with an InterruptedException correctly :-)

Here's a short summary:

How often have you come across this code:

.......
try {
   Thread.sleep(1000);
} catch(InterruptedException e){
   e.printStackTrace();
}
......

A process is required to sleep for a second but 'annoyingly' has to deal with an InterruptedException.  The developer doesn't really know what to do with this exception so just logs it to the console.

This is very bad practice! If you are sure that your thread will never be interrupted (you are writing this code in a closed system) then you should probably do something like throw an AssertionError in the catch block with a comment that this should never happen.  If it is at all possible that the thread might be interrupted then you need to deal with that interruption correctly.

A thread can be interrupted by calling its interrupt() method.  This sets its interrupt status to true, so a subsequent call to isInterrupted() returns true. When interrupt() is called, certain blocking methods, like Thread.sleep(), will throw an InterruptedException. Note that throwing the InterruptedException sets the interrupt status back to false.  There is also a static method on Thread called interrupted() which, like isInterrupted(), returns the interrupt status of the current thread but crucially also resets the interrupt status to false. (interrupted() is a very strangely named method for what it does...)

We can see all this at work in the following example:
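A minimal sketch of such a demonstration, in which the thread interrupts itself so that everything runs deterministically on one thread. The class name and the layout of the returned array are assumptions made for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class.
class InterruptStatusDemo {
    // Returns, in order: isInterrupted() after interrupt(), the result of
    // interrupted(), isInterrupted() after that call, whether sleep() threw,
    // and isInterrupted() inside the catch block.
    static boolean[] observe() {
        boolean[] seen = new boolean[5];

        Thread.currentThread().interrupt();               // set the status
        seen[0] = Thread.currentThread().isInterrupted(); // reads it, leaves it set
        seen[1] = Thread.interrupted();                   // reads it AND clears it
        seen[2] = Thread.currentThread().isInterrupted(); // now cleared

        Thread.currentThread().interrupt();               // set the status again
        try {
            Thread.sleep(1000);                           // throws straight away
        } catch (InterruptedException e) {
            seen[3] = true;
            // throwing the exception cleared the status
            seen[4] = Thread.currentThread().isInterrupted();
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [true, true, false, true, false]
        System.out.println(Arrays.toString(observe()));
    }
}
```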


To quote Java Concurrency in Practice:

"There is nothing in the API or language specification that ties interruption to any specific cancellation semantics, but in practice, using interruption for anything but cancellation is fragile and difficult to sustain in larger applications."

In other words an interrupt is just a signal. You could theoretically use the interrupt mechanism to instruct the thread to do anything you wanted, perhaps to do action A instead of B, but we are counselled against it.

.......
try {
   Thread.sleep(1000);
} catch(InterruptedException e){
   actionA();
   return;
}
actionB();
......   

So what is the correct way to deal with an interrupt?  Well, that depends a bit on your code. Let's assume we are using the interrupt 'correctly' as a cancellation signal. If your code expects a cancellation to occur (this should be specified in the documentation) then it should cancel its actions in a controlled manner.  Just because an exception is thrown does not mean you have to exit in haste leaving a trail of mess behind you. Because you have dealt with the interrupt there is no need to restore the interrupt status on the thread.

If you do not expect an interrupt then you should handle the interrupt gracefully (maybe finish what you are doing) and then restore the interruption on the thread for some code higher up the stack to deal with. Remember that once the exception has been thrown the interrupt status is set to false. Here is how it should be done (code taken from the book):
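The following is a sketch of the general idiom rather than the book's listing reproduced verbatim: remember that an interrupt arrived, finish the work, then restore the status in a finally block. The class and method names are assumptions.

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical helper class.
class NoncancelableTake {
    // Keeps retrying take() until it succeeds, then re-asserts any
    // interrupt that arrived along the way so callers can still see it.
    static <T> T takeUninterruptibly(BlockingQueue<T> queue) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return queue.take();
                } catch (InterruptedException e) {
                    // the status is now false; remember it and retry
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                // restore the status for code higher up the stack
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The restore happens in finally, after the work is done, because setting the status before retrying would make take() throw again immediately.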

Tuesday 7 April 2015

Java 8 Lambdas in One Line

If you understand this line, or better still can write this code, you can pretty much say that you have understood the essence of Java 8 Lambdas, certainly in as much as they can be used with collections.

I found this in a recent presentation by Peter Lawrey.  (Definitely worth watching the whole presentation when you have a spare hour.)

Anyway the task was to find the 20 most frequent words in a file:
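A sketch of what such a pipeline can look like. This is not necessarily the exact line from the presentation; the class and method names, the lower-casing and the "\\W+" word split are assumptions made here.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class around the one-line pipeline.
class WordFrequency {
    // Splits each line into words, counts occurrences of each word, sorts
    // by descending count and keeps the first 'limit' words.
    // Words with equal counts come out in no particular order.
    static List<String> mostFrequentWords(Stream<String> lines, int limit) {
        return lines.flatMap(line -> Stream.of(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For a real file the lines would typically come from Files.lines(Paths.get(fileName)), opened in a try-with-resources block, with a limit of 20.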

As you can see, with Java 8 this can actually be done in a single (albeit rather long) line. 

If you're not used to lambdas the code might seem a little scary, but actually it's pretty declarative and, once you get past the syntax, the logic reads fairly easily.