Thursday, 30 April 2015

Lessons Learnt Translating 25k Lines of C# into Java

I've recently completed a project converting a complex financial application from C# to Java. The reasons for the port were mostly non-technical: it was a strategic move for the business concerned.

It was an interesting experience and I learnt a few lessons along the way that might be useful to share.

1. Construct language-neutral tests over the existing system

I'll start with perhaps the most important lesson of all. When porting a system, for whatever reason, there must be criteria to determine whether the port has been successful. The best way to do this is to construct a full set of tests around the original system that can be 'exported without change' to the new system. For example, it's no good having a suite of JUnit tests if you want to move a system from Java to a language that doesn't support JUnit. I can't stress enough how important it was that the tests, and any changes to them, could literally be copied from the old system to the new system without intervention.

Another problem with JUnit tests is that they are often firmly linked to the existing implementation. Since the implementation is going to be rewritten, the tests are not portable between implementations.

The strategy we chose, which worked extremely well, was to use Cucumber tests. There are bindings for Cucumber in nearly all languages, it is well supported by IDEs (at least by both IntelliJ and Visual Studio), and as a bonus the tests are human readable. This means you can involve non-technical users in building up the tests in preparation for the port. (As an aside, we attempted to get the users to define the requirements for the new system by documenting everything the old system did and building tests around those requirements, but that, unsurprisingly, was completely unsuccessful. It's far better to build up test cases based on your existing implementation than to try to invent them for the new system!)

Using Cucumber was a real success, and we created a new test every time there was a discrepancy between the systems. By the time we had finished we had around 1000 scenarios and felt confident that the new system was correct. It gave us the solid foundations we needed to continue developing additional features and refactoring the new system.

2. Try to automate as much of the translation as possible

When faced with 25k+ lines of C#, it's pretty daunting to think of hand translating every line into Java. Fortunately there are tools out there that are enormously helpful. The product we used was from Tangible Software Solutions. For a couple of hundred dollars it saved literally hundreds of man hours. It's not perfect by any means, but it will give you the structure of the Java code (C# partial classes allow the code for a class to be split across more than one file, which Java does not support) and make a pretty good attempt at giving you workable Java. In our case hardly any of the generated code actually compiled, but it was a really good head start. My analogy would be to early attempts at OCR. You could scan in a document, but when you opened it in an editor you would find red underlining against many words which had not been recognised correctly. It was a matter of going through all the underlined words and working out what each should have been. Much the same is true of the code produced by the automated translation: when it was pulled into an IDE there were many compiler errors. Sometimes the tool left in the original C# and said that the translation could not be done automatically. To its credit, the tool always erred on the side of being conservative. It never made mistakes in the Java it did produce, which was important.

3. Don't rush the translation

After you have run the automated translation you will need to go back to the code and fix the compile errors by hand. If I had my time again I would spend 10 times longer making sure that every change I made to the code was absolutely correct. Since I wasn't an expert in C#, I sometimes made assumptions as to how the C# libraries worked. Those assumptions were not always correct, and I sometimes paid a heavy penalty debugging scenarios where, had I been more careful in the original translation, there would never have been a problem. It's definitely worth spending time reading through the C# API of the classes you are translating. I found this especially important when using Date and DateTime objects.
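
As an illustration of the kind of library difference worth checking (not necessarily one encountered in this port): in C#, casting DateTime.DayOfWeek to an int gives 0 for Sunday through 6 for Saturday, whereas Java 8's DayOfWeek.getValue() gives 1 for Monday through 7 for Sunday. A minimal sketch, with the class and method names invented for this example, assuming the ported code relied on the C# numbering:

```java
import java.time.LocalDate;

public class DayOfWeekPort {

    // Hypothetical helper: converts Java's ISO numbering (Mon=1 .. Sun=7)
    // to the .NET numbering (Sun=0 .. Sat=6) that the original C# relied on.
    static int dotNetStyleDayOfWeek(LocalDate date) {
        return date.getDayOfWeek().getValue() % 7;
    }

    public static void main(String[] args) {
        LocalDate sunday = LocalDate.of(2015, 5, 3); // a Sunday

        // Java's own numbering: prints 7
        System.out.println(sunday.getDayOfWeek().getValue());

        // .NET-compatible numbering: prints 0
        System.out.println(dotNetStyleDayOfWeek(sunday));
    }
}
```

A naive line-by-line translation of arithmetic on the day number would compile cleanly and still give different answers, which is exactly the sort of bug that is cheaper to prevent than to debug.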

It's also worth spending time learning the Visual Studio IDE.  When debugging side by side it will save time in the long run if you know how to use your IDE properly.

4. Use Java 8

Apart from all the obvious reasons to use Java 8 (it's the latest version of Java so why not use it...), the Stream API maps nicely onto C# LINQ. The syntax is a little different, for example Java uses '->' and C# uses '=>', but using the new Java 8 features really helps keep the Java code comparable to the original C#, which helps when debugging further down the line.
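
To give a flavour of how closely the two line up, here is a small sketch. The data and method name are made up for illustration, and the C# in the comment is the rough equivalent rather than code from the real project:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LinqToStreams {

    // Rough C# LINQ equivalent, for comparison:
    //   amounts.Where(a => a > 100).Select(a => a * 2).ToList();
    static List<Integer> doubledLargeAmounts(List<Integer> amounts) {
        return amounts.stream()
                .filter(a -> a > 100)          // LINQ Where
                .map(a -> a * 2)               // LINQ Select
                .collect(Collectors.toList()); // LINQ ToList
    }

    public static void main(String[] args) {
        // prints [300, 400]
        System.out.println(doubledLargeAmounts(Arrays.asList(50, 150, 200)));
    }
}
```

Because the pipeline has the same shape in both languages, you can step through the C# and Java versions side by side and compare intermediate results.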

5. Be careful of unintended behaviour  

There are certain features of languages that you shouldn't rely on but that might work all the same. Let me demonstrate with an example on which I spent far too much time. The C# code was using a Dictionary, which the code generator correctly translated to a HashMap. Both are unordered maps. However, even though Dictionary is unordered by contract (there is also an OrderedDictionary), iterating through the Dictionary seemed in practice to preserve the insertion order. This was not the case with HashMap, and since the order of elements was material to the result, we found discrepancies which were hard to debug. The solution was to replace all instances of HashMap with LinkedHashMap, which does preserve insertion order.
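
The difference is easy to see in a few lines. This is a minimal sketch with made-up keys, not code from the real system:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapOrderDemo {

    // Inserts three keys in a fixed order and returns them in iteration order.
    static List<String> insertAndListKeys(Map<String, Integer> positions) {
        positions.put("ZEBRA", 1);
        positions.put("APPLE", 2);
        positions.put("MANGO", 3);
        return new ArrayList<>(positions.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap guarantees insertion order: [ZEBRA, APPLE, MANGO]
        System.out.println("LinkedHashMap: " + insertAndListKeys(new LinkedHashMap<>()));

        // HashMap makes no ordering guarantee; the order printed here
        // depends on the keys' hash codes and should not be relied upon.
        System.out.println("HashMap:       " + insertAndListKeys(new HashMap<>()));
    }
}
```

If the iteration order matters to the result, it is safer to make that explicit in the type you choose than to depend on what a particular implementation happens to do.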

6. Don't refactor too early

The code produced by the code generator is not pretty. In fact it's pretty horrific to look at, breaking nearly every rule regarding naming conventions etc. It's tempting to tidy up as you go along. Resist that temptation until all your tests have passed. You can always tidy up later. Refactoring, even renaming, can introduce bugs, especially in a code base with which you are, by definition, not familiar. Also, you might decide to re-run the code generator somewhere down the line, in which case all your tidying up will at best need to be merged and at worst have been a waste of time.


Translating even a fairly complicated program from C# to Java is not impossible, even if you're not that familiar with C#. Using the correct tools and techniques and, critically, having reliable and repeatable tests will make all the difference to the success of your project.

See the follow-up article describing the performance gains achieved by the code port.
