Rational Java: C# vs Java Which one is Faster? Translating 25k C# into Java (2)

Tuesday, 5 May 2015

In a previous article I described how I translated 25k lines of C# into Java and the lessons learnt from that exercise.

I received the following question:

Great article by the way. How did the performance compare to the C# version after the code was migrated?

One of the motivations for re-writing the system was to make it faster, and this goal was achieved. We managed to reduce the amount of hardware by a factor of 5 whilst improving the throughput of the system by a factor of 6. A huge benefit to the client in all ways.

The original assumption was that C# is not actually any slower than Java and that we would have to implement some fancy techniques to achieve any significant performance gains. As it happened, the gains were achieved by a simple re-write of the system.

So is C# slower than Java? In a word, no. Whilst I didn't run any formal benchmarks on comparable hardware, my anecdotal evidence was that, in a like-for-like situation, the performance was comparable. What I did find was that it's very easy to architect a bad system in C# around data access.

There's an extremely tight coupling between C# and SQL Server. Visual Studio is effectively a front end to both. It's also interesting that the C# developers I came across were as proficient in SQL Server as they were in C#. This sounds great. After all, nearly all systems need to work with data stored in a database, so tight integration between the two should be the way to go. Well, yes and no. It's great to have good tools and skills to enable you to access and manipulate data, but the performance costs of 'data chat' must never be forgotten.

The main problem with the system on which I was working was that data access was tightly integrated into the code. Whenever a piece of data was needed, a call was made to the DB. In some cases, logic that could have been carried out in the code was instead carried out in stored procedures on SQL Server. Whenever a result was calculated, it was written back to the database. Not only was that extremely inefficient, it also made the system much harder to understand.
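To make the cost of 'data chat' concrete, here is a minimal sketch in Java. It is not code from the system described; `CountingPriceStore` and `DataChatDemo` are hypothetical names, and the "database" is just an in-memory map that counts how many round trips a real server would have seen.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a database: counts round trips instead of making them. */
class CountingPriceStore {
    private final Map<String, Double> rows = new HashMap<>();
    private int roundTrips = 0;

    CountingPriceStore(Map<String, Double> initialRows) {
        rows.putAll(initialRows);
    }

    /** One round trip per key: the 'chatty' style. */
    double fetchOne(String key) {
        roundTrips++;
        return rows.get(key);
    }

    /** One round trip for the whole data set: the bulk style. */
    Map<String, Double> fetchAll() {
        roundTrips++;
        return new HashMap<>(rows);
    }

    int roundTrips() {
        return roundTrips;
    }
}

class DataChatDemo {
    /** Calls the store every time a value is needed. */
    static double totalChatty(CountingPriceStore store, List<String> keys) {
        double total = 0;
        for (String key : keys) {
            total += store.fetchOne(key);
        }
        return total;
    }

    /** Loads everything once, then works purely in memory. */
    static double totalBulk(CountingPriceStore store, List<String> keys) {
        Map<String, Double> inMemory = store.fetchAll();
        double total = 0;
        for (String key : keys) {
            total += inMemory.get(key);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> data = Map.of("A", 1.0, "B", 2.0, "C", 3.0);
        List<String> keys = List.of("A", "B", "C", "A", "B");

        CountingPriceStore chatty = new CountingPriceStore(data);
        System.out.println(totalChatty(chatty, keys) + " in " + chatty.roundTrips() + " round trips");

        CountingPriceStore bulk = new CountingPriceStore(data);
        System.out.println(totalBulk(bulk, keys) + " in " + bulk.roundTrips() + " round trips");
    }
}
```

Both methods produce the same total, but the chatty version makes one call per lookup (5 here) while the bulk version makes a single call. With a real server, each of those calls would carry network and query overhead.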

The first thing we did was to create a clean separation between the data and the code. All the data needed for the program to run was bulk exported from the database using bcp, and those files were used to create objects which were held in the program's memory. Once all the results had been calculated, they were written to files which were bcp'd back up into the database. This eliminated the constant 'chat' between the program and the server and greatly sped up the system. It also made the inputs and outputs of the system extremely transparent. Having database calls buried in the code can make the inputs and outputs rather opaque.
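The shape of that pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the actual code: the `Trade` record layout, the `BulkPipeline` class, and the quantity-times-price calculation are all hypothetical, and the input is assumed to be a tab-delimited text file such as bcp produces in character mode. The bcp export and import steps themselves happen outside the program and are not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical input row: id, quantity, unit price. */
final class Trade {
    final String id;
    final int quantity;
    final double price;

    Trade(String id, int quantity, double price) {
        this.id = id;
        this.quantity = quantity;
        this.price = price;
    }
}

class BulkPipeline {
    /** Step 1: read the exported file once and build in-memory objects. */
    static List<Trade> load(Path input) throws IOException {
        List<Trade> trades = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            String[] fields = line.split("\t", -1);
            trades.add(new Trade(fields[0],
                                 Integer.parseInt(fields[1]),
                                 Double.parseDouble(fields[2])));
        }
        return trades;
    }

    /** Step 2: pure in-memory calculation, no database access at all. */
    static List<String> calculate(List<Trade> trades) {
        List<String> results = new ArrayList<>();
        for (Trade t : trades) {
            results.add(t.id + "\t" + (t.quantity * t.price));
        }
        return results;
    }

    /** Step 3: write all results to one file, ready to be bulk loaded. */
    static void run(Path input, Path output) throws IOException {
        List<Trade> trades = load(input);
        List<String> results = calculate(trades);
        Files.write(output, results, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        run(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

With this structure the inputs and outputs of a run are just two files, so they can be inspected, diffed, or replayed without touching the database.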

Because we were using Java and did not have the tight coupling to SQL Server, we were forced to adhere to a critical good practice: separating data from its processing. This was the key to the performance improvements that were achieved.

None of this should be taken to imply that the integration between C#, SQL Server and Visual Studio is a bad thing. Quite the opposite: it's a very powerful tool which, as with all powerful tools, can be dangerous if not handled with understanding and care.