Friday 22 May 2015

Git: Going back to a specific date time and retrospective tagging

I'm a fairly happy Git user :-)  It's an excellent product and a great improvement on CVS, SVN, SourceSafe, PVCS and other systems I have used in the past.

In my day-to-day work I've got into a pattern of using very simple commands like clone, commit, update, pull, push etc.

But then I got into the situation where my code was broken but I couldn't pinpoint exactly what had gone wrong or when this happened. What I wanted to do was to roll the code back until I reached the time when the code was last working which would allow me to determine the exact cause of the failure.

The question was how to go back in history using just the date:

Turns out that this is really easy: just use the command below:

git checkout 'master@{2009-07-27 13:37}'

If this fails because the reflog doesn't go back to that point in time then you need the command below:

git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
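Since the reflog syntax and the rev-list fallback are easy to mix up, here's a self-contained sketch you can try in a throwaway repository (the identities and dates are made up for the demo; `git init -b` needs a reasonably recent Git):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q -b master
# two commits: one back-dated before the cut-off, one made "now"
GIT_COMMITTER_DATE="2009-07-27 12:00" git -c user.email=demo@example.com \
    -c user.name=demo commit -q --allow-empty -m "old commit"
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "new commit"

# rev-list finds the last commit before the given date...
old=$(git rev-list -n 1 --before="2009-07-27 13:37" master)
# ...and checkout detaches HEAD at that commit
git checkout -q "$old"
git log -1 --format=%s
```

Note that `git rev-list --before` filters on the commit date, which is why the demo back-dates the committer date rather than just the author date.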

Another useful thing is to be able to tag a build retrospectively. For example, in our scenario, once you've found the last working version you might want to tag it.

That's also pretty easy.

Use the command git log to show a history of all the commits on the project. Once you have found the commit you want to tag, just use this command:

git tag -a v0.9 7fcea889f -m "Retrospective tag"
7fcea889f matches the start of the commit id.
You can then push the tag up using git push --tags origin master.
You then have a tag which will easily allow anyone to check out that version of the code.
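Putting the tagging steps together, here's a sketch in a throwaway repository (the identities are placeholders, and the root commit stands in for whichever commit id you picked out of git log):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q -b master
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "working version"
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "broken version"

# pick the commit to tag (here the root commit, standing in for the id
# you found in git log) and tag it retrospectively
commit=$(git rev-list --max-parents=0 HEAD)
git -c user.email=demo@example.com -c user.name=demo \
    tag -a v0.9 "$commit" -m "Retrospective tag"

# anyone can now check out that version by its tag
git checkout -q v0.9
```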

Monday 18 May 2015

Revisiting the First Rule of Performance Optimisation: Effects of Escape Analysis

In an earlier post I wrote about how to start optimising for performance. If you haven't read the post I would advise you to have a look, but the long and short of it is that it's pretty much always worth reducing allocations as a first step to improving performance. But understanding exactly where your program actually allocates is not as easy as you might think...

Whilst reviewing an article on a similar topic it occurred to me that it would be interesting to try and measure exactly how much longer it takes to allocate as opposed to reuse an existing object. We know that allocation is really fast in Java (ignore the side effects of object allocation for now) so let's see how allocation compares to object re-use.

This is the first attempt at the code I wrote to try and measure that change:
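(The original listing was embedded and hasn't survived here; the sketch below shows the kind of benchmark described. The method names follow the output further down; everything else is my assumption.)

```java
public class Allocation1 {
    // A tiny value holder, small enough to be scalar-replaced by escape analysis.
    static class Wrapper {
        long value;
        Wrapper(long value) { this.value = value; }
    }

    // Reuses a single object: no 'new' inside the loop.
    static long testNoAllocations(int iterations) {
        Wrapper w = new Wrapper(0);
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            w.value = 1;
            sum += w.value;
        }
        return sum;
    }

    // Allocates a Wrapper every iteration -- or does it? The object never
    // escapes the loop body, so the JIT is free to elide the allocation.
    static long testAllocations(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            Wrapper w = new Wrapper(1);
            sum += w.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int ITERATIONS = 100_000_000;
        for (int run = 0; run < 3; run++) {
            long start = System.currentTimeMillis();
            long result = testNoAllocations(ITERATIONS);
            System.out.println("No allocation took "
                    + (System.currentTimeMillis() - start) + " with result " + result);
            start = System.currentTimeMillis();
            result = testAllocations(ITERATIONS);
            System.out.println("Allocations took "
                    + (System.currentTimeMillis() - start) + " with result " + result);
        }
    }
}
```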


The output from the program was as below:

No allocation took 15 with result 100000000
Allocations took 11 with result 100000000
No allocation took 12 with result 100000000
Allocations took 13 with result 100000000
No allocation took 0 with result 100000000
Allocations took 1 with result 100000000

Hardly anything in it - but then I got suspicious....

Was the testAllocations method actually allocating or was escape analysis being applied? Escape analysis allows the JIT to bypass the object allocation and use the stack instead of the heap. See here for more details.

Well, one way to find out is to run the program with the command line setting -verbose:gc and see if the GC was being invoked. As I suspected there was no GC which, given that I had not set the -Xms and -Xmx flags, should not have been possible.

So I wrote another little program forcing the testAllocate method to 'escape' and thereby allocate:
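(Again the embedded listing is missing; a sketch of how one might force the escape, with names assumed, is below. Publishing the object to a field outside the method is enough to defeat escape analysis.)

```java
public class Allocation2 {
    static class Wrapper {
        long value;
        Wrapper(long value) { this.value = value; }
    }

    // Publishing every Wrapper to a field makes the reference 'escape', so
    // escape analysis can no longer elide the allocation: the objects really
    // land on the heap and the GC has work to do.
    static Wrapper escaped;

    static long testAllocate(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            Wrapper w = new Wrapper(1);
            escaped = w;                 // the escape point
            sum += w.value;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int run = 0; run < 3; run++) {
            long start = System.currentTimeMillis();
            long result = testAllocate(100_000_000);
            System.out.println(" **** Allocations took "
                    + (System.currentTimeMillis() - start) + " with result " + result);
        }
    }
}
```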


The output for this was very different, see below:

I've run with -verbose:gc. The System.out lines (marked with asterisks) are picked out so that the times can be easily identified.

*** No allocation took 43 with result 100000000
[GC (Allocation Failure)  111405K->108693K(142336K), 0.1243020 secs]
[Full GC (Ergonomics)  108693K->69499K(190976K), 0.3276730 secs]
[GC (Allocation Failure)  102779K->102923K(204288K), 0.1275237 secs]
[GC (Allocation Failure)  149515K->149731K(224256K), 0.2172119 secs]
[Full GC (Ergonomics)  149731K->149326K(359936K), 1.1698787 secs]
[GC (Allocation Failure)  215886K->215502K(365056K), 0.2680804 secs]
 **** Allocations took 2321 with result 100000000
*** No allocation took 37 with result 100000000
[Full GC (Ergonomics)  365307K->52521K(314368K), 0.1279458 secs]
[GC (Allocation Failure)  124201K->124497K(314368K), 0.2493061 secs]
[GC (Allocation Failure)  196177K->196401K(419840K), 0.3004662 secs]
 **** Allocations took 733 with result 100000000
[GC (Allocation Failure)  276094K->274825K(429056K), 0.5655052 secs]
[Full GC (Ergonomics)  274825K->1024K(427520K), 0.0042224 secs]
*** No allocation took 583 with result 100000000
[GC (Allocation Failure)  98304K->59382K(527872K), 0.0727687 secs]
[GC (Allocation Failure)  224246K->224582K(534016K), 0.1465164 secs]
 **** Allocations took 263 with result 100000000
*** No allocation took 15 with result 100000000
[GC (Allocation Failure)  389446K->167046K(643584K), 0.0831585 secs]
 **** Allocations took 119 with result 100000000
[GC (Allocation Failure)  366101K->91504K(653312K), 0.0006495 secs]
*** No allocation took 15 with result 100000000
[GC (Allocation Failure)  329072K->290206K(681472K), 0.1650293 secs]
 **** Allocations took 202 with result 100000000

As you can see, this time the program is definitely allocating and the allocating test is considerably slower.

Let's see if we can eliminate the need for GC pauses by setting a high -Xms value.

This time the GC is almost eliminated and we get these results:

*** No allocation took 48 with result 100000000
 **** Allocations took 162 with result 100000000
*** No allocation took 44 with result 100000000
 **** Allocations took 172 with result 100000000
*** No allocation took 28 with result 100000000
 **** Allocations took 179 with result 100000000
*** No allocation took 40 with result 100000000
 **** Allocations took 365 with result 100000000
*** No allocation took 73 with result 100000000
[GC (Allocation Failure)  1572864K->188862K(6029312K), 0.2602120 secs]
 **** Allocations took 499 with result 100000000
*** No allocation took 43 with result 100000000
 **** Allocations took 62 with result 100000000
*** No allocation took 17 with result 100000000
 **** Allocations took 228 with result 100000000
*** No allocation took 95 with result 100000000
 **** Allocations took 364 with result 100000000
*** No allocation took 46 with result 100000000
 **** Allocations took 194 with result 100000000
*** No allocation took 14 with result 100000000
[GC (Allocation Failure)  1761726K->199238K(6029312K), 0.2232670 secs]

 **** Allocations took 282 with result 100000000

So even without GC, allocating is considerably slower.  

The next step was to examine what was happening by looking at my programs with a profiler.

I ran the first program, Allocation1, with my favourite profiler, YourKit, and started seeing very different behaviour. The program, which we had previously shown not to be allocating, was now allocating (I could see this by running with verbose GC, below).

This is the output:

[YourKit Java Profiler 2014 build 14114] Log file: /Users/daniel/.yjp/log/Allocation1-11980.log
No allocation took 20 with result 100000000
[GC (Allocation Failure)  33280K->1320K(125952K), 0.0032859 secs]
[GC (Allocation Failure)  34600K->1232K(125952K), 0.0028476 secs]
[GC (Allocation Failure)  34512K->1136K(125952K), 0.0020955 secs]
[GC (Allocation Failure)  34416K->1168K(159232K), 0.0024346 secs]
[GC (Allocation Failure)  67728K->1240K(159232K), 0.0021334 secs]
Allocations took 109 with result 100000000
No allocation took 26 with result 100000000
[GC (Allocation Failure)  67800K->1376K(222208K), 0.0036706 secs]
[GC (Allocation Failure)  134496K->1508K(222208K), 0.0024567 secs]
Allocations took 98 with result 100000000
No allocation took 1 with result 100000000
[GC (Allocation Failure)  134628K->1516K(355840K), 0.0008581 secs]

Allocations took 56 with result 100000000

And in YourKit we see the creation of those objects:


Running with YourKit had interfered with the escape analysis and caused the objects to be allocated!!

The advice I gave in my previous post 'First Rule of Performance Optimisation' was to look to eliminate allocation.  This program (Allocation1) looks (from YourKit's view) like it is allocating, and a developer following my advice might spend considerable effort reducing that allocation.  The developer would be surprised and disappointed to find that, on rerunning the program without YourKit, nothing at all had been achieved.  Of course now we know why: the allocations were only a side-effect of running the profiler! Without the profiler the allocations were being 'escape-analysed' away.

So how are we to know whether allocations we see in the profiler are real, or just caused by the profiler interfering with escape analysis?

The answer is to use Flight Recorder. See my previous post on the differences between Flight Recorder and other profilers.

This is the Flight Recording when we run Allocation1.



As you can see no allocations have been recorded.  Contrast this with the Flight Recording we get from running Allocation2.



Conclusions:

  • It is always better not to allocate objects.
  • But escape analysis means that calling new does not always lead to an allocation!
  • In general, profilers interfere with escape analysis, so be careful of fixing an allocation that, without the profiler, might be escaped away.
  • Using Flight Recorder should not interfere with escape analysis, giving you a more accurate reflection of your real allocations.


Further Reading on the subject: 

I subsequently came across this excellent (as usual) post from Nitsan Wakart
http://psy-lob-saw.blogspot.co.uk/2014/12/the-escape-of-arraylistiterator.html

Wednesday 13 May 2015

Exceptions in Lambdas: An Elegant Solution to a Bit of a Mess

Consider the following function for writing to a file:

The idea behind the method is to allow the user to pass in different implementations of OutputStream to the method so that writeToFile can be called, for example, with a GZIPOutputStream, SnappyOutputStream (fast compression) or simply a plain FileOutputStream.


It's a neat function which can be called like this:


Unfortunately, as pointed out in the comment, this does not compile!  The reason it doesn't compile is that the GZIPOutputStream constructor throws an IOException. What would have been nice is for the IOException to be thrown out of the lambda and dealt with in the try-catch block - but that's not how lambdas work :-(

This is in fact how you have to deal with the exception to get the code to compile:
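(The listing itself is missing here; the sketch below reconstructs the shape of the workaround described in the next paragraph. The writeToFile signature and names are my assumptions.)

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.Function;
import java.util.zip.GZIPOutputStream;

public class UglyWorkaround {
    // writeToFile takes a plain java.util.function.Function, which is not
    // allowed to throw checked exceptions.
    static void writeToFile(File file, String text,
            Function<OutputStream, OutputStream> decorator) throws IOException {
        try (OutputStream out = decorator.apply(new FileOutputStream(file))) {
            out.write(text.getBytes());
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".gz");
        // The GZIPOutputStream constructor throws IOException, so the checked
        // exception must be caught inside the lambda and re-packaged:
        writeToFile(file, "Hello World", fos -> {
            try {
                return new GZIPOutputStream(fos);
            } catch (IOException e) {
                throw new AssertionError(e);
            }
        });
        System.out.println("Wrote " + file.length() + " bytes");
    }
}
```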


Not only is this ugly but you are left with the rather awkward problem of what to do with the IOException. In this case we have just re-packaged it inside an AssertionError. See my previous post 'Cheating with Exceptions' on the correct way to handle this scenario.

But there is a solution to this problem. Rather than using a java.util.function.Function that takes a value and returns a value, we can create a custom function that takes a value, returns a value and may throw an Exception. In this way the client code of writeToFile is nice and clean and can deal with the exception in a natural way. Moreover, lambdas are now used in the way they were intended, to make our code prettier and easier to understand.

See full code listing below:
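(The original gist is missing; the listing below is a hedged reconstruction of the custom-function approach just described. The interface and method names are my assumptions.)

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class LambdaExceptions {
    // A Function that is allowed to throw: the extra type parameter E carries
    // the checked exception through to the caller.
    @FunctionalInterface
    public interface FunctionWithException<T, R, E extends Exception> {
        R apply(T t) throws E;
    }

    // The decorator can now throw IOException, which writeToFile declares,
    // so the exception propagates out of the lambda naturally.
    public static void writeToFile(File file, String text,
            FunctionWithException<OutputStream, OutputStream, IOException> decorator)
            throws IOException {
        try (OutputStream out = decorator.apply(new FileOutputStream(file))) {
            out.write(text.getBytes());
        }
    }

    public static void main(String[] args) {
        try {
            File file = File.createTempFile("demo", ".gz");
            // GZIPOutputStream::new compiles directly, and its IOException is
            // dealt with in this try-catch block like any other.
            writeToFile(file, "Hello World", GZIPOutputStream::new);
            System.out.println("Wrote " + file.length() + " bytes");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```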


Tuesday 12 May 2015

Windows Development Environment on a Mac (for free)

There's nothing particularly novel in this post but since I've just spent half a day creating a C# development environment on my Mac I thought it would be useful to outline the steps.

Here are the steps you will need to take:

  1. Download and install VirtualBox. This is a free virtualisation product backed by Oracle. It will allow you to run a virtualised client OS as a window inside OSX. It seems to be just as good as Parallels but is completely free. 
  2. Download Windows (I chose 8.1 which is the latest fully released version).  You can get this from the Microsoft Technet Evaluation Centre.  It comes with a 90 day trial license which will of course have to be converted into a full licence after that period has expired.
  3. Then create a new OS inside VirtualBox for Windows. My tips here are:
      • Give the OS at least 50GB of disk if you have the space.  Full installations of Windows, Visual Studio and Office take a lot of disk space (around 25GB) and you want to have a decent amount left.
      • By default VirtualBox only allocates 1 CPU to the OS.  You want to increase that; also increase the amount of memory that you give the OS if you want decent performance.
      • This is pretty obvious but make sure you choose the correct setting for the OS you are installing. i.e. Windows 8.1 not Windows 8 if that's your OS.
  4. It will take about 15 minutes for Windows to install, and then you will find that the screen resolution isn't quite right.  You need to install the VirtualBox Guest Additions - this video describes it well.
  5. Now that you are in Windows, make it full screen and you should almost forget you are on a Mac.  Download Visual Studio. The Express edition is free. It takes an age to install so make yourself a cup of tea while you're waiting...
  6. Depending on what you're developing you might also want to install MS Office.  As with Windows you can download an evaluation version from the Microsoft Technet Evaluation Centre.
  7. If your code is in GitHub and you prefer to use a Bash command line I would definitely recommend you download Git Bash.
So there you have it: a fully installed Windows environment on your Mac for free - well, at least free for 90 days, by which time you will probably be desperate to come back to Mac OSX again anyway :-)

Sunday 10 May 2015

More Strange Behaviour in Tight Loops

In a previous article on the subject I pointed out the dangers of having code in tight loops, especially their unintended effects on other threads running in the same JVM.

In this post I'll just point out a few interesting things I came across whilst testing the code in a tight loop.

Most Java developers, at some time, will have used the command line tools jps and jstack. jps is a command that lets you easily identify the Java processes running on your machine and their pids. When you have the pid of a Java process you can then use jstack to get a stack trace of that process.

When I tried running jstack on a process in a tight loop (see code here) I got this response:

Unable to open socket file: target process not responding or HotSpot VM not loaded The -F option can be used when the target process is not responding

I guess jstack needs the process to reach a safepoint before being able to generate the stack trace.

Even more interestingly, when I tried opening the process in Flight Recorder (a really good profiler - see here for a full write-up) you get this dialog box:


Somewhat misleading - I have to say I spent quite a while puzzling over this before I realised that it was caused by trying to profile code in a tight loop, and that all I had to do was add breathing space (see previous article) for Flight Recorder to start behaving again.

Summary 

I wonder whether having code in long-running tight loops is actually a realistic scenario - if not, my findings and issues will just be academic.  Or perhaps there might be scenarios, especially in failure cases, where this might actually arise.  Certainly the behaviour of code in tight loops does shed some light on the way safepoints, GC, and monitoring tools interact with each other.


Friday 8 May 2015

Safepoints: Strange and Dangerous Behaviour of Code in Tight Loops

Here's a question to consider.

Is it possible to completely halt the execution of Thread A by code in Thread B, where Thread B has no shared state with or access to Thread A?

The immediate reaction would be: no.  Unless the threads share some lock they should have no effect on each other.

Turns out this is not actually true.

One thing all code inside a JVM has in common is the JVM itself.  So, in theory, a thread can affect another thread through its interaction with the JVM.  Let's drill down into what happens when a GC is scheduled by the JVM.  First, all threads are brought to a safepoint, then the GC is executed.

The real problem with code in tight loops is that it can't be brought to a safepoint.  The other threads (those not executing the tight loop) will be stalled at the safepoint indefinitely until the tight loop exits.

(My experiences running and debugging this code can be found here. Some rather strange things happen to jstack and flight recorder when you try to investigate code in tight loops.)

Take a look at this code:
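(The embedded listing hasn't survived; below is a sketch of the kind of program described. The method names follow the output further down, the rest is my assumption. On the JDK 8-era HotSpot used here, counted int loops were typically JIT-compiled without safepoint polls, which is what makes the stall possible; the bounds in main are kept small so the sketch finishes quickly, whereas the blog's run used bounds large enough to spin for nearly a minute.)

```java
import java.util.ArrayList;
import java.util.List;

public class TightLoopDemo {
    // A counted int loop with no allocations: the JIT may emit no safepoint
    // poll inside it, so this thread cannot be stopped mid-loop.
    static long testNoAllocations(int outer, int inner) {
        long sum = 0;
        for (int n = 0; n < outer; n++) {
            for (int i = 0; i < inner; i++) {
                sum += i % 2;
            }
            // Uncommenting the line below gives the JVM a safepoint
            // opportunity between inner loops, un-stalling the other thread:
            // System.out.println("No Allocations " + n);
        }
        return sum;
    }

    // Allocates heavily, repeatedly forcing GCs -- and every GC must first
    // bring *all* threads to a safepoint.
    static void testAllocationsOnHeap(int iterations) {
        for (int n = 0; n < iterations; n++) {
            List<byte[]> garbage = new ArrayList<>();
            for (int i = 0; i < 1_000; i++) {
                garbage.add(new byte[10_000]);   // ~10MB per outer iteration
            }
            System.out.println(n);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("testNoAllocations");
        System.out.println("testAllocationsOnHeap");
        Thread allocator = new Thread(() -> testAllocationsOnHeap(10));
        allocator.start();
        long start = System.currentTimeMillis();
        long result = testNoAllocations(2, 100_000_000);
        System.out.println("No allocation took "
                + (System.currentTimeMillis() - start) + " with result " + result);
        allocator.join();
    }
}
```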

  

If you run the code with these command lines options:

-XX:+PrintSafepointStatistics -XX:+PrintGCDetails -XX:+PrintGC -XX:+PrintGCTimeStamps

you will observe the following output (notice the interleaved lines, which show the effects the threads have on each other):

testNoAllocations
testAllocationsOnHeap
0.605: [GC (Allocation Failure) [PSYoungGen: 33280K->5119K(38400K)] 111405K->98692K(132096K), 0.0795153 secs] [Times: user=0.45 sys=0.02, real=0.08 secs] 
0.685: [Full GC (Ergonomics) [PSYoungGen: 5119K->5085K(38400K)] [ParOldGen: 93573K->93475K(176640K)] 98692K->98561K(215040K), [Metaspace: 7131K->7131K(1056768K)], 2.2169620 secs] [Times: user=14.59 sys=0.25, real=2.21 secs] 
2.908: [GC (Allocation Failure) [PSYoungGen: 38365K->5117K(41984K)] 131841K->131353K(218624K), 0.1426349 secs] [Times: user=1.07 sys=0.03, real=0.14 secs] 
55.457: [GC (Allocation Failure) [PSYoungGen: 41981K->5119K(45056K)] 168217K->168171K(221696K), 0.1575692 secs] [Times: user=1.13 sys=0.03, real=0.16 secs] 
55.615: [Full GC (Ergonomics) [PSYoungGen: 5119K->0K(45056K)] [ParOldGen: 163051K->128718K(295936K)] 168171K->128718K(340992K), [Metaspace: 7197K->7197K(1056768K)], 0.9572316 secs] [Times: user=6.93 sys=0.11, real=0.95 secs] 
No allocation took 56015 with result 500000000000
56.579: [GC (Allocation Failure) [PSYoungGen: 39936K->512K(47104K)] 168654K->167438K(343040K), 0.1376084 secs] [Times: user=1.00 sys=0.01, real=0.14 secs] 
56.726: [GC (Allocation Failure) [PSYoungGen: 47104K->5120K(67072K)] 214030K->214230K(363008K), 0.1811774 secs] [Times: user=1.28 sys=0.03, real=0.18 secs] 

0
56.923: [GC (Allocation Failure) [PSYoungGen: 67072K->56320K(109568K)] 276182K->276390K(405504K), 0.2548213 secs] [Times: user=1.79 sys=0.05, real=0.26 secs] 
57.190: [GC (Allocation Failure) [PSYoungGen: 109568K->70144K(123392K)] 329638K->329822K(419328K), 0.4376422 secs] [Times: user=3.12 sys=0.08, real=0.44 secs] 
57.628: [Full GC (Ergonomics) [PSYoungGen: 70144K->0K(123392K)] [ParOldGen: 259678K->274807K(488448K)] 329822K->274807K(611840K), [Metaspace: 7197K->7197K(1056768K)], 2.5251285 secs] [Times: user=18.03 sys=0.31, real=2.52 secs] 
60.160: [GC (Allocation Failure) [PSYoungGen: 53248K->53376K(162304K)] 328055K->328183K(650752K), 0.2079755 secs] [Times: user=1.42 sys=0.04, real=0.21 secs] 
60.380: [GC (Allocation Failure) [PSYoungGen: 132224K->95232K(174080K)] 407031K->407263K(662528K), 0.4926310 secs] [Times: user=3.46 sys=0.05, real=0.49 secs] 

1
....
....
....
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.281: RedefineClasses                  [       5          0              0    ]      [     0     0     0     0     1    ]  0   
0.530: RedefineClasses                  [       5          0              0    ]      [     0     0     0     0     4    ]  0   
0.603: ParallelGCFailedAllocation       [      14          1              1    ]      [     1     0     1     0  2296    ]  0   
2.908: ParallelGCFailedAllocation       [      15          1              1    ]      [     0     0     0     0   142    ]  1   
3.059: ParallelGCFailedAllocation       [      15          1              2    ]      [ 52398     0 52398     0  1114    ]  0   
56.579: ParallelGCFailedAllocation       [      14          0              0    ]      [     0     0     0     0   137    ]  0   

56.726: ParallelGCFailedAllocation       [      14          0              0    ]      [     0     0     0     0   181    ]  0   


What this shows is that the testNoAllocations thread, running in a tight loop, is unable to reach a safepoint, thereby halting the progress of testAllocationsOnHeap for almost a minute!

If you allow the tight loop to reach a safepoint (by uncommenting the System.out line 38) the output is very different.  The threads are now running concurrently.

testNoAllocations
testAllocationsOnHeap
0.596: [GC (Allocation Failure) [PSYoungGen: 33280K->5117K(38400K)] 111405K->98666K(132096K), 0.0790627 secs] [Times: user=0.45 sys=0.02, real=0.08 secs] 
0.676: [Full GC (Ergonomics) [PSYoungGen: 5117K->5107K(38400K)] [ParOldGen: 93549K->93454K(176640K)] 98666K->98561K(215040K), [Metaspace: 7129K->7129K(1056768K)], 2.2247202 secs] [Times: user=14.40 sys=0.26, real=2.23 secs] 
No Allocations 0
2.907: [GC (Allocation Failure) [PSYoungGen: 38387K->5101K(41984K)] 131841K->131339K(218624K), 0.1416991 secs] [Times: user=0.99 sys=0.03, real=0.14 secs] 
3.054: [GC (Allocation Failure) [PSYoungGen: 41965K->5117K(56320K)] 168203K->168363K(232960K), 0.1655830 secs] [Times: user=1.21 sys=0.03, real=0.17 secs] 
3.220: [Full GC (Ergonomics) [PSYoungGen: 5117K->0K(56320K)] [ParOldGen: 163246K->167991K(331264K)] 168363K->167991K(387584K), [Metaspace: 7130K->7130K(1056768K)], 1.4696097 secs] [Times: user=10.81 sys=0.13, real=1.46 secs] 
No Allocations 1
4.711: [GC (Allocation Failure) [PSYoungGen: 51200K->512K(67072K)] 219191K->217223K(398336K), 0.2020788 secs] [Times: user=1.48 sys=0.03, real=0.20 secs] 
No Allocations 2
No Allocations 3
4.937: [GC (Allocation Failure) [PSYoungGen: 67072K->5120K(71680K)] 283783K->282951K(402944K), 0.2758780 secs] [Times: user=1.93 sys=0.05, real=0.28 secs] 
5.213: [Full GC (Ergonomics) [PSYoungGen: 5120K->0K(71680K)] [ParOldGen: 277831K->282444K(527360K)] 282951K->282444K(599040K), [Metaspace: 7198K->7198K(1056768K)], 2.8296645 secs] [Times: user=20.45 sys=0.31, real=2.83 secs] 
No Allocations 4
Allocations 0
No Allocations 5
8.065: [GC (Allocation Failure) [PSYoungGen: 66560K->66336K(164352K)] 349004K->348780K(691712K), 0.2564233 secs] [Times: user=1.91 sys=0.04, real=0.26 secs] 
No Allocations 6
No Allocations 7
No Allocations 8
8.354: [GC (Allocation Failure) [PSYoungGen: 155936K->82944K(172544K)] 438380K->438452K(699904K), 0.6268327 secs] [Times: user=4.53 sys=0.12, real=0.63 secs] 
No Allocations 9
No Allocations 10
9.003: [GC (Allocation Failure) [PSYoungGen: 172544K->132608K(233984K)] 528052K->517452K(761344K), 0.6610664 secs] [Times: user=4.87 sys=0.11, real=0.66 secs] 
No Allocations 11
Allocations 1
No Allocations 12
No Allocations 13
9.696: [GC (Allocation Failure) [PSYoungGen: 233984K->167424K(268800K)] 618828K->581940K(796160K), 0.7480646 secs] [Times: user=5.52 sys=0.07, real=0.75 secs] 
No Allocations 14
No Allocations 15
No Allocations 16
10.477: [GC (Allocation Failure) [PSYoungGen: 268800K->205152K(325632K)] 683316K->619668K(852992K), 0.8410307 secs] [Times: user=6.22 sys=0.11, real=0.84 secs] 
No Allocations 17
No Allocations 18
Allocations 2
No Allocations 19
11.351: [GC (Allocation Failure) [PSYoungGen: 323936K->232960K(351744K)] 738452K->649268K(879104K), 0.8260039 secs] [Times: user=6.21 sys=0.04, real=0.82 secs] 
No Allocations 20
No Allocations 21
No Allocations 22
12.209: [GC (Allocation Failure) [PSYoungGen: 351744K->232928K(410624K)] 768052K->649236K(937984K), 0.9079795 secs] [Times: user=6.58 sys=0.10, real=0.91 secs] 
No Allocations 23
No Allocations 24
Allocations 3



Note that even though in this last case the threads run concurrently, it's not hard to imagine a system which, though not completely stalled, is temporarily halted whilst the code in the tight loop comes up for breath and reaches a safepoint.

Conclusions
  • Code in tight loops cannot be brought to a safepoint.
  • Even though threads in a JVM ought to be independent of each other, if they behave in a way in which they can't be brought to a safepoint they will slow down other threads waiting for GC at a safepoint.
  • If you have a system where: (a) one thread spends time in tight loops, (b) the system uses GC, and (c) there are other critical threads, you should check your safepoint logs to make sure that the system is not being slowed down waiting at safepoints.


Tuesday 5 May 2015

C# vs Java: Which One is Faster? Translating 25k Lines of C# into Java (2)

In a previous article I described how I translated 25k lines of C# into Java and the lessons learnt from that exercise.

I received the following question:

Great article by the way. How did the performance compare to the C# version after the code was migrated?

One of the motivations to re-write the system was to make it faster, and in fact this goal was achieved. We managed to reduce the amount of hardware by a factor of 5 whilst still improving the throughput of the system by a factor of 6.  A huge benefit to the client in all ways.

The original assumption was that C# is not actually any slower than Java and that we would have to implement some fancy techniques to achieve any significant performance gains. As it happened the gains were achieved by a simple re-write of the system.

So is C# slower than Java? In a word, no. Whilst I didn't run any formal benchmarks on comparable hardware, my anecdotal evidence was that, in a like for like situation the performance was comparable. But what I did find was that it's very easy to architect a bad system in C# around data access.

There's an extremely tight coupling between C# and SQL Server. Visual Studio is literally a front end to both. It's also interesting that the C# developers I came across were equally proficient in C# and SQL Server.  This sounds great; after all, nearly all systems need to work with data stored in a database, so tight integration between the two should be the way to go.  Well, yes and no. It's great to have good tools and skills to enable you to access and manipulate data, but the performance costs of 'data chat' must never be forgotten.

The main problem with the system on which I was working was that data access was tightly integrated into the code.  Whenever a piece of data was needed a call was made to the DB. In fact in some cases logic that could have been carried out in the code was carried out in stored procedures on the SQL Server. Whenever a result was calculated it was written back to the database. Not only was that extremely inefficient, it made the system much harder to understand.

The first thing we did was to create a clean separation between the data and the code.  All the data needed for the program to run was bulk exported from the database using bcp and those files were used to create objects which were held in the program memory. Once all the results had been calculated they were written to files which were bcp'd back up into the database. This eliminated the constant 'chat' between the program and the server and greatly speeded up the system.  It also made the inputs and outputs of the system extremely transparent.  Having database calls buried in the code can make the inputs and outputs rather opaque.

Because we were using Java and did not have the tight coupling to SQL Server, we were forced to adhere to a critical good practice: separating data from its processing.  This was the key to the performance improvements that were achieved.

None of this should be taken to imply that the integration between C#, SQLServer and Visual Studio is a bad thing. In fact, quite the opposite it's a very powerful tool which as with all powerful tools can be dangerous if not handled with understanding and care.

Friday 1 May 2015

Simple Class to Measure Latency

This is a very simple class I wrote to measure latency.  It's not the Rolls-Royce solution that is HdrHistogram, but if you want to add just one class to your project this does the trick quite nicely.

Here's a simple test program to show you how it's used:


This is some sample output:

Thread.sleep() random
Latency measured:
 0.32 us for 50 percentile
 0.44 us for 90 percentile
 0.68 us for 99 percentile
 26.82 us for 99.9 percentile
 582.66 us for 99.99 percentile
 2024.92 us worst percentile
Math.sqrt
Latency measured:
 0.04 us for 50 percentile
 0.06 us for 90 percentile
 0.09 us for 99 percentile
 0.12 us for 99.9 percentile
 0.20 us for 99.99 percentile
 28.17 us worst percentile

There are only 4 methods:
  • The constructor: this takes an int for the maximum number of measurements you want to take. Apart from the memory implications, oversizing is not a problem.  In this implementation you need to take at least 10,000 measurements for the code to work; if you want to take fewer, just adapt the code appropriately in printStats().
  • startMeasure() and endMeasure() are called on either side of the code to be measured.
  • printStats() prints out the results.
Implementation below:
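(The implementation gist hasn't survived; here is a sketch matching the description above - four methods, a pre-sized array, and a percentile printout. The internal details are my assumptions, and unlike the original this version doesn't insist on a minimum of 10,000 measurements.)

```java
import java.util.Arrays;

public class LatencyMeasure {
    private final long[] samples;   // nanosecond deltas
    private int count;
    private long start;

    // maxMeasurements: the most measurements you expect to take;
    // oversizing only costs memory.
    public LatencyMeasure(int maxMeasurements) {
        samples = new long[maxMeasurements];
    }

    public void startMeasure() {
        start = System.nanoTime();
    }

    public void endMeasure() {
        if (count < samples.length) {
            samples[count++] = System.nanoTime() - start;
        }
    }

    public void printStats() {
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        System.out.println("Latency measured:");
        for (double p : new double[]{50, 90, 99, 99.9, 99.99}) {
            long nanos = sorted[(int) (count * p / 100)];
            String label = p == Math.floor(p)
                    ? String.valueOf((int) p) : String.valueOf(p);
            System.out.printf(" %.2f us for %s percentile%n", nanos / 1000.0, label);
        }
        System.out.printf(" %.2f us worst percentile%n", sorted[count - 1] / 1000.0);
    }
}
```

Typical use, as in the sample output above: call startMeasure() and endMeasure() around the operation under test inside a loop, then call printStats() once at the end.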