Thursday, 2 July 2015

The Effect of Coordinated Omission - Measure loopback latency using a simple non-blocking NIO client/server

In this post I demonstrate a number of ideas and techniques: 
  1. How to write a simple non-blocking NIO client/server
  2. How to time latency loopback on your machine
  3. How to measure latencies in percentiles (as opposed to simple average)
  4. The effects of co-ordinated omission
  5. The realtionship between latency and throughput
Thanks to Peter Lawrey for his help with this post.

I was recently developing a low latency benchmark for client server application.  Initially I was simulating the benchmark on a single machine using TCP over loopback. The first measure I wanted to quantify was how much of the recorded latency I needed to allow for simple loopback latency. I would then be able to understand more clearly the latency added by my actual application.

To do this I created a program (code at the end of the post) that transfers a single byte from the client to the server and back again. This is done repeatedly and the results are processed.

The program is written using non-blocking java NIO to optimise the loopback latencies as much as possible. 

More importantly than recording the average time, the percentile latencies are recorded. (See a previous post here for a discussion of how to measure latency). Critically, the code factors for coordinated omission.  (To learn more about this see here from Gil Tene). In a nutshell, you don't time a job from when it started but from it should have started.

These are results on my 2 year old MBP.

Starting latency test rate: 80000 ... Average time 6001870.9
Loop back echo latency was 6007821.4/10525630.0 11,538,962/11,642,739 11,652,536/11653522 11,653,616 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 70000 ... Average time 4393443.9
Loop back echo latency was 4388003.2/7743356.1 8,496,550/8,576,130 8,583,327/8584029 8,584,097 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 60000 ... Average time 3054243.7
Loop back echo latency was 2943178.9/5463715.9 5,961,190/6,009,637 6,014,795/6015409 6,015,453 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 50000 ... Average time 1420485.4
Loop back echo latency was 1423450.9/2462332.1 2,694,163/2,715,932 2,717,909/2718146 2,718,163 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 40000 ... Average time 4828.0
Loop back echo latency was 79.8/23261.9 44,756/47,379 47,662/47712 47,713 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 30000 ... Average time 45.1
Loop back echo latency was 23.3/46.5 630/2,211 2,904/3092 3,106 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 25000 ... Average time 1095.0
Loop back echo latency was 24.8/36.8 47,300/58,166 59,151/59346 59,357 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 20000 ... Average time 26.7
Loop back echo latency was 24.7/29.5 60/88 259/339 340 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 15000 ... Average time 26.5
Loop back echo latency was 24.8/28.1 60/72 102/252 252 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 10000 ... Average time 26.7
Loop back echo latency was 24.8/28.9 61/72 165/369 369 us for 50/90 99/99.9 99.99/99.999 worst %tile

Running the same test with -Dcoordinated.omission=true

### Running with Coordinated Omission ###

Starting latency test rate: 80000 ... Average time 25.6
Loop back echo latency was 23.3/30.8 59/74 164/1604 9,165 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 70000 ... Average time 25.9

Loop back echo latency was 23.3/31.1 59/74 187/1726 11,358 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 60000 ... Average time 26.2

Loop back echo latency was 25.1/28.3 60/66 90/166 419 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 50000 ... Average time 26.4

Loop back echo latency was 24.6/28.3 60/72 232/1996 4,104 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 40000 ... Average time 26.2

Loop back echo latency was 24.6/28.5 60/67 90/251 319 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 30000 ... Average time 26.6

Loop back echo latency was 24.5/28.3 60/72 162/2257 3,236 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 25000 ... Average time 26.4

Loop back echo latency was 24.6/27.8 61/71 96/254 375 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 20000 ... Average time 26.4

Loop back echo latency was 24.5/27.4 61/70 96/243 293 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 15000 ... Average time 26.8

Loop back echo latency was 24.6/29.0 61/74 285/887 887 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 10000 ... Average time 26.7

Loop back echo latency was 24.6/28.5 61/73 99/307 307 us for 50/90 99/99.9 99.99/99.999 worst %tile


As you can see the effects of throughput are completely ignored! It would appear that even at a rate of 80,000 messages a second your 99.99 percentile is 374us when in fact it is way way larger than that.

In fact you can only hit your target latencies when your throughput is nearer 20,000 a second.  As you would understand intuitively there is a trade off between throughput and latency.  

The results for a server machine (i7-3790X 3.5 Ghz with Ubuntu 15.04) running Ubuntu are as below:


Starting latency test rate: 200000 ... Average time 9417750.1
Loop back echo latency was 9349156.4/17229351.3 18,907,915/19,073,572 19,089,824/19091465 19,091,645 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 160000 ... Average time 3155097.0
Loop back echo latency was 3186332.1/5649721.3 6,159,813/6,211,035 6,216,502/6217022 6,217,047 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 140000 ... Average time 47245.0
Loop back echo latency was 19.0/236226.4 339,075/362,914 364,575/364749 364,767 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 120000 ... Average time 8.2
Loop back echo latency was 6.6/13.1 22/89 388/1427 1,519 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 100000 ... Average time 7.3
Loop back echo latency was 6.6/7.4 18/20 59/107 317 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 80000 ... Average time 7.1
Loop back echo latency was 6.7/7.7 18/19 22/100 619 us for 50/90 99/99.9 99.99/99.999 worst %tile

Running the same test with -Dcoordinated.omission=true

### Running with Coordinated Omission ###

Starting latency test rate: 200000 ... Average time 6.8
Loop back echo latency was 6.5/6.9 18/19 21/23 122 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 160000 ... Average time 6.8
Loop back echo latency was 6.5/6.9 18/19 21/23 131 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 140000 ... Average time 6.9
Loop back echo latency was 6.5/7.1 18/19 21/23 442 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 120000 ... Average time 6.8
Loop back echo latency was 6.5/6.9 18/19 21/23 121 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 100000 ... Average time 6.9
Loop back echo latency was 6.5/6.9 18/19 21/23 106 us for 50/90 99/99.9 99.99/99.999 worst %tile

Starting latency test rate: 80000 ... Average time 6.8
Loop back echo latency was 6.5/6.9 17/19 21/23 104 us for 50/90 99/99.9 99.99/99.999 worst %tile

On the server machine, you can achieve stable latencies (~7us) with a throughput of 100,000.  

An interesting observation is the relationship between latency and throughput and the effect that lowing latency has on raising throughput (comparing the server and laptop). For more on this topic see Peter Lawrey's excellent presentation here.

The code for this test is below:
  


No comments:

Post a comment