Wednesday, 28 October 2015

Let's pause for a Microsecond

A lot of benchmarks in low latency Java applications involve having to measure a system under a certain load. This requires maintaining a steady throughput of events into the system as opposed to pumping events into a system at full throttle with no control whatsoever.

One of the tasks I often have to do is pause a producer thread for a short period in between events. Typically this amount of time will be single-digit microseconds.

So how do you pause a thread for this amount of time? Most Java developers instantly think of Thread.sleep(). But that's not going to work, because Thread.sleep() only goes down to milliseconds, and a millisecond is three orders of magnitude longer than the microsecond pause we need.

I saw an answer on StackOverflow pointing the user to TimeUnit.MICROSECONDS.sleep() in order to sleep for less than a millisecond. This is clearly incorrect; to quote from the JavaDoc:

Performs a Thread.sleep using this time unit. This is a convenience method that converts time arguments into the form required by the Thread.sleep method.

So you're not going to be able to get better than a 1 millisecond pause, similar to Thread.sleep(1). (You can prove this by trying the example in the code below.)

The reason for this is that this method of pausing, namely putting a thread to sleep and waking it up, is never going to be fast or accurate enough to go lower than a millisecond.
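A minimal sketch of how one might check this claim, not the original listing. The class name, method name and iteration counts are hypothetical, and the result will depend on the JDK version and operating system, so treat the number it prints as indicative only:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: times repeated requests for a sub-millisecond sleep.
class MicrosSleepCheck {

    /** Average wall-clock time, in nanoseconds, of one TimeUnit.MICROSECONDS.sleep(micros) call. */
    static long averageSleepNanos(long micros, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                TimeUnit.MICROSECONDS.sleep(micros);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and abort the measurement.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while sleeping", e);
            }
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        averageSleepNanos(1, 100); // warm-up pass (assumed count)
        System.out.println("avg ns per 1us sleep request: " + averageSleepNanos(1, 1_000));
    }
}
```

If the printed average is close to a million nanoseconds or more, the 1us request is effectively being treated as a millisecond sleep.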

Another question we should be asking at this point is: how accurate is Thread.sleep(1) anyway? We'll come back to this later.

Another option when we want to pause for a microsecond is to use LockSupport.parkNanos(x). Using the following code to park for 1 microsecond actually takes ~10us. That's way better than TimeUnit.sleep() / Thread.sleep() but not really fit for purpose. From 100us upwards it does get into the same ballpark, with only a 50% variation.
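A minimal sketch of such a measurement, assuming a simple average over many calls. The class and method names and the iteration counts are hypothetical, and this is not the original listing:

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: times repeated LockSupport.parkNanos() calls.
class ParkNanosCheck {

    /** Average wall-clock time, in nanoseconds, of one parkNanos(requestedNanos) call. */
    static long averageParkNanos(long requestedNanos, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // parkNanos may return early (spuriously), so this measures
            // observed behaviour rather than a guaranteed minimum.
            LockSupport.parkNanos(requestedNanos);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        averageParkNanos(1_000, 10_000); // warm-up pass (assumed count)
        System.out.println("avg ns per parkNanos(1000): " + averageParkNanos(1_000, 100_000));
    }
}
```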


The answer to our problems is to use System.nanoTime(). By busy waiting on a call to System.nanoTime() we will be able to pause for a single microsecond. We'll see the code for this in a second, but first let's understand the accuracy of System.nanoTime(). Critically, how long does it take to perform the call to System.nanoTime()?

Here's some code that will do exactly this:
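A minimal sketch of one way to estimate the cost of a call, assuming that averaging over a tight loop is good enough for this purpose. The names and iteration counts are hypothetical and this is not the original listing:

```java
// Hypothetical helper: estimates the average cost of System.nanoTime().
class NanoTimeCost {

    /** Average time, in nanoseconds, of one System.nanoTime() call in a tight loop. */
    static double averageCallNanos(int iterations) {
        long sink = 0; // accumulate results so the JIT cannot drop the calls
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            System.out.println("unexpected zero sum"); // only here to keep 'sink' live
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        averageCallNanos(1_000_000); // warm-up pass (assumed count)
        System.out.printf("avg ns per System.nanoTime() call: %.1f%n", averageCallNanos(10_000_000));
    }
}
```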



The numbers will vary from one machine to another; on my MBP I get ~40 nanoseconds.

That tells us that we should be able to measure to an accuracy of around 40 nanoseconds. Therefore, measuring 1 microsecond (1000 nanoseconds) should easily be possible.

This is the busy waiting approach 'pausing' for a microsecond:
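A minimal sketch of the idea, spinning on System.nanoTime() until the requested time has elapsed. The class and method names are hypothetical and this is not the original listing:

```java
// Hypothetical helper: pauses the calling thread by spinning on System.nanoTime().
class BusyWait {

    /** Spins until at least 'nanos' nanoseconds have elapsed. Ties up the CPU while waiting. */
    static void busyWaitNanos(long nanos) {
        long start = System.nanoTime();
        while (System.nanoTime() - start < nanos) {
            // spin
        }
    }

    /** Performs a busy wait and returns how long it actually took, in nanoseconds. */
    static long timedBusyWait(long nanos) {
        long start = System.nanoTime();
        busyWaitNanos(nanos);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            busyWaitNanos(1_000); // warm-up pass (assumed count)
        }
        System.out.println("requested 1000 ns, waited " + timedBusyWait(1_000) + " ns");
    }
}
```

Comparing elapsed differences (`System.nanoTime() - start < nanos`) rather than absolute values keeps the loop correct even if the nanoTime counter wraps around.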


The code waits for a microsecond and then times how long it has waited. On my machine I get 1,115 nanoseconds, which is ~90% accurate.

As you wait longer the accuracy increases: 10 microseconds takes 10,267 nanoseconds, which is ~97% accurate, and 100 microseconds takes 100,497 nanoseconds, which is ~99.5% accurate.

What about Thread.sleep(1), how accurate is that?

Here's the code for that:
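A minimal sketch of such a measurement, averaging over repeated one-millisecond sleeps. The names and the iteration counts are hypothetical and this is not the original listing:

```java
// Hypothetical helper: measures how long Thread.sleep(millis) really takes.
class ThreadSleepCheck {

    /** Average wall-clock time, in nanoseconds, of one Thread.sleep(millis) call. */
    static long averageSleepNanos(long millis, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and abort the measurement.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while sleeping", e);
            }
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        averageSleepNanos(1, 100); // warm-up pass (assumed count)
        System.out.println("avg ns per Thread.sleep(1): " + averageSleepNanos(1, 1_000));
    }
}
```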


The average time in nanoseconds for a 1 millisecond sleep is 1,295,509. That's only ~75% accurate. It's probably good enough for nearly everything, but if you want an exact millisecond pause you are far better off with a busy wait. Of course you need to remember that busy waiting, by definition, keeps your thread busy and will cost you a CPU.

Summary Table

Measured pause in microseconds for each requested pause:

Pause Method               1us       10us      100us     1000us/1ms   10,000us/10ms
TimeUnit.sleep()           1284.6    1293.8    1295.7    1292.7       11865.3
LockSupport.parkNanos()    8.1       28.4      141.8     1294.3       11834.2
Busy waiting               1.1       10.1      100.2     1000.2       10000.2


Conclusions

  • The only way to pause for a microsecond is by busy waiting
  • If you want to pause for anything less than a millisecond accurately you need to busy wait
  • LockSupport only begins to get accurate at 100us
  • System.nanoTime() takes ~40ns
  • Thread.sleep(1) is only 75% accurate
  • Busy waiting for 10us or more is almost 100% accurate
  • Busy waiting will tie up a CPU 





10 comments:

  1. It is System.nanoTime, not Sustem.nanoSecond.

    What's the point of waiting 1us or more by executing code?
    Just burning the core by spinning is the same thing. I do not think you relax the system any more by calling System.nanoTime than with a basic spin loop.
    LockSupport.parkNanos(1) seems more efficient for achieving the goal.

    Cheers

    Replies
    1. Always nice to hear from you!
      Thanks for typo - fixed.
      I have no problem with a basic spin loop to achieve the busy wait for my application. Happy to dedicate a CPU to that.
      Added comparisons for LockSupport.parkNanos(x). As you can see it's not really useful if you actually want to wait 1us.
      Cheers

    2. Well, my point is:
      Generally, if you want to wait, it is to relax the system and let other threads progress. In this regard, LockSupport.parkNanos(1) relaxes the system pretty well. Using a busy wait will not let other threads progress on the core you are running on. You can afford that, but then I do not see the point of busy waiting for a time period. Usually you want to spin checking a volatile flag indicating that data is ready. So between busy waiting for 10us and basic spinning there is no difference in what the core executes.

      Cheers

    3. In this case I want to pause a publishing thread exactly 1us between sending events.

    4. OK, so it is to implement rate control then. Makes sense now.
      By the way, how does Thread.yield behave in this regard? (On Linux at least.)

    5. Thread.yield() takes ~0.3us - the equivalent of 7 calls to System.nanoTime().
      Of course I'm averaging the times for calls in a tight loop. A single call to Thread.yield() on unwarmed code took ~15us.

  2. Thank you for this very interesting post.

  3. I always thought Thread.sleep accuracy was highly dependent on the OS, and that historically versions of Windows in particular (for instance) could show some wild variation, unless you played with some settings around high-resolution timers. Which OS did you test on? Linux?

    Replies
    1. Yes, you are correct, there will be a difference across OS and hardware. My timings were on a MBP when connected to the mains (strange things happen when on battery). I do think the point of my article holds true, however.
