Monday, 31 August 2015

An Unexpected Allocation - JIT Compilation Jitter

Whilst working on ByteWatcher (see my last post) I came across something rather strange.

This is the actual code snippet used to find out how much allocation there has been on a particular thread:

return (long) mBeanServer.invoke(
  name,
  GET_THREAD_ALLOCATED_BYTES,
  PARAMS,
  SIGNATURE
);

For the full context see here.

(The way ByteWatcher works is to call this method periodically to monitor allocation.)

One important point to be noted, especially when wanting to provide an accurate number for a program's allocation is that calling the above code itself - causes allocation.

The allocation caused by this call must be deducted from the number returned so that we isolate the allocation caused by the program.

i.e call to meanBeanServer = program thread allocation + call overhead  

What I noticed was that this amount of allocation was usually 336 bytes.

However when I called this method in a loop I discovered something interesting.  Every so often it would allocate a different amount.

For this test:

@Test
  public void testQuietMeasuringThreadAllocatedBytes() {
    ByteWatcherSingleThread am = new ByteWatcherSingleThread();
    System.out.println("MeasuringCostInBytes = " + am.getMeasuringCostInBytes());
    long[] marks = new long[1_000_000];
    for (int i = 0; i < 1_000_000; i++) {
      marks[i] = am.threadAllocatedBytes();
    }

    long prevDiff = -1;
    for (int i = 1; i < 1_000_000; i++) {
      long diff = marks[i] - marks[i - 1];
      if (prevDiff != diff)
        System.out.println("Allocation changed at iteration " + i + "->" + diff);
      prevDiff = diff;
    }
  }

This was typical result:

MeasuringCostInBytes = 336
Allocation changed at iteration 1->336
Allocation changed at iteration 12->28184
Allocation changed at iteration 13->360
Allocation changed at iteration 14->336
Allocation changed at iteration 1686->600
Allocation changed at iteration 1687->336
Allocation changed at iteration 2765->672
Allocation changed at iteration 2766->336
Allocation changed at iteration 5458->496
Allocation changed at iteration 5459->336
Allocation changed at iteration 6213->656
Allocation changed at iteration 6214->336
Allocation changed at iteration 6535->432
Allocation changed at iteration 6536->336
Allocation changed at iteration 6557->8536
Allocation changed at iteration 6558->336
Allocation changed at iteration 7628->576
Allocation changed at iteration 7629->336
Allocation changed at iteration 8656->4432
Allocation changed at iteration 8657->336
Allocation changed at iteration 9698->968
Allocation changed at iteration 9699->336
Allocation changed at iteration 11881->1592
Allocation changed at iteration 11882->336
Allocation changed at iteration 12796->1552
Allocation changed at iteration 12797->336
Allocation changed at iteration 13382->456
Allocation changed at iteration 13383->336
Allocation changed at iteration 14844->608
Allocation changed at iteration 14845->336
Allocation changed at iteration 36685->304
Allocation changed at iteration 52522->336
Allocation changed at iteration 101440->400
Allocation changed at iteration 101441->336

Given that there was definitely no allocation in the program it was a mystery to me why the same call was sometimes allocating different amounts.

In summary, over 1,000,000 runs, the program allocated different amounts about 25 times. Notably there were no spikes after 100k iterations.

I shared this problem with Heinz Kabutz and Chris Newland. Chris noticed that that the allocations were down to JIT compilation jitter.

This can be seen very clearly by re-running the test with the flag -Xint (only run in interpreted mode i.e. no JIT compilation). Now there were only 2 spikes.

MeasuringCostInBytes = 336
Allocation changed at iteration 1->336
Allocation changed at iteration 12->28184
Allocation changed at iteration 13->360
Allocation changed at iteration 14->336

Similarly running with the flag -Xcomp (compile only)

MeasuringCostInBytes = 336
Allocation changed at iteration 1->336
Allocation changed at iteration 12->29696
Allocation changed at iteration 13->360
Allocation changed at iteration 14->336

So now we can be pretty confident that it is JIT compilation jitter which is causing the rogue allocations.

I can't pinpoint the exact cause of the allocation but I guess it's understandable that there might be some sort of allocation when JIT compilation jitter occurs.

To compensate for this I introduced a calibration phase in the constructor of ByteWatcher which was further refined by Heinz.

You can see the calibration code here but it consists of a couple of phases:

  1. Call the method to work out how much the thread has allocated in tight loop (we call it 100,000 times) - allows the JIT to properly warm up the code so that it's all compiled
  2. Wait 50milliseconds  - this allows the JVM a chance to finish it's compilation jitter
With this code in the constructor, even running with no special flags there are no allocation spikes.

Conclusion:
  • JIT compilation jitter causes some allocation
  • Running the program without compilation jitter (i.e only interpreted or only compiled) greatly reduces that allocation but does not quite eliminate it.
  • After 100k runs the allocations stop which indicate that it takes 100k runs for the jitter to stop.  This is interesting because we know that the code should compile after 10k iterations.



3 comments:

  1. Hi,

    Did you try without TieredCompilation (enabled by default on JDK8) ?
    I think it will be more predictable (10K) without it.

    why not using
    (
    (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean()).getThreadAllocatedBytes(Thread.currentThread().getId());

    less overhead (48 bytes)
    But may not be portable because using com.sun.management package

    int VALUE_COUNT = 1000*1000;
    long[] values = new long[VALUE_COUNT];
    for (int i = 0; i < VALUE_COUNT; i++)
    {
    values[i] = ((ThreadMXBean) ManagementFactory.getThreadMXBean()).getThreadAllocatedBytes(Thread.currentThread().getId());
    }
    long prevDiff = -1;
    for (int i = 1; i < VALUE_COUNT; i++)
    {
    long diff = values[i] - values[i - 1];
    if (prevDiff != diff)
    System.out.println("Allocation changed at iteration " + i + "->" + diff);
    prevDiff = diff;
    }

    With TieredCompilation:
    Allocation changed at iteration 1->48
    Allocation changed at iteration 5631->560
    Allocation changed at iteration 5632->48
    Allocation changed at iteration 6655->1304
    Allocation changed at iteration 6656->48
    Allocation changed at iteration 95122->376
    Allocation changed at iteration 95123->48

    No Tiered compilation:
    Allocation changed at iteration 1->48
    Allocation changed at iteration 4999->2480
    Allocation changed at iteration 5000->424
    Allocation changed at iteration 5001->48
    Allocation changed at iteration 9912->816
    Allocation changed at iteration 9913->48
    Allocation changed at iteration 9999->560
    Allocation changed at iteration 10000->48
    Allocation changed at iteration 14563->200
    Allocation changed at iteration 14564->48

    ReplyDelete
  2. Thanks - interesting point about the tiered compilation.
    Goes to prove that compilation jitter does cause allocation :)

    ReplyDelete
  3. Jean-Philippe, the interesting thing about that property is that it is not exposed in the ThreadMXBean, but you can see it as a property in JConsole (which is where I found it whilst dicking around during a live presentation in one of my courses). So I'm not even sure if our code is going to be fully compatible and work in other JVMs. Anybody care to try it out in IBM?

    For the calibration code, I did know that tiered compilation makes things more tricky, but also didn't want to force users to have to turn it off, just to use the code. This is why I increased the repeats to hopefully catch all the JITter up front :-)

    ReplyDelete