This is the actual code snippet used to find out how much allocation there has been on a particular thread:
return (long) mBeanServer.invoke( name, GET_THREAD_ALLOCATED_BYTES, PARAMS, SIGNATURE );
For the full context see here.
(The way ByteWatcher works is to call this method periodically to monitor allocation.)
One important point to be noted, especially when wanting to provide an accurate number for a program's allocation is that calling the above code itself - causes allocation.
The allocation caused by this call must be deducted from the number returned so that we isolate the allocation caused by the program.
i.e call to meanBeanServer = program thread allocation + call overhead
What I noticed was that this amount of allocation was usually 336 bytes.
However when I called this method in a loop I discovered something interesting. Every so often it would allocate a different amount.
For this test:
@Test
public void testQuietMeasuringThreadAllocatedBytes() {
ByteWatcherSingleThread am = new ByteWatcherSingleThread();
System.out.println("MeasuringCostInBytes = " + am.getMeasuringCostInBytes());
long[] marks = new long[1_000_000];
for (int i = 0; i < 1_000_000; i++) {
marks[i] = am.threadAllocatedBytes();
}
long prevDiff = -1;
for (int i = 1; i < 1_000_000; i++) {
long diff = marks[i] - marks[i - 1];
if (prevDiff != diff)
System.out.println("Allocation changed at iteration " + i + "->" + diff);
prevDiff = diff;
}
}
This was typical result:
MeasuringCostInBytes = 336
Allocation changed at iteration 1->336
Allocation changed at iteration 12->28184
Allocation changed at iteration 13->360
Allocation changed at iteration 14->336
Allocation changed at iteration 1686->600
Allocation changed at iteration 1687->336
Allocation changed at iteration 2765->672
Allocation changed at iteration 2766->336
Allocation changed at iteration 5458->496
Allocation changed at iteration 5459->336
Allocation changed at iteration 6213->656
Allocation changed at iteration 6214->336
Allocation changed at iteration 6535->432
Allocation changed at iteration 6536->336
Allocation changed at iteration 6557->8536
Allocation changed at iteration 6558->336
Allocation changed at iteration 7628->576
Allocation changed at iteration 7629->336
Allocation changed at iteration 8656->4432
Allocation changed at iteration 8657->336
Allocation changed at iteration 9698->968
Allocation changed at iteration 9699->336
Allocation changed at iteration 11881->1592
Allocation changed at iteration 11882->336
Allocation changed at iteration 12796->1552
Allocation changed at iteration 12797->336
Allocation changed at iteration 13382->456
Allocation changed at iteration 13383->336
Allocation changed at iteration 14844->608
Allocation changed at iteration 14845->336
Allocation changed at iteration 36685->304
Allocation changed at iteration 52522->336
Allocation changed at iteration 101440->400
Allocation changed at iteration 101441->336
Given that there was definitely no allocation in the program it was a mystery to me why the same call was sometimes allocating different amounts.
In summary, over 1,000,000 runs, the program allocated different amounts about 25 times. Notably there were no spikes after 100k iterations.
I shared this problem with Heinz Kabutz and Chris Newland. Chris noticed that that the allocations were down to JIT compilation jitter.
This can be seen very clearly by re-running the test with the flag -Xint (only run in interpreted mode i.e. no JIT compilation). Now there were only 2 spikes.
MeasuringCostInBytes = 336
Allocation changed at iteration 1->336
Allocation changed at iteration 12->28184
Allocation changed at iteration 13->360
Allocation changed at iteration 14->336
Similarly running with the flag -Xcomp (compile only)
MeasuringCostInBytes = 336
Allocation changed at iteration 1->336
Allocation changed at iteration 12->29696
Allocation changed at iteration 13->360
Allocation changed at iteration 14->336
So now we can be pretty confident that it is JIT compilation jitter which is causing the rogue allocations.
I can't pinpoint the exact cause of the allocation but I guess it's understandable that there might be some sort of allocation when JIT compilation jitter occurs.
To compensate for this I introduced a calibration phase in the constructor of ByteWatcher which was further refined by Heinz.
You can see the calibration code here but it consists of a couple of phases:
- Call the method to work out how much the thread has allocated in tight loop (we call it 100,000 times) - allows the JIT to properly warm up the code so that it's all compiled
- Wait 50milliseconds - this allows the JVM a chance to finish it's compilation jitter
With this code in the constructor, even running with no special flags there are no allocation spikes.
Conclusion:
- JIT compilation jitter causes some allocation
- Running the program without compilation jitter (i.e only interpreted or only compiled) greatly reduces that allocation but does not quite eliminate it.
- After 100k runs the allocations stop which indicate that it takes 100k runs for the jitter to stop. This is interesting because we know that the code should compile after 10k iterations.