Friday 8 May 2015

Safepoints: Strange and Dangerous Behaviour of Code in Tight Loops

Here's a question to consider.

Is it possible for code in Thread B to completely halt the execution of Thread A, even though Thread B has no shared state with, or access to, Thread A?

The immediate reaction is no: unless the threads share some lock, they should have no effect on each other.

Turns out this is not actually true.

One thing all code inside a JVM has in common is the JVM itself.  So, in theory, a thread can affect another thread through its interaction with the JVM.  Let's drill down into what happens when the JVM schedules a GC: first, all threads are brought to a safepoint, and only then is the GC executed.

The real problem with code in tight loops is that it can't be brought to a safepoint: the JIT omits safepoint polls from the body of an int-counted loop, on the assumption that such a loop finishes quickly.  All the other threads, already parked at the safepoint, are stalled there indefinitely until the tight loop exits.

(My experiences running and debugging this code can be found here. Some rather strange things happen to jstack and flight recorder when you try to investigate code in tight loops.)

Take a look at this code:
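(The source was originally embedded as a gist, which isn't reproduced here.  What follows is a minimal sketch reconstructed from the output below: the method names testNoAllocations and testAllocationsOnHeap, the printed strings, and the final result of 500000000000 come from the logs; the class name SafepointTest, the loop bounds, and the allocation pattern are assumptions.)

import java.util.ArrayList;
import java.util.List;

public class SafepointTest {

    public static void main(String[] args) {
        new Thread(SafepointTest::testNoAllocations).start();
        new Thread(SafepointTest::testAllocationsOnHeap).start();
    }

    // Tight, allocation-free loop. Both counters are ints, so the JIT
    // compiles these as counted loops and omits safepoint polls from the
    // loop bodies: the thread cannot be stopped until the loops complete.
    private static void testNoAllocations() {
        System.out.println("testNoAllocations");
        long start = System.currentTimeMillis();
        long result = 0;
        for (int i = 0; i < 10; i++) {
            for (int j = 0; j < 1_000_000_000; j++) {
                result += 50;
            }
            // Line 38 in the original: uncommenting this call gives the JVM
            // a safepoint opportunity once per outer iteration.
            // System.out.println("No Allocations " + i);
        }
        System.out.println("No allocation took "
                + (System.currentTimeMillis() - start) + " with result " + result);
    }

    // Allocation-heavy loop: forces frequent GCs, each of which requires
    // every thread in the JVM to reach a safepoint first.
    private static void testAllocationsOnHeap() {
        System.out.println("testAllocationsOnHeap");
        List<byte[]> garbage = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 500_000; j++) {
                garbage.add(new byte[1024]);
                if (garbage.size() > 100_000) {
                    garbage = new ArrayList<>();  // let the old list become garbage
                }
            }
            System.out.println("Allocations " + i);
        }
    }
}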

  

If you run the code with these command-line options:

-XX:+PrintSafepointStatistics -XX:+PrintGCDetails -XX:+PrintGC -XX:+PrintGCTimeStamps
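For example, assuming the sketch above is compiled as SafepointTest:

java -XX:+PrintSafepointStatistics -XX:+PrintGCDetails -XX:+PrintGC -XX:+PrintGCTimeStamps SafepointTest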

you will observe output like the following.  Note the lines that show the effect the threads have on each other: the GC log jumps from 2.908 to 55.457 seconds, and the safepoint statistics record a spin of 52398 ms for the safepoint begun at 3.059:

testNoAllocations
testAllocationsOnHeap
0.605: [GC (Allocation Failure) [PSYoungGen: 33280K->5119K(38400K)] 111405K->98692K(132096K), 0.0795153 secs] [Times: user=0.45 sys=0.02, real=0.08 secs] 
0.685: [Full GC (Ergonomics) [PSYoungGen: 5119K->5085K(38400K)] [ParOldGen: 93573K->93475K(176640K)] 98692K->98561K(215040K), [Metaspace: 7131K->7131K(1056768K)], 2.2169620 secs] [Times: user=14.59 sys=0.25, real=2.21 secs] 
2.908: [GC (Allocation Failure) [PSYoungGen: 38365K->5117K(41984K)] 131841K->131353K(218624K), 0.1426349 secs] [Times: user=1.07 sys=0.03, real=0.14 secs] 
55.457: [GC (Allocation Failure) [PSYoungGen: 41981K->5119K(45056K)] 168217K->168171K(221696K), 0.1575692 secs] [Times: user=1.13 sys=0.03, real=0.16 secs] 
55.615: [Full GC (Ergonomics) [PSYoungGen: 5119K->0K(45056K)] [ParOldGen: 163051K->128718K(295936K)] 168171K->128718K(340992K), [Metaspace: 7197K->7197K(1056768K)], 0.9572316 secs] [Times: user=6.93 sys=0.11, real=0.95 secs] 
No allocation took 56015 with result 500000000000
56.579: [GC (Allocation Failure) [PSYoungGen: 39936K->512K(47104K)] 168654K->167438K(343040K), 0.1376084 secs] [Times: user=1.00 sys=0.01, real=0.14 secs] 
56.726: [GC (Allocation Failure) [PSYoungGen: 47104K->5120K(67072K)] 214030K->214230K(363008K), 0.1811774 secs] [Times: user=1.28 sys=0.03, real=0.18 secs] 

0
56.923: [GC (Allocation Failure) [PSYoungGen: 67072K->56320K(109568K)] 276182K->276390K(405504K), 0.2548213 secs] [Times: user=1.79 sys=0.05, real=0.26 secs] 
57.190: [GC (Allocation Failure) [PSYoungGen: 109568K->70144K(123392K)] 329638K->329822K(419328K), 0.4376422 secs] [Times: user=3.12 sys=0.08, real=0.44 secs] 
57.628: [Full GC (Ergonomics) [PSYoungGen: 70144K->0K(123392K)] [ParOldGen: 259678K->274807K(488448K)] 329822K->274807K(611840K), [Metaspace: 7197K->7197K(1056768K)], 2.5251285 secs] [Times: user=18.03 sys=0.31, real=2.52 secs] 
60.160: [GC (Allocation Failure) [PSYoungGen: 53248K->53376K(162304K)] 328055K->328183K(650752K), 0.2079755 secs] [Times: user=1.42 sys=0.04, real=0.21 secs] 
60.380: [GC (Allocation Failure) [PSYoungGen: 132224K->95232K(174080K)] 407031K->407263K(662528K), 0.4926310 secs] [Times: user=3.46 sys=0.05, real=0.49 secs] 

1
....
....
....
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.281: RedefineClasses                  [       5          0              0    ]      [     0     0     0     0     1    ]  0   
0.530: RedefineClasses                  [       5          0              0    ]      [     0     0     0     0     4    ]  0   
0.603: ParallelGCFailedAllocation       [      14          1              1    ]      [     1     0     1     0  2296    ]  0   
2.908: ParallelGCFailedAllocation       [      15          1              1    ]      [     0     0     0     0   142    ]  1   
3.059: ParallelGCFailedAllocation       [      15          1              2    ]      [ 52398     0 52398     0  1114    ]  0   
56.579: ParallelGCFailedAllocation       [      14          0              0    ]      [     0     0     0     0   137    ]  0   
56.726: ParallelGCFailedAllocation       [      14          0              0    ]      [     0     0     0     0   181    ]  0   


What this shows is that the testNoAllocations thread, running in a tight loop, is unable to reach a safepoint, thereby halting the progress of testAllocationsOnHeap for almost a minute! 

If you allow the tight loop to reach a safepoint (by uncommenting the System.out call on line 38), the output is very different: the threads now run concurrently.
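In terms of the sketch above, the only change is that the print in the outer loop is now live, which gives the JVM a safepoint opportunity (at the method call) once per outer iteration:

for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 1_000_000_000; j++) {
        result += 50;
    }
    System.out.println("No Allocations " + i);  // safepoint opportunity
}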

testNoAllocations
testAllocationsOnHeap
0.596: [GC (Allocation Failure) [PSYoungGen: 33280K->5117K(38400K)] 111405K->98666K(132096K), 0.0790627 secs] [Times: user=0.45 sys=0.02, real=0.08 secs] 
0.676: [Full GC (Ergonomics) [PSYoungGen: 5117K->5107K(38400K)] [ParOldGen: 93549K->93454K(176640K)] 98666K->98561K(215040K), [Metaspace: 7129K->7129K(1056768K)], 2.2247202 secs] [Times: user=14.40 sys=0.26, real=2.23 secs] 
No Allocations 0
2.907: [GC (Allocation Failure) [PSYoungGen: 38387K->5101K(41984K)] 131841K->131339K(218624K), 0.1416991 secs] [Times: user=0.99 sys=0.03, real=0.14 secs] 
3.054: [GC (Allocation Failure) [PSYoungGen: 41965K->5117K(56320K)] 168203K->168363K(232960K), 0.1655830 secs] [Times: user=1.21 sys=0.03, real=0.17 secs] 
3.220: [Full GC (Ergonomics) [PSYoungGen: 5117K->0K(56320K)] [ParOldGen: 163246K->167991K(331264K)] 168363K->167991K(387584K), [Metaspace: 7130K->7130K(1056768K)], 1.4696097 secs] [Times: user=10.81 sys=0.13, real=1.46 secs] 
No Allocations 1
4.711: [GC (Allocation Failure) [PSYoungGen: 51200K->512K(67072K)] 219191K->217223K(398336K), 0.2020788 secs] [Times: user=1.48 sys=0.03, real=0.20 secs] 
No Allocations 2
No Allocations 3
4.937: [GC (Allocation Failure) [PSYoungGen: 67072K->5120K(71680K)] 283783K->282951K(402944K), 0.2758780 secs] [Times: user=1.93 sys=0.05, real=0.28 secs] 
5.213: [Full GC (Ergonomics) [PSYoungGen: 5120K->0K(71680K)] [ParOldGen: 277831K->282444K(527360K)] 282951K->282444K(599040K), [Metaspace: 7198K->7198K(1056768K)], 2.8296645 secs] [Times: user=20.45 sys=0.31, real=2.83 secs] 
No Allocations 4
Allocations 0
No Allocations 5
8.065: [GC (Allocation Failure) [PSYoungGen: 66560K->66336K(164352K)] 349004K->348780K(691712K), 0.2564233 secs] [Times: user=1.91 sys=0.04, real=0.26 secs] 
No Allocations 6
No Allocations 7
No Allocations 8
8.354: [GC (Allocation Failure) [PSYoungGen: 155936K->82944K(172544K)] 438380K->438452K(699904K), 0.6268327 secs] [Times: user=4.53 sys=0.12, real=0.63 secs] 
No Allocations 9
No Allocations 10
9.003: [GC (Allocation Failure) [PSYoungGen: 172544K->132608K(233984K)] 528052K->517452K(761344K), 0.6610664 secs] [Times: user=4.87 sys=0.11, real=0.66 secs] 
No Allocations 11
Allocations 1
No Allocations 12
No Allocations 13
9.696: [GC (Allocation Failure) [PSYoungGen: 233984K->167424K(268800K)] 618828K->581940K(796160K), 0.7480646 secs] [Times: user=5.52 sys=0.07, real=0.75 secs] 
No Allocations 14
No Allocations 15
No Allocations 16
10.477: [GC (Allocation Failure) [PSYoungGen: 268800K->205152K(325632K)] 683316K->619668K(852992K), 0.8410307 secs] [Times: user=6.22 sys=0.11, real=0.84 secs] 
No Allocations 17
No Allocations 18
Allocations 2
No Allocations 19
11.351: [GC (Allocation Failure) [PSYoungGen: 323936K->232960K(351744K)] 738452K->649268K(879104K), 0.8260039 secs] [Times: user=6.21 sys=0.04, real=0.82 secs] 
No Allocations 20
No Allocations 21
No Allocations 22
12.209: [GC (Allocation Failure) [PSYoungGen: 351744K->232928K(410624K)] 768052K->649236K(937984K), 0.9079795 secs] [Times: user=6.58 sys=0.10, real=0.91 secs] 
No Allocations 23
No Allocations 24
Allocations 3



Note that even though the threads run concurrently in this last case, it's not hard to imagine a system that, while not completely stalled, is repeatedly held up whilst the code in the tight loop comes up for breath and reaches a safepoint.

Conclusions
  • Code in tight loops cannot be brought to a safepoint. 
  • Even though threads in a JVM ought to be independent of each other, a thread that can't be brought to a safepoint will slow down every other thread waiting at the safepoint for GC.
  • If you have a system where (a) one thread spends time in tight loops, (b) the system uses GC, and (c) there are other critical threads, you should check your safepoint logs (flags below) to make sure that the system is not being slowed down waiting at safepoints.
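On JDK 8, flags along these lines produce the safepoint and pause information shown above; PrintSafepointStatisticsCount=1 prints statistics for every safepoint rather than a periodic summary, and PrintGCApplicationStoppedTime reports how long application threads were stopped:

-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationStoppedTime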


2 comments:

  1. Hello,

    you can also change the type of the loop counter from int to long and see that safepoint polls are emitted inside the loop.
    The JIT assumes an int-counted loop finishes quickly enough that the safepoint poll after the loop is reached in a timely manner,
    but a long-counted loop may run for a non-negligible amount of time, so the poll is kept inside the loop.

    Cheers

  2. Thanks, excellent point - I've tested and confirmed that to be correct.

    Of course, nested int loops can have the same effect as a long loop, but I guess that's more unusual, so it wasn't optimised...
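
    For reference, a sketch of the difference being discussed (loop bounds illustrative):

    // int counter: compiled as a counted loop, safepoint poll elided from the body
    for (int j = 0; j < 1_000_000_000; j++) { result += 50; }

    // long counter: not treated as a counted loop, so a safepoint poll remains
    // on the loop back-edge and the thread can be stopped mid-loop
    for (long j = 0; j < 1_000_000_000L; j++) { result += 50; }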
