Monday 16 March 2015

Creating Millions of Objects with Zero Garbage

As noted in First rule of performance optimisation, garbage is the enemy of fast code. Not only can it destroy any sort of deterministic performance by employing the services of the garbage collector, but it also fills our CPU caches with garbage that will cause expensive cache misses for our program.

So, can we use Java without creating garbage? Is it possible, for example, in natural Java, to solve this problem:

Create 10m financial instrument objects, store them in a map, retrieve them and perform a calculation using each object without creating any garbage at all.

It is if you use Chronicle!  Chronicle provides libraries so that you can easily use off-heap storage, in the form of memory-mapped files, for your objects. (For full source code for this article see here.)

Let's look at implementing a solution to the above problem.

First, let's have a look at how you might do this in normal Java, so that we're sure we understand the problem and what happens if we use the standard Java libraries for our implementation.
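The full source is linked above; in outline the test looks something like this (a minimal sketch: the class name CreateOnHeapTest appears in the comments below, but the Instrument class, its fields and the timing details are illustrative stand-ins for the real source):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CreateOnHeapTest {

    // A stand-in for the article's financial instrument class (assumed fields).
    static final class Instrument {
        private final int id;
        private final double quantity;

        Instrument(int id, double quantity) {
            this.id = id;
            this.quantity = quantity;
        }

        double getQuantity() {
            return quantity;
        }
    }

    private static final int NUM_RECORDS = 10_000_000;

    public static void main(String[] args) {
        Map<Integer, Instrument> map = new ConcurrentHashMap<>(NUM_RECORDS);

        System.out.println("*** Entering critical section ***");

        // Create 10m objects and store them in the map: a new object per entry.
        long putStart = System.currentTimeMillis();
        for (int i = 0; i < NUM_RECORDS; i++) {
            map.put(i, new Instrument(i, i % 100));
        }
        long putTime = System.currentTimeMillis() - putStart;

        // Retrieve every object and perform the calculation.
        long getStart = System.currentTimeMillis();
        double totalQuantity = 0;
        for (int i = 0; i < NUM_RECORDS; i++) {
            totalQuantity += map.get(i).getQuantity();
        }
        long getTime = System.currentTimeMillis() - getStart;

        System.out.println("*** Exiting critical section ***");
        System.out.println("Time for putting " + putTime);
        System.out.println("Time for getting " + getTime);
        System.out.println("totalQuantity: " + totalQuantity);

        System.gc();
        Runtime runtime = Runtime.getRuntime();
        System.out.println("Memory(heap) used "
                + (runtime.totalMemory() - runtime.freeMemory()) / 1e9 + " GB");
    }
}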

This is the output from the program:


*** Entering critical section ***
[GC (Allocation Failure)  98816K->92120K(125952K), 0.0317021 secs]
[Full GC (Ergonomics)  92120K->91917K(216576K), 0.2510530 secs]
[GC (Allocation Failure)  125197K->125430K(224256K), 0.0449051 secs]
[GC (Allocation Failure)  166390K->166686K(244224K), 0.0504341 secs]
[Full GC (Ergonomics)  166686K->165777K(387072K), 0.6243385 secs]
[GC (Allocation Failure)  226705K->226513K(388096K), 0.0785121 secs]
[GC (Allocation Failure)  293073K->293497K(392704K), 0.0825828 secs]
[Full GC (Ergonomics)  293497K->292649K(591872K), 1.2479519 secs]
[GC (Allocation Failure)  359209K->359433K(689664K), 0.0666344 secs]
[GC (Allocation Failure)  449033K->449417K(695296K), 0.1759746 secs]
[GC (Allocation Failure)  539017K->539385K(747008K), 0.1907760 secs]
[GC (Allocation Failure)  632569K->633009K(786944K), 0.2293778 secs]
[Full GC (Ergonomics)  633009K->631584K(1085952K), 2.1328028 secs]
[GC (Allocation Failure)  724768K->723368K(1146368K), 0.3092297 secs]
[GC (Allocation Failure)  827816K->825088K(1174016K), 0.3156138 secs]
[GC (Allocation Failure)  929536K->929952K(1207296K), 0.3891754 secs]
[GC (Allocation Failure)  1008800K->1009560K(1273856K), 0.4149915 secs]
[Full GC (Ergonomics)  1009560K->1007636K(1650688K), 3.4521240 secs]
[GC (Allocation Failure)  1086484K->1087425K(1671680K), 0.3884906 secs]
[GC (Allocation Failure)  1195969K->1196129K(1694208K), 0.2905121 secs]
[GC (Allocation Failure)  1304673K->1305257K(1739776K), 0.4291658 secs]
[GC (Allocation Failure)  1432745K->1433137K(1766912K), 0.4470582 secs]
[GC (Allocation Failure)  1560625K->1561697K(1832960K), 0.6003558 secs]
[Full GC (Ergonomics)  1561697K->1558537K(2343936K), 4.9359721 secs]
[GC (Allocation Failure)  1728009K->1730019K(2343936K), 0.7616385 secs]
[GC (Allocation Failure)  1899491K->1901139K(2413056K), 0.5187234 secs]
[Full GC (Ergonomics)  1901139K->1897477K(3119616K), 5.7177263 secs]
[GC (Allocation Failure)  2113029K->2114505K(3119616K), 0.6768888 secs]
[GC (Allocation Failure)  2330057K->2331441K(3171840K), 0.4812436 secs]
[Full GC (Ergonomics)  2331441K->2328578K(3530240K), 6.3054896 secs]
[GC (Allocation Failure)  2600962K->2488834K(3528704K), 0.1580837 secs]
*** Exiting critical section ***
Time for putting 32088
Time for getting 454
[GC (System.gc())  2537859K->2488834K(3547136K), 0.1599314 secs]
[Full GC (System.gc())  2488834K->2488485K(3547136K), 6.2759293 secs]
[GC (System.gc())  2488485K->2488485K(3559936K), 0.0060901 secs]
[Full GC (System.gc())  2488485K->2488485K(3559936K), 6.0975322 secs]
Memory(heap) used 2.6 GB


Two main points jump out from this output: one, the number and expense of the garbage collections (clearly this could be tuned), and two, the amount of heap used: 2.6GB. In short, there's no getting away from it: this program produces masses of garbage.

Let's try exactly the same thing, this time using ChronicleMap.

This is the code to solve the problem:
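In outline it looks like this (a minimal sketch assuming the Chronicle Map 2.x API: ChronicleMapBuilder, a generated value interface, newValueInstance() and getUsing(); the InstrumentVOInterface name and the file location are illustrative, and method names may differ in other Chronicle Map versions):

import java.io.File;
import java.io.IOException;

import net.openhft.chronicle.map.ChronicleMap;
import net.openhft.chronicle.map.ChronicleMapBuilder;

public class CreateChronicleTest {

    // A value interface: Chronicle generates a flyweight implementation whose
    // getters and setters read and write the memory-mapped file directly, so
    // no per-entry Java object is ever allocated.
    public interface InstrumentVOInterface {
        void setQuantity(double quantity);

        double getQuantity();
    }

    private static final int NUM_RECORDS = 10_000_000;

    public static void main(String[] args) throws IOException {
        File file = new File(System.getProperty("java.io.tmpdir"), "instruments.dat");

        ChronicleMap<Integer, InstrumentVOInterface> map =
                ChronicleMapBuilder.of(Integer.class, InstrumentVOInterface.class)
                        .entries(NUM_RECORDS)
                        .createPersistedTo(file);

        // Two reusable flyweights: the loops below recycle these instances
        // instead of allocating a new object per entry. (The int keys are
        // still autoboxed here; the original code may have handled keys
        // differently.)
        InstrumentVOInterface value = map.newValueInstance();
        InstrumentVOInterface using = map.newValueInstance();

        System.out.println("*** Entering critical section ***");

        long putStart = System.currentTimeMillis();
        for (int i = 0; i < NUM_RECORDS; i++) {
            value.setQuantity(i % 100);
            map.put(i, value);
        }
        long putTime = System.currentTimeMillis() - putStart;

        long getStart = System.currentTimeMillis();
        double totalQuantity = 0;
        for (int i = 0; i < NUM_RECORDS; i++) {
            totalQuantity += map.getUsing(i, using).getQuantity();
        }
        long getTime = System.currentTimeMillis() - getStart;

        System.out.println("*** Exiting critical section ***");
        System.out.println("Time for putting " + putTime);
        System.out.println("Time for getting " + getTime);
        System.out.println("totalQuantity: " + totalQuantity);

        map.close();

        System.gc();
        Runtime runtime = Runtime.getRuntime();
        System.out.println("Memory(heap) used "
                + (runtime.totalMemory() - runtime.freeMemory()) / 1e6 + " MB");
    }
}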

This is the output from the program:


[GC (Allocation Failure)  33280K->6595K(125952K), 0.0072065 secs]
[GC (Allocation Failure)  39875K->12177K(125952K), 0.0106678 secs]
[GC (Allocation Failure)  45457K->15289K(125952K), 0.0068434 secs]
[GC (Allocation Failure)  48569K->18357K(159232K), 0.0098287 secs]
[GC (Allocation Failure)  84917K->21008K(159232K), 0.0156393 secs]
*** Entering critical section ***
*** Exiting critical section ***
Time for putting 8554
Time for getting 4351
[GC (System.gc())  36921K->21516K(230400K), 0.0331916 secs]
[Full GC (System.gc())  21516K->15209K(230400K), 0.0630483 secs]
[GC (System.gc())  15209K->15209K(230912K), 0.0006491 secs]
[Full GC (System.gc())  15209K->15209K(230912K), 0.0234045 secs]
Memory(heap) used 18.2 MB


The main points here are obviously that there were no GCs in the critical section and that the whole program used only 18MB of heap. We have managed to run a program that would ordinarily have produced gigabytes of garbage without creating any garbage at all.

A note on timings

ChronicleMap is clearly not a drop-in replacement for ConcurrentHashMap; they have very different uses, and it is beyond the scope of this post to go much further into that discussion. But the main differences in functionality are that ChronicleMap is persisted and can be shared amongst many JVMs. (ChronicleMap can also be replicated over TCP.) Nevertheless, it is interesting to compare timings quickly, if only to make sure we are in the same ballpark.  ChronicleMap was faster for putting: 8.5s compared to 32s, though most of the time in ConcurrentHashMap was spent in GC, and that might be tuned away to some extent.  ConcurrentHashMap was faster for getting: 0.5s compared to 4.3s.  Nevertheless, on other runs I've seen ConcurrentHashMap take over 7s because of a GC that occurred in that section. Even though ChronicleMap is doing significantly more work, the lack of garbage produced actually makes the timings comparable with ConcurrentHashMap.

Restarting the program   

Where ChronicleMap really comes into its own is on a restart.  Let's say your program goes down and you need to redo the same computation we did earlier.  In the case of ConcurrentHashMap we would have to repopulate the map in exactly the same way we did earlier. With ChronicleMap, since the map is persisted, it is just a matter of pointing the map at the existing file and rerunning the calculation to produce the totalQuantity.
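Reusing the names and constants from the sketch above (and again assuming the Chronicle Map 2.x API, where createPersistedTo opens an existing file rather than recreating it), the restart amounts to something like this:

// On restart, point the builder at the existing file: the data is already
// there, so there is no repopulation step before the calculation.
File file = new File(System.getProperty("java.io.tmpdir"), "instruments.dat");

try (ChronicleMap<Integer, InstrumentVOInterface> map =
             ChronicleMapBuilder.of(Integer.class, InstrumentVOInterface.class)
                     .entries(NUM_RECORDS)
                     .createPersistedTo(file)) {

    InstrumentVOInterface using = map.newValueInstance();
    double totalQuantity = 0;
    for (int i = 0; i < NUM_RECORDS; i++) {
        totalQuantity += map.getUsing(i, using).getQuantity();
    }
    System.out.println("totalQuantity: " + totalQuantity);
}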

Summary  


                      ConcurrentHashMap    ChronicleMap
GC pauses             Many                 None
Update time           32s                  8s
Reads (allowing GC)   7s                   4s
Reads (no GC)         0.5s                 4s
Heap size             2.6GB                18MB
Persistence           No                   Yes
Fast restart          No                   Yes

Comments:

  1. Let me start out by saying that the following is in no way meant to negatively portray the Chronicle code. The OpenHFT work is amazing and I use it regularly.

    I can't reproduce any of your tests because the pom is incomplete and the code will not compile due to missing dependencies.

    You're not really drawing accurate comparisons here in a number of ways:

    1) The first test is being unfairly penalized because you're not specifying a minimum heap size as well as the maximum. You'd never run a low-latency operation without also specifying -Xms at the same size as the max, in this case -Xms4G. Failure to do so will cause the JVM to perform a STW (stop-the-world) collection every time the heap grows. Your Chronicle test exhibits a few of these because it still uses some managed heap, though not as much, so it has fewer pauses.

    2) You're comparing "heap" when in reality all you're measuring is JVM-managed heap. The Chronicle code will consume process heap, but that won't be measured by simply calculating JVM-managed heap, which is what the code currently does. Additionally, IIRC the Chronicle implementation is based on mmap'ing files. So, while it may use less process heap (managed or otherwise), it will result in more page faults, which also have a cost. A more interesting comparison would be to measure overall system impact (RAM used, shared memory used, page faults, CPU, etc.) across what are two fundamentally different approaches to memory usage.

    Lastly, the contention that garbage will "destroy any sort of deterministic performance" and "will cause expensive cache misses" isn't really substantiated by this test. You don't measure cache effectiveness so we don't know how well either performs in that regard. It's possible to get relatively deterministic performance out of well tuned JVMs depending on workload. Speaking from experience running large scale Cassandra clusters, if all your garbage is created in the new generation and doesn't survive promotion at collection time, GC is incredibly effective and fast. It's effectively just zeroing out pointers in a preallocated memory slab. I've seen steady workloads collect gigabytes of new gen in milliseconds with little variance.

    Replies
    1. Thanks for your comments, which are well made and taken.

      Yes, the test is clearly unfair, as you pointed out in 1). I did allude to this in the article when I said "(clearly this could be tuned)". The main point of the article was to demonstrate that there are alternatives to creating objects on the heap. To people who are not aware of Chronicle, or even of the concept of off-heap data, I think this might be very interesting. I used this very artificial example to demonstrate the point as clearly as I could.

      Definitely agree with point 2): if you are planning on using ChronicleMap you do have to take into account the performance of the whole machine, not just the JVM. Of course, what really has to be compared in that case is ChronicleMap against another persisted solution; comparing against ConcurrentHashMap was only done for the sake of this example.

      The sorts of latencies that the people who would use ChronicleMap might be aiming for are sub-millisecond, ideally with no GC collections at all. But I accept your point that a well-tuned JVM might well perform in a deterministic manner.

  2. "In short, there's no getting away from it, this program produces masses of garbage." - This is simply not true. The program is triggering GC activity, but that does NOT mean it's creating garbage. The GC activity is triggered due to poor configuration (I ran the test with a larger heap of a fixed size, no GC prints). There's no garbage generated in the "Critical Section" (i.e there's nothing to collect, the map is a live reference).
    So here's how you create millions of objects (real objects, which you can pass around and synchronise on and put into different containers and have methods on...) with no garbage: you just create the objects in question and put them into a well-sized map (so there's no garbage from resizing the map) and you keep a live reference to the map.
    That's it! None of these objects will get collected -> no garbage.

    Replies
    1. Thanks for the comment, with which I completely agree.

      I've addressed some of the points in my comment above to Erik. You are of course correct: the garbage was only produced in this program because of poor configuration. The main point of the post was to introduce people to the idea that there are alternatives to on-heap storage. Comparing ChronicleMap against ConcurrentHashMap was only done to highlight this idea. As you pointed out, a much fairer (and more sensible) comparison would be to compare ChronicleMap to another persisted solution.

      There might be situations where someone has more data than they can fit into their heap. In this case ChronicleMap, as an off-heap solution, would be a way out for them, since the map can exceed the main memory on the machine. One would have to accept that, depending on how the data is accessed, it might not be that performant due to page faulting, but that really depends on the data access pattern in the program.

    2. "the garbage was only produced in this program because of poor configuration"
      No, you are missing the point. GC activity does not mean garbage was produced.
      In particular, heap resizing and the promotion of objects are not indications of garbage.

    3. Understood - I did mean, and should have written, GC activity.
      Thanks for pointing that out.

  3. I'll copy my comment from JavaCodeGeeks here, although as I see it, it is pretty similar to the other comments:

    CreateOnHeapTest doesn't produce garbage, so it is a very bad candidate for testing garbage collection. You are storing the data in the ConcurrentHashMap but you never remove it, so the entries never become eligible for collection.
    This is clearly visible in the GC log: the GC doesn't remove these map entries, the used heap size is not reduced, and the overall heap size grows continuously. It is a good example of a memory leak; nobody cleans up this collection, it just grows…

    The Java GC is based on the generational hypothesis: most objects are used and then thrown away almost immediately, and this is what the Java GC focuses on. GC pause time depends linearly on the number of live objects; because of the generational hypothesis, the number of live objects is always low, so the GC is fast.
    But in your example, because of the memory leak, the number of live objects is very high, making the GC inefficient and slow: it has to walk through the huge reference tree every time. In this sense it is a very extreme case, not common in the Java world.

    The justification for garbage collection is its much better performance compared to other solutions. In C++ you have to spend expensive CPU time cleaning up after every single object. In Java, cleaning up is almost free for most objects, as is object allocation.
    So the overall throughput of the application should be better with GC; the drawback is that the latency is not stable. Since it is a well-proven technology, I would be very surprised if an experiment showed a different result.

  4. Thanks for taking the time to comment. I agree with all your points. As I replied to previous comments, the main point of the post was to demonstrate how Chronicle does not use heap memory. The ConcurrentHashMap was used purely as a contrast, to show how much heap memory would be allocated in normal circumstances.

  5. I don't know what went wrong, but when I executed the first case it gave the following output.

    ----- HASHMAP ------------------------
    *** Entering critical section ***
    [GC 29942K->29040K(120832K), 0.0402470 secs]
    [GC 58995K->57696K(152576K), 0.0496340 secs]
    [GC 121184K->121232K(185344K), 0.3986370 secs]
    [Full GC 121232K->121052K(299520K), 1.1513830 secs]
    [GC 184540K->184723K(330240K), 0.5455970 secs]
    [Full GC 184723K->184537K(441856K), 1.0315350 secs]
    [GC 278745K->278993K(474624K), 0.7381750 secs]
    [Full GC 278993K->278743K(619520K), 1.3248320 secs]
    [GC 405719K->405815K(712192K), 0.9467300 secs]
    [GC 509751K->509975K(744960K), 1.4869750 secs]
    [GC 613911K->614063K(808448K), 1.7037870 secs]
    [Full GC 614063K->613587K(1054208K), 3.2654490 secs]
    ************* STARTING GET *********************
    [GC 711379K->691041K(1113600K), 1.4565380 secs]
    [GC 788833K->691177K(1110528K), 0.8207020 secs]
    *** Exiting critical section ***
    Time for putting 15268
    Time for getting 6068
    [GC 730574K->691145K(975360K), 0.6077130 secs]
    [Full GC 691145K->225K(975360K), 0.0212590 secs]
    [GC 225K->225K(1108992K), 0.0043150 secs]
    [Full GC 225K->225K(1108992K), 0.0093000 secs]
    Memory(heap) used 2.7 MB



    When I executed the second case it gave this output.
    ----- CHRONICLE MAP ------------------------
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    [GC 31744K->7497K(120832K), 0.0202670 secs]
    *** Entering critical section ***
    [GC 39241K->11625K(152576K), 0.0179970 secs]
    [GC 89009K->11665K(159232K), 0.0014340 secs]
    [GC 86417K->11665K(156672K), 0.0014400 secs]
    [GC 83857K->11665K(154624K), 0.0014570 secs]

    730 messages like this
    [GC 83857K->11665K(154624K), 0.0014570 secs]

    *** Exiting critical section ***
    Time for putting 58035
    Time for getting 88172
    [GC 28510K->11633K(154624K), 0.0013230 secs]
    [Full GC 11633K->9045K(154624K), 0.0853900 secs]
    [GC 9045K->9045K(154624K), 0.0011460 secs]
    [Full GC 9045K->9045K(154624K), 0.0375410 secs]
    Memory(heap) used 10.7 MB

    After this I don't think the second one is of any use for me.
    Did I make any mistake?

    ReplyDelete
  6. Interesting... there shouldn't be any allocation in the critical section.
    Did you set -Xmx?

    Are you able to run the program using flight recorder and see where the allocation is coming from?
    (I have a blog post about how to run Flight Recorder: http://www.rationaljava.com/2015/03/java-flight-recorder-since-jdk-18040.html)
