Slaying the Garbage Collection dragon

The JVM’s garbage collector is generational. That is, it splits the heap into multiple generations, which it calls Eden, Survivor and Old.

In broad strokes, it works like this: new allocations are made in Eden. When Eden fills up, a young GC cycle kicks in and copies any still-living objects to the Survivor space. Live objects that have been in Survivor space for long enough are copied to the Old generation. A full GC is needed to clean objects out of the Old generation. Young GC is stop-the-world, so your application will see pauses when it occurs. If the GC logs show a long time spent copying, tenuring is the cause.

73481.571: [GC pause (young)
Desired survivor size 524288000 bytes, new threshold 6 (max 6)
- age 1: 1990040 bytes, 1990040 total
- age 2: 1702416 bytes, 3692456 total
- age 3: 1698744 bytes, 5391200 total
- age 4: 1701408 bytes, 7092608 total
- age 5: 1700608 bytes, 8793216 total
- age 6: 1701408 bytes, 10494624 total
, 0.1204130 secs]
   [Parallel Time: 107.3 ms, GC Workers: 8]
      [GC Worker Start (ms): Min: 73481571.2, Avg: 73481576.1, Max: 73481585.3, Diff: 14.1]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 3.5, Max: 6.6, Diff: 6.5, Sum: 27.8]
      [Update RS (ms): Min: 0.0, Avg: 1.7, Max: 3.5, Diff: 3.5, Sum: 13.7]
         [Processed Buffers: Min: 0, Avg: 6.0, Max: 13, Diff: 13, Sum: 48]
      [Scan RS (ms): Min: 0.7, Avg: 1.5, Max: 1.7, Diff: 1.0, Sum: 11.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 2.0]
      [Object Copy (ms): Min: 81.9, Avg: 84.6, Max: 85.6, Diff: 3.6, Sum: 676.8]
      [Termination (ms): Min: 9.5, Avg: 10.0, Max: 10.6, Diff: 1.1, Sum: 79.8]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.9]
      [GC Worker Total (ms): Min: 92.5, Avg: 101.7, Max: 106.8, Diff: 14.3, Sum: 813.7]
      [GC Worker End (ms): Min: 73481677.7, Avg: 73481677.9, Max: 73481678.1, Diff: 0.4]
   [Code Root Fixup: 0.1 ms]
   [Code Root Migration: 0.5 ms]
   [Clear CT: 2.9 ms]
   [Other: 9.6 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 5.1 ms]
      [Ref Enq: 0.3 ms]
      [Free CSet: 4.0 ms]
   [Eden: 7980.0M(7980.0M)->0.0B(7984.0M) Survivors: 20.0M->16.0M Heap: 10170.3M(12.0G)->2192.1M(12.0G)]
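For reference, a stock HotSpot 8 JVM can emit a log in this shape too. Something like the following flag set should do it (a sketch: `app.jar` stands in for your application, and the exact flags vary between JDK versions):

```shell
# Produce a G1 log with per-age tenuring distribution on a stock
# HotSpot 8 JVM. "app.jar" is a placeholder for your application.
java -XX:+UseG1GC \
     -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps \
     -XX:+PrintTenuringDistribution \
     -jar app.jar > gc.log 2>&1
```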


The JVM offers no way to tell which objects are surviving and being copied between generations. Profilers like YourKit and Mission Control don’t show it. The information doesn’t seem to exist in hprof files. Really, your only option is to eyeball your code and hope something jumps out at you. Good luck if this is a large system, with many subsystems and years of development work!
Alternatively, you can eyeball object histograms from multiple hprof dumps taken over an extended period and hope something reveals itself. Either way, there’s no accurate and reliable way of finding out what is being copied between generations.
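The histogram approach at least has stock tooling behind it. A sketch using `jmap`, where `<pid>` stands in for your JVM’s process id:

```shell
# Take two class histograms a few minutes apart; classes whose instance
# counts keep growing across snapshots are candidates for investigation.
# Note: -histo:live forces a full GC before counting.
jmap -histo:live <pid> > histo-1.txt
sleep 300
jmap -histo:live <pid> > histo-2.txt
diff histo-1.txt histo-2.txt | head -n 40
```

This still only shows what is alive at each snapshot, not what gets copied between generations, which is why it rarely reveals the culprit.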

When all else fails, you can hack up the JDK…

Compiling your own JDK isn’t as bad as it sounds, though they do try to discourage it by burning your retinas with instructions on a cyan background.
Hacking it up to show the info you want is a bit more involved, but it’s not rocket science. I modified the JDK to record, in a map, every object it copies between generations, and to output the list alongside the per-thread statistics. It’s a hack. But it doesn’t need to go into production. It doesn’t need to be pretty. It doesn’t need to follow the JDK style. It doesn’t need to be good C++. It just needs to work. You can find my patch here.

First, get the JDK sources and the dependencies needed to build them.

~$ sudo apt-get install mercurial # needed to pull code
~$ sudo apt-get build-dep openjdk-7 # needed to build code
~$ hg clone jdk8u-wtftenure
~$ cd jdk8u-wtftenure
~/jdk8u-wtftenure$ bash ./


Apply the patch and build.
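Assuming the patch was saved as `wtftenure.patch` (a made-up name here; use whatever you downloaded it as), the standard OpenJDK 8u build flow looks roughly like this:

```shell
# Apply the patch (depending on how it was generated you may need -p0
# instead of -p1, or "hg import --no-commit wtftenure.patch").
~/jdk8u-wtftenure$ patch -p1 < wtftenure.patch
# Standard OpenJDK 8u build: configure, then build the JDK images.
~/jdk8u-wtftenure$ bash ./configure
~/jdk8u-wtftenure$ make images
```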

This should leave you with a JVM image in ~/jdk8u-wtftenure/build/linux-x86_64-normal-server-release/images/j2sdk-image/.

To use it, set JAVA_HOME to this path, and add $JAVA_HOME/bin to your PATH.
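For example:

```shell
# Point the environment at the freshly built image (path from the build above).
export JAVA_HOME=~/jdk8u-wtftenure/build/linux-x86_64-normal-server-release/images/j2sdk-image
export PATH="$JAVA_HOME/bin:$PATH"
java -version   # confirm the locally built JDK is the one being picked up
```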

To record the list of objects being tenured, you need to set -XX:+ParallelGCVerbose and -XX:+UseG1GC. This only works with the G1 collector.
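Running an application under the patched JVM then looks something like this (`app.jar` is a placeholder; note that -XX:+ParallelGCVerbose is repurposed by the patch, it is not a stock tenuring-report flag):

```shell
# Run under the patched JVM and capture the tenuring report.
java -XX:+UseG1GC -XX:+ParallelGCVerbose -jar app.jar > output.txt 2>&1
```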

The output will look something like this:

15 71.19 56.85 79.86 0.01 0.01 3 0 0 0
IKDEBUG >> printing ages
IKDEBUG [B - age 0 - count 79
IKDEBUG [B - age 1 - count 152
IKDEBUG java.lang.ref.WeakReference - age 0 - count 78
IKDEBUG java.lang.ref.WeakReference - age 1 - count 533
IKDEBUG java.lang.Long - age 0 - count 74
IKDEBUG java.lang.Long - age 1 - count 96
IKDEBUG [Ljava.lang.Object; - age 0 - count 926
IKDEBUG [Ljava.lang.Object; - age 1 - count 832
IKDEBUG java.util.ArrayList - age 0 - count 92
IKDEBUG java.util.ArrayList - age 1 - count 246
IKDEBUG java.lang.ref.ReferenceQueue - age 0 - count 81
IKDEBUG java.lang.ref.ReferenceQueue - age 1 - count 533
IKDEBUG java.lang.ref.ReferenceQueue$Lock - age 0 - count 81
IKDEBUG java.lang.ref.ReferenceQueue$Lock - age 1 - count 533

This isn’t very useful by itself. You need to massage the data a bit before patterns emerge.

$ cat output.txt | awk '/IKDEBUG/ {SUM[$2] += $8} END { for (k in SUM) { print SUM[k] " " k }}' | sort -nr
83682219 java.util.concurrent.ConcurrentHashMap$Node
65864623 java.lang.Long
64053362 org.midonet.util.concurrent.TimedExpirationMap$Metadata
63963100 org.midonet.util.PaddedAtomicInteger
46520121 scala.Tuple2
46519012 java.util.concurrent.ConcurrentLinkedQueue$Node
40975159 org.midonet.packets.NatState$NatBinding
40480819 org.midonet.packets.IPv4Addr
30912221 [Ljava.lang.Object;
29973083 org.midonet.midolman.state.NatState$NatKey$$anon$1
28331915 java.util.HashMap$Node
23560549 java.util.IdentityHashMap
23560027 java.util.IdentityHashMap$KeySet
23527913 java.util.Collections$SetFromMap
21620801 org.midonet.midolman.state.ConnTrackState$ConnTrackKey$$anon$1
17011224 java.util.ArrayList
11919325 org.midonet.sdn.flows.FlowTagger$DestinationIpTag
11264756 org.midonet.midolman.topology.RouterMapper$RemoveTagCallback
11250663 org.midonet.midolman.state.FlowStateReplicator$$anon$3$$anon$6
11229007 [B
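To sanity-check what the pipeline is doing: in each IKDEBUG line, `$2` is the class name and `$8` is the per-age count, so the awk sums the counts across all ages of each class. A quick check on two sample lines from the output above:

```shell
# Sum the per-age counts for a single class across two IKDEBUG lines.
printf 'IKDEBUG java.lang.Long - age 0 - count 74\nIKDEBUG java.lang.Long - age 1 - count 96\n' |
  awk '/IKDEBUG/ {SUM[$2] += $8} END { for (k in SUM) { print SUM[k] " " k }}'
# Prints: 170 java.lang.Long
```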

So once you have this data, what do you do? That really depends on your application.
In our case, it highlighted a number of places where we were keeping data alive for a relatively long time (around a minute). We already knew this was bad, but we had implemented object pooling and thought that was taking care of it. The data said otherwise…

Our solution was to take these data structures and reimplement them in C++, calling out to them via JNI. This won’t be the solution to every problem like this, but in our case it brought time spent in GC under heavy load from multiple seconds per minute down to roughly 150ms per minute!

All examples in this post use Ubuntu 14.04. The JDK source code was pulled on June 2nd, 2017, so if the patch doesn’t apply, revert to that date. This only works with the G1GC collector.

