我们的一个 java 应用程序在执行 GC 时会出现很长的暂停,似乎是因为“ParNew(提升失败)”而花费了时间。
Java 版本:
$ java -version
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
GC设置:
JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Dsun.rmi.dgc.client.gcInterval=86400000 -Dsun.rmi.dgc.server.gcInterval=86400000 -verbose:gc -Xloggc:/path/to/gc/gc.log"
JAVA_OPTS="$JAVA_OPTS -XX:NewSize=5g -XX:MaxNewSize=5g -Xms10g -Xmx10g -XX:PermSize=512m -XX:MaxPermSize=512m -XX:SurvivorRatio=30 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15"
# Parallel setting
JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC -XX:ParallelGCThreads=16"
# CMS setting
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:CMSInitiatingOccupancyFraction=85 -XX:+CMSScavengeBeforeRemark -XX:+CMSConcurrentMTEnabled -XX:ParallelCMSThreads=2"
以下是几次长时间暂停的 gc.log 输出:
2015-06-04T10:02:28.988+0000: 129998.706: [GC[YG occupancy: 1904924 K (5079040 K)]2015-06-04T10:02:28.988+0000: 129998.706: [GC 129998.706: [ParNew (promotion
failed)
Desired survivor size 150994944 bytes, new threshold 10 (max 15)
- age 1: 39557704 bytes, 39557704 total
- age 2: 23752968 bytes, 63310672 total
- age 3: 11024200 bytes, 74334872 total
- age 4: 10529816 bytes, 84864688 total
- age 5: 9602712 bytes, 94467400 total
- age 6: 9621976 bytes, 104089376 total
- age 7: 27355240 bytes, 131444616 total
- age 8: 6326944 bytes, 137771560 total
- age 9: 7153592 bytes, 144925152 total
- age 10: 6944200 bytes, 151869352 total
- age 11: 7226592 bytes, 159095944 total
- age 12: 6674048 bytes, 165769992 total
: 1904924K->1904924K(5079040K), 84.3892780 secs] 6687900K- >7143259K(10321920K), 84.3894730 secs] [Times: user=241.33 sys=9.83, real=84.39 secs]
130083.096: [Rescan (parallel) , 1.0518870 secs]130084.148: [weak refs processing, 0.0056350 secs] [1 CMS-remark: 5238334K(5242880K)] 7143259K(10321920K), 85.
4473940 secs] [Times: user=249.44 sys=10.00, real=85.44 secs]
Total time for which application threads were stopped: 85.4485540 seconds
2015-06-09T01:36:38.354+0000: 531648.072: [GC[YG occupancy: 3444075 K (5079040 K)]2015-06-09T01:36:38.354+0000: 531648.072: [GC 531648.072: [ParNew (promotion
failed)
Desired survivor size 150994944 bytes, new threshold 6 (max 15)
- age 1: 118719600 bytes, 118719600 total
- age 2: 20791640 bytes, 139511240 total
- age 3: 2411752 bytes, 141922992 total
- age 4: 3146888 bytes, 145069880 total
- age 5: 3595160 bytes, 148665040 total
- age 6: 3950560 bytes, 152615600 total
- age 7: 2455960 bytes, 155071560 total
- age 8: 2780536 bytes, 157852096 total
- age 9: 2346000 bytes, 160198096 total
- age 10: 2952600 bytes, 163150696 total
- age 11: 1078304 bytes, 164229000 total
- age 12: 1857992 bytes, 166086992 total
- age 13: 1093752 bytes, 167180744 total
: 3444075K->3444075K(5079040K), 53.3528910 secs] 7494225K->8679897K(10321920K), 53.3531460 secs] [Times: user=168.75 sys=10.21, real=53.36 secs]
531701.426: [Rescan (parallel) , 1.2041260 secs]531702.630: [weak refs processing, 0.0002920 secs] [1 CMS-remark: 5235821K(5242880K)] 8679897K(10321920K), 54.
5583950 secs] [Times: user=178.02 sys=10.43, real=54.57 secs]
Total time for which application threads were stopped: 54.5615990 seconds
2015-06-09T01:37:32.913+0000: 531702.631: [CMS-concurrent-sweep-start]
2015-06-10T12:58:53.687+0000: 658983.405: [GC 658983.405: [ParNew (promotion failed)
Desired survivor size 150994944 bytes, new threshold 11 (max 15)
- age 1: 62173384 bytes, 62173384 total
- age 2: 4913808 bytes, 67087192 total
- age 3: 10095568 bytes, 77182760 total
- age 4: 12476376 bytes, 89659136 total
- age 5: 8521376 bytes, 98180512 total
- age 6: 985064 bytes, 99165576 total
- age 7: 8824120 bytes, 107989696 total
- age 8: 9991888 bytes, 117981584 total
- age 9: 23527832 bytes, 141509416 total
- age 10: 5438928 bytes, 146948344 total
- age 11: 6171880 bytes, 153120224 total
- age 12: 6983376 bytes, 160103600 total
- age 13: 1196080 bytes, 161299680 total
- age 14: 5357040 bytes, 166656720 total
: 5079040K->5079040K(5079040K), 31.3569910 secs]659014.762: [CMS: 5239285K->5242879K(5242880K), 11.5386160 secs] 9441788K->5341454K(10321920K), [CMS Perm : 42
6919K->426638K(524288K)], 42.8959280 secs] [Times: user=120.92 sys=7.35, real=42.90 secs]
Total time for which application threads were stopped: 42.8996760 seconds
Application time: 0.0001030 seconds
2015-06-10T12:59:36.587+0000: 659026.305: [GC [1 CMS-initial-mark: 5242879K(5242880K)] 5343035K(10321920K), 0.0946190 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
Total time for which application threads were stopped: 0.0979600 seconds
2015-06-10T12:59:36.682+0000: 659026.400: [CMS-concurrent-mark-start]
Application time: 0.2920200 seconds
我该如何解决这个问题?GC 设置是否有问题?
答案1
更新到 Java 1.6 的新版本可能会有帮助。一些补丁与 GC 行为有关。
看起来新的大小非常大。通常,在新一代 GC 中,只有几 MB 的空间幸存下来,而这似乎注定要留给老一代。尝试将大小降低-XX:NewSize
到 128m 到 512m 之间。删除该-XX:MaxNewSize=5g
设置可能不会有什么坏处。
这些设置可能会导致问题-XX:SurvivorRatio=30 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15"
。尝试设置-XX:SurvivorRatio=32
。这些设置将导致 GC 例程不断尝试调整幸存者比率。您获得的幸存者比率约为 10 到 15。监控幸存者占用率并进行调整,使其接近满载。
您可能希望将-XX:MaxTenuringThreshold
设置设为 4 或更低。在并发 GC 期间,新生代中的对象越多,速度就越慢。您希望在初始标记期间,新生代尽可能接近空。根据我的测试,在并发标记过程中,新生代中的所有对象都被视为活动对象。由于标记过程所花费的时间与活动对象的数量成正比,因此这可能是一个沉重的惩罚。使用较低的设置,您可能能够提高幸存者比率。
该设置-XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
往往会使日志变得混乱,可能不会产生太多有用的信息。我会删除它们。
您可能需要增加用于并发 GC 的线程数。考虑 25% 的 CPU 线程。
考虑将这个值降低-XX:CMSInitiatingOccupancyFraction
到 70 到 75 之间。考虑到内存分配的大小,85% 可能就够了。如果内存过于分散,在内存压缩时会遇到长时间的停顿。
考虑将指定的总内存降至-Xms
-Xmx
5g 或 6g。您可能希望每 1 到 4 小时进行一次旧代 GC。如果内存在 GC 周期之间被调出,您可能需要等待它被调入。这会显著增加 GC 时间。您可能需要更多的旧代来支持较低的启动占用率。