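Has anyone run into the following problem? Our environment is CentOS 7.2, and while running a Java application the system occasionally reboots for no apparent reason. The crash analysis session is shown below. Any help would be appreciated.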
[BEGIN] 2019/10/19 12:09:34
STATE: TASK_RUNNING (PANIC)
crash> exit
[root@test ~]# crash /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux /home/vmcore
crash 7.1.5-2.el7
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.el7.x86_64/vmlinux
DUMPFILE: /home/vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Fri Oct 18 14:59:51 2019
UPTIME: 129 days, 17:06:58
LOAD AVERAGE: 0.00, 0.01, 0.05
TASKS: 1011
NODENAME: XXXXXXXXXXXXXXXX
RELEASE: 3.10.0-327.el7.x86_64
VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
MACHINE: x86_64 (2194 Mhz)
MEMORY: 8 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
PID: 122524
COMMAND: "java"
TASK: ffff88006e51f300 [THREAD_INFO: ffff8800bb150000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash>
crash>
crash>
crash> bt
PID: 122524 TASK: ffff88006e51f300 CPU: 0 COMMAND: "java"
#0 [ffff8800bb1538f0] machine_kexec at ffffffff81051beb
#1 [ffff8800bb153950] crash_kexec at ffffffff810f2542
#2 [ffff8800bb153a20] oops_end at ffffffff8163e1a8
#3 [ffff8800bb153a48] no_context at ffffffff8162e2b8
#4 [ffff8800bb153a98] __bad_area_nosemaphore at ffffffff8162e34e
#5 [ffff8800bb153ae0] bad_area at ffffffff8162e6c7
#6 [ffff8800bb153b08] __do_page_fault at ffffffff81641035
#7 [ffff8800bb153b60] trace_do_page_fault at ffffffff816411b3
#8 [ffff8800bb153b98] do_async_page_fault at ffffffff816408d9
#9 [ffff8800bb153bb0] async_page_fault at ffffffff8163d438
[exception RIP: tcp_sendmsg+261]
RIP: ffffffff81576a15 RSP: ffff8800bb153c68 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bb153da8 RCX: ffff8800bb153fd8
RDX: 00000000fffffefd RSI: 0000000000000000 RDI: ffff8802302bcb70
RBP: ffff8800bb153d20 R8: 0000000000000000 R9: 0000000000000001
R10: ffff8800bb153da8 R11: 0000000000000293 R12: 00000000000000ef
R13: ffff8802302bcb00 R14: ffff8800bb153e28 R15: ffff880104d46400
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8800bb153d28] inet_sendmsg at ffffffff815a0f44
#11 [ffff8800bb153d58] sock_aio_write at ffffffff8150fe47
#12 [ffff8800bb153e20] do_sync_write at ffffffff811dddad
#13 [ffff8800bb153ef8] vfs_write at ffffffff811de6c5
#14 [ffff8800bb153f38] sys_write at ffffffff811df06f
#15 [ffff8800bb153f80] system_call_fastpath at ffffffff81645909
RIP: 00007fe29b4336ad RSP: 00007fe198f1f220 RFLAGS: 00000246
RAX: 0000000000000001 RBX: ffffffff81645909 RCX: 000000007fffffff
RDX: 00000000000000ef RSI: 00007fe278130d40 RDI: 00000000000001e1
RBP: 00007fe198f1f2d0 R8: 00000000000000ef R9: 00000006d421efa0
R10: 0000000000002370 R11: 0000000000000293 R12: 00007fe198f1f310
R13: 00000000000000ef R14: 00007fe1d041c3b8 R15: 00007fe278130d40
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
crash>
crash>
crash> kmem -i
PAGES TOTAL PERCENTAGE
TOTAL MEM 1997872 7.6 GB ----
FREE 39523 154.4 MB 1% of TOTAL MEM
USED 1958349 7.5 GB 98% of TOTAL MEM
SHARED 601413 2.3 GB 30% of TOTAL MEM
BUFFERS 0 0 0% of TOTAL MEM
CACHED 718444 2.7 GB 35% of TOTAL MEM
SLAB 54483 212.8 MB 2% of TOTAL MEM
TOTAL SWAP 2097151 8 GB ----
SWAP USED 764 3 MB 0% of TOTAL SWAP
SWAP FREE 2096387 8 GB 99% of TOTAL SWAP
COMMIT LIMIT 3096087 11.8 GB ----
COMMITTED 1466361 5.6 GB 47% of TOTAL LIMIT
crash>
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff81951440 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff880232590000 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff880232590b80 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff880232591700 RU 0.0 0 0 [swapper/3]
1 0 1 ffff880232d78000 IN 0.0 188780 3292 systemd
2 0 0 ffff880232d78b80 IN 0.0 0 0 [kthreadd]
3 2 0 ffff880232d79700 IN 0.0 0 0 [ksoftirqd/0]
7 2 0 ffff880232d7c500 IN 0.0 0 0 [migration/0]
8 2 0 ffff880232d7d080 IN 0.0 0 0 [rcu_bh]
9 2 0 ffff880232d7dc00 IN 0.0 0 0 [rcuob/0]
169 2 0 ffff8802329f2280 IN 0.0 0 0 [rcu_sched]
170 2 0 ffff8802329f2e00 IN 0.0 0 0 [rcuos/0]
329 2 0 ffff880232566780 IN 0.0 0 0 [rcuos/159]
330 2 0 ffff880232567300 IN 0.0 0 0 [watchdog/0]
331 2 1 ffff88023213b980 IN 0.0 0 0 [watchdog/1]
332 2 1 ffff88023213c500 IN 0.0 0 0 [migration/1]
333 2 1 ffff88023213d080 IN 0.0 0 0 [ksoftirqd/1]
336 2 2 ffff88023213f300 IN 0.0 0 0 [watchdog/2]
337 2 2 ffff880232170000 IN 0.0 0 0 [migration/2]
338 2 2 ffff880232170b80 IN 0.0 0 0 [ksoftirqd/2]
340 2 2 ffff880232172280 IN 0.0 0 0 [kworker/2:0H]
341 2 3 ffff880232172e00 IN 0.0 0 0 [watchdog/3]
342 2 3 ffff880232173980 IN 0.0 0 0 [migration/3]
343 2 3 ffff880232174500 IN 0.0 0 0 [ksoftirqd/3]
345 2 3 ffff880232175c00 IN 0.0 0 0 [kworker/3:0H]
346 2 1 ffff880232176780 IN 0.0 0 0 [khelper]
347 2 3 ffff880232177300 IN 0.0 0 0 [kdevtmpfs]
348 2 1 ffff8802322b0000 IN 0.0 0 0 [netns]
349 2 1 ffff8802322b0b80 IN 0.0 0 0 [perf]
350 2 1 ffff8802322b1700 IN 0.0 0 0 [writeback]
351 2 1 ffff8802322b2280 IN 0.0 0 0 [kintegrityd]
352 2 1 ffff8802322b2e00 IN 0.0 0 0 [bioset]
353 2 1 ffff8802322b3980 IN 0.0 0 0 [kblockd]
354 2 1 ffff8802322b4500 IN 0.0 0 0 [md]
360 2 0 ffff880231fd8b80 IN 0.0 0 0 [khungtaskd]
361 2 0 ffff880231fd9700 IN 0.0 0 0 [kswapd0]
362 2 1 ffff880231fda280 IN 0.0 0 0 [ksmd]
363 2 1 ffff880231fdae00 IN 0.0 0 0 [khugepaged]
364 2 2 ffff880231fdb980 IN 0.0 0 0 [fsnotify_mark]
365 2 1 ffff880231fdc500 IN 0.0 0 0 [crypto]
373 2 1 ffff88022e731700 IN 0.0 0 0 [kthrotld]
375 2 1 ffff88022e732e00 IN 0.0 0 0 [kmpath_rdacd]
376 2 1 ffff88022e733980 IN 0.0 0 0 [kpsmoused]
395 2 1 ffff88022e0fb980 IN 0.0 0 0 [deferwq]
428 2 1 ffff88022e0f8b80 IN 0.0 0 0 [kauditd]
596 2 2 ffff88022e111700 IN 0.0 0 0 [ata_sff]
598 2 0 ffff88022e115c00 IN 0.0 0 0 [scsi_eh_0]
599 2 2 ffff88022e112280 IN 0.0 0 0 [scsi_tmf_0]
600 2 1 ffff880036a8ae00 IN 0.0 0 0 [events_power_ef]
605 2 1 ffff88022da78000 IN 0.0 0 0 [scsi_eh_1]
608 2 1 ffff88022da79700 IN 0.0 0 0 [scsi_tmf_1]
620 2 2 ffff88022da7dc00 IN 0.0 0 0 [ttm_swap]
621 2 3 ffff88022e324500 IN 0.0 0 0 [virtscsi-scan]
622 2 2 ffff88022e322e00 IN 0.0 0 0 [scsi_eh_2]
625 2 3 ffff88022e320000 IN 0.0 0 0 [scsi_tmf_2]
690 2 0 ffff88022e325c00 IN 0.0 0 0 [kdmflush]
691 2 0 ffff88022e326780 IN 0.0 0 0 [bioset]
698 2 0 ffff88022e327300 IN 0.0 0 0 [kdmflush]
699 2 0 ffff88022e321700 IN 0.0 0 0 [bioset]
716 2 1 ffff88022da7b980 IN 0.0 0 0 [xfsalloc]
717 2 1 ffff88022da7c500 IN 0.0 0 0 [xfs_mru_cache]
718 2 1 ffff88022da7f300 IN 0.0 0 0 [xfs-buf/dm-1]
719 2 1 ffff88022da7d080 IN 0.0 0 0 [xfs-data/dm-1]
720 2 1 ffff88022da7e780 IN 0.0 0 0 [xfs-conv/dm-1]
721 2 1 ffff88022cf18000 IN 0.0 0 0 [xfs-cil/dm-1]
722 2 1 ffff88022cf18b80 IN 0.0 0 0 [xfsaild/dm-1]
793 1 0 ffff880036895c00 IN 0.4 78188 40640 systemd-journal
822 1 2 ffff88022e38a280 IN 0.0 200764 1124 lvmetad
823 1 3 ffff88022e38c500 IN 0.0 43552 1040 systemd-udevd
824 2 2 ffff88022cf1ae00 IN 0.0 0 0 [rpciod]
849 2 1 ffff88022cf1dc00 IN 0.0 0 0 [vballoon]
863 2 3 ffff88022cf1e780 IN 0.0 0 0 [xfs-buf/vda1]
864 2 1 ffff88022cf19700 IN 0.0 0 0 [xfs-data/vda1]
865 2 1 ffff88022cf1c500 IN 0.0 0 0 [xfs-conv/vda1]
866 2 0 ffff88022ce9a280 IN 0.0 0 0 [xfs-cil/vda1]
868 2 3 ffff88022ce98b80 IN 0.0 0 0 [xfsaild/vda1]
881 2 1 ffff88022ce99700 IN 0.0 0 0 [kdmflush]
882 2 1 ffff88022ce9d080 IN 0.0 0 0 [bioset]
884 2 1 ffff88022ce9ae00 IN 0.0 0 0 [kdmflush]
885 2 1 ffff88022ce9c500 IN 0.0 0 0 [bioset]
891 2 1 ffff88022ce9b980 IN 0.0 0 0 [xfs-buf/dm-2]
892 2 1 ffff8800bb500000 IN 0.0 0 0 [xfs-data/dm-2]
893 2 1 ffff8800bb500b80 IN 0.0 0 0 [xfs-conv/dm-2]
894 2 1 ffff8800bb501700 IN 0.0 0 0 [xfs-cil/dm-2]
896 2 2 ffff8800bb502280 IN 0.0 0 0 [xfsaild/dm-2]
901 2 1 ffff8800bb505080 IN 0.0 0 0 [xfs-buf/dm-3]
902 2 1 ffff8800bb505c00 IN 0.0 0 0 [xfs-data/dm-3]
903 2 1 ffff8800bb506780 IN 0.0 0 0 [xfs-conv/dm-3]
904 2 1 ffff8800bb507300 IN 0.0 0 0 [xfs-cil/dm-3]
905 2 1 ffff880230bd8000 IN 0.0 0 0 [xfsaild/dm-3]
921 1 2 ffff88022e388b80 IN 0.0 114560 1588 auditd
933 1 2 ffff88022e730b80 IN 0.0 114560 1588 auditd
944 1 2 ffff88022e388000 IN 0.0 24340 1584 systemd-logind
945 1 0 ffff88022e38d080 IN 0.0 21616 1024 qemu-ga
947 1 2 ffff88022e389700 IN 0.0 24420 1416 dbus-daemon
948 1 0 ffff880230bdd080 IN 0.0 195044 692 gssproxy
949 1 1 ffff880230bd8b80 IN 0.0 195044 692 gssproxy
950 1 1 ffff880230bde780 IN 0.0 195044 692 gssproxy
951 1 1 ffff880230bd9700 IN 0.0 195044 692 gssproxy
952 1 1 ffff880230bdae00 IN 0.0 195044 692 gssproxy
953 1 1 ffff880230bdf300 IN 0.0 195044 692 gssproxy
954 1 1 ffff88022e38b980 IN 0.0 19296 996 irqbalance
957 1 1 ffff88022e38dc00 IN 0.2 689696 23020 rsyslogd
979 1 0 ffff88022e735c00 IN 0.2 689696 23020 in:imjournal
980 1 0 ffff88022e735080 IN 0.2 689696 23020 rs:main Q:Reg
1157 1 1 ffff88022e333980 IN 0.0 103844 3864 sshd
1164 1 0 ffff880230f0ae00 IN 0.0 24996 408 xinetd
1188 1 0 ffff880230f08b80 IN 0.0 80660 1412 zabbix_agentd
1197 1188 2 ffff88022f1f4500 IN 0.0 80660 1596 zabbix_agentd
1198 1188 3 ffff88022f1f5c00 IN 0.0 80784 2384 zabbix_agentd
1199 1188 3 ffff88022f1f3980 IN 0.0 80784 2384 zabbix_agentd
1200 1188 3 ffff880231fdf300 IN 0.0 80784 2384 zabbix_agentd
1201 1188 0 ffff880231fde780 IN 0.0 80792 2276 zabbix_agentd
1207 1 3 ffff8800bb504500 IN 0.0 110048 756 agetty
1326 1 1 ffff88022e113980 IN 0.0 89476 2092 master
1332 1326 0 ffff880230bda280 IN 0.0 89648 3836 qmgr
1352 1 1 ffff88022e332e00 IN 0.0 124192 1584 crond
1353 1 3 ffff88022e115080 IN 0.0 25672 1820 ntpd
1376 2 2 ffff88022e110b80 IN 0.0 0 0 [kworker/2:1H]
1912 2 0 ffff880144a39700 IN 0.0 0 0 [kworker/0:2]
2065 2 3 ffff88022e322280 IN 0.0 0 0 [kworker/3:1H]
10484 2 1 ffff88014cf45c00 IN 0.0 0 0 [kworker/1:2H]
10833 2 2 ffff88009e142e00 IN 0.0 0 0 [kworker/2:2]
12810 1 3 ffff8800bb331700 IN 46.4 9428784 4377788 java
12811 1 3 ffff8800bb336780 IN 46.4 9428784 4377788 java
30122 2 2 ffff88022fb85080 IN 0.0 0 0 [kworker/u320:0]
41953 2 3 ffff88006e411700 IN 0.0 0 0 [kworker/3:0]
42462 2 2 ffff88014568ae00 IN 0.0 0 0 [kworker/u320:2]
49529 1326 3 ffff88022ff47300 IN 0.0 89580 3940 pickup
50458 1 1 ffff8800bb330000 IN 46.4 9428784 4377788 java
51760 2 1 ffff88022c794500 IN 0.0 0 0 [kworker/1:1]
53165 2 1 ffff88022c792e00 IN 0.0 0 0 [kworker/1:0H]
54670 1 3 ffff8802303ce780 IN 46.4 9428784 4377788 java
55240 1 1 ffff88009e387300 IN 0.0 64948 1084 rpcbind
58290 1 3 ffff88014c99b980 IN 46.4 9428784 4377788 java
58294 1 3 ffff88014c99e780 IN 46.4 9428784 4377788 java
58613 2 0 ffff880036a8dc00 IN 0.0 0 0 [kworker/0:2H]
59013 2 1 ffff8800bb330b80 IN 0.0 0 0 [kworker/1:0]
59267 2 0 ffff8800bb1d5080 IN 0.0 0 0 [kworker/0:1]
59424 2 2 ffff8800bb1d0000 IN 0.0 0 0 [kworker/2:0]
59918 2 3 ffff8800bb1d2280 IN 0.0 0 0 [kworker/3:1]
61398 1 3 ffff88013024b980 IN 46.4 9428784 4377788 java
66543 2 0 ffff88014c99dc00 IN 0.0 0 0 [kworker/0:0H]
87627 1 0 ffff880230634500 IN 46.4 9428784 4377788 java
87628 1 1 ffff880230630b80 IN 46.4 9428784 4377788 java
123411 1 3 ffff8800910fe780 IN 46.4 9428784 4377788 java
123412 1 3 ffff880016043980 IN 46.4 9428784 4377788 java
123436 1 0 ffff8800bb3ce780 IN 46.4 9428784 4377788 java
123438 1 2 ffff8800bb3c8b80 IN 46.4 9428784 4377788 java
123439 1 0 ffff880016042280 IN 46.4 9428784 4377788 java
124242 1 0 ffff88006e415c00 IN 46.4 9428784 4377788 java
124243 1 0 ffff88022c482280 IN 46.4 9428784 4377788 java
124244 1 0 ffff88006e412e00 IN 46.4 9428784 4377788 java
124728 1 1 ffff880016040000 IN 46.4 9428784 4377788 java
125051 1 2 ffff88022da78b80 IN 46.4 9428784 4377788 java
149731 1 2 ffff88009e10ae00 IN 2.8 4143428 265568 java
149732 1 0 ffff88009e10d080 IN 2.8 4143428 265568 java
149733 1 3 ffff88022e2c3980 IN 2.8 4143428 265568 java
149734 1 3 ffff8802304ab980 IN 2.8 4143428 265568 java
149735 1 3 ffff8802304a8000 IN 2.8 4143428 265568 java
149736 1 1 ffff88022e116780 IN 2.8 4143428 265568 java
149737 1 0 ffff88022e2c5080 IN 2.8 4143428 265568 java
149738 1 0 ffff88009e143980 IN 2.8 4143428 265568 java
149739 1 3 ffff88009e10a280 IN 2.8 4143428 265568 java
149740 1 2 ffff88009e10b980 IN 2.8 4143428 265568 java
crash>
crash>
crash> bt
PID: 122524 TASK: ffff88006e51f300 CPU: 0 COMMAND: "java"
#0 [ffff8800bb1538f0] machine_kexec at ffffffff81051beb
#1 [ffff8800bb153950] crash_kexec at ffffffff810f2542
#2 [ffff8800bb153a20] oops_end at ffffffff8163e1a8
#3 [ffff8800bb153a48] no_context at ffffffff8162e2b8
#4 [ffff8800bb153a98] __bad_area_nosemaphore at ffffffff8162e34e
#5 [ffff8800bb153ae0] bad_area at ffffffff8162e6c7
#6 [ffff8800bb153b08] __do_page_fault at ffffffff81641035
#7 [ffff8800bb153b60] trace_do_page_fault at ffffffff816411b3
#8 [ffff8800bb153b98] do_async_page_fault at ffffffff816408d9
#9 [ffff8800bb153bb0] async_page_fault at ffffffff8163d438
[exception RIP: tcp_sendmsg+261]
RIP: ffffffff81576a15 RSP: ffff8800bb153c68 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bb153da8 RCX: ffff8800bb153fd8
RDX: 00000000fffffefd RSI: 0000000000000000 RDI: ffff8802302bcb70
RBP: ffff8800bb153d20 R8: 0000000000000000 R9: 0000000000000001
R10: ffff8800bb153da8 R11: 0000000000000293 R12: 00000000000000ef
R13: ffff8802302bcb00 R14: ffff8800bb153e28 R15: ffff880104d46400
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8800bb153d28] inet_sendmsg at ffffffff815a0f44
#11 [ffff8800bb153d58] sock_aio_write at ffffffff8150fe47
#12 [ffff8800bb153e20] do_sync_write at ffffffff811dddad
#13 [ffff8800bb153ef8] vfs_write at ffffffff811de6c5
#14 [ffff8800bb153f38] sys_write at ffffffff811df06f
#15 [ffff8800bb153f80] system_call_fastpath at ffffffff81645909
RIP: 00007fe29b4336ad RSP: 00007fe198f1f220 RFLAGS: 00000246
RAX: 0000000000000001 RBX: ffffffff81645909 RCX: 000000007fffffff
RDX: 00000000000000ef RSI: 00007fe278130d40 RDI: 00000000000001e1
RBP: 00007fe198f1f2d0 R8: 00000000000000ef R9: 00000006d421efa0
R10: 0000000000002370 R11: 0000000000000293 R12: 00007fe198f1f310
R13: 00000000000000ef R14: 00007fe1d041c3b8 R15: 00007fe278130d40
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
crash> dis -l tcp_sendmsg+261
/usr/src/debug/kernel-3.10.0-327.el7/linux-3.10.0-327.el7.x86_64/arch/x86/include/asm/bitops.h: 104
0xffffffff81576a15 <tcp_sendmsg+261>: lock andb $0xfe,0x8(%rax)
crash>
crash> exit
[END] 2019/10/19 14:28:41
Answer 1
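If possible, I would suggest updating to the latest CentOS 7.x kernel before putting more effort into troubleshooting. Your 3.10.0-327.el7 kernel was built in November 2015, so this may be an old bug that was fixed at some point between then and now.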
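From the backtrace, the system was performing a write operation (sys_write, vfs_write) that ended up needing network access (inet_sendmsg). The sock_aio_write frame suggests the descriptor being written was a socket, but it is still worth asking: are you using NFS or some other file system type that requires network access? In any case, inside tcp_sendmsg some pointer turned out to have an uninitialized (NULL) value, which triggered the exception and caused the crash. Looking at tcp_sendmsg() in net/ipv4/tcp.c, the first thing the function does is lock the socket, so the socket pointer may have been NULL.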
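To make the NULL pointer reasoning concrete, here is a minimal Python sketch that recomputes the faulting address from values copied out of the crash session in the question. The helper names (effective_address, faulting_address) are made up for this illustration and are not part of the crash utility. It only shows that the memory operand 0x8(%rax) with RAX = 0 resolves to address 0x8, which matches the PANIC message. The instruction itself (lock andb $0xfe) is an atomic clear of bit 0 in the byte at that address. The sketch does not tell you which kernel pointer ended up in RAX.

```python
import re

# Values copied from the crash session in the question.
PANIC_LINE = "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"
DISASSEMBLY = "0xffffffff81576a15 <tcp_sendmsg+261>: lock andb $0xfe,0x8(%rax)"
REGISTERS = {
    "rax": 0x0000000000000000,
    "rdi": 0xFFFF8802302BCB70,
    "r13": 0xFFFF8802302BCB00,
}


def effective_address(disassembly, registers):
    """Return displacement + base register for an AT&T operand such as 0x8(%rax)."""
    match = re.search(r"(-?0x[0-9a-fA-F]+)?\(%(\w+)\)", disassembly)
    if match is None:
        raise ValueError("no memory operand found")
    displacement = int(match.group(1), 16) if match.group(1) else 0
    base = registers[match.group(2)]
    # Wrap to 64 bits, as the CPU would.
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


def faulting_address(panic_line):
    """Extract the hexadecimal address at the end of the PANIC message."""
    return int(panic_line.rsplit(" ", 1)[1], 16)


if __name__ == "__main__":
    computed = effective_address(DISASSEMBLY, REGISTERS)
    reported = faulting_address(PANIC_LINE)
    print(f"computed: {computed:#x}, reported: {reported:#x}")
```

Since the two addresses agree, the NULL value is whatever pointer the kernel had loaded into RAX just before tcp_sendmsg+261, and the field being touched sits 8 bytes into the structure it was supposed to point to. R13 and RDI hold valid-looking kernel addresses, so not every pointer in play was NULL.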