Random kernel panics with no obvious culprit

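A while ago, I turned my old desktop into a Debian server, and it ran flawlessly for half a year.

Later, I decided to move the machine somewhere with a better internet connection and to add a bunch of hard drives, turning it into a proper storage server (a homemade NAS, so to speak).

Since then, the server has been crashing randomly. Sometimes it takes more than a month to crash; sometimes it takes a day. Lately, it has been crashing roughly every 2-3 days.

Looking at dmesg, the cause appears to be different for every crash. I have no idea what is causing them.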

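Setup

  • CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
  • Motherboard: MSI MS-7821/Z87-G45 GAMING
  • The machine runs Debian Stretch on Linux 4.9.0-8-amd64
  • Kdump is installed
  • The system is installed on a Samsung SSD 840 PRO (128 GB)
  • Five 8 TB Western Digital Red HDDs for storage
  • The HDDs were originally configured as software RAID5 with mdadm, but are now managed by ZFS as raidz2.
  • Apache2 (with Nextcloud) and transmission-daemon are running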

Messages

dmesg.201904140557
[230866.137537] PANIC: double fault, error_code: 0x0
[230866.137548] PANIC: double fault, error_code: 0x0
[230866.137550] CPU: 2 PID: 25608 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[230866.137551] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[230866.137551] task: ffff8d7d1eabe0c0 task.stack: ffffa02483d5c000
[230866.137555] RIP: 0010:[<ffffffffad8192fa>]  [<ffffffffad8192fa>] syscall_return_via_sysret+0x3e/0x4d
[230866.137556] RSP: 0018:ffffa02483d5ff50  EFLAGS: 00010002
[230866.137556] RAX: 0000000510035080 RBX: 0000000000000000 RCX: 00007fec9d79eacf
[230866.137557] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[230866.137557] RBP: 0000000000000000 R08: 00007fec6461ee20 R09: 0000000000000000
[230866.137558] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[230866.137558] R13: 0000000000000000 R14: 00007fec6461ee20 R15: 0000000000000000
[230866.137559] FS:  00007fec6461f700(0000) GS:ffff8d7e9fb00000(0000) knlGS:0000000000000000
[230866.137560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[230866.137560] CR2: ffffa02483d5ff48 CR3: 0000000510034000 CR4: 0000000000160670
[230866.137561] Stack:
[230866.137563]  0000000000000000 0000000000000000 00007fec6461ee20 0000000000000000
[230866.137564]  0000000000000000 0000000000000000 0000000000000000 0000000000000293
[230866.137565]  0000000000000000 0000000000000000 00007fec6461ee20 0000000000000000
[230866.137565] Call Trace:
[230866.137580] Code: 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00 0f 22 d8 <58> 48 8b a4 24 98 00 00 00 0f 01 f8 48 0f 07 50 90 0f 20 d8 65 
[230866.137580] Kernel panic - not syncing: Machine halted.
[230866.137581] CPU: 2 PID: 25608 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[230866.137582] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[230866.137583]  0000000000000000 ffffffffad534524 ffff8d7e9fb07f00 ffff8d7e9fb07f18
[230866.137584]  ffffffffad380ecd ffffffff00000008 ffff8d7e9fb07f28 ffff8d7e9fb07ec0
[230866.137585]  88dd6d6a799c212f 00000000000000c8 0000000000000092 0000000000000000
[230866.137585] Call Trace:
[230866.137589]  <#DF> 
[230866.137589]  [<ffffffffad534524>] ? dump_stack+0x5c/0x78
[230866.137591]  [<ffffffffad380ecd>] ? panic+0xe4/0x23f
[230866.137592]  [<ffffffffad258ac9>] ? df_debug+0x29/0x30
[230866.137594]  [<ffffffffad227b0f>] ? do_double_fault+0x9f/0x130
[230866.137595]  [<ffffffffad81a038>] ? double_fault+0x28/0x30
[230866.137596]  [<ffffffffad8192fa>] ? syscall_return_via_sysret+0x3e/0x4d

dmesg.201904172335
[322137.449206] general protection fault: 0000 [#1] SMP
[322137.464088] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc xt_multiport iptable_filter wireguard(O) ip6_udp_tunnel udp_tunnel overlay nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic zfs(PO) intel_rapl zunicode(PO) x86_pkg_temp_thermal zavl(PO) intel_powerclamp zcommon(PO) znvpair(PO) snd_hda_intel kvm_intel spl(O) kvm i915 snd_hda_codec irqbypass snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul iTCO_wdt ghash_clmulni_intel drm_kms_helper intel_cstate mei_me iTCO_vendor_support snd_timer drm intel_uncore snd
[322137.678356]  soundcore evdev i2c_algo_bit mxm_wmi mei efi_pstore intel_rapl_perf lpc_ich sg shpchp serio_raw mfd_core pcspkr efivars wmi intel_smartconnect video button nfsd auth_rpcgss oid_registry nfs_acl lockd grace nct6775 hwmon_vid coretemp sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic usbhid hid dm_mod sd_mod xhci_pci ahci ehci_pci xhci_hcd ehci_hcd crc32c_intel libahci libata aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper psmouse cryptd scsi_mod i2c_i801 i2c_smbus alx usbcore mdio thermal usb_common fan
[322137.867812] CPU: 2 PID: 2034 Comm: transmission-da Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[322137.898560] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[322137.922267] task: ffff9d0366de8040 task.stack: ffffb6ca48838000
[322137.940254] RIP: 0010:[<ffffffffc0dc49e2>]  [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]
[322137.965860] RSP: 0018:ffffb6ca4883b970  EFLAGS: 00010282
[322137.982034] RAX: fbff9cff4e756040 RBX: fbff9cff4e756040 RCX: fbff9cff4e756040
[322138.003667] RDX: 0000000000000000 RSI: 0000000002404200 RDI: fbff9cff4e756048
[322138.025297] RBP: ffff9d03710ec680 R08: 000039c6a0245fd0 R09: 0000000000000002
[322138.046929] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb6ca4883bb30
[322138.068560] R13: 0000000000000001 R14: 00000000000f99d1 R15: ffff9cff040b1a10
[322138.090191] FS:  00007fee5e413700(0000) GS:ffff9d039fb00000(0000) knlGS:0000000000000000
[322138.114681] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[322138.132151] CR2: 000056466d3a1060 CR3: 00000005e6e22000 CR4: 0000000000160670
[322138.153783] Stack:
[322138.160066]  0000000000004000 ffff9cfebc544000 ffff9d0373c44000 ffff9d03710ec680
[322138.182681]  ffffffffc0d1eae0 ffff9cff040b1a10 ffff9cfebc544000 0000000000004000
[322138.205299]  ffff9d0373c44000 ffffffffc0dc551c ffffffffc0d1eae0 ffff9d027d98eaa8
[322138.227918] Call Trace:
[322138.235528]  [<ffffffffc0d1eae0>] ? arc_hdr_destroy+0x1e0/0x1e0 [zfs]
[322138.255086]  [<ffffffffc0dc551c>] ? zio_read+0xcc/0xe0 [zfs]
[322138.272293]  [<ffffffffc0d1eae0>] ? arc_hdr_destroy+0x1e0/0x1e0 [zfs]
[322138.291847]  [<ffffffffc0d21eb0>] ? arc_read+0x520/0xa30 [zfs]
[322138.309576]  [<ffffffffc0d28b8e>] ? dbuf_read+0x29e/0x7d0 [zfs]
[322138.327569]  [<ffffffffc0d294f8>] ? __dbuf_hold_impl+0x438/0x4d0 [zfs]
[322138.347379]  [<ffffffffc0d295fb>] ? dbuf_hold_impl+0x6b/0x90 [zfs]
[322138.366147]  [<ffffffffc0d298fb>] ? dbuf_hold+0x2b/0x60 [zfs]
[322138.383622]  [<ffffffffc0d30799>] ? dmu_buf_hold_array_by_dnode+0xf9/0x460 [zfs]
[322138.406034]  [<ffffffffc0d313d0>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
[322138.426487]  [<ffffffffc0d323cd>] ? dmu_read_uio_dbuf+0x3d/0x60 [zfs]
[322138.446691]  [<ffffffffc0db0b97>] ? zfs_read+0x127/0x3b0 [zfs]
[322138.465045]  [<ffffffffc0dcae24>] ? zpl_read_common_iovec+0x84/0xd0 [zfs]
[322138.486274]  [<ffffffffc0dcb8e1>] ? zpl_iter_read+0xa1/0xe0 [zfs]
[322138.505406]  [<ffffffff8ae0aacd>] ? new_sync_read+0xdd/0x130
[322138.523175]  [<ffffffff8ae0b261>] ? vfs_read+0x91/0x130
[322138.539686]  [<ffffffff8ae0c8f0>] ? SyS_pread64+0x90/0xb0
[322138.556649]  [<ffffffff8ac03b7d>] ? do_syscall_64+0x8d/0xf0
[322138.574196]  [<ffffffff8b21924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[322138.595828] Code: 10 31 f6 4c 89 44 24 08 4c 89 0c 24 4c 8b a4 24 88 00 00 00 44 8b ac 24 90 00 00 00 e8 68 02 f4 ff 48 8d 78 08 48 89 c1 48 89 c3 <48> c7 00 00 00 00 00 48 c7 80 30 04 00 00 00 00 00 00 31 c0 48 
[322138.656162] RIP  [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]
[322138.675286]  RSP <ffffb6ca4883b970>

dmesg.201904260559
[72133.666580] general protection fault: 0000 [#1] SMP
[72133.681200] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter overlay wireguard(O) ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp zfs(PO) zunicode(PO) kvm_intel snd_hda_codec_realtek kvm zavl(PO) snd_hda_codec_generic irqbypass crct10dif_pclmul zcommon(PO) crc32_pclmul snd_hda_intel znvpair(PO) i915 snd_hda_codec spl(O) ghash_clmulni_intel intel_cstate snd_hda_core snd_hwdep snd_pcm intel_uncore iTCO_wdt efi_pstore iTCO_vendor_support drm_kms_helper snd_timer drm
[72133.895207]  mxm_wmi intel_rapl_perf mei_me sg snd serio_raw mei i2c_algo_bit lpc_ich pcspkr soundcore mfd_core evdev efivars shpchp wmi video intel_smartconnect button nct6775 hwmon_vid coretemp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic dm_mod usbhid hid sd_mod ahci libahci ehci_pci xhci_pci xhci_hcd ehci_hcd crc32c_intel libata aesni_intel psmouse aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd i2c_i801 scsi_mod i2c_smbus alx mdio usbcore usb_common fan thermal
[72134.084709] CPU: 3 PID: 4246 Comm: java Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[72134.112335] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[72134.135784] task: ffff8dbb009d7100 task.stack: ffffb42103b38000
[72134.153510] RIP: 0010:[<ffffffffa9eea7a8>]  [<ffffffffa9eea7a8>] hrtimer_active+0x28/0x50
[72134.178049] RSP: 0018:ffffb42103b3be28  EFLAGS: 00010046
[72134.193962] RAX: 0000000000000000 RBX: ffff8dbb00c3c600 RCX: 0000000000000023
[72134.215337] RDX: fffd8dbb1fb94c00 RSI: 0000000000000008 RDI: ffff8dbb00c3c600
[72134.236710] RBP: 0000000000000000 R08: ffffffffaaa3eee0 R09: ffff8dbac7341380
[72134.258082] R10: 0000000000000013 R11: ffff8dbb01041b38 R12: ffff8dbb00c3c600
[72134.279452] R13: ffffb42103b3bec0 R14: 0000000000000000 R15: 0000000000000000
[72134.300824] FS:  00007fd2336ce700(0000) GS:ffff8dbb1fb80000(0000) knlGS:0000000000000000
[72134.325054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[72134.342261] CR2: 00007f36d94688a0 CR3: 00000005f211e000 CR4: 0000000000160670
[72134.363633] Stack:
[72134.369656]  ffffffffa9eeac77 0000000000000000 8a7c0674a85ffec5 ffff8dbb00c3c688
[72134.392008]  ffffb42103b3beb0 ffff8dbb00c3c600 ffffffffaa057b59 00007fd24811c410
[72134.414343]  ffffb42103b3bee0 ffff8dbb01041b00 0000000000000001 8a7c0674a85ffec5
[72134.436702] Call Trace:
[72134.444039]  [<ffffffffa9eeac77>] ? hrtimer_try_to_cancel+0x27/0x110
[72134.463080]  [<ffffffffaa057b59>] ? do_timerfd_settime+0x119/0x430
[72134.481590]  [<ffffffffaa058127>] ? SyS_timerfd_settime+0x57/0xb0
[72134.499837]  [<ffffffffa9e03b7d>] ? do_syscall_64+0x8d/0xf0
[72134.516529]  [<ffffffffaa41924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[72134.537380] Code: 00 00 00 0f 1f 44 00 00 48 8b 57 30 eb 1d 80 7f 38 00 75 32 48 3b 78 08 74 2c 39 50 04 75 e9 48 8b 57 30 48 8b 0a 48 39 c8 74 21 <48> 8b 02 8b 50 04 f6 c2 01 74 d8 f3 90 8b 50 04 f6 c2 01 75 f6 
[72134.596590] RIP  [<ffffffffa9eea7a8>] hrtimer_active+0x28/0x50
[72134.614098]  RSP <ffffb42103b3be28>

dmesg.201904270957
[100366.341655] general protection fault: 0000 [#1] SMP
[100366.356517] Modules linked in: veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter overlay wireguard(O) ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel zfs(PO) zunicode(PO) kvm zavl(PO) irqbypass zcommon(PO) crct10dif_pclmul znvpair(PO) crc32_pclmul spl(O) ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 intel_cstate iTCO_wdt iTCO_vendor_support snd_hda_intel intel_uncore mxm_wmi evdev serio_raw efi_pstore intel_rapl_perf snd_hda_codec pcspkr snd_hda_core
[100366.570669]  snd_hwdep drm_kms_helper mei_me sg snd_pcm lpc_ich snd_timer drm snd mfd_core mei i2c_algo_bit soundcore shpchp intel_smartconnect wmi efivars video button nct6775 hwmon_vid coretemp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic dm_mod usbhid hid sd_mod ahci libahci libata xhci_pci crc32c_intel aesni_intel ehci_pci psmouse aes_x86_64 glue_helper i2c_i801 lrw xhci_hcd ehci_hcd gf128mul i2c_smbus ablk_helper cryptd usbcore alx scsi_mod mdio usb_common fan thermal
[100366.760030] CPU: 3 PID: 28567 Comm: apache2 Tainted: P          IO    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[100366.788960] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[100366.812667] task: ffff8c41b1eb4100 task.stack: ffffac678f30c000
[100366.830659] RIP: 0010:[<ffffffff8549800a>]  [<ffffffff8549800a>] __task_pid_nr_ns+0x3a/0x90
[100366.855979] RSP: 0018:ffffac678f30fcc8  EFLAGS: 00010282
[100366.872152] RAX: 0000000000000508 RBX: ffff8c4292b7ba40 RCX: 0000000000000001
[100366.893787] RDX: ffffffff86045d20 RSI: 0000000000000004 RDI: f7ff8c428aaa95c8
[100366.915418] RBP: ffffac678f30ff30 R08: 0000000000000000 R09: 0000000000000000
[100366.937052] R10: 0000000000000000 R11: 0000000000000000 R12: ffffac678f30fd78
[100366.958683] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[100366.980317] FS:  00007f29e0c20700(0000) GS:ffff8c445fb80000(0000) knlGS:0000000000000000
[100367.004809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100367.022279] CR2: 00007f773f92a1f8 CR3: 00000002575ee000 CR4: 0000000000160670
[100367.043913] Stack:
[100367.050195]  ffffffff8569cb93 00007f29e0c1fe20 0000000000000000 0000000000000000
[100367.072811]  0000000000000000 ffffffff8608b548 ffff8c400bc4ef80 ffff8c4292b7bb08
[100367.095407]  ffffac678f30fd20 00000000000b0008 0000000000000000 ffffac678f30fd20
[100367.118027] Call Trace:
[100367.125627]  [<ffffffff8569cb93>] ? SYSC_semtimedop+0x3b3/0xc50
[100367.143623]  [<ffffffff8552bd04>] ? __seccomp_filter+0x74/0x270
[100367.161615]  [<ffffffff8542f1f0>] ? recalibrate_cpu_khz+0x10/0x10
[100367.180130]  [<ffffffff854f01dc>] ? ktime_get_ts64+0x4c/0xf0
[100367.197342]  [<ffffffff85620bbf>] ? poll_select_copy_remaining+0xdf/0x150
[100367.217934]  [<ffffffff85403337>] ? syscall_trace_enter+0x117/0x2c0
[100367.236964]  [<ffffffff85403b7d>] ? do_syscall_64+0x8d/0xf0
[100367.253918]  [<ffffffff85a1924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[100367.275029] Code: 00 00 00 74 4e 85 f6 b8 08 05 00 00 74 1a 83 fe 04 74 0e 89 f6 48 8d 04 76 48 8d 04 c5 08 05 00 00 48 8b bf d0 04 00 00 48 01 c7 <48> 8b 0f 48 85 c9 74 20 8b b2 30 08 00 00 31 c0 3b 71 04 77 0d 
[100367.334428] RIP  [<ffffffff8549800a>] __task_pid_nr_ns+0x3a/0x90
[100367.352738]  RSP <ffffac678f30fcc8>
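
In 64-bit mode, a general protection fault is commonly raised when code dereferences a non-canonical address. The Python sketch below is not part of the original post; it scans pasted register lines from dumps like the ones above and lists registers holding non-canonical values. It assumes 48-bit virtual addresses (4-level paging), and the function names and the regular expression are illustrative only.

```python
import re

# Assumption: 48-bit virtual addresses (4-level paging), so bits 63..47
# of a canonical address are either all zeros or all ones.
CANONICAL_SHIFT = 47
ALL_ONES = (1 << (64 - CANONICAL_SHIFT)) - 1  # 17 one-bits

# Matches general-purpose register fields such as "RAX: fbff9cff4e756040".
# RSP/RIP lines carry a segment prefix ("0018:...") and are not matched.
REGISTER_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")


def is_canonical(addr: int) -> bool:
    """Return True if addr is canonical under 48-bit addressing."""
    top = addr >> CANONICAL_SHIFT
    return top == 0 or top == ALL_ONES


def noncanonical_registers(dump: str):
    """Return (register, hex value) pairs whose value is non-canonical."""
    found = []
    for name, value in REGISTER_RE.findall(dump):
        if not is_canonical(int(value, 16)):
            found.append((name, value))
    return found


if __name__ == "__main__":
    # Two register lines copied from the zio_create dump above.
    sample = (
        "RAX: fbff9cff4e756040 RBX: fbff9cff4e756040 RCX: fbff9cff4e756040\n"
        "RDX: 0000000000000000 RSI: 0000000002404200 RDI: fbff9cff4e756048\n"
    )
    for name, value in noncanonical_registers(sample):
        print(name, value)
```

Run against the three general protection fault dumps above, this flags RAX, RBX, RCX and RDI in the zio_create dump, RDX in the hrtimer_active dump, and RDI in the __task_pid_nr_ns dump. Each of those values would be canonical if a single bit in its upper 16 bits were set; the sketch only reports this and does not say why the values look that way.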

Command output

# uname -a
Linux example.com 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
# lsmod
Module                  Size  Used by
ipt_REJECT             16384  6
nf_reject_ipv4         16384  1 ipt_REJECT
veth                   16384  0
xt_nat                 16384  1
xt_tcpudp              16384  3
ipt_MASQUERADE         16384  2
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
nf_conntrack_netlink    36864  0
nfnetlink              16384  2 nf_conntrack_netlink
xfrm_user              36864  1
xfrm_algo              16384  1 xfrm_user
iptable_nat            16384  1
nf_conntrack_ipv4      16384  2
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
xt_addrtype            16384  2
xt_conntrack           16384  1
nf_nat                 24576  3 xt_nat,nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack          114688  6 nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
br_netfilter           24576  0
bridge                135168  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp
xt_multiport           16384  1
iptable_filter         16384  1
wireguard             217088  0
ip6_udp_tunnel         16384  1 wireguard
udp_tunnel             16384  1 wireguard
overlay                49152  1
nls_ascii              16384  1
nls_cp437              20480  1
vfat                   20480  1
fat                    69632  1 vfat
snd_hda_codec_hdmi     49152  1
intel_rapl             20480  0
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
kvm_intel             200704  0
kvm                   598016  1 kvm_intel
zfs                  2707456  8
irqbypass              16384  1 kvm
crct10dif_pclmul       16384  0
zunicode              331776  1 zfs
crc32_pclmul           16384  0
zavl                   16384  1 zfs
ghash_clmulni_intel    16384  0
zcommon                53248  1 zfs
intel_cstate           16384  0
znvpair                90112  2 zcommon,zfs
snd_hda_codec_realtek    90112  1
snd_hda_codec_generic    69632  1 snd_hda_codec_realtek
snd_hda_intel          36864  0
i915                 1257472  2
snd_hda_codec         135168  4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
drm_kms_helper        155648  1 i915
intel_uncore          118784  0
spl                    98304  3 znvpair,zcommon,zfs
snd_hda_core           90112  5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
iTCO_wdt               16384  0
mei_me                 36864  0
efi_pstore             16384  0
snd_hwdep              16384  1 snd_hda_codec
mxm_wmi                16384  0
iTCO_vendor_support    16384  1 iTCO_wdt
evdev                  24576  2
drm                   360448  3 i915,drm_kms_helper
snd_pcm               110592  4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
snd_timer              32768  1 snd_pcm
intel_rapl_perf        16384  0
efivars                20480  1 efi_pstore
serio_raw              16384  0
lpc_ich                24576  0
sg                     32768  0
snd                    86016  8 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm
pcspkr                 16384  0
mei                   102400  1 mei_me
i2c_algo_bit           16384  1 i915
soundcore              16384  1 snd
mfd_core               16384  1 lpc_ich
shpchp                 36864  0
wmi                    16384  1 mxm_wmi
intel_smartconnect     16384  0
video                  40960  1 i915
button                 16384  1 i915
nfsd                  331776  13
auth_rpcgss            61440  1 nfsd
oid_registry           16384  1 auth_rpcgss
nfs_acl                16384  1 nfsd
lockd                  90112  1 nfsd
grace                  16384  2 nfsd,lockd
sunrpc                344064  18 auth_rpcgss,nfsd,nfs_acl,lockd
nct6775                57344  0
hwmon_vid              16384  1 nct6775
coretemp               16384  0
efivarfs               16384  1
ip_tables              24576  2 iptable_filter,iptable_nat
x_tables               36864  9 xt_multiport,ipt_REJECT,xt_nat,ip_tables,iptable_filter,xt_tcpudp,ipt_MASQUERADE,xt_addrtype,xt_conntrack
autofs4                40960  3
ext4                  585728  2
crc16                  16384  1 ext4
jbd2                  106496  1 ext4
fscrypto               28672  1 ext4
ecb                    16384  0
mbcache                16384  3 ext4
raid10                 49152  0
raid456               106496  0
async_raid6_recov      20480  1 raid456
async_memcpy           16384  2 raid456,async_raid6_recov
async_pq               16384  2 raid456,async_raid6_recov
async_xor              16384  3 async_pq,raid456,async_raid6_recov
async_tx               16384  5 async_xor,async_pq,raid456,async_memcpy,async_raid6_recov
xor                    24576  1 async_xor
raid6_pq              110592  3 async_pq,raid456,async_raid6_recov
libcrc32c              16384  1 raid456
crc32c_generic         16384  0
raid1                  36864  0
raid0                  20480  0
multipath              16384  0
linear                 16384  0
md_mod                135168  6 raid1,raid10,multipath,linear,raid0,raid456
hid_generic            16384  0
usbhid                 53248  0
hid                   122880  2 hid_generic,usbhid
dm_mod                118784  6
sd_mod                 49152  14
ehci_pci               16384  0
xhci_pci               16384  0
xhci_hcd              188416  1 xhci_pci
ahci                   40960  8
ehci_hcd               81920  1 ehci_pci
crc32c_intel           24576  5
libahci                32768  1 ahci
aesni_intel           167936  1
aes_x86_64             20480  1 aesni_intel
libata                249856  2 ahci,libahci
glue_helper            16384  1 aesni_intel
lrw                    16384  1 aesni_intel
usbcore               253952  6 usbhid,ehci_hcd,xhci_pci,xhci_hcd,ehci_pci
gf128mul               16384  1 lrw
ablk_helper            16384  1 aesni_intel
i2c_i801               24576  0
cryptd                 24576  3 ablk_helper,ghash_clmulni_intel,aesni_intel
psmouse               135168  0
i2c_smbus              16384  1 i2c_i801
alx                    45056  0
scsi_mod              225280  3 sd_mod,libata,sg
mdio                   16384  1 alx
usb_common             16384  1 usbcore
fan                    16384  0
thermal                20480  0

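Update

I ran memtest86 (the original version from memtest86.com) both before and after reseating the RAM modules: memtest log

No errors were found.

Update

Reseating the RAM modules had no effect, so I explored new hypotheses.

I checked for electrical interference, but found no correlation between the crash times and the use of heavy motors.

I also checked for a correlation between disk access and crashes. Crashes can apparently happen even when disk activity is low, but they happen much sooner under certain kinds of disk activity. For example, if I read all the disks in parallel (cat /dev/sdX > /dev/null), I can crash the machine within an hour. However, the SMART data shows nothing wrong. Here is the output of smartctl -a /dev/sdb (the other disks look the same):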

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       112
  3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       401 (Average 420)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   140   140   020    Pre-fail  Offline      -       15
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7274
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       260
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       260
194 Temperature_Celsius     0x0002   224   224   000    Old_age   Always       -       29 (Min/Max 10/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged
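
An attribute table like the one above can also be checked mechanically. The following Python sketch is an illustration rather than anything from the original post: it parses the attribute rows of pasted smartctl -a output and reports rows that have a WHEN_FAILED entry, a normalized value at or below the threshold, or a non-zero raw counter for a small watch list. The watch list (IDs 5, 196, 197, 198, 199) and all names are assumptions made for this example.

```python
import re
from typing import List, NamedTuple, Tuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: str


# An attribute row starts with an ID, a name and a hexadecimal flag field.
ROW_RE = re.compile(r"^\s*\d+\s+\S+\s+0x[0-9a-fA-F]{4}\s")

# Assumption: raw counters worth flagging when non-zero.
WATCHED_RAW_IDS = {5, 196, 197, 198, 199}


def parse_attributes(text: str) -> List[SmartAttr]:
    """Parse attribute rows from pasted `smartctl -a` output."""
    attrs = []
    for line in text.splitlines():
        if not ROW_RE.match(line):
            continue
        # The raw value may contain spaces, so split into at most 10 fields.
        f = line.split(None, 9)
        if len(f) < 10:
            continue
        attrs.append(SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]),
                               int(f[5]), f[8], f[9].strip()))
    return attrs


def suspicious(attrs: List[SmartAttr]) -> List[Tuple[str, str]]:
    """Return (attribute name, reason) pairs for rows that look unhealthy."""
    out = []
    for a in attrs:
        first = a.raw.split()[0] if a.raw.split() else ""
        raw_num = int(first) if first.isdigit() else 0
        if a.when_failed != "-":
            out.append((a.name, "WHEN_FAILED is " + a.when_failed))
        elif a.thresh > 0 and a.value <= a.thresh:
            out.append((a.name, "value at or below threshold"))
        elif a.attr_id in WATCHED_RAW_IDS and raw_num > 0:
            out.append((a.name, "raw counter is %d" % raw_num))
    return out
```

For the table above, suspicious(parse_attributes(text)) returns an empty list, which matches the observation that SMART reports nothing wrong.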

So the crashes are somehow related to the disks, but I don't know how.
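
The parallel read used above to provoke a crash can be sketched in Python as a repeatable load test. This is a minimal illustration, not the command from the post: it reads every given path to the end on its own thread and discards the data, much like running one cat ... > /dev/null per disk. The device names in the usage note are hypothetical, and reading raw block devices normally requires root.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def read_to_end(path: str) -> int:
    """Read a file or block device to the end, discarding the data.

    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_in_parallel(paths: List[str]) -> Dict[str, int]:
    """Read all paths concurrently, one worker thread per path."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        return dict(zip(paths, pool.map(read_to_end, paths)))
```

A call such as read_in_parallel(["/dev/sdb", "/dev/sdc"]) would keep both disks busy until each has been read completely. Threads are sufficient here because the workers spend their time waiting on I/O.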

Answer 1

Looking at the logs, the kernel is tainted, that is, running in an unsupported state:

Tainted: P IO

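The list of taint flags is available in the kernel documentation. The P flag means a proprietary (non-GPL-compatible) module has been loaded, and the O flag means an externally built (out-of-tree) module has been loaded; most notably, ZFS and its related modules are marked this way in the module list. One of the log excerpts you provided shows a general protection fault inside the ZFS module, but the others occur elsewhere in the kernel. In addition, general protection faults and double faults are raised by the processor itself, which means the module may not be at fault.

I am more concerned about the I taint flag. The I flag means the kernel is working around a severe bug in the platform firmware. This points to a potentially serious problem in the system's UEFI/BIOS firmware that could be causing the errors. Did you perform a BIOS update before this started, and was this flag already set before you made the hardware upgrade?

Unfortunately, the link to the full logs no longer works, so I cannot offer more specific help. The full logs might give details about the firmware bug the kernel is working around, as well as other possible indicators of failure.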
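
As an aid to reading the Tainted: field, here is a small Python sketch that decodes taint letters (as printed in an oops) or the numeric bitmask (as exposed in /proc/sys/kernel/tainted). The table follows the wording of the kernel's tainted-kernels documentation for bits 0 to 15; newer kernels define further bits that are omitted here. The function names are made up for this example.

```python
# (bit, letter, meaning) for taint bits 0..15.
TAINT_BITS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (2, "S", "kernel running on an out-of-specification system"),
    (3, "R", "module was force unloaded"),
    (4, "M", "processor reported a Machine Check Exception"),
    (5, "B", "bad page referenced or unexpected page flags"),
    (6, "U", "taint requested by userspace application"),
    (7, "D", "kernel died recently (OOPS or BUG)"),
    (8, "A", "ACPI table overridden by user"),
    (9, "W", "kernel issued a warning"),
    (10, "C", "staging driver was loaded"),
    (11, "I", "workaround for a bug in platform firmware applied"),
    (12, "O", "externally built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
    (14, "L", "soft lockup occurred"),
    (15, "K", "kernel has been live patched"),
]

_BY_LETTER = {letter: (bit, meaning) for bit, letter, meaning in TAINT_BITS}


def decode_letters(flags: str):
    """Decode a string such as 'P          IO' into (letter, meaning) pairs.

    Spaces and 'G' (no proprietary module loaded) are skipped.
    """
    return [(ch, _BY_LETTER[ch][1]) for ch in flags if ch in _BY_LETTER]


def letters_to_mask(flags: str) -> int:
    """Convert taint letters into the equivalent numeric bitmask."""
    mask = 0
    for ch in flags:
        if ch in _BY_LETTER:
            mask |= 1 << _BY_LETTER[ch][0]
    return mask


def decode_mask(mask: int):
    """Decode a numeric taint value into (letter, meaning) pairs."""
    return [(letter, meaning) for bit, letter, meaning in TAINT_BITS
            if mask & (1 << bit)]


if __name__ == "__main__":
    for letter, meaning in decode_letters("P          IO"):
        print(letter, meaning)
```

Comparing the decoded value from a boot before the hardware change with one from after it is one way to answer the question about when the I flag first appeared, provided such older logs still exist.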
