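Some time ago, I turned my old desktop PC into a Debian server, and it ran flawlessly for half a year.
Then I decided to move the machine to a place with a better Internet connection and add a bunch of hard drives, turning it into a proper storage server (a home-built NAS, so to speak).
Since then, the server crashes at random. Sometimes it takes more than a month to crash; sometimes only a day. Lately, it crashes about every 2-3 days.
Looking at dmesg, the cause seems to be different every time. I have no idea what is causing the crashes.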
Setup
- CPU: Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
- Mainboard: MSI MS-7821/Z87-G45 GAMING
- The machine runs Debian Stretch on Linux 4.9.0-8-amd64
- Kdump is installed
- The system is installed on a Samsung SSD 840 PRO (128 GB)
- Five 8 TB Western Digital Red HDDs for storage
- The HDDs were originally set up as a software RAID5 with mdadm, but are now managed by ZFS as raidz2
- Apache2 (with Nextcloud) and transmission-daemon are running
Logs
- dmesg.201904090640
- dmesg.201904111340
- dmesg.201904140557
- dmesg.201904172335
- dmesg.201904260559
- dmesg.201904270957
- dmesg.201904272249
dmesg.201904140557
[230866.137537] PANIC: double fault, error_code: 0x0
[230866.137548] PANIC: double fault, error_code: 0x0
[230866.137550] CPU: 2 PID: 25608 Comm: apache2 Tainted: P IO 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[230866.137551] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[230866.137551] task: ffff8d7d1eabe0c0 task.stack: ffffa02483d5c000
[230866.137555] RIP: 0010:[<ffffffffad8192fa>] [<ffffffffad8192fa>] syscall_return_via_sysret+0x3e/0x4d
[230866.137556] RSP: 0018:ffffa02483d5ff50 EFLAGS: 00010002
[230866.137556] RAX: 0000000510035080 RBX: 0000000000000000 RCX: 00007fec9d79eacf
[230866.137557] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[230866.137557] RBP: 0000000000000000 R08: 00007fec6461ee20 R09: 0000000000000000
[230866.137558] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[230866.137558] R13: 0000000000000000 R14: 00007fec6461ee20 R15: 0000000000000000
[230866.137559] FS: 00007fec6461f700(0000) GS:ffff8d7e9fb00000(0000) knlGS:0000000000000000
[230866.137560] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[230866.137560] CR2: ffffa02483d5ff48 CR3: 0000000510034000 CR4: 0000000000160670
[230866.137561] Stack:
[230866.137563] 0000000000000000 0000000000000000 00007fec6461ee20 0000000000000000
[230866.137564] 0000000000000000 0000000000000000 0000000000000000 0000000000000293
[230866.137565] 0000000000000000 0000000000000000 00007fec6461ee20 0000000000000000
[230866.137565] Call Trace:
[230866.137580] Code: 50 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00 0f 22 d8 <58> 48 8b a4 24 98 00 00 00 0f 01 f8 48 0f 07 50 90 0f 20 d8 65
[230866.137580] Kernel panic - not syncing: Machine halted.
[230866.137581] CPU: 2 PID: 25608 Comm: apache2 Tainted: P IO 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[230866.137582] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[230866.137583] 0000000000000000 ffffffffad534524 ffff8d7e9fb07f00 ffff8d7e9fb07f18
[230866.137584] ffffffffad380ecd ffffffff00000008 ffff8d7e9fb07f28 ffff8d7e9fb07ec0
[230866.137585] 88dd6d6a799c212f 00000000000000c8 0000000000000092 0000000000000000
[230866.137585] Call Trace:
[230866.137589] <#DF>
[230866.137589] [<ffffffffad534524>] ? dump_stack+0x5c/0x78
[230866.137591] [<ffffffffad380ecd>] ? panic+0xe4/0x23f
[230866.137592] [<ffffffffad258ac9>] ? df_debug+0x29/0x30
[230866.137594] [<ffffffffad227b0f>] ? do_double_fault+0x9f/0x130
[230866.137595] [<ffffffffad81a038>] ? double_fault+0x28/0x30
[230866.137596] [<ffffffffad8192fa>] ? syscall_return_via_sysret+0x3e/0x4d
dmesg.201904172335
[322137.449206] general protection fault: 0000 [#1] SMP
[322137.464088] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc xt_multiport iptable_filter wireguard(O) ip6_udp_tunnel udp_tunnel overlay nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic zfs(PO) intel_rapl zunicode(PO) x86_pkg_temp_thermal zavl(PO) intel_powerclamp zcommon(PO) znvpair(PO) snd_hda_intel kvm_intel spl(O) kvm i915 snd_hda_codec irqbypass snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul iTCO_wdt ghash_clmulni_intel drm_kms_helper intel_cstate mei_me iTCO_vendor_support snd_timer drm intel_uncore snd
[322137.678356] soundcore evdev i2c_algo_bit mxm_wmi mei efi_pstore intel_rapl_perf lpc_ich sg shpchp serio_raw mfd_core pcspkr efivars wmi intel_smartconnect video button nfsd auth_rpcgss oid_registry nfs_acl lockd grace nct6775 hwmon_vid coretemp sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic usbhid hid dm_mod sd_mod xhci_pci ahci ehci_pci xhci_hcd ehci_hcd crc32c_intel libahci libata aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper psmouse cryptd scsi_mod i2c_i801 i2c_smbus alx usbcore mdio thermal usb_common fan
[322137.867812] CPU: 2 PID: 2034 Comm: transmission-da Tainted: P IO 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[322137.898560] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[322137.922267] task: ffff9d0366de8040 task.stack: ffffb6ca48838000
[322137.940254] RIP: 0010:[<ffffffffc0dc49e2>] [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]
[322137.965860] RSP: 0018:ffffb6ca4883b970 EFLAGS: 00010282
[322137.982034] RAX: fbff9cff4e756040 RBX: fbff9cff4e756040 RCX: fbff9cff4e756040
[322138.003667] RDX: 0000000000000000 RSI: 0000000002404200 RDI: fbff9cff4e756048
[322138.025297] RBP: ffff9d03710ec680 R08: 000039c6a0245fd0 R09: 0000000000000002
[322138.046929] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb6ca4883bb30
[322138.068560] R13: 0000000000000001 R14: 00000000000f99d1 R15: ffff9cff040b1a10
[322138.090191] FS: 00007fee5e413700(0000) GS:ffff9d039fb00000(0000) knlGS:0000000000000000
[322138.114681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[322138.132151] CR2: 000056466d3a1060 CR3: 00000005e6e22000 CR4: 0000000000160670
[322138.153783] Stack:
[322138.160066] 0000000000004000 ffff9cfebc544000 ffff9d0373c44000 ffff9d03710ec680
[322138.182681] ffffffffc0d1eae0 ffff9cff040b1a10 ffff9cfebc544000 0000000000004000
[322138.205299] ffff9d0373c44000 ffffffffc0dc551c ffffffffc0d1eae0 ffff9d027d98eaa8
[322138.227918] Call Trace:
[322138.235528] [<ffffffffc0d1eae0>] ? arc_hdr_destroy+0x1e0/0x1e0 [zfs]
[322138.255086] [<ffffffffc0dc551c>] ? zio_read+0xcc/0xe0 [zfs]
[322138.272293] [<ffffffffc0d1eae0>] ? arc_hdr_destroy+0x1e0/0x1e0 [zfs]
[322138.291847] [<ffffffffc0d21eb0>] ? arc_read+0x520/0xa30 [zfs]
[322138.309576] [<ffffffffc0d28b8e>] ? dbuf_read+0x29e/0x7d0 [zfs]
[322138.327569] [<ffffffffc0d294f8>] ? __dbuf_hold_impl+0x438/0x4d0 [zfs]
[322138.347379] [<ffffffffc0d295fb>] ? dbuf_hold_impl+0x6b/0x90 [zfs]
[322138.366147] [<ffffffffc0d298fb>] ? dbuf_hold+0x2b/0x60 [zfs]
[322138.383622] [<ffffffffc0d30799>] ? dmu_buf_hold_array_by_dnode+0xf9/0x460 [zfs]
[322138.406034] [<ffffffffc0d313d0>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
[322138.426487] [<ffffffffc0d323cd>] ? dmu_read_uio_dbuf+0x3d/0x60 [zfs]
[322138.446691] [<ffffffffc0db0b97>] ? zfs_read+0x127/0x3b0 [zfs]
[322138.465045] [<ffffffffc0dcae24>] ? zpl_read_common_iovec+0x84/0xd0 [zfs]
[322138.486274] [<ffffffffc0dcb8e1>] ? zpl_iter_read+0xa1/0xe0 [zfs]
[322138.505406] [<ffffffff8ae0aacd>] ? new_sync_read+0xdd/0x130
[322138.523175] [<ffffffff8ae0b261>] ? vfs_read+0x91/0x130
[322138.539686] [<ffffffff8ae0c8f0>] ? SyS_pread64+0x90/0xb0
[322138.556649] [<ffffffff8ac03b7d>] ? do_syscall_64+0x8d/0xf0
[322138.574196] [<ffffffff8b21924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[322138.595828] Code: 10 31 f6 4c 89 44 24 08 4c 89 0c 24 4c 8b a4 24 88 00 00 00 44 8b ac 24 90 00 00 00 e8 68 02 f4 ff 48 8d 78 08 48 89 c1 48 89 c3 <48> c7 00 00 00 00 00 48 c7 80 30 04 00 00 00 00 00 00 31 c0 48
[322138.656162] RIP [<ffffffffc0dc49e2>] zio_create+0x52/0x470 [zfs]
[322138.675286] RSP <ffffb6ca4883b970>
dmesg.201904260559
[72133.666580] general protection fault: 0000 [#1] SMP
[72133.681200] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter overlay wireguard(O) ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp zfs(PO) zunicode(PO) kvm_intel snd_hda_codec_realtek kvm zavl(PO) snd_hda_codec_generic irqbypass crct10dif_pclmul zcommon(PO) crc32_pclmul snd_hda_intel znvpair(PO) i915 snd_hda_codec spl(O) ghash_clmulni_intel intel_cstate snd_hda_core snd_hwdep snd_pcm intel_uncore iTCO_wdt efi_pstore iTCO_vendor_support drm_kms_helper snd_timer drm
[72133.895207] mxm_wmi intel_rapl_perf mei_me sg snd serio_raw mei i2c_algo_bit lpc_ich pcspkr soundcore mfd_core evdev efivars shpchp wmi video intel_smartconnect button nct6775 hwmon_vid coretemp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic dm_mod usbhid hid sd_mod ahci libahci ehci_pci xhci_pci xhci_hcd ehci_hcd crc32c_intel libata aesni_intel psmouse aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd i2c_i801 scsi_mod i2c_smbus alx mdio usbcore usb_common fan thermal
[72134.084709] CPU: 3 PID: 4246 Comm: java Tainted: P IO 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[72134.112335] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[72134.135784] task: ffff8dbb009d7100 task.stack: ffffb42103b38000
[72134.153510] RIP: 0010:[<ffffffffa9eea7a8>] [<ffffffffa9eea7a8>] hrtimer_active+0x28/0x50
[72134.178049] RSP: 0018:ffffb42103b3be28 EFLAGS: 00010046
[72134.193962] RAX: 0000000000000000 RBX: ffff8dbb00c3c600 RCX: 0000000000000023
[72134.215337] RDX: fffd8dbb1fb94c00 RSI: 0000000000000008 RDI: ffff8dbb00c3c600
[72134.236710] RBP: 0000000000000000 R08: ffffffffaaa3eee0 R09: ffff8dbac7341380
[72134.258082] R10: 0000000000000013 R11: ffff8dbb01041b38 R12: ffff8dbb00c3c600
[72134.279452] R13: ffffb42103b3bec0 R14: 0000000000000000 R15: 0000000000000000
[72134.300824] FS: 00007fd2336ce700(0000) GS:ffff8dbb1fb80000(0000) knlGS:0000000000000000
[72134.325054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[72134.342261] CR2: 00007f36d94688a0 CR3: 00000005f211e000 CR4: 0000000000160670
[72134.363633] Stack:
[72134.369656] ffffffffa9eeac77 0000000000000000 8a7c0674a85ffec5 ffff8dbb00c3c688
[72134.392008] ffffb42103b3beb0 ffff8dbb00c3c600 ffffffffaa057b59 00007fd24811c410
[72134.414343] ffffb42103b3bee0 ffff8dbb01041b00 0000000000000001 8a7c0674a85ffec5
[72134.436702] Call Trace:
[72134.444039] [<ffffffffa9eeac77>] ? hrtimer_try_to_cancel+0x27/0x110
[72134.463080] [<ffffffffaa057b59>] ? do_timerfd_settime+0x119/0x430
[72134.481590] [<ffffffffaa058127>] ? SyS_timerfd_settime+0x57/0xb0
[72134.499837] [<ffffffffa9e03b7d>] ? do_syscall_64+0x8d/0xf0
[72134.516529] [<ffffffffaa41924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[72134.537380] Code: 00 00 00 0f 1f 44 00 00 48 8b 57 30 eb 1d 80 7f 38 00 75 32 48 3b 78 08 74 2c 39 50 04 75 e9 48 8b 57 30 48 8b 0a 48 39 c8 74 21 <48> 8b 02 8b 50 04 f6 c2 01 74 d8 f3 90 8b 50 04 f6 c2 01 75 f6
[72134.596590] RIP [<ffffffffa9eea7a8>] hrtimer_active+0x28/0x50
[72134.614098] RSP <ffffb42103b3be28>
dmesg.201904270957
[100366.341655] general protection fault: 0000 [#1] SMP
[100366.356517] Modules linked in: veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter overlay wireguard(O) ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel zfs(PO) zunicode(PO) kvm zavl(PO) irqbypass zcommon(PO) crct10dif_pclmul znvpair(PO) crc32_pclmul spl(O) ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic i915 intel_cstate iTCO_wdt iTCO_vendor_support snd_hda_intel intel_uncore mxm_wmi evdev serio_raw efi_pstore intel_rapl_perf snd_hda_codec pcspkr snd_hda_core
[100366.570669] snd_hwdep drm_kms_helper mei_me sg snd_pcm lpc_ich snd_timer drm snd mfd_core mei i2c_algo_bit soundcore shpchp intel_smartconnect wmi efivars video button nct6775 hwmon_vid coretemp nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic dm_mod usbhid hid sd_mod ahci libahci libata xhci_pci crc32c_intel aesni_intel ehci_pci psmouse aes_x86_64 glue_helper i2c_i801 lrw xhci_hcd ehci_hcd gf128mul i2c_smbus ablk_helper cryptd usbcore alx scsi_mod mdio usb_common fan thermal
[100366.760030] CPU: 3 PID: 28567 Comm: apache2 Tainted: P IO 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[100366.788960] Hardware name: MSI MS-7821/Z87-G45 GAMING (MS-7821), BIOS V1.1 05/03/2013
[100366.812667] task: ffff8c41b1eb4100 task.stack: ffffac678f30c000
[100366.830659] RIP: 0010:[<ffffffff8549800a>] [<ffffffff8549800a>] __task_pid_nr_ns+0x3a/0x90
[100366.855979] RSP: 0018:ffffac678f30fcc8 EFLAGS: 00010282
[100366.872152] RAX: 0000000000000508 RBX: ffff8c4292b7ba40 RCX: 0000000000000001
[100366.893787] RDX: ffffffff86045d20 RSI: 0000000000000004 RDI: f7ff8c428aaa95c8
[100366.915418] RBP: ffffac678f30ff30 R08: 0000000000000000 R09: 0000000000000000
[100366.937052] R10: 0000000000000000 R11: 0000000000000000 R12: ffffac678f30fd78
[100366.958683] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
[100366.980317] FS: 00007f29e0c20700(0000) GS:ffff8c445fb80000(0000) knlGS:0000000000000000
[100367.004809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100367.022279] CR2: 00007f773f92a1f8 CR3: 00000002575ee000 CR4: 0000000000160670
[100367.043913] Stack:
[100367.050195] ffffffff8569cb93 00007f29e0c1fe20 0000000000000000 0000000000000000
[100367.072811] 0000000000000000 ffffffff8608b548 ffff8c400bc4ef80 ffff8c4292b7bb08
[100367.095407] ffffac678f30fd20 00000000000b0008 0000000000000000 ffffac678f30fd20
[100367.118027] Call Trace:
[100367.125627] [<ffffffff8569cb93>] ? SYSC_semtimedop+0x3b3/0xc50
[100367.143623] [<ffffffff8552bd04>] ? __seccomp_filter+0x74/0x270
[100367.161615] [<ffffffff8542f1f0>] ? recalibrate_cpu_khz+0x10/0x10
[100367.180130] [<ffffffff854f01dc>] ? ktime_get_ts64+0x4c/0xf0
[100367.197342] [<ffffffff85620bbf>] ? poll_select_copy_remaining+0xdf/0x150
[100367.217934] [<ffffffff85403337>] ? syscall_trace_enter+0x117/0x2c0
[100367.236964] [<ffffffff85403b7d>] ? do_syscall_64+0x8d/0xf0
[100367.253918] [<ffffffff85a1924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[100367.275029] Code: 00 00 00 74 4e 85 f6 b8 08 05 00 00 74 1a 83 fe 04 74 0e 89 f6 48 8d 04 76 48 8d 04 c5 08 05 00 00 48 8b bf d0 04 00 00 48 01 c7 <48> 8b 0f 48 85 c9 74 20 8b b2 30 08 00 00 31 c0 3b 71 04 77 0d
[100367.334428] RIP [<ffffffff8549800a>] __task_pid_nr_ns+0x3a/0x90
[100367.352738] RSP <ffffac678f30fcc8>
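To check the impression that every crash has a different cause, the key lines of each saved log can be lined up side by side. Below is a minimal sketch of a hypothetical helper, summarize_dmesg (not part of kdump-tools or any other package), assuming each kdump file is a plain-text dmesg dump in the same format as the excerpts above. For each file it prints the first fault line, the command that was running (Comm:) and the symbol from the first RIP: line.

```shell
#!/bin/sh
# summarize_dmesg: hypothetical helper, not part of kdump-tools.
# Prints "file | fault line | running command | faulting symbol" for one dump.
summarize_dmesg() {
    file=$1
    # First line naming the fault, with the "[timestamp] " prefix stripped.
    fault=$(grep -E 'general protection fault|double fault|BUG:|Oops' "$file" \
        | head -n 1 | sed 's/^\[[^]]*] *//')
    # Task that was on the CPU, from "... Comm: apache2 Tainted: ...".
    comm=$(sed -n 's/.*Comm: \([^ ]*\).*/\1/p' "$file" | head -n 1)
    # Symbol after the last "] " on the first "RIP: " line,
    # e.g. "zio_create+0x52/0x470".
    rip=$(grep 'RIP: ' "$file" | head -n 1 | sed 's/.*] \([^ ]*\).*/\1/')
    printf '%s | %s | %s | %s\n' "${file##*/}" "$fault" "$comm" "$rip"
}

# Summarize every file given on the command line.
for f in "$@"; do
    summarize_dmesg "$f"
done
```

Assuming kdump-tools' default crash directory, running it as sh summarize_dmesg.sh /var/crash/*/dmesg.* should print one line per crash. On the four excerpts above it would report a double fault in syscall_return_via_sysret (apache2) and general protection faults in zio_create (transmission-da), hrtimer_active (java) and __task_pid_nr_ns (apache2).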
Command output
# uname -a
Linux example.com 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
# lsmod
Module Size Used by
ipt_REJECT 16384 6
nf_reject_ipv4 16384 1 ipt_REJECT
veth 16384 0
xt_nat 16384 1
xt_tcpudp 16384 3
ipt_MASQUERADE 16384 2
nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE
nf_conntrack_netlink 36864 0
nfnetlink 16384 2 nf_conntrack_netlink
xfrm_user 36864 1
xfrm_algo 16384 1 xfrm_user
iptable_nat 16384 1
nf_conntrack_ipv4 16384 2
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_nat_ipv4 16384 1 iptable_nat
xt_addrtype 16384 2
xt_conntrack 16384 1
nf_nat 24576 3 xt_nat,nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack 114688 6 nf_conntrack_ipv4,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
br_netfilter 24576 0
bridge 135168 1 br_netfilter
stp 16384 1 bridge
llc 16384 2 bridge,stp
xt_multiport 16384 1
iptable_filter 16384 1
wireguard 217088 0
ip6_udp_tunnel 16384 1 wireguard
udp_tunnel 16384 1 wireguard
overlay 49152 1
nls_ascii 16384 1
nls_cp437 20480 1
vfat 20480 1
fat 69632 1 vfat
snd_hda_codec_hdmi 49152 1
intel_rapl 20480 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
kvm_intel 200704 0
kvm 598016 1 kvm_intel
zfs 2707456 8
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
zunicode 331776 1 zfs
crc32_pclmul 16384 0
zavl 16384 1 zfs
ghash_clmulni_intel 16384 0
zcommon 53248 1 zfs
intel_cstate 16384 0
znvpair 90112 2 zcommon,zfs
snd_hda_codec_realtek 90112 1
snd_hda_codec_generic 69632 1 snd_hda_codec_realtek
snd_hda_intel 36864 0
i915 1257472 2
snd_hda_codec 135168 4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
drm_kms_helper 155648 1 i915
intel_uncore 118784 0
spl 98304 3 znvpair,zcommon,zfs
snd_hda_core 90112 5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
iTCO_wdt 16384 0
mei_me 36864 0
efi_pstore 16384 0
snd_hwdep 16384 1 snd_hda_codec
mxm_wmi 16384 0
iTCO_vendor_support 16384 1 iTCO_wdt
evdev 24576 2
drm 360448 3 i915,drm_kms_helper
snd_pcm 110592 4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
snd_timer 32768 1 snd_pcm
intel_rapl_perf 16384 0
efivars 20480 1 efi_pstore
serio_raw 16384 0
lpc_ich 24576 0
sg 32768 0
snd 86016 8 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm
pcspkr 16384 0
mei 102400 1 mei_me
i2c_algo_bit 16384 1 i915
soundcore 16384 1 snd
mfd_core 16384 1 lpc_ich
shpchp 36864 0
wmi 16384 1 mxm_wmi
intel_smartconnect 16384 0
video 40960 1 i915
button 16384 1 i915
nfsd 331776 13
auth_rpcgss 61440 1 nfsd
oid_registry 16384 1 auth_rpcgss
nfs_acl 16384 1 nfsd
lockd 90112 1 nfsd
grace 16384 2 nfsd,lockd
sunrpc 344064 18 auth_rpcgss,nfsd,nfs_acl,lockd
nct6775 57344 0
hwmon_vid 16384 1 nct6775
coretemp 16384 0
efivarfs 16384 1
ip_tables 24576 2 iptable_filter,iptable_nat
x_tables 36864 9 xt_multiport,ipt_REJECT,xt_nat,ip_tables,iptable_filter,xt_tcpudp,ipt_MASQUERADE,xt_addrtype,xt_conntrack
autofs4 40960 3
ext4 585728 2
crc16 16384 1 ext4
jbd2 106496 1 ext4
fscrypto 28672 1 ext4
ecb 16384 0
mbcache 16384 3 ext4
raid10 49152 0
raid456 106496 0
async_raid6_recov 20480 1 raid456
async_memcpy 16384 2 raid456,async_raid6_recov
async_pq 16384 2 raid456,async_raid6_recov
async_xor 16384 3 async_pq,raid456,async_raid6_recov
async_tx 16384 5 async_xor,async_pq,raid456,async_memcpy,async_raid6_recov
xor 24576 1 async_xor
raid6_pq 110592 3 async_pq,raid456,async_raid6_recov
libcrc32c 16384 1 raid456
crc32c_generic 16384 0
raid1 36864 0
raid0 20480 0
multipath 16384 0
linear 16384 0
md_mod 135168 6 raid1,raid10,multipath,linear,raid0,raid456
hid_generic 16384 0
usbhid 53248 0
hid 122880 2 hid_generic,usbhid
dm_mod 118784 6
sd_mod 49152 14
ehci_pci 16384 0
xhci_pci 16384 0
xhci_hcd 188416 1 xhci_pci
ahci 40960 8
ehci_hcd 81920 1 ehci_pci
crc32c_intel 24576 5
libahci 32768 1 ahci
aesni_intel 167936 1
aes_x86_64 20480 1 aesni_intel
libata 249856 2 ahci,libahci
glue_helper 16384 1 aesni_intel
lrw 16384 1 aesni_intel
usbcore 253952 6 usbhid,ehci_hcd,xhci_pci,xhci_hcd,ehci_pci
gf128mul 16384 1 lrw
ablk_helper 16384 1 aesni_intel
i2c_i801 24576 0
cryptd 24576 3 ablk_helper,ghash_clmulni_intel,aesni_intel
psmouse 135168 0
i2c_smbus 16384 1 i2c_i801
alx 45056 0
scsi_mod 225280 3 sd_mod,libata,sg
mdio 16384 1 alx
usb_common 16384 1 usbcore
fan 16384 0
thermal 20480 0
Update
I ran memtest86 (the original version from memtest86.com) both before and after reseating the RAM modules: memtest logs
No errors were found.
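Update
Reseating the RAM modules had no effect, so I explored new hypotheses.
I checked for electrical interference, but there is no correlation between the crash times and the use of heavy electric motors.
I also checked for a correlation between disk access and the crashes. Crashes seem to happen even with little disk activity, but under certain kinds of disk activity they happen much sooner. For example, if I read all disks in parallel (cat /dev/sdX > /dev/null), I can crash the machine within an hour. However, the SMART data shows no problems. Here is the output of smartctl -a /dev/sdb (the other disks look the same):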
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       112
  3 Spin_Up_Time            0x0007   160   160   024    Pre-fail  Always       -       401 (Average 420)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       40
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   140   140   020    Pre-fail  Offline      -       15
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       7274
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       260
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       260
194 Temperature_Celsius     0x0002   224   224   000    Old_age   Always       -       29 (Min/Max 10/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
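With five disks, comparing full smartctl -a dumps by eye is tedious. A small filter can print only the raw values of the attributes usually watched for failing media or bad cabling: reallocations (5, 196), pending and offline-uncorrectable sectors (197, 198) and interface CRC errors (199). This is a sketch: the choice of attributes is my assumption, smart_watch is a hypothetical name, and it parses the human-readable table layout shown above (as printed by smartctl -A) rather than a stable machine-readable format.

```shell
#!/bin/sh
# smart_watch: hypothetical filter for the attribute table printed by
# `smartctl -A /dev/sdX`, read from stdin. Prints ID, name, raw value and
# "ok"/"CHECK" for a hand-picked set of attributes (assumption: 5, 196-199).
smart_watch() {
    awk '
        # Only data rows start with a numeric ID; the header starts with "ID#".
        $1 ~ /^[0-9]+$/ && ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 || $1 == 199) {
            # Field 10 is RAW_VALUE in this table layout.
            status = ($10 == 0) ? "ok" : "CHECK"
            printf "%-3s %-24s %-8s %s\n", $1, $2, $10, status
        }
    '
}
```

A usage example, with placeholder device names that need to match the actual data disks: for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do echo "== $d"; smartctl -A "$d" | smart_watch; done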
So the crashes are somehow related to the disks, but I don't know how.
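Because reading all disks in parallel reproduces the crash within about an hour, it helps to make that test repeatable and time-stamped, so each run can be matched against the kdump file names. A minimal sketch of the same cat-based test, assuming the data disks are passed as arguments; stress_read is a hypothetical name, it only reads, but reading raw block devices still requires root.

```shell
#!/bin/sh
# stress_read: hypothetical helper. Reads every argument end-to-end in
# parallel (like `cat /dev/sdX > /dev/null` for all disks at once),
# logs start/end times to stderr, and returns non-zero if any read fails.
stress_read() {
    pids=""
    for dev in "$@"; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') start $dev" >&2
        cat "$dev" > /dev/null &
        pids="$pids $!"
    done
    status=0
    # Wait for each background cat individually to collect its exit status.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    echo "$(date '+%Y-%m-%d %H:%M:%S') done, status=$status" >&2
    return "$status"
}
```

Usage, with placeholder device names: stress_read /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf 2>> stress.log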
Answer 1
Looking at the logs, the kernel is tainted, i.e. running in an unsupported state:
Tainted: P IO
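The list of taint flags is in the kernel documentation. P and O indicate that a module with a non-GPL-compatible license and an externally built (out-of-tree) module have been loaded; most notably, ZFS and its related modules are marked this way in the module list. One of the log excerpts you provided shows a general protection fault inside the ZFS module, but the others occurred elsewhere in the kernel. Also, GPFs and double faults are raised by the processor itself, which suggests the module is probably not at fault.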
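On x86-64, one common reason for the processor to raise a general protection fault is an instruction using a non-canonical address, i.e. one whose bits 63 to 47 are not all equal (assuming 48-bit virtual addresses with 4-level paging, as on this CPU and kernel). The sketch below is a hypothetical helper that checks a 64-bit hex value from a register dump; it only says whether the address could be used at all, not how the register came to hold it.

```shell
#!/bin/sh
# is_canonical: hypothetical helper. Prints "canonical", "non-canonical" or
# "invalid" for a hex address, assuming 48-bit virtual addresses
# (canonical means bits 63..47 are all 0 or all 1).
is_canonical() {
    addr=$(printf '%s' "$1" | tr 'A-FX' 'a-fx')
    addr=${addr#0x}
    case "$addr" in
        ''|*[!0-9a-f]*) echo "invalid"; return 1 ;;
    esac
    # Left-pad to 16 hex digits so the top bits line up.
    while [ "${#addr}" -lt 16 ]; do
        addr="0$addr"
    done
    [ "${#addr}" -eq 16 ] || { echo "invalid"; return 1; }
    # Digits 1-4 are bits 63..48; the top bit of digit 5 is bit 47.
    case "$addr" in
        ffff[89abcdef]*|0000[01234567]*) echo "canonical" ;;
        *) echo "non-canonical" ;;
    esac
}
```

By my reading of the Code: bytes, the faulting instruction in each of the three GPF excerpts goes through one register: RAX (fbff9cff4e756040) in zio_create, RDX (fffd8dbb1fb94c00) in hrtimer_active and RDI (f7ff8c428aaa95c8) in __task_pid_nr_ns. All three come out non-canonical, while a neighbouring kernel pointer such as RBP ffff9d03710ec680 from the zio_create log comes out canonical.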
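What concerns me more is the I taint flag. I means the kernel is "working around a severe bug in the platform firmware". This points to a potentially serious problem in the system's UEFI/BIOS firmware that could be causing the errors. Did you update the BIOS before this started, and was this flag already set before the hardware upgrade?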
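The flags can also be checked on the running system: /proc/sys/kernel/tainted holds them as a bitmask, and each loaded module's /sys/module/NAME/taint attribute shows whether that module contributed P or O. The sketch below decodes the bitmask using the bit order from the kernel's tainted-kernels documentation, covering only bits 0 to 15 (the ones a 4.9 kernel defines; newer kernels add more). Both function names are hypothetical. For the P, I and O flags seen in these logs, the mask would be 6145 (bits 0, 11 and 12).

```shell
#!/bin/sh
# decode_taint: hypothetical helper. Maps a taint bitmask to flag letters,
# bit 0 = P, 1 = F, ... 11 = I, 12 = O, ... 15 = K.
decode_taint() {
    value=$1
    bit=0
    flags=""
    for letter in P F S R M B U D A W C I O E L K; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            flags="$flags$letter"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$flags"
}

# tainting_modules: hypothetical helper. Lists modules whose sysfs "taint"
# attribute is non-empty; the sysfs root is a parameter (default /sys/module).
tainting_modules() {
    root=${1:-/sys/module}
    for f in "$root"/*/taint; do
        [ -r "$f" ] || continue
        t=$(cat "$f")
        [ -n "$t" ] || continue
        m=${f%/taint}
        printf '%s %s\n' "${m##*/}" "$t"
    done
}
```

On the server, decode_taint "$(cat /proc/sys/kernel/tainted)" decodes the current mask and tainting_modules lists the modules behind P and O; going by the (PO) and (O) marks in the "Modules linked in:" lines, zfs, spl and wireguard should be among them. The I flag is set by the kernel itself rather than by a module, so it will not show up in that list; grepping the boot log with dmesg | grep -i 'firmware bug' is a reasonable first place to look, since many firmware workarounds are logged with a "[Firmware Bug]" prefix.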
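Unfortunately, the links to the full logs no longer work, so I can't be more specific. The full logs would likely contain details about the firmware bug the system is working around, as well as other possible indicators of the fault.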