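I have about 50 Linux servers with identical hardware and the same kernel, but I recently found that some of them have a very high CPU load and run very slowly. top and ps show impossibly large numbers in the TIME column, and ps aux reports CPU usage as high as 99%.
The kernel is Linux 3.0.13, and it is customized: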
Linux 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64 x86_64 x86_64 GNU/Linux
ps showing high CPU:
#ps u |grep 99\.9
root 1604 99.9 0.0 13760 2132 pts/1 Ss+ Oct29 38443218:17 -bash
root 13011 99.9 0.0 1532 588 tty1 Ss+ Oct28 20833538:06 /sbin/mingetty --noclear tty1
root 13014 99.9 0.0 1532 572 tty4 Ss+ Oct28 600517:28 /sbin/mingetty tty4
root 13016 99.9 0.0 1532 576 tty6 Ss+ Oct28 20833538:06 /sbin/mingetty tty6
root 14501 99.9 0.0 13760 2124 pts/2 Ss 18:30 1501293:42 -bash
top shows very large TIME values:
#top
top - 18:34:20 up 2 days, 7:39, 2 users, load average: 7.08, 7.36, 7.96
Tasks: 158 total, 11 running, 111 sleeping, 2 stopped, 34 zombie
Cpu0 : 71.2% us, 17.2% sy, 0.7% ni, 10.9% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 0.6% us, 1.3% sy, 0.0% ni, 98.1% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 15.2% us, 8.9% sy, 0.0% ni, 65.2% id, 0.0% wa, 0.3% hi, 10.3% si
Cpu3 : 16.3% us, 10.0% sy, 0.0% ni, 64.8% id, 0.0% wa, 0.0% hi, 9.0% si
Cpu4 : 25.9% us, 12.3% sy, 0.0% ni, 56.1% id, 0.0% wa, 0.0% hi, 5.6% si
Cpu5 : 1.0% us, 3.6% sy, 0.0% ni, 86.9% id, 0.0% wa, 0.0% hi, 8.5% si
Cpu6 : 0.3% us, 0.0% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu7 : 10.6% us, 7.9% sy, 0.0% ni, 81.5% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 16444164k total, 15097904k used, 1346260k free, 107516k buffers
Swap: 0k total, 0k used, 0k free, 5322840k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
337 root 20 0 11392 732 400 S 99.9 0.0 324224:12 keepalive.sh
14499 root 20 0 7564 1836 1476 S 99.9 0.0 369896:40 sshd
13624 root 20 0 1348 212 172 S 99.9 0.0 486338:00 sleep.out
32713 root 20 0 1908 956 696 R 0.3 0.0 74529:31 top
1 root 20 0 720 224 188 S 0.0 0.0 436089:26 init
2 root 20 0 0 0 0 S 0.0 0.0 369896:40 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 385563:56 ksoftirqd/0
6 root RT 0 0 0 0 S 0.0 0.0 303560:22 migration/0
7 root RT 0 0 0 0 R 0.0 0.0 300258:44 watchdog/0
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 300258:44 kworker/1:0
10 root 20 0 0 0 0 S 0.0 0.0 23965:30 ksoftirqd/1
12 root RT 0 0 0 0 R 0.0 0.0 600517:28 watchdog/1
13 root RT 0 0 0 0 S 0.0 0.0 303408:44 migration/2
15 root 20 0 0 0 0 S 0.0 0.0 700896:24 ksoftirqd/2
16 root RT 0 0 0 0 R 0.0 0.0 600517:28 watchdog/2
Defunct processes:
root 5381 32001 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 5383 32001 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 5385 32001 0 11:42 ? 00:00:00 [sh] <defunct>
root 5387 32001 0 11:42 ? 00:00:00 [sh] <defunct>
root 32162 31998 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32164 32000 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32166 32000 0 11:42 ? 00:00:00 [sh] <defunct>
root 32168 31999 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32170 31999 0 11:42 ? 00:00:00 [sh] <defunct>
root 32172 31999 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32174 32004 0 11:42 ? 00:00:00 [sh] <defunct>
root 32175 31997 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32177 32004 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32179 31997 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32181 32004 0 11:42 ? 00:00:00 [sh] <defunct>
root 32183 32004 99 11:42 ? 208-12:18:44 [sh] <defunct>
root 32185 31997 0 11:42 ? 00:00:00 [sh] <defunct>
Eight identical CPUs:
#cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
stepping : 7
cpu MHz : 2400.236
cache size : 10240 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.47
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
dmesg from an affected server (from the beginning, about half an hour; note the first column):
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.0.13-0.27-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b)
[ 0.000000] Command line: root=/dev/sda1 splash=0 crashkernel=256M-:128M@16M vga=0x31a
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 0000000000093400 (usable)
[ 0.000000] BIOS-e820: 0000000000093400 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000007e69b000 (usable)
[ 0.000000] BIOS-e820: 000000007e69b000 - 000000007e7a9000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007e7a9000 - 000000007f3a9000 (reserved)
[ 0.000000] BIOS-e820: 000000007f3a9000 - 000000007f423000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007f423000 - 000000007f4af000 (reserved)
[ 0.000000] BIOS-e820: 000000007f4af000 - 000000007f4b1000 (usable)
[ 0.000000] BIOS-e820: 000000007f4b1000 - 000000007f4b2000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007f4b2000 - 000000007f4bb000 (reserved)
[ 0.000000] BIOS-e820: 000000007f4bb000 - 000000007f4c2000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007f4c2000 - 000000007f4e4000 (reserved)
[ 0.000000] BIOS-e820: 000000007f4e4000 - 000000007f56a000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007f56a000 - 000000007f7e0000 (usable)
[ 0.000000] BIOS-e820: 000000007f7e0000 - 000000007f7e1000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007f7e1000 - 000000007f7e6000 (reserved)
[ 0.000000] BIOS-e820: 000000007f7e6000 - 000000007f800000 (usable)
[ 0.000000] BIOS-e820: 0000000080000000 - 0000000090000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fed1c000 - 00000000fed20000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
[ 0.000000] BIOS-e820: 0000000100000000 - 0000000480000000 (usable)
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.7 present.
[ 0.000000] DMI: empty empty/ S7057 , BIOS V1.01B 06/23/2014
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0x480000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-FFFFF write-protect
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 000000000000 mask 3FFC00000000 write-back
[ 0.000000] 1 base 000400000000 mask 3FFF80000000 write-back
[ 0.000000] 2 base 000080000000 mask 3FFF80000000 uncachable
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] 8 disabled
[ 0.000000] 9 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] e820 update range: 0000000080000000 - 0000000100000000 (usable) ==> (reserved)
[ 0.000000] last_pfn = 0x7f800 max_arch_pfn = 0x400000000
[ 0.000000] found SMP MP-table at [ffff8800000fd930] fd930
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Base memory trampoline at [ffff88000008e000] 8e000 size 20480
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] init_memory_mapping: 0000000000000000-000000007f800000
[ 0.000000] 0000000000 - 0040000000 page 1G
[ 0.000000] 0040000000 - 007f800000 page 2M
[ 0.000000] kernel direct mapping tables up to 7f800000 @ 1fffe000-20000000
[ 0.000000] init_memory_mapping: 0000000100000000-0000000480000000
[ 0.000000] 0100000000 - 0480000000 page 1G
[ 0.000000] kernel direct mapping tables up to 480000000 @ 7f7ff000-7f800000
[ 0.000000] RAMDISK: 37653000 - 37ff0000
[ 0.000000] crashkernel reservation failed - memory is in use.
[ 0.000000] ACPI: RSDP 00000000000f0490 00024 (v02 ALASKA)
[ 0.000000] ACPI: XSDT 000000007f3a9080 00084 (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: FACP 000000007f3b21a8 000F4 (v04 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: DSDT 000000007f3a9198 0900C (v02 ALASKA A M I 00000001 INTL 20051117)
[ 0.000000] ACPI: FACS 000000007f4c0f80 00040
[ 0.000000] ACPI: APIC 000000007f3b22a0 000AA (v03 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: MCFG 000000007f3b2350 0003C (v01 ALASKA OEMMCFG. 01072009 MSFT 00000097)
[ 0.000000] ACPI: SRAT 000000007f3b2390 00330 (v01 A M I AMI SRAT 00000001 AMI. 00000000)
[ 0.000000] ACPI: SLIT 000000007f3b26c0 00030 (v01 A M I AMI SLIT 00000000 AMI. 00000000)
[ 0.000000] ACPI: HPET 000000007f3b26f0 00038 (v01 ALASKA A M I 01072009 AMI. 00000004)
[ 0.000000] ACPI: SSDT 000000007f3b2728 70104 (v02 INTEL CpuPm 00004000 INTL 20051117)
[ 0.000000] ACPI: EINJ 000000007f422830 00130 (v01 AMI AMI EINJ 00000000 00000000)
[ 0.000000] ACPI: ERST 000000007f422960 00230 (v01 AMIER AMI ERST 00000000 00000000)
[ 0.000000] ACPI: HEST 000000007f422b90 000A8 (v01 AMI AMI HEST 00000000 00000000)
[ 0.000000] ACPI: BERT 000000007f422c38 00030 (v01 AMI AMI BERT 00000000 00000000)
[ 0.000000] ACPI: BGRT 000000007f422c68 00038 (v00 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x02 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x04 -> Node 0
[ 0.000000] SRAT: PXM 0 -> APIC 0x06 -> Node 0
[ 0.000000] SRAT: PXM 1 -> APIC 0x20 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x22 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x24 -> Node 1
[ 0.000000] SRAT: PXM 1 -> APIC 0x26 -> Node 1
[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 100000000-280000000
[ 0.000000] SRAT: Node 1 PXM 1 280000000-480000000
[ 0.000000] NUMA: Initialized distance table, cnt=2
[ 0.000000] NUMA: Node 0 [0,80000000) + [100000000,280000000) -> [0,280000000)
[ 0.000000] Initmem setup node 0 0000000000000000-0000000280000000
[ 0.000000] NODE_DATA [000000027ffd9000 - 000000027fffffff]
[ 0.000000] Initmem setup node 1 0000000280000000-0000000480000000
[ 0.000000] NODE_DATA [000000047ffd8080 - 000000047ffff07f]
[ 0.000000] [ffffea0000000000-ffffea0008bfffff] PMD -> [ffff880277e00000-ffff88027edfffff] on node 0
[ 0.000000] [ffffea0008c00000-ffffea000fbfffff] PMD -> [ffff880477600000-ffff88047e5fffff] on node 1
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000010 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00480000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[7] active PFN ranges
[ 0.000000] 0: 0x00000010 -> 0x00000093
[ 0.000000] 0: 0x00000100 -> 0x0007e69b
[ 0.000000] 0: 0x0007f4af -> 0x0007f4b1
[ 0.000000] 0: 0x0007f56a -> 0x0007f7e0
[ 0.000000] 0: 0x0007f7e6 -> 0x0007f800
[ 0.000000] 0: 0x00100000 -> 0x00280000
[ 0.000000] 1: 0x00280000 -> 0x00480000
[ 0.000000] On node 0 totalpages: 2091184
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 5 pages reserved
[ 0.000000] DMA zone: 3910 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 14280 pages used for memmap
[ 0.000000] DMA32 zone: 500069 pages, LIFO batch:31
[ 0.000000] Normal zone: 21504 pages used for memmap
[ 0.000000] Normal zone: 1551360 pages, LIFO batch:31
[ 0.000000] On node 1 totalpages: 2097152
[ 0.000000] Normal zone: 28672 pages used for memmap
[ 0.000000] Normal zone: 2068480 pages, LIFO batch:31
[ 0.000000] ACPI: PM-Timer IO Port: 0x408
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x20] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x22] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x24] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x26] enabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec01000] gsi_base[24])
[ 0.000000] IOAPIC[1]: apic_id 2, version 32, address 0xfec01000, GSI 24-47
[ 0.000000] ACPI: IOAPIC (id[0x03] address[0xfec40000] gsi_base[48])
[ 0.000000] IOAPIC[2]: apic_id 3, version 32, address 0xfec40000, GSI 48-71
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[ 0.000000] SMP: Allowing 8 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 88
[ 0.000000] PM: Registered nosave memory: 0000000000093000 - 0000000000094000
[ 0.000000] PM: Registered nosave memory: 0000000000094000 - 00000000000a0000
[ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
[ 0.000000] PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
[ 0.000000] PM: Registered nosave memory: 000000007e69b000 - 000000007e7a9000
[ 0.000000] PM: Registered nosave memory: 000000007e7a9000 - 000000007f3a9000
[ 0.000000] PM: Registered nosave memory: 000000007f3a9000 - 000000007f423000
[ 0.000000] PM: Registered nosave memory: 000000007f423000 - 000000007f4af000
[ 0.000000] PM: Registered nosave memory: 000000007f4b1000 - 000000007f4b2000
[ 0.000000] PM: Registered nosave memory: 000000007f4b2000 - 000000007f4bb000
[ 0.000000] PM: Registered nosave memory: 000000007f4bb000 - 000000007f4c2000
[ 0.000000] PM: Registered nosave memory: 000000007f4c2000 - 000000007f4e4000
[ 0.000000] PM: Registered nosave memory: 000000007f4e4000 - 000000007f56a000
[ 0.000000] PM: Registered nosave memory: 000000007f7e0000 - 000000007f7e1000
[ 0.000000] PM: Registered nosave memory: 000000007f7e1000 - 000000007f7e6000
[ 0.000000] PM: Registered nosave memory: 000000007f800000 - 0000000080000000
[ 0.000000] PM: Registered nosave memory: 0000000080000000 - 0000000090000000
[ 0.000000] PM: Registered nosave memory: 0000000090000000 - 00000000fed1c000
[ 0.000000] PM: Registered nosave memory: 00000000fed1c000 - 00000000fed20000
[ 0.000000] PM: Registered nosave memory: 00000000fed20000 - 00000000ff000000
[ 0.000000] PM: Registered nosave memory: 00000000ff000000 - 0000000100000000
[ 0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:2
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff88027fc00000 s74880 r8192 d23424 u524288
[ 0.000000] pcpu-alloc: s74880 r8192 d23424 u524288 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7
[ 0.000000] Built 2 zonelists in Zone order, mobility grouping on. Total pages: 4123819
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: root=/dev/sda1 splash=0 crashkernel=256M-:128M@16M vga=0x31a
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 16430212k/18874368k available (4405k kernel code, 2121024k absent, 323132k reserved, 7781k data, 1356k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:262400 nr_irqs:1560 16
[ 0.000000] Extended CMOS year: 2000
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] allocated 134217728 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] hpet clockevent registered
[ 0.000000] Fast TSC calibration using PIT
[ 0.004000] Detected 2400.236 MHz processor.
[18014398.509486] Calibrating delay loop (skipped), value calculated using timer frequency.. 4800.47 BogoMIPS (lpj=9600944)
[18014398.509491] pid_max: default: 32768 minimum: 301
[18014398.696649] kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
[18014398.696878] Security Framework initialized
[18014398.696895] AppArmor: AppArmor initialized
[18014398.698294] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[18014398.701889] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[18014398.703370] Mount-cache hash table entries: 256
[18014398.703558] Initializing cgroup subsys cpuacct
[18014398.703564] Initializing cgroup subsys memory
[18014398.703580] Initializing cgroup subsys devices
[18014398.703583] Initializing cgroup subsys freezer
[18014398.703585] Initializing cgroup subsys net_cls
[18014398.703588] Initializing cgroup subsys blkio
[18014398.703595] Initializing cgroup subsys perf_event
[18014398.703668] CPU: Physical Processor ID: 0
[18014398.703670] CPU: Processor Core ID: 0
[18014398.703677] mce: CPU supports 16 MCE banks
[18014398.703704] CPU0: Thermal monitoring enabled (TM1)
[18014398.703722] using mwait in idle threads.
[18014398.704892] ACPI: Core revision 20110413
[18014398.729594] x2apic not enabled, IRQ remapping init failed
[18014398.730201] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[18014398.769828] CPU0: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz stepping 07
[18014398.877280] Performance Events: PEBS fmt1+, SandyBridge events, Intel PMU driver.
[18014398.877288] ... version: 3
[18014398.877289] ... bit width: 48
[18014398.877291] ... generic registers: 8
[18014398.877293] ... value mask: 0000ffffffffffff
[18014398.877296] ... max period: 000000007fffffff
[18014398.877298] ... fixed-purpose events: 3
[18014398.877300] ... event mask: 00000007000000ff
[18014398.877514] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.877651] Booting Node 0, Processors #1
[18014398.877654] smpboot cpu 1: start_ip = 8e000
[18014398.909186] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.909344] #2
[18014398.909346] smpboot cpu 2: start_ip = 8e000
[18014398.940474] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.940625] #3
[18014398.940627] smpboot cpu 3: start_ip = 8e000
[18014398.971752] NMI watchdog enabled, takes one hw-pmu counter.
[18014398.971970] Ok.
[18014398.971972] Booting Node 1, Processors #4
[18014398.971975] smpboot cpu 4: start_ip = 8e000
[18014399.081093] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.081258] #5
[18014399.081259] smpboot cpu 5: start_ip = 8e000
[18014399.112362] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.112528] #6
[18014399.112530] smpboot cpu 6: start_ip = 8e000
[18014399.143633] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.143795] #7 Ok.
[18014399.143797] smpboot cpu 7: start_ip = 8e000
[18014399.174900] NMI watchdog enabled, takes one hw-pmu counter.
[18014399.174924] Brought up 8 CPUs
[18014399.174927] Total of 8 processors activated (38401.60 BogoMIPS).
[18014399.555854] devtmpfs: initialized
[18014399.559405] PM: Registering ACPI NVS region at 7e69b000 (1105920 bytes)
[18014399.559459] PM: Registering ACPI NVS region at 7f4b1000 (4096 bytes)
[18014399.559462] PM: Registering ACPI NVS region at 7f4bb000 (28672 bytes)
[18014399.559465] PM: Registering ACPI NVS region at 7f4e4000 (548864 bytes)
[18014399.559498] PM: Registering ACPI NVS region at 7f7e0000 (4096 bytes)
[18014399.559645] print_constraints: dummy:
[18014399.559675] Time: 10:55:14 Date: 10/28/15
[18014399.559772] NET: Registered protocol family 16
[18014399.559933] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[18014399.559938] ACPI: bus type pci registered
[18014399.560001] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[18014399.560006] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
[18014399.605298] PCI: Using configuration type 1 for base access
[18014399.606398] bio: create slab <bio-0> at 0
[18014399.612428] ACPI: EC: Look up EC in DSDT
[18014399.618314] ACPI: Executed 1 blocks of module-level executable AML code
[18014399.758896] ACPI: Interpreter enabled
[18014399.758903] ACPI: (supports S0 S1 S4 S5)
[18014399.758926] ACPI: Using IOAPIC for interrupt routing
[18014399.759318] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[18014399.837279] ACPI: No dock devices found.
[18014399.837285] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[18014399.837640] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7e])
[18014399.837989] pci_root PNP0A08:00: host bridge window [io 0x0000-0x03af]
[18014399.837993] pci_root PNP0A08:00: host bridge window [io 0x03e0-0x0cf7]
[18014399.837996] pci_root PNP0A08:00: host bridge window [io 0x03b0-0x03df]
[18014399.837998] pci_root PNP0A08:00: host bridge window [io 0x0d00-0x9fff]
[18014399.838001] pci_root PNP0A08:00: host bridge window [mem 0x000a0000-0x000bffff]
[18014399.838005] pci_root PNP0A08:00: host bridge window [mem 0x000c0000-0x000dffff]
[18014399.838008] pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xdfffffff]
[18014399.838026] pci 0000:00:00.0: [8086:3c00] type 0 class 0x000600
[18014399.838072] pci 0000:00:00.0: PME# supported from D0 D3hot D3cold
[18014399.838075] pci 0000:00:00.0: PME# disabled
[18014399.838099] pci 0000:00:01.0: [8086:3c02] type 1 class 0x000604
[18014399.838148] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[18014399.838151] pci 0000:00:01.0: PME# disabled
[18014399.838177] pci 0000:00:01.1: [8086:3c03] type 1 class 0x000604
[18014399.838225] pci 0000:00:01.1: PME# supported from D0 D3hot D3cold
A dmesg snippet from another, healthy server:
[ 0.000000] Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] setup_percpu: NR_CPUS:4096 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:2
[ 0.000000] PERCPU: Embedded 26 pages/cpu @ffff88027fc00000 s74880 r8192 d23424 u524288
[ 0.000000] pcpu-alloc: s74880 r8192 d23424 u524288 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7
[ 0.000000] Built 2 zonelists in Zone order, mobility grouping on. Total pages: 4123819
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: root=/dev/sda1 splash=0 crashkernel=256M-:128M@16M vga=0x31a
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] xsave/xrstor: enabled xstate_bv 0x7, cntxt size 0x340
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 16430212k/18874368k available (4405k kernel code, 2121024k absent, 323132k reserved, 7781k data, 1356k init)
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] NR_IRQS:262400 nr_irqs:1560 16
[ 0.000000] Extended CMOS year: 2000
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] allocated 134217728 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] hpet clockevent registered
[ 0.000000] Fast TSC calibration using PIT
[ 0.004000] Detected 2399.963 MHz processor.
[ 0.000004] Calibrating delay loop (skipped), value calculated using timer frequency.. 4799.92 BogoMIPS (lpj=9599852)
[ 0.000009] pid_max: default: 32768 minimum: 301
[ 0.000384] kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
[ 0.000616] Security Framework initialized
[ 0.000632] AppArmor: AppArmor initialized
[ 0.002018] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
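The difference between the two logs is in the first column: on the affected server the timestamp goes from 0.004000 straight to 18014398.509486, while on the healthy server it stays near zero. Below is a minimal sketch for checking the other servers for the same kind of jump. The function name `find_dmesg_jumps` and the default threshold of 3600 seconds are my own assumptions, not anything provided by dmesg itself.

```shell
#!/bin/sh
# Flag large forward jumps between consecutive dmesg timestamps.
# find_dmesg_jumps is a hypothetical helper; the threshold (in seconds)
# is an arbitrary cut-off and defaults to 3600.
find_dmesg_jumps() {
    awk -v threshold="${1:-3600}" '
        # Lines look like "[    0.004000] message" or "[18014398.509486] message".
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0   # text between the brackets, as a number
            if (seen && ts - prev > threshold)
                printf "jump of %.6f s before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}
```

It reads the log on standard input, for example `dmesg | find_dmesg_jumps 3600`. Lines without a leading timestamp are ignored, and backward steps (such as 0.004000 followed by 0.000004 on the healthy server) are not reported.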
In case it is relevant:
#cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
#
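To compare the clocksource setup between affected and healthy servers, something like the following sketch could be run on each machine. It assumes the standard sysfs files `current_clocksource` and `available_clocksource`; the helper name `show_clocksource` and its optional directory argument are hypothetical and only there to make it easy to point at another path.

```shell
#!/bin/sh
# Print the active clocksource and the ones the kernel offers.
# show_clocksource is a hypothetical helper; the optional first argument
# overrides the sysfs directory.
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    printf 'current:   %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}
```

Calling `show_clocksource` with no argument reads the live values from /sys.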
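My manager says that even a reboot does not fix the problem (high CPU load, slowness, and so on). Shutting the server down and powering it back on half an hour later may fix it.
This happens on only a few of the servers; the others look normal. The problem is also hard to reproduce.
Could this be a software problem or a hardware problem? What should I do to debug it? (I do not have the kernel source code.)
Let me know if you need more information!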