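Some mornings, usually between 6:30 and 8:30, my virtual machine locks up, and it even causes collateral damage to the VMware Server host itself. When this happens, I cannot SSH into either the VM or the host.
I believe I have narrowed it down to the mlocate job in cron.daily. But of course there shouldn't be anything wrong with that cron job, so I have a bigger problem on my hands that I can't identify. For what it's worth, this machine has a very limited amount of RAM, only 384MB. That may be unrealistic, but it exceeds the Debian requirements, and I know the system isn't doing much else when the problem occurs.
Here is some of what I'm getting in the messages log: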
Jul 18 08:30:02 core kernel: [607607.955528] updatedb.mloc D ddadc12f 0 3274 3270
Jul 18 08:30:02 core kernel: [607607.955615] d746ece0 00000082 0011caef ddadc12f 000221d2 d746ee6c c1309fc0 00000000
Jul 18 08:30:02 core kernel: [607607.955692] d60c3b4c 01142a38 07a53f31 00000000 01142a38 d60c3b4c 01142a38 c6ae3d3c
Jul 18 08:30:02 core kernel: [607607.955709] c1309fc0 00f4f000 c6ae3d3c c1300e28 c02b9048 c6ae3d34 00000000 c0190d2e
Jul 18 08:30:02 core kernel: [607607.955723] Call Trace:
Jul 18 08:30:02 core kernel: [607607.956038] [<c02b9048>] io_schedule+0x49/0x80
Jul 18 08:30:02 core kernel: [607607.956472] [<c0190d2e>] sync_buffer+0x30/0x33
Jul 18 08:30:02 core kernel: [607607.956511] [<c02b9236>] __wait_on_bit+0x33/0x58
Jul 18 08:30:02 core kernel: [607607.956515] [<c0190cfe>] sync_buffer+0x0/0x33
Jul 18 08:30:02 core kernel: [607607.956524] [<c0190cfe>] sync_buffer+0x0/0x33
Jul 18 08:30:02 core kernel: [607607.956527] [<c02b92ba>] out_of_line_wait_on_bit+0x5f/0x67
Jul 18 08:30:02 core kernel: [607607.956533] [<c0131a91>] wake_bit_function+0x0/0x3c
Jul 18 08:30:02 core kernel: [607607.956583] [<c0190cca>] __wait_on_buffer+0x16/0x18
Jul 18 08:30:02 core kernel: [607607.956593] [<d89b153d>] ext3_find_entry+0x37a/0x515 [ext3]
Jul 18 08:30:02 core kernel: [607607.957163] [<c01bae24>] security_inode_alloc+0x16/0x17
Jul 18 08:30:02 core kernel: [607607.957192] [<c0184900>] alloc_inode+0x12e/0x186
Jul 18 08:30:02 core kernel: [607607.957210] [<c0184ce9>] iget_locked+0x5b/0x100
Jul 18 08:30:02 core kernel: [607607.957217] [<d89b2bea>] ext3_lookup+0x21/0x9b [ext3]
Jul 18 08:30:02 core kernel: [607607.957228] [<c017aac3>] do_lookup+0xb6/0x153
Jul 18 08:30:13 core kernel: [607607.957233] [<c017c6c4>] __link_path_walk+0x726/0xb26
Jul 18 08:30:13 core kernel: [607607.957239] [<c0186f4c>] mntput_no_expire+0x13/0xd9
Jul 18 08:30:13 core kernel: [607607.957243] [<c017cafb>] path_walk+0x37/0x70
Jul 18 08:30:13 core kernel: [607607.957247] [<c017cdaa>] do_path_lookup+0x122/0x184
Jul 18 08:30:13 core kernel: [607607.957251] [<c017d607>] __user_walk_fd+0x29/0x3a
Jul 18 08:30:13 core kernel: [607607.957255] [<c0177625>] vfs_lstat_fd+0x12/0x39
Jul 18 08:30:13 core kernel: [607607.957276] [<c01776b9>] sys_lstat64+0xf/0x23
Jul 18 08:30:13 core kernel: [607607.957283] [<c0103857>] sysenter_past_esp+0x78/0xb1
Jul 18 08:30:13 core kernel: [607607.957344] =======================
And more recently:
Jun 30 07:44:11 core kernel: [2065298.377450] ionice D 299741d5 0 32588 32441
Jun 30 07:44:11 core kernel: [2065298.377515] ce11a5e0 00000086 02a1416f 299741d5 000755a5 ce11a76c c1209fc0 00000000
Jun 30 07:44:11 core kernel: [2065298.377578] c38d5f6c 058eebe6 003d2086 00000000 058eebe6 c38d5f6c 058eebe6 c3b9fd08
Jun 30 07:44:11 core kernel: [2065298.377598] c1209fc0 00e4f000 c3b9fd08 c12001cc c02b9048 c3b9fd00 00000000 c0190d2e
Jun 30 07:44:11 core kernel: [2065298.377612] Call Trace:
Jun 30 07:44:11 core kernel: [2065298.378275] [<c02b9048>] io_schedule+0x49/0x80
Jun 30 07:44:11 core kernel: [2065298.379280] [<c0190d2e>] sync_buffer+0x30/0x33
Jun 30 07:44:11 core kernel: [2065298.379325] [<c02b9236>] __wait_on_bit+0x33/0x58
Jun 30 07:44:11 core kernel: [2065298.379331] [<c0190cfe>] sync_buffer+0x0/0x33
Jun 30 07:44:11 core kernel: [2065298.379338] [<c0190cfe>] sync_buffer+0x0/0x33
Jun 30 07:44:11 core kernel: [2065298.379342] [<c02b92ba>] out_of_line_wait_on_bit+0x5f/0x67
Jun 30 07:44:11 core kernel: [2065298.379348] [<c0131a91>] wake_bit_function+0x0/0x3c
Jun 30 07:44:11 core kernel: [2065298.379399] [<c0190cca>] __wait_on_buffer+0x16/0x18
Jun 30 07:44:12 core kernel: [2065298.379415] [<d09af08d>] ext3_bread+0x44/0x5b [ext3]
Jun 30 07:44:12 core kernel: [2065298.379680] [<d09b0f50>] dx_probe+0x3a/0x2ad [ext3]
Jun 30 07:44:12 core kernel: [2065298.379692] [<c01e046c>] rb_insert_color+0x4c/0xad
Jun 30 07:44:12 core kernel: [2065298.379741] [<d09b1280>] ext3_find_entry+0xbd/0x515 [ext3]
Jun 30 07:44:12 core kernel: [2065298.379753] [<c01344ec>] hrtimer_start+0xf7/0x110
Jun 30 07:44:12 core kernel: [2065298.379760] [<c01361e0>] getnstimeofday+0x37/0xbc
Jun 30 07:44:12 core kernel: [2065298.379765] [<c0134658>] ktime_get_ts+0x22/0x49
Jun 30 07:44:12 core kernel: [2065298.379769] [<c0155174>] delayacct_end+0x70/0x77
Jun 30 07:44:12 core kernel: [2065298.379788] [<c0156aee>] sync_page+0x0/0x36
Jun 30 07:44:12 core kernel: [2065298.379803] [<c0155249>] __delayacct_blkio_end+0x56/0x59
Jun 30 07:44:12 core kernel: [2065298.379810] [<c02b9063>] io_schedule+0x64/0x80
Jun 30 07:44:12 core kernel: [2065298.379816] [<d09b2bea>] ext3_lookup+0x21/0x9b [ext3]
Jun 30 07:44:12 core kernel: [2065298.379827] [<c017aac3>] do_lookup+0xb6/0x153
Jun 30 07:44:12 core kernel: [2065298.379847] [<c017c6c4>] __link_path_walk+0x726/0xb26
Jun 30 07:44:12 core kernel: [2065298.379852] [<c0131a49>] __wake_up_bit+0x29/0x2e
Jun 30 07:44:12 core kernel: [2065298.379857] [<c01621a6>] __do_fault+0x30e/0x34d
Jun 30 07:44:12 core kernel: [2065298.379863] [<c017cafb>] path_walk+0x37/0x70
Jun 30 07:44:12 core kernel: [2065298.379867] [<c017cdaa>] do_path_lookup+0x122/0x184
Jun 30 07:44:12 core kernel: [2065298.379872] [<c017d78c>] __path_lookup_intent_open+0x42/0x72
Jun 30 07:44:12 core kernel: [2065298.379878] [<c017d80b>] path_lookup_open+0xf/0x13
Jun 30 07:44:12 core kernel: [2065298.379882] [<c0177c98>] open_exec+0x1d/0x94
Jun 30 07:44:12 core kernel: [2065298.379900] [<c0164be3>] free_pgtables+0x86/0x93
Jun 30 07:44:12 core kernel: [2065298.379906] [<c0182b46>] dput+0x25/0xbb
Jun 30 07:44:12 core kernel: [2065298.379912] [<c0178d13>] do_execve+0x48/0x1c6
Jun 30 07:44:12 core kernel: [2065298.379917] [<c010213b>] sys_execve+0x2a/0x4a
Jun 30 07:44:12 core kernel: [2065298.379944] [<c0103857>] sysenter_past_esp+0x78/0xb1
Jun 30 07:44:12 core kernel: [2065298.379984] =======================
I should point out that ionice is actually used by the mlocate cron job.
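To compare the two traces side by side, it can help to reduce each one to its task header and the function names in its call trace. The following is a minimal Python sketch, assuming the log lines look exactly like the ones above. The function name `summarize_blocked_tasks` and both regular expressions are my own illustration, not part of any existing tool. The `D` in the header column is the task state for uninterruptible sleep, which usually means the task is waiting on I/O; that fits the `io_schedule` frame at the top of both traces.

```python
import re

# Header line, e.g. "... kernel: [607607.955528] updatedb.mloc D ddadc12f 0 3274 3270"
# Assumed columns: task name, state, program counter, free stack, PID, parent PID.
HEADER_RE = re.compile(
    r"kernel: \[\s*[\d.]+\]\s+(?P<task>\S+)\s+(?P<state>[A-Z])\s+"
    r"[0-9a-f]{8}\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s*$"
)
# Frame line, e.g. "... [<c02b9048>] io_schedule+0x49/0x80"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<func>[A-Za-z_]\w*)\+0x")


def summarize_blocked_tasks(lines):
    """Return one dict per task header, with the function names of its trace."""
    tasks = []
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            tasks.append({
                "task": header.group("task"),
                "state": header.group("state"),
                "pid": int(header.group("pid")),
                "ppid": int(header.group("ppid")),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and tasks:
            # Attach the frame to the most recently seen task header.
            tasks[-1]["frames"].append(frame.group("func"))
    return tasks
```

Feeding it the contents of the messages log (for example `summarize_blocked_tasks(open("/var/log/messages"))`, if that is where the kernel messages end up on this system) would give one entry per blocked task, which makes it easier to see whether every hang ends in the same ext3 lookup path.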
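Edit: The problem seems to be sporadic. It brings the machine down completely perhaps once a week, but it also seems to get worse as uptime increases. I really don't want to blame the cron job, because I run Debian lenny on nearly all of the servers I install and support, and there is nothing unusual here. Could it be a memory leak? I say it "gets worse" with uptime because I run Nagios on the VMware host, and typically after 4-6 days I start getting one-minute load warnings in the morning, then two-minute load warnings the following day. I've been trying to log in remotely while it is happening, but I just can't connect to the guest VM at that point to see what else is going on.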
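Since logging in during the hang has not been possible, one option is to have the guest record its own state ahead of time. Below is a rough Python sketch, not a tested monitoring tool: it samples the load average and lists tasks in state `D` by reading `/proc/<pid>/stat`. The function names and the log path in the usage note are hypothetical.

```python
import os
import time


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat record into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def blocked_tasks(proc_dir="/proc"):
    """List (pid, command) for tasks currently in state D."""
    found = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the record was unreadable
        if state == "D":
            found.append((pid, command))
    return found


def snapshot():
    """Format one log line: timestamp, 1/5/15-minute load, blocked tasks."""
    load1, load5, load15 = os.getloadavg()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s load=%.2f/%.2f/%.2f blocked=%r" % (
        stamp, load1, load5, load15, blocked_tasks())


def watch(log_path, interval_seconds=30):
    """Append a snapshot to log_path every interval_seconds, forever."""
    with open(log_path, "a") as log:
        while True:
            log.write(snapshot() + "\n")
            log.flush()
            os.fsync(log.fileno())  # push each sample to disk before sleeping
            time.sleep(interval_seconds)
```

Something like `watch("/var/tmp/load-watch.log")` could be started before the morning window. One caveat: if the hang is a stall in disk I/O, the `fsync` call may block as well, so the last lines written before the freeze are probably the most useful part of the log.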
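Answer 1
Perhaps mlocate is the symptom rather than the cause. Are there any other cron jobs on the server? Try removing them (provided nothing other than mlocate is really required) and see whether it happens again. Are there any filesystems mounted on the server?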
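Both questions can be answered from the machine itself. Here is a small Python sketch, assuming a Linux system with `/proc/mounts` and the usual Debian cron directories; the function names are made up for this example, and per-user crontabs are not covered.

```python
import os

# System-wide cron directories on a standard Debian install.
CRON_DIRS = [
    "/etc/cron.d",
    "/etc/cron.hourly",
    "/etc/cron.daily",
    "/etc/cron.weekly",
    "/etc/cron.monthly",
]


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def list_cron_jobs(directories):
    """Return paths of regular files in each directory, in name order."""
    jobs = []
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # skip directories that do not exist on this system
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                jobs.append(path)
    return jobs


def report():
    """Print the installed cron jobs and the currently mounted filesystems."""
    print("Cron jobs:")
    for path in list_cron_jobs(CRON_DIRS):
        print("  " + path)
    print("Mounted filesystems:")
    with open("/proc/mounts") as handle:
        for device, mount_point, fs_type in parse_mounts(handle.read()):
            print("  %s on %s type %s" % (device, mount_point, fs_type))
```

Calling `report()` on the guest shows what else runs from cron alongside mlocate and which filesystems updatedb could end up walking.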