Linux kernel messages in logs - disk writes appear to hang on the database

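I am having a database problem where commits never complete (PostgreSQL), and it appears to be related to these messages, which are filling up my server logs: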

Jan 30 02:29:45 server001 kernel: [3521062.240540] postgres      D 0000000000000000     0 23220   8154 0x00000004
Jan 30 02:29:45 server001 kernel: [3521062.240550]  ffff8800174c9d08 0000000000000082 ffff88041cd24728 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240559]  ffff8806c678b110 0000000000015880 0000000000015880 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240567]  0000000000015880 ffff8806c678b110 0000000000015880 0000000000015880
Jan 30 02:29:45 server001 kernel: [3521062.240575] Call Trace:
Jan 30 02:29:45 server001 kernel: [3521062.240582]  [<ffffffff810da010>] ? sync_page+0x0/0x50
Jan 30 02:29:45 server001 kernel: [3521062.240590]  [<ffffffff81528488>] io_schedule+0x28/0x40
Jan 30 02:29:45 server001 kernel: [3521062.240596]  [<ffffffff810da04d>] sync_page+0x3d/0x50
Jan 30 02:29:45 server001 kernel: [3521062.240603]  [<ffffffff815289a7>] __wait_on_bit+0x57/0x80
Jan 30 02:29:45 server001 kernel: [3521062.240610]  [<ffffffff810da1be>] wait_on_page_bit+0x6e/0x80
Jan 30 02:29:45 server001 kernel: [3521062.240618]  [<ffffffff81078540>] ? wake_bit_function+0x0/0x40
Jan 30 02:29:45 server001 kernel: [3521062.240627]  [<ffffffff810e4480>] ? pagevec_lookup_tag+0x20/0x30
Jan 30 02:29:45 server001 kernel: [3521062.240634]  [<ffffffff810da665>] wait_on_page_writeback_range+0xf5/0x190
Jan 30 02:29:45 server001 kernel: [3521062.240644]  [<ffffffff81053668>] ? try_to_wake_up+0x118/0x340
Jan 30 02:29:45 server001 kernel: [3521062.240651]  [<ffffffff810da727>] filemap_fdatawait+0x27/0x30
Jan 30 02:29:45 server001 kernel: [3521062.240659]  [<ffffffff811431b4>] vfs_fsync+0xa4/0xf0
Jan 30 02:29:45 server001 kernel: [3521062.240667]  [<ffffffff81143239>] do_fsync+0x39/0x60
Jan 30 02:29:45 server001 kernel: [3521062.240674]  [<ffffffff8114328b>] sys_fsync+0xb/0x10
Jan 30 02:29:45 server001 kernel: [3521062.240682]  [<ffffffff81012042>] system_call_fastpath+0x16/0x1b

Edit:

I found further information in the system log:

Jan 30 02:21:45 server001 kernel: [3520582.242828] INFO: task postgres:8750 blocked for more than 120 seconds.
Jan 30 02:21:45 server001 kernel: [3520582.242873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 30 02:21:45 server001 kernel: [3520582.242933] postgres      D 00000000ffffffff     0  8750   8154 0x00000004
Jan 30 02:21:45 server001 kernel: [3520582.242946]  ffff880415cd7d08 0000000000000082 ffff880415cd7c88 0000000000015880
Jan 30 02:21:45 server001 kernel: [3520582.242957]  ffff88040dc79a60 0000000000015880 0000000000015880 0000000000015880
Jan 30 02:21:45 server001 kernel: [3520582.242965]  0000000000015880 ffff88040dc79a60 0000000000015880 0000000000015880
Jan 30 02:21:45 server001 kernel: [3520582.242974] Call Trace:
Jan 30 02:21:45 server001 kernel: [3520582.242994]  [<ffffffff810da010>] ? sync_page+0x0/0x50
Jan 30 02:21:45 server001 kernel: [3520582.243006]  [<ffffffff81528488>] io_schedule+0x28/0x40
Jan 30 02:21:45 server001 kernel: [3520582.243014]  [<ffffffff810da04d>] sync_page+0x3d/0x50
Jan 30 02:21:45 server001 kernel: [3520582.243022]  [<ffffffff815289a7>] __wait_on_bit+0x57/0x80
Jan 30 02:21:45 server001 kernel: [3520582.243029]  [<ffffffff810da1be>] wait_on_page_bit+0x6e/0x80
Jan 30 02:21:45 server001 kernel: [3520582.243039]  [<ffffffff81078540>] ? wake_bit_function+0x0/0x40
Jan 30 02:21:45 server001 kernel: [3520582.243049]  [<ffffffff810e4480>] ? pagevec_lookup_tag+0x20/0x30
Jan 30 02:21:45 server001 kernel: [3520582.243057]  [<ffffffff810da665>] wait_on_page_writeback_range+0xf5/0x190
Jan 30 02:21:45 server001 kernel: [3520582.243070]  [<ffffffff811f4e9c>] ? jbd2_log_start_commit+0x3c/0x50
Jan 30 02:21:45 server001 kernel: [3520582.243078]  [<ffffffff810da727>] filemap_fdatawait+0x27/0x30
Jan 30 02:21:45 server001 kernel: [3520582.243088]  [<ffffffff811431b4>] vfs_fsync+0xa4/0xf0
Jan 30 02:21:45 server001 kernel: [3520582.243096]  [<ffffffff81143239>] do_fsync+0x39/0x60
Jan 30 02:21:45 server001 kernel: [3520582.243103]  [<ffffffff8114328b>] sys_fsync+0xb/0x10
Jan 30 02:21:45 server001 kernel: [3520582.243114]  [<ffffffff81012042>] system_call_fastpath+0x16/0x1b
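
Both traces end in sys_fsync, so one way to check whether the volume is still slow to flush is to time a single fsync on a scratch file. The following is only a minimal sketch: the function name `time_fsync` and the choice of directory are assumptions for illustration, and on a volume that is actually hung the call will itself block in the same way as the postgres processes.

```python
import os
import tempfile
import time

def time_fsync(directory, size=8192):
    """Write `size` bytes to a scratch file in `directory` and time one fsync.

    Hypothetical helper for illustration; returns the elapsed time in seconds.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"\0" * size)
        start = time.monotonic()
        os.fsync(fd)  # same system call the blocked postgres processes are waiting in
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    # Assumption: pass a directory on the same filesystem as the database files.
    print("fsync took %.3f s" % time_fsync(tempfile.gettempdir()))
```

Running it against a directory on a different filesystem gives a baseline to compare against.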

Edit 2:

I realized that the log contains nothing apart from the entries around 2 a.m. Since then, however, my database server has remained hung.

--

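Running Ubuntu Linux 9.10, kernel 2.6.31-20-server.

This has happened before, but that time the entire server hung. This time, the server as a whole is otherwise working.

I am trying to find out two things:

1) What can I do to un-hang the threads that are currently waiting on this operation?
2) What causes this?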
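
To see which processes are currently stuck, one option is to list everything in uninterruptible sleep (state `D`, the same letter shown after `postgres` in the traces above). This is a minimal sketch assuming a Linux /proc filesystem; the function names `parse_stat` and `blocked_processes` are made up for illustration.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state) parsed from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' in the line.
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state

def blocked_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable entry
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

This only identifies the blocked processes; it does not release them.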
