尝试安装软件包或升级 CentOS 8 系统会导致 e2-medium 实例无响应。这种情况在多个实例上反复发生。长时间监控时没有出现峰值。CPU 负载短暂飙升,但不超过 50%,磁盘 I/O 也是如此,读取速度约为 7 MB/s,读取 IOPS 约为 150(仍低于此实例使用的 30 GB SSD 持久磁盘的估计性能限制)。内存利用率稳定在 ~25%。将实例类型更改为 n1-standard-1 会有所帮助,然后安装/升级会快速进行而不会出现错误。
控制台输出中可以看到以下内容。这可能是虚拟机管理程序的错误吗?e2-medium 实例肯定足够强大,可以安装操作系统包。CentOS 8.3.2011,完全最新。
Jan 21 08:30:41 instancename systemd[1]: Starting dnf makecache...
Jan 21 08:30:55 instancename filebeat[1120]: 2021-01-21T08:30:55.871+0100#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":131072}}}},"cpu":{"system":{"ticks":90,"time":{"ms":2}},"total":{"ticks":230,"time":{"ms":5},"value":230},"user":{"ticks":140,"time":{"ms":3}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"a784951e-88c1-4222-a1ab-08363e8cf29c","uptime":{"ms":600277}},"memstats":{"gc_next":7335536,"memory_alloc":4873592,"memory_total":20972168},"runtime":{"goroutines":19}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":2}},"system":{"load":{"1":5.45,"15":1.05,"5":2.54,"norm":{"1":2.725,"15":0.525,"5":1.27}}}}}}
[ 615.060049] INFO: task kworker/u4:0:8 blocked for more than 120 seconds.
[ 615.067009] Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[ 615.072913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 615.081381] kworker/u4:0 D 0 8 2 0x80004000
[ 615.087209] Workqueue: writeback wb_workfn (flush-8:0)
[ 615.092521] Call Trace:
[ 615.095112] __schedule+0x2a6/0x700
[ 615.098757] ? wbt_exit+0x30/0x30
[ 615.102213] ? __wbt_done+0x30/0x30
[ 615.105939] schedule+0x38/0xa0
[ 615.109216] io_schedule+0x12/0x40
[ 615.112936] rq_qos_wait+0xfa/0x170
[ 615.116715] ? karma_partition+0x210/0x210
[ 615.120963] ? wbt_exit+0x30/0x30
[ 615.124428] wbt_wait+0x99/0xe0
[ 615.127725] __rq_qos_throttle+0x23/0x30
[ 615.131873] blk_mq_make_request+0x125/0x5d0
[ 615.136386] generic_make_request+0xcf/0x310
[ 615.140928] submit_bio+0x3c/0x160
[ 615.144530] ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
[ 615.149505] iomap_submit_ioend.isra.38+0x4a/0x70
[ 615.154394] xfs_vm_writepages+0x64/0x90 [xfs]
[ 615.159159] do_writepages+0x41/0xd0
[ 615.162889] __writeback_single_inode+0x3d/0x360
[ 615.167730] writeback_sb_inodes+0x1e3/0x450
[ 615.172227] __writeback_inodes_wb+0x5d/0xb0
[ 615.177091] wb_writeback+0x25f/0x2f0
[ 615.181207] ? cpumask_next+0x17/0x20
[ 615.185015] wb_workfn+0x333/0x3f0
[ 615.188558] ? __switch_to_asm+0x35/0x70
[ 615.192712] process_one_work+0x1a7/0x360
[ 615.196889] worker_thread+0x30/0x390
[ 615.200814] ? create_worker+0x1a0/0x1a0
[ 615.204880] kthread+0x112/0x130
[ 615.208248] ? kthread_flush_work_fn+0x10/0x10
[ 615.212833] ret_from_fork+0x35/0x40
[ 615.216733] INFO: task auditd:679 blocked for more than 120 seconds.
[ 615.223229] Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[ 615.229020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 615.237115] auditd D 0 679 1 0x00000320
[ 615.243158] Call Trace:
[ 615.246033] __schedule+0x2a6/0x700
[ 615.249772] ? mempool_alloc+0x67/0x190
[ 615.254019] ? wbt_exit+0x30/0x30
[ 615.257534] ? __wbt_done+0x30/0x30
[ 615.261471] schedule+0x38/0xa0
[ 615.265307] io_schedule+0x12/0x40
[ 615.268972] rq_qos_wait+0xfa/0x170
[ 615.272856] ? karma_partition+0x210/0x210
[ 615.277098] ? wbt_exit+0x30/0x30
[ 615.280684] wbt_wait+0x99/0xe0
[ 615.283989] __rq_qos_throttle+0x23/0x30
[ 615.288152] blk_mq_make_request+0x125/0x5d0
[ 615.292609] generic_make_request+0xcf/0x310
[ 615.297120] submit_bio+0x3c/0x160
[ 615.300809] ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
[ 615.305583] iomap_submit_ioend.isra.38+0x4a/0x70
[ 615.310626] xfs_vm_writepages+0x64/0x90 [xfs]
[ 615.315321] do_writepages+0x41/0xd0
[ 615.319228] __filemap_fdatawrite_range+0xbe/0xf0
[ 615.324076] file_write_and_wait_range+0x4c/0xa0
Jan 21 08:30:56 [ 615.328888] xfs_file_fsync+0x5d/0x1e0 [xfs]
[ 615.334687] do_fsync+0x38/0x70
instancename ker[ 615.338159] __x64_sys_fsync+0x10/0x20
[ 615.343417] do_syscall_64+0x5b/0x1a0
nel: INFO: task [ 615.347246] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 615.353984] RIP: 0033:0x7f4555a00e57
kworker/u4:0:8 b[ 615.357890] Code: Bad RIP value.
[ 615.362766] RSP: 002b:00007f4553378d00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
locked for more [ 615.370709] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f4555a00e57
[ 615.379456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
[ 615.386755] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f4553379700
[ 615.394155] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fffcdb9149e
[ 615.401812] R13: 00007fffcdb9149f R14: 0000000000000000 R15: 00007f4553378e80
than 120 seconds[ 615.409225] INFO: task kworker/1:3:1731 blocked for more than 120 seconds.
[ 615.417605] Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[ 615.423406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 615.431453] kworker/1:3 D 0 1731 2 0x80004080
.
Jan 21 08:30:[ 615.437185] Workqueue: xfs-sync/sda1 xfs_log_worker [xfs]
[ 615.444063] Call Trace:
[ 615.446658] __schedule+0x2a6/0x700
56 instancename [ 615.450278] ? wbt_exit+0x30/0x30
[ 615.455116] ? __wbt_done+0x30/0x30
[ 615.458836] schedule+0x38/0xa0
kernel: Not[ 615.462118] io_schedule+0x12/0x40
[ 615.467115] rq_qos_wait+0xfa/0x170
[ 615.470771] ? karma_partition+0x210/0x210
tainted 4.18.0-[ 615.474997] ? wbt_exit+0x30/0x30
[ 615.479902] wbt_wait+0x99/0xe0
[ 615.483175] __rq_qos_throttle+0x23/0x30
[ 615.487231] blk_mq_make_request+0x125/0x5d0
[ 615.491647] generic_make_request+0xcf/0x310
240.1.1.el8_3.x8[ 615.496049] ? bio_add_page+0x42/0x50
[ 615.501214] submit_bio+0x3c/0x160
6_64 #1
Jan 21 [ 615.504816] xlog_state_release_iclog+0x5f/0xa0 [xfs]
08:30:56 instanc[ 615.511398] xfs_log_force+0x23c/0x2e0 [xfs]
[ 615.517508] ? __switch_to_asm+0x35/0x70
ename kernel: "e[ 615.521564] ? __switch_to_asm+0x41/0x70
[ 615.527005] ? finish_task_switch+0x77/0x2a0
cho 0 > /proc/sy[ 615.531547] xfs_log_worker+0x2f/0xf0 [xfs]
[ 615.537289] process_one_work+0x1a7/0x360
[ 615.541569] worker_thread+0x30/0x390
s/kernel/hung_ta[ 615.545779] ? create_worker+0x1a0/0x1a0
sk_timeout_secs"[ 615.551370] kthread+0x112/0x130
[ 615.556120] ? kthread_flush_work_fn+0x10/0x10
disables this m[ 615.560762] ret_from_fork+0x35/0x40
essage.
Jan 21 08:30:56 instancename kernel: kworker/u4:0 D 0 8 2 0x80004000
Jan 21 08:30:56 instancename kernel: Workqueue: writeback wb_workfn (flush-8:0)
Jan 21 08:30:56 instancename kernel: Call Trace:
Jan 21 08:30:56 instancename kernel: __schedule+0x2a6/0x700
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: ? __wbt_done+0x30/0x30
Jan 21 08:30:56 instancename kernel: schedule+0x38/0xa0
Jan 21 08:30:56 instancename kernel: io_schedule+0x12/0x40
Jan 21 08:30:56 instancename kernel: rq_qos_wait+0xfa/0x170
Jan 21 08:30:56 instancename kernel: ? karma_partition+0x210/0x210
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: wbt_wait+0x99/0xe0
Jan 21 08:30:56 instancename kernel: __rq_qos_throttle+0x23/0x30
Jan 21 08:30:56 instancename kernel: blk_mq_make_request+0x125/0x5d0
Jan 21 08:30:56 instancename kernel: generic_make_request+0xcf/0x310
Jan 21 08:30:56 instancename kernel: submit_bio+0x3c/0x160
Jan 21 08:30:56 instancename kernel: ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
Jan 21 08:30:56 instancename kernel: iomap_submit_ioend.isra.38+0x4a/0x70
Jan 21 08:30:56 instancename kernel: xfs_vm_writepages+0x64/0x90 [xfs]
Jan 21 08:30:56 instancename kernel: do_writepages+0x41/0xd0
Jan 21 08:30:56 instancename kernel: __writeback_single_inode+0x3d/0x360
Jan 21 08:30:56 instancename kernel: writeback_sb_inodes+0x1e3/0x450
Jan 21 08:30:56 instancename kernel: __writeback_inodes_wb+0x5d/0xb0
Jan 21 08:30:56 instancename kernel: wb_writeback+0x25f/0x2f0
Jan 21 08:30:56 instancename kernel: ? cpumask_next+0x17/0x20
Jan 21 08:30:56 instancename kernel: wb_workfn+0x333/0x3f0
Jan 21 08:30:56 instancename kernel: ? __switch_to_asm+0x35/0x70
Jan 21 08:30:56 instancename kernel: process_one_work+0x1a7/0x360
Jan 21 08:30:56 instancename kernel: worker_thread+0x30/0x390
Jan 21 08:30:56 instancename kernel: ? create_worker+0x1a0/0x1a0
Jan 21 08:30:56 instancename kernel: kthread+0x112/0x130
Jan 21 08:30:56 instancename kernel: ? kthread_flush_work_fn+0x10/0x10
Jan 21 08:30:56 instancename kernel: ret_from_fork+0x35/0x40
Jan 21 08:30:56 instancename kernel: INFO: task auditd:679 blocked for more than 120 seconds.
Jan 21 08:30:56 instancename kernel: Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
Jan 21 08:30:56 instancename kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 21 08:30:56 instancename kernel: auditd D 0 679 1 0x00000320
Jan 21 08:30:56 instancename kernel: Call Trace:
Jan 21 08:30:56 instancename kernel: __schedule+0x2a6/0x700
Jan 21 08:30:56 instancename kernel: ? mempool_alloc+0x67/0x190
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: ? __wbt_done+0x30/0x30
Jan 21 08:30:56 instancename kernel: schedule+0x38/0xa0
Jan 21 08:30:56 instancename kernel: io_schedule+0x12/0x40
Jan 21 08:30:56 instancename kernel: rq_qos_wait+0xfa/0x170
Jan 21 08:30:56 instancename kernel: ? karma_partition+0x210/0x210
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: wbt_wait+0x99/0xe0
Jan 21 08:30:56 instancename kernel: __rq_qos_throttle+0x23/0x30
Jan 21 08:30:56 instancename kernel: blk_mq_make_request+0x125/0x5d0
Jan 21 08:30:56 instancename kernel: generic_make_request+0xcf/0x310
Jan 21 08:30:56 instancename kernel: submit_bio+0x3c/0x160
Jan 21 08:30:56 instancename kernel: ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
Jan 21 08:30:56 instancename kernel: iomap_submit_ioend.isra.38+0x4a/0x70
Jan 21 08:30:56 instancename kernel: xfs_vm_writepages+0x64/0x90 [xfs]
Jan 21 08:30:56 instancename kernel: do_writepages+0x41/0xd0
Jan 21 08:30:56 instancename kernel: __filemap_fdatawrite_range+0xbe/0xf0
Jan 21 08:30:56 instancename kernel: file_write_and_wait_range+0x4c/0xa0
Jan 21 08:30:56 instancename kernel: xfs_file_fsync+0x5d/0x1e0 [xfs]
Jan 21 08:30:56 instancename kernel: do_fsync+0x38/0x70
Jan 21 08:30:56 instancename kernel: __x64_sys_fsync+0x10/0x20
Jan 21 08:30:56 instancename kernel: do_syscall_64+0x5b/0x1a0
Jan 21 08:30:56 instancename kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Jan 21 08:30:57 instancename kernel: RIP: 0033:0x7f4555a00e57
Jan 21 08:30:57 instancename kernel: Code: Bad RIP value.
Jan 21 08:30:57 instancename kernel: RSP: 002b:00007f4553378d00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jan 21 08:30:57 instancename kernel: RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f4555a00e57
Jan 21 08:30:57 instancename kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
Jan 21 08:30:57 instancename kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f4553379700
Jan 21 08:30:57 instancename kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007fffcdb9149e
Jan 21 08:30:57 instancename kernel: R13: 00007fffcdb9149f R14: 0000000000000000 R15: 00007f4553378e80
Jan 21 08:30:57 instancename kernel: INFO: task kworker/1:3:1731 blocked for more than 120 seconds.
Jan 21 08:30:57 instancename kernel: Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
Jan 21 08:30:57 instancename kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 21 08:30:57 instancename kernel: kworker/1:3 D 0 1731 2 0x80004080
Jan 21 08:30:57 instancename kernel: Workqueue: xfs-sync/sda1 xfs_log_worker [xfs]
Jan 21 08:30:57 instancename kernel: Call Trace:
Jan 21 08:30:57 instancename kernel: __schedule+0x2a6/0x700
Jan 21 08:30:57 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:57 instancename kernel: ? __wbt_done+0x30/0x30
Jan 21 08:30:57 instancename kernel: schedule+0x38/0xa0
Jan 21 08:30:57 instancename kernel: io_schedule+0x12/0x40
Jan 21 08:30:57 instancename kernel: rq_qos_wait+0xfa/0x170
Jan 21 08:30:57 instancename kernel: ? karma_partition+0x210/0x210
Jan 21 08:30:57 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:57 instancename kernel: wbt_wait+0x99/0xe0
Jan 21 08:30:57 instancename kernel: __rq_qos_throttle+0x23/0x30
Jan 21 08:30:57 instancename kernel: blk_mq_make_request+0x125/0x5d0
Jan 21 08:30:57 instancename kernel: generic_make_request+0xcf/0x310
Jan 21 08:30:57 instancename kernel: ? bio_add_page+0x42/0x50
Jan 21 08:30:57 instancename kernel: submit_bio+0x3c/0x160
Jan 21 08:30:57 instancename kernel: xlog_state_release_iclog+0x5f/0xa0 [xfs]
Jan 21 08:30:57 instancename kernel: xfs_log_force+0x23c/0x2e0 [xfs]
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
Jan 21 08:30:57 instancename kernel: ? __switch_to_asm+0x35/0x70
Jan 21 08:30:57 instancename kernel: ? __switch_to_asm+0x41/0x70
Jan 21 08:30:57 instancename kernel: ? finish_task_switch+0x77/0x2a0
Jan 21 08:30:57 instancename kernel: xfs_log_worker+0x2f/0xf0 [xfs]
Jan 21 08:30:57 instancename kernel: process_one_work+0x1a7/0x360
Jan 21 08:30:57 instancename kernel: worker_thread+0x30/0x390
Jan 21 08:30:57 instancename kernel: ? create_worker+0x1a0/0x1a0
Jan 21 08:30:57 instancename kernel: kthread+0x112/0x130
Jan 21 08:30:57 instancename kernel: ? kthread_flush_work_fn+0x10/0x10
Jan 21 08:30:57 instancename kernel: ret_from_fork+0x35/0x40
Jan 21 08:31:25 instancename filebeat[1120]: 2021-01-21T08:31:25.871+0100#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":-126976}}}},"cpu":{"system":{"ticks":90,"time":{"ms":5}},"total":{"ticks":240,"time":{"ms":10},"value":240},"user":{"ticks":150,"time":{"ms":5}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"a784951e-88c1-4222-a1ab-08363e8cf29c","uptime":{"ms":630279}},"memstats":{"gc_next":7327888,"memory_alloc":3678280,"memory_total":21368912},"runtime":{"goroutines":19}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":2}},"system":{"load":{"1":6.46,"15":1.27,"5":3.06,"norm":{"1":3.23,"15":0.635,"5":1.53}}}}}}
Jan 21 08:31:55 instancename filebeat[1120]: 2021-01-21T08:31:55.871+0100#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":100,"time":{"ms":3}},"total":{"ticks":250,"time":{"ms":4},"value":250},"user":{"ticks":150,"time":{"ms":1}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"a784951e-88c1-4222-a1ab-08363e8cf29c","uptime":{"ms":660278}},"memstats":{"gc_next":7327888,"memory_alloc":4167216,"memory_total":21857848},"runtime":{"goroutines":19}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":2}},"system":{"load":{"1":7.22,"15":1.5,"5":3.56,"norm":{"1":3.61,"15":0.75,"5":1.78}}}}}}
(...)
答案1
根据这段日志:
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm:
wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
这表明虚拟机实例没有交换内存,因此该度量插件尝试除以 0。
首先,转到/etc/stackdriver/collectd.conf
并检查以下配置是否已设置:
LoadPlugin swap
<Plugin "swap">
ValuesPercentage true
</Plugin>
然后,运行sudo service stackdriver-agent restart
然后运行free -m
检查交换内存。
希望能帮助到你!