Multiple processes stop with the errors "Resource temporarily unavailable" and "can't fork", leading to a crash

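We are running a script on a Dell Edge Gateway 3002 to send data to the cloud, but after 4-5 days of running, the gateway becomes unresponsive (hung state) and can no longer be pinged. The gateway runs Ubuntu 18.04.3. When we checked the logs, we found multiple "can't fork/exec" and "Resource temporarily unavailable" errors from cron, docker, and other services.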

These are the syslog entries from the day of the crash:

    Line 4613: Aug 19 10:23:12 80LSN42 snapd[885]: stateengine.go:108: state ensure error: cannot refresh snap-declaration for "core": Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-declaration/16/99T7MUlRhtI3U0QFgl5mXXESAiSwt776?max-format=3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Line 6546: Aug 19 12:03:06 80LSN42 snapd[885]: stateengine.go:108: state ensure error: Get https://api.snapcraft.io/api/v1/snaps/sections: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Line 6947: Aug 19 12:23:06 80LSN42 snapd[885]: stateengine.go:108: state ensure error: cannot refresh snap-declaration for "core": Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-declaration/16/99T7MUlRhtI3U0QFgl5mXXESAiSwt776?max-format=3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Line 13371: Aug 19 17:34:45 80LSN42 iotedged[8469]: 2019-08-19T09:34:45Z [ERR!] - server connection error: (unknown)
    Line 13372: Aug 19 17:34:45 80LSN42 iotedged[8469]: 2019-08-19T09:34:45Z [ERR!] - error writing a body to connection: Broken pipe (os error 32)
    Line 13377: Aug 19 17:34:48 80LSN42 iotedged[8469]: 2019-08-19T09:34:48Z [ERR!] - server connection error: (unknown)
    Line 13378: Aug 19 17:34:48 80LSN42 iotedged[8469]: 2019-08-19T09:34:48Z [ERR!] - error writing a body to connection: Broken pipe (os error 32)
    Line 14386: Aug 19 18:23:03 80LSN42 iotedged[8469]: 2019-08-19T10:23:03Z [ERR!] - server connection error: (unknown)
    Line 14387: Aug 19 18:23:03 80LSN42 iotedged[8469]: 2019-08-19T10:23:03Z [ERR!] - error writing a body to connection: Broken pipe (os error 32)
    Line 15240: Aug 19 19:03:06 80LSN42 snapd[885]: stateengine.go:108: state ensure error: cannot refresh snap-declaration for "core": Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-declaration/16/99T7MUlRhtI3U0QFgl5mXXESAiSwt776?max-format=3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Line 15739: Aug 19 19:28:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15740: Aug 19 19:28:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15760: Aug 19 19:29:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15761: Aug 19 19:29:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15780: Aug 19 19:30:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15781: Aug 19 19:30:01 80LSN42 CRON[25697]: (CRON) error (can't fork)
    Line 15799: Aug 19 19:31:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15800: Aug 19 19:31:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15818: Aug 19 19:32:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15819: Aug 19 19:32:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15843: Aug 19 19:33:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15844: Aug 19 19:33:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15871: Aug 19 19:34:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15872: Aug 19 19:34:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15892: Aug 19 19:35:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15893: Aug 19 19:35:01 80LSN42 CRON[25923]: (CRON) error (can't fork)
    Line 15913: Aug 19 19:36:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15914: Aug 19 19:36:01 80LSN42 CRON[25936]: (CRON) error (can't fork)
    Line 15933: Aug 19 19:37:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15934: Aug 19 19:37:01 80LSN42 CRON[25949]: (CRON) error (can't fork)
    Line 15955: Aug 19 19:38:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15956: Aug 19 19:38:01 80LSN42 CRON[25974]: (CRON) error (can't fork)
    Line 15975: Aug 19 19:39:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15976: Aug 19 19:39:01 80LSN42 CRON[26000]: (CRON) error (can't fork)
    Line 15996: Aug 19 19:40:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 15997: Aug 19 19:40:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16016: Aug 19 19:41:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16017: Aug 19 19:41:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16036: Aug 19 19:42:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16037: Aug 19 19:42:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16055: Aug 19 19:43:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16056: Aug 19 19:43:01 80LSN42 CRON[26090]: (CRON) error (can't fork)
    Line 16076: Aug 19 19:44:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16077: Aug 19 19:44:01 80LSN42 CRON[26107]: (CRON) error (can't fork)
    Line 16096: Aug 19 19:45:01 80LSN42 cron[855]: (CRON) error (can't fork)
    Line 16097: Aug 19 19:45:01 80LSN42 CRON[26124]: (CRON) error (can't fork)
    Line 16108: Aug 19 19:45:37 80LSN42 dockerd[1227]: time="2019-08-19T19:45:37.338637305+08:00" level=error msg="Handler for GET /containers/edgeHub/top returned error: OCI runtime state failed: runc did not terminate sucessfully: runtime/cgo: pthread_create failed: Resource temporarily unavailable\nSIGABRT: abort\nPC=0x6e7ede m=0 sigcode=18446744073709551610\n\ngoroutine 0 [idle]:\nruntime: unknown pc 0x6e7ede\nstack: frame={sp:0x7fff66f25ca8, fp:0x0} stack=[0x7fff667271b8,0x7fff66f261e0)\n00007fff66f25ba8:  0000000000000000  0000000000000000 \n00007fff66f25bb8:  0000000000000000  0000000000000000 \n00007fff66f25bc8:  00007f4a3d2cc000  0000000000453c50 <runtime.mmap.func1+0> \n00007fff66f25bd8:  00007fff66f25c28  00007fff66f25c38 \n00007fff66f25be8:  0000000000000040  0000000000000040 \n00007fff66f25bf8:  0000000000000001  000000006e43a318 \n00007fff66f25c08:  00000000006cde8c  0000000000d4a718 \n00007fff66f25c18:  000000000045b31e <runtime.callCgoMmap+62>  00007fff66f25c28 \n00007fff66f25c28:  00000000...
    Line 16113: Aug 19 19:45:38 80LSN42 dockerd[1227]: time="2019-08-19T19:45:38.006022597+08:00" level=error msg="Handler for GET /containers/edgeHub/top returned error: OCI runtime state failed: runc did not terminate sucessfully: runtime/cgo: pthread_create failed: Resource temporarily unavailable\nSIGABRT: abort\nPC=0x6e7ede m=0 sigcode=18446744073709551610\n\ngoroutine 0 [idle]:\nruntime: unknown pc 0x6e7ede\nstack: frame={sp:0x7ffed95c0af8, fp:0x0} stack=[0x7ffed8dc2008,0x7ffed95c1030)\n00007ffed95c09f8:  0000000000000000  0000000000000000 \n00007ffed95c0a08:  0000000000000000  0000000000000000 \n00007ffed95c0a18:  00007f6565ba1000  0000000000453c50 <runtime.mmap.func1+0> \n00007ffed95c0a28:  00007ffed95c0a78  00007ffed95c0a88 \n00007ffed95c0a38:  0000000000000040  0000000000000040 \n00007ffed95c0a48:  0000000000000001  000000006e43a318 \n00007ffed95c0a58:  00000000006cde8c  0000000000d4a718 \n00007ffed95c0a68:  000000000045b31e <runtime.callCgoMmap+62>  00007ffed95c0a78 \n00007ffed95c0a78:  00000000...
    Line 16116: Aug 19 19:45:42 80LSN42 dockerd[1227]: time="2019-08-19T19:45:42.542392306+08:00" level=error msg="Handler for GET /containers/edgeHub/top returned error: OCI runtime state failed: runc did not terminate sucessfully: runtime/cgo: pthread_create failed: Resource temporarily unavailable\nSIGABRT: abort\nPC=0x6e7ede m=0 sigcode=18446744073709551610\n\ngoroutine 0 [idle]:\nruntime: unknown pc 0x6e7ede\nstack: frame={sp:0x7ffeee7dd0f8, fp:0x0} stack=[0x7ffeedfde608,0x7ffeee7dd630)\n00007ffeee7dcff8:  0000000000000000  0000000000000000 \n00007ffeee7dd008:  0000000000000000  0000000000000000 \n00007ffeee7dd018:  00007fe86fe7d000  0000000000453c50 <runtime.mmap.func1+0> \n00007ffeee7dd028:  00007ffeee7dd078  00007ffeee7dd088 \n00007ffeee7dd038:  0000000000000040  0000000000000040 \n00007ffeee7dd048:  0000000000000001  000000006e43a318 \n00007ffeee7dd058:  00000000006cde8c  0000000000d4a718 \n00007ffeee7dd068:  000000000045b31e <runtime.callCgoMmap+62>  00007ffeee7dd078 \n00007ffeee7dd078:  00000000...

These errors keep repeating. We also tried checking and changing the following limits:

  • kernel.pid_max (kernel limit)
  • ulimit -u (user limit)
  • systemctl status cron/docker | grep Tasks (service limit)

But no luck! Any suggestions would be much appreciated.
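
In case it helps to narrow this down, here is a minimal Python sketch (not something taken from the gateway itself) of how per-process thread counts could be compared against the kernel limits. It assumes a Linux /proc filesystem, and the helper names `read_limit` and `threads_by_process` are made up for illustration. Since `pthread_create` typically reports "Resource temporarily unavailable" when a thread or process limit has been reached, a count that keeps growing over several days would point at the process that is leaking.

```python
from pathlib import Path


def read_limit(path):
    """Return the stripped contents of a limit file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def threads_by_process(proc_root="/proc"):
    """Map (pid, name) to the thread count reported in <proc_root>/<pid>/status."""
    root = Path(proc_root)
    if not root.is_dir():
        return {}
    counts = {}
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            status = (entry / "status").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        fields = dict(line.split(":", 1) for line in status.splitlines() if ":" in line)
        try:
            counts[(int(entry.name), fields["Name"].strip())] = int(fields["Threads"])
        except (KeyError, ValueError):
            continue  # status file without the expected fields
    return counts


if __name__ == "__main__":
    # Kernel-wide limits (assumed standard procfs paths on Linux).
    for name in ("pid_max", "threads-max"):
        print(f"kernel.{name} = {read_limit('/proc/sys/kernel/' + name)}")
    counts = threads_by_process()
    print("total threads:", sum(counts.values()))
    # Show the five processes holding the most threads.
    for (pid, name), n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:5]:
        print(f"{n:6d}  pid={pid}  {name}")
```

If this is run periodically (for example once an hour) with the output appended to a file, the top entries from the first day can be compared with those shortly before the hang.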
