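We are running scripts on a Dell Edge Gateway 3002 to send data to the cloud, but after 4-5 days of running, the gateway becomes unresponsive (hung) and can no longer be pinged. The gateway runs Ubuntu 18.04.3. On checking the logs, we found multiple "cannot fork/exec" and "Resource temporarily unavailable" errors related to cron, docker, and other services.
These are the system logs from the day of the crash: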
Line 4613: Aug 19 10:23:12 80LSN42 snapd[885]: stateengine.go:108: state ensure error: cannot refresh snap-declaration for "core": Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-declaration/16/99T7MUlRhtI3U0QFgl5mXXESAiSwt776?max-format=3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Line 6546: Aug 19 12:03:06 80LSN42 snapd[885]: stateengine.go:108: state ensure error: Get https://api.snapcraft.io/api/v1/snaps/sections: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Line 6947: Aug 19 12:23:06 80LSN42 snapd[885]: stateengine.go:108: state ensure error: cannot refresh snap-declaration for "core": Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-declaration/16/99T7MUlRhtI3U0QFgl5mXXESAiSwt776?max-format=3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Line 13371: Aug 19 17:34:45 80LSN42 iotedged[8469]: 2019-08-19T09:34:45Z [ERR!] - server connection error: (unknown)
Line 13372: Aug 19 17:34:45 80LSN42 iotedged[8469]: 2019-08-19T09:34:45Z [ERR!] - error writing a body to connection: Broken pipe (os error 32)
Line 13377: Aug 19 17:34:48 80LSN42 iotedged[8469]: 2019-08-19T09:34:48Z [ERR!] - server connection error: (unknown)
Line 13378: Aug 19 17:34:48 80LSN42 iotedged[8469]: 2019-08-19T09:34:48Z [ERR!] - error writing a body to connection: Broken pipe (os error 32)
Line 14386: Aug 19 18:23:03 80LSN42 iotedged[8469]: 2019-08-19T10:23:03Z [ERR!] - server connection error: (unknown)
Line 14387: Aug 19 18:23:03 80LSN42 iotedged[8469]: 2019-08-19T10:23:03Z [ERR!] - error writing a body to connection: Broken pipe (os error 32)
Line 15240: Aug 19 19:03:06 80LSN42 snapd[885]: stateengine.go:108: state ensure error: cannot refresh snap-declaration for "core": Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-declaration/16/99T7MUlRhtI3U0QFgl5mXXESAiSwt776?max-format=3: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Line 15739: Aug 19 19:28:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15740: Aug 19 19:28:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15760: Aug 19 19:29:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15761: Aug 19 19:29:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15780: Aug 19 19:30:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15781: Aug 19 19:30:01 80LSN42 CRON[25697]: (CRON) error (can't fork)
Line 15799: Aug 19 19:31:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15800: Aug 19 19:31:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15818: Aug 19 19:32:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15819: Aug 19 19:32:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15843: Aug 19 19:33:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15844: Aug 19 19:33:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15871: Aug 19 19:34:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15872: Aug 19 19:34:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15892: Aug 19 19:35:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15893: Aug 19 19:35:01 80LSN42 CRON[25923]: (CRON) error (can't fork)
Line 15913: Aug 19 19:36:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15914: Aug 19 19:36:01 80LSN42 CRON[25936]: (CRON) error (can't fork)
Line 15933: Aug 19 19:37:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15934: Aug 19 19:37:01 80LSN42 CRON[25949]: (CRON) error (can't fork)
Line 15955: Aug 19 19:38:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15956: Aug 19 19:38:01 80LSN42 CRON[25974]: (CRON) error (can't fork)
Line 15975: Aug 19 19:39:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15976: Aug 19 19:39:01 80LSN42 CRON[26000]: (CRON) error (can't fork)
Line 15996: Aug 19 19:40:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 15997: Aug 19 19:40:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16016: Aug 19 19:41:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16017: Aug 19 19:41:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16036: Aug 19 19:42:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16037: Aug 19 19:42:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16055: Aug 19 19:43:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16056: Aug 19 19:43:01 80LSN42 CRON[26090]: (CRON) error (can't fork)
Line 16076: Aug 19 19:44:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16077: Aug 19 19:44:01 80LSN42 CRON[26107]: (CRON) error (can't fork)
Line 16096: Aug 19 19:45:01 80LSN42 cron[855]: (CRON) error (can't fork)
Line 16097: Aug 19 19:45:01 80LSN42 CRON[26124]: (CRON) error (can't fork)
Line 16108: Aug 19 19:45:37 80LSN42 dockerd[1227]: time="2019-08-19T19:45:37.338637305+08:00" level=error msg="Handler for GET /containers/edgeHub/top returned error: OCI runtime state failed: runc did not terminate sucessfully: runtime/cgo: pthread_create failed: Resource temporarily unavailable\nSIGABRT: abort\nPC=0x6e7ede m=0 sigcode=18446744073709551610\n\ngoroutine 0 [idle]:\nruntime: unknown pc 0x6e7ede\nstack: frame={sp:0x7fff66f25ca8, fp:0x0} stack=[0x7fff667271b8,0x7fff66f261e0)\n00007fff66f25ba8: 0000000000000000 0000000000000000 \n00007fff66f25bb8: 0000000000000000 0000000000000000 \n00007fff66f25bc8: 00007f4a3d2cc000 0000000000453c50 <runtime.mmap.func1+0> \n00007fff66f25bd8: 00007fff66f25c28 00007fff66f25c38 \n00007fff66f25be8: 0000000000000040 0000000000000040 \n00007fff66f25bf8: 0000000000000001 000000006e43a318 \n00007fff66f25c08: 00000000006cde8c 0000000000d4a718 \n00007fff66f25c18: 000000000045b31e <runtime.callCgoMmap+62> 00007fff66f25c28 \n00007fff66f25c28: 00000000...
Line 16113: Aug 19 19:45:38 80LSN42 dockerd[1227]: time="2019-08-19T19:45:38.006022597+08:00" level=error msg="Handler for GET /containers/edgeHub/top returned error: OCI runtime state failed: runc did not terminate sucessfully: runtime/cgo: pthread_create failed: Resource temporarily unavailable\nSIGABRT: abort\nPC=0x6e7ede m=0 sigcode=18446744073709551610\n\ngoroutine 0 [idle]:\nruntime: unknown pc 0x6e7ede\nstack: frame={sp:0x7ffed95c0af8, fp:0x0} stack=[0x7ffed8dc2008,0x7ffed95c1030)\n00007ffed95c09f8: 0000000000000000 0000000000000000 \n00007ffed95c0a08: 0000000000000000 0000000000000000 \n00007ffed95c0a18: 00007f6565ba1000 0000000000453c50 <runtime.mmap.func1+0> \n00007ffed95c0a28: 00007ffed95c0a78 00007ffed95c0a88 \n00007ffed95c0a38: 0000000000000040 0000000000000040 \n00007ffed95c0a48: 0000000000000001 000000006e43a318 \n00007ffed95c0a58: 00000000006cde8c 0000000000d4a718 \n00007ffed95c0a68: 000000000045b31e <runtime.callCgoMmap+62> 00007ffed95c0a78 \n00007ffed95c0a78: 00000000...
Line 16116: Aug 19 19:45:42 80LSN42 dockerd[1227]: time="2019-08-19T19:45:42.542392306+08:00" level=error msg="Handler for GET /containers/edgeHub/top returned error: OCI runtime state failed: runc did not terminate sucessfully: runtime/cgo: pthread_create failed: Resource temporarily unavailable\nSIGABRT: abort\nPC=0x6e7ede m=0 sigcode=18446744073709551610\n\ngoroutine 0 [idle]:\nruntime: unknown pc 0x6e7ede\nstack: frame={sp:0x7ffeee7dd0f8, fp:0x0} stack=[0x7ffeedfde608,0x7ffeee7dd630)\n00007ffeee7dcff8: 0000000000000000 0000000000000000 \n00007ffeee7dd008: 0000000000000000 0000000000000000 \n00007ffeee7dd018: 00007fe86fe7d000 0000000000453c50 <runtime.mmap.func1+0> \n00007ffeee7dd028: 00007ffeee7dd078 00007ffeee7dd088 \n00007ffeee7dd038: 0000000000000040 0000000000000040 \n00007ffeee7dd048: 0000000000000001 000000006e43a318 \n00007ffeee7dd058: 00000000006cde8c 0000000000d4a718 \n00007ffeee7dd068: 000000000045b31e <runtime.callCgoMmap+62> 00007ffeee7dd078 \n00007ffeee7dd078: 00000000...
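These errors keep repeating. We also tried checking and changing the following limits:
- kernel.pid_max (kernel limit)
- ulimit -u (user limit)
- systemctl status cron/docker | grep Tasks (service limit)
But no luck! Any suggestions would be much appreciated!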
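To make the checks above repeatable, here is a minimal sketch that could be run periodically on the gateway to record how many threads exist and which processes own most of them. It assumes a Linux procfs mounted at /proc. The function names (`read_int`, `thread_counts`, `report`) are hypothetical helpers written for this post, not part of any existing tool. It only reports numbers and does not diagnose the cause. It also does not read the per-service systemd task limit, which still has to be checked with systemctl.

```python
import os
import resource


def read_int(path):
    """Return the integer stored in a procfs file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_counts(proc_root="/proc"):
    """Map pid -> (command name, thread count) using <proc_root>/<pid>/status."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "sys" or "meminfo"
        fields = {}
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # the process exited while we were scanning
        if "Threads" in fields:
            counts[int(entry)] = (fields.get("Name", "?"), int(fields["Threads"]))
    return counts


def report(proc_root="/proc", top=5):
    """Build a text report: total threads, kernel/user limits, busiest processes."""
    counts = thread_counts(proc_root)
    total = sum(n for _, n in counts.values())
    sys_kernel = os.path.join(proc_root, "sys", "kernel")
    lines = [
        f"total threads: {total}",
        f"kernel.pid_max: {read_int(os.path.join(sys_kernel, 'pid_max'))}",
        f"kernel.threads-max: {read_int(os.path.join(sys_kernel, 'threads-max'))}",
        # (soft, hard) limit of the user running this script, as in `ulimit -u`
        f"RLIMIT_NPROC: {resource.getrlimit(resource.RLIMIT_NPROC)}",
    ]
    busiest = sorted(counts.items(), key=lambda kv: kv[1][1], reverse=True)[:top]
    for pid, (name, n) in busiest:
        lines.append(f"{n:6d} threads  pid {pid}  {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

Appending this output to a file from a cron job every few minutes would show whether the total thread count climbs steadily over the 4-5 days before the hang, and which process is responsible if it does.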