Systemd with multiple instances and restart on failure

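I used systemd to create a multi-instance setup that runs the same program several times with different parameters. My intention is for each instance to be independent of the others: if one instance fails, it should be restarted while the others are left untouched.

Here is my target unit: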

[Unit]
Description=Cutter
After=FD-go-00_tree.service
After=FD-go-01_pre.service
Requires=[email protected] [email protected] [email protected] [email protected]

[Install]
WantedBy=FD-go-00_tree.service

Here is my service unit:

[Unit]
Description="FD-cutter # %i - instance"
After=FD-go-00_tree.service
After=FD-go-01_pre.service
PartOf=FD-go-05_cutter.target
ConditionPathExists=/home/himarc/projects/multi-service/EnvironmentFile/FD-go-05_cutter_%i

# StartLimitIntervalSec in recent systemd versions
StartLimitInterval=0

[Service]
Type=simple
EnvironmentFile=/home/himarc/projects/multi-service/EnvironmentFile/FD-go-05_cutter_%i
ExecStart=/usr/bin/nice -n -1 /home/himarc/projects/bin/FD-cutter ${MyInput_1} ${MyInput_2} ${MyPath} %i
StandardOutput=file:/srv/FD/%i/trace/FD-log-cutter.log
StandardError=file:/srv/FD/%i/trace/FD-log-cutter.log
Restart=always

# time to sleep before restarting a service
RestartSec=1

[Install]
WantedBy=FD-go-00_tree.service

If an instance fails, it is not just that single service that gets restarted; the entire target unit is restarted instead.

Aug 25 11:15:06 localhost kernel: [2493693.364584] FD-cutter[21251]: segfault at 4c8 ip 000055d7ee0d9e28 sp 00007f312186caf0 error 6 in FD-cutter[55d7ee0d2000+1a000]
Aug 25 11:15:06 localhost kernel: [2493693.364591] Code: f8 ff ff 48 8d 15 08 22 21 00 48 8d 35 d1 26 21 00 48 8b 05 32 25 21 00 48 8d 3d 2b 25 21 00 48 c7 05 e8 21 21 00 00 00 00 00 <48> 89 88 c8 04 00 00 48 89 90 d0 04 00 00 48 89 e9 31 d2 e8 e0 dd
Aug 25 11:15:06 localhost systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV
Aug 25 11:15:06 localhost systemd[1]: [email protected]: Failed with result 'signal'.
Aug 25 11:15:08 localhost systemd[1]: [email protected]: Service hold-off time over, scheduling restart.
Aug 25 11:15:08 localhost systemd[1]: [email protected]: Scheduled restart job, restart counter is at 1.
Aug 25 11:15:08 localhost systemd[1]: Stopped target Cutter.
Aug 25 11:15:08 localhost systemd[1]: Stopping Cutter.
Aug 25 11:15:08 localhost systemd[1]: Stopping "FD-cutter # RC1P112 - instance"...
Aug 25 11:15:08 localhost systemd[1]: Stopping "FD-cutter # RC1P111 - instance"...
Aug 25 11:15:08 localhost systemd[1]: Stopping "FD-cutter # RC1P212 - instance"...
Aug 25 11:15:08 localhost systemd[1]: Stopped "FD-cutter # RC1P211 - instance".
Aug 25 11:15:08 localhost systemd[1]: Started "FD-cutter # RC1P211 - instance".
Aug 25 11:15:08 localhost systemd[1]: Stopped "FD-cutter # RC1P112 - instance".
Aug 25 11:15:08 localhost systemd[1]: Stopped "FD-cutter # RC1P111 - instance".
Aug 25 11:15:08 localhost systemd[1]: Stopped "FD-cutter # RC1P212 - instance".
Aug 25 11:15:08 localhost systemd[1]: Started "FD-cutter # RC1P212 - instance".
Aug 25 11:15:08 localhost systemd[1]: Started "FD-cutter # RC1P111 - instance".

Is there a way to restart only the failed service unit, without restarting the other service units?

Answer 1

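You added a Requires= dependency between the target and the instances. This is a very strong dependency. According to systemd.unit(5), it means:

Requires=

Similar to Wants=, but declares a stronger requirement dependency. Dependencies of this type may also be configured by adding a symlink to a .requires/ directory accompanying the unit file.

If this unit gets activated, the units listed will be activated as well. If one of the other units fails to activate, and an ordering dependency After= on the failing unit is set, this unit will not be started. Besides, with or without specifying After=, this unit will be stopped if one of the other units is explicitly stopped.

Often, it is a better choice to use Wants= instead of Requires= in order to achieve a system that is more robust when dealing with failing services.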

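So if you change Requires= to Wants=, starting the target will still start the templated services, but the failure of a templated service will not affect the target. This matches the log in the question: the automatic restart of the failed instance is queued as a restart job, Requires= appears to propagate that restart to the target, and PartOf= in the service unit then passes the target's restart on to every instance.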
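
As a minimal sketch of that change, the target unit from the question could look like the following. The instance unit names (FD-go-05_cutter@RC1P111.service and so on) are an assumption: the original names are not legible in the question, so they are guessed here from the target name and from the instance identifiers in the log. Substitute the real template name.

```ini
[Unit]
Description=Cutter
After=FD-go-00_tree.service
After=FD-go-01_pre.service
# Wants= instead of Requires=: the target still pulls in every instance,
# but a restart job for one instance is no longer propagated to the target.
# NOTE: the unit names below are assumed, not taken from the question.
Wants=FD-go-05_cutter@RC1P111.service FD-go-05_cutter@RC1P112.service
Wants=FD-go-05_cutter@RC1P211.service FD-go-05_cutter@RC1P212.service

[Install]
WantedBy=FD-go-00_tree.service
```

After editing the file, run systemctl daemon-reload so that systemd picks up the change. The PartOf= line in the service unit can stay as it is: it only propagates a stop or restart of the target down to the instances, not the other way around, so stopping the target should still stop all four instances.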
