Managing a systemd-nspawn container as a systemd unit


Question

How do I manage a systemd-nspawn container as a systemd unit?

Setup

Install the image

# machinectl pull-raw --verify=no https://ftp.halifax.rwth-aachen.de/fedora/linux/releases/30/Cloud/x86_64/images/Fedora-Cloud-Base-30-1.2.x86_64.raw.xz

Find the image name

# machinectl list-images
NAME                            TYPE RO  USAGE  CREATED                     MODIF
Fedora-Cloud-Base-30-1.2.x86_64 raw  no  891.6M Fri 2019-04-26 02:14:49 UTC Fri 2

1 images listed.

Start an interactive shell inside the Fedora container

# systemd-nspawn -M Fedora-Cloud-Base-30-1.2.x86_64

Write /root/app.py, a python3 application that handles signals

# https://stackabuse.com/handling-unix-signals-in-python/
import signal
import os
import time
import sys

def terminateProcess(signalNumber, frame):
    # report the signal, then exit so the container can shut down
    print(f'received signal {signalNumber}')
    print('exiting...')
    sys.exit()

def receiveSignal(signalNumber, frame):
    # report the signal and keep running
    print(f'received signal {signalNumber}')

if __name__ == '__main__':
    # register the signals to be caught
    signal.signal(signal.SIGHUP, receiveSignal)
    signal.signal(signal.SIGINT, terminateProcess)
    signal.signal(signal.SIGQUIT, receiveSignal)
    signal.signal(signal.SIGILL, receiveSignal)
    signal.signal(signal.SIGTRAP, receiveSignal)
    signal.signal(signal.SIGABRT, receiveSignal)
    signal.signal(signal.SIGBUS, receiveSignal)
    signal.signal(signal.SIGFPE, receiveSignal)
    #signal.signal(signal.SIGKILL, receiveSignal)
    signal.signal(signal.SIGUSR1, receiveSignal)
    signal.signal(signal.SIGSEGV, receiveSignal)
    signal.signal(signal.SIGUSR2, receiveSignal)
    signal.signal(signal.SIGPIPE, receiveSignal)
    signal.signal(signal.SIGALRM, receiveSignal)
    signal.signal(signal.SIGTERM, terminateProcess)

    # output current process id
    print(f'pid {os.getpid()}')

    # wait in an endless loop for signals 
    while True:
        time.sleep(1)
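Before putting app.py into a container, it can help to confirm how a handler like terminateProcess behaves when the process is stopped with SIGTERM versus SIGKILL. The following is only a minimal sketch under stated assumptions: CHILD is a trimmed, hypothetical stand-in for app.py (not the file above), stop_with is a made-up helper name, and it assumes Linux with Python 3.7 or later.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for app.py, reduced to the SIGTERM handler.
CHILD = r"""
import signal, sys, time

def terminate(signum, frame):
    print(f'received signal {signum}', flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, terminate)
print('ready', flush=True)   # tells the parent the handler is installed
while True:
    time.sleep(1)
"""

def stop_with(sig):
    """Start the child, send it `sig`, return (remaining output, exit code)."""
    proc = subprocess.Popen(
        [sys.executable, '-u', '-c', CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()            # block until the child prints 'ready'
    proc.send_signal(sig)
    out, _ = proc.communicate(timeout=10)
    return out.strip(), proc.returncode

if __name__ == '__main__':
    print(stop_with(signal.SIGTERM))  # handler runs, child exits on its own
    print(stop_with(signal.SIGKILL))  # no handler can run, child is killed
```

With SIGTERM the handler gets a chance to print and exit. With SIGKILL the process is removed without running any Python code, which is why SIGKILL cannot be caught in app.py.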

Exit the container with the key combination

Control + ] ] ] (press Control + ] three times within one second)

App service, attempt 1

Write the /etc/systemd/system/app.service unit file

[Service]
ExecStart=/usr/bin/systemd-nspawn --keep-unit -M Fedora-Cloud-Base-30-1.2.x86_64 python3 -u /root/app.py
SyslogIdentifier=%N
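  • The --keep-unit switch keeps systemd-nspawn and app.py in the system.slice/app.service cgroup
  • -u is python3's unbuffered-output switch
  • SyslogIdentifier uses the %N specifier, which expands to the string "app", the unit name without its suffix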
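Since the attempts in this post only differ in the systemd-nspawn flags, the unit text can be generated rather than edited by hand. This is a rough sketch, not part of the original setup: render_service is a hypothetical helper, and it assumes that no argument contains whitespace so that no ExecStart quoting is needed.

```python
def render_service(machine, command, nspawn_flags=("--keep-unit",)):
    """Return the text of a [Service] section that runs `command` in `machine`."""
    argv = ["/usr/bin/systemd-nspawn", *nspawn_flags, "-M", machine, *command]
    for part in argv:
        # Assumption of this sketch: plain arguments only, no quoting support.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"unsupported argument: {part!r}")
    exec_start = " ".join(argv)
    return f"[Service]\nExecStart={exec_start}\nSyslogIdentifier=%N\n"

if __name__ == "__main__":
    # Prints the same unit text as the file shown above.
    print(
        render_service(
            "Fedora-Cloud-Base-30-1.2.x86_64",
            ["python3", "-u", "/root/app.py"],
        ),
        end="",
    )
```

Passing a different nspawn_flags tuple changes only the ExecStart line, so variants of the unit stay easy to compare.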

Reload the systemd daemon

# systemctl daemon-reload

In another terminal, follow the log output with journalctl

# journalctl -f -u app.service

Start the app.service unit

# systemctl start app.service

Stop the app.service unit

# systemctl stop app.service

Observe the logs

-- Logs begin at Fri 2019-08-23 16:58:11 UTC. --
Aug 23 17:26:42 srv0 systemd[1]: Started app.service.
Aug 23 17:26:42 srv0 app[12745]: Spawning container Fedora-Cloud-Base-30-1.2.x86_64 on /var/lib/machines/Fedora-Cloud-Base-30-1.2.x86_64.raw.
Aug 23 17:26:42 srv0 app[12745]: Press ^] three times within 1s to kill container.
Aug 23 17:26:42 srv0 app[12745]: Failed to create directory /tmp/nspawn-root-afZQoJ/sys/fs/selinux: Read-only file system
Aug 23 17:26:42 srv0 app[12745]: Failed to create directory /tmp/nspawn-root-afZQoJ/sys/fs/selinux: Read-only file system
Aug 23 17:26:42 srv0 app[12745]: pid 1
Aug 23 17:26:54 srv0 systemd[1]: Stopping app.service...
Aug 23 17:26:54 srv0 app[12745]: Container Fedora-Cloud-Base-30-1.2.x86_64 terminated by signal KILL.
Aug 23 17:26:54 srv0 systemd[1]: app.service: Main process exited, code=exited, status=1/FAILURE
Aug 23 17:26:54 srv0 systemd[1]: Stopped app.service.
Aug 23 17:26:54 srv0 systemd[1]: app.service: Unit entered failed state.
Aug 23 17:26:54 srv0 systemd[1]: app.service: Failed with result 'exit-code'.
Aug 23 17:26:54 srv0 systemd[1]: Stopped app.service.

systemd-nspawn stops the container with SIGKILL rather than SIGTERM. See the line

Aug 23 17:26:54 srv0 app[12745]: Container Fedora-Cloud-Base-30-1.2.x86_64 terminated by signal KILL.

I don't want to SIGKILL app.py, I want to SIGTERM it.

App service, attempt 2

Read the GitHub issue

https://github.com/systemd/systemd/issues/7105#issuecomment-467491778

Use the -a/--as-pid2 switch

[Service]
ExecStart=/usr/bin/systemd-nspawn --keep-unit --as-pid2 -M Fedora-Cloud-Base-30-1.2.x86_64 python3 -u /root/app.py
SyslogIdentifier=%N

Run daemon-reload, start, stop
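Because this reload, start, stop cycle is repeated for every attempt, it can be scripted. The following is a small sketch, assuming it is run as root on the host. cycle_commands and run_cycle are hypothetical helper names, and the default pause of 5 seconds is an arbitrary choice to let app.py start before the unit is stopped.

```python
import subprocess
import time

def cycle_commands(unit):
    """The three systemctl invocations used in this post, in order."""
    return [
        ["systemctl", "daemon-reload"],
        ["systemctl", "start", unit],
        ["systemctl", "stop", unit],
    ]

def run_cycle(unit, pause_seconds=5.0, run=subprocess.run):
    """Reload systemd, start `unit`, wait, then stop it again.

    `run` is injectable so the sequence can be checked without systemd.
    """
    reload_cmd, start_cmd, stop_cmd = cycle_commands(unit)
    run(reload_cmd, check=True)
    run(start_cmd, check=True)
    time.sleep(pause_seconds)  # give the container time to spawn app.py
    run(stop_cmd, check=True)

# Example (needs root and a real app.service):
#   run_cycle("app.service")
```

Keeping journalctl -f -u app.service open in another terminal while this runs shows the resulting log lines as before.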

Observe the logs

Aug 23 17:29:59 srv0 systemd[1]: Started app.service.
Aug 23 17:29:59 srv0 app[12841]: Spawning container Fedora-Cloud-Base-30-1.2.x86_64 on /var/lib/machines/Fedora-Cloud-Base-30-1.2.x86_64.raw.
Aug 23 17:29:59 srv0 app[12841]: Press ^] three times within 1s to kill container.
Aug 23 17:29:59 srv0 app[12841]: Failed to create directory /tmp/nspawn-root-jaGbcx/sys/fs/selinux: Read-only file system
Aug 23 17:29:59 srv0 app[12841]: Failed to create directory /tmp/nspawn-root-jaGbcx/sys/fs/selinux: Read-only file system
Aug 23 17:29:59 srv0 app[12841]: pid 2
Aug 23 17:30:06 srv0 systemd[1]: Stopping app.service...
Aug 23 17:30:06 srv0 app[12841]: Container Fedora-Cloud-Base-30-1.2.x86_64 terminated by signal KILL.
Aug 23 17:30:06 srv0 systemd[1]: app.service: Main process exited, code=exited, status=1/FAILURE
Aug 23 17:30:06 srv0 systemd[1]: Stopped app.service.
Aug 23 17:30:06 srv0 systemd[1]: app.service: Unit entered failed state.
Aug 23 17:30:06 srv0 systemd[1]: app.service: Failed with result 'exit-code'.

app.py now runs as pid 2! But it is still stopped with SIGKILL rather than SIGTERM.
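When comparing attempts, the interesting part of each log is the single line where systemd-nspawn reports how the container ended. Here is a minimal sketch for picking that line out of journal output. container_exit is a hypothetical helper, and the two message patterns are taken from the logs quoted in this post, so other systemd versions may word them differently.

```python
import re

# Patterns copied from the log lines in this post (an assumption for other versions).
_KILLED = re.compile(r"Container (\S+) terminated by signal (\w+)\.")
_CLEAN = re.compile(r"Container (\S+) exited successfully\.")

def container_exit(lines):
    """Return 'signal NAME', 'clean', or None if no exit line is found."""
    for line in lines:
        killed = _KILLED.search(line)
        if killed:
            return f"signal {killed.group(2)}"
        if _CLEAN.search(line):
            return "clean"
    return None

# Example: feed it the lines printed by `journalctl -u app.service`.
sample = [
    "Aug 23 17:30:06 srv0 systemd[1]: Stopping app.service...",
    "Aug 23 17:30:06 srv0 app[12841]: Container Fedora-Cloud-Base-30-1.2.x86_64 terminated by signal KILL.",
]
print(container_exit(sample))
```

A result of "signal KILL" means app.py never had a chance to run its SIGTERM handler.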

Answer 1

Read more of the GitHub issue

https://github.com/systemd/systemd/issues/7105#issuecomment-467491778

Use the --kill-signal flag

[Service]
ExecStart=/usr/bin/systemd-nspawn --keep-unit --kill-signal=SIGTERM -M Fedora-Cloud-Base-30-1.2.x86_64 python3 -u /root/app.py
SyslogIdentifier=%N

Reload systemd, start app.service, then stop app.service

# systemctl daemon-reload
# systemctl start app.service
# systemctl stop app.service

Observe the logs

Aug 23 17:51:32 srv0 systemd[1]: Started app.service.
Aug 23 17:51:32 srv0 app[12994]: Spawning container Fedora-Cloud-Base-30-1.2.x86_64 on /var/lib/machines/Fedora-Cloud-Base-30-1.2.x86_64.raw.
Aug 23 17:51:32 srv0 app[12994]: Press ^] three times within 1s to kill container.
Aug 23 17:51:32 srv0 app[12994]: Failed to create directory /tmp/nspawn-root-71uVxm/sys/fs/selinux: Read-only file system
Aug 23 17:51:32 srv0 app[12994]: Failed to create directory /tmp/nspawn-root-71uVxm/sys/fs/selinux: Read-only file system
Aug 23 17:51:32 srv0 app[12994]: pid 1
Aug 23 17:51:35 srv0 app[12994]: Trying to halt container. Send SIGTERM again to trigger immediate termination.
Aug 23 17:51:35 srv0 app[12994]: received signal 15
Aug 23 17:51:35 srv0 app[12994]: exiting...
Aug 23 17:51:35 srv0 systemd[1]: Stopping app.service...
Aug 23 17:51:35 srv0 app[12994]: Container Fedora-Cloud-Base-30-1.2.x86_64 exited successfully.
Aug 23 17:51:35 srv0 systemd[1]: Stopped app.service.
Aug 23 17:51:35 srv0 systemd[1]: Stopped app.service.

See how SIGTERM is now delivered to app.py!
