An example of the nominal failure:
user@host ~> touch a
user@host ~> tail -f ./a &
user@host ~> ps aux | grep tail
user-+ 1457 0.0 0.0 8120 580 pts/1 S 16:04 0:00 tail -f ./a
user-+ 1459 0.0 0.0 9040 2568 pts/1 S+ 16:04 0:00 grep --color=auto tail
user@host ~> ls -l /proc/1457/fd
total 0
lrwx------ 1 user user 64 Jun 8 16:04 0 -> /dev/pts/1
lrwx------ 1 user user 64 Jun 8 16:04 1 -> /dev/pts/1
lrwx------ 1 user user 64 Jun 8 16:04 2 -> /dev/pts/1
lr-x------ 1 user user 64 Jun 8 16:04 3 -> /home/user/a
lr-x------ 1 user user 64 Jun 8 16:04 4 -> anon_inode:inotify
user@host ~> docker run -it --rm -v /proc/1457/fd/3:/a ubuntu
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:75: mounting "/proc/1457/fd/3" to rootfs at "/a" caused: mount through procfd: invalid argument: unknown.
This does not appear to be a fundamental limitation of bind mounts:
user@host ~> touch b
user@host ~> sudo mount --bind /proc/1457/fd/3 b
user@host ~> cat b
user@host ~> echo "test" > a
test
user@host ~> cat b
test
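I looked at the source code of opencontainers/runc and found one small wrinkle beyond the proof of concept above. To prevent an attacker from swapping the mount destination path for a symlink, runc opens the destination path and then refers to it through the corresponding path under /proc/self/fd.
I wrote a small C program to simulate these conditions and make sure the concept still holds: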
// a.c
// Usage: ./a.out <pid> <fd> <dst>
// Bind-mounts /proc/<pid>/fd/<fd> onto <dst>, referring to <dst>
// through /proc/self/fd/<n> the way runc does.
#include <sys/mount.h>
#include <stdio.h>
#include <fcntl.h>

int main(int argc, char const *argv[])
{
    if (argc < 4) {
        fprintf(stderr, "Not enough args\n");
        return 1;
    }

    // Source: an open file descriptor of another process.
    char src_path[1024];
    {
        int r = snprintf(src_path, sizeof(src_path), "/proc/%s/fd/%s", argv[1], argv[2]);
        if (r < 0 || (size_t)r >= sizeof(src_path)) {
            fprintf(stderr, "src snprintf failed\n");
            return 1;
        }
    }

    // Destination: open it first; open() returns -1 on failure.
    int fd = open(argv[3], O_RDWR);
    if (fd < 0) {
        perror("Open file failed");
        return 1;
    }

    // Refer to the destination through our own descriptor.
    char dst_path[1024];
    {
        int r = snprintf(dst_path, sizeof(dst_path), "/proc/self/fd/%d", fd);
        if (r < 0 || (size_t)r >= sizeof(dst_path)) {
            fprintf(stderr, "dst snprintf failed\n");
            return 1;
        }
    }

    int r = mount(src_path, dst_path, "none", MS_BIND, NULL);
    if (r != 0) {
        perror("Mount failed");
        return 1;
    }
    return 0;
}
It runs to completion successfully and has the intended effect:
user@host ~> gcc ./a.c
user@host ~> sudo ./a.out 1457 3 c
user@host ~ [1]> echo "another test" >> c
another test
user@host ~> cat a
test
another test
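One difference from the trace further down: the program above opens the destination with O_RDWR, whereas the openat call that precedes the failing mount in the trace uses O_RDONLY|O_CLOEXEC|O_PATH. A variant that mirrors those flags might be a closer simulation. The following is only a sketch under that assumption; the function names proc_self_fd_path and bind_mount_via_opath are made up for illustration and are not taken from runc.

```c
// opath.c: hypothetical helpers, not taken from runc
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for O_PATH */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

/* Write "/proc/self/fd/<fd>" into buf.
 * Returns 0 on success, -1 if the result does not fit. */
int proc_self_fd_path(char *buf, size_t size, int fd)
{
    int r = snprintf(buf, size, "/proc/self/fd/%d", fd);
    return (r < 0 || (size_t)r >= size) ? -1 : 0;
}

/* Bind-mount src onto dst, referring to dst through an O_PATH
 * descriptor, as the openat call in the trace does.
 * Returns 0 on success, -1 on failure with errno set. */
int bind_mount_via_opath(const char *src, const char *dst)
{
    int fd = open(dst, O_PATH | O_CLOEXEC);
    if (fd < 0)
        return -1;

    char dst_path[64];
    if (proc_self_fd_path(dst_path, sizeof(dst_path), fd) != 0) {
        close(fd);
        errno = ENAMETOOLONG;
        return -1;
    }

    int r = mount(src, dst_path, NULL, MS_BIND, NULL);
    int saved = errno; /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return r;
}
```

Called from a root process as bind_mount_via_opath("/proc/1457/fd/3", "c"), this should behave like a.c if O_PATH makes no difference to the result.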
Grasping at straws, I straced docker-containerd to make sure runc really is just calling mount with similar arguments. Here is an excerpt of the log:
1491 newfstatat(AT_FDCWD, "/proc/1457/fd/3", {st_mode=S_IFSOCK|0777, st_size=0, ...}, 0) = 0
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", 0xc0000e81d8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", 0xc0000e82a8, 0) = -1 ENOENT (No such file or directory)
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged", <unfinished ...>
1491 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
1491 openat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", O_RDONLY|O_CREAT|O_CLOEXEC, 0755) = 7
1491 epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2955630296, u64=140083313989336}} <unfinished ...>
1491 <... epoll_ctl resumed>) = -1 EPERM (Operation not permitted)
1491 close(7) = 0
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
1491 openat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", O_RDONLY|O_CLOEXEC|O_PATH) = 7
1491 epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2955630296, u64=140083313989336}} <unfinished ...>
1491 <... epoll_ctl resumed>) = -1 EBADF (Bad file descriptor)
1491 readlinkat(AT_FDCWD, "/proc/self/fd/7", "/var/lib/docker/overlay2/31b012c"..., 128) = 103
1491 mount("/proc/1457/fd/3", "/proc/self/fd/7", 0xc0001bf1d7, MS_BIND, NULL) = -1 EINVAL (Invalid argument)
1491 close(7) = 0
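You can see that /proc/1457/fd/3 is stat'ed and that it exists, but the attempt to mount it fails with EINVAL. After reading the documentation I can't see any obvious reason for this.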
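The first newfstatat line in the excerpt reports st_mode=S_IFSOCK|0777 for the source, while ls -l above shows fd 3 as a link to /home/user/a. Whether that matters is unclear, but it can be checked from the host. The sketch below prints the file type that stat and lstat report for a path; file_type_char and print_src_types are illustrative names, not part of any existing tool.

```c
// stat_src.c: hypothetical diagnostic helpers
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for S_IFSOCK, S_IFLNK and lstat under strict -std */
#endif
#include <stdio.h>
#include <sys/stat.h>

/* Map st_mode to the type character that ls -l uses. */
char file_type_char(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFREG:  return '-';
    case S_IFDIR:  return 'd';
    case S_IFLNK:  return 'l';
    case S_IFSOCK: return 's';
    case S_IFIFO:  return 'p';
    case S_IFCHR:  return 'c';
    case S_IFBLK:  return 'b';
    default:       return '?';
    }
}

/* Print the type of path as seen when following a final symlink
 * (stat) and when not following it (lstat).
 * Returns 0 on success, -1 if either call fails. */
int print_src_types(const char *path)
{
    struct stat followed, link;
    if (stat(path, &followed) != 0 || lstat(path, &link) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: stat=%c lstat=%c\n", path,
           file_type_char(followed.st_mode),
           file_type_char(link.st_mode));
    return 0;
}
```

For example, print_src_types("/proc/1457/fd/3") run as root on the host would show how the kernel classifies the source there, for comparison with what the trace reports.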
OS details:
user@host ~> uname -a
Linux host 5.13.0-44-generic #49~20.04.1-Ubuntu SMP Wed May 18 18:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Docker details:
user@host ~> docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.8.1-docker)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 62
Running: 17
Paused: 0
Stopped: 45
Images: 1198
Server Version: 20.10.14
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 3df54a852345ae127d1fa3092b95168e4a88e2f8
runc version: v1.0.3-0-gf46b6ba
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.13.0-44-generic
Operating System: Ubuntu 20.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 15.55GiB
Name: host
ID: BWNT:CW6A:OMFO:TI67:5TMC:CUCT:CPXG:GSXM:NKNZ:VMBK:MBEM:RRZV
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
I've run out of ideas about what is going wrong and what to try next. Thanks in advance for any help.