Why does bind-mounting a file/fd under /proc/ fail in Docker?


user@host ~> touch a
user@host ~> tail -f ./a &
user@host ~> ps aux | grep tail
user-+ 1457  0.0  0.0   8120   580 pts/1    S    16:04   0:00 tail -f ./a
user-+ 1459  0.0  0.0   9040  2568 pts/1    S+   16:04   0:00 grep --color=auto tail
user@host ~> ls -l /proc/1457/fd
total 0
lrwx------ 1 user user 64 Jun  8 16:04 0 -> /dev/pts/1
lrwx------ 1 user user 64 Jun  8 16:04 1 -> /dev/pts/1
lrwx------ 1 user user 64 Jun  8 16:04 2 -> /dev/pts/1
lr-x------ 1 user user 64 Jun  8 16:04 3 -> /home/user/a
lr-x------ 1 user user 64 Jun  8 16:04 4 -> anon_inode:inotify
user@host ~> docker run -it --rm -v /proc/1457/fd/3:/a ubuntu
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:75: mounting "/proc/1457/fd/3" to rootfs at "/a" caused: mount through procfd: invalid argument: unknown.


user@host ~> touch b
user@host ~> sudo mount --bind /proc/1457/fd/3 b
user@host ~> cat b
user@host ~> echo "test" > a
user@host ~> cat b

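I looked at the opencontainers/runc source and found that it differs from the proof of concept above in one small way. To keep an attacker from swapping the mount target path for a symlink, runc opens the target path first and then refers to it through the corresponding path under /proc/self/fd.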

I wrote a small C program that simulates this, to make sure the approach still works:

// a.c
// Usage: a.out <pid> <fd> <target>

#include <sys/mount.h>
#include <stdio.h>
#include <fcntl.h>

int main(int argc, char const *argv[]) {
  if (argc < 4) {
    fprintf(stderr, "Not enough args\n");
    return 1;
  }

  // Source: an fd of another process, addressed as /proc/<pid>/fd/<fd>.
  char src_path[1024];
  {
    int r = snprintf(src_path, sizeof(src_path), "/proc/%s/fd/%s", argv[1], argv[2]);
    if (r < 0 || (size_t)r >= sizeof(src_path)) {
      fprintf(stderr, "src snprintf failed\n");
      return 1;
    }
  }

  // Target: open it first, then refer to it via /proc/self/fd, as runc does.
  int fd = open(argv[3], O_RDWR);
  if (fd < 0) {  // open() returns -1 on failure; 0 is a valid descriptor
    perror("Open file failed");
    return 1;
  }

  char dst_path[1024];
  {
    int r = snprintf(dst_path, sizeof(dst_path), "/proc/self/fd/%d", fd);
    if (r < 0 || (size_t)r >= sizeof(dst_path)) {
      fprintf(stderr, "dst snprintf failed\n");
      return 1;
    }
  }

  // Bind-mount the source onto the target (requires root).
  int r = mount(src_path, dst_path, "none", MS_BIND, NULL);
  if (r != 0) {
    perror("Mount failed");
    return 1;
  }

  return 0;
}

user@host ~> gcc ./a.c
user@host ~> sudo ./a.out 1457 3 c
user@host ~ [1]> echo "another test" >> c
another test
user@host ~> cat a
another test

Grasping at straws, I straced docker-containerd to confirm that runc really is just calling mount with similar arguments. Here is an excerpt from the log:

1491 newfstatat(AT_FDCWD, "/proc/1457/fd/3", {st_mode=S_IFSOCK|0777, st_size=0, ...}, 0) = 0
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", 0xc0000e81d8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", 0xc0000e82a8, 0) = -1 ENOENT (No such file or directory)
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged",  <unfinished ...>
1491 <... newfstatat resumed>{st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
1491 openat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", O_RDONLY|O_CREAT|O_CLOEXEC, 0755) = 7
1491 epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2955630296, u64=140083313989336}} <unfinished ...>
1491 <... epoll_ctl resumed>)        = -1 EPERM (Operation not permitted)
1491 close(7)                        = 0
1491 newfstatat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", {st_mode=S_IFREG|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
1491 openat(AT_FDCWD, "/var/lib/docker/overlay2/31b012c4b8edbb2e8c1e0115e4e7c6a4b88a8045f39975367f01f61ccf9d1a5b/merged/volume", O_RDONLY|O_CLOEXEC|O_PATH) = 7
1491 epoll_ctl(8, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2955630296, u64=140083313989336}} <unfinished ...>
1491 <... epoll_ctl resumed>)        = -1 EBADF (Bad file descriptor)
1491 readlinkat(AT_FDCWD, "/proc/self/fd/7", "/var/lib/docker/overlay2/31b012c"..., 128) = 103
1491 mount("/proc/1457/fd/3", "/proc/self/fd/7", 0xc0001bf1d7, MS_BIND, NULL) = -1 EINVAL (Invalid argument)
1491 close(7)                        = 0

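You can see that /proc/1457/fd/3 is stat'ed and exists, yet the attempt to mount it fails with EINVAL. After reading the documentation, I can't see any obvious reason for this.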
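For reference, the target-side steps visible in the trace (openat with O_PATH, readlinkat on /proc/self/fd/7, then mount) can be approximated in C as below. This is only a sketch modeled on the syscalls in the log, not runc's actual code. The function names `open_target_procfd` and `procfd_points_to` are hypothetical, and a mounted /proc is assumed.

```c
#define _GNU_SOURCE /* glibc exposes O_PATH only with _GNU_SOURCE */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open `target` with O_PATH (as in the trace) and
 * write the matching "/proc/self/fd/<n>" path into `out`.
 * Returns the open fd, or -1 on failure. */
int open_target_procfd(const char *target, char *out, size_t out_len) {
  assert(target != NULL && out != NULL);
  int fd = open(target, O_PATH | O_CLOEXEC);
  if (fd < 0)
    return -1;
  int r = snprintf(out, out_len, "/proc/self/fd/%d", fd);
  if (r < 0 || (size_t)r >= out_len) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Hypothetical helper: resolve the /proc/self/fd link and compare it with
 * the path we expected to open (the readlinkat step in the trace).
 * Returns 1 on match, 0 on mismatch, -1 on error. */
int procfd_points_to(const char *procfd_path, const char *expected) {
  char resolved[4096];
  ssize_t n = readlink(procfd_path, resolved, sizeof(resolved) - 1);
  if (n < 0)
    return -1;
  resolved[n] = '\0';
  return strcmp(resolved, expected) == 0;
}
```

The path written by `open_target_procfd` would then be passed as the target argument of `mount(src, dst, "none", MS_BIND, NULL)`, in the same way a.c passes `dst_path`. The only difference from a.c is that the target is opened with O_PATH instead of O_RDWR.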


user@host ~> uname -a
Linux host 5.13.0-44-generic #49~20.04.1-Ubuntu SMP Wed May 18 18:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Docker details:

user@host ~> docker info
 Context:    default
 Debug Mode: false
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.1-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

 Containers: 62
  Running: 17
  Paused: 0
  Stopped: 45
 Images: 1198
 Server Version: 20.10.14
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc version: v1.0.3-0-gf46b6ba
 init version: de40ad0
 Security Options:
   Profile: default
 Kernel Version: 5.13.0-44-generic
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 15.55GiB
 Name: host
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
 Live Restore Enabled: false

