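After transferring some files from a local folder to an NFS folder, rsync goes into the "interruptible sleep" state. The folder I am trying to back up contains more than 180 GB of data.
This is the last output before rsync hangs: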
[sender] expand file_list pointer array to 524288 bytes, did move
I am running Ubuntu Server 14.04 LTS with rsync version 3.1.0 (protocol version 31), and I run rsync with the following options:
/usr/bin/rsync -rHAXxvvut --numeric-ids --progress {SRC_FOLDER} {NFS_FOLDER}
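Thanks for any hints.
Answer 1
Since rsync is open-source software, it is easy to get at the relevant source code.
After downloading the main .tar.gz and applying the Ubuntu patch (rsync_3.1.0-2ubuntu0.4.diff.gz), you have the code underlying the rsync you are running. Like this: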
$ mkdir rsync
$ cd rsync/
$ wget http://archive.ubuntu.com/ubuntu/pool/main/r/rsync/rsync_3.1.0.orig.tar.gz
$ wget http://archive.ubuntu.com/ubuntu/pool/main/r/rsync/rsync_3.1.0-2ubuntu0.4.diff.gz
$ gzip -d rsync_3.1.0-2ubuntu0.4.diff.gz
$ tar zxvf rsync_3.1.0.orig.tar.gz
$ cd rsync-3.1.0/
$ patch -p1 < ../rsync_3.1.0-2ubuntu0.4.diff
Now a simple grep quickly shows the context of your error message:
$ grep -r 'expand file_list pointer array to'
flist.c: rprintf(FCLIENT, "[%s] expand file_list pointer array to %s bytes, did%s move\n",
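So you are lucky: your error message appears in a single fragment of a single file, namely flist.c.
Let's take a look at it:
It is fairly easy to guess that the routine containing the error message (lines 325, 326, 327, 328) is named flist_expand, which sounds like something needed to make sure the whole file list (as seen by rsync) fits in a suitably sized in-memory structure. In other words, the more files you need to rsync, the more memory is needed to handle the rsync computation, and since the list is not known in advance, it has to be grown dynamically by allocating appropriately sized chunks of memory for the "list" (more or less).
So I would bet that your problem depends not on the size of the data you are rsyncing but on the number of files. I would try splitting your rsync into multiple sub-rsyncs, each focused on an inner subfolder.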
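The splitting idea can be sketched as a small shell function. This is only a sketch under assumptions: the name sync_by_subdir and the RSYNC_CMD override are hypothetical, the options are copied from the question (minus the verbose and progress flags), and -H cannot preserve hard links that span two subfolders, because each subfolder becomes a separate run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rsync): one rsync run per top-level
# subdirectory, so that each run builds a smaller file list.
# RSYNC_CMD is an assumed override, handy for a dry run with "echo".
sync_by_subdir() {
    local src=${1%/} dest=${2%/}
    local rsync_cmd=${RSYNC_CMD:-rsync}
    local dir name

    # Note: this glob skips dot-directories unless "shopt -s dotglob" is set.
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        name=$(basename "$dir")
        $rsync_cmd -rHAXxut --numeric-ids "$src/$name/" "$dest/$name/" || return 1
    done

    # Final pass without recursion (-d) to pick up files directly inside src.
    $rsync_cmd -dHAXxut --numeric-ids "$src/" "$dest/"
}
```

Running it as RSYNC_CMD=echo sync_by_subdir {SRC_FOLDER} {NFS_FOLDER} prints the rsync commands without executing them, which is a cheap way to review the split first.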
Actually, it would be worth investigating further:
- Line 328:
(new_ptr == flist->files) ? " not" : "");
- Line 334:
out_of_memory("flist_expand");
But this goes well beyond my initial goal :-)
Anyway, I bet that if you check your logs you will find some "out of memory" messages... :-)
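Checking the logs could look roughly like the sketch below. The function name find_oom_lines and the default log path /var/log/syslog are assumptions, and the patterns are a guess at what the kernel OOM killer and rsync's out_of_memory() path typically leave behind.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines that look like kernel OOM-killer
# activity or an "out of memory" failure reported by a program.
# Usage: find_oom_lines [LOGFILE...]   (assumed default: /var/log/syslog)
find_oom_lines() {
    [ "$#" -gt 0 ] || set -- /var/log/syslog
    # -i: case-insensitive, -E: extended regex with alternation
    grep -iE 'out of memory|oom-killer|oom_kill' -- "$@"
}
```

If nothing turns up, that is weak evidence against the memory theory, since the relevant log may already have been rotated away.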
HTH!
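Answer 2
Two suggestions that might help:
I have always been happy with the -a option (from the man page: "archive mode; equals -rlptgoD (no -H,-A,-X)").
rsync may be waiting for NFS to grant access to a file. It seems NFS is indeed able to lock rsync up (perhaps while a file is being overwritten), so it would be interesting to see which file rsync was accessing before it went to "sleep". There is a command to see which files rsync currently has open: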
lsof -ad3-999 -c rsync
(from Ask Ubuntu)
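Where lsof is not installed, roughly the same information can be read from /proc on Linux. The following is a minimal sketch, assuming a Linux /proc filesystem; the function name list_open_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list open file descriptors >= 3 of one process by
# reading /proc/PID/fd (Linux only), similar in spirit to "lsof -a -d3-999".
list_open_files() {
    local pid=$1 link fd
    for link in /proc/"$pid"/fd/*; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        fd=${link##*/}
        [ "$fd" -ge 3 ] || continue        # skip stdin, stdout, stderr
        printf '%s\t%s\n' "$fd" "$(readlink "$link")"
    done
}
```

Call it with the PID of the sleeping rsync process; reading another user's /proc/PID/fd usually requires root.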
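Answer 3
Using rsync to transfer content to an NFS folder can be very inefficient. Think about what is involved when rsync wants to get a checksum of a remote file or modify a remote file in place. It is much better to have rsync talk to an rsync process running on the file server. If possible, I would change that first and then see whether the current problem persists, i.e. use rsync over ssh or run an rsync daemon, leaving NFS out of the picture entirely.
To find out what rsync is doing, strace may be useful: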
strace -p <PID>
Or have strace start rsync like this:
strace rsync [rsync options] <src> <target>
Attaching to a running process by PID may require root privileges, but this can be changed (by root).
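On Ubuntu, the setting that controls attaching is most likely the Yama ptrace_scope sysctl; that is my assumption about what is meant here, not something stated above. A minimal sketch for checking it follows; the function name ptrace_scope_hint is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain the Yama ptrace_scope setting, which governs
# whether a non-root user may attach strace or gdb to a running process.
# The optional argument (an alternative file path) exists only for testing.
ptrace_scope_hint() {
    local file=${1:-/proc/sys/kernel/yama/ptrace_scope} scope
    if [ ! -r "$file" ]; then
        echo "Yama ptrace_scope is not available on this kernel"
        return 0
    fi
    read -r scope < "$file"
    case $scope in
        0) echo "0: you can attach to your own processes" ;;
        1) echo "1: you can only attach to your own descendants; use root for others" ;;
        2) echo "2: only root (CAP_SYS_PTRACE) can attach" ;;
        3) echo "3: attaching is disabled until reboot" ;;
        *) echo "unexpected value: $scope"; return 1 ;;
    esac
}
```

If it reports 1, root can relax the restriction with sysctl kernel.yama.ptrace_scope=0, at the cost of letting any of your processes trace the others.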
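Answer 4
I just ran into this problem while setting up rsnapshot, which uses rsync.
Right now I can reliably reproduce the problem when rsnapshot invokes rsync on a very large directory (47,616 directory entries) during the second rsnapshot run over the same directory. rsync is reading from, and writing to, a local disk formatted with ext4.
The first time I saw it, the stock /usr/bin/rsync binary (with symbols stripped) was being invoked, "rsync version 3.2.3 protocol version 31", the one shipped with my Xubuntu "22.04.1 LTS (Jammy Jellyfish)" release, and I was not running it under strace. With the symbols stripped, the gdb backtraces were undecipherable to me. Running strace on the three "hung" rsync processes (the parent, the sender, and the receiver) showed that, after printing "expand file_list pointer array", all three were looping in the select(2) system call with a 60-second timeout.
So I rebuilt the same rsync version 3.2.3 from source, repeated the same test (once I had figured out how to reproduce the problem), and got the following three gdb stack backtraces: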
============== Backtrace #1 ==============
#0 0x00007f69b109b74d in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffcd48d9c90, writefds=writefds@entry=0x7ffcd48d9d90, exceptfds=exceptfds@entry=0x7ffcd48d9d10, timeout=timeout@entry=0x7ffcd48d9c60)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000564b6ef0f544 in perform_io (needed=needed@entry=18, flags=flags@entry=2) at io.c:741
#2 0x0000564b6ef115f5 in write_buf (f=f@entry=4, buf=0x7ffcd48da019 "3dec1fdb7cfc0341_0", len=len@entry=18) at io.c:2125
#3 0x0000564b6eee9088 in send_file_entry (first_ndx=<optimized out>, ndx=<optimized out>, symlink_len=0, symlink_name=0x0, file=0x564b70627350,
fname=0x7ffcd48d9fd0 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", f=4) at flist.c:565
#4 send_file_name (f=f@entry=4, flist=flist@entry=0x564b6f6ae160, fname=fname@entry=0x7ffcd48dc150 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", stp=stp@entry=0x0, flags=flags@entry=65540,
filter_level=filter_level@entry=2) at flist.c:1604
#5 0x0000564b6eeea49d in send_directory (f=f@entry=4, flist=flist@entry=0x564b6f6ae160, fbuf=fbuf@entry=0x7ffcd48dc150 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", len=len@entry=72,
flags=flags@entry=65540) at flist.c:1839
#6 0x0000564b6eeea9b6 in send1extra (f=f@entry=4, file=file@entry=0x564b6f454a50, flist=flist@entry=0x564b6f6ae160) at flist.c:1992
#7 0x0000564b6eeeb21f in send_extra_file_list (f=f@entry=4, at_least=at_least@entry=1000) at flist.c:2078
#8 0x0000564b6eef8add in send_files (f_in=f_in@entry=5, f_out=f_out@entry=4) at sender.c:245
#9 0x0000564b6ef02b3c in client_run (f_in=5, f_out=4, pid=pid@entry=2780890, argc=argc@entry=1, argv=argv@entry=0x564b6efe65b0) at main.c:1317
#10 0x0000564b6eee2227 in start_client (argv=<optimized out>, argc=1) at main.c:1580
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
============== Backtrace #2 ==============
#0 0x00007f69b109b74d in __GI___select (nfds=nfds@entry=2, readfds=readfds@entry=0x7ffcd48d7cd0, writefds=writefds@entry=0x7ffcd48d7dd0, exceptfds=exceptfds@entry=0x7ffcd48d7d50, timeout=timeout@entry=0x7ffcd48d7ca0)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000564b6ef0f544 in perform_io (needed=76, flags=flags@entry=4) at io.c:741
#2 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=len@entry=69, convert=<optimized out>) at io.c:966
#3 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=69, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#4 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3f368 "[%s] expand file_list pointer array to %s bytes, did%s move\n") at log.c:442
#5 0x0000564b6eee4445 in flist_expand (extra=<optimized out>, flist=0x564b6f41b3e0) at flist.c:309
#6 flist_expand (flist=0x564b6f41b3e0, extra=<optimized out>) at flist.c:287
#7 0x0000564b6eeec9bd in recv_file_list (f=3, dir_ndx=dir_ndx@entry=2960) at flist.c:2584
#8 0x0000564b6ef13243 in wait_for_receiver () at io.c:1699
#9 0x0000564b6ef0f934 in wait_for_receiver () at io.c:1677
#10 perform_io (needed=89, flags=flags@entry=4) at io.c:862
#11 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=len@entry=82, convert=<optimized out>) at io.c:966
#12 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=82, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#13 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3fba5 "%s is uptodate\n") at log.c:442
#14 0x0000564b6eeee433 in set_file_attrs (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, sxp=<optimized out>, sxp@entry=0x7ffcd48dc010,
fnamecmp=fnamecmp@entry=0x0, flags=<optimized out>) at rsync.c:661
#15 0x0000564b6eef3b78 in recv_generator (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, ndx=44459, itemizing=itemizing@entry=1, code=code@entry=FLOG,
f_out=f_out@entry=1) at generator.c:1805
#16 0x0000564b6eef4c95 in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#17 0x0000564b6ef01f7c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1106
#18 0x0000564b6ef026cc in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1219
#19 start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1253
#20 0x0000564b6ef0281b in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1226
#21 0x0000564b6ef232b9 in local_child (argc=2, argv=argv@entry=0x7ffcd48df490, f_in=f_in@entry=0x7ffcd48df3f0, f_out=f_out@entry=0x7ffcd48df3f4, child_main=child_main@entry=0x564b6ef02800 <child_main>) at pipe.c:166
#22 0x0000564b6eee21d2 in do_cmd (f_out_p=0x7ffcd48df3f4, f_in_p=0x7ffcd48df3f0, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:651
#23 start_client (argv=<optimized out>, argc=1) at main.c:1569
#24 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
============== Backtrace #3 ==============
#0 0x00007f69b109b74d in __GI___select (nfds=nfds@entry=2, readfds=readfds@entry=0x7ffcd48d7cd0, writefds=writefds@entry=0x7ffcd48d7dd0, exceptfds=exceptfds@entry=0x7ffcd48d7d50, timeout=timeout@entry=0x7ffcd48d7ca0)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000564b6ef0f544 in perform_io (needed=76, flags=flags@entry=4) at io.c:741
#2 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=len@entry=69, convert=<optimized out>) at io.c:966
#3 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=69, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#4 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3f368 "[%s] expand file_list pointer array to %s bytes, did%s move\n") at log.c:442
#5 0x0000564b6eee4445 in flist_expand (extra=<optimized out>, flist=0x564b6f41b3e0) at flist.c:309
#6 flist_expand (flist=0x564b6f41b3e0, extra=<optimized out>) at flist.c:287
#7 0x0000564b6eeec9bd in recv_file_list (f=3, dir_ndx=dir_ndx@entry=2960) at flist.c:2584
#8 0x0000564b6ef13243 in wait_for_receiver () at io.c:1699
#9 0x0000564b6ef0f934 in wait_for_receiver () at io.c:1677
#10 perform_io (needed=89, flags=flags@entry=4) at io.c:862
#11 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=len@entry=82, convert=<optimized out>) at io.c:966
#12 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=82, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#13 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3fba5 "%s is uptodate\n") at log.c:442
#14 0x0000564b6eeee433 in set_file_attrs (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, sxp=<optimized out>, sxp@entry=0x7ffcd48dc010,
fnamecmp=fnamecmp@entry=0x0, flags=<optimized out>) at rsync.c:661
#15 0x0000564b6eef3b78 in recv_generator (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, ndx=44459, itemizing=itemizing@entry=1, code=code@entry=FLOG,
f_out=f_out@entry=1) at generator.c:1805
#16 0x0000564b6eef4c95 in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#17 0x0000564b6ef01f7c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1106
#18 0x0000564b6ef026cc in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1219
#19 start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1253
#20 0x0000564b6ef0281b in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1226
#21 0x0000564b6ef232b9 in local_child (argc=2, argv=argv@entry=0x7ffcd48df490, f_in=f_in@entry=0x7ffcd48df3f0, f_out=f_out@entry=0x7ffcd48df3f4, child_main=child_main@entry=0x564b6ef02800 <child_main>) at pipe.c:166
#22 0x0000564b6eee21d2 in do_cmd (f_out_p=0x7ffcd48df3f4, f_in_p=0x7ffcd48df3f0, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:651
#23 start_client (argv=<optimized out>, argc=1) at main.c:1569
#24 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
===
Note that the expected message is being printed in stack backtrace #2, and that recv_file_list() and flist_expand() are on that stack.
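Backtraces like the ones above can be collected with gdb in batch mode. This is a sketch, not the exact procedure used for this post; the function name backtrace_pids and the GDB_CMD override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a gdb backtrace for each PID given.
# GDB_CMD is an assumed override, handy for a dry run with "echo".
backtrace_pids() {
    local gdb_cmd=${GDB_CMD:-gdb} pid n=0
    for pid in "$@"; do
        n=$((n + 1))
        printf '============== Backtrace #%d (PID %s) ==============\n' "$n" "$pid"
        # -batch exits after running the commands; -ex bt prints the stack.
        $gdb_cmd -batch -ex bt -p "$pid"
    done
}
```

For example, backtrace_pids $(pgrep -x rsync) covers every running rsync process. The output is only readable if the binary has debug symbols, and attaching may require root.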
These rsync instances are invoked with more or less the following command-line options, which I have configured rsnapshot to use:
rsync -av --delete-during --archive --verbose --one-file-system --hard-links --xattrs --sparse src_dir dest_dir
I will see whether I can debug this further; if I can, I will post again here.