Rsync hangs: expand file_list pointer array to N bytes, did move

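After transferring some files from a local folder to an NFS folder, rsync goes into the "interruptible sleep" state. The folder I am trying to back up contains more than 180 GB of data.

This is the output just before rsync hangs: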

[sender] expand file_list pointer array to 524288 bytes, did move

I am running Ubuntu Server 14.04 LTS with rsync version 3.1.0 protocol version 31, and I invoke rsync with the following options:

/usr/bin/rsync -rHAXxvvut --numeric-ids --progress {SRC_FOLDER} {NFS_FOLDER}

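Thanks for any hints.

Answer 1

Since rsync is open source software, the relevant source code is easy to get hold of.

After downloading the main .tar.gz and applying the Ubuntu patch (rsync_3.1.0-2ubuntu0.4.diff.gz), you have the code underlying the rsync you are using. Like this: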

$ mkdir rsync
$ cd rsync/
$ wget http://archive.ubuntu.com/ubuntu/pool/main/r/rsync/rsync_3.1.0.orig.tar.gz
$ wget http://archive.ubuntu.com/ubuntu/pool/main/r/rsync/rsync_3.1.0-2ubuntu0.4.diff.gz
$ gzip -d rsync_3.1.0-2ubuntu0.4.diff.gz
$ tar zxvf rsync_3.1.0.orig.tar.gz 
$ cd rsync-3.1.0/
$ patch -p1 < ../rsync_3.1.0-2ubuntu0.4.diff

Now a simple grep quickly tells us the context of your error message:

$ grep -r 'expand file_list pointer array to' 
flist.c:        rprintf(FCLIENT, "[%s] expand file_list pointer array to %s bytes, did%s move\n",

So you are lucky: your error message is used in a single place in a single file, namely flist.c.

Let's take a look:

[screenshot: flist.c context]

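It is fairly easy to guess that the routine containing the error message (lines 325, 326, 327, 328) is named flist_expand, which sounds like something needed to make sure the whole file list (to be rsynced) fits in a properly sized in-memory structure. In other words: the more files you need to rsync, the more memory is needed to handle the rsync computation, and since the list is not known in advance, it has to be sized dynamically by allocating suitable chunks of memory for the "list" [more or less].

So I would bet that your problem depends not on the size of the data you are rsyncing but on the number of files. I would try splitting your rsync into several sub-rsyncs, each focused on an inner subfolder.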
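To check the file-count theory and try the suggested split, something along these lines may help. It is only a sketch under assumptions: count_entries and print_split_rsync are hypothetical helper names, the options are copied from the question, and the commands are printed rather than run so they can be reviewed first.

```shell
# Sketch only: count_entries and print_split_rsync are hypothetical helper names.

# Count every entry (files, directories, links) below a directory.
count_entries() {
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

# Print, without running, one rsync command per top-level subdirectory of SRC.
# Files directly inside SRC and hidden (dot) directories are not covered here.
# Paths are printed inside double quotes; names containing quotes, $ or
# backticks would need extra escaping before the output is executed.
print_split_rsync() (
    src=${1%/}
    dest=${2%/}
    for sub in "$src"/*/; do
        [ -d "$sub" ] || continue      # the glob matched nothing
        sub=${sub%/}
        printf 'rsync -rHAXxvvut --numeric-ids --progress "%s/" "%s/%s/"\n' \
            "$sub" "$dest" "${sub##*/}"
    done
)
```

A possible use is `count_entries {SRC_FOLDER}` followed by `print_split_rsync {SRC_FOLDER} {NFS_FOLDER}`. Note that with -H, hard links that span two subfolders are no longer preserved once the transfer is split, because each rsync run only sees its own subfolder.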

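Actually, it would be worth investigating these more closely:

  1. line 328: (new_ptr == flist->files) ? " not" : "");
  2. line 334: out_of_memory("flist_expand");

But this goes well beyond my initial goal :-)

Anyway, I bet that if you check your logs you will find some "out of memory" messages... :-)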
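One way to look for such messages is a case-insensitive search of the system logs. This is a sketch: find_oom_lines is a hypothetical helper name, and the log paths in the usage note are the usual Ubuntu locations, which is an assumption about this particular server.

```shell
# Sketch only: find_oom_lines is a hypothetical helper name.
# Print every line that mentions "out of memory" (any letter case)
# from the log files given as arguments; unreadable files are skipped quietly.
find_oom_lines() {
    grep -i -h 'out of memory' "$@" 2>/dev/null
}
```

For example, `find_oom_lines /var/log/syslog /var/log/kern.log`, or `dmesg | grep -i 'out of memory'` for the kernel ring buffer. No output means no matching line was found in those files, which does not by itself rule out memory pressure.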

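HTH!

Answer 2

Two suggestions that might help:

  • I have always been happy with the -a option (from the man page: "archive mode; equals -rlptgoD (no -H,-A,-X)").

  • rsync may be waiting for NFS to grant access to a file. It seems NFS really can lock up rsync (perhaps while it is overwriting some file), so it would be interesting to see which file rsync was accessing before it went to "sleep". There is a command that shows which files rsync currently has open: lsof -ad3-999 -c rsync (from Ask Ubuntu).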

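Answer 3

Using rsync to transfer content to an NFS folder can be very inefficient. Think about what is involved when rsync wants to checksum a remote file or modify a remote file in place. It is much better to have rsync talk to an rsync process running on the file server. If possible, I would change that first and then see whether the current problem persists, i.e. use rsync over ssh or run an rsync daemon, leaving NFS out of the picture entirely.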
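As a rough illustration of the two non-NFS forms, the sketch below prints the corresponding command lines without running them. Everything specific in it is an assumption: print_remote_rsync is a hypothetical helper, and the host, remote path and daemon module name are placeholders. It also assumes rsync is installed on the file server.

```shell
# Sketch only: print_remote_rsync, the host, the path and the module name
# are hypothetical placeholders.
# Print, without running, the ssh form and the daemon form of the transfer.
print_remote_rsync() (
    src=$1
    host=$2
    path=$3
    module=$4
    opts='-rHAXxut --numeric-ids'
    # rsync over ssh: a single colon separates host and remote path
    printf 'rsync %s -e ssh "%s" "%s:%s"\n' "$opts" "$src" "$host" "$path"
    # rsync daemon: a double colon separates host and module name
    printf 'rsync %s "%s" "%s::%s"\n' "$opts" "$src" "$host" "$module"
)
```

For example, `print_remote_rsync /data/ fileserver /srv/backup/ backup` prints one command of each kind. In both forms the receiving side is a local rsync process on the server, so checksums and in-place updates no longer travel over NFS.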

Using strace to see what rsync is doing may be useful:

strace -p <PID>

Or have strace start rsync, as follows:

strace rsync [rsync options] <src> <target>

Attaching to a running process by PID may require root privileges, but this can be changed (by root).
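On Ubuntu the restriction is most likely the Yama ptrace_scope setting; that is an assumption about what is meant here. The sketch below reads the current value and describes it. describe_ptrace_scope is a hypothetical helper name, and the descriptions are short paraphrases of the kernel documentation.

```shell
# Sketch only: describe_ptrace_scope is a hypothetical helper name.
# Map a Yama ptrace_scope value to a short description.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: any process of the same user can be attached to" ;;
        1) echo "restricted: only descendants can be attached to without root" ;;
        2) echo "admin-only: attaching needs CAP_SYS_PTRACE" ;;
        3) echo "disabled: no attaching until reboot" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Show the current setting when the Yama file exists on this system.
scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    describe_ptrace_scope "$(cat "$scope_file")"
fi
```

Root can relax the setting until the next reboot with `sudo sysctl -w kernel.yama.ptrace_scope=0`. Running `sudo strace -p <PID>` instead leaves the setting untouched.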

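Answer 4

I just ran into this problem while setting up rsnapshot, which uses rsync.

I can now reproduce the problem reliably: it happens when rsnapshot invokes rsync on a very large directory (47,616 directory entries) during the second rsnapshot run over the same directory. rsync reads from a local disk formatted with ext4 and writes to that disk.

The first time I saw it, the stock /usr/bin/rsync binary (with symbols stripped) was being invoked, "rsync version 3.2.3 protocol version 31", which is the one shipped in my Xubuntu "22.04.1 LTS (Jammy Jellyfish)" release, and I was not running it under strace. As a result, the gdb backtraces were indecipherable to me. Running strace on the three "hung" rsync processes (the parent, the sender and the receiver) showed that, after printing "expand file_list pointer array", all three were looping on the select(2) system call with a 60-second timeout.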
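Inspecting every hung rsync process in one go could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used in this answer: dump_rsync_backtraces is a hypothetical helper name, and it assumes that gdb and pgrep are installed, that the rsync binary carries debug symbols, and that the caller is allowed to attach (typically root).

```shell
# Sketch only: dump_rsync_backtraces is a hypothetical helper name.
# Print a gdb backtrace for every process whose name is exactly "rsync".
dump_rsync_backtraces() {
    for pid in $(pgrep -x rsync); do
        echo "============== Backtrace for PID $pid =============="
        # -batch runs the given commands and then exits, detaching from the process
        gdb -p "$pid" -batch -ex bt 2>/dev/null
    done
}
```

Calling `dump_rsync_backtraces > rsync-backtraces.txt` while the transfer is stuck saves one backtrace per process. With a stripped binary the frames show only addresses, which is why a build with symbols is needed to get readable output.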

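So I rebuilt the same rsync version 3.2.3 from source, repeated the same test (once I had worked out how to reproduce the problem), and got the following three gdb stack backtraces: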

============== Backtrace #1 ==============
#0  0x00007f69b109b74d in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffcd48d9c90, writefds=writefds@entry=0x7ffcd48d9d90, exceptfds=exceptfds@entry=0x7ffcd48d9d10, timeout=timeout@entry=0x7ffcd48d9c60)
at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x0000564b6ef0f544 in perform_io (needed=needed@entry=18, flags=flags@entry=2) at io.c:741
#2  0x0000564b6ef115f5 in write_buf (f=f@entry=4, buf=0x7ffcd48da019 "3dec1fdb7cfc0341_0", len=len@entry=18) at io.c:2125
#3  0x0000564b6eee9088 in send_file_entry (first_ndx=<optimized out>, ndx=<optimized out>, symlink_len=0, symlink_name=0x0, file=0x564b70627350, 
    fname=0x7ffcd48d9fd0 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", f=4) at flist.c:565
#4  send_file_name (f=f@entry=4, flist=flist@entry=0x564b6f6ae160, fname=fname@entry=0x7ffcd48dc150 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", stp=stp@entry=0x0, flags=flags@entry=65540, 
    filter_level=filter_level@entry=2) at flist.c:1604
#5  0x0000564b6eeea49d in send_directory (f=f@entry=4, flist=flist@entry=0x564b6f6ae160, fbuf=fbuf@entry=0x7ffcd48dc150 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", len=len@entry=72, 
    flags=flags@entry=65540) at flist.c:1839
#6  0x0000564b6eeea9b6 in send1extra (f=f@entry=4, file=file@entry=0x564b6f454a50, flist=flist@entry=0x564b6f6ae160) at flist.c:1992
#7  0x0000564b6eeeb21f in send_extra_file_list (f=f@entry=4, at_least=at_least@entry=1000) at flist.c:2078
#8  0x0000564b6eef8add in send_files (f_in=f_in@entry=5, f_out=f_out@entry=4) at sender.c:245
#9  0x0000564b6ef02b3c in client_run (f_in=5, f_out=4, pid=pid@entry=2780890, argc=argc@entry=1, argv=argv@entry=0x564b6efe65b0) at main.c:1317
#10 0x0000564b6eee2227 in start_client (argv=<optimized out>, argc=1) at main.c:1580
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812

    
============== Backtrace #2 ==============

#0  0x00007f69b109b74d in __GI___select (nfds=nfds@entry=2, readfds=readfds@entry=0x7ffcd48d7cd0, writefds=writefds@entry=0x7ffcd48d7dd0, exceptfds=exceptfds@entry=0x7ffcd48d7d50, timeout=timeout@entry=0x7ffcd48d7ca0)
    at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x0000564b6ef0f544 in perform_io (needed=76, flags=flags@entry=4) at io.c:741
#2  0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=len@entry=69, convert=<optimized out>) at io.c:966
#3  0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=69, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#4  0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3f368 "[%s] expand file_list pointer array to %s bytes, did%s move\n") at log.c:442
#5  0x0000564b6eee4445 in flist_expand (extra=<optimized out>, flist=0x564b6f41b3e0) at flist.c:309
#6  flist_expand (flist=0x564b6f41b3e0, extra=<optimized out>) at flist.c:287
#7  0x0000564b6eeec9bd in recv_file_list (f=3, dir_ndx=dir_ndx@entry=2960) at flist.c:2584
#8  0x0000564b6ef13243 in wait_for_receiver () at io.c:1699
#9  0x0000564b6ef0f934 in wait_for_receiver () at io.c:1677
#10 perform_io (needed=89, flags=flags@entry=4) at io.c:862
#11 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=len@entry=82, convert=<optimized out>) at io.c:966
#12 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=82, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#13 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3fba5 "%s is uptodate\n") at log.c:442
#14 0x0000564b6eeee433 in set_file_attrs (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, sxp=<optimized out>, sxp@entry=0x7ffcd48dc010, 
    fnamecmp=fnamecmp@entry=0x0, flags=<optimized out>) at rsync.c:661
#15 0x0000564b6eef3b78 in recv_generator (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, ndx=44459, itemizing=itemizing@entry=1, code=code@entry=FLOG, 
    f_out=f_out@entry=1) at generator.c:1805
#16 0x0000564b6eef4c95 in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#17 0x0000564b6ef01f7c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1106
#18 0x0000564b6ef026cc in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1219
#19 start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1253
#20 0x0000564b6ef0281b in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1226
#21 0x0000564b6ef232b9 in local_child (argc=2, argv=argv@entry=0x7ffcd48df490, f_in=f_in@entry=0x7ffcd48df3f0, f_out=f_out@entry=0x7ffcd48df3f4, child_main=child_main@entry=0x564b6ef02800 <child_main>) at pipe.c:166
#22 0x0000564b6eee21d2 in do_cmd (f_out_p=0x7ffcd48df3f4, f_in_p=0x7ffcd48df3f0, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:651
#23 start_client (argv=<optimized out>, argc=1) at main.c:1569
#24 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812


============== Backtrace #3 ==============

#0  0x00007f69b109b74d in __GI___select (nfds=nfds@entry=2, readfds=readfds@entry=0x7ffcd48d7cd0, writefds=writefds@entry=0x7ffcd48d7dd0, exceptfds=exceptfds@entry=0x7ffcd48d7d50, timeout=timeout@entry=0x7ffcd48d7ca0)
    at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x0000564b6ef0f544 in perform_io (needed=76, flags=flags@entry=4) at io.c:741
#2  0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=len@entry=69, convert=<optimized out>) at io.c:966
#3  0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=69, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#4  0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3f368 "[%s] expand file_list pointer array to %s bytes, did%s move\n") at log.c:442
#5  0x0000564b6eee4445 in flist_expand (extra=<optimized out>, flist=0x564b6f41b3e0) at flist.c:309
#6  flist_expand (flist=0x564b6f41b3e0, extra=<optimized out>) at flist.c:287
#7  0x0000564b6eeec9bd in recv_file_list (f=3, dir_ndx=dir_ndx@entry=2960) at flist.c:2584
#8  0x0000564b6ef13243 in wait_for_receiver () at io.c:1699
#9  0x0000564b6ef0f934 in wait_for_receiver () at io.c:1677
#10 perform_io (needed=89, flags=flags@entry=4) at io.c:862
#11 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=len@entry=82, convert=<optimized out>) at io.c:966
#12 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=82, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#13 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3fba5 "%s is uptodate\n") at log.c:442
#14 0x0000564b6eeee433 in set_file_attrs (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, sxp=<optimized out>, sxp@entry=0x7ffcd48dc010, 
    fnamecmp=fnamecmp@entry=0x0, flags=<optimized out>) at rsync.c:661
#15 0x0000564b6eef3b78 in recv_generator (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, ndx=44459, itemizing=itemizing@entry=1, code=code@entry=FLOG, 
    f_out=f_out@entry=1) at generator.c:1805
#16 0x0000564b6eef4c95 in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#17 0x0000564b6ef01f7c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1106
#18 0x0000564b6ef026cc in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1219
#19 start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1253
#20 0x0000564b6ef0281b in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1226
#21 0x0000564b6ef232b9 in local_child (argc=2, argv=argv@entry=0x7ffcd48df490, f_in=f_in@entry=0x7ffcd48df3f0, f_out=f_out@entry=0x7ffcd48df3f4, child_main=child_main@entry=0x564b6ef02800 <child_main>) at pipe.c:166
#22 0x0000564b6eee21d2 in do_cmd (f_out_p=0x7ffcd48df3f4, f_in_p=0x7ffcd48df3f0, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:651
#23 start_client (argv=<optimized out>, argc=1) at main.c:1569
#24 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812

===

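Note that the expected error message is being printed in stack backtrace #2, and that recv_file_list() and flist_expand() are on that stack.

These rsync instances are invoked with more or less the following command-line options, which I have configured rsnapshot to use: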

rsync -av --delete-during --archive --verbose --one-file-system --hard-links --xattrs --sparse src_dir dest_dir

I will see whether I can debug this further, and if I can, I will post again here.
