How can I efficiently copy files from machineB and machineC to machineA using rsync or scp?

Answer 1

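The main problem with your script is that you open a separate scp connection for every file, which adds a lot of unnecessary overhead. You could try something like this: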

#!/usr/bin/env bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot

PRIMARY_PARTITION=(0 548 272 4 544 276 8 556 280 12 552 284 16 256 564 20 260 560 24 264 572)
SECONDARY_PARTITION=(1101 1374 1641 1371 1647 1098 1635 1365 1095 1638 1089 1362 1659 1359)

## Quote the remote command so the glob is expanded on the remote host, not locally.
dir1=$(ssh -o "StrictHostKeyChecking no" "david@${FILERS_LOCATION[0]}" "ls -dt1 $MEMORY_MAPPED_LOCATION/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" "david@${FILERS_LOCATION[1]}" "ls -dt1 $MEMORY_MAPPED_LOCATION/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" | head -n1)

## Build the lists of remote paths before calling rsync. Arrays keep every
## path a separate argument; a single quoted string would reach rsync as one
## argument with embedded spaces.
primary_files=()
for n in "${PRIMARY_PARTITION[@]}"
do
    primary_files+=(":$dir1/t1_weekly_1680_${n}_200003_5.data")
done

## Repeat for $SECONDARY_PARTITION
secondary_files=()
for n in "${SECONDARY_PARTITION[@]}"
do
    secondary_files+=(":$dir2/t1_weekly_1680_${n}_200003_5.data")
done

if [ "$dir1" = "$dir2" ]
then
    ## I am using find largely because the *
    ## in rm -rf "$PRIMARY"/* screws up the syntax
    ## highlighting on the site, and it is a good habit to
    ## get into anyway. Feel free to use rm -rf in your script.
    find "$PRIMARY" -mindepth 1 -delete
    find "$SECONDARY" -mindepth 1 -delete

    ## rsync can fetch several files from one host with this format:
    ##   rsync user@host:/source/path1 :/source/path2 :/source/pathN /dest/path
    ##
    ## which is why every path in the arrays above starts with ":" and only the
    ## first one gets the user@host prefix. Each command below opens a single
    ## connection for its whole file list, so there are only 2 connections per
    ## list (one per host). First you try to copy all primary files from
    ## machineB, then all of them from machineC. rsync will complain about files
    ## that do not exist on a given host (which is why standard error is
    ## redirected to /dev/null) but will continue with the rest.
    rsync -avz "david@${FILERS_LOCATION[0]}${primary_files[0]}" "${primary_files[@]:1}" "$PRIMARY/" 2>/dev/null
    rsync -avz "david@${FILERS_LOCATION[1]}${primary_files[0]}" "${primary_files[@]:1}" "$PRIMARY/" 2>/dev/null

    ## Do the same for the secondary files
    rsync -avz "david@${FILERS_LOCATION[0]}${secondary_files[0]}" "${secondary_files[@]:1}" "$SECONDARY/" 2>/dev/null
    rsync -avz "david@${FILERS_LOCATION[1]}${secondary_files[0]}" "${secondary_files[@]:1}" "$SECONDARY/" 2>/dev/null
fi
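If you want to check the argument list before running the real transfer, a small helper can print it one argument per line. This is a minimal sketch: the function name `build_rsync_sources` and the snapshot directory `20140101` are hypothetical, and the helper only assembles the `user@host:path1 :path2 ...` form described in the comments above, keeping each path a separate array element.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rsync): print the source arguments for one
# rsync call that fetches several files from the same host, one per line.
# Usage: build_rsync_sources USER@HOST DIR PARTITION...
build_rsync_sources() {
    local remote=$1 dir=$2
    shift 2
    (( $# > 0 )) || return 1          # need at least one partition number
    local args=() n
    for n in "$@"; do
        args+=(":$dir/t1_weekly_1680_${n}_200003_5.data")
    done
    # Only the first source carries USER@HOST; the others start with ":".
    args[0]="$remote${args[0]}"
    printf '%s\n' "${args[@]}"
}

# Dry run with a made-up snapshot directory: print instead of transferring.
build_rsync_sources david@machineB /data/pe_t1_snapshot/20140101 0 548 272
```

The dry run prints `david@machineB:/data/pe_t1_snapshot/20140101/t1_weekly_1680_0_200003_5.data` followed by two lines that start with `:`. To hand the result to rsync you could read it back into an array (bash 4 or later), e.g. `mapfile -t srcs < <(build_rsync_sources "david@${FILERS_LOCATION[0]}" "$dir1" "${PRIMARY_PARTITION[@]}")`, and then run `rsync -avz "${srcs[@]}" "$PRIMARY/"`.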

Answer 2

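rsync takes care of copying only the files that have changed, ignoring files you don't want copied (for example with the -C switch, which excludes the same files that CVS ignores in its repositories, although you can specify any patterns you like), and recursively copying an entire directory tree (transferring only the changes that are needed, of course, not everything). It can optionally compress the stream, which can speed up transfers over slow links. It is also faster because it does the whole copy over a single connection.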
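To make "copying only the files that have changed" concrete: by default rsync decides which files to send with a "quick check" that compares file size and modification time. The sketch below approximates only that decision in bash; the function name `needs_copy` is hypothetical, and this is not rsync's code (rsync additionally uses its delta-transfer algorithm to send only the changed parts of a file, which is not modelled here).

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating rsync's default "quick check": a file is
# sent only if it is missing at the destination or its size or modification
# time differs. Returns 0 (copy) or 1 (skip).
needs_copy() {
    local src=$1 dst=$2
    [ -e "$dst" ] || return 0                                    # missing
    [ "$(wc -c < "$src")" -eq "$(wc -c < "$dst")" ] || return 0  # size differs
    if [ "$src" -nt "$dst" ] || [ "$src" -ot "$dst" ]; then      # mtime differs
        return 0
    fi
    return 1                                                     # unchanged: skip
}

# Demo in a throwaway directory.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/src"
cp "$tmp/src" "$tmp/dst"
touch -r "$tmp/src" "$tmp/dst"           # give dst exactly the same mtime
needs_copy "$tmp/src" "$tmp/dst" && echo copy || echo skip
touch -t 202001010000 "$tmp/dst"         # make dst look stale
needs_copy "$tmp/src" "$tmp/dst" && echo copy || echo skip
rm -rf "$tmp"
```

The demo prints `skip` for the identical copy and then `copy` once the destination's timestamp no longer matches.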

Since you are only copying a single file, most of these features won't come into play. You would use

rsync -avz "$firstfile" "$secondfile"

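This is exactly the same as scp apart from the flags (-a is archive mode, which preserves permissions and timestamps among other things; -v makes the output verbose; -z enables compression).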

However, scp can compress too:

scp -p -C …

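Here -p preserves modification times and file modes, and -C enables compression. I think this is the simplest solution here: just add one flag and you're done.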
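Whether -C (or rsync's -z) actually speeds things up depends on how compressible the data is and on whether the network, rather than the CPU, is the bottleneck; already-compressed data gains almost nothing. A rough local check is to see how much gzip shrinks a representative file. This sketch assumes gzip's ratio is a reasonable proxy for the zlib-style compression ssh and rsync apply to the stream (recent rsync versions can also negotiate other algorithms such as zstd); the function name `compressed_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the gzip-compressed size of a file as a whole
# percentage of its original size (lower means compression helps more).
compressed_pct() {
    local f=$1 raw packed
    raw=$(wc -c < "$f")
    (( raw > 0 )) || { echo 100; return; }     # empty file: nothing to gain
    packed=$(gzip -c "$f" | wc -c)
    echo $(( packed * 100 / raw ))
}

# Demo: repetitive text shrinks a lot, random bytes hardly at all.
tmp=$(mktemp -d)
printf 'some repetitive line\n%.0s' {1..5000} > "$tmp/text"
head -c 100000 /dev/urandom > "$tmp/random"
echo "text:   $(compressed_pct "$tmp/text")%"
echo "random: $(compressed_pct "$tmp/random")%"
rm -rf "$tmp"
```

Running it on one of your actual .data files gives a quick hint of whether adding the compression flag is worth it for your transfers.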
