How do I use s3cmd to sync files to Amazon S3, check that they were sent, and delete them locally?

I'm trying to use the Amazon S3 service to store logs from my application. The output of /usr/bin/s3cmd --help tells me what I need to know about sending files:

s3cmd --help
usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

options:
  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool.
  -c FILE, --config=FILE
                        Config file name. Defaults to
                        /home/valter.silva/.s3cfg
  --dump-config         Dump current configuration after parsing config files
                        and command line options and exit.
  -n, --dry-run         Only show what should be uploaded or downloaded but
                        don't actually do it. May still perform S3 requests to
                        get bucket listings and other information though (only
                        for file transfer commands)
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  --continue            Continue getting a partially downloaded file (only for
                        [get] command).
  --skip-existing       Skip over files that exist at the destination (only
                        for [get] and [sync] commands).
  -r, --recursive       Recursive upload, download or removal.
  --check-md5           Check MD5 sums when comparing files for [sync].
                        (default)
  --no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you
                        only.
  --acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID
                        Grant stated permission to a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --acl-revoke=PERMISSION:USER_CANONICAL_ID
                        Revoke stated permission for a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --delete-removed      Delete remote objects with no corresponding local file
                        [sync]
  --no-delete-removed   Don't delete remote objects.
  -p, --preserve        Preserve filesystem attributes (mode, ownership,
                        timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded
                        from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular
                        expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --include=GLOB        Filenames and paths matching GLOB will be included
                        even if previously excluded by one of
                        --(r)exclude(-from) patterns
  --include-from=FILE   Read --include GLOBs from FILE
  --rinclude=REGEXP     Same as --include but uses REGEXP (regular expression)
                        instead of GLOB
  --rinclude-from=FILE  Read --rinclude REGEXPs from FILE
  --bucket-location=BUCKET_LOCATION
                        Datacentre to create bucket in. As of now the
                        datacenters are: US (default), EU, us-west-1, and ap-
                        southeast-1
  --reduced-redundancy, --rr
                        Store object with 'Reduced redundancy'. Lower per-GB
                        price. [put, cp, mv]
  --access-logging-target-prefix=LOG_TARGET_PREFIX
                        Target prefix for access logs (S3 URI) (for [cfmodify]
                        and [accesslog] commands)
  --no-access-logging   Disable access logging (for [cfmodify] and [accesslog]
                        commands)
  -m MIME/TYPE, --mime-type=MIME/TYPE
                        Default MIME-type to be set for objects stored.
  -M, --guess-mime-type
                        Guess MIME-type of files by their extension. Falls
                        back to default MIME-Type as specified by --mime-type
                        option
  --add-header=NAME:VALUE
                        Add a given HTTP header to the upload request. Can be
                        used multiple times. For instance set 'Expires' or
                        'Cache-Control' headers (or both) using this options
                        if you like.
  --encoding=ENCODING   Override autodetected terminal and filesystem encoding
                        (character set). Autodetected: UTF-8
  --verbatim            Use the S3 name as given on the command line. No pre-
                        processing, encoding, etc. Use with caution!
  --list-md5            Include MD5 sums in bucket listings (only for 'ls'
                        command).
  -H, --human-readable-sizes
                        Print sizes in human readable form (eg 1kB instead of
                        1234).
  --progress            Display progress meter (default on TTY).
  --no-progress         Don't display progress meter (default on non-TTY).
  --enable              Enable given CloudFront distribution (only for
                        [cfmodify] command)
  --disable             Disable given CloudFront distribution (only for
                        [cfmodify] command)
  --cf-add-cname=CNAME  Add given CNAME to a CloudFront distribution (only for
                        [cfcreate] and [cfmodify] commands)
  --cf-remove-cname=CNAME
                        Remove given CNAME from a CloudFront distribution
                        (only for [cfmodify] command)
  --cf-comment=COMMENT  Set COMMENT for a given CloudFront distribution (only
                        for [cfcreate] and [cfmodify] commands)
  --cf-default-root-object=DEFAULT_ROOT_OBJECT
                        Set the default root object to return when no object
                        is specified in the URL. Use a relative path, i.e.
                        default/index.html instead of /default/index.html or
                        s3://bucket/default/index.html (only for [cfcreate]
                        and [cfmodify] commands)
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (1.0.0) and exit.
  -F, --follow-symlinks
                        Follow symbolic links as if they are regular files

But it doesn't say how to check whether a file was actually sent, nor how to delete the sent file afterwards. Should I check via MD5 and delete the local copy with some shell script?

Answer 1

Incidentally, I needed to do something similar and wrote the following bash script. Here is what it does:

  1. Uses find to get a list of files in the directory older than $MINUTES minutes
  2. Uses lsof to check whether each file is open (this may be wrong if the file is open in an editor)
  3. Uses s3cmd to copy the file into an S3 bucket
  4. Compares the MD5 sums of the remote file in S3 and the local file; if they match, deletes the local file

#!/bin/bash
MINUTES=60
TARGET_DIR="s3://AWSbucketname/subfolder/`hostname -s`/"
LOCAL_DIR="/path/to/folder"
FILES=()

echo ""
echo "About to upload files in $LOCAL_DIR up to S3 folder:"
echo "    $TARGET_DIR"
echo "Then delete if MD5 sums line up."
echo "Starting in 5 seconds..."
sleep 5

cd "$LOCAL_DIR" || exit 1

# Throw the list of files that the find command gets into an array
while IFS= read -r -d '' file ; do
    FILES+=("$file")
done < <(find "$LOCAL_DIR" -name '*.wav' -mmin +"$MINUTES" -print0)

# echo "${FILES[@]}"   # DEBUG

for local_file in "${FILES[@]}"
do
    # Check that the file in question is not open.
    # lsof returns non-zero return value for file not in use
    lsof "$local_file" > /dev/null 2>&1
    if test $? -ne 0 ; then
        echo ""
        echo "$local_file isn't open. Copying to S3..."
        s3cmd -p put "$local_file" "$TARGET_DIR"
        # s3cmd -n put "$local_file" "$TARGET_DIR" # DEBUG - dry-run

        ## Now attempt to delete if the MD5 sums check out:

        remote_file=${local_file##*/}
        md5sum_remote=`s3cmd info "$TARGET_DIR$remote_file" | grep MD5 | awk '{print $3}'`
        md5sum_local=`md5sum "$local_file" | awk '{print $1}'`
        if [[ "$md5sum_remote" == "$md5sum_local" ]]; then
          echo "$remote_file MD5 sum checks out. Deleting..."
          rm "$local_file"
        fi
    fi
done
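
If you want this to run unattended, a cron entry is a natural fit. A minimal sketch, assuming the script above is saved as /usr/local/bin/s3-upload-and-prune.sh (hypothetical path) and marked executable:

# Run hourly; append stdout/stderr to a log for later inspection
0 * * * * /usr/local/bin/s3-upload-and-prune.sh >> /var/log/s3-upload.log 2>&1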

Answer 2

After a while I was able to put together a bash script that checks the md5sum of both the s3 copy and my local files, and deletes the local files that already exist on amazon s3:

#!/bin/bash
datacenter="amazon"
hostname=$(hostname)
path="backup/server245"

# List the remote objects together with their MD5 sums
s3=$(s3cmd ls --list-md5 -H "s3://company-backup/company/$datacenter/$hostname/$path/")

# Reduce each line to "md5 filename" (field 4 is the MD5, field 5 the URL;
# the sed strips everything up to the last slash of the URL)
s3_list=$(echo "$s3" | awk '{print $4" "$5}' | sed 's= .*/= =')

# Same "md5 filename" shape for the local files
locally=$(md5sum /"$path"/*.gz)
locally_list=$(echo "$locally" | sed 's= .*/= =')
#echo "$locally_list"

IFS=$'\n'
for i in $locally_list
do
  #echo "$i"
  locally_hash=$(echo "$i" | awk '{print $1}')
  locally_file=$(echo "$i" | awk '{print $2}')

  for j in $s3_list
  do
    s3_hash=$(echo "$j" | awk '{print $1}')
    s3_file=$(echo "$j" | awk '{print $2}')

    # Skip entries with an empty name, e.g. the listing line for the
    # folder itself, which only carries a hash
    if [[ $s3_hash != "" ]] && [[ $s3_file != "" ]]; then
      if [[ $s3_hash == "$locally_hash" ]] && [[ $s3_file == "$locally_file" ]]; then
        echo "### REMOVING ###"
        echo "$locally_file"
        #rm /"$path"/"$locally_file"
      fi
    fi
  done
done
unset IFS
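
For reference, s3cmd ls --list-md5 -H prints one line per object in roughly the shape below (all values here are made up), which is why the script takes fields 4 and 5 (MD5 and URL) and then strips everything up to the last slash:

2013-02-04 12:35      1354  d41d8cd98f00b204e9800998ecf8427e  s3://company-backup/company/amazon/host1/backup/server245/dump.gz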

Answer 3

From the official documentation:

--delete-after (Perform deletes after new uploads [sync])

or

--delete-after-fetch (Delete remote objects after fetching to local file (only for [get] and [sync] commands).)

if you want to sync from remote to local.

https://s3tools.org/usage
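
A minimal sketch of that remote-to-local case (bucket name and local path are placeholders):

# Fetch everything under the prefix, deleting each remote object
# once its local copy has been downloaded
s3cmd sync --delete-after-fetch s3://mybucket/logs/ /var/local/logs/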

Answer 4

I used @Valter Silva's answer, but made some modifications to also check the file size.

#!/bin/bash
datacenter="amazon"
hostname=$(hostname)
path="backup/server245"
local_path="/$path"   # the original snippet used $local_path without defining it

# No -H here: human-readable sizes would not match the byte counts from ls -l
s3=$(s3cmd ls --list-md5 "s3://company-backup/company/$datacenter/$hostname/$path/")

# Keep size, MD5 and file name (fields 3-5), stripping the s3://.../ prefix
s3_list=$(echo "$s3" | awk '{print $3" "$4" "$5}' | sed 's/s3:\/\/.*\/\(.*\)/\1/')

locally=$(md5sum "$local_path"/*.gz)
locally_list=$(echo "$locally" | sed 's= .*/= =')

IFS=$'\n'
for i in $locally_list
do
  local_file_hash=$(echo "$i" | awk '{print $1}')
  local_file_name=$(echo "$i" | awk '{print $2}')
  local_file_size=$(ls -l "$local_path/$local_file_name" | awk '{print $5}')

  for j in $s3_list
  do
    s3_file_size=$(echo "$j" | awk '{print $1}')
    s3_file_hash=$(echo "$j" | awk '{print $2}')
    s3_file_name=$(echo "$j" | awk '{print $3}')

    # Skip entries with an empty name, e.g. the listing line for the folder itself
    if [[ $s3_file_hash != "" ]] && [[ $s3_file_name != "" ]]; then
      if [[ $s3_file_hash == "$local_file_hash" ]] && [[ $s3_file_size == "$local_file_size" ]] && [[ $s3_file_name == "$local_file_name" ]]; then
        echo "### REMOVING ###"
        echo "$local_file_name"
        rm "$local_path/$local_file_name"
      fi
    fi
  done
done
unset IFS
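
Parsing the output of ls -l for the size works but is fragile; if GNU coreutils is available (an assumption), stat reports the byte count directly:

# stat -c %s prints the file size in bytes (GNU coreutils)
local_file_size=$(stat -c %s "$local_path/$local_file_name")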
