I'm trying to use the Amazon S3 service to store logs from my application. Running /usr/bin/s3cmd --help tells me what I need to know about sending the files:
s3cmd --help
usage: s3cmd [options] COMMAND [parameters]
S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.
options:
-h, --help show this help message and exit
--configure Invoke interactive (re)configuration tool.
-c FILE, --config=FILE
Config file name. Defaults to
/home/valter.silva/.s3cfg
--dump-config Dump current configuration after parsing config files
and command line options and exit.
-n, --dry-run Only show what should be uploaded or downloaded but
don't actually do it. May still perform S3 requests to
get bucket listings and other information though (only
for file transfer commands)
-e, --encrypt Encrypt files before uploading to S3.
--no-encrypt Don't encrypt files.
-f, --force Force overwrite and other dangerous operations.
--continue Continue getting a partially downloaded file (only for
[get] command).
--skip-existing Skip over files that exist at the destination (only
for [get] and [sync] commands).
-r, --recursive Recursive upload, download or removal.
--check-md5 Check MD5 sums when comparing files for [sync].
(default)
--no-check-md5 Do not check MD5 sums when comparing files for [sync].
Only size will be compared. May significantly speed up
transfer but may also miss some changed files.
-P, --acl-public Store objects with ACL allowing read for anyone.
--acl-private Store objects with default ACL allowing access for you
only.
--acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID
Grant stated permission to a given amazon user.
Permission is one of: read, write, read_acp,
write_acp, full_control, all
--acl-revoke=PERMISSION:USER_CANONICAL_ID
Revoke stated permission for a given amazon user.
Permission is one of: read, write, read_acp,
write_acp, full_control, all
--delete-removed Delete remote objects with no corresponding local file
[sync]
--no-delete-removed Don't delete remote objects.
-p, --preserve Preserve filesystem attributes (mode, ownership,
timestamps). Default for [sync] command.
--no-preserve Don't store FS attributes
--exclude=GLOB Filenames and paths matching GLOB will be excluded
from sync
--exclude-from=FILE Read --exclude GLOBs from FILE
--rexclude=REGEXP Filenames and paths matching REGEXP (regular
expression) will be excluded from sync
--rexclude-from=FILE Read --rexclude REGEXPs from FILE
--include=GLOB Filenames and paths matching GLOB will be included
even if previously excluded by one of
--(r)exclude(-from) patterns
--include-from=FILE Read --include GLOBs from FILE
--rinclude=REGEXP Same as --include but uses REGEXP (regular expression)
instead of GLOB
--rinclude-from=FILE Read --rinclude REGEXPs from FILE
--bucket-location=BUCKET_LOCATION
Datacentre to create bucket in. As of now the
datacenters are: US (default), EU, us-west-1, and ap-
southeast-1
--reduced-redundancy, --rr
Store object with 'Reduced redundancy'. Lower per-GB
price. [put, cp, mv]
--access-logging-target-prefix=LOG_TARGET_PREFIX
Target prefix for access logs (S3 URI) (for [cfmodify]
and [accesslog] commands)
--no-access-logging Disable access logging (for [cfmodify] and [accesslog]
commands)
-m MIME/TYPE, --mime-type=MIME/TYPE
Default MIME-type to be set for objects stored.
-M, --guess-mime-type
Guess MIME-type of files by their extension. Falls
back to default MIME-Type as specified by --mime-type
option
--add-header=NAME:VALUE
Add a given HTTP header to the upload request. Can be
used multiple times. For instance set 'Expires' or
'Cache-Control' headers (or both) using this options
if you like.
--encoding=ENCODING Override autodetected terminal and filesystem encoding
(character set). Autodetected: UTF-8
--verbatim Use the S3 name as given on the command line. No pre-
processing, encoding, etc. Use with caution!
--list-md5 Include MD5 sums in bucket listings (only for 'ls'
command).
-H, --human-readable-sizes
Print sizes in human readable form (eg 1kB instead of
1234).
--progress Display progress meter (default on TTY).
--no-progress Don't display progress meter (default on non-TTY).
--enable Enable given CloudFront distribution (only for
[cfmodify] command)
--disable Disable given CloudFront distribution (only for
[cfmodify] command)
--cf-add-cname=CNAME Add given CNAME to a CloudFront distribution (only for
[cfcreate] and [cfmodify] commands)
--cf-remove-cname=CNAME
Remove given CNAME from a CloudFront distribution
(only for [cfmodify] command)
--cf-comment=COMMENT Set COMMENT for a given CloudFront distribution (only
for [cfcreate] and [cfmodify] commands)
--cf-default-root-object=DEFAULT_ROOT_OBJECT
Set the default root object to return when no object
is specified in the URL. Use a relative path, i.e.
default/index.html instead of /default/index.html or
s3://bucket/default/index.html (only for [cfcreate]
and [cfmodify] commands)
-v, --verbose Enable verbose output.
-d, --debug Enable debug output.
--version Show s3cmd version (1.0.0) and exit.
-F, --follow-symlinks
Follow symbolic links as if they are regular files
But it doesn't say how to check whether a file was actually sent, or how to delete a file once it has been sent. Should I do the check via MD5 and delete locally with some shell script?
Answer 1
Incidentally, I needed to do something similar and wrote the following bash script. What it does:
- uses find to get the list of files in a directory older than $MINUTES minutes
- uses lsof to determine whether a file is still open (this can be wrong if the file happens to be held open by an editor)
- uses s3cmd to copy each file into an S3 bucket
- compares the MD5 sums of the remote file in S3 and the local file; if they match, deletes the local file
#!/bin/bash

MINUTES=60
TARGET_DIR="s3://AWSbucketname/subfolder/`hostname -s`/"
LOCAL_DIR="/path/to/folder"
FILES=()

echo ""
echo "About to upload files in $LOCAL_DIR up to S3 folder:"
echo "    $TARGET_DIR"
echo "Then delete if MD5 sums line up."
echo "Starting in 5 seconds..."
sleep 5

cd "$LOCAL_DIR"

# Throw the list of files that the find command gets into an array
while IFS= read -d $'\0' -r file ; do
    FILES=("${FILES[@]}" "$file")
done < <(find "$LOCAL_DIR" -name \*.wav -mmin +$MINUTES -print0)

# echo "${FILES[@]}" # DEBUG

for local_file in "${FILES[@]}"
do
    # Check that the file in question is not open.
    # lsof returns a non-zero exit status for a file not in use.
    lsof "$local_file" > /dev/null 2>&1
    if test $? -ne 0 ; then
        echo ""
        echo "$local_file isn't open. Copying to S3..."
        s3cmd -p put "$local_file" "$TARGET_DIR"
        # s3cmd -n put "$local_file" "$TARGET_DIR" # DEBUG - dry-run

        ## Now attempt to delete if the MD5 sums check out:
        remote_file=${local_file##*/}
        md5sum_remote=`s3cmd info "$TARGET_DIR$remote_file" | grep MD5 | awk '{print $3}'`
        md5sum_local=`md5sum "$local_file" | awk '{print $1}'`
        if [[ "$md5sum_remote" == "$md5sum_local" ]]; then
            echo "$remote_file MD5 sum checks out. Deleting..."
            rm "$local_file"
        fi
    fi
done
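If you want this to run unattended, a cron entry is the obvious route; a minimal sketch, assuming the script above is saved as /usr/local/bin/ship-logs-to-s3.sh (a hypothetical path) and should run hourly:

# m  h  dom mon dow  command
0 * * * * /usr/local/bin/ship-logs-to-s3.sh >> /var/log/ship-logs-to-s3.log 2>&1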
Answer 2

After a while I was able to put together a bash script that checks the md5sum of both the s3 files and my local files, and deletes the local files that already exist on amazon s3:
#!/bin/bash

datacenter="amazon"
hostname=`hostname`
path="backup/server245"

# Remote listing: date, time, size, md5, s3 path
s3=`s3cmd ls --list-md5 -H s3://company-backup/company/"$datacenter"/"$hostname"/"$path"/`
# Keep only the md5 and the basename of each remote object
s3_list=`echo "$s3" | awk '{print $4" "$5}' | sed 's= .*/= ='`

# Local listing: md5 and basename of each local file
locally=`md5sum /"$path"/*.gz`
locally_list=$(echo "$locally" | sed 's= .*/= =')
#echo "$locally_list"

IFS=$'\n'
for i in $locally_list
do
    #echo $i
    locally_hash=`echo $i | awk '{print $1}'`
    locally_file=`echo $i | awk '{print $2}'`
    for j in $s3_list
    do
        s3_hash=$(echo $j | awk '{print $1}')
        s3_file=$(echo $j | awk '{print $2}')
        # Skip entries that are missing a hash or a name
        # (e.g. the folder entry itself)
        if [[ $s3_hash != "" ]] && [[ $s3_file != "" ]]; then
            if [[ $s3_hash == $locally_hash ]] && [[ $s3_file == $locally_file ]]; then
                echo "### REMOVING ###"
                echo "$locally_file"
                #rm /"$path"/"$locally_file"
            fi
        fi
    done
done
unset IFS
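For reference, the awk/sed pipeline above assumes that s3cmd ls --list-md5 -H prints one line per object of roughly this shape (illustrative only; the exact columns can vary between s3cmd versions), so awk picks the MD5 from column 4 and the S3 URI from column 5, and sed reduces the URI to its basename:

2013-07-10 12:00    15M  1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d  s3://company-backup/company/amazon/myhost/backup/server245/file1.gz

One caveat worth knowing: for objects uploaded via multipart upload, S3 reports an ETag that is not a plain MD5 of the file, so this comparison can fail for large files even when the content matches.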
Answer 3

From the official documentation:

--delete-after (perform deletes after new uploads [sync])

or, if you want to sync from remote to local,

--delete-after-fetch (delete remote objects after fetching to local file; only for [get] and [sync] commands)
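A minimal sketch of the second variant, assuming a reasonably recent s3cmd (the flag is not in every version) and a hypothetical bucket and path:

# Fetch remote logs to the local directory, then delete each remote
# object once it has been downloaded (remote-to-local "move").
s3cmd sync --delete-after-fetch s3://my-bucket/logs/ /var/local/logs/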
Answer 4

I used @Valter Silva's answer, but made some modifications to also check the file size:
#!/bin/bash

datacenter="amazon"
hostname=`hostname`
path="backup/server245"
local_path="/$path"

# List remote objects without -H so the size column is in bytes,
# matching what `ls -l` reports for the local files.
s3=`s3cmd ls --list-md5 s3://company-backup/company/"$datacenter"/"$hostname"/"$path"/`
# Keep size, md5 and the basename of each remote object
s3_list=`echo "$s3" | awk '{print $3" "$4" "$5}' | sed 's/s3:\/\/.*\/\(.*\)/\1/'`

locally=`md5sum "$local_path"/*.gz`
locally_list=$(echo "$locally" | sed 's= .*/= =')

IFS=$'\n'
for i in $locally_list
do
    local_file_hash=`echo $i | awk '{print $1}'`
    local_file_name=`echo $i | awk '{print $2}'`
    local_file_size=`ls -l "$local_path"/"$local_file_name" | awk '{print $5}'`
    for j in $s3_list
    do
        s3_file_size=$(echo $j | awk '{print $1}')
        s3_file_hash=$(echo $j | awk '{print $2}')
        s3_file_name=$(echo $j | awk '{print $3}')
        # Skip entries that are missing a hash or a name
        # (e.g. the folder entry itself)
        if [[ $s3_file_hash != "" ]] && [[ $s3_file_name != "" ]]; then
            if [[ $s3_file_hash == $local_file_hash ]] && [[ $s3_file_size == $local_file_size ]] && [[ $s3_file_name == $local_file_name ]]; then
                echo "### REMOVING ###"
                echo "$local_file_name"
                rm "$local_path"/"$local_file_name"
            fi
        fi
    done
done
unset IFS
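As a small aside on the size check: parsing ls -l works, but stat reports the byte count directly and is a bit more robust (a sketch, GNU coreutils syntax; BSD/macOS uses stat -f %z instead):

# Print the file size in bytes (GNU stat)
local_file_size=$(stat -c %s "$local_path/$local_file_name")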