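I am trying to copy multiple files named "F3.bam" up two directory levels, and then rename each copy using the name of the subdirectory it ends up in.
For example: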
/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam
/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam
Expected result:
1. First copy the files up two directories:
/samples/mydata1/RUN1/ID_date/F3.bam
/samples/mydata2/RUN1/ID2_date4/F3.bam
/samples/mydataxxx/RUN1/IDxxx_datexxx/F3.bam
2. Rename the files according to the name of the current subdirectory:
/samples/mydata1/RUN1/ID_date/ID_date_F3.bam
/samples/mydata2/RUN1/ID2_date4/ID2_date4_F3.bam
/samples/mydataxxx/RUN1/IDxxx_datexxx/IDxxx_datexxx_F3.bam
Ideally a bash loop would be great (I am working on a Mac).
Answer 1
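Here is the TL;DR version of my solution: you can use the dirname and basename commands together with command substitution to build the target path for the copy command.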
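As a quick illustration of that idea, here is a minimal sketch that takes one of the sample paths from the question and builds the new path step by step. The variable names are made up for this example, and it assumes the goal is the ID_date directory shown in the question, which takes three nested dirname calls.

```shell
#!/bin/bash
# Hypothetical variable names; the path is one of the samples from the question.
file_path="/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam"

# basename keeps only the last path component.
file_name="$(basename "$file_path")"

# Each dirname strips one trailing component; three calls reach ID_date.
target_dir="$(dirname "$(dirname "$(dirname "$file_path")")")"

# The prefix is the name of the target directory itself.
prefix="$(basename "$target_dir")"

new_path="${target_dir}/${prefix}_${file_name}"
echo "$new_path"
```

Nothing is copied here; the sketch only prints the path that a cp command would receive as its destination.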
A longer explanation follows.
Here is a (very verbose) script that uses a Bash loop to do roughly what you want:
#!/bin/bash
# copy_and_rename.bash
#
# Copy multiple files 2 folders up and rename these files
# to contain their parent directory as a prefix.
#
# Quoting "$@" and every expansion keeps file names with spaces intact
# Iterate over the list of file paths
for _file_path in "$@"; do
    # Get the file name
    _file_name="$(basename "${_file_path}")"
    echo "${_file_name}"
    # Get the path to the target directory (two levels above the file)
    _target_directory_path="$(dirname "$(dirname "${_file_path}")")"
    echo "${_target_directory_path}"
    # Get the parent directory of the target directory
    _parent_directory_path="$(dirname "${_target_directory_path}")"
    echo "${_parent_directory_path}"
    # Get the name of the parent directory
    _parent_directory_name="$(basename "${_parent_directory_path}")"
    echo "${_parent_directory_name}"
    # Construct the new file path
    _new_file_path="${_target_directory_path}/${_parent_directory_name}_${_file_name}"
    echo "${_new_file_path}"
    # Copy and rename the file (-i prompts before overwriting)
    echo "cp -i \"${_file_path}\" \"${_new_file_path}\""
    cp -i "${_file_path}" "${_new_file_path}"
    echo
done
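Obviously you could condense this a lot, but I kept it this way for its explanatory value.
Here is what the previous script looks like without any comments, extra variables, or echo statements: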
for _file_path in "$@"; do
    cp -i "${_file_path}" \
        "$(dirname "$(dirname "${_file_path}")")/$(basename "$(dirname "$(dirname "$(dirname "${_file_path}")")")")_$(basename "${_file_path}")"
done
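It is quite fragile and does very little in the way of error handling. In the verbose script I also left in some echo statements for debugging, so that you can see what it is doing and sanity-check it the first time you run it.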
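If the fragility is a concern, a couple of cheap checks go a long way. The following is only a sketch, not part of the original script: copy_up_two is a hypothetical helper name, and it uses the same path arithmetic as the script above but skips missing inputs and refuses to overwrite an existing target instead of prompting.

```shell
#!/bin/bash
# Sketch only: copy_up_two is a hypothetical helper name.
copy_up_two() {
    src="$1"
    # Skip arguments that are not existing regular files.
    if [ ! -f "$src" ]; then
        echo "skipping (not a regular file): $src" >&2
        return 1
    fi
    # Same path arithmetic as the verbose script.
    target_dir="$(dirname "$(dirname "$src")")"
    prefix="$(basename "$(dirname "$target_dir")")"
    new_path="${target_dir}/${prefix}_$(basename "$src")"
    # Refuse to overwrite, so the loop can run unattended.
    if [ -e "$new_path" ]; then
        echo "skipping (target exists): $new_path" >&2
        return 1
    fi
    cp -- "$src" "$new_path"
}

# Process every argument and remember whether anything failed.
status=0
for _file_path in "$@"; do
    copy_up_two "$_file_path" || status=1
done
```

A wrapper script could finish with exit "$status" so that callers can tell whether every file was copied.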
To test it, I created your files with the following script, which I am including here in case you find it useful for further testing:
#!/bin/bash
# create_test_files.bash
# Choose a prefix for the file paths
_prefix="/tmp"
# Create array of sample files
_sample_files=(
    "/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam"
    "/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam"
    "/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam"
)
# Create directories and files
for _file in "${_sample_files[@]}"; do
    # Add the prefix to the path
    _path="${_prefix}${_file}"
    # Create parent directory
    mkdir -p "$(dirname "${_path}")"
    # Create file
    touch "${_path}"
done
I checked that the files were created correctly using find:
$ find /tmp/samples -type f
/tmp/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam
Then I invoked the script like this:
bash copy_and_rename.bash \
/tmp/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam \
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam \
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam
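Then I used find again to check that the script worked. The copies land in each PCR2 directory, one level below the ID_date directories shown in the question: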
$ find /tmp/samples -type f
/tmp/samples/mydata1/RUN1/ID_date/PCR2/ID_date_F3.bam
/tmp/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/ID2_date4_F3.bam
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/IDxxx_datexxx_F3.bam
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam
Finally, I deleted all of the test files, also using find:
find /tmp/samples -type f -exec rm {} \;
Answer 2
This version uses only bash parameter expansion to slice and dice the path. Pass it one or more absolute file paths:
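#!/usr/bin/env bash
for path; do
    # directory containing the file
    dir="${path%/*}"
    # two levels above that directory
    dest="${dir%/*/*}"
    # prefix the file name with the destination directory's name
    cp "$path" "${dest}/${dest##*/}_${path##*/}"
done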
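To see what each expansion contributes, here is a small sketch that applies the same operators to one of the sample paths and prints the intermediate values. The prefix and file variables are introduced only for this illustration; nothing is copied.

```shell
#!/bin/bash
path="/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam"

# % removes the shortest matching suffix, ## the longest matching prefix.
dir="${path%/*}"        # drop the file name
dest="${dir%/*/*}"      # drop two more trailing components
prefix="${dest##*/}"    # keep only the last component of dest
file="${path##*/}"      # keep only the file name

printf '%s\n' "$dir" "$dest" "$prefix" "$file" "${dest}/${prefix}_${file}"
```

Because these are pure string operations, they never touch the file system, which is also why the short script expects absolute paths: a relative path with too few components leaves the suffix pattern nothing to match.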
Here is an extended version. This one also accepts relative paths, and the number of parent directories to traverse is adjustable:
#!/usr/bin/env bash
# Each param for this script is the path of a file. It
# accepts relative paths if you have an appropriate tool to
# robustly determine absolute paths (not trivial). Here
# we're using the GNU 'realpath' tool.
#
# Usage: copy2up filepath1 [filepath2...]

# for converting relative paths to absolute
# if it's missing replace realpath with an available tool
# (or just always use absolute path arguments)
pathtool=realpath

# directory levels upwards to copy files
levels=2

# iterate over each parameter
for path; do
    if [[ ! $path =~ ^/ ]]; then
        # convert relative to absolute
        path="$("$pathtool" "$path")"
    fi
    file="${path##*/}"
    dir="${path%/*}"
    dest=$dir
    # strip one trailing component 'levels' times to reach the destination
    for (( i=0; i<levels; i++ )); do
        dest="${dest%/*}"
    done
    # to be prepended to original filename
    destpfx="${dest##*/}"
    newpath="${dest}/${destpfx}_${file}"
    cp "$path" "$newpath"
done
As for your specific use case, if find is how you are locating the "F3.bam" files, you can have find run the script for you. For example:
find /some/path -name F3.bam -exec copy2up.sh {} +
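If you would rather not save a separate script, one possible variant is to inline the short loop in an sh -c call. This is a sketch under assumptions: copy_all_up is a hypothetical wrapper name, and the search root should be an absolute path (or at least sit two or more levels above each file's directory) so that the suffix stripping has enough components to work with.

```shell
#!/bin/bash
# Sketch: copy_all_up is a hypothetical wrapper; pass it the directory to search.
copy_all_up() {
    # "{} +" hands many paths to a single sh invocation; the bare
    # "for path" loop then iterates over them as positional parameters.
    find "$1" -type f -name F3.bam -exec sh -c '
        for path; do
            dir="${path%/*}"
            dest="${dir%/*/*}"
            cp "$path" "${dest}/${dest##*/}_${path##*/}"
        done
    ' sh {} +
}
```

The copies are named with a prefix, so they no longer match -name F3.bam and find will not pick them up again during the same run.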
Answer 3
Use find together with shell (POSIX sh/bash/Korn/zsh) parameter substitution expansion, as follows.
find . -type f -name "F3.bam" -execdir sh -c '
trgt="${PWD%/*/*}"; echo cp -v "$1" "${trgt}/${trgt##*/}_${1#./}" ' _ '{}' \;
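Explanation:
We are looking only for files matching F3.bam. With -execdir, find changes the current directory to the directory in which the file was found and then executes sh -c ' ... ' inside that directory.
With trgt="${PWD%/*/*}" ("cut up to first suffix") we strip the file's two levels of subdirectories, /samples/mydata1/RUN1/ID_date**/PCR2/TIME1** (the bold part, which matches the suffix /*/*, is removed), and assign the result to the variable trgt. So trgt is now set to /samples/mydata1/RUN1/ID_date for the first file.
"$1" is the file path ./filename, relative to the current $PWD.
In ${trgt##*/}_ ("cut up to last prefix") we use the value of the trgt variable to get the subdirectory name that should be prepended to the file name, so this returns only ID_date, ID2_date4, or IDxxx_datexxx, etc. (everything up to and including the last slash / is removed), and an underscore _ is appended.
${1#./} removes the leading dot-slash ./ from the relative ./filepath.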
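The pieces above can be tried out without touching any files. In this sketch, build_target is a hypothetical function whose first argument stands in for $PWD and whose second stands in for the "$1" that find hands to sh -c. Note that the command in the answer is prefixed with echo, so as written it only prints the cp commands; removing the echo makes it actually copy.

```shell
#!/bin/sh
# Hypothetical helper: $1 plays the role of $PWD, $2 the role of the file argument.
build_target() {
    trgt="${1%/*/*}"
    printf '%s\n' "${trgt}/${trgt##*/}_${2#./}"
}

# A file argument with the ./ prefix ...
build_target "/samples/mydata1/RUN1/ID_date/PCR2/TIME1" "./F3.bam"
# ... and a bare name, which ${2#./} leaves unchanged.
build_target "/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7" "F3.bam"
```

The second call is there because removing a prefix that is not present is a no-op, so the expression gives the same kind of result whether or not the file name arrives with a leading ./.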
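Answer 4
You can nest dirname as many times as you need. Note that this example uses mv, so it moves the files; substitute cp if you want to keep the originals: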
set /samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam \
/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam \
/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam
for bam; do
    dir="$(dirname "$(dirname "$(dirname "$bam")")")"
    mv "$bam" "$dir"/"$(basename "$dir")"_"$(basename "$bam")"
done
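When the nesting gets deep, a small loop can do the same job. The following is a sketch, not part of the answer: up_dirs is a hypothetical helper that applies dirname a given number of times, and the final line only prints the path that mv or cp would receive.

```shell
#!/bin/bash
# up_dirs PATH N: print PATH with its last N components removed.
# Hypothetical helper, shown only to illustrate repeated dirname calls.
up_dirs() {
    result="$1"
    count="$2"
    while [ "$count" -gt 0 ]; do
        result="$(dirname "$result")"
        count=$((count - 1))
    done
    printf '%s\n' "$result"
}

bam="/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam"
dir="$(up_dirs "$bam" 3)"
echo "$dir/$(basename "$dir")_$(basename "$bam")"
```

Passing 3 reproduces the triple nesting above; changing the number moves the destination up or down without rewriting the expression.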