Copy and rename files 2 directories up

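I'm trying to copy multiple files named "F3.bam" two directory levels up, and then, after copying, rename these files with the name of the subdirectory.

For example: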

/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam
/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam

Expected result:

1. First, copy the files two directories up:

/samples/mydata1/RUN1/ID_date/F3.bam
/samples/mydata2/RUN1/ID2_date4/F3.bam
/samples/mydataxxx/RUN1/IDxxx_datexxx/F3.bam

2. Rename the files based on the name of the current subdirectory:

/samples/mydata1/RUN1/ID_date/ID_date_F3.bam
/samples/mydata2/RUN1/ID2_date4/ID2_date4_F3.bam
/samples/mydataxxx/RUN1/IDxxx_datexxx/IDxxx_datexxx_F3.bam

Ideally, a bash loop would be great (working on a Mac).

Answer 1

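Here's the TLDR version of my solution: you can use the dirname and basename commands together with command substitution to build the destination path for the copy command.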
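
To make the TLDR concrete, here is a small sketch that applies dirname and basename step by step to the first sample path from the question. Each $(...) is a command substitution; the variable names are only for illustration, and only the string is manipulated, so the file does not need to exist:

```shell
# First sample path from the question (the file need not exist)
f=/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam

name=$(basename "$f")       # F3.bam
up1=$(dirname "$f")         # /samples/mydata1/RUN1/ID_date/PCR2/TIME1
up2=$(dirname "$up1")       # /samples/mydata1/RUN1/ID_date/PCR2
up3=$(dirname "$up2")       # /samples/mydata1/RUN1/ID_date
prefix=$(basename "$up3")   # ID_date

# Destination in the layout the question asks for
echo "$up3/${prefix}_${name}"
```

This prints /samples/mydata1/RUN1/ID_date/ID_date_F3.bam.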

A longer explanation follows.

Here is a (super verbose) script that uses a Bash loop to do roughly what you want:

#!/bin/bash

# copy_and_rename.bash
#
#   Copy multiple files 2 folders up and rename these files
#   to contain their parent directory as a prefix.
#

# Set internal field separator to handle spaces in file names
IFS=$'\n'

# Iterate over the list of file paths
for _file_path in "$@"; do

    # Get the file name
    _file_name="$(basename "${_file_path}")"

    echo "${_file_name}"

    # Get the path to the target directory (two levels above the file)
    _target_directory_path="$(dirname "$(dirname "${_file_path}")")"

    echo "${_target_directory_path}"

    # Get the parent directory of the target directory
    _parent_directory_path="$(dirname "${_target_directory_path}")"

    echo "${_parent_directory_path}"

    # Get the name of the parent directory
    _parent_directory_name="$(basename "${_parent_directory_path}")"

    echo "${_parent_directory_name}"

    # Construct the new file path
    _new_file_path="${_target_directory_path}/${_parent_directory_name}_${_file_name}"

    echo "${_new_file_path}"

    # Copy and rename the file
    echo "cp -i \"${_file_path}\" \"${_new_file_path}\""
    cp -i "${_file_path}" "${_new_file_path}"
    echo
done

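Obviously you could condense this a lot, but I kept it this way for its explanatory value.

Here's what the preceding script looks like without any comments, extra variables or echo statements: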

for _file_path in "$@"; do
    cp -i "${_file_path}" \
    "$(dirname "$(dirname "${_file_path}")")/$(basename "$(dirname "$(dirname "$(dirname "${_file_path}")")")")_$(basename "${_file_path}")"
done

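It's very fragile and doesn't do much in the way of error handling. I also left some echo statements in the verbose version for debugging, so that you can see what it's doing and sanity-check it the first time you run it.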
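
If you want a bit more safety, one possible sketch (not a drop-in replacement; the function name copy_up_and_rename is made up here) keeps the same target logic as the script above but checks that each argument is a regular file, refuses to overwrite an existing destination instead of prompting, and reports failures through its return status:

```shell
# Hypothetical helper with the same destination rule as copy_and_rename.bash:
# copy into the directory two levels above the file, prefixed with the
# name of the directory above that one.
copy_up_and_rename() {
    local f target prefix dest status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping '$f': not a regular file" >&2
            status=1
            continue
        fi
        target=$(dirname "$(dirname "$f")")
        prefix=$(basename "$(dirname "$target")")
        dest="$target/${prefix}_$(basename "$f")"
        if [ -e "$dest" ]; then
            echo "skipping '$f': '$dest' already exists" >&2
            status=1
            continue
        fi
        cp -- "$f" "$dest" || status=1
    done
    return "$status"
}
```

For example, copy_up_and_rename /tmp/samples/*/RUN1/*/PCR2/*/F3.bam would process the test files created below and return a non-zero status if any of them was skipped.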

To test it, I created your files with the following script, which I'm including here in case you find it useful for further testing:

#!/bin/bash

# create_test_files.bash

# Set internal field separator to handle spaces in file names
IFS=$'\n'

# Choose a prefix for the file paths
_prefix="/tmp"

# Create array of sample files
_sample_files=(
    "/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam"
    "/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam"
    "/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam"
)

# Create directories and files
for _file in "${_sample_files[@]}"; do

    # Add the prefix to the path
    _path="${_prefix}${_file}"

    # Create parent directory
    mkdir -p "$(dirname "${_path}")"

    # Create file
    touch "${_path}"
done

I checked that the files were created correctly with find:

$ find /tmp/samples -type f

/tmp/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam

Then I called the script like this:

bash copy_and_rename.bash \
/tmp/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam \
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam \
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam

Then I used find again to check whether the script worked:

$ find /tmp/samples -type f

/tmp/samples/mydata1/RUN1/ID_date/PCR2/ID_date_F3.bam
/tmp/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/ID2_date4_F3.bam
/tmp/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/IDxxx_datexxx_F3.bam
/tmp/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam
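
Note that in this output the copies land in PCR2, two levels above the file itself, while the expected result in the question puts them in ID_date, one level higher. If that is the layout you need, a minimal sketch (the function name is hypothetical, and it assumes the same directory structure) adds one more dirname and takes the prefix from the target directory itself:

```shell
# Hypothetical variant: copy each file into the directory two levels above
# the one that contains it (e.g. .../ID_date) and use that directory's
# name as the prefix, as in the question's expected result.
copy_to_id_dir() {
    local f target
    for f in "$@"; do
        target=$(dirname "$(dirname "$(dirname "$f")")")
        cp -i -- "$f" "$target/$(basename "$target")_$(basename "$f")"
    done
}
```

With the test files, copy_to_id_dir /tmp/samples/*/RUN1/*/PCR2/*/F3.bam would create, for example, /tmp/samples/mydata1/RUN1/ID_date/ID_date_F3.bam.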

Finally, I deleted all the test files, also with find:

find /tmp/samples -type f -exec rm {} \;
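
The -exec rm {} \; form starts one rm per file and leaves the now-empty directory tree behind. If you want to remove the whole test tree, one sketch (the function name is hypothetical) uses -delete and -empty, which both GNU find and the BSD find shipped with macOS understand; -delete implies depth-first traversal, so each directory is visited after its contents have been removed:

```shell
# Hypothetical helper: delete every file under a directory, then the
# directories left empty (including the starting directory itself).
remove_test_tree() {
    find "$1" -type f -delete
    find "$1" -type d -empty -delete
}
```

Usage: remove_test_tree /tmp/samples (or simply rm -rf /tmp/samples).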

Answer 2

This version uses only bash parameter expansion to slice and dice the paths. Pass it one or more absolute file paths:

#!/usr/bin/env bash
for path; do
    dir="${path%/*}"
    dest="${dir%/*/*}"
    cp "$path" "${dest}/${dest##*/}_${path##*/}"
done
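
To see what each expansion does, here is the same slicing applied to the first sample path, echoing the result instead of copying. ${var%pattern} removes the shortest matching suffix and ${var##pattern} removes the longest matching prefix:

```shell
path=/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam

dir="${path%/*}"      # drop the shortest "/*" suffix   -> /samples/mydata1/RUN1/ID_date/PCR2/TIME1
dest="${dir%/*/*}"    # drop the shortest "/*/*" suffix -> /samples/mydata1/RUN1/ID_date
file="${path##*/}"    # drop the longest "*/" prefix    -> F3.bam
prefix="${dest##*/}"  # last component of dest          -> ID_date

echo "${dest}/${prefix}_${file}"
```

This prints /samples/mydata1/RUN1/ID_date/ID_date_F3.bam, the destination the loop above passes to cp.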

Here's an extended version. This one also accepts relative paths, and the number of parent directory levels to climb is adjustable:

#!/usr/bin/env bash

# Each param for this script is the path of a file. It
# accepts relative paths if you have an appropriate tool to
# robustly determine absolute paths (not trivial). Here
# we're using the GNU 'realpath' tool.
#
# Usage: copy2up filepath1 [filepath2...]

# for converting relative paths to absolute;
# if it's missing, replace realpath with an available tool
# (or just always use absolute path arguments)
pathtool=realpath

# directory levels upwards to copy files
levels=2

# iterate over each parameter
for path; do
    if [[ ! $path =~ ^/ ]]; then
        # convert relative to absolute
        path="$($pathtool "$path")"
    fi
    file="${path##*/}"
    dir="${path%/*}"

    dest=$dir
    # strip one trailing directory per level to reach the destination
    for (( i=0; i<$levels; i++ )); do
        dest="${dest%/*}"
    done

    # to be prepended to original filename
    destpfx="${dest##*/}"

    newpath="${dest}/${destpfx}_${file}"
    cp "$path" "$newpath"
done
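
If realpath isn't available on your system, one portable replacement is a small shell function built on cd and pwd -P. This is a sketch under stated assumptions: abs_path is a hypothetical name, the file's directory must exist, and symlinks are resolved in the directory part only, not in the file name itself:

```shell
# Hypothetical stand-in for realpath: print the absolute path of a file,
# resolving symlinks in its directory (pwd -P) but not in the file name.
abs_path() {
    local dir
    dir=$(cd -- "$(dirname "$1")" && pwd -P) || return 1
    # Avoid a double slash for files directly under /
    [ "$dir" = / ] && dir=
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}
```

Because the script calls $pathtool with the path as its argument, defining this function in the script and setting pathtool=abs_path should be enough to use it there.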

For your specific use case, if you're locating the "F3.bam" files with find, you can run this script from it. For example:

find /some/path -name F3.bam -exec copy2up.sh {} +

Answer 3

Use find and shell (POSIX sh/bash/Korn/zsh) parameter expansion as follows:

find . -type f -name "F3.bam" -execdir sh -c '
    trgt="${PWD%/*/*}"; echo cp -v "$1" "${trgt}/${trgt##*/}_${1#./}" ' _ '{}' \;

说明:

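We look only for files named F3.bam. With -execdir, find changes into the directory where each F3.bam was found and runs sh -c ' ... ' inside that directory.

With trgt="${PWD%/*/*}" ("cut up to the first matching suffix") we drop the last two directory levels from the current directory: in /samples/mydata1/RUN1/ID_date/PCR2/TIME1, the part matching the suffix pattern /*/*, namely /PCR2/TIME1, is removed, and the result is assigned to the variable trgt. So for the first file, trgt is now /samples/mydata1/RUN1/ID_date.

"$1" is the path of the file relative to the current $PWD, i.e. ./filename.

In ${trgt##*/}_ ("cut up to the last matching prefix") we use the value of trgt to get the name of the subdirectory to prepend to the file name: everything up to and including the last slash / is removed, so this returns only ID_date, ID2_date4, IDxxx_datexxx, etc., and then an underscore _ is appended.

${1#./} removes the leading dot-slash ./ from the relative path ./filepath.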
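
Because of the leading echo, the command above only prints the cp commands it would run, which makes it a safe dry run; once the output looks right, drop the echo. A sketch of a real run against a throwaway copy of the first sample directory (the temporary tree exists only for illustration):

```shell
# Throwaway tree mirroring the first sample path
root=$(mktemp -d)
mkdir -p "$root/samples/mydata1/RUN1/ID_date/PCR2/TIME1"
touch "$root/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam"

# Same command as above with 'echo' removed, so cp really runs
find "$root/samples" -type f -name "F3.bam" -execdir sh -c '
    trgt="${PWD%/*/*}"; cp -v "$1" "${trgt}/${trgt##*/}_${1#./}" ' _ '{}' \;

# List the result: the original file plus .../ID_date/ID_date_F3.bam
find "$root/samples" -type f
```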

Answer 4

You can nest dirname as many times as you need:

set -- /samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam \
/samples/mydata2/RUN1/ID2_date4/PCR2/TIME7/F3.bam \
/samples/mydataxxx/RUN1/IDxxx_datexxx/PCR2/TIMExxx/F3.bam

for bam; do
  dir="$(dirname "$(dirname "$(dirname "$bam")")")"
  cp "$bam" "$dir"/"$(basename "$dir")"_"$(basename "$bam")"
done
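
If the number of levels varies, nesting dirname by hand gets unwieldy. A sketch of a loop-based helper (up_n is a hypothetical name) that applies dirname N times:

```shell
# Hypothetical helper: print the path obtained by applying dirname N times.
up_n() {
    local p n
    p=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        p=$(dirname "$p")
        n=$((n - 1))
    done
    printf '%s\n' "$p"
}

bam=/samples/mydata1/RUN1/ID_date/PCR2/TIME1/F3.bam
dir=$(up_n "$bam" 3)   # same as three nested dirname calls: /samples/mydata1/RUN1/ID_date
```

Inside the loop above, dir="$(up_n "$bam" 3)" could replace the nested dirname calls.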
