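I have multiple folders, each containing many files. Every folder contains txt files with the same names, and I want to merge the files that share a name into a single txt file.
Example: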
folder/
-sub1
-sub2
-sub3
.
.
.
-sub28
Each subfolder contains multiple files:
EAF001.ID001.txt EAF001.ID002.txt EAF001.ID003.txt EAF001.ID004.txt
EAF001.ID005.txt EAF001.ID006.txt EAF001.ID007.txt EAF001.ID008.txt
EAF001.ID009.txt EAF001.ID010.txt EAF001.ID011.txt EAF001.ID012.txt
EAF001.ID013.txt EAF001.ID014.txt EAF001.ID015.txt EAF001.ID016.txt
I want to merge the files that have the same name, producing:
EAF001.ID001.merge.txt EAF001.ID002.merge.txt EAF001.ID003.merge.txt EAF001.ID004.merge.txt
EAF001.ID005.merge.txt EAF001.ID006.merge.txt EAF001.ID007.merge.txt EAF001.ID008.merge.txt
EAF001.ID009.merge.txt EAF001.ID010.merge.txt EAF001.ID011.merge.txt EAF001.ID012.merge.txt
EAF001.ID013.merge.txt EAF001.ID014.merge.txt EAF001.ID015.merge.txt EAF001.ID016.merge.txt
Any help would be greatly appreciated.
Answer 1
export dir='/path/to/folder'
find "$dir" -mindepth 2 -type f -name 'EAF*.txt' \
  -exec sh -c 'for f; do
      bn=$(basename "$f" .txt)
      cat "$f" >> "$dir/$bn.merged.txt"
    done' sh {} +
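The -mindepth 2 option excludes files that sit directly in /path/to/folder from processing (that is, find only looks at files in the subdirectories), so that if the output files already exist they are not concatenated onto themselves.
This appends every matching file to its "merged.txt" output file, whether or not the filename occurs more than once.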
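One consequence of appending with >> is worth checking before running this on real data: -mindepth 2 keeps old output files from being read as input, but a second run still appends to them again. The sketch below tries this on a throwaway tree. The sample folders, file names and contents are made up for the demo, and it assumes a find that supports -mindepth (GNU and BSD find do; it is not in POSIX).

```shell
#!/bin/sh
# Build a small throwaway tree (hypothetical sample data).
dir=$(mktemp -d)
export dir
mkdir "$dir/sub1" "$dir/sub2"
printf 'one\n'  > "$dir/sub1/EAF001.ID001.txt"
printf 'two\n'  > "$dir/sub2/EAF001.ID001.txt"
printf 'only\n' > "$dir/sub1/EAF001.ID002.txt"

# Same command as in the answer, wrapped in a function so it can be run twice.
merge() {
  find "$dir" -mindepth 2 -type f -name 'EAF*.txt' \
    -exec sh -c 'for f; do
        bn=$(basename "$f" .txt)
        cat "$f" >> "$dir/$bn.merged.txt"
      done' sh {} +
}

merge
wc -l < "$dir/EAF001.ID001.merged.txt"   # line count after the first run

# Second run: the outputs in $dir are skipped as inputs,
# but >> appends the same input files to them once more.
merge
wc -l < "$dir/EAF001.ID001.merged.txt"   # line count after the second run
```

If the second count is double the first, delete the old *.merged.txt files (or write to a fresh directory) before re-running.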
If you only want to merge filenames that occur more than once:
dir='/path/to/folder'
declare -A counts   # associative array: file name -> number of occurrences

# Everything runs in one bash process, because bash cannot export arrays
# to a child shell started by find -exec.

# find out how many there are of each filename (keyed by name, not full path)
while IFS= read -r -d '' f; do
  bn=${f##*/}
  counts[$bn]=$(( ${counts[$bn]:-0} + 1 ))
done < <(find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -print0)

# concatenate only the duplicates
while IFS= read -r -d '' f; do
  bn=${f##*/}
  if [ "${counts[$bn]}" -gt 1 ]; then
    cat "$f" >> "$dir/${bn%.txt}.merged.txt"
  fi
done < <(find "$dir" -mindepth 2 -type f -name 'EAF*.txt' -print0)
This requires bash or some other shell that supports associative arrays (i.e. not POSIX sh).
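If bash is not available, roughly the same result can be had without associative arrays by letting sort and uniq -d pick out the names that occur more than once. This is only a sketch under some assumptions: file names contain no newlines or glob characters, find supports -mindepth, and the sample tree is made up for the demo. Unlike the commands above it writes with > rather than >>, so re-running it does not grow the output.

```shell
#!/bin/sh
# Throwaway sample tree (hypothetical data).
dir=$(mktemp -d)
mkdir "$dir/sub1" "$dir/sub2"
printf 'one\n'  > "$dir/sub1/EAF001.ID001.txt"
printf 'two\n'  > "$dir/sub2/EAF001.ID001.txt"
printf 'only\n' > "$dir/sub1/EAF001.ID002.txt"

# 1. list every matching file and strip the directory part,
# 2. keep only the names that appear more than once (uniq -d),
# 3. concatenate all files carrying each such name into one output file.
find "$dir" -mindepth 2 -type f -name 'EAF*.txt' | sed 's|.*/||' | sort | uniq -d |
while IFS= read -r name; do
  find "$dir" -mindepth 2 -type f -name "$name" -exec cat {} + \
    > "$dir/${name%.txt}.merged.txt"
done

ls "$dir"
```

In this sample only EAF001.ID001.txt exists in two subfolders, so it is the only name that gets a merged file.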
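Answer 2
You can loop over the txt files and count how often each name occurs using find and wc. If the count for a name is greater than 1, append the file to the corresponding merge.txt file.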
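#!/bin/bash
# Run from inside the parent folder (the one that contains sub1, sub2, ...).
output_dir="output"
rm -rf "$output_dir"
mkdir "$output_dir"
for file in */*.txt; do
    file_name=$(basename "$file" .txt)
    # count files with this name, looking only one level down (the subfolders)
    duplicate_names_count=$(find . -mindepth 2 -maxdepth 2 -type f -name "$file_name.txt" | wc -l)
    if [ "$duplicate_names_count" -gt 1 ]; then
        cat "$file" >> "$output_dir/${file_name}.merge.txt"
    fi
done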
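One thing to watch with the glob loop: */*.txt expands in lexical order, so sub10 comes before sub2. If the order of the pieces inside each merged file matters, the file list can be sorted naturally first. The sketch below does this for a single file name, assuming GNU sort (the -V option is a GNU extension) and names without newlines. The folders and contents are made up for the demo.

```shell
#!/bin/bash
# Throwaway sample tree (hypothetical data): three subfolders, one shared file name.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir sub1 sub2 sub10 output
for d in sub1 sub2 sub10; do
    printf '%s\n' "$d" > "$d/EAF001.ID001.txt"
done

# The glob alone would yield sub1, sub10, sub2;
# sort -V (version sort) puts the numbered folders in natural order.
printf '%s\n' */EAF001.ID001.txt | sort -V | while IFS= read -r file; do
    cat "$file" >> "output/EAF001.ID001.merge.txt"
done

cat output/EAF001.ID001.merge.txt
```

Each sample file contains just the name of its folder, so the printed result shows the order in which the pieces were concatenated.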