Steps

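I want to split sourcefile.txt, which contains 10000 lines (and grows every day), into 30 equal files. I have directories named prog1 through prog30, and I want to save the split files into these directories under the same file name. For example: /prog1/myfile.txt, /prog2/myfile.txt, ..., /prog30/myfile.txt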

Here is my bash script, named divide.sh, which runs in the prog directory:

#!/bin/bash
programpath=/home/mywebsite/project/a1/
array=/prog1/
totalline=$(wc -l < ./sourcefile.txt)   
divide="$(( $totalline / 30 ))"   
split --lines=$divide $./prog1/myfile.txt    
exit 1
fi

Answer 1

#!/bin/bash

# assuming the file is in the same folder as the script
INPUT=large_file.txt
# assuming the folder called "output" is in the same folder
# as the script and there are folders that match the pattern
# prog01 prog02 ... prog30
# create them with: mkdir -p output/prog{01..30}
OUTPUT_FOLDER=output

OUTPUT_FILE_FORMAT=myfile

# split
# -n l/30 -> 30 files, without cutting any line in half (GNU split)
# --numeric-suffixes=1 -> suffixes start at 01 instead of 00
# $OUTPUT_FILE_FORMAT -> prefix of the generated file names
split -n l/30 --numeric-suffixes=1 "$INPUT" "$OUTPUT_FILE_FORMAT"

# move all files to their respective directories
for i in {01..30}
do
    mv "$OUTPUT_FILE_FORMAT$i" "$OUTPUT_FOLDER/prog$i/myfile.txt"
done

echo "done :)"

exit

The split command alone is enough for this task. However, this solution requires the folder names to start at prog01 rather than prog1.
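If the directories are named prog1 ... prog30 as in the question, the zero-padded suffix can be trimmed while moving the files. The following is only a sketch under assumptions: it uses GNU split, and the temporary directory, the generated 100-line input, and the `part` prefix are placeholders invented for illustration.

```shell
#!/bin/bash
# Sketch: split into 30 line-aligned pieces and file them under prog1..prog30.
# The temp directory, the sample input and the "part" prefix are illustrative only.
set -e
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100 > sourcefile.txt                 # stand-in for the real input
for n in $(seq 1 30); do mkdir "prog$n"; done

# l/30 -> 30 pieces of roughly equal size, never cutting a line in half
split -n l/30 --numeric-suffixes=1 sourcefile.txt part

for i in $(seq -w 1 30); do                # 01 02 ... 30
    mv "part$i" "prog${i#0}/myfile.txt"    # ${i#0} drops one leading zero: 01 -> 1
done
```

Note that `-n l/30` balances the pieces by size rather than by line count, so the number of lines per file can differ slightly.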

Answer 2

An awk-only solution (here for 30 files):

awk 'BEGIN{ cmd="wc -l <sourcefile.txt"; cmd|getline l; l=int((l+29)/30); close(cmd) }
    (NR-1)%l==0{trgt=sprintf("prog%d",++c)}{print >(trgt "/myfile.txt")}' sourcefile.txt

Or let the shell count the lines of sourcefile.txt and pass the result to awk, as suggested by 杰蒂尔:

awk '(NR-1)%l==0{trgt=sprintf("prog%d",++c)}{print >(trgt "/myfile.txt")}' \
    l=$(( ($(wc -l <sourcefile.txt)+29)/30 )) sourcefile.txt
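The `(l+29)/30` expression is a ceiling division: it rounds the lines per file up so that no more than 30 files are needed. A small sketch of that arithmetic follows; the helper names `per_file` and `pieces` are made up for illustration. One thing it makes visible is that for small inputs, rounding up can yield fewer than 30 non-empty files.

```shell
#!/bin/bash
# per_file TOTAL PARTS -> lines per piece, rounded up (ceiling division)
per_file() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# pieces TOTAL PARTS -> how many non-empty pieces that line count actually yields
pieces() {
    local l
    l=$(per_file "$1" "$2")
    if [ "$l" -eq 0 ]; then echo 0; return; fi   # empty input: nothing to split
    echo $(( ($1 + l - 1) / l ))
}

per_file 10000 30   # 334
pieces   10000 30   # 30
per_file 100 30     # 4
pieces   100 30     # 25, so prog26..prog30 would receive nothing
```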

Answer 3

A split + bash solution:

lines=$(echo "t=$(wc -l ./sourcefile.txt | cut -d' ' -f1); d=30; if(t%d) t/d+1 else t/d" | bc)
split -l $lines ./sourcefile.txt "myfile.txt" --numeric-suffixes=1

for f in myfile.txt[0-9]*; do
    # constructing directory name; 10# forces base 10 so 08 and 09 are not read as octal
    dir_n="prog$((10#${f#*txt}))"
    mv "$f" "$dir_n/myfile.txt"
done

This assumes you already have folders named prog1 through prog30 (as you mentioned).

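  • lines - the integer number of lines per output file

    • t - the total number of lines in ./sourcefile.txt
    • d=30 is the divisor
  • --numeric-suffixes=1 - split's option that tells it to use numeric suffixes starting at 1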
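One detail worth checking when turning a zero-padded suffix such as 08 into a directory number: bash treats a leading zero as octal, both in arithmetic and in printf "%d", so 08 and 09 are rejected as invalid. A minimal sketch of forcing base 10, with `to_dir` as a hypothetical helper name:

```shell
#!/bin/bash
# Convert one of split's zero-padded suffixes (01..30) into a directory name.
to_dir() {
    local suffix=$1
    echo "prog$((10#$suffix))"   # 10# makes bash parse the digits as decimal
}

to_dir 01   # prog1
to_dir 08   # prog8
to_dir 30   # prog30
```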

Answer 4

Steps

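  1. Count the lines in the file: lines=$(wc -l < "${file}")

  2. Divide by 30 to get the number of lines per output file (bash integer division truncates, so add 29 first to round up): linesPerFile=$(( (lines + 29) / 30 ))

  3. Use split to split the file: split -l "${linesPerFile}" -d --additional-suffix=-filename.extension "${file}"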

Expected result

x00-filename.extension, x01-filename.extension ... xN-filename.extension
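To see which names split actually produces, the command can be tried on a throwaway file. This sketch assumes GNU split; the temporary directory and the 90-line sample input are invented for illustration. With plain -d, numbering starts at 00; --numeric-suffixes=1 would be needed to start at 01.

```shell
#!/bin/bash
# Sketch: inspect the file names produced by split -d with an additional suffix.
set -e
workdir=$(mktemp -d)
cd "$workdir"
seq 1 90 > filename.extension      # 90 sample lines, 3 per piece -> 30 pieces

split -l 3 -d --additional-suffix=-filename.extension filename.extension

ls x* | head -n 3                  # first three pieces
ls x* | wc -l                      # number of pieces
```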

Wrap it in a loop to process multiple files at once

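#!/bin/bash
# pathToWorkingDir must be set before running this script
while IFS= read -r -d '' file
do
    # lines per piece, rounded up so that at most 30 pieces are produced
    linesPerFile=$(( ($(wc -l < "${file}") + 29) / 30 ))
    # write the pieces next to the source file so results do not overwrite each other
    if split -l "${linesPerFile}" -d --additional-suffix=-filename.extension "${file}" "$(dirname "${file}")/x"; then
        echo "${file} split correctly"
    else
        echo "there was a problem splitting ${file}"
        exit 1 # we exit with an error code
    fi
done < <(find "${pathToWorkingDir}" -type f -name "filename.extension" -print0)
exit 0 # if all processed fine we exit with a success code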
