More efficient way to process a large number of files (300k+) to gather results?

Answer 1

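If I understand correctly, you have a fields.txt file with many lines and several res-0-n-0 files. For each line of fields.txt, you want to copy that line into result.txt, followed by the content of the matching res-0-<line_number>-0 file if that file exists.

I think you can simply read fields.txt line by line and write each line to result.txt, appending the content of the res-0-<line_number>-0 file when there is one.

I would go with something like this: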

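#!/bin/sh

# Line counter; starts at 0 to match the res-0-<n>-0 numbering.
LINE_NUMBER=0
while IFS= read -r line
do
  if [ -f "res-0-$LINE_NUMBER-0" ]
  then
    # A result file exists for this line: append its content.
    echo "$line $(cat "res-0-$LINE_NUMBER-0")" >> result.txt
  else
    echo "$line" >> result.txt
  fi
  # POSIX arithmetic; ((LINE_NUMBER++)) is a bashism and fails under plain sh.
  LINE_NUMBER=$((LINE_NUMBER + 1))
done < fields.txt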
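
To see what this kind of loop produces, here is a small self-contained sketch meant to be run in an empty scratch directory. The sample data (three lines, two result files holding 42 and 7) is made up for illustration and is not from the question. The logic is the same as the script above, except that the output is redirected once at the end of the loop, which avoids reopening result.txt for every line.

```shell
#!/bin/sh
# Demo with made-up data; run it in an empty scratch directory.
printf '%s\n' 'a b c' 'd e f' 'g h i' > fields.txt
echo 42 > res-0-0-0   # result for line 0
echo 7  > res-0-2-0   # result for line 2; line 1 has no result file

n=0
while IFS= read -r line
do
  if [ -f "res-0-$n-0" ]
  then
    # Line has a result file: print the line followed by the result.
    printf '%s %s\n' "$line" "$(cat "res-0-$n-0")"
  else
    # No result file: print the line unchanged.
    printf '%s\n' "$line"
  fi
  n=$((n + 1))
done < fields.txt > result.txt   # output file is opened only once

cat result.txt
```

With this sample input, lines 0 and 2 gain a trailing value and line 1 is copied unchanged.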

Answer 2

Try generating a sed script and then applying it to field.txt just once:

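# Emit one sed command per result file: on line N, replace the last character with " <result>".
while IFS='' read -r line; do
    res=$(<res-0-"$line"-0)
    real_line=$(( line + 1 ))   # sed addresses are 1-based
    printf '%s\n' "${real_line}s/.$/ ${res}/"
done < res_numbers_sorted.tmp > myscript.sed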

Then run:

sed -i -f myscript.sed field.txt

This way, you only iterate over the big file once. Let me know if this helps.
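
As a rough illustration of what the generated script looks like, the sketch below runs the same idea on made-up data in an empty scratch directory. It assumes each line of field.txt ends in a one-character placeholder (here "0") and that res_numbers_sorted.tmp lists the 0-based indices that have a result file; both are assumptions for the demo, not facts taken from the question. It uses POSIX `$(cat ...)` and writes to a new file, merged.txt (a hypothetical name), instead of editing in place.

```shell
#!/bin/sh
# Demo with made-up data; run it in an empty scratch directory.
# Assumption: every line ends in a one-character placeholder ("0").
printf '%s\n' 'a b 0' 'c d 0' 'e f 0' > field.txt
echo 42 > res-0-0-0
echo 7  > res-0-2-0
printf '%s\n' 0 2 > res_numbers_sorted.tmp   # indices that have a result file

# Build one sed command per result file.
while IFS= read -r idx
do
  res=$(cat "res-0-$idx-0")
  # Address is idx + 1 because sed counts lines from 1.
  printf '%ss/.$/ %s/\n' "$((idx + 1))" "$res"
done < res_numbers_sorted.tmp > myscript.sed

cat myscript.sed

# Apply all substitutions in a single pass over the input.
sed -f myscript.sed field.txt > merged.txt
cat merged.txt
```

Note that the substitution replaces only the final character and puts a space before the result, so a line that already has a space before the placeholder ends up with two. A result containing `/` or `&` would also need escaping before being placed in the sed replacement.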
