Split a large .gz file and gzip each split file?

Answer 1

This job is best done with a small Perl program.

I knocked one up here: ftp://ftp.sqsol.co.uk/pub/tools/zsplit/

Take a look at it and feel free to modify it to suit your own needs.

Answer 2

Here is an awk and gzip pipeline that splits the file on line boundaries and compresses each part. A single awk process does all the splitting, because a reader that exits partway through a pipe discards whatever input it had already buffered:

# Generate files part0.dat.gz, part1.dat.gz, etc.
prefix="part"
suffix=".dat"

lines=10000 # Start a new part every 10000 lines.

zcat thefile.dat.gz |
awk -v lines="${lines}" -v prefix="${prefix}" -v suffix="${suffix}" '
  # At the first line of each part, close the previous gzip and open the next.
  (NR - 1) % lines == 0 {
    if (cmd != "") close(cmd)
    cmd = "gzip --best >" prefix (count++) suffix ".gz"
  }
  # Send every line to the gzip process for the current part.
  { print | cmd }
  END { if (cmd != "") close(cmd) }
'

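To recreate the original file, concatenate the parts in numeric order and recompress them: ls -v part*.dat.gz | xargs zcat | gzip --best >thefile1.dat.gz. A plain part*.dat.gz glob sorts part10 before part2, so it is only safe with ten or fewer parts; ls -v is a GNU extension. The MD5 checksum of the compressed file may differ from the original's because the gzip options used may differ, but the uncompressed contents are identical.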
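Where GNU coreutils is available, split can do the line counting itself and hand each chunk to gzip. The following is only a sketch under that assumption: it relies on GNU split's --filter, -d and -a options, and it builds a made-up 25-line sample file in a temporary directory so that the round trip can be checked. The 10-line chunk size and the four-digit suffix width are illustrative choices, not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch: split on line boundaries with GNU split, then verify the round trip.
set -euo pipefail

# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample input: 25 numbered lines, gzip-compressed.
seq 1 25 | gzip > thefile.dat.gz

# GNU split runs the --filter command once per chunk with $FILE set to the
# chunk name (part0000, part0001, ...); gzip reads the chunk on stdin.
zcat thefile.dat.gz |
  split -l 10 -d -a 4 --filter='gzip --best > "$FILE.dat.gz"' - part

# Zero-padded suffixes keep a plain glob in the right order.
zcat part*.dat.gz | gzip --best > thefile1.dat.gz

# Compare checksums of the uncompressed data, not of the .gz files.
original_sum=$(zcat thefile.dat.gz | md5sum)
rebuilt_sum=$(zcat thefile1.dat.gz | md5sum)
part_count=$(ls part*.dat.gz | wc -l)
echo "parts written: $part_count"
```

Because the suffixes are zero-padded to a fixed width, the parts can be reassembled with an ordinary glob and no numeric sort is needed.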
