How to split a file into multiple files after every N occurrences of a pattern?

Answer 1

One way is to use awk:

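awk -v moleculesNum=7 '
/^@<TRIPOS>MOLECULE/ {
    # start a new output file at molecule 1, 1+N, 1+2N, ... (also works for N=1)
    if (num++ % moleculesNum == 0) {
        if (outfile != "") close(outfile)
        outfile = "file" (++output)
    }
}
# lines before the first molecule header have no target file yet and are skipped
outfile != "" { print > outfile }' infile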

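This splits the original file into several files named file1, file2, and so on, each holding at most 7 molecules (adjust the count with the moleculesNum=7 parameter).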
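
A quick way to check the awk approach is to run it on a small synthetic input. The sketch below is only an illustration under stated assumptions: the sample records (the mol1, mol2, ... lines and the @<TRIPOS>ATOM lines) are made-up filler, moleculesNum is lowered to 3 so the split is visible with 10 molecules, and the awk program is the guarded form that tests num++ % moleculesNum == 0 and prints nothing before the first header.

```bash
#!/usr/bin/env bash
# Sketch: run the awk split on a synthetic mol2-like file.
# Assumption: every molecule begins with a "@<TRIPOS>MOLECULE" line;
# the mol<k> and "@<TRIPOS>ATOM" lines are made-up filler.
set -eu

# Work in a fresh scratch directory so existing files are untouched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a sample input with 10 molecules of 3 lines each.
for k in $(seq 1 10); do
  printf '@<TRIPOS>MOLECULE\nmol%d\n@<TRIPOS>ATOM\n' "$k"
done > infile

# Start a new output file at molecules 1, 4, 7, 10 (moleculesNum=3).
awk -v moleculesNum=3 '
/^@<TRIPOS>MOLECULE/ {
    if (num++ % moleculesNum == 0) {
        if (outfile != "") close(outfile)
        outfile = "file" (++output)
    }
}
outfile != "" { print > outfile }' infile

# Count the molecules in each output file.
# Expected: file1: 3, file2: 3, file3: 3, file4: 1 (one per line)
for f in file*; do
  printf '%s: %s\n' "$f" "$(grep -c '^@<TRIPOS>MOLECULE' "$f")"
done
```

Concatenating file1 through file4 in order reproduces infile byte for byte, which is an easy sanity check on real data too.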

Answer 2

Here is a bash approach based on the csplit utility (the long options below are those of GNU coreutils csplit):

### user customization section
tmpdir=$(mktemp -d)
prefix='outfile'
bunch=5
pat='@<TRIPOS>MOLECULE'

## break up the input file on pattern: one piece per molecule,
## each piece starting at its own @<TRIPOS>MOLECULE line
csplit ./file \
  --silent \
  --elide-empty-files \
  --prefix "$tmpdir/$prefix" \
  --suffix-format='%d.tmp' \
  "/$pat/" '{*}'

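## coalesce the split-up pieces into bunches of $bunch pieces each
i=0
while [ -e "$tmpdir/$prefix$(( bunch * i )).tmp" ]; do
  start=$(( bunch * i ))
  stop=$(( start + bunch - 1 ))
  # the last bunch may be short; cat's errors for missing pieces are discarded
  for ((j=start; j<=stop; j++)); do
    printf '%s\n' "$tmpdir/$prefix$j.tmp"
  done | xargs cat > "./$prefix.$i" 2>/dev/null || true
  i=$(( i + 1 ))
done
rm -rf -- "$tmpdir"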

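The current directory will then hold the bunches as outfile.0, outfile.1, and so on (the name comes from prefix, the number of pieces per bunch from bunch).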
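
To try the csplit approach end to end, here is a self-contained sketch on synthetic data (12 made-up molecules, bunch=5). It is a variant, not the answer verbatim: it splits on "/$pat/" so each piece begins with its header line, and it replaces the xargs loop with a simpler one that appends piece n to bunch n / bunch, which cannot leave an empty output file behind when the number of pieces is an exact multiple of bunch. It assumes GNU csplit, since --elide-empty-files and --suffix-format are GNU long options.

```bash
#!/usr/bin/env bash
# Sketch of the csplit + coalesce idea on synthetic data.
# Assumptions: GNU csplit; "mol<k>" and "@<TRIPOS>ATOM" lines are filler.
set -eu

# Fresh scratch directory, because the loop below appends with >>.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input: 12 molecules of 3 lines each.
for k in $(seq 1 12); do
  printf '@<TRIPOS>MOLECULE\nmol%d\n@<TRIPOS>ATOM\n' "$k"
done > file

tmpdir=$(mktemp -d)
prefix='outfile'
bunch=5
pat='@<TRIPOS>MOLECULE'

# One piece per molecule; the empty piece before line 1 is elided,
# so pieces are numbered 0..11 in $tmpdir.
csplit ./file --silent --elide-empty-files \
  --prefix "$tmpdir/$prefix" --suffix-format='%d.tmp' \
  "/$pat/" '{*}'

# Piece n goes into bunch n / bunch (integer division).
n=0
while [ -e "$tmpdir/$prefix$n.tmp" ]; do
  cat "$tmpdir/$prefix$n.tmp" >> "./$prefix.$(( n / bunch ))"
  n=$(( n + 1 ))
done
rm -rf -- "$tmpdir"

# Expected: outfile.0: 5, outfile.1: 5, outfile.2: 2 (one per line)
for f in outfile.*; do
  printf '%s: %s\n' "$f" "$(grep -c "^$pat" "$f")"
done
```

Because the coalescing step appends, rerunning it in a directory that already contains outfile.* would duplicate content; starting from an empty directory, as above, avoids that.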
