Splitting a file into specific output file names by pattern matching

Question 1

I would still consider using csplit, and then rename the files it generates.

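#!/bin/sh
mkdir ".tmp.$$" || exit 2
# Split before every "# new file" line; -z drops empty pieces, -k keeps pieces on error
csplit -f ".tmp.$$/tmp_" -zk -n 4 "$1" '/# new file/' '{*}'
shift    # drop the source file; "$@" now holds only the target names

for file in ".tmp.$$"/tmp_*
do
    # Out of names: stop here and let the check below report it
    [ "$#" -gt 0 ] || break
    mv -f "$file" "$1"
    shift
done
# rmdir fails if any parts were left unassigned
if ! rmdir ".tmp.$$" 2>/dev/null
then
    echo "Warning: not all file parts were assigned" >&2
    rm -rf ".tmp.$$"
    exit 1
fi
exit 0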

Usage

mysplit <source_file> <target_names...>
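
The same csplit-then-rename idea can be tried end to end with a throwaway input file. The sketch below is only an illustration under a few assumptions: it wraps the logic in a shell function, uses mktemp -d (not part of POSIX, but widely available) instead of ".tmp.$$", relies on the GNU csplit options -s, -z and -k, and the names demo_input, 'first part.txt' and 'second.txt' are made up for the demo.

```shell
#!/bin/sh
# Sketch: split SOURCE before every "# new file" line, then rename the pieces.
mysplit() {
    src=$1
    shift                       # "$@" now holds only the target names
    tmp=$(mktemp -d) || return 2
    # -s: no byte counts, -z: drop empty pieces, -k: keep pieces on error
    if ! csplit -s -z -k -n 4 -f "$tmp/part_" "$src" '/# new file/' '{*}'
    then
        rm -rf "$tmp"
        return 2
    fi
    status=0
    for part in "$tmp"/part_*
    do
        [ -e "$part" ] || continue
        if [ "$#" -eq 0 ]
        then
            echo "Warning: not all file parts were assigned" >&2
            status=1
            break
        fi
        mv -f "$part" "$1"
        shift
    done
    rm -rf "$tmp"               # discards any unassigned pieces
    return "$status"
}

# Demo in a scratch directory
workdir=$(mktemp -d) || exit 2
cd "$workdir" || exit 2
printf '%s\n' '# new file' 'alpha' '# new file' 'beta' > demo_input
mysplit demo_input 'first part.txt' 'second.txt'
cat 'first part.txt' 'second.txt'
```

Because the target names are taken from "$@" one at a time and always quoted, names containing spaces are handled without extra work.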

Question 2

This approach works even when the text file and the file names contain spaces and non-ASCII characters, and it needs no temporary files:

infile:

# new file
text in file1

blabla
# new file
text in file2
# new file
text in file3

$//*+\

s
# new file
4!
aaaaaaaaa
i^
# new file

#¬}}{][|\~@

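The file names must be passed to awk as separate arguments, each in single quotes so that the shell does not expand anything inside them (as it would inside double quotes). In this split.sh script: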

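awk -v file="0" '
  BEGIN {
    print "AWK arguments:"
    for (i = 0; i < ARGC; i++) {
      ARRAY[i] = ARGV[i]          # keep a copy of every argument
      print "\047" ARRAY[i] "\047"
      if (i > 1) {
        ARGV[i] = ""              # blank the output names so awk reads only infile
      }
    }
    print "Writing:"
  }
  # Any other line is appended to the current output file
  !/^# new file$/ {
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 >> ARRAY[file+1]
  }
  # A marker line closes the previous output file and starts the next one
  /^# new file$/ {
    if (file > 0) close(ARRAY[file+1])
    ++file
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 > ARRAY[file+1]
  }
' 'infile' '1.txt' '2.txt' '3.txt' 'file $_%.txt' '&file  _.txt'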

The console output looks like this:

AWK arguments:
'awk'
'infile'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
'&file  _.txt'
Writing:
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
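
For everyday use the debug output is not needed. The following is a hedged sketch of a quieter variant, not the script above: split_by_marker and the awk names name, n and total are hypothetical, and two behaviours are changed on purpose. Lines before the first marker are skipped (in the script above they would be appended to ARRAY[1], which is the input file itself), and the run stops with an error once the output names are used up.

```shell
#!/bin/sh
# split_by_marker SOURCE NAME...   (hypothetical helper name)
split_by_marker() {
    awk '
    BEGIN {
        # Save the output names (ARGV[2] onward) and blank them so awk
        # reads only ARGV[1] as input.
        for (i = 2; i < ARGC; i++) {
            name[i - 1] = ARGV[i]
            ARGV[i] = ""
        }
        total = ARGC - 2
    }
    /^# new file$/ {
        if (n > 0) close(name[n])    # finished with the previous output file
        n++
        if (n > total) {
            print "not enough output names" > "/dev/stderr"
            exit 1
        }
        printf "" > name[n]          # create or truncate the next output file
    }
    n > 0 { print > name[n] }        # the marker line is kept, as in the original
    ' "$@"
}

# Demo in a scratch directory
demo_dir=$(mktemp -d) || exit 2
cd "$demo_dir" || exit 2
printf '%s\n' 'preamble' '# new file' 'one' '# new file' 'two' > infile
split_by_marker infile 'a b.txt' '&c.txt'
cat 'a b.txt' '&c.txt'
```

Quoting "$@" in the function is what keeps names such as 'a b.txt' intact on their way into ARGV.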

If the arguments are instead produced by another command (the files must already exist in the file system):

' $(ls infile | tr '\n' ' ' ; ls *.txt)

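the shell splits the unquoted command output on whitespace, so names containing spaces are broken into several arguments: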

AWK arguments:
'awk'
'infile'
'&file'
'_.txt'
'1.txt'
'2.txt'
'3.txt'
'_.txt'
'file'
'$_%.txt'
Writing:
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '_.txt'
writing to: '_.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'

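To fix this, store the arguments in a shell array and expand it as "${array[@]}", which passes each element to awk as a single argument regardless of the spaces it contains. Arrays need a shell such as bash, not plain sh. The split.sh script becomes: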

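array=(infile *.txt)
awk -v file="0" '
  BEGIN {
    print "AWK arguments:"
    for (i = 0; i < ARGC; i++) {
      ARRAY[i] = ARGV[i]          # keep a copy of every argument
      print "\047" ARRAY[i] "\047"
      if (i > 1) {
        ARGV[i] = ""              # blank the output names so awk reads only infile
      }
    }
    print "Writing:"
  }
  # Any other line is appended to the current output file
  !/^# new file$/ {
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 >> ARRAY[file+1]
  }
  # A marker line closes the previous output file and starts the next one
  /^# new file$/ {
    if (file > 0) close(ARRAY[file+1])
    ++file
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 > ARRAY[file+1]
  }
' "${array[@]}"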

Now the result is:

AWK arguments:
'awk'
'infile'
'&file  _.txt'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
Writing:
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
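
The difference between the two ways of passing names can be checked in isolation by counting arguments. This is a small sketch with assumptions: count_args is a hypothetical helper, the three file names are created in a scratch directory, and the positional parameters (set --) stand in for the bash array so that the example also runs under plain sh.

```shell
#!/bin/sh
count_args() { echo "$#"; }

demo_dir=$(mktemp -d) || exit 2
cd "$demo_dir" || exit 2
: > '1.txt'
: > 'file $_%.txt'
: > '&file  _.txt'

# Unquoted command substitution: the output of ls is split on whitespace
split_count=$(count_args $(ls *.txt))

# Glob stored in the positional parameters and expanded quoted:
# one argument per file, spaces preserved
set -- *.txt
glob_count=$(count_args "$@")

echo "unquoted command substitution: $split_count arguments"   # 5
echo "quoted glob expansion: $glob_count arguments"            # 3
```

The two names that contain spaces each turn into two words in the first case, which is why three files show up as five arguments.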

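The number of output file names must be at least the number of pieces the input is split into. If there are more names than pieces, the extra names are ignored.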
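
One way to enforce this before any file is written is to count the marker lines first. The sketch below is an assumption-laden illustration: enough_names is a hypothetical helper, and it assumes the input starts with a "# new file" line so that the number of markers equals the number of pieces.

```shell
#!/bin/sh
# enough_names SOURCE NAME...  succeeds if there is a name for every marker line
enough_names() {
    src=$1
    shift
    needed=$(grep -c '^# new file$' "$src")
    if [ "$#" -lt "$needed" ]
    then
        echo "need $needed output names, got $#" >&2
        return 1
    fi
    return 0
}

demo_dir=$(mktemp -d) || exit 2
printf '%s\n' '# new file' 'a' '# new file' 'b' '# new file' 'c' > "$demo_dir/infile"

if enough_names "$demo_dir/infile" 1.txt 2.txt
then short_list=ok
else short_list=rejected
fi
if enough_names "$demo_dir/infile" 1.txt 2.txt 3.txt 4.txt
then long_list=ok
else long_list=rejected
fi
echo "two names: $short_list, four names: $long_list"
```

With three markers in the demo input, two names are rejected and four are accepted, the fourth simply going unused.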

Answer