I have a file with the following content:
# new file
text in file 1
# new file
text in file 2
# new file
text in file 3
The pattern here is # new file.
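Instead of saving each piece to xx00, xx01 and xx02, I want to save them to specific files: another file, file new and last one.
These 3 files already exist in the current directory, so I would like to supply them as an array and overwrite them: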
csplit -z infile '/# new file/' "${array[*]}"
The array can be given directly
array=('another file' 'file new' 'last one')
echo ${array[*]}
another file file new last one
or built by listing the current directory
array=($(find . -type f))
echo ${array[*]}
./another file ./file new ./last one
A modification of this script might be a solution:
awk -v file="1" -v occur="2" '
{
print > (file".txt")
}
/^\$\$\$\$$/{
count++
if(count%occur==0){
if(file){
close(file".txt")
++file
}
}
}
' Input_file
Answer 1
I would still consider using csplit, but then rename the resulting files.
#!/bin/sh
mkdir ".tmp.$$" || exit 2
csplit -f ".tmp.$$/tmp_" -zk -n 4 "$1" '/# new file/' '{*}'
for file in ".tmp.$$"/tmp_*
do
shift
mv -f "$file" "$1"
done
if ! rmdir ".tmp.$$" 2>/dev/null
then
echo "Warning: not all file parts were assigned" >&2
rm -rf ".tmp.$$"
exit 1
fi
exit 0
Usage
mysplit <source_file> <target_names...>
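The same split-then-rename idea can be sketched as a shell function, so that the array from the question can be passed in directly. This is only a sketch under assumptions: split_to_names and the part_ prefix are hypothetical names, and GNU csplit is assumed because the -z option is used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "$1" at each "# new file" line and move the
# pieces, in order, onto the remaining arguments (the target names).
split_to_names() {
    local src=$1
    shift
    local tmp part
    tmp=$(mktemp -d) || return 2

    # -s: quiet, -z: drop empty pieces, -k: keep pieces on error
    csplit -s -z -k -n 4 -f "$tmp/part_" "$src" '/# new file/' '{*}'

    for part in "$tmp"/part_*; do
        [ "$#" -gt 0 ] || break      # ran out of target names
        mv -f -- "$part" "$1"
        shift
    done

    # rmdir only succeeds if every piece was moved away
    rmdir "$tmp" 2>/dev/null || { rm -rf "$tmp"; return 1; }
}
```

With array=('another file' 'file new' 'last one'), it would be called as split_to_names infile "${array[@]}". Note the [@] form: "${array[*]}" would join all names into a single argument.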
Answer 2
This method works even when the text file and the file names contain spaces and non-ASCII characters, and it needs no temporary files:
infile:
# new file
text in file1
blabla
# new file
text in file2
# new file
text in file3
$//*+\
s
# new file
4!
aaaaaaaaa
i^
# new file
#¬}}{][|\~@
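The file names must be passed to awk as separate arguments, each in single quotes so that the shell does not expand characters such as $ (as it would inside double quotes), in this split.sh script: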
awk -v file="0" '
BEGIN {
print "AWK arguments:"
for (i = 0; i < ARGC; i++){
ARRAY[i] = ARGV[i]
print "\047"ARRAY[i]"\047"
if (i > 1){
ARGV[i] = ""
}
}
print "Writing:"
}
!/^# new file$/{
print "writing to: " "\047"ARRAY[file+1]"\047"
print $0 >> ARRAY[file+1]
}
/^# new file$/{
close(ARRAY[file+1])
++file
print "writing to: " "\047"ARRAY[file+1]"\047"
print $0 > ARRAY[file+1]
}
' 'infile' '1.txt' '2.txt' '3.txt' 'file $_%.txt' '&file _.txt'
The console output looks like this:
AWK arguments:
'awk'
'infile'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
'&file _.txt'
Writing:
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: '&file _.txt'
writing to: '&file _.txt'
writing to: '&file _.txt'
If the arguments are instead passed as the output of another command (the files must already exist in the file system):
' $(ls infile | tr '\n' ' ' ; ls *.txt)
the shell splits the arguments on spaces:
AWK arguments:
'awk'
'infile'
'&file'
'_.txt'
'1.txt'
'2.txt'
'3.txt'
'_.txt'
'file'
'$_%.txt'
Writing:
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '_.txt'
writing to: '_.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
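The splitting happens in the shell before awk ever runs: the result of an unquoted command substitution is broken into words on whitespace, whereas a glob yields one word per file name. A small sketch that counts the arguments under both expansions (count_args and the scratch directory are illustrative only, not part of split.sh):

```shell
#!/usr/bin/env bash
# Print how many arguments were received.
count_args() { echo "$#"; }

# Work in a scratch directory with the awkward names from this answer.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch '1.txt' 'file $_%.txt' '&file _.txt'

# Unquoted command substitution: output is split on spaces and newlines.
split_count=$(count_args $(ls *.txt))

# Glob stored in an array, expanded with "${names[@]}": one word per file.
names=(*.txt)
array_count=$(count_args "${names[@]}")

echo "command substitution: $split_count arguments"
echo "array expansion:      $array_count arguments"
```

With the three files created here, the first count is 5 and the second is 3.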
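To fix this, collect the arguments in a shell array and pass it to awk as "${array[@]}", so that each element stays a single argument regardless of the spaces it contains, using the following split.sh script: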
array=(infile *.txt)
awk -v file="0" '
BEGIN {
print "AWK arguments:"
for (i = 0; i < ARGC; i++){
ARRAY[i] = ARGV[i]
print "\047"ARRAY[i]"\047"
if (i > 1){
ARGV[i] = ""
}
}
print "Writing:"
}
!/^# new file$/{
print "writing to: " "\047"ARRAY[file+1]"\047"
print $0 >> ARRAY[file+1]
}
/^# new file$/{
close(ARRAY[file+1])
++file
print "writing to: " "\047"ARRAY[file+1]"\047"
print $0 > ARRAY[file+1]
}
' "${array[@]}"
Now the result is:
AWK arguments:
'awk'
'infile'
'&file _.txt'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
Writing:
writing to: '&file _.txt'
writing to: '&file _.txt'
writing to: '&file _.txt'
writing to: '&file _.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
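The number of files to write to must be at least the number of splits that will be performed. If there are more, the extra ones are ignored.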
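That requirement can be checked before any file is written. A minimal sketch, assuming the marker is exactly the line # new file; enough_targets is a hypothetical helper name, not part of split.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least as many target names
# were given as there are "# new file" marker lines in the input file.
enough_targets() {
    local infile=$1
    shift
    local parts
    # grep -c prints the number of matching lines (0 if there are none)
    parts=$(grep -c '^# new file$' "$infile")
    [ "$#" -ge "$parts" ]
}
```

It could be used as a guard, for example: enough_targets infile "${names[@]}" || exit 1, where names holds only the target file names.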