Split a file into specific output file names by pattern matching

I have a file with the following content:

# new file
text in file 1
# new file
text in file 2
# new file
text in file 3

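The pattern here is # new file

Instead of saving each piece to xx00, xx01 and xx02, I want to save them to specific files: another file, file new, last one.

These 3 files already exist in the current directory, so I want to supply them as an array and overwrite them: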

csplit -z infile '/# new file/' "${array[*]}"

The array can be supplied directly:

array=('another file' 'file new' 'last one')
echo ${array[*]}
another file file new last one

or built by listing the current directory:

array=($(find . -type f))
echo ${array[*]}
./another file ./file new ./last one

A modification of this script might be the solution:

awk -v file="1" -v occur="2" '
{
  print > (file".txt")
}
/^\$\$\$\$$/{
  count++
  if(count%occur==0){
    if(file){
      close(file".txt")
      ++file
    }
  }
}
'  Input_file

Answer 1

I would still consider using csplit, and then rename the resulting files.

#!/bin/sh
mkdir ".tmp.$$" || exit 2
csplit -f ".tmp.$$/tmp_" -zk -n 4 "$1" '/# new file/' '{*}'

for file in ".tmp.$$"/tmp_*
do
    shift
    mv -f "$file" "$1"
done
if ! rmdir ".tmp.$$" 2>/dev/null
then
    echo "Warning: not all file parts were assigned" >&2
    rm -rf ".tmp.$$"
    exit 1
fi
exit 0

Usage

mysplit <source_file> <target_names...>
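
As a rough illustration of the split-then-rename idea, the following sketch runs it on the sample data from the question. It assumes bash and GNU csplit (for -z and '{*}'); the scratch directory, the parts subdirectory and the loop counter are hypothetical names chosen for this example, not part of the script above.

```shell
#!/bin/bash
# Sketch only: split with csplit, then rename the pieces to the wanted names.
work=$(mktemp -d) || exit 2
cd "$work" || exit 2

# Sample input from the question
printf '%s\n' '# new file' 'text in file 1' \
              '# new file' 'text in file 2' \
              '# new file' 'text in file 3' > infile

# Target names, in the order the pieces should be assigned
array=('another file' 'file new' 'last one')

mkdir parts || exit 2
# -s: no size report, -z: drop the empty piece before the first marker
csplit -s -z -f parts/tmp_ -n 4 infile '/# new file/' '{*}'

# The glob expands in sorted order, so pieces map to names one by one
i=0
for part in parts/tmp_*; do
    mv -f -- "$part" "${array[i]}"
    i=$((i + 1))
done
rmdir parts

# Show the result; remove "$work" when done
head -- "${array[@]}"
```

Quoting "${array[i]}" and "${array[@]}" keeps names that contain spaces intact.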

Answer 2

This method works even when the text file and the file names contain spaces and non-ASCII characters, and it needs no temporary files:

infile:

# new file
text in file1

blabla
# new file
text in file2
# new file
text in file3

$//*+\

s
# new file
4!
aaaaaaaaa
i^
# new file

#¬}}{][|\~@

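The file names must be passed to the awk command as separate arguments, in single quotes so that the shell does not expand anything inside them (as it would inside double quotes), as in this split.sh script: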

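awk -v file="0" '
  BEGIN {
    # ARGV[0] is "awk", ARGV[1] the input file, ARGV[2..] the output names
    print "AWK arguments:"
    for (i = 0; i < ARGC; i++) {
      ARRAY[i] = ARGV[i]
      print "\047" ARRAY[i] "\047"
      if (i > 1) {
        ARGV[i] = ""   # blank the output names so awk does not read them as input
      }
    }
    print "Writing:"
  }
  # ordinary line: append to the current output file
  # (lines before the first marker are skipped instead of going to the input file)
  file > 0 && !/^# new file$/ {
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 >> ARRAY[file+1]
  }
  # marker line: close the previous output file and start the next one
  /^# new file$/ {
    if (file > 0) close(ARRAY[file+1])
    ++file
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 > ARRAY[file+1]
  }
' 'infile' '1.txt' '2.txt' '3.txt' 'file $_%.txt' '&file  _.txt'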

The console output looks like this:

AWK arguments:
'awk'
'infile'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
'&file  _.txt'
Writing:
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
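
The script relies on awk skipping any ARGV element that has been set to the empty string, which is why the output names are not read as input files. A minimal sketch of just that behaviour, with a hypothetical two-line input file and a made-up output name:

```shell
#!/bin/bash
# Sketch only: blank an ARGV entry so awk does not treat it as an input file.
tmp=$(mktemp -d) || exit 2
printf 'a\nb\n' > "$tmp/in.txt"

result=$(awk '
  BEGIN {
    name = ARGV[2]   # remember the extra operand
    ARGV[2] = ""     # an empty ARGV element is skipped when awk reads its input
  }
  { print name ": " $0 }
' "$tmp/in.txt" 'out name.txt')

printf '%s\n' "$result"
rm -rf "$tmp"
```

Without the assignment to ARGV[2], awk would try to open 'out name.txt' as a second input file.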

If the arguments are instead passed as the output of another command (the files must already exist in the file system):

' $(ls infile | tr '\n' ' ' ; ls *.txt)

the shell splits the arguments on spaces:

AWK arguments:
'awk'
'infile'
'&file'
'_.txt'
'1.txt'
'2.txt'
'3.txt'
'_.txt'
'file'
'$_%.txt'
Writing:
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '&file'
writing to: '_.txt'
writing to: '_.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'

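To solve this, pass the arguments to awk as a quoted array expansion, so that each element stays a single argument instead of being split on spaces, using the following split.sh script: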

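array=(infile *.txt)
awk -v file="0" '
  BEGIN {
    # ARGV[0] is "awk", ARGV[1] the input file, ARGV[2..] the output names
    print "AWK arguments:"
    for (i = 0; i < ARGC; i++) {
      ARRAY[i] = ARGV[i]
      print "\047" ARRAY[i] "\047"
      if (i > 1) {
        ARGV[i] = ""   # blank the output names so awk does not read them as input
      }
    }
    print "Writing:"
  }
  # ordinary line: append to the current output file
  # (lines before the first marker are skipped instead of going to the input file)
  file > 0 && !/^# new file$/ {
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 >> ARRAY[file+1]
  }
  # marker line: close the previous output file and start the next one
  /^# new file$/ {
    if (file > 0) close(ARRAY[file+1])
    ++file
    print "writing to: " "\047" ARRAY[file+1] "\047"
    print $0 > ARRAY[file+1]
  }
' "${array[@]}"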

Now the result is:

AWK arguments:
'awk'
'infile'
'&file  _.txt'
'1.txt'
'2.txt'
'3.txt'
'file $_%.txt'
Writing:
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '&file  _.txt'
writing to: '1.txt'
writing to: '1.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '2.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: '3.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'
writing to: 'file $_%.txt'

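The number of file names supplied must be at least the number of splits that will be performed. If there are more names than splits, the extra names are ignored.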
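
One possible way to enforce that requirement is to stop with an error as soon as a marker is found for which no name is left. This is a sketch under assumptions rather than part of the answer: it drops the diagnostic output, and the out array, the nout and n counters and the sample names are all hypothetical.

```shell
#!/bin/bash
# Sketch only: same splitting idea, but fail when the names run out.
work=$(mktemp -d) || exit 2
printf '%s\n' '# new file' 'text in file 1' \
              '# new file' 'text in file 2' \
              '# new file' 'text in file 3' > "$work/infile"

# Deliberately one name short: three sections, two names
names=("$work/another file" "$work/file new")

awk '
  BEGIN {
    # ARGV[1] is the input; ARGV[2..] are output names, blanked so awk skips them
    for (i = 2; i < ARGC; i++) { out[i - 1] = ARGV[i]; ARGV[i] = "" }
    nout = ARGC - 2
  }
  /^# new file$/ {
    if (n > 0) close(out[n])          # finish the previous section
    if (++n > nout) {
      print "error: no output name for section " n | "cat 1>&2"
      exit 1
    }
  }
  n > 0 { print > out[n] }            # marker and body lines go to the current file
' "$work/infile" "${names[@]}"
status=$?

echo "awk exit status: $status"
```

Here the first two sections are written and awk then exits with a non-zero status at the third marker, so the caller can detect that a section was left without a file.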
