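Folks,
I'm a bit stumped by this. I'm trying to write a bash script that uses csplit to take multiple input files and split them according to the same pattern. (For context: I have several TeX files containing questions, separated by the \question command. I want to extract each question into its own file.)
This is the code I have so far: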
#!/bin/bash
# This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.
# This line is for the user to input the name of the file they need questions split from.
read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
read -ep "Type the directory where you would like to save the split files: " save
read -ep "What unit do these questions belong to?" unit
# This is a check for the user to confirm the file list, and proceed if true:
echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
select ynf in "Yes" "No"; do
case $ynf in
No ) exit;;
Yes ) echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
select ynd in "Yes" "No"; do
case $ynd in
Yes )
# This line will create a loop to conduct the script over all the files in the list.
for i in ${files[@]}
do
# Mass renaming is formatted to give "question###.tex" to enable processing a large number of questions quickly.
# csplit is the utility used here; run "man csplit" to learn more of its functionality.
# the structure is "csplit [name of file] [output options] [search filter] [separator(s)]".
# this script calls csplit, will accept the name of the file in the argument, searches the files for calls of "question", splits the file everywhere it finds a line with "question", and renames it according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
# the '\\question' allows searching for \question, which eliminates the split for \end{questions}; eliminating the \begin{questions} split has not yet been understood.
csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
done; exit;;
No ) exit;;
esac
done
esac
done
return
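I can confirm that it does loop over the input files as I expect. However, the behaviour I'm seeing is that it splits the first file into "q1.tex q2.tex q3.tex" as expected, and when it moves on to the next file in the list, it splits the questions and overwrites the old files; the third file then overwrites the second file's splits, and so on. What I would like to happen is that if File1 has 3 questions, it outputs: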
q1.tex
q2.tex
q3.tex
Then, if File2 has 4 questions, it would keep incrementing to:
q4.tex
q5.tex
q6.tex
q7.tex
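Is there a way to have csplit detect the numbering that has already been done in this loop and increment from there?
Thanks for any help you can offer!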
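Answer 1
The csplit command keeps no saved context between runs (nor should it), so each invocation restarts its numbering from zero. csplit itself has no option to change this, but you can maintain your own count value and interpolate it into the prefix string.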
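The idea of maintaining your own count can be sketched roughly as follows. This is only an illustration, assuming GNU csplit: the function name split_all, the temporary "part" prefix, and the demo files a.tex and b.tex are made up for the example. Each input is split into a scratch directory, and the pieces are then renamed using a counter that carries over from one file to the next.

```shell
#!/bin/bash
# Sketch only: split_all and the demo file names below are hypothetical.

# split_all OUTDIR FILE... : split every FILE at lines containing \question
# and number the pieces consecutively across all files.
split_all() {
    local outdir=$1; shift
    local count=0 tmp file piece
    for file in "$@"; do
        tmp=$(mktemp -d) || return 1
        # -s: do not print piece sizes; -z: drop empty pieces (GNU csplit)
        csplit -s -z --prefix="$tmp/part" --suffix-format='%03d' \
            "$file" '/\\question/' '{*}'
        for piece in "$tmp"/part*; do
            [ -e "$piece" ] || continue   # nothing was produced for this file
            count=$((count + 1))
            mv "$piece" "$(printf '%s/q%03d.tex' "$outdir" "$count")"
        done
        rm -rf "$tmp"
    done
}

# Demo with two small made-up inputs (3 and 4 questions).
demo=$(mktemp -d)
mkdir "$demo/out"
printf '\\question A1\n\\question A2\n\\question A3\n' > "$demo/a.tex"
printf '\\question B1\n\\question B2\n\\question B3\n\\question B4\n' > "$demo/b.tex"
split_all "$demo/out" "$demo/a.tex" "$demo/b.tex"
ls "$demo/out"
```

The demo is expected to list q001.tex through q007.tex. Note that if a file has text before its first \question (a preamble, say), that text becomes a piece of its own and is numbered too.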
Alternatively, try replacing
read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
...
for i in ${files[@]}
do
csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
done
with
read -a files -ep 'Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. '
...
cat "${files[@]}" | csplit - --prefix="$save/${unit}q" --suffix-format='%03d.tex' '/\\question/' '{*}'
This is one of the relatively rare cases where you really do need cat {file} | ..., because csplit accepts only a single file argument (or - for standard input).
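I've changed your read to use an array variable, because that is what you were (correctly) trying to use in the for ... do csplit ... loop.
Whatever you end up doing, I strongly recommend double-quoting your variables wherever you use them, and especially array expansions (for example "${files[@]}").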
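To illustrate why the quotes matter, here is a small sketch. The array contents and the helper count_args are invented for the example; it simply counts how many arguments the shell passes with and without the quotes when a file name contains a space.

```shell
#!/bin/bash
# Hypothetical file list: the first name contains a space.
files=("unit 1.tex" "unit2.tex")

# count_args is a made-up helper that prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args ${files[@]})     # word splitting breaks "unit 1.tex" in two
quoted=$(count_args "${files[@]}")     # each array element stays one argument

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS this prints "unquoted: 3, quoted: 2": without the quotes, csplit would be handed "unit" and "1.tex" as separate file names.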
Answer 2
With Awk, you can run something like this:
awk '/\\question/ {i++} ; {print > ("q" i ".tex")}' exam*.tex
To define the output directory (d) and the topic (t), and to control the number of digits:
awk '/\\question/ {f=sprintf("%s/%s-q%03d.tex", d, t, i++)} {print>f}' d=d1 t=t1 ex*
To skip the TeX preamble, print only once f has been defined:
awk '/\\question/ {f=sprintf("%s/%s-q%03d.tex", d, t, ++i)}
f {print>f}' d=d1 t=t1 ex*
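When there are many questions, it may be worth closing each output file before opening the next one, since some awk implementations limit how many files can be open at once. A minimal sketch under that assumption, run against made-up demo files (ex1.tex, ex2.tex) in a temporary directory:

```shell
#!/bin/bash
# Sketch only: the demo directory and the ex1.tex / ex2.tex inputs are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/d1"
printf '\\documentclass{exam}\n\\question A1\n\\question A2\n' > "$demo/ex1.tex"
printf '\\question B1\nbody\n' > "$demo/ex2.tex"

# On each \question line: close the previous output file (if any) and pick
# the next name. Lines seen before the first \question are skipped because
# f is still empty at that point.
awk '/\\question/ { if (f) close(f); f = sprintf("%s/%s-q%03d.tex", d, t, ++i) }
     f { print > f }' d="$demo/d1" t=t1 "$demo"/ex*.tex

ls "$demo/d1"
```

Because the counter i is never reset, the numbering continues across input files, and the \documentclass line in ex1.tex is not written anywhere.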
Answer 3
You can use this script:
grep -o -P '(parameter).*(parameter)' your_teX_file.teX > questions.txt
This gives you a file, questions.txt, containing all the questions, which you can then split:
split -l 1 questions.txt