csplit multiple files into multiple files

Hi all,

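I'm a bit stumped on this one. I'm trying to write a bash script that uses csplit to take several input files and split each of them on the same pattern. (For context: I have several TeX files containing questions, separated by the \question command. I want to extract each question into its own file.)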

My code so far:

#!/bin/bash
# This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.
# This line is for the user to input the name of the file they need questions split from.

read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files

read -ep "Type the directory where you would like to save the split files: " save

read -ep "What unit do these questions belong to?" unit

# This is a check for the user to confirm the file list, and proceed if true:

echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
select ynf in "Yes" "No"; do
    case $ynf in 
        No ) exit;;
        Yes ) echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
            select ynd in "Yes" "No"; do
            case $ynd in
                Yes )
#                   This line will create a loop to conduct the script over all the files in the list.
                    for i in ${files[@]}
                    do
#                   Mass renaming is formatted to give "question###.tex" to enable processing a large number of questions quickly.
#                   csplit is the utility used here; run "man csplit" to learn more about its functionality.
#                   The structure is "csplit [name of file] [output options] [search filter] [separator(s)]".
#                   This calls csplit on the file in the loop variable, searches it for "\question", starts a new piece at every line containing "\question", and names the pieces [prefix][number][suffix] (csplit increments the number automatically; the %03d in --suffix-format zero-pads it to three digits).
#                   The '\\question' pattern matches a literal \question, so there is no split at \end{questions}; the leading piece containing \begin{questions} is still produced, and how to eliminate it has not been figured out yet.
                        csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
                    done; exit;;
                No ) exit;;
            esac
        done
    esac
done

exit

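I can confirm that it does loop through the input files as expected. However, what I've noticed is that it splits the first file into "q1.tex q2.tex q3.tex" as expected, but when it moves on to the next file in the list, it splits that file's questions and overwrites the old files; the third file then overwrites the second file's pieces, and so on. What I'd like to happen is that if File1 has 3 questions, it outputs: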

q1.tex
q2.tex
q3.tex

Then, if File2 has 4 questions, it continues incrementing to:

q4.tex
q5.tex
q6.tex
q7.tex

Is there a way to make csplit detect the numbering already used in this loop and increment accordingly?

Thanks for any help you can offer!

Answer 1

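The csplit command keeps no context between runs (nor should it), so it always restarts its numbering from zero (000 with your suffix format) for every file. There's no way around that, but you can maintain your own count value and insert it into the prefix string.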
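A minimal sketch of that idea, assuming GNU csplit and a save directory that already exists: the hypothetical function split_questions keeps a per-file counter n and embeds it in the prefix, so each input file gets its own numbered series and later files no longer overwrite earlier ones. The -f1-/-f2- naming scheme is only an illustration.

```bash
#!/usr/bin/env bash
# Sketch: keep our own per-file counter and embed it in the csplit prefix.
# split_questions is a hypothetical helper; assumes GNU csplit.
# Usage: split_questions SAVE_DIR UNIT FILE...
split_questions() {
    local save=$1 unit=$2
    shift 2
    local n=0 f
    for f in "$@"; do
        n=$((n + 1))
        # Produces SAVE_DIR/UNIT-f1-q000.tex, SAVE_DIR/UNIT-f1-q001.tex, ...
        # then SAVE_DIR/UNIT-f2-q000.tex, ..., so nothing gets overwritten.
        csplit --quiet --prefix="$save/${unit}-f${n}-q" --suffix-format='%03d.tex' \
            "$f" '/\\question/' '{*}'
    done
}
```

Called as split_questions "$save" "$unit" "${files[@]}", it would stand in for the for loop in the question. Piece 000 of each file still holds whatever precedes that file's first \question.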

Alternatively, try replacing

read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files

...

for i in ${files[@]}
do
    csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\\question'/ '{*}'
done

with

read -a files -ep 'Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. '

...

cat "${files[@]}" | csplit - --prefix="$save/${unit}q" --suffix-format='%03d.tex' '/\\question/' '{*}'

This is one of the relatively rare cases where you genuinely need cat {file} | ..., as csplit takes only a single file argument (or - for standard input).
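The same pipeline wrapped in a function, as a sketch assuming GNU csplit (split_concatenated is a hypothetical name), shows the effect: csplit sees one continuous stream, so the numbering simply keeps going from one input file to the next.

```bash
#!/usr/bin/env bash
# Sketch of the cat | csplit - pipeline, wrapped in a function.
# split_concatenated is a hypothetical name; assumes GNU csplit.
# Usage: split_concatenated SAVE_DIR UNIT FILE...
split_concatenated() {
    local save=$1 unit=$2
    shift 2
    # cat turns all inputs into one stream, so csplit numbers the pieces
    # continuously (000, 001, ...) instead of restarting for every file.
    cat "$@" | csplit --quiet --prefix="$save/${unit}q" --suffix-format='%03d.tex' \
        - '/\\question/' '{*}'
}
```

One side effect of concatenating: whatever precedes the first \question in the second and later files (their preambles, say) ends up appended to the last question of the previous file, so those pieces may need some cleanup.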

I've changed your read to use an array variable, because that is what you were (correctly) trying to use in your for ... do csplit ... loop.
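A small illustration of how read -a fills the array, with a here-string standing in for what the user would type (the file names are made up, and I've added -r so that backslashes in the input are kept literally):

```bash
#!/usr/bin/env bash
# read -a splits the input line on whitespace (per IFS) into array elements.
read -r -a files <<< "ch1.tex ch2.tex ch3.tex"

echo "${#files[@]}"        # prints 3
for f in "${files[@]}"; do
    echo "$f"              # prints ch1.tex, ch2.tex, ch3.tex on separate lines
done
```

Because the line is split on whitespace, a file name that itself contains a space cannot be entered this way.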

Whatever you end up doing, I strongly recommend double-quoting all your variables wherever you use them, especially array lists (e.g. "${files[@]}").
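A quick demonstration of what the quotes change, using a made-up array in which one file name contains a space:

```bash
#!/usr/bin/env bash
# An element containing a space stays one word only when the expansion is quoted.
files=("unit 1.tex" "unit2.tex")

unquoted=0
for f in ${files[@]}; do unquoted=$((unquoted + 1)); done    # word-split: 3 iterations

quoted=0
for f in "${files[@]}"; do quoted=$((quoted + 1)); done      # one per element: 2 iterations

echo "unquoted: $unquoted, quoted: $quoted"    # prints: unquoted: 3, quoted: 2
```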

Answer 2

With awk, you can run the following:

awk '/\\question/ {i++} ; {print > ("q" i ".tex")}' exam*.tex

If you want to define the output directory (d) and topic (t), and control the width of the number:

awk '/\\question/ {f=sprintf("%s/%s-q%03d.tex", d, t, i++)} {print>f}' d=d1 t=t1 ex*

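To skip the TeX preamble, we can print only once "f" has been defined (otherwise the lines before the first \question are printed to an empty file name, which awk typically rejects with an error):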

awk '/\\question/ {f=sprintf("%s/%s-q%03d.tex", d, t, ++i)} 
     f            {print>f}' d=d1 t=t1 ex*
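A variation on the last command, as a sketch wrapped in a hypothetical bash function awk_split. It makes two changes beyond the original: it calls close() on each finished question file, since every print > f otherwise stays open and some awk implementations cap the number of open files, and it resets f at the start of each input file (FNR == 1) so that the preambles of later files are skipped too. The output directory must already exist; awk will not create it.

```bash
#!/usr/bin/env bash
# Sketch: preamble-skipping awk split with close() and a per-file reset.
# awk_split is a hypothetical helper name.
# Usage: awk_split OUT_DIR TOPIC FILE...
awk_split() {
    local outdir=$1 topic=$2
    shift 2
    awk -v d="$outdir" -v t="$topic" '
        FNR == 1     { if (f) close(f); f = "" }     # new input file: skip its preamble
        /\\question/ {
            if (f) close(f)                          # finish the previous question file
            f = sprintf("%s/%s-q%03d.tex", d, t, ++i)
        }
        f            { print > f }                   # only print once f is defined
    ' "$@"
}
```

Because i is never reset, numbering continues across all input files (q001, q002, ... through the last question of the last file), which is the behaviour the question asks for.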

Answer 3

You could use this:

grep -o -P '(parameter).*(parameter)' your_teX_file.teX > questions.txt

You'll get a questions.txt file containing all the questions, which you can then split:

split -l 1 questions.txt
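This only works if each question sits on a single line. Under that assumption, here is a sketch that fills in the (parameter) placeholders with a hypothetical ^\question pattern and lets GNU split produce numbered .tex files directly (-d, -a and --additional-suffix are GNU split options), piping instead of writing questions.txt. split_oneliners is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch assuming every question is one line that starts with \question.
# Usage: split_oneliners OUT_PREFIX FILE...
split_oneliners() {
    local prefix=$1
    shift
    # -h: don't prefix matching lines with file names when several files are given.
    # split reads stdin ("-"); one line per output file, suffixes 000, 001, ... plus .tex
    grep -h '^\\question' "$@" |
        split -l 1 -d -a 3 --additional-suffix=.tex - "$prefix"
}
```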
