Convert a large number of text files to PDF and name them based on their headers

Answer 1

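If you have a relatively simple file tree, with only one level of directories where each directory contains a list of files but no subdirectories, you should be able to do something like this (you can paste it directly into your terminal and hit Enter):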

```
for dir in *; do                           ## For each directory
  if [ "$(ls -A "$dir")" ]; then           ## If the dir is not empty
    for file in "$dir"/*; do               ## For each file in $dir
      i=0                                  ## initialize a counter
      ## Get the subject: first Subject: line, leading spaces trimmed,
      ## "/" replaced because it is not allowed in file names
      sub=$(grep -m 1 '^Subject:' "$file" | cut -d ':' -f 2- | sed 's/^[[:space:]]*//' | tr '/' '_')
      ## get the date, and format it to MMDDYY_Hour:Min:Sec (needs GNU date)
      date=$(date -d "$(grep -m 1 '^Date:' "$file" | cut -d ':' -f 2-)" +%m%d%y_%H:%M:%S)
      ## the pdf's name will be <directory's name>_<date>_<subject>
      name="${dir}_${date}_${sub}"
      ## while a file of this name exists
      while [ -e "$dir/$name.pdf" ]; do
        i=$((i + 1))                       ## increment the counter
        name="${dir}_${date}_${sub}${i}"   ## append it to the pdf's name
      done
      wkhtmltopdf "$file" "$dir/$name.pdf" ## convert html to pdf
    done
  fi
done
```
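The date step is the part most likely to misbehave, because `date -d` is a GNU coreutils extension that BSD/macOS `date` does not accept. A minimal sketch for checking that step on its own, assuming GNU `date` and an RFC 2822-style `Date:` header; the helper name `format_header_date` and the sample header are hypothetical, not part of the answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a "Date: ..." header line into MMDDYY_HH:MM:SS
# using the same cut/date pipeline as the loop (requires GNU date).
format_header_date() {
  local header_line="$1"
  local raw
  # Drop the "Date" field name (everything before the first ':'), keep the rest
  raw=$(printf '%s\n' "$header_line" | cut -d ':' -f 2-)
  date -d "$raw" +%m%d%y_%H:%M:%S
}

# TZ=UTC so the +0000 timestamp is printed without a timezone shift
TZ=UTC format_header_date 'Date: Tue, 05 Feb 2013 20:59:53 +0000'
```

With `TZ=UTC` this prints `020513_20:59:53`, the same date part as in the example file name. Without it, the time is converted to your local timezone.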

Notes

This solution requires wkhtmltopdf:

Simple shell utility to convert html to pdf using the webkit rendering engine, and qt.

On Debian-based systems, you can install it with
```
sudo apt-get install wkhtmltopdf
```
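It assumes that there are no files in the top-level directory and that all subdirectories contain only the html files you want to convert.
It can handle file and directory names containing spaces, newlines and other unorthodox characters.
Given a file `dir1/foo` with the example content you posted, it will create a file named `dir1/dir1_020513_20:59:53_Civilized Discourse Construction Kit10.pdf`.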
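The `while [ -e ... ]` loop is what stops two messages with the same date and subject from overwriting each other's PDF. Below is a sketch of just that naming logic pulled out into a function, so it can be tried without wkhtmltopdf. The function name `unique_pdf_name`, the temporary directory and the sample subject are illustrative assumptions; save it as a script and run it with bash:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the answer's naming scheme:
#   <dir>_<date>_<subject>[N].pdf, where N counts up from 1 on collisions.
# Prints the first name (without .pdf) that does not yet exist inside $dir.
unique_pdf_name() {
  local dir="$1" date="$2" sub="$3"
  local i=0
  local name="${dir}_${date}_${sub}"
  while [ -e "$dir/$name.pdf" ]; do
    i=$((i + 1))
    name="${dir}_${date}_${sub}${i}"
  done
  printf '%s\n' "$name"
}

# Demo in a throwaway directory, removed when the script exits
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir dir1
unique_pdf_name dir1 020513_20:59:53 'Civilized Discourse'   # no collision yet
touch 'dir1/dir1_020513_20:59:53_Civilized Discourse.pdf'
unique_pdf_name dir1 020513_20:59:53 'Civilized Discourse'   # counter appended
```

The first call prints `dir1_020513_20:59:53_Civilized Discourse`; after that PDF exists, the second prints the same name with `1` appended.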

Answer 2

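You can always use the page title for the naming convention; titles are usually, though not always, unique (two pages with the same title would be written to the same PDF).

Given a file containing a list of addresses, here is a one-liner:

```
while IFS= read -r url; do wkhtmltopdf "$url" "$(curl -s "$url" | grep -o '<title>[^<]*' | head -n 1 | cut -c 8- | tr '/' '_').pdf"; done < urls.lst
```

Here, urls.lst is the file containing the list of URLs.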
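The title extraction in the one-liner can be checked offline by pulling it into a function and feeding it HTML on stdin. A minimal sketch; the function name `title_to_filename` and the sample HTML are hypothetical. Like the one-liner, it only recognises a lowercase `<title>` tag without attributes, and it replaces `/` because that character cannot appear in a file name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTML on stdin and print the text of the first
# <title> element, with "/" replaced by "_", for use as a file name stem.
title_to_filename() {
  # grep -o keeps "<title>text"; cut -c 8- drops the 7-character "<title>"
  grep -o '<title>[^<]*' | head -n 1 | cut -c 8- | tr '/' '_'
}

printf '<html><head><title>A/B testing notes</title></head></html>\n' | title_to_filename
```

This prints `A_B testing notes`. Inside the loop, the same function would be used as `curl -s "$url" | title_to_filename`.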