Advanced usage of pdfgrep

2024-6-17

I need help with a problem. It looks simple, but I suspect it is not.

#!/bin/bash

pdfgrep -Hn "$1" *.pdf

exit 0

If I run this script in any directory containing OCR'd PDF files, it prints "matching file + page number in that file + the line matching pattern $1".

Now for the problem. Suppose I want to do the same thing with two patterns, "$1" and "$2". It is not as simple as

pdfgrep -Hn $1 | pdfgrep -Hn $2 *.pdf

How can I do this so that the result is "matching file + page number in that file where both patterns, $1 and $2, occur"?

Any help is appreciated :-)

Thank you!

/Paul

Answer 1

I found a very simple solution that works:

#!/bin/bash

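# Collect the unique "file:page" pairs for each pattern, sorted as comm expects.
pdfgrep -Hn "$1" *.pdf | cut -f1,2 -d':' | sort -u > /tmp/sok1.tmp
pdfgrep -Hn "$2" *.pdf | cut -f1,2 -d':' | sort -u > /tmp/sok2.tmp
# Print only the pairs that appear in both lists.
comm -12 /tmp/sok1.tmp /tmp/sok2.tmp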

exit 0

/Paul
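
The same idea can probably be written without temporary files. What follows is only a sketch, not a tested replacement: the function names pages_with and pages_with_both are made up for illustration, and it assumes that no PDF file name contains a colon, since cut splits on ':'. Each pattern's "file:page" list is de-duplicated first, so any pair that shows up twice in the combined stream must have come from both patterns, and uniq -d prints exactly those pairs.

```shell
#!/bin/bash

# pages_with PATTERN FILE...
# Print the unique "file:page" pairs on which PATTERN matches.
pages_with() {
    pattern=$1
    shift
    pdfgrep -Hn "$pattern" "$@" | cut -d: -f1,2 | sort -u
}

# pages_with_both PATTERN1 PATTERN2 FILE...
# Print the "file:page" pairs on which both patterns match.
pages_with_both() {
    first=$1
    second=$2
    shift 2
    {
        pages_with "$first" "$@"
        pages_with "$second" "$@"
    } | sort | uniq -d
}

# Run against the PDFs in the current directory when two patterns are given.
if [ "$#" -ge 2 ]; then
    pages_with_both "$1" "$2" *.pdf
fi
```

Note that this reports pages on which both patterns occur somewhere, not necessarily on the same line, which is what the question asks for.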
