Convert a scanned document into a "Word" document?

Answer 1

Tesseract has been a very good option for me!

Here is how I use it:

If you don't have it installed yet, install it:

 sudo apt-get install tesseract-ocr
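
The steps in this answer also call ImageMagick's convert, which on Debian and Ubuntu comes from the imagemagick package. As a small sketch (missing_tools is just a hypothetical helper name, not part of either package), you can check that both commands are on your PATH before starting:

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper name chosen for this sketch.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Usage: missing_tools convert tesseract
# Prints nothing when both tools are installed.
```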

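Then:

Convert the scanned .JPG file to .tif (the format Tesseract needed when I wrote this; current releases can usually read JPEG and PNG directly as well). With ImageMagick, that looks like this:

 convert foo.JPG foo.tif

Now just let Tesseract work its magic:

 tesseract foo.tif foo

(This saves the output to foo.txt.)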
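
The two commands above can be wrapped in one helper so the .tif and .txt names are derived from the input name. This is only a sketch: ocr_page is a made-up function name, and it assumes convert and tesseract behave as in the commands above.

```shell
# OCR a single scanned image and print the name of the resulting text file.
# ocr_page is a hypothetical helper name used for illustration only.
ocr_page() {
    local src=$1
    local base=${src%.*}                 # strip the extension: foo.JPG -> foo
    convert "$src" "$base.tif" || return 1
    # Tesseract appends .txt to the output base name itself.
    tesseract "$base.tif" "$base" > /dev/null 2>&1 || return 1
    echo "$base.txt"
}

# Usage: ocr_page foo.JPG
```

Returning early when either command fails keeps a bad scan from silently producing an empty text file.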

I recently needed to convert an old 36-page manual into a digital version, so I wrote a BASH script to do it.

Here is the code:

#!/bin/bash
# makeDoc.sh
# Turn a set of scanned JPG pages into a single document file.
# Requires the ImageMagick and Tesseract packages.
# Author: Fred Fury

echo "makeDoc.sh"
echo "Convert a set of scanned JPG pages into a single document file."
echo "Starting up..."
for i in {01..36}    # Pages are named 01.JPG .. 36.JPG (zero-padded expansion needs bash 4 or later)
do
    echo "converting $i.JPG to $i.tif..."
    convert "$i.JPG" "$i.tif"                  # Convert the file to tesseract usable format
    tesseract "$i.tif" "$i" > /dev/null 2>&1   # Convert the tif to txt, discarding Tesseract's messages
done
echo "Merging files into Output.doc"
cat *.txt > Output.doc    # Merge all the generated txt files into a single (plain-text) file
echo "Done."
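
The script above is hard-wired to 36 pages named 01.JPG through 36.JPG. As a rough sketch of a more general version, the same loop can take the output name and the page files as arguments. make_doc is a hypothetical name, and the merged file is still plain text regardless of the extension you give it.

```shell
# OCR each given image in order and append its text to one output file.
# make_doc is a hypothetical helper name; usage: make_doc OUTPUT IMAGE...
make_doc() {
    local out=$1
    shift
    : > "$out"                                   # start from an empty output file
    local src base
    for src in "$@"; do
        base=${src%.*}                           # 01.JPG -> 01
        echo "converting $src to $base.tif..." >&2
        convert "$src" "$base.tif" || return 1
        tesseract "$base.tif" "$base" > /dev/null 2>&1 || return 1
        cat "$base.txt" >> "$out"                # append this page's text
    done
}

# Usage: make_doc Output.txt *.JPG
```

Because the shell sorts the *.JPG glob by name, zero-padded file names (01, 02, ... 36) keep the pages in order. Appending page by page also avoids picking up unrelated .txt files that happen to be in the same directory.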

You can also check this page for some other solutions: What is the best, simplest OCR solution? That is where I found tesseract.

Hope this helps!

Answer 2

I ran into a similar problem before. Try uploading the file to online-convert.com. It takes a while, but the web app can handle almost any format. Good luck!
