Tesseract cannot process *.bmp files. It fails with this error:
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Error in pixReadMemBmp: size incommensurate with image data
Error in pixReadStream: bmp: no pix returned
Error in pixRead: pix not read
Error during processing.
tesseract -v
tesseract 4.00.00alpha
leptonica-1.74.4
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
Found AVX2
Found AVX
Found SSE
Stable version
tesseract 4.0.0
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
Error in pixRemoveColormap: pixs must be {1,2,4,8} bpp
Error in pixGetDepth: pix not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined
Error in pixClone: pixs not defined
Please call SetImage before attempting recognition.
Error during processing.
Ubuntu
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Answer 1
So, if you want to use tesseract on these images, convert the BMPs to (preferably) TIFF first.
for a in *.bmp; do
    # Quote the variables so file names containing spaces survive word splitting
    convert "$a" "${a%.*}.tiff"
done
You need ImageMagick (which provides convert) and bash to run this code.
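If you also want to run the OCR step in the same pass, the loop above could be extended roughly as follows. This is only a sketch: the function names `tiff_name` and `bmp_to_text` and the override variables `CONVERT` and `OCR` are made up for illustration, and it assumes that ImageMagick can read the BMP files that Leptonica rejects, which may not hold for every file.

```shell
#!/usr/bin/env bash
# Sketch: convert each BMP in the current directory to TIFF, then OCR the TIFF.
# CONVERT and OCR are hypothetical override variables (handy for testing);
# they are not options of ImageMagick or tesseract.
CONVERT=${CONVERT:-convert}
OCR=${OCR:-tesseract}

# Map an input path to its TIFF name, e.g. scan.bmp -> scan.tiff
tiff_name() {
    printf '%s\n' "${1%.*}.tiff"
}

bmp_to_text() {
    local bmp tiff
    for bmp in ./*.bmp; do
        [ -e "$bmp" ] || continue              # the glob matched nothing
        tiff=$(tiff_name "$bmp")
        "$CONVERT" "$bmp" "$tiff" || continue  # skip files that fail to convert
        "$OCR" "$tiff" "${tiff%.*}"            # second argument is the output base name
    done
}
```

Call `bmp_to_text` from the directory that holds the images. With the default tools, tesseract should write a `.txt` file next to each converted TIFF.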