Chinese OCR

OCR for Windows

OCR for Mac OS

Tesseract on Mac OS X

The following steps install and run Tesseract-OCR 3.0 on Mac OS X. First install Tesseract via Homebrew, then download the Chinese language training files:

brew install tesseract
mkdir -p ~/Downloads/tessdata
cd ~/Downloads/tessdata
wget http://tesseract-ocr.googlecode.com/files/chi_sim.traineddata.gz
wget http://tesseract-ocr.googlecode.com/files/chi_tra.traineddata.gz
gunzip chi_sim.traineddata.gz chi_tra.traineddata.gz

(With the newer Homebrew formula you can simply run brew install tesseract --all-languages, so you don't need to fetch the language files yourself.)

Recognizing a picture (here inputfile.jpg) then takes three commands. The convert tool is part of ImageMagick, and TESSDATA_PREFIX must point to the parent directory of tessdata:

convert inputfile.jpg -type Grayscale inputfile.tif
export TESSDATA_PREFIX=~/Downloads/
tesseract inputfile.tif output -l chi_sim
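
When several pictures need to be recognized, the same two commands can be wrapped in a small shell function. The following is only a sketch under a few assumptions: ocr_image and run are hypothetical helper names, DRY_RUN is an illustrative variable that prints the commands instead of executing them, and the training data is assumed to sit in ~/Downloads/tessdata as set up above.

```shell
#!/bin/sh
# Sketch: wrap the convert + tesseract steps in a reusable function.
# ocr_image, run and DRY_RUN are hypothetical names, not part of Tesseract.

# Parent directory of tessdata/ (assumed layout from the download step).
TESSDATA_PREFIX="${TESSDATA_PREFIX:-$HOME/Downloads/}"
export TESSDATA_PREFIX

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ocr_image FILE [LANG]: convert FILE to a grayscale TIFF, then recognize it.
ocr_image() {
    input=$1
    lang=${2:-chi_sim}        # default to Simplified Chinese
    base=${input%.*}          # strip the extension: scan.jpg -> scan
    run convert "$input" -type Grayscale "$base.tif" || return 1
    run tesseract "$base.tif" "$base" -l "$lang"   # text lands in $base.txt
}
```

After sourcing the snippet, a loop such as for f in *.jpg; do ocr_image "$f"; done would process a whole folder; running it with DRY_RUN=1 first shows the commands without touching any files.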

The Traditional Chinese training data file (chi_tra) does not work at the moment, though. See bugs 381 and 336.

Open Source OCR for Linux

Request for Help

If you know more, please let me know!
