Text Line Recognition in OCRopus
- primary interface for "OCR": IRecognizeLine
- standard mode of operation: takes a "text line" as input, outputs a recognition lattice
- text line can be any image with linear arrangement of symbols
- direction/geometry of the line image is script dependent
- lattice represented in terms of weighted finite state transducers
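To make the lattice idea concrete, here is a minimal sketch of a weighted recognition lattice and a best-path search over it. The names `Arc`, `Lattice`, and `bestPath` are hypothetical and are not part of OCRopus; the real `IGenericFst` interface is different. The sketch assumes that every arc goes from a lower-numbered state to a higher-numbered one, so states can be processed in numeric order.

```cpp
#include <algorithm>
#include <limits>
#include <string>
#include <vector>

// Hypothetical toy lattice; this is NOT the OCRopus IGenericFst API.
// States are numbered 0..nstates-1, state 0 is the start, the last state is
// final, and every arc must go from a lower to a higher state.
struct Arc {
    int from;
    int to;
    char output;  // character hypothesis carried by this arc
    float cost;   // lower is better, e.g. a negative log probability
};

struct Lattice {
    int nstates = 0;
    std::vector<Arc> arcs;
    void addArc(int from, int to, char output, float cost) {
        arcs.push_back({from, to, output, cost});
    }
};

// Return the output string of the lowest-cost path from the start state to
// the final state, or an empty string if the final state is unreachable.
std::string bestPath(const Lattice &lat, float *totalCost = nullptr) {
    if (lat.nstates <= 0) return "";
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> best(lat.nstates, inf);
    std::vector<int> backArc(lat.nstates, -1);
    best[0] = 0.0f;
    // Relax outgoing arcs state by state; numeric order is topological here.
    for (int s = 0; s < lat.nstates; s++) {
        if (best[s] == inf) continue;
        for (int i = 0; i < static_cast<int>(lat.arcs.size()); i++) {
            const Arc &a = lat.arcs[i];
            if (a.from != s) continue;
            if (best[s] + a.cost < best[a.to]) {
                best[a.to] = best[s] + a.cost;
                backArc[a.to] = i;
            }
        }
    }
    const int last = lat.nstates - 1;
    if (best[last] == inf) return "";
    // Follow the back pointers from the final state to the start state.
    std::string out;
    for (int s = last; s != 0; s = lat.arcs[backArc[s]].from)
        out.push_back(lat.arcs[backArc[s]].output);
    std::reverse(out.begin(), out.end());
    if (totalCost) *totalCost = best[last];
    return out;
}
```

For example, a lattice in which the two-arc reading "c" (0.2) then "l" (0.9) competes with a single arc "d" (0.8) over the same span keeps both hypotheses, and `bestPath` picks "d" because its total cost is lower.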
Text Line Recognition Interface
struct IRecognizeLine : IComponent {
    virtual void recognizeLine(IGenericFst &result, bytearray &image);
    virtual void recognizeLine(intarray &segmentation, IGenericFst &result, bytearray &image);
    ...
};
- interface is much simpler than for ICharacterClassifier
- that's because all results are packed into the result data structure (the lattice)
- two versions:
  - the first version just performs recognition and outputs a lattice
  - the second version additionally returns the internally generated oversegmentation
  - in the second version, the transducer maps segmentation components to output characters
Text Line Training Interface
struct IRecognizeLine : IComponent {
    virtual void startTraining(const char *type="adaptation");
    virtual void addTrainingLine(bytearray &image, nustring &transcription);
    virtual void addTrainingLine(intarray &segmentation, bytearray &image_grayscale, nustring &transcription);
    virtual void finishTraining();
    ...
};
- all training must be surrounded by startTraining ... finishTraining
- in startTraining, you indicate whether this is initial training or adaptation
- you can train either:
  - on an unsegmented line image, by giving the line image and the transcription
  - on a segmented line image, by giving the segmentation, line image, and transcription
- a recognizer doesn't have to implement both methods (but it's good if it does)
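The training protocol above can be sketched with a toy stand-in class. `ToyLineTrainer`, `Image`, and `Segmentation` are hypothetical names, and the `std::vector` and `std::string` types only stand in for `bytearray`, `intarray`, and `nustring`; this is not OCRopus code. The sketch enforces the startTraining ... finishTraining bracket and shows a recognizer that implements only the unsegmented variant.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the OCRopus bytearray and intarray types.
using Image = std::vector<unsigned char>;
using Segmentation = std::vector<int>;

// Toy trainer that only counts lines; it illustrates the calling protocol,
// not an actual learning algorithm.
class ToyLineTrainer {
public:
    void startTraining(const char *type = "adaptation") {
        if (training_) throw std::logic_error("training already started");
        type_ = type;
        training_ = true;
        pending_.clear();
    }

    // Unsegmented variant: line image plus transcription.
    void addTrainingLine(const Image &image, const std::string &transcription) {
        requireTraining();
        if (image.empty()) throw std::invalid_argument("empty line image");
        pending_.push_back(transcription);
    }

    // Segmented variant: this toy recognizer chooses not to implement it.
    void addTrainingLine(const Segmentation &, const Image &, const std::string &) {
        requireTraining();
        throw std::logic_error("segmented training is not implemented");
    }

    void finishTraining() {
        requireTraining();
        linesTrained_ += pending_.size();  // a real recognizer would fit its model here
        pending_.clear();
        training_ = false;
    }

    bool isTraining() const { return training_; }
    std::size_t linesTrained() const { return linesTrained_; }
    const std::string &trainingType() const { return type_; }

private:
    void requireTraining() const {
        if (!training_) throw std::logic_error("call startTraining first");
    }
    bool training_ = false;
    std::string type_;
    std::vector<std::string> pending_;
    std::size_t linesTrained_ = 0;
};
```

Deferring the actual work to `finishTraining` is one possible design; a recognizer could equally update its model incrementally inside each `addTrainingLine` call.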
Text Line Alignment
struct IRecognizeLine : IComponent {
    virtual void align(nustring &chars, intarray &result, floatarray &costs, bytearray &image, IGenericFst &transcription);
    ...
};
- text line recognizers should also implement an alignment method
- this method should output an aligned result using the transcription as the "ground truth"
- this is useful for generating isolated character training data
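One way to picture alignment is as a dynamic program that splits an oversegmented line into consecutive groups of segments, one group per ground-truth character, minimizing the total cost. The sketch below is an illustration under that assumption, not the OCRopus implementation: `alignLine`, `AlignedChar`, `GroupCost`, and `makeWidthCost` are hypothetical names, the transcription is a plain string rather than an `IGenericFst`, and the width-based cost function merely stands in for a real character classifier.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// One aligned character: which segments it covers and what that cost.
struct AlignedChar {
    char ch;
    int firstSegment;  // inclusive
    int lastSegment;   // inclusive
    double cost;
};

// Cost of reading segments first..last (inclusive) as character ch.
using GroupCost = std::function<double(int first, int last, char ch)>;

// Split segments 0..nsegments-1 into consecutive groups, one per character of
// the transcription, minimizing total cost. Returns an empty vector if no
// alignment exists (for example, more characters than segments).
std::vector<AlignedChar> alignLine(int nsegments, const std::string &transcription,
                                   const GroupCost &groupCost, int maxGroup = 3) {
    const double inf = std::numeric_limits<double>::infinity();
    const int m = static_cast<int>(transcription.size());
    // best[k][j]: minimal cost of explaining the first k characters with the
    // first j segments; start[k][j]: where the k-th character's group begins.
    std::vector<std::vector<double>> best(m + 1, std::vector<double>(nsegments + 1, inf));
    std::vector<std::vector<int>> start(m + 1, std::vector<int>(nsegments + 1, -1));
    best[0][0] = 0.0;
    for (int k = 1; k <= m; k++)
        for (int j = 1; j <= nsegments; j++)
            for (int i = std::max(0, j - maxGroup); i < j; i++) {
                if (best[k - 1][i] == inf) continue;
                double c = best[k - 1][i] + groupCost(i, j - 1, transcription[k - 1]);
                if (c < best[k][j]) { best[k][j] = c; start[k][j] = i; }
            }
    std::vector<AlignedChar> result;
    if (best[m][nsegments] == inf) return result;
    result.resize(m);
    int j = nsegments;
    for (int k = m; k >= 1; k--) {  // trace the optimal split backwards
        int i = start[k][j];
        result[k - 1] = {transcription[k - 1], i, j - 1, best[k][j] - best[k - 1][i]};
        j = i;
    }
    return result;
}

// Toy cost for illustration only: 'i' is expected to be 2 pixels wide and
// every other character 6 pixels wide; the cost is the width mismatch.
GroupCost makeWidthCost(std::vector<int> widths) {
    return [widths](int first, int last, char ch) {
        int total = 0;
        for (int s = first; s <= last; s++) total += widths[s];
        int diff = total - (ch == 'i' ? 2 : 6);
        return static_cast<double>(diff < 0 ? -diff : diff);
    };
}
```

With segment widths 3, 3, 2, 6 and the transcription "miw", the zero-cost alignment assigns segments 0 and 1 to "m", segment 2 to "i", and segment 3 to "w". Cutting the line image at those boundaries yields the isolated character samples mentioned above.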
Loading and Saving
struct ICharacterClassifier : IComponent {
    ...
    virtual void save(FILE *stream);
    virtual void load(FILE *stream);
};
- as before, you can load and save models
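A stream-based save/load pair might look like the sketch below. `ToyModel` and its on-disk layout (a 4-byte tag, a count, then raw floats) are assumptions made for illustration; the actual OCRopus model file formats are not described here.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical model holding only a weight vector.
struct ToyModel {
    std::vector<float> weights;

    // Write a 4-byte tag, the weight count, and the raw weights.
    void save(std::FILE *stream) const {
        const char magic[4] = {'T', 'O', 'Y', '1'};
        const std::uint32_t n = static_cast<std::uint32_t>(weights.size());
        if (std::fwrite(magic, 1, 4, stream) != 4 ||
            std::fwrite(&n, sizeof n, 1, stream) != 1 ||
            (n > 0 && std::fwrite(weights.data(), sizeof(float), n, stream) != n))
            throw std::runtime_error("write failed");
    }

    // Read the same layout back; the model is left untouched on failure.
    void load(std::FILE *stream) {
        char magic[4];
        std::uint32_t n = 0;
        if (std::fread(magic, 1, 4, stream) != 4 || std::memcmp(magic, "TOY1", 4) != 0)
            throw std::runtime_error("not a ToyModel stream");
        if (std::fread(&n, sizeof n, 1, stream) != 1)
            throw std::runtime_error("truncated header");
        std::vector<float> loaded(n);
        if (n > 0 && std::fread(loaded.data(), sizeof(float), n, stream) != n)
            throw std::runtime_error("truncated weights");
        weights.swap(loaded);
    }
};
```

Taking an open `FILE *` rather than a file name lets the caller store several components in one file, one after another. Note that this raw binary layout is not portable across machines with different byte order or float representation.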
Simple Example
-- read the page image and segment it into text lines
image = read_image_gray_checked(arg[1])
segmenter = make_SegmentPageByRAST()
segmentation = intarray()
segmenter:segment(segmentation, image)

-- load a line recognizer with a trained model
recognizer = ocropus_make_RecognizeLine('bpnet', 'models/neural-net-file.nn')

-- extract each line, recognize it, and print the best path through the lattice
regions = RegionExtractor()
regions:setPageLines(segmentation)
line_image = bytearray()
for i = 1, regions:length() - 1 do
    regions:extract(line_image, image, i, 1)
    fst = make_StandardFst()
    recognizer:recognizeLine(fst, line_image)
    s = nustring()
    fst:bestpath(s)
    print(s:utf8())
end