Course: OCRopus

Text Line Recognition

Text Line Recognition in OCRopus

  • primary interface for "OCR": IRecognizeLine
  • standard mode of operation: takes a "text line" as input, outputs a recognition lattice
    • text line can be any image with linear arrangement of symbols
    • direction/geometry of the line image is script dependent
    • lattice represented in terms of weighted finite state transducers
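
To make the idea of a recognition lattice concrete, here is a minimal C++ sketch. It does not use the OCRopus IGenericFst API: the Arc and Lattice structs and the best_path and example_lattice functions are hypothetical stand-ins. The sketch assumes that every arc goes from a lower-numbered state to a higher-numbered one, that state 0 is the start state and the last state is the final state, and that lower costs are better.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for a recognition lattice (NOT the OCRopus IGenericFst API).
struct Arc {
    int from;     // source state
    int to;       // target state, assumed to satisfy from < to
    char output;  // character hypothesis carried by this arc
    double cost;  // cost of the hypothesis; lower is better
};

struct Lattice {
    int num_states;         // states are numbered 0 .. num_states-1
    std::vector<Arc> arcs;  // alternative hypotheses between states
};

// Return the lowest-cost output string from state 0 to the last state.
// If the last state is unreachable, the result is empty and the cost is infinite.
std::string best_path(const Lattice &lat, double *total_cost) {
    assert(lat.num_states > 0);
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> best(lat.num_states, inf);
    std::vector<int> back(lat.num_states, -1);  // index of the best incoming arc
    best[0] = 0.0;
    // Since from < to for every arc, visiting states in increasing order is enough.
    for (int s = 0; s < lat.num_states; s++) {
        if (best[s] == inf) continue;
        for (size_t a = 0; a < lat.arcs.size(); a++) {
            const Arc &arc = lat.arcs[a];
            if (arc.from != s) continue;
            if (best[s] + arc.cost < best[arc.to]) {
                best[arc.to] = best[s] + arc.cost;
                back[arc.to] = (int)a;
            }
        }
    }
    // Follow the back pointers from the final state to recover the string.
    std::string out;
    for (int s = lat.num_states - 1; back[s] >= 0; s = lat.arcs[back[s]].from)
        out.insert(out.begin(), lat.arcs[back[s]].output);
    *total_cost = best[lat.num_states - 1];
    return out;
}

// A two-character lattice with two competing hypotheses per position.
Lattice example_lattice() {
    Lattice lat;
    lat.num_states = 3;
    lat.arcs = {
        {0, 1, 'c', 0.25}, {0, 1, 'e', 1.5},
        {1, 2, 'l', 0.5},  {1, 2, '1', 1.25},
    };
    return lat;
}
```

Calling best_path(example_lattice(), &cost) selects 'c' and then 'l', giving the string "cl" with a total cost of 0.75. A real recognizer keeps the whole lattice rather than only this best string, so that later stages can rescore the alternatives.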

Text Line Recognition Interface

    struct IRecognizeLine : IComponent {
        virtual void recognizeLine(IGenericFst &result,bytearray &image);
        virtual void recognizeLine(intarray &segmentation,IGenericFst &result,bytearray &image);
        ...
    };

  • interface is much simpler than for ICharacterClassifier
    • that's because all results are returned through the single result data structure
  • two versions
    • first version just performs recognition and outputs a lattice
    • second version also outputs the internally generated oversegmentation
    • in the second version, the transducer transduces from segmentation components to output characters

Text Line Training Interface

    struct IRecognizeLine : IComponent {
        virtual void startTraining(const char *type="adaptation");
        virtual void addTrainingLine(bytearray &image,nustring &transcription);
        virtual void addTrainingLine(intarray &segmentation, bytearray &image_grayscale, nustring &transcription);
        virtual void finishTraining();
        ...
    };

  • all training must be surrounded by startTraining ... finishTraining
  • in startTraining, you indicate whether this is initial training or adaptation
  • you can train either...
    • on an unsegmented line image, by giving the line image and the transcription
    • on a segmented line image, by giving the segmentation, line image, and transcription
    • a recognizer doesn't have to implement both methods (but it's good if it does)
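
The calling protocol above can be illustrated with a small C++ sketch. ToyLineTrainer is a hypothetical class, not an OCRopus recognizer, and Image and Segmentation are stand-ins for bytearray and intarray (with std::string standing in for nustring). It only shows the expected call order and one possible way for a recognizer to decline the segmented variant; how real recognizers signal an unimplemented method is an assumption here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the OCRopus array types.
typedef std::vector<unsigned char> Image;
typedef std::vector<int> Segmentation;

class ToyLineTrainer {
public:
    ToyLineTrainer() : training_(false), trained_(0) {}

    // Open a training session; the type string says initial training or adaptation.
    void startTraining(const char *type = "adaptation") {
        if (training_) throw std::logic_error("startTraining called twice");
        type_ = type ? type : "";
        training_ = true;
        pending_.clear();
    }

    // Unsegmented variant: line image plus transcription.
    void addTrainingLine(const Image &image, const std::string &transcription) {
        if (!training_) throw std::logic_error("addTrainingLine outside a training session");
        pending_.push_back(std::make_pair(image, transcription));
    }

    // Segmented variant: this toy class chooses not to support it.
    void addTrainingLine(const Segmentation &, const Image &, const std::string &) {
        throw std::runtime_error("segmented training not implemented");
    }

    // Close the session; a real recognizer would update its model here.
    void finishTraining() {
        if (!training_) throw std::logic_error("finishTraining without startTraining");
        trained_ += pending_.size();
        pending_.clear();
        training_ = false;
    }

    std::size_t linesTrained() const { return trained_; }
    const std::string &trainingType() const { return type_; }

private:
    bool training_;
    std::size_t trained_;
    std::string type_;
    std::vector<std::pair<Image, std::string> > pending_;
};
```

A caller would invoke startTraining("initial"), add each line image with its transcription, and then call finishTraining(); in this sketch, adding a line outside that bracket throws std::logic_error.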

Text Line Alignment

    struct IRecognizeLine : IComponent {
        virtual void align(nustring &chars,intarray &result,floatarray &costs, bytearray &image,IGenericFst &transcription);
        ...
    };

  • text line recognizers should also implement an alignment method
  • this method should output an aligned result using the transcription as the "ground truth"
  • this is useful for generating isolated character training data

Loading and Saving

    struct ICharacterClassifier : IComponent {
        ...
        virtual void save(FILE *stream);
        virtual void load(FILE *stream);
    };

  • as before, you can load and save models

Simple Example

    -- read the page image and segment it into text lines
    image = read_image_gray_checked(arg[1])
    segmenter = make_SegmentPageByRAST()
    segmentation = intarray()
    segmenter:segment(segmentation, image)
    -- load a line recognizer and set up line extraction
    recognizer = ocropus_make_RecognizeLine('bpnet', 'models/neural-net-file.nn')
    regions = RegionExtractor()
    regions:setPageLines(segmentation)
    line_image = bytearray()
    -- recognize each line and print the best path through its lattice
    for i = 1, regions:length() - 1 do
        regions:extract(line_image, image, i, 1)
        fst = make_StandardFst()
        recognizer:recognizeLine(fst, line_image)
        s = nustring()
        fst:bestpath(s)
        print(s:utf8())
    end
