Course: OCRopus

Integrating Your Own OCR or Layout Analysis

Integrating Your Own Recognizer

struct IRecognizeLine : IComponent { virtual void recognizeLine(IGenericFst &result,bytearray &image);...};


  • IRecognizeLine has a lot of methods, but this is the only one you need to implement for recognition
  • output
    • if you just output a string, use result.setString(your_string)
    • if you output a lattice, copy it over from your data structure into result
  • when implemented...
    • you can run your recognizer with full layout analysis, and run OCRopus evaluation tools
    • if you used setString, you can use language models for OCR error correction
    • if you output a lattice, you can use statistical language models for optimal recognition
    • note that you need different language models in the two cases
    • if you try to use your recognizer with other toplevels (e.g., training), the unimplemented methods will give an error
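
The string-output case can be sketched as follows. This is a minimal illustration, not the OCRopus API: ToyFst and GrayImage are simplified stand-ins for IGenericFst and bytearray, and my_ocr_engine is a hypothetical placeholder for your own engine.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for the OCRopus types (hypothetical, simplified for this sketch).
typedef std::vector<std::vector<unsigned char> > GrayImage;  // plays the role of bytearray

struct ToyFst {  // plays the role of IGenericFst
    std::string text;
    void setString(const std::string &s) { text = s; }
};

// Hypothetical engine: replace the body with a call into your own OCR system.
std::string my_ocr_engine(const GrayImage &line) {
    return line.empty() ? std::string() : std::string("hello");
}

struct MyRecognizer {
    // The only method needed for plain recognition.
    void recognizeLine(ToyFst &result, GrayImage &image) {
        result.setString(my_ocr_engine(image));
    }
};
```

In a real integration, MyRecognizer would derive from IRecognizeLine and leave the remaining methods unimplemented until they are needed.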

Command Line Invocation

  • the simplest way of integrating your system is via command line invocation
  • this is not appropriate for distribution or inclusion into OCRopus, but it's useful for testing

Steps:

  • write out the text line image
    • write_image_gray("temp.png",image)
  • invoke your command line tool using system
    • system("my-ocr temp.png > temp.txt")
  • read the result back in
    • file = fopen("temp.txt","r"); n = fread(buf,1,sizeof buf - 1,file); buf[n] = 0; fclose(file)
  • return the result
    • fst.setString(buf)
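
The invoke-and-read-back part of these steps might look like the sketch below. The helper names read_result and recognize_via_command are made up for this example, and the tool name passed in (such as "my-ocr") is a placeholder for your own executable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Read up to `limit` bytes from a text file and strip trailing line breaks.
// Returns an empty string if the file cannot be opened.
std::string read_result(const std::string &path, std::size_t limit = 4096) {
    std::FILE *file = std::fopen(path.c_str(), "rb");
    if (!file) return std::string();
    std::string buf(limit, '\0');
    std::size_t n = std::fread(&buf[0], 1, limit, file);  // n = bytes actually read
    std::fclose(file);
    buf.resize(n);
    while (!buf.empty() && (buf[buf.size() - 1] == '\n' || buf[buf.size() - 1] == '\r'))
        buf.erase(buf.size() - 1);
    return buf;
}

// Run `tool image_path > text_path` and store the recognized text in `text`.
// Returns false if the command reports failure.
bool recognize_via_command(const std::string &tool, const std::string &image_path,
                           const std::string &text_path, std::string &text) {
    std::string command = tool + " " + image_path + " > " + text_path;
    if (std::system(command.c_str()) != 0) return false;  // tool failed or is missing
    text = read_result(text_path);
    return true;
}
```

Inside recognizeLine you would first write the line image to disk, call recognize_via_command with the same file name, and pass the resulting text to setString on the result FST.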

Outputting a Lattice

struct IRecognizeLine : IComponent { virtual void recognizeLine(intarray &segmentation,IGenericFst &result,bytearray &image);...};

  • if you want your recognizer to integrate with OCRopus language modeling, you need to output a lattice
  • to do this...
    • take your own lattice and copy it into the result FST using the setStart(...), addArc(...), etc methods
    • output your underlying oversegmentation (cuts, etc.) in segmentation 
    • your FST should have input segments as input symbols, and unicode code points as output
  • some hints
    • the segmentation is used for training and evaluation; if you can't output one, resize it to 0,0
    • if you have a system that outputs text and bounding boxes, it's still useful to implement this
      • additional evaluation (#segmentation errors, etc.), training
      • use result.setString(...) to construct an FST for your Unicode string
      • compute an oversegmentation of the input using one of the OCRopus segmenters
      • use align_bounding_boxes_with_segmentation to align your bounding boxes with its segmentation
    • if your system outputs cuts, polygonal bounds, etc.
      • generate a corresponding segmentation image
        • e.g., draw the cuts into an image, invert, label_components, ???seed
      • intersect the segmentation image with a binarized version of the input

Making Your Recognizer Useful for Others / Evaluation

struct IRecognizeLine : IComponent {
    virtual void align(nustring &chars,intarray &result,floatarray &costs, bytearray &image,IGenericFst &transcription);
...
};

  • a second method that is used for training is ground truth alignment
  • you are given...
    • a language model representing a transcription (possibly ambiguous)
    • an input image
  • you need to return...
    • the list of recognized characters
    • the input states to the transcription finite state transducer
    • the costs of aligning each character

Allowing Your Recognizer to be Trained through OCRopus

struct IRecognizeLine : IComponent {
    virtual void startTraining(const char *type="adaptation");
    virtual void finishTraining();
    virtual void save(FILE *stream);
    virtual void load(FILE *stream);
...
};

  • you need to implement these bookkeeping methods for your recognizer to do anything at all
  • your implementation should check that training methods are only called between startTraining and finishTraining
  • if you don't implement load/save, your models can't be saved
  • your implementation of load/save should check that the models it gets really come from your system
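
One way to handle the bookkeeping is sketched here. The class MyRecognizerState, its toy "model" (a line counter), the parameterless addTrainingLine, and the magic header string are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <stdexcept>

struct MyRecognizerState {
    bool training;
    int nlines;  // toy "model": number of training lines seen
    MyRecognizerState() : training(false), nlines(0) {}

    // Header written first, so load() can recognize its own files.
    static const char *magic() { return "my-recognizer-model-v1"; }

    void startTraining(const char *type = "adaptation") {
        if (training) throw std::logic_error("startTraining called twice");
        (void)type;  // a real recognizer would branch on the training type
        training = true;
    }
    void addTrainingLine() {
        if (!training) throw std::logic_error("addTrainingLine outside of training");
        nlines++;
    }
    void finishTraining() {
        if (!training) throw std::logic_error("finishTraining without startTraining");
        training = false;
    }
    void save(std::FILE *stream) const {
        std::fprintf(stream, "%s\n%d\n", magic(), nlines);
    }
    void load(std::FILE *stream) {
        char header[64];
        int n = 0;
        if (std::fscanf(stream, "%63s %d", header, &n) != 2 ||
            std::strcmp(header, magic()) != 0)
            throw std::runtime_error("not a model written by this recognizer");
        nlines = n;
    }
};
```

Rejecting foreign files in load gives a clear error early instead of a recognizer that silently runs with garbage parameters.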

Types of Training

  • there are different types of training
    • "initial" -- train a recognizer from scratch
    • "adaptation" -- adapt an already trained recognizer with samples from a particular book
    • "incremental" -- add more training data to an existing recognizer
  • you may want to behave differently for the different kinds of training
  • "adaptation"
    • all samples passed to a particular recognizer during "adaptation" come from the same context (e.g., the same book)
    • permanently modifies the recognizer
    • to return to the unadapted state, load(...) is used on the original model
  • in the future, we may improve this interface
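
Behaving differently per training type can be reduced to a small dispatch on the string passed to startTraining. The enum and helper names are invented for this sketch, and the rule that only "initial" discards an existing model is an assumption, not documented OCRopus behavior.

```cpp
#include <cassert>
#include <cstring>

enum TrainingType { TRAIN_INITIAL, TRAIN_ADAPTATION, TRAIN_INCREMENTAL, TRAIN_UNKNOWN };

// Map the type string given to startTraining onto an enum value.
TrainingType parse_training_type(const char *type) {
    if (std::strcmp(type, "initial") == 0) return TRAIN_INITIAL;
    if (std::strcmp(type, "adaptation") == 0) return TRAIN_ADAPTATION;
    if (std::strcmp(type, "incremental") == 0) return TRAIN_INCREMENTAL;
    return TRAIN_UNKNOWN;
}

// Assumption of this sketch: only training from scratch throws away the
// existing model; adaptation and incremental training build on it.
bool discards_existing_model(TrainingType t) {
    return t == TRAIN_INITIAL;
}
```

An unknown type string is best reported as an error in startTraining rather than silently treated as one of the known modes.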

Allowing Your Recognizer to be Trained through OCRopus

struct IRecognizeLine : IComponent {
    virtual void addTrainingLine(bytearray &image,nustring &transcription);
    virtual void addTrainingLine(intarray &segmentation, bytearray &image_grayscale, nustring &transcription);
...
};

  • there are two training interfaces
    • the first gives a line and its ground truth transcription
      • you can use this for e.g. Viterbi training
    • the second gives a line, its segmentation into characters, and its ground truth transcription
      • you can use this for isolated character training
    • you can implement either one or both
      • some training scripts will fail if the method they need isn't implemented
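
For the second interface, isolated character training needs one sample per segment. The sketch below pairs segments with transcription characters under loudly stated assumptions: the segmentation is a plain label image with 0 as background, labels increase in reading order, and there is exactly one segment per character. The real OCRopus segmentation encoding may differ, and all type names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

typedef std::vector<std::vector<int> > LabelImage;  // [row][col], 0 = background (assumption)

struct Box { int x0, y0, x1, y1; };  // inclusive pixel bounds
struct CharSample { int codepoint; Box box; };

// Compute the bounding box of every nonzero label.
std::map<int, Box> segment_boxes(const LabelImage &seg) {
    std::map<int, Box> boxes;
    for (int y = 0; y < static_cast<int>(seg.size()); y++) {
        for (int x = 0; x < static_cast<int>(seg[y].size()); x++) {
            int label = seg[y][x];
            if (label == 0) continue;
            std::map<int, Box>::iterator it = boxes.find(label);
            if (it == boxes.end()) {
                Box b = { x, y, x, y };
                boxes[label] = b;
            } else {
                Box &b = it->second;
                if (x < b.x0) b.x0 = x;
                if (x > b.x1) b.x1 = x;
                if (y < b.y0) b.y0 = y;
                if (y > b.y1) b.y1 = y;
            }
        }
    }
    return boxes;
}

// Pair segments (in label order) with transcription characters.
// Returns an empty list if the counts do not match.
std::vector<CharSample> isolated_characters(const LabelImage &seg,
                                            const std::vector<int> &transcription) {
    std::map<int, Box> boxes = segment_boxes(seg);
    std::vector<CharSample> samples;
    if (boxes.size() != transcription.size()) return samples;
    std::size_t i = 0;
    for (std::map<int, Box>::const_iterator it = boxes.begin(); it != boxes.end(); ++it, ++i) {
        CharSample s = { transcription[i], it->second };
        samples.push_back(s);
    }
    return samples;
}
```

Each resulting box can then be cropped from the grayscale line image and fed to a character classifier together with its code point.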

Summary of Integrating your Recognizers

  • simplest form: only integrate text line recognition and do so through command line
  • recommended stages
    • implement image -> string text line recognition
    • implement image -> lattice text line recognition as best you can
    • implement alignment (if your recognizer allows it; this is fairly tricky)
    • implement training on text line images
    • implement training on lattices

Improving the Existing Layout Analysis

  • The usual layout analysis used in OCRopus happens in four steps:
    • text/image segmentation (yields probability map or bitmap, converted to list of rectangles)
    • column finding (SegmentPageByRAST)
    • constrained text line finding (SegmentPageByRAST)
    • reading order determination
  • You can customize/replace any one of these steps
    • constrained text line finding and column finding rarely make mistakes
    • most of the possible improvement is in text/image segmentation

Integrating Your Own Layout Analysis

If you have your own layout analysis method, you can easily integrate it:
  • input is a grayscale or binary image from OCRopus
    • if you can't cope with grayscale, call one of the OCRopus binarizers
  • convert the output into a pixel-accurate color segmentation
    • if your system outputs polygons or rectangles
      • iterate through these and draw them into an image with the same dimension as the original
      • intersect that image with the binary version of the original image
  • you can easily integrate through a command line interface
    • write binary image to disk
    • invoke your layout analysis from the command line
    • read back the results and convert to pixel-accurate segmentation
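
Converting rectangle output into a pixel-accurate segmentation can be sketched as below. The types are stand-ins, nonzero pixels in the binary image are assumed to be ink, and regions are simply labeled 1, 2, 3, ... in input order; the color encoding OCRopus actually expects for layout segmentations is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<int> > IntImage;  // [row][col]

struct Rect { int x0, y0, x1, y1; };  // half-open: x0 <= x < x1, y0 <= y < y1

// Draw each region into a label image of the same size as the input and keep
// the label only where the binary image has ink (nonzero = ink, an assumption).
IntImage rectangles_to_segmentation(const IntImage &binary, const std::vector<Rect> &regions) {
    int h = static_cast<int>(binary.size());
    int w = h > 0 ? static_cast<int>(binary[0].size()) : 0;
    IntImage seg(h, std::vector<int>(w, 0));
    for (std::size_t i = 0; i < regions.size(); i++) {
        const Rect &r = regions[i];
        int label = static_cast<int>(i) + 1;
        // Clip the rectangle to the image so out-of-range boxes are harmless.
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, h); y++)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, w); x++)
                if (binary[y][x] != 0) seg[y][x] = label;
    }
    return seg;
}
```

Drawing and intersecting in one pass avoids a separate mask image; polygons would need a point-in-polygon test or a scanline fill in place of the rectangle loops.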
