Integrating Your Own Recognizer
struct IRecognizeLine : IComponent { virtual void recognizeLine(IGenericFst &result,bytearray &image);...};
- IRecognizeLine has a lot of methods, but this is the only one you need to implement for recognition
- output
- if you just output a string, use result.setString(your_string)
- if you output a lattice, copy it over from your data structure into result
- when implemented...
- you can run your recognizer with full layout analysis, and run OCRopus evaluation tools
- if you used setString, you can use language models for OCR error correction
- if you output a lattice, you can use statistical language models for optimal recognition
- note that you need different language models in the two cases
- if you try to use your recognizer with other toplevels (e.g., training), the unimplemented methods will give an error
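The shape of a string-only recognizer can be sketched with stand-in types. GrayImage, StringResult, LineRecognizerBase, MyRecognizer and my_engine_recognize are all hypothetical: they only mimic the structure described above (one required method, other methods failing when called) and are not OCRopus classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for bytearray and IGenericFst; NOT the real classes.
struct GrayImage {
    int width = 0, height = 0;
    std::vector<unsigned char> pixels;  // row-major, width * height bytes
};

struct StringResult {
    std::string text;
    void setString(const std::string &s) { text = s; }
};

// Loosely modeled on IRecognizeLine: recognizeLine is the only required
// method; other entry points throw unless a subclass overrides them.
struct LineRecognizerBase {
    virtual ~LineRecognizerBase() = default;
    virtual void recognizeLine(StringResult &result, const GrayImage &image) = 0;
    virtual void startTraining(const char *type) {
        (void)type;
        throw std::logic_error("startTraining: not implemented by this recognizer");
    }
};

// Hypothetical placeholder for your engine; a real one would read the pixels.
static std::string my_engine_recognize(const GrayImage &image) {
    return image.width > 0 ? "hello" : "";
}

struct MyRecognizer : LineRecognizerBase {
    void recognizeLine(StringResult &result, const GrayImage &image) override {
        assert(image.pixels.size() == static_cast<std::size_t>(image.width) * image.height);
        result.setString(my_engine_recognize(image));  // string output only
    }
};
```

Calling startTraining on MyRecognizer throws, which mirrors the point above: other toplevels fail on the methods you did not implement.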
Command Line Invocation
- the simplest way of integrating your system is via command line invocation
- this is not appropriate for distribution or inclusion into OCRopus, but it's useful for testing
Steps:
- write out the text line image
- write_image_gray("temp.png",image)
- invoke your command line tool using system
- system("my-ocr temp.png > temp.txt")
- read the result back in
- FILE *file = fopen("temp.txt","r"); size_t n = fread(buf,1,sizeof buf - 1,file); buf[n] = 0; fclose(file);
- return the result
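The steps above can be sketched as one self-contained C++ function. This is a sketch under assumptions: it writes a binary PGM instead of PNG (so no image library is needed, unlike write_image_gray), keeps the fixed temporary file names from the steps, and takes the command as a parameter; recognize_line_via_cli and write_pgm are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Writes an 8-bit grayscale image as binary PGM ("P5").
static void write_pgm(const std::string &path, const std::vector<unsigned char> &pixels,
                      int width, int height) {
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    std::ofstream out(path, std::ios::binary);
    out << "P5\n" << width << " " << height << "\n255\n";
    out.write(reinterpret_cast<const char *>(pixels.data()),
              static_cast<std::streamsize>(pixels.size()));
    if (!out) throw std::runtime_error("cannot write " + path);
}  // the file is closed here, before the external tool runs

// Runs `tool temp.pgm > temp.txt` through the shell and returns the text,
// minus trailing newlines. `tool` is your command-line OCR (hypothetical).
std::string recognize_line_via_cli(const std::vector<unsigned char> &pixels, int width,
                                   int height, const std::string &tool) {
    const std::string image_path = "temp.pgm", text_path = "temp.txt";
    write_pgm(image_path, pixels, width, height);
    const std::string command = tool + " " + image_path + " > " + text_path;
    if (std::system(command.c_str()) != 0)
        throw std::runtime_error("command failed: " + command);
    std::ifstream in(text_path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // read the whole output file
    std::string text = buffer.str();
    while (!text.empty() && (text.back() == '\n' || text.back() == '\r')) text.pop_back();
    std::remove(image_path.c_str());
    std::remove(text_path.c_str());
    return text;
}
```

With your own tool you would pass something like "my-ocr" as tool. Fixed file names are fine for testing but break if several lines are recognized in parallel.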
Outputting a Lattice
struct IRecognizeLine : IComponent {
    virtual void recognizeLine(intarray &segmentation, IGenericFst &result, bytearray &image);
    ...
};
- if you want your recognizer to integrate with OCRopus language modeling, you need to output a lattice
- to do this...
- take your own lattice and copy it into the result FST using the setStart(...), addArc(...), etc. methods
- output your underlying oversegmentation (cuts, etc.) in segmentation
- your FST should have input segments as input symbols and Unicode code points as output symbols
- some hints
- the segmentation is used for training and evaluation; if you can't output one, resize it to 0,0
- if you have a system that outputs text and bounding boxes, it's still useful to implement this
- it enables additional evaluation (number of segmentation errors, etc.) and training
- use result.setString(...) to construct an FST for your Unicode string
- compute an oversegmentation of the input using one of the OCRopus segmenters
- use align_bounding_boxes_with_segmentation to align your bounding boxes with its segmentation
- if your system outputs cuts, polygonal bounds, etc.
- generate a corresponding segmentation image
- e.g., draw the cuts into an image, invert, label_components, seed
- intersect the segmentation image with a binarized version of the input
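In this form recognizeLine fills two outputs, the result FST and the segmentation. The sketch below shows both under stated assumptions: SimpleFst, Hypothesis, copy_lattice and segmentation_from_cuts are hypothetical, SimpleFst only imitates the setStart/addArc style mentioned above (it is not IGenericFst's real API), the packing of segment ranges into input symbols is made up, cuts are assumed to be vertical columns, and a flood fill stands in for the invert + label_components step.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal FST: input = segment range, output = code point.
struct Arc { int from, to, input; char32_t output; float cost; };

struct SimpleFst {
    int num_states = 0, start = -1;
    std::vector<int> accepting;
    std::vector<Arc> arcs;
    int newState() { return num_states++; }
    void setStart(int s) { start = s; }
    void setAccept(int s) { accepting.push_back(s); }
    void addArc(int from, int to, int input, char32_t output, float cost) {
        assert(0 <= from && from < num_states && 0 <= to && to < num_states);
        arcs.push_back(Arc{from, to, input, output, cost});
    }
};

// Your own lattice entry: `label` read from segments [first, last).
struct Hypothesis { int first, last; char32_t label; float cost; };

// One state per cut position 0..num_segments, one arc per hypothesis.
void copy_lattice(SimpleFst &fst, int num_segments, const std::vector<Hypothesis> &hyps) {
    for (int i = 0; i <= num_segments; i++) fst.newState();
    fst.setStart(0);
    fst.setAccept(num_segments);
    for (const Hypothesis &h : hyps) {
        assert(0 <= h.first && h.first < h.last && h.last <= num_segments);
        fst.addArc(h.first, h.last, (h.first << 16) | h.last, h.label, h.cost);
    }
}

// Draw cuts, label the 4-connected regions between them, keep ink only.
std::vector<int> segmentation_from_cuts(const std::vector<unsigned char> &ink, int width,
                                        int height, const std::vector<int> &cut_columns) {
    assert(ink.size() == static_cast<std::size_t>(width) * height);
    std::vector<unsigned char> free_px(ink.size(), 1);  // 0 where a cut was drawn
    for (int x : cut_columns) {
        assert(0 <= x && x < width);
        for (int y = 0; y < height; y++) free_px[y * width + x] = 0;
    }
    std::vector<int> labels(ink.size(), 0);
    std::vector<std::pair<int, int>> stack;
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    int next_label = 0;
    for (int y0 = 0; y0 < height; y0++)
        for (int x0 = 0; x0 < width; x0++) {
            if (!free_px[y0 * width + x0] || labels[y0 * width + x0]) continue;
            labels[y0 * width + x0] = ++next_label;  // flood-fill a new region
            stack.push_back({x0, y0});
            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                for (int k = 0; k < 4; k++) {
                    const int nx = p.first + dx[k], ny = p.second + dy[k];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const int j = ny * width + nx;
                    if (free_px[j] && !labels[j]) { labels[j] = next_label; stack.push_back({nx, ny}); }
                }
            }
        }
    for (std::size_t i = 0; i < labels.size(); i++)
        if (!ink[i]) labels[i] = 0;  // intersect with the binarized input
    return labels;
}
```

A lattice over two segments can hold both "r" followed by "n" and a single "m", an ambiguity a plain string cannot express. Ink lying exactly on a cut column loses its label in this simplified version.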
Making Your Recognizer Useful for Others / Evaluation
struct IRecognizeLine : IComponent {
    virtual void align(nustring &chars, intarray &result, floatarray &costs,
                       bytearray &image, IGenericFst &transcription);
    ...
};
- a second method that is used for training is ground truth alignment
- you are given...
- a language model representing a transcription (possibly ambiguous)
- an input image
- you need to return...
- the list of recognized characters
- the input states to the transcription finite state transducer
- the costs of aligning each character
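For the simple case of a single, unambiguous transcription, alignment can be sketched as dynamic programming over an oversegmentation: each character consumes one to max_group consecutive segments, and the cheapest split is kept. This is a simplification (the transcription FST may be ambiguous), and align_linear, GroupCost and LineAlignment are hypothetical names; the cost function stands in for your recognizer's character scores.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

struct LineAlignment {
    std::u32string chars;            // aligned characters
    std::vector<int> states;         // state of the linear transcription FST after each char
    std::vector<float> costs;        // cost of aligning each character
    std::vector<int> first_segment;  // first oversegmentation piece used by each char
};

// Hypothetical scoring hook: cost of reading segments [first, last) as c.
using GroupCost = std::function<float(char32_t c, int first, int last)>;

LineAlignment align_linear(const std::u32string &text, int num_segments, int max_group,
                           const GroupCost &cost) {
    assert(max_group >= 1 && num_segments >= 0);
    const int n = static_cast<int>(text.size());
    const float inf = std::numeric_limits<float>::infinity();
    // best[i][j]: cheapest way to explain the first j segments with the first i chars
    std::vector<std::vector<float>> best(n + 1, std::vector<float>(num_segments + 1, inf));
    std::vector<std::vector<int>> from(n + 1, std::vector<int>(num_segments + 1, -1));
    best[0][0] = 0.0f;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= num_segments; j++)
            for (int k = 1; k <= max_group && k <= j; k++) {
                const float prev = best[i - 1][j - k];
                if (prev == inf) continue;
                const float total = prev + cost(text[i - 1], j - k, j);
                if (total < best[i][j]) { best[i][j] = total; from[i][j] = j - k; }
            }
    LineAlignment result;
    if (best[n][num_segments] == inf) return result;  // empty result: no alignment exists
    result.chars = text;
    result.states.resize(n);
    result.costs.resize(n);
    result.first_segment.resize(n);
    for (int i = n, j = num_segments; i >= 1; i--) {  // backtrack the best split
        const int start = from[i][j];
        result.states[i - 1] = i;
        result.costs[i - 1] = cost(text[i - 1], start, j);
        result.first_segment[i - 1] = start;
        j = start;
    }
    return result;
}
```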
Allowing Your Recognizer to be Trained through OCRopus
struct IRecognizeLine : IComponent {
    virtual void startTraining(const char *type="adaptation");
    virtual void finishTraining();
    virtual void save(FILE *stream);
    virtual void load(FILE *stream);
    ...
};
- you need to implement these bookkeeping methods for your recognizer to be trainable at all
- your implementation should check that training methods are only called between startTraining and finishTraining
- if you don't implement load/save, your models can't be saved
- your implementation of load/save should check that the models it gets really come from your system
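A minimal sketch of this bookkeeping, assuming a toy model that is just a vector of floats: training calls outside startTraining/finishTraining throw, and save writes a magic marker that load verifies before trusting the data. The class, the addTrainingSample hook and the file layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <vector>

// Marker at the start of every saved model (hypothetical format).
static const char kModelMagic[8] = "MYOCRv1";

class MyTrainableRecognizer {
public:
    void startTraining(const char *type = "adaptation") {
        (void)type;  // this sketch treats all training types alike
        if (training_) throw std::logic_error("startTraining called twice");
        training_ = true;
    }
    // Hypothetical stand-in for addTrainingLine(...): stores one number.
    void addTrainingSample(float value) {
        if (!training_) throw std::logic_error("training method called outside startTraining/finishTraining");
        weights_.push_back(value);
    }
    void finishTraining() {
        if (!training_) throw std::logic_error("finishTraining without startTraining");
        training_ = false;
    }
    void save(FILE *stream) const {
        const std::uint64_t n = weights_.size();
        if (std::fwrite(kModelMagic, 1, sizeof kModelMagic, stream) != sizeof kModelMagic ||
            std::fwrite(&n, sizeof n, 1, stream) != 1 ||
            (!weights_.empty() &&
             std::fwrite(weights_.data(), sizeof(float), weights_.size(), stream) != weights_.size()))
            throw std::runtime_error("save failed");
    }
    void load(FILE *stream) {
        char magic[sizeof kModelMagic];
        std::uint64_t n = 0;
        if (std::fread(magic, 1, sizeof magic, stream) != sizeof magic ||
            std::memcmp(magic, kModelMagic, sizeof magic) != 0)
            throw std::runtime_error("not a model written by MyTrainableRecognizer");
        if (std::fread(&n, sizeof n, 1, stream) != 1) throw std::runtime_error("truncated model");
        std::vector<float> w(static_cast<std::size_t>(n));
        if (!w.empty() && std::fread(w.data(), sizeof(float), w.size(), stream) != w.size())
            throw std::runtime_error("truncated model");
        weights_.swap(w);  // only replace the model once everything was read
    }
    const std::vector<float> &weights() const { return weights_; }

private:
    bool training_ = false;
    std::vector<float> weights_;
};
```

The raw binary layout uses the host's byte order, so such files are not portable across architectures; the version suffix in the magic string leaves room to change the format later.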
Types of Training
- there are different types of training
- "initial" -- train a recognizer from scratch
- "adaptation" -- adapt an already trained recognizer with samples from a particular book
- "incremental" -- add more training data to an existing recognizer
- you may want to behave differently for the different kinds of training
- "adaptation"
- when "adaptation" training is started on a particular recognizer, all of its samples come from the same context (e.g., a single book)
- adaptation permanently modifies the recognizer
- to return to the unadapted state, load(...) is used on the original model
- in the future, we may improve this interface
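One way to act on the type string is to parse it strictly and pick a policy per type. The enum, the Model struct and the policy itself (discard the model for "initial", keep it for the other two) are assumptions for illustration, not OCRopus behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

enum class TrainingType { Initial, Adaptation, Incremental };

// Rejects unknown type strings instead of silently guessing.
TrainingType parse_training_type(const std::string &type) {
    if (type == "initial") return TrainingType::Initial;
    if (type == "adaptation") return TrainingType::Adaptation;
    if (type == "incremental") return TrainingType::Incremental;
    throw std::invalid_argument("unknown training type: " + type);
}

struct Model { std::vector<float> prototypes; };  // hypothetical model

void begin_training(Model &model, const std::string &type) {
    switch (parse_training_type(type)) {
    case TrainingType::Initial:
        model.prototypes.clear();  // train from scratch
        break;
    case TrainingType::Adaptation:   // samples come from one context; permanent change
    case TrainingType::Incremental:  // more data for an existing recognizer
        break;                       // keep the existing prototypes
    }
}
```

Because adaptation changes the model in place, undoing it means reloading the original model with load(...), as described above.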
Allowing Your Recognizer to be Trained through OCRopus
struct IRecognizeLine : IComponent {
    virtual void addTrainingLine(bytearray &image, nustring &transcription);
    virtual void addTrainingLine(intarray &segmentation, bytearray &image_grayscale,
                                 nustring &transcription);
    ...
};
- there are two training interfaces
- the first gives a line and its ground truth transcription
- you can use this for e.g. Viterbi training
- the second gives a line, its segmentation into characters, and its ground truth transcription
- you can use this for isolated character training
- you can implement either one or both
- some training scripts will fail if the method they need isn't implemented
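For the second interface, a first step of isolated character training is cutting the line into per-character samples. This sketch assumes that pixel label k in the segmentation (0 = background) belongs to the k-th character of the transcription, which may not hold for every segmenter; character_samples, Box and CharSample are hypothetical names, and the classifier that would consume the samples is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Box { int x0, y0, x1, y1; };  // [x0,x1) x [y0,y1)
struct CharSample { char32_t label; Box box; };

std::vector<CharSample> character_samples(const std::vector<int> &segmentation, int width,
                                          int height, const std::u32string &transcription) {
    assert(segmentation.size() == static_cast<std::size_t>(width) * height);
    const int n = static_cast<int>(transcription.size());
    std::vector<Box> boxes(n, Box{width, height, 0, 0});  // start empty, grow per pixel
    std::vector<bool> seen(n, false);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            const int k = segmentation[y * width + x];
            if (k <= 0) continue;  // background
            if (k > n) throw std::runtime_error("more segments than characters");
            Box &b = boxes[k - 1];
            b.x0 = std::min(b.x0, x);
            b.y0 = std::min(b.y0, y);
            b.x1 = std::max(b.x1, x + 1);
            b.y1 = std::max(b.y1, y + 1);
            seen[k - 1] = true;
        }
    std::vector<CharSample> samples;
    for (int i = 0; i < n; i++) {
        if (!seen[i]) throw std::runtime_error("character without a segment");
        samples.push_back({transcription[i], boxes[i]});
    }
    return samples;
}
```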
Summary of Integrating Your Recognizer
- simplest form: only integrate text line recognition and do so through the command line
- recommended stages
- implement image -> string text line recognition
- implement image -> lattice text line recognition as best you can
- implement alignment (if your recognizer allows it; this is fairly tricky)
- implement training on text line images
- implement training on lattices
Improving the Existing Layout Analysis
- the usual layout analysis used in OCRopus happens in four steps:
- text/image segmentation (yields probability map or bitmap, converted to list of rectangles)
- column finding (SegmentPageByRAST)
- constrained text line finding (SegmentPageByRAST)
- reading order determination
- you can customize or replace any one of these steps
- constrained text line finding and column finding rarely make mistakes
- the most room for improvement is in text/image segmentation
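The idea that each of the four steps can be swapped out independently can be sketched with one function object per stage. The structure and the names LayoutPipeline, Rect, Binary and TextLine are hypothetical and do not reproduce OCRopus's component interfaces; the only concrete stage given is a simple reading order (columns left to right, lines top to bottom within a column).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Rect { int x0, y0, x1, y1; };
struct Binary { int width, height; std::vector<unsigned char> ink; };  // 1 = ink
struct TextLine { Rect box; int column; };

struct LayoutPipeline {
    std::function<std::vector<Rect>(const Binary &)> find_image_regions;  // text/image segmentation
    std::function<std::vector<Rect>(const Binary &, const std::vector<Rect> &)> find_columns;
    std::function<std::vector<TextLine>(const Binary &, const std::vector<Rect> &,
                                        const std::vector<Rect> &)> find_lines;  // constrained line finding
    std::function<void(std::vector<TextLine> &)> order_lines;  // reading order

    std::vector<TextLine> run(const Binary &page) const {
        assert(find_image_regions && find_columns && find_lines && order_lines);
        assert(page.ink.size() == static_cast<std::size_t>(page.width) * page.height);
        const std::vector<Rect> images = find_image_regions(page);
        const std::vector<Rect> columns = find_columns(page, images);
        std::vector<TextLine> lines = find_lines(page, columns, images);
        order_lines(lines);
        return lines;
    }
};

// Simple reading order: by column, then top to bottom.
void column_major_order(std::vector<TextLine> &lines) {
    std::stable_sort(lines.begin(), lines.end(), [](const TextLine &a, const TextLine &b) {
        if (a.column != b.column) return a.column < b.column;
        return a.box.y0 < b.box.y0;
    });
}
```

Replacing the text/image segmentation then amounts to assigning a different function to find_image_regions while the other stages stay unchanged.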
Integrating Your Own Layout Analysis
- if you have your own layout analysis method, you can easily integrate it:
- the input is a grayscale or binary image from OCRopus
- if you can't cope with grayscale, call one of the OCRopus binarizers
- convert the output into a pixel-accurate color segmentation
- if your system outputs polygons or rectangles
- iterate through these and draw them into an image with the same dimension as the original
- intersect that image with the binary version of the original image
- you can easily integrate through a command line interface
- write binary image to disk
- invoke your layout analysis from the command line
- read back the results and convert to pixel-accurate segmentation
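Reading back rectangle output and converting it to a pixel-accurate segmentation can be sketched as follows, assuming your tool prints one rectangle per line as "x0 y0 x1 y1". That output format and the names parse_regions, Region and regions_to_segmentation are hypothetical. The source calls for a color segmentation; this sketch stops at integer labels (one per region), and encoding them as colors is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Region { int x0, y0, x1, y1; };  // pixel rectangle [x0,x1) x [y0,y1)

// Parses one "x0 y0 x1 y1" rectangle per line; blank lines are skipped.
std::vector<Region> parse_regions(const std::string &output) {
    std::vector<Region> regions;
    std::istringstream lines(output);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        std::istringstream fields(line);
        Region r{};
        if (!(fields >> r.x0 >> r.y0 >> r.x1 >> r.y1) || r.x1 <= r.x0 || r.y1 <= r.y0)
            throw std::runtime_error("bad region line: " + line);
        regions.push_back(r);
    }
    return regions;
}

// Draws region i with label i+1 into an image the size of the page, then
// intersects with the binary page so only ink pixels keep a label. Later
// regions overwrite earlier ones where rectangles overlap.
std::vector<int> regions_to_segmentation(const std::vector<Region> &regions,
                                         const std::vector<unsigned char> &binary,
                                         int width, int height) {
    assert(binary.size() == static_cast<std::size_t>(width) * height);
    std::vector<int> labels(binary.size(), 0);
    for (std::size_t i = 0; i < regions.size(); i++) {
        const Region &r = regions[i];
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height); y++)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width); x++)
                labels[y * width + x] = static_cast<int>(i) + 1;
    }
    for (std::size_t p = 0; p < labels.size(); p++)
        if (!binary[p]) labels[p] = 0;  // background carries no label
    return labels;
}
```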