Course: OCRopus

OCR by Oversegmentation

Recognition by Oversegmentation

  • when characters are isolated, OCR is just classification
  • but in real text they often are not: characters touch, overlap, or break into pieces
  • approach
    • generate all reasonable character hypotheses
    • classify each of them
    • assign a cost to the segmentation
    • assemble everything back together

[Figure: Recognition by Oversegmentation, in three panels: Input, Character Hypotheses, Segmentation Lattice]

Obtaining Character Hypotheses

Approach

  1. find all potential cuts between characters
  2. for each pair of cuts, the region in between is a potential character
    (be somewhat selective: not every pair of cuts encloses a plausible character)
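A minimal sketch of step 2, assuming the cuts are given as sorted x coordinates. The two selectivity limits (`max_span`, the largest number of consecutive gaps merged into one hypothesis, and `max_width`, in pixels) and all names here are hypothetical; OCRopus's own criteria may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A character hypothesis: the image region between two cuts (x coordinates).
struct Hypothesis {
    int left;   // x coordinate of the left cut
    int right;  // x coordinate of the right cut
};

// Pairs every cut with cuts to its right. "Being selective" is modelled by two
// hypothetical limits: at most max_span consecutive gaps per hypothesis, and a
// maximum width in pixels. `cuts` must be sorted in increasing order.
std::vector<Hypothesis> generateHypotheses(const std::vector<int>& cuts,
                                           std::size_t max_span, int max_width) {
    assert(max_span > 0);
    std::vector<Hypothesis> result;
    for (std::size_t i = 0; i < cuts.size(); ++i) {
        for (std::size_t j = i + 1; j < cuts.size() && j - i <= max_span; ++j) {
            assert(cuts[j] > cuts[i]);                  // sorted precondition
            if (cuts[j] - cuts[i] > max_width) break;   // later cuts are even wider
            result.push_back({cuts[i], cuts[j]});
        }
    }
    return result;
}
```

For cuts at x = 0, 8, 15, 30, 36 with `max_span = 3` and `max_width = 25`, this yields seven hypotheses, from [0, 8) to [30, 36).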

Representation of Cuts

  • instead of representing cut paths directly, we represent the finest division of the line that the cut paths produce
  • this gives us a collection of character parts
  • we assemble character parts into character hypotheses
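Assuming the character parts are stored as a label image (0 for background, parts numbered 1..n in reading order), a character hypothesis is simply a run of consecutive parts, and its mask can be rebuilt from the label image as sketched below. The types and function names are illustrative only, not the OCRopus API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical representation: 0 = background, 1..n = character parts.
using LabelImage = std::vector<std::vector<int>>;
using Mask = std::vector<std::vector<bool>>;

// Mask of the character hypothesis made of parts first..last (inclusive).
Mask hypothesisMask(const LabelImage& parts, int first, int last) {
    assert(first >= 1 && first <= last);
    Mask mask(parts.size());
    for (std::size_t y = 0; y < parts.size(); ++y) {
        mask[y].resize(parts[y].size());
        for (std::size_t x = 0; x < parts[y].size(); ++x) {
            const int label = parts[y][x];
            mask[y][x] = label >= first && label <= last;
        }
    }
    return mask;
}

// Number of pixels in a mask (useful for rejecting tiny or huge hypotheses).
int maskArea(const Mask& mask) {
    int area = 0;
    for (const auto& row : mask)
        for (bool on : row) area += on ? 1 : 0;
    return area;
}
```

Because only parts are stored, every hypothesis is just a pair of part numbers; no pixel data is duplicated until a hypothesis is actually extracted.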

Computing Oversegmentations

  • connected components
  • valleys of the upper contour
  • skeletal segmenter
  • curved cut segmenter
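Connected components are the simplest of these oversegmenters. The sketch below is a generic textbook labeling (not OCRopus code): it labels 8-connected ink pixels with a breadth-first flood fill, scanning column by column so that labels tend to follow left-to-right order. On its own it cannot split characters that touch, which is why the cut-based segmenters are listed as well.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<int>>;

// Label the 8-connected components of ink pixels (nonzero entries of `binary`).
// Returns a label image (0 = background, 1..n = components) and stores n in *count.
Image connectedComponents(const Image& binary, int* count) {
    assert(count != nullptr);
    const int h = static_cast<int>(binary.size());
    const int w = h > 0 ? static_cast<int>(binary[0].size()) : 0;
    Image labels(h, std::vector<int>(w, 0));
    int next = 0;
    for (int x = 0; x < w; ++x) {          // column-major scan
        for (int y = 0; y < h; ++y) {
            if (!binary[y][x] || labels[y][x]) continue;  // background or done
            ++next;                                        // new component
            std::queue<std::pair<int, int>> queue;
            labels[y][x] = next;
            queue.push(std::make_pair(y, x));
            while (!queue.empty()) {
                const int cy = queue.front().first;
                const int cx = queue.front().second;
                queue.pop();
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        const int ny = cy + dy, nx = cx + dx;
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (binary[ny][nx] && !labels[ny][nx]) {
                            labels[ny][nx] = next;
                            queue.push(std::make_pair(ny, nx));
                        }
                    }
                }
            }
        }
    }
    *count = next;
    return labels;
}
```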

Curved Cut Segmenter

  • dynamic programming based segmenter
  • computes optimal (minimum-cost) cut paths through strings of touching characters
  • small number of cost parameters
  • can cope with kerning, slant, italics
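The slide only says that the curved cut segmenter is based on dynamic programming with a few cost parameters. As an illustration (not the OCRopus implementation), here is a minimal dynamic-programming sketch that finds one minimum-cost top-to-bottom cut path: from row to row the path may move one column left or right, crossing an ink pixel costs `ink_cost`, and each diagonal step costs `diag_cost`. Both parameter names and this cost model are assumptions. The diagonal steps are what let a cut bend around slanted or kerned character boundaries.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// ink[y][x] is 1 for ink, 0 for background (hypothetical input format).
// Returns, for every row y, the column the minimum-cost cut passes through.
std::vector<int> curvedCut(const std::vector<std::vector<int>>& ink,
                           float ink_cost, float diag_cost) {
    const int h = static_cast<int>(ink.size());
    assert(h > 0);
    const int w = static_cast<int>(ink[0].size());
    assert(w > 0);
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<std::vector<float>> cost(h, std::vector<float>(w, inf));
    std::vector<std::vector<int>> from(h, std::vector<int>(w, -1));
    for (int x = 0; x < w; ++x) cost[0][x] = ink[0][x] * ink_cost;
    for (int y = 1; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            // best predecessor among the three columns above
            for (int dx = -1; dx <= 1; ++dx) {
                const int px = x + dx;
                if (px < 0 || px >= w) continue;
                const float c = cost[y - 1][px] + (dx != 0 ? diag_cost : 0.0f);
                if (c < cost[y][x]) { cost[y][x] = c; from[y][x] = px; }
            }
            cost[y][x] += ink[y][x] * ink_cost;  // penalty for cutting through ink
        }
    }
    int best = 0;  // cheapest end column in the bottom row
    for (int x = 1; x < w; ++x)
        if (cost[h - 1][x] < cost[h - 1][best]) best = x;
    std::vector<int> path(h);
    path[h - 1] = best;
    for (int y = h - 1; y > 0; --y) path[y - 1] = from[y][path[y]];  // backtrack
    return path;
}
```

A real segmenter needs several cuts per line; one option is to run the same recurrence with the start column restricted to each candidate gap. This sketch returns only the single cheapest cut.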

ISegmentLine

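    struct ISegmentLine : IComponent {
        // segment the line image `in` into character parts;
        // `out` receives a segment label for every pixel (see Output)
        virtual void charseg(intarray &out,bytearray &in) = 0;
    };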

Output

  • Each output pixel's 24-bit RGB triple is split into two 12-bit values.
  • The first 12 bits give the line number in reading order, starting with 1.
  • The second 12 bits give the character segment in “reading order”/“segmentation order”, starting with 1.
  • The special value #FFFFFF represents the page background.
  • Note that pixel value #000000 is illegal (reserved) in segmentation files.
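A minimal sketch of packing and unpacking these pixel values. It assumes that "first 12 bits" means the high-order half of the 24-bit #RRGGBB value; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the line number occupies the high-order 12 bits of 0xRRGGBB.
constexpr std::uint32_t kBackground = 0xFFFFFF;  // page background

// Pack a line number and a character-segment number (both start at 1).
std::uint32_t encodeSegment(std::uint32_t line, std::uint32_t segment) {
    assert(line >= 1 && line <= 0xFFF);
    assert(segment >= 1 && segment <= 0xFFF);
    const std::uint32_t pixel = (line << 12) | segment;
    assert(pixel != kBackground);  // #FFFFFF is reserved for the background
    return pixel;
}

std::uint32_t lineOf(std::uint32_t pixel) { return (pixel >> 12) & 0xFFF; }
std::uint32_t segmentOf(std::uint32_t pixel) { return pixel & 0xFFF; }

// Split a packed value into the R, G, B bytes stored in the image file.
void toRGB(std::uint32_t pixel, int* r, int* g, int* b) {
    *r = static_cast<int>((pixel >> 16) & 0xFF);
    *g = static_cast<int>((pixel >> 8) & 0xFF);
    *b = static_cast<int>(pixel & 0xFF);
}
```

For example, line 2, segment 5 encodes as #002005. Since both numbers start at 1, #000000 can never be produced, which matches its reserved status.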

[Figure: Oversegmentation Example]

[Figure: Curved Cuts]

Representing Segmentation Lattices

  • other systems use different data types for segmentation lattices, language models, etc.
  • this leads to an explosion of special-purpose code for converting and combining them

OCRopus

  • segmentation lattices, language models, etc. are all represented as weighted finite state transducers
  • a single general-purpose algorithm library (such as OpenFST) can then be used to manipulate all of them

IGenericFst


    struct IGenericFst : virtual IComponent {
        virtual void clear() = 0;
        virtual int newState() = 0;
        // arc from state `from` to state `to`, with output/input labels and a cost
        virtual void addTransition(int from,int to,int output,float cost,int input);
        virtual void setStart(int node) = 0;
        virtual void setAccept(int node,float cost=0.0);
        virtual int special(const char *s) = 0;
        virtual void bestpath(nustring &result) = 0;
        virtual int nStates();
        virtual int getStart();
        virtual float getAcceptCost(int node);
        virtual void arcs(colib::intarray &ids, ...);
        // change the cost of an existing arc
        virtual void rescore(int from,int to,int output,float new_cost,int input);
        virtual void rescore(int from, int to, int symbol, float new_cost);
    };
  • don't panic: it's just a directed graph whose arcs carry input/output labels and costs
  • usually you don't have to manipulate it directly anyway
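To make "a directed graph with labeled arcs" concrete, here is a toy in-memory transducer whose method names mirror a subset of IGenericFst (newState, addTransition, setStart, setAccept, nStates, getStart, getAcceptCost). It is an independent sketch, not the OCRopus or OpenFST implementation. `arcsFrom` is a hypothetical accessor standing in for `arcs`, whose full signature is not given, and using an infinite accept cost to mean "not accepting" is a choice made here.

```cpp
#include <cassert>
#include <limits>
#include <vector>

class ToyFst {
public:
    struct Arc { int to; int output; float cost; int input; };

    void clear() { arcs_.clear(); accept_.clear(); start_ = -1; }
    int newState() {
        arcs_.emplace_back();  // no outgoing arcs yet
        accept_.push_back(std::numeric_limits<float>::infinity());
        return static_cast<int>(arcs_.size()) - 1;
    }
    void addTransition(int from, int to, int output, float cost, int input) {
        assert(from >= 0 && from < nStates() && to >= 0 && to < nStates());
        arcs_[from].push_back({to, output, cost, input});
    }
    void setStart(int node) { start_ = node; }
    void setAccept(int node, float cost = 0.0f) { accept_[node] = cost; }
    int nStates() const { return static_cast<int>(arcs_.size()); }
    int getStart() const { return start_; }
    // Infinity means "not an accepting state" in this sketch.
    float getAcceptCost(int node) const { return accept_[node]; }
    const std::vector<Arc>& arcsFrom(int node) const { return arcs_[node]; }

private:
    std::vector<std::vector<Arc>> arcs_;  // adjacency list: outgoing arcs per state
    std::vector<float> accept_;           // accept cost per state
    int start_ = -1;
};
```

An adjacency list is enough for lattices, since every algorithm in this section only ever walks arcs forward from a state.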

IGrouper

  • a Grouper helps you put all these things together
    • it groups character parts into character hypotheses
    • it lets you iterate through the character hypotheses
    • it lets you report your character classification results back to it
    • finally, it generates the segmentation lattice for you
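The grouper's bookkeeping can be sketched as follows, assuming hypotheses are runs of at most `max_range` consecutive character parts. `ToyGrouper`, `max_range`, `firstPart`, and `lastPart` are hypothetical; `length`, `setClass`, and `getLattice` only echo the names used in the IGrouper example, with simplified signatures (an int class label here). Lattice state k means "parts 1..k consumed", so a hypothesis covering parts first..last becomes an arc from state first-1 to state last, and every path from state 0 to state n covers each part exactly once.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class ToyGrouper {
public:
    struct Arc { int from, to, label; float cost; };

    ToyGrouper(int nparts, int max_range) : nparts_(nparts) {
        assert(max_range >= 1);
        // every run of 1..max_range consecutive parts is a hypothesis
        for (int first = 1; first <= nparts; ++first)
            for (int last = first; last <= nparts && last - first < max_range; ++last)
                groups_.push_back(std::make_pair(first, last));
    }
    int length() const { return static_cast<int>(groups_.size()); }
    int firstPart(int i) const { return groups_[i].first; }
    int lastPart(int i) const { return groups_[i].second; }
    // Record one classification result (label, cost) for hypothesis i.
    void setClass(int i, int label, float cost) {
        assert(i >= 0 && i < length());
        lattice_.push_back({groups_[i].first - 1, groups_[i].second, label, cost});
    }
    // Start state is 0, accepting state is nparts().
    const std::vector<Arc>& getLattice() const { return lattice_; }
    int nparts() const { return nparts_; }

private:
    int nparts_;
    std::vector<std::pair<int, int>> groups_;  // (first part, last part)
    std::vector<Arc> lattice_;
};
```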

IGrouper Example

Pseudocode; this doesn't quite work yet.

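-- Assumed to exist already (not shown on the slide):
--   segmentation  character-part image produced by an ISegmentLine
--   line_image    the line image that was segmented
--   classifier    a trained character classifier
char_image = bytearray()
char_mask = bytearray()
fst = make_StandardFst()
grouper = make_StandardGrouper()
grouper:setSegmentation(segmentation)
for i=0,grouper:length()-1 do
    -- cut character hypothesis i (image and mask) out of the line image
    grouper:extract(char_image,char_mask,line_image,i)
    classifier:setImage(char_image)
    -- report each candidate class s and its cost back to the grouper
    for j=0,classifier:length()-1 do
        classifier:cls(s,j)
        grouper:setClass(i,s,classifier:cost(j))
    end
end
-- build the segmentation lattice from all classified hypotheses
grouper:getLattice(fst)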

Recognition Without Language Model


The optimal recognition without a language model can be found simply by finding the best path through the lattice.  There are a number of methods available for this purpose:

  • OpenFST's bestpath
    • generic implementation for IGenericFst implementations based on OpenFST
    • no pruning
  • OCRopus A* search
    • pruning using A* search
    • (not currently available)
  • OCRopus beam search
    • prunes the search, keeping only the most promising partial paths (the beam)
    • beam_search(nustring_out,fst_in,beam_width_opt)

The bestpath method is also available directly on the fst object (mostly for debugging); it uses whatever
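Exact best path (no pruning) is cheap on segmentation lattices because they are acyclic: if states are numbered so that every arc goes forward, a single pass in state order suffices. A sketch under that assumption, with a hypothetical flat arc list; costs add along a path and the lowest total wins. This is plain dynamic programming, not the OpenFST or OCRopus code.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

struct Arc { int from, to; char label; float cost; };

// Minimum-cost path from `start` to `accept` in a lattice whose arcs all go
// from a lower-numbered state to a higher-numbered one. Returns the labels.
std::string bestPath(int nstates, const std::vector<Arc>& arcs, int start, int accept,
                     float* total_cost = nullptr) {
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> cost(nstates, inf);
    std::vector<int> back(nstates, -1);  // index of the best incoming arc
    std::vector<std::vector<int>> out(nstates);
    for (int a = 0; a < static_cast<int>(arcs.size()); ++a) {
        assert(arcs[a].from < arcs[a].to);  // forward arcs only
        out[arcs[a].from].push_back(a);
    }
    cost[start] = 0.0f;
    for (int s = start; s < nstates; ++s) {  // state order is a topological order
        if (cost[s] == inf) continue;        // unreachable state
        for (int a : out[s]) {
            const float c = cost[s] + arcs[a].cost;
            if (c < cost[arcs[a].to]) { cost[arcs[a].to] = c; back[arcs[a].to] = a; }
        }
    }
    if (total_cost) *total_cost = cost[accept];
    std::string result;  // empty if accept is unreachable
    for (int s = accept; s != start && back[s] >= 0; s = arcs[back[s]].from)
        result.insert(result.begin(), arcs[back[s]].label);
    return result;
}
```

For a lattice offering "r" (1.0) then "n" (1.0), or "m" (1.5), followed by "a" (0.5), the best path reads "ma" at total cost 2.0. A beam search would instead keep only the best few partial paths at each step, trading exactness for speed.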

Full Line Recognition

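-- usage: <script> <model> <line-image>
if #arg < 2 then error("usage: <script> <model> <line-image>") end
line = bytearray()
recognizer = make_NewBpnetLineOCR(arg[1])   -- arg[1]: model passed to make_NewBpnetLineOCR
result = nustring()
fst = make_StandardFst()

read_image_gray(line,arg[2])                 -- arg[2]: image of a single text line
recognizer:recognizeLine(fst,line)           -- fill fst with the recognition lattice
beam_search(result,fst)                      -- best path via beam search, no language model
print(result:utf8())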

Recognition With a Language Model


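-- usage: <script> <model> <langmod> <line-image>
if #arg < 3 then error("usage: <script> <model> <langmod> <line-image>") end
line = bytearray()
result = nustring()
fst = make_StandardFst()
langmod = make_StandardFst()
recognizer = make_NewBpnetLineOCR(arg[1])    -- arg[1]: model passed to make_NewBpnetLineOCR
langmod:load(arg[2])                         -- arg[2]: language model, stored as an FST

read_image_gray(line,arg[3])                 -- arg[3]: image of a single text line
recognizer:recognizeLine(fst,line)
-- search the composition of the recognition lattice with the language model
beam_search_in_composition(result,fst,langmod)
print(result:utf8())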
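Conceptually, recognition with a language model is a best-path search in the composition of the recognition lattice with the language-model FST: product states pair a lattice state with a language-model state, arcs are matched on their labels, and their costs add. The sketch below does this exactly with Dijkstra's algorithm, expanding product states on the fly; it assumes non-negative costs and simple acceptors with single-character labels. It illustrates the idea only and is not how beam_search_in_composition is implemented (going by its name, that function prunes with a beam).

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Arc { int to; char label; float cost; };

// Weighted acceptor: outgoing arcs per state, a start state, and final costs
// (states missing from final_cost are not accepting).
struct Acceptor {
    std::vector<std::vector<Arc>> arcs;
    int start = 0;
    std::map<int, float> final_cost;
};

std::string bestPathInComposition(const Acceptor& lattice, const Acceptor& lm,
                                  float* total_cost = nullptr) {
    typedef std::pair<int, int> State;     // (lattice state, LM state)
    typedef std::pair<float, State> Item;  // (distance so far, product state)
    assert(!lattice.arcs.empty() && !lm.arcs.empty());
    const float inf = std::numeric_limits<float>::infinity();
    std::map<State, float> dist;
    std::map<State, std::pair<State, char>> back;  // predecessor and label
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> queue;

    const State start(lattice.start, lm.start);
    dist[start] = 0.0f;
    queue.push(Item(0.0f, start));
    float best = inf;
    State best_state = start;
    while (!queue.empty()) {
        const Item top = queue.top();
        queue.pop();
        const float d = top.first;
        const State s = top.second;
        if (d >= best) break;       // nothing left in the queue can do better
        if (d > dist[s]) continue;  // stale entry; a cheaper one was processed
        auto f1 = lattice.final_cost.find(s.first);
        auto f2 = lm.final_cost.find(s.second);
        if (f1 != lattice.final_cost.end() && f2 != lm.final_cost.end() &&
            d + f1->second + f2->second < best) {
            best = d + f1->second + f2->second;
            best_state = s;
        }
        for (const Arc& a : lattice.arcs[s.first])
            for (const Arc& b : lm.arcs[s.second]) {
                if (a.label != b.label) continue;  // composition matches labels
                const State t(a.to, b.to);
                const float nd = d + a.cost + b.cost;
                auto it = dist.find(t);
                if (it == dist.end() || nd < it->second) {
                    dist[t] = nd;
                    back[t] = std::make_pair(s, a.label);
                    queue.push(Item(nd, t));
                }
            }
    }
    if (total_cost) *total_cost = best;
    std::string result;
    if (best == inf) return result;  // no string accepted by both
    for (State s = best_state; s != start; s = back[s].first)
        result.insert(result.begin(), back[s].second);
    return result;
}
```

For example, with a lattice offering "rn" (cost 2.0) or "m" (cost 1.5) followed by "a" (0.5), a one-state language model that charges 2.0 for "m" makes "rna" (2.5) beat "ma" (4.0).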

