Representing Segmentation Lattices
- other systems use different data types for segmentation lattices, language models, etc.
- this leads to an explosion of code
OCRopus
- segmentation lattices, language models, etc. are all represented as finite state transducers
- a powerful algorithm library can be used to manipulate finite state transducers
IGenericFst
struct IGenericFst : virtual IComponent {
    virtual void clear() = 0;
    virtual int newState() = 0;
    virtual void addTransition(int from,int to,int output,float cost,int input);
    virtual void setStart(int node) = 0;
    virtual void setAccept(int node,float cost=0.0);
    virtual int special(const char *s) = 0;
    virtual void bestpath(nustring &result) = 0;
    virtual int nStates();
    virtual int getStart();
    virtual float getAcceptCost(int node);
    virtual void arcs(colib::intarray &ids, ...);
    virtual void rescore(int from,int to,int output,float new_cost,int input);
    virtual void rescore(int from, int to, int symbol, float new_cost);
};
- don't panic... it's just a directed graph with labeled arcs
- you usually don't have to deal with it yourself anyway
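To make the "directed graph with labeled arcs" picture concrete, here is a minimal sketch. ToyFst, kInf, and makeExampleLattice are hypothetical names invented for this illustration; the type only borrows a few method names from the listing above and is not the OCRopus implementation. It assumes non-negative arc costs and treats output symbol 0 as "emit nothing".

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

const float kInf = std::numeric_limits<float>::infinity();

// Hypothetical toy type for illustration; NOT the OCRopus IGenericFst.
struct ToyFst {
    struct Arc { int to; int output; float cost; int input; };
    std::vector<std::vector<Arc>> arcs;  // arcs[s] holds the outgoing arcs of state s
    std::vector<float> accept;           // accept cost per state; kInf means "not accepting"
    int start = -1;

    bool valid(int s) const { return s >= 0 && s < static_cast<int>(arcs.size()); }
    int newState() {
        arcs.emplace_back();
        accept.push_back(kInf);
        return static_cast<int>(arcs.size()) - 1;
    }
    void addTransition(int from, int to, int output, float cost, int input) {
        assert(valid(from) && valid(to));
        arcs[from].push_back(Arc{to, output, cost, input});
    }
    void setStart(int node) { assert(valid(node)); start = node; }
    void setAccept(int node, float cost = 0.0f) { assert(valid(node)); accept[node] = cost; }

    // Cheapest accepting path, found by repeated relaxation (Bellman-Ford style).
    // Assumes non-negative arc costs.
    std::string bestpath() const {
        assert(valid(start));
        const int n = static_cast<int>(arcs.size());
        std::vector<float> dist(n, kInf);
        std::vector<int> prevState(n, -1), prevOut(n, 0);
        dist[start] = 0.0f;
        for (int round = 0; round + 1 < n; ++round)
            for (int s = 0; s < n; ++s) {
                if (dist[s] == kInf) continue;
                for (const Arc &a : arcs[s])
                    if (dist[s] + a.cost < dist[a.to]) {
                        dist[a.to] = dist[s] + a.cost;
                        prevState[a.to] = s;
                        prevOut[a.to] = a.output;
                    }
            }
        int best = -1;
        float bestCost = kInf;
        for (int s = 0; s < n; ++s)
            if (dist[s] + accept[s] < bestCost) { bestCost = dist[s] + accept[s]; best = s; }
        std::string out;  // walk predecessor links back to the start state
        for (int s = best; s != -1 && s != start; s = prevState[s])
            if (prevOut[s] != 0) out.insert(out.begin(), static_cast<char>(prevOut[s]));
        return out;
    }
};

// Example lattice: the same ink read either as "r" followed by "n", or as a single "m".
ToyFst makeExampleLattice() {
    ToyFst fst;
    int s0 = fst.newState(), s1 = fst.newState(), s2 = fst.newState();
    fst.setStart(s0);
    fst.setAccept(s2);
    fst.addTransition(s0, s1, 'r', 1.0f, 'r');
    fst.addTransition(s1, s2, 'n', 1.0f, 'n');
    fst.addTransition(s0, s2, 'm', 1.5f, 'm');
    return fst;
}
```

In this toy lattice, "m" costs 1.5 and "rn" costs 2.0, so makeExampleLattice().bestpath() yields "m".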
IGrouper
- a Grouper helps you put all these things together
- it groups character parts into character hypotheses
- it lets you iterate through the character hypotheses
- it lets you return your character classification results
- finally, it generates the segmentation lattice for you
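One way to picture the grouping step is as an enumeration of runs of neighboring character parts. The sketch below rests on assumptions not stated on this slide: segments are numbered 1..nSegments in reading order, and a hypothesis may span at most maxRange consecutive segments. enumerateGroups is a hypothetical helper, not an OCRopus function.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of OCRopus: list every run of up to maxRange
// consecutive segments as a (first, last) pair. Segments are numbered 1..nSegments.
std::vector<std::pair<int, int>> enumerateGroups(int nSegments, int maxRange) {
    assert(nSegments >= 0 && maxRange >= 1);
    std::vector<std::pair<int, int>> groups;
    for (int first = 1; first <= nSegments; ++first)
        for (int last = first; last <= nSegments && last < first + maxRange; ++last)
            groups.emplace_back(first, last);  // one character hypothesis
    return groups;
}
```

In this sketch, a group (first, last) would become a lattice arc from state first-1 to state last once a classifier has assigned it a label and a cost, so competing groupings of the same parts show up as alternative paths.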
IGrouper Example
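Pseudocode; this doesn't quite work yet.
grouper = make_StandardGrouper()
-- hand the line segmentation to the grouper
grouper:setSegmentation(segmentation)
-- visit every character hypothesis the grouper proposes
for i=0,grouper:length()-1 do
    grouper:extract(char_image,char_mask,line_image,i)
    classifier:setImage(char_image)
    -- report each candidate class and its cost back to the grouper
    for j=0,classifier:length()-1 do
        classifier:cls(s,j)
        grouper:setClass(i,s,classifier:cost(j))
    end
end
-- build the segmentation lattice from the collected results
grouper:getLattice(fst)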
Recognition Without Language Model
The optimal recognition without a language model can be found simply by finding the best path through the lattice. There are a number of methods available for this purpose:
- OpenFST's bestpath
  - generic implementation for OpenFST implementations of IGenericFst
  - no pruning
- OCRopus A* search
  - pruning using A* search
  - (not currently available)
- OCRopus beam search
  - pruning search path
  - beam_search(nustring_out,fst_in,beam_width_opt)
Bestpath is also a convenience method on fst (mostly for debugging); it uses whatever
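The effect of pruning the search path can be illustrated with a small, self-contained beam search. This is a sketch under stated assumptions, not the OCRopus beam_search: Arc, Hyp, beamSearch, and exampleArcs are hypothetical names, there is a single accept state, and hypotheses are pruned after every expansion round.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Arc { int from; int to; char output; float cost; };
struct Hyp { int state; float cost; std::string text; };

// Hypothetical sketch: expand all surviving hypotheses by one arc per round and
// keep only the beamWidth cheapest. Returns "" if no accepting path survives.
std::string beamSearch(const std::vector<Arc> &arcs, int start, int accept,
                       std::size_t beamWidth) {
    assert(beamWidth > 0);
    std::vector<Hyp> beam;
    beam.push_back(Hyp{start, 0.0f, ""});
    bool found = false;
    Hyp best{accept, 0.0f, ""};
    // Bound the number of rounds so that a cyclic graph cannot loop forever.
    for (std::size_t step = 0; step <= arcs.size() && !beam.empty(); ++step) {
        std::vector<Hyp> next;
        for (const Hyp &h : beam) {
            if (h.state == accept) {  // complete path: remember the cheapest one
                if (!found || h.cost < best.cost) { best = h; found = true; }
                continue;
            }
            for (const Arc &a : arcs)
                if (a.from == h.state)
                    next.push_back(Hyp{a.to, h.cost + a.cost, h.text + a.output});
        }
        std::sort(next.begin(), next.end(),
                  [](const Hyp &x, const Hyp &y) { return x.cost < y.cost; });
        if (next.size() > beamWidth) next.erase(next.begin() + beamWidth, next.end());
        beam = std::move(next);
    }
    return found ? best.text : std::string();
}

// Example graph: "ax" looks cheaper after one arc, but "by" is cheaper overall.
std::vector<Arc> exampleArcs() {
    return {{0, 1, 'a', 1.0f}, {0, 2, 'b', 1.5f}, {1, 3, 'x', 3.0f}, {2, 3, 'y', 0.5f}};
}
```

With a beam width of 2 the search returns the optimal "by" (total cost 2.0). With a beam width of 1 the "b" hypothesis is pruned after the first round and the search returns "ax" (total cost 4.0), which is the price of pruning: less work, but no guarantee of finding the best path.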
Full Line Recognition
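line = bytearray()
-- build the line recognizer from the first command-line argument
recognizer = make_NewBpnetLineOCR(arg[1])
result = nustring()
fst = make_StandardFst()
-- read the line image named by the second argument as grayscale
read_image_gray(line,arg[2])
-- recognize the line; the recognition lattice ends up in fst
recognizer:recognizeLine(fst,line)
-- search the lattice for the best path and print it as UTF-8
beam_search(result,fst)
print(result:utf8())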
Recognition With a Language Model
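line = bytearray()
result = nustring()
fst = make_StandardFst()
langmod = make_StandardFst()
recognizer = make_NewBpnetLineOCR(arg[1])
-- the language model is itself a finite state transducer, loaded from a file
langmod:load(arg[2])
read_image_gray(line,arg[3])
-- recognize the line; the recognition lattice ends up in fst
recognizer:recognizeLine(fst,line)
-- search the composition of the recognition lattice and the language model
beam_search_in_composition(result,fst,langmod)
print(result:utf8())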
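What searching "in composition" means can be sketched with two small weighted graphs. The code below is an illustration under assumptions, not the OCRopus beam_search_in_composition: Arc, Graph, and bestInComposition are hypothetical names, both machines have one start and one accept state, all costs are non-negative, and the search is an exhaustive cheapest-first search rather than a beam search. A pair of states (p, q) advances only when the lattice and the language model both have an arc with the same label, and the two costs are added.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct Arc { int from; int to; char label; float cost; };
struct Graph { std::vector<Arc> arcs; int start; int accept; int nStates; };

// Hypothetical sketch: cheapest path through the implicit product of two graphs.
// Returns "" when no string is accepted by both.
std::string bestInComposition(const Graph &lattice, const Graph &lm) {
    struct Item { float cost; int p; int q; std::string text; };
    auto worse = [](const Item &a, const Item &b) { return a.cost > b.cost; };
    std::priority_queue<Item, std::vector<Item>, decltype(worse)> queue(worse);
    std::vector<bool> done(lattice.nStates * lm.nStates, false);
    queue.push(Item{0.0f, lattice.start, lm.start, ""});
    while (!queue.empty()) {
        Item it = queue.top();
        queue.pop();
        const int id = it.p * lm.nStates + it.q;  // index of the pair state
        if (done[id]) continue;
        done[id] = true;
        if (it.p == lattice.accept && it.q == lm.accept) return it.text;
        for (const Arc &a : lattice.arcs) {
            if (a.from != it.p) continue;
            for (const Arc &b : lm.arcs)  // follow only arcs whose labels match
                if (b.from == it.q && b.label == a.label)
                    queue.push(Item{it.cost + a.cost + b.cost, a.to, b.to,
                                    it.text + a.label});
        }
    }
    return std::string();
}

// Recognition lattice: "m" (1.5) beats "rn" (2.0) on image evidence alone.
Graph exampleLattice() {
    return Graph{{{0, 1, 'r', 1.0f}, {1, 2, 'n', 1.0f}, {0, 2, 'm', 1.5f}}, 0, 2, 3};
}

// Language model: strongly prefers "rn" (0.2) over "m" (2.0).
Graph exampleLanguageModel() {
    return Graph{{{0, 1, 'r', 0.1f}, {1, 2, 'n', 0.1f}, {0, 2, 'm', 2.0f}}, 0, 2, 3};
}
```

On its own the example lattice prefers "m", but in composition "rn" costs about 2.2 against 3.5 for "m", so bestInComposition(exampleLattice(), exampleLanguageModel()) returns "rn". Because both sides are just weighted graphs, no special data type is needed for the language model.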