Course: OCRopus

Adaptation of Language Models

  • extensively studied in speech recognition and, to a lesser extent, in statistical NLP
  • the techniques are directly applicable to OCR, but the published results are not
    • techniques that don't work for speech may still work well for OCR
    • the statistics of OCR are very different from those of speech

Simple Techniques for Adaptation of Language Models

  • language recognition
    • recognize the language using fast, heuristic methods
  • dictionary augmentation
    • very commonly used in OCR systems
    • after a first round of recognition with a generic language model, pick any new words found in the recognized text whose likelihood is above a given threshold and add them to the (dictionary-based) language model
  • MAP language recognition
    • given a set of language models and a first-round recognized text, pick the "best" language model according to its posterior probability given that text (maximum a posteriori) and repeat recognition
  • language model mixture
    • given a set of language models, pick mixture weights and use the weighted combination of the models as the language model (e.g., by choosing the weights to optimize the likelihood of the first-round recognized text and then repeating recognition)
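
The three adaptation steps above (dictionary augmentation, MAP model selection, mixture weighting) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not OCRopus code: each language model is assumed to be a unigram model stored as a dict from word to probability, unseen words get a small floor probability, and the mixture weights are fitted with EM, which is one common way to optimize the likelihood of the first-round text. All function and parameter names are hypothetical.

```python
import math


def augment_dictionary(dictionary, scored_words, threshold):
    """Add first-round words whose likelihood exceeds the threshold.

    scored_words: iterable of (word, likelihood) pairs from the first pass.
    """
    new_words = {w for w, p in scored_words if p > threshold}
    return set(dictionary) | new_words


def log_likelihood(model, words, floor=1e-6):
    """Log-likelihood of a word sequence under a unigram model (assumed form)."""
    return sum(math.log(model.get(w, floor)) for w in words)


def map_select(models, priors, words):
    """Index of the model maximizing log P(words | model) + log P(model)."""
    scores = [
        log_likelihood(m, words) + math.log(p)
        for m, p in zip(models, priors)
    ]
    return max(range(len(models)), key=scores.__getitem__)


def fit_mixture_weights(models, words, iterations=50, floor=1e-6):
    """EM estimate of weights w_i for P(word) = sum_i w_i * P_i(word)."""
    k = len(models)
    weights = [1.0 / k] * k
    if not words:
        return weights  # nothing to adapt to; keep the uniform weights
    for _ in range(iterations):
        totals = [0.0] * k
        for word in words:
            # E-step: responsibility of each model for this word.
            joint = [weights[i] * models[i].get(word, floor) for i in range(k)]
            z = sum(joint)
            for i in range(k):
                totals[i] += joint[i] / z
        # M-step: new weight is the average responsibility.
        weights = [t / len(words) for t in totals]
    return weights


if __name__ == "__main__":
    # Toy models and first-round text, invented for illustration only.
    english = {"the": 0.5, "cat": 0.3, "sat": 0.2}
    german = {"die": 0.5, "katze": 0.3, "sass": 0.2}
    first_pass = ["the", "cat", "sat", "the"]
    print(map_select([english, german], [0.5, 0.5], first_pass))
    print(fit_mixture_weights([english, german], first_pass))
```

In each case the adapted dictionary, selected model, or fitted mixture would then be used for a second round of recognition. The first-round text contains recognition errors, so the likelihood threshold and the floor probability both matter in practice; the values used here are placeholders.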
