Class KneserNeyLmReaderCallback<W>

java.lang.Object
edu.berkeley.nlp.lm.io.KneserNeyLmReaderCallback<W>
Type Parameters:
W -
All Implemented Interfaces:
ArrayEncodedNgramLanguageModel<W>, LmReader<ProbBackoffPair,ArpaLmReaderCallback<ProbBackoffPair>>, LmReaderCallback<LongRef>, NgramOrderedLmReaderCallback<LongRef>, NgramLanguageModel<W>, Serializable

Class for producing a Kneser-Ney language model in ARPA format from raw text. Confusingly, this class is both an LmReaderCallback (called from TextReader, which reads plain text) and an LmReader, which "reads" counts, produces Kneser-Ney probabilities and backoffs, and passes them on to an ArpaLmReaderCallback.
Author:
adampauls
  • Field Details

    • serialVersionUID

      protected static final long serialVersionUID
    • DEFAULT_DISCOUNT

      protected static final float DEFAULT_DISCOUNT
    • lmOrder

      protected final int lmOrder
    • wordIndexer

      protected final WordIndexer<W> wordIndexer
      This array represents the discount used for each n-gram order. The original Kneser-Ney discounting (-ukndiscount) uses one discounting constant for each n-gram order. These constants are estimated as D = n1 / (n1 + 2*n2), where n1 and n2 are the total numbers of n-grams with exactly one and exactly two counts, respectively. For simplicity, this code uses a constant discount of 0.75 for every order. However, other discounts can be specified.
    • ngrams

    • opts

      protected final ConfigOptions opts
    • startIndex

      protected final int startIndex
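
The discount estimate D = n1 / (n1 + 2*n2) quoted in the field descriptions above can be illustrated with a small standalone sketch. It is not taken from this class: the class name DiscountSketch, the helpers countBigrams and estimateDiscount, and the choice to fall back to 0.75 when n1 or n2 is zero are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of edu.berkeley.nlp.lm.
public class DiscountSketch {

    // Assumed fallback, mirroring the constant 0.75 discount mentioned above.
    static final double FALLBACK_DISCOUNT = 0.75;

    // Counts adjacent word pairs (bigrams) in a token sequence.
    static Map<String, Integer> countBigrams(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    // Estimates D = n1 / (n1 + 2*n2), where n1 and n2 are the numbers of
    // distinct n-grams seen exactly once and exactly twice.
    static double estimateDiscount(Map<String, Integer> ngramCounts) {
        long n1 = 0;
        long n2 = 0;
        for (int count : ngramCounts.values()) {
            if (count == 1) {
                n1++;
            } else if (count == 2) {
                n2++;
            }
        }
        // Assumption: use the constant when the estimate is degenerate.
        if (n1 == 0 || n2 == 0) {
            return FALLBACK_DISCOUNT;
        }
        return (double) n1 / (n1 + 2.0 * n2);
    }

    public static void main(String[] args) {
        // Bigram counts here: "a b" = 2, "b a" = 2, "a c" = 1, so n1 = 1, n2 = 2.
        List<String> tokens = List.of("a", "b", "a", "b", "a", "c");
        double d = estimateDiscount(countBigrams(tokens));
        System.out.println("Estimated bigram discount: " + d);
    }
}
```

In a full model one such constant would be estimated separately for each n-gram order; the sketch handles a single order only.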
  • Constructor Details

    • KneserNeyLmReaderCallback

      public KneserNeyLmReaderCallback(WordIndexer<W> wordIndexer, int maxOrder)
      Parameters:
      wordIndexer -
      maxOrder - the maximum n-gram order
      inputIsSentences - If true, input n-grams are assumed to be sentences, and all sub-ngrams of up to order maxOrder are added. If false, input n-grams are assumed to be atomic.
    • KneserNeyLmReaderCallback

      public KneserNeyLmReaderCallback(WordIndexer<W> wordIndexer, int maxOrder, ConfigOptions opts)
  • Method Details