##////////////////////////////////////////////////////////////////////////
##  Version 1.4
##////////////////////////////////////////////////////////////////////////
Mon Jul 19 01:43:07 EDT 2004

- Added support for building, training, and using hidden Markov
  models (HMMs).

- Added nltk.clusterer, a package for clustering algorithms.
  Implementations include k-means, group average agglomerative,
  and expectation maximization clustering.  Support is included
  for using various distance metrics (such as Euclidean and
  cosine distance).

- Refactored the feature detection & text classification packages:
  - feature detection is used to build a dictionary containing a
    token's features.
  - feature encoding is used, if necessary, to translate the feature
    dictionary into a homogeneous (typically sparse) vector.
  - feature-based classifiers examine a token's feature dictionary
    or feature vector, and use it to classify the token.

- Added support for accessing a token's context:
  - Most token readers & tokenizers support an "add_context" flag,
    which says to add a CONTEXT property to each token, containing
    a pointer to the token's context.  Context pointers are
    implemented using two new classes, SubtokenContextPointer and
    TreeContextPointer.

- The task methods used to find multiple solutions for a given task
  (e.g., parse_n()) underwent minor design & name changes.
  These changes were necessary to properly support multiple-
  solution methods for tasks that process collections of tokens,
  such as TaggerI and ClustererI.

- Property indirection is now used to give more specific property
  names for individual tokens (e.g., WORDS or SENTS instead of
  SUBTOKENS).

- Extended chunking to allow for cascaded chunking.

- Added TokenReaderI, a generic interface for reading in string-
  encoded tokens.  In many cases, token readers have replaced
  tokenizers.  E.g., TreebankTokenizer was replaced by
  TreebankTokenReader.

- Replaced TreeToken with a generic Tree data type, which can be
  used to build trees over any type of data (not just tokens).

- Increased the level of detail returned by several corpus readers.

- Updated the test suite, and switched from the unittest framework
  to the doctest framework.

- Improved support for working with probabilities in log-space.

- Improved support for pickling tokens (note: must use protocol=2
  when pickling tokens containing cycles)

- Added several new methods for manipulating trees.

- Added support for property indirection to all tasks

- Added TaskI, a base class for all processing interfaces

- Several bug-fixes.
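
As a rough illustration of the feature detection and feature encoding
steps described in this release, the sketch below builds a feature
dictionary for each token and then maps it to a sparse vector.  All
function names here (detect_features, build_index, encode_features) are
hypothetical and are not taken from NLTK; the token is modeled as a
plain dict with a TEXT property.

```python
# Hypothetical names throughout; none of these are NLTK 1.4 functions.

def detect_features(token):
    """Feature detection: build a dictionary of the token's features."""
    text = token["TEXT"]
    return {
        "suffix": text[-2:],
        "is_capitalized": text[:1].isupper(),
        "length": len(text),
    }

def build_index(feature_dicts):
    """Assign one vector position to each (feature name, value) pair seen."""
    index = {}
    for features in feature_dicts:
        for pair in features.items():
            index.setdefault(pair, len(index))
    return index

def encode_features(features, index):
    """Feature encoding: map the dictionary to a sparse vector.

    The sparse vector is a dict from position to 1.0; positions that are
    absent are implicitly 0.0, so every entry has the same (float) type.
    """
    return {index[pair]: 1.0 for pair in features.items() if pair in index}

tokens = [{"TEXT": "Walking"}, {"TEXT": "dog"}]
feature_dicts = [detect_features(tok) for tok in tokens]
index = build_index(feature_dicts)
vectors = [encode_features(fd, index) for fd in feature_dicts]
```

A feature-based classifier could then consume either feature_dicts or
vectors, depending on whether it needs homogeneous input.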

##////////////////////////////////////////////////////////////////////////
##  Version 1.3
##////////////////////////////////////////////////////////////////////////
Sat Mar 20 11:12:44 EST 2004

We made some significant changes to NLTK's basic architecture.  These
changes make the basic processing tasks easier to use, and make it
easier to combine different processing tasks into a single system.
Under the new architecture:
  - Tokens are encoded as mutable mappings from properties to values
  - Tokens are used to encode all units of language (words, sentences,
    syntax trees, documents, etc).
  - Tokens can contain other tokens (e.g., a document token's SUBTOKENS
    property might contain a list of the document's words); or pointers
    to other tokens (e.g., a parse constituent's PARENT property might
    contain a pointer to the constituent's parent).
  - Processing tasks (parsing, tokenizing, tagging, etc) work by
    adding new information to existing tokens (e.g., a new TREE property
    or a new SUBTOKENS property).
  - "Property indirection" can be used to control which properties a
    given processing task uses for input and output (e.g., whether a
    parser uses the words' TEXT or TAG as the LEAF property).
  - Locations (such as character spans) can be added to each token, to
    provide unique identifiers.
  - Specialized token subclasses provide extra methods.  For example,
    TreeToken defines methods like height() and leaves().
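
The ideas above can be sketched with plain dictionaries.  This is only
an illustration of tokens as mutable mappings and of property
indirection, not NLTK's Token class: the tokenize() function and its
text_prop and subtokens_prop arguments are hypothetical names chosen
for this example.

```python
# A token is modeled as a plain dict from property names to values.
doc = {"TEXT": "the dog barks"}

def tokenize(token, text_prop="TEXT", subtokens_prop="SUBTOKENS"):
    """Add a list of word tokens to `token` in place.

    The keyword arguments stand in for property indirection: the caller
    chooses which property is read and which one is written.
    """
    token[subtokens_prop] = [{"TEXT": word} for word in token[text_prop].split()]

# Default property names: the task adds a SUBTOKENS property.
tokenize(doc)

# Same task, but the output goes to a more specific property name.
tokenize(doc, subtokens_prop="WORDS")
```

The task adds information to the existing token rather than returning a
new object, so later tasks can read whichever property they are told to.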

Additions:
  - Added Witten-Bell and Good-Turing smoothing for probability
    distributions
  - Added a regular expression based tagger
  - Added a regular expression based stemmer
  - Added an implementation of the Earley parser
  - Added feature structures, including unification with variables and
    reentrance.
  - Added support for parsing CFGs and CFG productions
  - Added support for trees that automatically maintain parent
    pointers.

Improvements:
  - Redesigned the chart parser system to improve flexibility and
    efficiency.  (Chart parsers now run 10-20x faster!)

Graphical Demonstrations:
  - Improved the chart parser demo:
    - Runs 5-10x faster
    - Improved matrix window
    - Added an optional "results" window
  - Added a corpus viewer, which can be used to browse corpora and
    select items within each corpus.
  - Added support for drawing feature structures
  - Improved font support in the graphical demos.

##////////////////////////////////////////////////////////////////////////
##  Version 1.2
##////////////////////////////////////////////////////////////////////////
Tue Nov  4 23:52:30 EST 2003

Additions:
  - New corpus readers (for the names corpus, the stopwords corpus,
    the Semcor corpus, and the WordNet corpus).
  - Added nltk.sense, which tokenizes semcor.  (N.B.: nltk.sense
    does *not* define an interface for sense tagging; sense
    tagging is just a special case of classification, or perhaps
    tagging.)
  - Added a look-ahead window for sequential tagging.
  - Added Bob Berwick's version of the chart parser demo.
  - New corpora:
    - Added a corpus of male & female names.
    - Added a corpus of stopwords.
    - Added the semcor corpus.
    - Added WordNet 1.7 data files.
  - New modules in nltk-contrib:
    - Added a restructured version of the token module.  Some version
      of this module is likely to replace nltk.token eventually.
    - Added Trevor's restructuring of the classifier package, including
      a boosting classifier, a decision list, and a decision tree.
    - Added a full implementation of Lesk's dictionary based
      tagger, including an easy interface to plug in new dictionaries.
      Currently supports Roget's and WordNet.
    - Added a simulated annealing tagger.
    - Added pywordnet.
    - Added a wordnet stemmer.
    - Added a stub implementation of annotation graphs.
    - Added an interface to Babelfish (http://babelfish.altavista.com).

Changes:
  - Split nltk.token into two modules:
    - The new nltk.token just defines Token and Location
    - The new nltk.tokenizer defines all tokenizers
  - Moved nltk.fsa to nltk_contrib
  - Various bug fixes

##////////////////////////////////////////////////////////////////////////
##  Version 1.1
##////////////////////////////////////////////////////////////////////////
Mon Aug 18 22:51:26 EDT 2003

New Packages:
  - Added nltk_data, a new package that contains sample data and
    corpora that can be easily used with NLTK.  The new nltk.corpus
    module can be used to access the data in this package.  Several
    new tokenizers (such as TreebankTokenizer) are available to
    tokenize this data appropriately.
  - Added nltk_contrib, a new package that contains third-party
    contributions to the toolkit.

Additions to the main nltk package:
  - Added a demo() function to every module defined by the toolkit.
    These demo functions can be imported and called; or they can be
    run by running the individual module as a script.
  - Added a stemmer interface, and an implementation of the Porter
    Stemmer.

Graphical Demonstration Improvements:
  - Added a colorized grammar editor (nltk.draw.cfg), that is used
    by the parser demos.
  - Added a dialog box to change the demo sentence
  - Added a matrix view to the chart parser demo
  - Interface improvements to the chart parser demo
  - nltk.draw.plot now uses BLT & PWM to draw plots, if they are
    available.
  - Added menus
  - Added text size controls; animation controls; and speed controls.

Changes:
  - Split nltk.token into 2 modules: the new nltk.token defines
    locations and tokens; and the new nltk.tokenizer defines &
    implements tokenization.
  - Redesigned ConditionalProbDist.
  - Changed chart parser rule orders for strategies, to match standard
    parsing algorithms (e.g., Earley)
  - Updated nltk.parser.chunk to generate flat Trees (instead of lists
    of lists).  This allows it to chunk multiple categories.

Minor improvements:
  - nltk.cfg.nonterminals() provides a more convenient and less
    error prone way to create nonterminals.
  - Renamed ProbabilisticMixin.p() to prob()
  - Overload the division operator of Nonterminal, to make it easy
    to create "slashed" categories.
  - Added logprob() methods, for reading probabilities in log space.
  - Added nltk.set.MutableSet, for encoding mutable sets.
  - TaggedTokenizer and ChunkedTaggedTokenizer now ignore untagged
    words (rather than assigning them tags of None).
  - Simplified the Location & Token constructors
  - Added a "source" optional parameter to TokenizerI.tokenize, and a
    "unit" optional parameter to TokenizerI's constructor.
  - Removed type consistency checks from nltk.tree

Testing & bug fixes, etc
  - Assorted bug fixes in nltk.cfg, nltk.probability, nltk.set,
    nltk.tagger, nltk.tokenizer, nltk.tree, nltk.classifier.maxent,
    nltk.draw, nltk.draw.plot, nltk.draw.chart, nltk.draw.rdparser,
    nltk.draw.srparser, nltk.draw.tree, nltk.parser, and
    nltk.parser.probabilistic
  - Expanded unit test suite
  - Assorted efficiency improvements
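
As a sketch of what overloading the division operator for "slashed"
categories might look like, the class below is a hypothetical stand-in
and not the nltk.cfg implementation; the signature of nonterminals()
(one comma-separated string) is likewise an assumption.  It is written
for current Python, where "/" maps to __truediv__; Python 2 used
__div__ for the same operator.

```python
class Nonterminal:
    """Hypothetical stand-in for a grammar nonterminal symbol."""

    def __init__(self, symbol):
        self.symbol = symbol

    def __truediv__(self, other):
        # S / NP builds the "slashed" category S/NP.
        return Nonterminal("%s/%s" % (self.symbol, other.symbol))

    def __eq__(self, other):
        return isinstance(other, Nonterminal) and self.symbol == other.symbol

    def __hash__(self):
        return hash(self.symbol)

    def __repr__(self):
        return "<%s>" % self.symbol

def nonterminals(symbols):
    """Create several nonterminals from one comma-separated string."""
    return [Nonterminal(s.strip()) for s in symbols.split(",")]

S, NP, VP = nonterminals("S, NP, VP")
slashed = S / NP
```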


##////////////////////////////////////////////////////////////////////////
##  Version 1.0
##////////////////////////////////////////////////////////////////////////
Sat Jul  6 06:52:19 EDT 2002

- Complete rewrite of nltk.probability.  The new version is cleaner and
  simpler, and supports statistical estimation better.  It gets rid of
  Events, and uses new ConditionalFreqDist and ConditionalProbDistI
  classes to handle conditional distributions.  nltk.probability now
  has working MLE, ELE, Laplace, Lidstone, heldout, and cross-validation
  statistical estimators.
- Defined a new base class for sequential taggers.  This simplifies
  the tagger implementations, and allows us to do much better
  backoff.  In particular, it solves the problem of higher order
  taggers not having access to the tag predictions generated by lower
  order taggers.
- Replaced chart edge class with an interface & two implementations:
  TokenEdge and ProductionEdge.  A TokenEdge records that a token
  occurs in the text; a ProductionEdge records that a tree has been
  found to be (partially) consistent with the text.
- New regular expressions tutorial
- Updated tagging tutorial, probability tutorial, and introduction
  tutorial with new material
- Assorted minor improvements
  - Simplified type checking system
  - Changed "unknown tag" for taggers from "UNK" to None.
  - Tokens can now have float start/end points
  - Added common "AbstractTree" subclass for Tree and TreeToken
  - Changed/fixed the implementation of the Naive Bayes classifier
  - Interface improvements for the chart parsing demo
  - Added nltk.draw.tree.draw_trees, which can be used to draw a list of
    trees in a single window.
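
Three of the estimators named above (MLE, ELE, and Laplace) can be seen
as special cases of the Lidstone estimate, which adds a constant gamma
to every count.  The sketch below shows the arithmetic only; the
function name lidstone_prob and its arguments are hypothetical and do
not reflect the nltk.probability API.

```python
from collections import Counter

def lidstone_prob(counts, sample, gamma, bins):
    """Lidstone estimate: (count + gamma) / (N + gamma * bins)."""
    total = sum(counts.values())
    return (counts[sample] + gamma) / (total + gamma * bins)

counts = Counter("aaab")   # a: 3, b: 1
bins = 4                   # assume 4 possible outcomes: a, b, c, d

# "c" was never observed; the estimators differ in how much mass it gets.
mle = lidstone_prob(counts, "c", gamma=0.0, bins=bins)      # maximum likelihood
ele = lidstone_prob(counts, "c", gamma=0.5, bins=bins)      # expected likelihood
laplace = lidstone_prob(counts, "c", gamma=1.0, bins=bins)  # Laplace (add-one)
```

With gamma = 0 an unseen outcome gets probability zero; any positive
gamma reserves some probability mass for it.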


##////////////////////////////////////////////////////////////////////////
##  Version 0.7.1
##////////////////////////////////////////////////////////////////////////
Sat Jun  8 21:10:35 EDT 2002

- Fixed broken "view grammar," "view lexicon," and "reset" buttons on
  the chart parser demo.
- Added a "help" button for the chart parser demo
- Fixed font sizes for chart parser demo, recursive descent parser
  demo, and shift/reduce parser demo.
- Fixed name change bug in stepping shift/reduce parser

##////////////////////////////////////////////////////////////////////////
##  Version 0.7
##////////////////////////////////////////////////////////////////////////
Fri Jun  7 10:57:40 EDT 2002

- Complete re-implementation of the recursive descent (top-down)
  parser.  The new implementation should be easier for students to
  understand.
- New stepping versions of the recursive-descent parser and the shift/
  reduce parser.
- New interactive tool for experimenting with the shift/reduce parser.
- New interactive tool for experimenting with the recursive descent 
  parser.
- Major re-organization of nltk.draw (visualization & interactive tools)
  - New "CanvasWidget" class, based loosely on CLIG.  (See CLIG homepage
    at <http://www.ags.uni-sb.de/~konrad/clig.html> for info about CLIG)
  - Added basic canvas widgets: TextWidget, SymbolWidget, BoxWidget,
    OvalWidget, ParenWidget, BracketWidget, SequenceWidget, StackWidget,
    SpaceWidget, ScrollWatcherWidget
  - Added CanvasFrame, which manages a canvas & scrollbars.
  - New canvas widgets for displaying trees: TreeSegmentWidget and
    TreeWidget
  - Minor fixes to nltk.draw.chart
  - Minor fixes to nltk.draw.fsa
- Added "tree positions" to Tree and TreeToken: now you can use
  "mytree[3,2,4]" (or even "loc=(3,2,4); mytree[loc]") to mean
  "mytree.children()[3].children()[2].children()[4]"
- Added with_substitution member to Tree and TreeToken 
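
A minimal sketch of how tuple-valued "tree positions" can be supported
through __getitem__.  The Tree class here is a hypothetical stand-in
written for this example, not the NLTK class; only the children()
accessor and the indexing syntax are taken from the entry above.

```python
class Tree:
    """Hypothetical minimal tree: a node label plus a list of children."""

    def __init__(self, node, *children):
        self.node = node
        self._children = list(children)

    def children(self):
        return self._children

    def __getitem__(self, position):
        # Accept a single index (mytree[1]) or a tuple path (mytree[1, 1, 0]).
        if isinstance(position, int):
            position = (position,)
        subtree = self
        for index in position:
            subtree = subtree.children()[index]
        return subtree

mytree = Tree("S",
              Tree("NP", "the", "dog"),
              Tree("VP", "chased", Tree("NP", "a", "cat")))

loc = (1, 1, 0)
leaf = mytree[loc]   # same as mytree[1, 1, 0]
```

Python passes mytree[1, 1, 0] to __getitem__ as the tuple (1, 1, 0), so
the literal form and the stored-tuple form take the same code path.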

##////////////////////////////////////////////////////////////////////////
##  Version 0.6
##////////////////////////////////////////////////////////////////////////
Fri May 31 03:15:47 EDT 2002

Grammars:
  - Added nltk.cfg, which defines context-free grammars (and
    probabilistic CFGs).  This module replaces nltk.rule.  (The old "Rule"
    class is replaced by the Production class.)

Parsers:
  - Added probabilistic parser interface with two basic implementations,
    and a tutorial. 
  - Added a recursive descent top-down parser.
  - Major reorganization of the parser modules (nltk.parser,
    nltk.srparser, nltk.chartparser, nltk.chunkparser,
    nltk.rechunkparser, nltk.srparser_template).  They have been
    collected into a single package with 3 sub-modules:
      - nltk.parser defines ParserI and 2 simple parser implementations
        (ShiftReduceParser and RecursiveDescentParser)
      - nltk.parser.chart defines chart parsing data classes, and 3
        chart parser implementations (ChartParser, SteppingChartParser,
        and IncrementalChartParser)
      - nltk.parser.chunk defines ChunkParserI and the regexp-based
        chunk parser REChunkParser.  (n.b., as of version 0.6, chunk
        parsers are still not technically parsers, since they return
        chunk structures, not trees.  This will change in version 0.7).
      - nltk.parser.probabilistic defines ProbabilisticParserI, and two
        basic probabilistic parser implementations (ViterbiPCFGParser
        and BottomUpPCFGChartParser).
  - ParserI interface redefined, in accordance with multivalue technical
    report.
  - Major reorganization of the chart parser classes:
    - Merged DottedRule and Edge into a single class (Edge)
    - Removed several Edge methods (self_loop_end, fr, etc).
    - Got rid of the static/edge triggered rule distinction.  
    - Replaced Strategy class with a simple list of ChartRules.
    - Replaced old ChartParser with 2 new implementations:
      - (the new) ChartParser uses rules that implement ChartRuleI
        (similar to the old static rules).  It is simpler, but less
        efficient.  
      - IncrementalChartParser uses rules that implement
        IncrementalChartRuleI (similar to the old edge triggered
        rules).  It is slightly more complex, but more efficient.
    - Updated SteppingChartParser to derive from the new ChartParser
      class, and cleaned it up.
  - Improved trace output for parsers
  - Simplified the algorithm used by ShiftReduceParser (was SRParser).
  - Several improvements and bug fixes for the chart parser demo tool
    (nltk.draw.chart)

Finite State Automata:  
  - Added nltk.fsa, which defines finite state automata, and a function
    to convert regular expressions to FSAs.
  - Added FSA visualization tool (nltk.draw.fsa).

Classifiers:
  - Added a ConfusionMatrix class, which encodes and displays the
    confusion matrix for a classifier on a given task.
  - Added MultiBagOfWordsFDList, an integer-valued feature detector
    list constructed from a set of + words and a set of labels.
  - Changed GIS_FDList to work with non-binary features.

Other:
  - Added CharTokenizer, a character tokenizer that skips whitespace.
  - Added several probabilistic mixin classes (ProbabilisticToken, 
    ProbabilisticTreeToken, etc.)
  - Several minor bug fixes
  - Improved/added a number of docstrings
  - Changed nltk.draw.tree to produce more attractive trees.
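
To illustrate what a confusion matrix for a classifier records, the
sketch below counts (gold label, predicted label) pairs and prints them
as a table.  The functions confusion_matrix and format_matrix are
hypothetical names for this example and are not the interface of the
ConfusionMatrix class mentioned above.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Count how often each gold label was assigned each predicted label."""
    return Counter(zip(gold, predicted))

def format_matrix(matrix, labels):
    """Render the counts as plain text (rows: gold, columns: predicted)."""
    width = max(len(label) for label in labels) + 1
    header = " " * width + "".join(label.rjust(width) for label in labels)
    rows = [
        row.ljust(width)
        + "".join(str(matrix[row, col]).rjust(width) for col in labels)
        for row in labels
    ]
    return "\n".join([header] + rows)

gold      = ["N", "V", "N", "N", "V"]
predicted = ["N", "N", "N", "V", "V"]
matrix = confusion_matrix(gold, predicted)
print(format_matrix(matrix, ["N", "V"]))
```

Entries on the diagonal are correct classifications; off-diagonal
entries show which labels the classifier confuses with each other.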

##////////////////////////////////////////////////////////////////////////
##  Version 0.5
##////////////////////////////////////////////////////////////////////////
Thu Dec 13 13:07:54 EST 2001

- Implemented a text classification package.  This package includes
  data structures & interfaces for text classification, classifier
  training, feature-based text classification, feature selection; 
  a Naive Bayes classifier, and a maximum entropy classifier.

##////////////////////////////////////////////////////////////////////////
##  Version 0.4.1
##////////////////////////////////////////////////////////////////////////
Fri Oct 19 06:00:00 EDT 2001

- Re-implemented ChunkScore for efficiency reasons.  Added len()
  member to ChunkScore (gives the number of chunks scored).  
- Implemented pretty-printing __str__ method for CFFreqDist
- Added bins optional arg for CFFreqDist constructor, which is used 
  to generate B and Nr[0] correctly.
- Fixed HeldOutProbDist
- Implemented CrossValidationProbDist
- Added probability test code
- RETokenizers can now use negative regular expressions (specified via
  an optional arg to the RETokenizer constructor)
- RETokenizer.tokenize source arg changed to a keyword arg (this may
  change again -- we will define some standard for passing source &
  unit to tokenizers)


##////////////////////////////////////////////////////////////////////////
##  Version 0.4
##////////////////////////////////////////////////////////////////////////
Sat Oct 14 06:00:00 EDT 2001

- Changed terminology in the chart parser
- Added more meta-information in setup.py and __init__.py
- Added CHANGELOG.TXT and INSTALL.TXT files to the source distribution.
- Added a "getrule" keyword arg to SteppingChartParser.step() that
  causes it to return an (edge, rule) pair (instead of just an edge).
  This is used by nltk.draw.chart.ChartDemo
- Removed nltk.draw_tree & nltk.draw_treetoken (they are obsolete)
- Minor reference documentation fixes
- Moved ChunkedTaggedTokenizer from tagger to chunkparser.py;
  updated the implementation of ChunkedTaggedTokenizer to eliminate
  a bug (index was incremented for [ and ])
- Moved unchunk to chunkparser.py
- Moved chunk scoring code to chunkparser.py.  Converted scoring
  code to the ChunkScore class.  
- Major overhaul of the rechunkparser module.
- Fixed a divide-by-zero bug in Set.f_measure
- Fixed bug in WSTokenizer.xtokenize (it was ignoring source/unit)
- Added a LineTokenizer, which tokenizes a file into sentences, based
  on newline characters.
- Fixed minor docstring bug in WSTokenizer
- Changed Token to print verbose string rep from __repr__.  I've been
  going back and forth on how we should use different tostring() type
  methods.  This will all get standardized/regularized at some point.
  Probably by version 0.5.
- Renamed TreeToken.location() to TreeToken.loc()
- Improvements to nltk.draw.chart, including: zooming; improved edge
  highlighting; root window for ChartView constructor is now optional; 
  control over whether the tree-view & source-view frames should be
  shown; text field that displays the rule that generated the last
  edge; different default test grammar/string.  Several bug fixes.
- Improvements to nltk.draw.plot_graph and nltk.draw.tree_edit
- Changed all imports of nltk modules to use full names (e.g.
  "nltk.token" instead of just "token")


##////////////////////////////////////////////////////////////////////////
##  Version 0.3
##////////////////////////////////////////////////////////////////////////

- Added a copy of the IBM CPL in LICENSE.TXT
- Major reorganization to the chart parser: 
    - Added a new Strategy class, which encodes a strategy
    - formalized the notions of chart rule & strategy in the
      comments/documentation.
    - Divided rules into "static" rules and "edge-triggered" (or
      "dynamic") rules.
    - Changed rules from methods to functions, to facilitate
      user-definition of new rules.
    - Defined standard chart-parsing strategies
    - Wrote a new implementation of the stepping chart parser.
    - Replaced Edge.children with Edge.tree
    - Added a length operator definition for Chart
    - Cleaned up the edge class
- Renamed ParserI.parseTypes to ParserI.parse_types (to be
  consistent with our naming conventions)
- Changed printed representations of frequency distributions to
  make it more likely that students won't confuse them with
  dictionaries.
- Renamed Rule.dotted to Rule.drule (for consistency with
  ChartParser)
- Improved printed repr functions for Rule
- Minor docstring/comment fixes
- Added a template for a shift-reduce parser
- Renamed TreeToken.location() to TreeToken.loc()
- Added a children() accessor to both Tree and TreeToken
- Updated nltk.draw.chart in response to changes in
  nltk.chartparser
- Improved nltk.draw.chart.ChartDemo: implemented tree view;
  added a way to view rules.
- Initial import of draw.tree_edit, a module for drawing and
  editing Trees and TreeTokens.

##////////////////////////////////////////////////////////////////////////
##  Version 0.2
##////////////////////////////////////////////////////////////////////////

- Updated reference documentation strings
- Added cond_samples to frequency distributions
- Fixed Nr method of frequency distributions
- Fixed bug in SimpleFreqDist.cond_max
- Changed implementation of CFFreqDist to use a dictionary of
  SimpleFreqDists, instead of a dictionary of dictionaries.
- Added a preliminary implementation of a probability distribution
  that uses held-out data.
- Added untag() and accuracy() methods to nltk.tagger, for testing
  taggers.  
- Cleaned up BackoffTagger implementation
- Fixed minor bug in BackoffTagger: it had incorrect behavior if all
  of the taggers returned UNK for any token.
- Added UnigramTagger
- Fixed bugs in comparison methods for TaggedType.
- Changed parseTaggedType to convert tags to upper case.
- Added xtokenize() methods to WSTokenizer and RETokenizer (and
  associated class _XTokenTuple, which is the actual return type of
  the xtokenize methods).  This method is analogous to xrange() or
  xreadlines() -- it only constructs tokens as they are requested.
  This can result in much more memory-efficient use of tokenized
  texts. 
- Fixed a minor bug in Location.__str__ that was causing it not to print
  sources-of-sources
- Fixed bug in Token repr function, that used != instead of "is not"
  to compare to None.
- Fixed a minor bug in RETokenizer that caused unwanted behavior if the
  regexp didn't match anything in the text at all.
- Fixed code for drawing trees; added a new module, draw.tree, based
  on the old draw_tree and draw_token modules, for displaying trees.
- Fixed bug in draw.plot_graph that caused log-log plots of integer
  values to get quantized in an unreasonable way
- Added some more unit testing test cases
- In the chartparser, broke BU_init_step into BU_init_edge_step and
  BU_init_step (for the latter, you don't have to specify which edge
  to use). 
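
The lazy behavior described for xtokenize() can be sketched with a
generator, which builds each token only when it is requested.  This is
an illustration under assumptions, not the original implementation
(which returned an _XTokenTuple): the function below, its pattern
argument, and the (word, start, end) token shape are all hypothetical.

```python
import re

def xtokenize(text, pattern=r"\S+"):
    """Yield (word, start, end) triples one at a time, on demand."""
    for match in re.finditer(pattern, text):
        yield match.group(), match.start(), match.end()

tokens = xtokenize("the dog barks")
first = next(tokens)      # only the first token has been built so far
rest = list(tokens)       # the remaining tokens are built here
```

Because nothing is stored beyond the current match, memory use stays
roughly constant however long the input text is.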

##////////////////////////////////////////////////////////////////////////
##  Version 0.1
##////////////////////////////////////////////////////////////////////////

- Initial release
