Iterators over documents, and composition thereof.
This package contains the classes that allow you to compose {@linkplain it.unimi.dsi.mg4j.search.DocumentIterator iterators over documents}. Such iterators are returned, for instance, by {@link it.unimi.dsi.mg4j.index.IndexReader#documents(int)}.
MG4J provides minimal-interval semantics. That is, if the index is full-text, a document iterator will provide a list of documents and, for each document, a list of minimal intervals. These intervals denote ranges of positions in the document that satisfy the iterator. For instance, if you compose two document iterators using an {@link it.unimi.dsi.mg4j.search.AndDocumentIterator}, you will get as a result the intersection of the document lists of the underlying iterators. Moreover, for each document you will get the set of minimal intervals that each contain at least one interval from the first iterator and one interval from the second.
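The conjunction of two interval lists described above can be illustrated with a small, self-contained sketch. It does not use the MG4J API: the class <code>MinimalIntervalSketch</code>, its method names, and the representation of an interval as an <code>int[]{left, right}</code> pair are all assumptions made for this example. The approach is a deliberately naive one (span every pair, then discard any span that properly contains another), chosen for clarity rather than efficiency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of minimal-interval AND; not part of MG4J. */
final class MinimalIntervalSketch {

    /** Returns true if outer contains inner and the two intervals differ. */
    static boolean properlyContains(int[] outer, int[] inner) {
        return outer[0] <= inner[0] && inner[1] <= outer[1]
                && (outer[0] != inner[0] || outer[1] != inner[1]);
    }

    /**
     * Computes the minimal intervals containing at least one interval
     * from each argument. Intervals are {left, right} pairs, inclusive.
     */
    static List<int[]> and(int[][] first, int[][] second) {
        // Any interval containing a and b also contains their span,
        // so the minimal answers are found among the pairwise spans.
        List<int[]> candidates = new ArrayList<>();
        for (int[] a : first) {
            for (int[] b : second) {
                int[] span = { Math.min(a[0], b[0]), Math.max(a[1], b[1]) };
                boolean duplicate = false;
                for (int[] c : candidates) {
                    if (c[0] == span[0] && c[1] == span[1]) duplicate = true;
                }
                if (!duplicate) candidates.add(span);
            }
        }
        // Keep only the spans that do not properly contain another span.
        List<int[]> minimal = new ArrayList<>();
        for (int[] c : candidates) {
            boolean isMinimal = true;
            for (int[] d : candidates) {
                if (properlyContains(c, d)) { isMinimal = false; break; }
            }
            if (isMinimal) minimal.add(c);
        }
        // Minimal intervals have distinct left endpoints, so this order is total.
        minimal.sort(Comparator.comparingInt((int[] i) -> i[0]));
        return minimal;
    }

    public static void main(String[] args) {
        // One term occurs at positions 2 and 7, the other at 4 and 9.
        int[][] first = { { 2, 2 }, { 7, 7 } };
        int[][] second = { { 4, 4 }, { 9, 9 } };
        for (int[] interval : and(first, second)) {
            System.out.println(Arrays.toString(interval));
        }
    }
}
```

In this example the span [2, 9] is dropped because it contains the smaller span [2, 4], which leaves [2, 4], [4, 7] and [7, 9] as the minimal intervals.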
This information is very useful if you are going to assign a score to the document, as smaller intervals mean a more precise match. At the basic level (e.g., iterators returned by an index), the intervals returned for a document are intervals of length one containing the term that was used to generate the iterator. Intervals for compound iterators are built in a natural way, preserving minimality. More details can be found in Charles L. A. Clarke and Gordon V. Cormack, Shortest-Substring Retrieval and Ranking (ACM Transactions on Information Systems, vol. 18, no. 1, Jan 2000, pages 44–78). Scorers for documents may be found in the {@link it.unimi.dsi.mg4j.search.score} package.
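To make the remark about smaller intervals concrete, here is a minimal sketch of one way interval lengths could feed a score. The weighting used (each interval contributes the reciprocal of its length) is an assumption chosen purely for illustration; it is not claimed to be the formula used by the scorers in this library or in the cited paper, and the class <code>IntervalScoreSketch</code> is hypothetical.

```java
/** Hypothetical illustration of length-based scoring; not part of MG4J. */
final class IntervalScoreSketch {

    /**
     * Sums 1 / length over the given intervals, so that shorter
     * intervals contribute more. Intervals are {left, right} pairs,
     * inclusive, hence length = right - left + 1.
     */
    static double score(int[][] intervals) {
        double total = 0.0;
        for (int[] interval : intervals) {
            int length = interval[1] - interval[0] + 1;
            total += 1.0 / length;
        }
        return total;
    }

    public static void main(String[] args) {
        // A tight match (length 2) outweighs a loose one (length 10).
        System.out.println(score(new int[][] { { 5, 6 } }));
        System.out.println(score(new int[][] { { 5, 14 } }));
    }
}
```

Under this assumed weighting, an interval of length one, as returned by a basic iterator, contributes exactly 1, and longer intervals contribute progressively less.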
Note that MG4J provides minimal-interval semantics for a set of indices, not just for a single index. This extension goes significantly beyond single-index semantics. However, defining the exact meaning of a query over several indices is a nontrivial problem that will be fully dealt with in a forthcoming paper.