org.apache.lucene.wikipedia.analysis
public class WikipediaTokenizer extends Tokenizer
Modifier and Type | Field and Description |
---|---|
static String | BOLD |
static String | BOLD_ITALICS |
static String | CATEGORY |
static String | CITATION |
static String | EXTERNAL_LINK |
static String | EXTERNAL_LINK_URL |
static String | HEADING |
static String | INTERNAL_LINK |
static String | ITALICS |
static String | SUB_HEADING |
Constructor and Description |
---|
WikipediaTokenizer(Reader input) Creates a new instance of the WikipediaTokenizer. |
Modifier and Type | Method and Description |
---|---|
Token | next(Token result) Returns the next token in the stream, or null at EOS. |
void | reset() Resets this stream to the beginning. |
void | reset(Reader reader) Expert: Reset the tokenizer to a new reader. |
next
public static final String INTERNAL_LINK
public static final String EXTERNAL_LINK
public static final String EXTERNAL_LINK_URL
public static final String CITATION
public static final String CATEGORY
public static final String BOLD
public static final String ITALICS
public static final String BOLD_ITALICS
public static final String HEADING
public static final String SUB_HEADING
public WikipediaTokenizer(Reader input)
Creates a new instance of the WikipediaTokenizer. Attaches the input to a newly created JFlex scanner.
Parameters:
input - The Input Reader
public Token next(Token result) throws IOException
Description copied from class: TokenStream
Returns the next token in the stream, or null at EOS.
This implicitly defines a "contract" between consumers (callers of this method) and producers (implementations of this method that are the source for tokens):
- A producer must call Token.clear() before setting the fields in it and returning it.
Note that a TokenFilter is considered a consumer.
Overrides:
next in class TokenStream
Parameters:
result - a Token that may or may not be used to return
Throws:
IOException
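The producer side of the contract above (clear the token before setting its fields) and the consumer loop that ends on null can be sketched without the Lucene classes. In the sketch below, SimpleToken, WhitespaceProducer and collect are hypothetical stand-ins invented for illustration; they are not part of the Lucene API and only mimic the shape of next(Token result). That a caller may pass the same instance on every call is an assumption of the sketch, not something stated on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the Lucene classes: just enough to show the contract.
class TokenReuseSketch {

    /** Stand-in for a reusable token: a text buffer that clear() empties. */
    static final class SimpleToken {
        final StringBuilder text = new StringBuilder();

        void clear() {
            text.setLength(0);
        }
    }

    /** Stand-in producer: emits whitespace-separated words read from a Reader. */
    static final class WhitespaceProducer {
        private final Reader input;

        WhitespaceProducer(Reader input) {
            this.input = input;
        }

        /** Returns the next token in the stream, or null at end of stream. */
        SimpleToken next(SimpleToken result) throws IOException {
            result.clear(); // producer side: clear before setting any fields
            int c;
            // Skip leading whitespace; c ends up as -1 or the first word character.
            while ((c = input.read()) != -1 && Character.isWhitespace(c)) {
            }
            if (c == -1) {
                return null;
            }
            do {
                result.text.append((char) c);
            } while ((c = input.read()) != -1 && !Character.isWhitespace(c));
            return result;
        }
    }

    /** Consumer side: reuse one token (assumed allowed), copy its text before the next call. */
    static List<String> collect(String text) {
        WhitespaceProducer producer = new WhitespaceProducer(new StringReader(text));
        SimpleToken reusable = new SimpleToken();
        List<String> terms = new ArrayList<String>();
        try {
            for (SimpleToken t = producer.next(reusable); t != null; t = producer.next(reusable)) {
                // Copy the text out: the same instance is overwritten by the next call.
                terms.add(t.text.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collect("[[Main Page]] is ''italic'' text"));
    }
}
```

The loop stops on null rather than on an exception, matching the "null at EOS" return convention. Copying the text out of the token before calling next again is what keeps the single reused instance safe in this sketch.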
public void reset() throws IOException
Description copied from class: TokenStream
Resets this stream to the beginning.
Overrides:
reset in class TokenStream
Throws:
IOException
public void reset(Reader reader) throws IOException
Description copied from class: Tokenizer
Expert: Reset the tokenizer to a new reader.
Overrides:
reset in class Tokenizer
Throws:
IOException
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.