Package org.apache.uima.tools.components
Class XmlDetagger
java.lang.Object
org.apache.uima.analysis_component.AnalysisComponent_ImplBase
org.apache.uima.analysis_component.Annotator_ImplBase
org.apache.uima.analysis_component.CasAnnotator_ImplBase
org.apache.uima.tools.components.XmlDetagger
- All Implemented Interfaces:
AnalysisComponent
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named
"xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a
remote file. The XML is parsed using the JVM's default parser, and the plain-text content is
written to a new sofa called "plainTextDocument".
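
The detagging step described above can be sketched with the JDK's SAX API alone. This is a minimal illustration, not the XmlDetagger source: the class `DetagSketch` and its `detag` method are hypothetical names, and the sketch works on a plain string rather than on CAS sofas.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (an assumption, not part of UIMA): strips tags with a SAX handler.
class DetagSketch {
    static String detag(String xml) throws Exception {
        // newInstance() returns the JVM's default SAX parser factory.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        final StringBuilder text = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Collect character data only; element tags are ignored.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints "Hi there"
        System.out.println(detag("<doc><title>Hi</title> <body>there</body></doc>"));
    }
}
```

In a real pipeline the input would come from the "xmlDocument" sofa and the result would be written to the "plainTextDocument" sofa; that wiring is omitted here.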
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type / Field / Description
private String mXmlTagContainingText
static final String PARAM_TEXT_TAG
Name of optional configuration parameter that contains the name of an XML tag that appears in the input file.
private SAXParserFactory parserFactory
private Type sourceDocInfoType
-
Constructor Summary
Constructors
XmlDetagger()
-
Method Summary
Modifier and Type / Method / Description
static AnalysisEngineDescription getDescription()
Parses and returns the descriptor for this Analysis Engine.
static URL getDescriptorURL()
void initialize(UimaContext aContext)
Performs any startup tasks required by this component.
void process(CAS aCAS)
Inputs a CAS to the AnalysisComponent.
void typeSystemInit(TypeSystem aTypeSystem)
Informs this annotator that the CAS TypeSystem has changed.
Methods inherited from class org.apache.uima.analysis_component.CasAnnotator_ImplBase
getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification
-
Field Details
-
PARAM_TEXT_TAG
Name of optional configuration parameter that contains the name of an XML tag that appears in the input file. Only text that falls within this XML tag will be considered part of the "document" that this annotator adds to the CAS. If not specified, the entire file will be considered the document.
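
The effect of this parameter can be illustrated with a small SAX handler that only collects text while inside the configured tag. This is a sketch under stated assumptions: `TextTagSketch` and `extract` are hypothetical names, the tag name is passed as a method argument instead of being read from a UimaContext, and the actual XmlDetagger logic may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (an assumption, not part of UIMA).
class TextTagSketch {
    // textTag == null mimics the "parameter not specified" case.
    static String extract(String xml, final String textTag) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // With no tag configured, the whole document counts as text.
            private int depth = (textTag == null) ? 1 : 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(textTag)) depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(textTag)) depth--;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Keep character data only while inside the text tag.
                if (depth > 0) text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }
}
```

For the input `<doc><head>H</head><TEXT>body <b>bold</b></TEXT></doc>`, calling `extract` with `"TEXT"` keeps only the text inside the `TEXT` element, while passing `null` keeps the text of the whole file.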
-
parserFactory
-
sourceDocInfoType
-
mXmlTagContainingText
-
-
Constructor Details
-
XmlDetagger
public XmlDetagger()
-
-
Method Details
-
initialize
Description copied from interface: AnalysisComponent
Performs any startup tasks required by this component. The framework calls this method only once, just after the AnalysisComponent has been instantiated.
The framework supplies this AnalysisComponent with a reference to the UimaContext that it will use, for example to access configuration settings or resources. This AnalysisComponent should store a reference to its UimaContext for later use.
- Specified by:
initialize in interface AnalysisComponent
- Overrides:
initialize in class AnalysisComponent_ImplBase
- Parameters:
aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
- Throws:
ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
-
typeSystemInit
Description copied from class: CasAnnotator_ImplBase
Informs this annotator that the CAS TypeSystem has changed. The Analysis Engine calls this method immediately following the call to AnalysisComponent_ImplBase.initialize(org.apache.uima.UimaContext), and will call it again whenever the CAS TypeSystem changes.
In this method, the Annotator should use the TypeSystem to resolve the names of Types and Features to the actual Type and Feature objects, which can then be used during processing.
- Overrides:
typeSystemInit in class CasAnnotator_ImplBase
- Parameters:
aTypeSystem - the new type system to use as input to your initialization
- Throws:
AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator
-
process
Description copied from class: CasAnnotator_ImplBase
Inputs a CAS to the AnalysisComponent. This method should be overridden by subclasses to perform analysis of the CAS.
- Specified by:
process in class CasAnnotator_ImplBase
- Parameters:
aCAS - A CAS that this AnalysisComponent should process.
- Throws:
AnalysisEngineProcessException - if a problem occurs during processing
-
getDescription
Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
- Returns:
an object containing all of the information parsed from the descriptor.
- Throws:
InvalidXMLException - if the descriptor is invalid or missing
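
Locating a file that ships inside a jar usually goes through the class loader. The sketch below shows only that general JDK mechanism: `DescriptorLookupSketch` and `locate` are hypothetical names, the resource names are placeholders, and the real descriptor name and parsing step are not shown.

```java
import java.net.URL;

// Hypothetical helper (an assumption, not part of UIMA).
class DescriptorLookupSketch {
    // Returns the URL of a resource stored alongside the given class, or null if it is absent.
    static URL locate(Class<?> anchor, String resourceName) {
        // A relative name is resolved against the package of the anchor class.
        return anchor.getResource(resourceName);
    }

    public static void main(String[] args) {
        // A class file is itself a resource, so this lookup is expected to succeed.
        System.out.println(locate(String.class, "String.class"));
        // A missing resource yields null rather than an exception.
        System.out.println(locate(String.class, "NoSuchDescriptor.xml"));
    }
}
```

A caller would typically treat a null result as a missing descriptor and report it, which matches the InvalidXMLException condition documented above.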
-
getDescriptorURL
-