Use either direct subclasses of the appropriate node, set as prototypes on the PrototypicalNodeFactory, or a dynamic proxy implementing the required node type interface.
The former avoids the wrapping and delegation entirely, while the latter handles the wrapping and delegation without this class.
Here is an example of how to use dynamic proxies to accomplish the same effect as using decorators to wrap Text nodes:
```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.Text;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.util.ParserException;

public class TextProxy implements InvocationHandler
{
    protected Object mObject;

    public static Object newInstance (Object object)
    {
        Class cls;

        cls = object.getClass ();
        return (Proxy.newProxyInstance (
            cls.getClassLoader (),
            cls.getInterfaces (),
            new TextProxy (object)));
    }

    private TextProxy (Object object)
    {
        mObject = object;
    }

    public Object invoke (Object proxy, Method m, Object[] args) throws Throwable
    {
        Object result;
        String name;

        try
        {
            result = m.invoke (mObject, args);
            name = m.getName ();
            if (name.equals ("clone"))
                result = newInstance (result); // wrap the cloned object
            else if (name.equals ("doSemanticAction")) // or other methods
                System.out.println (mObject); // do the needful on the TextNode
        }
        catch (InvocationTargetException e)
        {
            throw e.getTargetException ();
        }
        catch (Exception e)
        {
            throw new RuntimeException ("unexpected invocation exception: " + e.getMessage());
        }
        finally
        {
        }

        return (result);
    }

    public static void main (String[] args) throws ParserException
    {
        // create the wrapped text node and set it as the prototype
        Text text = (Text) TextProxy.newInstance (new TextNode (null, 0, 0));
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
        factory.setTextPrototype (text);
        // perform the parse
        Parser parser = new Parser (args[0]);
        parser.setNodeFactory (factory);
        parser.parse (null);
    }
}
```
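The other alternative, a direct subclass registered as the prototype, can be sketched without the library. This is only an illustration under stated assumptions: `PlainText`, `UpperCaseText` and `TextFactory` are hypothetical stand-ins, not HTML Parser classes, and the factory is assumed to create each node by cloning its prototype. The point it shows is that cloning preserves the subclass, so no wrapper is needed.

```java
import java.util.Locale;

public class PrototypeSketch {
    /** Hypothetical stand-in for a text node; not the library's TextNode. */
    static class PlainText implements Cloneable {
        protected String text = "";

        public void setText(String text) { this.text = text; }
        public String getText() { return text; }

        /** Hook assumed to be called once the node has been parsed. */
        public void doSemanticAction() { }

        @Override
        public PlainText clone() {
            try {
                // Object.clone() keeps the runtime class of the prototype
                return (PlainText) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: class is Cloneable
            }
        }
    }

    /** Direct subclass: the custom behaviour lives in the node itself. */
    static class UpperCaseText extends PlainText {
        @Override
        public void doSemanticAction() {
            text = text.toUpperCase(Locale.ROOT);
        }
    }

    /** Hypothetical stand-in for a prototype-based node factory. */
    static class TextFactory {
        private PlainText prototype = new PlainText();

        public void setTextPrototype(PlainText prototype) {
            this.prototype = prototype;
        }

        public PlainText createText(String content) {
            PlainText node = prototype.clone(); // new node of the prototype's class
            node.setText(content);
            return node;
        }
    }

    /** Registers the subclass as prototype and runs its semantic action. */
    public static String demo(String content) {
        TextFactory factory = new TextFactory();
        factory.setTextPrototype(new UpperCaseText());
        PlainText node = factory.createText(content);
        node.doSemanticAction();
        return node.getText();
    }

    public static void main(String[] args) {
        System.out.println(demo("hello"));
    }
}
```

With the real library, the subclass would presumably extend TextNode and be registered through setTextPrototype, as in the proxy example.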
public abstract class AbstractNodeDecorator extends java.lang.Object implements Text
Modifier | Constructor and Description |
---|---|
protected | AbstractNodeDecorator(Text delegate) Deprecated. |
Modifier and Type | Method and Description |
---|---|
void | accept(NodeVisitor visitor) Deprecated. Apply the visitor to this node. |
java.lang.Object | clone() Deprecated. Clone this object. |
void | collectInto(NodeList list, NodeFilter filter) Deprecated. Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria. |
void | doSemanticAction() Deprecated. Perform the meaning of this tag. |
boolean | equals(java.lang.Object arg0) Deprecated. |
NodeList | getChildren() Deprecated. Get the children of this node. |
int | getEndPosition() Deprecated. Gets the ending position of the node. |
Page | getPage() Deprecated. Get the page this node came from. |
Node | getParent() Deprecated. Get the parent of this node. |
int | getStartPosition() Deprecated. Gets the starting position of the node. |
java.lang.String | getText() Deprecated. Accesses the textual contents of the node. |
void | setChildren(NodeList children) Deprecated. Set the children of this node. |
void | setEndPosition(int position) Deprecated. Sets the ending position of the node. |
void | setPage(Page page) Deprecated. Set the page this node came from. |
void | setParent(Node node) Deprecated. Sets the parent of this node. |
void | setStartPosition(int position) Deprecated. Sets the starting position of the node. |
void | setText(java.lang.String text) Deprecated. Sets the contents of the node. |
java.lang.String | toHtml() Deprecated. Return the HTML for this node. |
java.lang.String | toPlainTextString() Deprecated. A string representation of the node. |
java.lang.String | toString() Deprecated. Return the string representation of the node. |
protected Text delegate
protected AbstractNodeDecorator(Text delegate)
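The wrapping and delegation that this class performs around its `delegate` field can be illustrated with a cut-down sketch. Everything here is an assumption made for illustration: `SimpleText`, `BasicText`, `AbstractDecorator` and `TrimmingDecorator` are hypothetical names, and the interface has three methods rather than the full Text interface.

```java
public class DecoratorSketch {
    /** Hypothetical, cut-down stand-in for a text node interface. */
    interface SimpleText {
        String getText();
        void setText(String text);
        String toHtml();
    }

    /** Plain implementation that the decorators wrap. */
    static class BasicText implements SimpleText {
        private String text;

        BasicText(String text) { this.text = text; }

        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public String toHtml() { return text; }
    }

    /** Forwards every call to the wrapped delegate; subclasses override selectively. */
    abstract static class AbstractDecorator implements SimpleText {
        protected final SimpleText delegate;

        protected AbstractDecorator(SimpleText delegate) {
            this.delegate = delegate;
        }

        public String getText() { return delegate.getText(); }
        public void setText(String text) { delegate.setText(text); }
        public String toHtml() { return delegate.toHtml(); }
    }

    /** Concrete decorator that changes one method and inherits the rest. */
    static class TrimmingDecorator extends AbstractDecorator {
        TrimmingDecorator(SimpleText delegate) { super(delegate); }

        @Override
        public String toHtml() { return delegate.toHtml().trim(); }
    }

    /** Wraps a node and returns the decorated HTML. */
    public static String demo(String raw) {
        SimpleText node = new TrimmingDecorator(new BasicText(raw));
        return node.toHtml();
    }

    public static void main(String[] args) {
        System.out.println("[" + demo("  some text  ") + "]");
    }
}
```

Every call pays for one extra level of indirection, which is the overhead the subclass approach avoids.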
public java.lang.Object clone() throws java.lang.CloneNotSupportedException
public void accept(NodeVisitor visitor)
Description copied from interface: Node
Apply the visitor to this node.

public void collectInto(NodeList list, NodeFilter filter)
Description copied from interface: Node
Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very easily, without having to collect embedded tags separately. For example, when we try to get all the links on a page, it is not possible to get them at the top level, as many tags (such as form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a CompositeTag and going through its children. This method provides a convenient way to do that.
Using collectInto(), programs get a lot shorter. The code to extract all links from a page would look like:

```java
NodeList list = new NodeList ();
NodeFilter filter = new TagNameFilter ("A");
for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
    e.nextNode ().collectInto (list, filter);
```

Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.
Another way to accomplish the same objective is:

```java
NodeList list = new NodeList ();
NodeFilter filter = new TagClassFilter (LinkTag.class);
for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
    e.nextNode ().collectInto (list, filter);
```

This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
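The reason one call reaches links at any depth is that collection recurses into children. The following is a rough sketch of that idea, not the library's implementation: `SimpleNode` is a hypothetical stand-in for a node, and a `java.util.function.Predicate` plays the role of the NodeFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CollectIntoSketch {
    /** Hypothetical stand-in for a node with a (possibly empty) list of children. */
    static class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, SimpleNode... kids) {
            this.name = name;
            for (SimpleNode kid : kids)
                children.add(kid);
        }

        /** Adds this node if it passes the filter, then recurses into each child. */
        void collectInto(List<SimpleNode> list, Predicate<SimpleNode> filter) {
            if (filter.test(this))
                list.add(this);
            for (SimpleNode child : children)
                child.collectInto(list, filter);
        }
    }

    /** Builds a small tree with one top-level link and one nested in a form. */
    public static int countLinks() {
        SimpleNode page = new SimpleNode("BODY",
            new SimpleNode("A"),
            new SimpleNode("FORM",
                new SimpleNode("INPUT"),
                new SimpleNode("A")));
        List<SimpleNode> links = new ArrayList<>();
        page.collectInto(links, node -> node.name.equals("A"));
        return links.size();
    }

    public static void main(String[] args) {
        System.out.println(countLinks());
    }
}
```

Both links are collected, including the one nested inside the form, without the caller inspecting any composite node itself.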
Specified by: collectInto in interface Node
Parameters:
list - The list to collect nodes into.
filter - The criteria to use when deciding if a node should be added to the list.

public int getStartPosition()
Specified by: getStartPosition in interface Node

public void setStartPosition(int position)
Specified by: setStartPosition in interface Node
Parameters:
position - The new start position.

public int getEndPosition()
Specified by: getEndPosition in interface Node

public void setEndPosition(int position)
Specified by: setEndPosition in interface Node
Parameters:
position - The new end position.

public Page getPage()

public void setPage(Page page)

public boolean equals(java.lang.Object arg0)
Overrides: equals in class java.lang.Object
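
public Node getParent()
Description copied from interface: Node
Get the parent of this node. This will always return null when parsing with the Lexer. Currently, the object returned from this method can be safely cast to a CompositeTag, but this behaviour should not be expected in the future.

public java.lang.String getText()
Description copied from interface: Text
Accesses the textual contents of the node.

public void setParent(Node node)
Description copied from interface: Node
Sets the parent of this node.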
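
public NodeList getChildren()
Specified by: getChildren in interface Node
Returns: The list of children contained by the node, if it has been set, null otherwise.

public void setChildren(NodeList children)
Specified by: setChildren in interface Node
Parameters:
children - The new list of children this node contains.

public void setText(java.lang.String text)
Description copied from interface: Text
Sets the contents of the node.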
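
public java.lang.String toHtml()
Description copied from interface: Node
Return the HTML for this node.

public java.lang.String toPlainTextString()
Description copied from interface: Node
A string representation of the node. For example, to print the plain text of every node on a page:

```java
for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
    // or do whatever processing you wish with the plain text string
    System.out.println (e.nextNode ().toPlainTextString ());
```

Specified by: toPlainTextString in interface Node

public java.lang.String toString()
Description copied from interface: Node
Return the string representation of the node, for use as in System.out.println (node); or within a debugging environment.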
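
public void doSemanticAction() throws ParserException
Description copied from interface: Node
Perform the meaning of this tag. The semantic action is performed when the node has been parsed; for composite nodes, the children will already have been parsed and will be available via Node.getChildren().
Specified by: doSemanticAction in interface Node
Throws:
ParserException - If a problem is encountered performing the semantic action.

HTML Parser is an open source library released under LGPL.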