Class ParseHTMLAction

java.lang.Object
  extended by org.inria.ns.reflex.processor.core.AbstractAction
      extended by org.inria.ns.reflex.processor.core.AbstractSetAction
          extended by org.inria.ns.reflex.processor.xcl.ParseHTMLAction
All Implemented Interfaces:
Computable, Executable, Presentable, NamespaceContextFactory

public class ParseHTMLAction
extends AbstractSetAction

The <xcl:parse-html> element parses the HTML data source specified in the source attribute; after parsing, the HTML is exposed as a well-formed XML document (DOM or SAX).
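Once the HTML has been fixed up, consumers see an ordinary well-formed XML document, either as a DOM tree or as a stream of SAX events. The sketch below illustrates that difference using only the JDK's own XML parsers on a string that stands in for the fixed-up output; it does not use CyberNeko, and the class DomVsSaxDemo is hypothetical, not part of RefleX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxDemo {

    // DOM style: the whole document is built as an in-memory tree,
    // which the caller can then navigate freely.
    static String rootNameViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // SAX style: the document is streamed as callbacks and never stored;
    // this handler only records start-element names in document order.
    static List<String> elementNamesViaSax(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Stand-in for the output of the HTML fix-up step.
        String xml = "<html><body><p>Hello</p></body></html>";
        System.out.println(rootNameViaDom(xml));      // html
        System.out.println(elementNamesViaSax(xml));  // [html, body, p]
    }
}
```

DOM suits random access to the whole page; SAX suits large inputs where only a single pass is needed.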

The HTML parser scans the HTML source and "fixes up" many common mistakes found in HTML documents, for example by adding missing parent elements, closing elements with optional end tags, and handling mismatched inline element tags.
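A drastically simplified, stack-based sketch of three such fix-up rules: unmatched end tags are dropped, an end tag closes every element opened inside it (repairing mismatched inline tags), and a new <p> or <li> implicitly closes an open one of the same name. Everything left open is closed at the end. It assumes the input is already split into tokens with no attributes; the class TagBalancer is hypothetical, and CyberNeko's real tag balancing is far more complete.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class TagBalancer {

    // Elements whose end tag is optional: a start tag of the same name
    // implicitly closes the open one (only when it is the innermost element).
    private static final List<String> IMPLICITLY_CLOSED = List.of("p", "li");

    static String balance(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();   // stack of open element names
        StringBuilder out = new StringBuilder();
        for (String tok : tokens) {
            if (tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1);
                if (!open.contains(name)) {
                    continue;                      // stray end tag: drop it
                }
                // Close every element opened after `name`, then `name` itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            } else if (tok.startsWith("<")) {
                String name = tok.substring(1, tok.length() - 1);
                if (IMPLICITLY_CLOSED.contains(name) && name.equals(open.peek())) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.push(name);
                out.append('<').append(name).append('>');
            } else {
                out.append(tok);                   // character data
            }
        }
        while (!open.isEmpty()) {                  // close whatever is still open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("<p>", "one", "<p>", "<b>", "two",
                "<i>", "three", "</b>", "</i>");
        // <p>one</p><p><b>two<i>three</i></b></p>
        System.out.println(balance(tokens));
    }
}
```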


Parsing can be tuned through the features and properties of the underlying implementation, the CyberNeko HTML Parser.
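CyberNeko's parsers are configured in the Xerces/SAX style: boolean features and object-valued properties, each identified by a URI. This page does not say how <xcl:parse-html> passes them through, so the sketch below only shows the general mechanism. It uses the JDK's own SAX XMLReader and the standard SAX namespaces feature rather than any CyberNeko-specific URI; the class FeatureTuning is hypothetical.

```java
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class FeatureTuning {

    // Create a SAX reader and apply URI-named boolean features to it.
    // Unknown or unsupported URIs are rejected by the parser itself.
    static XMLReader configuredReader(Map<String, Boolean> features) {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser()
                    .getXMLReader();
            for (Map.Entry<String, Boolean> f : features.entrySet()) {
                reader.setFeature(f.getKey(), f.getValue());
            }
            return reader;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot configure parser", e);
        }
    }

    // Read a feature back without forcing callers to handle checked exceptions.
    static boolean isEnabled(XMLReader reader, String featureUri) {
        try {
            return reader.getFeature(featureUri);
        } catch (Exception e) {
            throw new IllegalArgumentException("unknown feature " + featureUri, e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://xml.org/sax/features/namespaces";
        XMLReader reader = configuredReader(Map.of(ns, true));
        System.out.println(isEnabled(reader, ns)); // true
    }
}
```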

HTML sources

The HTML source processed may be:

Additionally, if the text-source attribute is used instead of source, the string value of that attribute is itself parsed as HTML.
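The difference between source and text-source (the parseText flag of the constructor) can be sketched as the choice of SAX InputSource handed to a parser. This assumes, for illustration only, that a source value is a URI the parser can fetch itself; the full list of accepted source kinds is not given on this page, and the class HtmlSourceSketch is hypothetical.

```java
import java.io.StringReader;
import org.xml.sax.InputSource;

class HtmlSourceSketch {

    // parseText == true:  text-source, the value IS the markup.
    // parseText == false: source, the value LOCATES the markup (assumed URI).
    static InputSource toInputSource(String value, boolean parseText) {
        if (parseText) {
            return new InputSource(new StringReader(value));
        }
        // Only the system ID is set; the parser opens and decodes the resource.
        return new InputSource(value);
    }

    public static void main(String[] args) {
        InputSource text = toInputSource("<p>inline <b>markup</p>", true);
        InputSource uri = toInputSource("http://example.com/page.html", false);
        System.out.println(text.getCharacterStream() != null); // true
        System.out.println(uri.getSystemId());                 // http://example.com/page.html
    }
}
```

Leaving the URI case to the parser lets it detect the document encoding itself instead of the caller guessing one.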

Author:
Philippe Poulard

Nested Class Summary
Nested classes/interfaces inherited from class org.inria.ns.reflex.processor.core.AbstractAction
AbstractAction.ParameterAdapter, AbstractAction.UselessAction
Field Summary
Fields inherited from class org.inria.ns.reflex.processor.core.AbstractSetAction
Fields inherited from class org.inria.ns.reflex.processor.core.AbstractAction
actions, parent, processorInstance
Constructor Summary
ParseHTMLAction(Expression source, boolean parseText, EvaluableExpression style, Expression elemCase, Expression attrCase, Element element, AbstractAction parent)
          Create a new instance of ParseHTMLAction.
Method Summary
 Object getComputedValue(DataSet dataSet)
          Return the computed value of the property, by parsing an HTML source.
 ExternalIdentifierResolver getExternalIdentifierResolver(ErrorHandler eh)
static AbstractAction unmarshal(AbstractAction parent, Element element)
          XML unmarshaller for ParseHTMLAction.
Methods inherited from class org.inria.ns.reflex.processor.core.AbstractSetAction
addProperty, getComputedName, getName, getValue, runAction, scope, unmarshalScope
Methods inherited from class org.inria.ns.reflex.processor.core.AbstractAction
addAction, addFallbackAction, createContext, getCanonicalPath, getFallbackAction, getLocalFallbackAction, getLogger, getNamespaceContext, getNode, getParent, recover, recover, removeFallbackAction, reorganize, run, runActions, toPrettyString, toPrettyString, toString
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Detail


public ParseHTMLAction(Expression source,
                       boolean parseText,
                       EvaluableExpression style,
                       Expression elemCase,
                       Expression attrCase,
                       Element element,
                       AbstractAction parent)
                throws XMLException
Create a new instance of ParseHTMLAction.

Parameters:
source - The source of the data to parse, as an Expression; the parsed result becomes the value of the property.
parseText - true if the source must be parsed as a text source, false otherwise.
style - The parsing style: DOM or SAX.
elemCase - Indicates how to handle the letter case of HTML element names.
attrCase - Indicates how to handle the letter case of HTML attribute names.
element - The element from which the action has been unmarshalled. Used for namespace prefix resolution when evaluating XPath expressions.
parent - The parent action.
Throws:
UnmarshalException - When the "name" attribute is not a valid value template.
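The elemCase and attrCase parameters control letter case independently for element and attribute names. Their accepted values are not documented on this page, so the sketch below assumes three hypothetical options (UPPER, LOWER, UNCHANGED); the class NameCaseSketch and its enum are illustrative only.

```java
import java.util.Locale;

class NameCaseSketch {

    // Hypothetical options; the real accepted values of elemCase/attrCase
    // are not listed in this documentation.
    enum NameCase { UPPER, LOWER, UNCHANGED }

    static String normalize(String name, NameCase mode) {
        switch (mode) {
            case UPPER: return name.toUpperCase(Locale.ROOT);
            case LOWER: return name.toLowerCase(Locale.ROOT);
            default:    return name;
        }
    }

    public static void main(String[] args) {
        // Element and attribute names are normalized independently.
        String elem = normalize("Div", NameCase.UPPER);      // DIV
        String attr = normalize("onClick", NameCase.LOWER);  // onclick
        System.out.println("<" + elem + " " + attr + "=\"...\">");
    }
}
```

Locale.ROOT keeps the conversion locale-independent, so a name such as "title" still upper-cases to "TITLE" under a Turkish default locale.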
Method Detail


public static AbstractAction unmarshal(AbstractAction parent,
                                       Element element)
                                throws XMLException
XML unmarshaller for ParseHTMLAction.

Parameters:
parent - The parent action.
element - The XML element to unmarshal.
Returns:
The ParseHTMLAction created.
Throws:
UnmarshalException - When the element or its content is not as expected.


public Object getComputedValue(DataSet dataSet)
                        throws ExecutionException,
Return the computed value of the property, by parsing an HTML source.

Specified by:
getComputedValue in interface Computable
Specified by:
getComputedValue in class AbstractSetAction
Parameters:
dataSet - The set of data used for the computation.
Returns:
The computed value of the property.
Throws:
ExecutionException - If the computation cannot be performed.


public ExternalIdentifierResolver getExternalIdentifierResolver(ErrorHandler eh)