org.inria.ns.reflex.xml.filter.helpers
Class Tokenizer

java.lang.Object
  extended by org.inria.ns.reflex.xml.filter.helpers.Tokenizer
All Implemented Interfaces:
Configurable, Filter, StandaloneFilter

public class Tokenizer
extends Object
implements Filter, StandaloneFilter, Configurable

A Tokenizer is a filter that fires SAX events from a raw text input.

It fires a startDocument() event then characters() events for each sequence of the input that matches a regular expression, then an endDocument() event.

The separators defined by the pattern do not fire SAX events. The pattern must not match the empty string.

It can be configured with the following parameters:

Name      Value                                                           Default
pattern   The pattern used for the tokenization                           \s*
flags     The options for the interpretation of the regular expression
buffer    The size of the buffer (char size)                              2048

If the buffer is too small to match the pattern, all the characters present in the buffer will be fired at once.
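
The event sequence described above can be illustrated with a minimal sketch that uses only the JDK (java.util.regex and org.xml.sax). It is not the Reflex implementation: the class TokenizerSketch and its methods are hypothetical, the pattern is assumed to describe the separators between tokens, the buffer handling is left out, and "\\s+" is used because the pattern must not match the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of org.inria.ns.reflex.
public class TokenizerSketch {

    /** Fires startDocument(), one characters() per token, then endDocument(). */
    public static void tokenize(String input, Pattern separator, ContentHandler handler)
            throws SAXException {
        handler.startDocument();
        for (String token : separator.split(input)) {
            if (token.isEmpty()) {
                continue; // a leading separator yields an empty first item
            }
            char[] chars = token.toCharArray();
            handler.characters(chars, 0, chars.length);
        }
        handler.endDocument();
    }

    /** Runs the sketch and records each SAX event as a readable string. */
    public static List<String> events(String input, String regex) {
        final List<String> log = new ArrayList<>();
        ContentHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void characters(char[] ch, int start, int length) {
                log.add("characters:" + new String(ch, start, length));
            }
            @Override public void endDocument() { log.add("endDocument"); }
        };
        try {
            tokenize(input, Pattern.compile(regex), handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Separators (runs of whitespace) fire no event of their own.
        System.out.println(events("alpha  beta\tgamma", "\\s+"));
    }
}
```

In this sketch the whole input is held in memory; the real filter reads through a buffer, which is why an undersized buffer can cause its content to be fired in one piece.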

Author:
Philippe Poulard
See Also:
Pattern, RegexpTokenizer, Names.TOKENIZER_FILTER

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.inria.ns.reflex.structures.Configurable
Configurable.Impl
 
Constructor Summary
Tokenizer()
          Create a new regexp-based tokenizer.
 
Method Summary
 boolean containsAttribute(Object key)
          Indicates if a parameter of this tokenizer has been set previously.
 Object getAttribute(Object key)
          Get a parameter of this tokenizer.
 Map getAttributes()
          Get the parameters of this tokenizer as a map.
 StandaloneProducer getSAXSource(InputSource inputSource)
          Return a SAX producer that fires characters events for each token read from the input.
 void mergeAttributes(Map attributes)
          Set several parameters of this tokenizer.
 void setAttribute(Object key, Object data)
          Set a parameter of this tokenizer.
 void setAttributes(Map parameters)
          Set several parameters of this tokenizer.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tokenizer

public Tokenizer()
Create a new regexp-based tokenizer.

Method Detail

getSAXSource

public StandaloneProducer getSAXSource(InputSource inputSource)
Return a SAX producer that fires characters events for each token read from the input.

Specified by:
getSAXSource in interface StandaloneFilter
Parameters:
inputSource - The source to read.
Returns:
A SAX producer.
See Also:
StandaloneFilter.getSAXSource(InputSource)

setAttribute

public void setAttribute(Object key,
                         Object data)
Set a parameter of this tokenizer.

Specified by:
setAttribute in interface Configurable
Parameters:
key - "pattern", "flags", or "buffer".
data - The data.
See Also:
Configurable.setAttribute(java.lang.Object, java.lang.Object)

setAttributes

public void setAttributes(Map parameters)
Set several parameters of this tokenizer.

Specified by:
setAttributes in interface Configurable
Parameters:
parameters - The map of parameters to set.
See Also:
setAttribute(Object, Object), Configurable.setAttributes(java.util.Map)

getAttribute

public Object getAttribute(Object key)
Get a parameter of this tokenizer.

Specified by:
getAttribute in interface Configurable
Parameters:
key - "pattern", "flags", or "buffer".
Returns:
The value of that parameter.
See Also:
Configurable.getAttribute(java.lang.Object)

getAttributes

public Map getAttributes()
Get the parameters of this tokenizer as a map.

Specified by:
getAttributes in interface Configurable
Returns:
A map of {"pattern" -> String, "flags" -> String, "buffer" -> Number}
See Also:
Configurable.getAttributes()
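
The map shape described above can be sketched with plain java.util collections. The class TokenizerParameters and the pattern value are hypothetical; only the keys and the buffer default come from this page. The "flags" entry is left out because the syntax of its string value is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of org.inria.ns.reflex.
public class TokenizerParameters {

    /** Builds a parameter map that uses the documented keys. */
    public static Map<String, Object> example() {
        Map<String, Object> params = new LinkedHashMap<>();
        // Assumed pattern: it cannot match the empty string.
        params.put("pattern", "\\s+");
        // Documented default buffer size, stored as a Number.
        params.put("buffer", Integer.valueOf(2048));
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = example();
        // With the Reflex library on the classpath, a map like this one
        // would be handed to Tokenizer.setAttributes(Map).
        System.out.println(params);
    }
}
```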

containsAttribute

public boolean containsAttribute(Object key)
Indicates if a parameter of this tokenizer has been set previously.

Specified by:
containsAttribute in interface Configurable
Parameters:
key - "pattern", "flags", or "buffer".
Returns:
true if that parameter has been set, false otherwise.
See Also:
getAttribute(Object), Configurable.containsAttribute(java.lang.Object)

mergeAttributes

public void mergeAttributes(Map attributes)
Set several parameters of this tokenizer.

Specified by:
mergeAttributes in interface Configurable
Parameters:
attributes - The map of parameters to set.
See Also:
setAttribute(Object, Object), Configurable.mergeAttributes(java.util.Map)