Stem Tokens using ExampleSet (Operator Toolbox)
Synopsis
Replaces terms by pattern matching rules. This operator uses an ExampleSet to stem a list of words inside a ''Process Documents'' operator.
Description
This operator can be used inside a ''Process Documents'' operator and lets you provide a custom list of stemming rules. It works like the Stem (Dictionary) operator, except that the rules come from an ExampleSet rather than a file.
It reduces terms to a base form using an external ExampleSet with replacement rules. The ExampleSet must contain one rule per line, in the form:
targetExpression:pattern1 pattern2 ...
Here targetExpression is the term to which an input term is reduced if it matches any of the patterns, and each patternX is a simple string or a regular expression. A simple example is the mapping:
weekday : .*day
Keep in mind that very short words are filtered out by the default settings of the TextInput operators.
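The rule format described above can be illustrated outside RapidMiner with a small Python sketch. This is not the operator's implementation: the helper names parse_rule and stem_tokens are hypothetical, and the sketch assumes that a pattern must match the whole token and that the first matching rule wins, neither of which is stated by the operator documentation.

```python
import re


def parse_rule(line):
    """Split 'target : pattern1 pattern2 ...' into (target, [compiled patterns])."""
    # Split at the first colon only; everything after it is the pattern list.
    target, _, patterns = line.partition(":")
    return target.strip(), [re.compile(p) for p in patterns.split()]


def stem_tokens(tokens, rule_lines):
    """Replace each token with the target of the first rule it matches."""
    rules = [parse_rule(line) for line in rule_lines]
    stemmed = []
    for token in tokens:
        for target, patterns in rules:
            # Assumption: a pattern has to match the entire token.
            if any(p.fullmatch(token) for p in patterns):
                token = target
                break
        stemmed.append(token)
    return stemmed


rules = ["weekday : .*day"]
tokens = ["meeting", "on", "monday", "or", "friday"]
print(stem_tokens(tokens, rules))
# prints ['meeting', 'on', 'weekday', 'or', 'weekday']
```

Under these assumptions, a broad pattern such as .*day also rewrites tokens like "holiday" or "today", so narrower patterns may be preferable for real rule sets.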
Input
- doc
The documents input port.
- exa (Data table)
The ExampleSet with the tokens.
Output
- doc
The resulting document.
Parameters
- attribute The name of the attribute that should be used for stemming.
Tutorial Processes
Stem weekdays from a document
In this example, the names of weekdays are replaced with the word ''weekday''.