Filter Tokens using ExampleSet (Operator Toolbox)
Synopsis
This operator uses an ExampleSet to filter a list of words inside a ''Process Documents'' operator.
Description
This operator can be used inside the ''Process Documents'' operator and lets you provide a custom list of tokens to be filtered out. It is similar to the Filter Token (Dictionary) operator, except that the input here is an ExampleSet rather than a file.
Input
- doc
The documents input port.
- exa (Data table)
The ExampleSet with the tokens.
Output
- doc
The resulting document.
Parameters
- attribute The name of the attribute that should be used for filtering.
- case sensitive If true, words are matched case-sensitively.
- invert filter If this parameter is set to true, the selected condition is inverted. The provided list is then treated as a white list instead of a black list: only tokens that appear in the list are kept.
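The way the three parameters interact can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the operator's actual implementation: the function name filter_tokens, the representation of the ExampleSet as a list of dictionaries, and the default values chosen for case_sensitive and invert_filter are all assumptions made for the example.

```python
def filter_tokens(tokens, example_set, attribute,
                  case_sensitive=False, invert_filter=False):
    """Return the tokens that survive filtering (hypothetical helper)."""
    # Collect the filter words from the chosen attribute (column) of the table.
    words = {row[attribute] for row in example_set}
    if not case_sensitive:
        words = {w.lower() for w in words}

    kept = []
    for token in tokens:
        key = token if case_sensitive else token.lower()
        listed = key in words
        # Black list (default): keep tokens that are NOT listed.
        # White list (invert_filter=True): keep only tokens that ARE listed.
        if listed == invert_filter:
            kept.append(token)
    return kept


# Stand-in for an ExampleSet with a single attribute named "word" (assumed name).
example_set = [{"word": "apple"}, {"word": "banana"}]
tokens = ["I", "like", "Apple", "and", "banana", "pie"]

black_listed = filter_tokens(tokens, example_set, "word")
case_matched = filter_tokens(tokens, example_set, "word", case_sensitive=True)
white_listed = filter_tokens(tokens, example_set, "word", invert_filter=True)
```

With case-insensitive matching both "Apple" and "banana" are removed; with case-sensitive matching "Apple" survives because only the lowercase "apple" is in the list; with the filter inverted only the two listed tokens remain.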
Tutorial Processes
Filter fruit names from a document
In this example, we create an ExampleSet and then filter the words it contains out of a document. The same approach can be used to filter stop words or any other specific words of interest.