Filter Tokens Using ExampleSet
(Operator Toolbox)
Synopsis
This operator uses an ExampleSet to filter a list of words inside a "Process Documents" operator.
Description
This operator can be used inside the "Process Documents" operator and lets you provide a custom list of tokens to be filtered out. It is similar to the Filter Token (Dictionary) operator, except that the token list is supplied as an ExampleSet rather than as a file.
Input
doc The documents input port.
exa (Data table) The ExampleSet with the tokens.
Output
doc The resulting document.
Parameters
- attribute The name of the attribute that should be used for filtering.
- case sensitive If true, words are matched case-sensitively.
- invert filter If this parameter is set to true, the selected condition is inverted. The provided list is then treated as a white list instead of a black list.
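The interplay of the case sensitive and invert filter parameters can be illustrated with a minimal Python sketch. This is not the operator's implementation: the function filter_tokens, its argument names and defaults, and the use of casefold() for case-insensitive matching are all assumptions made for illustration.

```python
from typing import Iterable, List


def filter_tokens(
    tokens: Iterable[str],
    filter_list: Iterable[str],
    case_sensitive: bool = False,  # assumed default, not taken from the operator
    invert_filter: bool = False,
) -> List[str]:
    """Hypothetical stand-in for the operator's filtering step."""

    def normalize(word: str) -> str:
        # Assumption: case-insensitive matching compares case-folded forms.
        return word if case_sensitive else word.casefold()

    listed = {normalize(word) for word in filter_list}
    # invert_filter=False: drop listed tokens (black list).
    # invert_filter=True:  keep only listed tokens (white list).
    return [t for t in tokens if (normalize(t) in listed) == invert_filter]


tokens = ["I", "like", "Apple", "and", "banana", "pie"]
fruit = ["apple", "banana"]  # values of the chosen attribute

print(filter_tokens(tokens, fruit))
# ['I', 'like', 'and', 'pie']
print(filter_tokens(tokens, fruit, case_sensitive=True))
# ['I', 'like', 'Apple', 'and', 'pie']
print(filter_tokens(tokens, fruit, invert_filter=True))
# ['Apple', 'banana']
```

With case-sensitive matching, "Apple" no longer matches the list entry "apple" and therefore stays in the document.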
Tutorial Processes
Filter fruit names from a document
This example creates an ExampleSet of words and then filters those words out of a document. The same approach can be used to filter stopwords or any other specific words of interest.
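Outside RapidMiner, the data flow of this tutorial can be approximated in a few lines of Python. Everything here is a hypothetical stand-in: the list of dictionaries plays the role of the ExampleSet, "word" is an assumed attribute name, the sample sentence is invented, and the regular expression replaces whatever tokenization step the process uses.

```python
import re

# Stand-in for the ExampleSet: one row per token to filter,
# stored in a hypothetical attribute named "word".
example_set = [{"word": "apple"}, {"word": "banana"}, {"word": "cherry"}]

# Invented sample document.
document = "She bought an apple, a banana and a loaf of bread."

attribute = "word"
fruit_names = {row[attribute] for row in example_set}

# Simple letters-only tokenizer (an assumption, not the operator's behavior).
tokens = re.findall(r"[A-Za-z]+", document)

# Case-sensitive black-list filtering: drop every token found in the list.
remaining = [t for t in tokens if t not in fruit_names]
print(remaining)
# ['She', 'bought', 'an', 'a', 'and', 'a', 'loaf', 'of', 'bread']
```

Replacing the rows of example_set with a stopword list gives the stopword-filtering variant mentioned above.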