Filter Tokens using ExampleSet (Operator Toolbox)
Synopsis
This operator uses an ExampleSet to filter a list of words inside a "Process Documents" operator.
This operator can be used inside the "Process Documents" operator and lets you provide a custom list of tokens to filter out. It is similar to the Filter Token (Dictionary) operator, except that the token list is supplied as an ExampleSet rather than a file.
The documents input port.
- exa (Data Table)
The ExampleSet with the tokens.
The resulting document.
- attribute The name of the attribute that should be used for filtering. Range:
- case_sensitive If true, words are matched case-sensitively. Range:
- invert_filter If set to true, the filter condition is inverted: the provided list is treated as a whitelist instead of a blacklist, so only tokens that appear in the list are kept. Range:
Filter fruit names from a document
This example creates an ExampleSet of words and then filters those words out of a document. The same approach can be used to filter stop words or any other specific words of interest.
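
The filtering behavior described on this page can be approximated outside RapidMiner with a short Python sketch. It is not the operator's implementation: the function name `filter_tokens`, the representation of the ExampleSet as a list of dictionaries, the attribute name `word`, and the parameter defaults are all assumptions made for illustration.

```python
def filter_tokens(tokens, rows, attribute, case_sensitive=False, invert_filter=False):
    """Drop tokens listed in `attribute`, or keep only those if invert_filter is set.

    `rows` stands in for the ExampleSet: one dict per example (assumed layout).
    The default values here are assumptions, not the operator's documented defaults.
    """
    # Compare raw strings when case-sensitive, lowercased strings otherwise.
    normalize = (lambda s: s) if case_sensitive else str.lower
    listed = {normalize(row[attribute]) for row in rows}
    # Blacklist mode keeps unlisted tokens; whitelist mode keeps listed ones.
    return [t for t in tokens if (normalize(t) in listed) == invert_filter]


# Hypothetical data: an ExampleSet with one attribute holding fruit names.
example_set = [{"word": "apple"}, {"word": "banana"}]
tokens = ["I", "ate", "an", "Apple", "and", "a", "banana"]

removed = filter_tokens(tokens, example_set, "word")
kept = filter_tokens(tokens, example_set, "word", invert_filter=True)
strict = filter_tokens(tokens, example_set, "word", case_sensitive=True)

print(removed)  # fruit names filtered out
print(kept)     # only fruit names kept
print(strict)   # "Apple" survives because its case differs from "apple"
```

The three calls mirror the three parameters: `attribute` selects the column holding the tokens, `case_sensitive` controls how matches are compared, and `invert_filter` switches between blacklist and whitelist behavior.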