Filter Tokens using ExampleSet (Operator Toolbox)

Synopsis

This operator filters tokens inside a Process Documents operator against a list of words supplied as an ExampleSet.

Description

This operator can be used inside the Process Documents operator and lets you provide a custom list of tokens to be filtered out. It is similar to the Filter Token (Dictionary) operator, except that the input here is an ExampleSet rather than a file.

Input

  • doc

    The documents input port.

  • exa (Data table)

    The ExampleSet with the tokens.

Output

  • doc

    The resulting document.

Parameters

  • attribute

    The name of the attribute in the ExampleSet that holds the tokens used for filtering.

  • case sensitive

    If true, words are matched case-sensitively.

  • invert filter

    If true, the filter condition is inverted: the provided list is treated as a whitelist instead of a blacklist, so only tokens on the list are kept.
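
The way the three parameters interact can be illustrated with a small Python sketch. This is not the operator's implementation, only a model of the behavior described above. The function name `filter_tokens`, the attribute name `word`, the sample rows, and the case-insensitive default are all assumptions made for illustration.

```python
def filter_tokens(tokens, filter_values, case_sensitive=False, invert_filter=False):
    """Return the tokens that survive the filter (illustrative model only)."""
    if case_sensitive:
        listed = set(filter_values)
        normalize = lambda token: token
    else:
        # Compare lower-cased forms when matching is not case sensitive.
        listed = {value.lower() for value in filter_values}
        normalize = str.lower

    kept = []
    for token in tokens:
        is_listed = normalize(token) in listed
        # Default: drop listed tokens (blacklist).
        # Inverted: keep only listed tokens (whitelist).
        if is_listed == invert_filter:
            kept.append(token)
    return kept


# Hypothetical ExampleSet, modeled as rows with one attribute named "word".
example_set = [{"word": "apple"}, {"word": "Banana"}]
attribute = "word"
filter_values = [row[attribute] for row in example_set]

tokens = ["I", "like", "Apple", "and", "banana", "pie"]

blacklisted = filter_tokens(tokens, filter_values)
whitelisted = filter_tokens(tokens, filter_values, invert_filter=True)
strict = filter_tokens(tokens, filter_values, case_sensitive=True)
```

In this sketch, `blacklisted` drops both fruit names, `whitelisted` keeps only the fruit names, and `strict` keeps every token because neither "Apple" nor "banana" matches the listed spelling exactly.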

Tutorial Processes

Filter fruit names from a document

This example creates an ExampleSet of fruit names and then filters those words out of a document. The same approach can be used to filter stop words or any other specific words of interest.