Filter Tokens Using ExampleSet (Operator Toolbox)

Synopsis

This operator filters tokens inside a "Process Documents" operator, using a list of words supplied as an ExampleSet.

Description

This operator can be used inside the "Process Documents" operator and lets you provide a custom list of tokens to be filtered out. It is similar to the Filter Token (Dictionary) operator, except that the list is read from an ExampleSet rather than from a file.

Input

  • doc

    The documents input port.

  • exa (Data table)

    The ExampleSet with the tokens.

Output

  • doc

    The filtered document: the listed tokens are removed or, if invert filter is enabled, only the listed tokens are kept.

Parameters

  • attribute: The name of the ExampleSet attribute that holds the tokens used for filtering.
  • case sensitive: If true, tokens are matched case-sensitively.
  • invert filter: If true, the filter condition is inverted, so the provided list is treated as a whitelist (only the listed tokens are kept) instead of a blacklist (the listed tokens are removed).
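The matching rules these parameters describe can be sketched in Python. This is a minimal illustration, not the operator's implementation: the ExampleSet is assumed to be a list of rows (dicts keyed by attribute name), the document is assumed to be already tokenized, and the function name, the attribute name "word", and the default parameter values are hypothetical choices for the sketch.

```python
from typing import Iterable, List, Mapping


def filter_tokens_using_example_set(
    tokens: Iterable[str],
    example_set: Iterable[Mapping[str, str]],
    attribute: str,
    case_sensitive: bool = False,  # assumed default, for the sketch only
    invert_filter: bool = False,   # assumed default, for the sketch only
) -> List[str]:
    """Sketch only: drop tokens listed in `attribute` of `example_set`,
    or keep only those tokens when `invert_filter` is True."""

    def normalize(word: str) -> str:
        # Case-insensitive matching compares lower-cased forms.
        return word if case_sensitive else word.lower()

    # Build the filter list from one attribute (column) of the example set.
    filter_words = {normalize(row[attribute]) for row in example_set}

    kept = []
    for token in tokens:
        listed = normalize(token) in filter_words
        # Blacklist (invert_filter=False): keep tokens that are NOT listed.
        # Whitelist (invert_filter=True): keep only tokens that ARE listed.
        if listed == invert_filter:
            kept.append(token)
    return kept


# Hypothetical ExampleSet with a single attribute named "word".
fruits = [{"word": "apple"}, {"word": "Banana"}]
tokens = ["I", "like", "apple", "and", "banana", "pie"]

print(filter_tokens_using_example_set(tokens, fruits, "word"))
# → ['I', 'like', 'and', 'pie']
print(filter_tokens_using_example_set(tokens, fruits, "word", case_sensitive=True))
# → ['I', 'like', 'and', 'banana', 'pie']
print(filter_tokens_using_example_set(tokens, fruits, "word", invert_filter=True))
# → ['apple', 'banana']
```

Collecting the filter words into a set keeps each token lookup cheap, and normalizing both the list and the tokens with the same function ensures the case sensitive setting applies to both sides of the comparison.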

Tutorial Processes

Filter fruit names from a document

This example creates an ExampleSet of fruit names and then removes those words from a document. The same approach can be used to filter stopwords or any other specific words you are interested in.
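The idea behind this tutorial can be approximated outside RapidMiner with a few lines of Python. This is a hypothetical stand-in for the process, not the process itself: the attribute name "fruit", the sample sentence, and the regex tokenizer (which splits on non-letter characters) are assumptions made for illustration, and matching is case-insensitive here.

```python
import re

# Hypothetical ExampleSet: one attribute named "fruit" holding the words to remove.
fruit_example_set = [{"fruit": "apple"}, {"fruit": "banana"}, {"fruit": "cherry"}]

document = "I bought an Apple, a banana and some bread."

# Rough stand-in for a tokenizer: keep runs of ASCII letters only.
tokens = re.findall(r"[A-Za-z]+", document)

# Case-insensitive blacklist built from the "fruit" attribute.
blacklist = {row["fruit"].lower() for row in fruit_example_set}
filtered = [t for t in tokens if t.lower() not in blacklist]

print(filtered)  # → ['I', 'bought', 'an', 'a', 'and', 'some', 'bread']
```

Swapping the fruit names for a list of common words such as "a", "an" and "and" turns the same sketch into a simple stopword filter.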