
Filter Tokens (by POS Ratios) (Text Processing)

Synopsis

Filters tokens based on their part of speech (POS) ratios.

Description

This operator keeps only tokens that fulfill specified criteria on Part of Speech (POS) ratios. The operator calculates the ratios of adjectives, nouns, and verbs and keeps only tokens whose ratios reach the specified minimum values.
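The filtering idea can be sketched in Python as follows. This is a minimal illustration, not the operator's implementation. It assumes each token is a list of `(word, tag)` pairs whose tags follow the Penn Treebank tag set (prefixes `JJ`, `NN`, `VB`); the functions `pos_ratios` and `filter_by_pos_ratios` and their default minimums of 0.0 are hypothetical.

```python
# Assumption: Penn Treebank tag prefixes identify the three word classes
# (JJ/JJR/JJS, NN/NNS/NNP/NNPS, VB/VBD/VBG/VBN/VBP/VBZ).
POS_PREFIXES = {"adjectives": "JJ", "nouns": "NN", "verbs": "VB"}


def pos_ratios(tagged_words):
    """Return the share of adjectives, nouns and verbs among the tagged words."""
    total = len(tagged_words)
    if total == 0:
        return {kind: 0.0 for kind in POS_PREFIXES}
    return {
        kind: sum(1 for _, tag in tagged_words if tag.startswith(prefix)) / total
        for kind, prefix in POS_PREFIXES.items()
    }


def filter_by_pos_ratios(tokens, min_ratio_adjectives=0.0,
                         min_ratio_nouns=0.0, min_ratio_verbs=0.0):
    """Keep only tokens whose POS ratios all reach the given minimums.

    Hypothetical helper: parameter names mirror the operator's parameters,
    but the defaults here are illustrative, not the operator's defaults.
    """
    minimums = {
        "adjectives": min_ratio_adjectives,
        "nouns": min_ratio_nouns,
        "verbs": min_ratio_verbs,
    }
    kept = []
    for token in tokens:
        ratios = pos_ratios(token)
        # A token survives only if every ratio meets its minimum.
        if all(ratios[kind] >= minimums[kind] for kind in minimums):
            kept.append(token)
    return kept


tokens = [
    [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ")],
    [("of", "IN"), ("the", "DT"), ("and", "CC")],
]
# The first token has a noun ratio of 1/4 = 0.25, the second has 0.0.
print(filter_by_pos_ratios(tokens, min_ratio_nouns=0.2))
```

With `min_ratio_nouns=0.2`, only the first token is kept, because it is the only one whose noun ratio reaches the threshold.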

Input

  • document

    The document port.

Output

  • document

    The document port.

Parameters

  • language source: Specifies whether the language is set explicitly by the user or read from a meta data attribute of the document.
  • language: The language of the part of speech (POS) tagger to use.
  • language attribute: The meta data attribute key that contains the ISO language code of the document.
  • min ratio adjectives: The minimum ratio of adjectives required for a token to be kept.
  • min ratio nouns: The minimum ratio of nouns required for a token to be kept.
  • min ratio verbs: The minimum ratio of verbs required for a token to be kept.
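How the three language parameters interact can be sketched as below. This is an illustrative assumption, not the operator's code: the option values `"explicit"` and `"meta data attribute"` and the function `resolve_language` are hypothetical, and the document's meta data is modeled as a plain dictionary.

```python
# Hypothetical option values for the "language source" parameter;
# the real option labels in the operator may differ.
EXPLICIT = "explicit"
META_DATA = "meta data attribute"


def resolve_language(language_source, language, language_attribute, document_meta):
    """Choose the POS tagger language for one document.

    - EXPLICIT: use the user-supplied "language" parameter.
    - META_DATA: read the ISO language code stored under
      "language attribute" in the document's meta data.
    """
    if language_source == EXPLICIT:
        return language
    if language_source == META_DATA:
        code = document_meta.get(language_attribute)
        if code is None:
            raise KeyError(f"document has no meta data attribute {language_attribute!r}")
        return code
    raise ValueError(f"unknown language source: {language_source!r}")


print(resolve_language(META_DATA, "en", "language", {"language": "de"}))
```

Under these assumptions, the "language" parameter is ignored when the language comes from meta data, so documents in different languages can be tagged in the same run.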