JSON to Data (Text Processing)

Synopsis

Flattens and transforms JSON documents into an example set.

Description

This operator transforms a collection of JSON documents into an example set such that each JSON document corresponds to one example.

Nested objects are flattened by joining their keys with a dot (.) as separator. Array elements are denoted by their index in square brackets.

For instance, flattening the JSON document {"point": {"x": ..., "y": ... }} results in an example with two attributes: "point.x" and "point.y". Similarly, a document of the form {"x": ..., "y": [..., ...]} results in an example with three attributes: "x", "y[0]" and "y[1]".
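The naming scheme above can be illustrated with a short Python sketch. The helpers `flatten` and `to_rows` are hypothetical and not part of the operator; the alphabetical column order and the use of `None` for attributes that a document does not contain are assumptions made for illustration only.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def flatten(value: Any, prefix: str = "", out: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Flatten nested JSON into one level: object keys joined with '.', arrays as '[i]'."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Top-level keys have no prefix; nested keys are joined with a dot.
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Array elements are addressed by their index in square brackets.
            flatten(child, f"{prefix}[{index}]", out)
    else:
        out[prefix] = value  # scalar leaf: string, number, bool or None
    return out


def to_rows(documents: List[str]) -> Tuple[List[str], List[List[Any]]]:
    """One JSON document -> one row; attributes absent from a document become None."""
    flat = [flatten(json.loads(doc)) for doc in documents]
    columns = sorted({key for row in flat for key in row})  # assumed column order
    return columns, [[row.get(col) for col in columns] for row in flat]


print(flatten({"point": {"x": 1, "y": 2}}))  # {'point.x': 1, 'point.y': 2}
print(flatten({"x": 0, "y": [3, 4]}))        # {'x': 0, 'y[0]': 3, 'y[1]': 4}
```

Because every document becomes one row, the resulting attribute set is the union of the flattened keys of all documents, and a document that lacks a key simply has a missing value in that column.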

Input

  • documents (Collection)

    A collection of JSON documents to be transformed into an example set.

Output

  • example set (Data table)

    The example set containing the entries extracted from the JSON documents.

Parameters

  • ignore arrays: If the checkbox is activated, the operator ignores array structures nested in the JSON documents. If the checkbox is deactivated, array elements are included as attributes (e.g., "y[0]", "y[1]").
  • limit attributes: If the checkbox is activated, the additional parameter "minimal examples (absolute)" is displayed, and only attributes that occur in at least the specified number of examples are included. If the checkbox is deactivated, all attributes are included, regardless of how many examples they occur in.
  • minimal examples (absolute): The minimum number of examples in which an attribute must occur to be included. This parameter is only visible when the "limit attributes" checkbox is activated.
  • skip invalid documents: If the checkbox is activated, invalid documents (i.e., documents that are not valid JSON) are skipped and a warning is logged. If the checkbox is deactivated, the process execution stops when an invalid document is encountered.
  • guess data types: If the checkbox is activated, the data type of each value is guessed. If the checkbox is deactivated, the original JSON data types are kept.
  • keep missing attributes: If the checkbox is activated, attributes that contain only missing values are kept. If the checkbox is deactivated, such attributes are removed from the resulting example set.
  • missing values aliases: A comma-separated list of aliases for missing values. Every nominal value is looked up in this list and, if it is contained, is interpreted as a missing value. The lookup is case sensitive.
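A rough Python sketch of how several of these parameters could interact is shown below. The function `json_to_data` and its keyword arguments are hypothetical; they mirror the documented parameter names but are not the operator's API. Assumptions: whitespace around alias entries is trimmed, an attribute "occurs" in an example only when it has a non-missing value there, the column order is alphabetical, and "guess data types" is omitted.

```python
import json
import logging
from typing import Any, Dict, List, Optional, Tuple


def flatten(value: Any, prefix: str, out: Dict[str, Any], ignore_arrays: bool) -> None:
    """Flatten nested JSON: object keys joined with '.', array elements as '[i]'."""
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out, ignore_arrays)
    elif isinstance(value, list):
        if ignore_arrays:
            return  # "ignore arrays": drop the whole array structure
        for index, child in enumerate(value):
            flatten(child, f"{prefix}[{index}]", out, ignore_arrays)
    else:
        out[prefix] = value


def json_to_data(
    documents: List[str],
    ignore_arrays: bool = False,
    minimal_examples: Optional[int] = None,  # None means "limit attributes" is off
    skip_invalid_documents: bool = True,
    keep_missing_attributes: bool = False,
    missing_values_aliases: str = "",
) -> Tuple[List[str], List[List[Any]]]:
    # Case-sensitive alias set; trimming whitespace is an assumption.
    aliases = {a.strip() for a in missing_values_aliases.split(",") if a.strip()}
    rows: List[Dict[str, Any]] = []
    for doc in documents:
        try:
            parsed = json.loads(doc)
        except json.JSONDecodeError:
            if not skip_invalid_documents:
                raise  # stop execution on the first invalid document
            logging.warning("Skipping invalid JSON document")
            continue
        row: Dict[str, Any] = {}
        flatten(parsed, "", row, ignore_arrays)
        # Nominal (string) values that match an alias become missing (None).
        rows.append({k: None if isinstance(v, str) and v in aliases else v
                     for k, v in row.items()})

    def non_missing(col: str) -> int:
        """Number of examples with a non-missing value for this attribute."""
        return sum(1 for r in rows if r.get(col) is not None)

    columns = sorted({key for r in rows for key in r})
    if minimal_examples is not None:
        columns = [c for c in columns if non_missing(c) >= minimal_examples]
    if not keep_missing_attributes:
        columns = [c for c in columns if non_missing(c) > 0]
    return columns, [[r.get(c) for c in columns] for r in rows]
```

For example, with the documents `{"a": "NA", "b": [1, 2], "c": 1}`, `not json` and `{"a": "x", "c": 2, "d": null}` and `missing_values_aliases="NA, ?"`, the invalid document is skipped with a warning, "NA" becomes a missing value, and the all-missing attribute "d" is dropped, leaving the attributes "a", "b[0]", "b[1]" and "c".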