JSON to Data (Text Processing)
Synopsis
Flattens and transforms JSON documents into an example set.
Description
This operator transforms a collection of JSON documents into an example set such that each JSON document corresponds to one example.
Nested objects are flattened using a dot as the separator. Array elements are denoted by their index in square brackets.
For instance, flattening the JSON document {"point": {"x": ..., "y": ... }} results in an example with two attributes: "point.x" and "point.y". Similarly, a document of the form {"x": ..., "y": [..., ...]} results in an example set with the attributes "x", "y[0]" and "y[1]".
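The naming scheme described above can be illustrated with a minimal Python sketch. This is not the operator's implementation; the function name `flatten` and its handling of edge cases (for example, empty objects or arrays simply produce no attributes) are assumptions made for illustration only.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Hypothetical helper: object keys are joined with ".", and array
    elements are addressed as "[index]" appended to the parent name.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, name))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, boolean, or null).
        flat[prefix] = value
    return flat


print(flatten({"point": {"x": 1, "y": 2}}))
print(flatten({"x": 1, "y": [2, 3]}))
```

The first call yields the attribute names "point.x" and "point.y"; the second yields "x", "y[0]" and "y[1]", matching the examples in the description.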
Input
- documents (Collection)
A collection of JSON documents to be transformed into an example set.
Output
- example set (Data table)
The example set containing the entries extracted from the JSON documents.
Parameters
- ignore_arrays If the checkbox is activated, the operator ignores nested array structures. If the checkbox is deactivated, the operator includes nested array structures. Range: boolean
- limit_attributes If the checkbox is activated, the additional parameter "minimal examples (absolute)" is displayed, and only attributes that have at least the specified number of examples in the input data are included. If the checkbox is deactivated, all attributes are included regardless of the number of examples in the input data. Range: boolean
- minimal_examples_(absolute) This parameter is only visible when the "limit attributes" checkbox is activated. It specifies the minimum number of examples an attribute must have in order to be included. Range: integer
- skip_invalid_documents If the checkbox is activated, invalid documents (i.e., documents that are not in JSON format) are skipped and a warning is logged. If the checkbox is deactivated, the process execution is stopped when an invalid document is encountered. Range: boolean
- guess_data_types If the checkbox is activated, the data type for each value will be guessed. If the checkbox is deactivated, the original JSON data types will be kept. Range: boolean
- keep_missing_attributes If the checkbox is activated, attributes that only have missing values will be kept. If the checkbox is deactivated, attributes that only have missing values will be removed from the resulting example set. Range: boolean
- missing_values_aliases Every nominal value is looked up in this comma-separated list. If the list contains the value, it is interpreted as a missing value. The lookup is case sensitive. Range: string
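To make the interaction of some of these parameters more concrete, the following Python sketch mimics "skip invalid documents", "minimal examples (absolute)", "missing values aliases" and "keep missing attributes" on a list of JSON strings. It is an illustration under stated assumptions, not the operator's actual logic: the function `documents_to_rows` and the helper `flatten` are hypothetical, an attribute's "number of examples" is assumed to mean the number of documents in which it occurs, aliases are matched without trimming whitespace, and "ignore arrays" and "guess data types" are not modeled.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def flatten(value, prefix=""):
    """Hypothetical helper: dot-separated keys, [index] for arrays."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


def documents_to_rows(documents, skip_invalid_documents=True,
                      minimal_examples=None, missing_values_aliases="",
                      keep_missing_attributes=False):
    """Turn JSON strings into rows (one dict per valid document)."""
    aliases = {a for a in missing_values_aliases.split(",") if a}
    rows = []
    for text in documents:
        try:
            row = flatten(json.loads(text))
        except json.JSONDecodeError:
            if not skip_invalid_documents:
                raise  # stop processing, as when the checkbox is off
            logger.warning("Skipping invalid JSON document")
            continue
        # Case-sensitive alias lookup for string (nominal) values.
        rows.append({k: (None if isinstance(v, str) and v in aliases else v)
                     for k, v in row.items()})

    # Assumption: count the documents in which each attribute occurs.
    present = Counter(k for row in rows for k in row)
    non_missing = Counter(k for row in rows
                          for k, v in row.items() if v is not None)
    keep = [k for k in present
            if (minimal_examples is None or present[k] >= minimal_examples)
            and (keep_missing_attributes or non_missing[k] > 0)]
    # Attributes absent from a document become missing values (None).
    return [{k: row.get(k) for k in keep} for row in rows]


docs = ['{"a": 1, "b": "N/A"}', 'not json', '{"a": 2, "c": {"d": 3}}']
print(documents_to_rows(docs, missing_values_aliases="N/A,?"))
print(documents_to_rows(docs, minimal_examples=2))
```

In the first call, the invalid document is skipped, "b" contains only missing values after alias replacement and is therefore dropped, and "c.d" is missing in the first row. In the second call, only "a" occurs in at least two documents, so it is the only attribute kept.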