Replace All Missings (Model Simulator)

Synopsis

This operator handles all missing values in a data set automatically.

Description

A universal missing value handler. In nominal columns, missing values are replaced with a new value, MISSING. In numerical columns, missing and infinite values are replaced with the average of the non-missing values; if the entire column is missing, all values become zero. In date columns, missing values are replaced with the first known date, or with a zero date (1 Jan 1970) if all values are missing.
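The rules above can be sketched in plain Python. This is only an illustration, not the operator's actual implementation: the function names are hypothetical, missing values are represented as None (or NaN for numbers), infinite values are assumed to be excluded from the average, and "first known date" is assumed to mean the first non-missing date in row order.

```python
import math
from datetime import date

# The "zero date" named in the description.
EPOCH = date(1970, 1, 1)


def _is_bad_number(v):
    """True for a missing (None/NaN) or infinite numerical value."""
    return v is None or math.isnan(v) or math.isinf(v)


def fill_nominal(values):
    """Replace missing nominal values with the new value MISSING."""
    return ["MISSING" if v is None else v for v in values]


def fill_numeric(values):
    """Replace missing and infinite values with the average of the rest.

    Assumption: infinite values do not contribute to the average.
    If no usable value remains, every entry becomes zero.
    """
    good = [v for v in values if not _is_bad_number(v)]
    fill = sum(good) / len(good) if good else 0.0
    return [fill if _is_bad_number(v) else v for v in values]


def fill_dates(values):
    """Replace missing dates with the first known date, else the zero date.

    Assumption: "first" means first in row order, not the earliest date.
    """
    known = [v for v in values if v is not None]
    fill = known[0] if known else EPOCH
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    print(fill_nominal(["a", None]))
    print(fill_numeric([1.0, None, float("inf"), 3.0]))
    print(fill_dates([None, date(2020, 1, 2)]))
```

Each function handles a single column, which mirrors the per-type rules in the description; a full handler would dispatch on the column type.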

The advantage of this operator is that it handles all data types as well as missing and infinite values, and produces a single preprocessing model that deals with all of those situations in other data sets, e.g. in scoring scenarios. Although this is simpler, it is also less flexible and less powerful than using the dedicated missing value handling operators.

Input

  • example set input (Data table)

    This port expects an ExampleSet for which all missing and infinite values should be replaced.

Output

  • example set output (Data table)

    The processed data without any missings or infinite values.

  • original (Data table)

    The original data set.

  • preprocessing model (Preprocessing Model)

    You can apply this model to new data sets with Apply Model to get the same missing value handling as for the specified data set.
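The idea of a reusable preprocessing model can be sketched as a fit step and an apply step. This is a hedged illustration in plain Python, not the operator's or Apply Model's real API: the function names and the dictionary-of-columns table layout are hypothetical, and it assumes the model stores one replacement value per column learned from the original data set. Date columns are left out for brevity.

```python
import math


def _is_bad(v):
    """True for None, NaN, or an infinite float."""
    return v is None or (isinstance(v, float) and (math.isnan(v) or math.isinf(v)))


def fit_fill_values(table, numeric_columns):
    """Learn one replacement value per column from the original table.

    `table` maps column names to lists of values (hypothetical layout).
    """
    fills = {}
    for name, values in table.items():
        if name in numeric_columns:
            good = [v for v in values if not _is_bad(v)]
            fills[name] = sum(good) / len(good) if good else 0.0
        else:
            fills[name] = "MISSING"
    return fills


def apply_fill_values(fills, table):
    """Apply the stored replacement values to another table.

    Columns that were not seen during fitting raise a KeyError.
    """
    return {
        name: [fills[name] if _is_bad(v) else v for v in values]
        for name, values in table.items()
    }


if __name__ == "__main__":
    train = {"age": [20.0, None, 40.0], "cabin": ["C1", None, "C2"]}
    model = fit_fill_values(train, numeric_columns={"age"})

    new_data = {"age": [None, 25.0], "cabin": [None, "B5"]}
    # The missing age is filled with the training average (30.0),
    # not with an average computed from the new data.
    print(apply_fill_values(model, new_data))
```

Storing the replacement values rather than recomputing them keeps scoring data consistent with the data the model was built on.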

Tutorial Processes

Replace All Missings for Titanic

This process replaces all missing values in the Titanic data set. Missing numerical values are replaced by the average of the remaining rows, e.g. the age 29.881 is used for missing ages. Missing nominal values are replaced by the word MISSING. There are no dates in the Titanic data set, but missing dates would have been replaced by the first date in the set.