This page documents RapidMiner Studio version 10.0.

Fill Data Gaps (RapidMiner Studio Core)

Synopsis

This operator fills gaps in the ID attribute of the given ExampleSet by adding new examples where ID values are missing. The new examples have missing (null) values for all attributes except the ID attribute.

Description

The Fill Data Gaps operator finds gaps in the ID attribute of the given ExampleSet and adds new examples to fill them. The new examples have null values for all attributes except the ID attribute; these values can be filled in afterwards by operators such as Replace Missing Values. Ideally, the ID attribute should be of integer type. This operator performs the following steps:

  • The data is sorted according to the ID attribute.
  • The distances between all pairs of consecutive ID values are calculated.
  • The greatest common divisor (GCD) of all distances is calculated.
  • Every missing row whose ID value would fall on a multiple of the GCD is added to the data set.
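
The steps above can be sketched in plain Python. This is a minimal illustration, not the operator's actual implementation: the function name fill_id_gaps, the representation of examples as dictionaries, and the attribute name "x" are assumptions made for this sketch, and IDs are assumed to be unique integers.

```python
from functools import reduce
from math import gcd


def fill_id_gaps(rows, id_key="id"):
    """Hypothetical helper: return rows sorted by ID, with placeholder
    rows (None for every non-ID attribute) added for missing IDs."""
    # Step 1: sort the data by the ID attribute.
    ordered = sorted(rows, key=lambda r: r[id_key])
    ids = [r[id_key] for r in ordered]
    if len(ids) < 2:
        return ordered  # no consecutive pairs, so no gaps to fill

    # Step 2: distances between consecutive ID values.
    distances = [b - a for a, b in zip(ids, ids[1:])]

    # Step 3: greatest common divisor of all distances.
    step = reduce(gcd, distances)
    if step == 0:
        return ordered  # all IDs identical; nothing to fill

    # Step 4: add a row for every ID on the GCD grid that is missing.
    by_id = {r[id_key]: r for r in ordered}
    attributes = [k for k in ordered[0] if k != id_key]
    filled = []
    for i in range(ids[0], ids[-1] + 1, step):
        if i in by_id:
            filled.append(by_id[i])
        else:
            filled.append({id_key: i, **{a: None for a in attributes}})
    return filled


rows = [{"id": 50, "x": 5.0}, {"id": 10, "x": 1.0}, {"id": 20, "x": 2.0}]
filled = fill_id_gaps(rows)
# Distances are 10 and 30, so the GCD is 10 and IDs 30 and 40 are added.
```

Because the GCD is used as the step, IDs spaced 10 apart are filled in steps of 10 rather than steps of 1.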

Input

  • example set input (Data Table)

    This input port expects an ExampleSet. In the attached Example Process it is the output of the Subprocess operator, but the output of other operators can be used as well. The input data must have meta data attached, because the attributes are specified in the meta data.

Output

  • example set output (Data Table)

    The ExampleSet with its gaps filled by new examples is delivered through this port.

  • original (Data Table)

    The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view it in the Results Workspace.

Parameters

  • use_gcd_for_step_size

    This parameter indicates whether the greatest common divisor (GCD) should be calculated and used as the underlying distance between all data points. Range: boolean

  • step_size

    This parameter is only available when the use_gcd_for_step_size parameter is set to false. It specifies the step size to be used for filling the gaps. Range: integer

  • start

    This parameter can be used to fill gaps at the beginning of the data, before the first data point. For example, if the ID attribute of the given ExampleSet starts with 3 and the start parameter is set to 1, this operator adds rows with ids 1 and 2 at the beginning. Range: integer

  • end

    This parameter can be used to fill gaps at the end of the data, after the last data point. For example, if the ID attribute of the given ExampleSet ends with 100 and the end parameter is set to 105, this operator adds rows with ids 101 to 105 at the end. Range: integer
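
The interplay of the four parameters can be illustrated with a small Python sketch. The function ids_to_add is hypothetical and only mirrors the parameter names above. It is not the operator's implementation. In particular, the page does not say how the operator treats start or end values that do not fall on the step grid; this sketch assumes the grid stays anchored at the first and last existing IDs.

```python
from functools import reduce
from math import gcd


def ids_to_add(ids, use_gcd_for_step_size=True, step_size=1,
               start=None, end=None):
    """Hypothetical helper: list the ID values that would be added."""
    existing = sorted(set(ids))
    if use_gcd_for_step_size:
        # Step is the GCD of all distances between consecutive IDs
        # (falls back to 1 when there is only a single ID).
        distances = [b - a for a, b in zip(existing, existing[1:])]
        step = reduce(gcd, distances, 0) or 1
    else:
        step = step_size

    first, last = existing[0], existing[-1]
    # Assumption: extend the range in whole steps towards start and end.
    if start is not None and start < first:
        first -= ((first - start) // step) * step
    if end is not None and end > last:
        last += ((end - last) // step) * step

    present = set(existing)
    return [i for i in range(first, last + 1, step) if i not in present]


before = ids_to_add([3, 4, 5], start=1)        # the start example above
after = ids_to_add([98, 99, 100], end=105)     # the end example above
manual = ids_to_add([0, 10, 20], use_gcd_for_step_size=False, step_size=5)
```

In the last call the GCD of the distances would be 10, so no rows would be added automatically; setting step_size to 5 by hand yields the additional IDs 5 and 15.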

Tutorial Processes

Introduction to the Fill Data Gaps operator

This Example Process starts with the Subprocess operator, which delivers an ExampleSet. A breakpoint is inserted here so that you can inspect the ExampleSet. It has 10 examples, and its id attribute shows that certain ids are missing: 3, 6, 8 and 10. The Fill Data Gaps operator is applied to this ExampleSet to fill these gaps with examples that carry the missing ids. The resulting ExampleSet, shown in the Results Workspace, has 14 examples: new examples with ids 3, 6, 8 and 10 have been added. These examples have missing values for all attributes except the id attribute; the missing values can be filled in by operators such as Replace Missing Values.
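
The counts in this tutorial can be checked with a short Python sketch. The concrete ids below are an assumption: the process description only states that there are 10 examples and that ids 3, 6, 8 and 10 are missing, which is consistent with ids running from 1 to 14. The attribute name "value" and its contents are hypothetical, and replacing missing values with the average is shown only as one possible follow-up strategy, not as a description of the Replace Missing Values operator.

```python
# Assumed ids of the 10 original examples (1..14 without 3, 6, 8, 10).
present_ids = [1, 2, 4, 5, 7, 9, 11, 12, 13, 14]

# Hypothetical single attribute "value" for each original example.
examples = {i: {"id": i, "value": float(i)} for i in present_ids}

# All distances between consecutive ids are 1 or 2, so the GCD step is 1.
filled = [
    examples.get(i, {"id": i, "value": None})
    for i in range(present_ids[0], present_ids[-1] + 1)
]
added_ids = [row["id"] for row in filled if row["value"] is None]

# One possible follow-up: replace missing values with the attribute average.
known = [row["value"] for row in filled if row["value"] is not None]
average = sum(known) / len(known)
replaced = [
    {**row, "value": average if row["value"] is None else row["value"]}
    for row in filled
]
```

Here filled contains 14 rows and added_ids is [3, 6, 8, 10], matching the numbers reported for the tutorial process.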