You are viewing the RapidMiner Developers documentation for version 9.1.

API changes in RapidMiner 7.3

RapidMiner 7.3 brings two changes that affect the development of extensions. First, a central API for creating data sets (ExampleSet instances) was introduced. Second, the ExampleSet interface was extended with a method for freeing unused data.

These changes only affect you if your extension includes operators that generate new data sets or defines its own ExampleSets (e.g., custom views).

Generating data sets

RapidMiner 7.3 adds the ExampleSets class that provides a set of static methods to build new data sets. Those methods replace direct instantiations of ExampleTable implementations such as the MemoryExampleTable. In particular, all public constructors for the MemoryExampleTable class have been deprecated.

The new API provides methods to create data sets from both columnar and row-oriented data:

import com.rapidminer.example.Attribute;
import com.rapidminer.example.Attributes;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.example.table.AttributeFactory;
import com.rapidminer.example.table.BinominalMapping;
import com.rapidminer.example.table.DataRow;
import com.rapidminer.example.table.DataRowFactory;
import com.rapidminer.example.table.NominalMapping;
import com.rapidminer.example.utils.ExampleSetBuilder;
import com.rapidminer.example.utils.ExampleSets;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorDescription;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.tools.Ontology;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Create example set using column fillers */
Attribute topTen = AttributeFactory.createAttribute("Top Ten Numbers", Ontology.INTEGER);
Attribute coinFlip = AttributeFactory.createAttribute("Coin Flip", Ontology.BINOMINAL);

NominalMapping coin = new BinominalMapping();
int heads = coin.mapString("Heads");
int tails = coin.mapString("Tails");
coinFlip.setMapping(coin);

ExampleSet numbers = ExampleSets.from(topTen, coinFlip)
    .withRole(topTen, Attributes.ID_NAME)
    .withBlankSize(10)
    .withColumnFiller(topTen, i -> i + 1)
    .withColumnFiller(coinFlip, i -> Math.random() < 0.5 ? heads : tails)
    .build();


/** Create example set from double matrix */
ExampleSetBuilder builder = ExampleSets.from(AttributeFactory.createAttribute(Ontology.REAL),
    AttributeFactory.createAttribute(Ontology.REAL),
    AttributeFactory.createAttribute(Ontology.REAL));

builder.withExpectedSize(10);

// placeholder data: a 10 x 3 matrix, one row per example (all zeros here)
double[][] rawData = new double[10][3];
for (double[] row : rawData) {
    builder.addRow(row);
}

ExampleSet matrix = builder.build();


/** Create example set from custom DataRows */
Attribute nominalAttribute = AttributeFactory.createAttribute("Nominal", Ontology.NOMINAL);
Attribute numericalAttribute = AttributeFactory.createAttribute("Numerical", Ontology.REAL);
Attribute dateTimeAttribute = AttributeFactory.createAttribute("DateTime", Ontology.DATE_TIME);

List<Attribute> attributes = new ArrayList<>();
attributes.add(nominalAttribute);
attributes.add(numericalAttribute);
attributes.add(dateTimeAttribute);

// named rowBuilder to avoid clashing with the builder variable of the previous example
ExampleSetBuilder rowBuilder = ExampleSets.from(attributes).withExpectedSize(2);
DataRowFactory dataRowFactory = new DataRowFactory(DataRowFactory.TYPE_DOUBLE_ARRAY, '.');

DataRow dataRow = dataRowFactory.create(attributes.size());
// important: for nominal attributes, the value stored in the data row is the index of the mapped string
dataRow.set(nominalAttribute, nominalAttribute.getMapping().mapString("Hello"));
dataRow.set(numericalAttribute, 1.0);
// date-time values are stored as milliseconds since the epoch
dataRow.set(dateTimeAttribute, Instant.now().toEpochMilli());
rowBuilder.addDataRow(dataRow);

dataRow = dataRowFactory.create(attributes.size());
// as above: store the index of the mapped string, not the string itself
dataRow.set(nominalAttribute, nominalAttribute.getMapping().mapString("World"));
dataRow.set(numericalAttribute, 42.0);
dataRow.set(dateTimeAttribute, Instant.now().plus(Duration.ofDays(1)).toEpochMilli());
rowBuilder.addDataRow(dataRow);

ExampleSet exampleSet = rowBuilder.build();
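
The comment about nominal values in the last example can be illustrated in isolation. A data row only holds numbers, so a nominal attribute keeps a mapping between strings and indices, and the row stores the index. The following is a minimal, self-contained sketch of that idea. SimpleNominalMapping and NominalMappingSketch are hypothetical stand-ins written for this illustration; they only mimic the mapString call used above and are not RapidMiner's NominalMapping implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a nominal mapping; not part of the RapidMiner API. */
class SimpleNominalMapping {

    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> valueByIndex = new ArrayList<>();

    /** Returns the index of the given string, registering the string first if it is unknown. */
    int mapString(String value) {
        Integer index = indexByValue.get(value);
        if (index == null) {
            index = valueByIndex.size();
            indexByValue.put(value, index);
            valueByIndex.add(value);
        }
        return index;
    }

    /** Returns the string registered under the given index. */
    String mapIndex(int index) {
        return valueByIndex.get(index);
    }
}

/** Hypothetical demo class showing a row that stores a mapping index. */
class NominalMappingSketch {

    public static void main(String[] args) {
        SimpleNominalMapping mapping = new SimpleNominalMapping();

        // A row is just an array of doubles; the nominal column holds the mapping index.
        double[] row = new double[2];
        row[0] = mapping.mapString("Hello");
        row[1] = 1.0;

        // Reading the nominal value back requires the same mapping.
        System.out.println(mapping.mapIndex((int) row[0]));
    }
}
```

Writing the raw string into the row is not possible in this model, which is why the index returned by mapString is the value to store.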

Freeing unused resources

The ExampleSet interface has been extended with the cleanup() method. RapidMiner invokes this method at certain points during process execution, e.g., in between operators. Note that the default implementation does nothing.

/**
 * Frees unused resources, if supported by the implementation. Does nothing by default.
 *
 * Should only be used on freshly {@link #clone}ed {@link ExampleSet}s to ensure that the
 * cleaned up resources are not requested afterwards.
 *
 * @since 7.3
 */
public default void cleanup() {
    // does nothing by default
}

When implementing custom example sets that manage their own resources, please use this method to free unused data such as temporary attributes.
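
As a rough illustration of what such an implementation might look like, the sketch below models a data set that owns its columns and drops the ones that are no longer in use. All types here are hypothetical: DataSetSketch stands in for the relevant part of the ExampleSet interface, and ColumnOwningSet with its "used" flag is an assumed bookkeeping scheme, not how RapidMiner tracks attributes internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the relevant part of the ExampleSet interface. */
interface DataSetSketch {

    /** Frees unused resources, if supported by the implementation. Does nothing by default. */
    default void cleanup() {
        // does nothing by default
    }
}

/** Hypothetical data set that owns its columns and tracks which ones are still in use. */
class ColumnOwningSet implements DataSetSketch {

    private final Map<String, double[]> columns = new HashMap<>();
    private final Set<String> usedColumns = new HashSet<>();

    /** Stores a column; columns added with used == false model temporary attributes. */
    void addColumn(String name, double[] values, boolean used) {
        columns.put(name, values);
        if (used) {
            usedColumns.add(name);
        }
    }

    int columnCount() {
        return columns.size();
    }

    @Override
    public void cleanup() {
        // Drop every column that is not marked as used so its memory can be reclaimed.
        columns.keySet().retainAll(usedColumns);
    }
}

/** Hypothetical demo class for the sketch above. */
class CleanupSketch {

    public static void main(String[] args) {
        ColumnOwningSet set = new ColumnOwningSet();
        set.addColumn("age", new double[] {31, 45}, true);
        set.addColumn("tmp", new double[] {0, 0}, false);

        set.cleanup();
        System.out.println(set.columnCount());
    }
}
```

In this sketch cleanup() is safe to call repeatedly: a second call finds nothing left to remove.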

If you do not manage your own resources but implement a custom ExampleSet that acts as a view on top of another data set, please delegate the call accordingly.

For instance, most of RapidMiner's view implementations reference a single parent example set. Thus, their implementation of cleanup() boils down to:

@Override
public void cleanup() {
    parent.cleanup();
}