Random Forest Encoder (Operator Toolbox)
Synopsis
This operator applies a Random Forest model to a data set. Unlike the usual Apply Model operator, it does not create predictions and confidences; instead it outputs, for each example, the confidence for the positive class of every individual tree in the forest. The result is an ExampleSet with X new attributes (one per tree, where X is the number of trees), following the naming pattern score_X. This can be used as an encoder. One application is to build a more sophisticated voting model than the usual average-based voting by training another learner on the per-tree scores. Another use case is to encode nominal features into numerical ones.
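As a rough illustration of the idea, the following minimal scikit-learn sketch builds one column per tree holding that tree's positive-class confidence. It is an assumed analogue, not the operator's own implementation; the synthetic data set, the number of trees and all parameter values are assumptions of this example.

    # Minimal sketch (assumed scikit-learn analogue of the Random Forest Encoder):
    # each tree contributes its positive-class confidence as one new attribute.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=100, n_features=8, random_state=42)
    forest = RandomForestClassifier(n_estimators=5, random_state=42).fit(X, y)

    # Column i holds tree i's confidence for the positive class (class index 1).
    scores = np.column_stack(
        [tree.predict_proba(X)[:, 1] for tree in forest.estimators_]
    )
    print(scores.shape)  # (100, 5): one score attribute per tree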
This operator also provides a preprocessing model, which can be grouped with any subsequent model so that the two are applied one after the other.
Input
- exa (Data table)
Input ExampleSet which should be encoded.
- mod (Random Forest Model)
Random Forest model which is used for encoding.
Output
- exa (Data table)
The ExampleSet with the result of the application.
- mod
The passed-through Random Forest model.
- pre
A preprocessing model which can be used to apply the same transformation to another data set. This can also be used with the Group Models operator.
Parameters
- remove original attributes: If checked, all original attributes are removed from the resulting ExampleSet and only the encoding attributes are kept.
Tutorial Processes
Encode Sonar
In this process we read the Sonar data set and encode it with a Random Forest with 10 trees. The result is a new ExampleSet containing the 10 score attributes but no longer the original attributes.
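A loose scikit-learn analogue of this tutorial is sketched below; the Sonar data is replaced by a synthetic stand-in, and the column names score_1 through score_10 are an assumption of this sketch. Only the ten score columns are kept, mirroring the "remove original attributes" option.

    # Sketch of the Sonar tutorial: encode with 10 trees, keep only the score attributes.
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the Sonar data set (assumption of this sketch).
    X, y = make_classification(n_samples=208, n_features=60, random_state=1)
    forest = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)

    # Build the per-tree scores and keep nothing else ("remove original attributes").
    encoded = pd.DataFrame({
        f"score_{i + 1}": tree.predict_proba(X)[:, 1]
        for i, tree in enumerate(forest.estimators_)
    })
    print(list(encoded.columns))  # only score_1 ... score_10, the 60 original attributes are gone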
Encode Golf and use Logistic Regression
In this process we read the Golf data set and create a cross-validated classification model on it. We encode our data with a Random Forest and train a Logistic Regression on the encoded results. The models are grouped together using the preprocessing model provided by the Random Forest Encoder.
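A loose analogue of this grouping in scikit-learn is a pipeline that chains a forest-based encoder with a Logistic Regression, evaluated with cross-validation. The RandomForestEncoder class, the synthetic stand-in for the Golf data and all parameter values below are assumptions of this sketch, not the operator's implementation.

    # Sketch of the Golf tutorial: per-tree encoding followed by Logistic Regression,
    # "grouped" by putting both steps into one pipeline.
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    class RandomForestEncoder(BaseEstimator, TransformerMixin):
        """Transform examples into one positive-class score per tree (hypothetical helper)."""

        def __init__(self, n_estimators=10, random_state=None):
            self.n_estimators = n_estimators
            self.random_state = random_state

        def fit(self, X, y):
            self.forest_ = RandomForestClassifier(
                n_estimators=self.n_estimators, random_state=self.random_state
            ).fit(X, y)
            return self

        def transform(self, X):
            # One column per tree: that tree's confidence for the positive class.
            return np.column_stack(
                [tree.predict_proba(X)[:, 1] for tree in self.forest_.estimators_]
            )

    # Synthetic stand-in for the Golf data set (assumption of this sketch).
    X, y = make_classification(n_samples=300, n_features=6, random_state=0)

    # Grouping encoder and learner: the pipeline applies both one after the other.
    model = Pipeline([
        ("encode", RandomForestEncoder(n_estimators=10, random_state=0)),
        ("logreg", LogisticRegression()),
    ])
    print(cross_val_score(model, X, y, cv=5).mean())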