RapidMiner Studio documentation, version 8.0

Decision Tree (Multiway) (RapidMiner Studio Core)

Synopsis

This operator generates a multiway decision tree.

Description

The Decision Tree (Multiway) operator is a nested operator, i.e. it has a subprocess. The subprocess must contain a tree learner, i.e. an operator that expects an ExampleSet and generates a Tree model. You need a basic understanding of subprocesses to apply this operator; please study the documentation of the Subprocess operator first.

If the data set contains only categorical attributes, any C4.5-like algorithm can be used to obtain a multiway decision tree. If the data set includes continuous attributes, however, such algorithms usually produce a binary tree. Because a binary split on a numerical attribute divides its range into only two parts, the same attribute may have to appear several times on a path from the root of the tree to a leaf. Although these repetitions can be simplified when the decision tree is converted into a set of rules, they make the constructed tree leafier, unnecessarily deep, and harder for human experts to understand. Non-binary splits on continuous attributes make trees easier to understand and also seem to lead to more accurate trees in some domains.
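The difference between the two kinds of split can be sketched in a few lines of Python. This is only an illustration, not RapidMiner's implementation: the function names and the Temperature cut points are hypothetical, and the binary case is modelled as a simple chain of tests of the form `value < t`, which is one possible arrangement of binary splits.

```python
from bisect import bisect_right

# Hypothetical cut points for a continuous Temperature attribute
# (assumed for illustration; not taken from the documentation).
TEMPERATURE_THRESHOLDS = [70.0, 80.0]


def multiway_branch(value, thresholds):
    """Route a value with ONE multiway split.

    With k sorted thresholds there are k + 1 branches. Branch 0 holds
    values below thresholds[0]; branch k holds values >= thresholds[-1].
    """
    return bisect_right(thresholds, value)


def binary_tests_needed(value, thresholds):
    """Count the tests a chain of binary splits on the same attribute needs.

    Each test asks `value < t`, so the attribute is tested again at every
    level of the path until one test succeeds or the thresholds run out.
    """
    tests = 0
    for t in thresholds:
        tests += 1
        if value < t:
            break
    return tests


for temperature in (65.0, 75.0, 85.0):
    print(
        temperature,
        "multiway branch:", multiway_branch(temperature, TEMPERATURE_THRESHOLDS),
        "binary tests:", binary_tests_needed(temperature, TEMPERATURE_THRESHOLDS),
    )
```

In the multiway case the attribute appears once on the path, whatever the value. In the binary chain it can appear once per threshold, which is the repetition described above.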

Compared with other approaches, representing the data as a tree has the advantage of being meaningful and easy to interpret. The goal is to create a classification model that predicts the value of the label based on several input attributes of the ExampleSet. Each interior node of the tree corresponds to one of the input attributes. The number of edges leaving an interior node is equal to the number of possible values of the corresponding input attribute. Each leaf node represents a value of the label given the values of the input attributes represented by the path from the root to the leaf. This description is easier to follow alongside the Example Process of the Decision Tree operator.
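The structure described above can be sketched as a small Python data structure. This is a minimal illustration under stated assumptions: the `Node`, `Leaf`, and `predict` names are hypothetical, and the tree is hand-built and only loosely modelled on the Golf data (all attributes are treated as nominal here). It is not output produced by the operator.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf holds one value of the label."""
    label: str


@dataclass
class Node:
    """An interior node tests one attribute and has one edge per value."""
    attribute: str
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def predict(tree, example):
    """Follow the edges matching the example's attribute values to a leaf."""
    node = tree
    while isinstance(node, Node):
        node = node.children[example[node.attribute]]
    return node.label


# Hand-built toy tree (assumed shape and values, for illustration only).
golf_tree = Node("Outlook", {
    "sunny": Node("Humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": Node("Wind", {"true": Leaf("no"), "false": Leaf("yes")}),
})

example = {"Outlook": "sunny", "Humidity": "normal", "Wind": "false"}
print(predict(golf_tree, example))
```

The root node has three edges because the assumed Outlook attribute has three possible values, and each root-to-leaf path reads as one classification rule.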

Input

  • training set (Data Table)

    This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output

  • model (Decision Tree)

    The Decision Tree is delivered from this output port. This classification model can now be applied to unseen data sets to predict the label attribute.

  • example set (Data Table)

    The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Tutorial Processes

Introduction to the Decision Tree (Multiway) operator

The Golf data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. The Decision Tree (Multiway) operator is applied to this ExampleSet, with the Decision Tree operator placed in its subprocess as the tree learner. The resulting Tree is connected to the result port of the process and can be viewed in the Results Workspace.