Set Role (RapidMiner Studio Core)
Synopsis
This operator is used to change the role of one or more attributes.
The role of an attribute reflects the part played by that attribute in an ExampleSet. Changing the role of an attribute may change the part played by that attribute in a process. An attribute can have exactly one role. This operator changes the role of one or more attributes of the input ExampleSet. It is a very simple operator: all you have to do is select an attribute and select a new role for it. Different learning operators require attributes with different roles, so this operator is frequently used to set the right roles before applying the desired operator. The change in role applies only to the current process, i.e. the role of the attribute is not changed permanently in the ExampleSet.
The Set Role operator should not be confused with the Rename operator or the Type Conversion operators. The Rename operator changes the name of an attribute. The Type Conversion operators (available at Data Transformation/Type conversion/) change the type of attributes, e.g. the Nominal to Binominal operator, the Numerical to Polynominal operator and many more.
Broadly, roles are classified into two types: regular and special. Regular attributes simply describe the examples and are usually used during learning processes. One ExampleSet can have numerous regular attributes. Special attributes are those which identify the examples separately; each has a specific task. The special roles are: label, id, prediction, cluster, weight, and batch. An ExampleSet can have numerous special attributes, but a special role cannot be repeated. If one special role is assigned to more than one attribute in an ExampleSet, all of these attributes except the last one change their role to regular (before version 5.3.14 these attributes were dropped). This concept can be easily understood by studying the attached Example Process. The various roles are explained in the parameters section.
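The uniqueness rule for special roles can be illustrated with a small sketch. This is a toy model in plain Python, not RapidMiner's API: the function name `set_role` and the dictionary representation of roles are assumptions made for illustration only. It follows the current behavior described above, where earlier holders of a special role revert to regular.

```python
def set_role(roles, attribute, target_role):
    """Return a new attribute -> role mapping after one role assignment.

    roles is a dict mapping attribute names to their current role.
    This is a hypothetical helper, not part of RapidMiner.
    """
    updated = dict(roles)
    if target_role != "regular":
        # A special role (built-in or user defined) may be held by only
        # one attribute, so any earlier holder falls back to regular.
        for name, role in updated.items():
            if role == target_role and name != attribute:
                updated[name] = "regular"
    updated[attribute] = target_role
    return updated


roles = {"name": "regular", "shift-differential": "regular", "standby-pay": "regular"}
for attribute in ["name", "shift-differential", "standby-pay"]:
    roles = set_role(roles, attribute, "label")

print(roles)
# {'name': 'regular', 'shift-differential': 'regular', 'standby-pay': 'label'}
```

Only the last attribute that received the label role keeps it; the regular role, by contrast, can be held by any number of attributes.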
Input
- example set (Data Table)
This input port expects an ExampleSet. In our Example Process it is the output of the Retrieve operator, but the output of other operators may also be used as input. The input data must have meta data attached, because the role of an attribute is specified in the meta data of the ExampleSet. The Retrieve operator provides meta data along with the data.
Output
- example set (Data Table)
The ExampleSet with the modified role(s) is the output of this port.
- original (Data Table)
The ExampleSet that was given as input is passed through this port to the output unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
- name: The name of the attribute whose role should be changed is specified through this parameter. You can either select the attribute from the drop-down list or type it manually. Range: string
- target_role: The target role of the selected attribute is the new role assigned to it. The following target roles are possible:
- regular: Attributes without a special role, i.e. those which simply describe the examples, are called regular attributes. In most cases the role designation is simply left out for them. Regular attributes are used as input variables for learning tasks.
- id: This is a special role. It acts as the id attribute of the ExampleSet and is usually unique for every example. The id role is used to clearly identify the examples of the ExampleSet concerned. In this case the attribute adopts the role of an identifier and is called ID for short. Unique ids can be given to all the examples using the Generate ID operator.
- label: This is a special role. It acts as the target attribute for learning operators, e.g. the Decision Tree operator. Labels characterize the examples in some way and must be predicted for new examples that are not yet characterized in this manner. The label is also called the 'goal variable'.
- prediction: This is a special role. It acts as the predicted attribute of a learning scheme. For example, when a predictive model is learnt through any learning operator and then applied using the Apply Model operator, the output contains a new attribute with the prediction role, which holds the label values predicted by the given model. The label and prediction attributes are also used for evaluating the performance of a model.
- cluster: This is a special role. It indicates the membership of an example of the ExampleSet in a particular cluster. For example, the k-Means operator adds a column with the cluster role to its output.
- weight: This is a special role. It indicates the weight of the examples with regard to the label. Weights are used in learning processes to give different importance to examples with different weights. Attribute weights are used in numerous operators, e.g. the Select By Weights operator. Weights can also be used in evaluating the performance of models, e.g. the Performance operator has a 'use example weights' parameter to consider the weight of examples during the performance evaluation process.
- batch: This is a special role, it indicates the membership to an example batch.
- user defined: Any role can be provided by typing it directly into the textbox instead of selecting a role from the drop-down menu. If 'ignore' is written in the textbox, that attribute will be ignored by the subsequent operators in the process. This is also a special role, so it needs to be unique. To ignore multiple attributes, unique roles can be assigned, such as ignore01, ignore02, ignore03 and so on.
- set_additional_roles: Click this button to modify the roles of more than one attribute. Clicking it opens a new menu which allows you to select any attribute and assign any role to it. The menu also allows assigning multiple roles to the same attribute; however, as an attribute can have exactly one role, only the last role assigned to that attribute takes effect and all previous roles assigned to it are ignored. Range: menu
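Two of the rules in the parameter list lend themselves to a short sketch: the last role assigned to an attribute wins, and ignoring several attributes requires a distinct user-defined role for each. The code below is a plain Python illustration, not RapidMiner's API; the helper names `resolve_role_list` and `ignore_roles` and the sample assignments are hypothetical.

```python
def resolve_role_list(assignments):
    """Collapse an ordered list of (attribute, role) pairs.

    If the same attribute appears more than once, only its last
    entry survives, mirroring the rule described above.
    """
    final = {}
    for attribute, role in assignments:
        final[attribute] = role
    return final


def ignore_roles(attributes, prefix="ignore"):
    """Give each attribute its own user-defined role: ignore01, ignore02, ..."""
    return {name: f"{prefix}{i:02d}" for i, name in enumerate(attributes, start=1)}


# Hypothetical role list: 'duration' is listed twice, so 'weight' wins.
assignments = [("duration", "label"), ("col-adj", "id"), ("duration", "weight")]
print(resolve_role_list(assignments))
# {'duration': 'weight', 'col-adj': 'id'}

print(ignore_roles(["statutory-holidays", "vacations"]))
# {'statutory-holidays': 'ignore01', 'vacations': 'ignore02'}
```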
Setting roles of attributes
In this Example Process, the 'Labor-Negotiation' data set is loaded using the Retrieve operator. The roles of its attributes are changed using the Set Role operator. Here is an explanation of what happens when this process is executed:
- The attributes name and shift-differential lose the label role because standby-pay is also given the label role. As label is a special role and only one attribute can hold a given special role, the earlier attributes are reset to regular (before version 5.3.14 they were dropped) and the last attribute (standby-pay) keeps the label role.
- duration is assigned the weight role.
- wage-inc-1st, longterm-disability-assistance, pension, bereavement-assistance and wage-inc-2nd are given the regular role. They were regular attributes even before the reassignment, so assigning the same role changes nothing. As there can be numerous regular attributes, none of them is affected.
- wage-inc-3rd and working-hours were not modified, so they retain their original role, i.e. regular.
- col-adj is assigned the id role.
- education-allowance is assigned the batch role.
- statutory-holidays and vacations are assigned the ignore0 and ignore1 roles respectively.
- contrib-to-dental-plan is assigned the prediction role.
- contrib-to-health-plan is assigned the cluster role.
Some attributes lose their special role as explained earlier (or were dropped in versions before 5.3.14), but note that the number of examples remains the same. The roles assigned in this Example Process only show how the Set Role operator works; in real scenarios such role assignments may not be very useful. This also highlights another point: the Set Role operator is not context-aware. It assigns the roles set by the user irrespective of context, so users must know which role to assign in which scenario. Thanks to the Problems View and quick fixes, it is easy to set the right roles before applying different learning operators. Note that the Problems View displays two warnings even in this Example Process.
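To see why the choice of role matters for later operators, the following sketch groups a few attributes from the Example Process by role. It is plain Python rather than RapidMiner's API, and the grouping is an assumption about how a typical learner consumes roles: regular attributes as inputs, the label as target, the weight as example weights, and every other special role left out of learning. The function name `split_by_role` is hypothetical.

```python
def split_by_role(roles):
    """Group attribute names by how a learner is assumed to use them."""
    inputs = [name for name, role in roles.items() if role == "regular"]
    label = next((name for name, role in roles.items() if role == "label"), None)
    weight = next((name for name, role in roles.items() if role == "weight"), None)
    # id, batch, cluster, prediction and user-defined roles such as
    # ignore1 are assumed not to take part in learning.
    skipped = [name for name, role in roles.items()
               if role not in ("regular", "label", "weight")]
    return {"inputs": inputs, "label": label, "weight": weight, "skipped": skipped}


# A subset of the roles described in the Example Process.
roles = {
    "wage-inc-1st": "regular",
    "working-hours": "regular",
    "standby-pay": "label",
    "duration": "weight",
    "col-adj": "id",
    "vacations": "ignore1",
}

parts = split_by_role(roles)
print(parts["inputs"])   # ['wage-inc-1st', 'working-hours']
print(parts["label"])    # standby-pay
print(parts["weight"])   # duration
print(parts["skipped"])  # ['col-adj', 'vacations']
```

A mapping without a label entry would yield `None` for the target, which is the kind of situation a learning operator would be expected to flag before training.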