Vector Linear Regression (RapidMiner Studio Core)

Synopsis

This operator calculates a vector linear regression model from the input ExampleSet.

Description

Regression is a technique for numerical prediction. It is a statistical method that estimates the relationship between one dependent variable (i.e. the label attribute) and one or more independent variables (the regular attributes). Just as classification is used to predict categorical labels, regression is used to predict a continuous value. For example, we may wish to predict the salary of university graduates with 5 years of work experience, or the potential sales of a new product given its price. Regression is often used to determine how much specific factors, such as the price of a commodity, interest rates, or particular industries or sectors, influence the price movement of an asset.

Linear regression attempts to model the relationship between a scalar variable and one or more explanatory variables by fitting a linear equation to observed data. For example, one might want to relate the weights of individuals to their heights using a linear regression model.

This operator performs a vector linear regression: it regresses a vector of labels on all regular attributes. The attributes that form the label vector must be marked as special, and the special role name of every label attribute should start with 'label'.
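The idea of regressing several labels on the same regular attributes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the operator's implementation: the attribute names, the `roles` dictionary, the "regular" role tag, and the helpers `solve` and `fit_vector_regression` are all hypothetical, and the fit uses ordinary least squares via the normal equations with an intercept and no ridge term.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is a square matrix and b a matrix with the same number of rows,
    both given as lists of rows.
    """
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_vector_regression(examples, roles):
    """Fit one linear model per label attribute on the shared regular attributes."""
    # Assumption: every role starting with 'label' belongs to the label vector.
    label_names = sorted(n for n, role in roles.items() if role.startswith("label"))
    regular_names = sorted(n for n, role in roles.items() if role == "regular")
    # Design matrix with a trailing constant column for the intercept.
    x = [[ex[n] for n in regular_names] + [1.0] for ex in examples]
    y = [[ex[n] for n in label_names] for ex in examples]
    p, k = len(x[0]), len(label_names)
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [[sum(rx[i] * ry[j] for rx, ry in zip(x, y)) for j in range(k)]
           for i in range(p)]
    # Result has one row per regular attribute plus the intercept row,
    # and one column per label.
    return regular_names, label_names, solve(xtx, xty)


# Hypothetical, noise-free data: y1 = 2*a1 + 3*a2 + 1 and y2 = -a1 + 0.5*a2 + 4.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
examples = [
    {"a1": float(a1), "a2": float(a2),
     "y1": 2.0 * a1 + 3.0 * a2 + 1.0,
     "y2": -1.0 * a1 + 0.5 * a2 + 4.0}
    for a1, a2 in points
]
roles = {"a1": "regular", "a2": "regular", "y1": "label_1", "y2": "label_2"}

regular_names, label_names, coefficients = fit_vector_regression(examples, roles)
for name, row in zip(regular_names + ["(intercept)"], coefficients):
    print(name, [round(v, 6) for v in row])
```

Each column of `coefficients` is the weight vector for one label, so a single fit yields a model for the whole label vector.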

Input

  • training set (Data Table)

    This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can only be applied to data sets with numeric attributes. You may therefore often need to apply the Nominal to Numerical operator before this operator.
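One common way to turn a nominal attribute into numeric ones is dummy (one-hot) coding. The sketch below only illustrates that idea; it is not the Nominal to Numerical operator's implementation, and the helper `dummy_code`, the attribute names, and the generated column names are hypothetical.

```python
def dummy_code(examples, attribute):
    """Replace one nominal attribute with 0/1 indicator attributes,
    one per value observed in the data."""
    values = sorted({ex[attribute] for ex in examples})
    coded = []
    for ex in examples:
        # Copy every attribute except the nominal one being replaced.
        row = {k: v for k, v in ex.items() if k != attribute}
        for value in values:
            row[f"{attribute} = {value}"] = 1.0 if ex[attribute] == value else 0.0
        coded.append(row)
    return coded


# Hypothetical example set with one numeric and one nominal attribute.
examples = [
    {"size": 1.5, "colour": "red"},
    {"size": 2.0, "colour": "blue"},
    {"size": 0.5, "colour": "red"},
]
coded = dummy_code(examples, "colour")
for row in coded:
    print(row)
```

Note that a full set of indicator columns sums to 1 for every example, which makes it collinear with an intercept term; dropping one indicator or using a positive ridge value avoids a singular fit.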

Output

  • model (Linear Regression Model)

    The regression model is delivered from this output port. This model can now be applied to unseen data sets.

  • example set (Data Table)

    The ExampleSet that was given as input is passed through this port to the output unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • use_bias

    This parameter indicates whether an intercept value should be calculated. Range: boolean

  • ridge

    This parameter specifies the ridge parameter used in ridge regression. Range: real
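The effect of the two parameters can be illustrated on the simplest case of one regular attribute and one label. This is a sketch under assumptions, not the operator's code: the function `fit_line` is hypothetical, and it assumes that the ridge penalty applies to the slope only and not to the intercept, which the text above does not specify.

```python
from statistics import fmean


def fit_line(xs, ys, use_bias=True, ridge=0.0):
    """Return (slope, intercept) minimising
    sum((y - slope * x - intercept) ** 2) + ridge * slope ** 2.

    Assumption: the intercept is not penalised. With use_bias=False the
    intercept is fixed at 0 and the line is forced through the origin.
    """
    if use_bias:
        mx, my = fmean(xs), fmean(ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sxy / (sxx + ridge)
        return slope, my - slope * mx
    slope = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + ridge)
    return slope, 0.0


# Hypothetical, noise-free data following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

plain = fit_line(xs, ys)                    # recovers slope 2 and intercept 1
shrunk = fit_line(xs, ys, ridge=5.0)        # larger ridge shrinks the slope
no_bias = fit_line(xs, ys, use_bias=False)  # no intercept term
print(plain, shrunk, no_bias)
```

A positive ridge value pulls the slope towards zero, which trades some bias for stability when attributes are strongly correlated or the data set is small.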

Tutorial Processes

Applying the Vector Linear Regression operator on the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Split Data operator splits the ExampleSet into training and testing data sets. The Vector Linear Regression operator is applied to the training data set with default values for all parameters. The resulting regression model is then applied to the testing data set using the Apply Model operator. The labeled data produced by the Apply Model operator can be seen in the Results Workspace.
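The same split, train, and apply workflow can be sketched outside RapidMiner. Everything here is an assumption made for illustration: the data is synthetic rather than the 'Polynomial' data set, the 70/30 split ratio and the seed are arbitrary, and `split_data`, `fit`, and `apply_model` are hypothetical stand-ins for the operators, restricted to a single regular attribute to keep the example short.

```python
import random
from statistics import fmean


def split_data(examples, train_ratio=0.7, seed=0):
    """Shuffle the examples and split them into training and testing partitions."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit(train, attribute, labels):
    """Least-squares slope and intercept of each label on a single attribute."""
    xs = [ex[attribute] for ex in train]
    mx = fmean(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    model = {}
    for label in labels:
        ys = [ex[label] for ex in train]
        my = fmean(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        model[label] = (slope, my - slope * mx)
    return model


def apply_model(model, examples, attribute):
    """Return copies of the examples with one prediction column added per label."""
    labelled = []
    for ex in examples:
        row = dict(ex)
        for label, (slope, intercept) in model.items():
            row[f"prediction({label})"] = slope * ex[attribute] + intercept
        labelled.append(row)
    return labelled


# Synthetic, noise-free data with two labels that depend linearly on a1.
data = [
    {"a1": float(i), "label_1": 2.0 * i + 1.0, "label_2": 4.0 - 0.5 * i}
    for i in range(10)
]
train, test = split_data(data)
model = fit(train, "a1", ["label_1", "label_2"])
labelled = apply_model(model, test, "a1")
for row in labelled:
    print(row)
```

Because the synthetic labels are exactly linear in `a1`, the predictions on the held-out examples match the true labels up to rounding error; real data would show a residual.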