You are viewing the RapidMiner Studio documentation for version 8.1.

Prescriptive Analytics (Model Simulator)

Synopsis

Given a model and a desired output, this operator automatically finds the optimal inputs.

Description

In predictive modeling, a model is used to predict an outcome, given an input. This operator reverses that procedure, starting with a model and a desired output, and prescribing an optimized input to achieve the desired outcome.

The operator uses an evolutionary optimization method, based on the model, with one of the following targets:

  • minimize confidence for a class
  • maximize confidence for a class
  • get as close as possible to a certain confidence for a class
  • minimize regression prediction
  • maximize regression prediction
  • get as close as possible to a certain regression prediction
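
The three kinds of target above can be read as one scoring rule applied to the model output (a class confidence or a regression prediction). The following Python sketch is only an illustration of that idea, not the operator's internal code; the function name `fitness` and the direction strings are assumptions made for this example.

```python
def fitness(predicted, direction, target=None):
    """Score a model output so that a higher score is always better.

    `predicted` is a class confidence or a regression prediction.
    The direction names are illustrative, not RapidMiner identifiers.
    """
    if direction == "maximize":
        return predicted
    if direction == "minimize":
        return -predicted
    if direction == "specific value":
        if target is None:
            raise ValueError("a target value is required for 'specific value'")
        # The closer the output is to the target, the higher the score.
        return -abs(predicted - target)
    raise ValueError(f"unknown direction: {direction!r}")
```

With a scoring rule of this shape, an optimizer only ever has to maximize a single number, whichever target was chosen.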

The training data can be used to constrain the optimization, so that all numerical values satisfy one or more of the following conditions:

  • stay close to the average, within 1 / 2 / 3 times the standard deviation
  • stay above the minimum
  • stay below the maximum
  • stay above a certain value
  • stay below a certain value

Moreover, the user may assign constant values to any of the attributes, overriding the above conditions.
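
As a rough illustration of how such conditions could be turned into a search range, the Python sketch below derives a lower and upper bound for one numerical attribute from its training values. It is not the operator's implementation: the function names, the choice of the population standard deviation, and the rule that several active conditions are combined by taking the tightest bound are all assumptions made for this example.

```python
import statistics


def numeric_bounds(values, std_devs=None, above_min=False, below_max=False,
                   min_value=None, max_value=None):
    """Return (lower, upper) for one numerical attribute.

    Assumption: every active condition narrows the range, so the
    tightest lower and upper bound win.
    """
    lower, upper = float("-inf"), float("inf")
    if std_devs is not None:
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)  # assumption: population std. dev.
        lower = max(lower, mean - std_devs * sd)
        upper = min(upper, mean + std_devs * sd)
    if above_min:
        lower = max(lower, min(values))
    if below_max:
        upper = min(upper, max(values))
    if min_value is not None:
        lower = max(lower, min_value)
    if max_value is not None:
        upper = min(upper, max_value)
    return lower, upper


def apply_constants(candidate, constants):
    """Overwrite attributes that the user fixed to constant values."""
    fixed = dict(candidate)
    fixed.update(constants)
    return fixed
```

For example, `apply_constants({"age": 25.0, "fare": 10.0}, {"age": 40})` keeps the candidate's fare but forces the age to 40, whatever range was computed for it.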

Input

  • model (Model)

    This port expects the model whose optimal inputs should be identified.

  • training data (Data Table)

    This port expects an ExampleSet, the same ExampleSet that was used to create the model.

Output

  • optimal data (Data Table)

    The optimal data which, when used as an input to the model, delivers the desired result.

Parameters

  • classification Indicates if the model is a classification model or a regression model. Range: boolean
  • class name The class for which the confidence should be optimized. Range: string
  • optimization direction The optimization strategy: minimize, maximize, or reach a specific value. A specific value can be useful for regression / forecasting problems. Range: selection
  • value to reach Specify a confidence or regression value which should be reached. Only available if the value for "optimization direction" is "specific value". Range: real
  • stay around average (numerical) Indicates if numerical values should stay within a specified range around the average value. This helps to prevent extreme values that might not be feasible as inputs. Range: boolean
  • standard deviations around average Defines the number of standard deviations the values can move away from the numerical average. Range: real
  • stay above global minimum (numerical) Indicates if numerical values should stay above the minimum value of the corresponding attribute. Range: boolean
  • stay below global maximum (numerical) Indicates if numerical values should stay below the maximum value of the corresponding attribute. Range: boolean
  • stay above value (numerical) Indicates if numerical values should stay above a specified value. Range: boolean
  • minimum value Attribute values during optimization should stay above this value. Range: real
  • stay below value (numerical) Indicates if numerical values should stay below a specified value. Range: boolean
  • maximum value Attribute values during optimization should stay below this value. Range: real
  • constant attribute values A list of attributes which should be kept at constant values. You can specify name-value pairs with the attribute name on the left and the desired constant value on the right. Range: list
  • limit type Defines when the optimization ends. No limit uses a heuristic to detect the optimum. Time limit stops after the specified time. Generations stops after the specified number of generations is reached. Range: selection
  • maximum generations The maximum number of generations for the evolutionary optimization algorithm. Only available if the limit is "generations and population size". Range: integer
  • population size The number of individuals in the population of the evolutionary optimization algorithm. Only available if the limit is "generations and population size". Range: integer
  • time limit (in seconds) The maximum number of seconds the optimization will run. Only available if the limit is "time limit". Range: integer
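
To make the roles of the population size, the generation limit, and the time limit concrete, here is a minimal evolutionary search in Python. It is a generic sketch under stated assumptions, not the algorithm RapidMiner ships: `evolve`, `score`, and `bounds` are hypothetical names, the mutation and selection scheme is one simple choice among many, all bounds must be finite, and the "no limit" heuristic is not modeled.

```python
import random
import time


def evolve(score, bounds, population_size=20, max_generations=50,
           time_limit=None, seed=0):
    """Search for the attribute values that maximize `score`.

    `bounds` maps attribute names to finite (lower, upper) pairs and
    `score` maps a candidate (a dict of attribute values) to a number.
    """
    if max_generations is None and time_limit is None:
        raise ValueError("set max_generations, time_limit, or both")
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        return {n: rng.uniform(*bounds[n]) for n in names}

    def mutate(parent):
        child = {}
        for n in names:
            lo, hi = bounds[n]
            value = parent[n] + rng.gauss(0, 0.1 * (hi - lo))
            child[n] = min(hi, max(lo, value))  # clip back into the range
        return child

    population = [random_individual() for _ in range(population_size)]
    start = time.monotonic()
    generation = 0
    while True:
        if max_generations is not None and generation >= max_generations:
            break
        if time_limit is not None and time.monotonic() - start >= time_limit:
            break
        children = [mutate(rng.choice(population))
                    for _ in range(population_size)]
        # Keep the best individuals among parents and children.
        population = sorted(population + children, key=score,
                            reverse=True)[:population_size]
        generation += 1
    return max(population, key=score)
```

In a real setting, `score` would apply the model to the candidate and convert the resulting confidence or prediction into a number to maximize. A toy call such as `evolve(lambda c: -(c["x"] - 3.0) ** 2, {"x": (0.0, 10.0)})` searches for the value of `x` closest to 3.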

Tutorial Processes

Prescriptive Analytics for Titanic

This process trains a Naive Bayes model on the Titanic data. It then uses the Prescriptive Analytics operator to find the attribute values that maximize the likelihood of survival.

Please note that most default parameter values will deliver reasonable results without going to extremes. However, we changed a few important settings. First, we defined that this is a classification problem and that we want to maximize the confidence for the prediction "Yes". We also set constant values for attributes that a passenger of the Titanic cannot easily change: the age and the gender of the person. We used the values 40 and Female here.

After the process is executed, the result is a new ExampleSet that shows the optimal settings for this case. If you purchase a first-class ticket for $133 and travel with only one parent or child, you will have a 99% likelihood of survival.