Prescriptive Analytics (Model Simulator)

Synopsis

Given a model and a desired output, this operator automatically finds the optimal inputs.

Description

In predictive modeling, a model is used to predict an outcome, given an input. This operator reverses that procedure, starting with a model and a desired output, and prescribing an optimized input to achieve the desired outcome.

The operator uses an evolutionary optimization method, based on the model, with one of the following targets:

minimize confidence for a class
maximize confidence for a class
get as close as possible to a certain confidence for a class
minimize regression prediction
maximize regression prediction
get as close as possible to a certain regression prediction
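
The six targets above reduce to three directions applied to a single number, which is either the confidence for a class or the regression prediction. The following minimal sketch shows one way such a target could be turned into a score for an optimizer to maximize. The function name `fitness` and the direction strings are illustrative assumptions, not the operator's internal API.

```python
from typing import Optional


def fitness(model_output: float, direction: str,
            target: Optional[float] = None) -> float:
    """Score a candidate input from the model's output; higher is better.

    model_output is either the confidence for the chosen class or the
    regression prediction. All names here are hypothetical.
    """
    if direction == "maximize":
        return model_output
    if direction == "minimize":
        # Negate so that a smaller output yields a larger score.
        return -model_output
    if direction == "specific value":
        if target is None:
            raise ValueError("a target is required for 'specific value'")
        # The closer the output is to the target, the higher the score.
        return -abs(model_output - target)
    raise ValueError(f"unknown direction: {direction!r}")
```

With this convention, an optimizer only ever has to maximize one function, whichever of the six targets is selected.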

The training data can be used to constrain the optimization, so that all numerical values satisfy one or more of the following conditions:

stay close to the average, within 1 / 2 / 3 times the standard deviation
stay above the minimum
stay below the maximum
stay above a certain value
stay below a certain value

Moreover, the user may assign constant values to any of the attributes, overriding the above conditions.
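
As a rough illustration of how these conditions might combine, the sketch below derives a lower and upper bound for one numerical attribute from its training values, then applies constant values on top. The helper names `numeric_bounds` and `apply_constants`, the attribute names, and the use of the population standard deviation are assumptions made for this example. The operator's actual computation is not documented here.

```python
import statistics
from typing import Dict, Optional, Sequence, Tuple


def numeric_bounds(values: Sequence[float],
                   std_devs: Optional[float] = None,
                   above_min: bool = False,
                   below_max: bool = False,
                   min_value: Optional[float] = None,
                   max_value: Optional[float] = None) -> Tuple[float, float]:
    """Intersect all active conditions into one (lower, upper) interval."""
    lower, upper = float("-inf"), float("inf")
    if std_devs is not None:
        # Assumption: population standard deviation of the training column.
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        lower = max(lower, mean - std_devs * sd)
        upper = min(upper, mean + std_devs * sd)
    if above_min:
        lower = max(lower, min(values))
    if below_max:
        upper = min(upper, max(values))
    if min_value is not None:
        lower = max(lower, min_value)
    if max_value is not None:
        upper = min(upper, max_value)
    return lower, upper


def apply_constants(candidate: Dict[str, object],
                    constants: Dict[str, object]) -> Dict[str, object]:
    """Constant attribute values override whatever the optimizer proposed."""
    return {**candidate, **constants}
```

For example, `apply_constants({"age": 25, "fare": 50.0}, {"age": 40})` keeps the proposed fare but forces the age to 40, regardless of any bounds on age.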

Input

model (Model)
This port expects the model whose optimal inputs should be identified.
training data (Data table)
This port expects the same ExampleSet that was used to create the model.

Output

optimal data (Data table)
The optimal data which, when used as an input to the model, delivers the desired result.

Parameters

classification
Indicates if the model is a classification model or a regression model. Range: boolean
class name
The class for which the confidence should be optimized. Range: string
optimization direction
The optimization strategy: minimize, maximize, or specify a value. A specific value can be useful for regression / forecasting problems. Range: selection
value to reach
Specifies the confidence or regression value which should be reached. Only available if the value for "optimization direction" is "specific value". Range: real
stay around average (numerical)
Indicates if numerical values should stay within a specified range around the average value. This helps to prevent extreme values that might not be feasible as inputs. Range: boolean
standard deviations around average
Defines the number of standard deviations the values can move away from the numerical average. Range: real
stay above global minimum (numerical)
Indicates if numerical values should stay above the minimum value of the corresponding attribute. Range: boolean
stay below global maximum (numerical)
Indicates if numerical values should stay below the maximum value of the corresponding attribute. Range: boolean
stay above value (numerical)
Indicates if numerical values should stay above a specified value. Range: boolean
minimum value
Attribute values during optimization should stay above this value. Range: real
stay below value (numerical)
Indicates if numerical values should stay below a specified value. Range: boolean
maximum value
Attribute values during optimization should stay below this value. Range: real
constant attribute values
A list of attributes which should be kept at constant values. You can specify name-value pairs with the attribute name on the left and the desired constant value on the right. Range: list
limit type
Defines when the optimization ends. No limit uses a heuristic to detect the optimum. Time limit stops after the specified time. Generations stops after the specified number of generations is reached. Range: selection
maximum generations
The maximum number of generations for the evolutionary optimization algorithm. Only available if the limit is "generations and population size". Range: integer
population size
The number of individuals in the population of the evolutionary optimization algorithm. Only available if the limit is "generations and population size". Range: integer
time limit (in seconds)
The maximum number of seconds the optimization will run. Only available if the limit is "time limit". Range: integer
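
To make the roles of population size, maximum generations, and time limit concrete, here is a minimal sketch of a generic evolutionary loop over bounded numerical inputs. It is not the operator's actual algorithm: the function `evolve`, its selection and mutation scheme, and the fixed random seed are all assumptions chosen for illustration. The "no limit" heuristic is not described in this documentation, so the sketch does not attempt it.

```python
import random
import time
from typing import Callable, List, Optional, Sequence, Tuple


def evolve(score: Callable[[List[float]], float],
           bounds: Sequence[Tuple[float, float]],
           population_size: int = 20,
           max_generations: Optional[int] = 50,
           time_limit: Optional[float] = None,
           seed: int = 0) -> List[float]:
    """Maximize score(candidate) inside the box given by bounds."""
    if max_generations is None and time_limit is None:
        # The "no limit" stopping heuristic is unspecified, so it is not modeled.
        raise ValueError("set max_generations or time_limit")
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(population_size)]
    start = time.monotonic()
    generation = 0
    while True:
        # Best candidates first.
        population.sort(key=score, reverse=True)
        if max_generations is not None and generation >= max_generations:
            break
        if time_limit is not None and time.monotonic() - start >= time_limit:
            break
        # Keep the better half as parents (elitism), refill with children.
        parents = population[:max(2, population_size // 2)]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = []
            for (lo, hi), x, y in zip(bounds, a, b):
                # Uniform crossover plus Gaussian mutation, clamped to bounds.
                gene = rng.choice((x, y)) + rng.gauss(0, 0.1 * (hi - lo))
                child.append(min(hi, max(lo, gene)))
            children.append(child)
        population = parents + children
        generation += 1
    return population[0]
```

In this sketch, `score` would wrap the model: it takes a candidate input, asks the model for a confidence or prediction, and returns a number where higher is better. A larger population or more generations explores more candidates at the cost of more model evaluations.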

Tutorial Processes

Prescriptive Analytics for Titanic

This process trains a Naive Bayes model on the Titanic data. It then uses the Prescriptive Analytics operator to find the attribute values that maximize the likelihood of survival.

Most default parameter values will deliver reasonable results without going to extremes, but a few settings are important here. First, we defined that this is a classification problem and that we want to maximize the confidence for the prediction of "Yes". We also set constant values for attributes that a passenger of the Titanic cannot easily change, namely the age and the gender of the person. We used the values 40 and Female here.

After the process is executed, the result is a new ExampleSet that shows the optimal settings for this case. If you purchase a first-class ticket for $133 and travel with only one parent or child, the model gives you a 99% likelihood of survival.
