Gaussian Process (RapidMiner Studio Core)

Synopsis

This operator implements a Gaussian Process (GP), a probabilistic method for both classification and regression.

Description

A Gaussian process is a stochastic process whose realizations consist of random values associated with every point in a range of times (or of space), such that each of these random variables has a normal distribution. Moreover, every finite collection of those random variables has a multivariate normal distribution. Gaussian processes are important in statistical modeling because of properties inherited from the normal distribution. For example, if a random process is modeled as a Gaussian process, the distributions of various derived quantities can be obtained explicitly. Such quantities include the average value of the process over a range of times and the error in estimating that average from sample values at a small set of times.
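The finite-collection property can be checked numerically with a minimal sketch. It assumes a squared-exponential (rbf) covariance with lengthscale 1, chosen here only for illustration. Under that assumption, the values f(t1) and f(t2) at two times form a bivariate normal with unit variances and correlation k(t1, t2). Sampling many such pairs and estimating their correlation should therefore recover the kernel value. The function names are hypothetical.

```python
import math
import random


def rbf(t1, t2, lengthscale=1.0):
    """Assumed squared-exponential covariance between f(t1) and f(t2)."""
    return math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))


def sample_pair(t1, t2, rng):
    """Draw (f(t1), f(t2)) from N(0, [[1, rho], [rho, 1]]) with rho = rbf(t1, t2)."""
    rho = rbf(t1, t2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2


def empirical_correlation(t1, t2, n=20000, seed=0):
    """Monte Carlo estimate of corr(f(t1), f(t2)); it should approach rbf(t1, t2)."""
    rng = random.Random(seed)
    pairs = [sample_pair(t1, t2, rng) for _ in range(n)]
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(va * vb)
```

Nearby times are strongly correlated (for example `empirical_correlation(0.0, 0.5)` is close to `rbf(0.0, 0.5)`, about 0.88). Times several lengthscales apart are almost independent.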

Gaussian processes (GPs) extend multivariate Gaussian distributions to infinite dimensionality. Formally, a Gaussian process generates data located throughout some domain such that any finite subset of the range follows a multivariate Gaussian distribution. Gaussian processes are a powerful non-parametric machine learning technique for building probabilistic models of real-world problems. They can be applied to geostatistics, supervised, unsupervised and reinforcement learning, principal component analysis, system identification and control, rendering music performance, optimization and many other tasks.
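A small, self-contained sketch can show what "non-parametric" and "probabilistic" mean for regression. The prediction is built directly from the training points rather than from a fixed set of weights, and it comes with a variance as well as a mean. The sketch assumes scalar inputs, an rbf kernel with lengthscale 1, and Gaussian observation noise of variance 0.01. It uses the standard exact GP regression formulas and is not the operator's implementation, which is controlled by the kernel and basis-vector parameters listed under Parameters. All names are hypothetical.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Assumed squared-exponential kernel on scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-2):
    """Posterior mean and variance of f(x_new) given y = f(x) + Gaussian noise."""
    k_mat = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
             for i, a in enumerate(x_train)]
    k_star = [rbf(x_new, a) for a in x_train]
    alpha = solve(k_mat, y_train)  # (K + noise*I)^-1 y
    v = solve(k_mat, k_star)       # (K + noise*I)^-1 k_*
    mean = sum(k * w for k, w in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
mean_near, var_near = gp_predict(x_train, y_train, 1.0)  # at a training input
mean_far, var_far = gp_predict(x_train, y_train, 10.0)   # far from all data
```

Near the training inputs, the predicted mean tracks the data and the variance drops below the noise level. Far from them, the prediction falls back to the prior mean 0 with a variance close to 1.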

Input

training set (Data Table)
This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can only be applied to data sets with numeric attributes. You may therefore need to apply the Nominal to Numerical operator before this operator.
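To illustrate the preprocessing step, the hypothetical helper below performs a dummy (one-hot) coding of a single nominal attribute. It produces one 0/1 numeric column per distinct value. This is a sketch of the general idea, not the Nominal to Numerical operator's implementation, and the `is_<value>` column names are made up.

```python
def dummy_code(values):
    """Hypothetical helper: map each nominal value to a 0/1 indicator vector."""
    categories = sorted(set(values))              # one output column per distinct value
    columns = [f"is_{c}" for c in categories]     # made-up column naming scheme
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return columns, rows


cols, rows = dummy_code(["red", "green", "red", "blue"])
```

After such a coding, every attribute is numeric, so a kernel can compute distances or dot products between examples.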

Output

model (Kernel Model)
The Gaussian Process is trained on the input ExampleSet and the resulting model is delivered from this output port. This model can then be applied to unseen data sets.
example set (Data Table)
The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

kernel_type
The type of the kernel function is selected through this parameter. The following kernel types are supported: rbf, cauchy, laplace, poly, sigmoid, Epanechnikov, gaussian combination, multiquadric. Range: selection
kernel_lengthscale
This parameter specifies the lengthscale used in all kernels. Range: real
kernel_degree
This is the kernel parameter degree. It is only available when the kernel type parameter is set to poly or Epanechnikov. Range: real
kernel_bias
This parameter specifies the bias used in the poly kernel. Range: real
kernel_sigma1
This is the kernel parameter sigma1. It is only available when the kernel type parameter is set to Epanechnikov, gaussian combination or multiquadric. Range: real
kernel_sigma2
This is the kernel parameter sigma2. It is only available when the kernel type parameter is set to gaussian combination. Range: real
kernel_sigma3
This is the kernel parameter sigma3. It is only available when the kernel type parameter is set to gaussian combination. Range: real
kernel_shift
This is the kernel parameter shift. It is only available when the kernel type parameter is set to multiquadric. Range: real
kernel_a
This is the kernel parameter a. It is only available when the kernel type parameter is set to sigmoid. Range: real
kernel_b
This is the kernel parameter b. It is only available when the kernel type parameter is set to sigmoid. Range: real
max_basis_vectors
This parameter specifies the maximum number of basis vectors to be used. Range: integer
epsilon_tol
This parameter specifies the tolerance for gamma-induced projections. Range: real
geometrical_tol
This parameter specifies the tolerance for geometry-induced projections. Range: real
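Several of the kernel types selectable through kernel_type can be sketched with common textbook formulas. These forms are assumptions for illustration only. RapidMiner's exact parameterization, for example how kernel_lengthscale enters each kernel, may differ. The Epanechnikov, gaussian combination and multiquadric kernels are left out because their exact forms are not given here. Inputs are equal-length sequences of numeric attribute values.

```python
import math

# Textbook kernel forms (assumed, not taken from RapidMiner's source code).


def _sq_dist(x, y):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def _dot(x, y):
    """Dot product of two attribute vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf(x, y, lengthscale=1.0):
    return math.exp(-_sq_dist(x, y) / (2.0 * lengthscale ** 2))


def laplace(x, y, lengthscale=1.0):
    return math.exp(-math.sqrt(_sq_dist(x, y)) / lengthscale)


def cauchy(x, y, lengthscale=1.0):
    return 1.0 / (1.0 + _sq_dist(x, y) / lengthscale ** 2)


def poly(x, y, degree=2, bias=1.0):
    # degree and bias play the roles of kernel_degree and kernel_bias (assumed mapping)
    return (_dot(x, y) + bias) ** degree


def sigmoid(x, y, a=1.0, b=0.0):
    # a and b play the roles of kernel_a and kernel_b (assumed mapping)
    return math.tanh(a * _dot(x, y) + b)
```

The distance-based kernels (rbf, laplace, cauchy) equal 1 for identical examples and decay with distance at a rate set by the lengthscale. The poly and sigmoid kernels depend on the dot product instead.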

Tutorial Processes

Introduction to the Gaussian Process operator

The 'Polynomial' data set is loaded using the Retrieve operator. The Split Validation operator is applied to it for training and testing a regression model. The Gaussian Process operator is placed in the training subprocess of the Split Validation operator, with all parameters at their default values, and generates a regression model. The Apply Model operator is used in the testing subprocess to apply this model to the testing data set. The resulting labeled ExampleSet is used by the Performance operator to measure the performance of the model. The regression model and its performance vector are connected to the result ports and can be viewed in the Results Workspace.
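The same workflow can be sketched outside RapidMiner under stated assumptions: shuffle and split the data, fit a GP on the training part, apply it to the held-out part, and measure performance. The data below is a hypothetical cubic stand-in, not the 'Polynomial' data set. The 70/30 split ratio, the rbf kernel with lengthscale 1, the noise variance 0.01 and the use of RMSE as the performance criterion are all assumptions. All function names are hypothetical.

```python
import math
import random


def rbf(x, y, lengthscale=1.0):
    """Assumed squared-exponential kernel on scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def cholesky_solve(a, b):
    """Solve a x = b for a symmetric positive-definite matrix a via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def split_validation(xs, ys, train_ratio=0.7, noise=1e-2, seed=0):
    """Hold-out split, GP fit on the training part, RMSE on the test part."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(train_ratio * len(idx))
    train, test = idx[:cut], idx[cut:]
    # Training subprocess: solve (K + noise*I) alpha = y for the training examples.
    k_mat = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in train] for i in train]
    alpha = cholesky_solve(k_mat, [ys[i] for i in train])
    # Testing subprocess: apply the model (posterior mean), then measure RMSE.
    preds = [sum(w * rbf(xs[t], xs[i]) for w, i in zip(alpha, train)) for t in test]
    return math.sqrt(sum((p - ys[t]) ** 2 for p, t in zip(preds, test)) / len(test))


xs = [i / 10 for i in range(-20, 21)]  # hypothetical inputs in [-2, 2]
ys = [x ** 3 - 2 * x for x in xs]      # hypothetical cubic target
rmse = split_validation(xs, ys)
```

Fixing the shuffle seed makes the split reproducible, much like using a fixed local random seed in a validation operator.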