Relevance Vector Machine (RapidMiner Studio Core)
Synopsis
This operator is an implementation of the Relevance Vector Machine (RVM), a probabilistic method for both classification and regression.
Description
The Relevance Vector Machine operator is a probabilistic method for both classification and regression. The implementation is based on the original algorithm described by 'Tipping/2001'. The fast version of the marginal likelihood maximization ('Tipping/Faul/2003') is also available if the rvm type parameter is set to 'Constructive-Regression-RVM'.
A Relevance Vector Machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and classification. The RVM has the same functional form as the support vector machine (SVM), but provides probabilistic classification. It is equivalent to a Gaussian process model with a certain covariance function. Compared with the SVM, the Bayesian formulation of the RVM avoids the SVM's free parameters, which usually require cross-validation-based post-optimization. However, RVMs use an expectation maximization (EM)-like learning method and are therefore at risk of getting stuck in local optima. This is unlike the standard sequential minimal optimization (SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum.
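As a sketch of the functional form mentioned above, the standard regression formulation from 'Tipping/2001' can be written as follows. The symbols are notation introduced here for illustration (N training examples x_i with targets t_n, kernel K, weights w_i, weight precisions \alpha_i, noise variance \sigma^2); they are not operator parameters.

```latex
\begin{align}
  y(\mathbf{x}; \mathbf{w}) &= \sum_{i=1}^{N} w_i \, K(\mathbf{x}, \mathbf{x}_i) + w_0 \\
  t_n &= y(\mathbf{x}_n; \mathbf{w}) + \epsilon_n,
      \qquad \epsilon_n \sim \mathcal{N}(0, \sigma^2) \\
  p(\mathbf{w} \mid \boldsymbol{\alpha}) &= \prod_{i=0}^{N} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right)
\end{align}
```

Each weight has its own precision \alpha_i. During training many of these precisions grow without bound, which pins the corresponding weights at zero; the training examples whose weights survive are the relevance vectors.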
Input
- training set (Data Table)
This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can only be applied to data sets with numeric attributes. You may therefore often need to apply the Nominal to Numerical operator before this operator.
Output
- model (Kernel Model)
The RVM is trained on the input ExampleSet and the resulting model is delivered through this output port. The model can then be applied to unseen data sets.
- example set (Data Table)
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
- rvm_type
This parameter specifies the type of RVM. The following options are available: Regression-RVM, Classification-RVM and Constructive-Regression-RVM. Range: selection
- kernel_type
The type of the kernel function is selected through this parameter. The following kernel types are supported: rbf, cauchy, laplace, poly, sigmoid, Epanechnikov, gaussian combination, multiquadric. Range: selection
- kernel_lengthscale
This parameter specifies the lengthscale to be used in all kernels. Range: real
- kernel_degree
This is the kernel parameter degree. This is only available when the kernel type parameter is set to poly or Epanechnikov. Range: real
- kernel_bias
This parameter specifies the bias to be used in the poly kernel. Range: real
- kernel_sigma1
This is the kernel parameter sigma1. This is only available when the kernel type parameter is set to Epanechnikov, gaussian combination or multiquadric. Range: real
- kernel_sigma2
This is the kernel parameter sigma2. This is only available when the kernel type parameter is set to gaussian combination. Range: real
- kernel_sigma3
This is the kernel parameter sigma3. This is only available when the kernel type parameter is set to gaussian combination. Range: real
- kernel_shift
This is the kernel parameter shift. This is only available when the kernel type parameter is set to multiquadric. Range: real
- kernel_a
This is the kernel parameter a. This is only available when the kernel type parameter is set to sigmoid. Range: real
- kernel_b
This is the kernel parameter b. This is only available when the kernel type parameter is set to sigmoid. Range: real
- max_iteration
This parameter specifies the maximum number of iterations to be used. Range: integer
- min_delta_log_alpha
The iteration is aborted if the largest change in log alpha is smaller than min delta log alpha. Range: real
- alpha_max
A basis function is pruned if its alpha is larger than alpha max. Range: real
- use_local_random_seed
This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization. Range: boolean
- local_random_seed
This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true. Range: integer
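To illustrate how max_iteration, min_delta_log_alpha and alpha_max interact, here is a minimal NumPy sketch of the iterative re-estimation scheme for RVM regression in the style of 'Tipping/2001'. It is not RapidMiner's implementation: the function names, the RBF kernel formula, the initial noise precision and all default values are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale):
    # Assumed RBF form: exp(-||a - b||^2 / (2 * lengthscale^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def design_matrix(X, centers, lengthscale):
    # Bias column plus one kernel basis function per training example.
    bias = np.ones((X.shape[0], 1))
    return np.hstack([bias, rbf_kernel(X, centers, lengthscale)])

def posterior(P, alpha, beta, t):
    # Gaussian posterior over the weights of the active basis functions.
    Sigma = np.linalg.inv(np.diag(alpha) + beta * P.T @ P)
    mu = beta * Sigma @ P.T @ t
    return mu, Sigma

def fit_rvm_regression(X, t, lengthscale=1.0, max_iteration=500,
                       min_delta_log_alpha=1e-3, alpha_max=1e6):
    # Defaults are illustrative, not RapidMiner's.
    n = X.shape[0]
    Phi = design_matrix(X, X, lengthscale)
    active = np.arange(Phi.shape[1])         # indices of unpruned basis functions
    alpha = np.ones(Phi.shape[1])            # one precision per weight
    beta = 10.0 / max(np.var(t), 1e-12)      # initial noise precision (assumption)
    for _ in range(max_iteration):
        P = Phi[:, active]
        mu, Sigma = posterior(P, alpha[active], beta, t)
        # gamma_i measures how well-determined weight i is by the data.
        gamma = np.clip(1.0 - alpha[active] * np.diag(Sigma), 1e-12, 1.0)
        new_alpha = gamma / np.maximum(mu ** 2, 1e-300)
        delta = np.max(np.abs(np.log(new_alpha) - np.log(alpha[active])))
        alpha[active] = new_alpha
        resid = t - P @ mu
        beta = max(n - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        # Prune basis functions whose alpha exceeds alpha_max (keep at least one).
        keep = alpha[active] <= alpha_max
        if keep.any():
            active = active[keep]
        # Stop once the largest change in log alpha is below the threshold.
        if delta < min_delta_log_alpha:
            break
    mu, _ = posterior(Phi[:, active], alpha[active], beta, t)
    return {"X": X, "active": active, "mu": mu, "lengthscale": lengthscale}

def predict_rvm(model, X_new):
    Phi = design_matrix(X_new, model["X"], model["lengthscale"])
    return Phi[:, model["active"]] @ model["mu"]

# Usage on synthetic data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 60).reshape(-1, 1)
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
model = fit_rvm_regression(X, t)
pred = predict_rvm(model, X)
print("basis functions kept:", model["active"].size, "of", X.shape[0] + 1)
```

In this sketch a smaller alpha_max prunes basis functions earlier and tends to give a sparser model, while a smaller min_delta_log_alpha makes the loop run longer before it is considered converged; max_iteration caps the loop either way.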
Tutorial Processes
Introduction to the RVM operator
The 'Polynomial' data set is loaded using the Retrieve operator. The Split Validation operator is applied to it for training and testing a regression model. The Relevance Vector Machine operator is applied in the training subprocess of the Split Validation operator. All parameters are left at their default values. The Relevance Vector Machine operator generates a regression model. The Apply Model operator is used in the testing subprocess to apply this model to the testing data set. The resulting labeled ExampleSet is used by the Performance operator to measure the performance of the model. The regression model and its performance vector are connected to the output, and both can be viewed in the Results Workspace.
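The data flow of this process (split, train, apply, measure) can be sketched outside RapidMiner as follows. Everything in the sketch is a stand-in: the data is synthetic rather than the 'Polynomial' data set, the 70/30 split is an assumed ratio, and a cubic least-squares fit replaces the Relevance Vector Machine so that the example stays short.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data (stand-in for the 'Polynomial' data set).
x = rng.uniform(-3, 3, 200)
y = x ** 3 - 2 * x + rng.normal(0, 0.5, 200)

# Split Validation: shuffle, then hold out 30% for testing (assumed ratio).
idx = rng.permutation(200)
train, test = idx[:140], idx[140:]

# Training subprocess: fit the stand-in learner on the training part only.
coeffs = np.polyfit(x[train], y[train], deg=3)

# Testing subprocess: apply the model, then measure performance.
pred = np.polyval(coeffs, x[test])
rmse = np.sqrt(np.mean((pred - y[test]) ** 2))
print("test RMSE:", rmse)
```

The point of the split is that the performance figure is computed only on examples the learner never saw during training.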