Categories

Versions

Support Vector Machine (Evolutionary) (AI Studio Core)

Synopsis

This operator is a SVM implementation using an evolutionary algorithm to solve the dual optimization problem of an SVM.

Description

The Support Vector Machine (Evolutionary) uses an Evolutionary Strategy for optimization. This operator is a SVM implementation using an evolutionary algorithm to solve the dual optimization problem of an SVM. It turns out that on many datasets this simple implementation is as fast and accurate as the usual SVM implementations. In addition, it is also capable of learning with Kernels which are not positive semi-definite and can also be used for multi-objective learning which makes the selection of the parameter C unnecessary before learning. For more information please study 'Evolutionary Learning with Kernels: A Generic Solution for Large Margin Problems' by Ingo Mierswa.

This operator supports various kernel types including dot, radial, polynomial, sigmoid, anova, epachnenikov, gaussian combination and multiquadric. Explanation of these kernel types is given in the parameters section.

Here is a basic description of the SVM. The standard SVM takes a set of input data and predicts, for each given input, which of the two possible classes comprises the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite- dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier. Whereas the original problem may be stated in a finite dimensional space, it often happens that the sets to discriminate are not linearly separable in that space. For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. To keep the computational load reasonable, the mapping used by the SVM schemes are designed to ensure that dot products may be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function K(x,y) selected to suit the problem. The hyperplanes in the higher dimensional space are defined as the set of points whose inner product with a vector in that space is constant.

Input

  • training set (Data table)

    This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can be applied on data sets with numeric attributes. Thus often you may have to use the Nominal to Numerical operator before the application of this operator.

Output

  • model (Kernel Model)

    The SVM model is delivered from this output port. This model can now be applied on unseen data sets.

  • example set (Data table)

    The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

  • kernel typeThe type of the kernel function is selected through this parameter. Following kernel types are supported: dot, radial, polynomial, sigmoid, anova, epachnenikov, gaussian combination, multiquadric
    • dot: The dot kernel is defined by k(x,y)=x*y i.e. it is inner product of x and y.
    • radial: The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma, it is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.
    • polynomial: The polynomial kernel is defined by k(x,y)=(x*y+1)^d where d is the degree of polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.
    • sigmoid: The sigmoid kernel is defined by a two layered neural net tanh(a x*y+b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.
    • anova: The anova kernel is defined by raised to power d of summation of exp(-g (x-y)) where g is gamma and d is degree. gamma and degree are adjusted by the kernel gamma and kernel degree parameters respectively.
    • epanechnikov: The Epanechnikov kernel is this function (3/4)(1-u2) for u between -1 and 1 and zero for u outside that range. It has two adjustable parameters kernel sigma1 and kernel degree.
    • gaussian combination: This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.
    • multiquadric: The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.
  • kernel gammaThis is the kernel parameter gamma. This is only available when the kernel type parameter is set to radial or anova.
  • kernel sigma1This is the kernel parameter sigma1. This is only available when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.
  • kernel sigma2This is the kernel parameter sigma2. This is only available when the kernel type parameter is set to gaussian combination.
  • kernel sigma3This is the kernel parameter sigma3. This is only available when the kernel type parameter is set to gaussian combination.
  • kernel shiftThis is the kernel parameter shift. This is only available when the kernel type parameter is set to multiquadric.
  • kernel degreeThis is the kernel parameter degree. This is only available when the kernel type parameter is set to polynomial, anova or epachnenikov.
  • kernel aThis is the kernel parameter a. This is only available when the kernel type parameter is set to sigmoid
  • kernel bThis is the kernel parameter b. This is only available when the kernel type parameter is set to sigmoid
  • CThis is the complexity constant which sets the tolerance for misclassification, where higher C values allow for 'softer' boundaries and lower values create 'harder' boundaries. A complexity constant that is too large can lead to over-fitting, while values that are too small may result in over-generalization.
  • epsilonThis parameter specifies the width of the regression tube loss function of the regression SVM.
  • start population typeThis parameter specifies the type of start population initialization.
  • max generationsThis parameter specifies the number of generations after which the algorithm should be terminated.
  • generations without improvalThis parameter specifies the stop criterion for early stopping i.e. it stops after n generations without improvement in the performance. n is specified by this parameter.
  • population sizeThis parameter specifies the population size i.e. the number of individuals per generation. If set to -1, all examples are selected.
  • tournament fractionThis parameter specifies the fraction of the current population which should be used as tournament members.
  • keep bestThis parameter specifies if the best individual should survive. This is also called elitist selection. Retaining the best individuals in a generation unchanged in the next generation, is called elitism or elitist selection.
  • mutation typeThis parameter specifies the type of the mutation operator.
  • selection typeThis parameter specifies the selection scheme of this evolutionary algorithms.
  • crossover probThe probability for an individual to be selected for crossover is specified by this parameter.
  • use local random seedThis parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
  • local random seedThis parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
  • hold out set ratioThis operator uses this amount as a hold out set to estimate generalization error after learning.
  • show convergence plotThis parameter indicates if a dialog with a convergence plot should be drawn.
  • show population plotThis parameter indicates if the population plot in case of the non-dominated sorting should be shown.
  • return optimization performanceThis parameter indicates if final optimization fitness should be returned as performance.

Tutorial Processes

Introduction to the Support Vector Machine (Evolutionary) operator

This is a simple Example Process which gets you started with the SVM (Evolutionary) operator. The Retrieve operator is used to load the 'Golf' data set. The Nominal to Numerical operator is applied on it to convert its nominal attributes to numerical form. This step is necessary because the SVM (Evolutionary) operator cannot take nominal attributes, it can only classify using numerical attributes. The model generated from the SVM (Evolutionary) operator is then applied on the 'Golf-Testset' data set. The Nominal to Numerical operator was applied on this data set as well. This is necessary because the testing and training data set should be in the same format. The statistical performance of this model is measured using the Performance operator. This is a very basic process. It is recommended that you develop a deeper understanding of SVM for getting better results through this operator. The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps.

You should have a good understanding of kernel types and different parameters associated with each kernel type in order to get better results from this operator. The gaussian combination kernel was used in this example process. All parameters were used with default values. The accuracy of this model was just 35.71%. Try changing different parameters to get better results. If you change the parameter C to 1 instead of 0, you will see that accuracy of the model rises to 64.29%. Thus, you can see how making small changes in parameters can have a significant effect on overall results. Thus it is very necessary to have a good understanding of parameters of kernel type in use. It is equally important to have a good understanding of different kernel types, and choosing the most suitable kernel type for your ExampleSet. Try using the polynomial kernel in this Example Process (also set the parameter C to 0); you will see that accuracy is around 64.29% with default values for all parameters. Change the value of the parameter C to 1 instead of 0. Doing this increased the accuracy of model with gaussian combination kernel, but here you will see that accuracy of the model drops.

Default values were used for most of the parameters in the process. To get more accurate results these values should be carefully selected. Usually techniques like cross-validation are used to find the best values of these parameters for the ExampleSet under consideration.