(AI Studio Core)
Synopsis
This operator uses the distance between an example's label value and the result of a local polynomial regression to determine the weight of this example.
Description
This operator weights the examples, so the resulting example set contains a new weight attribute. If the example set already includes a weight attribute, its values are used as initial values for this algorithm. Otherwise, each example is assigned an initial weight of 1.
For calculating the weights, this operator will perform a local polynomial regression for each example. For more information about local polynomial regression, take a look at the operator description of the Local Polynomial Regression operator.
After the predicted result has been calculated, the residuals are computed and rescaled using their median.
This result is transformed by a smooth function that cuts off values greater than a threshold. This means that examples without prediction error receive a weight of 1, while examples with an error greater than the threshold are down-weighted to 0.
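As a rough illustration of the rescaling and cut-off step, the following Python sketch maps residuals to weights. The description does not name the smooth function or the threshold, so the bisquare function and the factor of 6 used here are assumptions borrowed from robust LOWESS, and `robust_weights` is a hypothetical helper rather than part of the operator.

```python
from statistics import median


def robust_weights(residuals, threshold=6.0):
    """Map residuals to weights in [0, 1].

    Assumption: bisquare function with cut-off at threshold * median(|residual|).
    """
    abs_res = [abs(r) for r in residuals]
    scale = median(abs_res)  # rescale the residuals by their median
    if scale == 0.0:
        # Degenerate case chosen for this sketch: only exact fits keep weight 1.
        return [1.0 if r == 0.0 else 0.0 for r in abs_res]
    weights = []
    for r in abs_res:
        u = r / (threshold * scale)
        # Smooth decay from 1 (no error) down to 0 (error at or above the cut-off).
        weights.append((1.0 - u * u) ** 2 if u < 1.0 else 0.0)
    return weights


residuals = [0.0, 0.1, -0.2, 0.15, 5.0]
weights = robust_weights(residuals)
```

Here the example with residual 0 keeps a weight of 1, the example with residual 5.0 lies beyond the cut-off and receives a weight of 0, and the remaining examples fall in between.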
This procedure is iterated as often as specified by the user and results in weights that penalize outliers heavily. This is especially useful for algorithms that use least squares optimization, such as Linear Regression, Polynomial Regression or Local Polynomial Regression, since least squares is very sensitive to outliers.
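The iteration can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the operator's implementation: it uses a single numerical attribute, a local polynomial of degree 1, a tricube smoothing kernel with a fixed bandwidth, no ridge penalty, and the bisquare cut-off at 6 times the median absolute residual. The names `local_linear_fit` and `iterate_weights` are hypothetical.

```python
from statistics import median


def local_linear_fit(xs, ys, weights, x0, bandwidth):
    """Weighted local linear fit (degree 1, one attribute), evaluated at x0."""
    sw = swx = swy = swxx = swxy = 0.0
    for x, y, w in zip(xs, ys, weights):
        d = abs(x - x0) / bandwidth
        # Tricube smoothing kernel (an assumption) times the example weight.
        kw = w * (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0
        dx = x - x0  # center on x0 so the intercept is the prediction
        sw += kw
        swx += kw * dx
        swy += kw * y
        swxx += kw * dx * dx
        swxy += kw * dx * y
    if sw == 0.0:
        return sum(ys) / len(ys)  # no usable neighbors: fall back to the mean
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate neighborhood: weighted mean
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw


def iterate_weights(xs, ys, iterations=3, bandwidth=4.0, threshold=6.0, initial=None):
    """Alternate between local fits and reweighting for a fixed number of rounds."""
    # Existing weights serve as initial values; otherwise every example starts at 1.
    weights = list(initial) if initial is not None else [1.0] * len(xs)
    for _ in range(iterations):
        preds = [local_linear_fit(xs, ys, weights, x, bandwidth) for x in xs]
        abs_res = [abs(y - p) for y, p in zip(ys, preds)]
        scale = median(abs_res)
        if scale == 0.0:
            break  # at least half the examples are fitted exactly; keep weights
        cut = threshold * scale
        weights = [(1.0 - (r / cut) ** 2) ** 2 if r < cut else 0.0 for r in abs_res]
    return weights


# Nearly linear data with one gross outlier at index 5.
noise = [0.03, -0.05, 0.02, 0.04, -0.03, 0.0, 0.05, -0.02, -0.04, 0.01]
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + n for x, n in zip(xs, noise)]
ys[5] += 20.0
weights = iterate_weights(xs, ys)
```

In this toy data set the outlier at index 5 ends up with a weight of 0, so a subsequent least squares learner that respects example weights would effectively ignore it.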
Input
- example set (Data table)
This is an example set input port
Output
- example set (Data table)
This is an example set output port
Parameters
- degree
Specifies the degree of the locally fitted polynomial. Keep in mind that a degree higher than 2 greatly increases calculation time and is likely to overfit.
- ridge factor
Specifies the ridge factor. This factor is used to penalize high coefficients. Increasing it can help to avoid overfitting.
- iterations
The number of iterations performed for the weight calculation. See the operator description for details.
- numerical measure
Selects the numerical measure.
- kernel type
The kernel type.
- kernel gamma
The kernel parameter gamma.
- kernel sigma1
The kernel parameter sigma1.
- kernel sigma2
The kernel parameter sigma2.
- kernel sigma3
The kernel parameter sigma3.
- kernel degree
The kernel parameter degree.
- kernel shift
The kernel parameter shift.
- kernel a
The kernel parameter a.
- kernel b
The kernel parameter b.
- neighborhood type
Determines which type of neighborhood is used: a fixed number of neighbors, all neighbors within a distance, or a mix of both.
- k
Specifies the number of neighbors in the neighborhood. Regardless of the local density, exactly this many examples are returned.
- fixed distance
Specifies the size of the neighborhood. All points within this distance are added.
- relative size
Specifies the size of the neighborhood relative to the total number of examples. A value of 0.04 includes 4% of the data points in the neighborhood.
- distance
Specifies the size of the neighborhood. All points within this distance are added.
- at least
If the neighborhood contains fewer examples than this number, the distance is increased until this number is met.
- smoothing kernel
Determines which kernel type is used to calculate the weights of distant examples.
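The neighborhood parameters above can be read as four selection rules. The Python sketch below illustrates them under assumptions: the type labels are hypothetical stand-ins for the operator's option names, distances are taken as precomputed, "within this distance" is read as less than or equal, and the relative size is rounded to the nearest whole number of examples.

```python
def select_neighbors(distances, neighborhood_type, k=5, fixed_distance=1.0,
                     relative_size=0.1, distance=1.0, at_least=3):
    """Return indices of the neighbors of a query point, nearest first.

    distances[i] is the distance from the query point to example i.
    The neighborhood_type labels are hypothetical, not the operator's own.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    if neighborhood_type == "fixed number":
        # Always the k nearest examples, regardless of local density.
        return order[:k]
    if neighborhood_type == "fixed distance":
        # Every example within the given radius.
        return [i for i in order if distances[i] <= fixed_distance]
    if neighborhood_type == "relative size":
        # A fraction of the whole example set, at least one example.
        count = max(1, round(relative_size * len(distances)))
        return order[:count]
    if neighborhood_type == "distance but at least":
        # Radius-based, but widened until at_least examples are included.
        within = [i for i in order if distances[i] <= distance]
        return within if len(within) >= at_least else order[:at_least]
    raise ValueError(f"unknown neighborhood type: {neighborhood_type}")


distances = [0.5, 2.0, 0.1, 1.5, 3.0, 0.9]
by_count = select_neighbors(distances, "fixed number", k=2)
by_radius = select_neighbors(distances, "fixed distance", fixed_distance=1.0)
by_share = select_neighbors(distances, "relative size", relative_size=0.5)
mixed = select_neighbors(distances, "distance but at least", distance=0.2, at_least=3)
```

With these six distances, the fixed-number rule returns the two nearest examples, while the radius of 0.2 in the mixed rule captures only one example and is therefore widened to the three nearest.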