Write Special Format (RapidMiner Studio Core)

Synopsis

This operator writes an ExampleSet, or a subset of an ExampleSet, in a special user-defined format.

Description

The path of the file is specified through the example set file parameter. The special format parameter is used for specifying the exact format. The character following the $ character introduces a command. Additional arguments to this command may be supplied by enclosing them in square brackets. The following commands can be used in the special format parameter:

  • $a : This command writes all attributes separated by the default separator.
  • $a[separator] : This command writes all attributes separated by a separator (the separator is specified as an argument in brackets).
  • $s[separator][indexSeparator] : This command writes in sparse format. The separator and indexSeparator are provided as the first and second arguments respectively. For every attribute with a nonzero value, the following strings are concatenated: the column index, the indexSeparator, and the attribute value. These entries are separated by the specified separator.
  • $v[name] : This command writes the values of a single attribute. The attribute name is specified as an argument. This command can be used for writing both regular and special attributes.
  • $k[index] : This command writes the values of a single attribute. The attribute index is specified as an argument. The indices start from 0. This command can be used for writing only regular attributes.
  • $l : This command writes the values of the label attribute.
  • $p : This command writes the values of the predicted label attribute.
  • $d : This command writes the prediction confidences for all classes in the form 'conf(class)=value'.
  • $d[class] : This command writes the prediction confidences for the defined class as a simple number. The required class is provided as an argument.
  • $i : This command writes the values of the id attribute.
  • $w : This command writes the example weights.
  • $b : This command writes the batch number.
  • $n : This command writes the newline character, i.e. a line break is inserted at this position.
  • $t : This command writes the tabulator character, i.e. a tab is inserted at this position.
  • $$ : This command writes the dollar sign.
  • $[ : This command writes the '[' character i.e. the opening square bracket.
  • $] : This command writes the ']' character i.e. the closing square bracket.

Please make sure that the format string ends with $n, or that the add line separator parameter is set to true, if you want examples to be separated by newlines.
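
To make the command syntax above more concrete, the following Python sketch expands a format string for a single example. It is not the operator's implementation, only an illustration under stated assumptions: the function name render_example and its arguments are hypothetical, the default separator of $a is assumed to be a tab, sparse column indices are assumed to start from 0, special attributes are keyed by role for simplicity, and the $d commands are left out.

```python
def render_example(fmt, regular, special=None):
    """Expand a subset of the special-format commands for one example.

    regular: dict of regular attribute name -> value (dict order = column order)
    special: dict of role -> value, e.g. {"label": "yes", "id": 7} (assumption)
    """
    special = special or {}
    literals = {"n": "\n", "t": "\t", "$": "$", "[": "[", "]": "]"}
    roles = {"l": "label", "p": "prediction", "i": "id", "w": "weight", "b": "batch"}
    takes_args = set("asvk")  # commands that accept [argument] groups
    names = list(regular)
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] != "$":
            out.append(fmt[i])  # assumption: other characters are copied literally
            i += 1
            continue
        if i + 1 >= len(fmt):
            raise ValueError("format string ends with a bare '$'")
        cmd = fmt[i + 1]
        i += 2
        # Collect bracketed arguments; in this sketch an argument cannot contain ']'.
        args = []
        while cmd in takes_args and i < len(fmt) and fmt[i] == "[":
            end = fmt.index("]", i)  # raises ValueError if the bracket is unclosed
            args.append(fmt[i + 1:end])
            i = end + 1
        if cmd in literals:
            out.append(literals[cmd])
        elif cmd in roles:
            out.append(str(special[roles[cmd]]))
        elif cmd == "a":
            sep = args[0] if args else "\t"  # assumed default separator
            out.append(sep.join(str(regular[n]) for n in names))
        elif cmd == "s":
            sep, index_sep = args
            out.append(sep.join(
                f"{j}{index_sep}{regular[n]}"
                for j, n in enumerate(names)  # assumed 0-based column index
                if regular[n] != 0
            ))
        elif cmd == "v":
            name = args[0]
            out.append(str(regular[name] if name in regular else special[name]))
        elif cmd == "k":
            out.append(str(regular[names[int(args[0])]]))
        else:
            raise ValueError(f"unsupported command: ${cmd}")
    return "".join(out)
```

For instance, with the hypothetical regular attributes {"a1": 1.5, "a2": 0, "a3": 2}, the format string '$s[ ][:]' skips the zero-valued a2 and yields '0:1.5 2:2' under these assumptions.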

Input

  • input (Data Table)

    This input port expects an ExampleSet. In the attached Example Process it is the output of the Apply Model operator. The output of other operators can also be used as input.

Output

  • through (Data Table)

    The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.

Parameters

  • example_set_file: The ExampleSet is written to the file specified by this parameter. Range: filename
  • special_format: This parameter specifies the exact format of the file. The commands available for specifying the format are discussed in the description of this operator. Range: string
  • fraction_digits: This parameter specifies the number of fraction digits in the output file and is used for rounding real numbers. Setting it to -1 writes all available digits, i.e. no rounding is done. Range: integer
  • quote_nominal_values: This parameter indicates whether nominal values should be enclosed in double quotes. Range: boolean
  • add_line_separator: This parameter indicates whether each example should be followed by a line break. If set to true, a line break is added after each example automatically. Range: boolean
  • zipped: This parameter indicates whether the content of the data file should be zipped. Range: boolean
  • overwrite_mode: This parameter indicates whether an existing file should be overwritten or the data should be appended to it. Range: selection
  • encoding: This is an expert parameter. It selects the encoding of the written file from the available options. Range: selection
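
The effect of fraction_digits and quote_nominal_values on a single value can be sketched in Python as follows. This is only an illustration, not the operator's code: the helper name format_value is hypothetical, and whether the operator keeps trailing zeros after rounding is not stated in this documentation, so the fixed-point formatting used here is an assumption.

```python
def format_value(value, fraction_digits=-1, quote_nominal=False):
    """Format one attribute value the way the two parameters describe.

    Numbers are rounded to fraction_digits digits unless it is -1;
    nominal (string) values are optionally wrapped in double quotes.
    """
    if isinstance(value, str):
        return f'"{value}"' if quote_nominal else value
    if fraction_digits < 0:
        return str(value)  # -1: write all digits, no rounding
    # Assumption: fixed-point output that keeps trailing zeros.
    return f"{value:.{fraction_digits}f}"


print(format_value(3.14159, fraction_digits=2))     # rounded to two digits
print(format_value(3.14159))                        # unrounded
print(format_value("sunny", quote_nominal=True))    # quoted nominal value
```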

Tutorial Processes

Writing labeled data set in a user-defined format

A k-NN classification model is trained on the 'Golf' data set. The trained model is then applied to the 'Golf-Testset' data set using the Apply Model operator. The resulting labeled data set is written to a file using the Write Special Format operator. Have a look at the parameters of the Write Special Format operator: the ExampleSet is written to a file named 'special', and the special format parameter is set to ' $[ $l $] $t $p $t $d[yes] $t $d[no]'. This format string is composed of several commands and can be interpreted as '[label] predicted_label confidence (yes) confidence (no)'. It states that four values are written for each example: 'label', 'predicted label', 'confidence (yes)' and 'confidence (no)'. The values are separated by tabs, and the label is enclosed in square brackets. Run the process and inspect the written file to verify this.
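
As a rough illustration of what one line of the written file might look like, the Python sketch below mirrors the tutorial format string piece by piece. The row values are invented for illustration and are not taken from the 'Golf-Testset' data, the function name render_tutorial_line is hypothetical, and the sketch assumes that characters outside commands (here, the spaces) are written literally.

```python
def render_tutorial_line(row):
    """Mirror ' $[ $l $] $t $p $t $d[yes] $t $d[no]' for one scored example."""
    return (
        " [ " + row["label"] + " ] "               # ' $[ $l $] '
        + "\t " + row["prediction"] + " "          # '$t $p '
        + "\t " + str(row["confidence"]["yes"]) + " "   # '$t $d[yes] '
        + "\t " + str(row["confidence"]["no"])          # '$t $d[no]'
    )


# Hypothetical scored example; these values are made up.
row = {
    "label": "yes",
    "prediction": "no",
    "confidence": {"yes": 0.4, "no": 0.6},
}
print(repr(render_tutorial_line(row)))
```

Because the format string does not end with $n, consecutive examples would only appear on separate lines if the add line separator parameter is set to true.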