Read XRFF (Advanced File Connectors)

Synopsis

This operator is used for reading XRFF (eXtensible attribute-Relation File Format) files.

Description

This operator reads XRFF files, a format known from Weka. XRFF (eXtensible attribute-Relation File Format) is an XML-based extension of the ARFF format and is somewhat similar to the original RapidMiner file format for attribute description files (.aml). You can see a sample XRFF file by studying the attached Example Process.

Because the data is wrapped in XML tags, the XML representation takes up considerably more space, so the data can also be compressed with gzip. RapidMiner automatically treats a file as gzip-compressed if its extension is .xrff.gz instead of .xrff.
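
The extension rule above can be illustrated with a short Python sketch. This is not RapidMiner's implementation; the helper name open_xrff and the UTF-8 encoding are assumptions made for illustration only.

```python
import gzip


def open_xrff(path):
    """Open an XRFF file as text (hypothetical helper, not a RapidMiner API).

    Files whose name ends in .xrff.gz are decompressed on the fly;
    everything else is opened as plain text. UTF-8 is assumed here.
    """
    if str(path).endswith(".xrff.gz"):
        # gzip.open in text mode ("rt") decompresses transparently
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

With such a helper, the code that parses the XML does not need to know whether the file on disk was compressed.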

An XRFF file is divided into two parts: the header and the body. The header contains the meta data description and the body contains the instances. The class="yes" attribute in an attribute specification in the header marks the attribute that should be used as the prediction label. Although RapidMiner uses the term "label" rather than "class" for such attributes, the term "class" is supported for compatibility with original XRFF files.
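
As a rough illustration of the header/body split and the class="yes" marker, the Python sketch below parses a small hand-written XRFF document. The element names follow the Weka XRFF layout as commonly documented; the sample data and the function name parse_xrff are made up for this example, and this is not how the operator itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a two-row excerpt in the style of the Golf data set.
XRFF_SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<dataset name="golf-sample">
  <header>
    <attributes>
      <attribute name="Outlook" type="nominal">
        <labels>
          <label>sunny</label>
          <label>rain</label>
        </labels>
      </attribute>
      <attribute name="Temperature" type="numeric"/>
      <attribute class="yes" name="Play" type="nominal">
        <labels>
          <label>yes</label>
          <label>no</label>
        </labels>
      </attribute>
    </attributes>
  </header>
  <body>
    <instances>
      <instance>
        <value>sunny</value>
        <value>85</value>
        <value>no</value>
      </instance>
      <instance>
        <value>rain</value>
        <value>70</value>
        <value>yes</value>
      </instance>
    </instances>
  </body>
</dataset>
"""


def parse_xrff(text):
    """Return (attribute names, label attribute name, rows of raw string values)."""
    root = ET.fromstring(text.encode("utf-8"))
    # Header: meta data description of every attribute
    attributes = root.findall("./header/attributes/attribute")
    names = [a.get("name") for a in attributes]
    # The attribute carrying class="yes" is the prediction label
    label = next((a.get("name") for a in attributes if a.get("class") == "yes"), None)
    # Body: one <instance> per example, one <value> per attribute
    rows = [
        [v.text for v in instance.findall("value")]
        for instance in root.findall("./body/instances/instance")
    ]
    return names, label, rows


names, label, rows = parse_xrff(XRFF_SAMPLE)
```

Here the values are kept as strings; converting numeric attributes according to the type given in the header is left out to keep the sketch short.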

Input

  • file (File)

    This optional port expects a file object.

Output

  • output (Data Table)

    The XRFF file is read from the specified path and the resultant ExampleSet is delivered through this port.

Parameters

  • data_file

    This parameter specifies the path of the XRFF file. It can be selected using the "choose a file" button. Range: filename

  • id_attribute

    This parameter specifies the name of the id attribute. Please note that this field is case-sensitive. Range: string

  • datamanagement

    This parameter determines how the data is represented internally. This is an expert parameter. Several options are available, and any of them can be chosen. Range: selection

  • decimal_point_character

    This parameter specifies the character that is used as the decimal point. Range: string

  • sample_ratio

    This parameter specifies the fraction of the data set that should be read. If it is set to 1, the complete data set is read. If it is set to -1, the sample_size parameter determines how much data is read. Range: real

  • sample_size

    This parameter specifies the exact number of samples that should be read. If it is set to -1, the sample_ratio parameter determines how much data is read. If both are set to -1, the complete data set is read. Range: integer

  • use_local_random_seed

    This parameter indicates whether a local random seed should be used for randomization. Using the same local random seed value produces the same randomization. Range: boolean

  • local_random_seed

    This parameter specifies the local random seed. This parameter is only available if the use_local_random_seed parameter is set to true. Range: integer
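
The interaction between the sample ratio and sample size parameters, and the effect of a fixed seed, can be sketched in Python as follows. This is an illustration of the rules described above, not the operator's code. The function names are hypothetical, and three behaviours are assumptions rather than documented facts: the ratio result is truncated to an integer, the rows are drawn randomly without replacement, and setting both parameters to values other than -1 is treated as an error because the description does not say which one wins.

```python
import random


def resolve_sample_count(total, sample_ratio, sample_size):
    """Number of examples to read, following the -1 conventions described above."""
    if sample_ratio == -1 and sample_size == -1:
        return total  # both disabled: read the complete data set
    if sample_ratio == -1:
        return min(sample_size, total)  # exact count, capped at the data size
    if sample_size == -1:
        return int(total * sample_ratio)  # assumption: truncate toward zero
    # Assumption: precedence is not documented here, so refuse to guess
    raise ValueError("set either sample_ratio or sample_size to -1")


def draw_sample(rows, count, seed):
    """Draw count rows; the same seed always yields the same selection."""
    rng = random.Random(seed)  # local generator, independent of global state
    return rng.sample(rows, count)


rows = list(range(14))  # stand-in for 14 examples
count = resolve_sample_count(len(rows), sample_ratio=-1, sample_size=5)
first = draw_sample(rows, count, seed=1992)
second = draw_sample(rows, count, seed=1992)
```

Because both calls use the same seed, first and second contain the same rows in the same order, which mirrors the reproducibility that a local random seed is meant to provide.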

Tutorial Processes

Writing and Reading an XRFF file

This Example Process demonstrates how the Write XRFF and Read XRFF operators can be used to write and read an ExampleSet. The 'Golf' data set is loaded using the Retrieve operator. This ExampleSet is provided as input to the Write XRFF operator. The example set file parameter is set to 'D:\golf_xrff', so a file named 'golf_xrff' is created (if it does not already exist) on the 'D' drive of your computer. You can open the written file and make changes to it if required. The Read XRFF operator is applied next. Its data file parameter is set to 'D:\golf_xrff' to read the file that was just written by the Write XRFF operator. The remaining parameters are left at their default values. The resultant ExampleSet can be seen in the Results Workspace.