You are viewing the RapidMiner Studio documentation for version 9.6.

Read SPSS (Advanced File Connectors)

Synopsis

This operator is used for reading SPSS files.

Description

The Read SPSS operator can read the data files created by SPSS (Statistical Package for the Social Sciences), an application used for statistical analysis. SPSS files are saved in a proprietary binary format and contain a dataset as well as a dictionary that describes the dataset. These files save data by 'cases' (rows) and 'variables' (columns).

These files have a '.SAV' file extension. SAV files are often used for storing datasets extracted from databases and Microsoft Excel spreadsheets. SPSS datasets can be manipulated in a variety of ways, but they are most commonly used to perform statistical analysis tests such as regression analysis, analysis of variance, and factor analysis.
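
The dataset-plus-dictionary layout described above can be pictured with a small Python sketch. This is a conceptual model only: it is neither the SAV binary format nor RapidMiner code. The variable names, labels, codes, and the `apply_dictionary` helper are hypothetical and chosen for illustration. The two flags are named after the operator parameters of the same name and follow the behavior stated for them under Parameters.

```python
# Conceptual model only: cases (rows) plus a dictionary describing each variable (column).
# All names and values below are illustrative assumptions, not taken from a real file.
cases = [
    {"month": 1, "passengers": 112, "region": 1},
    {"month": 2, "passengers": 118, "region": 2},
    {"month": 3, "passengers": 132, "region": 9},
]

dictionary = {
    "month": {"label": "Month index", "value_labels": {}, "user_missing": set()},
    "passengers": {"label": "Passenger count", "value_labels": {}, "user_missing": set()},
    "region": {
        "label": "Sales region",
        "value_labels": {1: "North", 2: "South"},
        "user_missing": {9},  # a code the file author declared as "missing"
    },
}


def apply_dictionary(cases, dictionary, use_value_labels=True, recode_user_missings=True):
    """Return decoded copies of the cases, using the dictionary metadata."""
    decoded = []
    for case in cases:
        row = {}
        for name, value in case.items():
            meta = dictionary[name]
            if recode_user_missings and value in meta["user_missing"]:
                row[name] = None  # user-defined missing code becomes a missing value
            elif use_value_labels and value in meta["value_labels"]:
                row[name] = meta["value_labels"][value]  # stored code replaced by its label
            else:
                row[name] = value  # stored value kept as-is
        decoded.append(row)
    return decoded


decoded_cases = apply_dictionary(cases, dictionary)
raw_cases = apply_dictionary(cases, dictionary, use_value_labels=False, recode_user_missings=False)
```

With both flags enabled, the coded `region` values are replaced by their labels and the code declared as user-missing becomes a missing value. With both disabled, the stored codes pass through unchanged.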

Input

  • file (File)

    This optional port expects a file object.

Output

  • output (IOObject)

    Data from the SPSS file is delivered through this port, usually in the form of an ExampleSet.

Parameters

  • filename

    This parameter specifies the path of the SPSS file. It can be selected using the 'choose a file' button. Range: filename

  • datamanagement

    This expert parameter determines how the data is represented internally. Several options are available. Range: selection

  • attribute_naming_mode

    This parameter determines which SPSS variable properties should be used for naming the attributes. Range: selection

  • use_value_labels

    This parameter specifies whether the SPSS value labels should be used as values. Range: boolean

  • recode_user_missings

    This parameter specifies whether SPSS user-defined missing values should be recoded to missing values. Range: boolean

  • sample_ratio

    This parameter specifies the fraction of the data set which should be read. If it is set to 1, the complete data set is read. If it is set to -1, the sample_size parameter determines how much data is read. Range: real

  • sample_size

    This parameter specifies the exact number of samples which should be read. If it is set to -1, the sample_ratio parameter determines how much data is read. If both are set to -1, the complete data set is read. Range: integer

  • use_local_random_seed

    This parameter indicates whether a local random seed should be used for randomization. Using the same local random seed value produces the same randomization. Range: boolean

  • local_random_seed

    This parameter specifies the local random seed. It is only available if the use_local_random_seed parameter is set to true. Range: integer
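
The interplay of the sampling and seed parameters can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's implementation. The function names are hypothetical. The sketch assumes that a sample_size other than -1 takes precedence over sample_ratio, that the ratio is truncated to a whole number of rows, and that rows are drawn at random. None of these details is stated above.

```python
import random


def resolve_row_limit(total_rows, sample_ratio=1.0, sample_size=-1):
    """Return the number of rows to read (hypothetical helper)."""
    if sample_size != -1:
        # Assumption: an explicit sample_size overrides sample_ratio.
        if sample_size < 0:
            raise ValueError("sample_size must be -1 or a non-negative integer")
        return min(sample_size, total_rows)
    if sample_ratio == -1:
        # Both parameters are -1: read the complete data set.
        return total_rows
    if not 0.0 <= sample_ratio <= 1.0:
        raise ValueError("sample_ratio must be -1 or lie between 0 and 1")
    # Assumption: the fraction is truncated to a whole number of rows.
    return int(total_rows * sample_ratio)


def sample_row_indices(total_rows, sample_ratio=1.0, sample_size=-1,
                       use_local_random_seed=False, local_random_seed=None):
    """Pick which row indices to read (assumes a random draw)."""
    limit = resolve_row_limit(total_rows, sample_ratio, sample_size)
    # A fixed local seed makes the draw repeatable; otherwise the generator
    # is seeded from the system and results differ between runs.
    rng = random.Random(local_random_seed) if use_local_random_seed else random.Random()
    return sorted(rng.sample(range(total_rows), limit))


first_run = sample_row_indices(144, sample_ratio=-1, sample_size=10,
                               use_local_random_seed=True, local_random_seed=42)
second_run = sample_row_indices(144, sample_ratio=-1, sample_size=10,
                                use_local_random_seed=True, local_random_seed=42)
```

Because both runs use the same local random seed, `first_run` and `second_run` contain the same row indices, which mirrors the repeatability described for use_local_random_seed.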

Tutorial Processes

Reading an SPSS file

You need an SPSS file for this process. Here, the file is named airline_passengers.sav and is located on the D drive of the computer. The file is read using the Read SPSS operator, with all parameters left at their default values. After the process has executed, the resulting ExampleSet is shown in the Results Workspace.