You are viewing the RapidMiner Studio documentation for version 8.2 - Check here for latest version
Write Database (JDBC Connectors)
Synopsis
This operator writes an ExampleSet to an SQL database.Description
The Write Database operator is used for writing an ExampleSet to the specified SQL database. You need to have at least basic understanding of databases and database connections in order to use this operator properly. Go through the parameters and the attached Example Process to understand the flow of this operator.
The user can specify the database connection and a table name. Please note that the table will be created during writing if it does not exist. The most convenient way of defining the necessary parameters is the Manage Database Connections wizard. The most important parameters (database URL and user name) will be automatically determined by this wizard. At the end, you only have to define the table name. This operator only supports the writing of the complete ExampleSet consisting of all regular and special attributes and all examples. If this is not desired, perform some preprocessing operators like the Select Attributes or Filter Examples operators before applying the Write Database operator. Data from database tables can be read in RapidMiner by using the Read Database operator.
Input
- input
This input port expects an ExampleSet. It is output of the Retrieve operator in the attached Example Process.
Output
- through
The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.
Parameters
- define_connectionThis parameter indicates how the database connection should be specified. It gives you three options: predefined, url and jndi. Range: selection
- connectionThis parameter is only available when the define connection parameter is set to predefined. This parameter is used for connecting to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on Manage Database Connections... from the Tools menu in the main window. A new window appears. This window asks for several details e.g. Host, Port, Database system, schema, username and password. The Test button in this new window will allow you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of the connection parameter. You need to have basic understanding of databases for configuring a connection. Range: string
- database_systemThis parameter is only available when the define connection parameter is set to url. This parameter is used for selecting the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle. Range: selection
- database_urlThis parameter is only available when the define connection parameter is set to url. This parameter is used for defining the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'. Range: string
- usernameThis parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the username of the database. Range: string
- passwordThis parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the password of the database. Range: string
- jndi_nameThis parameter is only available when the define connection parameter is set to jndi. This parameter is used for giving the JNDI a name for a data source. Range: string
- table_nameThis parameter is used for selecting the required table from the specified database. Please note that you can also write a table name here, if the table does not exist it will be created during writing. Range:
- overwrite_modeThis parameter indicates if an existing table should be overwritten or data should be appended to the existing data. Range: selection
- set_default_varchar_lengthThis parameter allows you to set varchar columns to default length. Range: boolean
- default_varchar_lengthThis parameter is only available when the set default varchar length parameter is set to true. This parameter specifies the default length of varchar columns. Range: integer
- add_generated_primary_keysThis parameter indicates whether a new attribute holding the auto generated primary keys should be added to the table in the database. Range: boolean
- db_key_attribute_nameThis parameter is only available when the add generated primary keys parameter is set to true. This parameter specifies the name of the attribute for the auto generated primary keys. Range: string
- batch_sizeThis parameter specifies the number of examples which are written at once with one single query to the database. Larger values can greatly improve the speed. However, too large values can drastically decrease the performance. Moreover, some databases have restrictions on the maximum number of values written at once. Range: integer
Tutorial Processes
Writing an ExampleSet to a mySQL database
The 'Golf' data set is loaded using the Retrieve operator. The Write Database operator is used for writing this data set to a mySQL database. The define connection parameter is set to predefined and it is configured using the button next to the drop down box. The name of the connection is set to 'mySQLconn'. The following values are set in the connection parameter's wizard: the Database system is set to 'mySQL'. The Host is set to 'localhost'. The Port is set to '3306'. The Database scheme is set to 'golf'; this is the name of the database. The User is set to 'root'. No password is provided. You will need a password if your database is password protected. Set all the values and test the connection. Make sure that the connection works.
The table name parameter is set to 'golf_table' which is the name of the required table in the 'golf' database. Run the process, you will see the entire 'golf_table' in the Results Workspace. You can also check the 'golf' database in phpmyadmin to see the 'golf_table'. You can read this table from the database using the Read Database operator. Please study the Example Process of the Read Database operator for more information.