You are viewing the RapidMiner Studio documentation for version 9.5.
Using the Cassandra Connector
This guide targets the new Connection Management introduced with RapidMiner Studio 9.3.
For the old legacy Cassandra connections, see the 9.2 documentation.
The Cassandra connector allows you to connect to clusters of the NoSQL database Cassandra directly from RapidMiner Studio. It supports all CRUD operations (Create, Read, Update, and Delete), as well as running more sophisticated database commands. This document will walk you through how to:
- Install the NoSQL Connector extension
- Connect to your Cassandra cluster
- Read from Cassandra
- Write to Cassandra
Install the NoSQL Connector extension
First, you need to install the NoSQL Connector extension.
Connect to Your Cassandra Cluster
Before you can use the Cassandra connector, you have to configure a new Cassandra connection. For this purpose, you will need the connection details of your database (host name, port, and keyspace name). If your Cassandra installation requires authentication, you will also need valid credentials.
In RapidMiner Studio, right-click on the repository you want to store your Cassandra connection in and choose Create Connection.
You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.
Enter a name for the new connection and set Connection Type to Cassandra.
Click on Create and switch to the Setup tab in the Edit connection dialog.
Fill in the connection details of your Cassandra cluster.
The preconfigured port is the default port used by Cassandra. Note that Cassandra does not require user authentication by default.
While not required, we recommend testing your new Cassandra connection by clicking the Test connection button. If the test fails, please check whether the details are correct.
Click Save to save your connection and close the Edit connection dialog.
You can now use the newly created connection with all of the Cassandra operators!
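If the connection test fails, it can help to first check, outside of RapidMiner Studio, whether the host and port are reachable at all. The following Python sketch only opens and closes a TCP connection; it does not speak the Cassandra protocol or check credentials. The function name is hypothetical, and the default of 9042 assumes that your cluster uses Cassandra's standard native transport port, so replace it with the port shown in your connection settings if it differs.

```python
import socket


def is_reachable(host: str, port: int = 9042, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: 9042 is Cassandra's default native transport port.
    This checks network reachability only, not authentication or keyspace.
    """
    try:
        # create_connection resolves the host name and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

Calling `is_reachable("your-cassandra-host")` with the host name from your connection details returns False when a firewall, a wrong host name, or a wrong port is the likely cause of the failed test.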
Read from Cassandra
The Read Cassandra operator lets you read data from Cassandra tables.
Open a new process in RapidMiner Studio, drag the Read Cassandra operator into the Process view, and connect its output port to the result port of the process.
Select your Cassandra connection for the connection entry parameter from the Connections folder of the repository you stored it in by clicking on the button next to it.
Alternatively, you can drag the Cassandra connection from the repository into the Process Panel and connect the resulting operator with the Read Cassandra operator.
Define the query consistency level. For clusters with fewer than three nodes, it is recommended to set it to ONE. Otherwise use the default value QUORUM.
Define the query type (query, query file, or table). If you choose table, an additional parameter appears, populated with the available tables.
Run the process! In the Result Perspective, you should see the example set loaded from Cassandra. In our example, the example set contains RapidMiner Studio's Deals sample data set.
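The consistency level recommendation above follows from how Cassandra sizes a quorum: a majority of the replicas, that is, floor(replication factor / 2) + 1. The following Python sketch works through that arithmetic. It assumes, for illustration only, that the replication factor of the keyspace equals the number of nodes, which is plausible only for very small clusters. The function names are hypothetical and not part of RapidMiner Studio or Cassandra.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be unavailable while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


# Assumption: replication factor == number of nodes (small clusters only)
for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
```

With one or two replicas, QUORUM tolerates no unavailable replica, so it offers no advantage over ONE in availability. From three replicas on, QUORUM still succeeds with one replica down.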
Write to Cassandra
The Write Cassandra operator lets you write data to Cassandra tables. Cassandra requires that each data row be identified by a unique ID, which can consist of one or more columns. The following example illustrates how to write one of RapidMiner Studio's sample data sets to a new Cassandra table.
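Before writing your own data, it can be worth checking that the columns you intend to use as the unique ID really identify each row. The following Python sketch shows one way to do that on rows represented as dictionaries. The function name, the column names, and the sample rows are made up for illustration and are not taken from RapidMiner Studio.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple


def find_duplicate_keys(rows: Iterable[Dict], key_columns: Sequence[str]) -> Set[Tuple]:
    """Return the key values that occur in more than one row."""
    seen: Set[Tuple] = set()
    duplicates: Set[Tuple] = set()
    for row in rows:
        # A key may span several columns, so build a tuple of their values
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


# Hypothetical sample rows
rows = [
    {"id": "id_1", "label": "a", "value": 5.1},
    {"id": "id_2", "label": "a", "value": 4.9},
    {"id": "id_2", "label": "b", "value": 6.3},
]

print(find_duplicate_keys(rows, ["id"]))           # "id" alone is not unique here
print(find_duplicate_keys(rows, ["id", "label"]))  # the combination is unique
```

An empty result means the chosen columns can serve as the unique ID for this data.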
Open a new process in RapidMiner Studio.
Drag the Iris sample data set and the Write Cassandra operator into the Process view and connect the operators as shown in the following screenshot. Select your Cassandra connection and enter a name for the new table.
Note that you can also select an existing table.
Cassandra then updates the table with the new data, provided that the schema of the new data matches the schema of the selected table. Be careful when writing to an existing table: rows that have the same unique ID as a new row are overwritten.
Connect the Write Cassandra operator to the results port and run the process!
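The overwrite behavior described above for existing tables can be illustrated without a cluster. The following Python sketch models a table as a dictionary keyed by the unique ID. It is only an analogy for the write semantics, not how the Write Cassandra operator is implemented, and the function name and row values are made up.

```python
from typing import Dict, Iterable, Sequence, Tuple


def write_rows(table: Dict[Tuple, Dict], rows: Iterable[Dict], key_columns: Sequence[str]) -> Dict[Tuple, Dict]:
    """Write rows into the table; a row with an existing key replaces the old row."""
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        # No error and no duplicate: the old row is silently replaced
        table[key] = dict(row)
    return table


table: Dict[Tuple, Dict] = {}
write_rows(table, [{"id": "id_1", "label": "old"}, {"id": "id_2", "label": "old"}], ["id"])
write_rows(table, [{"id": "id_2", "label": "new"}], ["id"])

print(len(table))                 # still two rows
print(table[("id_2",)]["label"])  # the earlier value for id_2 is gone
```

The second write does not add a third row. It replaces the row stored under the same ID, which is why existing data should be checked before writing to a table that is already in use.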