You are viewing the RapidMiner Studio documentation for version 9.2.

Using the Cassandra Connector

The Cassandra connector allows you to connect to clusters of the NoSQL database Cassandra directly from RapidMiner Studio. It supports all CRUD operations (Create, Read, Update, and Delete) as well as more sophisticated database commands. This document walks you through how to install the NoSQL Connector extension, connect to your Cassandra cluster, and read data from and write data to Cassandra.

Install the NoSQL Connector extension

First, you need to install the NoSQL Connector extension, which is available from the RapidMiner Marketplace.

Connect to Your Cassandra Cluster

Before you can use the Cassandra connector, you have to configure a new Cassandra connection. For this purpose, you will need the connection details of your database (host name, port, and keyspace name). If your Cassandra installation requires authentication, you will also need valid credentials.

  1. Open the Manage Connections dialog in RapidMiner Studio by going to Tools > Manage Connections.

  2. Click Add Connection in the lower left.

  3. Enter a name for the new connection and select Cassandra Connection as the Connection Type.

  4. Fill in the connection details of your Cassandra cluster.

    The preconfigured port is the default port used by Cassandra. Note that Cassandra does not require user authentication by default. Optionally, you can test the new configuration by clicking the Test button.

  5. Click Save all changes to save your connection and close the Manage Connections window.

You can now use the newly created connection with all of the Cassandra operators!
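The values collected in the connection dialog can be pictured as a small configuration record. The following Python sketch is purely illustrative: the class `CassandraConnectionSettings`, its field names, and its `validate` checks are hypothetical and are not RapidMiner's API. It also assumes that the preconfigured port is 9042, Cassandra's default port for CQL client connections.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the connector talks to Cassandra's default CQL client port.
DEFAULT_CASSANDRA_PORT = 9042


@dataclass
class CassandraConnectionSettings:
    """Hypothetical record of the values entered in the Manage Connections dialog."""

    name: str
    host: str
    keyspace: str
    port: int = DEFAULT_CASSANDRA_PORT
    username: Optional[str] = None  # only needed if the cluster requires authentication
    password: Optional[str] = None

    @property
    def uses_authentication(self) -> bool:
        return self.username is not None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings look complete."""
        problems = []
        if not self.host:
            problems.append("host name is required")
        if not self.keyspace:
            problems.append("keyspace name is required")
        if not 1 <= self.port <= 65535:
            problems.append(f"port {self.port} is outside 1-65535")
        # Credentials are optional, but a username without a password
        # (or the reverse) is almost certainly an input mistake.
        if (self.username is None) != (self.password is None):
            problems.append("provide both username and password, or neither")
        return problems


settings = CassandraConnectionSettings(name="local-cassandra", host="localhost", keyspace="sales")
print(settings.validate())  # []
```

Because authentication is off by default in Cassandra, the credential fields default to `None` and only the host and keyspace are treated as mandatory.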

Read from Cassandra

The Read Cassandra operator allows you to read data from Cassandra tables.

  1. Open a new process in RapidMiner Studio, drag the Read Cassandra operator into the Process view, and connect its output port to the result port of the process.

  2. Select your Cassandra connection from the connection drop-down menu in the Parameters view.

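  3. Define the query consistency level. QUORUM, the default value, requires a majority of the replicas holding the data to respond, which on small clusters can mean every replica. For clusters with fewer than three nodes, it is therefore recommended to set the level to ONE; otherwise, use the default value QUORUM.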

  4. Define the query type (query, query file, or table). If you choose table, an additional parameter appears that lists the available tables.

  5. Run the process! In the Result Perspective, you should see the example set loaded from Cassandra. In our example, the example set contains RapidMiner Studio's Deals sample data set.
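The consistency level recommendation follows from how Cassandra counts replicas: QUORUM needs a majority of the replicas that hold the requested data, that is floor(RF / 2) + 1 for a replication factor RF, while ONE needs a single replica. On a two-node cluster with a replication factor of 2, QUORUM therefore requires both replicas to answer, so a single node outage makes QUORUM reads fail while ONE still succeeds. The plain-Python sketch below only does this arithmetic; the function names are hypothetical and nothing here connects to Cassandra.

```python
from typing import Dict


def quorum_size(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int, consistency: str) -> int:
    """How many replicas may be unavailable while reads at `consistency` still succeed."""
    required: Dict[str, int] = {
        "ONE": 1,
        "QUORUM": quorum_size(replication_factor),
        "ALL": replication_factor,
    }
    return replication_factor - required[consistency]


def recommended_consistency(node_count: int) -> str:
    """Rule of thumb from this guide: ONE below three nodes, otherwise QUORUM."""
    return "ONE" if node_count < 3 else "QUORUM"


for rf in (2, 3, 5):
    print(rf, quorum_size(rf), tolerated_replica_failures(rf, "QUORUM"))
# 2 2 0
# 3 2 1
# 5 3 2
```

The output makes the trade-off visible: with a replication factor of 2, QUORUM tolerates no failed replica, whereas from a replication factor of 3 upward it tolerates at least one.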

Write to Cassandra

The Write Cassandra operator allows you to write data to Cassandra tables. Cassandra requires each data row to be identified by a unique ID, which can consist of one or more columns. The following example illustrates how to write one of RapidMiner Studio's sample data sets to a new Cassandra table.

  1. Open a new process in RapidMiner Studio.

  2. Drag the Iris sample data set and the Write Cassandra operator into the Process view and connect the operators as shown in the following screenshot. Select your Cassandra connection and enter a name for the new table.

    Note that you can also select an existing table.

    Cassandra then updates the table with the new data, provided that the schema of the new data matches the schema of the selected table. This also means you have to be careful when writing to an existing table: any stored row with the same unique ID as a new row is overwritten.

  3. Connect the Write Cassandra operator to the result port and run the process!
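The unique-ID behavior described in these steps can be modeled in a few lines of plain Python. This is a hedged sketch, not how the Write Cassandra operator is implemented: the helpers `row_key`, `duplicate_keys`, and `upsert` and the sample columns are hypothetical, and a dictionary stands in for a Cassandra table. It shows two consequences of keying rows by ID: input rows that share an ID would end up as a single stored row, and writing a row whose ID already exists overwrites the stored values instead of failing.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def row_key(row: Row, key_columns: Sequence[str]) -> Key:
    """Build the unique ID of a row from one or more key columns."""
    return tuple(row[col] for col in key_columns)


def duplicate_keys(rows: Sequence[Row], key_columns: Sequence[str]) -> List[Key]:
    """Return every unique ID that occurs more than once in `rows`.

    Rows sharing an ID would collapse into a single Cassandra row,
    so an empty result means the chosen key columns are safe to use.
    """
    counts = Counter(row_key(row, key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def upsert(table: Dict[Key, Row], rows: Iterable[Row], key_columns: Sequence[str]) -> int:
    """Write rows into an in-memory stand-in for a Cassandra table.

    A new ID inserts a row; an existing ID is not an error, its stored
    values are overwritten. Returns how many stored rows were overwritten.
    """
    overwritten = 0
    for row in rows:
        key = row_key(row, key_columns)
        if key in table:
            overwritten += 1
            # Columns in the new row replace the stored values; columns it
            # does not contain keep their old values (per-column writes).
            table[key] = {**table[key], **row}
        else:
            table[key] = dict(row)
    return overwritten


# Usage with made-up rows: "customer" and "order_no" form a composite ID.
orders = [
    {"customer": "A", "order_no": 1, "amount": 100},
    {"customer": "B", "order_no": 1, "amount": 250},
]
print(duplicate_keys(orders, ["order_no"]))              # [(1,)]  order_no alone is not unique
print(duplicate_keys(orders, ["customer", "order_no"]))  # []      the composite ID is unique

table: Dict[Key, Row] = {}
upsert(table, orders, ["customer", "order_no"])
changed = upsert(table, [{"customer": "A", "order_no": 1, "amount": 999}], ["customer", "order_no"])
print(changed, table[("A", 1)]["amount"])  # 1 999
```

Running a check like `duplicate_keys` on the data before writing is a cheap way to confirm that the chosen ID columns really identify each row, since Cassandra itself will not report such collisions.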