You are viewing the RapidMiner Studio documentation for version 2024.0.

Using the Cassandra Connector

The Cassandra connector allows you to connect to clusters of the NoSQL database Cassandra directly from Altair AI Studio. It supports all CRUD operations (Create, Read, Update, and Delete), as well as running more sophisticated database commands. This document walks you through how to install the NoSQL Connector extension, connect to your Cassandra cluster, read from Cassandra, and write to Cassandra.

Install the NoSQL Connector extension

First, you need to install the NoSQL Connector extension.

Connect to Your Cassandra Cluster

Before you can use the Cassandra connector, you have to configure a new Cassandra connection. For this purpose, you will need the connection details of your database (host name, port, and keyspace name). If your Cassandra installation requires authentication, you will also need valid credentials.

  1. In Altair AI Studio, right-click the repository in which you want to store your Cassandra connection and choose Create Connection.

    img/01-create-new-connection.png

    Alternatively, click Connections > Create Connection and select the repository from the dropdown in the dialog that opens.

  2. Enter a name for the new connection and set Connection Type to Cassandra:

    img/cassandra/create_cassandra.png

  3. Click Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Cassandra cluster:

    img/cassandra/cassandra_basic.png

    The preconfigured port is the default port used by Cassandra. Note that Cassandra does not require user authentication by default.

    While not required, we recommend testing your new Cassandra connection by clicking the Test connection button. If the test fails, check whether the connection details are correct.

  5. Click Save to save your connection and close the Edit connection dialog.

You can now use the newly created connection with all of the Cassandra operators!
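
If the connection test in step 4 fails, it can help to rule out basic network problems before rechecking the other details. The following is a minimal sketch that runs outside Altair AI Studio and is not part of the connector: it only checks whether a TCP connection to the given host and port can be opened. The function name `can_reach`, the host `cassandra.example.com`, and the port 9042 (Cassandra's default port for client connections) are assumptions for illustration; substitute the values from your own connection.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name resolution.
        return False


# Hypothetical usage with placeholder values:
#   can_reach("cassandra.example.com", 9042)
```

A successful check only shows that the port is reachable. It does not verify the keyspace name or the credentials, which the Test connection button covers.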

Read from Cassandra

The Read Cassandra operator lets you read data from Cassandra tables.

  1. Open a new process in Altair AI Studio, drag the Read Cassandra operator into the Process view, and connect its output port to the result port of the process. For the connection entry parameter, select your Cassandra connection from the Connections folder of the repository in which you stored it by clicking the repository chooser button next to the parameter:

    img/cassandra/cassandra_select_connection.png

    Alternatively, you can drag the Cassandra connection from the repository into the Process Panel and connect the resulting operator with the Read Cassandra operator.

    img/cassandra/cassandra_select_connection_repository.png

  2. Define the query consistency level. For clusters with fewer than three nodes, it is recommended to set it to ONE. Otherwise use the default value QUORUM.

  3. Define the query type (query, query file, or table). If you choose table, an additional parameter appears, populated with the available tables.

  4. Run the process! In the Result Perspective, you should see the example set loaded from Cassandra. In our example, the example set contains Altair AI Studio's Deals sample data set:

    img/cassandra/cassandra_result.png
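
One plausible reading of the consistency recommendation in step 2 comes from how Cassandra defines QUORUM: a majority of the replicas of a row, computed from the keyspace's replication factor (RF) as floor(RF / 2) + 1. The sketch below is purely illustrative and does not talk to a cluster; the function names are hypothetical, and it assumes a single data center with a replication factor no larger than the number of nodes.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at consistency level QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be unavailable while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replica(s), "
          f"tolerates {tolerated_failures(rf)} unavailable")
```

With a replication factor of 1 or 2, QUORUM tolerates no unavailable replica, whereas ONE needs only a single replica to respond. From a replication factor of 3 upward, QUORUM can survive at least one unavailable replica.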

Write to Cassandra

The Write Cassandra operator lets you write data to Cassandra tables. Cassandra requires each data row to be identified by a unique ID (which can consist of one or more columns). The following example illustrates how to write one of Altair AI Studio's sample data sets to a new Cassandra table.

  1. Open a new process in Altair AI Studio.

  2. Drag the Iris sample data set and the Write Cassandra operator into the Process view and connect the operators as shown in the following screenshot. Select your Cassandra connection and enter a name for the new table:

    img/cassandra/cassandra_write_iris.png

    Note that you can also select an existing table.

    Cassandra then updates the table with the new data, provided the schema of the new data matches the schema of the selected Cassandra table. Be careful when writing to an existing table: rows that have the same unique ID as the new data are overwritten without warning.

  3. Connect the Write Cassandra operator to the results port and run the process!
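
The overwrite behavior described in step 2 can be pictured with a small in-memory model. This is not how the Write Cassandra operator or Cassandra itself is implemented; it is a sketch that treats a table as a dictionary keyed by the unique ID, so that writing a row with an existing ID replaces the stored values instead of raising an error. The function `write_rows`, the column names, and the sample values are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def write_rows(table: dict[tuple, Row], rows: list[Row], id_columns: list[str]) -> None:
    """Write rows into the table; a row with an existing unique ID overwrites it."""
    for row in rows:
        # The unique ID may consist of one or more columns.
        key = tuple(row[col] for col in id_columns)
        # No error is raised for a duplicate ID: new values replace stored ones.
        table.setdefault(key, {}).update(row)


table: dict[tuple, Row] = {}
write_rows(table, [{"id": 1, "label": "setosa"}, {"id": 2, "label": "virginica"}], ["id"])
write_rows(table, [{"id": 2, "label": "versicolor"}], ["id"])

print(len(table))            # still two rows, no duplicate was created
print(table[(2,)]["label"])  # the second write replaced the stored label
```

If the rows you write must not replace existing data, check beforehand that their IDs do not already occur in the target table.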