RapidMiner Studio documentation for version 8.0

Write Cassandra (NoSQL)

Synopsis

This operator writes an example set to a Cassandra database.

Description

The 'Write Cassandra' operator writes an example set to a Cassandra database. The input example set is expected to have an ID attribute, which is used as the primary key for the selected Cassandra table. If the table has a compound primary key, use the parameter 'primary key attributes' to add more attributes as key attributes.

Input

  • input (Data Table)

    Requires an example set read by an appropriate operator. The example set must contain an ID attribute, so a Set Role operator must be added to the process to specify the ID attribute.

Output

  • output (Data Table)

    The input example set, passed through after it has been written to the Cassandra database.

Parameters

  • connection The connection details for the Cassandra connection have to be specified. If you have already configured a Cassandra connection, you can select it from the drop-down list. If you have not configured a Cassandra connection yet, select the Cassandra icon to the right of the drop-down list and create a new Cassandra connection in the Manage connections box. The contact points and keyspace name are mandatory. Range: configurable
  • consistency_level The consistency level for the Cassandra query. The consistency level defines how many Cassandra nodes have to respond to the query for it to succeed. Possible levels are: ONE, TWO, THREE, QUORUM, ALL, ANY.
    • ONE: The write must be written to at least one replica node.
    • TWO: The write must be written to at least two replica nodes.
    • THREE: The write must be written to at least three replica nodes.
    • QUORUM: The write must be written to a quorum of replica nodes. A quorum is calculated as (replication_factor / 2) + 1, rounded down to a whole number. For example, with a replication factor of 3, a quorum is 2 (can tolerate 1 node down). With a replication factor of 6, a quorum is 4 (can tolerate 2 nodes down).
    • ALL: The write must be written to all replica nodes in the cluster for that row key.
    • ANY: The write must be written to at least one node. Unlike ONE, a hinted handoff stored on a non-replica node is sufficient, so the write can succeed even if all replica nodes for the row key are down.
    Range: selection
  • table_name Name of the table to which the example set should be written. If a table with the same name already exists, it is updated, provided that the example set is compatible, i.e., the attribute names and types match. If the table does not exist yet, a new table with this name is created and the example set is written to it. The ID attribute of the example set is used as the primary key. To define index columns for the newly created table, use the parameter 'index columns'. Range: string
  • batch_size This parameter defines the maximum number of rows written in one request. The default value is 1000. Range: integer
  • primary_key_attributes If the Cassandra table already exists and has a compound primary key, you can add more attributes to the primary key that is used to store the example set. If the Cassandra table does not exist yet, you can add primary key attributes in the Edit parameter list: primary key attributes dialog to create a compound primary key. This primary key consists of the ID attribute and the selected attributes. Range: enumeration
  • index_columns This option is only required if the Cassandra table does not exist yet. It allows you to define columns as index columns for the newly created table in the Edit parameter list: index columns dialog. Range: enumeration
  • use_ttl If the checkbox is activated, an additional parameter 'ttl' (Time To Live) is displayed. The parameter allows you to specify a time interval in seconds for the written data. If set, the inserted values are automatically removed from the database after the specified time interval. Note: This removal affects only the inserted values, not the columns themselves. This means that any subsequent update of the column will reset the 'ttl' value. By default, values are never removed. Range: boolean
  • ttl If the 'use_ttl' checkbox is activated, you can specify the time to live in seconds. The default value is 120 seconds. You can enter any integer >= 1. Range: integer
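
As a rough illustration of the arithmetic behind two of the parameters above, the following Python sketch computes the quorum size for a given replication factor, using the formula quoted for QUORUM, and the smallest number of requests needed for a given batch_size. The function names (quorum_size, tolerated_failures, min_requests) are hypothetical helpers written for this example and are not part of RapidMiner or Cassandra. The request count is only a lower bound implied by the batch_size description, since the operator's actual batching behavior is not documented here.

```python
def quorum_size(replication_factor: int) -> int:
    """Quorum as described for QUORUM: (replication_factor / 2) + 1, rounded down."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of replica nodes that may be down while a quorum is still reachable."""
    return replication_factor - quorum_size(replication_factor)


def min_requests(row_count: int, batch_size: int = 1000) -> int:
    """Lower bound on write requests when each request carries at most batch_size rows.

    Assumption: rows are simply split into chunks of at most batch_size.
    """
    if row_count < 0:
        raise ValueError("row_count must be >= 0")
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return -(-row_count // batch_size)  # ceiling division


if __name__ == "__main__":
    # Reproduces the two examples given for QUORUM.
    for rf in (3, 6):
        print(rf, quorum_size(rf), tolerated_failures(rf))
    # prints "3 2 1" and then "6 4 2"
```

With the default batch_size of 1000, an example set of 2500 rows would need at least min_requests(2500) = 3 requests under the chunking assumption stated above.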