You are viewing the RapidMiner Studio documentation for version 10.3.

Using the Azure Blob Storage Connector

The Azure Blob Storage Connector allows you to access your Azure Blob Storage directly from RapidMiner Studio. Both read and write operations are supported. This document walks you through how to connect to your Azure Blob Storage account and how to read from Azure Blob Storage.

Connect to your Azure Blob Storage account

To configure a new Azure Blob Storage Connection, you will need the connection details of your Azure Blob Storage account (at least the storage account name and an account access key).

  1. In RapidMiner Studio, right-click on the repository you want to store your Azure Blob Storage Connection in and choose Create Connection.

    img/azure-blob/01-create-new-connection.png

    You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.

  2. Give a name to the new Connection, and set Connection Type to Azure Blob Storage:

    img/azure-blob/02-create-select-azure-blob-type.png

  3. Click on Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Azure Blob Storage account:

    img/azure-blob/04-fill-in-azure-blob-connection-details.png

    While not required, we recommend testing your new Azure Blob Storage Connection by clicking the Test connection button. If the test fails, please check whether the details are correct.

  5. Click Save to save your Connection and close the Edit connection dialog. You can now start using the Azure Blob Storage operators!
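Outside of RapidMiner Studio, many Azure tools and SDKs accept the same account details in the form of a connection string. The following Python sketch shows how such a string is typically assembled from an account name and an account key, and how it can be split back into its parts. The function names `build_connection_string` and `parse_connection_string` are hypothetical helpers written for this illustration, the values are placeholders, and the fields in the Studio dialog may be named differently.

```python
def build_connection_string(account_name, account_key,
                            endpoint_suffix="core.windows.net"):
    """Assemble an Azure Storage connection string (illustrative helper)."""
    if not account_name or not account_key:
        raise ValueError("account name and account key are both required")
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        f"EndpointSuffix={endpoint_suffix}"
    )


def parse_connection_string(conn_str):
    """Split a connection string into a dict of its key/value segments."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


# Placeholder credentials, not a real account.
conn_str = build_connection_string("myaccount", "abc123==")
print(parse_connection_string(conn_str)["AccountName"])
```

Splitting on the first `=` only matters because account keys are base64-encoded and often end in padding characters.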

Read from Azure Blob Storage

The Read Azure Blob Storage operator reads data from your Azure Blob Storage account. The operator can be used to load arbitrary file formats, since it only downloads and does not process the files. To process the files, you will need to use additional operators such as Read Document, Read Excel, or Read XML.
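The separation between downloading a file and processing it can be pictured in a few lines of Python. This is only an analogy for the division of work between the operators, not a description of how RapidMiner implements them. `parse_downloaded` is a hypothetical helper, and the mapping from file extension to parser is an assumption made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_downloaded(filename, raw):
    """Choose a parser for bytes that were downloaded unchanged."""
    name = filename.lower()
    if name.endswith(".xml"):
        # Plays the role of a dedicated XML reader.
        return ET.fromstring(raw)
    if name.endswith(".csv"):
        # Plays the role of a dedicated tabular reader.
        return list(csv.reader(io.StringIO(raw.decode("utf-8"))))
    # Anything else is treated as plain text, like a log file.
    return raw.decode("utf-8")


# The download step hands over raw bytes; parsing happens afterwards.
log_bytes = b"2024-01-01 12:00:00 INFO service started\n"
print(parse_downloaded("app.log", log_bytes))
```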

Let us start with reading a simple log file from Azure Blob Storage.

  1. Drag a Read Azure Blob Storage operator into the Process Panel. For the connection entry parameter, click the repository chooser button next to it and select your Azure Blob Storage Connection from the Connections folder of the repository you stored it in:

    img/azure-blob/01-choose-connection-from-repo.png

    Alternatively, you can drag the Azure Blob Storage Connection from the repository into the Process Panel and connect the resulting operator with the Read Azure Blob Storage operator.

    img/azure-blob/01-retrieve-connection-from-repo.png

  2. Click on the file chooser button to view the files in your Azure Blob Storage account. Select the file that you want to load and click Open.

    img/azure-blob/read-from-azure-blob-03.png

    As mentioned above, the Read Azure Blob Storage operator does not process the contents of the specified file. In our example, we have chosen a log file (a plain text file). This file type can be processed with the Read Document operator, which is part of the Text Processing extension for RapidMiner Studio.

  3. If you have not already installed the Text Processing extension for RapidMiner Studio, please go to the marketplace and do so now. Then add a Read Document operator between the Read Azure Blob Storage operator and the result port:

    img/azure-blob/03-add-read-document.png

  4. Run the process! In the Results perspective, you should see a single document containing the content of the log file.

    img/azure-blob/04-result-log-file.png

You could now use further text processing operators to work with this document, e.g., to determine how often certain events occur.

To write results back to Azure Blob Storage, you can use the Write Azure Blob Storage operator. It uses the same Connection Type as the Read Azure Blob Storage operator and has a similar interface.

You can also read from a set of files in an Azure Blob Storage directory by using the Loop Azure Blob Storage operator. For this, you need to specify the connection entry and the folder that you want to process, as well as the steps of the processing loop as nested operators. For more details, please read the help of the Loop Azure Blob Storage operator.
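As a rough picture of what counting events across a folder of log files involves, the following Python sketch applies the same per-file step to every file under one folder and sums the results. It is not RapidMiner code: an in-memory dictionary stands in for the storage account, the log line format and the level names in `LEVEL_PATTERN` are assumptions, and `count_events` and `count_events_in_folder` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumed log format: each line contains one level keyword.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")


def count_events(log_text):
    """Count how often each log level occurs in one document."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_events_in_folder(files, folder):
    """Run the per-file step on every file under `folder` and sum the counts."""
    prefix = folder.rstrip("/") + "/"
    total = Counter()
    for name, text in files.items():
        if name.startswith(prefix):
            total += count_events(text)
    return total


# In-memory stand-in for files in a storage folder (made-up sample data).
sample_files = {
    "logs/app-1.log": "12:00:01 INFO started\n12:00:05 ERROR disk full\n",
    "logs/app-2.log": "12:01:00 ERROR timeout\n",
    "other/readme.txt": "ERROR in a file outside the folder\n",
}
print(count_events_in_folder(sample_files, "logs"))
```

The inner function corresponds to the nested processing steps, and the outer function corresponds to the loop over the chosen folder.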