You are viewing the RapidMiner Radoop documentation for version 9.7.

Connecting to a CDH Quickstart VM

At the time of writing, the latest available version of the Cloudera Quickstart VM is 5.13, and this guide was written for that version.

Start and configure the Quickstart VM

  1. Download the Cloudera Quickstart VM from the Cloudera website.

  2. Import the OVA-packaged VM into your virtualization environment (VirtualBox and VMware are covered in this guide).

  3. It is strongly recommended to upgrade to Java 1.8 on the single-node cluster provided by the VM. Otherwise, the execution of Single Process Pushdown and Apply Model operators will fail.

    You can take the following steps only if no clusters or Cloudera management services have been started yet. For the full upgrading process, read Cloudera's guide.

    Upgrading to Java 1.8:

    • Start the VM.
    • Download and unzip JDK 1.8 (preferably jdk1.8.0_162 or later) to /usr/java/jdk1.8.0_162.
    • Add the following configuration line to /etc/default/cloudera-scm-server:

        export JAVA_HOME=/usr/java/jdk1.8.0_162
      
    • Launch Cloudera Express (or Enterprise trial version).

    • Open a web browser and log in to Cloudera Manager (quickstart.cloudera:7180) using cloudera/cloudera as credentials. Navigate to Hosts / quickstart.cloudera / Configuration. In the Java Home Directory field, enter

        /usr/java/jdk1.8.0_162
      
    • On the home page of Cloudera Manager, (re)start the Cloudera QuickStart cluster and Cloudera Management Service as well.

  4. If you are using VirtualBox, make sure that the VM is shut down, then change the type of the primary network adapter from NAT to Host-only. In a VirtualBox environment, the VM works only with this setting.

  5. Start the VM and wait for the boot to complete. A browser with some basic information will appear.

  6. Edit your local hosts file (on your host operating system, not inside the VM) and add the following line (replace <vm-ip-address> with the IP address of the VM):

    <vm-ip-address> quickstart.cloudera
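Steps 3 and 6 above each add a single line to a configuration file. A minimal Python sketch of doing that idempotently follows. The helper names `ensure_line` and `hosts_entry` are hypothetical and not part of RapidMiner or Cloudera tooling, and the sketch assumes you run it with sufficient privileges to edit the target file.

```python
import ipaddress
from pathlib import Path


def hosts_entry(vm_ip):
    """Build the hosts-file line for the VM; raises ValueError on an invalid IP."""
    return f"{ipaddress.ip_address(vm_ip)} quickstart.cloudera"


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was modified, False if the line was already there.
    """
    target = Path(path)
    text = target.read_text() if target.exists() else ""
    if line in (entry.strip() for entry in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with target.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True


# Example calls (commented out because they modify system files):
# Inside the VM, for step 3:
#   ensure_line("/etc/default/cloudera-scm-server",
#               "export JAVA_HOME=/usr/java/jdk1.8.0_162")
# On the host, for step 6 (the hosts file location depends on the host OS;
# /etc/hosts is assumed here, and <vm-ip-address> is your VM's IP address):
#   ensure_line("/etc/hosts", hosts_entry("<vm-ip-address>"))
```

Because the helper checks for an existing identical line first, running it more than once does not create duplicate entries.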

Set up the connection in RapidMiner Studio

  1. Click the New Connection button and choose Add Connection Manually.

  2. Set Hadoop username to hive. (As an alternative, you can set both Hadoop username and Username on Hive tab to your own user.)

  3. Add quickstart.cloudera as NameNode Address

  4. Add quickstart.cloudera as Resource Manager Address

  5. Add quickstart.cloudera as Hive Server Address

  6. Select Cloudera Hadoop (CDH5) as Hadoop version

  7. Add the following entries to the Advanced Hadoop Parameters:

    Key                                Value
    dfs.client.use.datanode.hostname   true

    (This parameter is not required when using the Import Hadoop Configuration Files option):

    Key                       Value
    mapreduce.map.java.opts   -Xmx256m

  8. Select the appropriate Spark Version (this should be Spark 1.6 if you want to use the VM's built-in Spark assembly jar) and set the Assembly Jar Location to the following value:

    local:///usr/lib/spark/lib/spark-assembly.jar
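Before testing the connection in RapidMiner Studio, it can help to confirm that the host machine can reach the VM's services by name. The following is a minimal sketch, not part of Radoop. Port 7180 (Cloudera Manager) comes from this guide. The other port numbers are common stock defaults and are assumptions, so verify them against your cluster configuration. The names `check_port`, `check_services` and `DEFAULT_PORTS` are hypothetical.

```python
import socket

# Port 7180 (Cloudera Manager) is taken from this guide. The remaining ports
# are common defaults and are ASSUMPTIONS; adjust them to your cluster.
DEFAULT_PORTS = {
    "Cloudera Manager": 7180,
    "NameNode": 8020,
    "Resource Manager": 8032,
    "Hive Server": 10000,
}


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name-resolution failures, refused connections and timeouts.
        return False


def check_services(host, ports=DEFAULT_PORTS):
    """Map each service name to whether its port is reachable on `host`."""
    return {name: check_port(host, port) for name, port in ports.items()}
```

Calling `check_services("quickstart.cloudera")` on the host machine returns a dictionary of service names and reachability flags. If every entry is False, the hosts file entry or the VM's network adapter setting is a likely place to look first.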