You are viewing the RapidMiner Radoop documentation for version 9.7.
Connecting to a CDH Quickstart VM
At the time of writing, the latest available version of the Cloudera Quickstart VM is 5.13, and this guide was written for that version.
Start and configure the Quickstart VM
Download the Cloudera Quickstart VM from the Cloudera website.
Import the OVA-packaged VM into your virtualization environment (VirtualBox and VMware are covered in this guide).
It is strongly recommended to upgrade to Java 1.8 on the single-node cluster provided by the VM. Otherwise, the execution of Single Process Pushdown and Apply Model operators will fail.
Take the following steps only if no clusters or Cloudera management services have been started yet. For the full upgrade process, read Cloudera's guide.
Upgrading to Java 1.8:
- Start the VM.
- Download and unzip JDK 1.8 (preferably jdk1.8.0_162 or later) to
Add the following configuration line to
Launch Cloudera Express (or Enterprise trial version).
Open a web browser, and log in to Cloudera Manager (
cloudera/cloudera as credentials. Navigate to Hosts / quickstart.cloudera / Configuration. In the Java Home Directory field, enter
On the home page of Cloudera Manager, (re)start the Cloudera QuickStart cluster and Cloudera Management Service as well.
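Before restarting the services, it can help to confirm that the JDK you unpacked meets the recommended update level. The following is a minimal sketch, not part of the original guide: the helper name `jdk8_update_ok` is hypothetical, and it only inspects a version string such as the JDK directory name.

```python
import re

# Minimum update level suggested by the guide (jdk1.8.0_162 or later).
MIN_UPDATE = 162


def jdk8_update_ok(version: str, min_update: int = MIN_UPDATE) -> bool:
    """Return True if `version` names a Java 1.8 build at or above `min_update`.

    Accepts strings such as "jdk1.8.0_162" or "1.8.0_181".
    """
    # The lookbehind prevents matching inside a longer number such as "11.8.0_5".
    match = re.search(r"(?<!\d)1\.8\.0_(\d+)", version)
    if match is None:
        return False  # not a 1.8.0_<update> version string
    return int(match.group(1)) >= min_update
```

The version string can be taken from the unpacked directory name or from the output of `java -version` inside the VM.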
If you are using VirtualBox, make sure that the VM is shut down, and change the type of the primary network adapter from NAT to Host-only. In a VirtualBox environment, the VM works only with this setting.
Start the VM and wait for the boot to complete. A browser with some basic information will appear.
Edit your local hosts file (on your host operating system, not inside the VM) and add the following line (replace <vm-ip-address> with the IP address of the VM):
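Once the hosts file has been edited, you may want to confirm that the VM's hostname is actually mapped. The sketch below is an illustration only: the helper `lookup_in_hosts` is hypothetical, the hostname `quickstart.cloudera` is taken from the connection settings later in this guide, and the IP address in the sample text is a made-up placeholder.

```python
from typing import Optional


def lookup_in_hosts(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the first IP address mapped to `hostname` in hosts-file text, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "<ip> <name> [<alias> ...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return fields[0]
    return None


# Sample content; the VM address below is a placeholder, not a real value.
sample = """
127.0.0.1      localhost
192.168.56.101 quickstart.cloudera   # hypothetical VM address
"""
print(lookup_in_hosts(sample, "quickstart.cloudera"))
```

To check your real file, read its contents and pass them as `hosts_text`. The file is typically `/etc/hosts` on Linux and macOS, and `C:\Windows\System32\drivers\etc\hosts` on Windows.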
Set up the connection in RapidMiner Studio
Click the New Connection button and choose Add Connection Manually.
Set Hadoop username to hive. (As an alternative, you can set both Hadoop username and Username on the Hive tab to your own user.)
quickstart.cloudera as NameNode Address
quickstart.cloudera as Resource Manager Address
quickstart.cloudera as Hive Server Address
Select Cloudera Hadoop (CDH5) as Hadoop version
(This parameter is not required when using the Import Hadoop Configuration Files option):
Select the appropriate Spark Version (this should be Spark 1.6 if you want to use the VM's built-in Spark assembly jar) and set the Assembly Jar Location to the following value: