RapidMiner Radoop documentation, version 10.1
Connecting to a CDH Quickstart VM
As of this writing, the latest available version of the Cloudera Quickstart VM is 5.13, and this guide was written for that version.
Start and configure the Quickstart VM
Download the Cloudera Quickstart VM from the Cloudera website.
Import the OVA-packaged VM into your virtualization environment (VirtualBox and VMware are covered in this guide).
It is strongly recommended to upgrade to Java 1.8 on the single-node cluster provided by the VM. Otherwise, the execution of Single Process Pushdown and Apply Model operators will fail.
You can take the following steps only if no clusters or Cloudera management services have been started yet. For the full upgrading process, read Cloudera's guide.
Upgrading to Java 1.8:
- Start the VM.
- Download and unzip JDK 1.8 (preferably jdk1.8.0_162 or greater) to /usr/java/jdk1.8.0_162.
- Add the following configuration line to /etc/default/cloudera-scm-server:
  export JAVA_HOME=/usr/java/jdk1.8.0_162
- Launch Cloudera Express (or the Enterprise trial version).
- Open a web browser and log in to Cloudera Manager (quickstart.cloudera:7180) using cloudera/cloudera as credentials. Navigate to Hosts / quickstart.cloudera / Configuration. In the Java Home Directory field, enter /usr/java/jdk1.8.0_162.
- On the home page of Cloudera Manager, (re)start both the Cloudera QuickStart cluster and the Cloudera Management Service.
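The configuration-file step above can be sketched in Python. This is a minimal illustration, not part of the official procedure: the helper name ensure_java_home is hypothetical, and it only transforms text, so reading and writing /etc/default/cloudera-scm-server (which normally requires root) is left to you.

```python
# Line that the guide asks you to add to /etc/default/cloudera-scm-server.
JAVA_HOME_LINE = "export JAVA_HOME=/usr/java/jdk1.8.0_162"


def ensure_java_home(config_text: str, line: str = JAVA_HOME_LINE) -> str:
    """Return config_text with exactly one active JAVA_HOME export.

    Any existing uncommented 'export JAVA_HOME=' line is dropped and the
    requested line is appended, so applying the function twice gives the
    same result as applying it once.
    """
    kept = [
        existing
        for existing in config_text.splitlines()
        if not existing.strip().startswith("export JAVA_HOME=")
    ]
    kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # EXAMPLE_OPTION is a made-up placeholder for whatever the file already holds.
    sample = "# defaults for cloudera-scm-server\nexport EXAMPLE_OPTION=1\n"
    print(ensure_java_home(sample), end="")
```

Replacing an older JAVA_HOME export instead of appending a second one avoids relying on which line the shell evaluates last.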
If you are using VirtualBox, make sure that the VM is shut down, and change the type of the primary network adapter from NAT to Host-only. In a VirtualBox environment, the VM works only with this setting.
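If you prefer the command line to the VirtualBox GUI, the adapter change can be sketched with the VBoxManage tool. The VM name "cdh-quickstart" and the host-only adapter name "vboxnet0" below are assumptions; check the actual names in your VirtualBox installation (on Windows the adapter is typically named differently). The VM must be powered off before the command is applied.

```python
import subprocess


def hostonly_command(vm_name: str, adapter: str = "vboxnet0") -> list:
    """Build the VBoxManage call that switches the first NIC to host-only."""
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--nic1", "hostonly",            # primary adapter: NAT -> Host-only
        "--hostonlyadapter1", adapter,   # host-only interface to attach to
    ]


def apply_hostonly(vm_name: str, adapter: str = "vboxnet0") -> None:
    """Run the command; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(hostonly_command(vm_name, adapter), check=True)


if __name__ == "__main__":
    # Only prints the command; call apply_hostonly() to actually change the VM.
    print(" ".join(hostonly_command("cdh-quickstart")))
```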
Start the VM and wait for the boot to complete. A browser with some basic information will appear.
Edit your local hosts file (on your host operating system, not inside the VM) and add the following line, replacing <vm-ip-address> with the IP address of the VM:
<vm-ip-address> quickstart.cloudera
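As a rough illustration of the hosts-file step, the sketch below builds the required line and merges it into existing hosts-file text. The function names are hypothetical and the address 192.168.56.101 is only a placeholder for your VM's real IP; writing the result back to the hosts file needs administrator rights and is not shown.

```python
import ipaddress

HOSTNAME = "quickstart.cloudera"  # hostname used throughout the guide


def hosts_line(vm_ip: str, hostname: str = HOSTNAME) -> str:
    """Return the hosts-file line for the VM; rejects malformed addresses."""
    ipaddress.ip_address(vm_ip)  # raises ValueError if vm_ip is not an IP
    return f"{vm_ip} {hostname}"


def add_hosts_entry(hosts_text: str, vm_ip: str, hostname: str = HOSTNAME) -> str:
    """Return hosts_text with a single, current mapping for hostname."""
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if hostname in fields[1:]:
            continue  # drop a stale mapping for the same hostname
        kept.append(line)
    kept.append(hosts_line(vm_ip, hostname))
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(add_hosts_entry("127.0.0.1 localhost\n", "192.168.56.101"), end="")
```

Dropping an older mapping first matters when the VM receives a new address after a reboot, because a hosts file with two entries for the same name resolves to the first one.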
Set up the connection in RapidMiner Studio
Click the New Connection button and choose Add Connection Manually.
Set Hadoop username to hive. (As an alternative, you can set both Hadoop username and Username on the Hive tab to your own user.)
Add quickstart.cloudera as NameNode Address.
Add quickstart.cloudera as Resource Manager Address.
Add quickstart.cloudera as Hive Server Address.
Select Cloudera Hadoop (CDH5) as Hadoop version.
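Before testing the connection in Studio, it can help to confirm that quickstart.cloudera resolves and that the services accept TCP connections from your host. The sketch below is one way to do that. Port 7180 comes from this guide; the other port numbers (8020, 8032, 10000) are assumed to be the usual CDH defaults and may differ on your VM.

```python
import socket

# Service ports. Only 7180 is stated in this guide; the rest are assumptions.
SERVICES = {
    "Cloudera Manager": 7180,
    "NameNode": 8020,          # assumed default
    "Resource Manager": 8032,  # assumed default
    "Hive Server": 10000,      # assumed default
}


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_services(host: str = "quickstart.cloudera", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port is reachable on host."""
    return {name: is_reachable(host, port) for name, port in services.items()}
```

Calling check_services() from your host operating system returns a dictionary of service names and booleans. If every value is False, the hosts file entry or the VM's network adapter setting is the more likely cause than any individual service.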
Add the following entries to the Advanced Hadoop Parameters:
Key                                 Value
dfs.client.use.datanode.hostname    true
(This parameter is not required when using the Import Hadoop Configuration Files option):
Key                                 Value
mapreduce.map.java.opts             -Xmx256m
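The two entries are ordinary Hadoop configuration properties. Purely as an illustration, the sketch below renders them in the property layout used by Hadoop's *-site.xml files. Radoop itself takes them as key/value pairs in the connection dialog, so no XML is needed there; the helper name to_hadoop_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Key/value pairs exactly as listed in the guide.
ADVANCED_HADOOP_PARAMETERS = {
    "dfs.client.use.datanode.hostname": "true",
    "mapreduce.map.java.opts": "-Xmx256m",
}


def to_hadoop_xml(params: dict) -> str:
    """Render params as a Hadoop-style <configuration> document."""
    configuration = ET.Element("configuration")
    for key, value in params.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(ADVANCED_HADOOP_PARAMETERS))
```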
Select the appropriate Spark Version (this should be Spark 1.6 if you want to use the VM's built-in Spark assembly jar) and set the Assembly Jar Location to the following value:
local:///usr/lib/spark/lib/spark-assembly.jar
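The Assembly Jar Location is a URI, not a plain path. In Spark, the local: scheme indicates a file that is expected to already exist on the cluster nodes' own file system, so the path refers to the VM and not to the machine running Studio. The short sketch below splits the value into its parts to make that explicit; describe_location is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Value from the guide.
ASSEMBLY_JAR_LOCATION = "local:///usr/lib/spark/lib/spark-assembly.jar"


def describe_location(location: str) -> tuple:
    """Split a jar location into (scheme, path)."""
    parts = urlsplit(location)
    return parts.scheme, parts.path


if __name__ == "__main__":
    scheme, path = describe_location(ASSEMBLY_JAR_LOCATION)
    print(f"scheme={scheme} path={path}")
```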