You are viewing the RapidMiner Radoop documentation for version 10.1 - Check here for latest version

RapidMiner Radoop Property Settings

The table below describes the general settings that influence RapidMiner Radoop's operation. They are found in the RapidMiner Studio > Settings > Preferences pull-down menu dialog, under the Radoop tab. All other settings that control the execution are located in the Connection settings. See the Configuring Radoop Connections page for a complete list.

Note that each internal key starts with the prefix rapidminer.radoop.

Miscellaneous

Property	Internal key	Default value	Description
Auto describe	auto_describe	disabled	Toggles whether to automatically describe all Hive objects after connection or refresh. If enabled, this property saves the state of the toggle button on the Hadoop Data view. All metadata of your Hive objects are immediately fetched, which can be slow if there are many objects.
Describe max errors	describe.max_errors	5	Sets the threshold for errors. The Hadoop Data view considers a connection failed if it encounters, during describing Hive objects, more errors than this limit. You may have to increase this value if, for example, you have many Hive objects erroring when described (e.g., missing custom input/output format classes).

Sample size

Property	Internal key	Default value	Description
Sample size overall	sample_size.overall	200000	Sets the sample size for Hadoop data sets on nest output. When data arrives onto the output of the Radoop Nest, it is fetched into the client machine's memory. Use this value to limit the size of the data (sample). A value of 0 means full sample.
Sample size breakpoint	sample_size.breakpoint	1000	Sets the sample size for a Hadoop data set after a breakpoint in a process, and in the Hadoop Data view. When you pause a RapidMiner Radoop process using a breakpoint, a sample of the processed data is fetched into the client machine's memory to be manually examined. Use this value to define the number of rows in the sample. The Hadoop Data View also uses this limit when exploring tables. A value of 0 means full sample.

Logging

Property	Internal key	Default value	Description
Enable log4j logging	log4j	disabled	Determines if log4j logs should be collected into the user folder.
Log4j properties file	log4j.properties		If log4j log collection is enabled and you wish to use your own log4j.properties file, define its location here. The file must contain the 'log4j.rootLogger' property which defines the logging level and the appenders to attach.

JDBC Connection Pool

Property	Internal key	Default value	Description
Connection pool size	connection_pool.fast_statement.size	8	Size of the Hive JDBC connection pool. Increase it if you want to run many operations in parallel (e.g. on RapidMiner Server).
Connection pool timeout	connection_pool.fast_statement.timeout	85	Timeout for waiting for available connection (seconds).

Logging

Property	Internal key	Default value	Description
Enable log4j logging	radoop.log4j	disabled	Determines if log4j logs should be collected into the user folder.
Log4j properties file	radoop.log4j.properties	empty	If log4j log collection is turned on and you wish to use your own log4j.properties file, define its location here. The file must contain the 'log4j.rootLogger' property which defines the logging level and the appenders to attach.

Categories

Versions

RapidMiner Radoop Property Settings

Miscellaneous

Sample size

Logging

JDBC Connection Pool

Logging