You are viewing the RapidMiner Radoop documentation for version 7.6.

Known Hadoop Errors

This section lists errors in the Hadoop components that might affect RapidMiner Radoop process execution. If a workaround exists for an issue, it is also described here.

General Hadoop errors

When using a Radoop Proxy or a SOCKS Proxy, HDFS operations may fail

  • The cause is HDFS-3068
  • Affects: probably newer Hadoop versions; the issue is still unresolved
  • Error message (during Full Test or file upload):
    java.lang.IllegalStateException: Must not use direct buffers with InputStream API
  • Workaround: add the property dfs.client.use.legacy.blockreader with a value of true to Advanced Hadoop Parameters.

Windows client does not work with Linux cluster on Hadoop 2.2 (YARN)

  • The cause is YARN-1824
  • Affects: Hadoop 2.2 - YARN, with Windows client and Linux cluster
  • The import test fails, with the single line in the log: /bin/bash: /bin/java: No such file or directory
  • Setting mapreduce.app-submission.cross-platform to false changes the error message to "No job control"
  • There is no workaround for this issue; upgrading to Hadoop 2.4+ is recommended.

AccessControlException in log messages

  • The cause is HADOOP-11808
  • Warning message is WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
  • This message doesn't affect the execution or the results of the process.

General Hive errors

SocketTimeoutException: Read timed out is thrown when using Hive-on-Spark

  • When Hive-on-Spark is used with Spark Dynamic Allocation disabled, the parsing of a HiveQL command may start a SparkSession, and during that period other requests may fail. See HIVE-17532.
  • Affects: Hive with Hive-on-Spark enabled
  • Solution: enabling Spark Dynamic Allocation in the Hive service avoids this issue. Note that SocketTimeoutException may still be thrown for other reasons; please consult your Hadoop support in that case.

IllegalMonitorStateException is thrown during process execution

  • The cause is probably HIVE-9598. The error usually occurs after a long period of inactivity in the Studio interface, or if the HiveServer2 service is changed or restarted.
  • Affects: Hive 0.13 (may affect earlier releases), said to be fixed in Hive 0.14
  • Error message example:
java.lang.RuntimeException: java.lang.IllegalMonitorStateException
        at eu.radoop.datahandler.hive.HiveHandler.runFastScriptTimeout(HiveHandler.java:761)
        at eu.radoop.datahandler.hive.HiveHandler.runFastScriptsNoParams(HiveHandler.java:727)
        at eu.radoop.datahandler.hive.HiveHandler.runFastScript(HiveHandler.java:654)
        at eu.radoop.datahandler.hive.HiveHandler.dropIfExists(HiveHandler.java:1853) ...
Caused by: java.lang.IllegalMonitorStateException
        at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(Unknown Source)
        at java.util.concurrent.locks.ReentrantLock.unlock(Unknown Source)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:239)
  • Workaround: Re-opening the process in Studio may solve it. If not, a Studio restart should solve it. If this is a constant issue on your cluster, please upgrade to a Hive version where this issue has been fixed (see ticket above).

Hive connection test with SSL enabled fails with Invalid Status 21

  • This can mean two things:

    • either the JDBC URL Postfix value starting with ssl=true is missing from the connection configuration (see Apache Wiki), or
    • it is properly specified, but bug HIVE-10048 in the Apache Hive JDBC Driver still leads to this error (HIVE-14019 is also related)
  • In the latter case, the following applies.
  • Affects: Radoop versions below 7.5.1, because their Apache Hive JDBC driver has this bug
  • Radoop version 7.5.1 solves this issue as it contains a patch for the Apache Hive JDBC driver
  • Workaround for versions below 7.5.1: you need to configure your HiveServer2 instance to use either Kerberos + SASL or LDAP + SSL, and not Kerberos + SSL.

Hive job fails with MapredLocalTask error

  • Hive may start a so-called local task to perform a JOIN. If there is an error during this local work (typically, an out of memory error), it may only return an error code and not a proper error message.
  • Affects: Hive 0.13.0, Hive 1.0.0, Hive 1.1.0 and potentially other versions
  • Error message example (return code may differ): return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
  • The full error message can be found on the cluster node that performed the task, by default in the local /tmp/hive/ directory.
  • Workaround: check whether the Join operator in your process uses the appropriate keys, so the result set does not explode. If the Join is defined correctly, add the following key-value pair to the Advanced Hive Parameters list for your connection: key hive.auto.convert.join with value false.

Hive job fails with Kryo serializer error

  • The cause is probably the same as for HIVE-7711
  • Affects: Hive 1.1.0 and potentially other versions
  • Error message: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: [...]
  • Workaround: re-running the process might help. Adding the following key-value pair to the Advanced Hive Parameters list for the connection prevents this type of error: key hive.plan.serialization.format with value javaXML.
  • Manually installing RapidMiner Radoop functions prevents this type of error.

Hive job fails with NoClassDefFoundError error

  • The cause is addressed in HIVE-2573; this patch is included in CDH 5.4.3
  • Affects: CDH 5.4.0 to CDH 5.4.2
  • Error message: java.sql.SQLException: java.lang.NoClassDefFoundError: [...]
  • Error message in HiveServer2 log: java.lang.RuntimeException: java.lang.NoClassDefFoundError: [...]
  • Solution: re-running the process may help, but an upgrade to CDH 5.4.3 is the permanent solution.
  • Manually installing RapidMiner Radoop functions also prevents this type of error.

Hive job fails before completion

  • The cause is probably HIVE-4605
  • Affects: Hive 0.13.1 and below
  • Error message: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
  • Error message in HiveServer2 log: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from [...] .hive-staging_hive_[...] to: .hive-staging_hive_...
  • There is no known workaround. Please re-run the process, preferably without any concurrent read from the same Hive table.

JOIN may lead to NullPointerException in CDH

  • The cause may be HIVE-3872, but there are no MAP JOIN hints
  • Usually some kind of self join of a complex view leads to this error
  • Workaround: use Materialize Data (or Multiply) before the Join operator (in case of multiple Joins, you may have to find out exactly which is the first Join that leads to this error, and materialize right before it)

Number of connections to Zookeeper reaches the maximum allowed

  • The cause is that each HiveServer2 "client" creates a new connection to Zookeeper: HIVE-4132, HIVE-6375
  • Affects Hive 0.10 and several newer versions.
  • Once the maximum number of connections is reached (the default is 60, which Radoop can easily reach), HiveServer2 becomes inaccessible because its connection attempts to Zookeeper fail. A HiveServer2 or Zookeeper restart is required in this case.
  • Workaround: increase the maxClientCnxns property of Zookeeper, e.g. to 500.
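
To verify which limit is currently in effect, the Zookeeper configuration can be inspected. The following is a minimal sketch, assuming the setting lives in a plain key=value zoo.cfg file; the helper name and the sample file content are hypothetical, and clusters managed through an administration console may keep this setting elsewhere.

```python
def read_max_client_cnxns(zoo_cfg_text, default=60):
    """Return the maxClientCnxns value found in zoo.cfg-style text.

    Falls back to `default` (60, the default quoted above) if the key is absent.
    """
    for raw_line in zoo_cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "maxClientCnxns":
            return int(value.strip())
    return default


# Hypothetical zoo.cfg content, for illustration only.
sample_cfg = """
tickTime=2000
clientPort=2181
maxClientCnxns=500
"""

print(read_max_client_cnxns(sample_cfg))  # prints 500
```

If the key is missing from the file, the helper reports the default of 60, which is the situation in which the workaround above is most likely to be needed.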

Non latin1 characters may cause encoding issues when used in a filter clause or in a script

  • The cause is that RapidMiner Radoop relies heavily on Hive VIEW objects. The code of a VIEW is stored in the Hive Metastore database, which is an arbitrary relational database usually created during the Hadoop cluster install. If this database does not handle the character encoding well, then the RapidMiner Radoop interface will also have issues.
  • Affects: Hive Metastore DB created by default MySQL scripts, and it may affect other databases as well
  • Workaround: your Hadoop administrator can test whether your Hive Metastore database can deal with the desired encoding. A Hive VIEW, created through Beeline, that contains a filter clause with non latin1 characters should return the expected result set when used as a source object in a SELECT query. Please contact your Hadoop support regarding encoding issues with Hive.

Confidence values are missing or predicted label may be shown incorrectly (e.g. after a Discretize operator)

  • This issue probably comes up only if the Hive Metastore is installed on a MySQL relational database with latin1 as the default character set, and the label contains special multibyte UTF-8 characters, like the infinity symbol (∞) that a Discretize operator uses.
  • Affects: Hive Metastore DB created by default MySQL scripts, and it may affect other databases as well
  • Workaround: your Hadoop administrator can test whether your Hive Metastore database can deal with the desired encoding. A Hive VIEW, created through Beeline, that contains a filter clause with non latin1 characters should return the expected result set when used as a source object in a SELECT query. Please contact your Hadoop support regarding encoding issues with Hive.
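
To see why the infinity symbol is problematic for a latin1 database, it helps to check whether a label can be represented in that character set at all. The sketch below only illustrates representability on the client side and does not reproduce the Metastore behavior; the helper name and the sample label are hypothetical. Windows-1252 is included because MySQL's latin1 is close to that code page.

```python
def fits_in_charset(text, charset):
    """Return True if every character of `text` can be encoded in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical Discretize-style label, for illustration only.
label = "range1 [-∞ - 0.5]"

print(len("∞".encode("utf-8")))           # 3: the symbol is multibyte in UTF-8
print(fits_in_charset(label, "latin-1"))  # False
print(fits_in_charset(label, "cp1252"))   # False
print(fits_in_charset(label, "utf-8"))    # True
```

A label that fails this check cannot be stored losslessly in a latin1 column, which is consistent with the missing confidence values described above.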

Attribute roles and binominal mappings may be lost when storing in a Hive table with non-default format

  • The cause is HIVE-6681
  • It is probably fixed in Hive 0.13
  • The roles and the binominal mappings are stored in column comments, so they are lost when these comments are replaced with 'from deserializer'.

PARQUET format may cause Hive to fail

  • There are several related issues, one of them is HIVE-6375
  • Affects: the issue is said to be fixed in Hive 0.13, but that version may still have issues like HIVE-6938
  • The CREATE TABLE AS SELECT (CTAS) statement fails with MapRedTask return code 2, with a NullPointerException, or with an ArrayIndexOutOfBoundsException caused by column name issues
  • Workaround: use a different format. The Materialize Data operator may not be enough, as the CTAS statement itself gives the error.

ClassNotFoundException during Model apply (more than one UDF in a query)

  • The error message is HiveQL problem (java.sql.SQLException: java.lang.NoClassDefFoundError: com/rapidminer/operator/AbstractIOObject$InputStreamProvider) or the class might be com/rapidminer/operator/OperatorException
  • Affects: RapidMiner Radoop 2.3.1 and earlier releases. Later releases contain a built-in workaround for this issue.
  • Potential cause: Hive sometimes does not handle different JAR dependencies for UDFs in the same query.
  • Workaround: use the Materialize Data operator before and/or after the Apply Model operator.
  • Another workaround is manually installing RapidMiner Radoop functions. For more information see the Installing RapidMiner Radoop functions manually section on the Operation and Maintenance page.

Hive may hang while parsing large queries

  • When submitting large queries to the Hive parser, the execution may stop and later fail or never recover.
  • Workaround: This issue usually happens with the Apply Model operator with very large models (like Trees). Set the use general applier parameter to true to avoid the large queries but get the same result.

Unable to cancel certain Hive-on-Spark queries

  • The cause is HIVE-13626
  • The YARN application can be stuck in RUNNING state on the cluster if the query is canceled immediately after it is submitted.
  • This issue can be experienced by chance when a process using Hive-on-Spark is stopped. It also affects full tests applied on Radoop connections, but it is fixed in Radoop 7.2.0.
  • To resolve the situation, one can kill the job manually.
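
Killing the stuck application is typically done with the standard YARN command line client. The following is a minimal sketch, assuming the yarn client is available on the machine where it runs; the helper names and the application ID are hypothetical.

```python
import subprocess


def yarn_list_running_command():
    """Command that lists YARN applications currently in RUNNING state."""
    return ["yarn", "application", "-list", "-appStates", "RUNNING"]


def yarn_kill_command(application_id):
    """Command that kills the YARN application with the given ID."""
    return ["yarn", "application", "-kill", application_id]


def run(command):
    """Run a command and return its standard output (raises on failure)."""
    return subprocess.run(
        command, check=True, capture_output=True, text=True
    ).stdout


# Hypothetical application ID, for illustration only.
stuck_app = "application_1500000000000_0001"
print(" ".join(yarn_kill_command(stuck_app)))
# prints: yarn application -kill application_1500000000000_0001
```

On a cluster node, `run(yarn_list_running_command())` can be used first to find the ID of the stuck application before passing it to `run(yarn_kill_command(...))`.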

Starting a Hive-on-Spark job fails due to timeout

  • The hive.spark.client.server.connect.timeout property is set to 90000ms by default. This may be too short for a Hive-on-Spark job to start, especially when multiple jobs are waiting for execution (e.g. parallel execution of processes).
  • From RapidMiner Radoop 7.4, a dedicated error message explains this issue. In older versions, the following error message is shown in most cases: ERROR : Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'.
  • The property value can be modified only in the hive-site.xml cluster configuration file.
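
Before changing the timeout, it can be useful to check which value the cluster currently uses. The following is a minimal sketch, assuming the usual Hadoop-style layout of hive-site.xml (property elements with name and value children); the helper name and the sample excerpt are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(site_xml_text, name):
    """Return the value of property `name` from Hadoop-style site XML, or None."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical hive-site.xml excerpt, for illustration only.
sample_site = """
<configuration>
  <property>
    <name>hive.spark.client.server.connect.timeout</name>
    <value>90000ms</value>
  </property>
</configuration>
"""

print(read_hadoop_property(sample_site, "hive.spark.client.server.connect.timeout"))
# prints 90000ms
```

A result of None means the property is not set explicitly in the file, in which case the default mentioned above applies.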

General Impala errors

Impala may return empty results when DISTINCT or COUNT(DISTINCT ..) expressions are used

  • There are a lot of similar bug tickets.
  • The issue seems to come up only when an INSERT is used (Store in Hive operator). The DISTINCT expression may be used in Aggregate or Remove Duplicates operators.
  • A related issue: Impala still does not support multiple COUNT(DISTINCT ..) expressions (IMPALA-110)

Impala may fail to rename table if the target name previously existed

  • RapidMiner Radoop may use DROP TABLE ... and ALTER TABLE ... RENAME TO ... calls in e.g. the Store in Hive operator
  • Affects Radoop versions 7.1.0 and below
  • Error message:

    ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore: CAUSED BY: InvalidOperationException: New location for this table default.example already exists : hdfs://quickstart.cloudera:8020/user/hive/warehouse/example

  • Workaround: insert a Select Attributes operator with default parameter settings before Store in Hive, or set the custom storage parameter to true on Store in Hive. This does not modify the target table but disables the optimization in Radoop that uses the ALTER TABLE statement.

General Spark errors

Spark 2.0.1 is unable to create database

  • On Spark 2.0.1, the execution fails with the following exception: "Unable to create database default as failed to create its directory hdfs://"...
  • The cause is SPARK-17810.
  • Affects only Spark 2.0.1. Please use Spark 2.0.0 or upgrade to Spark 2.0.2 or later.
  • Workaround is to add spark.sql.warehouse.dir as an Advanced Spark Parameter with a path that begins with "file:/". This is not expected to work on Windows.

Spark job may fail with relatively large ORC input data

  • Error message is "Size exceeds Integer.MAX_VALUE"
  • The cause is SPARK-1476
  • Workaround is using Text input format. The bug may occur with Text format too if the HDFS blocks are large.
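
The limit named in the error message is Java's Integer.MAX_VALUE, that is 2^31 - 1 bytes, or just under 2 GiB. The sketch below only illustrates this arithmetic; which chunk of data actually hits the limit depends on the job, and the helper name is hypothetical.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def exceeds_int_limit(size_in_bytes):
    """Return True if a size in bytes is larger than Integer.MAX_VALUE."""
    return size_in_bytes > INT_MAX


print(INT_MAX)                           # 2147483647
print(exceeds_int_limit(2 * 1024**3))    # True: 2 GiB is 2147483648 bytes
print(exceeds_int_limit(128 * 1024**2))  # False: 128 MiB is well below the limit
```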

Reading the output of Spark Script may fail for older Hive versions if the DataFrame contains too many null values

  • The Spark job succeeds, but reading the output Parquet table fails with a NullPointerException.
  • Affects Hive 1.1 and below. The cause is PARQUET-136.
  • Workaround is using the fillna() function on the output DataFrame (Python API, R API)
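
For the Python API, the fillna() workaround can be sketched as follows. This is a minimal sketch, not the operator's required script structure: the helper name and the default replacement values are hypothetical, and it relies only on the `dtypes` attribute and the `fillna()` method of a PySpark DataFrame. Note that replacing nulls changes the data, so the placeholder values should be chosen to suit the downstream process.

```python
def fill_nulls(df, numeric_default=0, string_default=""):
    """Replace nulls in a Spark DataFrame before it is returned as output.

    Uses `df.dtypes` (a list of (column, type) pairs) to pick a replacement
    per column and passes the resulting mapping to `df.fillna()`.
    """
    numeric_types = ("tinyint", "smallint", "int", "bigint", "float", "double")
    replacements = {}
    for column, dtype in df.dtypes:
        if dtype == "string":
            replacements[column] = string_default
        elif dtype in numeric_types:
            replacements[column] = numeric_default
    # Columns of other types (e.g. timestamp) are left untouched.
    return df.fillna(replacements) if replacements else df
```

Inside a Spark script, the helper would be applied to the DataFrame just before it is returned, for example `output_df = fill_nulls(output_df)`.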

Exception in Spark job may not fail the process if no output connection is defined

  • An exception occurs in the Spark script, but the Spark job succeeds. If the operator has no output, the process succeeds as well.
  • The causes are SPARK-7736 and SPARK-10851.
  • Affects Spark 1.5.x. Fixed in Spark 1.5.1 for Python, Spark 1.6.0 for R. The exception can be checked in the ApplicationMaster logs on the Resource Manager web interface.
  • Workaround is upgrading to Spark 1.5.1/1.6.0 or making a dummy output connection and returning a (small) dummy dataset.