I've been struggling to access Hive table data from MapReduce (MR). There were several stumbling blocks:

  • Making sure Maven was pulling in the right libraries.
  • Getting the correct hive-site.xml picked up by the configuration mechanism.
  • Sorting out the differences between the old JobConf API and the new Job (org.apache.hadoop.mapreduce) API.

JobConf and everything in the org.apache.hadoop.mapred package belong to the old API for writing Hadoop jobs; Job and everything in the org.apache.hadoop.mapreduce package make up the new API. The main points here are to change imports from the mapred packages to mapreduce, and to use Configuration instead of JobConf.

Here's a good reference:

http://hadoopbeforestarting.blogspot.de/2012/12/difference-between-hadoop-old-api-and.html
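Since porting is mostly a matter of swapping imports, a quick scan for leftover old-API imports can help. The following is a small stdlib-only sketch, not part of any Hadoop tooling: the class name OldApiImportScanner and its mapping table are my own (hypothetical) and deliberately incomplete. It flags org.apache.hadoop.mapred imports in a source file and suggests the new-API counterpart; it never loads Hadoop, so it runs without any Hadoop jars.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flags old-API (org.apache.hadoop.mapred) imports in Java source text. */
public class OldApiImportScanner {

    // Partial, hand-written mapping from common old-API types to new-API counterparts.
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("org.apache.hadoop.mapred.JobConf",
                "org.apache.hadoop.conf.Configuration + org.apache.hadoop.mapreduce.Job");
        REPLACEMENTS.put("org.apache.hadoop.mapred.JobClient",
                "org.apache.hadoop.mapreduce.Job (waitForCompletion)");
        REPLACEMENTS.put("org.apache.hadoop.mapred.Mapper", "org.apache.hadoop.mapreduce.Mapper");
        REPLACEMENTS.put("org.apache.hadoop.mapred.Reducer", "org.apache.hadoop.mapreduce.Reducer");
        REPLACEMENTS.put("org.apache.hadoop.mapred.OutputCollector", "Mapper.Context / Reducer.Context");
        REPLACEMENTS.put("org.apache.hadoop.mapred.Reporter", "Mapper.Context / Reducer.Context");
        REPLACEMENTS.put("org.apache.hadoop.mapred.FileInputFormat",
                "org.apache.hadoop.mapreduce.lib.input.FileInputFormat");
        REPLACEMENTS.put("org.apache.hadoop.mapred.FileOutputFormat",
                "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat");
    }

    /** Returns one "oldType -> suggestion" line per old-API import found in the source. */
    public static List<String> scan(String source) {
        List<String> findings = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.trim();
            // The trailing dot matters: "mapred." does not match "mapreduce."
            if (!trimmed.startsWith("import org.apache.hadoop.mapred.")) {
                continue;
            }
            String type = trimmed.substring("import ".length()).replace(";", "").trim();
            String suggestion = REPLACEMENTS.getOrDefault(type,
                    "look for the counterpart under org.apache.hadoop.mapreduce");
            findings.add(type + " -> " + suggestion);
        }
        return findings;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "import org.apache.hadoop.mapred.JobConf;",
                "import org.apache.hadoop.mapreduce.Job;",
                "import org.apache.hadoop.mapred.OutputCollector;");
        scan(source).forEach(System.out::println);
    }
}
```

Anything not in the table falls back to a generic hint, so the output is a checklist to work through by hand rather than an automatic rewrite.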

If you're developing in Eclipse and want to run in standalone or pseudo-distributed mode, add the directory containing hive-site.xml to the project's Java classpath.
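To confirm the file is really visible, it can be looked up as a classpath resource. This is a minimal sketch, assuming Hive locates hive-site.xml the same way Hadoop-style configuration classes find their *-site.xml files (by resource name on the classpath); HiveSiteLocator is a hypothetical name, and only the JDK is used.

```java
import java.net.URL;

/** Hypothetical helper: reports where (if anywhere) hive-site.xml is visible on a classpath. */
public class HiveSiteLocator {

    /** Looks hive-site.xml up as a classpath resource; returns null if it is not visible. */
    public static URL locate(ClassLoader loader) {
        return loader.getResource("hive-site.xml");
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        URL url = locate(loader);
        if (url == null) {
            System.out.println("hive-site.xml is NOT on the classpath; add the directory that contains it");
        } else {
            System.out.println("hive-site.xml found at " + url);
        }
    }
}
```

Running its main from the same Eclipse launch configuration as the job should show whether the build-path entry took effect, and which copy wins if several are present.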

Initially, it was not clear how to set up the Maven dependencies for this project. I needed to pull in Hive, but the Hortonworks repository did not have the right version, so I ended up using both the Hortonworks repository and Maven Central, pulling the Hive dependencies from Maven Central.

I'll revisit this when it's all up and running.

Here are the dependencies (groupId : artifactId : version):

  • org.apache.hadoop : hadoop-client : 2.4.0.2.1.5.0-695
  • org.apache.hive.hcatalog : hive-hcatalog-core : 0.13.1

And the repositories:

  • Maven Repository Switchboard: http://repo1.maven.org/maven2
  • HDPReleases (HDP Releases): http://repo.hortonworks.com/content/repositories/releases/

Update: I keep running into a compatibility problem with my Hive MapReduce read code:
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
From what I gather, the error comes from mixing MR1 and YARN code, but it's not clear whether that's the cause of this problem. I tried running the HiveRead job outside of Eclipse to see if the error went away. Here are the steps I had to take to set up the Hive environment variables:

source /etc/alternatives/hadoop-conf/hadoop-env.sh
source /etc/alternatives/hive-conf/hive-env.sh
export HIVE_HOME=/usr/lib/hive
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/lib/hive/lib/*:/etc/alternatives/hive-conf"
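One thing worth double-checking in the HADOOP_CLASSPATH line: on a Java classpath, a plain directory entry such as /usr/lib/hive/lib only exposes .class files and resources stored under it, not the jars inside it; the jars need the dir/* wildcard form (Java 6 and later). The hadoop launcher adds HADOOP_CLASSPATH to the classpath it hands to java, so the same rule should apply there. The sketch below (ClasspathEntryCheck is a hypothetical, JDK-only helper) flags directory entries that contain jars but are not written as wildcards.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: flags classpath entries that are plain directories holding jars. */
public class ClasspathEntryCheck {

    /** Returns one warning per directory entry whose jars would NOT be picked up. */
    public static List<String> dirsHidingJars(String classpath) {
        List<String> suspicious = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty() || entry.endsWith("*")) {
                continue; // empty entries and "dir/*" wildcards are fine
            }
            File dir = new File(entry);
            // Two-argument lambda resolves to FilenameFilter; listFiles may return null on I/O error.
            File[] jars = dir.isDirectory() ? dir.listFiles((d, name) -> name.endsWith(".jar")) : null;
            if (jars != null && jars.length > 0) {
                suspicious.add(entry + " contains " + jars.length
                        + " jar(s) that are not on the classpath; use " + entry + File.separator + "*");
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        String cp = System.getenv().getOrDefault("HADOOP_CLASSPATH", "");
        dirsHidingJars(cp).forEach(System.out::println);
    }
}
```

Run after the exports above, an empty result means every directory entry is either a conf directory (like hive-conf) or already a wildcard.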

NOTE: I'm using Maven/Eclipse.

This error is encountered when the jar is built without its dependencies and run with

hadoop jar HiveRead.jar com.bcampbell.hadoopproject.App

which fails with:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
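Both errors in this post can be narrowed down by asking the JVM what it actually loaded. A NoClassDefFoundError for HiveConf means the Hive jars never reached the runtime classpath. The IncompatibleClassChangeError is consistent with the MR1/YARN theory: org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1 but an interface in Hadoop 2, so code compiled against one and run against the other can fail in exactly this way. The ClassProbe below is a hypothetical, JDK-only sketch that reports, for a class name, whether it loads, whether it is an interface or a class, and which jar it came from.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports whether a class loads, its kind, and its origin jar. */
public class ClassProbe {

    public static String probe(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassProbe.class.getClassLoader());
            String kind = c.isInterface() ? "interface" : "class";
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL where = (src == null) ? null : src.getLocation();
            return className + ": " + kind + " from " + (where == null ? "<bootstrap/JDK>" : where);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": NOT LOADABLE (" + e + ")";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.hive.conf.HiveConf",   // the class from the NoClassDefFoundError
            "org.apache.hadoop.mapreduce.JobContext"  // class in Hadoop 1, interface in Hadoop 2
        };
        for (String name : names) {
            System.out.println(probe(name));
        }
    }
}
```

Packaged into the job jar and launched with the same hadoop jar command, it should show which Hive and MapReduce jars actually end up on the cluster-side classpath.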