First install Maven. Much of the Apache ecosystem, Hadoop included, is built with Maven. Then we’ll edit the pom.xml.
To set up a Maven project, run this:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.maven.archetypes \
-DarchetypeArtifactId=maven-archetype-quickstart \
-DgroupId=com.bcampbell.hadoopproject \
-DartifactId=wordcount
Then edit the pom.xml that’s generated in the new wordcount directory. It will in turn be used to generate an Eclipse workspace.
Here’s the pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bcampbell.hadoopproject</groupId>
  <artifactId>wordcount</artifactId>
  <version>0.0.1</version>
  <packaging>jar</packaging>
  <name>wordcount</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.3.0-cdh5.1.2</hadoop.version>
  </properties>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>2.3.2</version>
          <configuration>
            <source>1.7</source>
            <target>1.7</target>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <version>2.9</version>
        <configuration>
          <projectNameTemplate>${project.artifactId}</projectNameTemplate>
          <buildOutputDirectory>eclipse-classes</buildOutputDirectory>
          <downloadSources>true</downloadSources>
          <downloadJavadocs>false</downloadJavadocs>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>1.7.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
</project>
Check the config, then compile and package the project:
mvn validate
mvn compile
mvn package
After that, generate the Eclipse workspace:
mvn -Declipse.workspace=eclipse_workspace eclipse:configure-workspace eclipse:eclipse
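Then let Eclipse know where the local Maven repository is. The configure-workspace goal should already have added the M2_REPO variable to eclipse_workspace; if you open the project in a different workspace, add it by hand: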
Window -> Preferences
Java -> Build Path -> Classpath Variables -> New
The name will be M2_REPO
The path will be your local repository, by default ~/.m2/repository
Click the OK button twice
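Use the command line to check the setup. Run it from the target directory, where mvn package wrote the jar. The App class generated by the quickstart archetype just prints "Hello World!", so that is what you should see: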
java -cp wordcount-0.0.1.jar com.bcampbell.hadoopproject.App
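Then run the WordCount job on the cluster with hadoop jar. The arguments after the class name are the HDFS input directory and the output directory; with the standard FileOutputFormat the output directory must not already exist, or the job will fail: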
hadoop jar wordcount-0.0.1.jar com.bcampbell.hadoopproject.WordCount /user/bcampbell/input output44
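The post doesn’t show the WordCount class itself. Assuming it follows the classic Hadoop example (split each line on whitespace, emit (word, 1), then sum the counts per word), here is a plain-Java sketch that runs the same map, shuffle/sort and reduce steps in memory, with no cluster needed. The class name LocalWordCount and the sample lines are made up for illustration. It’s handy for predicting what should land in output44 for a small input; note the key ordering only matches Hadoop’s for plain ASCII words. It avoids lambdas so it compiles at the pom’s Java 1.7 source level.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local sketch of the WordCount computation; uses no Hadoop classes.
public class LocalWordCount {

    // Map step: emit (word, 1) for each whitespace-separated token,
    // mirroring the StringTokenizer-based mapper in the classic example.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce step: sum every count grouped under one word.
    // The word parameter is unused; it mirrors a reducer's (key, values) input.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs map over every line, groups values by key, then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        // A TreeMap keeps keys sorted, standing in for Hadoop's shuffle and sort.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                List<Integer> values = grouped.get(pair.getKey());
                if (values == null) {
                    values = new ArrayList<>();
                    grouped.put(pair.getKey(), values);
                }
                values.add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input; replace with lines from your own input files.
        List<String> input = Arrays.asList("hello hadoop", "hello maven");
        for (Map.Entry<String, Integer> entry : run(input).entrySet()) {
            // Tab-separated "word<TAB>count", like Hadoop's default TextOutputFormat.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Run as is, it prints hadoop 1, hello 2 and maven 1 on separate tab-separated lines. On the cluster you can compare against the real job’s result with hadoop fs -cat output44/part-*.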
Sadly, I’m getting many missing libraries in the Eclipse workspace. They are specified in Maven but did not get downloaded, and they are all the Hadoop jars. There was a step for this, but I think I got it wrong. I’ll update this post when I get it working.
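One way to narrow down a missing-library problem like this is to check at runtime whether the Hadoop classes can actually be loaded. The sketch below is a hypothetical helper (HadoopClasspathCheck is not part of the project) that looks for two classes hadoop-client brings in: Configuration from hadoop-common and Job from hadoop-mapreduce-client-core. Keep in mind that hadoop-client is declared with provided scope, so the shade plugin does not bundle it: run from wordcount-0.0.1.jar with plain java -cp it should report MISSING, while run inside a correctly configured Eclipse project, or via hadoop jar, it should find both.

```java
// Hypothetical diagnostic, not part of the original project: reports whether
// the Hadoop classes WordCount depends on can be loaded on the current classpath.
public class HadoopClasspathCheck {

    // Classes pulled in transitively by the hadoop-client dependency.
    static final String[] REQUIRED = {
        "org.apache.hadoop.conf.Configuration",
        "org.apache.hadoop.mapreduce.Job"
    };

    // Returns true if the class can be located, without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that is present but whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean allFound = true;
        for (String name : REQUIRED) {
            boolean found = isOnClasspath(name);
            allFound &= found;
            System.out.println((found ? "found   " : "MISSING ") + name);
        }
        // Non-zero exit status makes the result easy to check from a script.
        if (!allFound) {
            System.exit(1);
        }
    }
}
```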
UPDATE: I changed the Hadoop dependency to:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.3.0-cdh5.1.2</version>
</dependency>
This resolved the missing jars in the .m2 repository directory.