Practice 2 - Introduction to MapReduce - Alternative exercise guides for Eclipse IDE
This page contains alternative guides for exercises 2 and 3 of the second lab.
Exercise 4.2 Configuring Eclipse for Hadoop
- Start Eclipse IDE
- Most recent Eclipse releases include Maven project support. If yours does not, install the Maven plugin for Eclipse manually:
  - In Eclipse, go to Help -> Eclipse Marketplace
  - Search for maven
  - Install Maven Integration for Eclipse (m2e)
- Import the hadoop-mapreduce-project folder as a Maven project into Eclipse
  - Go to File -> Import -> Maven -> Existing Maven Projects -> Next
  - Set the location of the Root directory to hadoop-mapreduce-project\hadoop-mapreduce-examples\ inside the previously unpacked Hadoop source folder.
- Wait until Maven has finished configuring the project dependencies.
- Open the pom.xml file inside the project directory and resolve any errors that appear.
  - If you get an error about a connected ant build Eclipse plugin, choose to ignore it for the current project.
  - If you get an error about jdk tools:
    - Make sure your JAVA_HOME system variable points to the Java SDK installation path.
      - Open your System Control Panel: Control Panel -> System -> Advanced System Settings
      - Check whether JAVA_HOME is set. Its value should be the main directory of the Java 8 JDK on your computer.
      - If it does not exist, add a new JAVA_HOME system variable.
    - If you still get the same error, add the following dependency to the Maven pom.xml:
      <dependency>
        <groupId>jdk.tools</groupId>
        <artifactId>jdk.tools</artifactId>
        <scope>system</scope>
        <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        <version>1.8.0_161</version>
      </dependency>
      - Change the <version>1.8.0_161</version> line to match the version of the Java SDK installed on your computer.
    - Update the Maven configuration for your project
      - Right click your project (in the Eclipse Project Explorer) and choose Maven -> Update Project
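The JAVA_HOME check described above can also be done from a small Java program. The following is only a sketch and not part of the lab material: the class name JdkCheck and the helper toolsJarPath are hypothetical names chosen for illustration, and the code assumes a Java 8 JDK layout in which tools.jar sits under the lib folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JdkCheck {

    // Builds the location where a Java 8 JDK is expected to keep tools.jar,
    // relative to the given JAVA_HOME value (assumed layout: <JAVA_HOME>/lib/tools.jar).
    static Path toolsJarPath(String javaHome) {
        return Paths.get(javaHome, "lib", "tools.jar");
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            System.out.println("JAVA_HOME is not set");
            return;
        }
        System.out.println("JAVA_HOME = " + javaHome);
        System.out.println("tools.jar found: " + Files.exists(toolsJarPath(javaHome)));
        // Version of the JVM running this program. It may differ from the JDK
        // that JAVA_HOME points to if several Java versions are installed.
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```

If tools.jar is reported as missing, JAVA_HOME most likely points to a JRE or to a newer JDK rather than to a Java 8 JDK. When the program runs on the same JDK that JAVA_HOME points to, the printed java.version value is the one to use in the <version> element.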
Exercise 4.3 Running the WordCount example in Eclipse
- Create a new folder named input inside your Eclipse project. We will put all the input files for the WordCount MapReduce application there.
- Download 5 random books from Project Gutenberg in plain text (UTF-8) format.
- Move the downloaded text files into the input folder
- Find the WordCount class inside your Eclipse project (org.apache.hadoop.examples package)
- Try to run the WordCount class in Eclipse
  - Right click on the WordCount class -> Run As -> Java Application
  - You will initially see an error about the number of supplied arguments.
- Modify the run configuration of the WordCount class to specify which arguments are supplied to it.
  - Right click the WordCount class -> Run As -> Run Configurations -> Arguments
  - The WordCount class takes two command line arguments: the input folder and the output folder
  - Specify the previously created folder (where you moved the Gutenberg books) as the input folder and an arbitrarily named folder as the output folder
  - When using relative folder paths in Eclipse, folders are resolved and created inside the Eclipse project main folder
- If the execution is successful, the output is written to the part-r-00000 file inside the output folder
- Before you run the application again, the output folder must be deleted, moved, or replaced with a new folder name in the arguments.
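Removing the output folder by hand before every rerun gets tedious, because Hadoop refuses to start a job whose output folder already exists. As a minimal sketch, the cleanup can be done with plain Java file APIs. The class name OutputCleaner and the default folder name "output" are assumptions made for illustration and should be adapted to the folder name you chose in the run configuration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OutputCleaner {

    // Recursively deletes the given folder and everything inside it.
    // Returns false if the folder did not exist, true after a successful delete.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order lists files and subfolders before their parent folder,
            // so every folder is already empty when it is deleted.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Folder name is taken from the first argument; "output" is only a placeholder default.
        Path output = Paths.get(args.length > 0 ? args[0] : "output");
        boolean deleted = deleteRecursively(output);
        System.out.println(deleted ? "Deleted " + output : "Nothing to delete at " + output);
    }
}
```

Run this class from Eclipse with the output folder name as its only argument before starting WordCount again. Relative paths are resolved against the Eclipse project main folder, the same as for WordCount itself. Double-check the argument first, since the folder is deleted without confirmation.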