  1. Introduction
  2. Prerequisites
  3. Set Environment Variables
  4. Set up the SSH daemon
  5. Download Hadoop and place it in the home directory
  6. Unpack Hadoop
  7. Configure Hadoop
  8. Format the namenode
  9. Set up the Hadoop plug-in
  10. Start the cluster
  11. Set up the Hadoop location
  12. Upload data
  13. Create and run a test project

Prerequisites

Before we begin, make sure the following components are installed on your workstation:

  1. Java 1.6
  2. Eclipse 3.3.2

This tutorial was written for and tested with Hadoop version 0.19.1. If you are using another version, some steps may not work as described.

Make sure you have exactly the versions of the software that this tutorial specifies. Hadoop will not work with versions of Java earlier than 1.6, and it will not work with versions of Eclipse later than 3.3.2 because of plug-in API incompatibility.
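As a quick way to check the Java requirement, the minimal sketch below parses the banner printed by `java -version` and compares it with the 1.6 minimum. The function names (`parse_java_version`, `java_is_supported`, `read_java_banner`) are our own illustrations, not part of Hadoop, and the sketch assumes Python 3.7 or later is available and that `java` is on your PATH.

```python
import re
import subprocess

# Minimum Java version required by this tutorial.
MIN_JAVA = (1, 6)


def parse_java_version(banner):
    """Return (major, minor) found in `java -version` style text, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor)


def java_is_supported(banner):
    """True if the banner reports a Java version of at least MIN_JAVA."""
    version = parse_java_version(banner)
    return version is not None and version >= MIN_JAVA


def read_java_banner():
    """Run `java -version`; the banner is written to stderr, not stdout."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr
```

To use it, call `java_is_supported(read_java_banner())` from a Python prompt. The check covers only the lower bound on Java; it does not verify the Eclipse version.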

Installing Cygwin

After installing the prerequisite software, the next step is to install the Cygwin environment. Cygwin is a set of Unix packages ported to Microsoft Windows. It is needed to run the scripts supplied with Hadoop because they are all written for the Unix platform.

To install the Cygwin environment, follow these steps:

  1. Download the Cygwin installer from http://www.cygwin.com.
  2. Run the downloaded file. You will see the window shown in the screenshots below.


    (Screenshots: Cygwin installer)
  3. When you see the window shown above, keep pressing the 'Next' button until you reach the package selection screen. Make sure you select 'openssh'. This package is required for the Hadoop cluster and the Eclipse plug-in to function correctly.


  4. After you have selected the required packages, press the 'Next' button to complete the installation.
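Once the installer finishes, you may want to confirm that the 'openssh' package actually landed on your system. The sketch below is one possible check, not part of Cygwin or Hadoop: it looks for the `ssh` and `ssh-keygen` commands on the PATH. The helper names are hypothetical, and the sketch assumes Python 3.3 or later is run from a shell whose PATH includes the Cygwin `bin` directory.

```python
import shutil

# Commands we assume are installed by the Cygwin 'openssh' package.
OPENSSH_TOOLS = ["ssh", "ssh-keygen"]


def missing_tools(names):
    """Return the entries of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(names=OPENSSH_TOOLS):
    """Build a one-line summary of which commands, if any, are missing."""
    missing = missing_tools(names)
    if missing:
        return ("Missing: " + ", ".join(missing)
                + " (rerun the Cygwin installer and select 'openssh')")
    return "All OpenSSH commands found"
```

Calling `print(report())` shows the summary. If anything is reported missing, rerun the installer and select 'openssh' on the package selection screen.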
