Hadoop Cluster

About

Hadoop is an open-source framework written in Java that combines a distributed file system (HDFS) with an implementation of MapReduce. This project studied how Hadoop MapReduce scales as the cluster grows while the problem size stays fixed. The study was run on both a real cluster and a virtual cluster.
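A fixed-problem scalability study of this kind is commonly summarized with two numbers per cluster size: speedup (baseline runtime divided by runtime on N nodes) and parallel efficiency (speedup divided by the factor by which the cluster grew). The sketch below shows that arithmetic in Python. It is not taken from the project's scripts; the function name `strong_scaling` and the timings are hypothetical placeholders, not results from this study.

```python
def strong_scaling(runtimes):
    """Compute speedup and efficiency for a fixed-size job.

    runtimes: dict mapping node count -> wall-clock seconds for the same job.
    Returns a list of (nodes, speedup, efficiency) tuples, smallest cluster first.
    """
    base_nodes = min(runtimes)          # smallest cluster is the baseline
    base_time = runtimes[base_nodes]
    rows = []
    for nodes in sorted(runtimes):
        speedup = base_time / runtimes[nodes]
        # Ideal speedup equals the growth factor nodes / base_nodes.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


# Placeholder timings for illustration only (NOT measurements from the study).
example = {1: 100.0, 2: 50.0, 4: 40.0}
for nodes, speedup, efficiency in strong_scaling(example):
    print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1.0 means the job scales nearly linearly; values well below 1.0 suggest that overhead such as job startup, shuffling, or data transfer is dominating as nodes are added.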

Results of a Scalability Performance Study

Hadoop Scalability Report

Hadoop Scalability Slides

Source Code

The source code consists of several bash scripts that were written to run the experiments.

Source Code and Scripts

Input

The input consists of a variety of free ebooks downloaded from Project Gutenberg.

Input

Output

The output of the experiments can be downloaded below.

Output