Hadoop Cluster
About
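Hadoop is an open-source framework written in Java for distributed storage and processing; it combines the Hadoop Distributed File System (HDFS) with an implementation of MapReduce. This project examined how Hadoop MapReduce scales as the cluster grows while the problem size stays fixed. The study was run on both a physical cluster and a virtual cluster.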
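A fixed problem run on a growing cluster is usually summarized by speedup and parallel efficiency relative to the smallest cluster. The sketch below shows one way such numbers could be computed from measured job times; it is not taken from the project's scripts. The function name strong_scaling and the timings in the example are hypothetical placeholders, not results from this study.

```python
def strong_scaling(runtimes):
    """Compute speedup and efficiency for a fixed-size job.

    runtimes: dict mapping node count -> wall-clock seconds for the same job.
    Returns a list of (nodes, speedup, efficiency) tuples, sorted by node count.
    """
    base_nodes = min(runtimes)          # smallest cluster is the baseline
    base_time = runtimes[base_nodes]
    rows = []
    for nodes in sorted(runtimes):
        speedup = base_time / runtimes[nodes]
        # Efficiency: achieved speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder timings for illustration only, NOT measurements from the study.
    example = {1: 100.0, 2: 50.0, 4: 40.0}
    for nodes, speedup, efficiency in strong_scaling(example):
        print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1 means the job scales almost linearly; values that fall as nodes are added indicate overhead such as job startup, shuffle traffic, or too little input per node.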
Results of a Scalability Performance Study
Source Code
The source code consists of several Bash scripts written to run the experiments.
Input
The input consists of a variety of free ebooks downloaded from Project Gutenberg.
Output
The output of the experiments can be downloaded below.