Abstract
We will present a technique to accelerate Hadoop and other Big Data applications by optimizing data in the local file system. Hadoop is often I/O bound, so increasing I/O rates reduces map/reduce execution time. A transparent compression/decompression file system can accelerate Hadoop without any change to workflows or applications.
Learning Objectives
Understand why Hadoop is frequently I/O bound
Understand how data optimization accelerates I/O and therefore addresses the I/O-bound problem
Compare and contrast native software compression and hardware-accelerated compression in Hadoop environments
Understand the costs and benefits of various approaches to compression in Hadoop