Abstract
The Internet changed the world and continues to revolutionize how people connect, exchange data, and do business. This radical change is one of the causes of the rapid explosion in data volume, which has required a new approach to data storage and its design. A common element is that unstructured data rules the IT world. How can the famous Internet services we all use every day support and scale with thousands of new users and hundreds of terabytes added daily, and still deliver an enterprise-class SLA? What technologies lie behind a cloud storage service that supports hundreds of millions of users? This tutorial covers technologies introduced by the famous papers on the Google File System, BigTable, and Amazon Dynamo, as well as Apache Hadoop. In addition, parallel, scale-out, distributed, and P2P approaches, both open source and proprietary, are illustrated. The tutorial also covers some key features that are essential at large scale, to help understand and differentiate industry vendors and open source offerings.
Learning Objectives
Understand technology directions for large-scale storage deployments
Be able to compare technologies
Learn from big Internet companies about their storage choices and approaches
Identify market solutions and align them with various use cases