Combining SNIA Cloud, Tape and Container Format Technologies for the Long Term Retention of Big Data (Spring 2013)

webinar

Author(s)/Presenter(s):

Roger Cummings

Simona Rabinovici-Cohen

Library Content Type

Presentation

Tutorial

Abstract

Generating and collecting very large data sets is becoming a necessity in many domains that also need to keep that data for long periods. Examples include astronomy, atmospheric science, genomics, medical records, photographic archives, video archives, and large-scale e-commerce. While this presents significant opportunities, a key challenge is providing economically scalable storage systems that store and preserve the data efficiently and still allow search, access, and analytics on that data in the far future. Both cloud and tape technologies are viable alternatives for storing big data, and SNIA supports their standardization. The SNIA Cloud Data Management Interface (CDMI) provides a standardized interface to create, retrieve, update, and delete objects in a cloud. The SNIA Linear Tape File System (LTFS) takes advantage of a new generation of tape hardware to provide efficient access to tape using standard, familiar system tools and interfaces. In addition, the SNIA Self-contained Information Retention Format (SIRF) defines a storage container for long-term retention that will enable future applications to interpret stored data regardless of the application that originally produced it.

This tutorial will present the advantages and challenges of long-term retention of big data, as well as initial work on combining SIRF with LTFS and SIRF with CDMI to address some of those challenges. SIRF with CDMI will also be examined in the European Union integrated research project ENSURE (Enabling kNowledge, Sustainability, Usability and Recovery for Economic value).

Learning Objectives

Recognize the challenges and value of long-term preservation of big data, and the role of new cloud and tape technologies in addressing those challenges.
Identify the need for SIRF, its use cases, and its proposed architecture. Also, review the latest activities in the SNIA Long Term Retention (LTR) Technical Working Group to combine SIRF with LTFS and SIRF with CDMI for long-term retention and mining of big data.
Discuss the use of SIRF with CDMI in the ENSURE project, which draws on actual commercial use cases from health care, clinical trials, and financial services.