Primary Data Deduplication in Windows Server 8 | SNIA

Abstract

We present the architecture of the Windows Server 8 primary data deduplication system, which is designed to achieve high deduplication savings with low computational overhead on commodity storage platforms. Deduplication savings that previously required small (~4KB) variable-length chunks are achieved with chunks 16-20x larger. A new regression chunking algorithm yields a more uniform chunk size distribution and increased deduplication savings. The challenge of scaling deduplication resource usage with data size is addressed using a RAM-frugal chunk hash index and data partitioning, so that server resources remain available for the primary workload. Efficient data access with a low RAM footprint is maintained through multiple techniques, including caching, read-ahead, and multi-level redirection tables.

Learning Objectives

Primary data deduplication and the additional challenges it poses, compared with backup deduplication, in server storage platforms.
Sub-file level deduplication, data chunking algorithms, tradeoff between deduplication space savings and chunk size.
How do deduplication and compression work together?
Scaling data deduplication processing: memory- and I/O-efficient chunk hash index, partitioned processing, and reconciliation of partitions.
Techniques for primary data serving when deduplication is enabled.
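
As a rough illustration of sub-file deduplication with variable-length chunks and a chunk hash index, the following is a minimal Python sketch. It is not the regression chunking algorithm or the index design presented in the talk. It swaps in a generic gear-style rolling hash to pick content-defined boundaries and a plain in-memory dictionary as the index. The names chunk_boundaries, deduplicate, and restore, and the size parameters (min_size, max_size, mask), are hypothetical and chosen small so the example runs quickly.

```python
import hashlib
import random

# Assumption: a fixed table of random 64-bit values for a gear-style rolling hash.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk_boundaries(data, min_size=2048, max_size=16384, mask=(1 << 12) - 1):
    """Yield (start, end) offsets of variable-length chunks of `data`.

    A boundary is declared where the low bits of the rolling hash are zero,
    subject to a minimum chunk size; a cut is forced at the maximum size.
    """
    start, n = 0, len(data)
    while start < n:
        h = 0
        end = min(start + max_size, n)
        cut = end  # forced cut if no content-defined boundary is found
        for i in range(start, end):
            h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
            if i + 1 - start >= min_size and (h & mask) == 0:
                cut = i + 1
                break
        yield start, cut
        start = cut


def deduplicate(data, index=None):
    """Split `data` into chunks and store each distinct chunk once.

    Returns (recipe, index): `recipe` is the ordered list of chunk digests
    needed to rebuild the data; `index` maps digest -> chunk bytes.
    """
    index = {} if index is None else index
    recipe = []
    for s, e in chunk_boundaries(data):
        chunk = data[s:e]
        digest = hashlib.sha256(chunk).digest()
        index.setdefault(digest, chunk)  # store only chunks not seen before
        recipe.append(digest)
    return recipe, index


def restore(recipe, index):
    """Rebuild the original bytes from a recipe and the chunk index."""
    return b"".join(index[d] for d in recipe)


if __name__ == "__main__":
    payload = random.Random(1).randbytes(200_000)
    recipe_a, index = deduplicate(payload)
    stored_once = sum(len(c) for c in index.values())
    recipe_b, index = deduplicate(payload, index)  # a second, identical "file"
    stored_twice = sum(len(c) for c in index.values())
    print(len(recipe_a), "chunks;", stored_once, "bytes stored after one copy;",
          stored_twice, "bytes stored after two copies")
```

In this sketch a second identical file adds no new chunks to the index, which is where the space savings come from. Larger average chunks mean fewer index entries per byte of data, at the possible cost of finding fewer duplicates; that is the tradeoff the second learning objective refers to.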