Abstract
This talk examines a challenging question: how can one predict the data reduction ratio to expect when storing data in a storage system that provides compression and deduplication? We will describe cutting-edge results on estimating these ratios and explain the obstacles to doing so efficiently. The answer differs considerably between the two techniques: estimation is significantly harder for deduplication than for compression. The talk will describe solutions for both tasks that are based on sampling and therefore yield estimates rather than exact values.