Abstract
Cloud-native distributed storage services (object, database, key-value, streaming log) typically provide capacity scale-out, availability, and durability guarantees in software. However, much of the performance of new storage media is lost under these software layers. Advances in storage media and protocols have also created a heterogeneous storage environment that needs new approaches to integrating mixed media types. In addition, storage services must scale for emerging elastic applications, which are dynamic and demand short-duration performance boosts for subsets of data, without over-provisioning and without incurring rebalance overheads while scaling out. We propose decoupling cluster-level tasks from the I/O path in software-defined storage (SDS) architectures. This enables containerized SDS modules with new scale-out vectors in compute and caching. These modules can be scaled based on client load, or to reduce the performance impact of SDS tasks such as scrubbing and recovery. Because they incur no rebalance overheads, they can also be affinitized to applications for better performance, and can follow application mobility through cache pre-fetching. We share our experience in decoupling SDS layers over NVMe over Fabrics (NVMe-oF), using Ceph as a case study. We demonstrate issues such as handling remote asynchronous transactions between decoupled components, and we provide proof-of-concept (PoC) performance data. We discuss the challenges of locating data when SDS components are dynamic, as well as the NVMe-oF advances needed to support distributed storage services. We hope this leads to a community discussion of the open questions that remain in this space.
Learning Objectives
Current shortcomings in SDS architectures
Extending NVMe over Fabrics for distributed storage
Storage architecture redesign for serverless applications