Over 900 people (and counting) have watched our SNIA Networking Storage Forum (NSF) webcast, “Object Storage: Trends, Use Cases” where our expert panelists had a lively discussion on object storage characteristics, use cases and performance acceleration. If you have not seen this session yet, we encourage you to check it out on-demand. The conversation included several interesting questions related to object storage. As promised, here are answers to them:
Q: Object storage today enables many new capabilities but also brings new challenges, such as the need for geographic and local load balancers in a distributed scale-out infrastructure that do not themselves become a bottleneck for the object services at an unsustainable cost. Are there any solutions available today that have these features built in?
A: Some object
storage solutions have features such as load balancing and geographic
distribution built into the software, though often the storage administrator must
manually configure parts of these features at the network and/or server level.
Most cloud object storage (STaaS) implementations include a distributed, scale-out infrastructure, including load balancing.
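Distributing objects across storage nodes is commonly done with consistent hashing, which spreads load while minimizing data movement when nodes are added or removed. Below is a minimal illustrative sketch of the technique; the node names and virtual-node count are hypothetical and this is not any particular vendor's implementation:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Map object keys to storage nodes; adding or removing a node
    relocates only roughly 1/N of the keys."""
    def __init__(self, nodes, vnodes=64):
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                h = int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16)
                self.ring.append((h, node))
        self.ring.sort()

    def node_for(self, key):
        """Find the first virtual node clockwise from the key's hash."""
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("photos/2023/cat.jpg")
```

Because the mapping is computed rather than looked up, any front-end can route a request without consulting a central directory, which is one way a load balancer avoids becoming a bottleneck.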
Q: What’s the approximate
current market share of block vs. file vs. object storage deployed today? Where
do you see this going in the next 5 years?
A: You can analyze
this based on spending or capacity, since object storage typically costs less
per terabyte than block or file storage. Including all private and public cloud
storage worldwide, object storage probably makes up between 20-30% of the
spending and between 40-60% of all storage capacity. If we look only at
enterprise (not cloud) storage, then object storage probably constitutes 10-15%
of spending and 20-30% of capacity.
Q: There was a comment at the start of the discussion that object storage is less performant than block/file, which was then clarified as a myth. Can you share some performance numbers for a given size of data?
A: On average,
existing object storage is less performant than existing block/file storage
because it is usually deployed on top of slower storage media, slower servers, and
slower networks. But there is no reason object storage needs to be any slower
than block/file storage for throughput and large I/O sizes. If deployed using
fast infrastructure, the fastest object storage solutions run just as fast—in
throughput terms—as the fastest block or file storage. However, in many cases,
object storage may not be appropriate for highly-transactional small I/O
workloads, which typically run on top of block or file storage.
Q: Do I need to
transform to key value or can I just query S3?
A: You don’t query S3. Retrieving an object via S3 is simply an HTTP GET request, which can be done from a browser. Many types of object storage support the S3 API, either natively or through translation, but some may require switching your applications to a different key value storage API.
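To illustrate how simple the retrieval path is, the sketch below builds the documented virtual-hosted-style URL for an S3 object; a plain HTTP GET on such a URL returns a public object's bytes (private objects additionally require request signing). The bucket, key, and region values here are hypothetical:

```python
from urllib.parse import quote

def s3_object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style S3 URL. Works as-is only for public
    objects; private objects need a signed request on top of this."""
    # quote() percent-encodes special characters but leaves '/' intact,
    # so multi-level keys keep their path structure.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"

url = s3_object_url("my-bucket", "reports/2023/q1.csv")
# An HTTP GET on this URL (e.g. via urllib.request.urlopen) retrieves the object.
```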
Q: Where does NVMe
KV Command Set (in NVMe 2.0) sit in the S3 Amazon stack? How does it change the
API structure?
A: The NVMe Key Value Command Set does not sit at the same level as the S3 API. The S3 API sits above protocols like the NVMe KV Command Set. The SNIA Key Value API allows a library to be written to the NVMe KV Command Set specification, which is part of NVMe 2.0. Amazon S3 today supports use of key value pairs but does not currently employ the SNIA Key Value Storage API.
Q: Aren’t
analytics on Object Storage slow and difficult? Have there been any changes in
this area that make analytics faster?
A: This is one of the myths about object storage that we wanted to debunk in this webcast. Analytics on object storage is only slow if the storage itself is slow, and only difficult if the analytics tools cannot query object storage directly.
While it is true that most traditional object storage deployed in the past ran
on slower storage media (and connected with slower networks), there are now
fast object storage solutions that can perform just as well as block or file storage
solutions. In fact, some object storage software/service options include
analytics capabilities built into the storage servers, and computational
storage can include analytics capabilities within the drives themselves.
Q: For Kubernetes,
if the client is the app why is CSI required (COSI)?
A: CSI provides an interface between the containerized app and persistent storage outside of the Kubernetes orchestrator, allowing storage vendors to support containerized applications. COSI (the Container Object Storage Interface) plays the analogous role for object storage, provisioning buckets rather than volumes.
Q: Is the entire
KV database from a given S3 bucket being downloaded to the local drive?
A: AWS S3 sync can
be used to synchronize an entire bucket to a local directory, but there are
multiple ways to move data to and from AWS S3 to your local directories or
other instance types.
Q: Given the
volume, sensitivity, and the hybrid nature of data generation, location, and
access — does object storage include security/encryption/key management built
into the solution deployments?
A: Some object
storage products include encryption and key management. Others do encryption
while integrating with an external key management solution. At a high level,
any object storage solution should include support for encryption and other
security features.
Q: Does object storage
support compression and dedupe?
A: Most object
storage solutions include the ability to support dedupe or single-instance
storage (storing only one copy of identical objects if the same object is
submitted multiple times). Some object storage solutions include support for
compression performed within the storage service, but it’s more common for
objects to be compressed by the application or client before being sent to the
object storage system.
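One simple way to approximate both behaviors client-side is to compress each payload before upload and derive the object key from a hash of its content, so identical payloads collapse to a single stored copy. This is an illustrative sketch, not any product's implementation:

```python
import gzip
import hashlib

def prepare_object(data: bytes):
    """Compress the payload client-side and derive a content-addressed
    key; identical payloads produce identical keys, enabling
    single-instance storage."""
    key = hashlib.sha256(data).hexdigest()
    return key, gzip.compress(data)

# A dict stands in for the object store's key space.
store = {}
for payload in [b"hello object storage", b"hello object storage", b"other"]:
    key, blob = prepare_object(payload)
    store.setdefault(key, blob)  # duplicate payloads are stored only once
```

Here the two identical payloads map to one key, so the store holds two objects, and decompressing a stored blob recovers the original bytes.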
Q: Amazon’s S3
in-the-cloud storage means saving in data ingress-egress, but losing on the
Amazon CPU to perform the analysis in Amazon’s cloud compute platform, doesn’t
it? Not understanding how data remains “local.”
A: If you’re comparing AWS S3 to on-premises local storage, whether it will be less expensive to run analytics using AWS or using your own on-prem servers depends on the scale, maturity, and efficiency of your in-house analytics. Typically, an IT department building a small or new analytics operation will find it less costly to use AWS cloud storage and cloud analytics, while a large IT organization running a scalable, mature, and efficient analytics operation may find it can do so at a lower cost than outsourcing to AWS. Whether on-prem or in the cloud, object storage solutions can typically scale out further in capacity, while supporting a customizable level of processing performance based on the user’s requirements.
Q: “Cheap and deep” describes OpenStack Swift, which claims to be hardware agnostic (it deploys on readily available commodity hardware) – but then you have to add network bandwidth, CPU, SSDs, etc., for what you want to do at speed, which makes it cheaper in the long run to go for a purpose-built array and fabric. Why not stay client-server at the outset, with a fast array, fast processing and fast network?
A: For geographic
location, data remains local if you store it in your local data center without
replicating it to remote locations. For data analytics purposes, data is
“local” if it’s stored in the same data center or on the same network segment
as the analytics servers. When the data and the analytics servers are in
different data centers and not connected by a high-bandwidth, low-latency
network, then analytics performance may suffer. This is true for object or any
other type of storage solution. If the data is stored in Amazon servers, there
may be less control over where data remains.
Q: Does supporting
the NVMe KV Command Set in NVMe SSD/HDDs improve the performance or latency
when compared to standard NVM Command Set?
A: Using SSDs/HDDs
which support the NVMe KV Command Set structure should improve performance and
latency over using the standard NVM Command Set, if storing an object as a key value
pair.
Q: Do SSDs need to support both command sets or just one?
A: An SSD can support just the NVMe KV Command Set, just the NVM Command Set, or both. A namespace on an NVMe SSD is formatted for one or the other. To get the benefits of the NVMe KV Command Set, an SSD only needs to implement that command set.
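As a rough illustration of what such a namespace exposes, here is a toy Python model of a key-value namespace with store/retrieve/exist/delete operations in the spirit of the NVMe KV Command Set. This sketches the abstraction only, not the NVMe wire protocol, and the class and method names are our own:

```python
class KVNamespace:
    """Toy model of a key-value namespace: values are addressed directly
    by key, with no logical-block-address translation in between (the
    property the NVMe KV Command Set exposes at the device level)."""
    def __init__(self):
        self._kv = {}

    def store(self, key: bytes, value: bytes) -> None:
        self._kv[key] = value

    def retrieve(self, key: bytes) -> bytes:
        return self._kv[key]

    def exist(self, key: bytes) -> bool:
        return key in self._kv

    def delete(self, key: bytes) -> None:
        self._kv.pop(key, None)

ns = KVNamespace()
ns.store(b"user/42", b'{"name": "Ada"}')
```

The point of the abstraction is that an object store built on such a device can skip serializing each object into fixed-size blocks and maintaining a separate key-to-block mapping, which is where the latency savings over the standard NVM Command Set would come from.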
Q: Are there any recent updates on the KV Command Set ecosystem in Linux?
A: The latest drivers for Linux are available on a public GitHub site at: https://github.com/OpenMPDK/KVSSD
Q: Computational storage with S3 SELECT: Usually, an
object storage solution doesn’t write objects to a single disk, there is some
kind of erasure coding for data protection and probably some file system as an
abstraction layer which the disk may not be aware of. Also, the data is usually
encrypted. How would S3 SELECT be able to parse the original object data on a
single drive?
A: Yes, most object storage solutions use erasure coding
or a simple mirroring mechanism to ensure each object is stored in redundant
locations, and yes, erasure coding usually splits up each object across
multiple drives. A storage-side query such as AWS S3 Select runs a query on or
near the object storage servers and returns a subset of the data to the client
or requestor instead of returning the entire object to the requestor for the
query. In this type of query, the object storage servers can decrypt the data before executing the local query, if the encryption was done on the object storage side. (If the encryption was done by the client before the data was sent to the object storage, then the queries could not run at or on the storage servers.) The storage servers would also be able to reassemble an erasure-coded object locally on the storage servers to run the query, or possibly distribute and run the query on the multiple erasure coding destinations for that object.
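The idea of a storage-side query can be sketched in a few lines: filter the object where it is stored and return only the matching rows, rather than shipping the whole object to the client. This toy example stands in for S3 Select's actual SQL engine, and the CSV data is invented for illustration:

```python
import csv
import io

def storage_side_select(object_bytes: bytes, column: str, value: str):
    """Run a filter where the object lives and return only matching rows,
    instead of transferring the entire object to the client (the idea
    behind AWS S3 Select)."""
    reader = csv.DictReader(io.StringIO(object_bytes.decode()))
    return [row for row in reader if row[column] == value]

# The full object stays on the storage side; only two rows travel back.
obj = b"city,temp\nOslo,4\nCairo,31\nOslo,6\n"
rows = storage_side_select(obj, "city", "Oslo")
```

When the filter is selective, the bytes returned to the requestor can be a tiny fraction of the object, which is exactly the bandwidth saving a storage-side query is after.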
Interested in more
information on object storage? Check out the SNIA Educational Library.