Is the Sun Setting on Some of Your Technologies?

Tom Friend

Jan 14, 2021

So much of what we discuss within SNIA is the latest emerging technologies in storage. While it’s good to know about what technology is coming, it’s also important to understand the technologies that should be sunsetted. It’s the topic of our next SNIA Networking Storage Forum (NSF) webcast on February 3, 2021, “Storage Technologies & Practices Ripe for Refresh.”  In this webcast, you’ll learn about storage technologies and practices in your data center that are ready for refresh or possibly retirement. Find out why some long-standing technologies and practices should be re-evaluated. We’ll discuss:
  • Obsolete hardware, protocols, interfaces and other aspects of storage
  • Why certain technologies are no longer in general use
  • Technologies on their way out and why
  • Drivers for change
  • Justifications for obsoleting proven technologies
  • Trade-off risks: new/faster/better vs. proven/working technology
Register today and bring your questions for our panel of experts.


Data Deduplication FAQ

Alex McDonald

Jan 5, 2021


The SNIA Networking Storage Forum (NSF) recently took on the topics surrounding data reduction with a 3-part webcast series that covered Data Reduction Basics, Data Compression and Data Deduplication. If you missed any of them, they are all available on-demand.

In “Not Again! Data Deduplication for Storage Systems,” our SNIA experts discussed how to reduce the number of copies of data that get stored, mirrored, or backed up. Attendees asked some interesting questions during the live event and here are answers to them all.

Q. Why do we use the term rehydration for deduplication?  I believe the use of the term rehydration when associated with deduplication is misleading. Rehydration is the activity of bringing something back to its original content/size as in compression. With deduplication the action is more aligned with a scatter/gather I/O profile and this does not require rehydration.

A. "Rehydration" is used to cover the reversal of both compression and deduplication. It is used more often to cover the reversal of compression, though there isn't a popularly-used term to specifically cover the reversal of deduplication (such as "re-duplication").  When reading compressed data, if the application can perform the decompression then the storage system does not need to decompress the data, but if the compression was transparent to the application then the storage (or backup) system will decompress the data prior to letting the application read it. You are correct that deduplicated files usually remain in a deduplicated state on the storage when read, but the storage (or backup) system recreates the data for the user or application by presenting the correct blocks or files in the correct order.

Q. What is the impact of doing variable vs. fixed block on primary storage inline?

A. Deduplication is a resource-intensive process. The process of sifting the data inline by anchoring, fingerprinting and then filtering for duplicates not only requires high computational resources, but also adds latency on writes. For primary storage systems that require high performance and low latencies, it is best to keep these impacts of dedupe low. Doing dedupe with variable-sized blocks or extents (e.g., with Rabin fingerprinting) is more intensive than using simple fixed-sized blocks. However, variable-sized segmentation is likely to give higher storage efficiency in many cases. Most often this tradeoff between latency/performance and storage efficiency tips in favor of applying simpler fixed-size dedupe in primary storage systems.
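
To make that trade-off concrete, here is a minimal sketch (not taken from the webcast) that contrasts fixed-size chunking with a toy content-defined chunker. The rolling-sum anchor below is a deliberately simplified stand-in for real Rabin fingerprinting, and the window, mask, and size limits are illustrative assumptions.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4096):
    """Split data into fixed-size blocks (cheap, low latency impact)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, window: int = 48, mask: int = 0x1FFF,
                    min_size: int = 2048, max_size: int = 16384):
    """Content-defined chunking with a toy rolling sum as the anchoring
    function. Real systems use Rabin fingerprints; the idea is the same:
    cut wherever the rolling value over the last `window` bytes hits a
    chosen pattern, so chunk boundaries follow the content."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == mask) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def deduped_size(chunks) -> int:
    """Bytes left after removing chunks with identical SHA-256 fingerprints."""
    unique = {hashlib.sha256(c).digest(): len(c) for c in chunks}
    return sum(unique.values())
```

Running both chunkers over the same dataset and comparing deduped_size() against the raw size shows the efficiency gap; the extra per-byte work in the variable-size path is where the added CPU cost and write latency come from.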

Q. Are there special considerations for cloud storage services like OneDrive?

A. As far as we know, Microsoft OneDrive avoids uploading duplicate files that have the same filename, but does not scan file contents to deduplicate identical files that have different names or different extensions. As with many remote/cloud backup or replication services, local deduplication space savings do not automatically carry over to the remote site unless the entire volume/disk/drive is replicated to the remote site at the block level. Please contact Microsoft or your cloud storage provider for more details about any space savings technology they might use.

Q. Do we have an error rate calculation system to decide which type of deduplication we use?

A. The choice of deduplication technology largely depends on the characteristics of the dataset and the environment in which deduplication is done. For example, if the customer is running a performance- and latency-sensitive system for primary storage purposes, then the cost of deduplication in terms of the resources and latencies incurred may be too high and the system may use very simple fixed-size block-based dedupe. However, if the system/environment allows for spending extra resources for the sake of storage efficiency, then a more complicated variable-sized extent-based dedupe may be used. As for error rates themselves, a dedupe storage system should always be built with strong cryptographic hash-based fingerprinting so that the rate of collisions is extremely low. Errors due to collisions in a dedupe system may lead to data loss or corruption, but as mentioned earlier these can be avoided by using strong cryptographic functions.
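
To put "extremely low" into numbers, the short estimate below applies the standard birthday-bound approximation to SHA-256 fingerprints; the chunk count is an assumed example, not a figure from the webcast.

```python
# Rough birthday-bound estimate of fingerprint collision probability:
# p ~ n^2 / 2^(b+1) for n stored chunks and a b-bit hash.
def collision_probability(num_chunks: int, hash_bits: int = 256) -> float:
    return (num_chunks ** 2) / float(2 ** (hash_bits + 1))

# Assumed example: an index tracking one trillion 4 KiB chunks
# (roughly 4 EiB of logical data) fingerprinted with SHA-256.
p = collision_probability(10 ** 12, 256)
print(f"Estimated collision probability: {p:.3e}")  # on the order of 1e-54
```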

Q. Considering the current QLC SSD limitations and endurance... can we say that it is a right choice for deduped storage?

A. In-line deduplication either has no effect or reduces the wear on NAND storage because less data is written. Post-process deduplication usually increases wear on NAND storage because blocks are written and then later erased (due to deduplication), and the space later fills with new data. If the system uses post-process deduplication, then the storage software or storage administrator needs to weigh the space savings benefits against the increased wear on NAND flash. Since QLC NAND is usually less expensive and has lower write endurance than SLC/MLC/TLC NAND, one might be less likely to use post-process deduplication on QLC NAND than on more expensive NAND with higher endurance levels.
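
As a rough, hedged back-of-envelope comparison (the ingest volume and dedupe ratio below are assumptions, and garbage-collection overhead is ignored), inline and post-process dedupe differ mainly in how much data ever reaches the NAND:

```python
logical_writes_gb = 1000          # assumed daily ingest
dedupe_ratio = 3.0                # assumed 3:1 reduction

# Inline dedupe: duplicates are dropped before they ever hit flash.
inline_nand_writes = logical_writes_gb / dedupe_ratio

# Post-process dedupe: everything is written once, duplicates are erased
# later, and the reclaimed space is eventually filled by new data.
post_process_nand_writes = logical_writes_gb

print(f"inline: {inline_nand_writes:.0f} GB written to NAND per day")
print(f"post-process: {post_process_nand_writes:.0f} GB written to NAND per day")
```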

Q. On slides 11/12 - why not add compaction as well - "fitting" the data onto respective blocks and "if 1k file, not leaving the rest 3k of 4k block empty"?

A. We covered compaction in our webcast on data reduction basics “Everything You Wanted to Know About Storage But Were Too Proud to Ask: Data Reduction.” See slide #18 below.

Again, I encourage you to check out this Data Reduction series and follow us on Twitter @SNIANSF for dates and topics of more SNIA NSF webcasts.


Questions on Securing Data in Transit Answered

Alex McDonald

Dec 9, 2020

Data in transit provides a large attack surface for bad actors. Keeping data secure from threats and compromise while it is being transmitted was the topic at our live SNIA Networking Storage Forum (NSF) webcast, Securing Data in Transit. Our presenters, Claudio DeSanti, Ariel Kit, Cesar Obediente, and Brandon Hoff, did an excellent job explaining how to mitigate risks. We had several questions during the live event, and our panel of speakers has been kind enough to answer them here.

Q. Could we control the most important point – identity? That is, every data transfer must carry an identity label, so that we can detect anomalies and misbehavior easily.

A. That is the purpose of every authentication protocol: verify the identity of entities participating in the authentication protocol on the basis of some secret values or certificates associated with the involved entity. This is similar to verifying the identity of a person on the basis of an identity document associated with the person.

Q. What is BGP?

A. BGP stands for Border Gateway Protocol. It is a popular routing protocol commonly used across the Internet and also leveraged by many customers in their own environments. BGP is used to exchange routing information and next-hop reachability between network devices (routers, switches, firewalls, etc.). In order to establish this communication among neighbors, BGP creates a TCP session on port 179 to maintain and exchange BGP updates.

Q. What are 'north-south' and 'east-west' channels?

A. Traditionally, "north-south" is traffic up and down the application or solution "stack," such as client to/from server, Internet to/from applications, application to/from database, application to/from storage, etc. "East-west" is traffic between similar nodes, often peers in a distributed application or distributed storage cluster. For example, east-west traffic could include traffic from client to client, between distributed database server nodes, between clustered storage nodes, between hyperconverged infrastructure nodes, etc.

Q. If I use encryption for data in transit, do I still need a separate encryption solution for data at rest?

A. Encryption of data in transit protects the data as it flows through the network and blocks attack types such as eavesdropping; however, once it arrives at the target, the data is decrypted and saved to storage unencrypted unless data-at-rest encryption is applied. It is highly recommended to use both for best protection: data-at-rest encryption protects the data in case the storage target is accessed by an attacker. The SNIA NSF did a deep dive on this topic in a separate webcast, "Storage Networking Security Series: Protecting Data at Rest."

Q. Will NVMe-oF use three different encryption solutions depending upon whether it's running over Fibre Channel, RDMA, or IP?

A. When referring to data in transit, the encryption type depends on the network type; hence, for different networks we will use different data-in-motion encryption protocols. Nevertheless, they can all be based on Encapsulating Security Payload (ESP) with the same cipher suites and key exchange methods.

Q. Can NVMe-oF over IP already use Transport Layer Security (TLS) for encryption or is this still a work in progress? Is the NVMe-oF spec aware of TLS?

A. NVMe-oF over TCP already supports TLS 1.2. The NVM Express Technical Proposal TP 8011 is adding support for TLS 1.3.
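
For readers who want to see what enforcing a modern TLS floor looks like, here is a minimal sketch using Python's standard ssl module. The endpoint name and port are placeholders, and this is generic TLS client code rather than an NVMe-oF/TCP implementation; it simply refuses anything older than TLS 1.2 and reports what was negotiated.

```python
import socket
import ssl

# Hypothetical management endpoint; replace with a real host and port.
HOST, PORT = "storage.example.com", 4433

# Build a client context that refuses anything older than TLS 1.2
# and negotiates TLS 1.3 when both sides support it.
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
context.minimum_version = ssl.TLSVersion.TLSv1_2

with socket.create_connection((HOST, PORT), timeout=5) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        # Negotiated protocol version and cipher suite for this session.
        print("TLS version:", tls_sock.version())
        print("Cipher suite:", tls_sock.cipher())
```
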
Q. Are there cases where I would want to use both MACsec and IPsec, or use both IPsec and TLS? Does CloudSec rely on either MACsec or IPsec?

A. Because of the number of cyber-attacks happening on a daily basis, it is always critical to create a secure environment in order to protect the confidentiality and integrity of the data. MACsec is enabled on a point-to-point Ethernet link, while IPsec can be classified as end-to-end (application-to-application or router-to-router). Essentially, you could (and should) leverage both technologies to provide the best encryption possible for the application; they can co-exist without any problem. The same can be said if the application is leveraging TLS: to add an extra layer of security you can implement IPsec, for example a site-to-site IPsec VPN. This is true especially if the communication traverses the Internet. CloudSec, on the other hand, does not rely on MACsec, because MACsec is a point-to-point Ethernet link technology while CloudSec provides the transport and encryption mechanism to support multi-site encrypted communication. This is useful where more than one data center needs an encryption mechanism to protect the confidentiality and integrity of the data. A CloudSec session is a point-to-point encryption over the Data Center Interconnect between two or more sites, and CloudSec key exchange uses BGP to guarantee the correct information gets delivered to the participating devices.

Q. Does FC-SP-2 require support from both HBAs and switches, or only from the HBAs?

A. For data that moves outside the data center, Fibre Channel Security Protocols (FC-SP-2) for Fibre Channel or IPsec for IP would need to be supported by the switches or routers; no support would be required in the HBA. This is the most common use case for FC-SP-2. Theoretically, if you wanted to support FC-SP-2 inside the secure walls of the data center, you could deploy end-to-end (HBA-to-HBA) encryption and you would not need support in the switches. Unfortunately, this breaks some switch features, since information the switch relies on would be hidden. You could also do link encryption from the HBA to the switch, which would require both HBA and switch support. Unfortunately, there are no commercially available HBAs with FC-SP-2 support today, and if they become available, interoperability will need to be proven. This webcast from the Fibre Channel Industry Association (FCIA) goes into more detail on Fibre Channel security.

Q. Does FC-SP-2 key management require a centralized key management server or is that optional?

A. For switch-to-switch encryption, keys can be managed through a centralized server or manually; other solutions are available and in production today. For HBAs, in most environments there would be thousands of keys to manage, so a centralized key management solution would be required, and FC-SP provides five different options. Today, there are no supported key management solutions for FC-SP-2 from SUSE, RedHat, VMware, Windows, etc., and there are no commercially available HBAs that support FC-SP-2.

This webcast was part of our Storage Networking Security Webcast Series, and all of the sessions are available on demand. I encourage you to take a look at the other SNIA educational webcasts from the series.


NVMe Key-Value Standard Q&A

John Kim

Nov 9, 2020


Last month, Bill Martin, SNIA Technical Council Co-Chair, presented a detailed update on what’s happening in the development and deployment of the NVMe Key-Value standard. Bill explained where Key Value fits within an architecture, why it’s important, and the standards work that is being done between NVM Express and SNIA. The webcast was one of our highest rated. If you missed it, it’s available on-demand along with the webcast slides. Attendees at the live event had many great questions, which Bill Martin has answered here:

Q. Two of the most common KV storage mechanisms in use today are AWS S3 and RocksDB. How do the NVMe KV standards align with or differ from them? How difficult would it be to map between the APIs and semantics of those other technologies and NVMe KV devices?

A. KV Storage is intended as a storage layer that would support these and other object storage mechanisms. There is a publicly available KVRocks project on GitHub: a RocksDB-compatible key-value store and MyRocks-compatible storage engine designed for KV SSDs. There is also a Ceph Object storage design available. These are example implementations that can help an implementer get to an efficient use of NVMe KV storage.

Q. At which layer will my app stack need to change to take advantage of KV storage?  Will VMware or Linux or Windows need to change at the driver level?  Or do the apps need to be changed to treat data differently?  If the apps don’t need to change doesn’t this then just take the data layout tables and move them up the stack in to the server?

A. The application stack needs to change at the point where it interfaces to a filesystem, where the interface would change from a filesystem interface to a KV storage interface. In order to take advantage of Key Value storage, the application itself may need to change, depending on what the current application interface is. If the application is talking to a RocksDB or similar interface, then the driver could simply be changed out to allow the app to talk directly to Key Value Storage. In this case, the application does not care about the API or the underlying storage. If the application is currently interfacing to a filesystem, then the application itself would indeed need to change and the KV API provides a standardized interface that multiple vendors can support to provide both the necessary libraries and access to a Key Value storage device. There will need to be changes in the OS to support this in providing a kernel layer driver for the NVMe KV device. If the application is using an existing driver stack that goes through a filesystem and does not change, then you cannot take advantage of KV Storage, but if the application changes or already has an object storage interface then the kernel filesystem and mapping functions can be removed from the data path.

Q. Is there a limit to the length of a key or value in the KV Architecture?

A. There are limits to the key and value sizes in the current NVMe standard. The current implementation limits the key to 16 bytes due to a desire to pass the key within the NVMe command. The other architectural limit on a key is that the length of the key is specified in a field that allows up to 255 bytes for the key length. To utilize this, an alternative mechanism for passing the key to the device is necessary. For the value, the limit on the size is 4 GBytes.
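
The limits above are easy to enforce on the host side. The sketch below is a hypothetical client-side wrapper, not a real NVMe KV driver API; the device.put() call and the constants are assumptions used only to illustrate the 16-byte key and 4 GB value constraints.

```python
MAX_INLINE_KEY_BYTES = 16          # key passed inside the NVMe command
MAX_VALUE_BYTES = 4 * 1024 ** 3    # 4 GB architectural value limit

class KeyValueLimitError(ValueError):
    pass

def validate_kv_pair(key: bytes, value: bytes) -> None:
    """Client-side checks mirroring the limits described above."""
    if not 1 <= len(key) <= MAX_INLINE_KEY_BYTES:
        raise KeyValueLimitError(
            f"key must be 1..{MAX_INLINE_KEY_BYTES} bytes, got {len(key)}")
    if len(value) > MAX_VALUE_BYTES:
        raise KeyValueLimitError(
            f"value exceeds {MAX_VALUE_BYTES} bytes ({len(value)} given)")

def store(device, key: bytes, value: bytes) -> None:
    validate_kv_pair(key, value)
    device.put(key, value)  # hypothetical driver/library call, not a real API
```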

Q. Are there any atomicity guarantees (e.g. for overwrites)?

A. The current specification makes it mandatory for atomicity at the KV level. In other words, if a KV Store command overwrites an existing KV pair and there is a power failure, you either get all of the original value or all of the new value.

Q. Is KV storage for a special class of storage called computational storage or can it be used for general purpose storage?

A. This is for any application that benefits from storing objects as opposed to storing blocks. It is unrelated to computational storage but may be of use in computational storage applications. One application that has been considered is a filesystem: rather than using the filesystem to store blocks and maintain a mapping of each file handle to the set of blocks that contain the file contents, you would use KV storage where the file handle is the key and the value holds the file contents.

Q. What are the most frequently used devices to use the KV structure?

A. If what is being asked is, what are the devices that provide a KV structure, then the answer is, we expect the most common devices using the KV structure will be KV SSDs.

Q. Does the NVMe KV interface require two accesses in order to get the value (i.e., one access to get the value size in order to allocate the buffer and then a second access to read the value)?

A. If you know the size of the object, or if you can pre-allocate enough space for your maximum-size object, then you can do a single access. This is no different than current implementations where you actually have to specify how much data you are retrieving from the storage device by specifying a starting LBA and a length. If you do not know the size of the value and require it in order to retrieve the value, then you would indeed need to submit two commands to the NVMe KV storage device.
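
A small, hypothetical helper illustrates the one-access versus two-access pattern described above; none of the device calls shown here are a real NVMe KV API, they simply stand in for "retrieve value" and "query value size" commands.

```python
from typing import Optional

def retrieve(device, key: bytes, expected_size: Optional[int] = None) -> bytes:
    """Hypothetical helper: one round trip when the value size (or a safe
    upper bound) is already known, two round trips otherwise."""
    if expected_size is not None:
        return device.get(key, buffer_len=expected_size)   # single access
    size = device.get_value_size(key)                      # access 1: query size
    return device.get(key, buffer_len=size)                # access 2: read value
```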

Q. Does the device know whether an object was compressed, and if not how can a previously compressed object be stored?

A. The hardware knows if it does compression automatically and therefore whether it should de-compress the object. If the storage device supports compression and the no-compress option, then the device will store metadata with the KV pair indicating if no-compress was specified when storing the file in order to return appropriate data. If the KV storage device does not perform compression, it can simply support storage and retrieval of previously compressed objects. If the KV storage device performs its own compression and is given a previously-compressed object to store and the no-compress option is not requested, the device will recompress the value (which typically won’t result in any space savings) or if the no-compress option is requested the device will store the value without attempting additional compression.

Q. On flash, erased blocks are fixed sizes, so how does Key Value handle defrag after a lot of writes and deletes?

A. This is implementation specific and depends on the size of the values that are stored. This is much more efficient on values that are approximately the size of the device’s erase block size as those values may be stored in an erase block and when deleted the erase block can be erased. For smaller values, an implementation would need to manage garbage collection as values are deleted and when appropriate move values that remain in a mostly empty erase block into a new erase block prior to erasing the erase block. This is no different than current garbage collection. The NVMe KV standard provides a mechanism for the device to report optimal value size to the host in order to better manage this as well.

Q. What about encryption?  Supported now or will there be SED versions of [key value] drives released down the road?

A. There is no reason that a product could not support encryption with the current definition of key value storage. The release of SED (self-encrypting drive) products is vendor specific.

Q. What are considered to be best use cases for this technology? And for those use cases - what's the expected performance improvement vs. current NVMe drives + software?

A. The initial use case is for database applications where the database is already storing key/value pairs. In this use case, experimentation has shown that a 6x performance improvement from RocksDB to a KV SSD implementing KV-Rocks is possible.

Q. Since writes are complete (value must be written altogether), does this mean values are restricted to NVMe's MDTS?

A. Yes. Values are limited by MDTS (maximum data transfer size). A KV device may set this value to something greater than a block storage device does in order to support larger value sizes.

Q. How do protection schemes work with key-value (erasure coding/RAID/...)?

A. Since key value deals with complete values as opposed to the blocks that make up user data, RAID and erasure coding are usually not applicable to key value systems. The most appropriate data protection scheme for key value storage devices would be a mirrored scheme. If a storage solution performed erasure coding on data first, it could store the resulting EC fragments or symbols on key value SSDs.

Q. So Key Value is not something built on top of block like Object and NFS are?  Object and NFS data are still stored on disks that operate on sectors, so object and NFS are layers on top of block storage?  KV is drastically different, uses different drive firmware and drive layout?  Or do the drives still work the same and KV is another way of storing data on them alongside block, object, NFS?

A. Today, there is only one storage paradigm at the drive level: block. Object and NFS are mechanisms in the host to map data models onto block storage. Key Value storage is a mechanism for the storage device to map from an address (a key) to a physical location where the value is stored, avoiding a translation in the host from the key/value pair to a set of block addresses which are then mapped to physical locations where data is stored. A device may have one namespace that stores blocks and another namespace that stores key value pairs. There is no difference in the low-level storage mechanism, only in the mapping process from address to physical location. Another difference from block storage is that the value stored is not a fixed size.

Q. Could you explain more about how tx/s is increased with KV?

A. The increase in transfers/second occurs for two reasons: one is because the translation layer in the host from key/value to block storage is removed; the second is that the commands over the bus are reduced to a single transfer for the entire key value pair. The latency savings from this second reduction is less significant than the savings from removing translation operations that have to happen in the host.

Keep up-to-date on work SNIA is doing on the Key Value Storage API Specification at the SNIA website.


Ilker Cebeli

Oct 20, 2020


Everyone is looking to squeeze more efficiency from storage. That's why the SNIA Networking Storage Forum hosted a live webcast last month, "Compression: Putting the Squeeze on Storage." The audience asked many great questions on compression techniques. Here are answers from our expert presenters, John Kim and Brian Will:

Q. When multiple unrelated entities are likely to compress the data, how do they understand that the data is already compressed and so skip the compression?

A. Often they can tell from the file extension or header that the file has already been compressed. Otherwise each entity that wants to compress the data will try to compress it and then discard the results if it makes the file larger (because it was already compressed). 

Q. I’m curious about storage efficiency of data reduction techniques (compression/ thin provisioning etc) on certain database/server workloads which end up being more of a hindrance. Ex: Oracle ASM, which does not perform very well under any form of storage efficiency method. In such scenarios, what would be the recommendation to ensure storage is judiciously utilized?

A. Compression works well for some databases but not others, depending both on how much data repetition occurs within the database and how the database tables are structured. Database compression can be done on the row, column or page level, depending on the method and the database structure. Thin provisioning generally works best if multiple applications using the storage system (such as the database application) want to reserve or allocate more space than they actually need. If your database system does not like the use of external (storage-based, OS-based, or file system-based) space efficiency techniques, you should check if it supports its own internal compression options.

Q. What is a DPU?

A. A DPU is a data processing unit that specializes in moving, analyzing and processing data as it moves in and out of servers, storage, or other devices. DPUs usually combine network interface card (NIC) functionality with programmable CPU and/or FPGA cores. Some possible DPU functions include packet forwarding, encryption/decryption, data compression/decompression, storage virtualization/acceleration, executing SDN policies, running a firewall agent, etc. 

Q. What's the difference between compression versus compaction?

A. Compression replaces repeated data with either shorter symbols or pointers that represent the original data but take up less space. Compaction eliminates empty space between blocks or inside of files, often by moving real data closer together. For example, if you store multiple 4KB chunks of data in a storage system that uses 32KB blocks, the default storage solution might consume one 32KB storage block for each 4KB of data. Compaction could put 5 to 8 of those 4KB data chunks into one 32KB storage block to recover wasted free space.
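
The arithmetic behind that example is straightforward; the sketch below just counts blocks for an assumed workload of 4KB chunks stored in 32KB blocks.

```python
import math

BLOCK_SIZE = 32 * 1024   # 32KB storage block, as in the example above
CHUNK_SIZE = 4 * 1024    # 4KB data chunks
num_chunks = 1000        # assumed workload size, for illustration only

# Without compaction: one 32KB block consumed per 4KB chunk.
naive_blocks = num_chunks

# With compaction: pack as many whole 4KB chunks per block as fit (8 here).
chunks_per_block = BLOCK_SIZE // CHUNK_SIZE
compacted_blocks = math.ceil(num_chunks / chunks_per_block)

print(naive_blocks, "blocks without compaction")   # 1000
print(compacted_blocks, "blocks with compaction")  # 125
```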

Q. Is data encryption at odds with data compression?  That is, is data encryption a problem for data compression?

A. If you encrypt data first, it usually makes compression of the encrypted data difficult or impossible, depending on the encryption algorithm. (A simple substitution cipher would still allow compression but wouldn't be very secure.) In most cases, the answer is to first compress the data and then encrypt it. Going the other way, the reverse process is to first decrypt the data and then decompress it.
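
Here is a minimal sketch of that compress-then-encrypt ordering, using zlib from the standard library and the third-party cryptography package's Fernet recipe as a stand-in for whatever encryption your stack actually uses.

```python
import zlib
from cryptography.fernet import Fernet  # third-party 'cryptography' package

key = Fernet.generate_key()
cipher = Fernet(key)

def protect(data: bytes) -> bytes:
    # Compress first, while the redundancy is still visible, then encrypt.
    return cipher.encrypt(zlib.compress(data, level=6))

def recover(token: bytes) -> bytes:
    # Reverse the order on the way back: decrypt, then decompress.
    return zlib.decompress(cipher.decrypt(token))

sample = b"highly repetitive payload " * 1000
assert recover(protect(sample)) == sample
# Encrypting first would leave zlib almost nothing to compress.
```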

Q. How do we choose the binary form code 00, 01, 101, 110, etc?

A. These will be used as the final symbol representations written into the output data stream. The table represented in the presentation is only illustrative; the algorithm documented in the DEFLATE RFC is a complete algorithm for representing symbols in a compacted binary form.

Q. Is there a resource for different algorithms vs CPU requirements vs compression ratios?

A. A good resource to see the cost versus ratio trade-offs with different algorithms is on GitHub here. This utility covers a wide range of compression algorithms, implementations and levels. The data shown on their GitHub location is benchmarked against the Silesia corpus, which represents a number of different data sets.

Q. Do these operations occur on individual data blocks, or is this across the entire compression job?

A. Assuming you mean the compression operations, it typically occurs across multiple data blocks in the compression window. The compression window almost always spans more than one data block but usually does not span the entire file or disk/SSD, unless it's a small file.

Q. How do we guarantee that important information is not lost during the lossy compression?

A. Lossy compression is not my current area of expertise, but there is a significant area of information theory called rate-distortion theory, which is used for quantization of images for compression, that may be of interest. In addition, lossy compression is typically only used for files/data where it's known the users of that data can tolerate the data loss, such as images or video. The user or application can typically adjust the compression ratio to ensure an acceptable level of data loss.

Q. Do you see any advantage in performing the compression on the same CPU controller that is managing the flash (running the FTL, etc.)?

A. There may be cache benefits from running compression and flash on the same CPU, depending on the size of transactions. If the CPU is on the SSD controller itself, running compression there could offload the work from the main system CPU, allowing it to spend more cycles running applications instead of doing compression/decompression.

Q. Before compressing data, is there a method to check if the data is good to be compressed?

A. Some compression systems can run a quick scan of a file to estimate the likely compression ratio. Other systems look at the extension and/or header of the file and skip attempts to compress it if it looks like it's already compressed, such as most image and video files. Another solution is to actually attempt to compress the file and then discard the compressed version if it's larger than the original file.
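
A simple version of the "quick scan" heuristic can be sketched as follows; the sample size, threshold, and extension list are illustrative assumptions rather than recommendations.

```python
import zlib

ALREADY_COMPRESSED = {".jpg", ".png", ".mp4", ".zip", ".gz"}  # illustrative list

def looks_compressible(path: str, sample_size: int = 64 * 1024,
                       threshold: float = 0.9) -> bool:
    """Quick heuristic: compress a leading sample and compare sizes.
    If the sample barely shrinks (ratio above threshold), skip compression."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False
    ratio = len(zlib.compress(sample, level=1)) / len(sample)
    return ratio < threshold
```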

Q. If we were to compress on a storage device (SSD), what do you think are the top challenges? Error propagation? Latency/QoS or other?

A. Compressing on a storage device could mean higher latency for the storage device, both when writing files (if compression is inline) or when reading files back (as they are decompressed). But it's likely this latency would otherwise exist somewhere else in the system if the files were being compressed and decompressed somewhere other than on the storage device. Compressing (and decompressing) on the storage device means the data will be transmitted to (and from) the storage while uncompressed, which could consume more bandwidth. If an SSD is doing post compression (i.e. compression after the file is stored and not inline as the file is being stored), it would likely cause more wear on the SSD because each file is written twice.

Q. Are all these CPU-based compression analyses?

A. Yes these are CPU-based compression analyses.

Q. Can you please characterize the performance difference between, say LZ4 and Deflate in terms of microseconds or nanoseconds?

A. Extrapolating from the data available here, an 8KB request using LZ4 fast level 3 (lz4fast 1.9.2 -3) would take 9.78 usec for compression and 1.85 usec for decompression, while using zlib level 1 for an 8KB request, compression takes 68.8 usec and decompression takes 21.39 usec. Another aspect to note is that while LZ4 fast level 3 takes significantly less time, its compression ratio is 50.52% whereas zlib level 1 achieves 36.45%, showing that better compression ratios can have a significant cost.
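
Those figures come from the benchmark linked in the answer. If you want to reproduce the general speed-versus-ratio trend yourself without extra packages, a rough sketch using only the standard library's zlib levels looks like this (absolute numbers will differ from the cited LZ4/zlib results):

```python
import time
import zlib

payload = b"storage networking compression benchmark sample " * 4096

def measure(level: int, runs: int = 50):
    start = time.perf_counter()
    for _ in range(runs):
        compressed = zlib.compress(payload, level=level)
    elapsed_us = (time.perf_counter() - start) / runs * 1e6
    ratio = len(compressed) / len(payload)
    return elapsed_us, ratio

for level in (1, 6, 9):
    us, ratio = measure(level)
    print(f"zlib level {level}: {us:8.1f} us/op, ratio {ratio:.2%}")
```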

Q. How important is the compression ratio when you are using specialty products?

A. The compression ratio is a very important result for any compression algorithm or implementation.

Q. In slide #15, how do we choose the binary code form for the characters?

A. The binary code form in this example is entirely controlled by the frequency of occurrence of the symbol within the data stream: the higher the symbol frequency, the shorter the binary code assigned. The algorithm used here is just for illustrative purposes and would not be used (at least in this manner) in a standard. Huffman encoding as used in DEFLATE is a good example of a defined encoding algorithm.
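
For readers who want to see frequency-driven code assignment in action, here is a small textbook Huffman construction in Python; it is illustrative only and does not produce the canonical code form that DEFLATE actually specifies.

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    """Textbook Huffman construction: repeatedly merge the two least
    frequent subtrees, prefixing '0'/'1', so the most frequent symbols
    end up with the shortest binary codes."""
    freq = Counter(data)
    heap = [(count, i, {symbol: ""}) for i, (symbol, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {symbol: "0" for symbol in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        count1, _, left = heapq.heappop(heap)
        count2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes(b"this is an example for huffman encoding")
for symbol, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[1])):
    print(repr(chr(symbol)), code)
```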

This webcast was part of a SNIA NSF series on data reduction. Please check out the other two sessions as well.


How Can You Keep Data in Transit Secure?

Alex McDonald

Oct 12, 2020


It's well known that data is often considered less secure while in motion, particularly across public networks, and attackers are finding increasingly innovative ways to snoop on and compromise data in flight. But risks can be mitigated with foresight and planning. So how do you adequately protect data in transit? It’s the next topic the SNIA Networking Storage Forum (NSF) will tackle as part of our Storage Networking Security Webcast Series.  Join us October 28, 2020 for our live webcast Securing Data in Transit.

In this webcast, we'll cover what the threats are to your data as it's transmitted, how attackers can interfere with data along its journey, and methods of putting effective protection measures in place for data in transit. We’ll discuss: 

  • The large attack surface that data in motion provides, and an overview of the current threat landscape
  • What transport layer security protocols (SSL, TLS, etc.) are best for protecting data in transit?
  • Different encryption technologies and their role in protecting data in transit
  • A look at Fibre Channel security
  • Current best practice deployments; what do they look like?

Register today and join us on a journey to provide safe passage for your data.
