SNIA: Experts on Data Explained

Richelle Ahlvers

Apr 14, 2025


Everybody in tech knows that change is constant and advancements happen really fast! Think back just five years—how many of your projects have transformed, morphed, or even disappeared? The same holds true for SNIA. For the past 25 years, SNIA has continuously adapted to the shifting tech landscape.  

Historically known as the Storage Networking Industry Association, storage has been at the heart of our mission. However, as the explosion of data accelerates, SNIA’s scope has expanded to encompass all technologies related to data. In fact, our name now is just “SNIA.” We no longer spell out the acronym.

Last year, we redefined our mission to highlight our continued evolution and strategic vision, and to better reflect our 2,000-plus members' expertise and projects. "SNIA: Experts on Data" highlights our broader, data-centric approach, covering acceleration, computation, and more. We have segmented our work into six data focus areas: Accelerate, Protect, Optimize Infrastructure, Store, Transport, and Format.

Do some of these areas overlap? Yes. Does SNIA still care about Storage? Definitely.

For a deeper dive, we've launched a "Data Focus" podcast series where SNIA leaders break down each focus area, share real-world applications, and discuss the ongoing work driving the organization. I encourage you to listen to or watch these Data Focus podcasts here on the SNIA website.

If you’re passionate about data and technology, we invite you to join us. SNIA offers flexible membership options, allowing you to contribute to existing initiatives or bring new ideas to life. Be part of the innovation—help us shape the future of SNIA!


Unveiling the Power of 24G SAS: Enabling Storage Scalability in OCP Platforms

STA Forum

Oct 2, 2024

By Cameron T. Brett & Pankaj Kalra

In the fast-paced world of data centers, innovation is key to staying ahead of the curve. The Open Compute Project (OCP) has been at the forefront of driving innovation in data center hardware, and its latest embrace of 24G SAS technology is a testament to this commitment. Join us as we delve into the world of 24G SAS and its impact on OCP data centers.

OCP's Embrace of 24G SAS

The OCP datacenter SAS-SATA device specification, a collaborative effort involving industry giants like Meta, HPE, and Microsoft, was first published in 2023. This specification laid the groundwork for the integration of 24G SAS technology into OCP data centers, marking a significant milestone in storage innovation.

The Rise of SAS in Hyperscale Environments

While SAS has long been associated with traditional enterprise storage, its adoption in hyperscale environments is less widely known. However, SAS's scalability, reliability, and manageability have made it the storage interface of choice for hyperscale and enterprise data centers, powering some of the largest and most dynamic infrastructures in the world.

The Grand Canyon Storage Platform

At the Open Compute Project Global Summit in October 2022, the Grand Canyon storage platform made its debut, showcasing the capabilities of 24G SAS technology. This system, built around a 24G SAS storage architecture, offers high storage capacity with either SAS or SATA hard disk drives (72 slots) and is designed to meet the ever-growing demands of modern data centers.

Exploring the Scalability of SAS

One of the key advantages of SAS is its scalability: through expanders, a single SAS domain can support thousands of devices. This scalability, combined with SAS's reliability and manageability features, makes it an ideal choice for data centers looking to expand their storage infrastructure while maintaining operational efficiency.

Delving Deeper into SAS Technology

For those eager to learn more about SAS technology, the SNIA STA Forum provides a wealth of resources and information. The SAS Playlist on the SNIAVideo YouTube channel also offers insights into the capabilities and applications of this storage interface.

Conclusion

As OCP continues to drive innovation in data center hardware, the embrace of 24G SAS technology represents a significant step forward in meeting the evolving needs of modern data centers. By harnessing the power of SAS, OCP data centers are poised to achieve new levels of scalability, reliability, and performance, ensuring they remain at the forefront of the digital revolution.

Follow us on X and LinkedIn to stay updated on the latest developments in 24G SAS technology.
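To get a feel for how a SAS topology fans out behind a host bus adapter, here is a minimal, hypothetical Python sketch that counts the SAS objects a Linux host exposes through the SAS transport class in sysfs. The sysfs paths shown are assumptions based on the standard Linux SAS transport class and can vary by kernel and driver; treat this as an illustration of topology inspection, not as part of the OCP specification.

```python
#!/usr/bin/env python3
"""Rough sketch: count SAS objects visible to a Linux host (illustrative only).

Assumes the SAS HBA driver registers objects under the standard SAS transport
class directories (/sys/class/sas_device, /sys/class/sas_expander); these
paths and their contents vary by kernel and driver.
"""
from pathlib import Path

def count_entries(path: str) -> int:
    p = Path(path)
    # If the transport class isn't present (e.g. no SAS HBA), report zero.
    return sum(1 for _ in p.iterdir()) if p.is_dir() else 0

if __name__ == "__main__":
    devices = count_entries("/sys/class/sas_device")      # end devices and expanders
    expanders = count_entries("/sys/class/sas_expander")  # expanders only
    print(f"SAS devices (incl. expanders): {devices}")
    print(f"SAS expanders:                 {expanders}")
```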


New Standard Brings Certainty to the Process of Proper Eradication of Data

Eric Hibbard

Oct 3, 2022


A wide variety of data types are recorded on a range of data storage technologies, and businesses need to ensure that data residing on storage devices and media is disposed of in a way that supports compliance through verified data eradication.

When media are repurposed or retired from use, the stored data often must be eliminated (sanitized) to avoid potential data breaches. Depending on the storage technology, specific methods must be employed to ensure that the data is eradicated on the logical/virtual storage and media-aligned storage in a verifiable manner.

Existing published standards such as NIST SP 800-88 Revision 1 (Media Sanitization) and ISO/IEC 27040:2015 (Information technology – Security techniques – Storage security) provide guidance on sanitization and cover storage technologies from the last decade, but they have not kept pace with current technology or legislative requirements.

New standard makes conformance clearer

Published in August 2022, IEEE 2883-2022, IEEE Standard for Sanitizing Storage, addresses contemporary technologies and provides requirements that can be used for conformance purposes.

The new international standard, as with ISO/IEC 27040, defines sanitization as the ability to render access to target data on storage media infeasible for a given level of effort. IEEE 2883 is anticipated to be the go-to standard for media sanitization of modern and legacy technologies.

The IEEE 2883 standard specifies three methods for sanitizing storage: Clear, Purge, and Destruct. In addition, the standard provides technology-specific requirements and guidance for eradicating data associated with each sanitization method.
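As a purely hypothetical illustration of how the three method categories might surface in an asset-disposal workflow, the Python sketch below records which category was applied to each device and whether eradication was verified. The category names mirror the standard's Clear/Purge/Destruct terminology, but the record format, field names, and example techniques are illustrative assumptions, not something IEEE 2883 prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class SanitizationMethod(Enum):
    # Method categories named in IEEE 2883-2022.
    CLEAR = "Clear"        # logical techniques protecting against simple recovery
    PURGE = "Purge"        # e.g. cryptographic erase or media-specific purge commands
    DESTRUCT = "Destruct"  # physical destruction of the media

@dataclass
class SanitizationRecord:
    """One auditable entry for a retired or repurposed device (illustrative)."""
    device_serial: str
    media_type: str             # e.g. "NVMe SSD", "SATA HDD", "LTO tape"
    method: SanitizationMethod
    technique: str              # hypothetical free-text description of what was done
    verified: bool              # was eradication verified for the chosen method?
    timestamp: str

def record_sanitization(serial: str, media_type: str,
                        method: SanitizationMethod, technique: str,
                        verified: bool) -> SanitizationRecord:
    return SanitizationRecord(
        device_serial=serial,
        media_type=media_type,
        method=method,
        technique=technique,
        verified=verified,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Example usage with a made-up device:
entry = record_sanitization("SN-0001", "NVMe SSD",
                            SanitizationMethod.PURGE,
                            "cryptographic erase", verified=True)
print(entry)
```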

It establishes:

  • A baseline standard on how to sanitize data by media type according to accepted industry categories of Clear, Purge, and Destruct
  • Specific guidance so that organizations can trust they have achieved sanitization and can make confident conformance claims
  • Clarification around the various methods by media and type of sanitization
  • Referenceability by other standards documents, such as NIST or ISO standards, so that they can also reflect the most up-to-date sanitization methods

With this conformance clarity, particularly if widely adopted, organizations will be able to make more precise decisions around how they treat their end-of-life IT assets.

In addition, IEEE recently approved a new project, IEEE P2883.1 (Recommended Practice for Use of Storage Sanitization Methods), to build on IEEE 2883-2022. Anticipated topics include guidance on selecting appropriate sanitization methods and verification approaches.

If you represent a data-driven organization, a data security audit or certification organization, or a manufacturer of data storage technologies, you should begin preparing for these changes now.

More Information

For more information visit the IEEE 2883 – Standard for Sanitizing Storage project page. The current IEEE Standard for Sanitizing Storage is also available for purchase.

There is an IEEE webinar on Storage Sanitization – Eradicating Data in an Eco-friendly Way scheduled for October 26th. Register now.

The SNIA Storage Security Summit, held in May this year, covered media sanitization and the new standard; you can now view the recorded presentation.

Eric A. Hibbard, CISSP-ISSAP, ISSMP, ISSEP, CIPT, CISA, CCSK 

Chair, SNIA Security Technical Work Group; Chair, INCITS TC CS1 Cyber Security; Chair, IEEE Cybersecurity & Privacy Standards Committee (CPSC); Co-Chair, Cloud Security Alliance (CSA) – International Standardization Council (ISC); Co-Chair, American Bar Association – SciTech Law – Internet of Things (IoT) Committee


Storage Implications of Doing More at the Edge

Alex McDonald

May 10, 2022

In our SNIA Networking Storage Forum webcast series, "Storage Life on the Edge," we've been examining the many ways the edge is impacting how data is processed, analyzed, and stored. I encourage you to check out the sessions we've done to date. On June 15, 2022, we continue the series with "Storage Life on the Edge: Accelerated Performance Strategies," where our SNIA experts will discuss the need for faster computing, access to storage, and movement of data at the edge, as well as between the edge and the data center, covering:
  • The rise of intelligent edge locations
  • Different solutions that provide faster processing or data movement at the edge
  • How computational storage can speed up data processing and transmission at the edge
  • Security considerations for edge processing
We look forward to having you join us to cover all this and more. We promise to keep you on the edge of your virtual seat! Register today.


Tom Friend

Oct 20, 2021

What types of storage are needed for different aspects of AI? That was one of the many topics covered in our SNIA Networking Storage Forum (NSF) webcast "Storage for AI Applications." It was a fascinating discussion and I encourage you to check it out on-demand. Our panel of experts answered many questions during the live roundtable Q&A. Here are answers to those questions, as well as the ones we didn't have time to address.

Q. What are the different data set sizes and workloads in AI/ML in terms of data set size, sequential/random, and write/read mix?

A. Data sets vary incredibly from use case to use case; they may range from GBs to possibly hundreds of PB. In general, the workloads are very heavily read-oriented, perhaps 95%+. While sequential reads would be preferable, the access patterns in practice tend to be closer to random. In addition, different use cases have very different data item sizes: some items may be GBs large, while others may be <1 KB. These differences have a direct impact on storage performance and may change how you decide to store the data.

Q. More details on the risks associated with the use of online databases?

A. The biggest risk with using an online database is that you will be adding an additional workload to an important central system. In particular, you may find that the load is not as predictable as you think, and it may impact the database performance of the transactional system. In some cases this is not a problem, but when the database is intended for actual transactions, you could be hurting your business.

Q. What is the difference between a DPU and a RAID/storage controller?

A. A Data Processing Unit (DPU) is intended to process the actual data passing through it. A RAID/storage controller only handles functions such as data resiliency around the data, not the data itself. A RAID controller might take a CSV file and break it down for storage across different drives, but it does not analyze the data. A DPU might take that same CSV and look at the rows and columns to analyze the data. While the distinction may seem small, there is a big difference in the software: a RAID controller does not need to know anything about the data, whereas a DPU must be programmed to deal with it. Another important aspect is whether the data will be encrypted. If it will be, a DPU must have additional security mechanisms to decrypt the data, whereas a RAID-based system is not affected.

Q. Is a CPU-bypass device the same as a SmartNIC?

A. Not entirely. They are often discussed together, but a DPU is intended to process data, whereas a SmartNIC may only process how the data is handled (encryption, TCP/IP functions, etc.). It is possible for a SmartNIC to also act as a DPU, where the data itself is processed. There are new NVMe-oF™ technologies that are beginning to allow FPGAs, TPUs, DPUs, GPUs, and other devices direct access to other servers' storage over a high-speed local area network without having to go through the CPU of that system.

Q. What work is being done to accelerate S3 performance with regard to AI?

A. A number of companies are working to accelerate the S3 protocol, and Presto and a number of Big Data technologies use it natively. For AI workloads there are a number of caching technologies that handle the re-reads of training data on a local system, minimizing the performance penalty of repeatedly going back to the object store.

Q. From a storage perspective, how do I take different types of data from different storage systems to develop a model?

A. Work with your project team to find the data you need and ensure it can be served to the ML/DL training (or inference) environment in a timely manner. You may need to copy (or clone) data onto a faster medium to achieve your goals, but look at the process as a whole. Do not underestimate the data cleansing/normalization steps in your storage analysis, as they can prove to be a bottleneck.

Q. Do I have to "normalize" that data to the same type, or can a model accommodate different data types?

A. In general, yes. Models can be very sensitive: a model trained on one set of data with one set of normalizations may not be accurate if data taken from a different set with different normalizations is used for inference. This does depend on the model, but you should be aware not only of the model, but also of how the data was prepared prior to training.

Q. If I have to change the data type, do I then need to store it separately?

A. It depends on your data: do other systems need it in the old format?

Q. Are storage solutions that are right for one form of AI also the best for others?

A. No. While it may be possible to use a single solution for multiple AI workloads, in general there are differences in the data that can necessitate different storage. A relatively simple example is large data (MBs) vs. small data (~1 KB). Large, multi-MB data can be erasure coded and stored cost-effectively; for small data, erasure coding is not practical and you generally have to rely on replication.

Q. How do features like CPU bypass impact performance of storage?

A. CPU bypass is essential when all you need to do is transfer data from one peripheral to another without processing it. For example, if you are taking data from a NIC and transferring it to a GPU without processing it in any way, CPU bypass works very well: it prevents the CPU and system memory from becoming a bottleneck. Likewise, on a storage server, if you simply need to take data from an SSD and pass it to a NIC during a read, CPU bypass can really boost system performance. One important note: if you are well under the limits of the CPU, the benefits of bypass are small, so think carefully about your system design and whether the CPU is actually a bottleneck. In some cases system memory is used as a cache, and in those cases bypassing the CPU isn't possible.

Q. How important is it to use all-flash storage compared to HDD or hybrid?

A. It depends on your workloads. For any single model, you may be able to make do with HDD. However, another consideration for many AI/ML systems is that their use can expand quite suddenly. Once there is some success, more people will want access to the data and the system may experience more load. So beware of the success of these early projects: the need to create multiple models from the same data could overload your system.

Q. Will storage for AI/ML necessarily be different from standard enterprise storage today?

A. Not necessarily. It may be possible for enterprise solutions today to meet your requirements. However, a key consideration is that if your current solution is barely able to handle its existing load, adding an AI/ML training workload may push it over the edge. In addition, even if your current solution is adequate, the sizes of many ML/DL models are growing exponentially every year, so what you provision today may not be adequate in a year or even several months. Understanding the direction of the work your data scientists are pursuing is important for capacity and performance planning.
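To make the caching point above concrete, here is a minimal, hypothetical Python sketch of the pattern several answers allude to: keeping a local on-disk copy of objects that training jobs re-read repeatedly, so only the first read pays the object-store round trip. It assumes the boto3 S3 client with credentials already configured, and the bucket, key, and cache-directory names are made up for illustration; a production system would also need eviction, integrity checks, and concurrency handling.

```python
import os
import boto3  # assumes AWS credentials are configured in the environment

s3 = boto3.client("s3")

def cached_get(bucket: str, key: str, cache_dir: str = "/tmp/train-cache") -> str:
    """Return a local path for an S3 object, downloading it only on first use."""
    local_path = os.path.join(cache_dir, bucket, key)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # First epoch: pay the network cost once.
        s3.download_file(bucket, key, local_path)
    # Subsequent epochs: re-reads are served from fast local storage.
    return local_path

# Hypothetical usage inside a training loop:
# path = cached_get("my-training-data", "shards/shard-00001.tfrecord")
# with open(path, "rb") as f:
#     data = f.read()
```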


Storage for Applications Webcast Series

John Kim

Sep 8, 2021

Everyone enjoys having storage that is fast, reliable, scalable, and affordable. But it turns out different applications have different storage needs in terms of I/O requirements, capacity, data sharing, and security. Some need local storage, some need a centralized storage array, and others need distributed storage, which itself could be local or networked. One application might excel with block storage while another does better with file or object storage. For example, an OLTP database might require small amounts of very fast flash storage; a media or streaming application might need vast quantities of inexpensive disk storage with extra security safeguards; while a third application might require a mix of different storage tiers with multiple servers sharing the same data.

This SNIA Networking Storage Forum "Storage for Applications" webcast series will cover the storage requirements for specific uses such as artificial intelligence (AI), database, cloud, media & entertainment, automotive, edge, and more. With limited resources, it's important to understand the storage intent of the applications in order to choose the right storage and storage networking strategy, rather than discovering the hard way that you've chosen the wrong solution for your application.

We kick off this series on October 5, 2021 with "Storage for AI Applications." AI is a technology which itself encompasses a broad range of use cases, largely divided into training and inference. In this webcast, we'll look at what types of storage are typically needed for different aspects of AI, including different types of access (local vs. networked, block vs. file vs. object) and different performance requirements. And we will discuss how different AI implementations balance the use of on-premises vs. cloud storage.

Tune in to this SNIA Networking Storage Forum (NSF) webcast to boost your natural (not artificial) intelligence about application-specific storage. Register today. Our AI experts will be waiting to answer your questions.


An FAQ on Data Reduction Fundamentals

John Kim

Oct 5, 2020

There's a fair amount of confusion when it comes to data reduction terminology and techniques. That's why the SNIA Networking Storage Forum (NSF) hosted a live webcast, "Everything You Wanted to Know About Storage But Were Too Proud to Ask: Data Reduction." It was a 101-level lesson on the fundamentals of data reduction, which can be performed in different places and at different stages of the data lifecycle. The goal was to clear up confusion around different data reduction and data compression techniques and set the stage for deeper-dive webcasts on this topic (see the end of this blog for info on those). As promised during the webcast, here are answers to the questions we didn't have time to address during the live event.

Q. Does block-level compression have any direct advantage over file-level compression?

A. One significant advantage is not requiring the entire thing, the file or database or whatever we're storing, to be compressed and decompressed as a unit. That would almost certainly increase read latency and, for large files, require quite a bit of caching. In the case of blocks, a single block can be the compression unit, even if it's part of a file, database, or other larger data structure. Compressing a block is much faster and computationally less intensive, which is reflected in reduced latency overhead and cache impacts.

Q. You made it sound like thin provisioning had no overhead, but on-demand allocation is an overhead and can be quite bad at the worst time. Do you agree?

A. Finding free space when the system is at capacity may be an issue, and this may indeed cause significant slowdowns. This is an undesirable situation, and the advice is never to run so close to the capacity wire that thin provisioning impacts performance or jeopardizes successfully writing the data. In a system with adequate free space, caching can make the normally small overhead of thin provisioning very small to unmeasurable.

Q. Will migration to SSD zoning vs. HDD-based blocks/pages impact data compression?

A. It shouldn't, since compression is done at a level where zoning isn't an issue. Compression is only applicable to blocks or files.

Q. Does compressing blocks on computational storage devices have the disadvantage of not reducing PCIe bandwidth, since raw data has to be transferred over to the storage devices?

A. Yes. But the same is true of any storage device, so computational storage is no worse with respect to the transfer of the data, and it provides much more apparent storage on the device once the data gets there. A computational storage device requires no application changes to do this.

Q. How do we measure performance in out-of-line data reduction?

A. Data reduction techniques like compression and deduplication can be done in-line (that is, while writing the data) or out-of-line (at a later point in time). Out-of-line shifts the compute required from now, where big horsepower is required if there's to be no impact on storage performance, to later, where smaller processors can take their time. Out-of-line data reduction requires more space to store the data, as it's unreduced when it's written. These tradeoffs also have impacts on performance (both back-end latency and bandwidth), and all of this affects the total cost of the system. It's not so much that we need to measure the performance of in-line vs. out-of-line, something we know how to do, and declare one the winner; it's whether the system provides the needed performance at the right cost. That's a purchasing decision, not a technology one.

Q. How do customers (or vendors) decide how wide their deduplication net should be, i.e. one disk, per file, across one file system, one storage system, or multiple storage systems?

A. By testing and balancing the savings vs. the cost. One thing is true: the balance right now is very definitely in favor of deduplicating at every level where possible. Vendors can demonstrate huge space-savings advantages by doing so. Consumers, as indicated in the answer to the previous question, need to look at the whole system and its cost vs. performance, and buy on that basis.

Q. Is compression like doing deduplication on a very small and very local scale?

A. You could think of it as bit-level deduplication, and then realize that you can stretch an analogy to breaking point.

Q. Are some blocks or files so small that it's not worth doing deduplication or cloning, because the extra metadata will be larger than the block/file space savings?

A. Yes. They're often stored as-is, but they do need metadata to say that they're raw and not reduced.

Q. Do cloning and snapshots operate only at the block level, or can they operate at the file or object level too?

A. Cloning and snapshots can operate at the file or object level, as long as there is an efficient way of extracting and storing the differences. Sometimes it's cheaper and simpler just to copy the whole thing, especially for small files or objects.

Q. Why does Virtual Data Optimizer (VDO) do dedupe before compression if the other way is preferable? Why is it better to compress and then deduplicate?

A. That's a decision the designers of VDO felt gave them the best storage efficiencies and reasonable compute overheads. (It's also not the only system that uses this order.) But the dedupe scope of VDO is relatively small. Compression then deduplication allows in-line compression with out-of-line and much broader deduplication across very large sets of data, and there are many systems that use this order for that reason.

Q. There's also so much stuff because we (as an industry) have enabled storing so much stuff cheaply and affordably. Today's business and storage market would look and act differently if costs were different. Data reduction's interaction with encryption (e.g. proper ordering) could be useful to mention, or a topic for another presentation!

A. We'll consider it!

Remember I said we were taking a deeper dive on the topic of data reduction? We have two more webcasts in this series, one on compression and the other on data deduplication. You can access them here:

