Storage Implications of Doing More at the Edge

Alex McDonald

May 10, 2022


In our SNIA Networking Storage Forum webcast series, “Storage Life on the Edge,” we’ve been examining the many ways the edge is impacting how data is processed, analyzed, and stored. I encourage you to check out the sessions we’ve done to date.

On July 12, 2022, we continue the series with “Storage Life on the Edge: Accelerated Performance Strategies,” where our SNIA experts will discuss the need for faster computing, access to storage, and movement of data at the edge as well as between the edge and the data center, covering:

  • The rise of intelligent edge locations
  • Different solutions that provide faster processing or data movement at the edge
  • How computational storage can speed up data processing and transmission at the edge
  • Security considerations for edge processing

We look forward to having you join us to cover all this and more. We promise to keep you on the edge of your virtual seat! Register today.



Storage Life on the Edge

Tom Friend

Dec 20, 2021


Cloud to Edge infrastructures are rapidly growing. It is expected that by 2025, up to 75% of all data generated will be created at the Edge. However, Edge is a tricky word, and you’ll get a different definition depending on who you ask. The physical edge could be in a factory, retail store, hospital, car, plane, or cell tower, or on your mobile device. The network edge could be a top-of-rack switch, a server running host-based networking, or a 5G base station.

The Edge means putting servers, storage, and other devices outside the core data center and closer to both the data sources and the users of that data—both edge sources and edge users could be people or machines.

 This trilogy of SNIA Networking Storage Forum (NSF) webcasts will provide:

  1. An overview of Cloud to Edge infrastructures and performance, cost and scalability considerations
  2. Application use cases and examples of edge infrastructure deployments
  3. Cloud to Edge performance acceleration options

Attendees will leave with an improved understanding of compute, storage and networking resource optimization to better support Cloud to Edge applications and solutions.

At our first webcast in this series on January 26, 2022, “Storage Life on the Edge: Managing Data from the Edge to the Cloud and Back,” you’ll learn:

  • Data and compute pressure points: aggregation, near & far Edge
  • Supporting IoT data
  • Analytics and AI considerations
  • Understanding data lifecycle to generate insights
  • Governance, security & privacy overview
  • Managing multiple Edge sites in a unified way

Register today! We look forward to seeing you on January 26th.



A Q&A on Discovery Automation for NVMe-oF IP-Based SANs

Tom Friend

Nov 22, 2021

In order to fully unlock the potential of NVMe® IP-based SANs, we first need to address the manual and error-prone process that is currently used to establish connectivity between NVMe Hosts and NVM subsystems. Several leading companies in the industry have joined together through NVM Express to collaborate on innovations to simplify and automate this discovery process. This was the topic of discussion at our recent SNIA Networking Storage Forum webcast “NVMe-oF: Discovery Automation for IP-based SANs,” where our experts, Erik Smith and Curtis Ballard, took a deep dive into the work that is being done to address these issues. If you missed the live event, you can watch it on demand here and get a copy of the slides. Erik and Curtis did not have time to answer all the questions during the live presentation. As promised, here are answers to them all.

Q. Is the Centralized Discovery Controller (CDC) highly available, and is this visible to the hosts? Do they see a pair of CDCs on the network and retry requests to a secondary if a primary is not available?

A. Each CDC instance is intended to be highly available. How this is accomplished will be specific to the vendor and deployment type. For example, a CDC running inside of an ESX-based VM can leverage VMware’s high availability (HA) and fault tolerance (FT) functionality. For most implementations the HA functionality for a specific CDC is expected to be implemented using methods that are not visible to the hosts. In addition, each host will be able to access multiple CDC instances (e.g., one per “IP-Based SAN”). This ensures any problems encountered with any single CDC instance will not impact all paths between the host and storage. One point to note: it is not expected that there will be multiple CDC instances visible to each host via a single host interface. Although this is allowed per the specification, it does make it much harder for administrators to effectively manage connectivity.

Q. First: isn’t the CDC the perfect Denial-of-Service (DoS) attack target? Being the “name server” of NVMe-oF, when the CDC is compromised no storage is available anymore. Second: shouldn’t the CDC run as a multi-instance cluster to realize high availability or, even better, be distributed like the name server in Fibre Channel (FC)?

A. With regard to denial-of-service attacks: both FC’s Name Server and NVMe-oF’s CDC are susceptible to this type of problem, and both have the ability to mitigate these types of concerns. FC can fence or shut down a port to which a misbehaving end device is attached. The same can be done with Ethernet, especially when the “underlay config service” mentioned during the presentation is in use. In addition, the CDC’s role is slightly different than FC’s Name Server. If a denial-of-service attack were successfully executed against a CDC instance, existing host-to-storage connections would remain intact. Hosts that are rebooted or disconnected and reconnected could have a problem connecting to the CDC and could have a problem reconnecting to storage via the IP SAN that is experiencing the DoS attack. For the second concern, it’s all about the implementation. Nothing in the standard prevents the CDC from running in an HA or FT mode. When the CDC is deployed as a VM, hypervisor-resident tools can be leveraged to provide HA and FT functionality. When the CDC is deployed as a collection of microservices running on the switches in an IP SAN, the services will be distributed.
One implementation available today uses distributed microservices to enable scaling to meet or exceed what is possible with an FC SAN today.

Q. Would the Dell SFSS CDC be offered as generic open source in the public domain so other third parties don’t have to develop their own CDC? If other third parties do not use Dell’s open CDC and develop their own CDC, how would multi-CDC control work for a customer with multi-vendor storage arrays and multiple CDCs?

A. The Dell SFSS CDC will not be open source. However, the reason Dell worked with HPE and other storage vendors on the 8009 and 8010 specifications is to ensure that whichever CDC instance the customer chooses to deploy, all storage vendors will be able to discover and interoperate with it. Dell’s goal is to create an NVMe IP-based SAN ecosystem. As a result, Dell will work to make its CDC implementation (SFSS) interoperate with every NVMe IP-based product, regardless of the vendor. The last thing anyone wants is for customers to have to worry about basic compatibility.

Q. Does this work only for IPv4? We’re moving towards IPv6-only environments.

A. Both IPv4 and IPv6 can be used.

Q. Will the discovery domains for this be limited to a single Ethernet broadcast domain, or will there be mechanisms to scale out discovery to alternate subnets, like we see with DHCP and DHCP relay?

A. By default mDNS is constrained to a single broadcast domain. As a result, when you deploy a CDC as a VM, if the IP SAN consists of multiple broadcast domains per IP SAN instance (e.g., IP SAN A = VLANs/subnets 1, 2 and 3; IP SAN B = VLANs/subnets 10, 11 and 13), then you’ll need to ensure an interface from the VM is attached to each VLAN/subnet. However, creating a VM interface for each subnet is a sub-optimal user experience, and as a result there is the concept of an mDNS proxy in the standard. The mDNS proxy is just an mDNS responder that resides on the switch (a similar concept to a DHCP proxy) and can respond to mDNS requests on each broadcast domain, pointing the end devices to the IP address of the CDC (which could be on a different subnet). When you are selecting a switch vendor to use for your IP SAN, ensure you ask if they support an mDNS proxy. If they do not, you will need to do extra work to get your IP SAN configured properly. When you deploy the CDC as a collection of services running on the switches in an IP SAN, one of these services could be an mDNS responder. This is how Dell’s SFSS will be handling this situation.

One final point about IP SANs that span multiple subnets: historically, these types of configurations have been difficult to administer because of the need to configure and maintain static route table entries on each host. NVM Express has done an extensive amount of work with 8010 to ensure that we can eliminate the need to configure static routes. For more information about the solution to this problem, take a look at nvme-stas on github.

Q. A question for Erik: if mDNS turned out to be a problem, how did you work around it?

A. mDNS is actually a problem right now because none of the implementations in the first release actually support it. In the second release this limitation is resolved. In any case, the only issue I am expecting with mDNS will be environments that don’t want to use it (for one reason or another) or can’t use it (because the switch vendor does not support an mDNS proxy). In these situations, you can administratively configure the IP address of the CDC on the host and storage subsystem.
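For readers curious what the mDNS side of this looks like in practice: on Linux, the discovery client (nvme-stas) browses via Avahi for the `_nvme-disc._tcp` DNS-SD service type that discovery controllers advertise. The sketch below is purely illustrative and uses the third-party python-zeroconf package rather than nvme-stas itself; treat the details as an assumption, not a reference implementation.

```python
import time
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

# DNS-SD service type advertised by NVMe-oF discovery controllers
# (as browsed for by nvme-stas).
SERVICE_TYPE = "_nvme-disc._tcp.local."

class DiscoveryListener(ServiceListener):
    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        info = zc.get_service_info(type_, name)
        if info:
            # A real host would hand these off to the NVMe connect logic.
            print(f"found discovery controller {name}: "
                  f"{info.parsed_addresses()} port {info.port}")

    def remove_service(self, zc, type_, name):
        print(f"discovery controller gone: {name}")

    def update_service(self, zc, type_, name):
        pass

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, DiscoveryListener())
try:
    time.sleep(10)  # browse the local broadcast domain for a few seconds
finally:
    zc.close()
```

Note that, as the answer above explains, a plain browse like this only sees advertisements on its own broadcast domain; reaching a CDC on another subnet relies on the switch's mDNS proxy or on statically configuring the CDC address.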
Q. Just a comment: in my humble opinion, slide 15 is the most important slide for helping people see, at a glance, what these things are. Nice slide.

A. Thanks. Slide 15 was one of the earliest diagrams we created to communicate the concept of what we’re trying to do.

Q. With Fibre Channel there are really only two HBA vendors and two switch vendors, so interoperability, even with vendor-specific implementations, is manageable. For Ethernet, there are many NIC and switch vendors. How is interoperability going to be ensured in this more complex ecosystem?

A. The FC discovery protocol is very stable. We have not seen any basic interop issues related to login and discovery for years. However, back in the early days (’98–’02), we needed to validate each HBA/driver/firmware version and do so for each OS we wanted to support. With NVMe/TCP, each discovery client is software based and OS specific (not HBA specific). As a result, we will only have two host-based discovery client implementations for now (ESX and Linux; see nvme-stas) and a discovery client for each storage OS. To date, we have been pleasantly surprised at the lack of interoperability issues we’ve seen as storage platforms have started integrating with CDC instances, although it is likely we will see some issues as other storage vendors start to integrate with CDC instances from different storage vendors.

Q. A lot of companies will want to use NVMe-oF via IP/Ethernet in a micro-segmented network. There are a lot of L3/routing steps to reach the target. This presentation did not go into this part of scalability, only into scalability of discovery. Today, all networks are L3 with smaller and smaller subnets and many L3 hops.

A. This is a fair point. We didn’t have time to go into the work we have done to address the types of routing issues we’re anticipating and what we’ve done to mitigate them. However, nvme-stas, the Dell-sponsored open-source discovery client for Linux, demonstrates how we use the CDC to work around these types of issues. Erik is planning to write about this topic on his blog, brasstacksblog.typepad.com. You can follow him on Twitter @provandal to make sure you don’t miss it.


Cabling, Connectors and Transceivers Questions Answered

Tim Lustig

Nov 9, 2021


Our recent live SNIA Networking Storage Forum webcast, “Next-generation Interconnects: The Critical Importance of Connectors and Cables,” provided an outstanding tutorial on the latest in the impressive array of data center infrastructure components designed to address expanding requirements for higher bandwidth and lower power. The presenters covered common pluggable connectors and media types, copper cabling and transceivers, and real-world use cases. If you missed the live event, it is available on-demand.

We ran out of time to answer all the questions from the live audience. As promised, here are answers to them all.

Q. For 25GbE, is the industry consolidating on one of the three options?

A. The first version of 25 GbE (i.e. 25GBASE-CR1) was specified by the Ethernet Technology Consortium around 2014.  This was followed approximately two years later by the IEEE 802.3 versions of 25 GbE (i.e. 25GBASE-CR-S and 25GBASE-CR). As a result of the timing, the first 25 GbE capable products to market only supported consortium mode. More recently developed switches and server products support both consortium and IEEE 802.3 modes, with the link establishment protocol favoring the IEEE 802.3 mode when it is available. Therefore, IEEE 802.3 will likely be the incumbent over the long term.  

Please note that there is no practical difference between the Consortium and IEEE 802.3 modes for short reaches (<=3m) between end points, where a CA-25G-N cable is used. Longer cable reaches above 3m (CA-25G-L) require the IEEE 802.3 modes with forward error correction, which adds latency. The CA-25G-S type of cable is the least common. See slides 16 and 17 from the presentation.
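For reference, the three cable classes pair with FEC modes roughly as follows. This small lookup sketch summarizes the pairings described above; the reach figures follow IEEE 802.3by but should be treated as nominal, not as a substitute for the standard.

```python
# Illustrative lookup of the 25 GbE DAC cable classes discussed above.
# Reach limits and FEC pairings per IEEE 802.3by; treat figures as nominal.
CABLE_CLASSES = {
    "CA-25G-N": {"max_reach_m": 3, "fec": None},          # no FEC required
    "CA-25G-S": {"max_reach_m": 3, "fec": "BASE-R FEC"},  # lower-latency FEC
    "CA-25G-L": {"max_reach_m": 5, "fec": "RS-FEC"},      # strongest FEC, adds latency
}

def required_fec(cable_class: str) -> str:
    """Return the FEC mode a 25 GbE link must enable for a cable class."""
    fec = CABLE_CLASSES[cable_class]["fec"]
    return fec if fec else "none"

for name, props in CABLE_CLASSES.items():
    print(f"{name}: up to {props['max_reach_m']} m, FEC: {required_fec(name)}")
```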

Q. What's the max DAC cables (passive and/or active) length for 100G PAM4?

A. The IEEE P802.3ck specification for 100 Gb/s/lane copper cable PHYs targets a reach of at least 2 meters for a passive copper cable. Because the specification is still in development, the exact reach is still being worked out. Expect >2 meters for passive cables and 3-4 meters for active cables. Note that this is a reduction of reach from previous rates, as illustrated on slide 27, and that DAC cables are not for long range and are generally used for very short interconnections between devices. For longer reaches, AOC cables are preferred.

The passive copper cable length is primarily driven by the performance of the SERDES in the host (e.g., switch, server, FPGA) and the construction materials of the cable assembly. Active copper cables (ACCs or AECs) use several different methods of signal conditioning to increase the cable reach; more sophisticated solutions have greater reach, greater latency and greater cost. See slide 35.

Q. What's the latency difference between active and passive DAC (PAM4 encoding)?

A. Passive copper cables do not contain signal conditioning electronics. Active copper cables (ACCs or AECs) use several different methods of signal conditioning to increase the cable reach; more sophisticated solutions have greater reach, greater latency and greater cost. See slide 35. A simple active cable may use one or more linear equalizer ICs, whereas a complex cable uses a full retimer that may have FEC logic embedded inside.

For 50G PAM4 rates, the difference in one-way latency between a passive copper cable and a simple active copper cable is ~20 nsec. The difference between a passive copper cable and a complex active copper cable could be as high as ~80 nsec.
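To put those deltas in context, here is a back-of-the-envelope sketch that combines the conditioning penalties quoted above with cable propagation delay. The ~0.7c propagation speed for twinax copper is our assumption, not a figure from the webcast.

```python
# Back-of-the-envelope one-way latency for a 50G PAM4 copper link.
# Conditioning penalties (~20 ns simple ACC, ~80 ns full retimer)
# come from the answer above; 0.7c propagation is an assumption.
C = 3.0e8  # speed of light in vacuum, m/s

def one_way_latency_ns(length_m: float, conditioning_ns: float = 0.0) -> float:
    propagation_ns = length_m / (0.7 * C) * 1e9
    return propagation_ns + conditioning_ns

print(f"3 m passive DAC : {one_way_latency_ns(3.0):5.1f} ns")
print(f"3 m simple ACC  : {one_way_latency_ns(3.0, 20.0):5.1f} ns")
print(f"4 m retimed AEC : {one_way_latency_ns(4.0, 80.0):5.1f} ns")
```

The takeaway: at these lengths the signal-conditioning electronics, not the wire itself, dominate the latency difference between passive and active cables.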

Q. Can you comment about "gearbox" cables (200G 4lane (@56G) to 100G 4 lane (@28G)?

A. A few companies are supplying cables that have 50G PAM4 on one end and 25G NRZ on the other with a gearbox. We see it as a niche product, used to link new equipment to older equipment.

Q. Are you showing a QSFP instead of an OSFP on slide 29, in the image at the top right?

A. Good catch. That was a mistake on the slide. It has been corrected. Thanks.



Revving Up Storage for Automotive

Tom Friend

Nov 8, 2021


Each year cars become smarter and more automated. In fact, the automotive industry is effectively transforming the vehicle into a data center on wheels. Connectedness, autonomous driving, and media & entertainment all bring more and more storage onboard and into networked data centers. But all the storage in (and for) a car is not created equal. There are tens if not hundreds of different processors in a car today; some are attached to storage, some are not, and each application demands different characteristics from the storage device.

The SNIA Networking Storage Forum (NSF) is exploring this fascinating topic on December 7, 2021 at our live webcast “Revving Up Storage for Automotive” where industry experts from both the storage and automotive worlds will discuss:

  • What’s driving growth in automotive storage?  
  • Special requirements for autonomous vehicles
  • Where is automotive data typically stored?
  • Special use cases
  • Vehicle networking & compute changes and challenges

Start your engines and register today to join us as we drive into the future!



Storage for AI Applications Q&A

Tom Friend

Oct 20, 2021


What types of storage are needed for different aspects of AI? That was one of the many topics covered in our SNIA Networking Storage Forum (NSF) webcast “Storage for AI Applications.” It was a fascinating discussion and I encourage you to check it out on-demand. Our panel of experts answered many questions during the live roundtable Q&A. Here are answers to those questions, as well as the ones we didn’t have time to address.

Q. What are the different data set sizes and workloads in AI/ML in terms of data set size, sequential/random mix, and write/read mix?

A. Data sets vary incredibly from use case to use case; they may range from GBs to possibly hundreds of PBs. In general, the workloads are very heavily read-oriented, often 95%+ reads. While sequential reads would be preferable for storage, the patterns in general tend to be closer to random. In addition, different use cases will have very different item sizes: some may be GBs large, while others may be <1 KB. The different sizes have a direct impact on storage performance and may change how you decide to store the data.
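As a concrete illustration of that profile, the sketch below generates a ~95%-read, near-random access pattern with mixed object sizes, the kind of stream you might feed to a storage benchmark. All parameters here are hypothetical, chosen only to mirror the description above.

```python
import random

# Illustrative AI-training-style access pattern: ~95% reads,
# near-random offsets, mixed object sizes (all values are assumptions).
READ_FRACTION = 0.95
OBJECT_COUNT = 1_000_000
SIZES = [512, 4096, 1 << 20]       # small records up to 1 MiB blobs
SIZE_WEIGHTS = [0.5, 0.3, 0.2]

def next_op():
    op = "read" if random.random() < READ_FRACTION else "write"
    obj = random.randrange(OBJECT_COUNT)  # near-random access, little locality
    size = random.choices(SIZES, SIZE_WEIGHTS)[0]
    return op, obj, size

if __name__ == "__main__":
    for _ in range(5):
        print(next_op())
```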

Q. More details on the risks associated with the use of online databases?

A. The biggest risk with using an online DB is that you will be adding an additional workload to an important central system. In particular, you may find that the load is not as predictable as you think and it may impact the database performance of the transactional system. In some cases, this is not a problem, but when it is intended for actual transactions, you could be hurting your business.

Q. What is the difference between a DPU and a RAID / storage controller?

A. A Data Processing Unit or DPU is intended to process the actual data passing through it. A RAID/storage controller is only intended to handle functions such as data resiliency around the data, but not the data itself. A RAID controller might take a CSV file and break it down for storage on different drives; however, it does not actually analyze the data. A DPU might take that same CSV and look at the different rows and columns to analyze the data. While the distinction may seem small, there is a big difference in the software: a RAID controller does not need to know anything about the data, whereas a DPU must be programmed to deal with it. Another important aspect is whether or not the data will be encrypted. If the data is encrypted, a DPU will have to have additional security mechanisms to deal with decryption of the data, whereas a RAID-based system will not be affected.
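A toy contrast may help. In the sketch below (purely illustrative, not any vendor's firmware), the RAID-style function treats the payload as opaque bytes, while the DPU-style function must parse the CSV and therefore must know its schema.

```python
import csv
import io

# Toy contrast: a RAID-style controller treats data as opaque bytes;
# a DPU-style function must understand the data's format.

def raid_stripe(data: bytes, n_drives: int = 4) -> list:
    """Content-agnostic: deal bytes round-robin across drives."""
    return [data[i::n_drives] for i in range(n_drives)]

def dpu_column_sum(data: bytes, column: int = 1) -> float:
    """Content-aware: parse the payload as CSV and reduce one column."""
    rows = csv.reader(io.StringIO(data.decode()))
    return sum(float(row[column]) for row in rows)

payload = b"sensor_a,1.5\nsensor_b,2.5\nsensor_c,4.0\n"
print(raid_stripe(payload))     # no knowledge of the format needed
print(dpu_column_sum(payload))  # 8.0 -- requires knowing the schema
```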

Q. Is a CPU-bypass device the same as a SmartNIC?

A. Not entirely. They are often discussed together, but a DPU is intended to process data, whereas a SmartNIC may only process how the data is handled (such as encryption or TCP/IP offload). It is possible for a SmartNIC to also act as a DPU, where the data itself is processed. There are new NVMe-oF™ technologies that are beginning to allow FPGAs, TPUs, DPUs, GPUs and other devices direct access to other servers’ storage over a high-speed local area network without having to access the CPU of that system.

Q. What work is being done to accelerate S3 performance with regard to AI?

A. A number of companies are working to accelerate the S3 protocol; Presto and a number of Big Data technologies use it natively. For AI workloads, there are a number of caching technologies that handle the re-reads of training data on a local system, minimizing the performance penalty.
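As a minimal sketch of that caching idea (bucket, key, and cache path are hypothetical; the S3 calls use the standard boto3 client), a read-through cache lets the first epoch pay the S3 round trip while later epochs read from local flash:

```python
import os
import boto3  # third-party AWS SDK; any S3-compatible endpoint works

s3 = boto3.client("s3")
CACHE_DIR = "/mnt/nvme/s3cache"  # hypothetical local flash mount

def cached_get(bucket: str, key: str) -> bytes:
    """Read-through cache: S3 on first access, local flash thereafter."""
    path = os.path.join(CACHE_DIR, bucket, key.replace("/", "_"))
    if os.path.exists(path):                 # warm read: local flash
        with open(path, "rb") as f:
            return f.read()
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # cold read
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:              # populate the cache
        f.write(body)
    return body

# e.g. blob = cached_get("training-data", "images/batch-0001.tar")  # hypothetical names
```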

Q. From a storage perspective, how do I take different types of data from different storage systems to develop a model?

A. Work with your project team to find the data you need and ensure it can be served to the ML/DL training (or inference) environment in a timely manner. You may need to copy (or clone) data on to a faster medium to achieve your goals. But look at the process as a whole. Do not underestimate the data cleansing/normalization steps in your storage analysis as it can prove to be a bottleneck.

Q. Do I have to "normalize" that data to the same type, or can a model accommodate different data types?

A. In general, yes. Models can be very sensitive. A model trained on one set of data with one set of normalizations may not be accurate if data that was taken from a different set with different normalizations is used for inference. This does depend on the model, but you should be aware not only of the model, but also the details of how the data was prepared prior to training.
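A short sketch shows why the normalization parameters must travel with the model. Here scikit-learn's StandardScaler stands in for whatever preparation pipeline you have (toy data, illustrative only):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Inference inputs must be transformed with the *training* statistics,
# not re-fit on the new data.
X_train = np.array([[10.0], [20.0], [30.0]])
scaler = StandardScaler().fit(X_train)          # learn mean/std from training set

X_new = np.array([[25.0]])
right = scaler.transform(X_new)                 # reuse training statistics
wrong = StandardScaler().fit_transform(X_new)   # re-fitting maps everything to 0

print(right, wrong)  # different values -> a model fed `wrong` misbehaves
```

The practical consequence for storage: persist the preparation parameters (the fitted scaler) alongside the trained model so inference can reproduce the training-time normalization exactly.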

Q. If I have to change the data type, do I then need to store it separately?

A. It depends on your data; the key question is whether other systems still need it in the old format.

Q. Are storage solutions that are right for one form of AI also the best for others?

A. No. While it may be possible to use a single solution for multiple AI applications, in general there are differences in the data that can necessitate different storage. A relatively simple example is large data (MBs) vs. small data (~1 KB). Data in the multi-MB case can easily be erasure coded and stored more cost-effectively. However, for small data, erasure coding is not practical and you will generally have to go with replication.
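The capacity math behind that trade-off is simple enough to sketch. The 8+3 erasure-code geometry below is a hypothetical example, not a recommendation:

```python
# Rough capacity-overhead comparison: 3x replication vs. an 8+3
# erasure code (illustrative parameters).
def replication_overhead(copies: int = 3) -> float:
    return float(copies)        # raw bytes stored per logical byte

def ec_overhead(k: int = 8, m: int = 3) -> float:
    return (k + m) / k          # k data fragments + m parity fragments

print(f"3x replication: {replication_overhead():.2f}x raw per logical byte")
print(f"EC 8+3        : {ec_overhead():.2f}x raw per logical byte")
# EC wins on overhead (~1.38x vs 3x) but needs objects large enough to
# split into k fragments -- impractical for ~1 KB records.
```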

Q. How do features like CPU bypass impact performance of storage?

A. CPU bypass is essential for those times when all you need to do is transfer data from one peripheral to another without processing. For example, if you are trying to take data from a NIC and transfer it to a GPU, but not process the data in any way, CPU bypass works very well. It prevents the CPU and system memory from becoming a bottleneck. Likewise, on a storage server, if you simply need to take data from an SSD and pass it to a NIC during a read, CPU bypass can really help boost system performance. One important note: if you are well under the limits of the CPU, the benefits of bypass are small. So, think carefully about your system design and whether or not the CPU is a bottleneck. In some cases, people will use system memory as a cache and in these cases, bypassing CPU isn’t possible.

Q. How important is it to use All-Flash storage compared to HDD or hybrid?

A. It depends on your workloads. For any single model, you may be able to make do with HDD. However, another consideration for many AI/ML systems is that their use can expand quite suddenly: once a project has some success, you may find that more people want access to the data and the system experiences more load. So beware the success of these early projects, as the need to create multiple models from the same data could overload your system.

Q. Will storage for AI/ML necessarily be different from standard enterprise storage today?

A. Not necessarily. It may be possible for enterprise solutions today to meet your requirements. However, a key consideration is that if your current solution is barely able to handle its current requirements, then adding an AI/ML training workload may push it over the edge. In addition, even if your current solution is adequate, the size of many ML/DL models are growing exponentially every year.  So, what you provision today may not be adequate in a year or even several months.  Understanding the direction of the work your data scientists are pursuing is important for capacity and performance planning.

