Author: J Metz, Rockport Networks


Hyperscalers Take on NVMe™ Cloud Storage Questions

J Metz

Dec 2, 2019


Our recent webcast on how Hyperscalers Facebook and Microsoft are working together to merge their SSD drive requirements generated a lot of interesting questions. If you missed "How Facebook & Microsoft Leverage NVMe Cloud Storage," you can watch it on-demand. As promised at our live event, here are answers to the questions we received.

Q. How does Facebook or Microsoft see Zoned Name Spaces being used?

A. Zoned Name Spaces are how we will consume QLC NAND broadly. The ability to write to the NAND sequentially in large increments that lay out nicely on the media allows for very little write amplification in the device.
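
To make the sequential-write constraint concrete, here is a minimal, illustrative Python model of a zone with a write pointer. It is a sketch of the concept only, not the actual NVMe ZNS command set; the class and sizes are hypothetical.

class Zone:
    """Toy model of a single zone: writes may only land at the write pointer."""
    def __init__(self, zone_size_blocks):
        self.size = zone_size_blocks
        self.write_pointer = 0  # next block that may be written

    def append(self, num_blocks):
        """Sequentially write num_blocks at the write pointer; return the start block."""
        if self.write_pointer + num_blocks > self.size:
            raise ValueError("zone full: host must open or reset another zone")
        start = self.write_pointer
        self.write_pointer += num_blocks
        return start

    def reset(self):
        """The host resets (erases) the whole zone at once, so the device never has to
        garbage-collect partially valid data and write amplification stays near 1."""
        self.write_pointer = 0

zone = Zone(zone_size_blocks=4096)
zone.append(256)  # large, sequential increments that lay out cleanly on QLC NAND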

Q. How high a priority is firmware malware? Are there automated & remote management methods for detection and fixing at scale?

A. Security in the data center is one of the highest priorities. There are tools to monitor and manage the fleet including firmware checking and updating.

Q. If I understood correctly, the need for NVMe arose from the need to communicate at faster speeds with different components in the network. Currently, at which speed will NVMe see no more benefit from higher speeds because of the latencies in individual components? Which component is the most gating/concerning at this point?

A. In today's SSDs, the NAND latency dominates. This can be mitigated by adding backend channels to the controller and by optimizing data placement across the media. There are applications where devices are directly connected to the CPU; there, performance scales very well with PCIe lane speeds and does not have to contend with network latencies.
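
As a rough back-of-the-envelope illustration of why NAND latency dominates, the sketch below uses assumed, order-of-magnitude figures; the numbers are illustrative assumptions, not measurements quoted in the webcast.

# All figures below are rough assumptions for illustration only.
nand_read_latency_us = 80.0      # assumed TLC/QLC page-read latency, tens of microseconds
controller_overhead_us = 10.0    # assumed controller/FTL processing
pcie_gen4_x4_bytes_per_s = 8e9   # roughly 8 GB/s usable on a Gen4 x4 link (approximate)

io_size_bytes = 4096
transfer_us = io_size_bytes / pcie_gen4_x4_bytes_per_s * 1e6   # ~0.5 us on the wire

total_us = nand_read_latency_us + controller_overhead_us + transfer_us
print(f"transfer {transfer_us:.2f} us, total {total_us:.1f} us, "
      f"NAND share {100 * nand_read_latency_us / total_us:.0f}%")

Even doubling the link speed barely moves the total in this model, which is consistent with the point that backend channels and data placement matter more than faster lanes for the device itself.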

Q. Where does Zipline fit? Does Microsoft expect Azure to default to Zipline at both ends of the Azure network?

A. Microsoft has donated the RTL for the Zipline compression ASIC to Open Compute so that multiple endpoints can take advantage of "bump in the wire" inline compression.

Q. What other protocols exist that are competing with NVMe? What are the pros and cons for these to be successful?

A. SATA and SAS are the legacy protocols that NVMe was designed to replace. These protocols still have their place in HDD deployments.

Q. Where do you see U.2 form factor for NVMe?

A. Many enterprise solutions use U.2 in their 2U offerings. Hyperscale servers are mostly focused on 1U server form factors, where the compact heights of E1.S and E1.L allow for vertical placement on the front of the server.

Q. Is the E1.L form factor (32 drives) too big a failure domain for a single node as a storage target?

A. E1.L allows for very high density storage. The storage application must take into account the possibility of device failure via redundancy (mirroring, erasure coding, etc.) and rapid rebuild. In the future, the ability for the SSD to slowly lose capacity over time will be required.
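
For a sense of how redundancy changes the failure-domain math, here is a small sketch of usable capacity for a hypothetical erasure-coded layout across E1.L drives. The drive count, capacity, and the 14+2 scheme are assumptions for illustration, not a configuration described in the webcast.

def ec_usable(drive_capacity_tb, k, m, drives):
    """Usable capacity and overhead for groups of k data + m parity drives."""
    groups = drives // (k + m)
    raw_tb = groups * (k + m) * drive_capacity_tb
    usable_tb = groups * k * drive_capacity_tb
    return {
        "raw_tb": raw_tb,
        "usable_tb": usable_tb,
        "overhead_pct": round(100 * (raw_tb - usable_tb) / raw_tb, 1),
        "drive_failures_tolerated_per_group": m,
    }

# 32 x 16 TB E1.L drives in one node, protected as two 14+2 erasure-coded groups (assumed).
print(ec_usable(drive_capacity_tb=16, k=14, m=2, drives=32))

The trade-off between parity overhead and rebuild time is exactly the kind of system-level decision the answer above points to.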

Q. What have been the biggest pain points in using NVMe SSDs since inception/adoption, especially since Microsoft and Facebook started using them?

A. As discussed in the live Q&A, in the early days of NVMe the lack of standard drivers for both Windows and Linux hampered adoption. This has since been resolved with standard in-box driver offerings.

Q. Has FB or Microsoft considered allowing drives to lose data if they lose power on an edge server? If the server is rebuilt on a power-down, this can reduce SSD costs.

A. There are certainly interesting use cases where Power Loss Protection is not needed.

Q. Do zoned namespaces make the Denali spec obsolete, or has it been dropped by Microsoft? How does it impact/compete with the open-channel initiatives by Facebook?

A. Zoned Name Spaces incorporates probably 75% of the Denali functionality in an NVMe standardized way.

Q. How stable are NVMe PCIe hot-plug devices (unmanaged hot plug)?

A. Quite stable.

Q. How do you see Ethernet SSDs impacting cloud storage adoption?

A. It is not yet clear whether Ethernet is the right connection mechanism for storage disaggregation. CXL is becoming interesting.

Q. Thoughts on E3? What problems are being solved with E3?

A. E3 is meant more for 2U servers.

Q. ZNS has a lot of QoS implications as we load up so many dies in the E1.L form factor. Given that challenge, how does ZNS address the performance requirements of typical cloud workloads?

A. With QLC, the end-to-end system needs to be designed to meet the application's requirements. This is not limited to the ZNS device itself; the entire system needs to be taken into account.

If you're looking for more resources on any of the topics addressed in this blog, check out the SNIA Educational Library where you'll find over 2,000 vendor-neutral presentations, white papers, videos, technical specifications, webcasts and more.  


How Facebook & Microsoft Leverage NVMe™ Cloud Storage

J Metz

Oct 9, 2019

What do Hyperscalers like Facebook and Microsoft have in common? Find out in our next SNIA Networking Storage Forum (NSF) webcast, How Facebook and Microsoft Leverage NVMe Cloud Storage, on November 19, 2019, where you'll hear how these cloud market leaders are using NVMe SSDs in their architectures. Our expert presenters, Ross Stenfort, Hardware System Engineer at Facebook, and Lee Prewitt, Principal Hardware Program Manager, Azure CSI at Microsoft, will provide a close-up look into their application requirements and challenges, why they chose NVMe flash for storage, and how they are successfully deploying NVMe to fuel their businesses. You'll learn:
  • IOPS requirements for Hyperscalers
  • Challenges when managing at scale
  • Issues around form factors
  • Need to allow for "rot in place"
  • Remote debugging requirements
  • Security needs
  • Deployment success factors
I hope you will join us for this look at NVMe in the real world. Our experts will be on-hand to answer your questions during and after the webcast. Register today. We look forward to seeing you on November 19th.

Introducing the Storage Networking Security Webcast Series

J Metz

Sep 3, 2019

This series of webcasts, hosted by the SNIA Networking Storage Forum, is going to tackle an ambitious project – the scope of securing data, namely storage systems and storage networks. Many of the concepts covered in this series are broadly applicable to all kinds of data protection, but some aspects of security have a unique impact on storage, storage systems, and storage networks. Because security is a holistic concern, there has to be more than "naming the parts." It's important to understand how the pieces fit together, because it's where those joints exist that many of the threats become real.

Understanding Storage Security and Threats
This presentation is a broad introduction to security principles in general, including defining the terms you must know if you hope to have a good grasp of what makes something secure or not. We'll be talking about the scope of security, including threats, vulnerabilities, and attacks – and what that means in real storage terms.

Securing the Data at Rest
When you look at the holistic concept of security, one of the most obvious places to start is the threats to the physical realm. Topics here include ransomware, physical security, self-encrypting drives, and other aspects of how data and media are secured at the hardware level. In particular, we'll be focusing on the systems and mechanisms of securing the data, and even touch on some of the requirements that government security recommendations are placing on the industry.

Storage Encryption
This is a subject so important that it deserves its own session. It is a fundamental element that affects hardware, software, data-in-flight, data-at-rest, and regulations. In this session, we're going to lay down the taxonomy of what encryption is (and isn't), how it works, what the trade-offs are, and how storage professionals choose between the different options for their particular needs. This session is the "deep dive" that explains what goes on underneath the covers when encryption is used for data in flight or at rest.

Key Management
In order to effectively use cryptography to protect information, one has to ensure that the associated cryptographic keys are also protected. Attention must be paid to how cryptographic keys are generated, distributed, used, stored, replaced and destroyed in order to ensure that the security of cryptographic implementations is not compromised. This webinar will introduce the fundamentals of cryptographic key management, including key lifecycles, key generation, key distribution, symmetric vs. asymmetric key management, and integrated vs. centralized key management models. Relevant standards, protocols and industry best practices will also be presented.

Securing Data in Flight
Getting from here to there, securely and safely. Whether it's you in a car, plane, or train – or your data going across a network – it's critical to make sure that you get there in one piece. Just like you, your data must be safe and sound as it makes its journey. This webcast will talk about the threats to your data as it's transmitted, how interference happens along the way, and the methods of protecting that data when this happens.

Securing the Protocol
Different storage networks have different means for creating security beyond just encrypting the wire. We'll be discussing some of the particular threats to storage that are specific to attacking the vulnerabilities of data-in-flight. Here we will be discussing various security features of Ethernet and Fibre Channel – in particular, securing data in flight at the protocol level, including (but not limited to) MACsec, IPsec, and FC-SP-2.

Security Regulations
It's impossible to discuss storage security without examining the repercussions at the regulatory level. In this webcast, we're going to take a look at some of the common regulatory requirements that call for specific storage security configurations, and what those rules mean in a practical sense. In other words, how do you turn those requirements into practical reality? GDPR, the California Consumer Privacy Act (CCPA), other individual US states' laws – all of these require more than just ticking a checkbox. What do these things mean in terms of applying them to storage and storage networking?

Securing the System: Hardening Methods
"Hardening" is something that you do to an implementation, which means understanding how all of the pieces fit together. We'll be talking about different methods and mechanisms for creating secure end-to-end implementations. Topics such as PCI compliance, operating system hardening, and others will be included.

Obviously, storage security is a huge subject. This ambitious project certainly doesn't end here, and there will always be additional topics to cover. For now, however, we want to provide you with the industry's best experts in storage and security to help you navigate the labyrinthian maze of rules and technology... in plain English. Please join us and register for the first webcast in the series, Understanding Storage Security and Threats, on October 8th.



Author of NVMe™/TCP Spec Answers Your Questions

J Metz

Mar 27, 2019


900 people have already watched our SNIA Networking Storage Forum webcast, “What NVMe™/TCP Means for Networked Storage,” where Sagi Grimberg, lead author of the NVMe/TCP specification, and J Metz, Board Member for SNIA, explained what NVMe/TCP is all about. If you haven’t seen the webcast yet, check it out on-demand.

Like any new technology, there’s no shortage of areas for potential confusion or questions. In this FAQ blog, we try to clear up both.

Q. Who is responsible for updating NVMe Host Driver?

A. We assume you are referring to the Linux host driver (independent OS software vendors are responsible for developing their own drivers). Like any device driver and/or subsystem in Linux, the responsibility of maintenance is on the maintainer(s) listed under the MAINTAINERS file. The responsibility of contributing is shared by all the community members.

Q. What is the realistic timeframe to see a commercially available NVMe over TCP driver for targets? Is one year from now (2020) fair?

A. Even this year, commercial products are coming to market. The work started even before the spec was fully ratified, but now that it has been, we expect wider NVMe/TCP support to become available.

Q. Does NVMe/TCP work with 400GbE infrastructure?

A. As of this writing, there is no reason to believe that upper layer protocols such as NVMe/TCP will not work with faster Ethernet physical layers like 400GbE.

Q. Why is the NVMe CQ in the controller and not on the host?

A. The example that was shown in the webcast assumed that the fabrics controller had an NVMe backend. So the controller backend NVMe device had a local completion queue, and on the host sat the “transport completion queue” (in the NVMe/TCP case this is the TCP stream itself).

Q. So, SQ and CQ streams run asynchronously from each other, with variable ordering depending on the I/O latency of a request?

A. Correct. For a given NVMe/TCP connection, stream delivery is in-order, but commands and completions can arrive (and be processed by the NVMe controller) in any order.

Q. What TCP ports are used? Since we have many NVMe queues, I bet we need a lot of TCP ports.

A. Each NVMe queue will consume a unique source TCP port. Common NVMe host implementations will create a number of NVMe queues on the same order of magnitude as the number of CPU cores.
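
As a rough sizing sketch of that answer, the snippet below estimates how many TCP connections (and therefore unique source ports) a host might open; the one-I/O-queue-per-core heuristic and the subsystem count are assumptions for illustration.

import os

def estimate_nvme_tcp_connections(num_subsystems, io_queues_per_subsystem=None):
    """Each NVMe queue pair maps to its own TCP connection (one unique source port)."""
    io_queues = io_queues_per_subsystem or (os.cpu_count() or 1)
    return num_subsystems * (io_queues + 1)   # +1 for the admin queue per subsystem

# e.g., a 32-core host attached to 4 NVMe/TCP subsystems:
print(estimate_nvme_tcp_connections(num_subsystems=4, io_queues_per_subsystem=32))  # 132

That count is worth keeping in mind for any firewalls or connection-tracking devices sitting between host and target.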
Q. What is the max size of Data PDU supported? Are there any restrictions on parallel writes?

A. The maximum size of an H2CData PDU (MAXH2CDATA) is negotiated and can be as large as 4GB. It is recommended that it be no less than 4096 bytes.

Q. Is immediate data negotiated between host and target?

A. The in-capsule data size (IOCCSZ) is negotiated at the NVMe level. In NVMe/TCP the admin queue command capsule size is 8K by default. In addition, the maximum size of the H2CData PDU is negotiated during connection initialization.

Q. Is NVMe/TCP hardware infrastructure cost lower?

A. This can vary widely, but we assume you are referring to Ethernet hardware infrastructure. Also, NVMe/TCP does not require an RDMA-capable NIC, so the variety of implementations is usually wider, which typically drives down cost.

Q. What are the plans for the major OS suppliers to support NVMe over TCP (Windows, Linux, VMware)?

A. Unfortunately, we cannot comment on their behalf, but Linux already supports NVMe/TCP, which should find its way into the various distributions soon. We are working with others to support NVMe/TCP, but suggest asking them directly.

Q. Where does the overhead occur for NVMe/TCP packetization? Is it dependent on the CPU, or does the network adapter offload that heavy lifting? And what is the impact of numerous, but extremely small, transfers?

A. Indeed, a software NVMe/TCP implementation will introduce an overhead resulting from the TCP stream processing. However, you are correct that common stateless offloads such as Large Receive Offload and TCP Segmentation Offload are extremely useful both for large and for small 4K transfers.

Q. What do you mean Absolute Latency is higher than RDMA by “several” microseconds? <10us, tens of microseconds, or 100s of microseconds?

A. That depends on various aspects such as the CPU model, the network infrastructure, the controller implementation, services running on top, etc. Remote access to raw NVMe devices over TCP was measured to add between 20-35 microseconds with Linux in early testing, but the degree of variability will affect this.

Q. Will Wireshark support NVMe/TCP soon? Is an implementation in progress?

A. We most certainly hope so; it shouldn’t be difficult, but we are not aware of an ongoing implementation in progress.

Q. Are there any NVMe/TCP drivers out there?

A. Yes, Linux and SPDK both support NVMe/TCP out-of-the-box. See: https://nvmexpress.org/welcome-nvme-tcp-to-the-nvme-of-family-of-transports/

Q. Do you recommend a dedicated IP network for the storage traffic, or can you use the same corporate network with all other LAN traffic?

A. This really depends on the use case, the network utilization and other factors. Obviously, if the network bandwidth is fully utilized to begin with, it won’t be very efficient to add the additional NVMe/TCP “load” to the network, but that alone might not be the determining factor. Otherwise, it can definitely make sense to share the same network, and we are seeing customers choose this route. It might be useful to consider the best practices for TCP-based storage networks (iSCSI has taught valuable lessons), and we anticipate that many of the same principles will apply to NVMe/TCP. The AQM, buffer, etc. tuning settings are very dependent on the traffic pattern and need to be developed based on the requirements. Base configuration is determined by the vendors.

Q. On slide 28: no, TCP needs the congestion feedback; it doesn’t have to be a drop (it could be ECN, latency variance, etc.).

A. Yes, you are correct. The question refers to how that feedback is received, though, and in the most common (traditional) TCP methods it’s done via drops.

Q. How can you find out/check what TCP stack (drop vs. zero-buffer) your network is using?

A. The use/support of DCTCP is mostly driven by the OS. The network needs to support ECN and have it enabled and correctly configured for the traffic of interest. So the best way to figure this out is to talk to the network team. The use of ECN, etc. needs to be worked out between the server and network teams.

Q. On slide 33: drop is a signal of an overloaded network; congestion onset is when there is a standing queue (latency already increases). Current state of the art is to always overload the network (switches).

A. ECN is used to signal before drops happen to make it more efficient.
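
Related to the question above about checking which congestion-control behavior a host is using: on Linux, the host-side settings can be read directly from procfs, as in this sketch. It only shows the host configuration; whether the switches actually mark ECN is still a question for the network team.

def read_sysctl(path):
    with open(path) as f:
        return f.read().strip()

# 0 = ECN off, 1 = ECN on, 2 = ECN only when requested by incoming connections (typical default)
ecn = read_sysctl("/proc/sys/net/ipv4/tcp_ecn")
# e.g. "cubic", "dctcp", "bbr"
cc = read_sysctl("/proc/sys/net/ipv4/tcp_congestion_control")
print(f"tcp_ecn={ecn}, congestion control={cc}")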
Q. Is it safe to assume that most current switches on the market today support DCTCP/ECN, and that we can mix/match switches from vendors across product families?

A. Most modern ASICs support ECN today. Mixing different product lines needs to be carefully planned and tested. AQM, buffers, etc. need to be fine-tuned across the platforms.

Q. Is there a substantial cost savings by implementing all of what is needed to support NVMe over TCP versus just sticking with RDMA? Much like staying with Fibre Channel instead of risking performance with iSCSI not being implemented and maintained correctly. Building the separately supported network just seems the best route.

A. By “sticking with RDMA” you mean that you have already selected RDMA, which means you have already made the investments to make it work for your use case. We agree that changing what currently works reliably and meets the targets might be an unnecessary risk. NVMe/TCP brings a viable option for Ethernet fabrics which is easily scalable and allows you to utilize a wide variety of both existing and new infrastructure while still maintaining low-latency NVMe access.

Q. It seems that with multiple flavors of TCP and especially congestion management (DCTCP, DCQCN?), is there a plan for commonality in the ecosystem to support a standard way to handle congestion management? Is that required in the switches or also in the HBAs?

A. DCTCP is an approach for L3-based congestion management, whereas DCQCN is a combination of PFC and ECN for RoCEv2 (UDP)-based communication. So these are two different approaches.

Q. Who are the major players in terms of marketing this technology among storage vendors?

A. The key organization to find out about NVMe/TCP (or all NVMe-related material, in fact) is NVM Express®.

Q. Can I compare NVMe over TCP to iSCSI?

A. Easily: you can download an upstream kernel and test both of the in-kernel implementations (iSCSI and NVMe/TCP). Alternatively, you can reach out to a vendor that supports either of the two to test it as well. You should expect NVMe/TCP to run substantially faster for pretty much any workload.

Q. Is network segmentation crucial as a “go-to” architecture, with host-to-storage proximity the objective, to accomplish the objective of managed/throttled, close-to-near-lossless connectivity?

A. There is a lot to unpack in this question. Let’s see if we can break it down a little. Generally speaking, best practice is to keep the storage as close to the host as possible (and is reasonable). Not only does this reduce latency, but it reduces the variability in latency (and bandwidth) that can occur at longer distances. In cases where storage traffic shares bandwidth (i.e., links) with other types of traffic, the variable nature of different applications (some are bursty, others are more long-lived) can create unpredictability. Since storage – particularly block storage – doesn’t “like” unpredictability, different methods are used to regain some of that stability as scales increase. A common and well-understood best practice is to isolate storage traffic from “regular” Ethernet traffic. As different workloads tend to be not only “North-South” but increasingly “East-West” across the network topologies, this network segmentation becomes more important. Of course, it’s been used as a typical best practice for many years with protocols such as iSCSI, so this is not new. In environments where the variability of congestion can have a profound impact on storage performance, network segmentation will, indeed, become crucial as a “go-to” architecture. Proper techniques at L2 and L3 will help determine how close to a “lossless” environment can be achieved, of course, as will properly configured QoS mechanisms across the network. As a general rule of thumb, though, network segmentation is a very powerful tool to have for reliable storage delivery.

Q. How close are we to shared NVMe storage, either over Fibre Channel or TCP?

A. There are several shared storage products available on the market for NVMe over Fabrics, but as of this writing (only 3 months after the ratification of the protocol) no major vendors have announced NVMe over TCP shared storage capabilities. A good place to look for updates is the NVM Express website for interoperability and compliance products. [https://nvmexpress.org/products/]

Q. AQM -> DualQ work in the IETF for coexisting L4S (DCTCP) and legacy TCP. Ongoing work at chip merchants.

A. Indeed, there are a lot of advancements around making TCP evolve as the speeds and feeds increase. This is yet another example that shows why NVMe/TCP is, and will remain, relevant in the future.

Q. Are there any major vendors who are pushing products based on these technologies?

A. We cannot comment publicly on any vendor plans. You would need to ask a vendor directly for a concrete timeframe for the technology. However, several startups have made public announcements on supporting NVMe/TCP. Lightbits Labs, to give one example, will have a high-performance, low-latency NVMe/TCP-based software-defined-storage solution out very soon.


What Are the Networking Requirements for HCI?

J Metz

Feb 20, 2019

Hyperconverged infrastructures (also known as "HCI") are designed to be easy to set up and manage. All you need to do is add networking. In practice, the "add networking" part has been more difficult than most anticipated. That's why the SNIA Networking Storage Forum (NSF) hosted a live webcast, "The Networking Requirements for Hyperconverged Infrastructure," where we covered what HCI is, the storage characteristics of HCI, and important networking considerations. If you missed it, it's available on-demand. We had some interesting questions during the live webcast and, as we promised during the live presentation, here are answers from our expert presenters:

Q. An HCI configuration ought to consist of 3 or more nodes, or have I misunderstood this? In an earlier slide I saw HCI with 1 and 2 nodes.

A. You are correct that HCI typically requires 3 or more nodes with resources pooled together to ensure data is distributed through the cluster in a durable fashion. Some vendors have released 2-node versions appropriate for edge locations or SMBs, but these revert to a more traditional failover approach between the two nodes rather than a true HCI configuration.

Q. NVMe-oF means running NVMe over Fibre Channel or something else?

A. The "F" in "NVMe-oF" stands for "Fabrics". As of this writing, there are currently three different "official" fabric transports explicitly outlined in the specification: RDMA-based (InfiniBand, RoCE, iWARP), TCP, and Fibre Channel. HCI, however, is a topology that is almost exclusively Ethernet-based, and Fibre Channel is a less likely storage networking transport for the solution. The spec for NVMe-oF using TCP was recently ratified, and may gain traction quickly given the broad deployment of TCP and comfort level with the technology in IT. You can learn more about NVMe-oF in the webinar "Under the Hood with NVMe over Fabrics" and about NVMe/TCP in the NSF webcast "What NVMe™/TCP Means to Networked Storage."

Q. In the past we have seen vendors leverage RDMA within the host but not take it to the fabric, i.e. RDMA yes, RDMA over fabric maybe not. Within HCI, do you see fabrics being required to be RDMA aware, and if so, who do you think will ultimately decide this: the HCI vendor, the applications vendor, the customer, or someone else?

A. The premise of HCI systems is that there is an entire ecosystem "under one roof," so to speak. Vendors with HCI solutions on the market have their choice of networking protocols that best work with their levels of virtualization and abstraction. To that end, it may be possible that RDMA-capable fabrics will become more common as workload demands on the network increase and IT looks for various ways to optimize traffic. Hyperconverged infrastructure, with lots of east-west traffic between nodes, can take advantage of RDMA and NVMe-oF to improve performance and alleviate certain bottlenecks in the solution. It is, however, only one component piece of the overall picture. The HCI solution needs to know how to take advantage of these fabrics, as do switches, etc., for an end-to-end solution, and in some cases other transport forms may be more appropriate.

Q. What is a metadata network? I had never heard that term before.

A. Metadata is the data about the data. That is, HCI systems need to know where the data is located, when it was written, and how to access it. That information about the data is called metadata. As systems grow over time, the amount of metadata that exists in the system grows as well. In fact, it is not uncommon for the metadata quantity and traffic to exceed the data traffic. For that reason, some vendors recommend establishing a completely separate network for handling the metadata traffic that traverses the system.
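
As a purely illustrative example of "data about the data," here is a toy metadata record an HCI layer might keep per extent; the field names are hypothetical and do not reflect any vendor's actual schema.

from dataclasses import dataclass

@dataclass
class ExtentMetadata:
    volume_id: str       # which volume the extent belongs to
    logical_offset: int  # where the extent sits within that volume
    node: str            # which cluster node holds this replica or fragment
    device: str          # which local drive on that node
    written_at: float    # timestamp, used for consistency and rebuild decisions
    checksum: str        # integrity information

# Every write updates records like this on multiple nodes, which is why metadata
# traffic can grow to rival, or exceed, the data traffic itself.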


Experts Answer Virtualization and Storage Networking Questions

J Metz

Jan 25, 2019

The SNIA Networking Storage Forum (NSF) kicked off the New Year with a live webcast, "Virtualization and Storage Networking Best Practices." We guessed it would be popular and boy, were we right! Nearly 1,000 people registered to attend. If you missed out, it's available on-demand. You can also download a copy of the webcast slides. Our experts, Jason Massae from VMware and Cody Hosterman from Pure Storage, did a great job sharing insights and lessons learned on best practices for configuration, troubleshooting and optimization. Here's the blog we promised at the live event, with answers to all the questions we received.

Q. What is a fan-in ratio?

A. A fan-in ratio (sometimes also called an "oversubscription ratio") refers to the amount of host link bandwidth relative to the link bandwidth into a storage device. A very simple example helps illustrate the principle: say you have a Fibre Channel network (the actual speed or protocol does not matter for our purposes here) with 60 hosts, each with a 4GFC link, going through a series of switches to a storage device. In the example shown in the webcast, this works out to a 10:1 oversubscription ratio; the "fan-in" part refers to the host bandwidth in comparison to the target bandwidth. Block storage protocols like Fibre Channel, FCoE, and iSCSI have much lower fan-in ratios than file storage protocols such as NFS, and deterministic storage protocols (like FC) have lower ratios than non-deterministic ones (like iSCSI). The true arbiter of the appropriate fan-in ratio is the application. Highly transactional applications, such as databases, often require very low ratios.
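
Here is the arithmetic behind that example as a quick sketch; the target side (six 4GFC ports) is an assumption chosen to reproduce the 10:1 ratio, since the original diagram is not reproduced here.

def fan_in_ratio(host_links, host_gbps, target_links, target_gbps):
    """Total host-side bandwidth divided by total target-side bandwidth."""
    return (host_links * host_gbps) / (target_links * target_gbps)

# 60 hosts x 4GFC (240G) against an assumed 6 x 4GFC target ports (24G) -> 10:1
print(f"{fan_in_ratio(60, 4, 6, 4):.0f}:1")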
Q. If there's a mismatch in the MTU between server and switch, will the highest MTU between the two get negotiated, or will the mismatch persist?

A. No, the lowest value will be used, but there's a caveat to this. The switch and the network in the path(s) can have the MTU set higher than the hosts, but the hosts cannot have a higher MTU than the network. For example, if your hosts are set to 1500 and all the network switches in the path are set to 9k, then the hosts will communicate over 1500. However, what can, and usually does, happen is someone sets the host(s) or target(s) to 9k but never changes the rest of the network. When this happens, you end up with unreliable connectivity or loss of connectivity altogether. Think of a large ball that can't fit through a hole smaller than itself: a 9k frame cannot pass through a 1500 port. Unless you and your network admin both understand and use jumbo frames, there's no reason to implement them in your environment.

Q. Can you implement port binding when using two NICs for all traffic, including iSCSI?

A. Yes, you can use two NICs for all traffic including iSCSI; many organizations use this configuration. The key is making sure you have enough bandwidth to support all the traffic/IO that will use those NICs. You should, at the very least, use 10Gb NICs, faster if possible. Remember, now all your management, VM and storage traffic are using the same network devices. If you don't plan accordingly, everything can be impacted in your virtual environment. Some hypervisors are capable of granular network controls to manage which type of traffic uses which NIC and certain failover details, and they allow setting QoS limits on the different traffic types. Consequently, you can ensure storage traffic gets the required bandwidth or priority in a dual-NIC configuration.

Q. I've seen HBA drivers that by default set their queue depth to 128 but the target port only handles 512, so two HBAs would saturate one target port, which is undesirable. Why don't the HBA drivers ask what the depth should be at installation?

A. There are a couple of possible reasons for this. One is that many people do not know what the setting even means and are likely to make a poor decision (higher is better, right?!). So vendors tend to set these things at defaults and let people change them if needed—and usually that means they have a purpose in changing them. Furthermore, every storage array handles these things differently, and that can make it more difficult to size them. It is usually better to provide consistency—having things set uniformly makes it easier to support and will give more consistent expectations, even across storage platforms. Second, many environments are large—which means people usually are not clicking and typing through installations. Things are templatized, or sysprepped, or automated, etc. During or after the deployment, their automation tools can configure things uniformly in accordance with their needs. In short, it is like most things: give defaults to keep one-off installations simple (and decrease the risks from people who may not know exactly what they are doing), complete the installations without having to research a ton of settings that may not ultimately matter, and yet still provide experienced/advanced users, or automaters, ways to make changes.

Q. A number of white papers show the storage uplinks on different subnets. Is there a reason to have each link on its own subnet/VLAN, or can they share a common segment?

A. One reason is to reduce the number of logical paths. Especially in iSCSI, the number of paths can easily exceed supported limits if every port can talk to every target. Using multiple subnets or VLANs can drop this in half—and all you really lose is logical path redundancy, which doesn't really matter much. Also, if everything is in the same subnet or VLAN and someone makes some kind of catastrophic change to that subnet or VLAN (or some device in it causes other issues), there is nothing to contain the damage, whereas with two subnets/VLANs it is less likely that both are affected. This gives some management "oops" protection, so a single change won't bring all storage connectivity down.
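
To see why splitting into subnets/VLANs cuts the path count roughly in half, here is a small sketch; the port and LUN counts are assumptions for illustration only.

def logical_paths(host_ports, target_ports_visible, luns):
    """Paths a host tracks: every host port x every visible target port, per LUN."""
    return host_ports * target_ports_visible * luns

# Flat network: each of 2 host ports sees all 8 target ports, across 32 LUNs.
flat = logical_paths(host_ports=2, target_ports_visible=8, luns=32)    # 512 paths
# Two subnets/VLANs: each host port now sees only the 4 target ports in its own subnet.
split = logical_paths(host_ports=2, target_ports_visible=4, luns=32)   # 256 paths
print(flat, split)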


