Our recent webcast on how hyperscalers Facebook and Microsoft are working together to merge their SSD requirements generated a lot of interesting questions. If you missed “How Facebook & Microsoft Leverage NVMe Cloud Storage,” you can watch it on-demand. As promised at our live event, here are answers to the questions we received.
Q. How does Facebook or Microsoft see Zoned Namespaces (ZNS) being used?
A. Zoned Namespaces are how we will consume QLC NAND
broadly. The ability to write to the NAND sequentially in large increments that
lay out nicely on the media allows for very little write amplification in the
device.
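To make the sequential-write constraint concrete, here is a minimal toy sketch in Python. It is not the NVMe ZNS command set or any vendor's implementation: the `Zone` class, its method names, and the `write_amplification` helper are hypothetical and exist only to illustrate the idea that a zone accepts writes only at its write pointer, and that write amplification is the ratio of NAND writes to host writes.

```python
class Zone:
    """Toy model of one zone (hypothetical, for illustration only)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.write_pointer = 0  # next block that may be written

    def write(self, start_block, num_blocks):
        # A zone only accepts writes that start exactly at the write pointer.
        if start_block != self.write_pointer:
            raise ValueError("zone writes must be sequential")
        if self.write_pointer + num_blocks > self.capacity:
            raise ValueError("write exceeds zone capacity")
        self.write_pointer += num_blocks

    def reset(self):
        # Resetting the zone makes the whole zone writable again.
        self.write_pointer = 0


def write_amplification(nand_blocks_written, host_blocks_written):
    """Ratio of blocks written to NAND versus blocks written by the host."""
    return nand_blocks_written / host_blocks_written


zone = Zone(capacity_blocks=8)
zone.write(0, 4)  # accepted: starts at the write pointer
zone.write(4, 4)  # accepted: continues sequentially, zone is now full
```

In this simplified model, if the host fills and resets whole zones, the device never has to relocate data on the host's behalf, so the ratio computed by `write_amplification` stays close to 1.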
Q. How high a priority is firmware malware? Are there
automated & remote management methods for detection and fixing at scale?
A. Security in the data center is one of the highest
priorities. There are tools to monitor and manage the fleet including firmware
checking and updating.
Q. If I understood correctly, the need for NVMe stemmed
from the need to communicate at faster speeds with different components in
the network. At what speed will NVMe see no further benefit
from higher speeds because of the latencies in individual components? Which
component is the biggest bottleneck or concern at this point?
A. In today’s SSDs, the NAND latency dominates. This
can be mitigated by adding backend channels to the controller and optimizing
data placement across the media. There are applications where the SSD is
directly connected to the CPU; there, performance scales very well with PCIe
lane speeds and network latencies do not come into play.
Q. Where does Zipline fit? Does Microsoft expect Azure
to default to Zipline at both ends of the Azure network?
A. Microsoft has donated the RTL for the Zipline
compression ASIC to Open Compute so that multiple endpoints can take advantage
of “bump in the wire” inline compression.
Q. What other protocols exist that are competing with
NVMe? What are the pros and cons for these to be successful?
A. SATA and SAS are the legacy protocols that NVMe was
designed to replace. These protocols still have their place in HDD deployments.
Q. Where do you see U.2 form factor for NVMe?
A. Many enterprise solutions use U.2 in their 2U
offerings. Hyperscale servers are mostly focused on 1U server form factors, where
the compact heights of E1.S and E1.L allow for vertical placement on the front
of the server.
Q. Is E1.L form factor too big (32 drives) for failure
domain in a single node as a storage target?
A. E1.L allows for very high density storage. The
storage application must take into account the possibility of device failure
via redundancy (mirroring, erasure coding, etc.) and rapid rebuild. In the
future, the ability for the SSD to slowly lose capacity over time will be
required.
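As a rough illustration of the trade-off between redundancy scheme and rebuild speed, here is a back-of-the-envelope sketch in Python. The function names, the 10+4 erasure-coding layout, the 16 TB drive size, and the 1 GB/s rebuild rate are all assumptions chosen for the example; none of these figures came from the webcast.

```python
def usable_fraction(data_shards, parity_shards):
    """Fraction of raw capacity left for user data under a k+m scheme."""
    return data_shards / (data_shards + parity_shards)


def rebuild_hours(drive_capacity_tb, rebuild_rate_gb_per_s):
    """Hours to re-create one failed drive at a sustained rebuild rate."""
    seconds = drive_capacity_tb * 1000 / rebuild_rate_gb_per_s  # TB -> GB
    return seconds / 3600


# Mirroring behaves like 1 data shard plus 1 parity shard.
mirror = usable_fraction(1, 1)       # 0.5
# Hypothetical 10+4 erasure-coding layout.
erasure = usable_fraction(10, 4)     # about 0.71

# Hypothetical 16 TB E1.L drive rebuilt at 1 GB/s.
hours = rebuild_hours(16, 1.0)       # about 4.4 hours
print(f"mirror={mirror:.2f} erasure={erasure:.2f} rebuild={hours:.1f}h")
```

Under these assumed numbers, the larger the drive, the longer the node runs with reduced redundancy after a failure, which is why rapid rebuild matters more as density grows.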
Q. What have been the biggest pain points in using NVMe
SSDs since inception and adoption, especially since Microsoft and Facebook
started using them?
A. As discussed in the live Q&A, in the early days
of NVMe the lack of standard drivers for both Windows and Linux hampered
adoption. This has since been resolved with standard in-box driver offerings.
Q. Has Facebook or Microsoft considered allowing drives to
lose data if they lose power on an edge server? If the server is rebuilt on a
power down, this can reduce SSD costs.
A. There are certainly interesting use cases where
Power Loss Protection is not needed.
Q. Do Zoned Namespaces make the Denali spec obsolete, or has it been
dropped by Microsoft? How does ZNS affect or compete with the open-channel initiatives by
Facebook?
A. Zoned Namespaces incorporates probably 75% of the
Denali functionality in an NVMe standardized way.
Q. How stable are NVMe PCIe hot-plug devices (unmanaged
hot plug)?
A. Quite stable.
Q. How do you see Ethernet SSDs impacting cloud
storage adoption?
A. Not clear yet if Ethernet is the right connection mechanism for storage disaggregation. CXL is becoming interesting.
Q. Thoughts on E3? What problems are being solved with E3?
A. E3 is meant more for 2U servers.
Q. ZNS has a lot of QoS implications as we load up so many dies on E1.L FF. Given the challenge how does ZNS address the performance requirements from regular cloud requirements?
A. With QLC, the end-to-end systems need to be designed to meet the application’s requirements. This is not limited to the ZNS device itself, but needs to take into account the entire system.
If you’re looking for more resources on any of the topics addressed in this blog, check out the SNIA Educational Library where you’ll find over 2,000 vendor-neutral presentations, white papers, videos, technical specifications, webcasts and more.