The SNIA Networking Storage Forum kicked off its xPU webcast series last month with “SmartNICs to xPUs – Why is the Use of Accelerators Accelerating?” where SNIA experts defined what xPUs are, explained how they can accelerate offload functions, and cleared up confusion about the many other names associated with xPUs, such as SmartNIC, DPU, IPU, APU, and NAPU. The webcast was highly rated by our audience and already has more than 1,300 views. If you missed it, you can watch it on-demand and download a copy of the presentation slides at the SNIA Educational Library.
The
live audience asked some interesting questions and here are answers from our
presenters.
Q. How can we have redundancy on an xPU?
A. xPUs are well suited to offloading and managing server/appliance
and application redundancy schemes. Sitting at the heart of data movement and
processing in the server, an xPU can expose parallel data paths and act as a
reliable control point for server management. In addition, the fabric connecting
xPUs across hosts can provide its own redundancy and elasticity, so failover
between xPU devices can be seamless and can offer a simplified redundancy and
availability scheme among the entities on the xPU fabric as it connects servers
over the network. Because xPUs don’t run the user applications (at worst they
run some offload functions for them), they make a stable and reliable control
point for such redundancy schemes. It’s also possible to put two (or potentially
more) xPUs into each server to provide redundancy at the xPU level.
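On that last point, here is a minimal host-side sketch (not from the webcast, and not vendor-specific) of an active-backup check across two network interfaces, each assumed to be presented by one of two xPUs in the same server. The interface names xpu0 and xpu1 are hypothetical; the code simply reads the Linux sysfs link state to decide which path to use.

```c
/* Minimal sketch, assuming two xPUs each expose a host-visible network
 * interface ("xpu0" and "xpu1" are hypothetical names). Reads the Linux
 * sysfs link state and picks the active path, active-backup style. */
#include <stdio.h>
#include <string.h>

static int link_is_up(const char *ifname)
{
    char path[128], state[16] = {0};
    snprintf(path, sizeof(path), "/sys/class/net/%s/operstate", ifname);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;                      /* missing interface counts as down */
    if (!fgets(state, sizeof(state), f))
        state[0] = '\0';
    fclose(f);
    return strncmp(state, "up", 2) == 0;
}

int main(void)
{
    const char *primary = "xpu0";      /* hypothetical interface names */
    const char *backup  = "xpu1";

    if (link_is_up(primary))
        printf("using primary path %s\n", primary);
    else if (link_is_up(backup))
        printf("primary down, failing over to %s\n", backup);
    else
        printf("no xPU path available\n");
    return 0;
}
```

In practice this kind of selection is usually left to Linux bonding, ECMP routing, or NVMe/SCSI multipathing rather than application code; the sketch only illustrates where dual-xPU redundancy becomes visible to the host.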
Q. More of a comment. I’m in the SSD space, and with the ramp up of the
E1.L/E1.S and E3 form factors, space is being optimized for these
SmartNICs, GPUs, DPUs, etc. This also makes better use of space inside a
server/node and allows for serial interface placement on the PCB. Great
discussion today.
A. Yes, it’s great to see servers and component devices
evolving towards supporting cloud-ready architectures and composable
infrastructure for data centers. We anticipate that xPUs will evolve into a
variety of physical form factors within the server especially with the modular
server component standardization work that is going on. We’re glad you enjoyed
the session.
Q. How does CXL impact xPUs and their
communication with other components such as DRAM? Will this eliminate DDR and
not TCP/IP?
A. xPUs might use CXL as an enhanced interface to the host,
to local devices connected to the xPU, or to a CXL fabric that extends the pool
of local devices and the xPU network, for example by connecting to an entity
like a shared memory pool. CXL provides an enhanced, coherent memory
interface and can extend host or device access to slower tiers of memory
through the CXL.mem protocol. It can also provide a coherent interface through
the CXL.cache protocol, creating an extended compute interface that allows
close interaction between host and devices. We think CXL will provide an
additional tier of memory and compute that will live side by side with the
current tiers of compute and memory, each having its own merit in different
compute scenarios. Will CXL eliminate DDR? Local DDR attached to the CPU will
always have a latency advantage and will provide better compute performance in
some use cases, so CXL memory will add tiers of memory/PMEM/storage alongside
what DDR provides.
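As an illustration of the tiering point above (our assumption, not something covered in the webcast), the sketch below supposes a CXL.mem expander that Linux exposes as a CPU-less NUMA node; node 1 is a placeholder id that would really come from inspecting the system (for example with numactl -H). It uses libnuma to place one buffer on the CXL tier while an ordinary malloc() stays in local DDR.

```c
/* Minimal sketch, assuming the CXL.mem expander appears to Linux as a
 * CPU-less NUMA node (node 1 here is an assumption; check the real id
 * with "numactl -H"). Places one buffer on that far tier while a normal
 * malloc() buffer stays in local DDR. Build with: gcc cxl_tier.c -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    int cxl_node = 1;                  /* assumed node id of the CXL memory */
    size_t len = 64UL << 20;           /* 64 MiB buffer in the far tier */

    char *far  = numa_alloc_onnode(len, cxl_node);
    char *near = malloc(len);          /* default policy: local DDR */
    if (!far || !near) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* Touch both buffers so pages are actually placed. */
    for (size_t i = 0; i < len; i += 4096) {
        far[i]  = 1;
        near[i] = 1;
    }
    printf("placed %zu MiB on NUMA node %d (CXL tier) and %zu MiB in local DDR\n",
           len >> 20, cxl_node, len >> 20);

    numa_free(far, len);
    free(near);
    return 0;
}
```

The idea is that colder or capacity-bound data can live in the slower CXL tier while latency-sensitive structures stay in DDR.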
Q. Isn’t a Fibre Channel (FC) HBA very similar to a DPU,
but for FC?
A. The NVMe-oF offloads make the xPU equivalent to an FC
HBA, but the xPU can also host additional offloads and services at the same
time. Both FC HBAs and xPUs typically accelerate and offload storage networking
connections and can enable some amount of remote management. They may also
offload storage encryption tasks. However, xPUs typically support general
networking and might also support storage tasks, while FC HBAs always support Fibre
Channel storage tasks and rarely support any non-storage functions.
Q. Were the old TCP Offload Engine (TOE) cards
from Adaptec many years ago, which were used for iSCSI, considered xPU devices?
A. They were not considered xPUs because, like FC HBAs, they
only offloaded storage networking traffic, in this case iSCSI traffic over
TCP. In addition, the terms “xPU,” “IPU” and “DPU” were not in use at that
time. However, TOE and equivalent cards laid the groundwork for the evolution
to the modern xPU.
Q. For xPU sales to grow dramatically, won’t that happen
after CXL has a large footprint in data centers?
A. The CXL market is focused on coherent device and memory
extension connections to the host, while the xPU market is focused on devices
that handle data movement and processing offload for the host, connected over
networks. As such, the CXL and xPU markets are complementary. Each market has its
own segments and use cases and is viable independently of the other. As discussed
above, the technical solutions complement each other, so the evolution of each
market benefits from the other. Broader adoption of CXL will enable faster
and broader functionality for xPUs, but it is not required for rapid growth of the
xPU market.
Q. What role will CXL play in these disaggregated
data centers?
A. The ultimate future of CXL is a little hard to predict. CXL
has a potential role in disaggregating coherent devices and memory pools at
the chassis/rack scale using CXL switch devices, while xPUs have the role of
disaggregating at the rack/data center level. xPUs will start out connecting
multiple servers across multiple racks, then extend across the entire data
center and potentially across multiple data centers (and perhaps from cloud
to edge). CXL will likely start out connecting devices within a
server, then possibly extend across a rack and eventually across multiple
racks.
If you are interested in learning more about xPUs, I
encourage you to register for our second webcast,
“xPU Accelerator Offload Functions,” to hear what problems xPUs aim to
solve, where in the system they live, and the functions they
implement.