How are Compute Express Link (CXL) and the SNIA Smart Data Accelerator Interface (SDXI) related? It's a topic we covered in detail at our recent SNIA Networking Storage Forum webcast, "What's in a Name? Memory Semantics and Data Movement with CXL and SDXI," where our experts, Rita Gupta and Shyam Iyer, introduced both SDXI and CXL, highlighted the benefits of each, discussed data movement needs in a CXL ecosystem, and covered SDXI advantages in a CXL interconnect. If you missed the live session, it is available in the SNIA Educational Library along with the presentation slides. The session was highly rated by the live audience, who asked several interesting questions. Here are answers to them from our presenters, Rita and Shyam.
Q. Now that SDXI v1.0 is out, can application
implementations use SDXI today?
A. Yes. Now that SDXI v1.0 is out, implementations can start building to the v1.0 SNIA standard. If you are looking to influence a future version of the specification, please consider joining the SDXI Technical Working Group (TWG) in SNIA. We are now planning post-v1.0 features, so we welcome all new members and implementors to participate in this new phase of development. Additionally, you can use the SNIA feedback portal to provide your comments.
Q. You mentioned SDXI is interconnect-agnostic, and yet we are talking about SDXI and a specific interconnect here, i.e., CXL. Is SDXI architected to work on CXL?
A. SDXI is designed to be interconnect-agnostic. It standardizes the memory structures, function setup, control, etc., to make sure that a standardized mover can have an architected global state. It does not preclude an implementation from taking advantage of the features of an underlying interconnect. CXL will be an important instance, which is why it was a big part of this presentation.
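To make "standardized memory structures" a little more concrete, here is a minimal, purely illustrative C sketch of a memory-resident descriptor ring with a producer index, which is the general shape of structure the answer alludes to. The field names, sizes, and opcodes are invented for this sketch and are not the SDXI v1.0 layouts; the point is simply that the control state lives in ordinary memory, so the same structures can be reached over whatever interconnect sits underneath.

```c
/*
 * Illustrative only: a toy, memory-resident descriptor ring in the spirit of
 * what SDXI standardizes. Field names and layout are invented for this sketch
 * and do NOT match the SDXI v1.0 structures; consult the specification for
 * the real formats.
 */
#include <stdint.h>
#include <string.h>

/* Hypothetical operation codes for the toy mover. */
enum toy_mover_op {
    TOY_OP_NOP  = 0,
    TOY_OP_COPY = 1,   /* copy 'len' bytes from 'src' to 'dst' */
};

/* A fixed-size descriptor the producer fills in and the mover consumes. */
struct toy_descriptor {
    uint16_t opcode;      /* enum toy_mover_op */
    uint16_t flags;
    uint32_t len;         /* transfer length in bytes */
    uint64_t src;         /* source address, as seen by the mover */
    uint64_t dst;         /* destination address */
};

/* Per-context state: a ring of descriptors plus producer/consumer indices. */
struct toy_context {
    struct toy_descriptor ring[256];
    volatile uint32_t write_index;  /* advanced by software (producer) */
    volatile uint32_t read_index;   /* advanced by the mover (consumer) */
};

/* Post a copy request; a real implementation would then ring a doorbell. */
static void toy_post_copy(struct toy_context *ctx, uint64_t dst,
                          uint64_t src, uint32_t len)
{
    uint32_t slot = ctx->write_index % 256;
    struct toy_descriptor d = {
        .opcode = TOY_OP_COPY, .flags = 0, .len = len, .src = src, .dst = dst,
    };
    memcpy(&ctx->ring[slot], &d, sizeof(d));
    ctx->write_index++;   /* the interconnect-specific doorbell write goes here */
}

int main(void)
{
    static struct toy_context ctx;            /* zero-initialized ring */
    static char src[64] = "hello", dst[64];

    toy_post_copy(&ctx, (uint64_t)(uintptr_t)dst,
                  (uint64_t)(uintptr_t)src, sizeof(src));
    /* A real mover would now consume ring[0]; here we only show the post. */
    return 0;
}
```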
Q. I think you covered it in the talk,
but can you highlight some specific advantages for SDXI in a CXL environment
and some ways CXL can benefit from an SDXI standardized data mover?
A. A CXL-enabled architecture expands the targetable system memory space for an architected memory data mover like SDXI. Also, as I explained, SDXI implementors have a few unique implementation choices in a CXL-based architecture that can further improve/optimize data movement. So, while SDXI is interconnect-agnostic, SDXI and CXL can be great buddies :-).
With CXL concepts like "shared memory" and "pooled memory," SDXI can now become a multi-host data mover. This is huge because it eliminates a lot of software stack layers needed to perform both intra-host and inter-host bulk data transfers.
Q. CXL is termed as low latency; what are the latency targets for CXL devices?
A. While overall CXL device latency targets may depend on the media, the guidance is for CXL access latency to be within one NUMA hop. In other words, CXL memory access should have latency similar to that of a remote-socket DRAM access.
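For readers who want to see where such memory lands on their own systems, the short Linux sketch below (assuming libnuma is installed; link with -lnuma) prints the firmware-reported NUMA distances from node 0. On platforms that expose CXL memory as a CPU-less NUMA node, its distance can be compared against that of a remote socket; the exact values are chosen by platform firmware and vary by system.

```c
/*
 * Sketch (Linux + libnuma, compile with: gcc numa_dist.c -lnuma): print the
 * SLIT-style distance from node 0 to every configured NUMA node, so a
 * CXL-attached memory node can be compared against a remote socket.
 */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int max_node = numa_max_node();
    for (int node = 0; node <= max_node; node++) {
        /* numa_distance() returns 10 for local memory; remote sockets and
         * CXL memory nodes report larger values chosen by firmware. */
        printf("distance(node 0 -> node %d) = %d\n",
               node, numa_distance(0, node));
    }
    return 0;
}
```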
Q. How are SNIA and CXL collaborating on
this?
A. SNIA and the CXL Consortium have a marketing alliance agreement that allows the two organizations to work on joint marketing activities, such as this webcast, to promote collaborative work. In addition, many of the contributing companies are members of both the CXL Consortium and the SNIA SDXI TWG. This helps ensure that the two groups stay connected.
Q. What is the difference between memory pooling and memory sharing? What are the advantages of each?
A. Memory pooling (also referred to as memory disaggregation) is an approach where multiple hosts dynamically allocate dedicated memory resources, as needed, from a pool of CXL memory device(s). A given memory resource is allocated to only one host at any time. The technique ensures optimum and efficient usage of expensive memory resources, providing a TCO advantage.
In a memory-sharing usage model, allocated blocks of memory can be used by multiple hosts at the same time. Memory sharing provides optimum usage of memory resources and also improves the efficiency of memory allocation and management.
Q. Can SDXI enable data movement across CXL devices in a peer-to-peer fashion?
A. Yes, indeed. SDXI devices can target all memory regions accessible to the host and, among other usage models, perform data movement across CXL devices in a peer-to-peer fashion. Of course, this assumes appropriate platform support, but SDXI is designed for such data movement use cases as well.
Q. Trying to find equivalent terms… can you think of SDXI as what NVMe® is for NVMe-oF, with CXL as the underlying transport fabric like TCP?
A. There are some similarities, but the use cases are very different, and therefore I suspect the implementations will drive the development of these standards very differently. Like NVMe, which defines various opcodes to perform storage operations, SDXI defines various opcodes to perform memory operations. It is also true that SDXI opcodes/descriptors can be used to move data with PCIe and CXL as the I/O interconnect, and a future expansion to Ethernet-based interconnects can be envisioned. Having said that, memory operations have different SLAs, performance characteristics, byte-addressability concerns, and ordering requirements, among other things. SDXI is enabling a new class of such devices.
Q. Is there a limitation on the granularity of transfers – is SDXI limited to bulk transfers only, or does it also address small granular transfers?
A. As a standard specification, SDXI allows implementations to process descriptors for data transfer sizes ranging from 1 byte to 4 GB. That said, software may use size thresholds to decide when to offload data transfers to SDXI devices, depending on implementation quality.
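As a rough illustration of such a size-threshold policy, the C sketch below dispatches small copies to memcpy and larger ones to a hypothetical sdxi_submit_copy() routine. That routine and the 64 KB threshold are assumptions made for the example, not part of the SDXI specification; real software would use whatever submission interface its library or driver exposes and would tune the threshold per platform.

```c
/*
 * Minimal sketch of a size-threshold offload policy. sdxi_submit_copy() is a
 * hypothetical stand-in for an implementation-specific submission interface,
 * not an API defined by the SDXI specification.
 */
#include <stddef.h>
#include <string.h>

/* Stub: a real version would post a descriptor to the device and wait for
 * (or poll on) completion; here it just falls back to the CPU so the
 * sketch is self-contained. */
static int sdxi_submit_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Tunable crossover point; real software would measure this per platform. */
#define OFFLOAD_THRESHOLD (64 * 1024)

static void smart_copy(void *dst, const void *src, size_t len)
{
    if (len < OFFLOAD_THRESHOLD) {
        /* Small transfers: CPU caches and memcpy usually win. */
        memcpy(dst, src, len);
        return;
    }
    /* Large transfers: try the offload path, fall back to the CPU. */
    if (sdxi_submit_copy(dst, src, len) != 0)
        memcpy(dst, src, len);
}

int main(void)
{
    static char a[128 * 1024], b[128 * 1024];
    memset(a, 1, sizeof(a));
    smart_copy(b, a, sizeof(b));   /* large: takes the offload path */
    smart_copy(b, a, 512);         /* small: stays on the CPU */
    return 0;
}
```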
Q. Will there be a standard SDXI driver available from SNIA, or is each company responsible for building a driver compatible with the SDXI-compatible hardware it builds?
A. The SDXI TWG is not developing a common open-source driver because of license considerations in SNIA. The SDXI TWG is beginning to work on a common user-space open-source library for applications.
The SDXI spec enables the development of a common class-level driver by reserving a class code with PCI-SIG for PCIe-based implementations. Driver implementations are being enabled and influenced by discussions in the SDXI TWG and other forums.
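To illustrate what a reserved class code buys, the hedged sketch below shows one way a user-space tool on Linux could enumerate candidate devices by reading each PCI device's class attribute in sysfs. The class code used here is a placeholder invented for the example, not the value actually reserved for SDXI; a production class driver would of course match the class code in the kernel's PCI core rather than scan sysfs.

```c
/*
 * Sketch: discover PCI devices by class code on Linux by reading each
 * device's "class" attribute under /sys/bus/pci/devices. The class value
 * below is a PLACEHOLDER, not the real SDXI assignment.
 */
#include <glob.h>
#include <stdio.h>

#define PLACEHOLDER_CLASS 0x120000u  /* hypothetical class/subclass/prog-if */

int main(void)
{
    glob_t g;
    if (glob("/sys/bus/pci/devices/*/class", 0, NULL, &g) != 0)
        return 1;

    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        if (!f)
            continue;
        unsigned int class_code = 0;
        if (fscanf(f, "%x", &class_code) == 1 &&
            class_code == PLACEHOLDER_CLASS)
            printf("candidate device: %s\n", g.gl_pathv[i]);
        fclose(f);
    }
    globfree(&g);
    return 0;
}
```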
Q. Software development is throttled by
the availability of standard CXL host platforms. When will those be available and
for what versions?
A. We cannot comment on specific product/platform availability and would advise connecting with the vendors directly. CXL 1.1-based host platforms are available in the market and have been publicly announced.
Q. Does a PCIe-based data mover with an SDXI interface actually DMA data across the PCIe link? If so, isn't this higher latency and less power efficient than a memcpy operation?
A. There is quite a bit of prior-art research within academia and industry indicating that, beyond certain data transfer size thresholds, an offloaded data movement device like an SDXI device can be more performant than employing a CPU thread. While software can employ more CPU threads to do the same operation via memcpy, that comes at a cost. By offloading these transfers to SDXI devices, expensive CPU threads can be used for other computational tasks, helping improve overall TCO. Certainly, this will depend on implementation quality, but SDXI is enabling such innovations with a standardized framework.
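As a back-of-the-envelope way to see why such a crossover exists, the self-contained sketch below times plain memcpy() at a few transfer sizes. Numbers like these, weighed against the CPU cycles the copy consumes, are what an implementation would compare with its offload path; the offload side itself is device- and driver-specific and is not modeled here.

```c
/*
 * Measure plain memcpy() bandwidth at several transfer sizes. Results like
 * these help pick the threshold beyond which handing the copy to an offload
 * engine frees the CPU thread for other work.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    const size_t sizes[] = { 4096, 65536, 1 << 20, 16 << 20 };
    const int iters = 100;

    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++) {
        size_t len = sizes[s];
        char *src = malloc(len), *dst = malloc(len);
        if (!src || !dst)
            return 1;
        memset(src, 0xA5, len);

        double t0 = now_sec();
        for (int i = 0; i < iters; i++)
            memcpy(dst, src, len);
        double t1 = now_sec();

        double gbps = (double)len * iters / (t1 - t0) / 1e9;
        printf("memcpy %8zu bytes: %.2f GB/s\n", len, gbps);
        free(src);
        free(dst);
    }
    return 0;
}
```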
Q. Will SDXI impact/change/unify NVMe?
A. SDXI is expected to complement the data movement and acceleration needs of systems comprising NVMe devices, as well as needs within an NVMe subsystem, to improve storage performance. In fact, SNIA has created a subgroup, the "CS+SDXI" subgroup, comprised of members of SNIA's Computational Storage TWG and SDXI TWG, to think about these kinds of use cases. Many computational storage use cases can be enhanced with a combination of NVMe and SDXI-enabled technologies.


