Last month, the SNIA Networking Storage Forum hosted
several experts leading the Open Programmable Infrastructure (OPI) project in
a live webcast, “An Introduction to the OPI (Open Programmable Infrastructure) Project.” The
project was created to address a new class of cloud and datacenter
infrastructure component. This new infrastructure element, often referred to as a
Data Processing Unit (DPU), Infrastructure Processing Unit (IPU) or, as a
general term, xPU, takes the form of a server-hosted PCIe add-in card or on-board
chip(s) containing one or more ASICs or FPGAs, usually anchored around a
single powerful SoC device.
Our OPI
experts provided an introduction to the OPI Project
and then explained lifecycle provisioning, the API, use cases, the proof of concept and
the developer platform. If you missed the live presentation, you can watch it on
demand and download a PDF of the slides at the SNIA
Educational Library. The attendees at the live session asked several
interesting questions. Here are answers to them from our presenters.
Q. Are there any plans for OPI to use GraphQL for API
definitions since GraphQL has a good development environment, better security,
and a well-defined, typed, schema approach?
A. GraphQL is a good choice for frontend/backend services
with many benefits as stated in the question. These benefits are particularly
compelling for data fetching. For communication between different
microservices in OPI, however, we still see gRPC as a better choice. gRPC has a strong ecosystem
in cloud and Kubernetes (K8s) systems, with fast execution, strong typing, and polyglot
endpoints. We see gRPC as the best choice for most OPI APIs due to the strong containerized
approach and the ease of building schemas with Protocol Buffers. We do keep
alternatives like GraphQL in mind for specific cases.
Q. Will OPI add APIs for less common use cases like
hypervisor offload, application verification, video streaming, storage
virtualization, time synchronization, etc.?
A. OPI will continue to add APIs for various use cases
including less common ones. The initial focus of the APIs is on
the major areas of networking, storage and security; they will then expand to address
other cases. The API discussions today are already expanding to consider
virtualization (containers, virtual machines, etc.) as a key area to
address.
Q. Do you communicate with the CXL
Consortium too?
A. We have not communicated with the Compute Express
Link (CXL) Consortium formally, but there have been a few conversations with
parties interested in CXL. We will need to engage in discussions with the CXL Consortium
like we have with SNIA, DASH, and others.
Q. Can you elaborate on the purpose of APIs for AI/ML?
A. DPU solutions contain accelerators
and capabilities that can be leveraged by AI/ML-type solutions, and we will
need to consider what APIs need to be exposed to take advantage of these
capabilities. OPI believes there is a set of data movement and co-processor
APIs to support DPU incorporation into AI/ML solutions. In keeping with its
core mission, OPI is not going to attempt to redefine the existing core AI/ML
APIs. We may look at how to incorporate those into DPUs directly as well.
Q. Have you considered creating a TEE (Trusted Execution
Environment) oriented API?
A. This is something that has been considered and is a
possibility in the future. There are some different sides to this:
1) OPI itself using TEE on the
DPU. This may be interesting, although we’d need a compelling use case.
2) Enabling OPI users to utilize
the TEE via a vendor-neutral interface. This will likely be interesting, but
potentially challenging for DPUs as OPI is considering them. We are
currently focused on enabling applications running in containers on DPUs, and
securing containers via TEE is currently a research area in the industry. For example,
there is this project at the “sandbox” maturity level: https://www.cncf.io/projects/confidential-containers/
Q. Will OPI support integration with the OCP Caliptra project to ensure silicon-level hardware authentication during boot? Reference: https://siliconangle.com/2022/10/18/open-compute-project-announces-caliptra-new-standard-hardware-root-trust/
A. OPI hasn’t looked at Caliptra yet. As Caliptra
matures, OPI will follow the wider industry ecosystem’s direction in this area. We currently follow DMTF SPDM (https://www.dmtf.org/standards/spdm)
for attestation, plus IEEE
802.1AR – Secure Device Identity and RFC 8572 (https://www.rfc-editor.org/rfc/pdfrfc/rfc8572.txt.pdf)
for secure zero touch device
provisioning and onboarding.
Q. When testing NVIDIA DPUs on some server models, the
temperature of the DPU was often high because of a lack of server cooling,
resulting in the DPU shutting itself down. First question: is there an open API
to read sensors from the DPU card itself? Second question: what happens when the DPU
shuts down, then cools, and comes back to life again? Will the server be
notified as per standards, and will the DPU be usable again?
A. Qualified DPU servers from major manufacturers integrate closed-loop thermal management to make sure that cooling is appropriate and that temperature readout is implemented. If a DPU is used in a non-supported server, you may see the challenges that you experienced, with overheating and high temperatures causing DPU shutdowns. Since the server is still in charge of the chassis, PDUs, fans and other components, it is the BMC’s responsibility to take care of overall server cooling and temperature readouts. There are several different ways to measure temperature, such as SMBus, PLDM and others, that are already widely used with standard NICs, GPUs and other devices. OPI is looking into which specification is best to adopt for handling temperature readout, DPU reboot, and overall thermal management. OPI is not looking to define any new standards in this area.
If you are interested in learning more about DPUs/xPUs, SNIA has covered this topic extensively in the last year or so. You can find all the recent presentations at the SNIAVideo YouTube Channel.
