# COMPUTE + MEMORY + STORAGE SUMMIT

Architectures, Solutions, and Community VIRTUAL EVENT, APRIL 11-12, 2023

Universal Chiplet Interconnect Express<sup>TM</sup> (UCIe<sup>TM</sup>): On-Package Innovation Slot for Compute, Memory, and Storage Applications

Presented by

Dr. Debendra Das Sharma, Intel Senior Fellow and Chair of UCIe Consortium



# Agenda

- Interconnects in Compute Landscape
- UCIe: An Open Standard for Chiplets for innovations in Compute, Memory, and Storage Applications
- Future Directions and Conclusions

### Explosion of data enabling data-centric revolution



Drivers: Cloud, 5G, sensors, automotive, IoT, etc. Large data sets with aggressive time-to-insight goals! Scaling challenges: latency, bandwidth, and capacity are all important!

Move faster, Store more, Process everything seamlessly, efficiently, and securely


# Compute Landscape and Interconnects

| Category | Type and Scale | Data Rate/<br>Characteristics | PHY Latency<br>(Tx + Rx) |
|----------|----------------|-------------------------------|--------------------------|
| Latency Tolerant<br>(Narrow, very high speed) | Networking / Fabric for Data Center Scale | 56/112 GT/s -> 224 GT/s (PAM4)<br>4-8 Lanes, cables/backplane | 20+ ns (+ >100 FEC) |
| Latency Sensitive<br>(Wide, high speed) | Load-Store I/O<br>Arch. Ordering (PCIe / CXL / SMP cache coherency – PCIe PHY)<br>Node (-> Rack) | 32 GT/s (NRZ) -> PCIe Gen6 64 GT/s (PAM4)<br>Hundreds of Lanes<br>Power, Cost, Si-Area, Backwards Compatible, Latency<br>On-board -> cables/backplanes | <10 ns (Tx + Rx: PHY-PIPE)<br>0-1 ns FEC overhead |
| Latency Sensitive<br>(Super-wide, high speed) | Load-Store and proprietary | 4G – 32G (single-ended, NRZ)<br>2D, 2.5D (-> 3D)<br>Thousands of Lanes<br>Ultra low power, ultra low latency<br>High b/w density | <2 ns (PHY – Transaction Layer) |

(Figure: interconnect hierarchy, from inter-DC links, routers, the core/edge network, and spine and leaf switches in the data center interconnect (UPI, CXL 2/3 across racks), to the processor interconnect (UIO: UPI, PCIe, CXL; UCIe) and the SoC interconnect (PIPE, LPIF, CPI, UFI))

Load-Store I/O: From Package/ Node to Rack / Pod

On-Package

# Off-Package Load-Store Interconnects: PCIe and CXL

- With PCIe: (900+ member companies)
  - Memory Connected to CPU Cacheable
  - Memory Connected to PCIe device is Uncacheable
  - Different Ordering rules across I/O vs coherency domains
  - Ubiquitous I/O for compute continuum
- With CXL: (~200 member companies)
  - Caching and memory protocols on top of PCIe
  - Device can cache memory
  - Memory attached to device is cacheable
  - Leverages PCIe infrastructure
- PCIe and CXL are very successful industry standards:
  - Multi-generational, backward compatible, IP/ tools
  - Compliance program with plug-and-play

On-Package Interconnects should leverage PCIe/CXL infrastructure for standardization and Load-Store usages. Need to seamlessly move functionality from node to package to die level.



#### Design Choice: Seamless Integration from Node → Package → On-die

Enables Reuse, Better User Experience, Economies of Scale, TTM advantage






UCIe: An Open Standard for Chiplets for innovations in Compute, Memory, and Storage Applications: Open Ecosystem, Best Power/ Performance/ Cost metrics, Ubiquitous, Continuous innovation with backward compatibility

### Moore Predicted "Day of Reckoning"

"It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected<sup>1</sup>."

Gordon E. Moore

1: "Cramming more components onto integrated circuits", Electronics, Volume 38, Number 8, April 19, 1965



#### Motivation for UCIe

- Enables SoC construction that exceeds maximum reticle size
  - Package becomes new System-on-a-Chip (SoC) with same dies (Scale Up)
- Reduces time-to-solution (e.g., enables die reuse)
- Lowers portfolio cost (product & project)
  - Enables optimal process technologies
  - Smaller (better yield)
  - Reduces IP porting costs
  - Lowers product SKU cost
- Enables a customizable, standard-based product for specific use cases (bespoke solutions)
- Scales innovation (manufacturing/ process locked IPs)



Align the industry around an open platform to enable chiplet based solutions

# UCIe: Key Metrics and Adoption Criteria

#### **Key Performance Indicators**

- Bandwidth density (linear & area)
  - Data Rate & Bump Pitch
- Energy Efficiency (pJ/b)
  - Scalable energy consumption
  - Low idle power (entry/exit time)
- Latency (end-to-end: Tx+Rx)
- Channel Reach
  - Technology, frequency, & BER
- Reliability & Availability
- Cost: Standard vs advanced packaging

#### **Factors Affecting Wide Adoption**

- Interoperability
  - Full-stack, plug-and-play with existing s/w
  - Different usages/segments ubiquity
- Technology
  - Across process nodes & packaging options
  - Power delivery & cooling
  - Repair strategy (yield improvement)
  - Debug controllability & observability
- Broad industry support / Open ecosystem
  - Learnings from other standards efforts

UCIe is architected and specified from the ground-up to deliver the best KPIs while meeting wide adoption criteria

### **Jumpstarting UCIe**

- Focus of UCIe 1.0 Specification
  - Physical Layer: Die-to-Die I/O with industry-leading KPIs
  - Protocol: CXL<sup>™</sup>/PCIe® for near term volume attach
    - SoC construction issues are addressed since CXL/PCIe is a board-to-board interface
    - CXL/PCIe addresses common use cases
      - I/O attach with PCIe/CXL.io
      - Memory use cases: CXL.mem
      - Accelerator use cases: CXL.cache
  - Well defined specification: ensure interoperability and future evolution



PROTOCOL LAYER

**DIE-TO-DIE ADAPTER**

- ARB/MUX (when applicable)
- CRC/RETRY (when applicable)
- LINK STATE MANAGEMENT
- PARAMETER NEGOTIATION

**PHYSICAL LAYER**

- LINK TRAINING
- LANE REPAIR (when applicable)
- LANE REVERSAL (when applicable)
- SCRAMBLING/DE-SCRAMBLING (opt-in)
- SIDEBAND TRAINING & TRANSFERS
- ANALOG FRONT END
- CLOCK FORWARD

### **UCIe 1.0: Supports Standard and Advanced Packages**



(Standard Package)

Standard Package: 2D – cost effective, longer distance

Advanced Package: 2.5D – power-efficient, high bandwidth density

Dies can be manufactured anywhere and assembled anywhere – can mix 2D and 2.5D in same package – Flexibility for SoC designer



(Multiple Advanced Package Options)

One UCIe 1.0 Spec covers both types of packaging options

# UCIe Usage Model: SoC at Package Level

- SoC as a Package level construct
  - Standard and/ or Advanced package
  - Homogeneous and/or heterogeneous chiplets
  - Mix and match chiplets from multiple suppliers
- Across segments: Hand-held, Client, Server, Workstation, Comms, HPC, etc.
  - Similar to PCIe/ CXL at board level





### UCIe Usage: Off-package connectivity w/ Retimers



(Use Case: Load-Store I/O (CXL) as the fabric across the Pod providing low-latency and high bandwidth resource pooling/ sharing as well as message passing)



Provision to extend off-package with UCIe Retimers connecting to other media (e.g., optics)

(Optical connections: Intra-Rack and Pod)



(Pooled/ Shared Memory) (Pooled Accelerator)

(Switch dies connected through UCIe PHY + Adapter Running a proprietary switch internal protocol)



### UCIe 1.0: Characteristics and Key Metrics

| CHARACTERISTICS      | STANDARD<br>PACKAGE  | ADVANCED<br>PACKAGE  | COMMENTS                                                                       |
|----------------------|----------------------|----------------------|--------------------------------------------------------------------------------|
| Data Rate (GT/s)     | 4, 8, 12, 16, 24, 32 | 4, 8, 12, 16, 24, 32 | Lower speeds must be supported for interop (e.g., 4, 8, 12 for a 12G device)   |
| Width (each cluster) | 16                   | 64                   | Width degradation in Standard, spare lanes in Advanced                         |
| Bump Pitch (µm)      | 100 – 130            | 25 – 55              | Interoperate across bump pitches in each package type across nodes             |
| Channel Reach (mm)   | <= 25                | <= 2                 |                                                                                |
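As a worked example of the characteristics table, the sketch below picks the highest data rate two dies have in common and converts it to raw bandwidth per cluster. It is an illustration under stated assumptions, not the negotiation procedure from the specification: the function names are hypothetical, each lane is assumed to carry one bit per transfer, and the result is raw bandwidth in one direction with no protocol overhead.

```python
# Data rates (GT/s) and per-cluster widths (lanes) taken from the table above.
UCIE_1_0_RATES_GTS = (4, 8, 12, 16, 24, 32)
CLUSTER_WIDTH = {"standard": 16, "advanced": 64}


def supported_rates(max_rate_gts: int) -> tuple:
    """All rates a die supports, given that lower speeds must be supported."""
    if max_rate_gts not in UCIE_1_0_RATES_GTS:
        raise ValueError(f"{max_rate_gts} GT/s is not a UCIe 1.0 data rate")
    return tuple(r for r in UCIE_1_0_RATES_GTS if r <= max_rate_gts)


def common_rate(max_a_gts: int, max_b_gts: int) -> int:
    """Highest rate supported by both dies (hypothetical helper)."""
    common = set(supported_rates(max_a_gts)) & set(supported_rates(max_b_gts))
    return max(common)


def raw_bandwidth_gbytes_per_s(rate_gts: int, package: str) -> float:
    """Raw per-direction bandwidth of one cluster, assuming 1 bit per transfer per lane."""
    return rate_gts * CLUSTER_WIDTH[package] / 8  # Gb/s -> GB/s


if __name__ == "__main__":
    rate = common_rate(12, 32)  # a 12G device paired with a 32G device
    print(supported_rates(12), rate)
    print(raw_bandwidth_gbytes_per_s(32, "standard"))  # 16 lanes at 32 GT/s
    print(raw_bandwidth_gbytes_per_s(32, "advanced"))  # 64 lanes at 32 GT/s
```

Under these assumptions a 12G device exposes 4, 8, and 12 GT/s, so a 12G/32G pairing runs at 12 GT/s, and a full-speed cluster moves 64 GB/s (Standard) or 256 GB/s (Advanced) in each direction.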

| KPIs / TARGET FOR KEY METRICS  | STANDARD<br>PACKAGE            | ADVANCED<br>PACKAGE            | COMMENTS                                                                                             |
|--------------------------------|--------------------------------|--------------------------------|------------------------------------------------------------------------------------------------------|
| B/W Shoreline (GB/s/mm)        | 28 – 224                       | 165 – 1317                     | Conservatively estimated: AP: 45 µm; Standard: 110 µm; proportionate to data rate (4G – 32G)         |
| B/W Density (GB/s/mm²)         | 22 – 125                       | 188 – 1350                     | Conservatively estimated: AP: 45 µm; Standard: 110 µm; proportionate to data rate (4G – 32G)         |
| Power Efficiency target (pJ/b) | 0.5                            | 0.25                           |                                                                                                      |
| Low-power entry/exit latency   | 0.5 ns <= 16G, 0.5-1 ns >= 24G | 0.5 ns <= 16G, 0.5-1 ns >= 24G | Power savings estimated at >= 85%                                                                    |
| Latency (Tx + Rx)              | < 2 ns                         | < 2 ns                         | Includes D2D Adapter and PHY (FDI to bump and back)                                                  |
| Reliability (FIT)              | 0 < FIT (Failure In Time) << 1 | 0 < FIT (Failure In Time) << 1 | FIT: #failures in a billion hours (expecting ~1E-10) w/ UCIe Flit Mode                               |
UCIe 1.0 delivers the best KPIs while meeting the projected needs for the next 5-6 years across the compute continuum.

#### **Future Directions and Conclusions**

- The UCIe Consortium is incorporated; board elections in June 2022 added two board members
- UCIe is an open industry standard that establishes an open chiplet ecosystem and ubiquitous interconnect at the package level.
  - Tremendous support across the industry with several companies announcing IP/VIP availability
  - Poised to be the interconnect of SoCs the same way PCIe and CXL are at the board level
  - UCIe 1.0 Specification is available to the public <a href="https://www.uciexpress.org/specification">https://www.uciexpress.org/specification</a>
- UCIe Consortium welcomes interested companies and institutions to join the organization at the Contributor or Adopter level.
- 5 Technical Working Groups (Electrical, Protocol, Form Factor/Compliance, Manageability / Security, Systems and Software) and Marketing Working Group driving the technology forward
  - Plenty of innovations happening in the consortium
- Join us if you have not done so! Learn more by visiting www.UCIexpress.org




