# 448Gb/s native O/E modulation format for AI compute networks: PAM4 vs. PAM6



Maxim Kuschnerov, Balazs Matuz, Tom Wettlin, Nebojsa Stojanovic, Stefano Calabro


OIF 448Gbps Signaling for AI Workshop - April 15-16, 2025



### Industry on 448G O/E modulation (PAM4/6/8)



#### **Options and Challenges**



| Host / Switch   | Line / Optics   | Electrical reach | Optical reach | Power efficiency      | Manufacturability |
|-----------------|-----------------|------------------|---------------|-----------------------|-------------------|
| 400G PAM4 (x8)  | 400G PAM4 (x8)  | Poor             | Good          | Good                  | Good              |
| 200G PAM4 (x16) | 400G PAM4 (x8)  | Good             | Good          | Challenging (gearbox) | Good              |
| 400G PAM6 (x8)  | 400G PAM4 (x8)  | Possible         | Good          | Challenging (gearbox) | Good              |
| 400G PAM6 (x8)  | 400G PAM6 (x8)  | Possible         | Poor          | Unknown               | Good              |
| 200G PAM4 (x16) | 200G PAM6 (x16) | Good             | Good          | Good                  | Unknown           |

3.2T brings challenges with optical reach, electrical reach, and power

*[Figure: collage of slides from other workshop contributors. Attributions shown: Ashika Shaji and Nathan Tracy (TE); Jeff Hutchins (Ranovus); Lumentum; Ken Lusted (Synopsis); Cathy Liu (Broadcom); Halil Cirit (Meta). Slide titles shown: "Modulation Choices and Trade-offs (448 Gbps)", "Looking at PAM6", "Next generation of VSR interface", "400G Electrical In-Rack Simulations", "448Gbps - Will PAM4 Work?". Document number shown: oif2025.043.00.]*

Key points shown in the collage:

- Currently, no industry direction on modulation scheme
- Channels and analog front-ends will not improve in proportion to the data rate; higher bandwidth efficiency is going to be needed
- 224 Gbd (112 GHz): backwards compatible, aligned with optics; PAM4 is limited by bandwidth and crosstalk
- 179.2 Gbd (89.6 GHz): offers slight bandwidth relief, has an SNR penalty
- 149.3 Gbd (74.6 GHz): high SNR penalty, more relief on bandwidth
- PAM6 to PAM8: good compromise between IL, bandwidth and SNR
- OFDM: significant SNR loss due to high PAPR
- Higher order modulation schemes may require highly shielded designs
- Early loss estimates at 104.25 GHz, not considering resonances: bulk cable 12 dB/m, PCB 2 dB/in, connector ~4 dB, total loss for a typical channel (TP0-TP5) >42 dB
- High insertion loss and roll-off: PAM-4 presents a significant challenge
- A pluggable C2M/VSR interface is still desirable, but the front panel interconnect becomes very challenging

#### Bandwidth limitations on the electrical channel side are dominating the modulation discussion
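The bandwidth argument can be made concrete with a small calculation. The sketch below assumes an uncoded 448 Gb/s lane, 2.5 bits per symbol for PAM6 (5 bits mapped onto a pair of 6-level symbols) and a Nyquist frequency of half the symbol rate; the function names are illustrative only.

```python
# Symbol rate and Nyquist frequency for a 448 Gb/s lane.
# Assumption: PAM6 carries 2.5 bits/symbol via a 2D mapping of 5 bits
# onto two 6-level symbols (32 of the 36 possible points).
BITS_PER_SYMBOL = {"PAM4": 2.0, "PAM6": 2.5, "PAM8": 3.0}


def symbol_rate_gbd(bit_rate_gbps: float, bits_per_symbol: float) -> float:
    """Symbol rate in Gbd for a given bit rate and bits per symbol."""
    return bit_rate_gbps / bits_per_symbol


def nyquist_ghz(symbol_rate: float) -> float:
    """Nyquist frequency in GHz, taken as half the symbol rate."""
    return symbol_rate / 2.0


if __name__ == "__main__":
    for name, bits in BITS_PER_SYMBOL.items():
        rate = symbol_rate_gbd(448.0, bits)
        print(f"{name}: {rate:.1f} Gbd, Nyquist {nyquist_ghz(rate):.1f} GHz")
```

Each step up in modulation order buys less bandwidth relief than the previous one, while the SNR requirement keeps rising.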




## **Optical PAM4 demonstrations at OFC 2025**

- OFC 2025 has seen several demonstrations of 400G/lane optical feasibility
- TFLN achieves a higher bandwidth overall, with the highest EML baud rate shown by Lumentum
- SiP has been limited to 160-175Gbaud demonstrations
- First products will include gear-boxed solutions with 224G SerDes

*[Figure: OFC 2025 demonstration results, panels labeled TFLN and EML]*

# Native signaling for various architectures

#### Native modulation needed

- First 448G/lane optical modules will be based on gear-boxed 224G SerDes
- However, a native modulation scheme supporting both electrical and optical channels is the ideal choice for future Ethernet

### Support all architectures

- E2E low-latency FEC architecture support is needed for AECs, retimed pluggables, LPO, LRO, NPO and CPO transceivers

#### Inner FEC not primary use case

- Better inner FEC in the pluggable module is an extended use case, but should not guide the modulation format choice

#### PAM6 better for electrical channels

- 448Gb/s PAM6 performs better over current electrical channel models
- Can PAM6 also be a competitive format for optics or is PAM4 the best native modulation?






## **DSP power consumption PAM4 vs. PAM6**

#### DAC and ADC: Advantage for PAM6

- Benefit from the 20% lower symbol rate of PAM6
- No increase in resolution for PAM6 with respect to PAM4 needed

#### FFE: Slight advantage for PAM4

- Time-domain implementation assumed to reduce latency in the SerDes
- PAM6 benefits from its 20% lower symbol rate in terms of both operations per second and number of required taps
- PAM6 suffers from its larger number of levels compared with PAM4

#### **MLSE:** Advantage for PAM4

- MLSE can be simplified through state reduction
- PAM6 benefits from its 20% lower symbol rate
- PAM6 suffers from its larger number of levels and from the 2D nature of its constellation

#### **Overall: Slight advantage for PAM6**

Our preliminary estimate results in a slight power advantage for PAM6

### For the same power consumption, one can e.g. assume a slightly higher overhead FEC for PAM6

#### PAM4 vs. PAM6 DSP power

*[Figure: DSP power comparison, PAM4 vs. PAM6]*

#### Assumptions

- Symbol rate PAM6 = 80% symbol rate of PAM4
- State-reduced maximum likelihood sequence detection (MLSD) and time-domain feed forward equalizer (FFE)
- FFE complexity for PAM6 is assumed to be 50% higher than for PAM4 (excluding symbol rate impact). This accounts for larger constellation (more complexity) and less stringent bandwidth limitations (less complexity)
- MLSE complexity for PAM6 is assumed to be 100% higher than for PAM4 (excluding symbol rate impact)
- Timing recovery (TR) with similar assumptions, although easier to implement for PAM6 if there is less bandwidth limitation
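The assumptions above can be turned into a rough relative-power estimate. The following is a minimal sketch, not the authors' model: it assumes power scales linearly with operations per second, it treats timing recovery as scaling with symbol rate only, and the PAM4 power shares per block are hypothetical placeholders chosen for illustration.

```python
# Relative PAM6 DSP power, normalized to PAM4 = 1.0.
SYMBOL_RATE_RATIO = 0.8  # from the slide: PAM6 symbol rate = 80% of PAM4

# Complexity of each block for PAM6 relative to PAM4, excluding symbol rate.
# FFE (+50%) and MLSE (+100%) are from the slide; DAC/ADC and TR at 1.0
# are this sketch's reading of "no increase in resolution" and "similar".
COMPLEXITY_FACTOR = {"dac_adc": 1.0, "ffe": 1.5, "mlse": 2.0, "tr": 1.0}

# HYPOTHETICAL share of total PAM4 DSP power per block (not from the source).
PAM4_POWER_SHARE = {"dac_adc": 0.6, "ffe": 0.2, "mlse": 0.1, "tr": 0.1}


def block_scaling(block: str) -> float:
    """PAM6/PAM4 power ratio of one block, including the symbol rate effect."""
    return COMPLEXITY_FACTOR[block] * SYMBOL_RATE_RATIO


def pam6_relative_power(shares: dict) -> float:
    """Total PAM6 DSP power relative to PAM4 for the given PAM4 power shares."""
    total = sum(shares.values())
    return sum(shares[b] * block_scaling(b) for b in shares) / total


if __name__ == "__main__":
    for block in PAM4_POWER_SHARE:
        print(f"{block}: x{block_scaling(block):.2f}")
    print(f"PAM6 total vs. PAM4: {pam6_relative_power(PAM4_POWER_SHARE):.2f}")
```

The outcome depends strongly on the assumed shares: the more the converters dominate, the larger the PAM6 advantage, while an MLSE-heavy design shifts the balance toward PAM4.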



## **FEC** assumption

#### KP4 for PAM4

- 200G PAM4 legacy mode will require KP4 by definition
- KP4 is the best initial assumption for 448G PAM4 in the host

#### **Better FEC for PAM6**

- We assume a higher overhead FEC for PAM6 to achieve a fairer comparison to the higher baud rate / power PAM4
- HD-FEC to support all retimed architectures

#### Lower overall risk

- The technological risk of a 180 Gbaud PAM6 SerDes with 12% FEC is still lower than that of a 212 Gbaud PAM4 SerDes with KP4
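The two baud rates quoted here can be reproduced with a short calculation. This sketch assumes a 400 Gb/s payload per lane, 256b/257b transcoding as used with KP4 in existing Ethernet PHYs, RS(576,514) as the "12%" code, and 2.5 bits per symbol for PAM6; the deck does not state these details, so they are working assumptions.

```python
# Line symbol rate for a FEC-protected lane.
TRANSCODE = 257 / 256  # assumed 256b/257b transcoding ahead of the RS encoder


def line_symbol_rate_gbd(payload_gbps: float, n: int, k: int,
                         bits_per_symbol: float,
                         transcode: float = TRANSCODE) -> float:
    """Symbol rate in Gbd after transcoding and RS(n, k) encoding."""
    line_rate_gbps = payload_gbps * transcode * n / k
    return line_rate_gbps / bits_per_symbol


if __name__ == "__main__":
    pam4 = line_symbol_rate_gbd(400.0, 544, 514, bits_per_symbol=2.0)
    pam6 = line_symbol_rate_gbd(400.0, 576, 514, bits_per_symbol=2.5)
    print(f"PAM4 + KP4 RS(544,514): {pam4:.1f} Gbd")
    print(f"PAM6 + RS(576,514):     {pam6:.1f} Gbd")
```

Under these assumptions the higher-overhead code raises the PAM6 line rate from 425 Gb/s to 450 Gb/s, yet the symbol rate stays well below that of PAM4.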

### **Reduces SNR gap**

- The SNR gap can be reduced from ~3 dB to ~1.5 dB, which is relevant for optical channels to limit laser output power

### Error floor margin

- A better FEC for PAM6 is also needed to improve the error floor margin

#### Optical Rx sensitivity PAM4 vs. PAM6





# **Chromatic dispersion & wavelength plan**

#### **MLSE for higher CD**

- MLSE is already part of 224G AUI and will be part of the 448G SerDes DSP to increase the CD tolerance

#### PAM4 vs. PAM6

- No substantial advantage for PAM6 that would enable new applications

### 1.6T FR4

- Accommodating transmitters with different chirp, a 1.6T FR4 interface with PAM4/PAM6 looks feasible
- Uncooled FR4-2km with 10nm spacing possible
- Chirp managed FR4-2km with 20nm spacing possible

#### 3.2T FR8

- On paper possible on a LAN-WDM grid
- LAN-WDM would require a tighter laser accuracy of ±0.5 nm compared to today's cooled lasers with ±1 nm, which would further increase costs



### MPI

#### **Networking interruption**

- Networking failures in GPU training clusters have a significant effect on cluster performance and amount of GPU sparing
- ~80% of all optical transceiver failures come from link contamination (link flaps)

#### More stringent MPI spec

- New data centers with initially more dust in the air
- Legacy Ethernet MPI spec is **-35 dB**, but should be increased for future scenarios
- Linear drive (LPO/LRO/CPO) use cases will lead to more reflections in the analog signal path
- **PAM4** has a higher inherent MPI tolerance of up to **-25 dB** when receiver-side compensation techniques are employed

#### **Improving PAM6**

- Better PAM6 performance would require additional signal overhead (e.g. 1.5-2%) and more effort with standardization of the equalization scheme

| Component                      | Category              | Interruption Count | % of Interruptions |
|--------------------------------|-----------------------|--------------------|--------------------|
| Faulty GPU                     | GPU                   | 148                | 30.1%              |
| GPU HBM3 Memory                | GPU                   | 72                 | 17.2%              |
| Software Bug                   | Dependency            | 54                 | 12.9%              |
| Network Switch/Cable           | Network               | 35                 | 8.4%               |
| Host Maintenance               | Unplanned Maintenance | 32                 | 7.6%               |
| GPU SRAM Memory                | GPU                   | 19                 | 4.5%               |
| GPU System Processor           | GPU                   | 17                 | 4.1%               |
| NIC                            | Host                  | 7                  | 1.7%               |
| NCCL Watchdog Timeouts         | Unknown               | 7                  | 1.7%               |
| Silent Data Corruption         | GPU                   | 6                  | 1.4%               |
| GPU Thermal Interface + Sensor | GPU                   | 6                  | 1.4%               |
| SSD                            | Host                  | 3                  | 0.7%               |
| Power Supply                   | Host                  | 3                  | 0.7%               |
| Server Chassis                 | Host                  | 2                  | 0.5%               |
| IO Expansion Board             | Host                  | 2                  | 0.5%               |
| Dependency                     | Dependency            | 2                  | 0.5%               |
| CPU                            | Host                  | 2                  | 0.5%               |
| System Memory                  | Host                  | 2                  | 0.5%               |



#### [Meta] https://arxiv.org/abs/2407.21783

**Estimated time to first job failure (minutes)**

| Number of GPUs | MTTF per link: 3 years | MTTF per link: 4 years | MTTF per link: 5 years | MTTF per link: 10 years |
|----------------|------------------------|------------------------|------------------------|-------------------------|
| 10,000         | 157.7                  | 210.2                  | 262.8                  | 525.6                   |
| 20,000         | 78.8                   | 105.1                  | 131.4                  | 262.8                   |
| 50,000         | 31.5                   | 42.0                   | 52.6                   | 105.1                   |
| 100,000        | 15.8                   | 21.0                   | 26.3                   | 52.6                    |

[SemiAnalysis]
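The entries in the time-to-first-failure table are consistent with dividing the per-link MTTF by the number of GPUs. A minimal sketch of that relation follows, assuming independent links with a constant failure rate and one link per GPU; both are inferences from the numbers, not stated in the source.

```python
# Expected time to the first link failure in a cluster.
# Assumptions: links fail independently at a constant rate (exponential
# lifetimes) and there is one link per GPU.
MINUTES_PER_YEAR = 365 * 24 * 60


def minutes_to_first_failure(mttf_years: float, num_links: int) -> float:
    """Mean time until any one of num_links links fails, in minutes."""
    return mttf_years * MINUTES_PER_YEAR / num_links


if __name__ == "__main__":
    for gpus in (10_000, 20_000, 50_000, 100_000):
        row = [minutes_to_first_failure(y, gpus) for y in (3, 4, 5, 10)]
        print(gpus, " ".join(f"{m:7.1f}" for m in row))
```

Because the result scales inversely with cluster size, a tenfold larger cluster needs a tenfold better per-link MTTF to keep the same interval between interruptions.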






# Improving FEC latency & power

#### E2E FEC

Soft decoding is not an option for the host FEC due to retimed interfaces

#### PAM4 FEC

- Better FEC performance could improve electrical channel performance
- Increased OH for PAM4 is generally not desired in the host
- KP4 FEC will be part of the SerDes for the 200G interop mode and should ideally be reused
- MLC with different Reed-Solomon FECs can achieve the same performance as KP4 at lower overhead and power consumption (5.8% $\rightarrow$ 4.1%)

#### PAM6 FEC

- MLC can also deliver optimized codes for PAM6 and provide better performance than BICM decoders

| Code (Hard decoded)            | OH    | Pre-FEC BER for post-FEC BER 1e-15 | NCG   | Complexity |
|--------------------------------|-------|-------------|-------|------------|
| RS(544,514)                    | 5.8%  | 2.2e-4      | 6.9dB | 1x         |
| MLC RS(554,514) + RS(544,542)  | 4.9%  | 2.3e-4      | 7.0dB | 1.05x      |
| RS(560,514)                    | 8.9%  | 6.1e-4      | 7.4dB | 2.86x      |
| RS(576,514)                    | 12.1% | 1.1e-3      | 7.8dB | 5.79x      |
| RS(544,514) + BCH(128,120)     | 12.9% | 1.4e-3      | 8.0dB | 1x         |
| MLC RS(544,514) + BCH(128,120) | 7.4%  | 7.33e-4     | 7.8dB | 0.5x       |
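The overhead and NCG columns for the single and serially concatenated codes can be approximated from the code parameters and the pre-FEC BER threshold. The sketch below uses the common hard-decision net coding gain definition (ratio of inverse Q-function arguments plus a rate penalty); the deck does not state how its values were computed, so this is an assumed method. The MLC rows are not covered, since their overhead depends on how bits are split across levels.

```python
from math import log10
from statistics import NormalDist


def overhead(*codes: tuple) -> float:
    """Overhead of serially concatenated (n, k) codes as a fraction of payload."""
    expansion = 1.0
    for n, k in codes:
        expansion *= n / k
    return expansion - 1.0


def q_inv(p: float) -> float:
    """Inverse of the Gaussian tail function Q(x) for 0 < p < 0.5."""
    return -NormalDist().inv_cdf(p)


def ncg_db(pre_fec_ber: float, code_rate: float,
           post_fec_ber: float = 1e-15) -> float:
    """Net coding gain in dB for a hard-decision code (assumed definition)."""
    gross = 20.0 * log10(q_inv(post_fec_ber) / q_inv(pre_fec_ber))
    return gross + 10.0 * log10(code_rate)


if __name__ == "__main__":
    print(f"RS(544,514) OH: {overhead((544, 514)):.1%}")
    print(f"RS(544,514) + BCH(128,120) OH: {overhead((544, 514), (128, 120)):.1%}")
    print(f"RS(544,514) NCG: {ncg_db(2.2e-4, 514 / 544):.2f} dB")
    print(f"RS(576,514) NCG: {ncg_db(1.1e-3, 514 / 576):.2f} dB")
```

With this definition the two Reed-Solomon rows land close to the tabulated NCG values, which suggests the table follows a similar convention.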

#### Selection of basic FEC options






### Conclusions

- Electrical and optical domains require an identical modulation format, as in previous Ethernet standards, to avoid gearboxes for every architecture using 448G SerDes
- PAM4 has an obvious advantage in the optical domain
- PAM4 limitations in the electrical domain largely come from connectors
- PAM6 could potentially overcome its drawbacks in the optical domain with more effort (higher overhead)
- Other optical effects, like DGD or FWM, are not critical at 2 km for PAM4/PAM6 FR4
- Besides retimed architectures, linear drive optics and copper cabled designs will dominate the decision






# Thank you

