CXL Brings Data Center-sized Computing with 3.0 Standard, Thinks Ahead to 4.0
A new version of a standard backed by major cloud providers and chip companies could change the way some of the world’s biggest data centers and fastest supercomputers are built.
The CXL Consortium on Tuesday announced a new specification called CXL 3.0, also known as Compute Express Link 3.0, that removes more of the bottlenecks that slow down computing in enterprise systems and data centers.
The new specification provides a communication link between chips, memory and storage in systems, and it is twice as fast as its predecessor, CXL 2.0.
CXL 3.0 also introduces enhancements for more flexible pooling and sharing of computing resources for applications such as artificial intelligence.
CXL 3.0 aims to improve bandwidth and capacity, and can better provision and manage computing, memory and storage resources, said Kurt Lender, Co-Chair of the CXL Marketing Working Group, in an interview with HPCwire.
Hardware and cloud providers are converging on CXL, which has won out over competing interconnects. This week OpenCAPI, an IBM-backed interconnect standard, merged with the CXL Consortium, following in the footsteps of Gen-Z, which did the same in 2021.
The consortium released the first CXL 1.0 specification in 2019 and soon followed it with CXL 2.0, which runs on PCIe 5.0, a bus found in a handful of chips such as Intel’s Sapphire Rapids and Nvidia’s Hopper GPU.
The CXL 3.0 specification is based on PCIe 6.0, which was finalized in January. CXL 3.0 supports a data transfer rate of up to 64 gigatransfers per second, the same as PCIe 6.0.
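For a rough sense of what that transfer rate means in throughput terms, here is a back-of-envelope sketch (mine, not the consortium’s), assuming a 16-lane link and ignoring encoding, FLIT and protocol overhead:

```python
# Back-of-envelope estimate (not from the article): raw per-direction bandwidth
# of a 16-lane link at the quoted transfer rates, ignoring encoding, FLIT and
# protocol overhead.
def raw_bandwidth_gb_s(transfer_rate_gt_s: float, lanes: int = 16) -> float:
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return transfer_rate_gt_s * lanes / 8

print(raw_bandwidth_gb_s(32))  # CXL 2.0 on PCIe 5.0: ~64 GB/s each direction
print(raw_bandwidth_gb_s(64))  # CXL 3.0 on PCIe 6.0: ~128 GB/s each direction
```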
The CXL interconnect can link chips, storage and memory that sit near to or far from one another, allowing system vendors to build a data center as if it were one giant system, said Nathan Brookwood, principal analyst at Insight 64.
CXL’s ability to support memory, storage and processing expansion in a disaggregated infrastructure puts the protocol a step ahead of competing standards, Brookwood said.
Data center infrastructures are evolving toward a disaggregated model to meet the growing processing and bandwidth needs of AI and graphics applications, which require large pools of memory and storage. AI and scientific computing systems also need processors beyond CPUs, and organizations are installing AI boxes and, in some cases, quantum computers, for more power.
CXL 3.0 improves bandwidth and capacity through better switching and fabric technology, Lender said.
“CXL 1.1 was kind of in the node, and then with 2.0 you can stretch a little more into the data center. And now you can actually run racks, you can build decomposable or composable systems, with the… fabric technology that we brought with CXL 3.0,” said Lender.
At the rack level, CPU or memory shelves can be set up as separate systems, and CXL 3.0’s enhancements provide more flexibility and options for switching resources than previous CXL specifications did.
Typically, servers have a CPU, memory and I/O, and their physical expansion can be limited. In a disaggregated infrastructure, one can run a cable to a separate memory tray over the CXL protocol without relying on the conventional DDR bus.
“You can decompose or compose your data center as you see fit. You have the ability to move resources from one node to another, and you don’t have to overprovision as much as we do today, especially with memory,” Lender said, adding, “It’s about developing systems and sort of interconnecting now through this fabric and through CXL.”
The CXL 3.0 protocol uses the electricals of the PCI-Express 6.0 protocol, as well as its I/O and memory protocols. Other improvements include support for new processors and endpoints that can take advantage of the added bandwidth. CXL 2.0 offered only single-level switching, while 3.0 supports multiple levels of switching, which gives more latitude in how the fabric is laid out.
“You can actually start thinking of memory as storage – you could have hot memory and cold memory, etc. You can have different levels and apps can take advantage of that,” Lender said.
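As a purely illustrative sketch of that tiering idea (the tier names and threshold below are invented for this example and are not defined by CXL), an application or allocator might steer frequently touched data to local DDR and colder data to CXL-attached memory:

```python
# Illustrative only: a toy placement policy for "hot" vs. "cold" memory tiers.
# The tier names and the threshold are invented for this sketch; CXL itself
# does not define them.
def pick_tier(accesses_per_sec: float, hot_threshold: float = 1_000.0) -> str:
    """Return the tier a buffer should live in under this toy policy."""
    if accesses_per_sec >= hot_threshold:
        return "local DDR (hot)"
    return "CXL-attached memory (cold)"

for rate in (50_000, 120):
    print(f"{rate:>6} accesses/s -> {pick_tier(rate)}")
```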
The protocol also accounts for the ever-changing infrastructure of data centers, giving system administrators more flexibility in how they aggregate and disaggregate processing units, memory and storage. The new protocol opens up more channels and resources for newer chip types such as SmartNICs, FPGAs and IPUs that may need access to more memory and storage resources in data centers.
“HPC composable systems… you’re not limited by a box. HPC loves clusters today. And [with CXL 3.0] now you can do coherent clusters and low latency. The growth and flexibility of these nodes is expanding rapidly,” Lender said.
The CXL 3.0 protocol can support up to 4,096 nodes and introduces a new concept of sharing memory between different nodes. This is an improvement over the static configuration of older CXL protocols, in which memory could be sliced up and attached to different hosts but, once allocated, could not be shared.
“We now have sharing, where multiple hosts can actually share a memory segment. Now you can really consider fast and efficient data movement between hosts if needed, or if you have an AI-like application where you want to pass data around from one processor or host to another,” Lender said.
The new feature enables peer-to-peer connections between nodes and endpoints in a single domain. This creates a boundary within which traffic is isolated and travels only between the nodes connected to one another, allowing faster data transfer from accelerator to accelerator or device to device, which is essential for building a coherent system.
“If you think about certain applications, then certain GPUs and different accelerators, they want to pass information quickly, and now they have to go through the CPU. With CXL 3.0, they don’t have to go through the CPU that way, but the CPU is coherent, aware of what’s going on,” Lender said.
The pooling and allocation of memory resources is managed by software called the Fabric Manager. The software can sit anywhere in the system, or on the hosts, to control and allocate memory, but it could ultimately have implications for software developers.
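As a hypothetical sketch of the kind of bookkeeping such software performs (the class and method names below are invented for illustration and are not the Fabric Manager API defined in the specification), pooling attaches a carved-out segment to a single host, while CXL 3.0-style sharing lets additional hosts map the same segment:

```python
# Hypothetical illustration of fabric-manager bookkeeping; the names here are
# invented and do not correspond to the CXL specification's Fabric Manager API.
class ToyFabricManager:
    def __init__(self, pool_gib: int):
        self.free_gib = pool_gib   # unallocated capacity in the memory pool
        self.segments = {}         # segment id -> set of hosts with access

    def allocate(self, seg_id: str, size_gib: int, host: str) -> None:
        """Carve a segment from the pool and attach it to one host (pooling)."""
        if size_gib > self.free_gib:
            raise MemoryError("pool exhausted")
        self.free_gib -= size_gib
        self.segments[seg_id] = {host}

    def share(self, seg_id: str, host: str) -> None:
        """Give another host access to an existing segment (CXL 3.0-style sharing)."""
        self.segments[seg_id].add(host)

fm = ToyFabricManager(pool_gib=1024)
fm.allocate("seg0", 256, host="host-a")  # pooled: exclusive to host-a
fm.share("seg0", host="host-b")          # shared: host-a and host-b see the same segment
```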
“If you get to the level of prioritization, and when you start to have all the different latencies in switching, that’s where there’s going to have to be some awareness and application tuning. I think we definitely have that capability today,” Lender said.
It could be two to four years before companies start releasing CXL 3.0 products, and processors will need to be aware of CXL 3.0, Lender said. Intel has integrated CXL 1.1 support into its Sapphire Rapids chip, which is expected to begin shipping in volume later this year. The CXL 3.0 protocol is backward compatible with older versions of the interconnect standard.
CXL products based on earlier protocols are slowly coming to market. SK Hynix introduced its first Compute Express Link (CXL) DDR5 DRAM memory samples this week and will begin manufacturing CXL memory modules in volume next year. Samsung also introduced CXL DRAM earlier this year.
While products based on the CXL 1.1 and 2.0 protocols are on a two- to three-year release cycle, CXL 3.0 products may take a little longer because they target a more complex computing environment.
“CXL 3.0 might actually be a bit slower due to some of the Fabric Manager software work. These aren’t simple systems when you start getting into fabrics; people will want to do proofs of concept and prove out the technology first. It will probably be a three- to four-year delay,” Lender said.
Some companies already started working on CXL 3.0 verification IP six to nine months ago and are now refining those tools against the final specification, Lender said.
The CXL Consortium has a board meeting in October to discuss next steps, which may also involve CXL 4.0. The standards body behind PCIe, the PCI Special Interest Group, announced last month that it is planning PCIe 7.0, which raises the data transfer rate to 128 gigatransfers per second, twice that of PCIe 6.0.
Lender was cautious about how PCIe 7.0 could fit into a next-generation CXL 4.0. CXL has its own set of I/O, memory and cache protocols.
“CXL relies on the electricals of PCIe, so I cannot absolutely commit or guarantee that [CXL 4.0] will work on 7.0. But that’s the intention – to use the electricals,” Lender said.
In that case, one of the principles of CXL 4.0 would be to double the bandwidth by moving to PCIe 7.0, but “beyond that everything else will be what we do – more fabric or different configurations,” Lender said.
CXL has moved at an accelerating pace, with three specification releases since its inception in 2019. There was once industry confusion over which high-speed coherent I/O bus was best, but attention has now coalesced around CXL.
“Now we have the fabric. There are bits of Gen-Z and OpenCAPI that aren’t even in CXL 3.0, so are we going to incorporate them? Of course, we will consider doing this kind of work in the future,” Lender said.