Startup claims to boost LLM performance using standard memory instead of GPU HBM — but experts remain unconvinced by the numbers despite promising CXL technology

March 25, 2024

MemVerge, a provider of software designed to accelerate and optimize data-intensive applications, has partnered with Micron to boost the performance of LLMs using Compute Express Link (CXL) technology.

The company’s Memory Machine software uses CXL to reduce idle time in GPUs caused by memory loading.

The technology was demonstrated at Micron’s booth at Nvidia GTC 2024. Charles Fan, CEO and co-founder of MemVerge, said, “Scaling LLM performance cost-effectively means keeping the GPUs fed with data. Our demo at GTC demonstrates that pools of tiered memory not only drive performance higher but also maximize the utilization of precious GPU resources.”

Impressive results

The demo utilized a high-throughput FlexGen generation engine and an OPT-66B large language model. This was performed on a Supermicro Petascale Server, equipped with an AMD Genoa CPU, Nvidia A10 GPU, Micron DDR5-4800 DIMMs, CZ120 CXL memory modules, and MemVerge Memory Machine X intelligent tiering software.

The demo compared a job running on an A10 GPU with 24GB of GDDR6 memory, with data fed from 8x 32GB Micron DRAM, against the same job running on the Supermicro server fitted with a Micron CZ120 CXL 24GB memory expander and the MemVerge software.

The FlexGen benchmark, using tiered memory, completed tasks in under half the time of traditional NVMe storage methods. Additionally, GPU utilization jumped from 51.8% to 91.8%, reportedly as a result of MemVerge Memory Machine X software’s transparent data tiering across GPU, CPU, and CXL memory.
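To put the reported utilization figures in perspective, the short Python sketch below works through the arithmetic. It uses only the two percentages quoted above; the function and variable names are illustrative and do not come from MemVerge or Micron. Note that utilization is not the same as throughput, so the figures say how busy the GPU was, not how much faster the job finished.

```python
# Reported GPU utilization figures from the demo (percent).
UTIL_BEFORE = 51.8  # without tiered memory
UTIL_AFTER = 91.8   # with Memory Machine X tiering


def point_gain(before: float, after: float) -> float:
    """Absolute change in percentage points."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Ratio of the new utilization to the old one."""
    return after / before


def idle_share(utilization: float) -> float:
    """Share of time the GPU was not busy, in percent."""
    return round(100.0 - utilization, 1)


if __name__ == "__main__":
    print(f"Gain: {point_gain(UTIL_BEFORE, UTIL_AFTER)} percentage points")
    print(f"Relative: {relative_gain(UTIL_BEFORE, UTIL_AFTER):.2f}x")
    print(f"Idle before: {idle_share(UTIL_BEFORE)}%")
    print(f"Idle after: {idle_share(UTIL_AFTER)}%")
```

On these numbers, utilization rose by 40 percentage points, or roughly 1.77 times the starting level, while the GPU's idle share fell from 48.2% to 8.2%.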

Raj Narasimhan, senior vice president and general manager of Micron’s Compute and Networking Business Unit, said, “Through our collaboration with MemVerge, Micron is able to demonstrate the substantial benefits of CXL memory modules to improve effective GPU throughput for AI applications resulting in faster time to insights for customers. Micron’s innovations across the memory portfolio provide compute with the necessary memory capacity and bandwidth to scale AI use cases from cloud to the edge.”

However, experts remain skeptical about the claims. Blocks and Files pointed out that the Nvidia A10 GPU uses GDDR6 memory, which is not HBM. Responding to this and other points the site raised, a MemVerge spokesperson said, “Our solution does have the same effect on the other GPUs with HBM. Between Flexgen’s memory offloading capabilities and Memory Machine X’s memory tiering capabilities, the solution is managing the entire memory hierarchy that includes GPU, CPU and CXL memory modules.”

MemVerge Memory Machine X results (Image credit: MemVerge)

Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.
