Specialized AI Chip and Server Products for LLM Inference Workloads 

SERVER

Datacenter


The LPU-based datacenter server outperforms the state-of-the-art DGX A100 on text generation workloads such as ChatGPT, delivering higher performance (>30%), cost-effectiveness (>2x), and power efficiency (>2x), along with superior accelerator scalability.
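
As a rough illustration of how such figures can be derived, the sketch below measures generation throughput (tokens/s) and energy efficiency (tokens/J) for one system under test. The generate_fn and measure_power_w hooks are hypothetical harness placeholders, not part of any LPU or DGX software stack, and no measured numbers are implied here.

```python
import time

def measure(generate_fn, measure_power_w, prompts, max_tokens=128):
    """Measure tokens/s and tokens/J for one system under test.

    generate_fn(prompt, max_tokens) -> list of generated tokens, and
    measure_power_w() -> average board power in watts, are hypothetical
    benchmark-harness hooks supplied by the caller.
    """
    start = time.perf_counter()
    total_tokens = sum(len(generate_fn(p, max_tokens)) for p in prompts)
    elapsed = time.perf_counter() - start
    tokens_per_s = total_tokens / elapsed
    tokens_per_j = tokens_per_s / measure_power_w()
    return tokens_per_s, tokens_per_j

# A ">30% performance" claim corresponds to tokens_per_s(LPU) / tokens_per_s(DGX) > 1.3;
# ">2x power efficiency" corresponds to tokens_per_j(LPU) / tokens_per_j(DGX) > 2.0.
```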

Silicon IP


LLM Processing Unit (LPU) IP


Highly optimized, flexible processor IP that can reconfigure both memory types and compute resources for low-power or high-performance LLM inference, depending on customer needs.
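
To make this reconfigurability concrete, here is a minimal sketch of what a build-time configuration for the IP could look like. The LpuConfig record, its field names, and the example values (including the HBM3 option) are illustrative assumptions, not the actual LPU IP interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LpuConfig:
    """Hypothetical build-time configuration record for the LPU IP;
    field names and values are illustrative only."""
    memory_type: str   # memory technology attached to the cores
    num_cores: int     # compute resources instantiated on the die
    profile: str       # optimization target for this instantiation

# Two operating points a licensee might configure the same IP for:
LOW_POWER = LpuConfig(memory_type="LPDDR5X", num_cores=8,  profile="low-power")
HIGH_PERF = LpuConfig(memory_type="HBM3",    num_cores=32, profile="high-performance")
```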

ASIC Product 

32 LPU Cores with Streamlined Dataflow
128 GB LPDDR5X per Chip
Data Types: FP16, BF16, FP8, FP4, INT8, INT4
HW-Native Continuous Batching (see the scheduling sketch after this list)
Support for LLMs, Multimodal Models, and MoEs
vLLM Compatible with Paged Attention (usage example below)
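
The continuous batching named above is implemented natively in hardware; purely as a software analogy, the toy loop below shows the scheduling idea: finished sequences leave the batch and waiting requests join at every decode step, rather than the whole batch draining before it refills. The Request class and step_fn hook are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    tokens: list = field(default_factory=list)
    done: bool = False

def continuous_batching(requests, step_fn, max_batch=4):
    """Toy scheduler: admit and evict at token granularity, never drain the batch."""
    waiting, active = deque(requests), []
    while waiting or active:
        while waiting and len(active) < max_batch:   # refill freed slots immediately
            active.append(waiting.popleft())
        step_fn(active)                              # one decode step for the whole batch
        active = [r for r in active if not r.done]   # evict finished sequences
```

And because the chip is vLLM-compatible, serving could look like standard vLLM usage; the model name below is a placeholder, and any LPU-specific device selection is omitted as an assumption.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["What does an LPU accelerate?"], params)
print(outputs[0].outputs[0].text)
```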