Specialized AI Chip and Server Products for LLM Inference Workloads

Server
Datacenter
The LPU-based datacenter server outperforms the state-of-the-art DGX A100 on text-generation
workloads such as ChatGPT: >30% higher performance, >2x better cost-effectiveness,
and >2x better power efficiency, with superior accelerator scalability.
Silicon IP
LLM Processing Unit (LPU) IP
A highly optimized, flexible processor IP that can reconfigure both memory types and
compute resources for low-power or high-performance LLM inference, depending on
customer needs.
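
To make the reconfigurability concrete, here is a minimal, purely hypothetical sketch of what a build-time configuration choice might look like. The class name, fields, and the memory option other than LPDDR5X are assumptions for illustration, not the actual IP interface.

```python
from dataclasses import dataclass

# Hypothetical configuration object; all names here are illustrative only.
@dataclass
class LPUConfig:
    memory_type: str   # LPDDR5X appears in the spec; alternatives are assumed
    num_cores: int     # compute resources scale with the performance target
    profile: str       # "low_power" or "high_performance"

def make_config(profile: str) -> LPUConfig:
    # Trade core count and (by assumption) memory choice against power.
    if profile == "low_power":
        return LPUConfig(memory_type="LPDDR5X", num_cores=8, profile=profile)
    # "HBM3" is an assumed high-bandwidth alternative, not named in the spec.
    return LPUConfig(memory_type="HBM3", num_cores=32, profile=profile)

print(make_config("high_performance"))
```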
ASIC Product
32 LPU Cores with Streamlined Dataflow
128 GB LPDDR5X per Chip
Data Types: FP16, BF16, FP8, FP4, INT8, INT4
HW-Native Continuous Batching (see the scheduling sketch after this list)
Supports LLMs, Multimodal Models, and MoEs
vLLM Compatible with PagedAttention (usage example below)
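
Here, "HW-native continuous batching" means new sequences are admitted into a running batch as soon as earlier ones finish, instead of draining and refilling whole batches. A minimal software sketch of the scheduling idea, not of the chip's actual mechanism:

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Illustrative scheduler: new sequences join whenever a slot frees up."""
    queue = deque(requests)   # (prompt, tokens_to_generate) pairs
    active = {}               # prompt -> tokens remaining
    step = 0
    while queue or active:
        # Admit waiting requests into any free slots (no drain-and-refill).
        while queue and len(active) < max_batch:
            prompt, remaining = queue.popleft()
            active[prompt] = remaining
        # One decode step produces a token for every active sequence.
        step += 1
        for prompt in list(active):
            active[prompt] -= 1
            if active[prompt] == 0:   # finished sequence releases its slot
                del active[prompt]
                print(f"step {step}: '{prompt}' done")

continuous_batching([("a", 3), ("b", 1), ("c", 5), ("d", 2), ("e", 2)])
```

With varying output lengths, this keeps the compute units busy instead of idling until the longest sequence in a batch completes.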
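
And since the chip is advertised as vLLM compatible, serving it should look like standard vLLM usage, where the engine applies continuous batching and PagedAttention automatically. A minimal sketch assuming a stock vLLM install and a placeholder Hugging Face model; any LPU-specific device or plugin selection is not documented here.

```python
from vllm import LLM, SamplingParams

# vLLM's engine performs continuous batching and manages the KV cache
# in fixed-size blocks via PagedAttention; both happen automatically.
llm = LLM(model="facebook/opt-125m")  # placeholder model, not LPU-specific

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
prompts = [
    "Explain continuous batching in one sentence.",
    "What does PagedAttention manage?",
]

# Both prompts are scheduled together; a sequence that finishes early
# frees its KV-cache blocks for the next waiting request.
for out in llm.generate(prompts, params):
    print(out.prompt, "->", out.outputs[0].text)
```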