LLM Processing Unit (LPU) IP

Highly optimized, flexible processor IP that reconfigures both memory types and compute resources for low-power or high-performance LLM inference, depending on customer needs.

  • LLM-specialized coarse core design for end-to-end inference
  • Highly scalable engines supporting batch sizes from one to extremely large, for both edge and cloud deployments
  • Fully modular hardware for customizability
  • Use of standard PCIe and UCIe protocols for seamless SoC integration
  • Ability to integrate all DRAM types while maintaining maximum bandwidth utilization
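The batching capability above reflects a standard roofline tradeoff: at batch size 1, LLM decoding is typically bound by weight-streaming memory bandwidth, while larger batches amortize each weight fetch across requests until compute becomes the limit. A minimal sketch of that tradeoff (all parameter values here are illustrative assumptions, not LPU specifications):

```python
def decode_tokens_per_s(batch, tflops=8.0, mem_gbps=256.0,
                        params_b=7.0, bytes_per_param=2):
    """Roofline estimate of aggregate decode throughput for one step.

    Assumes each decode step streams all weights once (KV-cache traffic
    ignored) and ~2 FLOPs per parameter per token; every number here is
    an illustrative assumption, not a measured LPU figure.
    """
    weight_bytes = params_b * 1e9 * bytes_per_param
    flops_per_token = 2 * params_b * 1e9
    step_time = max(
        weight_bytes / (mem_gbps * 1e9),            # bandwidth-bound: one weight pass
        batch * flops_per_token / (tflops * 1e12),  # compute-bound: whole batch
    )
    return batch / step_time

# Throughput grows with batch size until the compute roof is hit:
for b in (1, 8, 64):
    print(b, round(decode_tokens_per_s(b), 1))
```

With these assumed numbers, throughput scales nearly linearly with batch size until roughly batch 31, after which compute becomes the bottleneck; this is why a batching-enabled configuration suits servers while a latency-optimized one suits single-stream edge devices.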

Specifications

Configuration           Max Performance          Power (W)   Die Area (mm²)   Application
LPU Latency Optimized   8 TFLOPS/core (FP16)     3.55        5.82             Edge Devices
LPU High Performance    12 TFLOPS/core (FP16)    4.72        7.91             Cloud Servers
  (Batching Enabled)    24 TOPS/core (INT8)
LPU Balanced            6 TFLOPS/core (FP16)     2.84        5.29             Edge Servers/Devices
  (Batching Enabled)    12 TOPS/core (INT8)
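The table's FP16, power, and die-area figures imply the efficiency tradeoffs between the three configurations. A short sketch deriving performance per watt and per mm² directly from the values above (the configuration labels are just dictionary keys for illustration):

```python
# FP16 performance, power, and die area taken from the specification table.
configs = {
    "Latency Optimized": {"tflops_fp16": 8.0,  "power_w": 3.55, "area_mm2": 5.82},
    "High Performance":  {"tflops_fp16": 12.0, "power_w": 4.72, "area_mm2": 7.91},
    "Balanced":          {"tflops_fp16": 6.0,  "power_w": 2.84, "area_mm2": 5.29},
}

for name, c in configs.items():
    per_watt = c["tflops_fp16"] / c["power_w"]    # TFLOPS/W
    per_mm2 = c["tflops_fp16"] / c["area_mm2"]    # TFLOPS/mm^2
    print(f"{name}: {per_watt:.2f} TFLOPS/W, {per_mm2:.2f} TFLOPS/mm^2")
```

By these figures the High Performance configuration is the most efficient per watt and per unit area (about 2.54 TFLOPS/W and 1.52 TFLOPS/mm²), while the Latency Optimized and Balanced configurations trade some efficiency for lower absolute power and area.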