LLM Processing Unit (LPU) IP

Highly optimized, flexible processor IP that reconfigures both memory types and compute resources for low-power or high-performance LLM inference, depending on customer needs.

  • LLM-specialized coarse core design for end-to-end inference
  • Highly scalable engines supporting batch sizes from one to extremely large, for both edge and cloud deployments
  • Fully modular hardware for customizability
  • Use of standard PCIe and UCIe protocols for seamless SoC integration
  • Ability to integrate all DRAM types while maintaining maximum bandwidth utilization
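The batching capability above reflects a standard roofline tradeoff: at batch size 1, LLM decoding is typically bound by weight-streaming memory bandwidth, while larger batches amortize each weight fetch across requests until compute becomes the limit. A minimal sketch of that tradeoff (all parameter values here are illustrative assumptions, not LPU specifications):

```python
def decode_tokens_per_s(batch, tflops=8.0, mem_gbps=256.0,
                        params_b=7.0, bytes_per_param=2):
    """Roofline estimate of aggregate decode throughput for one step.

    Assumes each decode step streams all weights once (KV-cache traffic
    ignored) and ~2 FLOPs per parameter per token; every number here is
    an illustrative assumption, not a measured LPU figure.
    """
    weight_bytes = params_b * 1e9 * bytes_per_param
    flops_per_token = 2 * params_b * 1e9
    step_time = max(
        weight_bytes / (mem_gbps * 1e9),            # bandwidth-bound: one weight pass
        batch * flops_per_token / (tflops * 1e12),  # compute-bound: whole batch
    )
    return batch / step_time

# Throughput grows with batch size until the compute roof is hit:
for b in (1, 8, 64):
    print(b, round(decode_tokens_per_s(b), 1))
```

With these assumed numbers, throughput scales nearly linearly with batch size until roughly batch 31, after which compute becomes the bottleneck; this is why a batching-enabled configuration suits servers while a latency-optimized one suits single-stream edge devices.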

Specifications

Configuration           Max Performance          Power (W)   Die Area (mm²)   Application
LPU Latency Optimized   8 TFLOPS/core (FP16)     3.55        5.82             Edge Devices
LPU High Performance    12 TFLOPS/core (FP16)    4.72        7.91             Cloud Servers
  (Batching Enabled)    24 TOPS/core (INT8)
LPU Balanced            6 TFLOPS/core (FP16)     2.84        5.29             Edge Servers/Devices
  (Batching Enabled)    12 TOPS/core (INT8)
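The table's FP16, power, and die-area figures imply the efficiency tradeoffs between the three configurations. A short sketch deriving performance per watt and per mm² directly from the values above (the configuration labels are just dictionary keys for illustration):

```python
# FP16 performance, power, and die area taken from the specification table.
configs = {
    "Latency Optimized": {"tflops_fp16": 8.0,  "power_w": 3.55, "area_mm2": 5.82},
    "High Performance":  {"tflops_fp16": 12.0, "power_w": 4.72, "area_mm2": 7.91},
    "Balanced":          {"tflops_fp16": 6.0,  "power_w": 2.84, "area_mm2": 5.29},
}

for name, c in configs.items():
    per_watt = c["tflops_fp16"] / c["power_w"]    # TFLOPS/W
    per_mm2 = c["tflops_fp16"] / c["area_mm2"]    # TFLOPS/mm^2
    print(f"{name}: {per_watt:.2f} TFLOPS/W, {per_mm2:.2f} TFLOPS/mm^2")
```

By these figures the High Performance configuration is the most efficient per watt and per unit area (about 2.54 TFLOPS/W and 1.52 TFLOPS/mm²), while the Latency Optimized and Balanced configurations trade some efficiency for lower absolute power and area.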