LPU
LLM Processing Unit (LPU) IP
A highly optimized, flexible processor IP that can reconfigure both memory types and compute resources for low-power or high-performance LLM inference, depending on customer needs.
- LLM-specialized coarse core design for end-to-end inference
- Highly scalable engines that support batch sizes from single to extremely large, for both edge and cloud computing
- Fully modular hardware for customizability
- Use of standard PCIe and UCIe protocols for seamless SoC integration
- Ability to integrate all DRAM types while maintaining maximum bandwidth utilization
Specifications
| Configuration | Max Performance | Power (W) | Die Area (mm²) | Application |
| --- | --- | --- | --- | --- |
| LPU Latency Optimized | 8 TFLOPS/core (FP16) | 3.55 | 5.82 | Edge Devices |
| LPU High Performance (Batching Enabled) | 12 TFLOPS/core (FP16), 24 TOPS/core (INT8) | 4.72 | 7.91 | Cloud Servers |
| LPU Balanced (Batching Enabled) | 6 TFLOPS/core (FP16), 12 TOPS/core (INT8) | 2.84 | 5.29 | Edge Servers/Devices |
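
To compare the three configurations on efficiency rather than raw throughput, the specification figures above can be turned into derived ratios. The sketch below is illustrative only: the `LpuConfig` class and `summarize` helper are hypothetical names, and it assumes the power and die area figures apply to the same single core as the per-core FP16 performance figure, which the specifications do not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LpuConfig:
    """One row of the specification table (hypothetical container)."""
    name: str
    fp16_tflops_per_core: float
    power_w: float
    die_area_mm2: float

    def tflops_per_watt(self) -> float:
        # Assumption: power is quoted for the same core as the TFLOPS figure.
        return self.fp16_tflops_per_core / self.power_w

    def tflops_per_mm2(self) -> float:
        # Assumption: die area is quoted for the same core as the TFLOPS figure.
        return self.fp16_tflops_per_core / self.die_area_mm2


# Values copied from the specification figures; nothing else is added.
CONFIGS = [
    LpuConfig("LPU Latency Optimized", 8.0, 3.55, 5.82),
    LpuConfig("LPU High Performance", 12.0, 4.72, 7.91),
    LpuConfig("LPU Balanced", 6.0, 2.84, 5.29),
]


def summarize(configs):
    """Map each configuration name to (TFLOPS/W, TFLOPS/mm2), rounded to 2 places."""
    return {
        c.name: (round(c.tflops_per_watt(), 2), round(c.tflops_per_mm2(), 2))
        for c in configs
    }


if __name__ == "__main__":
    for name, (per_w, per_mm2) in summarize(CONFIGS).items():
        print(f"{name}: {per_w} TFLOPS/W, {per_mm2} TFLOPS/mm2")
```

Under that per-core assumption, the ratios give a rough way to rank configurations for a power-constrained or area-constrained design. They are not a substitute for measured inference throughput.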