Bertha 500
Efficient by Design. Sustainable by Architecture.
Bertha is built to do more with less — delivering powerful AI inference while minimizing energy consumption and resource waste.

Powering AI Inference Across Cloud Data Centers

Performance
Comparison with the NVIDIA H100:
- Throughput (tokens/sec): 2× higher
- Cost efficiency (tokens/sec per $1M): 19× higher
- Power efficiency (tokens/sec per kW): 12× higher
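The three metrics above are simple ratios: raw throughput, throughput per million dollars of hardware cost, and throughput per kilowatt of power draw. A small sketch that computes them (all input values below are hypothetical placeholders, not published figures):

```python
def cost_efficiency(tokens_per_sec: float, system_cost_usd: float) -> float:
    """Tokens/sec delivered per $1M of hardware spend."""
    return tokens_per_sec / (system_cost_usd / 1e6)

def power_efficiency(tokens_per_sec: float, power_kw: float) -> float:
    """Tokens/sec delivered per kW of power draw."""
    return tokens_per_sec / power_kw

# Hypothetical placeholder values, not published Bertha 500 figures.
throughput = 10_000.0   # tokens/sec
cost_usd = 250_000.0    # system cost in USD
power_kw = 2.0          # system power in kW

print(f"{cost_efficiency(throughput, cost_usd):,.0f} tokens/sec per $1M")
print(f"{power_efficiency(throughput, power_kw):,.0f} tokens/sec per kW")
```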
Key Features
LPU-based Architecture
Streamlined memory access and an inference-optimized dataflow maximize model-parameter reuse, drastically increasing utilization of memory bandwidth and compute units. Many coarse-grained cores with independent compute lanes serve many concurrent requests.
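To make the parameter-reuse point concrete: in a batched matrix-vector product, one pass over the weights streamed from DRAM serves every concurrent request, so weight traffic per generated token falls roughly linearly with batch size. A minimal NumPy sketch (the layer dimensions and batch size are arbitrary placeholders):

```python
import numpy as np

# Placeholder layer dimensions and batch size, for illustration only.
d_in, d_out, batch = 4096, 4096, 8
W = np.random.randn(d_out, d_in).astype(np.float16)  # model parameters
x = np.random.randn(batch, d_in).astype(np.float16)  # one activation vector per request

y = x @ W.T  # a single pass over W produces outputs for all `batch` requests

print(f"Weight bytes streamed per step: {W.nbytes / 1e6:.1f} MB")
print(f"Amortized per request:          {W.nbytes / batch / 1e6:.1f} MB")
```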
SoC Integration
Advanced integration of an LPU fabricated in a 4 nm technology node, 8 channels of LPDDR5X, and PCIe Gen5 in a state-of-the-art system-on-chip design.
Multi-chip Scalability
A custom on-chip network controller overlaps computation with communication, hiding the communication overhead and achieving near-perfect scalability.
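In general terms, the overlap works by transmitting the results of chunk i-1 while chunk i is still being computed, so transfer time hides behind compute time. A generic host-side sketch of the pattern (`compute` and `transmit` are stand-ins, not HyperAccel's actual interface):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def compute(chunk: int) -> int:
    time.sleep(0.01)  # stand-in for on-chip computation
    return chunk

def transmit(result: int) -> None:
    time.sleep(0.01)  # stand-in for chip-to-chip communication

with ThreadPoolExecutor(max_workers=1) as comm:
    in_flight = None
    for chunk in range(8):
        result = compute(chunk)      # compute chunk i ...
        if in_flight is not None:
            in_flight.result()       # ... while chunk i-1 finishes transmitting
        in_flight = comm.submit(transmit, result)
    in_flight.result()               # drain the final transfer
```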
HyperDex Software
A plug-and-play solution for seamlessly serving generative AI applications on HyperAccel hardware. Supports standard ML frameworks for inference (e.g., PyTorch, ONNX, vLLM), with SDKs for further optimization, deployment, and profiling based on user needs.
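As one illustration of what plug-and-play serving could look like, the snippet below uses vLLM's standard offline-inference API; treating Bertha as a drop-in backend here is an assumption for illustration, not a documented HyperDex interface (the model name is also a placeholder):

```python
from vllm import LLM, SamplingParams

# Assumption: HyperDex integrates with vLLM so this standard code
# runs unchanged on Bertha hardware. The model name is a placeholder.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what an LPU is in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```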
Specifications
- FP16: 384 TFLOPS
- FP8: 768 TFLOPS
- Target Frequency: 1.5 GHz
- Number System: FP16/8/4, INT8/4
- DRAM Bandwidth: LPDDR5X, 546 GB/s
- DRAM Size: 128 GB (up to 256 GB)
- SRAM Size: 256 MB
- TDP: 250 W
- Batch Size: 1–1024
- Form Factor: Dual slot
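Because autoregressive decoding at low batch size is typically memory-bandwidth-bound, the DRAM bandwidth above gives a quick upper bound on single-stream token rate: bandwidth divided by the bytes of parameters read per token. A back-of-the-envelope sketch (the model size is a placeholder, and KV-cache and activation traffic are ignored):

```python
# Rough upper bound on single-stream decode rate for a memory-bound model.
# Assumptions: every parameter is read once per generated token; KV-cache
# and activation traffic are ignored, so real rates will be lower.

dram_bandwidth_gbs = 546   # Bertha 500 spec: LPDDR5X, 546 GB/s
params_billion = 8         # placeholder model size (8B parameters)
bytes_per_param = 2        # FP16

bytes_per_token = params_billion * 1e9 * bytes_per_param
tokens_per_sec = dram_bandwidth_gbs * 1e9 / bytes_per_token
print(f"~{tokens_per_sec:.0f} tokens/sec upper bound per stream")
```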