Bertha 500

Efficient by Design. Sustainable by Architecture.

Bertha is built to do more with less — delivering powerful AI inference while minimizing energy consumption and resource waste.

Powering AI Inference Across Cloud Data Centers

Bertha 500 (B500) is the world’s most efficient accelerator for AI inference. B500 effectively supports up to 1024 concurrent requests (i.e., a batch size of 1024) with 768 TFLOPS of FP8 compute, 546 GB/s of memory bandwidth, and an average hardware utilization of 90% over the entire inference run.
B500 significantly reduces total cost of ownership (TCO) for cloud hyperscalers and provides an affordable solution for AI services, delivering 2x greater performance, 19x better cost efficiency, and 12x better power efficiency than state-of-the-art GPUs.

Performance Chart

Comparison with NVIDIA H100 (refer to the table)

Throughput (tokens/sec)

NVIDIA H100: 804
Bertha: 1645

2x higher throughput than NVIDIA H100

Cost Efficiency (tokens/sec/$1M)

NVIDIA H100: 22970
Bertha: 411225

19x higher cost efficiency than NVIDIA H100

Power Efficiency (tokens/sec/kW)

NVIDIA H100: 1149
Bertha: 13700

12x higher power efficiency than NVIDIA H100
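The headline multipliers can be rechecked directly from the chart values. The short Python sketch below does only that arithmetic. The dictionary keys and the `speedup` helper are illustrative names, not part of any Bertha tooling.

```python
# Values copied from the performance chart: (NVIDIA H100, Bertha).
chart = {
    "throughput (tokens/sec)": (804, 1645),
    "cost efficiency (tokens/sec/$1M)": (22970, 411225),
    "power efficiency (tokens/sec/kW)": (1149, 13700),
}


def speedup(baseline: float, candidate: float) -> float:
    """Ratio of the candidate's metric to the baseline's metric."""
    return candidate / baseline


ratios = {name: speedup(h100, bertha) for name, (h100, bertha) in chart.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.1f}x")
```

With the numbers as printed, the ratios come out to roughly 2.0x, 17.9x, and 11.9x. The cost-efficiency ratio is therefore closer to 18x than to the quoted 19x. The quoted figure may rest on unrounded data that is not shown here.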

Key Features

LPU-based Architecture

Streamlined memory access and inference-optimized dataflow with maximum model parameter reuse to drastically increase the utilization of memory bandwidth and computational units. Many coarse cores with independent compute lanes to support many concurrent requests.
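A simplified roofline estimate illustrates why parameter reuse across concurrent requests matters. This is a sketch under stated assumptions, not HyperAccel's own performance model. It assumes each weight is read from DRAM once per step and used for one multiply-add (2 FLOPs) per request, and it ignores KV-cache traffic, activations, and on-chip SRAM. The peak figures are the FP8 and DRAM bandwidth numbers from the specifications. The function names are hypothetical.

```python
# Simplified roofline model. Peak numbers come from the specification table;
# the "2 FLOPs per weight per request" rule is an assumption of this sketch.
PEAK_FLOPS = 768e12   # FP8 peak, FLOP/s
DRAM_BW = 546e9       # LPDDR5X bandwidth, bytes/s


def arithmetic_intensity(batch: int, bytes_per_param: float = 1.0) -> float:
    """FLOPs performed per byte of weights read from DRAM.

    Assumes each weight is read once per step and reused for one
    multiply-add (2 FLOPs) per request in the batch.
    """
    return 2.0 * batch / bytes_per_param


def attainable_flops(batch: int, bytes_per_param: float = 1.0) -> float:
    """Roofline bound: the lower of the compute peak and the bandwidth limit."""
    return min(PEAK_FLOPS, DRAM_BW * arithmetic_intensity(batch, bytes_per_param))


# Intensity (FLOP/byte) at which the bound flips from bandwidth to compute.
ridge_point = PEAK_FLOPS / DRAM_BW

if __name__ == "__main__":
    print(f"ridge point: {ridge_point:.0f} FLOP/byte")
    for batch in (1, 64, 512, 1024):
        tflops = attainable_flops(batch) / 1e12
        print(f"batch {batch:>4}: {tflops:7.1f} TFLOPS attainable")
```

Under these assumptions a single request is limited to about 1.1 TFLOPS by memory bandwidth, and the compute peak only becomes reachable at batch sizes in the high hundreds. This is consistent with the emphasis on many concurrent requests.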

SoC Integration

Advanced integration of an LPU fabricated on a 4 nm technology node, 8 channels of LPDDR5X, and PCIe Gen5 in a state-of-the-art system-on-chip design.

Multi-chip Scalability

Custom on-chip network controller for computation-communication overlapping to hide the communication overhead and achieve near-perfect scalability.
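The benefit of overlapping computation with communication can be sketched with a toy timing model. The model is an illustration only and does not describe the actual network controller. It assumes perfectly divisible work, a fixed per-step communication cost, and full overlap when enabled. The function names and timings are hypothetical.

```python
def step_time(compute_s: float, comm_s: float, overlap: bool = True) -> float:
    """Time for one step on one chip.

    With overlap, communication hides behind computation, so the step takes
    as long as the slower of the two. Without overlap, they run back to back.
    """
    return max(compute_s, comm_s) if overlap else compute_s + comm_s


def scaling_efficiency(n_chips: int, work_s: float, comm_s: float,
                       overlap: bool = True) -> float:
    """Ideal per-chip time divided by actual per-chip time (1.0 is perfect)."""
    per_chip = work_s / n_chips  # assumes the work splits evenly
    return per_chip / step_time(per_chip, comm_s, overlap)


if __name__ == "__main__":
    # Hypothetical timings: 8 ms of total work, 0.5 ms of communication per step.
    for overlap in (False, True):
        eff = scaling_efficiency(8, 8e-3, 0.5e-3, overlap)
        print(f"overlap={overlap}: efficiency {eff:.2f}")
```

In this toy setting, eight chips reach an efficiency of 1.00 with overlap and about 0.67 without it. Efficiency stays at 1.00 only while the communication time does not exceed the per-chip compute time.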

HyperDex Software

Plug-and-play solution for seamlessly serving generative AI applications on HyperAccel hardware. Supports standard ML frameworks for inference (e.g., PyTorch, ONNX, vLLM), with SDKs for further optimization, deployment, and profiling based on user needs.

Specifications

FP16: 384 TFLOPS
FP8: 768 TFLOPS
Target Frequency: 1.5 GHz
Number System: FP16/8/4, INT8/4
DRAM Bandwidth: LPDDR5X, 546 GB/s
DRAM Size: 128 GB (up to 256 GB)
SRAM Size: 256 MB
TDP: 250 W
Batch Size: 1-1024
Form Factor: Dual Slot
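For rough capacity planning, the DRAM size and bandwidth above translate into two back-of-the-envelope figures: how many model parameters fit at each precision, and how long one full pass over the weights takes. This sketch assumes 1 GB = 10^9 bytes and counts weights only, with no KV cache, activations, or runtime overhead. The helper names and the 70-billion-parameter example are hypothetical.

```python
# Figures from the specifications; everything else is an assumption of this sketch.
DRAM_GB = 128.0        # base DRAM size
DRAM_BW_GBPS = 546.0   # DRAM bandwidth, GB/s

# Storage cost per parameter for the listed number formats.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5, "INT8": 1.0, "INT4": 0.5}


def max_params_billions(fmt: str, dram_gb: float = DRAM_GB) -> float:
    """Upper bound on parameters (in billions) that fit in DRAM, weights only."""
    return dram_gb / BYTES_PER_PARAM[fmt]


def weight_sweep_seconds(params_billions: float, fmt: str) -> float:
    """Time to stream all weights from DRAM once at full bandwidth."""
    weight_gb = params_billions * BYTES_PER_PARAM[fmt]
    return weight_gb / DRAM_BW_GBPS


if __name__ == "__main__":
    for fmt in ("FP16", "FP8", "FP4"):
        print(f"{fmt}: up to {max_params_billions(fmt):.0f}B parameters in {DRAM_GB:.0f} GB")
    # Hypothetical 70B-parameter model stored in FP8.
    print(f"70B @ FP8: {weight_sweep_seconds(70, 'FP8') * 1e3:.0f} ms per weight pass")
```

By this estimate the 128 GB configuration holds at most 64 billion FP16 parameters or 128 billion FP8 parameters. Real deployments need headroom beyond the weights, so usable model sizes will be smaller.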