Bertha 500

Efficient by Design. Sustainable by Architecture.

Bertha is built to do more with less — delivering powerful AI inference while minimizing energy consumption and resource waste.


Powering AI Inference Across Cloud Data Centers

Bertha 500 (B500) is the world’s most efficient accelerator for AI inference. B500 effectively supports up to 1024 concurrent requests (i.e., the batch size) with 768 TFLOPS of FP8 compute, 546 GB/s of memory bandwidth, and an average of 90% hardware utilization throughout inference.

B500 significantly reduces total cost of ownership (TCO) for cloud hyperscalers and provides an affordable solution for AI services, delivering 2x higher performance, 19x better cost efficiency, and 12x better power efficiency than state-of-the-art GPUs.

Performance Chart

Comparison with NVIDIA H100 (refer to the table)

Throughput (tokens/sec)

NVIDIA H100: 804

Bertha: 1645

2x higher throughput than the competitors

Cost Efficiency (tokens/sec/$1M)

NVIDIA H100: 22970

Bertha: 411225

19x higher than the competitors

Power Efficiency (tokens/sec/kW)

NVIDIA H100: 1149

Bertha: 13700

12x higher than the competitors
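
The headline multipliers can be recomputed from the chart values themselves. The snippet below is a minimal sketch that divides each Bertha figure by the corresponding NVIDIA H100 figure; the dictionary keys and the `speedup` helper are illustrative names, not part of any HyperAccel tooling. With the values as printed, throughput and power efficiency come out close to the rounded 2x and 12x, while the cost-efficiency values give a ratio somewhat below the 19x headline.

```python
# Chart values as quoted on this page: (NVIDIA H100, Bertha)
chart = {
    "throughput (tokens/sec)": (804, 1645),
    "cost efficiency (tokens/sec/$1M)": (22970, 411225),
    "power efficiency (tokens/sec/kW)": (1149, 13700),
}


def speedup(baseline, candidate):
    """Return how many times larger `candidate` is than `baseline`."""
    return candidate / baseline


# Ratio of Bertha to H100 for every metric in the chart
ratios = {name: speedup(h100, bertha) for name, (h100, bertha) in chart.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```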

Key Features

LPU-based Architecture

Streamlined memory access and inference-optimized dataflow with maximum model parameter reuse to drastically increase the utilization of memory bandwidth and computational units. Many coarse cores with independent compute lanes to support many concurrent requests.
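
Why parameter reuse matters can be illustrated with a simplified roofline-style estimate. The sketch below is not a description of B500's actual dataflow. It assumes a dense model in which each weight is fetched from DRAM once per decode step and shared by every request in the batch, 2 FLOPs (one multiply-add) per parameter per token, and 1 byte per FP8 weight, and it ignores KV-cache and activation traffic. The peak compute and bandwidth figures are the ones quoted on this page; all function names are hypothetical.

```python
PEAK_FLOPS = 768e12  # FP8 peak quoted on this page, in FLOP/s
DRAM_BW = 546e9      # DRAM bandwidth quoted on this page, in bytes/s


def ridge_point(peak_flops, bandwidth):
    """FLOPs needed per byte fetched to keep the compute units busy."""
    return peak_flops / bandwidth


def decode_intensity(batch, bytes_per_param=1.0, flops_per_param=2.0):
    """FLOPs per weight byte when one fetch is reused by `batch` requests.

    Assumed model: every parameter is read once per step and applied to
    all requests in the batch.
    """
    return batch * flops_per_param / bytes_per_param


def bound(batch):
    """Classify a batch size as memory-bound or compute-bound."""
    if decode_intensity(batch) >= ridge_point(PEAK_FLOPS, DRAM_BW):
        return "compute"
    return "memory"


for b in (1, 64, 1024):
    print(f"batch={b}: {decode_intensity(b):.0f} FLOP/byte, {bound(b)}-bound")
```

Under these assumptions, small batches leave the compute units waiting on memory, and only large batches reuse each fetched weight often enough to cross the ridge point.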

SoC Integration

Advanced integration of the LPU, fabricated on a 4nm technology node, with 8 channels of LPDDR5X and PCIe Gen5 for a state-of-the-art system-on-chip design.

Multi-chip Scalability

Custom on-chip network controller for computation-communication overlapping to hide the communication overhead and achieve near-perfect scalability.
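
The benefit of overlapping computation with communication can be sketched with a two-stage pipeline timing model. This is a generic illustration, not the B500 network controller's actual schedule: it assumes the transfer for step i can run while step i+1 computes, and the millisecond figures in the usage lines are made-up example inputs.

```python
def serial_time(compute_ms, comm_ms, steps):
    """Total time when every step waits for its own communication to finish."""
    return steps * (compute_ms + comm_ms)


def overlapped_time(compute_ms, comm_ms, steps):
    """Total time when communication of step i overlaps compute of step i+1.

    Two-stage pipeline: one initial compute, then the slower stage sets
    the pace for the remaining steps, then one final transfer drains.
    """
    return compute_ms + (steps - 1) * max(compute_ms, comm_ms) + comm_ms


# Hypothetical example inputs: 1.0 ms compute and 0.4 ms transfer per step
print(serial_time(1.0, 0.4, 100))
print(overlapped_time(1.0, 0.4, 100))
```

When communication per step is shorter than compute, the overlapped total approaches the compute-only time, which is what hiding the communication overhead means in this model.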

HyperDex Software

Plug & play solution for seamlessly serving generative AI applications on HyperAccel hardware. Support for standard ML frameworks for inference (e.g., PyTorch, ONNX, vLLM), with SDKs for further optimization, deployment, and profiling based on user needs.

Specifications

FP16: 384 TFLOPS

FP8: 768 TFLOPS

Target Frequency: 1.5 GHz

Number System: FP16/FP8/FP4, INT8/INT4

DRAM Bandwidth: LPDDR5X, 546 GB/s

DRAM Size: 128 GB (up to 256 GB)

SRAM Size: 256 MB

TDP: 250 W

Batch Size: 1-1024

Form factor: Dual Slot
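
The memory figures above allow a rough back-of-the-envelope sizing. The sketch below estimates how many parameters fit in the base 128 GB of DRAM for each listed number format, and how long one full read of that DRAM takes at the quoted bandwidth. It assumes 1 GB = 10^9 bytes and that weights occupy all of DRAM, ignoring KV cache, activations, and runtime overhead, so real capacity would be lower. The helper names are illustrative.

```python
DRAM_BYTES = 128e9  # base DRAM size from the specifications (GB taken as 1e9 bytes)
DRAM_BW = 546e9     # DRAM bandwidth from the specifications, in bytes/s

# Bits per value for each number format listed in the specifications
BITS = {"FP16": 16, "FP8": 8, "FP4": 4, "INT8": 8, "INT4": 4}


def max_params(dram_bytes, bits):
    """Upper bound on parameter count if weights filled all of DRAM."""
    return dram_bytes * 8 / bits


def full_pass_seconds(num_bytes, bandwidth):
    """Time to stream `num_bytes` once at the given bandwidth."""
    return num_bytes / bandwidth


for fmt, bits in BITS.items():
    print(f"{fmt}: up to {max_params(DRAM_BYTES, bits) / 1e9:.0f}B parameters")

print(f"one full DRAM read: {full_pass_seconds(DRAM_BYTES, DRAM_BW):.3f} s")
```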


Accelerate What’s Next in AI