Forte 55X

Running Generative AI at Unrivaled Speed

Optimized for low-latency, power-efficient AI inference by bringing together LPU technology with the HBM-equipped AMD Alveo U55C high-performance compute card

Enabling data privacy and customization for on-premise datacenters

Forte 55X (F55X) is an industry-leading accelerator for high-speed AI inference. With a streamlined architecture and 460 GB/s of HBM bandwidth, F55X achieves both lower latency and lower power consumption than state-of-the-art GPUs. F55X is optimized for single-batch inference to ensure data privacy.
F55X utilizes AMD Xilinx’s U55C FPGA for ultimate programmability, allowing user-specific development and post-sales optimization. F55X is being co-marketed with AMD after functional and performance validation.


Performance

Comparison with NVIDIA L4

Throughput (tokens/sec)

Model       8x NVIDIA L4    8x Forte 55X
OPT 66B     13.9            23.7
OPT 30B     27.4            46.5
OPT 6.7B    103.4           175.8
OPT 1.3B    306.4           520.9

1.7× higher throughput than the competitor

Efficiency (tokens/sec/kW)

1x NVIDIA L4    243.5
1x Forte 55X    343.8

1.42× higher efficiency than the competitor
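The headline speedups can be reproduced directly from the chart values above (a minimal sketch; the figures are read off the published charts, with the larger value in each pair taken to be Forte 55X):

```python
# Throughput per model, tokens/sec (values from the comparison chart)
throughput = {
    "OPT 66B":  {"8x NVIDIA L4": 13.9,  "8x Forte 55X": 23.7},
    "OPT 30B":  {"8x NVIDIA L4": 27.4,  "8x Forte 55X": 46.5},
    "OPT 6.7B": {"8x NVIDIA L4": 103.4, "8x Forte 55X": 175.8},
    "OPT 1.3B": {"8x NVIDIA L4": 306.4, "8x Forte 55X": 520.9},
}

for model, t in throughput.items():
    ratio = t["8x Forte 55X"] / t["8x NVIDIA L4"]
    print(f"{model}: {ratio:.1f}x")  # ~1.7x at every model size

# Efficiency, tokens/sec/kW (single card each)
print(f"Efficiency: {343.8 / 243.5:.1f}x")  # ~1.4x
```

Notably, the throughput advantage holds steady across model sizes from 1.3B to 66B parameters.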

Key Features

LPU-based Architecture

Streamlined memory access that precisely aligns memory bandwidth with compute bandwidth, sustaining 90% hardware utilization during inference.

SoC Integration

Based on the AMD Alveo U55C FPGA for reconfigurability, power savings, and fast time-to-market. Integrated HBM2 optimized for low-latency workloads. High-speed 100 Gbps Ethernet networking for superior scalability.

Multi-chip Scalability

Custom on-chip network controller that overlaps computation with communication, hiding communication overhead and achieving near-perfect scalability.
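The benefit of overlapping can be seen in a toy timing model (illustrative numbers only, not measured F55X figures): when inter-chip transfers are hidden behind computation, a multi-chip step costs no more than the compute alone.

```python
def step_time(compute_ms: float, comm_ms: float, overlap: bool) -> float:
    """Per-layer step time: communication serialized vs. hidden behind compute."""
    return max(compute_ms, comm_ms) if overlap else compute_ms + comm_ms

# Illustrative per-layer costs: 2.0 ms of compute, 0.5 ms of inter-chip traffic
serial = step_time(compute_ms=2.0, comm_ms=0.5, overlap=False)     # 2.5 ms
overlapped = step_time(compute_ms=2.0, comm_ms=0.5, overlap=True)  # 2.0 ms
print(f"{serial / overlapped:.2f}x step speedup from overlap")
```

As long as communication stays shorter than computation, adding chips divides the compute time without adding visible communication time, which is where the near-perfect scaling comes from.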

HyperDex Software

Plug & play solution for seamlessly serving generative AI applications on HyperAccel hardware. Supports standardized ML inference frameworks (e.g., PyTorch, vLLM), with SDKs for further optimization, deployment, and profiling based on user needs.

Specifications

Target Frequency     200 MHz
Number System        FP16
DRAM Bandwidth       HBM2, 460 GB/s
DRAM Size            16 GB
SRAM Size            2 MB
Power Consumption    75 W
Supported Models     LLMs (e.g., GPT, OPT, Llama, Claude, Phi)
Batch Size           1
Form Factor          Single slot
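The 16 GB HBM2 capacity also explains the 8-card configurations used in the benchmarks above: at FP16, model weights take 2 bytes per parameter, so larger OPT models exceed a single card. A back-of-envelope sketch (weights only, ignoring KV cache and activations):

```python
BYTES_PER_PARAM = 2  # FP16 number system
HBM_GB = 16          # per-card HBM2 capacity

def weight_gb(params_billion: float) -> float:
    """Approximate FP16 weight footprint in GB (2 GB per billion parameters)."""
    return params_billion * BYTES_PER_PARAM

for name, b in [("OPT 1.3B", 1.3), ("OPT 6.7B", 6.7),
                ("OPT 30B", 30.0), ("OPT 66B", 66.0)]:
    verdict = "fits on one card" if weight_gb(b) <= HBM_GB else "needs multiple cards"
    print(f"{name}: ~{weight_gb(b):.1f} GB of weights -> {verdict}")
```

By this estimate OPT 6.7B and smaller fit on a single card, while OPT 30B and OPT 66B must be sharded across several, which is where the multi-chip scalability features come into play.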