Forte 55X

Running Generative AI at Unrivaled Speed

Optimized for low-latency, power-efficient AI inference by bringing together LPU technology with the HBM-equipped AMD Alveo U55C high-performance compute card

Enabling data privacy and customization for on-premise datacenters

Forte 55X (F55X) is an industry-leading accelerator for high-speed AI inference. With a streamlined architecture and 460 GB/s of HBM bandwidth, F55X achieves both lower latency and lower power consumption than state-of-the-art GPUs. F55X is optimized for single-batch inference to ensure data privacy.
F55X utilizes AMD Xilinx’s U55C FPGA for ultimate programmability, allowing user-specific development and post-sales optimization. F55X is being co-marketed with AMD after functional and performance validation.


Performance

Comparison with NVIDIA L4

Throughput (tokens/sec)

Model       8x NVIDIA L4    8x Forte 55X
OPT 66B     13.9            23.7
OPT 30B     27.4            46.5
OPT 6.7B    103.4           175.8
OPT 1.3B    306.4           520.9

1.7× higher throughput than the competitor

Efficiency (tokens/sec/kW)

1x NVIDIA L4    243.5
1x Forte 55X    343.8

1.42× higher efficiency than the competitor
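The headline speedups can be reproduced directly from the chart values above (a minimal sketch; the figures are read off the published charts, with the larger value in each pair taken to be Forte 55X):

```python
# Throughput per model, tokens/sec (values from the comparison chart)
throughput = {
    "OPT 66B":  {"8x NVIDIA L4": 13.9,  "8x Forte 55X": 23.7},
    "OPT 30B":  {"8x NVIDIA L4": 27.4,  "8x Forte 55X": 46.5},
    "OPT 6.7B": {"8x NVIDIA L4": 103.4, "8x Forte 55X": 175.8},
    "OPT 1.3B": {"8x NVIDIA L4": 306.4, "8x Forte 55X": 520.9},
}

for model, t in throughput.items():
    ratio = t["8x Forte 55X"] / t["8x NVIDIA L4"]
    print(f"{model}: {ratio:.1f}x")  # ~1.7x at every model size

# Efficiency, tokens/sec/kW (single card each)
print(f"Efficiency: {343.8 / 243.5:.1f}x")  # ~1.4x
```

Notably, the throughput advantage holds steady across model sizes from 1.3B to 66B parameters.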

Key Features

LPU-based Architecture

Streamlined memory access that precisely aligns memory bandwidth with compute bandwidth, sustaining 90% hardware utilization during inference.

SoC Integration

Based on the AMD Alveo U55C FPGA for reconfigurability, power savings, and fast time-to-market. Integrated HBM2 optimized for low-latency workloads. High-speed 100 Gbps Ethernet networking for superior scalability.

Multi-chip Scalability

Custom on-chip network controller that overlaps computation with communication, hiding communication overhead and achieving near-perfect scalability.
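The benefit of overlapping can be seen in a toy timing model (illustrative numbers only, not measured F55X figures): when inter-chip transfers are hidden behind computation, a multi-chip step costs no more than the compute alone.

```python
def step_time(compute_ms: float, comm_ms: float, overlap: bool) -> float:
    """Per-layer step time: communication serialized vs. hidden behind compute."""
    return max(compute_ms, comm_ms) if overlap else compute_ms + comm_ms

# Illustrative per-layer costs: 2.0 ms of compute, 0.5 ms of inter-chip traffic
serial = step_time(compute_ms=2.0, comm_ms=0.5, overlap=False)     # 2.5 ms
overlapped = step_time(compute_ms=2.0, comm_ms=0.5, overlap=True)  # 2.0 ms
print(f"{serial / overlapped:.2f}x step speedup from overlap")
```

As long as communication stays shorter than computation, adding chips divides the compute time without adding visible communication time, which is where the near-perfect scaling comes from.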

HyperDex Software

Plug & play solution for seamlessly serving generative AI applications on HyperAccel hardware. Supports standardized ML inference frameworks (e.g., PyTorch, vLLM), with SDKs for further optimization, deployment, and profiling based on user needs.

Specifications

Target Frequency     200 MHz
Number System        FP16
DRAM Bandwidth       HBM2, 460 GB/s
DRAM Size            16 GB
SRAM Size            2 MB
Power Consumption    75 W
Supported Models     LLMs (e.g., GPT, OPT, Llama, Claude, Phi)
Batch Size           1
Form Factor          Single slot
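The 16 GB HBM2 capacity also explains the 8-card configurations used in the benchmarks above: at FP16, model weights take 2 bytes per parameter, so larger OPT models exceed a single card. A back-of-envelope sketch (weights only, ignoring KV cache and activations):

```python
BYTES_PER_PARAM = 2  # FP16 number system
HBM_GB = 16          # per-card HBM2 capacity

def weight_gb(params_billion: float) -> float:
    """Approximate FP16 weight footprint in GB (2 GB per billion parameters)."""
    return params_billion * BYTES_PER_PARAM

for name, b in [("OPT 1.3B", 1.3), ("OPT 6.7B", 6.7),
                ("OPT 30B", 30.0), ("OPT 66B", 66.0)]:
    verdict = "fits on one card" if weight_gb(b) <= HBM_GB else "needs multiple cards"
    print(f"{name}: ~{weight_gb(b):.1f} GB of weights -> {verdict}")
```

By this estimate OPT 6.7B and smaller fit on a single card, while OPT 30B and OPT 66B must be sharded across several, which is where the multi-chip scalability features come into play.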