GENERATIVE AI
Made Fast, Efficient, and Affordable
 We Provide Hyper-Accelerated Silicon IP/Solutions for Generative AI
Hyper-Accelerated Hardware Solutions for Emerging AI Applications

Silicon IP


Server


Software

HyperAccel creates a fast, efficient, and affordable inference system that accelerates transformer-based large language models (LLMs) with multi-billion parameters, such as OpenAI GPT and Meta LLaMA.

Our AI chip, the Latency Processing Unit (LPU), is the world's first hardware accelerator dedicated to end-to-end LLM inference.

We provide Hyper-Accelerated Silicon IP/Solutions for emerging Generative AI applications.

World-First AI Processor Dedicated for Hyperscale Generative AI
Latency Processing Unit
Fastest and Most Efficient GenAI Inference
Latency Processing Unit Architecture
The LPU consists of a latency-optimized and highly scalable hardware architecture (e.g., streamlined memory access and a streamlined execution engine) that balances memory bandwidth and compute logic to maintain an effective bandwidth usage of ~90%.
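To illustrate why effective bandwidth usage matters, here is a back-of-envelope sketch in Python. It assumes that single-stream token generation is memory-bound, so each generated token requires streaming all model weights from memory once. This is a common first-order model, not HyperAccel's published methodology. The peak bandwidth (500 GB/s), the 2-byte weights, and the 60% comparison point are hypothetical placeholders; only the ~90% utilization figure comes from the text above.

```python
def decode_tokens_per_sec(params_billion, bytes_per_param, peak_bw_gb_s, utilization):
    """Rough upper bound on memory-bound decode throughput.

    Assumes every generated token reads each weight exactly once and
    ignores KV-cache traffic, compute time, and inter-chip communication.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    effective_bw = peak_bw_gb_s * 1e9 * utilization  # bytes/sec actually achieved
    return effective_bw / model_bytes


# Hypothetical device: 500 GB/s peak memory bandwidth, 2-byte weights.
PEAK_BW_GB_S = 500.0
at_90 = decode_tokens_per_sec(6.7, 2, PEAK_BW_GB_S, 0.90)
at_60 = decode_tokens_per_sec(6.7, 2, PEAK_BW_GB_S, 0.60)

print(f"{at_90:.1f} tokens/sec at 90% utilization")
print(f"{at_60:.1f} tokens/sec at 60% utilization")
```

Under this model, throughput scales linearly with utilization, so the same memory system delivers 1.5x more tokens per second at 90% than at 60%.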
Expandable Synchronization Link (ESL)
ESL is an innovative peer-to-peer communication technology that performs low-latency synchronization. With ESL, LPUs synchronize as soon as a data segment comes out of one LPU. A multi-LPU system can efficiently accelerate hyperscale models that have tens to hundreds of billions of parameters.
Unprecedented Performance and Scalability
Performance of HyperAccel LPU (tokens/sec, 8x LPU, 8:2048)
OPT 66B: 23.6
OPT 30B: 46.5
OPT 6.7B: 175.8
OPT 1.3B: 520.9
Efficiency Analysis of HyperAccel LPU vs. GPU Platform
Edge (tokens/sec/kW, OPT 6.7B, 8:2048)
1x HyperAccel LPU: 343.8
1x NVIDIA L4: 243.5
Ratio: 1.42x
Datacenter (tokens/sec/kW, OPT 66B, 8:2048)
8x HyperAccel LPU: 38.8
2x NVIDIA H100*: 29.5
Ratio: 1.33x
* 8x LPU and 2x H100 have similar cost
Scalability Analysis of HyperAccel LPU vs. GPU Platform
Scaling factors shown in the chart: 1.8X, 1.74X, 1.72X
1 LPU vs. 8 LPUs on 100 Short-Sentence Generation using LLaMA-7B

1 LPU (49.2 sec)

8 LPUs (8.6 sec)
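
The two timings above can be turned into a speedup and a parallel efficiency figure. The short Python sketch below does this arithmetic; the function name `scaling_summary` is illustrative, and the only inputs are the 49.2 sec and 8.6 sec measurements quoted for this demo.

```python
def scaling_summary(t_single_sec, t_multi_sec, n_devices):
    """Return (speedup, parallel efficiency) for a multi-device run."""
    speedup = t_single_sec / t_multi_sec
    efficiency = speedup / n_devices  # 1.0 would mean perfectly linear scaling
    return speedup, efficiency


# Timings from the LLaMA-7B demo: 1 LPU vs. 8 LPUs.
speedup, efficiency = scaling_summary(49.2, 8.6, 8)
print(f"speedup: {speedup:.2f}x, parallel efficiency: {efficiency:.1%}")
# speedup: 5.72x, parallel efficiency: 71.5%
```

On this workload, eight LPUs finish about 5.7 times faster than one, which corresponds to roughly 72% of ideal linear scaling.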

Real-Time LPU Chatbot Application using Gradio
We Are Hiring! Join Our HyperAcceleration
We are looking for brilliant and passionate people to join us and play a major role in building the next big thing in Generative AI. If you enjoy working on cutting-edge technologies and solving complex problems, and you have team spirit, grit, and a can-do attitude, your place is with us!
Join Us Here