GENERATIVE AI
Faster, Cheaper, and Easier
 We Provide Hyper-Accelerated Silicon IP/Solutions for Generative AI
Hyper-Accelerated
Hardware Solutions
for Generative AI Applications

AI Chip/
Silicon IP


Server


Software

HyperAccel creates a fast, efficient, and affordable inference system that accelerates transformer-based large language models (LLMs) with multi-billion parameters, such as OpenAI GPT and Meta LLaMA.

Our AI chip, the Latency Processing Unit (LPU), is the world's first hardware accelerator dedicated to end-to-end inference of LLMs.

We provide Hyper-Accelerated Silicon Chips/Solutions for emerging Generative AI applications.

World's First AI Processor Dedicated to Hyperscale Generative AI
LLM Processing Unit
Fastest and
Most Efficient
GenAI Inference
LLM Processing Unit Architecture
The LPU consists of a latency-optimized, highly scalable hardware architecture (e.g., streamlined memory access and a streamlined execution engine) that perfectly balances memory bandwidth and compute logic to sustain an effective bandwidth utilization of ~90%.
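Why ~90% effective bandwidth matters: autoregressive LLM decoding is typically memory-bandwidth-bound, because every model weight must be streamed from memory for each generated token. A back-of-the-envelope sketch of this relationship (all device figures below are illustrative assumptions, not HyperAccel specifications):

```python
# Roofline-style estimate for bandwidth-bound LLM decoding.
# Decode throughput is roughly (effective memory bandwidth) divided by
# (bytes of weights streamed per token). Illustrative numbers only.

def decode_tokens_per_sec(params_billion: float,
                          bytes_per_param: float,
                          peak_bw_gb_s: float,
                          bw_utilization: float) -> float:
    """Estimate decode throughput for one bandwidth-bound device."""
    weight_gb = params_billion * bytes_per_param   # GB streamed per token
    effective_bw = peak_bw_gb_s * bw_utilization   # usable GB/s
    return effective_bw / weight_gb

# Example: a 6.7B-parameter model in FP16 on a hypothetical device with
# 460 GB/s peak bandwidth at the ~90% utilization the LPU targets.
est = decode_tokens_per_sec(6.7, 2.0, 460.0, 0.90)
print(f"~{est:.0f} tokens/sec")  # ~31 tokens/sec
```

The model shows why utilization is the key lever: at a fixed peak bandwidth, raising utilization from a typical 50-60% to ~90% translates almost directly into higher tokens/sec.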
Expandable Synchronization Link (ESL)
ESL is an innovative peer-to-peer communication technology that performs low-latency synchronization. With ESL, LPUs begin synchronizing as soon as a data segment is produced by one LPU, overlapping communication with computation. A multi-LPU system can therefore efficiently accelerate hyperscale models with tens to hundreds of billions of parameters.
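A toy latency model makes the benefit of segment-level synchronization concrete. This is an assumption-based sketch of the general overlap idea, not HyperAccel's actual protocol: with bulk synchronization a device computes all of its output and only then exchanges it, while with segment streaming each segment's transfer hides behind the computation of the next segment.

```python
# Toy latency model: bulk synchronization vs. segment-level streaming.
# All timing numbers are made up for illustration.

def bulk_sync_us(n_segments: int, compute_us: float, comm_us: float) -> float:
    """Compute every segment, then transfer every segment."""
    return n_segments * (compute_us + comm_us)

def streamed_sync_us(n_segments: int, compute_us: float, comm_us: float) -> float:
    """Forward each segment as soon as it is produced."""
    hidden = min(comm_us, compute_us)  # transfer hidden behind next segment's compute
    # The last segment's transfer has no following compute to hide behind.
    exposed = (n_segments - 1) * (comm_us - hidden) + comm_us
    return n_segments * compute_us + exposed

# 16 segments, 10 us of compute and 4 us of transfer each
print(bulk_sync_us(16, 10.0, 4.0))      # 224.0
print(streamed_sync_us(16, 10.0, 4.0))  # 164.0
```

In this sketch, streaming exposes only the final segment's transfer, so almost all communication time disappears from the critical path, which is the effect the description above attributes to ESL.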
Unprecedented
Performance
and Scalability
Performance of HyperAccel LPU*
8x LPU (8:2040)
OPT 66B: 23.6 tokens/sec
OPT 30B: 46.5 tokens/sec
OPT 6.7B: 175.8 tokens/sec
OPT 1.3B: 520.9 tokens/sec
Efficiency Analysis of
HyperAccel LPU vs GPU Platform
Edge (OPT 6.7B, 8:2040)
1x HyperAccel LPU: 343.8 tokens/sec/kW
1x NVIDIA L4: 243.5 tokens/sec/kW
LPU advantage: 1.42x

Datacenter (OPT 66B, 8:2040)
8x HyperAccel LPU: 38.8 tokens/sec/kW
2x NVIDIA H100*: 29.5 tokens/sec/kW
LPU advantage: 1.33x

* 8x LPU and 2x H100 have similar cost
Scalability Analysis of
HyperAccel LPU vs. GPU Platform
[Scalability chart: speedup factors of 1.8x, 1.74x, and 1.72x]

1 LPU vs. 8 LPUs on 100 Short-Sentence Generation using LLaMA-7B:
1 LPU: 49.2 sec
8 LPUs: 8.6 sec
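The end-to-end speedup implied by the LLaMA-7B figures above can be checked with a quick calculation (49.2 sec on 1 LPU vs. 8.6 sec on 8 LPUs, from the source data):

```python
# Speedup and scaling efficiency from the reported LLaMA-7B benchmark:
# 100 short-sentence generations, 49.2 s on 1 LPU vs. 8.6 s on 8 LPUs.
single_lpu_sec = 49.2
eight_lpu_sec = 8.6
speedup = single_lpu_sec / eight_lpu_sec
efficiency = speedup / 8  # fraction of ideal linear 8x scaling
print(f"{speedup:.2f}x speedup, {efficiency:.0%} scaling efficiency")
# 5.72x speedup, 72% scaling efficiency
```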

Real-Time LPU Chatbot Application using Gradio
We Are Hiring! Join Our HyperAcceleration
We are looking for brilliant and passionate people to join us and play a major role in building the next big thing in Generative AI. If you enjoy working on cutting-edge technologies and solving complex problems, and have team spirit, grit, and a can-do attitude, your place is with us!
Join Us Here