GENERATIVE AI
Made Fast, Efficient, and Affordable
 We Provide Hyper-Accelerated Silicon IP/Solutions for Generative AI
Hyper-Accelerated
Hardware Solutions
for Emerging AI Applications

HyperAccel creates fast, efficient, and affordable inference systems that accelerate transformer-based large language models (LLMs) with multi-billion parameters, such as OpenAI GPT and Meta LLaMA.

Our AI chip, the Latency Processing Unit (LPU), is the world's first hardware accelerator dedicated to the end-to-end inference of LLMs.

We provide Hyper-Accelerated Silicon IP/Solutions for emerging Generative AI applications.

World's First Acceleration Appliance for Hyperscale Generative AI
HYPERACCEL Orion.
The LPU-based datacenter server outperforms the state-of-the-art DGX A100 in text generation workloads, such as ChatGPT, in terms of performance (>30%), cost-effectiveness (>2x), and power efficiency (>2x), with superior accelerator scalability.
Unprecedented
Performance
and Scalability
Performance Comparison Between LPU and GPU Platforms
Edge Server Analysis
1x HyperAccel LPU (0.34) vs. 1x NVIDIA L4 (0.24): 1.42x (OPT 6.7B, 8:2048)
2x HyperAccel LPU (0.36) vs. 2x NVIDIA L4 (0.26): 1.38x (OPT 6.7B, 8:2048)
Datacenter Server Analysis
2x HyperAccel LPU (0.08) vs. 1x NVIDIA H100 (0.07): 1.14x (OPT 30B, 8:2048)
8x HyperAccel LPU (0.04) vs. 2x NVIDIA H100 (0.03): 1.33x (OPT 66B, 8:2048)
Strong Scaling of
HyperAccel Orion vs. NVIDIA DGX A100
[Chart: Orion per-doubling speedups of 1.8x, 1.72x, and 1.74x]
For scalability, Orion achieves a 1.76x speedup on average when doubling the number of devices, whereas DGX A100 achieves a 1.38x speedup.
1 LPU vs. 8 LPUs on Meta AI LLaMA2 7B Model
1 LPU: 49.6 sec
8 LPUs: 8.7 sec
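As a quick sanity check, the 49.6 s and 8.7 s figures above imply a per-doubling speedup in line with the reported 1.72x to 1.8x values. A minimal sketch, assuming the per-doubling average is the geometric mean over the three doublings (our assumption, not a stated methodology):

```python
# Back-of-envelope strong-scaling check using the LLaMA2 7B numbers above.
# Assumption: "speedup per doubling" is the geometric mean over the three
# doublings from 1 LPU to 8 LPUs.

t_1_lpu = 49.6   # seconds on 1 LPU (from the chart above)
t_8_lpu = 8.7    # seconds on 8 LPUs (from the chart above)

overall_speedup = t_1_lpu / t_8_lpu           # ~5.70x end to end
doublings = 3                                 # 1 -> 2 -> 4 -> 8 devices
per_doubling = overall_speedup ** (1 / doublings)

print(f"overall: {overall_speedup:.2f}x, per doubling: {per_doubling:.2f}x")
# overall: 5.70x, per doubling: 1.79x, consistent with the
# ~1.72x-1.8x per-doubling figures reported for Orion above
```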

Fastest and
Most Efficient
GenAI Inference
Latency Processing Unit (LPU)
The LPU is a latency-optimized and highly scalable hardware architecture (e.g., streamlined memory access and a streamlined execution engine) that balances memory bandwidth and compute logic to maintain an effective bandwidth utilization of ~90%.
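For intuition on why effective bandwidth dominates: token-by-token LLM generation is memory-bound, so per-token latency is roughly model size in bytes divided by achieved memory bandwidth. A minimal sketch of that estimate; the 460 GB/s figure, FP16 weights, and the 60% baseline are illustrative assumptions, not HyperAccel specifications:

```python
# Rough estimate of decode throughput for a memory-bound LLM.
# All hardware numbers here are illustrative assumptions, not
# HyperAccel specifications.

def tokens_per_second(n_params: float, peak_bw_gbps: float,
                      bw_utilization: float, bytes_per_param: int = 2) -> float:
    """Each generated token must stream all weights once from memory."""
    model_bytes = n_params * bytes_per_param            # FP16 weights
    effective_bw = peak_bw_gbps * 1e9 * bw_utilization  # bytes/sec achieved
    latency_s = model_bytes / effective_bw              # seconds per token
    return 1.0 / latency_s

# Example: a 6.7B-parameter model on a device with 460 GB/s of memory
# bandwidth. At 90% effective utilization (the figure quoted above)
# vs. a more typical ~60%, the gap in sustained speed is direct:
print(f"90% util: {tokens_per_second(6.7e9, 460, 0.90):.1f} tok/s")
print(f"60% util: {tokens_per_second(6.7e9, 460, 0.60):.1f} tok/s")
```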
Expandable Synchronization Link (ESL)
ESL is an innovative peer-to-peer communication technology that performs low-latency synchronization: with ESL, LPUs synchronize as soon as a data segment is produced by one LPU. A multi-LPU system can therefore efficiently accelerate hyperscale models with tens to hundreds of billions of parameters.
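A toy timing model of the idea: if each data segment is transferred as soon as it is produced, communication overlaps computation and only the final segment's transfer remains on the critical path. The segment count and timings below are illustrative assumptions; this is not a model of ESL internals:

```python
# Toy timing model of why segment-level streaming lowers multi-device
# synchronization latency. Timings are in arbitrary units and are
# illustrative assumptions only.

def bulk_sync(compute_s: float, transfer_s: float) -> float:
    # Conventional approach: finish computing the full output,
    # then transfer all of it to the peer device.
    return compute_s + transfer_s

def streamed_sync(compute_s: float, transfer_s: float, n_segments: int) -> float:
    # Segment streaming: each segment is sent as soon as it is produced,
    # so transfer overlaps compute. Assuming per-segment transfer time
    # does not exceed per-segment compute time, only the last segment's
    # transfer is left exposed.
    seg_transfer = transfer_s / n_segments
    return compute_s + seg_transfer

compute, transfer = 1.0, 0.8
print(bulk_sync(compute, transfer))            # 1.8
print(streamed_sync(compute, transfer, 16))    # 1.05
```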
Software
New Standard of
Software Stack
for LLMs
HyperDex Framework
A new paradigm of compiler technology that bridges datacenter applications, hyperscale models, and LPU-based hardware via a high-level, easy-to-use API and internal optimization tools, ultimately enabling the transition from narrow AI to general AI.
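To make "high-level, easy-to-use API" concrete, here is a purely hypothetical sketch of what such an interface could look like. The `LLM` class, `GenerationConfig`, and every parameter name below are invented for illustration and are not the documented HyperDex API:

```python
# Purely hypothetical sketch of a high-level LLM-serving API of the kind
# described above. Nothing here is the real HyperDex interface; the class
# and argument names are invented for illustration.

from dataclasses import dataclass

@dataclass
class GenerationConfig:
    max_new_tokens: int = 2048
    temperature: float = 0.7

class LLM:
    """Stand-in for a compiler-backed model handle that would map a
    hyperscale model onto one or more LPUs behind the scenes."""

    def __init__(self, model_name: str, num_devices: int = 1):
        self.model_name = model_name
        self.num_devices = num_devices   # e.g., 8 LPUs linked via ESL

    def generate(self, prompt: str, config: GenerationConfig) -> str:
        # A real implementation would compile, shard, and run the model;
        # this stub only illustrates the calling convention.
        return f"[{self.model_name} on {self.num_devices} device(s)] ..."

llm = LLM("meta-llama/Llama-2-7b", num_devices=8)
print(llm.generate("Explain latency processing units.", GenerationConfig()))
```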
We Are Hiring! Join Our HyperAcceleration
We are looking for brilliant and passionate people to join us and play a major role in building the next big thing in Generative AI. If you enjoy working on cutting-edge technologies and solving complex problems, and you bring team spirit, grit, and a can-do attitude, your place is with us!
Join Us Here