Generative AI. Made Fast, Efficient, and Affordable
We Provide Hyper-Accelerated Silicon IP/Solutions for Generative AI

Hyper-Accelerated
Hardware Solutions
for Generative AI Applications

AI Chip / Silicon IP

Server

Software


HyperAccel creates fast, efficient, and affordable inference systems that accelerate transformer-based large language models (LLMs) with multi-billion parameters, such as OpenAI GPT and Meta LLaMA.


Our AI chip, the Latency Processing Unit (LPU), is the world's first hardware accelerator dedicated to end-to-end LLM inference.


We provide Hyper-Accelerated Silicon Chips/Solutions for emerging Generative AI applications.

World's First AI Processor Dedicated to Hyperscale Generative AI

LLM Processing Unit

Fastest and
Most Efficient
GenAI Inference





LLM Processing Unit Architecture


The LPU consists of a latency-optimized and highly scalable hardware architecture (e.g., streamlined memory access and a streamlined execution engine) that precisely balances memory bandwidth and compute logic to maintain an effective bandwidth utilization of ~90%.
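As a rough illustration of why this balance matters, the sketch below estimates effective bandwidth utilization from token throughput, assuming a memory-bound decode phase in which each generated token streams the full set of FP16 weights; the model size, throughput, and peak-bandwidth numbers are illustrative assumptions, not HyperAccel specifications.

```python
# Roofline-style estimate of effective memory-bandwidth utilization during
# LLM token generation. All numeric values are illustrative assumptions.

def effective_bandwidth_utilization(params_billion: float,
                                    bytes_per_param: float,
                                    tokens_per_sec: float,
                                    peak_bandwidth_gb_s: float) -> float:
    """Each generated token streams roughly all weights from memory, so the
    required bandwidth is about (model size in GB) * (tokens per second)."""
    model_size_gb = params_billion * bytes_per_param
    required_gb_s = model_size_gb * tokens_per_sec
    return required_gb_s / peak_bandwidth_gb_s

# Assumed example: a 7B-parameter model in FP16 (2 bytes/param) generating
# 30 tokens/sec on a device with 460 GB/s peak memory bandwidth.
util = effective_bandwidth_utilization(7, 2.0, 30, 460)
print(f"Effective bandwidth utilization: {util:.0%}")  # ~91% under these assumptions
```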











Expandable Synchronization Link (ESL) 


ESL is an innovative peer-to-peer communication technology that performs low-latency synchronization. With ESL, LPUs synchronize as soon as a data segment comes out of one LPU, so communication overlaps with ongoing computation. A multi-LPU system can therefore efficiently accelerate hyperscale models with tens to hundreds of billions of parameters.
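The overlap idea can be sketched generically: a consumer begins synchronizing on each data segment as soon as it is produced, instead of waiting for the full result. The following is a conceptual Python sketch of segment-level overlap, not HyperAccel's ESL protocol or interface.

```python
# Conceptual sketch: per-segment synchronization overlapping with computation.
# Generic producer/consumer illustration, not HyperAccel's ESL design.
import queue
import threading
import time

def producer(out_q: queue.Queue, num_segments: int) -> None:
    for seg_id in range(num_segments):
        time.sleep(0.01)      # stand-in for computing one data segment
        out_q.put(seg_id)     # forward the segment the moment it is ready
    out_q.put(None)           # end-of-stream marker

def consumer(in_q: queue.Queue) -> None:
    while (seg_id := in_q.get()) is not None:
        time.sleep(0.01)      # per-segment synchronization work, overlapped
                              # with the producer's next segment

q: queue.Queue = queue.Queue()
threads = [threading.Thread(target=producer, args=(q, 8)),
           threading.Thread(target=consumer, args=(q,))]
start = time.time()
for t in threads: t.start()
for t in threads: t.join()
print(f"Overlapped pipeline: {time.time() - start:.2f}s "
      "(a fully serialized version would take roughly twice as long)")
```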







Unprecedented
Performance
and Scalability

Performance of HyperAccel LPU* (8x LPU, unit: tokens/sec)

OPT 66B: 23.6
OPT 30B: 46.5
OPT 6.7B: 175.8
OPT 1.3B: 520.9

* Implemented on AMD U55C FPGA
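As a back-of-the-envelope check on the OPT 66B figure above, assuming FP16 weights (2 bytes per parameter) and one full pass over the weights per generated token split across the eight devices (both assumptions are ours, not stated above), the implied memory traffic is:

```python
# Implied memory traffic for the OPT 66B result above (23.6 tokens/sec, 8x LPU).
# Assumptions (ours): FP16 weights and one full weight pass per generated token.
params = 66e9
bytes_per_param = 2.0
tokens_per_sec = 23.6
num_lpus = 8

aggregate_gb_s = params * bytes_per_param * tokens_per_sec / 1e9
per_lpu_gb_s = aggregate_gb_s / num_lpus
print(f"Aggregate weight traffic: ~{aggregate_gb_s:.0f} GB/s")  # ~3115 GB/s
print(f"Per-LPU weight traffic:   ~{per_lpu_gb_s:.0f} GB/s")    # ~389 GB/s
```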

Efficiency Analysis of HyperAccel LPU vs. GPU Platform (unit: tokens/sec/kW)

Edge: 1x HyperAccel LPU 343.8 vs. 1x NVIDIA L4 243.5 (1.42x)
Datacenter: 8x HyperAccel LPU 38.8 vs. 2x NVIDIA H100* 29.5 (1.33x)

Scalability Analysis of HyperAccel LPU vs. GPU Platform

Scaling factors: 1.8x, 1.74x, 1.72x

1 LPU vs. 8 LPUs on 100 short-sentence generations using LLaMA-7B:
1x LPU: 49.2 sec
8x LPU: 8.6 sec
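The end-to-end LLaMA-7B numbers above translate directly into speedup and parallel efficiency, computed here from the values as given:

```python
# Speedup and parallel efficiency implied by the LLaMA-7B comparison above.
time_1_lpu = 49.2   # seconds with 1 LPU
time_8_lpu = 8.6    # seconds with 8 LPUs

speedup = time_1_lpu / time_8_lpu
efficiency = speedup / 8
print(f"Speedup on 8 LPUs:   {speedup:.2f}x")    # ~5.72x
print(f"Parallel efficiency: {efficiency:.0%}")  # ~72%
```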

Real-Time LPU Chatbot Application using Gradio

Built from a deep understanding of transformer-based large language models, our LPU technology targets higher throughput with 10x gains in cost and energy efficiency compared to today's market solutions.

Our 4nm ASIC product, hyper-focused on LLM inference, will launch in 1Q 2026.

32 LPU Cores with Streamlined Dataflow
128 GB LPDDR5X per Chip
Data Types: FP16, BF16, FP8, FP4, INT8, INT4
HW-Native Continuous Batching (see the sketch after this list)
LLMs, Multimodal Models, MoEs
vLLM Compatible with Paged Attention
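As a generic illustration of the continuous-batching technique named in the list above (a scheduling sketch only, not HyperAccel's hardware implementation or the vLLM API), finished sequences leave the running batch and waiting requests join it at every decode step:

```python
# Generic continuous-batching loop: sequences join and leave the running batch
# at token boundaries. Illustrative only; not HyperAccel's implementation.
from collections import deque
from dataclasses import dataclass, field
import random

@dataclass
class Request:
    req_id: int
    remaining_tokens: int
    generated: list = field(default_factory=list)

def continuous_batching(waiting: deque, max_batch: int = 4) -> int:
    running: list = []
    step = 0
    while waiting or running:
        # Admit waiting requests into free batch slots at the token boundary.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: each running sequence produces one token.
        for req in running:
            req.generated.append(f"tok{step}")
            req.remaining_tokens -= 1
        # Retire finished sequences immediately, freeing their slots.
        running = [r for r in running if r.remaining_tokens > 0]
        step += 1
    return step

requests = deque(Request(i, random.randint(2, 6)) for i in range(8))
print(f"Served 8 requests in {continuous_batching(requests)} decode steps")
```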

We Are Hiring! Join Our HyperAcceleration

We are looking for brilliant and passionate people to join us and play a major role in building the next big thing in Generative AI. If you enjoy working on cutting-edge technologies and solving complex problems, and you bring team spirit, grit, and a can-do attitude, your place is with us!