Software

Built for AI Services,
Fully Compatible with Leading AI Frameworks

A user-friendly, full software stack that bridges AI applications, hyperscale models, and LPU hardware into an optimal inference platform

Ecosystem for LPU
Ease of Programming and Deployment

  • Provides a standardized ecosystem for generative AI inference
  • Supports LLM inference and model frameworks such as vLLM and Hugging Face
  • Accommodates all transformer-based LLMs (e.g., GPT, Llama, Qwen, Mistral, Grok, DeepSeek, Falcon, Gemma) and multimodal models
  • Provides PyTorch support and a Python-embedded domain-specific language (eDSL) for authoring high-performance, efficient LPU kernels
  • Offers a developer page and model zoo for easy compilation
  • Implements a device runtime and driver to create and execute binaries on the LPU
  • Enables a seamless LLM inference experience for developers familiar with GPUs
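The kernel-authoring eDSL mentioned above is not documented on this page, so the following is a purely illustrative sketch of how a Python-embedded DSL typically works: a decorator captures a Python function so a compiler/runtime can later lower and dispatch it. The names `lpu_kernel` and `KERNEL_REGISTRY` are invented for illustration and are not HyperAccel's actual API.

```python
# Hypothetical illustration of a Python-embedded DSL (eDSL) for kernels:
# a decorator registers a Python function so a compiler/runtime could
# later lower it to device binaries. All names here are invented for
# illustration and are NOT HyperAccel's real API.

KERNEL_REGISTRY = {}

def lpu_kernel(fn):
    """Register fn as a 'kernel'; a real eDSL would trace and compile it."""
    KERNEL_REGISTRY[fn.__name__] = fn
    return fn

@lpu_kernel
def scaled_add(a, b, scale):
    # Element-wise fused multiply-add: the kind of op a kernel DSL expresses.
    return [scale * x + y for x, y in zip(a, b)]

# Dispatch through the registry, as a runtime would when executing binaries.
out = KERNEL_REGISTRY["scaled_add"]([1.0, 2.0], [10.0, 20.0], 2.0)
print(out)  # [12.0, 24.0]
```

In a real eDSL the decorator would trace the function into an intermediate representation rather than call it directly; the registry-and-decorator pattern shown is the common entry point.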

Key Features

Intra-layer parallelism for self-attention and feed-forward networks

Partitions target model parameters across multiple devices

Optimal memory allocation and alignment of model parameters

Parallel instruction chaining for maximum latency savings
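The first two features above can be sketched in miniature: intra-layer (tensor) parallelism partitions a layer's weight matrix column-wise across devices, each device computes its shard, and the concatenated shard outputs reproduce the single-device result. This is a generic illustration in plain Python, not HyperAccel's implementation; the "devices" here are just list shards.

```python
# Illustrative sketch of intra-layer (tensor) parallelism: a feed-forward
# layer's weight matrix is partitioned column-wise across hypothetical
# devices, each computes its shard, and the shards are concatenated,
# reproducing the unpartitioned result. Generic example, not HyperAccel code.

def matmul(x, w):
    """Multiply vector x (length K) by matrix w (K x N) -> length-N output."""
    return [sum(x[k] * w[k][n] for k in range(len(w))) for n in range(len(w[0]))]

def split_columns(w, num_devices):
    """Partition matrix w column-wise into num_devices equal shards."""
    per = len(w[0]) // num_devices
    return [[row[d * per:(d + 1) * per] for row in w] for d in range(num_devices)]

x = [1.0, 2.0, 3.0]                                 # activation vector
w = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # 3x4 layer weights

shards = split_columns(w, 2)                        # two "devices", 2 columns each
partials = [matmul(x, shard) for shard in shards]   # run in parallel on real hardware
parallel_out = [v for p in partials for v in p]     # concatenate shard outputs

assert parallel_out == matmul(x, w)                 # matches single-device result
print(parallel_out)  # [38.0, 44.0, 50.0, 56.0]
```

Column-wise splits need no communication until the outputs are gathered, which is why self-attention heads and feed-forward projections partition cleanly across devices.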

In Action

HyperAccel vs. Meta Platforms

Running Llama 3.1 on HX-F55X vs. NVIDIA GPU

In Action

HyperAccel x NAVER:
Chatbot Application

Running NAVER HyperCLOVA X on HX-F55X