Software
Built for AI Services,
Fully Compatible with Leading AI Frameworks
A user-friendly, full software stack that bridges AI applications, hyperscale models, and LPU hardware to deliver an optimal inference platform
Ecosystem for LPU
Ease of Programming and Deployment
- Provides a standardized ecosystem for generative AI inference
- Supports LLM inference and model frameworks such as vLLM and Hugging Face (see the serving sketch after this list)
- Accommodates all transformer-based LLMs (e.g., GPT, Llama, Qwen, Mistral, Grok, DeepSeek, Falcon, Gemma) and multi-modal models
- Provides PyTorch support and a Python-embedded domain-specific language (eDSL) for authoring high-performance, efficient LPU kernels (see the kernel sketch after this list)
- Offers a developer page and model zoo for easy compilation
- Implements a device runtime and driver to create and execute binaries on the LPU
- Enables a seamless LLM inference experience for developers familiar with GPUs
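As a minimal sketch of what the Hugging Face path could look like: the tokenizer, model, and generate calls below are standard transformers APIs, while the commented-out `hyperaccel` import and the "lpu" device string are hypothetical placeholders, since the actual SDK entry points are not documented on this page.

```python
# Minimal sketch: running a transformer-based LLM through the standard
# Hugging Face API. The `hyperaccel` import and the "lpu" device string
# are hypothetical placeholders for the actual SDK entry points.
from transformers import AutoModelForCausalLM, AutoTokenizer
# import hyperaccel  # hypothetical: would register the "lpu" device/backend

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# model.to("lpu")  # hypothetical: place the model on LPU hardware

inputs = tokenizer("Explain what an LPU is.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```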
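The eDSL itself is not documented here, so the following is a purely illustrative sketch: `lpu_kernel` is an invented stand-in decorator in the spirit of Python kernel DSLs, and a real implementation would trace the function and compile it to an LPU binary rather than run it in NumPy.

```python
# Purely illustrative sketch of a Python-embedded DSL (eDSL) for LPU
# kernels. `lpu_kernel` is a stand-in; the real eDSL would capture the
# computation and hand it to the LPU compiler/runtime.
import numpy as np

def lpu_kernel(fn):
    # Hypothetical: a real decorator would build a kernel IR here
    # instead of merely tagging the Python function.
    fn.is_lpu_kernel = True
    return fn

@lpu_kernel
def rmsnorm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6):
    # RMSNorm, a common LLM building block: scale by the inverse RMS.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

x = np.random.randn(4, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
print(rmsnorm(x, w).shape)  # (4, 8)
```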
Key Features
Intra-layer parallelism for self-attention and feed-forward networks (see the sketch below)
Partitions target model parameters across multiple devices
Optimal memory allocation and alignment of model parameters
Parallel instruction chaining for maximum latency savings
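To make the intra-layer parallelism idea concrete, here is a minimal PyTorch sketch, not HyperAccel's implementation: a feed-forward weight matrix is partitioned column-wise across two simulated devices, each shard computes its slice independently, and concatenating the shard outputs reproduces the unsharded result.

```python
# Minimal sketch of intra-layer (tensor) parallelism, not HyperAccel's
# implementation: a linear layer's weight is split column-wise so two
# devices each hold half the parameters and compute in parallel.
import torch

torch.manual_seed(0)
x = torch.randn(1, 512)           # one token's hidden state
weight = torch.randn(512, 2048)   # feed-forward up-projection

# Partition the output dimension across two (simulated) devices.
w0, w1 = weight.chunk(2, dim=1)   # each device stores a 512 x 1024 shard

# Each device computes its shard independently; a column-parallel split
# needs no communication until the following row-parallel layer.
y0 = x @ w0
y1 = x @ w1

# Concatenating the shard outputs matches the unsharded computation.
y = torch.cat([y0, y1], dim=1)
assert torch.allclose(y, x @ weight, atol=1e-5)
print(y.shape)  # torch.Size([1, 2048])
```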
In Action
HyperAccel vs. Meta Platforms
Running Llama 3.1 on HX-F55X vs. NVIDIA GPU
In Action
HyperAccel x NAVER:
Chatbot Application
Running NAVER HyperCLOVA X on HX-F55X