Software

Built for AI Services,
Fully Compatible with Leading AI Frameworks

A user-friendly, full software stack that bridges AI applications, hyperscale models, and LPU hardware into an optimal inference platform

Ecosystem for LPU
Ease of Programming and Deployment

  • Provides a standardized ecosystem for generative AI inference
  • Supports LLM inference and model frameworks such as vLLM and Hugging Face
  • Accommodates all transformer-based LLMs (e.g., GPT, Llama, Qwen, Mistral, Grok, DeepSeek, Falcon, Gemma) and multimodal models
  • Provides PyTorch support and a Python-embedded domain-specific language (eDSL) for authoring high-performance, efficient LPU kernels
  • Offers a developer page and model zoo for easy compilation
  • Implements a device runtime and driver to create and execute binaries on the LPU
  • Enables a seamless LLM inference experience for developers familiar with GPUs
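The kernel-authoring eDSL mentioned above is not documented on this page, so the following is a purely illustrative sketch of how a Python-embedded DSL typically works: a decorator captures a Python function so a compiler/runtime can later lower and dispatch it. The names `lpu_kernel` and `KERNEL_REGISTRY` are invented for illustration and are not HyperAccel's actual API.

```python
# Hypothetical illustration of a Python-embedded DSL (eDSL) for kernels:
# a decorator registers a Python function so a compiler/runtime could
# later lower it to device binaries. All names here are invented for
# illustration and are NOT HyperAccel's real API.

KERNEL_REGISTRY = {}

def lpu_kernel(fn):
    """Register fn as a 'kernel'; a real eDSL would trace and compile it."""
    KERNEL_REGISTRY[fn.__name__] = fn
    return fn

@lpu_kernel
def scaled_add(a, b, scale):
    # Element-wise fused multiply-add: the kind of op a kernel DSL expresses.
    return [scale * x + y for x, y in zip(a, b)]

# Dispatch through the registry, as a runtime would when executing binaries.
out = KERNEL_REGISTRY["scaled_add"]([1.0, 2.0], [10.0, 20.0], 2.0)
print(out)  # [12.0, 24.0]
```

In a real eDSL the decorator would trace the function into an intermediate representation rather than call it directly; the registry-and-decorator pattern shown is the common entry point.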

Key Features

Intra-layer parallelism for self-attention and feed-forward networks

Partitions target model parameters across multiple devices

Optimal memory allocation and alignment of model parameters

Parallel instruction chaining for maximum latency savings
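The first two features above can be sketched in miniature: intra-layer (tensor) parallelism partitions a layer's weight matrix column-wise across devices, each device computes its shard, and the concatenated shard outputs reproduce the single-device result. This is a generic illustration in plain Python, not HyperAccel's implementation; the "devices" here are just list shards.

```python
# Illustrative sketch of intra-layer (tensor) parallelism: a feed-forward
# layer's weight matrix is partitioned column-wise across hypothetical
# devices, each computes its shard, and the shards are concatenated,
# reproducing the unpartitioned result. Generic example, not HyperAccel code.

def matmul(x, w):
    """Multiply vector x (length K) by matrix w (K x N) -> length-N output."""
    return [sum(x[k] * w[k][n] for k in range(len(w))) for n in range(len(w[0]))]

def split_columns(w, num_devices):
    """Partition matrix w column-wise into num_devices equal shards."""
    per = len(w[0]) // num_devices
    return [[row[d * per:(d + 1) * per] for row in w] for d in range(num_devices)]

x = [1.0, 2.0, 3.0]                                 # activation vector
w = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # 3x4 layer weights

shards = split_columns(w, 2)                        # two "devices", 2 columns each
partials = [matmul(x, shard) for shard in shards]   # run in parallel on real hardware
parallel_out = [v for p in partials for v in p]     # concatenate shard outputs

assert parallel_out == matmul(x, w)                 # matches single-device result
print(parallel_out)  # [38.0, 44.0, 50.0, 56.0]
```

Column-wise splits need no communication until the outputs are gathered, which is why self-attention heads and feed-forward projections partition cleanly across devices.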

In Action

HyperAccel vs. Meta Platforms

Running Llama 3.1 on HX-F55X vs. NVIDIA GPU

In Action

HyperAccel x NAVER:
Chatbot Application

Running NAVER HyperCLOVA X on HX-F55X