Supported AI Platforms & Why They Matter
DeepExtension is built to bridge the gap between enterprise users and the complexity of LLM training. A key part of that mission is supporting AI platforms that balance performance, accessibility, and developer-friendliness across a variety of hardware and operating systems.
This page explains the currently supported platforms, the rationale behind them, and how they shape the DeepExtension user experience.
Why Platform Choice Matters
The choice of AI platform directly influences:
- Compatibility with popular ML libraries (e.g., PyTorch, TensorFlow)
- Training and inference performance
- Ease of installation and deployment
- Hardware cost and accessibility
For teams without AI expertise or with limited resources, a complex setup can be a deal-breaker. DeepExtension aims to lower the entry barrier without compromising performance, which is why platform support is a strategic decision.
CUDA Platform: Industry Standard for LLM Training
From its early days, DeepExtension has used CUDA as its primary training and inference backend. NVIDIA's CUDA-enabled GPUs remain the de facto standard for LLM work because they offer:
- Full compatibility with PyTorch and TensorFlow
- Optimized support for LLM architectures and large-scale parallelism
- Ecosystem maturity (tools, community, research support)
This makes CUDA the most reliable and performant choice for production-grade training workflows. DeepExtension's training modules, including GRPO (Group Relative Policy Optimization) and SFT (supervised fine-tuning), are optimized for CUDA environments.
CUDA is strongly recommended for enterprise users who require scalable training on open-source foundation models (e.g., Qwen, LLaMA, DeepSeek).
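Before starting a CUDA training job, it can help to confirm that PyTorch actually sees an NVIDIA GPU. The following is a minimal sketch and not part of DeepExtension itself: the helper name `cuda_status` is hypothetical, and it uses only the standard `torch.cuda` calls. If PyTorch is not installed, it reports that instead of failing.

```python
import importlib.util


def cuda_status() -> str:
    """Return a short, human-readable description of CUDA availability.

    Hypothetical helper for illustration; DeepExtension may perform
    its own checks differently.
    """
    # Avoid an ImportError on machines where PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"

    count = torch.cuda.device_count()
    first = torch.cuda.get_device_name(0)
    return f"CUDA ready: {count} device(s), first is {first}"


if __name__ == "__main__":
    print(cuda_status())
```

If the second message appears on a machine that does have an NVIDIA GPU, the usual suspects are a missing driver or a CPU-only PyTorch build.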
MLX on Apple Silicon: Lightweight and Accessible
While CUDA is powerful, it's not easily accessible to all users — especially individual researchers or smaller teams without access to NVIDIA hardware.
Apple's M-series chips (M1–M4) introduced a new opportunity. With a Unified Memory Architecture (UMA) and impressive on-device AI performance, they offer:
- A compact yet capable development environment
- No additional GPU needed — it’s built into the chip
- Quiet, energy-efficient operation ideal for everyday usage
We initially tested PyTorch's Metal Performance Shaders (MPS) backend on macOS, but found its performance inconsistent and its compatibility limited.
Instead, we chose to integrate with MLX, Apple's machine learning framework built specifically for Apple Silicon. MLX offers:
- Superior performance over MPS in real-world scenarios
- Simpler setup and memory management
- High efficiency for small-scale training and experimentation
DeepExtension now includes pre-installed MLX training demos to help new users run their first fine-tuning workflow right from their Mac.
Currently Supported Platforms
| Platform | Backend | Supported OS | Use Cases |
|---|---|---|---|
| CUDA | PyTorch / TensorFlow | Linux, Windows (via WSL 2) | Full-scale model training, production |
| MLX | MLX | macOS (M1–M4) | Local development, small-scale training |
Other platforms are not currently supported, but may be considered based on user demand.
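The table above can be read as a simple lookup from host environment to platform. The sketch below illustrates that mapping under stated assumptions: `supported_platform` is a hypothetical name rather than a DeepExtension API, and it inspects only the operating system and CPU architecture. It does not verify that an NVIDIA GPU or the MLX package is actually present.

```python
import platform
from typing import Optional


def supported_platform(system: str, machine: str) -> Optional[str]:
    """Map a host description to a platform from the table, or None.

    `system` and `machine` are the values returned by
    platform.system() and platform.machine().
    """
    # Apple Silicon Macs report "arm64". A Python interpreter running
    # under Rosetta reports "x86_64" and would not match here.
    if system == "Darwin" and machine == "arm64":
        return "MLX"

    # WSL reports itself as Linux, so Windows-via-WSL lands here too.
    # Assumption: the machine has a CUDA-capable NVIDIA GPU.
    if system == "Linux":
        return "CUDA"

    # Native Windows, Intel Macs, and anything else are not in the table.
    return None


if __name__ == "__main__":
    choice = supported_platform(platform.system(), platform.machine())
    print(choice or "No supported platform detected for this host")
```

Keeping the function free of side effects makes it easy to test against any host description, not only the machine it runs on.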
Roadmap for Future Platform Support
We understand that hardware and ecosystem preferences vary — especially across global and diverse user bases.
While CUDA and MLX meet the majority of current use cases, we are actively collecting feedback to evaluate support for platforms such as:
- Native Windows (for CPU inference and simple UI testing)
- AMD ROCm platform (for open GPU ecosystems)
- ONNX Runtime or TensorRT (for inference-optimized deployment)
If you have specific platform needs or environment constraints, please reach out via Support. Your input directly helps shape our roadmap.
DeepExtension is here to make LLM training more accessible, whether you run in a data center, on a MacBook, or on a startup budget.