# Supported Models

## Model catalog
syndicAI supports the latest frontier-class open-source models. All models are available on all tiers — you choose which model to run when creating your Squad Server.
| Model | Total params | Active params | Context | Architecture | VRAM required |
|---|---|---|---|---|---|
| MiniMax M2.5 | 230B | ~10B | 192K | MoE | ~255GB (AWQ) |
| GLM-5 | 744B | ~45B | 200K | MoE | 200–320GB |
| DeepSeek V3.2 | 671B | ~37B | 128K | MoE | 180–280GB |
| Qwen3-235B | 235B | ~22B | 262K | MoE | 130–180GB |
| Qwen3-Coder-480B | 480B | ~35B | 262K | MoE | 200–320GB |
| Qwen2.5-Coder-32B | 32B | 32B | 128K | Dense | 40–60GB |
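As a quick sanity check before provisioning, you can compare a model's quoted VRAM requirement against a GPU configuration's total memory. The sketch below uses the upper bound of each range from the table above; the dictionary and helper function are illustrative, not part of the syndicAI API, and recommended configs may assume quantization that brings the real requirement below the upper bound.

```python
# Upper-bound VRAM requirement in GB, taken from the catalog table above.
# (Illustrative only — not a syndicAI API.)
VRAM_REQUIRED_GB = {
    "MiniMax M2.5": 255,       # ~255GB with AWQ quantization
    "GLM-5": 320,
    "DeepSeek V3.2": 280,
    "Qwen3-235B": 180,
    "Qwen3-Coder-480B": 320,
    "Qwen2.5-Coder-32B": 60,
}

def fits(model: str, gpu_vram_gb: int, gpu_count: int) -> bool:
    """True if gpu_count GPUs of gpu_vram_gb GB cover the model's
    worst-case VRAM requirement from the table."""
    return gpu_count * gpu_vram_gb >= VRAM_REQUIRED_GB[model]

# A single A100 80GB comfortably fits the 32B dense model...
print(fits("Qwen2.5-Coder-32B", 80, 1))   # True
# ...while GLM-5's upper bound calls for 4x A100 80GB.
print(fits("GLM-5", 80, 4))               # True
```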
## Why Mixture-of-Experts?
Most of our supported models use a Mixture-of-Experts (MoE) architecture. In an MoE model, only a small fraction of the total parameters is activated for each token; for the models above, roughly 4–10% of the total. This means:
- Frontier-class quality: The full parameter count (230B, 671B, 744B) gives the model enormous knowledge capacity
- Efficient inference: Only the active parameters (10B, 37B, 45B) consume compute per token, enabling fast generation
- Reasonable GPU requirements: A 230B MoE model can run on hardware that would struggle with a 230B dense model
This is why open-source models have caught up to proprietary ones — MoE architecture delivers the quality of massive models with the efficiency of smaller ones.
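The per-token compute saving can be read straight off the catalog table — the snippet below just divides each model's active-parameter count by its total:

```python
# Active-parameter fraction per token for the MoE models in the catalog.
# Values are (total_params, active_params) in billions, from the table above.
moe_models = {
    "MiniMax M2.5": (230, 10),
    "GLM-5": (744, 45),
    "DeepSeek V3.2": (671, 37),
    "Qwen3-235B": (235, 22),
    "Qwen3-Coder-480B": (480, 35),
}

for name, (total, active) in moe_models.items():
    pct = 100 * active / total
    print(f"{name}: {pct:.1f}% of parameters active per token")
```

MiniMax M2.5 is the most sparse at roughly 4.3% active, which is why a 230B-total model can generate tokens so quickly.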
## Model details
### MiniMax M2.5
Our recommended default. MiniMax M2.5 is a 230B-parameter MoE model with approximately 10B active parameters per token. It delivers near-Claude-Opus quality on coding tasks with exceptionally fast inference.
- Context window: 192K tokens
- Strengths: Autocomplete, multi-file refactoring, code review, agentic coding workflows
- Benchmark highlights: Competitive with Claude Sonnet 4.6 on SWE-bench, HumanEval, and MBPP
- Reference config: 8× RTX 5090 (254.7 GB VRAM) with AWQ quantization — well above 30 t/s at ~$3.28/hr
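For a rough sense of what the reference config costs per token, a back-of-envelope calculation from the quoted figures (~$3.28/hr, ~30 t/s) looks like this — note it ignores batching, which serves many requests concurrently and lowers the effective per-token cost substantially:

```python
# Back-of-envelope cost per generated token for the MiniMax M2.5
# reference config, using the figures quoted above.
hourly_cost = 3.28        # USD per hour for 8x RTX 5090
tokens_per_sec = 30       # conservative single-stream throughput

tokens_per_hour = tokens_per_sec * 3600   # 108,000 tokens/hr
cost_per_million = hourly_cost / tokens_per_hour * 1_000_000
print(f"${cost_per_million:.2f} per million tokens")  # $30.37 per million tokens
```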
### GLM-5
The largest model in our catalog. GLM-5 is a 744B-parameter MoE model from Zhipu AI, with approximately 45B active parameters.
- Context window: 200K tokens
- Strengths: Complex reasoning, very large codebases, multi-step problem solving
- Benchmark highlights: SWE-bench 77.8%, competitive with frontier proprietary models
- GPU config: 4× A100 80GB or 2× H100 80GB recommended
### DeepSeek V3.2
A strong all-around model. DeepSeek V3.2 is a 671B-parameter MoE model with approximately 37B active parameters.
- Context window: 128K tokens
- Strengths: Balanced coding and reasoning, good instruction following
- Benchmark highlights: Frontier-competitive across coding benchmarks
- GPU config: 2× H100 80GB or 4× A100 80GB recommended
### Qwen3-235B
A versatile MoE model from Alibaba's Qwen team with strong coding and reasoning capabilities.
- Context window: 262K tokens (longest in our catalog)
- Strengths: Long-context tasks, coding, and mathematical reasoning
- GPU config: 2× A100 80GB recommended
### Qwen3-Coder-480B
Purpose-built for code. A 480B MoE model specifically trained for software engineering tasks.
- Context window: 262K tokens
- Strengths: Code generation, test writing, code review, agentic workflows
- GPU config: 4× A100 80GB or 2× H100 80GB recommended
### Qwen2.5-Coder-32B
Our "lightweight" option. A 32B dense model that runs on a single GPU and still delivers GPT-4o-level coding performance.
- Context window: 128K tokens
- Strengths: Fast inference, lower cost, good enough for many coding tasks
- GPU config: 1× A100 80GB (or even 1× A6000 48GB)
## Choosing a model
| Use case | Recommended model |
|---|---|
| Default / best value | MiniMax M2.5 |
| Maximum quality | GLM-5 |
| Budget-conscious | Qwen2.5-Coder-32B |
| Longest context | Qwen3-235B |
| Code-focused | Qwen3-Coder-480B |
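If you select models programmatically, the table above can be encoded as a simple lookup. This helper is purely illustrative (it is not part of the syndicAI API); falling back to the default recommendation keeps unknown use cases safe:

```python
# Illustrative lookup mirroring the "Choosing a model" table above.
RECOMMENDATIONS = {
    "default": "MiniMax M2.5",        # best value
    "max_quality": "GLM-5",
    "budget": "Qwen2.5-Coder-32B",
    "long_context": "Qwen3-235B",
    "code_focused": "Qwen3-Coder-480B",
}

def recommend(use_case: str) -> str:
    """Return the recommended model, defaulting for unknown use cases."""
    return RECOMMENDATIONS.get(use_case, RECOMMENDATIONS["default"])

print(recommend("budget"))         # Qwen2.5-Coder-32B
print(recommend("anything_else"))  # MiniMax M2.5
```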
You can change your model by creating a new Squad Server. Model changes require a new server because the GPU configuration and model weights differ.