Supported Models

Model catalog

syndicAI supports the latest frontier-class open-source models. All models are available on all tiers — you choose which model to run when creating your Squad Server.

| Model | Total params | Active params | Context | Architecture | VRAM required |
|---|---|---|---|---|---|
| MiniMax M2.5 | 230B | ~10B | 192K | MoE | ~255GB (AWQ) |
| GLM-5 | 744B | ~45B | 200K | MoE | 200–320GB |
| DeepSeek V3.2 | 671B | ~37B | 128K | MoE | 180–280GB |
| Qwen3-235B | 235B | ~22B | 262K | MoE | 130–180GB |
| Qwen3-Coder-480B | 480B | ~35B | 262K | MoE | 200–320GB |
| Qwen2.5-Coder-32B | 32B | 32B | 128K | Dense | 40–60GB |
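A useful sanity check on the VRAM column: weight memory alone scales with the *total* (not active) parameter count times numeric precision. The sketch below computes only the weight footprint; the table's figures sit above it because a real deployment also needs KV cache, activation buffers, and runtime overhead.

```python
def weight_vram_gb(total_params_b: float, bits_per_param: float) -> float:
    """Rough VRAM needed just to hold model weights, in GB.

    total_params_b: total parameter count in billions. Note that MoE
    models must keep *all* experts resident, not just the active ones.
    bits_per_param: numeric precision, e.g. 16 (FP16/BF16), 8 (FP8/INT8),
    or 4 (typical 4-bit AWQ/GPTQ quantization).
    """
    return total_params_b * 1e9 * (bits_per_param / 8) / 1e9

# A 230B-parameter model at different precisions:
weight_vram_gb(230, 16)  # 460.0 GB in BF16
weight_vram_gb(230, 8)   # 230.0 GB in 8-bit
weight_vram_gb(230, 4)   # 115.0 GB in 4-bit
```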

Why Mixture-of-Experts?

Most of our supported models use a Mixture-of-Experts (MoE) architecture. In an MoE model, only a small fraction of the parameters (typically 5–15% of the total) is activated for each token. This means:

  • Frontier-class quality: The full parameter count (230B, 671B, 744B) gives the model enormous knowledge capacity
  • Efficient inference: Only the active parameters (10B, 37B, 45B) consume compute per token, enabling fast generation
  • Reasonable GPU requirements: A 230B MoE model can run on hardware that would struggle with a 230B dense model

This is a key reason open-source models have closed the gap with proprietary ones: MoE delivers the knowledge capacity of massive models with the per-token compute cost of much smaller ones.
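The routing idea behind MoE can be illustrated with a toy layer: a router scores every expert for the current token, but only the top-k experts actually run. This is a simplified single-token sketch for intuition, not any specific model's implementation (real MoE layers add load balancing, shared experts, and batched routing).

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Toy Mixture-of-Experts layer for a single token vector `x`.

    gate_w:  (d, n_experts) router weights.
    experts: list of n_experts weight matrices, each (d, d).
    Only the top-k experts run per token; the rest stay idle.
    """
    logits = x @ gate_w                     # router score for each expert
    top_k = np.argsort(logits)[-k:]         # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                # softmax over the chosen k only
    # Only k expert matmuls execute; most parameters contribute no compute.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts, k=2)      # 2 of 16 experts active (12.5%)
```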

Model details

MiniMax M2.5

Our recommended default. MiniMax M2.5 is a 230B-parameter MoE model with approximately 10B active parameters per token. It delivers near-Claude-Opus quality on coding tasks with exceptionally fast inference.

  • Context window: 192K tokens
  • Strengths: Autocomplete, multi-file refactoring, code review, agentic coding workflows
  • Benchmark highlights: Competitive with Claude Sonnet 4.6 on SWE-bench, HumanEval, and MBPP
  • Reference config: 8× RTX 5090 (254.7 GB VRAM) with AWQ quantization, sustaining well above 30 t/s at ~$3.28/hr

GLM-5

The largest model in our catalog. GLM-5 is a 744B-parameter MoE model from Zhipu AI, with approximately 45B active parameters.

  • Context window: 200K tokens
  • Strengths: Complex reasoning, very large codebases, multi-step problem solving
  • Benchmark highlights: SWE-bench 77.8%, competitive with frontier proprietary models
  • GPU config: 4× A100 80GB or 4× H100 80GB recommended

DeepSeek V3.2

A strong all-around model. DeepSeek V3.2 is a 671B-parameter MoE model with approximately 37B active parameters.

  • Context window: 128K tokens
  • Strengths: Balanced coding and reasoning, good instruction following
  • Benchmark highlights: Frontier-competitive across coding benchmarks
  • GPU config: 4× H100 80GB or 4× A100 80GB recommended

Qwen3-235B

A versatile MoE model from Alibaba's Qwen team with strong coding and reasoning capabilities.

  • Context window: 262K tokens (tied with Qwen3-Coder-480B for the longest in our catalog)
  • Strengths: Long-context tasks, coding, and mathematical reasoning
  • GPU config: 2× A100 80GB recommended
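Long contexts cost VRAM beyond the weights: the KV cache grows linearly with sequence length. A rough sizing sketch follows; the layer and head dimensions used in the example are illustrative placeholders, not Qwen3-235B's actual architecture.

```python
def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_val=2):
    """Per-sequence KV cache size in GB.

    The leading factor of 2 accounts for storing both a key and a value
    tensor per layer; bytes_per_val=2 assumes FP16/BF16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val / 1e9

# Illustrative dimensions for a large MoE model (hypothetical, not the
# real Qwen3-235B config), filling the full 262K-token window:
kv_cache_gb(262_144, n_layers=94, n_kv_heads=4, head_dim=128)  # ~50 GB
```

Grouped-query attention (few KV heads relative to query heads) is what keeps this figure manageable at 262K tokens; with more KV heads the cache alone could rival the weight memory.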

Qwen3-Coder-480B

Purpose-built for code. A 480B MoE model specifically trained for software engineering tasks.

  • Context window: 262K tokens
  • Strengths: Code generation, test writing, code review, agentic workflows
  • GPU config: 4× A100 80GB or 4× H100 80GB recommended

Qwen2.5-Coder-32B

Our "lightweight" option. A 32B dense model that runs on a single GPU and still delivers GPT-4o-level coding performance.

  • Context window: 128K tokens
  • Strengths: Fast inference, lower cost, good enough for many coding tasks
  • GPU config: 1× A100 80GB (or even 1× A6000 48GB)

Choosing a model

| Use case | Recommended model |
|---|---|
| Default / best value | MiniMax M2.5 |
| Maximum quality | GLM-5 |
| Budget-conscious | Qwen2.5-Coder-32B |
| Longest context | Qwen3-235B |
| Code-focused | Qwen3-Coder-480B |
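The decision table above can be encoded as a simple lookup, which is handy when automating server provisioning. The use-case keys here are illustrative shorthand of our own choosing, not identifiers from any syndicAI API.

```python
# Maps a use case to the recommended model from the table above.
RECOMMENDED = {
    "default": "MiniMax M2.5",
    "max_quality": "GLM-5",
    "budget": "Qwen2.5-Coder-32B",
    "long_context": "Qwen3-235B",
    "code_focused": "Qwen3-Coder-480B",
}

def recommend(use_case: str) -> str:
    # Unknown use cases fall back to the default recommendation.
    return RECOMMENDED.get(use_case, RECOMMENDED["default"])

recommend("budget")       # 'Qwen2.5-Coder-32B'
recommend("prototyping")  # falls back to 'MiniMax M2.5'
```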

You can change your model by creating a new Squad Server. Model changes require a new server because the GPU configuration and model weights differ.