# Supported Models

## Model catalog
syndicAI supports the latest frontier-class open-source models. All models are available on all tiers — you choose which model to run when creating your Squad Server.
| Model | Total params | Active params | Context | Architecture | VRAM required |
|---|---|---|---|---|---|
| MiniMax M2.5 | 230B | ~10B | 192K | MoE | ~255GB (AWQ) |
| GLM-5 | 744B | ~45B | 200K | MoE | 200–320GB |
| DeepSeek V3.2 | 671B | ~37B | 128K | MoE | 180–280GB |
| Qwen3-235B | 235B | ~22B | 262K | MoE | 130–180GB |
| Qwen3-Coder-480B | 480B | ~35B | 262K | MoE | 200–320GB |
| Qwen2.5-Coder-32B | 32B | 32B | 128K | Dense | 40–60GB |
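As a quick sanity check before provisioning, you can compare a model's quoted VRAM requirement against a GPU configuration's total memory. The sketch below uses the upper bound of each range from the table above; the dictionary and helper function are illustrative, not part of the syndicAI API, and recommended configs may assume quantization that brings the real requirement below the upper bound.

```python
# Upper-bound VRAM requirement in GB, taken from the catalog table above.
# (Illustrative only — not a syndicAI API.)
VRAM_REQUIRED_GB = {
    "MiniMax M2.5": 255,       # ~255GB with AWQ quantization
    "GLM-5": 320,
    "DeepSeek V3.2": 280,
    "Qwen3-235B": 180,
    "Qwen3-Coder-480B": 320,
    "Qwen2.5-Coder-32B": 60,
}

def fits(model: str, gpu_vram_gb: int, gpu_count: int) -> bool:
    """True if gpu_count GPUs of gpu_vram_gb GB cover the model's
    worst-case VRAM requirement from the table."""
    return gpu_count * gpu_vram_gb >= VRAM_REQUIRED_GB[model]

# A single A100 80GB comfortably fits the 32B dense model...
print(fits("Qwen2.5-Coder-32B", 80, 1))   # True
# ...while GLM-5's upper bound calls for 4x A100 80GB.
print(fits("GLM-5", 80, 4))               # True
```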
## Why Mixture-of-Experts?
Most of our supported models use a Mixture-of-Experts (MoE) architecture. In an MoE model, only a small fraction of the total parameters is activated for each token; for the models above, roughly 4–10% of the total. This means:
- Frontier-class quality: The full parameter count (230B, 671B, 744B) gives the model enormous knowledge capacity
- Efficient inference: Only the active parameters (10B, 37B, 45B) consume compute per token, enabling fast generation
- Reasonable GPU requirements: A 230B MoE model can run on hardware that would struggle with a 230B dense model
This is why open-source models have caught up to proprietary ones — MoE architecture delivers the quality of massive models with the efficiency of smaller ones.
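The per-token compute saving can be read straight off the catalog table — the snippet below just divides each model's active-parameter count by its total:

```python
# Active-parameter fraction per token for the MoE models in the catalog.
# Values are (total_params, active_params) in billions, from the table above.
moe_models = {
    "MiniMax M2.5": (230, 10),
    "GLM-5": (744, 45),
    "DeepSeek V3.2": (671, 37),
    "Qwen3-235B": (235, 22),
    "Qwen3-Coder-480B": (480, 35),
}

for name, (total, active) in moe_models.items():
    pct = 100 * active / total
    print(f"{name}: {pct:.1f}% of parameters active per token")
```

MiniMax M2.5 is the most sparse at roughly 4.3% active, which is why a 230B-total model can generate tokens so quickly.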
## Model details
### MiniMax M2.5
Our recommended default. MiniMax M2.5 is a 230B-parameter MoE model with approximately 10B active parameters per token. It delivers near-Claude-Opus quality on coding tasks with exceptionally fast inference.
- Context window: 192K tokens
- Strengths: Autocomplete, multi-file refactoring, code review, agentic coding workflows
- Benchmark highlights: Competitive with Claude Sonnet 4.6 on SWE-bench, HumanEval, and MBPP
- Reference config: 8× RTX 5090 (254.7 GB VRAM) with AWQ quantization — well above 30 t/s at ~$3.28/hr
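For a rough sense of what the reference config costs per token, a back-of-envelope calculation from the quoted figures (~$3.28/hr, ~30 t/s) looks like this — note it ignores batching, which serves many requests concurrently and lowers the effective per-token cost substantially:

```python
# Back-of-envelope cost per generated token for the MiniMax M2.5
# reference config, using the figures quoted above.
hourly_cost = 3.28        # USD per hour for 8x RTX 5090
tokens_per_sec = 30       # conservative single-stream throughput

tokens_per_hour = tokens_per_sec * 3600   # 108,000 tokens/hr
cost_per_million = hourly_cost / tokens_per_hour * 1_000_000
print(f"${cost_per_million:.2f} per million tokens")  # $30.37 per million tokens
```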
### GLM-5
The largest model in our catalog. GLM-5 is a 744B-parameter MoE model from Zhipu AI, with approximately 45B active parameters.
- Context window: 200K tokens
- Strengths: Complex reasoning, very large codebases, multi-step problem solving
- Benchmark highlights: SWE-bench 77.8%, competitive with frontier proprietary models
- GPU config: 4× A100 80GB or 2× H100 80GB recommended
### DeepSeek V3.2
A strong all-around model. DeepSeek V3.2 is a 671B-parameter MoE model with approximately 37B active parameters.
- Context window: 128K tokens
- Strengths: Balanced coding and reasoning, good instruction following
- Benchmark highlights: Frontier-competitive across coding benchmarks
- GPU config: 2× H100 80GB or 4× A100 80GB recommended
### Qwen3-235B
A versatile MoE model from Alibaba's Qwen team with strong coding and reasoning capabilities.
- Context window: 262K tokens (longest in our catalog)
- Strengths: Long-context tasks, coding, and mathematical reasoning
- GPU config: 2× A100 80GB recommended
### Qwen3-Coder-480B
Purpose-built for code. A 480B MoE model specifically trained for software engineering tasks.
- Context window: 262K tokens
- Strengths: Code generation, test writing, code review, agentic workflows
- GPU config: 4× A100 80GB or 2× H100 80GB recommended
### Qwen2.5-Coder-32B
Our "lightweight" option. A 32B dense model that runs on a single GPU and still delivers GPT-4o-level coding performance.
- Context window: 128K tokens
- Strengths: Fast inference, lower cost, good enough for many coding tasks
- GPU config: 1× A100 80GB (or even 1× A6000 48GB)
## Choosing a model
| Use case | Recommended model |
|---|---|
| Default / best value | MiniMax M2.5 |
| Maximum quality | GLM-5 |
| Budget-conscious | Qwen2.5-Coder-32B |
| Longest context | Qwen3-235B |
| Code-focused | Qwen3-Coder-480B |
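If you select models programmatically, the table above can be encoded as a simple lookup. This helper is purely illustrative (it is not part of the syndicAI API); falling back to the default recommendation keeps unknown use cases safe:

```python
# Illustrative lookup mirroring the "Choosing a model" table above.
RECOMMENDATIONS = {
    "default": "MiniMax M2.5",        # best value
    "max_quality": "GLM-5",
    "budget": "Qwen2.5-Coder-32B",
    "long_context": "Qwen3-235B",
    "code_focused": "Qwen3-Coder-480B",
}

def recommend(use_case: str) -> str:
    """Return the recommended model, defaulting for unknown use cases."""
    return RECOMMENDATIONS.get(use_case, RECOMMENDATIONS["default"])

print(recommend("budget"))         # Qwen2.5-Coder-32B
print(recommend("anything_else"))  # MiniMax M2.5
```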
You can change your model by creating a new Squad Server. Model changes require a new server because the GPU configuration and model weights differ.