How It Works

The big picture

syndicAI provisions dedicated GPU instances on the spot market, deploys optimized inference engines (vLLM or llama.cpp), and exposes an OpenAI-compatible API for your squad. The entire lifecycle — from provisioning to billing — is managed through syndicAI's control plane so you never touch infrastructure directly.

Your tools (Cursor, aider, SDK)
    ↓  HTTPS + API Key
Squad Server (GPU Node)
    ├── Satellite Proxy (NestJS) — auth, rate limiting, telemetry
    └── Inference Engine (vLLM)  — model serving, token generation
    ↓  Management data only
syndicAI Control Plane
    ├── API (Cloudflare Workers)  — CRUD, billing, lifecycle
    ├── Dashboard (Angular SPA)   — squad management UI
    └── Database (Cloudflare D1)  — accounts, squads, usage
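Because the squad server speaks the OpenAI wire format, any OpenAI-compatible client can talk to it directly. The sketch below builds the request an editor or SDK would send; the base URL, API key, and model name are placeholders, not real values — use whatever your squad's dashboard shows.

```typescript
// Sketch: the request a tool sends to a Squad Server's
// OpenAI-compatible endpoint. URL, key, and model are hypothetical.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assemble the URL, headers, and JSON body of a chat completion call.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`, // validated by the satellite
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  };
}

const req = buildChatRequest(
  "https://my-squad.example.com", // placeholder squad server address
  "sk-squad-example",             // placeholder squad member API key
  "qwen2.5-coder-32b",            // whatever model your squad runs
  [{ role: "user", content: "Write a binary search in Go." }],
);
// Send with: fetch(req.url, { method: req.method, headers: req.headers, body: req.body })
```

The same shape works from Cursor, aider, or any SDK that lets you override the OpenAI base URL — point it at the squad server and supply your member key.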

The key architectural principle: token data (your prompts, code, and model responses) never leaves the GPU node. Only management data (usage counts, health checks, lifecycle events) flows to syndicAI's central systems.

Satellite architecture

Each Squad Server runs a "satellite" — a lightweight NestJS proxy that sits between your tools and the inference engine. The satellite handles:

  • Authentication: Validates API keys against syndicAI's control plane
  • Request routing: Forwards validated requests to the local vLLM instance
  • Usage telemetry: Counts tokens processed and reports aggregate usage (not content) to central
  • Health monitoring: Reports server health status for the dashboard

The satellite runs on the same GPU node as the inference engine. There's no intermediate hop, no third-party routing, and no central proxy. Your requests go directly to your dedicated hardware.
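The satellite's request path can be sketched in a few lines. This is a simplified stand-in, not the real NestJS implementation: the key check is a local allow-list where the real satellite validates against the control plane, and all names are illustrative.

```typescript
// Simplified sketch of the satellite's two jobs on each request:
// gate by API key, and count tokens (never content) for telemetry.

type UsageCounter = { requests: number; tokens: number };

// API-key gate. The real satellite validates against syndicAI's
// control plane; a local Set stands in for that here.
function isAuthorized(apiKey: string, validKeys: Set<string>): boolean {
  return validKeys.has(apiKey);
}

// Aggregate usage reported to central: counts only, no prompt or
// completion text ever enters this structure.
function recordUsage(
  counter: UsageCounter,
  promptTokens: number,
  completionTokens: number,
): UsageCounter {
  return {
    requests: counter.requests + 1,
    tokens: counter.tokens + promptTokens + completionTokens,
  };
}

const keys = new Set(["sk-alice", "sk-bob"]); // hypothetical member keys
let usage: UsageCounter = { requests: 0, tokens: 0 };

if (isAuthorized("sk-alice", keys)) {
  // e.g. a request that used 120 prompt + 480 completion tokens
  usage = recordUsage(usage, 120, 480);
}
```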

Inference engine

syndicAI uses vLLM as the primary inference engine for most models. vLLM provides:

  • Continuous batching: Multiple squad members' requests are batched efficiently, so 5–10 concurrent users experience minimal performance degradation
  • PagedAttention: Efficient GPU memory management for long context windows
  • Speculative decoding: Faster token generation for supported models
  • OpenAI-compatible API: Native /v1/chat/completions and /v1/models endpoints

For smaller models (like Qwen2.5-Coder-32B), llama.cpp may be used for its lower overhead.
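To make continuous batching concrete, here is a toy model of why it helps: a request that arrives while the engine is mid-generation joins the very next decode step instead of waiting for the current batch to drain. This is a deliberately simplified illustration; vLLM's actual scheduler is far more sophisticated.

```typescript
// Toy illustration of continuous batching: each decode step advances
// every active sequence by one token, and new arrivals join immediately.

interface Seq {
  id: string;
  remaining: number; // tokens left to generate
}

// One decode step: every active sequence emits a token; finished
// sequences leave the batch; arrivals join for the next step.
function step(active: Seq[], arrivals: Seq[]): Seq[] {
  const advanced = active
    .map((s) => ({ ...s, remaining: s.remaining - 1 }))
    .filter((s) => s.remaining > 0);
  return advanced.concat(arrivals);
}

let batch: Seq[] = [{ id: "a", remaining: 2 }];
// "b" arrives while "a" is still generating — no waiting for "a" to finish
batch = step(batch, [{ id: "b", remaining: 3 }]);
```

With static batching, "b" would have queued until "a"'s batch completed; with continuous batching it starts generating on the next step, which is why 5–10 concurrent squad members see little degradation.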

Lifecycle

A Squad Server goes through these states:

  1. Creating — GPU instance is being provisioned on the spot market
  2. Provisioning — Docker container is deploying, model weights are loading
  3. Active — Server is running and accepting requests
  4. Idle — No requests received for the idle timeout period; server is running but quiet
  5. Stopped — Server has been auto-stopped after reaching the daily GPU-hour limit
  6. Restarting — Server is spinning back up (on the next day or by manual trigger)

Auto-start and auto-stop ensure you consume GPU-hours only when your squad is actively coding.
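The six states above form a small state machine. The sketch below encodes one plausible set of transitions inferred from the descriptions (the actual control-plane logic may allow others):

```typescript
// Lifecycle states as a state machine. Transitions are inferred from
// the state descriptions above, not taken from the real control plane.

type SquadState =
  | "creating"
  | "provisioning"
  | "active"
  | "idle"
  | "stopped"
  | "restarting";

const transitions: Record<SquadState, SquadState[]> = {
  creating: ["provisioning"],
  provisioning: ["active"],
  active: ["idle", "stopped"],      // idle timeout, or daily limit hit
  idle: ["active", "stopped"],      // wakes on a request, or limit hit
  stopped: ["restarting"],          // next day, or manual trigger
  restarting: ["active"],
};

function canTransition(from: SquadState, to: SquadState): boolean {
  return transitions[from].includes(to);
}
```

Note that a stopped server cannot jump straight to active — it must pass through restarting, which matches the "spinning back up" step described above.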

Security model

  • TLS everywhere: All connections use HTTPS — from your tools to the satellite, and from the satellite to syndicAI's APIs
  • API key authentication: Each squad member has their own API key. Keys are validated against syndicAI's control plane on each request
  • Satellite-first data isolation: Token data (prompts, completions, code context) stays on the GPU node. syndicAI's central systems never see, store, or process your code
  • No logging of token content: The satellite reports aggregate usage metrics (token counts, request counts) but never logs the content of requests or responses
  • Managed infrastructure: GPU instances run in isolated containers with no shared tenancy. Your squad's server is yours alone
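The "management data only" boundary can be shown as a type-level contract: given a completed inference call, the satellite derives a report that structurally cannot carry content. Field names here are illustrative, not the satellite's real schema.

```typescript
// Sketch of the data-isolation boundary: the usage report type has no
// fields for prompt or completion text, so content cannot leak into it.
// Field names are illustrative.

interface InferenceCall {
  prompt: string;          // stays on the GPU node
  completion: string;      // stays on the GPU node
  promptTokens: number;
  completionTokens: number;
}

interface UsageReport {
  promptTokens: number;
  completionTokens: number;
  timestamp: number;
}

function toUsageReport(call: InferenceCall, now: number): UsageReport {
  // Deliberately copies only counts — never call.prompt or call.completion.
  return {
    promptTokens: call.promptTokens,
    completionTokens: call.completionTokens,
    timestamp: now,
  };
}

const report = toUsageReport(
  {
    prompt: "refactor this function",
    completion: "function refactored() { /* ... */ }",
    promptTokens: 42,
    completionTokens: 310,
  },
  1_700_000_000,
);
```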

GPU spot market

syndicAI provisions GPU instances from the spot market — the same datacenter-class hardware (NVIDIA A100, H100) used by major AI labs, available at a fraction of on-demand pricing.

The spot market makes high-end GPUs accessible:

  • A100 80GB: Typically $1.00–1.60/hour on the spot market
  • H100 80GB: Typically $2.00–3.20/hour on the spot market
  • Multi-GPU configurations: 2× or 4× GPU setups for larger models

syndicAI handles all the complexity of spot market provisioning — instance selection, availability monitoring, automatic migration if a spot instance is reclaimed, and graceful shutdown/restart.
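The spot-price ranges above make back-of-envelope budgeting straightforward. As a worked example (usage figures are hypothetical), a squad running a single A100 80GB for 6 GPU-hours a day at $1.30/hour:

```typescript
// Back-of-envelope squad cost at spot prices. The 6 hours/day and
// $1.30/hour figures are example inputs, not syndicAI quotes.

function dailyCost(gpuHours: number, pricePerHour: number): number {
  return gpuHours * pricePerHour;
}

const perDay = dailyCost(6, 1.3);   // ≈ $7.80/day
const perMonth = perDay * 30;       // ≈ $234/month for the whole squad
```

Split across a 5-person squad, that is well under $50 per member per month, which is the economics the daily GPU-hour limit and auto-stop behavior are designed to protect.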