# Why Model Transparency Matters
Most AI products treat their model stack like a trade secret. You interact with a chat interface, you get a response, and somewhere behind the curtain a model you cannot name processed your request at a cost you cannot see.
We think this is wrong. If you are building with AI — if your creative work, your code, your world-building depends on these systems — you deserve to know exactly what is running, why it was chosen, and what it costs. Arcanea publishes its full model stack. Every agent, every model, every routing decision.
## The Free Model Revolution
Something unprecedented happened in early 2026. Chinese labs and open-source projects began releasing frontier-class models with free API tiers. Not “free trial” — genuinely free, with generous rate limits and competitive performance. This changed the economics of agentic engineering overnight.
Here are the models we currently route through OpenCode Zen at zero cost:
- **Qwen 3.6 Plus** (Alibaba). 1M token context window, best-in-class agentic reasoning, strong multilingual performance. The model you reach for when an agent needs an entire codebase in context.
- **MiniMax M2.5** (MiniMax). Highest SWE-Bench score among free models. Exceptional at code generation, refactoring, and debugging. Where Qwen thinks broadly, MiniMax cuts precisely.
- **Kimi K2.5** (Moonshot AI). Strong mathematical reasoning and structured analysis. When a task needs careful step-by-step thinking rather than broad pattern matching, Kimi delivers.
- **GLM 4.7** (Zhipu AI). Balanced general-purpose model with particularly strong creative writing. Generates narrative content that reads naturally rather than mechanically.
- **Big Pickle** (Community). The wildcard. Competitive benchmarks with surprisingly strong creative and conversational abilities. Personality and tone over raw reasoning.
- **GPT-5 Nano** (OpenAI). Lightweight, fast inference, reliable structured output. The workhorse for tasks that need speed over depth.
## Our Routing Philosophy
The naive approach to model routing is simple: find the best model, use it for everything. This is wrong for three reasons.
- **Task fit.** A model that excels at code generation may produce mediocre creative writing. A model with a 1M context window is overkill for a 200-token classification task.
- **Latency.** If an agent needs a quick decision to unblock a pipeline, waiting 5 seconds for a large model when a small model could answer in 200ms is a design failure.
- **Resilience.** If your entire system depends on one model and that API goes down, your system goes down. Routing across multiple providers means no single point of failure.
The Arcanean approach: nine specialized agents, each assigned the model whose strengths match their role. The orchestrator does not pick the “best” model — it picks the right model for each task.
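The idea of "the right model for each task" can be sketched as a selection function keyed on what the task actually needs. The thresholds and short model names below are illustrative assumptions, not the real Arcanea routing logic:

```python
# Illustrative sketch: pick a model by task characteristics rather than
# using one global "best" model. Thresholds and names are assumptions.
def choose_model(context_tokens: int, needs_fast_reply: bool) -> str:
    if needs_fast_reply:
        return "gpt-5-nano"      # speed over depth for quick unblocking decisions
    if context_tokens > 200_000:
        return "qwen-3.6-plus"   # long-context model for codebase-scale input
    return "minimax-m2.5"        # precise coding default for everything else

# A latency-sensitive pipeline step goes to the small, fast model;
# a whole-repo analysis goes to the long-context model.
print(choose_model(500_000, needs_fast_reply=False))  # qwen-3.6-plus
```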
## The Routing Table
The full agent-to-model mapping. This table is not static; we re-evaluate weekly based on benchmark updates, new model releases, and observed production performance.
| Agent | Role | Model | Rationale |
|---|---|---|---|
| Sisyphus | Orchestrator | Qwen 3.6 Plus | 1M context holds the full project state. Best agentic reasoning for task decomposition and delegation. |
| Hephaestus | Coder | MiniMax M2.5 | Highest SWE-Bench at 80.2%. Writes code that passes tests on the first attempt. |
| Oracle | Architect | Big Pickle | Deep, deliberate reasoning for system design. Sees the whole picture before suggesting changes. |
| Prometheus | Researcher | Qwen 3.6 Plus | 1M context ingests entire papers and codebases. The fire-bringer of knowledge. |
| Metis | Strategist | Qwen 3.6 Plus | Long-context reasoning to weigh trade-offs across the entire system. |
| Momus | Reviewer | MiniMax M2.5 | 80.2% SWE-Bench catches what others miss. The honest critic your code needs. |
| Atlas | Coordinator | Kimi K2.5 | Strongest frontend model. Carries the world of integrations and UI work. |
| Librarian | Docs/Research | GLM 4.7 | Multilingual research and knowledge extraction. Natural documentation prose. |
| Explore | Navigator | GPT-5 Nano | Fastest free model. Instant wayfinding through any codebase. |
## Benchmarks That Matter
SWE-Bench is the gold standard for coding ability — it measures whether a model can actually fix real bugs in real codebases. But it is one metric among many. Here is our full evaluation framework:
- **Coding (SWE-Bench).** Can the model fix real software bugs in real codebases?
- **Context utilization.** How well the model actually uses long context, not the advertised max.
- **Speed.** Time to first token and tokens per second.
- **Instruction following.** Does the model do exactly what you asked without drift?
- **Creative quality.** Human-evaluated output for narrative, dialogue, and world-building.
- **Cost.** All free tier, but rate limits and throughput differ.
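One simple way to turn criteria like these into a routing decision is a weighted score per task profile. The scores and weights below are invented placeholders for illustration, not Arcanea's actual evaluation numbers:

```python
# Hypothetical weighted scoring: combine per-criterion scores (0-1)
# into one fitness value for a given task profile. All numbers are
# placeholders, not real benchmark results.
def fitness(scores: dict[str, float], weights: dict[str, float]) -> float:
    return sum(weights.get(k, 0.0) * v for k, v in scores.items())

models = {
    "minimax-m2.5": {"coding": 0.95, "latency": 0.60, "creative": 0.50},
    "gpt-5-nano":   {"coding": 0.55, "latency": 0.95, "creative": 0.40},
}
# A coding-heavy task weights the coding criterion most.
coding_task = {"coding": 0.7, "latency": 0.2, "creative": 0.1}
best = max(models, key=lambda m: fitness(models[m], coding_task))
print(best)  # minimax-m2.5
```

Changing the weights (say, a latency-critical task) flips the winner, which is the whole point of per-task routing.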
## How to Use This Yourself
Everything described here is reproducible. The model routing runs through OpenCode Zen, and the configuration is open. Visit /models for the live arena with current benchmarks and model cards.
To configure the same routing in your own oh-my-opencode setup:
```jsonc
// ~/.config/opencode/oh-my-opencode.json
{
  "agents": {
    "sisyphus": {
      "model": "opencode/qwen3.6-plus-free",
      "variant": "high",
      "fallback_models": ["opencode/minimax-m2.5-free", "opencode/kimi-k2.5-free"]
    },
    "hephaestus": {
      "model": "opencode/minimax-m2.5-free",
      "variant": "medium",
      "fallback_models": ["opencode/kimi-k2.5-free", "opencode/qwen3.6-plus-free"]
    },
    "oracle": {
      "model": "opencode/big-pickle",
      "fallback_models": ["opencode/qwen3.6-plus-free"]
    }
  },
  "model_fallback": true
}
```

Each agent gets a primary model matched to its role, with fallback chains that cascade if a model is rate-limited. The `model_fallback` flag enables automatic switching. The full config with all 9 agents and fallback chains is on the Models page.
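To make the fallback semantics concrete, here is a minimal sketch of how a client could honor a chain like the one in the config above. The `pick_model` helper and the rate-limit check are assumptions for illustration, not part of the oh-my-opencode API:

```python
import json

# Minimal sketch of fallback-chain resolution, assuming a config shaped
# like the example above. The rate_limited set stands in for real API
# error responses; pick_model is a hypothetical helper.
CONFIG = json.loads("""
{
  "agents": {
    "oracle": {
      "model": "opencode/big-pickle",
      "fallback_models": ["opencode/qwen3.6-plus-free"]
    }
  },
  "model_fallback": true
}
""")

def pick_model(agent: str, rate_limited: set[str]) -> str:
    """Return the first model in the agent's chain that is not rate-limited."""
    spec = CONFIG["agents"][agent]
    chain = [spec["model"]] + spec.get("fallback_models", [])
    if not CONFIG.get("model_fallback"):
        chain = chain[:1]  # fallback disabled: only the primary is eligible
    for model in chain:
        if model not in rate_limited:
            return model
    raise RuntimeError(f"all models rate-limited for agent {agent!r}")

print(pick_model("oracle", {"opencode/big-pickle"}))  # opencode/qwen3.6-plus-free
```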
## Weekly Updates
The AI model landscape shifts fast. New models drop weekly. Benchmarks update. Free tiers change. We track all of it.
The /models page updates every week with new model additions, updated benchmark scores, routing table changes, and rate limit updates.
The model arena is not a static page — it is a living document of how we build with AI. Transparency is not a feature. It is a principle.