# Why Model Transparency Matters
Most AI products treat their model stack like a trade secret. You interact with a chat interface, you get a response, and somewhere behind the curtain a model you cannot name processed your request at a cost you cannot see.
We think this is wrong. If you are building with AI — if your creative work, your code, your world-building depends on these systems — you deserve to know exactly what is running, why it was chosen, and what it costs. Arcanea publishes its full model stack. Every agent, every model, every routing decision.
## The Free Model Revolution
Something unprecedented happened in early 2026. Chinese labs and open-source projects began releasing frontier-class models with free API tiers. Not “free trial” — genuinely free, with generous rate limits and competitive performance. This changed the economics of agentic engineering overnight.
Here are the models we currently route through OpenCode Zen at zero cost:
- **Qwen 3.6 Plus** (Alibaba). 1M token context window, best-in-class agentic reasoning, strong multilingual performance. The model you reach for when an agent needs an entire codebase in context.
- **MiniMax M2.5** (MiniMax). Highest SWE-Bench score among free models. Exceptional at code generation, refactoring, and debugging. Where Qwen thinks broadly, MiniMax cuts precisely.
- **Kimi K2.5** (Moonshot AI). Strong mathematical reasoning and structured analysis. When a task needs careful step-by-step thinking rather than broad pattern matching, Kimi delivers.
- **GLM 4.7** (Zhipu AI). Balanced general-purpose model with particularly strong creative writing. Generates narrative content that reads naturally rather than mechanically.
- **Big Pickle** (Community). The wildcard. Competitive benchmarks with surprisingly strong creative and conversational abilities. Personality and tone over raw reasoning.
- **GPT-5 Nano** (OpenAI). Lightweight, fast inference, reliable structured output. The workhorse for tasks that need speed over depth.
## Our Routing Philosophy
The naive approach to model routing is simple: find the best model, use it for everything. This is wrong for three reasons.
- **Task fit.** A model that excels at code generation may produce mediocre creative writing. A model with a 1M context window is overkill for a 200-token classification task.
- **Latency.** If an agent needs a quick decision to unblock a pipeline, waiting 5 seconds for a large model when a small model could answer in 200ms is a design failure.
- **Resilience.** If your entire system depends on one model and that API goes down, your system goes down. Routing across multiple providers means no single point of failure.
The Arcanean approach: nine specialized agents, each assigned the model whose strengths match their role. The orchestrator does not pick the “best” model — it picks the right model for each task.
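The idea of "the right model for each task" can be sketched as a selection function keyed on what the task actually needs. The thresholds and short model names below are illustrative assumptions, not the real Arcanea routing logic:

```python
# Illustrative sketch: pick a model by task characteristics rather than
# using one global "best" model. Thresholds and names are assumptions.
def choose_model(context_tokens: int, needs_fast_reply: bool) -> str:
    if needs_fast_reply:
        return "gpt-5-nano"      # speed over depth for quick unblocking decisions
    if context_tokens > 200_000:
        return "qwen-3.6-plus"   # long-context model for codebase-scale input
    return "minimax-m2.5"        # precise coding default for everything else

# A latency-sensitive pipeline step goes to the small, fast model;
# a whole-repo analysis goes to the long-context model.
print(choose_model(500_000, needs_fast_reply=False))  # qwen-3.6-plus
```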
## The Routing Table
The full agent-to-model mapping. This table is not static; we re-evaluate weekly based on benchmark updates, new model releases, and observed production performance.
| Agent | Role | Model | Rationale |
|---|---|---|---|
| Sisyphus | Orchestrator | Qwen 3.6 Plus | 1M context holds the full project state. Best agentic reasoning for task decomposition and delegation. |
| Hephaestus | Coder | MiniMax M2.5 | Highest SWE-Bench at 80.2%. Writes code that passes tests on the first attempt. |
| Oracle | Architect | Big Pickle | Deep, deliberate reasoning for system design. Sees the whole picture before suggesting changes. |
| Prometheus | Researcher | Qwen 3.6 Plus | 1M context ingests entire papers and codebases. The fire-bringer of knowledge. |
| Metis | Strategist | Qwen 3.6 Plus | Long-context reasoning to weigh trade-offs across the entire system. |
| Momus | Reviewer | MiniMax M2.5 | 80.2% SWE-Bench catches what others miss. The honest critic your code needs. |
| Atlas | Coordinator | Kimi K2.5 | Strongest frontend model. Carries the world of integrations and UI work. |
| Librarian | Docs/Research | GLM 4.7 | Multilingual research and knowledge extraction. Natural documentation prose. |
| Explore | Navigator | GPT-5 Nano | Fastest free model. Instant wayfinding through any codebase. |
## Benchmarks That Matter
SWE-Bench is the gold standard for coding ability — it measures whether a model can actually fix real bugs in real codebases. But it is one metric among many. Here is our full evaluation framework:
- **Coding (SWE-Bench).** Can the model fix real software bugs in real codebases?
- **Context utilization.** How well the model actually uses long context, not the advertised max.
- **Speed.** Time to first token and tokens per second.
- **Instruction following.** Does the model do exactly what you asked without drift?
- **Creative quality.** Human-evaluated output for narrative, dialogue, and world-building.
- **Cost.** All free tier, but rate limits and throughput differ.
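One simple way to turn criteria like these into a routing decision is a weighted score per task profile. The scores and weights below are invented placeholders for illustration, not Arcanea's actual evaluation numbers:

```python
# Hypothetical weighted scoring: combine per-criterion scores (0-1)
# into one fitness value for a given task profile. All numbers are
# placeholders, not real benchmark results.
def fitness(scores: dict[str, float], weights: dict[str, float]) -> float:
    return sum(weights.get(k, 0.0) * v for k, v in scores.items())

models = {
    "minimax-m2.5": {"coding": 0.95, "latency": 0.60, "creative": 0.50},
    "gpt-5-nano":   {"coding": 0.55, "latency": 0.95, "creative": 0.40},
}
# A coding-heavy task weights the coding criterion most.
coding_task = {"coding": 0.7, "latency": 0.2, "creative": 0.1}
best = max(models, key=lambda m: fitness(models[m], coding_task))
print(best)  # minimax-m2.5
```

Changing the weights (say, a latency-critical task) flips the winner, which is the whole point of per-task routing.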
## How to Use This Yourself
Everything described here is reproducible. The model routing runs through OpenCode Zen, and the configuration is open. Visit /models for the live arena with current benchmarks and model cards.
To configure the same routing in your own oh-my-opencode setup:
```jsonc
// ~/.config/opencode/oh-my-opencode.json
{
  "agents": {
    "sisyphus": {
      "model": "opencode/qwen3.6-plus-free",
      "variant": "high",
      "fallback_models": ["opencode/minimax-m2.5-free", "opencode/kimi-k2.5-free"]
    },
    "hephaestus": {
      "model": "opencode/minimax-m2.5-free",
      "variant": "medium",
      "fallback_models": ["opencode/kimi-k2.5-free", "opencode/qwen3.6-plus-free"]
    },
    "oracle": {
      "model": "opencode/big-pickle",
      "fallback_models": ["opencode/qwen3.6-plus-free"]
    }
  },
  "model_fallback": true
}
```

Each agent gets a primary model matched to its role, with fallback chains that cascade if a model is rate-limited. The `model_fallback` flag enables automatic switching. The full config with all 9 agents and fallback chains is on the Models page.
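To make the fallback semantics concrete, here is a minimal sketch of how a client could honor a chain like the one in the config above. The `pick_model` helper and the rate-limit check are assumptions for illustration, not part of the oh-my-opencode API:

```python
import json

# Minimal sketch of fallback-chain resolution, assuming a config shaped
# like the example above. The rate_limited set stands in for real API
# error responses; pick_model is a hypothetical helper.
CONFIG = json.loads("""
{
  "agents": {
    "oracle": {
      "model": "opencode/big-pickle",
      "fallback_models": ["opencode/qwen3.6-plus-free"]
    }
  },
  "model_fallback": true
}
""")

def pick_model(agent: str, rate_limited: set[str]) -> str:
    """Return the first model in the agent's chain that is not rate-limited."""
    spec = CONFIG["agents"][agent]
    chain = [spec["model"]] + spec.get("fallback_models", [])
    if not CONFIG.get("model_fallback"):
        chain = chain[:1]  # fallback disabled: only the primary is eligible
    for model in chain:
        if model not in rate_limited:
            return model
    raise RuntimeError(f"all models rate-limited for agent {agent!r}")

print(pick_model("oracle", {"opencode/big-pickle"}))  # opencode/qwen3.6-plus-free
```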
## Weekly Updates
The AI model landscape shifts fast. New models drop weekly. Benchmarks update. Free tiers change. We track all of it.
The /models page updates every week with new model additions, updated benchmark scores, routing table changes, and rate limit updates.
The model arena is not a static page — it is a living document of how we build with AI. Transparency is not a feature. It is a principle.