Council — Native macOS App for Multi-Model Blind Deliberation

Source: ai-research/council-albertofettucini-2026-06-14.md (GitHub repo + README + FEATURES.md, fetched 2026-06-14), flagged in raw/GitHub_Trending_Weekly_36_-_ghostty-blackhole_LiteDoc_MiMo-Code_Bernini_UniRL_MSA_concord.md

Council is a native macOS app (Swift/SwiftUI, MIT, 81★) that turns “ask one model a hard question” into a structured panel deliberation. The same prompt goes to up to nine seats — each any of twelve backends (Claude · GPT · Gemini · DeepSeek · Grok · Mistral · Perplexity · OpenRouter · Ollama · Apple Intelligence · two custom OpenAI-compatible endpoints) — they answer in parallel, critique each other blind, get scored for how far apart they landed, optionally run one bounded rebuttal round, and the app produces a synthesis that preserves the dissent instead of blending it away. You decide; it logs the decision. It’s the desktop, multi-model embodiment of the “judge panel / perspective-diverse verification” pattern this wiki keeps returning to.

Key Takeaways

The pipeline is the product, not the model. Seven stages: (1) Ask a council of three seats; (2) parallel answers stream live side by side; (3) blind peer review — each advisor critiques the others without knowing authorship, so there’s no brand bias, just the argument; (4) divergence score 0–100 (how far apart, how many camps, who’s the outlier — measures agreement, not correctness); (5) optional bounded debate (one rebuttal round — who moved, who held); (6) synthesis & dissent (a decision-ready distillation plus the outlier’s full answer spotlighted, “because the majority can be confidently wrong together”); (7) you decide and log it in a journal you can revisit to record how it actually turned out.
Model diversity is the point, and personas sharpen it. Distinct per-seat personas (Analyst · Practitioner · Skeptic) plus a Devil’s Advocate role force genuine divergence “not three ways of saying the same thing.” Using different base models gives epistemic diversity that same-model personas can’t — different blind spots, not the same blind spot reworded. ^[inferred — general multi-agent-deliberation principle, consistent with the README’s blind-review rationale]
Preserved dissent is the identity feature. The FEATURES.md spec literally tags “Synthesis with preserved divergence” as “← YOUR IDENTITY FEATURE”: a blended answer that hides who disagreed and why is the exact thing users complain about. Council instead surfaces the outlier separately, with its full answer intact.
BYOK, fully local, no server, no telemetry. API keys live only in the macOS Keychain (masked in UI; never written to disk, exports, logs, or session files); each key is sent only to that provider’s own endpoint over HTTPS (Ollama stays on localhost). You pay providers directly — Council never sits in the middle. This is also its open-source trust story: the code is auditable.
Cost transparency is treated as urgent, not optional. A four-stage pipeline across three models is ~12+ API calls per question, and BYOK means bill shock lands on the user, so Council ships a running token/$ tally, a one-line pre-run estimate (“$0.04 · 12 calls”), and an optional spend alert.
There’s a council CLI with a CI gate. The same CouncilKit engine runs headless: pipe a document in, get JSON out (--json, schema council.cli.v1), and gate CI on divergence with --fail-above 40 (exit 1 if the council disagrees too much). CLI runs land in the app’s history.
Minimal-UI as a design thesis. The spec’s PRIME DIRECTIVE is “clean, calm, minimal — the deliberate opposite of the cluttered dashboards every competitor ships (BoltAI, Msty, TypingMind).” When feature visibility and visual calm conflict, choose calm.
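
The call-count arithmetic behind the pre-run estimate can be sketched in a few lines. This is a minimal illustration, not Council's implementation: it assumes one API call per seat per stage, and `Seat`, `estimate_run`, `format_estimate`, the per-token prices, and the 1,000-token call size are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seat:
    name: str
    usd_per_1k_tokens: float  # hypothetical blended price; 0.0 for local seats


def estimate_run(seats, stages=4, tokens_per_call=1000):
    """Return (total_calls, estimated_usd), assuming one call per seat per stage."""
    calls = stages * len(seats)
    usd = sum(stages * tokens_per_call / 1000 * s.usd_per_1k_tokens for s in seats)
    return calls, usd


def format_estimate(calls, usd):
    """Render the one-line summary in the style quoted in the takeaways."""
    return f"${usd:.2f} · {calls} calls"


# Three metered seats with made-up prices.
seats = [Seat("claude", 0.004), Seat("gpt", 0.003), Seat("gemini", 0.003)]
calls, usd = estimate_run(seats)
print(format_estimate(calls, usd))
```

A seat priced at 0.0 (a local Ollama or on-device seat, say) still adds to the call count but not to the dollar figure, which is why the two numbers are worth showing together.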

Why it matters — deliberation as cheap verification

The wiki’s recurring thesis is that verification is the rate-limiter (see The Verification Frontier): AI leverage compounds where checking is cheap. Council operationalizes one cheap-ish verifier — adversarial cross-examination by independent models — and makes its output legible (a divergence score + a spotlighted dissent) rather than a false-confidence single answer. It’s the multi-model, human-in-the-loop analog of:

the evaluator-optimizer / parallelization shapes in Agent Workflow Patterns;
the single-model /decide contrarian-pass decision memo in Seven Claude Skills Run My Business (Council swaps the one model’s contrarian pass for genuinely different models reviewing blind);
the “judge panel / perspective-diverse verify” orchestration pattern — generate N independent answers from different angles, then let independent critics refute before you commit.
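
A rough sketch of that judge-panel shape, with parallel answers followed by a blind critique round. It is an assumption-laden illustration rather than CouncilKit's code: the seats are plain Python callables standing in for model calls, and `blind_panel`, the “Answer A/B/C” labels, and the critique prompt wording are all invented here.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def blind_panel(prompt, seats, rng=None):
    """seats maps a seat name to a callable(text) -> str (a stand-in for a model call)."""
    rng = rng or random.Random()
    names = list(seats)

    # Stage 1: every seat answers the same prompt in parallel.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        answers = dict(zip(names, pool.map(lambda n: seats[n](prompt), names)))

    # Hide authorship: shuffle the order, then relabel as "Answer A", "Answer B", ...
    order = names[:]
    rng.shuffle(order)
    labels = {name: f"Answer {chr(ord('A') + i)}" for i, name in enumerate(order)}

    # Stage 2: each seat critiques the other seats' answers, seeing labels only.
    reviews = {}
    for reviewer in names:
        packet = "\n\n".join(
            f"{labels[author]}:\n{answers[author]}"
            for author in order
            if author != reviewer
        )
        reviews[reviewer] = seats[reviewer](f"Critique these answers:\n\n{packet}")
    return answers, labels, reviews
```

The `labels` mapping stays with the orchestrator, so authorship can be restored when the results are shown to the human. Relabeling only removes the explicit name; it does nothing about a model recognizing a rival's style from the text itself.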

The honest limit the app states itself: the divergence score “measures agreement, not correctness.” A council can converge and be wrong together — which is exactly why the preserved-dissent view and the human decision gate are load-bearing, not decorative.
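
A toy example makes the “agreement, not correctness” point concrete. The metric below is a stand-in invented for illustration (mean pairwise Jaccard distance over word sets, scaled to 0–100); how Council actually computes its score is not documented in the README, so `toy_divergence` should not be read as its method.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the overlap of the two answers' word sets (0 = identical vocabulary)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return 1 - len(wa & wb) / len(wa | wb)


def toy_divergence(answers):
    """Mean pairwise distance across all answers, scaled to a 0-100 integer."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0
    return round(100 * sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs))


# Three seats that agree perfectly on a wrong fact (the capital is Canberra).
unanimous_but_wrong = ["the capital of australia is sydney"] * 3
# Two seats in one camp, one outlier.
split = ["ship it now", "wait another month", "ship it now"]

print(toy_divergence(unanimous_but_wrong))
print(toy_divergence(split))
```

The unanimous panel scores 0 even though every answer is wrong, while the split panel scores high purely because one seat disagrees. Any agreement-based score shares this blind spot, whatever metric sits underneath it.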

Implementation

Tool/Service: Council (albertofettucini/Council), a native macOS app plus a council CLI, both built on the shared CouncilKit Swift package.

Setup:

App: download the build from the latest release, unzip, drag Council.app to Applications. Requires macOS 14+. It’s unsigned (free solo OSS, no paid Apple cert), so first launch needs right-click → Open → Open (or System Settings → Privacy & Security → “Open Anyway”). Opens normally after.
Build from source: git clone, open Council.xcodeproj (Xcode 16+; Xcode 26 for the Liquid Glass build), ⌘R. No third-party dependencies — pure SwiftUI + Foundation.
Keys: paste your own provider API keys into the masked field (stored in Keychain). Council links you to each provider’s console from the key-entry step if you don’t have one yet.

Cost: the app is free (MIT). You pay each model provider directly with your own keys. Apple Intelligence (on-device) and local Ollama seats are free to run; the pre-run estimate and running tally keep the metered seats honest.

Integration notes: the CLI is the agent-facing surface:

cd CouncilKit && swift build -c release
cp .build/release/council /usr/local/bin/
council keys set claude
council "should we ship now or wait?" --seats claude,gpt,gemini
cat design.md | council "review this" --md     # attach a doc → decision memo
council "..." --json                            # structured output (council.cli.v1)
council "..." --fail-above 40                   # CI gate: exit 1 if divergence too high

The --fail-above gate is the interesting automation primitive: wire a council into a release pipeline so a contentious design/decision document blocks the merge until humans look, while a low-divergence one passes silently.
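
One way to wire that gate into a pipeline step is a thin wrapper that only looks at the exit code. This is a sketch under assumptions: `council_gate` is a hypothetical helper, and combining a piped document with `--fail-above` in a single invocation is inferred from the separate examples above rather than confirmed by the README.

```python
import subprocess


def council_gate(question, doc_text, threshold=40, cmd=("council",)):
    """Return True when the command exits 0, False on any non-zero exit.

    The document text is piped to stdin, mirroring `cat design.md | council ...`.
    `cmd` is overridable so the wrapper can be exercised without the real binary.
    """
    result = subprocess.run(
        [*cmd, question, "--fail-above", str(threshold)],
        input=doc_text,
        text=True,
        capture_output=True,
    )
    return result.returncode == 0
```

A CI step would read the design document, call `council_gate("review this", text)`, and fail the job when it returns False. The wrapper treats every non-zero exit as a block, so a missing key or a network error would also stop the merge; `subprocess.run` raises `FileNotFoundError` if the `council` binary is not on the PATH.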

Try It

Use it as a decision-memo generator for a real call you’re weighing (vendor choice, architecture, positioning). Seat Claude + GPT + Gemini with Analyst / Skeptic / Practitioner personas, run the blind peer-review + debate, and export the synthesis-with-dissent as a paste-ready memo.
Add a Devil’s Advocate seat on any decision you’re already leaning toward — it’s the cheapest way to surface the strongest counter-argument before you commit.
Pipe a draft through the CLI (cat brief.md | council "poke holes in this" --md) to get a multi-model critique without leaving the terminal.
Prototype a CI divergence gate with --fail-above on a low-stakes repo first to feel out a sensible threshold before trusting it to block real merges.
Verify the privacy claim yourself — it’s open source: read where keys are read/written before trusting it with paid API keys (the README claims Keychain-only, never to disk/logs/exports).

Related

Agent Workflow Patterns — Council is parallelization + evaluator-optimizer with a human decision gate, made into a desktop app.
The Verification Frontier — why multi-perspective deliberation is a verification mechanism, and where it stalls (agreement ≠ correctness).
Seven Claude Skills Run My Business — the single-model /decide decision-memo skill with a contrarian pass; Council is its multi-model cousin.
Venice AI — Private LLM Inference — BYOK / privacy-first sibling; Council’s “custom OpenAI-compatible endpoint” seats can point at a Venice (or local llama.cpp / LM Studio / vLLM) box.
Multi-Agent Patterns — the evaluator-optimizer and parallel-evaluation shapes Council implements.
12-Factor Agents — own-your-control-flow framing; Council’s pipeline is a fixed, legible control flow over stateless model calls.

Open Questions

How is the divergence score actually computed? The README describes it (camps, outlier, 0–100) but not the metric (embedding distance? rubric? a judge model?). Worth reading CouncilKit to know what the number means before gating CI on it.
Does blind peer review reliably stay blind when models can sometimes recognize their own / a rival’s style? Unverified.
No formal eval of decision quality. The value prop is plausible and well-designed, but there’s no published study that Council’s synthesis beats a single strong model on real decisions — the wiki’s standing caution that “agreement ≠ correctness” applies to the tool’s own benefit claim.
