SchemaBrain MCP Charter v1.2.0
Status: locked 2026-05-12 as the public design contract for SchemaBrain’s
MCP surface. Living document; version bumps governed by the Versioning
section below. All MCP tools shipped from v0.5 onward conform to this charter
unless explicitly noted in their docstring.
Current charter version: v1.2.0. Past releases are collected in
the Version history section at the end of the page.
Preamble
This charter is the design law for SchemaBrain’s MCP server. It exists because every existing semantic layer and database catalog was designed for humans first (analysts, BI tools, data engineers), with MCP retrofitted on top. SchemaBrain is the opposite: the primary consumer of every tool is an LLM, and the design choices follow from that. This document is for three audiences:
- Contributors adding or modifying MCP tools: every PR is reviewed against the principles and enforcement levels below.
- Operators integrating SchemaBrain into agent stacks: the response envelope and per-tool metadata are the stable contracts you can build on.
- Other MCP authors: SchemaBrain commits publicly to these principles because no canonical “agent-first MCP design” reference exists yet. Adoption, criticism, and divergence are all welcome.
What “agent-first” means concretely
Six design choices follow from “the primary consumer is an LLM”:
| Choice | Human-first server | SchemaBrain |
|---|---|---|
| Definition entry | Hand-authored YAML / API docs | Auto-inferred from schema + behavior |
| Response shape | Optimized for human parsing | Optimized for LLM composition |
| Tool descriptions | What the tool does | When to use it, when not to, what to combine it with |
| Errors | Stack traces / exception types | Recovery contracts (kind + message + next-call hint) |
| Confidence | Implicit / trust the operator | Explicit HIGH / MEDIUM / LOW with provenance |
| Update model | Tied to model deploy | Continuous re-index from observed warehouse traffic |
Principles
1. Status enum, not boolean
Every tool response carries a status enum with six values. The
sixth, refused, was reserved in v1.1 and is emitted today by
get_metric on the PII-block path; the type contract is stable for
additional producers as they ship. A boolean ok / error split
silently lumps partial responses and empty results into “success,”
which is the false-positive trap that turns into a backstab in
production.
| Status | Meaning |
|---|---|
| success | Tool ran, returned the requested data. |
| empty | Tool ran, no matches / no data; this is not an error. E.g. find_relevant_tables returned zero hits. |
| partial | Tool ran, returned some data with caveats. E.g. an enrichment job timed out mid-table; here is what completed. |
| degraded | Tool ran via a fallback path. E.g. keyword retriever used because the embedding store was unavailable. |
| error | Tool could not process. Always paired with a populated error object. |
| refused | Tool ran cleanly and chose to refuse, typically because the query would touch PII or violate an allowlist. Always paired with a populated error object using one of pii_blocked / policy_blocked / allowlist_violation. Emitted today by get_metric on the PII-block path (pii_blocked); policy_blocked and allowlist_violation are reserved producers. |
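The pairing rules in the table above can be checked mechanically. The following is a minimal sketch, not SchemaBrain’s implementation: the `ErrorInfo` class and the `check_status_pairing` function are hypothetical names, and plain dataclasses stand in for the Pydantic models the charter refers to.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# The six status values from the table above.
Status = Literal["success", "empty", "partial", "degraded", "error", "refused"]

# Error kinds the charter allows alongside a "refused" status.
REFUSAL_KINDS = frozenset({"pii_blocked", "policy_blocked", "allowlist_violation"})


@dataclass(frozen=True)
class ErrorInfo:
    """Hypothetical stand-in for the charter's error object."""
    kind: str
    message: str


def check_status_pairing(status: str, error: Optional[ErrorInfo]) -> None:
    """Raise ValueError if status and error are paired inconsistently."""
    if status not in get_args(Status):
        raise ValueError(f"unknown status: {status!r}")
    if status in ("error", "refused") and error is None:
        raise ValueError(f"status {status!r} requires a populated error object")
    if status == "refused" and error.kind not in REFUSAL_KINDS:
        raise ValueError(f"refused cannot carry error kind {error.kind!r}")


# An empty result is a normal outcome and needs no error object.
check_status_pairing("empty", None)
# A refusal must say why, using one of the three refusal kinds.
check_status_pairing("refused", ErrorInfo("pii_blocked", "Metric touches a PII column."))
```

With a boolean flag, the `empty` and `refused` calls above would both have to collapse into either "ok" or "error", which is exactly the ambiguity the enum avoids.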
2. Tool descriptions are “use when” statements, not API docs
API docs describe what a tool does. Agent-first descriptions describe when to use it, when not to, and what to combine it with. This is the single highest-leverage place to influence agent tool-choice behavior: the LLM never reads your code, but it reads every tool description on every turn. Three-rule structure for every tool description:
- Lead with “Use this when…”: orients the LLM’s tool-choice mental model in the first sentence.
- Include “Use X instead when…” or “Don’t use when…”: disambiguates against neighbour tools.
- Name 1–2 common compositions: encodes workflow into the description so the LLM falls into the right flow naturally.
❌ Wrong:
Returns information about a database table, including its columns, data types, and foreign keys.
✅ Right:
Use this when the user names a specific table (e.g. “show me the orders table”). Returns columns with types, foreign keys, and an LLM-generated description. Use find_relevant_tables instead when the user describes the table semantically (“the table with customer data”) rather than by name. Common compositions: chain find_relevant_tables → describe_table for semantic-to-structural queries; chain describe_table → describe_column to drill into a specific column’s join graph.
Verification
A February 2026 arXiv study of 856 real-world MCP tools (Smelly MCP Tool Descriptions) found that 97% have at least one description quality “smell”, most commonly Unclear Purpose, Missing Usage Guidelines, and Unstated Limitations. The three-rule structure above directly attacks the first two; the lint rule (Enforcement level 1) is the cheap mechanical check. Tool descriptions are also tested via blind agent eval: same descriptions, fixed query set, run against Claude / GPT / Gemini. Tool-choice agreement and end-to-end task success rate are tracked over time; the threshold is a calibration knob, not a hardcoded floor, with the first baseline measured once query-log mining surfaces realistic agent intents (see Open items for the staging plan). See Enforcement level 3.
3. Errors are prompts for the next tool call
Every error returns three things: what failed, why, and what to try next. No stack traces, no exception type names, no Python-side jargon. An error is the agent’s opportunity to recover, so give it the recovery path. Error contract:
Initial error-kind registry
The full registry is maintained in code (Pydantic Literal on the kind field).
v1.0 ships with these kinds; additions are minor-version bumps.
| Kind | When |
|---|---|
| unknown_name | Caller referenced a name that doesn’t exist (table, column, schema). |
| malformed_name | Caller passed a name that violates the expected shape (e.g. bare orders instead of schema.orders). |
| missing_credential | A required credential (env var, config) is absent at call time. |
| index_not_ready | A query hit the MCP server before schemabrain index ran successfully. |
| schema_drift | The store and the live source disagree about object existence. |
| cost_cap_exceeded | The configured --max-cost was reached mid-call. |
| internal_error | A bug; the agent should not retry. Logged for repair. |
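To make the “what failed, why, what to try next” contract from Principle 3 concrete, here is a small sketch of how an unknown_name error might carry a recovery path. It is illustrative only: `ToolError`, its `next_call_hint` field, and `unknown_table_error` are assumed names, not the charter’s actual types, and the fuzzy matching via `difflib` is a stand-in for whatever lookup the server really performs.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolError:
    """Hypothetical error object: what failed, why, and what to try next."""
    kind: str                      # one of the registry kinds above
    message: str                   # plain-language cause, no stack trace
    next_call_hint: Optional[str]  # assumed field name for the recovery path


def unknown_table_error(name: str, known_tables: list[str]) -> ToolError:
    """Build an unknown_name error that points the agent at a recovery call."""
    close = difflib.get_close_matches(name, known_tables, n=3)
    if close:
        hint = f"Call describe_table with one of: {', '.join(close)}."
    else:
        hint = f"Call find_relevant_tables(query={name!r}) to search by meaning."
    return ToolError(
        kind="unknown_name",
        message=f"No indexed table is named {name!r}.",
        next_call_hint=hint,
    )


err = unknown_table_error("sales.order", ["sales.orders", "hr.employees"])
print(err.kind, "|", err.message, "|", err.next_call_hint)
```

The point of the sketch is that the error names a next tool call the agent can act on, rather than surfacing an exception type.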
4. Confidence is HIGH/MEDIUM/LOW with per-field provenance
Confidence is reported as a three-bucket enum, not a raw float. Buckets force the server to commit to a trust judgment instead of pushing raw scores into the LLM’s reasoning chain. Floats are kept internally for sorting and calibration; the API surface buckets at the boundary.
Note: this is a design choice, not a research finding. The published calibration literature is split: proper-scoring-rule RL with continuous scores remains competitive on benchmarks. We chose buckets because they expose a smaller surface for the LLM to over-interpret, and because the threshold values are easier to tune from observed agent behavior than a continuous scoring head.
| Bucket | Internal float range | Semantics |
|---|---|---|
| HIGH | ≥ 0.8 | Schema-sourced facts, declared FKs, exact name matches. |
| MEDIUM | ≥ 0.5 and < 0.8 | LLM-generated descriptions with strong context; query-log-inferred joins with multiple observations. |
| LOW | < 0.5 | LLM-generated descriptions with weak context; single-observation inferences. |
| null | n/a | Confidence does not apply (e.g. on a structural facts-only response). |
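Bucketing at the API boundary can be sketched as a single function. The thresholds come from the table above; the function name `bucket_confidence` and the assumption that internal scores lie in [0, 1] are illustrative, not taken from the codebase.

```python
from typing import Optional


def bucket_confidence(score: Optional[float]) -> Optional[str]:
    """Map an internal float score to the HIGH / MEDIUM / LOW wire value."""
    if score is None:
        # Confidence does not apply (e.g. a structural facts-only response).
        return None
    if not 0.0 <= score <= 1.0:
        # Assumption: internal scores are normalised to the unit interval.
        raise ValueError(f"score out of range: {score}")
    if score >= 0.8:
        return "HIGH"
    if score >= 0.5:
        return "MEDIUM"
    return "LOW"


for raw in (0.93, 0.8, 0.62, 0.31, None):
    print(raw, "->", bucket_confidence(raw))
```

Keeping the float internal means the thresholds can be retuned without changing the wire contract.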
5. Tools document composition patterns
Most useful agent behavior over SchemaBrain is multi-tool: discover, then describe, then drill in. The charter declares canonical workflows so the LLM doesn’t have to derive them from scratch every session. Composition patterns live in two places:
- Inside each tool description (Principle 2 already requires “name 1–2 common compositions”).
- In an aggregated workflow reference (this section) for the cases that span more than two tools.
Canonical workflows (v1.0)
| User intent | Workflow |
|---|---|
| “What’s in this database?” | list_indexed_schemas → find_relevant_tables(query="*") |
| “Tell me about a domain (e.g. ‘revenue’)” | find_relevant_tables → describe_table (top 1–3 hits) → describe_column for any low-confidence descriptions |
| “How do these tables relate?” | suggest_joins → describe_table on any bridge tables |
| “I want to aggregate something” | list_metrics → describe_entity (for the bound entity) → get_metric |
| “Show me how others have queried this” | get_example_queries(table_or_column) |
Specs
Response envelope
Every MCP tool returns a Pydantic-typed object conforming to this shape:
follow_up_hints is the lightweight version of composition: the tool names
1–3 next tools the agent might want to call. The agent is free to ignore
them, but they reduce the chance of dead-end branches.
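As a rough illustration of an envelope carrying follow_up_hints, the sketch below uses a plain dataclass. Only `status`, `follow_up_hints`, and the `ToolResponse` name appear in this charter; the `data` and `error` field names and the example payload are assumptions, and the real type spec lives in the codebase rather than here.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ToolResponse:
    """Illustrative envelope; `data` and `error` are assumed field names."""
    status: str
    data: Optional[dict[str, Any]] = None
    error: Optional[dict[str, str]] = None
    follow_up_hints: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The charter describes 1-3 hints; an empty list is assumed to be
        # acceptable for tools with no sensible next step.
        if len(self.follow_up_hints) > 3:
            raise ValueError("follow_up_hints should name at most 3 tools")


response = ToolResponse(
    status="success",
    data={"table": "sales.orders", "columns": ["id", "customer_id", "total"]},
    follow_up_hints=["describe_column", "suggest_joins"],
)
envelope = asdict(response)  # plain dict, ready for JSON serialisation
print(envelope["follow_up_hints"])
```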
Transport integration
SchemaBrain delivers the envelope inside MCP’s structuredContent field,
with a serialized JSON mirror in content[0].text for backward compatibility
with clients that don’t yet read structuredContent. The envelope shape is
published as each tool’s outputSchema so spec-compliant clients can
validate without an out-of-band Pydantic schema. See the
MCP specification on tool results.
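The dual delivery described above (structured payload plus a text mirror) might look like the following sketch. `to_call_tool_result` is a hypothetical helper and the envelope contents are made up; only the `structuredContent` and `content[0].text` placement follows the text above.

```python
import json
from typing import Any


def to_call_tool_result(envelope: dict[str, Any]) -> dict[str, Any]:
    """Wrap an envelope as an MCP tool result with a JSON text mirror."""
    return {
        # Preferred path for clients that read structured output.
        "structuredContent": envelope,
        # Backward-compatible mirror for clients that only read text content.
        "content": [{"type": "text", "text": json.dumps(envelope, sort_keys=True)}],
    }


result = to_call_tool_result({"status": "empty", "follow_up_hints": ["list_indexed_schemas"]})
# Both views decode to the same envelope.
print(json.loads(result["content"][0]["text"]) == result["structuredContent"])
```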
Response size discipline
Per Anthropic’s published guidance, tool responses should stay under ~25k tokens unless explicitly necessary. Tools that can return large payloads expose a response_format parameter:
- concise returns the minimum useful payload (top match, summary fields).
- detailed returns the full structured response.
Default is concise so agents opt in to larger payloads only when needed.
Applies to find_relevant_tables, describe_table, and (when shipped)
get_example_queries. Tools that always return small payloads
(describe_column, suggest_joins at low max_hops) need not implement it.
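A response_format switch could be implemented along these lines. This is a sketch under assumptions: `shape_matches` and the `name` / `summary` / `columns` keys are invented for illustration, and which fields count as “summary fields” is a per-tool decision the charter does not spell out.

```python
from typing import Any


def shape_matches(
    matches: list[dict[str, Any]], response_format: str = "concise"
) -> list[dict[str, Any]]:
    """Trim a ranked match list according to response_format."""
    if response_format == "detailed":
        return matches
    if response_format != "concise":
        raise ValueError(f"unknown response_format: {response_format!r}")
    # Concise: top match only, reduced to its summary fields.
    return [{"name": m["name"], "summary": m["summary"]} for m in matches[:1]]


ranked = [
    {"name": "sales.orders", "summary": "One row per order.", "columns": ["id", "total"]},
    {"name": "sales.refunds", "summary": "One row per refund.", "columns": ["id", "order_id"]},
]
print(shape_matches(ranked))                  # default: concise
print(len(shape_matches(ranked, "detailed")))
```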
Per-tool metadata
Each tool exposes metadata alongside its response (not inside it, since that would pay token cost on every call). The metadata is fetched once per session by the MCP transport layer.
- latency_hint: fast < 100ms, moderate 100ms–1s, slow ≥ 1s.
- idempotent: safe to retry without observable change in outcome.
- side_effects: none = pure compute, read = touches the store / source, write = mutates the store. Only read / none appear on the MCP tool surface; write is reserved for future surfaces (e.g. operator-side apply / import).
| SchemaBrain field | Canonical MCP hint |
|---|---|
| side_effects: "none" | readOnlyHint: true, destructiveHint: false, openWorldHint: false |
| side_effects: "read" | readOnlyHint: true, destructiveHint: false, openWorldHint: true |
| side_effects: "write" | readOnlyHint: false, destructiveHint: true, openWorldHint: true |
| idempotent: true | idempotentHint: true |
| idempotent: false | idempotentHint: false |
Versioning
The charter follows semver:
- Patch (1.0.0 → 1.0.1): clarification, typo fixes, examples added. No shape change.
- Minor (1.0 → 1.1): additive changes. New error kinds, new optional envelope fields, new principles that don’t invalidate prior ones. Backward compatible.
- Major (1.x → 2.0): breaking changes. Removing fields, changing field types, retiring principles. Backward compatibility is guaranteed within a major version only.
The wire charter_version field
emits the shape contract version (major.minor only, e.g. "1.0",
"1.1", "1.2", "2.0"); patch bumps are documentation-only and do not change
the wire emission. A consumer pinning on "1.0" therefore receives all
1.0.x doc clarifications transparently. Consumers can pin or negotiate.
SchemaBrain commits to maintaining the most-recent two major versions
simultaneously when a major bump occurs.
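The relationship between the full charter version and the wire value can be sketched in a few lines. `wire_version` follows the major.minor rule stated above. `can_read` encodes one conservative reading of the compatibility guarantee (same major, emitted minor at least the pinned minor); that reading is an assumption, since explicit negotiation is listed as an open item.

```python
def wire_version(charter_version: str) -> str:
    """Drop the patch component: 'v1.2.0' becomes '1.2'."""
    major, minor, _patch = charter_version.lstrip("v").split(".")
    return f"{major}.{minor}"


def can_read(pinned: str, emitted: str) -> bool:
    """Assumed rule: same major version, and the server is not older than the pin."""
    pinned_major, pinned_minor = (int(part) for part in pinned.split("."))
    emitted_major, emitted_minor = (int(part) for part in emitted.split("."))
    return pinned_major == emitted_major and emitted_minor >= pinned_minor


print(wire_version("v1.2.0"))   # patch bumps never change this value
print(can_read("1.0", "1.2"))   # older pin, newer additive server
print(can_read("1.2", "2.0"))   # crosses a major boundary
```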
Version history
v1.2.0 (2026-05-23) — additive: 2D trust signal
New optional Provenance.inference_method Literal (closed:
manually_authored / llm_suggested / fk_constraint /
dbt_import / observed_in_query_log) names how each fact
was derived. New optional Provenance.validation_state Literal
(closed: draft / applied / confirmed) names how validated
that fact is.
The orthogonal axes replace the pre-1.2 behaviour where every
producer hardcoded confidence="HIGH" regardless of derivation
(which conflated FK-derived joins with LLM-guessed metrics on
the same scale). The confidence field stays; its value is
now derived from the 2D signal via derive_confidence(). Old
clients reading only confidence see a more honest 1D label;
new clients can read the 2D signal directly.
All changes are backward-compatible with v1.0 / v1.1 clients.
The wire charter_version field bumps from "1.1" to "1.2".
Full type spec in schemabrain/mcp/envelope.py.
v1.1.0 (2026-05-15) — additive: refusal taxonomy
Three new ErrorKinds (pii_blocked, policy_blocked,
allowlist_violation); reserved refused status in the Status
literal (no v0.5 / v1 tool emits it; v2’s execute /
validate_query are the first producers); two new optional
Recovery fields (suggested_rewrite, widening_hint) as
the shape v2’s refuse-with-rewrite path will populate.
All changes are backward-compatible with v1.0 clients. The
wire charter_version field bumps from "1.0" to "1.1".
v1.0.1 (2026-05-15) — clarification-only
Replaced internal milestone references with the substantive
trigger they stood for (query-log mining surfacing realistic
agent intents). No shape change.
Enforcement
Three levels, two always-on, one at phase boundaries.
| Level | What it checks | Cost | Cadence |
|---|---|---|---|
| 1. Description lint | Each tool description starts with “Use this when…”, names at least one composition, stays under 500 chars. | $0 | every PR |
| 2. Envelope schema | Every tool response Pydantic-validates against the envelope. Status enum is honored. Required fields are present. | $0 | every PR |
| 3. Blind agent eval | Fixed query set run against Claude, GPT, Gemini. Tool-choice agreement and end-to-end task success rate tracked over time. Initial baseline measured once query-log mining surfaces realistic agent intents; thresholds are a calibration knob, not a hardcoded floor. | ~$5–10 / run | phase boundary (end of v0.5, end of v1, end of v2, …) |
The always-on levels (1 and 2) are implemented in scripts/charter_lint.py, which is
wired into the lint-and-unit job in .github/workflows/ci.yml. The script
loads the live FastMCP server, applies the four Principle 2 description rules
above, then round-trips each tool’s happy path through ToolResponse Pydantic
validation. Contributors can reproduce the gate locally with
python scripts/charter_lint.py; rule logic lives in pure functions that are
unit-tested in tests/test_charter_lint.py.
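For readers who want a feel for what a Level 1 check involves, here is a minimal sketch of a pure lint function over the three checks listed in the enforcement table. It is not the contents of scripts/charter_lint.py: `lint_description` is a hypothetical name, and detecting a composition by looking for an arrow or the word “composition” is a crude assumed heuristic.

```python
def lint_description(description: str) -> list[str]:
    """Return a list of human-readable problems; empty means the text passes."""
    problems = []
    if not description.startswith("Use this when"):
        problems.append('must start with "Use this when"')
    # Assumed heuristic: a composition is signalled by a chain arrow or the word itself.
    if "→" not in description and "composition" not in description.lower():
        problems.append("must name at least one composition")
    if len(description) >= 500:
        problems.append("must stay under 500 characters")
    return problems


bad = "Returns information about a database table, including its columns, data types, and foreign keys."
good = (
    "Use this when the user names a specific table. "
    "Use find_relevant_tables instead when the table is described semantically. "
    "Common compositions: find_relevant_tables → describe_table."
)
print(lint_description(bad))
print(lint_description(good))
```

Keeping the rule logic in a pure function like this is what makes it cheap to unit-test separately from the live server.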
Anti-pattern style
This charter does not maintain a standalone anti-pattern section. Each principle above pairs its rule with one ❌ / ✅ example. Anti-patterns are illustrations of principles, not their own discipline. Rationale: standalone anti-pattern sections (1) tend to multiply unbounded as the project ages, (2) read as judgment of other MCP servers in the ecosystem, and (3) drift in tone from instructional to preachy.
Open items (deferred to future minor versions)
These are known gaps in v1.0. Each will land in a minor version when its implementation reaches readiness.
- Error-kind registry expansion: v1.0 shipped 7 kinds; v1.1 added 3 (pii_blocked, policy_blocked, allowlist_violation) for the refuse-before-execute taxonomy. Real-world agent traffic will surface more (especially around partial results, rate-limiting, transient failures). Further additions remain minor bumps.
- refused status producers: first producer landed in v0.4 (get_metric PII-block path emits refused + pii_blocked). policy_blocked and allowlist_violation remain reserved; the Recovery shape gained suggested_rewrite and widening_hint fields in v1.1 to support the refuse-with-rewrite and refuse-with-widening-hint paths future producers will populate.
- Eval query set: the fixed query set used for Level 3 enforcement is defined and frozen once the query-log mining feature surfaces realistic agent intents from real workloads. Until then, Level 3 runs on a hand-curated starter set.
- charter_version negotiation protocol: v1.0 publishes the version in metadata; explicit client-side negotiation is deferred until multiple major versions exist.
- Cost-hint baselines: cost_hint fields ship in v1.0, but the numbers are extrapolations until measured against the 2026-05-11 cost anchors and beyond.
- Code-execution surface (paradigm watch): Anthropic’s November 2025 code execution with MCP reframes tools as code APIs loaded on demand. SchemaBrain’s find_relevant_tables → describe_table chain is a candidate for a single schemabrain.py module exposing typed Python functions to a code-executing agent. Decision deferred until v0.7 once query-log data confirms which agent composition patterns dominate. Flagged here so we don’t appear blind to the paradigm shift.
How to propose changes
Open a PR against this file with:
- The principle / spec being changed.
- The motivation (one paragraph — what agent behavior surfaces the gap).
- The proposed semver bump (patch / minor / major).
- Backward-compatibility impact, if any.