Mechanism: structured recovery
One-line claim: Every failure response is a contract, not a string. Errors and refusals ship a typed Recovery block (suggested_tool, suggested_args, fuzzy_matches) so the agent can self-heal programmatically instead of parsing English.
Consider a plain-string error such as "Table 'user' not found". An agent reading that has two options: give up, or ask the user. SchemaBrain returns the failure as a typed contract: here’s what failed, here’s the next tool to call, here are the arguments. The agent acts on the contract.
This page documents the four shapes that make recovery actionable.
The envelope: every response carries the same shape
All 12 MCP tools return a typed ToolResponse envelope (schemabrain/mcp/envelope.py).
A validator (ToolResponse._validate_status_data_invariant) enforces that error is populated iff status ∈ {error, refused}. data must be populated for success; other statuses (empty, partial, degraded) allow either a payload or None.
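The invariant above can be sketched with standard-library dataclasses. This is an illustration only, not the contents of schemabrain/mcp/envelope.py: the real envelope is a Pydantic model, and the field name message and the flat ToolError layout are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional

# Status values as named on this page.
Status = Literal["success", "empty", "partial", "degraded", "error", "refused"]


@dataclass(frozen=True)
class ToolError:
    kind: str      # one of the closed error kinds
    message: str   # hypothetical field, for illustration only


@dataclass(frozen=True)
class ToolResponse:
    status: Status
    data: Optional[Any] = None
    error: Optional[ToolError] = None

    def __post_init__(self) -> None:
        failed = self.status in ("error", "refused")
        # error is populated if and only if the status is a failure status.
        if failed != (self.error is not None):
            raise ValueError("error must be set iff status is error or refused")
        # success must carry a payload; empty, partial and degraded may carry None.
        if self.status == "success" and self.data is None:
            raise ValueError("data must be populated when status is success")
```

Rejecting an inconsistent envelope at construction time means the agent never has to handle a failure status that lacks an error, or an error attached to a non-failure status.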
The Recovery block — the agent’s next move
When status ∈ {error, refused}, the ToolError carries a Recovery:
| Field | Purpose |
|---|---|
| suggested_tool | The tool name to call next. The agent’s tool registry has it; no resolution needed. |
| suggested_args | A ready-to-pass arguments dict. The agent fills the call and goes. |
| fuzzy_matches | Plausible alternatives (for unknown_name-shaped errors). The agent can pick one or chain the suggested tool. |
| suggested_rewrite | (v1.1, populated by v2’s refuse-with-rewrite path) A safe version of the original query. |
| widening_hint | (v1.1, populated by allowlist_violation) The scope that would unblock the call. |
The agent reads recovery.suggested_tool, calls it, and moves on.
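A minimal agent-side sketch of that step, assuming the response arrives as a plain dict with the Recovery nested under error.recovery. The dict layout, the registry mapping and the helper names next_call and recover_once are hypothetical, not part of SchemaBrain.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical registry: tool name -> callable that returns a response dict.
Registry = Dict[str, Callable[..., Dict[str, Any]]]


def next_call(response: Dict[str, Any]) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, args) from the Recovery block, or None if there is none."""
    if response.get("status") not in ("error", "refused"):
        return None
    recovery = (response.get("error") or {}).get("recovery") or {}
    tool = recovery.get("suggested_tool")
    if tool is None:
        return None
    return tool, dict(recovery.get("suggested_args") or {})


def recover_once(response: Dict[str, Any], registry: Registry) -> Dict[str, Any]:
    """Follow the recovery hint one step; return the response unchanged otherwise."""
    call = next_call(response)
    if call is None:
        return response
    tool, args = call
    return registry[tool](**args)
```

No English is parsed anywhere in this path: the agent only reads two fields and dispatches through its registry.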
The error-kind registry — closed grammar, not free text
ToolError.kind is a Pydantic Literal — a closed set of 26 strings that the agent can switch on without parsing. Adding a new kind is a minor charter version bump; the wire field announces the version.
An agent that receives kind: "ambiguous_time_dimension" can branch on the literal without guessing, and the suggested recovery is structured: recovery.suggested_args = {"time_dimension": "order.placed_at"} says exactly which candidate the agent should retry with.
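Branching on the closed kind might look like the sketch below. The kind strings are the ones that appear on this page (a small subset of the 26, and unknown_name is assumed to be a literal kind); the action labels and the plan_for helper are hypothetical.

```python
from typing import Any, Dict, Tuple


def plan_for(error: Dict[str, Any]) -> Tuple[str, Any]:
    """Map a ToolError-shaped dict to an (action, payload) pair."""
    kind = error["kind"]
    recovery = error.get("recovery") or {}
    if kind == "ambiguous_time_dimension":
        # suggested_args already names the candidate to retry with.
        return "retry_with_args", recovery.get("suggested_args", {})
    if kind == "unknown_name":
        # fuzzy_matches lists plausible alternatives to pick from.
        return "pick_alternative", recovery.get("fuzzy_matches", [])
    if kind == "allowlist_violation":
        # widening_hint describes the scope that would unblock the call.
        return "request_wider_scope", recovery.get("widening_hint")
    if kind == "pii_blocked":
        return "stop", None
    # Any other kind: fall back to the generic recovery hint, if present.
    if recovery.get("suggested_tool"):
        return "call_suggested_tool", recovery["suggested_tool"]
    return "give_up", None
```

Because the set of kinds is closed, every branch is an exact string comparison and the fallback covers kinds the agent has no special handling for.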
The degraded path — partial success with a structured caveat
Not every fallback is an error. status: "degraded" means the tool ran and returned data, but with a structured warning the agent should respect:
degradation_reason is a closed Literal (DegradationReason) with exactly 3 values:
| Reason | When |
|---|---|
| fan_out_join | The metric chain crossed a one_to_many / many_to_many canonical join; aggregates may be inflated by the cross-product. Agent should consider count_distinct over an identity column. |
| missing_order_by_with_limit | limit is set + group_by is non-empty + order_by is empty; the LIMIT N slice is non-deterministic. Agent should pass order_by= to specify what “top N” means. |
| time_dimension_unavailable | Caller passed time_grain but no reachable timestamp column was found; the plan ships unbucketed. |
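One way an agent could turn the three reasons into a caveat is sketched below. The top-level placement of degradation_reason in the response dict, the ADVICE wording and the caveat_for helper are assumptions for illustration; only the three literal values come from the table above.

```python
from typing import Any, Dict, Literal, Optional, get_args

DegradationReason = Literal[
    "fan_out_join",
    "missing_order_by_with_limit",
    "time_dimension_unavailable",
]

# Hypothetical agent-side notes, paraphrasing the table above.
ADVICE: Dict[str, str] = {
    "fan_out_join": "aggregates may be inflated; consider count_distinct",
    "missing_order_by_with_limit": "pass order_by to make the LIMIT slice deterministic",
    "time_dimension_unavailable": "result is unbucketed; no timestamp column was reachable",
}


def caveat_for(response: Dict[str, Any]) -> Optional[str]:
    """Return the caveat for a degraded response, or None for other statuses."""
    if response.get("status") != "degraded":
        return None
    reason = response.get("degradation_reason")
    if reason not in get_args(DegradationReason):
        raise ValueError(f"unknown degradation_reason: {reason!r}")
    return ADVICE[reason]
```

The data is still usable in the degraded case; the caveat only tells the agent how to qualify or refine the answer.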
The follow_up_hints field — chain the next tool
Most useful agent behavior over SchemaBrain is multi-tool: discover, then describe, then drill in. follow_up_hints is a 1-3 tool-name list every response can carry:
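As a rough sketch of how an agent might consume the hints, assuming follow_up_hints is a top-level list of tool names on the response dict. The next_hint helper and the already_called bookkeeping are hypothetical agent-side code.

```python
from typing import Any, Dict, Optional, Set


def next_hint(response: Dict[str, Any], already_called: Set[str]) -> Optional[str]:
    """Pick the first hinted tool the agent has not called yet, if any."""
    hints = response.get("follow_up_hints") or []
    # The list carries 1-3 tool names; anything beyond three is ignored here.
    for name in hints[:3]:
        if name not in already_called:
            return name
    return None
```

Tracking which tools were already called keeps the chain moving forward (discover, describe, drill in) instead of bouncing between the same two tools.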
What this is not
- It is not a guarantee of recovery success. recovery.suggested_tool is a hint, not a promise. The suggested call might also fail (e.g., find_relevant_entities("user") returns empty). The agent gets another structured response and another recovery hint.
- It is not a defense against bad agent behavior. A pathological agent can ignore recovery entirely and keep retrying the failing call. The audit trail (/mechanism/audit-chain) catches the loop pattern; per-call cost discipline catches the budget impact.
- It is not a substitute for tool descriptions. Recovery contracts catch runtime failures. Tool descriptions (Charter v1.2 Principle 2) prevent the wrong-tool-from-the-start failures. Both layers ship.
- It is not yet a SQL rewrite path. Recovery.suggested_rewrite is reserved in v1.1; no current tool emits it. v2’s planned refuse-with-rewrite primitives are the first producers; today’s refusal responses use suggested_tool instead.
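Since a recovery hint can itself fail, a well-behaved agent would bound how far it follows the chain. The sketch below is one hypothetical way to do that: the dict layout (error.recovery), the registry mapping and the run_with_recovery helper are assumptions, not SchemaBrain APIs.

```python
from typing import Any, Callable, Dict

Registry = Dict[str, Callable[..., Dict[str, Any]]]


def _key(tool: str, args: Dict[str, Any]) -> tuple:
    # Stable fingerprint of a call, used to detect repeats.
    return tool, repr(sorted(args.items()))


def run_with_recovery(tool: str, args: Dict[str, Any],
                      registry: Registry, max_hops: int = 3) -> Dict[str, Any]:
    """Call a tool, then follow at most max_hops recovery hints."""
    seen = {_key(tool, args)}
    response = registry[tool](**args)
    for _ in range(max_hops):
        if response.get("status") not in ("error", "refused"):
            return response
        recovery = (response.get("error") or {}).get("recovery") or {}
        nxt = recovery.get("suggested_tool")
        if nxt is None or nxt not in registry:
            return response  # nothing actionable; surface the failure
        nxt_args = dict(recovery.get("suggested_args") or {})
        if _key(nxt, nxt_args) in seen:
            return response  # the hint repeats a call already made; stop
        seen.add(_key(nxt, nxt_args))
        response = registry[nxt](**nxt_args)
    return response
```

The hop limit and the repeat check keep the agent from becoming the pathological retry loop described above, while still letting it chain through more than one structured hint.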
Verify it yourself
schemabrain serve is a stdio transport — it does not bind a TCP port. You can drive a structured-recovery response directly from a Python REPL by calling the tool implementation:
schemabrain init will also surface the recovery envelope naturally — drive the agent to a known-bad metric name and watch the structured error block come back.
The full envelope contract lives in schemabrain/mcp/envelope.py. Charter rationale is in docs/agent-ux-charter.md §3.
Related
- Charter v1.2: The contract every recovery block obeys.
- Read-only: Why writes never reach the recovery path.
- PII taxonomy: What triggers a pii_blocked refusal.
- Audit chain: Every refusal is recorded with its recovery block.