SchemaBrain — Security Posture

Procurement-friendly summary of SchemaBrain’s security architecture, threat model, and disclosure process. For the full attack-surface walk-through with code citations, see docs/threat-model.md.

SchemaBrain is the trust and intelligence layer between AI agents and your production Postgres database. This document is the summary your security team will want to read before approving the package.

At a glance

Property	Posture
Deployment model	Local single-process MCP server. Runs on the operator’s own machine.
Network egress	None to SchemaBrain infrastructure (we do not run any). LLM enrichment during `index` calls Anthropic if the operator opts in.
Source code	Apache-2.0, open source (github.com/Arun-kc/schemabrain). Every claim on this page is verifiable from the code.
Database credential exposure	Env-var-only. Never logged, never returned in any envelope, filtered out of SQLAlchemy connection strings via `safe_engine_url`.
Wire format	MCP over stdio (no HTTPS / SSE transport today).
Multi-tenant	No. Single-tenant by design.
Hosted service	None. Local-first is an architectural commitment, not a feature gap.

The four load-bearing mechanisms

Detail in docs/mechanism/; summarized for security review:

1. Architectural read-only

The MCP surface exposes 12 tools, none of which accept SQL. There is no execute_query, no run_sql, no validate_query. Agents emit structured tool calls; SchemaBrain compiles parameterized SQL from operator-validated definitions. Belt-and-suspenders: default_transaction_read_only=on is set at the session level on every connection (connectors/postgres.py and cli.py at both the serve-side metric executor and the query-log miner — source), and NullPool eliminates the connection-reuse pollution surface across all three database engines in the binary.

2. PII propagation at the metric compiler

12 categories grounded in GDPR / CCPA / HIPAA / PCI DSS. Each column is tagged at index time. Tags propagate through five surfaces at compile time — measure columns (including composite-expression operands), time-dimension columns, group_by columns, filter predicates, and JOIN ON pairs. A get_metric whose path touches a blocked category refuses before the database is queried. Three catastrophic-leak categories (credential, payment_card, government_id) are blocked by default on zero-config install. Grouping BY a tagged column is refused as row-level disclosure regardless of policy — the group keys would be the raw values (count(*) ... GROUP BY email is SELECT DISTINCT email in disguise), so it is blocked on the same rule as MIN/MAX over a tagged column.

3. Tamper-evident audit chain

Every tool call writes a row to an append-only mcp_audit SQLite table. Each row carries chain_hash[N] = sha256(chain_hash[N-1] || canonical(row[N])). SQL triggers forbid UPDATE/DELETE at the database layer (audit/ddl.py:64, :71). schemabrain audit verify re-walks the chain; exit codes are 0 (intact), 1 (mismatch found), 2 (operational error).

4. Structured recovery envelopes

Typed refusal contracts — the agent gets back recovery.suggested_tool and recovery.suggested_args rather than English error strings. Closed ErrorKind Literal of 26 values means audit pipelines can switch on error.kind without parsing. The refusal envelope surfaces only the blocked category set, not the attempted set — closes a probe-oracle attack where iterative group_by calls could reconstruct the PII tagging in O(columns) calls.

Threat model (summary)

The full version with code citations is at docs/threat-model.md. Quick walk through the four assets:

Asset 1 — MCP tool surface

Threat	Mitigation
DB credentials in argv / journald / shell history	Env-var-only via `--url-env VARNAME`; positional URL is deprecated with a warning
SQL injection via tool arguments	All identifiers validated against `^[A-Za-z_][A-Za-z0-9_$]*$` with 63-char bound; all SQL parameterized
Audit log tampering	SQL triggers forbid mutation; SHA256 chain detects post-hoc rewrites via any external snapshot
URL query string smuggling session config	`safe_engine_url` strips everything except `sslmode` / `sslrootcert` / `application_name`
Connection pool session-state escape	`NullPool` on all three engines — every call opens a fresh connection

Asset 2 — Prompt injection into the database

Threat	Mitigation
Column comment overrides the enrichment system prompt	Structured JSON delimiters; descriptions are data, not code; operator can re-run `index` to overwrite
Sample value contains exfiltration payload	Two layers: email / SSN / credit-card-shaped values are redacted in place before samples reach the store, and both sample values and their shape signatures are dropped entirely from the enrichment prompt for any column the name-based PII classifier flags (so names, addresses, phones, and non-US IDs whose column name signals PII never reach the prompt). Residual gaps: free-text columns with generic names (`notes`, `bio`) classify public, and generic non-PII injection payloads in a public column’s samples still survive (this is a PII filter, not an injection filter)
Adversarial column name biases the LLM	Defense-in-depth via the system prompt’s structured output requirement

Residual risk: an adversary controlling column comments can probably bias enrichment description tone. Blast radius bounded by what free-text English can express.

Asset 3 — Agent pathological loops

Threat	Mitigation
Runaway tool-call loop drains LLM budget	`--max-cost` cap on `index`; read-side tools cost effectively zero per call
`get_metric` blow-up via repeated grain-mismatched queries	Postgres-side `statement_timeout` (operator-tunable via `--statement-timeout-ms`; default 30000ms / 30s; pass `0` to disable); grain mismatches refused at compile time before SQL leaves the process
Unbounded result-set rows	Application-level row cap via `--max-rows-per-result` (operator-tunable; default 10000; pass `0` to disable) — defense against an LLM that asks for a `SELECT *` equivalent

Asset 4 — Adversarial schema names

Threat	Mitigation
150 KB identifier exhausts agent context window	Identifier validator rejects names longer than 63 chars
Unicode confusable schema names mislead operator	Identifier regex `^[A-Za-z_][A-Za-z0-9_$]*$` rejects non-ASCII alphanumerics
Embedded ANSI escape sequences corrupt terminal output	Same regex rejects control characters

What we explicitly do NOT defend against

Honesty matters for procurement reviews. These are out of scope:

Denial of service against the host process. A misbehaving agent can spike CPU on the host machine. Operator infrastructure (cgroups, ulimits, container limits) is the right layer.
Filesystem attacks on ~/.schemabrain/. OS permissions + full-disk encryption are the right layer. We set the events file to 0600 and rely on standard umask for the store.
Side-channel attacks on the local SQLite cache. Out of scope for a local single-process tool.
Supply-chain attacks on PyPI dependencies. Closed by a separate programme: SECURITY.md + Dependabot + pip-audit + Bandit + Semgrep in CI.
Multi-tenant isolation. Single-tenant by design.
Network-tamper detection. No remote sinks today. Audit chain is local-file integrity, not network-transport integrity.

Compliance posture

We do not certify against SOC 2 / ISO 27001 / HIPAA / PCI DSS — those are organizational programmes, not properties of a local CLI tool. We do ship mechanisms that operators building toward those certifications can point at as compensating controls:

Compliance need	What SchemaBrain provides
Audit trail integrity	SHA256 chain with `audit verify`; append-only SQL triggers
Access logging	Every tool call recorded in `mcp_audit` with `tool_name`, `status`, `pii_categories`, `occurred_at`, `chain_hash`
PII handling	Typed taxonomy across 12 regulatory-grounded categories; column-level redaction in `describe_entity`
Read-only access	Architectural (no write tool in the binary) + recommended read-only Postgres role + session-level `default_transaction_read_only=on`
Credential protection	Env-var-only; never logged; filtered out of connection strings
Vulnerability disclosure	Process documented at `SECURITY.md`; GitHub Private Vulnerability Reporting is the preferred channel

What SchemaBrain does NOT have (yet)

For procurement teams comparing to enterprise products:

RBAC / fine-grained per-user policies. SchemaBrain is single-operator. Multi-operator policy enforcement is a future product surface.
Hosted control plane / centralized audit aggregation. No SaaS today; local-first is architecturally committed.
External audit anchor (audit checkpoint). Manual workaround: cron schemabrain audit verify --full + persist the chain head hash externally.
EXPLAIN-based cost cap. --statement-timeout-ms (30s default) and --max-rows-per-result (10000 default) are the v0.4 cost guards. EXPLAIN-based pre-execution cost estimation is on the v0.5+ roadmap.
HTTPS / SSE transport. stdio only today. Required for cloud MCP clients like ChatGPT Connectors; v0.5+ roadmap.
Content-aware PII classification. v0.4 classifier is name-based with structural refinements (integer-FK guard). Sample-based content matching is roadmap.
Anti-prompt-injection at the data-retrieval layer. Sample values are PII-redacted but not delimiter-wrapped or pattern-stripped against generic injection payloads. Roadmap.

Reporting a vulnerability

Do not open public GitHub issues for security reports. Use one of these private channels:

GitHub Private Vulnerability Reporting (preferred): github.com/Arun-kc/schemabrain/security/advisories/new
Email: arunkc91@gmail.com with subject prefix [schemabrain-security]

Full process, expected timelines, and scope at SECURITY.md.

CLI

MCP tools

Concepts

Security & operations

Design decisions (ADR)

Security

SchemaBrain — Security Posture

At a glance

The four load-bearing mechanisms

1. Architectural read-only

2. PII propagation at the metric compiler

3. Tamper-evident audit chain

4. Structured recovery envelopes

Threat model (summary)

Asset 1 — MCP tool surface

Asset 2 — Prompt injection into the database

Asset 3 — Agent pathological loops

Asset 4 — Adversarial schema names

What we explicitly do NOT defend against

Compliance posture

What SchemaBrain does NOT have (yet)

Reporting a vulnerability

Further reading

​SchemaBrain — Security Posture

​At a glance

​The four load-bearing mechanisms

​1. Architectural read-only

​2. PII propagation at the metric compiler

​3. Tamper-evident audit chain

​4. Structured recovery envelopes

​Threat model (summary)

​Asset 1 — MCP tool surface

​Asset 2 — Prompt injection into the database

​Asset 3 — Agent pathological loops

​Asset 4 — Adversarial schema names

​What we explicitly do NOT defend against

​Compliance posture

​What SchemaBrain does NOT have (yet)

​Reporting a vulnerability

​Further reading

SchemaBrain — Security Posture

At a glance

The four load-bearing mechanisms

1. Architectural read-only

2. PII propagation at the metric compiler

3. Tamper-evident audit chain

4. Structured recovery envelopes

Threat model (summary)

Asset 1 — MCP tool surface

Asset 2 — Prompt injection into the database

Asset 3 — Agent pathological loops

Asset 4 — Adversarial schema names

What we explicitly do NOT defend against

Compliance posture

What SchemaBrain does NOT have (yet)

Reporting a vulnerability

Further reading