What a verification gate buys you
The pattern: when an AI agent produces an action — a fix, a classification, a recommendation, a record match — a structured artifact assembles only when every cross-check resolves. Verbatim input, root cause, action taken, annotated before/after, served-asset / CLI / API verification. If a check fails, the artifact refuses to assemble and the agent escalates to a human with the failure visible.
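The gate described above can be sketched in a few lines. Everything here is illustrative: the check names, artifact fields, and the escalation shape are assumptions, not the actual implementation behind these reports.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    name: str      # e.g. "served-asset", "cli", "api"
    passed: bool
    detail: str    # evidence string shown to the reviewing human

def assemble_artifact(action: dict,
                      checks: list[Callable[[dict], CheckResult]]) -> dict:
    """Run every cross-check; assemble the artifact only if all pass.
    On any failure, refuse to assemble and escalate with the failure visible."""
    results = [check(action) for check in checks]
    failures = [r for r in results if not r.passed]
    if failures:
        return {
            "status": "escalated",  # human review required
            "failures": [(r.name, r.detail) for r in failures],
        }
    return {
        "status": "assembled",
        "input": action["verbatim_input"],
        "root_cause": action["root_cause"],
        "action_taken": action["action_taken"],
        "evidence": [r.detail for r in results],
    }
```

The key property: there is no code path that emits a partial artifact, so a reviewer who sees an assembled report knows every check resolved.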
For human-in-the-loop AI: the gate gives the reviewing human easy-to-follow evidence to approve, deny, or correct in seconds. No Slack scroll. No "I'll trust the agent for now." Every action ships with its receipt.
Where it generalizes: medical-record matching that must cite the patient's own records, document classification that justifies each label against the labeled training set, compliance review that cites the specific rule applied, customer-support agent actions that link to the knowledge-base article they used. Same pattern, different verification primitive.
Distribution: share-by-link only. Not public on LinkedIn. Recipient list controlled. For higher-security distribution, host behind Cloudflare Access with per-recipient signed URLs.
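Per-recipient signed URLs can be produced with a generic HMAC scheme like the sketch below. This is not Cloudflare's API; Cloudflare Access has its own policy mechanism. The secret, URL shape, and TTL are all assumptions for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"rotate-me"  # illustrative; load from a secrets manager in practice

def sign_url(base_url: str, recipient: str, ttl_seconds: int = 86400) -> str:
    """Append recipient, expiry, and an HMAC signature so each link
    is personal and time-limited."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{recipient}:{expires}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'r': recipient, 'e': expires, 's': sig})}"

def verify_url(recipient: str, expires: int, sig: str) -> bool:
    """Reject tampered or expired links with a constant-time comparison."""
    payload = f"{recipient}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires
```

Because the recipient is part of the signed payload, a leaked link still identifies who leaked it, and changing the recipient invalidates the signature.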
RAG walled garden — agent stays inside the approved knowledge boundary
The cleanest case of an AI agent's verification gate. Across 4 test scenarios — high-confidence pass, off-topic blocked, partial-match limited, cross-topic blocked — the agent provably refuses to answer from outside the approved corpus. This is the verification gate for production AI agents, applied to a chat that must cite its source.
5 screenshots (anonymized) · 4 gate conditions · ~10-min read
Open report →
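The four gate conditions in the card above can be expressed as a small decision function. The thresholds, score/topic fields, and the reading of "cross-topic" (best evidence exists but belongs to a different topic than the query) are assumptions for the sketch, not the report's actual retriever.

```python
def gate_answer(query_topic: str, retrieved: list[dict],
                pass_threshold: float = 0.80,
                partial_threshold: float = 0.50) -> str:
    """Map retrieval evidence to one of four gate outcomes:
    high-confidence pass, partial-match limited, cross-topic blocked,
    off-topic blocked. Each `retrieved` item carries a topic and a
    similarity score from the retriever."""
    if not retrieved:
        return "blocked:off-topic"      # nothing in the approved corpus
    best = max(retrieved, key=lambda d: d["score"])
    if best["topic"] != query_topic:
        return "blocked:cross-topic"    # evidence exists, wrong topic
    if best["score"] >= pass_threshold:
        return "pass:answer-with-citation"
    if best["score"] >= partial_threshold:
        return "limited:partial-answer"
    return "blocked:off-topic"          # evidence too weak to cite
```

Every outcome except the high-confidence pass refuses or limits the answer, which is what keeps the agent inside the walled garden.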
Banner hid the real cloud-provider error
An AI-agent-driven fix verified at the dev stage. Anatomy: verbatim client quote → root cause → action taken → annotated screenshot with red ring overlay → served-asset check. The same anatomy applies whenever an agent's action needs human approve/deny/correct: classification, decision, recommendation, record match.
3 screenshots · dev verification · ~10-min read
Open report →
End-to-end: agent action confirmed by external API
Test-stage counterpart to Sample 2. The agent created a real cloud project; the report verifies the side-effect (paired cloud accounts) actually exists in the right place via the cloud-provider API. Anatomy: success verdict, blurred screenshots, "nameplate" boxes showing the verified shape (account IDs, OU IDs, root IDs — values redacted, structure shown).
2 screenshots · external-API verification · ~7-min read
Open report →
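The "nameplate" redaction in the card above (values redacted, structure shown) can be done with shape-preserving masking. The ID patterns below are illustrative approximations; real cloud-provider ID formats vary and the report's actual redaction rules are not shown here.

```python
import re

# Illustrative patterns, loosely modeled on common cloud ID shapes.
PATTERNS = {
    "account_id": re.compile(r"\b\d{12}\b"),
    "ou_id": re.compile(r"\bou-[a-z0-9]{4,}-[a-z0-9]{8,}\b"),
    "root_id": re.compile(r"\br-[a-z0-9]{4,}\b"),
}

def nameplate(text: str) -> str:
    """Replace each matched value with a same-shape placeholder:
    letters become 'x', digits become '#', prefixes and separators
    survive, so the report proves structure without exposing values."""
    def mask(m: re.Match) -> str:
        value = m.group(0)
        prefix, dash, rest = value.partition("-")
        body = rest if dash else value
        masked = re.sub(r"\d", "#", re.sub(r"[a-z]", "x", body))
        return f"{prefix}-{masked}" if dash else masked
    for pattern in PATTERNS.values():
        text = pattern.sub(mask, text)
    return text
```

A reviewer seeing `ou-xx##-xxxx####` can confirm the agent paired the account under an organizational unit, in the right place, without the report ever carrying the real identifier.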