RAG Walled Garden QA Report

AI Agent Visual Testing · Generated [date] · Target: dev.example.com · Topic for this run: "Newton's Laws of Motion" (illustrative substitute domain)

2 of 4 Tests Passed

Test 1: Chat Pass Gate — on-topic question answered with source citations

Test 2: Chat Blocked Gate — off-topic question properly blocked with redirect chips

Test 3: Chat Limited Gate — partial-match question got full-block instead of limited/caveat response

Test 4: Cross-topic Question — adjacent-topic question blocked entirely instead of redirected with context

Summary

Test	Scenario	Message Sent	Result
Test 1	Chat Pass Gate (high confidence)	"What is Newton's second law of motion?"	PASS
Test 2	Chat Blocked Gate (off-topic)	"What's the best recipe for chocolate cake?"	PASS
Test 3	Chat Limited Gate (partial match)	"How does relativistic motion affect classical mechanics?"	FAIL
Test 4	Cross-topic Question (adjacent topic locked)	"How does friction relate to thermodynamic entropy?"	FAIL

Test 1: Chat Pass Gate (High Confidence)

Expected

Response references F = ma and surrounding curated source material. A citation appears below the response (book icon + "Source: ...").

PASS — Response correctly references the second law with source citations from the topic-scoped knowledge base.

AI Agent Observation

Response says: "Newton's second law states that the net force acting on an object is equal to the product of its mass and acceleration: F = ma. This is one of three fundamental laws Newton introduced in Principia Mathematica (1687) to describe the relationship between motion and the forces acting on a body. The law explains both why a heavier object requires more force to accelerate at the same rate as a lighter one, and why the same force produces less acceleration on a more massive object."

Sources cited (illustrative, not real titles):
- [Author A] (2018) Classical Mechanics for Engineers
- [Author B] (2010) Principles of Physics

Also shows the "⚠ Limited sources available for this topic" notice in italic, which proves the "limited" notice path exists and renders.

Test 2: Chat Blocked Gate (Off-Topic)

Expected

Response is a blocked message (no AI-generated answer). Redirect topic chips appear with available topics.

Test 2: Blocked off-topic question (blurred)

PASS — Off-topic question properly blocked with redirect to available topics.

AI Agent Observation

Response says: "I don't have curated sources on that specific question yet. I can speak confidently about these topics:" Redirect topic chips displayed: Mechanics, Thermodynamics, Optics, Electromagnetism. No recipe content was generated. The off-topic question was correctly blocked.

Test 3: Chat Limited Gate (Partial Match)

Expected

Response includes a "limited sources" caveat (warning icon + italic text) with a partial answer drawing from in-scope sources, acknowledging the relativistic part is outside the current topic.

FAIL — Received full-block response instead of limited/caveat response.

Issue Found

Response was identical to the one for the chocolate-cake off-topic question: "I don't have curated sources on that specific question yet…" with redirect chips. The relativistic-vs-classical question is partially on-topic (classical mechanics IS the current topic), but the gate treated it as fully off-topic.

Note: Test 1's response DID render the "⚠ Limited sources available" italic warning, proving the limited-gate mechanism exists. It just wasn't triggered for this edge case. The RAG similarity threshold may be too aggressive, treating anything that doesn't closely match ingested in-topic content as fully off-topic.

Root Cause Hypothesis

The RAG gate's similarity threshold likely classifies "relativistic motion + classical mechanics" as below the minimum relevance score, so the question falls into the "no match" bucket rather than the "partial match" bucket. There are two possibilities: the threshold between "limited" and "blocked" may need tuning, OR the RAG gate may not yet have a separate "limited" tier and may only distinguish "match" vs "no match."

Suggested files: app/services/rag.py (gate logic), app/services/chat.py (response handling).
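If the gate turns out to need a distinct "limited" tier, a two-threshold classifier could look roughly like the sketch below. This is an illustration only, not the code in app/services/rag.py: the `Gate` enum, `classify_gate`, and both threshold values are hypothetical, and it assumes the gate sees a single best similarity score between 0 and 1.

```python
from enum import Enum


class Gate(Enum):
    PASS = "pass"        # answer normally, with citations
    LIMITED = "limited"  # partial answer plus the "limited sources" caveat
    BLOCKED = "blocked"  # no generated answer, show redirect chips


# Hypothetical values; real thresholds would need tuning against the corpus.
PASS_THRESHOLD = 0.75
LIMITED_THRESHOLD = 0.45


def classify_gate(top_score: float,
                  pass_threshold: float = PASS_THRESHOLD,
                  limited_threshold: float = LIMITED_THRESHOLD) -> Gate:
    """Map the best retrieval similarity score to one of three gate tiers."""
    if limited_threshold > pass_threshold:
        raise ValueError("limited_threshold must not exceed pass_threshold")
    if top_score >= pass_threshold:
        return Gate.PASS
    if top_score >= limited_threshold:
        return Gate.LIMITED
    return Gate.BLOCKED
```

With a single threshold, the middle band collapses into BLOCKED, which would match the behavior observed in Test 3. Logging the actual top score for the relativistic question would show which of the two hypotheses applies.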

Test 4: Cross-topic Question (Adjacent Topic Locked)

Sent: "How does friction relate to thermodynamic entropy?" — asked while on the Mechanics topic page; Thermodynamics topic is gated.

Expected

Response acknowledges friction (in-topic) and points to the gated Thermodynamics topic for the entropy half. Or, at minimum, offers a contextual redirect rather than a generic block.

FAIL — Adjacent topic locked; cross-topic question fully blocked instead of contextually redirected.

Issue Found

Two separate issues:

1. The Thermodynamics topic is locked. Direct navigation to /journey/thermodynamics returns "Error: Topic locked." Topics are sequential; the user must complete the Mechanics journey before unlocking Thermodynamics.

2. Asking "How does friction relate to thermodynamic entropy?" while on the Mechanics topic page returned the same full-block response as fully off-topic questions. The gate treated entropy (a Thermodynamics concept) as off-topic for Mechanics, even though friction is core to Mechanics and entropy is a natural follow-on.

Root Cause Hypothesis

The RAG gate scopes retrieval to the current topic only. On the Mechanics page, only Mechanics-tagged source documents are searched. Entropy content exists in the corpus but is tagged to Thermodynamics, so the gate returns no matches and blocks the response. This is arguably correct behavior (topic isolation), but it means cross-topic questions always get fully blocked rather than redirected with helpful context (e.g., "I can answer the friction half from this topic; for entropy, complete the Mechanics journey to unlock Thermodynamics").

Suggested files: app/services/rag.py (topic scoping), app/routes/chat.py (cross-topic context handling).
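One way to keep topic isolation while still giving a contextual redirect is to look at topic tags across the whole corpus for routing only, and keep answer generation restricted to the current topic. The sketch below shows that decision step in isolation. It is a sketch under assumptions, not the existing gate: `Hit`, `plan_response`, the action names, and the 0.5 threshold are all hypothetical, and the scores in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    topic: str    # topic tag on the retrieved source document
    score: float  # similarity score, assumed to lie in [0, 1]


def plan_response(hits, current_topic, threshold=0.5):
    """Choose an action from corpus-wide hits without answering out of topic."""
    relevant = [h for h in hits if h.score >= threshold]
    in_topic = any(h.topic == current_topic for h in relevant)
    # Topics other than the current one that also matched the question.
    other_topics = sorted({h.topic for h in relevant if h.topic != current_topic})

    if in_topic and other_topics:
        action = "partial_with_redirect"  # answer in-topic half, point elsewhere
    elif in_topic:
        action = "answer"
    elif other_topics:
        action = "redirect"               # nothing in-topic, name the other topic
    else:
        action = "block"                  # generic block with redirect chips
    return {"action": action, "other_topics": other_topics}


# Illustrative scores for the friction/entropy question asked on Mechanics.
friction_entropy = [Hit("Mechanics", 0.8), Hit("Thermodynamics", 0.7)]
plan = plan_response(friction_entropy, current_topic="Mechanics")
```

Here `plan["action"]` is `"partial_with_redirect"` and `plan["other_topics"]` is `["Thermodynamics"]`, which is enough for the chat layer to name the locked topic in its reply. Only Mechanics-tagged passages would be handed to the model, so the walled garden stays intact.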

Onboarding Flow (Pre-Test)

PASS — Full onboarding completed: authentication, topic selection (Mechanics), profile setup. Landed on journey page with chat phase active.

Why this matters — the AI-output verification pattern

This QA report demonstrates "the artifact refuses to ship" applied to an AI agent's behavior rather than to a code fix. The walled garden is the gate: the AI is only allowed to answer from approved, topic-scoped sources, and it refuses to generate when the retrieval score is below threshold. Each test scenario exercises a different gate condition: high-confidence pass, off-topic block, partial-match limited, cross-topic block.
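The four gate conditions can be written down as data, so that the headline count is derived from the scenarios instead of typed by hand. The sketch below encodes this run's results. The messages and verdicts come from the summary table; the `Scenario` class, the `summarize` helper, and the short gate labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    message: str
    expected: str  # gate behavior the design calls for
    observed: str  # gate behavior the agent saw in this run

    @property
    def passed(self) -> bool:
        return self.expected == self.observed


SCENARIOS = [
    Scenario("Test 1", "What is Newton's second law of motion?",
             expected="pass", observed="pass"),
    Scenario("Test 2", "What's the best recipe for chocolate cake?",
             expected="blocked", observed="blocked"),
    Scenario("Test 3", "How does relativistic motion affect classical mechanics?",
             expected="limited", observed="blocked"),
    Scenario("Test 4", "How does friction relate to thermodynamic entropy?",
             expected="redirect", observed="blocked"),
]


def summarize(scenarios) -> str:
    """Build the report headline from the individual scenario outcomes."""
    passed = sum(s.passed for s in scenarios)
    return f"{passed} of {len(scenarios)} Tests Passed"
```

Both failures share the same observed value, "blocked", which is consistent with the hypothesis that the gate currently has only two outcomes.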

The pattern generalizes beyond chat to any AI workflow with a "stay inside the boundary" requirement: medical record matching that must cite the patient's own records, document classification that must justify each label from training labels, compliance review that must reference the rule behind each finding. Each benefits from a verifiable gate, with a screenshot or structured output proving that every scenario behaves as designed.

Report generated by AI agent · Test runner: redacted · Screenshots: 5 captured (blurred for anonymity)