PAT-03 · Architecture pattern
RAG knowledge layer
Answers grounded in your documents, not the model's guesswork.
What it is
A retrieval-augmented generation (RAG) layer that puts your own documents in front of a language model at answer time, so responses are grounded in your knowledge base and carry citations instead of resting on the model's training data alone. Retrieval first, generation second.
When to use it
When answers must come from a corpus the model was never trained on — internal docs, policies, product knowledge — and must be current, attributable, and access-controlled. Use it when “the model made it up” is unacceptable.
Skip it if a few documents fit in the prompt directly, or if the knowledge is already in the model.
System shape
The flow is a pipeline, not a prompt: documents are chunked and embedded into a vector index; at query time the system retrieves the most relevant chunks — filtered by the user's permissions — assembles them into context, and only then calls the model, which answers with citations back to the source.
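The pipeline above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the index is a plain list rather than a vector store, permissions are modelled as hypothetical group names on each chunk, and the model call itself is left out. Every name here (`Chunk`, `retrieve`, `build_prompt`, and so on) is invented for the example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    position: int
    text: str
    allowed_groups: frozenset  # hypothetical permission model: groups that may read this chunk


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk_document(doc_id, text, allowed_groups, max_words=40):
    # Fixed-size word windows; real corpora usually need corpus-specific chunking.
    words = text.split()
    return [
        Chunk(doc_id, i // max_words, " ".join(words[i:i + max_words]), frozenset(allowed_groups))
        for i in range(0, len(words), max_words)
    ]


def build_index(chunks):
    return [(chunk, embed(chunk.text)) for chunk in chunks]


def retrieve(index, query, user_groups, k=3):
    query_vec = embed(query)
    groups = set(user_groups)
    # Permission filter runs before ranking, so hidden chunks never reach the context.
    visible = [(chunk, vec) for chunk, vec in index if chunk.allowed_groups & groups]
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in visible]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question, chunks):
    # Number each chunk so the model can cite it as [1], [2], ...
    sources = "\n".join(
        f"[{n}] ({c.doc_id}#{c.position}) {c.text}" for n, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them as [n].\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


index = build_index(
    chunk_document("handbook", "Employees accrue vacation days monthly and may carry over five days.", {"staff", "hr"})
    + chunk_document("salaries", "Salary bands for engineering roles are reviewed every April.", {"hr"})
)

question = "How many vacation days carry over?"
print(build_prompt(question, retrieve(index, question, {"staff"})))
```

The prompt is assembled only after retrieval and filtering have run, which is the point of the pattern: a user in the `staff` group asking about salary bands gets no chunks back, so there is nothing for the model to leak.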
Retrieval quality, not the prompt, decides answer quality. An evaluation loop measures answers against known-good cases, so every change is shown to help or hurt rather than assumed to help. The model is the last step, not the system.
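One way to picture the evaluation loop is a small harness that scores a retriever against known-good cases and reports the difference between two versions. The sketch below is an assumption-laden illustration: it scores only the retrieval step (hit rate and recall at k), not the final answers, and `EvalCase`, `evaluate_retrieval`, and `compare` are hypothetical names. A retriever here is any function that takes a question and returns a ranked list of document ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected_doc_ids: frozenset  # non-empty set of documents a good answer must draw on


def evaluate_retrieval(retrieve_fn, cases, k=3):
    """Return hit rate and mean recall@k for a retriever over known-good cases."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    hits = 0
    recall_total = 0.0
    for case in cases:
        retrieved = set(retrieve_fn(case.question)[:k])
        found = retrieved & case.expected_doc_ids
        hits += bool(found)  # a hit means at least one expected document was retrieved
        recall_total += len(found) / len(case.expected_doc_ids)
    return {"hit_rate": hits / len(cases), "recall_at_k": recall_total / len(cases)}


def compare(baseline_fn, candidate_fn, cases, k=3):
    """Positive deltas mean the candidate retriever improved on the baseline."""
    before = evaluate_retrieval(baseline_fn, cases, k)
    after = evaluate_retrieval(candidate_fn, cases, k)
    return {metric: after[metric] - before[metric] for metric in before}


# Toy retrievers backed by lookup tables, standing in for two pipeline versions.
cases = [
    EvalCase("q1", frozenset({"a"})),
    EvalCase("q2", frozenset({"b", "c"})),
]
baseline_results = {"q1": ["a"], "q2": ["x"]}
candidate_results = {"q1": ["a"], "q2": ["b", "x"]}

deltas = compare(baseline_results.get, candidate_results.get, cases)
print(deltas)
```

Running the same cases before and after a chunking or embedding change turns "it feels better" into a number that can be tracked.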
Failure modes
- Treating RAG as “search with a nicer prompt” — bad retrieval yields confident wrong answers.
- Ignoring permissions at retrieval — the model surfaces documents a user should not see.
- No citations — answers are unverifiable, so they are untrusted.
- A stale index — answers drift away from the source of truth.
- No evaluation — you cannot tell whether a change helped or hurt.
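The stale-index failure in particular lends itself to a mechanical check. As a rough sketch, assuming the index stores a content hash for each document at indexing time (an assumption of this example, not something the pattern prescribes), a periodic job can compare those hashes with the current sources and plan what to re-embed. `fingerprint` and `plan_reindex` are hypothetical names.

```python
import hashlib


def fingerprint(text):
    # Content hash recorded alongside each document when it is indexed.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(source_docs, indexed_fingerprints):
    """Compare current sources with what the index saw.

    source_docs: {doc_id: current text}
    indexed_fingerprints: {doc_id: fingerprint stored at index time}
    """
    current = {doc_id: fingerprint(text) for doc_id, text in source_docs.items()}
    shared = current.keys() & indexed_fingerprints.keys()
    return {
        "add": sorted(current.keys() - indexed_fingerprints.keys()),
        "update": sorted(d for d in shared if current[d] != indexed_fingerprints[d]),
        "delete": sorted(indexed_fingerprints.keys() - current.keys()),
    }


indexed = {
    "policy": fingerprint("Refunds within 30 days."),
    "faq": fingerprint("Support is open weekdays."),
    "old-pricing": fingerprint("Plan A costs 10."),
}
sources = {
    "policy": "Refunds within 14 days.",      # changed since indexing
    "faq": "Support is open weekdays.",       # unchanged
    "onboarding": "Welcome to the team.",     # new document
}

plan = plan_reindex(sources, indexed)
print(plan)
```

Documents in `delete` matter as much as those in `update`: a chunk from a withdrawn document can still be retrieved and cited until it is removed from the index.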
How we build it
We treat retrieval as the system: chunking and embedding tuned to the corpus, permissions enforced at retrieval and not after, a citation on every answer, and an evaluation harness so changes are measured, not guessed.
The model is wired in last.
Related
- Field note: RAG is not search with a nicer prompt
- Service: AI integration