
PAT-03 · Architecture pattern

RAG knowledge layer

Answers grounded in your documents, not the model's guesswork.

What it is

A retrieval-augmented generation (RAG) layer that puts your own documents in front of a language model at answer time, so responses are grounded in your knowledge base, with citations, rather than in the model's training data. Retrieval first, generation second.

When to use it

When answers must come from a corpus the model was never trained on — internal docs, policies, product knowledge — and must be current, attributable, and access-controlled. Use it when “the model made it up” is unacceptable.

Skip it if a few documents fit in the prompt directly, or if the knowledge is already in the model.
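The "fits in the prompt" test can be made concrete with a budget check. This is a minimal sketch. The 4-characters-per-token ratio is a rough rule of thumb, not a real tokenizer. The function names and the 1,000-token reserve for the question and answer are hypothetical, chosen only for illustration.

```python
# Rough rule of thumb (an assumption, not a tokenizer): about 4 characters per token in English text.
CHARS_PER_TOKEN = 4

def estimated_tokens(text: str) -> int:
    """Approximate token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)

def fits_in_prompt(documents: list[str], context_tokens: int, reserve_tokens: int = 1000) -> bool:
    """True if every document fits in the context window with room left for question and answer."""
    return sum(estimated_tokens(d) for d in documents) <= context_tokens - reserve_tokens

print(fits_in_prompt(["x" * 4_000], context_tokens=8_000))   # True: ~1,000 tokens
print(fits_in_prompt(["x" * 40_000], context_tokens=8_000))  # False: ~10,000 tokens
```

If the check passes, the documents can go straight into the prompt and a retrieval layer adds little. In practice you would count with the target model's own tokenizer.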

System shape

Figure (PAT-03 · system shape): documents (your corpus) → chunk · embed (tuned) → vector index (embeddings); user query → retrieve (permissioned) → prompt assembly (context) → grounded answer + citations; an eval loop feeds answer quality back to retrieval.

The flow is a pipeline, not a prompt: documents are chunked and embedded into a vector index; at query time the system retrieves the most relevant chunks — filtered by the user's permissions — assembles them into context, and only then calls the model, which answers with citations back to the source.
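The pipeline can be sketched end to end with the standard library. This is a sketch under stated assumptions, not a production design. The `embed` function is a toy hashed bag-of-words standing in for a real embedding model. The in-memory list stands in for a vector index. The names `Chunk`, `chunk_words`, `index_document`, `retrieve` and `assemble_prompt`, the `allowed_groups` permission model, the chunk sizes and the `[doc#n]` id format are all hypothetical. The point it shows is ordering: the permission filter runs before ranking, and the model is only called after the context is assembled.

```python
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass

DIM = 256  # hypothetical size of the toy embedding vector

@dataclass
class Chunk:
    chunk_id: str                   # "<doc_id>#<n>", used as the citation key
    text: str
    allowed_groups: frozenset[str]  # groups permitted to see the source document
    vector: list[float]

def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows; size and overlap are placeholders to tune per corpus."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: a hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_document(index: list[Chunk], doc_id: str, text: str, allowed_groups: set[str]) -> None:
    """Chunk, embed and store a document together with its access list."""
    for n, piece in enumerate(chunk_words(text)):
        index.append(Chunk(f"{doc_id}#{n}", piece, frozenset(allowed_groups), embed(piece)))

def retrieve(index: list[Chunk], query: str, user_groups: set[str], k: int = 3) -> list[Chunk]:
    """Filter by permission first, then rank by cosine similarity (vectors are unit length)."""
    q = embed(query)
    visible = [c for c in index if c.allowed_groups & user_groups]
    visible.sort(key=lambda c: sum(a * b for a, b in zip(q, c.vector)), reverse=True)
    return visible[:k]

def assemble_prompt(query: str, chunks: list[Chunk]) -> str:
    """Build the model's context; each chunk is tagged with its id so the answer can cite it."""
    context = "\n\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)
    return ("Answer only from the sources below and cite each claim by its [id]. "
            "If the sources do not contain the answer, say so.\n\n"
            f"{context}\n\nQuestion: {query}")

# Usage: two documents with different audiences.
index: list[Chunk] = []
index_document(index, "leave-policy", "Employees receive 25 days of paid leave per year.", {"staff"})
index_document(index, "salary-bands", "Salary bands for engineering levels one to five.", {"hr"})
hits = retrieve(index, "How many days of paid leave do I get?", {"staff"})
print([c.chunk_id for c in hits])  # ['leave-policy#0']
prompt = assemble_prompt("How many days of paid leave do I get?", hits)
# The prompt would now be sent to the model (call not shown).
```

Because the filter is applied before ranking, a chunk the user may not see can never be scored, selected or placed in the prompt. Filtering the model's output afterwards would be too late.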

Retrieval quality, not the prompt, decides answer quality. An evaluation loop measures answers against known-good cases, so you can tell whether a change improved things or made them worse instead of guessing. The model is the last step, not the system.
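A small evaluation harness makes "measured, not guessed" concrete. This sketch scores only the retrieval step: for each known-good case (a query and the source it should find), it checks whether that source appears in the top k results. Grading the generated answers themselves would be a separate step, not shown. The names `hit_rate_at_k` and `compare` and the stubbed retrievers and data are hypothetical.

```python
from typing import Callable

# A retriever maps a query to ranked source ids; any function with this shape works.
Retriever = Callable[[str], list[str]]

def hit_rate_at_k(retriever: Retriever, cases: list[tuple[str, str]], k: int = 3) -> float:
    """Fraction of known-good cases whose expected source appears in the top-k results."""
    hits = sum(1 for query, expected in cases if expected in retriever(query)[:k])
    return hits / len(cases)

def compare(baseline: Retriever, candidate: Retriever, cases: list[tuple[str, str]], k: int = 3):
    """Score both retrievers on the same cases and label the change."""
    before = hit_rate_at_k(baseline, cases, k)
    after = hit_rate_at_k(candidate, cases, k)
    if after > before:
        verdict = "improvement"
    elif after < before:
        verdict = "regression"
    else:
        verdict = "no change"
    return before, after, verdict

# Usage with stubbed retrievers (hypothetical data): the candidate fixes one case.
cases = [("leave days", "leave-policy"), ("expense limit", "expenses"), ("vpn setup", "it-guide")]
BASELINE = {"leave days": ["leave-policy"], "expense limit": ["it-guide"], "vpn setup": ["it-guide"]}
CANDIDATE = {"leave days": ["leave-policy"], "expense limit": ["expenses"], "vpn setup": ["it-guide"]}
print(compare(lambda q: BASELINE[q], lambda q: CANDIDATE[q], cases))  # verdict: 'improvement'
```

Running the same fixed case set before and after every change to chunking, embeddings or filters turns each change into a number that either went up or went down.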

Failure modes

  • Treating RAG as “search with a nicer prompt” — bad retrieval yields confident wrong answers.
  • Ignoring permissions at retrieval — the model surfaces documents a user should not see.
  • No citations — answers are unverifiable, so they are untrusted.
  • A stale index — answers drift away from the source of truth.
  • No evaluation — you cannot tell whether a change helped or hurt.
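One guard against a stale index is to record a content hash per document at indexing time and diff it against the source of truth. This is a minimal sketch assuming documents are keyed by a stable id. The function names and the sample data are hypothetical, and how often to run the check (on a schedule or on change events) is left open.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_stale(source_docs: dict[str, str], indexed_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Diff the source of truth against the per-document hashes recorded at indexing time."""
    changed = [d for d, text in source_docs.items()
               if d in indexed_hashes and indexed_hashes[d] != content_hash(text)]
    new = [d for d in source_docs if d not in indexed_hashes]
    deleted = [d for d in indexed_hashes if d not in source_docs]
    return {"reindex": sorted(changed + new), "remove": sorted(deleted)}

# Usage (hypothetical documents): one edited, one added, one removed at the source.
source = {"leave-policy": "25 days of paid leave", "expenses": "Meal limit is 50 per day"}
indexed = {"leave-policy": content_hash("20 days of paid leave"), "old-faq": content_hash("Legacy FAQ")}
print(find_stale(source, indexed))
# {'reindex': ['expenses', 'leave-policy'], 'remove': ['old-faq']}
```

Only the documents that actually changed are re-chunked and re-embedded, and deleted documents are removed so they stop being cited.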

How we build it

We treat retrieval as the system: chunking and embedding tuned to the corpus, permissions enforced at retrieval and not after, a citation on every answer, and an evaluation harness so changes are measured, not guessed.
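"A citation on every answer" can be enforced mechanically before an answer is shown. This sketch assumes a hypothetical citation format of chunk ids in square brackets, such as `[leave-policy#0]`. An answer passes only if it cites at least one source and every cited id was actually among the retrieved chunks. The name `check_citations` is illustrative.

```python
import re

# Hypothetical citation format: chunk ids in square brackets, e.g. [leave-policy#0].
CITATION = re.compile(r"\[([^\[\]]+)\]")

def check_citations(answer: str, retrieved_ids: set[str]) -> tuple[bool, list[str]]:
    """Pass only if the answer cites at least one source and every citation was retrieved."""
    cited = CITATION.findall(answer)
    unknown = [c for c in cited if c not in retrieved_ids]
    return bool(cited) and not unknown, unknown

retrieved = {"leave-policy#0"}
print(check_citations("You get 25 days of paid leave [leave-policy#0].", retrieved))  # (True, [])
print(check_citations("You get 30 days of paid leave.", retrieved))                 # (False, [])
print(check_citations("See [salary-bands#0].", retrieved))                          # (False, ['salary-bands#0'])
```

An answer that fails the check can be regenerated or replaced with "no grounded answer found". Both options are preferable to showing an unverifiable claim.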

The model is wired in last.
