What is RAG?

RAG stands for Retrieval-Augmented Generation. It's a pattern where an AI model looks up relevant material from an external source before it generates an answer, so the response is grounded in your data instead of just the model's training set.

Why it exists

Large language models are trained on a fixed snapshot of data, much of it from the public internet, that ends at a training cutoff. They don't know what you wrote yesterday. They don't know your customer list, your runbooks, your meeting notes, or the specific architecture decisions your team made last quarter.

You can stuff that information into the prompt. But prompts are bounded, and you don't want to paste your entire knowledge base into every conversation.

RAG is the workaround. Instead of dumping everything into the prompt, you store your data in a searchable index, look up only the pieces relevant to the current question, and paste those into the prompt right before the model writes its answer.

How RAG works

Every RAG system has the same three moves. The details vary; the shape does not.

1. Retrieve

Take the user's question, search an external store (a vector database, a full-text index, or both), and pull back the documents most likely to be relevant. This is the "retrieval" part.

2. Augment

Paste the retrieved snippets into the prompt that goes to the model, usually under a heading like "Use this context to answer:". This is the "augmented" part.

3. Generate

The model writes its answer using the retrieved context plus its own training. Done well, you also get citations back to the source documents.
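The three moves can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production pipeline: the `DOCS` store, the word-overlap scoring, and the `generate` stub are all hypothetical stand-ins. A real system would use a proper search index for step 1 and a real model call for step 3.

```python
import re

# Hypothetical in-memory knowledge base: document id -> text.
DOCS = {
    "runbook-db": "To restart the database, run the failover script and wait for the replica to be promoted.",
    "notes-2024-q3": "The team decided to move the billing service to an event-driven architecture.",
    "faq-refunds": "Refunds are processed within five business days of the request.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=2):
    """Step 1: rank documents by how many words they share with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in docs.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def augment(question, doc_ids, docs):
    """Step 2: paste the retrieved snippets into the prompt, labelled by source id."""
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in doc_ids)
    return f"Use this context to answer:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    """Step 3: placeholder for the model call (an assumption; swap in a real client)."""
    return f"(model answer based on a {len(prompt)}-character prompt)"

question = "How long do refunds take?"
ids = retrieve(question, DOCS)
prompt = augment(question, ids, DOCS)
answer = generate(prompt)
print(prompt)
```

Labelling each snippet with its source id in step 2 is what makes citations possible later: the model can point back to `[faq-refunds]` instead of answering from nowhere.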

Most "chat with your documents" products are RAG under the hood. ChatGPT's Custom GPTs with uploaded files: RAG. Notion AI answering questions about your workspace: RAG. A support chatbot that knows your help centre: usually RAG.

RAG vs fine-tuning

Fine-tuning bakes knowledge into the model's weights by re-training it on your data. RAG keeps the knowledge outside the model and looks it up at query time. The trade-offs:

RAG

  • Updates as soon as the index is refreshed, with no re-training
  • Cheap to add new documents
  • Can cite sources
  • Works with any model

Fine-tuning

  • Better at teaching style, format, and behaviour
  • No retrieval step at runtime
  • Hard to update: you re-train when data changes
  • Tied to one specific model

For "remember my notes," RAG is almost always the right tool. For "respond in this exact voice," fine-tuning can help. They're not mutually exclusive. Production systems often use both.

RAG vs MCP

RAG is a pattern. MCP is a protocol. They're not competitors. MCP is a clean way to plumb retrieval into the assistant.

In a typical RAG setup, somebody has to build the retrieval pipeline: pick a vector store, decide what counts as a relevant chunk, pre-process documents into an index, and keep it in sync. That work doesn't go away with MCP. It just moves.

With MCP, the AI client decides when to search. It calls a tool on an MCP server, the server returns matching notes, and the model uses them. The retrieval logic lives on the server. The model gets fresh, live results without you wiring it up per-client.
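As a rough illustration of that exchange, here is the shape of a single tool call written out as plain Python dictionaries. MCP messages are JSON-RPC 2.0 and tools are invoked with the `tools/call` method; everything else here is an assumption for the sketch, including the `search_notes` tool name, the `NOTES` data, and the substring matching. No MCP SDK is used, and a real server would also handle initialisation, tool listing, and transport.

```python
import json

# Hypothetical note store standing in for the server's real data.
NOTES = [
    {"id": 1, "title": "Q3 architecture", "body": "We chose an event-driven design for billing."},
    {"id": 2, "title": "Onboarding", "body": "New hires get laptop access on day one."},
]

def build_tool_call(request_id, query):
    """Client side: a JSON-RPC 2.0 request that invokes a tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_notes", "arguments": {"query": query}},
    }

def handle_tool_call(request):
    """Server side: run the search and return the matches as text content."""
    params = request["params"]
    if params["name"] != "search_notes":
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32602, "message": "Unknown tool"}}
    query = params["arguments"]["query"].lower()
    hits = [n for n in NOTES if query in (n["title"] + " " + n["body"]).lower()]
    content = [{"type": "text", "text": f"{n['title']}: {n['body']}"} for n in hits]
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": content, "isError": False}}

request = build_tool_call(1, "billing")
response = handle_tool_call(request)
print(json.dumps(response, indent=2))
```

The retrieval logic sits entirely inside `handle_tool_call`. The client only needs to know the tool's name and arguments, which is why the same server can serve more than one AI client.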

Put plainly: MCP doesn't replace RAG; it standardises the wiring. The MCP server still does the retrieval. What you avoid is building and maintaining your own pipeline on top of an AI client that doesn't natively know how to call your data.

Where RAG falls short

RAG is great, not magic. A few honest limits worth naming:

  • Retrieval is the bottleneck. If the search step returns the wrong snippets, the model writes a confident answer based on irrelevant material.
  • Chunking matters. Split documents too coarsely and you waste context window. Too finely and you lose the surrounding meaning.
  • It can still hallucinate. Grounding makes answers better, not always correct. Models occasionally invent details that weren't in the retrieved text.
  • Stale snapshots. Many RAG systems index your data once and forget. If the index isn't kept in sync, the AI quietly answers from an old version. This is the same trap as Custom GPT knowledge files.
  • Permissions are easy to leak. A naive RAG system that pre-indexed everything can surface a document the current user shouldn't see. The retrieval layer has to respect access controls.
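To make the chunking trade-off concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name, the word-based splitting, and the default sizes are assumptions for illustration; real pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, size=5, overlap=2):
    print(chunk)
```

A larger `size` keeps more surrounding meaning in each chunk but spends more of the context window per hit. The overlap is a cheap way to avoid cutting a thought in half at a chunk boundary.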

Live retrieval through MCP avoids two of these: stale snapshots and leaked permissions. The AI hits your real data each time, and your authentication runs on every call.
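One way to picture access control in the retrieval layer is a filter applied at query time, before anything reaches the prompt. This is a minimal sketch with hypothetical names (`Doc`, `allowed_users`, `search`) and a toy substring match; it is not how any particular product implements permissions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_users: frozenset  # hypothetical per-document access list

INDEX = [
    Doc("salaries", "Salary bands for the engineering team.", frozenset({"alice"})),
    Doc("handbook", "The engineering handbook covers code review.", frozenset({"alice", "bob"})),
]

def search(query, user, index):
    """Return ids of matching documents that this user is allowed to read."""
    q = query.lower()
    return [d.doc_id for d in index
            if user in d.allowed_users and q in d.text.lower()]

print(search("engineering", "alice", INDEX))
print(search("engineering", "bob", INDEX))
```

The check runs on every call with the current user, so a document that was indexed for everyone never reaches a prompt it shouldn't.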

RAG and Hjarni

Hjarni doesn't sell itself as a "RAG product." But functionally, when you connect Claude or ChatGPT to Hjarni's MCP server, that's exactly what you get: the AI retrieves relevant notes from your knowledge base, those notes are pasted into context, and the model writes an answer grounded in what you actually wrote.

The difference is who runs the pipeline. Hjarni runs the search and read tools. Your AI client decides when to call them. You don't manage embeddings, chunk sizes, or a vector store. You just write notes.

RAG is the pattern. MCP is the wiring. Your notes are the data. Hjarni is the place they live.

If you want the long version of the "give your AI a memory" argument, see how to give your AI long-term memory.

Common questions

Does RAG replace fine-tuning?

For most knowledge use cases, yes. Fine-tuning is better for teaching style, format, and behaviour. RAG is better for keeping facts current. Production systems often use both.

Is MCP a kind of RAG?

Not exactly. MCP is a protocol, not a pattern. But when an AI client calls an MCP server's search tool and uses the results in its answer, the end-to-end behaviour is RAG. The retrieval just runs on the server instead of inside the product.

Do I need a vector database to use RAG?

No. Vector search is one way to retrieve, full-text search is another, and many production systems combine the two (often called hybrid search). Hjarni's MCP server uses full-text search; you don't run a vector database to get the benefit.
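For readers curious what blending two retrievers can look like, one common technique is reciprocal rank fusion: each document scores 1 / (k + rank) in every ranking it appears in, and the scores are summed. The sketch below is illustrative only. The document ids and the two input rankings are made up, `k=60` is a conventional default rather than a requirement, and nothing here describes how Hjarni or any specific product ranks results.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

full_text_hits = ["a", "b", "c"]  # hypothetical full-text ranking
vector_hits = ["b", "d", "a"]     # hypothetical vector-search ranking
print(reciprocal_rank_fusion([full_text_hits, vector_hits]))
```

Documents that both retrievers rank highly float to the top, and the method only needs ranks, so the two retrievers' raw scores never have to be comparable.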

Why do RAG systems still hallucinate?

Grounding reduces hallucinations; it doesn't eliminate them. If the retrieval step returns the wrong documents, or the model overrides the context with its own training, you can still get invented details. Citations help you spot it.

Where can I learn more about the original RAG paper?

The 2020 paper by Lewis et al. ("Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks") is the canonical reference. The industry usage of the term has since broadened.

Works with Claude and ChatGPT today. Gemini coming soon.