MachineCraft
Retrieval (RAG) | SHIPPED

Vector Store RAG

Retrieval-augmented generation, end to end. Index your data once, then answer every question from the passages that actually matter — grounded responses that scale past a single document.

The flow

What's in the pipeline.

Index a corpus and answer from the passages most relevant to each question.

  1. File
  2. Split Text
  3. Embeddings
  4. Vector Store
  5. Prompt
  6. Language Model
  7. Chat Output

Engine: Workflow Engine (SHIPPED)
Category: Retrieval (RAG)
Level: Advanced
Components: 7

When the knowledge is too large to paste into a prompt, you retrieve the relevant parts instead of stuffing everything in. Vector Store RAG is the full pattern: split your documents into passages, embed them, store them in a vector database, and at question time pull back only the chunks that match the question, then answer from those. It's a production-grade approach to grounding a model in a corpus.

How it works

There are two phases, and the flow shows both: an ingestion path that indexes your data, and a query path that answers questions.

Ingest (run when your data changes):

  1. File loads the source documents.
  2. Split Text breaks them into passage-sized chunks — small enough to be precise, large enough to keep meaning intact.
  3. Embeddings turns each chunk into a vector that captures its meaning.
  4. Vector Store writes those vectors to the database so they're searchable.
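The ingest steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MachineCraft's implementation: `split_text`, `embed`, `InMemoryVectorStore`, and `ingest` are hypothetical names, the sample documents are invented, and the "embedding" is a normalised word-count vector standing in for a real embedding model (it captures word overlap only, not meaning).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def split_text(text: str, chunk_size: int = 40) -> list[str]:
    """Split Text: break a document into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> dict[str, float]:
    """Embeddings stand-in: a unit-length sparse vector of word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


@dataclass
class InMemoryVectorStore:
    """Vector Store stand-in: keeps (vector, chunk, source) rows in a list."""
    rows: list = field(default_factory=list)

    def add(self, vector: dict[str, float], chunk: str, source: str) -> None:
        self.rows.append((vector, chunk, source))


def ingest(documents: dict[str, str], store: InMemoryVectorStore, chunk_size: int = 40) -> int:
    """Run File -> Split Text -> Embeddings -> Vector Store; return chunks written."""
    written = 0
    for source, text in documents.items():  # File: documents are already loaded as strings
        for chunk in split_text(text, chunk_size):
            store.add(embed(chunk), chunk, source)
            written += 1
    return written


# Invented sample corpus: one 70-word document and one short note.
docs = {
    "handbook.txt": " ".join(f"word{i}" for i in range(70)),
    "notes.txt": "Invoices are issued monthly.",
}
store = InMemoryVectorStore()
print(ingest(docs, store, chunk_size=40))  # 3 chunks: two from the handbook, one from the notes
```

Keeping the source name next to each vector is a design choice worth copying in a real index: it is what lets you trace a retrieved chunk back to the document it came from.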

Query (run per question):

  1. The question is embedded the same way and the Vector Store returns the nearest passages — semantic match, not keyword match.
  2. Prompt assembles those retrieved passages with the question.
  3. Language Model answers from the supplied context, and Chat Output returns it.
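The query path can be sketched the same way. The example below is a hedged illustration, not the product's code: `embed`, `retrieve`, and `build_prompt` are hypothetical names, the passages are invented, and the word-count "embedding" is only a stand-in, so it matches on shared words where a real embedding model would match on meaning. The Language Model call is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding: unit-length sparse vector of word counts (assumption, not a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity; both vectors are already unit length, so a dot product suffices."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def retrieve(question: str, index: list, k: int = 2) -> list[str]:
    """Embed the question the same way as the chunks and return the k nearest passages."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Prompt: combine numbered passages with the question."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Invented passages standing in for an already-built index.
chunks = [
    "Data residency settings are chosen per workspace.",
    "Invoices are issued on the first day of each month.",
    "Password resets expire after one hour.",
]
index = [(embed(c), c) for c in chunks]

question = "When are invoices issued?"
passages = retrieve(question, index, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

Numbering the passages in the prompt makes it easier to ask the model to cite which passage supports each claim.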

Separating ingestion from query is what lets this scale: you index once, query many times, and re-index only when the underlying data changes.
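One way to re-index only when data changes is to keep a content hash per document and compare it on each run. The sketch below assumes a hypothetical manifest (`seen`) stored alongside the index; `fingerprint` and `plan_reindex` are illustrative names, not part of any MachineCraft API.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(documents: dict[str, str], seen: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare current hashes with the manifest; return (to_index, to_delete)."""
    to_index = [name for name, text in documents.items() if seen.get(name) != fingerprint(text)]
    to_delete = [name for name in seen if name not in documents]
    return to_index, to_delete


# First run: nothing has been indexed yet, so every document is new.
docs = {"a.txt": "first version", "b.txt": "stable text"}
seen: dict[str, str] = {}
to_index, to_delete = plan_reindex(docs, seen)
seen.update({name: fingerprint(docs[name]) for name in to_index})

# Later run: a.txt was edited and b.txt was removed from the corpus.
docs = {"a.txt": "second version"}
to_index, to_delete = plan_reindex(docs, seen)
print(to_index, to_delete)  # ['a.txt'] ['b.txt']
```

Documents in `to_delete` matter as much as the changed ones: stale chunks left in the index can still be retrieved and cited.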

When to reach for it

Reach for Vector Store RAG when grounding has to scale: a documentation set, a policy library, a knowledge base, anything too big or too changeable to fit in a single prompt. Retrieval keeps each answer focused on the few passages that matter, so the prompt stays a bounded size as the corpus grows, which keeps responses accurate and affordable.

When to reach for something else

The index is real infrastructure to stand up and maintain, which is overkill for one document; there, Document Q&A loads the whole text with nothing to configure. And retrieval grounds facts, not conversation: for a multi-turn assistant, compose this with the Memory Chatbot pattern so the bot keeps both the conversation thread and the citations.

Try it

Index a handful of documents, then ask a question whose answer lives in just one of them:

"Which regions is data residency supported in, and what's the default?"

A working pipeline retrieves the passage that covers residency and answers from it, ignoring the rest of the corpus. Inspect the retrieved chunks to confirm the model is citing real passages, not filling gaps from its training data. That visibility into what was retrieved is what lets you trust a RAG system.

Retrieval quality is chunking quality

Most RAG results live or die on the Split Text settings. Chunks that are too large bury the answer in noise; chunks that are too small lose context. Tune chunk size and overlap to your documents; it's usually the highest-leverage knob in the flow.
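To make the two knobs concrete, here is a minimal sliding-window splitter. It is a sketch under assumptions: it counts words, whereas the actual Split Text component may count characters or tokens, and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Slide a window of chunk_size words, repeating `overlap` words between neighbours."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end; avoid a redundant tail chunk
    return chunks


sample = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"

for size, overlap in [(4, 0), (4, 1), (8, 2)]:
    chunks = split_with_overlap(sample, size, overlap)
    print(f"size={size} overlap={overlap} -> {len(chunks)} chunks: {chunks}")
```

With overlap, the last words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable whole from at least one chunk. The cost is more chunks to embed and store.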

Make this template yours.

MachineCraft is free during private beta. Join the waitlist and we’ll bring you in to start from this flow and adapt it to your work.