AI · MCP · RAG · Open Source · Python

claude-rag: giving Claude Code a long-term memory, fully local

We built a small open-source MCP server that indexes every past Claude Code conversation on your machine and lets Claude search them — no cloud, no API keys, ~350 lines of Python.


The problem

Anyone who uses Claude Code daily runs into the same wall: the model is sharp inside a single session, but it has no memory of last week's session in the same repo. You end up re-explaining the same architectural decisions, the same workaround for that one flaky test, the same reason you picked library A over library B — over and over.

Claude Code already writes every turn to disk under ~/.claude/projects/**/*.jsonl. The transcripts are right there. The only thing missing is a way for Claude to actually look at them.
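A minimal reader for those transcripts might look like the sketch below. The schema here is an assumption for illustration: each line is treated as a JSON object whose `message` carries a `role` and a `content` that is either a plain string or a list of typed text parts; the real Claude Code records carry more fields than this.

```python
import json

def extract_turns(jsonl_lines):
    """Pull (role, text) pairs out of transcript lines.

    Assumes each line is a JSON object with message.role and
    message.content (a string, or a list of {"type": "text", ...}
    parts); this is a simplification of the real record schema.
    """
    turns = []
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines defensively
        msg = record.get("message") or {}
        content = msg.get("content")
        if isinstance(content, list):
            # Flatten typed content parts down to their text
            content = "".join(
                part.get("text", "") for part in content if isinstance(part, dict)
            )
        if content:
            turns.append((msg.get("role", "unknown"), content))
    return turns
```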

What we built

claude-rag is a tiny RAG layer over those local transcripts, exposed as an MCP server so Claude can query it from inside the IDE.

  • Parses every JSONL transcript under ~/.claude/projects/, strips IDE/system-reminder wrappers, and chunks by character (1,500-character chunks with a 200-character overlap)
  • Embeds chunks with sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 — runs on CPU via fastembed, supports both English and Chinese queries
  • Stores vectors in a local Chroma PersistentClient — cosine similarity, ~50 MB after a few months of use
  • Incremental: only re-embeds JSONL files whose mtime/size changed since the last run
  • MCP server (stdio) exposes three tools: search_conversations, get_session_window, reindex
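The chunking and incremental-reindex steps above can be sketched in a few lines. The function names and bookkeeping here are illustrative, not the project's actual API:

```python
import os

def chunk_text(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping character chunks."""
    step = size - overlap
    chunks = [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
    return [c for c in chunks if c]

def is_stale(path: str, seen: dict) -> bool:
    """True if path changed (mtime or size) since the last indexing run.

    `seen` maps path -> (mtime, size) recorded at the previous run;
    only stale files get re-chunked and re-embedded.
    """
    st = os.stat(path)
    return seen.get(path) != (st.st_mtime, st.st_size)
```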

Why it matters

Once registered, Claude can answer things like "do you remember what we decided about X" or "回忆一下我之前是怎么处理 Y 的" ("recall how I handled Y before") by calling search_conversations on its own — no manual copy-paste, no separate UI to open. The relevant chunks come back into the current context window and the conversation continues.
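Under the hood, a search_conversations call is just a nearest-neighbour lookup over the embedded chunks. Chroma handles it (with persistence and ANN indexing) in the real server; the ranking step amounts to something like this pure-Python sketch, where `cosine` and `search` are toy helpers, not the project's code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, index, top_k=3):
    """Return the top_k chunk texts closest to query_vec.

    `index` is a list of (chunk_text, embedding) pairs, standing in
    for the Chroma collection queried by the real server.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```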

The whole thing is also a useful pattern in itself: a small, single-purpose MCP server with no SaaS and no telemetry that fits on one developer's machine. Easy to fork and adapt.

Privacy

Nothing leaves the machine. The embedding model runs locally, the vector DB is a file on disk, and the MCP server only talks to Claude Code over stdio. No API keys are required and none are used.

The Chroma DB does store the full chunk text — same sensitivity profile as the original transcripts — so it lives under ~/.claude-rag/ and is excluded from sync by default.

The numbers

  • ~350 lines of Python total
  • ~220 MB embedding model (downloaded once, cached)
  • ~50 MB index after several months of daily use
  • 5–10 minutes for a first full index on CPU
  • Incremental re-index in seconds for a normal workday

Try it

It's open source. Clone, install with uv, run the indexer, register the MCP server, and Claude has its memory back.
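Registration is a single stdio entry in Claude Code's MCP config. The snippet below is a hypothetical `.mcp.json` fragment; the command and args are illustrative, so check the repo's README for the exact invocation:

```json
{
  "mcpServers": {
    "claude-rag": {
      "command": "uv",
      "args": ["run", "server.py"]
    }
  }
}
```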

Have a project like this?

JY Tech designs, ships, and runs Next.js + Vercel products for startups and small businesses across the US and China.

Talk to us