Building a Personal Knowledge Graph for My AI Tools
Self-hosted Cognee plus a 14B local LLM finally gave my Claude Code sessions cross-project memory. The hard part wasn't the infrastructure.
I run Claude Code across ten active projects. A quant trading platform with 14 signal generators. A smart-building strategy platform assessing buildings across ten domains. A career coaching app. A personal site. A self-hosted agent framework. Each has its own CLAUDE.md file, its own project memory dir, its own set of decisions I've accumulated about how I want Claude to behave in that codebase.
Each one is also a new session. A fresh prompt. A stranger who doesn't know what I decided last Tuesday in a different repo.
The worst version of this showed up when I was writing a blog post about one project and wanted to reference a decision I'd made in another. Claude had no way to know. Auto-memory is per-project. CLAUDE.md files don't cross-reference. Every cross-project question started with me pasting screenshots. I was the context pipe.
So I built a personal knowledge graph.
The stack
Cognee handles extraction and retrieval. It pulls entities and relationships out of markdown into a graph plus a vector index, and exposes search over MCP. It runs in a Proxmox LXC on my homelab cluster, fronted by NGINX Proxy Manager with bearer auth, HA-replicated across two nodes.
The LLM tier matters more than I expected. I A/B tested three local models against the same 7-file test corpus, running cognify serially per dataset: gemma4:31b failed on reliability, producing invalid KnowledgeGraph JSON about 40% of the time. qwen2.5:32b was reliable but twice as slow as phi4 and measurably shallower on per-item detail. phi4:14b won on hand-graded scoring, 13 of 15 versus Qwen's 11.5. Embeddings run locally via Ollama too: snowflake-arctic-embed2 at 1024 dimensions with an 8K-token context, after nomic-embed-text kept erroring on summary chunks that exceeded its 2K-token runtime limit.
The seeder is idempotent by content hash. SHA-256 per source file, .seed-state.json tracks what's been ingested, re-runs skip unchanged files. Batches of five sources at a time. Cognify fires per-dataset — never against multiple datasets in one call, because concurrent cognify on a single SQLite metadata DB deadlocks. No content leaves my LAN. Everything runs on a Mac Studio M3 Ultra.
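The skip logic is small enough to sketch. Here `ingest_batch` is a hypothetical stand-in for whatever actually calls Cognee, and the state-file layout is illustrative, not the seeder's real schema:

```python
import hashlib
import json
from pathlib import Path

STATE_FILE = Path(".seed-state.json")
BATCH_SIZE = 5  # batches of five sources at a time

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def pending_sources(sources):
    """Compare each source's content hash against the recorded state;
    only new or changed files make it into the to-ingest list."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    todo = [(src, sha256_of(src)) for src in sources
            if state.get(str(src)) != sha256_of(src)]
    return state, todo

def seed(sources, ingest_batch):
    state, todo = pending_sources(sources)
    for i in range(0, len(todo), BATCH_SIZE):
        batch = todo[i:i + BATCH_SIZE]
        ingest_batch([src for src, _ in batch])  # one dataset per cognify call
        for src, digest in batch:
            state[str(src)] = digest
        # checkpoint after every batch so a crash mid-run loses at most one batch
        STATE_FILE.write_text(json.dumps(state, indent=2))
```

Re-running against an unchanged corpus is a no-op: every hash matches the recorded state, so nothing reaches the ingest call.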
The interesting failures
Acronym collision. My profile file mentions "MCP connectors" as part of the Splunk-adjacent work I do at Cisco. My self-hosted agent research files mention "Mission Control" as the name of the orchestrator. The extractor collapsed them. A query about what I'm working on with Splunk returned agent-framework research instead of Model Context Protocol integration. The fix was a single clause added to my profile: "MCP (Model Context Protocol, the Anthropic specification for connecting LLMs to external tools)." Re-seeded that file. Direct queries now resolve correctly. Some older graph edges still point at the conflated node; re-seeding teaches the extractor the right definition going forward, but it doesn't rewire everything that referenced the old one.
Title flattening. My title on disk reads "Global Workplace Field CTO and Engineering Director." The graph returns "Global Workplace Field CTO." The Engineering Director suffix is dropped during summarization. You can't fight a 14B model on exact string preservation. You route around it.
LanceDB poisoning. My first attempt to ingest a 77KB CLAUDE.md file took down the entire projects dataset. A single chunk's extraction failed mid-pipeline and wrote a partial record with feedback_weight=None into LanceDB's TextSummary_text table. Because that field is non-null in the schema, every subsequent good chunk also failed — the whole table locked up. Recovery required a full volume wipe. Fix: split the file at H2 boundaries into eleven smaller sources (856 bytes to 30KB each). If one section's extraction fails now, only that section is poisoned, not the whole store.
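The splitter is a few lines of stdlib Python. A minimal sketch, assuming H2 headings always start a line with `## ` and that the preamble before the first H2 becomes its own section:

```python
def split_at_h2(text: str):
    """Split markdown into sections at H2 headings so a failed
    extraction poisons one section instead of the whole file."""
    sections, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("## ") and current:
            sections.append("".join(current))  # flush the previous section
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections
```

The split is lossless: concatenating the sections reproduces the original file byte for byte, which makes it easy to verify nothing was dropped before re-seeding.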
I've documented a dozen separate gotchas like these so future sessions don't relearn them — serial-cognify-per-dataset, the batch size limit, the explicit max_tokens requirement for certain providers, MCP reverse-proxy host headers with explicit ports, why Sonnet was too expensive for bulk extraction, how to handle the 77KB file. Each one cost me either time or money to learn the first time.
The design insight
Every one of those failures pointed at the same thing: the memory system has a correctness boundary, and I have to declare where it ends.
The rule that landed is precedence. Three layers:
- My profile file is canonical for identity — role, title, voice, topic guardrails. The memory system never answers those questions, even if it thinks it knows, because its extraction is lossier than the source.
- Blog post files are canonical for voice rhythm and repetition checks.
- Persona-memory is canonical for the things the canonical files don't cover — project architectures, specific decisions, technical receipts.
When the sources might conflict, the skill asks me instead of picking. Silent resolution is the failure mode worse than noise.
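The precedence rule reduces to a lookup plus a refusal to resolve conflicts silently. A sketch of that shape, with illustrative category and source names (none of these identifiers come from the actual skill):

```python
class NeedsHumanDecision(Exception):
    """Raised instead of silently picking a winner between sources."""
    def __init__(self, canonical, chosen, conflicts):
        super().__init__(f"{canonical} says {chosen!r}, but: {conflicts}")

# Illustrative precedence table: topic category -> canonical source.
PRECEDENCE = {
    "identity": "profile_file",         # role, title, voice, guardrails
    "voice": "blog_posts",              # rhythm and repetition checks
    "project_detail": "persona_memory", # architectures, decisions, receipts
}

def resolve(category: str, answers: dict) -> str:
    """Return the canonical layer's answer for this category.
    If a non-canonical layer disagrees, surface the conflict."""
    canonical = PRECEDENCE.get(category, "persona_memory")
    chosen = answers.get(canonical)
    conflicts = {src: val for src, val in answers.items()
                 if src != canonical and val is not None and val != chosen}
    if conflicts:
        raise NeedsHumanDecision(canonical, chosen, conflicts)
    return chosen
```

The point of the exception is that a disagreement is a user-visible event, not a tie to be broken by whichever layer answered last.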
The moment that became explicit, the rest fell out. My write-post skill gained a gate: only query persona-memory when the post references a specific project by name and needs technical detail that isn't in git history or the project's CLAUDE.md. A post about Sunday night dread doesn't need a trading platform's architecture injected. A post about my quant platform does.
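That gate is a two-condition check. A sketch with hypothetical project names, standing in for however the real skill detects a project reference:

```python
# Hypothetical project identifiers; the real list lives in the skill config.
KNOWN_PROJECTS = {"quant-platform", "smart-building", "career-coach"}

def should_query_persona_memory(post_text: str,
                                covered_by_local_sources: bool) -> bool:
    """Query persona-memory only when the post names a specific project
    AND the needed detail isn't already in git history or CLAUDE.md."""
    names_project = any(p in post_text for p in KNOWN_PROJECTS)
    return names_project and not covered_by_local_sources
```

A post about Sunday night dread fails the first condition; a post whose detail is already in the repo fails the second. Only the intersection triggers a graph query.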
The broader point
The interesting question in AI tooling right now isn't whether you can give a model context. The infrastructure for that is commodity. Vector stores are cheap. Graph databases are open source. Local LLMs are good enough for extraction. An MCP server is a weekend of work.
The interesting question is what the model should be trusted to remember, and what it should defer to a canonical source for. Trust too much and you get confident answers that are subtly wrong. Trust too little and you've built infrastructure nobody uses.
I built the knowledge graph. Then I wrote the rule that tells my AI not to trust it for the things it's bad at.
That rule was the actual feature.