Project Overview
Flat vector-search RAG answers “what chunk looks similar to this question” — but real questions about a codebase or document set are structural: how does X relate to Y, what depends on this, what’s the big picture here? Graphify is a retrieval system I built to answer those questions properly.
It ingests heterogeneous input (code, documentation, papers, images, video), builds a persistent knowledge graph from it, and serves structured queries against that graph.
The Challenge
- Vector similarity retrieves chunks, not relationships — it can’t answer “what connects A to B”
- Understanding a large codebase or document corpus needs a map, not a search box
- The graph must persist and stay queryable across sessions, not be rebuilt per question
What I Built
- Universal ingestion: one pipeline that turns heterogeneous inputs (source code, Markdown, PDFs, media) into graph entities and relationships
- God nodes and community detection: automatically surfaces the most connected concepts (god nodes) and clusters related entities into communities, giving an at-a-glance architectural overview of a corpus
- Query, path, and explain tools: ask direct questions, trace the connection path between any two entities, or get an explanation of a node in the context of its neighborhood
- Persistent storage: graphs survive across sessions and grow incrementally as new input arrives
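To make the god-node and path ideas concrete, here is a minimal sketch using only the Python standard library. It is not Graphify's actual code: the function names (`build_graph`, `god_nodes`, `find_path`), the toy edge list, and the choices of plain degree for ranking and breadth-first search for path tracing are all assumptions made for illustration.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map: entity -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def god_nodes(graph, top_n=3):
    """Rank entities by degree, highest first (ties broken by name)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:top_n]


def find_path(graph, start, goal):
    """Breadth-first search for a shortest chain linking start to goal."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # the two entities are not connected


# Hypothetical entities extracted from a small web application.
EDGES = [
    ("auth", "db"), ("auth", "api"), ("api", "db"),
    ("api", "router"), ("router", "templates"), ("db", "migrations"),
]
GRAPH = build_graph(EDGES)

print(god_nodes(GRAPH, top_n=2))                    # ['api', 'db']
print(find_path(GRAPH, "migrations", "templates"))  # ['migrations', 'db', 'api', 'router', 'templates']
```

Degree is the simplest possible centrality measure. A real system might weight edges by relationship type or use a different ranking, but the shape of the question ("which entities does everything else hang off?") stays the same.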
Outcomes
- Answers architecture-level questions about codebases and document sets that flat similarity search cannot express
- Used in my own workflow to onboard onto unfamiliar codebases and research topics
- The retrieval patterns it implements — entity extraction, graph construction, structured querying — are the same ones I build into client RAG systems
Technologies Used
- Language: Python
- Approach: LLM-driven entity/relationship extraction, graph algorithms (community detection, pathfinding)
- Interface: Query/path/explain tooling over the persisted graph
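As a rough illustration of how a graph can persist across sessions, grow incrementally, and support an explain-style lookup, the sketch below stores relationship triples in a JSON file. The storage format, the function names (`load_graph`, `add_edges`, `explain`), and the sample triples are assumptions for this example, not a description of how Graphify stores its data.

```python
import json
import os
import tempfile


def load_graph(path):
    """Load persisted (source, relation, target) triples, or start empty."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [tuple(edge) for edge in json.load(fh)]


def add_edges(path, new_edges):
    """Merge new triples into the stored graph, skipping duplicates."""
    edges = load_graph(path)
    seen = set(edges)
    for edge in new_edges:
        edge = tuple(edge)
        if edge not in seen:
            seen.add(edge)
            edges.append(edge)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(edges, fh)
    return edges


def explain(edges, node):
    """Describe a node through every relationship that touches it."""
    return [f"{s} {r} {t}" for s, r, t in edges if node in (s, t)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "graph.json")
        # First session: ingest one relationship.
        add_edges(store, [("api", "calls", "db")])
        # Later session: new input arrives; the duplicate is ignored.
        edges = add_edges(store, [("api", "calls", "db"),
                                  ("db", "runs", "migrations")])
        print(explain(edges, "db"))
```

A flat JSON file keeps the example self-contained. A larger graph would more plausibly live in a database or a dedicated graph store.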