Graphify — Knowledge Graphs from Any Input

Project Overview

Flat vector-search RAG answers “what chunk looks similar to this question” — but real questions about a codebase or document set are structural: how does X relate to Y, what depends on this, what’s the big picture here? Graphify is a retrieval system I built to answer those questions properly.

It ingests any input (code, documentation, papers, images, video), builds a persistent knowledge graph from it, and serves structured queries against that graph.

The Challenge

Vector similarity retrieves chunks, not relationships — it can’t answer “what connects A to B”
Understanding a large codebase or document corpus needs a map, not a search box
The graph must persist and stay queryable across sessions, not be rebuilt per question

What I Built

Universal ingestion: one pipeline that turns heterogeneous inputs (source code, Markdown, PDFs, media) into graph entities and relationships
God nodes and community detection: automatically surfaces the most connected concepts (the “god nodes”) and clusters related entities into communities, giving an instant architectural overview of any corpus
Query, path, and explain tools: ask direct questions, trace the connection path between any two entities, or get an explanation of a node in the context of its neighborhood
Persistent storage: graphs survive across sessions and grow incrementally as new input arrives
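To make the god-node, community, path, and explain ideas above concrete, here is a minimal standard-library sketch. It is not Graphify's actual code: the triples, the function names (build_graph, god_nodes, find_path, communities, neighborhood), and the choice of degree ranking, breadth-first search, and a simple label-propagation pass are all assumptions made for illustration.

```python
from collections import Counter, defaultdict, deque

# Hypothetical (source, relation, target) triples, as an extraction step might emit them.
TRIPLES = [
    ("api", "calls", "auth"),
    ("api", "calls", "db"),
    ("auth", "reads", "db"),
    ("db", "configured_by", "settings"),
    ("settings", "documented_in", "readme"),
    ("settings", "loaded_by", "cli"),
    ("cli", "documented_in", "readme"),
]

def build_graph(triples):
    """Undirected adjacency map: node -> {neighbor: relation}."""
    graph = defaultdict(dict)
    for source, relation, target in triples:
        graph[source][target] = relation
        graph[target][source] = relation
    return graph

def god_nodes(graph, top_n=3):
    """Rank nodes by degree (number of distinct neighbors), highest first."""
    ranked = sorted(graph, key=lambda node: (-len(graph[node]), node))
    return ranked[:top_n]

def find_path(graph, start, goal):
    """Shortest path by breadth-first search, or None if there is none."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def communities(graph, max_rounds=20):
    """Label propagation: each node adopts its neighbors' most common label."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            counts = Counter(labels[n] for n in graph[node])
            best = max(counts.values())
            # Ties go to the largest label so the result is deterministic.
            choice = max(label for label, c in counts.items() if c == best)
            if choice != labels[node]:
                labels[node] = choice
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())

def neighborhood(graph, node):
    """The 'explain' view: a node's direct relations, sorted for stable output."""
    return sorted((rel, nb) for nb, rel in graph.get(node, {}).items())

graph = build_graph(TRIPLES)
print(god_nodes(graph, top_n=2))
print(find_path(graph, "api", "readme"))
print(communities(graph))
print(neighborhood(graph, "db"))
```

On this toy graph the two most connected nodes are db and settings, the path from api to readme runs through both, and label propagation separates the code-side cluster from the configuration-side cluster. A production system would more likely rely on a graph library and a stronger community algorithm; the sketch only shows the shape of the queries.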

Outcomes

Answers architecture-level questions about codebases and document sets that flat similarity search cannot express
Used in my own workflow to onboard onto unfamiliar codebases and research topics
The retrieval patterns it implements — entity extraction, graph construction, structured querying — are the same ones I build into client RAG systems

Technologies Used

Language: Python
Approach: LLM-driven entity/relationship extraction, graph algorithms (community detection, pathfinding)
Interface: Query/path/explain tooling over the persisted graph
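As an illustration of how a graph can persist across sessions and grow incrementally, the sketch below stores nodes and edges in a JSON file and merges new triples into whatever is already there. The source does not say how Graphify stores its graph, so the file format, the field names ("nodes", "edges", "sources"), and the function names here are hypothetical.

```python
import json
from pathlib import Path

def load_graph(path):
    """Return the stored graph, or an empty one if nothing has been saved yet."""
    path = Path(path)
    if not path.exists():
        return {"nodes": {}, "edges": []}
    return json.loads(path.read_text(encoding="utf-8"))

def merge_triples(graph, triples, source_file):
    """Add (subject, relation, target) triples in place; return how many edges were new."""
    seen = {tuple(edge) for edge in graph["edges"]}
    added = 0
    for subject, relation, target in triples:
        for name in (subject, target):
            # Track which input each entity came from (assumed bookkeeping).
            node = graph["nodes"].setdefault(name, {"sources": []})
            if source_file not in node["sources"]:
                node["sources"].append(source_file)
        edge = (subject, relation, target)
        if edge not in seen:
            seen.add(edge)
            graph["edges"].append(list(edge))
            added += 1
    return added

def save_graph(graph, path):
    """Write the graph back to disk as JSON."""
    Path(path).write_text(json.dumps(graph, indent=2, sort_keys=True), encoding="utf-8")
```

A session would call load_graph, merge the triples extracted from any new input, and call save_graph, so later questions run against the accumulated graph instead of a rebuild. JSON keeps the sketch dependency-free; a graph database or SQLite would be the more likely choice at scale.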
