Graphify — Knowledge Graphs from Any Input

A retrieval system that turns codebases, docs, papers, and media into a persistent, queryable knowledge graph — structured retrieval where flat RAG falls short

Client

Personal project

Role

Creator

Timeline

Ongoing

Tech Stack

AI, RAG, LLM

Project Overview

Flat vector-search RAG answers “what chunk looks similar to this question”, but real questions about a codebase or document set are structural: how does X relate to Y, what depends on this, what’s the big picture here? Graphify is a retrieval system I built to answer those questions by querying explicit entities and relationships instead of ranking chunks by similarity.

It ingests any input (code, documentation, papers, images, video), builds a persistent knowledge graph from it, and serves structured queries against that graph.

The Challenge

  • Vector similarity retrieves chunks, not relationships — it can’t answer “what connects A to B”
  • Understanding a large codebase or document corpus needs a map, not a search box
  • The graph must persist and stay queryable across sessions, not be rebuilt per question
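
To make the first point concrete: “what connects A to B” is a path query over explicit edges, which a similarity ranking cannot express. The sketch below is only an illustration under stated assumptions. The entity names are hypothetical, and the plain adjacency list with breadth-first search is a generic technique, not necessarily how Graphify represents or searches its graph.

```python
from collections import deque

# Hypothetical relationships between entities (not taken from a real project).
EDGES = [
    ("AuthService", "TokenStore"),
    ("TokenStore", "RedisClient"),
    ("BillingJob", "RedisClient"),
    ("AuthService", "UserModel"),
]


def build_adjacency(edges):
    """Map each entity to the set of entities it is directly linked to."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def connection_path(adjacency, start, goal):
    """Breadth-first search: return a shortest path as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_adjacency(EDGES)
print(connection_path(graph, "AuthService", "BillingJob"))
# ['AuthService', 'TokenStore', 'RedisClient', 'BillingJob']
```

The two services never mention each other, so no chunk about one would rank highly for a question about the other; the link only appears once the shared dependency is an explicit node.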

What I Built

  • Universal ingestion: one pipeline that turns heterogeneous inputs (source code, Markdown, PDFs, media) into graph entities and relationships
  • God nodes and community detection: automatically surfaces the most connected concepts and clusters related entities, giving an instant architectural overview of any corpus
  • Query, path, and explain tools: ask direct questions, trace the connection path between any two entities, or get an explanation of a node in the context of its neighborhood
  • Persistent storage: graphs survive across sessions and grow incrementally as new input arrives
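
The overview and explain features listed above can be sketched in a few lines, assuming the graph is held as (source, relation, target) triples. Everything here is hypothetical: the file names, the function names, and the triple format are illustrative only. Connected components are used as the simplest possible stand-in for community detection; a real system would more likely use a modularity-based or label-propagation algorithm.

```python
# Hypothetical triples, shaped like the output an extraction step might produce.
TRIPLES = [
    ("api.py", "imports", "auth.py"),
    ("api.py", "imports", "db.py"),
    ("auth.py", "imports", "db.py"),
    ("worker.py", "imports", "db.py"),
    ("README.md", "documents", "CONTRIBUTING.md"),
]


def neighbors_of(triples):
    """Build an undirected adjacency map from the triples."""
    adjacency = {}
    for source, _relation, target in triples:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def god_nodes(adjacency, top_n=1):
    """Rank entities by degree (distinct neighbors), ties broken by name."""
    ranked = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    return ranked[:top_n]


def communities(adjacency):
    """Group entities into connected components.

    This is a coarse stand-in for real community detection.
    """
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(sorted(group))
    return groups


def explain(triples, node):
    """List every relationship the node takes part in, as short sentences."""
    return [f"{s} {r} {t}" for s, r, t in triples if node in (s, t)]


adjacency = neighbors_of(TRIPLES)
print(god_nodes(adjacency))            # ['db.py']
print(communities(adjacency))
print(explain(TRIPLES, "worker.py"))   # ['worker.py imports db.py']
```

Degree is the crudest centrality measure, but it already surfaces the module everything else depends on, which is the kind of signal a “most connected concepts” view relies on.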

Outcomes

  • Answers architecture-level questions about codebases and document sets that flat similarity search cannot express
  • Used in my own workflow to onboard onto unfamiliar codebases and research topics
  • The retrieval patterns it implements — entity extraction, graph construction, structured querying — are the same ones I build into client RAG systems

Technologies Used

  • Language: Python
  • Approach: LLM-driven entity/relationship extraction, graph algorithms (community detection, pathfinding)
  • Interface: Query/path/explain tooling over the persisted graph
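
Persistence with incremental growth can be illustrated with nothing more than a file of triples that is merged on every ingest. This is a minimal sketch under loud assumptions: the storage backend is not stated anywhere above, so the JSON file, the function names, and the triple format here are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_graph(path):
    """Load stored triples, or start empty if nothing has been saved yet."""
    path = Path(path)
    if not path.exists():
        return set()
    return {tuple(triple) for triple in json.loads(path.read_text())}


def add_triples(path, new_triples):
    """Merge new triples into the stored graph and write the result back."""
    triples = load_graph(path)
    triples.update(tuple(t) for t in new_triples)  # set union drops duplicates
    Path(path).write_text(json.dumps(sorted(triples), indent=2))
    return triples


# Two separate "sessions" writing to the same store (a temp dir for the demo).
with tempfile.TemporaryDirectory() as workdir:
    store = Path(workdir) / "graph.json"
    add_triples(store, [("api.py", "imports", "db.py")])
    merged = add_triples(
        store,
        [("api.py", "imports", "db.py"), ("worker.py", "imports", "db.py")],
    )
    reloaded = load_graph(store)

print(len(reloaded))  # 2
```

Because the store is a set of triples, re-ingesting unchanged input is idempotent: the graph grows only when genuinely new relationships arrive.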