Building a Code Knowledge Graph for AI Agents

There’s a fundamental tension at the heart of agentic coding today. AI models are remarkably good at writing code, but they’re surprisingly bad at understanding the codebase they’re writing it in.

When you ask Claude, Copilot, or Codex to make a change to your project, the agent does what any new developer would do: it greps around, reads some files, builds a mental model, and starts editing. But unlike a human developer who accumulates understanding over weeks and months, the AI starts from scratch every session. Worse, it’s working through a keyhole – reading files one at a time, piecing together relationships from text patterns, burning context window on exploration before it can even begin the real work.

What if there were a better way? What if the agent could simply ask what calls a function, what a change would break, or how two distant parts of the codebase connect – and get an instant, accurate answer?

That’s what codemap does.

The Origin Story: codegraph

The idea isn’t originally mine. Credit goes to codegraph, a project by Colby McHenry that first demonstrated the concept: parse a codebase with tree-sitter, extract symbols and their relationships into a graph, and expose that graph to AI tools via the Model Context Protocol (MCP).

It was an elegant insight. Instead of making the AI read and re-read source files to understand structure, give it a pre-computed knowledge graph it can query directly. Function calls, class hierarchies, import chains, containment relationships – all indexed and queryable in milliseconds.

When I encountered codegraph, I immediately saw the potential. But I also saw an opportunity to explore something I’d been thinking about: what does it look like to deploy MCP servers as compiled binaries?

Most MCP servers in the ecosystem are Node.js or Python scripts. They work, but they come with runtime dependencies, startup overhead, and deployment complexity. A compiled Rust binary, by contrast, is a single file. No runtime. No node_modules. No virtual environments. Just copy the binary and run it.

So I rebuilt the concept from scratch in Rust. The result is codemap.

At its core, codemap builds a semantic knowledge graph of your code. It parses source files using tree-sitter grammars, extracts symbols (functions, classes, methods, structs, traits, interfaces, enums, constants) and their relationships (calls, contains, imports, extends, implements), and stores everything in an SQLite database.
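As a sketch of what that graph model looks like in practice – the type and field names here are illustrative assumptions, not codemap's actual definitions:

```rust
// Illustrative node-and-edge model for a code knowledge graph.
// Names and fields are assumptions for this sketch, not codemap's real types.

#[derive(Debug, Clone, PartialEq)]
enum SymbolKind { Function, Class, Method, Struct, Trait, Interface, Enum, Constant }

#[derive(Debug, Clone, PartialEq)]
enum EdgeKind { Calls, Contains, Imports, Extends, Implements }

#[derive(Debug, Clone)]
struct Node {
    id: u64,
    name: String,
    kind: SymbolKind,
    file: String,
    line: u32,
    signature: Option<String>,
}

#[derive(Debug, Clone)]
struct Edge {
    from: u64, // id of the source node (e.g. the caller)
    to: u64,   // id of the target node (e.g. the callee)
    kind: EdgeKind,
}

// Convenience constructor for a "calls" relationship between two node ids.
fn call_edge(from: u64, to: u64) -> Edge {
    Edge { from, to, kind: EdgeKind::Calls }
}
```

Everything an agent asks – callers, callees, hierarchies, impact – reduces to traversals over rows shaped like these.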

The indexing pipeline works like this:

  1. Walk the source tree – respecting .gitignore, skipping node_modules, target, .git, and other noise
  2. Parse each file with the appropriate tree-sitter grammar (Rust, TypeScript, JavaScript, Python, Go, Java, C, or C++)
  3. Extract nodes – every named symbol gets a node with its kind, location, signature, visibility, docstring, and language
  4. Extract edges – function calls, containment (class contains method), imports, inheritance, implementations
  5. Resolve references – after all files are indexed, link unresolved function calls to their definitions
  6. Store in SQLite – with indexes optimized for the queries AI agents actually make

The entire index lives in .codemap/index.db inside your project. Re-indexing is incremental: codemap computes SHA-256 hashes of file contents and only re-processes files that have changed.
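The incremental check is simple to sketch: keep a stored hash per path and only re-process files whose hash moved. codemap uses SHA-256 of file contents; the standard library's DefaultHasher stands in here so the example needs no external crates:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for SHA-256 content hashing (codemap's actual choice),
// so this sketch stays dependency-free.
fn content_hash(contents: &str) -> u64 {
    let mut h = DefaultHasher::new();
    contents.hash(&mut h);
    h.finish()
}

/// Return the paths that need re-indexing, updating the stored hashes as we go.
fn files_to_reindex(
    stored: &mut HashMap<String, u64>,
    current: &[(&str, &str)], // (path, contents)
) -> Vec<String> {
    let mut dirty = Vec::new();
    for &(path, contents) in current {
        let h = content_hash(contents);
        if stored.get(path) != Some(&h) {
            stored.insert(path.to_string(), h);
            dirty.push(path.to_string());
        }
    }
    dirty
}
```

On a second run over an unchanged tree, the dirty list is empty and indexing is effectively a no-op.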

codemap exposes its knowledge graph through 17 MCP tools, designed around the questions AI agents actually need answered during coding tasks. Among them:

Exploration tools:

  • codemap-search – Find symbols by name with case-insensitive prefix matching
  • codemap-node – Get detailed metadata about a specific symbol
  • codemap-definition – Retrieve the full source code of a symbol with surrounding context
  • codemap-file – List every symbol defined in a specific file

Graph navigation:

  • codemap-callers – Who calls this function?
  • codemap-callees – What does this function call?
  • codemap-path – Find the call chain between two symbols (e.g., how does main reach database_query?)
  • codemap-references – Find every reference to a symbol across the codebase
  • codemap-hierarchy – Navigate class/module containment trees

Analysis tools:

  • codemap-impact – Before changing a function, see everything that would be affected (direct and indirect callers)
  • codemap-diff-impact – Same analysis, but scoped to a specific line range in a file
  • codemap-unused – Find dead code with zero incoming references
  • codemap-implementations – Find all structs/classes implementing a trait or interface

AI-focused:

  • codemap-context – The flagship tool. Given a task description like “add rate limiting to the auth endpoint,” it extracts keywords, finds matching symbols, traverses the graph for related code, reads the relevant source files, and returns a focused context package with code snippets – all in one call
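The first step of a context builder like this – turning a task description into search keywords – can be sketched in a few lines. The tokenization and stop-word list below are assumptions for illustration, not codemap's actual logic:

```rust
// Illustrative keyword extraction from a task description: lowercase,
// split on non-identifier characters, drop short words and filler words.
// The stop-word list is a guess, not codemap's real one.
fn extract_keywords(task: &str) -> Vec<String> {
    const STOP: &[&str] = &["add", "the", "to", "a", "an", "in", "of", "for", "and"];
    task.split(|c: char| !c.is_alphanumeric() && c != '_')
        .map(|w| w.to_lowercase())
        .filter(|w| w.len() > 2 && !STOP.contains(&w.as_str()))
        .collect()
}
```

Each surviving keyword then seeds a symbol search, and graph traversal expands outward from the hits.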

If you’ve watched an AI coding agent work on a non-trivial task, you’ve seen the pattern: the agent spends 60-70% of its turns just navigating. Searching for files. Reading them. Searching again. Reading more files. Building up enough context to understand what it needs to change.

This exploration phase is expensive in multiple ways:

  • Token cost – every file read consumes context window
  • Latency – each tool call is a round trip
  • Accuracy – grep-based exploration misses semantic relationships (a text search for “authenticate” won’t tell you that verify_token calls check_session which calls authenticate)
  • Context pollution – reading entire files to find one function buries the relevant code in noise

codemap compresses this exploration phase dramatically. Instead of:

Agent: *reads src/auth.rs* (400 lines consumed)
Agent: *greps for "authenticate"* (finds 12 matches across 8 files)
Agent: *reads 5 more files* (2000 lines consumed)
Agent: "OK, I think I understand the auth flow now..."

You get:

Agent: *calls codemap-context("add rate limiting to authentication")*
Agent: "Here are the 6 relevant functions with their source code,
        call relationships, and locations. Let me make the changes."

But the most powerful capability isn’t search – it’s impact analysis. When an agent needs to refactor a function, the critical question isn’t “where is this function?” It’s “what breaks if I change it?”

codemap-impact answers this by traversing the call graph upward from a symbol, separating direct callers from indirect callers, and presenting the full blast radius of a change. codemap-diff-impact goes further, letting you scope this analysis to specific line ranges – “if I change lines 45-60 of src/auth.rs, what’s affected?”

For AI agents making autonomous changes, this is the difference between confident refactoring and anxious guessing.
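The upward traversal itself is a bounded BFS over reversed call edges. A minimal sketch, assuming an in-memory callee-to-callers map (codemap does this over SQLite edges, not a HashMap):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Impact analysis sketch: walk the call graph upward from `start`,
// splitting direct callers (depth 1) from indirect ones (depth > 1).
// `callers` maps each symbol to the symbols that call it.
fn impact(
    callers: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_depth: usize,
) -> (Vec<String>, Vec<String>) {
    let mut direct = Vec::new();
    let mut indirect = Vec::new();
    let mut seen: HashSet<&str> = HashSet::from([start]); // cycle detection
    let mut queue = VecDeque::from([(start, 0usize)]);
    while let Some((sym, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth limit keeps huge graphs bounded
        }
        for &caller in callers.get(sym).into_iter().flatten() {
            if seen.insert(caller) {
                if depth == 0 { direct.push(caller.to_string()); }
                else { indirect.push(caller.to_string()); }
                queue.push_back((caller, depth + 1));
            }
        }
    }
    (direct, indirect)
}
```

The visited set doubles as cycle protection, so mutually recursive functions don't send the traversal into a loop.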

codemap-unused finds functions, methods, and classes with zero incoming references – code that nothing in the project calls. This is surprisingly useful for agents doing cleanup tasks or trying to understand which code is actually live versus vestigial.
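Conceptually this is just an in-degree check over the edge table. A sketch under that assumption (a real tool would also whitelist entry points such as main, which legitimately have no callers):

```rust
use std::collections::HashSet;

// Dead-code sketch: a defined symbol with zero incoming edges is a
// candidate for removal. `edges` holds (from, to) reference pairs.
// Note: entry points like `main` show up here too, so real tools
// typically exclude a whitelist of known roots.
fn unused(defined: &[&str], edges: &[(&str, &str)]) -> Vec<String> {
    let referenced: HashSet<&str> = edges.iter().map(|&(_, to)| to).collect();
    defined.iter()
        .filter(|s| !referenced.contains(**s))
        .map(|s| s.to_string())
        .collect()
}
```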

“How does the HTTP request handler eventually reach the database?” is a question that takes a human developer significant time to answer in a large codebase. codemap-path answers it instantly by finding call chains between any two symbols, returning up to 5 distinct paths.
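One way to implement that – and this is my sketch of the idea, not codemap's exact algorithm – is a BFS over whole paths rather than nodes, capped by a path limit and a depth limit:

```rust
use std::collections::{HashMap, VecDeque};

// Call-path sketch: BFS from `from` toward `to` over the call graph,
// returning up to `limit` distinct paths. Each path carries its own
// visited check to avoid cycles; `max_depth` bounds the search.
fn call_paths(
    calls: &HashMap<&str, Vec<&str>>,
    from: &str,
    to: &str,
    limit: usize,
    max_depth: usize,
) -> Vec<Vec<String>> {
    let mut found = Vec::new();
    let mut queue = VecDeque::from([vec![from.to_string()]]);
    while let Some(path) = queue.pop_front() {
        if found.len() == limit { break; }
        let last = path.last().unwrap().clone();
        if last == to {
            found.push(path);
            continue;
        }
        if path.len() > max_depth { continue; }
        for &next in calls.get(last.as_str()).into_iter().flatten() {
            if !path.iter().any(|p| p.as_str() == next) { // no cycles within a path
                let mut extended = path.clone();
                extended.push(next.to_string());
                queue.push_back(extended);
            }
        }
    }
    found
}
```

Because the queue is breadth-first, shorter chains come back first – usually the ones a developer actually wants to see.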

Beyond the code intelligence features, codemap is an experiment in MCP server deployment. The Rust implementation compiles to a single static binary – no runtime dependencies, no package managers, no version conflicts.

This matters for several reasons:

Installation is trivial. Download the binary for your platform, put it on your PATH, done. The MCPB bundle format makes it even simpler for Claude Desktop users – drag and drop.

Startup is instant. No interpreter to boot, no modules to load. The server is ready in milliseconds, which matters when MCP clients spawn tool servers on demand.

Resource usage is minimal. The compiled binary is small, memory-efficient, and CPU-efficient. Tree-sitter parsing in Rust is fast. SQLite is fast. There’s no garbage collector, no JIT warmup.

Cross-platform support is straightforward. The same codebase compiles for macOS (Apple Silicon and Intel), Linux, and Windows. CI produces binaries for all platforms on every release.

The implementation is organized into focused modules:

  • extraction/ – Tree-sitter AST walking with language-specific configurations. Each language gets a LanguageConfig that maps AST node types to codemap’s symbol kinds, handles language-specific call patterns, and manages import/export syntax.

  • db/ – SQLite operations with optimized indexes for name search, edge traversal, and file lookup. All mutations happen in transactions for consistency. The schema is simple: files, nodes, edges, and unresolved_refs.

  • graph/ – BFS-based algorithms for caller/callee traversal, impact analysis, call path finding, and related symbol discovery. All traversals include cycle detection via visited sets and configurable depth limits.

  • context/ – The AI context builder that ties everything together. It extracts keywords from task descriptions, finds entry points via symbol search, expands via graph traversal, reads source files for code snippets, and formats everything into a structured report.

  • mcp/ – The MCP protocol layer using the rmcp crate, with individual handler modules for each of the 17 tools. Supports both stdio (for local AI tools) and HTTP (for remote access) transports.
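To make the db/ description concrete, here is a guess at what a schema of that shape could look like – the table names come from the description above, but the columns and indexes are my assumptions, not codemap's actual DDL:

```rust
// Hypothetical DDL for a files/nodes/edges/unresolved_refs schema.
// Column names and index choices are assumptions for illustration.
const SCHEMA: &str = "
CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE, content_hash TEXT);
CREATE TABLE nodes (id INTEGER PRIMARY KEY, file_id INTEGER REFERENCES files(id),
                    name TEXT, kind TEXT, line INTEGER, signature TEXT);
CREATE TABLE edges (from_id INTEGER, to_id INTEGER, kind TEXT);
CREATE TABLE unresolved_refs (caller_id INTEGER, target_name TEXT);
CREATE INDEX idx_nodes_name ON nodes(name COLLATE NOCASE); -- prefix symbol search
CREATE INDEX idx_edges_from ON edges(from_id);             -- callee traversal
CREATE INDEX idx_edges_to   ON edges(to_id);               -- caller traversal
";
```

The point of the indexes is that every tool above – search, callers, callees, impact – hits one of these access paths rather than scanning tables.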

One architectural decision I’m particularly pleased with: the reference resolution pass. During initial extraction, function calls to symbols in other files can’t be immediately resolved because those files may not have been indexed yet. So codemap stores them as unresolved_refs and runs a resolution pass after all files are indexed, matching call targets by name and creating the appropriate edges. This means the call graph captures cross-file relationships accurately.
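The resolution pass itself reduces to a name lookup against the completed symbol table. A dependency-free sketch, with illustrative types (codemap's real pass works against the SQLite tables):

```rust
use std::collections::HashMap;

// A call whose target wasn't indexed yet when the calling file was parsed.
struct UnresolvedRef {
    caller: String,
    target_name: String,
}

/// Second pass: match parked references against the full symbol table,
/// producing (caller, callee-id) edges and returning whatever still
/// cannot be matched (e.g. calls into external crates).
fn resolve(
    symbols: &HashMap<String, u64>, // symbol name -> node id
    pending: Vec<UnresolvedRef>,
) -> (Vec<(String, u64)>, Vec<UnresolvedRef>) {
    let mut edges = Vec::new();
    let mut still_unresolved = Vec::new();
    for r in pending {
        match symbols.get(&r.target_name) {
            Some(&id) => edges.push((r.caller, id)),
            None => still_unresolved.push(r),
        }
    }
    (edges, still_unresolved)
}
```

Matching purely by name is a deliberate simplification – it can over-link identically named functions – but it keeps the pass fast and catches the cross-file edges that single-file parsing misses.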

# Install (macOS, Apple Silicon)
curl -L https://github.com/grahambrooks/codemap/releases/latest/download/codemap-darwin-arm64.tar.gz | tar xz
sudo mv codemap /usr/local/bin/

# Index your project
cd /path/to/your/project
codemap index

# Start the MCP server
codemap serve

Configure it in your AI tool of choice – Claude Desktop, GitHub Copilot, OpenAI Codex – and the tools appear automatically. The agent can then query the knowledge graph as naturally as reading a file.

codemap is still early. The graph captures a useful subset of code semantics, but there’s much more that could be extracted: data flow analysis, type relationships, error propagation paths, test coverage mapping. The tree-sitter foundation makes these extensions feasible without changing the core architecture.

I’m also interested in how the knowledge graph could be used beyond individual agent sessions. Pre-computed understanding of a codebase could power smarter code review, automated documentation, architectural drift detection, and more.

The project is MIT-licensed and open source at github.com/grahambrooks/codemap. Contributions and ideas are welcome.


codemap is a Rust reimplementation of codegraph by Colby McHenry, focused on compiled binary deployment and expanded code intelligence capabilities. It works with Claude Desktop, GitHub Copilot, OpenAI Codex, and any MCP-compatible client.