Headroom — Context Compression for AI Agents

GitHub: chopratejas/headroom PyPI: headroom-ai npm: headroom-ai Docs: headroom-docs.vercel.app Model: Kompress-base (HuggingFace) License: Apache 2.0

60–95% fewer tokens · Library · Proxy · MCP · 6 algorithms · Local-first · Reversible

Compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

How It Works

Your agent → Headroom (locally) → LLM

Headroom sits between your agent and the LLM as a transparent compression layer:

ContentRouter — detects content type, selects the right compressor
SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose
CacheAligner — stabilizes prefixes so provider KV caches actually hit
CCR (Compress-Confirm-Retrieve) — stores originals locally; LLM calls headroom_retrieve if needed
Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md

Installation

pip install "headroom-ai[all]"   # Python
npm install headroom-ai          # Node/TypeScript

Usage: 4 Modes

1 — Wrap a coding agent (one command)

headroom wrap claude
headroom wrap codex
headroom wrap cursor
headroom wrap aider
headroom wrap copilot

2 — Drop-in proxy (zero code changes)

headroom proxy --port 8787
# Any OpenAI-compatible client can route through it

3 — Inline library

from headroom import compress
compressed = compress(messages)

4 — MCP server

Tools: headroom_compress, headroom_retrieve, headroom_stats

headroom mcp install

Proof

Token savings on real workloads:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy preserved on benchmarks:

Benchmark	Baseline	Headroom	Delta
GSM8K (Math)	0.870	0.870	±0.000
TruthfulQA (Factual)	0.530	0.560	+0.030
SQuAD v2 (QA)	—	97%	19% compression
BFCL (Tools)	—	97%	32% compression

Agent Compatibility

Agent	`wrap`	Notes
Claude Code	✅	—memory, —code-graph
Codex	✅	shares memory with Claude
Cursor	✅	prints config
Aider	✅	starts proxy + launches
Copilot CLI	✅	starts proxy + launches (supports subscription mode)
OpenClaw	✅	ContextEngine plugin

Architecture

SmartCrusher — JSON compression
CodeCompressor — AST-aware code compression
Kompress-base — trained text compression model (HuggingFace)
CCR — reversible compression: originals stored locally, retrievable on demand
CacheAligner — KV cache optimization

description	Compress everything your AI agent reads — tool outputs, logs, RAG chunks, files, conversation history — before it reaches the LLM. 60–95% fewer tokens without losing accuracy. By chopratejas.
tags	context-compression, token-optimization, ai-agents, mcp, proxy, claude-code, codex

Huy's Wiki

Explorer

Headroom — Context Compression Layer for AI Agents