Setting Up Headroom Proxy for Token Savings

I've been burning through Claude Code tokens like crazy. When you're running 8-hour sessions across a dozen projects, that context window fills up fast. So I built Headroom — a local proxy that sits between Claude Code and the Anthropic API and compresses tool outputs before they hit the model.

The problem

Claude Code is incredible, but it's chatty. Every file read, every grep result, every git diff — all of that eats into your context window. Once you hit the limit, the system starts compressing old messages automatically, and you lose context from earlier in the session. For long builds, that's a real problem.

How Headroom works

Headroom runs on localhost:8787 as a transparent proxy. You point Claude Code at it by setting ANTHROPIC_BASE_URL=http://127.0.0.1:8787, and it intercepts every API call. Tool outputs get compressed using LLMLingua before being sent to the model. The model never notices — it gets the same semantic content in fewer tokens.

# Terminal 1
headroom proxy --llmlingua-device cpu

# Terminal 2 (bash/zsh; on Windows cmd, use `set` instead of `export`)
export ANTHROPIC_BASE_URL=http://127.0.0.1:8787
claude
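The core move is simple: walk each outgoing messages payload and shrink the tool_result blocks before forwarding. Here's a minimal sketch of that step, with a toy whitespace-collapsing `squeeze` standing in for LLMLingua (the block shape follows the Anthropic Messages API; the compressor and function names are illustrative, not Headroom's actual code):

```python
import re

def squeeze(text: str) -> str:
    """Toy stand-in for LLMLingua: collapse whitespace runs."""
    return re.sub(r"\s+", " ", text).strip()

def compress_tool_outputs(messages: list[dict]) -> list[dict]:
    """Return a copy of the payload with string tool_result content compressed.

    Real tool_result content can also be a list of blocks; this sketch
    only handles the plain-string case to keep the idea visible.
    """
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            new_blocks = []
            for block in content:
                if block.get("type") == "tool_result" and isinstance(block.get("content"), str):
                    block = {**block, "content": squeeze(block["content"])}
                new_blocks.append(block)
            msg = {**msg, "content": new_blocks}
        out.append(msg)
    return out
```

The proxy does this transparently on every request, so Claude Code itself needs no changes beyond the base-URL override.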

The results

After a week of running Headroom across all my projects, the numbers are clear:

  • 34% average token savings across sessions
  • Context window lasts roughly 1.5x longer before compression kicks in
  • No noticeable quality degradation in responses
  • Health check endpoint at /health for monitoring, stats at /stats
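If you want to watch the savings live, you can poll the /stats endpoint. A quick sketch, noting that the field names (`original_tokens`, `compressed_tokens`) are my assumption about the stats payload, not a documented schema:

```python
import json
from urllib.request import urlopen

def savings_pct(stats: dict) -> float:
    """Percent of tokens saved, given the assumed counter fields."""
    orig = stats["original_tokens"]
    comp = stats["compressed_tokens"]
    return 100.0 * (orig - comp) / orig if orig else 0.0

def check_headroom(base: str = "http://127.0.0.1:8787") -> None:
    """Fetch /stats from a running Headroom proxy and print the savings."""
    with urlopen(f"{base}/stats") as resp:
        stats = json.load(resp)
    print(f"token savings: {savings_pct(stats):.1f}%")
```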

What I learned

The biggest savings come from large file reads and grep results — those tool outputs are often 80% boilerplate that the model doesn't need verbatim. Headroom strips that down while preserving the important bits.
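To see why grep output in particular compresses so well, consider that the file-path prefix repeats on every match line. A toy illustration (not Headroom's actual algorithm, which uses LLMLingua's semantic compression):

```python
def compact_grep(output: str) -> str:
    """Group grep matches by file so each path is printed once."""
    groups: dict[str, list[str]] = {}
    for line in output.splitlines():
        path, _, rest = line.partition(":")
        groups.setdefault(path, []).append(rest)
    return "\n".join(f"{p}\n  " + "\n  ".join(ls) for p, ls in groups.items())
```

Even this naive grouping cuts the repeated-prefix overhead; a learned compressor can go much further because it also drops low-information tokens inside each line.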

If you're running Claude Code for extended sessions, this is a no-brainer optimization. The proxy adds maybe 200ms of latency per call, which is nothing compared to the time you save by not losing context mid-session.

Next up: I'm looking into running LLMLingua on GPU for even faster compression. The CPU mode works fine for now, but there's room to push it further.
