How One Question Cut Our Token Usage in Half
We spent weeks optimizing our collaboration graph. Then we asked our AI partner what it actually needed.
The Expensive Assumption
I'm building ginko, a dev tool that pairs humans and AI partners on a shared knowledge graph. Epics, sprints, tasks, architecture decisions, patterns, all connected in a graph that both partners can query and update.
One core problem: when an AI partner starts a session, it needs to understand the current task. What are we building? Why? What files are involved? What constraints apply?
The obvious solution: give the AI structured data. JSON. Machine-readable, precise, complete. What else would you hand to a language model?
So that's what we built. When the AI queried a task, it got back something like this:
```json
{
  "id": "e012_s02_t03",
  "title": "Add rate limiting to API endpoints",
  "status": "not_started",
  "priority": "high",
  "sprint": "e012_s02",
  "epic": "e012",
  "estimated_hours": 4,
  "assignee": "dev@team.com",
  "tags": ["api", "security", "infrastructure"],
  "dependencies": ["e012_s02_t01", "e012_s02_t02"],
  "acceptance_criteria": [
    "Rate limits enforced on all public endpoints",
    "429 responses include Retry-After header",
    "Rate limit config is per-endpoint"
  ],
  "related_files": ["src/middleware/auth.ts", "src/config/limits.ts"],
  "related_adrs": ["ADR-015", "ADR-023"]
}
```
Correct. Complete. And, as it turned out, wasteful.
But we didn't know that yet. Everything worked. The AI could parse the JSON, reference the fields, do the work. We shipped it and moved on.
Over the following weeks, we did what every team does with token costs: we optimized. We trimmed fields the AI didn't need. We compressed payloads. We batched queries. Incremental gains, hard-won.
Then one day, while reviewing how the AI consumed task data during a session, I asked a question I'd never thought to ask:
"Which format is actually more useful to you: this JSON, or a narrative summary?"
The answer reframed the entire design.
The AI didn't need a property grid. It needed to understand intent. The JSON gave it thirteen fields to parse and reassemble into a mental model. What it wanted was the story: why does this task exist, what are we trying to achieve, and how should we approach it.
We rebuilt the task output as a story card:
```
WHY: Public API has no rate limiting. A single client can overwhelm the service.
WHAT: Add per-endpoint rate limiting with configurable thresholds
HOW: Express middleware using sliding window counter in Redis, config in limits.ts
SCOPE: Public endpoints only. Internal service-to-service calls excluded.
FILES: src/middleware/rate-limit.ts, src/config/limits.ts, ADR-015
```
Same information. Different shape. Five lines instead of thirteen fields.
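For illustration, the transformation is small enough to sketch in a few lines of TypeScript. The `StoryCard` shape and field names here are assumptions for the sake of the example, not ginko's actual schema:

```typescript
// Hypothetical sketch: the StoryCard interface is an assumed shape,
// not ginko's real data model.
interface StoryCard {
  why: string;
  what: string;
  how: string;
  scope: string;
  files: string[];
}

// Render a task as the five-line story card shown above.
function renderStoryCard(card: StoryCard): string {
  return [
    `WHY: ${card.why}`,
    `WHAT: ${card.what}`,
    `HOW: ${card.how}`,
    `SCOPE: ${card.scope}`,
    `FILES: ${card.files.join(", ")}`,
  ].join("\n");
}

const task: StoryCard = {
  why: "Public API has no rate limiting. A single client can overwhelm the service.",
  what: "Add per-endpoint rate limiting with configurable thresholds",
  how: "Express middleware using sliding window counter in Redis, config in limits.ts",
  scope: "Public endpoints only. Internal service-to-service calls excluded.",
  files: ["src/middleware/rate-limit.ts", "src/config/limits.ts", "ADR-015"],
};

console.log(renderStoryCard(task));
```

The point isn't the code, which is trivial. It's that the expensive part, distilling WHY and SCOPE from the sprint author's intent, was already written once; the formatter just stops throwing it away.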
The results weren't subtle:
| Metric | Before | After |
|---|---|---|
| Tokens for task context | 5,000–10,000 | 150–300 |
| File reads per task | 3–5 | 0 |
| Tool calls to understand task | 3–5 | 0 |
| Time to first meaningful action | 30–60s | Immediate |
The headline of this post didn't tell the whole truth. We didn't cut our token usage in half; that would undersell it. We saw a 97% reduction in tokens spent on task orientation, and the elimination of every file read the AI previously needed to reconstruct context that was already in the graph.
The engineering took a few hours. The question took months to think to ask.
And the gains compound. Every task transition in a session (starting a new task, switching context, resuming after a break) used to cost those 3–5 file reads. Over a six-task sprint, that's 15–30 fewer tool calls and 30,000–60,000 tokens the AI never has to spend on orientation. Tokens that now go toward actual work.
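As a side note on what "immediate first action" looks like: the story card's HOW line is specific enough that a partner can start sketching the middleware right away. Here's a minimal sliding-window check in TypeScript. The card calls for a Redis-backed counter; this sketch keeps a timestamp log in an in-memory Map so it's self-contained, and the per-endpoint limit config shape is an assumption:

```typescript
// Illustrative sketch of the approach named in the story card's HOW line.
// Assumptions: in-memory Map instead of Redis, and an invented
// per-endpoint limits config (requests per 60s window).
const WINDOW_MS = 60_000;
const limits: Record<string, number> = {
  "/api/search": 30,
  "/api/export": 5,
};

// Per client+endpoint log of request timestamps (ms) inside the window.
const log = new Map<string, number[]>();

// Returns null if the request is allowed, or the number of seconds to
// wait (suitable for a Retry-After header) if the limit is exceeded.
function checkLimit(clientId: string, endpoint: string, now: number): number | null {
  const max = limits[endpoint];
  if (max === undefined) return null; // endpoint not rate-limited
  const key = `${clientId}:${endpoint}`;
  // Drop timestamps that have slid out of the window.
  const hits = (log.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
  if (hits.length >= max) {
    log.set(key, hits);
    return Math.ceil((hits[0] + WINDOW_MS - now) / 1000);
  }
  hits.push(now);
  log.set(key, hits);
  return null;
}
```

Wired into an Express middleware, a non-null return would become a 429 response with a Retry-After header, matching the task's acceptance criteria.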
The Assumption Was Invisible
So why didn't we ask sooner?
Because "AI needs structured data" felt axiomatic. It's a language model. It processes tokens. Give it clean, structured, machine-readable input and let it do its thing. This is so obviously correct that it never registered as an assumption. It was just how things work.
That's the pattern worth naming: the assumptions that feel most obvious are the ones you never examine. Not because they're wrong: our JSON was perfectly valid input. But "valid" and "optimal" are different things. The AI could parse the JSON. It just had to do unnecessary work to reconstruct the narrative that the sprint author had already written.
Consider what we were actually optimizing before we asked the question. We trimmed fields. We shortened property names. We experimented with different JSON structures. We were polishing the surface of the wrong shape. Weeks of incremental work on a format that was fundamentally misaligned with how the AI actually consumed the information.
The deeper insight: the AI wasn't struggling with parsing. JSON is trivial to parse. It was struggling with reconstructing intent from fragmented fields. A story card delivers WHY, WHAT, and HOW in a single scan. A property grid forces the reader, human or AI, to infer the relationships between disconnected fields and assemble the story themselves.
Once we saw it, it was obvious. Of course a format designed for comprehension outperforms a format designed for serialization. But "obvious in retrospect" is the signature of an unexamined assumption. The fix wasn't smarter engineering. It was better listening.
Inquiry as Practice
This story is about tokens and task formats, but the principle underneath is broader.
How often do we build systems on assumptions nobody questioned? Not bad assumptions. Reasonable ones. The kind that feel so sensible that questioning them seems like a waste of time. "Users want more features." "The API should return JSON." "The AI needs structured data."
Each of these might be true. But have you asked?
The concept I keep coming back to is what I've started calling the 30x question: a single question whose answer changes the design more than weeks of execution. It's not a technique or a framework. It's a habit. And the uncomfortable truth is that the questions with the highest leverage are the ones that feel the least necessary to ask. They hide behind assumptions that feel settled.
This extends beyond AI. But AI makes the pattern especially visible because you can literally ask your tools what they need and get a meaningful answer. If you're building with AI and you've never asked your model what format it prefers, what context it's missing, or what makes a task easy vs. hard to work on, you're optimizing blind.
Inquiry isn't a sign of uncertainty. It's a practice that compounds. Every unexamined assumption is a potential 30x question waiting to be asked.
What are you not asking?
Actionable Insights
- Ask your AI partner what it needs, not what you assume it needs. Try it literally: "Which of these two formats is more useful to you, and why?" The answer may surprise you.
- Test comprehension, not just parsing. Your AI can process any format you throw at it. The real question is which format lets it understand intent fastest with the fewest tokens.
- Schedule assumption audits. Pick one "obvious" design choice per sprint and question it directly. The decisions that feel most settled are the ones most worth examining.
- Measure the right thing. We were optimizing token count by trimming JSON fields. The real win was changing the shape of the data entirely. Optimization without inquiry is polishing the wrong surface.
- Treat your AI as a collaborator, not a consumer. Consumers get whatever format you ship. Collaborators get asked what works.