Semble Claims 98% Fewer Tokens Than Grep for AI Agents. That’s a Bigger Deal Than It Sounds.
A new open-source tool called Semble is getting serious attention after a Show HN post picked up hundreds of points and a long comment thread. The headline claim is wild but clear: Semble can cut token use for code search by about 98% compared with a brute-force grep-plus-read workflow.
If you build AI agents, this hits a real pain point. Agents don’t fail only because of model quality. They also fail because context retrieval is messy, expensive, and slow. Semble targets exactly that layer.
This is why the story matters. It’s not “another agent framework.” It’s infrastructure for how agents decide what code to read in the first place.
What Actually Happened
Semble was posted to Hacker News as a code search system designed for agents. Instead of dumping giant file chunks into a model and hoping it finds what it needs, Semble indexes code and returns tighter, more relevant snippets.
The project’s core argument is simple: grep-style search is a blunt instrument for agent workflows. It finds matching strings, but agents then over-read huge amounts of surrounding code. That inflates token spend and still misses important context boundaries.
Semble’s maintainers report better retrieval quality at far lower token budgets, including claims of high recall with only a few thousand tokens where naive grep workflows need far larger context windows. Whether or not your results match their benchmark, the design direction is what matters.
Why This Matters for Builders
Agent economics are now a business problem, not just an engineering problem. Every unnecessary token in retrieval is a tax on gross margin, especially for products running multi-step loops across large repos.
If an agent performs search, planning, edit, test, and retry cycles all day, retrieval inefficiency compounds hard. A small per-step waste becomes a huge monthly API bill. That is why token efficiency and API cost reduction are moving to the center of agent infrastructure discussions.
Semble resonates because it attacks a high-frequency cost center. Most teams obsess over model choice, but retrieval design often has a bigger impact on cost-per-task than switching from one top model to another.
What’s Different vs Grep + Read
Traditional grep-based agent flows are easy to implement but expensive in production. You search keywords, open broad file ranges, then pass oversized chunks into the model. It works for demos, but it scales poorly.
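To make the cost of this pattern concrete, here is a minimal sketch of a grep-plus-read loop. It is an illustration, not any particular agent's implementation: the `grep_and_read` helper, the in-memory `files` mapping, the 200-line window, and the 4-characters-per-token estimate are all assumptions.

```python
import re


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return len(text) // 4


def grep_and_read(
    files: dict[str, str], pattern: str, context_lines: int = 200
) -> tuple[list[str], int]:
    """Return a broad window around every matching line, plus estimated tokens.

    `files` maps a path to its source text (hypothetical in-memory repo).
    """
    regex = re.compile(pattern)
    windows = []
    for path, source in files.items():
        lines = source.splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                # Read a wide range around the hit, as naive agent loops often do.
                start = max(0, i - context_lines)
                end = min(len(lines), i + context_lines + 1)
                windows.append("\n".join(lines[start:end]))
    # Overlapping windows are counted twice, which is part of the waste.
    return windows, sum(estimate_tokens(w) for w in windows)
```

Two matches ten lines apart in the same file produce two nearly identical windows, so the model pays for the same code twice.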
Semble’s approach uses indexing and ranking so the agent sees less but better context. In practical terms, that means smaller prompts, lower latency, and fewer irrelevant detours. The model spends more compute on reasoning and less on reading noise.
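The general idea of index-then-rank retrieval can be sketched in a few lines. This is not Semble's algorithm, which the post does not describe in detail. It is a simple lexical stand-in that splits files into fixed-size chunks, scores them with a TF-IDF-style weight, and returns the best chunks that fit a token budget. All names and parameters here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs, so `parse_config` yields "parse" and "config".
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files: dict[str, str], chunk_lines: int = 20) -> list[dict]:
    """Split each file into fixed-size chunks and precompute term counts."""
    chunks = []
    for path, source in files.items():
        lines = source.splitlines()
        for start in range(0, len(lines), chunk_lines):
            text = "\n".join(lines[start:start + chunk_lines])
            chunks.append({
                "path": path,
                "start_line": start + 1,
                "text": text,
                "terms": Counter(tokenize(text)),
            })
    return chunks


def search(chunks: list[dict], query: str, token_budget: int = 500) -> list[dict]:
    """Rank chunks by a TF-IDF-style score and keep the best that fit the budget."""
    query_terms = set(tokenize(query))
    n = len(chunks)
    # Document frequency: how many chunks contain each query term.
    df = Counter(t for c in chunks for t in query_terms if c["terms"][t])
    scored = []
    for c in chunks:
        score = sum(
            c["terms"][t] * math.log(1 + n / df[t])
            for t in query_terms
            if c["terms"][t]
        )
        if score > 0:
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    picked, used = [], 0
    for _, c in scored:
        cost = len(c["text"]) // 4  # assumption: about 4 characters per token
        if used + cost <= token_budget:
            picked.append(c)
            used += cost
    return picked
```

The design point is the budget: the agent receives only the highest-scoring chunks that fit, instead of every window around every match.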
This is the same architecture shift search engines made years ago: from literal match to relevance-oriented retrieval. Agents are now going through that transition for code search.
The Business Angle: Margins, Not Just Developer Happiness
For founders, this is where the story gets real. Better code search can move three numbers at once: cost per successful task, task completion speed, and reliability of outputs. Those three numbers directly influence retention and pricing power.
If you run an AI development services firm in Los Angeles, or any agency building internal copilots, improved retrieval can shave meaningful spend from client projects. That makes pilots easier to sell and long-term contracts easier to keep profitable.
If you ship vertical products like AI property management software, AI hiring tools, or AI recruitment software, you may not market “code search” directly. But if your internal agents are cheaper and faster, your end-user experience improves while your unit economics get healthier.
Who Should Adopt This First
The best early adopters are teams running code-aware agents in production: autonomous bug-fix systems, refactoring assistants, PR review agents, and internal engineering copilots. These teams feel token burn daily and can measure retrieval impact quickly.
Second are teams with large monorepos and multi-language codebases. Bigger repos amplify retrieval inefficiency, so ranking-based search usually pays off faster there.
Third are consultancies and platform teams serving many customers. Even modest per-task savings get multiplied across accounts and become meaningful margin improvements.
Who Shouldn’t Overreact
If your app does mostly single-shot chat with tiny code snippets, this may not change much. If retrieval is a small share of total cost, you won’t see dramatic savings.
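The arithmetic behind that caveat is simple: the overall saving is the retrieval share of cost multiplied by the reduction in retrieval cost. A small sketch, where the function name and the example figures are illustrative assumptions:

```python
def overall_savings(retrieval_share: float, retrieval_reduction: float) -> float:
    """Fraction of total cost saved when only the retrieval portion shrinks.

    retrieval_share: fraction of total cost spent on retrieval (0 to 1).
    retrieval_reduction: fraction by which retrieval cost drops (0 to 1).
    """
    if not (0 <= retrieval_share <= 1 and 0 <= retrieval_reduction <= 1):
        raise ValueError("both arguments must be between 0 and 1")
    return retrieval_share * retrieval_reduction


# Illustrative only: if retrieval is 10% of spend, even a 98% cut
# in retrieval tokens saves a little under 10% of the total bill.
small_share = overall_savings(0.10, 0.98)
# If retrieval is 70% of spend, the same cut saves roughly two thirds.
large_share = overall_savings(0.70, 0.98)
```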
Also, don’t treat a benchmark claim as universal truth. Your stack, repo structure, and agent loop design determine outcomes. You still need a controlled A/B test with your own tasks, your own failure modes, and your own budget thresholds.
In short: strong signal, but validate before rewriting your whole architecture.
How to Evaluate Semble in a Week (Practical Plan)
Start with a side-by-side harness: current grep workflow vs Semble-based retrieval. Use real production-like tasks, not toy examples.
Track five metrics: tokens per task, latency to first useful result, total completion time, task success rate, and number of retrieval-related retries. Those metrics will tell you whether efficiency gains are real or just shifted elsewhere.
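One way to record those five metrics is a small per-task record plus a summary function, run once for the grep workflow and once for the Semble-based one. This is a sketch under assumptions: the `TaskRun` fields and `summarize` helper are hypothetical names, and how you capture each value depends on your own harness.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    tokens: int                   # total tokens consumed by the task
    first_useful_result_s: float  # latency to first useful retrieval result
    completion_s: float           # wall-clock time to finish the task
    succeeded: bool               # did the task pass your acceptance check?
    retrieval_retries: int        # retries caused by missing or wrong context


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate the five metrics over a batch of runs for one workflow."""
    if not runs:
        raise ValueError("need at least one run")
    return {
        "tokens_per_task": mean(r.tokens for r in runs),
        "latency_to_first_useful_result_s": mean(r.first_useful_result_s for r in runs),
        "completion_time_s": mean(r.completion_s for r in runs),
        "success_rate": sum(r.succeeded for r in runs) / len(runs),
        "retrieval_retries_per_task": mean(r.retrieval_retries for r in runs),
    }
```

Comparing the two summaries side by side shows whether a drop in tokens came with a drop in success rate or a rise in retries.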
Then look at economics. Convert token savings into monthly dollars at your actual traffic level. Many teams underestimate this step and only discover margin pain after growth.
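The conversion itself is a one-line formula. The sketch below assumes a single blended price per million tokens. The task volume, token counts, and price are placeholders, not figures from Semble or any provider.

```python
def monthly_token_cost(
    tasks_per_month: int, tokens_per_task: int, usd_per_million_tokens: float
) -> float:
    """Monthly spend on tokens at a single blended price (an assumption)."""
    return tasks_per_month * tokens_per_task * usd_per_million_tokens / 1_000_000


# Placeholder inputs: replace with your measured values and your provider's pricing.
baseline = monthly_token_cost(50_000, 120_000, 3.0)   # current grep workflow
candidate = monthly_token_cost(50_000, 20_000, 3.0)   # ranked-retrieval workflow
monthly_savings = baseline - candidate
```

Rerunning the calculation at two or three projected traffic levels shows how the gap widens as usage grows.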
Finally, watch quality carefully. Token savings are only useful if the agent still finds the right code and avoids brittle edits. The right goal is not “smallest prompt.” It is “lowest cost per correct outcome.”
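Cost per correct outcome is total spend divided by the number of tasks that actually succeeded. A minimal sketch with made-up numbers shows why the smaller prompt does not always win:

```python
def cost_per_correct_outcome(
    total_cost_usd: float, tasks_attempted: int, success_rate: float
) -> float:
    """Total spend divided by the number of successful tasks."""
    correct = tasks_attempted * success_rate
    if correct <= 0:
        raise ValueError("no correct outcomes to divide by")
    return total_cost_usd / correct


# Hypothetical comparison over 1,000 tasks:
# a lean setup that spends less but succeeds half the time,
lean = cost_per_correct_outcome(300.0, 1000, 0.50)
# versus a setup that spends more but succeeds 80% of the time.
balanced = cost_per_correct_outcome(400.0, 1000, 0.80)
# The second setup costs less per correct result despite the higher total bill.
```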
What This Signals About the Next Wave of Agent Infrastructure
Semble is part of a broader trend: agent stacks are maturing from model demos into systems engineering. Retrieval, memory, tool routing, evals, and guardrails are becoming the differentiators.
That has a strategic implication for product teams. Competing on “we use a powerful model” is weak. Competing on “our agents complete work reliably at half the cost” is defensible.
Even debates like AI construction workflow vs bridgit.com eventually come down to this. The winner is rarely the flashiest model. It’s usually the team that delivers better operational outcomes with tighter economics.
Bottom Line
Semble’s 98% token-reduction claim captured attention because it addresses the bottleneck everyone building AI agents eventually hits: retrieval bloat. By replacing brute-force context loading with smarter indexing and ranking, it points to a more scalable architecture for agent systems.
For builders, the takeaway is straightforward. Treat code search as core agent infrastructure, not a utility script. Measure it, optimize it, and tie it to unit economics. That is how you turn agent demos into profitable products.
If this category keeps improving, the next generation of AI development tools won’t just be “smarter.” They’ll be dramatically cheaper to run at scale. And in this market, cheaper plus reliable usually wins.
Now you know more than 99% of people. — Sara Plaintext