Grunt Work Pays Off: The Caveman Hack Saving Claude Users Up to 75% on Tokens
A developer on Reddit discovered something that made the AI community laugh before taking note: teach Claude to communicate like a prehistoric human and watch your token bill shrink by up to 75%. The post on r/ClaudeAI has racked up over 400 comments and 10K votes—a rare blend of genuine technical insight and absurdist comedy. Somewhere, a Silicon Valley VC is furiously taking notes on "disruption through regression."
The mechanic is straightforward. Instead of letting Claude warm up with pleasantries, narrate every step, and close with an offer to help further, the developer constrained the model to short, stripped-down sentences. Tool first, result first, no explanation. A normal web search task that would run about 180 output tokens dropped to roughly 45. It's like a roommate who answers texts in one word: brevity as a feature, not a bug.
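The constraint itself is just a system prompt. A minimal sketch of the idea, with the wording assumed rather than quoted from the original post:

```python
# Hypothetical system prompt in the spirit of the Reddit post
# (the exact wording is an assumption, not the poster's actual text).
CAVEMAN_PROMPT = (
    "Speak in short, stripped-down sentences. "
    "Use tool first. Give result first. "
    "No pleasantries, no narration, no closing offers to help."
)

# The post's example: a web-search reply of ~180 output tokens fell to ~45.
verbose, terse = 180, 45
cut = 1 - terse / verbose
print(f"Output tokens cut by {cut:.0%}")  # prints "Output tokens cut by 75%"
```

Pass a string like `CAVEMAN_PROMPT` as the system prompt of whichever agent or API client you use; the savings claim rests entirely on the model honoring it.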
The original poster claims up to 75% reduction in output by making the model sound like it just discovered fire. In caveman terms, as one Redditor put it: "Why waste time say lot word when few word do trick?" That quote has probably been screenshotted more times than most crypto whitepapers. And honestly, it reads better than half the abstracts on arXiv.
What this technique doesn't touch is the input context—the full conversation history, attached files, and system instructions the model re-reads on every turn. That input typically dwarfs the output, especially in longer coding sessions. Real-world sessions accounting for all this input show savings around 25%, not 75%. Still meaningful, just not the headline number. The headline number is the one that gets retweeted. The 25% is the one that pays your AWS bill.
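The gap between the two numbers is easy to check with back-of-envelope arithmetic. A sketch using assumed but plausible figures: a long session whose input context is ten times its output, at illustrative Claude-class rates of $3 per million input tokens and $15 per million output tokens:

```python
# All figures are illustrative assumptions, not measured rates.
input_tokens, output_tokens = 50_000, 5_000   # context dwarfs output 10:1
in_price, out_price = 3 / 1e6, 15 / 1e6       # $/token, assumed pricing

cost_before = input_tokens * in_price + output_tokens * out_price
# A 75% output cut leaves 25% of output tokens; input is untouched.
cost_after = input_tokens * in_price + output_tokens * 0.25 * out_price

savings = 1 - cost_after / cost_before
print(f"Headline cut: 75% of output. Whole-session savings: {savings:.0%}")
```

With these numbers the whole-session figure lands right at 25%: the headline reduction only ever applies to the slice of the bill the model writes, not the slice it reads.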
It's also wise to keep your own instructions in normal English. Writing your prompts in caveman-speak invites a "garbage in, garbage out" spiral: the skill should constrain the output, not degrade the input. Nobody wants an AI that speaks in broken English AND gives broken answers. That's not a feature; that's a support ticket waiting to happen.
There's also the question of intelligence degradation. A handful of researchers in the thread argued that forcing an AI into a less sophisticated persona could hurt its reasoning quality—that verbal constraints might bleed into cognitive ones. The concern hasn't been definitively settled, but it's worth considering when evaluating results. Then again, some of the best developers I know communicate exclusively in grunts and single-word Slack messages, and they seem to be doing fine.
Despite the caveats, the technique found a second life on GitHub almost immediately. Developer Shawnchee packaged the rules into a standalone caveman-skill compatible with Claude Code, Cursor, Windsurf, Copilot, and over 40 other agents. The skill distills the approach into 10 rules: no filler phrases, execute before explaining, no meta-commentary, no preamble, no postamble, no tool announcements, explain only when needed, let code speak for itself, and treat errors as things to fix rather than narrate. That's only nine; the tenth, fittingly, goes unsaid. Welcome to prompt engineering, where success is measured in what you don't say.
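Rolled into a prompt, that rule set looks something like the sketch below. The rule wording is paraphrased from the article's summary, not copied from the repo's actual skill file:

```python
# Paraphrased from the article's summary of the skill; the repo's
# real SKILL text will differ in wording and may differ in order.
RULES = [
    "No filler phrases.",
    "Execute before explaining.",
    "No meta-commentary.",
    "No preamble.",
    "No postamble.",
    "No tool announcements.",
    "Explain only when needed.",
    "Let code speak for itself.",
    "Treat errors as things to fix, not narrate.",
]

# Numbered rules tend to be followed more reliably than free prose.
SYSTEM_PROMPT = "Follow these rules strictly:\n" + "\n".join(
    f"{i}. {rule}" for i, rule in enumerate(RULES, 1)
)
print(SYSTEM_PROMPT)
```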
Benchmarks in the repo, verified with tiktoken, show output token reductions of 68% on web search tasks, 50% on code edits, and 72% on question-and-answer exchanges—for an average output reduction of 61% across four standard tasks. These numbers aren't just encouraging—they're the kind of metrics that make CFOs quietly nod in approval during budget meetings.
A parallel repo by developer Julius Brussee, which has drawn 562 stars on GitHub, frames the same idea as a SKILL.md file. The spec: respond like a smart caveman; cut articles, filler, and pleasantries; keep all technical substance. Code blocks remain unchanged, error messages are quoted exactly, and technical terms stay intact. The caveman voice applies only to the English wrapper around the facts. This one even ships with Normal, Lite, and Ultra modes that control how aggressively the output is stripped. It's like a volume knob for your AI's personality, except the only direction is down.
The model does the exact same work but returns a much shorter answer, and those savings compound over time. It's the AI equivalent of that one coworker who responds to emails with "K" instead of a three-paragraph confirmation. Annoying? Maybe. Efficient? Absolutely.
The broader cost context gives the joke a sharper edge. Anthropic's models are among the most expensive on the market in price per token. For developers running agentic workflows with dozens of turns per session, output verbosity isn't a stylistic complaint; it's a line item. If a caveman grunt can replace a five-sentence summary of what the model just did, those saved tokens add up across thousands of API calls. At scale, verbosity is a luxury tax. This hack is a tax dodge.
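How much that line item shrinks depends on volume. A back-of-envelope sketch with assumed figures: an agentic workflow making 2,000 calls a day, each reply trimmed from roughly 400 to 100 output tokens, at an illustrative $15 per million output tokens:

```python
# All figures are assumptions for illustration, not quoted pricing.
calls_per_day = 2_000
tokens_saved = 400 - 100     # output tokens trimmed per call
out_price = 15 / 1e6         # $/token, illustrative output rate

daily = calls_per_day * tokens_saved * out_price
print(f"~${daily:.2f}/day, ~${daily * 30:.0f}/month shaved off the bill")
```

Small per call, real at scale: under these assumptions the grunts are worth around nine dollars a day, every day, for a one-time prompt change.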
The caveman skill is installable in one command via skills.sh and works globally across projects. Whether or not it makes Claude marginally less articulate, it's already made a lot of developers significantly less annoyed. Sometimes progress isn't about making things better—it's about making them shut up and do the thing. And in this market, that's a feature worth grunting about.