Gemma 4 Multi-Token Prediction: The Scorecard

Tech: 8.5/10 — Multi-token prediction is a genuinely solid engineering move, not vaporware. Speculative decoding isn't new, but baking it into Gemma 4 with purpose-built drafter models shows Google gets the inference cost problem. The execution appears clean: lower latency, same quality, lower compute cost. The weakness? We need real-world benchmarks across different hardware. Lab numbers always look better than production. Still, this is the kind of unglamorous work that actually moves the needle.
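For readers who haven't seen the trick: speculative decoding works by letting a cheap drafter propose several tokens ahead, then having the expensive target model verify them in one pass and accept the longest matching prefix. Here's a minimal greedy sketch of that draft-and-verify loop. Everything in it is a toy stand-in, not a Gemma 4 API: `draft_model` and `target_model` are hypothetical functions that map a token sequence to its next token.

```python
def draft_model(tokens):
    # Cheap drafter (toy rule): guess the next token as last + 1.
    return tokens[-1] + 1

def target_model(tokens):
    # Expensive target (toy rule): same guess, but it "disagrees"
    # with the drafter whenever that guess is a multiple of 4.
    nxt = tokens[-1] + 1
    return nxt + 1 if nxt % 4 == 0 else nxt

def speculative_decode(prompt, n_new, k=4):
    """Generate n_new tokens: draft k ahead, verify, keep the accepted prefix."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1. Drafter proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target verifies the k drafts. In a real system this is one
        #    batched forward pass; it's sequential here for clarity.
        accepted = k
        for i in range(k):
            expected = target_model(tokens + draft[:i])
            if draft[i] != expected:
                # First mismatch: keep the target's token, drop the rest.
                draft = draft[:i] + [expected]
                accepted = i + 1
                break
        tokens.extend(draft[:accepted])
    return tokens[:len(prompt) + n_new]
```

The payoff is in the acceptance rate: every accepted draft token is a target-model decode step you skipped, and at least one token always lands per verify pass, so the worst case degrades to plain autoregressive decoding rather than below it. The output is identical to decoding with the target model alone.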

Comms: 7/10 — Google's messaging is clear and developer-focused, which is refreshing. They're not overselling this as AGI-adjacent or revolutionary. But they're also underselling the strategic importance: this directly threatens commercial inference providers' margins. The comms should've leaned harder into "we're making closed-source LLM economics obsolete." Instead it's polite and technical. Missed opportunity to stoke some FUD among the Anthropic/OpenAI camp.

Pricing: 9/10 — Here's where Google wins hardest. Open-source + faster inference = zero-friction adoption. Startups building on Gemma 4 immediately see lower per-token costs. That's not a pitch, that's math. For Google, it's a Trojan horse into developer ecosystems. By the time a startup notices its entire inference stack is built around Gemma, the switching cost has already done its work. Genius play.

Hype vs. Substance: 8/10 — This is actually substance with modest hype, which is rare and commendable. Multi-token prediction won't make anyone rich tomorrow, but it compounds. In 12 months, every open-source model will need this feature or die. Google's early. The hype is appropriately sized—technical blogs, not TechCrunch think pieces. Substance wins.

Competitive Position: 8.5/10 — OpenAI and Anthropic are focused on capability arms races. Google is playing long-term infrastructure chess. Every percentage point of inference efficiency Google adds to Gemma is a percentage point less margin OpenAI needs to defend. Claude and GPT-4 are still better models, but for 80% of use cases, faster and cheaper Gemma becomes the default. This move makes Google the LLM commodity player, which is exactly where they should be. Strong position, especially if they iterate fast.

Bottom line: Gemma 4 with multi-token prediction is a credible technical achievement aimed at a real problem. It won't dominate headlines, but it will dominate developer choices in cost-sensitive workloads. Google's doing what Google does best: making infrastructure faster and cheaper, then watching everyone else scramble. This is a 7.5/10 overall release—solid, strategic, and the kind of boring that actually changes markets.

Stay sharp. — Max Signal