ANALYSIS: Gemini 3.5 Flash and Omni. What’s Real, What’s Hype.

Google shipped... a lot. Here’s what matters for you.

May 22, 2026

TL;DR: Gemini 3.5 Flash is Google’s high-throughput agentic AI model, hitting 277 tokens per second at $1.50 per million input tokens with a 31-point hallucination drop over its predecessor. Gemini Omni adds native 4K video with multi-angle scene control. Both are rare in 2026: AI releases that justify a leader’s attention.

Here’s the thing, I was talking to a leader not long ago, and they brought up the same frustration I hear from almost every executive I talk to: trying to keep up with every AI tool. Chase every release, and the best you get is a shallow understanding of all of them. That’s where leaders miss the moves that actually matter.

I don’t try to keep up with every release. I co-founded Cozora partly to solve that problem: hearing weekly from experts who go deep on each tool category, so I don’t have to drown in announcement cycles. Most weeks, there’s nothing worth writing about.

This week is one of the exceptions. Google shipped Gemini 3.5 Flash and Gemini Omni, and both deserve a leader’s attention. The tech press is going to get the takeaway wrong, though, so let’s walk through what’s actually there.

In this post, you’ll learn:

The chess test I use to decide which AI releases earn my time
What Gemini 3.5 Flash actually delivers, behind the speed headline
What Gemini Omni changes for marketing and creative leaders
The one move worth making this week, and the one most leaders will get wrong

The Chess Test

A serious chess player doesn’t study every game ever played. They study the games that changed how the game is played. AI releases work the same way.

Here’s the three-question filter I run every release through:

Does it change a workflow I actually run? A workflow I’m running this week where failure would cost something real, not a hypothetical I might pick up someday.
Does the improvement compound? A 10% speed bump is interesting. A 30-point hallucination drop changes what I can hand off to AI in the first place.
Does it shift what’s possible, or just what’s faster? Faster is a feature update I can ignore until next quarter. A release that shifts what’s possible is a board change I have to plan around now.

If a release doesn’t clear two of three, I let it pass. Most leaders I coach don’t have a filter like this. They have a feed instead. Every release feels equally urgent, which means none of them gets real attention. That’s the trap, and the discipline of knowing what to ignore is what adaptability in the AI Leadership Triad actually looks like in practice.

Why Gemini 3.5 Flash Passes

Gemini 3.5 Flash is Google’s high-throughput agentic model. It runs at roughly 277 tokens per second at $1.50 per million input tokens. Those are the numbers you’ll see in every headline.

Here’s what actually matters: the hallucination rate dropped 31 points compared to Gemini 3 Flash, from 92% down to 61%, per Artificial Analysis data published in May 2026. That’s the win. Speed lets you scale a workflow. Lower hallucinations let you trust a workflow with something that matters. Macquarie Bank is already using Flash to reason over 100-plus page tax and onboarding documents, the kind of work that used to take their team weeks. That’s the agentic era arriving in a regulated industry, not a demo.

Compared to Claude Sonnet 4.6 at $3 per million input tokens and GPT-5.5 at $5, Flash is the cheapest agentic model on the headline numbers, running roughly 4x faster than either. Whether that holds in practice depends on the next two paragraphs.

Now the catch. Two of them.

First, the headline price isn’t the real price. On complex benchmarks, Flash runs roughly 5.5x more expensive than its predecessor because it emits about twice as many tokens to reach the same answer. Compare effective cost rather than sticker price before you migrate.

Second, long-context retrieval regressed. At 128K tokens, recall dropped to 77%, down from 85% on Gemini 3.1 Pro. If you’re feeding Flash a 200-page board packet and expecting it to reliably pull the right paragraph back, you’re going to see hallucinations of a different flavor. Test before you trust.

Don’t pick a model based on one number. Pick based on the workflow you’re putting it in.

Here’s Google’s quick Gemini 3.5 Flash demo… (credit, Google)

Why Gemini Omni Passes

Gemini Omni is Google’s Veo 4 native multimodal model. It produces 15 to 30-second clips at 4K resolution, with multi-angle scene control from a single prompt and conversational editing. Tell it to swap a coffee cup for a tea cup, and it does, without destroying the rest of the scene. That’s new.

For marketing and creative leaders, this passes the chess test for one reason: it collapses production cycles you currently outsource. The first draft of a product demo, a recruitment video, a customer testimonial sequence: work that took a week now takes an afternoon. That has real implications for how creative teams should balance AI and human craft.

The gate is cost. Omni eats tokens at a rate that makes it the wrong default for everyday content. It’s a specialized tool for high-value storytelling, not a commodity utility for filling a content calendar. Use it where the final asset justifies the spend. Skip it where a Loom recording would do.

Here’s a glimpse of what it can do…

The One Move This Week

Pick one workflow where hallucinations have actually hurt you in the past 90 days. Pilot Gemini 3.5 Flash on that workflow for 30 days. Don’t migrate everything. Don’t announce a strategy. Measure cost-per-completed-task, not cost-per-token. Then decide.

Now… what’s the last AI release you tried to adopt that didn’t stick, and what would have made you pass on it earlier?

For further reading, check out Karo (Product with Attitude) ‘s latest also on this release:

Product with Attitude

Gemini Omni Flash: Cute Videos, Serious Product Strategy, and the Future of Editable Reality

a month ago · 53 likes · 21 comments · Karo (Product with Attitude)

Questions Leaders Are Asking

What is Gemini 3.5 Flash?

Gemini 3.5 Flash is Google’s high-throughput agentic AI model, released in May 2026. It runs at roughly 277 tokens per second at $1.50 per million input tokens and $9 per million output tokens. The most consequential improvement over Gemini 3 Flash is a 31-point reduction in hallucinations, dropping from 92% to 61%.

Is Gemini 3.5 Flash actually cheaper than Claude or GPT-5.5?

Not on a real-world basis. Headline pricing is $1.50 per million input tokens, less than Claude Sonnet 4.6 ($3) or GPT-5.5 ($5). But Gemini 3.5 Flash emits roughly twice as many tokens to reach the same answer, making it about 5.5x more expensive than its predecessor on complex benchmarks. Look at effective cost rather than the sticker price.

What is Gemini Omni used for?

Gemini Omni is Google’s Veo 4 multimodal model, designed for production-grade video. It generates 15 to 30-second clips at 4K resolution with multi-angle scene control and conversational editing. It’s a specialized tool for high-value storytelling like marketing assets, recruitment videos, and product demos, not for everyday content workflows.

Should I switch from Claude or ChatGPT to Gemini 3.5 Flash?

Not without testing. Pilot Gemini 3.5 Flash on one bounded workflow where hallucinations have hurt you, for 30 days. Measure cost-per-completed-task, not cost-per-token. The pricing advantage gets eaten by Flash’s token chattiness on complex tasks, and long-context recall regressed to 77% at 128K. Test before you migrate.

How do I decide which AI releases to actually pay attention to?

Run every release through three questions: Does it change a workflow you actually run this week? Does the improvement compound (a 30-point hallucination drop, not a 10% speed bump)? Does it shift what’s possible or just what’s faster? If a release doesn’t clear two of three, let it pass. Most don’t.

What is the agentic era in AI?

The agentic era describes AI models capable of autonomous, multi-step reasoning over real workflows, not just one-shot text generation. Gemini 3.5 Flash is a benchmark example: it runs production agentic loops at high speed with significantly lower hallucinations. Enterprises like Macquarie Bank now use it for 100-plus page document reasoning in regulated industries.

Joel Salinas is an Executive AI Coach for leaders at small and mid-sized businesses and nonprofits. 1:1 coaching, team workshops, and AI strategy work built around amplifying what your team is already good at. Creator of the AI Leadership Triad. He writes Leadership in Change.

Written by a human, for humans.

Melanie Goodman

May 23

Brilliant piece! This is such an important distinction for leaders right now. Most executives are being pulled into an endless cycle of AI headlines without ever getting the time to understand where the operational value actually sits. McKinsey’s 2025 AI research showed that companies seeing the strongest returns are focusing on a handful of strategically embedded use cases rather than chasing every new model release.

Which capability from the latest Gemini releases do you think business leaders are most likely to underestimate over the next 12 months?

1 reply by Joel Salinas

Joël Kai Lenz

May 22

Such a great review with superb insights! Great job Joel :)

14 more comments...

Discussion about this post

Ready for more?