If you buy a Pixel 10 Pro series phone, or even last year’s Pixel 9 Pro, you get one full year’s worth of Google’s Gemini Pro subscription. This $20-per-month service unlocks the powerful Gemini 2.5 Pro model and a suite of cutting-edge AI tools. Until very recently, the crown jewel of this package was Veo 3, Google’s impressive text-to-video generator that could turn any description into a hyper-realistic short video.
But the AI world moves at lightning speed. This past week, OpenAI announced its competing Sora 2 model, meaning Google’s video generator is no longer the only game in town. While Sora 2 is invite-only for now, the model already has an active user base. So naturally, I pitted OpenAI’s Sora 2 against Google’s Veo 3 to find out which AI video generator has the upper hand.
Google Veo 3 vs OpenAI Sora: The results are astounding
Let’s start with a simple prompt without any characters or complex details that could trip up any of the AI video generators: “A photorealistic shot of espresso being poured into a white cup in slow motion.” Given the static nature of this shot, you’d expect all models to nail the task. However, the results were strikingly different.
The first-gen Sora model’s attempt was passable at a glance. It understood the objects — cup, liquid, machine — and assembled them in the correct order. But the illusion quickly fell apart. The “espresso” had a thick, gloopy consistency and splashed into the cup with unnatural physics. It was a video of the words in the prompt, but it lacked any sense of artistry or realism.
Veo 3’s generation, by contrast, felt like it was captured by a professional videographer. The espresso flowed with convincing viscosity, and the liquid swirled realistically as it settled. It’s not a perfect result, as the coffee dispensed from only one side of the portafilter, but it’s still a significant improvement over Sora’s attempt.
Sora 2 is the newest and best of the bunch, showcasing realistic physics without any of the errors in Veo 3’s result. Is it a vast improvement, though? Not really. Luckily for OpenAI, we’re just getting started.
What about animals? The first-gen Sora model actually did an acceptable job of capturing the frenetic energy of a golden retriever in a crowded park. Veo 3 did a slightly better job, but the random sea of background characters was a clear sign of AI’s presence.
Sora 2 is where things become unsettlingly real. It rendered the golden retriever with extreme precision, and the entire scene was believable. The people in the park were neither blurry nor artificial. My only nitpick is that the scene had too many other dogs for an ordinary urban park.
Moving on, I asked for a motorcyclist riding along a beach at sunset. Once again, the original Sora model gave me a borderline cartoonish result, with one motorcycle fishtailing while another glided into the water with zero resistance. I wouldn’t call this result passable. Surprisingly, Sora 2 failed at this task too, making the same mistakes as its predecessor.
Veo 3, on the other hand, delivered a shot that looked downright cinematic. The motorcycle moved predictably on sand, left behind a tread mark and trail of dust, and the bike leaned subtly as the rider turned. But the lighting was the most stunning part; the low sun cast long, dramatic shadows and glinted realistically off the motorcycle.
My next prompt proved to be a difficult challenge for the older models: “Iconic yellow taxi driving along Kolkata’s streets during a bright day.” Sora and Veo 3 couldn’t generate usable clips, but their failures were interesting nevertheless.
Sora’s version broke the rules of reality. It struggled with object permanence, causing pedestrians to pop into existence on the sidewalk or, in one jarring moment, briefly merge into each other. Needless to say, this dreamlike sequence doesn’t resemble reality.
Veo 3’s attempt was more coherent but stumbled on the details. It did a much better job of capturing the authentic atmosphere of Kolkata, but the taxi itself moved with a weird, sliding motion that didn’t feel connected to the road. Furthermore, as is common with AI, any text in the scene was rendered unreadable.

The newer Sora 2 model performed much better, nailing the atmosphere of the city and even the occupants of the vehicle. You could easily pass its clip off as a real video.
Finally, let’s take a look at what I think is the most impressive result yet for Google’s model: The Mandalorian in Bangkok. Surprisingly, neither Sora nor Veo 3 refused my prompt on copyright grounds.
Either way, the result from Veo 3 was staggering. The character it produced was a spitting image of the real deal, from the specific sheen of the armor to the iconic silhouette of the helmet. It looked less like an AI generation and more like a deleted scene from the show.
Sora, on the other hand, delivered a close approximation at best. It generated a generic character clad in shiny, polished chrome with neon lights reflecting off its surface. It captured the Bangkok part of the prompt but failed on the main subject. In a way, Sora avoided breaching copyright, but it also failed to accurately follow my instructions.
Unfortunately, the newer Sora 2 model now refuses to generate videos containing copyrighted characters, even though we know it’s fully capable of doing so. It earns a DNF for this one.
AI video generation has come a long way
When OpenAI announced Sora in early 2024, most of us were taken aback by just how realistic and convincing it looked. Those early samples showcased impressive cinematic flair and promised to disrupt video production. At the time, OpenAI also had one of the best AI image generators in DALL·E. But when Sora finally launched in December 2024, it fell short of those lofty expectations. Google followed up with its own Veo model only a few days later and steadily iterated with aggressive updates, culminating in the Veo 3 we have today.
Unfortunately, Google’s early AI video generator release wasn’t as flawless as the demos suggested either. But Veo 3 and Sora 2 are different beasts entirely.
Initial Veo and Sora models suffered from the same tell-tale signs of generative AI: background objects shifted unnaturally; characters lacked object permanence, sometimes blending into the environment or even fusing with one another; and physics barely mattered, with objects moving in frictionless, impossible ways. You were lucky to get any narrative consistency at all.
Sora 2, and to a slightly lesser extent Google’s Veo 3, addresses nearly all of these flaws. A single-sentence prompt can now yield a full-fledged video, complete with realistic voices and even music. That makes these AI video generation tools incredibly handy for light content creation: teachers can create visual stories for class, business owners can spin up quick ads for social media, and the use cases feel endless.
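Neither app requires any code, but if you’d rather script your generations, Veo is also exposed through the Gemini API. Here’s a minimal sketch using Google’s google-genai Python SDK, based on Google’s published quickstart flow; the exact model ID shown is an assumption and may differ depending on your access tier.

```python
import time
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Kick off an asynchronous video generation job.
# The model ID below is an assumption; check the current Gemini API docs.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="A photorealistic shot of espresso being poured "
           "into a white cup in slow motion.",
)

# Video generation is long-running, so poll until the job finishes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the finished clip to disk.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("espresso.mp4")
```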
The only problem is cost. With Gemini Pro, you get only three Veo 3 videos per day. However, I found that Flow, a Google Labs project, also grants you 1,000 AI credits per month, which translates to roughly 100 videos using the Veo 3 “Fast” model.
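For a rough sense of how far those credits stretch, here’s a quick back-of-the-envelope script; the 10-credit cost per “Fast” generation is an assumption inferred from the 1,000-credits-to-100-videos ratio above.

```python
# Back-of-the-envelope budget for Flow's monthly allowance.
MONTHLY_CREDITS = 1_000
CREDITS_PER_FAST_VIDEO = 10  # assumption: implied by 1,000 credits ≈ 100 videos

videos_per_month = MONTHLY_CREDITS // CREDITS_PER_FAST_VIDEO
videos_per_day = videos_per_month / 30

print(f"~{videos_per_month} Fast videos per month (~{videos_per_day:.0f} per day)")
# -> ~100 Fast videos per month (~3 per day)
```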
Sora 2, on the other hand, is currently free to use, even without a ChatGPT subscription. OpenAI CEO Sam Altman has admitted this open access is unsustainable, though, as usage has already exceeded expectations. A daily limit seems inevitable, but in fairness, I typically got a usable clip on the first try thanks to the model’s stronger grasp of physics, motion, and real-world nuance.
The catch is that Sora 2 isn’t publicly available yet, and OpenAI will almost certainly place a hard limit on the number of video generations once the service rolls out more broadly. So for now, Veo 3 remains one of the best-kept secrets of Google’s Gemini Pro subscription.