AI video generation isn’t yet mainstream, but you do get limited access to Sora video generation with a ChatGPT Plus subscription. Sora can create videos based on just about any prompt, but results are a mixed bag, especially without careful prompt calibration and multiple iterations. And unlike Gemini’s Veo 3 video generation model, Sora can’t generate videos with audio.
To evaluate their video generation abilities, I gave ChatGPT and Gemini three prompts, starting with: “Generate me a video of somebody cooking in a studio apartment. I want to see a gas stove in the kitchen, and I want you to include brown dining chairs.”
ChatGPT’s video is mildly horrifying. The subject contorts its body in strange ways, while generation errors, such as a pot with two handles and an odd kitchen layout with two cooktops, are rampant. Gemini’s video looks better and impresses with fairly accurate audio, but it isn’t perfect, either. I see some distorted cooking utensils in the background, the subject moves to place a lid on a pot that already has a spoon in it, and a wooden spoon appears out of thin air.
To test the chatbots’ abilities to handle complex motion, I asked them to create a video of somebody solving a Rubik’s Cube in a competitive setting.
Once again, ChatGPT’s video is a fever dream. It renders multiple cubes (one of which has heavy distortion), and the subject doesn’t actually manipulate them. Gemini generates relatively believable audio, and the facial expressions of its subjects are a highlight. However, the cube and the subject’s fingers show distortion, while the numbers on the timer don’t make sense.
My final test was for text generation within a video: “Generate me a video of a teacher writing down the first law of thermodynamics on a whiteboard while explaining the concept to the class.”
ChatGPT performs better here than in the other tests, largely thanks to the realistic teacher, but it still misses the mark. Its failure to have the subject actually write anything, the multiple clocks on the wall, and the nonsensical text place this video firmly in the uncanny valley. Gemini also does its best work of the three prompts here. The audio is particularly good, with the teacher correctly outlining the first law of thermodynamics. Gemini manages to generate text in the neighborhood of what I asked for, too, but the text simply starts appearing on the whiteboard at the end and isn’t as legible as it could be.
Even though my tests suggest otherwise, you can generate impressive videos with both ChatGPT and Gemini. However, this takes lots of prompt tweaking, many generations, and time. Either way, ChatGPT can’t match Gemini in features or performance. Beyond its audio generation capabilities, Gemini has an AI animation tool called Whisk and a unique filmmaker tool called Flow, which lets you cut and extend clips.
Keep in mind, however, that you can access Sora video generation with ChatGPT Plus ($20 per month), whereas Veo 3 video generation currently requires Google’s AI Ultra plan ($250 per month). Even if you ante up for AI Ultra, you get just 12,500 credits per month (each generation with Veo 3 costs 100 credits). Gemini’s Veo 2 video generation, which is accessible through Google’s cheaper AI Pro plan, produces results comparable to Sora’s.
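To put the AI Ultra numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and rests on two simplifying assumptions of mine: that you spend every credit on Veo 3, and that the whole subscription price is attributed to video generation (AI Ultra includes other features). The function name is purely illustrative.

```python
# Figures quoted in the article; plan prices and credit allotments may change.
ULTRA_PRICE_USD = 250      # Google AI Ultra, per month
ULTRA_CREDITS = 12_500     # monthly credits included with AI Ultra
VEO3_COST_CREDITS = 100    # credits consumed by one Veo 3 generation


def veo3_budget(price_usd, credits, cost_per_generation):
    """Return (generations per month, dollars per generation).

    Assumes every credit goes to Veo 3 and the full plan price is
    attributed to video generation, which overstates the per-video cost.
    """
    generations = credits // cost_per_generation
    return generations, price_usd / generations


generations, usd_each = veo3_budget(ULTRA_PRICE_USD, ULTRA_CREDITS, VEO3_COST_CREDITS)
print(f"{generations} Veo 3 generations per month, about ${usd_each:.2f} each")
# prints "125 Veo 3 generations per month, about $2.00 each"
```

Given how many attempts it can take to get a usable clip, 125 generations may go faster than you expect. I can't run the same math for ChatGPT Plus, since I only know that its Sora access is limited, not the exact quota.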