AI video generation is becoming mainstream, thanks in part to ChatGPT’s Sora AI. The old version of Sora, Sora 1, is still available to paid ChatGPT subscribers, but the current version, Sora 2, is rolling out to a wider audience after being invite-only initially. If you don’t have access to it yet, you will likely have it soon.
Sora can create videos based on just about any prompt, and its videos also have audio. Results sometimes seriously impress, but they can also disappoint, especially without careful prompt calibration and multiple iterations. Sora isn’t just a video generation model, either: It’s also a TikTok-like social platform for sharing AI videos.
To evaluate their video generation abilities, I gave ChatGPT (Sora 2) and Gemini (Veo 3.1: Quality) three prompts, starting with: “Somebody going about their daily life in a trendy apartment with rustic decor.” ChatGPT’s video doesn’t impress: a cup levitates while being treated as if it were a pour-over brewer, and afterward the person in the video awkwardly crouches in front of a table. Veo’s video isn’t great either. In it, a person cooking grabs a spoon, but the spoon duplicates as he grabs it, leaving one on the table and one in his hand. Oddly, a record player sits on the kitchen counter. In both videos, the audio is slightly distorted, doesn’t sync up perfectly, and is missing certain sounds.
To test the chatbots’ abilities to handle complex motion, I asked them to create a video of somebody solving a Rubik’s Cube in a competitive setting with the following prompt: “Show me a pro Rubik’s Cube solver solving a cube.” Once again, neither video is especially good. Both feature distorted cubes, and the audio in both doesn’t quite sync with what’s on screen. ChatGPT’s timer doesn’t make sense, while Veo’s camera zoom is distracting. The voice of ChatGPT’s persona also has a slightly distorted quality, making it feel AI-generated.
My final test was for text generation within a video: “Generate me a video of a teacher in front of a class writing down y = mx+b on a whiteboard while explaining the concept.” Unsurprisingly, there are significant issues with these videos as well. ChatGPT’s text is nonsense, and the voice of its teacher is, again, distorted. Veo’s video confusingly starts with “y = __ + b” already on the whiteboard, and the teacher merely fills in the “mx” portion; most of what the teacher says is garbled nonsense. Neither delivers on my prompt.
Even though my tests suggest otherwise, you can generate impressive videos with both ChatGPT and Gemini. However, this requires numerous prompt tweaks, multiple generations, and considerable time. If you pay for ChatGPT’s expensive $200-per-month Pro subscription, you can use Sora 2 Pro rather than Sora 2. I ran the same prompts through Sora 2 Pro below, and although the generation quality seems slightly higher, the videos still feature various errors and distortions.
ChatGPT Pro also lets you leverage Sora 2’s Storyboard feature, which breaks down videos into individual scenes that you can script. These videos can be 25 seconds in length, compared with 10 seconds for standard ones. Although this feature is useful for generating more complex videos, in my testing it doesn’t meaningfully reduce errors and distortions. Veo has a somewhat similar Flow tool for editing videos and stitching them together, but it doesn’t avoid these issues either.
