The current wave of generative AI animation often feels like a magic trick that only works once. You type in a prompt, a video appears, and if you don’t like the result — maybe the feet are all wonky, a common flaw in AI-generated video — your only real option is to try a different prompt. This “black box” approach is exactly what Cartwheel, a new 3D animation startup, is trying to dismantle.
The company was founded by Andrew Carr and Jonathan Jarvis, veterans of OpenAI and Google, respectively. They're working to build a future where AI handles the technical drudgery of animation while leaving the creative soul to the artist.
I spoke with Carr and Jarvis about launching their company, defining “taste” with AI, and the technical and creative difficulties of animation in 2026.
What sets Cartwheel apart
According to the founders, one of the biggest hurdles in this space is that 3D motion data is remarkably scarce compared to the endless oceans of text and images available online that AI models are trained on.
“If you look at all the big tech companies, they’ve built their models on written language, audio, image, [and] video because there’s just so much of it, so finding those patterns is much easier,” Jarvis said. “We knew it was going to be hard, but it turns out to be harder than we thought by probably a factor of 10 or 100 to get that data.”
While other tech giants focus on generating final pixels, Cartwheel has spent years mapping how humans actually move. Their models are built to understand the nuances of a performance so that a simple 2D video of someone dancing in their backyard can be translated into a precise, realistic 3D skeleton.
This shift from flat images to 3D assets is what gives animators the control they have been missing in the AI era.
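Cartwheel hasn't published its internals, but the core idea — an animation stored as a pose-able 3D skeleton rather than finished pixels — can be sketched with a toy forward-kinematics pass. Everything below (the rig layout, bone names, and numbers) is illustrative, not Cartwheel's actual format:

```python
import math

# Toy planar skeleton: each bone is (parent, length, local_angle_radians).
# Purely illustrative — real rigs use full 3D rotations and many more joints.
BONES = {
    "hips":     (None,       0.0, 0.0),
    "spine":    ("hips",     0.5, 0.0),
    "shoulder": ("spine",    0.4, math.pi / 2),   # arm raised sideways
    "elbow":    ("shoulder", 0.3, -math.pi / 4),  # forearm bent back
}

def forward_kinematics(bones):
    """Walk the hierarchy root-first, accumulating each joint's
    world-space angle and position from its parent's."""
    world = {}  # joint name -> (x, y, world_angle)
    for name, (parent, length, local_angle) in bones.items():
        if parent is None:
            world[name] = (0.0, 0.0, local_angle)
            continue
        px, py, parent_angle = world[parent]
        angle = parent_angle + local_angle
        world[name] = (px + length * math.cos(angle),
                       py + length * math.sin(angle),
                       angle)
    return world

pose = forward_kinematics(BONES)
```

Because the performance lives as joint angles rather than baked pixels, an editor can nudge a single local angle and rerun the pass — the kind of after-the-fact control the founders describe.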
Preventing AI “sameness”
Cartwheel’s executives said they view AI’s “sameness” as a byproduct of a lack of control. If everyone uses the same generator to produce a video, the results inevitably start to look alike.
“The output of our system is designed for people to edit. It’s designed for people to touch and manipulate, and we don’t want someone to type something in and then have it shuffle through to a finished animation. That’s not the point of it. That’s boring, who’s going to watch that?” Carr said.
“The fact that it’s very easy for people to get into it and edit it actually totally removes the sameness problem,” he said. “You put it on different characters, you put it in different environments, you change how it looks, you push the performance, you pull the performance, and in that sense [sameness] turns into a nonissue.”
Carr and Jarvis said the solution is to provide a “control layer” where the AI output is just the starting point. By generating 3D data instead of flat video, the creator can change the lighting, move the camera or adjust a character’s pose after the AI has done its initial work — making the technology a sophisticated power tool rather than a replacement for the artist.
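The practical difference between 3D data and a finished video can be shown with a tiny pinhole-projection sketch (all names and numbers here are hypothetical): the same 3D joint lands at different 2D screen positions under different cameras, so moving the camera after generation is just arithmetic, not a new generation run.

```python
# Illustrative sketch: with 3D joint data, changing the camera is a cheap
# re-projection; with finished 2D video, it would mean regenerating the shot.
def project(point3d, camera_pos, focal=1.0):
    """Pinhole projection of a world-space point into a camera
    at camera_pos looking down the +z axis."""
    x, y, z = (p - c for p, c in zip(point3d, camera_pos))
    return (focal * x / z, focal * y / z)

wrist = (0.5, 1.6, 4.0)  # a character's wrist joint in world space

front_cam = project(wrist, (0.0, 1.0, 0.0))  # camera centered on the action
side_cam = project(wrist, (2.0, 1.0, 0.0))   # dolly the camera 2 units right
```

The underlying skeleton never changes; only the projection does — which is why generating “the layer below the pixels” leaves lighting, framing and posing open to revision.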
Founder Andrew Carr said one of his core scientific hypotheses is that motion is a fundamental data type.
The future of animation with AI
Beyond just making animation faster and lowering the barrier to entry, the company is looking toward a concept they call “open-ended storytelling” or “open-ended world-building.” In modern gaming and social media, the demand for content has reached a scale that manual animation cannot possibly match.
Cartwheel envisions characters that aren’t just programmed with a few set moves but are powered by motion models that allow them to react and perform in real time. It’s less about choreographing every single frame and more about “rehearsing” with a digital actor that understands the intent of the scene.
Ultimately, the founders said, the goal is to bridge the gap between 2D vision and 3D execution.
“One of the core hypotheses that we hope is true in the next three years for Cartwheel is everyone will work in 3D even if it’s authored in 2D, even if the final output is just 2D video,” Carr said.
By focusing on the “layer below the pixels,” Carr and Jarvis said they hope that as animation becomes more automated, it also becomes more personal. The machine handles the biomechanics and the file exports, but the human keeps the final say on the taste, the timing and the heart of the story.
