The arrival of cheap, accessible, and highly capable AI tools has made it possible to manipulate digital audio, video, and image content in ways never seen before. The phenomenon has only just begun, and some researchers warn of an exponential increase in its capacity, volume, and potential for cyberattacks. Simply put, the deepfakes of 2026 will make it impossible to distinguish fact from fiction.
Deepfakes were, along with ransomware, the dominant cyberattack of 2024, and this year they have “improved” drastically. AI-generated faces, voices, and full-body representations that mimic real people have increased in quality far beyond what experts expected. They are also increasingly used to deceive users.
In many everyday situations, especially low-resolution video calls and multimedia content shared on social networks, their realism is now high enough to reliably fool inexperienced viewers. In practice, synthetic media have become indistinguishable from authentic recordings for ordinary people and, in some cases, even for companies and institutions.
The escalation is not limited to quality. The volume of deepfakes has grown exponentially: the cybersecurity firm DeepStrike estimates that the number of deepfakes online has risen from approximately 500,000 in 2023 to around 8 million in 2025, an annual growth rate close to 900%. Despite reports highlighting the threat that generative AI poses to digital reality, and despite proposals for multilayer defense frameworks, very little progress has been made against them.
One of the authors of the report, a professor of computer science and director of the Media Forensics Laboratory at the University at Buffalo, has published a situation analysis predicting a grim year ahead in which most people will be unable to distinguish legitimate content from fabricated media.
Spectacular “improvements” in deepfakes
Several technical changes underlie this drastic escalation. First, video realism took a significant leap thanks to video generation models specifically designed to maintain temporal consistency. These models produce videos with coherent motion, stable identities for the people portrayed, and content that holds together from frame to frame. They separate the information representing a person’s identity from the information describing movement, so the same movement can be applied to different identities, or the same identity can perform many kinds of movement.
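To illustrate the idea, here is a minimal sketch of that identity/motion disentanglement, assuming a PyTorch-style encoder-decoder; the class names, layer sizes, and training-free usage are illustrative, not taken from any particular production model.

```python
import torch
import torch.nn as nn

class IdentityEncoder(nn.Module):
    """Maps a single reference frame to a time-invariant identity code."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, frame):            # frame: (B, 3, H, W)
        return self.net(frame)           # -> (B, dim)

class MotionEncoder(nn.Module):
    """Maps each driving frame to a per-frame motion code."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, frame):
        return self.net(frame)

class Decoder(nn.Module):
    """Renders a frame from one identity code plus one motion code."""
    def __init__(self, id_dim=256, mo_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(id_dim + mo_dim, 8 * 8 * 64), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, id_code, motion_code):
        return self.net(torch.cat([id_code, motion_code], dim=1))

id_enc, mo_enc, dec = IdentityEncoder(), MotionEncoder(), Decoder()
identity = id_enc(torch.rand(1, 3, 64, 64))   # one reference frame fixes who
driving = torch.rand(4, 3, 64, 64)            # driving frames fix the motion
# The single identity code is reused for every frame; only the motion code
# changes, which is what keeps the face stable across the whole clip.
frames = [dec(identity, mo_enc(f.unsqueeze(0))) for f in driving]
```

Because identity and motion live in separate codes, swapping the reference frame reanimates the same movement on a different person, and vice versa.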
These models produce stable and coherent faces without the flickering, warping, or structural distortions around the eyes and jaw that once served as reliable forensic evidence of deepfakes.
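The kind of signal those artifacts used to leave behind can be approximated with a toy frame-to-frame consistency check. This sketch assumes OpenCV and a placeholder file name; the score is a crude illustration of temporal flicker, not a production forensic detector.

```python
import cv2
import numpy as np

def flicker_score(video_path, max_frames=300):
    """Mean absolute frame-to-frame pixel difference: a crude proxy for
    the temporal flicker and warping that older deepfakes exhibited."""
    cap = cv2.VideoCapture(video_path)
    prev, diffs = None, []
    while len(diffs) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(cv2.absdiff(gray, prev).mean())
        prev = gray
    cap.release()
    return float(np.mean(diffs)) if diffs else 0.0

# Abnormally high scores once hinted at synthetic warping; temporally
# consistent generators now push this signal toward natural-video levels.
print(flicker_score("clip.mp4"))  # "clip.mp4" is a placeholder path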
Second, voice cloning has crossed what the expert calls the “indistinguishability threshold.” A few seconds of audio are now enough to generate a convincing clone, with natural intonation, rhythm, emphasis, emotion, pauses, and breathing sounds. This capability is already fueling large-scale fraud: some large retailers report receiving more than 1,000 AI-generated scam calls per day. The perceptual cues that once revealed synthetic voices have practically disappeared.
Third, consumer tools have reduced the technical barrier practically to zero. Enhanced AI applications such as OpenAI’s Sora 2 and Google’s Veo 3, along with a wave of startups, allow anyone to describe an idea, have a large language model like OpenAI’s ChatGPT or Google’s Gemini write a script, and generate high-quality audiovisual content in minutes. AI agents can automate the entire process. The ability to generate coherent, scripted deepfakes at scale has been effectively democratized.
This combination of growing volume and characters almost indistinguishable from real humans creates serious challenges for deepfake detection, especially in a media environment where people’s attention is fragmented and content spreads faster than it can be verified. And there are numerous reports of real damage: misinformation, harassment, financial scams, and almost every kind of cyberattack. AI deepfakes have ceased to be a theoretical threat and have become an exploitable “solution” in the real world, one that undermines digital trust, exposes companies to new risks, and boosts the commercial business of cybercriminals.
Deepfakes of 2026: indistinguishable and in real time
The researcher foresees a future where deepfakes advance toward real-time synthesis, capable of producing video that reproduces the nuances of human appearance and makes it easier to evade detection systems. The frontier is shifting from static visual realism to temporal and behavioral coherence: models that generate live or near-live content instead of pre-rendered clips.
Identity modeling is converging on unified systems that capture not only what a person looks like, but also how they move, sound, and speak in different contexts. The result shifts from “this looks like person X” to “this behaves like person X over time.”
As these capabilities mature, the perceptual gap between synthetic and authentic human media will continue to narrow. The meaningful line of defense will move away from human judgment and rely instead on infrastructure-level protections. These include secure provenance, such as cryptographically signed media, and AI content tools built on open standards like the one proposed by the Coalition for Content Provenance and Authenticity (C2PA).
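To make the provenance idea concrete, here is a minimal sketch of cryptographically signed media, assuming an Ed25519 key pair, Python’s cryptography package, and a placeholder file name; it illustrates the hash-and-sign principle behind standards like C2PA, not the actual C2PA manifest format.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# A publisher signs the hash of a media file at creation time...
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

with open("clip.mp4", "rb") as f:        # "clip.mp4" is a placeholder
    digest = hashlib.sha256(f.read()).digest()
signature = private_key.sign(digest)      # shipped alongside the file

# ...and anyone downstream verifies it against the publisher's public key.
def is_authentic(media_bytes: bytes, signature: bytes) -> bool:
    try:
        public_key.verify(signature, hashlib.sha256(media_bytes).digest())
        return True
    except InvalidSignature:
        # Any tampering or re-encoding changes the hash and breaks the check.
        return False
```

The point is that trust attaches to the key that signed the content at capture or publication, not to how convincing the pixels look.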
“Simply looking more closely at the pixels will no longer be enough,” concludes the forensic expert.
