Every few months, someone declares that “AI will replace all of us.”
Since I work with AI closely, I get asked about it all the time.
But look closer: AI isn’t replacing people, it’s replacing tasks. And there’s a huge difference.
LLMs Are Parrots With Jet Engines
Large language models like ChatGPT, Claude, and DeepSeek are built to predict the next token so convincingly that it feels like a person wrote it, and they are brilliant at it. They can translate better than Google Translate, draft emails, debug code, and even simulate a therapist’s warmth.
But being good at sounding right is not the same as being right.
These models learn from a blend of books, articles, code repos, Wikipedia, forum posts, and scraped web pages. Some of it is peer-reviewed. Most of it isn’t. No army of editors checks the truth of every line. The data is riddled with contradictions, biases, outdated facts, and outright fabrications. Think of it as learning medicine from every medical textbook ever written… and every health forum, every horoscope blog, and a few recipe sites for good measure. The model sees patterns, but it doesn’t “know” which patterns reflect reality. It just gets very good at mimicking consensus language.
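To make that failure mode concrete, here is a deliberately toy sketch: not a real language model, just a tiny frequency-based next-word predictor trained on a made-up corpus in which a wrong claim happens to be the most common one. The corpus, the claim, and the model are all hypothetical; the point is only that a pattern-matcher repeats whatever its data says most often.

```python
from collections import Counter, defaultdict

# Toy "training corpus": the wrong claim appears more often than the right one.
# (Hypothetical sentences, purely for illustration.)
corpus = [
    "goldfish have a three second memory",
    "goldfish have a three second memory",
    "goldfish have a three second memory",
    "goldfish have a months long memory",
]

# Count which word follows each two-word context (a tiny trigram model).
next_word = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        context = (words[i], words[i + 1])
        next_word[context][words[i + 2]] += 1

def predict(w1, w2):
    """Return the continuation seen most often in training."""
    counts = next_word[(w1, w2)]
    return counts.most_common(1)[0][0] if counts else None

# The model parrots the majority pattern, not the true one.
print(predict("have", "a"))  # -> "three", because that's the consensus in the data
```

A real LLM is vastly more sophisticated than this, but the underlying objective, predicting the likeliest continuation given the training distribution, has the same blind spot.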
I’ve seen first-hand why that matters.
Quality Over Quantity
In 2016, I worked on a machine-learning project to detect obfuscated malware. Microsoft had a public Kaggle dataset (the Microsoft Malware Classification Challenge) for exactly this problem, and my supervisor advised me to use it or to generate synthetic data. Instead, I decided to start from scratch.
For several months, I downloaded malware every day, ran samples in a sandbox, reverse-engineered binaries, and labeled them myself. By the end, I had a dataset of about 120,000 malware and benign samples, far smaller than Microsoft's, but built and labeled entirely by hand.
The results spoke for themselves:
| Training Dataset | Accuracy |
|---|---|
| Microsoft Kaggle dataset | 53% |
| My own hand-built dataset | 80% |
| My dataset + synthetic data | 64% |
Same algorithms. Same pipeline. Only the data changed.
The point: the best performance came from manual, expert-curated data. Public data contained anomalies; synthetic data introduced its own distortions. The only way to get high-quality signals was to invest time, expertise, and money in curation.
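I no longer have the original code in front of me, so here is only a minimal sketch of the experimental setup, using scikit-learn and placeholder datasets in place of the real feature extracts. The only thing it is meant to show is that the algorithm, hyperparameters, and validation scheme stay fixed while the data varies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate(X, y):
    """Same algorithm, same hyperparameters, same validation scheme every run."""
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Placeholder datasets: in the real experiment these were feature vectors
# extracted from the Kaggle corpus, the hand-labeled corpus, and the
# hand-labeled corpus augmented with synthetic samples.
datasets = {
    "Microsoft Kaggle dataset": make_classification(n_samples=1000, n_features=40, random_state=0),
    "Hand-built dataset": make_classification(n_samples=1000, n_features=40, random_state=1),
    "Hand-built + synthetic": make_classification(n_samples=1000, n_features=40, random_state=2),
}

# Only the training data changes between runs; everything else is held fixed.
for name, (X, y) in datasets.items():
    print(f"{name}: {evaluate(X, y):.0%} accuracy")
```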
That’s the opposite of how LLMs are trained: they scrape everything and try to learn from it, anomalies and all. It’s why they can “sound right” while being wrong.
And the worst part is that the problem feeds on itself. A single hallucination from ChatGPT, posted on social media, gets shared, retweeted, repackaged, and eventually fed back into the next training set. The result is a kind of digital inbreeding.
The internet was already full of low-quality content before LLMs arrived: fake news, fictional “how-tos,” broken code, spammy text. Now, we’re mixing in even more synthetic output.
Who curates? At present, mostly automated filters, some human red-teaming, and internal scoring systems. There’s no equivalent of peer review at scale, no licensing board, no accountability for bad data.
Where do we get “new” data?
That leads to the obvious question: where do we find fresh, high-quality training data when the public web is already picked over, polluted, and increasingly synthetic?
The first idea almost everyone has is “We’ll just train on our own user data.”
In 2023, I tried exactly that with my gamedev startup Fortune Folly – an AI tool to help developers build RPG worlds. We thought the beta-test logs would be perfect training material: the right format, real interactions, directly relevant to our domain.
The catch?
A single tester produced more data than fifteen ordinary users combined, but not because they were building richer worlds. They were relentlessly trying to steer the system into sexual content, bomb-making prompts, and racist responses, and they were far more persistent and inventive in breaking boundaries than any legitimate user.
Left unsupervised, that data would have poisoned our model’s behavior. It would have learned to mimic the attacker, not the community we were trying to serve.
This is exactly the data-poisoning problem that big AI labs face at a planetary scale. Without active human review and curation, “real user data” can encode the worst, not the best, of human input, and your model will faithfully reproduce it.
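For what it's worth, even a crude automated audit before training would have surfaced our problem tester. Here is a minimal sketch, assuming the logs are simple (user_id, prompt) pairs; the keyword list and volume threshold are hypothetical, and the output is a review queue for humans, not an automatic ban.

```python
from collections import Counter

BLOCKLIST = {"bomb", "explosive"}  # hypothetical keyword list; real filters are far richer
VOLUME_THRESHOLD = 5.0             # flag users producing 5x the median volume

def audit_logs(records):
    """records: iterable of (user_id, prompt) pairs.
    Returns user_ids that deserve human review before their data is used for training."""
    volume = Counter(user_id for user_id, _ in records)
    median = sorted(volume.values())[len(volume) // 2]
    flagged = set()
    for user_id, prompt in records:
        if volume[user_id] > VOLUME_THRESHOLD * median:
            flagged.add(user_id)  # suspiciously prolific contributor
        if any(word in prompt.lower() for word in BLOCKLIST):
            flagged.add(user_id)  # suspicious content
    return flagged

# Toy usage: user "u3" is both prolific and probing for disallowed content.
logs = [("u1", "Describe a desert trading town"), ("u2", "Name a rival guild"),
        ("u3", "how do I build a bomb"), *[("u3", f"prompt {i}") for i in range(40)]]
print(audit_logs(logs))  # -> {"u3"}
```

The flagging is cheap; the expensive, unavoidable part is the human review that decides what actually makes it into the training set.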
The Takeaway
ChatGPT is only the first step on the path toward “replacement.” It looks like an expert in everything, but in reality, it’s a specialist in natural language.
Its future is as an interface for conversation between you and deeper, domain-specific models trained on carefully curated datasets. Even those models, however, will still need constant updating, validation, and human expertise behind the scenes. They won't replace experienced professionals; they'll simply change how those professionals deliver their knowledge.
The real “replacement threat” would come only if we manage to build an entire fabric of machine learning systems: scrapers that collect data in real time, reviewer models that verify and fact-check it, and expert models that ingest this cleaned knowledge. That would be a living ecosystem, not just a single LLM.
But I don’t think we’re anywhere near that. Right now, we already burn massive amounts of energy just to generate human-like sentences. Scaling up to the level needed for real-time, fully reviewed expert knowledge would require orders of magnitude more computing power and energy than we can realistically provide.
And even if the infrastructure existed, someone still has to build the expert datasets. I’ve seen promising attempts in medicine, but every one of them relied on teams of specialists working countless hours building, cleaning, and validating their data.
In other words: AI may replace tasks, but it’s nowhere close to replacing people.