Perhaps the start of a new year is a good time to look back on the old one and take stock of how far we've come.
There will never be another year like 2024 for artificial intelligence.
Throughout the year, obscure product demos became household names. People began seriously exploring the use of autonomous AI agents for problems like climate change. We also saw radical changes in the infrastructure behind these models.
As we enter the new year, I've been looking at some of the roundups out there. One of them is quite detailed, running to several dozen points, many of which I have covered here. But these are the big items that stand out to me as I look back on the past 365 days.
AGI is closer
One of the overarching ideas that comes up again and again is that we are closer to artificial general intelligence, or AGI, than we thought early last year.
Here's a survey I conducted in January among several people close to the industry; you can see their different time-frame predictions weighed against one another.
But now many of the cognoscenti think we are on the eve of AGI itself, so a large number of those predictions will have to be significantly revised.
AI can solve language
By the end of the year, we also discovered that we now actually have the power to build real-time translation into our consumer products.
This was due mainly to the recent demos of Meta's AI-powered Ray-Ban glasses. When Mark Zuckerberg interviews someone through an AI engine that converts his questions into another language in real time, we see this technology at work.
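Meta hasn't published the glasses' pipeline, but the text-translation step at the heart of such a product is easy to prototype with open models. Here's a minimal sketch using Hugging Face's transformers library (the model choice is my own illustration, not what Meta uses):

```python
# Toy text-translation step; a real-time product would wrap this
# with speech-to-text in front and text-to-speech behind it.
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
question = "What do you think of these glasses?"
print(translator(question)[0]["translation_text"])  # a French rendering
```

Swap in a different translation model to cover other language pairs; the shape of the pipeline stays the same.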
Language coverage matters, too.
I was rewatching an interview with Lex Fridman from last February, in which he talked about the importance of applying AI to the world's different languages. We cannot take for granted, he explained, that people speak English.
“Everything where there is interaction with a product, it all has to be captured, everything has to be turned into data,” he said at the time. “And that will be the advantage: the algorithms don’t matter… you have to be able to tailor it to each individual person, and do that, not over a single day or a single interaction, but over a lifetime, sharing memories, the highs and the lows, with your great language model.”
I have consistently applied the analogy of the Tower of Babel story to the process of figuring out how to use AI to communicate. It is an “inverted Tower of Babel” in which different language speakers come together to celebrate their new ability to understand each other without the help of a human translator.
The Transformer is the engine, but it is also replaceable
As 2024 progressed, I discussed the use of transformers in new language model systems.
Experts describe the transformer as being built around an “attention mechanism” that allows the model to focus on the things that matter most – to it, and to the human user.
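For readers who want the gist in code, here is a toy sketch of the scaled dot-product attention at the transformer's core, written in plain NumPy (an illustration of the mechanism, not any production implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score each query against each key, scaled to keep values stable.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Softmax over keys: how much each token "attends" to every other token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is an attention-weighted mix of the values.
    return weights @ V

# Toy self-attention over 3 tokens with 4-dimensional embeddings.
x = np.random.default_rng(0).normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x))
```

Each output row is a blend of the whole sequence, weighted by relevance – which is exactly the “focusing” behavior described above.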
But 2024 also brought glimmers of brand-new concepts that could replace the transformer: ideas moving into the realm of quantum computing, and super-powerful information processing that isn't limited by a traditional logical structure.
Which brings me to my next point.
Revolutionizing neural network capacity
Another thing we’ve seen become increasingly important is liquid neural networks.
Now it's time for the usual disclaimer: I've consulted on liquid neural network projects undertaken by the MIT CSAIL group led by Daniela Rus, so I have a personal connection to this trend.
Liquid neural networks change the essential structure of the digital “organism”: their neurons keep adapting their dynamics over time, enabling much more capable AI cognition with far fewer resources.
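For the curious, here is a rough sketch of the idea behind the liquid time-constant (LTC) neurons in the CSAIL work: the neuron's effective time constant shifts with its input, so the dynamics adapt on the fly. This is a simplified Euler-integration toy with parameter names of my own choosing, not the lab's code:

```python
import numpy as np

def ltc_step(x, u, dt, tau, W_in, W_rec, b, A):
    # Input-dependent gate: the "liquid" part that reshapes the dynamics.
    f = np.tanh(W_in @ u + W_rec @ x + b)
    # LTC ODE (one Euler step): leak toward zero plus a gated pull toward A.
    return x + dt * (-x / tau + f * (A - x))

# Toy run: 8 neurons driven by a 3-dimensional input stream.
rng = np.random.default_rng(1)
n, m = 8, 3
x = np.zeros(n)
params = dict(tau=1.0,
              W_in=rng.normal(scale=0.5, size=(n, m)),
              W_rec=rng.normal(scale=0.5, size=(n, n)),
              b=np.zeros(n),
              A=np.ones(n))
for _ in range(100):
    x = ltc_step(x, rng.normal(size=m), dt=0.05, **params)
print(x)
```

A handful of such neurons can model surprisingly rich time-series behavior, which is where the resource savings come from.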
That kind of efficiency is, in large part, what has made it possible to put high-performance LLMs on edge devices like smartphones. It's likely a deciding factor in Google's ability to roll out Gemini on personal devices by the end of the year. So now we can quite literally “talk to our wallets,” and that's a big difference. Part of the acceptance of AI itself will have to do with its ubiquity – where we encounter it and how it affects our lives.
AI is winning in multimedia
Here’s another big overarching takeaway from the work people have been doing with AI in 2024. It has to do with media.
I looked back and it turns out I covered an early announcement about OpenAI’s Sora in February. And indeed, we saw an early version rolled out late last year. I’ve personally used it to make some interesting and funny videos, all without any casting, shooting or production. It was pretty amazing.
Not to mention the groundbreaking text-to-podcast models, where you can plug in a PDF or a fact sheet and have two non-human “people” discuss your chosen topic, sounding exactly like a pair of seasoned podcast hosts. (Also check out the blizzard of stories about Scarlett Johansson protesting the use of a Scarlett-like voice for the now-removed Sky assistant.)
This is another example of the kind of personal use of AI that drives home that we are in a new era. As you listen to these “people,” or even interact with them in conversation, ask yourself: are they real? And how do I know? They respond to me personally, in real time – how do they do that if they don't exist?
You could call this a ‘deep Turing test’, and it’s clear that the systems pass with flying colors.
Anyway, that's my overview of 2024. There's a lot more, of course, from genomics to publishing and everything in between, but now that we're past the Auld Lang Syne, people are wondering: what's going to happen in 2025? We'll see, soon.