This is my sequel to the famous Transformers paper, “Attention Is All You Need”. The technology described in that paper has led to almost every major breakthrough in AI in the past seven years: everything from ChatGPT to Google’s video generation model Veo, and everything in between.
The original paper was focused on helping computers (AI models) better understand and generate language. It describes how computers can derive meaning from a sentence by reading it all at once and using a mechanism called “attention” to decide which words are important.
For example, in the sentence “The cat that chased the mouse was hungry”, the model can connect the words “cat”, “mouse”, and “was hungry” to extract the meaning of the sentence while giving far less weight to everything else.
The implementation of “attention” led to the creation of the powerful LLMs we have today.
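To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation from the paper. The 2-d word vectors below are made up purely for illustration; real models learn high-dimensional embeddings and separate query/key/value projections.

```python
import math

def softmax(xs):
    # Exponentiate and normalise so the weights sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key/value pair stands for one word in the sentence; the output
    is a weighted average of the values, where words whose keys align
    with the query receive larger weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

# Toy embeddings (invented for this example) for four words of
# "The cat that chased the mouse was hungry".
words = ["cat", "chased", "mouse", "hungry"]
vectors = {"cat": [1.0, 0.2], "chased": [0.3, 0.9],
           "mouse": [0.8, 0.4], "hungry": [0.9, 0.1]}

# Ask: which word does "hungry" relate to most strongly?
query = vectors["hungry"]
keys = values = [vectors[w] for w in words]
weights, _ = attention(query, keys, values)
for word, w in zip(words, weights):
    print(f"{word}: {w:.2f}")
```

With these toy vectors, “cat” receives the largest weight for the query “hungry”, mirroring the intuition above: the model links the hungry thing back to the cat rather than to the mouse or the verb.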
As people have gotten more used to LLMs and made them a part of their lives, I have noticed that there’s still one challenge to maximising the usefulness of AI tools and models…
Context.
Context is the surrounding circumstances and background information that helps explain something.
It guides how you carry out tasks at work, it’s sometimes what makes a joke funny, it’s why Google search won, it’s everything.
A big challenge with AI models/tools at the moment is that they don’t always have context while carrying out certain tasks, and this impacts the quality of work done or answers provided.
Knowing what to take into consideration, how much impact it should have and when it’s no longer useful is the true test of intelligence.
Going back to our example sentence, “The cat that chased the mouse was hungry”: if we understand how AI models apply attention, we can get more out of them by supplying the relevant context efficiently.
I’ll go as far as saying even if AGI is achieved, it isn’t relevant unless it can handle context well.
At the moment, various AI tools have tried to solve this in different ways: memory and session awareness, RAG (Retrieval-Augmented Generation), fine-tuning, MCP (Model Context Protocol), and even prompt engineering guides.
However, I don’t think the problem has been properly solved yet because a lot of the current approaches put the burden of context on the user. This is not ideal because:
- It’s not always easy for humans to provide context, because we don’t always fully remember all the information and metadata we hold and use to make decisions.
- It negatively affects the user experience of AI tools if the end-user has to decide what context to add and when. It’s like telling a joke but having to explain it to everyone every time.
I see a couple of ways that AI tools will try to solve the challenge of having enough context about how a person thinks of things:
- Constant presence: General-purpose AI models could start to provide ever-present tools, similar to the Limitless AI pendant, that constantly track the places you visit, the things you see, the conversations you have, and your reactions, and apply all of that when answering questions or performing tasks for you. This could also simply be the next evolution of smartphones.
- Data import: In this approach, AI tools will provide mechanisms for users to import and sync their data from various platforms into their favourite AI tool. You’ll be able to sync your activity on X (things you see, like, post, or repost), your work tools (emails, presentations, meeting recordings), and your shopping data (items you’ve liked or bought on Amazon, Asos, etc.).
- Implants: In the far future, I also think AI tools could explore a Neuralink-type brain implant that allows people to ‘extend their brain’ and ‘get the full power of AI’. With this, they could gather more context to answer questions and even eliminate the delays of using AI tools. Why do you need to think of a question, unlock your phone, open the ChatGPT app, and type the question when you could just think of a question and get the answer?
The ethics of these approaches are a conversation of their own. But whichever way the tools go, the next step will be figuring out when to apply that context and how much weight it should carry. Some platforms try to do this through fine-tuning, weighting, and pre-processing requests (a.k.a. reasoning). I absolutely think this is the way to go.
However, I think it would be interesting to see dynamic weights, i.e. AI tools using different weights and answering in a different way based on what they already know about you: a sort of personalisation, similar to how Google sometimes tailors answers to your location, search history, etc.
I’m not sure what the next phase of AI looks like, but I’m convinced that a strong understanding and application of context will be an important part of it. I’m excited to see how players in the space handle it.