Transcript
Bhat: What we’re going to talk about is agentic AI. I have a lot of detail to talk about, but first I want to tell you the backstory. I personally was at Oracle like eight years ago. I landed there through another acquisition. After I finished integrating the company, I said, let me go explore some new opportunities. I ended up leading AI strategy across all of Oracle’s products. Eight years ago, we didn’t have a lot of answers, and almost every CIO, CDO, CTO I spoke with, the minute we started talking about AI, they’d say, “Stop. I only have one question for you. I’ve been collecting all the semi-structured data, unstructured data, putting it in my data lake, like you guys asked me to, but that’s where data goes to die. How do I go from data to intelligent apps?”
Frankly, we didn’t have a great answer back then. That’s what took me on this quest. I was looking for startups we could acquire. I was looking at who was innovating. I ended up joining Rockset as a founding team member. I was the chief product officer over the last seven years there. We built a distributed search and AI database, which got acquired by OpenAI. Now, eight years later, I’m having a lot of very similar conversations where enterprises are asking: what we have today with AI and LLMs is very interesting. ChatGPT is super interesting. It doesn’t write great intros, but it’s pretty good for other things. The question still remains, how do I go from my enterprise data to intelligent agents, no longer apps? That’s what we’re going to talk about.
GenAI Adoption
Before I get into it, though, let’s look at some data. Somebody asked me the other day if I believe in AI. I believe in God and I believe in data, so let’s see what the data tells us. Chart out the adoption of some of the most transformative technologies of our times, PCs and the internet, two years out from when each was first commercialized. For PCs, that goes back to 1981, when the IBM PC was commercialized. Plot the rate of adoption two years later, and with GenAI, it’s almost twice what we saw with PCs and the internet. That’s how fast this thing is taking over. That’s both incredibly exciting and creating a lot of disruption for a lot of people. When people now ask me if I believe in AI, I’m basically saying, here’s what the data says. I don’t want to be on the wrong side of history for this one.
GenAI vs. Agentic AI
The previous chart that I showed you, that was basically GenAI. We’re now talking about agentic AI. What’s the difference? The simplest way to think about or just internalize the main concept here is, with GenAI, we talk about zero-shot, and as you start getting better prompt engineering, you’d go into multi-shot. Really with agentic AI, we’re talking about multi-step. Think about how you and I work. If you’re writing code, you don’t just sit down and say, let me go write this big piece of code, and the first time you do it, end-to-end, assume that it’s going to be perfect. No, you take a series of steps. You say, let me go gather some more data around this. Let me go do some research. Let me talk to some other senior engineers. Let me iterate on this thing, test, debug, get some code reviews.
These are all the steps that you’re following. How can you possibly expect AI to just go in zero step and get it perfectly right? It doesn’t. The big advancement we’re seeing now is with multi-step and agentic AI. That brings us to the most simple definition of agent that there is. It can basically act. It’s not just generating text or not just generating images for you. It’s actually taking actions, so it has some autonomy. It can collaborate with other agents. This is very important. As we get into the talk here, you’ll see why the success depends on multi-agent collaboration. Of course, it has to be able to learn. Imagine an agent that’s helping you with your flight bookings. You might ask it to go rebook your tickets, and it might collaborate with a couple more agents to go look up a bunch of flights. It might collaborate with another agent to maybe accept your credit card payment.
Eventually, after it’s done the booking, it realizes it’s overbooked that flight. It has to learn from that mistake and do better next time. This is the most simple definition. We’re all engineers. We like precise definitions. The interesting thing that happened in the industry is this sparked a whole lot of debate on what exactly is the correct definition of an agent. Hien was literally debating against himself in the keynote, and that’s what we like to do. We like to debate. A very simple solution came out, which is, let’s just agree it’s agentic. We don’t really know what exactly an agent looks like, but now we can start looking at what agentic AI looks like.
I really like this definition. We’re not great at naming, but the definition makes sense, because the reality is that this lies on a spectrum. Agentic AI really lies on a very wide spectrum. That’s what your spectrum looks like. We talked a little bit about multi-step. In prompt engineering, what you’re doing is you’re just going zero-shot to multi-shot. You’re really trying to understand, how can I give a series of prompts to my LLM to get better outputs? Now with agentic AI, you’re very quickly saying, maybe I don’t want the same LLM to be called every single time. You might call the same LLM, or you might call a different LLM as you start realizing that different models do better on different dimensions.
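The multi-step idea above can be made concrete with a minimal sketch: instead of one zero-shot call, the task runs as a series of steps, and each step can be routed to a different model. Everything here is illustrative; both "models" are stubs standing in for real LLM APIs, and the routing rule is a toy heuristic, not a recommendation.

```python
# Multi-step execution with per-step model routing (illustrative stubs).

def fast_model(prompt: str) -> str:
    # Stand-in for a cheap, fast LLM call.
    return f"fast({prompt})"

def strong_model(prompt: str) -> str:
    # Stand-in for a slower, more capable LLM call.
    return f"strong({prompt})"

def route(step: str):
    # Different models do better on different dimensions; this toy rule
    # sends only the hardest step to the strong model.
    return strong_model if step == "write code" else fast_model

def run_multi_step(steps):
    outputs = []
    for step in steps:
        model = route(step)
        outputs.append(model(step))
    return outputs

results = run_multi_step(["gather data", "write code", "review"])
```

In practice the step list itself would come from a planner rather than being hard-coded, which is exactly where the planning discussion below picks up.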
Very quickly you realize, if you think about a human again, that the most fundamental things a human does well are reasoning and memory. It’s not enough to have reasoning; you also need memory. You need to remember all the learnings from your past: that’s long-term memory. Then you need memory between your steps, and that’s the short-term working memory. The long-term memory typically comes from vector databases. In fact, Rockset, the database that I worked on over the last seven years, was an indexing database. What we did was index all the enterprise data for search and retrieval.
Then, of course, we added vector indexing so that you could do hybrid search. Vector databases are great for long-term memory. There’s still a lot of new stuff emerging on the short-term working memory side. That’s just one part of the equation. As you start adding memory, you want learning, and this is what reflection does. Reflection is basically where you do the feedback loop iteratively and keep going, keep learning, and keep getting better, until you get to a point where you’re not getting any more feedback. And it can continue: learning is endless. Every time something changes externally, the learning loop kicks in again, and you keep having more reflection. This is a self-refinement loop, and it’s super important for agentic AI.
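The reflection loop described above can be sketched in a few lines: generate, critique, revise, and stop when the critic has no more feedback. Both functions here are stubs; a real system would use LLM calls for the generator and the critic, and the short-term working memory would typically be richer than a list of strings.

```python
# Generate / critique / revise loop with short-term working memory (stubs).

def generate(task: str, working_memory: list) -> str:
    # Stand-in for an LLM draft; the revision count tracks iterations.
    return f"draft({task}, rev={len(working_memory)})"

def critique(draft: str):
    # Stub critic: gives feedback on the first two revisions, then stops.
    # A real critic would be another LLM call scoring the draft.
    return "tighten the intro" if "rev=0" in draft or "rev=1" in draft else None

def reflect(task: str, max_iters: int = 5):
    working_memory = []            # short-term memory between steps
    draft = ""
    for _ in range(max_iters):
        draft = generate(task, working_memory)
        feedback = critique(draft)
        if feedback is None:       # no more feedback -> done, for now
            return draft, working_memory
        working_memory.append(feedback)
    return draft, working_memory

final, memory = reflect("write summary")
```

When something changes externally, you would simply re-enter `reflect`, which is the "learning is endless" point: the loop terminates per run, not forever.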
Planning and tool use is another big area that’s emerging. In fact, when you see LLMs doing planning, it’s quite mind-boggling. As you get more senior in your role, you get really good at looking at a big problem and breaking it down into specific pieces. You think about how to chunk it up into smaller tasks, who to assign each one to, and in what sequence it needs to be done. An LLM is really good at doing this kind of planning and getting those steps executed in the right way. The tool use portion is just as important, because as it’s planning, it’s also planning what tools to use. Those tools are supplied by you, so the quality of your output depends on what tools you’re supplying. Here I have some examples, like search, email, calendaring. That’s not all. You can think about the majority of the tools actually being code execution: you decide what it needs to compute, and you can give it some code execution, and those can be the functions that it calls and uses as tools.
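A minimal sketch of the tool-use side: you register the tools, and the planner picks which to call. The "plan" here is hard-coded for illustration; in practice an LLM would emit the tool name and arguments, for example via a function-calling API. The tool names and arguments are made up.

```python
# Tool registry and dispatch (illustrative; tools and plan are made up).

def search(query: str) -> str:
    # Stand-in for a real search tool.
    return f"results for {query}"

def send_email(to: str, body: str) -> str:
    # Stand-in for a real email tool.
    return f"sent to {to}"

TOOLS = {"search": search, "send_email": send_email}

def run_tool_call(call: dict) -> str:
    tool = TOOLS[call["name"]]     # the quality of the output depends on
    return tool(**call["args"])    # the quality of the tools you register

# An LLM planner would produce structured calls like these:
plan = [
    {"name": "search", "args": {"query": "flights SFO-JFK"}},
    {"name": "send_email", "args": {"to": "me@example.com", "body": "options"}},
]
outputs = [run_tool_call(c) for c in plan]
```

Code execution as a tool fits the same shape: the registered function would run the generated code in a sandbox instead of calling an API.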
Eventually, you can imagine agents using all these tools to do their work. Again, you control the quality of that output by giving them excellent tools. As you start thinking about this, you get into multi-agent collaboration really fast, and the reason is that multi-agent collaboration gives you much better output. This is some really interesting data that was shared by Andrew Ng of DeepLearning.AI. HumanEval is a coding benchmark; it basically has a series of simple programming prompts. Think of it like something you’d see in a basic software engineering interview, and you can use it to benchmark how well your model is performing.
In this case, GPT-3.5 outperforms GPT-4 when you use agentic workflows. That’s really powerful. Most of you know that between 3.5 and 4, there was a drastic improvement; you can almost see that on the screen there. If you look at just zero-shot, the improvement from 3.5 to 4 was amazing, but GPT-3.5 is already able to outperform zero-shot GPT-4 if you layer on, or wrap, agentic workflows on top.
As you look at this, the next layer of complexity is, let’s get into multi-agent workflows, because there is absolutely no way an agent can do an entire role. The agent can only do specific tasks. What you really want to do is, you scope the task, you decide how small the task can be, and then you have one agent just focusing on one specific task. The best mental model I can give you is, think about microservices. Some of you may love it. Some of you may hate it. The reason you want to think about microservices is the future we’re looking at already is a micro agent future. You’re not going to have these agents that just resemble a human being. You’re going to have agents that do very specific tasks and they collaborate with each other.
Where Are We Headed? (Evolution and Landscape of Agentic AI)
If many of you have seen some of the splashy demos that came out earlier and then said, this simply does not work in production, this is the real reason that agents failed earlier this year. People tried to create these agents that would literally take on entire roles. I’m going to have an agent that’s going to be an SDR. I’m going to replace all these SDRs. I’m going to be an agent that replaces software engineers. It’s not happening. The reason is, going from that splashy demo to production is extremely hard. Today, the state of the art is very much, let’s be realistic. Let’s break it down to very small, simple tasks. Let’s narrow the scope as much as we possibly can, and let’s allow these micro agents to orchestrate, coordinate, and collaborate with each other. If you imagine that, you can envision a world where you have very specific things.
Going back to the software engineering example, you have an agent that writes code. You have another agent that reviews the code; all it does is code review. You have another agent that’s planning and scheduling, and helping them figure out how to break down their tasks. You might even have a manager agent that’s overseeing all of them. Is this going to really replace us? Unlikely. Think of them as your own little set of minions that do your work for you.
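The coder / reviewer / manager pattern above can be sketched as a simple routing loop. Each micro agent does one narrow task; the manager passes work between them. All the agent "brains" here are stubs standing in for LLM calls, and the approval rule is a toy.

```python
# Coder / reviewer / manager micro-agent loop (illustrative stubs).

def coder(task: str) -> str:
    # Stand-in for a code-writing agent.
    return f"code for {task}"

def reviewer(code: str) -> str:
    # Stand-in for a code-review agent; toy approval rule.
    return "approve" if "code for" in code else "revise"

def manager(task: str, max_rounds: int = 3):
    # The manager only routes work and records what happened.
    log = []
    code = ""
    for _ in range(max_rounds):
        code = coder(task)
        log.append(("coder", code))
        verdict = reviewer(code)
        log.append(("reviewer", verdict))
        if verdict == "approve":
            return code, log
    return code, log

code, log = manager("parse csv")
```

The point of the sketch is the shape, not the stubs: each agent has one narrow responsibility, and you interface only with the manager.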
However, somebody asked me, does that mean I now have to manage all these minions? That’s the last thing I want to do. Not really, because this is where your supervisor agents, your planner agents, your manager agents come in. You will basically interface with a super minion that can coordinate, assign tasks, and report back to you. This is the right mental model as you think about where we are headed, and about the complexity this is going to create if everybody in the organization starts creating a set of agents for themselves. This was my thesis, but I was really happy to see this view of the world starting to emerge from external publications as well.
This was actually McKinsey publishing their view of how you’re going to have all these specialist agents. They not only use the tools that you provide to them, they also query your internal databases. They might go query your Salesforce. They might go query wherever your data lives, depending on what governance you layer on, and what access controls you give these agents.
Agentic AI Landscape: Off-the-Shelf vs. Custom Enterprise Agents
That brings me to, what is the agentic AI landscape as of today? Some of these startups might actually be familiar to you. Resolve.ai had a booth at QCon. I’m by no means saying that these are the winning startups. All I’m saying is these are startups where I’ve personally spent time with the investors, with the CEOs, going super deep into, what are they building? What are the challenges? Where do they see this going? I’ll give you a couple of these examples. Sierra AI is building a customer service agent. SiftHub, which is building a sales agent, very quickly showed me a demo that says, “We’re not trying to create a salesperson. We’re just trying to automate one very specific task, and that is as specific as just responding to RFPs”. If you’ve ever been in a situation where your sales team is saying, “I need input from you. I need to go respond to this RFP, or I need input from you to go through the security review”.
That is a very specific task in a salesperson’s end-to-end workflow, and that’s really where SiftHub is focusing their sales agent. That’s how specific it gets. Resolve.ai is calling it an AI production engineer, but if you look at the demo, it’s basically in Slack with you, exactly where you live. It’s on-call support: when you’re on-call, it helps you remediate. It’s very simple, RCA and remediation, and it only kicks in when you’re on-call. This is where all the startups are focusing. Then you look at the enterprise platforms, and the enterprise companies are coming in saying, this is great. Startups are going in and building these small, vertical agents.
If you look at an end-to-end workflow in any enterprise, there’s no way these startups are going to build all these little custom agents that they need, so might as well create a way for people to go build their own custom agents. Most of the big companies, Salesforce, Snowflake, Databricks, OpenAI are all starting to talk about custom enterprise agents. How do you build them? How do you deploy them, using the data that already lives on their platform? Each of them is taking a very different approach.
Salesforce is coming at it from having all your customer data in there. Databricks and Snowflake are coming at it from a very different perspective, because they’re your warehouse or your data lake. OpenAI, of course, is coming in with the LLM itself. This is really the choice that you have: do you use off-the-shelf agents, or do you build your own custom enterprise agents? The real answer is, as long as you can use an off-the-shelf agent, you want to do that. But there are just not going to be enough off-the-shelf agents. In that case, you might want to build a custom agent for yourself. It is really hard.
Custom AI Agents – Infra Stack and Abstractions
That’s what we’re going to talk about. What are the challenges in building these custom agents? Where is the infra stack today? What are the abstractions that you want to think about and layer on? How do you approach this problem space? I’m just going to simplify the infra building blocks here. Most of these are things that you’ve already done for your applications. This is where all the learning that you’ve had in building enterprise applications really transfers over. Of course, at the foundational level, you have the foundation models, that’s what’s giving you the reasoning.
Then you have your data layer, whether it’s your data lake or, of course, your vector database, giving you the memory. The context and memory and the reasoning give you the foundation. As you get into the many micro agents that we talked about, the real challenges are going to show up in the orchestration and governance layers. Somebody was asking me, isn’t it easier for us to just wait for the reasoning to become so good that we don’t have to deal with any of this? Not really. Think about how this is developing; I’ll give you a real example. Let’s say you’ve been hiring high school interns at work. They’re the equivalent of a simple foundation model that can do simple tasks using general knowledge.
As these models get better, it’s not like you’re suddenly getting a computer science engineer or a PhD researcher. What you’re really getting is an intern whose IQ has gotten higher, but they’re still not specialized, so you will have to give it the context. You will have to build in the specialization, irrespective of how much advancement you start seeing. You will see a lot more advancement on the reasoning side, but all these other building blocks are still on you. The governance layer here is really complicated. Again, a lot of the primitives that we’ve had in application governance still carry over. How do you think about access controls? How do you think about privacy? How do you think about access to sensitive data? If an agent can act on your behalf, how do you make sure that these agents are carrying over the privileges that were assigned to the person who created the agent? There’s a lot of complexity that you need to think through.
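One governance primitive from the paragraph above, carrying over the creator's privileges, can be sketched very simply: an agent inherits, and can never exceed, the grants of the person who created it. The user names and permission strings here are made up for illustration.

```python
# Agents scoped to their creator's privileges (names are illustrative).

CREATOR_PERMISSIONS = {
    "alice": {"crm:read", "crm:write"},
    "bob": {"crm:read"},
}

class ScopedAgent:
    def __init__(self, creator: str):
        # The agent inherits its creator's grants at creation time and
        # can never exceed them (frozenset = no later widening).
        self.permissions = frozenset(CREATOR_PERMISSIONS[creator])

    def access(self, resource: str) -> bool:
        # Every data access goes through this check.
        return resource in self.permissions

bob_agent = ScopedAgent("bob")
```

A real system would also need revocation when the creator's grants change, which is part of the complexity the talk is pointing at.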
Similarly, on the orchestration side, when you start thinking about multi-agent workflows, it’s not simply that this agent is passing information, passing the ball, to another agent. What ends up happening is you suddenly have a distributed system, because when the LLM starts planning the work, it immediately starts distributing it. Now all the orchestration challenges that you’re very familiar with from distributed systems are going to show up here. It’s up to you how you want to orchestrate all of this.
Then, on the top layer, you will see conspicuously missing much talk about UIs. Why? Because the way you interact with an agent is going to be very different. Whether it’s conversational or chat, you’re going to interact with it in the channels where you live today. You really don’t want to go to a bunch of different tools and log in and forget the password every single time. You just have an agent, and that agent is collaborating with a bunch of other agents to get the work done. The real top layer is going to be SDK and API access to the tools that you are providing it. This is where it’s up to you to control which tools it can access, and what the quality of those tools is. How do you set up the infra building blocks in a way that is really scalable?
I want to spend a minute specifically on the context question, because think about where we opened: we’re still figuring out how to bring all of the enterprise context into agentic AI. There are a few different ways to go about it. We talked about prompt engineering. I always say, if you can get away with prompting, you want to stay there. It’s the simplest and most flexible way to get the job done. But many times, prompting is just not enough.
Then you look at more advanced techniques. RAG, or Retrieval Augmented Generation, is the simplest next-best alternative. What you basically do with RAG is index all your data. Imagine a vector database; what we built at Rockset is a great example here. You’re indexing your enterprise data so that, as you’re building your agentic workflows, you’re building in the reasoning alongside your enterprise context. The challenge here is, first, that it’s very complicated: setting up your RAG pipeline, choosing your vector database, scaling your vector database, it’s not that simple. The other big challenge you’re going to run into is that it’s extremely expensive.
Indexing is not cheap, as you know. Real-time indexing, which is what you run into when you really want agents that can act in real time and keep your data updated, is very expensive. Unless you really need it, you don’t want to go there. Fine-tuning is for when you have much more domain-specific needs. I’d rather not go there unless you absolutely have to, because as an enterprise, it’s much better to stay in control of your data and your context and make a foundation model work for you. There are plenty of good ones out there.
Take your pick; on the open-source side especially, development is really fast-paced. Fine-tuning is very expensive too. Training your own model is extremely hard, extremely expensive, and as for getting the talent to train your own models in this competitive environment, forget about it. I think the first three, prompting, RAG, and fine-tuning, are really the most commonly deployed. Training, yes, but only if you have extremely high data maturity and domain-specific needs.
The other thing to remember here is that all these building blocks we talked about are changing very fast. If you’re going to expose all of them to your end users, it’s going to be very hard. The best thing you can do is think about the right abstractions from the user’s perspective. I have Salesforce as an example here because I thought they did a good job of really thinking about what the end user is thinking about and how they create agents: exposing those abstractions to everybody in the enterprise, and then decoupling your infra stack from how users create and manage their agents. The minute you do this decoupling, it gives you a lot more flexibility and allows you to move much faster as your infra stack starts changing.
Advancing Your Infrastructure for Agentic AI
With that, let’s dig into the real challenges in your infra stack. We talked about the building blocks, and it looks very simple, but it’s pretty complex under the hood. There’s this statement, “Software is eating the world, but AI is going to eat software”, and I couldn’t agree more. I’ve been having conversations with some of the startups doing agentic AI and asking them how their adoption is going. The biggest finding for them is how quickly they’re able to show companies that they can reduce their SaaS application licenses. You have 300 licenses of, name your favorite SaaS vendor; we’ll bring that down to 30 by the end of the year, and 5 the following year.
That’s the trajectory they’re talking to customers about, and they’re already seeing it. They’re seeing customers starting to retire SaaS applications. One of the more famous examples: the Klarna CEO actually made a statement saying that they are now completely retiring Workday and Salesforce in favor of agentic AI. That is really bold and fast, if they’re able to get there that quickly. But that is the thinking. Klarna was actually my customer while we were at Rockset. All of the innovation was happening in the engineering team. The engineers were moving incredibly fast, experimenting with AI, and before you know it, we have the Klarna CEO making a statement like this. This is going to be a very interesting world.
This tradeoff is one of the hardest to make. Having worked on distributed databases for the last few years, we spent so much time thinking about the price-performance tradeoff and how to give users more control so they can make their own tradeoffs. Now you have the cost, accuracy, latency tradeoff. Accuracy especially: going from 95% to 98% can be 10x more expensive. You really have to think deeply about what the use case is and what tradeoff it needs. Latency, again: real-time is very expensive. Unless your agent needs to answer questions about flight bookings and what seats are available right now, you might not need that much real-time.
Maybe you can make do with an hour of latency, as long as you set up the guardrails so it’s understood that this agent is going to be an hour behind. On cost, I had this really interesting experience. I was in the room, and we were having this really heated debate between product and engineering. The product managers are like, I need this kind of performance, why can’t you deliver? The engineers were like, you’re out of your mind, there’s no way we can ever get there at that scale, this is ridiculous. After 10 minutes of this, someone asked a very simple question: what is the implicit cost assumption we’re making here? That is really the disconnect between what product managers are saying and what the engineers are saying. Engineers know there’s a fixed budget, and the PMs are only thinking, I can go charge the customer more, just give me the performance. This disconnect is very real.
The only advice I have here is, think very deeply about your use case and what tradeoffs you’re making for it. Communicate them to all your stakeholders as early and as often as you can, because these can’t be implicit assumptions; you have to make sure everybody’s on the same page. Then know that these tradeoffs will keep changing, especially given how fast the models are changing. You’re going to get advancements, and these tradeoffs will change. If you made certain assumptions and baked them in, all hell will break loose when the next model comes out.
The previous one was about tradeoffs. Here there’s no “or”; this is an “and”. You absolutely need to layer on trust as one of the most important layers here. There are four key components. You’re thinking about transparency. Literally nobody will adopt agentic AI, or any sort of AI, if they cannot understand the thought process behind it. What were the steps? If you’ve ever used GPT search, it now gives you links that tell you where it is getting the information from. If you want credibility, if you want adoption, you have to build explainability right into it. I want to give a shout-out to Resolve.ai for their demo: not only does it immediately do an RCA, it tells you why. Why does it think this is the root cause? And it gives you links to all of the information that it used to get there.
That’s the explainability you want to bake in. Observability is hard. You know all the application logging and monitoring you’ve built into everything; how do you now apply that to agentic AI, when agents are going in and taking all these actions on behalf of humans? Are you going to log every single action? You probably have to, because you need to be able to trace back. What are you going to monitor? How are you going to evaluate? There are just so many challenges on the observability side. This is where, as senior engineers, all the learnings that you’ve had from the application domain can transfer over. This is still an emerging field, and there aren’t a lot of easy answers. We’ve talked a little bit about governance.
How do you ensure that the agent is only accessing the data it’s supposed to? How do you ensure that it’s not doing anything that isn’t authorized? How do you protect it from malicious attacks? Because anytime you create a new attack surface, you’re going to have more attacks. We haven’t even seen the beginning of the kinds of cybersecurity challenges we’re going to run into once you start giving agentic AI more control. The more you can get ahead of this now, the better; think about these challenges, because this is moving so fast. Before you know it, you’ll have somebody in the company saying, yes, we’ve already built a couple of agents and deployed them here. What about all these other things?
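To make the "log every single action" point concrete, here is a minimal sketch of an agent audit trail: every action is recorded with enough detail to trace back which agent did what, and on whose behalf. The field names and example values are illustrative, not a proposed schema.

```python
# Per-action audit trail for agents (field names are illustrative).
import time

AUDIT_LOG = []

def audited(agent: str, principal: str, action: str, target: str) -> dict:
    entry = {
        "ts": time.time(),
        "agent": agent,          # which agent acted
        "principal": principal,  # the human it acted on behalf of
        "action": action,
        "target": target,
    }
    AUDIT_LOG.append(entry)
    return entry

def actions_for(principal: str):
    # Trace back everything done on a given person's behalf.
    return [e for e in AUDIT_LOG if e["principal"] == principal]

audited("booking-agent", "alice", "charge_card", "order-123")
```

In a real deployment these entries would go to a tracing or logging backend rather than an in-memory list, and they become the raw material for the monitoring and evaluation questions above.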
The scalability concerns are also extremely real. When OpenAI was talking to us, I was looking at their scale, and it completely blew my mind. It was 100x more than anything I’d ever seen. That kind of scalability is by design; you have to bake it in from day one. The best approach here is a very modular one. Not only are you breaking down your entire workflow into specific agents, you’re also thinking about how to break up the agent itself into modules, so that each of them can be scaled independently and debugged independently.
We talked about the data stack, your foundational model. Make sure every single thing is modular, and that you have monitoring and evaluation built right into it. Because what’s going to happen is, you’re going to have a lot more advancements in every part of your stack, and you have to be able to swap that piece out depending on what the state of the art is. As you swap it out, you have to be able to evaluate the impact of this new change. You’re going to see a lot of regressions if GPT-5 comes out tomorrow, and you go, let me just swap out 4o and just drop in 5. I have no idea what is going to break. I’m speaking about GPT, but you can apply this to Claude. You can apply this to anything.
The next version comes out, you’re going to see a bunch of regressions because we just don’t know what this is going to look like. I think Sam Altman in one of his interviews made a really interesting statement, he said, the startups that are going to get obliterated very soon would be the ones that are building for the current state of AI, because everything is going to change. Especially, how do you think about this when maybe your development cycles are 6 to 12 months, whereas the advancements in AI are happening every month, or every other month.
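The swap-and-evaluate discipline described above can be sketched as a common model interface plus a small regression suite you rerun on every swap. Both model classes here are stubs with invented behavior; the point is the shape, a shared `complete` method and a gate on the evaluation score, not the toy answers.

```python
# Swappable model layer with a regression gate (illustrative stubs).

class ModelA:
    # Stand-in for the model currently in production.
    def complete(self, prompt: str) -> str:
        return "4" if "2+2" in prompt else "unsure"

class ModelB:
    # Stand-in for the shiny new release you want to drop in.
    def complete(self, prompt: str) -> str:
        return "four" if "2+2" in prompt else "unsure"

# A tiny regression suite: prompts with expected answers.
REGRESSION_SUITE = [("what is 2+2", "4")]

def evaluate(model) -> float:
    passed = sum(model.complete(p) == want for p, want in REGRESSION_SUITE)
    return passed / len(REGRESSION_SUITE)

# Swap only if the new model does not regress on your suite.
safe_to_swap = evaluate(ModelB()) >= evaluate(ModelA())
```

Here the new model gives a semantically fine but format-breaking answer, exactly the kind of silent regression the talk warns about, and the gate catches it.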
Unknown Unknowns
That brings us to what we call the unknown unknowns. This is the most difficult challenge. When we were building an early-stage startup at Rockset, we used to say this, building a startup is like driving on a winding road at night with only your headlights to guide you. Of course, it’s raining, and you have no idea what unknown unknowns you’re going to hit. It’s kind of where we are today. The best thing to do is say, what things are going to change? Let’s bake in flexibility and agility into everything that we do. Let’s assume that things are going to change very fast. You can choose to wait it out. You might be thinking, if things are changing so fast, why don’t I just wait it out? You could, but then your competitor is probably not waiting it out. Other people in your company are not going to wait. Everybody is really excited. They start deploying agents, and before you know it, this is going rogue.
Lean Into The 70-20-10 (Successful Early Adopters Prioritize the Human Element)
This is really the most interesting piece as I’ve seen companies go through this journey. All of us love to talk about the tech: the data stack, the vector databases, which LLM, let’s compare all these models, what algorithms, what tools are we going to give to the agents. Ideally, this is less than 30% of your investment. Where you want to focus is on the people and the process. Envision this new world where not just you, but everybody in the company starts creating all of these agents or micro agents, which are collaborating with each other, with your agents, and maybe even with some off-the-shelf agents that you’ve purchased. Can you imagine how much that is going to disrupt the way we do work today?
This whole concept of, let’s open up this application, log in to this UI, and do all of these things, just goes away. That is going to be much harder than anything we can imagine on the agentic AI technology side, because ultimately you’re disrupting everything that people know. It was really fun to hear Victor’s presentation on multi-agent workflows. In one slide he mentioned, let’s make sure the agents delegate some of the tricky tasks to humans. I still believe humans will be delegating work to agents, and not the other way round. How do you delegate tasks to agents? What should you delegate? How do you make sure that agents understand the difference in weights between all the different tasks they can do? That is going to be really hard.
It is a huge opportunity to stop doing tedious tasks. It is a huge opportunity to get a lot more productive. It comes at the cost of a lot of disruption. It comes with people and processes having to really readjust to this new world of working. What we’re finding is that most of the leaders who are embracing this change are spending 70% of their time and energy and investment into people and processes, and only 30% on the rest. This is a really good mental model to have.
Conclusion
We haven’t talked at all about ethics. Eliezer is one of the researchers on AI ethics, and I think he put it well: the greatest danger really is that we conclude that we understand it, because we are just scratching the surface. The fun is just beginning.