Beyond the Hype: Architecting Systems with Agentic AI

By News Room | Published 2 October 2025 (last updated 10:45 AM)

Transcript

Losio: We are going to chat about architecting systems with agentic AI. Today's topic is agentic AI, which many consider a buzzword in software architecture. What does it really mean for senior practitioners who have to build today's production systems? The panelists will discuss what makes AI agentic and how it differs from traditional AI implementations.

My name is Renato Losio. I'm a cloud architect and an editor here at InfoQ. I will be joined by four experts coming from very different backgrounds, companies, and sectors. They will discuss patterns, antipatterns, and lessons learned architecting systems with agentic AI: when, for example, agentic AI helps solve problems that conventional approaches cannot, and in which use cases the added complexity is justified and where it is not. I would like to give each of them the chance to introduce themselves, share their professional journey in agentic AI, and explain why they joined the table today.

Joseph: I'm Arun Joseph. I'm the co-founder and CEO of Masaic Agentic Systems. Previously, I led one of Europe's first large-scale agentic systems for Deutsche Telekom in my role as head of engineering there. It was called Eclipse LMOS; it's an open-source project now stewarded by the Eclipse Foundation, deployed across four countries. It was an agentic PaaS. At Masaic, we are building large-scale decisioning systems with the power of agentic AI. At the core of our agentic AI journey, we believe that agentic AI is all about autonomous loops. We need a new computing paradigm for it, which we call the agentic loop. We are building an open-source component called AGC, agentic compute, which we believe should become something like what containers are for the cloud.

Koch: I'm an AWS Hero. I work for FICO. We are building an analytics platform, which is based on AI technologies as well. We announced our focused foundation models, large language models that we're bringing to the market through the platform. There, I'm an enterprise architect, mainly for our infrastructure components. I'm a developer at heart. I'm mainly working on the agentic AI development workflow right now, coming from a CI/CD background as well.

Kurian: My name is Merrin Kurian. I'm a distinguished engineer at Intuit. Previously, I led platform engineering for products and led multiple transformations at Intuit, such as adopting public cloud and event-driven microservices. Currently, I lead the AI Foundation, which provides platform capabilities for applying AI to all of Intuit's product experiences. These include traditional ML, generative AI, and, of course, agents. We started on this journey two-and-a-half years ago at Intuit, launching Intuit Assist, which is our AI-powered assistant across the products: QuickBooks, TurboTax, Mailchimp, Credit Karma. Underneath all of these is the platform that powers these experiences, which we internally call GenOS. As we take up topics regarding agents, I can bring in various aspects of how we developed the platform and how it is helping our engineers build reliable agents.

Jewell: I’m Tyler. I am CEO of a company called Akka. We make an agentic AI platform, and that platform makes it easy to build and get systems into production. It helps you keep them there safely with continuous compliance, and it also scales them cost-effectively. This is the fourth company I’ve run. All previous companies were developer platform-related, and oddly enough, they were all related to Java as a programming language.

Agentic AI – Hype vs. Reality?

Losio: One thing that surprised me is the media coverage. There is always a bit of hype around every big topic, and agentic AI is one of them: people start out super excited, and those are the big stories. Then suddenly the story flips, and people start to say that all agentic AI projects fail, that it's just hype, and so on. There have been a number of media reports saying that many AI projects fail and don't make it to production. Is agentic AI more hype than reality right now?

Jewell: We gave a board presentation to our investors about 9 months ago, and we told them that we were launching an agentic AI platform. We felt at the time that the market was one of the fastest oncoming markets we had ever seen, that we were going to have lots of competitors, and that all kinds of customers would show up almost immediately. Roll the clock forward 9 months, and what we're seeing is a lot of good traction and activity. There are a bunch of really interesting projects going on. In large enterprises, though, it's very selective. While everybody we're talking to is doing some work with agentic AI, it's mostly prototypes and experimentation. Those things were not expected to get into production, because they were not really qualified projects or large budgeted systems being pursued. The enterprises that have gone there and failed face three challenges. Complexity: these systems are distributed systems that require orchestration, agents, shared state, and oftentimes streaming data. That's a real challenge. It's hard to trust these systems because of non-determinism. Lastly, there are a lot of shadow costs that show up. It's not just the model cost of self-hosting; there is a large amount of ancillary cost in the ongoing operation of these systems that doesn't exist in more traditional deterministic systems. Those are tough challenges for organizations to get their arms around.

Losio: Do you share the same feeling about hype?

Kurian: Yes, there is definitely hype. We have been living through this. For this community of senior engineers, architects, and practitioners, I will call upon all of you to help distinguish the hype from reality, because there are a lot of influencers, even on LinkedIn. I never believed I'd see LinkedIn influencers, but I see them now. It's really hard unless you apply your previously garnered skills for evaluating technology to find out which use cases it is suitable for. Just do the same thing here. Don't let product teams create roadmaps based on what they see on social media. Help them understand what is possible and not possible. That's how you collaborate cross-functionally, help each other, and build meaningful experiences for your customers.

Traditional Automation/AI vs. Agentic AI in Prod

Losio: We started the discussion with agentic AI, but, personally, one of the first questions I have is: what characteristics distinguish agentic AI from more traditional AI, or from traditional automation in general, in a production environment?

Joseph: Traditional automation is more like execution engines: the RPAs, the workflow engines. Traditional AI has been more of a predictive mechanism, more like a probabilistic engine, whether it's sentiment analysis or predictive operations. There was a gap between prediction and actual execution. In my belief, agentic AI closes that gap: now you have a new computing construct which, if built right, allows you to predict the next step that needs to be executed and also execute it completely. A concrete example: we started our agentic program when I was heading the AI program at DT. In 2023, we put the first agent in front of a customer, built with RAG bots at that point in time. There were some really embarrassing moments in 2023, including the agent helping customers switch to a different provider, even recommending one. We had such experiences. What we had to move beyond was traditional automation, for example the dialog-tree systems used in chatbots back then; those are a good example of traditional automation execution engines. They needed to be replaced, because a customer might say something like, I want to change my contract, and the conversation can go in any number of directions, which you cannot script in traditional automation. That is why we had to build what we built, which had around a 65% deflection rate by the time I left Deutsche Telekom.

Losio: What’s your definition of agentic AI? How do you see it?

Koch: I think it's more like giving some system the ability to act on my behalf. For me, that means the whole thing around agentic AI is hype, as Tyler also said. We have done similar things before; there are technologies that allow the same thing to happen. Now we're giving a bit more brainpower to the possibility of a system acting on our behalf. I think that's the main difference between agentic AI and the traditional things we have been doing before. I think we're in the hype, and we're somehow not yet defining correctly what an agent is. For me, an agent would be a brain that thinks for me and grows with my experiences. I'm not seeing a lot of enterprises or tool companies building that for us today. That's where I'm not really clear on what an agent is in this whole pattern we're talking about.

Joseph: We started to define agents, at least within what we are building, as: anything which is not a loop is not an agent, for our own mental clarity. What is a loop from a programming-contract point of view? Lovable, Cursor: they're all looping. You give an intent, and it's not one-shotting an answer. It's like saying, I want to build a CRM system or something like that. The system runs a loop which takes that intent, breaks it down, does the construction, then does an introspection of whether the intent was actually followed, much like the OODA loop, and then takes an action. Contrast that with a simple question like, how many r's are there in strawberry, or, write me a poem. That is one-shotting, not an agent. At least for our own clarity, we started to distinguish agents this way: an agent has the inherent characteristic, at least right now when we model it, that the system can follow a loop. This loop can be a guardrail: when an answer is produced, you are able to introspect that answer to ensure it is correct. Or it can include learning loops, where you inject that learning into the next iteration, which is the key. That's also one of the reasons why the compute itself should become agentic, which is the next abstraction layer we need to come up with: the agentic compute.
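
As a concrete illustration of that plan-act-introspect cycle, here is a minimal sketch of an agentic loop. It is only a sketch under assumed names: `call_llm` is a hypothetical stand-in for any model client, and the prompts are illustrative, not the ADL or any specific framework.

```python
# A minimal sketch of the plan-act-introspect loop described above. The
# `call_llm` function is a hypothetical stand-in for any model client, and
# the prompts are illustrative; this is not any specific framework's API.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real model client here."""
    raise NotImplementedError

def agentic_loop(intent: str, max_iterations: int = 5) -> str:
    result = ""
    for _ in range(max_iterations):
        # Plan: break the intent down, given what has been done so far.
        plan = call_llm(
            f"Intent: {intent}\nProgress so far: {result}\n"
            "Propose the next step, or reply DONE if the intent is satisfied."
        )
        if plan.strip() == "DONE":
            break
        # Act: execute the proposed step (tool call, code edit, API call, ...).
        result = call_llm(f"Execute this step and report the outcome: {plan}")
        # Introspect (guardrail): check the outcome against the original intent.
        critique = call_llm(
            f"Intent: {intent}\nOutcome: {result}\n"
            "Does the outcome follow the intent? Reply OK, or explain the gap."
        )
        if critique.strip() != "OK":
            # Learning loop: feed the critique into the next iteration.
            result += f"\n[needs revision: {critique}]"
    return result
```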

Koch: Does this mean that, for you, an agent is partly starting to think? How deep is this going to go? I think you're alluding to that with your definition of an agent using a loop, compute, and so on. What we're talking about is teaching agents to work together and collaborate, and with that, they become a brain. My main challenge with all of that hype is that it's still the same LLMs behind the scenes. That's where I'm eager to hear what you guys are building for us.

Joseph: I can give you a concrete example of what we're actually building. We are deploying a system used in large-scale industrial decision-making around downtime. There are these large machines; they produce sensor data and telemetry data. Then there are the tickets raised over the last year, and there is also account information and SOPs. Now you have an information system. If you give it to a human, what does the human do? "Human, can you come up with a strategy which reduces the machine downtime by analyzing all of this?" From that point to the outcome that is produced, the human is doing something; let's call it thinking, action, whatever it is. The systems we are building work along similar lines. In this industrial automation use case, you have a system with a deep research engine, to which you say: analyzing all this information as an operations manager, what should I do to reduce the machine downtime in the South German region for the next month? It looks at spare parts and the supply chain and makes some predictions. Whether that answer is 100% accurate or not is the business model we are building as we progress. That's the way we started to think about these agentic loops.

Jewell: We define an agentic system, or at least a system where you want to make use of LLMs, as having three properties. One, you should be able to define some goal. If you can state the goal and articulate it in a structured, specified way, it is something that can be given to the system to work towards. The second is that it has to have guardrails: guardrails and parameters under which it can operate towards that goal. The third is that it needs to adapt. Adaptation here could be a looping mechanism where it tries multiple iterations, but it could also be something integrated with a reinforcement learning feedback loop from the model or a data pipeline. Or it could even be taking a continuous stream of real-time data, IoT metrics, or sensor data, and using that to adjust the goals or the nature of the system. If you can say, I have a problem, and you can define it as: I have a goal, I have a set of guardrails, and this is how I want it to adapt over time, then it is an agentic system and it is well suited for AI to potentially solve.
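
Those three properties can be captured as a small data structure. The sketch below is one way to express that framing; `AgentSpec` and its fields are illustrative assumptions, not Akka's (or anyone's) actual API.

```python
# A sketch of the goal / guardrails / adaptation triad as a data structure.
# Names and fields are illustrative assumptions, not a real platform's API.

from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    goal: str                                             # a structured, articulable objective
    guardrails: list[str] = field(default_factory=list)  # parameters it must operate within
    adaptation: str = "iterative-loop"                    # looping, RL feedback, streaming signals, ...

spec = AgentSpec(
    goal="Reduce machine downtime in the South German region next month",
    guardrails=[
        "Never order spare parts above the approved budget",
        "Escalate any safety-critical action to a human",
    ],
    adaptation="re-plan weekly from incoming telemetry",
)
```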

The Ideal Problem Fit for Agentic AI

Losio: That makes me think of something. I was attending a conference in another city and coming back by train, and another attendee, when I mentioned that I was moderating this roundtable, asked me an interesting question about agentic AI: how do you evaluate whether a problem needs agentic AI in the first place? From an architect's point of view, I have a problem. We mentioned before that many projects don't really fail; people are just playing a bit with a demo, trying something new, and the projects were never really meant for production. So how do I choose a project, or a problem, that I want to solve with agentic AI?

Kurian: I can take some examples from Intuit. It's not as grand as what Arun previously described in terms of agenticness, but it has an agentic loop in it. It's everything involving unstructured data: text, speech, audio, documents, images. These are hard to process with hand-written code. Previously, we used computer vision models, for example. Now, as Arun mentioned, agents are where the AI systems and the software systems come together as one unit to produce the outcome. Tyler talked about goals and guardrails, so all of these finally come together in a coherent system where you can actually solve problems. Going back to the example: at Intuit, we collect a lot of information. We make our users do a lot of form filling. Maybe we don't have to do that anymore. The users can interact with the system and give the input in whatever form they want: maybe they upload documents, maybe they describe things in text, or they just talk, and we capture the information. The info gathering could itself be an agentic system. We no longer have to create forms that produce structured data so that we can build structured APIs storing into structured schemas. We have always made things convenient for the underlying software system and worked upwards. Maybe now we can rethink the user experiences to be more multi-modal native, because LLMs are good at processing unstructured data. That's one example where we can leverage LLMs to solve, in a completely different way, problems that were previously much harder.
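
A minimal sketch of that form-free info gathering, under assumed names: `call_llm` is a hypothetical model call, and the schema and follow-up logic are invented for illustration, not Intuit's implementation.

```python
# A sketch of form-free info gathering: an LLM extracts structured fields
# from whatever the user provides, and the loop asks only for what is still
# missing. `call_llm` and the schema are hypothetical illustrations.

import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in a real model client

SCHEMA = {"business_name": "string", "annual_revenue": "number", "tax_year": "integer"}

def gather_info(user_input: str) -> dict:
    prompt = (
        "Extract the following fields as JSON, using null for anything "
        f"the user did not provide: {json.dumps(SCHEMA)}\n\nUser said:\n{user_input}"
    )
    extracted = json.loads(call_llm(prompt))
    # The agentic part: loop back for the missing fields only,
    # instead of re-presenting an entire form.
    missing = [k for k, v in extracted.items() if v is None]
    if missing:
        extracted["follow_up_question"] = f"Could you tell me your {', '.join(missing)}?"
    return extracted
```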

Losio: Do you have any other example from your experience?

Jewell: Yes, where does it stick and where does it not? The beautiful thing about LLMs is that if you give them parameters in a structured enough way, you can get a structured enough response back. In many ways, it's a new form of transistor, but a multi-dimensional one. It allows us to rebuild the units of innovation in a different way. Where do you apply that as an architect? In situations where you have dynamic, undefined rules. If you have a static rules list, it's probably going to be easier to just program it; it's when the rules are undefined or constantly changing on you. Persona-based thinking, where you can specify the role and give parameters on how that role is supposed to leverage its thought processes. Structured schemas: anytime you have a structured document you can match against, it's also great. Then, planning and goal targeting. These are all low-hanging fruit with the accounts we work with, and that's what they're using it for.

Joseph: There was a recent meme comparing screen sizes as you move up the corporate ladder. The engineer sits with four or five different screens, and the CEO sits with an iPad; in between, the screen sizes shrink. I'm referring to a mental model we started to see as well: large-scale decisioning. Why is a corporate structure like that? As you move up the ladder, the idea is that these people are supposed to make better decisions based on the data, and to do the orchestration. At the lowest level (low not in a hierarchical sense, but in an absolute sense), you need to do certain things: you need to handle a customer in a shop and please that customer to make the sale. At the top, the questions are: what should be solved, what are the competitors doing? That is an orchestration loop which operates at the top. Where agentic AI truly shines: if you try to retrofit it into today's Camunda workflows or an n8n, you might be seriously undercutting what it can actually do. Where it should shine is producing these large-scale orchestration loops on top, while underneath, in the chain and the leaves, you can have very deterministic processes.

Losio: Do you have any feedback on that?

Koch: I find that mental model pretty interesting, Arun. The question is, how much will we be able to delegate to those agents in the future? How much prompt engineering do I need to do to make that happen, or will I get agents to do the prompt engineering for me? This is where I see a lot of open questions. Look at how I architect as a cloud architect today; that's maybe also a question for you, how you do that. You need a lot of experience built into what you're doing when you decide, say, between SQS and Kafka and something else. There is a decision process happening in your head that you've built up through experience. We can say the LLMs have all of that experience, and with the LLMs having that experience, the agents get it as well. But I don't think that's the experience I'm talking about. There is a feel for when you take which decision, when you architect for what. That's where, at the current state of agents, it's still a challenge for me to make them really useful in taking architecture decisions, because essentially I need to train them. We talked about how much memory the agents have. Going forward, I think I need to train my agents to have knowledge similar to the experience gathered over a career; that will then shape how the agent interacts with me. At least that's my view on it right now.

Joseph: You're spot on. Essentially, as engineers, the first principles are not changing. What is happening is that there is a new computing layer; that's the way I have started to think about it. For example, the idea that you take a large problem and divide it into smaller pieces still holds. That is the whole idea of programming and systems engineering. There should be a reliable way for distributed systems to do message passing, for example the actor model, and of course Akka is right here. Those things are not changing; the CAP theorem, nothing is changing. The best example is the CAP theorem: you cannot have all three at 100%. It's all a matter of constraints. We have all these systems, and we know how to build them. If you try to retrofit this new wave of computing into them somehow, that's a rather mundane view. If instead you put it on top, interesting orchestrations become fully possible. Even with our own operations right now, we are constantly thinking: how do you make it agentic? Not because of where agentic systems are today, but where they might be heading. I'm a huge AWS fan, we run on AWS as well, but the AWS console probably requires a PhD to go in there and figure everything out, with all the SDKs and so on. Essentially, we are constantly thinking about how to convert an intent into all the deterministic orchestrations, something that could say: if this happened, what could be the reason, and can you trigger this workflow? The workflow itself is deterministic, but the decision, the human in the loop, and the orchestration happen with the shortest possible feedback time. At least, we have started to think about where that could actually go.

Kurian: Johannes, maybe you're expecting magic. There is no magic here. There's a lot of hard work you need to do, just done differently; that's the only difference. Previously you might have had to code everything; now you have to collect a lot of high-quality datasets, establish well-defined goals, and maybe define the agent hierarchy if you have multiple agents, to achieve the kind of systems you want. Again, like I said, there's no magic. It is still hard work, even to prompt an agentic IDE to generate the kind of code you want. If you're not very specific and detailed, it will just hallucinate and make things up, which will undermine the whole process. There is still a lot of effort you need to put in, just a different kind of effort. That is what I was trying to say.

Jewell: The real essence here is that instead of being a software architect or even a systems architect, you have to transition into being a learning architect. What you're really doing is taking these elements apart and saying: if I want to achieve this goal, I have to decompose it into the units of thinking I need, and the right specification and the right data needed to achieve it. That's a different type of thinking. Software engineers and system architects were trained to design systems against SLAs, latency metrics, maybe throughput metrics. Now you're working against learning, accuracy, and safety objectives. It's a different type of thinking.

The Level of Trust with Agentic Systems

Losio: When Johannes asked how I feel about it, that is actually one of the challenges for me, coming from a system architect and software architect background. I've always had a more traditional mindset: I run my monitoring, I run my cron jobs, I run my APIs, and this is deterministic. Do I trust something that is not, even if it gives me better results? It takes time to adapt. It's a bit like the concept of a self-driving car: it's not enough that it has fewer accidents than a human-driven car; it's almost unacceptable for it to have an accident at all. As a software architect, I find it almost unacceptable not to have full control of the system. Probably I never truly had it, even mentally, but I still find it really hard to accept. I think that's the transition that is hard to make, but that's the mindset.

Jewell: I'm a pilot, and within the last 5 years, a number of new modern planes have been equipped with autoland capabilities. They've always had autopilot, but I wouldn't really call autopilot intelligent; it's pretty rudimentary in what it does. Autoland planes are here. All you have to do is push a button. It will identify the nearest airport, pick the right approach, fly the approach, and set the plane all the way down. The idea of doing that and letting go of the controls: I just can't imagine myself doing it unless it was an absolute emergency or a passenger needed to push that button. I just can't do it. Landing is one of the most dangerous phases of flying a plane; that's where most incidents and accidents occur.

Koch: In that case, Tyler, there are a lot of guardrails if you activate automated features like that; there are going to be a lot of protections. Is that what you're saying, that we're going to need to put enough guardrails and protections around any agents that we put into our software development lifecycle?

Jewell: I think all agentic systems are going to need guardrails. What you're touching upon is: how do you trust an agentic system? This is something we spend a lot of time thinking about. It's one of the reasons why we think we're differentiated versus a lot of the other agentic frameworks that are open source. Most agentic frameworks give you a lot of amazing tools to write the agents, or maybe even to test the agents. That doesn't mean you can trust the agents. We've been thinking about what it takes to reduce that time to trust: how do you get to a point where you can inherently trust a system as quickly as possible? When you can do that, you can put it in production. We've come up with what we think are six pillars on the path to trusting a system. The first is that you don't end with governance; I know as software architects we don't want to think about this, but you have to start with governance. You need to define your metrics. You have to understand what your accuracy and safety criteria are going to be. Then you have to define your policies and your controls, up front. If you can't do that, then you don't really understand what you want out of the system. Next, you need to build upon a stable, we like to think impenetrable, runtime and network layer, because these systems are going to fail quite frequently. It has to be a runtime that knows how to isolate itself. It has to be a distributed system that knows how to guarantee reliability and resilience, because you're going to have multiple agents sharing state on a distributed, coordinated backbone, and lots of things can go wrong. Then all interactions between users, tools, and agents must be verifiable. Every agent needs an identity. Every user has to have an identity. All the interactions have to run on a zero-trust, least-privilege authorization mechanism. Then every interaction has to be traced, monitored, and auditable. When you do all of that, you can set the system up to continuously adapt itself. That sounds like a huge burden, but that's what it takes. If you can do all those things, you can inherently trust that the system will behave within the parameters you've outlined.
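
To make the identity, least-privilege, and audit pillars concrete, here is a minimal sketch under assumed names. The principals, grants, and actions are invented for illustration; this is not Akka's or any platform's actual API.

```python
# A sketch of the identity / least-privilege / audit pillars: every principal
# (user or agent) has an identity, every action is checked against explicit
# grants, and every interaction is appended to an audit log. All names here
# are illustrative assumptions.

import datetime

AUDIT_LOG: list[dict] = []

GRANTS = {  # least privilege: every action is denied unless explicitly granted
    "billing-agent": {"read_invoice", "open_dispute"},
    "support-agent": {"read_invoice"},
}

def authorize(principal: str, action: str) -> bool:
    allowed = action in GRANTS.get(principal, set())
    AUDIT_LOG.append({  # every interaction is traced and auditable
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "principal": principal,  # the verifiable identity of the agent or user
        "action": action,
        "allowed": allowed,
    })
    return allowed

# Usage: the support agent may read invoices but may not open disputes.
assert authorize("support-agent", "read_invoice")
assert not authorize("support-agent", "open_dispute")
```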

Monitoring and Debugging (Autonomous Decisioning with AI)

Losio: Really, how do you handle monitoring, debugging, and system maintenance? Maybe I'm naive, but when I start to think that AI agents make autonomous decisions, my first question is: how do I monitor and debug, especially when things go wrong?

Kurian: I want to say again that I couldn't agree more with what Tyler just mentioned. That's why we took a platform approach to solving agentic use cases at Intuit. We first built a platform and said: the guardrails, responsible AI, security, privacy, compliance, auditing, observability, monitoring, everything is a platform concern. We will take care of it; you go build and solve your customers' use cases. We manage this centrally. Again, I cannot emphasize governance enough. You can imagine the kind of business Intuit is in; we operate under various regulations, financial services. All of these need to be baked in. We cannot let individual use cases build this on their own: first, for velocity; second, of course, for auditability. The other thing I want to emphasize, in addition to monitoring the system as a software system, where you have the pillars of observability, tracing, logging, and all of that, is evaluation. I think the biggest leap from being a software engineer to becoming an AI engineer is doing structured and systematic evaluation. For that, you need to work with data. For a lot of people jumping on the AI bandwagon today, this is their first taste of AI, and they completely forget the aspect of data. Without data, there is no AI. LLMs made it super easy: you don't have to create feature sets to train models anymore, so this is the first time many of them are working with data. Previously we heard, "it works on my machine." Now I hear, "it works for my data, it works for my questions." How do you systematically curate and collect good-quality data for your prompts, for RAG, for evaluation, for fine-tuning, for monitoring these systems once they are rolled out to production? How do you build a good-quality regression evaluation suite? It's not enough to do offline evaluation before you launch. How do you consistently collect traces, evaluate in production, bring that learning back, and fix the problems you identified in production? This is a continuous learning system through which data has to flow. At the core of it all, there must be a continuous data pipeline that powers the end-to-end system. That's something traditional software engineers and architects may overlook, and it's super critical for monitoring, debugging, and maintaining these systems, and for keeping them at the same quality as when you first launched, because things can go wrong, and things you previously didn't think about happen. You need to have a complete grip on what happens in production, and that only happens if you have this continuous flow of data.
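
A minimal sketch of such a regression-evaluation suite follows. `run_agent` and the string-match grading are hypothetical stand-ins; real suites typically use richer graders and grow their datasets from production traces, as described above.

```python
# A sketch of a regression-evaluation loop: a curated dataset of prompts and
# expectations, run on every change, with failures fed back into the dataset.
# `run_agent` and the grading heuristic are illustrative stand-ins.

def run_agent(prompt: str) -> str:
    raise NotImplementedError  # the system under test

EVAL_SET = [  # curated, high-quality examples; grow this from production traces
    {"prompt": "How do I categorize a fuel expense?", "must_contain": "expense"},
    {"prompt": "Cancel my subscription", "must_not_contain": "upgrade"},
]

def evaluate() -> float:
    passed = 0
    for case in EVAL_SET:
        answer = run_agent(case["prompt"])
        ok = (case.get("must_contain", "") in answer
              and case.get("must_not_contain", "\x00") not in answer)
        if ok:
            passed += 1
        else:
            print(f"REGRESSION: {case['prompt']!r} -> {answer!r}")
    return passed / len(EVAL_SET)

# Gate releases on the score, then keep re-running it against production traces.
```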

Handling, Debugging, and Monitoring End-User Interactions

Losio: Arun, you mentioned earlier what you built at Deutsche Telekom on the customer-facing side. Iterating on that, I was wondering: how do you monitor those interactions with the end user? How do you debug them? How do you handle them? Because I see many things that can go wrong there for the end user.

Joseph: Interestingly, I remember that at the InfoQ Munich summit last year, we gave a presentation at a point when most people were not doing agents yet. We talked about our five-country journey. There are two things I'd like to bring up. Point number one: we talk about observability, guardrails, and a hundred other things; essentially, in the systems we are building right now there is ever more data, and no one is able to figure out what to do next. When we started at Deutsche Telekom, we started, of course, with a new framework, which we built in Kotlin, because most of our engineers were on the JVM stack. The existing engineers, who knew their APIs, could construct better agents that way than if you started on some other shiny new stack, where you get Conway's Law in action. The world doesn't need more AI agent frameworks, is what we soon realized. I'll give you a concrete example. There was a billing agent developed back then on our agentic framework, which was called ARC, the Agents Reactor. At some point, 75% of the prompt was what not to do, and only 25% was what it should do. This was not scalable at all. It's not the agentic framework per se; the question is, how do you program an LLM? Then we came up with something we referred to back then as the agent definition language (ADL). For example, what is the best way to break the prompt itself into small units, call them programs? Then we started to apply everything we had learned in programming, for example tree shaking. Just as modules that are not required are shaken out during compilation, when a request comes in, you compose into the large program construct only the necessary small unit programs, based on the intent. We called these ADL constructs, and they started to reduce the variability. Essentially, we realized it's not only about the agentic framework; the way you approach guardrails is just as important.
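
A minimal sketch of that prompt "tree shaking" idea: keep the prompt as small named modules and compose only the ones the classified intent needs. The module names and intent classifier below are illustrative assumptions, not the actual ADL.

```python
# A sketch of prompt tree shaking: compose only the prompt modules the
# classified intent requires, instead of shipping one giant prompt full of
# do-not clauses. Module names and the classifier are illustrative.

PROMPT_MODULES = {
    "billing": "You can look up invoices and explain charges.",
    "dispute": "Use the dispute-resolution tool only after confirming the invoice.",
    "contract": "Never cancel a contract without the customer's explicit confirmation.",
}

INTENT_TO_MODULES = {
    "billing_question": ["billing"],
    "billing_dispute": ["billing", "dispute"],
    "cancel_contract": ["contract"],
}

def classify_intent(user_message: str) -> str:
    raise NotImplementedError  # e.g. a small classifier or an LLM call

def compose_prompt(user_message: str) -> str:
    intent = classify_intent(user_message)
    # Only the modules this intent needs get "linked in"; everything else is
    # tree-shaken away, shrinking both the prompt and its failure surface.
    parts = [PROMPT_MODULES[name] for name in INTENT_TO_MODULES[intent]]
    return "\n".join(parts) + f"\n\nCustomer: {user_message}"
```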

The last point I'd like to discuss: around guardrails and evaluation, the amount of tooling is exploding, and people don't know what to do with it in the first place. We are now building agentic decisioning systems, and there are two approaches we take. One, we build evaluation pipelines into the business use case itself, but also into the end-user experience. If you get an answer saying, here are ways you can reduce the downtime, including graphs, numbers, and metrics, it tells you at the same time: here is how I arrived at that answer, here is the computation I used, I could be wrong, take a look at it. You now have two parties involved; it's not only the engineers. There are better constructs you can build that also let the user take the highly critical actions, which helps in building reliable systems.

Scalability Challenges with Agentic Systems

Losio: You mentioned earlier an obvious mistake, or at least an obvious scalability issue: the agent's prompt was mostly about what not to do instead of what to do. I think you said three-quarters of it was what not to do. Can you explain a bit better, can you clarify why that cannot scale?

Joseph: This is what we call the Jenga tower problem. What was building agents, essentially? The programming part. Of course, software engineers jumped on the bandwagon and started to write the best programming constructs they knew of: functional programming, pure functions, side effects, whatnot. But then you have to actually write your business process. How do you write it? Essentially, if you are automating a rigid process, you don't need an agent in the first place. If you have a non-deterministic process, which can be executed in a number of combinatorial ways, then for the first demo you write: "Agent, when a customer request comes in, if it's about billing, call the billing tool. If the customer complains that billing has increased, use the billing dispute resolution tool." It works on the local machine, as Merrin mentioned. Then, when you put it into production, you start to see the first signs of trouble, because the customer asks: I would like to cancel the contract, but I would also like to check what other options are available. Now it starts cancelling the contract. So the developer, or somebody, goes in and adds: if the customer asks about upselling or something, don't do it. Suddenly you have a huge pile of do-not-do rules.

Jewell: With this sort of problem, it's actually been shown that if you say, I want you to do this, but here are 13 things you're not supposed to do, these are the guardrails, then the more of those things you add, the more likely the LLM is to forget one of them. Just because you wrote them down doesn't mean it's going to honor them. This gets into how regulations are written. As system architects, we tend to start with the happy path: I want you to do this thing, and then let me try to tell you all the things you should and shouldn't do while you're doing it. A regulation, by contrast, starts by saying: thou shalt never do this, except in these situations. Thou shalt not fly into Class B airspace, unless it's an emergency, unless it's this, unless it's that. Oftentimes, when you frame it as "you shall never do this, except under these three or four conditions," you tend to have a happier outcome overall. It's a way of thinking.
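
The contrast between the two phrasings can be shown side by side. The prompt text below is purely illustrative; the point is the inverted structure, a few hard prohibitions with enumerated exceptions instead of a happy path trailed by an ever-growing list of don'ts.

```python
# Two guardrail phrasings for the same billing agent, as a sketch.

HAPPY_PATH_STYLE = """\
Help the customer with their billing request.
Don't cancel contracts. Don't offer discounts. Don't discuss competitors.
Don't reveal internal pricing. Don't ...  (the list grows, items get forgotten)
"""

REGULATION_STYLE = """\
You shall never modify a customer's contract, EXCEPT when:
  1. the customer has explicitly confirmed the change, and
  2. the change has passed the policy-check tool.
You shall never discuss pricing, EXCEPT published list prices.
Within those rules, help the customer with their billing request.
"""
```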

Integrating Agentic AI into Existing Systems

Losio: I was thinking: we love agentic AI programs, but in most companies they don't exist abstractly in their own world. They have to interact with other existing systems. When I try to solve a problem for a user, I have to interact with an existing API, an external service, whatever. How should architects think about integrating agentic AI into existing systems, if we really think about production: interfaces, protocols, abstractions? Do you have any suggestions, anything that comes to mind?

Kurian: I can say a lot here, because Intuit is a 40-plus-year-old company. We have large legacy systems, none of which were designed for agentic times. I'll start with APIs. Agents need tools, and underneath the tools are APIs. The APIs that agents, or LLMs, can work with need to be simple and non-overlapping, which is not the case in any of our lines of business. We have complex domains: multiple date types, multiple tax types. There is a lot of complexity in the domain, and until we reimagine the APIs, either creating simpler versions or creating a semantic layer, we are not going to be able to plug everything in as-is. Even enterprise architects who are not working on agents should, for the benefit of a future where agents are more mainstream, start thinking in terms of: what can LLMs use from our current architecture? Again, basics: if you don't have good, high-quality documentation, if it is not good enough for agents, it is definitely not good enough for humans. If you write human-readable documentation, even for your OpenAPI spec, it might serve agents better. There's a lot we can do fundamentally in our enterprises before we jump on the AI bandwagon. Likewise for data: we need metadata that agents or LLMs can understand across the large data lake of tables and schemas, so they can bring in the right context to plan and reason. Our data architecture needs to evolve; our API definitions need to evolve. The tooling is getting better. Arun mentioned the explosion of tools; I'm hoping they'll eventually consolidate. At Intuit, at least, we are wrapping everything into one package so our developers don't have to scout for a synthetic data generation framework or an evaluation framework. We bring it all in one package so people can spend their time solving problems rather than scouting for individual tools. Maybe more of these wrapper packages will become mainstream. These are things enterprises can start thinking about.
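
A sketch of that documentation point: the same capability exposed as an LLM tool, once with a bare description and once with the human-readable detail an agent needs to call it correctly. The JSON-schema shape is the common function-calling format; the tool itself is hypothetical.

```python
# Two versions of the same tool definition, as a sketch. The endpoint and
# field names are invented for illustration.

BARE_TOOL = {
    "name": "get_tax_docs",
    "description": "Gets docs.",  # neither an agent nor a human can plan with this
    "parameters": {"type": "object", "properties": {"id": {"type": "string"}}},
}

DOCUMENTED_TOOL = {
    "name": "get_tax_documents",
    "description": (
        "Returns the customer's filed tax documents for one tax year. "
        "Use for questions about past filings, not current-year drafts."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal ID, e.g. 'C-10042'"},
            "tax_year": {"type": "integer", "description": "Four-digit year, e.g. 2023"},
        },
        "required": ["customer_id", "tax_year"],
    },
}
```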

Losio: Do you have any experience integrating agentic AI with existing systems? Any challenges or suggestions?

Jewell: You're going to need data pipelines if you're going to do any sort of model development or reinforcement learning on hosted models. There's a real pipelining challenge there; you've got to embrace some lightweight MLOps. Then you've got your systems of record out there that you need access to. You can front those with MCP tools. If you're going to front them as MCP tools, you need to make sure structured APIs front them, with all the authorization and access policies associated. Then there's agent-to-agent communication; I think the A2A protocol is a wonderful concept, but we haven't come across any enterprise that's willing to just allow agents to discover other agents and start communicating with them without verifiable identity. What ends up happening is that you mimic the A2A protocol through some workflow or orchestration with durable execution, so that it manages the back and forth in a more structured way. The last thing is memory, which is going to be a combination of your semantic knowledge and facts, graph data for your relationships, and your historical interactions. You're going to have to find a way to bring all of those in so that the agent can meld them and pass them into the context window of each LLM. Those are the data integration scenarios you're dealing with in these agentic systems.
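
A minimal sketch of that memory "melding" step, under assumed names: the three fetchers are hypothetical stand-ins for a vector store, a graph store, and an interaction log.

```python
# A sketch of assembling one context block from the three memory sources
# described above. The fetchers are illustrative stubs.

def fetch_facts(query: str) -> list[str]:
    raise NotImplementedError  # e.g. vector-store similarity search

def fetch_relationships(entity: str) -> list[str]:
    raise NotImplementedError  # e.g. graph-database neighborhood query

def fetch_history(session_id: str, limit: int = 5) -> list[str]:
    raise NotImplementedError  # e.g. the last N turns from an event log

def build_context(query: str, entity: str, session_id: str) -> str:
    sections = {
        "Relevant facts": fetch_facts(query),
        "Known relationships": fetch_relationships(entity),
        "Recent interactions": fetch_history(session_id),
    }
    # Meld the sources into one block sized for the model's context window.
    return "\n\n".join(
        f"## {title}\n" + "\n".join(items) for title, items in sections.items()
    )
```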

Architecting Agentic Systems for Continual Relevance

Losio: What strategies can be employed when architecting agentic systems to ensure their continued relevance in a constantly evolving landscape? One thing we have mentioned is that what we did two years ago is no longer that relevant.

Joseph: This is the best question so far, projecting into the future and what we need to do. Essentially, it's been just the blink of an eye since the explosion of tooling, techniques, and patterns began. If anybody is willing to bet: I clearly see companies and teams who picked framework A or B completely stuck now, because they chose something which could not evolve, and open-source projects getting abandoned completely. What we realized while building LMOS in the agentic space was that you need the right balance: don't couple yourself to any framework, and ask what the best exposition is of the API you expose as a platform API. The best example I can think of is the Responses API from OpenAI, which actually runs the agentic orchestration loop, unlike the Completions API. Entire agent frameworks are built on the Completions API, which results in all these brittle pieces and tooling. If you look at the Responses API, it's a very clean API. It lets you focus only on the business systems and business applications, and it abstracts observability; it abstracts how you plumb the agentic orchestration loops or connect MCP tools. That's one of the reasons we started our own core, AGC, the agentic compute, which we are going to use to build our systems: frameworks can change, tooling can change, but this layer can adapt completely. You're not putting everything into code, but rather into a substrate.

Getting Started with Agentic AI – Beginner-Friendly Frameworks/Projects

Losio: Can you suggest practical projects or frameworks that are beginner-friendly for agentic AI applications? If you had to start today to write an agentic AI application, where would you start? Do you have any advice?

Koch: I'm on the AWS side of the technology stack, so there are the Strands Agents, announced a while ago. I think AWS announced it too early, because it's Python-only, and I don't like Python, which is why I haven't used it. I'm waiting for TypeScript or similar to come out so I can use it, but that would be my starting point. Then, I think what makes it easy to start is how you build and host an agent; that is something you need to look at as a developer. Where can you get your agent hosted without setting up a huge Kubernetes cluster or assembling a lot of infrastructure around it? Any platform that makes that simple can be your friend in actually starting to try out what an agent is.

How did you start your journey? Did you start building agents locally, or did you start directly in an integration environment?

Jewell: First of all, please go check out akka.io. We're a complete platform for building agents, their orchestration, memory, and streaming. It is really simple to get started: the actual agent is only a few lines of code, and it's something you can deploy into a cluster and scale up right away. The interesting thing is that when I actually got started with this, I didn't take the simple route. It was almost a year ago now; we had a number of customers and partners come to us asking about the AI use case, and it wasn't something we had been taking all that seriously. They said, we've been using some of these open-source frameworks, we are having resilience issues, and we think Akka could help us solve some of these problems. That's when we started to take it more seriously. The way we went about learning this space was to go back to basics. We started by just trying to understand how the models work and writing programmatic applications against those models; they wouldn't even be called agents these days. We wanted to understand how they handle data inputs, how they chunk responses, what the communication pattern looks like. Then we used that and started building enrichment loops, which became the foundation of agents. We took a backwards-looking approach, backing into the problem. We did that work over a number of months before we sat down and asked: what should our agentic framework look like? What are the principles, and how does it relate to a distributed system? It was a bit of an unusual approach. We took the hard way; probably not the advice for everybody else.

Losio: Actually, I noticed a nice comment about thinking like a learning architect. I think it’s better to frame it as thinking like a teaching, coaching architect, not just absorbing, but guiding and shaping how the agent learns.

Managing Trust Delegation for Agents

What approaches have you seen emerge for managing trust delegation for agents? For example, how do I enable my agent to actually go and modify something or buy something? How does the agent prove that it has the authority to do so on my behalf?

Kurian: I think Tyler touched upon this earlier with verifiable identity. At Intuit, we think agents are their own systems. You cannot associate them with any user's existing roles and permissions; we need to create separate agent permissions. That's how we are tackling it. The agent then has its own audit trail of what it did, what it planned, and what it thought about doing. It is a separate entity altogether, and we track its activities and operations on their own.
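
A sketch of delegation along these lines: the agent is its own principal, and "acting on my behalf" is an explicit, scoped, expiring grant rather than the agent borrowing the user's credentials. The field names are illustrative assumptions, not Intuit's implementation.

```python
# A sketch of a scoped, expiring delegation grant for an agent principal.
# Field names and scopes are invented for illustration.

import datetime
from dataclasses import dataclass

@dataclass
class DelegationGrant:
    user_id: str        # who delegated the authority
    agent_id: str       # the agent's own identity, distinct from the user's
    scope: set[str]     # exactly what was delegated, e.g. {"purchase:books"}
    spend_limit: float  # a hard guardrail on the delegated authority
    expires_at: datetime.datetime

def may_purchase(grant: DelegationGrant, agent_id: str,
                 item_scope: str, price: float) -> bool:
    now = datetime.datetime.now(datetime.timezone.utc)
    return (grant.agent_id == agent_id       # presented by the right agent
            and item_scope in grant.scope    # within the delegated scope
            and price <= grant.spend_limit   # within the delegated limit
            and now < grant.expires_at)      # the grant has not expired
```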

Jewell: Taking that one step further: I'm going to get the details wrong, but it's either Mastercard or American Express or some major bank; they did a partnership with either OpenAI or Gemini. There is some protocol floating out there for delegated currency transactions: enabling an agent to perform an actual delegated transaction on behalf of the user. There's a whole series of identity and authentication mechanisms embedded in the protocol. The idea is that agents can then actually transact on your behalf without a human in the loop. Verifying that seems scary to me.

Traditional AI Models vs. Agentic AI Systems in Practice

Losio: How do agentic AI systems differ from traditional AI models in practice?

Joseph: It's simple. Traditional AI predicts, and traditional automation executes. Agentic AI completes the loop: it predicts, then executes, making the prediction come true. That's the way to frame it.

Keeping Agentic AI Relevant

Losio: What would need to be done differently now in the agentic world so it doesn’t fizzle out?

Jewell: I don't think there need to be agent standards per se. The standards here are going to be in traceability, evaluation, and guardrails, and perhaps some communication standards and better interfaces for MCP tooling. It's not the agent frameworks themselves; it's all the things around them. That's where the standards will come, and that's what will make these systems live on.

 

See more presentations with transcripts

 
