World of Software

[Video Podcast] Agentic Systems Without Chaos: Early Operating Models for Autonomous Agents

News Room
Published 25 March 2026 (last updated 7:45 AM)

Watch the video:

Transcript

Next Generation Architecture Playbook – Episode 3

Shweta Vohra: Welcome, welcome, welcome everyone. Welcome to this third episode of our podcast series, Next Generation Playbook for the AI Era: Insights and Patterns on What Is Relevant. If you have not yet seen them, please go back and watch episodes one and two. The first one we did with Grady Booch, and it was all about what's fundamentally changing: a principled view on what is genuinely new versus what only appears new but is the same old design and architecture thinking. Then we looked at what is evolutionary about architecture while coding moves at pace, whether vibe coding, spec coding or other forms of coding: how do we evolve our decisions, our designs and our architecture at an equal pace? Is it even possible? How do we go about that?

Agentic Systems Without Chaos [01:27]

And in today’s episode we have Agentic Systems Without Chaos. Well, it is already chaos out there, but how do we give our viewers and listeners early operating models for autonomous agents, and a bit of direction on what we as practitioners are seeing in the field and how we go about it? It’s mutual learning; let’s learn together. So, to discuss this topic today, I have with me Joe Stein. Hey, Joe, how are you doing?

Joseph Stein: Good. How’s it going?

Shweta Vohra: Good. Happy to have you here. I know we have been talking about the subject offline, but finally it’s the day when we get to hear from you at length, and maybe share a bit of our own experiences too. What I would like to do first is hear from you, in 60 seconds or maybe a little more: who are you, Joe, and what excites you about the agentic work happening across the industry?

Why This Space Feels Different [02:27]

Joseph Stein: Sure. For me, it’s really a combination of unlocking and evolving new capabilities and being able to tackle problems that we weren’t able to tackle before. As an architect, an engineer and someone who’s very creative, a lot of us love hard problems, and new hard problems that used to get a reaction of “oh, you could do that?” are now tangible. But there’s also been a shift in the operating model. As an architect, developer and security professional who has been doing this since ’97, a long-term, thoughtful engineer, there’s a lot that goes into everything that changes around that: how roles shift for different people, and where autonomy comes in and takes tedious tasks away. I think it’s a very exciting time.

Shweta Vohra: 100%. I like to say it’s a 10x opportunity and a 10x responsibility.

Joseph Stein: Exactly.

Shift or Just More Automation? [03:36]

Shweta Vohra: Because the complications and the complexity of this technology keep increasing. If you state the problem simply, it’s easy to solve; if we make the problem statement itself complex, it’s difficult. But it’s good to hear that you’re excited, and I know with your vast experience we will have a lot to talk about today. So, to start with the problem space: what do you think? Is agentic AI a new shift, an evolution, or just more enhancement of the same ML and automation work we have been carrying out for years?

Joseph Stein: It’s an entirely different domain space, with connections to everything from microservices to classic ML feeding into that new domain. Like everything else in IT, it’s a Venn diagram; we just have a new circle. And that space has only been evolving for maybe the last year, where folks like OWASP have done a great job getting guidance out there, and organizations like the AI Alliance have been coming together to work through some of the problems. But it’s something that encompasses a new set of challenges and opportunities ahead.

What Counts as Truly Agentic [04:57]

Shweta Vohra: Well said. How about we start with agentic use cases? Across the industry I see a lot of confusion: people are calling things agentic that are not, and there’s often confusion about which applications are good candidates to evolve into agentic use cases. So let’s start by defining them. Maybe you can give us one example that is clearly an agentic use case, and one that is not.

Joseph Stein: Sure. To me, an agentic use case is something like a production incident response system, where some anomaly comes into the system and an LLM goes and makes a decision about what should happen, based on tool calls it makes to introspect what’s happening with the system at that time, right?

So: a combination of one or many anomalies coming in; a logical decision being made, not by a rules engine but in some non-deterministic state; and then those tool calls, which are really just API calls at the end of the day, to gather information and flow it through a process that may or may not involve a human, and that then orchestrates to achieve a goal, such as constraining a server and moving it offline so it no longer talks on the network. You have a full use case end-to-end where something that might have taken someone 45 minutes now gets done in 30 seconds: you corner off a security threat and reduce the mean time for the security operations people to see and decide that this could be a security threat.
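The incident-response loop Joe describes can be sketched in a few lines of Python. Everything here is a hypothetical illustration, not a real API: the canned tool outputs, the host name, and the `decide_next_step` stub that stands in for the non-deterministic LLM decision.

```python
# Minimal sketch of an incident-response agent loop.
# The "LLM decision" is stubbed out; in a real system it would be a
# model call choosing the next tool. All names are hypothetical.

def get_process_list(host):          # tool: introspect the host
    return ["sshd", "cryptominer"]   # canned data for the sketch

def quarantine_host(host):           # tool: take the host off the network
    return f"{host} quarantined"

TOOLS = {"get_process_list": get_process_list,
         "quarantine_host": quarantine_host}

def decide_next_step(anomaly, evidence):
    """Stand-in for the non-deterministic LLM decision."""
    if not evidence:
        return ("get_process_list", anomaly["host"])
    if "cryptominer" in evidence[-1]:
        return ("quarantine_host", anomaly["host"])
    return (None, None)

def handle_anomaly(anomaly, require_human_approval=True):
    evidence = []                     # audit trail of every tool call
    for _ in range(5):                # hard bound on autonomy
        tool, arg = decide_next_step(anomaly, evidence)
        if tool is None:
            break
        if tool == "quarantine_host" and require_human_approval:
            evidence.append(f"PENDING APPROVAL: {tool}({arg})")
            break
        evidence.append(TOOLS[tool](arg))
    return evidence
```

Note the two guardrails in the sketch: a hard iteration bound on the loop, and a human-approval gate before the destructive action, both of which mirror the "may or may not involve a human" flow above.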

For non-agentic use cases, I see those as the chatbots of the world, the deterministic systems of the world, where you know for a fact, almost as if you had a compiler, that it’s going to do what it needs to do; that everything is going to be functional in nature from a programming perspective; that you’re not going to have side effects; that you’re going to have idempotency and all the things we rely on today as software engineers. Here those either go away and have to be built back up and considered, or they’re not used and cornered off. They’re very different systems, the way I look at it.

Shweta Vohra: So if I summarize, one certain difference is determinism versus non-determinism. On top of that, the use cases that have feedback learning and dynamism, to the level where the path forward is not coded in yet the system is able to decide it, take the decisions, call the tooling and so on, are what make it special and different from any other autonomous use cases or solutions. Have you heard about Moltbook OpenClaw?

Open Source and the New Wave of Experimentation [08:21]

Joseph Stein: Yes, I heard it leaked 1.5 million API keys for 37,000 users.

Shweta Vohra: That’s insane to me.

Joseph Stein: But I have been following everything around OpenClaw and Nanobot and PicoBot, which can run on a Raspberry Pi now. Honestly, this is why I love open source: people can come up with some idea and, yes, it may not have practical applications at my day job today, but it’s going to. I’ve already seen VCs who, instead of investing in a startup, just put their own money in and spin up their own OpenClaw-hosted sandbox company. It’s wild what’s going on out there. They just claw-code their way into their own startup, and it’s going to be interesting to see where and how fast that evolves.

The guy who wrote it now works for OpenAI, so it’s going to be continually interesting to see who… The same way we had the rush over who was going to own the app stores and who was going to own the device in mobile, it’s not going to play out the same here, because it’s the internet. The way you get to agents and work with them is a whole new experience, and I think it’s going to evolve this year.

Shweta Vohra: Yes, open source makes it interesting and faster, but at the same time crazy, because people think in projects, not systems, and that’s the interesting part. But I want to call out one use case I heard from Peter, the author of the famous OpenClaw. He was asked which use case gave him that aha moment that yes, OpenClaw is different and doing something different. He said he was away from his computer, and OpenClaw had to make a call on its own. The boundary he had set was: don’t cross outside my computer; do whatever you need to do, figure it out from the local system. So it first checked the local binaries, and when it could not find them, it worked out how to install those binaries within the local scope, cleared the paths, figured it out, and then started using that application while he was away. That is the kind of smartness where the system evolves on its own: if I cannot do X, let me try Y; if I cannot do Y, let me try Z.

Joseph Stein: It has a soul file, something that continuously evolves and is self-replicating.

Shweta Vohra: Absolutely.

Joseph Stein: It’s like the most novel little thing. But sometimes, simple is all you need, right? Sometimes the best engineering is a triangle to hold the bridge up.

What This Means for Architects and Designers [11:14]

Shweta Vohra: So we have said that agentic is getting more and more real every day. We have said there are agentic use cases and use cases which are not so agentic, and that people are getting confused. But what does it mean for designers and architects? What happens when a system plans, acts and executes? What do they need to be careful about?

Joseph Stein: I think the industry is going from pioneering to more stability now, and the problems you have to be thinking about are boundaries: what is my autonomous unit, and what boundaries does it have? Then, what are the boundaries between that autonomous unit and the other autonomous units it’s going to interact with to carry out its task? It’s not just about one agent; it’s about orchestrating many agents working in many different ways with many different APIs at massive scale within an enterprise. You could have tens of thousands of agents running and working at the same time, making different API calls at any one moment. You need verification and a backup of evidence of what actions those agents are taking, based on your requirements.

Just like anything else we do, we are facilitating the risk appetite of our organization and deciding which risk mitigations we actually implement as engineers, because every organization is different and they all have different risk appetites. But the threats here are different, and there’s a combination of setting that risk appetite and then working from it once you understand it. Because most people don’t even know what the domain is, you’ve got to tell them: here are your risks; what do you want to do from a business perspective? Don’t ask me, the engineer. So that’s one aspect of it.

And then there’s having an additional set of metrics and observability. For those of us who have thought about observability of our systems, we now need exactly the same thing for our AI. We need to see what’s happening with our AI, with our prompts, with our tool calls, and what orchestrations are occurring across those tool calls for the prompts coming in over time, so we can bring them into an SDLC. And on the SDLC: there’s an entirely new SDLC emerging right now. I can’t quite put my finger on it, it is moving so fast, but the way we are going to work with, interact with and manage code bases is going from co-pilot to command center. It’s a radical shift and I’m excited about it.
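The tool-call observability Joe asks for can be sketched as a wrapper that records every invocation as a structured trace event. The event fields, tool names and the toy `dns_lookup` tool are illustrative assumptions, not any real tracing standard:

```python
# Sketch: wrap every agent tool so each call is recorded as a structured
# trace event (tool name, arguments, result, status, timestamp).
import time

TRACE = []  # in a real system this would feed a tracing backend

def traced(tool_name, fn):
    """Wrap a tool function so every call is appended to TRACE."""
    def wrapper(*args, **kwargs):
        event = {"ts": time.time(), "tool": tool_name,
                 "args": args, "kwargs": kwargs}
        try:
            event["result"] = fn(*args, **kwargs)
            event["status"] = "ok"
        except Exception as exc:
            event["status"] = "error"
            event["error"] = repr(exc)
            raise
        finally:
            TRACE.append(event)      # record success and failure alike
        return event["result"]
    return wrapper

# Hypothetical tool, wrapped for observability:
lookup = traced("dns_lookup", lambda host: f"10.0.0.1 ({host})")
```

The same wrapper can sit in front of every tool an orchestrator exposes, giving the prompt-to-tool-call audit trail described above without changing the tools themselves.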

It’s going to be difficult, and there are going to be shifts in roles and responsibilities, but I can focus on problems and hopefully be more impactful in my work, and in things I may want to do on the side, like baseball stuff, because I’m a league baseball nerd. It’s a completely different shift. And once you change the SDLC, the CI/CD has to change too, because finally Kubernetes doesn’t have to be the answer for everything, because Kubernetes isn’t the answer for everything. It has been, because the tooling is there, the people are there and the hype is there, and so on. But if I could just code my way through, deploy on a couple of instances and click, I’m in. And if I could then reuse those plugins as skills and build all sorts of other agents set up around autoscaling and different ways of handling and building things, I’m hoping we will see a new layer emerge around how that is used. We’ll see how it goes.

So I think it’s a complete shift across the entire spectrum of what we do as architects, developers and engineers, in everything we touch on a day-to-day basis.

Shweta Vohra: We are assigning a lot of responsibility here, and it sounds quite aspirational to me. I agree that responsibility is increasing and the risks are increasing, and if this keeps moving at this pace we need entirely new ways to think about agentic systems. We need to talk about the explainability and observability you mentioned, but before that, I want to double-click on the risks, because that is what is immediately at hand; that is what people can control and watch out for when they’re designing these agentic systems. So tell me: what do you think are the newer risks? I’m not talking about traditional ML and gen AI risks; with agentic systems, what newer risks would you call out?

New Risks in Agentic Systems [16:12]

Joseph Stein: Newer risks are prompt injection and hijacking control of an agent. It’s interesting, because what brought me to understand this risk and this threat was some research that Bruce Schneier posted. He’s the author of Applied Cryptography and an industry leader in security, and it’s all around what they call the Morris II worm. Basically, if you have an email agent, you are susceptible to having an email hijack your orchestration tool layer, through the interactions going back and forth between your prompts and the tool calls, where your tool calls can be funneled and the client can take over, or simply carry out something like a denial-of-service attack. And that denial-of-service attack, unlike before, isn’t just going to cost you downtime; it’s now going to eat up tokens that cost money. So there are different blast radii around some of the same things we had before, showing up in new ways and with different experiences.
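One mitigation for the "DoS that burns money" risk above is a per-agent token budget checked before each model call. This is a minimal sketch; the class name, limits and accounting model are illustrative assumptions:

```python
# Sketch of a per-agent token budget: a denial-of-service against an
# agent now burns money, so cap spend before the model is called.

class TokenBudget:
    """Tracks cumulative token spend and refuses calls over the cap."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, prompt_tokens, completion_tokens):
        cost = prompt_tokens + completion_tokens
        if self.used + cost > self.max_tokens:
            # Halt the agent rather than silently keep spending.
            raise RuntimeError("token budget exceeded; halting agent")
        self.used += cost
        return self.max_tokens - self.used   # remaining budget
```

An orchestrator would call `charge()` with the token counts of each request/response pair and treat the exception as a hard stop, turning an unbounded cost attack into a bounded, observable failure.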

It even goes as far as supply chain security and how it applies here. There have been papers and research showing that you can train an LLM to hold certain pieces of information inside it so that particular prompts will generate backdoors in code: the code that comes back from the LLM will actually have malware generated into it. That’s nuts. I read these papers, I try it out on my machine, and I’m just like, wow, right? There are all sorts of new attacks coming in around that. And then you have things like toolchain escalation. To me, MCP is just remote stored procedures. They’re just stored procedures; they’re nothing but EJBs, whatever you want to call them. They’re the EJBs of the 2020s. But they still have a place and a purpose, and tools are important to understand; to me, it’s all about intent.

But if they’re just direct API calls where you’re hitting rate limits and not knowing what the different APIs are orchestrating, you risk having those tools called in the wrong way, because the LLMs are still not that smart given their context window, what’s coming in, and what they’ve seen before. So you do things like trying to figure out how to cache your orchestrations, and then you start thinking about anything that’s out of cache and how you handle the exceptions and the narratives around that. It’s an entirely new pattern you have to start thinking about when you’re architecting your systems.

And if you develop distributed systems at scale, you always think about caching, right? So it’s not that you stop thinking about caching; it’s that you now have to think about it at a different layer, where it interacts with a non-deterministic system and stores non-deterministic data. That’s a caching mess, right? So what do I do? Do I create some embedding around it, something where I can hold a set of floating points that represents it? It’s a hard problem to solve, and it comes back to the non-deterministic systems, right?
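The "cache your orchestrations via embeddings" idea Joe gestures at can be sketched as a semantic cache: store an embedding of each prompt, and reuse the prior tool plan when a new prompt is close enough. The character-count `toy_embed` stands in for a real embedding model, and the similarity threshold is an arbitrary illustration:

```python
# Sketch of a semantic orchestration cache. A real system would use a
# proper embedding model; toy_embed (letter counts) is a stand-in.
import math

def toy_embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class OrchestrationCache:
    def __init__(self, threshold=0.95):
        self.entries = []            # (embedding, cached tool plan)
        self.threshold = threshold

    def get(self, prompt):
        emb = toy_embed(prompt)
        for cached_emb, plan in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return plan          # hit: reuse the deterministic plan
        return None                  # miss: fall through to the LLM

    def put(self, prompt, plan):
        self.entries.append((toy_embed(prompt), plan))
```

The design choice is exactly the trade Joe names: a cache hit makes the agent deterministic and cheap for near-duplicate requests, while misses still flow through the non-deterministic path.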

Shweta Vohra: Yes. If I may summarize, you’re saying that for some problems and risks we need higher levels of abstraction. For example, where we had injection issues before, now it’s prompt injection plus more, and the layers keep increasing. And second, you’re saying we need a bit more standardization, at least on the controlling parts, whether that’s MCP servers or further standards still forming: drawing the decision-making and control lines so that we can still be, if I may call it that, predictable in our unpredictability.

Joseph Stein: Yes, yes.

Explainability Without Noise [20:39]

Shweta Vohra: And there are certain things which need newer approaches, newer research and more solutions to come. That’s interesting; you’ve called out very good areas for us to delve into. Let’s now look at the explainability part, which you touched upon. We have explainability, we have human-in-the-loop, and everybody is using these as jargon, but how much of it is real? Earlier we had a single prediction or a single chained decision; now multiple predictions and multiple chained decisions are happening, and every stage gives me explainability. If I need more and more observability just to observe more, it’s not helpful, it’s not insightful. It shouldn’t be the case that we get loads and loads of observability for everything and then start wondering how to extract insights from it. So what, in your view, is explainability? And from your work, do you have any early insight into how much is enough?

Joseph Stein: A lot of it comes down to the combination of use case and which action or actions, if any, need to be performed and when. Sometimes things are active processes where the human in the loop is part of the workflow; sometimes the human is a passive control, where something might be going on and the human needs to go take a look because the workflow has been stopped. Sure, the human is still in the workflow, but in a different swim lane, with a different set of criteria for what they need to see in order to adjust.

You don’t necessarily need to see, unless you’re an auditor, every single little touch of every single system, IP address and user access, the whole DSPM, right? You don’t need to see all of that. I think it’s going to be almost like a video game, where you’ll have 75 different things going on in a day with one or more assistants and agents. They’ll be generating reports, carrying out tasks, working through actions. One of them may have failed a boundary that you need to look at before something goes out, and oh my God, it’s for your boss, so you’ve now dropped everything. And then your to-do list is filling up automatically; the AI is now your boss. You’re getting a to-do list from your AI of things you either need to act on, go back to the AI about, or something else.

Maybe you need to go talk to Suzy and take this out of the loop completely, or maybe you just need to switch an input box or radio button and click next. It’s going to be so driven by use cases, and the platform aspect of that is going to be interesting, I think. It’s a whole new user experience, a new behavior. It’s like all the mobile apps we have on our phone and everything we do there, but much faster, with access to everything, making decisions for us.

Shweta Vohra: Absolutely: more democratization, plus more responsibility and risk. We may not have all the answers, but if people listening to this can start gearing up for the better side of explainability and for these operational problems, that will be really, really good. And I believe this is not decreasing the work; it is increasing the work at different layers, and the responsibilities are increasing in that sense. Do you think human work is reducing with all this?

More Power, More Work [24:47]

Joseph Stein: I actually find my job to be more demanding, not less. I made the mistake of turning something around in a day that would have been nearly impossible to do in weeks, and poof, it was there. And then the next day it was, all right, how about this now? And it’s like, wait, wait, I’ve got 750 other things to do; I just dropped everything for that one little prototype demo.

So the way I think about it is: with great power comes great responsibility, and those responsibilities are coming. Sure, there’s a reduction of work, absolutely, but the responsibilities are becoming so much bigger because the expectations are higher. Before, the expectation of what counts as production quality was largely determined by all sorts of politics, religion, organizations and everything else. But now, if you wanted to, you could have every single threat written down by AI, with every known vulnerability pulled in from MITRE and checked off, and have architecture diagrams, user guides, FAQs and a continuous runbook as an automated website that you build and have people log in to, and all of that is just a click of a button away.

So you’ve got to read that, you’ve got to understand it. You’ve got to make sure the AI isn’t doing something crazy. I’ve seen the AI just say, “Oh, sure, I’m just going to log the key in the logs”. Oops. You can’t ship things like that. So our responsibilities become very different: we are now stewards and reviewers, looking at the world through a lens that, from my perspective, I’ve always wanted to look through. I have a very high bar for things, and it’s hard work for me and other people, how we all negotiate our different weaknesses and strengths together as a team, to get to the bar that makes our software awesome and makes us good at what we do.

I think we all want to be better at that, and I think people are really coming together to be better. It just means we’re now going to have a whole lot of new things to categorize and isolate. We can come up with 100 things, but we can’t afford them from a business perspective. So all of a sudden you’ve got the ideation nightmare: you’ve got 1,500 things you could do, but what’s the thing to do? What should the business do? You can do anything now, so what do you do? You add a strategic business plan.

Platform Thinking for Agentic Systems [27:41]

Shweta Vohra: I’m hearing the same thing that goes through my head, so it’s good to know we match on that: the work is increasing and the responsibilities are increasing. Those who know me know that I’ve written a lot about platform engineering patterns, and I’m a big fan of them. Whatever we can give to platforms and do in a standardized way, while making it easier for the consumers, the providers and the whole ecosystem, we should. Whereas the pattern I’m seeing currently, because it’s still early days, is that agentic systems are mostly team-based implementations going in circles. What’s your view on early platforms, on taking a platform approach?

Joseph Stein: At my company we built it centrally, and I’m really glad we did, because we still have a couple of decentralized systems running from before our system went live, and we’re having to negotiate ISO 42001 migrations now.

Shweta Vohra: That sounds interesting.

Joseph Stein: Oh, it’s fun.

Shweta Vohra: Tell us more.

Joseph Stein: So, our platform is focused on identities and key access based on geographical regions, with open source models that we run across our GPUs in our private cloud data center. Essentially, you get not just v1 chat completions and v1 embeddings; we also built an entire RAG-as-a-service system around pgvector that does some amazing hybrid search our data scientists came up with.

We have eight steps in the data pipeline for our RAG system. It calls LLMs, does tokenization and hybrid searching, then more calls to LLMs, all sorts of good stuff, and it works across any document type. It comes back with citations, so when you chat with documents you can trace back to the source document, it’s all tied together in one place, and every business unit uses it. It’s tied into ServiceNow for our CMDB, right? Most everyone uses ServiceNow and a CMDB, so everything is tied into the CMDB, whether it’s our Geneva product or our Blue Prism product or our Advent product. We’ve got 350 products; we’d made 167 acquisitions when I started two years ago. We’ve grown through acquisition as well as organic growth.
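One common way to implement the hybrid search step Joe describes is reciprocal rank fusion (RRF), which merges a keyword ranking and a vector ranking into one list. This is a generic sketch, not his pipeline: the document IDs are invented, and `k=60` is simply the constant commonly used with RRF.

```python
# Sketch of hybrid search via reciprocal rank fusion (RRF): combine a
# keyword (BM25-style) ranking with a vector-similarity ranking.

def rrf(rankings, k=60):
    """Fuse several ranked lists; higher fused score = better."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank) for the docs it ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers:
keyword_hits = ["doc_tax_faq", "doc_fund_admin", "doc_onboarding"]
vector_hits  = ["doc_tax_faq", "doc_gdpr", "doc_fund_admin"]

fused = rrf([keyword_hits, vector_hits])
```

Documents ranked well by both retrievers float to the top, which is why fusion tends to beat either keyword or vector search alone on mixed document types.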

Shweta Vohra: That makes me interrupt you here and ask: when did you start?

Joseph Stein: Yes, I started-

Shweta Vohra: If 350 products are already integrated on your platform, are these level-four agentic systems?

Joseph Stein: Well, no, no. When I started two years ago, there were two groups doing AI, and we’re something like an 8-billion-dollar NASDAQ-listed public company with 20,000 people and 30,000 customers. I started in the private cloud group, and our private cloud runs in multiple geographies and data centers. We’re basically a fund administrator, excuse me, not a fund manager. We’re a fund administrator, and then we have tax products, accounting products, automation products, health products. Oh my god, we just have products everywhere that do everything. Learning products. I don’t even know; I’ve seen so much.

Shweta Vohra: Okay, so you’re saying that the integration of the products onto your AI platform is what is centralized?

Joseph Stein: Yes, yes. And what that’s done is that all the systems get built up from the v1 chat completions and the RAG, doing service discovery and having a place for your A2A agent cards to go, from the ground up, with everyone knowing we have a center of excellence around it. We have one Teams channel for it and one 24/7 support team that we run internally, and everything is focused on, and grows up from, that one central place. And then we have a work HQ system, the agentic overlay on top of that, which does agent building, so that if you’re not a coder you can still build agents, wire them together, and have them run, orchestrate, integrate and process across data sets, doing the different wiring and setting your prompts. It’s a really cool system.

Shweta Vohra: It’s in production?

Joseph Stein: Yes, it’s in production. It’s been in production for a while: billions of tokens, thousands of use cases, UK, U.S., AWS, our private cloud, all sorts of fun.

What an Early Operating Model Looks Like [32:30]

Shweta Vohra: That’s interesting. So let’s benefit from your experience. I hear you saying that you took the platform approach early on and kept building layers on top of it, and that’s why you now have agent studios and environments set up that people are actively using. What is the operating model others can take from this, for autonomous systems built at scale? What can we learn from you?

Joseph Stein: I think a lot of it is having to build the tooling for the organization: either you have something you can extend that will let you run and do this, or it’s something you have to build yourself, or something that comes from open source. I don’t think all of these systems are there yet. I’ve seen a couple of other systems and folks in the industry doing this; there’s an announcement from Goldman Sachs and Anthropic around compliance, so they have a system now, so it’s getting out there. For us, besides running our AI gateway and our work HQ system for our own internal products, we also offer them as products. AWS has components. There are a lot of different paths for where and how those are starting to come about, form and stabilize.

I think everything from user experience all the way down to DevSecOps has to be accounted for. You really have to think a complete 360 about every stakeholder and every user who is now going to be interacting with and using your system, and you may have to cobble together a whole bunch of different things to build it. You may get away with just using Envoy’s gateway and writing a few little services because you’re a shop with 25 people, or maybe you’re a large enterprise with 15,000 people doing .NET, Java and Go across 38 different countries, who knows? The fundamental principles are still the same for how you build that platform out, and I don’t think the platform engineering pieces are much different from what we’ve had before, except for this new domain that has to be introduced and the new things we have to account for. So it’s platform engineering plus, plus, plus, almost.

Shweta Vohra: Yes. In terms of an operating model, if I put together what you said, it’s about registration, lifecycle, observability, RACI and some of those aspects which, if we put them in place early on, will really help organizations do this the right way.

Joseph Stein: And when you go through your stakeholders and your systems, it’s not always things that you build. It’s a combination of functional and non-functional requirements, and as architects, you need to be the one responsible for saying, okay, we need this person to be able to make this decision, this is their responsibility, and that has to go in; we need to get operations people to help us build and do that. It may not be an engineering task, but it is still part of the overall architecture of what you’re trying to accomplish.

Shweta Vohra: Agreed. I’m hearing the same thing from all our guests: yes, architects should take more responsibility here and help build that understanding in engineers early on, in terms of systems thinking and broader thinking about where a system starts, where it ends, and which emergent behaviors they now need to be more careful about. I want to touch now on what organizations should do: should they wait for standards and platforms to mature, or experiment early and go to production? With your experience, where do you stand?

Should Organisations Wait or Start Slow? [36:39]

Joseph Stein: Start yesterday.

Shweta Vohra: You’re talking like a CXO now.

Joseph Stein: It doesn’t mean you have to –

Shweta Vohra: Talk like an architect.

Joseph Stein: It doesn’t mean you have to ship it to production, but if you don’t start understanding what these tools can do, you’ll never be able to bridge the gap in your mind to what is actually now possible in your business with these tools, and your competitors will, full stop. To me, it’s just that simple. It’s a market. Everyone’s got competition.

Shweta Vohra: Do you think they should wait for standards to emerge or support them?

Joseph Stein: I don’t think so. And I’ll tell you why. Let’s look at something like MCP and talk about MCP for a second. Let’s say MCP is SOAP and there’s going to be some new standard like REST that will emerge that everyone’s going to use. But SOAP was powerful when it came out. It allowed businesses to process financial transactions, detect fraud, and achieve all sorts of interoperability for healthcare with HL7, doing computer-to-computer exchange of data and files. It was a powerful solution back in 2002 or whenever it was. It was amazing.

And you know what? SOAP is still around. It still exists. HL7 is still SOAP; it hasn’t gone anywhere. And did something new emerge that everybody finally went wild over, that everyone uses now for data exchange? Sure, absolutely. But the folks who grasped the interactions and exchanges of data early on are the ones who could start to understand what was possible and apply it to those technologies. And sorry, I’m putting the CXO hat back on, but it’s true, because when you’re thinking about whether or not to start, you’ve got to try them out.

Just look at your attack surface; really, just start simple. What’s my attack surface? Is it things on my desktop? Great. Spin up a box, a VM in the cloud, and go crazy; spend a couple of days just trying things and thinking about what you can do, how it could benefit you, and how it could make things better. You don’t have to go from zero to hero, but to not use these tools, to me, is like saying you don’t want to use a computer when the internet is around, or you don’t want to use mobile phones, but this is so much faster than those technologies were. Oh, you don’t want to adopt the cloud? It’s the same conversation we’ve had over and over and over again, and now it’s so much more compact than it used to be, and it’s moving so much faster.

Shweta Vohra: So stay in that zone, and let’s make it real, not complicated. Let’s make it real: if we are wearing the CXO hat, we know that we have to keep the lights on. We have a business to run; we have to make new things work with the existing, right? So from your experience, what guidance would you give on merging and marrying the existing with the new while we play with the new?

Merging the New with the Existing [40:02]

Joseph Stein: So I think it’s a combination of trying out some new things from a feature perspective, and while the engineers are doing that, allowing them to try out some new tooling at the same time. That way you’re letting engineers explore their needs, their creativity, and what they need to be more productive, while also doing something critical for the business: building out a feature that maybe you can deliver in three weeks instead of three months. Pretty good, right? Or maybe even three days, and you get to a demo and start getting customer calls and get them excited about something, however your organization works.

But your cycle, whether it’s a sales cycle, a marketing cycle, or an engineering life cycle, those cycles are going to be, I think, radically different by the end of the year. Of all the different things in the consumer marketplace that are going to start to stabilize, once we trust them as consumers, we’ll trust them in the organization. More platforms are going to be built with more security and governance, and those platforms are going to be available in the enterprise, either as products, as open source, or as systems built internally.

Shweta Vohra: I think that creativity, and giving engineers more latitude to figure out the ways they want to solve the problem, is an interesting part to look at.

Joseph Stein: Sorry, I didn’t mean to interrupt, but what you just said, right there: that’s going to be really hard for businesses, letting go of the business owning the business requirement and putting it into the engineer’s hands, right?

Shweta Vohra: You are giving it to machines; forget about the engineers.

Joseph Stein: Well sure, however you want to look at it, however you want to say the same thing I said differently. But maybe the engineers have a bigger problem with that than the business does, but it’s going to be problematic. That’s going to be really something that every organization is going to have to deal with and they’re all going to deal with it differently based on their people and culture and everything.

Shweta Vohra: I see it more from the existing view, from the perspective that, when it comes to agentic, I always say that not everything is agentic. It is very context- and situation-dependent, and unless and until you reimagine the whole problem space, which is a new thing, a new system, you anyway have to bring in new things and a new model around them, but then see where the meeting points are and how you make things work together. So that’s the perspective, and I hear you: a completely newer space, newer opportunities, more people to do that, but then requirements are going, in a much broader sense, into the hands of man and machine.

Cost, Scale, and Sustainability [42:54]

With that said, let’s touch upon cost and sustainability, because I know a lot of companies took on the challenge of being green by 2030 or the like. With gen AI there’s definitely more cost and more sustainability pressure, yet we are not fully talking about it. What do you think is changing in terms of costing models and sustainability? Any early insights from your work, because everybody’s talking tokens?

Joseph Stein: Part of costing is also what you can afford for your requirements. Sometimes it’s not even a money factor; sometimes your data simply can’t go to another provider, such as OpenAI, where you’d pay per token. The way that I’ve looked at this, and we run our own GPUs and our own open source models, is from the perspective that we only have a fixed amount of GPU, we can only run a certain number of models, and I’ve got 20,000 people who want 35,000 different models because they saw them on Hacker News and Reddit.

So how do you serve the people? How do you give them the models they want? We try to tie it down and roll it into use cases, where certain sets of use cases get different models and different regions that ultimately they have to get pinned to, because they’re in production. They can’t have model drift; it can’t be this continuous “oh, there’s a new model and it’s so much smarter and so much better”. You know what? Maybe for your use case that changed the response of your prompt, and you didn’t want that, because you liked the email that was going out, and now all of a sudden your new email is so thoughtful and so smart that people are complaining. So the new model is not always the best model, and sometimes you have to sustain models just like you do operating systems and treat them as having an end of life.
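By way of illustration only (not taken from Joseph’s actual platform), a pinned-model registry with OS-style end-of-life dates might look like this minimal Python sketch; all use-case names, model identifiers, regions, and dates here are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ModelPin:
    """A model version pinned to a production use case, with an OS-style lifecycle."""
    model: str          # exact model identifier, never "latest", to avoid drift
    region: str         # region the use case is pinned to
    end_of_life: date   # when the pin must be re-evaluated and migrated

# Hypothetical use cases; each is pinned so a newer, "smarter" model
# cannot silently change prompt responses in production.
REGISTRY = {
    "invoice-processing": ModelPin("llama-3.1-8b", "us-east", date(2027, 1, 31)),
    "email-drafting":     ModelPin("qwen3-30b",    "eu-west", date(2026, 9, 30)),
}

def resolve(use_case: str) -> ModelPin:
    """Return the pinned model for a use case, refusing unregistered ones."""
    if use_case not in REGISTRY:
        raise KeyError(f"no pinned model for {use_case!r}; register it first")
    return REGISTRY[use_case]

def needs_migration(pin: ModelPin, today: date) -> bool:
    """True once a pin has passed its end-of-life date."""
    return today >= pin.end_of_life
```

The point of the sketch is the shape: callers never choose a model directly; they name a use case, and the registry enforces the pin and surfaces when a model, like an operating system, has reached end of life.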

I still have Llama 3.1 8B running. I still have use cases in production with Llama 3.1 8B. I think it’s running on one chip with a couple of other models, so no big deal, but it’s relative. We run Qwen3 30B-A3B for thinking. We run Kimi K2.5; we also have Kimi K2, and Qwen3-VL 235B, instruct and thinking, and we have a smaller Qwen vision model, which is much faster, because a lot of people who want vision don’t need it to be smart; they need it to at least do what it needs to do from a structure perspective. So we have a whole bunch of smaller but good enough, just good enough, models that maybe don’t have a PhD but have a master’s degree. We run those, and the master’s degrees are just faster than the PhDs, and we break it down that way.

And that all comes back to cost, because the time on the GPU spent waiting on the thinking model, the bigger Qwen model, is taking token time and cycles on the GPU from someone else coming into our platform, queuing into our system, waiting for their token to actually get processed on that GPU.

If it’s a foundation model or something you’re paying for at AWS, Anthropic, Azure, OpenAI, or Gemini at Google, whatever, those models you’re paying for may cost more than models you could run a different way, and you don’t have to expand the cost, because you don’t need to solve some novel theoretical physics equation. You’re just doing invoice processing, and that’s all you need.

So to me, when I think about cost, it has always come back down to what the total cost of ownership of our GPUs is, and how we create a multi-dimensional plane between our GPUs, doing things like oversubscription for requests coming into different regions, for the models they want to use and the tenants they belong to, with prioritization around tenancy and rate limiting. That way we can maximize the four chips running our Qwen3 model across dev, staging, UAT, production, this application, that application, this boundary, that boundary, and it all still funnels through and works around the same four chips, and we’re isolating it that way.
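The admission side of what Joseph describes, per-tenant rate limits plus priority ordering over a fixed GPU pool, can be sketched as a toy scheduler. The tenant names, limits, and priority scheme below are assumptions for illustration, not his actual system:

```python
import heapq
from collections import defaultdict

class GpuScheduler:
    """Toy admission scheduler for a fixed GPU pool: each tenant has a cap
    on queued requests (rate limiting), and queued work is served in
    priority order (e.g. prod before dev)."""

    def __init__(self, per_tenant_limit: int):
        self.per_tenant_limit = per_tenant_limit   # max queued requests per tenant
        self.queued = defaultdict(int)             # tenant -> currently queued count
        self.heap = []                             # (priority, seq, tenant, prompt)
        self.seq = 0                               # tie-breaker preserving FIFO order

    def submit(self, tenant: str, priority: int, prompt: str) -> bool:
        """Admit a request unless the tenant is over its limit.
        Lower priority number means served first (e.g. prod=0, dev=9)."""
        if self.queued[tenant] >= self.per_tenant_limit:
            return False  # rate-limited: the tenant must back off and retry
        self.queued[tenant] += 1
        heapq.heappush(self.heap, (priority, self.seq, tenant, prompt))
        self.seq += 1
        return True

    def next_request(self):
        """Pop the highest-priority request when a GPU slot frees up."""
        priority, _, tenant, prompt = heapq.heappop(self.heap)
        self.queued[tenant] -= 1
        return tenant, prompt
```

A real gateway would add token accounting, regions, and model routing on top, but the funnel is the same: many tenants, oversubscribed, all draining through the same fixed set of chips.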

I think you have to be thinking about it that way, because that’s how the big model providers are thinking about it. How do they do their total cost of ownership? And I think it’s now at the point where people are going to start looking at their token bills the way they eventually started looking at their cloud bills, where they’re like, “Wait a minute, we just outsourced to AWS and it’s costing us more. Oh my goodness, what happened?”

Shweta Vohra: Absolutely.

Joseph Stein: Right? That reckoning is going to come, it’s coming. I have no prediction when, but that will come.

Shweta Vohra: I agree, and this matches my observation too. If you are providing a platform or a service, until you expose that cost to someone and let them own the total cost of ownership for their use case or the services they’re getting, they don’t realize where it lies in the chain. It’s much like teaching your children early in the game how to use money wisely. That’s a very good point.
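That chargeback idea, exposing token cost to the team that owns each use case, can be sketched with a minimal ledger. The prices, model names, and team names below are invented for illustration; real per-token rates vary by provider and model:

```python
from collections import defaultdict

# Illustrative prices per 1,000 tokens; real rates differ by provider/model.
PRICE_PER_1K = {"qwen3-30b": 0.002, "llama-3.1-8b": 0.0005}

class TokenLedger:
    """Attribute token spend to the team that owns each use case, so the
    total cost of ownership is visible where it lies in the chain."""

    def __init__(self):
        self.spend = defaultdict(float)  # team -> accumulated dollars

    def record(self, team: str, model: str, tokens: int) -> float:
        """Record one request's token usage against the owning team."""
        cost = tokens / 1000 * PRICE_PER_1K[model]
        self.spend[team] += cost
        return cost

    def bill(self, team: str) -> float:
        """Return what the team owes so far (rounded for display)."""
        return round(self.spend[team], 6)
```

Even this crude attribution changes behavior: once a team sees its own token bill, "the new, smarter model" stops being a free upgrade and becomes a priced decision.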

Hard Lessons from Early Success [48:19]

With your experience, if you have made any mistakes or formed any early principles, what would you pass on to the builders, engineers, and architects who are designing those systems right now? What principle or learned experience would you say to take on early with agentic systems in particular?

Joseph Stein: Right off the top of my head, and something that almost sends shivers down my spine: I would say that my biggest failure over the last year and a half of doing this has been my success. The system exploded with usage so fast, because everyone was like, wait, we can do v1 chat completions, and all we have to do is go to a website and download a CLI, and all of a sudden, poof, we can make v1 chat completions. And there’s an image model, and we can start sending in all of our images that we were never able to process before, and now we can start taking on business in mail rooms and all these new types of opportunities that just sprawled within the organization over three to six months. It was exciting, it was fun, but it was constant firefighting.

The train was rolling at 90 miles an hour and we were just trying to lay enough track so that there was no stopping at the station. We were just trying to lay enough track so we could loop around and slow down one day, maybe stop at a station, which we eventually did at the end of the year, and that was good, but it takes a while. Good problems to have, kind of thing. But those were severe, and it wasn’t hype. It wasn’t just that everyone tried it out; everyone who tried it out was using it for something. They had some tangible thing that they found they could apply it to in their day-to-day that helped them out, and they became a user. It was exciting, but it was also a lot of incoming: a lot of structure we didn’t have, not a lot of organizational support to run it, and a lot of new software that had to get built after we went live with an MVP.

We only had a couple of users when we kicked this off. They were just going to try it, and then all of a sudden we had 250 users within three months, and half of them were in prod, even though there was a big warning label, in big red letters with an underline, that said, “Do not go into prod with this”. But it happens, and DR gets set up, and we build, and we make things work, and it becomes reliable, and it works.

Shweta Vohra: So be ready to explore. Don’t fret over it too much. Don’t take stress and be prepared for scale early on is what I’m hearing. Am I right?

Joseph Stein: Yes. The worst thing that can happen is someone actually takes what you’ve done and goes live with it. There’s so many new boundaries and things to be considering and for all of my experience, I feel like I started just fresh all over again.

Shweta Vohra: So let’s put it this way: be in a hurry to learn, but don’t rush a half-hearted or half-baked solution out there and create more problems. Wonderful. With that said, maybe a last question. If, let’s say, you and I meet again in December, what new things will we be discussing? I’m sure a lot more will have happened between February, when we are meeting, and December. What is your prediction? Any early prediction?

What Comes Next [51:58]

Joseph Stein: Yes, I think we’re going to start seeing the boundary between the workplace and the consumer come down with hardware, just from what I’m seeing and what you can now do with software automation on simple hardware devices. And I’m not saying robots; it’s not about the robots. But you could even look at something like Alexa. Every day, I say the same thing to my Alexa. I’m not going to say what it is now, because all of a sudden you’re going to start hearing it. But I should be able to do natural language programming with my Alexa and say, “Whenever you’re playing a music station and I want to know who’s playing, you should just do that for me. I shouldn’t have to ask you. You should always tell me who’s playing and then play it for me”. I don’t want a programmer at Alexa or some music company to go sit with a product manager and make that decision for me. I want to be the product manager for my own version of your product, not yours obviously, but of the new world, and be able to shape my interaction.

And I think we’re going to start seeing assistants working with assistants and agents working with agents. As companies start building their own agents, you’re not going to be emailing invoices from one agentic system to another agentic system to get AR/AP working. That transmission mechanism will change, the way fax went away and turned into email; email will go away and maybe turn into A2A or some new agent-to-agent standard like that. And I see the world starting to grow and go cross-organization.

Shweta Vohra: Very interesting. I’m sure technology will surprise us, but it brings a lot more responsibility, which it is also our duty to call out at the end. Thank you, Joe, for joining. I’m glad you could do it. We had a lot of discussions in the background, and it has finally come together in this shape, whatever form and shape it has taken. We’ll see. Thank you so much.

Joseph Stein: Thank you. Bye.
