Key Lessons from Shipping AI Products Beyond the Hype

News Room
Published 11 August 2025, last updated 7:31 AM

Transcript

Calçado: I’m Phil. I have spent the last three years working deeply on product building with generative AI. Many of the things we learned during this process became a series of articles, which then turned into presentations and various other things. Something I’ve learned over the last three years producing content and giving talks about AI is that it’s hard to address a diverse audience, because different people come from different perspectives and different levels of experience. I will take advantage of the fact that this is supposed to be a keynote and set the scene.

Basically, what you need to know is a little bit of architecture and a little bit of AI. For whatever you don’t know on one side or the other, hopefully I’m going to give you pointers that you can look up, some good homework for you all. I have enough articles and other material that can help you understand a topic that might be in your blind spot.

Bias Check

I hate when presentations have “who am I” slides, because it’s like, why you should listen to me. This one’s the other way around: why you should take what I’m saying with a grain of salt. The reason you should do that is because, although I’ve been on this journey for a fairly long time compared to how nascent GenAI is, I have my own biases. I’ve been building software in a very specific way for 30 years. I’ve successfully built some teams and some architectures, largely on microservices, distributed systems, all that stuff. That’s what I’m bringing to this world.

If I had a data science background or an AI research background, I would probably have different opinions and different things to say. Basically, to me, the way this manifests itself is that I’m very biased towards actually getting stuff done. I’m very biased towards iteration. I’m an old-school agile person from the 2000s. I want to see iterative, incremental development in everything I do. I don’t think that AI gets a pass on this conversation. Buyer beware, that’s the kind of bias that I have.

What Was Outropy (Build, Run, and Optimize AI Pipelines)?

Before we go into details and architecture, I want to talk a little bit about what we were building, Outropy. I’ve been an engineer for 25 years now. Over time, I started managing teams, leading teams, and doing the management leadership work, tech lead, manager, CTO, director, played all these roles at different organizations.

One thing I always found is that when I’m working on code, I have VS Code, IntelliJ, Eclipse, whatever IDE you might use, and they have all these automations, refactorings, different things, to make your life much easier and allow you to work in much larger code bases. When I’m wearing my leadership hat, in my leadership role, there is basically nothing. There’s a bunch of Google spreadsheets somebody gives you. I’m pretty sure anybody here who has been through more than two or three positions as a manager or a director has a Google Drive folder full of templates and checklists and things that you carry from job to job that you like applying.

My initial idea back in 2021 was, can we automate this? Can I create basically the VS Code for everything that a manager does, or everything an engineer does that’s not just writing code? We basically started building this the moment generative AI became a thing. It took about six months to get to the first public beta. Back then we were using GPT-3.5. Eventually we were using GPT-4. It was a little too expensive; we’re a small company, so we had to limit how we used it. We learned a lot from that, and from the various flavors of Llama models that were being released. We started as a Slack chatbot.

Eventually, we became a Google Chrome extension. This is a screenshot from our pitch deck that shows the Chrome extension. One of the interesting things about Outropy itself is that it’s a very garden-variety AI startup. We were so early that we were one of the few actual companies to reach a few thousand people actually using the application when everybody else was just doing demos and things like that. We’ve been playing this game for quite a while.

What I’ve described here sounds almost like a very weak pitch deck. This is 2023, early 2024. We’re up against everybody. Everybody was releasing a similar tool. Microsoft had I don’t know how many versions of Copilot. Salesforce is still trying to figure out Agentforce. That’s the kind of competition we had. The first screenshot here was May 2023: Salesforce says, we’re releasing this thing, Slack GPT is going to be awesome. Then every three months or so, they would make an announcement, like, it’s totally coming. It’s going to be awesome. Here’s a conceptual video of what it’s going to look like.

If you work for startups and you’ve fundraised, especially at the very early stages, you know that every time they made an announcement like this, our investors and our customers were like, Salesforce is going to destroy you guys. Why are you even trying? Have you seen Slack AI? The video is amazing. Of course, it’s a conceptual video of something that doesn’t really exist. Eventually, they released it, I think on Valentine’s Day 2024. I was immediately on the phone trying to install it and try it out. One thing that was really surprising to me was that the product we had built, if you’ve used Slack AI, and I know it’s a little vague if you haven’t, was miles ahead of Slack AI in terms of quality, based, of course, on my own benchmarks.

We published a few articles around this with more data that I’m happy to share. The point was that I couldn’t understand why Salesforce, such a big company with a lot of people, a lot of money, and access to the same technology or better, was struggling to build AI products. You can replace Salesforce there with Google, or with even more indie platforms. Most AI products you’re using right now are this summary of my email that gets everything wrong and hallucinates. The status quo was terrible and it’s still not good.

Something interesting to us was that we were focusing on the engineering leader. That was our ICP, our customer, the people we were talking to. We were growing like crazy. For such a small startup with no marketing budget, we were adding a lot of people and new organizations. It was a little weird, because we were a very small startup and people were plugging us into their sensitive data, and we’re like, “Please don’t do that”.

Then we realized why that was. I have to be clear that this is a post-mortem; this product failed miserably. One of the things we saw as we were failing that was really interesting was that the users were not really that interested in the tool itself. When we would talk to them and say, can you give feedback on this thing that you’ve been using, we realized that they were actually using our tool because they were trying to reverse engineer how we were building it.

Basically, the conversation we had, and this is almost a verbatim quote, is like: how can two guys and a dog — and the dog is not even doing any work — build this system that has all this agentic behavior? We didn’t even call them agents back then; it was more Copilot verbiage. I have nine people, data scientists in the corner, and all we have is a chatbot that tells you that it rocks. This is something that I spent a lot of time thinking about and working on. We produced a lot of content, and there are a lot of different articles that go into the details of the things I’m going to talk about here.

The Three Ways We Build AI

Over time, I’ve been developing a theory of why these things suck, why most AI products, especially in the productivity space, are just not good. I think it has to do with how these products are built. The way I see it, there are basically three ways that we build AI today. You might see these in your company or across the ecosystem.

The first one is Twitter-driven development, which is that, this changes the game. Now everything has changed. You’re so cooked. OpenAI releases this thing, your startup is not going to make it. This is a very prevalent mindset among a lot of different people, where it feels like they’re always building for the new version of the models that are going to come next year, or that was promised to come next year, or whatever it is. They’re not really worried about the current limitations of technology, because Sam Altman said that we’re totally going to get AGI next year. What’s the point? I’m going to build for the future.

In fact, Sam Altman said this multiple times, that you should build for the future, where OpenAI dominates everything. There’s a lot of people building software like this, products like this. I think that gets to a point where you have the flashy, fantastic demos that sometimes get funded by hundreds of millions of dollars, but don’t really deliver as much, because they’re not dealing with reality.

On the more realistic side of things, you have another option, which is very common in existing companies, less so in startups, which is: this is basically a data science project. I’m not a data scientist. I’m a software engineer, backend through and through. I’ve managed a lot of data science teams over the years at SoundCloud, DigitalOcean, SeatGeek, and others. One thing that’s interesting about the way data science teams work is that they usually tend to treat each project as its own thing. It’s less product thinking and more project thinking. I remember when we were building classifiers and recommenders at SoundCloud, it would take one year. It’s like, can we have a spam classifier? A team would go off. I would fund this team. It would be eight people, one year, figuring out what to do.

If you’re old-school data science, the Enron emails were what we used back then. Then they’ll come back after that period of time, and it’s like, great, we have a classifier. What’s the success rate? It can detect 50% of spam. It’s like, what? You’re telling me that I just invested eight people for one year to get something that’s just as good as flipping a coin? That’s not great. “Don’t worry, we have this new technique. We are going to build on this new advancement. It’s going to be much better. We just need 10 months”. They go away again for 10 months, and they come back with something that’s 55% good at classifying spam. At the same time, they wrote five research papers because the technique’s really novel.

That slow, incremental approach is how data science usually builds stuff. That’s one of the things we see in AI a lot. I know a lot of different companies that are building the AI product that was announced, sometimes in an earnings call by some fancy CEO. They have 10 people in a lab fiddling with models, trying different things, trying different techniques, whatever was on Hacker News yesterday they’re trying today, trying to turn this system into a product. It’s taking too long. It’s not going well. The classic story. Very waterfall. I don’t think this approach works well for products, for one reason. When we were doing this in data science, if I didn’t have my spam classifier yet, I had other things I could do. I had humans labeling data. I had users self-reporting when something was spam, all this different stuff.

When we’re talking about AI and generative AI in 2025, what we’re talking about is putting the AI right on the critical path of your product. Basically, how much your company is worth depends on that critical path. You can’t take an approach that needs one or two years to get something done.

Then there’s a third approach, which you might have guessed is the one I prefer. Again, back to my biases. It’s basically: treat these as engineering projects. The way I see this is the way that we do the classic skateboard-to-spaceship iterative development. That’s how we built the tool that eventually became Outropy. I think there are a lot of interesting things that can be done that way, and some things that are a little harder. One of the biggest blockers people find when trying to apply a software engineering and product engineering approach to AI is that there are a lot of things in AI that are just not a good match for the tooling we built for software engineering. I think there’s merit to that objection; there are things that need to change. But the situation is not as dire as we might think. That’s one of the things I want to discuss a little further.

Building Blocks – Workflows and Agents

Let’s think about building blocks. Different people use different words for different things in AI. I want to establish a vocabulary for the rest of the talk: there are basically two objects or entities within a generative AI system, workflows and agents. Workflows, if you read anything I wrote before, I used to call inference pipelines. I still prefer the term inference pipelines, but Anthropic calls them workflows and I don’t have the marketing budget, so I’ll just go with workflows for now. A workflow is basically a predefined set of steps to achieve a goal with AI. “Summarize this email”: you go here, there, there, there, done. Or, “Recommend me something”. The different things we do with AI, with a static pipeline. Agents are interesting because nobody has any idea what an agent is.

The way we’ve been going about it is: systems where LLMs dynamically direct their own processes and tool usage. Basically, it’s a piece of software that is semi-autonomous. It can make decisions. It can collaborate with other things. Some of those things are tools; some are other agents. It executes tasks on its own. It’s given a goal and it goes and does that. With these two broad concepts in hand, we’re going to dig deeper into them. The first thing you see when you talk about workflows, especially if you talk to a vendor, is RAG, Retrieval-Augmented Generation.

The summary of RAG is: I’m going to put context into your prompt so that the LLM knows about you, your problem, your company, whatever it needs to know to solve a particular problem. There are various ways to do that, and various frameworks and ideas around it. Basically, a lot of vendors will sell you this: we’re going to get you data from all your data sources. In this case, building on the example of our own tool, we get all the different productivity tools. We’re going to send it to a model. We’re going to use some vector database. Back then, in 2023, vector databases were super-hot; everybody was trying to sell you one. Then you’re going to have the data that you need to do what you want. What we’ve learned is that this almost never works. This is great for demos that you show your boss and that get funding for the project.
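To make that concrete, here is a minimal sketch of that single-step, vendor-pitch RAG pipeline: retrieve a few context chunks, stuff them into the prompt, call the model. The retriever here is a toy keyword-overlap scorer and `call_llm` is a stub; in a real system these would be a vector search and an actual model call. All names are illustrative, not from Outropy.

```python
# Naive one-step RAG: retrieve -> build prompt -> call model.
# The retriever is a toy keyword-overlap scorer, and call_llm is a stub
# standing in for a real model call.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"

def call_llm(prompt: str) -> str:
    # Stub: a real system would call a model here.
    return f"(model answer based on {prompt.count('- ')} context chunks)"

def naive_rag(query: str, documents: list[str]) -> str:
    return call_llm(build_prompt(query, retrieve(query, documents)))
```

This is exactly the A-to-B shape the talk warns about: fine for a demo, but the single retrieval-and-generate step carries all the weight.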

Once you start actually building systems, this one step from A to B doesn’t cut it; LLMs are not that smart. They were not that smart then and they’re not that smart now. What you usually need to do is add more steps that add more flavor, more color, more structure to what you’re doing. In our case, one very typical thing was processing messages from Slack. Our first product, our first feature, was a daily briefing that you received every morning. I could send the model all the messages from Slack and say, among all these things, please tell me the topics that Phil should care about. That was our first beta. It worked very well for the demo; not so much for anything else.

The way we evolved that is that instead of doing this, we actually added steps. It’s like: these are all the messages that happened in Slack within the last 24 hours; can you break this down into discrete conversations and tell me what the topics of these conversations are? Among these conversations, can we identify individual discussions? Because on Slack, people come and go in a very asynchronous workflow. We build this object model, and it’s really a domain model, the same way we have domain models elsewhere. Then as a final step we say, ok, fetch data from this object model, this structure that has semantic meaning, along with other context, whatever it is, like the time of day.

One thing that was really important to us was whatever was in your calendar for the day. Create the summary, the daily briefing, for this person. There are a lot of interesting things around this, especially on caching and other things you can do. The most important thing to me is that this was an exercise in actually building a domain model. That’s what we were doing. We were building multiple different slices, bounded contexts, whatever you want to call them, using the LLM to do the transformation. Don’t fall for the usual thing. In fact, this is one of the basic workflows we had; this is a screenshot from our internal systems of the pipeline that actually generates the daily briefing.
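The multi-step shape described above can be sketched as a chain of small, typed transformations, each taking data in one shape and returning another. Plain Python functions stand in here for the LLM calls, and the types (`Message`, `Conversation`) are illustrative stand-ins for the domain model, not Outropy’s actual schema.

```python
# Multi-step workflow sketch: raw messages -> conversations -> topics ->
# briefing combined with other context (the calendar). Each function is a
# stub for an LLM-backed transformation step.

from dataclasses import dataclass, field

@dataclass
class Message:
    channel: str
    author: str
    text: str

@dataclass
class Conversation:
    channel: str
    messages: list[Message] = field(default_factory=list)

def split_into_conversations(messages: list[Message]) -> list[Conversation]:
    """Step 1: break the raw firehose into per-channel conversations."""
    by_channel: dict[str, Conversation] = {}
    for msg in messages:
        by_channel.setdefault(msg.channel, Conversation(msg.channel)).messages.append(msg)
    return list(by_channel.values())

def extract_topic(conv: Conversation) -> str:
    """Step 2: stand-in for an LLM call that names the topic of a conversation."""
    return f"{conv.channel}: {conv.messages[0].text[:30]}"

def daily_briefing(messages: list[Message], calendar: list[str]) -> str:
    """Final step: combine the semantic model with other context."""
    topics = [extract_topic(c) for c in split_into_conversations(messages)]
    return "Topics:\n" + "\n".join(topics) + "\nToday: " + ", ".join(calendar)
```

The point is the shape, not the logic: each stage has a well-defined input and output, so individual steps can be cached, swapped, or tested in isolation.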

As you can see, each one of these things is basically a step in a pipeline that executes some transformation: it takes data in one format and returns data in another format. I say it’s a step in a pipeline because, to me, we can draw a very direct parallel between these workflows and data pipelines. That’s useful because then you can start thinking about the tools we already use for data pipelines. Do you use Apache Airflow? It could be a good tool for you. Do you use some other data workflow or DAG engine? That could be good for you, too. It starts giving you a little more to work with, instead of starting from this mythical world of AI where everything’s possible but also nothing happens.

That still leaves us with agents. What are agents? How can I build an agent? What does that look like? As a software architect, how can I model an agent? What kind of entity is it? The first thing, when it comes to agents, is that businesses think of them as microservices. I tell you, don’t do that. As somebody who spent way too much time on this microservices stuff, I’m sorry, or thank you, or you’re welcome, whatever flavor you might prefer in terms of distributed systems. Agents are actually a very bad fit for a traditional microservice architecture.

Obviously, you can adapt and mix and match and do things; a lot of people do. But agents are very stateful, and that is often terrible for microservices, for various reasons. They’re stateful to the point where, because they have memory, they need to basically load everything they know about the user whenever they receive a request from that user, and then do that again and again. It’s a very complicated setup that I don’t think is a good match. Then there’s non-deterministic behavior.

The only reason I think microservices even work is because there’s not a lot of variance in the paths one takes within a microservice architecture. Of course, we have 10 million different combinations of microservices or whatever, yes, but that number is bounded. You introduce a new feature, and basically a new circuit is designed within your architecture. That’s fine, but that’s an event that has happened. When it comes to AI, you never know which way around your microservice architecture a request is going to take. That will definitely hit you when it comes to operations.

Then there’s being data-intensive with poor locality. It’s related to state a little bit, but it’s always very hard to fetch data. AI depends on fetching data from disparate sources, only little chunks here and there that vary a lot. It’s hard to cache; caching often doesn’t make sense. Then there’s stuff breaking in underlying external dependencies. The microservices world of, I’m calling my database, I’m calling another service, and maybe there’s an exception, is turned upside down, because you never know what you’re going to get back from an LLM. That’s my don’t-do-microservices rant.

Then, let’s go back to that definition and try to get to a few words that summarize what agents are, according to this particular version. Agents have memory. They have an understanding of what has happened in the past and how that impacts the future; that’s really important for what you’re trying to do. They’re goal-oriented: instead of going step by step like we were doing with the workflow, where each step does a little thing, you should be able to tell the agent, “Do this thing”, and it goes and does that for you. They’re dynamic: depending on what’s in memory and what stimuli they receive, they will behave in different ways. And they like to collaborate: to be effective, agents need to do something in the real world, which means they will call a tool or call other agents, and often a combination of both.

Then, wearing my very biased software engineering hat, when I look at this, I actually know a paradigm that matches it very well. We talk about memory; that sounds like plain old state. It’s very heavy, but it’s state. Goal-oriented sounds like encapsulation: I’m giving you the what, not the how, and you’re protecting the how. Dynamic sounds polymorphic, and we all know these complicated words with a lot of Ys and Cs at the end. Collaboration can be a version of message passing, and can be other things as well.

The reason this is useful to me is that it allows me to start thinking of agents the way we have always thought about objects in object-oriented programming. I wouldn’t claim this is anywhere close to the correct definition that some particular lab or source of AI information would give you, but as an engineer it’s really helpful, because it helps me build systems with these things. I can start reasoning about them. Maybe my thinking will evolve past that, but it gives me a starting point. I know how to build objects, and I can work this way. Basically, in the toolkit we built, we started thinking of workflows as data pipelines, and agents as a lot like objects, with the good and the bad sides of that.
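The agents-as-objects mapping can be sketched in a few lines: memory is state, the goal-oriented entry point is encapsulation (the caller passes the what, the agent owns the how), dynamic behavior comes from dispatching on memory, and collaboration is message passing to tools. The `decide` method is a stub for the LLM making the decision; all names are illustrative.

```python
# Agent-as-object sketch: state (memory), encapsulation (achieve),
# dynamic dispatch (decide), collaboration (tools as callables).

class Agent:
    def __init__(self, name, tools):
        self.name = name
        self.memory = []        # state: everything observed so far
        self.tools = tools      # collaborators: callables the agent may use

    def observe(self, event: str) -> None:
        self.memory.append(event)

    def decide(self, goal: str) -> str:
        # Stub for the LLM: pick a tool based on memory and the goal.
        if any("urgent" in e for e in self.memory):
            return "notify"
        return "summarize"

    def achieve(self, goal: str) -> str:
        """Goal-oriented entry point: the caller says what, not how."""
        tool_name = self.decide(goal)
        result = self.tools[tool_name](goal, self.memory)
        self.memory.append(f"did:{tool_name}")
        return result
```

Same inputs can take different paths depending on accumulated memory, which is exactly the dynamic, stateful behavior that makes agents awkward as stateless microservices.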

Architecture Pointers

From this, let’s talk about a few architecture pointers. I’m going to go over a few hot takes on the things we built, but there’s a lot more information online if any of this is interesting or someone wants to follow up. The first thing: we talk a lot about agents collaborating. What I’m going to tell you is, avoid point-to-point agent collaboration, which is funny because I just told you agents are like objects, and that’s how objects work. Your object A calls object B.

That’s where my whole framework breaks down a little bit. Usually you start coupling all these different agents to each other in a way that is even worse than in an object-oriented system, because the agents are autonomous and make decisions on their own. This is philosophical; I could discuss a lot of software quality measures here. Another thing that’s really weird with agents is that, because of the way they float in the ether, it’s very easy to end up reinventing WS-*, SOAP, and all that stuff. Why? Because you’re going to start thinking: I need a directory for my agents. I need some kind of discoverability. What if an agent doesn’t know that another agent has changed information, what happens then? How can they negotiate this? How can they do security? I’ve seen a lot of people basically start rebuilding what we used to do with XML 20 years ago, but with JSON, better somehow.

Basically, we reinvented the classic web services stack in this environment. Avoid that, is my take. Then, ok, but if I avoid an agent calling another one directly, what do I do? There’s one particular paradigm that I think works very well, which is basically using semantic events. There are different definitions of semantic events. The reason I like to say semantic events is that a lot of times companies have some kind of bus where you tap into a binlog, MySQL, Postgres, or whatever, and you get a lot of CRUD events: this row was deleted, this row was updated. People use this for everything; Postgres has whole change-stream projects around it, and DynamoDB has one as well. I think every database has something like that. I don’t like to use that between different systems that are semi-independent. That applies to agents, to microservices, to everything. This stuff needs to be a proper entity. It needs to be a proper object.

Instead of saying, table users had a row added to it, it’s like, no, there’s a user-created event. There’s a schema that’s well-defined and well-known, all this stuff. That’s the kind of thing you can post onto a bus. When I say bus these days, most people think about Kafka. I do not recommend you start with Kafka. I recommend you delay using Kafka as much as you possibly can. If you’re successful, you probably will have to touch Kafka at some point. To give an example of what we were doing, following this architecture, we were using Redis. We were doing Python, and at first we were using SQLAlchemy, which is Python’s most popular ORM and has its own event system. We started with that, all in memory. Then we moved to using Redis, which a lot of people do.

Eventually, we started moving to Kafka after we hit some barriers with that. Basically, you send a message to this bus, and an agent registers itself and says, I’m interested in this and this and this kind of event. It’s a very traditional messaging architecture that has been around a long time, and it works well in this setting.
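An in-memory sketch of that semantic-event bus: instead of raw CRUD change events ("table users had a row added"), events are proper domain objects with a well-known schema, and agents subscribe by event type. A real deployment would back this with SQLAlchemy events, Redis pub/sub, or Kafka, as the talk describes; the `UserCreated` event and `EventBus` names here are illustrative.

```python
# Semantic-event bus sketch: typed domain events, type-based subscription.

from dataclasses import dataclass
from collections import defaultdict
from typing import Callable

@dataclass
class UserCreated:          # a semantic event, not a row-changed event
    user_id: str
    email: str

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        """An agent registers interest in a kind of event."""
        self.subscribers[event_type].append(handler)

    def publish(self, event) -> None:
        """Deliver the event to every handler registered for its type."""
        for handler in self.subscribers[type(event)]:
            handler(event)
```

Because subscribers depend only on the event schema, agents stay decoupled from each other, which is the whole point of avoiding point-to-point calls.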

Then, because I’m talking about agents, I’m going to do an MCP name drop here, for those who are more plugged in than they probably should be. MCP is the Model Context Protocol, a standard defined by Anthropic: basically, a standard for how systems communicate with each other. Here’s a complete hot take, take it or leave it: if you are a small startup right now working on AI, yes, build on MCP, build MCP interfaces, write blog posts about it, because it’s all investors want to know about. Whatever podcast they listened to last week mentioned MCP, and they are really focused on that. If you’re building an internal product, though, I would refrain from getting too close to even thinking about building an MCP interface. Remember what I said about WS-* and SOAP.

The protocol is evolving, and we’ve seen this before. I heard in a presentation that MCP exists to solve the problems that you don’t know you have yet. I’ve heard that before. It was called SOAP, and that didn’t go very well for anybody. Instead, what we eventually converged towards were RESTful architectures, protocol buffers, gRPC, which are based on things we developed empirically: we saw them work in production, then we created a protocol on top of that. Again, hot take: hold your horses if you don’t have to. If you’re raising money, MCP all the things. MCP your dog. I don’t care. Just MCP everything. You will need that. That’s how this market works. It’s not fair. It’s terrible, but it is what it is.

Another very important part of agents is agentic memory. How does that work? What does that mean? It’s really interesting, because there are different ways to think about it, and it’s actually a very interesting problem to sit down and try to solve on your own. Basically, you need to keep track of everything that an agent knows about somebody. In our case, there were many different agents. For example, for a user, I need to know: what do I know about user X? It’s a productivity tool; do I know that they said they’re not going to be in the office today, so they’re probably working from home? Or that they’re not working from home because they’re sick? All this information was really useful for the kind of software we were building.

Then, how do you go about building this kind of thing? The first attempt people make, and there are even products that do this, is basically to list everything you know about the person in one document. It might sound like I’m oversimplifying, but no, that’s what people do. They create a very long text document with everything they know about that person. Then you put this into a vector database, and whenever you have to do something for that user, you do a similarity search and find the facts you know about them: I know that Phil was on vacation, or whatever it is. This is how ChatGPT memory works. If you’ve had some freaky, weird experiences with ChatGPT, a lot of the time this is why. It knows that you’ve mentioned your cat; it doesn’t know that your cat died. Those are the things that happen in a system like this. Often, especially for productivity tools but really for any AI product, having no memory is better than having a faulty memory.

The person-with-two-watches situation is complicated. There’s a very interesting option from what we know in software engineering that actually works very well, and a lot of people are doing it. We were doing it: event sourcing. The idea is, we’re already talking about having a bus that carries events, with agents interested in those events. What if you take this stream of events you’re receiving about the user and compact it into some form of representation of what’s happening? In our case, we were doing a lot of natural language, and natural language can be complicated. Then we found this format called AMR, Abstract Meaning Representation. I don’t necessarily suggest that anybody use it, but it’s an interesting way to think. It’s a format that information retrieval people defined a long time ago that breaks sentences of what was said into structures. It looks a little like Lisp, but it expresses things like: Bob sends a message.

The fact is that the message was sent on Slack. “Bob sent a message on Slack” is a fact that has a structure; it could be a JSON payload or whatever. You can start breaking natural language down into these facts and creating a log system from which you can create snapshots. The way we actually used memory was through a graph database; we used Neo4j. The interesting thing about AI compared to other domains, especially when dealing with natural language, is that you cannot be super sure about things. We actually had a probabilistic graph, which is something I’ve written about, and I can send you pointers. Because somebody says, this project is late. How do I know if this was the person who manages the project? How do I know if it was an intern who misunderstood something? Or how do I know if the person was joking? There are different things one needs to do about that.
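The structured-facts idea can be sketched like this: each claim becomes a small record (who asserted what, about which entity, with what confidence) instead of a raw sentence. The noisy-OR confidence update is an illustrative assumption of mine, not what Outropy shipped; their version lived in a probabilistic Neo4j graph.

```python
# Structured facts with confidence, a toy stand-in for a probabilistic
# fact graph. The combine rule (noisy-OR) is an illustrative assumption.

from dataclasses import dataclass

@dataclass
class Fact:
    subject: str       # e.g. "project-atlas"
    predicate: str     # e.g. "is_late"
    source: str        # who asserted it (manager? intern? joker?)
    confidence: float  # how much we trust the assertion

def combine(a: Fact, b: Fact) -> Fact:
    """Two independent sources asserting the same fact raise confidence:
    P(true) = 1 - (1 - p1) * (1 - p2), a common noisy-OR assumption."""
    assert (a.subject, a.predicate) == (b.subject, b.predicate)
    conf = 1 - (1 - a.confidence) * (1 - b.confidence)
    return Fact(a.subject, a.predicate, f"{a.source}+{b.source}", conf)
```

So "the manager says the project is late" and "an intern says the project is late" are weighted differently, and agreement between sources raises the overall belief rather than duplicating the fact.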

In any case, event sourcing is a great option. If you know a product called Zep, I think they open-sourced it, that’s also how they do it, though in a different way: they do it with pure natural language. I suggest that you don’t do your events in pure natural language; structure them. Event sourcing is definitely a way to go about this.
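The event-sourcing approach to memory can be sketched as a log of structured events folded into a snapshot. Because later events supersede earlier ones, "cat died" correctly replaces "has a cat", which is exactly what the bag-of-facts vector store gets wrong. The event shape here is an illustrative assumption.

```python
# Event-sourced memory sketch: append structured events, fold into a
# snapshot of current beliefs. Latest value per (subject, attribute) wins.

def fold_memory(events: list[dict]) -> dict:
    """Replay the log in order; later events supersede earlier ones."""
    snapshot: dict[tuple, str] = {}
    for event in events:
        snapshot[(event["subject"], event["attribute"])] = event["value"]
    return snapshot
```

Compaction then just means persisting the snapshot and truncating the log, the same trick event-sourced systems have always used.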

The next one is related to what we were talking about on the data science side in the beginning. A lot of the projects I've seen and sponsored on the data science side create monolithic pipelines. Basically, I want to have a classifier for my content, and they will build something that goes from datastore A all the way to spitting out the classified content, whatever it is, on the other side. There's some reuse, usually copy and paste. Basically, they'll build this pipeline from A to B. If you read all the stuff about data mesh and all these fantastic things that are totally great and nobody does, there are different ways that you can solve this problem. That's usually how people work, just these monolithic pipelines. That's how we started. I think it's a valid starting point.

Then you end up having, first, high coupling between stages. Even copy and pasting things is hard because there's no well-defined interface between each stage. The worst thing to me was that the pipeline mixes unrelated concerns. What that means is, in our case, the same pipeline was fetching messages from Slack. Not necessarily the act of fetching from Slack, because that was saved in a database. It was more like understanding that Slack has a particular format for how they do mentions, and what a thread on Slack is, and all these different concepts. It had to understand also how Google Calendar works just so that it could generate some report in the end. That's a lot of different concerns in one go. Instead, what we did was basically, again, apply good old software engineering and break workflows down into smaller ones that actually had a published interface, some actual semantic meaning, and semantic entities that they returned.

In our case, these are the stages from the actual personalized summary. First, we had one tiny pipeline that summarized discussions from our Slack channels. Then we would deduplicate discussions, because it turns out that people talk about the same stuff in different channels. We would then rank the discussions into a personalized summary.

The interesting thing about something like this is that I can basically swap the first component there and say, no, I'm not supporting Slack anymore. Or, I want to use the same logic but, in our case, for Discord, or Microsoft Teams, or even email; you could basically use the same thing. Reuse is, obviously, the holy grail for everything we do. That's an interesting thing about AI in particular, when we're talking about agents being dynamic: it allows agents to swap things in and out as they please. This is a more advanced thing that you might not ever do. If you just take the concept of agents seriously, they should be able to do that. They should be able to decide, actually, I'm going to choose this re-ranker instead of that re-ranker. That happens a lot.
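The summarize, deduplicate, rank decomposition can be sketched like this (all names and the toy logic are hypothetical, a stand-in for the real LLM-backed stages): each stage publishes a small interface, so the Slack source can be swapped for Discord or email without touching anything downstream.

```python
from typing import Protocol

# Hypothetical typed stages for the personalized-summary flow: each small
# pipeline publishes an interface, so the Slack source can be swapped for
# a Discord or email source without touching the downstream stages.
class DiscussionSource(Protocol):
    def summarize(self) -> list[str]: ...

class SlackSource:
    def __init__(self, messages: list[str]):
        self.messages = messages

    def summarize(self) -> list[str]:
        # stand-in for the real LLM summarization stage
        return [m.strip().lower() for m in self.messages]

def deduplicate(discussions: list[str]) -> list[str]:
    # people discuss the same topic in different channels
    seen: set[str] = set()
    return [d for d in discussions if not (d in seen or seen.add(d))]

def rank(discussions: list[str], interests: list[str]) -> list[str]:
    # naive personalization: topics matching the user's interests first
    return sorted(discussions, key=lambda d: not any(i in d for i in interests))

source: DiscussionSource = SlackSource(["Launch plan", "launch plan ", "Budget"])
summary = rank(deduplicate(source.summarize()), interests=["budget"])
# summary == ["budget", "launch plan"]
```

The point is not the toy logic but the seams: because each stage has a published contract, an agent (or a human) can swap one re-ranker for another at that boundary.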

Distributed Objects, Twelve-Factor Apps, and Durable Workflows

We talked a little bit about how these various different components of an agent are built to be resilient in a production environment, in a product. I was talking about agents being objects, and I hope you all understand that it's an analogy that's useful. There's one interesting problem with objects, which is, there's always this guy. Martin is always saying something that destroys everything I do.

In this case, it's a very old meme, from 2004, around distributed objects: the first law of distributed objects is, do not distribute your objects. If I'm telling you these things are like objects and these are distributed systems, you say, ok, the math is not mathing here. What's going on? The first thing about this, to me, is context: whoever was writing software in 2004 knows that this is very much focused on the kind of software we had back then, with RMI, distributed objects, CORBA, Enterprise JavaBeans, all that kind of stuff, where the kind of call we would make would be user.getName. You wouldn't know if that was a remote call or not.

Oftentimes, it was a remote call, and your system would grind to a halt because of terrible, chatty interfaces. An agent is a more coarse-grained object, which is more similar to a component. It doesn't really matter that much, because there are implications to how we deploy and operate software in agentic systems. Two years ago, when I started this, I would go to an AI meetup and talk to other folks also building these systems, and I would ask them, what about these problems? It was really confusing to them, because they were just deploying to, sometimes, a DigitalOcean Droplet, or some small thing like that.

The problem was that every single thing we've built in the last 10-plus years is based on this little document around infrastructure. This is the Twelve-Factor App manifesto, from the late 2000s, early 2010s. Heroku, as part of the real innovation of the product they did back then, released a guide on how to write applications that actually scale on Heroku. Everybody in the industry was like, this is pretty reasonable, and maybe we should follow these principles as well. From containers to Kubernetes, basically everything that we've built around infrastructure and cloud infrastructure that's not data science related over the last 15 years has assumed that we can follow at least a big subset of these principles.

AI systems in general, but agentic systems in particular, just break so many of these rules that it becomes useless. I just flagged some that I think are more intuitive here. Store config in the environment: configuration doesn't control your system anymore. Your system is semi-autonomous, so it has configuration in itself that changes over time. Execute the app as one or more stateless processes: there's no stateless. We actually need to bring context every time we execute something, and that context is data that's expensive to bring into memory, so we probably want to keep it around. Concurrency: concurrency is a joke in AI right now. Not only can doing things concurrently be really expensive, and it's getting better, but it's still expensive. Latency is a killer.

One particular thing that happens a lot in complicated AI workflows is that there's always a bottleneck, which is a decision point. We're like, what's the user trying to say? Oftentimes, this is the slowest part of your system, and you cannot actually make it any faster, because you need that decision to move on. Disposability, dev/prod parity, logs. There are all these different things that basically break this model completely, which is one other reason I wouldn't think of microservices as a good way to deploy agents and AI in general.

What should we use then? I stumbled upon something that's becoming more popular: durable workflows. Actually, the reason I started thinking about this was that people I used to work with at DigitalOcean a million years ago started using them a lot. DigitalOcean uses Temporal. Other folks that I started talking to, at very large scale, I think DoorDash, some very big players, use Temporal. I was like, that's interesting. What's a durable workflow? With these three here, I tried to be comprehensive. The only one I personally have experience with is Temporal. The other ones, I'm sure they're nice, but I haven't used them.

A durable workflow is basically a way to run that whole pipeline such that the framework or the runtime takes care of retries and basically all the resilience aspects for you. Retries are a big one, timeouts, all this kind of stuff. If your workflow is interrupted midway, it will actually be able to checkpoint and resume from there. How does it do that? Is it magic? It's funny. No, it does that by doing the thing that the high-scale people have been trying to tell us for years now, which is to separate side effects from orchestration code. In a framework like this, you usually have some piece of code that's just orchestration, just data flow, and some code that's side-effect driven.

The article I wrote more recently has a lot of interesting information on how we used this to build agents around these different primitives. Regardless, what I see over and again is that if you are building an AI system, and sometimes even just a data system, you're going to end up reinventing this anyway. If you don't use a framework, you will start with Sidekiq on Redis, and then eventually it's like, I need to create jobs, and I need to split my jobs into smaller parts because I need some checkpointing, and this and that. Everybody ends up reinventing this whole technology. I would invite you to at least make yourself familiar with it so that when the need comes, you can go for it.
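The home-grown version everyone ends up reinventing looks roughly like this (a hypothetical minimal sketch, not Temporal's API; real frameworks add retries, timers, queues, and much more): orchestration is pure data flow, side effects live in "activities", and every completed step is checkpointed so an interrupted run resumes instead of restarting.

```python
import json
import os
import tempfile

# Minimal, hypothetical sketch of the durable-workflow idea: orchestration
# is pure data flow, side effects live in "activities", and every completed
# step is checkpointed so an interrupted run resumes instead of restarting.
class DurableRun:
    def __init__(self, path: str):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.done = json.load(f)   # resume from the last checkpoint
        else:
            self.done = {}

    def step(self, name, activity, *args):
        if name in self.done:              # already completed: replay result
            return self.done[name]
        result = activity(*args)           # the side effect happens here
        self.done[name] = result           # checkpoint before moving on
        with open(self.path, "w") as f:
            json.dump(self.done, f)
        return result

# Hypothetical side-effectful activities:
def fetch_messages():
    return ["msg1", "msg2"]

def summarize(msgs):
    return f"{len(msgs)} messages"

path = os.path.join(tempfile.mkdtemp(), "run.json")
run = DurableRun(path)
msgs = run.step("fetch", fetch_messages)
report = run.step("summarize", summarize, msgs)
# A second DurableRun on the same file replays both steps from the
# checkpoint without re-executing the activities.
```

Frameworks like Temporal take this separation much further, but the core trick is the same: because the orchestration is deterministic and the side effects are isolated, a crashed run can be replayed from its checkpoints.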

Overcomplicated Architectures

As a final step, I think that's cool, sure, but can we talk about the elephant in the room? Did I say that I served 10,000 users? This is the final architecture that we had for this product. I was a consultant for many years, in the 2000s, and one way or another, I ended up in a situation in my career where I often come to companies when they're in the teenage phase. They have product-market fit, the product is going like crazy, and they hit a wall because the technology doesn't work. I come in, somebody locks me in a room, and it's like, ok, let's talk about the architecture. I'm going to tell you, if you brought me in as a CTO, VP of Engineering, whatever, my first day is like, "Tell me how we build things here, what's the architecture", and you show me something like this. I'm like, how many users do we have? You say 10,000.

My first immediate reaction would be, I need to fire every single one of these people, because this is way overcomplicated. Ten thousand people, I should be able to serve from my laptop in any system. This is insane. What is going on? Hopefully, a little bit of what we talked about explains why this ended up being so complicated. Between the different databases for memory storage and content storage, and having the semantic bus maybe from the beginning, because now you have these agent things that shouldn't be calling each other, the whole thing at the bottom there is all about making sure that we had some resilience around, in this case, the ChatGPT API. It's a little bit better now. I would still have a lot of bulkheads between your code and whatever AI model you're calling.

Basically, what I’m trying to say is that we definitely need better platforms. I’m not saying that this is where AI is going to be in the future. I hope not. If we still have to do this kind of stuff to build a productivity tool, we’re all screwed. This AI stuff is not going to work. These platforms don’t really fully exist right now. There are two things that I think are interesting, but I haven’t used them.

One of them is called BAML. BAML is basically an attempt at structuring the way that we build AI software. There's one big problem for me. I'm a big programming language nerd, and I love programming languages. BAML has its own programming language that transpiles to whatever language you use. I think that's a massive barrier to entry, and I don't know if they're going to be successful that way. It's good to keep watching those guys. They're smart. I'm sure they'll find a way to deal with that. Another one is within the Rails space. It's very recent, and I haven't had a lot of time to dig into it, but within the Rails community, the Shopify people, Obie Fernandez, also an ex-Thoughtworks guy, released some interesting thinking around how to build AI in a structured way within the Rails framework. I've done a lot of work on AI within the Rails side of things, and there are interesting opportunities, and a lot of problems.

Final Note on Shipping AI Products

The TL;DR for everything that I'm trying to tell you is that there's a lot of hype, a lot of conversation, a lot of different definitions and things like that. I am not one to come here and say, that's how OpenAI should work, that's how Anthropic should work, or Mistral, or whoever. I don't have anywhere close to the right experience to say how to build an AI model, how to train one, how to serve one. Even operating one is something I have very little experience with. What I've observed building products around this, going on three years now, is that outside the box, it's really not that different from the concepts we always had in software engineering.

Folks who are a little older, like me here, are like, "Yes, no way. There we go again". It's actually very important, because AI is in this unfortunate position where there's way too much money, and that means that people need to hype up stuff, and there's basically no technical content in a lot of these things. There's an absence of somebody telling you, this is how we built these things. At InfoQ and QCon back in the 2010s, when we were all going cloud native, you would go and see people talking about how Netflix, which was much smaller then, was adopting the cloud. It's like, ok, these guys did this and this and this, and that worked. Twitter would say, we built Finagle, we built this. There were these hubs of content that were great and were all based on actual experience, not vendors telling you what you should do. We don't really have this with AI anymore. For many different reasons, we don't.

One way to go about this when you're building products, and not basically run away because you're desperate, is to use your software engineering brain, try to find parallels, and don't think this is anything that different from when you first heard of a NoSQL database. It's like, what does that mean? It's very different. It's a different way, but still, a lot of your assumptions and concepts apply.

 
