Key Takeaways
- Architects need to separate AI hype from real software. Design systems based on tangible components such as LLMs, not a vague vision of AI.
- Determining how, where, and when to use AI elements comes down to traditional trade-off analysis.
- First, determine if AI software is a good fit for your application. Like any technology, AI can be used creatively, but inappropriately.
- Second, determine how to effectively use AI. Consider the trade-offs of using an AI-as-a-service API versus self-hosting.
- Architects can augment their decision making and communication skills with AI, leading to better designs and better understanding among stakeholders.
Arthur C. Clarke famously said, “Any sufficiently advanced technology is indistinguishable from magic”. Right now, that “magic” technology has come to be known as AI. Artificial Intelligence is a great umbrella term and is fantastic for marketing, but it doesn’t mean one specific thing we can simply add to our software. And yet, product owners, CEOs, and marketing teams want us to add it to everything. Customers aren’t asking for AI, but they will start to expect it as table stakes for every application.
We must get past vague, hand-wavy guidance about how and why we should use AI. It’s like being asked to get out the spray can and apply a nice coat of AI to everything. Such a haphazard approach will not create the outcomes people are hoping for. We need a thoughtful approach that understands what AI is and where and when we should use it. That’s what I’m calling Architectural Intelligence.
Software architects now need to understand what AI is and what it isn’t. This requires getting past the marketing and hype phrases and talking about real software.
Once we know what that real software is, we must figure out where it makes sense in our designs. And because architects need to think about the architecture of their systems as well as how to improve their architectural practice, we’ll spend some time on how architects can use AI as part of the process of designing systems.
What is AI?
AI is just a marketing term and almost meaningless. In all seriousness, the term AI gets thrown around when we don’t have a better term to describe something. Once some widget becomes a known entity, whether a process, code, or product, we give it a name. Some have said we call something Artificial Intelligence until we understand it, and then we call it computer science.
“AI” means we don’t have a more specific name for something. That’s one of the key points I want to convey. Instead of discussing where we’ll add AI to our software, we need words to describe the real things we can incorporate into our designs.
By now, you or someone you know has at least one story of a business leader saying, “We need to add AI to our product”. Replace “AI” with “DB”, and think about how you would respond to that request. Would you turn around and ask the businessperson what type of database they want? I don’t think that would work out too well.
But as software architects, we know the answer to that question. It depends. We need to ask what the requirements are. What type of data do we need to store? How will it be accessed? Where will it be accessed? And we use those requirements to help inform our trade-off analysis. There is plenty of guidance around all the trade-offs. Relational vs. document store? File-based vs. in-memory? What are the ACID requirements? Etc.
Return to the scenario where you were asked to add AI to your software. You still can’t ask the businessperson what type of AI they want. However, the answer is still the same. It depends. But do you know what it depends on? Do you know the right questions to ask to help define requirements? Do you know the options available to you?
AI usually means GenAI
Five years ago, the only time most people talked about AI was in reference to science fiction. Books, movies, TV shows, and other media feature AI as characters. Those characters are examples of what is known as artificial general intelligence, or AGI. We don’t have that.
Today, in 2024, when someone says “AI” as a blanket term, they probably mean Generative AI or GenAI. We see this in dedicated products, such as ChatGPT or GitHub Copilot. And it’s also the most common “spray some AI on it” approach for products. If some executive wants to ensure their product looks modern and innovative, they add some GenAI. This has led to far too many chatbots. And generative AI is not only about language. We can generate images, videos, and audio using Dall-E, Midjourney, and other tools.
GenAI starts with machine learning
GenAI is a specific application of the broad concept of machine learning, ML. More specifically, it’s a deep learning process. For our purposes, the machine learning process is relatively straightforward. We’ll leave the complicated details to the data scientists.
Creating an ML model starts with training data, and a lot of it. It can be anything: historical sales data, photos of animals, user comments. You feed that input and expected output data into an ML training process. This is like a unit test: given input A and expected output B, how close did you get? The training process creates an ML model and then evaluates it against the expected output. It’s a simple process that you repeat thousands, millions, or billions of times, continually refining the model.
For non-data scientists, it’s helpful to think of ML models as simple function boxes, like a compiled library. In the same way you import a math library to compute square roots and logarithms, you can import the ML model into your application. Alternatively, you can call it via an API.
In either case, it’s a simple function box. You provide some input, and it returns some output. That’s it. The inputs and outputs can be extremely complex compared to a square root function, but the function box analogy is valid for all types of ML models. Image recognition models. Sentiment analysis models. Sales prediction models. And getting back to GenAI, large language models.
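To make the analogy concrete, here is a minimal sketch. The classify_sentiment function is a hypothetical stand-in for a real model, whether imported as a library or wrapped around an API call; the point is that the application only sees input in, output out.

```python
# A hypothetical "function box": a stand-in for a real ML model that could be
# imported as a library or called through an API.
def classify_sentiment(text: str) -> str:
    # A real model would compute this from learned weights; this fake keeps
    # the example self-contained.
    return "positive" if "great" in text.lower() else "neutral"

# From the application's point of view, it is just a function call.
label = classify_sentiment("The new release works great")
print(label)  # "positive"
```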
LLMs are ML models where the training data, the input, and the output are words: lots, and lots, and lots of words. The input for an LLM is a series of words. Well, really a series of tokens. What’s a token? It might be a word or part of a word. The details aren’t important; what matters is that it all ends up as an array of floats the model can process.
The output is a single token, not a series. LLMs are just predictive text engines. Predicting a single token or word doesn’t seem very useful on its own, so the new token, along with everything that came before it, is fed back into the model, which predicts the next token. This repeats until the model returns a token that signals it’s done. This process is known as autoregression.
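A toy sketch of that loop follows. The predict_next_token function is a fake stand-in for a real model; the part to notice is the loop that feeds each predicted token back in until an end-of-sequence token appears.

```python
# predict_next_token stands in for a real LLM; the interesting part is the
# autoregressive loop, not the prediction itself.
def predict_next_token(tokens: list[str]) -> str:
    canned = {"The": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}
    return canned.get(tokens[-1], "<eos>")

def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = predict_next_token(tokens)  # predict one token
        if next_token == "<eos>":                # the model signals it is done
            break
        tokens.append(next_token)                # feed it back in and repeat
    return tokens

print(" ".join(generate(["The"])))  # "The cat sat down"
```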
Where should we use AI?
Now that we know AI means GenAI, and GenAI really means LLM, and an LLM is just a type of ML model, we can start to think about where that functionality makes sense within our systems.
In 2024, we’re at a point where machine learning and GenAI are no longer an afterthought. These are technologies that are quickly becoming core elements of modern software systems. How did we get here?
Not long ago, when companies started hiring data scientists and creating machine learning models, everything was an add-on solution. The data already existed, and it was used to train custom ML models. Those models were then used as components in an analytical process. Companies were running experiments, and experiments don’t always work, so they were kept separate from business-critical applications.
Now, we’re seeing the shift-left of ML because we want the resulting models to be internal components in our software, not extra add-ons in secondary systems. In the past couple of years, we’ve already seen AI components that were just add-ons, like all those chatbots, also shifting left and becoming core system components that provide functionality to users. That’s what we want to get to. We don’t want AI to be a sideshow. But what does that look like? Where does it make sense to use an AI component instead of “traditional” software written by developers?
Considering where and how to use AI in your system is like introducing any new technology to your system design. You need to do a trade-off analysis. What are the pros and cons of a design that uses AI versus one that doesn’t use AI? Or one that uses a different AI model. Consider the scenario where you’re asked to add AI to your software. How do we begin our trade-off analysis? What are the “It depends” factors?
Is AI appropriate for the scenario?
When designing for a new feature, we usually start with appropriate options and discard those that are not. Because AI is in a hype cycle, we see suggestions to use it where it isn’t relevant. Instead of AI, think of another piece of technology and how we might misuse it—a relational database as a caching engine or an email service built entirely using Lambda functions. AI is not the golden hammer we’ve been waiting for.
The ways we can use AI cover the spectrum from good ideas to possible ones to bad ones. Many of the features where AI makes sense are scenarios involving words. These are large language models, after all, so they’re great at working with language.
Good uses of AI
LLMs can provide a natural language interface. This works for both input and output. Let people describe what they’re trying to do, and the LLM can transform that into system instructions. Or the LLM can transform a system response into a more human-friendly format.
This makes sense if you have a customizable yet complicated UI, where someone can describe what they want but configuring everything takes a while. Consider an AI-powered, ad-hoc report builder. Through a chat-style interface, you request a new report that runs every Monday and provides a week-over-week comparison. That can save the time and frustration of learning how to configure all the settings through the UI.
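As a rough sketch, assuming a hypothetical call_llm client (stubbed here with a canned answer so the example runs on its own), the LLM’s only job is to translate the request into a structured configuration that the existing, deterministic report engine executes:

```python
import json

# call_llm is a placeholder: in practice it would call your LLM of choice,
# via an API or a self-hosted model. Here it returns a canned answer so the
# sketch runs end to end.
def call_llm(prompt: str) -> str:
    return '{"metric": "sales", "comparison": "week-over-week", "schedule": "monday"}'

def build_report_config(user_request: str) -> dict:
    prompt = (
        "Translate the request into a JSON report configuration with keys "
        "'metric', 'comparison', and 'schedule'. Return only the JSON.\n\n"
        f"Request: {user_request}"
    )
    config = json.loads(call_llm(prompt))  # fail fast on anything that is not JSON
    return config                          # the existing report engine runs this

print(build_report_config("Run a week-over-week sales report every Monday"))
```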
Possible uses of AI
There are some scenarios where AI makes sense, but unless it is implemented thoughtfully, the outcomes can be far from ideal. Many complex software systems are composed of critical but difficult-to-use subsystems.
In an e-commerce scenario, you may have a rules engine to define all the discounts available to customers. When configuring all the rules leads to a poor user experience, replacing the rules engine with an AI may be tempting. However, if the LLM decides whether a customer is eligible for a discount, you may be surprised by what happens.
Always keep in mind that LLMs are designed to predict the next word. At some point, when asked if a customer is eligible for a discount, that next word may be “yes” when you expect it to be “no”. That’s a feature, not a bug.
Instead of replacing the entire rules engine with an LLM, use it to improve the user experience, making it easier to enter and validate the rules.
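A minimal sketch of that division of labor might look like the following; the rule format and helper names are illustrative, and the LLM step is stubbed out:

```python
# The LLM helps express a rule in a structured form, but the deterministic
# rules engine decides eligibility. The rule format is an illustrative
# assumption, not a real product's schema.
def llm_draft_rule(description: str) -> dict:
    # In practice an LLM would turn "10% off orders over $100" into this dict.
    return {"discount_pct": 10, "min_order_total": 100.0}

def is_eligible(order_total: float, rule: dict) -> bool:
    # Plain, testable code makes the actual decision; no prediction involved.
    return order_total >= rule["min_order_total"]

rule = llm_draft_rule("Give 10% off orders over $100")
print(is_eligible(120.0, rule))  # True
print(is_eligible(80.0, rule))   # False
```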
Questionable uses of AI
The third category is where using AI is a bad idea. Most of these scenarios require auditability and data traceability, such as precise calculations or reports created for regulatory and compliance needs.
An LLM may be a good candidate for analyzing your data, summarizing what it means, and suggesting what you should do. But don’t use an LLM if you need the data to add up correctly. They are terrible at math. Are you asking the LLM to help you build a report using your reporting API? Good. Are you asking the LLM to sum up all the sales totals for the previous quarter? Bad.
There are reasons the fundamentals of accounting have not changed since they were created hundreds of years ago. You should not let an AI balance your books and report those numbers to your shareholders.
AI is non-deterministic software
The vast majority of software has deterministic outcomes. If this, then that. This allows us to write unit tests and have functional requirements. If the software does something unexpected, we file a bug and rewrite the software until it does what we expect.
However, we should consider AI to be non-deterministic. That doesn’t mean random, but there is an amount of unpredictability built in, and that’s by design. The feature, not a bug, is that the LLM will predict the most likely next word. “Most likely” does not mean “always guaranteed”.
For those of us who are used to dealing with software being predictable, this can seem like a significant drawback. However, there are two things to consider. First, GenAI, while not 100% accurate, is usually good enough. And second, architects deal with “good enough” all the time. We know that “perfect is the enemy of good enough”. Architects perform trade-off analysis and try to determine which compromises are acceptable.
When considering AI components in your system design, think about where you are okay with “good enough” answers. I realize we’ve spent decades building software that does exactly what it’s expected to do, so this may be a difficult idea to get comfortable with.
As a thought exercise, replace a proposed AI component with a human. How would you design your system to handle incorrect human input? Anything from UI validation to requiring a second person’s review. What if the User in User Interface is an AI? There should probably be similar validation for what the AI is producing.
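For example, a hypothetical guard that treats LLM output like untrusted user input might look like this sketch; the allowed actions and payload fields are assumptions for illustration:

```python
# Treat LLM output like untrusted user input: validate it before acting on it.
ALLOWED_ACTIONS = {"create_report", "update_report", "delete_report"}

def validate_llm_action(payload: dict) -> dict:
    if payload.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {payload.get('action')!r}")
    if not isinstance(payload.get("report_id"), int):
        raise ValueError("report_id must be an integer")
    return payload  # only validated instructions reach the core system

print(validate_llm_action({"action": "create_report", "report_id": 42}))
# validate_llm_action({"action": "drop_database"})  # would raise ValueError
```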
This thinking is reflected in the fact that so many products are called “assistants” or “copilots”. Those terms are intended to reassure us that humans are still making the decisions. But once you turn decision-making over to the AI, you go from a copilot to an agent. Moving to AI agents requires additional consideration of the validation process.
How to use AI effectively
If we determine that GenAI makes sense for our scenario, the next design decision is how to use it effectively. Once again, the answer is, “it depends”. Using AI effectively will always be context specific. However, some high-level considerations usually apply.
First, we must make sure that “good enough” software is acceptable for the scenario. It’s worth noting that AI output often reaches a very high level of “good”. Look at how LLMs have evolved and can now pass most standardized tests.
When possible, create tests so you can quantitatively evaluate and compare LLMs. If you have a specific task, make a set of test prompts, feed them to the LLMs, and compare the quality of their output to what you would expect an experienced person to produce.
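A bare-bones version of such a test harness might look like the sketch below, assuming each candidate model can be wrapped behind the same call signature. The keyword-based scoring is deliberately naive; real evaluations would use better metrics or human review.

```python
# Minimal evaluation harness: run the same test prompts through each model
# and compare scores. The test cases and scoring are illustrative only.
test_cases = [
    {"prompt": "Summarize: the deployment failed due to a missing config key.",
     "must_mention": ["config"]},
    {"prompt": "Summarize: latency doubled after the cache was disabled.",
     "must_mention": ["cache", "latency"]},
]

def score(answer: str, must_mention: list[str]) -> float:
    hits = sum(1 for word in must_mention if word.lower() in answer.lower())
    return hits / len(must_mention)

def evaluate(model_fn, name: str) -> None:
    total = sum(score(model_fn(case["prompt"]), case["must_mention"])
                for case in test_cases)
    print(f"{name}: {total / len(test_cases):.2f}")

# Wrap each candidate model as a function taking a prompt and returning text:
# evaluate(call_model_a, "model-a")
# evaluate(call_model_b, "model-b")
```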
Because the proposed uses of AI often replace or supplement a human, it’s helpful to do the thought exercise of a human doing the work. Would the person just need a high school diploma, or do they need specialized training, like an accountant or paralegal? If it’s the latter, look for specialized models that speak that language.
Bigger is not always better
When a new LLM is announced, the big numbers make headlines, with the number of parameters being front and center, whether 8 billion or 80 billion. However, architects know there are always trade-offs, and those extra capabilities have downsides. Instead of automatically using the latest LLM, we have to consider the intended purpose.
The largest language models are great for general-purpose tasks but may be more powerful than necessary. Larger models cost more to train and operate, whether measured in dollars, time, or carbon footprint. If a smaller model can meet your needs, it may be faster, cheaper, and accurate enough.
Smaller models can also be more specialized and trained for a specific domain. You may not be able to ask about Taylor Swift, but chances are your software doesn’t need to know about popular artists.
AI evaluation and optimization
Unless your product is a language model, training your own language model does not make sense. These are now commodity products you can purchase and should be a “buy” versus “build” solution. In some cases, it may make sense to fine-tune a model, which is analogous to sending a model to grad school. However, this only makes sense if that fine-tuned model is a major differentiator for your software products.
The preferred technique to optimize LLM performance is Retrieval-Augmented Generation (RAG). RAG complements an LLM by searching a knowledge base. This is good software architecture, combining two pieces of technology into something even better.
An LLM can provide a natural-language interface that translates user input into a better search query. The results are then passed through the LLM to generate a user-friendly response, including citing the sources. The RAG pattern suits many scenarios where product owners want to add AI to existing systems.
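A high-level sketch of that flow might look like this, with search_knowledge_base and call_llm as stubbed placeholders for your actual search index and model client:

```python
# Retrieval-Augmented Generation in miniature: retrieve documents, then let
# the LLM answer using only those documents. Both helpers are stubs.
def search_knowledge_base(query: str) -> list[dict]:
    # e.g. a vector or keyword search over your own documents
    return [{"title": "Returns policy", "text": "Items can be returned within 30 days."}]

def call_llm(prompt: str) -> str:
    return "You can return items within 30 days. (Source: Returns policy)"

def answer_with_rag(question: str) -> str:
    documents = search_knowledge_base(question)  # retrieval
    context = "\n".join(f"[{d['title']}] {d['text']}" for d in documents)
    prompt = (f"Answer the question using only the sources below and cite them.\n"
              f"Sources:\n{context}\n\nQuestion: {question}")
    return call_llm(prompt)                       # augmented generation

print(answer_with_rag("How long do I have to return an item?"))
```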
Renting versus owning
Because LLMs are commodity products, one choice is whether to access them via an API, in an LLM-as-a-service approach, or to install them on hardware you manage. When getting started, the pay-as-you-go approach makes the most sense. It allows you to run experiments, figure out what works best, and compare different models. Gather the essential trade-off data, such as quality of response, time to return a response, and cost. The experiments may also reveal that using an LLM does not make sense for your scenario, or help you adapt your use case as needed.
Once you’ve determined an LLM can be a useful component of your software, a self-hosting approach may make sense. This isn’t a simple cost calculation because many factors are involved, and you may not be able to use the same models internally that you can get through an API. Again, some experimentation may be necessary for adequate trade-off analysis.
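One way to keep the rent-versus-own decision reversible is to hide the model behind a single interface, so a hosted API and a self-hosted model are interchangeable. The sketch below uses stubbed implementations rather than any real vendor SDK:

```python
from typing import Protocol

class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedAPIClient:
    def complete(self, prompt: str) -> str:
        return "response from a pay-as-you-go API"          # stub

class SelfHostedClient:
    def complete(self, prompt: str) -> str:
        return "response from a model on our own hardware"  # stub

def summarize(client: LLMClient, text: str) -> str:
    # Callers depend on the interface, not on where the model runs.
    return client.complete(f"Summarize: {text}")

print(summarize(HostedAPIClient(), "quarterly architecture review notes"))
```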
Another factor to consider is security. When you control the hardware and the network, there are fewer concerns about data leakage. If you have to sanitize the data being sent to the LLM, that will impact the quality of the output. Also, if you want to fine-tune a model, you will likely need to commit to self-hosting.
When should an architect use AI?
In a prior talk, I covered the four fundamental skills that come into play every day for architects. The foundation of everything is communication. The second is decision-making. Third is being able to adapt to change. Fourth is leadership. These are all important, and on any given day, you may need to focus on one more than the others. But this is the baseline ranking, and each skill builds upon the ones before it.
The good news is that if you’re looking for places where AI can help you be a better architect, AI is most beneficial at the bottom of the pyramid. I’m not saying AI should replace your decision-making process or handle all of your communication, but it can augment the skills you already have and take them to the next level.
Communication boosts with AI
LLMs are great at summarizing large amounts of information. For a software architect, this is especially helpful when communicating with various audiences. Starting with a long design document, the LLM can generate different summaries for the CTO, the product owner, and the development team.
You can also ask the LLM for feedback on what you’ve written. Before you send an email or give a presentation, prompt the LLM to respond with any questions the CTO or product owner would likely ask.
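As an illustration, a single hypothetical call_llm client (stubbed here) could produce audience-specific summaries of the same document; the audience descriptions are examples, not a prescription:

```python
# call_llm is a placeholder for whatever model client you use; it returns a
# canned string so the example runs on its own.
def call_llm(prompt: str) -> str:
    return "summary tailored to the requested audience"

AUDIENCES = {
    "CTO": "two sentences focused on risk and cost",
    "product owner": "a short paragraph focused on features and timelines",
    "development team": "a bulleted list of technical decisions and open questions",
}

def summarize_for_audiences(document: str) -> dict[str, str]:
    return {
        audience: call_llm(f"Summarize the design document below as {style}.\n\n{document}")
        for audience, style in AUDIENCES.items()
    }

print(summarize_for_audiences("...long design document..."))
```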
Decision-making boosts with AI
Good architecture often comes from a collaborative process, with people standing around a whiteboard, drawing and erasing new ideas. For an architect working alone or just getting started, an LLM can act as another architect to bounce ideas off of. LLMs are good at brainstorming additional design options and can explain trade-offs.
You can even ask an LLM to write an entire architecture decision record (ADR). This can be useful for a first pass, especially if it forces you to think about and describe the requirements clearly enough to input them into the LLM. However, as with all interactions, never assume it is correct. This is especially true for designing new software components because the LLM may spit out words that sound correct but are not feasible to implement.
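A sketch of that first pass, again with a hypothetical call_llm placeholder, shows how the prompt itself forces you to state the context, decision, and options clearly:

```python
# Drafting an ADR with an LLM. The prompt structure is illustrative; always
# review the output before adopting it as a real decision record.
def call_llm(prompt: str) -> str:
    return "## Context\n...\n## Decision\n...\n## Consequences\n..."  # stub

def draft_adr(context: str, decision: str, options: list[str]) -> str:
    prompt = (
        "Write an architecture decision record with sections for Context, "
        "Decision, Options Considered (with trade-offs), and Consequences.\n"
        f"Context: {context}\n"
        f"Decision: {decision}\n"
        f"Options considered: {', '.join(options)}"
    )
    return call_llm(prompt)  # a first pass only; verify before accepting

print(draft_adr("High read load on product catalog",
                "Add a read-through cache",
                ["cache", "read replicas", "denormalized views"]))
```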
Summary
Architects need to separate the AI hype from real software that we can actually implement. There is nothing magical about GenAI and LLMs. Determining how, where, and when to use AI elements comes down to traditional trade-off analysis. One way to get more familiar with the capabilities of any new software is to use it on a regular basis. Architects can augment their decision-making and communication skills with AI tools, leading to better designs and greater understanding among team members.
This article started with Clarke’s Third Law. His second law states, “The only way of discovering the limits of the possible is to venture a little way past them into the impossible”.
There’s a lot of hype out there. I hope I’ve provided practical advice on how tangible “AI” tools can be used pragmatically in software systems. But I also know I don’t have all the answers, and I certainly don’t know what is possible.
Sometimes, it’s helpful to believe the hype and try to do the impossible. You might be surprised by what you can achieve.
This article was adapted from a presentation at the iSAQB Software Architecture Gathering 2024.