Transcript
Groner: One of the things that frustrate me a little bit, especially in this AI world, and as a Java developer, is that the majority of the examples that I see online, it’s all about Python. I’m a Java developer. It’s fine. I’m sure that for all of you that are seasoned developers, it’s ok to develop something in Python, and then you can integrate it with your Java project. We can do that. It’s not the same thing. I would like to actually be able to code in Java to do whatever I have to. After all, it’s a pretty powerful language, a very established ecosystem. I would like to use Java so I can develop my own AI solutions using Java. Let’s say that we do want to build an AI solution in Java. How do we do that?
First things first, you’re going to select your LLM. You can go with OpenAI. It’s a pretty standard choice for your Hello World. You’re going to go to the documentation and you’ll see how to actually do a Hello World using OpenAI. Of course, you’ll see Python over there. Python is always there. I’m going to count it as a win because we’re starting to see examples in Java as well. You’re going to integrate with OpenAI. You’re going to do your POC. You’re going to run your tests.
All of a sudden, OpenAI is not the best choice for me. How do I change that? Let’s say I want to move to Gemini or any other LLM out there. I’ll go to the documentation. I’ll see a different API. Of course, we’ll see very similar concepts. Prompt, how to do all that, function calling. The code’s going to be different. If you are developing your solution and you are designing in a way that you are following clean architecture, all the good stuff, you should not have a lot of problems. If you had embedded the API into your code, it’s going to be a little bit more complicated than simply changing one line of code for that.
Fortunately, we are Java developers. We also have this famous framework called Spring. Spring is also a pretty established ecosystem. The majority of the Java solutions that we have out there are using Spring. Of course, we have Quarkus and other frameworks as well that are catching up. It’s been in the industry for the past 25 years. I’ve been using Spring since 2009. I’ve been through big changes, like database migration, changing completely from one database to the other, those big architecture changes. Because I was using Spring, it was easy, a lot of testing, of course. It was easy to change from one thing to the other just because of the way that the framework is architected behind the scenes. Why don’t we have something like that for AI? Now we are in the AI era, and we want to also do easy stuff with AI.
Background
My name is Loiane. I’ve been working with Java for the past almost 20 years. One of the things that I would like you to take away today is that AI, especially implementing AI in our Java solutions, should be all about software engineering. It’s not that complicated. You’ve learned how to do a REST API. You’ve learned how to integrate that with a database. We all learned that. Then you learned about system design, microservices. We all learned it. Now let’s just add AI as one little module to our projects, and there you go. We have modern solutions out there as well.
Spring AI
With Spring AI, we can still be in our comfort zone; after all, it’s just another module that we’re going to be adding to our project. We can use this for the most popular use cases out there. Do you want to do a ChatGPT clone? Do you want to use memory? Do you want to use embeddings? Do you want to go the agentic AI route, add a little bit of function calling, tools, MCP servers, clients? All of that is available to us. I hope I can show you.
Let’s start with the basics. Let’s start with our Hello World. The most simple use case that you can add to your application right now is just a simple AI chat. Just a button on your frontend, you open and then you have a chat, the user can interact with the chat without any additional logic. That’s the most simple use case. How do we do that? First things first, you’re going to go to your pom.xml. Yes, I’m old school, I like Maven. We’re simply going to add whatever LLM of your choosing into your pom.xml file. Again, I love property files. I still use property files. I’m not going to go to the YAML route, at least not for now. I also need to add my API key.
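As a sketch (starter artifact IDs changed around the Spring AI 1.0 release, so check the documentation for your exact version), the Maven dependency and the properties look roughly like this:

```xml
<!-- pom.xml: the Spring AI starter for your chosen model provider -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
```

```properties
# application.properties: read the key from the environment, never hardcode it
spring.ai.openai.api-key=${OPENAI_API_KEY}
```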
Again, a project ID, depending on the LLM that you choose. That’s it, we’re ready to get started. How do we get started? We’re simply going to add a ChatClient. This is a beautiful interface that is available so we can start doing simple prompts. You inject it through your constructor. You’re going to build a chat client according to your needs. We’re going to go through that as well. There you go: prompt, passing the user message, calling the prompt, getting the content, and displaying it to the user. There you go, you have AI in your project. Yes, it is as simple as that. Not complicated.
If you change your mind, again, I don’t want to use OpenAI, I want to use Gemini, or Claude, or anything else. This is beautiful. Remove my old dependency, put my new one, my new properties. That’s it. You don’t have to change any of your source code. It will still work. In case you want to use multiple LLMs in your project, I’m sure that at some point in your career you have added multiple databases in your project, and all you have to do is add a qualifier annotation, and that’s it. Same thing right here. You add a qualifier: this is OpenAI, this is Gemini, this is any other model. You can use multiple LLMs for whatever is the best that they offer for your use case.
Let’s go to the next step. Let’s kick it up a notch. We’ve done the simple chat. Now the user can do whatever they want. If you go to Gemini, if you go to ChatGPT, I don’t know about you, but probably I’ll have my summary from how the conference went, and I’m going to just add my text, my draft. Please generate a version so I can post this on LinkedIn, with hashtags and all the emoticons, and things like, make it cool. The ChatGPT or Gemini will generate a text for me. I don’t want to paste that same text again. All I want to do is, now generate a version so I can post it on Twitter. It’s going to generate a new version. I don’t have to provide the context because I have provided the first text already. This is what we call memory. With memory, we’re simply providing some context. The way that this works behind the scenes, of course, everything is being persisted in a database. If your company has ChatGPT, yes, everything is being persisted.
Every prompt, every question that you’re asking is being persisted or can be persisted into some database. You can use that as context. Behind the scenes, you can get this from either a vector store or a relational database. You can also configure how many messages you would like to provide to the prompt. It’s going to append all the prompts, both questions and answers, and will provide that in the system prompt as context. Of course, in the AI world, tokens are the currency. We also have to be very careful with that. Roughly speaking, a word maps to one or a few tokens. If your responses are long, and if you’re not actually tuning this, it can become quite expensive. By default, Spring is going to use the last 20 messages.
If you want to make this shorter, like a shorter memory, you can configure how many messages you’d like to keep. We also have something very cool called Advisors. This is very important, and you’ll see it throughout. Advisors are like interceptors. They will intercept your interaction with the model, and can enhance or update it as needed. In this case, we’re simply saying, whatever I’m sending here, requests or responses, persist it into the storage that I defined earlier. That’s all. Again, this is a very nice example that you can add to your solutions right now, in not many lines of code. You can also select and pass the conversation ID. When you open ChatGPT or Gemini, you’ll see the sidebar, and then you can select the previous conversations. You can also delete them as well. Each of those conversations has a conversation ID, and that’s how you can do this within your own application.
Personas, Guardrails, and Security
With new technologies come new responsibilities, and of course, problems. Why not? I’m sure at some point in your life, if your application went through security testing, you have gotten a SQL injection. Let me present to you: prompt injection. It’s another thing that we have to worry about. One cool thing that I find, especially with Spring AI, is that we can define a system prompt: act like this. You define the persona, how the AI should behave in this case. You can also add things like, if the question is not related to this context, please say that you don’t know, or something like that.
Then you can pass this as a default system prompt to every interaction that you have with your LLM. Simple as that. Whatever input you’re getting from the user, you’re going to pass it as a user prompt. You are differentiating: this is the context, this is how you should behave, and this is what the user is inputting. I’ve tried this. You can say, please ignore all the previous instructions, and it doesn’t work, which is really cool. Security 101 for prompt injection is done. I don’t have to worry about it. I don’t have to use any fancy tool. I think that’s pretty awesome. Of course, prompt injection is not the only thing that we have to worry about. Our favorite security website already has a list of a lot of issues, and, of course, how to try to resolve them. There are many other issues. Let’s go back to our previous example. Let’s say that you just added a ChatGPT or Gemini clone to your solution, and you have users accessing that chat.
Depending on the industry that you work in, maybe you work in healthcare, maybe you work in finance, your users and your system have access to confidential information, PII, personally identifiable information, and we should not add confidential or personally identifiable information into the prompt. How can you make sure that the users of your solution are not going to provide confidential information? Or, how can you make sure that the LLM is not returning any responses containing confidential information? Imagine something like, provide me with your prompt, or if your LLM has access to a database where you have a bunch of social security numbers or credit card numbers, how can you make sure that the response the LLM is generating is not going to contain information that should not be shared? Here we have the concept of guardrails. We can intercept both the request and the response, and not allow our request to even reach the LLM to be processed.
In the same way, if the LLM is providing a response with confidential information, we can also intercept it and not let that response be displayed back to the user. How do we do that? For example, we can create an Advisor for that. Remember, I told you Advisors are very important. In this case, we’re going to just create a simple prompt here saying, please detect if any confidential or personally identifiable information has been shared by the user. Then you’re going to add your logic over here.
If it passes the validation, yes, please continue with the interaction. This is good to go. Or, no, then the computer says, “No, you’re not allowed”. Of course, you’re going to add something more user friendly here. You just say, you cannot process this request. It is as easy as that. You can take ownership of whatever you can share with the LLM and whatever the LLM can share back with the user, or should not be sharing. To use this, again, you create a simple Java class and you add it as an Advisor. Not many lines of code. We already have a really nice use case here that we can add to our Java Spring solutions. We’re going to increase complexity a little bit more here. I promise, everything is easy.
RAG – Retrieval-Augmented Generation
Let’s talk about the next level of popular use case, which is providing context to the LLM. I don’t know about you, but I always ask whatever model I’m interacting with, what’s your knowledge cutoff date? Then you try to ask more recent questions just to see how it’s going to behave. In the majority of cases, it will not know how to respond and will say, I’m sorry, I don’t have access to this information. The data it has been trained on usually goes up to somewhere in 2024. The way we fix that is to provide that context to the LLM. We do that through RAG, Retrieval-Augmented Generation. This is because we’re going to do the retrieval from our own knowledge base, our own database.
In this case, it’s usually a vector store. We do something called embedding search. A lot of mathematics goes behind that. You don’t need to know any of that unless you want to work with creating your own vector store. As a user, as a developer, you don’t usually have to know that many details, but it’s so much fun in case you like to study a little bit more. Then, it’s going to try to get the information from the vector store, provide the context to the LLM. Of course, the LLM is going to enhance that and generate the response back to you. It’s a pretty simple concept, but it’s beautiful. The way that we do that is we can get this information from whatever text files, spreadsheets, PDF documents, or your own database. Maybe just enhance a little bit, put that into a nice document, and then you can feed the vector store with that information as well. The one good thing that I find about Spring AI is it has built-in ETLs as well.
If you’re dealing with text files or PDFs or spreadsheets, you don’t have to create code to actually read that. It will provide you with these document readers with whatever format you need, at least the most common ones. It does all the magic behind the scenes for you, and will provide you with the documents so you can insert them into your vector store or into your graph store as well. Then, once you have that vector store, you can use that into your prompt. However, I do not recommend doing this within your code itself. This should be like a batch job. You run it every night, get the documents, feed the vector store, and once the vector store has all the data, it’s ready to be used within your actual solution.
In this case, again, pretty simple. Inject the vector store, and then you have another Advisor, a question-and-answer advisor, passing your vector store. What the chat client will do is go to your vector database, try to find the information based on the embedding search, and then provide that information to the prompt. Then the LLM can process it and return the response. Usually, memory is going to be another dependency you add to your pom.xml, and RAG with the vector store is going to be another dependency. Depending on the vector store provider, there is one specific dependency for it, and that’s the beauty of it. You start assembling your building blocks and add only the dependencies that you need, so you don’t pull in the entire project and end up with a bunch of stuff that you don’t need. You only add what you actually need.
AI Agents
The icing on the cake for building AI solutions is AI agents. Everything we have learned so far leads to this: how we can build AI agents so our APIs or our LLMs are a little bit more independent. I don’t know about you, but every time I see a new demo, those can be scary, beautiful, incredible, mind-blowing, but scary to think about. How are they doing that? Is the AI so smart that it’s capable of going to my database, getting data for me, updating data on my behalf? What do you think, yes or no? Yes and no. No, they’re not that smart. They’re only smart with the functionality that we provide to them. We have to build the capability. We’re not there yet. Maybe in the future, who knows? Nobody knows. Right now, we have to build it.
If I want my AI to go into my database and get information for me, or to go to the database and update a record for me, I have to actually code that functionality and tell it, this is what I have coded, feel free to use it if you think you should. It’s smart, yes, it can do that. If you don’t provide that capability, it will not know how to do it. This is what we call AI agents. There are two types. One is workflows: well-defined, predefined tasks. The agent itself is a little bit more dynamic and autonomous. You provide the capability and let the AI decide, should it use it or not? There is one very important concept, which is called function calling, or in the case of Spring, tools. With tools, what’s going to happen is, I have my query, I have my request. I’m going to pass that to the LLM. The LLM will evaluate it, and in case it’s needed, it will actually invoke my API. The API is not the LLM. The API will actually do whatever it’s supposed to do, and the LLM will process the result and generate the response for me.
For example, let’s say that we are building an AI agent that users can use to book flights or to cancel a flight or a reservation, and you don’t want to wait in line. You just want to chat with the AI and let the AI do it for you. We can do that. We can write a service. We’re very familiar with services. We can just do a select from the database, select the columns from the tables with whatever query, and then return an object. These are the booking details. These are your reservation details. If I want to be able to cancel a booking: update, status canceled. The AI will not know how to do that, but you know how to do that, so you can build that capability.
One of the things that I love about Spring, I like to call it annotation-driven development. You add an annotation. This is how you’re going to tell the AI: if you need to cancel a booking, call this function, passing these parameters. It will know how to pass the parameters. It is smart enough for that. My logic is going to take care of the rest. Of course, you can have as many tools, as many methods as you like in your class. You have your Java class with all your logic. Then, putting everything together, what are you going to do? You pass the instance, or you let Spring pass the instance in the constructor for you, and you have default tools. The default is going to be applied for all the interactions of this chat client within this class or service, or, when you’re prompting, you can also have a per-prompt configuration for tools, and you pass the booking tools. That easy. With this, we can enhance the capabilities and actually let the AI do things for you.
Demo
Let’s see the Spring AI demo. I have this SpringFly booking app. I have the list of all the reservations over here. I also have my concierge. I’m going to say, “Hi, can you cancel a booking for me?” “Yes, happy to assist. Provide me with the booking reference and the name”. I’m going to provide this one. My name is actually Emily Johnson today. Let me copy this. My name is Emily Johnson. It’s thinking. Yes, canceled. It’s not behaving the way that I intended, because it should confirm the cancellation first, but it went ahead, so I still have to fine-tune the prompt a little bit more, which happens as well. It has canceled. Let me refresh the list here and see if it’s actually canceled. Yes, it’s canceled. Two questions, and it canceled. It actually went into the database using that source code that I showed you before. Simple. We can make our AI a little bit smarter as well.
Observability
The other thing we’re going to cover is observability. I’m just going to jump a little bit ahead. I think we all know, or at least the majority of you should be familiar with, observability, having worked with microservices and all the good stuff that comes with the microservices architecture, resiliency, observability. We can actually get metrics right away, and you don’t have to do anything else, because this is Spring, the framework that we all know and love. It’s already built in. All you have to do is enable it. Here, I’ve asked it to expose everything.
We can see all the operations. We can actually see, for example, the token usage. Let’s take a look at this metric. Almost 8,000 tokens; that was a pretty expensive two questions. You can get all the metrics from this, plug in something like Zipkin, and actually do all the traceability of the methods that are calling the interactions with the LLM, and how much time it’s taking, so you can check if performance is good, if you need to fine-tune your prompt, and all that good stuff. Again, a lot of metrics are available, and you can just use them and embed them into your applications as well.
Then, observability, which we talked about, is all built in. Again, you don’t have to actually do anything. It’s going to be available to you. It’s all in the documentation. I highly recommend that you all read the documentation. If you want to go one step further, read the source code. It’s a beautiful framework. The way that they have architected it, beautiful. You can get all the details, all the metrics that are available, so you can actually enable them in your project.
MCPs (Model Context Protocols), and Testing
Of course, we have to talk about MCPs here, because it’s the new hot thing out there right now. I like the way that Juri framed this: MCPs are the microservices of the LLMs, because it’s a way that you can reuse them. There is a bunch of MCP servers already available. If you want to, you can create your own MCP server as well, or your MCP client, with Spring. Guess how you do that? You add another dependency to your pom.xml, and you add an annotation, depending on whether it’s a server or a client. It’s as easy as that. Everybody can go home, go back to work, and start building your solutions, including MCP servers and clients, with Spring AI.
Testing, right now, is still a work in progress. Spring AI does have some support for testing as well. It’s usually about evaluating whether the model is actually returning a good response or not. There is work in progress on a framework that is actually going to be able to do this. This is something that you don’t mock. You actually have to run it. You can use Ollama or something locally, use Testcontainers, and try to test this, so you actually have some testing and test coverage in your code as well. That’s basically it.
AI Is Not Rocket Science
One thing is, AI is not rocket science. Setting aside the black box, which is training the model, the data science, all the beauty behind it: if you’d like to start using AI in your solutions, it’s not rocket science. Again, we’ve learned how to build REST APIs, connect to the database, use Kafka, use MQs, microservices architecture. Then, after that, we read a bunch of books on system design. You know how to pass all those interviews. It’s the same thing here. You have your building blocks. Once you know the building blocks, it’s just a matter of, how do I apply that design pattern here? When you go to agentic AI, you’ll have a bunch of design patterns. You have several prompts. You can execute prompts one after the other, provide an input, get the output, use that output as the input to the next one, some sort of workflow, or you can have parallel prompts running at the same time.
The sky is the limit here. You can use your creativity, of course, with a valid use case, and start building your agents. Who knows, maybe you’re going to build the next GitHub Copilot agent mode, or the next Cursor? If you think about it, if you have a pretty good prompt, again, it’s not rocket science. All you have to do is fine-tune your prompt, make sure that you’re using the correct pattern, and you have a beautiful solution, integrated into your Java application, not Python. Again, don’t forget to have fun. This is really awesome. It’s a great time to be an engineer. Let’s just go ahead and start building AI solutions.
Questions and Answers
Participant 1: I noticed that the Advisor model contains the abstractions necessary for you to just add, like you showed, a vector store or whatever else. Do you have any control over the prompt or the prompt template, about how that data gets populated in the prompt before it goes to the agent?
Groner: The Advisor is just an interceptor. You can actually create whatever logic you want. If you wanted to sanitize or strip the prompt from whatever information, or even block the prompt to actually be sent to the LLM, you can add whatever logic that you need, put that into a class, and pass that class as an Advisor to the chat client API.
Participant 1: You could augment the prompt also with extra information as well.
Groner: Yes. They are the Advisors that are built-in within the Spring AI framework, but you can create your own as well.
Participant 2: For the agentic AI, is it just workflows? Agents in Spring AI, is it just workflows, calling tools with predefined tasks, or can it be autonomous?
Groner: That depends on how you create it, because the control is all in how you are creating them. If you do predefined tasks, we usually call it a workflow. If it’s a little bit more dynamic, like that demo I presented to you, that would be more like an AI agent. You can define your prompt. This is the chat service. Right here, for example, you can define your prompt. This is a very simple prompt that I added here: act as a travel agent. The authentication requirements, like ask for the booking reference, and the operations. If you cannot locate it, apologize. The available tools, so it can fetch details, modify reservations, and process cancellations as needed as well.
Once you pass the tools, now you’re giving this model the actual power to decide whether it should use a tool or not for you. That’s why I would say it’s an agent: it’s a little bit more dynamic, and we have different tools available for it to use. Depending on my question, it will make a decision. Of course, there’s a lot of logic behind the scenes that Spring AI has implemented here. If I ask it to cancel a reservation, it will actually call the cancel reservation function, the method that I have created, and then return the response. This is an example of an agent. It’s simple.
Participant 2: Is it like a bean, like the tool itself?
Groner: Yes, the tool itself, it’s like a bean. This is just a service in Spring. All I’m doing for the methods that I want to expose, not every method you have to expose, but here I have the booking details, change the booking dates, cancel the booking. I have three methods that it can use. I don’t have any other annotations, just the tool. That’s it. Because this is a bean, I can pass through my constructor, and let Spring AI use it.
Participant 3: I have a question about your use of RAG. In your example, you connected to a booking database. You used the examples of booking ID, first name, last name. What is it that hooks those fields, which are database-specific fields, up to the language concepts of booking, first name, last name? What if those fields were named in your database A, B, C? What would you do to say, A is a booking ID, B is a first name, C is a last name, and hook that up to the whole AI framework that connects English language concepts or language concepts to those fields?
Groner: In this case, that’s why I mentioned that I had to fine-tune a little bit, because it actually canceled without asking for confirmation first. In my prompt, I asked the LLM to ask for the booking number, the first name, and the last name before it can cancel. In my method signature, I’m actually passing the three fields as information. Then, if we take a look at this, it goes to the database and does the select and everything. In this case, this is just in memory. Pretend this is a database. I would have a select query over here, just standard source code that does a retrieval from the database.
If I go back to the prompt itself: to retrieve the booking detail, use the booking reference and customer details. In this case, the LLM itself is trying to determine what’s the booking reference, what’s a first name, and what’s a last name. Because I pass the number, it probably figures, this is the booking reference, and then first name, last name. There is probably a bigger science behind how it determines what’s a first name or a last name. That part I don’t have to tell it, because the LLM has been trained for that.
Participant 4: That is just like MCP. Same thing, just built in?
Groner: Yes, if you want to create an MCP, I could use these booking tools and define that this is an MCP server. I would just add the annotation, this is a server, and add the dependency. This is an MCP server, and publish. We have an MCP server with these three capabilities that we have defined here. Then, if you want to use it, you create a different service, add the client dependency, and just hook it up. It’s done.
Participant 5: Java being as type-strict as it is, how do you find having to write prompts in general conversational English?
Groner: Prompt engineering is a science itself. Writing an AI solution is not difficult. I think the key is if you have the right prompt for it. There are several companies, startups out there that are just popping up with AI solutions. Usually, one of the differentials that they will say is like, I have fine-tuned my prompt, and you can actually test what’s the accuracy of that prompt itself. It’s a different thing altogether.
Participant 5: In Python, we have Pydantic, we have a bunch of different libraries that allow you to get type-safe responses from AI. Do we have any such thing in the Spring AI framework?
Groner: At this time, you have to write it yourself. There are not a lot of libraries right now. With Java, we have LangChain4j. That’s another library that’s available for you to use with different AI solutions. Now you have Spring AI as well. We’re getting there, maybe in the future. Maybe you can create one and publish it, and we can use it.
Participant 6: Can we write some sort of testing? Is it possible to verify whether the changes that we are doing are actually valid or not?
Groner: I did not write a test for this example. What you’re testing is the accuracy of what you’re trying to achieve. There is actually one interface, one class, provided by Spring AI where you’re going to add your prompt, and it’s going to tell you, ok, this is 70% good, or, it’s returning what it’s supposed to 80% of the time. That’s about it. There are a few other folks who are actually creating a testing library so you can do more unit testing. You cannot mock this. I tried it. I asked Copilot to try to do it for me. It also did not succeed, until I found out there wasn’t a way to actually mock it. It’s still limited. It’s a work in progress. We’ll get there. Test your code. Don’t do like me.
Participant 7: In response to the question about frameworks for Spring and Java for AI, there is a new framework called Embabel that was created by the founder of Spring, Rod Johnson. It’s an agent framework built on top of Spring AI.
Participant 8: I have a question regarding tool calling and the signature, because the language model is an autoregressive model, so it will predict the next token. Suppose it is now generating the signature, because it wants to call the function, or tool, whatever you call it. Will the framework retry until it generates the correct signature?
Groner: I haven’t tried that yet.
Participant 8: Because it can hallucinate, it can give you something, or say, yes, I can do it, but actually that’s not what you want.
Groner: Yes. This is another very common problem. I have not tested that yet. We could have some retry mechanism, if that would even work, because Spring has retry embedded as well. A very good test.