Key Takeaways
- As a Java developer, you don't need to learn another language to get started writing AI-infused applications.
- Java developers can use the open-source project, LangChain4j, to manage interactions between Java applications and large language models (LLMs), such as storing and managing chat memory to keep requests to the LLM efficient, focused, and less expensive.
- Using LangChain4j with Quarkus simplifies interacting with LLMs, and you also benefit from Quarkus's developer joy: dev mode, the Dev UI, and easy observability tool integration.
- Java is battle-tested, with a robust, enterprise-ready ecosystem (think performance and security) that will help you succeed in writing and running production-ready AI-infused applications in Java.
- Get started learning the basic concepts of writing AI-infused applications in Java with LangChain4j and Quarkus. Try it out for yourself by creating a simple chatbot application and get ahead of the curve in the rapidly evolving field of AI.
Artificial intelligence (AI) is becoming increasingly pervasive. As an Enterprise Java developer you might be wondering what value AI can add to your business applications, what tools Java provides to easily do that, and what skills and knowledge you might need to learn. In this article, we equip you with the basic knowledge and skills that you need to start exploring the capabilities of AI to build intelligent and responsive Enterprise Java applications.
When we talk about AI in this article, we mean getting responses from a large language model (LLM) based on a request that the Java application sends to the LLM. In our article’s example, we create a simple chatbot that customers can ask for planetary tourist destination recommendations, and then use to book a spaceship to visit them. We demonstrate using Java frameworks like LangChain4j with Quarkus to efficiently interact with LLMs and create satisfying applications for end-users.
Hello (AI) World: Getting an LLM to Respond to a Prompt
The first version of our spaceship rental application is a chatbot that interacts with customers using natural language. It should answer any customer questions about planets they wish to visit in the solar system. For the full application code, see the spaceship rental step-01 directory in the GitHub repository.
The chatbot sends the customer's questions to the application, which interacts with the LLM to process the natural language questions and respond to the customer.
For the AI-related parts of the application, we create just two files:
- An AI service, CustomerSupportAgent.java, which builds a prompt informing the LLM about our solar system's planets and instructs the LLM to answer questions from customers.
- A WebSocket endpoint, ChatWebSocket.java, which receives the user's messages from the chatbot.
AI services are Java interfaces that provide a layer of abstraction. When using LangChain4j, these interfaces make LLM interaction easier. AI services are an integration point, so in a real application you would need to consider the security, observability, and fault tolerance of the connections and interactions with the LLM. As well as handling LLM connection details (stored separately in the application.properties configuration file), an AI service builds the prompts and manages chat memory for the requests it sends to the LLM.
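To make the connection details concrete, here is a sketch of what application.properties might contain, assuming the Quarkus LangChain4j OpenAI extension; the sample application's actual provider and configuration may differ:

# Hypothetical connection settings for the OpenAI provider; adjust for your model provider.
quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.chat-model.model-name=gpt-4o-mini
quarkus.langchain4j.openai.chat-model.temperature=0.3
quarkus.langchain4j.openai.timeout=60s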
The prompt is built from two pieces of information in the AI service: the system message and the user message. System messages are typically used by developers to give the LLM contextual information and instructions for handling the request, often including examples that you want the LLM to follow when generating its response. User messages provide the LLM with application user requests.
The CustomerSupportAgent interface is registered as the AI service in the application. It defines the messages used to build the prompt and sends the prompt to the LLM:
@SessionScoped
@RegisterAiService
public interface CustomerSupportAgent {
@SystemMessage("""
You are a friendly, but terse customer service agent for Rocket's
Cosmic Cruisers, a spaceship rental shop.
You answer questions from potential guests about the different planets
they can visit.
If asked about the planets, only use info from the fact sheet below.
"""
+ PlanetInfo.PLANET_FACT_SHEET)
String chat(String userMessage);
}
Let's look at what this code is doing. The @SessionScoped annotation maintains the session for the duration of the WebSocket connection and keeps the chat memory for the duration of the conversation. The @RegisterAiService annotation registers an interface as an AI service; LangChain4j automatically implements the interface. The @SystemMessage annotation tells the LLM how to behave when responding to the prompt.
When the end user types a message in the chatbot, the WebSocket endpoint passes the message to the chat() method in the AI service. There is no @UserMessage annotation specified in our AI service interface, so the AI service implementation automatically creates a user message from the chat() method's parameter value (in this case, the userMessage parameter). The AI service adds the user's message to the system message to build a prompt that it sends to the LLM, and the response from the LLM is then displayed in the chatbot interface.
Note that, for readability, the planet information has been placed in a separate PlanetInfo class. Alternatively, you could place the planet information directly in the system message.
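The PlanetInfo class itself is not shown in this article; as an illustration only, an abridged sketch might look like the following (the fact sheet in the sample application covers all of the planets in more detail):

public class PlanetInfo {

    // Abridged, hypothetical fact sheet; the sample application's version is more complete.
    public static final String PLANET_FACT_SHEET = """
            Mercury: the smallest planet; no moons; extreme temperature swings.
            Venus: thick, toxic atmosphere; the hottest planet in the solar system.
            Mars: home to Olympus Mons, the largest volcano in the solar system.
            """;
}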
The ChatWebSocket class defines a WebSocket endpoint for the application's chatbot UI to interact with:
@WebSocket(path = "/chat/batch")
public class ChatWebSocket {
private final CustomerSupportAgent customerSupportAgent;
public ChatWebSocket(CustomerSupportAgent customerSupportAgent) {
this.customerSupportAgent = customerSupportAgent;
}
@OnOpen
public String onOpen() {
return "Welcome to Rocket's Cosmic Cruisers! How can I help you today?";
}
@OnTextMessage
public String onTextMessage(String message) {
return customerSupportAgent.chat(message);
}
}
The ChatWebSocket class uses constructor injection to automatically obtain a reference to the CustomerSupportAgent AI service. When the end user types a message in the chatbot, the onTextMessage() method passes the message to the AI service's chat() method.
For example, if the user asks, "What's a good planet to visit if I want to see volcanoes?", the application responds with a recommendation and explains why a volcano fan might like to visit:
The Spaceship Rental application chatbot.
Providing an Illusion of Memory
As you continue your conversation with the chatbot, it might seem as though it is aware of previous messages exchanged, that is, the context of your conversation. When you talk to another person, you take for granted that they remember what you (and they) last said. Requests to an LLM are stateless, though, so each response is generated solely based on the information contained within the request prompt.
To maintain context in a conversation, the AI service uses chat memory, through LangChain4j, to store prior user messages and the chatbot's responses. By default, the Quarkus LangChain4j extension stores the chat in memory, and the AI service manages the chat memory (for example, by dropping or summarizing the oldest messages) as needed to remain within the memory limits. LangChain4j by itself would require you to first configure a memory provider, but that is not needed when using the Quarkus LangChain4j extension (a sketch of that manual wiring follows at the end of this section). This gives end users a practical illusion of memory and improves the user experience: they can send follow-on messages without needing to repeat everything they previously said. The chatbot experience can also be improved by streaming the responses from the LLM.
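As mentioned above, here is a minimal sketch of the manual wiring that plain LangChain4j would need without the Quarkus extension: you build the AI service yourself and supply a chat memory provider. The builder and model class names come from LangChain4j's AiServices API and can vary slightly between versions:

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;

public class ManualAgentFactory {

    // Hypothetical factory; the Quarkus LangChain4j extension performs this wiring automatically.
    public static CustomerSupportAgent create(String apiKey) {
        OpenAiChatModel model = OpenAiChatModel.builder()
                .apiKey(apiKey)
                .modelName("gpt-4o-mini")
                .build();

        return AiServices.builder(CustomerSupportAgent.class)
                .chatLanguageModel(model)
                // Keep only the 10 most recent messages in each conversation's chat memory.
                .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
                .build();
    }
}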
Streaming Responses for a More Responsive User Experience
You might notice that responses in the chat window take time to generate and then appear all at once. To improve the chatbot's perceived responsiveness, we can modify the code to return each token of the response as it is generated. This approach, called streaming, allows users to start reading a partial response before the entire response is available. For the full application code, see the GitHub spaceship rental step-02 directory.
Changing our application to stream the chatbot response is easy. First, we'll update the CustomerSupportAgent interface to add a method that returns an instance of the SmallRye Mutiny Multi<String> interface:
@SessionScoped
@RegisterAiService
@SystemMessage("""
You are a friendly, but terse customer service agent for Rocket's Cosmic Cruisers, a spaceship rental shop. You answer questions from potential guests about the different planets they can visit. If asked about the planets, only use info from the fact sheet below.
"""
+ PlanetInfo.PLANET_FACT_SHEET)
public interface CustomerSupportAgent {
String chat(String userMessage);
Multi<String> streamChat(String userMessage);
}
Moving the @SystemMessage annotation to the interface means that the annotation doesn't have to be added to each of the methods in the interface. The streamChat() method returns the LLM's response to the chat window one token at a time (instead of waiting to display the full response all at once).
We also need to call the new streamChat() method from a WebSocket endpoint. To preserve both batch and stream functionality, we create a new ChatWebSocketStream class that exposes the /chat/stream WebSocket endpoint:
@WebSocket(path = "/chat/stream")
public class ChatWebSocketStream {
private final CustomerSupportAgent customerSupportAgent;
public ChatWebSocketStream(CustomerSupportAgent customerSupportAgent) {
this.customerSupportAgent = customerSupportAgent;
}
@OnOpen
public String onOpen() {
return "Welcome to Rocket's Cosmic Cruisers! How can I help you today?";
}
@OnTextMessage
public Multi<String> onStreamingTextMessage(String message) {
return customerSupportAgent.streamChat(message);
}
}
The customerSupportAgent.streamChat() call invokes the AI service to send the user message to the LLM.
After making some minor tweaks to the UI, we can now toggle streaming on and off in our chatbot:
The application with the new streaming option enabled.
With streaming enabled, each token (each word, or part-word) produced by the LLM is immediately returned to the chat interface.
Generating Structured Outputs From Unstructured Data
Up to this point, the LLM's outputs have been intended for the application's end user. But what if, instead, we want the LLM's output to be used directly by our application? When the LLM responds to a request, the AI service that mediates the interaction with the LLM can return structured outputs: formats more structured than a plain String, such as POJOs, lists of POJOs, and primitive types.
Returning structured outputs significantly simplifies the integration of an LLM’s output with your Java code because it enforces that the output received by the application from the AI service maps to your Java object’s predefined schema. Let’s demonstrate the usefulness of structured outputs by helping the end user select a spaceship from our fleet that meets their needs. For the full application code, see the GitHub spaceship rental step-03 directory.
We begin by creating a simple Spaceship record to store information about each individual spaceship in the fleet:
record Spaceship(String name, int maxPassengers, boolean hasCargoBay, List<String> allowedDestinations) {
}
Similarly, to represent the user's query about the spaceships in our fleet, we create a SpaceshipQuery record, which is based on the information the user provided in the chat:
@Description("A request for a compatible spaceship")
public record SpaceshipQuery(int passengers, boolean hasCargo, List<String> destinations) {
}
The Fleet class populates several Spaceship objects and provides a way to filter out those that do not match the user's requirements.
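The Fleet class is not reproduced here in full; a rough sketch using invented data might look something like this (the sample application defines its own fleet and filtering logic):

import java.util.List;

public class Fleet {

    // Invented example data; the sample application defines its own fleet.
    private static final List<Spaceship> SPACESHIPS = List.of(
            new Spaceship("Stardust Cruiser", 6, true, List.of("Mars", "Jupiter", "Saturn")),
            new Spaceship("Nebula Hopper", 2, false, List.of("Mercury", "Venus", "Mars")));

    // Keep only the spaceships that can carry the requested passengers and cargo
    // and are allowed to travel to all of the requested destinations.
    public static List<Spaceship> findCompatibleSpaceships(SpaceshipQuery query) {
        return SPACESHIPS.stream()
                .filter(s -> s.maxPassengers() >= query.passengers())
                .filter(s -> !query.hasCargo() || s.hasCargoBay())
                .filter(s -> s.allowedDestinations().containsAll(query.destinations()))
                .toList();
    }
}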
Next, we update the CustomerSupportAgent interface to take the user's message (unstructured text) and create a structured output in the form of the SpaceshipQuery record. To accomplish this, we only need to set the return type of a new extractSpaceshipAttributes() method in our AI service to SpaceshipQuery:
SpaceshipQuery extractSpaceshipAttributes(String userMessage);
Under the covers, LangChain4j automatically generates a request to the LLM including a JSON schema representation of the desired response. LangChain4j deserializes the JSON-formatted response from the LLM and uses it to return a SpaceshipQuery record, as requested.
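For example, for a user message such as "There are four of us and we want to bring our rover cargo to Mars", the LLM's JSON response might look something like the following (illustrative values only), which LangChain4j then maps onto the SpaceshipQuery record:

{
  "passengers": 4,
  "hasCargo": true,
  "destinations": ["Mars"]
}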
We also need to know whether the user's input is about one of our spaceships or about some other topic. This filtering is accomplished using a simpler structured-output request that returns a boolean:
@SystemMessage("""
You are a friendly, but terse customer service agent for Rocket's Cosmic Cruisers, a spaceship rental shop.
Respond with 'true' if the user message is regarding spaceships in our rental fleet, and 'false' otherwise.
""")
boolean isSpaceshipQuery(String userMessage);
Our last addition to the CustomerSupportAgent interface enables the agent to provide a spaceship suggestion based on our fleet and the user's request, with and without streaming:
@UserMessage("""
Given the user's query regarding available spaceships for a trip {message}, provide a well-formed, clear and concise response listing our applicable spaceships.
Only use the spaceship fleet data from {compatibleSpaceships} for your response.
""")
String suggestSpaceships(String message, List<Spaceship> compatibleSpaceships);
@UserMessage("""
Given the user's query regarding available spaceships for a trip {message}, provide a well-formed, clear and concise response listing our applicable spaceships.
Only use the spaceship fleet data from {compatibleSpaceships} for your response.
""")
Multi<String> streamSuggestSpaceships(String message, List<Spaceship> compatibleSpaceships);
}
Our last step is to update the ChatWebSocket and ChatWebSocketStream classes to first check if the user's query is about spaceships in our fleet. If so, the customer support agent creates a SpaceshipQuery record by extracting the information from the user's message and then responds with suggested spaceships from the fleet that are compatible with the user's request. The updated code is similar for both the ChatWebSocket and ChatWebSocketStream classes, so only the ChatWebSocket class is shown here:
@OnTextMessage
public String onTextMessage(String message) {
boolean isSpaceshipQuery = customerSupportAgent.isSpaceshipQuery(message);
if (isSpaceshipQuery) {
SpaceshipQuery userQuery = customerSupportAgent.extractSpaceshipAttributes(message);
List<Spaceship> spaceships = Fleet.findCompatibleSpaceships(userQuery);
return customerSupportAgent.suggestSpaceships(message, spaceships);
} else
return customerSupportAgent.chat(message);
}
With these updates, the customer support agent is ready to use the structured outputs to provide the user with spaceship suggestions:
The application providing the user with spaceship suggestions based on the structured output.
With that, we have completed an AI-infused Java chatbot application that provides planetary tourism recommendations and spaceship rentals.
To continue learning, experiment with the full code of our sample application alongside the Quarkus with LangChain4j docs.
More on These AI Concepts
We’ve discussed various AI concepts throughout this article. If you want to know more about any of them, here is a quick explainer.
Large Language Models (LLMs)
When we talk about AI in this article, we generally mean getting responses from a large language model. LLMs are machine learning models that are trained to generate a sequence of outputs based on a sequence of inputs (often text inputs and outputs, but some multi-modal LLMs can work with images, audio or video). LLMs can perform a wide variety of tasks, such as summarizing a document, translating between languages, fact extraction, writing code, etc. This task of creating new content from the input is what’s referred to as Generative AI, or GenAI. You can infuse such capabilities into your application as needed.
Making Requests to LLMs: Prompts, Chat Memory, and Tokens
How you request information from an LLM influences not only the response you get back from the LLM, but also the end user's experience and the application's running costs.
Prompts
Sending a request to an LLM, whether from application code or as an end user in a chat interface, involves writing a prompt. A prompt is the information (usually, but not always, text) to which the LLM responds. If you think of communicating with an LLM like communicating with another person, how you phrase your request is important to making sure the other person (or the LLM, in this case) understands what you want to know. For example, you should give the context of the request before asking for a specific piece of information, and avoid including lots of irrelevant information that would confuse the listener.
Chat Memory
Unlike when you are talking to another person, LLMs are stateless and don’t remember the previous request, so everything you need the LLM to take into consideration needs to be in your request: the prompt, any previous requests and responses (the chat memory), and any tools you provide to help the LLM respond. However, providing too much information to the LLM in the prompt can potentially complicate the request. It can also be costly.
Tokens
LLMs convert the words in your prompt into a sequence of tokens. Most hosted LLMs charge usage based on the number of tokens in the request and response. A token can represent a whole word or a part of a word. For example, the word “unbelievable” is typically split into multiple tokens: “un”, “bel”, and “ievable”. The more tokens that you include in the request, especially when you include all the chat memory, the greater the potential cost of running the application.
Providing all the chat memory in a request can make requests both costly and less clear. Requests to LLMs are limited in length, so it’s important to manage the chat memory and how much information is included in the request. This can be helped a lot by the Java frameworks that you use, such as LangChain4j with Quarkus, which we use for the sample application in this article.
LangChain4j and Quarkus Frameworks
LangChain4j is an open-source Java framework that manages interactions between Java applications and LLMs. For example, LangChain4j, through the concept of AI services, stores and helps you to manage chat memory, so that you can keep requests to the LLM efficient, focused, and less expensive.
Quarkus is a modern, cloud-native, open-source Java framework optimized for developer productivity, running in containerized environments, and fast startup with low memory usage. The LangChain4j extensions to Quarkus simplify configuration of connecting to and interacting with LLMs in AI-infused Java applications.
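For example, assuming you connect to an OpenAI-compatible model, the extension is added to a Quarkus application with a single build dependency from the Quarkiverse (check the project documentation for the current version):

<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-openai</artifactId>
    <version>${quarkus-langchain4j.version}</version>
</dependency>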
The LangChain4j project can be used with other Java application frameworks, including Open Liberty, Spring Boot, and Micronaut. MicroProfile and Jakarta EE are also working together with LangChain4j to provide an open standards-based programming model for developing AI applications.
The Sample Application
You can find the complete sample application that we demonstrate throughout this article in GitHub. The application is written in Java and runs on Quarkus using the Quarkus LangChain4j extensions.
Conclusion
Infusing AI into Java applications enhances the application’s capabilities and the end-user’s experience. With the help of Java frameworks like Quarkus and LangChain4j to simplify interactions with LLMs, Java developers can easily infuse AI into business applications.
Writing AI-infused applications in Java means you’re working in Java’s robust, enterprise-ready ecosystem, which not only helps you to easily interact with AI models, but also makes it easy for the applications to benefit from enterprise essentials such as performance, security, observability, and testing.
The field of AI is rapidly evolving. By mastering the concepts and technologies in this article, you can stay ahead of the curve and start exploring how AI can help you build intelligent and engaging Java applications. Experiment with the full code of our sample application alongside the Quarkus with LangChain4j docs.
If you’d like to learn more, try this tutorial on how to extend the knowledge of the LLM with content from PDF documents by using retrieval augmented generation (RAG): Build an AI-powered document assistant with Quarkus and LangChain4j.
Thanks to Red Hatters Clement Escoffier, Markus Eisele, and Georgios Andrianakis for valuable review comments.