Key Takeaways
- The retrieval-augmented generation (RAG) paradigm allows you to overcome the limitations of static language models by combining generation with the retrieval of information from corporate databases, ensuring accurate and transparent responses.
- Spring Boot and Spring AI make it possible to integrate artificial intelligence models into enterprise contexts, applying established patterns and supporting multiple providers without invasive changes to the code or technology stack.
- MongoDB Atlas natively supports vector search, eliminating the need for specialized databases and enabling semantic searches directly within an already established infrastructure.
- OpenAI models specialized for embedding and generation make it possible to transform text into vector representations and produce context-aware responses, providing several options to balance cost, speed, and accuracy according to requirements.
- The implementation presented demonstrates how these technologies can be combined to create a sentiment-based music recommendation system built on ingestion, embedding, semantic search, and reranking pipelines; the same approach can be applied and extended to numerous other sectors.
RAG Pipeline
Retrieval-augmented generation (RAG) is one of the most interesting architectural patterns in modern AI applications. It is not just a technological evolution and trend, but a paradigm that allows us to overcome some of the main limitations of generative large language models (LLMs) and effectively integrate them into existing business systems.
LLMs, such as those provided by OpenAI, have proven to be powerful in generating coherent, creative, and seemingly “intelligent” text. However, they have significant limitations when applied to enterprise contexts: their knowledge is static, limited to their training data, and often lacks context about a company’s internal data.
In these scenarios, the RAG approach is an ideal strategy: instead of relying solely on a model’s pre-trained knowledge, RAG combines the language generation capabilities of an LLM with a controlled corporate knowledge base, often fed by proprietary documents, structured datasets, and verified sources.
The key idea is simple, but fundamentally important. Instead of asking the model to invent every response from its past training, the system first runs a retrieval stage that searches an up-to-date company database, indexed using numerical embeddings, for relevant documents. These documents are then provided to the generative model as context for generation, improving accuracy, relevance, and transparency.
Figure 1: RAG Pipeline
This architecture is particularly appealing to enterprise contexts for several reasons. First, it avoids the enormous cost and complexity of fine-tuning models on specific proprietary datasets, allowing for the integration of modular sources. The engine that handles data retrieval can be independently updated and queried in real time, providing responses that reflect the current state of the data. The modular nature of RAG also allows you to maintain control, security, and governance over your data, which is crucial when managing sensitive information.
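To make the flow in Figure 1 concrete, here is a minimal retrieve-then-generate sketch using Spring AI abstractions. It is a simplified illustration rather than the final implementation (which is shown later in the article), and accessor names such as getText() vary slightly across Spring AI versions.

// Assumes an auto-configured VectorStore and a ChatClient built from the configured ChatModel
String question = "songs about nostalgia";

// 1. Retrieval: semantic search over the indexed knowledge base
List<Document> context = vectorStore.similaritySearch(
        SearchRequest.builder().query(question).topK(5).build());

// 2. Generation: pass the retrieved documents to the LLM as grounding context
String answer = chatClient.prompt()
        .system("Answer using only the provided context.")
        .user(u -> u.text("Context:\n{context}\n\nQuestion: {question}")
                .param("context", context.stream()
                        .map(Document::getText)   // getContent() in older Spring AI versions
                        .collect(Collectors.joining("\n---\n")))
                .param("question", question))
        .call()
        .content();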
In this article, we will build a prototype mood-based music recommendation system, utilizing a technology stack that is a staple in many enterprise companies. The idea is to use a concrete use case to demonstrate how we can build something powerful and scalable by leveraging existing skills and knowledge, with widely available technologies such as Spring Boot, MongoDB, and OpenAI.
Spring Boot + Spring AI
The effectiveness of a RAG pipeline depends not only on the potential of AI models and vector stores, but also on the robustness of the application layer on which the solution is based, as well as the quality and simplicity of integration between the various necessary components. In this context, Spring Boot and Spring AI represent the ideal combination that allows you to orchestrate processes, integrate different LLMs, and maintain a common standard of maintainability and scalability at the same time.
The Spring framework has dominated the Java enterprise ecosystem for over two decades, demonstrating a unique ability to evolve, often anticipating market needs. From its inception as a “lightweight” alternative to EJBs, through the era of microservices with Spring Boot, to cloud-native transformation with Spring Cloud, the framework has constantly redefined the way complex architectural challenges are addressed.
Spring AI represents the latest and most ambitious chapter in this evolution. It is not a trivial library that simplifies and wraps API calls to AI services, but a new conceptualization of how artificial intelligence can be organically integrated into the Java ecosystem. Spring AI stems from the awareness that the adoption of artificial intelligence within enterprise contexts is not a technological problem, but rather an organizational and skills issue. Companies have teams of Java developers with years of experience in the Spring ecosystem, established infrastructures, well-oiled operational processes, and compliance constraints due to regulations and standards. Asking these companies to completely revolutionize their technology stack is probably unrealistic and perhaps even counterproductive.
Spring’s established patterns are applied to the world of AI. The principle of Inversion of Control (IoC), for example, becomes the basis for working with multiple AI providers. Within a project, you could start prototyping with OpenAI, migrate to Azure OpenAI for compliance requirements, and then evaluate on-premises solutions such as Hugging Face, an open-source platform that provides tools and pre-trained models for AI and natural language processing, for particularly sensitive data. With Spring AI, these changes become configuration changes rather than new implementations. Embedding models, chat models, image models, and vector stores are therefore not just technical abstractions, but contracts that allow you to work with AI not in terms of integration, but in terms of business logic, freeing up time to create value.
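For example, switching provider is largely a matter of swapping the starter dependency and a handful of properties. The snippet below is illustrative: the OpenAI entries match those used later in this article, while the Azure OpenAI entries are shown only as an assumed example of the equivalent configuration.

# Prototype against OpenAI
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini

# Later, after swapping in the Azure OpenAI starter, the equivalent (illustrative) configuration:
# spring.ai.azure.openai.api-key=${AZURE_OPENAI_API_KEY}
# spring.ai.azure.openai.endpoint=https://my-resource.openai.azure.com/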
Figure 2: Spring AI Ecosystem
MongoDB Atlas as a Vector Store
The introduction of vector search in MongoDB Atlas represents one of the most significant developments in the NoSQL database landscape; it marks the transformation from a document database into a unified platform capable of simultaneously managing structured data, semi-structured data, and high-dimensional vector representations.
Traditionally, the introduction of RAG systems required the addition of new technologies within the existing enterprise stack: specialized vector databases such as Pinecone, Weaviate, or Chroma. For organizations that have already invested heavily in MongoDB as their primary database, adding new technologies represents a technological and operational overhead, requiring new skills, monitoring, and management processes in production.
On a technical level, MongoDB Atlas Vector Search implements the Hierarchical Navigable Small World (HNSW) algorithm, considered state-of-the-art in approximate nearest neighbour (ANN) search in high-dimensional spaces. The choice of this algorithm is not random, but stems from the fact that it offers an excellent trade-off between accuracy, performance, and memory consumption.
HNSW indexes are automatically distributed across all nodes in the cluster, enabling transparent and automatic horizontal scalability. In addition, vector similarity queries can be combined with traditional queries in a single operation, implementing what is known as hybrid search. MongoDB Atlas supports vector embeddings with dimensionality up to 4096, covering virtually all embedding models currently used in production. Vectors are saved as float32 arrays, optimized to minimize storage and bandwidth during similarity search operations.
Figure 3: MongoDB Vector Store
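In practice, vector search is enabled by defining a search index on the collection. The definition below is a sketch of the shape such an index takes: the numDimensions value assumes text-embedding-3-large (used later in this article), and the additional filter field illustrates how hybrid search can combine vector similarity with metadata filters. As we will see, Spring AI can also create this index automatically when schema initialization is enabled.

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 3072,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "metadata.genre"
    }
  ]
}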
A particularly interesting aspect of MongoDB Vector Store is the possibility of having multiple vector representations within the same document, thus allowing the implementation of multimodal search strategies without the need for synchronization between objects and/or systems.
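As a purely hypothetical example, a single document could carry both a lyrics embedding and a cover-art embedding, each indexed under its own path and queried independently:

{
  "_id": "…",
  "title": "Circles",
  "lyricsEmbedding": [0.021, -0.013, 0.045, …],
  "coverArtEmbedding": [0.104, 0.067, -0.002, …],
  "metadata": { "artist": "Post Malone" }
}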
OpenAI Chat and Embedding Model
OpenAI provides a series of models that can be used in RAG architectures, combining semantic search with natural language generation, to support the development of intelligent, extensible, and reliable systems.
Embedding models are designed to transform texts into dense numerical vectors capable of expressing and capturing semantic relationships between concepts. This design allows searches to be based on contextual meaning rather than keywords alone, thereby increasing the relevance and accuracy of results.
OpenAI currently offers two embedding models:
- text-embedding-3-small produces compact and lightweight embeddings (1536 dimensions), suitable for scenarios with high data volumes and cost optimization requirements
- text-embedding-3-large produces richer and more accurate embeddings (3072 dimensions), suitable for contexts that require maximum precision in semantic similarity
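In Spring AI, the configured model is exposed through the EmbeddingModel abstraction. The sketch below shows the assumed usage: embed() returns a float[] in recent Spring AI versions (a List<Double> in older ones), and the vector length depends on the chosen model.

// EmbeddingModel is auto-configured by the OpenAI starter
@Autowired
EmbeddingModel embeddingModel;

public void embeddingDemo() {
    // Turn a sentence into a dense vector capturing its meaning
    float[] vector = embeddingModel.embed("A melancholic song about lost love");
    System.out.println("Embedding size: " + vector.length); // e.g. 3072 for text-embedding-3-large
}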
OpenAI’s conversational LLMs (chat models) come into play during the generation and refinement phase. Once the most relevant documents have been retrieved from the vector store, the selected model produces a final response in natural language, consistent with the request made and contextualized with respect to the database used. OpenAI provides:
- gpt-4o-mini, a model optimized for speed and reduced costs, an ideal choice for RAG pipelines with high volumes of requests
- gpt-4o, a more powerful model than the previous one, suitable for enterprise scenarios with greater accuracy requirements and complex reasoning needs
- gpt-4.1, a model designed for highly complex contexts in which generation quality is crucial
LyricMind: A Musical RAG Recommendation System
The project was created to develop a RAG-based application using a modern technology stack that combines Spring Boot, Spring AI, MongoDB Atlas Vector Search, and OpenAI.
The basic idea is to build a music recommendation system, called LyricMind, which takes user input (e.g., a feeling or mood) and returns a set of relevant text results selected from a pre-loaded knowledge base. In this case, the knowledge base consists of song lyrics, but the concept applies to any domain.
The system consists of two main phases:
- Ingestion and embedding phase
- Query and retrieval phase
In the first phase, the Spring Boot application performs a bulk upload of songs, including title, author, album, and lyrics (the structure can be made generic for any type of text document). Through Spring AI, each document is sent to an OpenAI embedding model that generates a dense numerical representation. This representation, together with the original text and associated metadata, is stored in MongoDB Atlas. Here, thanks to native support for document storage and vector search, each embedding is indexed to enable fast and efficient semantic searches.
In the second phase, when the user submits a query expressed in natural language (a mood or feeling), it is also transformed into an embedding through Spring AI. MongoDB Atlas then performs a vector similarity search to return the documents most relevant to the query. These results are not only displayed but are also passed to a generative model (OpenAI’s chat model, orchestrated and abstracted in this case by Spring AI) that performs an additional reranking and contextualization phase, thus ensuring more accurate and context-rich responses.
Figure 4: LyricMind RAG Pipeline Architecture
Hands-on Code
Now that we have presented all the ingredients, let’s try mixing them to build the RAG system for providing music recommendations.
The following architecture will be used as a reference to show how the system works.
Figure 5: LyricMind Technical Implementation
Let’s analyze the implementation, dividing the representation into two separate parts: the first relating to the embeddings engine and the second relating to the music recommendation engine.
Spring Boot 3.5.5 and Java 24 were used. The following dependencies are crucial for implementation:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-mongodb-atlas-store-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Dependencies
- spring-ai-openai-spring-boot-starter, for integration with the GPT-4o-mini (chat) and text-embedding-3-large (embeddings) models
- spring-ai-mongodb-atlas-store-spring-boot-starter, for persistence and vector search
- spring-boot-starter-actuator, for application monitoring and management
Embeddings Generator
In the ingestion and embedding phase, the goal is to generate embeddings from textual content, in this case, songs. The generation of embeddings will follow this pipeline:
- Read the dataset from a CSV file
- Create documents representing each song and save them in MongoDB
- Generate the embeddings for each song
- Save within the vector store
The Song class represents a song as follows:
@Getter
@Setter
@NoArgsConstructor
@Document(collection = "songs")
public class Song {

    @Id
    public String id;
    public String title;
    public String artist;
    public String album;
    public String genre;
    public String lyrics;
    public String description;
    public List<String> tags;
    public Integer releaseYear;

    public Song(String title, String artist, String description) {
        this.title = title;
        this.artist = artist;
        this.description = description;
    }
}
While its embedding is represented by the SongEmbedding class:
@Document(collection = "song_embedding")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class SongEmbedding {

    @Id
    private String id;
    private String songId;
    private String content;
    private List<Double> embedding;
    private Map<String, Object> metadata;
}
In this class, we see how the embedding, represented as a List<Double>, is the vector representation of the song on which semantic similarity searches will then be performed.
The dataset is read by exposing a REST API, as defined in the EmbeddingsController class, which can be called by specifying the name of the CSV file to be read. In this case, for simplicity, some CSV files have been inserted directly into the src/main/resources path, containing songs by some famous singers.
@RestController
@RequestMapping("/api/lyricmind/v1/embeddings")
public class EmbeddingsController {

    @Autowired
    SongEmbeddingService songEmbeddingService;

    @PostMapping("/bulk-songs")
    ResponseEntity<BulkSongResponse> createEmbeddingFromBulkSong(@RequestBody BulkSongRequest request) {
        return new ResponseEntity<>(songEmbeddingService.createEmbeddingFromBulkSong(request), HttpStatus.CREATED);
    }
}
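The request and response payloads are simple DTOs; a plausible shape, assuming Java records (the exact fields are defined in the repository):

// Hypothetical DTOs used by the bulk-ingestion endpoint
public record SongRequest(String title, String artist, String album, String genre,
                          String lyrics, String description, List<String> tags, Integer releaseYear) {}

public record BulkSongRequest(String fileName) {}

public record BulkSongResponse(Integer processedSongs, String message) {}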
Once the API has been called, the service layer will handle all the business logic, generating the representation of the song and its vector representation, and finally saving the document in the vector store.
@Service
@Slf4j
@RequiredArgsConstructor
public class SongEmbeddingService {

    private final SongRepository songRepository;
    private final VectorStore vectorStore;
    private final DatasetGeneratorComponent datasetGeneratorComponent;

    @Transactional
    public Integer createEmbeddingFromSongList(List<SongRequest> requestList) {
        if (requestList == null || requestList.isEmpty()) {
            throw new IllegalArgumentException("Song request list cannot be null or empty");
        }
        log.info("Starting bulk embedding for {} songs", requestList.size());
        List<Song> savedSongs = new ArrayList<>();
        List<Document> documents = new ArrayList<>();
        try {
            for (SongRequest request : requestList) {
                Song song = mapRequestToSong(request);
                savedSongs.add(song);
            }
            savedSongs = songRepository.saveAll(savedSongs);
            documents = savedSongs.stream()
                    .map(this::createDocumentFromSong)
                    .collect(Collectors.toList());
            embedDocuments(documents);
            log.info("Successfully embedded {} songs", documents.size());
            return documents.size();
        } catch (Exception e) {
            log.error("Failed to embed songs in bulk", e);
            throw new RuntimeException("Bulk embedding failed", e);
        }
    }
}
Within the SongEmbeddingService class, the createEmbeddingFromSongList() method receives the songs read from the CSV file, maps each one to a Song entity, and saves them in MongoDB as documents. It then invokes the embedDocuments() method, which triggers calls to OpenAI to vectorize the content and saves the result in the vector store. All this happens transparently to the caller, through the interface exposed by Spring AI.
private final VectorStore vectorStore;

private void embedDocuments(List<Document> documents) {
    try {
        vectorStore.add(documents);
        log.debug("Successfully embedded {} documents", documents.size());
    } catch (Exception e) {
        log.error("Failed to embed documents", e);
        throw new RuntimeException("Vector embedding failed", e);
    }
}
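For completeness, the createDocumentFromSong() mapping can be sketched as follows. This is an approximation consistent with the stored documents shown below (it assumes every field is populated); the concatenated text is what gets embedded, while the metadata travels alongside the vector.

private Document createDocumentFromSong(Song song) {
    // The content to embed: a textual representation of the song
    String content = "Title: " + song.getTitle() +
            "\nArtist: " + song.getArtist() +
            "\nLyrics: " + song.getLyrics();

    // Metadata stored next to the vector, usable for filtering and for building responses
    Map<String, Object> metadata = Map.of(
            "songId", song.getId(),
            "title", song.getTitle(),
            "artist", song.getArtist(),
            "album", song.getAlbum(),
            "genre", song.getGenre(),
            "releaseYear", song.getReleaseYear());

    return new Document(content, metadata);
}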
But how is all this configured? As with any respectable Spring Boot application, all configuration can be done via properties files, specifying which model to call and what the characteristics of that model are. In this case:
spring.ai.vectorstore.mongodb.collection-name=lyricmind_vector_store
spring.ai.vectorstore.mongodb.initialize-schema=true
spring.ai.vectorstore.mongodb.path-name=embedding
spring.ai.vectorstore.mongodb.indexName=lyricmind_vector_index
spring.ai.openai.api-key=<<insert-here>>
spring.ai.openai.embedding.options.model=text-embedding-3-large
In these lines, we see how:
- A MongoDB vector store has been configured that saves documents in a collection named lyricmind_vector_store, in a path named embedding, using an index named lyricmind_vector_index defined to support semantic search.
- OpenAI has been configured as the provider for generating embeddings, specifying the API key and the required model. In this case, we chose the text-embedding-3-large model.
It is important to note one more thing. In this implementation, no mechanism for chunking textual content has been provided. The issue of chunking (breaking textual content into several parts and then generating separate embeddings) is crucial in projects that implement RAG. Chunking is useful when the document is very long and contains heterogeneous concepts (e.g., technical manuals, legal documentation). In this case, it is not advisable to chunk song lyrics. A song is typically a few lines to a few hundred words long. Even in the longest cases, the text falls well below the token limits supported by OpenAI embedding models (text-embedding-3-small or text-embedding-3-large accept inputs of up to 8192 tokens). Furthermore, in a song, the meaning emerges from the text as a whole, not from isolated fragments.
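Had chunking been necessary, for instance when ingesting manuals or contracts, Spring AI provides a token-based document splitter that could be applied before adding documents to the vector store. A minimal sketch, assuming the default TokenTextSplitter settings:

// Split long documents into token-bounded chunks before embedding them
TokenTextSplitter splitter = new TokenTextSplitter();
List<Document> chunks = splitter.apply(longDocuments); // one Document per chunk, metadata preserved
vectorStore.add(chunks);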
The final result is two MongoDB collections: one containing the original documents representing the processed songs, and a second containing the documents with their embeddings.
{
    "_id": "643e8955-a861-4c5a-90b4-fd3fb065b112",
    "content": "Title: Circles\nArtist: Post Malone\nLyrics: oh oh oh oh oh oh oh oh oh oh oh we couldn't turn around ...",
    "metadata": {
        "artist": "Post Malone",
        "album": "Hollywood’s Bleeding",
        "genre": "Pop",
        "title": "Circles",
        "songId": "68b5915de008fb195e7957d7",
        "releaseYear": 2019
    },
    "embedding": [
        -0.022942420095205307,
        -0.01588321290910244,
        -0.01264774426817894,
        0.003707451280206442,
        -0.001795582938939333,
        -0.015212862752377987,
        -0.02701924741268158,
        -0.002011052798479795,
        .....
    ],
    "_class": "org.springframework.ai.vectorstore.mongodb.atlas.MongoDBAtlasVectorStore$MongoDBDocument"
}
Recommendation Engine
Once the embeddings have been generated and loaded, it is time to create the core of the recommendation engine. A REST API will be exposed which, given a mood and a limit on the number of recommendations, returns a list of songs suited to the indicated mood, together with the reason for each choice.
The steps are as follows:
- Generation of the semantic query and search within the vector database for the documents with the highest similarity profile
- Querying an LLM chat model to re-rank the semantic search results and generate the reasons for the choice
Everything, therefore, starts with exposing the API.
@RestController
@RequestMapping("/api/lyricmind/v1/recommendations")
public class RecommendationController {

    @Autowired
    RecommendationService recommendationService;

    Logger logger = LoggerFactory.getLogger(RecommendationController.class);

    @PostMapping
    public ResponseEntity<List<SongRecommendationResponse>> recommendSongs(
            @RequestBody MusicRequest request) {
        List<SongRecommendationResponse> recommendations = recommendationService.recommendSongs(
                request.mood(),
                request.limit() != null ? request.limit() : 10
        );
        return ResponseEntity.ok(recommendations);
    }
}
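MusicRequest and SongRecommendationResponse are simple records; a plausible shape, inferred from the controller above and from the JSON response shown at the end of this section:

// Hypothetical DTOs used by the recommendation endpoint
public record MusicRequest(String mood, Integer limit) {}

public record SongRecommendationResponse(String title, String artist, String album,
                                         String genre, Integer releaseYear, String motivation) {}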
The core of the entire recommendation engine is within the RecommendationService class.
public List<SongRecommendationResponse> recommendSongs(String mood, int limit) {
    log.info("Requesting song recommendations for mood: '{}' with limit: {}", mood, limit);
    try {
        // Get candidate songs through semantic search
        List<Document> candidates = findCandidateSongs(mood, limit);
        if (candidates.isEmpty()) {
            log.info("No candidate songs found for mood: '{}'", mood);
            return Collections.emptyList();
        }
        // Re-rank candidates using AI
        List<Document> rerankedResults = rerankCandidates(mood, candidates);
        // Map to recommendation responses
        List<SongRecommendationResponse> recommendations = mapDocumentsToRecommendations(rerankedResults, limit);
        log.info("Successfully generated {} recommendations for mood: '{}'", recommendations.size(), mood);
        return recommendations;
    } catch (Exception e) {
        log.error("Failed to generate recommendations for mood: '{}'", mood, e);
        throw new RuntimeException("Recommendation generation failed", e);
    }
}

private List<Document> findCandidateSongs(String mood, int limit) {
    try {
        // Request more candidates than needed to allow for filtering
        int candidateLimit = Math.min(limit * 2, MAX_LIMIT);
        List<Document> candidates = semanticQueryComponent.similaritySearch(mood, candidateLimit);
        return candidates;
    } catch (Exception e) {
        log.error("Failed to find candidate songs for mood: '{}'", mood, e);
        throw new RuntimeException("Candidate search failed", e);
    }
}

private List<Document> rerankCandidates(String mood, List<Document> candidates) {
    try {
        List<Document> rerankedResults = rerankComponent.rerank(mood, candidates);
        return rerankedResults;
    } catch (Exception e) {
        log.error("Failed to re-rank candidates for mood: '{}'", mood, e);
        return candidates;
    }
}
Within this class, there are two calls to two components, SemanticQueryComponent and RerankComponent.
@Component
public class SemanticQueryComponent {

    private final VectorStore vectorStore;

    private Logger logger = LoggerFactory.getLogger(SemanticQueryComponent.class);

    public SemanticQueryComponent(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> similaritySearch(String mood, int limit) {
        String query = buildSemanticQuery(mood);
        logger.info("Building semantic query: " + query);
        SearchRequest searchRequest = SearchRequest.builder()
                .query(query)
                .topK(limit * 2)
                .similarityThreshold(0.6)
                .build();
        return vectorStore.similaritySearch(searchRequest);
    }

    private String buildSemanticQuery(String mood) {
        return String.format(
                "Mood: %s. " +
                "Search for songs that match this mood.",
                mood
        );
    }
}
In this class, we compose the semantic query to be applied to the vector store. Specifically:
- A topK value that is doubled: The system intentionally retrieves twice the number of documents requested to allow reranking to have more options.
- A similarity threshold set to 0.6: A balanced value that filters results that are too semantically distant while maintaining diversity.
- Query enhancement: The query is enriched with additional context to improve the search.
An important aspect of the implementation concerns the definition of the similarity threshold, set at 0.6. This value represents the minimum level of semantic affinity between the user’s query and the song embeddings saved in MongoDB. In practice, this means that only texts that show a medium-high correlation with the request are retrieved, thus avoiding the inclusion of results that are too distant or irrelevant. A lower value would introduce noise, while one that is too high would risk filtering excessively. The choice of 0.6 is therefore a good compromise: it guarantees a sufficiently consistent set of candidates, which the reranking model can then refine to return the most relevant recommendations.
The list of candidate documents, together with their similarity scores, is passed to the RerankComponent, which invokes OpenAI’s chat model to re-rank the songs.
public List<Document> rerank(String mood, List<Document> docs) {
    log.info("Re-ranking {} documents for mood: '{}'", docs.size(), mood);
    try {
        // Limit documents to avoid token limits and improve performance
        List<Document> documentsToRerank = limitDocuments(docs);
        // Create and execute re-ranking prompt
        String prompt = buildRerankingPrompt(mood, documentsToRerank);
        ChatResponse response = executeRerankingQuery(prompt);
        // Parse and process the response
        List<Map<String, Object>> ranking = parseRerankingResponse(response);
        List<Document> rerankedDocs = applyRerankingResults(documentsToRerank, ranking);
        log.info("Successfully re-ranked {} documents (from {} candidates) for mood: '{}'",
                rerankedDocs.size(), docs.size(), mood);
        return rerankedDocs;
    } catch (Exception e) {
        log.error("Failed to re-rank documents for mood: '{}'", mood, e);
        throw new RuntimeException("Document re-ranking failed", e);
    }
}
The important thing here is how the prompt that queries the chat model is constructed. As with any good prompt, the more specific you are in describing your request, the more effective the model’s response will be. In this case:
String.format("""
        You are a music recommendation ranking assistant.
        Rank the following songs based on their semantic relevance to the requested mood.
        Consider the artist, title, genre, and overall musical style when determining relevance.
        Provide a brief motivation for each ranking without referencing other songs.
        Requested Mood: %s
        Songs to rank:
        %s
        Instructions:
        - Return ONLY a JSON array
        - Include ALL documents in your response
        - Sort by relevance (most relevant first)
        - Score should be between 0.0 and 1.0
        - Keep motivations concise (max 100 characters)
        Expected format:
        [{"doc_index": 1, "score": 0.95, "motivation": "Upbeat tempo matches energetic mood"}]
        """,
        sanitizeInput(mood), documentsText);
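The model is expected to answer with a JSON array, which parseRerankingResponse() maps back onto the candidate documents. A minimal sketch of that parsing step, assuming Jackson is on the classpath (the accessor for the response text varies slightly across Spring AI versions):

private List<Map<String, Object>> parseRerankingResponse(ChatResponse response) throws Exception {
    // Extract the raw JSON produced by the chat model
    String json = response.getResult().getOutput().getText(); // getContent() in older Spring AI versions
    // Deserialize into the ranking structure: doc_index, score, motivation
    ObjectMapper mapper = new ObjectMapper();
    return mapper.readValue(json, new TypeReference<List<Map<String, Object>>>() {});
}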
How is interaction with the Chat Model configured? The answer is always the same: Spring AI allows you to use the properties in the application.properties file to configure interaction with the model:
spring.ai.openai.api-key=<<insert-here>>
spring.ai.openai.chat.options.model=gpt-4o-mini
What is the final result? Let’s try asking the service for a recommendation of songs that talk about love:
curl --location 'http://localhost:8080/api/lyricmind/v1/recommendations' \
--header 'Content-Type: application/json' \
--data '{
    "mood": "A song that talks about love",
    "limit": 2
}'
And here is the response from the service:
[
    {
        "title": "I Fall Apart",
        "artist": "Post Malone",
        "album": "Stoney",
        "genre": "Hip-Hop",
        "releaseYear": 2016,
        "motivation": "This song is an emotional ballad that recounts the pain of a relationship that has ended. It is considered an anthem for those who have experienced a break-up, thanks to its melancholic melody and direct lyrics."
    },
    {
        "title": "Circles",
        "artist": "Post Malone",
        "album": "Hollywood’s Bleeding",
        "genre": "Pop",
        "releaseYear": 2019,
        "motivation": "Circles tackles the theme of cyclical relationships and the difficulty of letting go of a loved one. The sound is softer and more melodic, with indie-pop influences, and the lyrics reflect on the complexity of love and separation."
    }
]
Use Cases in the Real World
The RAG pipeline described in this article addresses a specific use case, but the power of the approach lies precisely in its ability to be applied to different business segments. The core of the design is its ability to combine retrieval from structured knowledge bases with generation via language models, and it finds application in multiple professional contexts:
- Finance and Insurance, searching for regulations, company policies, and regulatory documentation to answer compliance questions or support the analysis of financial reports.
- Healthcare, consultation of clinical guidelines, treatment protocols, and medical research to support clinical decisions or the matching of patients to trials.
- Legal, searching for judgments, articles of law, and contracts to assist lawyers in analysing complex documents and identifying critical clauses.
- Customer service, chatbots and helpdesk systems that search for information from technical manuals, internal documentation, and FAQs, improving response times and quality of support.
- Education and training, intelligent tutoring and Q&A systems based on handouts, textbooks, and teaching materials, to offer fully personalized learning experiences.
Conclusion
The creation of a RAG pipeline with Spring Boot, Spring AI, MongoDB, and OpenAI models highlights how these technologies can be integrated naturally into existing enterprise ecosystems and architectures.
The use of a vector store allows for the management of structured knowledge bases, while embeddings and generative models allow for semantic queries of this data. The addition of the reranking step via LLM allows the system’s output to be further strengthened and contextualized, providing answers consistent with the reference domain.
The solution presented is also suitable for other use cases and application domains, adapting to different databases and interaction models.
All the code is available in this GitHub repository.
I hope you found this article about creating a RAG pipeline with these tools to be interesting and that you can turn a problem into an implementation opportunity.
