Large language models (LLMs) are very popular in the software world these days. New articles, blog posts, courses, and models appear constantly from leading companies in our industry, such as Meta, Hugging Face, and Microsoft, which requires us to follow these technologies closely.
We’ve decided to write a few short, informative articles to introduce these topics and stay up to date with the latest technology. The first topic we will cover is RAG (Retrieval-Augmented Generation).
We will cover this topic as a series of three articles that are useful on their own and complement each other. In this first article, we begin the series with the definition of RAG and the basic ideas behind it.
Large language models have entered every aspect of our lives; we could even say they’ve revolutionized the field. However, they’re not as seamless a tool as we like to believe. They have a major drawback: they remain faithful to the data they were trained on and cannot deviate from it. A model that completed training in November 2022 won’t know about news, laws, technological developments, and so on that emerged in January 2023. For example, an LLM that completed training and entered service in 2021 cannot answer a question about the Russia-Ukraine war that began on February 24, 2022.
This is because its development was completed before that date. Of course, this problem was not left unsolved, and a new system was introduced: RAG (Retrieval-Augmented Generation), which provides up-to-date information whenever you need it. In the rest of this article, let’s take a closer look at plain LLM-based systems and RAG systems, one by one, to get to know them.
Access to Information in LLM Models
The working principle of large language models is based on the data they were taught during training, in other words, static knowledge. They have no built-in ability to pull in external data. To give an example, consider a child. If we cut off this child’s communication with the outside world and teach them only English, we won’t hear a single Chinese word from them.
This is because we’ve raised a child who is fluent in English, not Chinese. By cutting off their connection to the outside world, we’ve also restricted their ability to learn from external sources. Just like this child, LLM models are also infused with basic knowledge but are closed to external data.
Another characteristic of LLMs is that they are black boxes. These models aren’t really aware of why they perform the operations they perform; they base their outputs solely on mathematical operations. Ask any LLM, “Why did you give that answer?” and you are probably pushing it too hard. They don’t answer questions through reasoning or research. The keyword here is “why?”, and we can consider this one of the weaknesses of LLMs.
To better understand this structure, let’s consider an example from the healthcare field. When a user asks, “I have pain in my face and eyes, and persistent postnasal drip. What should I do?”, the LLM might respond, “Pain in the face and eyes together with postnasal drip could be a sign of sinusitis. Please consult a doctor. Acute sinusitis is treated with antibiotics. In addition to medication, you can use nasal sprays such as seawater or saline to soothe the sinuses.”
Everything seems normal up to this point. But if we then ask the model, “Why did you give that answer?”, things get complicated. The model provided this answer because the words “facial pain” and “nasal drip” frequently appeared together with the word “sinusitis” in its training data. These models store information in memory as statistical patterns; what matters to an LLM is the mathematics, not the sources.
Because it doesn’t store information in memory with references to its sources, the model answers the question and satisfies most people instantly, but when someone with a more investigative nature asks, “Why did you give this answer?”, it fails to produce any explanatory answer. I believe we now have a sufficient understanding of LLMs. Now we can discuss the RAG system and how it addresses these problems.
RAG: Combining LLMs With Retrieval Systems
RAG systems offer innovations compared to systems built on pure LLMs. One of these is that they work with dynamic information rather than the static knowledge of plain LLMs. In other words, they also scan external sources instead of being limited to the data they were trained on. The “R” in RAG stands for retrieval, and the retrieval component’s role is to perform these search operations.
The generator is the second main component, and its role is to produce the final answer from the data the retriever returns. In this article, we will only briefly touch on RAG’s operating principle: the retriever scans external sources and finds documents relevant to the user’s question by breaking them into small pieces called “chunks.” It vectorizes these chunks and the user’s question, and then selects the best-matching chunks by measuring the similarity between them. Producing the answer from those chunks is the job of the generation step, which we will discuss in detail in the next article in this series. These basic components make RAG a stronger structure and prevent it from being stuck with static information like pure LLMs. A minimal sketch of the retrieval step is shown below.
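To make the retrieval step more concrete, here is a minimal sketch in Python. It is not the exact pipeline of any particular framework: the sentence-transformers library, the all-MiniLM-L6-v2 embedding model, the chunk size, and the `chunk`/`retrieve` helper names are all illustrative assumptions for this sketch, not requirements of RAG itself.

```python
# Minimal retrieval sketch: chunk documents, embed chunks and the question,
# and return the chunks most similar to the question.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model would do

def chunk(text: str, size: int = 200) -> list[str]:
    """Split a long document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the user's question."""
    chunks = [c for doc in documents for c in chunk(doc)]
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)
    query_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ query_vec  # cosine similarity, since the vectors are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]
```

In a real system the chunks and their vectors would be stored in a vector database and queried on demand, but the matching logic is essentially the same.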
RAG systems offer significant advantages, especially in fields that require up-to-date information. For example, you might want to build a doctor-assistant model for the medical field. Because such a model serves a vital area, it must not rely on outdated or incomplete information. The medical world, just like the IT sector, advances every day, and new studies are constantly published, so your model is expected to keep up with even the latest research. Otherwise, you risk endangering human life with a misleading model. In such cases, RAG-supported systems eliminate the problem of outdated information by connecting to external databases.
The fundamental difference between RAG systems and plain LLMs is RAG’s core philosophy: “Don’t store information; access it when you need it!” While pure large language models store information in their memory during training and produce answers from it, RAG systems search external sources whenever they need information, in line with this philosophy.
Just like a human searching the internet, a RAG system looks things up on demand, overcoming one of the most significant disadvantages of pure LLMs: their reliance on memorized knowledge. To further illustrate the point, let’s compare the two systems.
Scenario:
User: “What is the United States’ December 2024 inflation rate?”
LLM: “According to December 2022 data, it was 6.5%.” (Not an up-to-date answer)
RAG:
- Retrieves December 2024 data from a reliable source or database (World Bank, Trading Economics, etc.).
- The LLM uses this data and responds, “According to Trading Economics, inflation in the United States for December 2024 was announced as 2.9%.”
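To show how the scenario above would play out in code, here is a hedged sketch of the “augment, then generate” step. It reuses the `retrieve()` helper sketched earlier; `generate_answer()` stands in for whatever LLM you call (a local model or an API) and is a hypothetical placeholder, not a real library function.

```python
def answer_with_rag(question: str, documents: list[str]) -> str:
    """Retrieve relevant chunks, place them in the prompt, then let the LLM answer."""
    context = "\n\n".join(retrieve(question, documents))
    prompt = (
        "Answer the question using only the context below, and cite the source.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate_answer(prompt)  # hypothetical LLM call: any chat model or API

# Example usage, assuming the corpus contains an up-to-date snippet such as:
# "Trading Economics, January 2025: US inflation for December 2024 was announced as 2.9%."
# answer_with_rag("What is the United States' December 2024 inflation rate?", documents)
```

The key design choice is that the model is asked to answer from the retrieved context rather than from its own memory, which is what keeps the response current and lets it cite a source.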
Let’s briefly compare what we’ve discussed so far and present it in the table below.
| Feature | LLM (static model) | RAG (retrieval-augmented generation) |
|---|---|---|
| Information | Limited to training data | Can pull real-time information from external sources |
| Up-to-dateness | Low | High |
| Transparency | The source of the decision cannot be disclosed (black box) | The source can be cited |
In conclusion, to summarize briefly: while LLMs are limited to the data they were trained on, RAG systems build on an LLM’s base knowledge and additionally have the ability to pull real-time information from external sources. This advantage ensures that they are always up to date. This concludes the first article in our series; in the next article, we will delve into the more technical aspects. Readers who want to examine the practical side of this work can find the links to the relevant repos, which I created with Python and related libraries, on my GitHub account at the end of the article.
Hope to see you in the next article of the series.
Metin YURDUSEVEN.
Further Reading
We would also like to acknowledge the foundational contribution of Facebook AI’s 2020 RAG paper, which significantly informed this article’s perspective.
metinyurdev github
Multi-Model RAG Chatbot Project
PDF RAG Chatbot Project