From LLMs To Hallucinations, Here's A Simple Guide To Common AI Terms

Artificial intelligence is a deep and convoluted world. The scientists who work in this field often rely on jargon and lingo to explain what they’re working on. As a result, we frequently have to use those technical terms in our coverage of the artificial intelligence industry. That’s why we thought it would be helpful to put together a glossary with definitions of some of the most important words and phrases that we use in our articles.

We will regularly update this glossary to add new entries as researchers continually uncover novel methods to push the frontier of artificial intelligence while identifying emerging safety risks.

Artificial general intelligence, or AGI, is a nebulous term. But it generally refers to AI that’s more capable than the average human at many, if not most, tasks. OpenAI CEO Sam Altman recently described AGI as the “equivalent of a median human that you could hire as a co-worker.” Meanwhile, OpenAI’s charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind’s understanding differs slightly from these two definitions; the lab views AGI as “AI that’s at least as capable as humans at most cognitive tasks.” Confused? Not to worry — so are experts at the forefront of AI research.

An AI agent refers to a tool that uses AI technologies to perform a series of tasks on your behalf — beyond what a more basic AI chatbot could do — such as filing expenses, booking tickets or a table at a restaurant, or even writing and maintaining code. However, as we’ve explained before, there are lots of moving pieces in this emergent space, so “AI agent” might mean different things to different people. Infrastructure is also still being built out to deliver on its envisaged capabilities. But the basic concept implies an autonomous system that may draw on multiple AI systems to carry out multistep tasks.

Given a simple question, a human brain can answer without even thinking too much about it — things like “which animal is taller, a giraffe or a cat?” But in many cases, you often need a pen and paper to come up with the right answer because there are intermediary steps. For instance, if a farmer has chickens and cows, and together they have 40 heads and 120 legs, you might need to write down a simple equation to come up with the answer (20 chickens and 20 cows).

In an AI context, chain-of-thought reasoning for large language models means breaking down a problem into smaller, intermediate steps to improve the quality of the end result. It usually takes longer to get an answer, but the answer is more likely to be correct, especially in a logic or coding context. Reasoning models are developed from traditional large language models and optimized for chain-of-thought thinking thanks to reinforcement learning.

(See: Large language model)

Techcrunch event

San Francisco, CA
|
October 13-15, 2026

Although somewhat of a multivalent term, compute generally refers to the vital computational power that allows AI models to operate. This type of processing fuels the AI industry, giving it the ability to train and deploy its powerful models. The term is often a shorthand for the kinds of hardware that provides the computational power — things like GPUs, CPUs, TPUs, and other forms of infrastructure that form the bedrock of the modern AI industry.

A subset of self-improving machine learning in which AI algorithms are designed with a multi-layered, artificial neural network (ANN) structure. This allows them to make more complex correlations compared to simpler machine learning-based systems, such as linear models or decision trees. The structure of deep learning algorithms draws inspiration from the interconnected pathways of neurons in the human brain.

Deep learning AI models are able to identify important characteristics in data themselves, rather than requiring human engineers to define these features. The structure also supports algorithms that can learn from errors and, through a process of repetition and adjustment, improve their own outputs. However, deep learning systems require a lot of data points to yield good results (millions or more). They also typically take longer to train compared to simpler machine learning algorithms — so development costs tend to be higher.

(See: Neural network)

Diffusion is the tech at the heart of many art-, music-, and text-generating AI models. Inspired by physics, diffusion systems slowly “destroy” the structure of data — for example, photos, songs, and so on — by adding noise until there’s nothing left. In physics, diffusion is spontaneous and irreversible — sugar diffused in coffee can’t be restored to cube form. But diffusion systems in AI aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.

Distillation is a technique used to extract knowledge from a large AI model with a ‘teacher-student’ model. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which is trained to approximate the teacher’s behavior.

Distillation can be used to create a smaller, more efficient model based on a larger model with a minimal distillation loss. This is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4.

While all AI companies use distillation internally, it may have also been used by some AI companies to catch up with frontier models. Distillation from a competitor usually violates the terms of service of AI API and chat assistants.

This refers to the further training of an AI model to optimize performance for a more specific task or area than was previously a focal point of its training — typically by feeding in new, specialized (i.e., task-oriented) data.

Many AI startups are taking large language models as a starting point to build a commercial product but are vying to amp up utility for a target sector or task by supplementing earlier training cycles with fine-tuning based on their own domain-specific knowledge and expertise.

(See: Large language model [LLM])

A GAN, or Generative Adversarial Network, is a type of machine learning framework that underpins some important developments in generative AI when it comes to producing realistic data – including (but not only) deepfake tools. GANs involve the use of a pair of neural networks, one of which draws on its training data to generate an output that is passed to the other model to evaluate. This second, discriminator model thus plays the role of a classifier on the generator’s output – enabling it to improve over time.

The GAN structure is set up as a competition (hence “adversarial”) – with the two models essentially programmed to try to outdo each other: the generator is trying to get its output past the discriminator, while the discriminator is working to spot artificially generated data. This structured contest can optimize AI outputs to be more realistic without the need for additional human intervention. Though GANs work best for narrower applications (such as producing realistic photos or videos), rather than general purpose AI.

Hallucination is the AI industry’s preferred term for AI models making stuff up – literally generating information that is incorrect. Obviously, it’s a huge problem for AI quality.

Hallucinations produce GenAI outputs that can be misleading and could even lead to real-life risks — with potentially dangerous consequences (think of a health query that returns harmful medical advice). This is why most GenAI tools’ small print now warns users to verify AI-generated answers, even though such disclaimers are usually far less prominent than the information the tools dispense at the touch of a button.

The problem of AIs fabricating information is thought to arise as a consequence of gaps in training data. For general purpose GenAI especially — also sometimes known as foundation models — this looks difficult to resolve. There is simply not enough data in existence to train AI models to comprehensively resolve all the questions we could possibly ask. TL;DR: we haven’t invented God (yet).

Hallucinations are contributing to a push towards increasingly specialized and/or vertical AI models — i.e. domain-specific AIs that require narrower expertise – as a way to reduce the likelihood of knowledge gaps and shrink disinformation risks.

Inference is the process of running an AI model. It’s setting a model loose to make predictions or draw conclusions from previously seen data. To be clear, inference can’t happen without training; a model must learn patterns in a set of data before it can effectively extrapolate from this training data.

Many types of hardware can perform inference, ranging from smartphone processors to beefy GPUs to custom-designed AI accelerators. But not all of them can run models equally well. Very large models would take ages to make predictions on, say, a laptop versus a cloud server with high-end AI chips.

[See: Training]

Large language models, or LLMs, are the AI models used by popular AI assistants, such as ChatGPT, Claude, Google’s Gemini, Meta’s AI Llama, Microsoft Copilot, or Mistral’s Le Chat. When you chat with an AI assistant, you interact with a large language model that processes your request directly or with the help of different available tools, such as web browsing or code interpreters.

AI assistants and LLMs can have different names. For instance, GPT is OpenAI’s large language model and ChatGPT is the AI assistant product.

LLMs are deep neural networks made of billions of numerical parameters (or weights, see below) that learn the relationships between words and phrases and create a representation of language, a sort of multidimensional map of words.

These models are created from encoding the patterns they find in billions of books, articles, and transcripts. When you prompt an LLM, the model generates the most likely pattern that fits the prompt. It then evaluates the most probable next word after the last one based on what was said before. Repeat, repeat, and repeat.

(See: Neural network)

Memory cache refers to an important process that boosts inference (which is the process by which AI works to generate a response to a user’s query). In essence, caching is an optimization technique, designed to make inference more efficient. AI is obviously driven by high-octane mathematical calculations and every time those calculations are made, they use up more power. Caching is designed to cut down on the number of calculations a model might have to run by saving particular calculations for future user queries and operations. There are different kinds of memory caching, although one of the more well-known is KV (or key value) caching. KV caching works in transformer-based models, and increases efficiency, driving faster results by reducing the amount of time (and algorithmic labor) it takes to generate answers to user questions.

(See: Inference)

A neural network refers to the multi-layered algorithmic structure that underpins deep learning — and, more broadly, the whole boom in generative AI tools following the emergence of large language models.

Although the idea of taking inspiration from the densely interconnected pathways of the human brain as a design structure for data processing algorithms dates all the way back to the 1940s, it was the much more recent rise of graphical processing hardware (GPUs) — via the video game industry — that really unlocked the power of this theory. These chips proved well suited to training algorithms with many more layers than was possible in earlier epochs — enabling neural network-based AI systems to achieve far better performance across many domains, including voice recognition, autonomous navigation, and drug discovery.

(See: Large language model [LLM])

RAMageddon is the fun new term for a not-so-fun trend that is sweeping the tech industry: an ever-increasing shortage of random access memory, or RAM chips, which power pretty much all the tech products we use in our daily lives. As the AI industry has blossomed, the biggest tech companies and AI labs — all vying to have the most powerful and efficient AI — are buying so much RAM to power their data centers that there’s not much left for the rest of us. And that supply bottleneck means that what’s left is getting more and more expensive.

That includes industries like gaming (where major companies have had to raise prices on consoles because it’s harder to find memory chips for their devices), consumer electronics (where memory shortage could cause the biggest dip in smartphone shipments in more than a decade), and general enterprise computing (because those companies can’t get enough RAM for their own data centers). The surge in prices is only expected to stop after the dreaded shortage ends but, unfortunately, there’s not really much of a sign that’s going to happen anytime soon.

Developing machine learning AIs involves a process known as training. In simple terms, this refers to data being fed in in order that the model can learn from patterns and generate useful outputs.

Things can get a bit philosophical at this point in the AI stack — since, pre-training, the mathematical structure that’s used as the starting point for developing a learning system is just a bunch of layers and random numbers. It’s only through training that the AI model really takes shape. Essentially, it’s the process of the system responding to characteristics in the data that enables it to adapt outputs towards a sought-for goal — whether that’s identifying images of cats or producing a haiku on demand.

It’s important to note that not all AI requires training. Rules-based AIs that are programmed to follow manually predefined instructions — for example, such as linear chatbots — don’t need to undergo training. However, such AI systems are likely to be more constrained than (well-trained) self-learning systems.

Still, training can be expensive because it requires lots of inputs — and, typically, the volumes of inputs required for such models have been trending upwards.

Hybrid approaches can sometimes be used to shortcut model development and help manage costs. Such as doing data-driven fine-tuning of a rules-based AI — meaning development requires less data, compute, energy, and algorithmic complexity than if the developer had started building from scratch.

[See: Inference]

When it comes to human-machine communication, there are some obvious challenges. People communicate using human language, while AI programs execute tasks and respond to queries through complex algorithmic processes that are informed by data. In their simplest definition, tokens represent the basic building blocks of human-AI communication, in that they are discrete segments of data that have either been processed or produced by an LLM.

Tokens are created via a process known as “tokenization,” which breaks down raw data and refines it into distinct units that are digestible to an LLM. Similar to how a software compiler translates human language into binary code that a computer can digest, tokenization interprets human language for an AI program via their user queries so that it can prepare a response.

There are several different kinds of tokens — including input tokens (the kind that must be generated in response to a human user’s query), output tokens (the kind that are generated as the LLM responds to the human’s request), and reasoning tokens, which involve longer, more intensive tasks and processes that occur as part of a user request.

With enterprise AI, token usage also determines costs. Since tokens are equivalent to the amount of data being processed by a model, they have also become the means by which the AI industry monetizes its services. Most AI companies charge for LLM usage on a per-token-basis. Thus, the more tokens a business burns as it uses an AI program (ChatGPT, for example), the more money it will have to pay its AI service provider (OpenAI).

A technique where a previously trained AI model is used as the starting point for developing a new model for a different but typically related task – allowing knowledge gained in previous training cycles to be reapplied.

Transfer learning can drive efficiency savings by shortcutting model development. It can also be useful when data for the task that the model is being developed for is somewhat limited. But it’s important to note that the approach has limitations. Models that rely on transfer learning to gain generalized capabilities will likely require training on additional data in order to perform well in their domain of focus

(See: Fine tuning)

Weights are core to AI training, as they determine how much importance (or weight) is given to different features (or input variables) in the data used for training the system — thereby shaping the AI model’s output.

Put another way, weights are numerical parameters that define what’s most salient in a dataset for the given training task. They achieve their function by applying multiplication to inputs. Model training typically begins with weights that are randomly assigned, but as the process unfolds, the weights adjust as the model seeks to arrive at an output that more closely matches the target.

For example, an AI model for predicting housing prices that’s trained on historical real estate data for a target location could include weights for features such as the number of bedrooms and bathrooms, whether a property is detached or semi-detached, whether it has parking, a garage, and so on.

Ultimately, the weights the model attaches to each of these inputs reflect how much they influence the value of a property, based on the given dataset.

This article is updated regularly with new information.