Authors:
(1) Raphaël Millière, Department of Philosophy, Macquarie University ([email protected]);
(2) Cameron Buckner, Department of Philosophy, University of Houston ([email protected]).
Table of Links
Abstract and 1 Introduction
2. A primer on LLMs
2.1. Historical foundations
2.2. Transformer-based LLMs
3. Interface with classic philosophical issues
3.1. Compositionality
3.2. Nativism and language acquisition
3.3. Language understanding and grounding
3.4. World models
3.5. Transmission of cultural knowledge and linguistic scaffolding
4. Conclusion, Glossary, and References
3.4. World models
Another core skeptical concern holds that systems like LLMs, designed and trained to perform next-token prediction, could not possibly possess world models. The notion of a world model admits of several interpretations. In machine learning, it often refers to internal representations that simulate aspects of the external world. World models enable a system to understand, interpret, and predict phenomena in a way that reflects real-world dynamics, including causality and intuitive physics. For example, artificial agents can use world models to predict the consequences of specific actions or interventions in a given environment (Ha & Schmidhuber 2018, LeCun n.d.). World models are often taken to be crucial for tasks that require a deep understanding of how different elements interact within a given environment, such as physical reasoning and problem-solving.
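To make the machine-learning sense of "world model" concrete, the sketch below treats a world model as a transition function that maps a state and an action to a predicted next state, which an agent can query before acting in the real environment. It is a minimal illustration under our own assumptions; the class and method names (BallState, ToyPhysicsWorldModel, predict_next_state) are invented for this example and are not drawn from Ha & Schmidhuber (2018) or any other cited system, which learn such models from data rather than hand-coding them.

```python
# Illustrative sketch only: a "world model" as a transition function over states.
# All names are hypothetical; real systems learn this mapping from experience.
from dataclasses import dataclass

@dataclass
class BallState:
    position: float   # height above the ground (m)
    velocity: float   # vertical velocity (m/s)

class ToyPhysicsWorldModel:
    """Predicts the consequence of an action under simplified, intuitive physics."""
    GRAVITY = -9.8  # m/s^2
    DT = 0.1        # simulation step (s)

    def predict_next_state(self, state: BallState, push: float) -> BallState:
        # An agent can query this model to anticipate what an intervention
        # (here, an upward push) would do before acting in the environment.
        velocity = state.velocity + (self.GRAVITY + push) * self.DT
        position = max(0.0, state.position + velocity * self.DT)
        return BallState(position, velocity)

model = ToyPhysicsWorldModel()
print(model.predict_next_state(BallState(position=1.0, velocity=0.0), push=0.0))
```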
Unlike reinforcement learning agents, LLMs do not learn by interacting with an environment and receiving feedback about the consequences of their actions. The question of whether they possess world models, in this context, typically pertains to whether they have internal representations of the world that allow them to parse and generate language consistent with real-world knowledge and dynamics. This ability would be critical to rebutting the skeptical concern that LLMs are mere Blockheads (Block 1981). Indeed, according to psychologism, systems like LLMs can only count as intelligent or rational if they are able to represent some of the same world knowledge that humans do – and if they generate human-like linguistic behavior by performing appropriate transformations over those representations. Note that the question of whether LLMs may acquire world models goes beyond the foregoing issues about basic semantic competence. World modeling involves representing not just the worldly referents of linguistic items, but also global properties of the environment in which discourse entities are situated and interact.
There is no standard method to assess whether LLMs have world models, partly because the notion is often vaguely defined, and partly because it is challenging to devise experiments that can reliably discriminate between the available hypotheses – namely, whether LLMs rely on shallow heuristics to respond to queries about a given environment, or whether they deploy internal representations of the core dynamics of that environment. Much of the relevant experimental evidence comes from intervention methods that we will discuss in Part II; nonetheless, it is also possible to bring behavioral evidence to bear on this issue by presenting models with new problems that cannot be solved through memorized shortcuts. For example, Wang, Todd, Yuan, Xiao, Côté & Jansen (2023) investigated whether GPT-4 can acquire task-specific world models to generate interactive text games. Specifically, they used a new corpus of Python text games focused on common-sense reasoning tasks (such as building a campfire), and evaluated GPT-4’s ability to use these games as in-context learning templates when prompted to generate a new game from a game sampled from the corpus and a task specification. The guiding intuition of this experiment is that the capacity to generate a runnable program to perform a task in a text-based game environment is a suitable proxy for the capacity to simulate task parameters internally (i.e., for the possession of a task-relevant world model). Wang et al. found that GPT-4 could produce runnable text games for new tasks in 28% of cases using one-shot in-context learning alone, and in 57% of cases when allowed to self-correct based on Python error messages. The fact that the model was able to generate functional text-based games for unseen “real-world” tasks in a significant proportion of trials provides very tentative evidence that it may represent how objects interact in the game environment. Nonetheless, this hypothesis would need to be substantiated by in-depth analysis of the information encoded in the model’s internal activations, which is particularly challenging for very large models, and outright impossible for closed models like GPT-4 whose weights are not released (see Part II).
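To make the experimental protocol concrete, the following sketch shows how a one-shot generation loop with error-driven self-correction could be implemented. It is a schematic reconstruction under our own assumptions, not Wang et al.'s actual evaluation harness; the prompt wording and the helper names (build_prompt, run_game, evaluate) are hypothetical placeholders, and the LLM call is left as an abstract function.

```python
# Schematic reconstruction of a one-shot generation loop with error-driven
# self-correction, in the spirit of the setup described by Wang et al. (2023).
# All names and the prompt wording are hypothetical, not the authors' harness.
import subprocess
import tempfile

def build_prompt(example_game: str, task_spec: str, error: str | None = None) -> str:
    prompt = (
        "Here is a Python text game illustrating a common-sense task:\n"
        f"{example_game}\n\n"
        f"Write a new, runnable Python text game for this task: {task_spec}\n"
    )
    if error:  # self-correction round: show the interpreter's error message
        prompt += f"\nYour previous attempt failed with this error:\n{error}\nPlease fix it.\n"
    return prompt

def run_game(code: str) -> str | None:
    """Try to execute the generated game; return an error message, or None if it runs."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    try:
        result = subprocess.run(["python", f.name], capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        return "timed out"
    return result.stderr or None

def evaluate(generate, example_game: str, task_spec: str, max_rounds: int = 3) -> bool:
    """One-shot generation, with up to max_rounds of error-driven self-correction."""
    error = None
    for _ in range(max_rounds):
        code = generate(build_prompt(example_game, task_spec, error))  # call to the LLM
        error = run_game(code)
        if error is None:
            return True  # a runnable game was produced
    return False
```

The success criterion here is deliberately coarse (does the generated program run without raising an error?), mirroring the paper's use of runnability as a proxy for task-relevant world modeling.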
There are also theoretical arguments for the claim that LLMs might learn to simulate at least some aspects of the world beyond sequence probability estimates. For example, Andreas (2022) argues that the training set of an LLM can be understood as output created by–and hence, evidence for–the system of causal factors that generated that text. More specifically, Internet-scale training datasets consist of large numbers of individual documents. While the entire training set will encompass many inconsistencies, any particular document in the training set will tend to reflect the consistent perspective of the agent that originally created it. The most efficient compression of these texts may involve encoding values of the hidden variables that generated them: namely, the syntactic knowledge, semantic beliefs, and communicative intentions of the text’s human author(s). If we are predicting how a human will continue a series of numbers “2, 3, 5, 7, 11, 13, 17”, for example, it will be more efficient to encode them as a list of prime numbers between 1 and 20 than to remember the whole sequence by rote. Similarly, achieving excellent performance at next-token prediction in the context of many passages describing various physical scenarios may promote the representation of latent variables that could generate those scenarios – including, perhaps, aspects of causality and intuitive physics. As we will see in Part II, the clearest existence proof for the ability of Transformers to acquire world models from next-token prediction alone comes from the analysis of toy models trained on board game moves. At least in this very simple domain, there is compelling behavioral and mechanistic evidence that autoregressive Transformer models can learn to represent latent features of the game environment.
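The compression intuition behind the prime-number example can be made concrete with a toy snippet: storing the generating rule ("primes below 20") takes only a few symbols, yet it reproduces the observed sequence and predicts its continuation. This is offered purely as an illustration of the intuition, not as a claim about how LLMs actually compress their training data.

```python
# Toy illustration of the compression intuition: a short generating rule
# reproduces the observed sequence more compactly than rote storage,
# and also predicts its continuation. Purely illustrative.

def primes_below(n: int) -> list[int]:
    """The latent generating rule: primes smaller than n."""
    return [k for k in range(2, n) if all(k % d for d in range(2, int(k**0.5) + 1))]

rote = [2, 3, 5, 7, 11, 13, 17]     # the sequence memorized verbatim
rule = primes_below(20)             # the sequence reconstructed from the rule

assert rule[:len(rote)] == rote     # the rule reproduces what was observed...
print(rule[-1])                     # ...and predicts the continuation: 19
```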