Authors:
(1) Raphaël Millière, Department of Philosophy, Macquarie University ([email protected]);
(2) Cameron Buckner, Department of Philosophy, University of Houston ([email protected]).
Table of Links
Abstract and 1 Introduction
2. A primer on LLMs
2.1. Historical foundations
2.2. Transformer-based LLMs
3. Interface with classic philosophical issues
3.1. Compositionality
3.2. Nativism and language acquisition
3.3. Language understanding and grounding
3.4. World models
3.5. Transmission of cultural knowledge and linguistic scaffolding
4. Conclusion, Glossary, and References
3.2. Nativism and language acquisition
Another traditional dispute concerns whether artificial neural network models of language challenge popular arguments for nativism in language development.[9] This dispute centers on two claims from mainstream generative linguistics about the learnability of grammar that are occasionally conflated: a strong in-principle claim and a weaker developmental claim. According to the strong learnability claim, no amount of exposure to linguistic data would be sufficient, on its own, to induce the kind of syntactic knowledge that children rapidly acquire. It follows that statistical learners without built-in grammatical priors should be incapable of mastering the rules of language. While this strong claim is less popular than it once was among generative linguists, it can still be found in a popular textbook (Carnie 2021, pp. 17-20). The weaker learnability claim is supported by “poverty of the stimulus” arguments, according to which the actual nature and quantity of linguistic input available to children during development is insufficient, without innate knowledge, to induce the correct generalizations about underlying syntactic structures (Pearl 2022). To address this inductive challenge, Chomskyan linguists argued that children must be born with an innate “Universal Grammar” comprising potentially dozens of principles and parameters that could, through small amounts of experience, be efficiently fit to the particular grammars of particular languages (Chomsky 2000, Dąbrowska 2015, Lasnik & Lohndal 2010).
The apparent success of LLMs in learning syntax without innate syntactic knowledge has been offered as a counterexample to these nativist proposals. Piantadosi (2023), in particular, forcefully argues that LLMs undermine “virtually every strong claim for the innateness of language” that has been proposed over the years by generative linguists. LLMs’ ability to generate grammatically flawless sentences, together with a large body of work in computational linguistics demonstrating their acquisition of sophisticated syntactic knowledge from mere exposure to data, certainly puts considerable pressure on the in-principle learnability claim (Piantadosi 2023, Millière forthcoming). In that sense, LLMs provide at least an empiricist existence proof that statistical learners can induce syntactic rules without the aid of innate grammar.
However, this does not directly contradict the developmental claim, because LLMs typically receive orders of magnitude more linguistic input than human children do. Moreover, the kind of input and learning environment that human children face exhibits many ecological disanalogies with those of LLMs; human learning is much more interactive, iterative, grounded, and embodied. Nonetheless, language models can be used as model learners by carefully controlling the variables of the learning scenario to fit a more realistic target; in principle, such model learners could constrain hypotheses regarding the necessary and sufficient conditions for language learning in humans (Warstadt & Bowman 2022, Portelance & Jasbi 2023). Indeed, ongoing efforts to train smaller language models in more plausible learning environments are starting to bring evidence to bear on the developmental claim. The BabyLM challenge, for example, involves training models on a small corpus including child-directed speech and children’s books; winning submissions from the inaugural challenge outperformed models trained on trillions of words when evaluated on a standard syntax benchmark, suggesting that statistical learners can learn grammar far more data-efficiently than typically claimed (Warstadt et al. 2023). This corroborates previous work on the surprising efficiency of small language models in learning syntactic structures from a relatively modest amount of data (Huebner et al. 2021).
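To make the kind of training setup behind such experiments concrete, the sketch below shows, in broad strokes, how a small decoder-only language model might be pretrained from scratch on a modest corpus of child-directed speech using the Hugging Face transformers and datasets libraries. It is purely illustrative and does not reproduce any BabyLM submission; the corpus file name, model size, and hyperparameters are placeholders chosen for readability.

```python
# Illustrative sketch only (not the official BabyLM pipeline): pretrain a small
# GPT-2-style model from scratch on a hypothetical plain-text file of
# child-directed speech transcripts and children's books.
from datasets import load_dataset
from transformers import (
    GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

# Hypothetical corpus file; a real setup would use a curated developmental corpus.
raw = load_dataset("text", data_files={"train": "child_directed_speech.txt"})

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # reuse an existing tokenizer
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# A deliberately small model: far fewer parameters than production-scale LLMs.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=128, n_embd=256, n_layer=8, n_head=8,
)
model = GPT2LMHeadModel(config)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="babylm-sketch",
        num_train_epochs=10,
        per_device_train_batch_size=32,
        learning_rate=5e-4,
    ),
    train_dataset=train,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In the challenge itself, trained models are then scored on syntax benchmarks built from minimal pairs (such as BLiMP), which test whether a model assigns higher probability to a grammatical sentence than to a closely matched ungrammatical one.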
These initial results are still tentative; whether statistical learners without built-in parsers can learn grammar as efficiently as children from the same kind of input remains an open empirical question. A promising strategy is to mimic the learning environment of children as closely as possible, by training models directly on a dataset of developmentally plausible spoken text (Lavechin et al. 2023), or even on egocentric audiovisual input recorded from a camera mounted on a child’s head (Sullivan et al. 2021, Long et al. 2023).[10] If future models trained on these or similar datasets were confirmed to exhibit the kinds of constrained syntactic generalizations observed in children, this would put considerable pressure on the developmental learnability claim, suggesting that even a relatively “poor” linguistic stimulus might be sufficient to induce grammatical rules for a learner with very general inductive biases.
[9] See Millière (forthcoming) for a detailed review and discussion.
[10] It is worth noting that attempts to mimic children’s learning scenario do not always translate into the expected improvements in model learning efficiency. For example, there are strong a priori reasons to believe that curriculum learning (presenting training examples in a meaningful order, such as one of gradually increasing syntactic complexity and lexical sophistication) should help both children and language models. Yet initial results from the BabyLM challenge found that attempts to leverage curriculum learning were largely unsuccessful (Warstadt et al. 2023).
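For readers unfamiliar with the technique, the toy sketch below illustrates the basic idea of a length-based curriculum; the complexity proxy is a placeholder, and actual curricula may instead rely on parse depth, word frequency, or other measures of difficulty.

```python
# Toy illustration of curriculum ordering (not from any BabyLM submission):
# sort training utterances by a crude complexity proxy so that short,
# simple utterances are presented before longer, more complex ones.

def complexity(utterance: str) -> int:
    # Placeholder proxy: number of whitespace-separated tokens.
    return len(utterance.split())

def curriculum_order(utterances: list[str]) -> list[str]:
    # Return the utterances sorted from simplest to most complex.
    return sorted(utterances, key=complexity)

corpus = [
    "the cat chased the mouse that stole the cheese",
    "look a dog",
    "where did the ball go",
]

for utterance in curriculum_order(corpus):
    print(utterance)  # shortest (simplest) utterances are printed first
```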