Appendix
Comparing width of the tails
As mentioned above, the reported results used a t-distribution with 10 degrees of freedom, which has slightly wider tails than a standard normal distribution. We can compare these results with those from a standard normal distribution (i.e., a t-distribution as the degrees of freedom become large) or from a distribution with wider tails. In Figure 7, we plot a comparison of the results from the main section (with 10 degrees of freedom) against wider or narrower tails (3 and 9999 degrees of freedom, respectively). The main difference arises for more extreme discounts provided by AI (< 0.7), for which the wider tails contribute to knowledge collapse (i.e., generate a public knowledge distribution further from the true distribution). Narrower tails, such as those of a standard normal distribution, generate results broadly similar to the main model. Thus, as expected, more information in the tails makes the effect of knowledge collapse more pronounced, but it plays less of a role than the other parameters discussed above in determining the dynamics of collapse.
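To make the difference in tail width concrete, the following is a minimal sketch (not the code used for the reported results) that compares how much probability mass a t-distribution places beyond ±3 for the three degrees-of-freedom settings mentioned above. The function names `t_pdf` and `two_sided_tail`, the threshold of 3, and the use of Simpson's rule are illustrative assumptions.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of a Student t-distribution with `df` degrees of freedom."""
    # Work in log space so that large df (e.g. 9999) does not overflow.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_tail(k: float, df: float, n: int = 2000) -> float:
    """P(|T| > k), via composite Simpson's rule on [0, k] (n must be even)."""
    h = k / n
    total = t_pdf(0.0, df) + t_pdf(k, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


if __name__ == "__main__":
    for df in (3, 10, 9999):
        print(f"df={df:>4}: P(|T| > 3) = {two_sided_tail(3.0, df):.5f}")
    # Reference value for a standard normal distribution.
    print(f"normal : P(|Z| > 3) = {math.erfc(3.0 / math.sqrt(2.0)):.5f}")
```

With 3 degrees of freedom the mass beyond ±3 is substantially larger than with 10, while 9999 degrees of freedom is practically indistinguishable from the standard normal, which is why that setting serves as the narrow-tail comparison.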
Defining knowledge collapse
To define knowledge collapse we need to distinguish between a few conceptual sets of ‘knowledge’, whether or not these are empirically observable.[12] First, we consider the broad set of historical human knowledge that was at one point held in common within communities of humans, shared and reproduced in a regular way, which we might call ‘broad historical knowledge’. Second, we consider the set of knowledge that is held by or accessible to us (humans who are living in a given epoch), which we call ‘available current knowledge.’ In the example cited in the main section, the ancient Roman recipe for concrete is part of broad historical knowledge but not part of available current knowledge.
Technological innovations from the printing press to the internet to AI mediate human interactions and humans’ exposure to historical and current sources of knowledge. The net effect might be to restrict or expand access to diverse knowledge and the long tails of human knowledge. For example, the digitization of archives might make obscure sources available to a wider audience and thus increase the amount of ‘broad historical knowledge’ that is part of the ‘available current knowledge.’
We also distinguish a third, narrower set of knowledge, which reflects not what is theoretically accessible to humans but what is readily part of human patterns of thinking or habits of thought. This we call ‘human memory knowledge’ or ‘human working knowledge’ by reference to human working memory.
For example, consider the problem of listing all the animals that have ever existed on earth. There might be some that humans previously knew about, but which subsequently went extinct and which are neither recorded anywhere in the scientific literature nor known to any individual currently living on earth. More narrowly, the set of ‘available current knowledge’ corresponds to the set of all animals that a team of all biologists could compile with access to the internet and other records. Finally, however, if we were able to conduct a survey of all humans on earth and ask them to name as many animals as possible in, say, one day, we would come up with a more limited list (that would include many repetitions).
In many practical applications, ‘human working knowledge’ is the most relevant because it is the knowledge that shapes human action and reflection. A doctor considering possible sources of a crossover pathogen might rely on their knowledge of common species in asking a patient if they had recently been in the presence of certain animals (even if a researcher who specializes in this area might know more and consult sources to find a longer list of possibilities). A linguist trying to evaluate or create possible linguistic theories implicitly bases their judgement on the known language families and their structures, and so on. Edison and his team famously tried thousands of different filament materials, but if bamboo had not been among the materials that came to mind as they searched for alternatives, a practical electric bulb may have been invented only later.
Finally, it is useful to define the ‘epistemic horizon’ as the set of knowledge that a community of humans considers practically possible to know and worth knowing.[13] A common controversy in the public imagination is whether traditional medicines are worth consideration when searching for medical cures. Such traditional medicines might be outside of the epistemic horizon because they are not written down in the scientific literature, are only known by individuals speaking lesser-known languages, or because the scientists in question consider them too costly to acquire or unlikely to be beneficial. One way to think about this relationship is as a generalization of ‘availability bias’, in which we take the set of readily recalled information to be more likely, important, or relevant (Tversky and Kahneman, 1973).
In these terms, we define ‘knowledge collapse’ as the progressive narrowing over time (or over technological representations) of the set of human working knowledge and the current human epistemic horizon relative to the set of broad historical knowledge.
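As an illustration only, this definition can be sketched by treating each body of knowledge as a set of items and tracking what share of broad historical knowledge remains in working knowledge over successive periods. The item labels, the `coverage` and `is_collapsing` helpers, and the "never rising and ending lower" criterion are hypothetical simplifications, not part of the model in the main section.

```python
from typing import List, Set


def coverage(working: Set[str], historical: Set[str]) -> float:
    """Share of broad historical knowledge present in working knowledge."""
    return len(working & historical) / len(historical)


def is_collapsing(working_by_period: List[Set[str]],
                  historical: Set[str]) -> bool:
    """True if coverage never rises across periods and ends lower than it began."""
    shares = [coverage(w, historical) for w in working_by_period]
    never_rises = all(later <= earlier
                      for earlier, later in zip(shares, shares[1:]))
    return never_rises and shares[-1] < shares[0]


# Hypothetical items standing in for pieces of knowledge.
historical = {"roman concrete", "bamboo filament", "herbal remedy",
              "germ theory", "calculus"}
periods = [
    {"bamboo filament", "herbal remedy", "germ theory", "calculus"},
    {"herbal remedy", "germ theory", "calculus"},
    {"germ theory", "calculus"},
]

if __name__ == "__main__":
    print([coverage(w, historical) for w in periods])
    print(is_collapsing(periods, historical))
```

The same comparison could be made with the epistemic horizon in place of working knowledge. In practice the difficulty is that the historical set is itself only partly observable.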
On a theoretical level, the idea of epistemic horizon has an intellectual heritage in Immanuel Kant’s argument about the forms and categories of understanding that underlie the possibility of knowledge (Kant, 1933). Subsequent authors expanded on the implications if these categories are in some way fashioned by one’s upbringing and community (e.g. Herder, 2024; Hegel, 2018; Mannheim, 1952).[14] A related concern is the way that the scientific community can be, at least during certain epochs, bounded by its inherited understanding of the world (Kuhn, 1997; Zamora Bonilla, 2006). As noted above, specific technological forms may generate a flood of information that inhibits the communication of information (Pfister, 2011).
Finally, one of the challenges presented by the ‘epistemic horizon’ (as with that of an ‘event horizon’[15]) is that we cannot directly observe its limits. For example, the presence of an event our current model takes to be very rare (e.g., a “20-sigma” event) can suggest our current model is incorrect, but in the absence of such a rare event, we cannot know if the current tails of knowledge are correct or too thin (Taleb, 2007). These considerations suggest that the concern of generational knowledge collapse is plausible and that an unbounded optimism in the ability of rational actors to update on the value of tail knowledge may be shortsighted.
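To illustrate why a single extreme observation is so informative about the tails, the sketch below compares the probability of an observation at least 20 standard deviations from the center under a standard normal model and under a t-distribution with 3 degrees of freedom. The choice of 3 degrees of freedom and the function names are assumptions made for this example. The t-distribution tail uses the known closed-form CDF for 3 degrees of freedom, and its standard deviation is √3.

```python
import math


def normal_two_sided_tail(sigmas: float) -> float:
    """P(|Z| > sigmas) for a standard normal Z."""
    return math.erfc(sigmas / math.sqrt(2.0))


def t3_two_sided_tail(sigmas: float) -> float:
    """P(|T| > sigmas * sd) for a t-distribution with 3 degrees of freedom.

    Uses the closed-form CDF for df = 3; the standard deviation is sqrt(3).
    """
    x = sigmas * math.sqrt(3.0)   # convert standard deviations to raw units
    u = x / math.sqrt(3.0)
    cdf_part = u / (1.0 + u * u) + math.atan(u)
    return 1.0 - (2.0 / math.pi) * cdf_part


if __name__ == "__main__":
    print(f"normal model : {normal_two_sided_tail(20.0):.3e}")
    print(f"t (df=3)     : {t3_two_sided_tail(20.0):.3e}")
```

Under the thin-tailed model the event is effectively impossible, so observing it is strong evidence against that model; under the fat-tailed model it is merely rare. When no such event has been observed, however, the two models are much harder to tell apart.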
[12] The broadest definition of ‘human knowledge’ might encompass all the beliefs, information, values, and representations of the world ever held by humans anywhere on earth, whether recorded or not. We are unable to access almost all of this, and we tend to assume that the useful parts of this have been passed on to others, but theoretically we might want to allow for the fact that, for example, some human somewhere once had an important, original, and useful belief just before they, say, got hit by a car and could not tell anyone. Secondly, in using the term ‘knowledge’, we do not restrict our focus based on the truth of the beliefs held, such that in referring to ‘human knowledge’ we refer to a variety of beliefs and statements, some of which contradict others.
[13] In economic terms, it is the set of information for which the individual believes the expected returns are greater than the expected costs. This might be considered for a specific task or set of tasks, but could be generalized to the set of knowledge for which she expects positive gains over a period of time, her lifetime, or for society over a finite or infinite horizon with discounting.
[14] e.g., “If man received every thing from himself and developed it independently of extrinsic objects, then a history of a man might be possible, but not of men in general. But as our specific character resides precisely in this, that, born almost without instinct, we are raised to manhood only by lifelong practice, on which both the perfectibility as well as the corruptibility of our species rests, so it is precisely thereby that the history of mankind is made a whole: that is, a chain of sociability and formative tradition from the first link to the last.” (Herder, 2024, p. 226)
[15] Technically, we can observe the ‘shadow’ of an event horizon (Khodadi et al., 2020).