Authors:
(1) Pham Hoang Van, Department of Economics, Baylor University, Waco, TX, USA (Van [email protected]);
(2) Scott Cunningham, Department of Economics, Baylor University, Waco, TX, USA (Scott [email protected]).
Table of Links
Abstract and 1 Introduction
2 Direct vs Narrative Prediction
3 Prompting Methodology and Data Collection
4 Results
4.1 Establishing the Training Data Limit with Falsifications
4.2 Results of the 2022 Academy Awards Forecasts
5 Predicting Macroeconomic Variables
5.1 Predicting Inflation with an Economics Professor
5.2 Predicting Inflation with a Jerome Powell, Fed Chair
5.3 Predicting Inflation with Jerome Powell and Prompting with Russia’s Invasion of Ukraine
5.4 Predicting Unemployment with an Economics Professor
6 Conjecture on ChatGPT-4’s Predictive Abilities in Narrative Form
7 Conclusion and Acknowledgments
Appendix
A. Distribution of Predicted Academy Award Winners
B. Distribution of Predicted Macroeconomic Variables
References
5 Predicting Macroeconomic Variables
The Academy Awards are an independently interesting outcome to predict. We also chose them because we thought they offered a high chance of success, given the ample writing throughout the year on these movies and their lead and supporting actors and actresses. We now move to macroeconomic phenomena that are regularly the subject of policy making and prediction. Predicting macroeconomic variables is important because it helps individuals, firms, and government actors plan better today in light of possible positive or negative news in the future. It can also inform Fed decisions to use open market operations and the other tools at its disposal to ease or tighten the money supply.
While predicting Best Actor and predicting the inflation rate several months ahead are similar in that both require predicting real but unknown future events, they differ in important ways, some obvious and some less so. First, the prediction of Best Actor had a 20% chance of success under random guessing. It meant selecting a categorical outcome, not a right-skewed, potentially unbounded number ranging from 0% to something massive under hyperinflation scenarios. Even if higher values are unlikely, they are possible with a large language model that hallucinates. It is unclear what is in the training data, or to what degree large language models round continuous variables, as OpenAI has been secretive about the training data and has not shared the source code for ChatGPT-4. We raise these issues simply to highlight that, even if large language models are somehow aggregating information from their training data, the two types of predictions differ in their chances of success when we shift from the Academy Awards to macroeconomic variables.
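The difference in baseline difficulty can be made concrete with a small sketch. It assumes five nominees per category (which is what a 20% guessing rate implies) and, for the continuous case, a hypothetical tolerance band around the realized value. The function names and the tolerance are illustrative assumptions, not part of our experimental code.

```python
from fractions import Fraction


def chance_accuracy(n_nominees: int) -> Fraction:
    """Probability that a uniform random guess picks the winner."""
    if n_nominees < 1:
        raise ValueError("need at least one nominee")
    return Fraction(1, n_nominees)


def expected_hits(n_trials: int, n_nominees: int) -> Fraction:
    """Expected number of correct guesses across independent trials."""
    return n_trials * chance_accuracy(n_nominees)


def within_tolerance(predicted: float, actual: float, tolerance: float) -> bool:
    """A continuous prediction only 'succeeds' relative to a chosen band.

    The tolerance is an assumption made for illustration; unlike the
    categorical case, there is no natural guessing baseline here.
    """
    return abs(predicted - actual) <= tolerance


if __name__ == "__main__":
    # Categorical case: five nominees (assumed), 100 trials.
    print(float(chance_accuracy(5)))
    print(expected_hits(100, 5))
    # Continuous case: success depends entirely on the chosen band.
    print(within_tolerance(5.4, 5.0, 0.5))
```

The categorical baseline is fixed by the number of nominees, whereas any success rate for a continuous variable depends on how wide a band one is willing to count as correct.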
Second, the two predictions cover different time horizons. The 2022 Academy Awards were held on March 27, 2022, about six months after the training cutoff date. By contrast, we asked ChatGPT-4 to make several predictions of macroeconomic variables, each requiring a monthly prediction over the 12 months from October 2021 to September 2022.
Third, the Academy Awards outcome seems more likely to be insensitive to intervening events. If anything, it follows trends, as several earlier awards ceremonies (e.g., the Directors Guild of America Awards) are historically highly predictive of the various “Best” awards at the Oscars. None of those results are in the training data either, and in any case it is unclear why earlier wins or losses would shift voting preferences at the Academy Awards.
This is not the case for macroeconomic variables, because the Federal Reserve, insofar as it follows rules like a Taylor Rule, will respond to changing economic conditions by using its policy levers to contract or expand the economy. This makes prediction challenging: even if large language models could predict exogenous events, they may suffer from a built-in Lucas Critique problem if their training data reflects beliefs that are not based on the Taylor Rule. There were, after all, major world events between September 2021 and March 2022, such as Russia’s invasion of Ukraine and higher-than-expected inflation, both of which may have had unknown effects on domestic inflation and unemployment, leading to predictions that overshoot or undershoot because of the Fed’s reliance on endogenous rules or discretion. We explore this prediction problem in detail by again using direct (naive) and narrative prompts for 100 trials with both ChatGPT-3.5 and ChatGPT-4, using our two RAs to minimize cascading bias. This time, however, we repeat the experiment, prompting ChatGPT-3.5 and ChatGPT-4 with additional information about Russia’s invasion of Ukraine in early 2022. This allows us to see whether large language models ever attempt ceteris paribus style reasoning when aggregating information from their training data, as there is no obvious reason why they necessarily should.
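For readers unfamiliar with the Taylor Rule referenced above, the following is a minimal sketch of its textbook form (Taylor, 1993), in which the policy rate responds to inflation and the output gap. The default parameter values (a 2% neutral real rate, a 2% inflation target, and weights of 0.5) are the conventional textbook ones and are assumptions here; we do not claim that the Fed follows this exact rule, and the function name is hypothetical.

```python
def taylor_rule_rate(
    inflation: float,
    output_gap: float,
    neutral_real_rate: float = 2.0,   # assumed textbook value, in percent
    inflation_target: float = 2.0,    # assumed textbook value, in percent
    inflation_weight: float = 0.5,    # assumed textbook weight
    output_weight: float = 0.5,       # assumed textbook weight
) -> float:
    """Nominal policy rate (percent) implied by a textbook Taylor Rule.

    rate = r* + inflation
           + inflation_weight * (inflation - target)
           + output_weight * output_gap
    """
    return (
        neutral_real_rate
        + inflation
        + inflation_weight * (inflation - inflation_target)
        + output_weight * output_gap
    )


if __name__ == "__main__":
    # Illustrative inputs only: the implied rate rises more than
    # one-for-one with inflation, which is the feedback channel that
    # makes the predicted variable respond to policy.
    for inflation in (2.0, 4.0, 8.0):
        print(inflation, taylor_rule_rate(inflation, output_gap=0.0))
```

Because the implied rate moves with realized inflation, a shock such as an unexpected rise in prices triggers a policy response that in turn affects later inflation and unemployment. This endogeneity is what distinguishes the macroeconomic setting from the Academy Awards.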