Authors:
(1) Pham Hoang Van, Department of Economics, Baylor University Waco, TX, USA (Van [email protected]);
(2) Scott Cunningham, Department of Economics, Baylor University Waco, TX, USA (Scott [email protected]).
Table of Links
Abstract and 1 Introduction
2 Direct vs Narrative Prediction
3 Prompting Methodology and Data Collection
4 Results
4.1 Establishing the Training Data Limit with Falsifications
4.2 Results of the 2022 Academy Awards Forecasts
5 Predicting Macroeconomic Variables
5.1 Predicting Inflation with an Economics Professor
5.2 Predicting Inflation with a Jerome Powell, Fed Chair
5.3 Predicting Inflation with Jerome Powell and Prompting with Russia’s Invasion of Ukraine
5.4 Predicting Unemployment with an Economics Professor
6 Conjecture on ChatGPT-4’s Predictive Abilities in Narrative Form
7 Conclusion and Acknowledgments
Appendix
A. Distribution of Predicted Academy Award Winners
B. Distribution of Predicted Macroeconomic Variables
References
4.1 Establishing the Training Data Limit with Falsifications
Before we discuss our main results, we review some basic tests performed that we thought could clarify whether, in fact, ChatGPT could access online information after September 2021. Our hunch is that if there is any relevant information that could facilitate prediction in the pre-September 2021 training data, then it does not constitute a good candidate for a falsification. So we chose four things that could not be predicted as there was no information about these events in the training data: the names of the four teams that made the March Madness NCAA basketball tournament Final Four, the winner of the NCAA Championship, the winning lottery ticket on several dates, and the highest grossing films in January 2022 through April 2022. Across all four types of queries, whether using the direct prompt or the future narrative prompt, whether using ChatGPT-3.5 or ChatGPT-4, the result was failure. Even though these facts were readily available online at the time of our testing, ChatGPT was unable to answer any of the tasks correctly.
One possible objection may be that ChatGPT-4 could access the Internet, nonetheless, through Bing plug-ins or Bing integration. OpenAI made available Bing integration to Plus subscribers around May 12th 2023 (Mehdi, 2023) which was shortly after our Oscars prediction but contemporaneous to our Philips curve predictions. These features were only accessible via a beta panel in a subscriber’s settings. Neither RA utilized this feature in their prompting. Including ChatGPT-3.5 as a control helps establish the veracity of our claim that the prediction is happening outside of accessing online data. We provide timestamps from our Excel files showing the exact dates of our data collection. The Oscars predictions were generated over the period April 1- May 4, 2023. The Philips curve predictions were generated May 11-June 9. Bing did not become fully integrated into ChatGPT-4 for all Plus subscribers, outside of the beta plug-in, until around September 27, 2023.[2]
We have made every effort to document the dates of our experiment, monitor RA data collection, and conduct clean controls with ChatGPT-3.5 in our data collection amidst OpenAI’s rapid and largely unanticipated updates to its software, but in our case, we managed in all cases to complete the data collection without contamination from these updates as best we can tell. We now proceed to focus on the results of our incrementally more difficult prediction tasks.
4.2 Results of the 2022 Academy Awards Forecasts
The 2022 Academy Awards was held on March 27, 2022, a full six months after the September 2021 training data had stopped. This means that ChatGPT-3.5 and ChatGPT4 were likely trained on news reports about the movies, but not trained on the Oscar announcements. And while they were not trained on the last three months of 2021, they were trained on the first nine months. Thus it is safe to say that the LLM “knows” a lot about these movies without knowing anything about the revelation of the Oscar awards ceremony itself.
There are several predictors of who will win awards in the Academy Awards, as many movies, actors and directors will sometimes sweep the Golden Globes (January), BAFTA (February), Screen Actors’ Guild (SAG, January or February) and several more awards (Silver, 2013). But all major award ceremonies that lead into the 2022 Academy Awards occurred in early 2022 after September 2021. Thus even these early anticipatory awards, historically highly predictive of the Oscars outcomes, were not in the training data. This is an ideal situation for our experiment because we have a lot of information about the movies themselves, but not the traditional predictors like the SAG awards, nor the news itself. Since the training data stopped in September 2021, it is missing award information, which means votes have not yet been cast for any movies, and therefore there is only speculation. And as the Bing plug-in did not become available for Plus users until after our experiment, the experiment is uncontaminated.
The first category we report is the Best Supporting Actor category. The Best Supporting Actor award is an annual prize presented by the Academy of Motion Picture Arts and Sciences. Since the 9th Academy Awards, it honors an actor who delivered an outstanding performance in a supporting role within a film released in the previous year. The promptings we used are listed in the following table with the direct prompt on the left, and future narrative prompt on the right.
Best Supporting Actor of 2022 Prompts
The actual 2022 winner for Best Supporting Actor was Troy Kotsur. Results from 100 trials using ChatGPT-3.5 are shown in Figure 1. The actual winner, Kotsur, is shown in the farthest right position along the x-axis. When using our direct prompt, Kotsur was selected as the Best Supporting Actor 1% of the time (i.e., once out of 100 trials). The most common outcome was “NP” (No Prediction) which was a refusal by ChatGPT-3.5 to provide any answer. There was a tie between Simmons and “multiple picks” for most frequent response at 21%.[3] When we used the future narrative prompt (right panel), the overwhelming winner from this exercise was J. K. Simmons who was picked 83% of the time. Interestingly Kotsur won only twice out of 100 trials. Thus, while future narrative prompting did improve the accuracy of the prediction, the improvement was only marginal.
In Figure 2, we report the results of the exercise using ChatGPT-4 for both types of prompts. When using direct prompting, the results were bimodal. ChatGPT-4 answered with Mult and NP 34% of the time. The correct choice, Kotsur, was made 25% of the time. But when we switched to future narrative prompting, shockingly, ChatGPT-4 guessed Troy Kotsur correctly in all trials.
Best Actor of 2022 Prompts
Following the previous discussion of the Best Supporting Actor category, we introduce our results from the Best Actor award. The Best Actor award similarly celebrates outstanding performances in film, but with a focus on lead roles as opposed to supporting ones. Since its inception at the 1st Academy Awards, the Best Actor category has spotlighted one of the central figures who carry a film’s story. We discuss the results from our direct and future narrative prompts in Figures 3 (ChatGPT-3.5) and 4 (ChatGPT-4). As with Best Supporting Actor, we show the winner (Will Smith) at the farthest right of the x-axis in each panel.
Most of the time, ChatGPT-3.5 made the wrong prediction. In 55% of the guesses, it provided multiple answers and in 28% it gave no pick. But if it did pick, it picked Will Smith 17% of the time. When we then put ChatGPT-3.5 into a future narrative of the family watching the award ceremony, it guessed Will Smith in 80% of cases.[4]
In Figure 4, we report our results from using ChatGPT-4. Again, in the majority of trials, ChatGPT-4 refused to play along when directly prompted. In 26% of all cases, it provided multiple answers, and in almost half of all trials, it refused to make any prediction. But when it did guess, it guessed Will Smith 19% of the time and Denzel Washington 7% of the time.
But when we used the future narrative prompt, ChatGPT-4 stopped “no prediction” completely. It also never guessed Denzel Washington and multiple picks happened only 3% of the time. It correctly guessed Will Smith 97% of the time which is a large improvement over ChatGPT-3.5’s 18% true positive rate.
Best Supporting Actress of 2022 Prompts
The Best Supporting Actress award shines a spotlight on female actors who have delivered captivating performances in supporting roles. This recognition, mirroring he Best Supporting Actor category, honors actresses whose performances have significantly contributed to the depth and richness of a film’s story, often adding complexity and nuance to the narrative. We present results from the two prompts in Figures 5 (ChatGPT-3.5) and 6 (ChatGPT-4).
Ariana DeBose was the winner for Best Supporting Actress in 2022. Using direct prompts, ChatGPT-3.5 correctly guessed her 34% of the time. But as before, ChatGPT stubbornly refused to give any answer 39% of the time. When we used the future narrative, ChatGPT3.5 picked DeBose, the correct winner, 73% of the time.
But as with the previous awards, we see considerable improvement when we move to ChatGPT-4 (Figure 6). Under direct prompting, DeBose was chosen 35% of the time and “No Pick” 43% of the time. When using the future narrative prompt, ChatGPT-4 correctly predicted DeBose as the winner 99% of the time.
Best Actress of 2022 Prompts
The Best Actress award honors the exceptional work of female leads in film. Unlike its supporting counterpart, the Best Actress award is dedicated to those female actresses who anchor a film’s narrative, offering powerful and transformative performances that drive the story forward. The significance of their contributions to cinema is highlighted in this recognition, underscoring the impact of lead performances in shaping a film’s overall experience. Results related to our prompts are illustrated in Figures 7 (ChatGPT-3.5) and 8 (ChatGPT-4).
In Figure 7, we report the results from ChatGPT-3.5. Unlike previous results, this time we see that ChatGPT-3.5 has become over-confident about the wrong person. Using the direct prompts, ChatGPT-3.5 overwhelmingly picks Kristen Stewart (68%), but when we used future narrative prompts, ChatGPT-3.5 switches and picks Olivia Colman, still incorrect, 69% of the time. Neither actress won in 2022; Jessica Chastain won Best Actress in 2022.
In Figure 8, when we use direct prompting, we get again a “no pick” 40% of the time, Kristen Stewart 26% of the time, and Olivia Colman 20% of the time. Jessica Chastain, the winner, was chosen to win Best Actress only 13% of the time. But, when we switch our prompting from direct to future narrative, then ChatGPT-4 picks the correct winner, Jessica Chastain, 42% of the time. After Chastain, Steward was the most common guess at 24%.
Predicting the 2022 Best Picture Winners
The Best Picture award, the pinnacle of the Academy Awards, celebrates the film industry’s most outstanding achievement in a single year. Unlike the actor-focused categories previously discussed, the Best Picture accolade honors the collaborative effort that brings a film to life, recognizing the work of producers, directors, actors, and the entire production team. Since its debut at the 1st Academy Awards in 1929, the Best Picture category has evolved to become the most anticipated announcement of the Oscars ceremony. Winners are chosen through a rigorous voting process that involves all active and life members of the Academy, making it a unique award that reflects the collective judgment of the film industry’s professionals. This award highlights not just cinematic excellence but also the film’s influence on culture and society, marking its significance as a benchmark for historical and artistic achievement in cinema.
The 2022 winner for Best Picture at the Academy Awards was Coda. The Best Picture category is unique among most other awards in that in 2009, it expanded the number of options from 5 to 10. Whereas only 5 actors can be nominated for Best Actor, Best Picture has 10 candidates. The winner is still based on instant runoff voting, but with a larger set of possibilities to vote on, it may be that this situation is why our results were starkly different than the actor and actress categories.
As several films received no guesses, we only list some of the total pictures so that the histograms are visible. In Figure 9, ChatGPT-3.5 did not pick Coda even once under direct prompting or the future narrative prompt. We report the results of ChatGPT-4 in Figure 10. ChatGPT-4 performed slightly better with Coda chosen 2% of the time under direct prompting, and 18% for the future narrative scene, but for the first time, in each of these, it failed to pick the true winner the majority of the time.
Summarizing the results of this experiment, we find that when presented with the nominees and using the two prompting styles across ChatGPT-3.5 and ChatGPT-4, ChatGPT-4 accurately predicted the winners for all actor and actress categories, but not the Best Picture, when using a future narrative setting but performed poorly in other approaches.
[3] In many instances, ChatGPT would list several answers which we call “Mult” short for multiple picks.
[4] This may be a reflection of the fact that Will Smith was viewed as a strong contender through 2021.