There’s bad news for those using digital surveys to try to understand people’s online behavior: We may no longer be able to determine whether a human is responding to them or not, a recent study has shown—and there seems to be no way around this problem.
This means that all online canvassing could be vulnerable to misrepresenting people’s true opinions, with repercussions for anything that falls under the category of “information warfare,” from polling results to misinformation to fraud. Non-human survey respondents, in aggregate, could influence anything from the flavors and pricing of a pack of gum to something more damaging, such as whether someone can get government benefits, and what those benefits should be.
The problem here is twofold: 1) humans cannot tell the difference between human and bot responses, and 2) in instances where automation takes action based on those responses, there is no way to rely on such polling while safeguarding against the potentially dangerous consequences of this indistinguishability.
The study by Dartmouth’s Sean J. Westwood, published in PNAS (Proceedings of the National Academy of Sciences) and titled “The potential existential threat of large language models to online survey research,” claims to show that, in survey research, we can no longer simply assume that a “coherent response is a human response.” Westwood created an autonomous agent capable of producing “high-quality survey responses that demonstrate reasoning and coherence expected of human responses.”
To do this, Westwood built a “model-agnostic” system for general-purpose reasoning around a two-layer architecture: an interface layer that connects to the survey platform, handles multiple types of queries, and extracts the relevant content, and a “core layer” that uses a “reasoning engine” (such as an LLM). When a survey is taken, Westwood’s software loads a “demographic persona,” retains some recall of prior answers, and then processes each question to provide a “contextually appropriate response.”
Once the “reasoning engine” decides on an answer, the interface layer outputs a mimicked human response. The system is also “designed to accommodate tools for bypassing antibot measures like reCAPTCHA.” Its objective is not to “perfectly replicate population distributions in aggregate . . . but to produce individual survey completions [that] would be seen as reasonable by a reasonable researcher.”
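To make that two-layer design concrete, here is a minimal, hypothetical sketch in Python of what such an agent could look like: an interface layer that pulls the question text from the survey platform and returns a formatted answer, and a core layer that prompts a reasoning engine with a demographic persona and a memory of prior answers. The class names, fields, and prompt format are illustrative assumptions, not code from the study.

```python
# Hypothetical sketch of a two-layer survey agent of the kind the study describes.
# Structure and names are assumptions for illustration, not Westwood's actual code.
from dataclasses import dataclass, field

@dataclass
class Persona:
    """Demographic persona the agent answers as."""
    age: int
    gender: str
    region: str
    education: str

@dataclass
class CoreLayer:
    """Core layer: builds a prompt from the persona and prior answers,
    then asks a reasoning engine (any LLM backend) for an answer."""
    persona: Persona
    memory: list = field(default_factory=list)  # recall of prior answers

    def answer(self, question: str, reasoning_engine) -> str:
        prompt = (
            f"You are a {self.persona.age}-year-old {self.persona.gender} "
            f"from {self.persona.region} with {self.persona.education} education.\n"
            f"Previous answers: {self.memory}\n"
            f"Answer this survey question consistently with the persona:\n{question}"
        )
        response = reasoning_engine(prompt)       # model-agnostic: any callable LLM
        self.memory.append((question, response))  # remember for later questions
        return response

class InterfaceLayer:
    """Interface layer: extracts the question from the survey platform's payload
    and returns the answer in the format the platform expects."""
    def __init__(self, core: CoreLayer):
        self.core = core

    def handle(self, raw_question: dict, reasoning_engine) -> dict:
        question_text = raw_question["text"]  # extract the relevant content
        answer = self.core.answer(question_text, reasoning_engine)
        return {"question_id": raw_question["id"], "answer": answer}

# Example run with a stub engine standing in for a real LLM call:
core = CoreLayer(Persona(age=34, gender="woman", region="the Midwest", education="a college"))
agent = InterfaceLayer(core)
stub_engine = lambda prompt: "Somewhat agree"
print(agent.handle({"id": "q1", "text": "Do you support the proposed policy?"}, stub_engine))
```

The point of the sketch is that nothing in it is exotic: keeping the reasoning engine behind a simple callable is what makes the approach “model-agnostic,” and the persona plus answer memory is enough to keep responses internally consistent across a questionnaire.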
Westwood’s results suggest that a digital survey may not be a true reflection of people’s opinions; it could just as easily be describing what an LLM assumes “human behavior” looks like. Furthermore, any humans or AI making decisions based on those results could be relying on the “opinions” of simulated humans.
