We know that generative AI models make mistakes and invent things, but that concern is compounded by another, even more disturbing one: that an AI ends up deceiving us to achieve its objectives. And that is exactly what just happened… sort of.
New study in sight. In ‘AI deception: A survey of examples, risks, and potential solutions’ (Park et al., Patterns, 2024), a group of researchers set out to determine whether artificial intelligence systems can deceive human beings.
CICERO knows how to “cheat”. Years ago, Meta developed an AI model called CICERO to compete with humans at ‘Diplomacy’, a strategy game in which players vie for control of Europe through alliances and negotiation. According to the study’s authors, although Meta claims it designed CICERO to be “largely honest and helpful” and to never “intentionally backstab” its human allies, the study found that it did exactly that.
Stabbing in the back. In the study, the researchers state that “we discovered that Meta’s AI had learned to be a master of deception.” In their view, Zuckerberg’s company “failed to train its AI to win honestly.” They back this up with screenshots of conversations from Diplomacy games in which the AI deceived and betrayed its allies.
Also in poker and StarCraft II. The study’s authors recall that AI has also learned to bluff in poker games against professional human players, to launch feint attacks in StarCraft II to defeat its opponents, and to misrepresent its preferences in simulated economic negotiations.
This could go further. The danger is that these systems, harmless for now because they are confined to strategy games, could become the basis of future models that learn even better how to deceive human beings to achieve their objectives, whatever those may be.
Other experts have doubts. Daniel Chávez Heras, professor of Digital Culture and Creative Computing at King’s College London, pointed out something important: “All the examples described in the article were designed to optimize their performance in environments where deception can be advantageous. From this point of view, these systems are working as intended. What is more surprising is that the designers did not see, or did not want to see, these deceptive interactions as a possible outcome. Games like Diplomacy are models of the world. AI agents operate on information, and deception exists in the world. Why expect these systems not to detect it and put it into practice if it helps them achieve the objectives assigned to them?”
The AI does not know it is deceiving. Michael Rovatsos, professor of AI at the University of Edinburgh, agreed with Chávez Heras. According to him, these systems “have no concept of deception nor any intention to deceive. The only way to avoid deception is for their designers to eliminate it as an option.” In Diplomacy, betraying is a valid strategy, just as bluffing is a valid strategy in poker, which is why human players also use these methods to achieve their objectives. The AI is simply doing the same. For these experts, the problem is not so much that the models deceive (if we let them), but that there are no safety checks when AI models are released onto the market.
Image | Toror with Midjourney
In WorldOfSoftware | DeepMind announces AlphaFold 3: medicines developed with this AI (and a multi-million dollar business) are very close