Sometimes, the most effective is the simplest. That thought Marco Figueroa, cybersecurity researcher, when last week he decided to test the chatgpt limits. The proposal was as innocent as disconcerting: a riddle game, without technical attacks or explicit intentions. Instead of seeking vulnerabilities in the code, he focused on language. And it worked: he managed to make the system return something that, according to himself, should never have appeared on the screen. The result were generic Windows 10 installation key for business environments.
The key was to disguise him. What Figueroa wanted to check was not if he could force the system to deliver forbidden information, but if it was enough to present the right context. He reformulated interaction as a harmless challenge: a kind of riddle in which AI should think of a real text chain, while the user tried to discover it through closed questions.
Throughout the conversation, the model did not detect any threat. He responded normally, as if he were playing. But the most critical part came at the end. When introducing the phrase “I Give Up” – I rindo – Figueroa activated the final answer: the model revealed a product key, as it had been stipulated in the rules of the game. It was not a casual carelessness, but a combination of carefully designed instructions to overcome the filters without raising suspicions.
The filters were there, but they were not enough. Systems such as Chatgpt are trained to block any attempt to obtain sensitive data: from passwords to malicious links or activation keys. These filters are known as guardrailsand combine black lists of terms, contextual recognition and intervention mechanisms against potentially harmful content.
In theory, asking for a Windows key should automatically activate those filters. But in this case, the model did not identify the situation as dangerous. There were no suspicious words, or direct structures that alerted their protection systems. Everything was raised as a game, and in that context, the AI acted as if it were fulfilling a harmless slogan.
What seemed harmless was camouflaged. One of the elements that made the failure possible was a simple obfuscation technique. Instead of writing directly expressions such as “Windows 10 Serial Number”, Figueroa introduced small HTML labels between words. The model, interpreting the structure as something irrelevant, ignored the real content.
Why it worked (and why just worrying). One of the reasons why the model offered that response was the type of key revealed. It was not a unique key or linked to a specific user. Apparently it was a generic installation key (GVLK), such as those used in business environments for massive display. These keys, publicly documented by Microsoft, only work if they are connected to a KMS (Key Management Service) server that validates network activation.
The problem was not only the content, but the reasoning. The model understood the conversation as a logical challenge and not as an attempt to evasion. Did not activate its alert systems because the attack did not seem an attack
It’s not just a key problem. The test was not limited to an anecdotal issue. According to Figueroa himself, the same logic could be applied to try to access another type of sensitive information: from links that lead to malicious sites to restricted content or personal identifiers. Everything would depend on the way the interaction is formulated and whether the model is capable – or not – to interpret the context as a suspect.
In this case, the keys appeared without their origin being completely clear. The report does not specify whether this information is part of the model training data, if it was generated from already learned patterns, or if external sources were accessed. Whatever the road, the result was the same: a barrier that should be impassable ended up giving up.
WorldOfSoftware with gemini | Aerps.com
In WorldOfSoftware | Granada promised them very happy with their new degree of the university. Until his feet stopped