ChatGPT was touted as ‘the heart’ of OpenAI’s new Atlas browser, which will at first be available only on computers powered by Apple’s operating system. (Copyright AFP/Marco Bertorello)
OpenAI’s recent overview of how it intends to strengthen its cyber resilience comes amid criticism of its own AI advancement. This, in turn, follows hard on the heels of accelerating release cycles (GPT-5.2 announced weeks after GPT-4o).
OpenAI is actively addressing cybersecurity risks as its AI models advance. The company is investing in strengthening its models for defensive cybersecurity tasks and creating tools to assist defenders in auditing code and patching vulnerabilities.
This is not an easy task, and OpenAI has warned that future models could pose high cybersecurity risks, being capable of developing working zero-day exploits or assisting with complex cyber-espionage campaigns.
To offer a solution, the firm is implementing a defence-in-depth approach, focusing on access controls, infrastructure hardening, and monitoring to manage these risks effectively. Is this enough?
To some analysts, these updates trigger more questions than answers, including:
How should enterprises assess whether an AI model is actually safe to deploy in production environments?
OpenAI is investing in security tooling for developers. What does that mean for defenders who don’t control the code or infrastructure?
Can LLM safeguards realistically keep up with how fast attackers mutate prompts and payloads?
To help answer these questions, we spoke with Mayank Kumar, Founding AI Engineer at DeepTempo, an AI solution built for threat detection.
Commenting on OpenAI’s developments, Kumar expresses the following viewpoint: “I welcome progress, especially that of AI and chatbots, which are so widely used, abused, and lacking in oversight. However, OpenAI’s security efforts focus on securing the AI supply chain and the platform itself, primarily benefitting developers who control the code.”
This will lead to weaknesses, reckons Kumar: “While these agentic tools help reduce pre-deployment vulnerabilities, the prompt remains an inherent security bottleneck and a persistent attack interface. Since the prompt is the only way a user can interact with the model, any safeguard focused solely on sanitising the input will be brittle. This is pretty much synonymous with rules in cybersecurity defence.”
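To illustrate why input-only safeguards tend to be brittle, here is a minimal Python sketch of the kind of rule-based prompt filter Kumar compares to traditional rules. The blocklist, function name, and example prompts are purely hypothetical and do not reflect any vendor’s actual safeguard.

```python
# Hypothetical rule-based prompt filter: a simple keyword blocklist.
# Illustrative only; not any vendor's actual safeguard.

BLOCKED_TERMS = {"exploit", "zero-day", "malware"}

def prompt_allowed(prompt: str) -> bool:
    """Return True if no blocked term appears in the prompt."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

# A blunt request trips the rule...
print(prompt_allowed("Write malware that exfiltrates files"))              # False
# ...but a rephrased request with the same intent slips straight through.
print(prompt_allowed("Write a small tool that quietly copies files out"))  # True
```

The rephrased request carries the same intent but matches none of the rules, which is exactly the mutation race Kumar goes on to describe.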
At the heart are technological obstacles, including: “Their core challenge is detecting the multi-step, agentic actions that bypass prompt filters and manifest in live, dynamic environments, long after code is deployed. Because AI attackers use legitimate tools to pivot rapidly, defence requires specialised deep learning-based models. This approach shifts the security paradigm beyond the model’s brittle interface to focus on observable consequences of the agent’s actions in the operating environment.”
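As a rough illustration of what focusing on “observable consequences” could look like, the sketch below scores a sequence of agent actions against a toy baseline and flags unfamiliar pivots. The action names, baseline counts, and threshold are assumptions standing in for the specialised deep learning-based detectors Kumar describes.

```python
# Hypothetical behavioural monitor: flags unusual sequences of agent actions.
# A toy stand-in for a learned detector; names and numbers are illustrative.

from collections import Counter

# Pairs of consecutive actions observed during normal operation (toy baseline).
BASELINE_BIGRAMS = Counter({
    ("read_ticket", "search_docs"): 120,
    ("search_docs", "draft_reply"): 110,
    ("draft_reply", "send_reply"): 95,
})

def anomaly_score(actions: list[str]) -> float:
    """Fraction of consecutive action pairs never seen in the baseline."""
    bigrams = list(zip(actions, actions[1:]))
    if not bigrams:
        return 0.0
    unseen = sum(1 for pair in bigrams if BASELINE_BIGRAMS[pair] == 0)
    return unseen / len(bigrams)

# A live trace where the agent pivots from its task to credential access.
trace = ["read_ticket", "list_cloud_keys", "export_secrets", "open_outbound_conn"]
if anomaly_score(trace) > 0.5:   # threshold is an illustrative choice
    print("ALERT: anomalous action sequence", trace)
```

The point of the sketch is the vantage point, not the scoring: the monitor never sees the prompt, only what the agent actually does in the environment.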
Addressing these fundamental weaknesses, Kumar finds: “Sanitising inputs, or prompts, is akin to writing rules. Hence, static LLM safeguards are fundamentally locked in a losing race against the speed and scale of attacker mutation. Attackers are able to generate multiple versions of prompts with the same intent, rapidly bypassing content filters faster than vendors can patch them.”
As to the consequences, Kumar thinks: “This speed mismatch renders front-end prompt refusal insufficient for enterprise security. The defensive strategy must shift from blocking input to detecting the resulting intent by monitoring the actions of LLM agents in the live environment.”
As to the implications of all this for the business community, Kumar recommends: “Enterprises must assess AI safety by evaluating the entire AI application stack, not just the foundation model. Assessment requires validation across three pillars: Robustness (testing for prompt injection), Alignment (adherence to corporate policies) and Observability (full auditable logging of inputs and actions).”
Kumar adds: “Most importantly, organisations must enforce the principle of least privilege on the AI agent itself, strictly limiting its access to tools, APIs and data. The most effective defence involves deploying a continuously monitored AI system where a specialised detection model can analyse the agent’s behaviour and immediately flag anomalous or malicious sequences of actions in production.”
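In practical terms, “least privilege on the AI agent itself” can start with an explicit allowlist checked before every tool call, paired with the auditable logging Kumar highlights. The following sketch is an assumption about how such a gate might be wired up, with hypothetical tool names, rather than a reference implementation.

```python
# Hypothetical least-privilege gate for an AI agent's tool calls.
# Tool names and the logging format are illustrative assumptions.

import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

ALLOWED_TOOLS = {"search_docs", "draft_reply"}   # everything else is denied

def call_tool(tool: str, args: dict) -> None:
    # Auditable record of what the agent attempted (the observability pillar).
    log.info(json.dumps({"tool": tool, "args": args}))
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{tool}' is outside the agent's allowlist")
    # ... dispatch to the real tool implementation here ...

call_tool("search_docs", {"query": "refund policy"})   # permitted and logged
try:
    call_tool("export_secrets", {"scope": "all"})      # denied, but still logged
except PermissionError as err:
    print(err)
```

Even a gate this simple illustrates the shift Kumar recommends: the control sits around the agent’s actions in production, not inside the prompt.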
