Roblox Sentinel is an AI system designed to detect early signs of potential child endangerment for further analysis and investigation. Implemented as a Python library, Sentinel uses contrastive learning to handle highly imbalanced datasets that often challenge traditional classifiers and can be applied to a wide range of use cases.
Detecting rare classes of content is hard for traditional classifiers because examples are scarce. Child grooming attempts, for instance, are vastly outnumbered by harmless conversations: Roblox says its production system contains only 13,000 samples of harmful conversations, compared to potentially millions of harmless ones.
It is also important to understand that a single message (e.g., “Where are you from?”) may seem benign on its own, but reveal harmful intent when viewed in the context of surrounding messages and their progression.
Roblox engineers devised a specific approach to overcome these challenges.
By prioritizing recall over precision, Sentinel serves as a high-recall candidate generator for more thorough investigation. This approach is particularly effective for applications where rare patterns are critical to identify. Rather than treating each message in isolation, Sentinel analyzes patterns across messages to identify concerning behavior.
To achieve this, Sentinel analyzes a user’s recent messages and scores them based on embedding similarity. The score is computed by measuring how close each message is to rare-class and common-class examples, then taking the ratio of rare-class similarity to common-class similarity.
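The scoring step can be illustrated with a minimal sketch. This is not Sentinel's actual API: the function names, the top-k averaging, the `top_k` and `eps` parameters, and the toy 2-D embeddings are all assumptions made for illustration, and cosine similarity stands in for whatever similarity measure the library uses.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def similarity_ratio_scores(messages, rare_examples, common_examples,
                            top_k=2, eps=1e-9):
    """Score each message as rare-class similarity over common-class similarity.

    Hypothetical helper: for each message, average its top_k closest
    examples in each class (an assumed choice), then take the ratio.
    """
    rare_sim = cosine_similarity_matrix(messages, rare_examples)
    common_sim = cosine_similarity_matrix(messages, common_examples)

    k_rare = min(top_k, rare_sim.shape[1])
    k_common = min(top_k, common_sim.shape[1])

    # Mean of the k largest similarities per message (rows sorted ascending).
    rare_top = np.sort(rare_sim, axis=1)[:, -k_rare:].mean(axis=1)
    common_top = np.sort(common_sim, axis=1)[:, -k_common:].mean(axis=1)

    # Clip so the ratio stays non-negative and the denominator is never zero.
    rare_top = np.clip(rare_top, 0.0, None)
    common_top = np.clip(common_top, eps, None)
    return rare_top / common_top


# Toy 2-D "embeddings" (assumed data): rare-class examples point along
# the x-axis, common-class examples along the y-axis.
rare_examples = np.array([[1.0, 0.0], [0.9, 0.1]])
common_examples = np.array([[0.0, 1.0], [0.1, 0.9]])
messages = np.array([
    [1.0, 0.1],   # close to the rare-class examples
    [0.1, 1.0],   # close to the common-class examples
])

scores = similarity_ratio_scores(messages, rare_examples, common_examples)
print(scores)  # first score is above 1, second is below 1
```

In this sketch, a score above 1 means a message sits closer to the rare-class examples than to the common-class ones. A real deployment would replace the toy vectors with embeddings from a text encoder.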
Sentinel then aggregates the scores of recent messages from the same source and computes their skewness as a measure of the presence of suspicious patterns.
A positive skewness indicates a pattern where most content is common, but with enough rare-class similarities to create a right-skewed distribution.
A key advantage of this method is its resilience to variations in the number of observations, says Roblox, making it well-suited for sources with different activity levels.
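The aggregation step can be sketched in the same spirit. The example below assumes the standard population skewness (the third standardized moment) and uses made-up per-message scores; the `skewness` helper and the sample data are hypothetical and not taken from Sentinel's code.

```python
import numpy as np


def skewness(scores):
    """Population skewness: third central moment over variance ** 1.5."""
    x = np.asarray(scores, dtype=float)
    if x.size < 3:
        return 0.0  # assumed convention: too few observations to judge
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    if m2 == 0.0:
        return 0.0  # all scores identical, so no asymmetry
    m3 = np.mean(centered ** 3)
    return float(m3 / m2 ** 1.5)


# Assumed data: a source whose messages are mostly common-like (low score)
# with a couple of rare-like outliers, and a source with evenly spread scores.
flagged = [0.2] * 18 + [3.0, 3.0]
symmetric = [0.1, 0.2, 0.3, 0.4, 0.5]

print(skewness(flagged))      # clearly positive: long right tail
print(skewness(symmetric))    # approximately zero

# Repeating the same score pattern five times (a more active source with
# the same mix of messages) leaves this skewness estimate unchanged.
print(skewness(flagged * 5))
```

Because this estimate is built from ratios of averaged moments, it depends on the shape of the score distribution rather than on how many messages a source has sent, which is one way to read the robustness claim above.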
According to Roblox, Sentinel improved platform safety and led to over 1,000 official reports to authorities in the first few months of deployment. Since the system prioritizes recall over precision, all suspicious cases require screening and investigation by human experts.
In Roblox's words: “The decisions made by these analysts create a feedback loop that enables us to continuously refine and update the examples, indexes, and training sets. This human-in-the-loop process is essential to help Sentinel adapt to and keep pace with new and evolving patterns and methods of bad actors working to evade our detection.”
While Sentinel AI has been designed with Roblox’s specific use case in mind, its creators say it can be applied to any classification problem where examples of the target class are scarce, in particular when the context across multiple observations matters, and high recall is a requirement. Another advantage of Sentinel is its ability to operate in near real-time at massive scale.