DoorDash has built and deployed an AI-driven safety system called SafeChat to moderate conversations between Dashers and customers across in-app chat, images, and voice calls. SafeChat applies machine learning to detect and respond to unsafe content in near real time, screening communications for offensive or inappropriate material and enabling immediate actions such as reporting issues or unassigning deliveries. SafeChat focuses on safety rather than engagement or automation, positioning AI as core infrastructure to protect platform integrity and the well-being of Dashers and customers.
SafeChat uses a layered AI architecture that combines machine learning models with human review for escalation. The system handles millions of interactions daily, classifying text messages, images, and voice communications. Its text moderation was implemented in two phases.
DoorDash engineers first used a three-layered approach in Phase 1. The first layer, a moderation API, provided a low-cost, high-recall filter that automatically cleared about 90 percent of messages with minimal latency. Messages not cleared proceeded to a fast, low-cost large language model (LLM) with higher precision, which identified 99.8 percent of messages as safe. The remaining messages were evaluated by a more precise, higher-cost LLM that scored them across profanity, threats, and sexual content. These scores enabled safety actions such as allowing Dashers to cancel orders when high-risk messages were detected.
DoorDash trained an internal model on roughly 10 million messages from Phase 1, enabling a two-layered approach in Phase 2. The internal model became the first layer, automatically clearing most messages. Only flagged messages advanced to the precise LLM for detailed scoring. Layer 1 responses occur in under 300 milliseconds, while flagged messages may take up to three seconds. This system handles 99.8 percent of traffic with improved scalability and reduced costs.
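The fallthrough pattern in both phases can be sketched as a generic cascade: cheap, high-recall layers run first, and only messages they cannot clear escalate to a more expensive, more precise layer. The layer functions below are hypothetical stand-ins, not DoorDash's actual models.

```python
# Sketch of a layered moderation cascade; layer implementations are
# illustrative assumptions, not DoorDash's production models.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Verdict:
    safe: bool                       # did this layer clear the message?
    scores: Optional[dict] = None    # category scores, if the layer produces them

def moderate(message: str, layers: List[Callable[[str], Verdict]]) -> Verdict:
    """Run cheap, high-recall layers first; escalate only unresolved messages."""
    verdict = Verdict(safe=False)
    for layer in layers:
        verdict = layer(message)
        if verdict.safe:
            return verdict           # cleared early: skip the costlier layers
    return verdict                   # final layer's detailed verdict

# Hypothetical stand-ins for the Phase 2 layers.
def internal_model(msg: str) -> Verdict:
    # Layer 1: fast internal classifier that clears most traffic.
    return Verdict(safe="threat" not in msg.lower())

def precise_llm(msg: str) -> Verdict:
    # Layer 2: slower, precise model returning per-category scores.
    return Verdict(safe=False,
                   scores={"profanity": 0.1, "threats": 0.9, "sexual": 0.0})

print(moderate("see you soon!", [internal_model, precise_llm]).safe)  # True
```

Phase 1 is the same cascade with three layers (moderation API, fast LLM, precise LLM); Phase 2 simply swaps the first two layers for the internal model.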
DoorDash two-layer text moderation architecture (Source: DoorDash Tech Blog)
Image moderation relies on computer vision models selected for throughput and granularity. Thresholds and confidence scores were tuned through iterative human review to reduce false positives and false negatives. The system can process hundreds of thousands of images daily while maintaining latency compatible with live interactions.
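Threshold tuning of this kind amounts to sweeping a confidence cutoff over human-labeled examples and counting the errors each cutoff produces. The scores and labels below are illustrative, not DoorDash data.

```python
# Sketch of confidence-threshold tuning against human-reviewed labels.
def error_counts(scores, labels, threshold):
    """Count false positives and false negatives at a given cutoff."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

scores = [0.95, 0.80, 0.60, 0.40, 0.20]  # model confidence per image
labels = [1,    1,    0,    1,    0]     # 1 = unsafe per human review

for t in (0.5, 0.7, 0.9):
    fp, fn = error_counts(scores, labels, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

Raising the threshold trades false positives for false negatives; iterating with human reviewers is how the acceptable balance point is found.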
Voice moderation was initially deployed in observe-only mode to calibrate confidence scores. Once thresholds were validated, the system could take automated actions such as interrupting calls or restricting future communications.
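An observe-only rollout can be sketched as a shadow-mode flag: every decision is logged for calibration, but no enforcement happens until the flag is flipped. The function and action names here are hypothetical, not DoorDash's actual voice pipeline.

```python
# Sketch of an observe-only (shadow-mode) rollout for voice moderation.
ENFORCE = False  # stays False while confidence thresholds are calibrated

def handle_call_event(confidence: float, threshold: float = 0.85) -> str:
    """Log what the system would do; only act once enforcement is enabled."""
    flagged = confidence >= threshold
    decision = "interrupt_call" if flagged else "allow"
    print(f"observed confidence={confidence:.2f} -> would {decision}")
    if flagged and ENFORCE:
        return decision  # e.g. interrupt the call, restrict future comms
    return "allow"       # observe-only: never intervene

print(handle_call_event(0.92))  # logged as flagged, but call proceeds
```

Once the logged decisions match human judgment at the chosen threshold, setting `ENFORCE = True` turns the same code path into the automated actions described above.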
DoorDash voice moderation architecture (Source: DoorDash Tech Blog)
In a LinkedIn post announcing the feature, DoorDash says,
SafeChat is about building confidence and trust for everyone on our platform through AI innovation and thoughtful design, creating a safer, smoother experience for all users.
SafeChat’s enforcement layer applies proportionate actions according to the severity and recurrence of violations. The system can block or redact unsafe messages, terminate calls, restrict communications, or escalate issues to human safety agents. Repeated or severe violations trigger account reviews or suspensions.
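Proportionate enforcement maps two inputs, severity and recurrence, to an escalating action. The tiers and violation counts below are illustrative assumptions; the article does not specify DoorDash's exact policy table.

```python
# Sketch of proportionate enforcement: the action escalates with both the
# severity of the violation and the user's prior record. Thresholds are
# illustrative assumptions, not DoorDash's actual policy.
def enforcement_action(severity: str, prior_violations: int) -> str:
    if severity == "high":
        # Severe content goes straight to humans; repeats suspend the account.
        return "suspend_account" if prior_violations > 0 else "escalate_to_human_agent"
    if severity == "medium":
        return "account_review" if prior_violations >= 2 else "restrict_communications"
    # Low severity: redact the message; repeated offenses earn a warning.
    return "warn_user" if prior_violations >= 3 else "redact_message"

print(enforcement_action("low", 0))     # redact_message
print(enforcement_action("medium", 3))  # account_review
print(enforcement_action("high", 1))    # suspend_account
```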
DoorDash engineers reported that combining layered AI models with human feedback loops enabled SafeChat to operate at scale while maintaining near real-time response. According to the company, the system has contributed to a roughly 50% reduction in low- and medium-severity safety incidents since deployment.
