Datadog combined structured metadata from its incident management app with Slack messages to create an LLM-driven feature that helps engineers compose incident postmortems. While building the solution, the company faced the challenges of using LLMs outside of interactive dialog systems and of ensuring high-quality output.
Datadog opted to enhance the process of creating incident postmortems by using LLMs to draft the individual sections of the postmortem report, which engineers then review and customize into the final version. The team spent over 100 hours fine-tuning the report structure and the LLM instructions to achieve satisfactory outcomes across diverse inputs.
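The blog post does not publish Datadog's internal section format; as a rough illustration, a per-section definition might pair a title with plain-text LLM instructions and a chosen model, along the lines of this hypothetical Python sketch (all names and prompts are assumptions):

```python
from dataclasses import dataclass

@dataclass
class PostmortemSection:
    """Hypothetical per-section spec: each section pairs a title with
    the LLM instructions used to draft it and a model chosen for it."""
    title: str
    instructions: str  # prompt text sent to the LLM alongside incident data
    model: str         # model picked to match this section's complexity

# Illustrative sections only; Datadog's actual templates are not public.
SECTIONS = [
    PostmortemSection(
        title="Summary",
        instructions="Summarize the incident in two to three sentences.",
        model="gpt-4",  # higher accuracy for the most-read section
    ),
    PostmortemSection(
        title="Timeline",
        instructions="Extract a chronological timeline of key events "
                     "from the incident metadata and Slack messages.",
        model="gpt-3.5-turbo",  # cheaper and faster for extraction
    ),
]
```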
The team evaluated different models, such as GPT-3.5 and GPT-4, to assess the cost, speed, and quality of results, and found that these can differ significantly between model versions. Engineers observed, for instance, that GPT-4 produced more accurate results but was also much slower and more expensive than GPT-3.5. Based on that experimentation, they settled on using different models for different sections, depending on the complexity of the content, to strike a balance between cost, speed, and accuracy. Additionally, the sections of the report were generated in parallel, reducing the total generation time from 12 minutes to under one minute.
Parallel execution of LLM requests with different models (Source: Datadog Engineering Blog)
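A minimal sketch of that parallel fan-out, assuming the OpenAI Python SDK and the hypothetical `PostmortemSection` spec above; Datadog's actual orchestration code is not public:

```python
import asyncio
from openai import AsyncOpenAI  # assumes the official openai Python SDK

client = AsyncOpenAI()

async def generate_section(section: PostmortemSection, incident_context: str) -> str:
    """Draft one postmortem section with the model chosen for it."""
    resp = await client.chat.completions.create(
        model=section.model,
        messages=[
            {"role": "system", "content": section.instructions},
            {"role": "user", "content": incident_context},
        ],
    )
    return resp.choices[0].message.content

async def generate_postmortem(sections: list[PostmortemSection],
                              incident_context: str) -> dict[str, str]:
    # Fire off all section requests concurrently instead of sequentially,
    # so total latency is bounded by the slowest section rather than the
    # sum of all sections.
    drafts = await asyncio.gather(
        *(generate_section(s, incident_context) for s in sections)
    )
    return {s.title: d for s, d in zip(sections, drafts)}
```

Running the requests concurrently means the slowest single section, not the sum of all sections, determines the overall latency, which is consistent with the drop from 12 minutes to under a minute that Datadog reported.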
Another important aspect of combining AI and human input when writing postmortem reports was trust and privacy. The team explicitly marked AI-generated content as such to prevent human readers, including reviewers, from blindly accepting it as final. Engineers also ensured that sensitive information and secrets were stripped from the data fed into the LLMs and replaced with placeholders. Datadog engineers explain how they addressed data security concerns:
Given the sensitivity of technical incidents, protecting confidential information was paramount. As part of the ingestion API, we implemented secret scanning and filtering mechanisms that scrubbed and replaced suspected secrets with placeholders before feeding data into the LLM. Once the AI-generated results were retrieved, placeholders were filled in with the actual content, ensuring privacy and security throughout the process.
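The post does not detail the scanning rules; a simplified sketch of the scrub-and-restore flow described above, with hypothetical regex patterns, might look like this:

```python
import re

# Hypothetical patterns; a real scanner would cover many credential formats.
SECRET_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
]

def scrub_secrets(text: str) -> tuple[str, dict[str, str]]:
    """Replace suspected secrets with placeholders before the text is
    sent to the LLM; return the mapping so they can be restored later."""
    replacements: dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        placeholder = f"<SECRET_{len(replacements)}>"
        replacements[placeholder] = match.group(0)
        return placeholder

    for pattern in SECRET_PATTERNS:
        text = pattern.sub(_replace, text)
    return text, replacements

def restore_secrets(generated: str, replacements: dict[str, str]) -> str:
    """Fill placeholders back in once the AI-generated result is retrieved,
    so the secrets never leave the system."""
    for placeholder, original in replacements.items():
        generated = generated.replace(placeholder, original)
    return generated
```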
As part of the AI-enhanced solution, postmortem authors gained the ability to customize the templates used for the various sections of the report. Section templates also included the LLM instructions in clear text to further promote transparency and trust in the system and to allow users to adjust the instructions to better suit their needs.
Postmortem report section (Source: Datadog Engineering Blog)
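Tying the marking and transparency ideas together, a hypothetical renderer could prepend an explicit AI-generated banner and the clear-text instructions to each drafted section; this is an illustration, not Datadog's published code:

```python
def render_section(section: PostmortemSection, draft: str) -> str:
    """Render a section draft with an explicit AI-generated marker and the
    prompt that produced it, so reviewers never mistake it for final text."""
    return (
        f"## {section.title}\n"
        f"> AI-generated draft: review and edit before publishing.\n"
        f"> Instructions used: {section.instructions}\n\n"
        f"{draft}\n"
    )
```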
Having worked on the LLM-powered functionality, the Datadog team concluded that while LLMs can support operations engineers in creating postmortem reports, they can't fully replace humans, at least at present. Still, GenAI-enhanced products can greatly improve productivity and give engineers a head start on incident reports. The team learned a great deal while building this functionality and plans to expand the data sources available to LLMs when generating postmortem content, including internal wikis, RFCs, and system information. The developers would also like to explore using LLMs to generate alternative postmortem versions, such as custom and public postmortems.