Datadog combined structured metadata from its incident management app with Slack messages to create an LLM-driven feature that helps engineers compose incident postmortems. While building the solution, the company faced the challenges of using LLMs outside of interactive dialog systems and of ensuring high-quality output.
Datadog opted to enhance the process of creating incident postmortems by using LLMs to draft the individual sections of the postmortem report, which engineers then review and customize into the final version. The team spent over 100 hours fine-tuning the report structure and the LLM instructions to achieve satisfactory outcomes across diverse inputs.
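The blog post does not publish Datadog's internal section format; as a rough illustration, a per-section definition might pair a title with plain-text LLM instructions and a chosen model, along the lines of this hypothetical Python sketch (all names and prompts are assumptions):

```python
from dataclasses import dataclass

@dataclass
class PostmortemSection:
    """Hypothetical per-section spec: each section pairs a title with
    the LLM instructions used to draft it and a model chosen for it."""
    title: str
    instructions: str  # prompt text sent to the LLM alongside incident data
    model: str         # model picked to match this section's complexity

# Illustrative sections only; Datadog's actual templates are not public.
SECTIONS = [
    PostmortemSection(
        title="Summary",
        instructions="Summarize the incident in two to three sentences.",
        model="gpt-4",  # higher accuracy for the most-read section
    ),
    PostmortemSection(
        title="Timeline",
        instructions="Extract a chronological timeline of key events "
                     "from the incident metadata and Slack messages.",
        model="gpt-3.5-turbo",  # cheaper and faster for extraction
    ),
]
```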
The team evaluated different models, such as GPT-3.5 and GPT-4, to assess the cost, speed, and quality of results, and found that these can differ significantly between model versions. Engineers observed, for instance, that GPT-4 produced more accurate results but was also much slower and more expensive than GPT-3.5. Based on that experimentation, they settled on using different models for different sections, depending on the complexity of the content, to strike a balance between cost, speed, and accuracy. Additionally, the sections of the report were generated in parallel, reducing the total generation time from 12 minutes to under one minute.
Parallel execution of LLM requests with different models (Source: Datadog Engineering Blog)
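A minimal sketch of that parallel fan-out, assuming the OpenAI Python SDK and the hypothetical `PostmortemSection` spec above; Datadog's actual orchestration code is not public:

```python
import asyncio
from openai import AsyncOpenAI  # assumes the official openai Python SDK

client = AsyncOpenAI()

async def generate_section(section: PostmortemSection, incident_context: str) -> str:
    """Draft one postmortem section with the model chosen for it."""
    resp = await client.chat.completions.create(
        model=section.model,
        messages=[
            {"role": "system", "content": section.instructions},
            {"role": "user", "content": incident_context},
        ],
    )
    return resp.choices[0].message.content

async def generate_postmortem(sections: list[PostmortemSection],
                              incident_context: str) -> dict[str, str]:
    # Fire off all section requests concurrently instead of sequentially,
    # so total latency is bounded by the slowest section rather than the
    # sum of all sections.
    drafts = await asyncio.gather(
        *(generate_section(s, incident_context) for s in sections)
    )
    return {s.title: d for s, d in zip(sections, drafts)}
```

Running the requests concurrently means the slowest single section, not the sum of all sections, determines the overall latency, which is consistent with the drop from 12 minutes to under a minute that Datadog reported.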
Another important aspect of combining AI and human input when writing postmortem reports was trust and privacy. The team explicitly marked AI-generated content as such to prevent human readers, including reviewers, from blindly accepting it as final. Engineers also ensured that sensitive information and secrets were stripped from the data fed into the LLMs and replaced with placeholders. Datadog engineers explain how they addressed data security concerns:
Given the sensitivity of technical incidents, protecting confidential information was paramount. As part of the ingestion API, we implemented secret scanning and filtering mechanisms that scrubbed and replaced suspected secrets with placeholders before feeding data into the LLM. Once the AI-generated results were retrieved, placeholders were filled in with the actual content, ensuring privacy and security throughout the process.
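The post does not detail the scanning rules; a simplified sketch of the scrub-and-restore flow described above, with hypothetical regex patterns, might look like this:

```python
import re

# Hypothetical patterns; a real scanner would cover many credential formats.
SECRET_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
]

def scrub_secrets(text: str) -> tuple[str, dict[str, str]]:
    """Replace suspected secrets with placeholders before the text is
    sent to the LLM; return the mapping so they can be restored later."""
    replacements: dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        placeholder = f"<SECRET_{len(replacements)}>"
        replacements[placeholder] = match.group(0)
        return placeholder

    for pattern in SECRET_PATTERNS:
        text = pattern.sub(_replace, text)
    return text, replacements

def restore_secrets(generated: str, replacements: dict[str, str]) -> str:
    """Fill placeholders back in once the AI-generated result is retrieved,
    so the secrets never leave the system."""
    for placeholder, original in replacements.items():
        generated = generated.replace(placeholder, original)
    return generated
```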
As part of the AI-enhanced solution, postmortem authors gained the ability to customize the templates used for the various sections of the report. Section templates also included the LLM instructions in clear text to further promote transparency and trust in the system and to allow users to adjust the instructions to better suit their needs.
Postmortem report section (Source: Datadog Engineering Blog)
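Tying the marking and transparency ideas together, a hypothetical renderer could prepend an explicit AI-generated banner and the clear-text instructions to each drafted section; this is an illustration, not Datadog's published code:

```python
def render_section(section: PostmortemSection, draft: str) -> str:
    """Render a section draft with an explicit AI-generated marker and the
    prompt that produced it, so reviewers never mistake it for final text."""
    return (
        f"## {section.title}\n"
        f"> AI-generated draft: review and edit before publishing.\n"
        f"> Instructions used: {section.instructions}\n\n"
        f"{draft}\n"
    )
```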
Having worked on the LLM-powered functionality, the Datadog team concluded that while LLMs can support operations engineers in creating postmortem reports, they can't fully replace humans, at least at present. Still, GenAI-enhanced products can greatly improve productivity and give engineers a head start on incident reports. The team learned a great deal while building this functionality and plans to expand the data sources available to LLMs when generating postmortem content, including internal wikis, RFCs, and system information. The developers would also like to explore using LLMs to generate alternative postmortem versions, such as custom and public postmortems.