Key Takeaways
- Chaos Engineering (CE) and Disaster Recovery Testing (DiRT) are essential methodologies for addressing modern technological challenges beyond traditional error budgets.
- DiRT enhances system resilience by intentionally instigating failures, exposing hidden risks, and improving disaster recovery effectiveness.
- Maturity models such as the new Disaster Recovery Testing Maturity Assessment (DiRMA) framework provide a structured path for strengthening DiRT implementation, helping organizations overcome cultural resistance and the difficulty of measuring impact.
- DiRMA evaluates DiRT adoption across three dimensions (people, processes, and tools) and assigns maturity levels ranging from Introductory to Advanced.
- Continuous improvement is key, with DiRMA emphasizing ongoing enhancement of DiRT practices through monitoring, feedback, and adaptation to evolving technologies.
In today’s complex technological landscape, traditional error budgets are no longer sufficient to address modern challenges such as cloud outages, AI bias, data loss, and regulatory compliance. To build more resilient systems, companies like Google, Netflix, Slack, and Capital One have adopted structured methodologies such as CE and DiRT. While these approaches improve system reliability by deliberately introducing failures, implementing them effectively presents challenges, including cultural resistance, lack of ownership, and difficulty measuring their impact.
To address these challenges, organizations have developed maturity models, structured frameworks that assess the effectiveness of reliability programs and guide their improvement. However, while CE maturity models exist, they do not account for the unique characteristics of DiRT, which goes beyond system resilience to evaluate business processes and human responses.
This article introduces DiRMA, a new framework designed to measure and improve the maturity of DiRT programs across three key dimensions: people, processes, and tools. By assessing an organization’s current capabilities and providing a structured path for advancement, DiRMA helps teams overcome common obstacles and build a more resilient disaster recovery strategy.
The following sections will explore the fundamentals of DiRT, compare existing Chaos Engineering maturity models, and detail how DiRMA provides a comprehensive approach to evaluating and enhancing disaster recovery readiness.
DiRT Overview
DiRT is a structured approach to stress-testing systems by intentionally triggering controlled failures. Originally pioneered in large-scale technology infrastructures, DiRT helps organizations proactively identify weaknesses and refine their recovery strategies. Unlike traditional disaster recovery methods, which rely on theoretical scenarios, DiRT forces teams to confront real operational disruptions in a controlled manner, ensuring that failure responses are both effective and repeatable. The methodology consists of a coordinated and organized set of events in which a group of engineers plans and executes real and fictitious outages for a defined period to test the effective response of the teams involved [Climent, 2019]. Tests are categorized into tiers based on the organizational breadth of those expected to respond to or be impacted by the testing, as described in Table 1.
Table 1. General Description of DiRT Tiers

| Tier | Description |
| --- | --- |
| Tier 3 | Exercises test resilience in specific systems or isolated products. Experiments are not expected to impact other external applications. |
| Tier 2 | Testing focuses on probing the dependencies of a shared system or product, so experiments target services, such as databases or APIs, that are used by other applications. |
| Tier 1 | Experimentation tests the organizational response to an enterprise-level event. Exercises are usually fictitious and involve people and processes. |
According to information Google has shared publicly, the following examples illustrate the types of real-world tests that would be performed at each tier. Tables 2, 3, and 4 show scenarios and what a team can learn by practicing exercises in each tier.
Table 2. Example of a DiRT test in Tier 1

| Name | Vulnerability exploited in a core system |
| --- | --- |
| Scenario | A security vulnerability is exploited, with the potential to be leveraged by a threat agent to compromise the availability of an e-commerce application. |
| What to learn | |
Table 3. Example of a DiRT test in Tier 2

| Name | Database degrades the response times of a service |
| --- | --- |
| Scenario | A change in a database degrades the quality and the response times of an application. |
| What to learn | |
Table 4. Example of a DiRT test in Tier 3

| Name | Test in an isolated service |
| --- | --- |
| Scenario | A change in configuration parameters during a deployment increases CPU and memory consumption. |
| What to learn | |
In addition to the tiers, tests are also classified in terms of prioritization, communication protocols, and impact expectations. Every test must include a revert/rollback plan in case something goes wrong, and each is reviewed and approved by a cross-functional technical team separate from the coordinating team. The lifecycle of a test is illustrated in Figure 1.
Figure 1. Illustration of the DiRT Test Lifecycle, based on the model presented in Chapter 5, contributed by Jason Cahoon, in the “Chaos Engineering” book [Rosenthal and Jones, 2020]
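To make this concrete, here is a minimal sketch that models a DiRT test plan as a data structure and checks the two gating conditions named above (a rollback plan plus cross-functional approval). The type and field names are illustrative assumptions, not part of any published DiRT tooling.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """DiRT tiers as described in Table 1."""
    ISOLATED_SYSTEM = 3      # a single system or isolated product
    SHARED_DEPENDENCY = 2    # shared services such as databases or APIs
    ORGANIZATIONAL = 1       # enterprise-level, involving people and processes


@dataclass
class DirtTestPlan:
    """Hypothetical DiRT test plan; field names are assumptions for
    illustration. The source only states that tests carry a tier,
    prioritization, communication protocols, impact expectations,
    a revert/rollback plan, and cross-functional approval."""
    name: str
    tier: Tier
    priority: str                # e.g. "high", "medium", "low"
    communication_protocol: str  # who is notified before and during the test
    expected_impact: str
    rollback_plan: str           # mandatory: how to revert if something goes wrong
    approved_by: list[str] = field(default_factory=list)

    def is_ready_to_run(self) -> bool:
        # A test runs only with a rollback plan and approval from a
        # cross-functional team distinct from the coordinating team.
        return bool(self.rollback_plan) and len(self.approved_by) > 0
```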
Google’s practical resilience testing covers a broad range of scenarios, such as disconnecting complete data centers, forcing application traffic rerouting, introducing configuration changes to live services, and deploying deliberately flawed service versions. Google also experiments with making unavailable the people who hold undocumented knowledge or experience, or with removing documentation, process elements, or communication channels. More information can be found in [Rosenthal and Jones, 2020].
In the book, Rosenthal and Jones explain that DiRT offers a means of verifying conjecture and proving a system’s behavior empirically, leading to a deeper understanding and ultimately more stable systems. Given the importance of resilience testing, it is valuable to have a method to determine how to start, how to progress to advanced levels, and to evaluate how well one is progressing in this journey, which is precisely what DiRMA seeks to do.
Chaos Engineering Maturity Models
Maturity assessment models have been created to help organizations understand and improve their capabilities in particular areas, such as security, reliability, and innovation. These models have evolved along with organizations to offer a better view of the current state of adoption, implementation, and sophistication of the relevant subject matter. They use surveys, performance data, and observation to gather insights about key indicators such as:
- High employee turnover
- Inconsistent communication
- Frequent project delays
- Lack of clear goals
- Conflicting priorities
- Poor decision-making processes
- Low morale
- High stress levels
- Lack of standardized procedures
For CE in particular, two maturity models have been documented in the literature: the CE Maturity Assessment Model from Netflix [Rosenthal, Hochstein, Blohowiak, Jones and Basiri, 2017] and the CE Maturity Model from Harness [Mukkara, 2022]. Both were designed to guide organizations in their journey toward building more resilient systems through controlled experimentation.
The first, the Chaos Maturity Model (CMM), is based on two dimensions: sophistication and adoption. Sophistication measures the validity and safety of experiments, ranging from elementary (manual, non-production) to advanced (fully automated, integrated with development). Adoption measures the coverage of chaos experimentation, from “in the shadows” (unsanctioned, few systems) to “cultural expectation” (frequent experiments for all services, part of onboarding).
The second model, the Chaos Engineering Maturity Model (CEMM), proposed by Mukkara for Harness [Mukkara, 2022], was designed to guide organizations in progressively adopting and scaling CE practices. It emphasizes a gradual approach, dividing CE maturity into four levels, each with specific goals and actions, starting with basic experiments and progressing to full integration into the development lifecycle and production environments.
- Level 1 – Test/Start is the foundational stage, where organizations begin experimenting with CE and engineers select less critical services for conducting basic experiments.
- Level 2 – Automate emphasizes automating simple chaos experiments within continuous delivery pipelines. Here organizations begin collecting reliability metrics, such as service health and resiliency scores, to track improvements (a minimal scoring sketch follows this list).
- At Level 3 – Scale, successful chaos engineering practices are scaled across all teams and services, and auto-remediation is implemented for experiments that result in system failures.
- Finally, at Level 4 – Expert, CE is fully integrated into production environments, where chaos experiments are developed based on production incidents and validated in lower environments.
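The reliability metrics mentioned at Level 2 could start as simply as the sketch below, which scores each service by the fraction of chaos experiments it survived. This pass-rate definition is an assumption for illustration; it is not Harness’s published resiliency-score formula.

```python
from collections import defaultdict


def resiliency_scores(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute a naive 0-100 resiliency score per service.

    `results` pairs a service name with whether a chaos experiment
    passed (i.e., the steady state held). The score is the pass rate;
    this definition is illustrative, not the CEMM's official metric.
    """
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for service, ok in results:
        total[service] += 1
        passed[service] += ok  # bool counts as 0 or 1
    return {s: 100.0 * passed[s] / total[s] for s in total}


if __name__ == "__main__":
    runs = [("checkout", True), ("checkout", False), ("search", True)]
    print(resiliency_scores(runs))  # {'checkout': 50.0, 'search': 100.0}
```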
In essence, the objective of maturity assessment models is to provide a structured and systematic way for organizations to understand their current capabilities, identify areas for improvement, and achieve their strategic goals.
Disaster Recovery Testing Maturity Assessment (DiRMA)
DiRMA is inspired by the DiRT program, created in 2006 by Google to inject failures into critical systems, business processes, and people dynamics to expose reliability risks and provide preemptive mitigations. Since some organizations have already started their journey toward creating environments for DiRT, in which they can launch failures, determine their level of resilience, and test their incident response processes, it is essential to have frameworks, like the CE maturity assessments above, to evaluate the effectiveness of a program like DiRT.
To lay the first foundations for the development of such models, this article presents DiRMA: a practical framework for evaluating and improving organizations’ readiness in terms of DiRT. DiRMA is inspired by the Production Maturity Assessment (PMA) [CRE, 2021], also created by Google, which evaluates where a team lies on the SRE spectrum. In the same spirit, DiRMA uses an employee survey, group discussions, and leadership observations to determine, on a scale from one to five, the adoption level of DiRT.
DiRMA maps the results onto three key dimensions: people, process, and tools. These dimensions give companies a clear picture of their current state and the next steps to reach the desired level of DiRT. The next sections explain the methodology, the three dimensions, and the five levels in the model.
DiRMA answers these questions:
- How are people involved with DiRT, ranging from in-shadows to a complete training program?
- How does DiRT use systems and business metrics to ensure the program’s reliability and accuracy?
- How does DiRT utilize historical data to forecast capacity needs and inform resource allocation decisions?
The framework determines a level of maturity, ranging from introductory to advanced, across three dimensions evaluated during DiRT exercises as practiced at Google. The methodology is illustrated in Figure 2.
Figure 2. DiRMA Map proposed in this article
Specifically, each dimension (people, processes, and tools) has a set of questions. Each question has five answer options, and each option has an associated score, as illustrated in Tables 5 and 6. With this structure, once participants complete the survey, the framework determines the organization’s score in each dimension by averaging the participants’ scores for each question and dimension.
All questions carry the same weight, so the organization’s value is computed as the mean, although the median or mode, which are more robust to outliers, could be used instead. An example of the process is illustrated in Figure 3.
Table 5. DiRMA questions and answer options example

Question 1: How are people involved with DiRT, ranging from in-shadows to a complete training program?

| Answer Options | |
| --- | --- |
| Option 1 | ☐ No experiments have been run yet. |
| Option 2 | ☐ Early adopters infrequently perform DiRT. |
| Option 3 | ☐ Multiple teams are interested and engaged. |
| Option 4 | ☐ A team is dedicated to the practice of DiRT. |
| Option 5 | ☐ DiRT is part of the engineering onboarding process. |
Table 6. Association among Answer Options, Scores, and Maturity Levels

Question: How are people involved with DiRT, ranging from in-shadows to a complete training program?

| Answer | Option 1 | Option 2 | Option 3 | Option 4 | Option 5 |
| --- | --- | --- | --- | --- | --- |
| Score | 1 | 2 | 3 | 4 | 5 |
| Maturity Level | Introductory | Elementary | Basic | Sophisticated | Advanced |
Figure 3. DiRMA Process proposed in this article
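To illustrate the scoring just described, the sketch below averages participants’ 1-5 answers per dimension and maps the result to the maturity levels of Table 6. The data layout and the rounding cut-off are assumptions for illustration; DiRMA itself only specifies equal weights and the mean (or a more robust alternative such as the median).

```python
from statistics import mean

# Maturity levels from Table 6, indexed by the rounded 1-5 score.
LEVELS = {1: "Introductory", 2: "Elementary", 3: "Basic",
          4: "Sophisticated", 5: "Advanced"}


def dimension_scores(answers: dict[str, list[int]]) -> dict[str, float]:
    """Average the 1-5 answers collected per dimension.

    `answers` maps a dimension ("people", "process", "tools") to all
    participants' scores for that dimension's questions. Every question
    is weighted equally, so the mean is used; `statistics.median` could
    be swapped in for a measure more robust to outliers.
    """
    return {dim: mean(scores) for dim, scores in answers.items()}


def maturity_level(score: float) -> str:
    # Rounding to the nearest whole score is an assumed cut-off rule.
    return LEVELS[max(1, min(5, round(score)))]


if __name__ == "__main__":
    survey = {
        "people": [2, 3, 3, 2],
        "process": [4, 4, 5, 3],
        "tools": [1, 2, 2, 1],
    }
    for dim, score in dimension_scores(survey).items():
        print(f"{dim}: {score:.1f} -> {maturity_level(score)}")
```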
The Three Dimensions of DiRMA
DiRMA assesses the DiRT adoption level in terms of people, processes, and tools:
- People: DiRMA delves into the knowledge, mindset, and attitudes of the individuals involved in the disaster recovery program. It emphasizes the importance of evaluating different roles, including operations engineers, developers, product owners, architects, managers, and executives. Although measuring mindset or attitudes within an organization is complex because they are inherently intangible, the framework proposed here gains insight into them through a survey with questions designed to gauge employees’ feelings about their work, the organization, and their sense of belonging.
- Process: DiRMA analyzes and assesses the maturity of processes within the disaster recovery program delivery. It highlights the need to consider the various subprocesses and the involvement of different teams and roles, emphasizing the importance of interviewing the right people.
- Tools: DiRMA evaluates the sophistication of the tools employed in disaster recovery program delivery, such as fault injection tools, monitoring platforms, and automation scripts. It recognizes that technology encompasses both technical and user experience aspects and acknowledges the diverse tools used for injecting and observing failures. Sophistication is measured in terms of the environment in which the tools are used, the setup configured, automatic result analysis, and whether experiments are terminated manually or run fully automated. Other criteria include whether results are tracked over time and whether tooling supports interactive comparison of experiment and control.
The Evolution of DiRMA Maturity
DiRMA defines distinct maturity levels, visualized as an evolutionary journey. To aid understanding of the methodology, DiRMA maps these levels onto a graphic representation:
- Introductory (Padawan): at this initial stage, disaster recovery efforts, even if started with DiRT, are often disorganized, ad hoc, and potentially chaotic. Success relies heavily on individual effort and lacks repeatability. Processes are poorly defined and documented, hindering replication.
- Defined (Senior Padawan): DiRT is repeatable because processes are defined, established, and documented. Basic project management techniques apply, and successes in key process areas can be replicated.
- Managed (Knight): stakeholders actively monitor and control DiRT within the organization through data collection and analysis. Process metrics are used, and the effective achievement of process objectives is evident across various operational conditions.
- Advanced (Jedi): DiRT reaches an optimized level, where processes undergo continuous improvement through monitoring and feedback. The focus is on continually enhancing process performance through both incremental and innovative technological changes and improvements.
Although DiRMA is based on successful models such as the CMM from Netflix, the CEMM from Harness, and the PMA from Google, and although it has been used in academic settings, the journey toward complete validation has only just begun. In the future, DiRMA will need to be applied in other scenarios so that more data can be collected and feedback on the lessons learned can be gathered.
In the long term, DiRMA and the other maturity assessment models will have to adapt to the rapidly evolving landscape of emerging technologies by incorporating more dynamic and data-driven assessment methodologies and fostering greater interoperability between different maturity frameworks. Finally, a critical area for development should be the integration of human-centric factors, such as organizational culture and individual learning, to ensure that maturity models truly drive sustainable and meaningful progress.
Conclusions
In the evolution of reliability practices, traditional error budgets are insufficient for modern technological challenges (cloud outages, AI bias, etc.). Companies like Netflix, Slack, Capital One, and Google have adopted structured methodologies like Chaos Engineering (CE) and Disaster Recovery Testing (DiRT) to enhance reliability.
However, in implementing programs like DiRT, organizations have faced hurdles like cultural resistance, lack of ownership, and difficulty in measuring the business impact of reliability programs. Maturity models help address these challenges by providing a structured path for improvement.
Considering this scenario, this article introduced DiRMA, a framework that provides actionable insights on how to implement DiRT within an organization. By using DiRMA, organizations can systematically identify areas for improvement and build a more robust and resilient disaster recovery plan.
DiRMA provides a structured approach to assessing and enhancing disaster recovery readiness. By evaluating DiRT adoption across people, processes, and tools, organizations can systematically improve resilience and adaptability in an evolving technological landscape.