Five Hard Lessons From Recovering A Catastrophic Microservices Migration

At QCon San Francisco, HeartFlow’s VP of Engineering, Sonya Natanzon, shared lessons from navigating a recovery process after inheriting a catastrophic identity migration that locked users out of a healthcare portal on day one. Her five hard-won lessons reveal that successful architectural recovery depends as much on perception management and team dynamics as on technical prowess.

The disaster Natanzon inherited stemmed from a nine-month effort to migrate a healthcare portal from a monolithic architecture to microservices, using a commercial identity provider. The release failed immediately and locked all users out. The original engineering lead had left, trust had evaporated, and Natanzon stepped into the wreckage facing a critical question: how to deliver the migration’s promised value while restoring system stability and team credibility.

Her first lesson emphasized the need to balance forward progress with damage control. With users, the team needed to demonstrate that the portal would be available when they needed it. Reliability trumped innovation. With business partners, they had to prove they could balance architectural improvement with concrete business value rather than pursuing technical perfection in isolation. Natanzon’s strategic decision was to end “big bang” releases, because large releases delay the business value that moves companies forward.

Natanzon urged attendees to constantly balance feature parity with new business value

Instead, she advocated for incremental delivery with increased transparency, communicating openly and clearly about the engineering work to rebuild stakeholder confidence. This led to her second lesson: owning the spotlight. In a sharp change from the team’s previous behaviour, Natanzon championed proactively communicating progress, setbacks, and realistic timelines to stakeholders, since transparency about challenges builds trust more effectively than defensive posturing.

Her third lesson challenged conventional wisdom: make it better for now, not for the future. Teams recovering from failures often feel pressure to build robust, future-proof systems. Natanzon insisted on a different approach: build for immediate needs and relentlessly prune parts of the system that don’t deliver concrete business value. This pragmatic focus allowed the team to demonstrate tangible improvements quickly rather than getting mired in architectural perfectionism.

The fourth lesson addressed perception management head-on. Technical teams often dismiss concerns about perception as superficial, but Natanzon argued that perception directly affects a team’s ability to execute. Negative perceptions tend to long outlive the problems that cause them, and crawling out from under them is a long, tedious process. Perception is also emotional, so cold, hard data alone won’t necessarily change it. She recommended building relationships, engaging stakeholders consistently, and closing the loop on perceived problems as quickly as possible.

Her final lesson: pay attention to the team. Tech is not the only patient in architecture disaster recovery – the team is a patient too. Natanzon stabilized the team through better documentation and good onboarding practices, then fundamentally changed the culture from knowledge silos and individual achievement to collaboration, transparency, and team success. Interestingly, the high attrition from the original failure actually made implementing cultural change easier.

The experience reinforces broader industry lessons about microservices migrations. As previous InfoQ coverage has documented, organizations frequently underestimate the complexity of breaking apart monolithic systems. Natanzon’s recovery playbook offers practitioners something more valuable than avoiding failure: a template for responding effectively when architectural initiatives go catastrophically wrong, recognizing that technical solutions alone cannot rescue projects that have lost organizational trust and team cohesion.
