Transcript
Olimpiu Pop: Hello, everybody. I’m Olimpiu Pop. And I’m an InfoQ editor. And today I have in front of me Teena Idnani, the first keynote speaker from Dev Summit Munich and one of the speakers from this year’s QCon. She accomplished many amazing things. But without further ado, I’ll have to ask Teena to introduce herself. Teena, please.
Teena Idnani: Yes. Hi, Olimpiu. Thank you for having me. I am delighted to be here. And as an introduction, I’m a senior solution architect at Microsoft. I help organizations with their digital transformation journey, helping them design scalable cloud-native architectures. Before joining Microsoft, I worked at JP Morgan Chase, where I assisted the bank with setting up its Azure platform as part of its multi-cloud strategy.
Beyond my day-to-day work, I’m passionate about quantum computing and its potential applications, particularly in finance. I also actively mentor aspiring technologists, with a focus on supporting women in the tech industry. And, as you mentioned, I recently had the fantastic opportunity to share some insights into multi-cloud event-driven architecture at QCon 2025, which took place in April. I’m very excited to discuss the topic further with you here. But before we dive in, I want to mention that all thoughts and opinions that I’ll be sharing today are my own and don’t represent my employer’s views.
The Benefits of Event-Driven Architecture [01:54]
Olimpiu Pop: Okay, great. Thank you, Teena. So I was thinking about your presentation at QCon, and you had that scary diagram with all those boxes, and it was unbelievable. And it’s fascinating that, these days, even in the banking ecosystem, which was traditionally more conservative, we are discussing multi-cloud and the various layers of services you’ll have, probably with on-premises as well for the important data that you don’t want to leave your boundaries, and then different types of clouds with various services. What are the key points that people should keep in mind when looking into this kind of transformation, when you’re discussing multi-cloud in an ecosystem as complicated as banking?
Teena Idnani: Let’s discuss event-driven architecture first, and then we’ll incorporate the multi-cloud aspect. At its core, event-driven architecture is a design paradigm built around the production, detection, consumption of, and reaction to events. That’s what drives the system. And I remember the scary, complex diagram I showed, with hundreds of distributed, interconnected components; that’s the complexity we are talking about. Event-driven architecture is all about making these distributed components loosely coupled and enabling them to react to events. For example, on an e-commerce platform, when a customer places an order, it generates an order-created event. That event contains all the necessary information, and the metadata may include the customer ID, product details, shipping address, and payment information.
But here is where it gets interesting: that single action of the customer placing the order triggers a cascade of other events. For example, the payment system generates a payment-processed event, the inventory system creates an inventory-reserved event, and the shipping system might generate a shipping-label-created event. Now, each of these systems does not directly know about the others. They’re just reacting to the events that they care about, which means you can have hundreds or thousands of interconnected components and still not have to worry about them, because the individual systems do not need to know about each other directly. That’s what simplifies it. And what makes event-driven architectures powerful is how they enable these loosely coupled systems.
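The loose coupling Teena describes can be sketched with a minimal in-memory publish/subscribe bus. This is an illustrative toy, not any specific platform's API: the event names and handlers are made up for the example.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-memory pub/sub bus: publishers and subscribers only
    share event names, never direct references to each other."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every subscriber reacts independently to the same event.
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
processed = []

# Payment, inventory, and shipping each react to "order-created";
# none of them knows the others exist.
bus.subscribe("order-created", lambda e: processed.append(("payment", e["order_id"])))
bus.subscribe("order-created", lambda e: processed.append(("inventory", e["order_id"])))
bus.subscribe("order-created", lambda e: processed.append(("shipping", e["order_id"])))

bus.publish("order-created", {"order_id": "A-42", "customer_id": "C-7"})
```

Adding a fourth consumer is just one more `subscribe` call; the order-placing side never changes, which is exactly the decoupling being described.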
Unlike traditional request-response patterns, where services directly call each other, in an event-driven model the components communicate by publishing and subscribing to events. This approach offers numerous advantages, and yes, there are also a lot of considerations that you need to take care of. First, among the benefits, it provides exceptional scalability because your components can be scaled independently.
Event-Driven in Multi-Cloud [05:11]
Additionally, it enhances resilience. So, for example, if one component fails, others can continue operating. And most importantly, it enables that real-time responsiveness because your events are processed as they occur. Now, if you bring the multi-cloud flavor to it, what happens is that in a multi-cloud, you are not just talking about different services which are there in a particular system, but now you are also talking about extending those services to various cloud providers.
And that’s when it becomes essential to have some strategic considerations when companies are doing these multi-cloud transformations. For example, on architecture and design, you need to design for portability from the start: use containerization, use microservices. When you’re building your APIs, build cloud-agnostic APIs so that you avoid deep vendor lock-in. Data is critical when it crosses cloud boundaries, so you need to plan for data synchronization and consistency across clouds. And then there’s the foundation of multi-cloud, right? The network. You also need to consider your hybrid connectivity patterns and network architecture across all the clouds.
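One common way to keep events portable across providers is a provider-neutral envelope. The sketch below borrows the attribute names of the CloudEvents specification (`specversion`, `id`, `source`, `type`) as one plausible shape; the exact fields you carry are a design choice, not something prescribed in the conversation.

```python
import json
import uuid
from datetime import datetime, timezone

def make_event(event_type: str, source: str, data: dict) -> str:
    """Wrap business data in a provider-neutral envelope (CloudEvents-style
    attribute names) so any cloud's broker can carry it unchanged."""
    envelope = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),           # globally unique event id
        "source": source,                   # which system emitted it
        "type": event_type,                 # e.g. "com.example.order.created"
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }
    return json.dumps(envelope)

raw = make_event("com.example.order.created",
                 "/ecommerce/orders",
                 {"order_id": "A-42"})
evt = json.loads(raw)
```

Because the envelope is plain JSON with cloud-neutral metadata, the same event can flow through Azure Service Bus, Amazon SNS, or Google Pub/Sub without rewriting producers or consumers.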
Olimpiu Pop: Let me see if I got it right. So, an event-driven architecture is a domino system. You have the part that just gets started, you have an input, and then that cascades into multiple events, and they just go as far as they can. And if something happens with a given system, you have that resilience: let’s say the request is kept, and when the system comes back online, it just starts processing again. So in some situations we can defer that processing. And then, because I got ahead of myself and added multi-cloud to the conversation, in that particular case you have to be careful about how you weave those pieces together to keep things simple. So you’ll probably have IDs that are represented and make sense from one side to the other, even though they are not the IDs you usually have in a database, where you simply link them. But they’ll still link the right parts together.
Handling Data Residency and Regulation in Multi-Cloud [07:46]
Great, thank you. Still, it is funny for me: we were discussing these topics over the previous weeks, and now we are talking so much about sovereign cloud, and debating how those things get represented in this new ecosystem with multi-cloud but also sovereign cloud, because that adds an extra level of complexity. The banking sector is one of the most regulated ones. How do you anticipate this extra headache will look? You have multi-cloud, then you have the regulations, and then you have the new drive towards keeping your data contained in the place where you want it. Would it affect things or not?
Teena Idnani: Yes, it’ll affect it, and very rightly said: in financial organizations it becomes imperative to take care of these data residency requirements because of the strict compliance regulations that these organizations need to meet. But that’s where multi-cloud is also an advantage, because earlier, when you were in just one particular cloud or maybe one specific on-premises system, you were relying on the capabilities that that provider gave you. And sometimes, because of these data residency requirements, you really cannot access all the services that you would ideally want. But multi-cloud has expanded that service set for you. Each cloud provider offers a vast suite of services available in different regions. Now, if you don’t have a real vendor lock-in with a particular cloud provider, then you can choose the best service that you feel would suit your specific scenario.
And of course, you still need to meet the data residency and sovereignty requirements of the regulator that you’re bound to, but it makes them easier to deal with. A lot of these regulatory organizations also ask about concentration risk. For example, if you are a large fintech bank and all your workloads are concentrated in a single cloud provider, such as AWS, then it becomes a concentration risk: what happens if that cloud provider goes down? In those cases, it becomes essential for these fintech banks to explore multi-cloud, because the concentration risk gets mitigated when you have multiple cloud providers. So there are considerations when you are dealing with multiple cloud providers, specifically when it comes to these regulations, but there are also a lot of advantages.
Olimpiu Pop: Well, where there is an advantage, there is also a disadvantage. And then we have the chance, as technologists, to say “it depends” and make everybody smile, hopefully. Were you saying that you should look for solutions that are cloud-agnostic rather than cloud-native, because that will help you? Not relying on services tailor-made for one of the cloud providers lets you be sure that you have what it takes anywhere. In that case it’s better to use the cloud simply as infrastructure: you have a container, and the cloud is simply the place where it resides; you just ensure that the infrastructure is not under your desk but in the cloud. More or less, should that be the case?
Teena Idnani: Yes, that’s one way to look at it.
Olimpiu Pop: Okay.
Vendor Specific or Cloud-Native Services: What to Choose For Your Need? [11:31]
Teena Idnani: Yes, so it depends. In this case, yes, you are correct; we can utilise containers and then leverage the cloud providers as our infrastructure. But if you want to take advantage of the services that these cloud providers provide, then it also becomes essential to look at their PaaS offerings. So it depends, on a use-case-by-use-case basis, whether you want to go for containerization and use your cloud provider as an infrastructure provider, or you want to use the different platform-as-a-service offerings that these cloud providers have. For example, if I were to focus solely on Azure, we have Azure Functions. With Azure Functions, or with AWS Lambda functions, if you configure them correctly, they give you scalability options.
They can give you different security configurations like identity management, authentication, and authorization. Those are the kinds of things that you automatically get, along with the redundancy options. So there are advantages. The interoperability will be difficult, but that is the trade-off, right? Do you want to reap the full benefits that these cloud providers can give you by using their platform services? If so, you will require interoperability between the different services and systems hosted by the various cloud providers. Or do you want to go with, let’s say, containerizing each microservice that you have, hosting it on a particular cloud provider, and using it that way? It depends on the use case that you’re dealing with.
Olimpiu Pop: Okay, but that begs the question. This is very complex, that’s for sure. How do you stay on top of that? Because banks also usually have very high Service Level Agreements (SLAs) in terms of operational efficiency. That’s the main question: how do you stay on top of that? Numerous solutions are emerging in the cloud-native ecosystem that are now taking centre stage.
Teena Idnani: My first suggestion would be to upgrade yourself and stay current, especially with the new services being released by cloud providers on a daily basis. It’s not easy, but I want organizations to invest in their teams’ skills. And very rightly said: multi-cloud systems, specifically event-driven architectures, require a different way of thinking. Your developers need to understand, first of all, concepts like “What is eventual consistency? What is idempotency? What is distributed tracing?”
Additionally, your operations team needs to be comfortable with multiple cloud platforms. So this is not just a technology transformation that we are talking about; it’s also a skill transformation. The teams must stay upskilled on the latest cloud services. It’s also a massive mindset transformation to reap the benefits of cloud and see which service of which cloud provider will work well in the specific use case that you’re in. And then, how do you build these well-architected systems, applications, and architectures? Not an easy job, for sure, but that’s what keeps us on our toes, right?
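The idempotency Teena mentions is commonly implemented with a deduplication store keyed by event ID, so that a redelivered event is applied at most once. A minimal sketch follows; the in-memory set stands in for whatever durable store (database table, cache) a real system would use.

```python
class IdempotentConsumer:
    """Process each event at most once, even if the broker redelivers it.
    A real system would persist seen IDs in a durable store."""
    def __init__(self):
        self._seen = set()   # event IDs already processed
        self.applied = []    # side effects actually performed

    def handle(self, event: dict):
        if event["id"] in self._seen:
            return           # duplicate delivery: safely ignored
        self._seen.add(event["id"])
        self.applied.append(event["payload"])

consumer = IdempotentConsumer()
consumer.handle({"id": "evt-1", "payload": "charge $10"})
consumer.handle({"id": "evt-1", "payload": "charge $10"})  # broker retry
consumer.handle({"id": "evt-2", "payload": "charge $5"})
```

This matters because most brokers guarantee at-least-once delivery: without the deduplication check, the retried `evt-1` would charge the customer twice.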
Observability in Multi-Cloud Environments [15:05]
Olimpiu Pop: Yes, that’s true. So I was at KubeCon, and a lot of the folks there were talking about OpenTelemetry and open observability and all the other stuff. As an architect, what are you recommending? What did you try, and what was working in terms of observability?
Teena Idnani: Yes, I recommend OpenTelemetry. So, basically what I usually advocate is that observability is where many multi-cloud initiatives succeed or fail. Observability is critical, but at the same time, it’s very challenging in multi-cloud environments because when your events cross multi-cloud boundaries, traditional monitoring approaches break down, resulting in lost end-to-end visibility. So to solve that, what I recommend is, first of all, standardizing on an observability data model across all clouds, which means agreeing to standard formats for tracing, for metrics, for logs, regardless of which cloud they originate from.
And then yes, I do recommend OpenTelemetry as a foundation for this standardization, because OpenTelemetry provides vendor-neutral APIs, which is a crucial aspect here, for the instrumentation that collects your telemetry data. This means that your application code instruments all the different events the same way, regardless of which cloud provider you’re running on. Whether you’re running in AWS, Azure, or GCP, it’s one consistent way of instrumenting.
For tracing specifically, I have seen teams implement correlation IDs that persist across the different cloud boundaries. Each event carries its own unique ID, but it also contains the complete causality chain, allowing them to reconstruct the entire event journey, even if it spans multiple services across different clouds.
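A minimal sketch of that correlation-ID pattern: every event keeps one correlation ID for the whole business transaction plus an ordered causality chain, so the full journey can be reconstructed from any single event. The field names here are illustrative, not a standard.

```python
import uuid

def new_root_event(name: str) -> dict:
    """First event in a business transaction: starts the correlation."""
    eid = str(uuid.uuid4())
    return {"id": eid, "name": name,
            "correlation_id": eid,   # shared by every downstream event
            "causality": [eid]}      # ordered chain of ancestor event IDs

def derive_event(parent: dict, name: str) -> dict:
    """Event emitted in reaction to `parent`, possibly in another cloud."""
    eid = str(uuid.uuid4())
    return {"id": eid, "name": name,
            "correlation_id": parent["correlation_id"],  # unchanged
            "causality": parent["causality"] + [eid]}    # chain grows

order = new_root_event("order-created")             # e.g. emitted in AWS
payment = derive_event(order, "payment-processed")  # e.g. handled in Azure
shipping = derive_event(payment, "label-created")   # e.g. handled in GCP
```

Filtering logs in any cloud by `correlation_id` finds every event of the transaction, and `causality` gives the exact order in which they were produced.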
Olimpiu Pop: Okay.
Teena Idnani: And then you have dashboards. At any given point in time, you do require visualization tools and dashboards, because it’s essential to create a unified observability platform that aggregates data from all cloud providers. In such cases, I prefer a cloud-agnostic solution because you’re dealing with a multi-cloud environment. So you could use a cloud-agnostic solution, like Grafana or Elasticsearch, or you could leverage one particular cloud’s monitoring capabilities as a central aggregation point and get the rest of the cloud providers to send all their traces and logs there. Either way, I believe observability is crucial, and we have to get it right, especially in multi-cloud scenarios.
Olimpiu Pop: So, regardless of whether we are working in a single cloud environment where we have multiple services or we are working in a multi-cloud environment, the secret sauce is to remain consistent and harmonize the way we are using it. For instance, in the case of logs, always use the same standard so that it makes sense and they look the same regardless of where the information is coming from, whether it’s cloud A, cloud B, or our infrastructure under the desk. And then, when you’re discussing tracing, given that you would like to have it properly organized and you have the complete picture of everything, just make sure that the ID is moving from one call to another, regardless of the cloud boundary, to just allow the telemetry magic to show itself.
Teena Idnani: Yes, yes, exactly. I think you just nailed it. Absolutely. This is one of the most common mistakes that I often see: missing semantic consistency in your data. By that, I don’t just mean that your data formats match. No, the meaning behind those metrics and those error codes should be consistent as well. So absolutely, consistency is the key there.
Olimpiu Pop: Okay. And then the visualization is just the aggregator. If you completed the first two correctly, the rest should be a breeze because OpenTelemetry is a standard, and everything has been built on top of it, allowing us to take advantage of the benefits of consistency.
Teena Idnani: Right. And additionally, I think what is beneficial is, rather than just having technical dashboards showing your CPU utilization across different clouds, to have business process dashboards showing things like how many orders got processed per minute, your average order processing time, or maybe the orders which are stuck in processing. That would be an excellent use of the dashboards and the visualization, because it gives your business stakeholders visibility into what’s happening without needing to understand the underlying technical complexity.
Olimpiu Pop: I like this, because I see more people who are talking about user experience and the benefits that the end user sees, so the business side of the organization. So I liked that you touched on that, because it’s a notch closer to the ideal situation where you’re just delivering a service, and you don’t care about the underlying technology. So that is the advice that we need to keep in mind: treat it as a domain boundary, let’s say, when we are moving from one side to another, and think from the point of view of the domain. So, rather than just saying that we want, I don’t know, five orders per minute, say the satisfaction of fulfilling that order is 20%.
When to Consider Event-Driven Architecture for Your System [20:57]
It’s Friday, so it’s hard to come up with proper numbers. Nevertheless, event-driven architecture is not for the faint of heart. It’s not something that you’ll start off the bat, because, as you said, it’s not a classical server-side architecture that’s quite simple to understand. Event-driven requires a level of understanding of everything before you get started, to make things right. When shouldn’t you use event-driven architecture? What’s the scale of the system, or the number of events? I don’t know. What is a proper heuristic to understand when you should look into event-driven architectures rather than overcomplicating your life?
Teena Idnani: Yes, very, very well said. The first suggestion from my side would be to consider whether it’s a straightforward scenario that your application is dealing with. There’s no need to over-engineer, because event-driven systems require a mindset shift when building them. Right from the start, you need to brainstorm the events, you need to think in a domain-driven model, those kinds of things, and then work your way out. It is complex, and it is not easy.
So, if your application is something that can be handled with ordinary communication, such as a simple linear workflow with straightforward CRUD operations and without much unpredictability, everything is very predictable, then I won’t say you need to go for an event-driven approach. If you have a simple request-response pattern where direct calls are sufficient, I recommend sticking to the direct request-response pattern rather than building an event-driven system on top of it.
If your application is a small application with minimal integration requirements, you have few integration boundaries, you’re not interacting with multiple different systems, and you don’t require loose coupling between the systems, then I would say don’t opt for an event-driven approach. Or sometimes we have tasks that need batch processing. Event-driven is more for real-time things, so if an overnight batch job has to run, I don’t see a reason why you would need an event-driven approach for that. Another place where you should be thoughtful before using event-driven is when your application has strong consistency requirements. As you may have heard me mention at the QCon conference, you should be aware of your consistency: is your application okay with eventual consistency? And consistency is not binary; it’s a spectrum.
You need to determine the consistency level your application can tolerate. If it requires strong consistency, for example, if a financial transaction requires ACID properties across multiple operations, then an event-driven approach may not be the best fit. Another example is real-time inventory management, where stock levels must be accurate; otherwise, you are accepting an order for a product that doesn’t exist.
Those kinds of things become more challenging when you’re doing them in an event-driven way. It’s not that event-driven is impossible in those scenarios, but the debugging will become more challenging than for simple scenarios. You will see observability challenges. You will experience performance and latency-related issues. And then we talked about the skill set of the people, so you will have team constraints and organizational constraints. You may also sometimes encounter data limitations: if you have highly sensitive data, then you cannot put it in your queues. Those kinds of things sometimes become a reason not to use the decoupled, event-driven approach, and instead opt for direct request-response mechanisms.
Olimpiu Pop: Okay, that’s fair. I was discussing with Sam Newman the other day, and that’s why it jumped into my mind. It’s pretty much as you are saying: “If you have other options, don’t go to event-driven”. And he was saying the same thing: “If you have other options, don’t go to microservices; start with the monolith and then break down pieces and go there”. Can we apply the same heuristic here?
Teena Idnani: If you have other options, then don’t go for event-driven. There are some antipatterns that you should avoid with event-driven. For example, do not use events for synchronous request-response; otherwise, you end up over-engineering simple problems with unnecessary event complexity. Those are the kinds of things you need to avoid. Microservices are a good pattern to go with. It’s again a trade-off, right? Monolith versus microservices: both have their trade-offs. With a monolith, you are too tightly coupled; you’re not getting that loose coupling. I would say event-driven is worth using, but you just need to be careful about certain antipatterns where you should avoid it.
At Event-Storming, Bring Together the Technical People with the Business Domain People [26:36]
Olimpiu Pop: Fair enough. Thank you. I was just thinking, as you were talking: you have event storming, and you’re thinking about your events, and then you have to consider them within a given domain. Do you believe it would be helpful to have more mixed teams? You were just saying that you need people who understand the business domain, where you have a horizontal cut through your company. As you mentioned earlier, we should also consider everything from a business perspective. However, techies are usually not as keen on this approach, as they care about zeros and ones. Well, we’ll not discuss qubits today because that’s another conversation, but they don’t care about these things. How do you make sure that those points are tackled properly in these kinds of systems?
Teena Idnani: For event storming, right, it’s essential that you’re not just doing event storming with your technical people in the room. No, you must be doing event storming with the business analysts and the product owners, who have complete knowledge about it. And then yes, you should have tech in the same room as well because tech should hear about it. Gone are the days when tech would just take the requirement and then implement the requirement. It’s essential to know the big picture.
“Why are we doing it? What business problem are we trying to solve by using technology?” Therefore, it’s essential that business and technology meet each other’s needs; technology needs to support business, and vice versa. When you are discussing these domain-driven designs, when you’re talking about event storming, you are basically building what the user journey looks like. How will the customer, let’s say, place an order? What events will be generated in that case? How will those events then be consumed by the different application systems and the different components? You need to start doing that from a business point of view and then see how technology can help in achieving that outcome, the business outcome that we want.
Olimpiu Pop: Okay. In your experience so far in designing systems, let’s say event-driven systems, what are the customers usually looking for? I mean, as we discussed, you’re talking with business leaders, or whoever the business stakeholders are. I bet that they’re not coming to you and saying, “Hey Teena, I would like an event-driven system that does that for me”. What are they actually looking for?
Teena Idnani: Yes, they’re looking for solutions to their problems. Let me give you an example from one of my previous organizations. One of the organizations I was working with was undergoing a digital transformation journey, and one of the systems they were using was really not giving them business value. The transactions being processed on that system were really not very scalable. It was not giving them the right value add, and they were considering different options for what to do with it. One would be to deprecate that capability altogether: do not use it, and do not offer it to your customers again.
The other would be, let’s take a third-party service that would do that for us. And the last one would be, let us develop something internally, which we can build from scratch using the latest cloud-native services, so that we are taking care not just of the functional requirements, but also ensuring that we are building the applications to be scalable and performance-oriented, with the right resiliency, redundancy, and other options.
So that’s the problem they come to us with: “That is the challenge that we are seeing. How do you think you can help us, or what is the best way to move forward?” In the example that I’m giving, we actually went with the third option. We decided that we would build a modern, digitally transformed, API-driven application, and we built it in Azure. It also had a lot of integration points with the other components, which were legacy and already existing. And that’s when the whole strategy comes into the picture: you need to look at all the integration data points, how your new service is going to integrate with the others, which cloud provider it is going to be hosted on, and the different data residency requirements and everything that comes with them. But it all starts with a business problem, and then moving forward: how do you solve that business problem using the tech that we have?
How to Secure Your Systems’ Data [31:31]
Olimpiu Pop: Okay, it sounds like problems taken from a manual, and that’s quite nice. So we are discussing scalability, resilience, and also redundancy, because the banks stuck in my mind. The banks still have old systems, decades-old systems at some point. Those systems are working as expected, but they need to have some boundaries, and then you need to integrate them with modern technology, because we would like to have banking capabilities on our phones. We want very fast banking wherever we are. So this is the opportunity to do that: you probably create a wrapper around the old systems, you create those boundaries, you ensure that they react in the proper way, and then you put all those pieces together while also taking care of the non-functional requirements, so the legal requirements at some point.
But as we are discussing non-functional requirements, let’s discuss data. Data is very expensive these days, especially if it gets outside the boundaries of the company. How should you treat event-driven architectures from the security point of view? How do they differ from classical architectures, if they differ in any way?
Teena Idnani: It would be different when you are talking about systems spanning various clouds.
Olimpiu Pop: Okay.
Teena Idnani: So for security, you need to think about your data in transit. How do you secure your data when it is in transit? For example, you would want to encrypt all your event messages that cross the different boundaries using TLS or SSL, you would use mutual TLS, and you would apply digital signatures. So yes, you would want to do similar things for your multi-cloud systems as you would for your on-premises systems. Ultimately, you need to keep your data secure, not just in transit, but also at rest. And that is where a lot of these database services, the PaaS services that these cloud providers provide, give you encryption capabilities, whether at rest or in transit.
And then it’s very important to implement key rotations. If you have a storage account and you have access keys, then you need to ensure that you are doing regular key rotations to keep the data in those storage accounts secure. So I would say the considerations are similar: data is important, and you need to keep it secure, whether it is in your on-premises systems or in your cloud systems. With cloud providers, like I mentioned, they do provide you with services that can help you keep your data secure, but configuration is your responsibility. Security is a shared responsibility between your cloud providers and you. You can’t just say, “Hey, because I’ve hosted my data on a particular cloud provider, it is implicitly secure”. No, it is your responsibility. The cloud providers will give you the right tools to make your data secure, but the configuration responsibility lies with you.
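For the digital signatures Teena mentions, a minimal stdlib sketch is an HMAC over the serialized event; real deployments would more likely use asymmetric signatures with a managed key service (where rotation is handled), so treat the hard-coded key and function names here as illustrative assumptions.

```python
import hmac
import hashlib
import json

SECRET = b"demo-key-rotate-me"  # illustrative; fetch from a key vault in practice

def sign_event(event: dict, key: bytes = SECRET) -> str:
    """Detached HMAC-SHA256 signature over a canonical JSON serialization."""
    body = json.dumps(event, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def verify_event(event: dict, signature: str, key: bytes = SECRET) -> bool:
    """Constant-time check that the event was not tampered with in transit."""
    return hmac.compare_digest(sign_event(event, key), signature)

evt = {"id": "evt-1", "type": "order-created", "amount": 10}
sig = sign_event(evt)
ok = verify_event(evt, sig)                         # intact event
tampered = verify_event({**evt, "amount": 1000}, sig)  # modified in flight
```

The `sort_keys=True` canonicalization matters: both sides must serialize the event identically, or a legitimate event would fail verification.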
Olimpiu Pop: But it just jumps in my mind now. It’s pretty much like your house. You have your house, but you have to take care of your windows and doors to be locked so that you’re safe. But then whenever somebody is leaving your house, you’re just making sure that you explain to them, “Be sure that you cross the road carefully and that you’re secure also in traffic”. So yes, fair enough. Thank you.
Teena Idnani: Yes, wonderful example. Exactly. And I think it’s also very important to have that network security. You gave me the example of a house, and that’s exactly when I remembered the network perimeter. So you need to ensure that you’re using the right private endpoints and avoiding public internet exposure of your data. So absolutely, I think all of these things are very important when it comes to the security of your data.
Olimpiu Pop: Okay, cool. This was really insightful. So was there anything else that I should have asked you about event-driven and multi-cloud that I didn’t, but I should have?
Teena Idnani: No, I think you’ve pretty much covered it all: the considerations when you’re dealing with event-driven, specifically in the case of multi-cloud. And observability was one part that I was not really able to touch on very well in my QCon presentation, given the time limitations and how much I had to cover. So yes, I think you pretty much covered it all.
Olimpiu Pop: Okay, thank you. Thank you, Teena, for your time and for all the insights.
Teena Idnani: Thank you. It was a pleasure.