Transcript
Di Cesare: I’m going to talk about what we did at DKB about introducing a product mindset in the platform team. I will explain as well what we actually mean by platform team, and what’s the point of introducing a product mindset, and our path to there, what we did, what we still have left, and what we reached.
DKB is a German online bank, which was founded in 1990, in Berlin. We are the special bank in Germany. We are not in Frankfurt. I’m a senior platform engineer. I’m part of the experience team of the platform, that is the team who is doing basically customer relations, so talking with customers, ensuring that user needs are taken into account in the platform, helping them onboard. I’m a member of the CNCF Platform Working Group, which is a good place to have some good information and work on the platform topic. I’m interested in linguistics and languages, so I talk often about communication and understanding each other, which is an important topic for platforms as well.
What is a Platform?
When I talk about platform: platform is a very broad term. The meaning I use here is the one from “Team Topologies”, a very important book on the topic, which was released five years ago. This book describes different kinds of teams, and defines a platform team as the team that provides a compelling product to accelerate delivery by stream-aligned teams. Stream-aligned teams, in that book, are teams who deliver value directly. For us, for example, people working on internet banking, working with private customers, and so on. Basically, a platform team is about putting know-how together instead of spreading it out through different product teams.
Where We Started
This is a bit of also the scope of the platform. Depending on who you talk to, you will have very different scopes about the platform. What we do, we actually started from teams which were here at the low-level infrastructure. A big part of our team, they are people who are system administrators, who are working with networking, and so on. We built something a bit larger, which includes platform orchestration, which includes what is called here the DevOps platform. It’s basically CI/CD, the deployment part. We don’t include high-level things like API gateway. There is another team which has a data platform, which builds on ours, but they are on the outside in the organization. We are going into internal development platforms, so mostly APIs, CLIs at the moment. Maybe we will go into providing integrated development environments at some point. For the moment, the users are quite free on that side.
The core of our platform started from the container platform, running Kubernetes on EKS, on AWS, and using Crossplane, which is an open-source infrastructure as code product. It’s a bit comparable to Terraform, but it keeps track of the state, live, so that if you modify something in the infrastructure, it will correct it more or less straight away or within minutes. Where we started, it was an engineering-based team. It focused a lot on the engineering topics. That means on the technical side it works quite well, but we had some hurdles with communication. We still communicate a lot as infrastructure engineers. It’s sometimes a bit difficult to follow for product teams who don’t have an infrastructure background.
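The live state tracking described above can be pictured as a reconciliation loop: compare what was declared with what actually exists, and correct any drift. The following is a minimal toy sketch in Python, not Crossplane's actual code; the function names and the dictionary-based "infrastructure" are hypothetical and only illustrate the idea.

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Return the settings that must change to bring `actual` back to `desired`."""
    changes = {}
    for key, wanted in desired.items():
        if actual.get(key) != wanted:
            changes[key] = wanted
    return changes


def control_loop(desired: dict, actual: dict, rounds: int = 3) -> dict:
    """Repeatedly detect and correct drift (a real controller would run continuously)."""
    for _ in range(rounds):
        drift = reconcile(desired, actual)
        # Apply the corrections; settings that are not declared are left untouched.
        actual.update(drift)
    return actual
```

A one-shot tool would apply the declared state once; the loop above keeps re-checking, which is why a manual change to the infrastructure gets reverted shortly afterwards.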
Challenges, and Goals
First of all, the challenges we encountered. A lot of the challenges for engineers are these people here. It’s not everything to have a platform that works. We had people who didn’t know at all about the platform, and who were actually replicating things that we can already provide. We have people who didn’t understand the scope of the platform, so they were using it for something different than what it was planned for. Who think that the platform, for example, is the operations of the applications, rather than a team that just provides a product, but doesn’t operate everything that the application is doing. Of course, so then the problem was, how can we ensure that we are not only solving a technical problem, but also solving a problem that our users actually have?
The goals we had were: how can we show the value of the platform, so what it brings? That is important to show. It’s not only about providing the value, but it’s also about making it visible, which is a typical difficulty for everything that’s operations related: you realize the value of an operations team at the time you remove it and nothing works anymore. It was important for us to try to show that value before it breaks, so to make it clear for the board and so on, why they are investing in that team and what it brings. We also wanted to show the state of the platform. We would always have people who come and say, when is the platform complete? It’s not that simple, unfortunately. We realized that we didn’t have a very good high-level way of showing where we are.
We could talk about features, about service mesh, is it implemented, and so on, but when you talk to the board, they have no idea what you’re talking about. You need to be more at their level to explain what it brings at a higher level for the company. Also, how can we share information effectively about the platform? To be sure that everybody understands things the same way, understands what the platform is supposed to do, understands the user documentation, and so on.
Roadmap (Platform Journey)
I’m going to start by talking about product mindset, and what the point of a product mindset is. Then about what we did to try to define the platform and to show the platform maturity, which is basically the state of the platform. Then I will talk about communication and information: what our experiences are, and what we would say our best practices are.
1. Product Mindset
Regarding product mindset, so the main point of a product mindset, it means that you’re focusing on the user value. For a platform, the user is not always the same as the customer. Customer, you would typically say that’s the person that pays. It would be, in our case, someone like the CIO, the board. The users, in the case of a platform, they are not like, for a consumer product, exactly the same people who pay. You have to take into account both at the same time, to see that the people who pay have value, but it also brings some other value for the users in another way. It’s a bit of a different exercise. What is important as well is that a team that’s working with a product mindset is asked to fix problems and decides on the solution. It’s not a support team which would get a ticket with a full implementation and you have to do it that way.
Part of the work of the platform team is to think broad, not only solve the current problem you see, but also think about how we can solve this in the long term, maybe by solving it so that it cannot happen again with other people. Or so that there is information about what to do in that kind of situation in the future. A product team is also accountable for the outcome and not for the output. The important thing is not how much you do, the important thing is what the outcome is, what the user value is at the end. It’s not only about occupying people the whole time, but it’s also looking at, what do you want to do in the long run, and are you going in that direction? For background information, there are these sources. The books by Marty Cagan are very useful, especially “Transformed”, which is the best starting point. It’s more in the context of company transformation. “Team Topologies”, already mentioned, for the organization. The Melissa Perri book, “Escaping the Build Trap”, is more specific to the development product team context.
Then, some tips we noticed. The way we’re doing it at the moment is, you need both a clear vision and a clear strategy. Vision is the long-term part, so it’s for something really high level, something like, we are making cloud simpler for developers, so that it’s very clear for everybody what the goal of the team is. Strategy covers topics that are a bit more short term or middle term, like one year, something like that, where you would say, for example, what your exact goals are for a specific year: we want to improve that particular part of the platform, the information, we want to introduce that tool that brings that, and so on. This is useful in the team so that people are working in the right direction, and also on the outside so that the users can understand what the platform team wants to do, and can also give some feedback, like, “We actually have a very big problem and you’re not addressing it in your strategy, so, can we talk about it?” I found that for me as an engineer, and for many of the engineers, the part about discovering problems is difficult.
Engineers have a tendency to go into the solution straight away. Part of the work is looking around, discovering issues, talking with developers, working with developers so that you can actually see which problems they really have. In our engineering mindset, in a way, it doesn’t feel like real work. For many engineers, it’s difficult to do this because they would see real work is working on tickets, is implementing something, is doing architecture, designing, and so on, but not looking at people’s problems. This is actually quite an important part of the work. You need to strike a balance between easy wins and long-term value. You need some easy wins, especially in the beginning of the platform because you need to show value. If you can fix a problem and show that when we were involved with this issue, we were able to bring that and people worked faster, or we have better visibility and things like that, it’s good for you in the long term. You cannot only focus on that, you need to look as well, in the long term, how can we improve things? You need a balance of that.
In the past, when I was working as a consultant, I’ve seen issues for example of teams who didn’t look for easy wins and then they built a really good background in a year or two. Then, at some point, the board said, nothing happens here, and they canceled everything. You always have to keep in mind that you should stay visible and show something. About the discovery, I think shadowing is really a very good way to find things. Shadowing means that you work with one of the users, developers, people from finance, whatever, for an hour or two, and they show you how they work without you necessarily commenting.
Every time I do this, I always find, this is something we can improve. Or they are using it in a completely different way than what we thought, so maybe we need to improve the documentation. Or maybe the way they’re using is actually even better than what we had in mind, so we can take this over. Then, when doing shadowing, you realize that what helps the users is not always what they ask. Very often, as a platform specialist, you can bring a way of working, say, “If you work that way, if we introduce that way, then it will be a big help for the developers”.
Some challenges we run into. There is a lot of scrum in the company. I found that in scrum, the product owner role is normally doing both the product part and backlog management, basically, so the delivery, ensuring that things get done. Very often, the second part takes the priority at some point, because you have a big backlog, and the product owner ends up ensuring that things get complete, and starts not always considering new topics, because we have such a big backlog, and so on. I think it’s a bit in the way scrum works, that all these roles are in the same person, so maybe in some cases, it’s good to delegate a bit, to have someone who can help the product owner with product topics, for example. Who isn’t stuck in planning meetings the whole time, and so on. There is something I would call team encapsulation, as well, in scrum.
Scrum, I think, was designed to prevent engineers from getting disturbed by other teams the whole time, and need to work on fires the whole time. Sometimes it goes into the other extreme, that the team is completely insulated from the outside world. This is also not good. It must be clear. Communication with the outside is good, as long as you’re not only doing communication. It’s a good thing for engineers to be involved with customers once in a while, so that they can see what is the outcome they are bringing. Very often, we have users try to force the implementation of a requirement. It’s from product teams, but it’s very often from support teams, like security, for example, you must use this tool.
Then, very often, it’s important to go back, to understand, what are we actually trying to fix? Because with the example of security, typically, when we introduce a security tool, it will have effects on the processes. You might get more approvals. Things might become more complex, and you have to find a balance. This is also part of the work of the platform team, of finding the right balance between everybody.
2. Defining the Platform
This is what we use to define the platform. Typically, the part at the bottom would be what many people focus on, for this kind of platform, so the infrastructure, the hard stuff. There is more than this. There is, how do you talk to the users? It’s very important to define these kinds of processes. How do you interface with the users, both technically, so APIs and so on. Also, communication, you talk to the teams, what do you do when you have incidents, when you have requests? Which kind of work you’re doing.
For example, a platform team can do technical consulting. You can have database specialists who could go and work for a team for a few days to improve their processes. This is not something that’s in the infrastructure. It’s not automated. It’s very useful as well, because then you don’t have to have database specialists in every team. In what we call the common layer, this is more guidelines and policies. How do you work with the platform? For us, we’re a bank, so compliance is important. We try to summarize, where do we get information about compliance? What are the different sources? How are you supposed to work? Which topics do you have to look at when you look at security?
Then you can describe, these parts, they’re automated because we have a policy engine, for example, and these topics, if your workload runs in the platform and doesn’t get kicked out, then you’re good. There will be other topics which are more where the application will have to be involved, which cannot be solved by the platform team. We find that very often there’s a lot of value in just the information that the product teams can know, “So this is the scope of security, and we’re covering this, this, but not that, and we need help there”.
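To make the idea of "if your workload runs in the platform and doesn’t get kicked out, then you’re good" more concrete, here is a minimal sketch of an automated policy check in Python. It is not the policy engine used at DKB; the `Workload` fields, the two rules, and the registry name are all hypothetical and only illustrate how a platform can cover part of the compliance scope automatically.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    runs_as_root: bool
    image_registry: str


# Hypothetical allow-list; a real platform would load this from its policy configuration.
ALLOWED_REGISTRIES = {"registry.internal.example"}


def violations(workload: Workload) -> list[str]:
    """Return the list of policy violations for a workload (empty means compliant)."""
    found = []
    if workload.runs_as_root:
        found.append("containers must not run as root")
    if workload.image_registry not in ALLOWED_REGISTRIES:
        found.append(f"registry {workload.image_registry!r} is not allowed")
    return found


def admit(workload: Workload) -> bool:
    """A workload is admitted to the platform only if it has no violations."""
    return not violations(workload)
```

Anything such a check cannot express, for example how the application itself handles customer data, stays in the part of the scope where the product team has to be involved.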
What we are trying to do in practice, so in product management, they usually make a distinction between problem space and solution space. Problem space is which problems you have, and solution space is how you fix them. I found that very often when you use tools like Jira, you are always focusing a lot on the solution space. There are objects like user stories, but in practice in Jira, they’re not really used that way. What is useful is to have a repository of what are the kind of use cases that are covered by the platform or could be covered by the platform, because it’s also good information to know, this is something we can work on. Maybe we don’t have time right now, but at least users know that this is something that the platform is not doing for the moment.
If it’s important for me and I need it critically, then I can do it on my own, and then maybe we can integrate it back later. It’s not only about showing what the platform covers right now, but also, what are the different possible problems and how can they be solved. There might be different solutions as well. The solution is not only about automating everything; it can be about first giving pointers. Here are good pointers about that topic on the internet. Here is how you manage persistence, or something like that. Then when you see it’s a bigger problem and people need more support, you can continue with setting up best practices, and then also the technical side, so automating things. It’s not only about automation. It’s also about status and responsibility: who to talk to when you have something about that topic.
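A repository of use cases like the one described above could be modelled roughly as follows. This is a sketch under assumptions: the support levels mirror the progression mentioned in the talk (pointers, best practices, automation), while the class names, the example entries, and the contact names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Support(Enum):
    NOT_COVERED = 0     # known problem, the platform does nothing yet
    POINTERS = 1        # links to good external information
    BEST_PRACTICES = 2  # documented, recommended way of working
    AUTOMATED = 3       # solved by the platform itself


@dataclass(frozen=True)
class UseCase:
    problem: str     # described in the problem space, not as a solution
    support: Support
    contact: str     # who to talk to about this topic


# Hypothetical example entries.
CATALOG = [
    UseCase("Deploy a stateless service", Support.AUTOMATED, "team-delivery"),
    UseCase("Manage persistence for a service", Support.POINTERS, "team-data"),
    UseCase("Run batch jobs on a schedule", Support.NOT_COVERED, "team-experience"),
]


def lookup(catalog: list[UseCase], keyword: str) -> list[UseCase]:
    """Find use cases whose problem description mentions the keyword."""
    return [uc for uc in catalog if keyword.lower() in uc.problem.lower()]


def open_gaps(catalog: list[UseCase]) -> list[str]:
    """List problems the platform knows about but does not cover yet."""
    return [uc.problem for uc in catalog if uc.support is Support.NOT_COVERED]
```

Keeping the uncovered entries in the same list is the point: users can see that a topic is known, who owns it, and that they may need to solve it themselves for now.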
3. Clarifying Platform Maturity
This is the platform maturity model from the CNCF, from the Cloud Native Computing Foundation. This describes the state of the platform, according to different topics. We found this is a very good tool to manage up, to talk with people at a higher level because then you can be much more generic without going into the details. It has five categories, so basically how you do the funding, how the team is funded.
How do you work with adoption? What are the interfaces between the team and the outside world? How does the team operate? How do you measure things? It’s much easier to talk with people who have budgets, for example, at that kind of level and to say, “We want to go in the interfaces from provisional to operational, and it means we would introduce this and that. It will give that kind of value, and we think it would cost that much”. It’s much easier to discuss that way than having the board come and say, when is your platform finished in the end? This shows a bit what we did, what our situation is, and what we want to improve in the future. It’s not always the case that you have to be at the highest level. In the end, it’s a cost issue as well. You have to look at whether the cost of being the very best in one category is really worth it for us or not.
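The kind of conversation described above, moving one category from one level to the next, can be sketched as a small data structure. The aspect and level names follow the published CNCF platform engineering maturity model (the talk itself only names “provisional” and “operational”); the current and target values below are made up for illustration and are not DKB’s actual assessment.

```python
# Level and aspect names as published in the CNCF platform engineering maturity model.
LEVELS = ["provisional", "operational", "scalable", "optimizing"]
ASPECTS = ["investment", "adoption", "interfaces", "operations", "measurement"]


def gaps(current: dict, target: dict) -> dict:
    """Return, per aspect, how many levels separate the current state from the target."""
    result = {}
    for aspect in ASPECTS:
        steps = LEVELS.index(target[aspect]) - LEVELS.index(current[aspect])
        if steps > 0:
            result[aspect] = steps
    return result


# Hypothetical assessment and goal for one planning period.
current = {
    "investment": "operational",
    "adoption": "operational",
    "interfaces": "provisional",
    "operations": "operational",
    "measurement": "provisional",
}
target = dict(current, interfaces="operational")
```

Here `gaps(current, target)` contains only the interfaces aspect, which matches the point that not every category has to reach the highest level: the target is a deliberate, costed choice per category.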
4. Information – Communication and Documentation
Then, information. I really like this. Matthew Skelton is one of the guys who wrote “Team Topologies”. I think there is a lot of truth in this. We think a lot about implementation, but there is a lot in platform engineering which is about know-how, which is about keeping the know-how on what you do. As a consultant, I’ve seen projects in the past, especially some which were very successful technically. Something was automated and it works flawlessly. The team gets dissolved. It works perfectly for 10 years. Then when you need to change something, you realize that nobody actually knows how it works and what it really does. This is not a good situation. Regarding information, it’s important to consider it as a feature of the platform the same way as technical features. Make it as effective as possible, but don’t constrain too much. You have to look at what the real use cases are as well.
There are use cases for asking a quick question in chat, but you don’t necessarily need the best authoritative answer. It could be something like, is this supported in the platform? No, it’s not supported. Good, we know it. Communication is not only about ITSM tickets and incidents and things that get tracked perfectly. There are also topics about technical consulting. We find this is important in some topics as well. About this, I found this book very interesting, “The Async-First Playbook”. It’s a book about how to communicate in remote environments. I find it very useful even when you’re not remote because it focuses a lot on how to make information visible. I think especially when you work with Scrum, which is quite meeting heavy, the information tends to be in the meetings. If you’re not in the meetings, then you don’t have the information. This is not good. It’s important to think about, what is decided, how do we know that it’s decided, how can you look at it in the long term as well.
Then, documentation is an important topic as well. In our situation, we had it a bit more difficult because we basically are a merger of different teams doing different things. Of course, they all had their different documentation. It’s very difficult to put everything together. Also, because people who write documentation don’t all write documentation in the same way. Developers like having things like merge requests for review, for example. In general, for us, people like architects would prefer to use a tool like Confluence where you have direct feedback. When you do a merge request, if you use something like GitDocs and you do a merge request, you only see the real result when you deploy. That’s the difficulty of working that way with documentation. You have to find a way that works for everybody. In the short term, we gave up trying to standardize everything, to say, there is this documentation platform, you have to use it, because some people don’t use it.
Then you have shadow documentation. This is not good because then you might have people describing how to use the platform in their own way. Then when the platform changes, if you don’t know that this documentation exists, it doesn’t get updated and it’s wrong. It’s bad for you even if it’s not the official documentation. It’s a good thing to always try to keep the overview, to be as inclusive as possible so that you know also if other product teams have their own documentation, what they are talking about, and so on. This is not only about user documentation. We try to make a map of what it looks like. We have basically documentation that’s more thought for the outside world, which will describe what is the architecture of the platform, what does it do, some quick introductions, and so on, for new users, for people like product managers in other teams so that they understand what it does. We have some more conceptual architecture documentation, which explains how things are built, what are the capabilities, how does it work.
Some which are more low level, so we have internal documentation for things like operations, and user documentation, how to use it or how to troubleshoot things. This is quite broad. Technical documentation needs to be in sync with the user documentation, which is very often a problem. For example, during onboarding, we encourage the people who are onboarding to tell straight away if something doesn’t work. It actually corresponds to this, the psychological safety. Tell when something doesn’t work. It’s not because you did something wrong or you’re incompetent or something like that, but it’s very probably because something has not been updated, so we need to know about it.
Especially for communication with the outside, some people, especially the most competent ones technically, have a tendency to think that everything is simple and obvious. You shouldn’t communicate that too much to the outside because, basically, when you do that, you’re telling users, “You are not the right customer for us”. It’s not a good situation because when you don’t have customers anymore, then you have a problem. It’s important to remember, even if sometimes you have the impression that some people don’t understand very fast, that not everybody has the same background, and not to stop people from commenting, because this is the way you learn how you can help them.
The Future – Next Steps
What does the future look like for us? We want to continue improving how all stakeholders participate, so even people who are not direct users. We had good experience working with people like finance, for example, which is not always the first kind of user you would think about in a platform, but for them, there’s a lot of value in being better able to track costs, to map costs to a specific product team. We actually had a lot of good collaboration that way. It’s always good when the finance team is happy. We also want to encourage the engineers especially to talk a bit more with the users and to consider things from a user point of view. We’re working a lot with Kubernetes, and Kubernetes is very interesting technically. I find that as well. You can spend the whole time looking at new technologies, but at some point you also have to look at what exactly the value behind it is for the users.
Questions and Answers
Participant 1: What’s your take, and you can pick whatever is easier in terms of amount of teams or budgeting costs for the engineering teams or whatever, on the balance between the platform and the stream-aligned teams or any other of the complicated subsystems. Or, how does it work in your case? How much capacity do you have working on this platform team? In terms of FTE, or how many people you have working on the platform team, what’s the balance between the platform team and the stream-aligned teams?
Di Cesare: All together, we have about 60 people. This includes some things which are not really working as a product at the moment, like part of the network team, identity and access management. As you saw in the first diagram, it’s quite broad. All together, it’s about 60 people, including administrative people as well, so scrum masters, product owners, and so on.
Participant 1: That’s 60 for the platform?
Di Cesare: Developers, we are in the hundreds. I wouldn’t be sure about the exact number, but it should be in 300, 400 in that level, roughly, yes.
Participant 2: How do these practices that you’ve showed here differ for the enterprise size? Imagine you have a 20-people company versus 2,000 or 20,000. Do you think they’re consistently applicable across all those three sizes?
Di Cesare: I think the principles, yes, but some parts will become more important depending on the size. All the parts with communication, they’re always easier when you’re small, because you can see the people next door. We see, for example, the communication inside the platform. Things like vision and strategy, they are important to be sure that the different sub-teams go in the same direction, and don’t start to work with different priorities, for example. I think on the enterprise side, it’s really important to focus on that as well. Probably to have decisions and communication made in a way that it can be tracked better when you’re on the enterprise level.
Participant 3: In your opinion, what are the most critical roles that have helped you during this journey to adopting a product mindset to the platform team?
Di Cesare: I think the key role in the end, is you have to have the product management mindset at the top, on the leadership side. If you don’t have this, I think you need to reach that at some point because I think people who decide must work in that way. If only the engineers work in that way, it’s not going to work. If you are in a situation where only engineers work with a product mindset, people on the leadership need to be convinced. We fortunately are in the situation that people in the leadership are convinced of this because we have also many end user products and they work more like this.
Participant 4: You mentioned about making small investments and finding small wins against making big investments. Do you have any advice for people who are already making big investments but are having challenges on how to dial back on that and make some progress?
Di Cesare: The main thing is you have to consider both because some teams who are more short-term focused will look only at the easy wins. Then you will get a lot of buy-in in the beginning but then you’re going to build a lot of technical debt as well, and things won’t be effective in the long term. I’ve seen also in other positions, the opposite, where you have companies which are architecture heavy, they have a very clean architecture, but they are not focusing very much on having something that works and that people can use even if it’s not perfect. The main focus is on the long term but you have to be sure that you always look at short-term things on the side as well, so that there is always some visible progress.
Participant 5: Do you have any tips on how to avoid shadow documentation?
Di Cesare: You need to talk with the users, and look at how users work. Very often when you look at how users work, you notice that they’re using this as information, and this is something that was written three years ago and it’s not relevant anymore. I think shadowing is a good exercise to do regularly to discover things like that.