Transcript
Gosselin: We’re going to talk about this idea of empowerment during this session, and how that changes our thought process. Why developer portals are important, what they even are, and how they’re going to impact some of the future foundations. I love this quote from Satya Nadella to get us started, this idea that empowerment is the key to accelerating innovation in any team. We’ll break that down. We’ll explore that a little bit. My name is Travis. I work as a distinguished engineer for SPS Commerce. SPS Commerce is a Minneapolis-based organization, and we currently build out the world’s largest retail network, which exchanges invoices and purchase orders between the suppliers and retailers inside of it. I don’t focus so much on the products that we build as on the developer experience: how we build the software with our engineers, and the patterns and the productivity associated with it.
Developer Experience
If we had to align particularly on a developer experience definition, I like to think of it as the art of studying and improving how developers get their work done. What is involved in their day-to-day when they have 20 or 30-plus screens or dashboards to do something? How many tools are they working with? At SPS Commerce, we have a vibrant network of connected customers, over 50,000. As the world’s largest retail network, we exchange information at the heart of this through many different channels and interfaces, APIs, EDI. Just to give you a sense of our organization, because part of today’s session is all about impacting developer experience in maybe not an incredibly large organization, not one of your FAANG enterprises that are out there. We’re a 3,000-plus person organization. We have about 700 people in our technology department, that includes all of technology as well as product development.
To give you an idea of the number and granularity of services we have, we have about 2,400 deployable units. Of those, about 560 of them are APIs or have HTTP-based endpoints. I share this with you just to give you an idea that we’re not the largest organization out there. If we draw and we measure organizations based on small, medium, and large, you can see we put medium enterprise there around the $40 million gross revenue mark. SPS Commerce comes in the middle of some of those in terms of its revenue at about $600 million. The reality is that we’ve grown very fast in the last couple of years. We’ve almost tripled our revenue. That leaves us with a lot of pain points and a lot of gaps as we try to figure out some of the developer experience problems that we’re facing with such rapid growth.
Currently today, SPS Commerce spends 8% of its headcount on productivity teams. Productivity teams, if you remember the definition of that, really focus on SRE, platform engineering, productivity engineering, as well as your centralized database engineering teams, as well as your cloud operations teams. These are all made up into 8% of our focus, which is really low. We’ve had to do a lot with a little as best we can, like I think many of you are as well.
For us, that really pushes this envelope of the developer experience crisis that is intensifying. I love this quote in this way that it’s approached from the state of software delivery report that we see from Harness put out this year. It says that developers find themselves wearing multiple hats. They’re expected to be security experts implementing robust safeguards, operational specialists managing complex infrastructure, performance engineers optimizing system efficiency, and UX advocates. You get the idea. There’s a lot of hats. I feel like it’s DevSecOps today, it’ll be DevSecOps ABC, you name it, tomorrow. As developers, we are asking a lot, whether it’s through GitHub, whether it’s through your Kubernetes dashboard, AWS console, Azure portal, you name it. There are so many different interfaces that we have to navigate daily.
On top of that, the additional domains, whether it’s infrastructure as code, supply chain, SAST and DAST from a security perspective, Docs as code, operations, quality and testing, legacy. The list goes on because it’s a very difficult space to enter nowadays. I’m glad that I’m not a junior developer in that regard because there’s a lot of information to cover. We learned a lot about, I think, what the future looks like in terms of developing further abstractions that in turn pursue additional complexity. That was a fantastic keynote.
Maybe AI will save our productivity. Maybe not. It depends. Sixty-eight percent of developers spend more time resolving security vulnerabilities as a result of AI, and 67% spend more time debugging AI-generated code. The reality is a lot of predictions in 2025 are that productivity is going to go down because of some things we’ve been hearing about, because of the new skill sets that we have to learn in order to adapt. The adoption of AI codegen tools thus far has actually resulted in a significant increase in developer workload. There’s a displacement. We’re all doing really neat things inside of the development space, but we don’t necessarily do coding for 100% of our day.
In fact, the average stat is that developers spend 52 minutes a day coding, according to Software.com. There’s a lot of work to do as we think about how AI is going to impact the full SDLC. Ninety-six percent of developers say the full benefits of AI-assisted software development will never be realized until their use extends to the entire software delivery lifecycle. AI alone won’t save our productivity, but it certainly offers us a lot of insight into it. We need to think about how a developer portal is not going to necessarily just impact the developer experience, but also the machine experience. We need to think about how it’s going to provide structured data, functionality, and context, ultimately, for some of the questions and some of the agentic tasks that we want to execute on it.
I want to step back, though, and focus in on this idea of empowerment, because empowerment really involves a couple different aspects. I love this simple definition from Oxford, and that’s the authority or power given to someone to do something. I think in a lot of cases, we really want to empower engineers. We want to give them stewardship over the things that they’re developing and ultimately enable innovation. Empowerment really zones in on two important aspects. The first is pretty simple, this idea of reducing friction. That’s the obvious one.
This aspect focuses on removing obstacles, inefficiencies, and repetitive tasks that slow down developers’ work. Give them self-service capabilities where there were ticket operations before. Enable streamlined onboarding so that your senior engineers aren’t tied up with some of those onboarding questions. I think more interesting than reducing friction, when we talk about empowerment, is the idea of improving fulfillment. Fulfillment really focuses on enhancing developer satisfaction and ownership and growth in their roles. We talk about the ideas of stewardship and ownership. This is what provides fulfillment to our engineers.
Enhance collaboration by providing spaces for documentation forums, ticketing systems, visibility and metrics offering insights into the status of projects. Ownership with self-service access to tools and the ability to manage their own workflows. As we take these two concepts of reducing friction and improving fulfillment, we’re going to explore what it looks like to empower developers through an internal developer portal. I’m going to zone in on four key aspects today.
First, this idea of a development hub. Tackling cognitive overload head-on, and really recognizing the need for connection and centralization of some of the SDLC that we don’t have today. We’ll talk about building a developer-centered portal, and really zoning in on some of the questions that you might want to ask or consider in your organization as you tackle such a project. When we are tackling it, we’ll dive into some architectural considerations that are worth exploring that are a little bit specific to this space. We’ll also dive into some adoption and impact. I’ll share with you some of the wins and some of the not-so-great things that we ran into as a part of our journey at SPS Commerce with our adoption of a developer portal.
Development Hub
First, let’s start with this idea of a development hub or context. Context isn’t necessarily obvious. In fact, if we sit down and recognize that the complexities that we deal with as engineers on a day-to-day basis are incredibly tough and we tell those stories, I think some of our executives would be pretty floored by that. Sometimes context is obvious. It’s pretty obvious why that chair is over that hole and why there’s toilet paper next to it, I think. We all get the idea of its particular purpose. Context isn’t always obvious. What’s the purpose of this building? It looks like it’s some bunker of some kind. It was created to protect occupants from an apocalypse. Zombie apocalypse? I don’t know what type of apocalypses are possible these days. The reality is that it’s important for us not to make assumptions about what we see. Context helps drive that.
If you’re a Reacher fan, you’ll be familiar with this: in an investigation, details matter and assumptions kill. The assumptions I’m making in my software aren’t killing, but maybe they are for you. Maybe you’re working on something that is much more sensitive. How about this one? What’s the purpose of this? Is it a lighthouse? It doesn’t look like it’s near water. I’m not sure of the purpose of this particular building, not until we understand the context of why it was built, its purpose, why it’s there. This is the drop tower in Bremen, Germany, a 475-foot hollow tube that was built so scientists could study the effects of weightlessness by dropping objects from it. It makes a lot more sense than some kind of a lighthouse or tourist attraction.
This same idea applies to what we need to figure out in our development hubs and centralization, bringing context together. That can be done through an internal developer portal. You might call it an IDP. I personally do not subscribe to the term or the acronym IDP, because it can mean so many different things. Internal Developer Platform, or my favorite, identity provider, Intelligent Document Processing, Individual Development Plan. You know what that is if you’ve been on one. The definition of a developer portal that I like best is from OpsLevel. An internal developer portal is a central location that individual developers, teams, and managers use to own, operate, and improve software. You see, it’s describing this idea of a contract between your platform engineering teams, your product development teams, and other parts of the organization that are all centrally working out of the same hub. It has four concepts, four pillars that I think are important, that tackle some of these challenges and reduce cognitive overload.
First, it’s a software catalog. If you don’t have a software catalog today, maybe you’re not at a scale where you need one. The reality is that understanding the ownership of services in your system, deployable units, how they’re all tracked together, is table stakes today. In the past, it’s been managed in spreadsheets. It’s been managed through homegrown tools and other capabilities. When you have a downstream dependency or you have a failing service in production, you’re trying to figure out who owns that. What’s the right Slack channel you go to? Your software catalog contains all of that information and more. The reality is that 50% of engineering teams lack the trust in the data quality in their software catalog. That’s because too often we’ve left a software catalog to be updated manually when it needs to be driven through automation and updated automatically.
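To make the ownership question concrete, here is a minimal Python sketch of a catalog entry and a "who owns this service?" lookup. The field names, service names, team names, and Slack channels are invented for illustration; the talk does not describe the actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One deployable unit in a hypothetical software catalog."""
    service_id: str
    name: str
    owning_team: str
    slack_channel: str


def build_index(entries: list[CatalogEntry]) -> dict[str, CatalogEntry]:
    """Index catalog entries by service name for quick ownership lookups."""
    return {entry.name: entry for entry in entries}


def who_owns(index: dict[str, CatalogEntry], service_name: str) -> Optional[tuple[str, str]]:
    """Return (team, Slack channel) for a service, or None if it is not cataloged."""
    entry = index.get(service_name)
    if entry is None:
        return None
    return entry.owning_team, entry.slack_channel


# Illustrative data only: every value below is made up for the example.
catalog = build_index([
    CatalogEntry("svc-001", "invoice-api", "team-billing", "#team-billing"),
    CatalogEntry("svc-002", "order-router", "team-orders", "#team-orders"),
])

print(who_owns(catalog, "invoice-api"))
print(who_owns(catalog, "unknown-service"))
```

In practice the entries would be populated by automation (for example from repository metadata) rather than typed by hand, which is the point the talk makes about trust in catalog data.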
On top of that, it’s a self-service concept. How can I execute myself when I need something without having other human intervention involved? Seventy-eight percent of engineering teams wait a day or more for SRE or DevOps assistance regularly. I feel this pain. We felt this pain at SPS Commerce very specifically, because for us, our teams were waiting a day after submitting a ticket. I need a new Kubernetes namespace. I need a new AWS account. That was a day, not of them actively working on it, but submitting a ticket. Figuring out where to even submit the ticket to. Who’s the backlog owner of this? When will it be picked up? Do I have to drop the URL of the ticket I created in a Slack channel too? What information do I include in the ticket? There are so many interesting questions that shouldn’t be part of just making the request that I need.
Scorecarding, 85% of developers indicate they don’t have clarity of standards today. I feel this too. What particular best practices, guardrails, approaches should my services implement? How do I understand when they’re not? Of course, workflow automation sits on top of all of this. The ability to create and support golden paths at scale. Real-time events that we can now see flowing through our system as we connect data and context through the SDLC together. Of course, the ability to start making machine-level decisions where we can.
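As a rough illustration of scorecarding, the sketch below evaluates a service against a handful of named rules and reports which ones it fails. The rule names and the metadata keys (`owner`, `runbook_url`, `pagerduty_service`) are assumptions made up for this example, not standards described in the talk.

```python
def has_owner(service: dict) -> bool:
    """The service declares an owning team."""
    return bool(service.get("owner"))


def has_runbook(service: dict) -> bool:
    """The service links to a runbook."""
    return bool(service.get("runbook_url"))


def has_oncall(service: dict) -> bool:
    """The service is wired to an on-call rotation."""
    return bool(service.get("pagerduty_service"))


# Hypothetical scorecard: rule name -> check function.
RULES = {
    "has-owner": has_owner,
    "has-runbook": has_runbook,
    "has-oncall": has_oncall,
}


def score(service: dict, rules: dict = RULES) -> dict:
    """Run every rule and summarize passes and failures for one service."""
    results = {name: check(service) for name, check in rules.items()}
    return {
        "passed": sum(results.values()),
        "total": len(results),
        "failing": sorted(name for name, ok in results.items() if not ok),
    }


# Illustrative service metadata; the values are invented.
example_service = {
    "name": "invoice-api",
    "owner": "team-billing",
    "runbook_url": "https://docs.example.internal/invoice-api/runbook",
}

print(score(example_service))
```

Listing the failing rules by name, rather than only a score, is what gives a team the clarity about standards that the statistic above says is missing.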
At SPS Commerce, we call our internal developer portal, devportal. It’s a very unique name, I think. It looks like this at a high level. This is our software service catalog. We’re bringing together and incorporating all of the best parts of our SDLC, all the way from GitHub, centralized with a service catalog in the middle, AWS accounts, Jira, PagerDuty, Azure DevOps, to really build and provide that context, that centralization to our teams. They can open up a service that they own or see the series of services that they own, and immediately understand where they can go to figure out the next steps for a particular part of whatever problem they’re solving, and get deep links into all the rest of these tools and understand the ownership of them.
Understanding the context isn’t just at a service level for me; we can also understand context at a platform level. Remember that a developer portal is also for other parts of the organization, not just the product teams. When you talk about making platform-level decisions, this is pretty important. We had to basically come up with a particular solution to a problem that I think many of us are iterating on right now. It’s not incredibly novel.
First and foremost, it starts with understanding the GitHub information we have. We know we have 5,000 repos. We know the languages that are in those repos. We know that we have a sizable Kubernetes cluster and runtime that’s available for our teams to deploy to. How do we understand the primary language of what’s deployed into the runtime? We can’t right now. We needed to enable ARM support across our runtime. That required us to have cross-compilation support as well as emulation support, and to build those ARM-based containers and deploy them to Kubernetes on the relevant hosts.
In looking into our service catalog, we now connect the dots between GitHub and our runtime, and we start to answer interesting questions. Like, how many of our services in this runtime are actually based on Python? The reason that’s interesting is that Python didn’t support cross-compilation. It only supported emulation. Our build servers weren’t equipped to handle that, at least not in any performant way. We looked into our service catalog with this information that we have now, and we understood by connecting the information together that Python made up 500 of the deployable units that were on this platform.
That is significantly more, double that of any of the other languages that we needed to support from a polyglot perspective. With that information, you might think that Python was the most important: you need to support emulation and it has to be fast. In reality, we can connect more details. What about the runtime? What is the actual compute workload of these? That data was pulled together and included there by one of my good friends, Jesse, on our SRE team. He’s looking at the data going, that’s great that you think Python should be something we support, but the green here represents Java-based compute workloads.
While Java had our smallest number of services, it’s the core of our system, and it scales to far more compute than all the other services combined. It makes up 75% to 80% of the compute workload. Context is everything when making decisions. In fact, we can probably just focus this initial MVP on Java alone and accomplish very large savings across our compute cluster. Context is important.
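The difference between counting services and measuring compute can be shown with a small Python sketch. The services, languages, and CPU figures below are invented so the arithmetic is easy to follow; they are not SPS Commerce data, and `cpu_cores` is a hypothetical stand-in for whatever compute metric the runtime reports.

```python
from collections import defaultdict


def summarize_by_language(services: list[dict]) -> dict:
    """Aggregate service count and share of total compute per language."""
    counts = defaultdict(int)
    cpu = defaultdict(float)
    for svc in services:
        counts[svc["language"]] += 1
        cpu[svc["language"]] += svc["cpu_cores"]
    total_cpu = sum(cpu.values())
    return {
        lang: {
            "services": counts[lang],
            "compute_share": cpu[lang] / total_cpu if total_cpu else 0.0,
        }
        for lang in counts
    }


# Illustrative data: many small Python services, one large Java service.
services = [
    {"name": "py-a", "language": "python", "cpu_cores": 1.0},
    {"name": "py-b", "language": "python", "cpu_cores": 1.0},
    {"name": "py-c", "language": "python", "cpu_cores": 1.0},
    {"name": "py-d", "language": "python", "cpu_cores": 1.0},
    {"name": "go-a", "language": "go", "cpu_cores": 2.0},
    {"name": "go-b", "language": "go", "cpu_cores": 2.0},
    {"name": "java-a", "language": "java", "cpu_cores": 24.0},
]

summary = summarize_by_language(services)
most_services = max(summary, key=lambda lang: summary[lang]["services"])
most_compute = max(summary, key=lambda lang: summary[lang]["compute_share"])

print(most_services, most_compute, summary["java"]["compute_share"])
```

Ranking by service count and ranking by compute share give different answers on this data, which is the same shape of result that redirected the MVP toward Java.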
If you haven’t built a business case for this context yet, if you’re still in the zone of not quite sure if leadership will buy into it, you need to develop a business case. Exploring this idea that positive developer experience is forged in acknowledging the complexities developers routinely face and actively creating ways to navigate that complexity, this is the direction you need to go down towards. You need to advocate for this business case of developer productivity. Here’s a couple ideas to help maybe give you some ammunition to take back.
First, you need to define a problem for this type of capability. I referenced earlier this idea that developers only code for 52 minutes a day. Why is that? Is it they’re only coding 52 minutes a day because they’re stuck in meetings the rest of the day? Building a developer portal might not solve that. It’s one particular solution, one particular approach. It may not be what you need to go after in your organization. Though I think it’s a necessity and table stakes to have some of these capabilities regardless. Perhaps instead you’re looking at the tool sprawl that you have in the organization, or the tedious time it takes to build out certain capabilities in an API and you need to build self-service functionality for that. Now you’re on the same track as an internal developer portal. You’re going to have to explore measurement.
Bringing industry-level benchmarks has never been easier in the number of reports that are out there for you to access freely online today. Those are great starting places. Like this report from Harness. Seventy-eight percent of developers spend at least 30% of their time on manual repetitive tasks. More interesting to that, what’s going to take this to your leadership and really pull them in is anecdotal stories about the toil that you face in your ecosystem. That’s what truly makes the difference when they feel the empathy and they understand that, Joe, who works literally on the next floor for me, is experiencing this level of pain. When they can feel and understand that, it makes a massive amount of difference in them wanting to buy into it. You see, attaching this to leadership priorities is incredibly important.
Leadership at the executive level doesn’t necessarily see your pet project of enhancing productivity as something that’s going to impact their bottom line. It does. It impacts their bottom line massively. In fact, when we talk about leadership priorities, productivity can be very tightly woven together with speed to market and innovation. These are things that your leadership cares about. In the past, productivity has very much been seen as this idea that if you have enough funding, if you have enough budget, great, we’ll go do that. That is changing. That is shifting to a different understanding in the industry today where you need to be prioritizing productivity if you’re not today. Building a broader DX strategy is important. Building that into your culture is really important as well. This has to be both a bottom-up and a top-down shift in building a developer experience approach, an inner source-style of collaboration if you don’t already experience that.
Building a Developer-Centered Portal
What about how? How do we do this? Let’s explore a little bit more as we dive into the idea of building a developer-centered portal. This is really exploring the idea that there are three general approaches to a portal that you might pick, a strategy in your organization. This is provided from Cortex, who have some great foundational information on different strategies that you can pick. The first is a rigid approach. You could say, I need a developer portal, but I’m going to choose something pretty rigid because there’s so many quick wins and fast setup and data integrations that you can install, run, and execute on really quickly. It appeals to executives, lower level of investment from an owning individual team that has to develop on it to something that can just be stood up and worked on. The reality is you get what you’re willing to invest into it. There’s limited adaptability and it can hinder long-term progress or different opinions you might have. The opposite side of this is unopinionated.
The idea that we want to take on particular software that allows us to build our opinions into it. If you’re a highly opinionated organization and you want to approach your developer experience from some ideas that you have, you’re going to have to be more leaning towards that unopinionated, something that is extensible. I can prioritize different workflows, but it does require more significant investment. Of course, in architecture, we always want the balance. What are the tradeoffs? What’s the middle ground? We can find capabilities and functionality for developer-centered portals that are more balanced on standardization and flexibility. Whether you’re working in Backstage, Cortex, or Port, all of these are examples that have different levels of rigidity and opinionatedness across different dimensions of features that they have.
As an example, Backstage has long been reported as people getting into it accidentally in the sense that, I didn’t realize it was a framework to build a portal, not a portal to start from. At the same time, that gives me a lot of flexibility in the direction that we want to go. Something like Port as a fully SaaS offering is completely opposite side of Backstage that offers you a very high-fidelity way to get started fast, that gives you a very rigid UI, but a completely unopinionated data model that you get to design the schema for your particular SDLC workflow. There’s, of course, a ton of other options out there.
Quick, deep research on your favorite AI tool will give you a lot of information in this space. Harness is a great example of something that I’d suggest as a strategy. If you’re already in the Harness ecosystem, that makes a lot of sense. They have a hosted Backstage offering for you that gets you closer to a product, named as one of the Magic Quadrant leaders for DevOps in 2024 and 2025. There are lots of great portals out there that are platform agnostic, and others that aren’t, that you can take a look at. This is just a handful of those that are popping up on the market today. The reality is that central hubs for service metadata, the service catalog, exist in many different forms today, whether it be spreadsheets, a homegrown solution, or some combination of these. Moving to a more standardized, defined strategy allows you to start building self-service capability and scorecards on top of it.
I think it’s important to ask this question and spend a little bit of time on, who should be building the developer portal? Who owns it? Who’s putting together that Backstage implementation, that Cortex or Port implementation, whatever you’ve chosen? The reality is it could be many different types of teams inside your operation side or your platform engineering. In our particular case, we’ve set up and worked through the definition of developer productivity engineering as it relates to Gradle Enterprise or Develocity, focusing a lot on CI/CD capabilities and productivity.
That team for us now is also an owner of our internal developer portal where they can invest specifically in the ideas of how to make our engineers more productive. The only difference is we’ve drawn that line a little differently. They are product developers that exist still in the development and the product side that continue to work on a product for our product developers. That’s what I mean when I say the idea of empathy-driven development. They work still in a group of platform engineering teams forming a matrix of teams. We feel like this gives us a little bit of a secret sauce in order to appeal more directly and have that empathy out of the gate as we design portions of the developer portal for the developers themselves.
Team Topologies relates in terms of that workflow. You have your stream-aligned dev teams there, which are your product teams. You have your platform teams underneath that are building out your internal developer platform and different capabilities. We slightly augment that, though, and we change it so that the enabling team is actually connected more to the platform team and works more closely with them as it provides an abstracted capability and empathy for them. Of course, this brings a number of different benefits: it builds empathy, fosters collaboration, and really gives us the ability to encourage inner source in a much more interesting way with engineers that are focused on the ideas of inner source or on open communication and open collaboration.
I also want to ask the question about balancing content. This is something I often discuss with individuals. The idea is, where do I start? What is the first thing that I build? What should I include? What shouldn’t I include in a developer portal? I think this is helpful to recognize that an internal developer portal helps you manage tool sprawl, not eliminate it. Our goal here is not to sit down and build every single interface into our internal developer portal. You’ll never be on top of that. You’ll always be behind. Instead, we talk in a lot of cases about connecting data and context like we’ve shown so far. Then being able to link out directly to these tools that can provide more high-quality, high-fidelity details. Link to existing tools instead of replicating their UIs.
For us, an example of that, we have Kubernetes dashboards. We have different Sumo Logic queries. We have Grafana dashboards that are available, and Prometheus queries. Number one most difficult thing that our teams find is once they get to where they need to go, it’s easy. I have the information right there, but getting there is difficult. What is the right namespace I need to go to? Which region, which cluster does this thing even exist in? Those are the difficult questions to answer. If I can just start from the devportal, quickly find my service, and I can click on a link that takes me right to the Kube dashboard, I can see information very quickly there. Same is true with our logs. What are the source categories that these logs belong in? How do I find them? These are the types of toil that we experience, especially during incident resolution, that can be taken away by driving context like this. You want to prioritize the integration of what I call owned data sources.
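A minimal sketch of this "link out, don't rebuild" idea: given a service's catalog metadata, generate deep links straight to the right dashboard and log search. The hostnames and URL shapes are placeholders invented for this example, and the metadata keys are assumptions; only the `_sourceCategory` field is a real Sumo Logic search term.

```python
from urllib.parse import quote

# Hypothetical URL templates; real tools will use different hosts and paths.
LINK_TEMPLATES = {
    "kubernetes": "https://kube.example.internal/{cluster}/namespaces/{namespace}",
    "logs": "https://logs.example.internal/search?q={query}",
}


def deep_links(service: dict) -> dict:
    """Build tool links from catalog metadata so nobody has to hunt for them."""
    # Percent-encode every value so slashes and '=' survive inside the URL.
    query = quote(f"_sourceCategory={service['source_category']}", safe="")
    return {
        "kubernetes": LINK_TEMPLATES["kubernetes"].format(
            cluster=quote(service["cluster"], safe=""),
            namespace=quote(service["namespace"], safe=""),
        ),
        "logs": LINK_TEMPLATES["logs"].format(query=query),
    }


# Illustrative catalog metadata for one service; all values are invented.
example_service = {
    "name": "invoice-api",
    "cluster": "us-east-1-prod",
    "namespace": "invoicing",
    "source_category": "prod/invoicing/invoice-api",
}

links = deep_links(example_service)
for tool, url in links.items():
    print(tool, url)
```

The portal stays small because it stores only the cluster, namespace, and source category per service, while the detailed views remain in the tools that already do them well.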
This idea that you likely have some software catalog started already, that you have a service ID perhaps that you relate to everything, allows you to drive essentially back the data integration into your developer portal. What I mean by that is for us, for example, our software catalog has a unique UUID version 4, that is created and has to exist inside the software catalog before you can deploy anything into AWS. If you go ahead and do it manually, within minutes it’ll be completely ripped out and destroyed. You have to have a valid idea of ownership and cost attribution before you’re allowed to do anything. That means that we can drive that context back to you automatically. Now in the developer portal, we’re able to automatically see what resources are tagged with your unique ID.
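The tagging relationship can be sketched in a few lines of Python: check that each resource carries a well-formed UUID version 4 that exists in the catalog, then group resources by that ID. The tag key `service_id`, the resource dictionaries, and the ARNs are all hypothetical; the talk only states that a catalog UUID v4 must exist before deployment and that resources are tagged with it.

```python
import uuid
from collections import defaultdict


def is_uuid4(value) -> bool:
    """True only for a canonical, lowercase-or-uppercase UUID version 4 string."""
    if not isinstance(value, str):
        return False
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value.lower()


def group_by_service(resources, catalog_ids, tag_key="service_id"):
    """Split resources into {service_id: [arn, ...]} and a list of unowned ARNs."""
    owned = defaultdict(list)
    unowned = []
    for res in resources:
        sid = res.get("tags", {}).get(tag_key)
        if is_uuid4(sid) and sid in catalog_ids:
            owned[sid].append(res["arn"])
        else:
            unowned.append(res["arn"])
    return dict(owned), unowned


# Illustrative IDs and resources; none of these exist anywhere.
INVOICE_API = "3f2b8c1e-5d4a-4e7b-9a1c-0f6d2e8b7c5a"
NOT_IN_CATALOG = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
catalog_ids = {INVOICE_API}

resources = [
    {"arn": "arn:example:bucket/invoices", "tags": {"service_id": INVOICE_API}},
    {"arn": "arn:example:queue/invoice-events", "tags": {"service_id": INVOICE_API}},
    {"arn": "arn:example:bucket/untagged"},
    {"arn": "arn:example:table/bad-tag", "tags": {"service_id": "not-a-uuid"}},
    {"arn": "arn:example:table/stale", "tags": {"service_id": NOT_IN_CATALOG}},
]

owned, unowned = group_by_service(resources, catalog_ids)
print(owned)
print(unowned)
```

The `unowned` list is the set an enforcement job would act on, and the `owned` mapping is what lets the portal show each team its own resources without any manual data entry.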
This type of owned data is something that is going to be necessary, especially for compliance and scale. It’s also incredibly helpful here because we can now draw those relationships directly for consumers. The opposite of that, of course, is unowned data sources that require extra effort before they even offer value to teams.
An example for us is, as we entered into this, we didn’t integrate LaunchDarkly or our feature flag mechanisms right away. The reason is they were largely the Wild West. We had not explored any governance, compliance, or cost attribution there. We couldn’t really relate a feature flag to anything in particular. We can import the data, but why would anybody look at it if not from a platform level? They wouldn’t need it at a service level. It doesn’t mean anything to them. We can’t connect those dots. To truly make it worthwhile, we had to drive ownership of those, and reconsider how we tagged them effectively and the workflows for using them.
Lastly, in terms of content, there are anti-patterns you might encounter. I’ve seen a number of teams move forward in using their internal developer portal as a reporting platform. While that is really tempting to do, it’s typically not the right level of transactionality. It’s typically not the right tool to drive that level of warehousing with. You still need that. It’s not to say that you don’t need that. This just may not be the tool that’s going to do that for you. Current state, minimal history, and some high-level trending information are typically what you’ll find most valuable in your developer portal. Pilot, POC, and workshop ideas early as you consider what content is going to be useful to your engineers.
Architectural Considerations
Let’s explore some architecture a little bit. I want to call out a couple maybe not so obvious decisions that you’d have to make. First for us is number of environments. There’s definitely a temptation to say it’s a developer portal. It should be accumulating information from all of my environments, which is true. I don’t necessarily want to have two distinct developer portals. This one’s for dev. This one’s for prod. I need to bring my context together into one. In order to do that, you might be looking at restricting views of in-progress changes through permissions. Or you might want to actually set up two instances and ship data to both instances. The reality is we still need to roll out changes that we can test and evaluate. We need a subset of the data to do that. Just to call out to say, whatever your standardized approach to multiple environments is today, continue to do that as you consider your developer portal.
For us, volume was also a problem. As we had two instances, how do we handle a dev environment that doesn’t have all of the data in it? We need to test loading and evaluating it. You’re going to solve that no different than you solve some of your problems today around testing in lower environments that don’t have production-level data. You’re going to import the data back as you need. You’re going to use a subset of high-fidelity data. You’re going to curate custom data as needed for it.
I think more interesting is the aspect of infrastructure as code when it comes to developer portals. We’re using multiple environments. Even if you’re not, you might consider, do I need to roll out changes to my developer portal through infrastructure as code? Should everything be ephemeral and GitHub repository-driven in general? I think the answer is yes, maybe? Balancing risk with change frequency needs is really important. If we start to draw out the idea of high risk and low risk, and then low change frequency and high change frequency, we can plot certain aspects. Our models, our schema, are highly opinionated for us, and so they have a low change frequency. They don’t change that often once we have a good model in place that represents our schema.
Our integrations, they’re in a similar spot. We’re importing data that’s highly custom to our particular ecosystem. It doesn’t change that often. These are both high risk because if we do have to make a schema change and we have to nuke a bunch of the data that’s there, that’s going to be high impact. Self-service actions also exist on this list. Actions that we’ve created as contracts for executing ticket operations, for executing other aspects of the workflow. Scorecarding, which is all about best practice visibility to our teams: what they can see, what they can’t see.
Some things that are on the opposite end of the spectrum are catalogs and dashboards. Catalogs and dashboards are typically much higher change frequency and very low risk. These are types of things where a team needs a particular perspective or view of their services, and so we allow them to go in and just create private custom dashboards right away. Those are excluded from infrastructure as code. This is in some ways for us, not repeating mistakes of the past. In the past, we controlled all of our Grafana dashboard rollouts through infrastructure as code, which was, in our opinion, a big mistake. Especially during incident resolution, when you need to debug something and you have to roll out a pipeline through multiple environments to even see the dashboard change. Not ideal. Not the velocity that we want to instill. We’re slowing people down. This is all about calling out that there are important aspects of infrastructure as code, and there are other aspects that you should keep away from it.
In some cases, that’s not static. Sometimes how often things change, and how risky they are, is different. Sometimes our models, integrations, and scorecards are all in the sandbox account as we fiddle around and play, and try to understand the best way to visualize something and integrate with something. Here’s another impact that multiple environments have. Ultimately, those do need to get extracted and moved back before they move to our production environment.
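The risk versus change frequency quadrant described above can be sketched in a few lines of Python. This is only an illustration: the `PortalAspect` type, the `rollout_path` function, and the returned labels are hypothetical names, and the placement of each aspect is my reading of the talk rather than anything the portal itself enforces.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class PortalAspect:
    """One configurable part of the portal (hypothetical type for illustration)."""
    name: str
    risk: Level
    change_frequency: Level


def rollout_path(aspect: PortalAspect) -> str:
    """Suggest how changes to an aspect should be rolled out."""
    # High risk, rarely changing: worth the overhead of a reviewed pipeline.
    if aspect.risk is Level.HIGH and aspect.change_frequency is Level.LOW:
        return "infrastructure_as_code"
    # Low risk, frequently changing: let teams edit directly in the portal.
    if aspect.risk is Level.LOW and aspect.change_frequency is Level.HIGH:
        return "self_service"
    # The talk does not prescribe an answer for the remaining quadrants.
    return "judgment_call"


# Placement as described in the talk; the structure itself is an assumption.
ASPECTS = [
    PortalAspect("models", Level.HIGH, Level.LOW),
    PortalAspect("integrations", Level.HIGH, Level.LOW),
    PortalAspect("self_service_actions", Level.HIGH, Level.LOW),
    PortalAspect("scorecards", Level.HIGH, Level.LOW),
    PortalAspect("catalogs", Level.LOW, Level.HIGH),
    PortalAspect("dashboards", Level.LOW, Level.HIGH),
]

for aspect in ASPECTS:
    print(f"{aspect.name}: {rollout_path(aspect)}")
```

Because the placement is not static, an aspect that is still being explored in a sandbox could be given different levels until it settles.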
A call-out to data modeling. We want to balance scalability and maintainability on the models that represent our SDLC. Remember, we get the advantage now of actually modeling out through standard approaches and visualization of what our SDLC looks like. When have you ever gotten that ability to do that before? This is exciting. For us, as an example, we start to model out. We have a portal where we’re creating our model in. We have all sorts of different data sources that we want to import from GitHub. GitHub, as an example, for us, is our SAST offering where we do static code analysis through CodeQL and GitHub Advanced Security. We can import different types of functionality through that, whether it be secret scanning alerts, code scanning alerts, Dependabot alerts.
All of those, for us, map to a single data model internally. That’s the idea, again, of abstracting away the technology-specific schemas. We wouldn’t want to see, for example, in this case, a GitHub Dependabot alert schema, that’s not going to be the abstraction that we want to build over time as we introduce new tooling, as we introduce new functionality. In this particular case, you can see in our portal, it visualizes as a single data model that you can have a type on or an enumeration of Dependabot versus code scanning. You can group the data and use it in interesting ways. It does mean that you’re playing that tradeoff, just like you do in any data model that you build when you think about how generic something is.
In this case, the severity type, you can see in the top, is going to be set to none. It’s hardcoded, literally, because that is not a concept inside of a Dependabot mapping, whereas it is inside of code scanning. This is a call-out for you to consider, what is your generic model? Your model is not just your listing of tools that exists. For us, another example would be a repository. We’re going to start importing a bunch of GitLab data as well from different acquisitions, and those are going to map internally and centrally to a single code repository concept that now provides singular context across our teams. This balance is not easy sometimes, but I think this is the ultimate necessity in building and abstracting away your SDLC that you get to model.
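A minimal sketch of that single generic alert model might look like the following. Everything here is assumed for illustration: the `SecurityAlert` fields, the mapper function names, and the input keys (`id`, `repo`, `url`, `severity`) are simplified, hypothetical shapes, not the actual GitHub payloads or the portal's schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class AlertType(Enum):
    DEPENDABOT = "dependabot"
    CODE_SCANNING = "code_scanning"


@dataclass(frozen=True)
class SecurityAlert:
    """Tool-agnostic alert entity (hypothetical field names)."""
    identifier: str
    alert_type: AlertType
    repository: str
    severity: str  # "none" when the source mapping carries no severity
    url: str


def from_dependabot(raw: Mapping[str, Any]) -> SecurityAlert:
    """Map a simplified, hypothetical Dependabot record to the generic model."""
    return SecurityAlert(
        identifier=f"dependabot-{raw['id']}",
        alert_type=AlertType.DEPENDABOT,
        repository=raw["repo"],
        severity="none",  # hardcoded in this mapping, as described in the talk
        url=raw["url"],
    )


def from_code_scanning(raw: Mapping[str, Any]) -> SecurityAlert:
    """Map a simplified, hypothetical code scanning record to the generic model."""
    return SecurityAlert(
        identifier=f"code-scanning-{raw['id']}",
        alert_type=AlertType.CODE_SCANNING,
        repository=raw["repo"],
        severity=raw.get("severity", "none"),
        url=raw["url"],
    )


alerts = [
    from_dependabot({"id": 1, "repo": "orders-api", "url": "https://example.invalid/1"}),
    from_code_scanning(
        {"id": 7, "repo": "orders-api", "severity": "high", "url": "https://example.invalid/7"}
    ),
]

# One model means one place to group and count, regardless of the source tool.
print(Counter(alert.alert_type.value for alert in alerts))
```

The tradeoff mentioned above shows up directly: the generic `severity` field exists for every alert, even where one source mapping has nothing meaningful to put in it.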
Adoption and Impact
I want to explore adoption and impact, because this was not necessarily a smooth road for us. We definitely had some successes in our creation of our developer portal. First of all, one of our primary goals, getting started out of the box, was to bring in all of our foundational data so that teams could understand what we call the full service context. The full service context is all about a one-shot view on a screen where I know the types of resources a service has. I can see that it’s in AWS as opposed to Azure. I can see that it has these elements or properties, and here are the pipelines that were created for it. They’re in Azure DevOps, not in GitHub Actions, as our technology evolves. It’s that one-screen view where I can tell a lot about what I’m working with, whether for an SRE looking at it for the first time or for a new member of the team that owns it.
On average, we had a 65% reduction in time spent gathering service context. You can see that was relatively different across engineers that have been there less than a year versus those that are 5 years or more. This information was gathered through qualitative metrics, which, again, maybe offers some more differences in how you approach it. For example, if there’s a small difference in the statistics between before and after, and it’s qualitative, we can’t necessarily agree that that was significant enough.
In this particular case, you can see the data points were significant enough that we were pretty happy with the results. Seventy-six percent reduction in time spent locating team entities. There’s a lot of opportunities to combine, I think, qualitative and quantitative data if you’re new to the productivity space. There’s a lot of guidance coming out of getdx.com that I encourage you to check out. A lot of different ways and approaches that you can gather some of this study information and combine them and look at industry benchmarks together with that.
I think one of the more interesting things we did, which was maybe unique or novel, was running very specific time studies. A time study means looking at an individual and actually watching them through something that they recorded themselves doing, and understanding in more detail, where did you go wrong? What was the toil you experienced? Or was everything absolutely fine? We did this time study to identify toil, but also to identify some of the missing best practices. As an example, just to give it more context, we did a time study on individuals in the organization creating a new repository in GitHub. A lot of the time, those individuals started out saying, this is easy, I’ll go to GitHub and I’ll click new repository, give it a name and create, and I’m done. Why are you doing a time study on this? The reality is that our engineers thought they were done at that point, when they weren’t.
Some of the toil they encountered was they created those GitHub repositories as public repositories instead of private repositories, and made them fully open source, visible to the world. Their understanding and interpretation of public when they saw it was that that’s public to my organization. All of these little things are small examples of where it can be difficult to drive the masses of your organization down certain compliance.
Another example that we discovered there is, we do have expectations around certain features that are enabled in a GitHub repository around GitHub Advanced Security, around rulesets and branch protection, and just about none of those that completed the time study were able to create repositories with complete compliance to our best practices. This was found to be true across some of the other resources like pipelines and stuff that we created as well. The other interesting things we found were not just missing best practices, but the amount of toil that our teams would spend in looking for documentation about how to do the thing before they even did the thing was enormous. They spent a great deal of time. Even a self-service action with a centralized catalog, even a catalog that says, here’s the links to all the normal things you need to do, would in and of itself be massively helpful for these individuals.
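The expectations the time study checked could be expressed as a small scorecard-style rule set. This is a sketch under assumptions: the `Repository` record and its field names are hypothetical, and the three rules are only the ones named in the talk (private visibility, GitHub Advanced Security, rulesets or branch protection), not a complete list of the organization's best practices.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Repository:
    """Hypothetical portal record for a repository."""
    name: str
    visibility: str  # e.g. "private" or "public"
    advanced_security_enabled: bool
    branch_protection_enabled: bool


def compliance_gaps(repo: Repository) -> List[str]:
    """Return the best practices this repository does not yet meet."""
    gaps = []
    if repo.visibility != "private":
        gaps.append("repository is not private")
    if not repo.advanced_security_enabled:
        gaps.append("GitHub Advanced Security is not enabled")
    if not repo.branch_protection_enabled:
        gaps.append("rulesets or branch protection are not configured")
    return gaps


# A repository created by clicking through the defaults and choosing "public".
hand_made = Repository("hello-world", "public", False, False)
for gap in compliance_gaps(hand_made):
    print(f"{hand_made.name}: {gap}")
```

A self-service action that creates the repository can apply these settings up front, so a check like this becomes a safety net rather than the primary control.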
However, we did at this time experience some interesting changes. This is our developer experience rating, which I think is insignificant because it changed from 7.4 to 7.6 out of 10 as part of the index we were compiling. During that study, we found out that our NPS, our Net Promoter Score, was actually minus 7. For those familiar with NPS, that’s not a great score.
In fact, that’s awful: it means we had more detractors than promoters. A Net Promoter Score is a customer experience metric that measures customer loyalty and satisfaction. In the stats, 25% of our engineers were saying, “Yes, this is good. I would recommend this to other people”. Forty-three percent were neutral. The remaining 32% were detractors, telling other people, “I do not want you to use this. This is not a good system to use. It does not provide productivity benefits”. They were actually involved in saying, “Please don’t make me use this. You should not use this portal”. What happened? Why did we have these detractors if we had done what we thought was right from the beginning, when we had done our research? There are a couple of experiments that didn’t quite pan out, or theories that we had that didn’t quite pan out.
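For anyone less familiar with the metric, the arithmetic behind that minus 7 is simply the percentage of promoters minus the percentage of detractors, with neutral responses ignored. A short sketch using the figures quoted above (the function name is just for illustration):

```python
def net_promoter_score(promoters_pct: float, detractors_pct: float) -> float:
    """NPS is the percentage of promoters minus the percentage of detractors."""
    return promoters_pct - detractors_pct


# Figures quoted in the talk: 25% promoters, 43% neutral, 32% detractors.
promoters, neutral, detractors = 25, 43, 32
assert promoters + neutral + detractors == 100

print(net_promoter_score(promoters, detractors))  # -7
```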
The first being daily use. We expected that engineers would come in here, they’d love this thing, they’d use it daily. In reality, what they were mostly doing was on-demand usage of it. I need to go there for the self-service catalog to complete the action that I need. Don’t force me in there, I don’t need to be in there, but when I have it, it’s great. That changed the views that we created for people. Instead of coming in and having a full plan my day view and an experience that expected them to be there all the time, in reality, they needed to go there for certain actions and approaches. We could tailor the experience more towards that. We also expected teams to love the real-time functionality: accumulating all the real-time events that came through the event stream and putting them into the devportal made it really visible when a new vulnerability came in, when a new PR was ready.
What we found is that our teams were so overloaded with other functions and things that they had to do in their daily workflow that they needed some of that stuff to be ticketed. “It’s just too much. I can’t see all this stuff at once. I can’t action all this stuff at once. I can’t fix stuff. I can’t act upon stuff in real time”. That changed our mentality and our approach to it as well. Of course, we felt like the devportal offered a significantly better experience to work more efficiently for all of our engineers. What we missed on is that it takes time to develop that habit, that workflow. It takes time to learn that skill set.
Much like all of us in AI today are trying to figure out how much time do we invest in AI versus just jump in and do the thing ourselves. There’s an aspect of learning and experimentation that we have to go through in order to change, one by one, our developer workflows and our approach to it. Our developers weren’t necessarily given enough time to experiment and see how they can do things more efficiently because that, in and of itself, takes time, time they didn’t have.
Of course, we entered it thinking there are some pretty static personas. We have an executive persona. We have some dashboards for them and some capabilities. We have a manager persona and certain views that they can see their team by. We have a developer persona and different functions and features that were specific to them. What we found very quickly is that there are many different developer personas. There’s not one way that a developer works. There are many. We need to offer them more customization for that and also offer them more options out of the box on their different developer workflows. Each developer we looked at wanted to see something different. With that in mind, we made some changes. We moved past some of these now. Just an example of before and after and current state.
We had 20-plus individual and disconnected developer interfaces that you had to interact with to deliver something to production. Now we have a single interface. Mind you, most of it is read-only context, but it allows you to link out to most of the other tools and start to use compelling self-service actions. We had scattered repeatable processes requiring tickets across many Jira backlogs. As you recall, it took about a day before we saw action from certain teams. Now it’s part of the centralized catalog. The most important part is the standardized inputs. When I get a form with something to fill out, it’s very obvious, you have to select one of these options. It’s not a free-form description field in a Jira ticket. Of course, I get a level of trust. It’s a maintained contract. There’s a form. Someone has explicitly said these are your options for it, which gives me a level of confidence in usage of it.
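To make the contrast with a free-form ticket concrete, here is a minimal sketch of a self-service action treated as a contract with enumerated inputs. The `ActionContract` type, the `create_repository` action, and its fields and options are all hypothetical; the language options only echo the guilds mentioned later in the Q&A.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple


@dataclass(frozen=True)
class ActionContract:
    """A self-service action whose inputs are fixed sets of options."""
    name: str
    choices: Mapping[str, Tuple[str, ...]]  # field name -> allowed options


def validate_request(contract: ActionContract, inputs: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the request is valid."""
    errors = []
    for field, allowed in contract.choices.items():
        value = inputs.get(field)
        if value is None:
            errors.append(f"{field}: missing, choose one of {list(allowed)}")
        elif value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {list(allowed)}")
    # Anything outside the contract is rejected rather than silently ignored.
    for field in inputs:
        if field not in contract.choices:
            errors.append(f"{field}: not part of the contract")
    return errors


# Hypothetical action definition, for illustration only.
CREATE_REPOSITORY = ActionContract(
    name="create_repository",
    choices={
        "language": ("python", "java", "go", "dotnet"),
        "visibility": ("private", "internal"),
    },
)

print(validate_request(CREATE_REPOSITORY, {"language": "go", "visibility": "private"}))
print(validate_request(CREATE_REPOSITORY, {"language": "cobol"}))
```

The first request prints an empty list; the second reports both the unsupported language and the missing visibility, which is the kind of feedback a free-form description field cannot give.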
Of course, as we roll out new features and functionality from a platform engineering perspective, that information would typically be buried in release notes and different developer documentation. Now that information is available in real time. I can see it on my stuff, because it highlights on my scorecard that maybe I’m not following the S3 bucket best practices set out by the organization. That’s made highly visible to me. Of course, one of my favorites is the time to launch a boilerplate API. For us, this meant standing everything up from scratch: going from no code to a fully deployed hello world API. On our platform, it used to take a week, which was just way too long. That wasn’t the developer working for a week. That was the elapsed time by the time they submitted tickets, got results from those tickets, got new namespaces, got product accounts, got the thing deployed, and went through the toil of iterating on their pipeline until it worked.
Eventually, that changed. Now it takes a total of 13 minutes for them to execute on, which has massively changed our executives’ perception of innovation. When they can quickly say, let’s launch the new API, and it just says hello world. Now I can start focusing on what matters with the business logic and the specifics of that API that I want to build out. The actual value that we’re exploring. Not focusing a week on how to put together the components before I even start on that.
Key Takeaways
A couple takeaways and thoughts for you, if you’re going to go down the path of developer portal, maybe you’re already proceeding down there. Treat your portal like a product. Internal doesn’t mean low priority. You need to apply product thinking here too. Talk to your users, your developers. Iterate, surveys, feedback. Prioritize usability. A well-loved portal evolves just like any successful external app. Build for change, not just for today. Tech stacks shift. Teams grow, and priorities evolve.
Your developer portal is intended to be the stable part in that regard: it’s going to be the foundation, so that as you swap out tooling under the covers, this is the pillar that gives some level of continuity to your teams. Of course, make it a team sport. A portal isn’t just an infrastructure project. It’s a collaboration and contract between all of the parties involved to build something better. I really like this quote from Katie Wilde: “The future of developer experience is self-service. A great internal portal is how you stop interrupting your teams and start empowering them”.
Questions and Answers
Participant 1: When you said time to launch a boilerplate API, I don’t really understand what you meant by that. What kind of API? Was this something to be used internally or something for your customers? Was it a first stab at a product? I don’t understand what you were trying to accomplish there.
Gosselin: Boilerplate API, what I meant by that was the barebone skeleton of a seed project. What I would call a hello world API that has no resources but the single resource that is hello in it. You can think of it basically as a microservice, a tiny service example that has all the flowing bits fully released into our integration environment that is fully functional. That includes for us a number of resources that our teams were ticketing in the past.
Participant 2: What does it look like for your platform engineering teams to own and maintain those boilerplates and the infrastructure to get them deployed over time? Because you got to figure out what you need to build and then keep refiguring that out over time as the landscape changes.
Gosselin: The idea behind that and the approach that we’ve taken is actually that our platform engineering teams aren’t necessarily driving the higher-level seed application. We actually drive those from an inner source perspective. We connect platform engineering with the inner source community through a series of guilds that are focused on individual languages.
As an example, it would be very difficult for a single team to maintain that boilerplate API for Python, for Java, for Go, for .NET. We drive that by connecting them to those inner source teams that then in their spare time, in their extra cycles, are saying, this is our opinionated approach from a Java guild perspective of what a good Java API has, including the shared content libraries we use, the GitHub creation of it, all of those participants in it. That’s also the benefit is that, over time, if you’re starting from that same place, a lot of our teams in the past were building boilerplate APIs by copying from the last time they did it, which has now changed. Having a central seed application per language, in this case language specific, that our platform engineering teams can collaborate on was our approach today. Just to clarify, from a broken window problem, who’s really responsible? That does require some responsibility and championship internally. We do have language and guild specific champions that connect those dots or fill that gap.
Participant 3: This seems like a really comprehensive developer platform. How many engineers did it take to create this, and over what period of time?
Gosselin: I’ll call it the internal developer platform itself. For us, we created our internal developer platform before we had a developer portal. A lot of our teams were building under the covers, using APIs to call things, and so we separate that a little bit from the portal that I showed you today and how to accomplish that. If you build a comprehensive platform under the covers, it’s very easy to tack the portal on top of that. For us in terms of size, we’re talking about probably a team of about 40 over the course of 2 years in order to evolve it, and that’s not just working on the platform, that’s all aspects surrounding platform engineering in our community.
Participant 4: If you treat this platform as a product, do you have a product manager, or who makes the decisions, who prioritizes these backlog items or creates a roadmap?
Gosselin: We have an internal technical product owner whose sole job is to focus on the developer platform as a whole, both including the backend bits as well as everything we talked about in the portal today. That’s really important because it does drive the level of consistency as a vertical through it all. We add a new feature, it doesn’t just materialize in the backend, it materializes in the product as a whole.