Transcript
Zimmerman: My name is Jake. For the better part of the last decade, we’ve worked on Stripe’s Ruby infrastructure team, where we’ve spent quite a bit of time dealing with a large, stubborn Ruby codebase. One of the things that we’ve learned in that time is that stubborn codebases rack up all sorts of complaints about them as time goes on. For example, people will say things like, our code isn’t modular enough, because it’s hard to untangle one piece without it seeming like you have to untangle everything at once. They’ll say things like, this dependency is 10 years out of date, because in the 10 years since that dependency was introduced, people have started to depend on every imaginable implementation detail of it, making it hard to upgrade.
Another one is, we need to change how we talk to the database. Maybe that’s because you need to retroactively shard the data somehow. Maybe that’s because you want to swap the database out to make it faster. Whatever it is, one of the most pervasive assumptions in a stubborn codebase is that there’s only ever one way to talk to the database. Now I could keep going on here for hours, really, just talking about complaints that people have about these large, stubborn codebases, but I’m sure that you’ve got your own complaints coming to mind right now. The thing is, on our team, we’re pretty optimistic. We believe that it’s basically always possible to refactor the codebase in such a way that addresses these complaints.
Specifically, we believe that it’s almost always possible to have one team drive that migration, do this refactor to the better codebase. We’re going to take this as a given, that it’s better to have one team centralize the migration, but just as an alternative, what you might’ve done instead is have some declaration that says, this is the ideal end state of some refactor, and then split it up and have individual teams go figure out how they’re going to get from where we are now to where we need to be.
Again, we’re going to focus on the sorts of things that make these centralized migrations better, but just for sake of completeness, the reason why we operate this way is for these three points. The first one is that doing a centralized refactor or a centralized migration will concentrate expertise. Having this expertise concentrated means that if you had 10 teams go out and try and solve some problem, they’re going to encounter the same problem for the first time 10 times, and they’re going to have to come up with some novel solution for it. If you centralize the refactor, by the second or third time that the one team driving this migration encounters the problem, they’re going to be really good at just figuring out how to address it.
Then specifically with all that expertise concentrated among the team, it’s going to make automation much easier. We really want to incentivize automating this refactor because it’ll mean that fewer engineering hours overall are spent on this migration, and so the organization as a whole will get more stuff done.
Then, finally, having one team drive a migration means it’s just way more likely to finish at all. If you spread the burden of this refactor across many different teams, you’re going to have to now deal with many different teams figuring out when or even whether to prioritize doing this work. Since the whole reason why we wanted to do this migration in the first place was probably to unblock something that was pretty important, we’re going to have a lot of time where we’re just sitting around waiting for work to finish. We may as well centralize that migration and go do the work, instead of just sitting around idle.
What we really want to talk about here are the things that make these centralized migrations successful, and we’ve boiled it down to two points. The first thing that a centralized migration needs to be successful is leverage over the codebase. Leverage is like this buzzword these days; people say it so often that we forget what it means, but it’s really just about using some small force to have a huge impact on a system. Without leverage over a codebase, there’s no way that a small team can take on this huge refactor and carry it out to completion.
The second thing that a centralized migration needs is some way to ratchet incremental progress. A centralized migration is going to take fewer engineering hours overall, but it’s going to be spread across a longer time horizon. While that refactor is happening, there’s going to be plenty of chances for our work to just accidentally be undone unless there’s some mechanism in place to prevent backsliding. Specifically, it’s not enough to just have some way to ratchet incremental progress, it has to be a good ratchet. We’re going to talk a lot about what makes a good ratchet later on. To successfully refactor a large, stubborn codebase, you need to have a point of leverage and you need to pick good ratchets.
Outline
The rest of the talk, we’re just going to drive this point home by way of introducing two of the large refactors that the Ruby infrastructure team at Stripe has run over the years. I’m going to start with a discussion of how we refactored Stripe’s codebase to improve developer happiness using Sorbet. Getty is going to chat about how we are currently refactoring Stripe’s Ruby monolith to make it more modular. After that, we’re going to take a little bit of a step back and talk about some of the high-level lessons that we’ve learned throughout this process.
Improving Developer Satisfaction with Sorbet
Before we get too far into specifics, I want to introduce Sorbet. Sorbet is a fast, powerful type checker that we built at Stripe designed for Ruby. It’s completely open source. It’s got all sorts of docs if you want to try it out in your own Ruby codebase. It’s got three headline features. The first is, it’s fast. To our knowledge, Stripe’s Ruby codebase is the largest Ruby codebase in the world, which meant that we basically had no choice except to make our type checker fast enough to handle these huge codebases. The second is you can use it in your IDE. Not only will it show you the list of errors, but it’ll give you autocompletion results, jump to definition, find all references, quick fix code actions, basically anything that you expect out of a good IDE. While sometimes there’s skepticism about introducing static typing in certain organizations, basically everyone likes fast, powerful editor tooling. What we see is that having this IDE integration is a critical part of pitching Sorbet to get adopted in these new Ruby codebases.
Then the third one is that Sorbet is gradual. It’s designed to be able to take a completely untyped Ruby codebase and add Sorbet to it piece by piece. It’ll add value even if you only use it a little bit, and it’ll get much more valuable the more that you lean into it. We’re not just some type system enthusiasts who build type systems for the fun of it. We built Sorbet because in 2017, Stripe’s developers were really unhappy. We knew they were unhappy because every six months at Stripe, we run a survey and we just ask them, what are you happy with about development at Stripe, and what are you unhappy with? On this iteration of the survey, people were saying things like, it’s too hard to understand the code, because given the sheer volume of it, there was always going to be some corner of the codebase that was completely foreign to you.
They would say things like, it’s too slow to run all the tests, because over the years we had accumulated so many tests trying to make sure that Stripe’s payments APIs are doing the right thing and moving money around correctly that it was no longer possible to run all of these tests locally. You had to push your change to CI and wait 5 or 10 minutes for this huge distributed cloud of CI runners to give you back feedback about your tests.
Then, somewhat paradoxically, despite the fact that we had so many tests, people were still finding that there were too many production problems sneaking through the cracks. This caused people to lose faith in the documentation, because they were making these changes by following the docs. Then people overall would notice that there’s just too much low-quality or haphazard code, which fed this vicious cycle of code that was hard to understand.
Again, we’re pretty optimistic. We believe that taking this list of complaints, we can go ahead and figure out how to refactor the codebase to some better state. Step one of that has to be, we need to introduce a point of leverage. We built Sorbet to introduce this point of leverage. Building Sorbet represented a pretty small effort, and we’re going to talk a little bit about how small, but it let us have a huge impact on the codebase. This becomes clear by going through this list of problems that people were reporting. Where before people said that the code is too hard to understand, now they have this IDE that enables them to navigate through the codebase faster and more precisely than they ever could before, and build up this understanding. Where before people would say that it’s too slow to wait for tests, now they had this type checker that they could run in seconds, which would show them a list of errors well before they ever pushed to CI. This type checker eliminated an entire class of production problems.
For example, a runtime exception in production saying that some class name doesn’t exist is a problem that simply doesn’t happen in Stripe’s codebase anymore. Certain other classes of errors, like you typoed the name of a method or you passed the wrong types, maybe haven’t been completely eliminated but are substantially more rare. The type annotations that people are adding then serve as machine-checked documentation, meaning that these annotations are incredibly trustworthy. That’s especially so because these type annotations are actually also checked at runtime.
If you open up a file and you see a type signature on a method, there is every reason to believe that that annotation is correct, which is not the same as what you can say about nearby documentation comments. Then somewhat more subtly, Sorbet actually set this baseline for code quality. Specifically, if it’s hard or annoying to write a type annotation for a method, chances are that’s because the method itself is complicated and poor quality. Finding a way to add a type annotation to a method often means just simplifying the code outright. I think it’s pretty clear that Sorbet had a pretty large impact on the codebase, but I’m sure you’re still wondering, was it actually this small effort? Was it this leveraged effort?
A brief history of Sorbet. We started working on it back in the fall of 2017, at a time when there were a couple hundred engineers at Stripe. It took about nine months to build Sorbet, and then another three months to get 75% of Stripe’s Ruby files opted into type checking. That’s probably less time than you might have thought. If you had asked me in 2017 how long it takes to build a type checker, I would probably have quoted you a number in years, not months. This is a whole other rant, but in general, one thing I’ve learned is you’re probably overestimating how hard it is to build new language-level static analysis tooling.
At the end of the day, type checkers are still just programs. If you’re here, it’s because you’re pretty good at writing programs. You can get pretty far with a prototype that’s just a lint rule plus some hacked together supporting code. If you need more power, chances are your compiler toolchain probably has a way to write a plugin that gives you access to some compiler internals that get the job done. Even if you have to build something yourself, there’s tons of high-quality libraries these days for getting something working.
All you need to do is whip something up quickly, prove out the idea, and make it better later. This is just how we build software these days. It doesn’t go out the window when the program that we’re trying to build is a type checker. Few people know this, but the first version of Sorbet actually started as the guts of a toy Scala compiler, ripped out and replaced with code to deal with Ruby. That was enough to convince us that the direction could work, and the rest is history. Basically, building Sorbet represented a comparatively small amount of effort that let us have a huge impact on the codebase.
Now let’s switch over and talk more about that second point that we need to carry out a successful migration, which is ratcheting. In Sorbet, the way ratcheting works is that there’s a typed: comment that you put at the top of every file specifying what we call the typed level. At typed: false, Sorbet only validates syntax and checks that the constants resolve. When you kick it up to typed: true, Sorbet will also run type inference in method bodies.
Then, at typed: strict, Sorbet will also say that every method needs an explicit type annotation, even if that annotation just says nothing more than, this method is effectively untyped. This typed level comment behaves like a ratchet because it’s easy to go up a level. All you have to do is just change the comment, fix the errors, commit the change, and you’re set. It’s hard to backslide against this progress. At Stripe, that friction comes from code review. If you try and make a PR that drops the typed level, your reviewer is just going to ask, why are you doing it this way? You could have fixed the type errors. You could even have silenced individual type errors rather than downgrading that entire file. This is the way that we make progress and lock it in over time.
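As a rough sketch, here’s what those typed sigils look like in practice (the class and method here are our own toy example, not code from the talk):

```ruby
# typed: strict
# ^ The `typed:` comment at the top of each file is the ratchet:
#   typed: false  - Sorbet only checks syntax and constant resolution
#   typed: true   - Sorbet also runs type inference in method bodies
#   typed: strict - Sorbet also requires a signature on every method
require 'sorbet-runtime'

class Greeter
  extend T::Sig

  # At `typed: strict`, this sig is required. Even a sig of all T.untyped
  # would satisfy the requirement, which is how a method can remain
  # effectively untyped while the file stays at strict.
  sig { params(name: String).returns(String) }
  def greet(name)
    "Hello, #{name}!"
  end
end
```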
Thanks to the power of iframes and WebAssembly, I can actually give you a live demo here. Here we’ve got a Ruby file. It’s pretty simple. It’s just trying to open a file, write a line to it, and then close it, and print an error message if something goes wrong. The first step of adopting Sorbet is to set typed: false at the top of the file. Once we’ve done this, the first thing that we see is this error that Sorbet reports, an IoError. IoError doesn’t exist; we’ve typoed it. We needed to capitalize the O. That’s the first step.
As soon as we do that, Sorbet will just tell us what the problems are. In general, this class of problem was actually super common when we were rolling out Sorbet. What we notice here is that this IOError is in a rescue handler, which is basically the way that Ruby does try-catch exception handling. What it looks like this piece of code is trying to do is swallow the error: it doesn’t explicitly re-raise that exception.
This is the super-common pattern that we noticed: the happy path works totally fine, but there’s this ticking time bomb waiting in the shadows for the first time this code breaks. Instead of silencing the error, what’s actually going to happen is the Ruby VM is going to say, I don’t know what constant this is, and raise a NameError exception that’s not caught, which crashes the entire process. That’s the kind of value we saw, and it was especially common when we were rolling out Sorbet. This is pretty easy to fix. We can just go over here, and Sorbet in the error message actually tells us that there is a fix available.
If we put our cursor there and we accept the code action, we can replace with IOError, and the error is fixed. That’s the whole point of this ratchet is that fixing it is pretty easy. It’s basically easy to turn the crank in the right direction. Once we’ve fixed all of these errors, we can lock in incremental progress with this typed: false level. We’re not done yet. Let’s go ahead and keep ratcheting that up. If we go up to typed: true, we see another error. This time Sorbet says that the log method does not exist, because in Ruby, the way that you print a line is you use the word puts.
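Here’s a minimal sketch of what that demo file might have looked like after both fixes (the contents are our reconstruction, not the exact demo code):

```ruby
# typed: true

begin
  f = File.open("log.txt", "w")
  f.puts("hello")      # the demo's second error: the file originally called a
                       # nonexistent method like `log`, flagged at typed: true
  f.close
rescue IOError => e    # the demo's first error: this originally read `IoError`,
                       # a typo flagged at typed: false. Ruby resolves rescue
                       # constants lazily, so the happy path worked fine, but the
                       # first real IO failure would have raised an uncaught
                       # NameError and crashed the process
  puts "Something went wrong: #{e.message}"
end
```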
That notion of like which errors are reported at which level gets at the heart of the issue of what makes a good ratchet. Because we said earlier, it’s not enough to just have a ratchet, it has to be a good ratchet. Our claim is that Sorbet’s typed: comments are a good ratchet because of three reasons. They are local, they are incremental, and they are actionable. To see what we mean by these points, let’s consider some alternative ways that we could have ratcheted Sorbet’s progress.
Instead of going by file with this typed: comment, we could have gone by folder. This would not have been local enough. It would have been really hard to confirm, when you’re looking at a given piece of code, whether Sorbet applies to it or not. You’d have to traipse up through the directory hierarchy to figure out whether there’s a config file in an enclosing directory that enables Sorbet, and to what degree. That’s just way more work than scrolling up to the top of the file and checking this comment.
The second thing is this would not have been incremental enough. It’s a huge lift to have to go add signatures, for example, to every method in an entire folder versus just adding signatures to the file that you already happen to have open from doing your normal work. Another alternative might have been to have some typed coverage percentage. Maybe like 60 percent of the lines in the entire codebase are covered by the type checker, and then couple this with some mechanism that says that that typed coverage percentage can only ever go up. This ratchet is really hard to action. The first reason is that just deleting typed code makes that metric go down. Ideally, you never want to penalize deleting code because no one wants to maintain code, and so the more that you can delete, the better.
Another problem is that if you need to call a method that’s owned by some other team, it’s pretty punishing to have to go add types to their method just before you can even call it in your own codebase. Because of this, we think that these coverage percentage-based ratchets are really hard to action.
I want to drill into what makes an actionable ratchet because it’s basically where the heart of the issue lies. We’re saying that these actionable ratchets are high signal, low noise, that they’re really stopping you from doing the things that you absolutely shouldn’t be doing. When they do trigger and they do tell you that you’re not allowed to do something, it should be within your power to fix it. Earlier we saw an example of this where certain classes of errors were reported only in certain levels, and that becomes a little bit easier to see with a different example here. Here we’ve actually started at typed: true, and we see two errors. The first is that this UnknownParent constant fails to resolve.
Then, also, this method_on_parent method doesn’t exist. When you’re adopting Sorbet, you wouldn’t jump straight to typed: true. You would start actually by going to typed: false. Here we see that Sorbet is saying this one error is the most important error for you to fix right now, and it’s that constant resolution error. We can go ahead and fix that. If we change it to just KnownParent, again, at this point, because this ratchet is incremental, we could have just locked in the progress right here, but let’s keep going. Now we’ll finally kick it up to typed: true. What we see is that the error actually changed.
Instead of saying that this method_on_parent doesn’t exist, Sorbet gives us a better error, which is that it does exist, but you haven’t given it enough arguments. That’s a much higher-signal error message. You don’t want to have to figure out which of these errors are spurious just because you hadn’t fixed the root-cause errors yet. By choosing our typed levels to explicitly tease these things apart, we make the error messages in the ratchet much easier to action. To recap, the typed: comment is local, because all you have to do is check the top of the current file. Incremental, because all you have to do is think about the current file that you have open, and lock in your progress after fixing that one file.
Then actionable, because the problems that you’re going to encounter when upgrading a file are high signal and within your power to fix. To sum that up, when we rolled out Sorbet, developer satisfaction improved because we refactored this large, stubborn codebase by building Sorbet to be a point of leverage and introducing these typed: comment ratchets that form this good ratchet.
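Here’s a sketch of that second demo, with hypothetical class names reconstructed from the talk:

```ruby
# typed: true
require 'sorbet-runtime'

class KnownParent
  extend T::Sig

  sig { params(x: Integer).void }
  def method_on_parent(x); end
end

# At typed: false, Sorbet first flags the root cause: the file originally
# inherited from a typoed `UnknownParent`, an unresolved constant.
class Child < KnownParent
  extend T::Sig

  sig { void }
  def call_it
    # Once the constant resolves and we move up to typed: true, the error
    # improves: originally this call was missing its required argument, and
    # Sorbet said so, rather than claiming the method doesn't exist.
    method_on_parent(42)
  end
end
```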
Making a Ruby Monolith More Modular
Getty is going to talk about a lot of the same ideas, but in the context of a different refactor that we’re working on to make Stripe’s Ruby monolith more modular. While some of the ideas are going to be the same, a lot of the ideas will be way more subtle and non-obvious, and he’s going to dig into all the reasons why that is.
Ritter: Modularity is something that we talk about, but I wanted to take a step back and talk about, if we’re going to be doing a codebase-wide refactor, why do we want to? Codebase-wide refactors are big. We want to make sure that they’re motivated by explicit problems that we can solve. To do this, I’m going to dig into a little bit of a toy example. This is an example of a logger written in Ruby. If you don’t know Ruby, you just need to follow the broad strokes and I’ll point things out. This logger implementation takes a message that is going to appear in the log, and a structured payload of key-value pairs, so that you can log structured information alongside the message. It can be used like this. We can see here we’re calling logger with the string,
Attempting operation, and giving it a couple key-value pairs that we’re going to put in the logs. We can also see in this example a problem with our naive little logger. Here we have a merchant, which at Stripe is the model that we use to represent a user or business that is taking payments. In the logline that I have in the comment, we can see that there’s some secret field included in there. Our naive logger is going to put personally identifying information in the logs that we don’t want to have there. A well-intentioned engineer might see this and fix it in this relatively naive and straightforward way. The way that we’ve changed the code here is this line. We check to see whether the thing that we’re logging is a merchant.
If it is, we can print something special that redacts a lot of fields. Now we no longer have the secret information in the logs. This does indeed fix that problem, but it causes a subtler, more pernicious long-term problem, which is a dependency cycle. Now, because we have explicitly mentioned the merchant class in the context of our logger, anything that uses the logger is going to necessarily also use the implementation of merchant. For some services, maybe that’s ok. Maybe they’re already using both. Presumably, though, we’re using the logger in other services, say our CI runner, which doesn’t need to know anything about the merchant class that takes payments. Yet our CI runner, by virtue of now using the logger, has to additionally pull in the implementation of merchant and presumably all of its transitive dependencies. This is just one cycle, introduced by fixing one real problem in the simplest possible way.
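A sketch of the toy logger after that well-intentioned fix (the Merchant stub and its redaction method are hypothetical, standing in for Stripe’s real model):

```ruby
require 'json'

# Stub standing in for Stripe's real Merchant model (hypothetical shape).
class Merchant
  def to_redacted_h
    { id: '[redacted]' }
  end
end

def logger(message, payload = {})
  redacted = payload.transform_values do |value|
    # This innocent-looking check is the bad edge: the logger now mentions
    # Merchant, so anything that loads the logger (say, a CI runner) must
    # also load Merchant and all of its transitive dependencies.
    value.is_a?(Merchant) ? value.to_redacted_h : value
  end
  puts("#{message} #{redacted.to_json}")
end

logger("Attempting operation", attempt: 1, merchant: Merchant.new)
# => Attempting operation {"attempt":1,"merchant":{"id":"[redacted]"}}
```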
Over time, codebases can accumulate dozens, hundreds, thousands, tens of thousands of these kinds of bad dependency edges that produce cycles. Why are these a problem? When we have tangled code like this, it becomes more difficult to debug. Our logger now depends on the merchant. Does it depend on an implementation detail? Not that much in that example, but do we know that for sure? Is it possible that when we’re debugging a problem with logger, we need to dig into its transitive dependencies? This is also a problem with testing. Now, all of a sudden, we need to test more than we did before. We might need to look deeper. We might need to rerun those tests more. This can also be a problem with the things that we’re deploying. We’re using Ruby, which is a dynamic language, which means when we do a deploy, we take the source code and we move it to the machines that are going to run it.
You might also be using a compiled language, where you give the code to a compiler, but now you’re giving more code to the compiler and shipping a bigger artifact. Because you’re loading up more code, there’s more code either in the binary or in the resulting artifact. There’s now more code at runtime, which means you’ve got more memory loaded. That can in turn increase latency. That can increase garbage collection time, all sorts of things. Ultimately, this is going to cause problems both for developer velocity and productivity, and for runtime performance. We want to fix this. We want to get to the point where we have more modularity and fewer of these dependency cycles. We’ve already talked about our thesis, which is that if you have a point of leverage and you have a ratchet, you can do it.
Step one is, come up with our point of leverage. The way that we did this at Stripe is we introduced a packaging system. This is an example of packaging configuration for the two examples that we gave before. We have a package for the logger, and we can see that it imports merchant. We have a package for the merchant, we can see that it imports the logger. Our packaging system is actually part of Sorbet. We extended Sorbet to have a notion of packages, which map onto Ruby classes and modules, and have to import the other classes and modules that they use.
If they don’t, it is a Sorbet error, which is a static error visible in both the editor and in CI. This really helps. Going through our codebase and adding packages everywhere gives us visibility into what is defined and what uses what, in a much more coarse way that is easier to get a handle on than doing it at a per-file level. This isn’t quite enough, though. We’ve exposed that this cycle exists. We haven’t really exposed the problem with it.
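Here’s roughly what those package files might look like, using the __package.rb convention from Sorbet’s open-source packager (the package names are our invention):

```ruby
# logger/__package.rb
class Project::Logger < PackageSpec
  import Project::Merchant  # the edge our naive redaction fix introduced
end

# merchant/__package.rb
class Project::Merchant < PackageSpec
  import Project::Logger    # merchant logs things, so it imports logger: a cycle
end
```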
When we built the packaging system, we added something else, which we call layering. Layering is not a new idea. I have a quote up here from Eric Evans’ book, “Domain-Driven Design”, which I believe was published in 2003, and it doesn’t originate with Evans either. I just particularly like his treatment. The way that he describes it is that the essential principle of layering is that any element of a layer depends only on the other elements in the same layer or on elements of the layer beneath it. Communication upwards must pass through some indirect mechanism. That’s great and abstract. What does that look like in practice for us? At Stripe, we group our codebase into five layers. Every package in the codebase is tagged with one of these five layers. Going from lowest to highest, we start at the utility layer, which defines utility libraries, logging, HTTP interfaces, database interfaces.
Above that, we have the power layer, which wraps up external APIs that we use, other partners that we need to talk to in a way that is not specific to Stripe’s business ideas. Above that, we have the business layer. This is where most models and logic live, things that we serialize to the database, and those can make use of the power layer to do particular things. Above that, we have the API layer, which suddenly becomes aware of things like serialization and deserialization from web wire formats, endpoints, and so forth, and can implement those by reaching into the business layer and using that.
At the top, we have services, which assembles the things built at the API layer into actual deployable services complete with middlewares and so forth. With these under our belt, we can now explain exactly the problem with that dependency cycle we saw before. Logger shouldn’t be depending on merchant because merchant’s at the business layer, and logger’s at the utility layer. This edge here is what we call a layering violation. Translating this back into our packaging system, we can now point to the exact line that’s the problem. Notice that we’ve added the layer tag to each of these packages, and we can also point to that particular line and say, logger shouldn’t be importing merchant. That’s a layering violation.
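Translating that into the sketch from before, with a layer tag on each package (a sketch; the exact spelling of the layer tag may differ from what Stripe uses internally):

```ruby
# logger/__package.rb
class Project::Logger < PackageSpec
  layer 'utility'
  import Project::Merchant  # layering violation: utility importing business
end

# merchant/__package.rb
class Project::Merchant < PackageSpec
  layer 'business'
  import Project::Logger    # fine by layering: business may depend on utility
end
```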
Now we’ve got this leverage: we’ve got this packaging system, and we’ve got these notions of layers all throughout the codebase. Now we need to actually do our refactor, and that’s where the ratchet comes in. The ratchet that we came up with is called strict_dependencies, and this is a different tag that we also put in the packages. I’m going to give some image examples, because this involves graphs of dependencies, which are a little bit hard to visualize. Hopefully, this should help you understand what’s going on. At the first level, strict_dependencies is set to false. When we rolled it out, everything had strict_dependencies false.
At this point, a package can import whatever it wants. In this case, we’re looking at MyPkg, the only one here with a name and an emoji. This has a couple of imports. One of those imports is up to the API layer. Another one is down to the power layer. This is fine; strict_dependencies is false. We have not started refactoring this package. If we want to move up one level, we get layered. If a package is tagged with strict_dependencies layered, then it is a problem, both in the editor and in CI, when it imports a package from a higher layer. That means that, as stated right here, MyPkg can’t actually be layered, because it has a dependency on something at the API layer. If we were to refactor it, say using something like inversion of control, so that that API package can call into MyPkg instead of the other way around, we can tag it as strict_dependencies layered, and now CI passes.
If we want to then improve it a little bit more, the next level up we call layered_dag, DAG for Directed Acyclic Graph. At this point, we say, not only must it only have imports to the same or lower layers, but the packages within that layer must not have any cycles. Again here, we see that MyPkg can’t be immediately made layered_dag because there is a cycle among the packages in the business layer that it relies on. If we refactor this in some way, split some code, now we can tag it with this, and again, now it’ll pass CI. Notice that here at the power layer, at the lower layer that it uses, there’s still a cycle, but we’ve said we don’t care about it for this one. This is a particularly local refactor.
Then, finally, we go up one more level in the ratchet and we get to strict_dependencies dag, which says, do the whole thing: there can be no cycles anywhere in the transitive dependencies. Now we have made the cycle at the power layer a problem, and that has to be refactored before we can finally lock it in and say, we have completely removed all cycles from our dependencies.
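Putting that progression in package-file form (a sketch; the level names come from the talk, the package itself is hypothetical):

```ruby
# my_pkg/__package.rb
class Project::MyPkg < PackageSpec
  layer 'business'

  # The ratchet, from loosest to strictest:
  #   'false'       - import anything; refactoring hasn't started
  #   'layered'     - no imports from higher layers
  #   'layered_dag' - additionally, no cycles among packages within this layer
  #   'dag'         - no cycles anywhere in the transitive dependency graph
  strict_dependencies 'layered_dag'

  import Project::Logger  # utility layer: allowed at every level above false
end
```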
This is the ratchet that we came up with, strict_dependencies. Every package in the codebase started with one of these tags. Effectively, all of them at the beginning had to start as false because we had a really nasty tangled codebase. Over time, we have used this ratchet to figure out how to push forward and gradually remove bad dependencies from our entire codebase. This is a process that’s still ongoing. This is not a retrospective of what we’ve done. We’re in the process of doing this. When we started, near 100% of our codebase was in one massive cycle. From any package, you could get to almost any other package in the codebase by following a chain of imports. We refer to this as the ball of mud.
At this point, less than 10% of our Ruby codebase is in this state. Most of it has gotten to the point that we have factored it out, and we have improved the modularity across the board. To revisit both of these ratchets, we said that good ratchets are local, incremental, and actionable. In this case, we have a slightly different definition of local: because of the complicated, graph-based nature of dependencies, we decided that it was better for the ratchet to be package-local than file-local. They’re incremental because you can go up a little bit and fix some of the problems without fixing all of them, and the problems that you’re fixing are the most egregious ones at each point. They’re actionable because when you want to move up, you can find exactly the problems, chase those down, and then move to the next level.
Lessons Learned
All of this is a really great theoretical approach. We’ve talked about, yes, you just need a ratchet, you just need leverage, it’s great. As the old quote goes, “In theory, there’s no difference between theory and practice, but in practice there is”. Obviously, this was not quite as clean as we like to pretend it was. We had to work at it. We want to take a moment to talk about some of the things that were maybe not as clean, things that might come up if you’re trying to follow this to do your own codebase-wide refactor.
First of all, there’s a bunch of things that are important to have in addition to the leverage and the ratchet. First of all, and we talked about this in both cases, it’s important to have a reason. If you’re going through and doing all of this work to refactor your codebase, you don’t want to do it just because modularity is good. Modularity is good, but it’s not good because it’s like an inherent good. It’s good because there are practical things that you get out of making it modular. Similarly, static types are good, but they’re good because of the observability and safety and so forth. They’re not good because static types are inherently good. Another thing that’s really important is comprehensive documentation.
One of the things that’s most relevant here is, ratchets are going to stop people from doing a thing that they want to do. Ratchets are going to tell them, don’t write that code, it’s poorly typed. Don’t add that import, it’s a bad import. It’s really bad for developer UX if they get this and they don’t know why they got it or what to do about it. Another thing that really helps in that case is really good targeted tooling: tooling that can help people understand why they’re being blocked and how to get unblocked. Finally, organizational support is completely vital here. You’re going in and changing other people’s code. Other teams are going to see your PRs go by and say, is this really necessary? You’re doing all of these refactors, and so making sure that they are on board with your eventual goal is vital.
Another thing that’s worth noting is, we presented all of these as though they leapt, fully formed, from the ether. We just had the idea and the ratchet was perfect. That’s not the case. It takes time to iterate on the right thing. A good example here is, our layers didn’t actually start as the five-layer scheme. We had as few as three and as many as six. Over time, we figured out, do we need more layers? Do we need fewer layers? It is a function of how close we are to that ideal end state, and how much that ideal end state reflects what we want our codebase to look like. I don’t know for sure that we’ve arrived at the right one. The one that we’ve gotten to has worked really well for us, but it’s probably not perfect.
Packaging had a couple of other false starts as well. When we originally started, packaging was actually a runtime feature, and that was really interesting to us, but the problem was that it was really risky to deploy. All of a sudden, if our packager had a bug in it, you might not have the correct package at runtime, and that was just not actually giving us the value that we wanted. Similarly, we originally wanted to have exports, that is, individual namespaces would only expose certain Ruby constants and not all of them. That ended up really hurting developer velocity, because people were used to a codebase where anything could freely import anything, and so trying to do more granular exports created friction without solving any of the problems that we were ultimately trying to solve.
An important corollary to this is, if you’re not sure about it, don’t rush it. In both cases, we actually took quite a while before this ratchet was directly in front of engineers. For Sorbet, we actually ran it in quiet mode for a fair bit. We would have it as part of CI, but developers wouldn’t be blocked by type errors directly. Instead, those would go straight to the Sorbet team, who would double-check to make sure they were the right errors, that they were expressed in a comprehensible way, that there weren’t bugs, and so forth. For packaging, we originally started with one service, our CI service, as a test bed to make sure that packaging was both reasonable to work with and also captured the invariants that we wanted to capture, before we put it in front of every engineer’s workflow.
Another thing that’s probably worth noting is, we said before that one of the things that keeps these ratchets going is code review. That’s true insofar as you can trust that code review is going to catch every bad thing and never let one through. We all know that’s not true. Not necessarily because code reviewers are slacking; sometimes the PRs are just too big. One of the things that we’ve done in both of these migrations is have a two-level ratchet, that is to say, a ratchet on the ratchet. In Sorbet, for example, we actually have a package-level ratchet that tells you the minimum amount of Sorbet typedness you can have in your package. This is a little bit of a lie; we actually have two of those, because we tend to hold test code to slightly looser guidelines.
For packages, we actually have a global ratchet. We have a mechanism where, once a package has achieved a strict_dependencies level, we add it to a global list we call try not to regress. If you then push a change to CI that regresses that package’s strict_dependencies level, this list will give you an error. Importantly, anyone can remove something from this list. This list is shared, and so people can say, yes, we got to this strict_dependencies level, but it turns out I do need this bad dependency, I need to call into a legacy API that we haven’t refactored, and that’s fine. After a longer period of time, we then move it onto another list called never regress. This list can also be removed from, but only with our team’s authority. Now this is locked down more harshly; we can still remove things, but we can put our expertise in to decide whether it is ok to remove them.
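A hypothetical sketch of how a CI check over those two lists might work; the two-list mechanism comes from the talk, but the file layout, names, and helper below are entirely our invention:

```ruby
# A hypothetical ratchet-on-the-ratchet check, run in CI against each PR.
LEVELS = ['false', 'layered', 'layered_dag', 'dag'].freeze

TRY_NOT_TO_REGRESS = ['Project::Logger'].freeze    # anyone may remove an entry
NEVER_REGRESS      = ['Project::Merchant'].freeze  # removal needs the infra team

def check_ratchet!(package, old_level, new_level)
  # Only complain when the PR moves a package to a weaker level.
  return if LEVELS.index(new_level) >= LEVELS.index(old_level)

  if NEVER_REGRESS.include?(package)
    raise "#{package} is on the never-regress list; ask the infra team to remove it"
  elsif TRY_NOT_TO_REGRESS.include?(package)
    raise "#{package} is on the try-not-to-regress list; remove the entry " \
          "in the same PR if this backslide is intentional"
  end
end

check_ratchet!('Project::Logger', 'layered_dag', 'layered')
# => raises: Project::Logger is on the try-not-to-regress list; ...
```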
Finally, we talked about tooling; let me give some ideas about what that looks like. When we were rolling out Sorbet and gradually doing this codebase-wide migration, we had a lot of metrics and dashboards powering it. Something really cool that we had is we actually tracked not just which files are, say, at false and at true, but also which files could be made true but haven’t been yet. Among other things, we wanted to know that before moving the ratchet: we don’t want to move the ratchet up too quickly and impede people’s developer velocity if we don’t think it’s going to be useful. We also have a lot of autocorrects. This is something that helps with the actionability of these ratchets. All of a sudden, these ratchets have autocorrects that you can apply very quickly and automatically fix certain classes of problems that we understand.
Similarly, editor integration, which we talked up earlier, is completely invaluable, because it makes things more comprehensible, more actionable, and more immediate. For packaging, we have a number of similar tools. We have a tool called gen-packages, which automatically fixes up all imports, all exports, everything. This is really vital, because this is where we need to put our expertise into making these imports and exports comprehensible. Sometimes you’ll get a strict_dependencies problem that’ll involve a cycle of 50 packages. Making sure that the error messages point to the right thing in a tool like gen-packages is incredibly important. We have visualization tools. We have a great tool, and this is one of the things that I alluded to before, called Dependency Doctor, where when someone comes across a strict dependency violation, Dependency Doctor steps in and tells them, here is how to fix it.
Then, of course, because this is built on Sorbet, we also have editor integration to make it immediate. All of that is to say, there was some minutiae at the end, but the core idea here is that all of these large, stubborn codebases, these codebases that have nasty legacy problems owing to accretion over time, can pretty much always be addressed by picking your points of leverage, building them out if you need to, and picking good ratchets that are local, incremental, and actionable. Then just doing the work, and having patience. It takes time, but it can make what seem like completely intractable problems solvable by a small team in a finite length of time.
Questions and Answers
Participant 1: You mentioned the fact that there is one dedicated team to accomplish this task. From a code versioning perspective, where does it happen? Are you branching the codebase to work on it, and merging periodically with the ratchets, so that the rest of the teams are aware of the changes while you carry out the actual work, or are you doing it collaboratively and then fixing? How does it cope with evolution of the codebase, which happens in the meantime?
Ritter: There’s a little bit of a mixture here. A lot of times, we have a team, and that team’s job is to go through and do these refactors. This team will occasionally pull people in from the teams that own the specific area. They will have a particular focus area. In the case of modularity, we go on a per-service basis. We will say, ok, we have this giant codebase, but there’s individual service entry points. We can look at those entry points and say, what do we need to do to get here? That will give us a list of things. We can use some graph algorithms to figure out which edges are the most important to cut, and where to go to. Find the teams that own those, and consult with them. Occasionally, they do want to be involved. Occasionally, they don’t. We rely on them to actually do the ultimate approvals.
The practice of working on code at Stripe is built around developer velocity. We’re not working on long-running branches. We tend to be working on small, incremental refactors. A lot of times, what we’ll be doing is, as I showed, identifying the imports that are bad, and going one at a time. We’ll fix this import. We’ll fix that import. All of these are small enough. That means generally, nobody else is really egregiously blocked by this. There’s still a lot of velocity going on elsewhere in the codebase. There’s just these couple of individual things. By doing it in this way, by exposing it in this incremental way, we generally don’t sacrifice developer velocity elsewhere. There are a couple of exceptions to that. I’m not going to claim that it is completely roses all the time. Sometimes you do a refactor, and it turns out that that refactor didn’t take into account that a team is just about to ship a new feature, and that new feature is going to require a bad edge that you didn’t know about, and now you’ve slowed them down.
The other thing is that we also don’t mind backsliding occasionally. We will pull the ratchets back down. It’s just, we don’t want that to be the norm. Again, by having a centralized team, we also have the ability to reason about that. Like, is it ok to allow this backsliding or not? Modularity in particular is important because, like we mentioned, it actually reduces memory usage a fair bit, like a pretty egregious amount. Sometimes, having done this refactoring, having gotten a service to strict_dependencies, means that we will be able to run it on smaller instances, on smaller hardware.
That means backsliding in that case is not just like, this is bad because modularity is worse. It means that you could actually start running out of memory on your instances. You would need to pay more to get larger cloud instances or so forth. Having that team in charge, they’re able to make that call and say, this backsliding is acceptable because we were only halfway through refactoring this to begin with. That backsliding is not acceptable because we now rely on the fact that this is detangled in order to have various runtime properties, and backsliding in that is not going to work.
Participant 2: I must admit, the idea of one single team doing the complete refactoring was completely new for me, but we’re here to learn things. How many teams were working on the Ruby code?
Zimmerman: There’s still one team working on Ruby code.
Participant 2: One team?
Ritter: No, like teams across Stripe. I believe it’s in the hundreds.
Zimmerman: Let’s say if there’s a couple thousand people at the company then there’s maybe like half as many engineers, and then there’s maybe half of that working on Ruby. It’s probably still in the thousands.
Participant 2: The team that does the refactoring, how big was the team and how long did it take them?
Ritter: We introduced strict_dependencies as a notion in 2020. We got to the point that the entire codebase was packaged by the end of the same year. We started doing detangling as an effort in early 2021, because at the time strict_dependencies was still a little bit of a hypothetical pipe dream. At this point, we have made a significant amount of progress, but it’s been three years. It’s been a core team of about half a dozen people, with different individuals cycling on and off. Probably about six people. There are places where we could have parallelized, but there are also places where the things that we needed to fix were so cross-cutting that there was going to be a little bit of a chokepoint anyway, because, like, we need to fix the API libraries that are used in every API. Having about six people has actually been, I think, pretty great for that. It will occasionally get bigger. Like we said, we’ll get people from the other teams occasionally to tag in and help out.
Zimmerman: Importantly, throughout this whole process, it’s just like you wake up one morning on a product team and suddenly your service uses seven times less memory. In the meantime, they have been just like completely ignorant of it, working on their own features, shipping their own product deadlines. Then suddenly they wake up and it’s like, ok, magically like my codebase is 10 times better.
Ritter: About six people, for about three years. I don’t know that it’s necessarily important to get to the point that 100% of the codebase is perfect. This is in service of a goal, so there are particular services that we want to focus on, and so forth.
Participant 3: I was curious about the team that takes care of the refactoring and adapting: how close to the developers of the tooling is it useful for them to be? Can they be just practitioners, or do they need to have a special relationship with the people that built the tooling?
Ritter: I think they generally can be practitioners. It does help to have a relationship with the people who built the tooling.
Zimmerman: There’s a couple examples here. The first migration that we did when we were rolling out Sorbet was to first get all the files to typed: false, and then get a bunch of files to typed: true, and we kind of left it there. We didn’t really do a big push to get as many files as we could to typed: strict, where every method in the file has a type annotation.
At that point, it was actually people on another team, one that was working specifically on the entire payments codebase. They were like, in our codebase, we really need as much type coverage as possible. They just found the tools that we had built in service of rolling things out in the past, applied them to ratchet even further, and did not really need much knowledge except how the type system works. It’s pretty similar with the modularity tooling as well, because the whole point of having automated this is that you distill that knowledge into the tool itself. Then, once it’s built, you can have people discover it on their own.
Ritter: In both cases, there have been examples of teams who have just enthusiastically taken it up.