Transcript
Price: I’m going to be talking about moving your bugs forward in time. This is the topic that I’ve been thinking about on and off for many years. Before we get into the meat of the talk, how many folks are up to date on the Marvel Cinematic Universe? Those of you who are not, no problem. I’m going to start with a little story related to it. In the recent movies and TV shows, they’ve been building on this concept of the multiverse, where there are all these different parallel universes that have different timelines from one another. They differ by small details along the timelines. There’s one show in particular called Loki. He’s kind of a hero/villain. In his show, there are all these different timelines where the difference in each timeline is like Loki is slightly different in each one. In one of them, for example, Loki is an alligator, which, as you might imagine, leads to all sorts of shenanigans.
In that show, there’s this concept of the sacred timeline. There’s the one timeline that they’re trying to keep everything on line with. All of the other timelines somehow diverge into some weird apocalyptic situation, so they’re working to try to keep everything on this sacred timeline. I’m going to talk about what a bug might look like on this sacred timeline. We start off with, a developer commits a bug. We don’t like for this to happen, but it’s inevitable. It happens to all of us. What’s important is what happens after that. This is a little sample bit of toy Python code where we’ve got a function called divide_by_four. It takes in an argument.
Then it just returns that argument divided by four. Somewhere else in our codebase, some well-meaning developer creates a variable that is actually a string variable, and they try to call this function and pass it as the argument. What happens in Python is that’s a runtime error where you get this error that says, unsupported operand type for the divide by operator. On the sacred timeline, we’ve got something in our CI that catches this after that commit goes in. We may have some static analysis tool that we’re running. We may have test coverage that exercises that line of code. The important thing is our CI catches it, and that prevents us from shipping this bug to production. What happens next? The developer fixes the bug.
Then the CI passes, and they’re able to successfully ship that feature to prod. This is a pretty short, pretty simple to reason about timeline. The cost of that bug was basically like one engineer, like one hour of their time, something on that order. Fixing the bug probably actually took less than an hour, but dealing with monitoring the CI, monitoring the deployment, maybe it takes an hour of their time. Still not a catastrophic expense for our business.
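As a rough reconstruction of the toy bug described above (the slide itself is not in the transcript, so the variable name `user_input` and the calling code are assumptions), the failure only shows up when the call actually executes:

```python
def divide_by_four(value):
    # No type information: any argument is accepted until the division runs.
    return value / 4


# Hypothetical calling code somewhere else in the codebase.
user_input = "100"  # a string, not a number

try:
    divide_by_four(user_input)
    error_message = None
except TypeError as exc:
    # The mistake only surfaces when this line is actually executed.
    error_message = str(exc)

print(error_message)
```

Nothing here stops the bad call from being committed; only a test or a static check that reaches this line would catch it before production.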
Now we’re going to look at that bug on an alternate timeline. We’ll refer to this as the alligator Loki timeline. In this timeline, the developer commits the bug. For whatever reason, we’re not running the static analysis tool or we don’t have test coverage, and the bug does not get caught by our CI. Then we have our continuous delivery pipeline. It goes ahead and deploys this bug to production. Say we’re a regional service that deploys to regions one at a time to reduce blast radius, so we deploy this bug to us-west-1. Then, for whatever reason, this code path where the bug exists isn’t something that gets frequently exercised by all of our users, so we don’t notice it. Maybe a day passes. The bug ends up getting deployed to us-east. Some more time passes, and we still haven’t noticed there’s a bug. It deploys to Europe. Some more time passes.
Then alligator Loki eats the original developer, or maybe something more realistic happens, like they transfer to another team, or get a promotion, or whatever. That developer is not around anymore. Our bug keeps going through the pipeline and deploys to Asia. Now we have this big problem. It turns out that in that region, there is a customer who uses that code path that we didn’t catch in the earlier regions when we deployed. Now we’ve gotten this alert from this very important customer that they’re experiencing an outage and we’ve got to do something about it. Our operator gets paged. The manager gets paged because the operator doesn’t immediately know what’s going on. Maybe some more engineers get added to a call to try to address the situation. They start going through the version control history to figure out where this bug might have come in. They identify the bad commit, so now they know where the bug came in, but this has been days.
Several other commits have probably come in since then, and now they have to spend some time thinking about whether or not it’s safe to roll back to the commit prior to that, or whether that’s going to just cause more problems. They spend some time talking about that, decide if the rollback is safe. Then, they decide it’s safe. They do the rollback in that one region, and then they confirm that that customer’s impact was remediated. That’s great. We’re taking a step in the right direction. Now we have to deal with all those other regions that we rolled it out to. Got to do rollbacks in those as well. Depending on how automated our situation is, that may be a lot of work. Then this could just keep going for a long time, but I’m going to stop here.
When we think about the cost of this bug, compared to the one on the sacred timeline, the first and most important thing is there was a visible customer outage. Depending on how big your company is and how important that customer was, that can be a catastrophic impact for your business. We also spent time and money on the on-call being engaged, the manager being engaged, additional engineers being engaged, and executing all these rollbacks, however much time that ended up taking. Then we have to re-engineer the feature. Now we have to assign somebody new to go figure out what that original developer was trying to achieve, redo the work in a safe way, and get it fixed. They’ve also got to make sure that whatever other commits got rolled back in that process get reintroduced safely as well.
Then the one that we don’t talk about enough is opportunity cost. Every person who was involved in this event could have spent that time on something else that was more valuable to your business, working on other features, whatever it may be. When we compare the cost of these two timelines, the first timeline looks so quaint in comparison. It looks so simple. The cost was really not that big of a deal. On the second timeline, it bubbled into this big giant mess that sucked up a whole bunch of people’s time and potentially cost us a customer. The cost is just wildly different between the two. We want to really avoid this alligator timeline. What’s the difference between those two timelines? The main difference is that in the sacred timeline, we caught that bug at build time. In the alligator timeline, we caught it at runtime. That subtle difference is the key branching factor that ends up determining where you end up between these two scenarios.
Background
That’s what my talk’s going to be about, when I say moving your bugs forward in time. I’m talking about moving them from runtime to build time. Thankfully, I think that a lot of modern programming languages have been building more features in to the language to help make sure that you can catch these bugs earlier. That’s what I want to talk about. My name is Chris Price. I am a software engineering manager/software engineer at Momento. We’re a serverless caching and messaging company. Previous to that, I worked at AWS with a lot of other folks that are at Momento now. I worked on video streaming services and some of us worked at DynamoDB. Before that, I worked at Puppet doing infrastructure as code.
Maintainability
Zooming out before I get into the weeds: this phenomenon I’m talking about, moving bugs from runtime to build time, is really a subset of maintainability. As I’ve progressed through my career as a software engineer, I’ve found more and more that maintainability is one of the most important things that you can strive for, and one of the most important skills that you can have as a software engineer.
When I first got started straight out of college, I thought that the only important thing about my job was how quickly I could produce code, how fast can I get a feature out the door, how many features can I ship and how quickly. As I got more experience in the industry and worked on larger codebases with more diverse teammates, what I realized is that that’s not really the most important skill for a software engineer. It’s way more important to think about what your code can do tomorrow and how easy it’s going to be for your teammates and your future teammates that you haven’t even met yet to be able to understand and modify and have confidence in their changes that they’re making to your code. That’s going to be the central theme of this talk.
Content, and Language Trends
These are the six specific language features that I’m going to dive into. First, we’ll talk about static types and null safety. Then we’re going to talk about immutable data and persistent collections. Then we’ll wrap up by talking about errors as return types, and exhaustive pattern matching. Here are some of the languages that have influenced the points that I’ll be making in this talk. I spent a lot of time working in Clojure a while back, and that is where I really got a strong sense of how valuable it is to use immutable data structures, and how much that improves the maintainability of your code. Rust is one of the places where I really got used to writing a lot of pattern matching statements. Go is the first language that I worked in that really espoused this pattern of treating errors as return types rather than exceptions.
Then, Kotlin is a language that I really love because I feel like it takes a lot of these ideas that come from some of these more functional programming languages, and it makes them really approachable and really accessible in a language that runs on the JVM. You can adopt it in your Java codebase without boiling the ocean. You can ease your way into some of these patterns without having to switch out of a completely object-oriented mindset overnight. It’s a really awesome, approachable language. Two engineers have had a lot of influence on my thinking: Rich Hickey, the creator of Clojure, and Martin Odersky, the creator of Scala.
If you get a chance to watch any talks that these gentlemen have given in the past, I highly recommend them. They’re always really informative, and they’ve been really foundational for me. I also highly recommend if you can find a way to buy yourself some time to do a side project in a functional programming language. The time that I spent writing Clojure, I think, was more formative for me and improved my skills as an engineer more than any other time throughout my whole career, even though I haven’t written a line of Clojure code in quite some time now.
Static Types, Even in Dynamic Languages
We’ll start off with static types, and I’m saying even in dynamic languages. I realize that that may be controversial to some folks. We’re going to go back to this bug that we started off with on the sacred timeline and the alligator timeline, where we passed the wrong data type into this Python function. A lot of times when I try to talk to people about opting into static typing in some of these dynamic languages, I hear responses like this, “I can build things faster with dynamic types, and I can spend my time thinking about my business logic rather than having to battle with this complicated type system”. Or, another thing I hear is, “I can avoid those runtime type errors that you’re talking about as long as I have good test coverage that exercises all the code”. I used to believe these two things, and they’re still definitely very reasonable opinions to have, but I’ve drifted away from these.
Working at AWS was probably the place where I really started to drift away from these. Inside of AWS, there’s a lot of language and a lot of shared vocabulary that gets used to try to give people a shared context about how you’re thinking about your work. One of the ones that really stuck with me was this one, “Good intentions don’t work, mechanisms do”. This is a Jeff Bezos quote, but it’s really widely spread through a lot of AWS blogs and other literature. A mechanism here just means some kind of automated process that takes a little bit of the error-prone decision-making out of the hands of a human and makes sure that the thing just happens correctly. It takes away your reliance on the good intentions of engineers. Another key theme of this talk is that a lot of these types of bugs come into play when you have something in your codebase that relies on the good intentions of your engineers to stick to the best pattern.
We’ve got these beloved dynamic languages like Clojure, JavaScript, Ruby, Python. My claim is that if you opt into the static type systems in these languages, you just completely avoid shipping that class of bug to production, no ifs, ands, or buts about it. That particular bug that we started with on the sacred timeline and the alligator timeline, it just goes away. What’s really powerful about it is you’re taking away this reliance on the good intentions of your engineers. You may have a best practice established in your engineering org that whenever you’re using a dynamic language, you had better have thorough test coverage to prevent one of these kinds of bugs from going to production, but you’re relying on the engineers to adhere to that best practice.
Then you hire new people to your team and they don’t know the best practices yet and they’re prone to making mistakes sometimes. Putting that power in the hands of the compiler instead of the humans just eliminates that class of bugs. That doesn’t mean that we have to abandon our favorite dynamic languages. Pretty much all of these languages have added opt-in tools that you can use to get static analysis and static typing. Python has mypy. For JavaScript, TypeScript has obviously become much more popular over the last five years or so. Ruby has a system called RBS. Clojure has several things including Typed Clojure. Whenever you opt into one of these, you can usually do it pretty gradually. You don’t have to boil the ocean with your codebase. It really just boils down to adding a few little type hints to the method signatures. That little action changes this bug from a runtime bug to a build time bug where mypy is going to catch this up front and say you can’t pass a string to this function. That allows us to avoid that alligator timeline.
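A minimal sketch of what those type hints could look like on the toy function, assuming it is meant to take a number (the annotation choice and the name `user_input` are assumptions, not taken from the slides):

```python
def divide_by_four(value: float) -> float:
    # The annotation tells a checker such as mypy what callers may pass.
    return value / 4


user_input: str = "100"

# A static checker is expected to reject the next line before the code
# ever runs, because a str is not a float. It is commented out so that
# this sketch still executes.
# divide_by_four(user_input)

# Converting explicitly satisfies the checker and makes the intent visible.
result = divide_by_four(float(user_input))
print(result)  # 25.0
```

The function body is unchanged; only the signature carries the extra information, which is why this kind of adoption can be done one function at a time.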
Null Safety
The second one I’ll talk about is null safety. You’ve probably all heard the phrase about this being the billion-dollar mistake in programming. If you’ve written any Java, you’re probably really familiar with this pattern where every time you write a new function, there are 15 lines of boilerplate checking all the arguments for nulls up front. Same thing in C#. These are again relying on good intentions. The first thing is you’re relying on your developers to remember to put all those null checks into place. Then, even worse, if they do put the null checks into place, it’s still a runtime error that’s getting thrown, so you’re still subject to the same kind of bugs that led us to the alligator timeline. A lot of the newer languages like Kotlin have started almost taking away support for assigning nulls to normally typed variables. In Kotlin, if you declare a variable as of type String, you just can’t assign a null to it. That won’t compile.
If you know that you need it to accept null, then you can put this special question mark operator on the type definition, and that allows you to assign a null to it. Now once you’ve done that, you can no longer call the normal string methods directly on that object. The compiler will fail right there. Instead, the compiler will enforce that you’ve either done an if-else and handled the null case, or you can use these special question mark operators to say that you’re willing to just tolerate passing the null along. In either case, the compiler has made you make an explicit decision upfront about what you’re going to do in case it’s null, rather than you finding out about this bug at runtime. Rust is another language where there is no null.
In Rust, the closest thing you have to null is this Option type. Any Option in Rust is either an instance of None or an instance of Some. This is similar to Optional in Java, but in Rust, it’s much more of a first-class concept. In this code here, you can see I declared this function called foo, and I said its argument is a string. I cannot call that function and pass a None in. That’s a compile time error. For bar, I said it’s an Option of string. I can pass a None in or I can pass a Some in, but again I’ve had to be explicit about it and make the decision upfront. As for compile time null safety, most languages have some support for this these days.
The languages that have been around the longest like Python, C#, Java, those languages have to deal with a lot of backward compatibility concerns. They can’t just flip a switch and adopt this behavior. In those languages, you’ll probably have to work a little bit harder to figure out how to configure your build tools to disallow nulls, but they all have some support for it. An experiment that I suggest is just writing an intentional bug where you pass a null to something that you know should not accept a null, and then play with your build tool configuration until it catches that at build time rather than allowing it to possibly happen at runtime.
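One way to try that experiment in Python is with `Optional` annotations and a checker such as mypy. The function names below are made up for illustration, and the intentional bug is left commented out so the sketch still runs:

```python
from typing import Optional


def shout(text: str) -> str:
    # Declared as plain str: a checker should reject shout(None).
    return text.upper()


def shout_or_default(text: Optional[str]) -> str:
    # Optional[str] admits None, so the None case has to be handled
    # before any str method can be called on the value.
    if text is None:
        return "(nothing)"
    return text.upper()


# The intentional bug from the experiment; a checker is expected to flag it.
# shout(None)

print(shout("hello"))          # HELLO
print(shout_or_default(None))  # (nothing)
print(shout_or_default("hi"))  # HI
```

The difference between the two signatures plays the same role as the question mark on a Kotlin type or `Option` in Rust: accepting "no value" becomes an explicit decision in the signature.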
Immutable Variables, and Classes
Now we’ll move on to number three, which is immutable variables and classes. There are very few things that I’ve worked with in my career that I feel improve the maintainability of my code as much as leaning into immutable variables as much as humanly possible. The main reason for this is that they dramatically reduce the amount of information that you need to keep in your brain when you’re reading a piece of code in order to reason about it and make assertions about it. As an example of that, here’s some Java code. I’ve got this function called doSomething that takes in a foo as an argument. Foo is just a regular POJO in this case. Then it calls doSomethingElse and passes that foo along. Now here’s some calling code that appears in some other file where I construct an instance of the foo, and then I pass it to that doSomething function. Then imagine we have maybe 100 lines of code right here or maybe even more than that.
Then we eventually get to this line of code where we print out foo. If I’m an engineer working on a feature in this codebase and the change that I want to make is somewhere around this line that’s doing the print statement, what can I assert about the state of my foo at this point in the code? Were there any statements in between those two that might have modified my foo? It’s certainly possible, so I’m going to have to read all that code to find out. Was my foo passed by reference to any functions that might’ve mutated it? Yes, it was passed to doSomething and then that passed it along to doSomethingElse. Does that mean that I need to go examine the source code of all of those functions in order to be able to reason about the state that this variable is going to be in when I get to this line of code? The answer to that is basically yes. Without knowing what’s happening in every one of those pieces of code, I have no idea whether this variable got mutated in between those two points in time.
Then the situation gets infinitely more difficult if you have concurrency in your program. If this doSomethingElse function is potentially passing that reference to some pool of background threads that may be doing work in the background, then you can imagine a scenario where I add another print line here, just two print lines in a row printing this variable out twice. I can’t even assert that it’s going to have the same value in those two print statements because some background thread might’ve changed it in between the two.
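The single-threaded version of this problem can be sketched in a few lines. This is a Python analogue of the Java example being described, not the code from the slides; the field names are assumptions:

```python
class Foo:
    """A plain mutable object, standing in for the POJO."""

    def __init__(self, my_string: str, my_int: int) -> None:
        self.my_string = my_string
        self.my_int = my_int


def do_something_else(foo: Foo) -> None:
    # A callee two levels down quietly changes the caller's object.
    foo.my_string = "changed far away"


def do_something(foo: Foo) -> None:
    do_something_else(foo)


foo = Foo("original", 42)
do_something(foo)
# ... imagine 100 more lines here ...
print(foo.my_string)  # changed far away
```

Nothing at the call site hints that `foo` will be different afterwards, which is why the reader has to open every function it was passed to.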
Again, I have to go read all of the code everywhere in my application to know what I can and can’t assume about this variable at this point in time. That just slows me down a lot. An alternate way to handle this with newer versions of Java is rather than foo being a POJO, we use this new keyword called record, which basically makes it a data class. It means that it’s going to have these two properties on it and they can’t change ever. It’s an immutable piece of data.
Then I also add this final keyword, which says that nobody can reassign this variable anywhere else in this scope here. With those two changes in place, I know that nobody can have reassigned my foo to a different foo object because that would have been a compile time error. I also know that nobody can have modified this inner property, this myString, because that also would be a compile error. I don’t care anymore that we passed a reference to this variable to the Bar.doSomething method, because no matter what it does, it can’t have modified my data. I don’t have to worry about that. Also, if there’s 100 lines of code here, I know that they can’t have modified it. I no longer have to spend any time thinking about the state that might have changed in between these lines of code.
When I get down here to this print statement, I know exactly what it’s going to print. That means I can just move on with my changes that I want to make to the code without getting distracted by having to page all of the rest of this application into my brain and think about it. Most languages have some support for this these days. Kotlin definitely has data classes. Clojure, everything’s immutable by default. Java has records and final. You can find this in pretty much any programming language. TypeScript and Rust, you have to roll your own a little bit, but it’s definitely possible to follow these patterns.
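A rough Python counterpart to the record-plus-final version, assuming a frozen dataclass and `typing.Final` as the closest equivalents (both are stand-ins chosen for this sketch; `Final` is only enforced by a type checker, while the frozen dataclass also refuses mutation at runtime):

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Final


@dataclass(frozen=True)
class Foo:
    my_string: str
    my_int: int


def do_something(foo: Foo) -> None:
    # A type checker is expected to flag this assignment. It is left in
    # to show that the frozen dataclass also rejects it at runtime.
    try:
        foo.my_string = "changed far away"
    except FrozenInstanceError:
        pass


foo: Final = Foo("original", 42)
do_something(foo)
# ... imagine 100 more lines here ...
print(foo.my_string)  # original
```

With both pieces in place, the print at the bottom can be reasoned about from the constructor call alone.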
Persistent Collections, and Immutable Collections
That leads us into a related but slightly different topic, which is about collections. I also want my collections to be immutable for the same reasons, but that’s a little bit harder. You can see this line of code here where I’m constructing a Java ArrayList. I’m using this final keyword because I want this to be immutable. I want to be able to make those assumptions about the state of my list without having to spend a bunch of time reading my other code. The problem is this ArrayList provides these mutation functions, the .add, .remove, whatever else. I’m right back in the world where I was before, where these other functions that I’m calling, these other lines of code that might happen here, they can mutate that list in any number of ways.
Again, I cannot make any mental assertions about what this list has in it by the time I get to this point in my code. Recent versions of Java have added some stuff, like this new List.of factory function that actually does produce an immutable list, which is what I want. Now I don’t have to worry about the fact that I’ve called doSomething because I know that this list is immutable. Again, by the time I get to this print statement, I know what my list has in it. The flaw with that is you’ll notice this List.of factory function is still returning the normal List interface.
That List interface provides these mutating functions like add, remove, whatever else. Even worse, if I call those now, it’s a runtime error. The compiler won’t detect that this is a problem, but the program will throw an error at runtime. Now I’m back to the world of relying on good intentions. Now I’ve got this immutable list, which is what I wanted, but if I’m passing it around to all these other functions and only advertising it as a List, then they may try to call the mutation functions on it and then we get a bug at runtime.
Some of the more modern languages like Kotlin have solved this problem by making collections immutable by default. If I say listOf, then I get an immutable list and it doesn’t have any methods on it like add. Again, it’s a compile time error if somebody tries to call that. Kotlin does also have mutable variants of those collections. If I really need one, I can have one, but the key here is that it has a separate interface for the two. I can lean into the immutable interface in all the places where I want to make sure that I don’t have to worry about somebody modifying the collection underneath me.
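The same "separate interface" idea can be approximated in Python's type hints, where `Sequence` has no mutating methods and `List` does. This is an analogy to the Kotlin design rather than an equivalent, and the function names are invented for the sketch:

```python
from typing import List, Sequence


def total(values: Sequence[int]) -> int:
    # Sequence exposes no append/remove, so a checker should reject
    # values.append(...) inside this function.
    return sum(values)


def add_bonus(values: List[int]) -> None:
    # The mutable interface is requested explicitly where it is needed.
    values.append(100)


scores = (1, 2, 3)                # a tuple: no mutating methods at all
print(total(scores))              # 6
print(hasattr(scores, "append"))  # False

mutable_scores = [1, 2, 3]
add_bonus(mutable_scores)
print(total(mutable_scores))      # 106
```

A function that only asks for the read-only interface is promising its callers, in a way a checker can verify, that it will not modify what it is given.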
Whenever I talk about this concept of these immutable collections, people ask me, what about performance? Your code is going to make changes to the collection over time, otherwise your code’s not doing anything interesting. Doesn’t that mean that we have to clone the whole collection every time we need to make a modification to it, and isn’t that super slow and memory intensive? The answer to that is, no, thankfully. There’s a really cool talk from QCon 2009, from Rich Hickey, the author of Clojure, about persistent data structures, which is the data structures that he built in as the defaults in the language of Clojure. They present themselves to you as a developer as immutable at all times. When you have a reference to one, it’s guaranteed to be immutable. It provides modifier functions like add, remove, but what they do is they produce a new data structure and they give you a reference to it.
Now you can have two references, one to the old one, one to the new one, and neither one of them can be modified by other code out from underneath you. The magic is, behind the scenes, they’re implemented via trees and they use structural sharing to share most of the memory that makes up the collection. It’s actually not nearly as expensive as you might fear. This was a hard thing for me to wrap my head around when I first started writing Clojure. I was like, that can’t possibly be performant. It’s a really nice solution to the problem. In practice, the way that they’re implemented, you almost never need to clone more than about four nodes in the tree in order to make a modification to it, even if there’s millions of nodes in the tree. This is a slide from Rich’s QCon talk where he talks about how these are implemented. What you can see here is two trees. The one on the left with the red outline, that’s the root node of the original collection. It has all these values in it.
On the right, he’s showing us, so we want to add a new child node to this purple node with the red outline. We’re going to try to add a new child node to it. To implement that, what we actually do is we just clone all the parent nodes that go down to that one, and we add the new child node there. Then the rest of the child nodes of all of these new nodes that we’ve created, they just point back to the same exact memory from the original data structure. We’ve cloned four tiny little objects and retained 99% of the memory that we were using from the original collection. With this pattern, you can have your cake and eat it too. You can have a collection that presents itself to you as immutable so that you know that it can’t be modified out from underneath you while you’re working on it. You don’t have to sacrifice performance when other threads, for example, need to change it.
This is hugely powerful in concurrent programming because there are all kinds of problems that you can run into with shared collections across multiple threads in your concurrent code, where you either have to do a lot of locking to make sure that one thread doesn’t modify it while another thread is using it, or you can end up running into these weird race conditions that cause runtime errors. With this pattern, once any thread grabs a reference to this collection, it knows that the collection is not going to change while it is consuming it. After it’s done with it, it can go grab a new reference to the collection, which might have been updated somewhere else, but again, that one will be immutable, and we don’t have to worry about it being modified out from underneath us either. Clojure and Scala have these kinds of collections built right into their standard library, but every other programming language that I’ve looked into has great libraries available on GitHub for this, and they’re usually pretty widely used and battle-tested.
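The path-copying idea can be shown with a deliberately simplified structure. The sketch below uses an immutable binary search tree, which is an assumption made for brevity; Clojure's real collections use much wider trees, so this illustrates only the sharing principle, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Return a new tree containing key; the old tree is left untouched."""
    if root is None:
        return Node(key)
    if key < root.key:
        # Copy this node; reuse the right subtree as-is.
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        # Copy this node; reuse the left subtree as-is.
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: nothing to copy


def contains(root: Optional[Node], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


old: Optional[Node] = None
for k in (50, 30, 70, 20, 40, 60, 80):
    old = insert(old, k)
assert old is not None

new = insert(old, 65)

print(contains(old, 65))       # False: the old version is unchanged
print(contains(new, 65))       # True
print(new.left is old.left)    # True: the whole left subtree is shared
print(new.right is old.right)  # False: nodes on the path were copied
```

Only the nodes on the path from the root to the insertion point are re-created; everything else is the same memory, referenced from both versions.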
Errors as Return Types – Simple, Predictable Control Flows
Now we’re going to move on to errors as return types. This one has mostly to do with control flow. When I’m talking about this one, I like to reflect on the history of Java and how at the beginning of Java, it was really common for us to have these checked exceptions versus unchecked exceptions. Method signatures would be really weird depending on whether they’re using checked or unchecked exceptions. These are trying to do exactly what I’m advocating for in this talk. They were trying to give us compile time safety to make sure that we were handling these errors that might happen.
In practice, we just collectively decided we did not like the ergonomics of how it was implemented and we drifted away from it over time. I think one of the funniest examples of that evolution is in the standard library of Java itself, the basic URI class that you use for everything that has to do with networks. It throws a checked exception called URISyntaxException whenever you call its constructor, which means you literally cannot construct one of these objects without the compiler forcing you to put this try-catch there, or without you changing your method signature to advertise that you’re going to rethrow that.
Then everybody else who’s calling your function now has to deal with the same problem. Everybody hated that because the odds that we were going to actually pass something in there that would cause one of these exceptions were really low and drove people crazy. A couple releases later in Java, they added this static factory function called create that literally all it does is call the constructor and then catch the exception and rethrow it as a runtime exception. They put that into the standard library. That was an interesting trend to observe. Likewise, all of the JVM languages that have appeared in the last 10, 15 years, Kotlin, Scala, Clojure, they’ve all basically gotten rid of these checked exceptions in favor of runtime exceptions. That means now all of our errors are runtime errors. That again is really against the grain of what I’m pitching in this talk. It means now we have to go read the docs or the code for every function we’re calling and make sure we know what kinds of exceptions it could be throwing, and handle them successfully. We’re back here, good intentions.
Go is the first language recently that tickled something in my brain about different ways to solve this problem. Go really leaned into the idea that, if you’re going to call a function that might cause some kind of error, instead of there being an exception with weird control flow semantics and a reliance on this weird try-catch syntax, it just returns a tuple instead. You either get your result back or your error back. One of those is going to be nil whenever you call this function. Then the compiler can enforce that you’ve done some checking on that nil, and you’ve decided how to handle it.
Again, the compiler is now doing this work rather than relying on good intentions. The other thing that I really like about this is we’re just using an if-else statement to interact with this error. It’s not a new special language construct that differs from how we’re dealing with all the other pieces of data in our code, like a try-catch is. It’s just the same type of code we’d write for any other piece of data. We get clearer control flow. It allows the compiler to enforce more explicit handling, and it prevents us from silently swallowing types of exceptions. Again, we can use our normal language constructs rather than the special try-catch stuff. Here’s a Rust equivalent of that. In Rust, there’s this type called Result. Any instance of Result is either Err or Ok.
Then it’s a generic type. If it’s a success, if it’s an Ok, then the value is going to be a 32-bit integer. If it’s an error, then the value is going to be a string. Then we can use this pattern match statement and say, if it’s Ok, then I’m going to do something with the success case. If it’s an error, then I’m going to do something with the error case. In these case statements, we get back the types that we declared in the Result declaration.
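The same errors-as-return-values idea can be sketched in TypeScript. This is a minimal illustration, not code from the talk: the names Result, parseQuantity, and describeQuantity are hypothetical, and the string tag called kind is just one possible choice of discriminating property.

```typescript
// Hypothetical Result type: a value is either an "ok" carrying T
// or an "err" carrying E, never both.
type Result<T, E> =
  | { kind: "ok"; value: T }
  | { kind: "err"; error: E };

// Hypothetical function that reports failure in its return type
// instead of throwing an exception.
function parseQuantity(input: string): Result<number, string> {
  const parsed = Number(input);
  if (input.trim() === "" || !Number.isInteger(parsed)) {
    return { kind: "err", error: `not an integer: "${input}"` };
  }
  return { kind: "ok", value: parsed };
}

// The caller uses a plain if-else on the tag. Inside each branch the
// compiler only allows access to the fields of that variant.
function describeQuantity(input: string): string {
  const result = parseQuantity(input);
  if (result.kind === "ok") {
    return `quantity is ${result.value}`;
  } else {
    return `could not parse: ${result.error}`;
  }
}
```

Because the failure is part of the return type, a caller cannot read value without first checking the tag, which is the property the Go and Rust versions rely on.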
Exhaustive Pattern Matching, and Algebraic Data Types
Errors as return types help us move our bugs from runtime to build time. I’ve shown you that they’re pretty ingrained in the languages in both Go and Rust, but can we do this in other languages? That leads me into my last topic that I want to talk about, which is about exhaustive pattern matching and algebraic data types. I’m going to explain what those are a little bit, and then I’m going to close the loop on the error handling part of this. What is an algebraic data type? It’s basically like a polymorphic class. You can imagine if you had a parent class called shape, and then you had child classes called circle, square, octagon. It’s basically just that, except that the compiler knows upfront all of the subtypes that can exist rather than it being open-ended. Most modern languages have some way of expressing these now.
Then they have these pattern matching statements that you can use to branch on which ones of the types that you end up getting. Here’s an example in TypeScript. You can see I’ve declared this type called shape, and this little or operator just means I’m unioning together several other different types. The key in TypeScript is that I have this common property, which I happen to call type, but you could call it whatever you wanted. As long as all of the types that you’re declaring have that property, and they all have a unique value for it, then the compiler can tell the difference between all of these types. Then I can do a pattern match statement on that variable, and then I can do these case statements to handle the individual branch. This is really cool because the compiler is smart enough to know once I get inside this circle branch, that I’m going to have a radius property available, and I’m not going to have a width and height. If I tried to reference width or height here, the compiler would fail, and it wouldn’t allow me to write that code.
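A minimal sketch of the kind of TypeScript being described, reconstructed as an illustration rather than taken from the slide. The field names radius, width, and height follow the description above; the area function is an assumption added to give the branches something to do.

```typescript
// Each variant carries the common "type" property with a unique literal value.
type Circle = { type: "circle"; radius: number };
type Rectangle = { type: "rectangle"; width: number; height: number };

// The | operator unions the variants into one closed set.
type Shape = Circle | Rectangle;

// Hypothetical function used only to demonstrate narrowing.
function area(shape: Shape): number {
  switch (shape.type) {
    case "circle":
      // Here shape is narrowed to Circle, so radius is available.
      // Referencing shape.width in this branch would not compile.
      return Math.PI * shape.radius * shape.radius;
    case "rectangle":
      // Here shape is narrowed to Rectangle, so width and height are available.
      return shape.width * shape.height;
  }
}
```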
Conversely, the same thing with the rectangle. It gives me a lot of type safety. Exhaustive pattern matching is basically just that same concept, but the compiler can give you a build time error if your pattern match statement doesn’t cover all the possible cases. This is why algebraic data types are important, because we want the compiler to know all of the legal types that are available. Most of the languages that have this stuff, they have the support for an exhaustive pattern match statement. Not all of them have it enabled by default. In TypeScript, you’ve got to turn that on as a compiler option. If you turn it on, then this becomes an exhaustive pattern match statement. What that means is if I go modify the definition of my shape type, and I add a third one in here called square, now this shape definition may be in one file somewhere in my code, and I may have these pattern match statements scattered throughout lots of other places in my code. They’re not guaranteed to live right next to each other.
As an engineer, if I come in here and I add in this new square type, then the next thing I got to do is search all over my codebase and find all the places where I might have been doing one of these pattern matches and make sure that I add support for the square. If I don’t do that, then we can get some weird runtime failure. With an exhaustive pattern match, you’re telling the compiler that you want it to fail if it finds a pattern match statement where you’re not explicitly handling all the cases. This would fail to compile in TypeScript because I don’t have a handling for the square case here, and I have to go add it before it’ll build. That’s really powerful.
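One common way to get this behavior in TypeScript is a helper that accepts only the never type, sketched below. This is an assumption about technique, not necessarily the mechanism shown in the talk: the helper name assertNever, the describeShape function, and the side field on the square are all hypothetical.

```typescript
type Shape =
  | { type: "circle"; radius: number }
  | { type: "rectangle"; width: number; height: number }
  | { type: "square"; side: number };

// Accepts only a value of type never. If any variant is still possible at
// the call site, the argument is not assignable and the build fails.
function assertNever(value: never): never {
  throw new Error(`unhandled case: ${JSON.stringify(value)}`);
}

function describeShape(shape: Shape): string {
  switch (shape.type) {
    case "circle":
      return `circle with radius ${shape.radius}`;
    case "rectangle":
      return `rectangle ${shape.width} by ${shape.height}`;
    case "square":
      // Removing this case leaves "square" unhandled, so the
      // assertNever call below becomes a compile-time error.
      return `square with side ${shape.side}`;
    default:
      // Reached only if every variant above has been handled.
      return assertNever(shape);
  }
}
```

With this in place, adding a new variant to Shape turns every switch that ends in assertNever into a build error until the new case is handled, wherever in the codebase that switch lives.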
Similar concept in Kotlin. In Kotlin, these algebraic data types are called sealed classes. You can see here I’ve got one where it can either be a success1 or a success2. I’ve got this when statement that I can use as a pattern match on it. What I want to show here is, if the function that I’m using to get this result might throw an error, this is where I’m going to tie this back into the error handling, this thing might cause an error. I have to put in this try-catch statement, and I have to know what exception type might get thrown here. We’re in good intentions land again. I might forget to put that try-catch statement in there. I might not handle all of the different types of exceptions that could possibly get thrown by that function.
If I make a small change to the way I model this, and I just add the error in as a different branch of this result sealed class, then now I get to take advantage of all this other stuff that I’ve just shown you all. This is what my code looks like now. I just have a new branch in my pattern match statement that handles the error case. Now I’m not relying on good intentions to put the try-catch into the code. The compiler, because this is an exhaustive pattern match statement, will fail to build if I haven’t added the branch to handle this error. This code is just cleaner and simpler. It doesn’t involve this extra level of nesting and weird special case code. Highly recommend looking into the support for this in various languages. This is one of the more recent trends that I’ve seen. Like in Java, it didn’t come in until Java 17. In Python, it was in Python 3.10 when it got introduced. You can find something that will allow you to do this in pretty much whatever programming language you’re using.
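The same modeling change can be sketched in TypeScript, mirroring the Kotlin idea rather than reproducing the slide. The names FetchResult, fetchItem, and render, the failure variant, and the payload fields are all hypothetical.

```typescript
// The error is modeled as one more variant of the result type,
// alongside the two success variants.
type FetchResult =
  | { type: "success1"; count: number }
  | { type: "success2"; message: string }
  | { type: "failure"; reason: string };

// Hypothetical producer: returns a failure value instead of throwing.
function fetchItem(id: number): FetchResult {
  if (id < 0) {
    return { type: "failure", reason: "negative id" };
  }
  return id % 2 === 0
    ? { type: "success1", count: id }
    : { type: "success2", message: `item ${id}` };
}

function render(result: FetchResult): string {
  switch (result.type) {
    case "success1":
      return `count ${result.count}`;
    case "success2":
      return `message ${result.message}`;
    case "failure":
      // No try-catch needed: the error is just another branch.
      return `error: ${result.reason}`;
    default: {
      // Compiles only when every variant is handled above; dropping the
      // "failure" case makes this assignment a type error.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```

The error branch sits at the same nesting level as the success branches, and forgetting it is caught at build time instead of showing up as an uncaught exception.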
Key Takeaways
Allowing bugs to surface at runtime can be really expensive. That can put us on that alligator timeline that we’re trying to avoid. Modern language trends are giving us really cool tools to catch bugs at build time instead of allowing ourselves to be subject to this problem. These same trends, I think, have this nice side benefit that they make the code more maintainable and easy to reason about anyway. It’s like a double win. More maintainable code obviously leads to increased developer productivity. It makes it easier for your teammates, present and future, to understand your codebase and feel confident about making changes to it. What we’re really trying to do here is find places where we can avoid relying on good intentions to solve problems.
The specific language features that I am advocating here are: leaning into type checkers for dynamic languages, configuring your build tools to disallow nulls, using immutable variables and data classes wherever you can, finding a persistent collections library in your language if it’s not built into the standard library, surfacing errors as return values rather than exceptions, and using exhaustive pattern matching with algebraic data types to allow the compiler to make sure that you’re handling all the cases whenever you can. That last one just allows you to model your business logic a little bit more concretely as well. It’s really nice.
Questions and Answers
Participant 1: Are there any good examples in the open-source world that you could point to that use a lot of the best practices you were talking about?
Price: I’ve found that the way to find the good examples is to find projects that are built in these languages that make this stuff be core constructs. Any project that you find in Rust is going to be forced to follow a lot of these paradigms just because that’s how the language was designed. Many or most Kotlin projects that I’ve seen really lean into the immutable variables and the pattern matching stuff. Mostly I think about that by language more so than by specific projects.
Participant 2: The principles that you mentioned, let’s say primarily I’m a Java shop and it’s one of those languages, like many of those principles are checked. Would you suggest that I try navigating Go or Rust and start moving my platform to, like, a mix of these languages, or would you say to stick to just one which checks everything?
Price: It probably depends a lot on your team and their interest and willingness to branch out into different languages. There is obviously a cost for managing codebases in multiple different languages. With Java in particular, like inside of AWS over the last five years, there’s been a really big shift towards teams that had big existing Java applications starting to add new features to them using Kotlin, because Kotlin has really good JVM interop, and so you don’t have to rewrite the rest of your code. You can just start adding Kotlin classes to it. When you write your Kotlin code initially, you can choose to write it in a way that makes it look almost exactly like Java code, so it’s really familiar to your engineers that already have experience with that. You can start experimenting with the features over time and gradually migrating things over.
I think in most big existing codebases, that’s always going to be a more successful long-term strategy than trying to just cut everything over all at once because that just ends up usually not being practical given the business requirements for delivering new features and stuff like that. That’s one thing to consider. If you have some isolated project, like a new microservice that isn’t tightly coupled to your existing codebase, then that’s a reasonable place to consider trying a new language. Then, like I mentioned before, just finding some little toy side project when you have the time and interest to play around with these different languages and get a sense for how you feel about them. That really helps you decide whether it’s something that you want to lean into or not.
Participant 3: When you talked about exhaustive pattern matching and you showed the switch statement, my mind immediately went to traditional interfaces and virtual classes. Why would I want to add a new enumeration to my result rather than adding a new implementation to do the different things? It was more when you did the shapes thing. When I wanted to add a square, why couldn’t I just have a shape interface class and I just have three different implementations? When I want to add in a polynomial or whatnot, I don’t have to go everywhere, I just have an interface with an area method that I would have to implement.
Price: There’s a lot of ways to skin this cat. The thing that I’m really advocating for here is choosing one that gives you the exhaustive pattern matching so that when you do make that addition to the parent type that the compiler can automatically catch all the places in the code where you haven’t added support for it. There are definitely other ways to handle that besides this one. I like this best in TypeScript because in TypeScript, if you use interfaces or subclasses, then you have to start using this weird instanceof keyword, and it gets into that realm of JavaScript where it behaves really differently for one type of data than it does for another type.
In JavaScript, if you ever had a piece of code that’s trying to check and see if a variable is a string, it may be like, if is instanceof string, or thing.type equals string, or several other conditions that you have to check just because JavaScript gets wonky when you start trying to do reflection type stuff on that. Of the different patterns that I have personally played around with in TypeScript that really work well with this exhaustive pattern matching, this is the one that has been the most foolproof for me. This is the one that Google uses in their implementation of Protobuf. Protobuf has this concept of OneOf where you can say that a piece of data is either this thing, or this thing, or this thing. If you look at the way that Google’s Protobuf libraries generate TypeScript code to handle those OneOfs, this is the way that they do it. It’s just worked really well for us when we tried it out. It’s not the only solution though.