Transcript
Larson: I’m Will. I’m here to talk with Dan a little bit about ambiguity. I’m the CTO at Carta. Then Dan is the deputy to the CTO at Carta. It’s unclear what those roles mean in the detail. We’ll talk about that a little bit more. I’ve written some books. I want to talk a little bit about titles. The first thing when you talk about titles is that there is no particularly clear definition of what any title means, and it’s an unsolvable problem. Take principal engineer. When I joined Uber in 2014, there were zero principal engineers. There were about 200 engineers. There were zero senior staff engineers. There was, in fact, I think, maybe one staff engineer at that point. When I left a couple years later, there were 2,000 engineers, and there were, I think, three or four staff engineers at that point, maybe one senior staff.
Conversely, when I joined Calm, there were 3 principals, or principal equivalents, out of 24. Pretty different ratio, like zero to 400, and then 1 in 8, radically different. Somewhere in between those, when I joined Carta, there were two principals out of about 400. The thing I really want to underline there is that for all ambiguous roles, all infrequent roles, all one-of-one roles, there’s no rubric. If you want to be a principal engineer, and you go to your manager, the CTO, and you’re like, “I meet these four criteria to be principal”, you’re hosed by definition, because there are no criteria that are going to be totally consistent across companies.
If you think about the rarest role of most engineering organizations, it’s the head of engineering. Sure, there’s some companies on LinkedIn that have head of engineering, API, head of engineering, X. That’s not really what I mean. There’s only one real head of engineering in most companies, or one CTO. CTO across companies is radically different. I was the CTO at Calm, now I’m the CTO at Carta. The roles are quite different. It’s different because the size of the engineering team is quite different. It’s different because the needs of the company are quite different. It’s also different because the CEOs that I work with and the chief product officers I work with are radically different people as well. Each time you think about what’s the rubric, what’s the definition for a principal engineer, the thing I’d push back to you on is like, no rare roles have clear definitions, and that’s actually part of what makes them pretty interesting.
I do think the staff engineer role has gotten a little bit more clear over the last few years. I think the archetypes have been useful to describing different ways people can actually meet the expectations of staff. I do think there’s increasingly alignment around this idea that staff-plus roles mean that someone solves a company’s core problem. You’re not solving just a team problem. You’re not doing something that you find fun or enriching. You’re solving something that matters to the company. I’ve been wondering like, what could we do to get a little bit clearer on what principal engineer means? I don’t think we can get a rubric, but can we get a little bit closer? I think navigating ambiguity is the quintessential definition of what makes someone principal.
Navigating ambiguity matters a lot because many organizations, not every one, are constrained by the number of problems that their head of engineering can actually take on, load the context for into their head, and make a good decision on. You work with heads of engineering. Many of them go years without making decisions that you really need them to make to move forward. You’re just stuck. Maybe you’re an organization with a terrible monolith-microservices split, and there’s never clarity. You’re just stuck there indefinitely. Wouldn’t it be great if someone worked that problem and gave clarity? It would be.
If your executive only has enough time to solve a few things, and your business is crashing and burning on the revenue side, they’ll literally never have the attention to solve this problem that you think is important, but isn’t quite important enough for them. If you’re working on ambiguous problems, then you’re actually able to create bandwidth for, not just the executive team, but also literally the engineering executive at the company. This is the definition I want to run with for what makes a great principal, or a principal engineer: someone who is actually creating bandwidth for the head of engineering directly.
Ambiguous Problems – Examples
What are ambiguous problems? What should I have for breakfast, is, loosely speaking, an ambiguous problem if you have a lot of breakfast anxiety. What do I mean? Data locality is the first example I think about a lot. Who here has worked on data locality to some extent? The first rule of data locality is you go to the engineers and say, “We need to support data locality”. They’re like, “Absolutely, data locality, that’s really important. We just need some requirements from the product team”. You’re like, “Ok, great”. Walk down the hall, you’re like, “Product team, data locality is pretty important”. They’re like, “Incredibly important, international expansion, number one priority for the year. We need some requirements. We just need compliance to tell us the requirements for data locality”. You’re like, “Ok, great”. You go to compliance. They’re like, “Can’t believe no one’s asked us for this before. We’re going to get those requirements for you. We just need to know what the product you’re building is first. Then we’ll give you all the requirements about data locality”.
Then you add in the fact that the requirements keep changing: each time regimes change in the various countries around the world, their data locality laws change as well. Navigating this is effectively impossible if you don’t have the ability to deal with ambiguity. I’ve seen so many people get stuck on this problem, including me. At Stripe in 2017, I tried to take on data locality. I could not get people to agree on the product consequences of doing the data locality approach I wanted. Came back a year later and got a little bit further that time.
Decomposing a monolithic application: how to decompose, whether to decompose, pretty challenging. I think one of the interesting problems is that typically when you try to decompose, at a minimum, things get worse initially. Sometimes things get worse forever. It’s pretty challenging to actually convince people that you want to do this and to convince them that it’s worth doing, that you’ll actually be better on the other side of things getting worse. How do you balance all these different constraints? Uber did a lot of decomposition. That was possible because we had so many pain points dealing with our monolith. Also, maybe we did the wrong thing anyway. Getting people to agree is hard, particularly in 2024. In 2016, you got promoted to principal by doing a decomp project that didn’t quite work, but looked really good until you switched jobs. Now decomp is pretty untrendy and much harder to actually do. Data deletion and retention, another similar problem. No one can actually agree on what to do.
Again, you go to compliance, legal, get everyone in the room, product. You agree you need to do it. Then you start agreeing like, what are the product consequences of deleting the data that we’re obligated to delete? How do we deal with that? Incredibly hard to actually get people to agree that they’re willing to accept the consequences of doing it. Everyone is like, absolutely, let’s do it. In the details, there’s a ton of friction to actually get it done. Driving quality, 2024 is a more with less, higher quality software year. What do these things even mean? I don’t know what quality means necessarily. It’s a pretty challenging topic to make progress on.
I imagine a lot of you have been asked to drive a quality spike. It’s easy to manufacture the appearance of progress. Make some more pull requests or some more reviews. Reject anything without a bunch of tests, or if coverage goes down even 0.01%. Not clear that actually does anything useful. It just looks like progress. How do you actually make real progress here? Pretty challenging. The last one is defining principal engineer. This is the one of the five that’s absolutely not going to get you promoted to principal engineer. I think this is the least important in most dimensions. I do think it’s a really challenging problem, and getting to the definition, not an easy thing to do.
The Right Algorithm for Ambiguous Problems
Pretty abstract. Like, let’s solve some ambiguous problems. Data locality is an ambiguous problem. I want to share a little bit of my thinking about what’s the right algorithm to work through ambiguous problems? I want to use a specific thing that I’ve been thinking about a lot this year, which is, how do we manage access to customer data at Carta? When you think about this problem, there’s lots of things that you need to do that you need some customer data. Imagine you get a ticket coming into our customer support team. You do need to actually see the data to respond to that ticket successfully. Or maybe they’re asking for a password reset. You want to see some audit logs about what they’ve done before. Trying to get a sense of whether it’s actually the person. Or, one of Carta’s new business lines, we do fund administration. We actually run venture capital and private equity funds.
In that case, part of what you need to do is actually move money on their behalf. You need to actually make an investment for them. Figuring out how we allow access to customer data is something that’s been increasingly important for us. The first step of dealing with an ambiguous problem is figuring out who the executive sponsor is. I think a lot of people will take on an ambiguous problem without one. Going back to the data locality example at Stripe, I took that on. I’m like, “I’m going to solve this. I’m an aspiring leader at the company”. I couldn’t get anyone to agree with me about the tradeoffs, and so I put together the document.
I got alignment on the veracity of the document, but no one would actually agree on what tradeoffs to accept. I just got stuck. This is the zeroth rule of ambiguous problems. If there’s no executive who’s willing to actually support you on it, you probably aren’t going to succeed. That’s because individuals really can’t take on ownership of risk. If you take on a bunch of risk for the company as an individual, you could just leave. You’re just one person. You really need a team, an organization that’s able to hold the risk in perpetuity.
Otherwise, people just don’t believe you when you say you’re going to hold the risk. You will until you walk out the door. You need an executive, someone who can actually hold the risk persistently for you, and that agrees with you.
When you’re starting to work an ambiguous problem, the first step is mapping the stakeholders. When we started thinking about access to customer data, here are four stakeholders that stood out really obviously to us. Security team, yes. They care about how we access data. Compliance, legal, yes, 100%. Product, engineering. The interesting thing about these stakeholders is they have a fairly unified perspective on what’s important. For these stakeholders, you could basically not allow any access to customer data and solve their problem.
One of the first challenges we had is, historically our efforts around data access really focused on these four stakeholders, none of whom require access to customer data to do their work. If you privilege these stakeholders, you almost always get the wrong outcome. The first thing is making sure you have the right map of stakeholders. The real map of stakeholders is absolutely security, absolutely compliance, absolutely product and eng. Also, all these internal teams that need to access this data to do their jobs.
The first thing we had to do is open the door and bring in the right set of stakeholders, not just the stakeholders that we liked or that agreed with us, but really understand what are the full set of different perspectives and needs that we needed to solve for. Next, I think about this idea of building layers of context. Different perspectives start showing up, but often I find that the different people involved don’t have any awareness of the other perspectives. From security and compliance perspective, let’s just make it impossible to access customer data. That’s the obvious goal.
From the internal team’s perspective, customer success, the fund administration teams, you actually want to minimize friction to getting access to data. That’s what they need to do their jobs. The delivery teams, customer support, these teams are all goaled on the number of tickets they’re solving per hour or per day. For them, the friction’s really challenging. It’s really getting in the way of them accomplishing what they need to in the way that they’re getting goaled by the company. Then, engineering and product, what do we care about? Needs to be easy to build, easy to maintain in this case.
Once you have these different layers of context, you’re able to start understanding which stakeholders are missing which perspectives. The product engineering perspective on this problem, not that interesting, let’s make it cheap and easy to maintain. Sure, no one disagrees with that. That’s a grandma-and-apple-pie perspective; everyone loves it. These two other perspectives feel a little bit in conflict: do you want to allow no one to have access, or really minimize access, to customer data, or do you want to make it really low friction to access customer data? These are hard to reconcile. It’s easy to see how you get this continuum where security and compliance are on one side, and they’re like, “Just don’t let them do it. Make them supply a fingerprint”.
On the other side, the internal teams are like, “Just let us access the data. We’re doing it because people are asking us to. We’re just trying to do our jobs”. If you have that perspective, it’s like, you’re screwed. Either you prioritize security or you prioritize usability, but there’s really no way to get both.
I think about the next step as like developing a multi-dimensional tradeoff. A single dimensional tradeoff is like, let’s be secure or let’s be permissive in terms of access to customer data. If this is the tradeoff, there’s really no good way to get to where you want to go.
Then you just go in, go to the executive team, and people yell at each other. They’re like, security is the value of the company, or, our margin’s going down because we have too many delivery people. You just fight about things like that. That’s not a very interesting conversation. There must be a way to actually accomplish all the goals here, not just have this spat in front of all the executives where the most important executive wins. That’s not an interesting way to solve problems. If you think about the multi-dimensional tradeoff we came to here, it’s, how do we combine these two different perspectives? The first idea is we should eliminate all access to customer data without a fully documented rationale of why it’s happening.
Second, if the rationale already exists in another system, let’s just automatically reuse that. The balance here is we do want to make sure there’s zero unaudited, zero rationale-free accesses to data. We also want to make it as easy as possible for the folks to actually do the work they’re trying to on the internal CX and other teams. Getting super concrete, a customer calls in, a customer sends an email in, Salesforce tickets automatically open on their behalf. It’s automatically mapped to the corporation, or the fund, or whatnot that is opening the ticket.
Then it’s assigned to one person on the CX team. What if, when that person and only that person comes to a company for which they have a ticket assigned to them, and the ticket is actually still open, we automatically give them access to that customer data? If the ticket’s closed, of course not. If the ticket’s not assigned to them, of course not. If it’s for a different company, of course not. By finding the next dimension within the tradeoff, we’re able to actually solve the problems of both the security team and the internal teams, productivity teams, customer success, without compromising.
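The rule just described can be written down as a small check. This is a minimal sketch in Python, not Carta’s implementation: the names `Ticket`, `TicketStatus`, `AccessDecision`, and `decide_access` are hypothetical, and the ticket list stands in for whatever the ticketing system would actually return. It only illustrates the three conditions from the talk (ticket open, assigned to this person, for this company) and the requirement that every decision carries a rationale.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TicketStatus(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass(frozen=True)
class Ticket:
    """Hypothetical view of a support ticket; field names are assumptions."""
    ticket_id: str
    company_id: str    # corporation or fund the ticket was mapped to
    assignee_id: str   # the one CX team member the ticket is assigned to
    status: TicketStatus


@dataclass(frozen=True)
class AccessDecision:
    allowed: bool
    rationale: str     # recorded for every decision, so no access is rationale-free


def decide_access(user_id: str, company_id: str, tickets: list[Ticket]) -> AccessDecision:
    """Allow access only if an open ticket for this company is assigned to this user."""
    for ticket in tickets:
        if (
            ticket.status is TicketStatus.OPEN
            and ticket.assignee_id == user_id
            and ticket.company_id == company_id
        ):
            # Reuse the rationale that already exists in the ticketing system.
            return AccessDecision(
                True, f"open ticket {ticket.ticket_id} assigned to {user_id}"
            )
    return AccessDecision(
        False, "no open ticket assigned to this user for this company"
    )
```

The extra dimension is visible in the signature: the decision depends on the pair of user and company plus the ticket state, not on a single secure-versus-permissive setting.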
By adding this additional dimension, we’re able to actually get a great set of tradeoffs. Almost every time that I run into an ambiguous problem where people can’t make progress, it’s because they haven’t found the way to add another dimension that allows everyone to get what really matters to them. If you just have secure or easy, you can never make progress on that. It’s just going to turn into politics. Here’s where I’m pushing you, I think politics are usually the consequence of poorly framed solutions. The politics aren’t necessary. Everyone can actually get what they want.
Once you have this multi-dimensional tradeoff, you still have to get people to actually buy into it. That’s where, again, step zero, who’s the executive that’s going to partner with you? If you don’t have that executive, getting people to buy off, even if no one has to sacrifice anything, really challenging to do. People don’t like change. People won’t necessarily believe you when you tell them that they can get what they want. Maybe to actually do this, you’re still going to have to do a fair amount of engineering work to actually build the new system or something like that. Again, if you don’t have that executive, even if you do an amazing job of designing the multi-dimensional tradeoff, often it’s more of a thinking exercise where you hand it off, the executive team is like, you’re super smart. Then nothing actually happens. That’s not that interesting.
Finally, sometimes on these ambiguous problems, you just don’t make progress. You put the proposal together. You do get the right layers of context. You understand the missing context across the layers. You put together a great document, and you just can’t get progress nonetheless. I think a great example, going back again to the data locality issue, is when India was rolling out UPI. Things were just changing so frequently. We literally couldn’t agree on the state of what was going to happen six months from then.
Enforcement, from our perspective, seemed to be arbitrary across different companies, where some of the companies that were technically compliant didn’t seem like they were actually compliant from our understanding of what they had done. Sometimes there’s so much ambiguity that you still can’t quite make progress on it. That’s ok. I think part of working on ambiguous problems is you do a really good job, you navigate the complex solution space, and sometimes you still just get stuck. Part of this is you have to recognize when you are stuck, and move on to a different problem. This can be super demoralizing. You talk to the CTO. You’ve put together this plan of how you’re going to get promoted to principal for solving data deletion.
Then, you can’t quite solve it, you can’t quite finish it because of other stakeholders, and you don’t get promoted. That doesn’t feel fair. That’s a real thing, and it’s unavoidable. I think part of being an executive, even if you’re an executive as an IC, is you take a lot of risks, and sometimes the risks aren’t in your control. As a less senior person, you can say, that wasn’t in my control, so you can’t hold me accountable. As a more senior person, no one cares. You couldn’t control it. It’s not fair. That doesn’t matter.
Recap (Working Through Ambiguous Problems)
Recapping a little bit. The steps of working through an ambiguous problem, identifying the stakeholders first. Figuring out the layers of context. Finding the multi-dimensional tradeoff, where instead of having this us versus them, you’re actually able to solve the full set of requirements completely, and see if you can build alignment. Delay is always an option. The reason this matters is, like I’ve said from the beginning here, I think ambiguous problems are the proving ground of principal engineers, because these are the problems where I as an executive can’t make progress, because I’m not quite sure what to do.
Developing the multi-dimensional tradeoff takes so much time and so much depth, that unless I’d seen the problem and worked it before, it’s often challenging for me to figure out what to do. Again, data locality, data deletion. I don’t actually know what we should do, because the details keep changing so frequently on both of those fronts. If you can bring that to me, and then be my partner on rolling that out, you have a real chance of actually extending my leverage out of being a principal engineer.
This is my executive lens. One of the beauties of executives is they see everything in this pristine, perfect world, and then hide from the details sometimes. We’re going to have Dan, who can’t hide from the details, tell you a little bit about how he perceives some of what I’ve been talking about.
Technical, Architectural, Operational, and Business Uncertainties
Fike: I do want to take a few minutes to share a little bit of my story through the lens that Will presented. In order to do that, I want to briefly take all of you back to the year 2011. For those who can’t remember that far back, 2011 was actually when we lost Steve Jobs and when we got Google+. While Will was a director at Digg, I was working at a company called Volition. Volition made Xbox, PlayStation, and PC games like these you see. In 2011, Volition was just starting to embrace the concept of DLC. I was working on Saints Row 3 there on the right, and somewhat late in the development cycle, I had been tasked with retrofitting in support for DLC. DLC means downloadable content. It’s basically separately purchasable content for a game you’ve already bought, and it’s pretty lucrative.
If you look here, you can see that, in aggregate, the DLC for this costs three times as much as the original game did. There was a lot of pressure on the team to make this possible. Nothing we’d built so far was built with that modularity in mind. There was a lot to figure out to make this work on the game we’d been building for years. If this is starting to sound like one of those highly ambiguous problems Will was talking about, you’re wrong. It’s actually not. It isn’t. I’ll get into it. You’ll see what I mean. This problem definitely had a bunch of these technical uncertainties to it.
Most of our game assets were loaded from a DVD in an optical disc drive, which is what consoles had been using for years. DLC was downloaded onto a hard drive or a USB stick or a memory unit. These were all totally new concepts at the time. How does any of that work? There are situations where the DLC might be installed, but there’s no license available to use it. How does any of that work? That’s a new concept too. For both of those questions, how are they different between the Xbox and the PlayStation? You can run into this situation where I’m loading a previous game save and DLC that used to be present isn’t anymore. What’s supposed to happen? I didn’t have any of the answers to any of these, but they’re not ambiguous. These have objectively correct answers. This was just learning.
There were some architectural uncertainties. You have to actually have a user log into an account in order to check for one of those licenses. The first party manufacturers, Sony and Microsoft, they didn’t allow you to require someone to log in until after you’ve gone through the startup sequence and landed at a title screen. Our entire game worked by loading all of our assets in that startup sequence. The 512 megs of RAM we had, that was all allocated or reserved up front. If I need to log in to load the assets and I load the assets before I log in, how are we going to square that circle? The Xbox and PlayStation both approach these things very differently. Should I build a platform agnostic wrapper around those SDKs?
One of the more challenging things was, we have this situation where DLC might be installed, uninstalled, reinstalled. Over the course of those different states, you load and resave your game data, and there’s all these weird conditions you can have where maybe the game save doesn’t load. These don’t have objectively correct answers, but they do have a bunch of correct answers. These were just decisions. This was systems design. I could answer most of these independently.
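To make the save-game question concrete, here is a minimal sketch in Python. It is purely illustrative and not how Volition built it: `DlcState`, `SaveGame`, `classify_dlc`, and `can_load_fully` are hypothetical names, and the installed and licensed sets stand in for whatever the console SDKs would report. The sketch only classifies each piece of DLC a save refers to; what the game does with a missing or unlicensed entry (refuse to load, strip the content, prompt the player) is exactly the kind of design decision described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class DlcState(Enum):
    USABLE = "usable"          # installed and licensed
    UNLICENSED = "unlicensed"  # installed, but no license available to use it
    MISSING = "missing"        # referenced by the save, but not installed


@dataclass(frozen=True)
class SaveGame:
    """Hypothetical save record listing the DLC ids it depends on."""
    name: str
    required_dlc: frozenset[str]


def classify_dlc(
    save: SaveGame,
    installed: AbstractSet[str],
    licensed: AbstractSet[str],
) -> dict[str, DlcState]:
    """Classify every DLC id the save references."""
    states: dict[str, DlcState] = {}
    for dlc_id in sorted(save.required_dlc):
        if dlc_id not in installed:
            states[dlc_id] = DlcState.MISSING
        elif dlc_id not in licensed:
            states[dlc_id] = DlcState.UNLICENSED
        else:
            states[dlc_id] = DlcState.USABLE
    return states


def can_load_fully(
    save: SaveGame,
    installed: AbstractSet[str],
    licensed: AbstractSet[str],
) -> bool:
    """True only when every referenced DLC is both installed and licensed."""
    return all(
        state is DlcState.USABLE
        for state in classify_dlc(save, installed, licensed).values()
    )
```

Separating classification from policy keeps the part with a definite answer (what state each DLC is in) apart from the part that is a judgment call (what to do about it).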
There were some operational and business uncertainties as well. All of the DLC content payload and the metadata, that was hosted on Microsoft and Sony infrastructure that was out of our control. What’s our protocol for deployment going to look like? How do we build a testing pipeline that works with infrastructure we don’t control? How do we test different combinations of DLC? Game dev broadly has a lot of manual QA. How am I going to interface with that team to make sure all of this works right? How much DLC are we going to have to throw at that team? How much is it going to fit, or how much space is it going to occupy in memory when we have all of it installed? I need to know now before we ship the first game. Lastly, how much are we willing to spend on this? What’s the upside? I can show that chart about how much things cost. I don’t have a chart for how much money it made.
At some point, there’s an amount of effort this isn’t worth. How do we weigh that? Now we’re starting to surface some ambiguity. It’s still manageable. Every other game company has solved this. You do have to page in some real stakeholder context, but solutions definitely exist.
The Deputy to the CTO Role, and Tackling Highly Ambiguous Problems
I’m going to skip ahead a little bit. This story does not end with me getting a promotion to principal engineer. That didn’t land for at least another decade, while I was working at Carta, but before I was working for Will. My name is Dan Fike. I am a principal engineer, but I’m also this deputy to the CTO. That’s actually a title I had before I was a principal engineer. I wasn’t hired into it. I was hired as a staff engineer on the data engineering team. I got the deputy title a couple years later, and I’ve held it through a couple distinct CTOs at this point. Now you all have figured out what a deputy to the CTO is. You know what that is. It’s a weird title. I don’t expect anybody to know what it means immediately. Before I was the deputy to the CTO at Carta, it was filled by nobody. I’m the first person to have that role at the company. I’ve never seen the title before. I understand if you all might be confused.
Some days I’m doing some but not all of what an architect does. Some days I’m doing some but not all of what a tech lead does. Other days I’m doing some but not all of what a VP does. On other days, a chief of staff, which is a little bit cheating because that’s another role nobody knows what it does. The best way to summarize this is, I really act as our CTO’s right hand, as Will’s right hand. When I say right hand, I’m very blatantly stealing that vocabulary straight from his Staff Engineer book, very shamelessly.
In his book, he does discuss these four different archetypes of staff engineer. Briefly, the tech lead supports a team and their execution. The architect owns a real critical area of your systems or products and its quality. The solver, strategically deployed to dig deep into arbitrarily complex problems. The right hand, this one’s harder to explain. In essence, they support an engineering executive as their attention gets pulled into more directions. That executive might be a CTO. It might be an SVP of engineering. It might be someone else. It’s a pretty uncommon role. It’s not a very well understood role. This also makes it one of those one-of-one roles that Will mentioned earlier. It’s not portable.
I want to revisit those DLC things for a moment in this lens. The solver is very well equipped to go attack some of these technical uncertainties we had to figure out. The architect, and to some extent the tech lead, perfect for these architectural problems. The operational and business ones, they’re a little bit more of a stretch. They settle somewhere at the intersection of tech leads and right hands. I do want to dig into the right hand more, not just because it’s poorly understood, but because there’s a lot of highly ambiguous problems that tend to come across the desk of a right hand. I do want to take a moment here though just to point out, if what you just saw me say is that the stuff on the left is easy and those archetypes are less valuable, that is definitively not what I am saying. I do think the skills that underpin these exist on orthogonal axes. I’m trying to emphasize the difference a little bit, much the way you would if you were discussing a software engineer versus an engineering manager. One’s not better or worse than the other, they’re just different.
On my path to this right hand role, this deputy to CTO title, and eventually onward to principal engineer, I did encounter a number of these highly ambiguous problems. There was a while, a few years ago, when we had doubled the engineering headcount of the company over the previous few months, some real growing pains were starting to show up, and execution was slowing down. When I say execution, I don’t mean CI/CD pipelines, I don’t mean application performance, I mean software engineering more broadly. The time we spend planning and pivoting and deciding and shipping and releasing. It was my job to remediate that. For another period of time, I was tied up with legal and some IP claims that were pretty ambiguous, but that I can’t talk about more.
There was a period of time there where a lot of our engineers didn’t understand or couldn’t see what our broader strategy was. When you don’t know the strategy, you don’t understand the strategy, it makes decisions feel or be inconsistent or unfair. I talked a lot about this at QCon in London back in April, along with another engineer, Shawna. Lastly in this list, a few years back, I redesigned our interview loop, and the decision-making process that comes out of that. What we had was inadequate for making decisions around senior roles, and there were a lot of inconsistent things that had materialized going through that process. These are not things I did solo, these are things I did with people. This is not everything, but I picked these to illustrate some points.
The first, I want to acknowledge that the skills you demonstrate in trying to tackle these specific things, it’s not something they teach in school. It’s not programming. There are no pull requests associated with these, for the most part. Yet, I would argue there’s nobody better qualified to work on these than a software engineer. If you’ve got to figure out how you’re going to evaluate a software engineering candidate for a senior position, the software engineers are the best equipped to do that. I don’t think non-programming examples like these are required for a principal engineer, or even a right hand role.
It’s just where Carta was and what Carta needed. I also want to take note of the manner in which these problems were ambiguous, because it wasn’t the case that we just had some objectively correct stuff we had to learn. It’s not the case that we just had hard implementation or architecture decisions we had to make. It’s not the case that the ambiguity was simply rooted in dependencies on others. These problems were just full of unknown unknowns. It’s not that there were too many good decisions, and we had to pick one. It’s that there were functionally zero good solutions here, and we had to do something anyways.
I want to drill deeper into this last one just a little bit. I like talking about this because I think it’s very relatable. I don’t have to lay down a bunch of context or background or definitions. Everybody here is familiar with the concept of interviewing and hiring engineers. I first went down this road because I noticed we seldom felt confident in making senior hires.
Occasionally, it would feel like we were using our interviews to look for reasons to say no, and in the absence of that, we would just default to, yes, I guess. I took this concern to our CTO at the time, and they shared with me this real core principle that they wanted to espouse in our hiring, which is that we should be hiring for strengths and not lack of weakness. This was great direction for me to receive. I totally agree with the sentiment, but more importantly, this was step zero. This was me getting an executive sponsor. If we sit down and build out the stakeholders and layers of context for this problem, we’ll end up starting with the interviewers: that’s me, that’s us. This was easy context for me to page in.
At Carta, our interviewers are generally accountable for making yes or no decisions, and sometimes for senior candidates, they don’t really have conviction in that decision. Some of our interviewers felt like this might be because the interview was too easy. Other interviewers felt like when they wanted to say no, the time spent justifying that and articulating their rationale was not a good use of time. As with everywhere I’ve ever worked, ever, people felt like the responsibility of interviewing was unfairly distributed amongst engineers. This is some useful context from the interviewer’s perspective. We also have the hiring managers. What’s the context that we have there? It is extremely important to a hiring manager that they make the correct decision when hiring a candidate.
If you get this wrong, it can be really bad for your team health. We also know that the hiring managers have so much more context unique to them on the nuance of the position they’re trying to fill. They know the other people on the team. They know the specific problems coming up in the next 6 and 18 months, and there’s a lot of nuance in who they’re looking for that they’re not going to be able to articulate in a job description or communicate to every interviewer. We also know that our hiring managers sometimes struggle to understand what level to assign to a particular candidate they want to hire.
If we move up the org chart, there’s actually some more context that’s often invisible to the hiring teams, and that’s the context behind the engineering organization itself, more broadly. For the engineering organization, it’s extremely important that we make decisions consistently at the org level. Failure to do this can be really toxic to different teams and team members over time. We also knew, in this case at least, that we had some particular expectation or goal to hire N engineers in the next nine months, something like that. Carta’s flexible. Carta’s agile. We need to make sure that we have an interview process that can find engineers who can be portable, as we need to reprioritize what we’re working on one quarter over another.
Lastly, we should consider the context of the recruiters themselves. Our recruiters are generally allocated to a team or a small set of teams, and this one recruiter will be responsible for filling all of the roles on that team. This often creates challenges scheduling things, because you have a fairly small number of people on that team or teams to participate in the process, which in turn dovetails into an overall engineering hiring process that can be too slow. The recruiters will tell you, the process is just too long.
As with engineering, the recruiting team values the portability of its recruiters to be able to shift from working on one team to another. There are other layers to consider here. There’s a lot of context around candidate experience or industry trends, inclusive practices, and the state of the market more broadly. We did a ton of research to go capture context from those spaces. I’ve left them off here primarily in the interest of brevity. I do want to focus on these, because I think they’re illustrative of some interesting points. I want you to notice one thing about what I wrote up there. I didn’t write what they want. I didn’t write requirements. I wrote facts. This is the context. We will use this context to derive requirements. We will definitely find requirements that are at odds with one another, where we will have to make tradeoffs. If we try and reason about those tradeoffs by digging into the underlying context and not just the requirements, we’ll be much better positioned to reason about them and make a good decision.
Consider these two for a moment. This makes a decent argument for why teams should be autonomous to interview and hire in whatever way works best for them. These make a counterargument, basically, suggesting that we really should be defining all of our interviews org-wide, and everybody should have the same consistent process and make decisions, maybe centralize the decision-making even. There’s even a secondary effect of that, which is, it would expand the pool of interviewers that are available, making it a little bit easier to coordinate and execute on interviews.
Tradeoff Dimensions
When you put this together, I really saw two primary dimensions to our decision here, which are: who’s deciding and running the interview, and who’s deciding to hire a candidate. You could do these both locally. Teams, just go nuts, do what you want to do. Interview the way you want. Make a decision the way you want. It’s fine.
On the other end, we have, teams, you don’t get any say here. It’s all pooled hiring. The company is going to define exactly how we interview. The company is going to execute the interviews, schedule them without regard for the team the candidate is going to be on, and then make a decision globally. Some companies elect a compromise, they install a hiring committee or something. The hiring manager’s job becomes to figure out how to succeed with that committee, figure out how to run an interview that lets you make the compelling argument to close the candidates you want to hire.
If we focus on the underlying context a little deeper, we can start to see that there are some other dimensions here. When I unpack this, it starts to feel familiar to me, and it might feel familiar to you too, because it starts to look like adding layers of indirection. I personally think it feels like dependency injection, but I’ve been told I’m wrong about that. We add these new dimensions, what do we see on the left? What are we measuring? What are we actually trying to measure? What are the signals that we think help determine how good an engineer is? In the middle, how do we define what good, bad, and great looks like for those measures? On the right, how do we choose a level for a candidate?
By introducing these new dimensions, we can actually come up with a whole new tradeoff that looks different than anything that I showed you before. We can say that the company broadly wants to define what we’re measuring, and what good and bad and great looks like, and how that maps to a level. Beyond that, it’s up to the teams to figure out how best to elicit signal on those measurements, and to produce something that evaluates a candidate using vocabulary that we all understand what good, bad, and great means. Once they have a final profile of how good, bad, and great a candidate is, and we can say this is a senior engineer one, they can decide if that’s what they’re looking for or not. When I was doing this, I wasn’t thinking I’m adding dimensions. I was, but I wasn’t thinking that’s what I was doing. I do think the formalism Will presented around that is actually really powerful for helping train yourself to spot what was missing between the prior version of this slide and this one.
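One way to picture that split is the minimal Python sketch below. It is not Carta's actual process: the signal names, the rating-to-level thresholds, the level names, and the `payments_team_interview` function are all hypothetical, chosen only to show which parts are defined company-wide and which parts a team supplies.

```python
from enum import IntEnum
from typing import Callable, Dict, Set, Tuple


class Rating(IntEnum):
    """Company-wide vocabulary for what bad, good, and great mean."""
    BAD = 0
    GOOD = 1
    GREAT = 2


# Company-wide: the signals every interview loop must measure (hypothetical names).
SIGNALS = ("technical_design", "collaboration", "ownership")


def company_level_for(profile: Dict[str, Rating]) -> str:
    """Company-wide mapping from a rated profile to a level (hypothetical thresholds)."""
    missing = set(SIGNALS) - set(profile)
    if missing:
        raise ValueError(f"profile is missing signals: {sorted(missing)}")
    ratings = [profile[s] for s in SIGNALS]
    if any(r == Rating.BAD for r in ratings):
        return "no level"
    greats = sum(1 for r in ratings if r == Rating.GREAT)
    if greats == len(SIGNALS):
        return "staff"
    if greats >= 1:
        return "senior"
    return "mid"


# Team-level: any function that takes a candidate and returns a rated profile.
Interview = Callable[[str], Dict[str, Rating]]


def evaluate(candidate: str, interview: Interview,
             wanted_levels: Set[str]) -> Tuple[str, bool]:
    """Run the team's own interview, level the result with the company mapping,
    then let the team decide whether that level is what it is looking for."""
    profile = interview(candidate)
    level = company_level_for(profile)
    return level, level in wanted_levels


def payments_team_interview(candidate: str) -> Dict[str, Rating]:
    # A real team would run its own exercises here; these ratings are canned.
    return {
        "technical_design": Rating.GREAT,
        "collaboration": Rating.GOOD,
        "ownership": Rating.GOOD,
    }


level, hire = evaluate("candidate-123", payments_team_interview, {"senior", "staff"})
```

The team's interview is passed in as an ordinary function, so two teams can elicit signal in completely different ways and still produce profiles that the rest of the organization can read and level consistently.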
Finish the Fight
I do want to take a moment to point out that our role does not end at just processing context to figure out what to do. There is still an execution layer that we have to hit. The conclusion of this wasn’t that Dan said, we’re going to hire like this, and then rode off into the sunset. No, I designed a system. I demonstrated to others the value and the benefits of the process. I orchestrated the organizational change to adopt this. Because this problem, it wasn’t just ambiguous to me. It was ambiguous. It was ambiguous to everyone. Your role in these situations grows to being responsible to provide clarity and alignment to others, up and down the org chart, because you will now have more context than anybody else.
The value of bringing that context together is that it becomes easier to disseminate to those who do not have it. This interviewing stuff is just one example from this slide. It’s just full-on inception in this deck today. I chose to focus on these. It’s not because these are the things I’m best at. It’s that, at Carta, I’m the person who is best at these things, or at least I was at the time. I’m not the only person who can work on these highly ambiguous problems. There are others. These problems were just very Dan shaped.
We are trying to build good products. We are trying to build a good company that builds good products. A lot of my recent work has been focused on that latter part, building a good company. Having said that, I want to go back and emphasize this point Will made earlier about the role being one-of-one. It does not generalize. I do not want to imply that being a right hand or a deputy to the CTO or something is required to become a principal engineer. I’m not even sure I would recommend it. I don’t know how to market myself. This is not a portable role. If you go online and look for deputy CTO titles, you’re not going to find any. Not that I’ve checked.
As Carta’s needs shift yet again, I will find myself, once again, rebalancing the kinds of problems I spend my time on. That’s what you all need to do too. You need to find out what your company’s needs are. Whatever principal engineer is, it’s going to be a role. It’s not a list of past accomplishments. Figuring that out is going to be very challenging. It’s an ambiguous problem. That’s what we’ve been talking about. Try and apply the same framing that Will presented earlier to your efforts to define principal engineer. Who are the stakeholders? What do different teams need from a principal engineer? What do different teams need from the definition of a principal engineer? What does senior leadership need? What do the junior engineers need? What does product need from a principal engineer? Go through these motions. Identify some tradeoffs. Look for the missing dimensions.
At the end of the day, this guy, this is an example. It’s not a template. These principal roles are one-of-one roles. Regardless of what definition you end up landing on precisely, the one thing that’s going to be certain is that the principal engineers are going to be solving the most ambiguous problems that you have, and there’s not going to be anybody else who can do that.
Questions and Answers
Participant 1: When you’re coming in to fix an ambitious, ambiguous problem, how do you raise the level of your team so you’re not just seen as the fixer who comes in and defines requirements? How do you help your team reach the competence level of a principal engineer, where they can take on these complex issues and then be spread out throughout the organization to have that competency throughout?
Larson: One of the reasons I wrote Staff Engineer is there were a lot of things that people wanted to be true that weren’t true that I wanted to convince them of. One of them is the idea that because you think something’s important, leadership thinks it’s important too. I think that the challenge is, typically, if you want to convince the executive team that an engineering problem is really important, that’s really hard to do. It’s not impossible to do, but it’s way easier to figure out what the executive is constrained on and then go help them with that than to convince them that they should actually have something else be more important than their current priority list.
My experience in executive roles is that I always have significantly more to do than I’m able to do. People come to me with ideas of stuff to do all the time. Usually, I’m like, “That’s an amazing idea. Go talk to Dan, or go talk to Shawna”, or something. It’s really hard to actually get your item onto my priority list unless I’m really missing something. My first thought in terms of trying to not just get the staff, like the principal, is go figure out what the CTO or the head of engineering needs and figure out how to help with that. Convincing them that they should prioritize what’s important to you, if they don’t already prioritize it from their perspective, is pretty hard. Not impossible, but it usually doesn’t go that well.
Participant 2: For some of the problems, the layers are discovered much later or after multiple iterations. How frequently does it make sense to bother the executive sponsor? Because I don’t want to keep going to the executive sponsor.
Larson: Managing the attention of executives is a lot of work. It depends on the individual and how much they care. Also, whether they trust you or not. When you’re trying to prove yourself to an executive that you haven’t worked with much, you have to spend a lot of time up front just convincing them that it’s a good investment of their time. My experience is a lot of really short updates are really helpful. Where people get in trouble is they write a lot or they wait for too long and then give a huge update, and then it’s a ton of work for me to process it. I would try to give them a lot of really concise pieces and offer them the time, “If you want to chat, I’m here”, and let them guide the engagement a little bit.