Dare Mighty Things: What NASA's Endeavors Teach Us About The Power Of Calculated RISCs

Transcript

Shams: “The International Space Station represents a groundbreaking collaboration between five space agencies, transcending geopolitical boundaries to extend scientific knowledge. Centuries later, images captured from the ISS showcase the Earth’s splendor in ways unimaginable in this era. The vibrant colors, the delicate atmosphere, and the brilliant reflection of sunlight affirm Galileo’s assertion that the Earth, seen from space, is indeed more splendid than the moon. Defying the powerful institutions of his day, he defended the heliocentric model despite fierce opposition from the church and traditional scholars. Even when placed under house arrest for his beliefs, Galileo continued to publish his work. Had he stopped and conformed to the pressure of his time, he might have been forgotten.

Instead, his defiance paved the way for the scientific method as we know it today. His courage to pursue observation, experimentation, and evidence-based reasoning revolutionized how we approach science and inquiry, inspiring generations of thinkers to take risks, push boundaries, and redefine what is possible. To be more like Galileo today, we must embrace boldness and take risks when faced with challenges. Often, the greater risk is inaction, staying comfortable with the status quo, rather than pushing for change. Galileo understood that the pursuit of truth sometimes requires stepping into uncertainty, even at great personal cost. In a world that demands innovation, we must dare to question assumptions and challenge conventional wisdom by taking calculated risks, as Galileo did.”

Scope (Fear and Risk-Taking)

I’m going to talk to you about the power of calculated RISCs. The word RISC is not spelled wrong. It’s a play on words that will make sense in a little bit. Let’s get started with talking a little bit more about risks. Each of you took a risk today. You chose to leave your daily jobs to come to this conference. You risked coming back to a huge backlog in your mailboxes, your Slacks, meetings, emails. While you’re here at QCon San Francisco, some decisions will get made without you.

You’re not necessarily going to be able to champion the issues that you really wanted to champion in debates that may be happening right now. Before you look to the door asking yourself how awkward it would be if you were to just exit right now, let me at least make a case to you why the FOMO is real. The FOMO is worth it, but the upside is better. You’re surrounded here by other bold risk takers who also abandoned their jobs to be here with you, to learn from each other, to network, and to make new connections that may make a lasting impact on your career, and perhaps beyond your career, too.

I started coming to QCon 15 years ago as a young engineer at NASA. It was incredibly hard for me to step out of my daily routine. The FOMO consumed me. It is here that I learned the value of networking, disconfirming my beliefs with new ideas. It is here that I learned to put aside my fears and present what I had been working on. It’s scary. As a volunteer organizer at QCon for a few of the conferences, I learned to overcome my fear of sending emails to complete strangers who are renowned experts in their fields to ask them to come speak at QCon for free.

I got so many rejections, usually by people just not responding. I got so many rejections that I became numb to it. I got so many unexpected accepts that I realized that it was ok to just ask. Fear is the primary reason we often avoid taking risks, but fear is often irrational. Had I not given up the week to come to QCon 15 years ago, my career might have been very different. I would not have had the confidence to go ask people, go ask strangers, for things that I would normally be too shy to ask for.

We’re going to discuss overcoming fear to take calculated risks. I’ll share a framework for risk-taking with examples from history, from space, and industry. I hope that you walk out of here inspired to make a few bold moves with a framework on how to make the most out of your newfound love of risk-taking. Before you go decide which risk to take, it is incredibly crucial to make sure that the risk is worth taking. Is your risk grounded by passion? Risk without passion is like setting sail without a destination. The best risks, or the risks with the best outcome, often take years to manifest and to materialize. You will encounter countless challenges along the way. Make sure, when you’re thinking of a bold move, that you are passionate and you know what the upside is.

A Risky Endeavor (The Mars Rover)

Let me tell you a little bit about a risky endeavor. This is a journey of 140 million miles. The rover, which is the size of a Mini Cooper, is just entering the Martian atmosphere. The heat shield does its best to protect the rover from temperatures exceeding 1600 degrees. The parachutes deploy, but then we realize that the Martian atmosphere is 100 times thinner than the Earth’s atmosphere, so they don’t help that much. Like any mad scientist’s wildest dream, we deploy jetpacks. After traveling 140 million miles, this rover is to land on its wheels on Mars. The sky crane flies away. This is no ordinary rover. It’s got a rocker-bogie suspension so that when it’s traversing the challenging terrain of Mars, the body of the rover can stay level. It’s got lasers. It’s got an arm that you can use to acquire samples and do science on board. This is how we explore Mars today.

This little rover, this landed in 2012. Perhaps one of the most profound inventions that this rover did was popularize the selfie. Selfies were not as common in 2012 as they are today. This little rover is obsessed with itself. If you think this is one selfie, it’s not. This is not an artist rendering. This is a collection of photos that the rover took just by covering every inch of its existence. This is dozens of images that have been stitched together. It’s fulfilling to know that the image processing pipeline I built to process images is being put to good use. Why is it there taking selfies? It gets a little bored. It’s lonely on Mars. You got to have something to do with your recreational time. The view is nice, but it gets tiring, so it takes selfies and sends it back.

Metrics really matter when you’re taking risks. This is a really important metric, 299,792,458. Any guesses as to what that is? Speed of light, that’s right. Mars is anywhere between 54.5 billion and 400 billion meters away, depending on where the planets are in their orbits. When Earth and Mars are on opposite sides of the sun, it’s going to be 400 billion meters away. For any high bandwidth data transfer use cases, data gets relayed to the rover via a network of about five satellites that are orbiting Mars.

If you take the 400 billion meters, the farthest that the planets get, and you divide it by the speed of light, pure coincidence, it’s roughly 1337 seconds, a bit over 22 minutes. A more socially versed friend told me that this is how you spell leet with numbers. The point here is that even if you really wanted to, you can’t operate the Mars rover with a joystick. You have to send it commands, and that fills its whole day. It gets a set of commands. It tries to do its best, and then it reports back. Then you lather, rinse, repeat every sol. A sol is a Martian day, about 40 minutes longer than an Earth day.
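The signal-delay arithmetic above can be checked with a short sketch. The function name and the sample distances are illustrative assumptions; the real Earth-Mars separation changes continuously. Dividing a round 400 billion meters by the speed of light gives about 1334 seconds, and the 1337 figure corresponds to a separation of roughly 400.8 billion meters.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter


def one_way_light_time_s(distance_m: float) -> float:
    """Seconds a radio signal needs to cover distance_m in vacuum."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


# Illustrative distances taken from the talk (rounded figures).
closest = one_way_light_time_s(54.5e9)   # about 182 s, roughly 3 minutes
farthest = one_way_light_time_s(400e9)   # about 1334 s, roughly 22 minutes

# Distance at which the one-way delay is exactly 1337 seconds.
leet_distance_m = 1337 * SPEED_OF_LIGHT_M_PER_S

print(f"closest:  {closest:.0f} s one way, {2 * closest:.0f} s round trip")
print(f"farthest: {farthest:.0f} s one way, {2 * farthest:.0f} s round trip")
print(f"1337 s corresponds to {leet_distance_m / 1e9:.1f} billion meters")
```

At the far end of the range, a single command-and-acknowledge round trip takes about 44 minutes, which is why joystick-style control is not an option.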

What were the risks here? There are a few risks that were taken in order to pull this mission off. A really surprising risk is that the flight software and the driving software for this rover wasn’t actually done when it was launched. It takes nine months to get to Mars. While the rover is in its cruising altitude, which means different things here than it does on the plane right here, while it’s in its cruising altitude, or while it’s in cruise, scientists were busy writing up the software to make sure that the rover can operate itself safely. Bugs were being found, they were getting fixed. We were preparing a package to upload to it so it can actually land properly and do science in a safe manner. Inaction is often a really bad move when risks come.

You can wait till everything is done, but you’ll never actually ever be done with software. There’s always going to be more bugs. The problem is that if you wait too long, you miss the launch window, and you have to wait two years. Some of this is just necessity. Oftentimes, you will find yourselves in the midst of a bold bet or a risk that’s already been decided. You have no choice. You’re in the middle of the freeway. You got to go across. You have to make the most of it. It’s really important in this framework to understand the difference between productive and unproductive stress that comes with risks. If you find yourself in a situation, it’s easy to just spend time commiserating on how terrible the idea is, or how screwed we all really are. It’s cathartic. It feels like it might help your emotional state, but it doesn’t.

What really helps is being action oriented. What can we do to make the most out of the situation that we’re in? The other part of this particular rover’s journey was testing in prod. This is as YOLO as it gets. You can’t simulate the Martian atmosphere on Earth. The jetpack would react very differently on Earth than it does on Mars. The parachute will react very differently. There’s a whole lot of deploy straight to production. We do this all the time on Earth. I don’t care how beautiful your staging environment is, you will always be learning lessons in prod. Everything is a simulation. You learn lessons in prod, except here, there’s no rollback, there’s no canaries, no pagers. Just one shot to bet it all.

Framework 1: Think Bigger

The first advice I have, or the framework, there’s three parts of it, is oftentimes when we are considering a particular bold bet to make, it is incredibly easy to bikeshed on, or to get fixated on the downsides. That’s where most of us start. When you’re about to take on a new job, when you’re about to start a business, it’s easy to fixate on what could go wrong. That is productive, but only with a balance of a focus on what can go right. When I started in the entrepreneurial world, I started going to VC pitches, as an industry person to listen in. I thought that my job in these pitches was to point out all of the problems that the entrepreneurs had in their business plan.

I realized that I was the odd person out in these VC pitch sessions, because while I was there, looking at how there are 99 ways in which the particular company being pitched can fail, the really seasoned investors were looking at, what is the one thing that can go right? It’s a very different framework. It’s a hard framework for me as an engineer to employ, because as engineers, we’re trained to look for things that can go wrong, to look around the corner, to have that paranoia. When it comes to making bold moves, make sure that you’re not starting from what can go wrong. Your starting point has to be what can go right. How do I make that bigger? How do I double down on the largest possible outcome? If you can’t answer that, then the risk may not be worth taking.

In terms of reckless entrepreneurship, there’s an entrepreneur who said, “I believe we are the best place in the world to fail, and we have plenty of practice. Failure and invention are inseparable twins.” Any guesses as to who said this? Jeff Bezos. Amazon has had its own set of failures, and we’ll talk more about some of them. When you are making bold risks, you’re going to fail. It’s ok. Even if you do everything right, somebody is going to sit there and point out the things that you did wrong. After the Curiosity rover landed, this is CBS News, space, and they had this awesome article that said, “slow but rugged”. It basically pointed out that the microchip controller that’s operating the Mars rover has less horsepower than a typical cell phone. That’s frustrating to read, but it’s accurate.

That rover that you saw, everything inside of it, the entry, descent, landing, the autonomous driving, the path planning, everything gets done on this incredible RAD750 processor. It’s a beast, 200 megahertz, 256 megabytes of RAM, 400 million floating point operations per second. That last one, just to put that in context, the iPhones are measured in teraflops, this one is in megaflops. It’s just got this one really cool feature, though, which is that it is radiation hardened. The scientists and the software developers at NASA JPL have to work probably 10 times as hard to make the rover do things on this particular piece of hardware that also runs a real-time operating system. It’s not running your typical Linux. You’re not running TensorFlow on it. It’s semi arcane.

Eight years later, when the Perseverance rover was being built, the scientists really wanted to land it at a site that was a lot more challenging. They found Jezero Crater. There’s a lot of craters on Mars, but this particular crater was very interesting because it had a very clear delta, like water flowing. It had an outlet. It had an inlet. There’s a very good chance that this crater contained water at some point in time. The terrain around it was incredibly challenging. The Curiosity rover had a 20-kilometer accuracy in where it would land compared to where we wanted it to land.

For Perseverance, because of the challenging terrain, the accuracy range had to be within 40 meters. That’s a 500 times smaller radius that the rover has to land in. The engineers at NASA worked on ways to modulate the rover and to improve the algorithms to do entry, descent, and landing with a higher accuracy. They invented what’s called terrain relative navigation. Terrain relative navigation basically is evaluating the visual features that are on the ground and then using that to steer the rover in the right direction, much like your self-driving cars would do, except there’s no GPS and the terrain is a little different, with a lot fewer features on Mars. Terrain relative navigation was required to pull this off. It’s a lot of software that has to be written to pull this off.

If anybody here really wants to be a roboticist at NASA, and finds themselves in an interview where somebody says, write terrain relative navigation for me, I’ve got a cheat sheet for you. You take pictures as you’re descending, you compare them to an orbital map that you have, and you go left or right based on where things are. That’s essentially what terrain relative navigation is. This was something that could have really benefited from a massive compute platform. We had very quantified objectives here, 40 meters as opposed to 20 kilometers. While it’s tempting to just say, let’s just change the compute platform to be able to land within that 40-meter radius, it’s important to internalize that the problem to be solved is not the chipset.
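The cheat sheet above can be turned into a toy sketch. This is not the flight algorithm: it assumes a noise-free descent image at the same scale and orientation as the orbital map, and it uses brute-force sum-of-squared-differences matching, which is a stand-in technique chosen for brevity. All names (`match_cost`, `locate_patch`, `steering_offset`) and the grid sizes are hypothetical.

```python
import random

Grid = list[list[float]]


def match_cost(orbital_map: Grid, image: Grid, top: int, left: int) -> float:
    """Sum of squared differences between image and the map window at (top, left)."""
    return sum(
        (orbital_map[top + r][left + c] - image[r][c]) ** 2
        for r in range(len(image))
        for c in range(len(image[0]))
    )


def locate_patch(orbital_map: Grid, image: Grid) -> tuple[int, int]:
    """Return the (row, col) map position where the descent image matches best."""
    rows = len(orbital_map) - len(image) + 1
    cols = len(orbital_map[0]) - len(image[0]) + 1
    candidates = ((r, c) for r in range(rows) for c in range(cols))
    return min(candidates, key=lambda pos: match_cost(orbital_map, image, *pos))


def steering_offset(current: tuple[int, int], target: tuple[int, int]) -> tuple[int, int]:
    """Rows and columns to move to get from the estimated position to the target."""
    return (target[0] - current[0], target[1] - current[1])


# Synthetic 30x30 "orbital map" of random brightness values (made-up data).
rng = random.Random(0)
orbital_map = [[rng.random() for _ in range(30)] for _ in range(30)]

# Pretend the descent camera sees the 6x6 window whose corner is at (12, 17).
true_pos = (12, 17)
descent_image = [row[17:23] for row in orbital_map[12:18]]

estimated_pos = locate_patch(orbital_map, descent_image)
correction = steering_offset(estimated_pos, target=(10, 10))
print(estimated_pos, correction)
```

A real system would also have to cope with changing altitude, lighting, camera tilt, and a tight compute budget, which is where the hard engineering lies.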

The problem to be solved is, can we pull this off? There’s a famous alleged quote from Einstein that does a better job than this one. It’s not clear whether he actually said this or not. Apparently, when asked what he would do if he had 60 minutes to save the Earth by solving a problem, he would spend 59 minutes deciding what the problem is and 1 minute solving it. What he’s actually said, though, is that the formulation of a problem is more essential than the solution. Sometimes it’s really important in a design review to just step out and say, what is the problem we’re trying to solve? It’s one of the most profound questions that principal engineers ask when they see junior engineers arguing over a particular technology choice. It’s really important to understand what is the problem that we’re solving.

Framework 2: Derisk

The other part of this, why we didn’t jump to a new platform right away, is you got to derisk your bold risks as well. The choice isn’t always as simple as taking the risk or not taking it. It’s about maximizing the upside and then reducing the blast radius of the downside. That’s what distinguishes your ability to take bold risks or not. You got to stop thinking about it as black and white, taking it or not taking it. This is where the Snapdragon 801 chip came into play. It’s a Qualcomm chip. It’s actually quite common. There aren’t that many RAD750 chips around; the PowerPC 750 design it is based on is what Macs used in the ’90s. That was great. You see a lot of Snapdragon 801s being developed.

This particular model, if anybody here likes to use Android, if you had a Galaxy S5 in 2014, that’s the chip in the Galaxy S5, and a bunch of other Android phones. Many Android phones use some variant of a Snapdragon chip. This, from a perspective that you can take, I’m not saying that I take this perspective, but one perspective could be that this is a toy. This is not meant for serious space exploration on a billion-dollar rover, it’s for a $1,500 phone.

Even if you don’t take that uncharitable of a perspective, it is hard to imagine a consumer grade chip surviving the thermal cycles, the vibration at the launch, the radiation on Mars. We go back and look at the upside. The chip is incredible. It runs Linux, as opposed to a real-time operating system. Don’t get fixated on the processing power, gigaflop numbers, those are like miles per gallon on the car that you might buy. Real-world application really matters. If you need more evidence of that, just see how fast I can drive a race car on the track versus an actually good driver. Application matters a lot more than what the device can do.

Because the device has Linux, because scientists can run TensorFlow on it, they can run PyTorch on it, the productivity goes up meaningfully as well, and the gains that we get are meaningfully larger than the 100x gains that are over there. Power draw, that’s another thing. In a spacecraft, you don’t have an infinite supply of power the way you do in an autonomous car or on Earth. This thing draws a couple of watts of power, like 2 to 5 watts of power. It’s 10% of what the RAD750 requires. It weighs a tenth of the RAD750. It’s good on every dimension except radiation hardening.

This chip, it could change the world in terms of how we explore. Can we risk this multibillion-dollar rover? This particular rover, Perseverance, has a better arm, better instruments, better cameras, better sample acquisitions, but if the main brain of the system fails, all of that goes away. How can we derisk it? A lot can go wrong, but what can go right? What is something that we could do that we could never do before? Putting Snapdragon as the main processor was too risky, and it was a one-way door. A last-minute bug could delay the launch. Hardware failure could jeopardize the whole mission. What if we could amortize the risk? What if we could try it for the first time in a way that reduces blast radius?

Ingenuity is a helicopter that flew on Mars. Remember the thinner atmosphere? Helicopters climb by generating lift as their rotor blades spin, but if there’s less air density, they’ve got to work that much harder. That’s really the easy part. I’m not being charitable to the aerospace engineers there, but you make the propeller spin faster and it’ll go higher. I’m a software person, so I have to do this.

The hard part is because of the atmosphere being the way it is, the real-time controls require a lot more precision. Little moves can really mess up the helicopter. The terrain relative navigation here is a lot harder. Terrain relative navigation when you’re descending from space covering a very large area and have a reference map to work on, is a lot easier problem computationally than doing it in real time, and especially when flying across boring terrain on Mars. When I say boring, it just means there’s no building, sometimes things look really the same. It makes life a lot harder.

Doing this, the Ingenuity helicopter turned out to be a platform where we could turn this into a two-way door. Two-way door is a concept introduced by Jeff Bezos in a shareholder letter for Amazon in 2015, where there are some decisions that are nearly impossible or very expensive to undo, the other decisions are a lot easier. As companies get larger, oftentimes, we end up spending the same amount of time debating one-way doors as we debate the two-way doors. This is hard. If you take a mathematical perspective on this, it means that the one-way door decisions are getting the same amount of time as a two-way door decision.

The one-way door decisions are getting a lot less time than they should, and the two-way door decisions, things that are simpler are being bikeshed upon and there’s indecisiveness. The job of a senior leader is to not take one-way door decisions. The job of a senior leader in a company is to convert as many of your one-way door decisions into two-way door decisions. You can take a unique perspective. You can reduce the blast radius. Reducing blast radius is a really good way to transform a one-way door decision into a two-way door decision.

That’s exactly what we did with Ingenuity. Ingenuity was a huge success. The target was very modest. We wanted to see if we can fly it five times. This is not an artist rendering, these are actual photos. This is it filming its shadow. This video is a little dated. It was made to commemorate Ingenuity’s 50th flight. It eventually completed a total of 72 flights. It went as high as 79 feet, with 128 minutes in flight. It explored for 11 miles. Ingenuity can’t fly anymore, and it’s not because the chip got struck by a cosmic ray. It had a bad landing, and the propellers were damaged. It far exceeded its duty cycle. It’s inspiring and paving the way for future missions.

There’s another concept that’s being talked about right now, which is a hexacopter. It’s got six rotors to not only go explore, but also carry samples around Mars. There’s also Dragonfly. This is a mission that is funded to go to Titan, a moon of Saturn. Titan is a very different place than Mars, with its own unique challenges. This is a fully contained mission. This is not a little helicopter that sits on the mother ship. It’s got its own RTG. It’s got its own scientific capabilities where it can acquire samples, analyze them, fly to the next spot, and do so again.

Let’s bring this down to be a little more terrestrial. The ability to transform one-way door decisions into two-way doors is not unique to Amazon or to NASA. We see this with Apple. In 2020, Apple released the M1 processor for the MacBooks. Some of us, especially developers, might remember, this was a little scary to transition over from Intel to the ARM chip. Was it worth it? I got one of those, and I was so happy with the battery power. What did Apple do? They launched three of their products. It was the MacBook Air, MacBook Pro, and Mac Mini, while they simultaneously continued to carry the Intel processors. This was a way to reduce blast radius.

That story didn’t start in 2020, it actually started in 2010 when the iPhone 4 was launched. This was the first time Apple put in its own silicon on a device. Before that, it was Samsung silicon. This is with the launch of the iPad. Today, the M4 chip that is on the latest MacBook Pro is also available in your iPad Pros, which is basically giving Apple a much higher leverage in terms of the innovations that they do on that particular platform for multiple devices. This is all just phased deployment. Anybody who’s an SRE, who’s doing continuous deployment, this is just phased deployment.

If you are deploying to one AZ at a time and two availability zones after that, and then spreading across multiple regions slowly and being ready to roll back, it’s essentially that, but in industry. When you’re doing this, when you’re taking these bold bets, you got to inspire your team. Oftentimes, it’s not taking the risk. It’s about making sure that everybody in the team knows what the upside is.
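The phased deployment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular deployment tool: the zone names, the `phased_rollout` function, and the `deploy`, `healthy`, and `rollback` callbacks are all hypothetical.

```python
from typing import Callable

# Hypothetical rollout plan: each phase widens the blast radius.
PHASES = [
    ["us-west-2a"],                              # one availability zone first
    ["us-west-2b", "us-west-2c"],                # then two more
    ["us-east-1a", "us-east-1b", "eu-west-1a"],  # then other regions
]


def phased_rollout(
    phases: list[list[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy phase by phase; if a phase is unhealthy, undo everything touched so far."""
    deployed: list[str] = []
    for phase in phases:
        for zone in phase:
            deploy(zone)
            deployed.append(zone)
        if not all(healthy(zone) for zone in phase):
            for zone in reversed(deployed):
                rollback(zone)
            return False
    return True


# Demo with fake callbacks: pretend us-west-2c fails its health check.
log: list[tuple[str, str]] = []
ok = phased_rollout(
    PHASES,
    deploy=lambda zone: log.append(("deploy", zone)),
    healthy=lambda zone: zone != "us-west-2c",
    rollback=lambda zone: log.append(("rollback", zone)),
)
print(ok)
for entry in log:
    print(entry)
```

In the demo, the failure is caught in the second phase, so the third phase is never touched. That is the blast-radius reduction the talk attributes to Apple shipping the M1 in three products while still selling Intel machines.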

Now, armed with the success of what the Snapdragon chipset has allowed JPL to do, we’re seeing more teams pop up, and more scientists popping up and saying, what can we do with more advanced capabilities? This is the Nancy Grace Roman Space Telescope. We’re experimenting on doing high-order wavefront sensing. This particular telescope, you’ve heard of Hubble or James Webb, this is the next version of that. Basically, it’s got optics. Optics have aberrations. High-order wavefront sensing simply corrects for those aberrations, so that you can denoise the light that is coming from stars and focus on dark energy and dark matter and really learn the secrets of the cosmos.

A lot of those algorithms are now being ported over to the Snapdragon as well. Synthetic Aperture Radar is another really cool instrument. From space or from an airplane, it can detect the topography of the terrain that it’s over. What scientists typically do is study how the topography changes across multiple passes of the orbit. It can see through clouds. It works at night. It has some really profound purposes, like being able to assess damage after an earthquake. It was used for that in previous earthquakes as well. This is all happening across JPL. Lots of new applications are showing up across radar, signal processing, telecom, autonomy.

Framework 3: Be Wrong, a Lot

I’ll leave you with the third and final piece of a risk-taking framework, which is to be wrong a lot. If you’re good at course correcting, then being wrong may be less costly than you think. Imagine if you’re at a bar and you’re shooting darts, and the game is to see who can get closest to the center. You can spend a lot of time aiming and doing the whole ready, aim, fire thing, or you can do ready, fire, aim, and if you shoot the dart fast enough, you can get a second shot, and you can get more shots than your opponent. That’s often true. In a lot of decisions that you make, indecision is probably way worse than not taking a risk at all. What you’ll have to do is, instead of waiting for more data to arrive, you have to just get better at course correcting.

Just like at the bar, if you’re just shooting a lot, you’re going to get better with practice as well. Failures are going to happen, and you have to embrace them. Sometimes you just have to lean into those failures. Everything fails all the time. What we learned from Ingenuity, perhaps we were too concerned with the processor, how the processor could fail.

Primarily because of radiation, the cosmic rays that are coming on Mars, the JPL has full Markov models on how much extra radiation there is and the likelihood of a cosmic ray hitting a chip on Mars versus Earth. What if we could really lean in and just know that, this processor, which is a tenth of the weight and a fifth of the power draw is more prone to radiation driven upsets? I know something a platform engineer can teach NASA, redundancy. You can have two processors, one watching the other. That’s exactly what this particular paper covers.

In a platform world, this is just music to my ears, “The groundbreaking scientific benefits of these modern high-performance computing platform is of course balanced with the concerns for known transient radiation upsets, as demonstrated to exist with the data provided in this paper. The answer to this concern is to develop resilient and robust architecture that adapt and repair errors while still providing functionality”. What they did, is they just put two processors in this model, and one just watches the other. You can get pretty good 99.99% availability in a continuous operation, by having these two chips. You don’t have to stop at two, you can actually go deeper and make quorum-based systems that are verifying each other’s work as well.
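The redundancy argument above can be made concrete with a small sketch. Two things here are assumptions rather than figures from the talk or the paper: the 99% per-processor availability, and the idea that the processors fail independently. Correlated radiation events would make the real number worse. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def combined_availability(unit_availability: float, units: int) -> float:
    """Availability when any one of `units` independent units is enough."""
    return 1 - (1 - unit_availability) ** units


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value a strict majority of replicas agree on, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


# Assumed figure: each processor is up 99% of the time.
single = 0.99
pair = combined_availability(single, units=2)    # about 0.9999
triple = combined_availability(single, units=3)  # about 0.999999

# Quorum check: one replica suffered a bit flip, the other two still agree.
agreed = majority_vote([42, 42, 17])
no_quorum = majority_vote([1, 2, 3])

print(f"{pair:.4%} {triple:.6%} {agreed} {no_quorum}")
```

Under these assumptions, two 99% processors yield the four nines mentioned above. With only two units one can detect a disagreement but not tell which unit is right, which is why quorum designs use three or more.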

Summary

When taking a risk, think bigger, derisk, and be willing to change your mind. Galileo is who we started talking about. His life was not all that glamorous. People think he invented the telescope, he didn’t. He just improved it. A Dutch scientist had built a telescope that had 3x magnification, he built one that had 20x magnification. He improved the design. What he really contributed to science is he discovered a whole bunch of moons that we didn’t know existed. He discovered that the moon’s surface is not smooth, it’s got craters, which was hard to believe at the time. He insisted that the Earth revolves around the sun, rather than the other way around. He had his failures.

He was actually a medical student, and could not really succeed at medicine because he didn’t like it. He started the trend of dropping out well before Silicon Valley did. There are failures, and it’s ok. I’ll share a couple of really quick failures that we don’t talk much about, but we learn from them. There was the Fire Phone. Anybody remember the Mac Cube? This was I think in the ’90s. It looked like a toaster oven, and people thought it was broken when they got it, because it had this little hole in there to put your CD-ROM in. It was a total flop. Those are not the products that we remember Amazon and Apple by. Today, as you head off, engage. It’s ok to be disconnected from work for a little bit. Meet some friends. Make some connections. Think about the choices that you can make that are not just optimizing for the next step, but your eventual goals.
