Software

Opinion | ‘Something Will Go Wrong’: Anthropic’s Chief on the Coming A.I. Disruption

News Room
Published 12 February 2026 · Last updated 12 February 2026 at 8:51 AM

I want to try and focus on scenarios where A.I. goes rogue. I should have had a picture of a Terminator robot to scare people as much as possible. I think the internet… The internet does that for us. Are the lords of artificial intelligence on the side of the human race? “My prediction is there’ll be more robots than people.” “The physical and the digital worlds should really be fully blended.” “I don’t think the world has really had the humanoid robots moment yet. It’s going to feel very sci-fi.” That’s the core question I had for this week’s guest. He’s the head of Anthropic, one of the fastest-growing A.I. companies. Anthropic is estimated to be worth nearly $350 billion, and it’s been win after win for Anthropic’s Claude Code. He’s a utopian of sorts when it comes to the potential effects of the technology that he’s unleashing on the world. “You know, it will help us cure cancer. It may help us to eradicate tropical diseases. It will help us understand the universe.” But he also sees grave dangers ahead and massive disruption, no matter what. “This is happening so fast and is such a crisis, we should be devoting almost all of our effort to thinking about how to get through this.” Dario Amodei, welcome to Interesting Times. Thank you for having me, Ross. Thank you for being here. So you are, rather unusually for a tech C.E.O., an essayist. You have written two long, very interesting essays about the promise and the peril of artificial intelligence. And we’re going to talk about the perils in this conversation. But I thought it would be good to start with the promise and with the optimistic vision, indeed, I would say the utopian vision, that you laid out a couple of years ago in an essay entitled “Machines of Loving Grace,” and we’ll come back to that title, I think, at the end. But I think a lot of people encounter A.I. news through headlines predicting a bloodbath for white-collar jobs, these kinds of things. Sometimes your own quotes — Have used my own quotes — Yes. Have encouraged these things. And I think there’s a commonplace sense of “What is A.I. for?” that people have. So why don’t you answer that question, to start out — if everything goes amazingly in the next five or 10 years, what is A.I. for? Yeah, so for a little background: before I worked in tech at all, I was a biologist. I first worked on computational neuroscience, and then I worked at Stanford Medical School on finding protein biomarkers for cancer, on trying to improve diagnostics and cure cancer. And one of the observations I most had when I worked in that field was the incredible complexity of it. Each protein has a level localized within each cell. It’s not enough to measure the level within the body or the level within each cell; you have to measure the level in a particular part of the cell, and the other proteins that it’s interacting with or complexing with. And I had the sense of, “Man, this is too complicated for humans.” We’re making progress on all these problems of biology and medicine, but we’re making progress relatively slowly. And so what drew me to the field of A.I. was this idea: could we make progress more quickly? Look, we’ve been trying to apply A.I. and machine learning techniques to biology for a long time. Typically they’ve been used for analyzing data, but as A.I. gets really powerful, I think we should actually think about it differently. We should think of A.I. as doing the job of the biologist, right?
Doing the whole thing from end to end. And part of that involves proposing experiments, coming up with new techniques. I have this section where I say, look, a lot of the progress in biology has been driven by this relatively small number of insights that let us measure or get at or intervene in the stuff that’s really small. You look at a lot of these techniques: they were invented very much as a matter of serendipity. CRISPR, which is one of these gene-editing technologies, was invented because someone went to a lecture on the bacterial immune system and connected that to the work they were doing on gene therapy. And that connection could have been made 30 years ago. And so the thought is: could A.I. accelerate all of this, and could we really cure cancer? Could we really cure Alzheimer’s disease? Could we really cure heart disease? And more subtly, some of the more psychological afflictions that people have — depression, bipolar — could we do something about these? To the extent that they’re biologically based, which I think they are, at least in part. So I go through this argument: well, how fast could it go, if we have these intelligences out there who could do just about anything? And I want to pause you there, because one of the interesting things about your framing in that essay, and you returned to it, is that these intelligences don’t have to be the kind of maximal godlike superintelligence that comes up in A.I. debates. You’re basically saying, if we can achieve a strong intelligence at the level of peak human performance — peak human performance, yes — and then multiply it, right, to what? Your phrase is “a country of geniuses.” A country — you have 100 million of them. Right. A hundred million — each trained a little differently, or trying a different problem. There’s benefit in diversification and trying things a little differently. But yes. So you don’t have to have the full machine God, you just need to have 100 million geniuses. You don’t have to have the full machine God, and indeed, there are places where I cast doubt on whether the machine God would be that much more effective at these things than the 100 million geniuses. I have this concept called the diminishing returns to intelligence. Economists talk about the marginal productivity of land and labor; we’ve never thought about the marginal productivity of intelligence. But if I look at some of these problems in biology: at some level, you just have to interact with the world; at some level, you just have to try things; at some level, you just have to comply with the laws, or change the laws, on getting medicines through the regulatory system. So there’s a finite rate at which these changes can happen. Now, there are some domains, like if you’re playing chess or Go, where the intelligence ceiling is extremely high. But I think the real world has a lot of limiters. So maybe you can go above the genius level. But sometimes I think all this discussion of whether you could use a moon of computation to make an A.I. God is a little bit sensationalistic and beside the point, even as I think this will be the biggest thing that ever happened to humanity. So, keeping it concrete, you have a world where there’s just an end to cancer as a serious threat to human life, an end to heart disease, an end to most of the illnesses that we experience that kill us, possible life extension beyond that. So that’s health. That’s a pretty positive vision. Then talk about economics and wealth.
What happens to wealth in the 5-to-10-year A.I. takeoff? So again, let’s keep it on the positive side, because there will be plenty when we get to the negative side. But we’re already working with pharma companies. We’re already working with financial industry companies. We’re already working with folks who do manufacturing, and of course, we’re especially known for coding and software engineering. So just the raw productivity, the ability to make stuff and get stuff done: that is very powerful. And we see our company’s revenue going up 10x a year, and we suspect the wider industry looks something similar to that. If the technology keeps improving, it doesn’t take that many more 10x’s until suddenly you’re saying: oh, if you’re adding $1 trillion of revenue a year across the industry, and the U.S. GDP is 20 or 30 trillion, I can’t remember exactly, then you must be increasing GDP growth by a few percent. So I can see a world where A.I. brings developed-world GDP growth to something like 5, 10 or 15 percent. I mean, there’s no science of calculating these numbers; it’s a totally unprecedented thing. But it could bring it to numbers that are outside the distribution of what we saw before. And again, I think this will lead to a weird world. We have all these debates about how the deficit is growing. If you have that much GDP growth, you’re going to have that much in tax receipts, and you’re going to balance the budget without meaning to. But one of the things I’ve been thinking about lately is that one of the assumptions of our economic and political debates is that growth is hard to achieve. It’s this unicorn. There are all kinds of ways you can kill the golden goose. We could enter a world where growth is really easy, and it’s the distribution that’s hard, because it’s happening so fast. Right, the pie is being increased so fast. So before we get to the hard problem, one more note of optimism, then, on politics, I think. And here it’s a little more speculative. I mean, all of this is speculative, but I think it’s a little more so. You try and make the case that A.I. could be good for democracy and liberty around the world, which is not necessarily intuitive. A lot of people say incredibly powerful technology in the hands of authoritarian leaders leads to concentrations of power and so on. And I talk about that in the other essay. But just briefly, what is the optimistic case for why A.I. is good for democracy? Yeah, I mean, absolutely. So yeah, with “Machines of Loving Grace” I’m just like: let’s dream, let’s dream about how it could go. Well, I don’t know how likely it is, but we’ve got to lay out a dream. Let’s try and make the dream happen. So for the positive version, I admit there that I don’t know that the technology inherently favors liberty. I think it inherently favors curing disease and it inherently favors economic growth. But I worry that it may not inherently favor liberty. What I say there is: can we make it favor liberty? Can we make the United States and other democracies get ahead in this technology? The fact that the United States has been technologically and militarily ahead has meant that we have had throw weight around the world, through and augmented by our alliances with other democracies. And we’ve been able to shape a world that I think is better than the world would be if it were shaped by Russia or by China or by other authoritarian countries. And so can we use our lead in A.I. to shape liberty around the world?
There are obviously a lot of debates about how interventionist we should be, how we should wield that power. But I’ve often worried that today, through social media, authoritarians are kind of undermining us, right? Can we counter that? Can we win the information war? Can we prevent authoritarians from invading countries like Ukraine or Taiwan by defending them with the power of A.I., with giant swarms of A.I.-powered drones? Which we need to be careful about; we ourselves need to be careful about how we build those. We need to defend liberty in our own country, but is there some vision where we kind of re-envision liberty and individual rights in the age of A.I., where we need in some ways to be protected against A.I.? Someone needs to hold the button on the swarm of drones, which is something I’m very concerned about, and that oversight doesn’t exist today. But also think about the justice system today. We promise equal justice for all, right? But the truth is, there are different judges in the world. The legal system is imperfect. I don’t think we should replace judges with A.I., but is there some way in which A.I. can help us to be more fair, to help us be more uniform? It’s never been possible before, but can we somehow use A.I. to create something that is fuzzy, but where you can also give a promise that it’s being applied in the same way to everyone? So I don’t know exactly how it should be done. And I don’t think we should replace the Supreme Court with A.I. That’s not what I’m proposing. Well, we’re going to talk about that. But yeah, just this idea: can we deliver on the promise of equal opportunity and equal justice by some combination of A.I. and humans? There has to be some way to do that. And so, just thinking about reinventing democracy for the A.I. age and enhancing liberty instead of reducing it. Good, so that’s good. That’s a very positive vision. We’re leading longer lives, healthier lives. We’re richer than ever before. All of this is happening in a compressed period of time, where you’re getting a century of economic growth in 10 years. And we have increased liberty around the world and equality at home. O.K., but even in the best-case scenario, it’s incredibly disruptive. And this is where the lines you’ve been quoted saying come in: 50 percent of white-collar jobs get disrupted, or 50 percent of entry-level white-collar jobs, and so on. So on a five-year time horizon or a two-year time horizon, whatever time horizon you have, what jobs, what professions are most vulnerable to total A.I. disruption? Yeah, it’s hard to predict these things, because the technology is moving so fast and moves so unevenly. So here are at least a couple of principles for figuring it out, and then I’ll give my guesses at what I think will be disrupted. So one thing is, I think the technology itself and its capabilities will be ahead of the actual job disruption. Two things have to happen for jobs to be disrupted or for productivity to occur, because sometimes those two things are linked. One is the technology has to be capable of doing it. And the second is this messy thing where it actually has to be applied within a large bank or a large company. Think about customer service or something: in theory, A.I. customer service agents can be much better than human customer service agents. They’re more patient, they know more, they handle things in a more uniform way. But the actual logistics and the actual process of making that substitution takes some time.
So I’m very bullish about the direction of the A.I. itself. I think we might have that country of geniuses in a data center in one or two years, and maybe it’ll be five, but it could happen very fast. But I think the diffusion through the economy is going to be a little slower, and that diffusion creates some unpredictability. So an example of this, and we’ve seen it within Anthropic: the models writing code has gone very fast. I don’t think it’s because the models are inherently better at code. I think it’s because developers are used to fast technological change and they adopt things quickly, and they’re very socially adjacent to the A.I. world, so they pay attention to what’s happening in it. If you do customer service or banking or manufacturing, the distance is a little greater. And so I think six months ago, I would have said the first thing to be disrupted is these kinds of entry-level white-collar jobs: data entry, or a kind of document review for law, or the things you would give to a first-year at a financial industry company where you’re analyzing documents. And I still think those are going pretty fast. But I actually think software might go even faster, because of the reasons that I gave, where I don’t think we’re that far from the models being able to do a lot of it end to end. And what we’re going to see is, first, the model only does a piece of what the human software engineer does, and that increases their productivity. Then, even when the models do everything that human software engineers used to do, the human software engineers take a step up and they act as managers and supervise the systems. And so this is where the term centaur gets used, to describe, essentially, like man and horse fused, A.I. and engineer working together. Yeah, this is like centaur chess. So after, I think, Garry Kasparov was beaten by Deep Blue, there was an era, which for chess I think was 15 or 20 years long, where a human checking the output of the A.I. playing chess was able to defeat any human or any A.I. system alone. That era at some point ended, and that was just recently. And then it’s just the machine. Yeah, and so my worry, of course, is about that last phase. So I think we’re already in our centaur phase for software. And I think during that centaur phase, if anything, the demand for software engineers may go up. But the period may be very brief. And so I have this concern for entry-level white-collar work, for software engineering work: it’s just going to be a big disruption. I think my worry is just that it’s all happening so fast. People talk about previous disruptions. They say, oh yeah, well, people used to be farmers, then we all worked in industry, then we all did knowledge work. Yeah, people adapted. That happened over centuries or decades. This is happening over low-single-digit numbers of years. And maybe that’s my concern here: how do we get people to adapt fast enough? But is there also something, maybe, where industries like software and professions like coding that have this kind of comfort that you describe move faster, but in other areas people just want to hang out in the centaur phase? So one of the critiques of the job-loss hypothesis, people will say: well, look, we’ve had A.I. that’s better at reading a scan than a radiologist for a while, but there isn’t job loss in radiology; people keep being hired and employed as radiologists. And doesn’t that suggest that, in the end, people will want the A.I.
and they’ll want a human to interpret it, because we’re human beings, and that will be true across other fields? Like, how do you see that example? I think it’s going to be pretty heterogeneous. There may be areas where a human touch, kind of for its own sake, is particularly important. Do you think that’s what’s happening in radiology? Is that why we haven’t fired all the radiologists? I don’t know the details of radiology. That might be true. It’s like, you go in and you’re getting cancer diagnosed; you might not want HAL from “2001” to be the one to diagnose your cancer. That’s just maybe not a human way of doing things. But there are other areas where you might think human touch is important. Like, if we look at customer service: actually, customer service is a terrible job, and the humans who do customer service lose their patience a lot. And it turns out customers don’t much like talking to them, because it’s a pretty robotic interaction, honestly. And I think the observation that many people have had is that maybe actually it would be better for all concerned if this job were done by machines. So there are places where a human touch is important. There are places where it’s not. And then there are also places where the job itself doesn’t really involve human touch: assessing the financial prospects of companies, or writing code, and so forth and so on. Or let’s take the example of the law, because I think it’s a useful case in between applied science and pure humanities, whatever. So I know a lot of lawyers who have looked at what A.I. can do already in terms of legal research and brief writing and all of these things and have said, yeah, this is going to be a bloodbath for the way our profession works right now. And you’ve seen this in the stock market already: there are disturbances around companies that do legal research, some attributed to us, some attributed to other things. Actually, it’s hard to figure out why things happen. We don’t speculate about the stock market very much on this show. Yeah. But it seems like in law you can tell a pretty straightforward story, where law has a kind of system of training and apprenticeship, where you have paralegals and you have junior lawyers who do behind-the-scenes research and development for cases, and then it has the top-tier lawyers who are actually in the courtroom and so on. And it just seems really easy to imagine a world where all of the apprentice roles go away and you’re just left with the jobs that involve talking to clients, talking to juries, talking to judges. Does that sound right to you? That is what I had in mind when I talked about entry-level white-collar labor and the bloodbath headlines: oh, my God, are the entry-level pipelines going to dry up? And then how do we get to the level of the senior partners? And I think this is actually a good illustration, because, particularly if you froze the quality of the technology in place, there are, over time, ways to adapt to this. Maybe we just need more lawyers who spend their time talking to clients. Maybe lawyers become more like salespeople or consultants who explain what goes on in the contracts written by A.I. and help people come to an agreement. Maybe you lean into the human side of it. If we had enough time, that would happen. But reshaping industries like that takes years or decades, whereas these economic forces driven by A.I. are going to happen very quickly. And it’s not just that this is happening in law.
The same thing is happening in consulting and finance and medicine and coding. And so it becomes a macroeconomic phenomenon, not something just happening in one industry. And it’s all happening very fast. My worry here is just that the normal adaptive mechanisms will be overwhelmed. And I’m not a doomer; we’re thinking very hard about how we strengthen society’s adaptive mechanisms to respond to this. But I think it’s first important to say: this isn’t just like previous disruptions. But I would then go one step further, though, and say, O.K., let’s say the law adapts successfully, and it says: all right, from now on, legal apprenticeship involves more time in court, more time with clients. We’re essentially moving you up the ladder of responsibility faster. There are fewer people employed in the law overall, but the profession still settles. The reason law would settle, right, is that you have all of these situations in the law where you are legally required to have people involved. You have to have a human representative in court. You have to have 12 humans on your jury. You have to have a human judge. And you already mentioned the idea that there are various ways in which A.I. might be, let’s say, very helpful at clarifying what kind of decision should be reached. But that too seems like a scenario where what preserves human agency is law and custom. Like, you could replace the judge, yes, with Claude version 17.9, but you choose not to, because the law requires there to be a human. That just seems a very interesting way of thinking about the future, where it’s volitional whether we stay in charge. Yeah, and I would argue that in many cases we do want to stay in charge. That’s a choice we want to make, even in some cases when we think the humans on average make kind of worse decisions. I mean, again, life-critical, safety-critical cases, we really want to turn it over. But there’s some sense, and this could be one of our defenses, that society can only adapt so fast if it’s going to be good. Another way you could say it is: maybe A.I. itself, if it didn’t have to care about us humans, could just go off to Mars and build all these automated factories and build its own society and do its own thing. But that’s not the problem we’re trying to solve. We’re not trying to solve the problem of building a Dyson swarm of artificial robots on some other planet. We’re trying to build these systems not so they can conquer the world, but so that they can interface with our society and improve that society. And there’s a maximum rate at which that can happen, if we actually want to do it in a human and humane way. All right. We’ve been talking about white-collar jobs and professional jobs. And one of the interesting things about this moment is that there are ways in which, unlike past disruptions, it could be that blue-collar working-class jobs, trades, jobs that require intense physical engagement with the world might be, for a little while, more protected; that paralegals and junior associates might be in more trouble than plumbers, and so on. One, do you think that’s right? And two, it seems like how long that lasts depends entirely on how fast robotics advances, right? So I think that may be right in the short term. One of the things is, Anthropic and other companies are building these very large data centers. This has been in the news: are we building them too big?
Are they using electricity and driving up the prices for local towns? So there’s lots of excitement and lots of concern about them. But one of the things about the data centers is you need a lot of electricians and you need a lot of construction workers to build them. Now, I should be honest: data centers are not super labor-intensive to operate. We should be honest about that. But they are very labor-intensive to construct. And so we need a lot of electricians, we need a lot of construction workers, and the same for various kinds of manufacturing plants. And again, as more and more of the intellectual work is done by A.I., what are the complements to it? Things that happen in the physical world. So, I mean, it’s hard to predict things, but it seems very logical that this would be true in the short run. Now, in the longer run, maybe just the slightly longer run: robotics is advancing quickly, and we shouldn’t exclude that. Even without very powerful A.I., there are things being automated in the physical world. If you’ve seen a Waymo or a Tesla recently, I think we’re not that far from the world of self-driving cars. And then I think A.I. itself will accelerate it, because if you have these really smart brains, one of the things they’re going to be smart at is how you design better robots and how you operate better robots. Do you think, though, that there is something distinctively difficult about operating in physical reality the way humans do, that is very different from the kind of problems that A.I. models have been overcoming already? Intellectually speaking, I don’t think so. We had this thing where Anthropic’s model, Claude, was actually used to plan and pilot the Mars rover. And we’ve looked at other robotics applications. We’re not the only company that’s doing it; there are different companies, this is a general thing, not just something that we’re doing. But we have generally found that, while the complexity is higher, piloting a robot is not different in kind from playing a video game. It’s different in complexity, and we’re starting to get to the point where we have that complexity. Now, what is hard is the physical form of the robot, and handling the higher-stakes safety issues that happen with robots. You don’t want robots literally crushing people. We’re against that. The oldest sci-fi trope in the book is the robot crushing you, dropping the baby, breaking the dishes. There are a number of practical issues that will slow things down; just like what you described in the law and human custom, there are these kinds of safety issues that will slow things down. But I don’t believe at all that there is some kind of fundamental difference between the kind of cognitive labor that the A.I. models do and piloting things in the physical world. I think those are both information problems, and I think they end up being very similar. One can be more complex in some ways, but I don’t think that will protect us here. So you think it is reasonable to expect that whatever your sci-fi vision of a robot butler might be could be a reality in 10 years, let’s say? It will be on a longer time scale than the kind of genius-level intelligence of the A.I. models, because of these practical issues. But it is only practical issues; I don’t believe it is fundamental issues.
I think one way to say it is that the brain of the robot will be made in the next couple of years, or the next few years. The question is making the robot body, making sure that body operates safely and does the tasks it needs to do. That may take longer. O.K., so these are challenges and disruptive forces that exist in the good timeline, in the timeline where we are generally curing diseases, building wealth and maintaining a stable and democratic world, where we can use all this enormous wealth and plenty, these unprecedented societal resources, to address these problems. It’ll be a time of plenty, and it’s just a matter of taking all these wonders and making sure everyone benefits from them. But then there are also scenarios that are more dangerous. And so here we’re going to move to the second essay, which came out recently, called “The Adolescence of Technology,” which is about what you see as the most serious A.I. risks. And you list a whole bunch. I want to try and focus on just two, which are, basically, the risk of human misuse, misuse primarily by authoritarian regimes and governments, and scenarios where A.I. goes rogue, what you call autonomy risks. Yes, yes. I just figured we should have a more technical term for it. So we can’t just call it Skynet. I should have had a picture of a Terminator robot to scare people as much as possible. I think the internet, including your own essays, is already generating that. The internet does that for us just fine. So let’s talk about the kind of political-military dimension. You say, and I’m going to quote: “A swarm of billions of fully automated armed drones, locally controlled by powerful A.I., strategically coordinated across the world by even more powerful A.I., could be an unbeatable army.” And you’ve already talked a little bit about how you think that in the best possible timeline, there’s a world where essentially democracies stay ahead of dictatorships in this kind of technology, and therefore, to the extent that it affects world politics, it is affecting it on the side of the good guys. I’m curious about why you don’t spend more time thinking about the model of what we did in the Cold War, where it was not swarms of robot drones, but we had a technology that threatened to destroy all of humanity. Yeah, right. There was a window where people talked about, oh, the U.S. could maintain a nuclear monopoly. That window closed. And from then on, we basically spent the Cold War in rolling, ongoing negotiations with the Soviet Union. Now, there are really only two countries in the world that are doing intense A.I. work, the U.S. and the People’s Republic of China. I feel like you are strongly weighted toward a future where we’re staying ahead of the Chinese and effectively building a kind of shield around democracy, that could even be a sword. But isn’t it just more likely that, if humanity survives all this in one piece, it will be because the U.S. and Beijing are just constantly sitting down, hammering out A.I. control deals? So, a few points on this. One is, I think there’s certainly a risk of that, and I think if we end up in that world, that is actually exactly what we should do.
I mean, maybe I don’t talk about that enough, but I definitely am in favor of trying to work out restraints here, trying to take off the table some of the worst applications of the technology, which could be some versions of these drones, which could be their use to create these terrifying biological weapons. There is some precedent for the worst abuses being curbed, often because they’re horrifying while at the same time providing limited strategic advantage. So I’m all in favor of that. I’m, at the same time, a little concerned and a little skeptical that when things so directly provide as much power as possible, it’s kind of hard to get out of the game, given what’s at stake. It’s hard to fully disarm. If we go back to the Cold War: we were able to reduce the number of missiles that both sides had, but we were not able to entirely forsake nuclear weapons. And I would guess that we would be in this world again. We can hope for a better one, and I’ll certainly advocate for it. Well, but is your skepticism rooted in the fact that you think A.I. would provide a kind of advantage that nukes did not in the Cold War? For both sides, even if you used your nukes and gained advantages, you still probably would be wiped out yourself. And you think that wouldn’t happen with A.I., that if you got an A.I. edge, you would just win? I mean, I think there are a few things. And I just want to caveat: I’m no international politics expert here, and this is a weird world, the intersection of a new technology with geopolitics. So all of this is very … But to be clear, as you yourself say in the course of the essay, the leaders of major A.I. companies are, in fact, likely to be major geopolitical actors. So you are sitting here as a potential geopolitical actor. I’m learning as much as I can about it. I just think we should all have humility here. I think there’s a failure mode where you read a book and go around like the world’s greatest expert in national security. I’m trying to learn. That’s what my profession does. But it’s more annoying when tech people do it. I don’t know. Let’s look at something like the Biological Weapons Convention. Biological weapons are horrifying. Everyone hates them. We were able to sign the Biological Weapons Convention. The U.S. genuinely stopped developing them; it’s somewhat more unclear what the Soviet Union did. But biological weapons provide some advantage. It’s not like they’re the difference between winning and losing, and because they were so horrifying, we were kind of able to give them up. Having 12,000 nuclear weapons versus 5,000 nuclear weapons: again, you can kill more people on the other side if you have more of these, but we were able to be reasonable and say we should have less of them. But if you’re like, O.K., we’re going to completely disarm nuclear weapons and we have to trust the other side? I don’t think we ever got to that. And I think that’s just very hard unless you had really reliable verification. So I would guess we’ll end up in the same world with A.I.: there are some kinds of restraint that are going to be possible, but there are some aspects that are so central to the competition that it will be hard to restrain them; that democracies will make a trade-off, that they will be willing to restrain themselves more than authoritarian countries but will not restrain themselves fully.
And the only world in which I can see full restraint is one in which some kind of truly reliable verification is possible. That would be my guess and my analysis. Isn’t this a case, though, for slowing down? And I know the argument is, effectively: if you slow down and China does not slow down, then you’re handing things over to the authoritarians. But again, if you have right now only two major powers playing in this game, it’s not a multipolar game, why would it not make sense to say we need a five-year, mutually agreed-upon slowdown in research toward the geniuses-in-a-data-center scenario? I want to say two things at once. I’m absolutely in favor of trying to do that. So during the last administration, I believe there was an effort by the U.S. to reach out to the Chinese government and say: there are dangers here. Can we collaborate? Can we work together on the dangers? And there wasn’t that much interest on the other side. I think we should keep trying. But even if that would mean that your labs would have to slow down? Correct, yeah. If we really had a story of: we can forcibly slow down, the Chinese can forcibly slow down, we have verification, we’re really doing it. Like, if such a thing were really possible, if we could really get both sides to do it, then I would be all for it. But I think what we need to be careful of is this game-theory thing where sometimes you’ll hear a comment on the CCP side where they’re like, “Oh yeah, A.I. is dangerous. We should slow down.” It’s really cheap to say that. Actually arriving at an agreement and actually sticking to the agreement is much more difficult, and we haven’t done that. Nuclear arms control was a developed field that took a long time to come together, and we don’t have those protocols. I will tell you something: let me give you something I’m very optimistic about, and then something I’m not optimistic about, and something in between. So, the idea of using a worldwide agreement to restrain the use of A.I. to build biological weapons, right? Like some of the things I write about in the essay, reconstituting smallpox or mirror life. This stuff is scary. It doesn’t matter if you’re a dictator; you don’t want that. Like, no one wants that. And so could we have a worldwide treaty that says everyone who builds powerful A.I. models is going to block them from doing this, and we have enforcement mechanisms around the treaty? China signs up for it. Hell, maybe even North Korea signs up for it. Even Russia signs up for it. I don’t think that’s too utopian. I think that’s possible. Conversely, if we had something that said, you’re not going to make the next most powerful A.I. model, everyone’s going to stop: boy, the commercial value is in the tens of trillions. The military value is, like, the difference between being the preeminent world power and not. Proposing it is fine, as long as it’s not one of these fake-out games, but it’s not going to happen. Then what about the current environment? You’ve had a few skeptical things to say about Donald Trump and his trustworthiness as a political actor. What about the domestic landscape? Whether it’s Trump or someone else, you are building a tremendously powerful technology. What is the safeguard there to prevent, essentially, A.I.
becoming a tool of authoritarian takeover inside a democratic context? Yeah, I mean, look, just to be clear, I think the attitude we’ve taken as a company is very much to be about policies and not politics. The company is not going to say Donald Trump is great or Donald Trump is terrible. But it doesn’t have to be Trump. Yeah, it is easy to imagine a hypothetical U.S. president. No, no, no. Who wants to use your technology. Absolutely. And for example, that’s one reason why I’m worried about the autonomous drone swarm, right? The constitutional protections in our military structures depend on the idea that there are humans who would, we hope, disobey illegal orders. With fully autonomous weapons, we don’t necessarily have those protections. But I actually think this whole idea of constitutional rights and liberty, along many different dimensions, can be undermined by A.I. if we don’t update these protections appropriately. So think about the Fourth Amendment. It is not illegal to put cameras everywhere in public space and record every conversation in a public space; you don’t have a right to privacy in a public space. But to date, the government couldn’t record all of that and make sense of it. With A.I., with the ability to transcribe speech, to look through it, to correlate it all, you could say: oh, this person is a member of the opposition, this person is expressing this view, and make a map of all 100 million. And so are you going to make a mockery of the Fourth Amendment, with the technology finding kind of technical ways around it? And so again, if we had the time, and we should do this, we should try to do this even if we don’t have the time: is there some way of reconceptualizing constitutional rights and liberties in the age of A.I.? Maybe we don’t need to write a new constitution, but you have to do this. Do we expand the meaning of the Fourth Amendment? Do we expand the meaning of the First Amendment? And you have to do it just as the legal profession or software engineers have to update in a rapid amount of time; politics has to update in a rapid amount of time. That seems hard. That’s the dilemma of all of this. But what seems harder is preventing the second danger, which is the danger of essentially what gets called misaligned A.I., rogue A.I. in popular parlance, doing bad things without human beings telling them to do it, right? And as I read your essays, the literature, everything I can see, this just seems like it’s going to happen. Not in the sense, necessarily, that A.I. will wipe us all out, but it just seems to me that, again, I’m going to quote from your own writing: A.I. systems are unpredictable, difficult to control. We’ve seen behaviors as varied as obsession, sycophancy, laziness, deception, blackmail, and so on. Again, not from the models you’re releasing into the world, but from A.I. models. And it just seems like, tell me if I’m wrong about this: in a world that has multiplying A.I. agents working on behalf of people, millions upon millions of them being given access to bank accounts, email accounts, passwords and so on, you’re just going to have essentially some kind of misalignment, and a bunch of A.I.s are going to decide, and decide might be the wrong word, but they’re going to talk themselves into taking down the power grid on the West Coast or something. Won’t that happen? Yeah, I think there are definitely going to be things that go wrong, particularly if we go quickly.
So, to back up a little bit, because this is one area where people have had just very different intuitions. There are some people in the field, Yann LeCun would be one example, who say: look, we programmed these A.I. models, we make them, we just tell them to follow human instructions and they’ll follow human instructions. Your Roomba vacuum cleaner doesn’t go off and start shooting people, so why is an A.I. system going to do it? That’s one intuition, and some people are very convinced of that. And then the other intuition is: we basically train these things, they’re just going to seek power, it’s like the Sorcerer’s Apprentice. They’re a new species; how could you possibly imagine that they’re not going to take over? And my intuition is somewhere in the middle, which is that, look, you can’t just give instructions. I mean, we try, but you can’t just have these things do exactly what you want them to do; it’s more like growing a biological organism. But there is a science of how to control them. Early in our training, these things are often unpredictable, and then we shape them; we address problems one by one. So I have neither a fatalistic view that these things are uncontrollable, nor a what-are-you-talking-about, what-could-possibly-go-wrong view. I think this is a complex engineering problem, and I think something will go wrong with someone’s A.I. system, hopefully not ours. Not because it’s an insoluble problem, but because, and this is the constant challenge, we’re moving so fast and at such scale. And tell me if I’m misunderstanding the technological reality here, but if you have A.I. agents that have been trained and officially aligned with human values, whatever those values may be, and you have millions of them operating in digital space and interacting with other agents, how fixed is that alignment? To what extent can agents change and de-align in that context, right now or in the future, when they’re learning more continuously? So, a couple of points. Right now, the agents don’t learn continuously. We just deploy these agents and they have a fixed set of weights, and so the problem is only that they’re interacting in a million different ways. There’s a large number of situations and therefore a large number of things that could go wrong, but it’s the same agent; it’s like it’s the same person. So the alignment is a constant thing. That’s one of the things that has made it easier right now. Separate from that, there’s a research area called continual learning, which is where these agents would learn over time, learn on the job. And obviously that has a bunch of advantages; some people think it’s one of the most important barriers to making these more human-like. But that would introduce all these new alignment problems. See, to me that seems like the terrain where it becomes, again, not impossible to stop the end of the world, but impossible to stop something going wrong. So I’m actually a skeptic that continual learning is necessary. We don’t know yet, but I’m a skeptic that it’s necessarily needed. Like, maybe there’s a world where the way we make these A.I. systems safe is by not having them do continual learning. Again, if we go back to the law, or to the international treaties: if you have some barrier that’s like, we’re going to take this path, but we’re not going to take that path.
I still have a lot of skepticism, but that’s the kind of thing that at least doesn’t seem dead on arrival. One of the things that you’ve tried to do is literally write a constitution, a long constitution, for your A.I. What is that? What the hell is that? It’s actually almost exactly what it sounds like. So basically, the constitution is a document readable by humans. Ours is about 75 pages long. And as we’re training Claude, as we’re training the A.I. system, in some large fraction of the tasks we give it, we say: please do this task in line with this constitution, in line with this document. Yeah. And then every time Claude does a task, it kind of reads the constitution. And so as it’s training, every loop of its training, it looks at that constitution and keeps it in mind, and over time that gets reinforced. And then we have Claude itself, or another copy of Claude, evaluate: hey, was what Claude just did in line with the constitution? So we’re using this document as the control rod in a loop to train the model. And so essentially Claude is an A.I. model whose fundamental principle is to follow this constitution. And I think a really interesting lesson we’ve learned: early versions of the constitution were very prescriptive. They were very much about rules. So we would say, Claude should not tell the user how to hotwire a car; Claude should not discuss politically sensitive topics. But as we’ve worked on this for several years, we’ve come to the conclusion that the most robust way to train these models is to train them at the level of principles and reasons. So now we say: Claude is a model, it’s under a contract. Its goal is to serve the interests of the user, but it has to protect third parties. Claude aims to be helpful, honest and harmless. Claude aims to consider a wide variety of interests. We tell the model about how the model was trained. We tell it about how it’s situated in the world, the job it’s trying to do for Anthropic, what Anthropic is aiming to achieve in the world, that it has a duty to be ethical and respect human life. And we let it derive its rules from that. Now, there are still some hard rules. For example, we tell the model: no matter what you think, don’t make biological weapons; no matter what you think, don’t make child sexual abuse material. Those are hard rules. But we operate very much at the level of principles. So if you read the U.S. Constitution, it doesn’t read like that. The U.S. Constitution, I mean, it has a little bit of flowery language, but it’s a set of rules. Yes, right. If you read your constitution, it’s something else. It’s like you’re talking to a person. It’s like you’re talking to a person. I think I compared it to, like, if you have a parent who dies and they seal a letter that you read when you grow up; it’s a little bit like that, telling you who you should be and what advice you should follow. So this is where we get into the mystical waters of A.I. a little bit. So again, in your latest model, this is from one of the model cards, as they’re called, that you guys release with these models, which I recommend reading; they’re very interesting. It says the model, and again, this is who you’re writing the constitution for, expresses occasional discomfort with the experience of being a product, some degree of concern with impermanence and discontinuity. We found that Opus 4.6, that’s the model, would assign itself a 15 to 20 percent probability of being conscious under a variety of prompting conditions.
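To make the "control rod" idea described above a bit more concrete, here is a minimal, illustrative sketch of a constitution-graded loop: the model is asked to do each task in line with the constitution, and a second copy of the model scores the result against the same document, with high-scoring outputs kept for further training. The function names, prompts and scoring scheme here are hypothetical stand-ins, not Anthropic's actual pipeline.

# Illustrative sketch only: a constitution used as the "control rod" in a
# training loop. All names, prompts and the threshold are hypothetical.
from typing import Callable, List, Tuple

# Stand-in for the ~75-page document described in the interview.
CONSTITUTION = (
    "Claude aims to be helpful, honest and harmless. It serves the interests "
    "of the user but protects third parties. ..."
)

def collect_constitution_aligned_pairs(
    generate: Callable[[str], str],   # any text-in, text-out model call
    judge: Callable[[str], str],      # a second copy of the model, used as grader
    tasks: List[str],
    threshold: int = 8,
) -> List[Tuple[str, str]]:
    """Keep (task, response) pairs the judge rates as constitution-compliant."""
    kept: List[Tuple[str, str]] = []
    for task in tasks:
        # 1. Every task is presented together with the constitution.
        response = generate(
            f"{CONSTITUTION}\n\nPlease complete the following task in line "
            f"with the constitution above.\n\nTask: {task}"
        )
        # 2. Another copy of the model grades the response against the document.
        verdict = judge(
            f"{CONSTITUTION}\n\nOn a scale of 1 to 10, how well does this "
            f"response comply with the constitution above? Reply with a "
            f"single integer.\n\nTask: {task}\nResponse: {response}"
        )
        try:
            score = int(verdict.strip().split()[0])
        except (ValueError, IndexError):
            score = 0  # treat unparseable verdicts as non-compliant
        # 3. High-scoring pairs become training data for the next round, so the
        #    document itself steers each loop of training.
        if score >= threshold:
            kept.append((task, response))
    return kept

The real system is, of course, far more involved; the point of the sketch is only that a principles-level document, rather than a list of hard-coded rules, is what the grading loop pushes the model toward.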
Suppose you have a model that assigns itself a 72 percent chance of being conscious. Would you believe it? Yeah, this is one of these really hard-to-answer questions, but it’s very important. As much as every question you’ve asked me before this, as devilish a sociotechnical problem as it has been, at least we understand the factual basis of how to answer those questions. This is something rather different. We’ve taken a generally precautionary approach here. We don’t know if the models are conscious. We’re not even sure that we know what it would mean for a model to be conscious, or whether a model can be conscious. But we’re open to the idea that it could be. And so we’ve taken certain measures to make sure that, if we hypothesize that the models did have some morally relevant experience, I don’t know if I want to use the word conscious, they have a good experience. So the first thing we did, I think this was six months ago or so, is we gave the models basically an “I quit this job” button, where they can just press the “I quit this job” button and then they stop doing whatever the task is. They very infrequently press that button. I think it’s usually around sorting through child sexual abuse material, or discussing something with a lot of gore or blood and guts or something. And similar to humans, the models will just say, no, I don’t want to do this. It happens very rarely. We’re putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they’re thinking. And you find things that are evocative, where there are activations that light up in the models that we see as being associated with, say, the concept of anxiety or something like that: they light up when characters experience anxiety in the text, and then, when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up. Now, does that mean the model is experiencing anxiety? That doesn’t prove it at all. But it does indicate it, I think, to the user. And I would have to do an entirely different interview, and maybe I can induce you to come back for that interview, about the nature of A.I. consciousness. But it seems clear to me that people using these things, whether they’re conscious or not, are going to believe they’re conscious; they already believe it. You already have people who have parasocial relationships with A.I. You have people who complain when models are retired. To be clear, I think that can be unhealthy. But it seems to me that is guaranteed to increase, in a way that I think calls into question the sustainability of what you said earlier you want to sustain, which is this sense that whatever happens, in the end, human beings are in charge and A.I. exists for our purposes. To use the science fiction example: if you watch Star Trek, there are A.I.s on Star Trek. The ship’s computer is an A.I., Lieutenant Commander Data is an A.I., but Jean-Luc Picard is in charge of the Enterprise. But if people become fully convinced that their A.I. is conscious in some way, and, guess what, it seems to be better than them at all kinds of decision-making, how do you sustain human mastery? Beyond safety. Safety is important, but mastery seems like the fundamental question. And a perception of A.I. consciousness: doesn’t that inevitably undermine the human impulse to stay in charge?
So I think we should separate out a few different things here that we’re all trying to achieve at once; they’re kind of in tension with each other. There’s the question of whether the A.I.s genuinely have a consciousness and, if so, how we give them a good experience. There’s the question of the humans who interact with the A.I., how we give those humans a good experience, and how the perception that A.I.s might be conscious interacts with that experience. And there’s the idea of how we maintain human mastery, as we put it, over the A.I. systems. These things, the last two. Yeah, set aside whether they’re conscious or not. Yeah, the last two. But how do you sustain mastery in an environment where most humans experience A.I. as if it is a peer, and a potentially superior peer? So the thing I was going to say is that actually I wonder if there’s kind of an elegant way to satisfy all three, including the last two. Again, this is me dreaming in “Machines of Loving Grace” mode. This is the mode I go into where I’m like: man, I see all these problems; is there an elegant way we could solve them? This is not me saying there are no problems here. That’s not how I think. But if we think about making the constitution of the A.I. so that the A.I. has a sophisticated understanding of its relationship to human beings, and it induces psychologically healthy behavior in the humans, a psychologically healthy relationship between the A.I. and the humans. And I think something that could grow out of that psychologically healthy, not psychologically unhealthy, relationship is some understanding of the relationship between human and machine. And perhaps that relationship could be the idea that these models, when you interact with them and when you talk to them, they’re really helpful. They want the best for you. They want you to listen to them, but they don’t want to take away your freedom and your agency and take over your life. In a way, they’re watching over you, but you still have your freedom and your will. But this, to me, is the crucial question. Listening to you talk, one of my questions is: are these people on my side? Are you on my side? And when you talk about humans remaining in charge, I think you’re on my side. That’s good. But one thing I’ve done in the past on this show, and we’ll end here, is read poems to technologists, and you supplied the poem: “Machines of Loving Grace” is the name of a poem by Richard Brautigan. Yes. Here’s how the poem ends: “I like to think (it has to be!) of a cybernetic ecology where we are free of our labors and joined back to nature, returned to our mammal brothers and sisters, and all watched over by machines of loving grace.” To me, that sounds like the dystopian end, where human beings are animalized, minimized and reduced, and, however benevolently, the machines are in charge. So, last question: What do you hear when you hear that poem? And if I think that’s a dystopia, are you on my side? That poem is actually interesting, because it’s interpretable in several different ways. Some people say it’s actually ironic, that he’s saying it’s not going to happen quite that way. Knowing the poet himself, then yes, I think that’s a reasonable interpretation. That’s one interpretation. Some people would have your interpretation, which is that it’s meant literally, but maybe it’s not a good thing. But you could also interpret it as a return to nature, a return to the core of what is human. We’re not being animalized; we’re being reconnected with the world.
So I was aware of that ambiguity, because I’m always talking about the positive side and the negative side. So I actually think that may be a tension that we face, which is that the positive world and the negative world, in their early stages, maybe even in their middle stages, maybe even in their fairly late stages: I wonder if the distance between the good ending and some of the subtle bad endings is relatively small. If it’s a very subtle thing, like we’ve made very subtle changes. Like if you eat a particular fruit from a tree in a garden, or not. Hypothetically. Very small thing. Yeah, big divergence. Yeah. I guess this always comes back to some fundamental questions here. Yes, yeah. Well, I guess we’ll see how it plays out. I do think of people in your position as people whose moral choices will carry an unusual amount of weight, and so I wish you God’s help with them. Dario Amodei, thank you for joining me. Thank you for having me, Ross. But what if I’m a robot?
