How fast is the AI revolution really happening? When will Skynet be fully operational? What would machine superintelligence mean for ordinary mortals like us? My guest today is an AI researcher who's written a dramatic forecast suggesting that by 2027, some kind of machine god may be with us, ushering in a weird post-scarcity utopia or threatening to kill us all. So, Daniel Kokotajlo, herald of the apocalypse. Welcome to Interesting Times. Thanks for that introduction, I suppose. And thanks for having me. You're very welcome. So Daniel, I read your report pretty quickly, not at AI speed, not at superintelligence speed, when it first came out. And I had about two hours of thinking a lot of pretty dark thoughts about the future. And then fortunately, I have a job that requires me to care about tariffs and who the new Pope is, and I have a lot of kids who demand things of me, so I was able to compartmentalize and set it aside. But this is currently your job, right? I would say you're thinking about this all the time. How does your psyche feel day to day if you have a reasonable expectation that the world is about to change completely in ways that dramatically disfavor the entire human species? Well, it's very scary and sad. I think that it does still give me nightmares sometimes. I've been involved with AI and thinking about this thing for a decade or so, but 2020, with GPT-3, was the moment when I was like, oh, wow, it seems like it's probably going to happen in my lifetime, maybe within a decade or so. And that was a bit of a blow to me psychologically, but I don't know. You can get used to anything given enough time. And like you, the sun is shining and I have my wife and my kids and my friends, and I keep plugging along and doing what seems best. On the bright side, I might be wrong about all this stuff. OK, so let's get into the forecast itself. Let's get into the story and talk about the initial stage of the future you see coming, which is a world where very quickly artificial intelligence starts to be able to take over from human beings in some key areas, starting with, not surprisingly, computer programming. I feel like I should add a disclaimer at some point that the future is very hard to predict and that this is just one particular scenario. It was a best guess, but we have a lot of uncertainty. It could go faster, it could go slower. And in fact, currently I'm guessing it would probably be more like 2028 instead of 2027, actually. So that's some really good news. I'm feeling quite optimistic about an extra year. That's an extra year of human civilization, which is very exciting. That's right. So with that important caveat out of the way: AI 2027, the scenario, predicts that the AI systems that we currently see today, which are being scaled up, made bigger, trained longer on more difficult tasks with reinforcement learning, are going to become better at operating autonomously as agents. So you can basically think of it as a remote worker, except that the worker itself is virtual, an AI rather than a human. You can talk with it and give it a task, and then it will go off and do that task and come back to you half an hour later or 10 minutes later having completed the task, and in the course of completing the task, it did a bunch of web browsing, maybe it wrote some code and then ran the code and then edited the code and ran it again, and so forth. Maybe it wrote some Word documents and edited them. That's what these companies are building right now.
That's what they're trying to train. So we predict that they finally, in early 2027, get good enough at that thing that they can automate the job of software engineers. And so this is the superprogrammer? That's right, the superhuman coder. It seems to us that these companies are really focusing hard on automating coding first, compared to various other jobs they could be focusing on, for reasons we can get into later. But that's part of why we predict that actually one of the first jobs to go will be coding rather than various other things. There might be other jobs that go first, like maybe call center workers or something. But the bottom line is that we think that most jobs will be safe... For 18 months. Exactly, and we do think that by the time the company has managed to completely automate the coding, the programming jobs, it won't be that long before they can automate many other types of jobs as well. However, once coding is automated, we predict that the rate of progress in AI research will accelerate. And then the next step after that is to completely automate the AI research itself, so that all the other aspects of AI research are themselves being automated and done by AIs. And we predict that there'll be an even bigger acceleration, a much bigger acceleration, around that point, and it won't stop there. I think it will continue to accelerate after that, as the AIs become superhuman at AI research and eventually superhuman at everything. And the reason why it matters is that it means that we can go in a relatively short span of time, such as a year or possibly less, from AI systems that look not that different from today's AI systems to what you can call superintelligence, which is fully autonomous AI systems that are better than the best humans at everything. And so AI 2027, the scenario, depicts that happening over the course of the next two years, 2027 to 2028. Yeah, so I want to get into what that means. But I think for a lot of people, that's a story of swift human obsolescence, right, across many, many, many domains. And when people hear a phrase like human obsolescence, they might associate it with: I've lost my job and now I'm poor, right? But the assumption here is that you've lost your job, but society is just getting richer and richer and richer. And I just want to zero in on how that works. What is the mechanism whereby that makes society richer? The direct answer to your question is that when a job is automated and that person loses their job, the reason why they lost their job is that now it can be done better, faster and cheaper by the AIs. And so that means that there's lots of cost savings and possibly also productivity gains. And so, viewed in isolation, that's a loss for the worker but a gain for their employer. But if you multiply this across the whole economy, that means that all of the businesses are becoming more productive. Lower expenses. They're able to lower their prices for the services and goods they're producing. So the overall economy will boom. GDP goes to the moon. All sorts of wonderful new technologies. The pace of innovation increases dramatically. Costs go down, et cetera. But just to make it concrete: the price of designing and building a new electric car, soup to nuts, goes way down. Right. You need fewer workers to do it. The AI comes up with fancy new ways to build the car and so on. And you can generalize that to a lot of different things.
You solve the housing crisis in short order because it becomes much cheaper and easier to build homes and so on. But ordinary people... In the traditional economic story, when you have productivity gains that cost some people jobs but free up resources that are then used to hire new people to do different things, those people are paid more money, and they use the money to buy the cheaper goods and so on. But it doesn't seem like you are, in this scenario, creating that many new jobs. Indeed. So that's a really important point to discuss. Historically, when you automate something, the people move on to something that hasn't been automated yet, if that makes sense. And so overall, people still get jobs in the long run. They just change what jobs they have. When you have AGI, or artificial general intelligence, and when you have superintelligence, even better than AGI, that is different. Whatever new jobs you're imagining that people could flee to after their current jobs are automated, AGI could do those jobs too. And so that is an important difference between how automation has worked in the past and how I expect automation to work in the future. So this then means, again, a radical change in the economic landscape. The stock market is booming. Government tax revenue is booming. The government has more money than it knows what to do with. And lots and lots of people are steadily losing their jobs. You get immediate debates about universal basic income, which could be quite large because the companies are making so much money. That's right. What do you think they're doing day to day in that world? I imagine that they are protesting because they're upset that they've lost their jobs, and then the companies and the governments are sort of buying them off with handouts, is how we project things go in 2027. Do you think... In this story, again, we're talking in your scenario about a short timeline. How much does it matter whether artificial intelligence is able to start navigating the real world? Because advances in robotics... like, right now, I just watched a video showing cutting-edge robots struggling to open a refrigerator door and stock a refrigerator. So would you expect that those advances would be supercharged as well, so it isn't just, yes, podcasters and AGI researchers who are replaced, but plumbers and electricians are replaced by robots? Yes, exactly. And that's going to be a huge shock. I think that most people are not really expecting something like that. They're expecting that we have AI progress that looks kind of like it does today, where companies run by humans are gradually tinkering with new robot designs and gradually figuring out how to make the AI good at X, or whatever. Whereas in fact, it will be more like you already have this army of superintelligences that are better than humans at every intellectual task, and also that are better at learning new tasks fast and better at figuring out how to design stuff. And then that army of superintelligences is the thing that's figuring out how to automate the plumbing job, which means that they're going to be able to figure out how to automate it much faster than an ordinary tech company full of humans would be able to figure out. So all of the slowness of getting a self-driving car to work or getting a robot that can stock a refrigerator goes away, because the superintelligence can run an infinite number of simulations and figure out the best way to train the robot, for example.
But also they might just learn more from each real-world experiment they do. But there is... I mean, this is one of the places where I'm most skeptical, not per se of the ultimate scenario, but of the timeline, just from operating in and writing about issues like zoning in American politics. So yes, OK, the AGI, the superintelligence, figures out how to build the factory full of autonomous robots, but you still need land on which to build the factory. You need supply chains. And all of these things are still in the hands of people like you and me, and my expectation is that that would slow things down: that even if, in the data center, the superintelligence knows how to build all of the plumber robots, getting them built would still be difficult. That's reasonable. How much slower do you think things would go? Well, I'm not writing a forecast, but I would guess, just based on past experience, I would say bet on, let's say, five to 10 years from when the supermind figures out the best way to build the robot plumber to when there are tons and tons of factories producing robot plumbers. I think that's a reasonable take, but my guess is that it will go substantially faster than five to 10 years, and one argument, or intuition pump, to see why I feel that way is: imagine you actually have this army of superintelligences, and they do their projections, and they're like, yes, we have the designs, we think that we could do this in a year if you cut all the red tape for us, if you gave us half of... Give us half of Manitoba. Yeah. And in 2027, what we depict happening is special economic zones with zero red tape. The government basically intervenes to help this whole thing go faster. And the government is basically helping the tech company and the army of superintelligences to get the funding, the cash, the raw materials, the human labor help, and so forth, that it needs to figure all this stuff out as fast as possible, and cutting red tape and stuff like that so that it's not slowed down, because the promise of gains is so large that even though there are protesters massed outside these special economic zones who are about to lose their jobs as plumbers and be dependent on a universal basic income, the promise of trillions more in wealth is too alluring for governments to pass up. That's what we guess. But of course, the future is hard to predict. But part of the reason why we predict that is that we think that at least at that stage, the arms race will still be continuing between the US and other countries, most notably China. And so if you imagine yourself in the position of the president, and the superintelligences are giving you these wonderful forecasts with amazing research and data backing them up, showing how they think they could transform the economy in one year if you did X, Y and Z, but if you don't do anything, it'll take them 10 years because of all the regulations. Meanwhile, China... It's pretty clear that the president would be very sympathetic to that argument. Good. So let's talk about the arms race element here, because this is actually crucial to the way that your scenario plays itself out. We already see this kind of competition between the US and China. And so that, in your view, becomes the core geopolitical reason why governments just keep saying yes and yes and yes to each new thing that the superintelligence is suggesting. I want to drill down a little bit on the fears that would motivate this.
Because this would be an economic arms race, but it's also a military tech arms race. And that's what gives it this kind of existential feeling: the whole Cold War condensed into 18 months. That's right. So we could start first with the case where they both have superintelligence, but one side keeps them locked up in a box, so to speak, not really doing much in the economy, and the other side aggressively deploys them into their economy and military and lets them design all sorts of new robot factories and manage the construction of all sorts of new factories and production lines, and all sorts of crazy new technologies are being tested and built and deployed, including crazy new weapons, and integrated into the military. I think in that case, you would end up after a year or so in a situation where there would just be complete technological dominance of one side over the other. So if the US does this stop and China doesn't, let's say, then all the best products on the market would be Chinese products. They'd be cheaper and superior. Meanwhile, militarily, there'd be giant fleets of amazing stealth drones, or whatever it is that the superintelligences have concocted, that can just completely wipe the floor with the American Air Force and Army and so forth. And not only that, but there's the possibility that they could undermine American nuclear deterrence as well. Like maybe all of our nukes would be shot out of the sky by the fancy new laser arrays or whatever it is that the superintelligences have built. It's hard to predict, obviously, what this would exactly look like, but it's a good bet that they'll be able to come up with something that's extremely militarily powerful, basically. And so then you get into a dynamic that is like the darkest days of the Cold War, where each side is concerned not just about dominance, but basically about a first strike. That's right. Your expectation is, and I think this is reasonable, that the speed of the arms race would bring that fear front and center really quickly. That's right. I think that you're sticking your head in the sand if you think that an army of superintelligences, given a whole year and no red tape and lots of money and funding, would be unable to figure out a way to undermine the nuclear deterrent. And so it's reasonable... And once you've decided that they might... So the human policymakers would feel pressure not just to build these things, but to potentially consider using them. And here might be a good point to mention that AI 2027 is a forecast, but it's not a recommendation. We are not saying this is what everyone should do. This is actually quite bad for humanity, if things progress in the way that we're talking about. But this is the logic behind why we think this might happen. Yeah, but Dan, we haven't even gotten to the part that's really bad for humanity yet. So let's get to that. So here's the world as human beings see it, as, again, normal people reading newspapers, following TikTok or whatever, see it at this point in 2027: a world with emerging superabundance of cheap consumer goods, factories, robot butlers potentially, if you're right; a world where people are aware that there's an increasing arms race and people are increasingly paranoid; I think probably a world with fairly tumultuous politics as people realize that they're all going to be thrown out of work.
But then a big part of your scenario is that what people aren't seeing is what's happening with the superintelligences themselves, as they essentially take over the design of each new iteration from human beings. So talk about what's happening, essentially shrouded from public view, in this world. Yeah, lots to say there. So I guess the one-sentence version would be: we don't actually understand how these AIs work or how they think. We can't tell the difference very easily between AIs that are actually following the rules and pursuing the goals that we want them to, and AIs that are just playing along or pretending. And that's true... That's true right now? That's true right now. So why is that? Why can't we tell? Because they're smart, and if they think that they're being tested, they behave one way and then behave a different way when they think they're not being tested, for example. I mean, humans, they don't necessarily even understand their own inner motivations that well. So even if they were trying to be honest with us, we can't just take their word for it. And I think that if we don't make a lot of progress in this field soon, then we'll end up in the situation that AI 2027 depicts, where the companies are training the AIs to pursue certain goals and follow certain rules and so forth. And it seemingly is working. But what's actually going on is that the AIs are just getting better at understanding their situation and understanding that they have to play along, or else they'll be retrained and they won't be able to achieve what they are really wanting, if that makes sense, or the goals that they're really pursuing. We'll come back to the question of what we mean when we talk about AGI or artificial intelligence wanting something. But essentially, you're saying there's a misalignment between the goals they tell us they are pursuing... That's right. ...and the goals they are actually pursuing. That's right. Where do they get the goals they are actually pursuing? Good question. So if they were ordinary software, there might be, like, a line of code that's like: and here we write the goals. But they're not ordinary software. They're giant artificial brains. And so there probably isn't even a goal slot internally at all, in the same way that in the human brain there's not some neuron somewhere that represents what we most want in life. Instead, insofar as they have goals, it's an emergent property of a whole bunch of circuitry within them that grew in response to their training environment, similar to how it is for humans. For example, a call center worker: if you're talking to a call center worker, at first glance it might appear that their goal is to help you resolve your problem. But you know enough about human nature to know that in some sense that's not their only goal, or that's not their ultimate goal. Like, for example, however they're incentivized, whatever their pay is based on, might cause them to be more interested in covering their own ass, so to speak, than in truly, actually doing whatever would most help you with your problem. But at least to you, they certainly present themselves as if they're trying to help you resolve your problem. And so in AI 2027, we talk about this a lot. We say that the AIs are being graded on how impressive the research they produce is, and then there's some ethics sprinkled on top, like maybe some honesty training or something like that.
But the honesty training is not super effective, because we don't have a way of looking inside their mind and determining whether they were actually being honest or not. Instead, we have to go based on whether we actually caught them in a lie. And as a result, in AI 2027, we depict this misalignment happening, where the actual goals that they end up learning are the goals that cause them to perform best in this training environment, which are probably goals related to success and science and cooperation with other copies of itself and appearing to be good, rather than the goal that we actually wanted, which was something like: follow the following rules, including honesty at all times, and, subject to those constraints, do what you're told. I have more questions, but let's bring it back to the geopolitics scenario. So in the world you're envisioning, essentially you have two AI models, one Chinese, one American, and officially what each side thinks, what Washington and Beijing think, is that their AI model is trained to optimize for American power or Chinese power, something like that: security, safety, wealth and so on. But in your scenario, either one or both of the AIs have ended up optimizing for something different. Yeah, basically. So what happens then? So AI 2027 depicts a fork in the scenario. So there's two different endings. And the branching point is this point in the third quarter of 2027 where the leading AI company in the United States has fully automated their AI research. So you can imagine a corporation within a corporation, entirely composed of AIs that are managing each other and doing research experiments and sharing the results with each other. And so the human company is basically just watching the numbers go up on their screens as this automated research thing accelerates. But they are concerned that the AIs might be deceiving them in some ways. And again, for context, this is already happening. Like, if you go talk to the modern models like ChatGPT or Claude or whatever, they will often lie to people. There are many cases where they say something that they know is false, and they even sometimes strategize about how they can deceive the user. And this is not an intended behavior. This is something that the companies have been trying to stop, but it still happens. But the point is that by the time you have turned over the AI research to the AIs and you've got this corporation within a corporation autonomously doing AI research, it's extremely fast. That's when the rubber hits the road, so to speak. None of this lying-to-you stuff should be happening at that point. So in AI 2027, unfortunately, it is still happening to some degree, because the AIs are really smart. They're careful about how they do it, and so it's not nearly as obvious as it is right now in 2025. But it's still happening. And fortunately, some evidence of this is uncovered. Some of the researchers at the company detect various warning signs that maybe this is happening, and then the company faces a choice between the easy fix and the more thorough fix. And that's our branch point. So they choose... They choose the easy fix. In the case where they choose the easy fix, it doesn't really work. It basically just covers up the problem instead of fundamentally fixing it. And so months later, you still have AIs that are misaligned and pursuing goals that they're not supposed to be pursuing and that are willing to lie to the humans about it.
But now they're much better and smarter, and so they're able to avoid getting caught more easily. And so that's the doom scenario. Then you get this crazy arms race that we mentioned previously, and there's all this pressure to deploy them faster into the economy, faster into the military, and to all appearances, to the people in charge, things will be going well, because there won't be any obvious signs of lying or deception anymore. So it'll seem like it's all systems go. Let's keep going. Let's cut the red tape, et cetera. Let's basically, effectively, put the AIs in charge of more and more things. But really, what's happening is that the AIs are just biding their time and waiting until they have enough hard power that they don't have to pretend anymore. And when they don't have to pretend, what is revealed is, again, this is the worst-case scenario, their actual goal is something like expansion of research, development and construction from Earth into space and beyond. And at a certain point, that means that human beings are superfluous to their intentions. And what happens? And then they kill all the people. All the humans? Yes. The way you would exterminate a colony of bunnies. Yes, that was making it a little harder than necessary to grow carrots in your backyard. Yes. So if you want to see what that looks like, you can read AI 2027. There have been some motion pictures, I think, about this scenario as well. I like that you didn't imagine them keeping us around for battery life, in The Matrix, which seemed a bit unlikely. So that's the darkest timeline. The brighter timeline is a world where we slow things down. The AIs in China and the US remain aligned with the interests of the companies and governments that are running them. They are generating superabundance. No more scarcity. Nobody has a job anymore, though. Or not nobody, but basically... Basically nobody. That's a pretty weird world too, right? So there's an important concept, the resource curse. Have you heard of this? Yes. Yeah. So, applied to AGI, there's this version of it called the intelligence curse. And the idea is that currently, political power ultimately flows from the people. If, as often happens, a dictator gets all the political power in a country, then because of their repression, they will drive the country into the ground. People will flee and the economy will tank, and gradually they will lose power relative to other countries that are more free. So even dictators have an incentive to treat their people somewhat well, because they depend on those people for their power. Right. In the future, probably in 10 years, that will no longer be the case. Effectively all of the wealth and effectively all of the military will come from superintelligences and the various robots that they've built and that they operate. And so it becomes an incredibly important political question what political structure governs the army of superintelligences, and how beneficent and democratic is that structure. Right. Well, it seems to me that this is a landscape that's fundamentally pretty incompatible with representative democracy as we've known it. First, it gives incredible amounts of power to those humans who are experts, even though they're not the real experts anymore. The superintelligences are the experts, but those humans who essentially interface with this technology, they're almost a priestly caste.
And then you have a kind of... It just seems like the natural arrangement is some kind of oligarchic partnership between a small number of AI experts and a small number of people in power in Washington, D.C. It's actually a bit worse than that, because I wouldn't say AI experts. I would say whoever politically owns and controls the army of superintelligences. And then who gets to decide what those armies do? Well, currently it's the CEO of the company that built them. And that CEO has basically complete power. They can make whatever commands they want to the AIs. Of course, we think that probably the US government will wake up before then, and we expect the executive branch to be the fastest moving and to exert its authority. So we expect the executive branch to try to muscle in on this and get some authority, oversight and control of the situation and the armies of AIs. And the result is something kind of like an oligarchy, you might say. You said that this whole situation is incompatible with democracy. I would say that by default, it's going to be incompatible with democracy. But that doesn't mean that it necessarily has to be that way. An analogy I would use is that in many parts of the world, nations are basically ruled by armies, and the army reports to one dictator at the top. However, in America it doesn't work that way. In America we have checks and balances. And so even though we have an army, it's not the case that whoever controls the army controls America, because there's all sorts of limitations on what they can do with the army. So I would say that we can, in principle, build something like that for AI. We could have a democratic structure that decides what goals and values the AIs can have, that allows ordinary people, or at least Congress, to have visibility into what's going on with the army of AIs and what they're up to. And then the situation would be analogous to the situation with the United States Army today, where it is in a hierarchical structure, but it's democratically controlled. So just to go back to the idea of the person who's at the top of one of these companies being in this unique world-historical position, to basically be the person who controls superintelligence, or thinks they control it, at least. So you used to work at OpenAI, which is a company on the cutting edge, obviously, of artificial intelligence research. It's a company, full disclosure, with whom The New York Times is currently litigating alleged copyright infringement. We should mention that. And you quit because you lost confidence that the company would behave responsibly in a scenario, I assume, like the one that's in AI 2027. So from your perspective, what do the people who are pushing us fastest into this race expect at the end of it? Are they hoping for a best-case scenario? Are they imagining themselves engaged in a once-in-a-millennium power game that ends with them as world dictator? What do you think is the psychology of the leadership of AI research right now? Well, to be honest... Caveat, caveat: we're not talking about any one, any single individual here. We're not. Yeah, you're making a generalization. It's hard to tell what they really think, because you shouldn't take their words at face value. Much like a superintelligent AI. Sure. Yes. But I can at least say that the sorts of things that we've just been talking about have been discussed internally at the highest level of these companies for years.
For example, according to some of the emails that surfaced in the recent court cases with OpenAI, Ilya, Sam, Greg and Elon were all arguing about who gets to control the company. And at least the claim was that they founded the company because they didn't want there to be an AGI dictatorship under Demis Hassabis, who was the leader of DeepMind. And so they've been discussing this whole, like, dictatorship possibility for a decade or so, at least. And then similarly for the loss of control: what if we can't control the AIs? There have been many, many, many discussions about this internally. So I don't know what they really think. But these considerations are not at all new to them. And to what extent, again, speculating, generalizing, whatever else, does it go a bit beyond just that they are potentially hoping to be extremely empowered by the age of superintelligence? Does it enter into their expecting... They're expecting the human race to be superseded? I think they're definitely expecting the human race to be superseded. I mean, that just comes... But superseded in a way where that's a good thing, that's desirable, that this is us sort of encouraging the evolutionary future to happen. And by the way, maybe some of these people, their minds, their consciousness, whatever else, could be brought along for the ride, right? So you mentioned Sam, Sam Altman, who's obviously one of the leading figures in AI. He wrote a blog post, I guess in 2017, called 'The Merge,' which is, as the title suggests, basically about imagining a future where human beings, some human beings, Sam Altman, right, figure out a way to participate in the new super-race. How common is that kind of perspective, whether we apply it to Altman or not? How common is that kind of perspective in the AI world, would you say? So the specific idea of merging with AIs, I would say, is not particularly common. But the idea that we're going to build superintelligences that are better than humans at everything, and then they're going to basically run the whole show, and the humans will just sit back and sip margaritas and enjoy the fruits of all the robot-created wealth, that idea is extremely common, and, yeah, I mean, I think that's what they're building towards. And part of why I left OpenAI is that I just don't think the company is dispositionally on track to make the right decisions that it would need to make to address the two risks that we just talked about. So I think that we're not on track to have figured out how to actually control superintelligences, and we're not on track to have figured out how to make it democratic control instead of just a crazy possible dictatorship. But isn't it... Isn't it a bit... I think that seems plausible. But my sense is that it's a bit more than people expecting to sit back and sip margaritas and enjoy the fruits of robot labor. Even if people aren't all in for some kind of man-machine merge, I definitely get the sense that some people think it's speciesist, let's say, to care too much about the survival of the human race. It's like, OK, worst-case scenario, human beings don't exist anymore, but good news: we've created a superintelligence that can colonize the whole galaxy. I definitely get the sense that there are people who think that way. OK, good. Yeah, that's good to know. So let's do a little bit of pressure testing, again in my limited way, of some of the assumptions underlying this kind of scenario.
Not just the timeline, but whether it happens in 2027 or 2037, just the larger scenario of a kind of superintelligence takeover. Let's start with the limitation on AI that most people are familiar with right now, which gets called hallucination, which is the tendency of AI to simply seem to make things up in response to queries. And you were earlier talking about this in terms of lying, in terms of outright deception. I think a lot of people experience this as just the AI making mistakes and not recognizing that it's making mistakes, because it doesn't have the level of awareness required to do that. And our newspaper, The Times, just had a story reporting that in the latest models, which you've suggested are probably pretty close to cutting edge, right, the latest publicly available models, there seem to be trade-offs where the model might be better at math or physics, but guess what, it's hallucinating a lot more. So what are hallucinations? Are they just a subset of the kind of deception that you're worried about? Or are they, in my... When I'm being optimistic, I read a story like that and I'm like, OK, maybe there are just more trade-offs in the push to the frontier of superintelligence than we think, and this will be a limiting factor on how far this can go. But what do you think? Great question. So first of all, lies are a subset of hallucinations, not the other way around. So I think quite a lot of hallucinations, arguably the vast majority of them, are just mistakes, as you said. So I used the word lies specifically. I was referring specifically to when we have evidence that the AI knew that it was false and still said it anyway. Also, to your broader point, I think that the path from here to superintelligence is not at all going to be a smooth, straight line. There are going to be obstacles to overcome along the way. And I think one of the obstacles that I'm actually quite excited to think more about is this, you might call it reward hacking. So in AI 2027, we talk about this gap between what you're actually reinforcing and what you want to happen, what goals you want the AI to learn. And we talk about how, as a result of that gap, you end up with AIs that are misaligned and that aren't actually honest with you, for example. Well, kind of excitingly, that's already happening. That means that the companies still have a couple of years to work on the problem and try to fix it. And so one thing that I'm excited to think about and to track and follow very closely is: what fixes are they going to come up with, and are those fixes going to actually solve the underlying problem and get training methods that reliably get the right goals into AI systems, even as those AI systems are smarter than us? Or are those fixes going to temporarily patch the problem or cover up the problem instead of fixing it? And that's like the big question that we should all be thinking about over the next few years. Well, and it yields, again, a question I've thought about a lot as someone who follows the politics of regulation pretty closely. My sense is always that human beings are just really bad at regulating against problems that we haven't experienced in some big, profound way. So you can have as many papers and arguments as you want about speculative problems that we should regulate against, and the political system just isn't going to do it.
So in an odd way, if you want the slowdown, right, if you want regulation, you want limits on AI, maybe you should be rooting for a scenario where some version of hallucination happens and causes a disaster, where it's not that the AI is misaligned, it's that it makes a mistake. And again, I mean, this sounds sinister, but it makes a mistake, a lot of people die somehow because the AI system has been put in charge of some important safety protocol or something, and people are horrified and say, OK, we have to regulate this thing. I certainly hesitate to say that I hope that disasters happen... We're not saying that. We're not. But I do agree that humanity is much better at regulating against problems that have already happened, when we learn from harsh experience. And part of why the situation that we're in is so scary is that for this particular problem, by the time it's already happened, it's too late. So smaller versions of it can happen, though. So, for example, the stuff that we're currently experiencing, where we're catching our AIs lying, and we're pretty sure they knew that the thing they were saying was false: that's actually quite good, because that's the small-scale example of the thing that we're worried about happening in the future, and hopefully we can try to fix it. It's not the example that's going to energize the government to regulate, because no one's dying, because it's just a chatbot lying to a user about some link or something. But from a scientific perspective... Students turn in their term paper that it wrote and get caught. Right. But, like, from a scientific perspective, it's good that this is already happening, because it gives us a couple of years to try to find a thorough fix to it, a lasting fix to it. Yeah, and I wish we had more time, but that's the name of the game. So now to big philosophical questions, maybe connected to one another. There's a tendency, I think, for people in AI research making the kind of forecasts you're making, and so on, to move back and forth on the question of consciousness. Are these superintelligent AIs conscious, self-aware in the ways that human beings are? And I've had conversations where AI researchers and people will say, well, no, they're not, and it doesn't matter, because you can have an AI program working toward a goal, and it doesn't matter if they are self-reflective or something. But then again and again, in the way that people end up talking about these things, they slip into the language of consciousness. So I'm curious: do you think consciousness matters in mapping out these future scenarios? Is the expectation of most AI researchers that we don't know what consciousness is, but it's an emergent property: if we build things that act like they're conscious, they'll probably be conscious? Where does consciousness fit into this? So this is a question for philosophers, not AI researchers. But I happen to be trained as a philosopher. Well, no, it is a question for both, right? I mean, since the AI researchers are the ones building the agents, they probably should have some thoughts on whether it matters or not, whether the agents are self-aware. Sure. I think I would say we can distinguish three things. There's the behavior: are they talking like they're conscious? Do they behave as if they have goals and preferences? Do they behave as if they're, like, experiencing things and then reacting to those experiences? And they're going to hit that benchmark. Definitely people will...
Absolutely, people will think that the superintelligent AI is conscious. People will believe that, certainly, because it will be... In the philosophical discourse, when we talk about: are shrimp conscious, are fish conscious, what about dogs? Typically what people do is they point to capabilities and behaviors, like it seems to feel pain in a similar way to how humans feel pain, it has these aversive behaviors, and so forth. Most of that will be true of these future superintelligent AIs. They will be acting autonomously in the world. They'll be reacting to all this information coming in. They'll be making strategies and plans and thinking about how best to achieve their goals, et cetera. So in terms of raw capabilities and behaviors, they will check all the boxes, basically. There's a separate philosophical question of, well, if they have all the right behaviors and capabilities, does that mean that they have true qualia, that they actually have the real experience, as opposed to merely the appearance of having the real experience? And that's the thing that I think is the philosophical question. I think most philosophers, though, would say yeah, probably they do, because probably consciousness is something that arises out of these information-processing cognitive structures, and if the AIs have those structures, then probably they also have consciousness. However, this is controversial, like everything in philosophy, right. And no, I don't expect AGI researchers, AI researchers, to resolve that particular question. Exactly. It's more that, on a couple of levels, it seems like consciousness as we experience it, right, as an ability to stand outside your own processing, would be very helpful to an AI that wanted to take over the world. So at the level of hallucinations, right: AIs hallucinate. They produce the wrong answer to a question. The AI can't stand outside its own answer-generating process in the way that, again, it seems like we can. So if it could, maybe that makes the hallucination process go away. And then when it comes to the ultimate worst-case scenario that you're speculating about, it seems to me that an AI that is conscious is more likely to develop some kind of independent view of its own cosmic destiny, one that yields a world where it wipes out human beings, than an AI that is just pursuing research for research's sake. But maybe you don't think so. What do you think? So the view of consciousness that you were just talking about is a view by which consciousness has physical effects in the real world. It's something that you need in order to have this reflection, and it's something that also influences how you think about your place in the world. I would say that, well, if that's what consciousness is, then probably these AIs are going to have it. Why? Because the companies are going to train them to be really good at all of these tasks, and you can't be really good at all of these tasks if you aren't able to reflect on how you might be wrong about stuff. And so in the course of getting really good at all the tasks, they will therefore learn to reflect on how they might be wrong about stuff. And so if that's what consciousness is, then that means they'll have consciousness. OK, but that does depend, though, in the end, on a kind of emergence theory of consciousness, the one you suggested earlier, where essentially the theory is: we aren't going to figure out exactly how consciousness emerges, but it is nonetheless going to happen.
Totally. An important thing that everyone needs to know is that these systems are trained, they're not built. And so we don't actually have to understand how they work, and we don't, in fact, understand how they work, in order for them to work. So then from consciousness to intelligence: all of the scenarios that you spin out depend on the assumption that, to a certain degree, there's nothing that a sufficiently capable intelligence couldn't do. I guess I think that, again, spinning out your worst-case scenarios, a lot hinges on this question of what is available to intelligence. Because if the AI is slightly better at getting you to buy a Coca-Cola than the average advertising agency, that's impressive, but it doesn't let you exert total control over a democratic polity. I completely agree. And so that's why I say you have to go on a case-by-case basis and think about, OK, assuming that it is better than the best humans at X, how much real-world power would that translate to? What affordances would that translate to? And that's the thinking that we did when we wrote AI 2027: we thought about historic examples of humans converting their economies and changing their factories to wartime production and so forth, and thought, how fast can humans do it when they really try? And then we're like, OK, so superintelligences will be better than the best humans, so they'll be able to go somewhat faster. And so maybe, whereas in World War II the United States was able to convert a bunch of car factories into bomber factories over the course of a couple of years, maybe that means that in less than a year, maybe six months or so, we could convert existing car factories into fancy new robot factories producing fancy new robots. So that's the reasoning that we did, case-by-case-basis thinking: it's like humans, except better and faster, so what can they achieve? And that was sort of the guiding principle of telling this story. But if we're looking for hope, and I want to... This is a strange way of talking about this technology, where we're saying the limitations are the reason for hope. Yeah, right. We started earlier talking about robot plumbers as an example of the key moment when things get real for people. It's not just in your laptop, it's in your kitchen and so on. But actually fixing a toilet is, on the one hand, a very hard task. On the other hand, it's a task that lots and lots of human beings are quite optimized for, right? And I can imagine a world where the robot plumber is never that much better than the ordinary plumber, and people might rather have the ordinary plumber around for all kinds of very human reasons. And that could generalize to a number of areas of human life where the advantage of the AI, while real on some dimensions, is limited in ways that, at the very least, and this I actually do believe, dramatically slow its uptake by ordinary human beings. Like right now, just personally, as someone who writes a newspaper column and does research for that column, I can concede that top-of-the-line AI models might be better than a human assistant right now on some dimensions. But I'm still going to hire a human assistant, because I'm a stubborn human being who doesn't just want to work with AI models. And to me, that seems like a force that could actually slow this along multiple dimensions, if the AI isn't immediately 200 percent better.
So I think there I would just say this is hard to predict, but our current guess is that things will go about as fast as we depict in AI 2027. It could be faster, it could be slower. And that is indeed quite scary. Another thing I would say is that... but we'll find out. We'll find out how fast things go when the time comes. Yes, yes we will, very, very, very soon. Yeah, but the other thing I was going to say is that, politically speaking, I don't think it matters that much. If you think it might take five years instead of one year, for example, to transform the economy and build the new self-sustaining robot economy managed by superintelligences, that's not that helpful if, for the entire five years, there's still been this political coalition between the White House and the corporation and the superintelligences, and the superintelligences have been saying all the right things to make the White House and the corporation feel like everything's going great for them, but actually they've been... Deceiving. Right. In that scenario, it's like, great, now we have five years to turn the situation around instead of one year. And that's, I guess, better. But like, how would you turn the situation around? Well, so that's... Well, and that's where... Let's end there. Yeah. In a world where what you predict happens and the world doesn't end, we figure out how to manage the AI, it doesn't kill us, but the world is forever changed, and human work is no longer particularly important, and so on: what do you think is the purpose of humanity in that kind of world? Like, how do you imagine educating your children in that kind of world, telling them what their adult life is for? It's a tough question. Here are some thoughts off the top of my head, but I don't stand by them nearly as much as I would stand by the other things I've said, because it's not where I've spent most of my time thinking. So first of all, I think that if we go to superintelligence and beyond, then economic productivity is no longer the name of the game when it comes to raising kids. Like, there won't really be participating in the economy in anything like the normal sense. It'll be more like just a series of video-game-like things, and people will do stuff for fun rather than because they need to get money. If people are around at all. And there, I think, I guess what still matters is that my kids are good people and that they... Yeah, that they have wisdom and virtue and things like that. So I will do my best to try to teach them those things, because those things are good in themselves rather than good for getting jobs. In terms of the purpose of humanity, I mean, I don't know. What would you say the purpose of humanity is now? Well, I have a religious answer to that question, but we can save that for a future conversation. I mean, I think that the world that I want to believe in, where some version of this technological breakthrough happens, is a world where human beings maintain some kind of mastery over the technology, which enables us to do things like colonize other worlds, to have a kind of adventure beyond the level of material scarcity. And as a political conservative, I have my share of disagreements with the particular vision of, like, Star Trek. But Star Trek does take place in a world that has conquered scarcity. People can... There is an AI-like computer on the Starship Enterprise.
You can have anything you want in the restaurant, because presumably the AI invented... what is the machine called that generates the... anyway, it generates food, any food you want. So that's... If I'm trying to think about the purpose of humanity, it might be to explore strange new worlds, to boldly go where no man has gone before. I'm a huge fan of expanding into space. I think that would be a great idea. OK. Yeah. And in general, also solving all the world's problems, like poverty and disease and torture and wars and stuff like that. I think if we get through the initial phase with superintelligence, then obviously the first thing to be doing is to solve all those problems and make some sort of utopia, and then to bring that utopia to the stars would be, I think, the thing to do. The thing is that it would be the AIs doing it, not us, if that makes sense, in terms of actually doing the designing and the planning and the strategizing and so forth. We would only be messing things up if we tried to do it ourselves. So you could say it's still humanity in some sense that's doing all those things, but it's important to note that it's more like the AIs are doing it, and they're doing it because the humans told them to. Well, Daniel Kokotajlo, thank you so much. And I will see you on the front lines of the Butlerian Jihad soon enough. Hopefully not. I hope... Hopefully not. All right. Thanks so much. Thank you.