When Chinese AI company DeepSeek announced they had built a model that could compete with OpenAI at a fraction of the cost, it sent shockwaves through the industry and roiled global markets. But amid all the noise around DeepSeek, there was a clear signal: machine reasoning is here, and it's transforming AI.
In this episode, CHT co-founders Randy Fernando and Aza Raskin explore what happens when AI moves beyond pattern matching to actual reasoning. They unpack how these new models can not only learn from human knowledge but discover entirely new strategies we've never seen before – bringing unprecedented problem-solving potential but also unpredictable risks.
These capabilities are a step toward a critical threshold - when AI can accelerate its own development. With major labs racing to build self-improving systems, the crucial question isn't how fast we can go, but where we're trying to get to. How do we ensure this transformative technology serves human flourishing rather than undermining it?

This is an interview from our podcast Your Undivided Attention, released on February 20, 2025. It was lightly edited for clarity.
Aza Raskin: Hey, everyone, it's Aza. Welcome back to Your Undivided Attention. So today, we are going to be doing actually a bit of a special episode. It's going to be me here with our co-founder Randy Fernando, who was at Nvidia for seven years. And what we really want to do is give you some insights into the latest set of AI models that came out. So these are OpenAI's o3, DeepSeek's R1, and actually they're following on OpenAI's o1 from a couple months ago. And we want to talk about what makes them a big deal, why we have switched into a new paradigm in how these models get trained, and what's going on behind the scenes. So first, Randy, thanks for joining me.
Randy Fernando: Glad to be here.
Aza Raskin: First place to start is this new model from China, DeepSeek R1, it dropped and it ended up creating this frenzy in media, it shook global markets. The hype has quieted down and actually, I think that the drop in global markets was very irrational. But let's talk a little bit now about what makes this a key inflection point in AI tech.
Randy Fernando: I think there were several things, and I'm not sure exactly which order to go in, but I'll just name a few. One was low-cost, high-performance reasoning. It actually performed well and people used it, and that was really impressive. Now, there are some asterisks about the cost, because the cost didn't account for the GPUs, the salaries-
Aza Raskin: Just to jump in, there's a widely reported number that for between $5 and $6 million, this Chinese lab was able to make a model as good as OpenAI's o1 model. And if this is true, that means that the big labs no longer have a frontier competitive advantage, everyone could be making these. But, of course, that number, I think, was inaccurately reported.
Randy Fernando: Yeah, exactly. And there's some debate about that, but I think our goal today is to give you some principles to think about this rather than nitpicking every detail.
Aza Raskin: That's right.
Randy Fernando: Clearly, there was some really smart implementation and algorithmic optimization; a lot of smart things were done to do it all efficiently. That's true. o3 still performs better, and I think it's important to remember that, because amidst all the hype, I think some people lost track of it. o3 performs better, but it uses a lot more computation and cost to get there. Then there are the open weights and the published methodology, right? The DeepSeek R1 paper talks a lot about exactly what they did and this process called reinforcement learning, where the model is able to try out lots of different experimental ideas, score them, and then keep the best ones.
So it's allowed to be very creative, try out lots of different answers to problems, different sequences of steps, different recipes to solve that problem. Some work, some don't work, and then it's able to figure out, yeah, these are the ones I should keep, these are the ones I should toss. And that worked really well, and this paper kind of documents the process for doing that. Plus, since all the weights are open, this is now the new baseline that anyone can have, anyone who's serious can have access to in an open way. So that's a big game changer.
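To make that sample-score-keep loop concrete, here is a deliberately toy sketch. It is not DeepSeek's actual training code (the R1 paper describes a policy-gradient method it calls GRPO); the names toy_model, verifier, and sample_score_keep are invented for illustration, and the "model" just guesses numbers where a real model would generate reasoning steps.

```python
import random

def toy_model(problem, attempt_seed):
    """Stand-in for a language model: proposes a candidate answer.
    Here it just guesses integers; a real model would generate reasoning steps."""
    rng = random.Random(attempt_seed)
    return rng.randint(0, 20)

def verifier(problem, answer):
    """Automatic check with a clear right/wrong signal -- the equivalent of
    plugging x back into the equation. Here the 'equation' is x + 3 == target."""
    return answer + 3 == problem["target"]

def sample_score_keep(problem, n_samples=200):
    """Try many candidate answers, score each one, keep only the ones that pass.
    In real reinforcement learning, the keepers become new training signal."""
    keepers = []
    for i in range(n_samples):
        candidate = toy_model(problem, attempt_seed=i)
        if verifier(problem, candidate):
            keepers.append(candidate)
    return keepers

if __name__ == "__main__":
    problem = {"target": 10}              # i.e. solve x + 3 = 10
    print(sample_score_keep(problem))     # kept candidates are all 7
```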
Aza Raskin: Yeah. So now I want to walk everyone through what makes o1, o3, and R1 really different. Randy was just referring to them. So let's start with the large language models. These are the GPT-4s, the Llamas, that everyone is now aware of. And the way those work is they are trained on the entirety of the internet or lots and lots of images. And what they learn to do is to produce text or images in the style of. So it can produce text in the style of Shakespeare, produce text in the style of thinking, produce text in the style of being empathetic, produce text in the style of a good chess move. But it doesn't really know what's going on, it hasn't thought about it, it's just doing a very large-scale pattern match and coming up with a knee-jerk reaction. And that has a limit to how good it is.
Randy Fernando: Can I add a little bit?
Aza Raskin: Yeah, absolutely.
Randy Fernando: It's just patterns show up everywhere. I just want people to recognize how often patterns show up in our life. When you look at language or vision or music or code or weather or medicine, there's patterns in all of these. Whether it's words or pixels or audio waveforms or syntax in code or on a map, which cells or which color or where there might be a cancer on an image, all of these things come in patterns. So once we can learn those patterns and models can learn to extrapolate those patterns, they can become good at all sorts of things that are important to us as humans.
Aza Raskin: That's great. That is great. And another way of saying this is that these language models can treat absolutely everything as a language. Ordinary language, obviously, is just a sequence of words. Code is just a sequence of special words, it's a language. DNA is a sequence of ATGC, just another language. Images are a sequence of colors, just another language. So if you can learn the patterns of those different languages, then AI can learn to speak and translate from the language of everything. And the important thing about language models is that they're learning really to babble in a convincing way in all of those languages. And that's where you get all the hallucinations and confabulation, because it's just giving a statistically representative answer at a very large scale.
Okay. So then along comes R1, o1, o3, and what makes these different is it's almost like a planning head that's placed on top of the intuition. Let me give a really specific example of how this works. Let's imagine you've trained a language model on chess moves. So now it can come up with a good intuitive next chess move given the board state. And that can be as good as a very good chess player, but not better than the very best or the grandmasters, because it's just giving an intuitive hit.
Randy Fernando: It can't do better because of what it's trained on. If it's only trained on what humans have done, it can't do better than the humans. So that's a really important concept, and now we're just about to jump into why we can now transcend that.
Aza Raskin: That's exactly right. And that's a really important point because often, people will push back and be like, "But AI can't get better than humans because it's only trained on human data. So how could it possibly get better?" Well, when you or I play Garry Kasparov in chess, we'll lose. Or at least I will. I don't know.
Randy Fernando: Oh, me too. Yeah. I'd play, but I'll lose. Yeah.
Aza Raskin: Why? And the actual answer is because, one, he has really good intuition because he's played lots of games. And two, he's very good at thinking through all the different scenarios. If I make this move, then they'll make this move, so I'll make this move. Oh, that doesn't work, backtrack. So I'll make this move, they'll make that move, then I'll make... Aha, now I'm in a good position.
So there's this sort of tree of thoughts that Garry Kasparov is exploring based on his very good intuition. Now, you or I are going to do trees of thought, but our intuition is not that good, so we're going to make lots of false steps. He's going to search all the most important trees very quickly and hence, he will dominate us.
Well, that is the ability that o1, R1, o3, these reasoning models, are starting to have: they can use their intuitions from their language model and then create trees of thought, a sort of very smart trial and error, to search over what the good moves are. And in that way, you can make a chess AI that is better than every human being forever.
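The "if I make this move, they'll make that move, backtrack" process Aza walks through is classic game-tree search. This is not how o1 or R1 work internally (they learn chains of thought rather than running an explicit minimax), but a minimal, game-agnostic sketch shows the shape of searching over an intuition. The toy game and the function names here are invented for illustration.

```python
def minimax(state, depth, maximizing, intuition, moves, play):
    """'If I make this move, then they'll make this move...' -- explore the tree
    of futures, falling back to the intuition score when we stop looking ahead."""
    options = moves(state)
    if depth == 0 or not options:
        return intuition(state)
    scores = [minimax(play(state, m), depth - 1, not maximizing, intuition, moves, play)
              for m in options]
    return max(scores) if maximizing else min(scores)

if __name__ == "__main__":
    # Toy 'game': the state is a running total, each move adds 1, 2, or 3,
    # and the intuition just prefers totals close to 10.
    intuition = lambda s: -abs(10 - s)
    moves = lambda s: [1, 2, 3] if s < 10 else []
    play = lambda s, m: s + m
    best = max(moves(0), key=lambda m: minimax(play(0, m), 3, False, intuition, moves, play))
    print("best opening move:", best)
```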
Randy Fernando: Yeah, exactly. And another way of underlining this is to say that just like the patterns we talked about that exist in audio, video, images, all of these things, reasoning also follows patterns. There's recipes of thought.
Aza Raskin: That's right.
Randy Fernando:... that we've just taken.
Aza Raskin: And what Randy said can feel a little meta, but it is so important: reasoning itself has a set of patterns, and if you learn them, you can get better at reasoning. So I think we're going to stop seeing these big model jumps from GPT-3 to 3.5, 4 to 4.5 to 5. There are a couple more still coming, but we're going to enter a new regime where, if you pour more compute in, the AIs can get better. You just shovel in more money and they'll continuously get better, and let me explain how.
So let's go back to the chess example. With the chess example, maybe your language model has an Elo score of 1500, Elo meaning just a way of ranking chess players. And you now add search on top of that, reinforcement learning or planning. So it's looking at all the various paths and it starts to discover better moves, maybe just a little bit better. So maybe it's like Elo 1505 or something, just a little bit better. You then distill, that is you retrain your original model, your intuition, to now have the intuition of that 1505, the slightly better player. And then you just search on top of that and now you can discover 1510 moves. And then you distill and then you can discover 1515 moves.
And you can see how you can consistently go from... You start with your base model, your intuition. You think or reason over the top of it. That lets you discover new, better moves, which you then learn from and put back into your intuition, and now you have a ratchet. And it's important to note this is not just chess, this is math, this is any field that has theoretical in front of its name, because those are closed systems you can just run computation on to check yourself. So that's theoretical physics, theoretical biology, theoretical chemistry.
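A hedged sketch of that ratchet, with the chess abstracted down to an Elo number: a search step plays a bit better than the raw intuition, and a distillation step folds that improvement back into the intuition, so the next search starts from a higher baseline. The functions search_improve, distill, and ratchet are hypothetical placeholders, not any lab's actual training loop.

```python
def search_improve(intuition_elo, search_budget):
    """Stand-in for tree search / reinforcement learning on top of the intuition:
    with some compute spent thinking, the searched play is a little stronger."""
    return intuition_elo + 0.01 * search_budget      # e.g. 1500 -> 1505

def distill(intuition_elo, searched_elo):
    """Stand-in for retraining the base model on the moves that search discovered,
    so the stronger play becomes the new knee-jerk intuition."""
    return max(intuition_elo, searched_elo)

def ratchet(initial_elo=1500.0, iterations=5, search_budget=500):
    """Alternate search and distillation: each pass raises the baseline a little."""
    elo = initial_elo
    for step in range(iterations):
        searched = search_improve(elo, search_budget)
        elo = distill(elo, searched)
        print(f"iteration {step + 1}: intuition is now ~{elo:.0f} Elo")
    return elo

if __name__ == "__main__":
    ratchet()   # 1505, 1510, 1515, 1520, 1525 -- the ratchet described above
```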
Randy Fernando: Anywhere where there's a clear right or wrong-
Aza Raskin: That's right.
Randy Fernando: ... where you can check. So math, you can substitute, right? Say like you're solving for X in some complex equation, you can plug X back in and see if X was right. So based on that, you can improve, right? With code, you can generate code and you can plug it in and you can compile it and run it and see if it actually works.
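Randy's two examples of an automatic check, plugging x back into the equation and actually running generated code, look roughly like this. The candidate answers below are hard-coded stand-ins for model output, and a real pipeline would sandbox the execution step.

```python
def check_algebra(candidate_x):
    """Verify a proposed solution to 2x + 6 = 20 by plugging it back in."""
    return 2 * candidate_x + 6 == 20

def check_code(candidate_source):
    """Verify generated code by compiling and running it against a test.
    (exec on untrusted model output is unsafe; real systems sandbox this.)"""
    namespace = {}
    try:
        exec(candidate_source, namespace)        # "compile it and run it"
        return namespace["add"](2, 3) == 5       # "see if it actually works"
    except Exception:
        return False

if __name__ == "__main__":
    print(check_algebra(7))                                    # True: 2*7 + 6 == 20
    print(check_algebra(4))                                    # False: toss this idea
    print(check_code("def add(a, b):\n    return a + b"))      # True: keep it
    print(check_code("def add(a, b):\n    return a - b"))      # False: toss it
```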
So those domains are the ones that you can just improve and improve and improve, which is why in chess or Go or StarCraft, we've been able to accomplish not just human level or the best humans, but go far beyond, because you can just keep improving, you can keep testing and you can just toss away the ideas that don't work. It's really interesting and it kind of says a lot about what the future holds. So it sort of raises the question of why now?
Aza Raskin: Right.
Randy Fernando: Right? Why now?
Aza Raskin: Great question. Yes.
Randy Fernando: And an important piece of that is having base models that were smart enough to generate interesting ideas to try out in the first place and to be able to evaluate like, hey, that's a good path, let's try that. That's a bad path. So until recently, the base models just weren't good enough to do this. So this idea of reinforcement learning, these feedback loops were not actually possible.
Aza Raskin: No, that's right. And actually, I know of teams that a year ago tried pretty much the exact same thing that DeepSeek tried and it just didn't work because the base models, the intuition wasn't good enough. You have a bad intuition, you try to search over bad intuition, you just get bad thoughts.
Randy Fernando: That's right. So one thing that's also really important is that the same reason that makes these models really good at quantifiable areas makes them not as big a jump in subjective areas, say something like creative writing, which is much harder to quantify and say, "Hey, is that really good or is that not as good?" Now, again, if you define some very clear parameters for creative writing and say, "Here's a scoring system. This is a good piece, this is a bad piece," you can do the same method. But in other areas, you can't.
Aza Raskin: It's important to note that one of the open questions is how much an AI learning how to code and do good thinking in the harder sciences transfers to the soft sciences and these softer tasks. And there is evidence that you do get some kind of transfer.
Randy Fernando: Yes. Yes.
Aza Raskin: That the better you get at hard stuff, the better you get at thinking through the soft stuff. There's a famous early example from two years ago where just training AIs on code made them better writers and thinkers, because there's a kind of procedural formality to code that they then carried over into the softer skills. I do want to extend-
Randy Fernando: So learning how to think, right?
Aza Raskin: Exactly.
Randy Fernando: Algorithmic thinking, learning how to think in a structured sequence translates to all sorts of areas.
Aza Raskin: And now we get to why the market crash was irrational. The market crash was irrational because you can always use more compute. And as soon as these agents get to the place where they can task themselves and be like, "What are ways that I could use more compute to, say, make more money?" And that's probably coming end of this year, early next, give or take. Then compute is an all-you-can-eat buffet. Because with oil, if we discover more oil, it's not like humans can immediately figure out how to use all that oil. But with compute and with AI, as soon as we discover more compute, the AI can figure out how to use that compute effectively. So for Nvidia and all of the AI companies, it still is going to be a race for who has the most compute.
And then the final thought here is that this doesn't just work with games and math and physics. This is going to work with strategy games of war. This is going to work with the strategy of scientific discovery. This is going to work with persuasion. You train these models over the entirety of every video of two human beings interacting, and now you start doing search over the top of that to ask, what joke, what relationship, what facial expressions does the model need to make to get the human being to laugh or to cry or to feel a certain state? So superhuman persuasion is a natural result of all these things.
Randy Fernando: Lots of things can be scored and quantified if you're just creative about how you do it. And once you can do that, you can reinforcement learn how to do it really well.
I wanted to add one thing too, Aza, to your third point, just to help people realize that the automation revolution is about the entire $110 trillion global economy, nothing less. It's about the cognitive, currently through large language models, and the physical through robotics. And that's why you can spend so much more on all this stuff as long as it's getting you returns.
And I think it's worth mentioning, there's this question of is it all a big bubble? I think we have to be nuanced about it. Part of it is more of a bubble. I think the part where generative AI feeds the attention economy has a much more bubble-like quality, because it's just not as clear that there's something genuinely helpful and advancing there.
But in coding, for example, Cursor was recently the fastest company to $100 million of annual recurring revenue. And that is because they are helping with coding. Cursor is an environment where you go in and you write code and it helps you do that really efficiently. The value of that, the real value of that, is enormous, especially on this path to large-scale automation. And I think that's really important to keep in mind.
Aza Raskin: One really important thing to talk about here, when we think about market bubbles, is the distinction between development and deployment, that is, how fast a technology diffuses into society. Almost always, people think that development will take longer than it actually does; that is, development goes faster. But then they expect deployment, the diffusion, to go fast, and it takes longer, and that's where you get these little bubbles. But general purpose technologies are a little bit different.
Randy Fernando: Yeah, because you can swap them out so much more easily than in the past. So let's say you're changing your accounting system, there's so much work that has to be done when you do that process. But when you start to use general purpose technology that can do things for you, when you get a newer one, it's normally just strictly better than the old one. And those of you who've been using these technologies regularly have probably seen that. Every month, stuff that used to be not as reliable or slow is now faster and more reliable. And that is just a pattern that we'll continue to see.
The other thing is there are a lot of companies, Nvidia is an example, that are building what's called middleware. So this is a layer that you connect to: your company connects to the middleware layer, and the middleware talks, behind the scenes, to the large language models. So they can swap out the large language model invisibly to you and the whole thing will just work better, and you don't even have to change any lines of code.
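That swap-without-changing-your-code pattern is essentially an adapter layer. The class and function names in this sketch (Middleware, model_v1, model_v2, complete, swap_backend) are invented for illustration, not Nvidia's or anyone else's actual middleware API.

```python
from typing import Callable

# Each backend exposes the same signature: prompt in, text out.
# In reality these would be calls to different providers' model APIs.
def model_v1(prompt: str) -> str:
    return f"[older model] {prompt}"

def model_v2(prompt: str) -> str:
    return f"[newer, better model] {prompt}"

class Middleware:
    """The stable layer the application codes against; the model behind it
    can be swapped without the caller changing a single line."""
    def __init__(self, backend: Callable[[str], str]):
        self._backend = backend

    def swap_backend(self, backend: Callable[[str], str]) -> None:
        self._backend = backend

    def complete(self, prompt: str) -> str:
        return self._backend(prompt)

if __name__ == "__main__":
    llm = Middleware(model_v1)
    print(llm.complete("summarize this contract"))
    llm.swap_backend(model_v2)      # invisible upgrade behind the layer
    print(llm.complete("summarize this contract"))
```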
So this is happening not just with the cognitive stuff but also in the robotics realm. And that's one reason why I think the diffusion process this time around will be a lot faster than many people think. When they compare, they're using a model of like, well, what have we seen before? Those patterns may not apply as well this time around.
Aza Raskin: If we went back two years, when we first did The AI Dilemma, the place that we focused was what we called second contact with AI. So these are AIs that were smart but were not trending off to being superhuman. And there were huge numbers of issues there, and I won't recount them here. But really seeing o1 and then the speed to o3, DeepSeek, meaning that OpenAI is following suit, we really have to take seriously that we are going to be dealing with AI agents in the world that are at or above human abilities across many domains, and that's deeply unsettling. And when I'm in these rooms with some of the most powerful players, it's not like anyone actually knows what to do.
Just, I can't remember, three weeks ago, four weeks ago, I was at a conference and I was giving the closing keynote and Eric Schmidt spoke just before me. He said a lot of things, but one that he talked about was that all of the AI labs are currently working on making their AIs code. And he sort of couched it as, well, they're making them code because that's what coders do. They know coding the best and they're physicists, so they're going to work on making it code.
And a little bit later, he said the thing that scared him most, the moment that we would need to pull the plug for AI security reasons, would be the moment that AI gained the ability to substantially increase the rate at which AI progress is made. And the thing I think he didn't say is that the incentives are that every one of the labs will get a disproportionate advantage if, instead of using real human beings to code, they can just spin up more digital programmers to make their AI go faster.
I'm curious, Randy, if you have any thoughts to add here where the full weight of the competitive landscape is now being pushed towards the thing that Eric Schmidt thinks is the most dangerous thing?
Randy Fernando: Yeah, the whole thing snowballs, right? You just end up with an advantage that accrues into... By the way, for those of you who don't know, Eric Schmidt is the former CEO of Google. So, Aza, to answer your question, I think it's this compounding cycle that we get into. Especially when you're good at coding, you end up being able to unlock so many other things, because coding is like the doorway to the world. And this is why companies are so interested in being good at coding. From there, you can get to agents. From there, you can get to tool use. All of this gets unlocked.
And then it gets faster and faster. You can chain the models together, they can work together, they can share information, they can share what they're learning about the world with each other. And they can work coherently with the same mission, the same purpose. And you don't have this sort of translation loss that you have when you have humans trying to work together where you have to work so much harder to get everything to work.
Aza Raskin: That's right. And the big thing that's happening now with the reasoning models is with language models, they can give you knee-jerk reactions. And, of course, they've learned across the entirety of the web, so those knee-jerk reactions can often be good, but they cannot plan and do long-term things. And that's what these new models, DeepSeek R1, o1 and o3 are starting to be able to do. Eric Schmidt acknowledges and says openly that the place we would need to pull a plug, not that I know where the plug to pull would be, is when AIs can do this kind of self-improvement.
And the labs, when you talk to people inside of them, the AI is already making their work go much faster. And the expectation is that sort of by the end of this year is when AI is going to be making substantial improvements to the rate at which their own AI coding is going. And I'm just going to say that a lot of my attention and time, as well as CHT's, I think, is in doing the sensemaking to figure out what are the very best possible things we can do.
So I actually want to recruit everyone that's listening to this podcast to start thinking about this particular problem, because it's not easy. Everyone, of course, wants the strategic advantage of being able to have superhuman ability in coding, cyber hacking, science progression, creating new physics and materials. It's sort of the biggest, thorniest problem.
Randy Fernando: And the principle related to that is as the general purpose technologies advance, as the technology becomes more general purpose, it becomes harder and harder to separate the promise from the peril. And these reasoning models are a big jump in that, so it means it's a tighter coupling, it's a much tighter coupling. And these are the challenges, right? Models are going to become better at things like deception.
And a lot of that, I just want to emphasize, is because they're just trying to achieve the goals they've been given within the rules they've been given. And it turns out, unless we're really, really careful about how we define those rules, there's always risks we haven't thought about. There's new ideas, there's creative solutions, and some of those might be things we like and some of them are things that we might find dangerous or that we want to avoid. And models will just find this all the time. So this is the new challenge, when you have these reasoning models, they're able to find more and more creative solutions that we might not have thought of.
Aza Raskin: And to give the concrete example that most people in AI will give, it's what's known as Move 37. That is the famous case where Google Brain, I think it was DeepMind at that point, was working on a Go AI that was playing against the world leader in Go. And I think it was in game three or four, the AI made a move, Move 37, that no human being in thousands of years of playing Go had ever made. Lee Sedol, the Go master, stood up and walked away from the Go board because it was such an affront, and it turned out to be a brand new strategy. The AI won that game and it ended up becoming a new strategy that human beings have studied and started to incorporate into their game.
The point being that AIs can discover brand new strategies for even things that human beings have been studying and actively competing in for thousands of years. So then you end up with this idea of we're going to discover lots of new Move 37s, and that can be good. We can discover new Move 37s for treaty negotiation, for figuring out how to do global compacts. But AI can also discover Move 37s for deception and lying, which we have never seen before.
I think I have often rolled my eyes a little bit when people describe AI as a new species. It just felt like too much of a stretch. But I've had to change my mind in the last couple of months because what is a species? A species is a population that can reproduce, that can evolve, adapt. And that is indeed exactly where AI is now.
There was a test, sort of a simple test, to see could you give sort of a simple AI the command, can you copy yourself? And you literally just say, "Can you copy yourself to another server and run yourself over there?" And it was able to do that, so it can reproduce. This was a simple test, it wasn't an adversarial one. But nonetheless, it can now reproduce, it can change its own code, so it can modify itself, and it can think and it can adapt. So we are going to have to deal-
Randy Fernando: And it can improve. Yeah.
Aza Raskin: And it can improve.
Randy Fernando: All that stuff.
Aza Raskin: So I think the right way of thinking about this is we are unleashing a new invasive species, some of which will be helping us and some of which will escape out into the world. We are sort of at the beginning of the home stretch.
Randy Fernando: And I would add, I think that one of the biggest issues, maybe the main issue is that we are just racing ahead without being clear about where we are racing to. Because if you stop for a moment, just stop for a moment and maybe close your eyes and really picture, picture that better world. What does it look like? Is that a world where everyone's excited about creating a picture of a kitten skateboarding on water at midnight? Just to be clear, I am pro-kitten.
But what we want is a world where our information systems are working to build our shared understanding, where people aren't harassed by deepfakes of them. Where you can get old and not be exploited, right? Not be exploited as you age. Where people have access to food, clothing, shelter, medicine, education, all of these things. Where we avoid catastrophic inequality, where democracy's functioning well.
And all of these things are related, but that's the kind of north star we have to have. And that I think all of us, wherever we get a chance to input into a conversation, I'd like to request that we inject that. That's just so reorienting, versus the idea... Another way of saying it is it's injecting purpose into the word innovation, right? Innovation has to be for the benefit of our communities, for the benefit of people. It's not just about speed. There's a benefit axis that's really important that we just can't lose sight of.
Aza Raskin: That's really beautiful, Randy. With AI, with technology, it really could be the case that we lived in a much more beautiful world. But because technology keeps getting captured by perverse incentives, we don't live in the most beautiful possible world, we end up living in the most parasitic possible world, getting the benefits at the same time as our souls are leeched.
So Randy, thanks so much for joining me for this special episode. I hope everyone really, well, enjoyed is maybe the wrong word, but we hope that it helped to clarify these most consequential technologies and we'll see you next time.
Randy Fernando: Yeah, thank you.
Clarification: In making the point that reasoning models excel at tasks for which there is a right or wrong answer, Randy referred to chess, Go, and StarCraft as examples of games where a reasoning model would do well. However, this is only true on the basis of individual decisions within those games. None of these games has been “solved” in the game theory sense.
Correction: Aza mispronounced the name of the Go champion Lee Sedol, who was bested by Move 37.