## Colophon
tags::
url:: https://newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html
date:: [[]]
%%
title:: DeepSeek FAQ (Stratechery Article 1-27-2025)
type:: [[clipped-note]]
author:: [[@newsletters.feedbinusercontent.com]]
%%

## Notes

> R1-Zero, however, drops the HF part — it’s just reinforcement learning. DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers (a la AlphaGo), DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions. What emerged is a model that developed reasoning and chains-of-thought on its own, including what DeepSeek called “Aha Moments”
— [view in context](https://hyp.is/AmCFaN1AEe-LPc9ga8nWSQ/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)
- Annotation: This is interesting, but also worrying to me, because it means we have even less understanding of the how/why for such reasoning.

> It underscores the power and beauty of reinforcement learning:
— [view in context](https://hyp.is/oUqQGt1AEe-iF6sr1VGqoQ/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)
- Annotation: For me to understand/[[to lookup]]: How well is reasoning for models with reinforcement learning (without human feedback) understood today, i.e. what is the baseline?

> This is one of the most powerful affirmations yet of The Bitter Lesson: you don’t need to teach the AI how to reason, you can just give it enough compute and data and it will teach itself! Well, almost: R1-Zero reasons, but in a way that humans have trouble understanding.
— [view in context](https://hyp.is/x8hzQt1AEe-hbL9aENvKzA/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> Here again it seems plausible that DeepSeek benefited from distillation, particularly in terms of training R1. That, though, is itself an important takeaway: we have a situation where AI models are teaching AI models, and where AI models are teaching themselves. We are watching the assembly of an AI takeoff scenario in realtime.
— [view in context](https://hyp.is/Q24asN1BEe-ndk9AteJ3mw/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> So are we close to AGI? It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns towards being first.
— [view in context](https://hyp.is/SerkXt1BEe-l2I_HZIar2w/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn't, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure.
— [view in context](https://hyp.is/m9dMgN1BEe-LtvPslrtgdg/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> Just look at the U.S. labs: they haven't spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available:
— [view in context](https://hyp.is/oLXyat1BEe-teatsXv33jg/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD's inferior chip-to-chip communications capability.
— [view in context](https://hyp.is/POcQnN1CEe-DHMctWvbn8g/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> In short, Nvidia isn't going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn't been priced in. And that, by extension, is going to drag everyone down.
— [view in context](https://hyp.is/QtHqNt1CEe-Pmxe2bdZh1w/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> Indeed, you can very much make the case that the primary outcome of the chip ban is today's crash in Nvidia's stock price. What concerns me is the mindset undergirding something like the chip ban: instead of competing through innovation in the future the U.S. is competing through the denial of innovation in the past.
— [view in context](https://hyp.is/cfMiCN1CEe-w-sdMvp6mlg/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> That paragraph was about OpenAI specifically, and the broader San Francisco AI community generally. For years now we have been subject to hand-wringing about the dangers of AI by the exact same people committed to building it — and controlling it.
— [view in context](https://hyp.is/l_mvlN1CEe-3_0cA0JTFqw/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> So you're not worried about AI doom scenarios?
> I definitely understand the concern, and just noted above that we are reaching the stage where AIs are training AIs and learning reasoning on their own. I recognize, though, that there is no stopping this train. More than that, this is exactly why openness is so important: we need more AIs in the world, not an unaccountable board ruling all of us.
— [view in context](https://hyp.is/Byl2Vt1DEe-DvqtXgfHgVQ/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)
- Annotation: hmmm

> This actually makes sense beyond idealism. If models are commodities — and they are certainly looking that way — then long-term differentiation comes from having a superior cost structure; that is exactly what DeepSeek has delivered, which itself is resonant of how China has come to dominate other industries.
— [view in context](https://hyp.is/M_nFCt1DEe-q_U9rl8wmBA/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)

> That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.'s approach to tech; alternatively, we could realize that we have real competition, and actually give ourselves permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that have nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.
— [view in context](https://hyp.is/tBXzJt1DEe-SIzOMWDB7iA/newsletters.feedbinusercontent.com/ff5/ff565e49a029c94f4cd6671a757981d691c1d5fd.html)
- Annotation: Hmm..
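- Annotation: The two-reward setup described in the R1-Zero highlight at the top of these notes can be sketched in a few lines. This is a toy illustration only: the reward definitions, function names, and group normalization below are simplified stand-ins of my own, not DeepSeek's actual training code (which optimizes a policy against such rewards, e.g. via GRPO).

```python
# Toy sketch of R1-Zero-style grading: sample several answers per
# question, score each with an accuracy reward and a format reward,
# then normalize within the group. All names here are illustrative.
import re
import statistics


def accuracy_reward(answer: str, expected: str) -> float:
    """1.0 if the final answer matches the expected one, else 0.0."""
    return 1.0 if answer.strip().endswith(expected) else 0.0


def format_reward(answer: str) -> float:
    """1.0 if the output wraps its reasoning in <think>...</think> tags."""
    return 1.0 if re.search(r"<think>.*</think>", answer, re.DOTALL) else 0.0


def grade_group(candidates: list[str], expected: str) -> list[float]:
    """Grade several sampled answers at once against both rewards,
    so above-average answers within the group get positive advantages."""
    rewards = [accuracy_reward(c, expected) + format_reward(c) for c in candidates]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
    return [(r - mean) / std for r in rewards]


# One question, several sampled answers:
candidates = [
    "<think>2+2 is 4</think> 4",  # right answer, right format
    "4",                          # right answer, but no thinking tags
    "<think>hmm</think> 5",       # wrong answer, right format
]
advantages = grade_group(candidates, expected="4")
```

  Only the first candidate satisfies both rewards, so only it ends up with a positive advantage; a training step would then push the model toward that kind of output, without any human-written reasoning traces.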