## Colophon

tags::
url:: https://www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem
%%
title:: The Deep Research problem — Benedict Evans
type:: [[clipped-note]]
author:: [[@ben-evans.com]]
%%

## Notes

> The Deep Research problem — [view in context](https://hyp.is/FvIVOv7uEe-or4dUp3prmw/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

⬆️ date:: [[2025-03-12]]

> Statista, meanwhile, aggregates other people’s data, makes sure it ranks highly in SEO, and then tries to get you to register or pay to see the result. I think Google should ban this company from the index, but even if you disagree, saying this is the source is like saying the source is ‘a Google search result’. Again, this is an intern-level issue. — [view in context](https://hyp.is/bQUGqP7uEe-KSIP_COoq7g/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

⬆️ Thank you!

> What do we think about this? LLMs are not databases: they do not do precise, deterministic, predictable data retrieval, and it’s irrelevant to test them as though they could. But that’s not quite what we’re trying to do here - this is a rather more complex and interesting test. — [view in context](https://hyp.is/SjVIRP8NEe-dAZNlsXZXrA/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> OpenAI is asking the model a probabilistic question, not a deterministic question. But the answer to that question IS deterministic - having worked out what you really want, and which kind of answer to choose, you want the actual number. — [view in context](https://hyp.is/cJfu7P8NEe-Mx7d4ZPKV-w/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> This reminds me of an observation from a few years ago that LLMs are good at the things that computers are bad at, and bad at the things that computers are good at. OpenAI is trying to get the model to work out what you probably mean (computers are really bad at this, but LLMs are good at it), and then get the model to do highly specific information retrieval (computers are good at this, but LLMs are bad at it). And it doesn’t quite work. — [view in context](https://hyp.is/ZNUU-v8REe-B2APYydLI7A/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> At this stage, the obvious response is to say that the models keep getting better, but this misses the point. Are you telling me that today’s model gets this table 85% right and the next version will get it 85.5% or 91% correct? That doesn’t help me. If there are mistakes in the table, it doesn’t matter how many there are - I can’t trust it. If, on the other hand, you think that these models will go to being 100% right, that would change everything, but that would also be a binary change in the nature of these systems, not a percentage change, and we don’t know if that’s even possible. — [view in context](https://hyp.is/f-od0P8REe-qPE9d1rsLhw/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> Stepping back, I feel ambivalent in writing this, because there are only so many times that I can say that these systems are amazing, but get things wrong all the time in ways that matter, and so the best use cases so far are those where the error rate doesn’t matter or where it’s easy to see. — [view in context](https://hyp.is/jaBnGP8REe-Lr2OQi9F0yw/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> And these things are useful. If someone asks you to produce a 20 page report on a topic where you have deep domain expertise, but you don’t already have 20 pages sitting in a folder somewhere, then this would turn a couple of days’ work into a couple of hours, and you can fix all the mistakes. I always call AI ‘infinite interns’, and there are a lot of teachable moments in what I’ve just written for any intern, but there’s also Steve Jobs’ line that a computer is ‘a bicycle for the mind’ - it lets you go further and faster for much less effort, but it can’t go anywhere by itself. — [view in context](https://hyp.is/swdgfv8REe-SEEtlFhCIIQ/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> Taking one step further back again, I think there are two underlying problems here. First, to repeat, we don’t know if the error rate will go away, and so we don’t know whether we should be building products that presume the model will sometimes be wrong or whether in a year or two we will be building products that presume we can rely on the model by itself. That’s quite different to the limitations of other important technologies, from PCs to the web to smartphones, where we knew in principle what could change and what couldn’t. Will the issues with Deep Research that I’ve just talked about get solved or not? The answer to that question would produce two different kinds of product. — [view in context](https://hyp.is/uYR0Cv8REe-EeaeV10GTKA/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)

> Second, OpenAI and all the other foundation model labs have no moat or defensibility except access to capital, they don’t have product-market fit outside of coding and marketing, and they don’t really have products either, just text boxes - and APIs for other people to build products. — [view in context](https://hyp.is/1EFXBP8REe-iyZsHcGPEgg/www.ben-evans.com/benedictevans/2025/2/17/the-deep-research-problem)