Deep Research is the title of a new mode in several GenAI apps, including Google’s Gemini, OpenAI’s ChatGPT, and most recently, Perplexity. In this article, I will be focusing on the currently most hyped of these: OpenAI’s Deep Research. Although they weren’t first to release a product with this title (that was Google), they have been the most prolific on my social media feed and in the news in the past weeks.
These Deep Research models all share a basic premise: convert a user’s prompt into a detailed series of internet searches, and produce a lengthy and referenced report. As I’ll explain later in this article, the methods and sources vary, as does the overall quality of the output.
One of the most important features, of course, is the cost. Perplexity Deep Research is available on the free tier when logged in to the platform. Google Gemini Deep Research is only available with a Gemini Advanced account, such as an upgraded Google Workspace or Google One account. OpenAI’s Deep Research is only available on the $200 USD/month ‘Pro’ tier of ChatGPT.
So, you could be paying anything from $0 to $30 to a whopping $200 a month for the privilege of accessing these applications. The question, of course, is whether they're worth that kind of money.
Deep Research in Action
Diving (or delving) right into OpenAI’s Deep Research, it’s immediately apparent that it operates differently to GPT-4o. To test it out, I used the following prompt:
Research secondary education classroom intervention methods for autistic students, writing a report that centres the neurodiversity paradigm and focuses on lived experience as well as evidence-backed research in the education context.
In the video and images below, you can see a few things happening. First, the prompt is used with 4o. The model responds quickly, and gives a brief report with five sources: two from Wikipedia, two websites, and a journal article. Switching to Deep Research, the application first responds with clarifying questions about geographical focus, subject, age levels, and so on. Once those are addressed, it spends eight minutes searching and compiling results.
The process, or “reasoning”, is displayed in a panel to the right. The images below show some of this process, and it is interesting to see this (highly anthropomorphised) stream of consciousness unfold while the application searches and processes.
Look at how it articulates both the search queries (“Searched for peer support articles… Read more from amaze.org.au”), and explains the creation of the document (“I’m crafting a structured report… I’m comparing sources… It’s interesting to see that…”).



The final report is detailed and accurately referenced, though the application does struggle sometimes with determining the quality of online sources. It has no access to paywalled content, which bars it from a lot of research. Of course, high-profile deals between publishers like Taylor & Francis and AI companies might change this in the future.
References are linked inline, and if you include it in the prompt (I didn't), it will generate a reference list in APA 7, MLA, etc. I can see from this example that it has followed the brief and used a lot of quality Australian resources from places like Reframing Autism (where I have held a board position) and Amaze, as well as government and academic sources.

You can read the entire response here.
I’ll let you draw your own conclusions as to the quality, but it is certainly a step up from GPT-4o. It is also twelve thousand words long, which is significantly more than the average GPT-4o response (and presumably, given cost per token, part of the reason for the hefty price tag).
Deep Research, Deep Research, or Deep Research?
If you’re not willing or able to pay the $200/month, Google’s and Perplexity’s Deep Research products offer some of the same features. They both lack the capacity to write extensive reports, but do a decent job of synthesising an online search with accurate results. To test them out, I used the following simple prompt in all three:
Research changing international laws about deepfakes since 2020
Here are the results:
- OpenAI Deep Research Response
- Google Gemini 1.5 Advanced Deep Research Response
- Perplexity AI Deep Research Response



The length and complexity of OpenAI’s response is much greater than either Perplexity or Google, but Perplexity actually offers significantly more sources than either: 57 versus OpenAI’s 21 and Google’s 17. Obviously, quantity is not necessarily quality, but I find it interesting that there is such a marked difference.
The nature of the sources is also different, with Google tending more to news stories, Perplexity with significantly more official government websites, and OpenAI drawing on blogs, news, and legal websites with a handful of government pages.
A PhD in Your Pocket?
Much of the news around Deep Research is breathless commentary regarding the “PhD level” reasoning and research skills. Insiders from OpenAI and consultants with early access to the product were quick to tout the “PhD in your pocket” offered by Deep Research, and we have seen a constant barrage of posts and updates from users like Ethan Mollick on LinkedIn, who also had early access to the product. In his article on his Substack, Mollick perpetuated the idea of Deep Research’s PhD-level capabilities, reflecting on one response:
It is, honestly, very good, even if I would have liked a few more sources. It wove together difficult and contradictory concepts, found some novel connections I wouldn’t expect, cited only high-quality sources, and was full of accurate quotations. I cannot guarantee everything is correct (though I did not see any errors) but I would have been satisfied to see something like it from a beginning PhD student.
https://www.oneusefulthing.org/p/the-end-of-search-the-beginning-of
I’m three years into my own PhD, so of course I’m interested in these claims. Is this technology operating at the same level expected of me as I write my own thesis? Can OpenAI’s Deep Research truly match my levels of procrastination, existential crisis, and self-doubt? Could it, in fact, just do the damn thing for me so I can get on with my life?
Well… no.
I’ve drafted, redrafted, scrapped, rewritten, and cast my literature review into hell so many times that I’m painfully familiar with the topic (broadly: teacher-writers and digital writing, which now includes GenAI), so I figured it would make for an interesting comparison with Deep Research’s attempt. I mentioned earlier that the application cannot access paywalled articles, so I’m not judging it here on its ability to find the same breadth of sources I can access through my university library – that would be an unfair comparison.
But I am interested in what it can do with the sources it has access to. Unfortunately, the finished report made some classic lit review errors which you’d expect to be addressed at an undergraduate level, or maybe in a Master’s. The entire (16,000-word!) review was filled with description and summary, but lacked any analysis. Passages like this:

The use of verbatim quotes with no further explanation is a no-no, as is the fairly banal closing statement “as Bandura’s theory of self-efficacy would predict”. It mentions “quantitative data” with no exploration of why that’s important (for context, one of the biggest criticisms of research in this field is the reliance on anecdotal and qualitative data). There is no follow through, and no discussion of why this New Zealand research is important in the global context, or at all, really.
It did find some of the seminal (open access) papers from the field which I had used in my own literature review, but it lacks the capacity to articulate why they’re so important. Even at a surface level, you might assume that an application of this capacity (and cost) could make some judgements of the relevance of a piece of research based on its citations, appearance in other studies, or a similar metric. All we get here is a list of papers, with the ever-present ChatGPT bullet points:

There has been some back and forth from defenders of the “PhD in your pocket” claim: Deep Research doesn’t replace a PhD, but it can replace some of the skills of a PhD, as Mollick’s Substack post above suggests.
Maybe I’m using it wrong, and perhaps this will continue to improve over time (another classic defence of the AI boosters – your prompt is bad! Just wait six months!), but I’m just not seeing it yet.
Who is Deep Research Actually For?
Deep Research – particularly the expensive OpenAI flavour – is undoubtedly a step up from other AI applications in the quality of sources, the accuracy of results, and the overall synthesis (though not analysis) of the findings. If you’re looking for a detailed search with varied sources of information, it blows GPT-4o out of the water, and is decidedly better than similar offerings from the competition.
But after using it for a few days, I was left with the question: who is it actually for?
Take the first example, of classroom interventions for teachers of autistic students. The recommendations are mostly solid, and the sources accurate. But why the 12,000-word essay? No teacher I know is going to wade through that volume of text. Give me the list of sources, and I’ll pick a few for myself. Or give me the summary of interventions and I’ll make my own judgements about what will work. But I’m not reading that report.
Or the deepfake research. As someone who has written academic articles about deepfakes, a summary of international laws and how they have changed in the past five years might make for useful discussion in the introduction of a new article. But OpenAI Deep Research doesn’t really give me that – it provides instead a rambling walkthrough based on a combination of valid and dubious sources, and one which I would need to carefully unravel if I were to use any of it in an academic paper.
My experiments with my own subject matter suggested it was just OK at providing a list of open-access resources with adequate, if banal, summaries. Again, if OpenAI does get access to paywalled articles, that could improve. But I’ve been “waiting for six months” for three years already.
So it’s not for professionals using research, it’s not for academics producing new research, and it’s not for early career researchers synthesising existing research.
The only conclusion I could arrive at is that it is an application for businesses and individuals whose job it is to produce lengthy, seemingly accurate reports that no one will actually read. Anyone whose role includes the kind of research destined to end up in a PowerPoint. It is designed to produce the appearance of research, without any actual research happening along the way.
Will it impact the education sector if and when it becomes more accessible? Absolutely. Is it any good? Sure. Is it research? I’m not convinced.