There’s a moment in learning that most teachers recognise instinctively, even if they’d struggle to name it in formal pedagogical terms. It’s the moment a student hits a wall: where the answer isn’t obvious, where the reading doesn’t offer up its meaning on the first pass, where the code won’t compile and the reason isn’t clear. It’s uncomfortable. It’s frustrating. And it is, by virtually every measure we have, essential to developing genuine expertise.
GenAI is making it too easy to skip that moment entirely. I’ve written before about the ways ChatGPT and its successors are reshaping assessment, and I’ve tried to hold a balanced position about the merits of “good friction” in education. There are real, legitimate uses for AI in education, especially in the hands of expert teachers. But the question I keep coming back to is not whether students can produce good work with these tools – it’s what they lose when they skip the struggle that genuine learning requires.

The research on productive struggle
The concept of “productive struggle” has deep roots in educational research. Vygotsky’s zone of proximal development, Piaget’s notion of disequilibrium and Kapur’s work on productive failure all point to the same fundamental idea: learning happens not despite difficulty, but because of it.
When a student reaches for a problem that’s just beyond their current ability, something happens neurologically and cognitively that doesn’t happen when the answer is handed to them. Robert Bjork’s work on “desirable difficulties” has been particularly influential here. His research demonstrates that conditions which make learning feel harder in the moment – spacing, interleaving, retrieval practice – actually produce stronger, more durable learning.
Now consider what happens when a student faces a challenging essay prompt and, instead of wrestling with the ideas, feeds the question to an AI. The output they receive may be competent. It may even be good. But the cognitive work that would have transformed their understanding of the topic simply didn’t happen. They received a product without undergoing a process.
The expertise problem
Anders Ericsson’s research on expert performance – the work that was clumsily popularised as the “10,000 hours rule” – also makes a critical distinction that’s relevant here. It’s not just practice that builds expertise. It’s deliberate practice: the kind that operates at the edge of your current ability, involves feedback, and requires sustained effort. Mindless repetition doesn’t build expertise. Neither does watching someone else perform the skill. You have to do the hard thing yourself.
This is where I think the AI-in-education conversation has a significant blind spot. Much of the discourse focuses on outputs – are the essays good enough? Can you tell the difference? – when the real issue is about what’s happening (or not happening) inside the student’s head. A medical student who uses AI to generate differential diagnoses without learning to reason through symptoms hasn’t become a better diagnostician. They’ve become someone who knows how to use an app. Those are not the same thing.
A law student who generates legal arguments without learning to read cases critically hasn’t developed legal reasoning. They’ve developed “prompt engineering” skills. The distinction is important because expertise isn’t just about producing correct outputs. It’s about developing the internal representations: the mental models, the pattern recognition, the intuitive judgements that allow an expert to understand novel situations. You can’t build those representations by outsourcing the thinking.
But isn’t AI just like a calculator?
This is the objection I hear most often, and it deserves a serious response. In fact, we (Jason Lodge, Suijing Yang, Phillip Dawson and I) already gave a serious response to this claim all the way back in 2023. When calculators became widespread, we didn’t insist that every student continue doing long division by hand forever. We adjusted our expectations about what mathematical fluency looked like. So isn’t AI just the next iteration of that shift?
I think this analogy is more misleading than helpful, for a few reasons. First, calculators automated a specific, well-defined procedural skill. Long division is an algorithm. Once you understand what “division” means, automating the procedure doesn’t erode your mathematical thinking. But writing an essay, constructing an argument, analysing a text, designing an experiment… these are not procedures. They’re complex cognitive activities where the process is the learning.
Second, we had decades to study what happened when calculators entered classrooms. We could observe the effects and adjust. With GenAI, we’re making sweeping assumptions about what can safely be automated before we have any real evidence about the consequences. The research simply doesn’t exist yet – a point I’ll return to in a future post about the credibility crisis in AI-in-education research.
Third, and perhaps most importantly, calculators don’t produce outputs that look like expert human thinking. A calculator gives you a number. ChatGPT gives you something that reads like it was written by a competent person. The risk of students mistaking the AI’s output for their own understanding is qualitatively different from anything calculators posed.

What we’re actually asking
When we tell students they can use GenAI to help with their work, we’re implicitly asking them to make a judgement call: use it for the routine parts, but do the hard thinking yourself. This is a reasonable expectation for an expert – someone who already knows the difference between routine and cognitively demanding work. But it’s an unreasonable expectation for a novice, because novices don’t yet know what they don’t know.
This is the paradox at the heart of AI in education. The students who could use AI most wisely – i.e., those with deep domain knowledge who can evaluate and improve AI outputs – are the ones who need it least. The students who are most likely to over-rely on it – novices still building foundational understanding – are the ones for whom the stakes of skipping the struggle are highest. I certainly don’t think the answer is to ban AI outright. I’ve argued against blanket bans before, and I still think they’re impractical and often counterproductive.
But I do think we need to be much more honest about what we’re trading away when we encourage students to use GenAI during the learning process, as opposed to after they’ve developed sufficient expertise to use it as a genuine augmentation of their thinking.
The long game
There’s also a temporal dimension to this that doesn’t get enough attention. The effects of skipping productive struggle won’t show up on next week’s assignment. They’ll show up years later, when a graduate enters a profession and discovers that the mental models their peers developed through years of difficult practice simply aren’t there. They can use the apps. They can produce outputs. But they can’t think independently about the problems the tools weren’t designed to solve. We’ve seen plenty of versions of this before.
GPS navigation has measurably reduced people’s spatial reasoning abilities. Spell-checkers have changed how people attend to language. These aren’t catastrophic losses, but they’re real, and they happened with technologies far less powerful than GenAI. The question I’d put to every school leader, every curriculum designer, every teacher making decisions about GenAI integration is this: what is the “minimum viable struggle” your students need to undergo to develop genuine expertise in your discipline?
Not the maximum: the minimum. Because that’s the line we cannot afford to cross, no matter how impressive the technology becomes. We owe students more than efficient outputs. We owe them the difficult, uncomfortable, deeply human process of learning to think for themselves. That process has always been hard. It was supposed to be.
Cover image: Susan Quin & The Bigger Picture / https://betterimagesofai.org / https://creativecommons.org/licenses/by/4.0/
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch.