For a long time now, I’ve been arguing that chatbots – in the sense of AI tutors, Homework Helpers and other edtech applications – are a dead end in education.
Since I wrote those original articles, the pace of change in AI has continued to accelerate, and it seems like a new model is released weekly which smashes through all available benchmarks. Most recently, we’ve seen OpenAI’s much-touted o1, the secret strawberry model, which the company claims demonstrates advanced planning and reasoning capabilities. The jury is still out on whether o1 lives up to the OpenAI hype machine, and it’s currently only available in preview, but it’s still forward movement.
Yet while the pace of development is certainly frenetic, my views on chatbots in education have not changed a great deal.
In fact, a lot of the advances in recent months have only confirmed my view that homework-helper-style AI chatbots are a dead end. One of the main reasons is that to get the most out of generative AI systems, you already need expertise in your subject area.
Expertise Not Included
The most powerful AI systems currently on the market are also the most general, with a broad knowledge set, an enormous amount of training materials, and development and fine tuning processes which make them incredibly capable across a huge range of tasks.
OpenAI’s new model, for example, allegedly shows advanced reasoning capabilities, “pauses to think” before answering questions, and is better suited to mathematical reasoning and complex scientific tasks.
Other leading models such as Google Gemini and Claude 3.5 Sonnet are proficient across many disciplines, from computer science to languages, marketing to business strategy.
But this is the first hurdle for a learner working with a large language model: they are capable of so much in so many areas that you really need to know what you want before you can make the most of them.
A great example of this is what happens when you present a large language model to a person who has not actually learned how to use one. I often find that educators who haven’t had the time or the inclination to work with generative AI struggle to use applications like ChatGPT because they don’t have the right conceptual framing for dealing with these technologies. In reality, none of us do. This is a totally new way of working with digital technology: conversationally, dialogically, iteratively.
To get the most out of a large language model, you have to have a certain amount of expertise in using large language models. The only way to get that expertise is, rather counterintuitively, to use a large language model a lot. Individuals who have had the time, the resources and the energy to work with large language models, at least since the release of ChatGPT if not before, have found many great ways to prompt and refine the output of AI. But there is a huge skill gap between those few individuals and the vast majority of people, who see generative AI as just another digital technology and treat it much like Google Search.
This problem is then compounded when somebody tries to use generative AI to learn another topic, because to get the most out of AI as a learning tool, you have to be an expert not only in using generative AI, but also in the topic itself.
The AI Learning Paradox
If this sounds paradoxical, it probably is.
To learn through generative artificial intelligence, you need to know how to ask the right questions. You need to understand how to pose those questions to a technology which is fundamentally different to a traditional search engine or research tool.
You need to know enough about the topic to recognise when the AI is hallucinating or bullshitting. As the complexity of the topic increases and the amount of expertise needed to master it grows, it becomes harder and harder to spot inaccuracies and hallucinations, and more expertise is required to fact-check the output of an LLM.
Let’s take as an example my own experience of using generative artificial intelligence whilst learning Python. I have found Claude and ChatGPT great tools for working through roadblocks as I’ve made my way through books and other materials for learning the programming language.
I’ve also played around with using these AI systems as a tutor or advisor. The problem is, they don’t know what I don’t know, and I don’t know how to say what I don’t know.
Confused? Me too.
For a novice learning a new programming language, there are certain fundamentals which any textbook, website or AI chatbot can provide: the basics of loops and functions, or even some of the more advanced concepts of classes and object-oriented programming. All of that knowledge can be attained through working with traditional or AI learning materials.
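For readers who don’t write Python, here’s a rough, made-up snippet of the kind of fundamentals I mean: a function, a loop and a simple class, the sort of thing any textbook or chatbot can teach perfectly well.

```python
# A made-up beginner example: a function, a loop, and a simple class.
class Player:
    """A minimal player for a text-based game."""

    def __init__(self, name: str, health: int = 10):
        self.name = name
        self.health = health

    def take_damage(self, amount: int) -> None:
        self.health = max(0, self.health - amount)


def simulate_turns(player: Player, hits: list[int]) -> Player:
    """Apply a series of hits to a player, one per turn."""
    for hit in hits:
        player.take_damage(hit)
    return player


hero = simulate_turns(Player("Ada"), [3, 2, 7])
print(hero.name, hero.health)  # Ada 0
```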
But in any area, whether it’s creating scripts for automation, making text-based adventure games, or using Python on the back end of web apps, I reach a step where I don’t know the next questions to ask or how to ask them.
The conversation with the AI chatbot gets vaguer and vaguer. The chatbot misinterprets my requests and sends me down the wrong rabbit holes, resulting in hours of wasted time and code which still doesn’t really work or which I don’t actually understand.
Because of that broad and deep knowledge base, the AI holds more information than I can ever possibly keep in my head, but it draws on it randomly. Sometimes ChatGPT will recommend a particular code library where Claude would suggest something completely different. From one chat to the next, ChatGPT might change its mind and suddenly recommend an entirely different approach.
As a learner, this is incredibly frustrating.
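To make that concrete, here’s a hypothetical example of the kind of divergence I mean. The task and the code are invented for illustration, but the pattern is familiar: for the same small job, one assistant steers me towards the standard library while another recommends third-party packages, and both answers are perfectly valid.

```python
# Hypothetical example: grab a web page's title, answered two different ways.

# Answer one: standard library only.
import urllib.request

def fetch_title_stdlib(url: str) -> str:
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="ignore")
    start, end = html.find("<title>"), html.find("</title>")
    if start == -1 or end == -1:
        return ""
    return html[start + len("<title>"):end].strip()

# Answer two: third-party packages (pip install requests beautifulsoup4).
import requests
from bs4 import BeautifulSoup

def fetch_title_requests(url: str) -> str:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""
```

As a novice, I don’t yet have the experience to see that neither answer is wrong, so every switch feels like being told I’ve been doing it all wrong.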
Imagine you’re trying to pick up a new topic. You diligently study the textbooks until you have a grasp of the fundamentals, and then you work with an expert tutor to push at the boundaries of your knowledge.
This is scaffolding. This is working in what Vygotsky and decades of educators call the “zone of proximal development”, the point at which you’re stretching your own knowledge and, with the help of expertise and support, can break through to the next level of understanding. But when you’re working with a chatbot and you reach the ZPD, things start to get messy.
Now imagine learning that same topic, but with AI in place of the expert tutor. You blitz through the textbooks, you’ve got the fundamentals, and you turn to your AI expert for help. What’s next? On Monday, the AI expert tells you one approach. On Tuesday, it tells you something different. On Wednesday, a different AI expert pops up and offers completely contradictory advice. On Thursday, the developer behind the AI expert updates the software, changing its personality entirely.
The only comparison I can think of in real life is being a student confronted by a revolving door of highly knowledgeable but incredibly fickle supply teachers. As the learner, I end up feeling frustrated. I don’t yet have the expertise to articulate exactly what I need from this revolving door of expert tutors, and they don’t understand anything about me at all.

You Don’t Know What You Don’t Know
Let’s come back to the problem of hallucinations, or my preferred term, and that of many academics: bullshit. Bullshitting is a fundamental part of the architecture of large language models. Without the tendency to bullshit, AI would simply be dredging up verbatim content from the training data, something which developers deliberately try to avoid.
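To make that point concrete, here’s a toy sketch, with made-up numbers and nothing like the scale of a real model, of why generation is probabilistic rather than a lookup: the model scores every possible next token and samples from those scores, so a plausible-but-wrong continuation is always on the table.

```python
# Toy sketch only: real models work over tens of thousands of tokens,
# but the principle is the same. Generation is sampling, not retrieval.
import math
import random

def sample_next_token(scores: dict[str, float], temperature: float = 1.0) -> str:
    """Turn raw scores into probabilities (softmax) and sample one token."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Made-up scores for the next word after "The capital of Australia is":
scores = {"Canberra": 4.0, "Sydney": 3.2, "Melbourne": 2.5}
print(sample_next_token(scores))  # usually Canberra, but not always
```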
But as a non-expert in a subject area, particularly as the sophistication of the subject grows, I can’t tell when AI is confidently bullshitting me.
When OpenAI released GPT-o1, I put it through its paces with a senior secondary mathematics paper. It showed all of its working, did the new reasoning thing, and came out with the correct answers, which I was able to check because the VCE specialist maths exam comes with an examiner’s report.
But I’m not a mathematician, so I can’t attest to the accuracy of its reasoning or methods. This was pointed out by Shern Tee in the comments, who gave this fantastic example.

Shern’s comment pointed to an obvious hallucination. The question and its AI answer, from a first-year chemistry course, might have seemed obviously flawed from the perspective of an expert, but to my novice eyes I didn’t have a clue what was going on. Shern filled in the gaps, pointing out that GPT-o1 had made an error in its working out: “What’s concerning is (1) it’s not a very typically human mistake, so it took me a long time to find it (like you!) and (2) the hallucination actually ‘spread backwards’ and messed up part of an otherwise impressive answer.”
If I were a first-year chemistry student, would I spot this mistake? Probably not.

Getting Specific
So what’s the answer? Because, let’s face it, education systems across the world are chomping at the bit to put AI-powered tutors into the hands of students. Edtech companies, including corporate-sponsored nonprofits like Khan Academy, are offering these tools for free in an effort to reach as many students as possible. How do we as educators deal with the implications of technologies which are presented as learning tools, but which require expertise in order to use them properly?
One answer may lie in developing much more specific, purposeful applications than the general tools on the market at the moment. One of ChatGPT’s biggest strengths is its ability to generalise and work across domains, but that same breadth makes it a fairly terrible teaching and learning tool. If a conscientious educator or institution were to confine that generalised knowledge and put some boundaries around what the chatbot does and doesn’t convey to the learner, it might prove much more useful.
The University of Sydney is trying something like this with Cogniti, an architecture which takes the raw GPT model and refines it using specific course materials and resources from educators. These more refined chatbots can be encouraged to defer answering students’ questions, making students work harder to get to the answer, and they can also draw on a much more specific and nuanced set of data.
Coming back to my example from earlier: instead of using ChatGPT or Claude 3.5 Sonnet to coach me through programming and getting frustrated with the vast ocean of information and approaches on offer, I could work with a more fit-for-purpose chatbot which adopts a single philosophy of programming in Python and has deep but much narrower knowledge of the particular libraries and approaches suited to the skills I’m learning at that moment in time.
If I got stuck, or, as is more often the case, distracted, this chatbot would not start offering up random solutions that take me down useless rabbit holes, but would instead tell me to go back a step, look at the same problem from a different angle, or point me towards resources and expertise beyond the chatbot itself.
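Purely as an illustration, and emphatically not how Cogniti or any real product is built, here’s a minimal sketch of what putting boundaries around a general model might look like, assuming the OpenAI Python SDK and a hypothetical file of educator-written course notes.

```python
# A minimal sketch of a "boundaried" tutor. Assumptions: the OpenAI Python
# SDK is installed, OPENAI_API_KEY is set, and course_notes.md is a
# hypothetical file of educator-written material for one specific course.
from openai import OpenAI

client = OpenAI()

with open("course_notes.md", encoding="utf-8") as f:
    course_notes = f.read()

SYSTEM_PROMPT = f"""
You are a Python tutor for one specific course. Rules:
- Teach only the approach and libraries covered in the course notes below.
- Never hand over a complete solution; ask a guiding question first.
- If the learner is stuck, suggest going back a step or re-reading a
  named section of the notes rather than offering a new approach.
- If a question falls outside the notes, say so and refer the learner
  to their teacher.

COURSE NOTES:
{course_notes}
"""

def ask_tutor(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat-capable model would do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_tutor("My loop never ends. What should I do?"))
```

Even this crude version changes the dynamic: the chatbot is no longer drawing on the whole ocean of libraries and philosophies, just the one the educator has chosen.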
A well designed chatbot could do all of these things, but I still don’t think chatbots are the answer.

If Not Bots, Then What?
I want us to reframe how we think of large language models in education, and to stop viewing them through the distorted lens of chatbots. ChatGPT is the app that made us aware of these technologies, but a large language model is not a chatbot. The chatbot is just the interface.
I want to challenge developers who work in education to think beyond the Paradigm of the Chatbot: not just the transactional, dialogue-based, back-and-forth interface we’ve become familiar with through ChatGPT, but something more. Something which can take multiple sources of information, respond to the learner’s input, identify and understand where the learner is in their journey, offer up relevant resources and expertise, point the learner towards connections to follow up on, and do all of this without the pretence of a friendly, helpful assistant.
Reflecting on my posts from months ago about chatbots being a dead end in education, I gave two definitions: one of chatbots as “largely text-based, with turn-by-turn interactions between the user and the chatbot”, and another of multimodal LLM-based technologies, which I think I would now call an agent.
In my definition, chatbots aren’t:
- Fully multimodal systems, like the upcoming ChatGPT update which blends image recognition, image generation, advanced data analysis, and internet access
- “System” level rather than application level, like Microsoft’s Copilot is intended to be, or Google’s Duet [now Gemini] for Workspace
- Capable of interacting with many other apps and services, like an AI-powered conversational assistant (Siri, Alexa, etc.)
A learning agent is not a learning chatbot, not a tutor that requires expertise to extract the most value from, but a platform which is designed for teaching and learning. Not a technology which has been ported from another domain and slapped into education as a technological solution to a problem, but an approach which has been designed from the ground up with teaching and learning in mind.
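To make the contrast concrete rather than to propose a design, here’s an entirely hypothetical sketch of the difference: where a chatbot just trades messages, an agent would keep a model of the learner and act across resources and people. Every name here is invented.

```python
# Entirely hypothetical: an interface sketch, not a design or a product.
from dataclasses import dataclass, field

@dataclass
class LearnerState:
    """What the agent knows about where the learner is in their journey."""
    goals: list[str] = field(default_factory=list)
    demonstrated_skills: list[str] = field(default_factory=list)
    current_sticking_point: str | None = None

class LearningAgent:
    """Unlike a chatbot, the agent persists a learner model between
    sessions and can act on resources beyond the conversation."""

    def __init__(self) -> None:
        self.state = LearnerState()

    def observe(self, evidence: str) -> None:
        """Update the learner model from work samples, not just chat turns."""
        ...

    def recommend(self) -> list[str]:
        """Suggest resources, people, or next steps based on the model."""
        ...

    def escalate(self) -> str:
        """Hand over to a human teacher when the agent is out of its depth."""
        ...
```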
I honestly don’t know what this teaching and learning agent looks like. I certainly don’t have the technical skills to build it myself, but I do have over 15 years of experience in the classroom telling me that chatbots are not the answer.
For the past couple of years, I’ve had a policy of rejecting most messages from edtech people trying to convince me that their AI tutor is the next best thing, but I’m going to change that policy slightly. If you’re picking up what I’m putting down in this article, and you think you have a way to build the platform that I’m talking about, I want to hear from you.
I want to hear from developers who are thinking beyond chatbots, beyond Homework Helpers, beyond AI tutors, developers who have more of an understanding of what learning is.
Because these paradigms aren’t going to shift themselves. And I think the worst thing we can do at this point is lock ourselves into a narrow, chatbot-centric approach to using Artificial Intelligence in education. So DM me on LinkedIn, send me an email, or use the contact form at the bottom of this or any of my other posts to get in touch. Let’s see if we can build something worthwhile.
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
