Dead-end chatbots: Why chatbots aren’t the future of generative AI in education

I recently ran a series of posts on LinkedIn arguing that chatbots are a dead-end in education. I think it’s fair to say that the posts ruffled a few feathers. My DMs quickly filled with people touting their made-for-education products and services, and if I’d only try them I’d see why I was totally wrong.

In the comment threads, we had some fairly civil discussions about the argument that chatbots are a dead-end, and I heard some reasonable counterpoints. A recent comment pointed to research over the past few decades on the positive impact of chatbots through active learning and conversational feedback, which could be promising. I’m open to taking those kinds of counterpoints on board, and if you have any research to share, by all means get in touch via the comment form at the end.

I also heard that I’m a technophobic Luddite standing in the way of progress. You win some, you lose some.

LinkedIn isn’t the best forum for lengthy arguments, so in this post I’m going to explain my thoughts in a little more detail, as well as addressing some of the counterarguments.

What is a chatbot?

First of all, I need to define “chatbot” as clearly as possible. One of the most consistent criticisms of the posts was that I seemed to be either too narrow or too broad in defining chatbots, and to be picking and choosing which apps are and are not chatbots.

It’s a fair criticism, and I don’t claim to have the ultimate definition of “chatbot”, but I can tell you what I’m talking about when I use the term.

In my definition, chatbots are:

  • An extension of earlier (pre-generative AI) technologies that have been used for years in helpdesks and for online support functions
  • Extended by generative AI, which has allowed for more “natural” dialogue than the limited earlier technologies
  • Largely text-based, with turn-by-turn interactions between the user and the chatbot
  • Sometimes “voiced” through text-to-speech software
  • Generally, in GAI contexts, a “wrapper” built with an API connected to a foundation model (mostly GPT-3.5 at this stage, though there are many options, including open-source models)
  • Tailored to a specific use case, generally through processes like fine-tuning on proprietary data (Khanmigo) or connecting to a user’s data through Retrieval Augmented Generation (RAG)

In my definition, chatbots aren’t:

  • Fully multimodal systems, like the upcoming ChatGPT update which blends image recognition, image generation, advanced data analysis, and internet access
  • “System” level rather than application level, like Microsoft’s Copilot is intended to be, or Google’s Duet for Workspace
  • Capable of interacting with many other apps and services, like an AI-powered conversational assistant (Siri, Alexa, etc.)

In a nutshell, chatbots by this definition are generative AI-based tools, primarily text-based and tailored to specific, limited use cases.
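To make the “wrapper” point concrete, here’s a minimal sketch of what one of these chatbots looks like under the hood. This is an illustration built on assumptions, not any vendor’s actual code: it assumes the OpenAI Python client, and the system prompt, model choice, and retrieval comment are all placeholders.

```python
# A minimal "wrapper" chatbot: a turn-by-turn loop around a foundation
# model API. Assumes the OpenAI Python client (v1.x); the system prompt
# and model are placeholders, not any real product's configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a friendly tutor for Year 9 science."  # placeholder

def chat() -> None:
    # The whole conversation history is re-sent to the model every turn.
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    while True:
        user_input = input("Student: ")
        if not user_input:
            break
        # A RAG-style wrapper would retrieve school documents here and
        # prepend them to the prompt before calling the model.
        messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # most wrappers use GPT-3.5 at this stage
            messages=messages,
        )
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        print(f"Tutor: {reply}")

if __name__ == "__main__":
    chat()
```

Everything beyond this loop – the fine-tuning, the retrieval, the branding – is a layer on top of the same basic structure.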

Hopefully that’s clear enough…

Now, onto the reasons why I think that these kinds of chatbots are a dead-end in education, and where I’d like to see schools and universities focusing their attention.

Why chatbots are a dead-end in education

Another clarification in response to some of the criticisms. I’m not saying that chatbots have no place in schools and universities. Many places are already using them for information retrieval, student helpdesks, ICT support, and other tasks. The nature of these technologies – with and without generative AI – makes them great for low-stakes information retrieval.

My concern is that chatbots, in the form of “tutors”, teachers’ assistants, and vehicles for personalised learning, are being marketed for much higher-stakes uses than I believe are appropriate.

Despite the millions of dollars piling into the industry, and efforts from major players like Khan Academy and Duolingo, these products are not the future we want for education. Even the concept of “personalised learning” is problematic. And while I’m not saying that chatbots can’t disseminate knowledge, that’s a long walk from the unsupported claims being made of their capabilities right now.

Here are the reasons I think chatbots in education are a dead-end:

Technical limitations

The first issue with chatbots: they often don’t work, the user experience is frequently rubbish, and they’re very prone to going wildly off course.

To address the criticisms of this point first: I know that this is “day one” of generative AI, and that it’s highly likely that the technology will improve over the next few months and years. I know that, even since November 2022 and the launch of ChatGPT, the technology has improved a lot. I’ve been paying attention.

And I don’t care.

Because it’s not just the fact that chatbots break that concerns me: it’s the effect of putting unreliable technology into schools.

When teachers and students use technologies that don’t work, they, like anyone else, become frustrated or even stressed and discard the technology. In the context of a school, things are too busy and move too quickly for people to care about a technology they tried once that broke down, didn’t work as expected, or was frustrating to use.

The idea that “it’s early days” isn’t compelling enough to put unreliable technology into classrooms. We’ve had years of disappointing edtech to contend with, and COVID and remote learning did little to address teachers’ valid concerns that digital technologies often get in the way of learning. 

Even more frustrating, teachers and students are frequently targeted as unpaid labour to test dubious edtech products that, realistically, will never become anything more than an attempt by the developers to build, scale, and cash out.

When I say “technical limitations”, here are some of the specifics:

  • Large Language Models’ tendency to fabricate information (“hallucinations”)
  • Inconsistent capabilities when connecting to the internet: even Bing Chat isn’t reliable enough 
  • Terrible user experience – platforms and websites which are functionally frustrating to use or have a too-steep learning curve
  • Questions over the security of models, particularly given OpenAI’s reputation as a target for hacking
  • Reliance on OpenAI’s API, which means that if OpenAI’s platform goes down, the chatbot goes down (see the sketch below)
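On that last point, the failure mode is easy to see in code. A minimal sketch, assuming the OpenAI Python client; the fallback message is a placeholder, because in practice most wrappers have nothing better to offer when the upstream API is unavailable:

```python
# Every wrapper chatbot inherits its provider's downtime. A sketch
# assuming the OpenAI Python client (v1.x); the fallback is a placeholder.
import openai
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": question}],
            timeout=10,  # don't leave a classroom waiting indefinitely
        )
        return response.choices[0].message.content
    except (openai.APIConnectionError, openai.APIStatusError):
        # If OpenAI's platform is down or rate-limiting requests, every
        # chatbot built on this API is down with it.
        return "Sorry, the tutor is unavailable right now."
```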

Ethical issues

If you’re reading this blog, chances are you’ve heard that generative AI produces biased and even discriminatory output. If you haven’t heard or read much about that, check out this collection of posts on AI ethics.

We know that GAI models – including text and image – are not representative of our diverse student and staff cohorts. 

The counterargument to this is “chatbots can be used in school to educate people about these biases”.

That would be true, if chatbots weren’t so often presented or interpreted as benign, neutral sources of information. For example, no-one using a university helpdesk chatbot assumes that it is operating from a biased perspective. The inert, robotic nature of the interactions conveys the sense that the chatbot is not “thinking or feeling”, and therefore not representing any particular biased worldview.

The same is true for more advanced chatbots, including those built on GAI models. The language has been fine-tuned and “neutralised” into a friendly, helpful persona, which masks some of the inherent bias. Guardrails and safety features applied to the chatbots also present a veneer of diverse perspectives, such as recent updates to image models like DALL-E 3 that default to more diverse representations. But all of these features are simply a band-aid over the fundamental issues baked into the models.

So it’s not the fact that chatbots are biased that is the issue, it’s the fact that they are presented as unbiased.

Under the broad catch-all of “AI ethics” there are other serious concerns which should make us pause before building and deploying chatbots in education. I won’t go into each of these in detail here, but you should absolutely check out some of my other posts on teaching AI ethics to find out more.

Social media ickiness

Of all my arguments, this one got the fewest criticisms. Few people would argue that we should model the education system on the activities of companies like Meta, but when we follow the chatbot path that’s exactly what we’re doing.

Chatbots are loved by companies like Meta for one reason: they are incredibly compelling.

We’ve seen in 2023 just how compelling they are. The AI startup Character.ai has raised hundreds of millions of dollars, and reports that users interact with its chatbots for an average of two or more hours per day.

Meta has paid celebrities like Snoop Dogg and Kendall Jenner a reported $5 million USD each to license their images, voices, and personalities for chatbots to add to their products, including Facebook, WhatsApp, and Instagram.

These chatbots are deliberately designed to be compelling, persuasive, and addictive. Like the social media feed algorithms we’re all so familiar with, this is dopamine hacking on a grand scale.

The language of educational chatbots follows a similar narrative. “Gamification” in education is often touted as a solution to a lack of engagement, but there are question marks over whether it is actually effective for learning. Whether it works or not, the idea loses much of its appeal when it crosses into the territory of social media-style attention hacking.

That doesn’t stop education technology providers from leaning into the idea:

Learners have told us how much they enjoy a screen full of perfect, shiny, completed gold skills. We wondered if we could change the design of the skill to entice learners to go back and practice older lessons and thus repair the skill to its golden glory. We took the learning mechanic of spaced repetition and gamified it! 

https://blog.duolingo.com/gamification-design/
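For context, the “learning mechanic” being gamified here is a simple, decades-old scheduling idea. Here’s a minimal Leitner-style sketch of spaced repetition (the intervals are illustrative assumptions, not Duolingo’s actual values); the shiny gold skills are the attention-hacking layer added on top:

```python
# A minimal Leitner-style spaced repetition scheduler. The intervals
# are illustrative assumptions only, not Duolingo's actual values.
from datetime import date, timedelta

INTERVALS_DAYS = [1, 3, 7, 14, 30]  # review gap grows with each success

def next_review(box: int, correct: bool, today: date) -> tuple[int, date]:
    """Move an item up a box on success, back to the start on failure."""
    box = min(box + 1, len(INTERVALS_DAYS) - 1) if correct else 0
    return box, today + timedelta(days=INTERVALS_DAYS[box])

# e.g. an item answered correctly in box 2 moves to box 3 and comes back
# in 14 days; a wrong answer sends it back to a 1-day gap.
```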

Is this really the kind of example we want to follow in our classrooms and lecture halls?

Lack of evidence

Education is currently undergoing a renaissance of “evidence-based learning”. And yet, even in the current climate, many schools and education providers seem in a hurry to build and deploy chatbots.

Where is the evidence that chatbots have any positive effect on learning?

There are a few examples of studies exploring the positive impact of chatbots on knowledge recall, the storage and retrieval of information, and tutoring, though each of these articles identifies significant challenges along with the benefits. There is even less research into generative AI-powered chatbots, since they haven’t been around for long enough for rigorous research to have been undertaken and gone through the review process.

Suggesting chatbots can handle high-stakes tasks in education like supporting learners, personalising learning pathways, or “reducing cognitive load” makes for great marketing copy, but as yet there is little to nothing backing up those claims.

Cost

As if we don’t already have big enough financial and digital divides in education, generative AI and particularly chatbots have the potential to widen these existing gaps. 

First, the development and deployment of chatbots with current technologies is a costly affair. I don’t just mean the cost of building and deploying, but also the “hidden” costs of the time taken to train staff in using them, the time needed to contextualise chatbots with school data and maintain them, and the online running costs.

It’s hard to find exact data comparing the running costs of different models, but a table in the following Anyscale blog post shows the cost versus performance of models from OpenAI, Meta, and the recent open-source Mistral model:

https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1

Those price comparisons are based on code generation tasks, but GPT-4 is the most expensive option for any use, including generating text for education purposes.

This means that, realistically, if schools become reliant on chatbots then they will either need to have plenty of cash to burn, or use less powerful and therefore less capable models. Open source models are an option, but you certainly wouldn’t want to deploy something built on Mistral or Perplexity’s latest pplx-70B model in the interests of student safety – these models don’t have the guardrails of models like GPT.
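To get a feel for the scale of that gap, here’s a back-of-envelope sketch. All of the prices and usage figures below are placeholder assumptions; real per-token rates vary by model and change frequently:

```python
# Back-of-envelope running costs for a school chatbot. The per-token
# prices and usage figures are placeholder assumptions, not real pricing.
PRICE_PER_1K_TOKENS = {  # USD, blended input/output (assumption)
    "gpt-3.5-turbo": 0.002,
    "gpt-4": 0.06,
}

def monthly_cost(model: str, students: int, turns_per_day: int,
                 tokens_per_turn: int, school_days: int = 20) -> float:
    total_tokens = students * turns_per_day * tokens_per_turn * school_days
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

# e.g. 1,000 students, 20 turns a day, ~500 tokens a turn:
# gpt-3.5-turbo ≈ $400/month; gpt-4 ≈ $12,000/month. With these placeholder
# numbers, that's a 30x gap that lands hardest on the poorest schools.
```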

If there is to be a system-wide adoption of GAI chatbots for some purpose – for example, chatbots which connect to the ACARA curriculum, or which are designed for teachers to access and create resources – then there would need to be consistency over the models used, and who is paying for them. We cannot end up in a situation where schools can build and deploy more powerful and more effective chatbots simply because they are wealthier.

The future of AI

The future of generative AI isn’t chatbots. In fact, the present of generative AI isn’t even chatbots. If you take a look at recent announcements and updates from OpenAI, for example, ChatGPT has moved a long way from its release in November 2022.

Back then, ChatGPT was essentially a fine-tuned “friendly assistant” interface to the GPT-3.5 language model. Its responses were sophisticated enough to send everyone in education into a spin, but not really good enough to actually change anything.

Fast forward one year, and the “chat” in ChatGPT is almost its least impressive feature. In the past months, we’ve seen the (re)release of Bing search integration, image recognition, advanced data analysis, and DALL-E 3 powered image generation. Soon, subscribers will have access to a model combining all of these features, making the model truly multimodal.

Remember my earlier definition of what a chatbot is and isn’t?

A bunch of enterprising startups have created education chatbots in recent months, and it looks like OpenAI is going to put them out of business with its “build your own chatbot” functions. But even though OpenAI is sticking to its chatbot roots, these new bots don’t look at all like the earlier versions of 2022: integrations are planned for popular platforms like Google and Microsoft Office, along with multimodality through image, audio, and text, and more. This is much more like the “what a chatbot isn’t” side of my definition.

Multimodality and seamless integration into existing platforms are the near future of generative AI. Conversational assistants like Siri and Alexa will also incorporate generative AI, allowing them to operate and interact with other applications. Virtual, augmented, and mixed reality will likely play a part too, given hardware like Apple’s forthcoming Vision Pro and Stability AI’s recent announcement of its text-to-3D model.

While the chatbot interface was remarkable for changing the way we could interact with language models, and OpenAI’s rapid success is largely down to the “chatbot” structure, things are now pulling away from that form.

Integrated systems from companies like Microsoft and Google are also increasingly removed from “chatbots”. The language being used by these companies has started to shift, moving away from the simplistic, sometimes magical discourse of AI of a few months ago and towards words like “integrated”, “assistant”, and of course “copilot”.

This isn’t far-flung AI hype I’m talking about here. This is the logical evolution of the technology over the next six to twelve months.

De-professionalising

Chatbots tell a seductive story of efficiency and reduced workload, but it’s a story that points to deskilling and deprofessionalisation. 

We’re in the middle of a teacher shortage, and despite ongoing efforts to improve the status of the profession, the media rhetoric around teaching often makes it seem an unattractive career.

Against the background of teacher workload concerns and post-COVID student behaviour issues, chatbots are being touted as the solution to all kinds of problems, from lesson planning to student engagement.

Unfortunately, while these claims might seem attractive, they do nothing to solve the root causes of the issues teachers face every day. 

Providing teachers with off-the-shelf GAI-generated lesson plans does nothing to promote professionalism or autonomy. Engaging students in isolating, dopamine-rewarding online platforms does nothing to solve the upstream reasons why they’re disengaged in the first place.

There are many ways that GAI could be used to reduce administrative workload for teachers, or to support quality planning, teaching, and assessment practices. I’ve written about many of these methods elsewhere, which use GAI to augment teachers’ professional expertise and skills without deskilling. 

Unfortunately, it’s not hard to imagine these scenarios:

  • Every student has a 1:1 chatbot tutor, and therefore we can have a higher student:teacher ratio 
  • Teachers don’t need to be content experts any more, and therefore they need fewer qualifications 
  • Teachers with fewer qualifications don’t need to be paid as much
  • Teachers are only required in support roles

Though the industry is very different, it’s worth looking at the recent WGA and SAG-AFTRA strikes in the US, and the Writers Guild’s subsequent Minimum Basic Agreement (MBA), which outlines how GAI can and can’t be used in the film industry.

Here are some of the key points about what generative AI is and is not permitted for under that agreement:

  • Generative AI isn’t considered a “writer” or “professional writer” since it’s not a person.
  • Materials generated by GAI aren’t recognised as “literary material” under the MBA.
  • Producers can supply writers with GAI-produced content as a basis for their work provided they disclose GAI’s use.
  • GAI material won’t impact compensation or writing credit, nor disqualify writers from separated rights.
  • Writers, with the production company’s consent, can use GAI in crafting literary material, like screenplays, but can’t be forced to use GAI by the company.
  • Production companies can set their own GAI usage policies which writers must follow, and can disallow GAI use if it risks copyrightability or exploitation of the work.
  • Production companies can mandate GAI programs for purposes other than generating written material, like detecting plagiarism or copyright infringement.

Educators would benefit from proactive discussions with unions and stakeholders to take control of the narrative while we’re still early in the process of figuring out the impact of GAI in education. 

Here are my (non-legally binding, probably naive, starting point…) suggestions for adapting some of those points for education:

  • Generative AI isn’t recognised as an “educator”, “tutor”, or other form of education professional since it’s not a person.
  • Materials generated solely by GAI are not suitable for use in education.
  • Education providers (schools, universities, vocational, etc.) can supply educators with GAI-produced content as a basis for their work provided they disclose GAI’s use. For example, a school may provide an outline for a curriculum generated by GAI and contextualised with ACARA documents, to assist as the basis for the teacher’s planning and creation of resources.
  • GAI material will not impact compensation or credit attribution, nor disqualify educators from ownership rights. In short: teachers’ wages cannot be reduced as a result of “GAI doing the planning”, and intellectual property rights can still be exercised.
  • Educators, with the education provider’s consent, can use GAI in crafting educational material, like lesson plans, but cannot be forced to use GAI by the provider.
  • Education providers can set their own GAI usage policies which educators must follow, and can disallow GAI use if it risks copyrightability, exploitation of the work, or the creation of resources unsuitable for education.

That’s a starting point, but if you’re reading this and you’re associated with an education union, I’d suggest that you jump in and get working on it as soon as possible.


Conclusions 

I’m still optimistic about the long term trajectory of the technology, and excited about the creative and practical implications of generative AI in many industries, including education. 

If you’ve read around my blog, you’ll see posts ranging from fun experiments with image, audio, and video generation to practical advice for teachers on how to use platforms like ChatGPT. 

I’m not interested in gatekeeping or prohibiting the use of any particular apps and services, but I am convinced that chatbots, by my definition, are a dead-end. 

Rather than leave it there, as I did with the LinkedIn posts, I’m going to make a few suggestions of where I think we should focus our attention in education instead: 

  • Use chatbots and automation technologies for low-stakes tasks, or remove administrative overheads through deimplementation
  • Focus professional development on the strengths and limitations of the technology right now and in the immediate future
  • Engage with students on how they are actually using the technology in a non-judgemental, non-punitive manner
  • Engage with students in open conversations about academic integrity and assessment practices, and acknowledge that these technologies are not going away any time soon
  • Prepare faculty leaders and curriculum heads with professional development targeted at their specific discipline, and the current implications of generative AI in related industries. For example, discuss image generation in Visual Arts, and the SAG-AFTRA strikes in Media Studies. Discuss algorithmic bias in healthcare in Health and Physical Education

In short, focus your attention on the real, lasting implications of generative AI as a broad and complex field, and not on chatbots.

If you’ve got questions, criticisms, counterarguments, or just want to talk about this post or generative AI, please get in touch using the form below:
