Generative AI, plagiarism, and “cheating”

Back in January, I wrote a post called Beyond Cheating, reflecting on the ChatGPT bans that were rolling out across various Australian states and the “cheating” narrative that had accompanied the chatbot since its release.

In that earlier post, I argued that banning and blocking generative AI would only contribute to the digital divide – students who have greater access to digital technologies would inevitably be able to access and use GAI, putting those who rely on in-school technology access at a disadvantage.

It’s been almost 12 months since the release of ChatGPT and, thankfully, the bans have now been lifted in most jurisdictions. Unfortunately, the narrative of “catching” students using GAI still persists, and educators in both K-12 and tertiary education are still stuck in the loop of detection tools, pen-and-paper examinations, and proctoring software as methods to stop or monitor GAI use.

Whatever level of education you work in, this post is an attempt to convince you that trying to catch or detect GAI is futile. Not only that: detection tools and other plagiarism checkers may be unethical, and a punitive approach to GAI use will add to educator workload. This year I’ve worked in many different schools across states and sectors in Australia, and I’ve fielded a lot of questions about GAI and assessment, so I’m going to present this post as a sort of FAQ. If by the end of the article you still think that detection is a viable option, I’d encourage you to get in touch via the contact form button on the left.

What is generative AI?

I’ll start with the broadest question, but I’ve written about this a lot in the past so it might be worth checking out some of the following links. In a nutshell, generative AI takes data as input, and produces new data as output. This might be text prompts (input) used to generate new text (output). Or, it could be text-to-image, image-to-image, text-to-audio, image-to-text, or any similar multimodal variations.
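To make the text-to-text case concrete, here’s a minimal sketch using OpenAI’s Python library (the v0.x interface; the API key and prompt are placeholders, and the model name assumes you have GPT-4 access):

```python
import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder: substitute your own key

# Text in (the prompt), new text out (the generated response).
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain photosynthesis in one sentence."}],
)

print(response.choices[0].message.content)
```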

To understand some of the discussion that follows, it’s necessary to wrap your head around the construction of these GAI models. Though approaches vary for text, image, and audio, they are all reliant on huge amounts of data.

For a text-based model like GPT, that includes data scraped from the web, Wikipedia, social media sites, and other datasets. However, these models are not search engines: when you type a prompt into ChatGPT, it doesn’t search for a suitable answer or combination of answers. Instead, it uses the rules it has developed to generate a novel response. Occasionally, that response might be similar to, or the same as, material from the dataset. I’ll get into that in more detail later.
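If it helps to see that generation loop in miniature, here’s a deliberately tiny sketch. It is nothing like GPT’s actual code – the “model” here is a hand-written table of next-word probabilities – but the principle is the same: generate by repeatedly sampling a likely next word, rather than by looking up a stored answer.

```python
import random

# A toy stand-in for a trained language model: a table of "learned"
# next-word probabilities (hand-written here; GPT instead learns
# billions of parameters from its training data).
next_word_probs = {
    ("it", "is"): {"a": 0.6, "the": 0.3, "not": 0.1},
    ("is", "a"): {"truth": 0.7, "fact": 0.3},
    ("a", "truth"): {"universally": 0.9, "that": 0.1},
}

def generate(prompt, steps=3):
    words = prompt.lower().split()
    for _ in range(steps):
        context = tuple(words[-2:])        # condition on the last two words
        probs = next_word_probs.get(context)
        if probs is None:                  # no learned pattern for this context
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])  # sample
    return " ".join(words)

print(generate("It is"))  # e.g. "it is a truth universally" – sampled, not retrieved
```

Because each step is a weighted random draw, every run can produce a different continuation, which is why the output is “novel” rather than retrieved.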

Here are some of my other posts which explore different kinds of Generative AI:

On November 8th I’ll be running a webinar on how educators can use image generation in their day-to-day work. Check it out on Eventbrite.

Does GAI plagiarise or copy?

One concern about these technologies has been that students using them are automatically plagiarising. This is based on the assumption that a model like ChatGPT “copies” its answers from the dataset.

In some cases, language models like GPT can generate responses which contain verbatim copies of text from the dataset. For example, with the following prompt I can easily get ChatGPT to tell me the opening line of a classic novel:

Prompt: What is the opening line of pride and prejudice? Model: GPT-4

It will also perform the same way for more recent books which are still under copyright, such as J. K. Rowling’s Harry Potter and the Philosopher’s Stone:

Prompt: What is the opening line of Harry Potter. Model: GPT-4

However, if you push this much further you’ll get a response like the following:

That’s a trained response: the model has been “taught” to respond with a comment like this when a user appears to be asking for something which might breach copyright. Other models without these guardrails, however, can be prompted to provide verbatim responses which seemingly recall text from the dataset.

But does that mean that these models are “plagiarism machines”? It’s a little more complex. These models work by analysing the dataset and learning the patterns of grammar, syntax, style, and so on. As a result of that process, data that is repeated more often (such as the opening lines of famous books, or facts commonly repeated online) is more likely to appear in generated output. Companies like OpenAI put measures in place to limit this, but it certainly still happens.
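A toy frequency model shows why. In this sketch (my illustration, not how GPT is actually trained), one sentence dominates the training data, so the most likely continuation of “it” reproduces that sentence verbatim – the same statistical pressure that makes famous opening lines resurface in GPT’s output:

```python
from collections import Counter, defaultdict

# A toy corpus where one famous line appears far more often than
# anything else, mimicking how often famous openings repeat online.
corpus = (["it was the best of times"] * 50
          + ["it was a cold morning", "it was the last day"] * 5)

# Count which word follows which across the whole corpus.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

# Greedy generation: always pick the most frequent continuation.
word, output = "it", ["it"]
for _ in range(5):
    if not following[word]:
        break
    word = following[word].most_common(1)[0][0]
    output.append(word)

print(" ".join(output))  # "it was the best of times" – the repeated line wins
```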

So the answer to the question “does GAI plagiarise or copy?” is “sometimes, but not in the ways you might expect”. It’s therefore not possible to claim that a student’s use of GAI equals plagiarism. If a student uses GAI to generate an essay, for example, then much of the response will be novel content, and any content that comes verbatim from the dataset is more of a side-effect than an intentional copy.

Is using GAI cheating?

This depends on your definition of cheating, and on the task.

A student using GAI to complete an entire task might be akin to contract cheating, where a person pays someone else to do the work for them. In fact, ChatGPT might even reduce the amount of actual contract cheating and put the contract essay writers out of work. There’s not much difference between paying someone to write an essay and dropping the entire question into ChatGPT to generate the response.

The key factors in determining whether GAI constitutes cheating include:

  • Whether the use of GAI is expressly forbidden
  • Whether the use is required to be disclosed
  • Whether there is a competitive advantage to be gained through the use of GAI

Essentially, “cheating” is whatever we decide it is. If an educator decides to ban GAI use, then of course any use is cheating. If a student uses the technology in a deliberately deceitful way, or to gain an unfair advantage, then it’s cheating.

The problem, as I’ll explain throughout, is that it’s next to impossible to enforce strict anti-GAI policies. If you consider GAI use to be cheating, for whatever reason, you’re going to have a hard time monitoring and evaluating student work outside of specific constraints.

Can’t I just use detection software?

Hot on the heels of ChatGPT, generative AI “detection” software started to appear in tertiary and then secondary education contexts. You can see the appeal: just as a new technology arrives which threatens to hugely undermine assessment practices, a few helpful developers provide an easy way to catch students using GAI.

Unfortunately, detection software doesn’t work.

Many studies have demonstrated that detectors like GPTZero and Turnitin simply don’t have the level of accuracy needed for an academic integrity judgement. For example, here are a few snippets from GPTZero:

Example 1: Paragraph from this blog, written with no GAI. Flagged as 48% probability of being written by AI.
Example 2: Paragraph generated entirely by ChatGPT (GPT-4). Flagged as 53% probability of being written by AI.
Example 3a: Text from Example 2 re-prompted with some basic instructions (GPT-4).
Example 3b: Output of Example 3a run through GPTZero. Flagged as 36% probability of AI, down from 53%.

As you can see above, the entirely human writing (mine) scored about the same as the entirely GPT-written version. After a very minimal re-prompt, the GPT-written text scored as “more human” than the human text.
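To see why these scores are so easy to move, here’s a deliberately crude sketch. It is not GPTZero’s algorithm: it simply uses word commonness as a rough stand-in for the perplexity-style “predictability” signals such tools describe, and the word list and threshold are my own invention. Even so, it illustrates the core problem: fluent human prose can look “predictable”, and a light reword can slip under whatever threshold you pick.

```python
# A crude stand-in for perplexity-based detection: score text by how
# "predictable" its words are, using word commonness as a rough proxy.
COMMON_WORDS = {"the", "of", "and", "a", "to", "in", "is", "it", "that", "was"}

def predictability(text):
    words = text.lower().split()
    return sum(word in COMMON_WORDS for word in words) / len(words)

def flag_as_ai(text, threshold=0.4):
    return predictability(text) > threshold  # arbitrary cut-off

human_text = "The history of the novel is one of the joys of the course."
reworded_ai_text = "Novels reward readers who persist through difficult openings."

print(flag_as_ai(human_text))        # True: fluent human prose scores as "AI"
print(flag_as_ai(reworded_ai_text))  # False: a light reword evades the check
```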

The GPTZero comparison is only a very brief example, and it’s worth checking out some of the emerging studies, such as:

What are the ethical issues of “catching” and “detecting”?

As the studies above point out, detection tools have been demonstrated to be biased against non-native English writers. There are also other ethical considerations when trying to “catch” or “detect” GAI use.

Firstly, students who are more digitally literate – or more fluent in general – may be able to use the technology in more sophisticated ways to generate undetectable content. These students might, for example, be able to construct better prompts which result in more “human-like” output. Or, they may use some of their own writing in the prompt to produce generated text that is more similar to their real “voice”.

Some students will also have better access to technology. This might simply mean device or internet access at home, or it could mean access to more sophisticated models, such as the subscription-only GPT-4 in ChatGPT. These students will produce content that evades detection software, much of which is trained to detect output from GPT-3 and GPT-3.5.

Essentially, a student who is more confident, more competent, or has access to a higher-quality application might “cheat” and get away with it. This is part of the “digital divide” issue I wrote about back in January, but it is amplified when we consider that detection is more likely to be seen as an option for high-stakes, competitive tests where wealthier, more literate students already have an advantage.

There are also ethical concerns with submitting student work to detection services, since the work may constitute their intellectual property. Deakin University’s Professor Phill Dawson made an excellent post about this on LinkedIn, which included a discussion of student data privacy and security.

What does all this mean for assessment design?

I’ll begin this answer with a straightforward but possibly unpopular statement: for any unsupervised assessment, we have to assume students might use GAI.

This isn’t a statement about trust. I’m not suggesting, as some of the early headlines did when ChatGPT was released, that all students are compulsive cheats. I’m stating that given the ubiquity of generative AI, the ease of access, and the inability to detect it, there is simply no way to guarantee it won’t be used for any assessment that doesn’t happen under supervision. However, I’m also not suggesting that all assessment should be supervised, and certainly not that it should all be conducted under exam conditions.

Here are a few considerations when designing assessments with GAI in mind:

  • Does the student need to demonstrate knowledge or competency without any use of GAI? Are you sure? If so, conduct the assessment in person, under supervision. It’s the only option.
  • Is the assessment a practical or experiential task that doesn’t benefit at all from the use of GAI? That is, is there really no way GAI could be used for the task? Think fitness assessments, or constructing a physical product.
  • Are you assessing knowledge, or skills? Can the skills be assessed in a real-world context, or applied to the student’s personal opinions and experiences?
  • Assuming students can and possibly will use GAI to complete some or all of the task, are all students equally aware of the technology and do they have equal access? If not, what can be done to ensure that students with access to better models are not advantaged?
  • If students “opt out” of using GAI, can you guarantee they won’t be disadvantaged?
  • Does the assessment need to be completed as a written task? Can it be completed orally, such as a discussion, viva, presentation, pitch, or debate?

Back in May, The University of Queensland’s Jason Lodge, along with Sarah Howard and Jaclyn Broadbent, proposed a taxonomy of approaches to assessment redesign. In the final option, “rethink”, the authors made this comment:

If assessments feel like chores and do not encourage creativity or inspire actual learning, or there is substantial time pressure to complete tasks, there is increased motivation to cut corners.

Jason Lodge, Sarah Howard, and Jaclyn Broadbent

They also explored the long-term viability of different approaches, including banning and invigilating, given the development of generative AI technologies:

Viability of assessment redesign for AI – Jason Lodge, Sarah Howard, and Jaclyn Broadbent

I’ve written elsewhere about an “AI assessment scale” which could be applied here, giving students clarity on when and where to use or avoid GAI. The key is clear communication of the expectations, and genuine reasons for students to not use GAI under certain circumstances.

What about distance learning, online courses, or out-of-class assessments?

I’ve spoken with school leaders from distance education providers, as well as tertiary providers with hundreds of online students. I also work with schools which offer programs like the International Baccalaureate, which includes an extended essay that is worked on over time, and often out of class.

My answer here is the same: anything that happens outside of a supervised setting (which may be everything, in this case) can potentially be completed with GAI. Proctoring software and lockdown browsers are as much of a dead end as detection tools, and unfortunately create a culture of mistrust.

However, it might still be possible to engage students in rich, online discussions and conversations where their knowledge can be assessed in ways other than via a written response. Otherwise, you have to accept that students could be using GAI.

Refer back to the questions above about assessment design. How might tasks be structured so that it doesn’t matter if students use GAI, or so that there is no advantage in using it?

So what does “good” assessment look like?

This obviously depends on your subject and content, but “good” assessment should be authentic, and represent the kind of skills that the student will need beyond the course itself. Good assessment should move away from “knowledge checking” towards the demonstration of skills – and some of these skills might include the use of GAI.

It’s important to ask what is being assessed, how, and why. Those might sound like obvious questions, but it’s surprising how often assessments are conducted in ways which are ill-suited to the actual thing being assessed.

For example, in my subject area of English, we typically get students to demonstrate their understanding of the views, values, and ideas of a text through an analytical essay. Why? There are other methods equally suited to demonstrating that kind of knowledge, and the skill of analysis. The dominance of the essay as an assessment item across disciplines is as much about expedience as it is about “good” assessment: it’s much easier to collect and grade 100 essays than to listen to 100 vivas or 25 group discussions.

Sometimes, the essay might actually be the best form of assessment. It’s a great skill to be able to logically argue your points, use concise evidence, and write with a compelling voice. But essays can be worked on over time, drafted and edited by hand, and can be accompanied by discussion and conversations with students. All of those approaches can contribute to the next point: authentication.

How can I authenticate student work?

First of all, assume that most students want to do the right thing.

If you have clear guidelines about academic integrity, and you avoid competitive behaviours that might lead to a culture of cheating, you make authenticating student work much easier.

Authentication can happen in a few ways:

  • Complete certain stages of the assessment, such as planning, brainstorming, drafting, or editing, under supervision. Not necessarily under exam conditions, but in person and as part of the classwork. In fact, instead of calling this “supervision” you might just call it collaboration
  • Retain copies (physical or digital) of work completed along the way. Compare this work to the final submission
  • Discuss the work with students. It’s often fairly simple to see if a student really has the knowledge and skills or if they’ve been relying on something like ChatGPT to produce work for them
  • Lean on formative rather than summative assessment, providing more opportunities for students to demonstrate their own knowledge
  • Have clear guidelines for how to acknowledge the use of GAI. Students might include links to chat transcripts, copies of prompts, screenshots, or a simple acknowledgement of the app used.

Can a student really use GAI in my task?

Probably.

Assuming there is some form of written component, or that the task is based on an assessment of the student’s knowledge of content, it’s highly likely that GAI could be used to generate some or all of the response.

If there are visual requirements, such as a folio of images, photos, or even hand-drawn elements, then we will soon reach a point where these, too, can be produced through image generation.

We are already at the stage where a recorded oral can be easily faked using a platform like ElevenLabs or Descript. These services take a recording of a voice and generate a realistic – if occasionally robotic – facsimile of the original voice.

As I mentioned earlier, we have to assume that students can and will use GAI in some form or another in any out-of-class assessment.

What do we do now?

As much as I’d love to have all of the answers, I don’t, and neither does anyone else. Systems are responding to this very differently. Some places have locked down assessment to invigilated exams and pen-and-paper written tasks. Others have turned a blind eye to GAI and are pretending it doesn’t apply to their tasks.

Cheating is nothing new, and some students will always flout the rules to get an unfair advantage. Others cheat because of time pressures, poor organisational skills, competing interests, academic insecurity, and a plethora of other reasons. Most students will know what appropriate and inappropriate use of GAI feels like, and we can help them understand our expectations through clear academic integrity guidelines.

To close this post, if I haven’t yet convinced you that trying to “catch” GAI is futile, I’ll end with an anecdote. In fact, I’ll throw this entire post into ChatGPT and generate an anecdote:

Prompt: <copy/paste blog post>. INSTRUCTION: Write an anecdote about two educators, one who encourages the appropriate use of GAI and academic integrity, the other who tries to enforce bans and use detection software.

The Tale of Two Educators: Fei and Jack

Fei and Jack are both university professors teaching courses in the humanities. Fei is an advocate for integrating technology into the classroom, while Jack is more of a traditionalist, wary of the growing influence of AI in education.

Fei’s Approach:

Fei believes that the future is unavoidable—AI is here to stay. She wants her students to be prepared for a world where technology and AI will play significant roles. As part of her curriculum, she includes a module on how to use AI responsibly. She educates her students on the pros and cons of using Generative AI, like ChatGPT, for academic research and writing.

“I want you to understand how to use these tools in an ethical manner,” Fei tells her students. “If you’re using AI to generate some ideas or even a rough draft, be transparent about it. Add an acknowledgment, and make sure you comprehend and revise what the AI has produced.”

By taking this approach, Fei finds that her students are more engaged, and their assignments turn out to be more nuanced and well-researched. She uses class time to discuss ideas, encouraging students to think critically, which makes grading easier since she already has a good grasp of each student’s understanding of the subject matter.

Jack’s Approach:

Jack, on the other hand, is skeptical of Generative AI. He has heard of AI “cheat-detection” software and decides to use it to police his students’ assignments. “No AI will undermine the integrity of my classroom,” he proudly announces on the first day of his course.

Jack spends hours setting up his detection software and routinely checks every assignment and essay for potential AI intervention. However, the results are inconclusive and often inconsistent. An essay written by a non-native English speaker is flagged, causing unnecessary stress and a drawn-out investigation, while a paper partially generated by GAI slips through undetected.

The amount of time Jack spends trying to catch students “cheating” with AI takes away from his ability to engage deeply with his students’ ideas and provide constructive feedback. Grading becomes a prolonged, gruelling process, and classroom discussions are shallow as he hasn’t had the time to prepare adequately.


Conclusion:

As the semester comes to an end, Fei and Jack meet for coffee to discuss their experiences. Fei shares how invigorated she feels, excited by the rich discussions and thoughtful work her students have produced. Jack, on the other hand, looks weary.

“I feel like I’ve spent the entire semester as a detective, not an educator,” Jack confesses. “And I’m still not sure if I succeeded in preserving academic integrity, or if I’ve just made everyone’s life more difficult, including my own.”

Fei nods sympathetically, “Technology is just a tool, Jack. It’s neither good nor bad. It’s all about how we use it and how we guide our students in using it.”


The ballad of Fei and Jack. Image generated in Midjourney using the concluding paragraph written with ChatGPT, GPT-4.

Thanks ChatGPT and Midjourney, I think. A little twee for my tastes, but you get the point. Trying to catch students using GAI creates mistrust and opens up potential ethical issues for equitable assessment. It’s also time consuming, frustrating, and antagonistic. Look for opportunities to move beyond the “cheating” narrative.

If you’re interested in more posts like this, then join the list:


Want to ask a question, make a suggestion, or speak with me about GAI? Get in touch:

