Ditch the Detectors: Six Ways to Rethink Assessment for Generative Artificial Intelligence

This article is based on a series of short LinkedIn posts and includes the original ideas, plus some of the feedback and discussion from the comments. Head over to my profile on LinkedIn to find the originals.

In recent weeks, I’ve shared my thoughts on Generative AI (GenAI) and its impact on assessments, particularly the fact that AI detection tools are largely ineffective. But if we’re going to move away from these tools as a part of the academic integrity process, what can we replace them with?

I’ve got a few ideas – none of them groundbreaking or overly complex, but each with its own advantages and disadvantages. At the core of all these suggestions is a simple premise: GenAI didn’t ‘break’ assessment, and we, as educators and institutions, set the boundaries around what constitutes ‘academic misconduct’.

1. ‘Level 5 Assessments’

In the AI Assessment Scale developed by Mike Perkins, Jasper Roe, Jason MacVaugh, and myself, we outline five levels ranging from ‘no AI’ to ‘full AI’. ‘Level 5 – Full AI’ assessments obviously require us to disregard detection tools altogether. At this level, we actively teach and encourage students to experiment with GenAI tools. You can read more about the AIAS in the Journal of University Teaching & Learning Practice Vol. 21 No. 6 or via our recent preprint detailing the first pilot study of the Scale.

    Advantages:

    • Realistic: Few employers are preventing their employees from using GenAI (some aren’t even aware of its existence), so when students leave the educational bubble, they’ll be free to use whatever tools are available to them.
    • Multimodal and flexible: Level 5 tasks permit the use of any GenAI applications suitable for getting the job done, including text, image, audio, video, 3D, and code generation.

    Disadvantages:

    • Ethical concerns: GenAI isn’t a neutral technology – copyright and IP issues, dataset bias, and environmental costs are among the problems we need to address before fully embracing ‘full AI’ tasks.
    • Equity of access: Not all tools are created equal, and some students may have access to more sophisticated (and expensive) models, potentially leading to an unfair advantage.

    Sign up to the mailing list here for a collection of over 50 activities aligned to the five levels of the AIAS.

    2. Expect AI Use and Teach the Skills

    Another suggestion for rethinking assessments without relying on AI detection tools is to design tasks suitable for Levels 2-4 of the AI Assessment Scale, which include using AI for ideation, editing, or significant portions of a task.

      Here’s my entirely unsurprising proposal: Expect that students will use Generative AI and explicitly teach them the necessary skills.

      Advantages:

      • You won’t be caught off guard when students use GenAI to complete a task, eliminating the need for detection tools.
      • You’ll be able to address students’ concerns (well-documented in recent surveys) that their education providers aren’t preparing them for a future that involves using GenAI tools.

      Disadvantages:

      • The time, resources, and cost required to train educators so they share a common understanding of how the technology works.
      • The need to update and reframe many (if not all) current assessment tasks.

      If we anticipate that students are using GenAI (which they are), we can start thinking more deliberately about how to best support them in using these technologies ethically and appropriately.

      The comments on the post about expecting students to use AI and teaching the necessary skills raise some important considerations. As Adrian Cotterell points out, even when aiming for “no AI” tasks, it’s crucial to ensure that the assessments are accessible and not limited to traditional pen-and-paper exams. Additionally, as Jason Braun suggested, educators need to rethink what constitutes great work in a world powered by GenAI. While the overall quality of student outputs may rise, truly outstanding work might have unique characteristics, such as rougher edges or a more distinct voice.

      3. Ungrading

      Ungrading isn’t a new concept, but it gains new relevance when considering technologies that can effectively complete many of our traditional assessments.

        If we shift the focus of education away from the final graded assessment and towards what is being taught (and why), then the imperative for academic misconduct may be lessened.

        As Emily Pitts Donahoe recently wrote on her Substack, there are many reasons to “ungrade”…

        Advantages:

        • Reduces stress and pressure around high-stakes assessments and focuses learners on what is being taught and why.
        • Allows for diverse use of multimodal GenAI technologies without worrying about their impact on the final grade.

        Disadvantages:

        • Ungrading requires a significant cultural shift, and some resistance from students, educators, and institutions is to be expected.

        Ultimately, though, it’s an idea with serious merit for improving assessments, with or without technology.

        The comments on the original post about ungrading demonstrate some of the potential of this approach to shift the focus from grades to deeper understanding and genuine learning. As Majda Benzenati points out, ungrading allows educators to prioritise critical thinking, intellectual curiosity, and finding joy in the learning process. Emily Pitts Donahoe’s work with her students further emphasises how ungrading can motivate students to learn rather than simply chase high grades. While resistance to this cultural shift is expected, as noted by Ryan MacDonald, many educators like Joerg Meindl are already moving towards ungrading or alternative grading practices. They recognise the importance of focusing on the process, providing feedback, and explaining the purpose behind learning activities. As Vince Wall suggests, ungrading aligns well with process-oriented pedagogies like project-based learning, which may become increasingly relevant in the context of AI-infused education.

        4. Know Your Students’ Style

        Developing a deep understanding of a student’s style and voice is another way to update assessments without relying on AI detection tools.

          There are tools available that can help with “stylometry,” and AI-assisted tools are undoubtedly already in the pipeline to assess work against a student’s previous output. However, I’m talking more about the good old-fashioned approach of “knowing your students.”
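
          As a rough illustration of what stylometry actually measures, here’s a minimal sketch in Python. It’s my own toy example rather than any particular tool’s method: it compares the relative frequencies of common function words in two texts, one of the oldest and simplest stylometric signals. The word list and sample texts are invented purely for illustration.

```python
from collections import Counter
import math

# A small set of function words; real stylometric tools use much larger
# feature sets (character n-grams, syntax, etc.). Purely illustrative.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it",
                  "is", "was", "for", "on", "with", "as", "but"]

def profile(text: str) -> list[float]:
    # Relative frequency of each function word in the text.
    words = text.lower().split()
    total = max(len(words), 1)
    counts = Counter(words)
    return [counts[w] / total for w in FUNCTION_WORDS]

def similarity(a: str, b: str) -> float:
    # Cosine similarity between the two function-word profiles.
    pa, pb = profile(a), profile(b)
    dot = sum(x * y for x, y in zip(pa, pb))
    na = math.sqrt(sum(x * x for x in pa))
    nb = math.sqrt(sum(x * x for x in pb))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical example: a student's earlier essay vs. a new submission.
earlier = "In this essay I will argue that the novel presents a divided view of progress."
submission = "The novel presents a complex and divided picture of technological progress."
print(round(similarity(earlier, submission), 3))
```

          Even this toy example shows why the approach is brittle: short texts, genre shifts, and deliberate imitation all distort the profile, which is partly why I’d rather rely on actually knowing my students.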

          Advantages:

          • Building relationships with students by fully understanding and appreciating their perspectives and ways of expressing themselves.
          • Respecting students’ work and building these relationships is an effective way to mitigate academic misconduct.

          Disadvantages:

          • Scalability issues; it’s difficult, if not impossible, for one lecturer/teacher/tutor to develop a deep understanding of 100+ students’ work over a single semester or unit.
          • Still vulnerable to “traditional” methods of academic misconduct like contract cheating and more sophisticated GenAI models like Claude 3 Opus, which are better at emulating style.

          When faculties engage in block marking, where assignments are split evenly among faculty members rather than each teacher marking their own students’ work, developing a deep understanding of individual students’ styles can be more challenging. However, this practice is often reserved for summative assessments, and there are ways to mitigate the issue. For example, when I’ve run marking in this manner, the actual teacher still reviews their own students’ work before releasing grades to check for any outliers or inconsistencies. This allows for a balance between the benefits of block marking, such as increased consistency and reduced bias, and the importance of teachers being familiar with their students’ unique voices and abilities.

          As with all of these suggestions, there’s no perfect solution. Knowing your students’ style and voice is great if the cohort is small enough, but there will always be issues and ways to game the system, especially with assessments at scale.

          The Practical AI Strategies online course is available now! Over 4 hours of content split into 10-20 minute lessons, covering 6 key areas of Generative AI. You’ll learn how GenAI works, how to prompt text, image, and other models, and how to navigate the ethical implications of this complex technology. You’ll also learn how to adapt education and assessment practices to deal with GenAI. The course has been designed for both K-12 and Higher Education.

          5. Redefine Cheating

          Suggestion number five might seem a bit flippant, but at the end of the day, we (educators, institutions, authorities, examination boards) define what is and isn’t “cheating.”

            We’ve already seen some shifts in how academic integrity is discussed with GenAI in mind. For example, many academic integrity policies no longer group AI under the catch-all term of “plagiarism”, because it isn’t plagiarism. Some have even gone as far as explicitly permitting AI use.

            Advantages:

            • Redefining cheating demonstrates to students that we value trust and transparency and places the expectation on them to do the right thing. It acknowledges that we can’t ban or block the technology and that we need to reframe our assessments accordingly.
            • Reduction of educator workload; no more time spent endlessly chasing plagiarism (or “detection”) reports or going back and forth with appeal processes over academic integrity.

            Disadvantages:

            • Huge systemic and cultural barriers, not least the perception within and outside of education that shifting the goalposts on academic integrity is “soft” or a cop-out.
            • Easier said than done; this is a total, system-wide shift we’re talking about. If one institution decided to reinvent its entire approach to academic integrity, it would quickly hit barriers if external agencies and assessment bodies didn’t also move.

            Redefining academic integrity in the age of GenAI isn’t just about updating policies; it requires a fundamental shift in how we approach learning and assessment. As Mathew Hillier points out in the comments on the original post, the key question should be “how are you assuring learning has happened?” rather than focusing on catching cheaters. This reframing allows us to approach academic integrity from a more constructive standpoint, emphasising the importance of genuine learning over the moralistic labelling of certain behaviours. By moving away from punitive measures and instead designing assessments that truly demonstrate learning, we can create a system that encourages students to engage with their education meaningfully, rather than seeing it as a series of hoops to jump through.

            6. In-Person, In-Time, In-Place Assessments

            My final suggestion for updating assessments in light of GenAI, without using detection tools, is for in-person, in-time, in-place, no-device assessments.

              I’ve deliberately left this one until last, and ironically, it’s where many institutions went first when ChatGPT was released. But this doesn’t necessarily mean examination-style assessments.

              Group work, orals, seminars, practicals, simulations, vivas, brainstorming with post-it notes, debates, marker pens on butcher’s paper… There are plenty of methods that predate GenAI by a few centuries and still work.

              Advantages:

              • Easy to monitor and secure; with no access to devices and no way to do what the Victorian police call “sneaky face” (looking at a phone while driving or, in this case, under the desk), there’s no GenAI to worry about here. We might call these ‘Level 1’ assessments in our AI Assessment Scale.
              • Relevant, engaging, and authentic; these assessments can be modelled on real-world and authentic experiences, such as carrying out a practical task or a simulation.

              Disadvantages:

              • Unfortunately, this type of assessment is hard to scale. It might work well for tutor groups or K-12 classes, but it becomes unwieldy in a cohort of 100+ students.
              • No online mode. Short of relying on lockdown browsers and creepy surveillance tech, there’s no way to guarantee “no devices” in an online setting. I’ll be writing more about GenAI and online teaching at a later stage because it’s a whole different ballgame.

              So, those are six suggestions for assessments that account for GenAI but don’t rely on detection tools. None of them are perfect, and each comes with its own set of challenges, but I believe they’re a step in the right direction as we navigate this new landscape of education in the age of artificial intelligence.

              I regularly work with schools, universities, and faculty teams on developing guidelines and approaches for Generative AI. If you’re interested in talking about consulting and PD, get in touch via the form below:
