How Not to Use the AI Assessment Scale
We seem to be on a roll with papers getting published recently. In recent weeks, we’ve had our critical AI literacy paper published in JIME, a paper on digital plastics as a conceptual metaphor in the journal Pedagogies, and last week, a commentary in JALT with the somewhat tongue-in-cheek title, “How (Not) to Use the AIAS”.
This commentary reflects on some of the ways Mike Perkins, Jasper Roe and I have applied our AI Assessment Scale in K-12, higher education and vocational education, and it also gave us an opportunity to discuss some of the pitfalls involved with publishing an open access and inherently flexible framework for assessments with generative artificial intelligence.
In this article, I’m going to explore some of the key points from the commentary, and I’ll share the entire open access paper at the end. For the hundreds of schools and universities worldwide using the AI Assessment Scale, we hope that this paper provides some useful ideas, some notes of caution, and prompts further adaptations of the scale.
Learning from the Journey
We published the original AI Assessment Scale in 2023, and version one was immediately adopted across the world by schools and universities eager to provide some kind of structure for students working with generative artificial intelligence. It was very much a case of right place, right time. I’d written the original blog post back in March, and Mike Perkins and Jasper Roe, along with Jason MacVaugh at BUV, got in touch to ask whether it could be adapted for their higher education context.
The original version, which is discussed extensively on this site and on our new aiassessmentscale.com website, is informally known as the “traffic lights” version. By 2024, we’d gathered a lot of data: anecdotal feedback from schools and universities working with the scale, our own observations from working with our students, and a pilot study at British University Vietnam. We used all of this data to write version two, published as a preprint in 2024 and soon to be published as a peer-reviewed article.
The primary difference between version one and version two was our acknowledgement that the top-down approach suggested by the traffic light colours – red for stop and green for go – would be impossible to manage, given the fallibility of detection tools, the growing ubiquity of generative artificial intelligence and the complex nature of assessment in various educational contexts. Again, we’ve put a lot of work into explaining the changes between version one and version two, and wherever we encounter version one still in use in the wild, we recommend that educators consider reviewing their practice and joining us in using the up-to-date version.
While version one was successful, it’s version two that has really blown us away: adaptations are in use across the world, along with over 30 translations and a spotlight at UNESCO Digital Week in 2024. The current version of the AI Assessment Scale has proven incredibly popular. Again, a lot of this is down to being in the right place at the right time, but we also feel that the increased flexibility of the scale has made it more attractive to a variety of disciplines, and even outside of education, in industry and corporate learning and design.
But that flexibility, of course, comes at a price, and there are some areas where we feel we’ve perhaps been too ambiguous, or where the purpose of the scale has been misinterpreted. Rather than flooding the internet with AIAS version three, we decided to address some of those misconceptions and problems in the new commentary.
Common Pitfalls
First up, I want to clarify that we’re not trying to turn the assessment scale into a rigid or formal structure. In making these recommendations, we’re not claiming to have all of the answers to AI and assessment. But we have seen hundreds of examples of the AI Assessment Scale in use, and of course, with that volume, there have been some issues.
And that’s not to say that the people who are implementing the AIAS are “doing it wrong”. They’re responding to the systemic pressures placed upon them by government and tertiary institutions, regulatory bodies, and particularly in K-12, the pressures of standardised testing. We also acknowledge that many of the misinterpretations of the AI Assessment Scale are on our shoulders, particularly where people are still using version one under the impression that AI can be neatly constrained to various levels.
The first major issue that we’ve identified is using the AI Assessment Scale as an assessment security tool. It’s simply not possible to show students the Assessment Scale (either version), ask them to use AI only up to level two, and then cross your fingers and hope that they do the right thing. To be fair, many students will do the right thing under those circumstances, but a sizeable number won’t. And if all you have is a colourful piece of paper and a hopeful expectation, you don’t have assessment security. But as we clarified in the article for version two, assessment security isn’t something that we’re aiming for with the AI Assessment Scale. It’s a framework to support assessment design.
We outline some of the other issues in the paper itself, which I’ll include at the end of this article.
So How Do We Use the AI Assessment Scale?
With some of the common pitfalls out of the way, the question becomes: if not this, then what? In the commentary, we also offer some of the successes that we’ve seen across various sectors – K-12, higher education and technical and vocational education and training (TVET).
From the article, our key suggestions for the effective implementation of the AIAS are as follows:
- Audit the broader validity of the assessments currently used;
- Decide the appropriate AIAS level per task, then redesign the brief, evidence trail, and rubric to fit that choice;
- Communicate permitted and prohibited uses in plain language and back this up through structural redesign;
- Build a chain of evidence of student attainment over time, rather than relying on single high-stakes moments;
- Align approaches within faculties to respect disciplinary and institutional norms while ensuring consistency for students;
- Build faculty capability through training that includes supporting critical AI literacy and principles of assessment design;
- Recognise equity issues and guarantee access to the required tools for all students.
We also provide examples of what these suggestions look like in practice, and how to implement them in the various sectors.
We hope you enjoy the new commentary. It is available from the JALT website as an open access article here, or you can download it below.
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
