It’s been a couple of years now since I first published a short blog post suggesting an AI assessment scale ranging from no AI to full AI. That article opened a door to hundreds of interesting conversations about assessment, and the AI Assessment Scale has taken on a life of its own, spearheaded by Dr Mike Perkins with co-authors Jasper Roe and Jason MacVaugh. The AI Assessment Scale is now in its second version, and has become one of the most widely used frameworks in both K-12 and higher education.
But this article is not about the AI Assessment Scale. It’s about the broader principles behind why we need to rethink assessments. These principles underpin the logic of the Assessment Scale, but they extend far beyond it. They were also important long before generative artificial intelligence, although many have fallen by the wayside over the years in the face of high-stakes standardised testing, the ranking of students, and public and political narratives around the purpose of assessment.
The reasons for rethinking assessments are also much more complex than just “because students can use ChatGPT to do everything”. In all corners of education, we need to stop policing artificial intelligence and focus instead on designing better assessments. GenAI gives us an excuse to have these conversations. It should prompt us to reflect on what matters most: validity, fairness, transparency and, of course, learning.
This article introduces five principles that underpin my approach to rethinking assessment and help educators assess in a way that reflects the realities of teaching and learning with generative artificial intelligence.
Principle One: Validity First
The term “assessment validity” has gained traction over the last couple of years, but it isn’t new. The widely cited paper Validity Matters More Than Cheating, by Phillip Dawson, Margaret Bearman, Molly Dollinger and David Boud, set the tone for higher education conversations in Australia, but was grounded in work from all of the authors that predates the release of ChatGPT.
For me, the phrase “assessment validity” brings back my own 15 years of secondary teaching: the Victorian Certificate of Education (VCE) assessment handbook and the Australian Skills Quality Authority (ASQA) documentation on how to produce valid assessments. While conversations about generative artificial intelligence have surfaced discussions of validity, it’s a well-established term in education and worth holding in mind as a core principle.
What is Validity?
Here I’m drawing on language from several documents, including the Victorian Curriculum and Assessment Authority (VCAA) handbook, the New South Wales Education Standards Authority (NESA) website, the Queensland Curriculum and Assessment Authority (QCAA) assessment advice, and ASQA’s guide on assessment validity. Each of these organisations treats validity slightly differently, but their definitions intersect:
- Assess outcomes that are outlined in the curriculum and taught explicitly to students (content validity)
- Assess the outcome using the best available mode of assessment (construct validity)
- Consider the consequences of your design choices on the behaviour of the students (consequential validity)
- Design assessments which are authentic and reflect real-world, practical applications of the knowledge and skills of your discipline
- Design assessments which allow students to demonstrate these skills in a variety of ways
- Design assessments which are inclusive and accessible to all students
- Assess formally and informally, valuing teachers’ professional judgements as well as external validation
- Build assessment evidence over time, to develop a whole picture of the student’s capabilities
Note that none of these points refers explicitly to generative artificial intelligence, because assessments can be valid with and without the technology. The point is to develop assessments which generate trustworthy evidence of learning.
These key points of assessment validity do not work in isolation from one another. It’s possible to improve the security, and therefore the perceived trustworthiness, of an assessment by moving it into invigilated exam conditions, for example, but this may harm accessibility and inclusion. It’s a balancing act. Formal assessments are obviously necessary for certified programmes, but we cannot increase the volume of formally assessed material without also increasing teacher workload, so we need to balance formal assessment with informal, and that requires trusting teachers’ professional judgements. Again, it’s all about balance.
Principle Two: Design for Reality
The term “authentic assessment” is somewhat problematic and has become a bit of a buzzword in education, but as discussed above, valid assessments should reflect real-world processes and products, and artificial intelligence may now legitimately be part of those workflows in many industries.
An authentic assessment is not just an opportunity to give a student a mock task in the hopes that it will increase their engagement or interest. In the English classroom, for example, we can’t just tell students to write a news article and pretend they’re journalists in the vague hope that some of them may be interested in journalism. In some subject areas, it can be difficult to imagine what authentic assessment looks like if the subject matter is disconnected from the industries or areas of higher study where it might be applied: this is the “when am I ever going to use this in the real world?” problem of teaching Pythagoras’ theorem (hint: I used it when putting up a shed…).
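(For the curious, here’s roughly how that shed calculation works; the dimensions below are illustrative, not the actual measurements. To check that a rectangular base is square, compare the measured diagonal with the one the theorem predicts, d = √(a² + b²). For a 3 m × 4 m base:

d = √(3² + 4²) = √25 = 5 m

If the tape measure gives anything other than 5 m across the diagonal, the corners aren’t square. A genuinely practical application, no journalism role-play required.)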
But it is important to look for ways to bring more authenticity to tasks, because doing so can make the cheating question less relevant. Benito Cao recently wrote a piece for Times Higher Education where he co-opted a line from Australian Border Security: “don’t be sorry, just declare it”. Allowing students to use generative AI in authentic ways that mirror how people outside of education use the technology, and encouraging them to be transparent and honest about that use, is incredibly important.
If we set up false and arbitrary processes for students, then many of them will still “cheat” with generative artificial intelligence, and it will become harder and harder for us as educators to detect those who are doing the wrong thing.
Principle two also speaks to what I call the “brutal reality” of assessment: GenAI is probably better than you think, your assessments are more vulnerable to AI misuse than you think, and AI use is harder to detect than most educators believe. “Designing for reality” means designing authentic assessments that sometimes include GenAI in a deliberate, conscientious manner.
Principle Three: Transparency and Trust
The first version of our AI Assessment Scale was generally used as a way for teachers to show students what they considered appropriate or inappropriate use of artificial intelligence. As we developed version two of the scale, our language shifted much more towards discourse and conversation with students, and transparency around expectations. Tom Corbin, Phill Dawson and Danny Liu argued in a 2025 paper that structural changes to assessment design are necessary, and we agree entirely, but discourse and communication with students should also absolutely be a priority.
Be clear about expectations of when and how generative artificial intelligence can or cannot be used. Make these judgements based on an understanding of the technology. Teachers at any level of education need substantive professional development support to understand what generative AI can and cannot do. We cannot rely on third-hand information, supposition or technology company propaganda.
Many teachers still have not had the time or the inclination to experiment with even the most common generative artificial intelligence technologies, like ChatGPT and Microsoft Copilot, and have not seen firsthand how these technologies have evolved since 2023. Whenever I talk with educators, whether in K-12 or higher education, they are shocked by the technology’s capabilities in mathematical reasoning, coding and language, and there is often a sense that students are probably doing far more with generative artificial intelligence than we imagined.
For teachers to establish clear boundaries around the use of generative artificial intelligence, they need to understand the strengths and limitations of the technology; once they do, they absolutely should be the ones responsible for setting those boundaries. Teachers are the experts in the room, and should be the ones helping students to understand where the technology can assist their learning and where it’s going to get in the way.
Students are telling us that this is what they want from education. They are looking to us for guidance. They want us to set boundaries and tell them what they can and cannot do and why.
Transparency and trust go both ways. There have been recent instances of broken trust between students and educators, where institutions have applied heavy-handed restrictions to students’ use of generative AI while educators have been using it for lesson planning, resource creation and assessment. Again, we need transparent communication with students and communities about how educators are using the technology, because educators are experts in their field, and it is sometimes appropriate for them to use the technology in ways which students should not.
A person who has already done the heavy lifting on learning subject matter is well placed to use GenAI and leverage that expertise. On the other hand, a person who is learning a new subject might fall into the trap of believing hallucinations, offloading too much of the effort of learning onto AI and so on. This is something I’ve written about recently in a couple of posts on the nature of expertise and artificial intelligence use.
So, we should frame policies and guidelines for assessment around capability building, not just rules and restrictions. This is not a “thou shalt not” dictate; it is a “you shouldn’t, because…” conversation. We have to trust that the majority of students, given the opportunity to do the right thing, will choose to do it for their own sake.
Principle Four: Assessment is a Process, Not a Point in Time
We need to move away from the idea of assessment as a point in time. Anyone who has been in education for as long as I have – almost 20 years now – will know that formative assessments, ongoing assessment practices, folios, learning journeys and the like, are not new. And anyone who’s been around for as long as I have will also know, rather cynically, that formative assessment is not valued as highly as summative. For all the talk about the importance of assessment as a process, students and institutions alike value what is graded.
We need to move away from grades, numbers, and final results. We need to move away from one-shot, high-stakes assessments. We can dress this up in buzzwords like formative, summative, programmatic assessment, assessment for learning versus assessment of learning, and so on, but at the end of the day none of it will matter if we continue to place more perceived value on the end point, the number or the letter.
We need to rethink assessment in such a way that students can see we value the process, the metacognitive aspects of learning, the discussions, the conversations, the informal moments, the collaborative moments, and not just the single points in time where a student individually demonstrates their knowledge or skills.

In senior secondary school, high-stakes end-of-school examinations are probably the biggest sticking point. But in a recent article, I discussed our peculiar, stuck thinking about what an exam is actually for: in secondary education, we persist with the false narrative that because the examination is weighted highly, all upstream assessments should reflect the exam. This comes from a place of good intent. Teachers feel compelled to “prepare students for the exam”, and the perceived best way to do this is to subject students to multiple exams over the course of their schooling. But there is no evidence to suggest that making students do more exams makes them better at exams. The final exam should not dictate a whole year’s pedagogy, and it certainly should not dictate the assessment methods of the whole of secondary schooling.
As per principle one and the discussion of assessment validity, examinations may be part of the evidence chain, a necessary “high security” part of the journey, and in some subject areas this will matter more than in others. In a discipline where students are required to memorise complex terminology and recall information under pressure, say medicine or law, it might be necessary to use a high-stakes examination to prove that a student has not outsourced their learning throughout the rest of the course, whether to AI or through the much more mundane route of contract cheating.
But what of other subject areas: literature, music, the visual arts? Is it really necessary for a student in a vocational music industry course to demonstrate under examination conditions that they’ve memorised the regulations of the ARIA-AMRA Recorded Music Labelling Code of Practice…? Probably not.
The process of assessment should be contextualised to the discipline, and this is not an opportunity to label some courses or degrees as more worthwhile than others. They are equal but different.
Principle Five: Respect Professional Judgement
The final principle is systemic and institutional: trust teacher expertise. Institutions should resist rigid rules and surveillance technologies. We should not default to ineffective AI detection tools, proctoring software or process-tracking technologies like Turnitin Clarity, which is somehow being marketed as a way to help students with their writing.
No writer I know will do their best work with somebody looking over their shoulder. It is inauthentic and therefore invalid.
Principle five wraps around the other principles. If we do not respect teachers’ professional judgement, then we will never allow for informal assessment. We will never genuinely value the process of assessment over the externally validated end point, and ultimately we will never address the problem of students misusing generative artificial intelligence to game the system.
Respecting teachers’ professional judgement is a call to refocus on the relationships between teachers and students, and on a teacher building an understanding of a student’s capability over time, something which can be done in both face-to-face and online contexts. If we design online education around relationships and trust, rather than volume and scalability, we will see improvements in the quality of learning.
Respecting teachers’ professional judgement also means understanding that assessment should not create workload concerns. Increasing the volume of formal assessment, and therefore the volume of assessable materials, the volume of rubrics, the volume of data needing to be entered into learning management systems, is not about trust. It’s about accountability, and accountability does not suggest respect.
We need to look for ways to recentre the expertise of teachers, ensuring that they are working in disciplines where they are confident in their own capabilities and secure in their judgements of students’ learning.
Conclusion
These five principles are pro-learning, not anti-AI. They align with emerging research on how students are interacting with GenAI, but more importantly, they reflect good practices that long predate the release of ChatGPT.
Whenever I work with educators, faculties, and teaching and learning teams, we talk about these issues before we even talk about generative artificial intelligence. When we lay our assessments on the table before us, we ask: is it valid? Is it authentic? Are the reasons for our decisions transparent? Are we valuing the process, and do we respect one another’s judgements?
If we can’t say yes to all of these things, it doesn’t really matter whether students are using generative artificial intelligence or not. This is why we need to rethink assessment. And it has very little to do with ChatGPT.
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch.