In May 2022, I made the decision to step out of the classroom and apply for a PhD, broadly focused on digital texts. I grabbed a few articles, like Bradley Robinson’s on automated writing technologies, and began reading up on these “large language model things”, like OpenAI’s Generative Pre-trained Transformer.
Back then, people accessed GPT as developers or through a handful of third-party applications, which were mostly geared towards writing marketing copy, social media posts, and summary paragraphs. Students were also beginning to use them to spin paragraphs, essentially evading plagiarism detection tools by recycling language through apps like WriteSonic and Jasper.
As GPT-2 and GPT-3 rolled out, I played around with using them to write essays. They were probably good enough for some of the formulaic work we make Year 9 students write, but they weren’t blowing anyone’s mind. And then in November 2022, just a couple of weeks after the official commencement of my PhD, OpenAI released ChatGPT.
We all know what happened next.
Claude Computer Use Is the Next ChatGPT Moment
Last week, Anthropic announced a new feature: Claude computer use. In their demo videos (clearly labeled as edited and not representative of real-world performance), Anthropic showed a Claude chatbot taking control of a computer’s mouse and keyboard and navigating various tasks, such as browsing web pages and entering data.
This is not yet a public feature, but it is already available to test out through the Anthropic API. This means, if you have a free developer account and a few dollars’ worth of credits, you can set up a system to test Claude computer use for yourself. So that’s exactly what I did, following the brief instructions on Simon Willison’s blog and Anthropic’s own advice in their GitHub repo. I had computer use up and running in a virtual machine, on Docker, on my MacBook, in under 10 minutes.
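Under the hood, the demo is just the ordinary Messages API with an extra beta tool attached. As a rough sketch (tool and model names follow Anthropic’s October 2024 beta documentation; check the current docs and the GitHub repo before relying on them), the request the demo sends looks something like this:

```python
# Sketch of the request payload the computer-use demo builds for the
# Anthropic Messages API. Identifiers ("computer_20241022", the model
# name) are from Anthropic's beta docs at the time of writing and may
# change; this constructs the payload only, it makes no network call.

def build_computer_use_request(task: str) -> dict:
    computer_tool = {
        "type": "computer_20241022",   # beta tool identifier
        "name": "computer",
        "display_width_px": 1024,      # the screen size Claude believes it controls
        "display_height_px": 768,
    }
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "tools": [computer_tool],
        "messages": [{"role": "user", "content": task}],
    }

request = build_computer_use_request("Open Firefox and go to my blog.")
```

In the demo container, a loop sends a request like this (with the computer-use beta flag set), executes each mouse or keyboard action Claude returns inside the virtual machine, takes a fresh screenshot, and feeds it back to the model until the task is done.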
Once running, the environment opens at a local browser address, presenting a simplistic Linux virtual computer with a handful of apps, including Firefox. I had a little think about what to test first.

Experiment One: Browsing the Blog
For my first simple experiment, I asked it to go to my website and locate the blog. I followed that up with a request to find posts in the blog related to Claude, such as my “Hands-on with Claude 3.5 Sonnet” posts, or maybe “Building an App in a Weekend.” Unfortunately for the chatbot, there are a lot of posts on my blog, and the ones about Claude are quite far back. It scrolled a few pages before I interrupted and sent it to We Need to Talk About Deepfakes instead.
Around then, it timed out with a “rate limit” error.
Experiment Two: Claude Played My Deep Fake Game
Because Claude uses screenshots and precise coordinates to position the mouse, it can click and interact with areas of the screen. The ability to hand over mouse clicks and keyboard strokes to a computer is nothing new. I remember my geeky teenage self cheating on video games by writing macros that would repeatedly click on certain areas of the screen – like RPGs with a fishing mechanic. I’d set up my automated clicker, go for dinner, and come back an hour or two later to a pile of 1000 fish.
What’s new, though, is the combination of that approach with a large language model and its image recognition capabilities. First of all, I sent the model directly to the deep fake game. It visited the page and then paused for a second to ask me if I would like it to scroll down so that we could play the game.
Claude was able to correctly infer the logic of the game (not that it’s particularly complicated): click one of the two buttons, then advance. It selected its first correct answer, but the button to advance the game was still off screen, so once again, it needed to scroll down, click, and then scroll back up.
It took a little while to explain this function to the chatbot, and whilst I was watching this second experiment, it felt very much like teaching a first-time computer user how to browse the internet. It’s also obviously using Page Up and Page Down functions to scroll, rather than the smooth scrolling that we’re used to with mouse, trackpad, or touch screen. So when it scrolls, it scrolls down by quite a distance.
This became a problem on the second pair of images, where Claude inferred incorrectly that the game was finished because it could no longer see the Advance button. I asked it to zoom out a little, and it did, by exactly 10%. On reflection, I’d probably begin interactions by telling Claude to zoom the entire browser out to 50%. I’m positive that Claude could still read all of the details at that magnification, and you’d have a lot more on screen.
Ultimately, this experiment timed out after about eight minutes of use. Total cost through the API: 27 cents. There’s a voiceover explaining the process in this video:
Experiment Three: Welcome to the Desert of the Real
As soon as Anthropic announced this feature, my first comment was fairly flippant: “Claude, jump into my LMS and complete any outstanding tasks, cheers mate”.
So for the third experiment, I thought I’d try exactly that – can Claude navigate an LMS and submit an answer on my behalf? I didn’t want to let this thing loose with my actual Deakin University credentials running rampant through the Deakin Sync LMS, so instead, I fired up “old-fashioned” Claude and asked it to write some code for a simple mock LMS.
I uploaded a couple of images from a popular learning management system and the instructions to create a mock LMS with three subjects, the first of which (English 101) should contain several assignments. The first assignment should also have a mock submissions platform with an interactive text box and a submission button. Claude handled this perfectly well, and in a couple of minutes, I had my mock LMS set up on this website.

This third experiment is where everything became a little bit more real. This was my ChatGPT moment. The following video shows the entire process. In reality, it took around 2 minutes and 40 seconds for Claude to load the web page, find the subject, open the assignments and produce a (terrible) response. The video is edited to double speed with a few freeze frames to show the interactions between myself and the chatbot. I posted this video on LinkedIn on Friday night, and it blew up – and rightly so.
What Does Computer Use Mean for Education?
If ChatGPT kicked the hornets’ nest on the vulnerabilities of traditional essays, then computer use should be seen as the beginning of the end for all arbitrary online “go here, do that” forms of assessment.
Although I’m calling this “the ChatGPT moment”, maybe that’s not entirely accurate. When ChatGPT was released, it launched very publicly and scaled to over 100 million users. It was clunky and frequently error-prone, but more or less a finished application. Simplistic as it was, everybody could instantly see the utility and the appeal of a chatbot that could write volumes of human-level text.
Claude computer use is not that. It’s highly buggy, comparatively difficult to set up, it’s not free, and in terms of security and safety, this is a world away from even the privacy concerns we had over OpenAI’s product. Maybe Claude computer use should be seen as the GPT-2 or GPT-3 moment – the precursor to something huge.
But I don’t want educators to be caught off guard again. I don’t want us to wait for the finished product, because you can guarantee that OpenAI, Microsoft, Google, maybe even Apple, all have their own versions of this technology. You can guarantee that Claude computer use is a canary in the coal mine for large language models that operate computers on our behalf.

Think about the implications of that – anything that a student does on a computer can realistically be automated. In many ways, this is already true. If a student had to complete a very routine task and they had sufficient coding skills, they could certainly write Python scripts to automate mouse clicks and keyboard entries. But this is something else entirely. This is the automation of tasks far more complex and sophisticated than simple data entry. This is the automation of potentially everything that can be done with a computer. And given how many of our students study online, that means everything.
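To make the contrast concrete, the old-fashioned automation described above can be sketched in a few lines of Python. The coordinates are illustrative, and the actual click is stubbed out (in practice a library such as pyautogui would perform it):

```python
import time

def click(x, y):
    # Stand-in for a real automation call such as pyautogui.click(x, y),
    # stubbed out here so the sketch runs anywhere.
    print(f"click at ({x}, {y})")

def run_macro(points, repeats, delay=0.0):
    """Replay a fixed list of screen coordinates.

    This is everything a traditional macro "knows": hard-coded
    positions, no screenshots, no understanding of what is on screen.
    """
    actions = []
    for _ in range(repeats):
        for point in points:
            click(*point)
            actions.append(point)
            time.sleep(delay)
    return actions

# e.g. click the hypothetical "cast line" and "reel in" buttons 100 times
actions = run_macro([(640, 500), (640, 560)], repeats=100)
```

The difference with Claude computer use is that nothing is hard-coded: the model screenshots the screen, decides for itself what to click, and adapts when the layout changes. That is what moves automation from routine clicking to open-ended tasks.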
Think about the trajectory of ChatGPT from November 2022. In March 2023, OpenAI released GPT-4, a significant step up in terms of its sophistication and language capabilities. At roughly the same time, Anthropic released Claude, arguably even better at language-based tasks than its counterpart. From there, 2023 saw a frantic arms race: open-source models steadily catching up with GPT-3.5-class language models, Google finally joining the party with Bard and then Gemini, and successive releases from Anthropic leading up to Claude 3 Opus.
Eventually, in 2024, we hit GPT-4o, Claude 3.5 Sonnet, Gemini Advanced, and Llama 3.2, and Apple joined the fray with Apple Intelligence. We’ve seen advances in image recognition, speech recognition, voice generation, speech-to-speech, multi-modality, image generation, and video generation. Many of these capabilities will be bundled together, presumably in the release of GPT-5, which in all likelihood will have audio and video generation as well as speech-to-speech.
What does it look like when an AI model that powerful gets access to your mouse and keyboard?
Computer Use and Assessment
When we updated our AI Assessment Scale, we bumped the old “AI-human evaluation” and “full AI” levels down a notch to make room for Level Five: Exploration. Claude computer use is a great example of where we will need this mindset.
We’re going to need to work together with students to understand how this technology can be beneficial for teaching and learning. On one of my recent posts, an educator commented that they weren’t afraid of these technologies – they were angry. Angry at the developers for continuously rolling out technologies that nobody asked for and perhaps that nobody is prepared for.
I can empathise. It’s overwhelming to be confronted by wave after wave of significant technological advancement. But since November 2022, most changes have been incremental. That Anthropic has already been able to demonstrate computer use capabilities is not incremental. It’s a step change, a missing piece of the puzzle towards what the technology companies describe as AI agents.
Unfortunately, these companies aren’t going to wait and they aren’t going to ask permission. Whatever your opinions on artificial intelligence, we have a responsibility to help students understand the implications of the technology. Even as one of the authors of the AI assessment scale, Level Five feels intangible, unattainable. I don’t know what it looks like to explore AI as an assessment tool alongside students, but I will say this: although I am depressed by Anthropic’s latest feature, I refuse to be surprised and I refuse to be caught off guard.
I’d encourage every educator reading this to learn as much as they can about features like Claude computer use. This is the key piece of the near future of generative AI, and whatever your personal feelings about the technology, you and your students will be impacted by it.
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:

The Practical AI Strategies online course is available now! Over 4 hours of content split into 10-20 minute lessons, covering 6 key areas of Generative AI. You’ll learn how GenAI works, how to prompt text, image, and other models, and the ethical implications of this complex technology. You will also learn how to adapt education and assessment practices to deal with GenAI. This course has been designed for K-12 and Higher Education.
