Hands on with OpenAI’s Operator

OpenAI’s Operator is a standalone application which uses ChatGPT to drive a simple web browser, controlling the mouse and keyboard to complete online tasks semi-autonomously. In this post, I’m going to explore how Operator works, what it can do, and what is on the horizon for this technology.

It’s not often I do two “Hands on” posts in a row, but OpenAI seem to be in a flurry of product releases at the moment, and I think that Operator is an important indication of the near future of GenAI. In the previous post, I spoke about Deep Research, which has just been dropped to the $20/mth ‘Plus’ subscription tier. Operator is still in the hugely expensive $200/mth tier, but I’m sure it will follow suit and drop in price soon enough.

What is Operator?

Operator is currently in “research preview” for Pro users, meaning it is not part of the standard suite of models. It is accessed via a separate website at operator.chatgpt.com, with the same login details as a normal ChatGPT account. According to the OpenAI release blog post, Operator:

can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes. The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses.

https://openai.com/index/introducing-operator

It works through a combination of text and image prompting: a request from the user is passed to an instance of ChatGPT which has control of a “virtual machine”. OpenAI call this their Computer-Using Agent (CUA) model.

Flow chart of computer using agent (CUA) model, showing how information is passed from the prompt to the virtual machine
https://openai.com/index/computer-using-agent/

In practice, this means that you enter a request, and the application fires up a miniaturised browser window which it controls through a combination of screenshots and text prompts. The model takes screenshots of the browser window as it navigates the internet, then uses a version of OpenAI’s “reasoning” model to interpret each one and decide what to do next.
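To make that screenshot-then-decide loop a bit more concrete, here is a minimal sketch in Python. This is not OpenAI’s code or API: every name in it (Action, FakeBrowser, run_agent, scripted_policy) is hypothetical, the “browser” only records what it is asked to do, and the “model” is a hard-coded script standing in for the real thing. It only illustrates the general shape of the loop described above, including the point where the agent hands control back to the user.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the (hypothetical) model."""
    kind: str          # "click", "type", "scroll", "ask_user" or "done"
    detail: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for the sandboxed browser: logs actions instead of performing them."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; here the "screenshot" is a text summary.
        return f"screen after {len(self.log)} actions"

    def perform(self, action: Action) -> None:
        self.log.append(f"{action.kind}:{action.detail}")


# A policy maps (task, screenshot, history so far) to the next action.
Policy = Callable[[str, str, List[Action]], Action]


def run_agent(task: str, browser: FakeBrowser, decide: Policy,
              max_steps: int = 20) -> List[Action]:
    """Repeat: look at the screen, pick an action, carry it out."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = browser.screenshot()
        action = decide(task, shot, history)
        history.append(action)
        # Stop when the task is finished or the user has to step in
        # (for example, to solve a CAPTCHA).
        if action.kind in ("done", "ask_user"):
            break
        browser.perform(action)
    return history


def scripted_policy(task: str, screenshot: str, history: List[Action]) -> Action:
    """Toy replacement for the model: replays a fixed three-step script."""
    script = [Action("click", "search box"), Action("type", task), Action("done")]
    return script[min(len(history), len(script) - 1)]


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent("black jeans", browser, scripted_policy):
        print(step.kind, step.detail)
```

In a real system the policy would be a call to the model with the actual screenshot attached, and the step limit is just one simple way of stopping the agent from wandering indefinitely.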

Operator in Action

Once I got access, I decided to try out a few activities similar to my earlier experiments with Claude Computer Use back in 2024. The videos below demonstrate Operator completing a variety of these tasks, such as playing my Deepfake Game, completing (some of) an online course, shopping for a pair of jeans, and compiling a list of information about my online courses.

Can Operator play the Deepfake game?

First up, Operator versus deepfakes. Claude didn’t handle this task too well in 2024 – it struggled with scrolling, couldn’t press the buttons, and took a long time to answer. OpenAI’s Operator fares much better, and in two minutes managed to score a respectable 6/10 on the game:

Operator versus Coursera

The next experiment sends Operator off to find and complete an Intermediate Python course on Coursera. This involved creating an account (just for this experiment – I deleted it afterwards!) and giving login details to Operator. After handing the CAPTCHA back to me, ironically to prove I’m “not a robot”, it browsed through the courses and with some additional prompting found a valid one from IBM.

It progressed through the course quickly, switching the video to 2x like any good online student. At the end, I couldn’t get it to access the end-of-unit tests (which were paywalled), so I had it write a “student style reflection” instead.

You can see why this kind of (mis)use of Operator-style AI might have some online and hybrid education providers worried…

Operator goes shopping

In 2024 I tried to get Claude’s Computer Use to find me a pair of “cheap Levis 512s in black, size 32/30”. This weirdly specific request is much closer to how I’d actually go online shopping – I’m not a huge fan of aimlessly browsing clothing websites. But with the short legs of an Englishman descended from miners, I usually can’t find clothes that fit in Australian clothes stores. Maybe Operator can help.

Like Computer Use, Operator went straight to the Australian Levis store – probably not the place to find the “cheapest pair” of jeans. But, to its credit, when it couldn’t find the right size it went wandering across the internet to fulfil the task. After trying several more stores, it eventually found one with the right colour and size… then clicked the wrong size and added to cart.

I have no idea why the model changed its mind at the crucial point of purchase, but these are the kinds of bugs keeping Operator-style computer use firmly in “research mode” at the moment.

Something actually useful

I still think that computer-using GenAI applications like Operator are a huge part of the future of this technology. I also think, somewhat contradictorily, that they’re a waste of time. It seems ludicrous that we’ve spent years making websites user friendly for humans, and we’re now designing AI that needs to use a computer like a person in order to navigate the web when there are so many more efficient ways to do it.

But still, all of the major developers are piling money and resources into building these things, and I have no doubt that applications like Operator will be more widely available by the end of 2025. Perplexity are releasing one, Microsoft has one, and Google are working on models for “AI Agents”. They really want you to use these things.

So I’ve been trying all week to find an actual use case in my day-to-day work, and the other morning I finally got one. A school had asked for a list of my online courses, the fees, and descriptions. For some reason, this is not something I have readily available – I can point people to the website and let them browse themselves, but I don’t have an up-to-date, complete list.

Instead of spending a few minutes jotting down the info myself, I tasked Operator with the job and went to get a coffee. When I returned, it had browsed the site at practicalaistrategies.com and returned the list I’d asked for.

Low stakes stuff, but I suppose that a few of these instances over a day or a week would actually stack up to some significant time saved in administrative jobs like this.

The Future of GenAI

Within 12 months, you will have access to Operator-style AI assistants in every major browser, and on every operating system. I suspect there will be the usual arms race between Microsoft and Google, and that Apple will continue its approach of partnering with existing developers like OpenAI and now Google.

Whether it’s built directly into a browser like Chrome, or works at the operating system level like Copilot or a future version of Siri, I believe it is inevitable that applications like Operator will be taking over our devices very soon.

Think about what that means, in and outside of education. Data entry, travel bookings, administrative tasks, managing files on a Learning Management System, autocompleting online courses, buying jeans… anything which is possible purely through clicks on a website will be fair game for these applications running on your laptop, phone, or perhaps even your next generation smart glasses.

In 2024, I called this “the next ChatGPT moment”. I haven’t changed my mind.

Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:

