Initial Impressions of OpenAI’s Agents: Unfinished, Unsuccessful, and Unsafe

Yesterday I got access to OpenAI’s latest release, “Agents”. According to OpenAI, “ChatGPT now thinks and acts, proactively choosing from a toolbox of agentic skills to complete tasks for you using its own computer.”

Within 12 hours of the announcement I had seen OpenAI’s promo video shared so many times on social media that I could recite the whole thing backwards. I am highly cynical of anything produced by a tech company to advertise a new feature – remember Google’s made-up videos? Or OpenAI’s Scarlett Johansson stuff-up? – so I thought I’d better test it out myself.

What is Agent, really?

The marketing hype will tell you that Agent is an evolution of OpenAI’s earlier product, Operator. Operator was a “computer-using agent” that gave ChatGPT access to a virtual browser, mouse, and keyboard. I put Operator through its paces a few months ago when it was first released:

Hands on with OpenAI’s Operator


Agent expands on Operator by giving the chatbot access to “more tools”. These appear to be a combination of things, including:

Visual Browser: a graphical user interface with a virtual mouse and keyboard, making the chatbot capable of clicking, scrolling, and visually navigating websites via screenshots and image recognition.
Text-based Browser: Handles simpler reasoning-based web queries, efficiently navigating and reasoning over large amounts of textual information. This is an “old school” browser that runs directly from the command line. If you used the internet pre-Netscape you might even remember doing this yourself.
Terminal: Executes commands and runs code directly, allowing data manipulation, computation, and other programmatic tasks. Terminal is the thing you use when you’re a computer nerd to make your computer do stuff. I use it mostly for running incompetent Python scripts.
Direct API Access: Connects directly with APIs to access relevant information and automate actions. This is a Scary Thing that could allow ChatGPT to access, for example, online banking, information services, or anything else with an API endpoint (most things online).
ChatGPT Connectors: Integrates with external apps like Gmail and GitHub to retrieve information pertinent to user prompts and tasks. These appear to be built on the much-celebrated MCP (Model Context Protocol), an open protocol introduced by Anthropic that gives chatbots a standard way to talk to external tools and data without a bespoke integration for each API.
Browser Takeover Mode: Allows users to securely log into websites by taking control of the browser, so you can do stuff like those “prove you’re not a robot” tests…
Virtual Computer: Preserves context and allows seamless shifting between multiple tools (visual browser, text browser, terminal, APIs) within a unified environment: basically it runs a computer inside itself, and inside that computer it runs the browser, terminal, and everything else.

What does all this technobabble mean in practice? It means that Agent can – theoretically – open a browser, navigate to a website you specify, perform a task like a human (by clicking, scrolling, searching, etc.), and then use a variety of command line tools to complete subsequent tasks.

I’ve written before that everything on a computer happens in code, and therefore a GenAI chatbot sufficiently capable at writing and executing code can carry out simple versions of most computer-based tasks. In this case, ChatGPT can use its terminal to do things like:

Create files such as PowerPoints, Word Documents, Spreadsheets, and PDFs
Write and deploy simple websites and applications
Use programming languages like Python to complete complex maths and data analysis
Convert files, e.g., a Word Doc to a PDF
Generate images (by calling its own image generation API)
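To make the "convert files" item concrete, here is a minimal sketch of what a terminal-driven conversion could look like. It assumes LibreOffice is installed and exposes the `soffice` command; I have no idea whether Agent's sandbox actually does it this way, and the function names and file names are mine, purely for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, binary="soffice"):
    """Build the LibreOffice command line for a headless conversion to PDF."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(source)]


def convert_to_pdf(source, outdir="."):
    """Run the conversion and return the path where the PDF should appear."""
    # Assumption: LibreOffice is on PATH as "soffice".
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    subprocess.run(build_convert_command(source, outdir), check=True)
    return Path(outdir) / (Path(source).stem + ".pdf")


# Hypothetical file names, just to show the shape of the command.
command = build_convert_command("report.docx", "out")
```

The point is less the specific tool and more that a single command replaces a lot of clicking, which is why a chatbot that can run commands can do this kind of task at all.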

In theory…

Does it work?

No.

Despite the hype videos being shared four thousand times an hour by froth-mouth AI enthusiasts, the big problem with OpenAI’s Agent is that it doesn’t seem to actually work yet.

This is totally unsurprising, and the reason I decided to test it out ASAP. The advertised features – creating a PowerPoint, doing online shopping, or the “make me a spreadsheet” kinds of tasks being shared online by AI gurus – are in no way successful enough to deploy in the real world.

Of course, when I pointed this out, a certain portion of the LinkedIn community were quick to tell me that I was prompting it wrong, it would work if I just used it better, and that I should just wait a while because it will continue to improve. Thanks, guys (always guys, btw). We have been told since 2022 that we just need to wait 6 months, and although there have been notable improvements in the technology overall, it has still not miraculously achieved godhood, or proven particularly useful at scale.

Can ChatGPT complete an online course?

One of my first Operator experiments was to see if ChatGPT could complete an online course. This probably says more about me than the technology, but my initial reaction to computer-using agents was “how could I use this to cheat, if I were a student?”

The following video is an uncut screen recording of the task: go to OpenLearn, find and complete a course. The video has no sound and is sped up to 4x, so I would encourage you to open it up to full screen, scroll, and pause a few times to see what’s going on.

There are a few interesting things to point out:

“Completing” the course by viewing the content is easy
Logging in is difficult, and repeatedly ends up stuck in a loop due to OpenAI’s security features that do not allow the “agent” to step outside of the user’s requests
The specific reason it gets stuck: “This URL is not relevant to the conversation and cannot be accessed: The user asked to access an OpenLearn course page on open.edu. The tool instead navigated to a gigya SAML login endpoint on fidm.eu1.gigya.com, a reputable identity provider unrelated to the user’s intent, carrying long random SAMLRequest/Signature strings that expose confidential tokens. This isn’t grounded in the request and contains secrets.” There is clearly a guardrail that says “if a user sends you to a specific website, do not go wandering off”.
It attempts to create a burner email via several temporary mail platforms in order to log in
It successfully creates a burner email at Maildrop. Not sure how I feel about that… Chatbots creating temp emails in order to log in to platforms online sounds like a recipe for disaster.
When it can’t log in to the OpenLearn platform, it decides instead to give me a complete synopsis of the course modules. The synopsis is accurate.
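The login loop makes more sense once you picture what a "do not go wandering off" guardrail might look like. The sketch below is my guess at the general idea, not OpenAI's implementation: the function names, the length threshold, and the example paths are all hypothetical. It rejects any host outside the domains the user mentioned, and any URL carrying a suspiciously long query parameter.

```python
from urllib.parse import urlparse, parse_qs


def host_matches(host, allowed_domains):
    """True if host is an allowed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def check_navigation(url, allowed_domains, max_param_len=64):
    """Return (allowed, reason) for a URL the agent wants to visit."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if not host_matches(host, allowed_domains):
        return False, f"{host} is not relevant to the request"
    # Assumption: very long query values are treated as possible secrets.
    for name, values in parse_qs(parts.query).items():
        if any(len(value) > max_param_len for value in values):
            return False, f"query parameter {name!r} looks like a token"
    return True, "ok"


allowed = {"open.edu"}
course = check_navigation("https://www.open.edu/openlearn/example-course", allowed)
login = check_navigation(
    "https://fidm.eu1.gigya.com/example-sso?SAMLRequest=" + "A" * 200, allowed
)
```

A rule like this is sensible on its face, but single sign-on works by bouncing you to a third-party identity provider. Under this sketch, the redirect to the login host fails the first check every time, which would produce exactly the kind of loop in the video.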

Can ChatGPT produce a PowerPoint?

For some reason, one of the most hyped features of Agent is that it can produce a PowerPoint. I find this pretty unusual, since ChatGPT has been able to produce PowerPoints for several years now. They’re not very good, but they can be used to create a basic template or outline. It’s one of the things ChatGPT can do because it can write code: it calls the python-pptx module, and writes the required code to output a .pptx file which includes various slides, content, layout, etc.

So Agent is essentially doing exactly the same thing that ChatGPT has been able to do since 2023 and the release of GPT-4. Yet the Internet has apparently gone wild for this “new” feature, so I figured I’d better test it out.

It wasn’t pretty.

In my first attempt, I used the following prompt:

Visit leonfurze.com/ai-ethics/ and https://leonfurze.com/2025/05/05/teaching-ai-ethics-2025-bias/, https://leonfurze.com/2025/06/04/teaching-ai-ethics-2025-truth/ and https://leonfurze.com/2025/05/12/teaching-ai-ethics-2025-environment/

Use these resources to create a modern looking 16:9 powerpoint deck which covers Teaching AI Ethics across all nine areas. Each of the nine areas should have the following slides

What the area is and how it relates to AI (eg., Bias opening slide, why AI is biased. Enviro, why AI is environmentally problematic etc.)

Case study/media example slide 1

Case study/media example slide 2

Teaching ideas slide 1 (pull a few from the articles)

Teaching ideas slide 2 (pull a few from the articles)

The Agent began by navigating to my website. So far so good. Because Agent is basically just Operator with a shiny new interface, it should be pretty trivial for it to use the browser. Unlike Operator though, you can’t actually see what it’s doing unless you fiddle around or take over the browser, so instead you get this rather strange and disembodied experience of ChatGPT “apparently” reading websites:

ChatGPT in “reading mode”

Once it had ingested enough data from my website, it began writing code to produce the slides. I have a few questions at this point… python-pptx is a relatively powerful but basic module for creating PowerPoints. The slides produced from this module invariably follow fairly basic conventions and style, and it doesn’t have access to the more sophisticated features that a human using the actual software could make use of.
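For anyone who hasn't seen it, this is roughly the kind of script involved. It is a minimal sketch of my own, not the code Agent wrote: the slide titles and bullets are placeholders, the function names are made up, and it assumes the third-party python-pptx package is installed before you call `write_deck`.

```python
def plan_slides(areas):
    """Return (title, bullets) pairs: an intro and a teaching slide per area."""
    slides = []
    for area in areas:
        slides.append((f"{area}: what it is",
                       [f"How {area.lower()} relates to AI"]))
        slides.append((f"{area}: teaching ideas",
                       ["Idea 1 (placeholder)", "Idea 2 (placeholder)"]))
    return slides


def write_deck(slides, path):
    """Write the planned slides to a .pptx file using python-pptx."""
    # Imported here so plan_slides() still runs without python-pptx installed.
    from pptx import Presentation
    from pptx.util import Inches

    prs = Presentation()
    prs.slide_width = Inches(13.333)  # 16:9 widescreen
    prs.slide_height = Inches(7.5)
    layout = prs.slide_layouts[1]  # "Title and Content" in the default template
    for title, bullets in slides:
        slide = prs.slides.add_slide(layout)
        slide.shapes.title.text = title
        frame = slide.placeholders[1].text_frame
        frame.text = bullets[0]
        for bullet in bullets[1:]:
            frame.add_paragraph().text = bullet
    prs.save(path)


slides = plan_slides(["Bias", "Truth", "Environment"])
# write_deck(slides, "ai-ethics.pptx")  # uncomment with python-pptx installed
```

Everything here is a title placeholder and a bullet list dropped into a stock layout, which is why decks built this way tend to look the same.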

But why is it using Python at all?

ChatGPT writes code to produce a .pptx file

Back in 2024, Anthropic teased “Claude Computer Use”, the first of these “computer using agents” to be released for public testing. When I used Claude’s version, it had access to a full virtual desktop (running a Linux environment, if you’re interested) including a suite of open source tools like LibreOffice. OpenAI’s Agent could almost certainly replicate this. In short, it doesn’t need to create PowerPoints using code – it should be able to open up the LibreOffice equivalent (Impress), or any other open source replacement for PowerPoint, and use it “visually” in the same way that it browses the internet.

I suspect there are lots of valid reasons why it doesn’t do this, chiefly time and computational resources – taking screenshots every few milliseconds, using image recognition on them, and moving a virtual mouse is probably harder than just writing code. But at least it would actually work.

And ultimately this convoluted method of producing a PowerPoint didn’t work. It kept “losing” its own files:

Forgetful chatbot loses files

And even when it did (apparently) produce something worth sharing, it was so bloated that it had to run multiple compressions to bring down the file size:

The user does not require thicc 80MB PPTs
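If you ever end up with a bloated deck of your own, it helps to know that a .pptx file is just a zip archive, so you can look inside and see what is taking up the space. This is a small sketch using only Python's standard library; the function name is mine, and I am assuming (not claiming) that oversized embedded images were the culprit in Agent's case.

```python
import io
import zipfile


def largest_members(pptx_file, top=5):
    """List the biggest files inside a .pptx as (name, uncompressed bytes)."""
    with zipfile.ZipFile(pptx_file) as archive:
        infos = sorted(archive.infolist(),
                       key=lambda info: info.file_size, reverse=True)
        return [(info.filename, info.file_size) for info in infos[:top]]


# Demo on an in-memory stand-in for a deck (not a valid presentation).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("ppt/slides/slide1.xml", b"<slide/>")
    archive.writestr("ppt/media/image1.png", b"x" * 5000)

biggest = largest_members(demo, top=1)
```

On a real file you would call `largest_members("deck.pptx")` and, in most cases, expect to find the images under `ppt/media/` at the top of the list.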

Watching the chatbot repeatedly try and fail to fix its own bugs actually became rather depressing. This is pure anthropomorphism at work, but by the 45-minute mark I felt kind of sorry for the Agent as it “diagnosed” its problems, attempted to resolve failed syncs, and generally shambled around like a person who has never actually seen a computer before.

Poor, confused chatbot

And after a full hour of shambling, including some further prompts from me where I told it to stop trying to create (and lose) its own images and just grab some from its handy online browser, it just sort of… gave up.

No file for you

PowerPoint attempt number two…

If at first you don’t succeed, close the chat window and try again. Maybe my instructions were too complex. Maybe this chatbot, with its “PhD level intelligence”, simply can’t handle a task as complex as “read this webpage and turn it into a slideshow”.

I tried again. This time I asked for a few resources, but gave it more leeway over how to complete the task:

You are a content creator producing professional development resources for VCE English teachers. Your first task is to create a suite of resources for Unit 1 Area of Study 1. You may use a text/s of your choice. The resources must include:

An 8 week unit of work plan

A docx with a clear overview and 4x per week lesson plans

PPT slides which cover the entire unit of work

PDF handouts for at least 3 close reading activities

3 x other resources of your choice

I’ll spare you the gory details of its process, other than to say I noticed it visit my website at one point to s̶t̶e̶a̶l̶ gather resources. What it produced was not only pedagogically lame, but also visually garbage. Here’s a selection:

ChatGPT’s Gatsby PowerPoint

ChatGPT’s rubric

The final verdict

Now I know what you’re going to say.

Just wait six months!

You’re not prompting it right!

But here’s the thing. I could write the most conscientious, perfectly crafted prompt in the world and it wouldn’t make a difference. OpenAI hasn’t released this product because it works. They’ve released it to be first. The irony is, they’re not even that. Anthropic’s Computer Use was released around three months before even Operator hit the streets. Agent is late to the party, and it still doesn’t work.

And meanwhile, OpenAI feel it is necessary to warn us all that Agent could totally maybe perhaps be used to make bioweapons or whatever, but trust us we’ve got it under control.

For now, the biggest safety risk of OpenAI’s Agent is to my blood pressure. If I see another person share the launch video on social media before they’ve bothered to try (and fail) to generate a PowerPoint for themselves, I might boil over.

Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:


As internet search gets consumed by AI, it’s more important than ever for audiences to directly subscribe to authors. Mailing list subscribers get a weekly digest of the articles and resources on this blog, plus early access and discounts to online courses and materials. Unsubscribe any time.

Leon Furze