Everything I’ve Learned so far About OpenAI’s Agents

The hype train has fully left the station, and every AI punter on every social media channel is going wild about OpenAI’s new “Agents”. Unfortunately, most of the commentators haven’t actually tried the product – they’re relying on OpenAI’s promo video. Even those who have tried Agents seem to have been wooed by its accompanying marketing spiel.

But OpenAI’s product is unfinished: another example of a tech company releasing a technology into the wild just because they can, and not because they should. OpenAI wants to be first to market with a successful browser-using AI agent, and they’ll be damned if a little fact like “it doesn’t work” will get in the way.

Initial Impressions of OpenAI’s Agents: Unfinished, Unsuccessful, and Unsafe

In my previous post, I explored some of the much-hyped features of Agents: making PowerPoints and other resources. It was, predictably, hopeless. But that didn’t stop the hypesters commenting and messaging me with cries of “just wait six months!” and “you’re prompting it wrong!”

So I tried, and tried, and tried again. I wanted desperately to see in Agents what others were seeing: some sort of glorious techno-optimistic future where we’re all freed from the burden of things like… online shopping and… making PowerPoints.

On LinkedIn, I shared a few more attempts. The most successful involved getting o3-pro – OpenAI’s most powerful, most expensive model – to research all the capabilities of Agents and write clear, structured instructions in JSON. Large Language Model chatbots can write and read JSON very well, and using the structured data format is a proven way to get consistent results. So I had o3-pro write a set of instructions in JSON to make a 3-slide PowerPoint. It was still… uninspiring.
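To give a sense of what a structured brief like that can look like, here is a minimal Python sketch that builds JSON instructions for a 3-slide deck. The field names (`task`, `slides`, `constraints`, and so on) and the example topic are illustrative assumptions; they are not the instructions o3-pro actually wrote, and they do not follow any schema documented by OpenAI.

```python
import json


def build_deck_brief(topic: str, slide_titles: list[str]) -> dict:
    """Return a structured brief for a short slide deck (hypothetical schema)."""
    return {
        "task": "create_presentation",
        "topic": topic,
        "output_format": "pptx",
        "slides": [
            {"number": i, "title": title, "max_bullets": 3}
            for i, title in enumerate(slide_titles, start=1)
        ],
        "constraints": {"slide_count": len(slide_titles), "cite_sources": False},
    }


# Placeholder topic and titles, chosen only for illustration.
brief = build_deck_brief(
    "Generative AI in education",
    ["What it is", "Where it helps", "Where it falls short"],
)

# The JSON is pasted into the prompt as plain text.
prompt = "Follow these instructions exactly:\n" + json.dumps(brief, indent=2)
print(prompt)
```

Keeping the brief as data means the same structure can be reused for another topic by changing only the arguments.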

What can Agent actually do?

In a last-ditch attempt to get Agent to produce something remotely useful, I decided to just… ask what it could do. This tactic works well with Claude, Anthropic’s AI model – ask for a list of available tools and it will tell you all of its features. So I figured I’d try a series of exercises with Agents to have it list and demonstrate all of its bells and whistles.

The following video is a full, uncut (but sped up) recording of this test. It demonstrates everything I’ve found so far and, alongside my earlier post, also shows the current limitations.

Now, I know that it will get better. I’ve been saying since 2024 that “computer using agents” are absolutely the future of this technology, and absolutely will be pushed into the market by Google, Meta, Microsoft, and every other tech company. Do I like it any better, knowing that it’s coming? No.

Following the video, scroll down for my observations and everything I’ve learned so far about agents. If you think I’ve missed any of the tools, commands, or limitations, let me know in the comments.

Available Tools and Applications

Core Tools

Browser Tool – Chromium-based web browser for internet browsing
Computer Tool – Virtual desktop environment with full GUI control
Container Tool – Linux environment for running command-line programs and scripts
Image Generation Tool – Calls ImageGen to generate an image
Memento Tool – Internal utility for saving and recalling summaries of work across sessions. Rather hilariously, this tool is actually a hallucination! It appears (as far as I can tell) to have come from a 2019 LinkedIn article by Jesus Rodriguez, plus perhaps some OpenAI forum posts and a Medium article, which have been absorbed while Agent idly browsed the internet for its own tool list. I’m leaving it in as a great example of how strange and fairly useless Agent is at the moment.

Available Applications

Linux-based virtual desktop
Chrome Browser (Chromium-based)
LibreOffice Suite including:
- Writer (word processor)
- Calc (spreadsheets)
- Impress (presentations)
- Draw (drawing application)
- Base (database)
- Math (formula editor)

Computer Tool Functions

Core Functions

computer.initialize – Launches virtual desktop session
computer.get – Captures screenshot of current desktop state
computer.sync_file – Transfers files from virtual machine to user environment
computer.switch_app(app_name) – Switches between applications
- Supported apps: "chrome" and "libreoffice"

GUI Interaction Actions (`computer.do`)

click – Mouse click at coordinates
double_click – Double-click at coordinates
drag – Drag mouse along coordinate path
keypress – Press keyboard keys with modifiers (e.g., CTRL+A, CTRL+Z)
move – Move mouse to new position based on screen coordinates
scroll – Scroll content vertically/horizontally
type – Type text into active field
wait – Pause to allow UI updates

Multi-Action Sequences

Chain multiple actions together for complex interactions
Fill forms and perform sequences on websites
Navigate websites and applications programmatically via the GUI
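One way to picture a chained sequence is as a list of steps, each naming one of the `computer.do` actions listed above. The sketch below is only a mental model: the action names come from the list in this post, but the dictionary layout and the parameter names (`x`, `y`, `text`, `keys`, `seconds`) are assumptions, not the real call format Agent uses internally.

```python
# Action names as listed in this post; the parameter names used further down
# (x, y, text, keys, seconds) are assumptions for illustration only.
KNOWN_ACTIONS = {
    "click", "double_click", "drag", "keypress",
    "move", "scroll", "type", "wait",
}


def validate_sequence(actions: list[dict]) -> list[str]:
    """Return a list of problems found in a planned action sequence."""
    problems = []
    for index, action in enumerate(actions):
        name = action.get("action")
        if name not in KNOWN_ACTIONS:
            problems.append(f"step {index}: unknown action {name!r}")
    return problems


# A hypothetical plan for filling in a search box and submitting it.
fill_search_form = [
    {"action": "click", "x": 640, "y": 210},
    {"action": "type", "text": "OpenStreetMap Melbourne"},
    {"action": "keypress", "keys": ["ENTER"]},
    {"action": "wait", "seconds": 2},
]

print(validate_sequence(fill_search_form))       # []
print(validate_sequence([{"action": "hover"}]))  # ["step 0: unknown action 'hover'"]
```

Every step depends on screen coordinates being right, which goes some way to explaining why a single misplaced click can derail a whole sequence.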

Programming and Development Capabilities

Python Programming

Full Python environment in container
Can import libraries
Can create and execute Python scripts
Generate data visualisations and graphs
Handle data processing and analysis
Create documents programmatically using modules like:
- python-pptx (PowerPoint creation)
- python-docx (Word document creation)
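For reference, this is roughly what the programmatic route looks like with python-pptx. It is a minimal sketch and not the code Agent generated in my tests: the slide content and the `build_outline` and `write_deck` helpers are placeholders, and the script assumes python-pptx is installed (it skips file creation if not).

```python
import os
import tempfile


def build_outline() -> list[dict]:
    """Return slide titles and bullets as plain data (placeholder content)."""
    return [
        {"title": "Core tools", "bullets": ["Browser", "Computer", "Container"]},
        {"title": "Strengths", "bullets": ["Python scripting", "File export"]},
        {"title": "Weaknesses", "bullets": ["Slow GUI control", "Sparse output"]},
    ]


def write_deck(outline: list[dict], path: str) -> None:
    """Write the outline to a .pptx file using python-pptx."""
    from pptx import Presentation  # third-party: pip install python-pptx

    prs = Presentation()
    layout = prs.slide_layouts[1]  # "Title and Content" in the default template
    for item in outline:
        slide = prs.slides.add_slide(layout)
        slide.shapes.title.text = item["title"]
        body = slide.placeholders[1].text_frame
        body.text = item["bullets"][0]  # first bullet fills the existing paragraph
        for bullet in item["bullets"][1:]:
            body.add_paragraph().text = bullet
    prs.save(path)


outline = build_outline()
try:
    write_deck(outline, os.path.join(tempfile.gettempdir(), "agent_overview.pptx"))
except ImportError:
    print("python-pptx is not installed; skipping file creation")
```

With no styling beyond the default template, the result is a plain deck of titles and bullets.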

File Operations

Create, save, and manipulate files in virtual environment
Export files for download (ZIP, PDF, CSV, images, etc.)
Better file handling via command line than GUI
Can organise files in folders within virtual environment
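As an illustration of the kind of file handling involved, the sketch below writes a couple of files into folders and bundles them into a ZIP for download. It uses only the Python standard library; the file names, contents, and the `bundle_outputs` helper are made up for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_outputs(files: dict[str, str], workdir: Path) -> Path:
    """Write text files under workdir/output and return the path of a ZIP of them."""
    out_dir = workdir / "output"
    for relative_name, content in files.items():
        target = out_dir / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)  # create folders as needed
        target.write_text(content, encoding="utf-8")

    archive = workdir / "export.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(out_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the output folder, not absolute paths.
                zf.write(path, path.relative_to(out_dir).as_posix())
    return archive


workdir = Path(tempfile.mkdtemp())
archive = bundle_outputs(
    {"report/summary.txt": "Draft summary", "data/results.csv": "id,score\n1,0.5\n"},
    workdir,
)
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)  # ['data/results.csv', 'report/summary.txt']
```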

Web Development

Create HTML websites with CSS and JavaScript
Download and reference external images and resources
Build responsive websites with smooth scrolling
Generate complete web projects with multiple files

Internet and Web Capabilities

Web Browsing

Search and navigate websites
Download PDFs and extract content
Access web-based applications (no login required)
Navigate Wikipedia, news sites, forums, webpages that do not require logins
Use open web-based tools like:
- OpenStreetMap
- Sketchpad applications
- Games
- Unsplash for stock images

Limitations

Frequent 404 errors and navigation issues
Cross-origin security restrictions
Cannot access sites requiring authentication
More error-prone than typical human browsing
Must hand over to human user for logins, unless details are provided in the prompt (don’t do that!)

Document Creation Methods

Method 1: Python-based (Faster)

Uses Python libraries for document creation
Limited styling and formatting options
Faster execution but basic appearance
Programmatic approach to content generation

Method 2: LibreOffice GUI (Slower but marginally better-looking)

Full LibreOffice interface control
Better visual formatting and styling
Much slower and more error-prone
Can create (some) tables, insert images, format text
Prone to GUI interaction issues, clicking in the wrong place, and getting stuck in a loop

Image Generation and Processing

Access to ChatGPT’s image generator
Can create abstract images, graphics, and visual elements
Generate charts and graphs from data
Download and organise stock images from web sources

Data Analysis and Visualisation

Process datasets and perform statistical analysis
Create various types of graphs and charts
Handle CSV files and data exports
Generate random data for testing and demonstrations
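A typical request of this kind ("make up some test data, summarise it, export a CSV") comes down to something like the sketch below. It is an assumed example using only the standard library, not output from Agent: the column names, the distribution, and the helper functions are arbitrary choices.

```python
import csv
import io
import random
import statistics


def make_scores(n: int, seed: int = 0) -> list[dict]:
    """Generate n fake test scores; a fixed seed keeps the data reproducible."""
    rng = random.Random(seed)
    return [
        {"student": i, "score": round(rng.gauss(70, 10), 1)}
        for i in range(1, n + 1)
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["student", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_scores(50)
scores = [row["score"] for row in rows]
summary = {
    "n": len(scores),
    "mean": round(statistics.mean(scores), 2),
    "stdev": round(statistics.stdev(scores), 2),
}
csv_text = to_csv(rows)
print(summary)
```

Charts would normally need a plotting library on top of this, which is why the sketch stops at the summary and the CSV.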

What OpenAI Agents CANNOT Do

Installation Limitations

Cannot install new applications – Security guardrail prevents software installation
Limited to pre-installed Chrome and LibreOffice for now

Performance Issues

GUI interactions are slow and error-prone compared to command-line operations
Frequent 404 errors when browsing websites
Gets confused with multiple simultaneous tasks – tends to go in circles
Can’t play Minesweeper
Struggles to access authenticated websites
Cross-origin security restrictions limit web browsing capabilities
Struggles with complex table creation in LibreOffice
File operations via GUI are unreliable compared to command-line
Uncertain about its own available tools – sometimes doesn’t know what it can do. At one point, it referred to a third-party blog post about its own toolset…

Output Quality Issues

Output reports, PowerPoints, and other documents are sparse on information and poorly formatted
Generates annoying OpenAI citation links in documents (the long non-clickable number sequences)
Poor table formatting in LibreOffice documents
Inconsistent task completion – may fail to complete all requested updates

Download a PDF of the features here:

v2OpenAIAgentFactSheet

PostScript: Installing Packages

After running these tests, it occurred to me to also check in on the ability to install packages via the command line. Agent had told me it could not install new applications (beyond Chrome and LibreOffice), but I figured it should at least be able to download and install things via the command line. First, I asked if it could list its currently installed packages:

There’s obviously a lot preinstalled in the environment, but the output of the list “broke” after the first 18 items.

It then proceeded to install and test some new packages, proving that it can add new capabilities this way.
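If you want to run the same check yourself in any Python environment, a minimal sketch looks like this. It uses the standard library's `importlib.metadata` to list what is installed and `importlib.util.find_spec` to confirm a module can be imported. This is my assumption of a reasonable approach, not a record of the commands Agent ran.

```python
import importlib.util
from importlib import metadata


def installed_packages() -> list[str]:
    """Return 'name==version' strings for every distribution Python can see."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with no name recorded
            found.add(f"{name}=={dist.version}")
    return sorted(found, key=str.lower)


def is_importable(module_name: str) -> bool:
    """Check whether a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


packages = installed_packages()
print(f"{len(packages)} packages found")
for line in packages[:18]:  # show only the first few entries
    print(line)
print("json importable:", is_importable("json"))
```

Installing something new from the same environment is usually a matter of running `python -m pip install <package>` and then calling `is_importable` again to confirm it worked.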

What have I missed? Get in touch below or leave a comment at the end of this article.


Leon Furze
