Did You Know AI Can Do… That?

In an earlier article titled ‘IYKYK’ (If You Know You Know), I argued that most people’s mental model of GenAI was set by their first interaction with it, and that the technology’s interfaces do almost nothing to help us discover what else is possible. The blank text box and blinking cursor of GenAI became the ceiling.

A user interface greeting 'Good evening, Leon', with a text input field saying 'How can I help you today?', and options for 'Write', 'Learn', 'Code', 'From Drive', and 'From Gmail'.
How can I help? No idea

This post is about finding ways to break through that ceiling. These are concrete examples of things it can do right now, today, that most people don’t know about – I didn’t know most of these things were possible until I stumbled across them, researched them, or got shown the way by someone else. Some of these are pretty mundane. Some are surprising. Each of them, at some point, broke my mental model of what this technology is.

Mundane But Useful

Most educators I work with know that AI can generate text: write an email, summarise a document, draft a lesson plan. Fewer know that most GenAI apps can read and understand images. Upload a photo of a whiteboard covered in student brainstorming and ask it to organise the ideas into themes. Hand it a screenshot of a timetable and ask it to find the clashes. This isn’t cutting-edge: “vision language models” (VLMs) have been built into most models for well over a year. But because the interface is a text box, most people never think to drag an image into it.

GenAI can also convert file formats. Not through some plugin or third-party tool, but directly. Hand a chatbot like Claude or ChatGPT a .csv or spreadsheet and ask for a PDF. Give it a markdown document and ask for a Word file. Upload a PDF and ask it to pull the content into a slide deck. These are small, unglamorous tasks that eat lots of admin time, and many people are doing them manually – downloading, opening, and ‘saving as’ – or hunting for an online converter.
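Under the hood, these conversions usually happen via small scripts the chatbot writes and runs for you. As a rough sketch of the idea, here’s the sort of throwaway Python involved in one simple conversion, CSV to a Markdown table (the data is invented; real PDF output needs extra libraries the assistant pulls in itself):

```python
import csv
import io

# A made-up CSV, standing in for an uploaded file.
csv_text = "name,score\nAda,91\nGrace,88\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, body = rows[0], rows[1:]

# Assemble a Markdown table line by line.
md_lines = ["| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |"]
md_lines += ["| " + " | ".join(r) + " |" for r in body]
markdown = "\n".join(md_lines)
```

The point isn’t the specific format: it’s that the assistant writes, runs, and discards code like this so you never have to see it.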

Screenshot of a CSV to PDF conversion process, showing a conversation interface that details the conversion of messy CSV data to a cleaned PDF format.
Gemini converts a CSV to PDF, but not without criticising it first

One more: if you have messy data – a spreadsheet full of inconsistent formatting, duplicated entries, or columns that don’t quite line up – GenAI can clean it. Not by suggesting what you should do (thanks, Copilot), but by actually doing it: reading the file, writing code to fix the problems, and handing you back a clean version. I’ve watched teachers spend entire afternoons wrangling assessment data that an AI could sort in seconds. They don’t ask, because “writing tool” doesn’t suggest “data cleaner”. Deidentify your data, or use an enterprise account that has the seal of privacy approval from your exec team, and see what you can do.
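To demystify the “writing code to fix the problems” step: a cleaning job like this is usually ordinary Python. A minimal sketch, with an invented roster and invented rules, of the kind of script an assistant writes behind the scenes:

```python
import csv
import io

# Hypothetical messy roster: stray spaces, inconsistent casing, duplicates.
messy = """name,email
 Alice Jones ,ALICE@EXAMPLE.COM
Bob Smith,bob@example.com
Alice Jones,alice@example.com
"""

def clean_rows(text):
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = " ".join(row["name"].split())   # collapse stray whitespace
        email = row["email"].strip().lower()   # normalise casing
        if email not in seen:                  # drop duplicate entries
            seen.add(email)
            cleaned.append({"name": name, "email": email})
    return cleaned

rows = clean_rows(messy)
```

A real job has messier rules, but the shape is the same: read, normalise, deduplicate, hand back.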

Screenshot of a data cleaning task showing a CSV file and a cleaned data Excel spreadsheet with names, emails, and phone numbers.
Squeaky clean data – cleaning a mock dataset in Claude

These kinds of tasks are possible because of two facts about modern LLM-based applications:

  1. They are multimodal
  2. They can read and execute code

Those two relatively simple technical facts can change your whole mental model of what AI is from “friendly assistant” to “computer interface”.

Breaking the Ceiling

The examples above are useful but not necessarily transformative. The moments that actually shifted how I think about AI were a little different.

These moments also aren’t limited to understanding how GenAI works: many of them are more broadly related to how computers work. The following examples demonstrate how shifting your mental models can open up whole new avenues of experimentation and GenAI use.

Voice first. I’m a writer, but first I was a speaker. Ideas rattle out of my mouth at 200 words per minute, and now that transcription technology has caught up, that means a lot of raw, unfiltered thought-data for AI to work with. I started using speech-to-text (STT) in 2023 through the first wave of AI-assisted meeting tools like Otter. Since then I’ve abandoned Otter for some home-brewed local transcription tools which run on my MacBook, but the process of transcribing speech is now a fundamental part of my writing and my work.

The mental model shift was simply “text first to voice first”. Drafts happen verbally. Ideas are recorded. GenAI prompts are spoken. Extending the idea further: If GenAI can comfortably handle millions of words of text, and speech transcripts are just text files, then what is the limit on the voice-to-AI pipeline? I haven’t hit that limit yet.

A close-up view of a text document discussing the implications of artificial intelligence in education, focusing on cognitive laziness, resistance to technology, and the impact on learning.
AI doesn’t care if your transcript is a block of gibberish

PowerPoint and Word files are just XML. This sounds technical, and it is, but the implication is huge. A Microsoft Word “.docx” file isn’t a proprietary black box. It’s basically a zip archive containing XML files that describe the document’s structure, formatting, and content. The same is true of .pptx files. When I realised that AI tools like Claude Code could unpack these archives and edit the XML directly, it changed how I work with these applications.
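To make the “zip archive of XML” point concrete, here’s a toy Python sketch. It builds a miniature archive laid out like a .pptx (real files contain many more parts: [Content_Types].xml, relationship files, themes, and so on), then “reaches inside” and patches the slide XML directly:

```python
import io
import zipfile

# Build a toy archive mimicking the internal layout of a .pptx.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ppt/slides/slide1.xml",
               "<p:sld><a:t>Draft title</a:t></p:sld>")

# Unzip it again and read the slide's XML as plain text.
with zipfile.ZipFile(buf) as z:
    xml = z.read("ppt/slides/slide1.xml").decode()

# Edit the XML directly: no PowerPoint, no Python slide library.
patched = xml.replace("Draft title", "Final title")
```

Rezip the edited parts and PowerPoint will happily open the result: that round trip is exactly what a tool like Claude Code automates.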

For some time, LLM-based apps have been able to generate documents and slides. You’ve possibly seen this in ChatGPT: it creates a file using python-pptx and you get a simple black-on-white text slideshow. With the XML-unpacking method, instead of relying on Python libraries that produce clunky, inconsistent formatting, I could get precise, professional-looking documents with exact control over styles, spacing, headers, and layout.

The mental model shift was that AI doesn’t just use software to edit your files. It can reach inside the file format itself. Extending this idea further: What other file formats are just “a bunch of code and stuff”? Basically all of them. PDFs, for example, are notoriously hard to copy and paste from because when you unpack them they’re gibberish: the formatting was never meant for human eyes. LLMs don’t care.

Directory structure of a PowerPoint presentation, showing folders such as 'ppt', 'slides', and files like 'slide1.xml' through 'slide11.xml'.
The mystical contents of a PowerPoint file: It’s XML all the way down

AI can check its own work by looking at it. This one changed how I build almost everything. AI tools with vision capabilities (like I discussed above) can take a screenshot of something they’ve just created – a slide, a webpage, a document – and evaluate it visually. Does the layout look right? Is the text overlapping? Are the colours consistent? I now build this into my workflows as a quality assurance step: create the output, screenshot it, review it, fix it. The mental model shift: AI isn’t just a generator. It can be its own critic.

Extending the mental model further: Almost everything you and I use a computer for has a “graphical user interface” (GUI) designed for human eyes. Now that AI has high quality image recognition, it can “see” all of those carefully designed interfaces.

AI can make videos. Video generation is a contentious topic due to copyright concerns, but AI doesn’t have to generate video frames directly. Instead, it can create videos by writing code that orchestrates open-source editing tools. I discovered Remotion, a framework for creating videos programmatically with code. Claude Code can write Remotion projects, composing scenes, adding transitions, syncing text and visuals, and rendering the output. Remotion has even released “skills” which can be loaded into most major AI coding platforms like Claude Code, Codex, and Cursor.

The mental model shift this time had little to do with video: AI isn’t limited to the formats you see in the chat window. If there’s an open-source tool that does the job, AI can probably drive it.

Extending this idea further: What other open source apps are available that can be run directly using code? There is a vast array of software out there. If you can think of something you’d like to do on a computer, chances are someone has made an open source tool that can do it. And AI can use that tool.

Here’s an example video created by Claude Code using the Remotion skill, editing a screen recording of the app/website I describe a little later in this article.

You can chain file formats together. I don’t like how PowerPoint slides look. I do like how HTML slide decks look, with their clean typography and modern layouts. But I also like PowerPoint’s functionality: presenter notes, easy sharing, compatibility with school systems. So I learned to do both. I use Claude Code to build slides in PowerPoint for structure and content, then upgrade them visually by rendering them as HTML, and then screenshot those beautiful HTML slides back onto blank PowerPoint slides. The result is a .pptx file that looks like a designer made it. The mental model shift: file formats aren’t endpoints. They’re waypoints. You can move between them strategically, using each format for what it does best.

URLs can store A WHOLE LOT of data. On the weekend I read a LinkedIn post from US educator Evan Peck. Peck had made a very lightweight and super fast browser app that allows users to share data tables by encoding the data into the sharing link itself. I sort of knew that URLs could encode data, but I’d never really seen it in practice. At the same time, while doomscrolling everyone’s favourite B2B sales platform, I came across a post from Idaho State’s Joel Gladd who is building an AI-agent writing app. He called it Margent after ‘marginalia’ and ‘agent’.

“Marginalia” reminded me of the collab annotation plugin Hypothesis, and those mental connections led me, over the course of an hour, to build and deploy a simple close reading and annotation app I’ve called read/closer. The premise, drawing on Peck’s data table app, is that you can share a portion of text and annotate it with multiple users just by encoding everything in the URL. No logins, no cloud services, no seeking permission to share documents. Just copy/paste/annotate/share.

However, I also hit many problems developing this app – and most of them were due to the upper limit of my understanding of code and computers. Google Chrome, for instance, flags long URLs as potential phishing threats, meaning that many people encounter a bright red security warning screen when they try to use the app. Chatting back and forth in Claude Code hasn’t magically revealed a solution, and I suspect that’s partly because I don’t know the right questions to ask. You can try read/closer here, if your browser lets you.

Screenshot of a digital reading and annotation tool displaying text from a literary work, with highlighted phrases and user comments.

The mental model shift this time happened in the space of a couple of hours on a Sunday morning: URLs aren’t just weblinks, they’re data.
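A minimal Python sketch of the trick, with a made-up payload shaped loosely like a read/closer document: the data is serialised to JSON, base64-encoded, and stored in the URL fragment, so the link itself is the database.

```python
import base64
import json
import urllib.parse

# Hypothetical payload: a passage of text plus one annotation.
payload = {
    "text": "Call me Ishmael.",
    "notes": [{"start": 0, "end": 7, "comment": "Famous opening"}],
}

# Encode the whole document into the URL fragment: no server, no login.
blob = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
url = "https://example.com/reader#" + blob

# Any recipient's browser can decode it straight back out of the link.
fragment = urllib.parse.urlparse(url).fragment
decoded = json.loads(base64.urlsafe_b64decode(fragment))
```

Using the fragment (the part after `#`) has a nice side effect: browsers don’t send it to the server at all, so the annotated text never leaves the link.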

Extending that idea further: What else can be encoded directly in URLs? How does the internet actually work…? What can we build on top of the infrastructure we already have, using AI to do the lifting on the code?

And finally… I needed an app, so I just asked for one. I wanted a simple markdown reader for my Mac, something lightweight that would render .md files nicely. Instead of searching the App Store or GitHub, I asked Claude Code to build one as a native Swift application. It wrote the code, compiled it, and I had a working app. The mental model shift: you don’t need to find software. You can commission it. On the spot, for free, built to your exact specifications.

Extending this further: Build anything.

Screenshot of a document titled 'CLAUDE.md' explaining a mobile-first personal assistant called Claude PA, including sections on architecture and key design decisions.

A Few More Examples

AI can browse the web. Not just search it, but actually navigate websites, click buttons, fill in forms, and extract information from pages. The Claude extension for Chrome can semi-autonomously complete tasks online: researching, comparing, booking, submitting. This is not a chatbot. It’s closer to an assistant with a browser.

AI can connect to your actual tools. Through protocols like MCP (Model Context Protocol), AI can read and write to your email, your calendar, your databases, your project management tools. Not through copy-paste, but through direct integration. I’ve built connections between Claude and my CRM, my email platform, and my website. AI isn’t a separate application you switch to: it can sit inside your existing workflow and act on your behalf.

AI can simulate and model. Give it a set of variables and ask it to run scenarios. What happens to my school’s budget if enrolments drop by 10%? What does the timetable look like if we add a new subject in Year 9? How does changing the weighting of an assessment task affect the grade distribution? They’re the kinds of questions school leaders ask constantly, and AI can build the model, run the numbers, and visualise the results.
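The models AI builds for questions like these are often just a few lines of code. A toy Python version of the enrolment question, with every figure invented:

```python
# Toy school-budget model; all numbers here are made up for illustration.
FUNDING_PER_STUDENT = 12_000   # hypothetical annual funding per enrolment
FIXED_COSTS = 2_500_000        # hypothetical fixed running costs

def projected_surplus(enrolments):
    return enrolments * FUNDING_PER_STUDENT - FIXED_COSTS

baseline = projected_surplus(300)
# "What happens to my budget if enrolments drop by 10%?"
scenario = projected_surplus(int(300 * 0.9))
shortfall = baseline - scenario
```

A real model has far more variables, but that’s the point: once the question is framed, the AI can write the model, sweep the scenarios, and chart the results.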

Shift Your Mental Models

Every one of these examples, from the mundane to the complex, represents the same underlying principle: GenAI is not a text-in, text-out chatbot. It’s a general-purpose interface to computing. The ceiling isn’t the technology: it’s our mental models of the technology.

The discoverability problem means most people will never stumble across these capabilities on their own. The blank text box doesn’t hint at any of them. You have to be told, or you have to experiment relentlessly, or you have to follow people who do.

That’s the If You Know You Know problem. Once someone shows you that AI can edit the XML inside a PowerPoint file, or build you a native app, or check its own visual output, your mental model cracks open a little. You start asking different questions. You stop thinking about what AI is and start thinking about what it could do.

But if nobody shows you, you’re still typing prompts into a text box and wondering what the fuss is about.

This is the second post in the IYKYK series. Subscribe to the mailing list for updates.

Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
