It’s impossible to predict exactly what GenAI technologies will do, especially given the frantic pace of development from most tech companies. But there are certain trends and areas we can look at to make a few near-future predictions. Every six months or so, I update these posts with some of those near-future predictions.
The first set of predictions was in the final chapter of my January 2024 book, Practical AI Strategies. In the book, I questioned the future of “prompt engineering” and argued that GenAI would open up new jobs and opportunities, but the much-hyped job title of “prompt engineer” wouldn’t be one of them for very long.
I focused instead on multimodal GenAI, coding, and the potential risks of so-called “personalized learning”. I think it’s safe to say that the January 2024 predictions have all borne fruit. Everyone from tech companies to popular AI pundits has agreed that prompt engineering is far less important than it used to be now that models are more capable. And multimodal GenAI, “vibe coding”, and LLMs specifically designed for software development have continued to improve.
On the blog, I’ve updated my initial predictions several times over the past couple of years. In June 2024 I presented to the AISNSW management conference and outlined how my thinking had changed in the six months since publishing the book. Then, in September and November 2024, in a two-part article, I suggested six trends to focus on: deepfakes, reasoning models, a move from general-to-specific, “offline” AI, wearables, and AI agents.
It’s time to update the predictions again: to see what stays, what goes, and what’s new. This post covers the first three “near future” predictions.

AI Agents
In 2024, the term “AI Agents” was starting to gain traction. By the end of 2025, it has become an industry buzzword. Every tech company is working on agents, but they can’t seem to agree on what agents actually are…
A Microsoft Copilot Agent, for example, is essentially a custom chatbot. It has a set of custom instructions, a defined knowledge base (for example from SharePoint), and a specific job. An OpenAI Agent, on the other hand, is a browser-using, internet-searching, semi-autonomous “assistant” which can navigate webpages more like a human user and interact with applications like LibreOffice.

I like technologist Simon Willison’s definition: “An LLM agent runs tools in a loop to achieve a goal.” No hype, no mystery (Willison also invented the term “prompt injection” for a popular type of attack on LLMs, and popularised the term “slop”, so we can take his word for it).
Under this definition, an AI Agent is more like OpenAI’s application: it can use a browser (tool) and runs in a loop following the user’s initial prompt until the job is complete. This definition then extends to other “Agents” including the one in OpenAI’s Atlas browser, Perplexity’s Comet, and perhaps even elements of the new advanced features of Google NotebookLM, AI Studio, and other Gemini 3 Pro-based apps.
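To make Willison’s definition concrete, here is a minimal sketch of that “tools in a loop” idea in Python. Everything in it (the llm() stub, the search_web tool, the message format) is a placeholder I’ve made up for illustration, not any vendor’s real API; a production agent would swap in a genuine model call and real tools such as a browser.

```python
# Minimal sketch: an LLM agent runs tools in a loop until the goal is met.
# llm() and search_web() are stand-in stubs, not a real model or search API.

def llm(messages):
    """Stub for a chat-completion call; a real agent would call a model API here."""
    # Pretend the model asks for one web search, then declares the job done.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_web", "args": {"query": "latest Gemini model"}}
    return {"answer": "Finished: summarised the search results for the user."}

def search_web(query):
    """Stub tool; a real implementation might drive a browser or a search API."""
    return f"Top results for: {query}"

TOOLS = {"search_web": search_web}

def run_agent(goal, max_steps=10):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):              # the loop
        reply = llm(messages)
        if "answer" in reply:               # goal achieved, stop looping
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])   # run the requested tool
        messages.append({"role": "tool", "content": result})
    return "Gave up after max_steps."

print(run_agent("Find out what Google's newest model is and summarise it."))
```

That’s the whole trick: the “agent” is just a model that can request tools, plus a loop that keeps running them and feeding the results back until the model says it’s done.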

Digital Plastic
I’m still convinced that we’re only at the start of GenAI and multimodal texts. In 2023, I coined the term “digital plastic” to describe the potentially helpful and harmful implications of GenAI media. Like real-world plastics in the environment, these synthetic media are already ubiquitous in the online ecosystem.
We have also used the term digital plastic in our Critical AI Literacy research, most recently in a paper published in Pedagogies where we provide examples of how the concept can be used in lessons on multimodal AI.
Right now, multimodal GenAI is in its infancy compared to the production of powerful LLM-based applications, but it is rapidly catching up. The quality of images has already reached the point where AI-generated content is indistinguishable from “real” images. Google recently released Nano Banana Pro, their latest image model, which shows how quickly the company has caught up with others like Midjourney.
Prompt: digital photography, classroom documentary, an Australian vocational education evening class on electrical engineering


Google is also making strides with video generation, in direct competition with OpenAI’s Sora model and other companies who are working on increasingly long and accurate audio/video content. This example comes from Google’s Veo 3.1 video model, using the image above as a prompt:
The Model Arms Race
Google hasn’t only been focusing on its image and video models. Shortly before I published this article, the company also released Gemini 3 Pro, its most sophisticated LLM. Google rolled Gemini 3 Pro out across its entire ecosystem, and even made it available via the open-source platform Ollama’s cloud-based service.
Google’s model release followed OpenAI’s (much less successful) launch of GPT-5 in August this year. GPT-5 was so hyped that the reality couldn’t possibly live up to the expectations: Sam Altman had been building up the product since back in 2023. OpenAI have since released GPT-5.1, which mostly added a feature to change the “tone” of the model to a variety of preset options including “quirky” and “cynical”.

Just a few days after Google Gemini 3 Pro emerged at the top of the leading benchmarks, Anthropic released Claude Opus 4.5 and knocked it off again. It’s becoming very hard to accurately measure the competency of competing models, and I’m not sure how much longer our existing benchmarks will be helpful.
As the arms race between the major AI companies continues, it also seems as though each developer is trying to establish which niche it fits into. Google is leveraging its existing ecosystem, building Gemini into Workspace (Docs, Slides, Gmail, etc.) and developing multimodal platforms like Flow for image and video creation.
OpenAI appears to be catering to the consumer market, “dumbing down” models and adding features like shopping assistants and “erotic content.”
Anthropic is going after programmers and developers, with extensive papers on its benchmark performance in coding evaluations, and more features in its software development platforms.
Microsoft is… Microsoft. Copilot is still using GPT-4 under the hood in most cases, and their suite of various products is as confusing as ever. Copilot in Edge, Copilot Mode in Edge, Copilot MS365, Copilot Agents, Copilot Studio Agents, Copilot Pages, Copilot Notebooks… I have no idea what most of their products are, frankly. Hopefully at some point their different departments will set up an email thread (or, God forbid, a Teams channel) and have a conversation with one another.
The Near Future of GenAI…
In the next article, I’ll look at three more areas which will be important for the near future of these technologies: sovereign AI, small language models, and the AI bubble.
