AI = UX: How GenAI will shape the way we interact with digital technologies

This post is part of a series exploring fundamental concepts to help prepare for a future built with artificial intelligence. For the previous post on why human expertise is still necessary, click here.

As generative artificial intelligence evolves, it will shape the way we interact with all our digital technologies: that means computers, laptops, tablets, smartphones, VR devices, and pretty much anything with a display and power will become “AI-first”.

Since the release of ChatGPT, we’ve seen a lot of focus on chatbots and the familiar text-in, text-out interaction. However, it’s only a matter of time before the chatbot interface fades into the background, and the actual logic of large language model-based interactions becomes a more seamless part of our everyday devices. This won’t happen all at once, but there are some near-future trajectories we can predict based on the directions major tech companies are pushing their products and the rhetoric from their CEOs.

Working with Text

Although it can be an interesting experiment to generate haikus, shopping lists, or even entire academic papers with large language models, it’s not really something they’re well-suited for. More recent advanced models like GPT-4 are capable of pseudo-reasoning, stepping through processes more adeptly, but interacting with them to create lengthy text is frankly boring for both the creator and the reader. It’s also a fairly pointless exercise, as AI can summarise other AI-generated content and use it to create equally banal responses.

We now have a technology that can speak our language (note I said “speak,” but not “understand”). The utility of a speaking machine goes far beyond writing tasks, and an obvious place to look for the implications of text generation is in the software industry, where code completion and natural language programming tools are spreading like wildfire.

I’m not here to debate whether the code produced by GPT, LLaMA, and the like is any good or as good as a human programmer. The way that software engineers are interacting with their devices is already changing, and sooner or later, this will play out in devices and operating systems. After all, the people using these powerful AI-assisted coding tools are the people designing and building the consumer products that you and I interact with.

The Practical AI Strategies online course is available now! Over 4 hours of content split into 10-20 minute lessons, covering 6 key areas of Generative AI. You’ll learn how GenAI works, how to prompt text, image, and other models, and how to navigate the ethical implications of this complex technology. You’ll also learn how to adapt education and assessment practices to deal with GenAI. The course has been designed for K-12 and Higher Education.

I’ve written before about the idea of creating software on demand using GenAI, and it’s still a complicated and lengthy process, albeit not nearly as complicated and lengthy as actually learning how to code yourself from scratch. But as these technologies advance further, the way we interact with devices will become significantly different. Launching an application may look more like describing the application you want to launch than using a specific piece of pre-built software.

When this happens, we’ll need to rethink the entire idea of applications. We’re already seeing glimpses of this in operating systems, in features that have been established for years. Think about the widgets you can install on any smartphone or desktop computer – customisable snippets of larger programs. Now imagine a user interface filled with widgets designed on the fly, while you use it, in response to whatever tasks you’re currently working on.

The widgetiser

Need to take a note? Spawn a note widget. Need to transfer that note into a lengthier document? Morph your widget into something resembling a more operational word processing app. Have some things from that longer document which need to be added to your to-do list or sent to colleagues via email? Use the AI assistant hovering alongside your expanded application widget to take care of all those tasks for you.
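To make the widget idea a little more concrete, here’s a minimal sketch in Python of how a “widgetiser” might work: the system hands the model a description of whatever you’re doing, and the model returns a small widget spec that the interface can render. The llm_complete stub and the JSON spec format are entirely hypothetical – they stand in for whatever on-device model and UI layer a real operating system would provide.

```python
import json

# Hypothetical stand-in for whatever on-device model the OS ships with.
# A real system would call an LLM here; this stub returns a canned spec
# so the sketch runs end to end.
def llm_complete(prompt: str) -> str:
    return json.dumps({
        "widget": "note",
        "title": "Quick note",
        "fields": ["body"],
        "actions": ["save", "expand_to_document", "send_to_todo_list", "email_to_colleague"],
    })

def spawn_widget(task_description: str) -> dict:
    """Ask the model to design a minimal widget for whatever the user is doing right now."""
    prompt = (
        "Design a minimal UI widget for this task. Respond as JSON with the keys "
        f"'widget', 'title', 'fields' and 'actions'. Task: {task_description}"
    )
    return json.loads(llm_complete(prompt))

widget = spawn_widget("jot down notes during a call, then turn them into an email later")
print(f"Spawning a '{widget['widget']}' widget titled '{widget['title']}'")
print("Actions on offer:", ", ".join(widget["actions"]))
```

The point isn’t the plumbing – it’s that the “application” only exists for as long as the task does.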

The question will become not which apps you use for a certain task, but which AI-based operating system you work within. It’s highly likely that this will be narrowed down to Microsoft, Google, Meta, and Apple. So ultimately, it probably won’t matter which of the four main players you choose (or five, if we say, for the sake of argument, that OpenAI manages to sustain itself beyond Sam Altman’s ego), because they’ll need to be interoperable. There’s no point having a system where Apple users can only contact Apple users. That’s been true for well over a decade, and we’re not going to go backwards. So a lot of this will come down to users’ personal preferences, enterprise agreements, and whether these companies specialise in targeting certain demographics (e.g., Microsoft for business, Google for mainstream consumers, Apple for the high end and people who care more about aesthetics, Meta for as many people as they can get their hands on).

In fact, the day after I drafted this article, OpenAI basically confirmed my “widget theory” by releasing a new feature for Plus users called ‘Canvas’.

By including a trigger phrase like “open a canvas”, or in response to a task that the model infers will require the Canvas feature, you can now conjure up a new sidebar and some specific editing features within ChatGPT. These refine the big, broad application into a more streamlined, fit-for-purpose tool for editing text and code.
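As a rough illustration of that kind of routing (and definitely not OpenAI’s actual implementation), you could imagine something like the sketch below: an explicit trigger phrase or an inferred editing task flips the interface from plain chat into a canvas-style editor. The infer_needs_canvas check is a hypothetical placeholder for the model’s own judgement.

```python
TRIGGER_PHRASES = ("open a canvas", "open canvas")

def infer_needs_canvas(message: str) -> bool:
    # Hypothetical placeholder: in reality the model itself would decide
    # whether the task calls for a document or code editor.
    return any(word in message.lower() for word in ("draft", "rewrite", "refactor", "edit this"))

def route(message: str) -> str:
    """Decide whether to stay in the chat view or spawn the canvas-style editor."""
    text = message.lower()
    if any(phrase in text for phrase in TRIGGER_PHRASES):
        return "canvas"  # explicit trigger phrase
    if infer_needs_canvas(message):
        return "canvas"  # inferred from the task itself
    return "chat"

print(route("Open a canvas and help me tighten this introduction"))  # -> canvas
print(route("What's the capital of Peru?"))                          # -> chat
```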

Here’s a short video of GPT-4o Canvas working through the introduction to our recently published Deepfake Research Agenda:

ChatGPT’s new GPT-4o Canvas feature allows users to interact and edit text directly

And here’s a longer example, this time with a voiceover from me describing the process of using GPT-4o with Canvas alongside some coding.

As you can see, the traditional chatbot interface of ChatGPT shifts to the side panel, and the text being worked on fills the main screen. As the text is edited in real time, the Canvas feature highlights changes, makes inline comments, and produces suggestions. You can interact with the text either via the chatbot sidebar or by highlighting and commenting on the text itself.

Using my term from before, it’s an editing widget built into the main ChatGPT application, and it makes for a more useful experience than editing text with the basic chatbot interface.

AI-First Devices

Another logical extension of current AI technologies is that we’ll see an increase in AI-powered, AI-first devices. We’re already seeing people play in this space with wearables like the failed Humane Pin, the creepy Friend necklace, and the hundreds of AI note-taking tools currently marketing themselves as memory aids. But the most serious avenue here seems to be Meta’s recent play with Orion augmented reality glasses.

The Orion glasses, touted at the recent Meta Connect, reportedly cost $10,000 a pair just to manufacture. It’s not a consumer technology yet – it’s a demonstration of the future that Meta is trying to create, and a middle finger to Apple with its heavy, expensive, wired and tethered Vision Pro VR headset. As far as Meta is concerned, the future is clearly glasses, not goggles, and there’s not a smartphone in sight.

In order for this technology to be successful, AI is not only a compulsory part of the tech stack but a significant element of the entire user experience. Though the demo Orion has a “neural wristband” tracking hand gestures such as pinches and scrolls, a lot of the action will be powered by Meta AI and future versions of the LLaMA large language model. Meta just released LLaMA 3.2, including a 90 billion parameter model with vision capabilities, and the cameras in the frame of the Orion allow the AI-augmented glasses to see the world around them – necessary not only for tracking and placing augmented reality objects, but also for passing data to the language model.

In fact, this technology already exists and is commercially available in Meta’s Ray-Ban partnership, albeit currently limited to certain countries. You need access to Meta AI to use this feature – you simply speak into the microphones placed strategically around the glasses, and Meta AI (at the moment via a Bluetooth connection to a phone) uses speech-to-text and image recognition in combination to read the world around you and respond accordingly.
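A back-of-the-envelope sketch of that pipeline might look something like this: microphone audio in, a camera frame grabbed, both passed to a vision-capable model, and a spoken reply out. Every function here is a hypothetical stub (these are not Meta’s actual APIs), included only to show the shape of the loop.

```python
def transcribe(audio: bytes) -> str:
    # Placeholder speech-to-text: a real system would run the audio
    # through a speech recognition model.
    return "What kind of tree is that?"

def capture_frame() -> bytes:
    # Placeholder for grabbing a still from the glasses' camera.
    return b"<jpeg bytes>"

def multimodal_reply(question: str, image: bytes) -> str:
    # Placeholder for a vision-capable LLM (something in the spirit of a
    # LLaMA 3.2 vision model) answering a question about the image.
    return "That looks like a jacaranda."

def speak(text: str) -> None:
    # Placeholder text-to-speech played back through the glasses' speakers.
    print(f"[glasses] {text}")

def handle_voice_query(audio: bytes) -> None:
    question = transcribe(audio)
    frame = capture_frame()
    speak(multimodal_reply(question, frame))

handle_voice_query(b"<raw microphone audio>")
```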

Meta clearly hopes to replace smartphones with AR glasses and wants to do so ahead of established smartphone manufacturers, particularly Google and Apple. It’s likely that many of the technologies being incorporated into AR will influence all consumer electronics. For example, the AI-powered image recognition, face tracking, and biometrics used in augmented reality glasses will no doubt be incorporated into smartphones, tablets, and laptops with webcams. The graphics technologies and displays being built for AR will probably appear in other formats too, and it’s likely that artificial intelligence will be the main way that we interact with these.

And of course, one of the most important user interfaces for artificial intelligence will be voice.

Why type when you can speak?

OpenAI has finally released its much-touted advanced voice model. After experimenting with it for a few days, it clearly isn’t the product promoted at that showcase earlier this year. It’s less engaging than the discontinued Scarlett Johansson-like Sky voice, and also less flirtatious and less “human.” This is either a deliberate move by OpenAI to tone down the human-like qualities of their advanced voice model in response to criticism, or more likely, a technical decision based on the complexities of reliably scaling their demonstrated model across millions of users worldwide with no adverse side effects.

But speaking to the advanced voice model is a markedly different experience from OpenAI’s previous voice interfaces. It’s less robotic, its voices are more believable, its accents and styles more varied and more malleable, and though not quite as human as Sky(lett), it’s still convincing enough that it feels as though we’re stepping out of the uncanny valley.

Unfortunately, it’s not yet a normal way to interact with technology. Even in the confines of my own home, with only my wife and kids around me, I honestly felt like a complete lemon talking to ChatGPT. Unless I could convincingly pretend I was on a phone call, I almost certainly wouldn’t do it in public. And for the time being, at least, it doesn’t work particularly well while driving because of internet connection dropouts and probably poor microphone quality over Bluetooth.

You too can feel like a lemon whilst talking to your AI companion.

Given recent rhetoric from the likes of Sam Altman and Mustafa Suleyman, voice chat interactions are, however, the future that Silicon Valley wants us to see. We’ve also seen Google making strides with its Gemini model, and ElevenLabs producing extremely high-quality AI-generated voices.

It’s difficult to imagine voice AI being particularly useful in a crowded setting like a classroom with a few dozen kids, everybody shouting at their own individual device. But as the technology continues to improve, I can see the utility of hybrid conversations – the user wearing a headset, typing their side of the conversation and listening to the AI’s spoken replies, or maybe even combining other technologies like eye tracking and neural wristbands for interacting with the computer, with verbal responses from the AI in return.
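If you wanted to prototype that hybrid mode today, it wouldn’t take much: typed input goes to the model, and the reply comes back as audio. The sketch below uses hypothetical stubs for the model call and the text-to-speech step, purely to show the shape of the interaction.

```python
def ask_model(message: str) -> str:
    # Hypothetical stub for a chat model call.
    return f"Here's a quick answer to: {message}"

def speak_aloud(text: str) -> None:
    # Hypothetical stub for text-to-speech played through a headset.
    print(f"[headset audio] {text}")

# Type your side of the conversation; hear the AI's side in return.
while True:
    typed = input("you> ").strip()
    if typed in ("", "quit"):
        break
    speak_aloud(ask_model(typed))
```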

People will become more comfortable with AI, and it will likely normalise over time, particularly as younger generations adopt the technology, and with the development of those AI-first devices such as AR glasses.

I’m sure I’ll be surprised the first time I see somebody walking down the street clearly talking to an artificial intelligence. That said, and considering I live in regional Australia, the first person I’m likely to see doing that is actually me.

Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
