Generative artificial intelligence in the form of chatbots like ChatGPT has been hyped to the point where reasonable discussions about the technology descend into pure chaos.
There’s so much potential for systemic harm, from bias and environmental costs to copyright and privacy concerns, that it can be difficult at times to identify anything positive. But there are positives to large language models and associated technologies if we look beyond the hype, the existential threats and the generally ludicrous discussions of artificial general intelligence.
In this article I demonstrate some real-world applications of generative AI and related technologies as assistive tech. Rather than thinking about GenAI as a particular chatbot like ChatGPT or Gemini, I’ll break these platforms down into their component parts: the language model, image recognition models, text to speech and speech to text, and interrelated technologies that existed long before the ChatGPT hype, such as optical character recognition (OCR).
We are currently being sold a one-size-fits-all solution in the form of AI chatbots, but by breaking down these hype-fuelling headline grabbers, we can start to investigate some genuinely useful and exciting possibilities.
One important caveat of this post is that I am not blind, deaf, dyslexic, or physically disabled. Though I am diagnosed autistic/ADHD, I can’t speak for everyone in that community either. If there’s anything in this article that jars with your understanding of assistive technologies, or if you have lived experience you’d like to share, then by all means get in touch.

Blindness and Low Vision
There are already applications which combine image recognition and text to speech as assistive technologies for blind and low vision people. As image recognition improves, these applications become more accurate. Although image recognition is now folded into platforms like ChatGPT, it is a much longer-standing field of artificial intelligence than the recent advances in large language models. Pairing it with a large language model improves the results further, since the output from the image classifier can be turned into a natural language description and then read aloud by an AI voice. Several applications already do this.

Be My Eyes is an application for blind and low vision people which connects a real-time camera feed to a human volunteer who provides visual descriptions. Be My Eyes now also incorporates a feature built on OpenAI’s vision-capable GPT-4V and GPT-4o models, which generates conversational descriptions without waiting to connect to a volunteer. Microsoft’s Seeing AI offers multi-sentence scene explanations and is powered by a similarly capable vision language model.
Smart glasses are also beginning to incorporate these technologies. Meta Ray-Ban glasses now work with Be My Eyes, and Envision glasses stream video to GPT-4 to allow for hands-free operation.
The Near Future
Multimodal AI models are becoming smaller and more efficient. This will allow them to be loaded and run fully offline, giving private image description and offering it even in areas where the internet connection is patchy or non-existent. For blind and low vision people in rural or regional areas, or those dealing with sensitive or private matters that they would rather not send to the cloud, these technologies could be hugely advantageous.
Audio-assisted foveated augmented reality, as an extension of smart glasses and other wearables, could use efficient and accurate multimodal generative AI models to provide always-on scene narration.
Deafness and Hard of Hearing
Research teams from various institutions, including Google, have pushed sign language recognition past 90% accuracy on continuous signing, and pilot apps are working on translating BSL and ASL into subtitles on a phone screen. Sign languages such as BSL, ASL and Auslan are full languages with their own grammar and “intonation”, so these models currently struggle with accuracy and nuance and are not yet a suitable replacement for a skilled human signer. However, they are showing promise, and AI-generated signing avatars may be viable in the near future.

Speech to text captioning is already much more accurate. Whisper-class models such as OpenAI’s Whisper are among the best currently available, and they are already efficient enough to run locally on laptops or as Chrome browser extensions, giving real-time captions in meetings with no cloud connection. You have likely already seen real-time captioning in platforms like Zoom and LinkedIn, and built into Microsoft PowerPoint. All of these are powered by related machine learning and AI technologies.
The Near Future
Speech to sign avatars, built on the same kinds of diffusion models used for image and video generation, could potentially provide very effective real-time signing. They will likely take a long time to match a skilled human signer, if they ever do.
These technologies will also need to be incredibly flexible. The needs of a person born completely deaf differ from those of someone who is partially deaf, or who lost their hearing in childhood rather than adulthood, and so on, so there will need to be a range of technologies appropriate for different people.
Speech and Motor Impairments
There are many ways in which recognition and generation technologies can support people with hereditary or degenerative speech and motor impairments, as well as disabilities resulting from injury. Right now, Google’s Project Euphonia curates disordered speech corpora and fine-tunes automatic speech recognition so that people with ALS or cerebral palsy can get accurate dictation and voice control of mobility devices.

Predictive text has long been used in Augmentative and Alternative Communication (AAC) systems, which allow people with limited mobility and no speech to communicate. Increasingly, these systems use large language model or other transformer-based technologies to guess the next symbol a user intends, cutting down the number of taps and the time it takes to build a sentence.
These technologies are already incredibly important for individuals who rely on these communication technologies.
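To make the idea of next-symbol prediction concrete, here is a deliberately simplified sketch in Python. It swaps the language model for plain bigram counts (which symbol most often followed which in a user’s own past messages), so it illustrates the general technique of ranking likely next symbols rather than how any particular AAC product works. The function names and the sample message history are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count which symbol follows which in a user's past messages.

    Hypothetical helper: a toy stand-in for a language model.
    """
    follows = defaultdict(Counter)
    for sentence in sentences:
        symbols = sentence.lower().split()
        # Pair each symbol with the one that comes straight after it.
        for current, nxt in zip(symbols, symbols[1:]):
            follows[current][nxt] += 1
    return follows


def suggest(follows, previous, k=3):
    """Return up to k of the most frequent symbols seen after `previous`.

    Unknown symbols return an empty list instead of raising an error.
    """
    counts = follows.get(previous.lower(), Counter())
    return [symbol for symbol, _ in counts.most_common(k)]


# Hypothetical message history for one user.
history = [
    "I want a drink",
    "I want to go outside",
    "I want a snack",
    "I need help",
]
model = train_bigrams(history)
print(suggest(model, "want"))  # ['a', 'to']
print(suggest(model, "I"))     # ['want', 'need']
```

In an AAC interface the top suggestions would typically appear as large buttons, so after selecting “want” the user can reach “a” with one tap rather than navigating to it. A transformer-based model does the same ranking job but conditions on the whole sentence so far, not just the previous symbol.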
The Near Future
Personal voice cloning is often seen predominantly as a threat to security and identity, but models that can clone a voice from a few seconds of audio could allow people with degenerative conditions to keep speaking in the voice they had before their illness.
Multimodal speech to speech models are increasingly effective at recognising complex accents and dialects, and could also be used to transcribe dysarthric speech, or speech which is otherwise hard to understand, for example due to late-stage Parkinson’s.
Dyslexia, ADHD and Executive Function Challenges
Large language model based applications like ChatGPT are already being touted as “solutions” for learners with dyslexia and ADHD, but I believe that a more focused commitment from communities with lived experience will result in fine-tuned and much more nuanced technologies which actually help. For example, technologies which pair optical character recognition (OCR) with conversational text to speech tutors, reading text aloud in real time and transcribing verbal replies, would be incredibly useful.
There are many applications on the market at the moment, such as Notion AI and Speechify, which are marketed explicitly at ADHD individuals. I’ve never tried any of these products, so I can’t comment on their efficacy, but I do use artificial intelligence myself as an organisational tool to manage my writing process and, more generally, to help get things done.
The Near Future
Adaptive multimodal reading platforms that switch between audio, text and graphics have potential as long as, again, they are developed by people with lived experience and not just used as a marketing tool.
Autism and Social Communication Support
As with dyslexia and ADHD, we’re already seeing an influx of platforms and technologies which claim to directly support autistic individuals. One example is AI social story generators, which automatically create personalised text or images modelling routines, adjustable for language complexity and imagery. Social stories are a well-evidenced way to support autistic individuals, both children and adults.
I personally find “social stories” quite useful when travelling somewhere new, and I can spend a long time researching, planning and preparing for a trip. So if these technologies can help, then I’m all for it.
The Near Future
Some of the technologies mentioned earlier that can support non-verbal autistics, including eye tracking and AAC software, will be further improved by these advances.
A word of caution here. I have already seen applications which claim to support autistic individuals through methods like translating their experience into “neurotypical” behaviours and languages. I do not think that technologies like this should be the near future of generative AI for autistic individuals.
Physical Disability and Mobility
Technologies related to large language models, such as transformers and other neural network architectures which infer spatial information rather than just text, are being used to improve robotics. Combined with eye tracking, motion sensors and some of the speech technologies discussed earlier, they are improving AI-guided prosthetic limbs that use computer vision.
Machine learning systems similar to those used in very large language models allow these devices to predict the intended grip from muscle signals, and research prototypes are adding adaptive sensory feedback. This tactile feedback feels natural to the wearer and allows for even finer-grained control.
The Near Future
Powered limbs and “exoskeleton” style systems can use artificial intelligence models trained on millions of motion capture frames and other data points to allow smooth, energy-efficient walking for people with spinal injuries. In the future, we are also likely to see robotics combined with language model based technologies, allowing you to give verbal instructions to something like a GPT-enabled service robot.
These needn’t be prohibitively expensive, like Tesla’s advertised Optimus robots (which, we must remember, don’t actually exist yet). Open source organisations like Hugging Face are already releasing low-cost, open source, build-your-own robotics kits, which will continue to improve.

Conclusion
Multimodal generative AI has been subsumed by hype around chatbots and artificial general intelligence. Companies like OpenAI and Google are doing far more harm than good, and while the media stokes these fires, venture capital fills their coffers with billions of dollars.
But there are researchers, sometimes even within these companies, doing great work and making genuinely helpful technology. If we can steer ourselves away from the hype, existential threats, and “AGI”, and look at the component parts of AI, machine learning and transformer based technologies which already work and can be improved further, then we can build amazing things.
If you have seen any AI, including text generation, image recognition, image generation, text to speech and speech to text, used to create a genuinely useful assistive technology, I would really like to hear about it. Please leave a comment on this post, use the contact form at the bottom, or drop a comment on whichever social media platform you found this article on.
It’s time we stopped paying attention to ChatGPT and started to focus on technologies which genuinely make our lives better.
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
