I’ve been re-reading Kress and Van Leeuwen’s 2001 book Multimodal Discourse for the nth time, and, despite showing its age in some areas, it still offers some great language for discussing Generative AI. Kress and Van Leeuwen presented a new theory of multimodal semiotics, designed to update monomodal theories of communication with new technologies in mind.
The book makes a compelling case for the interaction of modes, media, and thought processes in the composition and production of texts, and every time I read it I find new connections relevant to the discussion of multimodal Generative AI. I have written several posts in the past exploring how the term “multimodal” has been appropriated in a very shallow sense by technology companies.
In What is Multimodal Generative Artificial Intelligence? I suggested that we need to look beyond the obvious definition of multimodal technologies as those which can communicate in more than one mode. We need to consider the implications of technologies which can appear to understand language, interpret speech, produce images, “read” images, and so on.
Then, in Making meaning with multimodal GenAI I proposed an “x to y” framework to help understand the many combinations of modes possible in even a very limited sense, for example text-to-text, text-to-image, image-to-speech, and so on.

In this article, I’ll be revisiting four key concepts which form the “strata” of Kress and Van Leeuwen’s communication theory: discourse, design, production, and distribution.
Multimodality
Writing in 2001, Kress and Van Leeuwen begin with the claim that in Western culture we have historically placed a lot of value on monomodal forms of communication, celebrating literary novels, official documents, and academic publications as the “highest” forms of writing. In the visual arts, single modes in a particular medium – for example, oil on canvas – have similarly dominated. But, the authors argue, this started to shift with the rise of multimodal and in particular digital texts.
They point out that the shift towards multimodality has been, in some traditional media, a source of specialisation and hierarchy: despite the multimodal nature of a print newspaper (text, images, colour, use of space, etc.), individual components of the text were assigned to different individuals. This might still be the case with some forms of media, but the rise of digital texts has shifted the responsibilities for much of that production to individuals who have access to increasingly easy-to-use multimodal authoring tools.
That’s easy to see, and extends well beyond the early 2000s. As I sit and write this article (actually sitting in a cafe typing the draft rather than dictating it, which is more common), I’m using the WordPress platform – a no-code, browser-based application which combines typical word processing features (style, font, structure) with web editing tools (HTML, blocks, images, etc.).
The move from “traditional” multimodal texts to digital, in many cases, looks something like this:

My interest lies at the intersection of digital texts and even more recent technologies enabled by improvements in Generative AI. That includes text generation, image generation, audio generation, video generation, and code.
In Multimodal Discourse, Kress and Van Leeuwen argue that the designers and creators of texts have access to multimodal resources which can be used to make meaning in any mode, and which are relevant within a culture or society. They use the term strata, adopted from Hallidayan functional linguistics, to outline the domains in which meaning is made.
My interest, then, is whether Generative Artificial Intelligence represents another step along the path that Kress and Van Leeuwen illuminated in the early 2000s with their discussions of digital text. Does Generative AI “flatten” the strata further, bringing even more of the discourse, design, production and distribution of text to the level of the individual?

Discourse
Discourses are “socially constructed knowledges of (some aspect of) reality” (p. 4). In broad terms, a discourse is any form of communication or interaction that shapes and constructs our understanding of a particular topic or concept. It involves not only language and conversation, but also visual representations, practices, and power structures that influence how we perceive and interpret the world around us. Discourses can be found in various contexts, such as politics, media, academia, and everyday life, and they play a significant role in shaping our beliefs, values, and societal norms.
Discourses invite and exclude particular viewpoints: they are shaped by society and culture, and by the individual’s views and values. When I write about GenAI and education, my articles like Education doesn’t need more sparkle ✨ and How Artificial Intelligence Can Catch Up With Pedagogy are clearly influenced by my own experiences, the articles and books I read, and the people I talk to. But they are also more broadly influenced by the “broken edtech” discourse of technology in education, and by discourses of performativity, neoliberalism in education, and technology in society beyond education.
Kress and Van Leeuwen use the example of Russian film-maker Eisenstein, and his planned adaptation of Marx’s Capital, to demonstrate that sometimes it is necessary to develop new methods in order to extend the “semiotic reach” of a medium. In the 1920s, no cinematic device or method existed which could realise the discourses of Marx’s text. Eisenstein’s “dialectical montage” as a proposed solution has influenced decades of film-makers since.
What new methods do we need to develop in order to extend the semiotic reach of digital texts? How can we talk about generative AI, but also with and through the technology, and in ways which offer new means to realise the discourses of technology and text?
Design
If discourse is the broad, constructed knowledge of an issue, then design is the means by which we begin to realise discourse. Designs also add to discourse: the choices we make as we design texts determine the conventions and norms we will follow, or perhaps subvert.
Design is intertwined with but also separate from the production of text. In Kress and Van Leeuwen’s example, the architect may design a building, but does not build it. Discourse provides the architect the conceptual tools with which to design the building: how people live inside houses, the function of houses as part of broader infrastructure, and so on. The design of the building might happen in the architect’s mind, or on paper, or through computer software, or a combination of methods.
Design also begins the process of turning social knowledge (discourse) into action, and in so doing further shapes the discourse. But the resources available in design are still abstract, and “capable of being realised in different materialities”. When I design an article such as this, for example, I might use a voice memo, written notes, sketches, or only “mentally” design the text, going right from idea to finished product and seemingly bypassing design altogether.

Production
Production is the material realisation of the discourse/design. It’s the blog you’re reading now, or possibly the print version of this article in a book I haven’t written yet. Production involves the technical skills of composition, whether that is writing, painting, digital art, or the use of Generative AI. It is also the choice of media: print, digital, oil on canvas. Like the design of a text, all of these features are important to the meaning of the product.
It can be difficult to separate the semiotic mode from the medium or the design from the production. The example I gave above works here: when I write an article “off the top of my head” I am simultaneously designing and producing. The modes of communication (written and visual language) and the medium (blog post) unfold in real time. When I plan an article on paper, or verbally draft it first, the two layers of the strata are more distinct.
There is a “shadow side” to the separation of design and production, however, which Kress and Van Leeuwen comment on several times throughout Multimodal Discourse. It’s an observation specifically related to education, and important enough to quote in full:
Teachers, for instance, may either design their own lessons or merely ‘execute’ a detailed syllabus designed by expert educators. In other words, when design and production separate, design becomes a means for controlling the actions of others, the potential for a unity between discourse, design, and production diminishes, and there is no longer room for the ‘producers’ to make the design ‘their own’, to add their own accent. (p. 7)
Think about that quote in the context of current narratives in education, such as the recommendations from think tanks like Grattan Institute that teachers want (or “need”) shared lesson plans – developed, of course, by companies like Ochre Education, associated with Grattan itself.
And then extend that line of thought towards Generative AI, already touted by technology and edtech companies as the answer to everything from lesson planning to behaviour management and counselling services. In fact, at a panel last year, Grattan’s Nick Parkinson stated that “generative AI clearly has a role to play here” during a discussion of teacher workload and the time teachers spend creating resources: everyone is interested in this technology, and how it might impact teacher workload.
But is GenAI the answer to teacher workload? Or is it simply another tool which, just like the curriculum of the late nineties and early 2000s discussed in Multimodal Discourse, will ultimately put distance between design and production and strip teachers of their agency?
Distribution
The final stratum of Kress and Van Leeuwen’s model is distribution, where the product is re-coded for dissemination, either in the physical sense (e.g., recording onto a vinyl record) or in the sense of transmission, such as via television or radio. Like other aspects of multimodal text, this stage may involve other specialists or individuals, such as the music technicians required to turn an artist’s recording into a finished piece. Or, the individual may control the distribution, for example in the way I can hit ‘Publish’ on my own articles.

Distribution also includes the redistribution of text. What changes when I distribute an article directly from my blog (to subscribed readers who receive a notification) versus to my mailing list (with a weekly digest) or via social media (with my own “LinkedIn-style” commentary)? How is the meaning changed when the material is distributed by others, for example a reshare on social media, or a discussion in someone else’s podcast?
Reinterpretation, redesign, and renewal
One area which Kress and Van Leeuwen touch on, but do not spend a great deal of time discussing, is the iterative process of multimodal composition. I think it’s worth exploring in more detail, and particularly with Generative AI in mind.
While GenAI is most often discussed as a means for creating novel data (generating text, or images, or audio etc.), I find it most useful as a tool for redesigning and reproducing other content. I use the terms design and production here in the same sense as Kress and Van Leeuwen, but as part of an iterative process rather than a single event. Thinking about the technology in this way also opens up new affordances, and new limitations.
Here is a typical discourse-design-production-distribution process for one of my articles (I’ll use the recent How Artificial Intelligence Can Catch Up With Pedagogy as the example).
Discourse
I have been posting articles on this website for over two years about the intersection of technology, education, and writing. Many of these articles are framed by the discourse of “education technology” or edtech. The “edtech discourse” is broad, so to refine that I’ll add that my experience of 15 years in secondary education included time as an e-learning coordinator, a Director of Learning and Teaching responsible for staff training on digital technologies, a teacher during COVID and remote learning, and (briefly) a STEM teacher. I have been in both the “technology is great” and the “technology is terrible” camp at various times during that career.
Now, my use of the edtech discourse serves the interests of my PhD, and the critique of technology in education tends to dominate over the positives. However, as a technology (and GenAI) user and someone who works with educators to understand the tools, I talk often about the potential. That framing of the edtech discourse – past experience, present studies, work and personal use – influences the company I keep online, and in particular on LinkedIn, which is the only social media platform I spend any significant time on. Other influences – colleagues, other researchers, the teachers I work with, the conferences I attend – all contribute to how I frame and use the edtech discourse.
Design
In the example of the article How Artificial Intelligence Can Catch Up With Pedagogy, I was initially prompted to write after seeing an article and LinkedIn post with the headline How Pedagogy Can Catch Up With Artificial Intelligence. Framed by the critical aspects of the edtech discourse, my response was frustration and incredulity, and a vague weariness of being told things like “technology will revolutionise education”.
When it comes time to design a text, I have several directions to choose from. The “available resources” or narratives from the edtech discourse include the positive, the negative, the ethical, and the practical. I can draw on various semiotic modes, and combinations of modes and media, to realise the discourse: commentaries, arguments, narratives, academic research, digital texts, images, and so on. I can choose to leave a comment on the article, to write my own article in response, to do nothing, to speak about it in a presentation…
In this instance, I decided to write an article. I have written before about my preference to use voice memos and speech-to-text to draft articles, and that’s how I approached this one (in fact, looking back at the transcript, I drafted two related articles, this one and the follow-up Education doesn’t need more sparkle ✨).

Production
For the production, the written transcript from the verbal draft goes into another AI model, Claude 3 Opus. I use the same prompt every time to try to get the best outcome, and it is fairly consistent.
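As a rough sketch of what that step could look like in code (the model name is the public Claude 3 Opus identifier, but the prompt text and helper names here are placeholders, not my actual prompt), the transcript-to-draft call via the Anthropic Python SDK might be structured like this:

```python
import os

# Placeholder editing prompt -- the real prompt text is not reproduced here.
EDITING_PROMPT = "Turn the following spoken-word transcript into a clean written draft:"

def build_request(transcript: str) -> dict:
    """Build the same message payload for every article, so the outcome stays consistent."""
    return {
        "model": "claude-3-opus-20240229",
        "max_tokens": 4096,
        "messages": [
            {"role": "user", "content": f"{EDITING_PROMPT}\n\n{transcript}"}
        ],
    }

if __name__ == "__main__":
    # Requires the `anthropic` package and an ANTHROPIC_API_KEY in the environment.
    import anthropic

    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = client.messages.create(**build_request("...transcript text..."))
    print(response.content[0].text)
```

Keeping the prompt fixed in one place is what makes the output “fairly consistent” from article to article: only the transcript changes between runs.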

From this, it’s a straight copy/paste into the WordPress editor, which has a no-code “blocks” based interface. A reasonable amount of editing is still required, because of technical issues (transcription errors), human issues (brain errors), and reformatting, adding images, and so on.
I used Adobe Firefly to generate the images for the post, and opened up previous posts to copy/paste across common items such as mailing list signups, online course promotions, and the contact form.
Distribution
I schedule posts ahead of time, and include in the scheduling (via an app called Buffer) a simultaneous post on LinkedIn. This means that the people subscribed on WordPress get an email if they’ve opted in, the followers on LinkedIn see the post (maybe… depending on the whims of the algorithm), and anyone who happens to chance across the https://leonfurze.com/blog page on my site at the right time sees it simultaneously.
As an aside, LinkedIn recently changed their platform so that it no longer displays a large image and headline when you share an external link. Instead, it gives a small thumbnail. This is obviously part of the ongoing ploy to keep users on-platform. That’s why these posts are accompanied by my own image with the main article header and title, made in Canva.
Finally, anyone on my mailing list gets a weekly email with the most recent posts and a brief summary.

All of this, from design to distribution, is a solo venture. But of course there are other influences along the way: the people involved in the edtech discourse, the GenAI tools (which exert their own influence), and so on. And while Kress and Van Leeuwen’s model ends here, with the four strata contributing to the final, multimodal blog post, my own processes continue.
Iterating with AI
At this point, Generative AI becomes a much more useful tool. I now have the verbal draft, the text transcript, the finished blog post, and the associated social media and emails. I could leave it there and move on to the next article in the pipeline, but often I now turn to Generative AI tools to repurpose the original content.
This not only keeps the content alive, but also contributes back into the discourse in, I think, a much more meaningful way. Here are a few examples:
- The original transcript or the blog post might be fed back through Generative AI to generate new ideas or ‘spin-offs’
- The blog post and its associated images and structure might be used to produce an ebook (like the GenAI Strategy for Faculty Leaders posts and this subsequent ebook). This also involves an extra person (shout out to @just_geek on Fiverr)
- The post and images might be used as part of a presentation for a PD session, webinar, or keynote. This was the case with the original Practical Strategies for ChatGPT in education blog post, which became the materials for the Practical Strategies for ChatGPT in Education: Live Webinar and online PD, and then a chapter in the Practical AI Strategies book, and finally a module in the online course. That one extended even further: I used Claude 3 to analyse the PDF, powerpoint, and original article and outline a free 4-week email course on prompting (that single blog post has done a lot of mileage).
- It might form the basis of more academic work, for example the original AI Assessment Scale, which was adopted by Dr Mike Perkins, Dr Jasper Roe and Assoc. Prof Jason MacVaugh and has similarly spawned multiple offshoots including the Assessment Scale Version 2 Blog Post; the AI Assessment Scale Pilot Project; a free AI Assessment Scale eBook; the peer-reviewed article in JUTLP Volume 21 Number 6: AI Assessment Scale Pilot; and a post about the AIAS in use across the world.
Even the materials surrounding the articles are great for repurposing. LinkedIn comments, for example, provide a great way to gauge the reaction to a post but also provide material for future articles. I keep track of posts with lots of engagement, and use the language model Llama 3 to synthesise the fors and againsts from the comment threads of posts like this one related to AI grading.
That post had 200 comments – more than I could respond to individually or synthesise quickly. So I use a script running on my device (so that I don’t need to worry about deidentifying the information) to do that work. Those synthesised comments might form the basis of a follow-up article like this one.
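A minimal sketch of the deidentification step in that on-device script (the function names and regex patterns here are illustrative assumptions, not my actual code, and real deidentification needs more than regexes): scrub handles, links, and email addresses before the comments ever reach a model.

```python
import re

def deidentify(comment: str) -> str:
    """Scrub obvious identifiers from a comment before local processing."""
    # Order matters: remove emails before @-mentions, or the mention
    # pattern would consume the domain half of an email address first.
    comment = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[email]", comment)
    comment = re.sub(r"https?://\S+", "[link]", comment)   # profile links, etc.
    comment = re.sub(r"@[\w.\-]+", "[user]", comment)      # @-mentions
    return comment

def batch_for_synthesis(comments: list[str]) -> str:
    """Join scrubbed comments into a single block for a local model to synthesise."""
    return "\n".join(f"- {deidentify(c)}" for c in comments)
```

Because the scrubbing happens locally, the only text that reaches the model is the anonymised batch, which can then be prompted for a fors-and-againsts summary.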
Where next for Multimodal Discourse?
In Kress and Van Leeuwen’s terms, the available resources are potentially expanded by generative AI. I’ve written before that Generative AI doesn’t “democratize creativity”, but I did acknowledge that it lowers technical barriers to some multimodal, digital tools which have previously been difficult to access. The capacity to generate text, images, audio, and video adds new technical possibilities, but I’m more interested in the ways the technology allows for existing materials to be repurposed across different modes and media.
The fluency with which GenAI can shift meaning across modes (or “transduction”, in Kress’s language) opens up new possibilities for authors and creators. It’s less about creating with AI, and more about creating through the technology.
It’s not quite as simple as I made out earlier with my “Blog + GenAI” image. While Generative AI might “flatten the strata” of discourse-design-production-distribution, it also adds complexities I haven’t even touched on in this article, such as the influence the dataset or design of the model has on the output. I’ll save those issues for another day.

Share this article on LinkedIn with your own thoughts, and make sure to tag me @Leon Furze:
The Practical AI Strategies online course is available now! Over 4 hours of content split into 10-20 minute lessons, covering 6 key areas of Generative AI. You’ll learn how GenAI works, how to prompt text, image, and other models, and the ethical implications of this complex technology. You will also learn how to adapt education and assessment practices to deal with GenAI. This course has been designed for K-12 and Higher Education, and is available now.
Get in touch to discuss how Generative AI can be brought into your school or university in ways which respect educator autonomy, and foreground the ethical concerns of technology. Sparkle not included.
