Artificial Intelligence and Film Studies: Midjourney versus CAMELS

CAMELS?

As an English teacher, I’ve done a lot of analytical film studies over the years. The biggest problem is always students’ grasp of technical language: film analysis tends to drift off into character and plot description, even more so than for written texts.

So, when I teach film, I lean more into my Media Studies background. In the UK I completed my undergrad in English and American Literatures. When I trained to teach, it was in English, Media, and Drama. Then, when I first started teaching in Australia, I picked up a VCE Media Study class. And that’s where I first heard about CAMELS.

CAMELS is a mnemonic for film production elements: Camera, Acting, Mise en Scène, Editing, Lighting, and Sound. It has been an incredibly useful tool in the English classroom as a framework for writing about technique. In this post, I explore how a combination of ChatGPT and Midjourney can be used to unleash the CAMELS.

ChatGPT and CAMELS

Before generating any images, I wanted to build a reference list of vocabulary. Searching online it’s not hard to find separate articles about camera work, acting, and so on. There are not a lot of resources out there that combine the six elements, however. Beyond a couple of old VCE Media resources, it seemed like this would be a DIY job.

Luckily, this is the kind of task ChatGPT can be very useful for. I began with a prompt to establish what CAMELS stands for:

I then used the following prompts to create and refine a vocabulary table. I’m not going to print every step of the way as it involved a bit of back and forth to get the final table:

Create a table. Left column, the CAMELS words and a brief description. Right column, vocabulary for each. Include LOTS of vocabulary and be technical. For example in camera include obvious things like closeup and long shot, but also more technical terms for camera movement, shots, and direction. Mise en scene includes visual composition, framing, and blocking so use terms related to all that. In camera you might include different film stock, etc. in lighting you’ll have high key and low key but also specific words to convey tone and atmosphere, and technical language.
Prompt 1

Camera is ok but doesn’t include enough film stock, digital styles, stylistic camera stuff. The others all need more. Go again
Prompt 2

To your original table add an extra column which adds examples to some of the vocabulary. For example, to ISO you might add 400. To colour palette you might add “limited” or “warm” or “neon”. While you’re at it, remove all sound references from the editing vocabulary and replace with something more visual
Prompt 3

Here’s the finished table:

Element	Description	Vocabulary	Examples
Camera	The way in which the camera is used to capture the action and convey meaning through framing, angles, movement, and other techniques.	Close-up, long shot, medium shot, wide shot, extreme long shot, dolly shot, pan, tilt, zoom, tracking shot, hand-held camera, rack focus, shallow depth of field, deep depth of field, aspect ratio, film stock (e.g. 16mm, 35mm, 70mm), digital formats (e.g. 4K, 8K), exposure, shutter speed, aperture, ISO, anamorphic lens, fish-eye lens, tilt-shift lens, Steadicam, gimbal, drone	ISO 400, 35mm film stock, anamorphic lens for widescreen look, drone shot of landscape
Acting	The quality of the performances by the actors and their ability to convey emotion and believability.	Characterization, motivation, subtext, emoting, blocking, body language, vocal delivery, facial expressions, method acting, Meisner technique, Stanislavski method, improvisation, on-camera performance, green screen performance	Subtext in a line of dialogue, physicality of a character’s blocking, facial expressions to convey emotion, improvisation to find the truth of a scene
Mise-en-scene & Visual Composition	The overall visual design of the scene, including the placement of actors and props, set design, and use of color and visual elements to create a specific mood or atmosphere.	Framing, blocking, depth of field, rule of thirds, composition, contrast, color palette, production design, set decoration, costume design, makeup, props, location scouting, set construction, art direction, special effects, matte painting, miniatures, motion capture, virtual production	Warm color palette for a cozy scene, contrasting colors to highlight a subject, rule of thirds to create balance, miniatures for a fantastical setting, virtual production for a high-tech environment
Editing	The way in which shots are assembled to create a cohesive narrative or visual style, including choices about pacing, transitions, and use of special effects.	Montage, cut, fade, dissolve, wipe, jump cut, match cut, cross-cutting, continuity editing, parallel editing, special effects, visual effects, compositing, color grading, split screen, freeze frame, slow motion, time-lapse, hyperlapse, jump scare	Montage to show passage of time, jump cut for a jarring effect, continuity editing to create seamless transitions, slow motion to emphasize an action, split screen to show multiple perspectives
Lighting	The use of lighting to create mood, convey meaning, and highlight certain elements within a scene.	Key light, fill light, backlight, high-key lighting, low-key lighting, three-point lighting, natural light, artificial light, gels, barn doors, lighting ratios, chiaroscuro, color temperature, practical lights, motivated lighting, mood lighting, shadow play, bounce light, hard light, soft light, directional light, atmospheric lighting, lens flare, gobo, cookie	High-key lighting for a cheerful scene, low-key lighting for a sinister scene, motivated lighting from a practical source, shadow play to create mystery, atmospheric lighting for a dreamlike quality

Bringing in Midjourney

At this point, I felt I had enough information to start generating some images.

One problem I’ve found with teaching film technique is that you have to spend a lot of time preparing resources; particularly identifying film stills which demonstrate what you’re trying to show. This might involve finding generic photos of camera angles, or trawling through a film you’re studying and taking screenshots of the relevant frames.

I thought it would be interesting to see what’s possible with the AI image generation app Midjourney. If you’re not familiar with Discord, Midjourney can seem a little intimidating at first. There are also ethical concerns about copyright and bias to take into account, as with many AI technologies. But there’s no denying that Midjourney is impressive for its realism and quality, and it’s easier to drive than one of the main alternatives, Stable Diffusion.

Rather than writing lots of image prompts to experiment with variations on the CAMELS, I decided to try ChatGPT with the following prompt:

Here is the general format for a mid journey image generator prompt for a film still: /imagine prompt: film still, [scene description], [style description] --ar 16:9 --q 2

In scene description we place a concise description of the scene including camera, mise en scene, visual composition, acting, and setting, time of day, etc. In style description we may include elements of mise en scene and stylistic elements of camera eg stock, and also lighting, editing such as grading, and any reference to specific visual style.

Using that information generate 6 scenes from the same movie about a speculative fiction dystopia featuring a lone character. The scenes must be varied in composition but stylistically and aesthetically the same. You will need to consider how to keep the character and style the same from scene to scene while varying the composition and settings
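
If you would rather fill in that template by hand than ask ChatGPT each time, the same format can be assembled with a few lines of Python. This is only a rough sketch: the function name build_prompt and the example scene and style text are hypothetical placeholders, and the only Midjourney parameters it uses are the aspect ratio and quality settings from the template above. Midjourney parameters are typed with two hyphens (--ar, --q).

```python
def build_prompt(scene: str, style: str, aspect_ratio: str = "16:9", quality: int = 2) -> str:
    """Fill in the film-still template: scene description, then style description."""
    return (
        f"/imagine prompt: film still, {scene}, {style} "
        f"--ar {aspect_ratio} --q {quality}"
    )


# Hypothetical example: the scene carries camera, composition and setting,
# the style carries film stock, lighting and grading.
print(build_prompt(
    scene="long shot, lone figure crossing an empty freeway at dusk",
    style="35mm film stock, low-key lighting, desaturated colour grading",
))
```

Keeping scene and style as separate arguments mirrors the split in the template, which makes it easier to change one without touching the other.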

ChatGPT produced image prompts in the correct format, but ignored some of the previous vocabulary from the table it generated. After a few more back and forth prompts – including explicit instructions to diversify the gender of the protagonist in its scenes – here’s what it came up with:

Here are the outputs from Midjourney with those exact prompts. Prompt number 3 wouldn’t generate any film stills: one of the words in the prompt – maybe ‘dystopian’ – kept generating anime style output.

Critiquing the output

With Midjourney outputs, I find you have to take a minute to let the awe wash over and start to look at the images more critically. This in itself could be an excellent lesson activity.

While the gist of the prompts is correct in the output, there are a few flaws. I’ve noticed that some of these occur consistently, for example:

Midjourney struggles with “high angle” and “low angle” shots. Its default position is an eye-level mid-shot.
The phrase “depth of field” is very strong, but “shallow depth of field” and “deep depth of field” often result in the same (shallow) output.
ChatGPT output the terms “male”, “female”, and “non-binary”. Midjourney’s handling of these terms is often stereotypical. Non-binary ranges from a vaguely androgynous figure to variations on a masculinised female or a feminised male. That being said, there is no “right or wrong” way to show a male, female, or non-binary person so to avoid stereotypes you would have to use much less generic prompts.
As well as gender, race is an issue for Midjourney, which often defaults to white characters unless specified in the prompt.
Instructions for lighting are followed, but instructions for sense of movement (panning, tracking etc.) are largely ignored.

Consistency

Moving on from those initial experiments, I started to think about what Midjourney would be useful for in terms of film technique. It can be hard to get a consistent aesthetic without careful prompting, so I decided that this would also be a useful teaching tool.

Unless you use the same seed for images, Midjourney’s outputs are random. This means that if you run the prompt “dystopian scene” three times, you’ll get reasonably different output each time.

To control the consistency, we can use some of the film language from earlier.

Some prompt elements are definitely “stronger” in Midjourney (like the default camera positions seen earlier). This is probably because of the labelling of the images in its dataset: more common labels will have a stronger weight. Being aware of this can help give a little control over the prompts, by calling on common or obvious styles. For example, you can use prompts for genre (e.g., science fiction, romance, fantasy), a time period (e.g., 80s, 90s, 00s), or a descriptive word like “futuristic”. Using the same words over a series of prompts gives a more consistent series of outputs.
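
One way to picture this is to keep the style words in a single place and reuse them for every scene. The sketch below is an illustration under assumptions rather than a recipe: the scene descriptions, the SHARED_STYLE wording, and the function name are all hypothetical. The optional --seed parameter is Midjourney’s way of fixing the starting point for an image, and even with the same seed, different prompts will still give different pictures.

```python
from typing import List, Optional

# Hypothetical style words, reused unchanged across every prompt in the series.
SHARED_STYLE = "science fiction, 80s, futuristic, 35mm film stock, low-key lighting"


def consistent_prompts(scenes: List[str], style: str = SHARED_STYLE,
                       seed: Optional[int] = None) -> List[str]:
    """Build one prompt per scene, varying only the scene description."""
    prompts = []
    for scene in scenes:
        prompt = f"/imagine prompt: film still, {scene}, {style} --ar 16:9"
        if seed is not None:
            prompt += f" --seed {seed}"  # same seed on every prompt in the series
        prompts.append(prompt)
    return prompts


# Hypothetical scenes: the composition changes, the aesthetic stays put.
for p in consistent_prompts(
    ["close-up of a cyborg in the rain",
     "wide shot of a cyborg on a rooftop",
     "medium shot of a cyborg in a neon-lit alley"],
    seed=1234,
):
    print(p)
```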

Here are a few examples of prompts generated using those instructions via ChatGPT:

I added “--no hands --no fingers” because the version of Midjourney I was using here (v4) was notoriously bad at fingers. That has mostly been solved in the current v5.

Note how the cyborg changes each time, but the aesthetic of the scenes and the composition is consistent throughout.

Creating a scene

The last two experiments consisted of creating an entire storyboard for a scene using ChatGPT and Midjourney.

For the first example, I created the basic prompt and got ChatGPT to add technical details:

Add technical detail to this from the camera and mise en scene vocabulary lists: /imagine prompt: film still, opening sequence, Victorian manor seen through a wrought iron gate, highly detailed, cinematic --ar 16:9
--ar 16:9 produces a widescreen aspect ratio

I then varied the content of the prompt myself to create the scenes, imagining a sequence which moves from the outside of the building, zooms in through the gate, in through a window, and then through the interior of the building to the final shot. I kept the ChatGPT generated style vocabulary the same each time, and described the scene.

For the final run, I got ChatGPT to generate the details of an establishing sequence using vocabulary from the mise en scene part of the table. I didn’t specify anything about what the movie should contain. Here is the ChatGPT output:

And the results using these as the Midjourney prompts:

Learning through prompts

Based on these experiments, here are a few ways I’d recommend using ChatGPT and image generation in the classroom. It may be difficult to access Midjourney (Discord is often banned in schools), but students will likely be able to access Bing Chat’s image generation, and much of this will change in the future.

Use ChatGPT to generate resources, like the original table for film vocabulary. You could also generate more specific tables like a resource focused only on camera which includes more detail on angles, movement, film stock, lens types, and so on.
Generate images of film stills in Midjourney or another image generation app and have students analyse them, identifying the types of shot, the mise en scene, etc.
Use ChatGPT to generate storyboards, and have the students write image generation prompts. Assess the level of detail of technical language in the prompts.
Recreate scenes from an existing film – such as the text you are studying – by creating detailed image generation prompts which highlight specific details and techniques such as style, colour, camera angle, and framing.
Take the ideas above about creating a table of vocabulary, and use it to generate scripts with technical directions. Turn the scripts into a visual storyboard with image generation.
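
For the idea of assessing the technical language in students’ prompts, a first pass could be as simple as checking which vocabulary terms appear. The sketch below is hypothetical and deliberately crude: it uses a small sample of terms from the vocabulary table, the function name terms_used and the student prompt are made up, and it relies on plain substring matching, so “long shot” would also match inside “extreme long shot”. It is a starting point for a conversation with the student, not a marking scheme.

```python
# A small sample of terms taken from the CAMELS vocabulary table.
CAMELS_VOCAB = {
    "Camera": ["close-up", "long shot", "tracking shot", "rack focus",
               "shallow depth of field", "35mm"],
    "Acting": ["body language", "facial expressions"],
    "Mise en scene": ["rule of thirds", "color palette", "framing", "costume design"],
    "Editing": ["color grading", "slow motion", "split screen"],
    "Lighting": ["low-key lighting", "high-key lighting", "chiaroscuro", "backlight"],
}


def terms_used(prompt: str, vocab: dict = CAMELS_VOCAB) -> dict:
    """Return the vocabulary terms found in a prompt, grouped by CAMELS element."""
    text = prompt.lower()
    found = {}
    for element, terms in vocab.items():
        hits = [term for term in terms if term in text]  # simple substring match
        if hits:
            found[element] = hits
    return found


# Hypothetical student prompt.
student_prompt = ("film still, close-up of a detective, low-key lighting, "
                  "chiaroscuro, rule of thirds --ar 16:9")
for element, hits in terms_used(student_prompt).items():
    print(f"{element}: {', '.join(hits)}")
```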

Although I’ve been using ChatGPT since its launch, I’ve only really just started exploring image generation. Up until now I’ve been mainly using it to generate images to accompany blog posts, like the images created for this post on AI ethics.

If you’re interested in getting to grips with Midjourney, I’d strongly recommend following these three people:

Linus Ekenstam

Brian Sykes

Lori Mazor

Got a question or comment about this post, or anything in general? Get in touch

Leon Furze