AI Automation: Using Claude to Help Write Python Automations

One of my favourite ways to use GenAI is as a cheap and cheerful automations coder. I’ve written before about how I used ChatGPT to scrape my own blog posts, helping to create a huge bank of resources to use as draft material for Practical AI Strategies.

Every time I find myself doing something boring or repetitive, I ask whether there’s part of it that can be automated. I know a little Python, HTML, and CSS, but not enough to build anything useful from scratch. But I have found that, as long as you know the right questions to ask, it is possible to write functioning code with GenAI.

In this post, I’m going to detail the steps used to create this resource, a Google Sheets .csv file containing a list of all my blog posts, their URLs, publication dates, and feature images:

Click to access the entire Google Sheets file with links to over 100 posts

Using Claude for Automations

Although it doesn’t have internet access, I’ve found Claude to be fantastic for writing automation scripts. Its handling of code like Python seems superior (to my untrained eye) to ChatGPT’s, and its large context window means you can upload much more contextual data, which came in handy in this experiment.

To begin, it’s always best to identify exactly what you want to automate. In my case, it was compiling a list of my blog posts in the “AI” category on my site. Specifically, I wanted the title, URL, date, tags, and feature image. I began with the following prompt:

I want you to write a python script that does the following:

  1. Get URLs for every post in the AI category of my website (https://leonfurze.com/category/ai/)
  2. Visit each page and extract the following data: Title, URL, date (which can actually be inferred from the URL as well), tags, and the URL of the featured image.
  3. Compile all of that data into a new .csv file

I anticipated that the first run of the code wouldn’t work, because this is a vague prompt. But I find it useful to go broad and then zero in rather than trying to get everything right on the first attempt.

I took the code and created a new blank Python file in Terminal (on my MacBook). This is probably the most technical part of this entire process. I’m not going to go into it because you can figure it out with some Googling, but in a nutshell I just open a new terminal window, navigate to the folder I’m working in (just the Desktop in this case, because I was being lazy), and create a new .py file with the built-in pico editor. Pico is by no means the best application for a beginner writing code, but I’m literally just pasting in the code, saving the file, and running it.

The resulting code produced a CSV file with the right column headings, but no data. Because the code was very sparse, I also couldn’t see where it was going wrong, so I requested the following:

Didn't work. Blank CSV. Add lots of error handling and print statements

Blunt, but it gets the job done.

This time… it still didn’t work. But it at least gave me some errors to pursue. I could see from the error text that it wasn’t even finding the links, let alone visiting each individual page for the other details. That suggested that it was “reading” the category page incorrectly.
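If you want to sanity-check that kind of failure yourself rather than relying on the error text, a quick diagnostic script helps. This is just a sketch for inspection (not the code Claude produced, and the class names it prints will depend on your theme), but it shows what the scraper is actually “seeing” on the category page:

import requests
from bs4 import BeautifulSoup

# Fetch the category page and show what the scraper can actually see
response = requests.get('https://leonfurze.com/category/ai/')
soup = BeautifulSoup(response.content, 'html.parser')

# Print the class attribute of every <li>, to see how posts are marked up
for li in soup.find_all('li'):
    print(li.get('class'))

# Print the first ten links, to check whether post URLs are being found at all
for a in soup.find_all('a', href=True)[:10]:
    print(a['href'])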

I’ve done some playing around with Python and web scraping before, including when I briefly taught a STEM class and we learned how to scrape weather data. Unlike working with Generative AI, when you’re working with actual code everything has to be exact or nothing works. But when you combine GenAI with code, you don’t necessarily have to go hunting through the entire script yourself for a typo or an incorrect link.


In the next step, knowing that the issue would likely be either the formatting of the links or the way the script was trying to find the components (the title, URL, date, etc.), I decided to just throw a pile of information at Claude and see if it could figure out the problem for itself. I uploaded a couple of files: copy/pastes of the source code from the category page (https://leonfurze.com/category/ai) and a blog post. From this, I hoped that Claude could correctly identify things like how the page links are structured, where to find the tags and featured images, and so on. I also included the previous error messages for good measure.

Uploading source files to Claude for reference
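I grabbed the source with a simple copy/paste from the browser, but you could just as easily do it with a few lines of Python. Here’s a minimal sketch (assuming the requests library is installed; the output filename is just an example) that saves a page’s HTML to a file ready to upload:

import requests

# Pages whose raw HTML I want to hand to Claude as reference material
pages = {
    'category_page.html': 'https://leonfurze.com/category/ai/',
    # Add an individual blog post URL here too if you want Claude to see a post's structure
}

for filename, url in pages.items():
    response = requests.get(url)
    response.raise_for_status()  # Stop loudly if the request fails

    # Save the raw HTML so it can be uploaded as a reference file
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(response.text)
    print(f"Saved {url} to {filename}")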

This is where Claude really comes into its own against ChatGPT. Even the new GPT-4o model can’t handle this much data. I had already tried this automation in GPT-4o unsuccessfully (in fact, the total time to get this up and running was 20 minutes, and 15 of those were wasted using ChatGPT before switching to Claude).

It failed once more, with the same “no page found” issue. This time, I just pasted in the error message and hit send. The resulting code was the final working version:

import requests
from bs4 import BeautifulSoup
import csv
import time  # Import the time module

def scrape_ai_category_posts(url):
    post_data = []
    page_number = 1
    wait_time = 1  # Time in seconds to wait between requests

    while True:
        try:
            print(f"Processing page {page_number}")

            # Fetch the content of the website
            response = requests.get(url)
            if response.status_code != 200:
                print(f"Failed to retrieve the website: {url}")
                break

            # Pause to respect the server
            time.sleep(wait_time)

            # Parse the HTML content
            soup = BeautifulSoup(response.content, 'html.parser')

            # Find the main content area
            main_content = soup.find('main', class_='wp-block-query')
            if not main_content:
                print("Main content area not found.")
                break

            # Find post elements within the main content area
            posts = main_content.find_all('li', class_='wp-block-post')
            print(f"Found {len(posts)} posts on the page.")

            if not posts:
                print("No more posts found. Exiting.")
                break

            for post in posts:
                try:
                    # Extract the post URL
                    link = post.find('h2', class_='wp-block-post-title').find('a')['href']
                    print(f"Processing post: {link}")

                    # Visit the post URL and scrape additional data
                    post_response = requests.get(link)
                    if post_response.status_code != 200:
                        print(f"Failed to retrieve the post: {link}")
                        continue

                    # Pause again after each post fetch
                    time.sleep(wait_time)

                    post_soup = BeautifulSoup(post_response.content, 'html.parser')

                    # Extract the title, date, tags, and featured image URL.
                    # (The original post elides this step; the selectors below are
                    # reasonable assumptions for a WordPress block theme.)
                    title_tag = post_soup.find('h1', class_='wp-block-post-title')
                    title = title_tag.get_text(strip=True) if title_tag else 'Unknown title'

                    # The date can be inferred from the permalink (/YYYY/MM/DD/slug/)
                    date = '-'.join(link.rstrip('/').split('/')[-4:-1])

                    # Tags are typically links marked rel="tag"
                    tags = [a.get_text(strip=True) for a in post_soup.find_all('a', rel='tag')]

                    # The featured image is usually exposed via the og:image meta tag
                    image_tag = post_soup.find('meta', property='og:image')
                    image_url = image_tag['content'] if image_tag else ''

                    post_data.append([title, link, date, ', '.join(tags), image_url])

                except Exception as e:
                    print(f"An error occurred while processing post: {link}")
                    print(f"Error details: {e}")

            # Increment the page number and update the URL for the next iteration
            page_number += 1
            url = f"https://leonfurze.com/category/ai/page/{page_number}/"

        except Exception as e:
            print(f"An error occurred while processing page {page_number}: {e}")
            break

    if post_data:
        # Write the data to a CSV file
        with open('ai_category_posts.csv', 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            writer.writerow(['Title', 'URL', 'Date', 'Tags', 'Featured Image URL'])
            writer.writerows(post_data)
        print("AI category posts data saved to ai_category_posts.csv.")
    else:
        print("No post data found.")

# URL of the AI category
ai_category_url = "https://leonfurze.com/category/ai/"
scrape_ai_category_posts(ai_category_url)

A few things I noticed from the code (bearing in mind I’m an amateur, though I have had some minimal experience with scraping): first, I’m scraping my own website for my own use, so I don’t have any concerns about the content. Second, the code includes wait times at a few points within the main loop to, in the words of the comment, “respect the server”. WordPress, like many sites, will block your IP if you slam it with too many requests, though this particular script only makes around 200 requests across this category of posts, which in the grand scheme of web scraping is minimal.
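If you’re scraping a site you don’t own, it’s worth being politer still. As a rough sketch (this isn’t part of Claude’s script, and the User-Agent string and retry settings are purely illustrative), a requests Session with an identifying User-Agent and a simple retry/backoff policy is a sensible default:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# A session that identifies itself and backs off automatically on errors
session = requests.Session()
session.headers.update({'User-Agent': 'my-blog-archiver (contact: you@example.com)'})

retries = Retry(
    total=3,                  # Give up after three failed attempts
    backoff_factor=2,         # Wait roughly 2s, 4s, 8s between retries
    status_forcelist=[429, 500, 502, 503],  # Retry on rate limits and server errors
)
session.mount('https://', HTTPAdapter(max_retries=retries))

response = session.get('https://leonfurze.com/category/ai/', timeout=10)
print(response.status_code)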

Here it is chewing through the posts:

It’s a tiny little scrap of code that does exactly what I wanted, and (removing the wasted ChatGPT time) took only five minutes to create. Even if I were a proficient coder, just typing all of this out would probably take longer than that.

Since I create a lot of content as a core part of both my consulting and academic work, little automations like this can save me a lot of time. But what I’m really interested in is applying this logic to others’ workloads. If you’re an educator or a school leader I’d love your thoughts on how this approach might be useful. Here is the step-by-step process:

  1. Think of something you need to automate, and identify the high level steps.
  2. Create a broad prompt that generally outlines the required steps.
  3. Test the code. Try to identify obvious errors, or use another prompt to add “error handling” (this will encourage the model to add more things like print statements so you can see where things are going wrong; a sketch of that pattern follows this list).
  4. Iterate on the code until it works, uploading more details or examples if necessary.
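To illustrate step 3, the pattern that “add lots of error handling and print statements” tends to produce looks something like this. It’s a hypothetical example rather than the scraper above, but the idea is the same: wrap each risky step, print what was found, and print the exception when something breaks.

import requests
from bs4 import BeautifulSoup

def fetch_titles(urls):
    # Fetch page titles, reporting progress and errors along the way
    titles = []
    for url in urls:
        try:
            print(f"Fetching {url}")
            response = requests.get(url, timeout=10)
            print(f"Status code: {response.status_code}")
            response.raise_for_status()

            soup = BeautifulSoup(response.content, 'html.parser')
            title = soup.title.get_text(strip=True) if soup.title else None
            print(f"Found title: {title}")
            titles.append(title)
        except Exception as e:
            print(f"Failed on {url}: {e}")
    return titles

fetch_titles(["https://leonfurze.com/category/ai/"])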

Here’s the final .csv file again:

Click to access the file

If you’ve got an idea for automations that might work for educators, or you’d like to get in touch to discuss GenAI consulting and professional learning, then use the form below:
