Building an app in a weekend with Claude 3

I’ve mentioned before (here and here) that I know just about enough code to be dangerous, but not enough to make anything very sophisticated. I’m like a kid who’s learned a handful of foreign languages at school. I can order a beer in Python, HTML and CSS, but I can’t really hold a conversation with the locals.

Since the start of the year, I’ve gone back to square one and started learning Python from the very basics, as well as brushing up on the HTML that I haven’t used since I was a teenager. Of course, with generative artificial intelligence, there are a lot of hype-peddlers saying learning code is a redundant exercise. However, people who actually write code for a living tend to disagree. GenAI can indeed write functional and accurate code. But it can also write total gibberish, much as it can when producing ordinary prose.

Like anything with GenAI, you really need to know what you’re doing in order to get the most out of the technology. So I think learning how to code and then using GenAI to help write code is much better than just relying on GenAI and hoping that a large language model can get it right. Of course, all of this will probably change as models get more and more sophisticated. But learning to code is also something which I’m finding fun, challenging and worthwhile in its own right.

I’ve gotten to the point where I can handle the basics of Python: functions, classes, and a handful of modules. And I’ve been using a few books such as Zed A. Shaw’s Learn Python the Hard Way and Learn More Python the Hard Way, and the Automate the Boring Stuff series to learn the fundamentals and start building a few small but functional applications.

Then over the weekend, I decided I wanted to try something more complex. A couple of weeks ago, Anthropic released its latest model, Claude 3, including its paid and most powerful version, Claude Opus. Claude Opus doesn’t have internet connectivity yet, but it does compare favourably with other powerful models like GPT-4 on various benchmarks, including coding. So I thought I’d try to combine my beginner’s knowledge of Python with Claude’s coding capabilities, and create a functional app.

Setting Some Goals

Because I’m a huge nerd and because I respond very well to clear goal setting, for the past few years, I’ve been using Objectives and Key Results (OKRs) to set personal learning and work goals.

OKRs are just one format among many, but I’ve found the process of having three to four objectives, each with three to four key results, very useful whilst I’ve been building my consulting practice and now that I’m working towards my PhD.

I set quarterly objectives and check in on the key results weekly or fortnightly. Up until now I’ve either done this on paper or in Todoist, which is the app that I use as a replacement for my actual brain. But Todoist isn’t really designed for tracking OKRs. It’s a to-do app and it’s much more useful for tracking projects and actions than the progress of goals which evolve over time.

So on a Friday morning, while sitting down with a notepad and a pen and planning ahead for the quarter’s objectives and key results, I thought that building my own custom app would be a good challenge.

Planning the App

The first step was to sit down and think about my process so far. Because I’ve been using OKRs for a few years, I’ve developed a bit of a rhythm and it didn’t take long to sketch out what this app would need. It would obviously have to have the ability to record objectives and assign key results to those objectives. Objectives and key results would both need dates, titles and descriptions. And the key results would also need some way of tracking and adding notes.

I’ve never been particularly good at scoring key results, but that’s something I thought I’d like to include. And it’s recommended in both John Doerr’s Measure What Matters and Christina Wodtke’s Radical Focus, both of which give solid advice on setting and tracking OKRs. So I took the weekly confidence rating from Wodtke’s book and the zero to one score from Doerr’s, and added them as features for the KRs.

Once I’d sketched out the app, I opened up Claude, signed in to use Claude Opus, and kicked things off with the following prompt:

Straightaway, Claude went into fairly familiar territory for large language models, which is to start pumping out a high-level list of steps for the whole project. My experience with large language models is that things go much better when you break everything down into smaller tasks. However, Claude has a particularly large context window, so it can theoretically handle huge projects. Claude’s response was also interesting because instead of just identifying the high-level project outline, it dove straight into the nitty gritty of the first steps, including how to set up a virtual environment in Python and the kind of database schema that we would probably need:

Objectives:
- id (primary key)
- title
- description
- start_date
- end_date

KeyResults:
- id (primary key)
- objective_id (foreign key to Objectives)
- title
- description
- target

WeeklyProgress:
- id (primary key)
- key_result_id (foreign key to KeyResults)
- week_start_date
- confidence_rating
- progress_score

Notes:
- id (primary key)
- key_result_id (foreign key to KeyResults)
- content
- created_at
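
The post doesn’t reproduce the code Claude wrote, so what follows is only a minimal sketch of how the schema above could be expressed. It uses Python’s built-in sqlite3 module rather than whatever database layer the app actually used, and the table names, column types and sample rows are assumptions made for illustration.

```python
import sqlite3

# Table and column names mirror the outline above; the SQL types are assumed.
SCHEMA = """
CREATE TABLE objectives (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    start_date TEXT,
    end_date TEXT
);
CREATE TABLE key_results (
    id INTEGER PRIMARY KEY,
    objective_id INTEGER NOT NULL REFERENCES objectives(id),
    title TEXT NOT NULL,
    description TEXT,
    target REAL
);
CREATE TABLE weekly_progress (
    id INTEGER PRIMARY KEY,
    key_result_id INTEGER NOT NULL REFERENCES key_results(id),
    week_start_date TEXT,
    confidence_rating INTEGER,
    progress_score REAL
);
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    key_result_id INTEGER NOT NULL REFERENCES key_results(id),
    content TEXT,
    created_at TEXT
);
"""


def create_database(path=":memory:"):
    """Create the four tables and return an open connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


# Hypothetical sample data: one objective with one key result attached.
conn = create_database()
cur = conn.execute(
    "INSERT INTO objectives (title, start_date, end_date) VALUES (?, ?, ?)",
    ("Learn Python", "2024-04-01", "2024-06-30"),
)
objective_id = cur.lastrowid
conn.execute(
    "INSERT INTO key_results (objective_id, title, target) VALUES (?, ?, ?)",
    (objective_id, "Finish one small app", 1.0),
)

# Join the two tables to list each key result alongside its objective.
rows = conn.execute(
    "SELECT o.title, k.title FROM objectives o "
    "JOIN key_results k ON k.objective_id = o.id"
).fetchall()
print(rows)
```

The foreign keys are what tie each key result, progress entry and note back to its parent, so a key result pointing at an objective that doesn’t exist is rejected by the database.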

With the app planned and Claude apprised of the project details, it was time to start building.

Building the App

I knew that this would be an iterative process, and that I shouldn’t try to rush right to the end with something that looked pretty. I also knew a few things that I wanted to try out along the way as I continue learning Python, so I was able to give a few specifics. For example, I knew I wanted to build this app using a Python web framework called Flask and to ultimately style it with HTML and CSS.

A conversational prompt, riddled with typos. LLMs don’t care.

Claude began with instructions on how to set up the main Python file, app.py, and to create the classes that would be needed for the database which would store all of the information on objectives and key results. Based on my plan, it came up with classes for objectives, key results, weekly progress and notes. Rather than just copy-paste all of the code, I opened up Terminal and typed it out line by line. I’ve found this to be an effective way for me to learn code, and I know that it works for a lot of other people too.

Just to make things more awkward, I built this app on a Raspberry Pi which I was connecting to on an iPad using the Termius application. This is absolutely not the way I would recommend taking on any project, ever. But I’m doing this as a hobby, which means I’m just squeezing it in around work and parenting. So being able to pick up the iPad, SSH into the Raspberry Pi, do a little work, and then close it all down is very handy. I don’t have to use my desktop Mac or MacBook. And in my mind, having everything on the Raspberry Pi is neatly compartmentalised away from all of my other work stuff.

Obviously, doing things this way creates a few problems down the track, which I’ll get to later. One advantage, though, was being able to use the iPad’s Split Screen function, with the Termius terminal window on the left and Claude on the right and no distractions.

Creating app routes for Flask

Once I’d written out the classes to handle the database, the next step according to Claude was to create the Flask app routes. These are the parts of the application which tell it where to go for different pages. In this app, each route is associated with an HTML page, which you create in a templates folder. And this was the next step offered by Claude.
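
To make the idea of a route concrete, here is a small sketch rather than the app’s actual code. It assumes Flask is installed. The URLs, function names and sample data are hypothetical, and the templates are written inline with render_template_string so that the example is self-contained. A real app would more likely call render_template with an HTML file stored in the templates folder.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical in-memory data standing in for the database.
OBJECTIVES = [
    {"id": 1, "title": "Learn Python"},
    {"id": 2, "title": "Build an OKR tracker"},
]

# Inline Jinja template; normally this would live in templates/objectives.html.
OBJECTIVES_TEMPLATE = """
<h1>Objectives</h1>
<ul>
{% for objective in objectives %}
  <li><a href="/objectives/{{ objective["id"] }}">{{ objective["title"] }}</a></li>
{% endfor %}
</ul>
"""


@app.route("/objectives")
def list_objectives():
    """Overview page: render every objective as a link."""
    return render_template_string(OBJECTIVES_TEMPLATE, objectives=OBJECTIVES)


@app.route("/objectives/<int:objective_id>")
def objective_detail(objective_id):
    """Detail page: look up one objective by the id in the URL."""
    for objective in OBJECTIVES:
        if objective["id"] == objective_id:
            return render_template_string(
                "<h1>{{ objective['title'] }}</h1>", objective=objective
            )
    return "Objective not found", 404
```

Each decorated function handles one URL pattern. The part in angle brackets is passed to the function as an argument, which is how a single route can serve the detail page for every objective.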

I created HTML templates for pages such as objectives, objective details, key results and details, and to edit, update, and track the progress of key results.

Iterate, Iterate, Iterate

I was impressed by how quickly this thing started to come together. The database creation basically took care of itself, and the core functions of the app, such as creating new objectives and key results, were very easy.

Once I started to refine and add features, things began to get a little more complex. With the basic app in front of me, I wrote down a list of all of the finer details and extra functions that I was looking for, and then went back to Claude, working through these features one by one. Often, adding or changing a function caused some kind of conflict with the database or required the installation of extra modules, such as Flask-Migrate to apply database changes.
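
The kind of database conflict described here usually comes from the code expecting a column that the existing database file doesn’t have yet. The sketch below is not Flask-Migrate. It is a hand-rolled illustration, using the built-in sqlite3 module, of the sort of change a migration tool applies for you. The helper names and the example column are assumptions.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of a table, in order."""
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def add_column_if_missing(conn, table, column, definition):
    """Add a column only when it is not already there; return True if added.

    Table and column names are interpolated directly into the SQL, so this
    should only ever be called with trusted, hard-coded names.
    """
    if column in column_names(conn, table):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {definition}")
    return True


# An "old" database created before the target column existed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_results (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO key_results (title) VALUES ('Finish one small app')")

# Apply the change twice to show that the second call is a harmless no-op.
added = add_column_if_missing(conn, "key_results", "target", "REAL DEFAULT 1.0")
added_again = add_column_if_missing(conn, "key_results", "target", "REAL DEFAULT 1.0")
print(added, added_again, column_names(conn, "key_results"))
```

Existing rows pick up the default value for the new column, so the data already in the database stays usable after the change.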

Again, the fact that I’ve been learning Python for the past couple of months meant it was relatively easy to fix errors as they came up, or even in some cases, spot them before Claude did. Eventually, the features of the application that I was looking for were all there. And it was time to add some style.

Adding Some Style

Fundamentally, styling the application was just about tweaking the HTML templates until the colours, size and spacing were how I wanted them. After outlining how I would like the app to look, and providing a few example screenshots of apps that I liked the look of, Claude suggested Bootstrap as a way of handling the CSS and provided detailed instructions on how to do that. Because Claude doesn’t have internet access, the instructions were out of date. But I was able to follow along from the Bootstrap website easily enough. I created a CSS file in the static folder of the application, and once again iterated round and round until the app looked more or less how I wanted it to look. Of course, it’ll never be quite finished…

Create objective page, objectives overview, and objective detail

Create key result page, objective detail with KR, KR weekly progress

‘Traffic light’ page for scored KRs. 0-10 confidence rating for ongoing KRs, 0-1 score for completed KRs. The colour is also determined based on whether the KR target is 1 or 0.7 (for more challenging goals)
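
The traffic-light logic described in the caption above could be sketched roughly as follows. The post doesn’t give the thresholds, so the cut-off values here are assumptions chosen for illustration, as are the function names.

```python
def traffic_light(score, target=1.0):
    """Colour for a completed key result.

    score:  the 0-1 score recorded for the key result
    target: 1.0 for a standard goal, 0.7 for a more challenging one
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= target:
        return "green"
    # Assumed cut-off: getting at least halfway to the target counts as amber.
    if score >= target / 2:
        return "amber"
    return "red"


def confidence_light(rating):
    """Colour for an ongoing key result, from its 0-10 confidence rating."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    # Assumed bands: 7 and above is green, 4 to 6 is amber, below 4 is red.
    if rating >= 7:
        return "green"
    if rating >= 4:
        return "amber"
    return "red"


# The same score is judged differently depending on the target.
for score, target in [(0.7, 0.7), (0.7, 1.0), (0.2, 1.0)]:
    print(score, target, traffic_light(score, target))
```

Passing the target in as a parameter is what lets a score of 0.7 show as green for a challenging goal while the same score falls short on a goal that was expected to be fully met.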

Probably the hardest part of getting the app to look right was ensuring flexibility between desktop and mobile browsers. Because I don’t have any experience here, I had no idea how simple it should have been to get this right, and therefore wasn’t able to ask the right questions. When I moved away from Claude and started to Google instead, I quickly found out that what I should be asking about were things like rem and media queries in the CSS file to allow the HTML templates to adjust dynamically to screen size. After a frustrating few rounds trying to get this right, that quick Google solved the problem much more effectively than Claude. Again, knowing what you need to know is an incredibly important part of using current generative AI models.

Finally though, the app looked exactly how I wanted it to. More or less. And, most importantly, all of the underlying functionality still worked.

The OKR App Goes Live… Sort Of

Screenshot of the app homepage running in a desktop browser

Like I said, I’ve been running this application from a Raspberry Pi, connecting to it on my home network. So, I can run the Flask application and then use any device on my home network to go to the IP address:5000, and access the app. The database is stored on the Raspberry Pi and everything works seamlessly… as long as I don’t want to leave my house.

Of course, if I want to use this app in the longer term, it would be much more helpful to be able to access it from anywhere. I looked into this, and Claude offered a few suggestions. For example, I can open up access through port forwarding on my router, essentially creating a gateway directly to the Raspberry Pi from anywhere on the internet. Even to me, with my limited knowledge of internet security, this sounds like a bad idea. And, to its credit, Claude immediately followed this suggestion up by saying that it didn’t recommend it, because without some security measures in place it would potentially give anyone access to my home network. It then suggested some security measures, such as using HTTPS, adding authentication (login username, password) to the app, or moving from Flask’s built-in development server to something which would be easier to secure, such as Apache.
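
For a sense of what the authentication suggestion involves, here is a minimal sketch of the password-checking part only, using nothing but the Python standard library. It is not the code Claude provided. The function names and iteration count are assumptions, and a real Flask app would more likely lean on an established login extension or on the password helpers in Werkzeug, the library Flask is built on.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher is slower but harder to brute-force


def hash_password(password, salt=None):
    """Return (salt, digest) so the plain-text password is never stored."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare the results."""
    _, digest = hash_password(password, salt)
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(digest, expected_digest)


# Hypothetical usage: store the salt and digest, then check login attempts.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("a wrong guess", salt, stored))
```

Storing only the salt and digest means that even someone who gets hold of the database can’t read the password back out of it.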

Claude also suggested a few methods of deploying the application online for distribution. I dug a little further into all of these and for now put them in the “too hard” basket. Claude gave me some example code for implementing HTTPS basic encryption and creating a login page, but for now it all felt like overkill. Instead, I’m going to use this app while I’m at home for a while and continue to refine it. I work from home, so that’s hardly a major drama.

Challenges

All in all, from scribbling notes of ideas in a notebook to a mostly functioning app running on my home network took less than two days, and I certainly wasn’t working on this constantly. I squeezed in a couple of hours here and there – before the kids were awake, or after they’d gone to sleep. I had one day, during that mid-day period that we optimistically call “rest time”, when I squeezed in some extra time.

In total, I probably put about six hours into the bulk of the project, with a little bit of fiddling around the edges. Six hours from idea to prototype with a team of one with limited technical expertise. Of course, this was not without its challenges. The main challenges I’ve already alluded to – there were times when I just didn’t know what I needed to ask for. I wasted a lot of time going down dead ends and trying to do things in ways which were probably very inefficient and ineffective. In general though, it was easy to identify and fix errors.

Uploading code snippets and error reports from Flask’s debugger

Another unexpected challenge came from Claude itself. Claude has an enormous context window, much bigger than ChatGPT Plus or Gemini Advanced, which means it can handle large documents and lengthy conversations. About halfway through this project, however, I noticed that Claude started to lag – at first just a little, and then significantly. When writing responses, the delay between typing the prompt and receiving the output grew and grew. Because Claude was outputting a lot of code, the code boxes that it used seemed to be presenting issues.

As I was nearing the end of the project, in fact, it became almost unusable. The textbox for entering the prompt actually reached the point where I couldn’t select it or bring up a cursor to type. And by the time of writing this post, that chat thread barely works at all. I can still open it from the history, but if I scroll too far back or try to type a new message, it refreshes and loads for several minutes before showing the most recent message again. I hit a limit on uploading screenshots (5 per chat) and files. And on the second day, hit the daily limit of total chat requests.

I was expecting to hit the end of the context window at some point, because I knew this would be a long project. But I didn’t expect the application itself to fall to pieces. To make sure that it was this particular chat, I loaded several new chat threads, which all worked fine, and tried across a range of browsers and devices with the same results – every time, the OKR tracker chat thread was basically broken. I was able to do a select-all copy/paste from the webpage into a Word document, and the full chat clocked up an impressive 40,000 words before giving up, not counting uploaded documents, screenshots, and copied-in text (which Claude handles like a separate text file).

Presumably, this is something that Anthropic will be working on. But for now it’s something to be aware of if you attempt to use Claude for long projects. As an immediate fix, I uploaded the most important files to a new chat, and instructed the model that we were picking up where we left off. This new chat thread handled the transition well.

Implications for Education

This was a fun project and I’ve ended up with a (mostly) working app that does what I want, looks reasonably tidy, and which I’ll enjoy using this quarter as I track my OKRs. But the implications of this process in education are far-reaching, and not just in computer science or programming courses.

Of course, this demonstrates that a large language model like Claude Opus can be used to create complete applications or to support a beginning programmer like myself. But it also demonstrates something that I’m far more excited about – the potential for students and educators to work together on low-code and no-code tools to build things which are creative, unique and useful.

A lot of schools waste thousands of dollars on off-the-shelf edtech products to try to address a particular need for their staff or students. Most of the time, these applications are either general purpose and not specific to the school’s needs, or they are so niche that they are useless for the task (but the developer will tell you that there’s a new feature coming soon that solves all of your problems…).

Here’s an example: let’s say a school wants a simple way to track teachers’ professional development hours, their goals, and map them to national standards. I know a lot of schools that do this in platforms like EMS360, which is actually an excursion and leave management platform that has had a PD section bolted onto it at some point. Others use clunky Department of Education provided systems or outdated websites, or pay significant amounts of money for platforms which offer PD tracking and a host of other features.

A team of two or three teachers could probably hack together a functioning PD tracking app in a couple of days using the methods outlined in this post. If one of the teachers had a little bit of programming knowledge, or you brought in one of the IT staff, or a techie friend, you could probably halve the time.

I imagine that in the near future, it would be very simple to create an application that you could host internally on your school’s intranet, and probably even connect to your existing learning management system. For me, this idea of building apps on-demand is a far more exciting use of generative AI than creating tutor chatbots for students to help them with their homework. This is about creating real, useful applications that solve real, time-consuming problems and are likely to save thousands of dollars, which could then be used on resources for students and teachers that actually make a difference.

I’m going to end this post with a challenge. And if you’ve stuck with me this far, and you work in a K-12 or higher education institution, I’d love for you to get in touch with me to see if we can partner up and make this happen.

The Challenge: Hack Your Own EdTech

This year, I’m going to be running a number of hackathons on using GenAI technology to create useful and fun things. The first of these will be a students’ hackathon in partnership with ACMI to create the elements of a working video game (while you’re checking out ACMI, here are four free webinars on AI we recorded last year). But I think this idea has got some serious merit.

I’m looking for organisations, schools, universities, and groups that are interested in working together to build something exciting. If you’ve got an idea for a tool that your organisation would find useful, or something that you think would help your students or staff, I want you to get in touch using the contact form below and I want to see if we can make this happen.

I’ll come to your organisation for a day or two, work with staff and/or students, and guide you through this process of using generative artificial intelligence, a basic understanding of code, and a little bit of common sense to create a prototype piece of software. If you’re reading this and you represent an association or a larger group, this might be a great opportunity to bring together a few individuals from different organisations in your association to build something great.

Get in touch with me using the contact form below:



Leon Furze