Hands on with Claude 3.5 Sonnet

Anthropic, a major AI developer and one of the biggest competitors to OpenAI, have just released a new model of Claude – their incredibly capable Generative AI application. It comes just a couple of months after the release of OpenAI’s GPT-4o, and after playing around with it for a few days I think it outstrips ChatGPT in many areas.

Claude has been my go-to text generation application for a while now, since it does a much better job than ChatGPT at working with language. By this, I mean that it’s helpful for tasks like transcription, editing, and working with text without mangling the content. I recently wrote about my process of designing and producing articles, which often includes taking a verbal draft and using AI to edit the transcription. Claude is hands-down the best application for this kind of work, since it follows the instructions to the letter and produces very high-quality output which respects my original input.

Now, Anthropic have released a slew of updates to both their free and paid subscriptions, and they’ve made the application even more useful.

Claude versus GPT

The following chart has popped up everywhere across social media in the past few days, and demonstrates how Claude performs on popular benchmarks against GPT, Gemini, and Llama (Meta’s model). It’s important to note that these benchmarks have limitations, and it’s easy for developers to cherry-pick favourable results. Still, the performance of Claude versus models like GPT-4o can be felt as well as seen on a benchmark, and I certainly feel like it outperforms in areas like reasoning, knowledge, and even code.

https://www.anthropic.com/news/claude-3-5-sonnet

I’m definitely biased as I’m already a fan, and these benchmarks just confirm that bias: Claude is better at tasks which involve more complex “thought” processes, and particularly those which require more complex prompts or more data.

One important aspect of this is the whopping 200K token context window – not as high as Gemini’s 1.5-2 million tokens, but certainly large enough to upload loads of data or to hold extensive chat conversations before hitting the upper limits.

Artifacts

Aside from the performance and context upgrades, one impressive new feature is the “artifacts” which Claude can create and render in real time. This is Anthropic’s approach to ChatGPT’s code interpreter – a feature which can both write and execute code.

In Claude’s case, it pops open an additional window to the side where it renders the code it has written, for example displaying websites, functioning apps, and data analysis.

To demonstrate the feature, I did a blatant rip-off of one of Anthropic’s launch videos and used Claude 3.5 Sonnet to write and execute the code for a playable (though obviously rubbish) jumping game.

While I’m just playing around here, there are some potential applications for this feature which go beyond just a toy. For example, it’s easy to upload a CSV file and create an interactive app built on the data. You can also provide content to create interactive resources, design mock websites, and rapidly prototype ideas in a range of programming languages.

Building apps with Claude 3.5 Sonnet

Earlier this year I used Claude (3 Opus) to develop an app from scratch. I’ve also used Claude to write simple automation scripts, like this one which created a database of this website. I think that using and working with existing materials is one of the most promising areas for authors using GenAI technologies: we have all of this content and IP to play with, and LLM-based tools offer a new way to interact with the text.

To explore that, I decided to see how Claude 3.5 Sonnet would fare in the app-creation process, and decided to make a simple app to interact with my posts. I ended up with a Blog Dashboard. Here’s a high-level overview of what it does including a video of the finished application:

Using Claude 3.5 Sonnet to interpret documents

Because the Claude family of models is very capable at working with complex texts and has such a large context window, it’s perfect for working with long documents such as research papers. Building on the ‘Blog Dashboard’ idea, I wondered how Claude might fare at making research more interactive.

I’m inspired by efforts from AI developers like Anthropic, Microsoft, Google, and OpenAI to make their research more accessible, engaging, and useful. A lot of papers recently have been released with interactive versions including visualisations and various tools. A standout example is Anthropic’s own Scaling Monosemanticity paper, which peers between the black-box layers of Claude 3 Sonnet (the previous mid-tier model).

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Creating these papers requires quite a lot of technical clout, and while it’s great for these big developers to be publishing more interactive research, the rest of us are left with the standard journal publishing format: dense text in PDF documents, presented in the familiar structure of abstract, introduction, method, discussion, and so on.

But what if GenAI could be used to make these types of paper more accessible? Using our AI Assessment Scale as the example, I prompted Claude 3.5 Sonnet to create an interactive web app based on the AIAS. Here’s how it ended up:

Claude ‘Projects’

If you feel like there’s a new AI feature drop about every 6 hours, you’re not hallucinating. Just as I was gathering up the videos and images for this article, Anthropic announced another set of features – this time only available to paying subscribers.

‘Projects’ are Anthropic’s equivalent to OpenAI’s Custom GPTs, enabling users to create and share custom-trained chatbots built on top of user data. With the 200K context window, that’s quite a lot of data.

Having spent all that time (45 minutes…) building my own Blog Dashboard app a few days ago, I thought I’d try to create the same thing using a new Project.

I uploaded a CSV of all my blog posts (created from the database used for the original Blog Dashboard project). When I included the content of the posts – hundreds of thousands of words and HTML code – I blew up the context window by a massive 918%. After removing the ‘content’ column of the database, leaving the title, URL, date, tags, and excerpts, that reduced to just 20% of the allowance.
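As a rough sketch of that slimming-down step, the snippet below drops a bulky column from a CSV and estimates how much of a 200K-token window the result might fill. The function names, the sample row, and the four-characters-per-token rule of thumb are all assumptions for illustration; the real figure depends on Claude’s own tokenizer, and this is not the script used for the original project.

```python
import csv
import io

# Assumption: roughly 4 characters per token. This is only a rule of thumb,
# not Claude's actual tokenizer, so treat the result as a ballpark figure.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 200_000


def drop_columns(csv_text, columns_to_drop):
    """Return the CSV text with the named columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in (reader.fieldnames or []) if name not in columns_to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def estimated_context_share(text):
    """Estimate the fraction of the context window the text would occupy."""
    return (len(text) / CHARS_PER_TOKEN) / CONTEXT_WINDOW_TOKENS


# Hypothetical one-row export with the same columns described above.
sample = (
    "title,url,date,tags,excerpt,content\n"
    "Post one,https://example.com/one,2024-06-01,ai,Short excerpt,<p>Long body</p>\n"
)
slim = drop_columns(sample, {"content"})
print(slim)
print(f"Estimated share of context window: {estimated_context_share(slim):.4%}")
```

Running the estimate before and after dropping a column gives a quick sense of whether a file is likely to fit before uploading it.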

I then experimented with a few ideas, using the ‘Blog Assistant’ project to interrogate past posts, generate new ideas, and create a basic interactive ‘sentiment analysis’ dashboard of the last six months’ worth of posts.

Then I hit everyone’s least favourite Claude feature: the usage limit.

Before I hit the limit, however, I was able to do a lot with the new Claude 3.5 Sonnet Projects feature. Even without internet access, it’s a versatile tool that didn’t hallucinate once across a few trial runs. By comparison, a custom GPT given the same data tends to fabricate more than half of the links it provides.
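One way to check for fabricated links more systematically is to compare every URL in a chatbot’s answer against the URL column of the uploaded CSV. The sketch below is a minimal illustration of that idea, not the method behind the figures above: the function name, the regular expression, and the example.com addresses are all hypothetical.

```python
import re

# Simple pattern for http(s) links; it stops at whitespace, brackets and quotes.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def normalise(url):
    """Strip trailing slashes and sentence punctuation so near-identical URLs match."""
    return url.rstrip("/.,;:")


def find_unknown_links(response_text, known_urls):
    """Return links in the response that are not in the list of known URLs."""
    known = {normalise(url) for url in known_urls}
    found = [normalise(url) for url in URL_PATTERN.findall(response_text)]
    return [url for url in found if url not in known]


# Hypothetical data: one real post URL and a response that cites two links.
known_urls = ["https://example.com/one/"]
response = "See https://example.com/one and https://example.com/made-up."
print(find_unknown_links(response, known_urls))
```

Any link the function returns is worth opening by hand, since it may be a fabrication or simply a page missing from the export.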

Here’s a video of the whole Blog Assistant Project from start to finish. Pretty impressive what you can achieve in just 8 minutes with a little trial and error.

Implications in education

I encourage educators to use Claude as an alternative to the more familiar ChatGPT because it has powerful language capabilities. With these new features – many of which are included in the free tier – it’s definitely worth looking into.

I don’t think we’ll see many seismic shifts in education with the ability to produce “artifacts”. While it’s fun to create little interactive apps or use Claude’s HTML5 and JavaScript capabilities to create functional browser games, it’s not something which could be easily integrated into any sensible curriculum.

What’s more impressive is the potential of these approaches, and what they suggest about the near future of similar LLM applications. The ‘Projects’ feature could be incredibly useful for a team of educators designing a new curriculum, offering the chance to upload lots of materials as the basis for these discussions.

I think “artifacts” point to a near-horizon of LLM applications like Claude and ChatGPT in which everything created by the AI can be instantly tested, and perhaps even deployed. In the blog post announcing the updates, Anthropic mentioned “integrations”, which suggests that Claude will be able to interact with other tools, perhaps in the same way OpenAI’s custom GPTs can use API calls (though hopefully much easier, and less fragile).

Artifacts and Projects aren’t going to revolutionise education, but for me they’ve secured Claude as my favourite LLM-based application.
