If you’ve been on the blog this year, you’ll have no doubt seen me referring to articles by Simon Willison, an accomplished software engineer who writes extensively about generative AI. His writing has been invaluable in shaping how I think about and use these technologies.
Although he has a technical focus, one of his recent posts stood out to me as useful in any field, but particularly in education. In the article “Hoard Things You Know How To Do”, Willison explains that developing a deep collection of answers to complex problems is fundamental to solving new problems that people haven’t yet found solutions for. In his context, he’s referring to building up a bank of programming solutions, software and code so that when you’re presented with a new challenge, you can pull bits and pieces out of the archive much more quickly.
Later in the article, he goes on to point out how useful artificial intelligence is in this context. If you already have a large database of answers, for example a deep and broad code repository, then you can start posing questions to AI like, “How might I use some of the things I’ve already built to build a new thing?” This is why he suggests you should hoard things you know how to do.
Hoard your knowledge
I’m not a software engineer and neither are most of the readers of this blog, but what we do have is a wealth of knowledge. Educators hoard knowledge like magpies, gathering up shiny scraps of professional learning, hand-written and typed notes from workshops and daily briefings, lesson plans, curricular materials, and all of the other expertise and detritus that piles up throughout a teaching career.
Many educators like me commit that knowledge to writing. I’ve got dozens of colleagues who have written chapters for textbooks, maintained blogs, or produced their own books on education. Many of them are prolific, turning out journal articles, books, and teaching materials at a frenetic rate.
So, whilst we’re not designing software, we are often capturing all of the things we know how to do. Unfortunately, a lot of this work is ephemeral. We lose notes from that great PD that happened four years ago. We write articles and then sweep up the remnants from the cutting room floor and never see them again. We deliver excellent lessons and, in the fullness of time, forget about them or don’t connect the dots between those old lessons and the new areas that we’re teaching.
Much of the knowledge we have stays tacit, trapped in our heads, or maybe buried in some corner of our minds that makes it totally inaccessible, even a few months later.
This article is a call to hoard all of that knowledge, to make the tacit explicit. Write things down, keep the drafts and offcuts, and then, importantly, store all of this information in searchable, structured collections.
It’s not even as hard as it seems.

More body than brain
There’s a popular term for this in the tech community, “second brain”, spurred on by books like Tiago Forte’s Building a Second Brain. Second brains often take the form of vaults in tools like Obsidian, a markdown editor and knowledge-graphing application much loved by the technology community.
The basic idea of a second brain is, as I suggested above, to capture as much of your thinking and output as possible, but honestly, I don’t really like the term. Second brain to me sounds like something external: knowledge as something to offload, to free up your mind for supposedly higher tasks. A second brain sounds like something that should live in a jar. And it raises the question: what happens to our first brain?

What I want to capture is not some external repository of knowledge, but my own body of work. And that term, body, gives us a more useful word: corpus.
In earlier posts, I’ve written about the importance of situated expertise: the embodied, relational, contextual knowledge that comes with time spent on a job. Situated expertise is lived experience. Building a corpus or body of work is much more suggestive of that situated expertise to me than a second brain.
A corpus of work is also designed for sharing. Think of Shakespeare’s corpus, or the academic study of corpus linguistics, where large repositories of text are examined for the information they contain. Not a second brain in a jar, but a living, breathing collection of works to be built upon.
In the following section, I’m going to give a brief rundown of the corpus that I’ve just created using Claude Code. And then I’ll explain some of my choices, the difficulties that I hit along the way, and where this body of knowledge is already becoming useful.
Building a corpus, and then using it
I’ve been an educator for almost 20 years, and for about 15 of those I’ve been writing. So I’m already well on the way to having my body of knowledge “transcribed”, but as you can see in the steps that follow, “transcribed” doesn’t mean neat and tidy. I am terrible at versioning, useless at creating folder structures, and absolutely awful at keeping track of draft ideas.
What I do well is capturing as many of my fleeting ideas as possible. In fact, I wrote an article recently about how I use AI technologies, including voice transcription, to do just that, which is why a big part of my data comes from hours of transcribed audio. I’ve also got over 300 blog posts, a couple of dozen published academic articles, and a few books to draw on.
First of all, we’ve got to get all those files in one place. This is, frankly, a nightmare because they’re all in (dramatic music) iCloud.
Have you ever tried to move a large number of files from iCloud to anywhere else? I have, and it did not end well. Even if you’re transferring files onto an external hard drive, iCloud first has to download them onto your local device. This creates a number of problems: it fills up hard drive space fast, and it is inordinately slow, much slower than most downloads. I presented this problem to Claude Code to see what it could come up with. My prompt was… direct.

It turns out you can skip all of the iCloud nonsense and run a script directly from the command line. Claude Code operates from the command line and can write and execute scripts, so it was able to spin up a handy little bit of code which selectively downloaded documents (.doc and PDF), skipped large files, and batched the transfers so that iCloud didn’t inadvertently murder my laptop.
Claude Code first presented me with a breakdown of file types and their sizes, so that I could be selective. I had forgotten about 4.1GB of .gguf files (local AI models), so this also gave me an opportunity to go and delete some dead weight.

I left this running in the background while it simultaneously did a few other handy things: cross-referencing files against my citations in Zotero, deduplicating, and adding version control. It ran autonomously for a couple of hours as it transferred data out of iCloud and onto a one-terabyte SD card.
I chose an SD card rather than a faster and more obvious USB hard drive mostly for the form factor. I’ve got a MacBook Pro with an SD card slot, and the card doesn’t need to be as fast as an external hard drive. It slots into my laptop and can stay there forever without me needing to carry around any extra cables.
By the end of the process, it looked like this:

Whilst most of the data I gathered was writing and knowledge from the last four years of my career, I could have gone deeper, and I would have found just as much material. For example, if I had gone to my 2010 to 2022 archive folders, I would have found hundreds of thousands of words of lesson planning and curriculum documents.
This represents my corpus from that period in my career. I wasn’t writing as much in the sense of publishing blog posts and books, but I was producing just as much written work in the form of those curriculum materials.

You should focus on whatever kind of knowledge makes most sense for you to hoard. Your expertise is valuable, whatever form it comes in. I always recommend voice transcription. I think it’s a great way of gathering and capturing ideas, whatever your role. But if you prefer another way of doing things, even handwritten notes can be captured and transcribed, and most AI platforms have excellent image recognition that can pull text out of handwriting and turn it into text files.
Machine readable
Once all of the information is gathered, the next step is to convert everything into markdown files. There are two reasons to do this: speed, and ease of use with the AI platform itself. Word documents in the .docx format are designed for human readers and contain a lot of bloat that can easily be stripped away when the intended reader is a large language model. So get all of that data organised in one place, and then instruct Claude Code or your model of choice to convert it all into markdown files.

I added an extra step, mostly just to see if I could: building a search application in the programming language Rust. It is lightning fast. I have an archive of thousands of emails, hundreds of blog posts and articles, and hundreds of thousands of words of material from books. The Rust-powered search engine can carry out keyword searches across this whole corpus in milliseconds, totally offline, running from an SD card.
As you can see in the image below, searching the entire blog archive (370+ articles) for the word “expertise” returns 30 results in 82 milliseconds. Take that, Google search.
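My actual search tool is written in Rust, but the core trick is an inverted index: each word maps to the set of documents that contain it, so a keyword search becomes a single dictionary lookup instead of re-reading every file. For illustration only, the idea fits in a few lines of Python:

```python
import re
from collections import defaultdict

class CorpusIndex:
    """A tiny in-memory inverted index over a corpus of text files."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        # Tokenise crudely on runs of letters, digits and apostrophes,
        # lowercased so searches are case-insensitive.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            self.index[word].add(doc_id)

    def search(self, word):
        """Return the IDs of every document containing the word."""
        return self.index.get(word.lower(), set())
```

Build the index once over your markdown files, and every subsequent search is effectively instant.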

Share what you know
It’s important to me that knowledge doesn’t end up locked away in a “vault” or in some jar-bound second brain. Here are a few examples of how I am already using this corpus to share knowledge and to create new things:
- Querying my entire blog archive to find where I’ve previously written about a concept, so I can build on it rather than repeat myself.
- Pulling together relevant excerpts from past articles, book chapters, and transcribed notes when preparing a new presentation or workshop.
- Asking Claude to identify recurring themes or contradictions across years of my writing that I hadn’t consciously noticed.
- Drafting proposals for new clients by drawing on language and ideas from previous successful engagements.
- Resurfacing forgotten draft ideas and half-finished pieces that suddenly become relevant to something I’m working on now.
There are plenty of great reasons to hoard your knowledge and then apply AI to that corpus of expertise. If I were still in the classroom, I would absolutely be experimenting with ways to do this for my lesson planning, scope and sequencing, and resources.
Want to learn more about GenAI professional development and advisory services, or just have questions or comments? Get in touch:
