As someone who grew up in a monolingual environment, I've always dreamed of being able to learn another language. When I moved to Germany 8 years ago, I was super excited at the chance to finally be forced into doing so … but to my horror, I realised I am not a "languages person". And let's be honest, German is not the most forgiving language. Somehow the variations of the verb ziehen can mean anything from raising a child (erziehen) to moving house (umziehen) to getting dressed (anziehen). And don't get me started on the genders and plurals …
Needless to say, I've tried many different ways over the years to improve my German, and one thing I keep coming back to is Anki flashcards. Anki is a free application that lets you create whatever flashcards you like, and it is especially good for language vocabulary. The app uses spaced repetition to schedule the cards: words you're less sure about are shown more often, while those you've already learned are shown only occasionally, just for a bit of reinforcement.
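To give a feel for how spaced repetition works, here's a simplified sketch of SM-2, the SuperMemo algorithm that Anki's classic scheduler is derived from. It is not Anki's actual implementation: Anki adds its own adjustments and newer versions also offer the FSRS scheduler. The names `CardState` and `review` are purely illustrative. Each review is graded from 0 (complete blackout) to 5 (perfect recall):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the card is shown again
    ease: float = 2.5       # SM-2 "easiness factor"


def review(card: CardState, grade: int) -> CardState:
    """Return the card's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= grade <= 5:
        raise ValueError("grade must be between 0 and 5")
    if grade >= 3:
        # Recalled: the gap grows, first to 1 day, then 6, then by the ease factor.
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval_days * card.ease)
        repetitions = card.repetitions + 1
    else:
        # Forgotten: start the sequence again so the word comes back tomorrow.
        interval = 1
        repetitions = 0
    # Hard recalls lower the ease factor (never below 1.3); easy ones raise it.
    ease = card.ease + (0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02))
    return replace(card, repetitions=repetitions, interval_days=interval, ease=max(1.3, ease))
```

Grading a new card 5, 5, 5 and then 2 gives intervals of 1, 6 and 16 days, then drops back to 1 day after the failed recall. The ease factor also falls, so that word's intervals grow more slowly from then on.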
I thought it would be a fun project to see if I could build an LLM agent that automatically selects random lists of words, translates them, and turns them into Anki decks. It is based on a dataset of word lists in a range of languages, preprocessed with traditional NLP techniques: lemmatisation, which reduces inflected forms to their dictionary form, and Zipf frequency, a measure of how common each word is.
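Here's a minimal sketch of that kind of preprocessing, assuming you already have raw counts of each word form from a large corpus. The `LEMMAS` lookup is a hypothetical stand-in for a real lemmatiser, and the difficulty cut-offs are illustrative assumptions, not values from the project. The Zipf scale is the base-10 logarithm of a word's frequency per billion tokens, so a word that occurs once per million tokens scores 3. (Libraries such as wordfreq also provide precomputed scores via `zipf_frequency`.)

```python
import math
from collections import Counter
from typing import Dict, Tuple

# Hypothetical stand-in for a real lemmatiser: inflected form -> dictionary form.
LEMMAS = {"Häuser": "Haus", "zieht": "ziehen", "zog": "ziehen"}


def lemmatise(form: str) -> str:
    return LEMMAS.get(form, form)


def zipf(count: int, total_tokens: int) -> float:
    """Zipf scale: log10 of the word's frequency per billion tokens."""
    return math.log10(count / total_tokens * 1e9)


def difficulty(score: float) -> str:
    # Illustrative cut-offs: the more frequent a word, the earlier it is worth learning.
    if score >= 5.0:
        return "beginner"
    if score >= 3.5:
        return "intermediate"
    return "advanced"


def build_word_list(form_counts: Dict[str, int], total_tokens: int) -> Dict[str, Tuple[float, str]]:
    """Merge inflected forms into lemmas, then attach a Zipf score and difficulty level."""
    lemma_counts: Counter = Counter()
    for form, count in form_counts.items():
        lemma_counts[lemmatise(form)] += count
    word_list = {}
    for lemma, count in lemma_counts.items():
        score = zipf(count, total_tokens)
        word_list[lemma] = (round(score, 2), difficulty(score))
    return word_list
```

For example, `build_word_list({"Haus": 300_000, "Häuser": 50_000, "erziehen": 800}, 1_000_000_000)` merges *Häuser* into *Haus* (Zipf 5.54, beginner) and grades *erziehen* (Zipf 2.9) as advanced. Counting lemmas rather than surface forms keeps a common word from being split across its inflections and looking rarer than it is.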
The agent uses a ReAct (reasoning and acting) architecture with access to several tools:
- Tools to select random words, or random words by difficulty level;
- A translation tool which translates the words into a target language specified by the user;
- External tools to create and update Anki decks using the MCP server Clanki.
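To make the ReAct pattern concrete, here's a minimal, framework-free sketch of the reason, act, observe loop. Everything in it is hypothetical: the tool names, the tiny word list, the glossary standing in for the translation model, and `scripted_model`, which replaces the reasoning LLM with hard-coded decisions. The Anki step via Clanki is left out rather than guessing at its tool names.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Union

# Hypothetical preprocessed word list: lemma -> difficulty level.
WORDS = {"Haus": "beginner", "ziehen": "intermediate", "umziehen": "intermediate",
         "anziehen": "intermediate", "erziehen": "advanced"}
# Hypothetical glossary standing in for the translation model.
GLOSSARY = {"en": {"Haus": "house", "ziehen": "to pull", "umziehen": "to move house",
                   "anziehen": "to get dressed", "erziehen": "to raise (a child)"}}
rng = random.Random(42)


def random_words(n: int) -> List[str]:
    """Tool: up to n distinct random words from the whole list."""
    return rng.sample(sorted(WORDS), min(n, len(WORDS)))


def random_words_by_level(n: int, level: str) -> List[str]:
    """Tool: up to n distinct random words of a given difficulty level."""
    pool = sorted(w for w, lvl in WORDS.items() if lvl == level)
    return rng.sample(pool, min(n, len(pool)))


def translate(words: List[str], target_language: str) -> Dict[str, str]:
    """Tool: translate words (a real version would prompt the translation model)."""
    table = GLOSSARY[target_language]
    return {w: table[w] for w in words}


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class Finish:
    answer: Any


Step = Union[ToolCall, Finish]
History = List[Tuple[str, Any]]  # (source, content) pairs, starting with the task


def run_react(model: Callable[[History], Step], tools: Dict[str, Callable[..., Any]],
              task: str, max_steps: int = 5) -> Any:
    history: History = [("task", task)]
    for _ in range(max_steps):
        step = model(history)                        # reason: pick the next action
        if isinstance(step, Finish):
            return step.answer
        observation = tools[step.name](**step.args)  # act: run the chosen tool
        history.append((step.name, observation))     # observe: feed the result back
    raise RuntimeError("agent did not finish within max_steps")


def scripted_model(history: History) -> Step:
    """Stand-in for the reasoning LLM: fetch words, translate them, then stop."""
    source, content = history[-1]
    if source == "task":
        return ToolCall("random_words_by_level", {"n": 2, "level": "intermediate"})
    if source == "random_words_by_level":
        return ToolCall("translate", {"words": content, "target_language": "en"})
    return Finish(content)


TOOLS = {"random_words": random_words, "random_words_by_level": random_words_by_level,
         "translate": translate}
# A dict mapping two random intermediate words to their English glosses.
cards = run_react(scripted_model, TOOLS, "Make 2 intermediate German cards in English")
```

With LangGraph, a loop like this is typically provided by the framework (for example the prebuilt `create_react_agent` helper in `langgraph.prebuilt`), with a real chat model deciding the tool calls instead of `scripted_model`. Because the loop depends only on the model's interface, swapping one reasoning model for another leaves the tools untouched.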
The project uses LangGraph, which is model-agnostic, so it's easy to swap in different models. I started this project with a proprietary model (GPT-4o as the reasoning model), but switched to open-weight models served through Ollama (Qwen3 as the reasoning model and Llama 3.2 as the translation model).
As I'm most familiar with English, German and Spanish, I've used word lists for these three languages, but you can swap in whatever word lists you like. There are also several other extensions you could add to the project, such as:
- Revisit the source of the word lists for some of the languages: after processing, some of them seemed incomplete;
- Add more metadata for filtering the word lists. For example, you could record each word's part of speech (noun, adjective, verb and so on), which is very useful information for language learners;
- Add a long-term memory feature that keeps track of the words that have already been retrieved and turned into flashcards;
- Add tools to make more advanced flashcards. For example, Anki supports cloze deletion cards, which hide the target word in a sentence so you learn it in context.
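Two of these extensions are easy to sketch. Anki's cloze note type marks the hidden text as `{{c1::answer}}` (optionally `{{c1::answer::hint}}`), so a cloze card can be generated by wrapping the target word in a sentence. A long-term memory can start as simply as a JSON file of words that already have cards. `make_cloze` and `SeenWords` are hypothetical helpers, not part of the project or of Clanki, and in practice the example sentences would come from the model or a corpus:

```python
import json
import re
from pathlib import Path
from typing import List, Optional


def make_cloze(sentence: str, word: str, hint: Optional[str] = None) -> str:
    """Hide the first whole-word occurrence of `word` behind Anki cloze syntax."""
    deletion = f"{{{{c1::{word}::{hint}}}}}" if hint else f"{{{{c1::{word}}}}}"
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    # A function replacement stops re from interpreting backslashes in `deletion`.
    cloze, count = pattern.subn(lambda _match: deletion, sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} does not occur in {sentence!r}")
    return cloze


class SeenWords:
    """Hypothetical long-term memory: words already turned into cards, kept in a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.words = set(json.loads(path.read_text(encoding="utf-8"))) if path.exists() else set()

    def filter_new(self, candidates: List[str]) -> List[str]:
        """Drop words that already have a flashcard."""
        return [w for w in candidates if w not in self.words]

    def remember(self, words: List[str]) -> None:
        """Record new flashcard words and persist the whole set."""
        self.words.update(words)
        self.path.write_text(json.dumps(sorted(self.words), ensure_ascii=False), encoding="utf-8")
```

For example, `make_cloze("Wir ziehen nächste Woche um.", "ziehen", hint="move house")` returns `"Wir {{c1::ziehen::move house}} nächste Woche um."`. The `\b` word boundaries stop *ziehen* from matching inside *anziehen*. Running candidate words through `filter_new` before the translation step would keep the agent from recreating cards you already have.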
If you want to check out the full code, you can access it in this repo. I'll also be publishing a long-form tutorial on this project soon, so watch this space!
