OCR AI agent for handwritten documents

Cover image by Donatello Trisolino

Do you have a stack of old, handwritten family recipes lying around that you've been meaning to digitise? Are you curious about what's going on at the intersection of AI agents and computer vision?

In this project, I built an OCR agent that can not only read handwritten recipes, but also convert their measurements from imperial to metric. Built on a ReAct agent architecture, the agent has access to a number of tools, including:

Because the agent is built on LangGraph, it is also very simple to swap out models. As part of this project, I experimented with a proprietary model (GPT-4o as both the vision-language and reasoning model) and with open-weight models served through Ollama (Qwen2.5-VL as the vision-language model and Qwen3 as the reasoning model).
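To make the imperial-to-metric step more concrete, here is a minimal sketch of the kind of conversion function an agent like this could call as a tool. It is not the code from the project: the names `convert_to_metric` and `fahrenheit_to_celsius`, the unit table, and the choice of US customary volume measures are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical conversion table (not from the project).
# Volume factors assume US customary measures.
_TO_METRIC = {
    "oz": (28.3495, "g"),
    "lb": (453.592, "g"),
    "tsp": (4.92892, "ml"),
    "tbsp": (14.7868, "ml"),
    "cup": (236.588, "ml"),
    "inch": (2.54, "cm"),
}

# Hypothetical spelling variants that handwritten recipes might use.
_ALIASES = {
    "ounce": "oz", "ounces": "oz",
    "pound": "lb", "pounds": "lb", "lbs": "lb",
    "teaspoon": "tsp", "teaspoons": "tsp",
    "tablespoon": "tbsp", "tablespoons": "tbsp",
    "cups": "cup",
    "inches": "inch",
}


def convert_to_metric(quantity: float, unit: str) -> Tuple[float, str]:
    """Convert a quantity in an imperial kitchen unit to its metric equivalent."""
    # Normalise case, surrounding whitespace, and a trailing full stop ("lb.").
    key = unit.strip().lower().rstrip(".")
    key = _ALIASES.get(key, key)
    if key not in _TO_METRIC:
        raise ValueError(f"Unsupported unit: {unit!r}")
    factor, metric_unit = _TO_METRIC[key]
    return round(quantity * factor, 1), metric_unit


def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Convert an oven temperature from Fahrenheit to Celsius."""
    return round((degrees_f - 32) * 5 / 9, 1)


if __name__ == "__main__":
    print(convert_to_metric(2, "cups"))
    print(convert_to_metric(1, "lb"))
    print(fahrenheit_to_celsius(350))
```

Keeping the arithmetic in a plain, deterministic function means the reasoning model only has to decide when to call it, rather than doing the unit maths itself. In a LangChain or LangGraph setup, a function like this could be exposed to the agent by wrapping it with the `tool` decorator from `langchain_core.tools`.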

If you want to check out the full code, you can access it in this repo. I also presented this project as an OpenCV Live episode, if you want to see how to build everything step by step.

This project builds on this one from Hugging Face's excellent Agents Course.