Blog

Linear regression from scratch in Python

Almost six years ago (which means this blog is … old), I wrote what has become one of my favourite blog posts, explaining the linear algebra approach to linear regression (specifically to OLS, or ordinary least squares regression). I was …

Posted on March 2, 2026 • 15 minutes read

Why the LMArena is a flawed approach to LLM evaluation

In the last blog post, we discussed some of the major issues with LLM evaluation benchmarks, and why they are often a poor way of assessing model performance. So what, if anything, should replace them? One alternative you may have heard of is the …

Posted on February 9, 2026 • 6 minutes read

What LLM benchmarks get wrong about measuring model performance

Have you ever listened to the creators of the latest large language model talk about how their model is the "most powerful", "most intelligent" or "best" model to date, and wondered how they're measuring this? One of the most common ways this is …

Posted on January 19, 2026 • 11 minutes read

Data leakage is a major issue when measuring LLM performance

Have a think back to when GPT-4 came out in March 2023. One of the key things that OpenAI highlighted in their technical report was that this model was showing near-human performance, measured by how it scored on exams designed for humans. They …

Posted on December 29, 2025 • 8 minutes read

How convolutional neural networks (CNNs) work

Convolutional neural networks, CNNs, convnets, call them what you like - these powerful neural nets remain one of the most popular types of model for classifying images. But have you ever wondered how they work? In this blog post, we'll go through …

Posted on August 10, 2024 • 18 minutes read