About Blog Projects Talks Podcasts Tags Other work

Tag: NLP


13 posts with this tag


Why the LMArena is a flawed approach to LLM evaluation

In the last blog post, we discussed some of the major issues with LLM evaluation benchmarks, and why they are often a poor way of assessing model performance. So what, if anything, should replace them? One alternative you may have heard of is the …

Posted on February 9, 2026 • 6-minute read

What LLM benchmarks get wrong about measuring model performance

Have you ever listened to the creators of the latest large language model talk about how their model is the "most powerful", "most intelligent" or "best" model to date, and wondered how they're measuring this? One of the most common ways this is …

Posted on January 19, 2026 • 11-minute read

Data leakage is a major issue when measuring LLM performance

Have a think back to when GPT-4 came out in March 2023. One of the key things that OpenAI highlighted in their technical report was that this model was showing near-human performance, measured by how it scored on exams designed for humans. They …

Posted on December 29, 2025 • 8-minute read

Are LLMs really on the path to AGI?

The claims of artificial general intelligence, or AGI, have been some of the hottest and most emotionally charged discussions about large language models. In addition, these claims probably have the most intellectual weight behind them of all of the …

Posted on July 27, 2024 • 17-minute read

Could LLMs be conscious or sentient?

In June 2022, a story hit international news that a Google engineer believed that one of their large language models had achieved sentience. Blake Lemoine was testing Google's conversational LLM LaMDA (the model that went on to power the original …

Posted on July 13, 2024 • 11-minute read
Copyright © 2015 - 2026 Jodie Burchell   |   BY-NC 4.0