.NET Rocks! Podcast | Measuring LLMs

April 3, 2025

Cover image by .NET Rocks Podcast

How do you measure the quality of a large language model? In this episode, I join Carl and Richard again on the .NET Rocks! podcast to discuss my work measuring large language models for accuracy, reliability, and consistency. I talk about the variety of benchmarks that exist for LLMs and the problems they have. A broader conversation about quality digs into the idea that LLMs should be targeted to the particular topic area they are being used for; often, smaller is better! Building a good test suite for your LLM is challenging, but it can increase your confidence that the tool will work as expected.