Real Python Podcast | Preparing data to measure true machine learning model performance

December 2, 2022

Cover image by Real Python Podcast

How do you prepare a dataset for machine learning (ML)? How do you go beyond cleaning the data and move toward measuring how the model performs? In this episode, I return to the Real Python podcast to talk about strategies for better ML model performance.

I start by defining some terms for the conversation. We talk about targets, features, and supervised learning.
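To make those terms concrete, here is a minimal sketch in plain Python. The toy dataset, the column meanings, and the nearest-neighbor `predict` helper are all hypothetical illustrations, not material from the episode. It shows features (the inputs), a target (the label to predict), and a supervised model scored on examples it never saw during training.

```python
import random

# Hypothetical toy dataset: each row is (hours_studied, hours_slept, passed).
rows = [
    (1.0, 4.0, 0), (2.0, 5.0, 0), (2.5, 6.0, 0), (3.0, 4.5, 0),
    (6.0, 7.0, 1), (7.0, 8.0, 1), (8.0, 6.5, 1), (9.0, 7.5, 1),
]

# Features are the inputs the model sees; the target is the label to predict.
features = [row[:2] for row in rows]
targets = [row[2] for row in rows]

# Hold out part of the data so performance is measured on unseen examples.
indices = list(range(len(rows)))
random.Random(0).shuffle(indices)
test_idx, train_idx = indices[:2], indices[2:]


def predict(x):
    """1-nearest-neighbor: copy the target of the closest training example."""
    nearest = min(
        train_idx,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(features[i], x)),
    )
    return targets[nearest]


# Supervised learning: known targets guide the model and also grade it.
correct = sum(predict(features[i]) == targets[i] for i in test_idx)
accuracy = correct / len(test_idx)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice a library such as scikit-learn would handle the split and the model, but the idea is the same: the score only means something when the test rows were kept away from training.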

We discuss three common ways that data can alter model performance and which Python tools can help spot and avoid them. I share my personal experiences of working through these pitfalls. We also share a healthy collection of resources to explore and learn more.