My name is Jodie Burchell and I'm a data scientist living in the beautiful city of Berlin, Germany. This blog is a collection of my projects and things I've learned using Python, R, SQL and other tools. The opinions expressed here are my own and do not reflect those of my employer.

In our previous blog post we discussed how to implement the Minkowski distance formula in a couple of functions which relied heavily on for loops. On our full data, this led to a processing time of over an hour. With some simple tricks in NumPy which exploit the properties of …

In the last blog post, we managed to shave a bit of time off our calculation of the Minkowski distance by using vector subtraction. Instead of calculating the difference between each pair of vectors elementwise using a loop, we were able to take advantage of NumPy’s vectorised implementation to …

In the last blog post, we discussed how to calculate the Manhattan and Euclidean distances from first principles. However, in that post, we did a very manual implementation for a single pair of vectors, which would not generalise well to more than one pair and would become cumbersome for more …

If you had to invent a machine learning algorithm from scratch, what would be some of the ways you’d find patterns in your data? One idea that you might have come up with is to assume that data points that are “close” to each other are similar, and those …

During my years of working as a data scientist, I’ve tried quite a number of IDEs. When I was primarily working with R, RStudio was a very nice environment to work with, but when I moved to working in Python I hadn’t been able to find anything close …