AWS Sagemaker has a number of inbuilt algorithms, which are not only easier to use with the Sagemaker set up but are also optimised to work with AWS architecture. At my previous job, we used word embeddings extensively to help solve NLP problems. We …
For the past year and a half, I have been switching gradually from using matplotlib to create graphs in Python to Hassan Kibirige's wonderful port of R's ggplot2, plotnine. When I was first starting to use this package, I found it was quite tricky to …
While Spark is the best thing since sliced bread for dealing with big data, I definitely realise I have a lot to learn before I can use it to its full potential. One trick I recently discovered was using explicit schemas to speed up how fast PySpark …
I recently finished Jose Portilla's excellent Udemy course on PySpark, and of course I wanted to try out some things I learned in the course. I have been transitioning over to AWS Sagemaker for a lot of my work, but I haven't tried using it with …
In the last post I talked about how to find the coefficients that give us the line of best fit for a OLS regression problem using the normal solution. The core of this approach is the equation: $$ X^TXb = X^Ty $$The way we solved this in the previous …