About Blog Projects Talks Podcasts Tags Other work

Blog


Training and evaluating a Word2Vec model using BlazingText in Sagemaker

AWS Sagemaker has a number of built-in algorithms, which are not only easier to use with the Sagemaker setup but are also optimised to work with AWS architecture. At my previous job, we used word embeddings extensively to help solve NLP problems. We …

Posted on September 7, 2020 • 12 minutes read

Making beautiful boxplots using plotnine in Python

For the past year and a half, I have been switching gradually from using matplotlib to create graphs in Python to Hassan Kibirige's wonderful port of R's ggplot2, plotnine. When I was first starting to use this package, I found it was quite tricky to …

Posted on September 6, 2020 • 13 minutes read

Using schemas to speed up reading into Spark DataFrames

While Spark is the best thing since sliced bread for dealing with big data, I definitely realise I have a lot to learn before I can use it to its full potential. One trick I recently discovered was using explicit schemas to speed up how fast PySpark …

Posted on August 24, 2020 • 3 minutes read

Reading S3 data into a Spark DataFrame using Sagemaker

I recently finished Jose Portilla's excellent Udemy course on PySpark, and of course I wanted to try out some things I learned in the course. I have been transitioning over to AWS Sagemaker for a lot of my work, but I haven't tried using it with …

Posted on August 10, 2020 • 5 minutes read

Simplifying the normal equation with Gram-Schmidt

In the last post I talked about how to find the coefficients that give us the line of best fit for an OLS regression problem using the normal equation. The core of this approach is the equation: $$ X^TXb = X^Ty $$ The way we solved this in the previous …

Posted on July 27, 2020 • 8 minutes read
Copyright © 2015 - 2026 Jodie Burchell   |   BY-NC 4.0