Standard error

learnings and projects in data science

My name is Jodie Burchell and I'm a data scientist living in the beautiful city of Berlin, Germany. This blog is a collection of my projects and things I've learned using Python, R, SQL and other tools. The opinions expressed here are my own and do not reflect on my employer.

Training and evaluating a Word2Vec model using BlazingText in Sagemaker

AWS Sagemaker has a number of built-in algorithms, which are not only easier to use with the Sagemaker setup but are also optimised to work with AWS architecture. At my previous job, we used word embeddings extensively to help solve NLP problems. We found that AWS’s implementation of …


Making beautiful boxplots using plotnine in Python

For the past year and a half, I have been gradually switching from matplotlib to plotnine, Hassan Kibirige’s wonderful port of R’s ggplot2, for creating graphs in Python. When I was first starting to use this package, I found it was quite tricky to find clear instructions …


Using schemas to speed up reading into Spark DataFrames

While Spark is the best thing since sliced bread for dealing with big data, I definitely realise I have a lot to learn before I can use it to its full potential. One trick I recently discovered was using explicit schemas to speed up how quickly PySpark can read a …


Reading S3 data into a Spark DataFrame using Sagemaker

I recently finished Jose Portilla’s excellent Udemy course on PySpark, and of course I wanted to try out some things I learned in the course. I have been transitioning over to AWS Sagemaker for a lot of my work, but I haven’t tried using it with PySpark yet …


Simplifying the normal equation with Gram-Schmidt

In the last post I talked about how to find the coefficients that give us the line of best fit for an OLS regression problem using the normal equation. The core of this approach is the equation:

$$X^TXb = X^Ty$$

The way we solved this in the previous post …
