About Blog Projects Talks Podcasts Tags Other work

Blog


Using schemas to speed up reading into Spark DataFrames

While Spark is the best thing since sliced bread for dealing with big data, I definitely realise I have a lot to learn before I can use it to its full potential. One trick I recently discovered was using explicit schemas to speed up how fast PySpark …

Posted on August 24, 2020 • 3-minute read

Reading S3 data into a Spark DataFrame using SageMaker

I recently finished Jose Portilla's excellent Udemy course on PySpark, and of course I wanted to try out some things I learned in the course. I have been transitioning over to AWS SageMaker for a lot of my work, but I haven't tried using it with …

Posted on August 10, 2020 • 5-minute read

Simplifying the normal equation with Gram-Schmidt

In the last post I talked about how to find the coefficients that give us the line of best fit for an OLS regression problem using the normal equation. The core of this approach is the equation:

$$ X^TXb = X^Ty $$

The way we solved this in the previous …

Posted on July 27, 2020 • 8-minute read
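Gram-Schmidt is commonly used to simplify the normal equation through a QR factorisation. Writing $X = QR$, where $Q$ has orthonormal columns ($Q^TQ = I$) and $R$ is upper triangular, turns $X^TXb = X^Ty$ into $R^TRb = R^TQ^Ty$; because $R$ is invertible when $X$ has full column rank, this reduces to $Rb = Q^Ty$, which back substitution solves directly. The sketch below illustrates that general technique in plain Python (chosen so it runs without NumPy). It is not necessarily the derivation the post itself uses, and the function names `gram_schmidt_qr` and `solve_least_squares_qr` are hypothetical.

```python
import math


def gram_schmidt_qr(X):
    """Factor X (a list of rows with linearly independent columns) as X = QR.

    Uses classical Gram-Schmidt. Returns Q as a list of orthonormal column
    vectors and R as an upper-triangular matrix stored as a list of rows.
    """
    n_rows, n_cols = len(X), len(X[0])
    columns = [[float(X[i][j]) for i in range(n_rows)] for j in range(n_cols)]
    q_cols = []
    R = [[0.0] * n_cols for _ in range(n_cols)]
    for j, a in enumerate(columns):
        v = list(a)
        # Remove the component of column j along each earlier orthonormal vector.
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        # What remains is orthogonal to the earlier q's; normalise it.
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    return q_cols, R


def solve_least_squares_qr(X, y):
    """Solve X^T X b = X^T y by reducing it to R b = Q^T y."""
    q_cols, R = gram_schmidt_qr(X)
    # Q^T y: dot product of each orthonormal column with y.
    qty = [sum(qk * yk for qk, yk in zip(q, y)) for q in q_cols]
    n = len(R)
    b = [0.0] * n
    # R is upper triangular, so back substitution replaces a general solve.
    for i in range(n - 1, -1, -1):
        s = sum(R[i][k] * b[k] for k in range(i + 1, n))
        b[i] = (qty[i] - s) / R[i][i]
    return b
```

For example, with a column of ones for the intercept, `solve_least_squares_qr([[1, x] for x in range(1, 6)], [2, 4, 5, 4, 5])` returns coefficients close to `[2.2, 0.6]`. The payoff is that $X^TX$ is never formed; the only linear solve left is a triangular one.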

Solving OLS regression with linear algebra

When I first learned least-squares linear regression in my undergrad degree, I remember that we approached it in the "calculus" way: taking the sum of the squared differences for each observation and solving a massive (and tedious) equation until we …

Posted on July 13, 2020 • 9-minute read
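The linear-algebra route to least squares comes down to solving the normal equation $X^TXb = X^Ty$ for the coefficient vector $b$. As a minimal sketch, assuming the design matrix `X` is a list of rows with a leading column of ones for the intercept, the plain-Python code below forms $X^TX$ and $X^Ty$ and solves the resulting square system by Gaussian elimination. The helper names are hypothetical and this is not the post's own code; in practice a library such as NumPy would handle this more robustly.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve the square system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | rhs], so row operations act on both sides at once.
    aug = [[float(v) for v in a[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the now upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ols_normal_equation(X, y):
    """Return b solving X^T X b = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    return solve(XtX, Xty)
```

For example, `ols_normal_equation([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], [2, 4, 5, 4, 5])` returns coefficients close to `[2.2, 0.6]`, i.e. an intercept of 2.2 and a slope of 0.6, the same line the "calculus" approach gives.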

Working with matrices: powers and transposition

Part of the series Linear Algebra Basics:

1. Working with matrices: addition, subtraction and multiplication
2. Working with matrices: inversion
3. Working with matrices: powers and transposition

Today, we'll complete our series on basic matrix …

Posted on June 29, 2020 • 5-minute read
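The post is truncated here, but its title names two basic operations. As a hedged sketch in plain Python (matrices stored as lists of rows; all function names hypothetical), transposition swaps rows and columns, and a non-negative integer power $A^k$ of a square matrix is $A$ multiplied by itself $k$ times, with $A^0 = I$. The `matpow` helper uses exponentiation by squaring, which gives the same result as repeated multiplication with fewer products.

```python
def transpose(A):
    """Swap rows and columns: element (i, j) of the result is A[j][i]."""
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Multiply an m x n matrix by an n x p matrix."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    """The n x n identity matrix, which is A^0 for any n x n matrix A."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matpow(A, k):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    if len(A) != len(A[0]):
        raise ValueError("only square matrices have powers")
    if k < 0:
        raise ValueError("negative powers would need the inverse")
    result, base = identity(len(A)), A
    while k > 0:
        if k & 1:  # this bit of k contributes the current base power
            result = matmul(result, base)
        base = matmul(base, base)
        k >>= 1
    return result
```

For instance, `matpow([[1, 1], [1, 0]], 5)` gives `[[8, 5], [5, 3]]` (the Fibonacci numbers $F_6$, $F_5$ and $F_4$), and `transpose([[1, 2, 3], [4, 5, 6]])` gives `[[1, 4], [2, 5], [3, 6]]`.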
Copyright © 2015 - 2026 Jodie Burchell   |   BY-NC 4.0