# Standard error

#### learnings and projects in data science

My name is Jodie Burchell and I'm a data scientist living in the beautiful city of Berlin, Germany. This blog is a collection of my projects and things I've learned using Python, R, SQL and other tools. The opinions expressed here are my own and do not reflect on my employer.

## Reading S3 data into a Spark DataFrame using Sagemaker

I recently finished Jose Portilla’s excellent Udemy course on PySpark, and of course I wanted to try out some things I learned in the course. I have been transitioning over to AWS Sagemaker for a lot of my work, but I haven’t tried using it with PySpark yet …

## Simplifying the normal equation with Gram-Schmidt

In the last post I talked about how to find the coefficients that give us the line of best fit for an OLS regression problem using the normal equation. The core of this approach is the equation:

$$X^TXb = X^Ty$$

The way we solved this in the previous post …
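As a rough illustration of what the normal equation asks for, here is a minimal sketch in plain Python that builds $$X^TX$$ and $$X^Ty$$ and solves for $$b$$ by Gauss-Jordan elimination. It is not necessarily the method used in the post. The helper names (`transpose`, `matmul`, `solve`, `normal_equation`) and the toy data are made up for this example, and it assumes $$X$$ has full column rank so that $$X^TX$$ is invertible.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve(a, rhs):
    """Solve a @ x = rhs by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular (hypothetical helper, no error handling).
    """
    n = len(a)
    # Build the augmented matrix [a | rhs] using floats.
    aug = [[float(v) for v in a[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def normal_equation(X, y):
    """Return b such that (X^T X) b = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * t for x, t in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data lying exactly on y = 1 + 2x; the first column of X is the intercept.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
b = normal_equation(X, y)
print(b)  # intercept and slope, approximately 1 and 2
```

In practice a library routine such as a least-squares solver would usually be preferred over hand-rolled elimination; the sketch is only meant to make the algebra concrete.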

## Solving OLS regression with linear algebra

When I first learned least-squares linear regression in my undergrad degree, I remember that we approached it in the “calculus” way: taking the sum of the squared differences for each observation and solving a massive (and tedious) equation until we arrived at our coefficients and line of best fit. I …

## Working with matrices: powers and transposition

Today, we’ll complete our series on basic matrix operations by covering powers of a matrix and matrix transposition. In the previous posts, we covered matrix addition, subtraction and multiplication and matrix inversion.
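To make the two operations concrete, here is a small sketch in plain Python. The helper names (`transpose`, `matmul`, `identity`, `matrix_power`) and the example matrix are invented for illustration. It assumes a square matrix and a non-negative integer power, and treats a matrix power as repeated matrix multiplication starting from the identity matrix.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matrix_power(m, k):
    """Raise a square matrix to a non-negative integer power k."""
    result = identity(len(m))
    for _ in range(k):
        result = matmul(result, m)
    return result


A = [[1, 2], [3, 4]]
A_squared = matrix_power(A, 2)
A_transposed = transpose(A)

print(A_squared)     # [[7, 10], [15, 22]]
print(A_transposed)  # [[1, 3], [2, 4]]
```

Note that `matrix_power(A, 0)` returns the identity matrix, mirroring how a nonzero number raised to the power 0 is 1.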

### Powers of a matrix

Say we have a number $$a$$, and we’re asked to solve \(a …