My name is Jodie Burchell and I'm a data scientist living in the beautiful city of Berlin, Germany. This blog is a collection of my projects and things I've learned using Python, R, SQL and other tools. The opinions expressed here are my own and do not reflect those of my employer.

Using schemas to speed up reading into Spark DataFrames

While Spark is the best thing since sliced bread for dealing with big data, I definitely realise I have a lot to learn before I can use it to its full potential. One trick I recently discovered was using explicit schemas to reduce the time PySpark takes to read a …

