About Blog Projects Talks Podcasts Tags Other work

Tag: Pyspark


2 posts with this tag


Using schemas to speed up reading into Spark DataFrames

While Spark is the best thing since sliced bread for dealing with big data, I definitely realise I have a lot to learn before I can use it to its full potential. One trick I recently discovered was using explicit schemas to speed up how fast PySpark …

Posted on August 24, 2020 • 3-minute read

Reading S3 data into a Spark DataFrame using SageMaker

I recently finished Jose Portilla's excellent Udemy course on PySpark, and of course I wanted to try out some things I learned in the course. I have been transitioning over to AWS SageMaker for a lot of my work, but I haven't tried using it with …

Posted on August 10, 2020 • 5-minute read
Copyright © 2015 - 2026 Jodie Burchell   |   BY-NC 4.0