Optimising the Minkowski distance, part 3: removing redundant calculations written July 06, 2022 in python, machine learning

Optimising the Minkowski distance, part 2: broadcasting written June 29, 2022 in python, machine learning

Optimising the Minkowski distance, part 1: vector subtraction written June 14, 2022 in python, machine learning

Just enough linear algebra to understand the Minkowski distance written June 07, 2022 in python, machine learning

Exploring decision tree modelling with DataSpell written December 28, 2021 in python, machine learning

Finding your new favourite Christmas recipe using NLP written December 24, 2020 in nlp, machine learning

Automatic word2vec model tuning using Sagemaker written November 18, 2020 in aws, sagemaker, machine learning

Training and evaluating a Word2Vec model using BlazingText in Sagemaker written September 07, 2020 in aws, sagemaker, machine learning

Reading S3 data into a Spark DataFrame using Sagemaker written August 10, 2020 in aws, pyspark, sagemaker

Simplifying the normal equation with Gram-Schmidt written July 27, 2020 in maths, linear algebra, python

Working with matrices: powers and transposition written June 29, 2020 in maths, linear algebra, python

Working with matrices: addition, subtraction and multiplication written June 01, 2020 in maths, linear algebra, python

Making beautiful plots in Python (plus a shameless book plug!) written October 29, 2019 in python, ggplot2

Applying sentiment analysis with VADER and the Twitter API written April 15, 2017 in python, programming tips, text mining

Using VADER to handle sentiment analysis with social media text written April 08, 2017 in python, programming tips, text mining

Doing hierarchical clustering with a precalculated dissimilarity index written March 10, 2017 in r, programming tips, statistics

How do we feel about New Year’s resolutions (according to sentiment analysis)? written January 10, 2017 in python, programming tips, public data, twitter api, pandas

A crash course in reproducible research in Python written October 04, 2016 in python, pandas, virtualenvs, programming tips

Creating plots in R using ggplot2 - part 11: linear regression plots written May 11, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 10: boxplots written April 18, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 9: function plots written March 28, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 8: density plots written March 16, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 7: histograms written February 28, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 6: weighted scatterplots written February 13, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 5: scatterplots written February 04, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 4: stacked bar plots written January 19, 2016 in r, ggplot2, r graphing tutorials

Creating plots in R using ggplot2 - part 3: bar plots written January 07, 2016 in r, ggplot2, r graphing tutorials

What are the most popular Christmas movies according to MovieLens 10M? written December 23, 2015 in python, sql, web scraping, pandas, matplotlib

Creating plots in R using ggplot2 - part 2: area plots written December 22, 2015 in r, ggplot2, r graphing tutorials

Getting rJava to work in OS X El Capitan: A non-technical guide written December 16, 2015 in r, programming tips

Creating plots in R using ggplot2 - part 1: line plots written December 15, 2015 in r, ggplot2, r graphing tutorials

Analysing reddit data - part 4: data analysis written December 09, 2015 in python, programming tips, pandas, scipy, matplotlib, hypothesis testing

Analysing reddit data - part 3: cleaning and describing the data written December 02, 2015 in python, programming tips, public data, reddit api, pandas

Analysing reddit data - part 2: extracting the data written November 25, 2015 in python, programming tips, public data, reddit api, pandas

Analysing reddit data - part 1: setting up the environment written November 18, 2015 in python, programming tips, public data

Object-oriented programming in Python for a non-object-oriented programmer written November 11, 2015 in python, programming tips

Linear regression tools in R written November 04, 2015 in statistics, r, regression, programming tips

Interpreting linear regression coefficients written October 28, 2015 in statistics, r, hypothesis testing, regression

Using k-fold cross-validation to estimate out-of-sample accuracy written October 14, 2015 in machine learning, r, kaggle

Two-group hypothesis testing: permutation tests written October 07, 2015 in statistics, r, data simulations, hypothesis testing

Two-group hypothesis testing: independent samples t-tests written September 30, 2015 in statistics, r, data simulations, hypothesis testing

Interactive model decision trees in Stata written September 22, 2015 in stata, programming tips, consulting

A gentle introduction to the standard error of the mean written September 01, 2015 in statistics, r, data simulations