31 posts with this tag
One of the biggest issues when building an effective machine learning algorithm is overfitting. Overfitting occurs when a model built on your training data picks up not only the true relationship between the outcome and the predictors, but …
In the last blog post I described how you could test whether the difference between two groups was statistically significant using an independent-samples t-test. (I will rely heavily on that blog post in this one, so I encourage you to at least skim …
In some of my previous posts, I asked you to imagine that we work for a retail website that sells children's toys. In the past, they've asked us to estimate the mean number of page views per day (see here and here for my posts discussing this …
Unless we are lucky enough to have access to an entire population and the capacity to analyse all of that data, we have to make do with samples from our population to make statistical inferences. Choosing a sample that is a good representation of …
In the previous post, I explained the general principles behind the standard error of the mean (or SEM). The idea underlying the SEM is that if you take repeated samples from the population of interest and take the standard deviation of the means of …