I’m currently taking the Machine Learning in Python course, which is a 40-hour video course. About 30% of that is in R, though, so I’m skipping that part for now, as it’s basically all the Python exercises redone in R.
I’ve sent off an email asking the instructors if it’s ok that I share some of their examples in videos. I don’t wish to reveal any content against their wishes.
In the meantime, until I hear back from them, I can give a brief overview of some things.
Simple Linear Regression
What is it?
– It’s a regression (not classification) algorithm in which you have a dataset with one independent variable (x) and the actual results (y). You split that dataset into a training set (usually about 80% of the full dataset) and a test set (usually the remaining 20%).
– It’s based on y = b0 + b1x1, where x1 is the single independent variable, b0 is the intercept, and b1 is the slope.
– You can use Python’s linear regression classes to create your regressor model from the training set. From there you can run the test set through it to make predictions, then compare those predictions with the actual values in the test set to check the performance of your model.
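Here is a minimal sketch of those steps. I'm assuming scikit-learn's LinearRegression and train_test_split as the "Python linear regression classes" (the text above doesn't name a library), and the data is made up: y is roughly 3 + 2x plus noise. It is not an example from the course.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: y is roughly 3 + 2*x plus random noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)  # one independent variable
y = 3.0 + 2.0 * x[:, 0] + rng.normal(0, 1, size=100)

# 80% training set, 20% test set.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Fit y = b0 + b1*x1 on the training set only.
regressor = LinearRegression()
regressor.fit(x_train, y_train)
b0, b1 = regressor.intercept_, regressor.coef_[0]

# Predict on the test set and compare with the actual values.
y_pred = regressor.predict(x_test)
r2 = regressor.score(x_test, y_test)  # R^2 on data the model has not seen
print(f"b0={b0:.2f}, b1={b1:.2f}, test R^2={r2:.3f}")
```

The fitted b0 and b1 should land near the 3 and 2 used to generate the data, and the R^2 on the test set gives one number for how well the predictions match the actual values.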
Multiple Linear Regression
What is it?
– It’s a regression (not classification) algorithm in which you have a dataset with one or more independent variables (x1, x2, x3, etc.) and the dependent variable (y). You split that dataset into a training set (usually about 80% of the full dataset) and a test set (usually the remaining 20%).
– It’s based on y = b0 + b1x1 + b2x2 + … + bNxN
– To choose which independent variables to keep in the model, you can use techniques like Backward Elimination, Forward Selection, Bidirectional Elimination, or All In.
– “p-value” is a very important concept.
The p-value is the probability that, if the null hypothesis is true, sampling variation would produce an estimate at least as far from the hypothesized value as our data estimate. In other words, the p-value tells us how likely it is to get a result like this if the null hypothesis is true.
The null hypothesis is simply that the treatment or change will have no effect on the outcome of the experiment. For example, it might be that reducing the number of workouts does not affect the time to achieve weight loss.
The null hypothesis is the proposition that implies no effect or no relationship between phenomena or populations. Any observed difference would be due to sampling error (random chance) or experimental error.
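– A p-value below 0.05 (the usual significance level) means we have evidence of an effect, so we reject the null hypothesis. A p-value above 0.05 means we don't have enough evidence of an effect, so we fail to reject the null hypothesis.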
Start by assuming the null hypothesis is true (Helen defending herself in the video above). Take a sample and get a statistic.
Then ask how likely it is to get a statistic like this (mean = 68.7 g) if the null hypothesis is true. Here the p-value is 0.18.
0.18 is larger than 0.05 (the significance level), so we can't reject the null hypothesis. There isn't enough evidence against Helen's claim, and she's off the hook.
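To make the decision rule concrete, here is a rough sketch of how a p-value like that could be computed. Only the 68.7 g sample mean comes from the example above. The claimed mean of 70 g, the standard deviation, and the sample size are numbers I made up for illustration, and I'm using a simple one-sided z-test (standard library only), which may not be the test used in the video.

```python
from math import sqrt
from statistics import NormalDist

# Everything here except sample_mean is a hypothetical value.
claimed_mean = 70.0   # null hypothesis: the true mean is 70 g (assumed)
sample_mean = 68.7    # statistic from the sample (from the example above)
population_sd = 4.5   # assumed known standard deviation, in grams
n = 10                # assumed sample size
alpha = 0.05          # significance level

# Standard error of the sample mean if the null hypothesis is true.
standard_error = population_sd / sqrt(n)
z = (sample_mean - claimed_mean) / standard_error

# One-sided p-value: the chance of a sample mean this low or lower
# when the null hypothesis is true.
p_value = NormalDist().cdf(z)

reject_null = p_value < alpha
print(f"z={z:.2f}, p-value={p_value:.2f}, reject null: {reject_null}")
```

With these made-up inputs the p-value comes out above 0.05, so the null hypothesis is not rejected, which is the same decision as in the example.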
– There are different strategies for creating your model, such as the selection techniques listed above.
– You can use Python’s linear regression classes to create your regressor model from the training set. From there you can run the test set through it to make predictions, then compare those predictions with the actual values in the test set to check the performance of your model.
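As a rough sketch of how Backward Elimination ties the p-value to model building: fit the model with all variables, drop the one with the highest p-value if it is above the significance level, refit, and repeat. The helper names (ols_p_values, backward_elimination), the 0.05 threshold, and the data are my own assumptions for illustration. The ordinary least squares fit is hand-rolled with NumPy and SciPy rather than taken from the course.

```python
import numpy as np
from scipy import stats

def ols_p_values(X, y):
    """Fit ordinary least squares; return (coefficients, two-sided p-values)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    dof = n - k                                # degrees of freedom
    sigma2 = residuals @ residuals / dof       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients
    t_stats = coef / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
    return coef, p_values

def backward_elimination(X, y, names, significance_level=0.05):
    """Drop the variable with the highest p-value until all pass the level."""
    X = np.column_stack([np.ones(len(y)), X])  # column of ones for b0
    names = ["const"] + list(names)
    while X.shape[1] > 1:
        _, p_values = ols_p_values(X, y)
        # Skip index 0 so the intercept is never dropped.
        worst = 1 + int(np.argmax(p_values[1:]))
        if p_values[worst] <= significance_level:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    coef, _ = ols_p_values(X, y)
    return dict(zip(names, coef))

# Hypothetical data: y depends on x1 and x2, while x3 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 1, size=200)

kept = backward_elimination(X, y, ["x1", "x2", "x3"])
print(kept)  # surviving variables and their coefficients
```

In practice X and y here would be the training set, and the surviving variables are the ones you would use when fitting the final regressor and predicting on the test set.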