Machine Learning

Machine Learning – Polynomial Regression from a “non data scientist programmer” point of view

This is all from my own “beginner’s perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Supervised

With Linear Regression, our predictions are bounded by a straight line. That is not always the best approach, especially when you have outliers and other “curvy” patterns in your dependent variable that leave the points far from the “straight line” (resulting in much less accuracy in your model’s ability to predict). So what to do? In those cases where you are solving a regression problem, you can introduce “curves” by using the Polynomial Regression approach. You end up with a model that looks like the one below. Notice that a simple/multiple linear regression would not be the best choice in this scenario because the points would sit too far from the straight line. By using this particular regression approach for business problems (which tend not to fit around a straight line), your model can be trained more accurately.

One of the key elements is determining/exploring/investigating which algorithm will fit your model best.

[Image pr1: polynomial regression curve fitted through scattered data points]

Python, R, and JavaScript have libraries that implement this algorithm and make it relatively easy.
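As a minimal sketch of the idea in Python with scikit-learn (one common library choice; the quadratic dataset here is invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic data: y = 2x^2 + 3x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + 3 * X.ravel() + 1

# A straight line cannot follow this curve; degree-2 features let it bend
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: [x, x^2]

# Still an ordinary linear regression, just fitted on the expanded features
model = LinearRegression().fit(X_poly, y)
pred = model.predict(poly.transform([[10.0]]))
print(round(pred[0], 2))  # 231.0, i.e. 2*100 + 3*10 + 1
```

The "curve" comes entirely from the expanded features; the fitting step is the same linear regression as before.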

Machine Learning

Machine Learning – Linear Regression from a “non data scientist programmer” point of view


Algorithm Type: Supervised

There are a few types of linear regression.

Simple Linear Regression
– One independent variable (X)

The resulting output is a straight line with your scatter points around it. The closer the points are to the line, the better trained your model is to predict. You train your model on your independent variable (X-axis) paired with your dependent/target variable (Y-axis). You can generate these graphs with your “training set”, then run a batch of predictions with your “test set” (to see how good your model is) and compare the graphs for both to see how valid/accurate your model is. Usually 86%+ accuracy is considered good in many business cases.

[Image slr1: straight regression line with scattered data points around it]
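A minimal sketch in Python with scikit-learn (the data is invented and noise-free, so the fit recovers the line exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Points lying exactly on y = 5 + 2x
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 5 + 2 * X.ravel()

model = LinearRegression().fit(X, y)
print(round(model.intercept_, 2), round(model.coef_[0], 2))  # 5.0 2.0

# R^2 is 1.0 here because every point sits exactly on the line;
# real data scatters around it, pulling the score below 1
print(model.score(X, y))
```

On real data the score against a held-out test set, not the training set, is the number to watch.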

Multiple Linear Regression
– Multiple independent variables (x1, x2, …) with still just 1 dependent variable (Y-axis)

Key concept here: Since you’ll have a dataset with several independent variables, you will want to attempt to eliminate those variables that are “not significant enough to add to the value of your model”. In other words, “eliminate the noise / variables that you really don’t need in order to solve your business problem”.

You can do this with “backward elimination” by removing the independent variables your model does not really need. It will speed up your model significantly.

This is where you can use the concept of a p-value to achieve this.

You’d set up a significance threshold and include only those independent variables whose p-values fall below it (meaning those are the variable(s) that contribute significantly to the results of your model).

Since you have 2 axes for your independent variables in a multiple linear regression like this, the dependent variable is plotted on a third (Z) dimension. This is an example of multiple linear regression across 2 independent variables producing a dependent variable.

This was taken from http://www.mathworks.com

[Image mlr1: regression plane fitted over two independent variables]

Python, R, and JavaScript have libraries that implement this algorithm and make it relatively easy.
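A minimal two-variable sketch with scikit-learn (data invented and noise-free, so the fitted coefficients match the ones used to build it):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((50, 2))               # two independent variables: x1, x2
y = 4 + 3 * X[:, 0] + 2 * X[:, 1]     # y = b0 + b1*x1 + b2*x2 exactly

model = LinearRegression().fit(X, y)
# Recovers b1 = 3, b2 = 2 and intercept b0 = 4
print(np.round(model.coef_, 2), round(model.intercept_, 2))
```

With real data the coefficients only approximate the underlying relationship, and the p-value screening described above tells you which ones are worth keeping.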

Machine Learning

My experience only – where to start learning Machine Learning…

Mileage may vary and everyone is different. After putting off learning about machine learning for a while, I decided to bite the bullet and try to reach beginner to low/intermediate knowledge of the subject.

In no way, shape or form am I going to declare that I’m an expert or that I can take on an ML project completely on my own, etc. What I *can* say is that I wanted to get enough knowledge about machine learning so that I can at least discuss it with others and identify whether it would be applicable for a project and/or determine which model/algorithm(s) may be useful to solve a particular problem/case. There are tons of blogs out there, YouTube videos and such. My video(s)/blog(s) may be completely inferior on the subject. That’s ok. I’m just trying to share (as someone still learning) where I’m coming from, and if anyone else out there can benefit from any piece of info I present, then it’s a “win-win” for me and it was all worthwhile.

I put together this video here to describe the very very basics of how you can get started if you were in the same boat as I (an experienced developer with no prior real knowledge about Machine Learning but would like to understand more about it).

One key takeaway about many of the algorithms:

As they say, there is no perfect model, just one that seems to work better than others for a certain situation.

Here is the course I recommend (I cover this in the video), regardless of whether or not you have Pluralsight. I highly recommend it; it’s the best one I have found on the subject.

https://www.udemy.com/machinelearning/

And if you have a Pluralsight subscription, take this course first before you take the above one. If you don’t, you can just do the above one, but this Pluralsight course will help you a bit as you work through it.

https://www.pluralsight.com/courses/play-by-play-machine-learning-exposed

Machine Learning

Machine Learning Algorithms

In-progress

Classification Algorithms

K-Nearest Neighbor
a. It is supervised (it knows the dependent variable ahead of time)
b. Looks at the labelled points closest to the “unknown point to be classified”,
which is then “voted” into that label class.
c. Computationally expensive though (con)
SVM (Support Vector Machines)
a. In 2d space, the two points (on opposite sides) closest to the “equidistant line splitting the classes” are known as “support vectors”.
b. Use the kernel SVM trick in the case where a group of classes cannot be
linearly split (think a circle within a circle). You create a new dimension (Z) and elevate one of the classes so that a split can be created as a result of the “added dimension”.
Logistic Regression
a. Used when the dependent variable is categorical.
b. An example of a use case is email that is either spam or not.
c. Another example: a credit card company can use this to determine whether an applicant will have good credit or not.
Decision Trees
a. A single decision tree has the potential for overfitting (results tied too closely to the training set).
Random Forest (Ensemble) – a bunch of decision trees whose results are combined (averaged/voted) to get the class.
a. Solves the issue with single decision trees in that overfitting is much less likely due to “the power of the crowd”.
Naive Bayes
a. Tends to outperform very sophisticated methods and is useful for large datasets.
b. Assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature, meaning “independence among features”.
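A quick side-by-side of a few of the classifiers listed above, all from scikit-learn, on the same invented two-cluster toy dataset (class 0 near the origin, class 1 near (5, 5)):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Two well-separated clusters with invented labels
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

classifiers = {
    "knn": KNeighborsClassifier(n_neighbors=3),
    "logistic": LogisticRegression(),
    "random forest": RandomForestClassifier(random_state=0),
    "naive bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    clf.fit(X, y)
    # Both test points sit squarely inside one cluster, so every
    # classifier here should agree on the labels (0 then 1)
    print(name, clf.predict([[0.5, 0.5], [5.5, 5.5]]))
```

On an easy dataset like this they all agree; the differences between them only start to matter on messier, overlapping data.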

Supervised

Unsupervised

Machine Learning

Machine Learning – Simple/Multiple Linear Regression

I’m currently taking the Machine Learning in Python course, which is a 40-hour video course. About 30% of that is in R though, so I’m skipping it for now as it’s basically all the Python exercises redone in R.

I’ve sent off an email asking the instructors if it’s ok that I share some of their examples in videos. I don’t wish to reveal any content against their wishes.

In the meantime, until I hear back from them, I can give a brief overview of some things.

Simple Linear Regression

What is it?

– It’s a regression (not classification) algorithm in which you have a dataset of 1 independent variable (x) with actual results (y). Based on that dataset, you split it into a training set (usually about 80%) and a test set (usually the remaining 20% of the full dataset).

– It’s based on y = b0 + b1x1, where you have only that one independent variable x1

– You can use the Python linear regression classes to create your regressor model from the training set. From there you can run the test set against it to make predictions and compare the predictions with the actual values from the test set to check the performance of your model.
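A sketch of that train/test workflow with scikit-learn (the 100-point dataset is synthetic: a line plus a little noise):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((100, 1)) * 10
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, 100)  # y = 2x + 1 plus noise

# 80% training, 20% test, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression().fit(X_train, y_train)
print(len(X_train), len(X_test))                   # 80 20
# R^2 on the held-out test set; close to 1.0 on this nearly-clean data
print(round(regressor.score(X_test, y_test), 2))
```

The key point is that the score is computed on data the model never saw during fitting, which is what tells you whether it generalizes.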

Multiple Linear Regression

What is it?

– It’s a regression (not classification) algorithm in which you have a dataset of 1 to many independent variables (x1, x2, x3, etc.) with the dependent variable results (y). Based on that dataset, you split it into a training set (usually about 80%) and a test set (usually the remaining 20% of the full dataset).

– It’s based on y = b0 + b1x1 + b2x2 + … + bNxN

– You can use techniques like Backward Elimination, Forward Selection, Bidirectional Elimination, or All-In.

– “p-value” is a very important concept.
The p-value is the probability that, if the null hypothesis is true, sampling variation would produce an estimate further away from the hypothesized value than our data estimate. In other words, the p-value tells us how likely it is to get a result like this if the null hypothesis is true.
The null hypothesis is simply that the treatment or change will have no effect on the outcome of the experiment. For this example, it would be that reducing the number of workouts would not affect the time to achieve weight loss.

The null hypothesis is the proposition that implies no effect or no relationship between phenomena or populations. Any observed difference would be due to sampling error (random chance) or experimental error.
– A p-value < 0.05 means we have evidence of an effect. A p-value above 0.05 means we have no evidence against the null hypothesis (which is not the same as proving there is no effect).

Start by assuming the null hypothesis is TRUE (Helen defending herself in the video above). Take a sample and get a statistic.
With the sample, ask how likely it is to get a statistic like this (mean = 68.7g) if the null hypothesis is true. Here the p-value = 0.18.
0.18 is larger than 0.05 (the significance level), so we fail to reject the null hypothesis and Helen is off the hook.
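The same decision rule can be sketched with SciPy’s one-sample t-test. The sample numbers below are invented (they are not the data from the video); the null hypothesis is that the true mean equals 70:

```python
from scipy import stats

# Invented sample, hovering close to the hypothesized mean of 70
sample = [69.5, 70.2, 69.8, 70.5, 69.1, 70.3, 69.9, 70.0]

t_stat, p_value = stats.ttest_1samp(sample, popmean=70)
alpha = 0.05  # the usual significance level

if p_value < alpha:
    print("reject the null hypothesis")
else:
    # Here the sample mean is so close to 70 that p is well above 0.05
    print("fail to reject the null hypothesis")
```

Just as in the Helen example, a p-value above the significance level means the data is consistent with the null hypothesis, so nothing is "proven", we simply fail to reject it.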

– There are different strategies for creating your model.

– You can use the Python linear regression classes to create your regressor model from the training set. From there you can run the test set against it to make predictions and compare the predictions with the actual values from the test set to check the performance of your model.
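The Backward Elimination idea above can be sketched as a loop: fit an ordinary least squares model, look at each coefficient's p-value, and drop the worst predictor while it is above the 0.05 significance level. Libraries such as statsmodels report these p-values directly in their OLS summary; here they are computed by hand with NumPy/SciPy, and the dataset is synthetic, with one deliberately useless "noise" feature:

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided p-values for each column of X (an intercept is added internally)."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    df = len(y) - Xc.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    t = beta / se
    return 2 * stats.t.sf(np.abs(t), df)[1:]  # drop the intercept's p-value

rng = np.random.default_rng(1)
n = 200
x1, x2, noise = rng.random(n), rng.random(n), rng.random(n)
y = 5 + 3 * x1 + 2 * x2 + rng.normal(0, 0.1, n)  # "noise" plays no role in y

X = np.column_stack([x1, x2, noise])
names = ["x1", "x2", "noise"]

# Backward Elimination: remove the least significant predictor until
# everything left is below the 0.05 threshold
while True:
    p = ols_pvalues(X, y)
    worst = int(np.argmax(p))
    if p[worst] < 0.05 or X.shape[1] == 1:
        break
    X = np.delete(X, worst, axis=1)
    names.pop(worst)

print(names)  # the pure-noise column is typically the one eliminated
```

This is the "eliminate the noise" step from the multiple linear regression post made concrete: the significant predictors survive and the irrelevant one is pruned away.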