Machine Learning

Machine Learning – PCA Dimensional Reduction from a “non data scientist programmer” point of view

This is all from my own “beginner’s perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Dimensionality Reduction (often used as a preprocessing step before classification)

From this wiki link:

In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

I created a video, rough around the edges, with the kind permission of the staff at SuperDataScience (Kirill Eremenko, Hadelin de Ponteves, and the SuperDataScience Team and Support), based on their awesome course here:

https://www.udemy.com/machinelearning/

I learned a lot from it, and 100% of the credit for everything I show and explain in the video goes to them.

This demonstrates PCA dimensionality reduction in action using one of the workshops from the course.
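To give a feel for what this looks like in code, here’s a minimal sketch using scikit-learn’s PCA. The data here is made-up stand-in data, not the course’s data set:

from sklearn.decomposition import PCA
import numpy as np

# Made-up stand-in data: 100 samples with 10 features each
rng = np.random.default_rng(0)
X = rng.normal(size = (100, 10))

# Reduce the 10 features down to 2 principal components
# (2 is a common choice so the result can be plotted)
pca = PCA(n_components = 2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # variance retained per component

The explained_variance_ratio_ output is the quick sanity check: if the first couple of components retain most of the variance, the reduction didn’t throw away much signal.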

Machine Learning

Machine Learning – NLP (Natural Language Processing) from a “non data scientist programmer” point of view

This is all from my own “beginner’s perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Classification

From this wiki link:

Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.

I created a video, rough around the edges, with the kind permission of the staff at SuperDataScience (Kirill Eremenko, Hadelin de Ponteves, and the SuperDataScience Team and Support), based on their awesome course here:

https://www.udemy.com/machinelearning/

I learned a lot from it, and 100% of the credit for everything I show and explain in the video goes to them.

This demonstrates NLP in action using one of the workshops from the course.
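To give a rough idea of what an NLP classification workflow can look like, here’s a toy sketch I put together (the reviews and labels are made up; this is not the course’s exact code). You turn raw text into a “bag of words” count matrix and feed it to a classifier:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data: short reviews with 1 = positive, 0 = negative
reviews = ["loved the food", "terrible service", "great place", "awful meal"]
labels = [1, 0, 1, 0]

# Bag of words: each review becomes a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# Naive Bayes is a common, simple baseline for text classification
clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["the food was great"])))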

Machine Learning

Machine Learning – Reinforcement Learning via Thompson Sampling, a “non data scientist programmer” point of view

This is all from my own “beginner’s perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Reinforcement

Use Cases:
Training robot dogs to walk
Games where you give rewards or penalties, etc.

Reinforcement learning involves the model “training itself and making adjustments on the fly”. As more data comes in, it learns by making adjustments. Consider a chess program designed to “learn from its past mistakes with each game it plays in order to get better results”.

There are two widely used algorithms for this:
UCB – Upper Confidence Bound
Thompson Sampling

This post will focus on Thompson Sampling (which, in the end, seemed to outperform UCB).

You are trying to combine exploration and exploitation in order to make predictions without needing “a whole ton of data and history from the get-go”. There may be times when you don’t have that much data or history at your fingertips, and it may not be economically feasible to gather it either.

(ALL images below are from this recommended course on Udemy by Kirill Eremenko and Hadelin de Ponteves)

There’s a famous dilemma called “The Multi-Armed Bandit Problem”. In a nutshell, you are presented with a bunch of slot machines

ucb1

and you don’t know which one is going to give you the “best payback”. So you can try them all a bunch of times (and lose a lot of money in the process) to figure out which one pays out best.

Once you figure out which one is best, you can exploit it (advertising is a key business case).

You want to minimize costs and time and get to the optimal result as soon as you can.

The next image doesn’t explain it in detail, but it shows the high-level difference between this and UCB. You basically create your own imaginary ideal point for each machine

ts1

then with each round you play the machine whose sampled value is highest and observe its actual result (which is different from the imaginary point)

ts2

If you want to see that there’s some complex math going on between rounds, this should convince you (only dig into the formula if you really want to know how it works; the APIs will do most of this for you).

ts3

You go through many, many rounds of this, adjusting your points based on the newly calculated distributions, and eventually you end up with points like these (the example shows 3 machines, and we want to find the one with the best chance of a payout).

ts4
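To make the rounds concrete, here’s a minimal Thompson Sampling sketch for slot machines that pay out with some hidden probability (the payout probabilities and variable names are made up for illustration; this is not the course’s code):

import random

true_probs = [0.2, 0.5, 0.75]      # hidden payout chance per machine (made up)
wins = [0] * len(true_probs)       # observed payouts per machine
losses = [0] * len(true_probs)     # observed misses per machine

for _ in range(1000):
    # Draw one "imaginary point" per machine from its Beta distribution;
    # the distribution narrows around the true value as evidence accumulates
    samples = [random.betavariate(wins[i] + 1, losses[i] + 1)
               for i in range(len(true_probs))]
    machine = samples.index(max(samples))   # play the highest sample
    if random.random() < true_probs[machine]:
        wins[machine] += 1
    else:
        losses[machine] += 1

print(wins, losses)   # most plays should concentrate on the last machine

The nice part is that exploration comes for free: machines we know little about still produce wide, occasionally high samples, so they keep getting tried until the evidence says otherwise.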

Here’s a really good site that explains a lot about Thompson Sampling. It’s worth a read:

https://www.quora.com/What-is-Thompson-sampling-in-laymans-terms

ts5

From the lab in this recommended course on Udemy (by Kirill Eremenko and Hadelin de Ponteves), the results show that Thompson Sampling is more accurate than the UCB approach on the same problem. 4 is the predominant value in our test data set.

Thompson Sampling:

ts6

and the UCB on the same problem:
ucb6

Machine Learning

Machine Learning – Reinforcement Learning via UCB, a “non data scientist programmer” point of view

This is all from my own “beginner’s perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Reinforcement

Use Cases:
Training robot dogs to walk
Games where you give rewards or penalties, etc.

Reinforcement learning involves the model “training itself and making adjustments on the fly”. As more data comes in, it learns by making adjustments. Consider a chess program designed to “learn from its past mistakes with each game it plays in order to get better results”.

There are two widely used algorithms for this:
UCB – Upper Confidence Bound
Thompson Sampling

This post will focus on UCB.

You are trying to combine exploration and exploitation in order to make predictions without needing “a whole ton of data and history from the get-go”. There may be times when you don’t have that much data or history at your fingertips, and it may not be economically feasible to gather it either.

(ALL images below are from this recommended course on Udemy by Kirill Eremenko and Hadelin de Ponteves)

There’s a famous dilemma called “The Multi-Armed Bandit Problem”. In a nutshell, you are presented with a bunch of slot machines

ucb1

and you don’t know which one is going to give you the “best payback”. So you can try them all a bunch of times (and lose a lot of money in the process) to figure out which one pays out best.

Once you figure out which one is best, you can exploit it (advertising is a key business case).

You want to minimize costs and time and get to the optimal result as soon as you can.

The algorithm assumes the same starting point for every distribution, so we put them all on the same line (see the red dotted lines). The colored lines are known only at the end of our exploration and are shown for reference.

ucb2

Over time, as we learn more and more through our exploration, the formula continues to adjust the starting points. The colored lines are known only at the end of our exploration and are shown for reference.

ucb4

Finally, after a bunch more iterations (exploration), we notice our original starting points have shifted further

ucb5

and we can see that the machine with the best output is the fifth one.

The key is to try to minimize the amount of exploration needed.
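Here’s a minimal sketch of the UCB1 idea in the same slot-machine setting (made-up payout probabilities and my own variable names, not the course’s code):

import math
import random

true_probs = [0.2, 0.5, 0.75]        # hidden payout chance per machine (made up)
plays = [0] * len(true_probs)        # times each machine was played
rewards = [0.0] * len(true_probs)    # total reward per machine

for n in range(1, 1001):
    ucb_values = []
    for i in range(len(true_probs)):
        if plays[i] == 0:
            ucb_values.append(float('inf'))   # play every machine at least once
        else:
            average = rewards[i] / plays[i]
            # The confidence bound shrinks as a machine gets played more
            bound = math.sqrt(2 * math.log(n) / plays[i])
            ucb_values.append(average + bound)
    machine = ucb_values.index(max(ucb_values))
    plays[machine] += 1
    if random.random() < true_probs[machine]:
        rewards[machine] += 1

print(plays)   # the best machine should end up with the most plays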

In this example from the great machine learning course, we end up with a final result like this after churning through the UCB algorithm:

ucb6

Machine Learning

Machine Learning – Hierarchical Clustering from a “non data scientist programmer” point of view

This is all from my own “beginner’s perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Clustering

Use Cases:
Inventory categorization
Detecting bots
Behavioral segmentation (determining which user types, based on interests, are visiting your website, etc.)

With clustering, you don’t know in advance what you’re looking for; you are trying to identify “segments” (clusters) in your data for classification.

Hierarchical clustering tends to produce the same results as K-Means, but it uses a different, iterative process based on “dendrograms”.

The “memory” of the iterative process of reducing N clusters down to 1 is stored in a dendrogram.

A dendrogram is a diagram that shows the hierarchical relationship between objects. This page here provides a good explanation.

hc9

(ALL images below are from this recommended course on Udemy by Kirill Eremenko and Hadelin de Ponteves)

hc5

hc4

hc3

hc2

hc1

In this example, our dendrogram resulted in this:

hc5

Drawing a horizontal line through the largest uninterrupted vertical line, we see that it intersects 5 lines, which results in an optimal configuration of 5 clusters.

hc6
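For reference, the dendrogram itself is typically built with scipy. Here’s a minimal sketch with made-up stand-in data in place of the course’s data set:

import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt

# Made-up stand-in data: 20 points in 2 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size = (20, 2))

# Ward linkage merges the pair of clusters that minimizes the
# increase in within-cluster variance at each step
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Observations')
plt.ylabel('Euclidean distance')
plt.show()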

Our clustering code would then look like:

from sklearn.cluster import AgglomerativeClustering

# n_clusters = 5 was derived from the dendrogram chart above!!!
# X is the same feature matrix the dendrogram was built from
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
y_hc = hc.fit_predict(X)

AgglomerativeClustering works from the “bottom up” (each point starts as its own cluster and clusters get merged), and it is generally preferred over the other type, divisive clustering, which works from the top down.