Machine Learning

Forecasting with Prophet (by Facebook) from a non-data scientist programmer point of view.

This is all from my own “beginner's perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Forecasting

Based on a course I took, I learned a bit about Prophet (by Facebook) and wanted to share what I learned about it.
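
Since the post itself is light on detail, here's a minimal sketch of the typical Prophet workflow as I understand it. The synthetic data is purely illustrative; the `ds`/`y` column names are what the library requires:

```python
import pandas as pd
from prophet import Prophet  # older installs use: from fbprophet import Prophet

# Prophet expects a dataframe with exactly two columns:
# 'ds' (dates) and 'y' (the values to forecast).
# Synthetic daily data, purely for illustration.
dates = pd.date_range("2022-01-01", periods=365, freq="D")
df = pd.DataFrame({"ds": dates, "y": range(365)})

model = Prophet()  # trend and seasonality are handled automatically
model.fit(df)

# Extend 90 days past the training data and forecast.
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)

# yhat is the point forecast; yhat_lower/yhat_upper bound the uncertainty interval.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```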

Machine Learning

Differences between Deep Learning and Machine Learning from a non-data scientist programmer point of view.

Machine Learning

  • You select the model to train (e.g., Naive Bayes, Random Forest, etc.; see the sketch after this list)
  • You manually extract the features in your dataset that you want to include in your model
  • Enables the machine to learn “with experience”.
  • Consists of learning methods such as “supervised learning” (labeled data: classification for discrete values, regression for predicting continuous output), “unsupervised learning” (unlabeled data, e.g., clustering), and “reinforcement learning” (learns by experience, such as games)
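
As a concrete illustration of those points, here's a minimal scikit-learn sketch of supervised learning where we pick both the model and the features ourselves (the feature choice is purely illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Supervised learning: labeled data, and *we* choose the model and the features.
X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]  # manually select two features (petal length/width) for illustration

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)  # we picked the model
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```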

Deep Learning

  • You select the architecture (e.g., convolutional neural networks, ANNs, RNNs, etc.)
  • The features are automatically extracted/calculated from the inputs you provide to the model (it uses the NN to infer them). For example, you feed in images and the NN figures out for itself which features to look for (see the sketch after this list).
  • DL is a subset of ML, and ML is a subset of AI (artificial intelligence)
  • Relies on training deep artificial NNs on a large dataset (e.g., images)
  • Loosely modeled on the way the human brain reasons: layers of neurons feed their outputs to the next layer down the chain until an answer is deduced.
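
To contrast with the sketch above, here's a hedged Keras sketch of the deep learning side: we pick the architecture (a tiny CNN) and feed in raw pixels, and the convolutional layers learn the features themselves. The shapes and the random “images” are placeholders, not a real dataset:

```python
import numpy as np
import tensorflow as tf

# We select the architecture; the features are learned from the raw pixels.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # learned feature detectors
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),   # 10 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder "images" just so the sketch runs end to end.
X = np.random.rand(64, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=64)
model.fit(X, y, epochs=1, verbose=0)
```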

 

Machine Learning

Machine Learning – Boosting algorithms (Gradient and Adaptive Boosting) from a “non data scientist programmer” point of view

This is all from my own “beginner's perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Classification and Regression

AdaBoost, based on Adaptive Boosting
AdaBoost converts many weak learners (typically one-level decision trees, or “stumps”) into a single strong learner: the weak learners are trained sequentially, the training data is re-weighted so each new learner focuses on the examples the previous ones got wrong, and the final prediction is a weighted vote of all of them. This is different from Random Forests, which are an ensemble of full decision trees trained independently (not weak learners trained in sequence).
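
Here's a minimal scikit-learn sketch of AdaBoost on a toy dataset (an illustration, not the course's code; scikit-learn's default base estimator is a one-level decision tree, i.e., a stump):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy data purely for illustration.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 sequentially trained weak learners (decision stumps by default),
# each re-weighted to focus on the previous ones' mistakes.
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```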

Gradient Boosting, based on boosting decision trees
Like AdaBoost, it builds trees sequentially, but each new tree is fit to the residual errors (the gradient of the loss) of the ensemble built so far. Boosting algorithms have gained popularity recently for their effectiveness. A popular and growing open source library based on the gradient boosting algorithm is “XGBoost”.
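
For the curious, here's a minimal hedged sketch of XGBoost's scikit-learn-style interface (the toy dataset and hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Toy data purely for illustration.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to correct the errors of the ensemble built so far.
model = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```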

Here’s a video I put together demonstrating a popular open source library called XGBoost.

Machine Learning

Machine Learning – Deep Learning/Neural Networks from a “non data scientist programmer” point of view

This is all from my own “beginner's perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂

Rather than trying to really explain neural networks myself (it can be a daunting and complex topic) and just mumbling and fumbling around with my explanation, I found this excellent video that explains what a neural network is so that someone like me can understand it.

and this one continues on with a JavaScript example:

I feel that's a great explanation, because several of the courses tend to over-explain it in detail at first. Without a basic grounding like the above, it's difficult to follow many of the courses' explanations.

This one too:

Some key points that I took away from this:

A neural network is an artificially created “machine” that is trained to learn how to predict results based on “important inputs”.

Like any other machine in the world, you have to build it.

The “machine” consists of:
– 3+ layers: input (1 layer), hidden (the heart/engine, 1 or many layers), and output (1 layer)
– The hidden layer(s) start out with random biases and weights and continue to make adjustments (backward propagation) based on configurable error thresholds/learning rates.
– The machine learns through each iteration (epoch); the weight adjustments accumulate from epoch to epoch, so each pass builds on the error corrections (and learning rate) of the passes before it. An accumulative effect. It learns by doing. (See the sketch after this list.)
– The “machine” will need regular maintenance and tune-ups like most other machines in the world (e.g., cars, manufacturing, robots). The tune-ups may require some re-training based on new environmental factors that impact the input data.
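
To make those points concrete, here's a minimal from-scratch sketch (plain NumPy, learning XOR) showing the pieces the list describes: random starting weights, a forward pass, backward propagation, a configurable learning rate, and epochs. The network size and hyperparameters are my own illustrative choices, not from any course:

```python
import numpy as np

# A tiny 2-3-1 network trained on XOR, purely for illustration.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The hidden and output layers start out with random weights/biases.
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))
lr = 0.5  # the configurable learning rate

for epoch in range(10000):  # each full pass over the data is an epoch
    # Forward pass: inputs -> hidden layer -> output layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward propagation: push the error back and nudge the weights.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round())  # should approach [[0], [1], [1], [0]]
```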

Machine Learning

Machine Learning – LDA/Kernel PCA Dimensional Reduction from a “non data scientist programmer” point of view

This is all from my own “beginner's perspective”. I am NOT claiming to be an expert and welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate 🙂 There are no perfect models. The key is to find the best algorithm for the specific job/problem that needs to be solved.

Algorithm Type: Classification

In this link, I demoed PCA.

The other two techniques in the course (that I recommend) for dimensionality reduction are:

LDA
Kernel PCA

The difference between them is the way they construct the reduced set of features:

PCA – unsupervised; projects the data onto the directions of maximum variance, ignoring any class labels
LDA – supervised; uses the class labels to find the directions that best separate the classes
Kernel PCA – a nonlinear variant of PCA; uses the kernel trick to implicitly map the data into a higher-dimensional space before extracting components
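
Here's a minimal scikit-learn sketch applying both transformers side by side (a hedged illustration, not the course's code; the wine dataset and the RBF kernel are my own illustrative choices):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # both techniques like scaled inputs

# LDA is supervised: it uses the labels y to find class-separating directions.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Kernel PCA is unsupervised and nonlinear: the RBF kernel (an illustrative
# choice) implicitly maps the data into a higher-dimensional space first.
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

print(X_lda.shape, X_kpca.shape)  # both reduced to 2 dimensions
```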

After careful consideration (I apologize if this disappoints anyone), I have decided not to do videos/demos of LDA and Kernel PCA from the course. My decision was solely out of respect for the folks at SuperDataScience who put the course together. It's a very economical course, and you can get it on sale very cheap on certain days. It's a no-brainer to buy, as you will learn a lot. I just don't want to give away too much of their code, because it's worth buying the course.

I will still demo some of the content from that course in other videos, but I will leave enough out that the majority of their code is not shown. So basically, I'm providing a sampling (albeit a good one in spots), leaving out just enough to preserve the integrity of their course, since they were kind enough to allow me to use it in these demos.

Having said that, LDA/Kernel PCA are just two other methods for performing dimensionality reduction.

Here’s a good link that explains the differences between Kernel/Standard PCA

https://stats.stackexchange.com/questions/94463/what-are-the-advantages-of-kernel-pca-over-standard-pca

For Kernel PCA, think higher-dimensional space, as was also the case with Kernel SVM (Support Vector Machine).

Also a very good link comparing the differences between PCA and LDA techniques.

And if you want practice with examples/labs, the machine learning course by SuperDataScience academy will provide you with it 🙂