This is all from my own beginner's perspective. I am NOT claiming to be an expert, and I welcome any constructive criticism and corrections to anything I may have said that might not be completely accurate. There are no perfect models; the key is to find the best algorithm for the specific job/problem that needs to be solved.
Algorithm Type: Association
Use Cases:
Online shop recommendations (people who purchased this item also tended to purchase…).
People who enjoyed this album may also enjoy the following albums …
Market Basket Analysis to increase sales
(ALL images below from this recommended course on Udemy by Kirill Eremenko, Hadelin de Ponteves)
Apriori is all about “identifying associations” between items that may seem completely unrelated. It’s an algorithm used for mining frequent item sets and their association rules.
An example from the course is the following:
A retailer determined, through the use of historical data of purchases per transaction, that people who purchased diapers also tended to buy beer. Diapers and beer are completely unrelated, and you would not expect there to be an association between them.
So here is an example of the dataset used from the course
Think of this as an example dataset of “transactions” at a store. Each transaction contains a list of what was purchased. From that list (which could be thousands of transactions), we’d like to find associations between those products so that we can arrange the shelves containing associated items “close by” to increase sales.
Your dataset would be translated (via API calls) to something like this (images from another good course I took: https://www.udemy.com/choosing-the-right-machine-learning-algorithm):
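For concreteness, here’s a sketch (with made-up mini-data, not the course’s actual dataset) of what that translated structure typically looks like in Python: each transaction becomes a list of the product names it contains.

```python
# Hypothetical mini-dataset: each inner list is one store transaction,
# i.e. the products bought together in a single purchase.
transactions = [
    ["eggs", "cheese", "bread"],
    ["eggs", "milk"],
    ["eggs", "cheese", "milk"],
    ["bread", "butter"],
]

print(len(transactions))   # number of transactions
print(transactions[0])     # products in the first transaction
```

A real dataset would have thousands of such lists, but the shape is the same: a list of transactions, each a list of items.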

Some key important concepts are:
– Support

I know I’m jumping between examples, but here’s a visual from the course. Here, out of our dataset, how many people have purchased eggs?
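In formula terms, support(X) is the fraction of all transactions that contain X. A minimal sketch of that calculation, using made-up data (not the course’s dataset):

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

# Made-up mini-dataset for illustration.
transactions = [
    ["eggs", "cheese"],
    ["eggs", "milk"],
    ["bread", "butter"],
    ["eggs", "cheese", "bread"],
]

# 3 of the 4 transactions contain eggs -> support = 0.75
print(support(transactions, ["eggs"]))
```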
– Confidence

You can interpret it as P(Y|X): the probability of buying Y given that X was bought.
So in this image, of all the people who bought eggs, these are the ones who also bought cheese.
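As a formula, confidence(X → Y) = support(X and Y) / support(X), which is exactly P(Y|X). A small sketch, again with made-up data:

```python
def support(transactions, items):
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, x, y):
    """P(Y|X): of the transactions containing x, the fraction that also contain y."""
    return support(transactions, list(x) + list(y)) / support(transactions, x)

# Made-up mini-dataset for illustration.
transactions = [
    ["eggs", "cheese"],
    ["eggs", "milk"],
    ["bread", "butter"],
    ["eggs", "cheese", "bread"],
]

# Of the 3 egg buyers, 2 also bought cheese -> 2/3
print(confidence(transactions, ["eggs"], ["cheese"]))
```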

Which brings us to our next important concept…
– Lift

So the people in green here (based on lift) are the ones who bought cheese with their eggs.

As such, a store may want to consider having the cheese aisle very close to the egg shelf.
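Lift compares the rule’s confidence against the baseline popularity of the second item: lift(X → Y) = confidence(X → Y) / support(Y). A lift above 1 means buying X makes buying Y more likely than chance. A sketch with made-up data:

```python
def support(transactions, items):
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def lift(transactions, x, y):
    """confidence(x -> y) divided by support(y)."""
    conf = support(transactions, list(x) + list(y)) / support(transactions, x)
    return conf / support(transactions, y)

# Made-up mini-dataset for illustration.
transactions = [
    ["eggs", "cheese"],
    ["eggs", "milk"],
    ["bread", "butter"],
    ["eggs", "cheese", "bread"],
]

# confidence(eggs -> cheese) = (2/4)/(3/4) = 2/3; support(cheese) = 2/4
# lift = (2/3) / 0.5 = 4/3, which is > 1: egg buyers favor cheese.
print(lift(transactions, ["eggs"], ["cheese"]))
```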
The idea is to run all of these calculations for each of the features (products, movies, etc.) in your dataset; the ones with the highest “lifts” are the ones that tend to be most strongly associated with each other.
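Putting the three measures together, here’s a brute-force sketch of that idea (not the real Apriori pruning strategy, just the same math applied to every item pair and ranked by lift):

```python
from itertools import combinations

def support(transactions, items):
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def ranked_pairs(transactions, min_support=0.25):
    """Rank all item pairs by lift, skipping pairs below min_support."""
    products = sorted({p for t in transactions for p in t})
    rules = []
    for x, y in combinations(products, 2):
        s_xy = support(transactions, [x, y])
        if s_xy < min_support:
            continue  # prune rare pairs, in the spirit of Apriori
        conf = s_xy / support(transactions, [x])
        rules.append((x, y, s_xy, conf, conf / support(transactions, [y])))
    return sorted(rules, key=lambda r: r[-1], reverse=True)  # highest lift first

# Made-up mini-dataset for illustration.
transactions = [
    ["eggs", "cheese"],
    ["eggs", "milk"],
    ["bread", "butter"],
    ["eggs", "cheese", "bread"],
]
for x, y, s, c, l in ranked_pairs(transactions):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

On this toy data, bread → butter comes out on top (lift 2.0): butter only ever appears alongside bread, so the association is strong even though the pair is rare.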
Sifting through the output of our calculations (using the relatively painless APIs in Python), our output would look like this (you may need to zoom in on this image by pressing “Ctrl” and “+” several times in your browser; to reset the zoom, press “Ctrl” and “0” once).

And you can see in this one example that we have a 24% confidence (chance) that people who buy fromage blanc will also buy honey. With a lift over 3, that’s considered really good. The lift is the “relevance of the rule”.
Here’s an example of the Python code that will do that for you:
# Training Apriori on the dataset
# min_support: every itemset in your rules will have a support above this threshold.
# How to choose it? Say we only care about products bought at least 3-4 times a day.
# Support(i) = (number of transactions containing item i) / (total number of transactions)
# A product purchased 3 times a day is purchased 21 times a week; with 7,501
# transactions recorded over the week, that's 21/7501 ≈ 0.003 — our min_support.
# By associating such products and placing them together, customers are more
# likely to purchase them.
# min_confidence = 0.2 means the rules need to be correct at least 20% of the time.
# min_lift: you can try different values, but a minimum lift of 3 is a good start.
# min_length: minimum number of products in a rule (note: depending on your apyori
# version this may be ignored; max_length is the documented length parameter).
from apyori import apriori
myrules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)
# apriori() returns a generator; materialize it with list(myrules) to inspect the rules.