Updated March 8, 2023

Introduction to Apriori Algorithm

Machine learning uses many different algorithms, and the apriori algorithm is one of them. It generates association rules from the frequent itemsets found in a dataset, and it is designed to work on databases that contain transactions. Association rules show how strongly or weakly two objects are connected. Internally, the apriori algorithm uses a breadth-first search and a hash tree to count candidate itemsets efficiently. It works iteratively, level by level, to identify the frequent itemsets in a large dataset.

The apriori algorithm was proposed by Rakesh Agrawal and Ramakrishnan Srikant in 1994. It was initially used mainly for market basket analysis, where it helps to identify products that customers are likely to purchase together. The same algorithm is also used in the health care industry, where it can help to identify adverse drug reactions.

Apriori algorithm in Machine Learning

In the apriori algorithm, we are mainly concerned with frequent itemsets. A frequent itemset is a set of items whose support is greater than or equal to the user-defined minimum support, i.e. the threshold value that we have chosen.

To keep it simple, consider two transactions, each of which is a set of items: P = {1,2,3,4,5} and Q = {2,3,7}. The items 2 and 3 occur in both transactions, so the itemset {2,3} has a support count of 2.

Note: The apriori algorithm is ultimately about association rules. To understand it well, we need to understand two measures: support, the number of transactions that contain an itemset, and confidence, which for a rule X → Y is sup(X ^ Y)/sup(X). The association rules are built on these two measures.

Below is the list of phases that help to understand the apriori algorithm:
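The two measures can be made concrete with a short Python sketch. The helper names `support_count` and `confidence` are hypothetical, chosen only for illustration, and the data is the pair of transactions P and Q from above. The sketch assumes the left-hand side of a rule occurs in at least one transaction, otherwise the division fails.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: sup(lhs and rhs together) / sup(lhs)."""
    both = set(lhs) | set(rhs)
    return support_count(both, transactions) / support_count(lhs, transactions)


# The two transactions P and Q from the text.
transactions = [{1, 2, 3, 4, 5}, {2, 3, 7}]

print(support_count({2, 3}, transactions))  # how many transactions hold both 2 and 3
print(confidence({2}, {3}, transactions))   # how often 3 appears when 2 appears
print(confidence({2}, {7}, transactions))   # how often 7 appears when 2 appears
```

Here the item 7 appears in only one of the two transactions that contain 2, so the rule 2 → 7 is weaker than the rule 2 → 3.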

Phase 1: Determine the support of the itemsets in the transactional database, and select the minimum support and minimum confidence values.

Phase 2: Take from the transaction table all the itemsets whose support is greater than or equal to the selected minimum support.

Phase 3: Find all the rules over the subsets of these itemsets whose confidence is greater than or equal to the minimum confidence, i.e. the threshold value.

Phase 4: Sort the rules in decreasing order of strength, for example by confidence.
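The four phases can be sketched as one small Python function. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `apriori` and its parameters are hypothetical, support is treated as an absolute count, thresholds are compared with >=, items are assumed to be sortable, and the rules are sorted by confidence because the text does not name a sorting measure.

```python
from itertools import combinations


def apriori(transactions, min_support, min_confidence):
    """Return (frequent, rules) for a list of transactions given as sets."""

    def count(itemset):
        # Support count: number of transactions containing the whole itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Phases 1 and 2: frequent single items, then larger itemsets level by level.
    level = {}
    for item in {i for t in transactions for i in t}:
        single = frozenset([item])
        n = count(single)
        if n >= min_support:
            level[single] = n

    frequent = {}
    k = 2
    while level:
        frequent.update(level)
        previous = set(level)
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        level = {}
        for cand in candidates:
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(s) in previous for s in combinations(cand, k - 1)):
                n = count(cand)
                if n >= min_support:
                    level[cand] = n
        k += 1

    # Phase 3: rules lhs -> rhs with confidence sup(itemset) / sup(lhs).
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), conf))

    # Phase 4: strongest rules first.
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return frequent, rules


# The two transactions P and Q from the text, with a min support count of 2.
transactions = [{1, 2, 3, 4, 5}, {2, 3, 7}]
frequent, rules = apriori(transactions, min_support=2, min_confidence=0.5)
print(frequent)
print(rules)
```

With only two transactions, the frequent itemsets are {2}, {3} and {2,3}, and the only rules are 2 → 3 and 3 → 2.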

How Does the Apriori Algorithm Work?

To explain the apriori algorithm in detail, we walk through a worked example with some simple calculations.
Example: Consider the dataset below, which holds a list of transactions. We want to find the frequent itemsets in this dataset and then derive the association rules with the help of the apriori algorithm.

Transaction ID	ITEMSETS
TID1	P, Q
TID2	Q, S
TID3	Q, R
TID4	P, Q, S
TID5	P, R
TID6	Q, R
TID7	P, R
TID8	P, Q, R, T
TID9	P, Q, R

Note: For the above transaction table, we take the min support count as 2 and the min confidence as 50%.

Solution Phase for the apriori algorithm

Phase 1: Calculating C1 and L1

In the initial stage, we build a table that holds the support count of each individual item, i.e. the number of transactions in the dataset that contain it. This table is called C1, the candidate set.

Itemset	Support count
P	6
Q	7
R	6
S	2
T	1

Next, we keep only the itemsets whose support count is greater than or equal to the min support count, i.e. 2. T occurs in only one transaction, so it is dropped. This gives us the table of frequent itemsets, L1.

Itemset	Support count
P	6
Q	7
R	6
S	2
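The counting in this phase can be reproduced directly from the transaction table. The following is a small Python sketch; the variable names `c1`, `l1` and `MIN_SUPPORT` are hypothetical and simply mirror the table names used here.

```python
from collections import Counter

# The nine transactions TID1..TID9 from the table above.
transactions = [
    {"P", "Q"}, {"Q", "S"}, {"Q", "R"}, {"P", "Q", "S"}, {"P", "R"},
    {"Q", "R"}, {"P", "R"}, {"P", "Q", "R", "T"}, {"P", "Q", "R"},
]
MIN_SUPPORT = 2  # min support count chosen for this example

# C1: number of transactions that contain each individual item.
c1 = Counter(item for t in transactions for item in t)

# L1: only the items whose count reaches the min support count.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT}

for item in sorted(c1):
    print(item, c1[item], "kept" if item in l1 else "dropped")
```

Each transaction is a set, so an item is counted at most once per transaction.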

Phase 2: Generating the candidate set C2 and L2

In this phase, we build C2 with the help of L1. C2 contains every pair of items from L1, each pair forming a 2-item itemset.
Once the pairs are ready, we find the support count of each one from the main transaction table of the dataset.

Itemset	Support count
{P, Q}	4
{P, R}	4
{P, S}	1
{Q, R}	4
{Q, S}	2
{R, S}	0

Once again, we compare the support counts in C2 with the min support count. The itemsets with a lower support count are removed, which gives the L2 table.

Itemset	Support count
{P, Q}	4
{P, R}	4
{Q, R}	4
{Q, S}	2
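The pair generation and filtering can be sketched in Python with `itertools.combinations`. The names `l1_items`, `c2` and `l2` are hypothetical; the frequent single items are taken as P, Q, R and S, the items that appear in the C2 pairs.

```python
from itertools import combinations

# The nine transactions TID1..TID9 from the transaction table.
transactions = [
    {"P", "Q"}, {"Q", "S"}, {"Q", "R"}, {"P", "Q", "S"}, {"P", "R"},
    {"Q", "R"}, {"P", "R"}, {"P", "Q", "R", "T"}, {"P", "Q", "R"},
]
MIN_SUPPORT = 2
l1_items = ["P", "Q", "R", "S"]  # frequent single items from the previous phase


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


# C2: every pair of frequent single items, with its support count.
c2 = {pair: support_count(pair) for pair in combinations(l1_items, 2)}

# L2: the pairs whose count reaches the min support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_SUPPORT}

for pair, n in c2.items():
    print(pair, n, "kept" if pair in l2 else "dropped")
```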

Phase 3: Generating the candidate set C3 and L3

For C3, we repeat the same process, but this time we build the table from itemsets of three items. We calculate each support count from the dataset.

Itemset	Support count
{P, Q, R}	2
{Q, R, S}	0
{P, R, S}	0
{P, Q, S}	1

Now we create the L3 table from the C3 table above. Only a single itemset has a support count that reaches the min support count, so L3 contains only one itemset, {P, Q, R}.
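The classic apriori procedure adds a pruning step at this point: a 3-item candidate is counted only if all of its 2-item subsets are already in L2. The Python sketch below illustrates that join-and-prune step on the L2 pairs; the names `joined`, `c3` and `l3` are hypothetical. Under this assumption only {P, Q, R} survives pruning, so the other triples never need to be counted, and the result for L3 is the same.

```python
from itertools import combinations

# The nine transactions TID1..TID9 from the transaction table.
transactions = [
    {"P", "Q"}, {"Q", "S"}, {"Q", "R"}, {"P", "Q", "S"}, {"P", "R"},
    {"Q", "R"}, {"P", "R"}, {"P", "Q", "R", "T"}, {"P", "Q", "R"},
]
MIN_SUPPORT = 2
# L2 from the previous phase.
l2 = {frozenset(p) for p in (("P", "Q"), ("P", "R"), ("Q", "R"), ("Q", "S"))}


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


# Join: merge two frequent pairs whenever the result has exactly three items.
joined = {a | b for a, b in combinations(l2, 2) if len(a | b) == 3}

# Prune: keep a triple only if each of its three pairs is in L2.
c3 = {c for c in joined if all(frozenset(s) in l2 for s in combinations(c, 2))}

# L3: surviving candidates whose count reaches the min support count.
l3 = {c: support_count(c) for c in c3 if support_count(c) >= MIN_SUPPORT}

for cand in sorted(joined, key=sorted):
    status = "candidate" if cand in c3 else "pruned"
    print(sorted(cand), support_count(cand), status)
```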

Phase 4: Discovery of the association rules

We now build the association rules from the itemset {P, Q, R}. For each rule, we calculate the confidence: for a rule P → Q, it is sup(P ^ Q)/sup(P). Once the confidence has been calculated for all the rules, we exclude the rules whose confidence is lower than the min threshold, i.e. 50%.

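• P ^ Q → R: support 2, confidence = sup{(P ^ Q) ^ R}/sup(P ^ Q) = 2/4 = 0.5 = 50%
• Q ^ R → P: support 2, confidence = sup{(Q ^ R) ^ P}/sup(Q ^ R) = 2/4 = 0.5 = 50%
• P ^ R → Q: support 2, confidence = sup{(P ^ R) ^ Q}/sup(P ^ R) = 2/4 = 0.5 = 50%
• R → P ^ Q: support 2, confidence = sup{R ^ (P ^ Q)}/sup(R) = 2/6 = 0.33 = 33.33% (R occurs in six transactions: TID3, TID5, TID6, TID7, TID8 and TID9)
• P → Q ^ R: support 2, confidence = sup{P ^ (Q ^ R)}/sup(P) = 2/6 = 0.33 = 33.33%
• Q → P ^ R: support 2, confidence = sup{Q ^ (P ^ R)}/sup(Q) = 2/7 = 0.29 = 28.57%

With a min confidence of 50%, the first three rules are kept as strong association rules and the last three are excluded.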
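The confidence values can be checked with a short Python sketch that enumerates every rule obtainable from {P, Q, R} and computes sup(whole itemset)/sup(left-hand side) from the transaction table. The names `rules`, `strong` and `MIN_CONFIDENCE` are hypothetical, and the threshold is compared with >= so that rules at exactly 50% are kept.

```python
from itertools import combinations

# The nine transactions TID1..TID9 from the transaction table.
transactions = [
    {"P", "Q"}, {"Q", "S"}, {"Q", "R"}, {"P", "Q", "S"}, {"P", "R"},
    {"Q", "R"}, {"P", "R"}, {"P", "Q", "R", "T"}, {"P", "Q", "R"},
]
MIN_CONFIDENCE = 0.5


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


frequent = {"P", "Q", "R"}  # the single itemset in L3

# Every non-empty proper subset becomes a left-hand side; the rest is the right-hand side.
rules = []
for size in range(1, len(frequent)):
    for lhs in combinations(sorted(frequent), size):
        rhs = tuple(sorted(frequent - set(lhs)))
        conf = support_count(frequent) / support_count(lhs)
        rules.append((lhs, rhs, conf))

# Keep the rules that reach the min confidence.
strong = [rule for rule in rules if rule[2] >= MIN_CONFIDENCE]

for lhs, rhs, conf in sorted(rules, key=lambda rule: rule[2], reverse=True):
    verdict = "kept" if conf >= MIN_CONFIDENCE else "excluded"
    print(" ^ ".join(lhs), "->", " ^ ".join(rhs), f"{conf:.2%}", verdict)
```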

Conclusion – Apriori Algorithm in Machine Learning

We have seen the complete concept of the apriori algorithm. It finds the frequent itemsets in a set of transactions and uses them to generate association rules.