This article explain the basics of association rules and how to find them using Apriori algorithm. The algorithm is implemented in JavaScript and C# / Silverlight and a live demonstration is available below with full source code. ES2015 version of JavaScript is used. ES2015 is a fantastic step forward for the JavaScript with significant features like classes, arrow functions, modules, etc.
Learn Apriori Algorithm by Example
The Apriori algorithm for finding large itemsets and generating association rules using those large itemsets are illustrated in this demo. Enter a set of items separated by comma and the number of transactions you wish to have in the input database. Then press Generate DB button to generate a random database with items that you entered. Then press the Apriori button to see the algorithm in action; a set of large itemsets and association rules will be generated based on the given support threshold and confidence threshold. The source code of the demo is available for download below in JavaScript and C#.
The same widget / demo is available in C# and Silverlight here. Source code is available for download below.
What is Data Mining and Association Rules
Data Mining is a technique for discovering useful information from large databases. Analyzing the data and extracting useful information can be potentially very profitable to a business. For example, if a seller can find the association between two products, critical decisions in pricing or product placements can be made in order to promote the business. By this way, the seller can concentrate the marketing efforts on every subset of customers who are very likely to buy the associated products.
Association Rules are used for discovering regularities between products in big transactional databases. A transaction is an event involving one or more of the products (items) in the business or domain; for example buying of goods by a consumer in a super market is a transaction. A set of items is usually referred as "itemset", and an itemset with "k" number of items is called "k-itemset".
The general form of an association rule is X => Y, where X and Y are two disjoint itemsets. The "support" of an itemset is the number of transactions that contain all the items of that itemset; whereas the support of an association rule is the number of transactions that contain all items of both X and Y. The "confidence" of an association rule is the ratio between its support and the support of X.
A given association rule X => Y is considered significant and useful, if it has high support and confidence values. The user will specify a threshold value for support and confidence, so that different degrees of significance can be observed based on these threshold values.
For more information on association rules, see http://en.wikipedia.org/wiki/Association_rule.
Data Structures for this Article
We create the following data-structure classes. An Itemset is just a list of strings; it can be used to represent a transaction or any set of items. The ItemsetCollection is list of itemsets; it can be used to represent a transactional database or any group of itemsets. The AssociationRule class represents an instance of a generated association-rule.
class Itemset extends Array {
constructor() {
super();
this.Support = 0.0;
}
}
class ItemsetCollection extends Array {
constructor() {
super();
}
}
class AssociationRule {
constructor() {
this.X = new Itemset();
this.Y = new Itemset();
this.Support = 0.0;
this.Confidence = 0.0;
}
}
Finding Large Itemsets using Apriori Algorithm
The first step in the generation of association rules is the identification of large itemsets. An itemset is "large" if its support is greater than a threshold, specified by the user. A commonly used algorithm for this purpose is the Apriori algorithm.
The Apriori algorithm relies on the principle "Every non-empty subset of a larget itemset must itself be a large itemset". The algorithm applies this principle in a bottom-up manner. Let Li denote the collection of large itemsets with "i" number of items. The algorithm begins by identifying all the sets in L1. Each item that has the necessary support forms a large 1-itemset and included in L1, other itemsets are dropped from consideration. This process of retaining necessary itemsets only is called "pruning". The set of itemsets used to find Li is called candidate itemsets (Ci).
The collection L2 can be constructed by considering each pair of sets in L1 and retaining only those pairs that has enough support. In general, having constructed Li, the collection Li+1 is constructed by considering pairs of sets, one from Li and another from L1 and eliminating those for which the support is smaller. This procedure is continued until all large itemsets up to the desired maximum size have been obtained or no further pruning is possible.
The following code implements the Apriori algorithm.
static doApriori(db, supportThreshold) {
let I = db.getUniqueItems();
let L = new ItemsetCollection(); // Resultant large itemsets
let Li = new ItemsetCollection(); // Large itemset in each iteration
let Ci = new ItemsetCollection(); // Pruned itemset in each iteration
// First iteration (1-item itemsets)
for (var i = 0; i < I.length; i += 1) {
Ci.push(Itemset.from([I[i]]));
}
// Next iterations
let k = 2;
while (Ci.length != 0) {
// Set Li from Ci (pruning)
Li.clear();
for (var index in Ci) {
let itemset = Ci[index];
itemset.Support = db.findSupport(itemset);
if (itemset.Support >= supportThreshold) {
Li.push(itemset);
L.push(itemset);
}
}
// Set Ci for next iteration (find supersets of Li)
Ci.clear();
let subsets = Bit.findSubsets(Li.getUniqueItems(), k); // Get k-item subsets
subsets.forEach(set => Ci.push(set));
k += 1;
}
return L;
}
The FindSubsets() function defined in Bit class is used to find all the subsets of a given set of items. This is explained in more detail in this article.
Finding Association Rules
Having found the set of all large itemsets from the input database, the next task is to find the required set of strong association rules. An association rule is "strong" if its confidence value is greater than a user-defined threshold. The association rules are created by combining each large itemset with each of its subsets. The strong rules are published as result and others are dropped.
static mine(db, L, confidenceThreshold) {
let allRules = [];
for (var i in L) {
let itemset = L[i];
let subsets = Bit.findSubsets(itemset, 0); // Get all subsets
for (var j in subsets) {
let subset = subsets[j];
let confidence = (db.findSupport(itemset) / db.findSupport(subset)) * 100.0;
if (confidence >= confidenceThreshold) {
let rule = new AssociationRule();
subset.forEach(i => rule.X.push(i));
itemset.removeItemset(subset).forEach(i => rule.Y.push(i));
rule.Support = db.findSupport(itemset);
rule.Confidence = confidence;
if (rule.X.length > 0 && rule.Y.length > 0) {
allRules.push(rule);
}
}
}
}
return allRules;
}
Downloads for this article
File |
Language |
Tools |
Apriori Demo Source in C# 365.57 kb (3702 downloads) |
C#, Silverlight 3 |
Visual Studio 2010 |
Apriori Demo Source in JavaScript / ES2015 42.96 kb (643 downloads) |
JavaScript / ES2015 |
Visual Studio Code |