
5  Association Rules

Key Concepts

1.  Association rules

2.  Apriori algorithm

3.  Support

4.  Confidence

5.  Lift

6.  Leverage

This chapter discusses an unsupervised learning method called association rules. This is a descriptive, not predictive, method often used to discover interesting relationships hidden in a large dataset. The disclosed relationships can be represented as rules or frequent itemsets. Association rules are commonly used for mining transactions in databases.

Here are some possible questions that association rules can answer:

Which products tend to be purchased together?

Of those customers who are similar to this person, what products do they tend to buy?

Of those customers who have purchased this product, what other similar products do they tend to view or purchase?

5.1  Overview

Figure 5.1 shows the general logic behind association rules. Given a large collection of transactions (depicted as three stacks of receipts in the figure), in which each transaction consists of one or more items, association rules go through the items being purchased to see what items are frequently bought together and to discover a list of rules that describe the purchasing behavior. The goal with association rules is to discover interesting relationships among the items. (The relationship occurs too frequently to be random and is meaningful from a business perspective, which may or may not be obvious.) The relationships that are interesting depend both on the business context and the nature of the algorithm being used for the discovery.


Figure 5.1 The general logic behind association rules

Each of the uncovered rules is in the form X → Y, meaning that when item X is observed, item Y is also observed. In this case, the left-hand side (LHS) of the rule is X, and the right-hand side (RHS) of the rule is Y.

Using association rules, patterns can be discovered from the data that allow the association rule algorithms to disclose rules of related product purchases. The uncovered rules are listed on the right side of Figure 5.1. The three rules shown suggest that when cereal is purchased, 90% of the time milk is purchased also; when bread is purchased, 40% of the time milk is purchased also; and when milk is purchased, 23% of the time cereal is also purchased.
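The percentage attached to each rule X → Y can be read as the fraction of transactions containing X that also contain Y (a quantity that corresponds to the confidence concept listed at the start of the chapter). As a minimal sketch, using hypothetical transaction data chosen purely for illustration:

```python
def rule_percentage(transactions, lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    lhs, rhs = set(lhs), set(rhs)
    containing_lhs = [t for t in transactions if lhs <= t]
    if not containing_lhs:
        return 0.0
    return sum(1 for t in containing_lhs if rhs <= t) / len(containing_lhs)

# Hypothetical transactions, for illustration only
transactions = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"bread", "milk"},
    {"cereal", "milk", "butter"},
    {"bread"},
]
print(rule_percentage(transactions, {"cereal"}, {"milk"}))  # every cereal basket also has milk
```

The percentages in Figure 5.1 would be produced by this kind of computation over a real transaction database.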

In the example of a retail store, association rules are used over transactions that consist of one or more items. In fact, because of their popularity in mining customer transactions, association rules are sometimes referred to as market basket analysis. Each transaction can be viewed as the shopping basket of a customer that contains one or more items. This is also known as an itemset. The term itemset refers to a collection of items or individual entities that contain some kind of relationship. This could be a set of retail items purchased together in one transaction, a set of hyperlinks clicked on by one user in a single session, or a set of tasks done in one day. An itemset containing k items is called a k-itemset. This chapter uses curly braces like {item 1, item 2, … item k} to denote a k-itemset. Computation of the association rules is typically based on itemsets.

The research of association rules started as early as the 1960s. Early research by Hájek et al. [1] introduced many of the key concepts and approaches of association rule learning, but it focused on the mathematical representation rather than the algorithm. The framework of association rule learning was brought into the database community by Agrawal et al. [2] in the early 1990s for discovering regularities between products in a large database of customer transactions recorded by point-of-sale systems in supermarkets. In later years, it expanded to web contexts, such as mining path traversal patterns [3] and usage patterns [4] to facilitate organization of web pages.

This chapter chooses Apriori as the main focus of the discussion of association rules. Apriori [5] is one of the earliest and the most fundamental algorithms for generating association rules. It pioneered the use of support for pruning the itemsets and controlling the exponential growth of candidate itemsets. Shorter candidate itemsets, which are known to be frequent itemsets, are combined and pruned to generate longer frequent itemsets. This approach eliminates the need for all possible itemsets to be enumerated within the algorithm, since the number of all possible itemsets can become exponentially large.

One major component of Apriori is support. Given an itemset L, the support [2] of L is the percentage of transactions that contain L. For example, if 80% of all transactions contain itemset {bread}, then the support of {bread} is 0.8. Similarly, if 60% of all transactions contain itemset {bread, butter}, then the support of {bread, butter} is 0.6.
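The support computation follows directly from this definition. A minimal sketch, with hypothetical transactions chosen to reproduce the bread and butter numbers above:

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Hypothetical data: 4 of 5 transactions contain bread, 3 of 5 contain both
transactions = [
    {"bread"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]
print(support(transactions, {"bread"}))            # 0.8
print(support(transactions, {"bread", "butter"}))  # 0.6
```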

A frequent itemset has items that appear together often enough. The term “often enough” is formally defined with a minimum support criterion. If the minimum support is set at 0.5, any itemset can be considered a frequent itemset if at least 50% of the transactions contain this itemset. In other words, the support of a frequent itemset should be greater than or equal to the minimum support. For the previous example, both {bread} and {bread, butter} are considered frequent itemsets at the minimum support 0.5. If the minimum support is 0.7, only {bread} is considered a frequent itemset.
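A minimum-support filter over candidate itemsets can be sketched as follows; the transaction data is the same hypothetical example used for support:

```python
def frequent_itemsets(transactions, candidates, min_support):
    """Keep only the candidate itemsets whose support meets min_support."""
    n = len(transactions)
    frequent = {}
    for cand in candidates:
        sup = sum(1 for t in transactions if cand <= t) / n
        if sup >= min_support:
            frequent[cand] = sup
    return frequent

transactions = [
    {"bread"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]
candidates = [frozenset({"bread"}), frozenset({"bread", "butter"})]
print(frequent_itemsets(transactions, candidates, 0.5))  # both itemsets survive
print(frequent_itemsets(transactions, candidates, 0.7))  # only {bread} survives
```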

If an itemset is considered frequent, then any subset of the frequent itemset must also be frequent. This is referred to as the Apriori property (or downward closure property). For example, if 60% of the transactions contain {bread, jam}, then at least 60% of all the transactions will contain {bread} or {jam}. In other words, when the support of {bread, jam} is 0.6, the support of {bread} or {jam} is at least 0.6. Figure 5.2 illustrates how the Apriori property works. If itemset {B,C,D} is frequent, then all the subsets of this itemset, shaded, must also be frequent itemsets. The Apriori property provides the basis for the Apriori algorithm.
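The downward closure property is what lets Apriori prune: a candidate k-itemset can be discarded as soon as any of its (k−1)-subsets is not frequent, without ever counting its support. A minimal sketch of this pruning step, with illustrative item names matching Figure 5.2:

```python
from itertools import combinations

def prune_candidates(candidates, frequent_smaller):
    """Apriori pruning: keep a candidate k-itemset only if every one of
    its (k-1)-item subsets is already known to be frequent."""
    kept = []
    for cand in candidates:
        subsets = (frozenset(s) for s in combinations(cand, len(cand) - 1))
        if all(s in frequent_smaller for s in subsets):
            kept.append(cand)
    return kept

# Hypothetical frequent 2-itemsets
frequent_2 = {frozenset({"B", "C"}), frozenset({"B", "D"}), frozenset({"C", "D"})}
candidate = frozenset({"B", "C", "D"})
print(prune_candidates([candidate], frequent_2))  # kept: all its 2-subsets are frequent
```

If any 2-subset, say {C,D}, were missing from the frequent set, {B,C,D} would be pruned immediately, which is how the algorithm avoids enumerating the exponentially many possible itemsets.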


Figure 5.2 Itemset {A,B,C,D} and its subsets
