Association Rule Mining

Association Rule

An association rule is an if-then statement that describes a relationship between items in a dataset. Association rule mining is a technique for finding frequently occurring patterns, or associations, in datasets.

An association rule has two parts:

  1. An antecedent: This is the "if" part of the rule. It is an item found in the dataset.

  2. A consequent: This is the "then" part of the rule. It is an item found together with the antecedent.

For example:

If customers buy a beverage like Milo, then 60% of those customers will also buy milk.

Association rules are discovered by analyzing data for frequent antecedent and consequent patterns. Then, based on the two parts of the rule, the following measures are used to identify the most important relationships:

  • Support: This measures how popular an item is, that is, the fraction of all transactions that contain it. For example, given the transaction table below:

    transaction 1 - apple, drink, rice, tomatoes

    transaction 2 - drink, rice, tomatoes, salt

    transaction 3 - rice, drink, salt

    transaction 4 - drink, apple

    The support for rice is given by:

    support(rice) = number of transactions containing rice / total number of transactions

    support(rice) = 3/4 or 75%

  • Confidence: This is how likely item Y is to be purchased given that item X is purchased. The rule is written {X -> Y}, and its confidence is computed as:

       confidence(X, Y) = support(X, Y)/support(X)
    

    From the table above, confidence(rice, tomatoes) = (2/4) / (3/4) = 2/3, or 66.67%.

    One disadvantage of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular an item X is, but not Y. If Y is also very popular in general, there will be a higher chance that a transaction containing X will also contain Y, thus inflating the confidence measure.

  • Lift: This compares the observed confidence with the expected confidence, that is, how often Y would be bought alongside X if the two items were unrelated. In other words, it says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is. A lift greater than 1 means that item Y is more likely to be bought if item X is bought, while a lift less than 1 means that item Y is less likely to be bought if item X is bought. A lift equal to 1 implies that there is no association between the two items.

         lift(X, Y) = support(X, Y) / (support(X) * support(Y))
    

    From the table above, lift(rice, tomatoes) = (2/4) / ((3/4) * (2/4)) = 4/3, or about 1.33.
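To make the three measures concrete, here is a minimal Python sketch that recomputes them on the four example transactions. The function names `support`, `confidence` and `lift` are chosen here for illustration and do not come from any library; `Fraction` is used only so that the results print as exact ratios.

```python
from fractions import Fraction

# The four example transactions from the table above.
transactions = [
    {"apple", "drink", "rice", "tomatoes"},
    {"drink", "rice", "tomatoes", "salt"},
    {"rice", "drink", "salt"},
    {"drink", "apple"},
]


def support(*items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= t)
    return Fraction(hits, len(transactions))


def confidence(x, y):
    """How often y appears in the transactions that contain x."""
    return support(x, y) / support(x)


def lift(x, y):
    """Confidence of x -> y, corrected for how popular y is overall."""
    return support(x, y) / (support(x) * support(y))


print(support("rice"))                 # 3/4
print(confidence("rice", "tomatoes"))  # 2/3
print(lift("rice", "tomatoes"))        # 4/3
```

The same functions also show the weakness of confidence described above: `confidence("apple", "drink")` is 1 because drink appears in every transaction, yet `lift("apple", "drink")` is also exactly 1, which signals no real association.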

Use cases for association rules

In data science, association rules are used to find relationships between items in datasets and to explain patterns in data. Below are some real-world use cases for association rules:

  1. UX (User Experience) Design: Users expect something catered to their individual needs. With this data-driven technique, UX designers can ensure that content is personalized. Developers can collect data on how users use a website they create. They can then use associations in the data to optimize the website by analyzing where users tend to click and what maximizes the chance that they engage with a call to action.

  2. Market Basket Analysis: This is the most popular use case of association mining. Data is collected using barcode scanners in supermarkets. The database consists of a large number of records of past transactions. A single record lists all the items bought by a customer in one sale. Knowing which groups of customers are inclined towards which sets of items lets these shops adjust the store layout and the store catalog so that related items are placed optimally relative to one another. An example of an algorithm used here is the Apriori algorithm.

  3. Medicine: Doctors can use association rules to help diagnose patients. There are many variables to consider when making a diagnosis, as many diseases share symptoms.

  4. Entertainment: Services like Netflix and Spotify can use association rules to fuel their content recommendation engines. Machine learning models analyze past user behaviour data for frequent patterns, develop association rules and use those rules to recommend content that a user is likely to engage with or organize content in a way that is likely to put the most interesting content for a given user first.
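As a rough illustration of what mining rules from basket data looks like, the sketch below brute-forces every ordered pair of items in the four example transactions from earlier and keeps the rules that clear a minimum support and a minimum confidence. This is not the Apriori algorithm, which prunes candidate itemsets instead of enumerating them all. The function name `mine_pair_rules` and the thresholds of 0.5 and 0.7 are assumptions made for this example.

```python
from itertools import permutations

# The four example transactions used earlier in the article.
transactions = [
    {"apple", "drink", "rice", "tomatoes"},
    {"drink", "rice", "tomatoes", "salt"},
    {"rice", "drink", "salt"},
    {"drink", "apple"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_pair_rules(min_support=0.5, min_confidence=0.7):
    """Return (x, y, support, confidence) for each single-item rule x -> y
    that meets both thresholds. Brute force, for small data only."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        pair_support = support({x, y})
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        conf = pair_support / support({x})
        if conf >= min_confidence:
            rules.append((x, y, pair_support, conf))
    return rules


for x, y, sup, conf in mine_pair_rules():
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```

Many of the rules that come out point to drink, which is in every basket. That is the confidence inflation described earlier, and it is why lift is usually checked as well before acting on a rule.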

Association rules have many more use cases. Machine learning allows larger and more complex datasets to be analyzed and mined for association rules.

With this, I hope I have clarified what you need to know about association rule mining.