
NathanLabbe/Data_Mining_Frequent_Item


Finding Frequent Items


Lab 2 for the ID2222 Data Mining course at KTH.

Principle

The project is an implementation of the Apriori algorithm for finding frequent itemsets and association rules in a dataset using Apache Spark.

Introduction

The problem of discovering association rules between itemsets in a sales transaction database (a set of baskets) includes the following two sub-problems [R. Agrawal and R. Srikant, VLDB '94]:

1. Finding frequent itemsets with support at least s;
2. Generating association rules with confidence at least c from the itemsets found in the first step.

Recall that an association rule is an implication X → Y, where X and Y are itemsets such that X ∩ Y = ∅. The support of the rule X → Y is the number of transactions that contain X ∪ Y. The confidence of the rule X → Y is the fraction of transactions containing X ∪ Y among all transactions that contain X, i.e., support(X ∪ Y) / support(X).
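To make the two definitions concrete, here is a minimal single-machine sketch in plain Python (not Spark, and not the code from this repository). The function names `support` and `confidence` and the toy basket list are hypothetical, chosen only for illustration; each basket is assumed to be a `frozenset` of item names.

```python
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def support(itemset: Iterable[str], baskets: List[Itemset]) -> int:
    """Number of baskets that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for basket in baskets if target <= basket)


def confidence(x: Iterable[str], y: Iterable[str], baskets: List[Itemset]) -> float:
    """Confidence of X -> Y, computed as support(X ∪ Y) / support(X)."""
    x_set, y_set = frozenset(x), frozenset(y)
    if x_set & y_set:
        raise ValueError("X and Y must be disjoint")
    supp_x = support(x_set, baskets)
    if supp_x == 0:
        return 0.0  # no basket contains X, so the rule never applies
    return support(x_set | y_set, baskets) / supp_x


# Hypothetical toy dataset of four baskets.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]
print(support({"bread", "milk"}, baskets))                  # 2
print(round(confidence({"bread"}, {"milk"}, baskets), 3))   # 0.667
```

Here "bread" appears in three baskets and "bread" together with "milk" in two, so the rule bread → milk has support 2 and confidence 2/3.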

Task

You are to solve the first sub-problem: implement the Apriori algorithm for finding frequent itemsets with support at least s in a dataset of sales transactions. Recall that the support of an itemset is the number of transactions containing the itemset. To test and evaluate your implementation, write a program that uses your Apriori implementation to discover frequent itemsets with support at least s in a given dataset of sales transactions.
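The level-wise structure of Apriori can be sketched as follows. This is a minimal, single-machine illustration in plain Python, assuming baskets are `frozenset`s of item names; the function name `apriori` is hypothetical and the repository itself performs the counting with Apache Spark. Each level k builds candidate k-itemsets by joining frequent (k-1)-itemsets, prunes any candidate that has an infrequent (k-1)-subset (by monotonicity such a candidate cannot be frequent), and then counts the survivors in one pass over the data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def apriori(baskets: List[Itemset], s: int) -> Dict[Itemset, int]:
    """Return every itemset with support >= s, mapped to its support."""
    # Pass 1: count single items and keep the frequent ones.
    counts: Dict[Itemset, int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {iset: n for iset, n in counts.items() if n >= s}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Counting pass: one scan over the baskets for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        frequent = {iset: n for iset, n in counts.items() if n >= s}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy dataset.
baskets = [frozenset(b) for b in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"},
)]
found = apriori(baskets, s=3)
for itemset, supp in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
# ['a'] 4
# ['b'] 4
# ['c'] 4
# ['a', 'b'] 3
# ['a', 'c'] 3
# ['b', 'c'] 3
```

With s = 3, the candidate {a, b, c} survives pruning but occurs in only two baskets, so the search stops after level 3. In a Spark version, the counting pass is the natural place to distribute work, since each basket can be checked against the candidates independently.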

The implementation can use any big data processing framework, such as Apache Spark or Apache Flink, or no framework at all, e.g., plain Java or Python.

Optional task for extra bonus: solve the second sub-problem, i.e., develop and implement an algorithm for generating association rules between frequent itemsets discovered by the Apriori algorithm in a dataset of sales transactions. The rules must have support at least s and confidence at least c, where s and c are given as input parameters.
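One straightforward way to generate the rules is sketched below in plain Python. It assumes the input is a dictionary mapping every frequent itemset (support at least s) to its support, as an Apriori pass would produce; the function name `generate_rules` and the hard-coded dictionary are hypothetical. For each frequent itemset Z with at least two items, it tries every non-empty proper subset X as the antecedent and Y = Z \ X as the consequent. Because Z is frequent, each rule already meets the support threshold, and because every subset of a frequent itemset is also frequent, support(X) can be read from the same dictionary.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, int, float]  # (X, Y, support, confidence)


def generate_rules(frequent: Dict[Itemset, int], c: float) -> List[Rule]:
    """Return rules X -> Y with confidence >= c.

    `frequent` must contain every frequent itemset with its support, so that
    support(X) is available for each subset X of a frequent itemset Z.
    """
    rules: List[Rule] = []
    for z, supp_z in frequent.items():
        if len(z) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for r in range(1, len(z)):
            for x_items in combinations(sorted(z), r):
                x = frozenset(x_items)
                y = z - x
                conf = supp_z / frequent[x]  # support(X ∪ Y) / support(X)
                if conf >= c:
                    rules.append((x, y, supp_z, conf))
    return rules


# Hypothetical frequent itemsets with their supports.
frequent = {
    frozenset("a"): 4, frozenset("b"): 4, frozenset("c"): 4,
    frozenset("ab"): 3, frozenset("ac"): 3, frozenset("bc"): 3,
}
for x, y, supp, conf in generate_rules(frequent, c=0.7):
    print(sorted(x), "->", sorted(y), supp, round(conf, 2))
# ['a'] -> ['b'] 3 0.75
# ['b'] -> ['a'] 3 0.75
# ['a'] -> ['c'] 3 0.75
# ['c'] -> ['a'] 3 0.75
# ['b'] -> ['c'] 3 0.75
# ['c'] -> ['b'] 3 0.75
```

This brute-force enumeration is exponential in the size of each itemset. A common refinement exploits the fact that, for a fixed Z, moving items from X into Y can only lower the confidence, so consequents can be grown level by level and pruned in the same way Apriori prunes candidates.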

Authors


Languages

  • Python 99.1%
  • Makefile 0.9%