
Abstract
Apriori is a frequent pattern mining algorithm for discovering association rules. It is one of the most well-known algorithms for discovering frequent patterns along with FP-Growth algorithm. However, as a result of the current advances in the area of storage of very large databases and the tremendous growth in number of transactions, sequential Apriori becomes a bottleneck because of the long running time of the algorithm. In this paper, our goal is to develop a new parallel version of Apriori that aims to reduce the overall running time of the algorithm. Although Apriori is not known to be highly parallelizable, several attempts have been made to parallelize it in various ways, either by using parallel I/O hardware to optimize database scans or by distributing the workload on multiple processors. However, many of the parallel approaches suffer from noticeable latencies in synchronizing results being collected from each individual processor after a parallel iteration terminates. Our approach focuses on trying to maximize the workload being executed in parallel and to minimize the synchronization point delays by adopting a parallel pre-computing scheme during generation of the superset. By applying our new approach, the running time of the algorithm is reduced by an order of magnitude compared to other parallel implementations of the same algorithm.
Read More