Proceedings Paper

Research on parallel algorithm for sequential pattern mining
Author(s): Lijuan Zhou; Bai Qin; Yu Wang; Zhongxiao Hao

Paper Abstract

Sequential pattern mining discovers frequent sequences, ordered by time or by some other criterion, in a sequence database. Its original motivation was to uncover regularities in customer purchasing over a period of time by finding frequent sequences. In recent years, sequential pattern mining has become an important direction in data mining, and its applications have extended beyond business databases to new data sources such as the Web and to advanced scientific fields such as DNA analysis. The data involved has two characteristics: massive volume and distributed storage. Most existing sequential pattern mining algorithms do not take both characteristics into account. Based on these characteristics and on parallel computing theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm follows the principle of pattern reduction and uses a divide-and-conquer strategy for parallelization. The first parallel task constructs frequent item sets by applying the frequent concept and search space partition theory; the second task builds frequent sequences using depth-first search at each processor. The algorithm needs to access the database only twice and does not generate candidate sequences, which reduces access time and improves mining efficiency. Using a random data generation procedure and various purpose-designed information structures, this paper simulated the SPP algorithm in a concrete parallel environment and also implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm achieved excellent speedup and efficiency.
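
The abstract does not give SPP's data structures or partitioning rules, so the following is only a minimal sketch of the general idea it describes: find frequent items first, split the search space by leading item, then grow patterns depth-first within each partition without generating candidates. It is a generic prefix-projection miner, not the authors' algorithm. It assumes sequences of single items rather than item sets, and all function names and the round-robin partitioning are hypothetical. It also does not reproduce the two-scan property claimed for SPP.

```python
from collections import Counter


def frequent_items(db, min_support):
    """Count, per item, how many sequences contain it; keep the frequent ones."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # each sequence contributes at most once per item
    return sorted(item for item, c in counts.items() if c >= min_support)


def project(db, item):
    """For each sequence containing item, keep the suffix after its first occurrence."""
    return [seq[seq.index(item) + 1:] for seq in db if item in seq]


def grow(prefix, projected, min_support, out):
    """Depth-first pattern growth: extend prefix by every locally frequent item."""
    for item in frequent_items(projected, min_support):
        pattern = prefix + (item,)
        suffixes = project(projected, item)
        out[pattern] = len(suffixes)  # support of the extended pattern
        grow(pattern, suffixes, min_support, out)


def mine_partition(db, items, min_support):
    """Work for one (hypothetical) processor: all patterns starting with its items."""
    out = {}
    for item in items:
        suffixes = project(db, item)
        out[(item,)] = len(suffixes)
        grow((item,), suffixes, min_support, out)
    return out


def mine(db, min_support, n_workers=2):
    """Partition the search space by first item, then mine each partition."""
    items = frequent_items(db, min_support)
    partitions = [items[i::n_workers] for i in range(n_workers)]  # assumed round-robin split
    result = {}
    for part in partitions:  # iterations are independent, so they could run in parallel
        result.update(mine_partition(db, part, min_support))
    return result


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
    for pattern, support in sorted(mine(db, min_support=2).items()):
        print(pattern, support)
```

Because patterns that start with different items never overlap, the partitions share no state and their results can simply be merged, which is what makes this kind of divide-and-conquer split attractive for parallel execution.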

Paper Details

Date Published: 17 March 2008
PDF: 8 pages
Proc. SPIE 6973, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2008, 69730Q (17 March 2008); doi: 10.1117/12.775402
Author Affiliations
Lijuan Zhou, Capital Normal Univ. (China)
Harbin Institute of Technology (China)
Harbin Univ. of Science and Technology (China)
Bai Qin, Harbin Univ. of Science and Technology (China)
Yu Wang, Harbin Univ. of Science and Technology (China)
Zhongxiao Hao, Harbin Institute of Technology (China)
Harbin Univ. of Science and Technology (China)
Qiqihar Univ. (China)


Published in SPIE Proceedings Vol. 6973:
Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2008
Editor(s): William J. Tolone; William Ribarsky

© SPIE.