Empirical analysis of a parallel data mining algorithm on a graphic processor
Berdin, Davide (2012)
Avaa tiedosto
Lataukset:
Berdin, Davide
Turun ammattikorkeakoulu
2012
All rights reserved
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:amk-2012082713264
https://urn.fi/URN:NBN:fi:amk-2012082713264
Tiivistelmä
In this thesis, we analyze in an empirical way a different approach of the algorithm SPAM (Sequential PAttern Mining using A Bitmap Representation) made by J. Ayres, J. Gehrke, T. Yiu and J. Flannick from Cornell University, exploiting GPUs. SPAM is a novel approach for FSM (Frequent Sequence Mining) where the algorithm is not looking for a single item during the mining process but for a set of items that customers have taken in their transactions. Basically we can think of Spam as a supermarket where the customers have the ”Fidelity card” and that the supermarket can track the transactions. What Spam does is to calculate which are the most bought items per customer.
But the question is: Why graphic cards? The answer is pretty easy: graphic cards can benefit of a rich amount of data parallelism allowing many arithmetic operations to be safely performed on program data structures in a simultaneous manner. In our tests we noticed that performing a massive amount of multiplications (or any other kind of operation) with the CPU took at least the 60% more rather than the
GPU. Just with this simple case, we could understand the potential of GPUs.
But the question is: Why graphic cards? The answer is pretty easy: graphic cards can benefit of a rich amount of data parallelism allowing many arithmetic operations to be safely performed on program data structures in a simultaneous manner. In our tests we noticed that performing a massive amount of multiplications (or any other kind of operation) with the CPU took at least the 60% more rather than the
GPU. Just with this simple case, we could understand the potential of GPUs.