A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

Fahed Yoseph, Markku Heikkilä

    Research output: Chapter in Book/Conference proceeding › Conference contribution › Scientific › peer-review

    3 Citations (Scopus)
    69 Downloads (Pure)

    Abstract

    Finding outliers, that is, rare events in a collection of patterns, has become an emerging issue in the area of machine learning concerned with detecting and eventually removing anomalous objects in data. A key challenge in outlier/anomaly detection is that it is not a well-formulated problem. Outliers are defined as extreme values that deviate from the overall patterns in data; they may indicate experimental errors, variability in measurement, or a novelty. Detecting outliers in large databases can lead to the discovery of hidden knowledge. In addition, identifying and removing outliers often helps to ensure that the observations represent the problem correctly. Although there are several techniques for detecting outliers/anomalies in a given database, no single technique has proven to be the standard universal choice. Depending on the nature of the target application, different implementations require different outlier detection methods. Clustering is a very powerful method in the field of machine learning and defines outliers in terms of their distance to the cluster centers. In this study, we propose a clustering-based approach to identifying outliers in a retail point-of-sales dataset. To select the best clustering algorithm for the purpose, two algorithms are applied: K-means for hard (crisp) clustering and Fuzzy C-means (FCM) for soft clustering. The experimental results show that the K-means algorithm outperforms the Fuzzy C-means (FCM) algorithm in terms of outlier detection efficiency, and that it is an effective outlier detection solution.
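
    The distance-to-center idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the plain Lloyd's-style K-means, the quantile-based threshold, and the names `kmeans`, `flag_outliers`, `init`, and `quantile` are all assumptions made here for illustration, and the abstract does not state how the paper sets its cut-off.

```python
import numpy as np


def _nearest(X, centers):
    # Index of the closest center for every row of X.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, init=None, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (hard clustering). Returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        # Assumption: start from k distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centers = np.asarray(init, dtype=float)

    for _ in range(n_iter):
        labels = _nearest(X, centers)
        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, _nearest(X, centers)


def flag_outliers(X, k, quantile=0.95, init=None, seed=0):
    """Flag points whose distance to their own cluster center exceeds
    the given quantile of all such distances (hypothetical threshold rule)."""
    X = np.asarray(X, dtype=float)
    centers, labels = kmeans(X, k, init=init, seed=seed)
    dist = np.linalg.norm(X - centers[labels], axis=1)
    return dist > np.quantile(dist, quantile), dist
```

    A call such as `flag_outliers(X, k=2, quantile=0.95)` returns a boolean mask and the per-point distances. Both `k` and `quantile` are illustrative choices that would need tuning on real point-of-sales data, and K-means centers are themselves pulled toward extreme points, so the result can depend on initialisation.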

    Original language: English
    Title of host publication: 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
    Editors: Phill Kyu Rhee, Kuo-Yuan Hwa, Tun-Wen Pai, Daniel Howard, Md Rezaul Bashar
    Publisher: IEEE
    Pages: 65–71
    ISBN (Print): 978-1-7281-0404-1
    Publication status: Published - 2019
    MoE publication type: A4 Article in a conference publication
    Event: International Conference on Machine Learning and Data Engineering (iCMLDE) - 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
    Duration: 2 Dec 2019 → 4 Dec 2019

    Conference

    Conference: International Conference on Machine Learning and Data Engineering (iCMLDE)
    Period: 02/12/19 → 04/12/19

    Keywords

    • Clustering
    • Noise
    • Outlier detection
    • Point-of-sales analysis
