Filter method for feature selection using fuzziness index number with fuzzy k-nearest neighbor classifier
Nahid, Anushka (2020)
Master's thesis
School of Engineering Science, Computational Engineering
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2020051838162
Abstract
Large real-world data sets are often high-dimensional, and some of their features carry irrelevant information that complicates the data. Feature selection is regarded as a useful data preprocessing strategy, and in supervised machine learning it is an important step before classifying objects. Many methods exist for selecting relevant features in order to shrink the data set, reduce computational cost and time, and improve classification performance. This research introduces a filter method for feature selection based on the fuzziness index number and examines it with the fuzzy k-nearest neighbor classifier. The model was tested on three separate data sets, and its outcome was assessed by the accuracy, sensitivity, specificity, type I error, type II error, and error rate of the classifier. On the bank-marketing data set, the highest accuracy (88.46%) was achieved with only two features from the original feature set. The mean classification accuracies with the selected features on the breast cancer and Pima Indian data sets were 96.75% and 71.42%, respectively. Overall, the results indicate that this new feature selection method performs better on the medical data sets than on the bank-marketing data set.
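The six assessment measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: it assumes the common definitions in which type I error is the false positive rate and type II error is the false negative rate, and the names `BinaryMetrics` and `binary_metrics` are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    type_i_error: float
    type_ii_error: float
    error_rate: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the six measures from true and predicted binary labels.

    Assumed definitions (the thesis may define them differently):
    type I error = FP / (FP + TN), type II error = FN / (FN + TP).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative sample")

    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / total
    return BinaryMetrics(
        accuracy=accuracy,
        sensitivity=sensitivity,
        specificity=specificity,
        type_i_error=1.0 - specificity,   # false positive rate
        type_ii_error=1.0 - sensitivity,  # false negative rate
        error_rate=1.0 - accuracy,
    )


if __name__ == "__main__":
    # Toy labels, invented purely for illustration (not thesis data).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Under these definitions, sensitivity and type II error sum to one, as do specificity and type I error, so reporting all six measures is a matter of convenience rather than additional information.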