The use of multi-criteria decision-making model in ranking feature selection technique combinations with classifier
Salami, Sodiq Adewale (2021)
Master's thesis
School of Engineering Science, Industrial Engineering and Management
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2021081243196
Abstract
Combining feature selection techniques with classification methods is widely used to improve the predictive performance of classifiers. Few studies, however, have highlighted the importance of selecting the best combination of feature selection and classification techniques for given performance measures. Accordingly, in this research we select the highest-ranked results from the combinations of four feature selection methods and two classification techniques, using a multi-criteria decision support model on three credit datasets. The feature selection methods considered are the t-test, correlation, entropy measures, and similarity and entropy measures, all of which are applied before prediction. Logistic Regression (LR) and K-Nearest Neighbour (KNN) are used as classifiers. Accuracy and specificity are used as the performance measures for evaluating the classifiers. The Weighted Sum Model (WSM) is used as the Multi-Criteria Decision Making (MCDM) support model for selecting the highest-ranked combination with respect to the impact of the feature selection and classification methods on the performance measures. Three different credit datasets, namely the Australian, German, and Japanese credit datasets, were used in the experiments to obtain robust results. In the ranking phase, the results are obtained by minimizing the number of selected features and maximizing the performance measure for the data used in the experiments. The results obtained when all four feature selection methods are combined with the two classifiers on the three datasets are ranked by the performance measures accuracy and specificity. For the Australian credit dataset, the highest-ranked combination for accuracy is the t-test with LR.
The same approach was used to rank the combinations for specificity on the Australian dataset, where the highest rank is achieved by the t-test in combination with both classifiers explored in this study. The same approach is also used for the German credit dataset. For accuracy, the highest-ranked combination on the German credit dataset is entropy measures combined with LR; for specificity, it is entropy measures combined with KNN. The ranking for the Japanese credit dataset follows the same procedure: similarity and entropy measures combined with LR is the highest-ranked combination for both accuracy and specificity. The results illustrate why feature selection should be used together with classification methods when making predictions on credit datasets. As seen in the outcomes for almost all datasets, the performance of a classification technique is strongly affected when feature selection is not applied. Classification without feature selection ranked lowest for all the credit datasets. Therefore, combining feature selection with classification is vital when the performance measure is of utmost importance.
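
As a rough illustration of the ranking phase, the following minimal Python sketch shows how a Weighted Sum Model could rank combinations by one benefit criterion (a performance measure, to be maximized) and one cost criterion (the number of selected features, to be minimized). The function name `wsm_rank`, the linear normalization, the weights, and all numeric values are assumptions made for illustration only; they are not taken from the thesis and do not reproduce its results.

```python
def wsm_rank(alternatives, weights):
    """Rank alternatives with a Weighted Sum Model (illustrative sketch).

    alternatives: dict mapping a name to (performance, n_features)
    weights: (w_performance, w_features), assumed to sum to 1
    """
    best_perf = max(perf for perf, _ in alternatives.values())
    fewest = min(n for _, n in alternatives.values())

    scores = {}
    for name, (perf, n_feat) in alternatives.items():
        perf_norm = perf / best_perf   # benefit criterion: higher is better
        feat_norm = fewest / n_feat    # cost criterion: fewer is better
        scores[name] = weights[0] * perf_norm + weights[1] * feat_norm

    # Highest weighted sum first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical values for illustration, NOT results from the thesis
candidates = {
    "t-test + LR": (0.86, 5),
    "correlation + KNN": (0.84, 8),
    "no selection + LR": (0.85, 14),
}

ranking = wsm_rank(candidates, weights=(0.7, 0.3))
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up inputs, the alternative that uses every feature scores lowest even though its performance value is close to the best, because the cost criterion penalizes the larger feature count. Different weights or a different normalization scheme could change the ordering.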