Using Machine Learning to Predict Student Performance
Pojon, Murat (2017)
Degree Programme in Computer Sciences
Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Approval date: 2017-06-13
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:uta-201706262111
Abstract
This thesis examines the application of machine learning algorithms to predict whether or not a student will be successful. It focuses specifically on comparing machine learning methods and feature engineering techniques in terms of how much each improves prediction performance.
Three machine learning methods were used: linear regression, decision trees, and naïve Bayes classification. Feature engineering, the process of modifying and selecting the features of a data set, was used to improve the predictions made by these learning algorithms.
Two data sets containing student records were used. The machine learning methods were applied to both the raw and the feature-engineered version of each data set to predict the students' success.
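The raw-versus-engineered comparison can be illustrated with a small sketch. It assumes Python, a hand-rolled categorical naïve Bayes classifier with add-one (Laplace) smoothing, and a handful of hypothetical student records (study hours, absences, favourite colour). The attribute names, binning thresholds and data are invented for illustration; they are not the thesis's data sets or its actual feature engineering.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: (study_hours, absences, favourite_colour, outcome).
# Invented for illustration only; not taken from the thesis data sets.
TRAIN = [
    (2, 8, "blue", "fail"), (3, 6, "blue", "fail"),
    (1, 10, "blue", "fail"), (4, 7, "red", "fail"),
    (9, 1, "red", "pass"), (7, 2, "red", "pass"),
    (8, 0, "red", "pass"), (10, 3, "blue", "pass"),
]
TEST = [
    (5, 9, "red", "fail"), (6, 4, "blue", "pass"),
    (0, 11, "red", "fail"), (11, 4, "red", "pass"),
]


def raw_features(record):
    """Use every attribute exactly as recorded (the 'raw' version)."""
    return record[:3]


def engineered_features(record):
    """Hypothetical feature engineering: bin the two numeric attributes
    and drop favourite_colour, which is assumed to be irrelevant."""
    hours, absences, _colour, _outcome = record
    return ("high" if hours >= 6 else "low",
            "few" if absences <= 4 else "many")


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    seen_values = defaultdict(set)       # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values


def predict(model, row):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(seen_values[i])
            score += math.log(numerator / denominator)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(featurize):
    """Train on TRAIN and return the share of TEST records predicted correctly."""
    model = train_naive_bayes([featurize(r) for r in TRAIN],
                              [r[3] for r in TRAIN])
    hits = sum(predict(model, featurize(r)) == r[3] for r in TEST)
    return hits / len(TEST)


if __name__ == "__main__":
    print("raw:", accuracy(raw_features))
    print("engineered:", accuracy(engineered_features))
```

On this contrived data the script prints a raw accuracy of 0.25 and an engineered accuracy of 1.0: with the raw attributes, every test value of study hours and absences is unseen during training, so the spuriously correlated colour attribute decides the prediction. The sketch only shows the mechanism by which feature engineering can matter; it says nothing about the size of the effect on the thesis's data.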
The thesis reaches the same conclusion as earlier studies: student performance can be predicted successfully with machine learning. The best algorithm was naïve Bayes classification for the first data set, with 98 percent accuracy, and decision trees for the second data set, with 78 percent accuracy. On the data used in this study, feature engineering was a more important factor in prediction performance than the choice of method.