A review of data mining in bioinformatics
Limo, Vincent (2019)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-202001091160
Abstract
In the information age, there has been a phenomenal growth in potentially deadly diseases such as cancer. Because of this, there is a need for cancer and other biomedical research in and around transcriptomics, genomics and genetics, fields that directly apply computer science methods such as data analysis and mathematics.
The aim of this bachelor’s thesis is to highlight and discuss in detail the application of data mining techniques in bioinformatics. It begins by discussing the interdisciplinary relationship between data mining, knowledge discovery and bioinformatics, and then presents comprehensive descriptive research on data mining techniques and their application in bioinformatics. The results established that gene expression analysis and gene sequencing rely on clustering techniques such as hierarchical, fuzzy, graph, and distance clustering, while classification techniques such as machine vector learning, supervised learning, support vector machine and random forest are fundamental in genomic and proteomic synthesizing. The thesis recommends data transformation, cleaning, and scalable statistical models as solutions to the prominent data quality and computational challenges in data mining. It is divided into four main parts: Introduction, Data mining, Application of data mining in bioinformatics, and Conclusion.