Machine Learning Method Comparison in Agricultural Data Analysis

Nevavuori, Petteri

Nevavuori, Petteri (2017)

Nevavuori, Petteri: Machine Learning Method Comparison in Agricultural Data Analysis (3.595 MB)

Management and Information Technology (Pori)
Faculty of Business and Built Environment

This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.

Acceptance date

2017-06-07

The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201705261546

Abstract

The aim of this master’s thesis was to compare machine learning methods in clustering and regression tasks using data collected from Finnish dairy farms by Mtech Digital Solutions Oy. Clustering techniques find similarities between the items of a dataset by examining the data itself. Regression techniques, in turn, are used to build predictive models for the dataset. A common theme across all machine learning methods is that they apply statistical and mathematical algorithms to data that is too complex to handle manually.
The data has been collected over a period spanning tens of years and has been used by agricultural experts to provide on-site insights and counselling to farmers across Finland. Recent advances in machine learning have, however, sparked the interest of the thesis’ employer in data-driven modelling and information acquisition practices that yield standardized and invariant conclusions about the health and progression of farms. Two datasets were formed: one for the clustering task and one for the regression task. The clustering dataset contained information about dairy farms’ production and cattle-related health treatment records. The regression dataset, in turn, encompassed all the metrics about farms as businesses.
In total, eight machine learning methods were compared: four clustering methods and four regression methods. The clustering methods were Hierarchical Clustering, k-Means, Self-Organizing Maps and BIRCH, and the regression methods were Ordinary Least Squares, Decision Tree Regression, Multilayer Perceptron and XGBoost. For clustering, the conclusion was that k-Means performed best, although all methods performed roughly equally, with BIRCH being the only exception. For regression, the conclusion was that XGBoost delivered the best results by performing well score-wise and providing the needed information about the most important features.
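
As a rough illustration of what a score-based comparison of regression methods can look like, the sketch below fits two of the simpler method families named in the abstract (Ordinary Least Squares and a decision tree, here reduced to a single split) on the same data and scores both on held-out points. Everything in it is an assumption made for illustration: the toy data is synthetic and not from the thesis, the function names `fit_ols`, `fit_stump`, `r2_score` and `compare_methods` are hypothetical, and the R² metric is a stand-in, since the abstract does not say which score the thesis used.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, mean prediction per side."""
    best = None
    candidates = sorted(set(xs))
    # Try a split halfway between each pair of neighbouring x values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def compare_methods(train_x, train_y, test_x, test_y):
    """Fit every method on the training data and score it on the test data."""
    fitters = {"ols": fit_ols, "stump": fit_stump}
    scores = {}
    for name, fit in fitters.items():
        model = fit(train_x, train_y)
        scores[name] = r2_score(test_y, [model(x) for x in test_x])
    return scores


# Synthetic toy data (NOT from the thesis): a linear trend with a small
# deterministic wiggle for training, and the clean trend for testing.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(train_x)]
test_x = [i + 0.5 for i in range(9)]
test_y = [2 * x + 1 for x in test_x]

scores = compare_methods(train_x, train_y, test_x, test_y)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: R^2 = {score:.4f}")
```

On this deliberately linear toy data the linear model scores higher than the single-split tree. That ranking is a property of the synthetic data only and says nothing about the thesis' results, where XGBoost is reported to have performed best.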

Collections

Theses - Master's degree [37813]