MODELLING OVERDISPERSED SEED GERMINATION DATA: XGBOOST'S PERFORMANCE
G. Ser1* and C. T. Bati2
1Department of Animal Science, Faculty of Agriculture, Van Yuzuncu Yil University, Van, Turkey
2Department of Animal Science, Graduate School of Natural and Applied Sciences, Van Yuzuncu Yil University, Van, Turkey
*Corresponding author’s email: gazelser@gmail.com
ABSTRACT
Overdispersion arises in germination count data when the variability of the counts is large, and it makes estimation considerably more difficult. In this study, gradient boosting algorithms are used as a new approach to support precision agriculture applications in estimating overdispersed germination counts. A dataset was assembled from germination counts of weeds (Amaranthus retroflexus L. and Chenopodium album L.) and cultivated plants (Beta vulgaris L. and Zea mays L.) germinated with white cabbage seedlings, which are known for their allelochemical effects. Gradient boosting (GB) and extreme gradient boosting (XGBoost) models were first fitted with default hyperparameter values to estimate the germination counts of each plant; different combinations of hyperparameters were then evaluated to optimize model performance. Root mean square error (RMSE), mean Poisson deviance (MPD) and the coefficient of determination (R2) were used as the statistical criteria for evaluating the algorithms. In the experimental results, XGBoost outperformed GB with both the default settings and the hyperparameter combinations for the germination counts of A. retroflexus, C. album, B. vulgaris and Z. mays (RMSE: 0.725-2.506 and R2: 0.97-0.99). These results indicate that XGBoost successfully predicted germination counts obtained under experimental conditions. On this basis, we suggest using optimized XGBoost models for larger count datasets in precision agriculture.
Key words: Estimation, boosting algorithms, count data, germination
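As an illustration of the evaluation criteria named in the abstract, the following is a minimal sketch of RMSE, mean Poisson deviance and R2 computed with NumPy, together with a variance-to-mean ratio as a simple check for overdispersion. The function names and the small example arrays are hypothetical and are not taken from the study; the deviance formula is the standard Poisson form, assumed here to match the MPD criterion.

```python
import numpy as np


def dispersion_index(counts):
    """Variance-to-mean ratio; values well above 1 suggest overdispersion
    relative to a Poisson distribution."""
    counts = np.asarray(counts, dtype=float)
    return float(np.var(counts, ddof=1) / np.mean(counts))


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: 2 * mean(y * log(y / mu) - y + mu).

    The term y * log(y / mu) is taken as 0 when y == 0.
    Predictions must be strictly positive.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_pred <= 0):
        raise ValueError("predictions must be strictly positive")
    # Replace the ratio by 1 where y == 0 so that log(1) = 0 there.
    ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    return float(2.0 * np.mean(y_true * np.log(ratio) - y_true + y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical germination counts and model predictions (illustrative only).
    observed = [0, 2, 5, 9, 20]
    predicted = [0.5, 2.5, 4.0, 10.0, 18.0]
    print("dispersion index:", dispersion_index(observed))
    print("RMSE:", rmse(observed, predicted))
    print("MPD:", mean_poisson_deviance(observed, predicted))
    print("R2:", r2(observed, predicted))
```

Lower RMSE and MPD and higher R2 indicate a better fit; MPD is the criterion most directly tied to the count nature of the response, since it penalizes errors relative to the predicted mean.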