Using a random cross-validation technique to compare typical regression vs. Random Forests for modelling pan evaporation

DOI:

https://doi.org/10.36253/ijam-2043

Keywords:

Class A pan evaporation, random cross validation, regression model, random forest model, machine learning, Highest Posterior Density Distribution of solutions

Abstract

Pan evaporation (Epan) from a class A pan evaporimeter under local semi-arid conditions was modelled in this study from meteorological observations using an integrated regression approach with three steps: a) first step: selection of appropriate transformations to reduce departures from normality of the independent variables, and ridge regression to select variables with low collinearity based on variance inflation factors; b) second step (RCV-REG): regression (REG) of the final model on the selected, transformed, low-collinearity variables, implemented through an iterative procedure called “Random Cross-Validation” (RCV) that repeatedly splits the data at random into calibration and validation subsets; c) third step: robustness control of the regression coefficients estimated by RCV-REG, by analyzing the sign (+ or -) variation of their iterative solutions using the 95% interval of their Highest Posterior Density Distribution (HPD). The iterative RCV procedure can also be applied to machine learning (ML) methods; for this reason, the Random Forests (RF) method was also run with RCV (RCV-RF) as an additional case for comparison with RCV-REG. Random splitting of the data into calibration and validation sets (70% and 30%, respectively) was performed 1,000 times in RCV-REG, yielding a corresponding number of solutions for the regression coefficients. The same number of iterations and the same random splitting for validation were used in RCV-RF.
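
The random cross-validation loop and the sign check on the 95% HPD interval can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the plain least-squares fit (no transformations or ridge-based variable selection), and the function names `rcv_regression` and `hpd_interval` are assumptions made for illustration, and the HPD interval is approximated empirically as the narrowest interval containing 95% of the iterative solutions.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Empirical HPD: narrowest interval holding `mass` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # number of points inside the interval
    widths = x[k - 1:] - x[:n - k + 1]    # width of every window of k sorted points
    i = int(np.argmin(widths))            # narrowest window
    return x[i], x[i + k - 1]

def rcv_regression(X, y, n_iter=1000, cal_frac=0.7, seed=0):
    """Repeat a random calibration/validation split and refit each time."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_cal = int(round(cal_frac * n))
    A = np.column_stack([np.ones(n), X])  # add an intercept column
    coefs = np.empty((n_iter, A.shape[1]))
    rmse_val = np.empty(n_iter)
    for it in range(n_iter):
        idx = rng.permutation(n)
        cal, val = idx[:n_cal], idx[n_cal:]
        # Ordinary least squares on the calibration subset only
        beta, *_ = np.linalg.lstsq(A[cal], y[cal], rcond=None)
        coefs[it] = beta
        resid = y[val] - A[val] @ beta    # errors on the held-out subset
        rmse_val[it] = np.sqrt(np.mean(resid ** 2))
    return coefs, rmse_val

# Synthetic stand-in for the (transformed) meteorological predictors
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + data_rng.normal(scale=0.5, size=200)

coefs, rmse_val = rcv_regression(X, y)
for j in range(coefs.shape[1]):
    lo, hi = hpd_interval(coefs[:, j])
    sign_is_constant = (lo > 0) or (hi < 0)  # interval does not cross zero
    print(f"coef {j}: 95% HPD = [{lo:.3f}, {hi:.3f}], constant sign: {sign_is_constant}")
print(f"mean validation RMSE over iterations: {rmse_val.mean():.3f}")
```

A coefficient whose 95% HPD interval stays on one side of zero keeps the same sign across nearly all random splits, which is the robustness criterion described in step c).
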
The results showed that RCV-REG outperformed RCV-RF on all model performance criteria, providing robust regression coefficients for the independent variables (constant signs across their 95% HPD intervals) and a better distribution of validation solutions in the iterative 1:1 plots than RCV-RF (RCV-REG: R2=0.843, RMSE=0.853, MAE=0.642, MAPE=0.081, NSE=0.836, Slope(1:1 plot)=0.998, Intercept(1:1 plot)=0.011; RCV-RF: R2=0.835, RMSE=0.904, MAE=0.689, MAPE=0.088, NSE=0.818, Slope(1:1 plot)=1.120, Intercept(1:1 plot)=-1.011; mean values over 1,000 iterations). Applying the RCV approach to different modelling methods addresses the problem of subjective splitting of data into calibration and validation sets, provides a better evaluation of the final models, and enhances the competitiveness of typical regression models against machine learning models.
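
For readers who want to reproduce the kind of criteria listed above, the following sketch computes them for a single validation subset using their standard textbook definitions. The function name `performance` is hypothetical, MAPE is returned as a fraction (consistent with the magnitude of the reported values), R2 is taken as the squared Pearson correlation, and the 1:1 plot line is assumed to be predicted values regressed on observed values; the abstract does not state these conventions, so they are assumptions.

```python
import numpy as np

def performance(obs, pred):
    """Standard goodness-of-fit criteria for one validation subset."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))          # root mean square error
    mae = np.mean(np.abs(err))                 # mean absolute error
    mape = np.mean(np.abs(err / obs))          # as a fraction; obs must be non-zero
    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2     # squared Pearson correlation
    # Assumed convention: predicted (y-axis) regressed on observed (x-axis)
    slope, intercept = np.polyfit(obs, pred, 1)
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape,
            "NSE": nse, "Slope": slope, "Intercept": intercept}

# Toy example: every prediction is exactly one unit too high
stats = performance([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In an RCV setting, such a function would be called once per iteration on the validation subset, and the resulting values averaged over all iterations.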

Published

2024-08-26

How to Cite

Babakos, K., Papamichail, D., Pisinaras, V., Tziachris, P., Demertzi, K., & Aschonitis, V. (2024). Using a random cross-validation technique to compare typical regression vs. Random Forests for modelling pan evaporation. Italian Journal of Agrometeorology, (1), 59–72. https://doi.org/10.36253/ijam-2043

Section

RESEARCH ARTICLES