Predicting symptoms of downy mildew, powdery mildew, and gray mold diseases of grapevine through machine learning

Downy mildew, powdery mildew, and gray mold are major diseases of grapevine with a strong negative impact on fruit yield and fruit quality. These diseases are controlled by the application of chemicals, which may cause undesirable effects on the environment and on human health. Thus, monitoring and forecasting crop disease is essential to support integrated pest management (IPM) measures. In this study, two tree-based machine learning (ML) algorithms, random forest and C5.0, were compared to test their capability to predict the appearance of symptoms of grapevine diseases, considering meteorological conditions, spatial indices, the number of crop protection treatments and the frequency of monitoring days in which symptoms were recorded in the previous year. Data on the presence of symptoms on grapevine, collected in the Tuscany region (Italy) from 2006 to 2017, were divided with an 80/20 proportion into a training set and a test set; data collected in 2018 and 2019 were tested as independent years for downy mildew and powdery mildew. The frequency of symptoms in the previous year and the cumulative precipitation from April to seven days before the monitoring day were the most important variables among those considered in the analysis for predicting the occurrence of disease symptoms. The best performance in predicting the presence of symptoms of the three diseases was obtained with the algorithm C5.0 by applying (i) a technique to deal with imbalanced datasets (i.e., symptoms were detected in the minority of observations) and (ii) an optimized cut-off for predictions. The balanced accuracy achieved in the test set was 0.8 for downy mildew, 0.7 for powdery mildew and 0.9 for gray mold. The application of the models for downy mildew and powdery mildew in the two independent years (2018 and 2019) achieved a lower balanced accuracy, around 0.7 for both diseases.
Machine learning models were able to select the best predictors and to unravel the complex relationships among geographic indices, bioclimatic indices, protection treatments and the frequency of symptoms in the previous year.


INTRODUCTION
Downy mildew, powdery mildew, and gray mold are major diseases of grapevine (Vitis vinifera L.), affecting leaves and fruits and causing yield loss and quality decrease of must and wine. Downy mildew is caused by the Oomycete Plasmopara viticola (Berk. & Curt.) Berl. & de Toni, with sexual spores determining primary infections and asexual spores causing secondary infections (Gessler et al., 2011). This pathogen infects leaves, shoots, and bunches, damaging up to 75% of the crop in one season when no treatments are applied (Buonassisi et al., 2017), thus leading to great economic losses. Powdery mildew is caused by Erysiphe necator Schwein., a polycyclic disease with two distinct phases: primary infections are caused by sexual spores (ascospores) and secondary infections are determined by asexual spores (conidia) (Gadoury and Pearson, 1988), on all green tissues of grapevines, mainly leaves and berries (Gadoury et al., 2001; Caffi et al., 2011). Botrytis cinerea Pers. is the causal agent of gray mold and in grapevine infects all green tissues, particularly ripening berries, with different infection pathways for conidia (inflorescences, young clusters and ripening berries) and mycelium (berry-to-berry) (Elmer et al., 2007).
Because these pathogens may cause severe symptoms on grapevines at the beginning of infection, control strategies have focused on early treatments, even in integrated pest management (IPM), as prevention to stop the pathogen outbreak before its establishment. Applying fungicide treatments during the growing season remains the most common practice to control these diseases, from early spring onward, with differences between years due to weather conditions and to the geographic location of the vineyard (Chen et al., 2020; Lu et al., 2020; Molitor et al., 2016). However, concerns about the negative impact of chemicals on environmental and human health have resulted in restrictions to regulate fungicide use, such as the EU directives (e.g., Directive 1107/2009/EU) (Valdés-Gómez et al., 2017). The European Commission currently enforces national action plans for pesticide reduction, encouraging the use of monitoring networks (Directive 128/2009/EC), forecasting models, and dissemination tools to share this information among growers and technicians (Pertot et al., 2017). Therefore, a reliable monitoring and forecasting system is essential for deriving prediction indices in support of sustainable protection measures (e.g., Marchi et al., 2016).
To this aim, various weather-driven models, either mechanistic (Rossi et al., 2008; Caffi et al., 2011; Legler et al., 2011; Gonzales et al., 2015) or empirical (Orlandini et al., 1993; Rodríguez-Rajo et al., 2010; Hill et al., 2019), have been developed for predicting grapevine diseases and assisting farmers in decision-making for crop protection. Decision support systems (DSSs), based on predictive models that use weather data and infection information, may provide this service to farmers (Rossi et al., 2014; Pertot et al., 2017). In particular, DSSs may help determine the time window for fungicide application to optimize their effects and to reduce the number of interventions during the growing season. Nevertheless, currently available models are mainly focused on predicting the risk of the outbreak, rather than the pressure of the disease. This approach may cause unnecessary fungicide applications and the use of untargeted chemical compounds, in turn contravening control regulations based on the maximum number of treatments allowed for each season (mandatory in IPM).
Since these three diseases are strongly influenced by seasonal weather conditions, albeit with different pathways among vectors, varying annually and driven by composite interactions between the disease agent and the host plant (growth stage and grapevine cultivar), models that provide information on infection risk need to combine numerous weather variables, crop parameters, and disease traits. Increasing computing power is providing the means to capture and process abundant data, and to reveal associations among variables that describe the weather-pathogen-host interactions. In particular, machine learning (ML) techniques allow a large number of variables to be considered and diverse data sources to be integrated in near real time, in order to assess the interactions among disease agent, host plant, and climatic variability before visible symptoms are present, with the aim of ensuring effective and sustainable fungicide management (Lee et al., 2019; Sperschneider, 2019).
The potential of statistical models and ML algorithms to predict the occurrence of grapevine diseases has been rarely assessed (Chen et al., 2020). Here, we investigated the ability of ML algorithms to clarify the occurrence of symptoms of these three important diseases of grapevine based on prevailing weather conditions both within and between locations and years, generating temporally and spatially explicit projections of the infections. These models were implemented using as inputs: bioclimatic and geographic indices, the frequency of monitored symptoms in the previous year, and the number of crop protection treatments during the growing season. The aim of the study was to calculate the overall probability of symptom appearance at field scale, using the ML approach and the area-wide IPM monitoring network of Regione Toscana, providing farmers with a tool able to address timely and accurate grapevine disease forecasting.

Monitoring grapevine diseases
Data on disease symptoms were obtained from Agroambiente.info (http://www.agroambiente.info/), the agricultural and environmental portal of the Phytosanitary Service of Regione Toscana (Italy). Agroambiente.info stores data deriving from an area-wide IPM monitoring network, which covers most of the wine production area of Tuscany. Sampling is carried out weekly by trained field technicians, from the leaf development stage (mid-end of April) to harvest (mid-end of September), in a number of vineyards that varies across years. In each vineyard, date and presence of symptoms of downy mildew, powdery mildew and gray mold are recorded by inspecting leaves and/or bunches. In addition, the date of treatments is reported, as well as the active substance (maximum two active substances for each treatment), among those allowed by the "Integrated Production Regulation" of Regione Toscana. A simplified index of disease severity for each disease is documented during the monitoring activity, though it was not used for the ML exercise. Considering that different cultivars may show variable susceptibilities to the three diseases, the monitoring network focuses only on the cv Sangiovese, which is the most widespread and important for Tuscany denomination of controlled origin red wines. A numeric identification code ("farm ID") is assigned to each of the selected vineyards.
In this study, we considered data from 2006 to 2019, excluding 2011 since no data were available for that year in the regional database. In each dataset of the three diseases, the observations were classified as "inf" or "no", according to the presence or absence of disease symptoms, respectively. Observations were classified as "inf" when symptoms were present on leaves and/or on bunches.

Variables associated with grapevine diseases
Variables were calculated for each vineyard and each disease to be used as features for the ML models. The set of variables included: bioclimatic indices, geographical indices, indices indicating the number of phytosanitary treatments applied, an index referring to the frequency of the presence of infection in the previous year, and the day-of-year (doy) (Tab. 1).
The package 'raster' (Hijmans, 2019a) of the R environment (R Core Team, 2020), was used to extract for each vineyard from the raster files of the Tuscany region: (i) the Euclidean distance from the sea (dis_sea) in m and (ii) the elevation above sea level (m), obtained from the Digital Elevation Model (dem).
Meteorological data were downloaded from the open access ERA5-Land dataset, the latest generation of ECMWF atmospheric reanalysis, which provides hourly data from 1981 to 2-3 months before present in a fixed grid and with a native resolution of 9 km (Copernicus Climate Change Service, 2017).
The ERA5-Land dataset was selected over others (e.g., ERA5, ERA-Interim) because of its higher spatial resolution and its improved correlation with in situ measurements, especially concerning the water cycle (Muñoz-Sabater et al., 2021). Using reanalysis meteorological data, such as the ERA5-Land dataset, for modelling has the main advantage of providing better temporal and spatial coverage than data collected by weather stations, which do not have uniform spatial and temporal coverage and may be subject to breaks (Padulano et al., 2021). Indeed, concerning the density of the weather monitoring network of Tuscany Region, the distance from each vineyard to its nearest station ranged from 120 m to 27000 m, with an average value of 6640 m.
Meteorological data from the ERA5-Land dataset used to calculate daily maximum, minimum and average air temperature (°C) and daily precipitation (mm) for the period from 2006 to 2019 were: "2-m temperature", defined as the hourly temperature of air at 2 m above the ground, sea or inland waters, and "total precipitation", defined as accumulated liquid and frozen water, including rain and snow, that falls to the Earth's surface. The distance between each ERA5 grid-box and each georeferenced monitoring site was calculated through the R package 'geosphere' (Hijmans, 2019b), with the aim of associating each sampling site with an ERA5 grid-box.
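The site-to-grid-box association described above can be sketched as a nearest-neighbour search. This is a minimal Python illustration, not the authors' R code: it assumes a great-circle (haversine) distance on a spherical Earth, which may differ from the distance function actually used in 'geosphere', and the function names and coordinates are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_grid_box(site, grid_centres):
    """Return the (lat, lon) grid centre closest to the monitoring site."""
    return min(grid_centres,
               key=lambda c: haversine_km(site[0], site[1], c[0], c[1]))

# Hypothetical coordinates, for illustration only.
grid_centres = [(43.7, 11.2), (43.8, 11.2), (43.7, 11.3)]
vineyard = (43.72, 11.22)
print(nearest_grid_box(vineyard, grid_centres))
```

Each vineyard then inherits the daily temperature and precipitation series of its matched grid-box.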
Bioclimatic indices were calculated starting from daily data on air temperature and precipitation, considering three different periods: (i) from November to January for the indices describing the weather conditions during overwintering (average of minimum, maximum and mean daily temperature), (ii) from November to March for monthly mean air temperature and cumulative precipitation, (iii) from April to October (monitoring period) for the bioclimatic indices describing the weather conditions in the interval from 14 to 7 days before the monitoring day or during the 7 days before the monitoring day. We considered these two time steps to identify the environmental conditions of the period during which the pathogen penetration into the host tissues was most probable (avg_14_7, avg_max_14_7, avg_min_14_7, cum_rain_14_7) (Chen et al., 2020; Carisse et al., 2009; Barka et al., 2002).
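As an illustration of the windowed indices, the sketch below computes a mean temperature and a cumulative precipitation over the interval from 14 to 7 days before a monitoring day. It is written in Python for compactness (the study used R), and the exact window boundaries (here lags 8 to 14 inclusive), the record layout and the function name are assumptions, since the text does not specify them.

```python
from datetime import date, timedelta

def window_indices(daily, monitoring_day, start_lag=14, end_lag=7):
    """Mean temperature and cumulative rain over lags end_lag+1 .. start_lag.

    `daily` maps a date to {"tmean": degrees C, "rain": mm}.
    The inclusive/exclusive boundaries are an assumption of this sketch.
    """
    days = [monitoring_day - timedelta(days=k)
            for k in range(end_lag + 1, start_lag + 1)]
    temps = [daily[d]["tmean"] for d in days]
    rain = [daily[d]["rain"] for d in days]
    return {"avg_temp": sum(temps) / len(temps), "cum_rain": sum(rain)}

# Synthetic daily series, for illustration only.
start = date(2015, 5, 1)
daily = {start + timedelta(days=i): {"tmean": float(i),
                                     "rain": 1.0 if i % 2 == 0 else 0.0}
         for i in range(30)}
indices = window_indices(daily, start + timedelta(days=20))
print(indices)
```

Calling the same function with `start_lag=7, end_lag=0` would give the corresponding indices for the 7 days before the monitoring day, under the same boundary assumption.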
The phytosanitary treatments were included in the ML models as counts of the applications carried out in each vineyard from the beginning of the vegetative season, considering three periods: (i) cumulative number of treatments carried out until 14 days before the monitoring day (count_tr_14), (ii) number of treatments carried out from 14 to 7 days before the monitoring day (count_tr_7_14), and (iii) number of treatments carried out in the 7 days before the monitoring day (count_tr_0_7).
In addition, the models included as a variable the frequency of monitoring days in which symptoms were recorded in the previous year, to consider the potential presence of the pathogens overwintering in the vineyard. The latter variable was calculated as the percentage of observations in which the presence of symptoms was observed in each year and in each vineyard, and it was assigned to the following monitoring year (perc_inf).
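The three treatment counts can be sketched as follows. This is a hedged Python illustration rather than the authors' R code: the handling of the boundary days (a treatment exactly 7 or 14 days before monitoring) is an assumption, and the dates are invented.

```python
from datetime import date

def treatment_counts(treatment_dates, monitoring_day):
    """Count treatments in the three periods before the monitoring day.

    Boundary handling (lag 14 in the middle period, lag 7 in the last one)
    is an assumption of this sketch.
    """
    lags = [(monitoring_day - t).days
            for t in treatment_dates if t <= monitoring_day]
    return {
        "count_tr_14": sum(1 for k in lags if k > 14),
        "count_tr_7_14": sum(1 for k in lags if 7 < k <= 14),
        "count_tr_0_7": sum(1 for k in lags if 0 <= k <= 7),
    }

# Hypothetical treatment calendar for one vineyard.
treatments = [date(2015, 5, 10), date(2015, 6, 1), date(2015, 6, 18),
              date(2015, 6, 25), date(2015, 7, 5)]
counts = treatment_counts(treatments, date(2015, 6, 30))
print(counts)
```

The treatment on 5 July falls after the monitoring day and is therefore ignored.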
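The perc_inf calculation can be sketched in a few lines. The Python below is an illustration only; the tuple layout of the observations and the function name are assumptions, while the labels "inf" and "no" follow the classification used in this study.

```python
from collections import defaultdict

def perc_inf_previous_year(observations):
    """Percentage of 'inf' observations per vineyard and year,
    keyed by the FOLLOWING year, to which the value is assigned.

    `observations` is an iterable of (farm_id, year, label) tuples.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for farm_id, year, label in observations:
        totals[(farm_id, year)] += 1
        positives[(farm_id, year)] += 1 if label == "inf" else 0
    return {(farm_id, year + 1): 100.0 * positives[(farm_id, year)] / n
            for (farm_id, year), n in totals.items()}

# Hypothetical monitoring records.
records = [(101, 2015, "inf"), (101, 2015, "no"), (101, 2015, "no"),
           (101, 2015, "inf"), (102, 2015, "no"), (102, 2015, "no")]
perc_inf = perc_inf_previous_year(records)
print(perc_inf)
```

Vineyards without records in the previous year have no entry in the result; how such cases were handled in the study is not stated here.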

Data analysis
The three datasets on the symptoms observed of downy mildew, powdery mildew and gray mold covered a period from 2006 to 2019, excluding 2011 since no data were available for that year. The datasets had a different number of observations: 18857 for downy mildew, 14848 for powdery mildew, and 4960 for gray mold.
The dataset of each disease was partitioned with the aim of training and testing the ML models. The two datasets, downy mildew and powdery mildew, were divided into one training set and two test sets. In particular, data collected in the period from 2006 to 2017 were partitioned with an 80/20 proportion in "training" and "test 1", respectively. The partition was carried out using the R package 'healthcareai' (Thatcher et al., 2020), considering the group "farm ID x year", which ensured that observations from each vineyard in each year were not contained in both training set and test set. A further test ("test 2") included data collected in 2018 and 2019 to evaluate the performance of the model on two independent years. Since fewer data were available in comparison with the other two diseases, the dataset on gray mold infection (2006-2019) was partitioned only in training set and test set with an 80/20 proportion in "training" and "test 20%", considering the group "farm ID x year".
The class "inf" was present in a different percentage of the total observations for the three diseases: (i) 37% in training set, 35% in "test 1", and 58% in "test 2" for downy mildew; (ii) 16% in training set, 15% in "test 1", and 28% in "test 2" for powdery mildew; (iii) 10% in training set and 8% in test set for gray mold.
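The grouped partition can be illustrated with a small sketch in which whole "farm ID x year" groups, rather than single observations, are sent to the test set. This is Python written for illustration, not the 'healthcareai' implementation: the 20% is applied here to the number of groups (an assumption), and the function name, seed and synthetic rows are hypothetical.

```python
import random

def grouped_split(rows, group_key, test_fraction=0.2, seed=42):
    """Split rows so that no group appears in both training and test sets."""
    groups = sorted({group_key(r) for r in rows})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if group_key(r) not in test_groups]
    test = [r for r in rows if group_key(r) in test_groups]
    return train, test

# Synthetic data: 10 farms x 2 years x 3 weekly observations.
rows = [{"farm_id": f, "year": y, "week": w}
        for f in range(1, 11) for y in (2015, 2016) for w in range(3)]
key = lambda r: (r["farm_id"], r["year"])
train, test = grouped_split(rows, key)
print(len(train), len(test))
```

Keeping each vineyard-year intact avoids weekly observations from the same epidemic appearing on both sides of the split, which would inflate test performance.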
Spearman's correlation among the variables associated with each disease was calculated with the R package 'Hmisc' (Harrell, 2019), to remove redundant features, highlighting variables that were highly correlated. Thus, in the case that the Spearman's correlation coefficient between two variables was higher than 0.9 (absolute value), we selected the one with the highest importance, using a filter approach based on the Receiver Operating Characteristic (ROC) curve analysis, a plot of true positive rate (TPR) versus false positive rate (FPR) at various threshold settings.
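The redundancy filter can be sketched as below. This Python version is a simplified, greedy variant written for illustration: variables are visited in decreasing order of importance and kept only if their absolute Spearman correlation with every variable already kept does not exceed 0.9. The importance scores stand in for the ROC-based filter values and, like the feature values, are invented.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def drop_redundant(features, importance, threshold=0.9):
    """Greedy filter: keep the more important variable of each correlated pair."""
    kept = []
    for name in sorted(features, key=lambda n: importance[n], reverse=True):
        if all(abs(spearman(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept

# Invented values and importance scores, for illustration only.
features = {"avg_14_7": [10, 12, 15, 18, 21, 25],
            "avg_max_14_7": [15, 18, 20, 24, 28, 31],
            "cum_rain_14_7": [30, 2, 14, 0, 22, 5]}
importance = {"avg_14_7": 0.62, "avg_max_14_7": 0.60, "cum_rain_14_7": 0.70}
selected = drop_redundant(features, importance)
print(selected)
```

In this toy example the two temperature indices are perfectly rank-correlated, so only the more important one is retained.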
Machine learning models selected for comparison were: (i) Random forest (RF), based on several decision trees, which operates as an ensemble to produce an output with low bias and lower variance than each single tree; and (ii) C5.0, based on a single binary decision tree or a collection of rules with a boosted procedure. Both algorithms were tree-based models, being able to handle complex non-linear relationships and outperforming other ML algorithms in earth science and ecology applications (Thessen, 2016).
The train function of the R package 'caret' (Kuhn, 2020) was used to train and tune the two models, RF and C5.0, by means of the ROC metric, using a 10-fold cross-validation clustered by the grouping factor "farm ID x year". Thus, while running the train function of 'caret', through the 10-fold cross-validation the training set is partitioned in 10 equal size subsamples of which 9 subsamples are used to train the model and a single subsample is retained as the validation data for testing the performance of the model with the aim of tuning the model parameters.
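The clustered cross-validation can be pictured as assigning whole "farm ID x year" groups to folds. The sketch below is a Python illustration of that idea under stated assumptions (round-robin assignment of shuffled groups, hypothetical function name and data); it is not the procedure implemented in 'caret'.

```python
import random

def grouped_folds(groups, k=10, seed=1):
    """Return k (training_indices, validation_indices) pairs in which
    every group falls entirely within a single validation fold."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    folds = []
    for f in range(k):
        validation = [i for i, g in enumerate(groups) if fold_of[g] == f]
        training = [i for i, g in enumerate(groups) if fold_of[g] != f]
        folds.append((training, validation))
    return folds

# Synthetic grouping factor: 10 farms x 2 years x 3 observations each.
groups = [f"{farm}_{year}" for farm in range(1, 11)
          for year in (2015, 2016) for _ in range(3)]
folds = grouped_folds(groups, k=10)
print([len(validation) for _, validation in folds])
```

With real data the folds are only approximately equal in size, because groups contain different numbers of observations.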
The models were evaluated using a confusion matrix among observed and predicted classes (Tab. 2) and a set of performance metrics on the training set and test set (Tab. 3).
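For reference, the main metrics discussed in this paper can be derived from the four cells of a binary confusion matrix as sketched below. The Python code uses the standard definitions; the counts are hypothetical and do not correspond to any table of this study.

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2x2 confusion matrix ('inf' = positive class)."""
    sensitivity = tp / (tp + fn)   # share of observed "inf" correctly identified
    specificity = tn / (tn + fp)   # share of observed "no" correctly identified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # share of predicted "inf" that were correct
        "npv": tn / (tn + fn),     # share of predicted "no" that were correct
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "informedness": sensitivity + specificity - 1,
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy and informedness weight the two classes equally, which is why they are preferred over plain accuracy when "inf" is the minority class.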
The best performing algorithm (evaluated on training set and test set), was further optimized: (i) applying subsampling techniques for class imbalance during the training with the R package 'caret', and (ii) selecting the cut-off, to be applied on the probability outputs of the models for classification, which optimized the informedness (Specificity+Sensitivity-1) of the trained model, using the R package 'MLeval' (John, 2020).
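Both optimization steps can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the 'caret' or 'MLeval' implementation: down-sampling randomly reduces every class to the size of the smallest one, and the cut-off search simply evaluates the informedness at each distinct predicted probability. Function names, the seed, and the probabilities and labels are hypothetical.

```python
import random

def downsample(rows, label_of, seed=7):
    """Randomly reduce every class to the size of the minority class."""
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    n_min = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n_min))
    return balanced

def best_cutoff(probabilities, labels, positive="inf"):
    """Cut-off maximizing informedness (sensitivity + specificity - 1).

    An observation is predicted positive when its probability >= cut-off.
    Both classes must be present in `labels`.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(probabilities)):
        tp = fn = fp = tn = 0
        for p, y in zip(probabilities, labels):
            if y == positive:
                tp, fn = (tp + 1, fn) if p >= cutoff else (tp, fn + 1)
            else:
                fp, tn = (fp + 1, tn) if p >= cutoff else (fp, tn + 1)
        informedness = tp / (tp + fn) + tn / (tn + fp) - 1
        if informedness > best[1]:
            best = (cutoff, informedness)
    return best

# Hypothetical predicted probabilities of "inf" and observed classes.
probs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55, 0.70, 0.80, 0.90]
obs = ["no", "no", "no", "no", "inf", "no", "inf", "inf", "no", "inf"]
cutoff, informedness = best_cutoff(probs, obs)
balanced = downsample(obs, label_of=lambda y: y)
print(cutoff, round(informedness, 3), len(balanced))
```

In this toy example the optimal cut-off lies below the default 0.5, which raises sensitivity at the cost of some specificity.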
For the best performing algorithm, the importance of variables in the modelling mechanism was extracted using the function varImp() of the R package "caret".
The results of the cross-validation on the training set highlighted ROC-AUC values higher than 0.8 for both RF and C5.0 (Tab. 4).
AUC-PR was higher than 0.6 for downy mildew and gray mold, while it was around 0.6 for powdery mildew. The sensitivity was around 0.7 for downy mildew, while it was around 0.4 for powdery mildew and gray mold.
The specificity was around 0.8 for downy mildew, while it was higher than 0.9 for the other two diseases. The two algorithms performed similarly on the training set, with slightly better results for RF.
For all the three diseases, the algorithm C5.0 performed better than RF on the test set ("test 1"), reporting in particular a higher sensitivity and a higher balanced accuracy (Tab. 5).
The C5.0 algorithm predicted the presence of symptoms of downy mildew with a balanced accuracy of 78%. The overall predicted "inf" were correct for 71% of the cases, whereas the percentage of correctly predicted "no" on the total prediction of no symptoms was 87%. The percentage of cases in which "inf" was correctly identified was 74%, while for "no" it was 85%. The presence of symptoms of powdery mildew was predicted by C5.0 with a balanced accuracy of 69%. The overall predicted "inf" were correct for 42% of the cases, whereas the percentage of correctly predicted "no" on the total prediction of no symptoms was 96%. The percentage of cases in which "inf" was correctly identified was 63%, while for "no" it was 90%. The presence of symptoms of gray mold was predicted by C5.0 with a balanced accuracy of 79%. The overall predicted "inf" were correct for 59% of the cases, whereas the percentage of correctly predicted "no" on the total prediction of no symptoms was 99%. The percentage of cases in which "inf" was correctly identified was 85%, while for "no" it was 96%.
The subsampling technique "down" was selected as the best according to the performance on the test set of the three diseases (Tab. S1). Applying the subsampling technique to the algorithm C5.0 increased the percentage of cases in which "inf" was correctly identified and the balanced accuracy for all the three diseases (Tab. 6). Moreover, the informedness of the C5.0 algorithm with down-sampling was the highest when applying a cut-off equal to: (i) 0.46 for the prediction of downy mildew and
Results on "test 2" highlighted a prediction accuracy around 0.7 for the symptoms of both downy mildew and powdery mildew, in both 2018 and 2019 (Tab. 7).
The importance of the variables in the modelling process for the three diseases is reported in Tab. 8.
Around 50% of the splits were associated with the first six most important variables for downy mildew, namely: the percentage of observations of the year before in which symptoms were present, day of year, the cumulative precipitation from April to 7 days before the monitoring day, the elevation, and the precipitation of March and February.
For powdery mildew, the first eight most important variables covered around 50% of the splits, being: the percentage of the observation of the previous year in which symptoms appeared, the distance from sea, the cumulative precipitation from April until 7 days before the monitoring day, cumulative degree day (gdd) from April until 7 days before the monitoring day, the elevation, the count of the treatments carried out until 14 days before the monitoring day, the average minimum temperature between 14 and 7 days before the monitoring day, the precipitation of February.
For gray mold, the first six most important variables were associated with around 50% of the splits: the
Table 8. Importance of variables in the modelling process of the algorithm C5.0 'down' for the three diseases. The importance is calculated as the percentage of splits associated with each predictor (metric = 'splits').
Among the three models, common variables on the top of the list were: perc_inf and cum_rain_7. Similar variables, such as doy and gdd, were ranked within the top variables for all the three models. Partial dependence plots (pdp) (Fig. 1) represent the marginal effect of the latter variables on the probability of predicting the presence of symptoms for the three diseases. In particular, with increasing values of perc_inf and cum_rain_7, the probability of predicting "inf" increased for powdery mildew, until about 500 mm for cum_rain_7. Concerning downy mildew, with increasing values of doy, the probability of the class "inf" increased, until about doy 200, while decreasing after this threshold. For powdery mildew, the probability of the class "inf" markedly increased with ggd_7, until 1000, while the probability of the class "inf" for gray mold increased with ggd_jan_7, between about 1200 and 1600.
Results of the application of ML algorithms, trained on historical data, for the prediction of the appearance of symptoms of downy mildew, powdery mildew, and gray mold in grapevine, demonstrated a better performance of the algorithm C5.0 in comparison with the RF, in the test set ("test 1"), for all the three diseases. Similar results were found by Volpi et al. (2020), who applied ML algorithms for predicting the probability of infestation by Bactrocera oleae on olive trees, finding that C5.0 had a higher ROC compared to k-nearest neighbors (k-NN), Classification and Regression Trees (CART), Random Forest (RF) and Neural Network (NN).
The three datasets on grapevine diseases were unbalanced, since the observations in which the symptoms of diseases were recorded were a minority of the total observations, in particular for powdery mildew and gray mold (<20%). Class imbalance problems may bias the classifier towards the majority class, and sampling methods are most commonly applied to balance the class distribution of the training data (Kaur et al., 2019). Moreover, the output of C5.0 classification for each observation is a probability between 0 and 1 of being classified as "inf" or "no" and the standard cut-off applied for classification is 0.5, which is not the most appropriate for imbalanced datasets (Zou et al., 2016). Therefore, the application of both the downsampling technique and a cut-off for classification, optimized to improve the informedness of the model, increased the sensitivity of the model, thus increasing the number of true positives and decreasing the number of false negatives, which have a high cost in the prediction of plant diseases.
The final models achieved a good performance in predicting the presence of symptoms of the three diseases on "test 1", with a balanced accuracy of 0.8 for downy mildew, 0.7 for powdery mildew and 0.9 for gray mold, highlighting a lower occurrence of wrong classifications for the gray mold model.
The application of the models for downy mildew and powdery mildew on the two independent years (2018 and 2019) achieved a lower balanced accuracy than on "test 1", although it remained around 0.7 for the two diseases. This slightly lower performance of the ML model on unseen data may be due to the known bias-variance tradeoff of ML models, as complex models are more subject to high variance (Abu-Mostafa et al., 2012).
Unlike other ML techniques (e.g., Bayesian networks), in which the causal relationships among the variables are linked to previous knowledge (Lu et al., 2020), the effect of the variables on the prediction in tree-based ML models is entirely data-driven. However, it is possible to interpret the C5.0 model by exploring the importance of variables in the modelling mechanism and the effect of variables on the prediction.
Indeed, it was possible to highlight, for all the diseases, a higher frequency in the top-ranking positions, in terms of importance, of indices related to precipitation rather than to air temperature. In particular, the cumulative precipitation from the beginning of April to 7 days before the day of observation was among the most important variables for the three diseases.
For downy mildew and gray mold, the probability of infection increased with increasing values of cumulative precipitation (approximately until 500 mm); while, for powdery mildew, the relationship was less clear. In particular, downy mildew is typically diffused in viticultural areas characterized by temperate climate and frequent precipitation during spring and summer (Lafon and Clerjeau, 1988), and precipitation was reported as a key driver for both primary and secondary infections (Rossi et al., 2008). Climate conditions at the end of spring, particularly precipitation, were found to be decisive for the development of downy mildew symptoms (Chen et al., 2020). Precipitation events have a positive effect in spreading the infection of powdery mildew (dispersing cleistothecia and releasing ascospores) and, though free water is detrimental to conidial germination, in rainy seasons the environmental conditions become favourable for the infection due to mild temperatures, limited direct sunlight, and high humidity (Gadoury et al., 2012). Furthermore, more severe gray mold epidemics were reported under wet growing seasons, since the wetness duration is a key factor for both the development of early season and late season infections (Ciliberti et al., 2015a, b).
The frequency of symptoms observed in the previous year (the year before the one considered for ML application) was the most important variable in the modelling mechanism for the appearance of symptoms of the three diseases. In particular, the risk of symptom development in the current year increased with the occurrence of severe infection in the previous year. Severe infections may be a source of overwintering pathogens, potentially leading to new infections under optimal environmental conditions. Indeed, downy mildew is able to overwinter mainly on infected shoots, while powdery mildew in grapevine buds, and gray mold in grapevine debris (Pertot et al., 2017; Jaspers et al., 2013; Rügner et al., 2002).
Variables describing the progress of the season, such as doy for downy mildew and cumulative degree days for powdery mildew and gray mold, were among the most important variables for predicting the development of disease symptoms. In particular, the probability of infection increased for downy mildew from April to about mid-July and then decreased. Previous studies reported that the progress of disease relates to the phenological development of grapevine (Molitor et al., 2016; Carmichael et al., 2018; Bove et al., 2020). In addition, ggd_7 was a key variable for predicting the occurrence of powdery mildew symptoms; Carisse et al. (2009) used this variable to predict the proportion of seasonal airborne inoculum.
However, even if gdd or doy were among the most important variables in the modelling mechanism, a multivariate approach through ML algorithms is recognized to be better suited than a univariate cumulative GDD index to model the non-linear patterns and variable interactions that often characterize real-world ecological patterns (Yo et al., 2017).
The effect of the number of chemical treatments was more important for powdery mildew than for the other diseases, since only for powdery mildew an index indicating the frequency of treatments was among the top variables. Further studies are needed to evaluate new approaches to include the effect of treatments in modelling predictions, considering the type of chemical and the mechanism of action.
Results from this work highlighted that an ML algorithm trained on historical data may be efficiently used to predict the appearance of symptoms of downy mildew, powdery mildew, and gray mold in grapevine, providing an innovative control tool, even in association with traditional models. The simplicity of the approach requires, however, the availability of symptom records, that is, the monitoring of disease occurrence. Massive datasets of disease symptoms or pest attacks may allow not only regional-level analyses, like in the present study, but also the recognition of specific and localized risk factors, which take into account additional variables, conferring susceptibility or resistance to a given disease or pest. The ML algorithms can be implemented with additional weather data that are used in other models for disease prediction (Rossi et al., 2008; Chen et al., 2020). Yet, climatic inputs can be further enriched to forecast the occurrence of downy mildew, powdery mildew, and gray mold under different climate scenarios and assess the future trajectories of these diseases.
The integration of ML models in decision support systems also represents a practical application to plan the reduction of fungicide treatments. In particular, the use of ML is a promising approach to implement early warning systems, identifying periods when climatic conditions are favourable to promote disease development and alerting on symptoms that are associated with high risk of infection (Caffi et al., 2010; Pellegrini et al., 2010). As future activity, the integration of mechanistic and ML models (e.g., in Bayesian networks) could be tested to evaluate the effect of including previous knowledge in the modelling mechanism.

CONCLUSION
The application of ML algorithms, trained on historical data, proved useful for the prediction of the appearance of symptoms of downy mildew, powdery mildew, and gray mold in grapevine. The grape disease monitoring network enabled the observation of a wide range of symptoms. This, in combination with the ERA5-Land dataset, allowed the development of early detection algorithms to support the implementation of IPM in viticulture. Compared to ground weather stations, ERA5 data had the advantage of providing information for locations that are not covered by traditional agrometeorological networks. Nevertheless, to become fully operative, this approach needs an efficient monitoring system at the landscape scale and intensive field surveys at the local scale. Link to a GitHub repository containing the R script for ML models and a subset of data as example: https://github.com/aeditsrl/grapevine_ML