Genetic (SSRs) versus morphological differentiation of date palm cultivars: Fst versus Pst estimates

Date palm (Phoenix dactylifera L.) is one of the oldest domesticated fruit trees. Future breeding programs require knowledge of the genetic structure of its cultivars. The present study therefore aimed (1) to provide data on the genetic diversity and genetic structure of 36 date palm cultivars, and (2) to examine the association between fruit characteristics and the genetic features of these cultivars. We used nine SSR and EST-SSR loci for the genetic investigation. Most of the SSR loci had high Gst values (0.70) and therefore good discriminating power for differentiating date palm cultivars. K-Means clustering grouped the cultivars either into two broad clusters or into 16 smaller genetic groups, and the two-cluster solution was supported by delta K = 2 in the STRUCTURE analysis. AMOVA revealed significant genetic differences among the cultivars (PhiPT = 0.70, P = 0.001). Newer genetic differentiation statistics also showed significant differences among the cultivars (G'st(Nei) = 0.673, P = 0.001; G'st(Hed) = 0.738, P = 0.001). The assignment test revealed 33-66% misassignment in some cultivars, probably due to genetic admixture. Heatmaps based on genetic versus morphological/agronomic characters differed from each other, showing that the morphological differences among the cultivars are not merely related to their genetic content. This points to a potential role of either environmental conditions or local selection practices. These findings can be used in the future conservation and breeding of date palms in Iran.


INTRODUCTION
Plant species of the family Palmae (Arecaceae) are distributed mainly in tropical and subtropical areas, although a few species grow at higher latitudes in the Southern Hemisphere. The main centers of diversification of these taxa are the equatorial coast of Africa, Oceania, the Brazilian coast, the Amazon, Indonesia and the Antilles (Moore & Uhl, 1982).
Palm trees contribute greatly to the economies of people around the world. Various fruits, seeds, 'palmito' (palm heart), honeys, 'sagu' (starchy material extracted from the centre of the trunks), different drinks made from the sap or the fruits, and crystallized sugar from the sap are only some of the palm products consumed by humans (Rivas et al. 2012). Among palm species, the African oil palm (Elaeis guineensis), the coconut (Cocos nucifera), the date palm (Phoenix dactylifera) and the betel nut palm (Areca catechu) are considered the main cultivated species, grown on about 14,585,811, 11,208,072, 1,264,611 and 834,878 hectares, respectively (FAO, 2010).
The date palm (Phoenix dactylifera L.) is one of the oldest domesticated fruit trees; records of its wild plants date back to 5000-6000 BC in Iran, Egypt and Pakistan (El Hadrami & El Hadrami, 2009). Algeria, Saudi Arabia, Egypt, the United Arab Emirates, Iraq, Iran, Morocco, Oman, Pakistan and Tunisia alone produced about 7,048,089 tons of dates (FAO, 2010).
The successful future development of the date palm industry and of date cultivation depends on properly evaluating, utilizing and conserving date palm genetic resources, as well as on efficiently assessing present and potential future cultivars (Jaradat, 2014).
One of the main tasks in investigating plant genetic resources is evaluating the available genetic diversity. Date palm genetic diversity can be studied at different levels, including between cultivars, populations or individual clones, as well as between geographical regions. Genetic variability may be measured at the morphological, physiological, biochemical or molecular level (Jaradat, 2014).
The level and distribution of genetic diversity may vary among oases and populations because of historical, geographical, ecological and anthropogenic factors (Jaradat, 2014). Humans can also influence the genetic diversity of date palms through activities such as cultivation practices, social behavior, artificial selection, and the exchange and movement of germplasm in space and time (Jaradat, 2014).
Date palm cultivars are reported to share a common genetic background; therefore, properly differentiating the cultivars and assigning individual plants to each cultivar is difficult, and errors are inevitable. This may also be due to genetic admixture among date palms (Sharifi et al. 2018, Saboori et al. 2019a,b, Gros-Balthazard et al. 2020). "In general, the question of individual assignment to population samples resulted in the development of different statistical methods distinguishing between resident individuals that are 'mis-assigned' (have a genotype that is most likely to occur in a population other than the one in which the individual was sampled) by error from real immigrant individuals (i.e., type I error)" (Piry et al. 2004). "In assignment investigation, Monte Carlo resampling methods have been proposed to identify a statistical threshold beyond which individuals are likely to be excluded from a given reference population sample. The principle behind these resampling methods is to approximate the distribution of genotype likelihoods in a reference population sample and then compare the likelihood computed for the to-be-assigned individual to that distribution" (Piry et al. 2004).
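The resampling principle quoted above can be illustrated with a short Python sketch. It is a simplified illustration, not the GeneClass2 procedure of Piry et al. (2004): it assumes diploid codominant loci in Hardy-Weinberg equilibrium (whereas the present study scored bands as binary characters), simulates genotypes from the reference allele frequencies, and reports the fraction of simulated genotypes that are as unlikely as, or less likely than, the individual being tested. The function names and the small frequency floor for alleles absent from the reference are hypothetical choices.

```python
import math
import random

def genotype_log_likelihood(genotype, freqs, floor=0.01):
    """Log-likelihood of a diploid multilocus genotype under Hardy-Weinberg.

    genotype: list of (allele1, allele2) tuples, one per locus.
    freqs: list of dicts mapping allele -> frequency in the reference sample.
    floor: hypothetical frequency used for alleles absent from the reference.
    """
    total = 0.0
    for (a, b), f in zip(genotype, freqs):
        pa, pb = f.get(a, floor), f.get(b, floor)
        # Homozygote: p^2; heterozygote: 2pq
        total += math.log(pa * pa if a == b else 2 * pa * pb)
    return total

def simulate_genotype(freqs, rng):
    """Draw one genotype from the reference allele frequencies (HWE)."""
    geno = []
    for f in freqs:
        alleles, weights = zip(*f.items())
        geno.append(tuple(rng.choices(alleles, weights=weights, k=2)))
    return geno

def exclusion_p_value(genotype, freqs, n_sim=10000, seed=1):
    """Fraction of simulated genotypes whose likelihood is <= the observed one.

    A small value (e.g. < 0.01) suggests the individual is unlikely to
    originate from the reference population.
    """
    rng = random.Random(seed)
    observed = genotype_log_likelihood(genotype, freqs)
    hits = sum(
        genotype_log_likelihood(simulate_genotype(freqs, rng), freqs) <= observed
        for _ in range(n_sim)
    )
    return hits / n_sim
```

An individual carrying only common reference alleles receives a high value, while one carrying alleles never seen in the reference sample falls below every simulated genotype and would be excluded.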
A combination of stable morphological characters and molecular markers may be used to study date palm genetic diversity and to discriminate among closely related date palm cultivars and clones (Johnson et al. 2015). Different molecular markers (neutral, multilocus and DNA sequence-based markers) have been used in date palm genetic diversity investigations and in analyses of cultivar phylogeny (see, for example, Sharifi et al. 2018, Saboori et al. 2019, Saboori et al. 2020). Among these, nuclear microsatellite markers (simple sequence repeats, SSRs) are known to be precise and accurate for the genetic fingerprinting of date palm cultivars (Ahmed et al., 2013, Johnson et al. 2015, Zehdi-Azouzi et al. 2015). Moreover, Zhao et al. (2013) developed several gene-based EST-SSR (expressed sequence tag-SSR) markers for the genetic fingerprinting of date palm (Phoenix dactylifera L.). These markers may provide a valuable tool for further genetic and genomic research and varietal development in date palm, such as diversity studies, QTL mapping and molecular breeding.
Date palm is one of the most important horticultural crops of Iran; it is cultivated in several parts of the country, but mainly in the south (Fig. 1). About 400 date palm cultivars are currently under cultivation. Although the identification of domestic date palms in Iran started in the 1960s, it relied mainly on morphological features, whereas recent genetic investigations use molecular approaches (Hajia et al. 2015).
Genetic investigations of Iranian date palms have focused mainly on cultivar identification and evaluation, genetic diversity and relationships among cultivars, and discrimination between male and female cultivars (see, for example, Hajian, 2007; Marsafari and Mehrabi, 2013; Hassanzadeh Khankahdani and Bagheri, 2019; Saboori et al. 2020). However, given the roughly 400 cultivars and the different geographical areas in which they are grown, many more detailed genetic studies of these cultivars are needed.
Along with genetic diversity, significant differences in the morphological and agronomic characters of date palm cultivars are important for breeding. QST is a quantitative genetic analog of Wright's FST (Spitze 1993, Prout and Barker 1993). FST provides a standardized measure of genetic differentiation among presumed populations, while QST measures the amount of additive genetic variance among populations relative to the total additive genetic variance. The average QST of a neutral additive quantitative trait is expected to equal the mean FST of neutral genetic loci. FST can readily be measured with commonly available genetic markers, whereas QST must be measured with an appropriate breeding design in a common garden setting. Because the neutral expectation is QST = FST, a departure of QST from FST serves as an index of the effect of selection on the quantitative trait. If QST is higher than FST, this is taken as evidence of spatially divergent selection on the trait. If QST is much smaller than FST, this has been taken as evidence of spatially uniform stabilizing selection, which makes the trait diverge less than expected under genetic drift alone.
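For reference, the standard expression of QST for diploid, outcrossing organisms (Spitze 1993) is given below. The symbols σ²_GB and σ²_GW are notation introduced here for the additive genetic variance between and within populations; under neutrality the expected value of this ratio is approximately FST, which is what makes the comparison described above informative.

```latex
Q_{ST} = \frac{\sigma^2_{GB}}{\sigma^2_{GB} + 2\,\sigma^2_{GW}}
```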
According to Leinonen et al. (2006) and Brommer (2011), "when QST estimates are not available, PST can be justified as a substitute." Brommer (2011) further notes that "divergence across populations of species that are less amenable for proper QST estimation may still be of considerable evolutionary or conservation interest," and such divergence can be assessed with PST. PST estimates quantitative genetic differentiation (i.e., in additive genetic variance) from quantitative trait measurements within populations (Brommer, 2011). The PST index is used to assess local adaptation of wild populations through natural selection and approximates the quantitative genetic differentiation index (QST) obtained in common garden experiments (Gentili et al. 2018).
The relationship between PST and FST values can be used to estimate the relative importance of genetic drift and selection: (a) PST = FST indicates that divergence is compatible with genetic drift; (b) PST > FST indicates directional selection (i.e., one extreme phenotype is favored over the others); and (c) PST < FST indicates stabilizing selection, with the same phenotype favored in different populations (Gentili et al. 2018).
Quantifying population differentiation with both neutral genetic markers and quantitative traits can highlight the relative roles of evolutionary processes such as natural selection, genetic drift and gene flow in shaping patterns of local adaptation (Brommer, 2011; Leinonen et al., 2013).
The fixation index (FST) is widely used to estimate genetic differentiation with neutral loci (SSR, ISSR, AFLP) by analyzing the variance in allele frequencies (Wright, 1965). In contrast, the phenotypic differentiation index (PST) estimates quantitative genetic differentiation (i.e., in additive genetic variance) from quantitative trait measurements within populations (e.g., plant size, growth rate; Brommer, 2011).
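The decision rule above can be expressed as a small Python function. Because point estimates are noisy, this hypothetical helper compares a confidence interval for PST (for example, one obtained by bootstrapping) with the FST estimate instead of comparing point values; the function name and its interval inputs are assumptions for illustration, not part of the cited methods.

```python
def interpret_pst_fst(pst_low, pst_high, fst):
    """Classify a PST-FST comparison using a confidence interval for PST.

    pst_low, pst_high: hypothetical lower/upper confidence limits for PST.
    fst: differentiation estimated from neutral markers.
    """
    if not 0.0 <= pst_low <= pst_high <= 1.0:
        raise ValueError("expected 0 <= pst_low <= pst_high <= 1")
    if pst_low > fst:
        return "directional selection"   # PST > FST
    if pst_high < fst:
        return "stabilizing selection"   # PST < FST
    return "compatible with drift"       # PST ~ FST
```

For example, a PST interval of 0.80-0.95 against an FST of 0.30 is classified as directional selection.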
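The variance-based definition of FST mentioned above can be sketched for a single biallelic locus as FST = Var(p) / (p̄(1 - p̄)), where p is the frequency of one allele in each population. This minimal Python version assumes equal population weights and applies no sample-size correction, so it is illustrative rather than the estimator implemented in the software used here.

```python
from statistics import mean, pvariance

def wright_fst(pop_freqs):
    """Wright's FST for one biallelic locus: Var(p) / (p_bar * (1 - p_bar)).

    pop_freqs: frequency of one allele in each population
    (equal weights, no sample-size correction).
    """
    p_bar = mean(pop_freqs)
    if p_bar in (0.0, 1.0):
        return 0.0  # monomorphic locus: no differentiation to measure
    return pvariance(pop_freqs, mu=p_bar) / (p_bar * (1 - p_bar))
```

Two populations fixed for different alleles ([0.0, 1.0]) give FST = 1, while identical frequencies give FST = 0.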

Plant materials and morphological features
We used 122 trees belonging to 36 cultivars, collected from the Ahwaz germplasm collection (Omol-tomair station of the Date Palm & Tropical Fruits Research Center, Ahwaz, Iran) and from different date palm orchards in Hormozgan and Kerman provinces, Iran (Saboori et al. 2019, Saboori et al. 2020).
Fruit characters were selected following Saboori et al. (2020): fruit weight, seed weight, fruit length and width, and seed length and width.

EST-SSR and SSR markers
Genomic DNA was extracted from fresh leaves of the collected date palm cultivars with a modified CTAB protocol (Saboori et al. 2020). For the genetic investigation we used three EST-SSR and six SSR loci. Two primers, EST-PDG3119-rubisco and EST-DPG0633-Laccase, were selected from Zhao et al. (2013), while the EST-GTE primer was designed with the Primer3 and Gene Runner software and then checked for accuracy with the BLAST algorithm. Six primers, MPdCIR078, MPdCIR085, PdCUC3-ssr2, MPdCIR090, MPdCIR048 and MPdCIR025, were selected as SSR markers (Bodian et al. 2014). The primer sequences of the EST-SSR and SSR markers are listed in Table S1.

Genetic diversity analyses
The SSR and EST-SSR bands obtained were scored as binary characters (Podani 2000) and used for further analyses. Detrended correspondence analysis (DCA) was used to evaluate the suitability of the SSR and EST-SSR bands. The discriminating power of the bands was determined with the POPGENE program. Genetic diversity parameters of the date palm cultivars were estimated with GeneAlex 4.2, and a heat map of these parameters was produced in R.
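For binary band data, diversity parameters such as those listed above are commonly computed by treating each band as a haploid two-state locus with band frequency p. The Python sketch below follows that convention under stated assumptions: a band counts as polymorphic whenever 0 < p < 1 (no 5% threshold), He = 1 - (p² + q²) and I = -(p ln p + q ln q); GeneAlex's exact options may differ, and the function name is hypothetical.

```python
import math

def binary_diversity(profiles):
    """Diversity indices for one population scored as 0/1 band profiles.

    profiles: equal-length lists, one per individual (1 = band present).
    Returns the percentage of polymorphic bands (P), mean gene diversity (He)
    and mean Shannon index (I), treating each band as a haploid locus.
    """
    n_loci = len(profiles[0])
    polymorphic, he_sum, i_sum = 0, 0.0, 0.0
    for j in range(n_loci):
        p = sum(ind[j] for ind in profiles) / len(profiles)  # band frequency
        q = 1.0 - p
        if 0.0 < p < 1.0:
            polymorphic += 1
            i_sum += -(p * math.log(p) + q * math.log(q))
        he_sum += 1.0 - (p * p + q * q)
    return {
        "P": 100.0 * polymorphic / n_loci,
        "He": he_sum / n_loci,
        "I": i_sum / n_loci,
    }
```

Applied per cultivar, these values form the matrix from which a heat map of diversity parameters can be drawn.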

Genetic grouping of the cultivars
To find the appropriate number of genetic groups among the date palms studied, we followed two different statistical approaches: (1) K-Means clustering as implemented in the GenoDive program, which partitions individuals by minimizing the within-cluster sum of squares; and (2) delta K obtained from STRUCTURE analysis, which is a Bayesian method. Details of these methods follow Sharifi et al. (2018).
GenoDive provides two statistics for determining the number of clusters: the pseudo-F statistic (the optimal clustering has the highest pseudo-F value) and the Bayesian Information Criterion (BIC, calculated from sums of squares; the optimal clustering has the lowest value) (Meirmans 2020). Both criteria work well for clustering populations and individuals, especially when mating within populations is random, but BIC has the advantage that it can also be used to determine whether any population structure exists at all (Meirmans 2020).
Based on the K values obtained, we performed Ward clustering in PAST and STRUCTURE analysis in the STRUCTURE program.
The genetic differentiation of the studied cultivars was determined by AMOVA in GeneAlex, and by G'st (Nei) and G'st (Hedrick) in GenoDive.
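The two cluster-number criteria described above can be sketched in plain Python. This is an illustrative re-implementation, not GenoDive's code: it runs Lloyd's k-means on band profiles and, for a given partition, computes SSD(T), SSD(AC) and SSD(WC), the Calinski & Harabasz pseudo-F, and a BIC of the form n ln(SSD(WC)/n) + k ln(n). That BIC form is a common choice for k-means (for example in adegenet's find.clusters); its exact use in GenoDive is an assumption, as are all function names.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two band profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Mean profile of a group of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new == labels:  # assignments unchanged: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster is empty
                centres[c] = centroid(members)
    return labels

def partition_stats(points, labels):
    """SSD(T), SSD(AC), SSD(WC), pseudo-F and BIC for one partition."""
    n, groups = len(points), sorted(set(labels))
    k = len(groups)
    grand = centroid(points)
    ssd_t = sum(sq_dist(p, grand) for p in points)
    ssd_wc = 0.0
    for g in groups:
        members = [p for p, lab in zip(points, labels) if lab == g]
        c = centroid(members)
        ssd_wc += sum(sq_dist(p, c) for p in members)
    ssd_ac = ssd_t - ssd_wc
    # Calinski & Harabasz pseudo-F: higher is better
    pseudo_f = (ssd_ac / (k - 1)) / (ssd_wc / (n - k)) if 1 < k < n and ssd_wc > 0 else float("nan")
    # BIC from the within-cluster sum of squares: lower is better
    bic = n * math.log(ssd_wc / n) + k * math.log(n) if ssd_wc > 0 else float("-inf")
    return {"SSD(T)": ssd_t, "SSD(AC)": ssd_ac, "SSD(WC)": ssd_wc,
            "pseudo_F": pseudo_f, "BIC": bic}
```

Running kmeans over a range of k and comparing partition_stats for each result is one way to see how pseudo-F and BIC can favor different numbers of clusters, as in the k = 2 versus k = 16 outcome reported for these cultivars.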
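The quantities behind these differentiation statistics can be sketched for a single locus: Nei's GST = (HT - HS)/HT, and Hedrick's (2005) standardized G'ST = GST(k - 1 + HS)/((k - 1)(1 - HS)) for k populations. The Python sketch below omits the sample-size corrections that GenoDive applies, so its values will not match the software output exactly; the function names are hypothetical.

```python
def heterozygosities(pop_freqs):
    """Mean within-population (HS) and total (HT) expected heterozygosity.

    pop_freqs: list of dicts (allele -> frequency), one per population,
    with equal population weights.
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    hs = sum(1 - sum(f ** 2 for f in p.values()) for p in pop_freqs) / k
    mean_freq = {a: sum(p.get(a, 0.0) for p in pop_freqs) / k for a in alleles}
    ht = 1 - sum(f ** 2 for f in mean_freq.values())
    return hs, ht

def nei_gst(pop_freqs):
    """Nei's GST = (HT - HS) / HT, without sample-size correction."""
    hs, ht = heterozygosities(pop_freqs)
    return (ht - hs) / ht

def hedrick_gst_prime(pop_freqs):
    """Hedrick's G'ST: GST rescaled by its maximum possible value given HS."""
    k = len(pop_freqs)
    hs, _ = heterozygosities(pop_freqs)
    return nei_gst(pop_freqs) * (k - 1 + hs) / ((k - 1) * (1 - hs))
```

Two populations sharing no alleles but each heterozygous ({A, B} versus {C, D}, all at 0.5) give GST = 1/3 but G'ST = 1, which illustrates why the standardized statistic is reported alongside GST for highly polymorphic markers.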
Correlations between the morphological characters studied were determined with Pearson's correlation coefficient. To compare groups of cultivars based on both molecular and morphological characters, heat maps were constructed in R.
Population assignment was performed with two methods: (1) discriminant analysis (DA) in SPSS, which produces a summary table showing the relatedness of each case to its presumed population and gives a percentage of membership for each population; and (2) the assignment test in GeneAlex, which is based on a likelihood method, gives an overall membership percentage for all data in question, and plots pairwise population graphs.
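A likelihood-based assignment test can be sketched for binary band data as follows: each individual's 0/1 profile is scored against the band frequencies of every population, and it is assigned to the population with the lowest -log10 likelihood. This Python sketch is an assumption-laden illustration rather than GeneAlex's implementation: it treats bands as independent haploid loci, clamps frequencies to [0.01, 0.99] to avoid zero likelihoods, and excludes the tested individual from its own population's frequencies (leave-one-out). All names are hypothetical.

```python
import math

def band_frequencies(profiles, floor=0.01):
    """Band frequency per locus, clamped to [floor, 1 - floor]."""
    n = len(profiles)
    return [min(max(sum(col) / n, floor), 1 - floor) for col in zip(*profiles)]

def neg_log10_likelihood(profile, freqs):
    """-log10 likelihood of a 0/1 profile, bands treated as independent."""
    return -sum(math.log10(f if x else 1 - f) for x, f in zip(profile, freqs))

def assign(populations):
    """Leave-one-out assignment of every individual.

    populations: dict name -> list of 0/1 profiles.
    Returns a list of (sampled_population, index, inferred_population).
    """
    results = []
    for name, profiles in populations.items():
        for i, profile in enumerate(profiles):
            scores = {}
            for other, other_profiles in populations.items():
                ref = other_profiles
                if other == name:
                    ref = profiles[:i] + profiles[i + 1:]  # leave the individual out
                if ref:
                    scores[other] = neg_log10_likelihood(profile, band_frequencies(ref))
            inferred = min(scores, key=scores.get)  # lowest -log10 L = most likely
            results.append((name, i, inferred))
    return results
```

Individuals whose inferred population differs from the sampled one are the "misassigned" plants discussed in this study.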

Phenotypic versus genetic differentiation
The PST index was used to estimate the role of local adaptation through natural selection in date palm populations, compared with that of neutral genetic differentiation. For each population pair, pairwise PST values were calculated for each trait (and as an average PST) using the following formula:

$$P_{ST} = \frac{\frac{c}{h^2}\,\sigma^2_B}{\frac{c}{h^2}\,\sigma^2_B + 2\,\sigma^2_W}$$

In this formula, σ²_B and σ²_W are the between-population and within-population variance components of a trait, respectively; h² is the heritability (the proportion of phenotypic variance due to additive genetic effects); and the scalar c is the proportion of the total variance presumed to be due to additive genetic effects across populations (Brommer, 2011; Leinonen et al., 2013).
In the wild, estimating additive genetic variance components is challenging because a breeding design is impossible. QST is therefore often approximated by PST (Leinonen et al., 2006), which is calculated directly from phenotypic variance components without distinguishing the relative contributions of genetic and environmental variation. Accordingly, phenotypic divergence between populations was estimated with the PST formula given above, where σ²_B and σ²_W are the phenotypic variances between and within populations (Brommer, 2011). In the present study, PST was estimated with the R package Pstat (Da Silva and Da Silva, 2018).
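The PST calculation can be illustrated in Python by estimating σ²_B and σ²_W from a one-way ANOVA (σ²_W = MS_within, σ²_B = (MS_between - MS_within)/n0, with n0 the corrected mean group size) and inserting them into the formula with a chosen c/h² ratio. This is a sketch under those assumptions, not the Pstat package used in the study, and it computes no bootstrap confidence intervals; since c/h² is unknown for field data, evaluating several ratios is a sensible sensitivity check.

```python
def variance_components(groups):
    """One-way ANOVA estimates of between (sigma2_B) and within (sigma2_W) variance.

    groups: list of lists of trait values, one list per population (cultivar).
    """
    a = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_b = ss_b / (a - 1)
    ms_w = ss_w / (n_total - a)
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (a - 1)
    sigma2_b = max((ms_b - ms_w) / n0, 0.0)  # truncate negative estimates at zero
    return sigma2_b, ms_w

def pst(sigma2_b, sigma2_w, c_over_h2=1.0):
    """PST = (c/h2) s2B / ((c/h2) s2B + 2 s2W), following Brommer (2011)."""
    scaled = c_over_h2 * sigma2_b
    return scaled / (scaled + 2 * sigma2_w)
```

For instance, with trait values [1, 2, 3] and [4, 5, 6] for two cultivars, PST is about 0.68 at c/h² = 1 and about 0.51 at c/h² = 0.5.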

SSR and EST-SSR analyses
In total, we obtained 40 SSR bands in the 122 date palm trees studied. The lowest number of bands (13) The suitability of the SSR and EST-SSR bands for population genetic studies of date palm was evaluated with a DCA plot (Fig. 2). The plot shows a well-scattered distribution of the SSR loci, suggesting that these loci come from different regions of the genome and are not clustered together. Such loci are useful in analyses of population genetic diversity.
The discriminating power of the SSR and EST-SSR bands, together with the estimated gene flow (Nm), is given in Table S2. Most of the SSR loci have high Gst values (0.70) and therefore good discriminating power for differentiating date palm cultivars. This is also supported by the high mean Gst (0.81).
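The link between high Gst and low gene flow can be made explicit with the relation Nm = 0.5(1 - Gst)/Gst, the form reported by POPGENE. The short Python function below is an illustrative sketch of that relation; it relies on the usual island-model assumptions, so Nm values derived this way are rough indicators rather than direct migration estimates.

```python
def nm_from_gst(gst):
    """Gene flow (migrants per generation) from Gst: Nm = 0.5 * (1 - Gst) / Gst."""
    if not 0.0 < gst <= 1.0:
        raise ValueError("Gst must be in (0, 1]")
    return 0.5 * (1.0 - gst) / gst
```

For illustration, Gst = 0.70 gives Nm ≈ 0.21 and the mean Gst of 0.81 gives Nm ≈ 0.12, both well below one migrant per generation.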

Genetic diversity of Date palm cultivars
Genetic diversity parameters determined for the 122 trees of the 36 date palm cultivars are presented in Table S3.
Percentage polymorphism ranged from 2.5% in cultivar Kharook (No. 13) to 25% in cultivar Khadhrawi (No. 17), with a mean of 13.07%. Because date palm cultivars usually have similar genetic content, about 13% polymorphism is still appreciable for further breeding studies, provided it is accompanied by some variation in desirable morphological and agronomic traits.
The heat map constructed from the genetic diversity parameters (Fig. 3) shows that, based on the percentage of polymorphism (P), Nei's gene diversity (He) and Shannon's information index (I), the date palms may be classified into 5 or 6 genetic groups. This classification is sharper when only the polymorphism parameter is considered.

Grouping of the cultivars
Nei's genetic distance among the cultivars studied varied from 0.067 between cultivars 1 and 2 to 0.46 between cultivars Estameran (No. 19) and Mashtoom (No. 28). These low genetic distances indicate a high degree of genetic similarity among the date palm cultivars grown in Iran.
To group the cultivars based on SSR markers, we first performed K-Means clustering in GenoDive (Table S4). The results indicated that the cultivars can be grouped either into two broad clusters, according to Calinski & Harabasz's pseudo-F (k = 2), or into 16 smaller genetic groups, according to the Bayesian Information Criterion (k = 16).
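Nei's (1972) standard genetic distance used above is D = -ln(Jxy / √(Jx·Jy)), where Jx, Jy and Jxy are the averages over loci of Σx², Σy² and Σxy for allele frequencies x and y in two populations. The Python sketch below implements that formula directly; it applies no sample-size correction and is only an illustration of the calculation, with a hypothetical function name.

```python
import math

def nei_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    freqs_x, freqs_y: per-locus lists of allele frequencies for two
    populations, listing alleles in the same order at each locus.
    """
    loci = len(freqs_x)
    jx = sum(sum(p * p for p in lx) for lx in freqs_x) / loci
    jy = sum(sum(q * q for q in ly) for ly in freqs_y) / loci
    jxy = sum(sum(p * q for p, q in zip(lx, ly))
              for lx, ly in zip(freqs_x, freqs_y)) / loci
    return -math.log(jxy / math.sqrt(jx * jy))
```

Identical frequency profiles give D = 0, and D grows as allele frequencies diverge, so the small values reported above correspond to closely similar cultivars.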
Ward clustering of the date palm cultivars based on SSR and EST-SSR data (Fig. 4) also grouped the genotypes into two major clusters and about 16 sub-clusters, in agreement with the K-Means clustering. Cultivars 1-13 form the first main cluster (genetic group), while the remaining cultivars form the second.
Within the first main cluster, the cultivars are distributed among three sub-clusters, A-C. Replicates of cultivars 1-4 show a higher level of genetic similarity and are placed in a single sub-cluster (A). Replicates of cultivars 9-13 comprise sub-cluster B, while replicates of cultivars 5-9 form sub-cluster C. Replicates of cultivar 4 were admixed between sub-clusters A and C, and a few plants of these cultivars also show some degree of admixture.
Because Ward clustering is based only on genetic distance, we also used STRUCTURE analysis, a Bayesian method, to group the genotypes. We first estimated K by the Evanno method, which produced delta K = 2, in agreement with the major grouping obtained by K-Means clustering. However, to obtain a more detailed picture of the genetic grouping of the cultivars, we ran STRUCTURE analysis for K = 2-5 (Fig. 5). The best genetic grouping appears to be K = 5.
At K = 5, cultivars 1-4 show genetic affinity and comprise the first genetic group, followed by cultivars 5-13, 14-22 and 23-30, while cultivars 31-36 form the fifth genetic group. All five genetic groups show a low degree of genetic admixture with the other groups.
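The Evanno method used above selects K from the second-order rate of change of the STRUCTURE log-likelihood: ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)), where L(K) is the mean ln P(D) over replicate runs. The Python sketch below follows the common implementation based on run means (as in Structure Harvester); the input format and function name are assumptions for illustration.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp):
    """Evanno et al. (2005) delta K from replicate STRUCTURE runs.

    lnp: dict K -> list of ln P(D) values from replicate runs (>= 2 per K).
    Returns dict K -> delta K for every K whose neighbours K-1 and K+1 exist.
    """
    ks = sorted(lnp)
    m = {k: mean(lnp[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        if k - 1 in m and k + 1 in m:
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            sd = stdev(lnp[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

The K with the largest ΔK is taken as the uppermost level of structure, which is why a delta K of 2 can coexist with a finer, biologically meaningful grouping at higher K.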

Genetic difference of the cultivars
AMOVA revealed significant genetic differences among date palm cultivars (PhiPT = 0.70, P = 0.001): 70% of the total genetic variability was due to differences among cultivars and 31% to variability within cultivars. Moreover, pairwise AMOVA (Table S5) showed significant genetic differences between the cultivars of the two main clusters, as well as between the cultivars of different sub-clusters in the UPGMA dendrogram. The newer differentiation statistics also showed significant differences among date palm cultivars (G'st(Nei) = 0.673, P = 0.001; G'st(Hed) = 0.738, P = 0.001). These results indicate genetic variability within the date palm cultivar germplasm that can be used in future breeding programs.
Table S6 presents part of the assignment results for the 122 date palms (only samples inferred to belong to another population are listed). Assignment values are expressed as positive (negative log) likelihoods, so the lowest value indicates the inferred population.
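The PhiPT statistic reported above is the among-population share of the total AMOVA variance. The Python sketch below computes it for a single hierarchical level from squared distances between binary profiles (the number of differing bands), using the standard sums of squared deviations and the corrected mean group size n0. It is an illustration under these assumptions (for instance, negative variance components are set to zero), not GeneAlex's code, and it performs no permutation test for the P value.

```python
def binary_sq_distances(profiles):
    """Squared Euclidean distances between 0/1 profiles (number of band differences)."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in profiles] for p in profiles]

def amova_phipt(dist2, groups):
    """Single-level AMOVA PhiPT (among-population variance / total variance).

    dist2: N x N matrix of squared distances; groups: population label per individual.
    """
    n = len(groups)
    pops = sorted(set(groups))
    p = len(pops)
    # Sum of squared deviations: pairwise squared distances divided by group size
    ssd_total = sum(dist2[i][j] for i in range(n) for j in range(i + 1, n)) / n
    ssd_within, sizes = 0.0, []
    for pop in pops:
        idx = [i for i, g in enumerate(groups) if g == pop]
        sizes.append(len(idx))
        ssd_within += sum(dist2[i][j] for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ms_among = (ssd_total - ssd_within) / (p - 1)
    ms_within = ssd_within / (n - p)
    n0 = (n - sum(s * s for s in sizes) / n) / (p - 1)  # corrected mean group size
    v_among = max((ms_among - ms_within) / n0, 0.0)  # negative estimates set to zero
    return v_among / (v_among + ms_within)
```

A PhiPT of 0.70 therefore means that about 70% of the estimated molecular variance lies among cultivars.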

Fst versus Pst estimates
Details of the morphological characters studied are given in Fig. 6. ANOVA showed significant differences (P < 0.01) among the studied cultivars for these characters.
Heat maps of the 45 date palm trees based on morphological versus genetic (SSR) data are presented in Fig. 8. The two groupings differ. Moreover, the Mantel test between the two data sets did not show a significant association (r = 0.057, P = 0.16), consistent with the heat maps. Therefore, the grouping and relationships of the cultivars based on the morphological characters studied do not agree with the genetic relationships of the same cultivars.
The Fst versus Pst analyses revealed that for most of the morphological characters studied, Pst greatly exceeds Fst. As an example, some of the pairwise comparisons between cultivar No. 3 and the other cultivars are provided in Table S7.
Since PST > FST indicates directional selection, these results suggest that directional selection has acted on the quantitative fruit and seed characteristics of the studied date palm cultivars. Different factors may be responsible for these directional changes, such as the ecological and environmental conditions in which the cultivars grow, or selection practiced by breeders or local growers. In general, the morphological differences, along with the genetic diversity present in the studied cultivars, may contribute to future date palm breeding.
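The Mantel test mentioned above correlates two distance matrices and assesses significance by permuting the rows and columns of one matrix together. The Python sketch below is a minimal version under stated assumptions: Pearson r on the upper triangles, a one-sided permutation P value that counts the observed arrangement, and a fixed random seed; the software and settings actually used in the study are not specified here, and the function name is hypothetical.

```python
import random

def mantel(d1, d2, n_perm=999, seed=0):
    """Simple Mantel test between two square distance matrices.

    Returns (r, p): Pearson r between the upper triangles, and a one-sided
    permutation P value for an association at least as strong as observed.
    """
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def pearson(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    x = [d1[i][j] for i, j in pairs]
    r_obs = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of d2 together to keep it a valid distance matrix
        y_perm = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y_perm) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

A small r with a large P value, as in the morphological versus SSR comparison here, indicates no detectable correspondence between the two distance structures.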

Genetic diversity
The present study revealed low to moderate genetic diversity within the date palm cultivars studied. This agrees with studies of Iraqi and Tunisian date palms by Jubrael et al. (2005) and Zehdi et al. (2015), who suggested a common genetic basis among date palm genotypes despite differences in fruit characters and tree morphology. Low genetic diversity within date palm germplasm has been revealed both by neutral molecular markers such as ISSRs and SSRs (see, for example, Sharifi et al. 2018, Saboori et al. 2020) and by sequence-based markers such as chloroplast DNA (Sharifi et al. 2018).

Cultivars genetic grouping
Both K-Means clustering and Bayesian delta K estimation placed the studied cultivars into two major genetic groups, while a more detailed analysis revealed that they can be classified into five genetic groups. Such data may be used in future breeding programs. Cultivar grouping based on STRUCTURE analysis has also been used by other researchers in date palm (see, for example, Sharifi et al. 2018). Classifying cultivars into distinct genetic groups is particularly important in plants such as date palm that share an almost common genetic background.

Population assignment
Population assignment seems to be a prerequisite for selecting individual plants and breeding date palm, as these plants share a common genetic background and show overlapping genetic structure. This may also result from genetic admixture among date palms (Sharifi et al. 2018, Saboori et al. 2020). About 33% of the date palms were incorrectly assigned with respect to their presumed populations. This may be due either to improper sampling or identification of plants within the germplasm collection, or to gene flow and admixture among the cultivars. In either case, such plants should be taken into account in future breeding programs. In a similar study of the genetic structure of Tunisian date palms, Zehdi et al. (2015) also reported admixed cultivars. They attributed the formation of mixed genotypes to gene flow between eastern and western origins, mostly from east to west, following human-mediated diffusion of the species. Saboori et al. (2020) investigated the genetic structure of 13 date palm cultivars with SCoT molecular markers and reported some degree of genetic admixture among them. Although they did not specifically study the assignment of plants to their populations, their clustering results show that some plants of a presumed cultivar were intermixed with plants of another cultivar. Similarly, Sharifi et al. (2018) investigated gene flow and assignment in 16 date palm cultivars using ISSR markers and observed some degree of population admixture and a few incorrectly assigned date palms.
In a detailed study, Gros-Balthazard et al. (2020) combined an ethnographic survey with genetic analysis of date palms to test whether named date palm types are true-to-type cultivars or incorrectly assigned samples, in the desert near Siwa, Egypt (also known as "feral" in Battesti). They recognized three categories of genotypes within their extensive collection: true-to-type cultivar samples, ethnovarieties and samples of local categories. This shows that errors in assigning date palms to their respective populations or named cultivars can be substantial.

Genetic versus phenotypic differentiation
Aljuhani (2016) studied the degree of dissimilarity among local cultivars in Saudi Arabia, and the impact of location on their genetic relationships, using twenty-four nuclear microsatellite loci. He reported a high level of polymorphism at some loci and could differentiate the studied cultivars with these markers; some cultivars grouped according to the geographical area in which they were cultivated. We obtained higher Pst than Fst values in almost all date palm cultivars studied and for most fruit and seed characters. Pst is taken as an index of morphological local adaptation through natural selection, although it is influenced by the environment (Brommer, 2011). Pst = Fst indicates that divergence is due to genetic drift; Pst > Fst indicates directional selection among populations (i.e., one extreme phenotype is favored over the others); and Pst < Fst indicates stabilizing selection, with the same phenotypes favored in different populations. We may therefore suggest that some adaptive changes have occurred in Iranian date palm cultivars as a result of local environmental factors or local cultivation and selection practices. QST-FST comparisons have shown that trait divergence due to natural selection, as opposed to genetic drift, has occurred in many taxa (Leinonen et al. 2013).
In the present study, the Mantel test did not show a significant association between the genetic and morphological groupings of the cultivars; in other words, we did not observe co-variation between genetic markers and morphological traits. In contrast, in a QST-FST investigation of 11 populations of Festuca rubra, Šurinová et al. (2018) reported adaptive differentiation in phenotypic traits and their plasticity across a climatic gradient, and observed statistically significant co-variation between markers and phenotypic traits, which is likely caused by isolation by adaptation.
In a similar study, Caré et al. (2018) used the PST-FST approach and 11 nuclear SSR markers to investigate the high morphological differentiation in crown architecture, which contrasts with the low population genetic structure, of German Norway spruce stands.
High-elevation Norway spruce trees have narrow crown phenotypes, whereas lowland trees have broader crowns; narrow crowns are likely the result of adaptation to heavy snow loads combined with high wind speeds. The high differentiation of morphological traits (Pst = 0.952-0.989) that they observed between neighboring autochthonous and allochthonous stands of similar age contrasted with very low neutral genetic differentiation (Fst = 0.002-0.007; G''st = 0.002-0.030), suggesting that directional selection at adaptive loci was involved in the phenotypic differentiation.
It has been suggested that "the QST-FST method is still underused in 'omics' contexts, in which it may be useful for identifying evolutionary significance in large data sets in the absence of evolutionary models" (Leinonen et al. 2013).
In conclusion, considering the molecular studies of date palm genotypes both worldwide and in Iran, and irrespective of the molecular marker used (neutral versus sequence-based markers), low to moderate genetic diversity has been found in the limited number of cultivars investigated so far. Further detailed population genetic analyses of many more accessions and cultivars are needed to possibly broaden the genetic variability of date palm for future breeding.

Abbreviations: SSD(T) = total sum of squares, SSD(AC) = among-clusters sum of squares, and SSD(WC) = within-clusters sum of squares.