Avicennia genus molecular phylogeny and barcoding: A multiple approach

The genus Avicennia contains eight species that show a great extent of morphological and genetic variability, which makes the taxonomy of the genus difficult. Molecular barcoding, along with advances in computational approaches, may be a proper method to assess the efficiency of different molecular genetic regions in Avicennia species delineation and to produce data on species evolution and divergence. The aims of the present study were to utilize multiple genetic data sets for species delineation and to study the phylogeny of the genus. Moreover, we developed a hypothesis on the biogeography of these species with respect to barcode divergence. The results showed that both internal transcribed spacer (ITS) and trnHG-psbA intergenic spacer sequences may be used in Avicennia species delineation. Barcode gap analysis and nucleotide differences of the studied taxa showed significant Fst values for pair-wise species comparisons and pointed to a role of nucleotide changes in Avicennia speciation.


INTRODUCTION
DNA barcoding is applied to plant and animal species with the aim of improving organismal identification and taxonomic clarification. The main principles of DNA barcoding are standardization, minimalism, and scalability, which means selecting one or a few standard loci that can be sequenced routinely and reliably in very large and diverse sample sets, and obtaining reliable and conveniently comparable data to differentiate the species in question from one another (Hollingsworth et al. 2011).
Controversy exists on the choice and use of plant molecular barcode markers. Different studies have led to general agreement that several different marker combinations produce equivalent performance, and that none of the proposed barcodes is perfect in every respect (Seberg and Petersen 2009). Utilizing a multiple approach for better species differentiation has been suggested by several authors (see for example, Fazekas et al. 2008).
In most studies, researchers use a common, easily amplified and aligned region such as rbcL, trnL-F spacer regions, matK, nrITS1, nrITS2 or the full ITS1-5.8S-ITS2 (nrITS), as suggested by the CBOL Plant Working Group and BOLD (CBOL 2009; Ratnasingham and Hebert 2007).
The genus Avicennia is composed of eight species of mangrove trees that grow in intertidal zones in tropical and temperate regions of the world. These plant species are economically important, as they are extensively used as medicinal plants. Different parts of these plants have ethnomedicinal applications in the treatment of various diseases such as cancer, diabetes, malaria, rheumatism, asthma, smallpox and ulcer (Hrudayanath et al. 2016). These species show variation and are taxonomically complex due to their vast geographical distribution and introgressive hybridization (Mori et al. 2015). Therefore, the aims of the present study are: (1) assessment of different molecular markers in Avicennia species delineation through barcode analysis, (2) investigation of species relationships based on molecular markers, and (3) study of the biogeographical distribution of these species with regard to DNA sequence divergence.

MATERIALS AND METHODS
In this study, we used published data on trnL-F, trnHG-psbA and ITS sequences for a number of Avicennia species reported from different parts of the world and available on the NCBI site (Tables 1, 2 and 3).

Data analyses
The DNA sequences obtained were initially aligned with the MUSCLE program and curated accordingly. The total length, number of polymorphic sites, average p-distance, genetic diversity within the studied species, Fst values and Tajima's D test were computed, and a Maximum Likelihood (ML) phylogenetic tree was constructed from these sequences, as implemented in MEGA ver. 7 (Kumar et al. 2016). A Mantel test with 1000 permutations was performed to test for a significant association between nucleotide differences of the studied species and populations and their geographical longitude and latitude. The range of Bayesian probability values for each species was obtained with MrBayes analysis (Ronquist et al. 2012). Barcoding analysis and sliding-window analysis of nucleotides were performed with the BarcodingR and spider packages, while the skyline plot and mismatch distribution of nucleotides were determined with the ape and pegas packages in R. Monophyly in the phylogenetic tree was assessed with the statistical test of Rosenberg (2007), as implemented in an R package. We used the RASP (Reconstruction of Ancestral States in Phylogeny) program ver. 4.2 to produce the RASP-Bayesian phylogenetic tree.
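The summary statistics named above (number of polymorphic sites and average p-distance) can be illustrated with a minimal Python sketch. This is not the MEGA implementation used in the study: the function names and the toy alignment are hypothetical, and the choice to skip gaps and ambiguity codes pair by pair is an assumption made for the example.

```python
from itertools import combinations

VALID = set("ACGT")  # assumption: gaps and ambiguity codes are skipped


def polymorphic_sites(alignment):
    """Count alignment columns that show more than one unambiguous nucleotide."""
    return sum(
        1
        for column in zip(*alignment)
        if len({base for base in column if base in VALID}) > 1
    )


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites unambiguous in both sequences."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in VALID and b in VALID:
            compared += 1
            differing += a != b
    return differing / compared if compared else 0.0


def mean_p_distance(alignment):
    """Average p-distance over all pairs of sequences in the alignment."""
    pairs = list(combinations(alignment, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not data from the study.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACG-ACGAAC",
]

print("total length:", len(toy_alignment[0]))
print("polymorphic sites:", polymorphic_sites(toy_alignment))
print("mean p-distance:", round(mean_p_distance(toy_alignment), 4))
```

Pairwise deletion is only one of several ways to treat gaps; complete deletion of gapped columns would give slightly different values.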

Species delimitation and barcode gap analysis
We used published data on trnL-F, trnHG-psbA and ITS sequences for a number of Avicennia species reported from different parts of the world (Table 1). In the first step, we evaluated the efficiency of these molecular markers in species delineation; we then extracted evolutionary information from the marker with the highest efficiency.
Details of the studied sequences with regard to Avicennia species differentiation and species phylogeny are presented below. Data on the trnL-F sequence are very limited and are available for only 21 samples of Avicennia officinalis, A. marina, and A. alba. These sequences had a total length of 296 bp, with only 14 polymorphic sites and an average p-distance of 0.006. The ML phylogenetic tree based on these sequences (Fig. 1) revealed that only the samples of A. alba can be differentiated from the other two species.
Analyses performed by the Bayesian method of species barcoding, as implemented in the BarcodingR package in R, indicated that only 25% of the studied samples were successfully identified to species, while the others could not be differentiated. In this analysis, the highest Bayesian probability was obtained for A. alba (0.50-0.97), whereas the probability value obtained for the other species was only about 0.06. Sliding windows of the mini-barcodes within the trnL-F barcode sequence (Fig. 2) also showed genetic distance in mini-barcodes between A. alba and the other species studied.
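The idea behind the sliding-window (mini-barcode) analysis can be sketched as follows: the alignment is cut into short windows, and the mean distance between two groups of sequences is computed within each window, so that windows with high between-species distance stand out as candidate mini-barcodes. The Python sketch below is only an illustration and not the spider routine used in the study; the function name, window width, step and toy sequences are hypothetical, and a gap-free alignment is assumed.

```python
def window_distances(group_a, group_b, width, step=1):
    """Mean between-group p-distance for each window along the alignment.

    Returns a list of (window_start, mean_distance) tuples.
    Assumes a gap-free alignment with sequences of equal length.
    """
    length = len(group_a[0])
    results = []
    for start in range(0, length - width + 1, step):
        total = 0.0
        for seq_a in group_a:
            for seq_b in group_b:
                window_a = seq_a[start:start + width]
                window_b = seq_b[start:start + width]
                diffs = sum(x != y for x, y in zip(window_a, window_b))
                total += diffs / width
        results.append((start, total / (len(group_a) * len(group_b))))
    return results


# Hypothetical toy sequences for two species, not data from the study.
species_1 = ["AAAACCCCGGGG", "AAAACCCCGGGG"]
species_2 = ["AAAATTCCGGGG", "AAAATCCCGGGG"]

results = window_distances(species_1, species_2, width=4, step=4)
best_window = max(results, key=lambda item: item[1])
print(results)
print("most divergent window starts at position", best_window[0])
```

In this toy case only the middle window separates the two groups, which is the pattern a mini-barcode would show.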

TrnHG-psbA sequence efficiency in species delineation and barcoding
TrnHG-psbA sequence data are available for 26 samples of Avicennia species (Table 2), namely A. officinalis, A. marina, A. bicolor, A. germinans, A. alba, and A. rumphiana. These samples have also been reported from different parts of the world.
Preliminary analysis of the trnHG-psbA sequences indicated that the total length of the studied sequences is 167 bp, with 52 polymorphic sites and an average p-distance of 0.09. Bayesian barcoding analysis of the trnHG-psbA sequences revealed that about 53% of the samples were successfully identified. The Bayesian probability value obtained for A. officinalis samples ranged from 0.24 to 0.96. For A. germinans the value ranged from 0.5 to 0.95, and for A. marina and A. alba from 0.2 to 0.96.
The monophyly tree and its statistical test of Rosenberg (2007), as implemented in an R package, are presented in Fig. 3. The red circles on the nodes of this phylogenetic tree indicate that monophyly is significant at p = 0.05. We therefore observe significant monophyletic groups for some of the samples of A. officinalis, A. alba, A. germinans, and A. marina. There are also some cases of admixture between the studied species, which make the affected clades non-monophyletic.
TrnHG-psbA nucleotide differences among the studied Avicennia species are presented in Fig. 4. This plot shows great differences in trnHG-psbA nucleotides, which are significant according to the chi-square test and the Snn test of Hudson (Hudson 2000).
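Hudson's Snn statistic is the average, over all sequences, of the fraction of each sequence's nearest neighbours that carry the same group label; values near 1 indicate strongly differentiated groups, and significance is usually judged by permuting the labels. The following Python sketch illustrates that logic under stated assumptions: it is not the implementation used in the study, the function names and toy sequences are hypothetical, and distances are plain counts of differing sites in a gap-free alignment.

```python
import random


def hamming(seq_a, seq_b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


def snn(sequences, labels):
    """Mean fraction of nearest neighbours sharing the focal sequence's label."""
    n = len(sequences)
    total = 0.0
    for i in range(n):
        dists = [(hamming(sequences[i], sequences[j]), j)
                 for j in range(n) if j != i]
        nearest = min(d for d, _ in dists)
        ties = [j for d, j in dists if d == nearest]  # all equally near neighbours
        total += sum(labels[j] == labels[i] for j in ties) / len(ties)
    return total / n


def snn_permutation_p(sequences, labels, permutations=1000, seed=0):
    """Observed Snn and a one-sided permutation p-value (labels shuffled)."""
    observed = snn(sequences, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if snn(sequences, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data for two well-separated groups, not data from the study.
toy_sequences = ["AAAAAAAA", "AAAAAAAT", "AAAAAATA",
                 "TTTTAAAA", "TTTTAAAT", "TTTTAATA"]
toy_labels = ["sp_X", "sp_X", "sp_X", "sp_Y", "sp_Y", "sp_Y"]

observed_snn, p_value = snn_permutation_p(toy_sequences, toy_labels)
print("Snn:", observed_snn, "permutation p:", round(p_value, 3))
```

With only six sequences the permutation p-value cannot become very small, so the toy example shows the mechanics rather than a significant result.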
The pair-wise mismatch plot of these sequences (Fig. 5) revealed that almost all the studied species pairs differ significantly in their trnHG-psbA nucleotides. Mini-barcode analysis by sliding windows (Figs. 6 and 7) also showed that trnHG-psbA sequences contain mini-barcodes that differ among Avicennia species. Therefore, trnHG-psbA sequences could be utilized for Avicennia species delineation.

ITS sequences species delineation and barcoding
The ITS sequences obtained (Table 3) had a total length of 494 bp, with 88 polymorphic sites and an average p-distance of 0.038. Bayesian barcoding analysis revealed that about 42% of the samples were identified correctly. The Bayesian probability value obtained for Avicennia alba ranged from 0.12 to 0.99, for A. bicolor from 0.20 to 0.99, and for A. germinans from 0.16 to 1.00. Almost the same ranges were obtained for the other species. The ML phylogenetic tree of 135 samples (Fig. 8) produced almost distinct clades for the studied species, although some degree of admixture was also observed. For the test of monophyly, we randomly retained some of the replicates of each species. The result of the monophyly analysis and its statistical test of Rosenberg is presented in Fig. 9. The red circles on some of the tree nodes indicate monophyletic clades that are significant at p = 0.05.
We also obtained monophyletic clades for some of the samples of A. officinalis, A. alba, A. germinans, A. bicolor and A. marina. There were also some cases of admixture between the studied species, which make some of the clades non-monophyletic.
The nucleotide differences in ITS sequences of the studied Avicennia species are presented in Fig. 10. The plot shows great differences, which are significant according to the chi-square test and the Snn test of Hudson.
The Fst values for ITS sequences ranged from 0.43 to 0.99. The inter-specific genetic differentiation estimates obtained for the ITS nucleotides produced chi-square = 90.26, with P-value = 0.001. Similarly, Hudson's Snn test after 1000 permutations produced Snn = 0.82, P < 0.001. These values indicate significant differences among the studied samples. The mismatch plot of ITS sequences (Fig. 11) revealed that almost all the studied species pairs differ significantly in their ITS nucleotides. Moreover, the Tajima's D value obtained was 0.32, which may indicate positive selection on ITS sequences; ITS changes may therefore be related in some way to speciation events in the genus Avicennia.
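For reference, Tajima's D compares two estimates of sequence diversity: the mean number of pairwise differences and the number of segregating sites scaled by a sample-size constant. The Python sketch below follows the standard published formula (Tajima 1989) and is only an illustration, not the MEGA routine used in the study; the function name and toy alignment are hypothetical, and a gap-free alignment is assumed.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(alignment):
    """Tajima's D for a gap-free alignment of equal-length sequences."""
    n = len(alignment)
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(alignment, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Sample-size constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment in which every variant is a singleton.
toy_alignment = ["AAAA", "TAAA", "ATAA", "AATA"]
d_value = tajimas_d(toy_alignment)
print("Tajima's D:", round(d_value, 3))
```

The neutral expectation is a value close to zero, and a single estimate is normally read together with its significance test rather than by its sign alone.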
Sliding-window analysis of ITS sequences also revealed mini-barcodes in these sequences, which differed greatly among the studied species. For example, barcode sequences of some of the species are provided in Figs. 12 and 13.
Genetic diversity within the studied species ranged from 0.03 in A. alba to 0.12 in A. germinans, while inter-specific genetic distance ranged from 0.04 between A. alba and A. officinalis, and 0.08 between A. marina and A. integra, to 0.43 between A. germinans and A. alba.
DNA barcoding gap analysis of ITS sequences is presented in Fig. 14. Both intra- and inter-specific sequence gaps support the use of ITS sequences for delineation of Avicennia species.
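A barcode gap exists for a sequence when its smallest distance to any other species exceeds its largest distance to members of its own species. The following Python sketch shows that comparison under stated assumptions: it is not the R routine used in the study, the function names, species labels and toy sequences are hypothetical, and at least two species with gap-free, equal-length sequences are assumed.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def barcode_gaps(sequences, species):
    """Per sequence: (species, max intraspecific, min interspecific, gap present)."""
    rows = []
    for i, (seq_i, sp_i) in enumerate(zip(sequences, species)):
        intra, inter = [], []
        for j, (seq_j, sp_j) in enumerate(zip(sequences, species)):
            if i == j:
                continue
            target = intra if sp_j == sp_i else inter
            target.append(p_distance(seq_i, seq_j))
        max_intra = max(intra) if intra else 0.0  # singleton species: no intra value
        min_inter = min(inter)
        rows.append((sp_i, max_intra, min_inter, min_inter > max_intra))
    return rows


# Hypothetical toy data, not data from the study.
toy_sequences = ["AAAAAAAAAA", "AAAAAAAAAT", "TTTTTAAAAA", "TTTTTAAAAT"]
toy_species = ["sp_X", "sp_X", "sp_Y", "sp_Y"]

rows = barcode_gaps(toy_sequences, toy_species)
for species_name, max_intra, min_inter, has_gap in rows:
    print(species_name, max_intra, min_inter, has_gap)
```

Sequences for which the last field is false are the ones expected to be misidentified or ambiguous with a distance-based barcode.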
A Mantel test performed with 1000 permutations showed no significant association between nucleotide differences of the studied species and populations and their geographical longitude and latitude (correlation r = 0.044, p = 0.1456).
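The Mantel test correlates the entries of two distance matrices and obtains a p-value by repeatedly permuting the rows and columns of one of them. The Python sketch below is a minimal illustration and not the software used in the study: the function names, the one-sided Pearson-based test, the random seed and the toy matrices (samples placed along a line, with genetic distance proportional to geographic distance) are all assumptions made for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=1000, seed=0):
    """Mantel correlation and one-sided permutation p-value."""
    n = len(dist_a)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    a_values = [dist_a[i][j] for i, j in pairs]
    observed = pearson(a_values, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of dist_b together
        permuted = [dist_b[order[i]][order[j]] for i, j in pairs]
        if pearson(a_values, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: five samples on a line, not data from the study.
positions = [0, 1, 3, 6, 10]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[0.01 * d for d in row] for row in geo]

r_value, p_value = mantel(gen, geo)
print("Mantel r:", round(r_value, 3), "p:", round(p_value, 4))
```

In this constructed case the correlation is perfect; a result such as the one reported in the text would instead show a correlation near zero and a p-value above 0.05.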
Details of Avicennia species diversification based on ITS sequences, in relation to the geographical distribution of these species, are presented in the RASP-Bayesian phylogenetic tree (Fig. 15).
Two main clades are present in this phylogenetic tree. A. alba, A. officinalis, A. marina, A. rumphiana, and A. integra comprise the first major clade, while A. schaueriana, A. germinans, and A. bicolor form the second major clade.
A closer look at each major clade reveals some interesting results. For example, most of the A. marina samples were grouped together due to sequence similarity. Although the Mantel test revealed no association between nucleotide differences and the geographical longitude and latitude of the studied taxa, some interesting relationships between A. marina geographical populations can be seen when the studied specimens are plotted on the world map. For example, A. marina specimens from Saudi Arabia and Egypt show sequence affinity and are placed close to each other in the phylogenetic tree. Moreover, the specimens from Madagascar are placed close to these specimens. Madagascar is somewhat close to Saudi Arabia and Egypt and is connected to them by the Indian Ocean.
Similarly, A. marina samples from China and Australia also show sequence similarity and form a sub-clade separate from the other specimens studied. The studied specimens from India stand in a separate sub-clade, far from the China-Australia sub-clade.
If we consider all the samples studied, the biogeographical distribution reveals that A. marina, A. alba, and A. officinalis are mostly found in the Asia and Australia region (denoted A in Fig. 16), while A. germinans, A. bicolor, and A. schaueriana are distributed in South America (denoted B in Fig. 16). This may indicate speciation events in Avicennia in two different regions of the world. We have also provided barcodes for geographical regions A and B, which show nucleotide changes possibly associated with, or resulting from, speciation in these two areas (Fig. 16).

DISCUSSION
The present study showed that based on ITS sequence analysis, A. alba and A. officinalis show close affinity and comprise sister-clades. A. marina joins these two with some distance.
A. germinans samples form three separate clades, which indicates the potential presence of infra-specific taxa within this species. This is in accordance with earlier treatments that proposed three different varieties for this species, which were later merged into a single species with no varieties.
A. bicolor showed a close relationship to one of the clades of A. germinans. Samples of A. schaueriana comprise a single distinct clade based on ITS data.
However, the species relationships were partly distorted by the trnHG-psbA sequence data. The three species A. officinalis, A. marina, and A. alba were intermixed to some degree. The close affinity between A. germinans and A. bicolor is similar to the ITS results. The close affinity between A. officinalis and A. marina was also indicated by Li et al. (2016). These authors showed a closer relationship between A. rumphiana and A. alba, which is in agreement with our trnHG-psbA tree and, to some extent, also with the ITS-based phylogenetic tree.
According to Li et al. (2016), even though the first fossil record of Avicennia in the IWP region dates back to the late Eocene of southwest Australia, Avicennia speciation was active during the Miocene. They also suggest that the distribution of ancestral Avicennia was likely to have been similar to its present location, extending from Japan to Borneo and from the Marshall Islands to the Red Sea. Therefore, the materials studied in this project may have migrated from Japan, China or India through the Red Sea and reached countries such as Iran, Egypt, and Saudi Arabia. Along this migration path, Avicennia speciation may have resulted in the formation of A. marina, A. officinalis and A. alba, as well as A. integra and A. rumphiana.
In our ITS-based phylogenetic tree, the species of region B, viz. A. bicolor, A. germinans and A. schaueriana, are related to the species of region A through A. rumphiana and A. integra. Therefore, we may suggest a preliminary hypothesis that, through migration of either or both of A. rumphiana and A. integra, new speciation events resulted in the formation of the other species found in South American countries. We believe that more work is required to support this preliminary hypothesis.
Mangroves in general have broad distributional patterns, are capable of long-distance dispersal and can adapt to the rigorous environmental constraints associated with regular seawater inundation. However, the present-day distributions of individual taxa show several instances of finite dispersal limitations, especially across open water. These discontinuities, in the absence of current dispersal barriers, may be explained by persistent past barriers (Duke et al. 2002).
In the present study we report genetic diversity both within the studied Avicennia species and between these taxa. High levels of genetic diversity were also reported among the central populations of many mangrove species, including Avicennia, in the Indo-West Pacific (IWP) (Mantiquilla et al. 2021). Mori et al. (2015) suggest that A. bicolor, A. germinans and A. schaueriana are three evolutionary lineages that present historical and ongoing hybridization. They also propose that gene flow between A. germinans and A. schaueriana occurs via propagules rather than pollen in A. schaueriana.
We also reported distinct inter-specific genetic distances and significant Fst values among different species, both within those distributed in geographical region A (Australia, India, China) and those distributed in region B (South America in general). A similar investigation reported strong genetic structuring resulting in divergence among mangrove populations of the Indian Ocean and the South China Sea, as well as between the South China Sea and the Southwestern Pacific (Mantiquilla et al. 2021).

CONCLUSION
With regard to Avicennia species taxonomy and the presence of a high level of genetic diversity within these species, we provided distinct molecular barcodes for species delineation. We suggest that a combination of nuclear ITS sequences and the trnHG-psbA spacer region of the chloroplast genome is suitable for taxonomic purposes.
LIST OF ABBREVIATIONS
ITS: Internal transcribed spacer
ML: Maximum Likelihood
RASP: Reconstruction of ancestral states in phylogeny
MEGA: Molecular Evolutionary Genetics Analysis

Figure 1. ML phylogenetic tree of Avicennia species based on trnL-F sequences.

Figure 2. Mini-barcode sliding windows of trnL-F sequence in Avicennia species, showing genetic distance of Avicennia alba with the other studied taxa.

Figure 3. Monophyly analysis of Avicennia species based on trnHG-psbA sequences. The red circles on the nodes indicate that the clade is significant at p = 0.05.

Figure 6. Sliding-window analysis of trnHG-psbA sequences in Avicennia species, showing that these species differ in mini-barcodes.

Figure 8. ML Phylogenetic tree of Avicennia species based on ITS sequences.

Figure 9. Monophyly test of the studied Avicennia species based on ITS sequences. The red circles on the nodes indicate a significant monophyletic clade.

Figure 10. The nucleotide difference in ITS sequences among Avicennia species.

Figure 11. Mismatch plot of ITS sequences in Avicennia species.

Figure 14. Barcode gap analysis of ITS sequences revealing both intra- and inter-specific sequence differences in the Avicennia species studied.

Figure 15. RASP Bayesian tree of ITS data, placing the species studied in two major clades.

Figure 16. Biogeographical distribution of Avicennia marina populations based on ITS sequences.

Table 1. Voucher information and GenBank accession numbers of taxa sampled for the genus Avicennia based on trnL-F data.

Table 2. Voucher information and GenBank accession numbers of taxa sampled for the genus Avicennia based on trnHG-psbA data.

Table 3. Voucher information and GenBank accession numbers of taxa sampled for the genus Avicennia based on ITS data.