Genetic differentiation and diversity of Callosobruchus chinensis collections from China

Callosobruchus chinensis (Linnaeus) is one of the most destructive pests of leguminous seeds. Genetic differentiation and diversity analysis of 345 C. chinensis individuals from 23 geographic populations using 20 polymorphic simple sequence repeats revealed a total of 149 alleles with an average of 7.45 alleles per locus. The average Shannon's information index was 1.015. The gene flow and genetic differentiation rate values at the 20 loci ranged from 0.201 to 1.841 and 11.0–47.2%, with averages of 0.849 and 24.4%, respectively. In the 23 geographic populations, the effective number of alleles and observed heterozygosity ranged from 1.441 to 2.218 and 0.191–0.410, respectively. Shannon's information index ranged from 0.357 to 0.949, with the highest value in Hohhot and the lowest in Rudong. In all comparisons, the fixation index (FST) values ranged from 0.049 to 0.441 with a total FST value of 0.254 among the 23 C. chinensis populations, indicating a moderate level of genetic differentiation and gene flow among these populations. Analysis of molecular variance revealed that the genetic variation within populations accounted for 76.7% of the total genetic variation. The genetic similarity values between populations varied from 0.617 to 0.969, whereas genetic distances varied from 0.032 to 0.483. Using unweighted pair-group method using arithmetical averages cluster analysis, the 23 geographic collections were classified into four distinct genetic groups but most of them were clustered into a single group. The pattern of the three concentrated groups from polymerase chain reactions analysis showed a somewhat different result with cluster.


Introduction
As a traditional human food resource, legumes are important crops for planting structure adjustment, international trade and increasing the income of mountain farmers in China. In recent years, human health consciousness has increased and the nutritive value of legumes has been recognized. Thus, legumes have increasingly become the focus of agricultural production. China ranks first in the world for food legume yield and number of types produced. Also, as the world's second-biggest exporter, China accounts for 12% of global food legume exports (Liu, 2012). Therefore, to a certain degree, edible beans have great competitive advantage and development potential for food production in the mountains and the international market.
To date, research reports about the genetic diversity of C. chinensis have been rare. A 522-bp fragment of the mitochondrial cytochrome c oxidase I (COI) gene was sequenced for genetic diversity analysis of eight populations of C. chinensis from Japan, Korea and Taiwan collected from different habitats, and six haplotypes were detected based on the COI sequence differences (Tuda et al., 2004). By combining the sequence differences of the mitochondrial DNA (mtDNA) COI gene, cytochrome b gene (Cytb) and the second internal transcribed spacer of the nuclear ribosomal DNA (rDNA ITS2) of 12 geo-populations of C. chinensis and three other species, the phylogenetic relationships among different populations and species were analyzed and discussed (Xu, 2007). However, no studies of the population genetic structure and genetic diversity of C. chinensis using SSR markers have been reported to date.
We conducted a national survey of the distribution and degree of damage of C. chinensis to stored edible beans during 2011-2013 and obtained populations from diverse geographic regions. Subsequently, high-put sequencing of the C. chinensis genome was performed with the Illumina HiSeq2500 platform and more than 6300 pairs of SSR primers were developed, which provided a great tool for the study of C. chinensis population variation and genetic diversity (Duan et al., 2014b). Based on this preliminary work, 20 pairs of polymorphic SSR primers were selected to analyze the genetic diversity of 23 different C. chinensis geographic populations. The aim of this study was to elucidate the population structure and genetic differences of C. chinensis from different parts of China, which will provide valuable information for C. chinensisresistance breeding.

C. chinensis population
The pests were collected from mungbean or adzuki bean seeds preserved in different regions and then stored at −30°C. Information on the C. chinensis collections is shown in fig. 1.

Genomic DNA extraction
The total genomic DNA of a single adult with its gut removed was extracted with the TIANamp Genomic DNA Kit according to the manufacturer's instructions (TIANGEN, Beijing, China). The quality of the extracted DNA was monitored on 1.5% agarose gels. DNA purity was checked using the NanoPhotometer ® spectrophotometer (Implen, San Diego, CA, USA). The DNA concentration was measured

Simple sequence repeat primers
The SSR primers used in the present study were developed by Duan et al. (2014b). The primers were synthesized by Sangon Biotech Co., Ltd. (Shanghai, China). A total of 20 polymorphic SSR loci were selected for genetic diversity analysis. The characteristics of the primers are shown in table 1.

Polymerase chain reaction (PCR) amplification and product detection
Simple sequence repeat analysis was performed following the procedure of Duan et al. (2014b). PCR were carried out in 10 µl reaction volumes containing 10 mmol l −1 Tris-HCl pH

Data analysis
The fragment lengths of the alleles from each individual were recorded from the banding patterns of SSR amplifications. In order to reduce genotyping errors of microsatellites, MICROCHECKER 2.2.3 soft was used to remove or adjust null alleles (van Oosterhout et al., 2004). Data analysis was performed using the population genetics analysis software PopGene32 (version 1.31). This included determination of the number of alleles (Na), number of effective alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Nei's expected heterozygosity (Nei), Shannon's information index (I), F-statistics (F IS , F IT , F ST ), gene flow (N m ), Nei's genetic identity and genetic distance (D) between various geographical populations. Significance analysis was performed with SPSS10.0 software (SPSS Inc., Chicago, IL).
Population differentiation was tested by comparing allele frequencies among 23 populations using Weir and Cockerham's F ST (θ) value (Weir & Cockerham, 1984). The F ST -value was estimated under the null hypothesis of nondifferentiation among subpopulations, where F ST = 0. Statistical analysis was done by comparing the calculated F ST values with those of datasets in which individual bean weevils were randomized across populations 10,000 times using the programme Multilocus 1.3b (Agapow & Burt, 2001). Gene flow (Nm) was determined using the equation Asadollahi et al., 2013). Analysis of molecular variance (AMOVA) was used to partition the total genetic variance within and among populations from different geographic regions (Isenegger et al., 2008). The AMOVA was implemented in the GENALEX 6 program (Peakall & Smouse, 2006).
The bootstrap analysis was performed with 1000 replicates for statistical support of branches in unweighted pair-group method using arithmetical averages (UPGMA) clustering trees using the MEGA 4 software (Tamura et al., 2007). Principal component analysis (PCA) and three-dimensional (3D)-PCA graph were performed based on Euclid distance among 345 C. chinensis collections using NTSYS-pc software package V2.2 (Rohlf, 1998).

Genetic polymorphism of the microsatellite loci
MICRO-CHECKER estimated 0.069 of frequency of null alleles, which coincided with the excellent amplification of these microsatellite loci among C. chinensis. In all, 345 C. chinensis individuals were genotyped at 20 microsatellite loci, which were used to evaluate the genetic diversity and differentiation among 23 C. chinensis populations (table 2 and (table 2). In addition, the observed heterozygosity significantly deviated from expected heterozygosity at loci CCM46 and CCM118 (P < 0.05).
The genetic differentiation among populations was assessed using fixation indices (F IS , F IT , F ST ) and measuring gene flow (N m ) at each locus. The F-statistics at each locus among the 23 populations are listed in table 3. The gene flow and genetic differentiation rate values at the 20 loci ranged from 0.201 to 1.841 and 11.0-47.2%, with averages of 0.849 and 24.4%, respectively. It was found that 12 loci, including CCM72, CCM118, CCM1, CCM83, CCM29, CCM56, CCM125, CCM143, CCM71, CCM46, CCM108 and CCM16, had high genetic differentiation among populations (N m < 1). The other eight loci, including CCM139, CCM168, CCM2, CCM40, CCM49, CCM33, CCM162 and CCM4, showed 1 < N m < 2; none of the loci had low genetic differentiation

Genetic differentiation and diversity among C. chinensis populations
The genetic diversity of C. chinensis from the 23 geographic populations is shown in table 4. In the 23 geographic populations, the number of alleles and effective number of alleles per population ranged from 1.800 to 3.200 and 1.441 to 2.218, respectively, with the observed heterozygosity varying from 0.191 to 0.410, the expected heterozygosity ranging from 0.230 to 0.431 and Nei's expected heterozygosity ranging from 0.223 to 0.417. Shannon's information index ranged from 0.357 to 0.949, with the highest value in Hohhot (HH), Inner Mongolia and the lowest in Rudong (RD), Jiangsu.
Genetic differentiation was analyzed by relating the genotypic distribution with F ST estimates. The total F ST value was 0.254 in the 23 C. chinensis populations, indicating a moderate level of genetic differentiation and gene flow among these populations. Pairwise comparisons of the fixation index (F ST ) and gene flow (Nm) were estimated from the 20 SSR loci between C. chinensis populations (table 5). The highest F ST value was 0.441 between the Tangshan (TS) and RD populations, followed by F ST values of 0.440 and 0.430 between the Changde (CD) and TS, and CD and Guangan (GA) populations, respectively. Although the F ST value between the TS and RD is greater than that between CD and TS and between CD and GA, there are no significant differences between them. The corresponding minimum Nm values were 0.633, 0.640 and 0.663, respectively, which suggested considerable genetic differentiation between these pairwise populations. In contrast, the F ST values between the Urumuqi (UM) and Lingchuan (LC), and Enshi (ES) and Xinye (XY) populations were 0.049 and 0.050, with corresponding Nm values of 9.788 and 9.581, respectively, demonstrating very low genetic differentiation and sufficient gene flow between the two pairwise populations. AMOVA analysis indicated that 76.7% of the total molecular variance was due to differences within geographic regions and 23.3% of the variance was associated with differences among populations (table 6).
Based on the amplified fragment sizes using the 20 pairs of SSR primers, the genetic similarity and genetic distance between the 23 populations were determined (table 7). The data showed that the genetic similarity between populations varied from 0.617 to 0.969. The highest genetic similarity was found between the Hefei (HF, Anhui) and RD (Jiangsu) populations, followed by ES (Hubei) and XY (Henan), and LC (Shanxi) and Lishui (LS, Jiangsu), with genetic identity values of 0.969, 0.950 and 0.945, respectively. The lowest genetic similarity was found between the GA (Sichuan) and Wulong (WL, Chongqing) populations, followed by WL and Bobai (BB), and Luliang (LL) and TS, with genetic identity of 0.617, 0.648 and 0.651, respectively. These results suggested that the C. chinensis populations from HF, Anhui and RD, Jiangsu were closely related whereas those from GA, Sichuan and WL, Chongqing were relatively genetically distinct.
Cluster analysis among C. chinensis populations Based on Nei's genetic distance, a UPGMA dendrogram was generated from the cluster analysis to show the genetic relationships among the 23 C. chinensis populations. As shown in fig. 3, the 23 populations were classified into four distinct genetic groups with a 0.12 genetic similarity coefficient.  were grouped into cluster II (ES, XY and HH), cluster III (GA, Jiangchuan (JC) and BB), and cluster IV (TS and Mengtougou (MT). Most of the C. chinensis populations were clustered into a single group, which suggested that relatively high genetic similarity existed among these populations, while there may be genetic differences among different geographic populations.

PCA of C. chinensis
PCA of 345 C. chinensis collections and 3D-PCA graph were performed based on Euclid distance. The first principal components accounted for 72.7% of total variation, followed by 16.3 and 8.4%. These collections were mainly divided into three concentrated groups by geographical locations as specific component, which slightly interpenetrated with one another. The first principal component mainly contained the collections from 18 geographic populations, including HF, RD, CD, UM, LC, QJ, LS, DZ, YY, QD, MC, DX, LL, BD, WL, HH, XY and ES. The second principal component was mainly composed of C. chinensis from GA, JC and BB populations. The third merely contained collections from TS and MT ( fig. 4). The pattern of the three major groups from PCA showed a somewhat different result with cluster. The first principal component corresponds to the Cluster I and Cluster II. However, the second and third components coincide with the Cluster III and Cluster IV, respectively.

Discussion
C. chinensis is a destructive storage pest of legume beans that often causes considerable loss in the quantity and quality of seeds during transportation and storage. This pest widely is distributed around China and has multiple hosts, showing strong environmental adaptability. However, a lack of genetic information about this pest has meant that a series of genetic questions remain largely unanswered, such as its population genetic structure, genetic differentiation, kinship and biotype abundance. An understanding of population genetic structure and differentiation is important when devising pest or disease management and resistance-screening strategies. For example, a study of the chickpea pathogen Ascochyta rabiei found that significant genetic differentiation existed between disease nursery populations and populations in commercial fields, demonstrating that screening for resistance in the disease nursery may not have exposed plants to appropriate isolates of the pathogen (Peever et al., 2004). Moreover, some different resistance genes should be considered for resistance breeding in order to control effectively pest or pathogen populations with high genetic variation. On the contrary, the variety with a certain target resistance gene could be deployed and cultivated in different regions suffering from a certain pest. Therefore, the clarification of the genetic diversity of pest or pathogen populations makes sense in enacting resistance breeding strategy. The genetic diversity of several populations of C. chinensis from Japan, Korea and Taiwan was analyzed via polymorphism in mitochondrial COI gene sequences (Tuda et al., 2004;Xu, 2007), but the number of C. chinensis populations examined and the genetic variation information provided by COI sequence differences was limited.
Microsatellite loci are powerful tools for population genetics studies because of the reproducibility and precision of these markers in detecting multiple alleles. Here, we used 20 microsatellite markers developed by Duan et al.   illumina paired-end sequencing of C. chinensis genome to determine the genetic diversity and structure of 23 C. chinensis geographic populations from China. Actually, we firstly selected more than 30 pairs of polymorphic SSR primers in the amplification of C. chinensis collections and finally 20 microsatellite loci with excellent amplification among the collections were applied to the analysis of genetic diversity. We tested the null alleles with MICROCHECKER and found the frequency of null alleles is very low among these 20 microsatellite loci. Hence, null alleles may have merely slight effects on the results of genetic analysis in present study. The 20 microsatellite loci showed 149 alleles in 345 C. chinensis individuals and averaged 7.45 alleles per locus, and Shannon's information index averaged 1.015, and the values of F IS and F ST ranged from −0.456 to 0.791 and 0.110-0.472, with an average of 0.119 and 0.244, respectively, indicating that these SSR markers were informative and effective in differentiating among C. chinensis collections from different regions. An analysis of allelic diversity at the 20 microsatellite loci provided an insight into the population structure of C. chinensis from different regions (sampled in 2012 and 2013) representing important regional locations for legume production and storage in China.
The present study is the first attempt at assessing the genetic diversity of C. chinensis using the SSR marker technique. It has demonstrated the validity and suitability of using SSR markers to detect genetic variation among C. chinensis populations. Gene heterozygosity is an optimal parameter to measure genetic variation within populations. The average heterozygosity approximately reflects the level of genetic variation. The observed heterozygosity did not significantly deviate from expected heterozygosity at most of the loci, which confirmed validity of these loci among the collections in this study. The average expected and observed heterozygosity values for the 20 microsatellite loci and 23 C. chinensis populations were 0.4901 and 0.3389, respectively, suggesting a moderate level of genetic diversity among the samples. From this, we demonstrated that the C. chinensis population in ES (Hubei) had the highest genetic diversity, whereas the RD (Jiangsu) population had the lowest genetic diversity. The fixation index (F ST ) reflects the degree of genetic differentiation among populations. F ST is close to 0 when the genetic variation shows no sign of fixation between populations. However, F ST is close to 1 when genetic differentiation is at a high level. Gene Genetic differentiation of Callosobruchus chinensis in China flow (N m ) is considered one of the major factors in the homogenization of population genetic structure. Because gene flow and genetic drift are mutually antagonistic, exchange of genes can increase the amount of genetic variation within populations while decreasing differentiation between populations (Whitlock & McCauley, 1999). In this study, the F ST values between populations ranged from 0.049 to 0.441, with corresponding Nm values varying from 9.788 to 0.633, indicating large differences in gene flow between C. chinensis collections from different regions. It is easy to understand why the maximum genetic differentiation appeared between the TS and RD populations, followed by the TS and CD populations, because there is typical geographical isolation between TS (Hebei) and RD (Jiangsu), and TS and Changde (Hunan), with distances of more than 1300 km from TS to RD and Changde. There are major differences in growth seasons, weather conditions, agricultural practices and legume varieties between these regions. In addition, based on our surveys, the Jilü series varieties and landraces were cultivated in TS, while Sulü and Tongcan series varieties and local breeds were mainly cultivated in RD. Moreover, mung bean and adzuki bean were main legumes in TS, whereas faba bean, pea and cowpea were cultivated widely in RD. Therefore, legume varieties varied greatly between the regions. In addition, the lack of legume seed exchange indeed existed between the regions due to different kinds of legumes, lack of communication and so on. In contrast, although the two areas are approximately 3000 km apart, the UM, Xinjiang and LC, Shanxi collections had the lowest genetic variation. This is understandable because frequent seed exchanges have occurred and the same varieties have been planted and stored in both areas for several years, furthermore, C. chinensis is apt to communicate and diffuse along with legume seeds by means of egg, larva, pupa or adult, which results in interaction of C. chinensis between the two areas and identical host selection pressures in both populations.
Based on AMOVA analysis, 23.3% of the total variation was found among regions, whereas 76.7% variance was attributable to population divergence within the regions. China is a large country with complex terrain and hence different levels of geographical isolation arise between C. chinensis populations from different regions, which enhances genetic variation among populations but decreases genetic differentiation within populations. However, frequent exchanges of legume seeds have taken place via trading and communication among some production areas for many years, which very likely includes C. chinensis eggs, nymphs, pupas, or adults. Therefore, the spread and diffusion of this pest increases the mating and gene flow among different collections and enhances heterogenization within populations.
As shown in the UPGMA clustering plot based on genetic distance between populations, the maximum genetic similarity was between the HF and RD collections, followed by the UM and LC, and ES and XY populations, whereas the WL and GA populations shared the minimum genetic similarity, followed by the WL and BB populations. The geographic distance between WL and GA is less than 300 km, but both of these regions are mountainous, and the food legume varieties planted in the two areas differ greatly; hence, the host selection pressures differ and there is limited communication, which has resulted in geographical isolation and limited exchange of genetic information between the two populations.
The pattern of three groups from PCA analysis looked different from cluster result with four genetic groups. Compared

132
with figs 3 and 4, Cluster I and Cluster II in fig. 3 were combined to the first principal component in fig. 4, whereas the Cluster III and Cluster IV coincided with the other two components, respectively. However, we think the results from these two methods are similar and comparable. It is not hard to see that 23 populations can also be classified into three genetic groups at a similarity coefficient of 0.135. In the present study, UPGMA cluster and PCA analysis were conducted based on geographic populations (subgroups) and individuals, respectively. As discussed above, there often exists gene exchange among C. chinensis from these regions and then leads to a decline in genetic differentiation therefore individuals from Cluster I and Cluster II comprised a principal component.
High adaptability to different environments or hosts is usually correlated with abundant genetic diversity. However, C. chinensis is apt to diffuse and spread by way of seed trade and communication, which results in frequent gene flow and decreases heterogeneity among collections from different regions in China. Therefore, our findings that there is high adaptability and moderate genetic diversity among C. chinensis populations are understandable and reasonable, and should help to develop a control strategy for C. chinensis.