Mapping QTLs for protein and oil content in soybean by removing the influence of related traits in a four-way recombinant inbred line population

Abstract
Protein content (PC) and oil content (OC) are important breeding traits of soybean [Glycine max (L.) Merr.]. Quantitative trait locus (QTL) mapping for PC and OC is important for molecular breeding in soybean; however, the negative correlation between PC and OC influences the accuracy of QTL mapping. In the current study, a four-way recombinant inbred line (FW-RIL) population comprising 160 lines derived from the cross (Kenfeng14 × Kenfeng15) × (Heinong48 × Kenfeng19) was planted in eight different environments, and PC and OC were measured. Conditional and unconditional QTL analyses were carried out by interval mapping (IM) and inclusive composite interval mapping (ICIM) based on a linkage map of 275 simple sequence repeat markers in the FW-RIL population. This analysis revealed 59 unconditional QTLs and 52 conditional QTLs among the FW-RILs. An analysis of additive effects indicated that the effects of 13 protein QTLs were not related to OC, whereas OC affected the expression of 13 and eight protein QTLs partially and completely, respectively. Eight QTLs affecting OC were not influenced by PC, whereas six and 26 QTLs were partially and fully affected by PC, respectively. Among the QTLs detected in the current study, two protein QTLs and five oil QTLs had not been previously reported. These findings will facilitate marker-assisted selection and molecular breeding of soybean.


Introduction
Soybean is an economically important crop worldwide and a major source of edible oil and protein. Improving protein content (PC) and oil content (OC) is an important objective in soybean breeding programs. These quantitative traits are controlled by multiple genes and affected by the environment. Detecting quantitative trait loci (QTLs) underlying PC and OC can improve the efficiency of soybean breeding. Diers et al. (1992) identified eight QTLs for PC and nine QTLs for OC in a population derived from a cross between a G. max experimental line (A81-356022) and the wild soybean (G. soja) line PI468916. Since then, a growing number of QTLs controlling PC and OC have been identified; as of December 2018, SoyBase (http://www.soybase.org) listed 241 QTLs for PC and 339 QTLs for OC. These QTLs were detected using F2 and recombinant inbred line (RIL) populations derived from crosses between two parents (Liang et al., 2010; Qi et al., 2011a; Lu et al., 2013). However, this approach fails to identify some QTLs, because the additive effects of putative QTL alleles may differ only slightly between the two parents and because genome coverage is low, as molecular markers often show little polymorphism between two parents.
To increase the power of QTL detection, Xu (1996, 1998) developed a QTL mapping technique based on a four-way (FW) hybrid design. Furthermore, Ning et al. (2015, 2016) established an FW-RIL population in soybean that was subsequently used for linkage map construction and QTL detection for PC and OC (Ning et al., 2016) and pod number-related traits (Liu et al., 2019). Because their molecular markers are highly polymorphic, FW-RIL populations show more variation than common mapping populations. The construction of an FW-RIL population involves four parents, so each QTL can have up to four alleles. As long as the genetic effects of any two alleles at a QTL differ substantially, the QTL can be detected.
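The four-allele structure of an FW-RIL population can be illustrated with a small simulation. This is a minimal sketch, not the authors' procedure: it assumes single-seed descent from four-way F1 plants of (Kenfeng14 × Kenfeng15) × (Heinong48 × Kenfeng19), a single locus, and no selection or segregation distortion. Under these assumptions each RIL becomes fixed for one of the four parental alleles, each with an expected frequency of 1/4. The function names are hypothetical.

```python
import random
from collections import Counter

# Parents in the order of the cross (P1 x P2) x (P3 x P4).
PARENTS = ("Kenfeng14", "Kenfeng15", "Heinong48", "Kenfeng19")

def four_way_f1_genotype(rng):
    """One four-way F1 plant at a single locus: one allele transmitted by the
    P1 x P2 hybrid and one by the P3 x P4 hybrid."""
    return (rng.choice(PARENTS[:2]), rng.choice(PARENTS[2:]))

def self_to_fixation(genotype, rng):
    """Single-seed descent: self the plant until the locus is homozygous."""
    a, b = genotype
    while a != b:
        # Each selfed offspring receives one gamete from each of two meioses;
        # both draws use the current (pre-assignment) genotype.
        a, b = rng.choice((a, b)), rng.choice((a, b))
    return a

def simulate_fw_ril_alleles(n_lines, seed=0):
    """Frequency of each parental allele among n_lines simulated RILs."""
    rng = random.Random(seed)
    fixed = [self_to_fixation(four_way_f1_genotype(rng), rng) for _ in range(n_lines)]
    counts = Counter(fixed)
    return {p: counts[p] / n_lines for p in PARENTS}
```

For example, `simulate_fw_ril_alleles(160)` gives simulated allele frequencies for a population of the size used here; with only 160 lines, sampling noise around 0.25 is noticeable, whereas very large simulated populations converge on 1/4 per parent.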
There is a significant negative correlation between PC and OC in soybean (Zhang et al., 2004), and this mutual interference between the traits can affect the results of QTL mapping. Zhu (1995) proposed a conditional variable method for analysing related traits that removes their mutual influence and improves the accuracy of estimating the genetic effects of QTLs. This conditional method has been used for QTL mapping in rice (Oryza sativa) (Liu et al., 2012), oilseed rape (Brassica napus) (Jiao et al., 2015) and wheat (Triticum aestivum) (Tian et al., 2011).
The current study aimed to enhance the detection of QTLs for PC and OC and improve the accuracy of QTL mapping. The PC and OC data obtained from an FW-RIL soybean population (Ning et al., 2016) were reanalysed by both unconditional and conditional methods. The QTLs controlling PC and OC identified in the current study will facilitate the molecular breeding of soybean.

Field experiment and evaluation of phenotypic traits
The four parents and 160 FW-RILs were planted in Harbin and Keshan (Table S1). The parents (parental traits are shown in Table S2) and FW-RILs were planted in a randomized complete block design with three replications. A plot containing three rows (3 m long × 0.65 m wide) was planted for each line. Seeds from ten mature plants in the middle row of each plot were harvested, and the PC and OC of the seed dry matter were determined using an Infratec 1241 NIR Grain Analyser (FOSS, Sweden).

Simple-sequence repeat analysis and construction of a genetic linkage map
The third tender trifoliate leaf was collected from each FW-RIL and stored in liquid nitrogen, and DNA was extracted using the cetyl trimethylammonium bromide (CTAB) method. A total of 638 simple sequence repeat (SSR) markers evenly distributed across the soybean genome were selected to screen for polymorphisms among the four parents. Of these, 275 primer pairs were polymorphic in the FW-RILs and were used for SSR genotyping. A genetic map of soybean containing 275 SSR markers on 20 linkage groups was constructed using GAPL 1.0 software (https://www.gap-system.org/Releases/4.10.1.html). Adjacent markers separated by a genetic distance ⩾40 cM were uncoupled, which reduced the length of the genetic map. The map covers a genome length of 3636.26 cM, with an average distance between markers of 15.47 cM. The linkage groups range in length from 49.36 to 319.02 cM and contain 6 to 20 markers each (Ning et al., 2015).
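The rule that adjacent markers ⩾40 cM apart are uncoupled can be sketched as a simple split of an ordered linkage group. This is a hypothetical helper written for illustration, not part of GAPL; it assumes marker positions are already ordered in cM, and the marker names and positions in the usage note are invented.

```python
def split_at_large_gaps(positions, max_gap=40.0):
    """Split an ordered list of (marker, cM) pairs into segments wherever two
    adjacent markers are at least max_gap cM apart."""
    segments = []
    current = []
    for marker, cm in positions:
        # Start a new segment when the gap to the previous marker is too large.
        if current and cm - current[-1][1] >= max_gap:
            segments.append(current)
            current = []
        current.append((marker, cm))
    if current:
        segments.append(current)
    return segments

def segment_length(segment):
    """Length in cM from the first to the last marker of a segment."""
    return segment[-1][1] - segment[0][1]
```

For the invented group `[("m1", 0.0), ("m2", 12.5), ("m3", 60.0), ("m4", 75.0)]`, the 47.5 cM gap splits it into two segments whose summed length (27.5 cM) is shorter than the original 75 cM span, which is why uncoupling reduces total map length.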

Conditional variable
The conditional phenotypic values were calculated using the method described by Zhu (1995). Specifically, conditional variables were estimated using the following formulas:

$$y_{pro|oil} = y_{pro} - \frac{C_{pro,oil}}{V_{oil}}\left(y_{oil} - \bar{y}_{oil}\right), \qquad y_{oil|pro} = y_{oil} - \frac{C_{oil,pro}}{V_{pro}}\left(y_{pro} - \bar{y}_{pro}\right) \qquad (1)$$

where $y_{pro|oil}$ is the conditional PC (conditioned on OC), obtained by removing the influence of OC from PC, and $y_{oil|pro}$ is the conditional OC (conditioned on PC). $y_{pro}$ and $y_{oil}$ are the PC and OC of each individual, $\bar{y}_{pro}$ and $\bar{y}_{oil}$ are the mean values of these contents, $C_{pro,oil}$ and $C_{oil,pro}$ are the covariance between PC and OC, and $V_{pro}$ and $V_{oil}$ are the variances of PC and OC, respectively. The conditional variables were calculated using Microsoft Excel 2010.
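The conditioning step (subtracting the covariance-to-variance ratio times the deviation of the related trait from its mean) can be sketched in a few lines. This is a minimal illustration, not the study's Excel workbook; it assumes sample covariances and variances with an n - 1 denominator, and the function names and example values are hypothetical. A useful check is that the conditioned trait has zero covariance with the trait it was conditioned on, while its mean is unchanged.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (variance when xs is ys)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def conditional_values(target, related):
    """y_{target|related} = y_target - (C_{target,related} / V_related) * (y_related - mean(related))."""
    b = covariance(target, related) / covariance(related, related)
    m_rel = mean(related)
    return [yt - b * (yr - m_rel) for yt, yr in zip(target, related)]

# Hypothetical line means (%); not data from this study.
pc = [40.1, 41.3, 39.8, 42.0, 40.6, 41.9]
oc = [20.5, 19.6, 20.9, 19.0, 20.2, 19.3]
pc_given_oc = conditional_values(pc, oc)   # conditional PC
oc_given_pc = conditional_values(oc, pc)   # conditional OC
```

Because the adjustment is a linear regression residual plus the original mean, conditioning removes only the linear association between the two traits.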

Phenotypic data analysis
Descriptive statistics and correlation analyses were based on the means of the unconditional and conditional variables over the three replicates per line in each environment. An analysis of variance (ANOVA) was carried out on the unconditional and conditional variables of the three replicates per line in each environment. All analyses were conducted using SAS 9.2 software (SAS Institute Inc., USA). Heritability was estimated using the following equations:
For a single environment:

$$h^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_\varepsilon / r}$$

For multiple environments:

$$h^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{GE} / e + \sigma^2_\varepsilon / (re)}$$

where $h^2$ is the broad-sense heritability, $\sigma^2_G$ is the genetic variance, $\sigma^2_{GE}$ is the variance due to the genotype × environment interaction, $\sigma^2_\varepsilon$ is the error variance, $r$ is the number of blocks and $e$ is the number of environments in the study. $\sigma^2_G$, $\sigma^2_\varepsilon$ and $\sigma^2_{GE}$ were estimated with a mixed model using PROC MIXED in SAS 9.2.
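The broad-sense heritability calculation can be sketched as two small functions, assuming the variance components have already been estimated (in the study, with PROC MIXED) and assuming the standard line-mean forms σ²G / (σ²G + σ²ε/r) for a single environment and σ²G / (σ²G + σ²GE/e + σ²ε/(re)) across environments. The variance components in the usage note are invented for illustration.

```python
def heritability_single_env(var_g, var_e, r):
    """Broad-sense heritability on a line-mean basis within one environment:
    var_g / (var_g + var_e / r), where r is the number of blocks."""
    return var_g / (var_g + var_e / r)

def heritability_multi_env(var_g, var_ge, var_e, r, e):
    """Broad-sense heritability across e environments with r blocks each:
    var_g / (var_g + var_ge / e + var_e / (r * e))."""
    return var_g / (var_g + var_ge / e + var_e / (r * e))
```

With invented components var_g = 1.0, var_ge = 0.8 and var_e = 4.8, three blocks and eight environments, the denominator is 1.0 + 0.1 + 0.2 = 1.3, giving h² ≈ 0.77; more environments and replicates shrink the non-genetic terms and raise the estimate.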

Quantitative trait loci analysis
The unconditional and conditional means of PC and OC for each line in each environment, together with the linkage map, were used for QTL analysis by interval mapping (IM) (Lander and Botstein, 1989) and inclusive composite interval mapping (ICIM) (Li et al., 2007). Logarithm of the odds (LOD) thresholds were determined empirically from 1000 permutations at a type I error rate of 0.05, with a minimum LOD of 2.5. A QTL detected by both methods was considered to be real.
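The permutation threshold can be illustrated with a simplified single-marker scan. This sketch does not reproduce IM or ICIM (which scan positions between markers, and in ICIM use stepwise regression on all markers); it only shows the permutation logic, assuming a one-way ANOVA LOD, LOD = (n/2)·log10(RSS0/RSS1), at each marker with up to four genotype classes. The 1000 permutations, α = 0.05 and minimum LOD of 2.5 come from the text; function names and the random data are hypothetical.

```python
import math
import random

def marker_lod(genotypes, phenotypes):
    """LOD of a one-way ANOVA of phenotype on marker genotype class:
    LOD = (n / 2) * log10(RSS0 / RSS1)."""
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)  # null model: one mean
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0                                         # one mean per genotype class
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    return (n / 2) * math.log10(rss0 / rss1)

def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05,
                          min_lod=2.5, seed=2024):
    """(1 - alpha) quantile of the genome-wide maximum LOD over phenotype
    permutations, never lower than min_lod."""
    rng = random.Random(seed)
    y = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(y)  # break any genotype-phenotype association
        max_lods.append(max(marker_lod(g, y) for g in markers))
    max_lods.sort()
    k = min(n_perm - 1, int(round((1 - alpha) * n_perm)) - 1)
    return max(max_lods[k], min_lod)
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide rather than per-marker.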

Naming conventions for quantitative trait loci
QTLs were named as follows: q + trait name - chromosome name - sequence number, where q represents QTL, Pro represents PC and Oil represents OC (e.g. qPro-A1-3). QTLs mapped to the same marker region were given the same sequence number, and a unified numbering was applied to both methods (IM and ICIM).
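The naming convention can be sketched as a function that reuses a sequence number whenever the same trait, linkage group and marker interval recur, regardless of which method (IM or ICIM) detected the QTL. This is a hypothetical helper: it assumes numbering restarts for each trait and linkage group and follows order of first appearance, whereas the study's actual ordering is not stated. Marker names in the usage note are invented.

```python
def name_qtls(qtls):
    """Name each (trait, linkage_group, marker_interval) record as
    q<Trait>-<LG>-<n>; identical intervals share the same number n."""
    numbers = {}  # (trait, lg) -> {marker_interval: sequence number}
    names = []
    for trait, lg, interval in qtls:
        per_group = numbers.setdefault((trait, lg), {})
        if interval not in per_group:
            per_group[interval] = len(per_group) + 1
        names.append(f"q{trait}-{lg}-{per_group[interval]}")
    return names
```

For example, two detections of PC on A1 in the same invented interval ("M1", "M2"), one from IM and one from ICIM, both become `qPro-A1-1`, while a PC QTL in a different A1 interval becomes `qPro-A1-2`.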

Phenotypic variation
In the FW-RIL population, PC and OC were approximately normally distributed (Fig. 1, Table 1). Within single environments, PC and OC varied significantly among the FW-RILs in all eight environments (Table 2). Across environments, the variation among genotypes, among environments and due to the genotype × environment interaction was significant at P < 0.01, indicating that the population was suitable for QTL mapping and could potentially reveal a variety of QTLs in different environments.

Correlation and heritability analysis
The correlations between PC and OC based on the unconditional and conditional variables are shown in Table 1. There was a significant negative correlation between the unconditional variables of PC and OC. The heritabilities of the two traits estimated from the conditional variables differed from those estimated from the unconditional variables, indicating that the negative correlation between PC and OC has a substantial impact on the variation of each trait. The coefficient of variation, skewness and kurtosis of the unconditional variables were higher than those of the conditional variables, indicating that this negative correlation influences the degree of variation in the phenotypic values of the two traits.
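The descriptive statistics compared above (coefficient of variation, skewness and kurtosis) can be computed as in the following sketch. It uses simple moment estimators (skewness m3/m2^1.5 and excess kurtosis m4/m2² - 3) and a sample standard deviation for the CV; SAS procedures such as PROC UNIVARIATE report bias-adjusted versions by default, so values will differ slightly for small samples. The function name is hypothetical.

```python
import math

def describe(values):
    """Mean, coefficient of variation (%), skewness and excess kurtosis,
    using simple moment estimators for skewness and kurtosis."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return {
        "mean": m,
        "cv_percent": 100 * sd / m,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
    }
```

Applying `describe` to the per-line unconditional values and to the conditional values of the same trait gives the side-by-side comparison reported in Table 1.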
Perhaps QTLs whose allelic effects were detrimental to both PC and OC could be used to improve a single trait.

Discussion
The use of four-way recombinant inbred line populations enhances quantitative trait loci detection
QTLs for PC and OC were previously identified using populations derived from biparental crosses (e.g. F2, BC and RIL). In these populations, each QTL has only two alleles, the polymorphic information content is low and the mapping results cannot be extended to other cross combinations, resulting in a narrow statistical inference space. Compared with populations derived from biparental crosses, FW-RIL populations broaden the statistical inference space and have high polymorphic information content, thereby increasing the density and coverage of the resulting four-way population linkage map (Ao and Xu, 2006). Therefore, FW-RIL populations could be used to improve QTL detection. The number of QTLs for PC and OC in soybean previously identified in biparental populations ranged from 2 to 23 (Brummer et al., 1997; Chapman et al., 2003; Akond et al., 2014), whereas many more QTLs can be detected using FW-RIL populations; in the current study, 73 QTLs for PC and OC were identified. In biparental populations, similar additive effects of the two parental alleles also hinder QTL detection. By contrast, four-way populations involve four parents, so a QTL can be detected as long as the allelic effects of any two parents differ. For example, for qPro-A1-3 (Satt572-Satt684), the additive effects of the alleles from the four parents were −0.61, −0.60, 1.49 and −0.28. Because the additive effects of the first and second parents differed very little, this QTL would not have been identified in a biparental population from the Kenfeng14 × Kenfeng15 cross; because the additive effects of the other parents differed substantially, the QTL was identified in the four-way population.
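The qPro-A1-3 example can be made concrete by listing the additive-effect contrast for every pair of parental alleles. This sketch assumes the four effects are listed in the order of the cross (Kenfeng14, Kenfeng15, Heinong48, Kenfeng19), which is consistent with the text's reference to the Kenfeng14 × Kenfeng15 pair as the first and second parents; the function name is hypothetical.

```python
from itertools import combinations

# Additive effects of the four parental alleles at qPro-A1-3 (from the text),
# assumed to follow the order of the cross.
EFFECTS = {"Kenfeng14": -0.61, "Kenfeng15": -0.60, "Heinong48": 1.49, "Kenfeng19": -0.28}

def pairwise_contrasts(effects):
    """Absolute difference in additive effect for every pair of parental alleles."""
    return {(p, q): abs(effects[p] - effects[q]) for p, q in combinations(effects, 2)}

contrasts = pairwise_contrasts(EFFECTS)
largest_pair = max(contrasts, key=contrasts.get)
```

The Kenfeng14 × Kenfeng15 contrast is only about 0.01, which a biparental population from that cross could hardly detect, whereas the largest contrast (Kenfeng14 versus Heinong48) is about 2.10; a four-way population samples all six contrasts at once.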

Distinctive and stable quantitative trait loci in various environments
PC and OC are quantitative traits controlled by both genotype and the environment, so the expression of the QTLs controlling these traits differs among environments. Compared with QTL mapping in a single environment, QTL mapping using data from multiple environments improves detection power and the accuracy of estimates of QTL location and effect. Wang et al. (2014a) identified two stable QTLs based on 3 years of data in three environments. Qi et al. (2014a) found that, of 56 OC QTLs detected in 17 environments, eight were stable across multiple environments. Mao et al. (2013) identified 40 PC QTLs, nine of which were stable across multiple environments, and 35 OC QTLs across eight environments, four of which were stable in more than two environments.
To detect QTLs expressed in multiple environments, QTL mapping in the current study was conducted in eight environments. Of the 73 QTLs detected, 23 were detected in two environments and four in three environments. The phenotypic variation explained (PVE) ranged from 2.65 to 13.83% across environments, indicating that QTLs for PC and OC varied among environments and were affected by non-genetic factors. QTLs detected in only a single environment have poor environmental stability and are susceptible to non-genetic factors, as the expression of a particular QTL may depend on specific environmental conditions.

Advantages of conditional variable analysis of quantitative trait loci
In previous studies, unconditional QTL mapping methods were used to locate QTLs for protein and oil in soybean (Eskandari et al., 2013a; Wang et al., 2014b). However, these methods use final phenotypic values, which are affected by related traits, and can therefore bias the estimates of LOD scores and of the additive effects of putative QTLs. The conditional variable method proposed by Zhu (1995), which is based on net-effect analysis, compensates for this deficiency of the unconditional method.
Combining unconditional and conditional variable analyses to map QTLs for PC and OC has two advantages. First, conditional variable analysis corrects the bias in estimating genetic effects. The PC of soybean is negatively correlated with OC (Zhang et al., 2004). Therefore, when the variation of a target trait is evaluated, the influence of the related trait would bias the estimate of heritability. The conditional variable method estimates heritability from a single-trait perspective and avoids the effects of the interaction between PC and OC. In the current study, the heritabilities of PC and OC were 0.39 and 0.40, respectively, using unconditional variables, and 0.31 and 0.33, respectively, using conditional variables. Second, conditional analysis can be used to explore how related traits interact in an FW-RIL population; that is, the conditioning trait can enhance or reduce the additive effects of alleles. In the current study, for qOil-M-1, the absolute additive effects of the alleles from the four parental lines were 0.60, 1.17, 0.23 and 0.34 in the unconditional analysis and 0.12, 0.45, 0.30 and 0.03 in the conditional analysis. Comparing the two analyses, conditioning on PC decreased the first, second and fourth allelic additive effects and increased the third.
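The qOil-M-1 comparison can be reproduced directly from the eight absolute additive effects given in the text. This is a small illustrative sketch; the function name is hypothetical, and it compares absolute values only, as the text does.

```python
# Absolute additive effects of the four parental alleles at qOil-M-1 (from the text).
UNCONDITIONAL = [0.60, 1.17, 0.23, 0.34]
CONDITIONAL = [0.12, 0.45, 0.30, 0.03]

def effect_changes(uncond, cond):
    """Label how each allele's absolute additive effect changes after conditioning."""
    labels = []
    for u, c in zip(uncond, cond):
        if abs(c) < abs(u):
            labels.append("decreased")
        elif abs(c) > abs(u):
            labels.append("increased")
        else:
            labels.append("unchanged")
    return labels
```

Calling `effect_changes(UNCONDITIONAL, CONDITIONAL)` labels the first, second and fourth alleles as decreased and the third as increased.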

Conditional variable analysis and strategies for molecular breeding
Conditional variable analysis has been used to analyse the molecular genetic relationships among related traits, and the results have been applied to marker-assisted breeding (Zhang et al., 2016). Of the 73 QTLs detected in the current study, only 20 functioned independently, whereas the expression of 53 QTLs was fully or partially affected by related traits. Therefore, when designing molecular breeding strategies, the 20 independent QTLs could be used directly to improve the target traits, whereas for the remaining 53 QTLs the influence of related traits should be considered.
Comparing the quantitative trait loci identified in the current study with those reported previously
To compare the QTLs detected in the present study with previously identified QTLs, the newly identified QTLs were integrated into a publicly available soybean genome map. Of the 73 QTLs examined, 66 partially or completely overlapped with previously reported QTLs and might therefore represent the same loci; these include 14 QTLs detected only by conditional analysis. Of the seven remaining QTLs, qPro-A1-3, qPro-A1-4 and qOil-A2-1 were identified only by conditional analysis. Among them, qPro-A1-3 and qPro-A1-4 appear to be located at the same locus (Satt684), and qOil-A2-1 is located in the region from 81.01 cM (Sat_115) to 106.29 cM (Sat_392), which is adjacent to two regions underlying OC, i.e. 67.33 cM (A111_1) to 77.7 cM (Satt341) and 108.83 to 146.83 cM (Satt327).
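Whether two map intervals overlap or are merely adjacent can be checked by computing the gap between them. This minimal sketch assumes all intervals are expressed in the same cM coordinates of the integrated map; the function name is hypothetical and no particular adjacency cut-off is implied by the study.

```python
def interval_gap(a, b):
    """Gap in cM between intervals a = (start, end) and b = (start, end);
    returns 0.0 if the intervals overlap or touch."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

q_oil_a2_1 = (81.01, 106.29)   # Sat_115 to Sat_392 (from the text)
left_region = (67.33, 77.7)    # A111_1 to Satt341 (from the text)
right_region = (108.83, 146.83)
```

With these positions, `interval_gap(q_oil_a2_1, left_region)` is about 3.31 cM and `interval_gap(q_oil_a2_1, right_region)` is about 2.54 cM: neither region overlaps qOil-A2-1, but both lie close to it, which is why the QTL is described as adjacent rather than coincident.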

Conclusions
QTLs for PC and OC were identified by both conditional and unconditional variable analyses. Among the 34 protein and 39 oil QTLs identified, 13 affect only PC and seven affect only OC. Of the remaining QTLs, eight PC QTLs were completely affected by OC and six OC QTLs were fully influenced by PC, while 13 PC QTLs and 26 OC QTLs were detected by both unconditional and conditional variable methods. Finally, seven QTLs (qPro-A1-3, qPro-A1-4, qOil-A2-1, qOil-C1-1, qOil-C1-2, qOil-D2-3 and qOil-E-1) were identified for the first time.
Conflict of interest. None.