## 1. Introduction

DNA sequencing and high throughput single nucleotide polymorphism (SNP) analysis have increased the resolution of quantitative trait loci (QTLs) mapping, making possible the use of short-range linkage disequilibrium (LD). Linkage analysis (LA), the most popular tool for QTL detection 10 years ago, was enriched by the addition of LD information in so-called linkage disequilibrium analysis (LDLA) methods. Several approaches have been proposed to combine these levels of information (e.g. Meuwissen *et al.*, Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002; Farnir *et al.*, Reference Farnir, Grisart, Coppieters, Riquet, Berzi, Cambisano, Karim, Mni, Moisio, Simon, Wagenaar, Vilkki and Georges2002; Pérez-Enciso, Reference Pérez-Enciso2003; Legarra & Fernando, Reference Legarra and Fernando2009). Some are computationally intensive (Pérez-Enciso, Reference Pérez-Enciso2003), whereas others, limited to familial designs, were easier to compute (Farnir *et al.*, Reference Farnir, Grisart, Coppieters, Riquet, Berzi, Cambisano, Karim, Mni, Moisio, Simon, Wagenaar, Vilkki and Georges2002). Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) proposed testing the presence of a QTL at a given genomic position with a random model that included what they called a ‘haplotype effect’. The haplotypes were defined by a set of marker loci on the chromosomal segment surrounding this position. However, this terminology (haplotype effect) may be confusing: two chromosome segments belonging to two different individuals (or two chromosomes of the same individual) and carrying the same (identical by state (IBS)) haplotype may not be (identical by descent (IBD)) at the QTL. Thus, to avoid possible confusion, we will prefer and use the wording ‘chromosomal segment effect’ rather than ‘haplotype effect’. In this paper, the term ‘haplotype’ will be restricted to a sequence of marker alleles which can be found on these chromosome segments (with a maximum of 2* ^{m}* haplotypes if they are assembled from

*m*markers), while the term ‘chromosome segment’ defines physical DNA sequences found in a population (2

*n*distinct chromosomal segments exist if there are

*n*individuals).

Recently, Legarra & Fernando (Reference Legarra and Fernando2009) developed two models for LDLA approaches: a linear regression, derived from the Haley–Knott method (Knott *et al.*, Reference Knott, Elsen and Haley1996) and a mixed linear model. Their regression model is simpler and faster than the Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) model. The association between the quantitative phenotypes recorded in the last generation and transmitted founder chromosomal segments is tested considering gamete transmission probabilities along generations. The authors make no assumption about the LD generation process, i.e. about identity between the founder chromosomal segments, and leave users free to implement or not haplotype clustering. They simply consider that two chromosomal segments carrying IBS haplotypes belong to the same ‘class’ and model the effect of the classes on the quantitative trait. Legarra & Fernando (Reference Legarra and Fernando2009) compared their regression model to a variance component IBD-based model (vcIBD) (Meuwissen *et al.*, Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002; Lee & Van der Werf, Reference Lee and van der2006) in terms of mapping accuracy. Based on their results, the regression model appeared to be more accurate than the vcIBD method when the only source of LD was drift. However, their numerical evaluation was limited to only one family size and low-marker density.

Several studies (Lee & van der Werf, Reference Lee and van der2004, Reference Lee and van der2005; Heuven *et al.*, Reference Heuven, Bovenhuis, Janss and Arendonk2005; Hayes *et al.*, Reference Hayes, Gjuvsland and Omholt2006) have evaluated the potential of LD to improve mapping accuracy when using familial linkage information. They investigated the effects of pedigree information, marker density, effective population size and mutation age on the vcIBD-based model. This method is computationally intensive and both the number of scenarios and replicates per scenario was limited.

The first purpose of this work is to extend the comparison between the linear regression (Legarra & Fernando, Reference Legarra and Fernando2009) and IBD variance component models (Meuwissen *et al.*, Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002). The original IBD-based method described by Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) is modified according to Druet *et al.* (Reference Druet, Fritz, Boussaha, Ben-Jemaa, Guillaume, Derbala, Zelenika, Lechner, Charon, Boichard, Gut, Eggen and Gautier2008) by clustering the chromosomal segments based on their IBD probabilities. Various numbers of markers per haplotype and levels of chromosomal segment clustering are tested.

A second objective of this work is to extend the study of the LDLA regression (LDLA_reg) technique, considering different designs defined by the number of descendants per sire, the type of design (either half-sibs or a mixture of full- and half-sibs) and the number of genotyped animals. Here, we exploit the computing time of the regression technique, allowing more scenarios and replicates to be examined.

## 2. Materials and methods

### (i) Simulated designs

LD in a historical founder population, and genotypes and performance in a two-generation mapping population were simulated with the LDSO software (Ytournel *et al.*, Reference Ytournel, Teyssèdre, Roldan, Erbe, Simianer, Boichard, Gilbert, Druet and Legarra2010). Full and half-sib designs were simulated for the mapping population.

Genotypes were simulated for a number of biallelic markers in either a 5 or 25 cM region, assuming linkage equilibrium and isofrequent alleles at each locus. The limited (5 and 25 cM) sizes of the explored region were chosen to mimic the situation when a QTL is fine mapped after having been detected via a full genome scan. A biallelic QTL was located at mid-distance between two markers at position 2·125 or 9·125 cM. Neither mutations nor bottlenecks were simulated. An effective population size of 100 was simulated during 50 generations. This process created LD between loci. Three additional generations were simulated, in order to increase the mixing of chromosomal segments created after the initial 50 generations. The actual size of the population was increased to 700 (350 males and 350 females) in generation 51, 2000 (50 sire families of size 40 with two progeny per sire×dam mating) in generation 52 and 4000 (100 sire families of size 40 with two progeny per sire×dam mating) in generation 53. Two additional generations (54 and 55) formed the mapping population. Phenotypes were simulated in the last two generations, with a QTL allelic effect of one phenotypic standard deviation and an environmental effect sampled from an *N*(0,1) normal distribution.

For each scenario, a total of 100 usable replicates, i.e. for which neither of the QTL alleles had a frequency >0·90 in mapping population sires (in the 54th generation) were generated and analysed.

### (ii) Mapping analysis methods

Three QTL mapping methods were compared: (i) the LA method according to Elsen *et al.* (Reference Elsen, Mangin, Goffinet, Boichard and Le Roy1999), (ii) the LDLA regression approach of Legarra & Fernando (Reference Legarra and Fernando2009) and (iii) an LDLA variance components IBD-based method derived from Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) . The methods will be designated LA, LDLA_reg and LDLA_vc, respectively.

The first two models are implemented in the QTLMap software (Filangi *et al.*, Reference Filangi, Elsen, Gilbert, Legarra, Le Roy and Moreno2010). Genome scans were performed using a 0·1 cM step. For both methods, the most probable parental phases (and thus the haplotypes carried by the founders) were built from the data (Favier *et al.*, Reference Favier, Elsen, de Givry and Legarra2010), and the transmission probabilities from the parents to the progeny (Elsen *et al.*, Reference Elsen, Mangin, Goffinet, Boichard and Le Roy1999) were computed. In brief, the LAs systematically tested the difference in performance levels between the progeny receiving the first chromosome or the second chromosome from the sire or the two parents at the tested position, weighted according to the transmission probabilities in the likelihood function. As described in Legarra & Fernando (Reference Legarra and Fernando2009), the LDLA_reg method adds to the LA model the effects of haplotype classes as observed on the chromosomal segments transmitted by the parents to the progeny, weighted by the corresponding transmission probability.

In Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002), the covariance between two chromosomal segment effects depended on the probability that they carried IBD alleles at the QTL given the marker and pedigree information (here, to simplify the description, such chromosomal segments will themselves be qualified IBD). The probabilities that two founder chromosomal segments are IBD can be estimated by the gene dropping method described by Meuwissen & Goddard (Reference Meuwissen and Goddard2000), or the deterministic method based on the simplified coalescence approach reported by Meuwissen & Goddard (Reference Meuwissen and Goddard2001). The probability that a non-founder chromosomal segment is IBD with a founder chromosomal segment can be estimated using the algorithm described by Fernando & Grossman (Reference Fernando and Grossman1989). The number of different haplotypes found in a population increases with the marker density. The number of non-IBD chromosomal segments may be very high, and proportional to the population size. To reduce the impact of the estimation problem, Blott *et al.* (Reference Blott, Kim, Moisio, Schmidt-Küntzel, Cornet, Berzi, Cambisano, Ford, Grisart, Johnson, Karim, Simon, Snell, Spelman, Wong, Georges, Farnir and Coppieters2003) and Druet *et al.* (Reference Druet, Fritz, Boussaha, Ben-Jemaa, Guillaume, Derbala, Zelenika, Lechner, Charon, Boichard, Gut, Eggen and Gautier2008) proposed that some chromosomal segments should be clustered together before solving the mixed model, but no clear rule was provided to achieve an efficient clustering strategy. The third method (LDLA_vc) was performed with the software suite PIBD used at INRA (F. Guillaume, personal communication). As phase reconstruction is not included in the suite, phases built with LDSO software were provided as input. We further checked that the phases inferred with QTLMap were always exact. The procedure included four steps. In the first step, the haplotypes carried by the founders were identified. In the second step, IBD probabilities (hereafter designated Prob(IBD)) between founder chromosomal segments were estimated as described in Meuwissen & Goddard (Reference Meuwissen and Goddard2001). In the third step, the chromosomal segments were grouped with a single linkage clustering technique (Hurtagh, Reference Hurtagh1985) using 1-Prob(IBD) as a distance. The threshold for grouping chromosomal segments was chosen in order to maximize the accuracy of QTL location (see below). Covariances between cluster effects were set to zero in the variance component estimation step. The final step was an EM-REML estimation of the parameters at the putative QTL position, followed by a likelihood ratio test (LRT) of the variance component associated with the QTL (Visscher, Reference Visscher2006).

For all methods, the tested positions were never coincident with the position of the simulated QTL.

### (iii) Situations simulated

A summary of all situations examined is shown in Table 1.

^{a} The reference situation (Legarra & Fernando, Reference Legarra and Fernando2009) is underlined.

The first aim of this paper was to extend the comparison between the linear regression (Legarra & Fernando, Reference Legarra and Fernando2009) and the IBD variance component models (Meuwissen *et al.*, Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) from Legarra & Fernando to several additional experimental situations. Sixteen scenarios were simulated, including the ‘drift’ scenario of Legarra & Fernando (Reference Legarra and Fernando2009), with a 5 cM segment (this case will be designated as the ‘reference scenario’).

Two parameters relating to the way haplotype information is used were tested: the haplotype length and the clustering threshold in the LDLA_vc approach.

The haplotype length, a parameter needed in LDLA methods, is usually defined by the number of markers. Three cases were studied: 2, 4 and 6 marker haplotypes. The effect of this parameter on the accuracy of the QTL location was investigated considering a population of 1000 progeny from 5, 15, 50 or 100 sire families, with a 25 cM-long segment with 100 or 501 markers.

As explained above, the original IBD-based method described by Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) was modified according to Druet *et al.* (Reference Druet, Fritz, Boussaha, Ben-Jemaa, Guillaume, Derbala, Zelenika, Lechner, Charon, Boichard, Gut, Eggen and Gautier2008) by clustering the chromosomal segments based on their IBD probabilities: two chromosomal segments were grouped together in the LDLA_vc method if their Prob(IBD) was greater than a threshold. Three values were compared for this threshold: 0·05, 0·50 and 0·95. Mapping accuracy was tested in a subset of the scenarios described previously, with 15 families having a total of 1000 progeny, and a 25 cM-long segment with 501 markers. From the results of this first analysis, an average situation with haplotypes of four markers and a clustering threshold of 0·50 was kept for the following comparisons.

Then two series of comparisons were performed:

(1) Two region sizes, 5 and 25 cM, and two SNP densities, 0·25 or 0·05 cM between markers, were simulated. The total number of markers was 21 or 101 for the 5 cM segment, and 101 or 501 for the 25 cM segment.

(2) Four number of families (5, 15, 50 and 100), setting to 1000 the total number of progenies were simulated.

To evaluate the efficiency of various experimental designs for mapping QTL with the LDLA_reg model the effect of familial structure on the accuracy of QTL location was explored. Four family sizes (40, 70, 100 and 160 descendants per sire) for a population consisting of 15 sire families were studied, giving a total population size between 600 and 2400. For each family size, two designs were assessed: paternal half-sibs (1 progeny per sire-dam mate) and a mixture of full- and half-sibs (10 or 20 full-sibs per family).

### (iv) Comparison criteria

#### (a) Accuracy of estimation of QTL location

Mean square errors (MSE) for the estimated QTL location were obtained from 100 replicates for each method and scenario. The MSE incorporates both the bias and standard deviation of the estimates: the smaller, the better.

### (v) Computing time

The average (100 replicates) CPU time required by each method to analyse the data sets was compared. The computing time was measured on an Intel 64 bit CPU Inter 64, (4 GB RAM) running under Linux.

## 3. Results and discussion

The validity of the simulations was assessed by considering the realized LD, measured by *r* ^{2}. The mean value, estimated in sparse and dense maps (0·25 or 0·05 cM interval between markers), was 0·099 and 0·111, respectively, between adjacent markers. These observed values for *r* ^{2} and the extent of the LD, estimated from 100 replicates, suggest that our simulation parameters generated a population that mimics the LD values observed in existing populations (McRae *et al.*, Reference McRae, McEwan, Dodds, Wilson, Crawford and Slate2002, Reference McRae, Pemberton and Visscher2005 in sheep population; Tenesa *et al.*, Reference Tenesa, Knott, Ward, Smith, Williams and Vischer2003 and Farnir *et al.*, Reference Farnir, Coppieters, Arranz, Berzi, Combisano, Grisart, Karim, Marcq, Moreau, Mni, Nezer, Simon, Vanmanshoven, Wagenaar and Georges2000 in cattle; Nsengimana *et al.*, Reference Nsengimana, Baret, Haley and Vischer2004 in pig lines).

### (i) Preliminary analyses: haplotype information

#### (a) Clustering threshold

The effect of the clustering threshold on the accuracy of QTL location with LDLA_vc is displayed in Table 2 for different numbers of families. For this comparison the number of markers was always four, and even with this low number of markers, which generates at the most 2^{4}=16 different haplotypes, a large number of clusters can be obtained. As reported by Meuwissen & Goddard (Reference Meuwissen and Goddard2001), this is because two chromosomal segments that are identical by state can show low Prob(IBD). As expected, the higher the clustering threshold is, the lower the number of clusters there will be and the higher the number of founder chromosomal segments in each cluster. The lowest threshold (1−Prob(IBD)>0·05) gave the worst mapping resolution. In this situation, only the chromosomal segments having a high probability of being identical were put together and many clusters were defined. Almost as many clusters as founder chromosomal segments were thus available in the analyses. As covariances between cluster effects are set to zero in the variance component estimation step, such analyses were close to a LA. In this situation, a large number of segment effects have to be estimated, based on limited information, and the mapping accuracy is reduced. Increasing the clustering limit from 0·05 to 0·50 decreased the MSE values by 50–78% depending on the number of families. Moving the threshold from 0·50 to 0·95 led to ‘over-clusterization’ and increased the MSE values (from 38 to 70%). In this case, a large proportion of the elements grouped in a cluster may not be IBD, decreasing the differences between cluster effects if a QTL exists. This result is in agreement with Ytournel *et al.* (Reference Ytournel, Gilbert and Boichard2007), who studied by simulation the ability of estimated IBD probability to discriminate between the IBD statuses of QTL loci. They found that, for a 0·90 clustering threshold, 75% of the QTL alleles corresponding to chromosome segments grouped together were not IBD.

^{a} MSE, mean square error (cM^{2}). Standard errors in parentheses.

^{b} NC, number of clusters.

Our observations partly contradict Calus *et al.* (Reference Calus, Meuwissen, Winding, Knol, Schrooten, Vereijken and Veerkamp2009). They clustered founder chromosomal segments with IBD probabilities (their ‘limitIBD’ equivalent to our 1-clustering threshold) of 0·55, 0·75 or 0·95, and non-founder chromosomal segments with IBD probabilities of over 0·95. They found that the posterior probabilities for a QTL to be found near to the true QTL location are practically uninfluenced by the clustering threshold chosen. This difference is probably due to the unique 0·95 threshold that they applied to non-founder chromosome segments.

The QTL effect was slightly underestimated (about 5%, data not shown) with the lower threshold and overestimated for higher clustering thresholds. The computing time was higher with larger numbers of clusters (about 2·5 times higher with a 0·05 threshold as compared with 0·50, results not shown).

#### (b) Window sizes

With the LDLA_reg method (Table 3), mapping resolution was optimal when haplotypes were defined with four markers. Two-marker haplotypes represented the worst solution, and doubled the MSE compared with four-marker haplotypes. None of the differences observed between four- and six- marker haplotypes were significant.

^{a} Bonferroni *t* test. MSE values with the same letter are not significantly different.

^{b} Marker density (cM).

^{c} Mean square error values (cM^{2}). Standard errors in parentheses.

The clustering threshold was 0·50 in these analyses, and for the LDLA_vc model, significant differences between two-, four- and six-marker haplotypes were detected but only for the 0·25 cM marker density (Table 4). In this case, the best result was obtained with six markers per haplotype (*P*<0·05).

^{a} Bonferroni *t* test. MSE values with the same letter are not significantly different.

^{b} Marker density (cM).

^{c} NC, number of clusters.

^{d} MSE, mean square error (cM^{2}). Standard errors in parentheses.

Those results are partly in agreement with other reported observations. Grapes *et al.* (Reference Grapes, Dekkers, Rothschild and Fernando2004) found that an LD regression based on two-marker haplotypes was more accurate (in terms of QTL position) than a single-marker regression. Grapes *et al.* (Reference Grapes, Firat, Dekkers, Rothschild and Fernando2006), exploring the optimal haplotype structure for the IBD-based LD approach of Meuwissen & Goddard (Reference Meuwissen and Goddard2000), found that using haplotypes of four or six markers always gave lower MSEs than smaller (1, 2) or larger (10) haplotypes, the four-marker haplotypes performing most often best. Zhao *et al.* (Reference Zhao, Fernando and Dekkers2007) also found that four-marker haplotypes minimized the MSE in the LD variance component-based IBD method, but also found that single marker regression is often the best solution. Calus *et al.* (Reference Calus, Meuwissen, Winding, Knol, Schrooten, Vereijken and Veerkamp2009), using the average frequency of correct positioning of the QTL, found that haplotypes consisting of two markers were much less efficient than haplotypes consisting of six and 12 markers, the four marker case not being studied.

For the LDLA_vc model, larger window sizes correspond to lower numbers of clusters and higher numbers of chromosomal segments in each cluster. This is partly in agreement with Calus *et al.* (Reference Calus, Meuwissen, Winding, Knol, Schrooten, Vereijken and Veerkamp2009) who studied the effects, on mapping accuracy, of window sizes (2, 6, 12 and 20) and clustering thresholds of IBD probability between founder haplotypes (0·05, 0·25 and 0·45). They found that increasing the window size decreased the number of founder and non-founder chromosomal segments for the lower (0·05 and 0·25) clustering thresholds, with the number reaching a minimum with six markers when the threshold reaches 0·45.

Whatever the haplotype size and family number, accuracy increased with the map density. In the following comparisons, the LDLA_vc analyses were performed with values of 0·50 for the clustering threshold, and four-marker haplotypes were used. The latter may be suboptimal for LDLA_vc with sparse maps, but facilitated comparing the methods, in particular for the computing time criteria.

### Comparison of mapping methods

Meuwissen *et al.* (Reference Meuwissen, Karlsen, Lien, Olsaker and Goddard2002) depicted the fairly used model for LDLA QTL detection in livestock. The regression model from Legarra & Fernando (Reference Legarra and Fernando2009) is a simple linear-models framework for association and linkage that reduces to well-known models on the hypothesis of LE or complete LD. In addition, it is computationally simple to use. For these reasons, a simple but not extensively used LDLA model was compared against the most fairly used LDLA model.

The performance of the methods applied to half-sib populations of total size 1000 is listed in Tables 5–7 for different sizes of the explored region (5 or 25 cM), marker density and number of families. Tables 5 and 6 concentrate on the mapping accuracy for region sizes of 5 and 25 cM, respectively. Differences in MSE between scenarios were tested with ANOVA adjusted for multiple comparisons (Bonferroni test at a global 5% level). Significances are given in Tables 5 and 6 for combinations of methods (LA, LDLA_reg and LDLA_vc), numbers of sire families (5, 15, 50 or 100) and marker densities (0·25 *versus* 0·05). Table 7 summarizes the computing time for both region sizes and both marker densities.

^{a} Bonferroni *t* test. MSE values with the same letter are not significantly different. First letter: differences between methods; second letter: differences between family numbers; third letter: differences between marker densities.

^{b} MSE, mean square error (cM^{2}). Standard errors in parentheses.

^{c} Models: LA, linkage analysis; LDLA_reg, LDLA analysis by regression model; LDLA_vc, LDLA analysis by IBD variance component model.

^{d} Marker density (cM).

^{a} MSE, mean square error (cM^{2}). Standard errors in parentheses.

^{b} Bonferroni *t* test. MSE values with the same letter are not significantly different. First letter, differences between methods; second letter, differences between family numbers; third letter, differences between marker densities.

^{c} Models: LA, linkage analysis; LDLA_reg, LDLA analysis by regression model; LDLA_vc, LDLA analysis by IBD variance component model.

^{d} Marker density (cM).

^{a} Marker density (cM).

^{b} Models: LA, linkage analysis; LDLA_reg, LDLA analysis by regression model; LDLA_vc, LDLA analysis by IBD variance component model.

^{c} PIBD, computing time for the estimation of IBD probabilities; ANVA, computing time for the estimation of variance components.

s, second, m, minute.

#### Estimation of QTL position

The MSEs we obtained in the ‘reference scenario’ were not significantly different from the MSEs found by Legarra & Fernando (Reference Legarra and Fernando2009) in their 5 cM drift case: 2·781 (±0·250), 0·456 (±0·055) and 0·440 (±0·076) *versus* 2·22 (±0·22), 0·67 (±0·09) and 0·780 (±0·15) in Legarra & Fernando (Reference Legarra and Fernando2009) for LA, LDLA_reg and LDLA_vc, respectively. The (non-significant) differences between these two LDLA_vc approaches may come from different choices with regard to haplotype length and the clustering step (in Legarra & Fernando (Reference Legarra and Fernando2009) all 21 markers in the chromosome were considered and haplotypes were not clustered).

The LDLA_reg model was used without any assumption about the LD generation process: the clustering of the chromosomal segments was simply based on the IBS status of the corresponding haplotypes. In contrast, in the LDLA_vc model, the IBD probabilities were derived following an approximate coalescence model. In spite of this difference, our study showed that both regression and IBD-based variance component LDLA models could precisely place a QTL in a correct location with both marker densities (0·25 and 0·05 marker per cM) studied, while LA estimates were more variable (Tables 5 and 6, see also Fig. 1). The LRT curves obtained with LDLA models were clearly sharper than the curves obtained using LA (Fig. 2), which are flat and smooth, due to high correlations between LRT values at successive positions. For sparse maps, the LDLA_reg model was more accurate than the IBD-based method in locating the QTL (Table 5). For dense maps, the IBD variance component method seems slightly better at estimating the QTL position than the regression LDLA model, although the absolute gain in accuracy is small (Table 6). A possible explanation could be the number of clusters of chromosomal segments analysed in both LDLA versions. For dense maps, the LD is high and only few founder haplotypes had effects estimated by the LDLA_reg, whereas the LDLA_vc model, in which the clusters are formed based on chromosomal segment IBD probabilities, produces more clusters and estimates are therefore less constrained.

We also computed (results not shown) the frequency of replicates positioning the QTL within an interval of 2 cM around the true location. This last criterion must be interpreted with caution when the explored genome region is of a limited size (the chance of randomly positioning a QTL within the 2 cM interval increases when the region is smaller). On the whole, those two criteria gave consistent conclusions. They demonstrate that LA and LDLA are quite different, with LA having very low power to locate the QTL. MSE was able to discriminate between locations estimated using LDLA_reg and LDLA_vc, while the frequency of correct positioning was not discriminating in dense maps.

With respect to the number of families, even though no significant differences were detected, the general picture for all methods is that the most precise design is intermediate, 15 or 50 sires, the extreme (5 and 100 families) generally displaying lower accuracy.

#### Computing time

Whatever the method, the computing time (Table 7) was higher for larger numbers of markers (from 21 to 501 SNP positions). The average time per analysis was very similar between LA and LDLA_reg. For these regression methods, the time increased nearly linearly with the number of families. The computing time required by the LDLA_vc model was much higher, 5–140 times higher than LDLA_reg, depending on the scenario (e.g. QTL analysis of five families from the sparse map and small fragment size required 2 s for LA and 3 s for LDLA_reg, *versus* 6·5 min for LDLA_vc). With LDLA_vc the computing time was relatively insensitive to the number of families, with a tendency towards a reduction when this number increases. This insensitivity suggests that it can be computationally more efficient with a very large population. Most of the computing time of LDLA_vc (about 85–95% of total time by analysis) was spent on the variance component estimation step.

### Comparison of experimental designs

This study focused on classical designs (half-sibs and a mixture of full- and half-sib families) used in livestock population to map QTLs with genome scans.

For a given number of sires (15), the estimate of the QTL location was more precise when the number of progeny per sire was higher (Table 8). This has already been shown in similar comparisons by Hayes *et al.* (Reference Hayes, Gjuvsland and Omholt2006), Zhao *et al.* (Reference Zhao, Fernando and Dekkers2007) and Cierco-Ayrolles *et al.* (Reference Cierco-Ayrolles, Dejean, Legarra, Gilbert, Druet, Ytournel, Estivals, Oumouhou and Mangin2010).

^{a} MSE, mean square error (cM^{2}). Standard errors in parentheses.

^{b} Number of progenies per sire and dam: 1, all families are paternal half-sib families, 10 and 20, each family is a mixture of full- and half-sib families.

^{c} LA, linkage analysis; LDLA_reg, LDLA analysis by regression model.

Within the half-sib family structure, as shown in Tables 5 and 6, an optimum (in terms of MSE) was generally found for the three studied methods (LA, LDLA_reg and LDLA_vc) between the number and size of families: as compared with intermediate solutions (15 or 50 families), the accuracy of QTL positioning was reduced with few large families (five sire families of 200 progeny) or a large number of small families (100 sire families of 10). As we analysed replicates in which the favourable QTL allele frequency in the sires was over 10%, in designs with few families, there was a risk that only one of them would segregate for the QTL. Conversely, using many small families increased the error in estimated location. Our results are not in full agreement with Lee & van der Werf (Reference Lee and van der2004) who analysed, using an IBD-based variance component model, a population of 128 individuals belonging to between 2 and 64 half-sib or full-sib families. For their half-sib families, the accuracy was not strongly affected by the number of families, the tendency being a lower (resp. higher) precision when increasing the family number in LA (respectively LDLA). Many differences between our work and the Lee & van der Werf study (2004) could explain this discrepancy: population size and structure, marker density (they explored a 10 cM segment with only 10 markers), QTL effect size (their QTL had an effect in the range 0·707–1·18 phenotypic standard deviation). The high accuracy found by Lee & van der Werf (Reference Lee and van der2004) with only two families of 64 sibs in the LA clearly points out that a high proportion of sires were heterozygous at the QTL, a situation probably different from our work.

The half-sib structure allows a more precise estimation of QTL location than nested familial structures (full-sib within half-sib) when LDLA_reg techniques are employed, but less precise estimates for LAs (Table 8). Increasing the dam family size (from 10 to 20 progeny) reduces the precision of LDLA_reg. This observation was reported by Heuven *et al*. (2005) who observed a similar tendency in two scenarios comparing two full-sib family sizes for the same number (eight) of sires: 24 dams per sire×10 progeny per dam *versus* 12 dams per sire×20 progeny per dam for a map distance of 2 cM between markers.

In LAs, the MSE was minimized for an intermediate dam family size (10 *versus* 1 or 20). This result was consistent with the idea that, in outbred populations where only a proportion of the parents are heterozygous at markers and QTLs, there is a trade-off between the number and size of families: the former is linked to the number of informative families, the latter to the precision of the estimation based on data from informative families. This tendency was, however, not observed in Heuven *et al*. (2005) who reported a small increase of precision between 10 and 20 progenies.

This effect was generally reinforced particularly in the case of larger dam families. In LDLA, the MSE increased (i.e. the precision decreased) when the number of progeny per dam increased, this tendency being probably due to the concomitant diminution of the number of dam families, and thus to the fact that there were fewer chromosomal segments sampled on the dam side to exploit the LD.

In LAs, the MSE was minimized for an intermediate dam family size (10 *versus* 1 or 20). This result was consistent with the idea that, in outbred populations where only a part proportion of the parents are heterozygous at markers and QTLs, there is a trade-off between the number and size of families: the former is linked to the number of informative families, the latter to the precision of the estimation based on data from informative families.

### Limits of our study

The comparisons were performed in a restricted range of genetic structures. LD was generated only by drift due to limited effective population size. The number of ‘historical’ generations creating the LD state was limited to 50, one of the values considered by Meuwissen & Goddard (Reference Meuwissen and Goddard2000). Even if the LD simulated mimics the LD observed in real populations, different haplotype configurations may have been obtained in other cases. Other situations, including natural and artificial selection, mutation, migration (with cross-breeding or not) could influence the results. The choice to simulate only drift was mostly for its simplicity and because it corresponds to most of the simulations performed in similar work, although mutations were considered in Lee & van der Werf (Reference Lee and van der2004) and cross-breeding was considered in Grapes *et al.* (Reference Grapes, Firat, Dekkers, Rothschild and Fernando2006). Single QTL tests were applied. Extension to other genetic models would be possible, including the option of a finite number of QTLs segregating in the same linkage group or on different chromosomes. In the regression LDLA model, haplotypes were clustered based on their IBS. Another criterion, such as the IBD probabilities as implemented in the LDLA_vc method, could be examined. An alternative to this simulation process would be the use of experimental data to examine mapping accuracy of cluster IBD- and regression-based models under experimental conditions.

The comparisons between regression and variance component LDL methods were done in the context of classical experimental designs for QTL detection, i.e. sets of paternal half-sib families or nested dam full-sib within sire half-sib families. The information used in the statistical models was limited to the two generations (parents and progeny) of this design, while additional data (pedigree, and possibly markers and/or phenotypes) could be used in more general treatments. This choice was made to reflect the frequent situation of a second study devoted to the fine mapping of a QTL previously detected in such standard population designs.

In conclusion, the overall result is that QTL locations are estimated with similar accuracy under LDLA regression and variance components approaches, with a dramatic difference of the computing time in favour of the regression. This result suggests that LDLA_reg should be used (1) for a rapid exploration of the data and (2) for optimization of the protocol design.

The population structure had an impact on the precision of the QTL position, with the optimum balance between the number and size of families depending on the characteristics of the particular study (length of the explored segment, mapping density, total population size, etc.). When possible, larger populations and half-sib family structures should be preferred.

This study was supported by a fellowship from the ‘Plan de Formación, Actualización y Perfeccionamiento’ Res. 779-06 and Res. 641/08 grant of the Instituto Nacional de Tecnología Agropecuaria (INTA-Argentina). Part of the work was performed when J.-M. Elsen visited CSIRO in Armidale, supported by a McMaster fellowship. The project was partly supported by Toulouse Midi-Pyrenees bioinformatics platform (France). The contribution of the ANR project Rules & Tools is kindly acknowledged. We would also like to thank the anonymous reviewers for their comments and suggestions.

## Declaration of interest

None.