1 Introduction
In a traditional analysis of ordinal data, we assume I judges assess J objects by providing ordinal preferences, $\Pi$. The ordinal preferences of each judge, $\Pi_i$, may be provided in various forms, such as complete rankings, partial rankings, or pairwise comparisons among available objects or some subset thereof. Standard statistical model families for ranking data such as Mallows (Mallows, 1957) or Bradley–Terry–Luce (Bradley & Terry, 1952; Luce, 1959; Plackett, 1975) derive or estimate the rank of each object, whereby each object receives a unique rank. An estimated overall ranking then orders all objects from best to worst. Analyses of this kind, often referred to as rank aggregation (Dwork et al., 2001), are used to rank candidates in ranked choice elections (Gormley & Murphy, 2008; Mollica & Tardella, 2017), sports teams or players in a league using pairwise game outcomes (Barrientos et al., 2023; Tutz & Schauberger, 2015), or genes based on ordinal comparisons of genomics data (Eliseussen et al., 2023; Vitelli et al., 2018). In these scenarios, we intentionally do not consider potential heterogeneity among judges: our goal is to learn a single overall ranking, whether it is an ordering of candidates or an ordering of genes.
However, requiring estimated ranks to be unique is not always useful or appropriate. For example, some objects may be equal or indistinguishable in their true quality or ability. Consider an election in which two candidates, both of the same political party, are running for an office. If voters express their preferences solely on the basis of party, the candidates are inherently equal in quality. In another situation, when the number of votes cast is small, the estimated ranks assigned to each candidate could exhibit substantial uncertainty, suggesting the candidates are indistinguishable in quality based on the limited number of observed votes. In such situations, allowing inference to estimate candidates as having the same rank, i.e., to be rank-clustered, may improve interpretability, prediction, and decision-making when analyzing ordinal preferences.
In this article, we propose a Bayesian framework for ordinal data analysis that estimates an overall ranking of objects with rank-clusters, develop a computationally-efficient Gibbs sampler for estimation, and apply the model to real and simulated data. Specifically, we model observed rankings via the Bradley–Terry–Luce (BTL) family of distributions, which permits analysis of ordinal preferences in many forms, such as complete rankings, partial rankings, pairwise comparisons, and groupwise comparisons. To induce rank-clusters, we place a novel spike-and-slab fusion prior on the object-specific parameters of BTL distributions. In contrast to existing work related to rank-clustering in the literature, our model requires neither the parameter order nor the number or size of rank-clusters to be known in advance. Instead, these quantities are treated as random variables and estimated simultaneously, so that their corresponding uncertainty is naturally reflected in the resulting inferences.
The rest of the article is organized as follows. We first review literature related to rank-clustering in Section 2. Then, we propose the Partition-based Spike-and-Slab Fusion (PSSF) prior and apply it to a BTL model for ordinal data in Section 3. We develop a computationally-efficient Gibbs sampler based on reversible jump Markov chain Monte Carlo (RJMCMC) and demonstrate its accuracy on simulated data in Section 4. To demonstrate a wide variety of methodological benefits of our proposed framework, in Section 5, we apply the model to four real datasets: (i) complete rankings of sushi preferences provided by Japanese adults, (ii) partial rankings of 2021 Minneapolis mayoral candidates expressed by voters in a ranked choice election, (iii) complete and partial rankings of policy options from Eurobarometer 34.1, a survey which measures various European attitudes, and (iv) pairwise basketball game outcomes from the 2023–2024 season of the National Basketball Association (NBA). We conclude with a brief discussion in Section 6.
2 Background
Before reviewing the ordinal comparisons literature, it is helpful to introduce some basic terminology and notation. Rankings are a type of ordinal preference that denotes a relative ordering of objects from best to worst, potentially allowing ties. We use the operator '$\prec$' to denote a strict ordering of two objects; e.g., $A \prec B$ states that object A is strictly preferred to object B. An object's rank is the place it receives in the ranking.Footnote 1 Rankings arise in different forms. Given a collection of objects, a ranking is called complete when all objects are ranked. In contrast, a ranking is called partial when only a subset of the most-preferred objects are ranked (e.g., a top-five ranking). In a partial ranking, we assume that unranked objects are less-preferred than those ranked, but also that the preference order among the unranked objects is unknown. Next, we call a ranking incomplete when a judge is asked only to rank a subset of the complete collection of objects. In incomplete rankings, no information can be gleaned regarding objects not considered. For example, if a voter is asked by an election pollster to rank candidates from a single political party, the ranking should provide no information regarding their preferences on candidates from other parties. We call an incomplete ranking involving two objects (candidates in the above example) a pairwise comparison, and an incomplete ranking involving more than two objects a groupwise comparison. Rankings may be both partial and incomplete; e.g., a top-three ranking of mayoral candidates from a specific political party.
Next, we briefly review methods for estimating rank-clusters based on the BTL and Mallows families of ordinal data models in turn. For a more thorough review of these standard model families, see Marden (1996) and Alvo & Yu (2014).
2.1 Methods based on BTL distributions
Most work related to rank-clustering utilizes the BTL family, which comprises the Bradley–Terry and Plackett–Luce distributions and their extensions. The Bradley–Terry model, proposed by Zermelo (1929) and discovered independently by Bradley & Terry (1952), is parameterized by the vector $\omega \in \mathbb{R}_{>0}^J$, in which each $\omega_j$ corresponds to the worth of object j. Specifically, the Bradley–Terry model specifies the probability that object i will be ranked above object j in a pairwise tournament as
$$P(i \prec j \mid \omega) = \frac{\omega_i}{\omega_i + \omega_j}.$$
The Plackett–Luce model (Plackett, 1975) extended the Bradley–Terry model to allow for multiple comparisons, partial rankings, and incomplete rankings, and has been justified under Luce's Choice Axiom (Luce, 1959) and Thurstone's theory of comparative judgment (Thompson Jr. & Singh, 1967; Thurstone, 1927; Yellott Jr., 1977). In this model, a ranking $\pi = \{1 \prec 2 \prec \dots \prec J\}$ of J objects is assigned probability
$$P(\pi \mid \omega) = \prod_{j=1}^{J} \frac{\omega_j}{\sum_{l=j}^{J} \omega_l},$$
where often one sets $\sum_j \omega_j = 1$ for identifiability. Rankings drawn from the Plackett–Luce model may be interpreted as being created sequentially: in the first stage an object is selected among all the options, in the second stage an object is selected among those remaining, and so on. Extensions of distributions in the BTL family have been proposed to capture intricacies in ranked preferences such as order-of-presentation effects, ties, and covariates (Chapman & Staelin, 1982; Critchlow & Fligner, 1991; Gormley & Murphy, 2010; Rao & Kupper, 1967). Importantly, the BTL family can handle partial and incomplete rankings by exploiting its reliance on Luce's Choice Axiom.
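The sequential interpretation above can be sketched directly in code. The following is our own minimal illustration (not an implementation from the paper), using 0-based object indices:

```python
import random

def sample_plackett_luce(omega, rng=None):
    """Draw one complete ranking (best to worst, 0-based indices)
    from a Plackett-Luce model with worth parameters omega.
    At each stage, an object is chosen from those remaining with
    probability proportional to its worth."""
    rng = rng or random.Random()
    remaining = list(range(len(omega)))
    ranking = []
    while remaining:
        total = sum(omega[j] for j in remaining)
        u = rng.random() * total
        cum = 0.0
        for j in remaining:
            cum += omega[j]
            if u <= cum:
                ranking.append(j)
                remaining.remove(j)
                break
    return ranking
```

A ranking truncated after its first R stages yields a partial (top-R) ranking, and restricting the initial pool to a subset of objects yields an incomplete ranking, which is how the BTL family accommodates both forms.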
Since BTL distributions have continuous parameters, rank-clusters may be estimated by employing parameter fusion or shrinkage. Parameter fusion is the process of simultaneously estimating parameter values and identifying groups of parameters that should be set equal in value (i.e., "fusing" parameters together). Masarotto & Varin (2012) analyze pairwise comparison data from sports tournaments with parameter fusion techniques under the Bradley–Terry model. They estimate an overall ranking of teams with rank-clusters by applying the frequentist fused lasso (Tibshirani et al., 2005), in which the absolute difference between every pair of worth parameters is penalized after some data-driven normalization. In this approach, the fused parameters are made equal and thus create a rank-cluster among the corresponding objects. The approach of Masarotto & Varin (2012) was extended to additional datasets in sports (Tutz & Schauberger, 2015) and academic journal rankings (Vana et al., 2016; Varin et al., 2016). Jeon & Choi (2018) argued that shrinkage methods like those proposed by Masarotto & Varin (2012) and Tutz & Schauberger (2015) were developed specifically for pairwise comparisons, and thus have inappropriate penalty functions for richer kinds of ordinal data like partial or complete rankings. As a result, Jeon & Choi (2018) proposed a modified regularization penalty that may be applied to partial or complete rankings under the Plackett–Luce model. Relatedly, Hermes et al. (2024) consider sparse estimation of a Plackett–Luce model with object-level covariates under judge heterogeneity. In their setting, the number of heterogeneous preference groups and the group membership of each judge are assumed fixed and known. To improve efficiency of estimation across groups and predictive performance, they impose a lasso penalty on group-specific covariate coefficients and a simultaneous fused lasso penalty between each pair of group-specific covariate coefficients. We note that the setting studied by Hermes et al. (2024) is fundamentally different from ours, in that they assume (known) preference heterogeneity among the judges and the presence of object-specific covariates.
Parameter fusion methods for rank-clustering exhibit four distinct disadvantages. First, maximum likelihood estimation of models in the BTL family, even in their simplest forms, often suffers from numerical instability and slow computational speed; as a result, numerous authors have proposed complex algorithms to improve estimation accuracy or speed (Hunter, 2004; Maystre & Grossglauser, 2015; Nguyen & Zhang, 2023; Turner et al., 2020). Second, uncertainty quantification is challenging and theoretically tenuous in lasso-based methods (Fan & Li, 2001; Tibshirani, 1996). Third, lasso penalty parameters may be difficult to select, requiring data-driven or ad hoc techniques (Masarotto & Varin, 2012; Tibshirani, 1996); thus, interpretation of the resulting parameter estimates and associated uncertainty relies on the specific choice of penalty parameter. Fourth, prior knowledge on the number and size of rank-clusters cannot be directly incorporated into the frequentist framework: although the penalty parameter influences estimation of rank-clusters, the specific meaning of the various possible choices is not directly interpretable.
Many of these disadvantages may be addressed using spike-and-slab priors, a Bayesian approach to variable selection (George & McCulloch, 1997; Ishwaran & Rao, 2005; Mitchell & Beauchamp, 1988). Spike-and-slab priors assign weight to both a point-mass at 0 (the "spike") and a continuous density function (the "slab"). Although the specific formulations of these priors vary, they estimate parameters which are precisely zero in a probabilistic framework that incorporates prior knowledge via interpretable hyperparameters, as opposed to opaque penalty parameters. However, we are aware of only one variant of this prior class for parameter fusion: Wu et al. (2021) apply spike-and-slab priors to differences in successive parameters in a linear regression. In their method, the order of parameters from least to greatest in coefficient value must be known in advance (as in the fused lasso). This is not practical in the canonical ordinal data setting because the parameter order is equivalent to the overall ranking, whose estimation is a primary goal. Thus, no Bayesian parameter fusion methods exist which may be directly applied to ordinal data analyses with rank-clustering. Alternatively, one may consider the class of continuous shrinkage priors, which includes Bayesian variants of the lasso (Park & Casella, 2008) and fused lasso (Casella et al., 2010), among others (e.g., Bhattacharya et al., 2015; Carvalho et al., 2010; Griffin & Brown, 2005). However, continuous shrinkage priors do not place positive probability on coefficients (or their differences) being precisely zero. Thus, parameter fusion must be performed by thresholding the posterior distribution, which is often ad hoc (Porwal & Rodriguez, 2024), and will not be considered in this work.
2.2 Methods based on Mallows distributions
Alternatively, one may consider rank-clustering under the Mallows family of ranking models (Mallows, 1957). The Mallows family is parameterized by the overall ranking, $\pi_0$, and a scale parameter $\theta \geq 0$ that dictates how likely rankings of a given distance to $\pi_0$ are to be drawn. Specifically, the probability of drawing a ranking $\pi$ from a Mallows$(\pi_0, \theta)$ distribution is
$$P(\pi \mid \pi_0, \theta) = \frac{\exp\{-\theta\, d(\pi, \pi_0)\}}{\psi(\theta)},$$
where $d(\cdot, \cdot)$ is a distance metric and $\psi(\theta)$ is a function which provides an appropriate normalizing constant. Foundational models in the family are defined by their distance metric, with common choices being Kendall's $\tau$ (Kendall, 1938) and Spearman's $\rho$ (Spearman, 1904).
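For concreteness, the Kendall $\tau$ distance counts the number of discordant object pairs between two complete rankings. The helper below is our own illustration, not code from the paper:

```python
def kendall_tau_distance(pi1, pi2):
    """Number of discordant object pairs between two complete
    rankings over the same objects. Each ranking is a list of
    object labels ordered best to worst."""
    pos1 = {obj: r for r, obj in enumerate(pi1)}
    pos2 = {obj: r for r, obj in enumerate(pi2)}
    objs = list(pos1)
    d = 0
    for i in range(len(objs)):
        for j in range(i + 1, len(objs)):
            a, b = objs[i], objs[j]
            # A pair is discordant when the two rankings disagree on its order.
            if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0:
                d += 1
    return d
```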
To our knowledge, the Clustered Mallows Model (CMM) proposed by Piancastelli & Friel (2025) is the only rank-clustering method based on the Mallows model. Their work, proposed concurrently with and independently of ours, models item indifference (i.e., rank-clusters) by permitting the overall ranking parameter $\pi_0$ to include groups of objects that are tied in rank. The model is estimated in a Bayesian framework from the observed ranking data. However, there are three major limitations to their work. First and most importantly, the model requires both the number of rank-clusters and the number of objects per cluster to be pre-specified. Although the authors propose sensible and efficient tools for model selection, the requirement opens the possibility of model misspecification. For example, given seven objects there are 127 model specifications; given 10 objects there are 1,023 model specifications. In addition, pre-specifying the rank-clustering structure removes any uncertainty in the number of rank-clusters and their sizes from the inference task, which we believe to be of key interest in many applications. Second, Bayesian inference for a Clustered Mallows model is in the class of doubly-intractable problems, since the model's normalizing constant is not available in closed form. As a result, exact inference may be computationally slow, or approximation methods may be needed that rely on an inexact pseudolikelihood approach. Third, the Mallows model is best suited for ordinal data in the form of complete or partial rankings, meaning the CMM cannot handle pairwise or groupwise comparisons. As will be shown in Section 3, our proposed model avoids all three issues by incorporating parameter fusion in the continuously-parameterized BTL model family.
3 The Rank-Clustered BTL model
In this section, we first develop a novel spike-and-slab prior for parameter fusion based on partitions. Then, we employ the prior in a model for rank-clustering based on the BTL family of ordinal data models.
3.1 PSSF prior
Suppose data are drawn exchangeably from a model, $\mathcal{M}$, parameterized by the vector $\omega$. We suppose $\omega$ is of length J and let each $\omega_j \in \Omega$, $\Omega \subseteq \mathbb{R}$. Our goal is to estimate $\omega$ under the belief that some pairs or groups of parameters in $\omega$ may be clustered (i.e., fused). We say that two parameters $\omega_m$ and $\omega_n$, with $m, n \in \{1, \dots, J\}$ and $m \neq n$, are clustered precisely when $\omega_m = \omega_n$. Clustered parameters may take on any value in their domain, $\Omega$.
Before specifying the prior, we provide some notation on partitions. A partition of an object set $\mathcal{J} = \{1, 2, \dots, J\}$ is a collection $g = \{C(1), C(2), \dots, C(K)\}$ of K disjoint nonempty subsets (henceforth referred to as "clusters") of $\mathcal{J}$ such that their union forms $\mathcal{J}$. Let $C^{-1}(j)$ represent the cluster that contains object $j \in \mathcal{J}$. We let $S(k) = |C(k)|$ be the size of the cluster $C(k)$, and denote by K the number of clusters in g. To emphasize dependence on g, we often write $K_g$, $C_g(k)$, etc. Lastly, we let $\mathcal{G}$ represent the collection of all partitions g of $\mathcal{J}$, and let $\mathcal{G}_k = \{g \in \mathcal{G} : K_g = k\}$.
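The partition notation above can be made concrete with a small helper; the representation (a list of sets) and the function name are our own, for illustration:

```python
def partition_stats(g):
    """For a partition g, given as a list of clusters C(k) (sets of
    object labels), return (K, sizes, lookup): the number of clusters
    K_g, the cluster sizes S(k), and a dict mapping each object j to
    the index k of the cluster containing it, i.e., C^{-1}(j)."""
    K = len(g)
    sizes = [len(C) for C in g]
    lookup = {j: k for k, C in enumerate(g) for j in C}
    return K, sizes, lookup
```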
We are now ready to specify the PSSF prior. Under PSSF, $\omega$ is assumed to be generated via the following hierarchical model:
$$g \sim f_G(\cdot), \qquad \nu_k \mid g \overset{\text{iid}}{\sim} f_\nu(\cdot), \ k = 1, \dots, K_g, \qquad \omega_j = \nu_k \ \text{for all} \ j \in C_g(k). \qquad (4)$$
In Equation (4), $f_G(\cdot)$ is a probability mass function on $\mathcal{G}$ and $f_\nu(\cdot)$ is a probability density function on $\Omega$. In words, the prior generates a partition g, and then assigns a unique value $\nu_k$ to each cluster $C(k) \in g$. Last, each parameter in $\omega$ is assigned the value of $\nu$ corresponding to its cluster in g.
As an example, suppose $\mathcal{J} = \{1, 2, 3\}$ and we draw $g = \{C(1), C(2)\}$ such that $C(1) = \{2\}$ and $C(2) = \{1, 3\}$, and draw $\nu = [5, 10]$. Then, $\omega = [10, 5, 10]$ because
$$\omega_1 = \nu_2 = 10 \ \text{since} \ 1 \in C(2), \qquad \omega_2 = \nu_1 = 5 \ \text{since} \ 2 \in C(1), \qquad \omega_3 = \nu_2 = 10 \ \text{since} \ 3 \in C(2).$$
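The worked example can also be reproduced generatively. The sketch below is our own illustration (with 0-based object indices); it assembles $\omega$ from a partition g and per-cluster draws from $f_\nu$:

```python
import random

def sample_pssf_omega(g, f_nu, rng=None):
    """Given a partition g (list of sets of 0-based object indices)
    and a sampler f_nu(rng) for the slab density, draw one unique
    value nu_k per cluster and set omega_j = nu_k for all j in C(k)."""
    rng = rng or random.Random()
    J = sum(len(C) for C in g)
    nu = [f_nu(rng) for _ in g]      # one unique value per cluster
    omega = [0.0] * J
    for k, C in enumerate(g):
        for j in C:
            omega[j] = nu[k]         # omega_j inherits its cluster's value
    return omega
```

Replaying the example with clusters $\{2\}$ and $\{1, 3\}$ (0-based: `{1}` and `{0, 2}`) and $\nu = [5, 10]$ recovers $\omega = [10, 5, 10]$.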
3.1.1 Marginal prior probabilities
A useful feature of the PSSF prior is that, regardless of $f_G$, the marginal distribution of each $\omega_j$ follows $f_\nu$. This is because
$$P[\omega_j] = \sum_{k=1}^{J} P[\nu_k]\, P[C^{-1}(j) = C(k)] \qquad (5)$$
$$= P[\nu_1] \sum_{k=1}^{J} P[C^{-1}(j) = C(k)] \qquad (6)$$
$$= f_\nu(\omega_j). \qquad (7)$$
Equation (5) holds as there cannot be more than J clusters and each object belongs to precisely one cluster, Equation (6) holds by the exchangeability of $\nu_k$, and Equation (7) holds since $P[\nu_1] = f_\nu(\cdot)$ by definition and the Law of Total Probability.
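This marginal property is easy to check by simulation. The sketch below is our own illustration with J = 2, a Gamma(5, 3) slab, and a "cluster probability" p; whatever p is, the sample mean of $\omega_1$ approaches the Gamma mean 5/3:

```python
import random

def draw_omega_pair(p_cluster, rng):
    """One PSSF draw for J = 2 objects with a Gamma(5, 3) slab
    (shape 5, rate 3): with probability p_cluster both parameters
    share a single slab draw; otherwise they are drawn independently."""
    if rng.random() < p_cluster:
        v = rng.gammavariate(5, 1.0 / 3.0)   # gammavariate takes (shape, scale)
        return (v, v)
    return (rng.gammavariate(5, 1.0 / 3.0), rng.gammavariate(5, 1.0 / 3.0))

rng = random.Random(42)
draws = [draw_omega_pair(0.7, rng) for _ in range(20000)]
# Marginally, omega_1 ~ Gamma(5, 3) with mean 5/3, regardless of p_cluster.
mean1 = sum(w[0] for w in draws) / len(draws)
```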
3.1.2 Relationship to spike-and-slab
We have not yet explained the proposed PSSF prior's relationship to the spike-and-slab. It is easiest to understand their connection by considering the joint prior distribution of two arbitrary component parameters, $\omega_m$ and $\omega_n$, with $m \neq n$. Due to the partitioning structure of parameters in the PSSF prior, there is positive prior probability that the two parameters are clustered. Thus, their joint prior distribution contains a "spike" component along the line $\omega_m = \omega_n$, with density along that line determined by $f_\nu$. Conversely, given $\omega_m \neq \omega_n$, their joint prior distribution reflects independent draws from $f_\nu$.
Figure 1 gives examples of the PSSF prior under varying choices of $f_G$ and $f_\nu$. In all panels, we let $\mathcal{J} = \{1, 2\}$ and display the joint prior distribution of $(\omega_1, \omega_2)$. In this setting, there are only two unique partitions, $g = \{\{1, 2\}\}$ and $g = \{\{1\}, \{2\}\}$. Thus, we specify the prior $f_G$ by stating the so-called "cluster probability," i.e., the probability that $g = \{\{1, 2\}\}$. Columns correspond to cluster probabilities $0.1$, $0.5$, and $0.9$, respectively. Rows correspond to $f_\nu = \text{Normal}(0, 1)$ and $\text{Gamma}(5, 3)$, respectively. We notice that as the cluster probability increases, so does the density of points in the spike component. Regardless of $f_G$, the marginal distribution of each parameter follows $f_\nu$. The marginal relationships seen in Figure 1 hold identically even as $\mathcal{J}$ grows.

Figure 1 Joint distribution of $(\omega_1, \omega_2)$ under the PSSF prior with varying combinations of $f_G$ and $f_\nu$.
Note: In all cases, $\mathcal{J} = \{1, 2\}$, and plots show 20,000 sampled values with marginal density estimates along the axes. Rows correspond to the choice of $f_\nu$ and columns to $f_G$.
Furthermore, we show the distribution of the difference between parameters, $\omega_2 - \omega_1$, under the same scenarios in Figure 2. The rows and columns are identical to those in Figure 1 and make clear the relationship between the PSSF prior and the traditional spike-and-slab, which has a spike component at 0 and a background slab density.

Figure 2 Distribution of $\omega_2 - \omega_1$ under the PSSF prior with varying combinations of $f_G$ and $f_\nu$.
Note: In all cases, $\mathcal{J} = \{1, 2\}$. Rows correspond to the choice of $f_\nu$ and columns to $f_G$.
3.2 Rank-Clustered BTL model
We now introduce the Rank-Clustered BTL model for ordinal data. Let I be the number of judges who assess J objects. Let $\Pi_i$ represent the ordinal preference provided by judge i, which may be a partial ranking, complete ranking, pairwise comparison, or groupwise comparison. Let $R_i$ be the number of objects ranked by judge i, i.e., $R_i = |\Pi_i|$. When $R_i < J$, his/her ranking is partial. Let $\mathcal{S}_i$ denote the objects considered by judge i when forming his/her ranking, such that $\mathcal{S}_i \subseteq \mathcal{J}$. When $\mathcal{S}_i \subset \mathcal{J}$, his/her ranking is incomplete. $R_i$ and $\mathcal{S}_i$ are assumed known.
Under the Rank-Clustered BTL model, we assume ordinal data are generated via the following Bayesian model:
$$\Pi_i \mid \omega \overset{\text{ind}}{\sim} \text{BTL}(\omega; R_i, \mathcal{S}_i), \ i = 1, \dots, I, \qquad \omega \mid g, \nu \ \text{as in Equation (4)}, \qquad g \sim f_G(\cdot), \qquad \nu_k \overset{\text{iid}}{\sim} f_\nu(\cdot). \qquad (8)$$
Rank-Clustered BTL applies the proposed PSSF prior, under specific choices of $f_G$ and $f_\nu$, to the BTL family of distributions for ordinal data. Note that the data-generating BTL distribution is identifiable only up to scalar multiplication of $\omega$. However, the proposed Bayesian model does not suffer from identifiability issues due to the non-uniform prior on $\omega$ (Johnson et al., 2022). We emphasize that, unlike existing rank-clustering methods, the proposed model does not pre-specify the number of clusters, a specific rank-clustering structure, or the order of objects. These are treated as random variables and estimated simultaneously.
3.2.1 Prior selection
We now discuss the selection of priors and hyperparameters. We set $f_G$ according to
$$P[g] \propto \frac{\text{Poisson}(K_g \mid \lambda)}{|\mathcal{G}_{K_g}|}, \ g \in \mathcal{G}.$$
In words, the prior probability of drawing a specific partition g depends only on how many clusters, $K_g$, it contains. This prior is intentionally vague to permit a variety of rank-clustering patterns. Note that every partition with the same $K_g$ has equal prior probability. As a consequence, cluster sizes do not explicitly impact the prior probability of each g.Footnote 2 Still, there is an implicit connection between cluster size and $K_g$. For example, if $K_g = J$, every cluster must be a singleton. In this setup, one could set $\lambda \approx 1$ to encourage rank-clustering, or $\lambda \approx J$ to discourage rank-clustering. Next, we set $f_\nu$ according to
$$\nu_k \overset{\text{iid}}{\sim} \text{Gamma}(a_\gamma, b_\gamma), \ k = 1, \dots, K_g.$$
This Gamma prior has been used in Bayesian estimation of BTL models as it allows for closed-form Gibbs sampling via data augmentation (Caron & Doucet, 2012; Mollica & Tardella, 2017). The hyperparameters $a_\gamma$ and $b_\gamma$ control the prior distribution on the worth parameters. Since $\omega$ is invariant to multiplicative transformations, $a_\gamma$ and $b_\gamma$ are generally non-influential. Nonetheless, because the ratios between worth parameters can become very large when one object is strongly preferred over another, $(a_\gamma, b_\gamma)$ should be chosen to give some density to values near 0 to allow for such extreme ratios.
3.2.2 Goodness-of-fit
To assess the adequacy of an estimated Bayesian model for observed data, we use a posterior predictive p-value (Gelman et al., 2013, p. 146),
$$p = P\big[ T(\Pi^{\text{rep}}; g, \nu) \geq T(\Pi^{\text{obs}}; g, \nu) \big],$$
where $\Pi^{\text{rep}}$ is a draw from the posterior predictive distribution, $\Pi^{\text{obs}}$ is the observed data, $T(\Pi; g, \nu)$ is a discrepancy measure chosen to test a specific quality of the assumed model, and the probability is taken over the posterior distribution of the parameters $(g, \nu)$ and the posterior predictive distribution of $\Pi$. Based on Yao & Böckenholt (1999) and Mollica & Tardella (2017), we employ a discrepancy measure that considers the number of times item j beats item $j'$, denoted $\tau_{jj'}$, for $j, j' = 1, \dots, J$. Specifically,
$$T(\Pi; g, \nu) = \sum_{j=1}^{J} \sum_{j' \neq j} \frac{(\tau_{jj'} - \tau^*_{jj'})^2}{\tau^*_{jj'}},$$
where $\tau^*_{jj'}$ is the theoretical frequency expected under the assumed model with parameters $(g, \nu)$. Under a well-fitting model, the posterior predictive p-value will be near 0.5, with small values indicating inadequate model fit.
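The observed counts $\tau_{jj'}$ can be tabulated directly from the rankings. The helper below is our own minimal sketch; it counts only pairs that appear together within a ranking and does not apply the partial-ranking convention for unranked objects:

```python
def pairwise_wins(rankings, J):
    """tau[j][j2] = number of observed rankings in which object j is
    ranked above (beats) object j2. Each ranking is a list of 0-based
    object indices, best first; only pairs appearing together in a
    ranking are counted."""
    tau = [[0] * J for _ in range(J)]
    for pi in rankings:
        for a in range(len(pi)):
            for b in range(a + 1, len(pi)):
                tau[pi[a]][pi[b]] += 1
    return tau
```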
4 Bayesian estimation
In this section, we develop a Gibbs sampler for Bayesian estimation of Rank-Clustered BTL models and provide simulations to demonstrate its performance under varying numbers of observations and rank-clusters.
4.1 Gibbs sampler
Equation (4) defines $\omega$ by the pair $(\nu, g)$. Thus, to estimate $\omega$, we sample from the joint posterior distribution of $(\nu, g)$. We do so using an RJMCMC Gibbs sampler that alternates between updating g and $\nu$ via their full conditionals after data augmentation. The sampler is summarized in Algorithm 1.

Based on our experience fitting Rank-Clustered BTL models to real and simulated data, we recommend initializing $g^{(0)} = \{\{1\}, \{2\}, \dots, \{J\}\}$ (and thus $K_{g^{(0)}} = J$), as this allows rank-clusters to be formed during the estimation process (as opposed to being imposed by the analyst during initialization). For Step 2, $T_1$ should be sufficiently large to allow for convergence of the MCMC chain, although specific choices are context-dependent. Step 2(a) performs RJMCMC on clusters of objects. Since RJMCMC can be slow to converge in high dimensions, it is important to run multiple chains and assess mixing and convergence (Gelman et al., 2013). Step 2(b) relies on a closed-form Gibbs sampler. We find $T_2 \leq 5$ is usually sufficient for posterior sampling.
4.1.1 Details of Step 2(a)
We now detail Step 2(a), which proposes a new partition $g'$ based on the current partition g. Since $(\nu, g)$ are intricately tied, $\nu$ must simultaneously be updated to an appropriate $\nu'$. Sampling discrete partitions efficiently is challenging. In a seminal paper on RJMCMC, Green (1995) provided a method for sampling partitions, which we adapt to the Rank-Clustered BTL model.
Following Green (1995), we only propose partitions $g'$ that are slight modifications of g: precisely, we allow only for "births," which split one cluster into two, and "deaths," which merge two clusters into one. Since all partitions have positive prior probability, the resulting chain is irreducible, as required. There is no need to propose $g'$ that shuffle objects among clusters while maintaining the number of clusters, as such partitions may be reached by successive birth and death moves.
Births are attempted with probability $b_g = 0.5$.Footnote 3 In this case, we select a cluster k at random among those with at least two objects. The cluster is split "binomially," meaning that each object is placed independently into one of the "child" subgroups, $k_1$ or $k_2$, with equal probability, conditional on each subgroup ultimately containing at least one object. Deaths are attempted with probability $d_g = 1 - b_g = 0.5$. In a death, two adjacent clusters are merged at random. Adjacency means that $\nexists\, k : \nu_k \in (\nu_{k_1}, \nu_{k_2})$.
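The birth move on the partition can be sketched as follows. This is our own illustration (the matching death move simply merges two adjacent clusters and is omitted for brevity):

```python
import random

def propose_birth(g, rng):
    """Split one randomly chosen non-singleton cluster 'binomially'
    into two non-empty children (a birth move). g is a list of sets
    of object labels; returns a new partition, or None if no cluster
    has at least two objects."""
    eligible = [k for k, C in enumerate(g) if len(C) >= 2]
    if not eligible:
        return None
    k = rng.choice(eligible)
    C = sorted(g[k])
    while True:
        # Each object goes to child 1 or child 2 with equal probability,
        # conditional on both children being non-empty.
        c1 = {j for j in C if rng.random() < 0.5}
        c2 = set(C) - c1
        if c1 and c2:
            break
    return [x for i, x in enumerate(g) if i != k] + [c1, c2]
```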
Births and deaths require updating $\nu$ by increasing or decreasing its dimension by 1, respectively. In a birth, we split a cluster's worth $\nu_k$ into $(\nu'_{k_1}, \nu'_{k_2})$ using
$$\nu'_{k_1} = u\, \nu_k, \qquad \nu'_{k_2} = (2 - u)\, \nu_k,$$
where $u \sim \text{Unif}(0.5, 1.5)$. The corresponding death solves these equations simultaneously:
$$\nu_k = \frac{\nu'_{k_1} + \nu'_{k_2}}{2}, \qquad u = \frac{\nu'_{k_1}}{\nu_k}.$$
For reversibility, we automatically reject proposed births in which $\nu'_{k_1}$ and $\nu'_{k_2}$ are not adjacent.
Per Green (1995), the Metropolis–Hastings acceptance probabilities for a birth and a death are, respectively, $\min(1, A)$ and $\min(1, A^{-1})$, where
$$A = \frac{P(\Pi \mid \nu', g')\, P[\nu' \mid g']\, P[g']}{P(\Pi \mid \nu, g)\, P[\nu \mid g]\, P[g]} \times \frac{q(\nu, g \mid \nu', g')}{q(\nu', g' \mid \nu, g)} \times \left| \frac{\partial (\nu'_{k_1}, \nu'_{k_2})}{\partial (\nu_k, u)} \right|,$$
and $q(\nu', g' \mid \nu, g)$ is the transition probability of sampling $(\nu', g')$ given the current parameter set $(\nu, g)$. We now calculate each term in A. First, the ratio of posterior densities is
$$\frac{P(\Pi \mid \nu', g')\, P[\nu' \mid g']\, P[g']}{P(\Pi \mid \nu, g)\, P[\nu \mid g]\, P[g]},$$
where $P(\Pi \mid \nu, g)$ and $P[g]$ are defined by Equation (8). Second, the ratio of transition probabilities is
$$\frac{q(\nu, g \mid \nu', g')}{q(\nu', g' \mid \nu, g)} = \frac{d_{g'} \cdot \frac{1}{K_{g'} - 1}}{b_g \cdot \frac{1}{|\{k : S_g(k) \geq 2\}|} \cdot \frac{2}{2^{S_g(k)} - 2}}. \qquad (15)$$
The numerator in Equation (15) is the death probability, $d_{g'}$, times the probability of selecting a pair of adjacent clusters from the $K_{g'}$ clusters present after a split (there are $K_{g'} - 1$ such pairs). The denominator is the birth probability, $b_g$, times the probability of selecting a specific cluster k among those with at least two members, times the probability of dividing the $S_g(k)$ objects in cluster k into two non-empty subsets. There are $(2^{S_g(k)} - 2)/2$ such splits, since cluster k has $2^{S_g(k)}$ subsets in total, two of these yield an empty child, and each two-way split is obtained in two ways. Third and last, the Jacobian is
$$\left| \frac{\partial (\nu'_{k_1}, \nu'_{k_2})}{\partial (\nu_k, u)} \right| = 2 \nu_k.$$
4.1.2 Details of Step 2(b)
To update $\nu$ conditional on a partition g and our data, $\Pi$, we turn to a clever data augmentation trick for Bayesian estimation of Plackett–Luce models found in Caron & Doucet (2012) and Mollica & Tardella (2017). Here, we adapt their trick to account for the more general BTL family of distributions and for rank-clustering. Let $Y = \{Y_{ir}\}$ be a collection of independent random variables, $i = 1, \dots, I$ and $r = 1, \dots, R_i$, sampled according to
$$Y_{ir} \mid \Pi, g, \nu \sim \text{Exponential}\Big( \sum_{j \in \mathcal{S}_i \setminus \{\Pi_i(1), \dots, \Pi_i(r-1)\}} \omega_j \Big).$$
The exponential rates are precisely the denominator terms from BTL densities that are burdensome to calculate. The full conditional posterior probability
$P[\nu |Y,\Pi ,g]$
is then,

Given these cancellations, we notice a closed-form expression for the posterior:
$$\nu_k \mid Y, \Pi, g \overset{\text{ind}}{\sim} \text{Gamma}(a_k, b_k), \ k = 1, \dots, K_g,$$
where
$$a_k = a_\gamma + \sum_{i=1}^{I} \sum_{r=1}^{R_i} \mathbb{1}\big\{ \Pi_i(r) \in C_g(k) \big\}, \qquad b_k = b_\gamma + \sum_{i=1}^{I} \sum_{r=1}^{R_i} Y_{ir}\, \Big| C_g(k) \cap \big( \mathcal{S}_i \setminus \{\Pi_i(1), \dots, \Pi_i(r-1)\} \big) \Big|.$$
Thus, we can sample $\nu$ from a closed-form Gamma distribution after augmenting the conditioning data $\Pi$ and random variable g with Y.
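One full pass of this augmented update can be sketched as below. The function name and data layout are ours, and the Gamma bookkeeping (selection counts for the shape, per-remaining-object exponential contributions for the rate) follows the derivation above; treat it as a sketch rather than the authors' implementation:

```python
import random

def gibbs_nu_update(rankings, g, nu, a_gam, b_gam, rng):
    """One augmented-data update of cluster worths nu (Step 2(b)
    sketch): sample latent exponentials Y_ir, then draw each nu_k
    from its conjugate Gamma full conditional. rankings is a list
    of (choices, consider_set) pairs with 0-based object indices."""
    cluster_of = {j: k for k, C in enumerate(g) for j in C}
    shape = [a_gam] * len(g)
    rate = [b_gam] * len(g)
    for choices, consider in rankings:
        remaining = set(consider)
        for obj in choices:
            total = sum(nu[cluster_of[j]] for j in remaining)
            y = rng.expovariate(total)           # latent Y_{ir}
            for j in remaining:
                rate[cluster_of[j]] += y         # Y per remaining object
            shape[cluster_of[obj]] += 1          # selected object's cluster
            remaining.discard(obj)
    # gammavariate takes (shape, scale); scale = 1 / rate
    return [rng.gammavariate(shape[k], 1.0 / rate[k]) for k in range(len(g))]
```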
Now that we have developed an efficient estimation algorithm for Rank-Clustered BTL models, we turn to a numerical simulation to demonstrate estimation accuracy under different rank-clustering regimes.
4.2 Numerical simulation
We now demonstrate accurate estimation of worth parameters and rank-clusters via a Rank-Clustered BTL model in a numerical simulation. We assume there are
$J=8$
objects that form
$K \in \{1, 2, 4, 8\}$
rank-clusters. When
$K=J=8$
, every object is independent; there are only singleton rank-clusters. In the true worth parameter vector,
$\omega _0$
, rank-clustered objects have identical values and successive rank-clusters are separated in value by a factor of 4 (see Table 1 for specific values). Fourfold increases induce strong but not absolute separation between objects: For demonstration, in a pairwise tournament between an object with
$\omega _1=1$
and
$\omega _2=4$
, the probability of selecting object 2 is,

We also vary the Poisson hyperparameter on the number of rank-clusters,
$\lambda \in \{0.1,2,4,8\}$
, which encourages rank-clustering to different extents and allows us to measure robustness of results when
$\lambda $
is somewhat misspecified. To assess estimation consistency as the number of observations grows, we vary the number of judges
$I\in \{50,200,800\}$
. Finally, to assess the influence of partial and incomplete rankings, we vary the tuple
$(R,S)\in \{(2,2),(4,4),(2,8),(4,8),(8,8)\}$
, where R is the number of ranked objects and S is the number of objects considered by each judge. When
$R<8$
the ranking is partial; when
$S<8$
the ranking is incomplete. The set of considered objects,
$\mathcal {S}_i$
for each judge i, is selected independently and uniformly at random.
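Generating one judge's data under this design amounts to drawing a considered set uniformly at random and then sampling a top-R Plackett–Luce ranking; a hypothetical sketch (the worth values below merely mimic the fourfold separation, and `sample_partial_ranking` is our own helper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_partial_ranking(omega, S, R):
    """Draw one judge's top-R ranking of S objects chosen uniformly
    at random from the J objects with worth vector `omega` (sketch)."""
    J = len(omega)
    considered = rng.choice(J, size=S, replace=False)  # incomplete: subset of objects
    ranking = []
    remaining = list(considered)
    for _ in range(R):                                 # partial: rank only top R
        p = omega[remaining] / omega[remaining].sum()  # PL stagewise choice probs
        pick = rng.choice(len(remaining), p=p)
        ranking.append(remaining.pop(pick))
    return considered, np.array(ranking)

# Hypothetical worths with fourfold separation between rank-clusters
considered, ranking = sample_partial_ranking(
    np.array([1., 1., 4., 4., 16., 16., 64., 64.]), S=4, R=2)
```

With $S<8$ the judge sees only a subset of objects, and with $R<S$ only the top choices are recorded, matching the $(R,S)$ regimes in the simulation.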
Table 1 Simulation settings for
$\omega _0$
under varying numbers of true rank-clusters, K

For each combination of K,
$\lambda $
,
$(R,S)$
, and I, we generate 20 independent datasets and fit a Rank-Clustered BTL distribution to each, under hyperparameters
$a_\gamma =5$
and
$b_\gamma =3$
. We set
$T_1=5{,}000$
and
$T_2=2$
to obtain 10,000 posterior samples in each MCMC chain and remove the first half as burn-in. We note that no MCMC chain of length 10,000 took longer than 20 minutes to run (
$\sim $
0.12 seconds/iteration); many ran in under 2 minutes. For identifiability, posterior estimates of
$\omega _0$
are normalized post-hoc such that
$\sum _j \omega _{0j} = 1$
.
We first examine the accuracy of estimation for
$\omega _0$
across simulation settings. Figure 3 displays boxplots of mean absolute error (MAE) for
$\omega _0$
by number of judges I, true number of rank-clusters K, and the choice of hyperparameter
$\lambda $
. In general, estimation is quite accurate. We see that for any specific combination of K and
$\lambda $
, MAE decreases as I increases. Estimation error is higher when K is large and I is small, most likely reflecting the difficulty of estimating a complex rank-clustering structure from limited data.
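The error metric can be sketched as follows; whether MAE is computed from the posterior mean, as here, or averaged over draws is our assumption:

```python
import numpy as np

def normalized_mae(samples, omega0):
    """Normalize each posterior draw to sum to 1 (for identifiability),
    then report mean absolute error of the posterior mean vs. the truth."""
    samples = np.asarray(samples, float)
    samples = samples / samples.sum(axis=1, keepdims=True)
    omega0 = np.asarray(omega0, float)
    omega0 = omega0 / omega0.sum()
    return float(np.mean(np.abs(samples.mean(axis=0) - omega0)))
```

Because worth parameters are only identified up to scale, both the draws and the truth are normalized before errors are computed, as in the figure note below.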

Figure 3 Boxplots of posterior mean absolute error for
$\omega _0$
across combinations of the number of judges I, true number of rank-clusters K, hyperparameter
$\lambda $
, number of ranked objects R, and number of assessed objects S.
Note: Errors are calculated after normalization of posterior samples such that
$\sum _{j}\omega _{0j}=1$
.
Figure 4 displays the mean posterior probability of rank-clustering across object pairs which are truly rank-clustered (navy) or independent (gold) in
$\omega _0$
.

Figure 4 Boxplots of the mean posterior probability of rank-clustering object pairs which are truly rank-clustered (left) or independent (right) across combinations of I, K,
$\lambda $
, R, and S.
Results are further separated by the number of judges, I, true number of clusters, K, and hyperparameter
$\lambda $
. For rank-clustered pairs, accuracy of recovery is generally high and increases with the number of judges, I. Accuracy is best when hyperparameter
$\lambda \approx K$
, which occurs when prior belief regarding the number of rank-clusters is approximately correct. If there is limited prior knowledge on the number of rank-clusters, we suggest specifying a vague hyperparameter setting such as
$\lambda = \frac {J}{2}$
and assessing sensitivity of results to various choices of
$\lambda $
. The posterior probability of rank-clustering independent object pairs is near 0 in all simulations, indicating excellent recovery accuracy of objects with distinct worth parameters.
The numerical simulations in this section indicate that the proposed Rank-Clustered BTL model is able to accurately estimate the relative worth of objects in a collection, including in the presence of rank-clustering or partial/incomplete observed rankings. Estimation error decreases to 0 as the number of observations increases. Overall, the model correctly identifies rank-clustered and independent object pairs.
5 Applications
In this section, we apply the Rank-Clustered BTL model to four real datasets involving ordinal comparisons. These four applications were chosen to highlight the applicability of our method to various ordinal data types and domain areas and to illustrate the methodological strengths of our approach, which are summarized in Table 2. The datasets comprise sushi preferences of Japanese adults (Kamishima, Reference Kamishima2003), ranked-choice votes in a Minneapolis mayoral election (Minneapolis Elections and Voter Services, 2021), policy preferences of respondents from Great Britain in a Eurobarometer survey (Reif & Melich, Reference Reif and Melich1993), and pairwise game outcomes among teams in the US NBA (National Basketball Association, 2024).
5.1 Sushi preferences in Tohoku
We first study complete preference rankings of 10 sushi types from a benchmarking dataset by Kamishima (Reference Kamishima2003). To allow our results to be comparable with an analysis of the sushi data by Piancastelli & Friel (Reference Piancastelli and Friel2025), we analyze the preferences of survey respondents who lived in Japan’s Tohoku region until at least 15 years of age. There were 280 such respondents. We fit a Rank-Clustered BTL distribution to the data with
$a_\gamma =5$
,
$b_\gamma =3$
, and
$\lambda = 2$
to encourage rank-clustering but permit a wide variety of outcomes. We ran a total of 32,000 MCMC iterations, which took approximately 12 minutes (0.02 seconds/iteration). Figure 5 displays posterior rank-clustering probabilities (left) and parameter posteriors (right). In the left panel, the color of the
$(i,j)$
square of the clustering matrix represents the posterior probability that sushi types i and j are equal in rank at the population level. Additional results, including goodness-of-fit and convergence diagnostics, are provided in the Appendices.
Table 2 Summary of applications by subsection


Figure 5 Primary results from Rank-Clustered BTL analysis of Tohoku sushi data.
Note: Left: Posterior rank-clustering probabilities. Main diagonal displays posterior median estimate of worth parameter after normalization. Red squares indicate maximum a posteriori rank-clusters. Right: Posterior distributions of sushi-specific worth parameters.
Sushi types are ordered according to posterior median worth. Based on the left panel in Figure 5, fatty tuna appears to be strictly most preferred in this population, followed by tuna and shrimp rank-clustered in second place. Salmon roe and sea eel exhibit high posterior probability of rank-clustering, as do sea urchin, tuna roll, and squid; these two groups may themselves be rank-clustered. Egg and cucumber roll are rank-clustered in last place. Our results demonstrate the proposed model’s ability to rank-cluster objects with uncertainty under complete rankings in survey data.
We compare our results to those found by Piancastelli & Friel (Reference Piancastelli and Friel2025) using their Clustered Mallows model (CMM). They estimate the following ranking: fatty tuna
$\prec $
tuna
$\prec $
shrimp
$\prec $
{salmon roe, sea urchin}
$\prec $
{sea eel, tuna roll, squid}
$\prec $
{egg, cucumber roll}. Our results are, unsurprisingly, similar, but differ in illuminating ways. Tuna and shrimp are rank-clustered in our model. Relative to the CMM, the rank-clusters {salmon roe, sea eel} and {sea urchin, tuna roll, squid} swap the positions of sea eel and sea urchin. These two rank-clusters also exhibit some posterior probability of rank-clustering with each other. These differences showcase how the pre-specification required by the CMM limits the flexibility of the results, potentially obscuring what the data have to offer and understating uncertainty in the estimated ranks and rank-clusters. The Rank-Clustered BTL model requires no pre-specification and permits complex posterior summaries of rank-clustering, including uncertainty in the number of rank-clusters and their respective sizes.
5.2 2021 Minneapolis mayoral election
Our second example analyzes real ranked-choice votes from the 2021 mayoral election in Minneapolis, Minnesota (Minneapolis Elections and Voter Services, 2021). This election included 17 candidates (excluding write-ins and one who received no votes) and asked voters to rank their top-three choices, in order. A total of 145,337 votes were cast in this election. To mimic exit polling data, we randomly sample 1,000 valid votes for analysis, which we treat as a random sample of preferences from the population of Minneapolis voters. We want to estimate the overall preferences of Minneapolis voters regarding mayoral candidates and learn which candidates, if any, are rank-clustered at the population level. Clustering candidates may be of interest to political scientists or local political organizations for the purpose of understanding voter preferences (Dimock et al., Reference Dimock, Doherty, Kiley and Krishnamurthy2014; Gunther & Diamond, Reference Gunther and Diamond2003). For example, if the winner of the election is deemed to be rank-clustered with other candidate(s), their mandate may be considered weak. Conversely, if the winner is a singleton first-place rank-cluster—clearly ranked above all other candidates—their mandate may be considered strong. We fit a Rank-Clustered BTL model to the data with
$a_\gamma =5$
,
$b_\gamma =3$
, and
$\lambda = 2$
to encourage few rank-clusters. We ran a total of 80,000 MCMC iterations, which took approximately 72.5 minutes (approximately 0.05 seconds/iteration). Figure 6 displays posterior rank-clustering probabilities (left) and parameter posteriors (right). In the left panel, the color of the
$(i,j)$
square of the clustering matrix represents the posterior probability that candidates i and j are equal in rank at the population level. Additional results, including goodness-of-fit and convergence diagnostics, are provided in the Appendices.

Figure 6 Primary results from Rank-Clustered BTL analysis of mayoral votes.
Note: Party abbreviations are in parentheses after candidate surnames. Left: Posterior rank-clustering probabilities. Main diagonal displays posterior median estimate of worth parameter after normalization. Red squares indicate maximum a posteriori rank-clusters. Right: Posterior distributions of candidate-specific worth parameters.
In Figure 6, candidates are ordered by their posterior median estimate of worth. Cluster 1 consists of Jacob Frey, the winner and incumbent. We note that Frey is not rank-clustered with other candidates with high posterior probability, suggesting a relatively strong mandate. Cluster 2 consists of Kate Knuth and Sheila Nezhad, both female, non-incumbent DFL candidates. Last, Cluster 7 consists of 6 candidates with minimal support.
Figure 7 compares point estimates of rank for each candidate across four methods. The first and second rows display assigned ranks from ranked choice and “first-past-the-post” (FPP) election procedures, respectively. We calculate FPP ranks by ordering candidates by the number of first place votes each candidate received (ignoring all second and third place votes).Footnote 4 The third and fourth rows display maximum a posteriori ranks from a standard Bayesian BTL and our Rank-Clustered BTL, respectively. Frey wins the election in all methods. The BTL and Rank-Clustered BTL models roughly reflect the deterministic algorithms, although we notice some swaps in candidate ranks, which may be attributed to differences between first place and second or third place votes. For example, Conner received fewer first place votes than Turner, but far more second and third place votes (see Appendices for vote totals). As a result, deterministic algorithms rank Turner above Conner, while the BTL model takes into account the additional preference information and ranks Conner above Turner. In summary, the overall ordering estimated by the Rank-Clustered BTL differs from a standard BTL model and two deterministic election procedures. Furthermore, our model confirms that Frey is strictly preferred over the remaining candidates by voters.
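The FPP comparator reduces to counting first-place votes; a toy sketch with hypothetical ballots (candidates with no first-place votes would need to be appended at the end in practice):

```python
from collections import Counter

def fpp_order(ballots):
    """Order candidates by first-place votes only, ignoring all
    lower-ranked choices ("first-past-the-post")."""
    firsts = Counter(ballot[0] for ballot in ballots if ballot)
    # most_common sorts by count, descending
    return [cand for cand, _ in firsts.most_common()]

# Hypothetical top-three ballots, best choice first
ballots = [["Frey", "Knuth"], ["Knuth"], ["Frey", "Nezhad", "Knuth"], ["Nezhad", "Frey"]]
order = fpp_order(ballots)
```

Because only position one matters, candidates like Conner, who draw much of their support from second and third choices, can rank lower under FPP than under a model that uses the full ballots.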

Figure 7 Comparison of estimated rank for each candidate across four aggregation methods: Ranked Choice, First-Past-the-Post (FPP), BTL, and Rank-Clustered BTL (RC BTL).
Note: Candidates are ordered by their rank in the actual ranked choice election.
5.3 Eurobarometer 34.1 survey data
We analyze data from the Eurobarometer 34.1 survey (Reif & Melich, Reference Reif and Melich1993), which included the following question:
Question 28: There are various actions that could be taken to eliminate the drugs problem. In your opinion, what is the first priority? And the next most urgent? (Ask respondent to rank all 7, with 1 as the most urgent.)
-
1. Information campaigns about the dangers of drugs.
-
2. Hunting down drug pushers and distributors.
-
3. Legal penalty for drug taking.
-
4. Looking after and treating drug addicts and rehabilitating them.
-
5. Funding research into drug substitutes, and into the treatment of drug addiction.
-
6. Fighting the social causes of drug addiction.
-
7. Reinforcing the control of distribution and usage of addictive medicines.
We subset the data to respondents from Great Britain to avoid heterogeneity and non-proportional sampling among respondents from different European countries. There were 1,005 valid responses among this group (out of 1,031 total surveyed), of which 970 were complete rankings and the rest ranked between one and five items (a top-six ranking is inherently equivalent to a complete ranking since all survey options were presented). We seek to identify a population-level ordering of the priorities that accounts for potential equality or indistinguishability among the options based on the survey data. These data were previously studied by Wang et al. (Reference Wang, Matsueda and Erosheva2017) with a mixed-membership model to learn about heterogeneity of opinions among survey respondents. Our analysis, although a simplification of the diverse population’s heterogeneous preferences, provides a simpler interpretation for policymakers interested in understanding the rank-ordering of policy preferences.
We fit a Rank-Clustered BTL model to the data with
$a_\gamma =5$
,
$b_\gamma =3$
, and
$\lambda =2$
to encourage rank-clustering. We ran a total of 16,000 MCMC iterations, which took approximately 12.5 minutes (0.047 seconds/iteration). Figure 8 displays posterior rank-clustering probabilities (left) and parameter posteriors (right). In the left panel, the color of the
$(i,j)$
square of the clustering matrix represents the posterior probability that policies i and j are equal in rank at the population level. Additional results, including goodness-of-fit and convergence diagnostics, are provided in the Appendices.

Figure 8 Primary results from Rank-Clustered BTL analysis of Eurobarometer 34.1 data.
Note: Left:Posterior rank-clustering probabilities. Main diagonal displays posterior median estimate of worth parameter after normalization. Red squares indicate maximum a posteriori rank-clusters. Right: Posterior distributions of policy-specific worth parameters.
Policy option 2 (hunting drug pushers) is strictly preferred to the rest among the population of survey respondents from Great Britain, whereas options 5 (funding research) and 3 (legal penalty) are rank-clustered last. The results indicate to policymakers that respondents in Great Britain strongly prioritize Option 2 over the rest, while Options 1 and 4 form one indistinguishable pair and Options 3 and 5 another, with Options 1 and 4 strongly preferred to Options 3 and 5. By rank-clustering similarly preferred options, interpretation of constituent preferences is simplified for policymakers.
5.4 2023–2024 NBA game outcomes
Last, we analyze outcomes of 1,230 games from the 2023–2024 season of the National Basketball Association (NBA) of the United States of America (National Basketball Association, 2024). In this season, 30 teams each played 82 games, including between two and five games against every other team. We seek to estimate an overall ranking of teams that allows for potential equality in ranking.
We fit a Rank-Clustered BTL model to the data with
$a_\gamma =5$
,
$b_\gamma =3$
, and
$\lambda =1$
to encourage rank-clustering given the limited ordinal comparison data provided by pairwise matchups. We ran a total of 320,000 MCMC iterations, which took approximately 22.6 hours (approximately 0.25 seconds/iteration). Figure 9 displays posterior rank-clustering probabilities (left) and parameter posteriors (right). In the left panel, the color of the
$(i,j)$
square of the clustering matrix represents the posterior probability that teams i and j are equal in rank at the population level. Additional results, including goodness-of-fit and convergence diagnostics, are provided in the Appendices.

Figure 9 Primary results from Rank-Clustered BTL analysis of 2023–2024 NBA data.
Note: Left: Posterior rank-clustering probabilities. Main diagonal displays posterior median estimate of worth parameter after normalization. Right: Posterior distributions of team-specific worth parameters.
In this setting, the Rank-Clustered BTL model estimates an ordering of professional basketball teams with uncertain rank-clustering patterns. Uncertain rank-clustering may result from two aspects of this application. First, pairwise comparisons provide little information relative to partial or complete rankings, by construction. Second, game outcomes provide low-signal measurements of team ability (Baumer et al., Reference Baumer, Matthews and Nguyen2023). This is because many factors influence game outcomes, such as skill, home advantage, injuries, roster changes, and luck (Cai et al., Reference Cai, Yu, Wu, Du and Zhou2019). Consistent with this low-signal, limited-information setting, an 80% posterior credible interval indicates that there are between 6 and 9 rank-clusters. Every team has less than 0.037 posterior probability of belonging to a singleton rank-cluster.
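The reported 6-to-9 range is an equal-tailed posterior credible interval for the number of rank-clusters; a sketch using order statistics of the posterior draws (the discrete tie-breaking convention and the draws below are our assumptions):

```python
import numpy as np

def credible_interval(k_samples, level=0.80):
    """Equal-tailed credible interval for a discrete parameter such as
    the number of rank-clusters K, from posterior samples."""
    ks = np.sort(np.asarray(k_samples))
    tail = (1.0 - level) / 2.0
    lo = ks[int(np.floor(tail * (len(ks) - 1)))]      # lower order statistic
    hi = ks[int(np.ceil((1.0 - tail) * (len(ks) - 1)))]  # upper order statistic
    return int(lo), int(hi)

# Hypothetical posterior draws of K
k_draws = [6] * 5 + [7] * 45 + [8] * 45 + [9] * 5
```

Because K is discrete, the interval endpoints are taken directly from sorted posterior draws rather than by interpolation.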
As seen in the left panel of Figure 9, four teams (Boston Celtics, Oklahoma City Thunder, Denver Nuggets, and Minnesota Timberwolves) appear to be rank-clustered in first place. Based on regular season data alone, our model suggests that these four teams were of roughly indistinguishable ability. Conversely, we observe that six teams (Toronto Raptors, San Antonio Spurs, Portland Trail Blazers, Charlotte Hornets, Washington Wizards, and Detroit Pistons) all have a high posterior probability of rank-clustering in last place. Rather than reporting a granular but highly uncertain ranking of these teams, we recommend simply inferring that they were the league’s worst teams this season. These rank-clusters, despite not accounting for the complexities of the sport, provide useful and interpretable summaries of the teams’ abilities across the regular season. A similar analysis could be used in the future to predict postseason performance.
6 Discussion
In this article, we proposed the Rank-Clustered BTL model for estimating an overall ranking of objects with rank-clusters. The model employs the BTL family of distributions for ordinal comparisons. We proposed the PSSF prior to estimate model parameters in a Bayesian framework. The model requires neither pre-specification of the number or size of the rank-clusters (improving upon Piancastelli & Friel, Reference Piancastelli and Friel2025), nor specification of lasso-based penalty parameters (improving upon Hermes et al., Reference Hermes, van Heerwaarden and Behrouzi2024; Jeon & Choi, Reference Jeon and Choi2018; Masarotto & Varin, Reference Masarotto and Varin2012). In a simulation study, we demonstrated the model’s ability to accurately and consistently estimate the relative worth of objects in a collection while simultaneously estimating rank-clusters. We applied the Rank-Clustered BTL model to four real datasets featuring different types of ordinal comparison data.
In contrast to the only other spike-and-slab-based prior for parameter fusion (Wu et al., Reference Wu, Shimamura, Yoshikawa, Murayama and Kawano2021), the PSSF prior we developed does not require a known parameter ordering. Visual inspection of the prior distribution makes obvious its connection to spike-and-slab: “spike” components correspond to parameter clusters and “slab” components correspond to independent parameters. Estimation of parameters under this model requires reversible jump MCMC. To overcome potentially slow or computationally-burdensome estimation in this setting, we proposed a computationally efficient Gibbs sampler. The sampler alternates between updating the partition of objects, based on the seminal work of Green (Reference Green1995), and updating object-level worth parameters following a data augmentation trick for standard Plackett–Luce models by Caron & Doucet (Reference Caron and Doucet2012) that was later adapted for Plackett–Luce mixtures by Mollica & Tardella (Reference Mollica and Tardella2017).
The proposed PSSF prior requires selecting hyperpriors for partitions,
$f_G$
, and the continuous values for each unique parameter,
$f_\nu $
. In this work, we specified
$f_G\propto \text {Poisson}(K_g|\lambda )$
to be intentionally vague over the large space of partitions and
$f_\nu \propto \text {Gamma}(a_\gamma ,b_\gamma )$
based on conjugacy. However, alternative hyperpriors are available. A Negative Binomial or Beta Negative Binomial distribution for
$f_G$
may be more appropriate when stronger prior knowledge of
$K_g$
is available. If PSSF were to be applied to linear regression for parameter fusion, a Normal or
$t$-
distribution may be substituted for
$f_\nu $
.
A useful benefit of estimating parameter values and clusters in a single Bayesian framework is the avoidance of issues associated with selective inference (Taylor & Tibshirani, Reference Taylor and Tibshirani2015) or, more colloquially, double dipping (Kriegeskorte et al., Reference Kriegeskorte, Simmons, Bellgowan and Baker2009). Selective inference occurs when the same data is used twice in the process of model selection and/or estimation, e.g., to estimate some latent structure underlying the data and subsequently to estimate parameters conditional on that estimated structure. In our context, selective inference would occur if ordinal preference data was used first to identify rank-clusters and then used again to estimate worth parameter values conditional on those clusters. We note that selective inference occurs in the estimation of the related CMM by Piancastelli & Friel (Reference Piancastelli and Friel2025), which requires selecting the number and size of rank-clusters among objects before fitting the model. Selective inference often leads to invalid inference in part because uncertainty regarding the estimated clustering structure is not taken into account. However, Rank-Clustered BTL models do not perform selective inference because parameter values and rank-clusters are estimated simultaneously. As such, we believe our parameter estimates to be more credible than those from the aforementioned methods in the literature because they rely on a fully Bayesian approach that incorporates uncertainty across the posterior distributions of both the rank-clustering structure and the specific parameter values (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013, p. 24).
Results from Rank-Clustered BTL models are useful in a variety of inferential contexts. As noted in other fusion literatures on rankings, estimated overall rankings may be easier to understand and interpret when rank-clusters of objects are identified, as rank-clusters lead to fewer rank levels of objects to distinguish (Masarotto & Varin, Reference Masarotto and Varin2012). In contexts where model results are used for prediction, such as in sports, estimating rank-clusters may improve predictive accuracy (Tutz & Schauberger, Reference Tutz and Schauberger2015). Similarly, estimating rank-clusters is important in the context of decision-making: In peer review, for example, rank-clusters can be beneficial for communicating uncertainty in the assessment of preferences and for better transparency in funding decisions. We might imagine a scenario where a government agency is only able to fund two grants, however, two grant proposals are rank-clustered in second place. In this case, rank-clustering can be used to communicate uncertainty in the relative quality of the top proposals. A potential danger is that under this uncertainty, decision makers may be tempted to resort to unfair tie-breaking methods, e.g., selecting the proposal with the most famous author. Instead, tie-breaking should occur based on a fairer or more principled method, such as a partial lottery (Fang & Casadevall, Reference Fang and Casadevall2016; Heyard et al., Reference Heyard, Ott, Salanti and Egger2022; Roumbanis, Reference Roumbanis2019).
We list a few possible directions for future research. First, in this work we have not considered the level of interconnectedness among the assessed objects (e.g., if separate groups of judges assess completely distinct sets of objects). This is particularly relevant in the case of pairwise comparison data, in which some pairs of objects may never experience a head-to-head match-up. Second, the PSSF prior could be imposed as a prior for more complex BTL models or to other models entirely. In the former, the PSSF prior could be applied to preference learning via BTL distributions that incorporate covariates (e.g., Baldassarre et al., Reference Baldassarre, Dusseldorp, D’Ambrosio, Rooij and Conversano2023; Chapman & Staelin, Reference Chapman and Staelin1982; Gormley & Murphy, Reference Gormley and Murphy2010; Hermes et al., Reference Hermes, van Heerwaarden and Behrouzi2024) or ties in the observed ordinal comparison data (e.g., Rao & Kupper (Reference Rao and Kupper1967)). In that case, the prior may be modified to permit covariate parameter estimation in addition to rank-clustering. In the latter case, the PSSF prior may be applied to regression for variable fusion, and its performance may be compared to other existing Bayesian variable fusion methods (e.g., Casella et al., Reference Casella, Ghosh, Gill and Kyung2010; Shimamura et al., Reference Shimamura, Ueki, Kawano and Konishi2019; Song & Cheng, Reference Song and Cheng2020). Third, we notice that the PSSF prior bears some resemblance to a Dirichlet process prior (Escobar & West, Reference Escobar and West1995). Specifically, we may consider
$f_\nu $
in PSSF as a base distribution in a Dirichlet process. However, the Dirichlet process’ concentration parameter is related to but distinct from
$f_G$
in PSSF. Thus, the connection between Bayesian nonparametrics and Bayesian parameter fusion requires further study. Fourth, the proposed model could be studied in the framework of a latent class mixture model in order to introduce clustering among both objects (i.e. rank-clusters) and judges (i.e., preference heterogeneity) simultaneously. Doing so would result in a novel form of biclustering. However, the identifiability of such a model is not clear and would require theoretical investigation.
The proposed Rank-Clustered BTL model accurately estimates rank-clusters, permitting complex summaries beyond the traditional overall ranking and allowing for improved interpretability of the results. The Bayesian Rank-Clustered BTL model relies on a novel, spike-and-slab type prior for parameter fusion, and is estimated in a computationally-efficient manner. The applications to survey data, voting, and sports illustrate the methodological versatility and broad applicability of our proposed rank-clustering approach for aiding informed inference and decision-making.
Supplementary material
The supplementary materials include R code and data required to reproduce our simulations and data analysis.
Data availability statement
An R implementation of the Rank-Clustered BTL model is publicly available at https://github.com/pearce790/rankclust. Further information on the package can be found at https://pearce790.github.io/rankclust/index.html.
Acknowledgements
The authors thank the editor, associate editor, and three anonymous referees for their insightful comments that greatly improved this work. The authors also thank Drs. T. Brendan Murphy and I. Claire Gormley for inspiring conversations on rank data modeling during the early stages of this work.
Funding statement
The authors were supported by the National Science Foundation under Grant No. 2019901.
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Appendix A Additional application results
Appendix A.1 Additional results from Section 5.1
Figure A.1 displays stacked bar charts of ranks received by each sushi type by the survey respondents from Tohoku. Figure A.2 shows a comparison among results obtained using the proposed Rank-Clustered BTL, a standard BTL, and the Clustered Mallows model.
Table A.1 Posterior predictive p-values based on a standard BTL and Rank-Clustered BTL (RC-BTL) to assess goodness-of-fit in the Sushi data analysis


Figure A.1 Stacked bar charts of ranks received by each sushi type.

Figure A.2 Comparison of results among comparator methods for the Sushi data analysis.
Table A.1 contains posterior predictive p-values based on the discrepancy measure defined in Section 3.2.2. A p-value is calculated for both a standard BTL distribution and a Rank-Clustered BTL distribution fit to the observed data. All statistics are well above a standard 0.05 threshold, indicating acceptable fit to the observed data. We recall that posterior predictive p-values are used as tools to assess potential model misfits, and not to compare or choose among the models (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013, p. 150).
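Operationally, a posterior predictive p-value of this kind is the share of posterior draws in which the replicated discrepancy meets or exceeds the observed one; a sketch with made-up discrepancy values (the paper's actual discrepancy measure is the one defined in Section 3.2.2):

```python
import numpy as np

def ppp_value(t_rep, t_obs):
    """Posterior predictive p-value: P(T(y_rep, theta) >= T(y, theta) | y),
    estimated by the fraction of posterior draws in which the replicated
    discrepancy is at least the observed one."""
    t_rep = np.asarray(t_rep, float)
    t_obs = np.asarray(t_obs, float)
    return float(np.mean(t_rep >= t_obs))

# Hypothetical per-draw discrepancies for replicated and observed data
p = ppp_value([1.2, 0.8, 2.0, 1.5], [1.0, 1.0, 1.0, 1.0])
```

Values near 0 flag systematic misfit; moderate values (such as those well above 0.05 in Table A.1) indicate that the model reproduces the observed discrepancy comfortably.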
Figures A.3 and A.4 contain trace plots for K and
$\omega $
after burn-in for each chain. We find the trace plots to demonstrate satisfactory mixing and convergence.
Appendix A.2 Additional results from Section 5.2
Figure A.5 displays stacked bar charts of the sampled votes by rank level for each candidate.

Figure A.3 Trace plot of K in the Sushi data analysis.

Figure A.4 Trace plots of
$\omega $
in the Sushi data analysis.

Figure A.5 Number of votes by rank level and candidate. Candidates are ordered by their position in the official ranked choice election.
Note: Acronyms on the tops of bars represent each candidate’s political party.
Candidates are ordered by their final placement according to the official ranked choice voting algorithm. The incumbent, Jacob Frey, receives the largest share of first place votes, although Kate Knuth and Sheila Nezhad also receive substantial support. The remaining candidates receive comparatively few votes. Most candidates are associated with the Democratic–Farmer–Labor (DFL) party, which is affiliated with the national Democratic Party. Laverne Turner and Bob “Again” Carney Jr. are the only Republicans (GOP) in the race. The remaining candidates represent Grassroots–Legalize Cannabis (GLC), Libertarian (LIB), Socialist Workers Party (SWP), For the People Party (FPP), Independence (INC), Independent (IND), and Humanitarian–Community Party (HCP).
Table A.2 contains posterior predictive p-values based on the discrepancy measure defined in Section 3.2.2. A p-value is calculated for both a standard BTL distribution and a Rank-Clustered BTL distribution fit to the observed data. All statistics are well above a standard 0.05 threshold, indicating acceptable fit to the observed data. We recall that posterior predictive p-values are used as tools to assess potential model misfits, and not to compare or choose among the models (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013, p. 150).
Figures A.6 and A.7 contain trace plots for K and $\omega $ after burn-in for each chain. We find that the trace plots demonstrate satisfactory mixing and convergence.
Table A.2 Posterior predictive p-values based on a standard BTL and Rank-Clustered BTL (RC-BTL) to assess goodness-of-fit in the Minneapolis mayoral election data analysis


Figure A.6 Trace plots of K in the Minneapolis mayoral election data analysis.

Figure A.7 Trace plots of $\omega $ in the Minneapolis mayoral election data analysis.
Appendix A.3 Additional results from Section 5.3
Figure A.8 displays stacked bar charts of ranks received by each policy option by the survey respondents from Great Britain. Figure A.9 shows a comparison between results obtained using the proposed Rank-Clustered BTL and a standard BTL model.

Figure A.8 Stacked bar charts of ranks received by each policy option.
Table A.3 Posterior predictive p-values based on a standard BTL and Rank-Clustered BTL (RC-BTL) to assess goodness-of-fit in the Eurobarometer survey data analysis


Figure A.9 Comparison of results among comparator methods for the Eurobarometer survey data analysis.
Table A.3 contains posterior predictive p-values based on the discrepancy measure defined in Section 3.2.2. A p-value is calculated for both a standard BTL distribution and a Rank-Clustered BTL distribution fit to the observed data. All statistics are well above a standard 0.05 threshold, indicating acceptable fit to the observed data. We recall that posterior predictive p-values are used as tools to assess potential model misfit, not to compare or choose among models (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013, p. 150).
Figures A.10 and A.11 contain trace plots for K and $\omega $ after burn-in for each chain. We find that the trace plots demonstrate satisfactory mixing and convergence.

Figure A.10 Trace plot of K in the Eurobarometer survey data analysis.

Figure A.11 Trace plot of $\omega $ in the Eurobarometer survey data analysis.
Appendix A.4 Additional results from Section 5.4
Figure A.12 displays stacked bar charts of each NBA team's win–loss record across the 2023–2024 season. Figure A.13 shows a comparison between results obtained using the proposed Rank-Clustered BTL and a standard BTL model.

Figure A.12 Stacked bar charts of ranks received by each NBA team across the 2023–2024 season.
Note: Winning = rank 1; losing = rank 2.
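Under this encoding, each game is a two-object ranking in which the winner receives rank 1 and the loser rank 2, the form consumed by BTL-style models of pairwise comparisons. A minimal sketch of that bookkeeping follows; the team abbreviations and games in the example are made up, not season data.

```python
from collections import Counter


def games_to_rankings(games):
    """Convert (winner, loser) game outcomes into two-item rankings.

    Each game becomes an ordered list [winner, loser]: the winner is
    assigned rank 1 and the loser rank 2.
    """
    return [[winner, loser] for winner, loser in games]


def rank_counts(rankings):
    """Tally how often each team receives each rank level."""
    counts = Counter()
    for ranking in rankings:
        for rank, team in enumerate(ranking, start=1):
            counts[(team, rank)] += 1
    return counts
```

The `rank_counts` tally is exactly the quantity summarized by a stacked bar chart of ranks received per team.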

Figure A.13 Comparison of results among comparator methods for the 2023–2024 NBA season analysis.
Table A.4 contains posterior predictive p-values based on the discrepancy measure defined in Section 3.2.2. A p-value is calculated for both a standard BTL distribution and a Rank-Clustered BTL distribution fit to the observed data. All statistics are well above a standard 0.05 threshold, indicating acceptable fit to the observed data. We recall that posterior predictive p-values are used as tools to assess potential model misfit, not to compare or choose among models (Gelman et al., Reference Gelman, Carlin, Stern, Dunson, Vehtari and Rubin2013, p. 150).
Figures A.14 and A.15 contain trace plots for K and $\omega $ after burn-in for each chain. We find that the trace plots demonstrate satisfactory mixing and convergence.
Table A.4 Posterior predictive p-values based on a standard BTL and Rank-Clustered BTL (RC-BTL) to assess goodness-of-fit in the 2023–2024 NBA season analysis


Figure A.14 Trace plot of K in the 2023–2024 NBA season analysis.

Figure A.15 Trace plot of $\omega $ in the 2023–2024 NBA season analysis.