The shortlist effect: nestedness contributions as a tool to explain cultural success

Published online by Cambridge University Press: 08 November 2021

Olivier Morin*
Affiliation:
Minds and Traditions Research Group, Max Planck Institute for the Science of Human History. 10, Kahlaische strasse, 07745 Jena, Germany; Institut Jean Nicod, Département d'études cognitives, ENS, EHESS, CNRS, PSL University, UMR 8129. 29, rue d'Ulm, 75014 Paris, France
Oleg Sobchuk
Affiliation:
Minds and Traditions Research Group, Max Planck Institute for the Science of Human History. 10, Kahlaische strasse, 07745 Jena, Germany
*Corresponding author. E-mail: morin@shh.mpg.de

Abstract

Detecting the forces behind the success or failure of cultural products, such as books or films, remains a challenge. Three such forces are drift, context-biased selection and selection based on content – when things succeed because of their intrinsic appeal. We propose a tool to study content-biased selection in sets of cultural collections – e.g. libraries or movie collections – based on the ‘shortlist effect’: the fact that smaller collections are more selective and more likely to favour highly appealing items over others. We use a model to show that, when the shortlist effect is at work, content-biased cultural selection is associated with greater nestedness in sets of collections. Having established empirically the existence of the shortlist effect, and of content-biased selection, in 28 sets of movie collections, we show that nestedness contributions can be used to estimate to what extent specific movies owe their success to their intrinsic properties. This method can be used in a wide range of datasets to detect the items that owe their success to their intrinsic appeal, as opposed to ‘hidden gems’ or ‘accidental hits’.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

Figure 1. An illustration of the concepts of nestedness and nestedness contribution. (a, b) Two binary matrices representing items in collections. A box indicates that the relevant item (column) is present in the relevant collection (row). Matrix (a) has higher nestedness than matrix (b), because of the distribution of the coloured item. In (a), the blue item is present in all large collections above a certain threshold, and the matrix is perfectly nested: the items found in any small collection are a proper subset of the set of items found in any bigger collection. In (b), the green item, which has the same frequency as the blue item, is present in some, but not all, of the biggest collections, and also in some of the smallest. As a result, (b) is not perfectly nested: some of the bigger collections miss some of the items present in smaller ones. The blue item contributes more to the nestedness of (a) than the green item contributes to the nestedness of (b). (Inspired by Saavedra et al., 2011.)
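
The contrast described in this caption can be made concrete with a minimal Python sketch. It rests on several assumptions that are not taken from the article: nestedness is measured with an order-independent variant of NODF (pairs of rows or columns with equal totals score zero), the nestedness contribution is a z-score in the spirit of Saavedra et al. (2011), and the null model simply shuffles one item's column, which is a simplification. The two 4 × 4 matrices are hypothetical toy data, not the matrices drawn in the figure.

```python
import numpy as np
from itertools import combinations


def _paired_overlap(vectors):
    """Sum of NODF paired-overlap scores (0-100) over all pairs of vectors."""
    total = 0.0
    for a, b in combinations(vectors, 2):
        da, db = a.sum(), b.sum()
        # Equal totals (or an empty vector) contribute nothing in NODF.
        if da == db or min(da, db) == 0:
            continue
        small, big = (a, b) if da < db else (b, a)
        # Share of the smaller vector's presences also found in the bigger one.
        total += 100.0 * np.logical_and(small, big).sum() / small.sum()
    return total


def nodf(matrix):
    """NODF of a binary collections-by-items matrix (rows = collections)."""
    m = np.asarray(matrix, dtype=bool)
    n_rows, n_cols = m.shape
    n_pairs = n_rows * (n_rows - 1) / 2 + n_cols * (n_cols - 1) / 2
    return (_paired_overlap(list(m)) + _paired_overlap(list(m.T))) / n_pairs


def nestedness_contribution(matrix, item, n_null=200, seed=0):
    """z-score of observed NODF against NODF with one item's column shuffled.

    Assumption: a plain permutation of the column is used as the null model;
    it keeps the item's frequency but ignores collection sizes.
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(matrix, dtype=bool)
    observed = nodf(m)
    null = np.empty(n_null)
    for k in range(n_null):
        shuffled = m.copy()
        shuffled[:, item] = rng.permutation(m[:, item])
        null[k] = nodf(shuffled)
    sd = null.std()
    return (observed - null.mean()) / sd if sd > 0 else 0.0


# Hypothetical toy data: rows are collections, columns are items.
# The last item has frequency 1 in both matrices; only its location differs.
nested = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0]]
scrambled = [[1, 1, 1, 0],
             [1, 1, 1, 0],
             [1, 1, 0, 0],
             [1, 0, 0, 1]]

print(nodf(nested), nodf(scrambled))
print(nestedness_contribution(nested, item=3),
      nestedness_contribution(scrambled, item=3))
```

In this toy case the item placed in the largest collection yields a positive contribution, while the same item placed in a small collection yields a negative one, which mirrors the blue and green items of the figure.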

Figure 2. A model of cultural collections predicts that content-biased selection increases the nestedness of a set of collections. (a) Our model is built upon the adoption probability function, which models the probability that an item will be added to a collection, based on the collection's capacity to adopt items, the item's prevalence, and its appeal (all represented by values between 0 and 1). (b) We ran this equation for 11 items and 11 collections of varying capacity and prevalence, changing only the value of the correlation between appeal and prevalence. In the first simulation (top panel), appeal is constant and independent of prevalence (no content-biased selection). In the second simulation (bottom panel), appeal is identical to prevalence (perfect content-biased selection). The resulting pattern of adoption probabilities is consistent with high nestedness: the items in low-capacity collections are likely to be a subset of the items present in high-capacity collections. Low-prevalence items do not occur in low-capacity collections, and high-prevalence items are very likely to occur in high-capacity collections. Simulations were computed with e = s = 1. Changing the values of e or s does not change the qualitative pattern shown here (see Methods).

Table 1. Correlation between content-biased selection and nestedness: results of the simulations. Each cell shows the result of 5 × 30 simulations of a 200 × 200 binary matrix. The matrix was populated using the adoption probability function (equation 1). The values given to the parameters s and e in the equation change from cell to cell. Each of the 200 items was given a frequency k and an appeal a. The correlation between frequencies and appeals, representing content-biased selection, was 0, 0.25, 0.5, 0.75 or 1. We considered how the nestedness of the matrix was correlated with the value of content-biased selection. Each cell shows the correlation between nestedness (NODF values) and content-biased selection. The correlation was measured over five data points, each data point representing the average NODF measured over 30 simulations for one level of content-biased selection (out of five). Inside each cell, the figure to the left of the slash is the raw correlation and the figure to the right of the slash is the partial correlation controlling for matrix fill; both are rounded down. All cells where the partial correlation is >0.9 are in bold. A tight correlation between nestedness and content-biased selection holds whenever e is not much higher than s.
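
As an illustration of the two figures reported in each cell, the sketch below computes a raw correlation and a first-order partial correlation controlling for matrix fill over five data points. The function name `partial_corr` is hypothetical, and the `mean_nodf` and `mean_fill` values are made-up placeholders for illustration only; they are not results from the simulations.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z (first-order)."""
    r = np.corrcoef([x, y, z])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Five levels of content-biased selection, as in the table.
levels = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
# Placeholder values (NOT from the paper): mean NODF and mean matrix fill
# over the 30 simulations run at each level.
mean_nodf = np.array([40.0, 45.0, 52.0, 58.0, 66.0])
mean_fill = np.array([0.30, 0.31, 0.30, 0.32, 0.31])

raw = np.corrcoef(levels, mean_nodf)[0, 1]
partial = partial_corr(levels, mean_nodf, mean_fill)
print(f"{raw:.2f} / {partial:.2f}")
```

The partial correlation matters here because matrix fill can itself vary with the simulation parameters and NODF is sensitive to fill, so a raw correlation alone could overstate the link between nestedness and content-biased selection.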

Figure 3. The shortlist effect: highly rated movies are more likely to enter smaller collections. Results of the test of our prediction on the MovieLens (a) and Netflix (b) datasets. Genre subsets are listed on the y-axis (first column). Each subset was analysed three times, once for each of three different sources of ratings: MovieLens and IMDb for both datasets, with the addition of Metacritic ratings (for MovieLens data) and Netflix ratings (for Netflix data). The second column (‘n hits’) shows how many of the three tests verified the prediction. We considered a test to be a hit when adding an interaction term for appeal modulated by collection size made the model more informative (ΔAIC > 2), and when the sign of the interaction term was opposite to that of the estimated weight of appeal. The third column indicates how many individual movies the regression was computed on. The graph represents each model's estimate for the interaction term, modified as follows: if the main effect of appeal on adoption in a collection is negative, we plot the interaction term with its sign reversed (so that a positive value becomes negative); if the main effect of appeal on adoption in a collection is positive, the interaction term is plotted as it is. Effect sizes are typically much larger for the MovieLens dataset (top panel) than for the Netflix dataset, probably owing to the broader range of collections and movies (including large numbers of very small collections and highly unsuccessful movies) in the Netflix dataset. Error bars indicate 95% confidence intervals.
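
The hit criterion and the sign convention used for plotting can be written out as a short sketch. The function and argument names are hypothetical, and ΔAIC is assumed here to be the AIC of the model without the interaction term minus the AIC of the model with it, so that a positive value favours the interaction model.

```python
def is_hit(aic_without, aic_with, appeal_weight, interaction):
    """True when the interaction model is more informative (delta AIC > 2)
    and the interaction term has the opposite sign to the weight of appeal."""
    delta_aic = aic_without - aic_with  # assumed sign convention
    return delta_aic > 2 and appeal_weight * interaction < 0


def plotted_interaction(appeal_weight, interaction):
    """Value shown on the graph: the interaction term, with its sign
    reversed when the main effect of appeal is negative."""
    return -interaction if appeal_weight < 0 else interaction


# Hypothetical estimates, for illustration only.
print(is_hit(1000.0, 995.0, appeal_weight=0.8, interaction=-0.1))
print(plotted_interaction(appeal_weight=-0.5, interaction=0.2))
```

With this convention, every estimate that supports the shortlist effect ends up on the negative side of the graph, whatever the sign of the main effect of appeal.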

Figure 4. Drift scores negatively predict nestedness contributions. Estimates for the weight given to movies’ drift score in our nested regression model predicting movies’ nestedness contributions, for the MovieLens (a) and Netflix (b) datasets. Genre subsets are listed on the y-axis (first column). Each subset was analysed three times, once for each of three different sources of ratings: MovieLens and IMDb for both datasets, with the addition of Metacritic ratings (for MovieLens data) and Netflix ratings (for Netflix data). The second column (‘n hits’) counts how many of the three tests confirmed our hypothesis (‘hits’) or rejected it (‘misses’). We considered a test to be a hit when the model including drift scores was more informative than the baseline model (ΔAIC > 2) and estimated the weight of drift scores to be negative. Error bars indicate 95% confidence intervals.

Supplementary material

Morin and Sobchuk supplementary material: Tables S1-S6 and Figure S1 (File, 99.7 KB)

Morin and Sobchuk supplementary material (File, 14 KB)