Open-ended cumulative cultural evolution of Hollywood film crews

Are there large-scale trends in art history that surpass individual creativity or relatively short artistic movements? Many theories describe art history as a process similar to a change of fashions, while others suggest that art can be progressive - getting better, in some sense, over time. We approach this question anew with the theory of cumulative cultural evolution, which describes cultural accomplishments in terms of innovations that are maintained across generations and accumulated to support ever greater creative potential. In this paper, we empirically test the possibility for cumulative evolution in the techniques used to make an artistic product. Specifically, we measure the size and structure of the production crews in American films in 1910-2010 based on a dataset of 1000 popular films across the century. We find that film crews become exponentially more complex, with a growing set of core jobs, and more innovative in creating new jobs in filmmaking. Our study shows that art history can be cumulative, showing the progressive maintenance of innovative techniques, and thus providing an alternative to the widespread view of art history as a mere fluctuation of trends and fashions.

• S1: Data collection. Explains how the data was collected.
• S2: Data reliability. Describes the information on data reliability included in the dataset.
• S3: Data harmonization. Explains the transformations made to the dataset for the analysis.
• S4: Markers of hierarchical order. Explains how the markers of hierarchical order were detected in job titles.
• S5: Associations between the variables. Describes the associations between the measured variables.
• S6: GAM details. Describes the GAMs reported in the paper in detail.
• S7: GAMs on markers of hierarchical order. Shows the contributions of different types of markers to the trends reported in the paper.
• S8: Checking against chance similarities. Describes an additional check made to determine whether the found accumulation of jobs could be explained by just increasing film crew sizes.
• S9: Diffusion curves in detail. Shows the diffusion curves by decade of origin.
• S10: Models of innovation space exploration
• Required libraries
• References

The data and code to reproduce the analysis and figures, both in the paper and in the supplement, are available at an Open Science Framework electronic repository here: https://osf.io/6ysda/

S1: Data collection
We collected the data on movie ratings used to form the sample from the downloadable IMDb datasets of basic movie information (retrieved April 14, 2019). The information on data completeness, also used to build the sample, came from IMDb's previously published datasets (retrieved September 9, 2017), which were no longer updated after November 7, 2017. The information on the film crews was manually collected for the 1,000 films in the sample from the "Full cast and crew" listings on the IMDb movie website on April 14, 2019. The markers of data completeness within the sample were also updated at that time for the analysis.

S2: Data reliability
Data on film crews on IMDb is marked if it is "expected to be complete" or "verified as complete". In our sample, 692 film crews had one of these markings, while 308 were not marked (see Fig. S1). 138 of the unmarked crews belonged to the 1910s and 1920s, where definite confirmation can be difficult to come by. 66 of the unmarked film crews belonged to the 2000s, where current business interests could make the data difficult to confirm. Since our sample was based on the most popular films, we expect the data to be mostly reliable even for the unmarked crews. In order to make sure that the data did not differ to a substantial degree, we applied different weights to the data points based on their expected completeness when building models (see Section S6).
Fig. S1. The quality of data in the sample. The production crew of each film bore the mark of Confirmed to be complete, Expected to be complete or no mark. The stacked barplot shows the number of films in each category by decade (x-axis: Decade, 1910 to 2000; y-axis: Films; legend "Status": Undetermined, Believed to be complete, Verified as complete).

S3: Data harmonization
Data harmonization proceeded in an iterative fashion, finding common substitutions and formatting principles behind the job titles. We removed the specifics of the jobs that were typically stored in the brackets after the job title, or following a colon (e.g., "line producer: Hong Kong", or "hair stylist: Mr. De Niro", "special makeup effects crew: Cantina sequence"). This text was used to assess whether the job should be associated with the initial release, and was removed for the analysis.
A few frequently occurring spelling variants of the same job were manually harmonized. E.g., "r&d", "R&D", "r&d;", "research & development", "research&development"; "make-up", "make up", "makeup"; "roto", "rotoscope", "rotoscoping", "roto-scope" were all replaced within the job title with a single common denominator: in these cases, "r_d", "make-up", and "roto". This was done on the basis of a manual comparison of the 11,350 unique jobs in the dataset that remained after the removal of the additional information, with a focus on more prevalent jobs.
Some of the entries contained more than one job on one line. In this case, they were split into several jobs. When a job title contained a slash separated by spaces or "and/or" (e.g., "helicopter pilot and/or camera operator"), it was split into several jobs. When a job title contained an "and", "&", or a slash separating two substrings (e.g., "painter/decorator gang boss", "giggles/howls/marmots", "hair & wig adviser; except with r&d") the parts of the job were conjoined to create several composite jobs (resulting in, e.g., "painter gang boss / decorator gang boss", "giggles / howls / marmots", "hair adviser / wig adviser"). These were then split into separate jobs for the analyses. The precise operationalization is available in the code shared with the paper.
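The harmonization and splitting steps described in this section can be illustrated with a minimal Python sketch. It is not the operationalization used in the shared code: the function names and regular expressions are hypothetical, and the sketch assumes that each alternative before the shared part of a composite title is a single word (as in "painter/decorator gang boss").

```python
import re

# Illustrative patterns only (assumptions); they cover the spelling variants
# quoted in the text, not the full manual harmonization.
SPELLING_VARIANTS = [
    (re.compile(r"\br\s*&\s*d\b;?|\bresearch\s*&\s*development\b"), "r_d"),
    (re.compile(r"\bmake[ -]?up\b"), "make-up"),
    (re.compile(r"\broto(-?scop(e|ing))?\b"), "roto"),
]


def strip_details(title: str) -> str:
    """Drop specifics stored in brackets or after a colon."""
    title = re.sub(r"\([^)]*\)", "", title)  # bracketed notes
    title = title.split(":", 1)[0]           # text following a colon
    return " ".join(title.split())           # normalize whitespace


def harmonize(title: str) -> str:
    """Lowercase, strip details, and unify known spelling variants."""
    title = strip_details(title).lower()
    for pattern, replacement in SPELLING_VARIANTS:
        title = pattern.sub(replacement, title)
    return title


def split_jobs(title: str) -> list[str]:
    """Split one harmonized entry into separate jobs."""
    # Case 1: a slash surrounded by spaces, or "and/or", separates whole jobs.
    whole = re.split(r"\s+/\s+|\s+and/or\s+", title)
    if len(whole) > 1:
        return [part.strip() for part in whole]
    # Case 2: "and", "&" or an unspaced slash joins alternatives that share
    # the words following the last alternative.
    parts = re.split(r"\s+and\s+|\s*&\s*|/", title)
    if len(parts) == 1:
        return [title]
    *heads, last = parts
    first_word, _, tail = last.partition(" ")
    heads.append(first_word)
    return [f"{head} {tail}".strip() for head in heads]
```

For example, `split_jobs(harmonize("painter/decorator gang boss"))` yields the two composite jobs "painter gang boss" and "decorator gang boss". Harmonizing before splitting matters here, because "r&d" would otherwise be split at the ampersand.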

S4: Hierarchical order of jobs
The hierarchical order of the jobs was assessed by keywords that were associated with particular roles in the hierarchy. The following substrings were searched for within the job titles.

S5: Associations between the variables
Due to the shared basis in historical trends, the measured variables are highly intercorrelated in films (see Fig. S2, top right, above the self-correlation diagonal). In order to assess their relative independence within each year, we fit a mixed effects linear model on each of the predictors with the year of release as a random effect and measured the amount of variation explained by the other variable in each predictor pair (Fig. S2, bottom left, below the self-correlation diagonal).
We find that the number of people, the number of distinct jobs, and the number of total jobs have an association > .95 even when controlling for year. While these measures overlap highly, we considered them separately in the analysis for conceptual clarity. We believe that the number of people or jobs should be intuitively more understandable than an aggregate value that would combine all three measures. Job title length also has a mild positive association with them, indicating a trend for films with more jobs to also have, on average, somewhat longer titles. Note that the linear relationship between the log-transformed variables also reflects Heaps' law, known from information retrieval, which holds for various counts of types distributed within collections (Heaps 1978). Following this distribution, rare variants will be difficult to catch even in large samples of the collection, while common variants will be found already within small samples, indicating that at least for the core jobs the coverage will be good.
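For reference, Heaps' law in its usual form relates the number of distinct types V to the size n of the collection, and taking logarithms gives the linear relationship between log-transformed counts. The symbols V, n, K and \beta are notation introduced here for illustration; reading V as the number of distinct jobs and n as the total number of jobs in a crew is our interpretation of the passage above.

```latex
V(n) = K\, n^{\beta}, \qquad 0 < \beta < 1
\quad\Longrightarrow\quad
\log V(n) = \log K + \beta \log n
```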
The proportion of repeated components does have a strong correlation with the number of people and jobs associated with the film (variation explained ~ .63), with larger film crews offering more chance for repetitive elements to be used, while the correlation is lower for the number of unique jobs. There is a strong positive association between the repeated jobs and repeated job components, as they are intrinsically linked: repetition of whole jobs will necessarily repeat their job components. However, the proportion of repeated jobs is not associated with other film crew measures (variation explained < .20), indicating that the repetition of elements is not a direct result of the increased crew size. For other variables, there is also a moderate association between the proportion of jobs with no hierarchical markers and the job title length (variation explained .35), while other predicted variables are fairly independent of each other (variation explained < .20). The first association would mean that films with more hierarchical markers had a mild tendency towards longer job titles on average. Generally, the variables are independent enough of each other to merit independent investigation.

S6: GAM details
We fit a Generalized Additive Model (GAM) to each of the measured variables to estimate their trends as a smooth function of their year of production. We used an adaptive smooth estimated with restricted maximum likelihood (REML), which treats the smooth as a random effect, to obtain a smooth that neither overfits nor underfits the data. We tried several basis dimension setups, and in the final models we used 15 basis dimensions across all models and 5 smoothing parameters for the adaptive smooth. The results provided a good fit with the data and fulfilled the model assumptions.
In order to incorporate the information on data reliability, we varied the model weights based on the degree of confidence in the data points. We used three different parameter sets for this. In the naive model, we weighted all data points equally. In the balanced model, we gave 10% less weight to non-confirmed data points and 25% less weight to data points with no information on accuracy. In the conservative model, we gave 50% less weight to non-confirmed data points and excluded the data points with no information on accuracy.
The exact weights are given in Supplement Table S1 below. Altogether, 7 variables were modelled for the patterns of growth over time. An overview is given in Table S2.
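As a rough illustration of the three weighting schemes, the sketch below encodes the weights as they can be read off the prose in this section ("10% less" as 0.90, and so on). The status labels and function names are hypothetical, and mapping "non-confirmed" to crews marked as expected (but not verified) to be complete is our assumption; Table S1 remains the authoritative source for the exact values.

```python
# Weights read off the prose of this section (assumption: "non-confirmed"
# means marked as expected, but not verified, to be complete).
# The status labels "verified", "expected", "unmarked" are hypothetical.
WEIGHTS = {
    "naive":        {"verified": 1.0, "expected": 1.0,  "unmarked": 1.0},
    "balanced":     {"verified": 1.0, "expected": 0.90, "unmarked": 0.75},
    "conservative": {"verified": 1.0, "expected": 0.50, "unmarked": 0.0},
}


def film_weights(statuses, scheme="balanced"):
    """Return one model weight per film for the chosen scheme."""
    table = WEIGHTS[scheme]
    return [table[status] for status in statuses]


def weighted_mean(values, weights):
    """Weighted mean; films with weight 0 are effectively excluded."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total
```

A weight of 0 in the conservative scheme has the same effect as dropping the film, which is how the exclusion of unmarked crews can be expressed within a single weighted model.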

Checking model assumptions
The model checks are given for each model in Table S3. In building the models, we iteratively increased the number of basis functions (k) until the effective degrees of freedom (edf) were relatively stable and sufficiently below k. Our sampling was optimized for a coarse view of 100 years in 10 decades; however, with approximately 10 films per year (SD = 4.3), it seems reasonable to include year as a continuous predictor in the model. We found 15 basis functions to provide a good model for the data. A higher k, e.g., k = 30, complicated the smooth function slightly (e.g., edf = 6.33 → 6.91 when increasing k = 15 → 30 for the number of people in the crew), but the predicted means were not easily distinguishable from those at k = 15 (see Fig. S3). Since a larger k noticeably widened the credible intervals, because more possible curves were included without much change to the mean estimate, we report the models with k = 15 in the paper. The models with k = 30 provide very similar results, with slightly shorter periods of significant growth due to the wider credible intervals.
The balanced and naive models give similar curves with similar effective degrees of freedom, while the conservative models provide slightly less wiggly trends due to the exclusion of many data points in the first two decades and in the last decade. The diagnostic plots for model assumptions are given in separate files in the electronic repository under figures/model.checks/: gam_checks_bal.pdf (balanced models), gam_checks_con.pdf (conservative models), gam_checks_nai.pdf (naive models). The models provide a reasonably good fit to the data.

Model summaries
Each model showed a significant effect of the smoothed year across the period, demonstrating a pattern of growth for all response variables over the period.
The main results are given in Table S4 below. The deviance explained was very high (>0.8) for the balanced and naive models on film crew size, relatively high (>0.7) for job title length and repeated job title components, medium (~0.6) for the proportion of hierarchical jobs, and low (~0.3) for repeated whole job titles. The deviance explained was similar, but slightly lower, in the conservative models, which excluded almost one third of the data points.
In order to find the periods of significant change, we estimated the derivatives of the fitted spline with the method of finite differences: we took the predictions of estimated means at two close time points and calculated the difference between them. We did this for 300 points across the observed 100-year period. Based on this, we simulated 200 model fits and estimated the 95% simultaneous credible interval of the smooths.
The periods of significant change are the time periods where the simultaneous credible interval on the first derivative does not include zero. These intervals were obtained by simulation from the posterior distribution of the first derivative. A 95% confidence interval here contains in its entirety 95% of all random draws from the posterior distribution.
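The derivative-based procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used for the paper: the function names are hypothetical, the simulated derivative curves are passed in by the caller rather than drawn from a fitted GAM posterior, and the simultaneous interval is built with a common construction (scaling the pointwise standard deviation by a critical value taken from the largest standardized deviation of each simulated curve).

```python
import math
import statistics


def finite_difference(predict, xs, eps=1e-4):
    """Forward-difference estimate of the first derivative of predict at xs."""
    return [(predict(x + eps) - predict(x)) / eps for x in xs]


def simultaneous_band(curves, level=0.95):
    """Simultaneous interval from simulated derivative curves.

    curves: list of equally long lists, one simulated curve per entry.
    Assumes the curves vary at every point (non-zero standard deviation).
    Returns (lower, upper), each a list with one value per evaluation point.
    """
    n_points = len(curves[0])
    means = [statistics.fmean([c[i] for c in curves]) for i in range(n_points)]
    sds = [statistics.stdev([c[i] for c in curves]) for i in range(n_points)]
    # For each curve, its largest standardized deviation from the mean curve.
    max_dev = sorted(
        max(abs(c[i] - means[i]) / sds[i] for i in range(n_points))
        for c in curves
    )
    # Critical value such that about `level` of the curves stay inside the band.
    crit = max_dev[max(0, math.ceil(level * len(max_dev)) - 1)]
    lower = [m - crit * s for m, s in zip(means, sds)]
    upper = [m + crit * s for m, s in zip(means, sds)]
    return lower, upper


def growth_points(lower):
    """Flag evaluation points where the whole interval lies above zero."""
    return [lo > 0 for lo in lower]
```

With 300 evaluation points and 200 simulated curves, as in the text, `growth_points` marks the years that would be shaded as periods of significant growth; an analogous check on the upper bound would flag significant decrease.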
The results of the models are provided below (Fig. S3) with the comparison of the three sets of models. On the left is the model fit along with the data and confidence intervals, with the periods of growth shaded blue. On the right are the estimated first derivatives of that fit across the time period. Whenever the interval on the first derivative excluded 0 and lay above it, the period was marked as a period of significant growth. At no point did any of the measures show a significant trend of decrease.

Comparisons
The three different ways to include information on data completeness give very similar results for almost all parameters. Due to the removal of many data points in the first two decades, the conservative models differ in their estimates of hierarchical order, job title length, job reuse and job component reuse, offering predicted means closer to those of the subsequent decades. For job reuse and job component reuse, the first growth period, situated around the 1920s, would then disappear. For markers of hierarchical order, this extends the trend slightly to the beginning of the period, and for job title length, this slightly shortens the trend. However, these differences arise because most data points in the 1910s were not marked for data completeness. We therefore follow the models that rely on all data points, as better data may be difficult to come by and it is reasonable to expect that the data on the most popular films is still fairly reliable. For the later periods, including the 2000s, the models offer mostly the same predictions. In the paper, the results of the balanced model are reported.
Fig. S4a. GAM results on the number of people per film (n = 1,000), comparing the balanced, conservative, and naive models. The predicted values with 95% confidence interval are on the left, the first derivative on the right (x-axis: Year). The blue shaded areas on the left are periods when the first derivative was significantly different from 0, with 95% confidence interval. Top row: balanced; middle row: conservative; bottom row: naive models.

S7: GAMs on markers of hierarchical order
The paper reported a non-linear trend of increase in the proportion of jobs with the markers of hierarchical order. This trend in turn combines the trends of the three types of jobs: superordinate, equal, and subordinate. In order to better understand the trend, we fit a GAM on each of them with the same model parameters as the model of hierarchical jobs, with balanced weights on the data points (see Fig. S4 for results). We found that this trend consisted of a gradual linear increase of superordinate jobs until the 2000s and a punctuated growth of subordinate jobs, with a slower increase until the 1940s and a quicker increase from 1964 to 1987. The proportion of associate jobs remained quite low throughout the period.

Checking model assumptions
The model checks are given for each model in Table S5. The number of basis dimensions was chosen as 15, same as in other models. The diagnostic plots for model assumptions are given in a separate file in the electronic repository under figures/model.checks/: gam_checks_bal_hier.pdf. The models provide a reasonable fit to the data.

Model summaries
Each model showed a significant effect of the smooth function of the year across the period, demonstrating a pattern of growth for all response variables over the observed period (Fig. S5). The deviance explained by the model was high for superordinate jobs (.60), subordinate jobs (.52) and all the markers of hierarchical order together (.65). The proportion of associate jobs was not well predicted by the smooth of the year (deviance explained .10). The main results are given in Table S6 below.
We estimated the derivatives in the same way as with other models. The results of the models are provided below (Fig. S5). On the left is the model fit along with the data and confidence intervals, with the periods of growth shaded blue. On the right are the estimated first derivatives of that fit across the time period. Whenever the interval on the first derivative excluded 0 and lay above it, the period was marked as a period of significant growth. At no point did any of the measures show a significant trend of decrease.
Fig. S5. GAM results on the different types of markers of hierarchical order (n = 1,000). The predicted values with 95% confidence interval are on the left, the first derivative on the right (x-axis: Year, 1910 to 2010). The blue shaded areas on the left are periods when the first derivative was significantly different from 0, with 95% confidence interval. From top to bottom the measures depicted are 1) Jobs with superordinate markers; 2) Jobs with subordinate markers; 3) Jobs with neutral markers; 4) Unmarked jobs over time.

S8: Checking against chance similarities
In order to check that the growth of the number of central jobs is not simply due to bigger film crews increasing the number of chance similarities, we created 1,000 random permutations of the dataset where each film was given a random subset of unique jobs present in that decade that matched the number of unique jobs in that film. In both our sample and the random datasets the distribution of jobs is power-law-like: most jobs are relatively rare between films, while few jobs are present in many films. In all cases, fewer than 15% of the jobs are present in more than 15% of films. Fig. S6a plots the cumulative distributions of jobs of our data and one random sample across the decades. The slope of the distribution is much steeper for the generated dataset. In the random dataset, there are very few jobs in more than 15% of films per decade, while in our data more than 5% of jobs are in at least 15% of films in all decades. Over time, the distribution becomes flatter and smoother as more unique jobs are added in both datasets, e.g., in the 1910s, 5% of jobs were in 30% or more films, while in the 2000s, only 2.5% of them were in 30% or more films in our sample. Fig. S6b plots the mean number of jobs in more than 10% of the films in each decade for all 1,000 generated datasets along with the data from our sample. Because the number of unique jobs in each decade increases along with the film crew sizes, the number of jobs shared by chance between many films stays roughly the same throughout the period and, except for the first decade, is decisively lower than in our sample (M_MEAN = 12.7, SE_MEAN = 6.1, M_SD = 3.0, SE_SD = 0.7 between decades). Across the samples, there were only a few occasions of jobs shared by more than 20% of films, and it was never more than one job, always in the 1910s. This indicates that the growth of the number of jobs shared between films is indeed due to an accumulation of innovations that become preferentially used between films.
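A minimal Python sketch of this permutation check is given below. It is an illustration rather than the shared analysis code: the function names and the input layout (a list of decade and job-set pairs) are hypothetical, and drawing the jobs uniformly at random from the decade's pool of unique jobs is our reading of "a random subset of unique jobs present in that decade".

```python
import random


def random_crews(films, rng):
    """Return one randomized dataset.

    films: list of (decade, set_of_unique_jobs) pairs.
    Each film keeps its decade and its number of unique jobs, but the jobs
    are drawn uniformly (assumption) from all unique jobs of that decade.
    """
    pool = {}
    for decade, jobs in films:
        pool.setdefault(decade, set()).update(jobs)
    pools = {decade: sorted(jobs) for decade, jobs in pool.items()}
    return [
        (decade, set(rng.sample(pools[decade], len(jobs))))
        for decade, jobs in films
    ]


def shared_jobs(films, threshold=0.10):
    """Per decade, count the jobs present in more than `threshold` of films."""
    n_films, counts = {}, {}
    for decade, jobs in films:
        n_films[decade] = n_films.get(decade, 0) + 1
        per_job = counts.setdefault(decade, {})
        for job in jobs:
            per_job[job] = per_job.get(job, 0) + 1
    return {
        decade: sum(1 for c in per_job.values() if c > threshold * n_films[decade])
        for decade, per_job in counts.items()
    }
```

Calling `shared_jobs(random_crews(films, random.Random(seed)))` for 1,000 different seeds and comparing the per-decade counts with `shared_jobs(films)` on the observed data reproduces the logic of the comparison described above.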
Fig. S6. a) The cumulative distributions of jobs across the decades. The dataset studied is shown with yellow background and one generated dataset is shown with purple background. Grey lines show the mean across decades, selected decades are shown as distinct lines. b) The number of jobs shared by more than 10% of films in the decade. The red line shows our sample across decades, the blue line shows the generated 1,000 datasets with error bars at ± 2SD from the mean.

S9: Diffusion curves in detail
Fig. S7 shows the adoption curves of the jobs that were in at least 20% of films in the 2000s, separately for each decade of first occurrence. The figure shows that apart from a few jobs from the very first decade that reached very high popularity right away, the growth in popularity for most of the jobs was gradual.

S10: Models of innovation space exploration
To test the association of the number of inventions with the variety originating from earlier generations, we counted all the distinct jobs per decade and divided them into two parts: ones invented in the same decade and ones reused from earlier decades. We log-transformed these values to reduce the influence of larger values and fit a linear regression model to estimate the association between the two. Through model criticism and selection, we added the total growth rate compared to the previous decade as an additional predictor, which significantly improved the model. See the code attached in the SI Appendix for details. The model provided a good fit to the data and explained 96% of the variation (beta_variety = 0.87, 95% CI [0.71, 1.04]; beta_growth = 0.97, 95% CI [0.54, 1.39]; F(2, 6) = 103.9, p < 0.01, R^2 = 0.96). The final formula was the following.
log_new_jobs ~ log_old_jobs + perc_increase
The model diagnostic plots are given in a separate file under figures/model.checks/: lm_checks.pdf.
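The counting step behind the regression variables can be sketched as follows. The function name and input layout are hypothetical, the natural logarithm is an assumption, and so is the definition of the growth rate as the relative change in the number of distinct jobs from the previous decade. The first decade is skipped because it has no earlier decade to reuse from or compare with.

```python
import math


def new_and_reused(jobs_by_decade):
    """Build one row of regression variables per decade.

    jobs_by_decade: dict mapping decade -> set of distinct jobs seen in it.
    Raises ValueError if a decade has no new or no reused jobs, since the
    logarithm of zero is undefined.
    """
    seen = set()        # all jobs observed in earlier decades
    prev_total = None   # number of distinct jobs in the previous decade
    rows = []
    for decade in sorted(jobs_by_decade):
        jobs = jobs_by_decade[decade]
        new = jobs - seen       # invented in this decade
        reused = jobs & seen    # carried over from earlier decades
        if prev_total is not None:
            rows.append({
                "decade": decade,
                "log_new_jobs": math.log(len(new)),
                "log_old_jobs": math.log(len(reused)),
                # Assumed definition of the growth predictor.
                "perc_increase": len(jobs) / prev_total - 1,
            })
        seen |= jobs
        prev_total = len(jobs)
    return rows
```

The resulting rows use the same variable names as the formula above, so they could be passed to any linear regression routine.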
To test the same association within thematic groups and to estimate the influence of local variation on the generation of innovations, we grouped the data into thematic clusters based on shared words within the job title and constructed a mixed effects model to determine the relationship within thematic clusters. For this analysis, we only included the clusters that had at least 10 jobs, meaning that at least some exploration had been done in that innovation space. There were altogether 352 such thematic clusters (distinct jobs in cluster: median = 28, IQR = 16-56, range = 10-1266). Based on model criticism and selection, we found the best fit in adding both a random intercept and a random slope for each group to the ordinary least squares regression model specified above. We found the association to be strong (beta_variety = 0.77, 95% CI [0.72, 0.82]; beta_growth = 0.71, 95% CI [0.62, 0.80]; marginal R^2 = 0.57 and conditional R^2 = 0.70), demonstrating the close link between the variation already present in the population and the inventions produced in this area of culture. This eventually gave the following formula.