Data Scraping YouTube for the Study of Lieder Reception

A growing body of literature has shifted aesthetic attention from composition to performance, or the performing activity, and asserts that the act of performance creates meaning.1 Scholars have emphasized differences between the passive consumption and active making of – or even listening to – music.2 As I sought to understand the impact of performance on Alma Mahler's legacy, I identified the need to gather as much data as possible on who, what, where, when, why, and how her songs were performed. This need led me to evaluate the metadata associated with recordings of Alma Mahler's songs in the WorldCat union catalogue and the video sharing platform YouTube. Recent studies have shown the utility of leveraging big data for musicology, although few scholars have done so to investigate reception history. This essay outlines one approach to data scraping YouTube with emphasis on the value to those researching recent Lieder reception, and in doing so highlights some of the promise and limitations associated with web scraping.

do not substitute for, but rather add value to traditional humanities approaches'.5 Franco Moretti was among the first literary scholars to recognize the opportunities afforded by leveraging big data; his theory of 'distant reading' elucidates networks and connections that can be drawn through computational analysis of large literary corpora.6 Musicologists have already begun to leverage literary and musical corpora and computational methods to conduct distant readings of diverse topics, but to date there have been few such studies relating to nineteenth-century music.7
Music scholars increasingly make use of bibliographic data. Several studies indicate the viability of library-generated metadata for musicological study.8 Heather Platt and Michelle Urberg independently wrote overviews of digital musicology projects; both highlighted relevant digital repositories, aggregators and projects, some of which made use of bibliographic data.9 Sandra Tuppen, Stephen Rose and Loukia Drosopoulou discuss ideas for analysing and visualizing bibliographic data for musical materials in their article, 'A Big Data History of Music'.10 Scholars also have looked to musical recordings for empirical studies.11 Eamonn Bell investigated the deep linking within YouTube videos that facilitates time-coded comments, and he is one of the few music scholars who have studied content on this platform.12 Studies of music on YouTube most frequently investigate popular music and do not concern nineteenth-century music. No studies that I can identify have automated the scraping of YouTube metadata to investigate a composer's reception history.
Data mining and social networking platforms, and not just the analytical techniques associated with both, are not unproblematic. Scholars have highlighted the challenges of mining data on social networking sites, noting the undue influence of corporations.13 Despite its limitations, data scraping is currently the most efficient way to collect data from online video-sharing platforms. Digital humanities approaches are still evolving, and best practices for interrogating these data have not yet been codified. Given the prominence of YouTube, investigating its data provides a more complete picture of how performers and listeners are engaging with Lieder composers in the twenty-first century.

Alma Mahler Recordings in WorldCat
I conducted an analysis of Alma Mahler recordings in WorldCat to establish some context for the data yielded from harvesting YouTube results. I ran an author search for 'Mahler, Alma' in WorldCat's Online Catalogue on 25 February 2022 and limited the results by format to sound recordings.14 I exported the resulting 271 records with unique OCLC numbers in a .CSV file and sorted the exported data in turn by 'publisher', which also includes date and publisher location, 'title', and 'author' in order to identify duplicates, irrelevant content, and songs catalogued individually.15 After excluding recordings that repackaged the same content in another format and removing results that did not include Mahler's Lieder, I identified 107 unique albums.
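The sorting and duplicate check described above could be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the workflow used for this study: the column names (`oclc_number`, `publisher`, `title`, `author`) and the sample rows are hypothetical stand-ins for whatever headers a catalogue export actually contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; a real export may label its fields differently.
SORT_KEYS = ("publisher", "title", "author")


def normalise(value):
    """Lower-case and collapse whitespace so near-identical strings compare equal."""
    return " ".join((value or "").casefold().split())


def load_sorted(csv_text):
    """Read exported records and sort by publisher, then title, then author."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: tuple(normalise(row.get(k)) for k in SORT_KEYS))


def candidate_duplicates(rows):
    """Group OCLC numbers that share a title and author, for manual review."""
    groups = defaultdict(list)
    for row in rows:
        key = (normalise(row.get("title")), normalise(row.get("author")))
        groups[key].append(row["oclc_number"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Invented sample rows, for illustration only.
SAMPLE = """oclc_number,publisher,title,author
1001,Label B 1999,Complete Songs,"Mahler, Alma"
1002,Label A 2004,Complete  songs,"Mahler, Alma"
1003,Label C 2011,Women Composers,Various
"""

records = load_sorted(SAMPLE)
duplicates = candidate_duplicates(records)
```

Grouping only flags candidates. Deciding whether two records are a repackaging of the same content or two distinct albums still requires reading the records.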
I then sorted the recordings chronologically and organized them thematically by content, excluding 2020-2022 recordings from the analysis. The number of recordings featuring Lieder by Alma Mahler grew from one in the 1970s to seven in the 1980s and 23 in the 1990s, then held at 38 in the 2000s and 36 in the 2010s, as shown in Figure 1.16 The metadata suggest that at least 30 recordings were presented as recitals within a conservatory or educational setting,17 and that mezzo-soprano and soprano voice types prevail among performances, but this information is not necessarily populated for each recording and cannot easily be confirmed.
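A tally by decade of this kind is simple to reproduce. The sketch below is illustrative only: `sample_years` is an invented list, while `REPORTED` restates the decade counts given above.

```python
from collections import Counter


def decade_label(year):
    """Map a year such as 1987 to the label '1980s'."""
    return f"{(year // 10) * 10}s"


def count_by_decade(years, first=1970, last=2019):
    """Count recordings per decade, ignoring years outside first..last."""
    return Counter(decade_label(y) for y in years if first <= y <= last)


# Invented publication years, for illustration only.
sample_years = [1978, 1984, 1989, 1995, 2003, 2007, 2015, 2021]
counts = count_by_decade(sample_years)

# Decade counts as reported in the text above.
REPORTED = {"1970s": 1, "1980s": 7, "1990s": 23, "2000s": 38, "2010s": 36}
reported_total = sum(REPORTED.values())  # 105 recordings from 1970 to 2019
```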
I sorted recordings into one of the following categories: 1) exclusively Alma Mahler Lieder, 2) Lieder by Mahler and her contemporaries, 3) anthologies featuring the works of female composers, and 4) assorted repertoire. As shown in Figure 2, eight recordings featured only the songs of Alma Mahler, and 27 focused on the music of Mahler and other contemporaries. Thirty-three recordings featured primarily or exclusively female composers. The remaining programmes did not focus exclusively on the work of Alma Mahler, her contemporaries, or female composers. The data by decade suggest that since the 2010s Mahler's music has been less frequently restricted to women-only recordings than it previously was.18
The WorldCat data demonstrate that Mahler's Lieder are recorded at an increasing rate and are less likely to be relegated to specialized programmes of women's music than previously. Numerous recordings programmed Alma Mahler's music along with that of Gustav Mahler, Alexander Zemlinsky, Arnold Schoenberg and other contemporaries. Various recordings include Mahler's work alongside
WorldCat data provide useful insights into some of the performers and performances of Alma Mahler's music. Lieder are frequently performed in festivals, recitals and educational venues, however, that do not necessarily result in a formally produced recording. I identified a need to collect and analyse data from a more casual source to which more performers have access: YouTube. The music available on YouTube and on other online streaming music or video platforms that host user-generated content is frequently not represented within the bibliographic universe of WorldCat. Collecting and then comparing WorldCat and YouTube data refines our understanding of who, what, where, when and why Lieder are performed in the twenty-first century.

Alma Mahler Recordings on YouTube
WorldCat is a library-led platform whose structured metadata and indexes allow for specialized searching and collocation of like materials. YouTube, however, is a commercial platform that is designed for ease of uploading and sharing, and not for precise searching. There is no author index in YouTube, and it is accordingly challenging to separate content about Alma Mahler from performances of her music. YouTube grows more rapidly than WorldCat and boasts more recorded music. YouTube contains, for example, many hundreds of recordings of Alma Mahler songs, and collecting the relevant metadata for these requires automated assistance.
I employed Data Miner, a screen scraping plugin for the Chrome web browser that uses automated processes to harvest content, metadata, and other technical information from webpages. Data Miner is one of numerous software applications for scraping data from websites; those with programming skills need not rely on an out-of-the-box solution. Data Miner can be programmed to identify and capture desired data points and to crawl specified webpages, both of which facilitated the automated capturing of data about recordings of Mahler Lieder uploaded to YouTube.19 Before I could program Data Miner by creating what the platform calls 'recipes', however, I had to investigate how YouTube's relevancy ranking worked with Mahler's songs. In order to achieve the most relevant results I searched Alma Mahler and the song title, both in quotation marks.20 Although refining my search queries helped with the relevance of search results, it also led to the omission of several valid performances of her work. Because all metadata is user-generated, the title and descriptive information are rife with typographical errors. I found recordings of 'Laue Sommernacht' labelled as 'Blave Sommernacht', for example, and I also found recordings featuring groups of songs that did not feature individual song titles in the scraped text. Although searching by song title yielded the most relevant results, I ultimately decided to search 'Alma Mahler' and 'Alma Schindler'. Because YouTube is a commercial platform, promoted results for other classical music videos were frequently included in groups as recommendations within the results for the searched content. These results could not be excluded at the search stage, and I had to delete them manually.
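The typographical variation in user-generated titles noted above is one place where approximate string matching may help after scraping. The following is a minimal sketch using Python's standard `difflib` module. It was not part of the workflow described here; the title list, the 0.8 cutoff, and the helper name `match_song` are assumptions chosen for illustration.

```python
import difflib
import re

# Illustrative list of Alma Mahler song titles; extend as needed.
SONG_TITLES = [
    "Die stille Stadt",
    "In meines Vaters Garten",
    "Laue Sommernacht",
    "Bei dir ist es traut",
    "Ich wandle unter Blumen",
]


def match_song(video_title, titles=SONG_TITLES, cutoff=0.8):
    """Return the best-matching song title, or None if nothing is close enough.

    Each known title is compared against every run of consecutive words of
    the same length in the video title, so a misspelt song name can still be
    found inside a longer string.
    """
    words = re.findall(r"\w+", video_title.lower())
    best_title, best_score = None, 0.0
    for title in titles:
        target = title.lower()
        n = len(target.split())
        for i in range(max(1, len(words) - n + 1)):
            window = " ".join(words[i:i + n])
            score = difflib.SequenceMatcher(None, window, target).ratio()
            if score > best_score:
                best_title, best_score = title, score
    return best_title if best_score >= cutoff else None


found = match_song("Alma Mahler - Blave Sommernacht")  # matches 'Laue Sommernacht'
```

A lower cutoff catches more misspellings but also more false matches, so flagged titles would still need manual review.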
Data Miner recipes specify which elements from the webpage will be scraped and exported. Figure 3 shows a page of YouTube results for 'Alma Mahler' and the first part of the recipe creation process, in which the area of the results to be included is specified. After that is established, the particular data points of interest are configured. In this case, I wanted to collect URL, title, number of views, date posted, channel, and description.21 After scraping data from all 'Alma Mahler' and 'Alma Schindler' results on 7 March 2022, I exported the URL, title, channel, number of views, date posted, and description into an Excel spreadsheet. I deduplicated the data based on URL, because the title, channel, and other data points were not necessarily unique. At this point, I reviewed the results individually to ensure that all were relevant. Of the 675 unique YouTube URLs, only 410 included performances of Alma Mahler's songs, rather than discussions of her life, performances of Gustav Mahler's music, or promoted content. With a unique list of URLs, I began the second part of the data scraping process.
With this data in hand, I could consider commonalities and differences among these recorded performances and compare them to the WorldCat data. Although some of the recordings on YouTube replicated formally published content catalogued in WorldCat, many others were not commercial recordings. The diversity of voices, perspectives, and individuals included on YouTube is not commonly found in WorldCat.22 Some performances were recorded in what appeared to be individuals' homes, churches, or practice rooms, and not in recording studios or concert halls.
The following sections outline some of the questions that YouTube data allow researchers to ask and answer. As with any platform that hosts user-generated content and metadata, there are plenty of caveats. Despite its limitations, however, the number and variety of recordings available in YouTube recommend it to those interested in understanding the performance of any musical genre in the twenty-first century.
Dates are treated differently in YouTube than in WorldCat. The date captured during data scraping is the date a video is uploaded. This date supersedes any other dates included in the description or other textual fields such as title. Because videos may be uploaded at any point after an event, for example, we cannot be certain that the recitals or concerts from which Mahler songs are excerpted and posted correlate to the YouTube date. WorldCat dates are more standardized, even if they also do not prioritize the performance date in the case of live performances. The date used in WorldCat is most often a date of publication or copyright.
Those recording and sharing Alma Mahler's music on YouTube sometimes perform transcriptions and arrangements, and not only the work in its original scoring. The majority of videos feature a single singer and pianist, though several recordings feature a singer with orchestral accompaniment, many of which utilize the orchestration by David and Colin Matthews.25 Performances by choral groups singing transcriptions of Mahler songs are included among the results. Instrumental arrangements for cello quartet, wind ensemble, solo piano, and even solo cornet are included.26 A few recordings featured tracks that had been modified or remixed using electronic means.27 Pianists not only performed solo transcriptions of Mahler's works, but also offered piano accompaniment tracks that singers might use to help learn Mahler's songs and prepare for performance.
My analysis of the data included listening briefly to recordings to verify voice type and to add it when missing. Alma Mahler's songs in YouTube are most frequently performed by singers with soprano, mezzo-soprano, or alto voice types. Of the records, 141 featured mezzo-sopranos, 121 featured sopranos, and 98 included other or unspecified female voices. This is in stark contrast to the eight records for baritone, seven for tenor, and two for other male voices. Many of the recordings did not list voice type in the available metadata, which is perhaps a limitation of this platform for the study of Lieder.
The professional level of performers is challenging to investigate with the data provided. Thirty videos had the terms college, university, or conservatory in the record metadata, and many of these further specified that the performance had been part of a degree recital.
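Two of the steps described above, deduplicating scraped rows by URL and checking metadata for the terms college, university, or conservatory, lend themselves to a short script. This is a sketch only, and it does not reproduce the spreadsheet workflow used here: the field names (`url`, `title`, `channel`, `description`) and the sample rows are invented for illustration.

```python
import re

# The three terms named in the text; variants such as 'Conservatoire'
# are deliberately not covered by this pattern.
EDU_TERMS = re.compile(r"\b(college|university|conservatory)\b", re.IGNORECASE)


def dedupe_by_url(rows):
    """Keep the first row seen for each URL, preserving the original order."""
    seen, unique = set(), []
    for row in rows:
        url = row["url"].strip()
        if url not in seen:
            seen.add(url)
            unique.append(row)
    return unique


def mentions_education(row):
    """True if any of the educational terms appears in the row's text fields."""
    text = " ".join(row.get(field, "") for field in ("title", "channel", "description"))
    return bool(EDU_TERMS.search(text))


# Invented rows with placeholder URLs, for illustration only.
scraped = [
    {"url": "https://www.youtube.com/watch?v=EXAMPLE1", "title": "Laue Sommernacht",
     "channel": "Example Channel", "description": "Senior recital, Example University"},
    {"url": "https://www.youtube.com/watch?v=EXAMPLE1", "title": "Laue Sommernacht (HD)",
     "channel": "Example Channel", "description": ""},
    {"url": "https://www.youtube.com/watch?v=EXAMPLE2", "title": "Die stille Stadt",
     "channel": "Another Channel", "description": "Live at a festival"},
]

unique_rows = dedupe_by_url(scraped)
educational = [row for row in unique_rows if mentions_education(row)]
```

A keyword match of this kind only suggests an educational setting. As with the relevance review described above, flagged rows would still need to be checked by hand.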
Performances of Alma Mahler songs were uploaded to 224 unique channels. These channels may represent fans uploading the content of others, performers uploading their own content, and official channels uploading content on behalf of professional artists and ensembles. Content is uploaded by festivals, such as Aspect Chamber Music Series; singing contests, such as Redwood Empire Chapter of NATS; and ensembles, whether professional or not, such as SWR Vokalensemble Stuttgart. A few of these channels seem to have considerable user engagement as evidenced by video views, likes, and dislikes. Of the total 553,317 views of Alma Mahler songs in this sample, for example, 290,189, or over half of all views, came from only six channels:
• 112,281 NPR Music across 1 video
• 49,859 AllaBreve3 across 7 videos
• 45,233 Singer Joy across 1 video
• 28,119 vozbiala across 2 videos
• 27,551 London Review of Books across 1 video
• 27,146 Wellesz Theatre across 1 video
A high number of views may suggest that these channels have a large audience base, that they are providing unique content, that their videos have been added to automated playlists, that their videos rise to the top of the relevancy ranking, whether intentional or not, or that listeners like and engage with these recordings differently.
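The concentration of views can be checked directly from the figures listed above. The short calculation below uses only the numbers reported in this section; the variable names are my own.

```python
# View counts for the six most-viewed channels, as listed above.
TOP_CHANNEL_VIEWS = {
    "NPR Music": 112_281,
    "AllaBreve3": 49_859,
    "Singer Joy": 45_233,
    "vozbiala": 28_119,
    "London Review of Books": 27_551,
    "Wellesz Theatre": 27_146,
}
TOTAL_VIEWS = 553_317  # all views in the sample

top_views = sum(TOP_CHANNEL_VIEWS.values())
top_share = top_views / TOTAL_VIEWS

print(f"{top_views:,} of {TOTAL_VIEWS:,} views ({top_share:.1%})")
# prints "290,189 of 553,317 views (52.4%)"
```

Six of the 224 channels thus account for slightly more than half of all views in the sample.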
Similarly, most of the total 6,867 likes came from a handful of channels:
• NPR Music: over 3,000 likes across 1 video
• Singer Joy: 444 likes across 1 video
• ARTE Concert: 444 likes across 2 videos
• London Review of Books: 311 likes across 1 video
• George N. Gianopoulos: 231 likes across 2 videos
Unlike views, which could be attributed to luck, likes and dislikes require users to log in and actively click a button. This suggests that certain channels have higher levels of user engagement, and that some content is perceived as better or worse than others. Of course, the number of videos on a given channel also contributes to the amount of engagement with the like and dislike features.

What: Content and Context
Whereas WorldCat most often provides musical content in the familiar package of an album, YouTube includes countless variations, from a single song to an entire programme. Most of the videos feature individual tracks, which is in line with how the platform is used in the case of much classical music and other genres of music. Song titles are more consistently included in YouTube metadata, which allows for more granular analysis than the WorldCat data, which do not consistently encode all tracks of a recording. Figure 6 lists the number of recordings by
The inclusion of Alma Mahler's songs in diverse events is something that is captured in YouTube, but not well represented in WorldCat. Some of the events in which her songs were included are celebrations of songs written by women composers, theatre events focused on her own life, and even Houston Grand Opera's parody of the 'Real Housewives' franchise.29 Some events were streamed live, which might be analogous to recordings created from live performance.
When Alma Mahler's music is included among the music of other composers, it is worth noting any similarities among the composers, or any patterns among events. In the 'Concerto Omaggio alla Donna, Musa e Musicista', the songs of Clara Schumann (1819-1896) and Alma Mahler were accompanied by recitations from both composers' respective writings.30 Schumann and Mahler both composed Lieder and their lives did briefly overlap, but both women also wrote diaries, correspondence, and other forms of life writing, which is perhaps the more pronounced similarity and unifying theme of the event. An event called 'Art Sung' featured songs of Zemlinsky, Ludwig van Beethoven, Gustav Mahler, and Alma Mahler to bring Alma Mahler's relationships to Zemlinsky and Gustav Mahler to life. It is noteworthy that Alma Mahler's music was programmed in this event; in my initial analysis to remove content that did not include at least one Alma Mahler song, I encountered many videos that used Gustav's music, and not Alma's, to tell her life story.

How: User Engagement
Although WorldCat does support user-generated reviews and tagging, these features are infrequently used in the case of classical music recordings. Perhaps this is because the content is not immediately available for listening, as it is on YouTube. YouTube viewers can engage with the recordings at any point after a video is uploaded, and this immediacy empowers content viewers and creators to engage in a more sustained manner with the content and each other. The immediacy also has implications for the reception of content over time and across diverse constituents. Viewers admittedly engage far less with classical music videos than with popular music and other performance recordings on YouTube. The description field in YouTube frequently provides texts or translations of Lieder, but also provides commentary that ranges from benign appreciation to hateful stereotypes about composers or performers. It is important to note that the person uploading videos can opt to 'turn off' comments, which can be decided on a case-by-case basis.
The comments left by YouTube audiences, while typically appreciative and bland, only infrequently deal with the musical compositions performed. Even when they do, they include extramusical remarks that could be understood as sexist. Casio61, for example, wrote: 'More advanced harmonies than I was expecting. Number 2 is very interesting, ending on a major 7th chord wasn't common in 1915. The woman had some talent.'31 Other commenters take her more seriously, but nonetheless foreground Gustav Mahler, or other men in her life. Paul Meyer wrote: 'Discouraged by a jealous husband as she ranged from Richard Strauss to Arnold Schoenberg and was among the luminaries of her time.'32 Even as audiences hear her music, some prefer to perpetuate stories, diagnose, or sexualize Alma Mahler.

***
Data scraping YouTube revealed that Lieder are performed in a variety of venues, from private practice rooms to commercially released recordings. The diversity of performers and performances on YouTube is a profound advantage, especially as those studying classical music seek to identify and celebrate more diverse representation among performers and composers. YouTube should not be understood as accurately or comprehensively representing the state of Lieder performance. Nonetheless, investigating YouTube content and its metadata provides insight into the classical music being studied and performed by diverse musicians, especially those in the global North.33 Because fewer classical music albums are being released relative to previous decades, studies of music reception in the twenty-first century should seriously consider content-sharing platforms such as YouTube.34 By investigating both the formally produced and the casually shared recordings of Alma Mahler songs in this example, we have more information about performers and audience engagement. My analysis of Alma Mahler's Lieder in YouTube suggested that performance of her work has the power to 'disrupt the fetishized image of Mahler as femme fatale and establish her as a composer worthy of our attention'.35 Access to YouTube's large corpus of recordings and metadata

Fig. 3 YouTube Results List with Data Miner Recipe Creator

Fig. 5 Alma Mahler Songs Uploaded to YouTube by Year

Fig. 6 Mahler Songs on YouTube by Title

Lieder by Johannes Brahms, Franz Schubert, and Hugo Wolf. Recordings of Mahler's songs also seem to be growing more mainstream. There were initially few recordings of her works on major labels, but more recently, well-known singers including Kate Lindsey and Barbara Hannigan have recorded Mahler's songs on labels that boast wide distribution.