1. Introduction – transcending flatness
The two-dimensional surface of the printed page lends itself so naturally to the visual representation of pitch-time structures that it is easy to overlook how the materiality of the page might manifest itself through or otherwise shape compositional decisions. Indeed, one might ask whether the dominance of pitch-time structures in Western musical practice is in no small part attributable to the ease with which such structures are supported by the materiality of a page, a piece of parchment or any other two-dimensional surface. The representation of pitch-time structures along implied xy axes is so familiar and so deeply embedded in Western musical practice that it is hardly surprising that little investigation has been undertaken into how three-dimensional visualisations might suggest new ways of organising musical material, provide new avenues for aesthetic enquiry and even offer new modalities of performance engagement.
In On Sonic Art, composer Trevor Wishart considers how the lattice orientation of traditional notation and its implicit privileging of pitch-time structures might be extended along an implied z axis (depth) to provide a means for representing timbre (Wishart 1996). While Wishart’s investigation is more theoretical in nature and not necessarily driven by a desire to develop new forms of musical notation, his extension of a two-dimensional pitch-time lattice to a three-dimensional pitch-time-timbre lattice, see Figure 1, nevertheless suggests a useful framework for how a three-dimensional visualisation might be conceptualised.

Figure 1. A hypothetical musical structure represented on a pitch-time-timbre lattice from Trevor Wishart’s On Sonic Art.
Wishart’s lattice model provides a helpful lens through which one can consider a body of historical work which explores the creative affordances of layered, visual depth such as John Cage’s Variations II (1961), Fontana Mix (1958) and Cartridge Music (1960), the scores of which are constructed from superimposed printed transparencies and sheets. In each of these works, the performance emerges as an assemblage of prescriptive actions (Manolopoulou 2013) that is shaped through the superimposition and layering of visual information. While superimposed transparencies provided Cage with an idiosyncratic means for representing non-linear musical forms, his interest in such media reached a close with Variations VI (1966). Cage nevertheless continued to explore the aesthetic manifestation of visual depth in some of his lesser-known visual art pieces such as Not Wanting to Say Anything About Marcel (1969), see Figure 2, in which the results of a three-coin toss determine the order and spatial distribution of words and letters printed on parallel sheets of plexiglass (Perloff and Junkerman 1994; see footnote 1). Unlike Cage’s work with transparencies, the more overt use of depth in Not Wanting to Say Anything About Marcel foregrounds the spatial relationship of the observer to the work in a manner analogous to the viewing of sculpture. This correlation furthermore suggests a range of musical possibilities centred on a performative, physical exploration of a score’s three-dimensional spatiality.

Figure 2. Not Wanting to Say Anything About Marcel (1969).
Cage’s work with superimposed transparencies found a receptive audience in the experimental music practices of post-war Japan, particularly in the work of artist collectives such as Group Ongaku (Marotti 2014) amongst whose members were composer Toshi Ichiyanagi and later Toru Takemitsu. In Ichiyanagi’s Music for Piano No. 7 (1961), for example, a pianist is provided with ten transparent sheets and given three options for how to arrange them to form a performance score. The third option requires the pianist to ‘accumulate ten sheets freely in a row. So each sheet is read only in part. The performer may change the order or turn the score upside down in the same position to continue the piece’ (Ichiyanagi 1961). Each of the ten available sheets features a graphic notation in which a thin rectangular prism bisects the page and is itself filled and surrounded by various shadings, lines and circles which in turn designate pitches, pitch ranges and various harmonics from which the pianist creates a performance, see Figure 3. While the work’s sonic material is defined by each individual transparency, the work’s form only emerges through the active superimposition of these transparencies; that is, form is established by material depth.

Figure 3. Two pages from Toshi Ichiyanagi’s Music for Piano No. 7 (1961).
Musical structure and material depth are fundamentally correlated in Toru Takemitsu’s Corona (1962), the score for which, designed in collaboration with graphic designer Kôhei Sugiura (Burt 2006), exists in two versions – one for solo piano and the other for string orchestra. The piano version comprises five studies – 1. Study for Vibration, 2. Study for Intonation, 3. Study for Articulation, 4. Study for Expression and 5. Study for Conversation, each of which features a single-page score printed in blue, red, yellow, grey and white ink, respectively, on an individual transparency. While Takemitsu’s performance notes are somewhat ambiguous, suggesting that each of these transparencies may be cut and intersected with others to form different arrangements (Deupree 2009), he does provide explicit instructions for how the various graphic shapes and symbols within each study are to be interpreted, see Figure 4.

Figure 4. ‘Study for Articulation’ from Corona (1962) with part of an interpretive key.
Like arrangement number three of Ichiyanagi’s Music for Piano No. 7 and Cage’s Cartridge Music, the performance score for Corona constrains prescriptive information to a single page, a feature of many works featuring digital scores created in real-time where the concept of a page turn is an obvious anachronism (Kim-Boyle 2014). Indeed, with the foregrounding of material depth and the attendant reading through the score, the single page captures a greater volume of information and opens more interpretive possibilities.
Outside the work of Cage and broader Japanese experimental practice, the use of material superimposition as a compositional tool features in the work of various composers such as Kenneth Gaburo and Herbert Brün. The scores for Gaburo’s Lingua II: Maledetto (1967–8) and Brün’s Mutatis Mutandis (1976), for example, are constructed from complex superimpositions of words, letters and other graphical symbols, see Figure 5. While performers of both Gaburo’s and Brün’s works do not create their own multi-layered scores from the superimposition of pre-composed pages or transparencies in the manner of Cage’s earlier work, the superimposition of graphical and typographical shapes nevertheless suggests similar temporal modalities and performance challenges. Of the latter, Brün has written – ‘The interpreter, now, is to construct, by thought and imagination, HIS version of a structure that might leave the traces which the graphic displays. The interpreter is not asked to reconstruct my computer programme, the structured process that actually generated the graphics. Rather he is asked to construct the structured process by which HE would like to have generated the graphics’ (Brün 1968). The notational complexity of Brün’s work with its overt layering of graphical planes encourages performers to read through the score alongside a more traditional reading along. Material depth thus helps decentre the traditional association of linearity and prescriptive, left-to-right reading. Multi-dimensional notational structures are also foregrounded in Gaburo’s work for seven virtuoso speakers, with the simultaneity and overlapping of vocal enunciations mirrored in notational depth and textual superimposition.

Figure 5. Score excerpts from Herbert Brün’s Mutatis Mutandis (left) and Kenneth Gaburo’s Lingua II (right), both of which employ complex graphic superimpositions to create musical structure.
Aside from the active application of techniques such as superimposition and assemblage, three-dimensional structures are strongly implied in the notation of works such as Earle Brown’s Four Systems (1952–53) and his seminal December 1952 (1952), the score for which was originally conceived as a ‘three-dimensional motorised box’ (Brown 2008) containing various physical notational signifiers moving via a system of motorised gears. In Mauricio Kagel’s Metapiece (Mimetics) (1961) for piano, the flattening effect of two-dimensional media is softened through the concertina folding of the performance score. The relationship between a score’s materiality and a work’s formal properties is featured in other works of Kagel such as Transición II (1958–59), which foregrounds materiality through engaging performers in spatial processes of cutting, reconstructing and physical assembly (Gutkin 2012), while three-dimensional notational structures are suggested in many of Sylvano Bussotti’s works such as Sette Fogli – Per Tre (1963) for piano, with its implied perspectival focal points (see Figure 6), and the rhizomatic filigrees of Five Piano Pieces for David Tudor (1959) (Bogue 2014). The influence of a collective body of concrete poetry by poets such as Jackson MacLow, bp Nichol, and Augusto de Campos, whose work was fundamentally invested in exploring the relationships between typographical structure and literary expression, is also worth acknowledging at this point, especially given their close relationship to composers such as Cage and others exploring the creative possibilities of graphic notations (Williams 1967).

Figure 6. Score excerpts from Bussotti’s Sette Fogli – Per Tre (left) and Five Piano Pieces for David Tudor (right).
The representation of multidimensional visual structures, including three-dimensional spatial visualisations, on two-dimensional surfaces presents significant performance challenges. Chief amongst these, perhaps, is how notational density can adversely impact the transparency of interpretive possibility. Consider, for example, the score excerpt from Bussotti’s Five Piano Pieces in Figure 6. The uniform colour of the score’s filigrees and their inherent density create what is colloquially known in data visualisation as the ‘hairball effect’. Individual strands within this complex assemblage cannot be easily unfurled, leading to difficulties of interpretation, no doubt partially intended. Contemporary composers such as Aaron Cassidy and Cat Hope, whose work is not overtly concerned with the affordances of three-dimensional spatial notations, often employ different colours in the design of their scores to facilitate reading and to help distinguish musical or gestural layers, see Figure 7a, and even as far back as the 15th century, composers like Baude Cordier and the Mannerist school used coloured ink to delineate rhythmic differences and distinctions more clearly, see Figure 7b.

Figure 7. (a) Score excerpt from Aaron Cassidy’s The wreck of former boundaries (2020) where colour is used to distinguish various gestural layers (left), (b) Detail from the score for Baude Cordier’s Belle, Bonne, Sage in which red-coloured notes denote rhythmic modification (right).
While superimposition, colour and other optical effects go some way to transcending the material limitations of the printed page, they also reveal fundamental constraints. The ‘hairball effect’ evident in notationally dense works like Bussotti’s Five Piano Pieces demonstrates how physical media can obscure the very relationships they seek to represent. Achieving true three-dimensional clarity ultimately requires moving beyond the physical constraints of paper, ink and transparent materials. Stereoscopy, for example, is one means through which this can be achieved and through which complex spatial displays of information can be realistically presented.
2. The technology of 3D
While the desire to transcend the representational constraints of two-dimensional surfaces can be dated back to the development of linear perspective in the early 1400s, it was not until Charles Wheatstone’s experiments with binocular vision and stereoscopy in the 1830s and their subsequent commercial application in stereoscopes such as those developed by Brewster and Holmes in the 1850s, see Figure 8, that three-dimensional imagery was able to achieve a verisimilitude particularly arresting to observers. Writing in 1859, Oliver Wendell Holmes captured the transformative impact of stereoscopic viewing: ‘The first effect of looking at a good photograph through the stereoscope is a surprise such as no painting ever produced. The mind feels its way into the very depths of the picture. The scraggy branches of a tree in the foreground run out at us as if they would scratch our eyes out. The elbow of a figure stands forth so as to make us almost uncomfortable’ (Holmes 1859). The potential impact on the visual arts of the ability to more naturally represent subjects was also not lost on early observers – ‘We may have in future galleries of portraits no fictions of painters, but the people as they were – not flat and framed, and hung along the walls, nor in cold marble, but round and real as they looked in life: and so with building and scenery, we may have, at a cheap rate, our hall of antiquities – Pompeii as it is, Ninevah as Layard sees it – scenery in foreign lands, in our own, in all the minuteness, grandeur, and beauty of nature’ (Hankins and Silverman 2016: 153–4).

Figure 8. Two 19th-century stereoscopes made by Brewster (left) and Holmes (right).
Stereoscopes enjoyed immense popularity from the late 19th century well into the 20th century, with toys like Mattel’s ViewMaster demonstrating their enduring commercial appeal. As tools for facilitating musical expression, however, stereoscopic imagery never elicited significant interest, due in no small part to a range of pragmatic limitations, not least of which was the need to physically hold the devices, which made a musician’s hands unavailable for performance. In recent years, rapid advances in microprocessing power and a growing interest in mixed reality, metaverses, and virtual environments have driven a rapid evolution of head-mounted displays which present immersive, three-dimensional imagery while allowing greater physical freedom. These developments have made opportunities for exploring the musical affordances of three-dimensional scores far more accessible.
Sophisticated, contemporary head-mounted displays such as the Microsoft HoloLens or the Oculus Quest 3 operate on the same fundamental principles as Victorian-era stereoscopes, presenting discrete, phase-shifted images to each eye (Hankins and Silverman 2016). The HoloLens, released in 2016, projects different images to each eye via two high-definition ‘light engines’, comprising 2.3 million total light points. Unlike the imagery of virtual reality headsets, these images are projected onto the surface of real-world objects rather than rendered to a display. As was recognised by 19th-century stereoscope designers, accounting for variance in interpupillary distance (IPD) between observers is critical for realistic depth perception (Hibbard, Haines, and Hornsey 2017). Unlike with early stereoscopes, however, the IPD is uniquely measured for each wearer of the HoloLens through a simple calibration process, ensuring greater realism of depth and spatial understanding for the observer. Microsoft’s HoloLens was the first widely available augmented reality headset capable of delivering interactive, three-dimensional visualisations and immersive sound to the wearer. Its ability to significantly enhance three-dimensional spatial understanding (McIntire et al. 2012) made it a subject of extensive research in fields ranging from medicine to architecture (Leung et al. 2018; Hockett and Ingleby 2016). Despite other technical affordances such as spatial mapping, gestural interaction, voice recognition, 3D audio and the ability to share holograms over networks across multiple devices, the HoloLens’s use as a tool for the creative arts never attracted widespread attention, unsurprising perhaps given the cost of the device and the technical skill required to develop applications.
In 2020, Microsoft released an updated HoloLens 2, which offered a faster chipset, improved ergonomics (including a hinged rather than fixed display and slightly reduced weight) and additional technical capabilities such as eye tracking. One particularly useful new feature was the ability of the device to recognise images using its built-in, front-facing camera. For the author, this opened the possibility of accurately anchoring holograms to musical instruments, allowing holographic scores to be mapped to an instrument’s physical attributes (Kim-Boyle 2022).
The rapid development cycle of technology has rendered both the HoloLens and HoloLens 2 obsolete, with most modern augmented reality developers now favouring more accessible tablets or smartphones as display devices. While some manufacturers continue to develop head-mounted displays, such as Apple’s Vision Pro, or even augmented reality contact lenses (Efron 2023), the use of two-dimensional surfaces to present three-dimensional imagery is subject to the same material display constraints as previously discussed. Ongoing research and development of head-mounted displays has today mostly been targeted towards virtual reality rather than augmented reality applications with the gaming industry and community driving most of these advances.
As of writing, the Oculus Quest series of virtual reality headsets is the most commercially successful line of head-mounted displays produced to date. These headsets feature built-in OLED displays with a resolution of 1832 × 1920 pixels per eye in the Quest 2 model, significantly higher than that of the first-generation HoloLens, but necessitated by the visual fidelity required to create realistic immersive VR experiences. While the headsets originally required hand controllers for interaction with virtual objects, over the past two years hand tracking has been facilitated with front-facing cameras mounted on the devices. This has enabled more natural forms of user interaction and expanded utility beyond gaming to applications that require unencumbered hands.
Augmented and virtual reality headsets offer the most practical platforms for explorations of the musical affordances of three-dimensional visualisations. In the following section, the author will discuss some of these affordances, the attendant reframing of spatiotemporal ontologies and the new performance frameworks they offer.
3. Musical affordances and applications
How might the affordances of devices such as the HoloLens or Oculus Quest and in particular their ability to present realistic three-dimensional imagery be used for musical purposes? Contemporary mixed reality technologies offer solutions to many of the interpretive challenges that emerged in earlier experiments with three-dimensional notation. Consider, for instance, how the density issues evident in works like Bussotti’s Five Piano Pieces for David Tudor (Figure 6) might be addressed through digital three-dimensional modelling. The nested cluster of visual information that creates interpretive ambiguity on the printed page could be modelled in three-dimensional space with colour used to distinguish individual filaments. Within the constraints of a seated position, the pianist could view this construct from various angles to help distinguish relationships between constituent elements that remain tangled in the original score. With a digitally presented score, animation techniques could also be utilised such that the score’s filaments are gradually unfurled in three-dimensional space over time rather than being presented as a complete static assemblage (see footnote 2). Cassidy’s the wreck of former boundaries suggests other interesting possibilities that currently only augmented reality headsets such as the HoloLens can support. As noted, one of the features of the HoloLens 2 that distinguishes it from all other currently available augmented reality headsets is the ability to detect images. As in the author’s The Twittering Machine (2022), which features the anchoring of a holographic score to the surface of a piano keyboard (Kim-Boyle 2022), this affordance could be used to map Cassidy’s score to the physical dimensions of the lap-steel guitar. As with the previous Bussotti example, it would also be possible to implement three-dimensional animation techniques, such as shape dissolves and spatial transformations, to represent discrete sonic events and performance gestures.
In the author’s creative practice, and prior to the increasing accessibility of virtual and augmented reality headsets, numerous strategies were developed to facilitate the presentation of three-dimensional structures on two-dimensional surfaces. In works such as point studies no. 2 (2013), the first work of the author’s to feature a digital, three-dimensional score, the score is uniquely instantiated for each performance and rotates around xyz axes during performance. The three-dimensional structure of the score, see Figure 9a, allows a volume of information to be contained within the visual frame which would otherwise require virtual page turns or visual scrolling, with object rotation enhancing legibility.

Figure 9. Screen captures from two three-dimensional scores – (a) point studies no. 2 (2013) (left), (b) 16:16 (2016) (right).
In 16:16 (2016), for prepared piano four hands, a different visualisation strategy is adopted using anaglyphic imagery. While superficially similar in design to point studies no. 2, in 16:16 each performer wears red-cyan glasses to view an anaglyphic score generated in OpenGL (Jitter). An anaglyph is a stereoscopic image created by physically separating the red and cyan channels of a full-colour image. The greater the channel separation, the stronger the illusion of depth. In the Jitter/OpenGL environment, this is achieved by filtering three-dimensional imagery to two overlaid video planes: one filtered to retain only red (left eye) and the other to retain only cyan (right eye), see Figure 9b. When viewed through red-cyan glasses, these overlaid video planes create an anaglyphic image that appears to extend in three dimensions.
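The channel filtering described above can be sketched outside Jitter. The following is a minimal illustration in Python with NumPy, not the author's Jitter/OpenGL patch: it assumes two already-rendered views of the same scene (one per eye) stored as RGB arrays, and the function name `make_anaglyph` is hypothetical.

```python
import numpy as np


def make_anaglyph(left_rgb: np.ndarray, right_rgb: np.ndarray) -> np.ndarray:
    """Combine left- and right-eye renders (H x W x 3 arrays) into a red-cyan anaglyph.

    Assumption for illustration: channel order is R, G, B and both views
    have identical dimensions.
    """
    if left_rgb.shape != right_rgb.shape:
        raise ValueError("left and right views must have the same shape")
    anaglyph = np.empty_like(left_rgb)
    anaglyph[..., 0] = left_rgb[..., 0]     # red channel only, seen by the left eye
    anaglyph[..., 1:] = right_rgb[..., 1:]  # green + blue (cyan), seen by the right eye
    return anaglyph


# Tiny usage example with two flat test images.
left_view = np.full((2, 2, 3), 200, dtype=np.uint8)
right_view = np.full((2, 2, 3), 50, dtype=np.uint8)
combined = make_anaglyph(left_view, right_view)
```

In this sketch the apparent depth depends entirely on the horizontal offset between the two input views, which corresponds to the channel separation mentioned above.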
In 16:16, anaglyphic images facilitate the legibility of three-dimensional movements and help distinguish nodes aligned along nearby axes. Anaglyphic imagery, however, presents certain constraints. Displaying such imagery to audiences – as often occurs in concerts featuring works with digital scores – proves somewhat ineffective unless audience members receive their own red-cyan glasses. More significantly, anaglyph images have a limited effective colour field (Ideses and Yaroslavsky 2004). While this limitation was not a constraint for 16:16, the score for which has a limited colour palette of red, green, blue, yellow, white and black, it nevertheless affects design choices. Regarding three-dimensional realism, anaglyphic imagery cannot match the capabilities of augmented and virtual reality headsets. Consequently, since 2019, the author has employed such devices across various projects to explore new aesthetic possibilities and musical affordances.
3.1. 5x3x3 (2019)
As previously noted, the HoloLens was amongst the first augmented reality headsets to enable realistic, three-dimensional imagery to be superimposed on the real world. 5x3x3, for any three wind instruments, marked a significant advance in the author’s three-dimensional scoring techniques and was perhaps one of the first uses of the HoloLens in a live performance environment. The score for 5x3x3 presents three cubic grids of coloured nodes connected by thin, coloured lines of variable length and curvature, see Figure 10, with node colours, line colours, line length and line curvature denoting different types of articulations, pitches, note durations and timbral variations, respectively. Each of these constructs is holographically projected onto the performance space and scaled such that from the performer’s perspective the score appears to occupy a maximum span of around 10 (width) by 6 (height) by 6 (depth) metres.

Figure 10. (a) Screen snapshot from the score for 5x3x3 (2019). Performers explore various pathways through a holographically presented score as it is transformed during performance (upper), (b) Carl Rosman (foreground), Ryan Williams (centre), and Tamara Kohler (right) of the ELISION ensemble during rehearsal of 5x3x3 at the Ian Potter Centre for the Performing Arts, Melbourne (lower).
The score for 5x3x3 is not static but gradually transforms over time, with the three initial nodal constructs merging into a large, single assemblage. These spatial transformations manifest as a redistribution of the score’s nodes in three-dimensional space and are conveyed to listeners through a gradual evolution of the work’s rhythmic structure. In addition to spatiotemporal evolution, there is a corresponding timbral transformation which is driven by a real-time FFT analysis of the acoustic sounds presented in the performance space and reflected to the performers through varying curvatures of the lines which connect nodes in the score.
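One way such an analysis-to-curvature mapping could work is sketched below in Python with NumPy. This is an illustrative assumption rather than the implementation used in 5x3x3: the text does not specify which spectral feature is extracted, so the sketch uses the spectral centroid of a single audio frame, and the function names and the 100–4000 Hz mapping range are hypothetical.

```python
import numpy as np


def spectral_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) of one audio frame."""
    windowed = frame * np.hanning(len(frame))          # reduce spectral leakage
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0                                      # silent frame
    return float((freqs * magnitudes).sum() / total)


def centroid_to_curvature(centroid_hz: float,
                          low_hz: float = 100.0,
                          high_hz: float = 4000.0) -> float:
    """Map a centroid to a curvature amount in [0, 1] (hypothetical range)."""
    norm = (centroid_hz - low_hz) / (high_hz - low_hz)
    return float(np.clip(norm, 0.0, 1.0))


# Usage: a 1 kHz test tone sampled at 8 kHz.
sr = 8000.0
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
centroid = spectral_centroid(tone, sr)
curvature = centroid_to_curvature(centroid)
```

A brighter sound in the room would, under these assumptions, yield a higher centroid and therefore a more strongly curved line in the score.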
Like point studies no. 2, in 5x3x3 the performers explore multiple pathways through the score as it is holographically presented in the performance space. As the three nodal constructs merge into one, the range of possibilities through the score greatly increases. The transformation of nodal positions and line lengths is constrained within predetermined limits, however, and so while the work has a non-linear formal structure, the range of possibilities available to the performer is consistently constrained within known limits across performances.
Unlike point studies no. 2, the sheer physical dimensions of 5x3x3’s score as it is presented to the three performers promote a physical engagement that is not typically foregrounded in live performance, where the only physical relationship between performer and score is usually that of a page turn. For the author, Debord’s concept of the dérive has been especially valuable in this respect, with the physical exploration of the score aestheticised through musical utterance (Debord 1956). It is hard to imagine how such a theme might be explored had the score for the work been presented on a two-dimensional surface.
3.2. 96 Postcards in Real Color (2022)
96 Postcards in Real Color (2022), henceforth 96 Postcards, for four to eight vocalists is the first work of the author’s designed for a VR space with each singer wearing an Oculus Quest 2 VR headset during performance. Inspired by French writer Georges Perec’s Deux Cent Quarante-Trois Cartes Postales en Couleurs Véritables (1978) and premiered in 2022 by the Neue Vocalsolisten of Stuttgart, 96 Postcards features a three-dimensional, immersive performance score generated from Instagram image caption data scraped from 96 locations around the world. The score is uniquely instantiated for each performance, although certain elements remain within predetermined boundaries to ensure consistency across performances, and surrounds the performers in a virtual reality space. As the performers vocalise various pathways through the score, actualising its latent possibilities, their explorations trace a musical trail through the lens of idealised Instagram memories.
The score for 96 Postcards adopts a similar design aesthetic to previous scores of the author’s discussed in this paper, although certain elements of its appearance are driven by an analysis of Instagram text captions performed in Python with various natural language processing libraries. In addition, images gathered from Instagram are integrated within the score and displayed on twelve panels which surround the performers in the VR scene. While the images are not interpreted in any strict sense, they do provide an invaluable visual anchor as the performers traverse the 360-degree circumference of the score’s node-line construct. A text caption accompanying one of the images posted to Instagram is positioned above each panel, see Figure 11.

Figure 11. A fragment of the performance score for 96 Postcards from towards the centre of the VR scene. An interconnected, three-dimensional grid of coloured nodes is positioned in front of panels containing various images gathered from Instagram. A single text caption is positioned above each.
In 96 Postcards, each node denotes the articulation of a pitch with the node’s colour indicating the specific pitch to be sustained along a line. Like 5x3x3, 96 Postcards employs a direct correspondence between the score’s spatial distribution and its temporal structure. Lines between nodes function as temporal indicators, with their spatial length within the VR scene denoting the absolute temporal duration of the musical events they connect. In 96 Postcards, the mapping is prescribed at approximately 20 seconds across each canvas, with shorter lines resulting in proportionally shorter durations.
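The proportional relationship between line length and duration can be illustrated with a short Python sketch. It assumes a simple linear scaling in which a line spanning the full width of a canvas lasts 20 seconds; the function names, the coordinate units and the interpretation of ‘canvas width’ are illustrative assumptions rather than details taken from the work.

```python
import math


def line_length(start: tuple, end: tuple) -> float:
    """Euclidean length of a line between two nodes given as (x, y, z) positions."""
    return math.dist(start, end)


def line_duration(length: float, canvas_width: float,
                  canvas_seconds: float = 20.0) -> float:
    """Duration in seconds of a line, scaled linearly against the canvas width."""
    if canvas_width <= 0:
        raise ValueError("canvas_width must be positive")
    return canvas_seconds * (length / canvas_width)


# Usage: a line of length 5 on a canvas 10 units wide lasts half the canvas time.
length = line_length((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
duration = line_duration(length, canvas_width=10.0)
```

Under this assumption, a line half as long as the canvas is sustained for roughly 10 seconds, a quarter-length line for roughly 5 seconds, and so on.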
During performance, the vocalists freely read around the score from one node to another, musically exploring the myriad pathways presented as they traverse the full 360-degree node distribution. As in previous works by the author, the performance per se thus becomes an actualisation of the latent possibilities presented by the non-linear, open form notational schema. The dérive performance model is further reinforced through the way in which the Instagram text captions function in the performance score. The text caption placed above each of the twelve panels provides a model for how each node is to be enunciated, with only successive vowels within a text caption to be sounded. A caption such as ‘Le Rocher du Basta et son point de vue parfait à la fois sur le Grande…’ would require performers to articulate and sustain pitches as represented in the node-line constructs along the vowels ‘eoeuaaeooieueaaiàaoiueae…’. Each sounded text caption, sifted of plosives, fricatives, affricates and other consonants, thus becomes a filter through which all other visualised captions are sounded. These latent locations, given actuality only through a node-line construct, are thus heard as ghost-like echoes during the exploratory dérive of the performer’s journey through the score.
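The vowel filtering applied to each caption can be reproduced with a few lines of Python. This is a sketch only, not the author's code: it assumes that the vowels are the letters a, e, i, o and u, with or without diacritics, and the function names are hypothetical.

```python
import unicodedata

BASE_VOWELS = set("aeiou")  # assumption: 'y' is not treated as a vowel


def is_vowel(ch: str) -> bool:
    """True if the character is a vowel once any diacritic is stripped."""
    base = unicodedata.normalize("NFD", ch)[0].lower()
    return base in BASE_VOWELS


def vowel_sequence(caption: str) -> str:
    """Return the caption's vowels in order, dropping consonants, spaces and punctuation."""
    return "".join(ch for ch in caption.lower() if is_vowel(ch))


# Usage with the caption fragment quoted above.
caption = "Le Rocher du Basta et son point de vue parfait à la fois sur le Grande"
vowels = vowel_sequence(caption)
```

Applied to the caption fragment quoted above, the function yields the same vowel string given in the text, with the accented ‘à’ preserved.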
Physical engagement with a three-dimensional score presented in virtual reality space is quite unlike that promoted through an augmented reality system such as the HoloLens. In the VR space of 96 Postcards, performers are fully immersed in a virtual world and unable to visually perceive other performers, the audience or any other objects in the real world. While there are built-in safety protocols, these naturally limit the extent of any physical explorations. As a result, in 96 Postcards each of the performers is physically positioned within an isolated area of the real-world performance space and simply explores the 360-degree distribution of the score from a fixed, standing position.
3.3. Virtual reality adaptations – Four Systems (1953), Jasper (1991), Corona (1962)
The author’s exploration of the musical affordances of three-dimensional notations has been extended in an ongoing series of virtual reality adaptations of graphic scores for established works. This project presents a unique performance modality wherein a three-dimensional score is situated in a VR space and both transformed and performed as if it were an instrument by networked clients. Furthermore, the scores are interpreted by live musicians wearing Oculus Quest 3 headsets. This performance modality is illustrated in Figure 12.

Figure 12. Performance modality for VR scores.
Currently, the author has adapted four works for VR performance – December 1952 (1952) and Four Systems (1953) by Earle Brown, Jasper (1991) by Christian Wolff and Corona (1962) by Toru Takemitsu. The scores for December 1952 and Four Systems share a similar visual design with black rectangles spatially distributed on a page with considerable interpretative freedom afforded to the performers. The instrumentation of December 1952 is open while Four Systems was written for a solo pianist. Since the VR adaptations of both are nearly identical, this discussion will only focus on Four Systems.
During performance, networked clients can interact with the three-dimensional score for Four Systems in several ways. First, the rectangular blocks that constitute the score may be grabbed and repositioned in VR space. They may also be used as tools to strike other blocks which then move within the space subject to physical constraints such as gravity, inertia and friction. Second, the volume of the blocks may be expanded or contracted when a client grabs one and opens or closes the distance between index finger and thumb. Finally, the blocks may be ‘sounded’ by pinching them directly or striking them against other blocks, triggering the playback of prerecorded piano samples heard in the VR space. The pitch of these samples is correlated to each block’s spatial volume.
Recent technical developments for the Oculus Quest 3 headset have enabled the device to perform like an augmented reality system where imagery is superimposed on the real world. In the VR adaptation of Four Systems, this proves exceptionally useful as it allows the score to be superimposed on the surface of a piano, see Figure 13, while enabling the pianist to see both the keyboard and their hands.

Figure 13. The transformed VR score as it appears to the pianist (lower) with a section from Brown’s original score (upper).
Score transformations are perceptible to all other networked clients, including the pianist, connected to the VR scene. In addition, the interpretation of the pianist is broadcast to the VR space for all clients to hear. The reader is referred to Kim-Boyle (2024) for a more detailed explanation of how these various techniques are implemented.
The third movement of Jasper (1991), for double-bass and violin, features a graphic score that closely resembles a node graph, see Figure 14a. In the author’s VR adaptation, the two performers are immersed within a three-dimensional score which requires them to physically rotate in order to read, see Figure 14b.

Figure 14. (a) An excerpt from the original score to the third movement of Jasper (1991). The upper and lower brackets denote the violin and double-bass part, respectively, while the short, horizontal lines denote the four strings of each instrument. The open and closed diamonds indicate notes (upper). (b) An immersive VR adaptation of one system where the staves are wrapped around each of the instrumentalists (lower).
As in Four Systems, clients connected to the VR scene can interact with the components of the score, which directly affects the interpretive possibilities available to the two live musicians. There are two possibilities for score interaction, both enabled via hand tracking: 1) pushing nodes along the string lines, and 2) ‘plucking’ the edges that connect nodes. Similarly to an abacus, nodes can be pushed either left or right along the string upon which they sit. Each node is subject to Newtonian physics behaviours upon collision with other nodes, with the force of hand movement directly correlating to the velocity of movement along each string. The second type of score interaction, where edges are ‘plucked’, results in the playback of an audio sample of a prerecorded plucked string. The force of the collision determines the volume of the playback sample and the length of the string correlates to pitch, with longer strings sounding lower pitches and shorter strings sounding higher pitches. These audio samples are heard by all connected clients.
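The two mappings described for plucked edges, from string length to pitch and from collision force to playback volume, might be sketched as follows. This is an illustration only and not the adaptation’s actual code: the inverse-proportional pitch law (that of an ideal string), the linear clamped gain curve, the function names and all reference values are assumptions.

```python
def pluck_frequency(length: float, ref_length: float = 1.0, ref_freq: float = 110.0) -> float:
    """Map edge length to frequency in Hz.

    Assumes an ideal string, where frequency is inversely proportional to
    length, so longer edges sound lower. Reference values are hypothetical.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return ref_freq * ref_length / length


def pluck_gain(force: float, max_force: float = 10.0) -> float:
    """Map collision force to a playback gain clamped to the range [0, 1].

    Assumes a linear response; max_force is a hypothetical calibration value.
    """
    return max(0.0, min(force / max_force, 1.0))


# An edge twice the reference length sounds an octave lower.
print(pluck_frequency(2.0), pluck_gain(5.0))
```

Under these assumptions, halving the length of an edge raises the sample by an octave, while any collision stronger than `max_force` simply plays at full volume.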
The score for Takemitsu’s Corona was designed in collaboration with graphic designer Kôhei Sugiura (Burt 2006) and exists in two versions – one for solo piano and the other for string orchestra. The piano version comprises five studies – 1. Study for Vibration, 2. Study for Intonation, 3. Study for Articulation, 4. Study for Expression and 5. Study for Conversation. Each study features a single-page score printed in blue, red, yellow, grey and white ink, respectively, on individual transparencies.
While Takemitsu’s performance notes are ambiguous, he does suggest that each transparency may be cut and intersected with others to form different structural arrangements (Takemitsu 1962). The score for each study is characterised by its circular, non-linear structure with various musical events graphically presented along its perimeter. The pianist commences performance of a particular study by selecting a point along the perimeter and traversing their way around the circle in either a clockwise or anticlockwise direction at a speed loosely indicated as 1. possibly slow, 2. 2 min or 4 min, 3. possibly fast, 4. 1 min or 3 min or 5 min and 5. tempo-free (ibid.).
The circular structure of Corona’s score, where no one spatial location is more privileged than any other and where this in turn helps support a non-linear temporal ontology, lends itself particularly well to arrangement and presentation in an immersive, virtual reality space. The author has chosen ‘Study for Intonation’ for such an adaptation, see Figure 4. In the VR adaptation, the notational descriptors which denote the properties of various sonic events are stochastically distributed along the perimeter of the circle. The score is uniquely generated upon each instantiation within constraints defined by predetermined rules – for example no more than three nodes may be drawn along any radial and no more than eight radials may be generated in any instantiation. Two such instantiations are presented in Figure 15.

Figure 15. Two instantiations of ‘Study for Intonation’.
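The rule-constrained generation described above can be illustrated with a short sketch. Only the two stated limits (at most eight radials, at most three nodes per radial) come from the text; the data structure, the uniform distributions, the minimum counts and the names `Radial` and `generate_score` are assumptions made for illustration.

```python
import random
from dataclasses import dataclass

MAX_RADIALS = 8           # limit stated in the text
MAX_NODES_PER_RADIAL = 3  # limit stated in the text


@dataclass
class Radial:
    angle_deg: float            # position of the radial on the circle's perimeter
    node_offsets: list[float]   # normalised node distances along the radial, sorted


def generate_score(rng: random.Random) -> list[Radial]:
    """Generate one score instantiation within the two stated constraints.

    Assumptions: at least one radial is drawn, a radial may carry no nodes,
    and angles and offsets are uniformly distributed.
    """
    radials = []
    for _ in range(rng.randint(1, MAX_RADIALS)):
        n_nodes = rng.randint(0, MAX_NODES_PER_RADIAL)
        offsets = sorted(rng.uniform(0.0, 1.0) for _ in range(n_nodes))
        radials.append(Radial(rng.uniform(0.0, 360.0), offsets))
    # Order the radials by angle so the result reads around the circle.
    return sorted(radials, key=lambda r: r.angle_deg)


score = generate_score(random.Random())
print(len(score), [len(r.node_offsets) for r in score])
```

Each call produces a different arrangement, while passing a seeded `random.Random` would make a given instantiation reproducible, for example for rehearsal.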
Like the VR scores developed for Four Systems and Jasper, Corona’s score is expressly intended for a performance model wherein the performer is presented with a score that is manipulated and also performed or sounded by networked clients in both a local and distributed mode of performance. In Corona, clients enter the VR space in which the score is situated and interact with its various components by either repositioning them or playing them as if they are part of a musical instrument. Both modes of networked interaction are broadcast to all connected clients including the live pianist who views the score as it is being transformed through an Oculus Quest 3 headset in pass-through mode. Score components can be repositioned in two ways – by stretching the radials that extend from the central circle through manipulation of their terminal nodes and by moving these nodes along invisible circular splines. These interactions are enabled through hand tracking using a similar method to that employed in Four Systems and Jasper.
4. Towards an aesthetics of spatial materiality
While performance physicality has been extensively theorised in contemporary music practice in the work of composers from Lachenmann through Cassidy, the aesthetic implications of three-dimensional scores in mixed reality environments provide new theoretical territories. Works such as Birtwistle’s Secret Theatre (1984), for example, which features a choreographed movement of performers across various stage positions, might suggest some possibilities for how explorations of physical space may be musically organised; however, the relationship between performer and three-dimensional score demands different analytical frameworks. The Situationist concept of the dérive and broader psychogeographic approaches to lived spaces (O’Rourke 2013) provide more fertile theoretical ground, despite emerging from a different performative context where the environment itself did not contain explicit performance directives.
Translating psychogeographic insights into compositional practice requires theoretical frameworks that can account for both spatial navigation and musical structure. Boulez’s concept of striated and smooth space (Boulez 1971; Deleuze and Guattari 1987) is one such model which helps us consider how the spatiality of three-dimensional scores might be musically integrated. His Répons (1981), for example, with its electroacoustic spatialisation of harmonic structures around the audience, demonstrates how theoretical models of space can inform our understanding of musical materiality. This framework helps illuminate works like 96 Postcards, where harmonic relationships are directly mapped onto spatial coordinates within the virtual reality environment, see Figure 16. The score becomes more than a set of instructions – it functions as a cartographic space through which performers navigate, with their movements through the virtual environment directly generating musical structure.

Figure 16. Probability-based spatial distribution of node colours in 96 Postcards where node colours denote a class of predetermined pitches.
Similarly, in 5x3x3, real-world space is temporally striated through the duration of pitched sonic events which have a virtual representation, through the HoloLens, along geometric lines made visible to the audience through the movement of the performers. As line lengths in the score are mapped to the duration of notes performed, the physical mapping is made manifest in rhythmic structure and temporal organisation. Correlating physical space and temporal unfolding builds on the work of composers such as Xenakis (Xenakis 1992; Treib 1996) and is an important structural determinant in much of the work of contemporary composers such as Benedict Mason and Rebecca Saunders. The sounding of a physical space through a performance dérive repositions the score as a cartographic representation (Miller 2017) rather than a simple denotation of prescriptive action.
The dérive, while helpful in understanding spatial exploration, can be enriched by considering Henri Lefebvre’s triadic model of spatial practice, representations of space and representational spaces (Lefebvre 1991 [1974]). In this framework, three-dimensional scores exist simultaneously as conceived space (in their design), perceived space (in their physical manifestation) and lived space (in their performance). This multi-layered understanding helps explain how works like 96 Postcards create complex relationships between spatial organisation and musical meaning. The lived space of performance becomes particularly significant here, as it generates what Lefebvre calls ‘differential space’ – space that is actively produced through social practice rather than simply occupied. In musical terms, this suggests that three-dimensional scores do not simply represent musical relationships but actively produce them through the performer’s embodied navigation of virtual environments.
While three-dimensional scores would seem to be particularly well suited for the representation of non-linear musical forms, to what extent might they suggest or make manifest unique temporal ontologies? Consider the adaptation of Four Systems discussed earlier noting the a priori presumption that each discrete graphic object within the two-dimensional frame constitutes a unique event which may in turn be interpreted as a pitch, gesture, timbral inflection, or other kind of sonic event. The score provides no overt indication of sequential ordering, other than perhaps what may be implied by shape similarity, and no distinctions are made through shading, gradient, or colour. Translating this notational schema to a three-dimensional representation is straightforward, though each event must extend along the z axis with the depth of this extension introducing an additional interpretive variable. Ultimately though, consistency of design from two to three dimensions can be realistically presumed and the question becomes how might extension or distribution of events on the z axis suggest new temporal ontologies and new ways in which temporal relationships might be navigated? While the question cannot be fully investigated within the limits of this paper and may, indeed, have no clear answer, helpful perspectives might be gathered from our understanding of the relationship between human spatial cognition and temporal consciousness.
Merleau-Ponty’s phenomenological analysis of depth perception provides a useful framework for understanding how three-dimensional scores restructure temporal experience. Rather than treating depth as merely spatial cognition, Merleau-Ponty argues that depth fundamentally shapes temporal perception – objects perceived on the horizon create our temporal orientation in the world (Wambacq 2011; Merleau-Ponty 2013). As he writes, ‘Perception provides me with a “field of presence” in the broad sense, extending in two dimensions: the here-there dimension and the past-present-future dimension’ (Merleau-Ponty 2013: 307). Deleuze develops this insight in his analysis of cinema, where depth of field creates what he terms a ‘time-image’ – a fusion of temporal and spatial dimensions into a homogeneous construct (Deleuze 1989; Wambacq 2011). This framework offers new ways to help understand how performers engage with three-dimensional scores. The spatial positioning of musical events at different depths doesn’t simply organise musical material in space – it creates a temporal field where each event’s position implies specific potentialities for realisation (Wall 2004). The score becomes what Deleuze and Guattari (1987) term a ‘plane of immanence’ where past, present and future possibilities coexist in a single moment. Rather than reading the score as a linear sequence of events, performers encounter a multidimensional field of temporal possibilities made tangible through spatial relationships.
Clark’s theory of extended cognition provides another valuable model for understanding how performers engage with three-dimensional scores. In his Supersizing the Mind (2008), Clark argues that our cognitive processes routinely extend beyond the boundaries of brain and body to incorporate external tools and technologies. Considered with respect to works like 5x3x3, this suggests that the mixed reality score becomes more than just an interface – it functions as part of the performer’s extended cognitive architecture. The initial challenge of learning to navigate a three-dimensional score mirrors what Clark describes as the integration of new technologies into our cognitive repertoire. Over time, the augmented or virtual reality headset shifts from being what Clark terms an opaque technology to a transparent one, seamlessly incorporated into the performer’s musical thinking. This is particularly evident in works like Corona, where performers must simultaneously process spatial positioning, gestural interaction and musical interpretation. The score environment becomes an ‘extended cognitive system’ (Clark 2008), distributing musical thinking across performer, score and technological interface.
The extended cognitive system that Clark proposes can be further illuminated through Barad’s concept of ‘intra-action’. While Clark helps us understand how performers’ cognitive processes extend into the technological infrastructure of mixed reality scores, Barad’s framework suggests these relationships are even more fundamentally entangled. Rather than seeing performer, score and technology as pre-existing entities that then interact, Barad argues that ‘the primary epistemological unit is not independent objects with inherent boundaries and properties but rather phenomena… phenomena are the ontological inseparability of agentially intra-acting components’ (Barad 2007: 33). In works like 96 Postcards, the score’s meaning emerges not from a simple interaction between discrete elements (performer, score, technology), but through what Barad terms ‘agential cuts’ – specific materialisations that are enacted and not given. The spatial organisation of the score, the performer’s gestural engagement and the technological mediation are not separate elements that combine but rather are mutually constitutive aspects of a phenomenon.
This understanding helps explain why three-dimensional scores create fundamentally different performance possibilities than their two-dimensional counterparts. Barad’s insight that ‘matter and meaning are mutually articulated’ (Barad 2007: 152) suggests that the materiality of the score – whether physical transparencies in Cage’s work or virtual reality environments in contemporary practice – actively participates in creating musical meaning rather than simply conveying it. The technological apparatus itself becomes constitutive of new musical possibilities, moving beyond a neutral delivery mechanism. The score functions as an ‘apparatus’ that doesn’t merely present musical information but actively creates new forms of musical agency distributed across performer, notation, and technology.
5. Summary
The emergence of mixed reality technologies has created unprecedented opportunities for reconceptualising musical notation and performance practice through three-dimensional scores. While this paper has traced developments from early physical experiments with transparency and layering through to contemporary virtual and augmented reality implementations, the implications extend far beyond mere technological innovation. These new scoring practices fundamentally challenge traditional relationships between performer, score and musical temporality.
The present study acknowledges several important limitations that point toward fertile areas for future research. First, this investigation has focused primarily on the performer’s engagement with three-dimensional scores without examining their sonic or aesthetic consequences for listeners and audiences. Whether and how such scores impact musical perception or engagement for those not directly interacting with the visual system remains an open question requiring systematic investigation. Second, while this paper has concentrated on augmented and virtual reality implementations, it has not addressed the broader genre of locative scores or site-specific scoring practices that operate within geographic space – practices that could enrich theoretical understanding and help clarify the specificity of mixed reality three-dimensional scores within spatialised music more broadly. Third, the discussion of affordances has not fully addressed practical user-centric concerns such as ergonomics, cognitive load and the challenges musicians face when simultaneously listening, performing and navigating complex three-dimensional visual environments. Related to this, the potential for genuinely polyphonic, multi-user creative interactions enabled by three-dimensional scores deserves deeper exploration beyond the collaborative models discussed here.
Despite these limitations, the transformation from two-dimensional to three-dimensional scoring practices reveals several key insights. First, spatial depth becomes more than just an additional axis for organising musical information – it creates new possibilities for non-linear formal structures and performer agency. Second, the integration of mixed reality technologies enables dynamic score generation and collaborative, networked performance models that were previously impossible. Third, the physical engagement required by three-dimensional scores, particularly through concepts like the dérive, suggests new frameworks for understanding the embodied nature of musical interpretation.
Realising these possibilities, however, requires addressing significant challenges. The technical complexity of mixed reality development demands new collaborative approaches that may reshape traditional models of compositional practice. The performative adaptation required for musicians to effectively navigate three-dimensional scores necessitates careful consideration of cognitive load and spatial perception. Perhaps most importantly, the theoretical frameworks for understanding these new scoring practices – drawing on phenomenology, spatial theory and performance studies – are still emerging.
Looking forward, several key areas warrant further investigation. The relationship between spatial organisation and temporal perception in three-dimensional scores remains rich territory for both theoretical and practical exploration. The potential for networked performance and collaborative score manipulation suggests new models of distributed creativity. Educational applications, particularly in developing spatial-musical understanding, deserve careful study. While traditional notational practices will persist, three-dimensional scoring opens new possibilities for musical expression that directly engage with contemporary technological and theoretical developments in spatial understanding.
The evolution of three-dimensional scoring practices thus represents not just a technological advancement, but a fundamental reconceptualisation of how musical ideas can be organised, interpreted and performed. As mixed reality technologies continue to develop, the intersection of spatial thinking and musical practice promises to remain a fertile ground for creative and theoretical innovation.
