Introduction
This article sets out to examine ways in which the distinctive social ethos and tactile appeal of physical modular synthesisers can be transferred into extended reality (XR) contexts. We begin with an overview of the shift away from analogue synthesis methods towards digital platforms, and the eventual reaction against this shift. This is followed by a survey of the implications of virtual reality (VR), augmented reality (AR) and mixed reality (MR) technologies for synthesis practices, including the net-based community-building potential of these movements. An overview of our own work developing the OpenSoundLab (OSL) MR sound laboratory is then provided. This leads to a research stage in which three creative sound practitioners were invited to incorporate virtual patching with OSL into their respective artistic practices, allowing us to observe the key issues in action and to draw some general conclusions concerning the potential for the extension of modular synthesis into XR applications.
De- and re-materialisation of electronic audio production
In the 1990s and 2000s, digital technology was widely adopted in design, the arts and music, resulting in experimental practices becoming dematerialised (cf. Ward Reference Ward, Jonas, Zerwas and von Anshelm2015). Some artists in the field of sonic media arts perceive the shift towards digital software as a loss of physical experiences, which cannot be replicated by plugin-based workflows within digital audio workstations (DAWs) or patch-based workflows such as ‘Pure Data’ (Puckette Reference Puckette1997), ‘Max/MSP’Footnote 1 (cf. Snape and Born Reference Snape, Born and Born2022) or ‘VCV Rack’.Footnote 2 DAWs have made much expensive hardware obsolete and reduced studio space needs, but often limit interaction to a 2D screen, stripping away the physicality of dedicated audio spaces.
In response to this digitisation, modular synthesisers have seen a revival. These instruments let musicians connect modules – like oscillators, filters and amplifiers – via patch cables to shape sound (see Figure 1). Pioneers include Harald Bode (Reference Bode1961), Robert Moog (Reference Moog1964) and Don Buchla (Reference Buchla2005), whose early systems resembled 1900s telephone switchboards. Initially costly, bulky and complex, they were confined to universities and studios, later overshadowed by non-modular analogue synths in the 1970s, digital synths in the 1980s–90s and DAW plugins in the 2000s. This changed in the 2010s, when modular synthesis was reimagined through the ‘Eurorack’ format – a grassroots movement enabled by affordable electronics manufacturing and niche online communities (Scott Reference Scott2016; Paradiso Reference Paradiso2017), reigniting interest among musicians and artists, and fuelling a surge in boutique hardware modules from small manufacturers.

Figure 1. Photograph of a physical Eurorack system with modules from various manufacturers.
Claes Thorén and Andreas Kitzmann’s research (Thorén and Kitzmann Reference Thorén and Kitzmann2015) highlights the importance of understanding voluntary digital ‘non-users’ as active and considered agents of socio-technical change. In a later study, Andreas Kitzmann and Claes Thorén (Kitzmann and Thorén Reference Kitzmann and Thorén2022) analysed ModwigglerFootnote 3 through ‘netnography’, a form of ethnography conducted within online communities (Kozinets Reference Kozinets1997; Kozinets Reference Kozinets1998; Addeo et al. Reference Addeo, Delli Paoli, Esposito and Bolcato2019). Their research suggested that digital technology, despite offering a multitude of choices, often results in decision paralysis. Consequently, many EurorackFootnote 4 enthusiasts expressed a desire to move away from digital instruments and instead embrace more tangible, rewarding and spatially immersive alternatives – though these also come with greater constraints. The authors identify authenticity, legitimacy and creativity as key drivers in projecting identity onto objects. In the context of the ‘analogue turn’, modular systems are seen as unique extensions of artistic identity. Haptic engagement with complex technologies usually fosters deeper connections than non-haptic interaction, enables new creative exploration and instils pride in one’s work. These synthesisers require significant time and effort, nurturing an intimate relationship with the tool and enhancing pleasure in interaction.
To compare virtual and physical modular systems, a theoretical framework for patching in physical space is needed – one that identifies key aspects for evaluation. Understanding why modular synthesisers have surged in popularity, and what still limits their appeal, is essential. Since the rise of Eurorack and the modular revival, patching has received growing scholarly attention, including modular taxonomies (e.g. Hetrick Reference Hetrick2016) and socio-technical analyses of the factors driving the resurgence (e.g. Scott Reference Scott2016; Paradiso Reference Paradiso2017). These provide a foundation for evaluation.
Richard Scott, an electroacoustic musician and modular user who founded the 2014 Sines and Squares Festival and edited a 2014 eContact! issue on modular synthesis, argues that neither analogue sound nor flexibility alone explains the current modular boom (Scott Reference Scott2016). Instead, interfaces of digital audio systems still suffer from unresolved drawbacks, fuelling the return to physical modular systems. Digitisation and virtualisation entail dematerialisation, reducing spatiality and tactility. A digital modular system (e.g. in Max or VCV Rack, see Figure 2) may include dozens or even hundreds of controls, yet these are confined to a small screen – far less expansive than physical systems. Unlike physical setups, digital patching requires sequential interaction via mouse: one knob, slider or button at a time. This forces a disconnect between the user’s physical workspace and the screen, preventing direct, embodied engagement. The mouse also offers minimal haptic feedback. Despite vast sonic potential, digital interaction often remains outdated. While MIDI controllers are common in digital studios and advanced or custom interfaces exist, they add layers of symbolic abstraction, distancing users from direct tactile control. For complex patches, mapping all elements intuitively is typically impractical.

Figure 2. Screenshot of a patch in ‘VCV Rack’.
Against this dematerialised backdrop of digital audio production, Scott (Reference Scott2016) points out how the non-symbolic self-representation, spatiality and tactility of the modular interface led to its success:
One of its great appeals is that (…) it does not exist only on a screen and that it does [not] represent anything, rather it is itself an existing three-dimensional, sound-making object – a thing. A thing whose means of access and control relate directly to the electrons flowing through its circuitry and to its own innate non-linear, chaotic and generative capabilities.
Now, roughly a decade after the resurgence of modular synthesisers, we are exploring how the ‘thingness’ of these electronic instruments is not solely about their physical, material presence, but rather their non-symbolic and spatial qualities. The question, then, is whether the latter attributes must remain physical, or if they can be carried over into VR/MR settings (cf. Milgram and Kishino Reference Milgram and Kishino1994). VR immerses users in a fully digital space, while MR blends digital and physical elements interactively – both falling under the broader term XR. With the newest wave of XR technology, initiated by Oculus in the 2010s, these terms continue to be redefined by the industry, leading to multiple interpretations with subtle differences.
Joe Paradiso (Reference Paradiso2017), renowned for designing and building an expansive custom modular system comprising over 140 modules since the late 1970s, concludes that the non-symbolic spatiality of modular systems is a key factor in their success. He describes these systems as ‘rich, highly engrossing, consuming, immersive, immediate, and serendipitous environment[s]’ (Paradiso Reference Paradiso2017: 3) that facilitate an ‘extreme state of flow’ (Paradiso Reference Paradiso2017: 5). Paradiso also likens artists who perform with modular synthesisers to ‘technological shaman[s] coaxing audio magic out of their mysterious “bonfire” of LEDs and cables’ (Paradiso Reference Paradiso2017: 4), effectively charging the mythos of their complex instruments. Paradiso (Reference Paradiso2017: 6) proposes that VR modular environments could become a viable alternative to physical modular systems as soon as ‘virtual environments have a similar level of tactile complexity as we see in the real world’, highlighting the spatiality and tactility of the modular interface as its most important aspect. In contrast to Paradiso’s perspective, we argue that even today’s level of tactile feedback in VR and MR systems can present a compelling alternative to physical modular systems. While these systems typically simulate touch through rudimentary vibration motors embedded in hand controllers, their effectiveness often exceeds expectations. By varying intensity and timing, these motors can convey a range of tactile sensations – such as contact with virtual interface elements, perceived friction or even a certain illusion of weight – depending on the scale and nature of the interactive object. When combined with responsive visual feedback and interactive audio, these minimal haptic cues have proven remarkably effective in our work on OSL, greatly enhancing user immersion. The resulting experience clearly surpasses that of conventional mouse-based interfaces, which offer minimal tactile feedback.
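To illustrate the intensity-and-timing principle schematically, a haptic event can be reduced to a short amplitude envelope sent to the controller’s vibration motor. The following Python sketch is purely illustrative: the event names, peak values and decay rates are our own assumptions rather than any headset SDK’s actual API.

```python
import math

def haptic_envelope(event, steps=8):
    """Return a sequence of motor amplitudes (0..1) over time.

    Varying peak intensity and decay speed yields distinct sensations:
    a sharp, fast-decaying pulse reads as contact; a sustained low-level
    buzz suggests friction; a medium pulse can hint at weight.
    Profile values are illustrative assumptions, not measured data.
    """
    profiles = {
        "contact":  (1.0, 6.0),   # strong peak, fast exponential decay
        "friction": (0.3, 1.0),   # weak, slowly decaying buzz
        "weight":   (0.6, 2.5),   # medium pulse suggesting mass
    }
    peak, decay = profiles[event]
    return [peak * math.exp(-decay * i / steps) for i in range(steps)]
```

Even such a simple parametrisation makes clear why a handful of motor amplitudes, synchronised with visual and auditory feedback, can outperform the static click of a mouse button.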
Socio-technological and ecosystemic factors also play a significant role. Scott (Reference Scott2016) highlights that the renaissance of modular synthesis was made possible by highly interactive and welcoming grassroots communities of manufacturers and artists, unified by a counterculture opposing the digitisation and centralisation of electronic music production. The rise of inexpensive, global communication and social forums on the internet was undoubtedly a major enabler of this movement. Paradiso (Reference Paradiso2017: 4) further emphasises that the advent of affordable computer-aided design (CAD) tools and electronics manufacturing processes was a critical factor in the emergence of the cottage-style industry on which the Eurorack modular ecosystem is still based – although, in recent years, large industrial players like Behringer have also entered the scene. Lastly, Paradiso (Reference Paradiso2017: 4–5) emphasises that ‘modules don’t go obsolete’ because most systems use standardised voltage specifications for audio levels, pitch tracking and power supplies. Owing to their analogue nature, these specifications can be easily adapted to various form factors, granting such instruments a durable, non-ephemeral quality and a sense of longevity that their digital counterparts simply cannot match.
To summarise, the following criteria are central to the renaissance of modular synthesisers and will serve as a foundation for evaluating PatchWorld and OSL as case studies later in this paper:
• Non-ephemeral quality
• Advent of affordable tools
• Grassroots communities of manufacturers and artists
• Non-symbolic self-representation, spatiality and tactility
• Capacity for a high state of flow
(Re-)Spatialisation of digital modular synthesis
In recent years, advances in VR, AR and MR technologies have provided promising alternatives to physical modular systems. The primary goal of our study was to investigate whether patching modular synthesisers in virtual environments can offer a fascination comparable to physical setups, despite the absence of tangible materiality. Existing research on spatial sonic interaction in VR can be grouped into three main areas: enhancing musical interaction (e.g. Çamcı et al. Reference Çamcı, Vilaplana and Wang2020; Mitchusson Reference Mitchusson2020; Berthaut Reference Berthaut2021), applying VR and MR technologies in performance contexts (e.g. Atherton and Wang Reference Atherton and Wang2020), and exploring immersive audio-visual experiences in VR (e.g. Halac and Addy Reference Halac and Addy2020; Weinel Reference Weinel, Shiota, Kimura, Sandor and Sugimoto2021). However, none of these directly addresses modular synthesis, and such experimental interactions, experiences and visualisations often lack the versatility and productivity of traditional audio environments. In contrast, software such as Max, Pure Data and VCV Rack enables detailed audio signal shaping and modulation, while current XR technology can create three-dimensional sound laboratories equipped with DAW-like features and modular patching facilities supported by gesture and motion tracking. Scholarly research on the affordances and appeal of virtual modular systems is limited; notable exceptions include Michael Palumbo’s ‘Mischmasch XR’ project, which specifically investigates the capabilities of modular synthesis in VR (Palumbo et al. Reference Palumbo, Zonta and Wakefield2020; Wakefield et al. Reference Wakefield, Palumbo and Zonta2020).
In summary, while existing scholarly work has contributed to the development of spatial auditory interactions in VR and MR environments, it often lacks cultural depth and ethnographic richness. This existing research neglects subcultural subtleties and usage patterns around commercially developed modular synthesis applications in VR and MR such as Mux,Footnote 5 PatchWorld,Footnote 6 SynthVR,Footnote 7 SYNTHSPACEFootnote 8 (see Figure 3) and Virtuoso.Footnote 9 Users of these apps praise the spatiality of such tools in comparison with, for instance, 2D-based virtual modular environments, as one Reddit user expressed with regard to SynthVR:
(…) I have zero interest in VCV [Rack], and this is where VR is the key difference. For me part of the fun of making music with mini synths (and potentially modular) is to get away from the computer and do something that’s more physically and spatially engaging.Footnote 10

Figure 3. Screenshot of a half-filled modular case in the VR application ‘SYNTHSPACE’. The blue line in the background visualises the current audio signal.
This suggests that (re-)spatialised virtual modular systems in XR can approximate physical modular systems to some extent, precisely because they are spatial in nature. Another user expressed a similar view regarding PatchWorld:
I have been planning to get into pd for a long while but I just can’t face looking at code editors in my free time. Just looks like more work. Patchworld is ideal for me because it’s so different (…)Footnote 11
This emerging subculture of digital-spatial practice warrants further investigation, with events like ‘Patchathons’ (running on PatchWorld) combining virtual modular patching, three-dimensional storytelling and reactive visuals to create a novel form of spatial social media (Kirn Reference Kirn2021). Such research examines the extent to which similarly intense experiences and relationships – akin to those found with physical modular systems – can be achieved using virtual tools. It also explores the essential elements required for translating the modular ethos into the virtual domain and considers how these elements can be approached ethnographically, including by comparing remote and co-located MR research. Finally, it addresses the unique possibilities enabled by XR modular setups that cannot be realised through physical modular setups.
Netnographic analysis of creative communities in PatchWorld

Figure 4. VR screenshot of an audio-visual jam session hosted by Mr. Todd in PatchWorld in September 2023 (image courtesy of PatchXR).Footnote 12
Emerging from its predecessor MuX – a modular audio patching system for PC-based virtual reality – PatchXR’s PatchWorld has expanded into a collaborative XR platform since the recent introduction of multi-user capabilities. Described by music producer and YouTube reviewer Benn Jordan as the ‘Burning Man of the Metaverse’,Footnote 13 the platform distinguishes itself through its scale, complexity and active user base. For this reason, we selected PatchWorld as a case study employing a netnographic methodology (Kozinets Reference Kozinets1997; Kozinets Reference Kozinets1998; Addeo et al. Reference Addeo, Delli Paoli, Esposito and Bolcato2019). This approach examines the discourses and artefacts related to modular sound practices in PatchWorld, an XR application that integrates music creation, social interaction and virtual world-building.Footnote 14 By analysing community discussions on the app’s Discord channel and Reddit, user-generated content on YouTube and platform-specific features, we position PatchWorld within the aforementioned criteria concerning the interactivity and cultural dynamics of modular synthesisers.
Patching worlds as a social activity
PatchWorld combines modular audio synthesis with visual tools, enabling users to design interactive instruments, manipulate environmental lighting and import 2D and 3D assets via a web interface. While rooted in modular patching principles, PatchWorld operates ambiguously between game and tool: its low barrier to entry – marked by tutorials styled for teenage audiences – contrasts with its steep learning curve for advanced users. Social interactions are central to PatchWorld’s identity. Weekly public jam sessions hosted by community figure Mr. Todd foster cross-disciplinary collaboration, echoing the ethos of inclusive open-mixer events combined with XR’s global accessibility (see Figure 4).Footnote 15 This communal ethos extends to PatchWorld’s anarchic take on intellectual property: users can freely open, modify and reuse any instrument shared in jam sessions. Avatars – customised with masks and animated hand trails – explore shared virtual worlds using mobile-game-style graphics: simple textures, basic lighting and blocky geometry. Most users patch and perform pre-designed, high-level instruments – like recreations of famous grooveboxes – rather than working with low-level modular components. This stems partly from the shared edit mode, which exposes an instrument’s internal complexity to all in real time, often causing frame rate drops that discourage collaborative work on complex patches. As a result, PatchWorld typically becomes a shared jam space for higher-level instruments, music toys and visual generators – though the potential for full modular synthesis remains, in principle.
Hybridity and performative experimentation
PatchWorld’s modular framework extends beyond audio synthesis into environmental and cross-reality experimentation. A Reddit user’s polyphonic MIDI patch, for instance, connects a hardware keyboard to an array of virtual saw-tooth oscillators via an OSC bridge, balancing notes across the oscillators – a hack that underscores the platform’s potential as a mediator between physical and virtual signal flows.Footnote 16 Another user enhanced their experience by overlaying PatchWorld’s ‘AR Windows’ – floating MR rectangles anchored to specific physical locations that display the user’s surroundings via the pass-through cameras of the Meta Quest headset – onto their physical laptop and DJ controller. This setup allowed them to perform while surrounded by a 360-degree video of vividly animated club visuals, creating an immersive environment for DJing with physical hardware.Footnote 18 Such uses highlight the potential of audio-visual modular patching as a form of ‘environment synthesis’ rather than a mere instrument.
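The note-balancing behaviour of such a polyphonic bridge can be sketched as a round-robin voice allocator: each incoming MIDI note is assigned to the next oscillator in the bank. The following Python sketch is a hypothetical reconstruction of that logic; the class and method names are our own, and the actual patch runs inside PatchWorld via OSC.

```python
class VoiceAllocator:
    """Round-robin allocation of incoming MIDI notes across a fixed
    bank of (virtual) oscillators, as a hardware keyboard bridged
    via OSC might drive them."""

    def __init__(self, n_voices):
        self.n_voices = n_voices
        self.next_voice = 0   # round-robin pointer
        self.active = {}      # MIDI note number -> voice index

    def note_on(self, note):
        """Assign the note to the next voice and return its index,
        which could address e.g. an OSC path per oscillator."""
        voice = self.next_voice
        self.next_voice = (self.next_voice + 1) % self.n_voices
        self.active[note] = voice
        return voice

    def note_off(self, note):
        """Release the note; returns the freed voice index (or None)."""
        return self.active.pop(note, None)
```

Spreading notes evenly across a fixed oscillator bank is one plausible reading of ‘balancing’; more sophisticated schemes (e.g. voice stealing by age) would extend the same structure.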
Full-fledged live performances within the app further illustrate its musical potential. For instance, Gad Baruch Hinkis performed a live set as ‘GBH’ in PatchWorld, accompanied by several spectators within the same virtual space.Footnote 19 The perspective of one spectator was streamed to ‘Decentraland’, a browser-based platform for social 3D worlds, where it was displayed as a video feed for a larger audience of 3D avatars. This example underscores both the remarkable level of interconnectivity achievable with current technologies and the inherent limitations of computational power and software compatibility that necessitate such distribution methods. Hosting all participants directly within a shared PatchWorld instance would likely be too resource-intensive, and furthermore, not everyone has access to the required software or the skills to navigate it. This raises critical questions about the openness and inclusivity of the Metaverse as a truly public space, suggesting that such aspirations may remain elusive in the near future unless more open platforms and development standards such as WebXR become prevalent.
Another example of a modular live set comes from Riccardo Ferri, who designed and patched his own devices to deliver a dynamic, multi-instrument techno performance (see Figure 5).Footnote 20 His setup, requiring ‘hundreds of hours creating huge complex rooms crammed with kit’,Footnote 21 relies on invisible sub-patches to conceal complexity beneath dozens of interface elements and sequencers that float freely around the performer. In his live set, Ferri recorded himself multiple times while manipulating parameters in phase with the music – a technique reminiscent of traditional overdubbing. Rather than layering audio track by track, however, the telemetric transform data of the headset and controllers are recorded and then replayed and applied to the modules of a (static) patch as a multichannel performance. Such performances highlight both the platform’s creative flexibility and its technical challenges, as such heavy patching often necessitates PC-based rendering over standalone headset operation, an option that Ferri had access to for working on that project as a beta tester of PatchWorld. Another example of such ‘avatar overdubbing’ is featured in the gallery of PatchXR’s website, where the two performers of a duet were recorded separately in Mexico City and Geneva at different points in time and then brought together in a shared virtual space.Footnote 22
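This ‘avatar overdubbing’ can be illustrated schematically: instead of audio, each pass records timestamped headset and controller poses, which are later sampled in parallel and applied to the static patch. The following Python sketch reflects our reading of the technique; the data layout and names are illustrative assumptions, not PatchWorld’s internal format.

```python
import bisect

class TransformTrack:
    """One overdub layer: timestamped pose samples of a headset or
    controller, to be replayed against a static patch."""

    def __init__(self):
        self.times = []   # strictly increasing timestamps (seconds)
        self.poses = []   # opaque pose payloads (e.g. position, rotation)

    def record(self, t, pose):
        self.times.append(t)
        self.poses.append(pose)

    def sample(self, t):
        """Return the most recent recorded pose at playback time t
        (sample-and-hold between samples)."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.poses[max(i, 0)]

def replay(tracks, t):
    """Sample all overdubbed layers at once, as when several recorded
    avatar instances manipulate the same patch simultaneously."""
    return [track.sample(t) for track in tracks]
```

Because only transform telemetry is stored, each layer remains editable against the live patch, unlike a rendered audio overdub.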

Figure 5. VR screenshot of a techno live set built in PatchWorld by Riccardo Ferri in 2021 with a total of six overdubbed instances of his avatar replaying at once (image courtesy of Riccardo Ferri).Footnote 17
Evaluation
How does the advent of PatchWorld as a social creative platform compare to the physical modular ecosystem in regards to the evaluation criteria outlined earlier? Such an assessment reveals tensions between accessibility and ephemerality. While the advent of affordable tools is also a driving factor here, as the platform benefits from subsidised VR hardware and the metaverse’s cultural momentum, such a closed ecosystem – dependent on Meta’s infrastructure and PatchXR’s closed-source, centralised development – makes it difficult to ascribe a non-ephemeral quality to it. In addition, user-generated assets are managed through a central web server, rendering creations vulnerable to server discontinuation. This contrasts with grassroots modular communities, which clearly emphasise open standards and user agency. Yet the platform’s social infrastructure – exemplified by Mr. Todd’s sessions and intense feedback loops between developers and users on Discord – offers a counterpoint to its corporate underpinnings. These initiatives foster a sense of communal experimentation, albeit within a commercial framework. PatchWorld epitomises the contradictions of contemporary XR platforms: a space where modular experimentation collides with platform dependency and grassroots collectivity coexists with corporate infrastructure. Its significance lies not in technical perfection but in its reimagining of music-making as a spatially and socially distributed practice – a vision constrained by its material conditions yet vibrant in its participatory potential. As such, PatchWorld serves as a critical site for examining how modularity, as both a technical paradigm and a cultural ethos, adapts to the demands of the metaverse age.
Expanding virtual ethnographies with OpenSoundLab
OSL is an MR sound laboratory built upon SoundStage VR, developed in our ongoing research and independent work. The first iteration of OSL was conceived as part of Ludwig Zeller’s personal artistic practice, whereas the second iteration was later funded by the Academy of Art and Design Basel to serve as an educational tool during the Coronavirus pandemic (Zeller and Barfuss Reference Zeller and Barfuss2022, see Figure 6). In comparison to SYNTHSPACE (see Figure 3) and SynthVR, the original SoundStage VR already distinguished itself through its open-source licensing and emphasis on innovative 3D modular patching, moving beyond imitative recreations of Eurorack systems. OSL further evolves this concept, shifting away from SoundStage VR’s gamified structure to position itself as a professional and scholarly tool – a wireless, self-contained MR platform integrating pass-through visuals that blend digital elements with the user’s physical environment. Critical features standard in creative modular synthesis workflows – including foundational components like voltage-controlled amplifiers, sample-and-hold modules, flexible delay units and the exponentially scaled ‘V/Octave’ pitch standard synonymous with Eurorack – remained underdeveloped in SoundStage VR, necessitating our enhancements. Furthermore, since SoundStage VR was initially built for Microsoft Windows, optimisation was required to adapt it for ARM-based Android systems, enabling compatibility with standalone VR devices such as the Meta Quest series.
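The ‘V/Octave’ standard mentioned above maps control voltage exponentially to frequency: each additional volt doubles the pitch, which is what keeps intervals musically consistent across the voltage range. A minimal sketch of this relation (the 0 V reference frequency of middle C is our own assumption; implementations are free to choose their own):

```python
def volt_per_octave(voltage, base_freq=261.63):
    """Exponential V/Octave pitch scaling: +1 V doubles the frequency,
    -1 V halves it. base_freq is the frequency at 0 V (here middle C,
    an illustrative choice)."""
    return base_freq * (2.0 ** voltage)
```

Under this convention a semitone corresponds to 1/12 V, which is why analogue sequencers and keyboards adhering to the standard remain interoperable across decades of hardware.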

Figure 6. Mixed-reality screenshots of an OSL workshop led by Ludwig Zeller at the Academy of Art and Design in Basel, Switzerland, in September 2024. Please note that there was no depth occlusion in this older version of the app.
While PatchWorld is a strong, actively developed tool with broad adoption, we created a third iteration of OSL as both a research platform and open-source foundation for scholars and practitioners. An open-source approach is essential for building a modular, widely adopted ecosystem and enabling rigorous research. An MR-first design is critical, as modularity thrives on dynamic interaction among diverse elements, systems and people. In contrast, VR often hinders real-time collaboration in shared spaces – whether in a studio or on stage – and restricts physical movement. Though PatchWorld lowers entry barriers for younger users and explores innovative XR experiences, it diverges from the needs of (semi-)professional and academic users. While its exploration of novel user experiences is valuable, replicating physical form factors, affordances and designs – such as in SYNTHSPACE (Figure 3) – would be overly restrictive and unimaginative. PatchWorld ultimately overrelies on visually overwhelming spectacle, diverting focus from streamlined modular audio synthesis, which remains central to our goals.
Building on the second iteration of OSL, we introduced comprehensive multiplayer functionalities for two key purposes. First, to advance OSL as a modern platform for spatialised modular practices in XR, meeting contemporary needs. Multiplayer support enables collaboration among artists and researchers, offers third-person spectator views for recordings or performances and allows live audience participation – adding a dynamic social layer. Combined with remote connectivity, it unlocks new modes of interaction unachievable with traditional physical modular systems. Second, we aimed to expand OSL’s role in ‘virtual ethnography’ (Angelone Reference Angelone2019), a methodology traditionally reliant on participatory observation in avatar-based platforms like Second Life. With extended realities, this approach can now include direct spatial interaction within XR environments or third-person observation – overcoming the limitations of first-person screen feeds, which lack stereoscopic depth and fail to capture spatial nuance. Besides virtual ethnography, our work extends methods such as immersive netnography (Kozinets Reference Kozinets2022), screencast videography (Kawaf Reference Kawaf2019) and volumetric performance capture experiments (McIlvenny Reference McIlvenny2020). While existing research on virtual ethnography remains limited, much of it centres on ‘researcher-as-avatar’ in platforms like Second Life (Boellstorff Reference Boellstorff2008; Kozinets and Kedzior Reference Kozinets, Kedzior, Wood and Solomon2009; Kozinets Reference Kozinets2015).
Technical description of networking and mixed-reality co-location
The key feature for this third iteration of OSL – as already stated above – is the implementation of both local and remote multiplayer functionality. To achieve this, we needed a solution that was stable, fast and compatible with OSL’s open-source nature.Footnote 23 For network data integration and transport, we chose the open-source Unity network library Mirror.Footnote 24 For voice-chat functionality, we integrated the open-source Unity component UniVoice,Footnote 25 which required adaptation to work with the Mirror API. Additionally, we implemented the Unity RelayFootnote 26 peer-to-peer system to facilitate remote connections among users. With these three systems integrated, OSL successfully enabled remote and local multi-user connections. When multiple headsets connect, either over a local network or via the relay peer-to-peer system, device data and manipulations are transferred from each client to the host and subsequently broadcast to all other clients. In our tests, the round-trip time for local Wi-Fi connections ranged between 20 and 50 ms, while relay-based connections typically ranged between 150 and 300 ms. To ensure co-location and co-presence (Heeter Reference Heeter1992; Schroeder Reference Schroeder2010) for users sharing the same physical space, we implemented a central calibration method developed at the Immersive Arts Space of the Zurich University of the Arts.Footnote 27 This calibration method superimposes the virtual space with the real space of each headset.
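The host-and-broadcast topology described above can be reduced to the following schematic sketch: clients submit their manipulations to the host, which rebroadcasts them to every other client. This is a conceptual reduction in Python, not the actual Mirror API used in OSL’s Unity implementation; all names are illustrative.

```python
class Host:
    """Star topology: state changes flow client -> host -> all
    other clients, whether over local Wi-Fi or a relay."""

    def __init__(self):
        self.clients = []

    def connect(self, client):
        self.clients.append(client)
        client.host = self

    def submit(self, sender, message):
        # Broadcast to everyone except the originating client,
        # which already holds the new state locally.
        for client in self.clients:
            if client is not sender:
                client.receive(message)

class Client:
    def __init__(self, name):
        self.name = name
        self.host = None
        self.inbox = []   # received manipulations, e.g. knob changes

    def manipulate(self, message):
        self.host.submit(self, message)

    def receive(self, message):
        self.inbox.append(message)
```

In the real system the submit/receive paths carry the latencies reported above (roughly 20–50 ms locally, 150–300 ms via relay), which is why responsiveness-critical state is applied locally first.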
In addition to the technical challenges, we addressed key design questions regarding what device interactions should be synchronised and what should remain unsynchronised. Key elements that are synchronised include device positioning, plug connections and the states of knobs, sliders, buttons, etc. The spatial precision of device transformations required millimetre-scale accuracy, as XR headsets allow users to closely inspect virtual objects. Without fine-grained syncing, inconsistencies between headsets would arise. Social presence was enhanced by syncing users’ hand and head positions, and we later added name tags to better identify remote participants. Conversely, user menus were deliberately excluded from synchronisation to allow individual control.
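The synchronisation policy above can be summarised as a simple schema: some fields of a device’s state are replicated across headsets, others stay local. The sketch below is our own illustrative formalisation (field names and types are assumptions, not OSL’s data model).

```python
from dataclasses import dataclass, field

# Replicated per the policy described above: device transforms,
# plug connections and control states. Per-user menu state stays local.
SYNCED_FIELDS = {"position", "rotation", "plugs", "controls"}

@dataclass
class DeviceState:
    position: tuple = (0.0, 0.0, 0.0)        # needs millimetre accuracy
    rotation: tuple = (0.0, 0.0, 0.0, 1.0)   # quaternion
    plugs: dict = field(default_factory=dict)     # jack -> cable target
    controls: dict = field(default_factory=dict)  # knobs, sliders, buttons
    menu_open: bool = False                        # deliberately local

def sync_delta(old, new):
    """Collect only the synchronised fields that actually changed,
    keeping network messages small."""
    return {name: getattr(new, name)
            for name in SYNCED_FIELDS
            if getattr(old, name) != getattr(new, name)}
```

Sending deltas rather than full states is one plausible way to meet the millimetre-scale precision requirement without flooding the network on every frame.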
After synchronising the interface, a significant challenge remained: synchronising the generated audio on each headset. An initial approach was to stream the host’s audio output to all other headsets. However, this proved to be impractical due to the high latency, and thus poor responsiveness, that this approach would have entailed. Additionally, for more elaborate scenarios, distributed audio calculations were mandatory, e.g. when each person needs an individual binaural perspective. Therefore, we opted to synchronise key audio parameters such as noise seeds,Footnote 28 phasesFootnote 29 and certain events and state valuesFootnote 30 for all devices and modules.Footnote 31 These parameters are synced whenever they change, ensuring accurate audio reproduction. When a new headset joins an ongoing session, it first loads all parameter values from the host headset to generate the correct audio output.
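The parameter-based approach can be illustrated with the noise-seed case: instead of streaming rendered audio, each headset regenerates sample-identical output locally from the synchronised seed. The sketch below demonstrates the principle only and is not OSL’s actual DSP code.

```python
import random

def render_noise(seed, n_samples):
    """Each headset renders its own noise buffer locally; sharing the
    seed guarantees bit-identical samples without streaming audio."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

# The host syncs the seed once (here 42, an arbitrary example value);
# host and client then produce the same buffer independently.
host_buffer = render_noise(seed=42, n_samples=256)
client_buffer = render_noise(seed=42, n_samples=256)
```

The same logic extends to oscillator phases and discrete state values: synchronising the few parameters that determine the signal is far cheaper than synchronising the signal itself, and it leaves each headset free to apply its own binaural rendering.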
The addition of fine-tuned user-experience elements, such as passing tapes, devices and plugs from person to person, created a dynamic and expressive creative environment, enhancing social interactions that were not possible in the older versions. For users in the same physical room, the Meta Quest 3 headset’s DepthAPIFootnote 32 was integrated to render virtual objects behind physical ones. Without it, depth perception was confusing, tiring and at times simply impossible. This became even more apparent when watching monoscopic recordings afterwards.
Evaluation of the experiments
OSL constitutes a distinct case within the landscape of virtual modular systems. It is intentionally streamlined and strongly oriented towards rapid, live patching and – while retaining selective analogies to physical modular devices – deliberately avoids visual spectacle in favour of a clean, functional design. Beyond this, OSL’s MR configuration establishes a direct connection to situated studio practice and collaboration, allowing virtual modular structures to be embedded within, and negotiated alongside, physical workspaces and co-present performers.
Moving beyond theoretical comparison and platform-level analysis, we conducted a series of practice-based experiments to examine how the criteria identified earlier – spatiality, tactility, non-symbolic interaction, flow and social embedding – manifest in situated use within this specific tool. The experiments addressed whether the relative effortlessness of digital spatial patching – liberated from material and economic constraints, yet grounded in simulated haptics, spatial organisation and binaural spatialisation – can offer a compelling alternative to the appeal of physical modular systems. Three creative sound practitioners, Anselm Bauer, Thomas Meckel and Dario Klein, were selected and each given a Meta Quest 3 headset with the multiplayer version of OSL installed. The distribution of MR headsets and software acted as a new type of ‘cultural probe’ (cf. Gaver et al. Reference Gaver, Dunne and Pacenti1999; Celikoglu et al. Reference Celikoglu, Ogut and Krippendorff2017; Townsend and Patsarika Reference Townsend, Patsarika, Comunello, Martire and Sabetta2022) in the form of an invitation for the participants to incorporate virtual patching with OSL into their respective artistic practices. The evaluation spanned approximately ten weeks, during which participants used the headsets in a series of jam sessions, instructional workshops and one-on-one tutorials. These tutorials provided support for patching techniques, either individually or in group settings. Following this phase, each participant took part in an interview and delivered a final demonstration or performance, which was video recorded from both the artist’s and interviewer’s perspectives. Finally, a quantitative questionnaire was provided, asking participants to rate aspects of virtual modular systems versus physical ones. This questionnaire guided the interview by prompting discussions about specific responses and preferences.
The recordings of the jam sessions and final demonstrations as well as other OSL-related activities are archived via Zenodo.Footnote 33
Anselm Bauer actively engages in modular streaming communities on Twitch, where he hosts his own channel and regularly participates in streaming events.Footnote 34 He has tested OSL’s first two iterations since 2021. In the third iteration’s experimental phase, he ran three jam sessions mainly for evaluation. Anselm then created installation and performance patches that he streamed live on Twitch several times. His works included: (1) a spatial ambient patch with slowly evolving chords and random samples placed around the listener; (2) an installation-style ‘city’ patch arranging modules into a floor-based cityscape, reimagining oscillators as buildings and cables as pathways so users could walk through the sound; (3) a house-sized speaker sculpture in a public park that blended dystopian visuals with ominous ambient tones to evoke awe at its scale (see Figure 7); and (4) a performative tunnel patch shaped like a labyrinth, where users wielded virtual drumsticks to navigate and generate sounds, merging gameplay with musical interaction for an immersive ‘dungeon-crawling’ feel.

Figure 7. Mixed-reality screenshot of a generative ambient patch built in OSL and placed as a roughly 20m high ‘modular plastic’ in a public park by Anselm Bauer.
Virtual–physical space is a core theme in Anselm’s OSL work. He explores embedding digital patches into real environments, creating site-specific audio pieces. By arranging modules as interactive installations that fuse sound and sight, he can prototype spatial concepts quickly – something traditional hardware often hinders. This method hints at live shows where spatial dynamics and visual staging could deepen engagement, whether in person or remotely. However, the ephemerality of these installations – virtual elements vanish without headsets – limits broader accessibility. He proposes AR previews via QR codes or GPS-anchored smartphone interfaces to bridge this gap. Public settings pose technical hurdles: tracking errors or occlusion in large or brightly lit spaces challenge large-scale setups. For hybrid projects, he notes that improving dynamic light and shadow rendering for virtual modules would enhance visual and spatial cohesion. Performance limits also arose. Overloading patches with modules can destabilise frame rates, especially in his complex projects. While digital tools promise limitless replication, the practical technical constraints are frustrating. Reliability issues surface during live streams; unexpected Meta Horizon OS interruptions – such as pop-up notifications or mandatory updates – have disrupted critical moments. Despite these problems, Anselm values OSL’s tactile interface and the cognitive clarity and enjoyment he gains from spatial patching. He continues to use it frequently but cites its comparatively small module library – as opposed to Eurorack or VCV – as the main barrier to more extensive adoption.
Thomas Meckel is a performance and installation artist and musician with experience in Unity and HTC Vive Lighthouse tracking for sound installations.Footnote 35 New to OSL, he mastered the app in two sessions, drawing on his Max, Pure Data and hardware synth expertise. His project was a percussive instrument triggered by three-dimensional hand gestures, using separate XYZ coordinate systems for each hand (see Figure 8). In the right-hand system, the z-axis launched audio samples and modulated volume like an envelope or pedal; the x-axis crossfaded between two interchangeable sample banks (flute and guitar). The left-hand system mapped pitch to the y-axis via an attenuator for fine tuning, with an optional Pentatonic Minor quantiser that Thomas usually disabled to favour fluid glides and tremolo reminiscent of a Theremin. The left hand also controlled reverb intensity via the z-axis. Outputs were routed to spatially distributed speakers, creating an immersive sound field.
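Thomas’s hand mappings amount to a handful of transfer functions. The sketch below is our reconstruction in Python; the ranges and names are chosen for illustration and are not taken from the patch itself:

```python
# Pentatonic minor scale degrees, in semitones above the root
PENT_MINOR = [0, 3, 5, 7, 10]

def quantise_pent_minor(semitones: float) -> int:
    """Optional quantiser on the left hand's y-axis: snap a continuous
    pitch to the nearest pentatonic-minor degree (12 = next octave)."""
    octave, degree = divmod(semitones, 12)
    nearest = min(PENT_MINOR + [12], key=lambda d: abs(d - degree))
    return int(octave * 12 + nearest)

def crossfade(x: float, bank_a: float, bank_b: float) -> float:
    """Right hand's x-axis: linear crossfade between two sample banks
    (e.g. flute and guitar)."""
    x = max(0.0, min(1.0, x))
    return (1.0 - x) * bank_a + x * bank_b

def envelope_volume(z: float, z_max: float = 0.4) -> float:
    """Right hand's z-axis: gesture depth acts like an envelope or
    volume pedal, mapped linearly onto gain (z_max is hypothetical)."""
    return max(0.0, min(1.0, z / z_max))
```

Disabling the quantiser simply means passing the raw y-axis value straight to the oscillator, which yields the fluid Theremin-like glides that Thomas preferred.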

Figure 8. Mixed-reality screenshot of Thomas Meckel performing a percussion-oriented patch with depth occlusion enabled.
Thomas praised OSL’s tactile, hardware-like spatial control as immediate and productive, inspiring future work. This haptic interactivity sharply contrasted with traditional screen-based workflows that rely on abstract mouse and keyboard input. Dynamically performing patches in MR – akin to manipulating a synthesiser – felt creatively liberating; he likened it to three-dimensional live coding, turning patching into an embodied act. Adjusting parameters through hand gestures or controller movements, such as triggering samples by interacting with virtual objects, was described as fascinating and empowering.
While intrigued by headset-driven performances, Thomas expressed hesitation about conspicuous VR hardware on stage, citing ethical, aesthetic and practical concerns over the bulky, socially alienating design. He also raised worries about platform dependency and Meta’s gatekeeping in MR ecosystems. Though he acknowledged Meta Quest’s affordability and developer-friendly sideloading policies, he remained uneasy with centralised control. Referencing dystopian works like Snow Crash and Neuromancer, he likened the current metaverse to a privatised mall rather than an open commons.
To reconcile these tensions, Thomas suggested creative compromises. One option was raising the headset visor – like a helmet shield – after patch initialisation, allowing performers to focus on gestures while maintaining eye contact with audiences. Alternatively, he could drop the headset after setup and rely on self-tracking controllers (e.g. Meta Quest Pro) or lighthouse-tracked systems in PC-VR setups. This hybrid workflow, comparable to modular systems such as Nord Modular or Critter & Guitari’s Organelle, would let artists design in MR yet perform fully embodied in physical space, preserving spatialised precision without sacrificing stage presence. The approach suits hand-motion performances that require visual guidance but may not generalise to all interaction modalities.
Dario Klein produces and performs electronic dance music infused with Global Sound influences, frequently travelling to Morocco to teach workshops and study traditional instrumental techniques.Footnote 36 During OSL’s experimental phase, Dario hosted three collaborative jam sessions (cf. Lähdeoja and Montes de Oca Reference Lähdeoja and Montes De Oca2021) in his studio with varying setups: in one, all four participants used OSL; in others, only two used the app alongside up to four additional musicians playing electric guitars, basses and hardware groove boxes or synthesisers (see Figure 9). Throughout, he realised his vision of multichannel, sample-based drumbeat architectures and bass synth voices with generative variations. He also explored improvised, evolving sound textures via XYZ interfaces and macro-style parameter mappings for centralised control over voice characteristics, with adjustable intensity weightings for nuanced expression. Before the final interview, he tested hybrid formats combining a beat- and bass-oriented patch in OSL with live electric guitar performance while wearing the headset, aiming for physical audiences via projected virtual environments.

Figure 9. MR screenshot of Dario Klein (on the right) adjusting an OSL patch that is positioned above a table with physical audio equipment during a jam session.
Dario primarily uses physical hardware for improvisation and idea generation, employing Ableton later for multichannel recording and finishing tracks. He finds OSL a natural fit for his creative studio workflow, valuing its immersive, tactile feel, which bridges digital and physical tools. He considers screen-based interfaces uninspiring for jamming, though essential for final production. By comparison, he found the sound of VCV Rack ‘too clinical’ and less engaging. We theorise this may stem from minimal physical feedback in mouse-driven interfaces, suggesting a synesthetic link where poor tactility reduces perceived auditory quality. He praised OSL’s oscillators, filters and envelopes as ‘clean and snappy’, well-suited to electronic genres – despite most modules lacking explicit anti-aliasing. He valued OSL’s collaborative features: instant module duplication, shared patch visibility and no need to transport heavy gear. Though he favours physical instruments as frictionless tools for experimentation, he acknowledged virtual tools’ unique potential. Spatial organisation in OSL – e.g. placing drums in one corner, mixers in another – created cognitive maps and reliable anchors, fostering familiarity and emotional attachment to virtual gear, challenging the ephemeral nature of digital interfaces. These insights – on tactile influence on auditory perception and enduring digital attachment – warrant deeper exploration in embodied interaction design.
Summary
Our study investigated whether the distinctive appeal of physical modular synthesisers can be preserved in XR applications. For our evaluation, we adopted and expanded upon criteria originally proposed by Scott (Reference Scott2016) and Paradiso (Reference Paradiso2017), identifying several key factors contributing to the success of the Eurorack modular format: the non-ephemeral nature of modular systems, the rise of affordable production tools, vibrant grassroots communities of builders and artists, non-symbolic self-representation, spatiality, tactility and the potential for deep states of flow. In our analysis and evaluation, both interactive elements and collective, socio-technological dimensions were central to assessing whether virtual modular sound applications can offer creative experiences that are not only comparable to but also potentially even superior to those provided by physical modular ecosystems. Using netnography, we analysed social media content from YouTube, Discord and Reddit, with a focus on PatchWorld – the leading commercial virtual modular platform. Additionally, OSL, an open-source MR modular sound laboratory developed in prior work, was enhanced for multi-user MR sessions, supporting both co-located and remote collaboration.
Just as the resurgence of physical modular synthesisers was propelled by the advent of affordable tools like CAD, cost-effective electronics manufacturing and specialised online forums, so too do virtual modular systems thrive thanks to accessible development engines such as Unity and the subsidised hardware and software initiatives led by Meta Reality Labs. Nonetheless, our research indicates that XR platforms operating on closed-source systems – such as Meta – generally lack the non-ephemeral quality often attributed to physical modular systems. Ethical stakes of corporate platform control loom at multiple levels: the Meta platform as a social media conglomerate, the closed-source Meta Horizon OS (which does not grant admin rights), the closed-source Unity game engine and, finally, the application itself. The desire for open-source frameworks that retain the medium’s accessibility while reducing reliance on centralised systems is exemplified by OSL’s open-source licensing model, which supports the right to hack and the right to repair (although commercial use is limited). This framework encourages artists and scholars to adapt and deploy OSL across diverse hardware platforms, though its long-term viability remains contingent on Unity’s ongoing development as a centralised, closed-source platform. As MR evolves, the community’s ability to navigate these tensions – leveraging corporate advancements without surrendering autonomy – will determine whether MR becomes a democratised creative space or a corporatised ‘metaverse mall’. One strategic approach is to treat platforms like Meta Quest merely as stepping stones for more open platforms to come in the future, using their technological developments to foster independent, community-driven tools.
Despite these challenges, PatchWorld stands as a remarkably positive example of how developers and artist-users collaborate as equals, akin to grassroots communities of creators and innovators found in the physical modular ecosystem.
Overall, OSL was praised for reimagining music creation by harnessing XR’s spatial potential, especially for experimental and performative genres. Participants found that OSL offers an inspiring, hands-on modular experience that feels more tactile and immediate than traditional screen-based software. The controllers provide pleasant haptic feedback – such as vibrations – that enhances interaction and makes actions like patching feel remarkably tangible. This combination of spatial and haptic affordances contributes to an intuitive and rewarding patching experience, fostering flow states comparable to those in physical modular systems. Although OSL cannot fully replicate the tactile, material and non-symbolic qualities of a physical modular system – given its simulated hardware environment that can glitch or disappear at any time – its emphasis on straightforward audio modular patching without cumbersome symbolic menus ensured a stimulating user experience. Virtual modular systems also offer distinct benefits. For example, the ability to save and recall modular patches is highly appreciated, and modules can be duplicated at no cost until the processor reaches its limits (with graphics generally being the bottleneck rather than audio). Users also appreciated the reduction of physical clutter. Binaural spatialisation is simple and immediately configurable, enabling walkable installations and hinting at future applications in the recording and playback of third-order Ambisonics.
Taken together, the two case studies in this paper suggest that extending modular synthesis into XR is not primarily a question of graphical realism or ‘virtualising analogue hardware’, but of whether the medium can sustain the modular ethos: non-symbolic, spatially organised interaction; the possibility of flow through direct manipulation; and the social conditions under which patching becomes a shared practice rather than a private workflow. PatchWorld demonstrates how quickly a virtual modular environment can become a participatory culture – while simultaneously foregrounding the fragility of platform-dependent creativity in closed ecosystems. OSL, by contrast, shows how an open, MR-first and deliberately streamlined patching environment can anchor virtual modular work in situated studio practice and embodied collaboration, while foregrounding how simulated haptics and spatial organisation stimulate the users’ sense of immediacy, instrumental intimacy and perceived sonic character.
These findings reframe virtual modular synthesis as a design and research problem of infrastructure and governance as much as interface: What would it mean for XR modular practices to be genuinely non-ephemeral – archivable, forkable, repairable and interoperable – rather than contingent on headset operating systems, central servers and proprietary toolchains? How should we document and circulate patches when the patch is not only a sound structure but also a spatial environment, a collaborative situation and a performative trace? And how might future XR performance conventions address the aesthetic, ethical and practical tensions raised by head-worn hardware, while still allowing audiences to ‘see’ and understand the spatial action that gives these instruments their distinctive appeal? In this sense, the contribution of this work is not simply to argue that XR modular synthesis is feasible, but to surface the more consequential question: under which technical, cultural and institutional conditions can XR become a durable, open-ended medium for modular practice rather than a short-lived novelty constrained by its platforms.
Acknowledgements
Our research has received funding from the Spark program (grant number 221307) of the Swiss National Science Foundation (SNSF). Writing tools based on artificial intelligence were used for improving language clarity.