Musical communication involves performance and perception processes, both of which engage the sensorimotor system. In much of the performance science literature, however, musical communication is conceptualized as a one-way trajectory from active performer to passive listener, minimizing the contribution of the listener and the collaborative nature of communication. In this paper, we discuss how movement contributes to 1) music performance, through sound production, interperformer coordination, and visual expressivity, and 2) music perception, through the simulation of observed gestures, activation of crossmodal associations, and induction of overt synchronized responses. Embodied music cognition, which treats musical communication as a process of dynamic interaction between individuals, and emphasizes the role of the physical body in mediating between environmental stimuli and subjective experiences, provides a background for our discussion. We conclude the paper with a discussion of how ongoing technological developments are simultaneously enhancing our ability to study musical communication (e.g., via integration of optical motion capture and mobile eye tracking) and, by introducing means of performing music that do not rely on human movement, challenging our understanding of how music and movement relate.
Musical communication encompasses performance and perception processes, both of which engage the sensorimotor system. In much of the performance science literature, however, musical communication is conceived as a one-way process from the active performer to the passive listener, which downplays the listener's contribution and the collaborative nature of communication. In this paper, we discuss how movement contributes, on the one hand, to sound production, coordination between players, and visual expressivity and, on the other hand, to music perception through the simulation of observed gestures, the activation of cross-modal associations, and the induction of overt, synchronized responses. Embodied music cognition, which treats musical communication as a process of dynamic interaction between individuals and emphasizes the role of the physical body in mediating between environmental stimuli and subjective experiences, provides a theoretical background for our discussion. We close the paper with a discussion of how current technological developments are improving our ability to observe musical communication (e.g., through the integration of optical motion capture and mobile eye tracking), while at the same time the emergence of forms of music that no longer rely on human movement (such as computer music) challenges our understanding of music and movement in new ways.
Music performance takes many forms in our society, but usually involves trained and practiced musicians playing for an audience. In some musical traditions, conventions dictate silent and motionless listening behaviour from the audience; in other musical traditions, the audience is encouraged to move to or sing along with the music. Occasionally, music performance takes the form of a participatory activity that people do together as a group.
Music performance thus provides a venue for interaction between people – indeed, some scientists hypothesize that social bonding effects, in part, encouraged the widespread evolution of music-making abilities in early humans (
Music production and perception are active processes that engage overlapping perceptual-motor mechanisms, and movement forms a critical and inseparable part of both forms of musical experience. Our aim in this paper is to highlight the role of the sensorimotor system in music perception and the role of the audience in musical communication. We situate our discussion in the theoretical context of embodiment, which defines cognition as encompassing both internal processing and observable interactions with the world (
Following a discussion of the embodied music cognition paradigm, we consider the “musical product” (i.e., the presented performance) in terms of the movements – sound-producing, communicative, and expressive – that go into it. We explore the idea that movement not only underpins music production, but is also a part of the musical product itself (e.g., when it carries expressive and/or communicative information that supplements the audio signal). In the subsequent section, we discuss how movement underlies the audiovisual perception of performed music. We consider how audience members’ prior experiences shape their perception of sound-producing movements, and how sounded music activates sensations of motion in listeners and, in some cases, encourages overt movement. We close with a discussion of how recent technological developments have simultaneously given us improved means to study musical communication and raised new questions for researchers to address.
Embodied music cognition considers musical communication to be a nonlinear process characterized by dynamic interactions between performers, listeners, and their shared environment. The paradigm is in contrast to the “individualist” approach, which treats performers and listeners as separable from each other and from the musical stimulus, potentially understating the extent to which these three components interact (
Within the embodiment paradigm, there are diverging perspectives regarding the possible role of representational cognition in musical communication. Drawing on dynamical systems theory, some argue against the use of mental representations, and instead propose a framework in which musical interaction is dynamic, emergent, and autonomous (
The research presented in this paper is largely in line with this more moderate approach to embodiment. From this perspective, interaction with the environment occurs by way of action-perception loops. As also described in the literature on perceptual-motor coupling, actions are coded in the brain in terms of the consequences they have on the environment (
The embodiment paradigm has been criticized for being overly broad and poorly defined – while its claims are broadly consistent with findings in the literature, it does not generate hypotheses specific enough to be tested empirically against the alternative (disembodied) explanation (
Traditionally, body movement has been necessary for musical sound production, but musicians’ gestures serve other functions too: they enable changes in tone quality, facilitate coordination between ensemble members, convey expressive information to the audience, and support the performer’s own affective experience (
A central aim of the research on instrumental playing technique is to determine how variations in acoustic parameters are controlled by the performer. At a cognitive level, some musicians report focusing on the image of a desired sound as they play, which they say helps to guide their performance (
The motor commands used to externalize these images and deliver musical output have been studied in increasing depth as technology for capturing fine movements has improved (
How skilled musicians manipulate timbre and dynamics has also been a focus of study. In an investigation of piano timbre,
Ensemble musicians aim primarily to coordinate their sound. Sometimes they deliberately coordinate their sound-producing gestures as well – for example, orchestral string musicians typically use coordinated bowing patterns. More often, though, the sounds that must be coordinated are the result of different types of sound-producing gestures, which require different attack durations. Some coordination of expressive gestures (e.g., body sway) also occurs (
To coordinate their sound, ensemble musicians do not generally need to be able to see each other’s movements. Both trained and novice musicians synchronize with sounded rhythms in the absence of visual cues, even when the sounded rhythm contains irregularities and error-correction is needed to maintain synchronization (
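To make the idea of sound-based error correction concrete, the following is a minimal sketch of a linear phase-correction model of the kind commonly used to describe sensorimotor synchronization. The parameter values (correction gain, noise level, nominal period) are illustrative rather than taken from the studies cited above.

```python
import numpy as np

def simulate_phase_correction(onsets, alpha=0.5, noise_sd=10.0,
                              nominal_period=600.0, seed=0):
    """Simulate tap times (ms) of a listener synchronizing with a sounded rhythm.

    Each new inter-tap interval is the tapper's current period estimate (the most
    recently heard inter-onset interval), minus a correction of `alpha` times the
    previous tap-onset asynchrony, plus Gaussian motor noise.
    """
    rng = np.random.default_rng(seed)
    taps = [onsets[0] + rng.normal(0.0, noise_sd)]  # first tap lands near the first onset
    for n in range(1, len(onsets)):
        period = onsets[n - 1] - onsets[n - 2] if n >= 2 else nominal_period
        asynchrony = taps[-1] - onsets[n - 1]
        taps.append(taps[-1] + period - alpha * asynchrony + rng.normal(0.0, noise_sd))
    return np.asarray(taps)

# A nominal 600-ms pulse with small random timing irregularities
rng = np.random.default_rng(1)
onsets = np.cumsum(600.0 + rng.normal(0.0, 20.0, size=30))
taps = simulate_phase_correction(onsets)
print("mean absolute asynchrony (ms):", round(float(np.mean(np.abs(taps - onsets))), 1))
```

With a moderate correction gain the simulated taps stay locked to the irregular pulse; with the gain set to zero, asynchronies accumulate as a random walk, illustrating why error correction is needed to maintain synchronization.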
Some recent studies of ours have considered the visual signals that are exchanged at piece onset (
Conductors’ gestures have been subjected to similar analysis. A study by
Of course, such explicit visual signalling as we describe here occurs relatively infrequently. It has been proposed that ensemble coordination is largely supported by an exchange of low-level sensory information (e.g., psychoacoustic sound parameters and movement kinematics) that induces entrainment between performers (
Musicians’ body movements are of substantial communicative value to an observing audience. The question of how visual cues in the form of observed body gestures contribute to our perception of music performance has generated some debate among researchers and, of course, among musicians, whose primary focus is the sound their movements produce, and who are not always pleased to think that the visual modality could have a dominant impact on their audience’s experience.
One line of research in this area tests the hypothesis that auditory and visual contributions to the perception of music expression are integrated, rather than additive. In a study by
The magnitude of the effect that visual cues can have on even a highly-trained audience’s perception of expressivity was demonstrated by
The research we describe here shows that the audience’s perception of a musical performance derives from an integration of auditory and visual kinematic cues, and it highlights the audience’s role in assigning meaning to the performance. In the next section, we consider how movement can be perceived through musical sound as well as visually.
Musical communication is a creative, dynamic process comprising performance and perception components. In the previous section, we considered how body movements underlie performers’ contributions to musical communication, and we must acknowledge that the audience’s feedback, whether real or imagined, helps to shape those movements. When engaging with a musical performance, audience members draw on their own abilities, constraints, and experiences to construct some meaning from the performers’ audiovisual signals. As such, they can be considered active contributors to the creative process of musical communication.
The term “communicative musicality” has been used to describe how coordinated companionship arises from the temporal and expressive attributes of social behaviour (
The embodied music cognition paradigm posits that movement underlies music perception just as it underlies performance. In this section, we discuss 1) how audience members’ perceptions of sound-producing movement are shaped by their prior experience, 2) how movement is “heard” in sounded music through associations of motion and acoustic parameters, and 3) why music sometimes prompts listeners to move.
Audience members become active participants in a music performance the moment the sounded and/or observed music enters their perceptual systems. If the performance can be seen and heard, a critical part of the perception process is the binding of auditory and visual signals into distinct perceptual events that correspond to sound-producing gestures and their acoustic effects. This process of audiovisual integration draws on action-perception loops that strengthen with exposure to different gestures and their associated effects. The more experience a person has with a repertoire of gesture-sound pairs, the more precise their gesture-sound associations become. Strengthened associations have been observed for pitch (
Tolerance for asynchrony or incongruency in perceived gesture-sound pairs also depends on the type of motion observed and the type of sound produced. Less asynchrony is tolerated for piano playing than for (bowed) violin playing (
Strengthening of action-perception loops occurs with both perceptual and motor experience. For example, among pianists, listening practice (without overt movement) results in better recall (i.e., performance) of simple melodies than does motor practice without sound (
On the other hand, the learning benefits of combined perceptual-motor experience have been shown to outweigh those of perceptual experience without overt movement.
These findings suggest that audience members draw on movement in the form of action-perception loops during the early stages of interpreting perceived music, when the binding of audio and visual signals occurs. As discussed above, the way audio and visual signals combine has a potentially strong influence over audience members’ perceptions of expression. The learning that occurs with observation of others’ performance, even if it occurs to a lesser extent than with overt practice, shows how the perceptual-motor system is tuned to change in a way that facilitates prediction abilities and supports multisensory associations.
Cross-modal correspondences are symmetric associations that people make between parameters in different sensory modalities. Associations between pitch height and spatial height, for example, are widespread and seemingly independent of musical training and linguistic background (
Some of the features that people associate with acoustic parameters they also associate with emotional constructs. For example, some positively-valenced words (e.g., happy) are associated with a high spatial position, while their antonyms (e.g., sad) are associated with a low spatial position (
Taking a different perspective, the “FEELA” (Force-Effort-Energy-Loudness-Arousal) hypothesis relates affective parameters of music (e.g., arousal) to the corresponding acoustic parameters (e.g., acoustic intensity) and parameters of the movement needed to produce the underlying sound (
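As a toy illustration of the production chain the FEELA hypothesis describes (more forceful, effortful movement yields more acoustic energy, which listeners hear as louder and read as more aroused), the sketch below maps the RMS level of an audio frame onto a normalized arousal value. The dB range and the linear mapping are placeholders chosen for illustration, not values proposed in the literature.

```python
import numpy as np

def rms_dbfs(frame, eps=1e-12):
    """RMS level of an audio frame in dB relative to full scale."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(frame))) + eps)

def predicted_arousal(db_level, db_min=-60.0, db_max=0.0):
    """Map a dB level linearly onto a 0-1 arousal scale (placeholder mapping)."""
    return float(np.clip((db_level - db_min) / (db_max - db_min), 0.0, 1.0))

# Two one-second 440 Hz tones: a softly played note and a forcefully played one
sr = 44100
t = np.linspace(0.0, 1.0, sr, endpoint=False)
soft = 0.05 * np.sin(2 * np.pi * 440 * t)   # low force/effort -> low acoustic energy
loud = 0.80 * np.sin(2 * np.pi * 440 * t)   # high force/effort -> high acoustic energy
for name, sig in [("soft", soft), ("loud", loud)]:
    print(name, round(predicted_arousal(rms_dbfs(sig)), 2))
```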
Studies of cross-modal correspondences suggest that part of the meaning an audience gets from perceived music can come from the associations they make between acoustic and motion parameters. While hearing movement in music is itself potentially meaningful, associations with emotional constructs may additionally contribute. Such findings are in line with the proposed role of the human body as a mediator between subjective experience and the environment that is involved in the construction of musical meaning (
Some music induces a sense of movement in listeners, inciting in both trained and novice musicians an urge to synchronize with the beat (
Other parameters have been shown to contribute to the perception of groove for both trained and novice musicians.
The results of the study by
In recent years, we have seen the development of sensor and camera systems that measure musicians’ movements and audience perceptions of them in great detail. Simultaneously, with the advent of technology-mediated performance, computers have been playing an increasing role in human musical performance, disrupting the traditional relationship between musical sound and movement (
The techniques available for studying music-related movements include sensors capable of making fine-grained measurements of movement parameters that are not readily apparent to an external viewer, such as finger forces in pianists (
Inertial and optical motion capture systems are widely used in the study of performance gestures (
Most eye tracking systems use infrared illumination and cameras to locate the pupil and the corneal reflection, and they estimate gaze from the relative positions of the two. Eye tracking systems can be remote or mobile: remote systems are typically mounted to a computer screen and are suitable for monitoring gaze directed towards a stationary stimulus (e.g., text or images), while mobile systems are mounted on the subject’s head and can be used to study gaze behaviour in a 3D environment (e.g., a performance space). Both remote and mobile eye trackers calculate measures such as pupil position, point of regard, and pupil dilation. Mobile eye tracking is a useful technique for studying gaze behaviour in performing musicians, who typically require more freedom of movement than remote eye trackers can accommodate.
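As an illustration of the kind of processing applied to the gaze samples such systems record, the following is a minimal sketch of a dispersion-based fixation detection routine (in the spirit of the widely used I-DT approach). The dispersion and duration thresholds are illustrative and would need tuning to the tracker and task.

```python
import numpy as np

def detect_fixations(x, y, t, max_dispersion=1.0, min_duration=0.10):
    """Dispersion-based (I-DT style) fixation detection.

    x, y: gaze coordinates (e.g., degrees of visual angle); t: timestamps (s).
    A fixation is a run of samples whose horizontal plus vertical spread stays
    within `max_dispersion` for at least `min_duration` seconds.
    Returns (start_time, end_time, centroid_x, centroid_y) tuples.
    """
    fixations, start = [], 0
    for end in range(1, len(t)):
        spread = np.ptp(x[start:end + 1]) + np.ptp(y[start:end + 1])
        if spread > max_dispersion:                      # window just broke apart
            if t[end - 1] - t[start] >= min_duration:
                fixations.append((t[start], t[end - 1],
                                  float(np.mean(x[start:end])),
                                  float(np.mean(y[start:end]))))
            start = end                                  # start a new window here
    if t[-1] - t[start] >= min_duration:                 # close the final window
        fixations.append((t[start], t[-1],
                          float(np.mean(x[start:])), float(np.mean(y[start:]))))
    return fixations
```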
There are ongoing efforts by several research groups, including ours, to integrate mobile eye tracking and optical motion capture (
An integrated motion capture-eye tracking system greatly simplifies the analysis of eye gaze data. Mobile eye tracking typically requires extensive manual coding of video data because, in contrast to remote eye tracking, the visual scene is constantly changing and areas of interest are not static. If mobile eye trackers are used in combination with motion capture, however, detection of subjects’ gaze targets can be automatized by remapping gaze coordinates into the motion capture coordinate system. Moments when the gaze target is the musical score or another performer (i.e., an object or person defined with markers) are then readily identifiable.
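A minimal sketch of this remapping step is given below, assuming the head’s rigid-body pose (position and rotation) is available from the motion capture system and the eye tracker reports a gaze direction in a head-fixed frame. The function and variable names are hypothetical, and a real pipeline would also require a per-subject calibration of the eye-tracker-to-head transform.

```python
import numpy as np

def gaze_ray_in_world(head_pos, head_rot, gaze_dir_local, eye_offset_local=np.zeros(3)):
    """Express a mobile eye tracker's gaze ray in motion-capture world coordinates.

    head_pos: head rigid-body origin in world coordinates (from head markers).
    head_rot: 3x3 rotation matrix, head frame -> world frame.
    gaze_dir_local: gaze direction in the eye tracker's (head-fixed) frame.
    eye_offset_local: position of the eye/scene camera within the head frame.
    """
    origin = head_pos + head_rot @ eye_offset_local
    direction = head_rot @ gaze_dir_local
    return origin, direction / np.linalg.norm(direction)

def gaze_target_hit(origin, direction, target_pos, radius=0.15):
    """True if the gaze ray passes within `radius` metres of a marker-defined target."""
    v = target_pos - origin
    along = np.dot(v, direction)
    if along < 0:                      # target is behind the viewer
        return False
    closest = origin + along * direction
    return np.linalg.norm(target_pos - closest) <= radius

# Example: is the performer looking at the score or at a co-performer's head?
origin, direction = gaze_ray_in_world(
    head_pos=np.array([0.0, 0.0, 1.6]),
    head_rot=np.eye(3),                          # head aligned with world axes
    gaze_dir_local=np.array([0.0, 1.0, -0.2]))
print("score:", gaze_target_hit(origin, direction, np.array([0.0, 1.2, 1.3])))
print("co-performer:", gaze_target_hit(origin, direction, np.array([1.5, 2.0, 1.6])))
```

Gaze targets such as the score or a co-performer can then be flagged automatically whenever the gaze ray passes within a chosen radius of the corresponding marker-defined position.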
The development of advanced data analysis tools would complement such an integrated system and further facilitate the study of body movements and gaze behaviour. Currently, MATLAB users can access the Mocap Toolbox developed by
Ongoing changes in the way music is created, distributed, and experienced by audiences challenge our understanding of embodied music perception. For example, while people do attend live concerts on occasion, most of the music they encounter on a daily basis enters the perceptual system unimodally, as an auditory signal without corresponding visual cues. The pervasiveness of this unimodal auditory presentation raises some questions: when people hear music without knowledge of the movements needed to produce it, is their perception less embodied than it would be if they were familiar with those movements? Earlier, we discussed some perceptual sub-processes that draw on listeners’ motor resources. Some of these sub-processes, such as the ability to predict the sounded outcomes of performer gestures and the integration of audio and visual signals, are enhanced by visual experience. On the other hand, cross-modal correspondences are thought to reflect learned associations between commonly co-occurring events and sounds that develop through a combination of general (not music-specific) statistical learning and familiarity with cultural/linguistic conventions (
Music listening today is often a solitary activity, done over headphones with no possibility of observing the performers.
Changes to the way music is created include the development of algorithms capable of performing music and the introduction of digital musical interfaces (DMIs) that human performers can use to create and manipulate digital music in new and creative ways. Some of the sounds produced via these methods, including machine-like or environmental sounds, are not likely to be attributed to human movements by our perceptual systems. Research has already shown that audience members’ aesthetic judgments of DMI performance suffer when they are unable to figure out the gesture-sound mappings of the interface (
An outstanding question is whether music consisting of such sounds is, like music whose sounds result from human movements, experienced in an embodied way. The results of the study by
The aim of this paper was to show that musical communication is a dynamic and collaborative process involving performers and an active audience, for whom music perception is a motoric process. Traditionally, overt movement has been critical for performed music, necessary for sound production as well as a part of the musical product presented to audiences. Today, this is no longer entirely the case, since 1) the audience does not always see the performance and 2) sounded music can be generated without sound-producing gestures. Research findings are in line with the prediction that music perception is embodied: motor resources are drawn upon throughout the perception process and while constructing meaning out of the musical signal. At present, however, the literature still lacks strong tests of the embodiment paradigm, as well as an indication of how robust current findings are to a broader definition of “music” that includes genres outside the Western art music tradition, including genres in which the potential for interpersonal interaction is higher (e.g., group improvisation) and genres in which it is lower (e.g., electroacoustic music).
This research was funded by Austrian Science Fund grant P29427.
The authors have declared that no competing interests exist.
The authors have no support to report.