In the age of mobile music players and headphones that allow people to listen to music in public spaces, the question of how music influences the perception of environments is becoming increasingly relevant. While the evaluation of environments depends primarily on physical and social features, several experimental studies report differences in the evaluation of environments as a function of the music listened to (e.g., Ehret et al., 2021; Franěk et al., 2020; Yamasaki et al., 2015). Perceptual shifts leading to different ratings could be explained by mood-congruency effects (e.g., Bower, 1981; Forgas, 1994), according to which cognitive processes are influenced by affective states in a congruent manner. Therefore, the present study combines the investigation of musical influences on emotional states with the assessment of environments that differ in their properties. To examine social properties, we used situations that differ in the way people relate to each other, and incorporated the concept of entitativity (Campbell, 1958), which describes the extent to which a collection of individuals is perceived as a coherent social group.
Perception of Environments and Entitativity
The evaluation of environments is based on the perception of objective (e.g., location, geographical surface, buildings) and social features (e.g., presence of other people, group constellation) that determine how comfortable people feel, think, and behave in a particular place (Meagher, 2020). It is also a multimodal process, being commonly complemented by other senses such as hearing, “adding contrast, continuity and meaning” (Pocock & Hudson, 1978, p. 48) to what is seen. Moreover, influenced by a person’s current needs, motives, moods, and cognitive processing (Ittelson, 1978; Pocock & Hudson, 1978), environmental perception is highly subjective and cannot be explained solely by existing environmental features. In this sense, Fisher (1974) suggested that the objective characteristics of an environment and the perceived environment should be considered as two independent dimensions. Studies in environmental psychology are primarily concerned with the subjective evaluation of environments, the “internal representations of that environment—the meaning they [the perceivers] attribute to it” (Russell et al., 1981, p. 276). These include perceptual/cognitive components (relating to objective features of the stimulus, e.g., judgments about the environmental sounds) and affective components (e.g., judgments about the mood of the environment).
The evaluation of social information is subject to social perception, which describes the ability to interpret interpersonal cues to evaluate, among other things, people’s characteristics, social context, and relationships (McCleery et al., 2014). Studies of environmental perception have mainly focused on the importance of object-related properties and their subjective evaluation, although social information is an equally important dimension of environmental perception. Social information in environments is situation-specific and mainly refers to the presence of other people and their interactions (Fisher, 1974). The presence of other people has been shown to influence perceptions of crowdedness (Fisher, 1974) and place acceptance (Paay & Kjeldskov, 2008), to alter feelings of safety (Jorgensen et al., 2013; Mak & Jim, 2018) and privacy (Lis & Iwankowski, 2021), and to trigger physiological responses such as stress (Kiritz & Moos, 1981).
In public, there are a variety of people with different intentions, for instance, to meet friends or to wait for the bus. Both examples form a (temporary) unit, and perceivers distinguish, even implicitly, between unrelated people and related group members when categorizing in everyday life (Hamilton et al., 2002). Relying on an intuitive theory of groups, perceivers infer the social constellation of group members (Lickel et al., 2001). Based on Gestalt psychology, Campbell (1958) offered four characteristics for the perception of individuals as social units, according to which elements with a high expression of these characteristics are more likely to be perceived as part of a social group: spatial proximity (individuals stand close to each other), similarity (individuals look similar and share characteristics), common fate (individuals move together and share a direction), and good continuation (individuals are part of a spatial pattern, e.g., a line). Empirical findings point to various types of groups that differ in their degree of entitativity and can be classified into intimacy groups (e.g., family, friends), task-oriented groups (e.g., sports teams), social categories (e.g., gender), loose associations (e.g., employees of a company), and transitory groups (e.g., people waiting for the bus at a bus stop). These groups show decreasing entitativity in the order mentioned, which can be attributed to differences in, for example, social interaction (Lickel et al., 2000). Perceptions of group entitativity have practical implications. For example, individuals are more likely to engage with groups that are high in entitativity (Hamilton et al., 2002). On the other hand, groups with high entitativity may elicit more negative emotions such as creepiness (Bera et al., 2018) or threat (Abelson et al., 1998; Dasgupta et al., 1999).
There is some evidence that social situations shape emotional responses to music (Egermann et al., 2011), yet less is known about how music, in turn, may influence the interpretation and perception of social situations. In the following, factors that affect the perception of visual information and the environment are discussed.
The Impact of Music Listening and Emotions on Visual Perception
The influence of music on multimodal perception has been deliberately used in films for decades. Research on film music has consistently shown that music controls the interpretation of scenes and protagonists (e.g., Bullerjahn & Güldenring, 1994) as well as attention (Ansani et al., 2020; for an overview see Herget, 2021), and affects both peripheral physiological arousal and the emotional interpretation of the visual content in films and video clips (Wöllner et al., 2018). These effects seem to apply to real-world situations as well. For instance, an audiovisual experiment on car simulation (Iwamiya, 1997) found that landscapes appear more pleasant with music than without, and that they are generally evaluated congruently with musical features. This has also been shown in various field experiments (Ehret et al., 2021; Franěk et al., 2020; Yamasaki et al., 2015). The interplay between musical features and environmental features (e.g., natural or urban environment, valence of atmosphere) is crucial for how perception is altered. By directing participants to different locations while listening to music that varied in activation and valence, Yamasaki et al. (2015) showed that listening to music modulates affective evaluations of environments in correspondence with musical features (e.g., relaxing, pleasant), with incongruence between musical and environmental features leading to a greater musical influence on environmental evaluations (e.g., relaxing music in crowded environments). Franěk et al. (2020) investigated the influence of music on environmental ratings such as openness, pleasantness, energy, and mystery. The researchers found only small effects for pleasantness and mystery and concluded that the environment itself, rather than the music, is more relevant to the evaluation. An indoor field experiment by Ehret et al. (2021) investigated interactions between spatial and musical atmospheres on valence ratings. The overall impression of the atmosphere was based on both variables, but a pleasant atmosphere was easily disturbed by negative music, whereas an unpleasant atmosphere could not easily be improved by positive music.
These findings are likely driven in part by the effects of the music on listeners’ affective states, as individual mood and emotion play an important role in how we perceive the world (e.g., Zadra & Clore, 2011). The associative network theory of memory and emotion (Bower, 1981) assumes that emotional states influence the way information is processed and shape mental representations so that memories, ideas, associations, interpretations, attention, and perception are congruent with the current mood, commonly referred to as mood congruence effects. Such effects have been observed in various domains, including social perception (Bless, 2001). The Affect Infusion Model (e.g., Forgas, 1994, 1995) explains possible mechanisms of these congruence effects, including judgments about people and social interactions (Forgas et al., 1984). Fisher (1974) observed similar mood-congruent effects for environmental perception, where positive affect leads to positive evaluations of the environment and negative affect to negative environmental evaluations.
Although the terms mood and emotion cannot be completely separated and are interdependent, they are distinguished by the fact that emotions tend to be event-related, show high intensity, and are relatively short-lived, whereas moods are described as diffuse, persistent, low-intensity affective states that are not tied to specific causes (Scherer, 2005). Since music is a powerful tool to regulate one’s feelings, which is a major reason for listening to music (e.g., Garrido & Schubert, 2015; Saarikallio & Erkkilä, 2007; Schäfer et al., 2013), music may also influence how information is processed and evaluated. Several experiments have used music to investigate the influence of emotional states or moods on cognitive processes. Although differences between mood and emotion induction are not clearly defined in several studies (for implications for study design, see Garrido, 2014), participants are typically presented with an induction procedure by listening to music before or during task performance, for example in the study of selective attention in eye-tracking (Arriaga et al., 2014; Isaacowitz et al., 2008), the dot probe (Tamir & Robinson, 2007, Experiment 5), free recall memory (Parrott, 1991; Tesoriero & Rickard, 2012; see also Talamini et al., 2022), or emotion processing (Bouhuys et al., 1995; Jolij & Meurs, 2011), all of which support the mood-congruency hypothesis. The influence of music listening on mood may also change the perception of physical environmental features. Riener et al. (2011) showed that the perception of steepness depends on the type of music listened to, such that hills appear steeper to observers in a negative mood (sad music) than to observers in a positive mood (happy music).
Taken together, previous findings support the assumption that a person’s affective state, which can be altered by listening to music, influences information processing in a congruent manner, affecting how the outside world is perceived and evaluated.
In contrast, less is known about the influence of music or mood on the perception of entitativity. A study on human–robot interaction (Savery et al., 2021) investigated the influence of emotional musical prosody, i.e., variations in timbre, rhythm, and pitch extracted from speech, on the perceived entitativity of an aggregation of robotic arms. A single voice increased entitativity ratings of the robotic arms, whereas multiple voices with slight variations decreased entitativity ratings compared to a gesture-only condition. More directly relevant is the study by Edelman and Harring (2015), who investigated the influence of music on social bonding by focusing on the perceived entitativity and rapport of three walkers. Videos were created that differed in synchronicity of movement (synchronous, asynchronous) and music (music, no music). For the music conditions, three types of music were chosen that differed in their degree of liking (liked, neutral, disliked). Entitativity ratings were higher with music than without, for both synchronous and asynchronous movements, with neutral music showing the highest ratings, followed by liked and disliked music. Entitativity ratings were only weakly correlated with participants’ mood. Based on these findings, the effect of music on the perception of closeness should be further investigated by controlling the choice of music to induce specific moods and by using a greater variety of stimuli from social situations.
The Present Study
The present study aimed to investigate the effects of music on emotional states and on the perception of real-world images. We also considered the effect of group structure, referring to accumulations of people with different degrees of entitativity. To this end, we conducted an experiment that examined both the influence of music (between-subjects factor) and the influence of social group entitativity (within-subjects factor) on affective and cognitive rating dimensions. Based on existing findings on mood-congruency effects, we expected that listening to negative, positive, or no music during the experiment would lead to different emotional states and, in turn, to different evaluations of social situations. The entitativity of the groups may shape these effects, but we made no directional prediction. The hypotheses were:
H1: The experimental conditions of negative, positive, or no music should influence participants’ emotional states accordingly.
H2: The experimental conditions of negative, positive, or no music should influence participants’ evaluations of social situations accordingly.
Method
Participants
A total of 215 individuals participated in the online experiment. An a priori power analysis (G*Power; Faul et al., 2007) for within-between interactions, assuming a moderate effect size of f = 0.2, an alpha error probability of .05, and a test power of 95%, yielded a required sample size of 102 participants. Several datasets had to be removed due to technical problems (music was not played) or meaningless responses, which were identified based on completion time or straightlining (Leiner, 2019), leaving 183 datasets for analysis. Age ranged from 19 to 60 years (M = 26.92, SD = 6.50); 138 participants were female, 44 were male, and one person did not specify. Regarding musical training, 124 individuals (67.8%) reported instrumental or vocal training for an average of 8.03 years (SD = 5.71). The sub-samples in the music conditions did not differ in their musical experience during the experiment (musical preference, awareness of music, intensity of musical experience: p’s > .6; for descriptive information see Table 1). The study was conducted in accordance with the guidelines of the Ethics Committee of the Faculty of Humanities, University of Hamburg. No incentives were offered, and participants gave informed consent before taking part in the online study.
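The data-quality screening described above (removing datasets based on completion time or straightlining; Leiner, 2019) can be sketched as follows. This is an illustrative Python sketch, not the study’s actual procedure (the original analyses were run in R): the time threshold, field names, and record structure are assumptions made for the example.

```python
def flag_low_quality(responses, min_seconds=120):
    """Return the IDs of datasets flagged for removal.

    A dataset is flagged if it was completed implausibly fast or if all
    ratings are identical (straightlining). Threshold and record layout
    are illustrative assumptions, not the study's actual parameters.
    """
    flagged = []
    for r in responses:
        too_fast = r["completion_time_s"] < min_seconds
        ratings = r["ratings"]
        straightlined = len(ratings) > 1 and len(set(ratings)) == 1
        if too_fast or straightlined:
            flagged.append(r["id"])
    return flagged
```

In practice, such flags would be reviewed manually before exclusion rather than applied automatically.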
Table 1
| Variable | Negative Music | Control | Positive Music |
|---|---|---|---|
| Sub-sample size (n) | 62 | 66 | 55 |
| Gender (n) | | | |
| Female | 43 | 49 | 46 |
| Male | 19 | 16 | 9 |
| No specification | 0 | 1 | 0 |
| Age: Mean (SD) | 27.77 (6.07) | 27.02 (6.97) | 25.81 (6.33) |
| Musically active (n) | 38 | 44 | 42 |
| Duration in years: Mean (SD) | 7.45 (4.48) | 7.84 (6.27) | 8.76 (6.11) |
| Musical stimuli ratings: Mean (SD) | | | |
| Musical preference | 63.29 (25.89) | — | 62.16 (26.32) |
| Awareness of music | 78.08 (18.73) | — | 79.53 (18.21) |
| Intensity of musical experience | 62.79 (23.06) | — | 62.11 (25.21) |
Selection of Musical and Visual (Picture) Stimuli
First, the experimenters selected 20 unknown instrumental music pieces (Audio Library in YouTube Studio), composed for use in films and videos to create background atmosphere and mood. Based on the emotion categories in the audio library (i.e., sad music, happy music), ten pieces were preselected to elicit negative valence and low arousal (henceforth “negative music”), while the other ten were intended to elicit positive valence and high arousal (“positive music”). The music consisted of easy-listening melodies and harmonies with no sudden changes in dynamics, and belonged to various genres. The pieces of music were shortened to a duration of 16 seconds and standardized in terms of volume and fade-in and fade-out effects using Audacity (Audacity Team, 2018). In a pretest, 24 independent participants (mean age = 21.54 years, SD = 2.21; 16 female, 8 male; students of musicology) who did not take part in the main experiment indicated their agreement with statements about the activating and positive character of the music on a 7-point scale (valence: “The piece of music creates a positive mood.”; arousal: “The piece of music has an activating effect.”; 1 = strongly disagree to 7 = strongly agree). The two music sets differed significantly as expected after Bonferroni correction, valence: MDiff = 3.58, t(23) = 32.84, padj < .001, d = 6.70; arousal: MDiff = 3.42, t(23) = 27.98, padj < .001, d = 5.71, with no differences concerning preference or familiarity (p’sadj > .05). The Appendix shows a table of the musical pieces selected.
Second, ten images were selected from the photo platform Flickr, representing different social situations in everyday life. Pictures were chosen to ensure that participants rated the same environments and to control for the presence of people and social group formation. The pictures were grouped into two different types of social groups to be evaluated, containing several individuals and showing different levels of entitativity. Half of the images represented intimacy groups, showing friends or other personal acquaintances in public places with high levels of entitativity. The other half represented transitory groups, low in entitativity, showing an aggregate of random people in public situations. The degree of entitativity is visually recognizable by the spatial proximity and arrangement of the people, and the way they face each other, implying social and contextual information about personal relationships. The images were blurred to reduce information from clear facial expressions that could interfere with affective judgments (Wöllner, 2008), to avoid biases in the perception of entitativity (Magee & Tiedens, 2006), and stereotyping effects, for example due to gender, clothing, or attractiveness (Figure 1).
Figure 1
Measures
We asked participants to assess the environment of each picture using four items on 7-point bipolar scales. We presented a selection of three items used by Yamasaki et al. (2015) to measure the evaluation of the pictured environments (“How do you evaluate the situation in the picture?”) in terms of (1) cheerfulness (sad – cheerful), (2) pleasantness (unpleasant – pleasant), and (3) crowdedness (empty – crowded). The fourth item was designed to address social aspects by asking participants to rate the group according to how well individuals know each other (“Do you think people in the picture know each other?”), thus determining (4) familiarity between them (individuals do not know each other at all – individuals know each other very well), which we assume is a direct measure of the distinction between intimacy and transitory groups. This builds on the observation of Lickel et al. (2001) that the lay theory of groups involves the perceiver’s impression of the relationship between group members.
In this study, cheerfulness and pleasantness are considered affective rating dimensions. In contrast to Yamasaki et al. (2015), crowdedness was assigned to a cognitive rating dimension of social information, as it evaluates the number of individuals relative to the physical space; the same applies to familiarity among individuals, which is inferred from distance and spatial arrangement and thus refers to objective features of the environment.
Participants were also asked to rate their current emotional states before (baseline) and after the experiment using two sliders for valence (1 = very negative to 101 = very positive) and arousal (1 = very calm to 101 = very aroused). Those participants who listened to music were also asked about their liking of the music (slider: 1 = I did not like the music at all to 101 = I liked the music very much), whether they were consciously aware of the music (slider: 1= I did not notice the music at all to 101 = I was very aware of the music), and how they experienced the music (slider: 1 = I did not feel the music at all to 101 = I felt the music very intensely) after the experiment.
A manipulation check on the images of the two types of social groups (intimacy vs. transitory groups) was computed across all participants to determine whether the intended grouping information was conveyed successfully. The mean ratings of the familiarity variable, ignoring the experimental conditions, showed that intimacy groups were rated above the scale midpoint (mean range between 4.66 and 5.80), whereas transitory groups were rated below the scale midpoint (mean range between 1.49 and 3.82), with a significant difference in means, t(182) = 41.4, p < .001, d = 3.06, 95% CI [2.71, 3.40]. Thus, the depicted intimacy groups were perceived as knowing each other well, in contrast to the transitory groups, suggesting a successful manipulation of the within-subject factor.
Procedure
After having provided their informed consent and answered questions on demographic data and musical training, participants were randomly assigned to one of the experimental conditions (1: negative music condition, 2: no music condition, 3: positive music condition). Before the experiment began, participants were asked to report their current emotional state. Next, the technical functionality was tested with an audiovisual stimulus in which participants could solve or report problems with the automatic audio playback.
Each picture was preceded by a fixation cross (presentation duration: 8 seconds). The pictures followed a random order and were each presented for another 8 seconds, resulting in a total of 160 seconds of music listening. In the music conditions, the pieces of music were randomly paired with the pictures. After each stimulus presentation, the rating items for the social situations were shown; participants could not move to the next page until all items were rated. At the end of the experimental phase, participants again rated their emotional states in terms of valence and arousal. In the music conditions, they additionally reported their listening experience regarding preference, awareness of perception, and intensity of experience.
Data Preparation and Data Analysis
Data were analyzed using the statistical software R (R Core Team, 2022) and RStudio (RStudio Team, 2022). Each dependent variable was analyzed with a separate mixed ANOVA. In a first step, we identified violations of the assumptions of mixed designs with within-subject and between-subject factors (normal distribution of residuals, absence of outliers, homogeneity of variance, and homogeneity of covariance matrices per Box’s M-test). To address this issue, we computed two types of analyses: (a) a robust ANOVA (Mair & Wilcox, 2020) using the WRS2 package (Mair et al., 2022), and compared its results with (b) a standard mixed ANOVA using the rstatix package (Kassambara, 2023). Before running the standard mixed ANOVA procedure, we identified and removed extreme outliers (more than three standard deviations from the mean) for each dependent variable and design level and re-tested the assumptions, but still found violations. In general, ANOVAs with equal group sizes can be considered quite robust to such violations (e.g., Field et al., 2012, pp. 412–414). Since the robust and standard procedures yielded the same pattern of results, we report the latter for better interpretability of the standard metrics. Pairwise comparisons of experimental conditions were analyzed using post-hoc tests with Bonferroni correction.
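The extreme-outlier rule described above (values more than three standard deviations from the mean, computed per dependent variable and design cell) can be sketched as follows. Python is used here for illustration, although the original analyses were run in R:

```python
import math

def remove_extreme_outliers(values, k=3.0):
    """Remove values more than k sample standard deviations from the mean.

    Applied once per dependent variable and design cell (k = 3 as in the
    text). Illustrative sketch of the rule, not the study's R code.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [v for v in values if abs(v - mean) <= k * sd]
```

Note that a single pass is used; iterating the rule until no outliers remain would be a stricter (and different) procedure.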
Results
To investigate the influence of music on participants' emotional states and, in turn, their evaluations of social situations, we first present the results of the music-induced emotional states according to experimental conditions. Subsequently, we present the influence of the experimental conditions on the ratings of the social situations.
Changes of Emotional States in Experimental Conditions
To determine whether there were differences between the experimental conditions (positive music, no music, negative music, as validated in the pretest) in participants’ emotional states, we conducted two mixed ANOVAs for the valence and arousal dimensions, examining the differences before and after the experiment (Table 2).
Table 2

| Experimental Condition | Valence M | Valence SD | Arousal M | Arousal SD |
|---|---|---|---|---|
| *Before emotion induction* | | | | |
| Positive music | 62.47 | 24.07 | 39.11 | 21.55 |
| Control (no music) | 65.41 | 24.54 | 44.23 | 26.67 |
| Negative music | 57.39 | 18.19 | 50.15 | 20.47 |
| *After emotion induction* | | | | |
| Positive music | 69.33 | 18.01 | 40.82 | 22.00 |
| Control (no music) | 63.85 | 20.88 | 45.50 | 24.53 |
| Negative music | 55.65^a | 17.45 | 43.60 | 19.25 |

Note. Valence and arousal were rated on a bipolar slider (1–101). Within-subjects comparisons before and after emotion induction, based on separate mixed ANOVAs for valence and arousal, did not indicate significant differences. Before emotion induction, no differences were found between experimental conditions.

^a Participants listening to negative music showed lower self-reported valence than participants in the other experimental conditions after emotion induction.
For valence, we found a significant effect of experimental condition, F(2, 180) = 4.27, p = .015, ηp² = 0.045, 95% CI [0.00, 0.11], and a significant interaction effect with time of measurement, F(2, 180) = 5.56, p = .005, ηp² = 0.058, 95% CI [0.01, 0.13]. Post-hoc pairwise comparisons with Bonferroni correction revealed mean differences in post-experiment valence, F(2, 180) = 7.81, p < .001, ηp² = 0.080, 95% CI [0.02, 0.16], between the negative music condition and the positive music condition, t(112) = -4.16, padj < .001, d = -0.772, 95% CI [-1.15, -0.39], and the no music condition (marginally significant), t(124) = -2.42, padj = .051, d = -0.426, 95% CI [-0.78, -0.08]. Participants listening to negative music rated their valence lower than those in the two other conditions. For arousal, we observed an interaction effect between experimental condition and time of measurement, F(2, 180) = 3.40, p = .035, ηp² = 0.036, 95% CI [0.00, 0.10]. After Bonferroni correction, we found no evidence of group differences before or after the experiment (p’sadj > .07). Taken together, participants’ emotional states were influenced by positive, negative, or no music only in terms of the valence dimension.
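The Bonferroni correction used in these pairwise comparisons multiplies each raw p value by the number of comparisons and caps the result at 1 (equivalent to R’s p.adjust with method = "bonferroni"). A minimal sketch with illustrative p values:

```python
def bonferroni_adjust(p_values):
    """Bonferroni correction: multiply each p by the number of comparisons,
    capping adjusted values at 1. Input p values here are illustrative."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

This family-wise control is conservative; with three pairwise comparisons, a raw p of .017 would just survive a nominal .05 threshold after adjustment.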
Effects of Experimental Conditions and Social Group Type on Rating Dimensions
We conducted a 3 × 2 mixed-design ANOVA to analyze the effects of experimental condition (between-subjects factor: negative music, no music, positive music) and Social Group Type (within-subjects factor: images of intimacy groups and transitory groups) for each dependent variable (cheerfulness, pleasantness, crowdedness, and familiarity). The results of the mixed ANOVAs are presented in Figure 2 (for descriptive statistics see Table S1 in the Supplementary Materials, Kuch & Wöllner, 2024b).
Figure 2
For the affective dimension of cheerfulness, we observed main effects of experimental condition, F(2, 177) = 24.71, p < .001, ηp² = 0.218, 95% CI [0.11, 0.32], and Social Group Type, F(1, 177) = 194.90, p < .001, ηp² = 0.524, 95% CI [0.42, 0.62]. The interaction effect was also significant, F(2, 177) = 3.05, p = .05, ηp² = 0.033, 95% CI [0.00, 0.10]. Post-hoc comparisons revealed that ratings differed between experimental conditions for intimacy groups, F(2, 177) = 17.50, p < .001, ηp² = 0.165, 95% CI [0.07, 0.27], and transitory groups, F(2, 177) = 12.90, p < .001, ηp² = 0.127, 95% CI [0.05, 0.22]. Multiple Welch t-test comparisons with Bonferroni correction showed that participants listening to negative music rated intimacy groups as less cheerful than participants listening to positive music, t(105) = ‑5.36, padj < .001, d = -0.992, 95% CI [-1.38, -0.60], and participants listening to no music, t(105) = -4.06, padj < .001, d = -0.732, 95% CI [-1.09, -0.37]. Similarly, participants in the negative music condition rated transitory groups as less cheerful than participants listening to positive music, t(110) = -4.31, padj < .001, d = -0.801, 95% CI [-1.18, -0.42], or no music, t(103) = -3.96, padj < .001, d = -0.714, 95% CI [-1.08, -0.35]. The musical effects were greater for intimacy groups (negative vs. no music: MDiff = 0.55; negative vs. positive music: MDiff = 0.75) than for transitory groups (negative vs. no music: MDiff = 0.34; negative vs. positive music: MDiff = 0.40). Within each experimental condition, ratings for intimacy groups were significantly higher than for transitory groups (p’sadj < .001).
On the affective dimension of pleasantness, there were significant main effects of experimental condition, F(2, 175) = 7.60, p < .001, ηp² = 0.080, 95% CI [0.02, 0.16], and Social Group Type, F(1, 175) = 408.12, p < .001, ηp² = 0.700, 95% CI [0.62, 0.77]. The interaction effect was also significant, F(2, 175) = 5.48, p = .005, ηp² = 0.059, 95% CI [0.01, 0.14]. Post-hoc comparisons with Bonferroni correction showed an effect of experimental condition for intimacy groups, F(2, 175) = 11.20, p < .001, ηp² = 0.114, 95% CI [0.04, 0.21], but not for transitory groups (p > .05). According to Welch’s t-tests, participants listening to negative music gave lower pleasantness ratings for intimacy groups than participants listening to positive music, t(108) = -3.65, padj = .001, d = -0.678, 95% CI [-1.05, -0.30], or no music, t(110) = -4.11, padj < .001, d = -0.744, 95% CI [-1.11, -0.38], whereas we found no group differences for transitory groups. The mean differences between intimacy and transitory groups within each experimental condition were significant (p’sadj < .001).
For the social-cognitive dimension of crowdedness, we found a small main effect of experimental condition, F(2, 176) = 3.33, p = .038, ηp² = 0.037, 95% CI [0.00, 0.10], and a main effect of Social Group Type, F(1, 176) = 304.73, p < .001, ηp² = 0.634, 95% CI [0.54, 0.71]. The interaction effect was not significant (p = .254). Post-hoc tests showed that, within each experimental condition, transitory groups were rated as more crowded than intimacy groups (p’sadj < .001). In contrast, we found no significant differences between experimental conditions after Bonferroni correction, neither for intimacy groups nor for transitory groups (p’sadj > .05).
Regarding the ratings of familiarity between individuals in the pictures, a mixed ANOVA revealed a significant effect of Social Group Type, F(1, 174) = 2144.48, p < .001, ηp² = 0.925, 95% CI [0.90, 0.94], but no effect of experimental condition and no interaction effect (p’s > .05). The ratings indicate that participants across experimental conditions perceived people in the intimacy groups as more familiar with each other than in the transitory groups (p’sadj < .001). In addition, we tested the difference between the music conditions (negative and positive music) and the control condition (no music) for both Social Group Types, which was not significant (p’s > .274).
Taken together, music influenced perceptions of social situations in terms of affective ratings. Cheerfulness ratings were influenced for the intimacy and transitory groups, whereas only the intimacy groups showed differing pleasantness ratings. Emotional states and ratings reflected the same pattern of group differences, suggesting that the emotional states evoked by the music influenced the affective perception of social situations.
Discussion
In the present study, it was hypothesized that listening to different kinds of music would lead to different emotional states in participants, which would subsequently alter the perception of images of real-world scenes. A particular focus was placed on the implicit information about social groups differing in their degree of entitativity (the extent to which a collection of individuals is perceived as a coherent social group). Results suggest that listening to music alters perceptions of social situations in an emotion-congruent way. However, this effect was only observed for the negative music condition and for affective ratings (cheerfulness, pleasantness). Furthermore, the effect of music was greater for ratings of intimacy groups (e.g., friends) than for transitory groups (e.g., people waiting for the bus).
Changes in Emotional States
After the experiment, valence ratings showed a linear trend across the experimental conditions, but only the negative music condition differed significantly from the control and positive music conditions; the latter two did not differ from each other. No differences were found on the arousal dimension, providing only partial support for our first hypothesis of emotional changes as a function of different types of music. This finding is in accordance with several studies that observed group differences in valence only for the negative music condition (e.g., Bouhuys et al., 1995; Fiedler et al., 2001; Mayer et al., 1992; Parrott, 1991), a pattern confirmed by meta-analyses (Joseph et al., 2020; Westermann et al., 1996). It can be argued that negative affect is easier to induce than positive affect (Westermann et al., 1996). For our study, however, it is important to note that the group differences were not due to a successful induction of negative emotions: valence ratings in the negative music condition did not differ between the pre- and post-experiment measurements. Instead, the group differences in post-experiment valence ratings appear to be driven primarily by the change in the positive music condition, whose valence ratings were slightly, yet not significantly, higher after the experiment than before.
Musical Impact on Affective vs. Cognitive Rating Dimensions of Social Situations
Differences in the evaluation of social situations were found only for variables measuring the affective properties of the scenes (cheerfulness, pleasantness), but not for cognitive evaluation dimensions (crowdedness, familiarity). This suggests that music may influence evaluations of the affective qualities of the environment, which is consistent with recent field experiments (Ehret et al., 2021; Yamasaki et al., 2015). In line with our second hypothesis regarding mood-congruency effects (Forgas, 1995) on affective ratings, the affective ratings showed a profile similar to that of the valence ratings across the negative, neutral, and positive experimental conditions: in three of the four test conditions (all except the pleasantness ratings for intimacy groups), we observed the expected linear trend (negative music < no music < positive music). However, only the negative music condition differed significantly from the others. This largely supports the assumption of mood congruency and is consistent with previous findings that positive moods lead to more positive evaluations of the environment (Fisher, 1974; Riener et al., 2011). The fact that we only found differences in valence ratings but not in arousal ratings points to the importance of mood (positive vs. negative valence) for congruency effects (Bower, 1981; Forgas, 2008).
In contrast, we found no differences between experimental conditions on the social-cognitive rating dimensions, which relate to objective features of the environment. Although a small number of previous studies found effects of musical prosody, mood, or music listening on perceptions of entitativity (Edelman & Harring, 2015; Savery et al., 2021), we found no evidence of a musical effect on familiarity ratings for either intimacy or transitory groups. Yamasaki et al. (2015) found differences between music conditions in the perception of crowding, an attribute they included in the arousal dimension. At this level of observation, the lack of an effect in our study could be explained by the absence of differences in arousal between experimental conditions in our sample. On the other hand, it seems plausible that the crowdedness of places is essentially defined by the number of people present, and that the evaluation of entitativity via familiarity is inferred from how people face each other and are arranged in space. Affective evaluations of a situation, in contrast, are much more complex, especially in the case of pictures, because they rely on subtle contextual information that leaves more room for interpretation, and other modalities and moods may contribute to the interpretation of meaning (Pocock & Hudson, 1978). This suggests that affective evaluations are more likely to be influenced by the emotional state of the perceiver than cognitive evaluations, as already noted by Russell et al. (1981). Consequently, cognitive evaluations are more likely to be influenced by the environment itself and its objective features than by the individual’s emotional state or by music (Franěk et al., 2020).
Musical Impact on the Evaluation of Intimacy Groups vs. Transitory Groups
Intimacy groups were generally rated more positively than situations representing transitory groups. Intimacy groups showed group differences on both affective rating dimensions (cheerfulness, pleasantness), whereas transitory groups showed group differences only on the cheerfulness dimension, and the effects found were larger for intimacy groups than for transitory groups. It can be assumed that musical influence is greater in positively perceived environments than in negatively perceived ones. In other words: negatively perceived environments are less easily improved by music, whereas negative music is more likely to impair positively perceived environments (see also Ehret et al., 2021).
It is known that the strength of affective influences also depends on characteristics of the visual target, which shape processing styles. According to Forgas (1994), affective influences on evaluations are more likely when the stimuli to be evaluated are processed heuristically (e.g., unfamiliar, personally irrelevant stimuli) or substantively (e.g., social behavior and people, complex or ambiguous stimuli, atypical tasks). This may explain why the effects of emotional reactions were more frequent and larger for intimacy groups than for transitory groups: evaluating static representations of interacting individuals in social relationships seems to be more complex, involving a higher degree of ambiguity and more room for associative interpretation than an aggregate of several independent individuals. In addition, we found that the rating profiles of intimacy and transitory groups were not parallel across experimental groups (Figure 2). Thus, music alters perceptions of social situations as a function of the characteristics of the group being evaluated.
Methodology and Limitations
The current study faced several challenges and limitations. The musical stimuli were chosen to convey the musical mood unobtrusively in the background and to control musical characteristics. In the pretest, we asked about the perceived emotional characteristics of the music, yet we did not test the extent to which the intended emotions were actually experienced, which may explain the limited evidence for changes in individuals’ emotional states. More emotional music may be more effective at eliciting certain emotional states, thus increasing the musical influence on evaluations of social scenes. In addition, personally relevant music may enhance emotional responses due to high preference ratings (Völker, 2021), and self-selected music reliably evokes autobiographical memories (Jakubowski & Ghosh, 2021) that are suitable for emotion induction.
The use of pictures as visual stimuli provides controlled conditions for evaluation but also has some limitations. First, pictures are non-dynamic representations of situations. Second, although we aimed to balance the image properties across both types of social groups and to reduce confounds such as facial expressions and other emotional features by blurring, we cannot completely rule out systematic differences between intimacy and transitory group images, such as colors or locations. Self-generated video stimuli appear promising and could further enhance the discrimination of levels of entitativity by manipulating movement synchronization and direction (Bera et al., 2018; Edelman & Harring, 2015; Lakens, 2010; Lee et al., 2020). However, neither responses to static pictures nor responses to dynamic videos are completely generalizable to real-world settings (Heft & Nasar, 2000). Therefore, future field experiments should be conducted that also take social information into account and consider possible negative evaluative effects of groups (Abelson et al., 1998; Bera et al., 2018; Dasgupta et al., 1999). The image blurring procedure may have enhanced emotional effects by increasing complexity and ambiguity, expanding the space for associative interpretation and thus enhancing the musical effect on ratings. It remains to be discussed how these findings can be integrated into everyday experience, as people typically do not focus directly on others and peripheral vision tends to be blurred.
The operationalization of perceived entitativity was inspired by Lickel et al. (2001), who found in their study on lay theory of groups that perceivers' beliefs about the extent to which group members are personally interdependent (e.g., interaction, interpersonal bonds) are related to the perceived entitativity of groups. The evaluation used in the current study, asking for the extent of familiarity between group members, was self-developed and refers to how well the individuals in the images know each other. The inventory of entitativity developed by Blanchard et al. (2020) includes several factors such as similarity, shared goals, or boundaries, yet none of them seemed to apply to our research questions. Instead, we sought a direct measure of the relationship between individuals that is particularly useful for distinguishing between intimacy and transitory groups. The results of the current study show that assessing how well individuals know each other seems to be appropriate: The better people know each other, the more shared experiences and coherence can be expected, and the more intimate the relationship between them should be.
Finally, our sample is young, predominantly female, and the majority had received musical training. These characteristics may influence, for example, the emotional responses to music (e.g., Juslin et al., 2011; Kreutz et al., 2008). In addition, untrained listeners show greater variation in non-musical associations, so the high level of musical experience may have limited associative interpretation (Westermann et al., 1996) and influenced the present results.
Conclusion
Listening to music through headphones in public spaces has become increasingly popular, and altered perceptions of the environment are often reported. In the present study, we found that both musical and environmental features play a role in this process. We demonstrated that the musical influence on perceptions is likely due to changes in participants’ emotional states, which is most evident in ratings of the affective qualities of social situations. In this context, the choice of music is essential, as music that negatively affects individuals’ emotional states may also lead to more negative evaluations of the environment. In addition, another factor was evident in this study: social groups differing in their degree of entitativity (the degree to which a collection of individuals is perceived as a coherent group) moderated the influence of music on environmental evaluations. This observation suggests that the effects of music vary across social situations. Thus, the influence of music is not universal but depends on the social representations and measures used in a study. Information about social situations should therefore be considered in future studies on the influence of music on environmental perception. Overall, listening to music is a suitable strategy for altering emotions in everyday social contexts and for improving the affective quality of environments.