Mental imagery (MI) refers to the ability to generate internal representations without corresponding sensory input, although individuals typically remain aware that these internally created experiences do not correspond to the external here-and-now (Dijkstra & Fleming, 2023; Kosslyn, 1995; Richardson, 1969). It can draw on episodic memory and form new mental experiences by recombining and reshaping stored information (Kosslyn et al., 2001). MI is also a multimodal phenomenon: individuals may experience imagery in visual, motor, auditory, or other forms, and these modalities often interact (Nanay, 2018). The different imagery modalities can be conjured up voluntarily or occur involuntarily (Pearson et al., 2015), and the content and vividness (i.e., lifelikeness) of imagery can vary across individuals and even across the lifespan (Gulyás et al., 2022).
Mental Imagery During Music Listening
MI as a form of engagement with music listening has become a prominent research area in recent years (Antović et al., 2023; Dahl et al., 2022; Day & Thompson, 2019; Hashim et al., 2020; Hashim, Küssner, et al., 2024; Herff et al., 2021, 2022; Küssner & Taruffi, 2022; Taruffi & Küssner, 2019). Taruffi et al. (2017) found MI to be the most common form of mind-wandering regardless of the emotional valence of the music. Although MI is often described as visual, it typically spans multiple modalities including affective, conceptual, and associative forms, and such non-visual imagery is common and can vary systematically with musical style (Hashim et al., 2023; Jakubowski et al., 2024).
Research on MI is valuable because MI supports a range of simulation-based functions, including enhancing creativity (Pearson, 2007), improving well-being (Blackwell et al., 2013), boosting sports performance (Castellar et al., 2025, 2026), and supporting everyday simulations of personal or social scenarios (Ayyildiz, Geibel, et al., 2025; Taruffi et al., 2023). Importantly, imagery and emotion are tightly intertwined during listening: the emotional tone of the music shapes the vividness of imagery (Taruffi et al., 2017), and imagery itself often contributes to the emotional experience (Herff et al., 2021; Küssner & Eerola, 2019).
Content and Experiential Qualities of Music-Induced Mental Imagery
Beyond its prevalence and emotional functions, music-induced MI also varies in its experiential qualities. For instance, music can intensify the vividness of imagery compared to silence (Ayyildiz, Prince, et al., 2026; Herff et al., 2021, 2022). Such qualities matter because they shape how strongly imagery is felt. Yet music influences not only how vividly people imagine but also what they imagine.
Researchers have therefore examined the content of music-induced MI. It has been shown that music can evoke recurring imagery themes, such as nature, people, autobiographical episodes, and abstract colours and shapes, across both retrospective reports and online listening (Dahl et al., 2022; Küssner & Eerola, 2019).
Other work using a directed imagery paradigm has demonstrated that music can evoke imagined content featuring social interactions and heroic journeys (Herff et al., 2025; Taruffi et al., 2023). These findings suggest that music-induced MI involves recurring content patterns across different listening contexts.
To categorise diverse imagery content, Hashim et al. (2021, 2023) developed a thematic framework for imagery reports evoked by three orchestral film music excerpts in the classical style differing in emotional tone (happy, tender, fearful), and identified three broad themes: Storytelling (scene-like or narrative imagery with identifiable elements like who, what, where); Associations (affective, conceptual, or abstract imagery); and References (imagery linked to the music itself or to external media). Within this framework, Storytelling appeared as the most common theme. They also proposed a complementary framework for the musical features listeners identify as triggers of imagery (e.g., instrumentation, elements such as melody or tempo, soundscapes such as the atmospheric or environmental sonic qualities, and compositional style including structural organisation like build-ups and contrasts). Building on prior work examining imagery evocation in classical music (e.g., Herff et al., 2021; Taruffi et al., 2017), this thematic framework, to our knowledge, is the most comprehensive currently available for categorising music-induced MI.
Importance of Genre
While these studies demonstrate that music can reliably shape the content and qualities of imagery, much less is known about the extent to which specific, distinct musical genres influence these processes. Genre has been shown to meaningfully predict the kinds of thoughts people experience during music listening. For example, classical excerpts elicit more fictional stories, media memories, and music-related thoughts, whereas electronic music produces more abstract imagery (Jakubowski et al., 2024). Large-scale work using 356 excerpts across 17 genres further confirms that musical style reliably shapes thought types (van der Walle et al., 2025a, 2025b). However, this work focuses on broad thought categories rather than on the qualitative content of MI itself. Since different musical genres differ in their typical musical features (e.g., timbre and rhythm), genre-specific explorations remain essential in understanding how particular genres—such as ambient music—shape the themes and qualities of MI.
Function and Characteristics of Ambient Music Relevant to Mental Imagery
Ambient music is typically characterised by spacious, slowly developing textures and environmental elements such as waves or creaking wood, which may reduce attentional demands and provide greater cognitive space for MI to arise (Szabo, 2018). These musical features contrast with the more clearly articulated melodic and harmonic organisation of classical music. Classical music may serve as a reference point because prior music-evoked imagery research has largely used classical excerpts. This raises the question of whether imagery elicited by ambient music follows the same thematic patterns identified in classical contexts. Understanding how ambient music shapes MI is crucial not only for theories of music-evoked experience but also for applied settings, such as guided MI, wellbeing, and relaxation practices (Becker-Blease, 2004; McKinney & Honig, 2016; Scarratt et al., 2023), where music is used to support these activities. Moreover, creative fields, such as film, game design, and sound design, widely use ambient music for its atmospheric (or spatial) and immersive qualities (Chattopadhyay, 2017), but its imagery-related effects remain largely unknown.
The Present Study
The present study therefore explores the MI elicited by ambient music using the thematic framework developed by Hashim et al. (2021, 2023) for classical music in film. Applying this framework allows us to consider whether the core themes (Storytelling, Associations, References) extend to this genre, how they vary across excerpts differing in emotional expression, and which musical features listeners identify as evoking their imagery.
Because this is an exploratory first step into the thematic content of imagery evoked by ambient music, we additionally consider broader experiential factors that shape how people experience MI. Prior work shows that listeners differ in how vividly they tend to imagine, how strongly they respond emotionally, their musical background, and how much they like particular pieces of music (Ayyildiz, Milne, et al., 2025; Kreutz et al., 2008; Küssner & Eerola, 2019; Martarelli et al., 2016; Taruffi et al., 2017). These factors do not address imagined content directly but provide important context for understanding how MI arises in response to ambient music and help identify new directions for future hypothesis-driven research.
Method
Participants
Participants were recruited via social media and personal contacts. Of 82 participants who accessed the survey, 65 completed all required sections and were included in the analyses. Since not all participants responded to every question, sample sizes for individual analyses range from n = 61 to n = 65. As the primary aim was qualitative, sample size was guided by information power and thematic sufficiency rather than an a priori statistical power calculation. The majority of the participants were from Germany (86%). The remaining 14% comprised participants from the United Kingdom (8%) and other countries (6%; e.g., the Netherlands and the USA). Participants were between 19 and 72 years old (M = 30.95 years, SD = 12.38). Gender was assessed via a multiple-choice question; 63% selected female, 35% male, and 2% non-binary. Around two thirds were university students (37% undergraduates, 29% postgraduates), while 29% worked in non-university professions. Seventy percent of the participants indicated that they play at least one musical instrument, and 28% considered themselves musicians.
Study Design
The study employed a within-subjects design with three experimental music conditions. It was designed as an adaptation of the online survey by Hashim et al. (2023), who used three classical (film) music excerpts conveying happy, tender, or fearful emotions and assessed imagery amount, vividness, content descriptions, emotional responses, and music liking after each excerpt. In the present study, we followed the same structure but with ambient excerpts.
Following their protocol, participants in the present study rated amount (1–7) and vividness (1–7) of visual imagery for each excerpt, music liking (1–5) and felt emotional intensity (1–5) for each excerpt, and provided two open-ended descriptions of imagery content and perceived musical features. They also completed two individual difference measures: the Vividness of Visual Imagery Questionnaire (VVIQ; Marks, 1973) to measure vividness of participants’ MI in general, and the Musical Training subscale of the Goldsmiths Musical Sophistication Index (Gold-MSI; 1–7; Müllensiefen et al., 2014).1
We additionally assessed perceived emotional expression (“happy”, “sad”, “tender”, “fearful”; 1–7) to confirm the representativeness of the stimuli; temporal orientation of imagery (future-, present-, and past-related; 1–7); baseline mood (1–7); and listening frequency for this type of music (1–4). These measures were single-item ratings for this study rather than established measurement instruments or standardised questionnaires. The selection of the four categories of perceived emotions was intended to provide representative examples of emotional qualities in music, including both positive/negative emotions and low/high arousal states. “Sad” was included as an additional rating category that was not represented by any of the three stimuli. This category served as a contrasting negative and low arousal option to confirm that none of the excerpts were inadvertently perceived as sad. The stimuli, specific task instructions, the questionnaire items, and other study-related materials can be accessed via PsychArchives (see section Supplemental Materials).
Although participants were prompted to report visual MI, their responses often included non-visual elements, such as affective states and narratives. We therefore use the broader term “mental imagery” throughout to encompass this multimodal phenomenology. Defining MI for participants was necessary for the study procedure, but it may also have primed their responses.
Procedure
Participants completed the study online. After providing informed consent, they answered demographic questions and were then presented with written instructions describing the purpose of the study and a brief definition of visual MI.
Before the listening task, participants rated their baseline mood (1–7). Each participant then listened to the three ambient excerpts (happy, tender, fearful) in a randomised order using their headphones. After each excerpt, participants completed the corresponding imagery questions and rating scales detailed in the Study Design section. After completing the listening trials, participants filled out the individual difference measures (VVIQ and the Gold-MSI Musical Training subscale), followed by a question assessing how often participants listen to the types of music presented in the study.
Musical Stimuli
To identify suitable ambient stimuli for the happy, tender, and fearful conditions, we conducted an exploratory stimulus selection study with 11 participants. From 15 pre-selected ambient pieces, these participants were asked to choose the excerpts that they perceived as most strongly expressing each target emotion. The top-rated piece for each emotion was chosen, and a 40–60-second excerpt judged as most representative of the target emotion was extracted for use in the main study. The categories of happy, tender, and fearful were selected to represent different types of emotions, specifically positive, low-arousal, and negative/high-arousal states. This approach covers a range of emotions while keeping the number of examples manageable for qualitative and quantitative analysis.
The resulting three ambient excerpts were “Happiness” by Jónsi & Alex (happy), “Philadelphia Shore” by Mind Explorer (tender), and “Amor & Psyche” by We like We and Jacob Kirkegaard (fearful). Although the excerpts differ in harmony, texture, and timbre, they all exhibit characteristic ambient features such as sustained sounds, gradual development, and layered instrumentation. “Happiness” includes a repeating string phrase, a perceivable metre, and a clear harmonic progression in G major; “Philadelphia Shore” features an electric guitar melody in A minor accompanied by piano and acoustic guitar, as well as “swoosh effects”, all treated with strong reverberation that creates a pronounced sense of spaciousness and is supported by sustained synthesiser textures; and “Amor & Psyche” is characterised by dissonant sustained strings, high-pitched harmonic material, and sparse percussive textures, creating a tense and unstable sonic environment.
Analysis
The content of participants’ MI was analysed using a directed qualitative content analysis (Hsieh & Shannon, 2005) based on the hierarchical thematic framework developed by Hashim et al. (2023). This approach was chosen because the aim was to apply an existing, theory-informed structure rather than to generate new themes inductively. This framework consists of three levels: Level 3 codes, representing fine-grained meaning units representing specific ideas or aspects within a participant’s imagery description; Level 2 sub-themes, grouping conceptually related Level 3 codes; and Level 1 themes, representing broader, higher-order categories. All imagery reports were first segmented into Level 3 codes using a semantic, explicit-meaning coding strategy (i.e., descriptions were coded at their surface meaning rather than at a latent interpretative level). These codes were then grouped into Level 2 sub-themes and subsequently assigned to the three Level 1 themes.
Following Hashim et al. (2023), imagery descriptions were coded into the three main themes (Storytelling, Associations, References) and their respective sub-themes. Storytelling included: action, characters, narratives, and settings and locations. Associations included: abstract or other associative content, feelings and atmospheres, and memories. References included: music and other media. The possibility of additional themes was not excluded a priori, but due to its wide, yet clear categorisation, Hashim et al.’s framework has proved suitable for capturing an extensive range of imagery in the present study.
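The hierarchical structure described above can be sketched as a simple lookup from sub-theme to theme. The following Python snippet is illustrative only: the theme and sub-theme labels follow the framework as summarised here, whereas the example codes and the helper name `tally_themes` are hypothetical and do not reproduce the study's data or analysis procedure.

```python
from collections import Counter

# Level 1 themes and their Level 2 sub-themes, as listed in the text.
FRAMEWORK = {
    "Storytelling": ["Action", "Characters", "Narrative", "Setting & Location"],
    "Associations": ["Abstract & Other", "Feelings & Atmospheres", "Memories"],
    "References": ["Music", "Other Media"],
}

# Invert the hierarchy so that each sub-theme points to its parent theme.
SUBTHEME_TO_THEME = {
    sub: theme for theme, subs in FRAMEWORK.items() for sub in subs
}


def tally_themes(coded_segments):
    """Count Level 1 themes for (level3_code, level2_subtheme) pairs."""
    return Counter(SUBTHEME_TO_THEME[sub] for _code, sub in coded_segments)


# Hypothetical Level 3 codes for a single imagery report (not study data).
example_report = [
    ("meadow", "Setting & Location"),
    ("woman in a white dress", "Characters"),
    ("peaceful", "Feelings & Atmospheres"),
]

counts = tally_themes(example_report)
total = sum(counts.values())
# Relative share of each theme in percent, rounded to one decimal place.
shares = {theme: round(100 * n / total, 1) for theme, n in counts.items()}
```

Because one report can contribute several codes, theme shares computed this way describe the distribution of codes rather than of participants.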
For the analysis of musical features, we applied the framework of Hashim et al. (2021), which categorises perceived musical triggers into four themes: Composition & Arrangement (structural or stylistic organisation, e.g., build-ups and contrasts), Elements (e.g., melody, rhythm), Instrumentation (specific instruments/techniques), and Soundscapes (atmospheric or environmental sonic qualities). These themes are mutually exclusive as they separate distinct musical attributes, and they are distinct from the imagery-content themes.
To avoid ambiguity, References (Hashim et al., 2023) covers imagined musical features within the MI (e.g., visualising a pianist), whereas the musical feature framework (Hashim et al., 2021) refers to the perceived musical features that participants reported as evoking their imagery. These frameworks address different aspects: imagery content versus perceived musical triggers.
A small number of descriptions provided in German were translated into English prior to coding by one of the first authors. Because the present study applied an existing, pre-specified thematic framework, coding was conducted deductively with regular checks to ensure consistency. A second coder was therefore not required for framework development.
All subsequent, exploratory quantitative analyses (descriptive statistics, correlations, ANOVAs, and theme frequencies) were conducted in SPSS (Version 25; IBM Corp.) and R (Version 4.5.2; R Core Team). All p-values reported are two-tailed and were tested at a significance level of α = .05. Effect sizes are reported as partial eta-squared (ηp²) for ANOVAs and Hedges’ g for pairwise comparisons. When the assumption of sphericity was violated, Greenhouse-Geisser corrections were applied to the degrees of freedom. Given the number of hypotheses tested, Holm-Bonferroni corrections were applied to control for alpha inflation, and Bonferroni-adjusted post-hoc tests were used for the ANOVA models. Individual difference variables (e.g., musical training scores) were recorded to characterise the sample and were not included as predictors due to the exploratory scope of the study and the limited sample size.
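To illustrate the logic of the Holm-Bonferroni step-down procedure mentioned above, a minimal Python sketch is given below. It is not the code used in the study (the analyses were run in SPSS and R); the function name `holm_bonferroni` and the p-values are hypothetical and serve only to show how the thresholds are applied.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True if the null is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            # Step-down rule: stop at the first non-significant test.
            break
    return reject


# Hypothetical p-values (not taken from the study).
p_vals = [0.001, 0.04, 0.03, 0.008]
decisions = holm_bonferroni(p_vals)
```

In this example, the two smallest p-values survive the correction, whereas .03 exceeds its threshold of .025, so it and all larger p-values are retained as non-significant.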
Results
Descriptive Statistics
Tables 1 and 2 present the means and standard deviations of the investigated variables, including the perceived emotional content, for each excerpt. Overall, 61 of 65 participants (93.8%) reported experiencing MI during music listening. Participants reported imagery for all three musical excerpts with average ratings around the midpoint of the scale (overall amount: M = 4.14, SD = 1.80; vividness: M = 3.97, SD = 1.90). Participants reported felt emotional experiences ranging from slightly to moderately intense (overall M = 3.35, SD = 1.13 on a 5-point Likert scale). Regarding music liking, the fearful music was disliked a great deal or disliked somewhat by 61% of the participants, whereas the happy and tender excerpts were liked somewhat or liked a great deal by 83.1% and 66.1%, respectively.
Table 1
Sample Sizes, Means, and Standard Deviations of Amount and Vividness of Mental Imagery, Felt Emotion, Liking, and Perceived Emotional Content Across the Three Musical Excerpts
| Variable | Happy n | Happy M | Happy SD | Tender n | Tender M | Tender SD | Fearful n | Fearful M | Fearful SD |
|---|---|---|---|---|---|---|---|---|---|
| Amount of MI | 65 | 4.72 | 1.66 | 65 | 3.74 | 1.53 | 64 | 3.97 | 2.05 |
| Vividness of MI | 64 | 4.38 | 1.86 | 63 | 3.73 | 1.70 | 62 | 3.79 | 2.07 |
| Intensity of felt emotional response | 63 | 3.51 | 1.01 | 65 | 3.11 | 1.13 | 63 | 3.43 | 1.20 |
| Liking | 65 | 3.95 | 0.91 | 65 | 3.88 | 1.01 | 64 | 2.55 | 1.21 |
| Perceived emotional content | | | | | | | | | |
| Happy | 63 | 4.61 | 1.51 | 65 | 3.66 | 1.55 | 62 | 1.53 | 1.05 |
| Tender | 61 | 4.54 | 1.69 | 65 | 4.58 | 1.73 | 62 | 1.84 | 1.37 |
| Sad | 64 | 2.88 | 1.64 | 64 | 3.19 | 1.78 | 62 | 3.40 | 1.97 |
| Fearful | 64 | 1.48 | 1.01 | 64 | 1.75 | 1.35 | 63 | 5.35 | 1.86 |
Note. Sample sizes ranged from 61 to 65 across variables because of missing responses. Descriptive statistics are based on available cases for each variable. The ANOVAs reported in the main text use data from participants who provided complete responses for the variables included in each model.
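As a quick consistency check, the overall means reported in the Descriptive Statistics text can be approximated from Table 1 as sample-size-weighted averages of the three excerpt means. The Python sketch below assumes that the overall means were pooled across all available ratings, which the text does not state explicitly; the names `TABLE_1` and `pooled_mean` are hypothetical.

```python
# (n, M) per excerpt, copied from Table 1 in the order happy, tender, fearful.
TABLE_1 = {
    "amount": [(65, 4.72), (65, 3.74), (64, 3.97)],
    "vividness": [(64, 4.38), (63, 3.73), (62, 3.79)],
    "felt_emotion": [(63, 3.51), (65, 3.11), (63, 3.43)],
}


def pooled_mean(cells):
    """Weighted mean of cell means, using each cell's n as its weight."""
    total_n = sum(n for n, _mean in cells)
    return sum(n * mean for n, mean in cells) / total_n


# Round to two decimals, as in the reported values.
overall = {name: round(pooled_mean(cells), 2) for name, cells in TABLE_1.items()}
```

Under this assumption, the pooled values come out at 4.14 (amount), 3.97 (vividness), and 3.35 (felt emotional intensity), matching the overall means given in the text.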
Table 2
Sample Sizes, Means, and Standard Deviations of Mood, Listening Frequency, VVIQ, and Gold-MSI (Musical Training)
| Variable | n | M | SD |
|---|---|---|---|
| Mood | 65 | 5.29 | 1.11 |
| Listening frequency | 64 | 2.14 | 0.77 |
| VVIQ Score | 64 | 62.48 | 8.94 |
| Gold-MSI Musical Training Score | 64 | 21.09 | 9.17 |
Note. VVIQ = Vividness of Visual Imagery Questionnaire (observed range: 44–80); Gold-MSI = Goldsmiths Musical Sophistication Index (Musical Training observed range: 6–39).
Amount and Vividness of Mental Imagery
A repeated-measures ANOVA revealed that the amount of MI differed significantly between musical excerpts, F(1.92, 120.89) = 8.14, p = .001, ηp² = .11. MI was experienced to a greater extent following the happy excerpt (n = 64, M = 4.77, SD = 1.64) than the tender (n = 64, M = 3.77, SD = 1.53, p < .001, g = 0.62) and fearful excerpts (n = 64, M = 3.97, SD = 2.06, p = .008, g = 0.42; Figure 1A). Although vividness ratings were numerically highest for the happy excerpt as well, the differences in vividness across excerpts were not statistically significant (all ps ≥ .58; Figure 1B).
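The pairwise effect sizes can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: the standardiser is taken to be the square root of the average of the two condition variances, and the small-sample correction uses df = n − 1. The function name `hedges_g` is hypothetical, and other variants of Hedges’ g for within-subject comparisons exist.

```python
import math


def hedges_g(m1, sd1, m2, sd2, n):
    """Hedges' g for two conditions rated by the same n participants.

    Assumptions (not confirmed by the source): the standardiser is the
    square root of the average variance, and the small-sample correction
    factor uses df = n - 1.
    """
    sd_avg = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    d = (m1 - m2) / sd_avg
    correction = 1 - 3 / (4 * (n - 1) - 1)
    return d * correction


# Means and SDs for imagery amount as reported in the text (n = 64).
g_happy_tender = hedges_g(4.77, 1.64, 3.77, 1.53, 64)
g_happy_fearful = hedges_g(4.77, 1.64, 3.97, 2.06, 64)
```

Under these assumptions, the sketch yields values of approximately 0.62 and 0.42, in line with the effect sizes reported for the happy–tender and happy–fearful comparisons.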
Figure 1
Distributions of (A) Imagery Amount, (B) Imagery Vividness, (C) Emotional Intensity, and (D) Liking for Happy, Tender, and Fearful Ambient Music Excerpts
*p < .05. **p < .01. ***p < .001.
Pearson correlations showed that participants who reported more imagery during the happy excerpt also reported more imagery during the tender and fearful excerpts (happy–tender: n = 65, r = .27, p = .03; happy–fearful: n = 64, r = .41, p < .001). Within each excerpt, there was a strong positive correlation between the amount and vividness of imagery (happy: n = 64, r = .83; tender: n = 63, r = .56; and fearful: n = 62, r = .73; all ps < .001).
Intensity of Emotion and Liking
The intensity of the felt emotional response differed significantly between excerpts, F(1.95, 117.06) = 3.18, p = .047, ηp² = .05, with the post-hoc tests showing a more intense emotional response during the happy music (n = 61, M = 3.57, SD = 0.96) than the tender music (n = 61, M = 3.18, SD = 1.12, p = .03, g = 0.37; Figure 1C). Intensity ratings were positively correlated across excerpts (happy–tender: n = 63, r = .42, p < .001; happy–fearful: n = 61, r = .33, p = .009; tender–fearful: n = 63, r = .37, p = .003). Emotional intensity also positively correlated with the amount of MI for all excerpts (happy: n = 63, r = .39, p = .002; tender: n = 65, r = .46, p < .001; fearful: n = 63, r = .44, p < .001). For the fearful excerpt, intensity additionally correlated with vividness of imagery, n = 62, r = .46, p < .001. Finally, emotional intensity was strongly and positively correlated with music liking for both the happy (n = 63, r = .66, p < .001) and tender excerpts (n = 65, r = .60, p < .001).
Liking also differed significantly between excerpts, F(1.77, 111.50) = 39.16, p < .001, ηp² = .38, with the happy (n = 64, M = 4.00, SD = 0.84, g = 1.38) and tender (n = 64, M = 3.89, SD = 1.01, g = 1.19) excerpts being liked significantly more than the fearful excerpt (n = 64, M = 2.55, SD = 1.21; both ps < .001; Figure 1D). Moreover, liking was also correlated with the amount of imagery: participants who reported more imagery tended to like the music more, most strongly for the happy excerpt (n = 65, r = .44, p < .001), with smaller but significant associations for the tender (n = 65, r = .28, p = .02) and fearful excerpts (n = 64, r = .32, p = .01).
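The fractional degrees of freedom in these ANOVAs follow from the Greenhouse-Geisser correction, which multiplies both degrees of freedom by an estimate ε of the departure from sphericity. The Python sketch below shows the arithmetic for the liking model; ε is not reported in the text and is backed out here from the reported numerator degrees of freedom, so it should be read as an approximation. The function name `gg_corrected_df` is hypothetical.

```python
def gg_corrected_df(epsilon, k, n):
    """Greenhouse-Geisser-corrected df for a one-way repeated-measures ANOVA.

    k is the number of conditions, n the number of complete cases.
    """
    df_effect = epsilon * (k - 1)
    df_error = epsilon * (k - 1) * (n - 1)
    return df_effect, df_error


# Liking model: k = 3 excerpts, n = 64 complete cases (from the text).
k, n = 3, 64
# Assumption: epsilon is inferred from the reported numerator df of 1.77.
epsilon = 1.77 / (k - 1)
df_effect, df_error = gg_corrected_df(epsilon, k, n)
```

With ε ≈ 0.885, the corrected error degrees of freedom come out at about 111.5, close to the reported value of 111.50; the small difference reflects rounding of the reported numerator.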
Emotional Content Intended Versus Perceived
Three repeated-measures ANOVAs tested whether participants’ perceived emotions aligned with the intended emotional content of each excerpt (Figure 2). For all excerpts, perceived emotional ratings differed significantly across the four emotion categories, happy: F(2.42, 145.44) = 62.75, p < .001, ηp² = .51; tender: F(2.44, 153.77) = 35.82, p < .001, ηp² = .36; fearful: F(2.30, 140.22) = 69.07, p < .001, ηp² = .53. Post-hoc comparisons showed that the tender (n = 64, M = 4.56, SD = 1.74) and fearful excerpts (n = 62, M = 5.32, SD = 1.86) were reliably perceived as their intended emotions (all ps < .001). The happy excerpt was rated as more happy (n = 61, M = 4.61, SD = 1.53) than sad (n = 61, M = 2.82, SD = 1.63) or fearful (n = 61, M = 1.48, SD = 1.01; all ps < .001), but did not significantly differ from tender, suggesting similar perceived levels of happiness and tenderness (Supplementary Table 1).
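Partial eta-squared can be recovered from an F value and its degrees of freedom via the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies this identity to the three models reported above; the function name `partial_eta_squared` is hypothetical, and the snippet is a consistency check rather than the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)


# (F, df_effect, df_error) for the perceived-emotion models, from the text.
reported = {
    "happy": (62.75, 2.42, 145.44),
    "tender": (35.82, 2.44, 153.77),
    "fearful": (69.07, 2.30, 140.22),
}

# Round to two decimals, as in the reported effect sizes.
eta_p2 = {
    name: round(partial_eta_squared(*values), 2)
    for name, values in reported.items()
}
```

The resulting values (.51, .36, and .53) agree with the effect sizes reported for the happy, tender, and fearful excerpts. Because the Greenhouse-Geisser factor scales both degrees of freedom equally, the identity holds for corrected and uncorrected degrees of freedom alike.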
Figure 2
Perceived (Rated) Emotional Content across the Happy, Tender, and Fearful Ambient Music Excerpts
Note. Error bars represent ±1 SEM.
Content of Mental Imagery
Across the content analysis, a total of n = 641 codes were assigned to 177 imagery reports. Main and sub-themes rarely occurred exclusively, meaning that the majority of participants reported a combination of imagery elements.
With regard to the three main themes, Storytelling (71%) was the most common across all three ambient pieces, followed by Associations (20%) and References (9%). See Figure 3A.
Figure 3
Relative Distribution of (A) Theme (Level 1), (B) Storytelling Sub-theme (Level 2), (C) Associations Sub-theme (Level 2), and (D) References Sub-theme (Level 2) for Ambient Music Excerpts
Storytelling
Figure 3B shows the relative distribution of Storytelling sub-themes (Setting & Location, Characters, Action, Narrative). Setting & Location was the most frequently described sub-theme for ambient music, followed by Characters, Action, and Narrative.
Associations
Figure 3C presents the relative distribution of Associations sub-themes (Abstract & Other, Feelings & Atmospheres, Memories) for the ambient music stimuli. Feelings & Atmospheres accounted for the largest proportion of responses, followed by Abstract & Other, while Memories were least frequently reported.
References
Figure 3D shows the relative distribution of References sub-themes (Other Media, Music). Music-related references appeared most often in response to the ambient excerpts.
While Level 1 and Level 2 themes provide a broad overview of imagery content, Level 3 codes offer a more detailed insight into the specific imagery elements participants described. Examination of these codes indicated that imagery varied across the three ambient excerpts but was notably consistent within each excerpt, with each excerpt giving rise to a distinct pattern of imagery, as the following overview (including exemplary quotes) shows.
Happy Excerpt
Following the happy excerpt, participants mostly described positive outdoor scenes, including good weather (12), meadows (12), forests (5), flowers (5), celebratory events (5), and generally peaceful daytime situations. One example reads: “A meadow with butterflies and bees and a woman in a white dress, next to a river. She sways in the wind and watches the insects.” (F, 23).
Several participants described imagery resembling cinematic techniques, such as time lapse sequences (4), slow motion (2), or sepia tones: “I saw a wedding recorded by an old filmmaking device. First appeared to be a sepia movie then slowly transformed into being real.” (F, 23).
Abstract imagery was also present, particularly colours forming a coherent palette aligned with spring and summer settings (8): shades of green (8), yellow (4), white (3) and orange (3). Some answers were more general, speaking of “warm” and “friendly” colours, complementing the frequent mention of sunlight, flowers, and small animals.
Tender Excerpt
The tender excerpt elicited imagery characterised by slow, flowing movement, often set in tranquil environments. Participants frequently imagined the seaside, ocean or waves (14), with scenes often taking place in the evening (6): “There was a coast […] with some waves running against the cliffs. It’s afternoon, it’s a nice and sunny day and I saw the cliffs from above, like a drone shot. Everything moved really slow, but strong.” (M, 23).
Another notable recurring theme was romantic or relational imagery (7), which featured partners, couples, or loved ones, differing from the happy excerpt which centred more on celebratory events such as weddings. Regarding the abstract imagery, the colour blue (4) was mentioned. Additionally, feelings of nostalgia (6) and relaxation (5), but also loneliness (4) were reported. Finally, mental activity in the form of rumination played a vital role during the tender music listening: “A lonely person walking by a river during the night. A few street lamps stand sporadically… Reflecting on things they wish had gone differently.” (M, 23).
Fearful Excerpt
The fearful excerpt especially elicited dark and threatening imagery, with several participants describing sensations of being followed or in danger:
I’m in my bed and I hear something. Maybe there is some kind of ghost. So I start hiding under my blanket with one eye on the door. As the sounds get more intense the doorknob turns […]. I can see a shadow under the door which disappears slowly. I am safe, but my fear is real. (M, 23)
In this excerpt, fear (11) was explicitly mentioned, accompanied by frequent mention of “loneliness” (6) and negative emotions including “stress”, “discomfort”, and “unease”. Colour imagery was dominated by dark tones, greys, and black (9), aligning with frequent mentions of darkness (19) as well as nighttime (6) and forest (12) settings, with the moon appearing in several reports (4).
In line with the patterns observed at Level 1 and 2 themes, imagery elements in these excerpts often co-occurred, with reports predominantly visual but at times accompanied by affective or narrative elements such as emotions, reflections, or imagined movement.
Musical Features
After describing their imagined content, participants commented on musical features they felt had triggered these imaginings. These responses were categorised using the framework by Hashim et al. (2021), comprising four Level 1 themes: Composition & Arrangement, Elements, Instrumentation, and Soundscapes.
Figure 4 shows that Elements (e.g., melody, pitch level, tempo, timbral or textural features) were referenced most frequently (37.5%), followed by Soundscapes, Instrumentation, and Composition & Arrangement.
Figure 4
Relative Distribution of Reported Musical Features of Ambient Excerpts
At a more detailed level, several consistent links emerged between specific musical features and the imagery participants reported. In the happy excerpt, five participants described imagery resembling cinematic techniques (e.g., time-lapse). Four participants linked these to electronically generated sounds described as “flute-like” (2), “metallic”, or “mechanical”, whereas another participant mentioned a “swooshing” effect as the trigger for an association with film. The high-pitched sample line by the toy Yamaha VSS-30 (3) was also associated with “colourful small birds” and “butterflies”. Strings were likewise linked to warmth, positive events, and nature.
For the tender excerpt, participants most commonly attributed their imagery to the guitar (18), its melody (7), tempo (9), and the music’s overall calm, relaxing character (19). As with the imagery elements described above, these features were often mentioned in combination. In the fearful excerpt, imagery was frequently connected to dissonance or lack of harmony (10) and pitch (12), and string timbres described as “dark”, “shrill”, or “shrieking”. Among those who imagined suspenseful scenes (13), nine participants attributed these to dynamics and the exciting character of the music.
Discussion
This study examined how ambient music shapes the experiential qualities (i.e., amount and vividness) of MI and explored the content of MI by deductively applying the thematic framework of Hashim et al. (2023) to a genre not previously examined using this approach. Additionally, emotional responses and music liking were included as contextual variables to better understand the experiential conditions under which imagery arose. Finally, we explored which musical features listeners associated with their imagery to gain insight into how they may contribute to the themes identified. Below, we discuss these findings in relation to previous research, consider their theoretical implications, and address limitations and future directions.
Mental Imagery Prevalence and Affective Responses
Firstly, the vast majority of participants (93.8%) reported experiencing MI when listening to ambient music, consistent with high prevalence rates shown across studies with different types of music (Day & Thompson, 2019; Hashim et al., 2023; Küssner & Eerola, 2019). However, it is important to note that participants were provided with a definition of MI immediately prior to the listening phase. Although this procedure was applied equally across participants and aligns with common practice in MI research, it may have increased participants’ awareness of and reporting on the occurrence of imagery. Therefore, while this finding adds to growing evidence that MI is a widespread phenomenon during music listening, the reported percentage should be interpreted with caution when generalising to everyday contexts.
Despite the small number of stimuli, consistent results regarding music liking responses emerged. Specifically, the happy and tender excerpts were liked considerably more than the fearful excerpt, and liking was positively associated with imagery amount across all three conditions. This pattern raises an intriguing question about the directionality of this relationship: do listeners imagine more when they enjoy music, or is it plausible that creating more imagery enhances the liking of music? While the present data cannot resolve this causal question, they highlight that affective engagement and imagery may co-occur during ambient music listening.
The amount of imagery varied substantially across excerpts, with the happy excerpt evoking more imagery than both the tender and fearful excerpts. This pattern extended to emotional responses, where the happy excerpt also evoked significantly stronger emotional intensity than the tender excerpt. The positive associations between imagery amount and emotional intensity across all excerpts are consistent with prior research demonstrating links between music-induced emotion and MI (Taruffi & Küssner, 2019). However, as we only tested one stimulus per emotional category, it is beyond the scope of this article to draw conclusive inferences about emotion-specific effects, particularly in the quantitative comparisons.
An important caveat is that the happy excerpt was perceived as both happy and tender, complicating interpretation of emotion-specific effects. Previous research found that happy music is typically more arousing than peaceful music (Kreutz et al., 2008; Vieillard et al., 2008), which may explain why this excerpt elicited higher emotional intensity regardless of its ambiguous valence profile.
Thematic Structure and Content of Ambient-Evoked Imagery
A key contribution of this study is demonstrating that the thematic framework developed by Hashim et al. (2023) can be successfully applied to musical material beyond its original repertoire of classical music used in film. The three core themes, namely Storytelling, Associations, and References, emerged reliably across all three ambient excerpts, notwithstanding ambient music’s distinct characteristics such as minimalist features, sustained textures, and a diffuse structure. This highlights the framework’s broader applicability outside the classical (film) music repertoire for which it was originally developed.
When descriptively compared with the three classical (film) excerpts from Hashim et al. (2023), the three ambient excerpts in the present study revealed both commonalities and distinctions. Storytelling remained the most dominant theme in both genres, echoing prior work showing that listeners construct narrative-like internal scenes during music listening (Margulis, 2017; McAuley et al., 2021). However, within the Storytelling theme, ambient excerpts produced relatively fewer narrative accounts and more setting and location imagery than observed in the classical (film) excerpts (Hashim et al., 2023). Most notably, ambient excerpts elicited more Associations, especially feelings and atmospheres, and prompted more music-related references but fewer media references. This may indicate that, in the present ambient excerpts, listeners’ imagery was often grounded in features of the music itself rather than in external media references. The emergence of these patterns across multiple excerpts within each genre, rather than from single pieces, suggests they may reflect recurring tendencies in the stimulus sets used here, meriting further systematic, hypothesis-driven investigation. Together, these findings suggest that although narrative imagery remains a prominent and recurring feature of music-induced MI across the two genres, the ambient excerpts tended to shift imagery toward associative and spatial content.
At a more granular level, the qualitative analysis revealed that each excerpt appeared to evoke imagery content with notable coherence in the emotional tone and theme. For instance, imagery described for the happy excerpt centred on bright, outdoor scenes, while the tender excerpt prompted imagery of tranquil, flowing settings, and the fearful excerpt elicited dark, threatening scenarios. While these patterns show some consistency with empirical findings and review literature suggesting that a piece’s emotional character may guide the form of listeners’ imagery (Day & Thompson, 2019; Taruffi & Küssner, 2019), the single-stimulus design means we cannot rule out that these effects are specific to our particular musical selections rather than characteristic of their broader emotional categories. Nevertheless, the occurrence of colour imagery that aligned with each excerpt’s emotional tone is consistent with previous findings that music-colour associations are mediated by emotion (Barbiere et al., 2007; Isbilen & Krumhansl, 2016), suggesting that emotional connotations may play a linking role in shaping the imagery content in response to ambient music listening.
Musical Features as Triggers for Imagery
Given these patterns in imagery content, it is useful to consider which specific musical features may have contributed to these experiences. While musical genre and specific features have been shown to affect the nature of listeners’ imagery (Ayyildiz, Milne, et al., 2025; Jakubowski et al., 2024; van der Walle et al., 2025a, 2025b), the mechanisms through which ambient excerpts evoke particular imagery content remain unexplored. In this study, several musical features were reported as plausible triggers. Some of these were consistent with iconic associations (Schaerlaeken et al., 2019), where the acoustic shape resembles the imagined content. For example, the wave-like melodic contour in the tender excerpt prompted images of flowing water. Others aligned with symbolic associations, such as bell sounds evoking churches, which usually depend on culturally learned links (Cross, 2009). These examples can also be understood through frameworks of cross-domain mapping or musical metaphors (Antović, 2022; Eitan & Granot, 2006), whereby listeners interpret musical parameters in terms of other domains (e.g., movement, objects).
Aside from these cases, most musical features in the excerpts elicited a wide variety of idiosyncratic imagery content. Elements such as dissonance in the fearful excerpt or the slow tempo in the tender excerpt were commonly mentioned, but they also produced highly diverse imagery (e.g., being followed, dark woods, romance, loneliness). This variability in responses to these excerpts may reflect musical features establishing an emotional tone that listeners interpret in individually meaningful ways.
Limitations and Future Directions
One potential shortcoming of the present study is the use of self-report measures, which remain one of the most commonly used tools in MI research but are nevertheless susceptible to several biases. In particular, participants may be influenced by demand characteristics, priming effects, or expectations about what the study is “looking for”. This does not undermine the present findings, as self-report remains essential for accessing the experiential qualities of MI, including its subjective vividness and content (features that cannot be accessed through objective measures alone), but it does highlight the importance of interpreting findings within the constraints of introspective methods. In future work, objective methods (e.g., neural measures such as electroencephalogram decoding of imagined content) or semi-objective behavioural indices (e.g., binocular rivalry priming) could help complement these subjective reports. Combining these methods with self-report would allow for a more comprehensive and multidimensional understanding of how imagery arises during listening to ambient music.
A central limitation is the small number of musical stimuli. Because each emotional category and each genre was represented by only one excerpt, observed differences may reflect idiosyncratic properties of these specific pieces rather than category-level characteristics. Considering the stylistic diversity of ambient music, quantitative findings in this paper should also be viewed as preliminary. Future studies using multiple exemplars per category and testing different genres within the same study will be necessary to assess the generalisability of these patterns, particularly across a broader range of musical genres other than ambient and classical (film) music.
Additionally, although we draw descriptive comparisons with classical (film) music reported in Hashim et al. (2023) to contextualise the findings, the present study was not designed as a direct genre comparison. The ambient and classical (film) datasets were collected in separate studies at different times with different samples, so they cannot be treated as directly comparable. A rigorous test of genre-related differences would require both conditions to be examined within a single experimental design. Accordingly, any cross-genre observations should be interpreted as tentative and exploratory.
Another limitation is the unexpected evaluation of the happiness-conveying excerpt as equally happy and tender, which diverges from the pattern observed in the results of the exploratory stimulus selection phase. This phase was intended solely to guide stimulus selection and did not constitute a formal validation of emotional categories. Moreover, perceived emotions were assessed using single-item ratings rather than an established multi-item instrument, which limits the precision of emotional differentiation. Thus, the observed overlap may reflect both the nuanced affective qualities of ambient music and constraints of the measurement approach.
Furthermore, taking into account the distinct listening contexts of ambient music could lead to more nuanced results. Future studies could take a different approach in a controlled lab setting, allowing participants to fully immerse themselves in the music by listening to the tracks in their entirety. This could offer valuable insights into how the occurrence and content of MI might differ when the often-lengthy tracks are not truncated. In addition, individual differences in absorption and levels of musical training can shape the imaginings in response to music listening (Ayyildiz, Milne, et al., 2025; Vroegh, 2021) and merit systematic examination in future research.
At the framework level, the results suggest which Level 2 sub-themes proposed in the framework by Hashim et al. (2023) may require special attention in future research. Memories, for example, appeared only rarely in the present dataset, suggesting that memory-related imagery may function differently from other imagery types and may be better investigated as a distinct phenomenon.
Conclusion
This study provides novel insights into how ambient music shapes the content and experiential qualities of MI. A prominent pattern across the three emotionally varied excerpts was the strong predominance of imagery with story-like elements, especially settings, which is consistent with patterns previously observed for classical music in film (Hashim et al., 2023). The ambient excerpts used here tended to yield a notable amount of associative imagery, particularly feelings and atmospheres, suggesting that listeners often engage with ambient music in ways that foreground affective and atmospheric imagery. Reports of musical features further showed that some features were linked to imagery in excerpt-specific ways, whereas many others elicited highly individual interpretations. Finally, the strong associations between the amount of imagery, intensity of emotional responses, and overall liking highlight the close connection between imagery and emotion. Overall, these findings contribute to a more detailed understanding of MI evoked by ambient excerpts, which is particularly relevant for applied and creative contexts where such music is used for its atmospheric, affective, and immersive qualities.
This is an open access article distributed under the terms of the