Running Head: Masculinity and Femininity in Speech

 

 

In press, Language and Speech

 

 

The Acoustic Correlates of Perceived Masculinity, Perceived Femininity,
and Perceived Sexual Orientation

 

 

Benjamin Munson

Department of Speech-Language-Hearing Sciences

University of Minnesota, Twin Cities

 

 

Please address correspondence to:

 

Benjamin Munson

Department of Speech-Language-Hearing Sciences

University of Minnesota

115 Shevlin Hall

164 Pillsbury Drive, SE

Minneapolis, MN  55455

(612) 624-0304

Fax: (612) 624-7586

Munso005@umn.edu




ABSTRACT

Previous studies have shown that a subset of gay, lesbian, and bisexual (GLB) and heterosexual adults produce distinctive patterns of phonetic variation that allow listeners to detect their sexual orientation from audio-only samples of read speech.  The current investigation examined the extent to which judgments of sexual orientation from speech are related to judgments of masculinity or femininity made by an independent group of listeners.  It also examined the acoustic measures that predict perceived sexual orientation and perceived masculinity/femininity.  Ten listeners judged the perceived masculinity or femininity of 44 talkers (11 heterosexual men, 11 heterosexual women, 11 gay men, and 11 lesbian or bisexual women).  These judgments were compared to measures of the talkers' perceived sexual orientation, and to acoustic measures of the talkers' speech.  Listeners judged gay men to sound less masculine than heterosexual men, and lesbian/bisexual women to sound less feminine than heterosexual women.  These measures were significantly correlated with measures of perceived sexual orientation.  Regression analyses showed that different sets of acoustic measures predicted perceived sexual orientation and perceived masculinity/femininity, and that some acoustic measures were more strongly correlated with one perceptual measure than the other.  Results suggest that perceived sexual orientation, perceived masculinity, and perceived femininity are distinct but correlated perceptual parameters.  [Key Words: Sexual Orientation, Sociophonetics, Speech Perception]


INTRODUCTION

Sociophonetic Variation

The speech signal conveys multiple types of information in parallel.  In addition to conveying linguistic information about the content of words and sentences, speech is one of the media through which individuals can convey their membership in different social groups.  This type of variation is sometimes referred to as social-indexical variation.  When the social-indexical variation relates to the sound structure of language, it is referred to as sociophonetic variation.  Sociophonetic variation is pervasive.  As reviewed by Docherty and Foulkes (2001), sociophonetic studies have found systematic influences of age, gender, ethnicity, and socio-economic status on variation in speech production, as well as systematic influences of conversational variables, such as speaker-listener familiarity and topic, on variation in speech production.  This variation is salient to listeners.  For example, listeners can use sociophonetic variation to ascertain a talker’s ethnicity (Purnell, Idsardi, & Baugh, 1999), regional dialect (Clopper & Pisoni, 2004; Van Bezooijen & Gooskens, 1999), and sex (Lass, Almerino, Jordan, & Walsh, 1980) at greater-than-chance levels from audio-only samples of content-neutral speech. 

Social-indexical variation plays a powerful role in interpersonal communication, in that it is used to form overall impressions of affective characteristics of talkers.  Van Bezooijen (1995) demonstrated that Japanese listeners attended to variation in the pitch of women's voices when making affective judgments.  In addition to conveying specific information about talker characteristics, social-indexical information has been shown to mediate more abstract aspects of linguistic processing.  Nygaard and Lunders (2002) showed that the resolution of lexically ambiguous words is mediated by listeners’ perception of a talker’s emotional state.  Other studies demonstrated that listeners' phonetic identification of vowel and fricative continua is influenced by their perception of a talker's sex and the sex typicality of his or her speech (Johnson, Strand, & D’Imperio, 1999; Munson, Jefferson, & McDonald, 2006; Strand & Johnson, 1996).  Strand (2000) showed that talkers' sex typicality influences listeners' reaction times in a gated spoken-word recognition experiment. 

Sexual Orientation and Speech

The focus of this investigation is on the production and perception of a social-indexical category, talker sexual orientation.  A popular culture notion holds that talkers can convey their sexual orientation, either intentionally or unintentionally, through distinctive patterns of phonetic variation.  This notion has been supported by a number of recent studies (e.g., Carahaly, 2000; Gaudio, 1994; Linville, 1998; Munson, McDonald, DeBoe, & White, 2005; Pierrehumbert, Bent, Munson, Bradlow, & Bailey, 2004; Smyth, Jacobs, & Rogers, 2003).  Pierrehumbert et al. (2004) examined short samples of read sentences from a large group (n=103) of gay, lesbian and bisexual (henceforth GLB) and heterosexual women and men.  Pierrehumbert et al. examined five vowels: /i/, /ɛ/, /æ/, /ɑ/, and /u/.  Average vowel duration, as well as average F1 and F2 of each vowel, was measured.  Vowel-space dispersion was measured as the mean Euclidean distance from the center of the vowel space, following Bradlow, Torretta, and Pisoni (1996).  GLB people produced larger F1/F2 acoustic vowel spaces relative to their same-sex heterosexual peers.  For gay men, this appeared to be due to an overall hyperarticulation of the vowel space.  For women, this effect appeared to be limited to the vowels /u/ and /ɑ/.  The lesbian and bisexual (henceforth L/B) women produced retracted variants of these sounds compared to heterosexual women[1].
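The dispersion measure described above can be illustrated with a minimal sketch. It assumes that the center of the vowel space is the mean F1 and mean F2 across the vowels measured; the function name and the formant values are hypothetical and are not taken from any of the studies cited.

```python
import math

def vowel_space_dispersion(formants):
    """Mean Euclidean distance of each vowel from the vowel-space center.

    `formants` maps a vowel label to its (F1, F2) pair, all in the same unit.
    The center is assumed to be the mean F1 and mean F2 across vowels.
    """
    points = list(formants.values())
    f1_center = sum(f1 for f1, _ in points) / len(points)
    f2_center = sum(f2 for _, f2 in points) / len(points)
    distances = [math.hypot(f1 - f1_center, f2 - f2_center) for f1, f2 in points]
    return sum(distances) / len(distances)

# Hypothetical formant values in Hz, for illustration only.
talker = {
    "i": (300.0, 2300.0),
    "ae": (700.0, 1800.0),
    "a": (750.0, 1200.0),
    "u": (350.0, 1000.0),
}
print(round(vowel_space_dispersion(talker), 1))
```

A larger value indicates vowels that lie farther, on average, from the center of the talker's F1/F2 space, which is the sense in which a vowel space is described as more expanded or hyperarticulated.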

              More recently, Munson, McDonald, DeBoe, and White (2005) examined acoustic and perceptual characteristics of 44 adults.  This group included 11 each of gay men, heterosexual men, L/B women, and heterosexual women.  Only one woman in Munson et al. (2005) identified as bisexual; her data were combined with those of the ten lesbian-identified women.  Unlike Pierrehumbert et al., Munson et al. examined the acoustic characteristics of single words rather than read sentences or naturalistic speech.  The interpretation of measures of segmental phonetic characteristics taken from sentences and connected speech is complicated by the influence of prosodic structure on the acoustic and perceptual characteristics of sounds.  For example, Pierrehumbert et al.’s finding that gay men hyperarticulated their vowel spaces relative to heterosexual men might indicate a preference for gay men to produce multiple nuclear stressed words in sentences, which would lead to hyperarticulation of stressed syllables (De Jong, 1995), rather than to them producing intentionally peripheral vowel targets.  In an acoustic study of the production of single words, Munson et al. found that gay men produced a higher F1 frequency in /æ/ and /ɛ/ and a more negatively skewed /s/ spectrum than heterosexual men.  L/B women produced a lower F1 frequency in /ɛ/ and a lower F2 frequency in /oʊ/ than heterosexual women.  GLB and heterosexual people did not differ in height, suggesting that the observed differences were not related to vocal-tract size. 

              Munson et al. (2005) also conducted a perception experiment, in which a group of 40 naïve listeners were asked to rate the talkers’ perceived sexual orientation on a five-point equal-interval scale.  Each listener rated each talker four times: once based on words containing back vowels and no sibilant fricatives (e.g., note); once based on words with back vowels and sibilant fricatives (e.g., soap); once based on words with front vowels and no sibilant fricatives (e.g., path); and once based on words with front vowels and sibilant fricatives (e.g., said).  The choice to use words with varying segmental composition was motivated by the finding that speech-production differences between GLB and heterosexual people were phoneme-specific, and appeared to be limited primarily to front vowels and sibilant fricatives.  Munson et al. showed that naïve listeners judged GLB people to be more GLB sounding than heterosexual people, though there was some overlap between the groups.  The segmental composition of the stimuli affected judgments for the group of male talkers only.  Listeners rated gay men as more gay-sounding when judging words containing front vowels than when judging back-vowel words.  Ratings of L/B and heterosexual women were very similar across the four stimulus types. 

              A second set of analyses used hierarchical multiple regression to examine which acoustic characteristics of the 44 talkers predicted listeners' ratings of perceived sexual orientation.  The independent measures in these regressions were average F1 and F2 frequency (using the psychophysically motivated Bark scale, Zwicker & Terhardt, 1980), average vowel-space dispersion (using Bradlow et al.’s mean Euclidean distance measure), average f0, average f0 range (using the psychophysically motivated ERB scale, Hermes & Van Gestel, 1991), /s/ center of gravity and /s/ skewness (i.e., the first and third spectral moments of /s/), and an acoustic index of the breathiness of the voice source, H2-H1 amplitude for the vowel /æ/.  In the analyses of male talkers, F1 frequency, F2 frequency, and /s/ skewness predicted a significant proportion of variance in perceived sexual orientation.  Men who produced low vowels with a high F1, back vowels with a high F2, and /s/ with a highly negatively skewed spectrum were more likely to be rated as gay sounding than men with the opposite characteristics.  Average f0 accounted for a small proportion of variance (16%) in men's perceived sexual orientation beyond what was accounted for by F1 frequency, F2 frequency, and /s/ skewness.  For women, F1 frequency, F2 frequency, and vowel-space dispersion predicted a significant proportion of variance in perceived sexual orientation.  Women were likely to be rated as L/B sounding if they produced a low F1 in low vowels, a low F2 in back vowels, and more-contracted vowel spaces overall. 
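Several of the acoustic measures named above are simple transformations of a frequency or of a spectrum, and a short sketch may help make them concrete. The Bark and ERB-rate conversions below use the analytic approximations usually attributed to the sources cited; whether the original analyses used exactly these forms is an assumption. The spectral-moment function treats a power spectrum as a probability distribution over frequency, which is the usual approach but is likewise assumed here. All function names and input values are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Bark value via the analytic approximation usually attributed to
    Zwicker & Terhardt (1980); assumed, not confirmed, to match the original analysis."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def hz_to_erb_rate(f_hz):
    """ERB-rate value in the form usually attributed to Hermes & Van Gestel (1991);
    assumed, not confirmed, to match the original analysis."""
    return 16.7 * math.log10(1.0 + f_hz / 165.4)

def center_of_gravity_and_skewness(freqs_hz, power):
    """First spectral moment (center of gravity) and normalized third
    moment (skewness) of a power spectrum."""
    total = sum(power)
    weights = [p / total for p in power]  # spectrum treated as a distribution
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    third = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights))
    return mean, third / variance ** 1.5

# Hypothetical /s/ spectrum with most energy at high frequencies and a
# tail toward low frequencies, which yields a negative skewness.
freqs = [2000.0, 4000.0, 6000.0, 8000.0]
power = [1.0, 2.0, 4.0, 9.0]
cog, skew = center_of_gravity_and_skewness(freqs, power)
print(round(hz_to_bark(700.0), 2), round(hz_to_erb_rate(120.0), 2), round(cog), skew < 0)
```

In this framing, a "more negatively skewed" /s/ spectrum is one whose energy is concentrated above the center of gravity with a longer tail below it.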

              Both Pierrehumbert et al. (2004) and Munson et al. (2005) argued that the group differences between GLB and heterosexual people do not represent a generic scaling of the GLB people’s speech to the opposite sex norms.  That is, the gay men’s speech style does not appear to be a globally feminine speaking style, nor does the L/B women’s speech appear to be globally masculine.  This can be seen in a number of different results, including the finding that GLB and heterosexual people did not differ in either mean f0 or f0 range in either study, and the finding that GLB and heterosexual people did not differ in the overall spacing of the ensemble of vowels in the F1/F2 acoustic vowel space.  Rather, both Pierrehumbert et al. (2004) and Munson et al. (2005) hypothesized that the specific differences between GLB and heterosexual talkers are learned, socially conventional ways of speaking that convey sexual orientation. 

Masculinity and Femininity in Speech

              The purpose of the current study is to examine further whether perceived sexual orientation and sex typicality in speech are distinct constructs.  This is done by conducting a post-hoc comparison of the perceived sexual orientation measures from Munson et al. (2005) to a set of perceived masculinity/femininity measures of the same talkers.  These measures were made for post-hoc analysis of a study on how perceived sexual orientation affects fricative perception (Munson, Jefferson, & McDonald, 2006); they were not analyzed in Munson et al. (2005). 

              Understanding the relative independence of perceived sexual orientation and perceived masculinity/femininity is important for a variety of reasons.  First, popular-culture characterizations of GLB people's speech often portray this speech style as inappropriately sex-atypical, either explicitly or by implication.  This is particularly salient in the exaggerated and often prejudicial caricatures of GLB people seen in the popular media, which have been well documented in, for example, the films of the mid-to-late 1900s (Russo, 1987).  However, the small body of research on sexual orientation and speech has never systematically examined whether GLB speech styles are indeed wholesale sex-atypical ways of speaking.  Some previous results suggest otherwise, such as Pierrehumbert et al.'s finding that L/B women's vowel formants differed from heterosexual women's only for the F2 of /ɑ/ and /u/.  Second, understanding whether GLB speech styles are globally sex-atypical ways of speaking will help motivate research on the ways in which this speech style is acquired by talkers and processed by listeners.  It is incontrovertible that young children are sensitive to sex differences in speech production.  Boys' and girls' sex can be identified by naïve listeners from audio-only speech samples well in advance of the puberty-related anatomical changes that would make these inevitable (Perry, Ohde, & Ashmead, 2001), suggesting that early speech acquisition involves the identification and selective learning of sex-specific ways of speaking.  If GLB speech styles were indeed globally sex-opposite ways of speaking, then they could be acquired by simply emulating opposite-sex models in the ambient language.  In contrast, a finding that GLB speech styles are not globally sex-atypical ways of speaking suggests a more-complicated acquisition scenario.  This process would be analogous to the acquisition of other variable patterns, such as those described by Roberts (2001).  The details of the acquisition of GLB speech styles would be dependent on the specific ways in which GLB and heterosexual people differ. 

              This study expands on previous research by examining relationships between perceptual measures of sexual orientation and perceptual measures of sex typicality made by an independent group of listeners.  Here, sex typicality is measured by having listeners judge how masculine or feminine the talkers sound.  Pierrehumbert et al. and Munson et al.’s arguments that GLB speech styles are not globally sex-atypical ways of speaking would be considerably strengthened if it could be demonstrated that perceived sexual orientation and perceived masculinity or femininity measures differed systematically.  Moreover, the argument would be strengthened if specific acoustic measures that uniquely predict the two parameters could be identified. 

              A number of previous studies have examined masculinity in male speech without making explicit reference to actual or perceived sexual orientation.  Both Terango (1966) and Avery and Liss (1994) examined acoustic characteristics of the speech of men who had been perceptually identified as more- or less-masculine sounding.  Both of these investigations found that greater sentence-level pitch variation was associated with less-masculine sounding speech.  In addition, Avery and Liss found that less-masculine-sounding men produced higher center frequencies of the fricative /s/, as well as more-peripheral formant frequencies for the vowel /i/, than more-masculine-sounding men.  No previous studies have examined the acoustic correlates of femininity in women's speech, though some anecdotal observations have been made.  For example, Strand and Johnson's (1996) study of the influence of sex typicality on fricative perception included stimuli from one woman who was rated as prototypically feminine and one who was not.  They noted that the non-prototypical-sounding woman had a more creaky voice quality than the prototypical one. 

              Three previous studies have directly compared perception of sexual orientation and masculinity in speech.  Gaudio (1994) asked a group of listeners to rate perceived sexual orientation and perceived masculinity for a small group (n=8) of men.  These ratings were found to be highly correlated.  Two weaknesses of this study are the small number of talkers, and the fact that the same group of listeners made both judgments.  Listeners’ judgments of one parameter might have been affected by their memory of how they had judged the same talker on the other parameter.  Smyth, Jacobs, and Rogers (2003) examined binary masculine/feminine and gay-sounding/straight-sounding measures for a larger (n=25) group of male talkers.  The two sets of ratings were made by independent groups of listeners.  Smyth et al. found that, on average, listeners were less likely to rate a talker as sounding feminine than they were to rate him as sounding gay.  Moreover, some of the 25 talkers were reliably rated to sound gay, but were not judged to sound feminine.  Smyth et al. conjectured that listeners were relying on vocal pitch when making judgments of masculinity, but not when making judgments of perceived sexual orientation.  However, this was not supported by acoustic analyses: mean f0 did not correlate significantly with either type of rating.  Levon (2006) found a modest, statistically significant correlation between measures of perceived sexual orientation and masculinity for men's voices. 

              The current investigation goes beyond previous research in three ways.  First, it examines relationships between perceived sexual orientation and perceived masculinity/femininity for both men and women.  Previous research has only examined these questions in male talkers.  Second, this study uses single-word stimuli rather than connected-speech materials.  The use of single-word stimuli allows us to draw more definite conclusions regarding the specific segmental phonetic cues that are associated with perceived sexual orientation and perceived masculinity/femininity than could be drawn in previous studies using sentences or connected speech.  Third, this investigation systematically investigates which acoustic parameters are associated with perceived sexual orientation, and which are associated with perceived masculinity/femininity.  These questions are examined by conducting an experiment in which listeners rated their perception of masculinity or femininity of the 44 talkers first examined in Munson et al. (2005).  Analyses include comparisons of the masculinity/femininity measures to the measures of perceived sexual orientation from Experiment 2 of Munson et al., and to the acoustic characteristics of the talkers’ speech, reported in Experiment 1 of Munson et al. 

METHODS

Participants

The listeners in the perceived masculinity/femininity experiment were 10 adults who did not participate in the earlier perception studies reported by Munson et al. (2005).  They were all native speakers of English, and none reported a history of speech, language, or hearing disorders.  They received $5.00 for participating.  They were not aware that their data would be compared to other data on the talkers’ perceived sexual orientation.  Detailed demographic characteristics of the listeners were not collected.  However, the University of Minnesota student body is comprised overwhelmingly of students from Minnesota, the Dakotas, Wisconsin, and Iowa.   The 44 talkers in Munson et al. (2005) were from this area, which corresponds to the North Central dialect region in Labov, Ash, and Boberg (2005).  Consequently, we can presume that the majority of the listeners shared the same dialect as the talkers. 

  Details about the participants in the perceived sexual orientation rating study can be found in Munson et al. (2005).  Briefly, they were 40 adults with normal speech, language, and hearing abilities.  They were unfamiliar with the 44 talkers whom they were rating.  They rated each talker four times, based on stimuli with different segmental compositions.  The stimuli are shown in Table 1, which also shows the full set of stimuli from which acoustic measures of the talkers were made.  The listeners provided ratings on a five-point equal-interval scale, where 5 indicated "definitely sounds GLB", 1 indicated "definitely sounds heterosexual", and 3 indicated "sounds neither GLB nor heterosexual".  The same stimuli used in the perceived sexual orientation experiment in Munson et al. (2005) were used in the perception experiment in the current investigation. 

Stimuli

Stimuli were words produced by the 44 talkers from Munson et al. (2005).  These talkers were recruited from a variety of locations, using recruitment materials that did not make explicit reference to sexual orientation and speech.  The talkers were not informed that the study focused primarily on sexual orientation and speech until the data had been collected.  Both of these procedures were implemented to maximize the representativeness of the sample of talkers, and the representativeness of the type of speech that the talkers produced during the recording session. 

The design and procedure for the perception experiment in this investigation closely paralleled the perception experiment in Munson et al.  Stimuli were 12 words each from the 44 talkers, including three words with front vowels and sibilant fricatives (gas, said, same); three words with front vowels and no sibilant fricatives (bell, fade, path); three words with back-round vowels and sibilant fricatives (loose, soap, soon), and three words with back-round vowels and no sibilant fricatives (hoop, note, tooth).  These had a 22.05 kHz sampling rate with 16-bit quantization, and had been processed through an 11.025 kHz anti-aliasing filter.  The amplitudes of the 528 stimuli were peak-normalized prior to presentation.  The stimuli were a subset of the larger set of 32 words analyzed by Munson et al., which are listed in Table 1. 
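As an illustration of the stimulus preparation described above, the following sketch peak-normalizes a waveform and confirms the stimulus count. The target peak of 1.0 (full scale for floating-point samples) is an assumption; the source states only that amplitudes were peak-normalized. The function name and sample values are hypothetical.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale a waveform so that its largest absolute sample equals target_peak.

    target_peak=1.0 is an assumed value; the source does not give the level.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

# 44 talkers x 12 words per talker gives the 528 stimuli mentioned in the text.
n_talkers, words_per_talker = 44, 12
n_stimuli = n_talkers * words_per_talker

# Hypothetical waveform fragment, for illustration only.
print(peak_normalize([0.1, -0.5, 0.25]), n_stimuli)
```

Peak normalization equates the maximum sample value across stimuli; it does not equate perceived loudness, which depends on the whole waveform.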

*****INSERT TABLE 1 ABOUT HERE*****

Procedures

The experiment took place in a double-walled sound-treated booth in a university laboratory.  The experiment was designed and executed using the E-prime experiment-management software (Schneider, Eschman, & Zuccolotto, 2002).  On each trial, three words were played.  These were the triplets of words with similar phonemic composition, described in the previous paragraph.  An orthographic display of the words was presented on a 17" computer monitor in 36-point Courier font concurrent with their audio presentation.  After each trial, listeners rated the talkers' masculinity or femininity on a 5-point equal-interval scale.  The experiment was blocked by talker sex.  For the men, 1 indicated "sounds very masculine", 3 indicated "sounds somewhat masculine", and 5 indicated "sounds not at all masculine".  For the women, 1 indicated "sounds very feminine", 3 indicated "sounds somewhat feminine", and 5 indicated "sounds not at all feminine".  The numbers 2 and 4 indicated intermediate values[2].  Cards with the above wording were placed above buttons on a button-box.  Participants responded by pressing buttons; their responses were logged automatically.  Words were presented over headphones at a level of approximately 65 dB HL.  Experimental blocks were preceded by a short practice block containing two talkers and two words not used in the experiment.  Experimental stimuli were presented in fully randomized order.  During the experimental blocks, an additional 88 filler stimuli, not analyzed in this report, were presented. 

RESULTS

Analyses of Variance

For each listener, average perceived masculinity or femininity was calculated separately for GLB and heterosexual men and women for the four stimulus types (words with back vowels/sibilant fricatives, front vowels/sibilant fricatives, back vowels/no sibilant fricatives, and front vowels/no sibilant fricatives).  A three-factor (2 vowel backness × 2 sibilant fricative presence/absence × 2 talker sexual orientation) within-subjects ANOVA was used to examine ratings of masculinity and femininity.  Men and women talkers were examined separately because the scales that listeners used were not identical (men were rated along a masculinity dimension, and women were rated along a femininity dimension). 
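The averaging step that precedes the ANOVA can be sketched as follows. The tuple layout of a trial, the factor labels, and the ratings are all hypothetical; the sketch shows only how individual ratings collapse into one mean per listener per cell of the 2 × 2 × 2 design, computed separately for each talker sex.

```python
from collections import defaultdict
from statistics import mean

def cell_means(trials):
    """Average ratings within each listener x backness x sibilant x orientation cell.

    `trials` is an iterable of (listener, backness, sibilant, orientation, rating)
    tuples; this layout is a hypothetical one chosen for illustration.
    """
    cells = defaultdict(list)
    for listener, backness, sibilant, orientation, rating in trials:
        cells[(listener, backness, sibilant, orientation)].append(rating)
    return {cell: mean(ratings) for cell, ratings in cells.items()}

# Hypothetical ratings of male talkers by one listener, for illustration only.
trials = [
    ("L01", "front", "sibilant", "gay", 4),
    ("L01", "front", "sibilant", "gay", 2),
    ("L01", "back", "no_sibilant", "heterosexual", 1),
    ("L01", "back", "no_sibilant", "heterosexual", 2),
]
for cell, value in sorted(cell_means(trials).items()):
    print(cell, value)
```

With complete data, each listener contributes eight cell means per talker sex, and those means are the observations entered into the within-subjects analysis.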

When men were examined, there was a significant main effect of vowel backness (F[1,9] = 13.4, p < 0.01, partial η² = 0.60): listeners rated men to sound less masculine when making ratings from front-vowel words than when making them from back-vowel words.  There was also a significant main effect of talker sexual orientation (F[1,9] = 13.2, p < 0.01, partial η² = 0.59): self-identified gay men were rated to sound less masculine than self-identified heterosexual men.  These main effects were qualified by a significant interaction between vowel backness and sexual orientation (F[1,9] = 8.9, p < 0.05, partial η² = 0.50) and among vowel backness, sibilant fricative presence/absence, and sexual orientation (F[1,9] = 5.7, p < 0.05, partial η² = 0.39).  These interactions can be seen by comparing the bar heights in Figure 1, which shows average masculinity ratings for men.  For comparison, Figure 1 also plots perceived sexual orientation data from Munson et al. (2005) for the same talkers.  Post-hoc paired comparisons showed there to be significant group differences between self-identified gay and heterosexual men for the conditions in which listeners were presented with words containing back vowels and sibilant fricatives and front vowels and no sibilant fricatives (t[9] = 2.5, p < 0.05, t...
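The effect sizes reported for the men's ANOVA are partial eta squared values, and they can be checked against the F ratios with the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the four F ratios given above, each with 1 and 9 degrees of freedom; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F ratios as reported in the text, each with df = (1, 9).
reported = [
    ("vowel backness", 13.4),
    ("talker sexual orientation", 13.2),
    ("backness x orientation", 8.9),
    ("backness x sibilant x orientation", 5.7),
]
for effect, f_value in reported:
    print(f"{effect}: {partial_eta_squared(f_value, 1, 9):.2f}")
```

Rounded to two decimals, the recovered values agree with the effect sizes reported in the text (0.60, 0.59, 0.50, and 0.39).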
