Unraveling the mystery of stuttering: clinical and physiological insights into its manifestation

Abstract

Stuttering is a complex neurodevelopmental speech disorder characterized by involuntary sound and syllable repetitions, prolongations, and speech blocks, accompanied by marked variability across linguistic, emotional, and situational contexts. Although numerous hypotheses have been proposed to explain its underlying mechanisms, many share a fundamental limitation: the difficulty of coherently accounting for the full range of clinical, developmental, and neurobiological features observed in people who stutter. In response to this gap, the present work proposes a comprehensive, integrative hypothesis that seeks to unify the diverse physiological and clinical manifestations of stuttering within a single neurobiological framework. This model aims to link moment-to-moment fluctuations in speech behavior with neurodevelopmental alterations, offering a plausible mechanistic account for a wide spectrum of core phenomena. These include the pronounced situational variability of stuttering severity; the developmental shift from repetitions to blocks; the transition of disfluencies from function words to content words; the tendency for stuttering to occur on key words in a sentence; and the consistently lower rates of spontaneous recovery observed in males compared to females. Furthermore, the proposed framework explores potential common mechanisms underlying the widespread structural, metabolic, and functional brain changes documented in stuttering, while considering whether these abnormalities reflect primary contributors or secondary, compensatory adaptations. In particular, the model addresses a long-standing debate regarding the role of the right inferior frontal gyrus, examining whether its engagement is more consistent with a causal contribution to speech disruption or with an adaptive response to impaired speech–motor control.
By integrating neurodevelopmental, physiological, and clinical evidence, this hypothesis offers a unifying perspective on key features of stuttering while proposing a neurobiological model whose assumptions and hypotheses can be empirically tested and evaluated in future experimental studies.

1 Introduction

Although often defined in terms of speech disfluency, developmental stuttering carries consequences that reach far beyond the act of speaking, exerting a broad and often enduring psychosocial impact that can begin early in life. During childhood, when peer acceptance and social comparison play a central role in shaping self-esteem, children who stutter (CWS) are more likely to experience social rejection, reduced peer status, and increased exposure to bullying, and are less frequently perceived as popular or as leaders within their peer groups (Davis et al., 2002; Berchiatti et al., 2021). Such early social disadvantages coincide with elevated internalizing vulnerability, with school-aged CWS showing markedly increased risk for anxiety disorders, including “six-fold increased odds” of social anxiety disorder relative to their fluent peers (Iverach and Rapee, 2014; Iverach et al., 2016).

As these experiences accumulate throughout development, their impact often becomes more closely tied to self-evaluation and identity formation. For many individuals, these patterns persist into adulthood, manifesting as chronic feelings of shame, avoidance of social interaction, and restrictions in participation across educational, interpersonal, and occupational domains (Türkili et al., 2022; Alatawi and Good, 2025). Evidence from adolescence also indicates that greater stuttering severity is associated with lower domain-specific and global self-esteem, alongside heightened sensitivity to peer evaluation and emerging self-stigma (Butler, 2013; Adriaensens et al., 2015).

The functional consequences of these psychosocial challenges are particularly evident in employment contexts. Survey data indicate that more than 70% of adults who stutter (AWS) believe their speech difficulties reduce their chances of being hired or promoted (Klein and Hood, 2004), while population-level analyses demonstrate measurable labor market disadvantages, including an earnings deficit exceeding $7,000 for AWS and increased underemployment among women who stutter (Gerlach et al., 2018). At the same time, greater stuttering burden has been linked to poorer mental health outcomes, with higher levels of depression, anxiety, and stress reported among adults experiencing greater adverse impacts (Engelen et al., 2024). Converging evidence further underscores the clinical significance of these associations, as elevated depressive symptoms and an increased risk for suicidal ideation have been documented in some subgroups of people who stutter (PWS) (Briley et al., 2021; Tichenor et al., 2023).

Despite its well-documented psychological consequences, stuttering remains a puzzling neurodevelopmental disorder of unclear etiology, characterized by substantial heterogeneity in its clinical presentation. In addition to classical developmental stuttering, several clinically distinct forms have been described, including neurogenic, psychogenic, and pharmacologically induced stuttering. Although there is overlap in both symptoms and neurological causes across these forms (Theys et al., 2024), clear and systematic differences also exist among them. Non-developmental forms are typically preceded by identifiable precipitating events, such as neurological injury, psychological trauma, or medication exposure. Moreover, neurogenic stuttering tends to show greater consistency across speaking tasks and communicative contexts, whereas psychogenic stuttering may exhibit marked improvement following psychological intervention (Cruz et al., 2018; Zunic et al., 2021). For reasons of conceptual precision and interpretive clarity, the present work focuses specifically on classical developmental stuttering.

Classical developmental stuttering is conventionally defined by its overt speech disruptions, including repetitions, prolongations, and speech blocks. While these features constitute the most recognizable clinical manifestations of the disorder, limiting the definition to speech-level phenomena alone overlooks a second, equally fundamental characteristic: situational variability. Stuttering severity is known to fluctuate substantially across speaking contexts, communicative demands, and time, a phenomenon that has been consistently documented over decades of research (Bloodstein, 1949; Shulman, 1955; Yaruss, 1997; Constantino et al., 2016; Tichenor and Yaruss, 2021; Usler, 2022; Lei et al., 2024; Rasoli Jokar et al., 2025). This marked variability has posed a persistent challenge to theoretical models of stuttering and has motivated the use of multiple speech samples across diverse contexts to obtain ecologically valid assessments of fluency (Tichenor and Yaruss, 2021). Given its central relevance, situational variability warrants explicit consideration as a core feature of developmental stuttering. The present work therefore begins by examining this phenomenon in depth, as we contend that it holds a crucial key to understanding both the emergence of stuttering and the manner in which it manifests across different speaking situations.

2 Situational variability in stuttering

2.1 Definition and types

Although a substantial body of research has examined situational variability, many studies have relied on closely related definitions, often describing the phenomenon through brief explanations rather than a formalized definition. To enhance conceptual precision, we therefore propose a unified definition of situational variability grounded in the cumulative evidence and historical literature mentioned above. Accordingly, situational variability can be defined as any noticeable change in stuttering frequency and/or severity that occurs when an individual is exposed to different speaking situations, contexts, or tasks, as well as across time, including day-to-day variation.

Situational variability should not be regarded as a random or unsystematic fluctuation in stuttering frequency/severity. Rather, the extensive body of research examining this phenomenon has progressively clarified its structure and components, allowing for meaningful classification of its patterns. Based on the collective findings of this literature, situational variability can be broadly divided into three categories.

The first of these is what we call stereotypical variability. Stereotypical variability refers to speaking situations in which a consistent and predictable pattern of stuttering behavior has been repeatedly documented. In these situations, both clinicians and PWS can reasonably anticipate whether stuttering will markedly decrease or significantly worsen in frequency and/or severity. This category can be further subdivided into two primary types.

The first type comprises fluency-inducing conditions, which are consistently associated with a substantial reduction in stuttering severity/frequency. These include talking to oneself or talking when no one is present (Bloodstein, 1949; Langová and Šváb, 1973; Jackson et al., 2021), singing (Wan et al., 2010), choral reading (Freeman and Armson, 1998; Dechamma and Maruthy, 2018; Meekings et al., 2023), states of euphoria or intense focus and engagement, adopting novel or altered speech patterns, and brief episodes of emotional outburst (Usler, 2022). In such conditions, stuttering is reliably attenuated, often to a marked degree.

The second type includes conditions that exacerbate stuttering, which are well known to intensify stuttering severity/frequency. These typically involve heightened communicative demand or social evaluative pressure, such as speaking in front of an audience (Porter, 1939; Hahn, 1940; Moïse-Richard et al., 2021), addressing an authority figure (Sheehan et al., 1967), and introducing oneself, including saying one’s own name (Usler, 2022).

Across both fluency-inducing and stuttering-exacerbating situations, a robust and directional change in stuttering severity is observed: improvement in the former and deterioration in the latter. Importantly, this effect appears to be broadly shared among PWS, indicating a common underlying sensitivity to these situational conditions. However, within this shared directional response, notable variability remains. For example, although nearly all participants in studies examining speaking alone demonstrate noticeable improvement in fluency, the magnitude of this improvement differs among individuals. Some reach near-complete or complete fluency, whereas others continue to stutter, albeit with reduced severity (Bloodstein, 1949; Langová and Šváb, 1973; Jackson et al., 2021). A similar pattern has been reported for choral reading: all participants improved, but the magnitude of stuttering reduction varied considerably among individuals, ranging from approximately 77.6 to 99.7% (Freeman and Armson, 1998) and from 90 to 100% (Dechamma and Maruthy, 2018; Meekings et al., 2023). During singing, all participants exhibited a strong reduction in stuttering, although minor inter-individual differences in improvement persisted, estimated at around 10% (Wan et al., 2010).

The second category, which we refer to as time variability, describes noticeable changes in stuttering frequency/severity across different temporal scales. Such variability may occur rapidly within a single day or more gradually across longer periods, including days, weeks, or even months (Yaruss, 1997; Constantino et al., 2016; Tichenor and Yaruss, 2021; Lei et al., 2024).

The third and final category in our classification is individual variability, which refers to the observation that different PWS may respond differently to the same speaking situation or task in terms of changes in stuttering frequency and/or severity. This category captures inter-individual differences in how situational demands influence stuttering behavior and can be further divided into three subtypes.

The first subtype involves a broadly similar directional response across individuals, with variability emerging primarily in magnitude rather than direction. A clear example of this pattern is observed in responses to fluency-inducing conditions, where most PWS demonstrate a marked degree of improvement, yet the extent and strength of this improvement may vary between individuals.

The second subtype reflects genuinely divergent responses to the same situation, whereby identical contexts or tasks elicit opposite effects across different individuals, excluding situations that are already well established as universally fluency-inducing or stuttering-exacerbating. As described by Rasoli Jokar et al. (2025), activities such as traveling or playing with peers were reported to increase stuttering severity in some children, while in others, these same activities were reported to reduce apparent stuttering severity. Similar inter-individual contrasts have also been noted in earlier work, including observations reported by Yaruss (1997).

The third subtype concerns variability in the qualitative characteristics of stuttering itself, both within and between individuals. Specifically, the particular sounds, syllables, or words on which stuttering occurs may differ among PWS. Moreover, within the same individual, these loci of stuttering are not fixed and may shift over time, such that previously difficult sounds or words become less problematic while new ones emerge (Tichenor and Yaruss, 2021).

Interestingly, despite the substantial body of evidence and the critical importance of this feature, situational variability is rarely incorporated into formal definitions of stuttering. Yet this very feature has served as the foundation upon which numerous hypotheses have been constructed and, notably, the same foundation upon which many of them have been challenged or rejected. Accordingly, to unravel the mystery of stuttering, the present work deliberately begins with its most elusive feature, revisiting a long-standing question that has accompanied the field for decades: Why does stuttering appear to be situationally variable?

2.2 The historic question that sparked it all

When examining the earliest studies and hypotheses about stuttering, it is striking that many of them also began from this very phenomenon. Fletcher (1914) showed that stuttering varied markedly depending on who was listening and on the speaking setting, with speech often easier in private than under scrutiny. Tompkins (1916) interpreted these shifts as the product of fear-driven, misdirected conscious effort interfering with otherwise automatic speech, and noted strong improvements during singing and unison speech. Swift (1915) suggested a mechanistic clue in atypical conscious imagery during speaking. Taken together, these early accounts were among the first to draw attention to what would later be termed the situational variability of stuttering.

In recent years, several well-known hypotheses have been advanced to explain stuttering. These include the brain energy hypothesis (Alm, 2021), the speech rhythm hypothesis (Etchell et al., 2014), the striatal dysfunction hypothesis (Maguire et al., 2020), and the overreliance-on-auditory-feedback hypothesis (Civier et al., 2010). Each of these frameworks has contributed meaningful progress and has strengthened our understanding of stuttering by illuminating specific mechanisms that may be involved.

At the same time, none of these accounts appears to have sufficient scope to explain the full phenotype of stuttering, including situational variability across the wide range of speaking contexts discussed above. Most models leave some gap, which may take the form of contradictions in how they account for the emergence of stuttering across speaking situations, an incomplete integration of established neurophysiological and neuroanatomical alterations, or an explanatory aim restricted to a single component rather than the full clinical phenotype.

From our perspective, any effort to identify the cause of stuttering should, at least in principle, account for all major stuttering phenomena and symptoms before being treated as an etiological explanation and tested empirically. In this regard, and because stuttering has a well-documented neurological signature, it is essential to first examine what is already established in AWS regarding brain structure and overall patterns of neural activity.

2.3 Brain changes in adults and CWS

Neuroimaging studies consistently show that AWS exhibit reliable structural and functional brain differences. Functionally, reduced activation is often observed in left hemisphere language and speech–motor regions, alongside atypical basal ganglia involvement (Maguire et al., 2020). Although activity in several cortical regions may normalize under fluency-enhancing conditions such as choral reading, striatal activity can remain abnormally low, which has been interpreted as compatible with impaired feedforward control within left hemisphere speech networks (Chang et al., 2019). Structurally, gray matter reductions have also been reported in the striatum, specifically the left caudate nucleus (Sowman et al., 2017).

In parallel, neuroimaging studies have reported increased recruitment of right-lateralized control and salience systems during speech in AWS, most consistently involving the right inferior frontal gyrus, anterior cingulate cortex, right dorsolateral prefrontal cortex, and right anterior insula, with additional involvement of limbic regions such as the amygdala in some reports (Chang et al., 2009; Kaganovich et al., 2010; Budde et al., 2014; Belyk et al., 2015; Neef et al., 2018; Toyomura et al., 2018; Jackson et al., 2022). Consistent with atypical sensory monitoring, AWS also show diminished pre-speech auditory suppression (Daliri and Max, 2018), and the left superior temporal gyrus often exhibits reduced activation and abnormal connectivity during natural speech, with partial normalization under fluency-enhancing conditions (Garnett et al., 2022).

It is noteworthy that many of the neural alterations reported in AWS appear to represent a developmental continuation of patterns already detectable in childhood. In preschool-aged children with persistent stuttering (3–5 years), Chow et al. (2023) reported reduced gray matter volume in the striatum, specifically the putamen and nucleus accumbens, slower development of the left inferior frontal gyrus, and reduced white matter volume in major tracts including the bilateral corona radiata, superior longitudinal fasciculus, and corpus callosum. These findings are consistent with broader evidence implicating the basal ganglia–thalamocortical (BGTC) loop in stuttering across studies (Sommer et al., 2002; Guenther, 2006; Kell et al., 2024; Beal et al., 2013; Foundas et al., 2013; Connally et al., 2014; Civier et al., 2015; Chang et al., 2008, 2015, 2019).

A similar developmental continuity may also apply to right hemisphere involvement; Neef et al. (2023) reported that the right posterior inferior frontal cortex (pars opercularis) in CWS aged 3–11 shows a mixed connectivity profile, with enhanced coupling to insula and somatomotor regions implicated in motor control and inhibition, alongside weaker coupling with components of the dorsal attention network, which supports attentional and top–down cognitive regulation.

2.4 Contemporary hypotheses and their limits

Taken together, these investigations reveal a clearly delineated disturbance predominantly affecting the left hemisphere, particularly regions involved in speech production. However, while many of these cortical areas demonstrate context-dependent variability in activation across speaking situations, one structure stands out as unusually consistent: the striatum. Structural and functional abnormalities within the striatum have been documented as early as 3–5 years of age and appear to persist into adulthood. This continuity initially led to the assumption that stuttering arises primarily from a deficit within the left hemisphere speech production network.

Yet, such a fixed deficit alone is clearly insufficient. If stuttering were solely the consequence of a stable impairment in speech–motor regions, the pronounced phenomenon of situational variability would not be observed. The marked fluctuations in fluency across contexts indicate that stuttering cannot be reduced to a static dysfunction, even though abnormalities in left hemisphere speech regions are well established. The disorder is therefore more complex than a simple impairment of speech production mechanisms.

This realization prompted the exploration of alternative explanations. One early line of reasoning focused on speech rhythm (Etchell et al., 2014), motivated by the observation that stuttering often diminishes dramatically or even disappears in conditions such as singing and choral reading. These findings suggested the presence of a disrupted internal timing or rhythmic mechanism, with external rhythmic cues compensating for this deficit and thereby improving fluency. However, this hypothesis left critical questions unresolved. How, for example, can it explain the reduction of stuttering during self-speech or during moments of euphoria and intense emotional arousal?

More recently, Meekings et al. (2023) reported that during choral speech, PWS showed an increased speech rhythm frequency, whereas neurotypical speakers showed a decreased frequency. This opposite pattern challenges the hypothesis that PWS achieve fluency by matching their partner’s rhythm. Instead, fluency may result from the temporary suspension of compensatory strategies rather than from alignment with external rhythms. Thus, fluency may not depend directly on rhythm imitation.

In an attempt to explain why stuttering decreases during self-speech and other low-pressure situations, the brain energy hypothesis was introduced (Alm, 2021). According to this view, a generalized reduction in neural energy production limits the brain’s ability to support speech under cognitively demanding or stressful conditions, whereas simpler, low-pressure contexts such as self-talk remain manageable. This hypothesis offered an appealing explanation for several aspects of situational variability. However, much like the rhythm hypothesis, it encountered significant limitations. It failed to account for the marked reduction or absence of stuttering during singing and choral reading, as well as during euphoric or highly emotional states, contexts in which neural energy consumption would presumably be elevated. Moreover, it could not explain individual variability, whereby different individuals exhibit opposite fluency responses to the same speaking situation.

Attention then shifted to the overreliance-on-auditory-feedback hypothesis (Civier et al., 2010). This model proposes that individuals who stutter rely excessively on auditory feedback during speech, rather than on efficient feedforward motor planning. Because auditory feedback is inherently slower than motor execution, this overdependence renders speech vulnerable to delays, hesitations, and breakdowns. While this framework successfully explains several fluency-enhancing conditions, it again falls short in critical areas. It does not adequately account for individual variability, nor does it explain the emergence of speech blocks. Furthermore, it localizes the core deficit almost exclusively to left hemisphere speech and auditory regions, while neglecting the right hemisphere overactivity documented in AWS.

This raises a critical question: is it possible to formulate a hypothesis that integrates these neural abnormalities while also offering a coherent and mechanistically plausible account of situational variability and its associated features?

3 From neural abnormalities to situational variability: a unified hypothesis

3.1 The consistent theme across all experiences

We begin with the aspect of stuttering that is supported by the strongest and most consistent empirical evidence: atypical developmental changes in the left hemisphere, particularly within speech production regions and their adjacent cortical networks.

Given the abnormalities discussed above in the left hemisphere, particularly within the BGTC loop and the striatum, together with auditory regions such as the left superior temporal gyrus (LSTG), we propose that these abnormalities, regardless of their specific nature, give rise to what we refer to as error signals. We use this term for any disruption in the functioning of, or communication among, the auditory–speech–motor systems.

As situational variability makes evident, however, these abnormalities alone seem insufficient to sustain stuttering consistently. A prominent example is the complete disappearance or marked reduction of stuttering when speaking to oneself or in solitude. That stuttering emerges selectively in situations such as public speaking, delivering an important message, or speaking to someone in authority, which are common scenarios for stuttering, while receding in solitude, suggests that the disruption in these regions is fundamentally weak, below the threshold required for stuttering to emerge on its own (Brocklehurst et al., 2013).

Therefore, for stuttering to manifest, some other component must intervene. Here, we are presented with two possibilities: either this second mechanism pushes the disruption beyond the threshold, triggering stuttering, or it supports a different mechanism that causes stuttering without the disruption in speech production regions necessarily surpassing a threshold.

To investigate this mechanism, and given that we treat all types of issues that may arise along the auditory–speech–motor pathway as error signals, the first question that arises is: how does the brain handle these error signals, and how does it respond to them?

3.2 The self-monitoring system as the brain’s error-handling mechanism

The brain seems to possess a specialized mechanism for detecting and correcting error signals, referred to as the self-monitoring system (SMS). As outlined by Nozari (2025), Nozari et al. (2011), and Arenas (2017), the SMS is a collection of cognitive and neural processes that continuously assess and regulate speech production. It identifies discrepancies between expected and actual speech outcomes by employing mechanisms such as conflict monitoring and forward models, which predict the sensory feedback of speech actions. This system is crucial for fluent speech, adjusting speech plans in response to linguistic conflicts, motor planning issues, and the influence of emotional and social factors. It integrates both internal cognitive feedback and external feedback to optimize the accuracy of speech production.

The SMS is a continuous, automatic process that operates persistently throughout speech production. It does not activate or deactivate at specific moments but functions continuously as a natural mechanism for detecting and correcting errors. It should be viewed as an inherent, supportive system in the process of speech production, particularly within the framework of this discussion (Nozari et al., 2011).

This system is characterized by its distinctive ability to function primarily subconsciously, continuously tracking speech production and making automatic adjustments without conscious involvement. However, when significant errors or discrepancies are detected, the system can be upregulated to conscious awareness, allowing for intentional attention and correction of the speech output (Nozari, 2025).

Up to this point, we have been dealing with a natural and supportive framework for speech production, where speech normally proceeds under largely subconscious control. Within this framework, conscious attention to speech can be understood as an additional strategy employed by the system to further support speech production by recruiting perceptual regions and allocating explicit attentional resources to speech.

However, this shift from subconscious to conscious processing raises a critical concern: it appears to be consistently present in most situations associated with an increase in stuttering and notably absent in situations where stuttering is markedly reduced or disappears.

For instance, in singing, attention is redirected toward music, melody, and the reformulation of speech within a new rhythmic and prosodic structure. In choral reading, attention is anchored to rhythm and temporal alignment with others’ speech. In states of euphoria or deep engagement, attentional resources are almost entirely captured by the external stimulus. Similarly, during intense emotional arousal, the system’s attentional capacity is strongly oriented toward the external emotional trigger.

Across all these conditions, we observe a common pattern: the conscious component of the SMS is either reassigned or “hijacked” away from speech itself, allowing error detection and correction to proceed subconsciously. Speech therefore remains fluent. In contrast, the system can be upregulated into conscious awareness in virtually all situations in which stuttering emerges: speaking in front of others, addressing authority figures, delivering an important message, self-presentation, or even stating one’s own name.

At first glance, this suggests that the transition from subconscious error adjustment to conscious error adjustment constitutes the core mechanism underlying the emergence of stuttering. Yet, this explanation alone is insufficient, as it fails to account for one of the most robust fluency-enhancing conditions: speaking to oneself or speaking alone. In this context, an individual may direct conscious attention to speech and still speak fluently.

This observation necessitates the introduction of a second critical component, which, together with conscious attention, appears to form the core mechanism governing the appearance and disappearance of stuttering. This component is social evaluation: the process by which individuals judge themselves based on perceived social standards, expectations, and feedback from others, often influencing emotions, behavior, and self-concept. Social evaluation is present in all situations where stuttering emerges and, crucially, requires the presence of one or more listeners. It is markedly reduced during choral reading, where the individual’s voice is masked by the voices of others; no special attention is directed toward any single speaker, who is perceived as part of the group. It also diminishes substantially during states of euphoria, emotional intensity, or deep engagement, in which self-awareness, and even the sense of the self as an entity, fades. In singing, self-evaluation may still be present, yet conscious attention to error signals is largely absent, as attentional resources are fully allocated to melody and rhythm.

Thus, what appears to explain the stereotypical variability category of stuttering is the interaction between two factors: conscious error monitoring and social evaluation. Wherever these two factors co-occur, stuttering emerges. Wherever one or both are absent, stuttering is significantly reduced and may disappear entirely in certain individuals or contexts. The critical question, therefore, is what these two factors induce at the neural level: what happens within the brain when they co-occur?
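The two-factor rule can be made concrete as a small truth table. The sketch below is a hypothetical illustration rather than part of the model’s formal specification: the `Situation` class, the `predicts_stuttering` function, and the boolean coding of each situation are our own reading of the examples discussed above (for instance, singing is coded as retaining social evaluation while lacking conscious error monitoring). Real situations vary in degree, so the binary coding is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """A speaking situation coded on the two factors (hypothetical, illustrative coding)."""
    name: str
    conscious_monitoring: bool  # is conscious attention directed at speech error signals?
    social_evaluation: bool     # is the speaker exposed to perceived evaluation by listeners?


def predicts_stuttering(situation: Situation) -> bool:
    """Two-factor rule: stuttering is expected only when both factors co-occur."""
    return situation.conscious_monitoring and situation.social_evaluation


# Codings are an interpretive reading of the text, not empirical measurements.
SITUATIONS = [
    Situation("speaking in front of an audience", True, True),
    Situation("addressing an authority figure", True, True),
    Situation("saying one's own name", True, True),
    Situation("speaking alone", True, False),   # attention on speech, but no listener
    Situation("singing", False, True),          # self-evaluation may persist; attention on melody
    Situation("choral reading", False, False),  # attention on rhythm; voice masked by the group
    Situation("euphoria or deep engagement", False, False),
]

if __name__ == "__main__":
    for s in SITUATIONS:
        outcome = "stuttering expected" if predicts_stuttering(s) else "fluency expected"
        print(f"{s.name:35} -> {outcome}")
```

Only the three situations coded with both factors present are labeled "stuttering expected". As the text notes, this rule targets stereotypical variability; a binary coding cannot express the inter-individual differences in magnitude described under individual variability.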

3.3 Neural-level interpretation

The error signals originating within speech production regions are initially detected by the SMS. From this point, two distinct processing routes can be identified. In route 1, in the absence of one or both factors (conscious attention and social evaluation), these signals are processed by the SMS as ordinary error signals. They are resolved either subconsciously or consciously but without the presence of social evaluative pressure. In route 2, when both factors are simultaneously present, a qualitatively different process emerges. Although the SMS still detects the same error signals, it no longer treats them as neutral error information. Instead, the co-presence of conscious attention and social evaluation forces the system to reinterpret these signals as warning signals.

In other words, the speech context transforms error signals from neutral markers of deviation into signals imbued with threat relevance. This transformation involves a shift from subconscious to conscious control, mediated by higher-order neural regions, and can be summarized as follows:

Error signals → no conscious error monitoring / no social evaluation → error signals processed within monitoring system.

Error signals → social evaluation + conscious error monitoring → warning signals → recruitment of additional regions.

In this framework, stuttering is not the result of defective error detection per se, but rather of a context-dependent escalation of error signals into warning signals, driven by the convergence of conscious monitoring and social evaluative processing.
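As a purely illustrative aid, the two routes summarized above can be expressed as a toy decision rule. This is a minimal Python sketch, not a computational model proposed in this work: the names `SpeechContext`, `SignalClass`, and `classify_signal` are hypothetical, each factor is reduced to a simple present/absent flag, and the factor values assigned to the example situations are assumptions based on the descriptions in this section (conscious monitoring during choral reading, for instance, is not specified in the text).

```python
from dataclasses import dataclass
from enum import Enum


class SignalClass(Enum):
    # Route 1: the signal is handled within the monitoring system itself.
    ERROR = "error signal"
    # Route 2: the signal is reinterpreted and recruits additional regions.
    WARNING = "warning signal"


@dataclass
class SpeechContext:
    # Hypothetical binary simplification of two graded factors.
    conscious_monitoring: bool
    social_evaluation: bool


def classify_signal(ctx: SpeechContext) -> SignalClass:
    """Toy rule: only the conjunction of both factors escalates an error signal."""
    if ctx.conscious_monitoring and ctx.social_evaluation:
        return SignalClass.WARNING
    return SignalClass.ERROR


if __name__ == "__main__":
    # Assumed factor values for illustrative situations (not empirical data).
    contexts = {
        "comfortable listener": SpeechContext(False, False),
        "important message to the same listener": SpeechContext(True, True),
        "singing": SpeechContext(False, True),
        "choral reading (monitoring assumed present)": SpeechContext(True, False),
    }
    for name, ctx in contexts.items():
        print(f"{name}: {classify_signal(ctx).value}")
```

The sketch captures only the conjunctive logic of the framework; graded levels of attention or evaluative pressure, and the word-specific focusing of warning signals discussed in Section 3.4, are deliberately left out.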

The question that follows is: what constitutes these warning signals, and how can their function be understood? In our hypothesis, warning signals are not independent signals per se, but rather a reinterpretation of error signals broadcast by the SMS, indicating that these signals carry heightened contextual significance. This heightened significance emerges when the speech context is socially or personally salient: for example, under social evaluation, when the listener is perceived as important, or when performance-related expectations, fear of failure, or the desire to avoid negative attention are present. Within such contexts, these cognitive and affective factors imbue error signals with emotional weight, leading the SMS to reclassify them as warning signals. These warning signals operate within a defensive framework, whereby the system attempts to recruit additional neural resources and allocate increased attentional focus in the service of caution, precision, and control, with the goal of producing fluent speech and achieving the speaker's intended communicative goal or impression.

The following examples illustrate how everyday speaking situations modulate the two critical operational factors, social evaluation and conscious error monitoring.

When PWS talk to someone they are comfortable with, they pay less attention to speech errors, and social evaluation signals that stuttering is acceptable, resulting in a notable reduction in stuttering. However, when PWS talk to the same person but need to deliver an important message or convey something precisely, they pay much more attention to speech errors, and social evaluation is heightened by the pressure to speak correctly and fluently, leading to an increase in stuttering. In another example, at critical moments, such as when facing a problem or an issue of great importance in their lives, or when speaking in front of a very important person, two scenarios may occur.

In the first scenario, stuttering may become extremely pronounced. Our interpretation is that attention is heavily focused on error monitoring under heightened social evaluation, which amplifies speech disruptions. In the second scenario, stuttering may suddenly decrease, with speech flowing smoothly and without interruption. Our interpretation is that the speaker’s attention is fully directed toward the situation itself, while the conscious monitoring system is temporarily overridden by external demands and context. As a result, the speaker momentarily “forgets” themselves and stops focusing on errors, engaging fully with the idea, the person, and the situation. This may explain why, within the same situation, stuttering intensifies in one individual while diminishing in another. This variability largely depends on where the speaker’s attentional focus is directed: when it is on error monitoring and converges with social evaluation, stuttering is more likely to emerge.

3.4 Why threshold explanations remain insufficient

One could argue that these warning signals represent the very point at which the assumed threshold in speech production areas is reached, triggering the manifestation of stuttering (Brocklehurst et al., 2013). This is a reasonable view; however, a distinctive feature of stuttering weakens this argument and casts doubt on it.

A notable pattern emerging from clinical observations and empirical reports is that stuttering is more likely to occur on “critical” words—those that carry greater communicative importance within an utterance—rather than being randomly distributed across speech (Kaasin and Bjerkan, 1982). PWS can often produce alternative or less contextually appropriate words fluently, yet experience breakdowns precisely on the word they judge to be the “correct,” most meaningful, or most contextually appropriate response. For instance, a person may block on the straightforward request ‘Can you help me?’ yet produce a less direct and more circuitous formulation such as ‘Sorry… um… I have a question’ with relative ease.

This phenomenon suggests that perceived importance increases the salience of specific words within the SMS, thereby increasing the likelihood of disfluency. It is unlikely to be a coincidence that stuttering often occurs on important words, the most meaningful and relevant in the sentence, while it decreases on words of less significance in the context (Usler, 2022). This interpretation aligns with findings that PWS frequently anticipate upcoming moments of stuttering (Silverman and Williams, 1972; Jackson et al., 2018, 2020) and often substitute or reformulate words to avoid anticipated difficulty. Such avoidance is less feasible in situations where only one lexical item is appropriate, such as providing one’s name or labeling an object. In these cases, the warning signal generated by the SMS appears to be disproportionately focused on specific words rather than uniformly applied across entire sentences, transforming the target word into an anticipated “fear word.” Social evaluative pressure and conscious monitoring can amplify error signals into general warning signals; however, the involvement of cognitive, emotional, and logical processes can narrow these signals to the words that seem most important within a sentence. Importantly, this process need not occur exclusively at the moment of speaking; it can arise well in advance (Jackson et al., 2015).

For example, consider a person who stutters preparing to meet someone new in the coming days. They may begin to anticipate potential sources of embarrassment, such as the likelihood that the first question will be, “What is your name?” If they falter or cannot produce the answer fluently, they fear appearing socially awkward. In this scenario, higher-order cognitive and knowledge-based regions have already determined which word is most important in the sentence. Consequently, when the situation arises and the question is asked, the system is activated and generates warning signals reflecting this contextual importance, guided by the higher-order structures that previously identified the critical word. The entire attentional focus converges on that word. This mechanism can operate both in real time during speech and in advance, as illustrated in the example above.

This idea has been discussed for decades within the stuttering literature. Bloodstein (1955), in his discussion of information-load accounts of stuttering, proposed that moments of low predictability in spoken sentences impose heightened uncertainty and communicative responsibility on the speaker. At such points, the speaker alone must supply essential information, increasing vulnerability to speech disruption. Consistent with this observation, words that PWS identify as feared or anticipated continue to elicit stuttering even months after the participants had identified them (Mersov et al., 2018).

Another notable feature that has been widely discussed is that, even during moments of severe difficulty, such as when asked “What is your name?”, PWS often remain capable of producing fluent but contextually inappropriate speech. For instance, they may delay providing their name by beginning with a carrier phrase such as “My name is…” and inserting a pause before producing the actual word, or they may rely on circumlocutions or offer an inappropriate alternative lexical item, such as another name. A further supporting example is the repetition of entire sentences or whole words: PWS may repeat a full sentence to gain access to a single target word within it. Additionally, PWS frequently adopt alternative speaking strategies to compensate for or prevent anticipated difficulty, including using easy onset to begin speaking, using fillers or sentence starters, and interrupting the communication partner (Van Riper, 1982; Vanryckeghem et al., 2004; Arenas, 2012; van Lieshout et al., 2014; Jackson et al., 2015). This indicates that even during moments of speech difficulty, PWS can still produce fluent speech, and the difficulty itself appears to be focused on specific words perceived as the most appropriate or contextually relevant within the utterance.

While the VRT hypothesis of Brocklehurst et al. (2013) has made a valuable contribution to the scientific understanding of stuttering, it does not fully account for several key characteristics of the disorder. Stuttering is often word-specific, disproportionately affecting socially, emotionally, or communicatively salient words, while adjacent words remain fluent or can be substituted. PWS frequently employ dynamic compensatory strategies, such as repeating entire sentences to access a single target word, inserting carrier phrases with pauses, or reformulating sentences in real time. These observations challenge the notion that the release threshold is localized solely within speech production areas. If that were the case, speakers should not be able to substitute words easily or produce contextually inappropriate alternatives fluently, and stuttering would be expected to occur more uniformly across speech rather than selectively targeting a single word while other words remain fluent at the moment of blocking or repetition.

The discussion so far suggests that warning signals act as a protective mechanism in the speech system, helping to ensure smooth speech by recruiting higher-level cognitive and attentional processes. However, within this recruited system, one component appears not to function normally. Instead of simply supporting speech, this component appears to behave pathologically, exploiting the very defensive mechanism meant to protect fluent speech. In doing so, it triggers the very outcome the system is trying to avoid: stuttering.

3.5 Neural mechanisms of anticipation and error monitoring of stuttering

To identify these recruited components, it is first essential to define the key regions that constitute the SMS itself. Research consistently identifies the anterior cingulate cortex (ACC) (Nozari, 2025; Nozari et al., 2011) as a central and highly sensitive hub for self-monitoring, responsible for detecting conflicts and errors in ongoing behavior. In parallel, the lateral prefrontal cortex facilitates processing by directing attention toward task-relevant stimuli, thereby enhancing stimulus–response mapping (Nozari et al., 2016; Nozari, 2025). Within this framework, the dorsal portion of the right lateral prefrontal cortex (R-DLPFC) plays a crucial role in anticipatory processes, supporting planning and expectation in speech production (Jackson et al., 2022).

From this standpoint, the aim is to examine the functional state of these two regions in PWS, as well as to identify other brain areas that are closely linked to self-monitoring processes and that, at the same time, exhibit abnormal activity or development in this population.

Research by Jackson et al. (2022) highlights a network of right hemisphere regions that play central roles in the anticipation and monitoring of stuttering. The R-DLPFC emerges as a key node within this system. It becomes hyperactive prior to the onset of speech when a person expects to stutter, reflecting the brain’s attempt to predict errors and apply cognitive control. Closely linked to this process is the ACC, which contributes to emotion regulation, decision-making, error detection, attention, and conflict monitoring. The ACC provides error signals to prefrontal regions, bridging emotional and cognitive responses during anticipated dysfluency. The R-DLPFC is part of the Frontoparietal Network (FPN) and works in coordination with the right supramarginal gyrus (R-SMG). However, under stuttering anticipation, connectivity between the R-DLPFC and R-SMG decreases, suggesting that anticipation may disrupt the stability of the network responsible for supporting fluent speech.

In line with these findings, Toyomura et al. (2018) provided compelling evidence that emotional circuits play a direct role in the expression of stuttering during real-life communication. Their findings showed that activity in the right amygdala was positively correlated with both the number of stuttering episodes and the level of emotional discomfort (as measured by SUD scores) during interpersonal speech tasks involving eye contact. This was the first study to directly show that amygdala activation tracks actual speech disfluencies in PWS during live communication, rather than reflecting only generalized or trait anxiety. In parallel, reduced activity has been reported in the medial prefrontal cortex in PWS. Notably, the ventromedial prefrontal cortex (vmPFC), a key subdivision of this region, plays an essential role in regulating amygdala-driven emotional responses. Findings of abnormal dopaminergic signaling in the vmPFC indicate a possible functional alteration in this regulatory pathway (Wu et al., 1997). This suggests that reduced prefrontal control may fail to inhibit amygdala overactivation, allowing fear and negative emotional memories to influence ongoing speech. Toyomura et al. (2018) also reported increased activation in the right insula (R-insula) during speech tasks in PWS, a region critically involved in salience detection, interoceptive awareness, and the integration of emotional and cognitive signals relevant to self-monitoring and anticipatory control.

Together, these findings suggest that stuttering involves an interaction between the error monitoring networks (R-DLPFC, ACC), the integration region (R-SMG), and emotional circuits (amygdala, vmPFC, and R-insula).

3.6 Mapping the self-monitoring system in PWS

To hypothesize how the SMS operates in PWS, it is first necessary to consider its function in fluent speakers. Two scenarios can be outlined. In the first scenario, where social evaluative pressure is absent and there is no conscious focus on speech, both fluent speakers and PWS exhibit similar SMS activity. In this context, the ACC detects error signals in speech production regions in PWS and conflict or competition signals in fluent speakers. These signals are sent from the ACC to the lateral prefrontal cortex (LPFC), which is responsible for making adjustments based on this monitoring to optimize the production process. This process occurs entirely at a subconscious level.

The second scenario highlights the critical differences. In the presence of social evaluation and the absence of distracting stimuli, conscious attention is directed toward speech.

Social evaluation is determined by higher-order cognitive regions, which engage the amygdala to assess threat-related significance and retrieve memories of similar socially evaluative events. In parallel, the right insula contributes to self-awareness and to monitoring the interoceptive bodily sensations associated with social stress. Once a socially evaluative context is established,

FRONTIERS IN HUMAN NEUROSCIENCE