The reinforcement learning community has made significant progress in understanding the role of dopamine (DA) in reward learning, cognitive control, and motivation. Yet there remains little consensus about the functions of distinct DA bandwidths: fast phasic transients (milliseconds), ramps (seconds), and slow tonic shifts (minutes). Are these dynamics independent channels conveying distinct decision variables, or are they temporal expressions of a unified DA computation? These questions lie at the heart of competing frameworks that propose divergent mechanisms for DA in reinforcement learning. This rift reflects not only conceptual disagreements but also the historical limitations of bandwidth-limited measurement tools. Emerging breakthroughs in wideband DA quantification promise rigorous tests of these theories. Here, we synthesize recent conceptual and technical advances and extend our prior proposals, suggesting that DA timescales reflect hierarchical control signals emerging from the perception of goal progress and from distributed, circuit-level inference about policy efficacy, in which tonic DA represents the integration of goal alignment abstracted across planning horizons and the control hierarchy.