“The interoceptive origin of reinforcement learning”
- Primary rewards are preceded by a cascade of earlier signals (secondary and proxy rewards) that facilitate learning and prospective control but do not themselves sustain reinforcement.
- Primary reward signals depend on internal states and goals.
- The traditional reinforcement learning framework must be extended to account for how state-dependent reward signals are generated and how they interact with reinforcement learning mechanisms in biological brains.
Rewards play a crucial role in sculpting all motivated behavior. Traditionally, research on reinforcement learning has centered on how rewards guide learning and decision-making.
Here, we examine the origins of rewards themselves. Specifically, we argue that the critical signal sustaining reinforcement for food is generated internally and subliminally during digestion. This calls for a shift in how we understand primary rewards: from immediate sensory gratification to a state-dependent evaluation of an action’s impact on vital physiological processes. We integrate this perspective into a revised reinforcement learning framework that recognizes the subliminal nature of biological rewards and their dependence on internal states and goals.
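To make the idea of a state-dependent primary reward concrete, the toy below borrows the drive-reduction formulation used in homeostatic accounts of reinforcement learning: reward is the reduction, produced by an intake, in the distance between an internal variable and its set point. This is an illustrative assumption, not the framework proposed here; the function names, the unitless `state`, `intake` and `setpoint` values, and the absolute-distance definition of drive are all hypothetical.

```python
def drive(state, setpoint):
    """Hypothetical drive: absolute distance of an internal variable from its set point."""
    return abs(setpoint - state)


def state_dependent_reward(state, intake, setpoint=1.0):
    """Drive-reduction sketch: reward = drive before intake minus drive after intake.

    All quantities are unitless and hypothetical; the same intake can be
    rewarding or aversive depending on the current internal state.
    """
    return drive(state, setpoint) - drive(state + intake, setpoint)


# Same intake, two internal states
print(state_dependent_reward(0.2, 0.5))  # depleted state: positive reward
print(state_dependent_reward(1.0, 0.5))  # already at set point: negative reward (overshoot)
```

Under this assumption, identical ingestion yields opposite-signed rewards depending on internal state, which is the kind of state dependence the highlights describe; sensory properties of the intake play no role in the reward itself.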

Carbohydrates (sugar): Although the exact afferent pathway for carbohydrates is still under investigation, there is evidence that glucose oxidation after sugar consumption is sensed in the portal vein and signaled via the vagus nerve and the nodose ganglion to the midbrain dopamine system.
Fats (lipids): Fat is detected in the upper intestine, where it activates peroxisome proliferator-activated receptor α (PPAR-α). This activation generates reinforcing signals that travel via the vagus nerve to the nodose ganglion and hindbrain, which in turn engage dopamine neurons in the substantia nigra and the ventral tegmental area (VTA), increasing dopamine release. Although the fat and sugar pathways appear similar, they run in parallel and activate distinct neuronal subpopulations. There is also evidence that sugar is sensed in the upper intestine, giving rise to vagal signals. Whether this intestinal pathway is redundant with the hepatoportal sensor or serves a distinct function is unknown.
Water: The pathway for water is still only partially understood. Systemic rehydration is likely detected through changes in blood osmolality and relayed to dopamine neurons in the VTA by GABAergic projections from lateral hypothalamic neurons that track fluid balance. Although these pathways have been studied primarily in rodents, similar mechanisms are thought to exist in humans.

(A) In the classic view, receipt of a primary reward (consumption of food or juice), the unconditioned stimulus (US), triggers phasic activation of midbrain dopamine neurons (red peak).
When reward delivery is consistently preceded by a cue such as a sound or visual signal [conditioned stimulus (CS)], the phasic dopamine response gradually shifts across trials from the moment of consumption to the earlier predictive cue, which is now considered a secondary reward (green peaks).
(B) Proposed updated view of classical conditioning: Recent research suggests that the critical reinforcing signals from food or water are not primarily linked to oral sensory signals during consumption. Instead, primary reward signals are associated with post-oral processes that occur during digestion and absorption (red peak). These post-oral signals are depicted over a broad time window, reflecting variability in the timing and duration of these events. In this model, the immediate responses to oral signals act as proxy rewards (yellow), providing an early ‘affective draft’ of the value of the consumed food (its delayed outcome). These proxy rewards are distinct from pre-oral cues (secondary rewards), which, through conditioning, become associated with and predictive of the primary rewards. It remains an open question whether post-oral primary reward signals diminish over time as earlier signals come to predict them, as depicted by the lighter shading of the red peak.
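The transfer of the phasic dopamine response from the US to the CS in panel A is commonly modeled with temporal-difference (TD) learning, in which the dopamine signal is read as a reward prediction error δ = r + γV(next) - V(now). The sketch below is a toy under stated assumptions: one value per time step from cue onset onward, pre-cue steps fixed at value 0 because cue onset is unpredictable, and hypothetical choices for trial length, cue and reward times, learning rate `alpha` and discount `gamma`. It covers only the classic view in panel A and does not model the broad, variable post-oral window proposed in panel B.

```python
import numpy as np


def simulate_td(n_trials=500, n_steps=20, cue_t=5, reward_t=15,
                alpha=0.1, gamma=1.0):
    """Tabular TD(0) toy of a cue-reward trial (all parameters hypothetical).

    Each time step from cue onset onward has its own value V[t]; pre-cue
    steps are treated as uninformative and fixed at value 0. Returns the
    prediction error delta for every trial and time step.
    """
    V = np.zeros(n_steps + 1)              # V[n_steps] is a terminal state, kept at 0
    deltas = np.zeros((n_trials, n_steps))
    for trial in range(n_trials):
        for t in range(n_steps):
            r = 1.0 if t == reward_t else 0.0            # reward delivered on leaving step reward_t
            v_now = V[t] if t >= cue_t else 0.0
            v_next = V[t + 1] if t + 1 >= cue_t else 0.0
            delta = r + gamma * v_next - v_now           # reward prediction error
            if t >= cue_t:
                V[t] += alpha * delta                    # learn values only for cue-driven states
            deltas[trial, t] = delta
    return deltas


deltas = simulate_td()
cue_onset, reward_time = 4, 15  # the error at cue onset is logged on the step that enters cue_t
print("first trial:", deltas[0, cue_onset], deltas[0, reward_time])
print("last trial: ", deltas[-1, cue_onset].round(3), deltas[-1, reward_time].round(3))
```

On the first trial the error appears only at reward time; after training it sits at cue onset and the error at reward time is close to zero, mirroring the shift from the red peak to the green peaks.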
