Decoding reward–curiosity conflict in decision-making from irrational behaviors

Humans and animals are not always rational. “Decoding reward–curiosity conflict in decision-making from irrational behaviors” discusses the fact that humans not only rationally exploit rewards but also explore their environment out of curiosity. However, the mechanism underlying such curiosity-driven irrational behavior is largely unknown.
The article develops a decision-making model for a two-choice task based on the free energy principle, which is a theory integrating recognition and action selection. The model describes irrational behaviors that depend on the curiosity level.
Applying the model to rat behavioral data, the authors found that the rat had negative curiosity, reflecting conservative selection that sticks to more certain options, and that the level of curiosity was upregulated by the expected future information obtained from an uncertain environment. The decoding approach presented can be a fundamental tool for identifying the neural basis of reward–curiosity conflicts.

Decision-making model for the two-choice task with reward–curiosity dilemma.
a, Decision-making in the two-choice task.
A reward is provided with a different probability for each option. The agent does not know those probabilities. Through repeated trial and error, the agent recognizes the world by inferring the latent reward probability of each option, and chooses its next action (that is, the next option) based on its own inference.
b, Sequential Bayesian estimation as a recognition process.
The agent assumes that the reward probabilities change over time owing to the fluctuation in the latent variable controlling reward probability.
c, Belief updating.
The agent recognizes the latent variable as a probability distribution.
d, The update rule of the mean and variance of the estimation distribution for each option. α, Kt and f(μt) indicate the learning rate, Kalman gain, and the prediction of the reward probability, respectively. The second term in both equations disappears if the option is not selected.
e, The action selection process by the agent.
The agent evaluates the expected net utility Ut(at+1) of each action using the weighted sum of the expected reward and information gain, as shown in the equation. The agent compares the expected net utilities for both actions and prefers the option with the larger expected net utility.
f, Time-dependent curiosity. The intensity of curiosity changes over time owing to the fluctuation of ct.
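
The belief update in panels b to d can be illustrated with a small Python sketch. It is not the article's update rule, whose exact equations are not reproduced in this summary. It is a generic extended-Kalman-style update written under stated assumptions: f is taken to be a logistic function, the latent variable follows a random walk with a hypothetical variance process_var, and alpha simply scales the correction to the mean. The function names are illustrative only.

```python
import math


def sigmoid(x):
    """Assumed form of f: maps the latent variable to a reward probability."""
    return 1.0 / (1.0 + math.exp(-x))


def update_belief(mu, var, reward, selected, process_var=0.01, alpha=1.0):
    """One trial of a Kalman-style update of a Gaussian belief N(mu, var).

    mu, var     : current mean and variance of the belief for one option
    reward      : 1 if a reward was received, 0 otherwise
    selected    : True if this option was chosen on this trial
    process_var : assumed random-walk variance of the latent variable
    alpha       : assumed scaling of the mean correction
    """
    # The agent assumes the latent variable drifts, so uncertainty grows.
    var_pred = var + process_var
    if not selected:
        # No observation: the correction term vanishes.
        return mu, var_pred

    p = sigmoid(mu)                # predicted reward probability f(mu)
    slope = p * (1.0 - p)          # derivative of the logistic function at mu
    obs_var = p * (1.0 - p)        # Bernoulli outcome variance at the prediction
    gain = var_pred * slope / (slope * slope * var_pred + obs_var)

    new_mu = mu + alpha * gain * (reward - p)    # move toward the outcome
    new_var = (1.0 - gain * slope) * var_pred    # observing reduces uncertainty
    return new_mu, new_var
```

In this sketch an unselected option keeps its mean while its variance grows, which mirrors the caption's remark that the second term of both equations disappears when the option is not selected.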

Animals and humans perceive the external world through their sensory systems and make decisions accordingly. Generally, they cannot make optimal decisions because of the uncertainty of the environment as well as the limited computational capacity of the brain and the time constraints associated with decision-making. In fact, they perform irrational actions. For example, people play lotteries and gamble despite low reward expectations. In this case, they face a dilemma between a low expected reward and curiosity regarding whether a reward will be acquired. Thus, understanding how animals control the balance between reward and curiosity is important for clarifying the whole decision-making process. However, a method for quantifying the reward–curiosity balance has yet to be established.

Some irrational behaviors emerge because of the strength of curiosity. For example, conservative individuals avoid uncertainty and prefer to select an action that leads to predictable outcomes. Conversely, inquisitive individuals strongly desire to learn about the environment rather than to obtain rewards, and prefer to select an action that leads to unpredictable outcomes. Excessively conservative and excessively inquisitive tendencies can be interpreted in terms of autism spectrum disorder and attention deficit hyperactivity disorder, respectively; patients with these conditions are known to substantially avoid and seek novel information, respectively. Rational individuals fall midway between these two extremes. In an ambiguous environment, they select actions that help them understand the environment efficiently, and once the environment becomes clear, they select actions that exploit the rewards efficiently. Therefore, curiosity has a major impact on behavioral patterns, and it is believed that animals control the balance between reward and curiosity in a context-dependent manner.
Decision-making has been modeled primarily by reinforcement learning (RL), which is a theory for describing reward-seeking adaptive behavior in which animals not only exploit rewards but also explore the environment. In RL, explorative behavior has been addressed by a passive, random choice of action. However, animals actively explore the environment by selecting actions that minimize the uncertainty of the environment, driven by their curiosity.
Recently, the free energy principle (FEP) was proposed by Karl Friston under the Bayesian brain hypothesis, in which the brain optimally recognizes the outside world according to Bayesian estimation.
The FEP addresses not only the recognition of the external world but also information-seeking action selection, which minimizes the uncertainty of that recognition and is known as “active inference”. Furthermore, the FEP proposes a score for actions, called expected free energy, which consists of the expected reward and curiosity expressed in the same unit. Thus, action selection can be formulated as maximizing both reward and curiosity. Note that curiosity can be regarded as information gain, that is, the extent to which we expect our recognition to be updated by the new observation obtained through the action.
However, FEP assumes that the weighting of rewards and curiosity is always even and constant. Although a previous FEP study modeled active inference in a two-choice task, it assumed a constant intensity of curiosity and thus could not treat actual animal behaviors in which the weights of rewards and curiosity are expected to change over time.
Hence, conventional theories such as RL and FEP are limited in describing the conflict between reward and curiosity.
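
To make the trade-off concrete, the weighted sum of expected reward and information gain can be sketched in a few lines of Python. This is a toy illustration, not the article's model: the belief about each option is assumed to be a discrete distribution over candidate reward probabilities, information gain is computed as the mutual information between the next outcome and the reward probability, and the names net_utility and choose are hypothetical. A curiosity weight of 1 corresponds to the even weighting described above, while other values (including negative ones) change the balance.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def expected_reward(probs, weights):
    """Mean reward probability under a discrete belief."""
    return sum(w * p for p, w in zip(probs, weights))


def information_gain(probs, weights):
    """Expected information gain from observing one outcome.

    Computed as the entropy of the predicted outcome minus the expected
    entropy of the outcome if the reward probability were known.
    """
    p_bar = expected_reward(probs, weights)
    expected_h = sum(w * binary_entropy(p) for p, w in zip(probs, weights))
    return binary_entropy(p_bar) - expected_h


def net_utility(probs, weights, curiosity):
    """Weighted sum of expected reward and information gain."""
    return expected_reward(probs, weights) + curiosity * information_gain(probs, weights)


def choose(beliefs, curiosity):
    """Return the index of the option with the larger net utility."""
    utilities = [net_utility(p, w, curiosity) for p, w in beliefs]
    return max(range(len(utilities)), key=utilities.__getitem__)


# Option 0: well known, reward probability 0.6.
# Option 1: uncertain, reward probability is either 0.1 or 0.9.
beliefs = [([0.6], [1.0]), ([0.1, 0.9], [0.5, 0.5])]
for c in (-1.0, 0.0, 1.0):
    print(c, choose(beliefs, c))
```

With these example beliefs, a curiosity weight of zero or below favors the well-known option, whereas a weight of 1 favors the uncertain one, which is the kind of shift in behavior that a time-varying curiosity level would produce.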

In this study, we extended the FEP by incorporating a meta-parameter that controls the conflict dynamics between reward and curiosity; the resulting model is called the reward–curiosity decision-making (ReCU) model. The ReCU model can exhibit various behavioral patterns, such as greedy behavior toward reward, information-seeking behavior with high curiosity and conservative behavior that avoids uncertainty. Moreover, we developed a machine learning method called the inverse FEP (iFEP) method to estimate the internal variables of decision-making information processing.
Applying the iFEP method to a behavioral time series in a two-choice task, we successfully estimated the internal variables, such as variations in curiosity, recognition of reward availability and its confidence.
