Adaptive decision-making requires balancing exploitation of known rewarding options with exploration of uncertain alternatives, a dilemma also known as the exploration-exploitation tradeoff.
While this framework has been widely studied in reinforcement learning research, its relevance to coping, defined as the cognitive and behavioral strategies that individuals use to manage stress and uncertainty, remains underexplored. Maladaptive coping may reflect rigidity in exploitation or ineffective exploration, whereas adaptive coping may involve flexible adjustment of control in changing environments. In this study, we examined whether interindividual differences in the transition dynamics between explore and exploit strategies predict coping styles in a large online general population sample. A total of 1732 participants completed a three-armed restless bandit task, and their latent explore-exploit strategy states and transition patterns were modeled using a Hidden Markov Model. These computational indices of explore-exploit dynamics were then linked to self-reported psychological coping strategies using regression and canonical correlation analysis. Individuals with a greater tendency to persist in exploitative states reported less reliance on avoidant and emotion-focused coping, whereas exploratory tendencies showed distinct associations with externally oriented coping strategies.
Unsupervised clustering of exploration-exploitation dynamics further revealed four distinct decision-making subtypes, each associated with unique coping profiles. These findings provide the first evidence that computational markers of explore-exploit control dynamics relate to psychological coping profiles, offering mechanistic insight into psychological adaptation and resilience.

A) Schematic of the CCA model structure. Two computational parameters from the Hidden Markov Model and 14 coping strategy scores from the Brief COPE were entered as the two variable sets. CCA identifies pairs of canonical variates (U, V), each a linear combination of one variable set, that are maximally correlated across sets.
B) Canonical loadings for the first canonical dimension (ρ₁ = 0.28, p < .001), which may reflect a more goal-directed, exploitative decision-making profile. Coping strategies are listed from highest to lowest loading strength on dimension 1.
C) Canonical loadings for the second canonical dimension (ρ₂ = 0.09, p = .084), which, although only marginally significant, may reflect a noisier, more exploratory decision-making style. Coping strategies are listed from highest to lowest loading strength on dimension 2.
Loadings are plotted by absolute value; red bars indicate moderate to strong loadings (|loading| ≥ 0.2), interpreted as meaningful.
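To make the CCA step concrete, the following is a minimal sketch of how canonical correlations between a two-variable set and a 14-variable set can be computed. It is not the authors' analysis code: the data are synthetic, the names `canonical_correlations`, `hmm_params`, and `coping` are illustrative assumptions, and the QR-plus-SVD route is one standard way to obtain the correlations.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two variable sets (rows = participants)."""
    # Center each set so the result does not depend on variable means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column space of each centered set.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

# Synthetic stand-in data (assumption): 2 HMM parameters and 14 coping scores.
rng = np.random.default_rng(0)
n = 500
hmm_params = rng.uniform(0.05, 0.95, size=(n, 2))
coping = rng.normal(size=(n, 14))
coping[:, 0] += 0.5 * hmm_params[:, 1]  # inject a weak link for illustration

rho = canonical_correlations(hmm_params, coping)
print(rho.shape)  # one correlation per canonical dimension
```

With two variables in the smaller set, at most two canonical dimensions exist, which matches the two dimensions shown in panels B and C.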

four distinct cognitive-coping profiles.
A) Hierarchical clustering identified four clusters.
B) Each cluster is defined by a different combination of the tendencies to stay in the explore state and in the exploit state.
Cluster 1 is characterized by a high probability of staying in the explore state and a low probability of staying in the exploit state (“over-exploratory” subtype);
Cluster 2 is characterized by a high probability of staying in the explore state and a high probability of staying in the exploit state (“mode-locked” subtype);
Cluster 3 is characterized by a low probability of staying in the explore state and a low probability of staying in the exploit state (“mode-volatile” subtype);
Cluster 4 is characterized by a low probability of staying in the explore state and a high probability of staying in the exploit state (“goal-directed” subtype).
C-G) Use of coping strategies across clusters.
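The high/low pattern that names the four subtypes in panel B can be summarized as a simple lookup. This is only an illustrative sketch: the clusters in the figure come from hierarchical clustering, not from a fixed threshold, and both the function name `label_subtype` and the `cutoff` value of 0.5 are hypothetical.

```python
def label_subtype(p_stay_explore, p_stay_exploit, cutoff=0.5):
    """Map two stay probabilities to a subtype name (cutoff is an assumption)."""
    high_explore = p_stay_explore >= cutoff
    high_exploit = p_stay_exploit >= cutoff
    if high_explore and not high_exploit:
        return "over-exploratory"  # Cluster 1 pattern
    if high_explore and high_exploit:
        return "mode-locked"       # Cluster 2 pattern
    if not high_explore and not high_exploit:
        return "mode-volatile"     # Cluster 3 pattern
    return "goal-directed"         # Cluster 4 pattern

print(label_subtype(0.8, 0.2))
print(label_subtype(0.2, 0.9))
```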
In this study, we demonstrate that latent transition dynamics between explore and exploit strategies capture meaningful individual differences in coping behavior. Using a three-armed restless bandit task paired with latent state modeling using a Hidden Markov Model (HMM), we quantified the stability of individuals’ control modes, that is, how long they tend to remain in exploration or exploitation before switching, and found systematic links to psychological coping profiles. Individuals with greater stability in exploitation reported less reliance on externally oriented, emotionally focused, or avoidant coping strategies, consistent with a more internally regulated, pragmatic approach to stressful situations. In contrast, those with greater stability in exploration reported greater use of religious coping and other externally focused strategies, consistent with heightened sensitivity to changing contingencies and greater information seeking under uncertainty.
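The notion of mode stability can be made concrete. In a first-order Markov chain, the number of consecutive trials spent in a state with self-transition ("stay") probability p follows a geometric distribution with mean 1 / (1 − p). The sketch below illustrates that relationship only; it is not the authors' fitting code, and the function name and example probabilities are illustrative assumptions.

```python
def expected_dwell_time(p_stay):
    """Mean number of consecutive trials spent in a state before leaving it."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must be in [0, 1)")
    # Geometric distribution: each trial leaves the state with probability 1 - p_stay.
    return 1.0 / (1.0 - p_stay)

# Hypothetical stay probabilities for one participant.
print(expected_dwell_time(0.5))   # 2.0 trials on average
print(expected_dwell_time(0.75))  # 4.0 trials on average
```

Because the mapping is nonlinear, small differences between high stay probabilities translate into large differences in how long a mode persists.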
Beyond single-strategy associations, canonical correlation analysis revealed that these decision dynamics align with integrated coping profiles, suggesting that coping may be organized along latent dimensions of cognitive control. Exploit-stable profiles clustered with reduced use of avoidant and emotion-focused strategies, while more exploratory profiles tended toward mixed or externally oriented coping.
Hierarchical clustering further distinguished four cognitive–coping subtypes. Individuals in the “over-exploratory” subtype (Cluster 1), who tend to remain in exploratory mode and switch away from exploitation quickly, reported the highest engagement in both adaptive (e.g., positive reframing) and maladaptive (e.g., denial, ignoring problems) coping strategies, as well as greater reliance on others and religion. This may reflect a tendency toward openness and information-seeking, but also a vulnerability to diffuse or externally oriented coping under stress. The “mode-volatile” subtype (Cluster 3), characterized by frequent switching between exploration and exploitation strategies, also showed increased use of religion, positive reframing, and denial compared to more stable subtypes, potentially indicating inconsistent control dynamics that encourage diverse but not always effective coping responses.
In contrast, the “goal-directed” subtype (Cluster 4), who quickly disengage from exploration and maintain stable exploitation, and the “mode-locked” subtype (Cluster 2), who persist in whichever mode they are in, reported lower use of these externally oriented and emotion-focused coping strategies. This may indicate a preference for more internally regulated, pragmatic approaches to managing stress.
Overall, the clustering results reveal that cognitive control style, which is captured by the stability of exploration and exploitation, corresponds to distinct coping profiles, with more exploratory or unstable strategies linked to greater use of externally focused and potentially avoidant coping methods.
These findings extend reinforcement learning models of the explore–exploit tradeoff into the domain of coping. They suggest that the stability of decision-making strategy modes may be a cognitive mechanism underlying how individuals select and sustain coping strategies when faced with stress. From a clinical perspective, these computational markers could inform precision models of stress adaptation, where interventions might be tailored to enhance flexibility in over-exploitative individuals or strengthen goal-directed control in over-exploratory or volatile profiles.
