Small steps for mankind: Modeling the emergence of cumulative culture from joint active inference communication discusses a testable deep active inference formulation of social behavior and accompanying simulations of cumulative culture in two steps:
First: cast cultural transmission as a bi-directional process of communication that induces a generalized synchrony (operationalized as a particular convergence) between the belief states of interlocutors. Second: cast social or cultural exchange as a process of active inference by equipping agents with the choice of who to engage in communication with. This induces trade-offs between confirmation of current beliefs and exploration of the social environment.
Cumulative culture emerges from belief updating (i.e., active inference and learning) in the form of a joint minimization of uncertainty.
The emergent cultural equilibria are characterized by a segregation into groups, whose belief systems are actively sustained by selective, uncertainty minimizing, dyadic exchanges. The nature of these equilibria depends sensitively on the precision afforded by various probabilistic mappings in each individual’s generative model of their encultured niche.

The model defines two groups of parameters that couple the internal states of agents: Learning and inference.
Perceptual learning (A2) is the learning of associations between emotional valence and belief states that guide the long-term actions of agents who hold and express beliefs. This learning happens on slow time scales: it accumulates across multiple interactions and modifies the agents' models over extended periods of exchange.
Perceptual inference (A1), namely sensitivity to model evidence, operates on fast time scales and is direct and explicit to agents during dialogue.
Importantly, it is hypothesized that without precise evidence accumulation, agents would be insensitive to evidence regarding the belief state of the other, and their internal states would not converge.
Active inference allows us to formulate a normative and explainable account of cultural information spread through communication by casting cultural transmission as a bi-directional communicative process that entails a particular convergence between distinct conveyors and conveners of cultural information.
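The role of precision in the hypothesis above can be illustrated with a toy sketch. It is not the paper's deep active inference model: it assumes two agents with categorical beliefs over a binary idea, a symmetric 2x2 likelihood mapping controlled by a single hypothetical `precision` parameter, and agents that express their full current belief at every exchange. The names `likelihood`, `converse` and `belief_gap` are illustrative only.

```python
import numpy as np

def likelihood(precision):
    """Symmetric 2x2 mapping P(expressed idea | held idea); 0.5 is uninformative."""
    return np.array([[precision, 1.0 - precision],
                     [1.0 - precision, precision]])

def converse(belief_a, belief_b, precision, steps=20):
    """Let two agents exchange expressed beliefs and update on what they hear."""
    A = likelihood(precision)
    a = np.asarray(belief_a, dtype=float)
    b = np.asarray(belief_b, dtype=float)
    for _ in range(steps):
        # Each agent hears the other's current belief as a soft observation.
        heard_by_a, heard_by_b = b.copy(), a.copy()
        # Bayes rule: posterior is proportional to prior times expected likelihood.
        a = a * (A.T @ heard_by_a)
        b = b * (A.T @ heard_by_b)
        a, b = a / a.sum(), b / b.sum()
    return a, b

def belief_gap(a, b):
    """Largest absolute difference between two belief vectors."""
    return float(np.abs(a - b).max())

start_a, start_b = [0.9, 0.1], [0.2, 0.8]
gap_precise = belief_gap(*converse(start_a, start_b, precision=0.9))
gap_flat = belief_gap(*converse(start_a, start_b, precision=0.5))
print(gap_precise, gap_flat)
```

In this toy setup the pair with a precise likelihood mapping ends with nearly identical beliefs, while the pair with an uninformative mapping (precision 0.5) keeps its initial gap, because what each agent hears carries no evidence about the other's belief state.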

Variables are depicted as circles, parameters as squares and concentration parameters as dark blue circles. Visualized on a horizontal line from left to right, states evolve in time. Visualized on a vertical line from bottom to top, parameters underwrite a hierarchical structure that corresponds to levels of cognitive processing. Parameters are listed on the left of the generative model and variables are on the right.
(Based on Figure 1 from Kastel, N., Hesp, C. (2021). Ideas Worth Spreading: A Free Energy Proposal for Cumulative Cultural Dynamics. In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_55)
Agents differ in their action model of which agent to visit at each time point. Their individual choices are guided by expected free energy (G), whose minimization entails maximizing the expected utility of an action (known as pragmatic value) as well as the expected information gain (known as epistemic value).
These two values constrain each other such that maximizing both simultaneously is partially (but not entirely) paradoxical (as illustrated in the following figure).
These constraints may also be understood as formalizing the exploration-exploitation trade-off, where epistemic value (exploration) refers to the benefit of searching in order to better estimate which areas offer pragmatic value (exploitation).
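The two components of G can be made concrete with a minimal sketch. It assumes a discrete setting with a belief `q_states` over an interlocutor's hidden state, a likelihood matrix `A` whose columns give P(outcome | state), and log preferences `log_pref` over outcomes. It uses the decomposition common in discrete active inference, where G is the negative sum of expected log preference and expected information gain; the function names and the numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

def epistemic_value(q_states, A):
    """Expected information gain: mutual information between states and outcomes."""
    q_states = np.asarray(q_states, dtype=float)
    predicted_outcomes = A @ q_states
    # Expected entropy of outcomes once the state is known (ambiguity).
    ambiguity = sum(q * entropy(A[:, s]) for s, q in enumerate(q_states))
    return entropy(predicted_outcomes) - ambiguity

def pragmatic_value(q_states, A, log_pref):
    """Expected utility: expected log preference over predicted outcomes."""
    q_states = np.asarray(q_states, dtype=float)
    return float((A @ q_states) @ np.asarray(log_pref, dtype=float))

def expected_free_energy(q_states, A, log_pref):
    """Lower G is better: it rewards both utility and information gain."""
    return -(pragmatic_value(q_states, A, log_pref) + epistemic_value(q_states, A))

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
log_pref = np.log([0.8, 0.2])  # assumed preference for hearing the first idea
for label, q in [("familiar", [1.0, 0.0]), ("unknown", [0.5, 0.5])]:
    print(label, pragmatic_value(q, A, log_pref), epistemic_value(q, A),
          expected_free_energy(q, A, log_pref))
```

A familiar interlocutor whose state is already known offers no information gain, so only its match with the agent's preferences matters; an unknown interlocutor offers information gain at the cost of a less certain match, which is the trade-off described above.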

This figure illustrates the behavioral differences between the extreme cases of being fully driven by exploitation (Left) or exploration (Right). Each cell on the grid corresponds to a potential interlocutor for these agents, who make decisions in three consecutive time steps (t = 1, 2, 3) and have previously engaged with three other interlocutors (marked with blue rectangles). We use the shorthand KLC to indicate the pragmatic component of the expected free energy, $G_{\text{pragmatic,visit}} = o_{\text{expr,visit}} \cdot (\ln o_{\text{expr,visit}} - C_{\text{idea}})$, which corresponds to the KL divergence between expectations about the interlocutor at that location (as informed by previous visits) and the preferred ideas of our agents, such that lower values correspond to a better match. Cells that are visited during t = 1, 2, 3 are filled with granite.
The exploitation-driven agent (Left) simply revisits, three times, the known interlocutor with the lowest KLC. In contrast, the exploration-driven agent (Right) prefers novel visits and switches to an unknown agent at every time step.
Agents can balance these two strategies as their preferences themselves evolve over time.
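The two extreme strategies in the figure can be sketched as follows, assuming nine potential interlocutors indexed 0 to 8, three of which are already known. The expected expressions, the preference vector, and the tie-breaking rule for novel visits (lowest unvisited index) are hypothetical choices made for illustration; `klc` implements the pragmatic term given in the caption, with the log preferences taken from a normalized distribution so that it equals a KL divergence.

```python
import numpy as np

def klc(expected_expression, log_pref):
    """Pragmatic term o . (ln o - C); assumes strictly positive probabilities."""
    o = np.asarray(expected_expression, dtype=float)
    return float(o @ (np.log(o) - log_pref))

def exploit(known, log_pref, steps=3):
    """Revisit the known interlocutor with the lowest KLC at every step."""
    best = min(known, key=lambda k: klc(known[k], log_pref))
    return [best] * steps

def explore(known, n_agents, steps=3):
    """Visit a not-yet-visited interlocutor at every step (lowest index first)."""
    visited = set(known)
    path = []
    for _ in range(steps):
        novel = next(i for i in range(n_agents) if i not in visited)
        visited.add(novel)
        path.append(novel)
    return path

log_pref = np.log([0.9, 0.1])  # assumed preferred idea distribution
known = {2: [0.8, 0.2], 5: [0.5, 0.5], 7: [0.2, 0.8]}  # memory of past visits

exploit_path = exploit(known, log_pref)
explore_path = explore(known, n_agents=9)
print(exploit_path)  # [2, 2, 2]
print(explore_path)  # [0, 1, 3]
```

Interlocutor 2 has the expected expression closest to the agent's preferences, so the exploitation-driven agent keeps returning to it, while the exploration-driven agent never returns to a visited cell.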

(Left) Lower level Step 1: Interlocutor selection.
Each agent selects one interaction partner. Agents cannot see each other’s “opinion” before conversing. Meeting selection was conditioned on: (1) Habitual visitation drives, depending on past actions. (2) Deliberate drives, conditioned on: (2a) Expected (mis)match between expressed opinions (pragmatic value) and (2b) expected reduction in uncertainty about opinions of other agents, depending on one’s memory of recent visits (epistemic value).
(Right) Lower level Step 2: Conversation with a selected agent.
Each meeting consisted of exchanges of expressed support for an idea [in the range (0,1)] and affective cues [negative-positive, in the range (0,1)]. Expressed support was conditioned on: (1) Expression habits formed during past conversations, (2) one’s current support for the idea. Expressed affective cues were conditioned on one’s current valence state. Affect played a role during Steps 1 and 2: Relative reliance on habitual tendencies vs. deliberation (expected free energy G) was regulated via action model precision. The latter was conditioned on one’s current valence state, which was conditioned on one’s current support of an idea, depending on previously learned associations between expressed ideas and concurrent affective cues (from oneself and others).
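The regulation of habit versus deliberation by action model precision can be sketched with the softmax form commonly used in discrete active inference, where the probability of an action is proportional to exp(ln E − γG). Here `habit` stands for the habitual visitation prior E, `G` for the expected free energy of each candidate visit, and `gamma` for the action model precision. The numbers are made up, and how `gamma` is derived from valence is left out; the source only states that precision is conditioned on the current valence state.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def visit_probabilities(habit, G, gamma):
    """Combine habits and deliberation; gamma weights expected free energy."""
    habit = np.asarray(habit, dtype=float)
    G = np.asarray(G, dtype=float)
    return softmax(np.log(habit) - gamma * G)

habit = [0.6, 0.3, 0.1]  # assumed habitual visitation prior over three partners
G = [2.0, 1.0, 0.2]      # assumed expected free energy per partner (lower is better)

habitual = visit_probabilities(habit, G, gamma=0.0)    # pure habit
deliberate = visit_probabilities(habit, G, gamma=8.0)  # mostly deliberation
print(habitual, deliberate)
```

With zero precision the agent simply follows its habits; with high precision the choice concentrates on the partner with the lowest expected free energy, even though habit favors a different one.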
This formulation employed a Bayesian framework—known as active inference—to formally account for the dynamics underlying (local) communication and (global) cumulative culture dynamics, thus contributing to the ever-growing body of research on multi-agent Bayesian models and collective active inference.
The social “transmission” of cultural information has been cast as a fundamentally bidirectional process of communication, which has been shown in the previous active inference literature to induce a generalized synchrony between the internal (belief) states of agents holding sufficiently similar generative models.
Building on this work, we operationalize generalized synchrony as a particular convergence between the internal states of interlocutors, and show that it depends sensitively on the precision of observation or likelihood mappings in a generative model of communicative exchange. When we simulate a population of agents that simultaneously engage in communication over time, cumulative culture emerges as the collective behavior brought about by local belief updating (active inference and learning in a dyadic setting). Simulations show that when a divergent belief is introduced to the status quo, it spreads within the population and brings about a collective behavior characterized by a certain degree of segregation between different belief groups. The degree to which the status quo population defects to the divergent belief is mediated by local psychological biases toward confirmation (as directly manipulated) and novelty seeking (as emergent from procedural generation of parameters). These cultural (cf. voting) equilibria are minimizers of collective or joint free energy that emerge from the imperative to minimize uncertainty and surprise in dyadic exchanges.
