I would like to quote some of the great insights and statements from the opinion article by G. Pezzulo, T. Parr, P. Cisek, A. Clark, and K. Friston published in TICS: “Generating meaning: active inference and the scope and limits of passive AI”.
Does ChatGPT ‘understand’ what it talks about in the way we do, or is it an example of a ‘Chinese room’ that transforms symbols without any real understanding?
This opinion article offers a biophilic perspective on generative AI systems by comparing them to an active inference (or predictive processing) view of brain and cognition, which foregrounds the notion of generative models (or world models), but in a biological setting.
As considered by philosophers, psychologists, neuroscientists, and engineers, the primary function of the brain is not to accumulate knowledge about the world but to control exchanges with the world.
Particular features of the world are meaningful to us because they specify the ways that we can act on the world – called ‘affordances’ – to attain characteristic states that have adaptive value. Responding to affordances is a type of sensorimotor understanding that precedes explicit knowledge of the world, both in evolution and in the course of child development.
For many types of interaction, some (implicit or explicit) knowledge of the dynamics of the world is essential. This includes the ability to predict how our actions will influence our state, and to infer the context in which such predictions apply. These are cornerstones of a prominent perspective in cognitive neuroscience called ‘active inference’. A key idea here is that, in living organisms, sentient behavior – the capacity to infer states of the world and to act upon it with a sense of purpose – is fundamentally predictive and rests on grounded world models that can generate predictions about the consequences of action.
In active inference, generative models play a broader role that underwrites agency. During task performance, they support inference about states of the extrapersonal world and of the internal milieu, goal-directed decision-making, and planning (as predictive inference).

This figure highlights the conceptual differences between the ways that generative models support the solution of the same problem: predicting a travel destination.
The left schematic is designed to resemble a series of transformer networks. These are feedforward architectures based upon a repeated motif with a (multi-head) ‘self-attention’ structure. This structure allows interactions between different parts of a sequence such that particular elements (e.g., specific words, shown in the boxes) in the sequence are emphasized relative to other elements – effectively picking out salient information that predicts the output.
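To make that attention motif concrete, here is a minimal sketch (in NumPy) of single-head scaled dot-product self-attention – the basic mechanism by which transformers weight some elements of a sequence over others. The shapes, weights, and function names are purely illustrative and not taken from any particular model.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise relevance between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: emphasis over the sequence
    return weights @ V                              # each output mixes the whole sequence

# Toy usage: 4 tokens, 8-dimensional embeddings, a single head
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                 # shape (4, 8)
```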
The active inference architecture, on the right, illustrates a network of neuronal systems with reciprocal connectivity – of the type found in the brain – supporting recurrent dynamics.
The hierarchical structure is evident in the asymmetrical connectivity patterns. Specifically, the ‘descending’ connections between brain areas are shown with round arrowheads to imply an inhibitory connection, as if we subtract some prediction from a higher level – from the ‘ascending’ inputs to that region – to compute a prediction error. The ‘ascending’ connections are shown with a pointed arrowhead to suggest an excitatory connection in which prediction errors drive belief updating and learning. Crucially, in the active inference hierarchy, predictions based upon the policy we might pursue – shown as combinations of ‘north’ (upwards arrow) and ‘south’ (downwards arrow) actions – influence hidden states of the world (e.g., my location in allocentric space), which themselves predict both the words we might hear and speak, and the views we might encounter. These inferred hidden states – including where we as a physical agent are in the world, and where we plan to go – are central to biological systems that engage in active inference.
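As a rough illustration of this message passing (and nothing more), the following sketch implements a two-level loop in which a descending prediction is subtracted from the ascending input to form a prediction error, and the error is passed back up to update the higher-level belief. The mapping W, the learning rate, and the scaling are invented for the toy example and carry no anatomical or quantitative claim.

```python
import numpy as np

def predictive_coding_step(belief, observation, W, lr=0.1):
    """One exchange between two levels of the hierarchy.
    belief:      higher-level estimate of hidden causes
    observation: ascending input at the lower level
    W:           generative mapping from hidden causes to predicted input
    """
    prediction = W @ belief               # descending (inhibitory) message
    error = observation - prediction      # prediction error at the lower level
    belief = belief + lr * (W.T @ error)  # ascending error drives belief updating
    return belief, error

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 4)) / np.sqrt(16)   # scaled only to keep the toy update stable
true_cause = rng.normal(size=4)
obs = W @ true_cause                         # noiseless toy observation

belief = np.zeros(4)
for _ in range(200):                         # errors shrink as beliefs converge
    belief, err = predictive_coding_step(belief, obs, W)
```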
In generative AI, a prompt is the input for which there is a desired output. Conversely, in biological exchanges with the world, inputs depend upon action, namely how the world is sampled. Hearing the question shown at the top of the figure updates our beliefs about the sequence of actions we might take (or imagine ourselves taking), which updates predictions about the sequence of locations we will visit (and the visual scenes we will encounter), itself updating our predictions about the next words we will speak to answer the question; an example in a simple navigation setting can be found in the literature cited by the authors. In the brain, the generative models for spatial navigation entail distributed cortical and subcortical (e.g., hippocampal) networks, and achieving advanced machine autonomy might benefit from reproducing the functional properties of these networks.
A key difference between the two approaches (Figure 1) is that, although generative AI learns to provide a response when prompted, active inference associates those responses with meaning that is grounded in sensorimotor experience: the words in the question and response about ‘going north’ or ‘south’ are associated with the potential for (and the prediction of) movement in physical space – and engage neuronal processes involved in guiding movement in space and predicting its multisensory and affective consequences.
Language competence itself – comprising semantic and pragmatic abilities – is built on top of knowledge grounded in the sensory modalities and a non-linguistic ‘interaction engine’ which capitalizes on nonverbal joint actions, such as moving a table around a tight corner. This competence is bootstrapped during development through collaborative sense-making and child–adult interactions situated in the physical world. The question is not (only) how the symbols of language can be connected to non-symbolic processes, but rather where the symbols themselves even come from. As the example above shows, the sensorimotor interaction comes first – long before symbols appear – in both phylogeny and ontogeny.
“the real purpose of making noises is not to convey knowledge but to persuade”
The meaning of the communiqué is not in the acoustics or syntax of a given utterance but instead lies in the interaction that the utterance is predicted to induce in those who speak the same language, and the desired consequence of that exchange.
The words themselves are only shorthand notation for the meaningful interactions, and they are compact and ‘symbolic’.
Our grasp of the meaning of linguistic symbols does not originate from our ability to process natural language but from the more foundational understanding of the lived world that we accumulate by sampling and interacting with it.
Living organisms (and active inference systems) acquire their generative models by engaging in sensorimotor exchanges with the world – and conspecifics – and learning the statistical regularities of such interactions. These interactions enable sensorimotor predictions that shape and structure perception of the world – and other agents – and afford our causal understanding of action and effects.
By contrast, LLMs such as ChatGPT learn by passively ingesting large corpora and by performing self-supervised tasks (e.g., predicting words). Other generative AI systems use the same approach, albeit with other data formats such as pictures and sometimes robot sensor data.
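Just to picture what ‘predicting words’ means as a training signal, here is a deliberately tiny sketch: a bigram count model scored by the negative log-likelihood of the next word. Real LLMs replace the counting with a transformer and scale the corpus enormously, but the passive, self-supervised shape of the objective is the same; the corpus string below is made up.

```python
import math
from collections import Counter, defaultdict

corpus = "we go north then we go south then we go north".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1                    # passive ingestion of text statistics

def next_word_nll(prev, nxt):
    counts = bigrams[prev]
    p = counts[nxt] / sum(counts.values())     # P(next word | previous word)
    return -math.log(p)                        # the self-supervised training signal

# Average loss over the corpus under this toy model
pairs = list(zip(corpus, corpus[1:]))
loss = sum(next_word_nll(p, n) for p, n in pairs) / len(pairs)
```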
The ‘understanding’ of current generative AI systems is not action-based and is essentially passive – it reflects statistical (rather than causal) regularities evidenced within large curated datasets (e.g., text, images, code, videos): they generate content from content, not from causes. Without the capability to actively select their observations – and to make interventions during training – generative AI may be unable to develop causal models of the contingencies between actions and effects, or of the distinction between predictions and observations.
Without a core understanding of reality (or a ‘common sense’), current AI systems are brittle: they can learn specific tasks but often fail when presented with close variants of the same tasks because they learn inessential features that do not generalize. Technically, this type of overfitting reflects a focus on predictive accuracy at the expense of model complexity.
This type of ‘scaling up’ might be intrinsically limited.
In the transformer architectures used in generative AI, attention (or self-attention) refers to a mechanism that assigns greater or lesser weight to different parts of their (extremely long) inputs, thereby filtering them.
In active inference, attention encompasses both this filtering role (by varying the precision of predictions and sensory information) and the active selection of salient data from the environment that resolves uncertainty. Active inference systems can perform ‘experiments’ and elicit information that is expected to maximize information gain.
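Here is a minimal numerical sketch of the first, filtering sense of attention in active inference – precision-weighting of prediction errors – with made-up numbers: the same-sized error counts for more on a channel that is believed to be reliable. (The second, active-sampling sense is illustrated in a later sketch.)

```python
import numpy as np

prediction  = np.array([0.0, 0.0])    # predicted input on two sensory channels
observation = np.array([1.0, 1.0])    # actual input (same surprise on both)
precision   = np.array([4.0, 0.25])   # channel 1 believed reliable, channel 2 noisy

error = observation - prediction
weighted_error = precision * error    # the 'attended' (precise) channel dominates
belief_update = 0.1 * weighted_error  # most of the revision comes from channel 1
```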
This curiosity is ubiquitous in living organisms, but is more challenging to obtain with passive learning.
A key aspect of natural intelligence is embodiment.
It has been speculated that this grounding engenders our emotions by reflecting a sense of ‘mattering to me’ that structures and informs the ways we process information, and that imbues our world models with meaning and purpose. Active inference models this aspect of agency by using the construct of ‘interoceptive prediction’.
This provides firm ground for evaluating the courses of action that increase or decrease the viability of an organism, and ultimately for determining what matters and what does not. Importantly, interoceptive prediction, exteroceptive prediction, and proprioceptive (action-guiding) prediction are all co-computed as living organisms go about the task of living.
In this way, active inference may naturally scale up in ways that do not seem to have clear analogs in the sessile, data-fed methods used by generative AI, in which learning and fine-tuning are implemented sequentially.

(Left) Cartoon of the pretraining process for generative AI systems, in which they are passively presented with large quantities of data. The weights of the network are then optimized such that their outputs are more probable given the inputs. State-of-the-art models often include subsequent fine-tuning in a (semi)supervised manner; however, this still relies upon passive presentation of labeled data or self-generated outputs paired with rewards.
(Right) By contrast, the generative models that underwrite active inference involve reciprocal interactions with the world. This means that our current beliefs about the world can be used to select those data that have ‘epistemic affordance’ – in other words, those that are most useful for resolving our uncertainty about the data-generating process.
In the process of learning what it means to go north or south, we may be more or less certain about the location we will end up in under each of these actions (shown here with a relatively high confidence of ending up in the southern position when going south, but more uncertainty when going north). By choosing to go north (and observing that we end up 10 m north of our starting location), we are now in a better position to resolve our uncertainty and optimize our predictions. Beliefs about the causes of our data are an important part of this process of curiosity, exploration, or information seeking. However, these beliefs may easily be neglected in the process of function approximation used in current generative AI systems, where all that matters is the desired output. The neuroanatomical diagrams in this figure are intended purely for illustrative purposes and are not to be taken seriously as anatomical hypotheses – which would distract from the focus of this paper on AI. However, process theories have been developed from active inference frameworks, to which we direct interested readers. Broadly, we might expect planning and policy selection to rely upon networks involving cortical and subcortical regions (e.g., cortico-basal-ganglia-thalamo-cortical loops) in which asymmetrical neuronal connectivity patterns between different cortical regions reflect communication between different hierarchical levels.
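As a worked toy version of that example – with invented probabilities and the simplifying assumption that the resulting location is observed noiselessly – the expected information gain of each action reduces to the entropy of its predicted-outcome distribution, so the more uncertain ‘north’ action is the better experiment:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p, where=p > 0, out=np.zeros_like(p)))

# Predicted location after each action, over three candidate positions
p_outcome_given_south = [0.90, 0.05, 0.05]   # fairly sure where 'south' leads
p_outcome_given_north = [0.40, 0.30, 0.30]   # much less sure about 'north'

gain_south = entropy(p_outcome_given_south)  # ~0.39 nats to be gained
gain_north = entropy(p_outcome_given_north)  # ~1.09 nats to be gained

# With noiseless feedback, the more uncertain action resolves more uncertainty
best_epistemic_action = "north" if gain_north > gain_south else "south"
```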
In active inference, the tradeoffs between exploratory and exploitative behavior – and between the efficiency and accuracy of generative models – are all gracefully resolved by pursuing the imperative of free-energy minimization.
The imperative to maximize the evidence (also known as marginal likelihood) for generative (i.e., world) models of how observations are caused has been an essential feature of recent trends in theoretical neurobiology, machine learning, and AI. Evidence-maximization explains both sense-making and decision-making in self-organizing systems from cells to cultures. This imperative can be expressed as minimizing an evidence bound, termed ‘variational free energy’, that comprises complexity and accuracy:

[1] Free energy = model complexity − model accuracy

Accuracy measures goodness of fit, whereas complexity measures the divergence between prior beliefs (before seeing outcomes) and posterior beliefs (afterwards). More intuitively, complexity scores the information gain or (informational and thermodynamic) cost of changing one’s mind. This means that evidence-maximization is about finding an accurate explanation that is minimally complex (cf. Occam’s principle). Importantly, in the context of generative and generalized AI, it implies optimizing generative models such that they explain data more parsimoniously, with fewer parameters.

In an enactive setting – apt for explaining decision-making – beliefs about ‘which plan to commit to’ are based on the expected free energy under a plausible plan. This implicit planning as inference can be expressed as minimizing the expected free energy:

[2] Expected free energy = risk (expected complexity) + ambiguity (expected inaccuracy)

Risk is the divergence between probabilistic predictions about outcomes, given a plan, relative to prior preferences. Ambiguity is the expected inaccuracy. An alternative decomposition is:

[3] Expected free energy = expected cost − expected information gain

The expected information gain underlies the principles of optimal Bayesian design, whereas the expected cost underlies Bayesian decision theory. In short, active inference appeals to two types of Bayes optimality and subsumes information- and preference-seeking behavior under a single objective.

Free-energy minimization operates both during task performance and during offline periods, such as when the brain is at rest. Minimizing free energy during offline periods optimizes the generative model for future use, even in the absence of data; for example, reducing model complexity by pruning irrelevant parameters or self-generating data through ‘generative replay’ can go beyond experienced data to encompass counterfactual (but plausible) events. Finally, during evolution, free-energy minimization could endow animal brains with prior structure encoded in species-specific circuitry.
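For readers who want to see the quantities in this box as numbers, here is a hedged sketch for a discrete toy model with two hidden states and three outcomes. The likelihood matrix, priors, posteriors, and preferences are all invented for illustration; the formulas follow the standard discrete-state reading of equations [1] and [2] above rather than any specific software implementation.

```python
import numpy as np

def kl(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(q * np.log(q / p, where=q > 0, out=np.zeros_like(q)))

def entropy(p):
    p = np.asarray(p, float)
    return -np.sum(p * np.log(p, where=p > 0, out=np.zeros_like(p)))

# Likelihood: P(outcome | hidden state), 3 outcomes x 2 states (columns sum to 1)
A = np.array([[0.8, 0.1],
              [0.1, 0.1],
              [0.1, 0.8]])

prior     = np.array([0.5, 0.5])   # beliefs before seeing the outcome
posterior = np.array([0.9, 0.1])   # beliefs afterwards
observed  = 0                      # index of the outcome actually seen

# [1] Free energy = complexity - accuracy
complexity  = kl(posterior, prior)                      # cost of changing one's mind
accuracy    = np.sum(posterior * np.log(A[observed]))   # expected log-likelihood (fit)
free_energy = complexity - accuracy

# [2] Expected free energy of a candidate plan = risk + ambiguity
predicted_states   = np.array([0.7, 0.3])       # states expected under the plan
predicted_outcomes = A @ predicted_states       # outcomes expected under the plan
preferred_outcomes = np.array([0.8, 0.1, 0.1])  # prior preferences over outcomes

risk      = kl(predicted_outcomes, preferred_outcomes)                       # expected complexity
ambiguity = np.sum(predicted_states * [entropy(A[:, s]) for s in range(2)])  # expected inaccuracy
expected_free_energy = risk + ambiguity
```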
In organisms like us, abstract thought and linguistic knowledge are grounded in the circuits that supported sensorimotor predictions and purposive control in our evolutionary ancestors. In other words, linguistic abilities develop on top of grounded concepts, even if they can – to some extent – become ‘detached’ from the sensorimotor context.
Current generative AI is following a path that differs fundamentally from the phylogenetic trajectories of living organisms described above: it follows an ‘inverse phylogeny’ that starts from acquiring knowledge directly from text, alone or together with other modalities.
Living organisms acquire a sense of ‘mattering’ because they learn generative models under selective pressure to satisfy metabolic needs and remain within viable states. Their ‘authentic’ understanding of reality is – we argue – grounded in their agentive, purposeful interactions with the embodied world, including other agents: interactions that enable agents to become ‘authors’ of their sensorium. This embodied intelligence – and the early connection to sensorimotor reality – provides a common ground for conceptual and linguistic knowledge.
Similarly, active inference agents generate content by acting on – or intervening in – the world in which they operate. Figure 2 offers an example of this: it shows an active inference agent that selects navigation actions to resolve its uncertainty about its location – an epistemic imperative that is often a precondition for the pragmatic imperative of reaching a goal destination.
Current efforts to scale up generative AI systems focus on increasing complexity, but with little emphasis on actively selecting the training corpus – in other words, on selecting ‘smart’ data that optimize active learning and inference. We believe this is a missed opportunity.
Despite these differences, the current wave of generative AI systems can impact our ecosystems in interesting ways.
They do not simply throw our own understandings back at us (although they do that, for obvious reasons). They also package and repackage those understandings and can, with mixed results, suggest bridges between distant parts of the world-model we have uploaded into our various data streams.
