Is Ockham’s razor losing its edge?

Is Ockham’s razor losing its edge? New perspectives on the principle of model parsimony

The preference for simple explanations, known as the parsimony principle, has long guided the development of scientific theories, hypotheses, and models.
Yet recent years have seen a number of successes in employing highly complex models for scientific inquiry (e.g., for 3D protein folding or climate forecasting).
In this paper, we reexamine the parsimony principle in light of these scientific and technological advancements. We review recent developments, including the surprising benefits of modeling with more parameters than data, the increasing appreciation of the context-sensitivity of data and misspecification of scientific models, and the development of new modeling tools.
By integrating these insights, we reassess the utility of parsimony as a proxy for desirable model traits, such as predictive accuracy, interpretability, effectiveness in guiding new research, and resource efficiency.
We conclude that more complex models are sometimes essential for scientific progress, and discuss the ways in which parsimony and complexity can play complementary roles in scientific modeling practice.

Illustration of different forms of parsimony.
(A) Parsimony by constraints.
Upper: A more parsimonious model (yellow) assigns a high probability to only a narrow range of events, while a more complex model (purple) widely spreads its predictions.
Lower: A more parsimonious model (yellow) captures a subspace of phenomena that a more complex model (purple) can accommodate.
(B) Parsimony by components.
Upper: A parsimonious model (yellow) works with fewer input variables than a more complex model (purple).
Lower: A parsimonious model (yellow) postulates fewer latent variables/causes than a more complex model (purple).

Parsimony by constraints.

This form of parsimony appeals to a model’s limited flexibility, that is, its limited capacity to accommodate different potential patterns in the data. Models that are parsimonious in this way anticipate specific empirical outcomes with greater confidence. To illustrate, consider a model predicting the effect of an untested drug. A parsimonious model might predict that the drug has no effect. This prediction is specific and narrow: it only allows for outcomes in which the drug has no effect at all. On the other hand, a less parsimonious model might predict that the drug could have any effect, whether positive or negative, and of any magnitude. This latter model is more flexible and can accommodate a wider range of possible outcomes, and is thus less parsimonious by constraints. Bayesian instantiations of parsimony align with this intuition of parsimony by constraints. To enforce parsimony by constraints, scientists typically favor models with fewer parameters (or fewer effective parameters), less expressive functional forms, more precise prior distributions, shorter description lengths, or lower rank, among other criteria.
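The drug example can be made concrete with a small Bayesian comparison. The sketch below is illustrative only and is not taken from the paper: it assumes n independent measurements with known noise standard deviation `sigma`, a constrained model that fixes the effect at zero, and a flexible model that places a Normal(0, tau^2) prior on the effect. The function names and all numerical values are hypothetical. Under these assumptions the marginal likelihood of the observed mean has a closed form for both models.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def marginal_likelihoods(ybar, n, sigma=1.0, tau=10.0):
    """Marginal likelihood of the observed mean effect under two models.

    constrained: the drug effect is exactly 0.
    flexible:    the drug effect has a Normal(0, tau^2) prior.
    Both assume n independent measurements with known noise sd sigma.
    """
    se2 = sigma ** 2 / n  # sampling variance of the observed mean
    constrained = normal_pdf(ybar, 0.0, se2)
    # Integrating out the effect adds the prior variance to the sampling variance.
    flexible = normal_pdf(ybar, 0.0, tau ** 2 + se2)
    return constrained, flexible


# Hypothetical observed mean effects: one near zero, one clearly nonzero.
for ybar in (0.05, 1.0):
    m0, m1 = marginal_likelihoods(ybar, n=25)
    favored = "constrained" if m0 > m1 else "flexible"
    print(f"observed mean {ybar}: constrained={m0:.3g}, flexible={m1:.3g} -> {favored}")
```

Because the flexible model spreads its predictions over a wide range of outcomes, it assigns little probability to any one of them. When the data land where the constrained model predicted, the constrained model is favored; when they land far from zero, the flexible model is favored.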

Parsimony by components.

This form of parsimony defines the complexity of a model as the number of meaningful components it has. These components can include types or instances of variables, independent and root causes, or distinct processes represented in the model. To illustrate, consider the example of modeling human language. A parsimonious-by-components model would aim to explain the richness of human languages with a minimal set of grammatical rules. For example, Chomsky’s theory of Universal Grammar proposes that a small number of fundamental rules can account for the vast diversity of languages spoken around the world. In contrast, a less parsimonious model might employ a larger set of rules tailored to different languages. While this model might explain the structure of different languages more precisely, it is more complex by components because it postulates more rules.

While distinct, these two forms of parsimony are interconnected. Moreover, assessing model parsimony in practice requires many nuanced choices.
Parsimony by components and by constraints align when model components can adjust based on observed data. For example, adding input variables (components) in multiple regression expands the set of input–output relationships consistent with the model.
The two forms also align when there is uncertainty about the model components because this uncertainty allows for a wider range of possible model behaviors. For example, state-of-the-art climate models incorporate many structural hypotheses about which there is substantial disagreement or uncertainty within the scientific community. Because of this uncertainty, a given climate model can be consistent with many outcomes.
Misalignment between the two forms of parsimony occurs when model components are neither adjustable nor uncertain. For example, quantum electrodynamics posits many theoretical components but makes extremely precise predictions, and is in that sense inflexible.
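The multiple regression case mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's analysis: the data are hypothetical, the models have no intercept, and the helper names `fit_one`, `fit_two`, and `sum_squared_error` are invented for this example. The one-input model is the special case of the two-input model in which the second coefficient is zero, so the larger model can reproduce every fit of the smaller one and more.

```python
def fit_one(x1, y):
    """Least-squares slope for the model y ~ b1*x1 (no intercept)."""
    return sum(a * t for a, t in zip(x1, y)) / sum(a * a for a in x1)


def fit_two(x1, x2, y):
    """Least-squares coefficients for y ~ b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(c * c for c in x2)
    s12 = sum(a * c for a, c in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(c * t for c, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # nonzero as long as x1 and x2 are not collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def sum_squared_error(predictions, y):
    """Sum of squared differences between predictions and observations."""
    return sum((p - t) ** 2 for p, t in zip(predictions, y))


# Hypothetical data, chosen only for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0]
y = [2.3, 3.9, 6.4, 8.0, 10.2]

b = fit_one(x1, y)
sse_one = sum_squared_error([b * a for a in x1], y)

b1, b2 = fit_two(x1, x2, y)
sse_two = sum_squared_error([b1 * a + b2 * c for a, c in zip(x1, x2)], y)

print(f"one input:  SSE = {sse_one:.4f}")
print(f"two inputs: SSE = {sse_two:.4f}")
```

The in-sample error of the two-input model can never exceed that of the one-input model. Adding a component has therefore also loosened a constraint, which is the sense in which the two forms of parsimony align here.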

“Plurality should not be posited without necessity”
–Ockham, 13xx

Double descent of prediction error.
Degree-one, degree-three, degree-twenty, and degree-one-thousand polynomial regression fits (magenta; from left to right) to data generated from a degree-three polynomial function (green).
Low prediction error is achieved by both degree-three and degree-one-thousand models.

“All models are wrong, some are useful”
–Box, G., 1976

Assessing a model’s parsimony introduces many nontrivial issues.
For example, it is not clear whether parsimony should be a property of the model itself or of the model relative to the data it describes (e.g., number of parameters is typically a property of the model independent of the data, while the effective number of parameters is a relative property).
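The distinction between a nominal and an effective parameter count can be illustrated with ridge regression, which is used here only as a convenient example and is not discussed in the source. The sketch relies on the standard identity that the effective degrees of freedom of ridge regression with penalty `lam` equal the sum of d_i^2 / (d_i^2 + lam) over the singular values d_i of the design matrix. The singular values below are hypothetical.

```python
def effective_parameters(singular_values, lam):
    """Effective degrees of freedom of ridge regression with penalty lam.

    Uses the identity df(lam) = sum_i d_i^2 / (d_i^2 + lam), where d_i are
    the singular values of the design matrix.
    """
    return sum(d * d / (d * d + lam) for d in singular_values)


# Hypothetical singular values for two design matrices with the same
# nominal parameter count (four columns each).
well_spread = [10.0, 8.0, 6.0, 5.0]
nearly_redundant = [10.0, 8.0, 0.5, 0.1]

for lam in (0.0, 1.0, 100.0):
    a = effective_parameters(well_spread, lam)
    b = effective_parameters(nearly_redundant, lam)
    print(f"lam={lam:>5}: well_spread={a:.2f}, nearly_redundant={b:.2f}")
```

Both models have four parameters on paper, yet with any positive penalty the effective count depends on the data through the singular values: directions in which the inputs barely vary contribute almost nothing.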
Moreover, depending on the context, one may want to evaluate the parsimony of a class of models or of a specific model instance (i.e., a model with its structure and parameter values fixed).
A related concern is when parsimony should be estimated—before or after the model is fitted to data.
Finally, when counting the components or constraints of a model, it is often unclear what exactly counts as a component or constraint. For example, one might have to decide how abstract the components can be, or whether nodes, root causes, or variables constitute independent components.

Another significant challenge involves choosing a way to integrate parsimony into the scientific modeling process. In many formal procedures, parsimony trades off with goodness of fit or it is used as a tie-breaker when choosing between two models that otherwise perform equally well.
More informally, parsimony is often a key consideration when scientists choose a model to start with; for example, they might prefer to start with a minimal causal model that only contains one key variable, and only expand this model if there is sufficient evidence that additional variables are at play. In real-world applications, the evaluation and value of parsimony depend on how it is defined, instantiated, and incorporated into the scientific modeling process.
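One familiar family of formal procedures that trade parsimony against goodness of fit is information criteria. The sketch below uses the standard definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, where k is the number of parameters, n the number of observations, and log L the maximized log-likelihood. The source does not name these criteria; they are chosen here as an illustration, and the log-likelihood values are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: lower is better."""
    return k * math.log(n) - 2 * log_likelihood


def preferred(candidates, n):
    """Return the names of the models with the lowest AIC and the lowest BIC."""
    by_aic = min(candidates, key=lambda m: aic(m["log_likelihood"], m["k"]))
    by_bic = min(candidates, key=lambda m: bic(m["log_likelihood"], m["k"], n))
    return by_aic["name"], by_bic["name"]


# Hypothetical fits: the complex model fits slightly better but uses
# three times as many parameters.
small_gain = [
    {"name": "simple", "k": 2, "log_likelihood": -120.0},
    {"name": "complex", "k": 6, "log_likelihood": -118.5},
]
# Same models, but now the complex model fits much better.
large_gain = [
    {"name": "simple", "k": 2, "log_likelihood": -120.0},
    {"name": "complex", "k": 6, "log_likelihood": -100.0},
]

print(preferred(small_gain, n=100))
print(preferred(large_gain, n=100))
```

In both criteria the parameter count acts as a penalty, so the extra parameters must buy enough improvement in fit to be worth their cost. How steep that penalty should be is one of the choices that shapes the role parsimony plays in a given analysis.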

“Some models are useful,
but how do we know which ones?”
–Bürkner, Scholz & Radev, 2023

Complexity and Parsimony as Complementary Principles

Parsimonious and complex models can be combined in scientific practice, often playing different roles at different stages of the scientific process. Traditionally, new ideas are introduced with models that are deliberately simple; application to data then leads to the discovery of new phenomena, which requires that the original models be adjusted and expanded in particular ways. One example of this is the gradual development of sequential sampling models for speeded response time tasks in cognitive psychology. The initial model was relatively bare-bones, and it was gradually expanded by adding new processes (and associated parameters). Because the simple model could have been expanded in numerous ways, it would have been mere guesswork to propose any particular expansion before the availability of data to provide the proper guidance.

The appropriate role for parsimony in the modeling process depends not only on the modeler’s goals and context, but also on science itself: advances in statistics, computer science, cognitive science, and other fields continue to both refine and challenge our understanding of when, how, and in what ways parsimony facilitates or hinders scientific progress.
Some 700 years after Ockham’s famous invocation of the principle of parsimony, this paper has highlighted that many open questions and unexplored nuances of the principle remain.
We expect the principle of parsimony to both facilitate the evolution of science and evolve alongside it.
