Cognitive distortions are thinking patterns that are strongly associated with internalizing disorders such as depression and anxiety.
Historical traces of cognitive distortions in millions of books published over the course of the last two centuries in English, Spanish, and German show a pronounced “hockey stick” pattern: Over the past two decades, the textual analogs of cognitive distortions surged well above historical levels, including those of World War I and II, after declining or stabilizing for most of the 20th century.
These results point to the possibility that recent socioeconomic changes, new technology, and social media are associated with a surge of cognitive distortions.
The theory underlying cognitive-behavioral therapy (CBT), the gold standard for the treatment of depression and other internalizing disorders, holds that cognitive distortions are associated with internalizing disorders; they reflect negative affectivity and avoidant behavioral patterns in the context of environmental stress. Language is closely intertwined with this dynamic.
Recent research shows that individuals with internalizing disorders express significantly higher levels of cognitive distortions in their language, to the point that the prevalence of these distortions may be used as an index of vulnerability for depression.

The research analyzes the prevalence of a large set of markers of cognitive distortions over the past 125 years in a collection of more than 14 million books published in English, Spanish, and German. Specifically, the longitudinal prevalence of hundreds of short sequences of one to five words (n-grams) is examined. The n-grams, labeled cognitive distortion schemata (CDS), were designed by a team of CBT experts, computational linguists, and bilingual native speakers, and externally validated by a panel of CBT experts, to capture the expression of 12 types of cognitive distortions. The CDS n-grams were designed as short, unambiguous, stand-alone statements that express the core of a particular cognitive distortion type, using highly frequent terms.
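As a rough illustration of what tracking the longitudinal prevalence of an n-gram set can involve, the sketch below computes a yearly prevalence per n-gram, standardizes each series as z-scores, and averages across n-grams. It is not the authors' pipeline. The definition of prevalence as the yearly count divided by the yearly total of same-length n-grams is an assumption, as are all function and parameter names.

```python
from statistics import mean, pstdev


def yearly_prevalence(counts, totals):
    """Prevalence of one n-gram per year.

    counts: {year: occurrences of the n-gram} (hypothetical input)
    totals: {year: total number of n-grams of the same length that year}
    Assumption: prevalence = count / yearly total.
    """
    return {year: counts.get(year, 0) / totals[year] for year in sorted(totals)}


def z_scores(series):
    """Standardize one prevalence series over the whole period."""
    values = list(series.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat series carries no trend information.
        return {year: 0.0 for year in series}
    return {year: (value - mu) / sigma for year, value in series.items()}


def mean_z_by_year(all_series):
    """Average the z-scored series of many n-grams, year by year.

    Assumes every series covers the same years.
    """
    standardized = [z_scores(series) for series in all_series]
    years = sorted(standardized[0])
    return {year: mean(z[year] for z in standardized) for year in years}
```

Standardizing each n-gram before averaging keeps very frequent n-grams from dominating the aggregate curve, which is one plausible reason to report z-scores rather than raw prevalence.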

US English shows declining levels from 1899 to 1978, with minor peaks around 1914 and 1940 (World War I and World War II) and notably 1968. This decline is followed by a surge of CDS prevalence starting in 1978 that continues to 2019.
For Spanish, we find stable levels from 1895 to the early 1980s, at which point CDS prevalence begins trending upward to levels above any previously observed.
German shows stable CDS prevalence levels, with the exception of strong peaks around and after World War I and World War II, until 2007 at which point a sudden surge occurs.

Colored bands indicate 95% confidence intervals of yearly z-score values estimated with 10,000-fold bootstrap of the set of individual CDS time series.
Gray band indicates 95% confidence interval of a null model of 10,000 sets of 241 randomly chosen n-grams with the same length distribution as the English (US) CDS set.
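The two kinds of bands described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a percentile bootstrap is assumed (the text does not say which bootstrap interval was used), the null set is drawn by matching the count of n-grams at each length, and `vocabulary_by_length` is a hypothetical pool of candidate n-grams. All function names are illustrative.

```python
import random
from collections import Counter


def bootstrap_ci(z_series, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap band for the yearly mean z-score.

    z_series: list of equal-length lists, one z-scored series per n-gram.
    Returns (lower, upper), each holding one value per year.
    """
    rng = random.Random(seed)
    n_series, n_years = len(z_series), len(z_series[0])
    means_by_year = [[] for _ in range(n_years)]
    for _ in range(n_resamples):
        # Resample whole time series (not single years) with replacement.
        sample = rng.choices(z_series, k=n_series)
        for t in range(n_years):
            means_by_year[t].append(sum(s[t] for s in sample) / n_series)
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    lower, upper = [], []
    for year_means in means_by_year:
        year_means.sort()
        lower.append(year_means[lo_idx])
        upper.append(year_means[hi_idx])
    return lower, upper


def sample_null_set(cds_ngrams, vocabulary_by_length, rng):
    """Draw one random n-gram set with the same length distribution.

    vocabulary_by_length: {n: list of candidate n-grams with n words}
    (hypothetical input; each list must be at least as large as needed).
    """
    length_counts = Counter(len(ngram.split()) for ngram in cds_ngrams)
    null_set = []
    for n, k in length_counts.items():
        # Sample without replacement within each length class.
        null_set.extend(rng.sample(vocabulary_by_length[n], k))
    return null_set
```

Under these assumptions, the gray band would come from repeating `sample_null_set` 10,000 times, computing the mean z-score series of each random set, and taking the central 95% of those yearly values.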

(A) catastrophizing, (B) dichotomous reasoning, (C) disqualifying the positive, (D) emotional reasoning, (E) fortune telling, (F) labeling and mislabeling, (G) magnification and minimization, (H) mental filtering, (I) mindreading, (J) overgeneralizing, (K) personalizing, and (L) should statements. Nearly all time series show the same hockey-stick pattern of recently surging CDS n-gram prevalence across cognitive distortion types.
The value C indicates the log (base 10) of the total frequency of CDS n-grams in the specific cognitive distortion category as an indication of the order of magnitude of its contribution to our observations.
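The value C can be illustrated with a short sketch. It assumes that "total frequency" means the sum of the frequencies of all CDS n-grams in one category; whether those are raw counts or relative frequencies is not specified here, and the function name is hypothetical.

```python
import math


def category_magnitude(ngram_frequencies):
    """Return C = log10 of the summed frequency of one category's n-grams.

    ngram_frequencies: iterable of positive per-n-gram totals
    (assumed input; the unit follows whatever the totals are measured in).
    """
    total = sum(ngram_frequencies)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    return math.log10(total)
```

For example, a category whose n-grams sum to 1,000 has C = 3, and one whose n-grams sum to 1,000,000 has C = 6, so a difference of 3 in C corresponds to a thousandfold difference in contribution.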
While the differences between the languages are interesting, perhaps the most important point is that the expression of cognitive distortions has increased in all three languages over the past three decades. The result is a distinct hockey-stick pattern: a surge in the prevalence of CDS, which serve as lexical markers of cognitive distortions.
