180 Comments

The term 'canalization', as far as I know, has its origins in genetics...

https://en.wikipedia.org/wiki/Canalisation_(genetics)

Expand full comment

Picky comment: There isn't a 1:1 mapping between HIV and AIDS; there are a very few people with HIV who don't get AIDS, IIRC.

Re canalization: I think there needs to be a theory that addresses how different canals do or do not intersect/connect, depending on circumstances, priors, etc.....

Expand full comment

> I’m now suspicious that factor analysis might be fake

I mean, ya! It's Principal Components Analysis's less-credible cousin.

More seriously, it's a notoriously hard thing to interpret. At some point, it feels like reading tea leaves. If you're generally skeptical of null hypothesis significance testing as a paradigm, these types of analyses should almost be read as negative evidence.

Expand full comment

If over-canalization is caused by or associated with having too few synapses, then under-canalization would probably be associated with autism. Autism is associated with having too many synapses; it's not the root cause, but more like the intermediate or maybe proximate cause. So one thing to look at would be if autism breaks this pattern of every psychiatric condition being correlated with every other.

Expand full comment

What would the shape of this landscape be?

As a biologist, I imagine something akin to Waddington's landscape.

https://onlinelibrary.wiley.com/cms/asset/632f03d0-a547-44f5-9034-4574f281bd2c/mfig002.jpg

Here, stem cells are represented as being on the top and traveling down the canals into different pools of differentiated cells (neurons, skin cells etc). Top level stem cells are very plastic and can become anything they want. Downstream level stem cells might choose between a few different paths. And differentiated cells at the bottom are stuck with their cell type.

For a differentiated cell to change its cell type, we make it climb the canal back up to the "stem cell branching point" and let it choose a different branch. It can't just climb over the walls of the canal; the local gradient slope is too steep.

Perhaps psychedelics let people revert back to the "undifferentiated state" where their psyche has the opportunity to choose a different branch of development?
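
To make that picture concrete, here's a toy sketch (entirely made up: a one-dimensional landscape with two valleys standing in for two differentiated fates, noisy downhill motion standing in for development, and bigger noise standing in - very crudely - for the psychedelic "flattening"):

```python
import numpy as np

def height(x):   # toy Waddington landscape: valleys at x = -1 and x = +1, ridge at 0
    return (x**2 - 1) ** 2

def slope(x):
    return 4 * x * (x**2 - 1)

def settle(x, noise, rng, steps=2000, lr=0.01):
    """Noisy downhill motion: follow the local slope plus a random nudge each step."""
    for _ in range(steps):
        x -= lr * slope(x) + noise * np.sqrt(lr) * rng.normal()
    return x

rng = np.random.default_rng(0)
print("undisturbed development from just past the branch point:",
      round(settle(0.05, noise=0.0, rng=rng), 2))

for noise in (0.3, 1.5):   # small everyday perturbations vs. a big shake-up
    ends = [settle(-1.0, noise, rng) for _ in range(100)]
    frac = np.mean([e > 0 for e in ends])
    print(f"starting in the left valley, noise={noise}: "
          f"{frac:.0%} of runs end up in the other valley")
```

With mild noise the cell stays in its valley; with enough noise the walls stop mattering, which is roughly the "revert to an undifferentiated state" intuition.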

Expand full comment

It’s hard for me to even grasp the paradigm that autism could be an underfitting issue; it seems obvious that it’s more like “I didn’t understand you expected me to do X because my standards for X are these many overfitted criteria that must be met to make things very clear.” Can anyone paint a picture like this of what autism from underfitting would even look like?

Expand full comment

I personally am a depressed dissociated autistic with ADHD, and I know lots of other people with disorders from multiple of the model's quadrants.

If the deal with that is "your neurons are just sort of generally bad at forming accurate beliefs", you'd expect these people to be intellectually disabled or entirely out of touch with reality, but all of the people I'm thinking about are high IQ, lucid, and able to manage their own affairs.

This applies even if we just think about one disorder at a time - depressed people may be stuck in the belief that everything's hopeless, but they don't seem to get stuck in all their beliefs equally across the board.

It may well be that depression represents a canal having formed, but why that particular canal is probably complicated and interesting and won't fit into a cute small model like this.

Expand full comment

Why is it surprising that there would be a general factor that correlates with all physical and mental pathology? Shouldn't we expect factors like general health, mutational load, diet, local air quality, etc. to act as a composite factor that just makes everything about your body work better or worse?

Expand full comment

Here is your insane schizo take for the day: all of these things correspond to computational parameters, but those parameters are not within individual human brains, they are within the simulation that is running us. Whatever intelligence is simulating us suffers from canalization in respect to mapping certain clusters of traits together, then constructs the simulation in such a way that those traits have a common factor. The probability space of physically lawful brain models is probably excessively big relative to the probability space of originally occurring organic human brains in original meatspace. We are victims of the ongoing epistemic collapse and AI has not, or not yet mediated it.

Expand full comment

It seems like the relevant comparison isn't HIV/AIDS to ADHD or depression, it's "flu-like symptoms" or "rash with small, itchy bumps". Maybe I'm confused about how diagnosis for, say, anxiety works, but my understanding is that it's mostly matching a nexus of symptoms. We're aware of some discrete causes for these symptom-nexuses in some cases (e.g. certain types of brain injury) and aware that we're missing others. There are obviously some (at least partially) physical illnesses whose diagnosis is still basically "you have this nexus of mysterious symptoms", but that's not most people's every-day experience with physical ailments.

In this thinking categories like "depression" seem very useful for symptomatic treatment (in the same way that if I have a fever, aches, and stuffiness there are dedicated meds for treating "cold and flu" symptoms) but not very useful for specific causal diagnosis (because there's so many discrete things that can cause effectively identical symptoms).

And yeah, that common factor conclusion seems...suspicious.

Expand full comment

Falling into the valley reminded me of the concept of learned helplessness.

https://en.wikipedia.org/wiki/Learned_helplessness

Expand full comment

Is there a good reference for a non-medical-professional on the research about a general factor for psychopathology? It strikes me as suspicious on its face, but I'd like to understand the general structure of the research that arrived there.

Expand full comment

Canalization in say, the way Stuart Kauffman uses it in the context of patterns (attractors) in a dynamical system refers to a situation where "most possible inputs will produce a small number of effects". These experiments relate to the idea of the fitness landscape which itself is quite similar to the idea of an energy landscape seen upside down. The fitness landscape concept also shows what the dimensions and the height of the space represent: the n dimensions represent n genes and moving along one genetic axis, height represents the outcome in fitness of all possible combinations with all other genes (or associated phenotypic expression). Height is strength of the effect, or fitness. For mental states (instead of genes), the height would be "strength of attractor" across all possible combinations of different mental states. The idea is that some mental states are strong attractors - stable mental conditions, some healthy, some less so.

There are two ways to produce strong (stable) attractors in this model: 1. to have few interactions between genes / states (if almost everything interacts with almost everything else, then there is no stability - a small change somewhere will instantly lead to a different attractor). 2. Canalization: there may be a lot of interactions between genes / states, but most of these still lead to the same outputs.

So canalization helps with stability - without canalization, there would be many weak attractors in your dynamical system, if all genes / states interact randomly with nearly all others. With canalization, there are fewer and stronger attractors.

Another way of thinking about this is how modularity cuts down on individual interactions between components. When modularity is present, there is less need for canalizing inputs (or perhaps one could say that modularity is implicit canalization).
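
For anyone who wants to poke at this, here's a rough toy version of the kind of Boolean-network experiment Kauffman describes (my own sketch, not taken from his work: a tiny synchronously updated network, K = 3 inputs per node, and "canalizing" meaning that one input at one value forces the node's output). Results vary with the random seed, but canalizing rules tend to give shorter, more orderly attractor cycles.

```python
import itertools, random

random.seed(0)
N, K = 10, 3                              # 10 nodes, each reading 3 random inputs

def random_rule():
    # an arbitrary Boolean function of K inputs: one output bit per input pattern
    return [random.randint(0, 1) for _ in range(2 ** K)]

def canalizing_rule():
    # canalizing: some one input, at some one value, forces the output
    rule = random_rule()
    j, v, forced = random.randrange(K), random.randint(0, 1), random.randint(0, 1)
    for idx, pattern in enumerate(itertools.product([0, 1], repeat=K)):
        if pattern[j] == v:
            rule[idx] = forced
    return rule

def build_network(make_rule):
    wiring = [random.sample(range(N), K) for _ in range(N)]
    rules = [make_rule() for _ in range(N)]
    def step(state):
        out = []
        for i in range(N):
            bits = [state[j] for j in wiring[i]]
            idx = sum(b << (K - 1 - k) for k, b in enumerate(bits))
            out.append(rules[i][idx])
        return tuple(out)
    return step

def attractors(step):
    succ = {s: step(s) for s in itertools.product([0, 1], repeat=N)}
    found = {}                            # canonical cycle state -> cycle length
    for start in succ:
        seen, s = set(), start
        while s not in seen:              # walk forward until a state repeats
            seen.add(s)
            s = succ[s]
        cycle, t = [s], succ[s]           # s is on the attractor; trace the cycle
        while t != s:
            cycle.append(t)
            t = succ[t]
        found[min(cycle)] = len(cycle)
    return found

for name, rule in [("arbitrary rules", random_rule), ("canalizing rules", canalizing_rule)]:
    att = attractors(build_network(rule))
    print(f"{name:16s}: {len(att)} attractors, longest cycle {max(att.values())}")
```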

Expand full comment

I thought from the description that 'canalisation' would only happen during 'training', as it affects the landscape rather than the 'cursor'. But then the paper discusses canalisation during inference - how can this even happen (without it being training)? Am I misunderstanding?

Expand full comment
Jun 14, 2023·edited Jun 14, 2023

> Once when I was on some research chemicals (for laboratory use only!) my train of thought got stuck in a loop.

Oh man, this happened to me once due to an edible. It was close to the worst experience of my life, perhaps 2nd worst. It was most certainly horrific. I was thinking of the term harrowing at the time. And for me, I didn't come out of it for hours, and the trip itself lasted over a day, though not all of it was as bad as that loop. And even though it only lasted hours, it felt like millennia. Time may as well have stopped, and sometimes I think there's some version of me still stuck back there in time, since time had no meaning to me in that moment. Only the constant cycling of thoughts and escalation in fear, until the fear became its own physical sensation, cycling itself. Don't do drugs, kids.

Expand full comment
Jun 14, 2023·edited Jun 14, 2023

-- OK, so "canalization" is "habits", or "getting stuck in a rut", or "learning something until it becomes a reflex", or "I could do that in my sleep", or "practice a punch 10000 times". Don't weightlifters also talk about this, in relation to why it's bad to train to failure?

-- I'm skeptical of a clean distinction between inference and learning, when it comes to people. I realize that this is acknowledged, but it doesn't seem grappled with. Also, we do also have things like "moods" - I'm more likely to react to a stimulus one way when I'm in a contemplative mood, and another when I'm in an angry mood. It feels awkward to speculate about another person like this, but my guess about the bad trip was that it put the brain into a state where there was a loop, and then later the state and therefore loop went away, and to the degree Scott came out of it a new person with new perspectives on life, the change was "learning", and to the degree Scott was the same old person, the change was "mood". I've not had any such experiences myself, but some people seem to have reported each type of experience. **shrug**

-- The discussion of Good Old Neural Nets seems solid. It should be noted that the computer versions were originally an attempt at mimicking what the brain does, and how neurons and synapses work. Deep learning dropped some of the superficial similarities to the brain, but may have introduced deeper similarities. (Edit: that was an awkward phrasing.) Is it closer? I dunno, but it seems to work better.

-- Generally, with Good Old Neural Networks, overfitting and underfitting were due to the size of your net. Say you had a training set of 50 pictures of dogs and 50 pictures of non-dogs. If your net was too big, it would overfit by, for example, storing each of the 100 pictures individually, giving a perfect answer on each of them, and giving random noise for other input. If your net was too small, it would underfit by, for example, identifying a pixel that was one color in 51% of the dogs and a different color in 51% of the non-dogs, and basing all decisions off of that one pixel. Those are extreme examples, but they should give the idea.
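
A quick hedged sketch of that size effect, with toy 2-D data standing in for the dog pictures (exact numbers vary run to run): a network that's far too big can nearly memorize the 100 training points (training score near perfect, held-out score worse), while one that's far too small can't capture the boundary at all.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# "50 dogs and 50 non-dogs": 100 noisy training points, plus a big held-out test set
X_train, y_train = make_moons(n_samples=100, noise=0.3, random_state=0)
X_test, y_test = make_moons(n_samples=2000, noise=0.3, random_state=1)

for label, hidden in [("too small (underfits)", (1,)),
                      ("moderate", (16,)),
                      ("way too big (overfits)", (512, 512))]:
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=5000, random_state=0)
    net.fit(X_train, y_train)
    print(f"{label:23s} train acc {net.score(X_train, y_train):.2f}, "
          f"test acc {net.score(X_test, y_test):.2f}")
```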

-- Overplasticity seems like what's going on with the fine-tuning and RLHF turning LLMs from good predictors into wishy-washy mediocrities that will at least never say anything bad. But I don't know much about how that works in practice, with the deeper nets of today.

> they say that overfitting is too much canalization during inference, and underfitting is too little canalization during inference. Likewise, over-plasticity is too much canalization during training, and over-stability is too little canalization during training.

-- I really don't know what to make of that. Both halves sort of make sense, but in different ways that don't seem compatible? I feel like "canalization" is being used to mean too many things at once.

-- The 4-part model is interesting. I do kinda buy the top left square: those things fit together, and feel intuitively to me like being stuck in a rut. I don't know enough about the other conditions to say anything useful.

> Canal Retentiveness

**applause**

> This question led me to wonder if there was a general factor of physical disease

'd' is clearly derived from Con, just as 'p' is derived from Wis, and the combination is related to the point buy. Next!

Expand full comment

> Once when I was on some research chemicals (for laboratory use only!) my train of thought got stuck in a loop.

This is a not-uncommon effect of psychedelics, and I feel like it must be analogous to the thing AIs sometimes do where they loop on the same sequence of words for a bit.

Expand full comment

I just recently wrote up my thoughts on how such computational models of the mind relate to qualia and self-awareness

https://sigil.substack.com/p/you-are-a-computer-and-no-thats-not

Expand full comment
Jun 14, 2023·edited Jun 14, 2023

Local nitpick, globally important, error correction. If my understanding of current LLM design is correct then this statement is incorrect:

> But it changes which weights are active right now. If the AI has a big context window, it might change which weights are active for the next few questions, or the next few minutes, or however the AI works.

The AI doesn't change its weights while it is writing a lengthy output; I believe what is done is that the previous output is fed back in as an input for the following token. At the beginning the input consists of the user input, which results in a single-token output. The user input plus that one-token output are then fed back into the AI, which generates another single token. This process repeats until the AI decides it is "done".

The "size of the context window" is, IIUC, how many tokens the AI will accept as input. This means if you have a long conversation with an AI, eventually you can no longer feedback all of the previous conversation and you have to cut some stuff out. The naive solution to this culling is to just cull the oldest. This is why you can get the AI to be a lot smarter by having it periodically summarize the conversation, because it functions as a form of conversational compression so you can fit more "information" into the limited amount of input space (context window).

So weights don't change with the context, but which neurons get activated and which don't *will* change because the input changes with each new token added to the input.
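
For concreteness, here's roughly the loop I have in mind, using an off-the-shelf Hugging Face model (gpt2 purely as a stand-in; real serving code also caches past activations rather than recomputing them, which this sketch skips). If my understanding is right, the weights never change inside this loop - only the input grows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tok("The brain is like", return_tensors="pt").input_ids
for _ in range(20):                       # generate 20 tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits  # frozen weights, fresh activations
    next_id = logits[0, -1].argmax()      # greedy: take the most likely next token
    # feed the output back in: the "context" is just the growing input
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tok.decode(input_ids[0]))
```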

I say this is a "local nitpick" and "globally important" because this error doesn't change the meaning/message of this article, but it may change the way you think about current AIs slightly!

If I'm wrong I am hoping someone corrects me!

Expand full comment

«I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”»

These things are well-defined in the context of a Hopfield network. Each dimension is simply the state of each neuron, and the "height" is defined by a Hamiltonian - broadly it can be thought of as the mismatch between the network state and what each individual connection weight is telling the network to do.

As far as I can tell, Friston’s schtick is basically just using Hopfield-style networks as a fuzzy metaphor for all of cognition.

http://www.scholarpedia.org/article/Hopfield_network
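
For anyone curious, a minimal numpy sketch of that picture (one stored pattern, Hebbian weights, asynchronous sign updates - the sizes are arbitrary): start the network at a corrupted version of the memory and it rolls down the energy landscape back to the stored state.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                       # one dimension of the landscape per neuron
memory = rng.choice([-1, 1], size=n)         # the pattern we want to store

W = np.outer(memory, memory).astype(float)   # Hebbian weights: memory becomes a minimum
np.fill_diagonal(W, 0)

def energy(s):                               # the "height": the Hopfield Hamiltonian
    return -0.5 * s @ W @ s

state = memory.copy()
state[rng.choice(n, size=20, replace=False)] *= -1   # corrupt 20 of the 64 neurons

print("corrupted: energy", energy(state), " overlap with memory", state @ memory)
for _ in range(5):                           # asynchronous updates in random order
    for i in rng.permutation(n):
        state[i] = 1 if W[i] @ state >= 0 else -1
print("settled:   energy", energy(state), " overlap with memory", state @ memory)
```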

Expand full comment

Even if there is a "general factor of psychopathology," canalization or otherwise, I wouldn't interpret it as "the cause of every disorder in the DSM," but more like "a cause of many of the common disorders in the DSM."

This is a weaker and more plausible claim.

Expand full comment

Speaking of the general factor of intelligence, Gwern linked a bit ago to this interesting paper: https://gwern.net/doc/iq/2006-vandermaas.pdf which posits that the positive manifold (every mental skill correlates positively with every other mental skill) may be due to a developmental process whereby, during development, each mental skill boosts the development of the others.

Apparently the authors of this paper later realized the model has some problems? Idk, I'm not up to date on the literature here at all. Still, thought I should point it out as interesting.

Expand full comment

"Likewise, it’s interesting to see autism and schizophrenia in opposite quadrants, given the diametrical model of the differences between them. But I’ve since retreated from the diametrical model in favor of believing they’re alike on one dimension and opposite on another - see eg the figure above, which shows a positive genetic correlation between them."

This actually does match the paper - it puts schizophrenia in the underfitting/catastrophic forgetting quadrant. You seem to have swapped it with borderline when you recreated the figure.

Expand full comment

'Recent research has suggested a similar “general factor of psychopathology”. All mental illnesses are correlated; people with depression also tend to have more anxiety, psychosis, attention problems, etc' Is this research based on population surveys or the prevalence of diagnosis? That is to say, how confident are we that the underlying factor isn't just having talked to someone capable of officially diagnosing mental illness?

Expand full comment

I don't get the... - I can't find the right word, defensiveness? - about comparing brains to neural networks. Neural networks are distinctly different from clockwork or switchboards or computers, in that they are like brains in a very non-metaphorical sense; they were modeled after the building blocks of the brain and are bound to have many of the same features. From what I understand, they were used to experimentally model and test hypotheses about brain functioning long before they found any useful real-world applications, and the two fields studying them are very much in contact with each other.

Of course, there's also a very real sense in which LLM advocates overblow the similarity in a naive hope that sufficiently trained LLMs will magically start exhibiting all of the brains' incredible complexity, but I don't think this is worth worrying about in this particular instance. The concepts in question seem to be fundamental, basic features of neural networks, biological and artificial alike, and I'm finding it safe - natural, even - to assume that any similarities between the two are real, meaningful, and informative, and directly translate to each other.

Expand full comment

The idea of active inference has always seemed backwards to me. My intuition says that it should be that to trigger an action you need to form a belief that that action *won't* happen. Like with Lenz's law: you move a magnet by putting current through a wire in the opposite direction to how it would flow if the magnet moved. Or prediction markets: you pay for Putin to be assassinated by betting that he'll survive to the end of the year.

Expand full comment

I don’t know who I’m writing this for. Probably mostly myself. But I had a lot of trouble wrapping my head around these metaphors and visualizations, until I realized it was the low resolution and complexity that tripped me up.

I’m not sure my takeaway is the right one, but I felt it helped to replace the single ball on the landscape with a rainfall of inputs and impulses (such is life). Also, I replaced the two-by-two matrix with something without strict binaries (surely representing plasticity and fitting as binaries, with no optimal level, invites misinterpretations?).

Imagining the rain falling on the landscape of the mind – originally shaped by one’s genes and circumstances – it’s clear how the rain gathers in pools and streams. Over time, creating rivers and canyons, lakes and large deltas.

The dimensions don’t have to represent anything other than the facts of their own existence, unique characteristics of our personality – just as peaks and valleys on the real landscape don’t “represent” anything. You could probably say that they represent traits or neurons or something else, depending on your lens, but you risk getting the resolution wrong again. The important part is that it’s the “shape” of where the environment meets the person.

Everything is a constant interplay between the genetics and circumstances that determine your day-1 landscape, and the experiences, habits, information, treatment and environment that make up the weather.

Fitting is a consequence of the topography and size of the metaphorical drainage basins in your landscape; plasticity is represented by the local soil and vegetation and how robust it is to erosion.

And so, I can see how the weather could affect a landscape predisposed for, say, depression, and turn it from a melancholy trickle in a creek into a veritable Blue Nile, which can cause utter devastation.

There’s always a risk of taking a model or metaphor too far – the map is never the territory – but I can see how someone might want to solve their flooding problems by constructing a mental dam, how others might try to change the weather, how someone might try to divert a river, and how any of those, mismanaged, might have serious mental health consequences.

Like evolution by natural selection, or actual erosion, or real weather, it all seems like one of those processes that are based on innumerable simple inputs, following very fundamental principles, to produce infinitely complex outcomes with all kinds of feedback mechanisms.

I don’t know how much I have perverted the original model here, but understood like this, I can see how it could be a good and useful metaphor. Not least since we all have a sense of how hard it is to change the course of a river.

On the other hand, as models go, it doesn’t seem particularly new or trendy, so I’m probably missing or misunderstanding something.

Is it our rapidly evolving understanding of the “geology” in the metaphor that gives it new and deeper relevance and usefulness as a model?

Expand full comment

> I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?” My best guess is something like “some kind of artificial space representing all possible thoughts, analogous to thingspace or ML latent spaces”, and “free energy”

Nothing specific would make sense in 3d anyway, I imagine.

"To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it." ~Geoffrey Hinton

Expand full comment

> my train of thought got stuck in a loop

Are loops a scientific thing? I strongly suspect they explain what BDSM folk call "subspace".

Expand full comment

I wonder how many times the "Waddington landscape" figure has been reprinted.

Expand full comment

Interestingly, Geoffrey Hinton (often referred to in the literature as one of the 'godfathers' of deep learning) wrote a highly cited paper in the 1980s called 'How learning can guide evolution' (available online here: https://icts.res.in/sites/default/files/How%20Learning%20Can%20Guide%20Evolution%20-1987.pdf) about canalisation (in the evolutionary sense).

I tried to get an evolutionary ecologist friend of mine to collaborate on a paper on whether learning (at the level of the individual organism) could lead to divergent canalisation (which might conceivably eventually lead to speciation), but I couldn't sustain his interest.

Expand full comment

"I can’t answer the question “what do the dimensions of the space represent?”"

Thingspace is really complicated. But fwiw, some researchers created a neat little toy space, the bird space, where the two main dimensions are the length of the neck and the length of the legs. You can look at some moderately funny pictures in the discussion at "Grid Cells for Conceptual Spaces?"

https://www.sciencedirect.com/science/article/pii/S0896627316307073

(You may need to use sci-hub to access it if that's legal for you.)

Expand full comment
Jun 14, 2023·edited Jun 14, 2023

I was better able to understand the metaphor when I thought of type A (inference) canalization as the parameter of how dense, round, and smooth the cursor is - i.e., how strongly it wants to roll downhill. A bowling ball would over-canalize, a feather would under-canalize, and maybe something like a d12 would be a good balance. Different aspects of the cursor could correspond to different aspects of inference, for example weight-to-surface-area ratio might correspond with being easily distracted (under-canalized during inference) because a breeze of random noise could blow you around.

Then, type B (learning) canalization is about the material of which the landscape is made. Sometimes it's putty and deforms at the slightest input, sometimes it's steel and resists change. Sometimes it's, what? Literally plastic? Maybe something like warm, soft plastic that resists light/spurious forces but responds predictably to stronger forces. So, literally plasticity.

As for people historically poo-pooing metaphors about cognition, and being confused by the canal metaphor, it's important to remember that a metaphor is not the truth. To use a metametaphor: imagine your current understanding is at position A, and the truth is at B. A direct line of reasoning from A to B is difficult to cross because it's, I don't know, full of mountains and dense forests. But you notice a path that seems to kind of head in the direction of B. That path is a metaphor. It can get you closer to B, some other A' where the metametaphorical path ends. Then, as you explore A', you might find another metaphor - a path that will get you even closer to B, some other A'' where you can repeat the process. Eventually you arrive at B, though probably you had to do a little bit of metametaphorical bushwhacking for the last leg. Then, if B is a useful concept space, you open up a gold mine, and if you want to share the wealth you might build an explanatory road from the original A to B in the form of a blog post or sequence. If the gold mine is big enough, people will write whole textbooks and turn your road into a superhighway. I like metaphors.

Expand full comment

Reminds me of this article that claims that the reason for critical periods in human development is the ability to forget. (e.g. kids learn new languages more easily than adults because they can change their landscape more easily)

https://www.businessinsider.com.au/learning-and-aging-2013-1

It's basically an explore/exploit trade-off where adult neurons are in exploit mode and cannot unlearn their existing pathways, whereas children are in explore mode and are changing their landscape.

Expand full comment

My first thought is that while I love the "fitness landscape" analogy, it just seems so simple in the context of thoughts. Surely it is a multidimensional landscape, and many landscapes, not some single unified one. I even think in some very real sense there are multiple "balls" rolling around, shrinking and growing in mass/speed/conscious salience/other properties.

Which is perhaps a different enough situation that it means the whole analogy isn't that helpful except in some specific cases.

>This doesn’t mean canalization is necessarily bad. Having habits/priors/tendencies is useful; without them you could never learn to edge-detect or walk or do anything at all. But go too far and you get . . . well, the authors suggest you get an increased tendency towards every psychiatric disease.

This strikes me as somewhat interesting, but not very different from where we were. I had a big struggle with depression in my teens and early 20s, for say 10 years. And I suspect if my life situation was bad I would still struggle with it. And to some extent the depression/suicidality at least partially arises out of things which otherwise are generally quite useful habits.

Tendency to dwell on and overanalyze things. Ability to focus. Lack of concern for others' feelings. Which is all great if you are doing forensic accounting on a big mess of a project.

Less helpful if you are doing a forensic accounting of your personal failings/injustices.

Expand full comment
Jun 14, 2023·edited Jun 15, 2023

Another great write-up! Especially like the history bit: clockwork-switchboard-computer; of course all mechanical metaphors fall short because we are organic.

"The calculating machine works results which approach nearer to thought than any thing done by animals, but it does nothing which enables us to say it has any will, as animals have.”

--Blaise Pascal, Collected Works of Blaise Pascal, c.1669

Also, regarding: "sort of person who repeats “I hate myself, I hate myself”" Having read E. Easwaran Passage Meditation/Conquest of Mind and now Philokalia: The Bible of Orthodox Spirituality[A. M. Coniaris] - which are [all] surprisingly(to me) pluralist I'd suggest that the Mantram/Jesus Prayer would eventually if not quickly the increasing case load of our mental health professionals.

[Like giving a dog a chew toy so he stops tormenting the cat!]

[E2: like giving Tim Urban's primitive a rubber ball in exchange for the crude torch we all have. Better to facilitate playing well with others.]

Expand full comment

Hey guys, here again to explain how this connects to Freud. In his paper Beyond the Pleasure Principle, Freud describes two opposing drives, the life drive and death drive. The latter has the worst name of all time, leading to immense persistent misinterpretation -- death drive can be thought of simply as the _compulsion to repeat_ (named because the organism _wants to die in its own way_, which entails surviving until it is _ready_ to die. Stupid, I know). Life drive, on the other hand, is akin to seeking novelty, named as such because he tied it back into the drive toward the creation of new life, sex, creativity, etc.

Now, without getting too deep into the details of his theory, does the idea of "compulsion to repeat" not harmonize with "canalization" as posed above? Indeed, Freud recognized repetition compulsion as the basis of neurosis (~= all the mental disorders except dark triad and psychosis and MAYBE schizophrenia altho it requires a deeper epistemic claim too, going off a Lacanian reading). We arrive at the same conclusion as Scott: that the "symptom" (such as the thought loop Scott described) is ultimately a result of compulsion to repeat, in this case in a sort of meta sense insofar as the recognition of the compulsion to repeat itself became the object of repetition / symptom itself.

Ultimately the entire theory is rooted in a strange thermodynamic reading of biological processes ("the psyche is like an amoeba?"), but turns out that cybernetics is cybernetics regardless of digital or analog.

Expand full comment

Statements like this make me lose my already tentative grip on what canalization is supposed to be: "overfitting is too much canalization during inference, and underfitting is too little canalization during inference. Likewise, over-plasticity is too much canalization during training, and over-stability is too little canalization during training."

Wouldn't it be better to say, to borrow Scott's ball/cursor analogy (where, I take it, the ball moving = inference), that overfitting is the ball having too little mass (too little momentum, and so following the canals too rigidly), underfitting is the ball having too much mass (too much momentum, and so not following the canals enough), over-plasticity creates too much canalization (during training), and over-stability creates too little canalization (during training)?

Expand full comment

Well, the illustration does fit well with how my brain feels: overheated in spots and full of holes.

As to the rest of it - mmmm. They do seem to be a little too enthusiastic about "This fits in Quadrant A and that fits in Quadrant B", meanwhile I'm going "So it seems to me that I have a bit from A, a bit from B and I'll take something from C as well, thanks".

Expand full comment

Lots of good observations.

AI: the problem with AI is that it is fundamentally limited in training scope. Human consciousness is unquestionably built on animal consciousness, in turn built on various stages of living-being consciousness. Whatever the means or method - these levels of consciousness have been honed over billions of years in an absolutely objective environment of survival.

AIs - neural nets or whatever - have none of these characteristics. They are tuned by their designers both by design and by training data. Their existence is artificial as is their "survival".

As someone who has worked with really brilliant people in engineering - the one thing I can say is that it doesn't matter how smart anyone is - they NEVER accurately anticipate what reality brings.

Neural net models are like climate models: useful but any expectation of reality from them is foolish to the extreme.

Expand full comment

Scott, just to be clear, the GenomicSEM results for the p factor do not support it: they suggest it's not real. There is also only negative support for the d factor. Both are the results of researchers assuming causality when the causal evidence is absent or negative. Please read Franić's work in which she described the meaning of common vs independent pathway models: https://psycnet.apa.org/record/2013-24385-001.

The part that matters most is on page 409: "Barring cases of model equivalence... a latent variable model cannot hold unless the corresponding common pathway model holds." You can use twin or family data or GenomicSEM to assess whether there is a common pathway model. If there is, the model will be consistent with the common causation implied by the phenotypic factor model.

This model holds for intelligence. It does not hold for general psychopathology, personality, or disease.

Expand full comment
Jun 14, 2023·edited Jun 14, 2023

"Every other computational neuroscientist thinks of autism as the classic disorder of over-fitting."

Perhaps it is both. Maybe the underfitted plasticity of information that the autistic brain can interpret demands an overfitted style of cognition and behavior. This would explain everything from mundane autistic behaviors, like stress resulting from routine-breakage, to more severe sensory overload.

EDIT: I just read that autism-as-dimensionality post and he beat me to it:

> We can expect many of the behavioral and cognitive symptoms of autism to be compensatory attempts to reduce network dimensionality so as to allow structures to form. The higher the dimensionality and lower the default canalization, the more necessary extreme measures will be (e.g. “stimming”). “Autistic behaviors” are attempts at cobbling together a working navigation strategy while lacking functional pretrained pieces, while operating in a dimensionality generally hostile to stability. Behavior gets built out of stable motifs, and instability somewhere requires compensatory stability elsewhere.

Expand full comment
Jun 14, 2023·edited Jun 14, 2023

There's a parameter called temperature which you can vary when doing inference on an LLM (or any other probability model). Using the analogy of a ball moving across an energy landscape, you could think of temperature as contributing some "pep" to the ball, randomly nudging it in different directions, such that sometimes it can climb up out of a valley. As temperature goes to infinity, the limiting behaviour is for the ball to move in a way that completely disregards the contours of the energy landscape - for an LLM, this means emitting a stream of words chosen uniformly at random, with no regard to the particulars of the model's learned weights. A lower temperature leads to more predictable behaviour. The limiting behaviour at temperature 0 is to always emit whichever token the model assigns the highest probability.
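
If you want to see the knob in isolation, here's a tiny self-contained sketch of temperature sampling over a made-up four-token vocabulary (no LLM involved, just the softmax arithmetic). At T=0 it always picks the top token; at very high T the choice is close to uniform.

```python
import numpy as np

def sample(logits, temperature, rng):
    """Pick one token index from unnormalized logits at a given temperature."""
    if temperature == 0:
        return int(np.argmax(logits))            # limiting case: always the top token
    z = np.asarray(logits, dtype=float) / temperature
    p = np.exp(z - z.max())                      # subtract max for numerical stability
    return int(rng.choice(len(p), p=p / p.sum()))

rng = np.random.default_rng(0)
logits = [2.0, 1.5, 0.2, -1.0]                   # a made-up four-token vocabulary
for T in (0, 0.1, 1.0, 100.0):
    picks = [sample(logits, T, rng) for _ in range(1000)]
    freq = np.bincount(picks, minlength=4) / 1000
    print(f"T={T:>5}: token frequencies {np.round(freq, 2)}")
```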

For practical applications of language models, low-ish temperatures are preferred, as higher probability sequences of words are generally judged by humans to be qualitatively superior. However, as you decrease the temperature, you increase the risk of a curious pathological behaviour. At low temperatures, LLMs tend to get stuck in loops, reminiscent of Scott's experience on research chemicals. For example, if I sample from the 7 billion parameter LLaMA model conditioned on the prefix "Roses are red," and set the temperature to 0.1, it gives the following output:

Roses are red, violets are blue, I'm so glad I'm not you. I'm so glad I'm not you. I'm so glad I'm not you. I'm so glad I'm not you. I'm so glad I'm not you.

Most LLMs you interact with, such as ChatGPT, have some heuristics strapped on at inference time (such as a "repetition penalty") to avoid falling into these thought loops.

Expand full comment

I can't shake a suspicion that we're just playing out the "faces in the clouds" thing with each of these analogies for brains. Anthropomorphizing computers, more than describing our actual brain function.

The kinds of "algorithms" that can "run" on such different hardware feels like it *must* have important effects on the downstream output/behavior. Some evidence: with the computer analogy, we spend a lot of effort using computer terminology to describe how *unlike* normal computers brains are: super-parallel, non-digital(!!!), decentralized, enormous numbers of input/output channels, oddly small context window (working memory), etc.

Maybe the chain of metaphors is mostly-wrong the whole time, but just achieves better fit with more terminology to stretch the metaphor? In other words, over-fitting the metaphor itself?

Typo: You wanted "bytes" instead of "bites".

Expand full comment

If all mental illnesses are correlated, can we calculate a generic "Crazy Quotient", analogous to IQ?

I am curious what the entire population's CQ curve would look like. Is it one-sided, like most people are normal or mostly-normal, and then there is the tail of increasingly crazy people? Or rather is it two-sided, with normies in the middle, crazy people on one extreme, and... whatever is the opposite to crazy... on the other extreme?

If it is the latter, what do the extremely "anti-crazy" look like? Are they exceptionally sane people, super successful in real life? Something like "high IQ and without any mental burden"? Or is the opposite of crazy also somehow... weird? Such as, extremely *boring*?

(I thought about calling it Sane Quotient, rather than Crazy Quotient; just like we have Intelligence Quotient rather than Stupidity Quotient. But that's actually the question: is "sane" the opposite extreme to "crazy", or is "sane" in the middle and the opposite is something else?)

Expand full comment

Scott, a research chemical sent me into a very similar loop. My guess is that the loop started when I realized that I could not remember some actual thing. But the loop itself was, “There was a moment of trying to remember something. Wait, this is that moment. No, there *was* a moment of trying to remember something. Wait, this *is* this moment. No there *was* a moment of trying to remember something . . .” I was sitting on the end of a couch with my face in my hands, and the person with me asking what was up, and all I could say was, “It’s a trying to remember.” Yeah, that was hideous.

Expand full comment

Wouldn't a maximally underfitted model classify images as 'dogs' and 'not dogs' randomly, in the proportion that these have in the training data?

Expand full comment

What does this say about comorbid illnesses in different quadrants, e.g., ASD and MDD? It's less bad than simply saying that ASD is too much bottom-up processing and MDD is too much top-down prediction, despite the ASD population having high rates of MDD, but your summary didn't explain whether comorbidities like this were addressed, unless I missed something.

Expand full comment
Jun 15, 2023·edited Jun 15, 2023

"Embarrassingly for someone who’s been following these theories for years, I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?” My best guess is something like “some kind of artificial space representing all possible thoughts, analogous to thingspace or ML latent spaces”, and “free energy”, but if you ask me to explain further, I mostly can’t. Unless I missed it, the canalization paper mostly doesn’t explain this either."

Actually I think it makes sense for you to feel confused about this. The problem is the landscape illustrations, not your head. These landscapes with funnels in them *look* like plots of equations with 3 variables. In fact the color gradations in the funnels suggest that a 4th dimension is also being represented, via color. But the color changes are just ways of dramatizing the differing depth of the funnels. There's a color spectrum that runs from green through yellow, orange and pink to purple, and that maps onto depth. Green is "sea level," purple is the depth the deepest funnels reach. So the tips of the shallower funnels are orange or yellow.

OK, so there's no color dimension, color just = depth, i.e. place on the z-axis. As for the x- and y-axes, they don't represent a damn thing. The piece of land as a whole represents all the possible things somebody can do or think, but the north-south and east-west dimensions don't mean anything here.

As for the z dimension -- well, there are 2 z-related things that matter: Slope, which mathematically is how much altitude you lose over a given horizontal distance, represents strength of attraction: The steeper it is, the harder it is not to slide down. Depth is how far down someone is. I guess how far below sea level someone is represents how far from being neutral or average they are about whatever the funnel they are in represents. Note that psychopathology need not be involved in someone's sliding down to the bottom of a funnel. Let's say you're traveling with someone who has a deep interest in a certain Danish philosopher, and you happen to discover that you're in the town where he was born, and his home has been converted to a museum, and all his books are there and people are welcome to sit and read them. Your friend is going to slide down the slope to that museum really fast, right? And it will be hard for him to tear himself away when it's time to head on to Sweden.

The landscape also has built into it some assumptions, and these are not identified, and easy to overlook. I think, but am not sure, that that is not altogether fair. (1) All the low areas are represented by funnels, i.e. they are smallish spots on the horizontal landscape where the land suddenly goes downhill very steeply on each side. There are of course other kinds of low areas possible in landscapes. For instance there could be a big valley with a shallow slope on most or all sides. If its circumference is large enough, it can even be as deep as the deepest funnels on the landscape we are given. And in fact most of us have things like valleys in our lives: For instance somebody who likes a lot of outdoor activities does. They’re attracted to a large variety of things of that kind: outdoor photography, kayaking, camping etc etc. But the slope into them isn’t steep. If an opportunity to do some outdoor thing presents itself, they’re inclined to do it, but they don’t feel an irresistible pull. And even after a bout of bingeing on outdoor activities (i.e., going to the deepest part of the valley) it is not hard for them to stop if there’s another appealing possibility, or an obligation. (2) There are no hills or mountains. Mountains and hills would correspond to things somebody avoids (since downhill represents attraction, uphill would represent avoidance). Why have a model that does not represent avoidance? It’s as much a part of what guides human behavior as attraction. We all avoid auto accidents, and in fact we tend to avoid things lower down than mountains, things that take us partway to an accident, such as driving on an icy road. There’s also plenty of pathological avoidance, and to me mountains seem like a better representation of it than valleys. So for instance if somebody has illness-related OCD, they may avoid public bathrooms, restaurant food, hospitals, etc etc. Most people with OCD have numerous things they avoid. I do see that you could also represent their disorder as easily sliding down the slope of thoughts about “it could make me sick” and getting stuck at the bottom of the funnel.

I guess my take on the landscape is that the authors had a lot of options for representing human attraction and avoidance, but chose the one that fit best with canalization.

Expand full comment

My thoughts also got stuck in a loop one time when I was on mushrooms back in college. It wasn't fun. I guess I'm just glad to hear this has happened to someone else in a similar context.

Expand full comment

Factor analysis is more or less fake, yes. I've been saying this for a long time, as have people like nostalgebraist (as you know).

But mostly, Scott, when you say "the statistical structure doesn’t look like a bunch of pairwise correlations, it looks like a single underlying cause", I would encourage you to think about what you think this means. If you could even DEFINE what "looking like a single underlying cause" means, you'd be doing better than most psychometricians, who never bother to define this. You cannot tell causation from correlations, obviously, so what is it you're looking for in your correlation matrix?
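
To make the worry concrete, here's a toy simulation (mine, not from the psychometrics literature): a genuinely single latent cause and a Thomson-style pile of many overlapping small causes both produce an all-positive correlation matrix with a dominant first factor, so that pattern alone can't tell you which causal story is true.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 8                                     # 5000 "people", 8 observed traits

# (a) one real common cause
g = rng.normal(size=(n, 1))
loadings = rng.uniform(0.5, 0.9, size=(1, p))
single = g * loadings + 0.7 * rng.normal(size=(n, p))

# (b) no single cause: each trait taps an overlapping handful of many tiny causes
bonds = rng.normal(size=(n, 200))
which = (rng.random((200, p)) < 0.25).astype(float)
sampled = bonds @ which + rng.normal(size=(n, p))

for name, data in [("single latent cause", single),
                   ("many overlapping small causes", sampled)]:
    corr = np.corrcoef(data, rowvar=False)
    top = np.linalg.eigvalsh(corr)[-1]             # largest eigenvalue ("general factor")
    all_pos = (corr[np.triu_indices(p, 1)] > 0).all()
    print(f"{name:30s}: first factor explains {top / p:.0%}, "
          f"all correlations positive: {all_pos}")
```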

Expand full comment

> I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”

Probably going to mangle the neuroscience parts of this, but taking the "human brain ≈ artificial neural net" analogy completely literally:

Each dimension of the space at training time = one weight of the neural net ≈ strength of one synapse ?≈? strength of association/priming between one pair of concepts (what thoughts cause--or inhibit--what other thoughts?)

Height of the landscape at training time = loss function = whatever the heck represents that-which-is-reinforced in the brain during learning (I think dopamine and other neurotransmitters might be responsible for representing the slope of the landscape, and the height might just be implicit??) ?≈? some combination of predictive accuracy (right is low, wrong is high) and emotional or hedonic valence (pleasure is low, pain is high)

Each dimension of the space at inference time = one dimension of a recurrent hidden state ≈ rates of action potential spikes ≈ how strongly a "single" "concept" is activated (actual concepts are known to be distributed across many neurons, iirc)

Height of the landscape at inference time = value or score function of a search algorithm (many neural nets don't have this part) ?≈? some subnetwork of the synapses, possibly concentrated in the frontal lobe (our very own personal mesa-optimizer!) ≈ how well one achieves goals or executes plans (lower is better).
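
To make the training-time half of that mapping concrete, here's the smallest possible version: a two-parameter model, so the "space" really is just (w, b), and the "height" at each point is the loss on some toy data (all of it made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2 * x + 0.1 * rng.normal(size=50)        # toy data: y is roughly 2x

# model y_hat = w*x + b; the landscape is mean squared error as a function of (w, b)
ws, bs = np.linspace(-1, 5, 200), np.linspace(-3, 3, 200)
W, B = np.meshgrid(ws, bs)
loss = ((W[..., None] * x + B[..., None] - y) ** 2).mean(axis=-1)

i, j = np.unravel_index(loss.argmin(), loss.shape)
print(f"bottom of the valley: w ~ {W[i, j]:.2f}, b ~ {B[i, j]:.2f}")
```

Training is rolling downhill on that surface; a real network just has millions of axes instead of two, and inference then runs with whatever point you ended up at.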

I'm not sure how seriously to take this, but at least it highlights some obvious gaps in the metaphor when it comes to actual brains: Why do we have so many different neurotransmitters? What's the role of hormones? Where does instinct/genetics come in? Is motor control learned differently from prediction, and if so, can the two be represented by a single weighted loss function?

Expand full comment

One way to sanity check that model would be comorbidities. Schizophrenia being anti-correlated with autism sounds possible, sure. But borderline being anti-correlated with anorexia or depression feels wrong to me. Of course, you can add epicycles to the model a la "with BPD, we treat anorexia as self harm rather than a condition on their own".

How does learning even work with non-artificial neural networks? I assume there are different processes on very different timescales involved. Learning to play a game could start as simply storing the rules in the state of general purpose short term memory neurons, and using general purpose neurons to apply them. Eventually I would expect that the brain will generate application specific neurons by adjusting the synaptic weights and perhaps establishing new connections?

The underfitting/overfitting example relies on an asymmetry between dogs and not-dogs. It gets more complicated if one trains a NN to distinguish dogs from cats instead. An overfitting NN might remember all the dog and cat pictures and simply reply "50% dogness" to everything else? What would an underfitting network do? Arrive at the rule "Black/white blobs are cats, brown blobs are dogs"?

I would assume that the network sizes required for overfitting are much larger than the ones for underfitting. If you allocate more neurons for learning anything than a neurotypical human, would your brain not run out of neurons?

If you get threatened by a man X on a Tuesday night in the subway, underfitting might look like promoting the prior "men are dangerous", a healthy update might be "men in the subway at night may be dangerous" and overfitting might be "man X may be dangerous when encountered in the subway on a Tuesday night". The relative neuron requirements feel like 1:50:1000.

Expand full comment

This kind of landscape mapping is all over evolutionary biology, where they call them "fitness landscapes," and has been growing in neuroscience for at least 30 years. My last paper as a researcher in 2005 was on the technical methods involved in trying to fit parameters to these huge multidimensional datasets.

https://scholar.google.com/citations?user=uWZ5lvsAAAAJ

Expand full comment

Not quite sure "the brain is like a computer we designed to be like a brain" is a metaphor...

Expand full comment

Hi Scott,

Thanks for this write-up. This is Arthur Juliani, the first author of the ‘Deep CANALs’ paper. I am glad to hear that you find some of the extensions that we make to the canalization model useful. I realize that the section attempting to link psychopathologies to concepts in deep learning theory is the part of the paper which is perhaps most speculative. It was also the part of the paper which I was most hesitant to include. Ultimately it felt worthwhile to make the leap in order to follow the implications of the Inference/Learning and over/under-canalization through to their clinical implications, which would be in psychopathology.

In the case of autism in particular there are a couple papers which we reference that point to underfitting and lack of plasticity as both being characteristic of the disorder. On the other hand, as you rightfully point out, there are counterarguments to that perspective (and counterarguments to those counterarguments). While on the one hand this suggests that the framework that we propose here might be missing something (and I am sure that it is, on some level at least), on the other hand if it contributes productively to an ongoing dialogue regarding the best way to make computational sense of psychopathology then I see it as valuable.

As you put so well, the metaphors we use to understand the mind have become increasingly sophisticated over time. I certainly don’t think that deep neural networks provide some final isomorphism for the brain, but they do provide a useful set of new conceptual tools which we didn’t have access to before. I likewise look forward to whatever more useful metaphor comes along next to displace them.

PS: Since the initial preprint of the paper we have uploaded a revised version with a slightly adjusted final section discussing clinical implications (and a revised 2x2 matrix figure to go with it). It contains some of the more nuanced discussion around autism which you also included here.

Best wishes,

Arthur Juliani

Expand full comment

This discussion seems a good way to also think about system 1 and system 2 thinking.

System 1 involves very steep canals, something totally automatic as in the walking example Scott uses.

System 2 is much smoother, easier to drift into different shallow canals as we take our time to consider things and potentially end up in different places.

Expand full comment

"I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”"

The dimensions correspond to the numbers that you need to completely specify the state of a brain, for a fixed set of neuronal connections and strengths. Given this state, you could in principle compute its future evolution (mental experience), possibly subject to some random noise and sensory inputs (which might push the state around).

The height of the landscape at a given point indicates how fast the system leaves that state. Also, if the system evolves over time subject to noise long enough, the height of the landscape is related to the inverse of the probability of finding the system at each point (assuming the process is ergodic; typically Pr(x) = C exp(-h(x)), where x is a given point in the landscape - a vector of many dimensions - h(x) is the height of the landscape at x, and C is some normalizing constant).
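
A quick one-dimensional numerical check of that last claim (my own sketch; the landscape h and the step sizes are arbitrary): let the state do noisy downhill motion on a two-valley landscape for a long time, and the time it spends near each point tracks exp(-h(x)).

```python
import numpy as np

def h(x):                                  # toy landscape with two basins
    return (x**2 - 1) ** 2

def dh(x):
    return 4 * x * (x**2 - 1)

rng = np.random.default_rng(0)
dt, steps = 1e-3, 500_000
noise = rng.normal(size=steps) * np.sqrt(2 * dt)
x, xs = 0.0, np.empty(steps)
for t in range(steps):                     # overdamped Langevin dynamics
    x += -dh(x) * dt + noise[t]
    xs[t] = x

occupancy, edges = np.histogram(xs, bins=60, range=(-2, 2), density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
boltzmann = np.exp(-h(centers))            # the claimed C * exp(-h(x)) shape
print("correlation between occupancy and exp(-h):",
      round(float(np.corrcoef(occupancy, boltzmann)[0, 1]), 3))
```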

Expand full comment
Jun 15, 2023·edited Jun 16, 2023

Could my terrible memory be due to a deficit of canalization?

My memory for arbitrary facts and data is far below average. I've never learned to remember to shave in the morning, and need to hang a mirror next to the door out of my house. When I meet new people, I often ask them their name 3 times within a few minutes, and yet still forget it within minutes after ending the conversation. If I need to go to a different location to do something, I need to write down what I mean to do before going, because if I walk 200 feet, to a location which it isn't unusual for me to be at, I'll forget on the way what I was going to do about 90% of the time. Sometimes I pull a book off the shelf with the intention of finally reading it, and discover it's already full of notes in my handwriting. I often can't tell, when watching a movie, whether I've seen it before. I've offended many people by not remembering their names, sometimes after knowing them for years and meeting them dozens of times. There are a handful of words I must look up the spelling of literally every time I use them, despite having looked up their spelling literally hundreds of times each.

On the other hand, I seem to be much more "creative" than most people in some ways. I've long wondered whether both my poor memory and my creativity might be due to what, in machine learning, we would call a simulated annealing temperature set too high, which would seem to have the same effects as a canalization deficit. This results in a search which easily gets out of local minima and traverses great distances on the landscape, but fails to settle down into a stable state.
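
For what it's worth, the "temperature set too high" picture is easy to caricature in code (a toy one-dimensional bumpy function, nothing to do with actual memory): with a cooling schedule the search settles into a minimum; kept permanently hot, it keeps covering great distances and never settles.

```python
import numpy as np

def f(x):                                   # a bumpy landscape with many local minima
    return 0.1 * x**2 + np.sin(3 * x)

def anneal(temperatures, rng):
    x = rng.uniform(-10, 10)
    for T in temperatures:
        prop = x + rng.normal()             # propose a nearby point
        # accept downhill moves always, uphill moves with probability exp(-rise/T)
        if rng.random() < np.exp(min(0.0, (f(x) - f(prop)) / T)):
            x = prop
    return x

rng = np.random.default_rng(0)
steps = 20_000
cooled = [anneal(np.geomspace(5.0, 0.01, steps), rng) for _ in range(20)]
stuck_hot = [anneal(np.full(steps, 5.0), rng) for _ in range(20)]

print("cooling schedule: mean final height",
      round(float(np.mean([f(x) for x in cooled])), 2),
      " spread of final positions", round(float(np.std(cooled)), 2))
print("always hot:       mean final height",
      round(float(np.mean([f(x) for x in stuck_hot])), 2),
      " spread of final positions", round(float(np.std(stuck_hot)), 2))
```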

On the other other hand, in some cases I can't remember something because some other memory is too strong. I have one friend whose name I can never remember; I can only remember that his name isn't David. I had a housemate whose face I can never remember, because I always remember a former co-worker's face when I try to remember the housemate's face. This would seem to be poor memory due to too-strong canalization.

Expand full comment

Speaking of "relaxed beliefs under psychedelics", I wonder if sexual interests can be changed in that way as well (but only if the person wants it, of course).

Regardless, it might be useful for curing paedophiles.

Expand full comment

I was doing fine reading this article until I fell into a canal in which I disagree with Scott about the nature of borderline personality disorder. This caused catastrophic forgetting and overwrote in my mind whatever else the article is about. I'd better go look at some pictures of golden retrievers again.

Expand full comment

Can confirm, I have ADHD and trouble with overly-canalized thought-loops (especially regarding productivity "thought-stoppers" that I rarely consciously examine enough to find ways to bypass/solve them).

Also, I had the same experience as Scott w.r.t. horror-level thought-looping while on chemicals.

Expand full comment

> there is no perfect equivalent of bits or bites or memory addresses

Typo: bytes

Expand full comment

This is one of those posts I hate to read, as I have had similar ideas floating around in my brain for the last year or so and must now face the fact that I am not as original as I thought.

Another factor to add to the mix is regularisation. In ML, if we find a model is overfitting, we may add some penalties to the loss function in order to cause it to fit better. One of these types of penalties is L1 regularisation, which effectively limits the number of inputs to each node in a neural net. The loss function is penalised for each non-zero connection, so the model essentially has to "justify" each non-zero connection with improved performance. The stronger the regularisation, the heavier the penalty, the more the performance has to improve to justify the connection.

When taken too far, this leads to models with overly mechanistic rules, which sounds like autism to me. If autism were related to some analogue of L1 regularisation, we'd see behaviour such as precise analytical thinking, obsession with a handful of topics, inability to take in a wide range of information at once, etc. I claim this also explains an autistic draw towards linear-information objects such as numbers and letters, subtitles, car plates, etc. It's a more comfortable domain for a model with only a handful of connections.

I've written a bit about this here a month ago: https://reasonedeclecticism.substack.com/p/autism-and-l1-regularisation

We might also see schizophrenia as L2 regularisation (which would be my next post). A schizophrenic person doesn't have precise or mechanistic thinking; instead, L2 discourages the model from relying heavily on any single input, spreading weight across many of them. So a schizophrenic might have one line of rigorous analytical evidence for the argument that the doctor is trying to help them, but also have observed the doctor tapping his pen 8 times on his desk, which is clearly evidence of a global conspiracy to lock them up. The L2 model gives both pieces of evidence comparable weight.
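
For concreteness, here's a minimal sketch (my own toy code; `lam_l1` and `lam_l2` are just illustrative knob names) of how the two penalties get added to a loss:

```python
# Toy illustration of L1 vs L2 penalties on a loss function.
# `weights` is a flat array of all model weights.
import numpy as np

def regularised_loss(predictions, targets, weights, lam_l1=0.0, lam_l2=0.0):
    base = np.mean((predictions - targets) ** 2)  # ordinary fit term
    l1 = np.sum(np.abs(weights))   # L1: pays per unit of |weight|, drives many weights to exactly zero
    l2 = np.sum(weights ** 2)      # L2: punishes large weights hardest, spreading reliance across many inputs
    return base + lam_l1 * l1 + lam_l2 * l2
```

Turning up `lam_l1` gives the sparse, few-connections regime I'm associating with autism above; turning up `lam_l2` gives the everything-weighted-a-little regime.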

L1 and L2 regularisation are both responses to overfitting, so we could argue that autism and schizophrenia are both adaptations to an overfitted brain. If so, we'd expect to see some risk factors shared, and some driving in opposite directions.

The "obvious" analogue of ADHD would be a dropout layer; however, I haven't thought through whether that is actually a real connection or just a surface-level pattern-match on "forgetting stuff".

Expand full comment
Jun 17, 2023·edited Jun 20, 2023

To address the footnote, you can think of the dimensions of the space in the same way you might think of the dimensions of "phase space" (i.e. you probably don't, but if you do, each axis quantifies one relevant, describable property of the object). One might graph the phase space of a pendulum using the dimensions of angular position and angular momentum. Associated with this graph is a certain description of reality -- if an object has angular momentum, its angular position will change with time; and gravity pulls everything downward, changing an object's angular momentum in a way that depends on its position. These can be codified into differential equations, and, at least for this specific example, you can describe a contour (a "Lyapunov function") such that, if you left a ball to roll down this contour, its motion would correspond exactly to the evolution of the state of a pendulum over time.

The example you gave of being stuck in a repetitive thought seems similar to the part of the contour describing how, if you spun a pendulum around really fast, in the absence of friction, it would keep spinning the same way forever.
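
A minimal numerical sketch of that picture (my own toy code; unit mass and length, frictionless, with small semi-implicit steps for illustration):

```python
# Phase space of an idealised pendulum: theta = angle, momentum = angular momentum.
# For unit mass and length: dtheta/dt = momentum, dmomentum/dt = -g * sin(theta).
import math

g, dt = 9.8, 0.001

def step(theta, momentum):
    momentum = momentum - dt * g * math.sin(theta)  # gravity changes momentum based on position
    theta = theta + dt * momentum                   # momentum changes position
    return theta, momentum

def trajectory(theta0, momentum0, steps=20000):
    theta, momentum = theta0, momentum0
    points = []
    for _ in range(steps):
        theta, momentum = step(theta, momentum)
        points.append((theta, momentum))
    return points

# Released from a small angle, (theta, momentum) traces an approximately closed
# loop around the bottom of the "bowl"; given a big initial shove, theta just
# keeps growing: the frictionless pendulum that spins the same way forever.
swing = trajectory(0.5, 0.0)
spin = trajectory(0.0, 10.0)
```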

To start graphing the phase space of the brain, you could start similarly. Some level of belief in X, some level of belief in Y; some level of sleepiness, some level of happiness; and so on, and so forth. To describe the brain fully would require a staggering number of dimensions -- but the visual intuition in the abstract space describing the brain is very similar to the pendulum example above.

Expand full comment

Are the Necker cube and the Rubin vase examples of canalization you can viscerally experience?

Expand full comment

“what do the dimensions of the space represent?”

I think the most literal interpretation is that the state of each neuron or maybe even each individual synapse is its own dimension. What else could it really be?

As bad as human intuitions are for multidimensional spaces, they’re even worse for nonlinear multidimensional spaces. I do wonder if we’re not just confusing ourselves by referring to these systems as being “spaces” with “dimension” at all.

There's a default intuition when talking about spaces with dimension that free movement through the space is possible, or that you can vary the dimensions independently. This is obviously not true of the brain. The neurons and synapses upstream and downstream of the dog-detector-neuron(s) will tend to be in certain configurations prior to and immediately after the detection of a dog. There is no dog-detection without some sufficient amount of wet-nose-detection and/or floppy-ear-detection, etc. In other words, these variables are not really independent, and their interdependency makes the state space much more "sheet-like" than the raw count of neurons suggests. This is true even prior to "canalization."
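
A toy illustration of that sheet-like-ness (my own code, with an invented two-latent-variable setup): if 100 "neurons" are all driven by the same 2 underlying causes, almost all of the variance in their joint state lies along 2 directions, even though the nominal dimension is 100.

```python
# Correlated "neurons" produce a state cloud that is effectively low-dimensional.
import numpy as np

rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 2))    # 2 hidden causes (e.g. "dog present", "arousal")
mixing = rng.normal(size=(2, 100))      # each of 100 "neurons" responds to both causes
states = latents @ mixing + 0.05 * rng.normal(size=(1000, 100))  # plus a little noise

# Singular values reveal the effective dimensionality of the state cloud.
singular_values = np.linalg.svd(states - states.mean(axis=0), compute_uv=False)
variance_share = (singular_values[:2] ** 2).sum() / (singular_values ** 2).sum()
print(f"share of variance in first 2 directions: {variance_share:.3f}")  # close to 1.0
```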

Expand full comment

There are some problems with the clockwork, telephone switchboard, computer, and finally deep learning analogies. The first three are all kinds of hardware, while deep learning is software. And we actually know what kind of hardware a brain is: a computer. Not the kind of computer that's made out of silicon and has lots of transistors, but a computer in the sense of a machine that computes. Every brain could be perfectly simulated by any iPhone or laptop, or by Babbage's Analytical Engine, a mechanical general-purpose computer. We could also make a brain out of telephone wires or a steam engine. Every universal computer can simulate a brain (as well as any other physical system). So the process of finding better analogies for the brain is finally over. In the unlikely event that the brain is a quantum computer there might be some more work to do, but the brain is most likely just a classical computer.

What software the brain is running is a different, harder question. Maybe it is just a deep learning program, but it is probably something more complicated; otherwise our digital deep learners would probably be a lot smarter than they are.

Expand full comment

Canalization looks like a form of dimensionality reduction to me

Expand full comment

Of course there's a general factor correlating mental and physical diseases diagnosed.

Willingness to go see a doctor.

Expand full comment

Anecdotally, I find the model in all its glory compelling. I am a person who makes strong inferences on little data, I have always noticed that I am very good at forgetting, and I have bipolar disorder. Of course, that kind of mindset makes it very easy to pick up new grand models, ignore everything else I have learned, and apply them to things, so I'm not shocked that I like it.

Expand full comment

Thanks for this question. There is indeed a lot of co-morbidity between different psychopathologies. This was actually one of the motivations behind the original canalization paper, where they attempted to provide a single explanation for this fact. The reason we think that there can be co-morbidity while still maintaining that there are different computational mechanisms behind MDD and ASD, for example, is that they may reflect different kinds of canalization in different functional brain networks. The visual system in adults, for example, is highly canalized, whereas the hippocampal system is much more plastic. As such, it doesn't make sense to talk about the whole brain being canalized (or not), and therefore an individual can develop co-morbidities which may reflect seemingly opposing dynamics, just in different functional (sub)networks within the larger system.

Expand full comment

"When a zealot person refuses to reconsider their religious beliefs..." Now I want to know if the intended edit was to switch to the word zealot or person.

Expand full comment

The table in the paper says catastrophic forgetting (over-plasticity) is UNDER-canalization during training, but the text says over-plasticity is OVER-canalization; which is it?

Expand full comment
Jul 13, 2023·edited Jul 13, 2023

Because looping is a thing, maybe it's better to think of this in terms of a vector field rather than a terrain map? Balls don't roll in looping downhill circles forever (unless they're trapped on some Penrose stairs), but vector fields can have loops that allow this. Thinking this way also solves the "what does the height represent" problem: it's not a height, it's an arrow, and the arrow points to what thought you will have next, given the thought you are having currently.
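
A toy example of such a field (my own code; the pull-toward-radius-1 term is just an illustrative choice): states spiral onto a circle and then loop around it forever, which is something no height map can produce.

```python
# A 2-D vector field with a stable loop (a limit cycle): rotation plus a pull
# toward radius 1, so trajectories orbit the origin indefinitely.
import math

def field(x, y):
    r2 = x * x + y * y
    return (-y + x * (1 - r2), x + y * (1 - r2))

x, y, dt = 2.0, 0.0, 0.01
for _ in range(5000):
    dx, dy = field(x, y)
    x, y = x + dt * dx, y + dt * dy
# After many steps, (x, y) stays near the unit circle, circling forever.
```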

Expand full comment

I wanted to share something about "trapped priors" somewhere on ACX, and maybe this is as good a place as any.

5 years ago I had an episode of psychosis and did some property damage ("misdemeanor vandalism"). The court, recognizing this was probably something I could get help with (as they should in more cases than they do, I think), put me on probation for nearly 2 years. During this time I had to take urine drug tests, and the stress around the whole process caused me to have a very hard time urinating.

It is now years later, and my urinary hesitation has never improved. Every day I have at least a couple of episodes where it either takes more than 2 minutes to urinate or I am completely unable to, even though it should not be stressful at all anymore, because the stimulus that made it stressful is long gone.

As unlikely as it is, I do hope Scott sees this post so he can use it as an example somewhere if he needs one, because I think it is a close-to-perfect demonstration of the phenomenon. It can be traced to a single original situation, and it has an irrefutable (physical, even) component. Plus it even passes the Bryan Caplan test: if you held a gun to my head, I would not be able to urinate; you would have to kill me.

Expand full comment

Fascinating article. Provided inspiration for this piece on how canalization can help avoid path dependency in organizations: https://stripepartners.substack.com/p/42d577ad-f4ab-4b52-bdaf-18020bf39277

Expand full comment