180 Comments
Comment deleted (Jun 14, 2023)
madasario's avatar

Huzzah! I was going to comment about this exact analogy. (I went through a *very* intense Marble Madness phase in middle school.) Thanks for doing it better than I would have.

jamie b.'s avatar

The term 'canalization', as far as I know, has its origins in genetics...

https://en.wikipedia.org/wiki/Canalisation_(genetics)

Moon Moth's avatar

This explains the whale cancer metaphor...

Emma_B's avatar

It does, and to my knowledge it has long been considered an unhelpful, or perhaps just wrong, metaphor. So it seems strange to reuse it for this.

Christopher White's avatar

The military usage (funneling an approaching enemy through a narrow, known path where e.g. you've got your artillery all dialed in) seems apropos---something you very much want to avoid happening to you, and try to do to your enemy. I don't have a great link; https://www.militaryfactory.com/dictionary/military-terms-defined.php?term_id=831 is the best I could find with a quick google search.

Randall Hayes's avatar

Computational biologist / mathematician Sergey Gavrilets had a paper about the same term in the context of hormone signaling and sex / gender determination.

http://www.intergalacticmedicineshow.com/cgi-bin/mag.cgi?do=columns&vol=randall_hayes&article=010

Phil Getts's avatar

Stuart Kauffman used it to refer to the formation of attractors in a dynamic network, in his 1992 book Origins of Order (which, as I've mentioned on this blog, I think is about the 4th-greatest scientific book of all time, just after Newton's Principia, Origin of Species, and A Theory of Information).

vorkosigan1's avatar

Picky comment: There isn't a 1:1 mapping between HIV and AIDS; there are a very few people who are infected with HIV and don't get AIDS, IIRC.

Re canalization: I think there needs to be a theory that addresses how different canals do or do not intersect/connect, depending on circumstances, priors, etc.....

oxytocin-love's avatar

Lots of people get HIV and never get AIDS these days, thanks to antiretroviral drugs. You can't cure the HIV but you can keep it from killing off all your immune cells.

All AIDS is caused by HIV, but that's just because we've defined it that way - there are other things that cause immunodeficiency. If we wanted we could define ADHD symptoms caused by genetics as a different disorder from ADHD symptoms caused by head injury, we just haven't done so.

vorkosigan1's avatar

1. There are people who seemingly innately don't get AIDS, even when infected with HIV.

2. What we're both describing, then, is not a 1 to 1 relationship.

oxytocin-love's avatar

yes, I agree, I was not trying to contradict you, just to elaborate/add to the discussion

Matt A's avatar

> I’m now suspicious that factor analysis might be fake

I mean, ya! It's Principal Components Analysis' less-credible cousin.

More seriously, it's a notoriously hard thing to interpret. At some point, it feels like reading tea leaves. If you're generally skeptical of null hypothesis significance testing as a paradigm, these types of analyses should almost be read as negative evidence.

Mark's avatar

Here is a skeptical take on the specific “d factor” paper. The blog author is a reputable academic:

https://eiko-fried.com/does-the-d-disease-factor-really-exist/

Emma_B's avatar

Great article, thank you for the link. In my opinion it summarizes very well the problem with the concept of a general mental health factor.

Boris Bartlog's avatar

As I recall, Cosma Shalizi pointed out that if you do factor analysis on some set of a thousand uncorrelated things, you will end up with a rather decent false positive general factor just by happenstance. I'm sure there is some way to account or compensate for this (analogous to Bonferroni correction or whatever), but it suggests that factor analysis is a treacherous beast.

Emma_B's avatar

I have also found CS's argument unconvincing. He argues that if you create new variables by summing a large number of uncorrelated variables (i.e., for example, randomly select two subsets of 500 variables from among 1000 variables), then these new variables will be correlated, even if the original variables are not. He concludes that these correlations are spurious. I think that they are not, and that the fact that your new variables are created by sampling a large proportion of the original variables creates a perfectly real correlation.

But I do agree that this does not indicate that the factor resulting from these correlations has a biological meaning. In my opinion it is just a summary of many variables.
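
(A minimal Python sketch of the construction described above: sum two large, overlapping subsets of independent variables and check whether the composite scores correlate. The subset sizes, seed, and the roughly 0.5 correlation implied by the overlap are arbitrary illustrations, not anything from Shalizi's post.)

import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_vars = 5000, 1000
X = rng.standard_normal((n_subjects, n_vars))        # independent "items"

idx_a = rng.choice(n_vars, size=500, replace=False)
idx_b = rng.choice(n_vars, size=500, replace=False)  # overlaps idx_a by ~250 items on average

score_a = X[:, idx_a].sum(axis=1)
score_b = X[:, idx_b].sum(axis=1)

print(np.corrcoef(score_a, score_b)[0, 1])           # ~0.5, driven entirely by the shared items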

John Fawkes's avatar

If over-canalization is caused by or associated with having too few synapses, then under-canalization would probably be associated with autism. Autism is associated with having too many synapses; it's not the root cause, but more like the intermediate or maybe proximate cause. So one thing to look at would be if autism breaks this pattern of every psychiatric condition being correlated with every other.

oxytocin-love's avatar

Autism and ADHD are very correlated.

For a bunch of other disorders like OCD, bipolar, and BPD, it's unclear whether the disorders are actually correlated, related in some more complex way, or just frequently misdiagnosed as each other, any of which could also be framed as "none of these categories cleave reality at its joints".

And then there's schizophrenia... lots of people have characterized it as the opposite of autism, but it does sometimes co-occur.

John Fawkes's avatar

Yeah that's what I thought– it seems like pretty strong counter-evidence to at least that particular version of this theory.

Fang's avatar

Keep in mind that, formally, the thing the medical system calls "Autism"/"ASD" is definitionally just a checklist of symptoms[1] that you have to match, not a real "thing" in concrete terms. (Part of why ADHD is so correlated is that its checklist is close to being a subset of ASD's) When talking about mis-/co-diagnosis statistics, it's much more map than territory. It describes symptoms which might not share an underlying cause.

The same goes for most other mental disorders as well, like e.g. the overlap between cluster-B disorders. OCD specifically has guidance about only being diagnosable if the obsessive thoughts don't match generalized anxiety, or body dysmorphic disorder, or trichotillomania, etc., even if you have all of those combined and your diagnosis would make more sense as "maybe I just generally have OCD".

[1] A checklist which people who almost definitely "have" autism often fail! Especially if they're afab.

Loweren's avatar

What would the shape of this landscape be?

As a biologist, I imagine something akin to Waddington's landscape.

https://onlinelibrary.wiley.com/cms/asset/632f03d0-a547-44f5-9034-4574f281bd2c/mfig002.jpg

Here, stem cells are represented as being on the top and traveling down the canals into different pools of differentiated cells (neurons, skin cells etc). Top level stem cells are very plastic and can become anything they want. Downstream level stem cells might choose between a few different paths. And differentiated cells at the bottom are stuck with their cell type.

For a differentiated cell to change its cell type, we make it climb the canal back up to the "stem cell branching point" and let it choose a different branch. It can't just climb over the walls of the canal; the local gradient slope is too steep.

Perhaps psychedelics let people revert back to the "undifferentiated state" where their psyche has the opportunity to choose a different branch of development?

organoid's avatar

IIUC, the main thing psychedelics do is make it dramatically easier to reweight or rewire synaptic connections; if we were talking about a chemical free-energy landscape, this would represent raising each neuron's "temperature" so it could climb otherwise insurmountable energy barriers.

If we focus on the "training" (synaptic/connectomic) rather than "inference" (transiently self-reinforcing latent activity/attention) side of the latent neural activity space, this is definitely somewhat analogous to what the Yamanaka factors do to the epigenomic landscape, by opening up chromatin across the board and allowing the cell to pick a new and radically different stable epigenetic state. Since neurons treated with psychedelics in vivo don't go through a stage where they lose all their synapses before making totally new ones, I'd say this process is more like transdifferentiation than de/redifferentiation.

Maybe the intuition is different if you focus on the spookier activity-pattern side; I haven't done enough research to opine on whether the psychological state of a psychedelic user resembles the "pluripotent" open-mindedness of a child, or if it's a sui generis state. This perhaps recalls the longstanding controversy over the extent to which iPSCs recapitulate ESCs.

Soarin' Søren Kierkegaard's avatar

It’s hard for me to even grasp the paradigm that autism could be an underfitting issue; it seems obvious that it’s more like “I didn’t understand you expected me to do X because my standards for X are these many overfitted criteria that must be met to make things very clear.” Can anyone paint a picture like this of what autism from underfitting would even look like?

oxytocin-love's avatar

The first thing I can think of is that neurotypicals can learn new social behaviors from a few examples, while autistics generally need far more, and more explicit, training.

I agree it feels more like autism is about overfitting, but not really in the sense you describe - it's more like way more noise is getting mistaken for signal across the board. This explains sensory issues, meltdowns/shutdowns, etc, and it explains social difficulty when you realize how much of the social information that comes off of a person is actually noise and not signal.

Soarin' Søren Kierkegaard's avatar

Great points, thanks!

Sinity's avatar

> The first thing I can think of is that neurotypicals can learn new social behaviors from a few examples, while autistics generally need far more, and more explicit, training.

That might be because NTs are already trained on lots of data. So it's like taking GPT-3/4 and teaching it something by giving it a few examples, while autists learn from scratch (or rather from shakier foundations).

From "The Intense World Theory – a unifying theory of the neurobiology of autism":

> (...) right amygdala activation is enhanced in autistic subjects during face processing when controlling for attention, that is when the autistic subjects pay attention to the stimuli. (...) Autistics also spent less time fixating the eyes region (deviant eye gaze is a core feature in autism). Moreover, in autistics, but not in controls, the amount of eye gaze fixation was strongly correlated with amygdala activation when viewing both, inexpressive or emotional faces.

> This suggests that that eye gaze fixation is associated with emotional and possibly negative arousal in autistics and this could explain why autistics have “trouble looking other people in the eye.” Eye contact and watching the facial expressions are one of the first signs of cognitively healthy infants, are natural to people, and serve to build the basis for successful navigation through a social environment. For an autistic person however, these stimuli may be just too intense or even aversive to cope with and hence they are avoided.

> Obviously, continuous avoidance of a special class of cues will consolidate feature preference processing and prevent learning in this domain, thus some later developed social awkwardness and inappropriateness described in autism may be due to this lack of acquired knowledge.

> (...) we propose that the amygdala may be overtly active in autism, and hence autistic individuals may in principle be very well able to attend to social cues, feel emotions and even empathize with others or read their minds, but they avoid doing so, because it is emotionally too overwhelming, anxiety-inducing, and stressful.

> The Intense World Theory proposes that amygdaloid hyper-reactivity and hyper-plasticity may in particular provoke a disproportional level of negative emotions and affect in autism, such as elevated stress responses and anxiety as well as enhanced fear memory formation.

Sandro's avatar

> it's more like way more noise is getting mistaken for signal across the board.

This can be seen in an underfitting sense as well though: your brain learns to calibrate to filter out irrelevant details, but if it underfits this filter, then everything just blasts through and overwhelms downstream systems.

Zarathustra's avatar

If you think of socialization as the extraction of rules for behavior from a lifetime of observations, the autistic person extracts few, very general rules from the same observations out of which a more neurotypical person would extract many more specific rules.

Soarin' Søren Kierkegaard's avatar

That’s the angle I was looking for. Thanks!

Eremolalos's avatar

Autistic people are generally pretty uninterested in other people. As infants they arched their backs and struggled when held. They do not show normal attachment behaviors. Seems to me that failure to give a shit what other people do, think, or want could account for failure to learn social skills and whatever else other people try to teach them just as well as a wiring problem that interferes with their extracting rules.

Zarathustra's avatar

Either condition would lead to underfitting, I agree, but I don’t think it’s right to say that autistic people don’t give a shit in general. Many do and try hard and still have difficulties parsing the cues and associating them with moods and intentions and expectations.

Melvin's avatar

I would think of it in terms of the other end of the autism spectrum; the kind where you're unable to function normally because you can't process sensory data properly. So you're so severely underfitting that you can't see a dog as a dog, it's just a confusing fast-moving area of colours that your brain can't manage to resolve into a dog.

Then you could imagine a less severe version of the same thing; your brain is okay at seeing dogs, but has trouble unpacking more complex stuff like facial expressions and tone of voice.

pozorvlak's avatar

Looked at from the opposite perspective, neurotypicals overfit to local social rules so much that they frequently mistake contingent customs of their social circle for universal laws of morality. Autists are much more able to see the range of possible customs, but much less good at learning what customs are locally in force.

Eremolalos's avatar

While I know that the current practice is to talk about an autistic spectrum, I am very dubious about considering people with an Asperger-type profile as having a mild case of autism. People with autism, in the usual sense, are not going to be saying things like "I didn’t understand you expected me to do X because my standards for X are these many overfitted criteria that must be met to make things very clear.” Many of them are not going to be saying anything at all, because they are mostly or completely non-verbal. They are generally uninterested in other people, and often very attached to stim activities like twitching a shiny object in a certain way. So they're going to be squatting in a corner with their back to you staring at the spoon they're twirling, not bugging you with their over-valuing of tiny details and their need for perfection.

Soarin' Søren Kierkegaard's avatar

Thanks, good point that this paradigm may not play well with the “autism is a spectrum” paradigm. Appreciated!

maybeiamwrong2's avatar

A little late to the party here, but I have recently found this paper to be quite elucidating for how to conceptualize autism:

https://www.researchgate.net/publication/366427549_A_cybernetic_theory_of_autism_Autism_as_a_consequence_of_low_trait_Plasticity

It shows how common autistic traits can result from low plasticity. I think an important aspect of this might be how plasticity gets conceptualized: in common parlance, it seems to be understood as the ability to integrate new information, whereas from a cybernetic perspective, it might be understood as the tendency to seek out new information/tendency for exploration. From that perspective, autists don't fail to learn social stuff because they are unable to integrate the information they get, but because the information they might be able to get seems unimportant and hence not worth exploring. This would further mean that it is neither an issue of under- nor overfitting.

oxytocin-love's avatar

I personally am a depressed dissociated autistic with ADHD, and I know lots of other people with disorders from multiple of the model's quadrants.

If the deal with that is "your neurons are just sort of generally bad at forming accurate beliefs", you'd expect these people to be intellectually disabled or entirely out of touch with reality, but all of the people I'm thinking about are high IQ, lucid, and able to manage their own affairs.

This applies even if we just think about one disorder at a time - depressed people may be stuck in the belief that everything's hopeless, but they don't seem to get stuck in all their beliefs equally across the board.

It may well be that depression represents a canal having formed, but why that particular canal is probably complicated and interesting and won't fit into a cute small model like this.

Alephwyr's avatar

The diagnoses I have received throughout my life have gone from F3g -> F1g -> F2g -> F4g -> F2g, without ever receiving the same diagnosis from two different doctors. Until the age of 24 I was at my most lucid, capable, functional, intelligent etc. without medication. I am pretty sure that all of this is about as real as Solomon's demons.

Moon Moth's avatar

Might it be the case that the brain has different regions which do different things, and that one region (e.g., learning through physical input) might be a bit larger than optimal while another region (e.g., parsing human expressions) might be a bit smaller than optimal? Although it's probably not discrete regions that do one thing or another, but more about how much area of the brain is shared among which functions.

Moon Moth's avatar

Of course, "optimal" is always in the context of a particular set of environments. A change in the environments results in something else being "optimal".

DiminishedGravitas's avatar

I think one idea that could explain a lot is observation/analysis resolution, or the bandwidth/compute available for digesting inputs. In this thought experiment I'm assuming that this is a genetic trait that doesn't evolve over time.

If you've been bandwidth constrained since childhood you've necessarily developed a robust ability to discern signal from noise. You need to identify the relevant input data before you analyze it, because you couldn't possibly analyze everything.

On the other hand if there's ample compute available for data analysis, you don't have to be so picky. Just accept all the inputs and sort out what's relevant after you make sense of it.

Both are fine on their own, but problems occur when we consider prediction model / reward function formation. From inception you start developing your reward function, what's good and what's bad, and what kind of data predicts good or bad things. You start forming your valleys and peaks.

If you're bandwidth constrained these models are formed at a somewhat coarse resolution, whereas if you're on broadband your models become very intricate very quickly. The more generalistic models are more adaptable to changing circumstance, while the detailed ones rapidly become out of date.

Even though the high-resolution mind has sufficient plasticity to generate updated prediction models, the "good old" reward function still haunts it. While there's new canals that fit current inputs nicely, it just doesn't feel the same as the original valleys etched into memory.

This could lead to various different pathologies:

depression, as it becomes apparent that inputs will never fit the ideal again;

anxiety, because while inputs fit newer reward criteria, there's constant severe mismatch with the legacy criteria;

ADHD, where the brain matches inputs with parts of the model and gets excited, but then grinds to a halt when it finds mismatches;

Borderline disorder, when the mind becomes distrustful of any models based on old data and decides to continuously retrain itself from scratch, and so forth.

C_B's avatar

Why is it surprising that there would be a general factor that correlates with all physical and mental pathology? Shouldn't we expect factors like general health, mutational load, diet, local air quality, etc. to act as a composite factor that just makes everything about your body work better or worse?

Zarathustra's avatar

Metanarratives aren't very fashionable these days among the literate set. Unless you're a research scientist/lab rat, then discovering a metanarrative is your favorite daydream.

oxytocin-love's avatar

~*~::~*~oxidative stress~*~::~*~

TGGP's avatar

Karl Smith argues that we should a priori prefer a single cause... for anything resembling an epidemic:

https://modeledbehavior.wordpress.com/2012/01/08/on-lead/

There are many different forms of mental disorders, particularly if we include autism, so it's harder to say if his logic applies to that.

Emma_B's avatar

I do find the argument convincing for epidemics, but mental pathologies are not epidemics, and every biological phenomenon is highly multi-causal.

It seems to me the main failing of Fristonian models is this insistence on shoehorning every observation into one explanation. I think that it might be an idea coming from evolutionary biology, where fitness is The One parameter that rules them all. But fitness is not a 'real' property of organisms, it is a summary of many different traits.

Alephwyr's avatar

Here is your insane schizo take for the day: all of these things correspond to computational parameters, but those parameters are not within individual human brains, they are within the simulation that is running us. Whatever intelligence is simulating us suffers from canalization with respect to mapping certain clusters of traits together, then constructs the simulation in such a way that those traits have a common factor. The probability space of physically lawful brain models is probably excessively big relative to the probability space of originally occurring organic human brains in original meatspace. We are victims of the ongoing epistemic collapse, and AI has not, or not yet, mediated it.

oxytocin-love's avatar

This is actually just the same as saying that our minds are running on a physical substrate and it's the physical substrate that performs computation. So yes, to the extent that the canalization model is correct, it exists in the physical world that is "simulating" us, whether there's a universe "outside" the simulation or not.

Alephwyr's avatar

It's not topographically identical. Also, brains are different things that vary, not one singular platonic mass, whereas most simulations would probably just instantiate a bunch of objects based on a literally identical template, so these things are entirely different.

Alephwyr's avatar

Point blank, if we live in a universe that exactly fits any model then we live in a model universe.

Zarathustra's avatar

#include <iostream>

class Brain {
public:
    void think() {
        std::cout << "The brain is thinking about itself." << std::endl;
    }
};

int main() {
    Brain myBrain;
    // Call the think() method on the myBrain instance
    myBrain.think();
    return 0;
}

Joel Long's avatar

It seems like the relevant comparison isn't HIV/AIDS to ADHD or depression, it's "flu-like symptoms" or "rash with small, itchy bumps". Maybe I'm confused about how diagnosis for, say, anxiety works, but my understanding is that it's mostly matching a nexus of symptoms. We're aware of some discrete causes for these symptom-nexuses in some cases (e.g. certain types of brain injury) and aware that we're missing others. There are obviously some (at least partially) physical illnesses whose diagnosis is still basically "you have this nexus of mysterious symptoms", but that's not most people's every-day experience with physical ailments.

In this thinking categories like "depression" seem very useful for symptomatic treatment (in the same way that if I have a fever, aches, and stuffiness there are dedicated meds for treating "cold and flu" symptoms) but not very useful for specific causal diagnosis (because there's so many discrete things that can cause effectively identical symptoms).

And yeah, that common factor conclusion seems...suspicious.

oxytocin-love's avatar

Indeed, this is what a "syndrome" is in medicine, it's just a symptom-nexus, diagnosing a syndrome is making no claims about the cause, just saying "I notice that your symptoms fit this cluster". And most psychiatric diagnoses only go that far.

Which is definitely an extremely useful thing to do, don't get me wrong, but it does point at the fact that in most cases we don't actually know the cause.

(sometimes this is because it's not really worth our time and resources to study - a whole bunch of different viruses cause "cold and flu" symptoms, but most of the time it doesn't actually matter which one it is because regardless the treatment is the same. We could pour a ton of R&D money into specific tests and treatments for each one and save a few QALYs from the "common cold", or we could research other diseases that cause more suffering and/or are easier to understand and develop drugs for.)

Emma_B's avatar

It seems to me that if we suppose that a 'well-working brain' has many relatively independent dimensions of 'working well', and that 'insults' to the functioning of the brain typically affect several dimensions simultaneously, then this will produce the observed correlation between the probabilities of different mental pathologies, and thus a common factor that does not correspond to any characteristic of brain functioning but is just a statistical summary.
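
(A toy Python version of this point: independent dimensions, each degraded a bit by a shared "insult" load, yield correlated symptom scores and a first component that is only a statistical summary. The number of dimensions, the exponential insult distribution, and the 0.7 loading are arbitrary illustrative choices, not estimates of anything.)

import numpy as np

rng = np.random.default_rng(1)
n_people, n_dims = 10000, 8

insult = rng.exponential(1.0, size=n_people)         # shared "insult" exposure
specific = rng.standard_normal((n_people, n_dims))   # independent per-dimension variation
symptoms = 0.7 * insult[:, None] + specific          # every dimension gets hit a bit by each insult

corr = np.corrcoef(symptoms, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]
print("typical cross-dimension correlation:", round(float(corr[0, 1]), 2))
print("share of variance on the first component:", round(float(eigvals[0] / n_dims), 2))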

Radar's avatar

This is my favorite explanation. With external stress (insults) as a driving external cause for severity of symptoms in any one person.

JJ Pryor's avatar

Falling into the valley reminded me of the concept of learned helplessness.

https://en.wikipedia.org/wiki/Learned_helplessness

Joel Long's avatar

Is there a good reference for a non-medical-professional on the research about a general factor of psychopathology? It strikes me as suspicious on its face, but I'd like to understand the general structure of the research that arrived there.

MBKA's avatar

Canalization in say, the way Stuart Kauffman uses it in the context of patterns (attractors) in a dynamical system refers to a situation where "most possible inputs will produce a small number of effects". These experiments relate to the idea of the fitness landscape which itself is quite similar to the idea of an energy landscape seen upside down. The fitness landscape concept also shows what the dimensions and the height of the space represent: the n dimensions represent n genes and moving along one genetic axis, height represents the outcome in fitness of all possible combinations with all other genes (or associated phenotypic expression). Height is strength of the effect, or fitness. For mental states (instead of genes), the height would be "strength of attractor" across all possible combinations of different mental states. The idea is that some mental states are strong attractors - stable mental conditions, some healthy, some less so.

There are two ways to produce strong (stable) attractors in this model: 1. to have few interactions between genes / states (if almost everything interacts with almost everything else, then there is no stability - a small change somewhere will instantly lead to a different attractor). 2. Canalization: there may be a lot of interactions between genes / states, but most of these still lead to the same outputs.

So canalization helps with stability - without canalization, there would be many weak attractors in your dynamical system, if all genes / states interact randomly with nearly all others. With canalization, there are fewer and stronger attractors.

Another way of thinking about this is how modularity cuts down on individual interactions between components. When modularity is present, there is less need for canalizing inputs (or perhaps one could say that modularity is implicit canalization).
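
(A rough Python sketch of that picture: a small random Boolean network in Kauffman's style, iterated from many random starting states to count how many distinct attractors it falls into. N, K, and the number of starts are arbitrary; with small K the dynamics tend to funnel into a few attractors, and with larger K they fragment into many more.)

import numpy as np

def random_network(n, k, rng):
    # Each node reads k randomly chosen nodes through a random Boolean function.
    inputs = [rng.choice(n, size=k, replace=False) for _ in range(n)]
    tables = [rng.integers(0, 2, size=2 ** k) for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    return tuple(
        int(tables[i][int("".join(str(int(state[j])) for j in inputs[i]), 2)])
        for i in range(len(state))
    )

def attractor_of(state, inputs, tables):
    # Iterate until a state repeats, then collect the whole cycle as a canonical label.
    seen = set()
    while state not in seen:
        seen.add(state)
        state = step(state, inputs, tables)
    cycle, s = set(), state
    while True:
        cycle.add(s)
        s = step(s, inputs, tables)
        if s == state:
            break
    return frozenset(cycle)

rng = np.random.default_rng(2)
n, k = 12, 2
inputs, tables = random_network(n, k, rng)
starts = {tuple(int(b) for b in rng.integers(0, 2, size=n)) for _ in range(200)}
attractors = {attractor_of(s, inputs, tables) for s in starts}
print(len(attractors), "distinct attractors reached from", len(starts), "random starting states")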

Rishika's avatar

I thought from the description that 'canalisation' would only happen during 'training', as it affects the landscape rather than the 'cursor'. But then the paper discusses canalisation during inference - how can this even happen (without it being training)? Am I misunderstanding?

Oxeren's avatar

If I understand correctly, canalization during training refers to the process of formation of canals in the landscape, while canalization during inference refers to the traversal of the already formed canals by the cursor.

Harold's avatar

> Once when I was on some research chemicals (for laboratory use only!) my train of thought got stuck in a loop.

Oh man, this happened to me once due to an edible. It was close to the worst experience of my life, perhaps 2nd worst. It was most certainly horrific. I was thinking of the term harrowing at the time. And for me, I didn't come out of it for hours, and the trip itself lasted over a day, though not all of it was as bad as that loop. And even though it only lasted hours, it felt like millennia. Time may as well have stopped, and sometimes I think there's some version of me still stuck back there in time, since time had no meaning to me in that moment. Only the constant cycling of thoughts and escalation in fear, until the fear became its own physical sensation, cycling itself. Don't do drugs, kids.

Leo Abstract's avatar

I know someone who has the same response to cannabis. I'd never heard of anyone else having it. So this is highly interesting. Have you done any research into this effect?

Harold's avatar

Not really. It never really happened to me when smoking joints beforehand. But then when I had that edible, it happened, and the one time I smoked a joint afterward, it happened again (though not as intense). So I decided to just never smoke weed again at all. I'd love to learn more though, to see what the deal was.

Interestingly enough, that experience made me more aware of my thoughts in different states. For example, I never used to notice how my brain wanders right before I'm going to sleep, but since that experience I do notice it, since it's a similar feeling on a very small level.

Leo Abstract's avatar

Ever tried DMT? Same guy I know says it has basically no effect on him.

Harold's avatar

Oh man, no! I'm terrified of DMT. I had a friend from high school who did DMT when he went off to college. He ended up in a mental institution for 6 months. He said DMT broke reality for him. In fact, that was something echoing in my mind during my trip, I was thinking about him telling me that. I kept thinking I destroyed my mind, broke my perception of reality.

DiminishedGravitas's avatar

I had a similar experience and it led me to understand the point of meditation: identifying with my thoughts was in that moment extremely clearly a source of suffering, whereas if I unfocused and simply enjoyed the cacophony without trying to interpret it, I would reach a state of bliss.

I accepted the possibility of losing my sanity and stopped fighting the sensation, and the overwhelming fear dissipated. The experience was quite fun after that and I feel I've been a more confident, less neurotic person since.

Moon Moth's avatar

-- OK, so "canalization" is "habits", or "getting stuck in a rut", or "learning something until it becomes a reflex", or "I could do that in my sleep", or "practice a punch 10000 times". Don't weightlifters also talk about this, in relation to why it's bad to train to failure?

-- I'm skeptical of a clean distinction between inference and learning, when it comes to people. I realize that this is acknowledged, but it doesn't seem grappled with. Also, we do also have things like "moods" - I'm more likely to react to a stimulus one way when I'm in a contemplative mood, and another when I'm in an angry mood. It feels awkward to speculate about another person like this, but my guess about the bad trip was that it put the brain into a state where there was a loop, and then later the state and therefore loop went away, and to the degree Scott came out of it a new person with new perspectives on life, the change was "learning", and to the degree Scott was the same old person, the change was "mood". I've not had any such experiences myself, but some people seem to have reported each type of experience. **shrug**

-- The discussion of Good Old Neural Nets seems solid. It should be noted that the computer versions were originally an attempt at mimicking what the brain does, and how neurons and synapses work. Deep learning dropped some of the superficial similarities to the brain, but may have introduced deeper similarities. (Edit: that was an awkward phrasing.) Is it closer? I dunno, but it seems to work better.

-- Generally, with Good Old Neural Networks, overfitting and underfitting were due to the size of your net. Say you had a training set of 50 pictures of dogs and 50 pictures of non-dogs. If your net was too big, it would overfit by, for example, storing each of the 100 pictures individually, giving a perfect answer on each of them, and giving random noise for other input. If your net was too small, it would underfit by, for example, identifying a pixel that was one color in 51% of the dogs and a different color in 51% of the non-dogs, and basing all decisions off of that one pixel. Those are extreme examples, but they should give the idea.

-- Overplasticity seems like what's going on with the fine-tuning and RLHF turning LLMs from good predictors into wishy-washy mediocrities that will at least never say anything bad. But I don't know much about how that works in practice, with the deeper nets of today.

> they say that overfitting is too much canalization during inference, and underfitting is too little canalization during inference. Likewise, over-plasticity is too much canalization during training, and over-stability is too little canalization during training.

-- I really don't know what to make of that. Both halves sort of make sense, but in different ways that don't seem compatible? I feel like "canalization" is being used to mean too many things at once.

-- The 4-part model is interesting. I do kinda buy the top left square: those things fit together, and feel intuitively to me like being stuck in a rut. I don't know enough about the other conditions to say anything useful.

> Canal Retentiveness

**applause**

> This question led me to wonder if there was a general factor of physical disease

'd' is clearly derived from Con, just as 'p' is derived from Wis, and the combination is related to the point buy. Next!

Kristian's avatar

About moods, what is the thing that made the loop experience so terrible for Scott? How do we think about that? Why didn’t he just experience it as neutral?

In depression or anxiety or OCD, there’s generally this sense, isn’t there, that what is going on is bad (catastrophizing)?

Moon Moth's avatar

I dimly recall hearing about some Buddhists experiencing positive loops like that, in meditation. (Or maybe it was someone who was heavily into functional programming?) My uneducated guess is that the phenomenon is sensitive to preparation and also to random initial factors. And also reporting bias - there's no box in the forms for "everything in my life is going great, in a self-sustaining loop".

Radar's avatar

The mood part often seems to me to be the suffering we experience in the gap between how we think things should be and how they are. “I should have some control over my thoughts” up against the experience for that stretch of feeling very out of control of one’s thoughts.

Moon Moth's avatar

That does seem like the classic Buddhist and Stoic answer...

Radar's avatar

Right. I do want to add that I don't think that accounts for all of mood disorder experience or that all mood disorders can be fixed through mental practice. The pain experienced in the gap may be universal to humans, but all kinds of biological and environmental factors may make that gap wider or the pain experienced in it greater or deprive a person of buffering factors that could help at both of those levels.

DiminishedGravitas's avatar

I think it's about identifying with your thoughts as opposed to observing your thoughts.

Getting stuck in a thought loop is very different from having your thoughts stuck in a loop, if you catch my drift? If a thought loop is like a spinning top, the experience is completely depending on whether you're merely observing a top spin, or if you yourself are spinning uncontrollably.

It's also about how you assign values to experiences. Being stuck in a thought loop is anathema if you value being in control above all else: it's like a Chinese finger trap that only works because you're trying to struggle out of it. If you're fine with losing control, the thought of it happening elicits no fear response, and the loop never forms.

Xpym's avatar

Yes, unlike essentially set-in-stone artificial NNs, the human brain is greatly amenable to transient chemical influence in the moments of "inference", whether it's hormone-induced moods or drug-related exotic states. This doesn't seem to mesh with everything being explainable by "canals" within the computational structure of the brain-NN itself?

Moon Moth's avatar

Yeah. I haven't read the paper itself but the summary conveys the pop-sci "smell" of speculating on how a newly-discovered mechanism explains a lot of different things. When it probably merely has a varying amount of effect on those things, mostly small.

Muster the Squirrels's avatar

> I'm skeptical of a clean distinction between inference and learning, when it comes to people. I realize that this is acknowledged, but it doesn't seem grappled with.

I would also like to read more about this. I think it's worth a follow-up post, if Scott doesn't consider anything else more pressing (though he probably does).

oxytocin-love's avatar

> Once when I was on some research chemicals (for laboratory use only!) my train of thought got stuck in a loop.

This is a not-uncommon effect of psychedelics, and I feel like it must be analogous to the thing AIs sometimes do where they loop on the same sequence of words for a bit

Doug S.'s avatar

Reminds me of the joke of different professions' "proofs" that all odd numbers are prime. The computer programmer's proof: 3 is prime, 5 is prime, 7 is prime, 7 is prime, 7 is prime...

jakej's avatar

I just recently wrote up my thoughts on how such computational models of the mind relate to qualia and self-awareness

https://sigil.substack.com/p/you-are-a-computer-and-no-thats-not

Micah Zoltu's avatar

Local nitpick, globally important, error correction. If my understanding of current LLM design is correct then this statement is incorrect:

> But it changes which weights are active right now. If the AI has a big context window, it might change which weights are active for the next few questions, or the next few minutes, or however the AI works.

The AI doesn't change its weights while it is writing a lengthy output; I believe what is done is that the previous output is fed back in as an input for the following token. At the beginning the input consists of the user input, which results in a single token output. The user input plus that one-token output are then fed back into the AI, which generates another single token. This process repeats until the AI decides it is "done".

The "size of the context window" is, IIUC, how many tokens the AI will accept as input. This means if you have a long conversation with an AI, eventually you can no longer feedback all of the previous conversation and you have to cut some stuff out. The naive solution to this culling is to just cull the oldest. This is why you can get the AI to be a lot smarter by having it periodically summarize the conversation, because it functions as a form of conversational compression so you can fit more "information" into the limited amount of input space (context window).

So weights don't change with the context, but which neurons get activated and which don't *will* change because the input changes with each new token added to the input.
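
(A minimal Python sketch of that loop, with a stand-in "model" so it runs on its own; the function and constant names here are placeholders, not any real library's API. The point is that the weights stay fixed during generation and only the input window changes.)

CONTEXT_WINDOW = 8   # max tokens accepted as input (tiny here, so truncation is visible)
END_OF_TEXT = -1     # placeholder "I'm done" token

def toy_predict_next(window):
    # Stand-in for a frozen model: just counts upward and stops at 20.
    return END_OF_TEXT if window[-1] >= 20 else window[-1] + 1

def generate(predict_next, tokens):
    while True:
        window = tokens[-CONTEXT_WINDOW:]    # cull the oldest tokens once the context is full
        next_token = predict_next(window)    # weights stay fixed; only the activations differ per input
        if next_token == END_OF_TEXT:
            return tokens
        tokens.append(next_token)            # feed the output back in as part of the next input

print(generate(toy_predict_next, [0]))       # [0, 1, 2, ..., 20]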

I say this is a "local nitpick" and "globally important" because this error doesn't change the meaning/message of this article, but it may change the way you think about current AIs slightly!

If I'm wrong I am hoping someone corrects me!

jakej's avatar

I noticed that too, but he followed it up with "or however AI works", so close enough lol.

Micah Zoltu's avatar

Hah, yeah, I saw the "or however AI works" as "I acknowledge I don't really get how this works, but this is my current mental model and hopefully someone in the comments will help me improve my mental model". I may have read too much between the lines though. 😛

Moon Moth's avatar

Yeah, to my best understanding, that's how LLMs work at the moment. (I also noticed this, but then forgot when I went back to write my response. Per a discussion in the open thread, was I conscious then?)

I think there are currently some LLM designs that work around this by having the LLM maintain an external store of data, but I forget which ones they are.

Godshatter's avatar

I took this to be talking about multi-head attention, which is a key feature of the transformer model architecture.

From a programming perspective obviously all your weights are in use all the time, but it doesn't seem wrong to explain MHA as "turning on and off weights depending on the context you've picked up so far". A more accurate way to explain it though would be "upregulating/downregulating particular response modalities based on context".

So using the terms from this article, an 'overfitted' LLM would be very context dependent in its responses. For example if you asked "Why does Paris Hilton not like genocide?" it would use the category of "questions about Paris Hilton" and say something like "Paris Hilton revealed in a recent tell-all interview that she hates confrontation".

An 'underfitted' LLM would make the opposite mistake – it wouldn't appropriately take context into account, so if you asked "Does Adolf Hitler like genocide?" it would bucket this as a general question about liking things and give a generic reply that ignored the specifics of the question: "He likes mystery novels and long walks on the beach".

Obviously this isn't how anyone in ML uses the terms 'overfitting' and 'underfitting'. I'm not even sure that it's possible to get a transformer to make these classes of error, but it might be interesting to try!
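
(For reference, here is the core computation a single attention head performs, in Python on toy numbers; it just shows the mechanics of softmax-weighted mixing over the context, with no claim about how any particular model uses it.)

import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                   # query-key similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the context positions
    return weights @ V                                         # each position's context-weighted mix of values

rng = np.random.default_rng(3)
x = rng.standard_normal((5, 4))      # 5 tokens, 4-dimensional embeddings
print(attention(x, x, x).shape)      # (5, 4): same shape, now context-mixed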

Moon Moth's avatar

Cool, thanks! I'd been wondering how "attention" would interact with all this. I need to read up on that.

Godshatter's avatar

No problem! https://towardsdatascience.com/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452 looks like a good explanation (Ctrl+F "What does Attention Do?") with the subsequent articles going into more detail on the actual implementation.

I'm not an expert either – I've implemented some toy transformers from scratch and gone through the theory on Coursera, but I don't know that I really grok it all. So I have some reading to do too!

If there's anyone in thread who's actually a specialist, I'd be very happy to be corrected if I've got anything wrong!

Joe's avatar

«I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”»

These things are well-defined in the context of a Hopfield network. Each dimension is simply the state of each neuron, and the "height" is defined by a Hamiltonian - broadly, it can be thought of as the mismatch between the network state and what each individual connection weight is telling the network to do.
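
(Concretely, for states s_i = ±1 and symmetric weights w_ij, the standard Hopfield energy is E(s) = -1/2 * sum over i,j of w_ij * s_i * s_j, which is low when connected units agree with the sign of their coupling. A minimal Python sketch, with random weights just to have something to evaluate:)

import numpy as np

def hopfield_energy(state, weights):
    # Low when connected units agree with the sign of their couplings.
    return -0.5 * state @ weights @ state

rng = np.random.default_rng(4)
n = 6
W = rng.standard_normal((n, n))
W = (W + W.T) / 2            # symmetric couplings
np.fill_diagonal(W, 0.0)     # no self-connections
s = rng.choice([-1.0, 1.0], size=n)
print(hopfield_energy(s, W))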

As far as I can tell, Friston’s schtick is basically just using Hopfield-style networks as a fuzzy metaphor for all of cognition.

http://www.scholarpedia.org/article/Hopfield_network

Doug Summers Stay's avatar

I was going to say that this is a picture of the "loss landscape". In that case, the height represents how far you are from the goal you are trying to optimize for (your loss function), and the other dimensions are the parameters of the neural network. Gradient descent is descending into these valleys depending on the slope of the mountains. I think on Scott's model, what the brain is trying to optimize for is "predict the immediate future", so that would be the loss function.
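
(To make the "descending into valleys" picture concrete, a few steps of gradient descent on a made-up two-parameter loss surface in Python; in the brain analogy the loss would be something like prediction error.)

import numpy as np

def loss(params):
    x, y = params
    return (x - 1) ** 2 + 2 * (y + 0.5) ** 2        # one valley, floor at (1, -0.5)

def grad(params):
    x, y = params
    return np.array([2 * (x - 1), 4 * (y + 0.5)])   # slope of the surface at this point

params = np.array([3.0, 2.0])                        # start somewhere on the hillside
for _ in range(50):
    params -= 0.1 * grad(params)                     # step downhill along the local slope
print(params.round(3))                               # converges toward the valley floor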

Joe's avatar

I think these are different things. Moving around a loss landscape is a description of learning. In Friston’s analogy, minimizing free energy is a description of how a "pre-trained" brain behaves, which makes it like the free energy of a Hopfield network - i.e. a description of behaviour, not learning.

Learning, in the context of a free energy landscape, would mean making the individual peaks and troughs lower or higher.

sclmlw's avatar

In the context of actual brain function, wouldn't you expect learning to be happening at the same time, though? So every time you descend a gradient and get feedback about the results of that descent shouldn't that change the shape of the nearby landscape?

Radar's avatar

One frustration I had reading that article was not knowing quite how they were using this landscape idea. By the end I had the sense it was being used as a fuzzy metaphor but being deployed in a way that seemed more literal, and that I just didn't understand the underlying science. I kept thinking surely they would say up front if they meant this topology merely in a metaphorical way, to propose a model that there's no evidence for. But I can be kinda literal…

Stephen Pimentel's avatar

Even if there is a "general factor of psychopathology," canalization or otherwise, I wouldn't interpret it as "the cause of every disorder in the DSM," but more like "a cause of many of the common disorders in the DSM."

This is a weaker and more plausible claim.

Sniffnoy's avatar

Speaking of the general factor of intelligence, Gwern linked a bit ago to this interesting paper: https://gwern.net/doc/iq/2006-vandermaas.pdf which posits that the positive manifold (every mental skill correlates positively with every other mental skill) may be due to a developmental process whereby, during development, each mental skill boosts the development of the others.

Apparently the authors of this paper later realized the model has some problems? Idk, I'm not up to date on the literature here at all. Still, thought I should point it out as interesting.

Apogee's avatar

"Likewise, it’s interesting to see autism and schizophrenia in opposite quadrants, given the diametrical model of the differences between them. But I’ve since retreated from the diametrical model in favor of believing they’re alike on one dimension and opposite on another - see eg the figure above, which shows a positive genetic correlation between them."

This actually does match the paper - it puts schizophrenia in the underfitting/catastrophic forgetting quadrant. You seem to have swapped it with borderline when you recreated the figure.

Apogee's avatar

also, wrt autism: I don't think the canal model disagrees with other predictive coding models as much as it seems to.

Sander et al: "A chronically increased precision will initiate new learning at every new instance. Hence, future predictions are shaped by noise or contingencies that are unlikely to repeat in the future (also called overfitting). An organism that developed this kind of priors will have strong predictions on what to expect next but such predictions will quasi-never be applicable."

Juliani: "[Autism] can be characterized by an inconsistent deployment of mental circuits ... as well as an inability or difficulty in learning or changing these circuits over time."

I think these match up pretty well: autism involves creating strong, specific priors that are difficult to change, even though those priors routinely fail to be validated by experience. The confusing part is that what the other models call "overfitting", Juliani calls "plasticity loss"... and then he goes on to call something else "overfitting" anyway.

I can kind of see why autistic inference could be considered "underfitting", though. Like the underfit ANN that classifies all mammals as "dog", the autistic mind might not realize e.g. that a sarcastic comment goes in a different bucket from a genuine one.

soda's avatar

'Recent research has suggested a similar “general factor of psychopathology”. All mental illnesses are correlated; people with depression also tend to have more anxiety, psychosis, attention problems, etc' Is this research based on population surveys or on the prevalence of diagnoses? That is to say, how confident are we that the underlying factor isn't talking to someone capable of officially diagnosing mental illness?

Hoopdawg's avatar

I don't get the... - I can't find the right word, defensiveness? - about comparing brains to neural networks. Neural networks are distinctly different from clockwork or switchboards or computers, in that they are like brains in a very non-metaphorical sense; they were modeled after the building blocks of the brain and are bound to have many of the same features. From what I understand, they were used to experimentally model and test hypotheses about brain functioning long before they found any useful real-world applications, and the two fields studying them are very much in contact with each other.

Of course, there's also a very real sense in which LLM advocates overblow the similarity in a naive hope that sufficiently trained LLMs will magically start exhibiting all of the brains' incredible complexity, but I don't think this is worth worrying about in this particular instance. The concepts in question seem to be fundamental, basic features of neural networks, biological and artificial alike, and I'm finding it safe - natural, even - to assume that any similarities between the two are real, meaningful, and informative, and directly translate to each other.

Oscar Cunningham's avatar

The idea of active inference has always seemed backwards to me. My intuition says that it should be that to trigger an action you need to form a belief that that action *won't* happen. Like with Lenz's law: you move a magnet by putting current through a wire in the opposite direction to how it would flow if the magnet moved. Or prediction markets: you pay for Putin to be assassinated by betting that he'll survive to the end of the year.

Chris K. N.'s avatar

I don’t know who I’m writing this for. Probably mostly myself. But I had a lot of trouble wrapping my head around these metaphors and visualizations, until I realized it was the low resolution and complexity that tripped me up.

I’m not sure my takeaway is the right one, but I felt it helped to replace the single ball on the landscape with a rainfall of inputs and impulses (such is life). Also, I replaced the two-by-two matrix with something without strict binaries (surely representing plasticity and fitting as binaries, with no optimal level, invites misinterpretations?).

Imagining the rain falling on the landscape of the mind – originally shaped by one’s genes and circumstances – it’s clear how the rain gathers in pools and streams. Over time, creating rivers and canyons, lakes and large deltas.

The dimensions don’t have to represent anything other than the facts of their own existence, unique characteristics of our personality – just as peaks and valleys on the real landscape don’t “represent” anything. You could probably say that they represent traits or neurons or something else, depending on your lens, but you risk getting the resolution wrong again. The important part is that it’s the “shape” of where the environment meets the person.

Everything is constant interplay between the genetics and circumstances that determines your day-1 landscape, and the experiences, habits, information, treatment and environment that make up the weather.

Fitting is a consequence of the topography and size of the metaphorical drainage basins in your landscape; plasticity is represented by the local soil and vegetation and how robust it is to erosion.

And so, I can see how the weather could affect a landscape predisposed for, say, depression, and turn it from a melancholy trickle in a creek into a veritable Blue Nile, which can cause utter devastation.

There’s always a risk of taking a model or metaphor too far – the map is never the territory – but I can see how someone might want to solve their flooding problems by constructing a mental dam, how others might try to change the weather, how someone might try to divert a river, and how any of those, mismanaged, might have serious mental health consequences.

Like evolution by natural selection, or actual erosion, or real weather, it all seems like one of those processes that are based on innumerable simple inputs, following very fundamental principles, to produce infinitely complex outcomes with all kinds of feedback mechanisms.

I don’t know how much I have perverted the original model here, but understood like this, I can see how it could be a good and useful metaphor. Not least since we all have a sense of how hard it is to change the course of a river.

On the other hand, as models go, it doesn’t seem particularly new or trendy, so I’m probably missing or misunderstanding something.

Is it our rapidly evolving understanding of the “geology” in the metaphor that gives it new and deeper relevance and usefulness as a model?

madasario's avatar

Very cool expansion on the concept - continual learning as weather patterns. Repeated weather patterns yield large-scale landscape patterns. Eventually the weather wears us away to nothing and we die. ...ok, you're right, it's possible to take the analogy too far. ;) But I like it.

Chris K. N.'s avatar

Thanks. 😀

Felix Auvray-Stiritz's avatar

Thank you for writing this!

Sinity's avatar

> I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?” My best guess is something like “some kind of artificial space representing all possible thoughts, analogous to thingspace or ML latent spaces”, and “free energy”

Nothing specific would make sense in 3d anyway, I imagine.

"To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it." ~Geoffrey Hinton

Expand full comment
Giles English's avatar

> my train of thought got stuck in a loop

Are loops a scientific thing? I strongly suspect they explain what BDSM folk call "subspace".

Expand full comment
Metacelsus's avatar

I wonder how many times the "Waddington landscape" figure has been reprinted.

Expand full comment
awanderingmind's avatar

Interestingly Geoffrey Hinton (often referred to in the literature as one of the 'grandfathers' of deep learning) wrote a highly cited paper in the 1980s called 'How learning can guide evolution' (available online here: https://icts.res.in/sites/default/files/How%20Learning%20Can%20Guide%20Evolution%20-1987.pdf) about canalisation (in the evolutionary sense).

I tried to get an evolutionary ecologist friend of mine to collaborate on a paper on whether learning (at the level of the individual organism) could lead to divergent canalisation (which might conceivably eventually lead to speciation), but I couldn't sustain his interest.

Expand full comment
demost_'s avatar

"I can’t answer the question “what do the dimensions of the space represent?”"

Thingspace is really complicated. But fwiw, some researchers created a neat little toy space, the bird space, where the two main dimensions are the length of the neck and the length of the legs. You can look at some moderately funny pictures in the discussion at "Grid Cells for Conceptual Spaces?"

https://www.sciencedirect.com/science/article/pii/S0896627316307073

(You may need to use sci-hub to access it if that's legal for you.)

Expand full comment
madasario's avatar

I was better able to understand the metaphor when I thought of type A (inference) canalization as the parameter of how dense, round, and smooth the cursor is - i.e., how strongly it wants to roll downhill. A bowling ball would over-canalize, a feather would under-canalize, and maybe something like a d12 would be a good balance. Different aspects of the cursor could correspond to different aspects of inference, for example weight-to-surface-area ratio might correspond with being easily distracted (under-canalized during inference) because a breeze of random noise could blow you around.

Then, type B (learning) canalization is about the material of which the landscape is made. Sometimes it's putty and deforms at the slightest input, sometimes it's steel and resists change. Sometimes it's, what? Literally plastic? Maybe something like warm, soft plastic that resists light/spurious forces but responds predictably to stronger forces. So, literally plasticity.

As for people historically poo-pooing metaphors about cognition, and being confused by the canal metaphor, it's important to remember that a metaphor is not the truth. To use a metametaphor: imagine your current understanding is at position A, and the truth is at B. A direct line of reasoning from A to B is difficult to cross because it's, I don't know, full of mountains and dense forests. But you notice a path that seems to kind of head in the direction of B. That path is a metaphor. It can get you closer to B, some other A' where the metametaphorical path ends. Then, as you explore A', you might find another metaphor - a path that will get you even closer to B, some other A'' where you can repeat the process. Eventually you arrive at B, though probably you had to do a little bit of metametaphorical bushwhacking for the last leg. Then, if B is a useful concept space, you open up a gold mine, and if you want to share the wealth you might build an explanatory road from the original A to B in the form of a blog post or sequence. If the gold mine is big enough, people will write whole textbooks and turn your road into a superhighway. I like metaphors.

Expand full comment
Maxwell E's avatar

This was helpful for me, thanks for the "cursor material interacts with the substrate" metaphor.

Expand full comment
madasario's avatar

I'm glad it helped! It mixes well with Jay Rollins' comparison to Marble Madness in a different comment.

Expand full comment
Is_Otter's avatar

Reminds me of this article that claims that the reason for critical periods in human development is the ability to forget. (eg kids learn new languages easier than adults because they can change their landscape easier)

https://www.businessinsider.com.au/learning-and-aging-2013-1

It's basically an explore/exploit trade-off where adult neurons are in exploit mode and cannot unlearn their existing pathways, whereas children are in explore mode and are changing their landscape.

Expand full comment
Martin Blank's avatar

My first thought is that while I love the "fitness landscape" analogy, it just seems so simple in the context of thoughts. Surely it is a multidimensional landscape, and many landscapes, not some single unified one. I even think in some very real sense there are multiple "balls" rolling around, shrinking and growing in mass/speed/conscious salience/other properties.

Which is perhaps a different enough situation that it means the whole analogy isn't that helpful except in some specific cases.

>This doesn’t mean canalization is necessarily bad. Having habits/priors/tendencies is useful; without them you could never learn to edge-detect or walk or do anything at all. But go too far and you get . . . well, the authors suggest you get an increased tendency towards every psychiatric disease.

This strikes me as somewhat interesting, but not very different from where we were. I had a big struggle with depression in my teens and early 20s for say 10 years. And I suspect if my life situation was bad I would still struggle with it. And to some extent the depression/suicidality at least partially arises out of things which otherwise are generally quite useful habits.

Tendency to dwell on and overanalyze things. Ability to focus. Lack of concern for others' feelings. Which is all great if you are doing forensic accounting about a big mess of a project.

Less helpful if you are doing a forensic accounting of your personal failings/injustices.

Expand full comment
Mark_NoBadCake's avatar

Another great write-up! Especially liked the history bit: clockworks-switchboard-computer; of course all mechanical metaphors fall short because we are organic.

"The calculating machine works results which approach nearer to thought than any thing done by animals, but it does nothing which enables us to say it has any will, as animals have.”

--Blaise Pascal, Collected Works of Blaise Pascal, c.1669

Also, regarding the "sort of person who repeats “I hate myself, I hate myself”": Having read E. Easwaran's Passage Meditation/Conquest of Mind and now Philokalia: The Bible of Orthodox Spirituality [A. M. Coniaris] - which are [all] surprisingly (to me) pluralist - I'd suggest that the Mantram/Jesus Prayer would eventually, if not quickly, ease the increasing case load of our mental health professionals.

[Like giving a dog a chew toy so he stops tormenting the cat!]

[E2: like giving Tim Urban's primitive a rubber ball in exchange for the crude torch we all have. Better to facilitate playing well with others.]

Expand full comment
Moon Moth's avatar

Or the nembutsu.

Expand full comment
Mark_NoBadCake's avatar

I hadn't run across that term until now but YES. Thanks.

I read every major religion has this practice. So many useful aspects of native Western religion have been abandoned!

Expand full comment
Moon Moth's avatar

I gotta say, I think the Orthodox are onto something when it comes to doing this in the person's native language.

Expand full comment
Mark_NoBadCake's avatar

Brought up as a mainline protestant I migrated to 90's Evangelicalism and later married (and divorced) a ~Catholic. I have to say that Orthodox is a much more straightforward, positive and practical presentation of Christianity. [Disclaimer: the following tentative conclusions not directly from the books mentioned but a variety of reading and experience] Sure there were issues before the Great Schism but that event and the Reformation seem to have been hard turns toward consolidation of power and a trend toward wealth generation with an emphasis on fear/[guilt].

It's like they gradually sawed off the lower rungs of the ladder to sensible religion.

That's where I'm at in my reading right now anyway!

Expand full comment
Mark_NoBadCake's avatar

Dorothy, I think, would agree:

"And particularly in the matter of Christian doctrine, a great part of the nation subsists in an ignorance more barbarous than that of the dark ages, owing to this slatternly habit of illiterate reading. Words are understood in a wholly mistaken sense, statements of fact and opinion are misread and distorted in repetition, arguments founded in misapprehension are accepted without examination, expressions of individual preference are construed as ecumenical doctrine, disciplinary regulations founded on consent are confused with claims to interpret universal law, and vice versa; with the result that the logical and historical structure of Christian philosophy is transformed in the popular mind to a confused jumble of mythological and pathological absurdity.”

--Dorothy Sayers, Mind of the Maker, 1941

Expand full comment
snav's avatar

Hey guys, here again to explain how this connects to Freud. In his paper Beyond the Pleasure Principle, Freud describes two opposing drives, the life drive and death drive. The latter has the worst name of all time, leading to immense persistent misinterpretation -- death drive can be thought of simply as the _compulsion to repeat_ (named because the organism _wants to die in its own way_, which entails surviving until it is _ready_ to die. Stupid, I know). Life drive, on the other hand, is akin to seeking novelty, named as such because he tied it back into the drive toward the creation of new life, sex, creativity, etc.

Now, without getting too deep into the details of his theory, does the idea of "compulsion to repeat" not harmonize with "canalization" as posed above? Indeed, Freud recognized repetition compulsion as the basis of neurosis (~= all the mental disorders except dark triad and psychosis and MAYBE schizophrenia altho it requires a deeper epistemic claim too, going off a Lacanian reading). We arrive at the same conclusion as Scott: that the "symptom" (such as the thought loop Scott described) is ultimately a result of compulsion to repeat, in this case in a sort of meta sense insofar as the recognition of the compulsion to repeat itself became the object of repetition / symptom itself.

Ultimately the entire theory is rooted in a strange thermodynamic reading of biological processes ("the psyche is like an amoeba?"), but turns out that cybernetics is cybernetics regardless of digital or analog.

Expand full comment
Platypuss in Boots's avatar

So the life drive vs. death drive thing is the explore-exploit tradeoff?

Expand full comment
JDRox's avatar

Statements like this make me lose my already tentative grip on what canalization is supposed to be: "overfitting is too much canalization during inference, and underfitting is too little canalization during inference. Likewise, over-plasticity is too much canalization during training, and over-stability is too little canalization during training."

Wouldn't it be better to say, to borrow Scott's ball/cursor analogy (where, I take it, the ball moving = inference), that overfitting is the ball having too little mass (too little momentum, and so following the canals too rigidly), underfitting is the ball having too much mass (too much momentum, and so not following the canals enough), over-plasticity creates too much canalization (during training), and over-stability creates too little canalization (during training)?

Expand full comment
Deiseach's avatar

Well, the illustration does fit well with how my brain feels: overheated in spots and full of holes.

As to the rest of it - mmmm. They do seem to be a little too enthusiastic about "This fits in Quadrant A and that fits in Quadrant B", meanwhile I'm going "So it seems to me that I have a bit from A, a bit from B and I'll take something from C as well, thanks".

Expand full comment
c1ue's avatar

Lots of good observations.

AI: the problem with AI is that it is fundamentally limited in training scope. Human consciousness is unquestionably built on animal consciousness, which is in turn built on various stages of living-being consciousness. Whatever the means or method - these levels of consciousness have been honed over billions of years in an absolutely objective environment of survival.

AIs - neural nets or whatever - have none of these characteristics. They are tuned by their designers both by design and by training data. Their existence is artificial as is their "survival".

As someone who has worked with really brilliant people in engineering - the one thing I can say is that it doesn't matter how smart anyone is - they NEVER accurately anticipate what reality brings.

Neural net models are like climate models: useful but any expectation of reality from them is foolish to the extreme.

Expand full comment
Eremolalos's avatar

It's true that the way our mind works has been honed over the years by the need to survive, but that kind of honing does not necessarily optimize for perceiving things clearly or making accurate predictions. For instance evolution has left us with a tendency to divide the world into our tribe/dangerous strangers, because there was a survival advantage to bonding strongly with those in your tribe, and erring on the side of negative assessment of all others, because there was a high cost & low benefit to giving people from outside the tribe the benefit of the doubt. We are vulnerable to all kinds of biases that probably have their origins in circumstances where it was advantageous to have them, but that now just make us misperceive things.

In the 1950's, psychologist Paul Meehl argued persuasively that actuarial prediction is better than clinical judgment for many purposes, and in later parts of his career did a pretty good job of demonstrating that he was right. Here's an example: One of the things doctors treating a schizophrenic think about is how good the chances are that the person will return to normal. There are all kinds of ways of predicting this: the psychiatrist's clinical judgment, many many structured interviews and tests such as the Rorschach, nursing staff's observations, measures of how severe the person's symptoms are. But it turns out that for males, the answer to one question predicts outcome better than anything else: Has he ever been married? I could give you lots of other examples, but you get the idea.

OK, so AI is sort of like actuarial prediction. It is dumb in some ways, but it is not afflicted with the biases human beings are. I think in some situations, including the making of some predictions, AI's relative freedom from biases will make it more likely to be right than even the smartest, best-trained person. Last I knew AI was reading MRI's (or maybe it was X-rays) to assess for one particular kind of pneumonia better than radiologists (by radiologist I don't mean radiation technicians, I mean MD physicians specializing in radiology.). I expect that in a few years it will be more accurate than all MD's in reading all medical images, judging biopsies, and predicting the likelihood that a person with a certain history and symptom cluster currently has each of several possible disorders. I'm actually not sure what you mean by "expectation of reality" about AI, but these things seem to me to be instances of it.

Expand full comment
c1ue's avatar

I've never said that the outcomes of evolution are perfect.

What I've said is that they have proven to work by the most ruthless and objective standard imaginable.

As for your counter-example: medicine is called "practice" for a reason. There is no reason whatsoever to believe, even today, that even the most modern methods are science as opposed to practice. Nor, from what I have seen, has the reality of "machine diagnosis of X-rays" actually shown real skill over trained experts.

And last AI. It is not that AI is "dumb" in some ways - the real problem is GO as in the 2nd half of GIGO. GIGO arose in the era where you only got garbage out when you inserted garbage in; the iron-clad premise was that the part between GI and GO - the computer - would work exactly as it was supposed to.

AI, in contrast, can GO (garbage out) at any time for any reason.

A big part of this is because there are no tiered, tested layers of improvement.

Natural vision in animals - whether plants sensing light or animals seeing movement or people seeing colors and shapes - is built of multiple levels which have, once again, been tested at each stage a staggering number of times over an enormous array of circumstances over a vast stretch of time.

It is more of a wonder why people think that AI can even achieve reasonably accurate machine vision given the fundamentally limited training set.

The notion that AI - as in general intelligence - is going to arise when we don't even really understand how human intelligence arose or how it works is equally ludicrous.

So while it is certainly possible that "magic occurs here" - the problem is that there are no indications whatsoever that so-called AI even for limited use cases like machine vision, self driving, reading internet search results and formatting into paragraphs i.e. LLMs - actually are more than simply awesomely Rube Goldbergian Eliza routines: form without the substance.

Expand full comment
Eremolalos's avatar

"I've never said that the outcomes of evolution are perfect."

I did not think you had.

"The notion that AI - as in general intelligence - is going to arise when we don't even really understand how human intelligence arose or how it works - equally ludicrous."

I never said I thought it was going to.

"So while it is certainly possible that "magic occurs here" . . . actually are more than simply awesomely Rube Goldbergian Eliza routines: form without the substance.:"

I did not say or imply that magic occurs here. Did you think I had? My point was that while we are far smarter and more mentally flexible and inventive than AI, there are intellectual tasks where simple kinds of processing using a large body of data outperform human skill, even when we intuitively feel human skill will do better. There is a smart kind of dumbness, and that's what AI has. The example I gave of actuarial prediction of a schizophrenic's chance of recovery is an example of smart dumbness -- i.e. an answer arrived at without insight, and in fact without consciousness, that is correct more often than answers that grow out of our insight and far-ranging conscious observations. Reading radiologic images is pattern recognition, and AI does that very well. It's doing that when it does next-word prediction. Feed it a million radiographs and the correct read of each and it will become extremely good at predicting the correct read of various patterns on the image.

" Nor has the reality of "machine diagnosis of X rays" actually shown real skill over trained experts that I have seen."

"The sensitivity of AI was 99.1% (95% CI: 98.3, 99.6; 1090 of 1100 patients) for abnormal radiographs and 99.8% (95% CI: 99.1, 99.9; 616 of 617 patients) for critical radiographs. Corresponding sensitivities for radiologist reports were 72.3% (95% CI: 69.5, 74.9; 779 of 1078 patients) and 93.5% (95% CI: 91.2, 95.3; 558 of 597 patients), respectively"

(from https://pubs.rsna.org/doi/10.1148/radiol.222268)

Expand full comment
c1ue's avatar

Re: reading X-rays

That's funny - your belief in AI's validity in reading X-rays, based on one publication, is a view not shared universally:

1) https://www.nature.com/articles/s41591-021-01595-0

"Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations"

2) https://www.itnonline.com/content/medical-ai-models-rely-shortcuts-could-lead-misdiagnosis-covid-19

"Medical AI Models Rely on 'Shortcuts' That Could Lead to Misdiagnosis of COVID-19"

But the biggest problem is GO: because these AI models are 100% opaque, an error can occur at any time for any reason with zero chance of detection. Do human doctors spit out randomly wrong results also? What are comparable rates?

Expand full comment
Eremolalos's avatar

Why assume my belief is based on one publication? It is actually based on several plus one meta-analysis type overview. However, you sounded as though you were not aware that AI outperforming doctors was even a thing: "Nor has the reality of "machine diagnosis of X rays" actually shown real skill over trained experts that I have seen." So I just went on google scholar and grabbed the first article I saw. You know, I'm going to end this conversation because you are unpleasant to talk to. Your first reply to me was somewhat irritable and contemptuous in tone, and seemed to have a number of unwarranted conclusions in it about stoopit stuff you thought I thought. Now you're telling me I'm funny. Well thanks! And you, my dear, are a cutie-pie when you're self-righteous.

Expand full comment
c1ue's avatar

So in other words - you don't have actual first hand experience with AI X-ray diagnosis.

Do you actually have experience with AI of any kind?

The problem I have with any paper, meta studies, or whatever is that AI is not a disinterested study area. There is an enormous amount of money as well as "publish or perish" academic skin in the game involved, on both sides of the debate.

Furthermore, the places that publish the "AI is good" papers are consistently those that push AI and software engineering in general.

I note again the problems with any AI whether neural net or machine learning: opacity and reliability.

The entire point of having medical school and residency and licenses is to attempt to find and promote those people who demonstrate skill and reliability in the practice of medicine based both upon a foundation of education and ongoing learning, plus experience.

AI has none of these things. From a pure business standpoint - I believe 100% that AI is going to be used to displace skilled practitioners despite these issues, but it doesn't mean I have to like it or that I will agree that the outcome is going to be positive.

All I can hope for is AI will be used responsibly: for example, as a review mechanism for medical practice as opposed to a cheap unreliable replacement.

Expand full comment
Tossrock's avatar

> As someone who has worked with really brilliant people in engineering - the one thing I can say is that it doesn't matter how smart anyone is - they NEVER accurately anticipate what reality brings.

What a strange assertion. Engineers anticipate that the load on the bridge will be adequately supported by the structure given its geometry, material strength and environmental conditions. The bridge continues to support its designed load; quod erat refutatum? Sure, bridges do sometimes fail, but that's very different than saying "[brilliant engineers] NEVER accurately anticipate what reality brings". I think it's more accurate to say that brilliant engineers very often accurately anticipate what reality brings, within the context of their design space, and substantial departures are more the exception than the norm (cars mostly drive, planes mostly fly, buildings mostly stand, etc).

Expand full comment
c1ue's avatar

Engineers creating a bridge always have prior art of other bridges - which serve as the proof of reality.

Take any engineer and inject them into a fundamentally different situation, and all bets are off. My personal experiences were with semiconductors - every new technology node introduced new and unanticipated major effects that had to be taken into account in order to successfully design functional chips in that new technology.

Other examples include the regular explosions on SpaceX launch pads. While rockets have been successfully launched for decades - there is still a vast difference between proven designs and new ones.

Expand full comment
Cremieux's avatar

Scott, just to be clear, the GenomicSEM results for the p factor do not support it: they suggest it's not real. There is also only negative support for the d factor. Both are the results of researchers assuming causality when the causal evidence is absent or negative. Please read Franić's work in which she described the meaning of common vs independent pathway models: https://psycnet.apa.org/record/2013-24385-001.

The part that matters most is on page 409: "Barring cases of model equivalence... a latent variable model cannot hold unless the corresponding common pathway model holds." You can use twin or family data or GenomicSEM to assess whether there is a common pathway model. If there is, the model will be consistent with the common causation implied by the phenotypic factor model.

This model holds for intelligence. It does not hold for general psychopathology, personality, or disease.

Expand full comment
Oig's avatar

"Every other computational neuroscientist thinks of autism as the classic disorder of over-fitting."

Perhaps it is both. Maybe the underfitted plasticity of information that the autistic brain can interpret demands an overfitted style of cognition and behavior. This explains everything from mundane autistic behaviors, like stress resulting from routine-breakage, to more severe sensory overload.

EDIT: I just read that autism as dimensionality post and he beat me to it:

> We can expect many of the behavioral and cognitive symptoms of autism to be compensatory attempts to reduce network dimensionality so as to allow structures to form. The higher the dimensionality and lower the default canalization, the more necessary extreme measures will be (e.g. “stimming”). “Autistic behaviors” are attempts at cobbling together a working navigation strategy while lacking functional pretrained pieces, while operating in a dimensionality generally hostile to stability. Behavior gets built out of stable motifs, and instability somewhere requires compensatory stability elsewhere.

Expand full comment
Colin Morris's avatar

There's a parameter called temperature which you can vary when doing inference on an LLM (or any other probability model). Using the analogy of a ball moving across an energy landscape, you could think of temperature as contributing some "pep" to the ball, randomly nudging it in different directions, such that sometimes it can climb up out of a valley. As temperature goes to infinity, the limiting behaviour is for the ball to move in a way that completely disregards the contours of the energy landscape - for an LLM, this means emitting a stream of words chosen uniformly at random, with no regard to the particulars of the model's learned weights. A lower temperature leads to more predictable behaviour. The limiting behaviour at temperature 0 is to always emit whichever token the model assigns the highest probability.

For practical applications of language models, low-ish temperatures are preferred, as higher probability sequences of words are generally judged by humans to be qualitatively superior. However, as you decrease the temperature, you increase the risk of a curious pathological behaviour. At low temperatures, LLMs tend to get stuck in loops, reminiscent of Scott's experience on research chemicals. For example, if I sample from the 7 billion parameter LLaMA model conditioned on the prefix "Roses are red," and set the temperature to 0.1, it gives the following output:

Roses are red, violets are blue, I'm so glad I'm not you. I'm so glad I'm not you. I'm so glad I'm not you. I'm so glad I'm not you. I'm so glad I'm not you.

Most LLMs you interact with, such as ChatGPT, have some heuristics strapped on at inference time (such as a "repetition penalty") to avoid falling into these thought loops.
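
For anyone who wants to poke at the mechanics, here's a minimal sketch of temperature sampling with a made-up five-word vocabulary and made-up logits (not real LLaMA weights, just an illustration of how the temperature knob reshapes the distribution):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "next-token" logits over a tiny made-up vocabulary; in a real LLM
# these come from the model's forward pass at each step.
vocab = ["you", "blue", "red", "glad", "violets"]
logits = np.array([3.0, 1.5, 1.0, 0.5, 0.2])

def sample(logits, temperature):
    # Temperature rescales the logits before the softmax:
    # T -> 0 approaches greedy argmax, T -> infinity approaches uniform.
    z = logits / max(temperature, 1e-8)
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

for T in (0.1, 1.0, 10.0):
    draws = [vocab[sample(logits, T)] for _ in range(10)]
    print(f"T={T}: {draws}")
# At T=0.1 nearly every draw is "you" (the top logit); at T=10 the draws
# are close to uniform over the vocabulary.
```

The repetition-loop failure mode comes on top of this: once a phrase becomes the highest-probability continuation of itself, a near-greedy sampler will keep picking it.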

Expand full comment
Karl K's avatar

I can't shake a suspicion that we're just playing out the "faces in the clouds" thing with each of these analogies for brains. Anthropomorphizing computers, more than describing our actual brain function.

The kinds of "algorithms" that can "run" on such different hardware feel like they *must* have important effects on the downstream output/behavior. Some evidence: with the computer analogy, we spend a lot of effort using computer terminology to describe how *unlike* normal computers brains are: super-parallel, non-digital(!!!), decentralized, enormous numbers of input/output channels, oddly small context window (working memory), etc.

Maybe the chain of metaphors is mostly-wrong the whole time, but just achieves better fit with more terminology to stretch the metaphor? In other words, over-fitting the metaphor itself?

Typo: You wanted "bytes" instead of "bites".

Expand full comment
Viliam's avatar

If all mental illnesses are correlated, can we calculate a generic "Crazy Quotient", analogous to IQ?

I am curious what the entire population's CQ curve would look like. Is it one-sided, like most people are normal or mostly-normal, and then there is the tail of increasingly crazy people? Or rather is it two-sided, with normies in the middle, crazy people on one extreme, and... whatever is the opposite to crazy... on the other extreme?

If it is the latter, what do the extremely "anti-crazy" look like? Are they exceptionally sane people, super successful in real life? Something like "high IQ and without any mental burden"? Or is the opposite of crazy also somehow... weird? Such as, extremely *boring*?

(I thought about calling it Sane Quotient, rather than Crazy Quotient; just like we have Intelligence Quotient rather than Stupidity Quotient. But that's actually the question: is "sane" the opposite extreme to "crazy", or is "sane" in the middle and the opposite is something else?)

Expand full comment
Eremolalos's avatar

In one system I know of, those with no trace of diagnosable mental illness, not even a few mild symptoms, are labelled the Hyperconventional. The person who developed the scale -- can't remember who, now -- believed that those people were more prone to psychosomatic illnesses.

Expand full comment
Eremolalos's avatar

Scott, a research chemical sent me into a very similar loop. My guess is that the loop started when I realized that I could not remember some actual thing. But the loop itself was, “There was a moment of trying to remember something. Wait, this is that moment. No, there *was* a moment of trying to remember something. Wait, this *is* this moment. No there *was* a moment of trying to remember something . . .” I was sitting on the end of a couch with my face in my hands, and the person with me asking what was up, and all I could say was, “It’s a trying to remember.” Yeah, that was hideous.

Expand full comment
Calcifer's avatar

Wouldn't a maximally underfitted model classify images as 'dogs' and 'not dogs' randomly, in the proportion that these have in the training data?

Expand full comment
Viliam's avatar

I think it still must classify the training data correctly, otherwise the training isn't complete. The question is how it reacts to anything outside the training set.

(Low confidence; I am not an expert.)

Expand full comment
Your name's avatar

What does this say about co-morbid illnesses in different quadrants, e.g., ASD and MDD? It's less bad than simply saying that ASD is too much bottom-up processing and MDD is too much top-down prediction, despite the ASD population having high rates of MDD, but your summary didn't explain if comorbidities like this were addressed, unless I missed something.

Expand full comment
Eremolalos's avatar

"Embarrassingly for someone who’s been following these theories for years, I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?” My best guess is something like “some kind of artificial space representing all possible thoughts, analogous to thingspace or ML latent spaces”, and “free energy”, but if you ask me to explain further, I mostly can’t. Unless I missed it, the canalization paper mostly doesn’t explain this either."

Actually I think it makes sense for you to feel confused about this. The problem is the landscape illustrations, not your head. These landscapes with funnels in them *look* like plots of equations with 3 variables. In fact the color gradations in the funnels suggest that a 4th dimension is also being represented, via color. But the color changes are just ways of dramatizing the differing depth of the funnels. There's a color spectrum that runs from green through yellow, orange and pink to purple, and that maps onto depth. Green is "sea level," purple is the depth the deepest funnels reach. So the tips of the shallower funnels are orange or yellow.

OK, so there's no color dimension; color just = depth, i.e. place on the z-axis. As for the x- and y-axes, they don't represent a damn thing. The piece of land as a whole represents all the possible things somebody can do or think, but the north-south and east-west dimensions don't mean anything here.

As for the z dimension -- well, there are 2 z-related things that matter: Slope, which mathematically is how much altitude you lose over a given horizontal distance, represents strength of attraction: The steeper it is, the harder it is not to slide down. Depth is how far down someone is. I guess how far below sea level someone is represents how far from being neutral or average they are about whatever the funnel they are in represents. Note that psychopathology need not be involved in someone's sliding down to the bottom of a funnel. Let's say you're traveling with someone who has a deep interest in a certain Danish philosopher, and you happen to discover that you're in the town where he was born, and his home has been converted to a museum, and all his books are there and people are welcome to sit and read them. Your friend is going to slide down the slope to that museum really fast, right? And it will be hard for him to tear himself away when it's time to head on to Sweden.

The landscape also has built into it some assumptions, and these are not identified, and easy to overlook. I think, but am not sure, that that is not altogether fair. (1) All the low areas are represented by funnels. i.e. they are smallish spots on the horizontal landscape where the land suddenly goes downhill very steeply on each side. There are of course other kinds of low areas possible in landscapes. For instance there could be a big valley with a shallow slope on most or all sides. If its circumference is large enough, it can even be as deep as the deepest funnels on the landscape we are given. And in fact most of us have things like valleys in our lives: For instance somebody who likes a lot of outdoor activities does. They’re attracted to a large variety of things of that kind: outdoor photography, kayaking, camping etc etc. But the slope into them isn’t steep. If an opportunity to do some outdoor thing presents itself, they’re inclined to do it, but they don’t feel an irresistible pull. And even after a bout of bingeing on outdoor activities (i.e., going to the deepest part of the valley) it is not hard for them to stop if there’s another appealing possibility, or an obligation. (2) There are no hills or mountains. Mountains and hills would correspond to things somebody avoids (since downhill represents attraction, uphill would represent avoidance). Why have a model that does not represent avoidance? It’s as much a part of what guides human behavior as attraction. We all avoid auto accidents, and in fact we tend to avoid things lower down than mountains, things that take us partway to an accident, such as driving on an icy road. There’s also plenty of pathological avoidance, and to me mountains seem like a better representation of it than valleys. So for instance if somebody has illness-related OCD, they may avoid public bathrooms, restaurant food, hospitals, etc etc. Most people with OCD have numerous things they avoid. I do see that you could also represent their disorder as easily sliding down the slope of thoughts about “it could make me sick” and getting stuck at the bottom of the funnel.

I guess my take on the landscape is that the authors had a lot of options for representing human attraction and avoidance, but chose the one that fit best with canalization.

Expand full comment
Gordon Tremeshko's avatar

My thoughts also got stuck in a loop one time when I was on mushrooms back in college. It wasn't fun. I guess I'm just glad to hear this has happened to someone else in a similar context.

Expand full comment
LGS's avatar

Factor analysis is more or less fake, yes. I've been saying this for a long time, as have people like nostalgebraist (as you know).

But mostly, Scott, when you say "the statistical structure doesn’t look like a bunch of pairwise correlations, it looks like a single underlying cause", I would encourage you to think about what you think this means. If you could even DEFINE what "looking like a single underlying cause" means, you'd be doing better than most psychometricians, who never bother to define this. You cannot tell causation from correlations, obviously, so what is it you're looking for in your correlation matrix?
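
To be concrete about what a defensible definition would even have to look like: a single-factor model implies every correlation is a product of two loadings (r_ij = λ_i·λ_j), which is a testable constraint on the matrix — but it is still only a constraint on correlations, not evidence of causation. A toy simulation with made-up loadings:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate five observed variables all driven by one latent factor g,
# plus independent noise (the classic single-factor generative model).
loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
g = rng.standard_normal(n)
noise = rng.standard_normal((n, 5)) * np.sqrt(1 - loadings**2)
X = g[:, None] * loadings + noise

R = np.corrcoef(X, rowvar=False)
# Under this model every off-diagonal correlation is just the product of
# the two loadings: r_ij ~= loading_i * loading_j.
print(np.round(R, 2))
print(np.round(np.outer(loadings, loadings), 2))
# One testable consequence ("vanishing tetrads"): for any four variables,
# r_12*r_34 ~= r_13*r_24 ~= r_14*r_23.
print(R[0, 1] * R[2, 3], R[0, 2] * R[1, 3], R[0, 3] * R[1, 2])
```

And of course correlation matrices generated by plenty of multi-cause structures can satisfy the same constraint, which is exactly the problem.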

Expand full comment
MellowIrony's avatar

> I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”

Probably going to mangle the neuroscience parts of this, but taking the "human brain ≈ artificial neural net" analogy completely literally:

Each dimension of the space at training time = one weight of the neural net ≈ strength of one synapse ?≈? strength of association/priming between one pair of concepts (what thoughts cause--or inhibit--what other thoughts?)

Height of the landscape at training time = loss function = whatever the heck represents that-which-is-reinforced in the brain during learning (I think dopamine and other neurotransmitters might be responsible for representing the slope of the landscape, and the height might just be implicit??) ?≈? some combination of predictive accuracy (right is low, wrong is high) and emotional or hedonic valence (pleasure is low, pain is high)

Each dimension of the space at inference time = one dimension of a recurrent hidden state ≈ rates of action potential spikes ≈ how strongly a "single" "concept" is activated (actual concepts are known to be distributed across many neurons, iirc)

Height of the landscape at inference time = value or score function of a search algorithm (many neural nets don't have this part) ?≈? some subnetwork of the synapses, possibly concentrated in the frontal lobe (our very own personal mesa-optimizer!) ≈ how well one achieves goals or executes plans (lower is better).

I'm not sure how seriously to take this, but at least it highlights some obvious gaps in the metaphor when it comes to actual brains: Why do we have so many different neurotransmitters? What's the role of hormones? Where does instinct/genetics come in? Is motor control learned differently from prediction, and if so, can the two be represented by a single weighted loss function?

Expand full comment
quiet_NaN's avatar

One way to sanity check that model would be comorbidities. Schizophrenia being anti-correlated with autism sounds possible, sure. But borderline being anti-correlated with anorexia or depression feels wrong to me. Of course, you can add epicycles to the model a la "with BPD, we treat anorexia as self-harm rather than a condition of its own".

How does learning even work with non-artificial neural networks? I assume there are different processes on very different timescales involved. Learning to play a game could start as simply storing the rules in the state of general-purpose short-term memory neurons, and using general-purpose neurons to apply them. Eventually I would expect that the brain will generate application-specific neurons by adjusting the synaptic weights and perhaps establishing new connections?

The underfitting/overfitting example turns on an asymmetry between dogs and not-dogs. It gets more complicated if one trains a NN to distinguish dogs from cats instead. An overfitting NN might remember all the dog and cat pictures and simply reply "50% dogness" to everything else? What would an underfitting network do? Arrive at the rule "Black/white blobs are cats, brown blobs are dogs"?

I would assume that the network sizes required for overfitting are much larger than the ones for underfitting. If you allocate more neurons for learning anything than a neurotypical human, would your brain not run out of neurons?

If you get threatened by a man X on a Tuesday night in the subway, underfitting might look like promoting the prior "men are dangerous", a healthy update might be "men in the subway at night may be dangerous" and overfitting might be "man X may be dangerous when encountered in the subway on a Tuesday night". The relative neuron requirements feel like 1:50:1000.

Expand full comment
Eremolalos's avatar

Just from clinical experience, my guess is that borderline is highly correlated with depression, and moderately with anorexia.

Just looked it up. In the one study I grabbed, 80% of those with BPD were also diagnosed with depression. Odds ratio of depression for BPD vs. no-BPD was 220! Anorexia was not among the diagnoses included in study. It's here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5225571/#:~:text=BPD%20is%20a%20complex%20clinical,3%2C%2015–19%5D.

Expand full comment
Randall Hayes's avatar

This kind of landscape mapping is all over evolutionary biology, where they call them "fitness landscapes," and has been growing in neuroscience for at least 30 years. My last paper as a researcher in 2005 was on the technical methods involved in trying to fit parameters to these huge multidimensional datasets.

https://scholar.google.com/citations?user=uWZ5lvsAAAAJ

Expand full comment
Ori Vandewalle's avatar

Not quite sure "the brain is like a computer we designed to be like a brain" is a metaphor...

Expand full comment
Arthur Juliani's avatar

Hi Scott,

Thanks for this write-up. This is Arthur Juliani, the first author of the ‘Deep CANALs’ paper. I am glad to hear that you find some of the extensions that we make to the canalization model useful. I realize that the section attempting to link psychopathologies to concepts in deep learning theory is the part of the paper which is perhaps most speculative. It was also the part of the paper which I was most hesitant to include. Ultimately it felt worthwhile to make the leap in order to follow the implications of the Inference/Learning and over/under-canalization through to their clinical implications, which would be in psychopathology.

In the case of autism in particular there are a couple papers which we reference that point to underfitting and lack of plasticity as both being characteristic of the disorder. On the other hand, as you rightfully point out, there are counterarguments to that perspective (and counterarguments to those counterarguments). While on the one hand this suggests that the framework that we propose here might be missing something (and I am sure that it is, on some level at least), on the other hand if it contributes productively to an ongoing dialogue regarding the best way to make computational sense of psychopathology then I see it as valuable.

As you put so well, the metaphors we use to understand the mind have become increasingly sophisticated over time. I certainly don’t think that deep neural networks provide some final isomorphism for the brain, but they do provide a useful set of new conceptual tools which we didn’t have access to before. I likewise look forward to whatever more useful metaphor comes along next to displace them.

PS: Since the initial preprint of the paper we have uploaded a revised version with a slightly adjusted final section discussing clinical implications (and a revised 2x2 matrix figure to go with it). It contains some of the more nuanced discussion around autism which you also included here.

Best wishes,

Arthur Juliani

Expand full comment
Your name's avatar

Dr Juliani,

What does this research say about co-morbid illnesses in different quadrants, e.g., ASD and MDD? It's less bad than (past research?) simply saying that ASD is too much bottom-up processing and MDD is too much top-down prediction, despite the ASD population having high rates of MDD, but I think I'm missing something.

Thank you!

Expand full comment
Incentives Matter's avatar

This discussion seems a good way to also think about system 1 and system 2 thinking.

System 1 involves very steep canals, something totally automatic as in the walking example Scott uses.

System 2 is much smoother, easier to drift into different shallow canals as we take our time to consider things and potentially end up in different places.

Expand full comment
Tony's avatar

"I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”"

The dimensions correspond to the numbers that you need to completely specify the state of a brain, for a fixed set of neuronal connections and strengths. Given this state, you could in principle compute its future evolution (mental experience), possibly subject to some random noise and sensory inputs (which might push the state around).

The height of the landscape at a given point indicates how fast the system leaves that state. Also, if the system evolves over time subject to noise for long enough, the height of the landscape is related to the inverse of the probability of finding the system at each point (assuming the process is ergodic; typically Pr(x) = C exp(-h(x)), where x is a given point in the landscape - a vector of many dimensions - h(x) is the height of the landscape at x, and C is some normalizing constant).
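
A quick numerical illustration of that last relationship, on a toy 1-D landscape (nothing brain-specific): let a noisy walker follow the terrain for a long time and its long-run occupancy comes out roughly proportional to exp(-height).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "landscape": two valleys of different depths.
xs = np.linspace(-3, 3, 121)
h = 0.1 * xs**4 - xs**2 + 0.3 * xs   # height at each grid point

# Noisy dynamics that respect the landscape (a Metropolis random walk):
# always accept downhill moves, accept uphill moves with prob exp(-dh).
i = 60
visits = np.zeros_like(xs)
for _ in range(1_000_000):
    j = min(max(i + rng.choice([-1, 1]), 0), len(xs) - 1)
    if h[j] <= h[i] or rng.random() < np.exp(h[i] - h[j]):
        i = j
    visits[i] += 1

# Long-run occupancy roughly tracks exp(-h): deep points are visited most
# (roughly, since this is a fairly short chain).
occupancy = visits / visits.sum()
boltzmann = np.exp(-h) / np.exp(-h).sum()
print(np.round(occupancy[::20], 3))
print(np.round(boltzmann[::20], 3))
```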

Expand full comment
Phil Getts's avatar

Could my terrible memory be due to a deficit of canalization?

My memory for arbitrary facts and data is far below average. I've never learned to remember to shave in the morning, and need to hang a mirror next to the door out of my house. When I meet new people, I often ask them their name 3 times within a few minutes, and yet still forget it within minutes after ending the conversation. If I need to go to a different location to do something, I need to write down what I mean to do before going, because if I walk 200 feet, to a location which it isn't unusual for me to be at, I'll forget on the way what I was going to do about 90% of the time. Sometimes I pull a book off the shelf with the intention of finally reading it, and discover it's already full of notes in my handwriting. I often can't tell, when watching a movie, whether I've seen it before. I've offended many people by not remembering their names, sometimes after knowing them for years and meeting them dozens of times. There are a handful of words I must look up the spelling of literally every time I use them, despite having looked up their spelling literally hundreds of times each.

On the other hand, I seem to be much more "creative" than most people in some ways. I've long wondered whether both my poor memory and my creativity might be due to what, in machine learning, we would call a simulated annealing temperature set too high, which would seem to have the same effects as a canalization deficit. This results in a search which easily gets out of local minima and traverses great distances on the landscape, but fails to settle down into a stable state.
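
For concreteness, the knob I mean is the acceptance rule in simulated annealing: an uphill move of size dh is accepted with probability exp(-dh/T), so the temperature T sets how easily the search climbs out of whatever valley it is in. Toy numbers:

```python
import numpy as np

dh = 1.0  # height of the barrier around some local minimum (arbitrary units)
for T in (0.1, 0.5, 1.0, 5.0):
    # Metropolis-style acceptance rule used in simulated annealing
    print(f"T={T}: accept an uphill move with probability {np.exp(-dh / T):.3f}")
# T=0.1 -> ~0.000: the search almost never leaves its valley (heavy canalization)
# T=5.0 -> ~0.819: it hops out of nearly any valley - a wide-ranging search
#                  that never settles, the "temperature set too high" case above
```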

On the other other hand, in some cases I can't remember something because some other memory is too strong. I have one friend whose name I can never remember; I can only remember that his name isn't David. I had a housemate whose face I can never remember, because I always remember a former co-worker's face when I try to remember the housemate's face. This would seem to be poor memory due to too-strong canalization.

Expand full comment
Hyolobrika's avatar

Speaking of "relaxed beliefs under psychedelics", I wonder if sexual interests can be changed in that way as well (but only if the person wants it, of course).

Regardless, it might be useful for curing paedophiles.

Expand full comment
Leo Abstract's avatar

I was doing fine reading this article until I fell into a canal in which I disagree with Scott about the nature of borderline personality disorder. This caused catastrophic forgetting and overwrote in my mind whatever else the article is about. I'd better go look at some pictures of golden retrievers again.

Expand full comment
FractalCycle's avatar

Can confirm, I have ADHD and trouble with overly-canalized thought-loops (especially regarding productivity "thought-stoppers" that I rarely consciously examine enough to find ways to bypass/solve them).

Also, I had the same experience as Scott w.r.t. horror-level thought-looping while on chemicals.

Expand full comment
GR's avatar

> there is no perfect equivalent of bits or bites or memory addresses

Typo: bytes

Expand full comment
Turbognome's avatar

This is one of those posts I hate to read, as I have had similar ideas floating around in my brain for the last year or so and must now face the fact that I am not as original as I thought.

Another factor to add to the mix is regularisation. In ML, if we find a model is overfitting, we may add some penalties to the loss function in order to make it generalise better. One of these types of penalties is L1 regularisation, which effectively limits the number of inputs to each node in a neural net. The loss is penalised in proportion to the absolute size of each weight, which tends to drive weights to exactly zero, so the model essentially has to "justify" each non-zero connection with improved performance. The stronger the regularisation, the heavier the penalty, the more the performance has to improve to justify the connection.

When taken too far, this leads to models with overly mechanistic rules, which sounds like autism to me. If autism were related to some analogue of L1 regularisation, we'd see behaviour such as precise analytical thinking, obsession with a handful of topics, inability to take in a wide range of information at once, etc. I claim this also explains an autistic draw towards linear-information objects such as numbers and letters, subtitles, car plates, etc. It's a more comfortable domain for a model with only a handful of connections.

I've written a bit about this here a month ago: https://reasonedeclecticism.substack.com/p/autism-and-l1-regularisation

We might also see schizophrenia as L2 regularisation (which would be my next post). A schizophrenic person doesn't have precise or mechanistic thinking, instead L2 forces a model to weight all inputs the same. So a schizophrenic might have one line of rigorous analytical evidence for the argument that the doctor is trying to help them, but also observed the doctor tapping his pen 8 times on his desk which is clearly evidence of a global conspiracy to lock them up. The L2 model must give both pieces of evidence the same weight.

L1 and L2 regularisation are responses to overfitting, and therefore we could argue that autism and schizophrenia are both adaptations to an overfitted brain. Therefore we'd expect to see some risk factors shared, and some driving in opposite directions.
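
To make the qualitative difference concrete, here's a toy sketch of the two penalties on a plain linear model (made-up data, nothing neural): L1 drives the connections it can't justify to exactly zero, while L2 shrinks everything a little but almost never to exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 inputs actually matter
y = X @ w_true + 0.5 * rng.standard_normal(n)

lam, lr = 30.0, 0.001

# L2 ("ridge"): penalise lam * sum(w^2).  Closed form: (X'X + lam*I)^-1 X'y.
w_l2 = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# L1 ("lasso"): penalise lam * sum(|w|).  Proximal gradient descent (ISTA):
# take a gradient step on the squared error, then soft-threshold toward zero.
w_l1 = np.zeros(d)
for _ in range(5000):
    w_l1 = w_l1 - lr * (X.T @ (X @ w_l1 - y))
    w_l1 = np.sign(w_l1) * np.maximum(np.abs(w_l1) - lr * lam, 0.0)

print("L2 weights:", np.round(w_l2, 3))  # all shrunk a bit, (almost) none exactly zero
print("L1 weights:", np.round(w_l1, 3))  # the spurious inputs get driven to zero
print("exact zeros - L2:", int(np.sum(w_l2 == 0)), " L1:", int(np.sum(w_l1 == 0)))
```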

The "obvious" analogue of ADHD would be a dropout layer, however I haven't thought through whether that is actually a real connection or just a surface-level pattern-match on "forgetting stuff".

Expand full comment
Bobdude's avatar

To address the footnote, you can think of the dimensions of the space in the same way you might think of the dimensions of "phase space" (i.e. you don't, probably, but if you do, each axis is just a quantization of every relevant describable property of the object). One might graph the phase space of a pendulum using the dimensions of angular position and angular momentum. Associated with this graph is a certain description of reality -- if an object has an angular momentum, its angular position will change with time; and gravity pulls everything downward, changing an object's angular momentum in a certain way depending on its position. These can be codified into differential equations, and, at least for this specific example, you can describe a contour according to its "Lyapunov function" such that, if you left a ball to roll down this contour, it would correspond exactly to the evolution of the state of a pendulum over time.

The example you gave of being stuck in a repetitive thought seems similar to the part of the contour describing how, if you spun a pendulum around really fast, in the absence of friction, it would keep spinning the same way forever.

To start graphing the phase space of the brain, you could start similarly. Some level of belief in X, some level of belief in Y; some level of sleepiness, some level of happiness; and so on, and so forth. To describe the brain fully would require a staggering number of dimensions -- but the visual intuition in the abstract space describing the brain is very similar to the pendulum example above.
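
Here's a minimal numerical version of the pendulum example (unit mass, length and gravity): without friction the "spun really fast" state keeps its energy and loops forever; with friction the energy - the quantity playing the role of height - only goes down.

```python
import numpy as np

def simulate(theta, omega, damping, steps=20000, dt=0.001):
    # Phase-space state = (angular position, angular velocity).
    # d(theta)/dt = omega ;  d(omega)/dt = -sin(theta) - damping * omega
    # (semi-implicit Euler: crude, but fine for an illustration)
    for _ in range(steps):
        omega += dt * (-np.sin(theta) - damping * omega)
        theta += dt * omega
    return theta, omega

def energy(theta, omega):
    # Kinetic + potential energy; with friction this only decreases,
    # so it acts like the "height" the state rolls down.
    return 0.5 * omega**2 - np.cos(theta)

start = (0.1, 5.0)  # pendulum spun around really fast
print("energy at start:          ", round(energy(*start), 3))
print("after 20s, no friction:   ", round(energy(*simulate(*start, damping=0.0)), 3))  # ~unchanged: keeps spinning
print("after 20s, with friction: ", round(energy(*simulate(*start, damping=0.5)), 3))  # much lower: settles toward the bottom
```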

Expand full comment
David Gretzschel's avatar

Thanks for helping clarify this "modern energy landscape representation" and making my thinking on this less vibes-based :)

I am still struggling a bit to understand what "depth" means in the "depth + steepness = gravitational pull"-analogy. My current assumption is that "depth" is simply the same as free energy. So every point in this multidimensional space is assigned a "free energy" or "surprisal level" according to the Free Energy Principle.

https://en.wikipedia.org/wiki/Free_energy_principle

Would you agree with that interpretation?

Expand full comment
Bobdude's avatar

My apologies, I hadn't brushed up on the relevant math in a long time and had a slightly misleading explanation, earlier, which has now been edited -- energy has little to do with it. I will say that the Free Energy Principle sounds like a partial description of whatever differential equations describe the state of the brain; namely, that in response to unexpected stimuli, the natural state of the system is to evolve in a way that similar stimulus produces less "surprise". I am definitely not one to ask for a more detailed understanding, though. It's been a long time since I've touched stuff like this...

Expand full comment
nelson's avatar

Are the Necker cube and the Rubin vase examples of canalization you can viscerally experience?

Expand full comment
mordy's avatar

“what do the dimensions of the space represent?”

I think the most literal interpretation is that the state of each neuron or maybe even each individual synapse is its own dimension. What else could it really be?

As bad as human intuitions are for multidimensional spaces, they’re even worse for nonlinear multidimensional spaces. I do wonder if we’re not just confusing ourselves by referring to these systems as being “spaces” with “dimension” at all.

There’s a default intuition when talking about spaces with dimension that free movement through the space is possible, or that you can vary the dimensions independently. This is obviously not true of the brain. The neurons and synapses upstream and downstream of the dog-detector-neuron(s) will tend to be in certain configurations prior to and immediately after the detection of a dog. There is no dog-detection without some sufficient amount of wet-nose-detection and/or floppy-ear-detection, etc. In other words, these variables are not really independent, and their interdependency makes the dimensionality much more “sheet-like”. This is true even prior to “canalization.”

Expand full comment
Timothy's avatar

There are some problems with the clockwork, telephone switchboard, computer and finally deep learning analogies. The first three are all kinds of hardware, while deep learning is software. And we actually know what kind of hardware a brain is: a computer. Not the kind of computer that's made out of silicon and has lots of transistors, but a computer as in a machine that computes. Every brain could be perfectly simulated by any iPhone or laptop or by Babbage's difference engine, a mechanical calculator. We could also make a brain out of telephone wires or a steam engine. Every universal computer can simulate a brain (as well as any other physical system). So the process of finding better analogies for the brain is finally over. In the unlikely chance that the brain is a quantum computer there might be some more work to do, but the brain is most likely just a classical computer.

What software the brain is running is a different, harder question. Maybe it is just a deep learning program, but probably it is something more complicated; otherwise our digital deep learners would probably be a lot smarter than they are.

Expand full comment
Belisarius Cawl's avatar

Canalization looks like a form of dimensionality reduction to me

Expand full comment
Catmint's avatar

Of course there's a general factor correlating mental and physical diseases diagnosed.

Willingness to go see a doctor.

Expand full comment
Patrick's avatar

Anecdotally, I find the model in all its glory compelling. I am a person who makes strong inferences on little data, I have always noticed that I am very good at forgetting, and I have bipolar disorder. Of course, that kind of mindset makes it very easy to pick up new grand models, ignoring everything else I have learned, and apply them to things, so I'm not shocked that I like it.

Expand full comment
Arthur Juliani's avatar

Thanks for this question. There is indeed a lot of co-morbidity between different psychopathologies. This was actually one of the motivations behind the original canalization paper, where they attempted to provide a single explanation for this fact. The reason we think that there can be co-morbidity while still maintaining that there are different computational mechanisms behind MDD and ASD, for example, is that they may reflect different kinds of canalization in different functional brain networks. The visual system in adults, for example, is highly canalized, whereas the hippocampal system is much more plastic. As such, it doesn't make sense to talk about the whole brain being canalized (or not), and therefore an individual can develop co-morbidities which may reflect seemingly opposing dynamics, just in different functional (sub)networks within the larger system.

Expand full comment
pythagoras's avatar

"When a zealot person refuses to reconsider their religious beliefs..." Now I want to know if the intended edit was to switch to the word zealot or person.

Expand full comment
João Pedro Lang's avatar

The table in the paper says catastrophic forgetting (over-plasticity) is UNDER-canalization during training, but the text says over-plasticity is OVER-canalization; which is it?

Expand full comment
João Pedro Lang's avatar

Since catastrophic forgetting seems to be changing the landscape to destroy old canals and entrench new ones, I suppose it could be both

Expand full comment
Elvon's avatar

Because looping is a thing, maybe it's better to think of this in terms of a vector field rather than a terrain map? Balls don't roll in looping downhill circles forever (unless they're trapped on some Penrose stairs), but vector fields can have loops that allow this. Thinking in this way also solves the "what does the height represent" problem. It's not height, it's an arrow. And the arrow points to what thought you will have next, given the thought that you are having currently.
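
A minimal sketch of the difference, assuming a made-up two-dimensional "thought state": a height map (gradient field) always settles somewhere, while a vector field with a rotational part can loop forever:

```python
import numpy as np

def flow(state):
    x, y = state
    r2 = x**2 + y**2
    # Rotation plus a pull toward the unit circle: trajectories end up looping forever.
    return np.array([-y + x * (1 - r2), x + y * (1 - r2)])

state = np.array([0.1, 0.0])
dt = 0.01
for step in range(5000):                  # crude Euler integration of the flow
    state = state + dt * flow(state)

print(state, np.hypot(*state))            # radius settles near 1: a limit cycle, not a resting point
```

Unlike the ball-on-a-landscape picture, the trajectory here never comes to rest; it keeps circling, which matches the "next thought given the current thought" reading of the arrows.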

Expand full comment
Michael M's avatar

I wanted to share something about "trapped priors" somewhere on ACX, and maybe this is as good a place as any.

5 years ago I had an episode of psychosis and did some property damage ("misdemeanor vandalism"). The court, recognizing this was probably something I could get help with (as they should in more cases than they do, I think), put me on probation for nearly 2 years. During this time I had to take urine drug tests, and the stress around the whole process caused me to have a very hard time urinating.

It is now years later, and my urinary hesitation never improved. Every day I have at least a couple of episodes where it either takes more than 2 minutes to urinate or I am completely unable to urinate, even though it should not be stressful at all anymore because the stimulus that made it stressful is long gone.

As unlikely as it is, I do hope Scott sees this post so he can use it as an example somewhere if he needs one, because I think it is a close-to-perfect demonstration of the phenomenon. It can be traced to a single original situation, and it has an irrefutable (physical, even) component. Plus it even passes the Bryan Caplan test: if you held a gun to my head, I would not be able to urinate - you would have to kill me.

Expand full comment
cyril maury's avatar

Fascinating article. Provided inspiration for this piece on how canalization can help avoid path dependency in organizations: https://stripepartners.substack.com/p/42d577ad-f4ab-4b52-bdaf-18020bf39277

Expand full comment
Fraser's avatar

Claim without any substantiation, just vibes: this framing feels strictly worse than the previous ones (trapped priors, etc.). The metaphor of some high-dimensional vector traversing an energy landscape is general enough that it *can* explain pretty much anything, but I don’t feel like the lens of canalization gives me any more predictive value.

Expand full comment