180 Comments

Huzzah! I was going to comment about this exact analogy. (I went through a *very* intense Marble Madness phase in middle school.) Thanks for doing it better than I would have.

The term 'canalization', as far as I know, has its origins in genetics...

https://en.wikipedia.org/wiki/Canalisation_(genetics)

This explains the whale cancer metaphor...

It does, and to my knowledge it has long been considered an unhelpful, or perhaps just wrong, metaphor. So it seems strange to reuse it for this.

The military usage (funneling an approaching enemy through a narrow, known path where e.g. you've got your artillery all dialed in) seems apropos: something you very much want to avoid happening to you, and try to do to your enemy. I don't have a great link; https://www.militaryfactory.com/dictionary/military-terms-defined.php?term_id=831 is the best I could find with a quick google search.

Computational biologist / mathematician Sergey Gavrilets had a paper about the same term in the context of hormone signaling and sex / gender determination.

http://www.intergalacticmedicineshow.com/cgi-bin/mag.cgi?do=columns&vol=randall_hayes&article=010

Stuart Kauffman used it to refer to the formation of attractors in a dynamic network, in his 1992 book Origins of Order (which, as I've mentioned on this blog, I think is about the 4th-greatest scientific book of all time, just after Newton's Principia, Origin of Species, and A Theory of Information).

Picky comment: there isn't a 1:1 mapping between HIV and AIDS; there are a very few people who live with HIV and don't get AIDS, IIRC.

Re canalization: I think there needs to be a theory that addresses how different canals do or do not intersect/connect, depending on circumstances, priors, etc.

Lots of people get HIV and never get AIDS these days, thanks to antiretroviral drugs. You can't cure the HIV but you can keep it from killing off all your immune cells.

All AIDS is caused by HIV, but that's just because we've defined it that way - there are other things that cause immunodeficiency. If we wanted we could define ADHD symptoms caused by genetics as a different disorder from ADHD symptoms caused by head injury, we just haven't done so.

1. There are people who seemingly innately don't get AIDS, even when infected with HIV.

2. What we're both describing, then, is not a 1 to 1 relationship.

yes, I agree, I was not trying to contradict you, just to elaborate/add to the discussion

> I’m now suspicious that factor analysis might be fake

I mean, ya! It's Principal Components Analysis's less-credible cousin.

More seriously, it's a notoriously hard thing to interpret. At some point, it feels like reading tea leaves. If you're generally skeptical of null hypothesis significance testing as a paradigm, these types of analyses should almost be read as negative evidence.

Here is a skeptical take on the specific “d factor” paper. The blog author is a reputable academic:

https://eiko-fried.com/does-the-d-disease-factor-really-exist/

Great article, thank you for the link. In my opinion it summarizes very well the problem with the concept of a general mental health factor.

As I recall, Cosma Shalizi pointed out that if you do factor analysis on some set of a thousand uncorrelated things, you will end up with a rather decent false positive general factor just by happenstance. I'm sure there is some way to account or compensate for this (analogous to Bonferroni correction or whatever), but it suggests that factor analysis is a treacherous beast.

I have also found CS's argument unconvincing. He argues that if you create new variables by summing a large number of uncorrelated variables (for example, by randomly selecting two subsets of 500 variables among 1000), then these new variables will be correlated even if the original variables are not. He concludes that these correlations are spurious. I think that they are not: the fact that your new variables are created by sampling a large proportion of the original variables creates a perfectly real correlation.

But I do agree that this does not indicate that the factor resulting from these correlations has a biological meaning. In my opinion it is just a summary of many variables.
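For what it's worth, the subset-overlap effect described above is easy to check numerically. A minimal sketch (the variable counts, seed, and names like `idx_a`/`idx_b` are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 5000, 1000
X = rng.standard_normal((n_obs, n_vars))   # 1000 mutually uncorrelated variables

# Two "scale scores", each summing a random half of the variables.
idx_a = rng.choice(n_vars, size=500, replace=False)
idx_b = rng.choice(n_vars, size=500, replace=False)
a = X[:, idx_a].sum(axis=1)
b = X[:, idx_b].sum(axis=1)

overlap = len(set(idx_a) & set(idx_b))     # ~250 shared variables in expectation
r = np.corrcoef(a, b)[0, 1]                # comes out near overlap / 500
print(overlap, round(r, 2))
```

With roughly 250 of the 1000 variables shared between the two sums, the correlation lands near overlap/500 ≈ 0.5: perfectly real, as the comment says, but an artifact of the scoring scheme rather than evidence of any latent factor.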

If over-canalization is caused by or associated with having too few synapses, then under-canalization would probably be associated with autism. Autism is associated with having too many synapses; it's not the root cause, but more like the intermediate or maybe proximate cause. So one thing to look at would be if autism breaks this pattern of every psychiatric condition being correlated with every other.

Autism and ADHD are very correlated.

For a bunch of other disorders like OCD, bipolar, and BPD, it's unclear whether the disorders are actually correlated, related in some more complex way, or just frequently misdiagnosed as each other, any of which could also be framed as "none of these categories cleave reality at its joints".

And then there's schizophrenia... lots of people have characterized it as the opposite of autism, but it does sometimes co-occur.

Yeah, that's what I thought; it seems like pretty strong counter-evidence to at least that particular version of this theory.

Keep in mind that, formally, the thing the medical system calls "Autism"/"ASD" is definitionally just a checklist of symptoms[1] that you have to match, not a real "thing" in concrete terms. (Part of why ADHD is so correlated is that its checklist is close to being a subset of ASD's.) When talking about mis-/co-diagnosis statistics, it's much more map than territory. It describes symptoms which might not share an underlying cause.

The same goes for most other mental disorders as well, e.g. the overlap between cluster-B disorders. OCD specifically has guidance about only being diagnosable if the obsessive thoughts don't match generalized anxiety, or body dysmorphic disorder, or trichotillomania, etc., even if you have all of those combined and your diagnosis would make more sense as "maybe I just generally have OCD".

[1] A checklist which people who almost definitely "have" autism often fail! Especially if they're afab.

What would the shape of this landscape be?

As a biologist, I imagine something akin to Waddington's landscape.

https://onlinelibrary.wiley.com/cms/asset/632f03d0-a547-44f5-9034-4574f281bd2c/mfig002.jpg

Here, stem cells are represented as being on the top and traveling down the canals into different pools of differentiated cells (neurons, skin cells etc). Top level stem cells are very plastic and can become anything they want. Downstream level stem cells might choose between a few different paths. And differentiated cells at the bottom are stuck with their cell type.

For a differentiated cell to change its cell type, we make it climb the canal back up to the "stem cell branching point" and let it choose a different branch. It can't just climb over the walls of the canal; the local gradient slope is too steep.

Perhaps psychedelics let people revert back to the "undifferentiated state" where their psyche has the opportunity to choose a different branch of development?

IIUC, the main thing psychedelics do is make it dramatically easier to reweight or rewire synaptic connections; if we were talking about a chemical free-energy landscape, this would represent raising each neuron's "temperature" so it could climb otherwise insurmountable energy barriers.
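For intuition, the temperature picture can be sketched with a toy Metropolis walk on a double-well landscape; the potential E(x) = (x^2 - 1)^2, the step size, and the temperatures here are all invented for illustration:

```python
import math, random

def count_well_crossings(temp, steps=20000, seed=1):
    """Metropolis walk on E(x) = (x^2 - 1)^2, which has two wells
    ("canals") at x = -1 and x = +1 separated by a barrier at x = 0."""
    rng = random.Random(seed)
    energy = lambda x: (x * x - 1.0) ** 2
    x, crossings = -1.0, 0          # start in the left well
    for _ in range(steps):
        x_new = x + rng.uniform(-0.3, 0.3)
        d_e = energy(x_new) - energy(x)
        # Accept uphill moves with probability exp(-dE/T): raising the
        # "temperature" makes barrier crossings exponentially more likely.
        if d_e <= 0 or rng.random() < math.exp(-d_e / temp):
            if x * x_new < 0:       # the walker changed wells
                crossings += 1
            x = x_new
    return crossings

print(count_well_crossings(0.05), count_well_crossings(1.0))
```

At low temperature the walker stays canalized in the well where it started; at high temperature it hops the barrier freely.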

If we focus on the "training" (synaptic/connectomic) rather than "inference" (transiently self-reinforcing latent activity/attention) side of the latent neural activity space, this is definitely somewhat analogous to what the Yamanaka factors do to the epigenomic landscape, by opening up chromatin across the board and allowing the cell to pick a new and radically different stable epigenetic state. Since neurons treated with psychedelics in vivo don't go through a stage where they lose all their synapses before making totally new ones, I'd say this process is more like transdifferentiation than de/redifferentiation.

Maybe the intuition is different if you focus on the spookier activity-pattern side; I haven't done enough research to opine on whether the psychological state of a psychedelic user resembles the "pluripotent" open-mindedness of a child, or if it's a sui generis state. This perhaps recalls the longstanding controversy over the extent to which iPSCs recapitulate ESCs.

It’s hard for me to even grasp the paradigm that autism could be an underfitting issue; it seems obvious that it’s more like “I didn’t understand you expected me to do X because my standards for X are these many overfitted criteria that must be met to make things very clear.” Can anyone paint a picture of what autism from underfitting would even look like?

The first thing I can think of is that neurotypicals can learn new social behaviors from a few examples, while autistics generally need far more, and more explicit, training.

I agree it feels more like autism is about overfitting, but not really in the sense you describe - it's more like way more noise is getting mistaken for signal across the board. This explains sensory issues, meltdowns/shutdowns, etc, and it explains social difficulty when you realize how much of the social information that comes off of a person is actually noise and not signal.

Great points, thanks!

> The first thing I can think of is that neurotypicals can learn new social behaviors from a few examples, while autistics generally need far more, and more explicit, training.

That might be because NTs are already trained on lots of data. So it's like taking a GPT-3/4 and teaching it something by giving it a few examples, while autists learn from scratch (or rather from shakier foundations).

From "The Intense World Theory – a unifying theory of the neurobiology of autism":

> (...) right amygdala activation is enhanced in autistic subjects during face processing when controlling for attention, that is when the autistic subjects pay attention to the stimuli. (...) Autistics also spent less time fixating the eyes region (deviant eye gaze is a core feature in autism). Moreover, in autistics, but not in controls, the amount of eye gaze fixation was strongly correlated with amygdala activation when viewing both, inexpressive or emotional faces.

> This suggests that eye gaze fixation is associated with emotional and possibly negative arousal in autistics and this could explain why autistics have “trouble looking other people in the eye.” Eye contact and watching the facial expressions are one of the first signs of cognitively healthy infants, are natural to people, and serve to build the basis for successful navigation through a social environment. For an autistic person however, these stimuli may be just too intense or even aversive to cope with and hence they are avoided.

> Obviously, continuous avoidance of a special class of cues will consolidate feature preference processing and prevent learning in this domain, thus some later developed social awkwardness and inappropriateness described in autism may be due to this lack of acquired knowledge.

> (...) we propose that the amygdala may be overtly active in autism, and hence autistic individuals may in principle be very well able to attend to social cues, feel emotions and even empathize with others or read their minds, but they avoid doing so, because it is emotionally too overwhelming, anxiety-inducing, and stressful.

> The Intense World Theory proposes that amygdaloid hyper-reactivity and hyper-plasticity may in particular provoke a disproportional level of negative emotions and affect in autism, such as elevated stress responses and anxiety as well as enhanced fear memory formation.

> it's more like way more noise is getting mistaken for signal across the board.

This can be seen in an underfitting sense as well though: your brain learns to calibrate to filter out irrelevant details, but if it underfits this filter, then everything just blasts through and overwhelms downstream systems.

If you think of socialization as the extraction of rules for behavior from a lifetime of observations, the autistic person extracts few, very general rules from the same observations out of which a more neurotypical person would extract many more specific rules.

That’s the angle I was looking for. Thanks!

Autistic people are generally pretty uninterested in other people. As infants they arched their backs and struggled when held. They do not show normal attachment behaviors. Seems to me that failure to give a shit what other people do, think, or want could account for failure to learn social skills and whatever else other people try to teach them just as well as a wiring problem that interferes with their extracting rules.

Either condition would lead to underfitting, I agree, but I don’t think it’s right to say that autistic people don’t give a shit in general. Many do and try hard and still have difficulties parsing the cues and associating them with moods and intentions and expectations.

I would think of it in terms of the other end of the autism spectrum; the kind where you're unable to function normally because you can't process sensory data properly. So you're so severely underfitting that you can't see a dog as a dog, it's just a confusing fast-moving area of colours that your brain can't manage to resolve into a dog.

Then you could imagine a less severe version of the same thing; your brain is okay at seeing dogs, but has trouble unpacking more complex stuff like facial expressions and tone of voice.

Looked at from the opposite perspective, neurotypicals overfit to local social rules so much that they frequently mistake contingent customs of their social circle for universal laws of morality. Autists are much more able to see the range of possible customs, but much less good at learning what customs are locally in force.

While I know that the current practice is to talk about an autistic spectrum, I am very dubious about considering people with an Asperger-type profile as having a mild case of autism. People with autism, in the usual sense, are not going to be saying things like "I didn’t understand you expected me to do X because my standards for X are these many overfitted criteria that must be met to make things very clear.” Many of them are not going to be saying anything at all, because they are mostly or completely non-verbal. They are generally uninterested in other people, and often very attached to stim activities like twitching a shiny object in a certain way. So they're going to be squatting in a corner with their back to you staring at the spoon they're twirling, not bugging you with their over-valuing of tiny details and their need for perfection.

Thanks, good point that this paradigm may not play well with the “autism is a spectrum” paradigm. Appreciated!

A little late to the party here, but I have recently found this paper to be quite elucidating for how to conceptualize autism:

https://www.researchgate.net/publication/366427549_A_cybernetic_theory_of_autism_Autism_as_a_consequence_of_low_trait_Plasticity

It shows how common autistic traits can result from low plasticity. I think an important aspect of this might be how plasticity gets conceptualized: in common parlance, it seems to be understood as the ability to integrate new information, whereas from a cybernetic perspective, it might be understood as the tendency to seek out new information, i.e. the tendency for exploration. From that perspective, autists don't fail to learn social stuff because they are unable to integrate the information they get, but because the information they might be able to get seems unimportant and hence not worth exploring. This would further mean that it is neither an issue of under- nor overfitting.

I personally am a depressed dissociated autistic with ADHD, and I know lots of other people with disorders from multiple of the model's quadrants.

If the deal with that is "your neurons are just sort of generally bad at forming accurate beliefs", you'd expect these people to be intellectually disabled or entirely out of touch with reality, but all of the people I'm thinking about are high IQ, lucid, and able to manage their own affairs.

This applies even if we just think about one disorder at a time - depressed people may be stuck in the belief that everything's hopeless, but they don't seem to get stuck in all their beliefs equally across the board.

It may well be that depression represents a canal having formed, but why that particular canal is probably complicated and interesting and won't fit into a cute small model like this.

The diagnoses I have received throughout my life have gone from F3g -> F1g -> F2g -> F4g -> F2g, without ever receiving the same diagnosis from two different doctors. Until the age of 24, I was at my most lucid, capable, functional, intelligent etc. without medication. I am pretty sure that all of this is about as real as Solomon's demons.

Might it be the case that the brain has different regions which do different things, and that one region (e.g., learning through physical input) might be a bit larger than optimal while another region (e.g., parsing human expressions) might be a bit smaller than optimal? Although it's probably not discrete regions that do one thing or another, but more about how much area of the brain is shared among which functions.

Of course, "optimal" is always in the context of a particular set of environments. A change in the environments results in something else being "optimal".

I think one idea that could explain a lot is observation/analysis resolution, or the bandwidth/compute available for digesting inputs. In this thought experiment I'm assuming that this is a genetic trait that doesn't evolve over time.

If you've been bandwidth constrained since childhood you've necessarily developed a robust ability to discern signal from noise. You need to identify the relevant input data before you analyze it, because you couldn't possibly analyze everything.

On the other hand if there's ample compute available for data analysis, you don't have to be so picky. Just accept all the inputs and sort out what's relevant after you make sense of it.

Both are fine on their own, but problems occur when we consider prediction model / reward function formation. From inception you start developing your reward function, what's good and what's bad, and what kind of data predicts good or bad things. You start forming your valleys and peaks.

If you're bandwidth constrained, these models are formed at a somewhat coarse resolution, whereas if you're on broadband your models become very intricate very quickly. The more generalist models are more adaptable to changing circumstances, while the detailed ones rapidly become out of date.

Even though the high-resolution mind has sufficient plasticity to generate updated prediction models, the "good old" reward function still haunts it. While there's new canals that fit current inputs nicely, it just doesn't feel the same as the original valleys etched into memory.

This could lead to various different pathologies:

depression, as it becomes apparent that inputs will never fit the ideal again;

anxiety, because while inputs fit newer reward criteria, there's constant severe mismatch with the legacy criteria;

ADHD, where the brain matches inputs with parts of the model and gets excited, but then grinds to a halt when it finds mismatches;

Borderline disorder, when the mind becomes distrustful of any models based on old data and decides to continuously retrain itself from scratch, and so forth.

Why is it surprising that there would be a general factor that correlates with all physical and mental pathology? Shouldn't we expect factors like general health, mutational load, diet, local air quality, etc. to act as a composite factor that just makes everything about your body work better or worse?

Metanarratives aren't very fashionable these days among the literate set. Unless you're a research scientist/lab rat, then discovering a metanarrative is your favorite daydream.

~*~::~*~oxidative stress~*~::~*~

Karl Smith argues that we should a priori prefer a single cause... for anything resembling an epidemic:

https://modeledbehavior.wordpress.com/2012/01/08/on-lead/

There are many different forms of mental disorders, particularly if we include autism, so it's harder to say if his logic applies to that.

I do find the argument convincing for epidemics, but mental pathologies are not epidemics, and every biological phenomenon is highly multi-causal.

It seems to me the main failing of Fristonian models is this insistence on shoehorning every observation into one explanation. I think it might be an idea coming from evolutionary biology, where fitness is The One parameter that rules them all. But fitness is not a 'real' property of organisms; it is a summary of many different traits.

Here is your insane schizo take for the day: all of these things correspond to computational parameters, but those parameters are not within individual human brains; they are within the simulation that is running us. Whatever intelligence is simulating us suffers from canalization with respect to mapping certain clusters of traits together, then constructs the simulation in such a way that those traits have a common factor. The probability space of physically lawful brain models is probably excessively big relative to the probability space of originally occurring organic human brains in original meatspace. We are victims of the ongoing epistemic collapse, and AI has not, or not yet, remediated it.

This is actually just the same as saying that our minds are running on a physical substrate and it's the physical substrate that performs computation. So yes, to the extent that the canalization model is correct, it exists in the physical world that is "simulating" us, whether there's a universe "outside" the simulation or not.

It's not topographically identical. Also, brains are different things that vary, not one singular platonic mass, whereas most simulations would probably just instantiate a bunch of objects based on a literally identical template, so these things are entirely different.

Point blank, if we live in a universe that exactly fits any model then we live in a model universe.

#include <iostream>

class Brain {
public:
    void think() {
        std::cout << "The brain is thinking about itself." << std::endl;
    }
};

int main() {
    Brain myBrain;
    // Call the think() method on the myBrain instance
    myBrain.think();
    return 0;
}

It seems like the relevant comparison isn't HIV/AIDS to ADHD or depression, it's "flu-like symptoms" or "rash with small, itchy bumps". Maybe I'm confused about how diagnosis for, say, anxiety works, but my understanding is that it's mostly matching a nexus of symptoms. We're aware of some discrete causes for these symptom-nexuses in some cases (e.g. certain types of brain injury) and aware that we're missing others. There are obviously some (at least partially) physical illnesses whose diagnosis is still basically "you have this nexus of mysterious symptoms", but that's not most people's everyday experience with physical ailments.

In this thinking categories like "depression" seem very useful for symptomatic treatment (in the same way that if I have a fever, aches, and stuffiness there are dedicated meds for treating "cold and flu" symptoms) but not very useful for specific causal diagnosis (because there's so many discrete things that can cause effectively identical symptoms).

And yeah, that common factor conclusion seems...suspicious.

Indeed, this is what a "syndrome" is in medicine: it's just a symptom-nexus, and diagnosing a syndrome makes no claims about the cause; it just says "I notice that your symptoms fit this cluster". And most psychiatric diagnoses only go that far.

Which is definitely an extremely useful thing to do, don't get me wrong, but it does point at the fact that in most cases we don't actually know the cause.

(sometimes this is because it's not really worth our time and resources to study - a whole bunch of different viruses cause "cold and flu" symptoms, but most of the time it doesn't actually matter which one it is because regardless the treatment is the same. We could pour a ton of R&D money into specific tests and treatments for each one and save a few QALYs from the "common cold", or we could research other diseases that cause more suffering and/or are easier to understand and develop drugs for.)

It seems to me that if we suppose that a 'well-working brain' has many relatively independent dimensions of 'working well', and that 'insults' to the functioning of the brain typically affect several dimensions simultaneously, then this will produce the observed correlation between the probabilities of different mental pathologies, and thus a common factor that does not correspond to any characteristic of brain functioning but is just a statistical summary.
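This mechanism is straightforward to simulate: give each hypothetical "insult" loadings on several independent dimensions of functioning, and a dominant first eigenvalue (a statistical "general factor") emerges even though none was built in. A minimal sketch, with all sizes and numbers invented:

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_dims, n_insults = 2000, 8, 5

# Each "insult" (infection, toxin, stress...) loads on several random dimensions.
mask = rng.random((n_insults, n_dims)) < 0.6
loadings = rng.random((n_insults, n_dims)) * mask
insult_exposure = rng.exponential(1.0, (n_people, n_insults))
impairment = insult_exposure @ loadings + rng.normal(0, 0.5, (n_people, n_dims))

corr = np.corrcoef(impairment.T)
top_eig = np.linalg.eigvalsh(corr)[-1]     # eigvalsh returns ascending eigenvalues
print("share of variance on the first factor:", round(top_eig / n_dims, 2))
```

The first eigenvalue of the correlation matrix absorbs far more than the 1/8 share it would get if the dimensions were independent, even though the "general factor" here is pure statistical summary.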

This is my favorite explanation. With external stress (insults) as a driving external cause for severity of symptoms in any one person.

Falling into the valley reminded me of the concept of learned helplessness.

https://en.wikipedia.org/wiki/Learned_helplessness

Is there a good reference for a non-medical-professional on the research about a general factor for psychopathology? It strikes me as suspicious on its face, but I'd like to understand the general structure of the research that arrived there.

Canalization in, say, the way Stuart Kauffman uses it in the context of patterns (attractors) in a dynamical system refers to a situation where "most possible inputs will produce a small number of effects". These experiments relate to the idea of the fitness landscape, which itself is quite similar to the idea of an energy landscape seen upside down. The fitness landscape concept also shows what the dimensions and the height of the space represent: the n dimensions represent n genes, and, moving along one genetic axis, the height represents the fitness outcome of all possible combinations with all other genes (or the associated phenotypic expression). Height is strength of the effect, or fitness. For mental states (instead of genes), the height would be "strength of attractor" across all possible combinations of different mental states. The idea is that some mental states are strong attractors - stable mental conditions, some healthy, some less so.

There are two ways to produce strong (stable) attractors in this model: 1. have few interactions between genes / states (if almost everything interacts with almost everything else, then there is no stability; a small change somewhere will instantly lead to a different attractor). 2. Canalization: there may be a lot of interactions between genes / states, but most of these still lead to the same outputs.

So canalization helps with stability - without canalization, there would be many weak attractors in your dynamical system, if all genes / states interact randomly with nearly all others. With canalization, there are fewer and stronger attractors.

Another way of thinking about this is how modularity cuts down on individual interactions between components. When modularity is present, there is less need for canalizing inputs (or perhaps one could say that modularity is implicit canalization).
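The "most inputs funnel into a few attractors" behavior shows up even in a tiny random Boolean network of the kind Kauffman studied. A minimal sketch of an NK-style network (the sizes, seed, and function names are arbitrary):

```python
import random

def make_random_boolean_network(n=10, k=3, seed=42):
    """Each node reads k random inputs through a random truth table."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]

    def step(state):
        return tuple(
            tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
            for i in range(n)
        )

    return step

n = 10
step = make_random_boolean_network(n=n)

attractors = set()
for s0 in range(2 ** n):
    state = tuple((s0 >> b) & 1 for b in range(n))
    seen = set()
    while state not in seen:              # walk until the trajectory repeats
        seen.add(state)
        state = step(state)
    cycle_start, cycle = state, [state]   # 'state' is now on the cycle
    state = step(state)
    while state != cycle_start:
        cycle.append(state)
        state = step(state)
    attractors.add(min(cycle))            # canonical label for this attractor
print(f"{2 ** n} initial states funnel into {len(attractors)} attractor(s)")
```

Typically the 1024 possible initial states collapse into a far smaller number of attractor cycles, which is the stability-through-funneling picture described above.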

I thought from the description that 'canalisation' would only happen during 'training', as it affects the landscape rather than the 'cursor'. But then the paper discusses canalisation during inference - how can this even happen (without being training)? Am I misunderstanding?

If I understand correctly, canalization during training refers to the process of formation of canals in the landscape, while canalization during inference refers to the traversal of the already formed canals by the cursor.

> Once when I was on some research chemicals (for laboratory use only!) my train of thought got stuck in a loop.

Oh man, this happened to me once due to an edible. It was close to the worst experience of my life, perhaps 2nd worst. It was most certainly horrific. I was thinking of the term harrowing at the time. And for me, I didn't come out of it for hours, and the trip itself lasted over a day, though not all of it was as bad as that loop. And even though it only lasted hours, it felt like millennia. Time may as well have stopped, and sometimes I think there's some version of me still stuck back there in time, since time had no meaning to me in that moment. Only the constant cycling of thoughts and escalation in fear, until the fear became its own physical sensation, cycling itself. Don't do drugs, kids.

I know someone who has the same response to cannabis. I'd never heard of anyone else having it. So this is highly interesting. Have you done any research into this effect?

Not really. It never really happened to me when smoking joints beforehand. But then when I had that edible, it happened, and the one time I smoked a joint afterward, it happened again (though not as intense). So I decided to just never smoke weed again at all. I'd love to learn more though, to see what the deal was.

Interestingly enough, that experience made me more aware of my thoughts in different states. For example, I never used to notice how my brain wanders right before I'm going to sleep, but since that experience I do notice it, since it's a similar feeling on a very small level.

Ever tried DMT? Same guy I know says it has basically no effect on him.

Oh man, no! I'm terrified of DMT. I had a friend from high school who did DMT when he went off to college. He ended up in a mental institution for 6 months. He said DMT broke reality for him. In fact, that was something echoing in my mind during my trip, I was thinking about him telling me that. I kept thinking I destroyed my mind, broke my perception of reality.

I had a similar experience and it led me to understand the point of meditation: identifying with my thoughts was, in that moment, extremely clearly a source of suffering, whereas if I unfocused and simply enjoyed the cacophony without trying to interpret it, I would reach a state of bliss.

I accepted the possibility of losing my sanity and stopped fighting the sensation, and the overwhelming fear dissipated. The experience was quite fun after that and I feel I've been a more confident, less neurotic person since.

-- OK, so "canalization" is "habits", or "getting stuck in a rut", or "learning something until it becomes a reflex", or "I could do that in my sleep", or "practice a punch 10000 times". Don't weightlifters also talk about this, in relation to why it's bad to train to failure?

-- I'm skeptical of a clean distinction between inference and learning, when it comes to people. I realize that this is acknowledged, but it doesn't seem grappled with. Also, we do also have things like "moods" - I'm more likely to react to a stimulus one way when I'm in a contemplative mood, and another when I'm in an angry mood. It feels awkward to speculate about another person like this, but my guess about the bad trip was that it put the brain into a state where there was a loop, and then later the state and therefore loop went away, and to the degree Scott came out of it a new person with new perspectives on life, the change was "learning", and to the degree Scott was the same old person, the change was "mood". I've not had any such experiences myself, but some people seem to have reported each type of experience. **shrug**

-- The discussion of Good Old Neural Nets seems solid. It should be noted that the computer versions were originally an attempt at mimicking what the brain does, and how neurons and synapses work. Deep learning dropped some of the superficial similarities to the brain, but may have introduced deeper similarities. (Edit: that was an awkward phrasing.) Is it closer? I dunno, but it seems to work better.

-- Generally, with Good Old Neural Networks, overfitting and underfitting were due to the size of your net. Say you had a training set of 50 pictures of dogs and 50 pictures of non-dogs. If your net was too big, it would overfit by, for example, storing each of the 100 pictures individually, giving a perfect answer on each of them, and giving random noise for other input. If your net was too small, it would underfit by, for example, identifying a pixel that was one color in 51% of the dogs and a different color in 51% of the non-dogs, and basing all decisions off of that one pixel. Those are extreme examples, but they should give the idea.

-- Overplasticity seems like what's going on with the fine-tuning and RLHF turning LLMs from good predictors into wishy-washy mediocrities that will at least never say anything bad. But I don't know much about how that works in practice, with the deeper nets of today.

> they say that overfitting is too much canalization during inference, and underfitting is too little canalization during inference. Likewise, over-plasticity is too much canalization during training, and over-stability is too little canalization during training.

-- I really don't know what to make of that. Both halves sort of make sense, but in different ways that don't seem compatible? I feel like "canalization" is being used to mean too many things at once.

-- The 4-part model is interesting. I do kinda buy the top left square: those things fit together, and feel intuitively to me like being stuck in a rut. I don't know enough about the other conditions to say anything useful.

> Canal Retentiveness

**applause**

> This question led me to wonder if there was a general factor of physical disease

'd' is clearly derived from Con, just as 'p' is derived from Wis, and the combination is related to the point buy. Next!
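The overfitting/underfitting bullet above lends itself to a tiny numeric demo. This sketch swaps network size for polynomial degree (a standard stand-in for model capacity, not the commenter's actual dog-classifier setup): a high-degree fit memorizes the noisy training points, while a degree-0 fit ignores the structure entirely.

```python
import numpy as np

# Toy analogue of overfitting/underfitting: polynomial degree
# stands in for network size as the measure of model capacity.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(10)

def train_error(degree):
    """Mean squared error of a degree-`degree` polynomial fit on the training set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coeffs, x_train)
    return float(np.mean((pred - y_train) ** 2))

# Degree 9 through 10 points can interpolate the noise exactly (overfitting);
# degree 0 is a flat line that ignores all structure (underfitting).
err_over = train_error(9)
err_mid = train_error(3)
err_under = train_error(0)
print(err_over, err_mid, err_under)
```

Training error is guaranteed to shrink as capacity grows here (the model classes are nested), which is exactly why low training error alone can't distinguish a good fit from a memorized one.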

Expand full comment

About moods, what is the thing that made the loop experience so terrible for Scott? How do we think about that? Why didn’t he just experience it as neutral?

In depression or anxiety or OCD, there’s generally this sense, isn’t there, that what is going on is bad (catastrophizing)?

Expand full comment

I dimly recall hearing about some Buddhists experiencing positive loops like that, in meditation. (Or maybe it was someone who was heavily into functional programming?) My uneducated guess is that the phenomenon is sensitive to preparation and also to random initial factors. And also reporting bias - there's no box in the forms for "everything in my life is going great, in a self-sustaining loop".

Expand full comment

The mood part often seems to me to be the suffering we experience in the gap between how we think things should be and how they are. “I should have some control over my thoughts” up against the experience for that stretch of feeling very out of control of one’s thoughts.

Expand full comment

That does seem like the classic Buddhist and Stoic answer...

Expand full comment

Right. I do want to add that I don't think that accounts for all of mood disorder experience or that all mood disorders can be fixed through mental practice. The pain experienced in the gap may be universal to humans, but all kinds of biological and environmental factors may make that gap wider or the pain experienced in it greater or deprive a person of buffering factors that could help at both of those levels.

Expand full comment

I think it's about identifying with your thoughts as opposed to observing your thoughts.

Getting stuck in a thought loop is very different from having your thoughts stuck in a loop, if you catch my drift? If a thought loop is like a spinning top, the experience is completely dependent on whether you're merely observing a top spin, or whether you yourself are spinning uncontrollably.

It's also about how you assign values to experiences. Being stuck in a thought loop is anathema if you value being in control above all else: it's like a Chinese finger trap that only works because you're trying to struggle out of it. If you're fine with losing control, the thought of it happening elicits no fear response, and the loop never forms.

Expand full comment

Yes, unlike essentially set-in-stone artificial NNs, the human brain is greatly amenable to transient chemical influence in its moments of "inference", whether hormone-induced moods or drug-related exotic states. This doesn't seem to mesh with everything being explainable by "canals" within the computational structure of the brain-NN itself?

Expand full comment

Yeah. I haven't read the paper itself but the summary conveys the pop-sci "smell" of speculating on how a newly-discovered mechanism explains a lot of different things. When it probably merely has a varying amount of effect on those things, mostly small.

Expand full comment

> I'm skeptical of a clean distinction between inference and learning, when it comes to people. I realize that this is acknowledged, but it doesn't seem grappled with.

I would also like to read more about this. I think it's worth a follow-up post, if Scott doesn't consider anything else more pressing (though he probably does).

Expand full comment

> Once when I was on some research chemicals (for laboratory use only!) my train of thought got stuck in a loop.

This is a not-uncommon effect of psychedelics, and I feel like it must be analogous to the thing AIs sometimes do where they loop on the same sequence of words for a bit.

Expand full comment

Reminds me of the joke of different professions' "proofs" that all odd numbers are prime. The computer programmer's proof: 3 is prime, 5 is prime, 7 is prime, 7 is prime, 7 is prime...

Expand full comment

I just recently wrote up my thoughts on how such computational models of the mind relate to qualia and self-awareness

https://sigil.substack.com/p/you-are-a-computer-and-no-thats-not

Expand full comment

Local nitpick, globally important, error correction. If my understanding of current LLM design is correct then this statement is incorrect:

> But it changes which weights are active right now. If the AI has a big context window, it might change which weights are active for the next few questions, or the next few minutes, or however the AI works.

The AI doesn't change its weights while it is writing a lengthy output. I believe what is done is that the previous output is fed back in as input for the following token. At the beginning, the input consists of the user's prompt, which results in a single output token. That prompt plus the one output token are then fed back into the AI, which generates another single token. This process repeats until the AI decides it is "done".

The "size of the context window" is, IIUC, how many tokens the AI will accept as input. This means that if you have a long conversation with an AI, eventually you can no longer feed back all of the previous conversation and you have to cut some of it out. The naive solution to this culling is to just drop the oldest tokens. This is why you can make the AI a lot smarter by having it periodically summarize the conversation: the summary functions as a form of conversational compression, so you can fit more "information" into the limited amount of input space (the context window).

So weights don't change with the context, but which neurons get activated and which don't *will* change because the input changes with each new token added to the input.

I say this is a "local nitpick" and "globally important" because this error doesn't change the meaning/message of this article, but it may change the way you think about current AIs slightly!

If I'm wrong I am hoping someone corrects me!
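In rough Python, the feedback loop described above looks something like this. The lookup table and token names are made up for illustration; a real transformer replaces the table with a learned next-token predictor:

```python
# Toy autoregressive loop: the "model" sees the whole context so far
# and emits one token; that token is appended and the process repeats.
# This dict is a stand-in for a real next-token predictor.
NEXT_TOKEN = {
    ("the",): "cat",
    ("the", "cat"): "sat",
    ("the", "cat", "sat"): "<eos>",
}

def generate(prompt, max_context=8):
    tokens = list(prompt)
    while True:
        # Keep only the most recent tokens: a crude context window,
        # analogous to culling the oldest part of a long conversation.
        context = tuple(tokens[-max_context:])
        nxt = NEXT_TOKEN.get(context, "<eos>")
        if nxt == "<eos>":
            return tokens
        tokens.append(nxt)

print(generate(["the"]))  # → ['the', 'cat', 'sat']
```

Note that nothing in `NEXT_TOKEN` changes during generation; only the input (the growing context) changes, which is the point of the nitpick.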

Expand full comment

I noticed that too, but he followed it up with "or however AI works", so close enough lol.

Expand full comment

Hah, yeah, I saw the "or however AI works" as "I acknowledge I don't really get how this works, but this is my current mental model and hopefully someone in the comments will help me improve my mental model". I may have read too much between the lines though. 😛

Expand full comment

Yeah, to my best understanding, that's how LLMs work at the moment. (I also noticed this, but then forgot when I went back to write my response. Per a discussion in the open thread, was I conscious then?)

I think there are currently some LLM designs that work around this by having the LLM maintain an external store of data, but I forget which ones they are.

Expand full comment

I took this to be talking about Multi Headed Attention, which is a key feature of transformer model architecture.

From a programming perspective obviously all your weights are in use all the time, but it doesn't seem wrong to explain MHA as "turning on and off weights depending on the context you've picked up so far". A more accurate way to explain it though would be "upregulating/downregulating particular response modalities based on context".

So using the terms from this article, an 'overfitted' LLM would be very context dependent in its responses. For example if you asked "Why does Paris Hilton not like genocide?" it would use the category of "questions about Paris Hilton" and say something like "Paris Hilton revealed in a recent tell-all interview that she hates confrontation".

An 'underfitted' LLM would make the opposite mistake – it wouldn't appropriately take context into account, so if you asked "Does Adolf Hitler like genocide?" it would bucket this as a general question about liking things and give a generic reply that ignored the specifics of the question: "He likes mystery novels and long walks on the beach".

Obviously this isn't how anyone in ML uses the terms 'overfitting' and 'underfitting'. I'm not even sure that it's possible to get a transformer to make these classes of error, but it might be interesting to try!

Expand full comment

Cool, thanks! I'd been wondering how "attention" would interact with all this. I need to read up on that.

Expand full comment

No problem! https://towardsdatascience.com/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452 looks like a good explanation (Ctrl+F "What does Attention Do?") with the subsequent articles going into more detail on the actual implementation.

I'm not an expert either – I've implemented some toy transformers from scratch and gone through the theory on Coursera, but I don't know that I really grok it all. So I have some reading to do too!

If there's anyone in thread who's actually a specialist, I'd be very happy to be corrected if I've got anything wrong!

Expand full comment

«I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?”»

These things are well-defined in the context of a Hopfield network. Each dimension is simply the state of one neuron, and the "height" is defined by a Hamiltonian - broadly, it can be thought of as the mismatch between the network state and what each individual connection weight is telling the network to do.

As far as I can tell, Friston’s schtick is basically just using Hopfield-style networks as a fuzzy metaphor for all of cognition.

http://www.scholarpedia.org/article/Hopfield_network
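A minimal numeric check of that Hamiltonian (a toy sketch, not Friston's model): store one pattern with the Hebbian rule and confirm that the stored pattern sits lower on the energy landscape than a one-bit corruption of it.

```python
import numpy as np

# Hopfield energy: E(s) = -1/2 * s^T W s, with zero self-connections.
def energy(W, s):
    return -0.5 * s @ W @ s

# Hebbian storage of a single +/-1 pattern: W = p p^T, diagonal zeroed.
p = np.array([1, -1, 1, 1, -1, 1, -1, -1])
W = np.outer(p, p).astype(float)
np.fill_diagonal(W, 0.0)

corrupted = p.copy()
corrupted[0] *= -1  # flip one neuron's state

# The stored pattern is a local minimum of the energy landscape,
# so the corrupted state sits strictly higher.
print(energy(W, p), energy(W, corrupted))  # → -28.0 -14.0
```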

Expand full comment

I was going to say that this is a picture of the "loss landscape". In that case, the height represents how far from the goal you are trying to optimize for (your loss function) you are, and the other dimensions are the parameters of the neural network. Gradient descent is descending into these valleys depending on the slope of the mountains. I think on Scott's model, what the brain is trying to optimize for is "predict the immediate future" so that would be the loss function.
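For intuition, that descent can be sketched in a few lines, using a made-up one-dimensional quadratic loss (purely illustrative; real loss landscapes have millions of dimensions):

```python
# Gradient descent rolling a "ball" down a 1-D loss valley.
# loss(x) = (x - 3)^2 has its single valley floor at x = 3.
def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0    # starting position on the landscape
lr = 0.1   # learning rate / step size
for _ in range(100):
    x -= lr * grad(x)  # step downhill along the local slope

print(x)  # converges toward the valley floor at 3
```

Each update moves the parameter a fraction of the way toward the minimum; with this learning rate the distance to the floor shrinks by a factor of 0.8 per step.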

Expand full comment

I think these are different things. Moving around a loss landscape is a description of learning. In Friston’s analogy, minimizing free energy is a description of how a "pre-trained" brain behaves, which makes it like the free energy of a Hopfield network - i.e. a description of behaviour, not learning.

Learning, in the context of a free energy landscape, would mean making the individual peaks and troughs lower or higher.

Expand full comment

In the context of actual brain function, wouldn't you expect learning to be happening at the same time, though? So every time you descend a gradient and get feedback about the results of that descent shouldn't that change the shape of the nearby landscape?

Expand full comment

One frustration I had reading that article was not knowing quite how they were using this landscape idea. By the end I had the sense it was being used as a fuzzy metaphor, but deployed in a way that seemed more literal, and that I just didn't understand the underlying science. I kept thinking surely they would say up front if they meant this topology merely in a metaphorical way, to propose a model that there's no evidence for. But I can be kinda literal…

Expand full comment

Even if there is a "general factor of psychopathology," canalization or otherwise, I wouldn't interpret it as "the cause of every disorder in the DSM," but more like "a cause of many of the common disorders in the DSM."

This is a weaker and more plausible claim.

Expand full comment

Speaking of the general factor of intelligence, Gwern linked a bit ago to this interesting paper: https://gwern.net/doc/iq/2006-vandermaas.pdf which posits that the positive manifold (every mental skill correlates positively with every other mental skill) may be due to a developmental process whereby, during development, each mental skill boosts the development of the others.

Apparently the authors of this paper later realized the model has some problems? Idk, I'm not up to date on the literature here at all. Still, thought I should point it out as interesting.

Expand full comment

"Likewise, it’s interesting to see autism and schizophrenia in opposite quadrants, given the diametrical model of the differences between them. But I’ve since retreated from the diametrical model in favor of believing they’re alike on one dimension and opposite on another - see eg the figure above, which shows a positive genetic correlation between them."

This actually does match the paper - it puts schizophrenia in the underfitting/catastrophic forgetting quadrant. You seem to have swapped it with borderline when you recreated the figure.

Expand full comment

also, wrt autism: I don't think the canal model disagrees with other predictive coding models as much as it seems to.

Sander et al: "A chronically increased precision will initiate new learning at every new instance. Hence, future predictions are shaped by noise or contingencies that are unlikely to repeat in the future (also called overfitting). An organism that developed this kind of priors will have strong predictions on what to expect next but such predictions will quasi-never be applicable."

Juliani: "[Autism] can be characterized by an inconsistent deployment of mental circuits ... as well as an inability or difficulty in learning or changing these circuits over time."

I think these match up pretty well: autism involves creating strong, specific priors that are difficult to change, even though those priors routinely fail to be validated by experience. The confusing part is that what the other models call "overfitting", Juliani calls "plasticity loss"... and then he goes on to call something else "overfitting" anyway.

I can kind of see why autistic inference could be considered "underfitting", though. Like the underfit ANN that classifies all mammals as "dog", the autistic mind might not realize e.g. that a sarcastic comment goes in a different bucket from a genuine one.

Expand full comment

'Recent research has suggested a similar “general factor of psychopathology”. All mental illnesses are correlated; people with depression also tend to have more anxiety, psychosis, attention problems, etc' Is this research based on population surveys or on the prevalence of diagnoses? That is to say, how confident are we that the underlying factor isn't simply having talked to someone capable of officially diagnosing mental illness?

Expand full comment

I don't get the... - I can't find the right word, defensiveness? - about comparing brains to neural networks. Neural networks are distinctly different from clockwork or switchboards or computers, in that they are like brains in a very non-metaphorical sense; they were modeled after the building blocks of the brain and are bound to have many of the same features. From what I understand, they were used to experimentally model and test hypotheses about brain functioning long before they found any useful real-world applications, and the two fields studying them are very much in contact with each other.

Of course, there's also a very real sense in which LLM advocates overblow the similarity in a naive hope that sufficiently trained LLMs will magically start exhibiting all of the brains' incredible complexity, but I don't think this is worth worrying about in this particular instance. The concepts in question seem to be fundamental, basic features of neural networks, biological and artificial alike, and I'm finding it safe - natural, even - to assume that any similarities between the two are real, meaningful, and informative, and directly translate to each other.

Expand full comment

The idea of active inference has always seemed backwards to me. My intuition says that it should be that to trigger an action you need to form a belief that that action *won't* happen. Like with Lenz's law: you move a magnet by putting current through a wire in the opposite direction to how it would flow if the magnet moved. Or prediction markets: you pay for Putin to be assassinated by betting that he'll survive to the end of the year.

Expand full comment

I don’t know who I’m writing this for. Probably mostly myself. But I had a lot of trouble wrapping my head around these metaphors and visualizations, until I realized it was the low resolution and complexity that tripped me up.

I’m not sure my takeaway is the right one, but I felt it helped to replace the single ball on the landscape with a rainfall of inputs and impulses (such is life). Also, I replaced the two-by-two matrix with something without strict binaries (surely representing plasticity and fitting as binaries, with no optimal level, invites misinterpretations?).

Imagining the rain falling on the landscape of the mind – originally shaped by one’s genes and circumstances – it’s clear how the rain gathers in pools and streams. Over time, creating rivers and canyons, lakes and large deltas.

The dimensions don’t have to represent anything other than the facts of their own existence, unique characteristics of our personality – just as peaks and valleys on the real landscape don’t “represent” anything. You could probably say that they represent traits or neurons or something else, depending on your lens, but you risk getting the resolution wrong again. The important part is that it’s the “shape” of where the environment meets the person.

Everything is constant interplay between the genetics and circumstances that determines your day-1 landscape, and the experiences, habits, information, treatment and environment that make up the weather.

Fitting is a consequence of the topography and size of the metaphorical drainage basins in your landscape; plasticity is represented by the local soil and vegetation and how robust it is to erosion.

And so, I can see how the weather could affect a landscape predisposed for, say, depression, and turn it from a melancholy trickle in a creek into a veritable Blue Nile, which can cause utter devastation.

There’s always a risk of taking a model or metaphor too far – the map is never the territory – but I can see how someone might want to solve their flooding problems by constructing a mental dam, how others might try to change the weather, how someone might try to divert a river, and how any of those, mismanaged, might have serious mental health consequences.

Like evolution by natural selection, or actual erosion, or real weather, it all seems like one of those processes that are based on innumerable simple inputs, following very fundamental principles, to produce infinitely complex outcomes with all kinds of feedback mechanisms.

I don’t know how much I have perverted the original model here, but understood like this, I can see how it could be a good and useful metaphor. Not least since we all have a sense of how hard it is to change the course of a river.

On the other hand, as models go, it doesn’t seem particularly new or trendy, so I’m probably missing or misunderstanding something.

Is it our rapidly evolving understanding of the “geology” in the metaphor that gives it new and deeper relevance and usefulness as a model?

Expand full comment

Very cool expansion on the concept - continual learning as weather patterns. Repeated weather patterns yield large-scale landscape patterns. Eventually the weather wears us away to nothing and we die. ...ok, you're right, it's possible to take the analogy too far. ;) But I like it.

Expand full comment

Thanks. 😀

Expand full comment

Thank you for writing this!

Expand full comment

> I find I can’t answer the question “what do the dimensions of the space represent?” or even “what does the height of the landscape represent?” My best guess is something like “some kind of artificial space representing all possible thoughts, analogous to thingspace or ML latent spaces”, and “free energy”

Nothing specific would make sense in 3d anyway, I imagine.

"To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it." ~Geoffrey Hinton

Expand full comment

> my train of thought got stuck in a loop

Are loops a scientific thing? I strongly suspect they explain what BDSM folk call "subspace".

Expand full comment

I wonder how many times the "Waddington landscape" figure has been reprinted.

Expand full comment

Interestingly Geoffrey Hinton (often referred to in the literature as one of the 'grandfathers' of deep learning) wrote a highly cited paper in the 1980s called 'How learning can guide evolution' (available online here: https://icts.res.in/sites/default/files/How%20Learning%20Can%20Guide%20Evolution%20-1987.pdf) about canalisation (in the evolutionary sense).

I tried to get an evolutionary ecologist friend of mine to collaborate on a paper on whether learning (at the level of the individual organism) could lead to divergent canalisation (which might conceivably eventually lead to speciation), but I couldn't sustain his interest.

Expand full comment

"I can’t answer the question “what do the dimensions of the space represent?”"

Thingspace is really complicated. But fwiw, some researchers created a neat little toy space, the bird space, where the two main dimensions are the length of the neck and the length of the legs. You can look at some moderately funny pictures in the discussion at "Grid Cells for Conceptual Spaces?"

https://www.sciencedirect.com/science/article/pii/S0896627316307073

(You may need to use sci-hub to access it if that's legal for you.)

Expand full comment

I was better able to understand the metaphor when I thought of type A (inference) canalization as the parameter of how dense, round, and smooth the cursor is - i.e., how strongly it wants to roll downhill. A bowling ball would over-canalize, a feather would under-canalize, and maybe something like a d12 would be a good balance. Different aspects of the cursor could correspond to different aspects of inference, for example weight-to-surface-area ratio might correspond with being easily distracted (under-canalized during inference) because a breeze of random noise could blow you around.

Then, type B (learning) canalization is about the material of which the landscape is made. Sometimes it's putty and deforms at the slightest input; sometimes it's steel and resists change. Sometimes it's, what? Literally plastic? Maybe something like warm, soft plastic that resists light/spurious forces but responds predictably to stronger forces. So, literally plasticity.

As for people historically poo-pooing metaphors about cognition, and being confused by the canal metaphor, it's important to remember that a metaphor is not the truth. To use a metametaphor: imagine your current understanding is at position A, and the truth is at B. A direct line of reasoning from A to B is difficult to cross because it's, I don't know, full of mountains and dense forests. But you notice a path that seems to kind of head in the direction of B. That path is a metaphor. It can get you closer to B, some other A' where the metametaphorical path ends. Then, as you explore A', you might find another metaphor - a path that will get you even closer to B, some other A'' where you can repeat the process. Eventually you arrive at B, though probably you had to do a little bit of metametaphorical bushwhacking for the last leg. Then, if B is a useful concept space, you open up a gold mine, and if you want to share the wealth you might build an explanatory road from the original A to B in the form of a blog post or sequence. If the gold mine is big enough, people will write whole textbooks and turn your road into a superhighway. I like metaphors.

Expand full comment

This was helpful for me, thanks for the "cursor material interacts with the substrate" metaphor.

Expand full comment

I'm glad it helped! It mixes well with Jay Rollins' comparison to Marble Madness in a different comment.

Expand full comment

Reminds me of this article that claims that the reason for critical periods in human development is the ability to forget (e.g. kids learn new languages more easily than adults because they can change their landscape more easily).

https://www.businessinsider.com.au/learning-and-aging-2013-1

It's basically an explore/exploit trade-off where adult neurons are in exploit mode and cannot unlearn their existing pathways, whereas children are in explore mode and are still changing their landscape.

Expand full comment

My first thought is that while I love the "fitness landscape" analogy, it just seems too simple in the context of thoughts. Surely it is a multidimensional landscape - and many landscapes, not some single unified one. I even think in some very real sense there are multiple "balls" rolling around, shrinking and growing in mass/speed/conscious salience/other properties.

Which is perhaps a different enough situation that it means the whole analogy isn't that helpful except in some specific cases.

>This doesn’t mean canalization is necessarily bad. Having habits/priors/tendencies is useful; without them you could never learn to edge-detect or walk or do anything at all. But go too far and you get . . . well, the authors suggest you get an increased tendency towards every psychiatric disease.

This strikes me as somewhat interesting, but not very different from where we were. I had a big struggle with depression in my teens and early 20s, for say 10 years. And I suspect that if my life situation were bad I would still struggle with it. And to some extent the depression/suicidality at least partially arises out of things which otherwise are generally quite useful habits.

Tendency to dwell on and overanalyze things. Ability to focus. Lack of concern for others' feelings. Which is all great if you are doing forensic accounting on a big mess of a project.

Less helpful if you are doing a forensic accounting of your personal failings/injustices.

Expand full comment

Another great write-up! I especially liked the history bit: clockwork, switchboard, computer. Of course, all mechanical metaphors fall short because we are organic.

"The calculating machine works results which approach nearer to thought than any thing done by animals, but it does nothing which enables us to say it has any will, as animals have.”

--Blaise Pascal, Collected Works of Blaise Pascal, c.1669

Also, regarding the "sort of person who repeats 'I hate myself, I hate myself'": having read E. Easwaran's Passage Meditation and Conquest of Mind, and now the Philokalia: The Bible of Orthodox Spirituality [A. M. Coniaris] - which are all surprisingly (to me) pluralist - I'd suggest that the Mantram/Jesus Prayer would eventually, if not quickly, reduce the increasing case load of our mental health professionals.

[Like giving a dog a chew toy so he stops tormenting the cat!]

[E2: like giving Tim Urban's primitive a rubber ball in exchange for the crude torch we all have. Better to facilitate playing well with others.]

Expand full comment

Or the nembutsu.

Expand full comment

I hadn't run across that term until now but YES. Thanks.

I read every major religion has this practice. So many useful aspects of native Western religion have been abandoned!

Expand full comment

I gotta say, I think the Orthodox are onto something when it comes to doing this in the person's native language.

Expand full comment