228 Comments

It seems like LLMs get smarter with more neurons, but with a negative second derivative, so adding neurons to a smart LLM adds less intelligence than adding neurons to a dumb LLM. I did a bit of clicking and had trouble finding good zoomed-out graphs of the correlation between cortical neurons and intelligence for animals, but I would guess it doesn't have as negative a second derivative? Though maybe we're talking about different parts of the graph here.

Human intelligence is often split into fluid intelligence and crystallized intelligence, and a big part of fluid intelligence is working memory. As far as I know, working memory capacity isn't directly connected to the number of neurons. It's not how much a human knows; it's how many things a human can manipulate and reason about at once. This is also connected to chunking in long-term memory (having more chunks reduces working memory load), so maybe you could connect it to neuron count that way, but it seems like a stretch to me.

And working memory is a big difference between human intelligence and LLM intelligence. LLM intelligence has a huge long-term memory but doesn't have a great parallel for working memory.

Expand full comment

It seems extremely obvious to me that adding neurons has marginally decreasing effect. Adding one neuron to a model with two parameters completely changes its dynamics, while adding one neuron to a model with a trillion parameters does essentially nothing. Even if the derivative is always positive (a fact which I'm sure is up for debate in itself), the idea that it would do anything other than continually decrease is a bit hard to wrap my head around.
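
A quick toy sketch of that intuition (my own made-up example, nothing LLM-specific): fit polynomials with more and more coefficients to the same noisy data and watch how much each extra parameter buys.

```python
# Toy illustration of diminishing returns from adding parameters: polynomials of
# increasing degree fit to noisy data, scored on a fresh noisy sample of the same
# function. Each added coefficient tends to help less than the one before.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)

def true_fn(t):
    return np.sin(3 * t) + 0.5 * t

y_train = true_fn(x) + rng.normal(0, 0.1, x.size)
y_test = true_fn(x) + rng.normal(0, 0.1, x.size)   # same inputs, fresh noise

prev = None
for degree in range(1, 11):                        # degree + 1 coefficients each time
    coeffs = np.polyfit(x, y_train, degree)
    rmse = np.sqrt(np.mean((np.polyval(coeffs, x) - y_test) ** 2))
    gain = "" if prev is None else f"   (improvement {prev - rmse:+.4f})"
    print(f"{degree + 1:2d} parameters: held-out RMSE {rmse:.4f}{gain}")
    prev = rmse
```

The first few coefficients cut the error a lot; the later ones barely move it (and can even start chasing noise), which is the "derivative stays positive but keeps shrinking" picture.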

Expand full comment

Given that neurons cost energy (and that adding slightly more neurons is mostly a small, gradual change of the type evolution can explore well), I would expect that most animals are at the efficiency frontier where adding a few neurons would still improve thinking performance, but is not worth the other tradeoffs.

I would not expect any animal to have so many neurons that adding more actively hurts thinking performance.

As for artificial neural nets: over the decades we hit stages where adding more neurons made performance worse. That was mostly because back then we suffered from a combination of hard-to-train activation functions like tanh and primitive training algorithms. In addition, we had trouble with overfitting.

We have made a lot of progress on all these fronts (and more) since then, to the point where adding more neurons is practically always better (if you can pay for the extra training compute).

> Even if the derivative is always positive (a fact which I'm sure is up for debate in itself), the idea that it would do anything other than continually decrease is a bit hard to wrap my head around.

Some problems might have threshold effects, where you suddenly get a lot better when you have just enough neurons?

But I agree that if you average over enough problems and smooth out the graph a bit, you would definitely expect diminishing marginal returns in general.

Expand full comment

"I would expect that most animals are at the efficiency frontier where adding a few neurons would still improve thinking performance, but is not worth the other tradeoffs." so how do you explain the occasional genius or family of geniuses (Bernouillis, Darwins & Wedgwoods, Curies, Bachs, Huxleys, Einsteins, Schumanns, Brontes). Clearly we aren't at the efficiency frontier, unless you relate all these to environment. Genius seems to be a step function.

Expand full comment

Humans would definitely benefit from more neurons; that would be the easiest way to raise human intelligence. We would be much smarter than we are if evolution had somehow managed to move the birth canal from where it's currently surrounded by the pelvis to somewhere on the abdomen. But we currently can't add neurons in a straightforward way, so there's evolutionary pressure to gain intelligence in less straightforward ways. Some of those ways look step-function-like, and they usually have some kind of hidden tradeoff that prevents them from taking over, one of the commonest being that they are alleles that only increase intelligence when heterozygous.

(Epistemic Status: speculative but based in well-established pillars.)

Expand full comment

"We would be much smarter than we are if evolution had somehow managed to move the birth canal" - so why don't we evolve tighter packing like birds?

Expand full comment

Do smarter humans have more children?

Expand full comment

Interesting question, and I notice that biologists usually do not want to think about it at all; they seem to assume that the current state of affairs is correct and how it should be.

Or mammals could remove the constraint that the brain needs to be big at birth.

Expand full comment

Marsupial humans?

Expand full comment

Humans are practically marsupials already - our infants are much less developed and self-sufficient when born than basically every other placental mammal.

It's also common for human mothers to carry infants around with them the way marsupials do: https://en.wikipedia.org/wiki/Babywearing

Expand full comment

"We would be much smarter than we are if evolution had somehow managed to move the birth canal" - I doubt it. We'd have just reached this point in civilisation sooner. The potential for technology and civilisation was always there, waiting to arrive as soon as we were sufficiently intelligent to make it happen.

Expand full comment

>I would not expect any animal to have so many neurons that adding more actively hurts thinking performance.

autism is positively correlated with larger brains, so it might be that...

Expand full comment

Arguably, the size of the model's context window is the parallel to working memory. RAG won't work very well on a model with a context window of 500 tokens, and a human who cannot hold three new ideas at a time won't be able to combine them well (though he might be able to do so later, once he internalizes one or two of them and can start using them intuitively together with one or two genuinely new concepts).

Expand full comment

Context window isn't very directly parallel to anything in the human brain. From the perspective of biological neurons, being able to directly tap into past activation values, without having had to decide when they first occurred to spend precious space keeping them around, is a complete cheat. More neurons don't *only* bring more synaptic weights, and transformers, CNNs and diffusion models effectively act like they have way more neurons from a short-term-memory perspective than they do from a weights perspective.

Expand full comment

Fair enough, but the context window is not just useful for RAG (humans basically have RAG too, although it is very slow: looking things up in books or on the internet and asking other people). It is also useful for keeping track of the current line of thought, by keeping it or its summary in the prompts of subsequent API calls.

That to me sounds similar to working memory. If I can't remember an ongoing conversation and recall crucial elements from it, I will not be able to make any conclusions. And I cannot really add it to long-term memory that fast. It is the same with the LLM (yes, you can save all conversations, but those are not really saved in the LLM; it is just that the LLM can be fed those things much more easily than a human can be, since humans don't have the right API).
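
A minimal sketch of that "working memory via the prompt" idea. The `call_llm` helper and the SUMMARY convention are hypothetical stand-ins of my own, purely for illustration; swap in whatever chat API you actually use.

```python
# Carry a rolling summary of the conversation into each new request, so the
# model can "recall crucial elements" without them living in its weights.
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real chat-completion call.
    return "Understood.\nSUMMARY: " + prompt[-200:]

def chat_with_rolling_summary(user_turns):
    summary = ""                                   # plays the role of working memory
    for turn in user_turns:
        prompt = (
            f"Summary of the conversation so far: {summary}\n"
            f"User says: {turn}\n"
            "Reply, then restate the updated summary on a line starting with SUMMARY:"
        )
        reply = call_llm(prompt)
        answer, _, new_summary = reply.partition("SUMMARY:")
        summary = new_summary.strip() or summary   # keep the old summary if none returned
        yield answer.strip()

for answer in chat_with_rolling_summary(["What limits working memory?",
                                         "How does that compare to an LLM?"]):
    print(answer)
```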

Expand full comment

Humans have two levels of memory prior to long-term memory. Working memory is the shortest-term but highest-bandwidth, and only holds about 7 "semantic chunks", although the exact number depends on which human and how hard they strain their brain. It's not sufficient by itself to have a long coherent conversation; you would repeat yourself.

Expand full comment

I see, I didn't know that :)

And how much info is there roughly in a chunk like that? How big are the chunks? Or does that differ per person? How is that even measured?

And what is the second level of memory then?

You can also link me to something, I know this is a lot of questions :) Thanks!

Expand full comment

A "semantic chunk" is basically anything you can think of as easily as a digit or a word you know well. Using two-digit numbers as semantic chunks is a good way to keep longer numbers in mind. When you try to remember random sequences like a phone number or URL without "memorizing" them, they reside in your working memory and this is why you feel "strained" when doing this and will easily forget the thing you're trying to remember if you do any hard thinking--something like doing math or having a conversation will reuse your working memory slots and force the memorized sequence out.

The second level of memory is short-term memory and it handles remembering things for up to ~1 day. It has the second highest bandwidth and can remember more things than working memory, but they are less "available" and it is harder to get correlations or sequences into short term memory than working memory if they don't follow a rule that makes sense to you. This is the primary level at which making up mnemonics helps you remember things. It is strongly conjectured that sleep, and especially REM sleep, helps move things from short term memory to long term memory which is one reason a person shouldn't stay up all night studying.

Expand full comment

To the extent that LLMs can be compared to brains, I think it could be argued that context window size is in a manner equivalent to the number of sensory neurons.

Expand full comment

Whatever is going on in chain of thought in reasoning models has *something* to do with working memory, even if it's not a perfect parallel. (Just as someone else said with context window.)

Expand full comment

>if you're "overcomplete", you don't have to hit upon the unique combination of elements that exactly represents your function, there are lots of ways to do it, so you'll come across one by chance more quickly

This also seems to suggest that the smarter you are in this sense, the better you’d be at rationalizing (and possibly more prone to do so).

Expand full comment

In neural nets, more parameters does increase the chances of overfitting. But in real life, I don't see a positive correlation between intelligence and *tendency* to rationalize (though probably a positive correlation with *ability* to rationalize, thinking of lawyers). So intelligence may also make better safeguards possible.

Expand full comment

Reading this with a 25.5” head circumference.

Everything makes sense.

Expand full comment

That’s my right bicep measurement. I wish.

Expand full comment

I'm glad you linked to polysemanticity and superposition, and I think your friend is definitely on to something wrt neural networks.

To add two more things to chew on:

1) Representational capacity does not increase linearly with the number of neurons, but much faster. (Consider a D-dimensional vector: there are on the order of D^2 possible pairwise interactions between its dimensions, so adding another dimension takes you to (D+1)^2 possible interactions.) So a slightly larger brain actually has a lot more representational capacity.

2) Gradient descent (and its various variants) works by assuming that the loss manifold it is traversing is 'convex'. Imagine a 2d parabola. If you drop a ball into this parabola, it will eventually hit the bottom, and then you're done with your optimization... but if you add a third dimension, you may be able to go _below_ the bottom of the 2d parabola. You've given your ball another way to move.

It's possible that being at the bottom of the 2d parabola means you're also at the bottom of some 3d structure. But it's sorta unlikely. If you had a 3d bowl and you intersected it with a plane, the resulting 2d parabola would only be 'optimal' if it happened to intersect with the middle of the bowl. It's very unlikely that's the case.

(And then repeat with 4, 5, 6, ...dimensions)

Each additional neuron is in some sense adding another dimension to your optimization problem.
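
A minimal numerical sketch of point 2, using an arbitrary made-up quadratic loss (my example, not any real network): gradient descent with one coordinate frozen settles at a higher loss than gradient descent that is free to use the extra dimension.

```python
# Minimize the same loss twice: once with only one parameter free (the "2d
# parabola"), once with both free. The extra dimension lets the ball go lower.
import numpy as np

def loss(w):
    # arbitrary convex quadratic; the best w[1] depends on w[0]
    return (w[0] - 1.0) ** 2 + (w[1] + 0.5 * w[0]) ** 2 + 0.1

def grad(w):
    return np.array([2 * (w[0] - 1.0) + (w[1] + 0.5 * w[0]),
                     2 * (w[1] + 0.5 * w[0])])

def gradient_descent(free_dims, steps=500, lr=0.1):
    w = np.zeros(2)
    free = np.zeros(2, dtype=bool)
    free[list(free_dims)] = True          # only these coordinates get updated
    for _ in range(steps):
        w -= lr * grad(w) * free
    return w

print("loss with 1 free parameter :", round(loss(gradient_descent([0])), 3))      # ~0.30
print("loss with 2 free parameters:", round(loss(gradient_descent([0, 1])), 3))   # ~0.10
```

With only the first coordinate free, the best reachable loss is about 0.30; freeing the second drops it to about 0.10, which is the "intersecting the bowl off-center" picture above.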

Expand full comment

> gradient descent (and its various variants) work by assuming that the loss manifold they are traversing is 'convex'.

You don't need to be strictly convex in the mathematical sense: in practice you have a bit of wiggle room. And luckily, adding more dimensions seems to make it less likely to get stuck in a bad local minimum.

Expand full comment

"AFAIK nobody has done the obvious next step and seen whether people with higher IQ have more neurons. This could be because the neuron-counting process involves dissolving the brain into a “soup”, and maybe this is too mad-science-y for the fun-hating spoilsports who run IRBs. "

The soup also removes all the connections between the neurons, and the differences in those connections between people are probably doing a lot of the work. Probably you'll have to wait for the invention of good-enough MRI equivalents to register the neurons in vivo.

But hey, we may all be paperclips by then.

Expand full comment

Main thing I learned from this is that I do not have enough Neurons to understand intelligence or AI.

Expand full comment

I don't see anything in this post that is good at explaining NNs.

I suggest this one: https://www.youtube.com/watch?v=Anc2_mnb3V8

Expand full comment

Ok, fun example. I was at "No pattern solves this. I know because I've exhausted every known one of them in order."

Then for some reason I thought about walking past people on moving sidewalks at an airport.

Then the solution was painfully obvious.

Clearly if you have enough neurons (or like me just recently flew through SFO and the dizzying experience overwrote all your deep memories) you will store some information about airports in some of them and that's the secret.

Expand full comment

I got nowhere at first (just 'looking for a pattern'―system 1), but then had the thought "what if there are two independent blocks with the same color" (system 2 finally steps up?). Then I quickly found the pattern for one of the blocks, and more slowly for the other. (I must have needed three minutes, illustrating I'm not that smart.)

Expand full comment

Thanks to this comment I finally worked out what was going on. But IMO the possible choices A-F aren't well chosen: even when I had no idea I was thinking "igur svefg gjb be gur ynfg gjb, orpnhfr gurl unir fdhnerf va gur yrsg pbyhza naq gur bayl barf jvgu fdhnerf va gur yrsg pbyhza ner va gur yrsg pbyhza gurzfryirf." If I'd guessed I'd have said the correct answer even though I was at best halfway there.

Expand full comment

I gave Claude the problem, and it thought for 3.17 minutes and concluded that the answer should be two black squares in the right column, middle and bottom. It then picked the correct solution because it considered this closest to its answer, but I'm not sure whether this is because the choices are poorly chosen or just luck.

I also asked some other AIs, all of whom thought for more than a minute and none of whom answered correctly. It took me about 3 minutes to solve the puzzle, which seems to be a common experience among commenters, so humans seem to be retaining an edge on tasks like this.

Expand full comment

(I actually stopped scrolling, solved it, then found out it was multiple-choice)

Expand full comment

If you're talking about the IQ test:

Gjb oynpx obkrf va fhpprffvir znva ebjf frrz gb nqinapr hc be qbja fho-pbyhzaf. (Va gur svefg znva pbyhza, gur obkrf ner va gur fnzr fho-pbyhza, naq guhf jvgu gurve bccbfvgr cebterffvba pna bireync, nf gurl qb va gur gbc-yrsg vgrz)

Fb, tbvat qbja gur evtugzbfg znva pbyhza, gur zvqqyr fho-pbyhza obk vf fgrccvat hc n fho-ebj rnpu gvzr naq guhf va gur zvffvat obggbz-evtug vgrz jvyy unir jenccrq ebhaq gb gur obggbz fho-ebj, naq fvzvyneyl gur evtugzbfg fho-pbyhza obk vf fgrccvat qbja n fho-ebj rnpu gvzr naq jvyy nyfb raq hc ba gur obggbz fho-ebj bs gur zvffvat vgrz. Fb va fhzznel, V erpxba gur nafjre vf Q.

I found it quite tricky at first. Boy, I could do with some more neurons! :-)

Expand full comment

Interesting that people are thinking about it [fcngvnyyl].

I solved it by thinking of it like [na nevguzrgvp frdhrapr. Gurer vf bar fdhner gung fgnegf ng bar naq vaperzragf va fgrcf bs bar, naq nabgure gung fgnegf ng bar naq vaperzragf va fgrcf bs gjb (jvgu jenccvat).]

Expand full comment

I solved it the same way as you. I find it interesting that other people, including comments in the link seemingly found different ways to get to the same solution.

I found it tricky too - it certainly took me more than one minute :)

Expand full comment

V guvax gur qvssrerag nccebnpurf ner rdhvinyrag orpnhfr sbe n 3 * 3 neenl, jvgu jenc, gjb fdhnerf hc be qbja vf rdhvinyrag gb bar fdhner qbja be hc erfcrpgviryl.

Expand full comment

I think you are right - ohg vg vf gur erfhyg bs qvssrerag zbqrf bs guvaxvat. Cheryl fcngvny ernfbavat if. sbezny zngurzngvpny.

Abj vs bar bs gur oybpxf jrer zbivat va n qvssrerag qverpgvba, fnl gbc gbjneqf obggbz, juvyr gur bgure vf zbivat yrsg gb evtug.

V guvax gur sbezny zngurzngvpny guvaxvat jbhyq or vasrevbe gb fcngvny urer (zber pbzcyrk zbqry arrqrq)? Fnl bar oybpx jrer zbivat bar fgrc qbja fb gur cnggrea jvyy or 1-4-7-2-5-8-3-6-9. Bs pbhefr lbh pbhyq punatr gur nkvf bs gur pbbeqvangr flfgrz, ohg gura lbh jbhyq hfr qvssrerag pbbeqvangrf sbe gur qvssrerag oybpxf. Guvaxvat fcngvnyyl guvf ceboyrz vf irel fvzcyr.

Expand full comment

Zl cbvag vf V qvqa'g rira guvax bs vg nf n 3k3 neenl ng nyy. Zl ernfbavat jnf gur fnzr nf vs vg jrer n fgevat bs avar barf naq mrebrf.

Expand full comment

I took just a couple seconds to decide that only one of the answers had a shape that was part of the puzzle, so I picked that one and was right. Didn't even have to think of a complex pattern because I have soo many neurons.

Expand full comment

V abgvprq gung bar bs gur oynpx fdhnerf cebprrqrq erthyneyl sebz gbc yrsg gb gbc zvqqyr gb gbc evtug gb zvqqyr yrsg rgp. Gung ryvzvangrq guerr bs gur cbffvoyr fbyhgvbaf. V qvqa'g vqragvsl n cnggrea gb gur bgure fdhner, rkprcg gung vg nyjnlf jnf nqwnprag be xvggl-pbeare gb gur svefg fdhner, juvpu aneebjrq gubfr guerr gb bar.

Expand full comment

I am annoyed that the inner and outer grids followed western reading order. I thought these questions were supposed to be free of cultural context. If the inner grid was 9 columns and the outer grid was 9 rows it would make more sense.

Expand full comment

This puzzle is actually a pretty good example of cultural assumptions in an IQ test. Because "left to right, top to bottom" turns out to be very important.

Expand full comment

You can read it just as well right to left and bottom to top... I think the concept of wrapping back around to the beginning when you reach the end is much more cultural.

Expand full comment

Cases of humans with substantially reduced brain matter (often due to hydrocephalus) but normal or even well above normal intelligence deserve consideration here. It is worth remembering that the theoretical mechanism by which neurons process information is pretty vague beyond filtering sensory input and carrying impulses outward. And rather than neuron count, maybe neuron density or connectivity is the more critical metric for intelligence. That is where humans seem to be the most different from other mammals rather than raw brain size and neuron count. Are there any human studies correlating these traits against intelligence? Harder to meaningfully measure without sectioning or soupifying the brain though.

Expand full comment

Thanks for that fascinating study. A little frustrating, since the AI-driven analysis didn't seem to indicate what it was detecting in the MRI data to predict intelligence a fair bit more effectively than plain brain volume. Am I misreading the study on this point?

Expand full comment

When I was 24, I was diagnosed with acute hydrocephalus due to aqueductal stenosis (i.e. the cerebral aqueduct in my brain became blocked, and cerebrospinal fluid was backing up behind it). As my condition worsened, I absolutely experienced reduced brain function.

The neurosurgeon explained that if I didn't get surgery to fix it, I would have gradually become stupider as the pressure increased and the years went on. It is widely understood that hydrocephalus impairs brain functioning. I don't doubt that cases exist where a person still has average or above average intelligence with hydrocephalus, because I was one. However, once I had the surgery, my cognitive function improved above and beyond that, back to what it used to be. Don't forget the counterfactual of how these patients would have functioned without the hydrocephalus!

Expand full comment

That is indeed the experience for most people with hydrocephalus. But there are a handful of recorded cases of people with IQ as high as 130 (a graduate math student IIRC) who were found to have a hugely diminished effective brain size due to life long undiagnosed hydrocephalus. A French public servant with normal IQ is also among the recorded cases. Just examples that challenge the model that brain size sufficiently explains intelligence.

Expand full comment

https://gwern.net/hydrocephalus:

>"that this virtually brain-free patient had an IQ of 126. He had a first-class honors degree in mathematics..."

>This is certainly big if true...

>The brain scan he posts is not, in fact, of the IQ 126 case... Further, Oliveira et al lied about the origin of the images, which were copied from elsewhere, and the paper has been formally retracted.

>OK, but what about the Lewin ‘review’... and the IQ 126 guy? Lewin ... is retelling Lorber’s anecdote at third hand. Lorber provides no concrete details about him...Surprisingly, as far as I can tell, Lorber and associates ... have not published anything at all on their dataset in the 39 years since the Lewin press coverage, and so there are no scans or data on this guy, or any of the others Lorber claimed to have above-normal intelligence.

>How many severe hydrocephalus cases have even above-average intelligence? Despite Lorber’s claim to have found many cases of normal and a highly above-average hydrocephalus patient easily, subsequent researchers appear to have failed to do likewise over the ensuing 4 decades. Forsydke 2014, cited by Watts, himself only cites 3 instances: Lorber’s unverifiable anecdotes via Lewin, the retracted & likely fraudulent Oliveira et al, and a third case study of Feuillet et al 2007 who reports their patient having an IQ of 75.

The French civil servant is the one referenced in the previous paragraph with an IQ of 75 - not a normal IQ.

Expand full comment

Appreciate the deeper insight into this phenomenon!

Expand full comment

I'm curious about how this kind of sudden impaired brain function would feel in practice? Were you presented with problems that you knew you could have solved before, but no longer could solve? Do you have some examples?

Expand full comment

I used "acute" in the medical sense (ie "not chronic"). My condition gradually worsened over about five years, from age 19-24. I didn't actually notice the cognitive effects until after I'd recovered from the surgery, because I got worse so gradually. The other symptoms were more annoying - pressure headaches, and a very specific visual phenomenon associated with pressure headaches.

I didn't take an IQ test during this period, but I did manage to graduate with an engineering degree from a top-5 engineering school. But I struggled *immensely* junior and senior year. At the time I just chalked it up to the coursework being harder. In hindsight, it was because my focus and working memory were in the toilet. E.g. for the final 3 semesters of college, I could barely retain any new info. If my professor had any kind of thick accent, I would spend all my brainpower decoding their words and absolutely none of the content was landing. I relied on "guessing the teacher's password" way too much, and actually retained very little info longterm.

When I started my first job after college, I struggled to learn my company's processes and systems. Eventually I learned what I needed to know, but it took much longer than usual. It was... bizarre, in hindsight. When I'd encounter a new process, I would literally think to myself, "oh, this makes zero sense to me right now on first pass, but I will eventually absorb it through osmosis and vibes after about 10 repetitions." And it kinda worked (???) even though it sounds insane to type it out right now, 5 years post-op, at age 29.

Again, at the time I just chalked it up to "getting older" and not being a brilliant teenager anymore. I bought into the memes about being a "former gifted child" and burning out in adulthood. (It didn't help that that workplace was really toxic.)

I don't actually know my IQ, but the cutoff for gifted ed in my K-12 school district was 130, so I was at least that high. And even then, I excelled in school and got into a great college, so maybe I went from, say, ~135 down to ~115. Still above average, but very diminished compared to what I had been before.

Fast forward a couple of years. I got a new job at age 26, two years post-op, and suddenly learning my new company's systems and processes was... easy! I understood everything after 1-2 repetitions. No more avoidance and learning by vibes. I could just take notes and *learn*.

Thanks for asking! I hadn't really reflected on my experience before now. Until you asked me, I hadn't realized how much easier my post-surgery job was compared to my pre-surgery job.

Expand full comment

Thanks for sharing! Interesting read - especially the part on going from 10 to 1-2 repetitions. I think you have a unique perspective into what it is like to lose ~20 IQ points.

Expand full comment

I had a mild form of this for a few months. The onset was rapid, so I could immediately feel the cognitive difference. I would describe the feeling as follows:

1. Thinking took more energy. To solve the usual problems like in software programming for example, it's not as though I wouldn't know what to do, I would just be inclined to avoid it because it took too much energy to search back in my brain for those memories.

2. Remembering things and thinking about things gave me no positive emotional response. I believe hormones are a critical aspect of intelligence.

3. Slight loss in ability to think of words on the spot, my typing speed was also impaired--it would take me longer to type without mistakes and I could no longer type without thinking quite as easily.

4. A "fuzzy" feeling in my head I couldn't identify, not a headache or pressure but something more like a weak electric shock. It would distract me whenever I tried to solve a problem. It almost felt like the fuzz became worse the harder I would try to think. This also made me avoid trying to solve problems.

5. In general, everything just took more energy. I had a few memory problems but I don't think my capacity for problem solving was lost, I just lost the drive involved with problem solving. Trying to read a page of a book felt draining, and it took enormous energy to maintain the context of the facts on the page in my head.

Expand full comment

I think brain structure counts for more than size. Whales and elephants have huge brains, but are not as smart as humans. Birds have tiny bird-brains, but some of them are much smarter than others, and smarter even than mammals with brains of comparable size. LLMs have more neurons than humans but are not nearly as smart, nor capable of real-time learning. To push the analogy further, an ocean of calculators would not be capable of nearly as many feats as my modestly-sized laptop, despite having more transistors.

Expand full comment

LLMs have 2 OOM fewer parameters than the human brain; the relevant count is synaptic connections vs. LLM parameters. The largest models have ~1T parameters; humans have ~100T synapses.

That said, yes, structure counts for a lot.

Expand full comment

Thanks, I stand corrected. That said, I still don't believe that an LLM with 10x or 100x parameters would be as "smart" as a human (as per my "ocean of calculators" analogy above). Humans are able to learn in real time, and IMO this kind of learning is key to our ability to solve a wide variety of novel problems quickly and efficiently (which is what I understand the term "intelligence" to mean).

Expand full comment

That's the Big Question, isn't it? (also, data. Do we have 100X high-quality data?) I too am skeptical, but the "shut up and follow the graph" folks have been right in many important ways over the last decade or two.

Of course nobody said the paradigm would remain constant. Indeed, reasoning models may be changing it right now. And in a direction not unrelated to being able to learn in real time.

ETA: there's also the question of target function. Token prediction has been surprisingly useful for "downstream" tasks. Not at all clear this will continue.

Expand full comment

> but the "shut up and follow the graph" folks have been right in many important ways over the last decade or two.

I don't know if that's true. To my knowledge, none of the exponential graphs that were spoken of so highly in the beginning have remained exponential. Instead they all trend toward S-curves.

Expand full comment

At the very least predictions of people like Shane Legg about how far scaling could get us look better than those of their critics. Which is not to say I endorse every single prediction he made. But even as late as 2017, I can confidently say that a large fraction of NLP researchers did worse at anticipating the advent of GPTs than Legg, Amodei or Sutskever. Even by late 2018, it was "obvious" to many that the newly-introduced BERT was going to quickly run into various walls. Of course only a few months later OpenAI announced GPT-2.

Expand full comment

If LLM needs "100X high-quality data" it's by itself evidence that in some very important ways it's much worse than human brain.

Expand full comment

Most of the "parameters" in LLMs are analogous to simple synapses, not neurons, so Epstein is right, very large LLMs have roughly 1% as many synapses as humans. I disagree about them not being "smart": they are stupid in some ways, but smarter than us in others. Kind of a pet peeve of mine that people will go ask all kinds of questions to LLMs, then declare them "unintelligent", then ask them to complete their college homework, which they do correctly, then call them "dumb". (I know it doesn't hurt the machine's feelings or anything, but people also tend to think real AGI is impossible, and I suspect it may be related to that bias toward wanting to call LLMs stupid.)

Expand full comment

I think that if two brains have the same neuron density, the larger brain will be more intelligent. However, neuron densities within the cerebral cortex and its equivalents vary across the animal kingdom. This explains why a bird can be smarter than a mammal with a similarly sized brain. A paper Suzana Herculano-Houzel co-authored found that birds have a neuron density twice that of primates. (https://www.pnas.org/doi/full/10.1073/pnas.1517131113)

Another of her papers found that primates have a neuron density within the cerebral cortex which remains constant as their brain size increases. (https://www.frontiersin.org/journals/human-neuroscience/articles/10.3389/neuro.09.031.2009/full) Contrastingly, rodents have a neuron density which decreases with size. She found in a third paper that most mammals have a neuron density which decreases as their cerebral cortex grows larger.

(https://www.frontiersin.org/journals/neuroanatomy/articles/10.3389/fnana.2014.00077/full) The result of this is that mammals with large brains are less intelligent than what one would expect assuming a constant neuron density. Even mammals with cerebral cortices larger than ours, like elephants, are less intelligent than humans because of their inferior neural structure.

As for LLMs, I believe that their brain structure is deficient compared to the brains developed through natural selection. Though they are unnervingly capable, LLMs do not learn in real time and cannot think continuously. I do not believe that LLMs can scale to become an artificial general intelligence or an artificial superintelligence. I will go so far as to predict that there will be one more AI winter before an artificial general intelligence is created.

Expand full comment

Maybe a limiting factor in neuron density is heat dissipation. It may be no coincidence that birds have a higher body temperature than mammals (I think), and that, along with birds' presumably lighter, thinner skulls, allows the higher neuron density.

Expand full comment

How do bats fit in? They have the same energy and weight constraints as birds but evolved independently.

Birds and bats live faster than humans by the way (perhaps due to higher temp and greater neuron density), so may appear to be more intelligent cos they have more time to think about a problem.

Expand full comment

Bats also have high body temperature, and I recently have learned they are shockingly smart. So they fit the pattern.

Expand full comment

They are also shockingly virus prone! (Apparently due to the high temperature)

Expand full comment

That's the context from which I learned about it.

Expand full comment

Apparently their resting body temperature is 93.9°F (34.4°C), and much less while hibernating, but it rises to c 105.8°F (41°C) when they are in flight. (c.f. a healthy human's body temperature of 98.6°F (37°C))

Of course that could be mostly the muscles heating up from flapping their wings, but it may also in part be an adaptation to having to think harder while navigating obstacles and hunting prey such as moths.

Expand full comment

Or, amazingly on purpose, it is indeed the muscles heating up and enabling faster thinking :)

Expand full comment

I would be surprised if it weren't at least one of the factors. Coincidentally (I think not) we're hitting all kinds of limits related to getting the heat out of chips, with people trying to come up with some rather complicated methods, see, for example, https://s-pack.org/research/surface-engineering-enhancement-of-advanced-cooling-technology/

Expand full comment

> Though they are unnervingly capable, LLMs do not learn in real time and cannot think continuously. I do not believe that LLMs can scale to become an artificial general intelligence or an artificial superintelligence. I will go so far as to predict that there will be one more AI winter before an artificial general intelligence is created.

I completely agree; I came to the same conclusion myself, and my contacts in the ML space say the same. Of course, I'd amend that to "at least one more AI winter".

Expand full comment

"As for LLMs, I believe that their brain structure is deficient compared to the brains developed through natural selection." Yes but there's a lot of promising work going on in Neuromorphic AI.

https://www.sciencenews.org/article/brainlike-computers-ai-improvement

Expand full comment

ML theory has something to say about these topics, unsurprisingly.

Let's break down what it means for AI to "work". When we say "deep learning works", what is covered by that statement? Here are at least three different meanings (focusing on a standard supervised case for simplicity).

1) A given hypothesis space (a collection of functions mapping input to output) is rich enough to include (or approximate) a very good function for our task.

2) We are able to find, within that space, a function that performs well on our training data.

3) Performance on the test data (or even more importantly, on the real world) is not catastrophically far from performance on the training data.

What does parameter count have to do with each one?

1) - this one is trivial. Higher parameter count means a richer space you can represent. Indeed, in the extreme case, a large enough neural net architecture can approximate every "nice" function (though this is "motivational" - the relevant sizes and constants tend to be astronomical).

2) In principle, one might actually expect training on complicated models to be hard. It certainly is computationally expensive. But there are various ways in which complex neural nets enjoy an advantage.

(*) Wide basins - as mentioned in the OP, both practically and theoretically (under some assumptions), good minima tend to be flatter.

(*) Exponentially rare bad local minima - in complex models, the odds that a critical point (one where the gradient is zero) is a local minimum rather than a saddle decrease exponentially at higher values of the loss function. This means that if you encounter a local minimum, it is more likely to be a decent one.

(*) De-facto convexity - much has been made of the topic of deep learning being non-convex. And yet when you plot the loss along the straight line from starting point to ending point in a deep network's parameter space (that is, the line connecting the initial and final weights of a network that has been trained), the resulting curve tends to be convex or almost-convex (very few "hills" to overcome).

(*) The lottery ticket hypothesis - there is evidence, both theoretical and empirical, that one way to think about over-parametrized models is as an exponentially large set of subsets of parameters, some of which are particularly good starting points. This is useful for training dynamics, pruning and lots of other things.

3) This is perhaps the most mysterious and "open" question. Why does deep learning generalize surprisingly well? One might expect that huge networks would be prone to overfitting - it should be easier for them to memorize the training data. Indeed, that is what classical ML theory (e.g. PAC) predicts. I literally did my PhD about this, so I could go on forever, but let me mention a few key points.

(*) Lots of training data! And enough parameters to use it! Sure, one might worry about overfitting. Good thing we're training on all the internet.

(*) It's not literally the case that, as your friend said, every rich family of functions would do. Good NN architectures are particularly well-suited for their task. Convolutions encode good biases for vision, for example.

(*) In recent years, a growing body of literature called PAC-Bayes theory has been able to provide some answers regarding NNs' "failure to not generalize". Starting from Dziugaite and Roy's https://arxiv.org/abs/1703.11008, "Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data", we have been able to find nontrivial bounds for DL generalization.

A particularly relevant finding is that in the low-training-loss regime, sampling efficiency (how well we generalize as a function of number of samples) is actually very good, under reasonable assumptions. So many parameters and many samples = awesome.

... I should probably write a longer post about PAC-Bayes in one of the next open threads.
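
A rough empirical sketch of point 3, with a toy scikit-learn setup I made up (not from the PAC-Bayes literature): widen an MLP far beyond what 200 training points "need" and check whether test accuracy collapses.

```python
# Overparameterization vs. generalization, toy version: train MLPs of increasing
# width on a small noisy dataset and compare train vs. test accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for width in [4, 64, 512]:
    clf = MLPClassifier(hidden_layer_sizes=(width, width),
                        max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    print(f"width {width:4d}: train acc {clf.score(X_tr, y_tr):.2f}, "
          f"test acc {clf.score(X_te, y_te):.2f}")
```

Classical parameter-counting intuition says the widest model should fall apart on the test set; in practice its test accuracy typically stays in the same ballpark, which is the "failure to not generalize" described above.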

Expand full comment

Thanks for adding value to the discussion! I'd be very interested to read the post you mention at the end.

Expand full comment

It seems worth pointing out here that the differences in neuron number relevant to AI scaling laws are measured in orders of magnitude, while the differences between humans are much smaller — something like 10-15% within sex, and then a further average difference of 10-15% between sexes. But males and females have similar average IQs despite these neuron number differences, so the straightforward extrapolation seems far from clear.

Expand full comment

There's something fishy with the analogy, yeah. LLMs spanning multiple orders of magnitude have abilities that resemble those of humans of varying abilities, even though the variation in neuron count among those humans spans much less than an order of magnitude.

Expand full comment

Couldn't it just be that LLMs have already hit a soft wall in one direction, a long time ago, like GPT 2.5 even? So making a 100x GPT4 just won't yield much, while making a 100x o3 would start to see big gains again.

Similarly, if you kept making human brains bigger, you'd reach an inflection point where size was no longer the bottleneck and a 15% size difference didn't matter as much.

And it would make sense we still are in the steep part of the brain size/utility graph, given evolutionary constraints.

Expand full comment

This might be explainable like so...

Say most human neurons and connections go towards keeping the body alive, recognizing a piano, directing the muscles to pick up a spoon, and other functions that we don't expect AI to do.

Using definitely-wrong made-up numbers, it might look like this:

Person A: 10^8 +100 synapses ... low IQ

Person B: 10^8 + 1000 synapses ... high IQ

LLM 1: 100 parameters ... feels like a person with low IQ

LLM2: 1000 parameters ... feels like a person with high IQ

... where the humans' 10^8 are going towards those other, base functions.

In this way, you could get orders of magnitude difference in LLMs that create differences that seem comparable to human ranges.

Expand full comment

To dial those numbers in a bit: The distinction you’re talking about corresponds roughly to the cerebral cortex versus the rest of the brain. About 20% of the brain’s neurons are in the cerebral cortex. Perplexity.AI says it wasn’t able to find anything "conclusive”, but it tells me that the available evidence points to the variation in neuron count in the cerebral cortex being on roughly the same order as the variation in total neuron count, so I’m guessing it’s not the case that humans of varying intelligence have orders-of-magnitude differences in how many neurons are devoted to intelligence (depending on how much you trust o3-mini + search engines + prompt engineering to get the right answer; if anyone knows the real answer, definitely speak up). Rough numbers on this below.

I’m speculating here, but it seems to me that one of two things is going on:

- Biological neural networks scale in roughly the same way LLMs do, so the interspecies correlation between neuron count and intelligence, driven by orders-of-magnitude differences, is analogous to LLM scaling, but the intraspecies correlation between total neuron count and intelligence among humans, driven by differences that are much smaller than even a single order of magnitude, is downstream of something else, like overall brain health.

- Biological neural networks scale much differently than LLMs for some reason, perhaps due to differences in the kind of data they’re trained on, or how the weights update.
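
To make the scale mismatch concrete, here is a back-of-the-envelope comparison using commonly cited figures (roughly 86 billion neurons in the whole human brain, about 16 billion of them cortical); the 10% spread is the rough human variation discussed above, assumed rather than measured.

```python
# Human neuron-count variation vs. LLM parameter-count variation, in orders of
# magnitude. Figures are commonly cited estimates; the 10% spread is assumed.
import math

cortical_neurons = 16e9            # ~19% of the brain's ~86e9 neurons
spread = 0.10                      # assumed person-to-person variation
low, high = cortical_neurons * (1 - spread), cortical_neurons * (1 + spread)
print(f"human cortical range: {low:.2e} .. {high:.2e} "
      f"({math.log10(high / low):.2f} orders of magnitude)")

small_llm, large_llm = 1e9, 1e12   # typical span of a scaling-law comparison
print(f"LLM parameter range:  {small_llm:.0e} .. {large_llm:.0e} "
      f"({math.log10(large_llm / small_llm):.1f} orders of magnitude)")
```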

Expand full comment

It did occur to me that if you think of the cerebral cortex as being responsible for intelligence (which I usually do), then the cortex is too big to fit the picture I painted of a small addition on top of lots of other stuff.

But then I remembered that one of the big chunks of the cerebral cortex is the visual cortex. And another is the motor cortex. Which LLMs wouldn't have an equivalent of. Any part of the cerebral cortex that handles functions like:

* keeping track of passing time

* episodic memory

* remembering where to find the book you were reading yesterday

* maintaining a mental list of what you need to do

* probably other functions I'm not thinking of at the moment

...wouldn't be needed in an AI.

The prefrontal cortex might be a better comparison for AI-equivalence, and might still be too big to give an order-of-magnitude difference among humans. Checking Perplexity... it cites a paper with numbers that suggest a range of synaptic densities from 10e8 to 11e8 synapses per cubic mm. Still not equivalent to GPT-3.5's 175B vs GPT-4's (probably) 1T parameters.

PFC doesn't seem like quite the right comparison, but in both directions: it handles emotional regulation, which AIs (probably) don't do, while semantic knowledge, which is AI's strong point, is handled by regions outside the PFC (by the PFC too, but not only).

It might still be that if you could locate more precisely the human synapses that do things AIs do, that there would be an order of magnitude difference. But it's definitely not clear that that's the case.

Still, the fact that human brain size correlates at all with IQ, plus the overall interspecies and overall inter-AI-model trend makes it more probable to me that intrahuman differences are explained by the same number trend than that something like overall brain health is responsible.

I will also add a comment on the equivalency of measurements. LLMs are usually measured in parameters, which is not strictly equivalent to number of neurons in humans. It is closer to being equivalent to [# neurons + # synapses]. But I think even that is not quite right. Human brains start with tons of synapses, and synapse number goes *down* after like, age 2ish. This is because learning requires differentiation: some synapses grow while others are pared down. LLM weights also get pared down in order for them to learn, but in an LLM, that looks like "weight's numerical value becomes closer to 0", in which case the weight is still counted in the total number of weights. Whereas with humans, making a weighted connection go to 0 looks like "synapse shrinks and disappears", so that the synapse isn't counted anymore. So comparing human brains to LLMs should look more like [# possible/original synapses (+#neurons, but this is negligible when there's 7k syn/neuron)].

I don't think that changes the math regarding order of magnitude; there might be 2x more possible/original synapses than actual synapses, but they probably still only vary by 20% or whatever. I just wanted to point it out.

Expand full comment

Also, it's trivial to imagine scaling up certain parts of the brain and getting no intellectual benefit. A yam-sized brainstem, for example, would not make it any easier to solve that IQ puzzle.

Expand full comment

What about layers? We've had perceptrons -- single-layer neural networks -- since the 50s, but it was not until "deep" learning added layers that AI models really took off. The Goodfellow DL book provides a nice visualization of hidden layers as effectively "folding" a piece of paper like origami to find a way to separate/classify the points in the initial dimension (see Figure 6.5, pg. 196: https://www.deeplearningbook.org/contents/mlp.html). At the risk of stretching the analogy, for human intelligence this might be something like neurons in one part of the brain doing some local pattern recognition while also knowing when/how to "pass" that information up to higher-level processing centers, which themselves transform and then pass the information further up, and so on. In this way, you can have an efficient brain either because it has a lot of dense and fast neurons or because it has efficient layering of neurons.
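
A tiny illustration of why the jump from a single layer to hidden layers mattered, using the classic XOR example (my toy sketch with scikit-learn, not from the Goodfellow book):

```python
# XOR is not linearly separable: a single-layer perceptron can never get it
# right, while one small hidden layer "folds" the space and solves it.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                      # XOR of the two inputs

flat = Perceptron(max_iter=1000).fit(X, y)
deep = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                     solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)

print("single-layer perceptron accuracy:", flat.score(X, y))  # at best 0.75
print("one-hidden-layer MLP accuracy:   ", deep.score(X, y))  # typically 1.0
```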

Having said all that, we probably shouldn't tie ourselves to neural nets too much. As your friend states, "it didn't have to be neural nets", and in fact Gaussian Processes were for some time able to keep up with neural nets with a completely different approach: fitting normal distributions in infinite dimensions. However, the classic GP requires a computationally expensive step (a big matrix inversion) across all points in the training set whereas neural nets can learn on batches and so are much more scalable especially once you throw GPUs into the mix. Is the human brain learning in batches like neural nets or with a big matrix inversion like a GP? Not sure.

Last point: "endless practice on thousands of Raven’s style patterns helps a little, but a true genius will still beat you"

I think this isn't quite right. "Ability differentiation" means that people with low ability will generally do poorly on all tests, but people at the high end will have lower correlations across tests -- they'll do really well at some things and just above average on others. Moreover, Raven's is not particularly "g-loaded" (if that's your thing); Vocabulary and Information are. So Raven's would be especially differentiated at the high end of ability. In other words, it's entirely plausible that you could beat a genius at Raven's, especially with practice, even though they might blow you away at Similarities. Coming back to layers, perhaps this means that as ability increases your brain shifts from layer-like learning (so you are good at connecting concepts) to neuron-like learning (so you are good at a specific concept).

Expand full comment

Worth pointing out that dropout in NNs approximates Bayesian inference in deep Gaussian processes (Gal and Ghahramani 2015).

Expand full comment

Yeah, one point I'm still puzzled about is how we should think about the number of neurons / parameters in a neural net when you also have a lot of regularization (e.g. via dropout). It seems like you could have a very large number of neurons but if dropout is massive or you have a lot of penalization then the "effective" number of neurons is actually quite low; though this will also depend on the amount of training data I guess. To your comment above, my sense is that a huge number of parameters + a lot of regularization is the special tension that allows NNs to generalize well. Basically they can keep opening up more complex functional forms but will do so only if the data allows for it.
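
A small numpy sketch of that "effective number of neurons" intuition under dropout (standard inverted-dropout scaling; the numbers are arbitrary):

```python
# With drop probability p, only about (1-p)*N units take part in any forward
# pass; rescaling by 1/(1-p) keeps the expected activation the same as the
# full network's.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
activations = rng.random(N)              # stand-in for one layer's activations

for p in [0.0, 0.2, 0.5, 0.8]:
    mask = rng.random(N) >= p            # True = unit kept on this pass
    scaled = activations * mask / (1 - p)
    print(f"p={p:.1f}: {mask.sum():5d} of {N} units active, "
          f"mean activation {scaled.mean():.3f} (no dropout: {activations.mean():.3f})")
```

The per-pass unit count shrinks with p (the "effective" width), while the rescaled mean stays roughly constant; how much regularization this buys still depends on the amount of training data, as the comment says.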

Expand full comment

Regularization helps a lot, but so does bias: the fact that most functions represented by even very large deep NNs are of the form "something (relatively) simple plus some small wiggles and a few special cases", rather than being dominated by the special cases, as happens with traditional highly parameterized models. This is related to overcompleteness, in that simpler functions are also the ones with more different approximations in the parameter space.

Expand full comment

Yes, it definitely seems that a big part of the performance of NNs comes from them basically being a way to batch-compute Gaussian Processes. But there's another big part where instead of just computing the one function you're looking for they precompute a huge raft of related functions. And it may be hard to describe this in the Gaussian Processes framework.

Expand full comment

One part of the picture is the nonlinearities. It's easiest to reason about in the case of ReLU. It just "toggles" between various functions you might want to apply. Other nonlinearities interpolate between them more smoothly. See "A Probabilistic Framework for Deep Learning", Patel and Baraniuk 2016, for my favorite treatment of this particular point. They construct a generative model of the data where a given patch (at some resolution) either gets rendered or not, with ReLU being the corresponding part at inference time.

Expand full comment

I think that we should be careful to avoid taking the analogy between nodes in a neural network and actual human neurons too far. It's useful up to a point but beyond that point probably obscures more than it helps.

Thinking about layers is probably one of those things. In a sense, human brains have vastly more layers than any neural network we've tried. In another sense they don't have layers at all; they're not timeless functions mapping inputs to outputs, they're a constant mess of activity circulating in real time with both inputs and outputs connecting in various places but definitely not just mapping inputs to outputs.

Expand full comment

What's the subjective experience of solving puzzles (like the one in the article) for everyone else?

For me, I feel like there's two processes going on, a "loud" process and a "quiet" process. The loud process takes the form of my internal monologue trying to work through the problem methodically, laying out all the possible classes of answer and trying to think through them one by one. It's slow but methodical and likely to reach the answer eventually. Meanwhile it also feels like there's a "quiet" process going on elsewhere in my brain working on the problem from different angles, and sometimes it comes up with the correct answer and dumps it into my consciousness in what feels like a brilliant flash of insight.

What would it feel like to be dumber, in the sense of being worse at puzzles? I can imagine it going several ways. Maybe the quiet process is much weaker or absent. Maybe the loud process is slower and has more trouble keeping track of what it's doing. Or maybe the loud process is just distracted with so many other thoughts that it can't work at all.

Similarly, someone substantially better at solving problems might have a faster and more disciplined loud process, or they might be able to spawn up a whole lot more quiet processes. Or maybe they can handle multiple loud processes at once. Or they might have a completely different subjective puzzle-solving experience.

Expand full comment

I think I'm largely the same. Though I think "system 1" and "system 2" are an oversimplification of cognitive processes, I think it's a useful way to describe what goes on in my head. System 1 does subconscious pattern matching; system 2 is slow but systematic. In the article's puzzle, it was system 1 observing that squares are moving on the grid. Then system 2 asks "how are they moving", collects some observations, and selects a rule that fits---drawing from a lifetime of experience with puzzles to home in quickly on likely solutions.

I suspect that the most successful puzzle-solvers (and problem-solvers in general) have both system 1 and system 2 advantages. A system 1 that has a larger repertoire has potential to identify critical information earlier. The repertoire is acquired through experience, but the maximal size of the repertoire is determined by the capabilities of the hardware (i.e. the neurons and supporting cells). In particular, I think it's pretty obvious that neuron count should be an important factor in the capabilities of system 1. System 2 relies on precision (e.g. being able to maintain a complex chain of thought) as well as efficiency (e.g. knowing what approaches are likely to be fruitful, and what are likely to be dead ends). Again, practice is important, but the maximum capabilities are dependent on the hardware.

Expand full comment

Timothy Gowers has a series of videos on YouTube where he solves math puzzles on camera and tries to narrate his thoughts. He got a perfect score and a gold medal at the International Mathematical Olympiad when he was a kid and went on to win the Fields Medal, so he has to count as someone who is very, very good at math puzzles. It's very much like you say, where he's talking through different explicit thoughts but at the same time clearly sub/semi-consciously thinking things as well. It's interesting to pause the video and try to solve the problem yourself first and then see what he did differently.

Expand full comment

I think it would feel like "Guh? I've no idea where to start!" and not even getting as far as these processes. This is both how I feel about puzzles too hard for me to solve, and how I sometimes observe other people reacting to puzzles I can solve.

Expand full comment

Yes, it's something like that. In this case, I was considering a bunch of things then at some point the thought arrived, "Lbh'er na vqvbg. Vg'f whfg gjb fdhnerf zbivat frcnengryl juvpu unccra gb pbvapvqr va gur svefg tevq."

Expand full comment

I hate these types of puzzles. So I looked at it, and pretty quickly thought I saw the right answer, and then immediately questioned it: "it can't be that simple". Then I spent way more than 1 minute trying to figure out patterns, gave up, and looked at the answer in the link.

Well, turned out my initial answer was correct, but I did not use the reasoning in the link, and clearly had little confidence in it. So now I wonder what I would score in a formal IQ test - 100?

Back to figuring out things in real world. I know I'm good at that.

Expand full comment

> Why should a big pattern-matching region be good?

I think everything Scott said here is accurate, but it’s not asking the more important question. The research question over the past couple decades for neural nets has not been “do bigger nets learn better?” but “how do you keep big nets from learning too much?”

I think that quote from Scott’s friend explaining why bigger nets learn better is good, but it’s answering a less important question than, “how do you keep big nets from learning too much” and memorizing / overfitting to the dataset. The more interesting question is how do you get them to abstract rather than memorize? This is a good lesson for why the papers of 2012 were such a breakthrough at the time. Geoffrey Hinton’s work then used layer-wise pre-training. This was a state-of-the-art breakthrough *not* because it was the first time someone tried using a neural net of that size, but because it kept a neural net of that size from overfitting. Using autoencoders for layer-wise pre-training regularizes the model because it encourages abstraction / manifold learning.

On a related note, there was no technical reason that pre-transformers (LSTMs, CNNs, etc) couldn’t be 500B parameters. But they’d overfit at way below that range. I don’t mean to explain something that everyone knows, but remember that an infinitely wide neural net is a universal approximator and infinitely deep is Turing complete. Remember that the classic Transformer architecture is not Turing complete, whereas the LSTM was. So in a theoretical sense, the Transformer is less capable than a LSTM 1 / 100th the size. But of course, in practice, the Transformer far outperforms it, because you can regularize it better (with the best kind of regularization, more data). Now that we have the State Space Models (Mamba, etc), we can see that the Transformers aren’t better due to the attention layer, but just the scaling.
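
To make the regularization point concrete, here is a minimal illustrative sketch (the sizes and hyperparameters are made up, this is not anyone's actual training setup): two of the most common knobs in a framework like PyTorch are dropout and weight decay, both of which push a deliberately oversized net toward abstraction rather than memorization.

import torch
import torch.nn as nn

# An oversized network for a small problem; dropout randomly zeroes activations
# during training and weight decay penalizes large weights, discouraging the net
# from simply memorizing its training set.
model = nn.Sequential(
    nn.Linear(10, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)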

> The best real answer I can come up with is polysemanticity and superposition

Perhaps I’m being a grouch here, but remember that research papers are in part a rhetorical exercise, and “polysemanticity” was largely a branding effort, not novel research. The idea of polysemanticity was obvious and well understood for a long time. It was shocking how much money they spent on that. I think the polysemanticity papers did a minor disservice to AI safety by tricking well meaning, but poorly informed, people into thinking that Anthropic was making progress on AI safety.

Expand full comment

You do realize that more neurons is not just more "storage," but also more "processors," right? It's a bit like asking "how come computers with more RAM and CPU cores can multiply bigger numbers faster?"

Seems pretty intuitive to me. I'm not sure I grasp why it does not seem that way to you.

To make it even clearer: I don't think anyone disagrees that (all other things being equal) two people are smarter than one. A brain with twice as many neurons as a human's could literally be 'wired' to simulate two people, and so your question is basically asking, "how come groups of people outperform individuals?" - more information throughput, more model complexity, more storage, etc...

There's a rich ML literature on how larger models can learn better than smaller ones (hence distillation being a thing), but your question seems more basic than that: "how come bigger brains *can* be smarter" vs. "why are bigger brains more efficient at learning intelligent behaviour in some cases".

Expand full comment

To make this fact of the universe even more explicit, here is a hand-wavy proof:

Larger brains must be at least as smart as smaller brains (as larger brains can simulate smaller brains). I.e., the set of problems a smaller brain can solve in a given time is contained in the set of problems a larger brain can solve in the same time.

Larger brains must be more intelligent than smaller brains in some senses, as there exist problems they can solve in a given time that smaller brains cannot (e.g., simulate being a larger brain, or, for a brain 2x the size, simulate the interactions of two smaller brains, or solve two problems at once that would each take 100% of the capacity of a smaller brain).
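
To put the hand-wave slightly more carefully (under the assumption that the big brain really can simulate the small one with at most a constant slowdown): let P_T(b) be the set of problems brain b can solve within time T. If B simulates b with slowdown factor c, then P_T(b) ⊆ P_{cT}(B), which is the "at least as smart" half. The "strictly smarter" half then needs exhibiting at least one problem in P_T(B) that b cannot solve in comparable time, e.g. two tasks that each saturate b.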

Expand full comment

>Larger brains must be at least as smart as smaller brains (as larger brains can simulate smaller brains).

I don't think your assumption is trivially true. It is trivially true that a larger (as in more neurons) brain could potentially contain the smaller brain within it, given the right neural connections. But it is not true that this has to be possible in any given actual brain; for example, the neural connections in the bigger brain could be less efficient.

In reality, sometimes utilizing more processor cores makes computations take longer, because the overhead of sharing information between cores outweighs the benefit of the marginal core.

Expand full comment

Yeah. Parallel processing is hard for the general case (1). Doing so over slow links especially so. Distributed processing often spends as much power organizing the work as it does actually working.

(1) There are some things well suited for parallel processing. Graphics, for example. Or, it seems, LLMs. Things where you don't have to coordinate much--you can break the problem space up and divide and conquer.
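
The standard back-of-the-envelope here is Amdahl's law: if a fraction p of the work parallelizes and the rest is inherently serial, the best speedup from n processors is

speedup(n) = 1 / ((1 - p) + p / n)

which is capped at 1 / (1 - p) no matter how large n gets - and that's before you charge anything for communication overhead, which only makes it worse.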

Expand full comment

Theoretical ability to represent more stuff is a misleading desideratum. That's because you want to represent the signal in your data, but not the noise. To do that, you want a *simple* representation (though not too simple). The key to success (in addition to more technical stuff) is good priors. In particular, there's the prior that good representations in perception are hierarchical (hence *deep* learning)

Once you have a good prior, more neurons are indeed good - probably because they make it easier to escape local minima.

But all this is about perception. IQ problems and the problems in the ARC AGI tests are not reducible to perception.

Expand full comment

Why are humans smarter than elephants?

Why are crows smarter than cows?

Why are dolphins smarter than blue whales?

Brain size probably correlates with intelligence in-species, but there is definitely a lot more going on.

Brain mass of different species:

https://faculty.washington.edu/chudler/facts.html

Expand full comment

This makes a ton of sense to me as someone who’s probably working near my *minimum viable* range. I’ve had, since I was a kid, a pretty big discrepancy between demonstrated conceptual reasoning ability and spoken verbal capacity, which was a little anomalous because I was an avid reader with a large vocabulary that tested well for English and comprehension. I think I can explain it like this:

When I think in my head, I think fluidly in sort of “clouds” of words that stream through my mind, seemingly as clearly as plain English somehow, but without necessarily being in linear order, or quite fully loaded into my primary processor. My mind is very quickly able to load the *precise concept* represented by [this word I can’t instantly name but I’m pretty sure it’s four syllables and starts with an E and has a different connotation than the related-but-inferior word that I’m about to use instead]

OTOH when I’m in “analysis mode”, sort of like a flow state, my mind just goes directly between concept-->concept-->concept, instead of concept-->word selection-->vocalize-->repeat

When I have to talk in person, trying to grab that *single word*, that proper label which successfully conveys all the meta-details that I’m visualizing quite clearly, is like trying to pull a slippery red fish out of barrel of blue fish while a metronome ticks loudly in the background.

When I write it’s much ~~easier~~ doable, but I often err on the side of too many words. (Sorry, they just have a way of expanding on the page...) Also it takes me an embarrassingly long time just to write a comment like this.

I’m super envious of your ability to articulate concepts so precisely without needing so much work in the editing room, so to speak.

Expand full comment

So are the concepts a neuron shares related somehow, or just randomly thrown together? The first arrangement seems better -- an arrangement where all the concepts stuffed into one neuron are approximately isomorphic. Then instead of interfering with each other each would enrich its brothers. You could think of each pair of brothers in the same neuron as being related by a sort of conceptual synesthesia.

That model fits well with a feeling I often get when trying to figure something out -- a feeling that I know the general shape of the answer, without being able to say more at that point. And by 'shape' I mean something pretty spatial. For instance some ideas have to do with levels, with each level feeding stuff to the one above, and I see a stack of stuff in my mind when I think of the idea, and some have to do with 2 things colliding, and some have to do with how a big thing is ruined by one little part of itself, over in the corner.

Here's a real example of an argument shape: My mind has always represented the idea of an emergent property by an image of a bunch of solid shapes that get mashed together, and when they are mashed a vapor rises up from the mass -- and that's the emergent property. And sometimes when I'm trying to figure out another thing entirely, the emergent property image rises up. For instance when I struggle to think of a way to describe the relationship between brain events and qualia I might see the emergent properties shape in my mind.

And the scientist who dreamed of people holding hands and dancing in a ring, then in the morning realized that the carbons in benzene must form a ring -- that's a shape-based solution.

Expand full comment

I would expect the opposite, for it to be more efficient to store orthogonal concepts in one piece of a neural system than things that are nearly but not quite the same. The latter would probably allow a rough approximation with fewer neurons but then the epicycles required to reach a closer approximation matching the orthogonal basis representation would add a lot of complexity. The random features hypothesis would, I expect, reach efficiency close to the orthogonal representation without needing complicated preprocessing. Maybe this is why highly creative people seem to have such wild associations with stimuli?
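
One small numerical point in favour of the "lots of roughly-independent concepts crammed into the same units" picture: random directions in a high-dimensional space are already nearly orthogonal, so you can pack many more almost-non-interfering features than you have neurons. A quick sanity check (pure illustration, not a brain model):

import numpy as np

rng = np.random.default_rng(0)
d = 1000                                   # number of "neurons" / dimensions
n_features = 2000                          # more features than dimensions
F = rng.standard_normal((n_features, d))
F /= np.linalg.norm(F, axis=1, keepdims=True)

sims = F @ F.T                             # cosine similarities between features
off_diag = sims[~np.eye(n_features, dtype=bool)]
print(off_diag.std())                      # ~0.03, i.e. roughly 1/sqrt(d): nearly orthogonal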

Expand full comment

So It’s not like with clothing, huh? A sock drawer, a t-shirt drawer . . .

Expand full comment

I mean maybe it's more like clothes, this is just me vibing about how I would start to model this. The problem with all these mathematical models of the brain is that it's difficult to validate them, so it's hard to make much progress without first developing some new technique to look inside an actual brain (and such techniques tend to be invasive or destructive or indirect).

Expand full comment

SPOILER ALERT!!! DON'T READ UNLESS YOU ALREADY SOLVED THE PUZZLE!! I would think more neuronal connections would be more important than total neurons. I checked the solutions for the problem to see if I got the right answer (I did) and I arrived at the answer by a completely different route. In fact there were several different routes all arriving at the correct answer, some quite clever (clever-seeming to me since I didn't think of them) and completely outside my ability. One person noticed that you could always make an "L" shape by coloring an additional square, and only the correct answer allowed you to do this. My answer was to always color square N and the answer to the equation N+(N-1), with N increasing by 1 each frame. Starting with 1+(1-1)=1, you would only color one square. Then 2+(2-1)=3, coloring the second and third squares; 3+(3-1)=5, so color squares 3 & 5; 4+3=7; 5+4=9; and so on. The last frame would be 9+8=17, with 9 & 17 colored (box 1 would become 10, box 2 becomes 11, box 3 becomes 12, and so on, so that box 8 would be 17).

I include this not to show how smart I am, but to show how different my answer is from the "L" person's answer. Totally different routes both arriving at the same answer. It seems to point to different pathways and connections of neurons rather than total number. Of course, more neurons means more connections, or at least more potential connections, assuming the connectivity ability is intact.
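
In case it helps anyone check the arithmetic, here's the same rule written out as a tiny script (my numbering: squares 1-9, wrapping past 9 as described above):

# Frame N colors square N and square N + (N - 1) = 2N - 1,
# wrapping so that 10 means 1, 11 means 2, ..., 17 means 8.
for n in range(1, 10):
    a = (n - 1) % 9 + 1
    b = (2 * n - 2) % 9 + 1
    print(f"frame {n}: squares {sorted({a, b})}")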

Expand full comment

I'm skeptical that the "L" solution is an actual solution to the problem, rather than coincidence. I.e. you could probably make a similar task where there was a pattern like the L for the 8 first boxes and not for the 9th. Also, it doesn't fit for the first box.

I find your solution fascinating. Crefbanyyl V fbyirq guvf ol abgvpvat gung gurer ner gjb obkrf. Gur svefg bar zbirf bar fgrc naq gur bgure zbirf gjb fgrcf, obgu fgnegvat va obk bar, jurer gur obkrf ner beqrerq yrsg gb evtug, naq gura sebz gbc gb obggbz, jenccvat sebz gur ynfg gb gur svefg obk jura ernpuvat gur raq. Zngurzngvpnyyl guvf trarenyvmrf gb lbhe sbezhyn, 1. a, naq 2. -1+2a, jvgu obkrf ahzorerq sebz 1-9, naq gura fgnegvat ng 10 ntnva, ohg V arire gubhtug gb sbeznyvmr vg juvyr fbyivat.

edit: rot13, since I see others do this to avoid spoilers.

Expand full comment

Now that you mention it, I think he said you could take the colored square in frame one and use it to make an L in each of the other frames. Another person also had your solution. It would be interesting to see the distribution of various correct solutions. Haha, I didn't think to look at the pattern of dancing boxes my solution would create. I just numbered the boxes to see if there was an equation that fit and it jumped out at me right away. I doubt either of us have more neurons than the other, just different pathways that lead to the same end point.

Expand full comment

I don't know how to do that, but I'll edit and put "spoiler" across the top of my comment.

Expand full comment

just google "rot13", about the first link that comes up for me is a web-page that will translate back and forth.

Expand full comment

Thank you. Will use in the future.

Expand full comment

The speculations above strike me as highly plausible. "Yes" to neuron count being correlated to "intelligence." "Yes" to increased ability to represent and/or learn more functions. And I would add the speculation that having more neurons makes it less necessary to forget things in order to learn new things.

But I also suspect there is something more to the story. Something related to the quality of the neurons, rather than just the count. I thought I remembered reading something about neuron speed, so I asked Claude 3.7

Q: Is there research indicating that neuron speed is a factor in the variance in human IQ? If so, does the difference seem to be physiological or developmental (for example, due to more efficient pruning of unused synapses). Please answer in a few paragraphs.

A: Research does indicate that neural processing speed is associated with differences in human IQ. Multiple studies using reaction time tasks and inspection time paradigms have found correlations between faster neural processing and higher IQ scores. This relationship appears fairly consistent across research, with neural efficiency being one proposed mechanism underlying intelligence differences.

The origins of these differences likely involve both physiological and developmental factors. Physiologically, differences in myelination (the fatty insulation around nerve fibers that speeds signal transmission), white matter integrity, and neuronal membrane properties may contribute to processing speed variations. Research using neuroimaging has found correlations between white matter organization and both processing speed and IQ.

Developmentally, there's evidence that neural pruning plays a significant role. The brain initially develops with an abundance of synaptic connections, which are then selectively pruned during development. More efficient pruning may lead to more optimized neural networks, potentially improving processing speed and cognitive function. This "neural efficiency hypothesis" suggests that higher-IQ individuals may show more focused neural activation patterns during cognitive tasks, using fewer resources more effectively.

The research suggests these factors interact throughout development, with genetic influences shaping basic neural properties while environmental factors influence how these neural systems are refined and optimized over time.

Claude can make mistakes. Please double-check responses.

That last line sounds like good advice, so I asked Perplexity.ai for a fact check. It vouched for Claude's answers and provided the following links to back it up.

https://elifesciences.org/articles/41714

https://pmc.ncbi.nlm.nih.gov/articles/PMC3608477/

https://www.humanbrainproject.eu/en/follow-hbp/news/brains-of-smarter-people-have-bigger-and-faster-neurons/

https://pubmed.ncbi.nlm.nih.gov/35995756/

It is so nice not having to rely on my own neurons for this kind of research/scholarship!

Expand full comment

I got that IQ question right! Yay!

I've never taken an IQ test and I have no plans on doing that. I see exactly zero upside. I guesstimate I'm in the area of 110-120. If my score is surprisingly low, I'll have pointlessly added another disappointment into my life.

If my score is surprisingly high, I will have pointlessly fed my egotism.

Expand full comment

A 20-minute online Mensa test with similar style puzzles told me that my IQ was 134. I have no idea how accurate that is, but I totally failed to discern the pattern here in a reasonable amount of time. I still would've intuitively picked D, which amusingly happens to be correct.

Expand full comment

Yeah I picked D very quickly and then spent too much time questioning it and trying to find a definitive pattern.

Expand full comment

I didn’t get it right. Sadly.

Expand full comment

You actually _can't_ say more complex things in a language with more words. Evidence: Computers can take any English message, convert it into just 0s and 1s, and then convert it back with no loss. This means 2 words is enough to say everything that you can say in English.

You can say more things _per word count_. But that's just combinatorics--if your language has N words, and you can fit M words in a message, there are N^M possible messages you can send. Expanding your dictionary still doesn't usually help, because doing so makes each word more expensive to send (which is why computers don't lose much by sticking to 0 and 1).

You might be able to rescue your friend's analogy by saying that you can say more things if you have more _concepts_. (This doesn't translate into being limited by number of words in the dictionary, because you can express a single concept by writing multiple words.) But I don't know how to formalize that, and I'm honestly not sure whether there is some concrete cognitive unit in real-life minds that directly corresponds to "one concept".
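
The round trip in the first paragraph is easy to check directly (standard text-to-bits encoding, nothing clever):

msg = "Any English message you like."
bits = "".join(f"{byte:08b}" for byte in msg.encode("utf-8"))
restored = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")
assert restored == msg    # lossless, using only the symbols 0 and 1

# The combinatorics: an N-symbol alphabet with room for M symbols gives N**M messages,
# so a smaller alphabet just means longer messages, not fewer expressible ones.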

Expand full comment

> This means 2 words is enough to say everything that you can say in English

Raw bits are meaningless in isolation; they must be interpreted, and shoving complexity into the interpreter is not the same thing as eliminating it. You can say everything that you can say in english with two words *and* a suitable decoding scheme.

Expand full comment

English words are meaningless without an interpreter, too.

I don't see any obvious reason why an interpreter for bits that spell out english text in ASCII would be noticeably more complicated than an interpreter for ink on parchment that spells out english text in glyphs.

And even if it were, I don't see how that would support the claim that you need a lot of words in your language in order to say complex things.

Expand full comment

Yes, and that interpreter is "knowledge of the English language". Your encoded English still has all the same words, it just records them differently: 1 and 0 are not the only words of binary English any more than 65, 66... etc are of ASCII English. The language is what it is, independently of its encoding.

Expand full comment

Could you please taboo "word" and then explain your argument again?

(Referring to https://www.lesswrong.com/posts/WBdvyyHLdxZSAMmoz/taboo-your-words )

Expand full comment

Fair enough, the actual relevant unit here is the morpheme

Expand full comment

Agreed, "number of words" isn't a good metric to use by itself. Perhaps the analogy can be salvaged by postulating that the number of words you can hold in your brain at once is limited.

Expand full comment

It seems implausible to me that there would be a _language-independent_ limit on the number of words your brain can hold, unless by "word" you actually mean something like a native pointer in your brain's address space, rather than a linguistic construct that you can write down on paper.

Expand full comment

< Computers can take any English message, convert it into just 0s and 1s, and then convert it back with no loss. This means 2 words is enough to say everything that you can say in English.

No, that’s not right. It means you can say anything in English using 2 *letters*.

But letters are not units of meaning, only certain combinations of letters are units of meaning.

“XWRKL”. Did I just say something? Nope, right? Translate that into binary and I still didn’t say anything.

You can’t say everything with 2 letters any more than you can with 26 letters. You can only say it with words, the unit of meaning.

Expand full comment

The concept of a word is maybe a human bias; when you get into more extreme forms of compression they start blurring the symbols in between bits. (Though anything that compressed is necessarily straining practicality.)

Expand full comment

No, it isn’t a human bias, it’s a real and meaningful distinction. If I write out the word ‘orange,’ I’m not accreting meaning letter by letter — it’s only the whole word that has meaning.

Expand full comment

I agree human speech is made of words and letters, but the "unit of meaning" (symbols) can be made to be under a bit, as you see with extremely aggressive compression. And those theories would do extremely well with analog signals.

I'm questioning your framing of discrete units of meaning as universal, rather than a bias of practicality or of humans. More fundamental than the 0s and 1s of the OP is listing all possible messages between 0 and 100% and getting a single number; grammar, words, and letters may just disappear into a blob of waveforms of possibilities.

Expand full comment

I don’t understand what you have in mind. Can you say more — maybe give an example?

Expand full comment

Neurons seem to communicate by timing, which is an analog signal: .10001 nanoseconds may have a different meaning than .10002; they are sending a pulse of maybe one set strength, so the "unit of meaning" may be the range between .100015 and .100025.

One could imagine that nuclear subs want to completely minimize their signals, but they may have to send something, so let's say one analog pulse. Imagine trying to encode an entire status message into a single pulse: 0-.5 could be "I'm fine" and .5-1 "PANIC", and then, subdividing further, maybe a GPS location could be encoded in a range of .000001% of this "space".

Before brains form, all life is making a 3D shape where growing extra cells (*cough* cancer) is controlled, making a specification that makes some sense. They have waves of electricity that must be encoding knowledge of their current shape and a democratic consensus about further growth; *any* level of success at this is, I believe, quite beyond what computer science has for "swarm intelligence", yet 99.99% of the time we are born without extra arms, to say nothing of the thousands of other micro successes.
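
For what it's worth, your submarine example can be made concrete: any fixed list of fields can be packed into, and unpacked from, a single number in [0, 1), and the only real limit is how precisely the pulse can be produced and read. A toy sketch with made-up field sizes:

def encode(panic: bool, lat_step: int, lon_step: int) -> float:
    # panic: 1 bit; latitude/longitude: 0..9999 steps each (arbitrary resolution)
    code = int(panic)
    code = code * 10_000 + lat_step
    code = code * 10_000 + lon_step
    return code / 200_000_000             # squeeze the whole code into [0, 1)

def decode(pulse: float):
    code = round(pulse * 200_000_000)
    code, lon_step = divmod(code, 10_000)
    panic, lat_step = divmod(code, 10_000)
    return bool(panic), lat_step, lon_step

# In practice the noise floor of the analog channel caps how many such
# subdivisions survive transmission, which is the same constraint a
# timing-coded neuron would face.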

Expand full comment

The word "apple" in English and "pomme" in French both refer to a category of objects that we can observe in the world. In order to convey the same meaning in 0s and 1s you have to have a combination of 0s and 1s that represents a unique identifier (or one unique enough that it can be derived from context).

That is the fundamental level of meaning, not bits. You can change the categories to more general ones, and use more words to convey the same meaning. E.g. "round fruit of a tree of the rose family", and you could break "fruit" down further, and so on, but you can never break this down to bits as the fundamental unit of meaning, because then you would only have two categories containing meaning.

Expand full comment

Words are not messages; in the Wikipedia compression contest, the message is Wikipedia, and the context is hardly a thrown-away part.

Instead of breaking down words further, you send entire messages and words may disappear.

Expand full comment

I'm not sure what you mean.

Sure you can compress information, but there is a limit to how much. You still need to be able to identify the message afterwards - hence the unique identifiers. I think this is orthogonal to what we are discussing?

Expand full comment

I'm seeing three theories of what the fundamental unit of communication is: bits, words, messages. You're asserting that perfect decoding is necessary, and therefore that words need to be decoded.

While I think bit theory is wrong, it's at least universal. I really, really think it needs to be about messages. Consider a cry from a prey animal: there may be nuance in how loud to yelp, or some grammar, when seeing a threat, but it won't matter if it goes unheard because of distance; yet a message is still being received if hearing any noise at all increases awareness.

Expand full comment

Words are not the unit of meaning. Linguists call the smallest units of meaning "morphemes", and they don't correspond to what laypeople call "words". Linguists actually can't agree amongst themselves on what "words" are; there's no accepted formal definition.

So if we're being technical, the original claim wasn't formalized enough to have a precise meaning. I'm making an informal argument intended to show that the underlying intuition is wrong.

There are lots of meanings you can convey in English that can't be expressed by a _single_ computer bit. But there's also lots of meanings you can convey in English that can't be expressed by a single English word! We string words together to convey complex ideas that don't have their own dedicated word. Which is why it wouldn't make sense for the number of words in the language to be the bottleneck on how many meanings you can express in it.

But if you want to dicker over the details, then "zero" and "one" are usually considered to be valid and meaningful words. To distinguish how computer bits are used differently from words, I think you'd need to say something about how they're composed, rather than about what they mean individually.

Also, you point out that not every combination of letters is meaningful. But this doesn't distinguish them from words; not every combination of words is meaningful, either. Try looking up some of the discussion around the famous phrase "colorless green ideas sleep furiously."

Expand full comment

0s and 1s are fundamentally different from words in that they don't have any meaning in isolation. It's a combination of 0s and 1s that has meaning. Somehow those 0s and 1s has to be combined into chunks that point to some agreed upon meaning - which is loosely the same concept as words.

You could reduce the number of words in a language, but then you would have to use a lot more words to convey the same meaning. Since most words have fuzzy meanings and edge cases that may or may not be within the same category, this would probably also mean that you would become more precise by accident.

For example instead of saying "cup" you would have to say something like the definition, so "small bowl-shaped container with handle, for drinking" - but some things we would probably call cups are not bowl-shaped, and some don't have a handle, and cup could also refer to a sports event or a unit of measure.

Expand full comment

To break this down further, I think the fewest "words" necessary would theoretically coincide with the fundamental underlying physics that give rise to the world.

So for example, if we pretend that atoms are the fundamental unit (as in the original Greek word), then we need a word for every type of atom (element) and all the properties that an atom can have (position in time and space etc.), in order to describe the world in arbitrary complexity. This would be very inefficient in terms of number of words used, but efficient in terms of number of words needed in a language.

Expand full comment

How many distinct messages can English convey? Thousands? Millions?

Two independent "words" can only convey two messages. It is not possible to encode thousands of messages into a 1 or a 0. Cannot be done.

Using binary strings, many messages can be encoded; that's how computers do it.

Expand full comment

English and binary can both convey an unbounded number of messages because the messages can be unbounded in length.

Expand full comment

... which is my point.

Expand full comment

You claimed that English can convey thousands or perhaps millions of distinct messages. I demurred, saying the number is unbounded, not mere millions. Then you complained that was your point. I notice I am confused.

Expand full comment

I never placed a bound on the number.

Expand full comment

The relation between neurons and intelligence might not be obvious, but it seems obvious to me that you should at least be able to do "more total thinking" if you have more hardware to do it on; i.e. one big mind should be at least as capable as a bunch of small minds stuck together.

This should at minimum let you solve more problems per unit of time, even if it doesn't let you solve harder problems. (And note that IQ tests have time limits, so this is probably at least somewhat relevant to whatever they measure.)

I do think that intelligence is more than just "thinking faster", though, so I don't think this is the entire story.

Expand full comment

What is GoogBookZon and what is it building? This name doesn't google

Expand full comment

It's an amalgam of Google, Facebook and Amazon. As it happens, none of those are building the $500B data center (that would be Stargate, to be built by OpenAI and others including Microsoft). But GoogBookZon do plan to invest $340B in AI in 2025 between them, and anyway the main point about massive investment in AI scaling stands.

Expand full comment

This issue (that we haven't resolved neuron count vs structuralism as important factors in intelligence) is why I'm against things like neuralink. Because if it's just (or nearly just) a matter of number of neurons, then giving brains the ability to directly communicate risks them synching up and forming a superintelligence.

I'm not "bomb the labs" worried, though, because my personal bet is on diminishing returns and more system instability as your neural net gets larger. So linking millions of minds directly might result in a superintelligence only a bit smarter than us but very, very suicidal. That, and nobody has solved the inflammation/scarring issue yet.

Expand full comment

What about the communication complexity barriers? Lots of tasks would become limited by the high latency inter-brain connections, just like high performance computing is bottlenecked by needing to shift data between different physical parts of the system. Not everything is intrinsically embarrassingly parallel.

Expand full comment

It appears that growing the brain too big leads to the higher impairment versions of autism though: https://www.sciencedaily.com/releases/2020/12/201217135228.htm

Expand full comment

I think that the general feeling among many people is there is a deep connection between humans learning 'patterns', in some broad sense of the word 'patterns', and neural nets learning features and/or representations in some latent space. The mapping is not clear or necessarily well-defined at the high level of abstraction humans operate at, so this analogy leaves a lot undetermined.

A good way to gain intuition about these types of things is just to play around with simple neural nets in various ways. I've always liked Hopfield nets (https://en.wikipedia.org/wiki/Hopfield_network) for this - in a simple case they can be used to store images by 'learning' a type of energy function, and thereby denoise images close enough to one they've learnt. See e.g. https://github.com/takyamamoto/Hopfield-Network for an implementation. The github page has some nice examples of learned images.
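
For anyone who doesn't want to open the repo, the core of the classic (binary, Hebbian) Hopfield net really is just a few lines - a minimal sketch, not the exact code in that repository:

import numpy as np

def store(patterns):
    # patterns: list of +/-1 vectors; Hebbian outer-product learning rule
    n = patterns[0].size
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)                 # no self-connections
    return W / len(patterns)

def recall(W, x, steps=20):
    # repeated thresholded updates usually settle on the nearest stored pattern
    x = x.copy()
    for _ in range(steps):
        x = np.where(W @ x >= 0, 1, -1)
    return x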

Expand full comment

Apropos of your example, I've memorized about 70% of Paradise Lost (I'll have it all before I die!) so I can say as a data point that it doesn't seem to me like it's unlocked any general pattern-recognition abilities. I'm also good at Raven's matrices, but slightly worse than when I started the Paradise Lost memorization, which I attribute to the independent effect of age-related decline.

(and if anyone would like to hear me recite books 1-9 of PL from memory, come to PDX and visit Rose City Book Pub at 7:30p on the first Saturdays of Apr-Dec this year! I do voices and everything; it's great.)

Expand full comment

Cool! Have you noticed yourself getting better at memorizing it over time?

Expand full comment

Yes, for sure. I'd say I can add lines at roughly twice the rate at which I could when I started. It doesn't seem to generalize to other poems I try to memorize, so I think my brain is picking up deep patterns in "how would Milton say this" even if those patterns don't help with Raven's matrices.

Expand full comment

Most animals are born with a set of hard-wired instinctive responses to stimuli, and higher animals, especially in their infancy, can then learn things, up to a point, on top of that. Some animals, such as cats, are avid learners until adulthood but then pretty much stop learning (I think, based on personal experience with my cats).

So a good question to ask is how the brain develops during the animal's early days, and (in the case of cats for example) what stops changing in the brain when they reach the age of about six months? Specifically, does the development stage involve increasing the number of neurons, or (as I suspect) do those stay about the same while it's the axons connecting them that increase in number?

Expand full comment

The real question: if tiny bird-sized neurons are good enough, why can't I use them and pack a lot more neurons into my medium-sized head?

Expand full comment

Your mammalian genes don't know how to make bird neurons.

Expand full comment

Well yes. So I should ask in a less fun way to avoid misinterpretation: why doesn’t selection pressure in mammalian lines push towards smaller neurons, and (in particular in humans) avoid the trade-off of intelligence versus head size and the birth canal, given we have an existence proof (not that evolution is teleological, of course)?

Expand full comment

Birds have had continuous pressure to reduce the size and weight of everything in their body for many many generations, because of flight. Primates have only had brain size be the limiting factor in their intelligence for a comparative eyeblink.

ETA: There may be engineering factors like John R Ramsden speculates producing harder limits as well, but I think this explanation is sufficient even if not necessary.

Expand full comment

>Birds have had continuous pressure to reduce the size and weight of everything in their body for many many generations, because of flight.

I am sure this explanation is totally wrong.

It's rather that mammals were small-sized and happened to get a brain structure that is good for small animals but gives large neurons at large body sizes.

(slopes on log brain - log body charts are similar for "reptiles" and "birds" but different for "mammals").

Expand full comment

It's not as though mammalian genes can get inspired by looking at the bird genes. (Ok, lateral gene transfer, but that's unlikely to succeed.) Evolution has not had time to try out every possible combination, and making big changes to an already-working complex system is hard because if it breaks along the way, you lose. Maybe with another few million years we could get something similar, but human-directed gene engineering could do so much faster.

Expand full comment

Your neurons would overheat, and your head would explode! :-)

(See my post elsewhere in this thread, speculating on birds' higher body temperature and thinner skulls, allowing their brains to run hotter)

Expand full comment

Are you suggesting that we have had edge-cases where a mutation causing neurons to become smaller caused somebody's head to pop because they were thinking to hard? :D

Expand full comment

From what I vaguely remember after looking into this the last time Scott wrote about it, I think smaller neurons are more energy-costly to run because they need bigger differences in electric charge between neighbouring neurons for them to be individually resolvable, since they're closer together. If you're not saving weight to help you fly, it's probably better to just make your brain bigger rather than denser, like evolution has been doing.

Expand full comment

You'd also have to look at the number of *connections* not just number of neurons

Expand full comment

I think there's a problem with this approximation :) At least for humans (it probably is mostly accurate for ML models).

If you have more parameters you also need more data to fit anything (with ML models at least). A 2-parameter linear regression only needs a few examples to be fitted very well. An LLM needs insane amounts of data to move the parameters anywhere - because it has so many wiggly bits, there are so many ways those can fit the data, and you need a lot of data to be able to dismiss the wrong ones.

But this doesn't match how smart people learn. If it did, smart people would learn slower but would be able to understand more complex things. But smart people usually converge to the right answer faster AND can understand more complex stuff. They are also often better at simplifying complex approximants to simpler ones (which are also easier to work with and could plausibly require fewer neurons). When I was a postdoc (maths) I was always amazed by my boss's ability to rephrase complex ideas in much simpler terms which were so much easier to work with - even for me.

I don't think just having more wiggly bits for more complex approximations helps here. Most really smart ideas tend to be simple and elegant when you really grok them. It takes a lot of thinking and a lot of practice with the idea to be able to go from the complex approximations to the simpler (but equally or more correct) ones. And at least in my experience the really smart people mostly work with the simpler approximations and so don't fill their entire working memory with the "Taylor expansion of the idea" of an umpteenth order, they just don't brute force it the way a less brilliant person or an ML model might.
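
The "more wiggly bits need more data" half of this is easy to see with plain polynomial fitting (the standard textbook illustration, nothing to do with brains specifically):

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 8)                          # only 8 noisy observations
y = 2 * x + rng.normal(scale=0.1, size=x.size)    # the true relationship is linear

simple = np.polyfit(x, y, deg=1)   # 2 parameters: recovers roughly y = 2x
wiggly = np.polyfit(x, y, deg=7)   # 8 parameters: passes through every point, noise and all

# The flexible fit matches the training points better but generalizes worse;
# it would take many more observations to pin its extra parameters down.
print(np.polyval(simple, 0.55), np.polyval(wiggly, 0.55))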

Expand full comment

Yeah, it seems like LLMs learn much more slowly. Perhaps their training is more analogous to evolution for humans, and in-context learning is more analogous to human learning? Or arguably early childhood is the analogue to LLM training. It takes a very long time for kids to pick up early concepts compared to the later pace of learning.

As far as making simpler, more elegant mental models for problems, maybe part of it is that smarter people already have more models available, so they can pick a simpler one? If you only have one solution that comes to mind, it will likely be more complex. I also think smarter people can cycle through options better and may also have a better appreciation for an elegant solution because they have more options to compare across.

Expand full comment

"As far as making simpler, more elegant mental models for problems, maybe part of it is that smarter people already have more models available, so they can pick a simpler one?"

but why would this be the case if we model learning the way Scott describes, i.e. as an approximation algorithm which fundamentally just adds more complex "polynomials" to get closer to the actual result. Those things give you better predictions perhaps, but they are not easier to understand. Whereas the most elegant solutions typically are easier to understand and do not require more wiggly bits ... unless the wiggly bits are a lot more abstract ... but then why is it also easier for less intelligent people to understand the more elegant solutions?

"smarter people can cycle through options better"

I think you are onto something here. But why do you need more neurons for that?

Smarter people are also a lot better at abstractions and generalizations (which ultimately make things simpler). But both usually produce something that is simpler to understand, so presumably needs fewer neurons.

The way it feels to me is that if you really are onto something you somehow zoom in on the defining parts of the problem and then figure out how those interact. Whereas when you cannot solve the problem, you get bogged down in complex and useless details. I have no idea how this is done. But obviously being very smart helps but also having thought about similar ideas for a long time and from various angles and doing "what if" experiments with them helps a lot. Smart people are better at this exercise.

I remember one time when I was on a bus, just thinking, and figured out a new mathematical proof that way, without a pen and paper. When I got home, I just wrote out the details to be sure I hadn't made a mistake anywhere. Things just fell into place once I realized the simpler basic structure of the problem. But other times, when I could not strip down ideas to their core, I ended up wrangling complex equations for weeks, and I was often not too sure I really understood on a deeper level what those equations meant outside of their formal representation. And this then rarely led anywhere (some proofs are like this, but those are also the least interesting ones, and the theorems they prove also tend to be sort of boring and technical or very applied and specific).

Maybe this is better rephrased like this - smarter people have more original thoughts and can understand things on a deeper level. That is not necessarily a more complex level. But this ability to see through the dressing and observe the fundamental pieces of an idea is what I would describe as being smart.

Expand full comment

Not gonna lie, stuck on the IQ test cause the patterns to me don't make any sense with the possible answers provided, and the one comment stream that seemed to be addressing it has people suddenly talking in some foreign language.

Expand full comment

It's called rot13: changing A to N, B to O, C to P, etc. People are using it here to avoid spoilers. You can decode it at rot13.com (among other places).
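
If you'd rather do it locally than paste comments into a website, Python's standard library happens to ship a rot13 codec (encoding and decoding are the same operation):

import codecs
print(codecs.decode("Uryyb jbeyq", "rot_13"))   # prints "Hello world"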

Expand full comment

How come LLMs have scaling laws proportional to the log of their parameter count but humans are much cleverer than chimps but only have a brain 3x as big which is a pretty small difference on a log scale? Do neurons have a faster scaling law, or are human brains more neuron dense than chimp brains, or is the chimp->human difference smaller than it appears, or something else?

Edit: Another random question I have is if it's easy to scale a normal mammalian brain to human level intelligence, why don't all large animals have human level intelligence? Seems like it would easily be worth it for a rhino to expend 0.5% of its energy budget to run human level intelligence. What's constraining rhino brains to be so small?

Expand full comment

The simple answer is because our brains don't use a Transformer architecture like (most, but not all) LLMs.

> How come LLMs have scaling laws proportional to the log of their parameter count

Hmm, I'm not familiar with this. The Chinchilla Scaling Law paper, if I remember correctly, isn't log(parameter count) but rather it was that performance is predicted by a power law involving both model size and dataset size. Even if I misremember Chinchilla, it was definitely specific to the Transformer architecture, and our brains definitely are not. Or was there a different scaling law you're referring to?
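
For reference, the Chinchilla fit (if I'm remembering the form right) looks like

L(N, D) ≈ E + A / N^alpha + B / D^beta

with N the parameter count, D the number of training tokens, and E an irreducible term - a joint power law, so past a point neither more parameters nor more data helps much on its own.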

But it doesn't change your question, because we still seem to be way smarter than chimps, but our brains aren't that much larger. It seems that primate brains are perhaps scaling more efficiently than Transformers.

There are definitely emergent behaviors in LLMs, so it could also be that. Take a look at Figure 2 on page 4: https://arxiv.org/pdf/2206.07682

Expand full comment

Humans are capable of full causal reasoning, analogical reasoning and compositionality, but chimpanzees are not. These abilities give us a very different intelligence (not just more pattern recognition) than chimpanzees (or any other animals, including birds). Elephants have been tested for these abilities and do not have them either. It's not just a question of brain size or how little energy it would take in the rhino example you give -- you need this something "magic" (well.... sounds better than describing possible mechanisms) to happen in brain evolution, as it did in the relatively recent past in humans.

Expand full comment

Magic mushrooms…could’ve happened to anyone; we just got lucky.

Expand full comment

Can't comment on why animals have different brain architecture but transformer scaling is exponential in depth, linear in reasoning steps, even though it is only logarithmic in the number of parameters: A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

https://arxiv.org/abs/2503.03961

So focusing on size might be unproductive.

Expand full comment

I am not questioning the intuition, I think it is correct, but just wanted to point out that you can consider the English language as having only 26 "words" if you want. You CAN express anything you want with 5 symbols. It would just take much longer.

Expand full comment

< you can consider the English language as having only 26 "words"

No, words and letters are different kinds of entities. Letters do not have meaning, only certain groups of them that we have agreed are words. ‘Xdhhm’ does not mean anything. ‘Zebra’ means something, but the individual letters in the word do not make individual contributions to the word’s meaning.

“A zebra just ran past.” If you combine words in a sentence you get more complex meanings. And each of the individual words contributes to the sentence’s meaning.

Expand full comment

Sometimes you get one-letter words (English a, I, formerly o--not counting "u" because it's a shortened form of a multi-letter word), but notably you can only have as many of those as you have letters. Given that attempts to write using only 1000 words struggle to convey complex ideas, that does seem to put a limit on things (imagine trying to understand https://xkcd.com/1133/ without already knowing the full vocabulary).

Expand full comment

Steele's classic vocabulary limited paper Growing a Language spends a lot of effort on defining new terms which then can serve as compression. Without the compression of abstraction it is still possible to use two symbols (heck, even unary will do) but saying anything interesting requires a lot of symbols. There is a tradeoff between expressivity, vocabulary size, and length of text required to express a concept, but all human languages I know of have enough vocabulary to allow expressing arbitrarily complicated concepts, although you might have to speak for a long time and get the person you are talking to to first understand the intermediate concepts you are using if they don't already have it in their mental toolbox.

Expand full comment

You mean like saying parking place in French?

Expand full comment

I have to admit I was stumped by cold wet stuff that you breathe for way too long

Expand full comment

I feel like this description is too heavily focused on pattern recognition. That's a big part of what these models do, sure, but once they recognize patterns, they have to do something with them -- they build little pieces of software to properly handle the patterns. It's these little machines all working together that gives these models real intelligence, not just the pseudo-intelligence that comes from pattern matching.

Polysemanticity and superposition are the answer to a different question-- how can something with less than a trillion parameters hold so many concepts at once?

Expand full comment

My bet here is that it's not neuron count (nodes in a graph) but actually edge count (connections between neurons) that matters - for reasons that I can't quite articulate beyond noticing that organizing information into a graph / tree structure allows you to encode orders of magnitude more things than if you aimed for a 1 node = 1 thing approach.

To back this up: something like 60% of the brain's mass is the wiring between neurons.

Intelligence also seems to involve things like "recursion" or "abstraction" or "making connections" - all things that graphs excel at.
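
Back-of-the-envelope support for the edges-over-nodes intuition: n neurons give you only n "slots", but on the order of n^2 possible pairwise connections, and therefore something like 2^(n^2) distinct wiring diagrams even before you count connection strengths - so almost all of the combinatorial room is in the wiring, not the node count.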

Expand full comment

Edward H. Rulloff

Expand full comment

interesting story. thanks

Expand full comment

His brain was just sitting in a vat in the hallway of Cornell's psychology department. Not sure if it's still on display.

Expand full comment

Total Puritan Score: 36 points

-Raised in New England (including upstate NY): +1

(Technically Canadian-born, but fully committed to crimes and scholarship in Upstate NY.)

- Has 4+ siblings: +1

(Eldest of six children.)

-Relative named Elihu, Eliza, etc.: +1

(Brother-in-law named Ephraim—close enough!)

-Per invention (max 3): +3

(Philological theories, photography techniques, carpet designs, and more.)

-At least one eccentric invention: +3

(Proposed a linguistic manuscript he valued at half a million 19th-century dollars.)

-Achievements in multiple unrelated fields: +3

(Doctor, lawyer, linguist, carpet designer, murderer—broadly unrelated!)

-Atheist, Deist, or Freethinker: +1

(Philologist and self-taught intellectual certainly counts as freethinker.)

-Wrote a book about their heterodox religious views: +3

(Obsessively authored “revolutionary” linguistic theories bordering religious zeal.)

-Went to Harvard, Yale, or MIT: +1

(Never attended, but currently “resides” permanently at Cornell’s Wilder Brain Collection—Cornell is Ivy League! Come on guys... it is!)

-Practiced Law: +1

(Repeatedly defended himself successfully in court.)

-In college society with weird classical name: +3

(Used semi-classical sounding pseudonym “Professor Euri Lorio” at the American Philological Association.)

-Founded their own school: +3

(Taught Latin and Greek from his prison cell—I'm counting this as a unique academy.)

-Ideals that were utopian yet racist: +3

(Victorian-era phrenologist, textbook!)

-Waged a crusade against abstract concept: +3

(Lifetime crusade against linguistic ignorance and imprisonment itself.)

-Social reformer: +3

(Attempted to revolutionize society through both scholarship and burglary—questionable methods, but points for ambition!)

Expand full comment

One thing I'm curious about is the strength of this relationship in humans vs neural nets. My vague impression is that the relationship is quite weak in humans (at least using volume as a proxy), while increasing the number of neurons/parameters has been a pretty big deal for neural nets. For example there are some extreme case studies where individuals have pretty normal intelligence while lacking a whole hemisphere, or having lost a large number of neurons to hydrocephalus. There's probably no simple way to compare the strength of these relationships directly, but if true I wonder why. Maybe it's simply that humans are all already high up on a curve of diminishing returns? The more exciting option is that there's more variability for biological brains in how the networks are effectively structured, whereas NN AI systems are mostly trying very similar things.

Expand full comment

Interesting facts and hypotheses, and so many comments, but.... it's not just neuron count:

-human brain and chimpanzee brain almost identical (and genetics so very close), yet latter not capable of causal reasoning, analogical reasoning or compositionality

-birds not capable of these either (whole literature on twig experiments -- yes impressive object handling abilities), neither are elephants (harder to test whales :)

-you may be able to achieve pseudo-causal/analogical/compositional abilities with large enough LLM and reasoning engine (and latter may allow it to become true), but smaller cognitive architecture may provide much more elegant solutions to synthetic AI/AGI/SI with these abilities

-clinical psychiatry actually has not cared about causal/analogical/compositionality (MoCA.... aghhh... bits of it but not really)

Enjoy your columns. Don't usually post. Good topic.

Expand full comment

Sorry, should have clarified: "-human brain and chimpanzee brain almost identical (and genetics so very close), yet latter not capable of causal reasoning, analogical reasoning or compositionality" -- but not identical -- some small changes allowed causal reasoning, analogical reasoning and compositionality to emerge, and not just an increase in the number of neurons; e.g., people with hydrocephalus and cortical thinning are still able to function normally (i.e., there is still causality, analogy and compositionality) -- so it's not just the number of neurons.

Expand full comment

> some small changes allowed causal reasoning

This is interesting to me. Could you be more specific, or point me to a source?

Expand full comment

Artificial neural networks are trying to curve-match using a series of linear functions, where each layer uses the prior layer's output as its input. The more terms they have, the closer they can get. You can imagine a really simple "neural network" where each layer just has one neuron and it tries to curve-fit square roots using Newton's method (you can't do this with a linear neural network that uses weighted values, but the idea is the same). Each neuron would just be set to x - (x^2 - input)/(2x).

So if you only had one neuron, and you put in 5, you'd get 5 - (5^2 - 5)/(2*5) = 3. With two you'd take that output and get 3 - (3^2 - 5)/(2*3) ≈ 2.33. With a third you'd get about 2.238. Each additional iteration would get you closer to the 2.236067... true value, though none would ever get it perfect.
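
Spelled out as code, this toy "network" is literally Newton's iteration stacked as layers (a sketch of the analogy only - there are no learned weights here):

def newton_layer(x, target):
    # one "neuron": a single Newton step toward sqrt(target)
    return x - (x * x - target) / (2 * x)

target = 5.0
x = target                    # crude initial guess
for depth in range(1, 5):     # each additional "layer" refines the estimate
    x = newton_layer(x, target)
    print(depth, x)           # 3.0, 2.333..., 2.2381..., 2.23607...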

This seems far removed from, say, solving a logic puzzle or writing a novel in the style of Jane Austen, but internally, every neural network looks this way. It assumes there exists a curve in multidimensional space that abstractly represents a perfect novel in the style of Jane Austen (or more accurately, the perfect next word in the context of a novel in the style of Jane Austen), and attempts to match that curve, with every new neuron giving it an additional tool to inflect its own curve a little closer to that perfect one. During training we adjust those neurons to more closely match the training data curves, in the hopes that it matches future curves in a similar way.

Human neurons are a tad different because our neurons have recursive feedback. Even if we assume we learn the same way (which hasn't been entirely established), we do not always move data from Neuron A to Neuron B to Neuron C linearly. Sometimes Neuron C will feed its data backward to Neuron B or Neuron A.

This is one reason I am skeptical of neural networks' ability to truly reach and surpass human intelligence. It's very good at fitting data based on past information. And to be fair, humans usually work in this mode. Very good plumbers are good because they've seen a lot of plumbing situations and so can quickly say "oh this situation needs this solution." As your friend points out, these rules don't have to be object-level, they may be meta-rules. A mathematician may say "here are techniques I have used to solve problems before, maybe this technique fits this problem too."

But even assuming that all thought is curve-matching, there are some curves that, mathematically, cannot be approximated with linear terms. Put another way, some thought processes need the ability to modify themselves in real time, not during training, to incorporate information that is only discovered in the process of thinking. Once we crack that, I'll be truly afraid.

Expand full comment

A few counter-points:

1) This seems to be focused on a simple MLP-with-ReLUs paradigm of NNs. Many modern architectures are not covered by it - RNNs, transformers and SSMs to mention the most important ones, and of course even plain old MLPs commonly use other nonlinearities.

2) "there are some curves that, mathematically, cannot be approximated with linear terms." - I'm not entirely sure what precise statement you're implying. Is it just that universal approximation theorems do not apply to literally all domains and all types of functions? If so, you'd need to argue that e.g. being able to approximate Cantor's staircase or the Weierstrass function matters much to intelligence. Can our neurons (as opposed to our high-level symbolic reasoning) actually approximate such functions well?

If what you're saying is that there are many _important_ functions that cannot be approximated _efficiently_ with linear terms - that is certainly true and impactful. Good thing we are not in fact limited to linear terms. One topical example is the softmax in attention.

3) The analogy to human neurons - we do indeed strongly believe that our brain does not implement a current-day deep learning model and does not learn via gradient descent. But, regarded as mathematical objects, neurons are not highly sophisticated. Whatever magic inhabits the brain, it's probably not merely the presence of feedback (which is used in some NNs as mentioned above).

4) "It's very good at fitting data based on past information." - this passage seems to over-focus on memorization/ overfitting. And yet if there is one surprising this about NNs, it's precisely that they do generalize. They do find some meta-rules. Why this should be the case is a whole different discussion (which I touch upon in a different thread).

Finally, I agree that we likely lack components such as good real-time learning. But notably, recent progress has indeed been focused on inference-time compute, so this particular objection isn't doing as well as it used to.

Overall, I actually usually argue against over-confident AI-worrying. As you say, we don't yet know how to train models optimally. We also don't know how to efficiently incorporate prior knowledge about the world into the model. But it's worth noticing when our set of reasons for not being worried should be updated.

Expand full comment

I don't have anything to add here other than to say that this is a good comment, and I guess I know what I'm doing this weekend.

Expand full comment

The AI people always say "you wouldn't build a submarine that swims like a whale." But we know how swimming things swim, and we don't know how thinking things think....

Expand full comment

There are also "system 1" and "system 2."

https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

S2 programs S1 so that a chess master can glance at a board and say, "mate in 4" without activating S2. Possibly the ease and frequency with which S2 activates could be a factor (it feels that way to me). [edit: That this seems likely to be relevant makes one question the observation that IQ really does appear to be a scalar (one-dimensional) value.]

Expand full comment

> Humans with bigger brains have on average higher IQ (embedded link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7440690/)

This paper is pretty weak tea scientifically speaking. ;-) A correlation of ρ = 0.18 (p < .001) means that brain volume accounts for only about 3.24% of the variance in intelligence. The authors go on to claim that the encephalization quotient (brain size vs. body mass) should be more correlative, but several of the papers they reference suggest that this is not the case.

Global variation in cranial capacity, however, is strongly correlated with latitude and climate. A study by Beals, Smith, and Dodd (1984) -- abstract here: https://www.journals.uchicago.edu/doi/abs/10.1086/203138 — suggested that climate-driven selection may have influenced brain size and shape. Human populations under severe cold stress had larger volumes and rounder cranial shapes (to optimize heat retention). Populations from warmer climates had somewhat smaller cranial volumes and more elongated skulls. Subsequent studies confirmed the correlations.

Does an Inuit have more neurons in their skull because their brains are more spherical? Maybe. And then we have to also remember that Neanderthals (a population that seems to have cold-adapted) had larger absolute brain volumes (~1,500–1,750 cm³ vs. ~1,200–1,450 cm³ for modern humans).

Expand full comment

"Human populations under severe cold stress had larger volumes and rounder cranial shapes." Funny, Today I learned that Huns liked to tie boards to their babies heads to elongate their skulls, for whatever reason.

Expand full comment

Since several people mentioned that the number of connections might be more important than the number of neurons: it seems that more intelligent mammals have *fewer* connections per neuron than less intelligent ones. If I recall correctly, the number of connections per cortical neuron is only about 2,000 in humans, but 10,000 in mice, and somewhere in between for rats and cats. I'm out of the loop nowadays, but this was considered a major puzzle of biological neural architecture.

I think toddlers also decrease the number of connections during development, though this may be an unrelated effect. Do neural networks also reduce the number of strong connections during training?

Expand full comment

I know Raven's matrices are supposed to resist training, but I vaguely remember realizing "imagine two blocks each moving at their own rate and direction" at some point, and now I always try that first and it usually works. So I would expect to do better at all Raven's-y IQ tests after that point.

Expand full comment

No doubt neuronal count matters. But the major intelligence driver is connectivity. The majority of additional brain mass in humans (vs other primates) is due to the insane level of connectivity. About 10,000 connections per neuron in the neocortex. Imagine the number of combinations in only a dozen neurons.

Expand full comment

Connectivity-per-neuron seems to have its limits, though: humans with higher levels of neural connectivity past some point stop getting smarter and start getting autism.

Expand full comment

I thought the usual theory was that too-high brain connectivity led to schizophrenia?

Expand full comment

You might be right, I knew it was either schizophrenia or autism but only thought it was actually autism.

Expand full comment

A quick look seems to suggest that autism might be associated with brain size, while both autism and schizophrenia might be related to abnormal connectivity (too low as well as too high). It doesn't seem like a clear case.

Expand full comment

"endless practice on thousands of Raven’s style patterns helps a little, but a true genius will still beat you"

I don't believe the second part is true. Does anyone believe it to be true?

A true genius will of course still beat you on all the other aspects of an IQ test, but I don't believe they'll beat you on the Raven's questions unless they have also practiced with them.

There have been a few studies showing minor gains from learning. But they focused on just retaking the tests a couple of times, not "endless" practice.

Certainly it's not true of other cognitively demanding activities we correlate with IQ. Like if you train endlessly at chess, you still won't be able to beat Magnus, but you will with 100% certainty beat any True Genius who's just learned the rules of the game.

Expand full comment

Well, you could say the entire purpose of intelligence tests is to measure the part of human ability, if any, that doesn't get better with practice, and there's an adversarial process going on where people will try to practice for them, so you would expect things on up-to-date intelligence tests to at least be somewhat *resistant* to practice.

Expand full comment

No, I wouldn't say that the entire purpose of intelligence tests is to measure the part of human ability that doesn't get better with practice. Nobody says that this is their entire purpose. The usual description is to measure mental abilities like critical and deductive reasoning, working memory, pattern recognition, and so forth. There's nothing in there explicitly about not getting better with practice.

When talking about general intelligence there is sometimes a distinction drawn between fluid intelligence vs crystallized intelligence. But even this distinction is more about long-term knowledge and learning (e.g.: vocabulary), not getting better with practice.

That all said -- it is indeed reasonable to expect them to be at least somewhat resistant to practice. I agree with this expectation.

It's just a big leap from *somewhat* resistant to the *extremely high* practice resistance indicated by "you can practice this endlessly and still be beaten by somebody who has not practiced".

Expand full comment

Well, the fact that I got the question wrong and have a small head (especially the front part of my head) both seem to suggest low IQ on my part.

Of course, this isn’t what the post was actually about, ie it’s not meant to be about a particular person and is instead trying to explain a pattern across a population. But that might also be a sign of my low IQ, ie inability to pick out and focus on the main idea of a piece of writing.

Expand full comment

new asssex post yesss brain nourishment.

Expand full comment

Um, has someone pointed out that the number of connections scales with the number of neurons? And it is the number of connections that makes us smart. And the number of interconnections, i.e. how many other neurons one brain neuron connects to. This seems somewhat obvious, so I must be missing something. (But I have a smaller head, and in the words of Monty Python's Gumbys, "My Brain Hurts!" with fists clenched.)

Expand full comment

My guess is it has to do with better ability to do in-context search for good fits to data. If much of the search can be done in parallel, then with more neurons you can compute more and more complicated functions in parallel, allowing you to run and create more informative hypotheses faster.

Of course, the ability to store information is important too, but for intelligence it seems unlikely to me that it's just a matter of having read more fiction. It seems more likely the causality goes the other way.

Expand full comment

Technology is. Natural habitats of the brainwaves and the nervous systems are complex.

Expand full comment

https://www.amazon.com/Four-Realms-Existence-Theory-Being/dp/B0C5NQT7V8

The biological realm makes life possible. Hence, every living thing exists biologically. Animals, uniquely, supplement biological existence with a nervous system. This neural component enables them to control their bodies with speed and precision unseen in other forms of life. Some animals with nervous systems possess a cognitive realm, which allows the creation of internal representations of the world around them. These mental models are used to control a wide range of behaviors. Finally, the conscious realm allows its possessors to have inner experiences...

Expand full comment

I’ve contended for years it’s the entropy of the brain which should be measured. The number of interconnects per cubic centimeter.

Bird neuron myelination is different from that of mammals; their signals travel more slowly. Myelination and density are the two key variables.

Expand full comment

Just a small point, but my failure to get the IQ question — and to even understand the explanations — does not seem to have anything to do with pattern matching. It seems to have something more to do with an inability to hold multiple moving parts in my head long enough to come to conclusions or an understanding.

I actually guessed the right answer based on pattern matching, but this is less impressive than understanding. I’ll leave the analogy with AI unstated.

Expand full comment

Kinda related: why do so many substances (spices, nootropics, drugs) increase BDNF? I wonder if it's all just hormesis... and what if you're in a war during which your BDNF levels are spiking? Ought one encourage dendrite growth?

Expand full comment

Work by Richard Haier suggests that it's not about the number of neurons, but about how you use them. A few small studies suggest that those with higher IQ show less brain metabolic activity (the brain isn't working as hard) than those with lower IQ when performing the same cognitive tests.

Expand full comment

I think I have read somewhere that the correlation between IQ and brain size (a possible proxy for neuron count, but also possibly just more glial cells helping out) is minor and not significantly predictive. I have independently thought of the "pattern-matching" region, and in fact believe that it is actually distributed fractally throughout the brain and that there's some secret to intelligence there. I do not think you should throw away or "pass the buck" to the pattern matching.

Expand full comment

"pretty much everybody who asks "why do neural nets work at so many things?" comes up with the same answer. it didn't have to be neural nets. other things like genetic algorithms and cellular automata have the same capability."

No. CAs can implement universal computation (terribly, terribly inefficiently). But so can von Neumann machines, and people tried for decades to make those do neural-net-like problems, and got essentially nowhere compared to where NNs are now.

GAs are a little closer because they are a general search algorithm, but they don't do gradient search except in a way so odd it's almost metaphorical.

NNs work at so many things because they use an algorithm which does nothing but remove bias in predictions. That is the magic sauce. Doing that /and nothing else/. Adding something else screws things up, /because it is by definition bias/.
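A toy sketch of how I read that claim (mine, and a plain linear model rather than a real NN): gradient descent on squared error does nothing except nudge the parameters in whatever direction reduces the prediction error, so systematic over- or under-prediction gets squeezed out step by step.

```python
import numpy as np

# Hypothetical illustration: fit y = 3x + 2 by gradient descent on squared error.
# The only thing the update does is push predictions toward the data,
# so the mean prediction error (the "bias") shrinks toward zero.

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 3 * x + 2 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0      # start with a badly biased model
lr = 0.1
for step in range(201):
    err = (w * x + b) - y            # prediction error on each point
    w -= lr * np.mean(err * x)       # gradient of mean squared error / 2
    b -= lr * np.mean(err)
    if step % 50 == 0:
        print(f"step {step:3d}: mean error {np.mean(err):+.4f}")
```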

Support vector machines can do a lot of the things neural nets can, if you want to do a lot more math without getting better results.

Expand full comment

https://pmc.ncbi.nlm.nih.gov/articles/PMC7727338/ This is an n=50 postmortem study that hints at neocortex neuron count either being insignificant or having a low effect size. The same study still found a correlation between IQ and brain weight of r=0.17 (similar to more conservative meta-analyses and studies of the relationship between brain volume and IQ). Another postmortem study in autistic individuals, https://jamanetwork.com/journals/jama/fullarticle/1104609, found that autistic children had 67% more neurons in the prefrontal cortex, which could hint at potential deleterious effects of a disproportionately high neuron count.

On the other hand, there is a study that shows an association between neuron size/complexity and IQ: https://pmc.ncbi.nlm.nih.gov/articles/PMC6363383/.

Expand full comment