[Original thread here: Missing Heritability: Much More Than You Wanted To Know]
1: Comments From People Named In The Post
2: Very Long Comments From Other Very Knowledgeable People
3: Small But Important Corrections
4: Other Comments
Comments From People Named In The Post
Sasha Gusev of The Infinitesimal, a leading critic of twin studies and the person whose views most inspired the post, kindly replied. His reply has four parts - I’ll address each individually. First, GxE interactions:
I think the post conflates gene-gene and gene-environment interactions; the latter (specifically interactions between genes and the "shared" environment) also get counted by twin models as narrow sense heritability. While I agree there is very little evidence for gene-gene interactions (particularly dominance, as you cite [and, interestingly, twin/adoption studies actually forecast a huge amount of dominance -- another discrepancy we do not understand]) there is quite substantial evidence for gene-environment interactions including on educational attainment (see Cheesman et al: https://www.nature.com/articles/s41539-022-00145-8 ; Mostafavi et al: https://elifesciences.org/articles/48376), IQ, and BMI. In fact, Peter Visscher led a paper that came to the conclusion that twin estimates for the heritability of BMI are very likely to be overestimated by gene-environment interactions (https://pubmed.ncbi.nlm.nih.gov/28692066/). A large amount of GxE plus some amount of equal environment violation seems like a very plausible and parsimonious answer to the heritability gap.
I asked Gusev:
I'm having trouble understanding what you mean by GxE interactions explaining much missing heritability.
For the Scarr-Rowe interaction, this would make heritability look higher in high-SES families. But wouldn't this affect twin and molecular estimates equally unless there's some reason that subjects for one type of study are consistently from a different economic stratum than the other?
If we're thinking more about, let's say, a pair of fraternal twins where one of them is ugly and so parents don't invest resources in their education, wouldn't this show up equally in twin studies and GWAS? That is, if this is a very uncommon effect, we shouldn't expect it to affect large twin studies much. But if it's a common effect, then shouldn't we expect that every ugly person is less intelligent, and so GWAS will find that a gene for ugliness is associated with lower intelligence (both within and between families)? Can you give an example of a case why this would show up in twin studies, but not GWAS, RDR, etc? Also, why would we privilege this circuitous explanation (ugliness is genetic and provokes strong parental response) over the more direct explanation (intelligence is genetic)?
Also, the papers you cite show effects on the order of 2-8%pp; do you think the real effect is higher?
He answered:
Take the peanut allergy example [from a paywalled post of Lyman Stone’s]. Let's say in order to develop an allergy you need a mutation in the PNUT gene AND ALSO grow up in a household with [ed: possibly this should be “without”] early exposure to nuts (no Bamba!); that's a gene-environment interaction. For MZ twins, they will always share PNUT mutant (or wildtype) and 100% of their household exposure, so they'll be perfectly correlated on allergy; for DZ twins, they will share PNUT mutations half the time and 100% of their household exposure, so their correlation drops in half. So the twin study will tell you allergy is a 100% heritable trait. Now we test the PNUT variant in a GWAS, the first thing you do is throw away all the relatives (i.e. take one of each twin). Some people will be PNUT mutants and grow up in a household with no exposure and be allergy free, some will be PNUT mutants with exposure and will have allergy (and vice versa for the non-carriers). The resulting correlation between PNUT mutation and allergy will be low, so the heritability estimate will be <100%. TLDR: in the ACE twin model (and sib-reg), AxA and AxC interactions get counted as A. In the GWAS (and RDR) model, AxA and AxC get counted as E. In my opinion AxA could plausibly be considered "heritability" in the sense that it only relies on genes, but AxC cannot.
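For concreteness, here is a minimal simulation of the toy model Gusev describes (my own sketch, not his code; the 5% carrier frequency, 50% exposure rate, and sample size are invented). A twin study reads the trait as essentially 100% heritable, while a genotype-phenotype regression in unrelateds credits the causal variant with only about half the variance - the gene-by-shared-environment interaction gets counted as environment, just as he says.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, p_carrier, p_exposure = 200_000, 0.05, 0.5   # made-up allele and exposure frequencies

def simulate_pairs(mz: bool):
    """Return phenotypes for both twins plus twin 1's genotype."""
    exposure = rng.random(n_pairs) < p_exposure            # shared household environment
    c1 = rng.random(n_pairs) < p_carrier                   # twin 1 carries the PNUT variant
    if mz:
        c2 = c1                                            # MZ twins: identical genotypes
    else:
        share = rng.random(n_pairs) < 0.5                  # DZ twins: share the variant half the time
        c2 = np.where(share, c1, rng.random(n_pairs) < p_carrier)
    y1 = (c1 & exposure).astype(float)                     # allergy = variant AND household exposure
    y2 = (c2 & exposure).astype(float)
    return y1, y2, c1.astype(float)

y1_mz, y2_mz, _ = simulate_pairs(mz=True)
y1_dz, y2_dz, c1_dz = simulate_pairs(mz=False)
r_mz = np.corrcoef(y1_mz, y2_mz)[0, 1]
r_dz = np.corrcoef(y1_dz, y2_dz)[0, 1]
print(f"twin-study h2 = 2*(rMZ - rDZ) = {2 * (r_mz - r_dz):.2f}")            # ~0.97

# "GWAS-style" estimate: genotype-phenotype correlation in unrelateds (one twin per pair);
# the gene-by-shared-environment interaction now gets counted as environment.
r_gwas = np.corrcoef(c1_dz, y1_dz)[0, 1]
print(f"variance explained by the causal variant itself = {r_gwas**2:.2f}")  # ~0.5
```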
Shouldn’t this be easily detectable in adoption studies? Adoptees have different family environments than their bio parents, so they should act like the unrelated people in GWASs. But in fact adoption studies get similar numbers to twin studies.
Also, shouldn’t this make polygenic scores gain predictive power within families, rather than lose it? After all, you’re restricting your analysis of genetic effects to people who are in the exact same environment. But in fact, every polygenic score loses a lot of predictive power when you do a within-family validation.
Also, shouldn’t this show up as nonlinear kinship-similarity curves? That is, siblings grow up in the same family, so you should see their similar genes exerting similar effects. But cousins grow up in different families, so you should see their similar genes exerting weaker effects. Granted, your family will be less different from your cousin’s family than from the average family in the population, but you could quantify how much more similar and you should see a corresponding drop in heritability. But in fact, heritability estimates decline linearly with shared DNA, the way a traditional, GxE-interaction-free model would suggest.
Also, shouldn’t this show up as high shared environment in twin studies? After all, it’s saying that many traits are heavily determined by the shared environment. Sure, in this example it’s genetically mediated. But for anything less than 100% genetic mediation, the remainder ends up in C. So you would need for the shared environment to have huge effects, but somehow never in a way that isn’t genetically mediated in every single pair of twins in an entire sample.
Also, if lots of apparent heritability is mediated by the shared environment, why would heritability estimates constantly increase over the lifespan, as people get further from the shared environment?
Elsewhere, Gusev linked a more thorough explanation of his theory of interactions. It’s pretty interesting, but (he admits) kind of the opposite of everyone else’s theory of interactions. Everyone else is looking for interactions where poor people have lower heritability of behavioral traits than rich people (because poor people might be malnourished/undereducated/etc, whereas rich people usually get enough resources to achieve their genetic potential, whatever it may be). But Gusev instead looks for interactions where rich people have lower heritability than poor people (because poor people face many challenges and their ability to overcome those challenges might depend on their genes, but rich people will do fine regardless of what genes they have). My impression is there are many studies on both sides; I’m not expert enough in the field to know whose studies are better or whether they can be reconciled, but it’s a bad omen that people looking for these effects can’t even agree on what the sign is.
And maybeiamwrong2 here gives another good explanation of Gusev’s theory of interactions and heritability.
Second, usefulness of scores across ancestry groups:
You mention epidemiologists being the biggest losers of stratification in polygenic scores, but I think it is important to note a related group: the people who take polygenic scores trained in one population (with a ton of stratification) and directly apply them to other populations to make claims about innate abilities (see: this post). This is especially true for Edu/IQ GWAS, where every behavior geneticist has been screaming "do not do that!" since the very first study came out. People like Kirkegaard, Piffer, Lasker, etc. (and their boosters on social media like Steve Sailer and Cremieux) dedicated their careers to taking crappy GWAS data and turning it into memes that show Africans on the bottom and Europeans on the top. These people also happen to be the court geneticists, so to speak, for SSC/ACX. I don't mean to come off as antagonistic and I'm sure some people will see this comment and immediately discount me as being an ideologue/Lysenkoist/etc so it does my broader position no favors, but this stuff has done and continues to do an enormous amount of damage to the field (including the now complete unwillingness of public companies like 23andme to collaborate on studies of sensitive traits).
Gusev later says he’s specifically referring to figures like this one from Davide Piffer (from here; note that although the article looks nice and says “peer-approved” at the top, Qeios is not a real journal in the usual sense):
The horizontal axis is EA4, a polygenic score for predicted educational attainment, generally believed to correlate somewhat with IQ. The vertical axis is observed IQ of each group. Piffer’s point is that different ethnic groups’ predicted genetic intelligence seems to correlate pretty well with their observed intelligence, so maybe group differences in intelligence are genetic.
Gusev notes that “every behavior geneticist has been screaming ‘do not do that!’ since the very first study came out”, which is true. I know of three main reasons why this is a bad idea:
Polygenic scores will mistake social factors correlated with genetic factors as genetic. For example, in a society where black people have lower IQ because of racism and poor access to schooling, people with a certain gene (the gene for black skin) will have lower IQ. Therefore, a study that blindly correlates genes with IQ will incorrectly assume that the gene for black skin is a gene for IQ/etc. This will ironically seem to justify the original inequality (it will look like black people “only” have low IQ for genetic reasons). This is by far the biggest problem with a plot like this one, and the one Gusev’s post above talks about - and the post offers a good example of a real result on height which encountered this exact problem.
Different ethnic groups could have different genetic structure of intelligence. That is, white people and black people might be equally intelligent, but because of different genes. If you train a predictor on white people, and then try to use it on black people, it will falsely conclude the black people are less intelligent because they have fewer of the white people intelligence genes.
Different groups could have the same intelligence genes, but in different linkage disequilibrium. That is, suppose Gene X and Gene Y are so close on the genome that they’re practically always inherited together. Scientists can’t isolate the effect of either, and they might only have one in their panel. Let’s say Gene X is in the panel, and Gene Y is an “intelligence gene”. Then the panel would show that Gene X is an intelligence gene. That’s fine as far as it goes - a score produced with that panel would still predict intelligence correctly. But if you apply it to a genetically different group, the assumption that Gene X and Gene Y always travel together might no longer hold. Then you might notice that black people lack Gene X, and incorrectly conclude that they lack an “intelligence gene”.
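Here is a toy simulation of problem [3] (my own sketch, with invented LD levels and effect sizes - nothing to do with the actual EA4 data): a tag SNP that tightly tracks a causal variant in the training population predicts well there, but loses most of its accuracy in a population where the LD has decayed, even though the causal variant itself works identically in both.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def population(ld: float):
    """Haploid toy model: causal allele at frequency 0.5; a tag SNP copies it with probability `ld`."""
    causal = rng.integers(0, 2, n)
    tag = np.where(rng.random(n) < ld, causal, rng.integers(0, 2, n))
    phenotype = causal + rng.normal(0, 1, n)    # the causal effect is identical in both populations
    return tag, phenotype

tag_train, pheno_train = population(ld=0.95)    # training population: tight LD between tag and causal
tag_target, pheno_target = population(ld=0.20)  # target population: LD largely broken

# "GWAS" in the training population: estimate the tag SNP's apparent effect, use it as a one-SNP score.
beta = np.polyfit(tag_train, pheno_train, 1)[0]
r2_train = np.corrcoef(beta * tag_train, pheno_train)[0, 1] ** 2
r2_target = np.corrcoef(beta * tag_target, pheno_target)[0, 1] ** 2
print(f"score R^2 in training population: {r2_train:.3f}")   # ~0.18
print(f"score R^2 in target population:   {r2_target:.3f}")  # ~0.01
```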
I think all of these concerns are real and important. Piffer gestured at some weak statistical corrections he tried to do for [2] and [3] (see the sentence in his paper beginning “transferring polygenic scores across populations has proven challenging in this field of research”), but for problem [1], all he did was admit in the second-to-last paragraph that it might be a problem, and say that “the findings drawn from these tests should be viewed as provisional and subject to alteration”, which is obviously pretty weak for something this explosive and controversial.
On the other hand, I don’t fully understand how these issues apply here. AFAICT, the EA4 score used in this paper was constructed entirely from European people with <3% non-white ancestry, so it doesn’t seem like it should be noticing under-educated black people in the sample and falsely concluding that black people’s genes cause low education. It must be claiming that black people have fewer of the genes that are associated with education in a sample of white people. It’s possible that you could still get population stratification (maybe because of the <3% admixture in your supposedly European population?), but I think if that’s the theory then people should say it explicitly, or else explain what kind of alternate model they’re working from. (It’s also possible I totally misunderstood the claims that EA4 limits itself to people with European ancestry; if so, please let me know)
So maybe there’s more of a role here for problems [2] and [3], about the difficulty of applying a score trained on Europeans to non-European populations? My question there is - shouldn’t this produce nonsense results, rather than results which reflect the populations’ real-world IQs? I think the counterargument here would have to be that by coincidence or colonialism, the populations with the furthest genetic difference from Europeans also happen to have the lowest real-world IQs (for social reasons) - or at least that this trend holds in a vague enough way to produce the vague correlation seen on the graph. There’s some evidence for this - Piffer’s application of EA4 predicts that Chinese (real average IQ 105) have the same educational attainment as Puerto Ricans (real average IQ 82). So maybe it’s just showing average genetic distance from its European sample after all, and Chinese and Puerto Ricans are about equally distant on average? This wouldn’t explain why the predictor correctly finds that Ashkenazi Jews come out highest, but that could be because their “European” sample did include Ashkenazi Jews, and so here problem [1] does come in.
Except that later (Figure 7), Piffer graphs polygenic score for height against real IQ and finds no correlation, so it doesn’t seem like it’s just that polygenic scores naturally get lower as you get further from the group it was trained on. So now I’m back to being confused.
Maybe the best we can do is blame autocorrelation? That is, for all the data points on the graph, there are really only three clusters - Europeans, Africans, and everyone else. So you really only need ~3 unlucky coincidences to get this finding. And three unlucky coincidences, if you admitted they were three unlucky coincidences, wouldn’t be statistically significant, let alone “p = 7e-08” (lol). So maybe all the technical issues just explain why we shouldn’t take the scores seriously, and the answer to why it matches reality is a combination of “bad luck” and “it doesn’t really match reality that well, cf. the Chinese vs. Puerto Rican issue, but with enough autocorrelated data points even small coincidental matches look very significant”.
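To see how this could produce an impressive-looking p-value, here's a sketch with invented numbers (not Piffer's data): draw three cluster means for "score" and "IQ" completely independently, scatter many tightly-packed populations around each cluster, and run a naive correlation test that treats every population as an independent observation. A large fraction of such null runs clear genome-wide significance purely by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_clusters, per_cluster, runs = 3, 30, 1_000
hits = 0
for _ in range(runs):
    # Cluster-level means for "polygenic score" and "IQ" are drawn independently,
    # so there is zero real relationship at the level that matters.
    mean_x = rng.normal(0, 1, n_clusters)
    mean_y = rng.normal(0, 1, n_clusters)
    x = np.repeat(mean_x, per_cluster) + rng.normal(0, 0.1, n_clusters * per_cluster)
    y = np.repeat(mean_y, per_cluster) + rng.normal(0, 0.1, n_clusters * per_cluster)
    r, p = stats.pearsonr(x, y)   # naive test treating all 90 points as independent
    hits += p < 5e-8
print(f"fraction of null runs reaching p < 5e-8: {hits / runs:.0%}")
```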
I’ve written before about how there are a thousand skeptics speculating about the personality flaws of people who say false things for every one doing the hard work of rebutting them. So consider this my request for someone who knows more than I do to explain exactly what went wrong with Piffer’s analysis to produce this particular pattern.
I also take issue with Gusev’s claim that these people are “the court geneticists, so to speak, for SSC/ACX” - I don’t think I’ve ever interacted with Piffer or mentioned him on this blog before. You can see a partial list of actual geneticists I sent my Missing Heritability post to for review at the bottom of the post, and they’re mostly the same people that Gusev links to and endorses.
Some of the people who Gusev accuses of “dedicating their careers to” these kinds of results also deny this, and say they partly share Gusev’s criticisms (1, 2).
Third, the overall framing:
I'm going to gently push back against the hereditarian/anti-hereditarian framing (which I understand is probably here as shorthand and scene setting). I am personally interested in accurate estimates that are free of assumptions. I believe twin study estimates are of low quality because the assumptions are untestable, not because they are high. I also think the public fixation on twin studies has created some real and damaging anti-genetics and anti-psychiatry backlash and wrong-headed Blank Slate views. People hear about twin studies, look up the literature and find that peanut allergy (or wearing sunglasses, or reading romance fiction) is estimated to be highly heritable and have minimal shared environment, start thinking that the whole field is built on nonsense, and end up at quack theories about how schizophrenia is actually a non-genetic fungal condition or whatever. I've been very clear that there are direct genetic effects on essentially every trait out there, including behavioral traits and IQ. If someone were to run a large-scale RDR analysis of IQ tomorrow and got a heritability of 0.9 and it replicated and all that, I would say "okay, it looks like the heritability is 0.9 and we need to rethink our evolutionary models". If anything, large heritability estimates would make my actual day job much easier and more lucrative because I could confidently start writing a lot of grants about all the genome sequencing we should be doing.
This paragraph surprised me, because people coming up with crackpot non-genetic theories about schizophrenia is part of why I think it’s so important to explain that twin studies are usually pretty good!
Schizophrenia has about the same level of missing heritability as IQ, EA, or any other trait (80% heritable in twin studies, ~10% heritable in best polygenic predictors, ~25% heritable according to GREML). I don’t really understand on what grounds you can object to the twin heritability estimates of IQ/EA/etc, but believe the ones for schizophrenia.
Elsewhere it seems like Gusev seems to accept this equivalence. As he put it in a discussion with psychiatry blogger Awais Aftab: “For schizophrenia, which was thought to be largely genetic, the most we can expect from a common variant polygenic score is an accuracy (R-squared) of ~0.24, with the current score reaching about a third of that (Trubetskoy et al. 2022)…even for these estimates, we do not yet know to what extent they may be inflated by the kind of stratification and confounding I’ve mentioned.” This is not the way I expect people to talk when part of the reason behind their work is not wanting people to mistakenly think schizophrenia is non-genetic!
I responded here, and Gusev responded to my response here.
Fourth, the consistency of twin studies:
Lastly, it's not clear to me where the conclusion that well-validated twin studies converge on "similar results" is coming from. To take one example: the leading lights of behavior genetics (Deary, McGue, Visscher, etc) ran a study looking at the relationship between intelligence and lifespan (https://pubmed.ncbi.nlm.nih.gov/26213105/). This is a nice study for us because they put together three large, modern, twin cohorts with IQ measurements, but the heritability of IQ was just a nuisance parameter for them, so they had no reason to scrutinize the findings or file-drawer them. If we look at their MZ/DZ correlations in Table S6 we find that the heritability of IQ was 0.36 in the US sample; 0.98 in the Swedish sample; 0.24 in the Danish sample; and ... 0.52 on average. In other words, all over the place (but averaging out to the nice "half nature half nurture" result you see in books); the authors themselves used an AE model in Table 2 and reported a range of 0.20 to 0.98. This is far greater than the variability we see with GWAS or Sib-Reg, so what are we to make of that?
This is a comparatively minor point, but I respond here and he responds here.
Eric Turkheimer (blog), another person named as an “anti-hereditarian blogger” in the post, writes:
You say, "Turkheimer is either misstating the relationship between polygenic scores and narrow-sense heritability [or at least egging on some very confused people who are doing that]," but in the passage you quote I am perfectly clear that I am talking about the DGE heritability, that is Column E from Supplemental Table 3. The value of .048 is the median of 17 (arbitrarily classified by me) "behavioral" DGE heritabilities taken from that column.
I don’t want to get into another “did you communicate this poorly?” argument after the recent Tyler Cowen one, so I will just quote the Turkheimer paragraphs I objected to and let readers make their own decisions. This is from Is Tan Et Al The End Of Social Science Genomes:
The median [direct genomic effect] heritability for behavioral phenotypes is .048. Let that sink in for a second. How different would the modern history of behavior genetics be if back in the 80s one study after another had shown that the heritability of behavior was around .05? When Arthur Jensen wrote about IQ, he usually used a figure of .8 for the heritability of intelligence. I know that the relationship between twin heritabilities and SNP heritabilities is complicated, and in fact the DGE heritability of ability is one of the higher ones, at .2336. But still, it seems to me that the appropriate conclusion from these results is that among people who don’t have an identical twin, genomic information is a statistically non-zero but all in all relatively minor contributor to behavioral differences.
Very Long Comments From Other Very Knowledgeable People
From an a priori point of view we should expect the relationship between genes and observed features to be incredibly complicated -- basically as complicated as the relationship between computer code and program operation -- and I'd argue the evidence we see is exactly what that model of highly complicated interactions predicts. Namely, GWAS misses a bunch of twin study variation and so do simple non-linear models. Or to put the point differently, the answer is the narrow/broad gap, but broad heritability is just really damn complicated.
Yes, people cite papers for the claim that non-linear effects don't seem to make the difference. But if you dig in, the supposed evidence against non-linear effects (like what you linked) is really only evidence against other very simple elaborations of the linear model. Showing that you don't predict much more of the variance by adding some simple non-linearity (eg dominance effects at a locus or quadratic terms etc) is exactly what you would expect if most effects are extremely complicated and neither model is even close to the complete causal story.
I mean, imagine that some OSS project like the Linux kernel gets bifurcated into a bunch of national variants which are developed independently with occasional merges between individual nations' versions (basically it acts like genes under recombination). Or better yet a Turing complete evolutionary programming experiment with complex behavior. Indeed this latter one can be literally tested if people want.
We know in this case that code explains 100% of program variation and if you did the equivalent of a GWAS against it you would probably find some amount of correlations. I mean some perf improvements/behavior will be rarely discovered so they will be highly predicted by whether some specific code strings show up (they are all downstream of the initial mutation event) and other things like using certain approaches will have non-trivial correlation with certain keywords in the code.
But no linear model would actually even get close to capturing the true (indeed 100%) impact of code on those performance measurements. And it would look like there were no non-linear effects (in the sense of the papers claiming this in genetics) because adding some really simple addition to the linear model like "dominance loci" or whatever isn't going to do much better because you aren't anywhere close to the true causal model.
So the evidence we have is exactly what we should expect a priori -- genes have some really complicated (essentially Turing complete) relationship with observed behavior and simple models are going to guess at some of that but linear models will do about as well as simple non-linear ones until you break through some barrier.
This is a great point, and it’s especially helpful to know that the papers demonstrating few-to-zero interaction effects are weak.
But Lunaranus on the subreddit links https://en.wikipedia.org/wiki/Epistasis#Evolutionary_consequences, which points out that if there are many interactions, it’s hard for organisms to evolve, because the same mutation could have totally different effects depending on what’s going on elsewhere in the genome. It suggests that organisms “evolve for evolvability”, and one way they do this is trying to minimize interaction effects. But see bza9’s argument against here.
Demost argues against the comparison to computer code, pointing out that genomes have to get mixed and matched every generation. If you had to run a computer off a Linux ecosystem where each line of code was separately randomly selected from among all extant distros every time you booted up your computer, probably it would end up pretty additive too.
Related: if GxG interactions are a big deal, wouldn’t you expect outbreeding depression? Suppose that Europeans split from Africans 50,000 years ago and start evolving separately and getting different genes. Then evolution would make Europeans preferentially accumulate new mutations that interact beneficially with existing European mutations, and the same for Africans, with each population building a tower of adaptations atop previous genes. Then when the two populations interbreed, it should be a disaster - half the genes necessary for their carefully-evolved interactions are missing, and fitness plummets. But interracial children are just as healthy as within-race children. In fact, even interspecies hybrids like mules are pretty healthy (their inability to breed comes from an unrelated chromosome issue). So evolution can’t be using interactions like this.
I don’t have enough of a quantitative sense for this to know whether there can be strong interaction effects which aren’t adaptations and which evolution keeps trying to get rid of, but which stick around anyway.
Also on interactions, Steve Byrnes added:
> If nonadditive interactions are so important, why have existing studies had such a hard time detecting them?
Ooh, I have an extensive discussion of this in a recent post: https://www.lesswrong.com/posts/xXtDCeYLBR88QWebJ/heritability-five-battles Relevant excerpts follow:
§4.3.3 Possibility 3: Non-additive genetics (a.k.a. “a nonlinear map from genomes to outcomes”) (a.k.a. “epistasis”)
…Importantly, I think the nature of non-additive genetics is widely misunderstood. If you read the wikipedia article on epistasis, or Zuk et al. 2012, or any other discussion I’ve seen, you’ll get the idea that non-additive genetic effects happen for reasons that are very “organic”—things like genes for two different mutations of the same protein complex, or genes for two enzymes involved in the same metabolic pathway.
But here is a very different intuitive model, which I think is more important in practice for humans:
• Genome maps mostly-linearly to “traits” (strengths of different innate drives, synaptic learning rates, bone structure, etc.)
• “Traits” map nonlinearly to certain personality, behavior, and mental health “outcomes” (divorce, depression, etc.)
As some examples: …
• I think the antisocial personality disorder (ASPD) diagnosis gets applied in practice to two rather different clusters of people, one basically with an anger disorder, the other with low arousal. So the map from the space of “traits” to the outcome of “ASPD” is a very nonlinear function, with two separate “bumps”, so to speak. The same idea applies to any outcome that can result from two or more rather different (and disjoint) root causes, which I suspect is quite common across mental health, personality, and behavior. People can wind up divorced because they were sleeping around, and people can wind up divorced because their clinical depression was dragging down their spouse. People can seek out company because they want to be widely loved, and people can seek out company because they want to be widely feared. Etc.
• I dunno, maybe “thrill-seeking personality” and “weak bones” interact multiplicatively towards the outcome of “serious sports injuries”. If so, that would be another nonlinear map from “traits” to certain “outcomes”.
All of these and many more would mathematically manifest as “gene × gene interactions” or “gene × gene × gene interactions”, or other types of non-additive genetic effects. For example, in the latter case, the interactions would look like (some gene variant related to thrill-seeking) × (some gene variant related to bone strength).
But that’s a very different mental image from things like multiple genes affecting the same protein complex, or the Zuk et al. 2012 “limiting pathway model”. In particular, given a gene × gene interaction, you can’t, even in principle, peer into a cell with a microscope, and tell whether the two genes are “interacting” or not. In that last example above, the thrill-seeking-related genes really don’t “interact” with the bone-strength-related genes—at least, not in the normal, intuitive sense of the word “interact”. Indeed, those two genes might never be expressed at the same time in the same cell….
As far as I can tell, if you call this toy example “gene × gene interaction” or “epistasis”, then a typical genetics person will agree that that’s technically true, but they’ll only say that with hesitation, and while giving you funny looks. It’s just not the kind of thing that people normally have in mind when they talk about “epistasis”, or “non-additive genetic effects”, or “gene × gene interactions”, etc. And that’s my point: many people in the field have a tendency to think about those topics in an overly narrow way.
…
§4.4.3 My rebuttal to some papers arguing against non-additive genetics being a big factor in human outcomes:
The first thing to keep in mind is: for the kind of non-additive genetic effects I’m talking about (§4.3.3 above), there would be a massive number of “gene × gene interactions”, each with infinitesimal effects on the outcome.
If that’s not obvious, I’ll go through the toy example from above. Imagine a multiplicative interaction between thrill-seeking personality and fragile bone structure, which leads to the outcome of sports injuries. Let’s assume that there are 1000 gene variants, each with a tiny additive effect on thrill-seeking personality; and separately, let’s assume that there’s a different set of 1000 gene variants, each with a tiny additive effect on fragile bones. Then when you multiply everything together, you’d get 1000×1000=1,000,000 different gene × gene interactions involved in the “sports injury” outcome, each contributing a truly microscopic amount to the probability of injury.
In that model, if you go looking in your dataset for specific gene × gene interactions, you certainly won’t find them. They’re tiny—miles below the noise floor. So absence of (that kind of) evidence is not meaningful evidence of absence.
The second thing to keep in mind is: As above, I agree that there’s not much non-additive genetic effects for traits like height and blood pressure, and much more for things like neuroticism and divorce. And many papers on non-additive genetics are looking at things like height and blood pressure. So unsurprisingly, they don’t find much non-additive genetics.…
[Then I discuss three example anti-epistasis papers including both of the ones linked in OP.]
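To put rough numbers on this (a sketch with invented effect sizes, not anything from Byrnes’s post): in the thrill-seeking × fragile-bones toy model, each individual gene × gene term explains about a millionth of the variance, and detecting any single one at a strict interaction-scan threshold would take tens of millions of samples.

```python
from scipy import stats

n_variants = 1_000
var_per_variant = 1.0 / n_variants            # each variant explains 0.1% of its own trait

# Outcome = (trait A) x (trait B), where each trait is the sum of 1,000 tiny additive effects.
# Each pairwise (variant-in-A) x (variant-in-B) term then explains var_i * var_j of the outcome:
var_per_interaction = var_per_variant ** 2
n_interactions = n_variants ** 2
print(f"{n_interactions:,} interaction terms, each explaining about "
      f"{var_per_interaction:.0e} of the variance")

# Rough sample size to detect one such term at a strict interaction-scan threshold
# (alpha = 1e-12, 80% power), using n ~ (z_alpha/2 + z_beta)^2 / r^2 for a tiny effect:
z = stats.norm.isf(1e-12 / 2) + stats.norm.isf(0.2)
print(f"sample size needed per interaction test: ~{z**2 / var_per_interaction:,.0f}")
```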
Andy B writes:
Sequencing technology doesn't get discussed nearly enough in this area. Illumina short-read sequencing/SNP panels have been the major source of data for all of these studies, and they are absolutely delightful at finding SNPs but are crap at anything else. I think it will be appreciated that generally things that impact function aren't SNPs, they are broad changes, and so much human genomics seems to be hoping that the thing that is contributing to a change is able to be spotted by being close to a SNP, instead of actually looking at the thing that is causing the change.
Genomes aren't lists of SNPs, they are mostly repeats and 2x150bp isn't going to get anywhere near close to capturing that variation, no matter how 'deep' you sequence. Long-read sequencing (PacBio & ONT, not Illumina's synthetic tech) is clearly better, and continues to demonstrate that there is massive variation that is easy to see when you have a bunch of 20kbp fragments, while almost impossible when you're just aligning little chunks of text to a 3gbp genome.
I work in non-model genomics and long-read sequencing is such a clear winner I keep getting surprised when Illumina gets contracts for these massive human studies. The Human Pangenome Consortium is going to be providing a dataset that is way more useful than anything that's come before. Anecdotally, I hear that for some non-European genomic data they know that ~10% of the data from an individual DOESN'T EVEN MAP to the human reference (but is human genomic data). This is all invisible to analysis, or even worse, just confounds things, as the 'true' causal SNP is somewhere in the data that doesn't get analysed, and so we're stuck looking at noise and trying to make sense of it.
I feel like this is such a blind-spot for human genomics, as it's always about the latest and greatest AI/ML method to try and get some information out, when it's the underlying data which just sucks and doesn't have a hope in actually being linked to function. There was even a point on an Open Thread a few weeks back (Open Thread 374 from tabulabio) asking for people to get in touch about Frontier Foundation Genomic models, with the focus being on fancy new ML architectures.
When I asked ChatGPT to write this comment for me ("Argue that sequencing technology could explain a lot of the Missing Heritability problem") it actually pushed back against me, trying to use the Wainschtein et al. 2022 paper as evidence that "...[this paper] used high-quality WGS (which includes better SVs than Illumina) and still found that adding rare and structural variants only modestly increased heritability estimates", which is NOT TRUE. Wainschtein uses the TOPMED dataset, which is from Illumina short reads. Yes, they do 'deep' sequencing, and yes it's analysed to the absolute hilt with the latest and greatest GATK pipeline and QC to the max. But that claim is false, it's just lists of SNPs, completely ignores huge chunks of the genome and just hopes that the thing contributing to a phenotype is able to be fished out alongside a SNP.
Another anecdote - an older friend's wife died from a brain cancer. He was an old-school non-model genomicist and got all of the data from the oncologists and various tests and analysed things. All of it short-read, none of it turned anything up, either from his or the various doctors. Long-read sequencing run was eventually done after her death and indicated that it was a splice variant missed by short-reads. It was clear as day in the long-read data, since it didn't need to do fancy bioinformatic de bruijn graphs to figure out the splice isoform - it's just mapping the read against the genome and seeing it clear as day.
Thanks - this is helpful, especially the comments on Wainschtein. I asked Andy whether the products advertised as “whole genome sequencing” that “sequence every base in the genome” get his seal of approval, and he said:
I don't mean to sound hyperbolic but Illumina is kind of like the Illuminati. It's everywhere, monolithic and it's influenced genomics massively.
I had a quick look at a few "whole genome sequencing" retailers, and they’re usually using Next Generation Sequencing, which in most cases means Illumina. The phrase "sequence every base in the genome" sounds impressive, but it’s a bit misleading. Yes, they generate reads from across the whole genome, but they’re in tiny fragments, and only make sense once you align them to a reference genome.
That's where reference bias comes in. You mostly detect things that are similar enough to map cleanly, and just a little different to be able to be called a variant. That’s fine for common variants, but bigger or more complex stuff tends to get missed or misinterpreted.
To give a sense of scale, the human genome is about 3 billion base pairs long. When you get your Illumina WGS results back from a provider, you don’t get a 3 Gbp text file with your actual genome. What you usually get is a variant (VCF) file with just the differences compared to the reference genome. And that makes sense to some extent. Why include everything that's the same? But there’s a lot of complex or unmappable variation that just isn’t detected with short reads.
If you used long-read sequencing and actually assembled your genome, you could get pretty close to your actual full genome, including regions that are repetitive, messy, or just structurally different. You’d see things that are completely invisible in an Illumina dataset. And you'd have much higher confidence in the things you see, since a lot of the artifacts come from using short-read data.
That’s why basically all genomics in non-model organisms is happening with long reads now. At the International Congress of Genetics in 2023 (major conference that only happens every five years) the keynote speaker Mark Blaxter opened the meeting by saying we can finally get real, complete genomes thanks to long-read sequencing. He was talking specifically about the Darwin Tree of Life project, which is trying to sequence all eukaryotic species in the UK.
So yeah, most consumer WGS is Illumina, and it’s fine if all you want is common SNPs. But I can't wait for human genomics to migrate to long reads and overturn some of the perceived wisdom from two decades of Illumina dominance […] PacBio and ONT are almost on the same level as Illumina in terms of cost/genome, and they ACTUALLY give you the whole genome!
I’d like to hear from some statistical geneticists about whether they already knew all of this, and whether or not they agree that it’s a blind spot for their field.
Vinay Tummarakota writes on Twitter (sorry for formatting issues coming from the conversion from tweets → blockquote):
An article like this was definitely needed: easy to follow and outlines the conceptual territory well. Wanted to share some constructive feedback in this thread!
- - - - #1: Pedigree Studies
In the article, Alexander favorably cites pedigree-based studies (e.g. studies based on siblings & cousins rather than just twins) to corroborate twin studies. However, I would like to respectfully push back against this line of reasoning.
First, his claim that extended pedigrees drop the EEA is not quite correct. For example, the Swedish study he cites uses an extended twin-family design which assumes the EEA (see the Supplemental Material).
Second, the Scotland pedigree estimates he cites are likely biased due to pop strat. In the RDR paper, @alextisyoung tests a method called “Kinship FE”. At a high-level, Kinship FE estimates heritability using a pedigree model which accounts for shared nuclear family environment. Importantly, this method is quite similar to the methods employed in the two Scotland papers cited by Alexander: Hill et al and Marioni et al (both estimate heritability using pedigrees while modeling the effects of the shared nuclear family environment). Using simulations, Dr. Young shows that Kinship FE is biased in the presence of genetic nurture or pop strat. This is because these processes induce correlations between genes and env beyond the nuclear family. Unfortunately, pop strat bias is not mitigated by PC adjustments. So the key question is: are these at play for cognitive phenotypes? The answer is maybe for genetic nurture & yes for pop strat. Tan et al Figure 1 shows that pop strat biases estimation of genetic effects for IQ & edu. Thus, pedigree estimates should be interpreted w/caution.
- - - - #2: Adoption Studies
Alexander also favorably cites adoption studies to corroborate twin studies. However, heritability estimates derived from adoption studies cannot be directly compared against heritability estimates derived from twin studies because the former are usually biased *upward* by assortative mating (AM) while the latter are usually biased *downward*. Why the difference? Well, Alexander estimates heritability as 2*rBM, not 2*(rBM - rAM) where rBM is the bio mom corr and rAM is the adoptive mom corr. Because children are >50% genetically similar to their bio moms in the presence of AM, this formula over-estimates heritability. Conversely, in a twin study, heritability is 2 * (rMZ - rDZ). Because DZ twins are >50% genetically similar in the presence of genotypic assortment, this formula under-estimates heritability. To be clear, I don’t know if adjusting for AM will make adoption studies discordant with twin studies. However, I think you should do this adjustment before claiming that adoption studies corroborate twin studies. I should also note that one of the adoption studies cited by Alexander does adjust for assortative mating (the SIBS study) so the devil really is in the details of each study.
- - - - #3: BMI Case Study
BMI is a very useful phenotype for the purposes of investigating missing heritability for several reasons.
(1) It’s a well-defined continuous phenotype, so less room for measurement artifacts to ruin things.
(2) Its population GWAS effects are highly correlated w/its within-family GWAS effects, so it’s possible to use whole genome studies (WGS) of unrelated individuals to quantify its heritability.
(3) There’s little evidence of genotypic assortment, so we can compare estimates from different methods w/o worrying about pesky AM bias.
(4) It’s a quasi-behavioral trait since BMI is impacted by behaviors, attitudes, etc.
So what does BMI reveal about missing heritability? Well, unfortunately, not much.
I’m not aware of a theoretical model which compellingly explains why RDR/WGS/quasi-random adoption estimate heritability lower than twin studies with SibReg in between.
To me, this reveals how complicated this debate really is. Even for a phenotype with many highly desirable theoretical properties, estimation of heritability is not consistent between methods. IMO, the way to resolve this issue moving forward is to conduct more unified analyses. In other words, apply each of the methods to the same cohort w/the same phenotype definition and see what happens. The RDR study was illuminating in this regard IMO. When sampling & measurement artifacts can be ruled out, the only differences between methods should lie in their actual theoretical properties (e.g. the assumptions they make). The primary obstacle to this, however, is statistical power. This is why I’ll end this thread by re-iterating my desire to see more highly-powered RDR and SibReg analyses moving forward.
EDIT: Ah, already noticed a mistake in my own thread 😑 Alexander is calculating adoption heritabilities using 2*rBM, not 2*(rBM - rAM). I was getting some quantities jumbled in my head as I wrote the thread, my bad! Nonetheless, I think the point about AM stands.
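For concreteness, here's the direction of the two biases Tummarakota describes, in a toy calculation (the heritability and spousal genetic correlation values are made up for illustration, and shared environment is ignored):

```python
# Illustrative values only: the true heritability and the spousal genetic correlation are invented.
true_h2 = 0.50
m = 0.20        # correlation between mates' additive genetic values (genotypic assortment)

# At equilibrium under assortative mating, first-degree relatives share (1 + m)/2
# of additive genetic variance instead of 1/2 (shared environment ignored here).
r_mz = true_h2                              # MZ twins share all additive variance
r_dz = 0.5 * (1 + m) * true_h2              # DZ twin correlation (genetic part only)
r_bio_mom = 0.5 * (1 + m) * true_h2         # bio mother vs adopted-away child (genetic part only)

twin_estimate = 2 * (r_mz - r_dz)           # Falconer's formula: biased DOWN by a factor (1 - m)
adoption_estimate = 2 * r_bio_mom           # 2*rBM: biased UP by a factor (1 + m)
print(f"true h2 = {true_h2:.2f}, twin estimate = {twin_estimate:.2f}, "
      f"adoption estimate = {adoption_estimate:.2f}")
# -> true 0.50, twin 0.40, adoption 0.60 (the size of the gap scales with m)
```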
I appreciate these thoughts, although most of these biases and corrections are small enough that I don’t expect them to make a big difference to the results, or to line up in the particular Swiss cheese hole way where they let a big error get through.
Small But Important Corrections
Jorgen Harris (blog) corrects me on the crime adoption study (linked in a tweet in the main post):
Scott and Cremieux are misinterpreting the posted study. The key line is:
> "Note, however, that a genetic influence is not sufficient to produce criminal convictions in the adoptee. Of those adoptees whose biological parents have three or more convictions, 75 percent never received a court conviction. Another way of expressing this concentration of crime is that the chronic male adoptee offenders with biological parents having three or more offenses number only 37. They make up 1 percent of the 3,718 male adoptees in Table 2 but are responsible for 30 percent of the male adoptee convictions."
So, 1% of male adoptees in the study have a parent with three or more convictions *and* themselves have been convicted of a crime. 3/4 of children whose parents have been convicted 3+ times are never convicted. [So] male adoptees whose parents had 3+ convictions made up 4% of the sample, not 1%. The 1% are the 1/4 of those adoptees with any criminal convictions.
Bigger picture, I think this study is consistent with a world where, a) a large share of kids who end up committing crimes/having serious behavioral and emotional problems despite being raised by fairly normal families are adoptees (as Scott observed), and b) most adoptees raised by fairly normal families don't end up committing crimes/having serious behavioral and emotional problems.
Thank you. The tweet said that the 1% of adoptees with the most criminal parents commit 30% of the crime, but should have said that the 4% of adoptees with the most criminal parents commit 30% of the crime.
I agree that the study says that many adoptees with criminal parents never commit crime at all. These two claims (high relative rate, low absolute rate) can only be simultaneously true if the base rate of criminal conviction is low. The base rate in America isn’t that low - somewhere between 10-20% of Americans have a criminal conviction - but the study uses a poorly-signposted combination of Americans and Danes between 1895 and the present, and maybe the base rate in that group is lower.
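As a sanity check on how these numbers interact (the 3,718 and 25% figures are from the study; the candidate base rates are hypothetical):

```python
n_adoptees = 3_718                             # male adoptees in the study's Table 2
high_risk = round(0.04 * n_adoptees)           # bio parents with 3+ convictions: ~149 (4%)
high_risk_convicted = round(0.25 * high_risk)  # ~37 of them ever convicted (the "1%" group)
print(f"{high_risk} high-risk adoptees, of whom {high_risk_convicted} were ever convicted")

for base_rate in (0.05, 0.10, 0.15, 0.20):     # hypothetical overall conviction rates
    relative_risk = 0.25 / base_rate
    print(f"if {base_rate:.0%} of all adoptees are ever convicted, a 25% rate in the "
          f"high-risk group is a {relative_risk:.1f}x relative risk")
```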
E0 writes:
> "In the degenerate case where the mother and father have exactly the same genes (“would you have sex with your clone?”) the fraternal twins will also share 100% of their genes."
Why is this so? If Mom and Dad both have genes (a, b) at one location, won't sometimes twin 1 get (a, a) while twin 2 will get (b, b)? Agree there's more commonality than normal because there's a possibility of 1 getting (a, b) while 2 gets (b, a), which isn't normally true.
Thanks, I agree this is true and have edited the post.
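A tiny enumeration makes E0’s point concrete (my own illustration): at a locus where the parent is heterozygous, fraternal twins of clone parents share more than the usual 50% of alleles on average, but well short of 100%.

```python
from itertools import product
from collections import Counter

def shared_alleles(g1, g2):
    """Alleles shared between two genotypes (multiset overlap, 0-2)."""
    c1, c2 = Counter(g1), Counter(g2)
    return sum(min(c1[a], c2[a]) for a in c1)

def expected_sharing(mom, dad):
    """Average fraction of alleles shared by two siblings, over all equally likely gamete draws."""
    children = list(product(mom, dad))          # 4 equally likely child genotypes
    pairs = list(product(children, repeat=2))   # 16 equally likely sibling pairs
    return sum(shared_alleles(a, b) for a, b in pairs) / len(pairs) / 2

print(f"ordinary DZ twins, parents (a,b) x (c,d): {expected_sharing('ab', 'cd'):.1%} alleles shared")
print(f"DZ twins of clone parents, both (a,b):    {expected_sharing('ab', 'ab'):.1%} alleles shared")
```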
IAmChalzz on the subreddit discusses several possible issues and biases. Most are the sort of minor weaknesses I expect from all studies and don’t bother me too much, but one is a frank error on my part:
Scott confuses standard errors with confidence intervals when reporting on RDR/Sib-reg results. For some reason, this literature consistently reports standard errors instead of confidence intervals, don't ask me why. Because Scott uses the nonoverlapping confidence intervals for different Sib-regression estimates as a strike against the consistency and credibility of this literature, this affects his argument quite a bit.
Thanks for the correction, I’ve weakly edited this out of the piece, though not fully rewritten the piece to be as good as it would have been if I never made this error in the first place.
Other Comments
Kveldred writes:
Anecdata: I went to a fancy Christian private school (because my parents bankrupted themselves to find a somewhat-tolerable schooling environment for their weird son, not because I grew up fancy myself), and two of my classmates—out of about 100 total—were adopted through a similar "help the worst-off" sort of idea on the part of the parents. It was /very evident/ that they were... different... from the rest; you'd guess they'd been adopted even if they hadn't freely admitted it. The girl, as far as I know, didn't do great but didn't get in trouble; her brother did, indeed, end up arrested for assault.
Brandon Berg writes:
Note that, although the story you linked doesn't mention it, the study that found people with Norman names overrepresented at Oxford was by Greg Clark, who argues that very long-run persistence of social status is driven by genetics and highly assortative mating (0.8 on a latent measure of socioeconomic competence). So in this case it wouldn't necessarily be confounding, because persistent differences between people with Norman and Saxon names might well be genetic.
The idea that truly exogenous economic endowments can persist over several generations has pretty weak empirical support, as I believe you mentioned in a post several years ago. Studies finding otherwise are likely confounded by genetics.
But there was also a St. Louis Fed study finding that the intergenerational correlation for residual wealth (i.e. wealth greater than or less than that predicted by earnings) is only about 0.2. Except perhaps in the most extreme cases, wealth has to be actively maintained and fortified by each successive generation, or even very wealthy families fall off the map within a few generations. Consider that less than a century after his death, there are no longer any Rockefellers on the Forbes 400 list.
This started a long discussion on whether it was more likely that Normans were genetically more prone to education than Saxons, or that wealth imbalances could last 900 years.
In favor of Normans being genetically smarter - the Normans who invaded England were the military aristocracy of Normandy, which was itself descended from the military aristocracy of Denmark. If military aristocrats are selected from the smartest/strongest/healthiest portion of the population, then they might have better genes than the unselected Saxons.
In favor of wealth imbalances lasting 900 years, even if this rarely happens in normal situations, the Normans were titled aristocrats in a political system carefully designed to preserve the privilege of titled aristocrats over many generations.
Did you ask the polygenic embryo selection folks about collider bias?
In the section comparing Kemper’s sib-regression estimate (14%) and Young’s Icelandic estimate (~40%), you note that the UK Biobank sample may be skewed toward healthier, higher-SES volunteers (so-called healthy volunteer bias, which commonly creates selection effects in medical research). But the implications of such selection effects extend far beyond variability in heritability estimates.
This kind of bias can flip signs when people think they're being clever by adjusting for confounds (collider stratification bias). This is an especially relevant risk in highly selected samples like the UK Biobank, where the same factors that influence participation (e.g., health, SES, education) may also affect the outcomes of interest (mental and physical health). Conditioning on variables that causally contribute to both participation and outcome can introduce more bias than it corrects for.
I wrote a bit about this here: Munafò et al.'s schizophrenia example illustrates the mechanism more clearly than I can easily argue it for IQ: people at higher genetic risk for psychosis may be less likely to volunteer for studies ("they're out to get me, so I'm not giving them my blood"). This warps the apparent genetic architecture in large datasets. Doing embryo selection against schizophrenia risk based on such a sample, from which high-risk individuals disproportionately self-selected out, could backfire. And parents who paid $50K for it might not know for 20 years (or ever).
Hopefully the real-life practical implications are trivial: if you pay $50K for a prospective 3-point IQ gain, and collider stratification bias turns it into a 3-point loss, no big deal? You probably won't notice over dinner, as long as your kid slept well last night.
But I remain curious how the corporations selling embryo selection address the possibility that they're accidentally giving parents the exact opposite of what they're paying for. My guess is: they just don't tell anybody, and there's something in the fine print saying they could be wrong and aren't liable. Causality probably can't be proven in any individual case, and anyway you're paying for a best effort process, not a result.
I haven’t heard anything about people adjusting for this, but I’d like to hear from someone who knows more.
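For what it's worth, here's a minimal sketch of the selection mechanism described above (all parameters invented): genetic risk and an education-like participation booster are independent in the population, but become clearly negatively correlated once you look only at the volunteers.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
genetic_risk = rng.normal(size=n)       # e.g. liability to psychosis
education = rng.normal(size=n)          # independent of genetic_risk by construction

# Volunteering for the biobank is more likely with higher education and lower genetic risk.
participation = 1.0 * education - 1.0 * genetic_risk + rng.normal(size=n)
volunteers = participation > 1.0        # the selected, "healthy volunteer" sample

print(f"corr(risk, education) in the full population: "
      f"{np.corrcoef(genetic_risk, education)[0, 1]:+.2f}")
print(f"corr(risk, education) among volunteers:       "
      f"{np.corrcoef(genetic_risk[volunteers], education[volunteers])[0, 1]:+.2f}")
```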
rad1kal writes:
I would just want to add that Sidorenko 2024 (https://pmc.ncbi.nlm.nih.gov/articles/PMC11835202/), which you cite for the discussion on rare variants, also measures heritability using Sib-Regression, although not for EA.
They show that, correcting for assortative mating, their estimates are consistent with twin studies:
"Therefore, if we account for assortative mating and assume that the resemblance between siblings is solely due to genetic effects (and not to common environmental), then our data are consistent with a heritability of 0.87 (s.e. 0.05) for height in the current population (Supplementary Note)."
They also show that shared environment estimates from twin studies are inflated by assortative mating:
"Our results for height and BMI agree with that conclusion in that we find no evidence of a residual sibling covariance (𝑐2) for BMI, while the significant 𝑐2 observed for height is largely explained by assortative mating."
And the WGS estimate for height is higher than the RDR estimate (55%). My take on this is that RDR assumes individuals' environments are independent of each other and don't influence each other in proportion to relatedness; since that assumption is violated, it biases the estimate downward. (https://hereticalinsights.substack.com/i/146616013/disassortative-mating)
"Moreover, our estimates of 0.76 and 0.55 can also be compared to estimates from GWAS and WGS data. For height, the SNP-based estimate is about 0.55–0.60 (ref.41) and the WGS estimate ~0.70 (ref. 42; but with a large s.e. of ~0.10). These estimates imply that for height there is substantial genetic variation not captured by either SNP array and, to a lesser extent, sequence data, presumably ultra-rare variants (frequency <1/10,000 not included in ref.42)"
I don’t entirely understand what’s going on here, and I challenged rad1kal on some of the assortative mating implications here, but the blog that rad1kal linked (their own? I’m not sure) has a much deeper discussion of the Equal Environment Assumption than I included in my post, see here.
Austen writes:
"more home truths are to be learnt from listening to a noisy debate in an alehouse than from attending a formal one in the House of Commons. An elderly country gentlewoman will often know more of character, and be able to illustrate it by more amusing anecdotes taken from the history of what has been said, done, and gossiped in a country town for the last fifty years, than the best bluestocking of the age will be able to glean from that sort of learning which consists in an acquaintance with all the novels and satirical poems published in the same period. People in towns, indeed, are woefully deficient in a knowledge of character, which they see only in the bust, not as a whole-length. People in the country not only know all that has happened to a man, but trace his virtues or vices, as they do his features, in their descent through several generations, and solve some contradiction in his behaviour by a cross in the breed half a century ago."
people today would never be able to grasp and see the hereditary argument play out the way even a shrewd village person who never left their town all their life would. by knowing personally each family of the village, and their qualities and characteristics, and seeing their kids develop and grow, and their kid's kids, and how the traits are passed down from each parent and renewed in their children in the subsequent generation, the weight of the hereditary argument is palpable.
instead we have people arguing about heritability who have never observed anything firsthand or taken those observations to heart, reasoning from pure statistics and data, and contriving all sorts of confounding theories.
when you go back farther to the 19th century and before, it was almost taken as a given that hereditary effects were very large, by anyone who seriously thought about it, reasoning from experience and their powerful intuitions.
Strong disagree, I think this is actually a great example of why you need science and not just common sense.
When people just observed other people in their same villages, half of people thought "look, people are like their parents, obviously genetics matters", and the other half thought "look, people are like their parents, obviously parenting matters". It took twin studies - which can separate out genetics from parenting - to distinguish between these two hypotheses. And even then, millions of people rejected it because it disagreed with their "common sense" - "But John is so nice! And his mother worked so hard to teach him niceness! Obviously his mother's parenting worked! That's just common sense!"
In fact, I think the ways this picture presents an overly rosy view of village common sense go beyond that. I've read (I think this was in Arguments About Aborigines, but not going to hunt for a full citation) that in many poor uneducated parts of Europe until the 18th or 19th century, people still didn't agree on whether the mother gave any genes to the child, or whether she was just the "soil" in which the father's "seed" could grow. Aborigines had the opposite problem - they thought a spirit impregnated the mother, and the sex act helped summon the spirit, but the father's "genetics" (to the degree that they understood the term) didn't matter. The ancient Greeks had an even weirder view, telegony, where the child's paternal genetics were the sum of everyone the mother had ever had sex with.
I think it's only possible to romanticize this sort of "village common sense" because you and the villagers are the beneficiaries of millennia of careful science, tuning those common-sense intuitions.
(I do think villagers had one other advantage lots of moderns didn’t, which was that they closely observed animal breeding. But that’s not common sense - that’s access to important scientific data!)