I. Introduction
Robin Hanson of Overcoming Bias more or less believes medicine doesn’t work. [EDIT: see his response here, where he says this is an inaccurate summary of his position. Further chain of responses here and here.]
This is a strong claim. It would be easy to round Hanson’s position off to something weaker, like “extra health care isn’t valuable on the margin”. This is how most people interpret the studies he cites. Still, I think his current, actual position is that medicine doesn’t work. For example, he writes:
Europeans in 1600 likely prided themselves on the ways in which their “modern” medicine was superior to what “primitives” had to accept. But we today aren’t so sure: seventeenth century medical theory was based on the four humors, and bloodletting was a common treatment. When we look back at those doctors, we think they may well have done more harm than good.
When we look at our own medical practices, however, we tend to be confident we are in good hands, and that the money that goes to buying medical care - in 2020, it was 19.7% of our G.D.P. - is well spent. Most of us know of a family member who credits their life to modern medicine. My own dad said this about his pacemaker, and I, too, am a regular customer: I’m vaccinated, boosted, and recently had surgery to fix a broken arm.
We believe in medicine, and this faith has comforted us during the pandemic. But likewise the patients of the seventeenth century; they could probably also have named a relative cured by bloodletting. Yet health outcomes are typically too random for the experience of one family to justify medical confidence. How do we know our belief is justified?
This might seem like a silly question: in Europe of the seventeenth century, the average lifespan was in the low 30s. Now it’s the low 80s. Isn’t that difference due to medicine? In fact, the consensus is now that historical lifespan gains are better explained by nutrition, sanitation, and wealth.
So let’s turn to medical research. Every year, there are a million new medical journal articles suggesting positive benefits of specific medical treatments. That’s something they didn’t have in the seventeenth century. Unfortunately, we now know the medical literature to be plagued by serious biases, such as data-dredging, p-hacking, selection, attrition, and publication biases. For example, in a recent attempt to replicate 53 findings from top cancer labs, 30 papers could not be replicated due to issues like vague protocols and uncooperative authors, and less than half of the others yielded results like the original findings.
But surely modern science must have some reliable way to study the aggregate value of medicine? Yes, we do. The key is to keep a study so simple, pre-announced, and well-examined that there isn’t much room for authors to “cheat” by data-dredging, p-hacking, etc. Large trials where we randomly induce some people to consume more medicine overall, and then track how their health differs from a control population–those are the key to reliable estimates. If trials are big and expensive enough, with lots of patients over many years, no one can possibly hide their results in a file drawer.
After listing bigger studies that he interprets as showing no effects from medicine, he concludes:
We spend 20% of G.D.P. on medicine, most people credit it for their long lives, and millions of medical journal articles seem to confirm its enormous value. Yet our lives are long for other reasons, those articles often show huge biases, and when we look to our few best aggregate studies to assuage our doubts, they do no such thing.
Or, even more clearly:
Imagine someone claimed that casinos produce, not just entertainment, but also money. I would reply that while some people have indeed walked away from casinos with more money than they arrived with, it is very rare for anyone to be able to reasonably expect this result. There may well be a few such people, but there are severe barriers to creating regular social practices wherein large groups of people can reasonably expect to make money from casinos. We have data suggesting such barriers exist, and we have reasonable theories of what could cause such barriers. Regarding medicine (the stuff doctors do), my claims are similar.
His argument: there have been three big experimental studies of what happens when people get free (or cut-price) health care: RAND, Oregon, and Karnataka. All three (according to him) find that people use more medicine, but don’t get any healthier. Therefore, medicine doesn’t work. If it looks like medicine works, it’s a combination of anecdotal reasoning, biased studies, and giving medicine credit for the positive effects of other good things (better nutrition, sanitation, etc).
I’ve spent fifteen years not responding to this argument, because I worry it would be harsh and annoying to use my platform to beat up on one contrarian who nobody else listens to. But I recently learned Bryan Caplan also takes this seriously. Beating up on two contrarians who nobody else listens to is a great use of a platform!
So I want to argue:
Medicine obviously has to work
Examined more closely, the three experiments Robin cites don’t really support his thesis
There are other experiments which provide clearer evidence that medicine works
I’ll follow Robin’s lead in dismissing the entire medical literature - every RCT of every medication or treatment ever published - because it might have “huge biases,” and try to rely on other sources.
II. Modern Medicine Improves Survival Rate
What do I mean by “medicine obviously has to work”?
Age-adjusted mortality rate from most diseases has declined significantly over the past few decades. Robin doesn’t want to credit medicine, arguing that this might be due to “nutrition, sanitation, and wealth”.
But we can more clearly distinguish the effects of medicine by looking at the effects of secondary prevention, ie how someone does after they get a specific disease. For example, what percent of cancer patients die in five years? What percent of heart attack patients die within the first month after their heart attack? This is the kind of thing that depends a lot on how much medical care you get, and is less affected by things like nutrition or sanitation.
(I’m more confident saying this about sanitation and wealth. You can imagine nutrition improving this - maybe better-nourished cancer patients are better able to fight their disease - but nutrition hasn’t really improved over the past few decades in First World countries anyway.)
Here are 5-year survival rates for various cancers, 1970s vs. 2000s:
People with cancer are more likely to survive than they were fifty years ago.
This is after you’ve already gotten the cancer, so it’s hard to see how nutrition, sanitation, etc could explain this. Some of these changes (especially prostate) are a result of earlier diagnosis. But others reflect genuinely better treatment. For example, studies have shown great results from the anti-leukemia drug imatinib and the anti-lymphoma drug rituximab. In Robin’s model, these extraordinary studies would have to be bias or chance, and totally coincidentally at the same time somehow better nutrition made leukemia patients (but not uterine cancer patients) twice as likely to survive.
Might this be because people are getting cancer younger (and are therefore better able to deal with it)? I can’t find great data on this; there’s increasing cancer among younger people, but (since people are living longer) we should also expect increasing cancer among older people (since there are more older people). Rather than try to figure out how to balance these effects, here’s a graph showing similar survival improvements among childhood cancers in particular, where we wouldn’t expect this to be a problem:
Likewise, here is post-heart attack 30-day mortality rate over time:
The odds of death within 30 days of a heart attack have fallen from 20% in 1995 to 12.4% in 2015 (source). This is also no mystery; the improvement comes from increased use of basic drugs like ACEIs, aspirin, and beta-blockers, plus more advanced interventions like thrombolytics and angioplasties, plus logistical improvements like more heart attack patients being placed on specialized cardiac wards.
Again, can we dismiss this because maybe heart attack victims are younger? The study this particular graph comes from says their patients were on average 2.7 years older at the end than the beginning, so here age effects seem to point in the opposite direction. Here’s a graph showing the same decline if you break it up by under- and over-65s, though I wish I could find something with smaller bins.
Same data for stroke:
In 2000, a stroke victim was only half as likely to die in the first two years after their illness as they would have been in 1980. Here we don’t have to worry about age effects at all; the graph is already adjusted for age.
You can see similar survival rate increases for other conditions like congestive heart failure (5-year survival rate went from 29% to 60% since 1970), multiple sclerosis (standardized mortality rate went from 3.1 to 0.7 since 1950), type 1 diabetes (survival rate at 50 from about 40% to 80% since 1950) and nearly any other condition you look up.
I’m harping on this because it’s in some sense the central example of medicine: you get some deadly disease like cancer, and you want to know if doctors can help you survive or not. All the evidence suggests medicine has gotten much better at this in the past fifty years. Robin’s going to have a lot of hard-to-interpret studies about what happens to your cholesterol score or whatever after you change insurance, and we’ll pick these apart, but to me this seems like a much less central example of “does medicine work?” than the fact that we’re curing cancer and increasing heart attack survival rates.
III. RAND Health Insurance Experiment
This is considered the canonical study on the effect of health insurance. In the 1970s, RAND gave thousands of people one of five types of insurance, ranging from very bad (barely any coverage until a family reached a deductible of $1000, ie $5000 in today’s dollars) to very good (all care was free). Then they waited eight years. Then they checked whether the people on the good insurance ended up any healthier than the people on the bad insurance.
The paper I found measured five questionnaire-based outcomes plus five objective physiological measures, for a total of ten outcomes (Robin says he has a book where they discuss 23 to 30 outcomes, but I don’t have that book, so I’m sticking with the paper). The ten in the paper I read were:
Physical functioning questionnaire
Role functioning questionnaire
Mental health questionnaire
Social life questionnaire
Health perceptions questionnaire
Smoking
Weight
Cholesterol
Vision
Blood pressure
They found no effect of insurance on any of the questionnaires, and modest positive effects on vision and blood pressure.
How surprising is this?
It seems moderately surprising that nobody improved on any of the questionnaires. These seem to measure overall health. Maybe they were bad measures? Maybe 10,000 mostly-healthy people over 8 years doesn’t provide enough power to detect health improvements on questionnaires? I’m not sure.
It doesn’t seem surprising to me that nobody improved on smoking, weight, or cholesterol. The 1970s didn’t have any good anti-smoking medication - even the nicotine patch wasn’t invented until after this study was finished. Likewise for weight loss - the 1970s were in the unfortunate interregnum between the fall of methamphetamine and the rise of Ozempic. There were some weak cholesterol medications back then - eg nicotinic acid - but they were rarely used, and doctors weren’t even entirely convinced that cholesterol was bad. For all three of these things, the 1970s state of the art was doctors saying “You should try to stop smoking and eat better.” RAND found that the better insurances led to 1-2 more doctor visits per year. I don’t think that 3 visits to a doctor saying “You should try to stop smoking and eat better” vs. 4 visits to that doctor is going to affect very much.
It’s also not surprising that vision improved; the good insurances were more likely to cover glasses, and everyone knows that glasses help your vision. Even Robin admits this is a real effect; he just classifies it as more physics than medicine.
Blood pressure is more debatable. The 1970s had some okay blood pressure medications, like the beta-blockers, and doctors weren’t afraid to use them. So it seems possible in theory that better medical care could lead to decreased blood pressure. Still, Robin is skeptical. He says that the improvement in blood pressure found during the study was p = 0.03, and in a study with 30 measures, you’d expect about one to come up positive at p = 0.03 by coincidence. The version of the study he’s reading has 30 measures (mine has 5-10, depending on how you count the questionnaires).
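Here’s that back-of-the-envelope calculation, under the simplifying assumption that the 30 measures are independent (they aren’t, quite, but it’s close enough to make the point):

```python
# False positives across 30 independent measures at p = 0.03
p_single = 0.03
n_measures = 30

print(n_measures * p_single)              # ~0.9 hits expected by chance alone
print(1 - (1 - p_single) ** n_measures)   # ~60% chance of at least one such hit
```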
On the other hand, this paper looks into the blood pressure result in more detail. It finds that “plan effects on blood pressure” were three times higher for hypertensives than for non-hypertensives; that is, unlike statistical flukes (which we would expect to affect everyone equally), the effect was concentrated in the people we would expect doctors to treat. It also finds that plan effects were higher for poor people; again unlike statistical flukes, the effect was concentrated in the people we would expect insurance to help. And it finds pretty convincing intermediating factors: people with good insurance were 20 percentage points more likely to get hypertension treatment (p < 0.001). So I think it’s a stretch to attribute this one to random noise.
This is the study authors’ conclusion as well. They calculate the benefit from this blood pressure improvement and find that:
If 1,000 fifty-year-old men at elevated risk were enrolled on a free rather than a cost-sharing plan, then we would anticipate that about 11 of them, who would otherwise have died, would be alive five years later.
Still, they describe their study as having a negative result, because:
...these mortality reductions, in and of themselves, are not sufficient to justify free care for all adults.
I assume they’re working off of some kind of reasonable cost-effectiveness model for government spending here. Still, if I were a fifty-year-old adult, I might be willing to personally spend a few hundred extra dollars a year to increase my 5-year survival rate by 1%. Certainly I don’t think it’s fair to describe this as “RAND proves medicine doesn’t work.”
Robin has a book with more information than I could get from the papers, so I feel bad contradicting him on this one. I’m more confident in my discussion of the next two experiments, which I think are clear enough that we can go back to this one later and apply what we’ve learned.
IV. Oregon Health Insurance Experiment
In 2008, Oregon had extra money and decided to expand Medicaid, a free insurance program for poor people. Many people applied for the free insurance, the state ran out of money, and they distributed the available Medicaid slots by lottery. This made the expansion a perfect setup for a randomized controlled trial on whether government-provided free insurance helps the poor.
Scientists monitored the recipients for two years (why not longer? I think at some point the insurance coverage stopped) and found that the people with Medicaid did in fact use more medical care than the control group. For example, only 69% of the control group described themselves as getting all the medical care they needed, but 93% of the group with insurance did. People with the insurance used more of almost all categories of medication:
People who got the free insurance had less medical debt at the end of the study period. They described themselves on questionnaires as having better health (55% vs. 68% at least “good”, p < 0.0001), and were more likely to say their health had improved over the past few months (71% vs. 83%, p < 0.001). They described having better mental health and less depression (25% vs. 33% depressed, p = 0.001).
However, Robin notes that many of these subjective changes happened immediately, ie before they even had a chance to use their new insurance. This means they’re more likely to represent mood affiliation (eg “I have insurance now, so I’m optimistic about my health!”). There was no difference on objective health measures, including blood pressure, cholesterol, and HbA1c (a measure of blood sugar / diabetes control).
Why not? The authors do the math on diabetes. If you look at the graph above, you see that about 12.5% of controls vs. 17.5% of experimentals took diabetes medications, p < 0.05. Studies find that diabetes medications decrease HbA1c by about one percentage point (normal HbA1c is about 5%, so this is a lot). If 5% of the insurance group took diabetes medications and decreased their HbA1c by 1 pp each, then the HbA1c of the experimental group would decline by 0.05 pp compared to the control group. Their 95% confidence interval of the difference was (-0.1, +0.1 pp), which includes the predicted value. So when they say “insurance didn’t significantly change HbA1c”, what they mean is “the change in HbA1c is completely consistent with the consensus effect of antidiabetic medications”.
Could the same be true of the other results, like hypertension? We find that the experimental group was 1.8 percentage points more likely to get a hypertension diagnosis, 0.7 percentage points more likely to get hypertension medications, and had 0.8 points lower blood pressure - but all of these numbers were nonsignificant. If we take the nonsignificant numbers seriously, 0.7 pp more people taking antihypertensives caused a 0.8 point blood pressure drop in the full sample, meaning that antihypertensives caused a 100 point blood pressure drop in each user. This definitely isn’t true - a 100 point blood pressure drop kills you - but it means that a plausible pro-medicine result, like antihypertensives lowering blood pressure 10 points, is well within the study’s confidence interval.
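Here’s the dilution arithmetic from the last two paragraphs in one place, using the study figures quoted above:

```python
# Dilution: population-level effect = (extra fraction treated) x (per-user effect)

# Diabetes: ~5 pp more of the insured group on medication, with a consensus
# per-user effect of ~1 pp lower HbA1c
print(0.05 * 1.0)    # predicted 0.05 pp drop - inside the (-0.1, +0.1) CI

# Hypertension, run backwards: the per-user effect implied by a 0.8 point
# population-wide drop from only 0.7 pp more users
print(0.8 / 0.007)   # ~114 points per user - absurd, so the CI rules out nothing plausible
```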
Maybe the anti-medicine position is that, for some reason, good insurance doesn’t lead to hypertension diagnosis or antihypertensive medication use? If I understand these numbers right, about 22% of Americans have blood pressure > 140/90, the level at which doctors recommend medication. I expect the marginally-insured poor people in this experiment to be less healthy than average, so let’s say 25 - 30%. In the experiment, about 13.9% of the control group and 14.6% of the experimental group got antihypertension medication. Why so low? This study found that only about 60% of participants in the Oregon study who got the insurance even went to the doctor for non-emergency reasons! Subtract out the ones who refused to take antihypertensives, or who have too many side effects, or whose doctors let this fall through the cracks, and I think the 13 - 15% numbers make sense.
This study found that insurance increased hypertension medication use by a central estimate of 0.7 pp - not significant, confidence interval -4.5 to +5.8 pp. Let’s take a convenient central estimate of our likely hypertension rate and say that 28% of our population should have gotten hypertension meds. That means the central estimate increased the percent of people who got recommended hypertension meds from 50% to 53%, and the 95% confidence interval includes values up to 71%.
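Here’s that conversion spelled out, using my assumed 28% eligibility rate:

```python
eligible = 0.28                  # assumed fraction who should be on hypertension meds
base = 0.139                     # fraction of the control group actually on them
central, ci_hi = 0.007, 0.058    # insurance effect: central estimate, upper 95% bound

print(base / eligible)                 # ~50% of eligible patients treated at baseline
print((base + central) / eligible)     # ~52-53% with the central estimate
print((base + ci_hi) / eligible)       # ~70% at the upper bound
```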
So my assessment of the blood pressure results from this study is:
At the beginning of the study, about 50% of people who should have been on hypertension meds were. The study was underpowered to figure out how this changed; the central estimate is +3 percentage points, and the 95% CI rules out improvements beyond +21 percentage points.
The study was also underpowered to figure out if hypertension meds worked, and basically could not rule out any level of effectiveness, even effectiveness so high that the meds would instantly kill you by lowering your blood pressure to 0.
I don’t think we can summarize this study as “we’ve proven medication doesn’t work”.
V. Karnataka Health Insurance Experiment
Same story, different scenery. This one happened in India. 10,000 families. End result is:
Having measured (a) 3 parameters (direct/indirect/total) for (b) 3 ITT and one TOT effect for (c) 82 specified outcomes over 2 surveys, only 3 (0.46% of all estimated coefficients concerning health outcomes) were significant after multiple-testing adjustments. (As Table A8 shows, 55 parameters (8.38%) are significant if we do not adjust for multiple-testing.) We cannot reject the hypothesis that the distribution of p-values from these estimates is consistent with no differences (P=0.31). We also find no effect of access on our summary index of health outcomes (Table A6 and Table A7).
In other words:
They tested a lot of stuff
If you don’t adjust for multiple comparisons, they got 55 significant results
Once you adjust, they got 3 significant results
They can’t prove that getting 3 significant results is itself a significant result
Their study was only powered to detect effects of size 0.1 or greater.
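For scale: 55 hits at 8.38% implies about 656 estimated coefficients, and under a global null you’d expect around 33 raw hits at p < 0.05 anyway. (The coefficients are correlated, so you can’t just compare 55 to 33 - that’s what their P = 0.31 distribution test is for.) A quick sketch, assuming independence:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests = 656                      # 55 significant results = 8.38% of ~656 coefficients
p = rng.random(n_tests)            # under the null, p-values are uniform on [0, 1]

print((p < 0.05).sum())            # ~33 raw "hits" from chance alone
print(0.05 / n_tests)              # Bonferroni-adjusted threshold: ~7.6e-05
print((p < 0.05 / n_tests).sum())  # typically 0 survive the adjustment
```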
It’s helpful to look at their table of measured outcomes (A7). This has some of the usual ones like blood pressure. But it also has things like:
Doctor or nurse assisted with childbirth
Gave birth in a hospital
Had surgery
Takes medicine for hypertension
Told that they have diabetes
Told that they have cancer
…and these were among the majority of outcomes for which the study found no effect.
These don’t cast doubt on the effectiveness of medical treatment. They just look like a study where the intervention didn’t affect the amount of medical care people got very much. This was the authors’ conclusion too. In fact, they were unable to find a direct effect of giving people free insurance on those people using insurance, at all, in the 3.5 year study period! They had to rescue this with “spillover effects”, ie the effect of one person getting insurance in a village on other people, in order to even claim that the insurance increased healthcare utilization.
Why couldn’t they find an effect of giving people insurance on those people using insurance? Insurance is very new in India. These people weren’t really familiar with it, and in many cases their doctors and hospitals weren’t very familiar with it. In a few cases it didn’t even seem like the insurance companies fully understood their product:
Many households had difficulty using insurance to pay for healthcare. On average across treatment arms, access to insurance increased by 3.34 pp annually the number of households who tried to use their insurance card by 18 months but were unable to do so (from a base of 2.68% in the control group). (Our TOT estimates suggest that insurance enrollment increased failed use by 4.02 pp off a base of 3% annually.) This excess failure rate is 50.50% of the successful utilization ITT effect.
Lack of knowledge about the purpose of insurance and how to use insurance seem likely explanations for the failure rate. Because insurance is a relatively new product, hospitals and beneficiaries may not know how to use it (Rajasekhar, Berg et al. 2011, Nandi, Dasgupta et al. 2016). In our midline and endline surveys, we asked why households did not try to use their insurance card to pay for care and why they were unable to use the card even when they tried (Table A5). Frequent reasons given for not using the card were not knowing that the card could be used for insurance (15% at 18 months, 20% at 3.5 years), forgetting the card at home (13% at 18 months), not knowing how or where to use the card (29% and 30% at 3.5 years). Besides these beneficiary-side problems, there were also supply-side problems. Of people that tried to use the card, 55% and 69% said that the doctor did not accept the card at 18 months and 3.5 years, and 12% said that the insurance company did not accept the card (i.e., did not approve use) at 3.5 years. (These should be interpreted with caution because we do not know if doctors correctly did not approve the card because a service was truly not covered, or incorrectly did so.) This finding suggests that demand-side education and supply-side logistics may be important for raising utilization of (and thus demand for) insurance in India and similarly situated countries.
I don’t want to over-update on this. They did eventually manage to find a medium effect of free insurance on insurance use when counting the spillover effects. I think the main problem with this study is the same as all the other studies - its confidence intervals are wide enough to include medicine working amazingly well, better than anyone claims it works in real life.
This is what the authors think too:
Care should be taken in interpreting the insignificant health effects observed. Perhaps the effect of hospital care on measured outcomes is too small to translate into health improvements that we have power to detect despite our substantial sample size (Das, Hammer et al. 2008). Moreover, confidence intervals reported in Table A6 and Table A7 suggest that medically significant effects for many outcomes cannot be ruled out.
VI. Summary Of Robin’s Three Insurance Studies
If it helps, think of these insurance studies as a sort of funnel:
In order for more insurance to result in better health on some measurable parameter (eg lower blood pressure), you need a chain of four things.
First, you need the better insurance to result in more doctors visits.
Second, you need the doctors visits to result in more diagnoses (eg of high blood pressure).
Third, you need the diagnoses to result in more treatment (eg blood pressure medication).
Fourth, you need the treatment to work (actually lower blood pressure).
Each step lowers our ability to detect a signal. That is, going to the doctor doesn’t, with 100% efficacy, result in more diagnoses. Some doctors will miss some diagnoses; that will introduce noise and lower our power / statistical significance. You can imagine doing a whole paper on whether increasing doctors’ visits increases hypertension diagnoses; that paper would have a p-value greater than zero / Bayes factor of less than infinity. So even assuming that better insurance really does improve health, each step we go down the chain decreases our ability to detect that.
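To make this concrete, here’s a toy Monte Carlo of the funnel. Every transition probability below is invented for illustration - the point is the shape of the attenuation, not the particular numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def run_trial(n=5000):
    insured = rng.random(n) < 0.5                          # randomize insurance
    visits = rng.random(n) < np.where(insured, 0.6, 0.5)   # step 1: insurance -> more visits
    diagnosed = visits & (rng.random(n) < 0.3)             # step 2: some visits -> diagnosis
    treated = diagnosed & (rng.random(n) < 0.7)            # step 3: some diagnoses -> meds
    bp = 140 + rng.normal(0, 20, n) - 10 * treated         # step 4: meds lower BP ~10 points
    return insured, [visits, diagnosed, treated, bp]

# Power at each step: how often is the insured-vs-uninsured difference
# significant at p < 0.05 across simulated trials?
hits = np.zeros(4)
for _ in range(500):
    ins, outcomes = run_trial()
    for i, y in enumerate(outcomes):
        y = y.astype(float)
        hits[i] += stats.ttest_ind(y[ins], y[~ins]).pvalue < 0.05

print(hits / 500)  # power falls at each step down the chain
```

In runs like this, the effect on visits gets detected essentially every time, but by the time you reach blood pressure the detection rate is barely above the 5% false-positive floor - even though, by construction, insurance genuinely lowers blood pressure.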
In fact, in these three studies, we find dropoffs below statistical significance scattered basically randomly throughout this chain:
In some parts of the Karnataka study, we lose statistical significance at step 1. The better insurance didn’t necessarily result in more medical utilization. For example, it didn’t cause people to be (significantly) more likely to give birth in a hospital or get surgery.
In the hypertension outcomes of the Oregon study, we lose statistical significance at step 2. The better insurance led to significantly more doctors’ visits. But this didn’t result in significantly more hypertension diagnoses (it only resulted in non-significantly more).
I don’t think we see any clear examples of losing significance at step 3, but you could sort of think of the smoking outcomes of the RAND study this way. The RAND participants with better insurance saw the doctor more. Probably the doctor noticed they were smoking and diagnosed them with this, insofar as “tobacco use” was a formal diagnosis at all in 1974. But there were no good anti-smoking treatments in the 1970s, so the doctor didn’t prescribe anything.
In the diabetes outcome of the Oregon study, we lose statistical significance at step 4. Diabetics with better health insurance were significantly more likely to see the doctor, significantly more likely to get diagnosed, and significantly more likely to get placed on medication, but only had nonsignificantly better health. Why? Probably because, as mentioned before, if diabetes medication worked as well as studies say it does, the study wouldn’t have enough power to detect its effects.
Robin’s argument (that medicine doesn’t work) assumes that the only possible failure is at step 4, and that the failure must be a true failure rather than one of statistical significance. But in fact there are failures at every step (although I kind of have to stretch it for step 3), and the authors of the papers tell us explicitly that these are most likely failures of statistical power.
This helps us think about a remaining question: why did these three studies get such different results?
In the Oregon study, better insurance caused higher ratings of self-reported health. But in the RAND and Karnataka studies, it didn’t.
In the Oregon study, better insurance caused less depression. But in the RAND and Karnataka studies, it didn’t.
In the RAND study, better insurance caused increased use of antihypertensive medication. But in the Oregon and Karnataka studies, it didn’t.
In the RAND study, better insurance caused lower blood pressure. But in the Oregon and Karnataka studies, it didn’t.
In the Oregon study, better insurance caused more use of antidiabetic drugs. But in the Karnataka study, it didn’t (AFAICT RAND didn’t measure this).
I think Robin attributes these differences to noise, ie the results being fake in the first place. He writes:
A muddled appearance of differing studies showing differing effects is to be expected. After all, even if medicine has little effect, random statistical error and biases toward presenting and publishing expected results will ensure that many published studies suggest positive medical benefits.
I think this is implausible. Many of these effects are large and replicable. For example, the Oregon self-rating effects are p < 0.0001 on each of four different assessment methods, yet these are absolutely null in the other two studies. The RAND blood pressure results are p < 0.03, but match our expectations about subgroups (highest in the poor and sick) and are accompanied by a p < 0.01 finding that insurance results in more hypertension medication (which was absent in the Oregon study). The antidiabetic drug result in Oregon was p = 0.008.
Can we explain these through differences in the studies? I think Robin’s analysis here is actually pretty good. Expanding it slightly:
RAND was a normal cross-section of Americans
Oregon was poor and unhealthy Americans
Karnataka was poor Indians who didn’t know how to use insurance, and they only got hospital care (whereas the other two studies included primary care)
We find that Karnataka didn’t result in as many utilization increases as the other two studies because it was only hospital care (which is unlikely to be involved in managing chronic problems like hypertension) and the recipients barely used the improved insurance.
We find that Oregon had increased self-reported health because these were poor and unhealthy people who were very excited to get the new insurance. Robin points out that 2/3 of the improvement came immediately after getting the insurance, before they had time to use it, so this suggests a placebo effect. Maybe these poor unhealthy people were more excited about getting free insurance than the comparatively well-off people in RAND or the insurance-naive people in Karnataka?
But we can’t dismiss the Oregon mental health findings as easily. Many of them came from depression screening questionnaires that ask pretty specific questions about eg sleep and suicidal thoughts over the past few weeks. I think these findings are plausibly real, especially given the strong effects of insurance on mental health medication use (see first graph in section IV above). If so, differences from RAND and Karnataka would be easy to explain: 1970s Americans and rural Indians mostly don’t have mental health problems (or at least don’t think of them in those terms), whereas 2008 Americans do. In 2008 America, depression is common and easy to measure. Also, antidepressants have very large effect sizes (you may have heard they have small effect sizes, but that’s after you subtract the placebo effect; before you subtract placebo effect, they’re extremely effective, and this study isn’t controlling for placebo). So this is exactly the sort of area where you’d expect to see an effect. I won’t say for sure it’s real, but nothing about the studies makes me think it isn’t.
That just leaves the diastolic blood pressure effect in RAND. Remember our funnel again: the difference between RAND and Oregon doesn’t start at Step 4 (does the drug work?). It starts at Steps 2-3 (did more medical visits result in more medication?).
In RAND, we found that better insurance increased the percent of hypertensives on appropriate medication by 20 percentage points.
In Oregon, we found that it increased it by about 2 percentage points, but with the confidence interval including 20.
So my guess is that the middle-class people in RAND were a bit more likely to go for preventative medicine than the poorer people in Oregon, and that if we ran both experiments a million times, we would get something like 5-10 pp out of Oregon and 15-20 pp out of RAND - enough to give us the statistical power to detect an effect in RAND but not in Oregon. I can’t prove this is true, because of the statistical power issues; it just seems like a reasonable explanation for the discrepancy.
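You can sanity-check the power logic with a standard two-proportion power calculation. The subsample size below is my guess, not either study’s actual hypertensive count:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

n_per_arm = 500        # assumed hypertensive subsample per arm (a guess)
analysis = NormalIndPower()

for label, lift in [("RAND-like +20 pp:", 0.20), ("Oregon-like +5 pp:", 0.05)]:
    es = proportion_effectsize(0.50 + lift, 0.50)   # baseline: ~50% appropriately treated
    print(label, analysis.solve_power(effect_size=es, nobs1=n_per_arm, alpha=0.05))
```

Under those assumptions, the RAND-sized effect gets detected essentially always, and the Oregon-sized effect only about a third of the time.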
One more point: because of statistical power issues and multiple hypothesis testing, there are a lot of cases here where we can’t say anything either way. These might be places where an effect seems plausible but we can’t prove it, or where we find an effect but can’t prove that it isn’t a result of multiple hypothesis testing. Here we should go back to the statistical basics and remember that this means more or less nothing. We shouldn’t update our priors.
Part of how Robin makes his counterintuitive argument against healthcare is to say that all of these studies found “null effects”, so now we have to believe medicine is fake. I think instead we should look at the arguments in Section II above, start with a strong prior on medicine being real, and then - confronted with studies that sometimes can’t find anything for sure either way - continue having that prior.
VII. Other, More Positive Studies
Since Robin posted the early versions of his argument, there’s been a newer, bigger RCT-like study on the effects of medicine.
Obamacare originally mandated that everyone get health insurance, and punished noncompliance with a fine. In 2017, the IRS fined 4.5 million people for not having insurance. It originally planned to send these people a letter, saying “Obamacare mandates you to have insurance, you’re getting fined for failing to comply, please buy insurance through such-and-such a website.” But it ran out of budget after sending 3.9 million letters to a randomly selected subset of the insuranceless. The letter must have been at least a little convincing, because the 3.9 million recipients were 1.3 percentage points more likely to get insurance compared to the 0.6 million non-recipients. So the whole event turned out as a sort of randomized trial of telling people to get insurance.
Goldin, Lurie, and McCubbin followed up on the results. Because this “study” was so big compared to the others (4.5 million participants compared to a five-digit number for RAND, Oregon, and Karnataka), they were able to measure mortality directly. They found that:
The rate of mortality among previously uninsured 45-64 year-olds was lower in the treatment group than in the control by approximately 0.06 percentage points, or one fewer death for every 1,648 individuals in this population who were sent a letter. We found no evidence that the intervention reduced mortality among children or younger adults over our sample period.
Using treatment group assignment as an instrument for coverage, we estimate that the average per-month effect of the coverage induced by the intervention on two-year mortality was approximately -0.17 percentage points. We caution, however, that the magnitude of the mortality effect is imprecisely estimated; our confidence interval is consistent with both moderate and large effects of coverage on mortality. At the same time, our estimated confidence interval is sufficiently precise to rule out per-month effects smaller in magnitude than -0.03 percentage points, including the estimate from the OLS regression of mortality on coverage across individuals.
This result was p = 0.01 and robust to various checks.
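As a sanity check, you can roughly reconstruct their instrumental-variables estimate from the two headline numbers. (The paper’s actual estimator uses coverage-months; the 24-month scaling below is my own simplification.)

```python
# Wald / IV back-of-the-envelope: the letter's effect on mortality,
# divided by its effect on coverage.
itt_mortality = 0.0006   # letter lowered two-year mortality by ~0.06 pp
first_stage = 0.013      # letter raised insurance coverage by ~1.3 pp

tot = itt_mortality / first_stage
print(tot)        # ~4.6 pp lower two-year mortality per person who gained coverage
print(tot / 24)   # ~0.19 pp per month - close to the paper's -0.17 pp estimate
```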
Robin, however, summarizes this study as follows:
A 2019 U.S. tax notification experiment did, maybe, see an effect. When 0.6 of 4.5 million eligible households were randomly not sent a letter warning of tax penalties, the households warned were 1.1% more likely to buy insurance, and 0.06% less likely to die, over the next two years. Now that last death result was only significant at the 1% level, which is marginal. So there’s a decent chance this study is just noise.
Come on! Thousands of clinical RCTs show that medicine has an effect. Robin wants to ignore these in favor of insurance experiments that are underpowered to find effects even when they’re there. Then when someone finally does an insurance experiment big and powerful enough to find effects, and it finds the same thing as all the thousands of clinical RCTs, p = 0.01, Robin says maybe we should dismiss it, because p = 0.01 findings are sometimes just “noise”. Aaargh!
Here are some other quasi-experimental studies (h/t @agoodmanbacon):
Sommers, Baicker, and Epstein: finds that when some states expanded Medicaid after Obamacare, mortality rate in those states (but not comparison states) went down, p = 0.001. Note that Baicker was one of the main people behind the Oregon experiment.
Sommers, Long, and Baicker: same story: after Romneycare, mortality in Massachusetts went down compared to comparison states (p = 0.003).
Currie and Gruber: increased Medicaid availability for children was associated with lower child mortality (they don’t give p-values, but some of the effects noted seem large).
See more discussion on this thread.
VIII. Final Thoughts
The insurance literature doesn’t do a great job of establishing, one way or the other, whether extra health insurance has detectable health effects on a population. Gun to my head, I’d say it leans towards showing positive effects. But if Robin wants to fight me on this, I can’t 100% prove him wrong.
But it’s far less tenable to say - as Robin does - that these studies show medicine doesn’t work. These studies are many steps away from showing that!
First, as discussed above, it’s unclear whether the insurance studies themselves should be described as having positive or negative results. The best and biggest, like Goldin, Lurie, and McCubbin’s IRS study, show detectable and robust effects on mortality.
Second, when insurance studies fail to show certain effects, they’re practically always underpowered to say anything about the effects of medication. Often they can’t even find that the better-insured subjects use more medication than the less-insured subjects (eg all negative RAND outcomes, all Oregon outcomes except diabetes, everything in Karnataka). When they can detect that better-insured subjects use more medication, we can often precisely quantify whether the study has enough power to test the effect of that medication, and explicitly find that it does not. I can’t think of a single one of the experiments Robin cites that finds an increased amount of medication in the experimental group, power high enough to detect medication effects, and a lack of medication effects. So these studies shouldn’t be used to make any claims about medication effectiveness.
Third, even if we were to unwisely try to use these studies to assess medication effectiveness, they only measure marginal cases. For example, in the Oregon study, the insured group used about 33% more health care than the uninsured group - eg the uninsured people had a hospital admission rate per six months of 20%, compared to the insured group’s 27%; the uninsured group took about 1.8 medications daily, compared to the insured group’s 2.4. Presumably everyone is going to the hospital for the very serious cases (eg heart attack), and the better-insured people are just going for some marginal, less serious extras. Even if we could prove with certainty that the insured group’s extra medication wasn’t benefiting them at all, this wouldn’t say anything about the core 2/3 of medical care that people would get even if they weren’t insured.
(Robin sometimes talks about how it’s hard to distinguish core vs. extra care, and I’m not sure how this works on paper, but in practice the poorer patients I talk to seem to be able to distinguish it very well - lots of them will go to the hospital for “real emergencies” but start worrying about money for anything less.)
Fourth, we have strong direct evidence that medicine works, both in the form of randomized controlled trials, and in the form of increasing survival rates after the diagnoses of many severe diseases (and this isn’t just the diseases being diagnosed better and earlier, see for example here, or the patients getting the diseases younger, see the age-adjusted rates above). Even in the counterfactual where we had unambiguous, well-powered, non-marginal insurance studies suggesting that medication didn’t work, we should at most be confused by these conflicting sources. Most likely that confusion would end in us setting the insurance studies aside as suffering from the usual set of inexplicable social science confounders, given that they contradict stronger and more direct clinical evidence.
I think if Robin wants to do something with these insurance study results, he should follow other economists, including the study authors, and argue about whether the marginal unit of insurance is cost-effective - not about whether medication works at all.
EDITED TO ADD: Hanson’s response here, my response to his response here.