Sorry, this seems like an exceptionally silly study. I don't think it proves anything other than that asking people questions about how many times they've had disturbed sleep, appetite changes, or various kinds of thoughts immediately after major surgery is really silly (as you say, though I'd go further and say it doesn't even conflict with claims about placebo).
Why didn't they administer a follow up 2 weeks later? That's an awful lot of trouble and expense to go to not to bother.
Ok, that's much more convincing then, and makes all the objections about administering the MADRS on day 1 a bit irrelevant (why not administer it if you have them there... though it would be good to remark on how it is likely a poor indicator at that time).
This may be my ignorance, but patients take ketamine on a bunch of different regimens, right? Some folks have intermittent treatments at a clinic, some folks have prescriptions they take daily, etc.
If Scott is referring to those daily methods, those objections would still stand, right? Though I think it is likely irrelevant for evaluating the initial single doses, if you don't count the reflection allowed by the ego-death effect as a cause.
Even 14 days out, most people are still in the midst of recovering from surgery, and so it seems to me the psychological effects of surgery may swamp any potential drug effect.
For most people surgery is pretty scary, there's anticipatory worry, there's rallying resources (internal and external), there may be more support from friends and family, and there's the agency aspect Scott mentions of solving some kind of medical problem that was likely hanging over the person. There's also a lot of purposeful care and self-care that happens in the couple of weeks after surgery that I can imagine would have positive effects on mood. And of course, people are relieved to have the surgery over with, to have made it through.
Not to mention that some people may still be experiencing mood benefits from the anesthesia and post-op pain meds for most of those 14 days.
I'm with Scott that this study is unlikely to show anything helpful, but I think that lower-dose studies with conscious people over some weeks of treatment, with follow-up over some months, are going to be way more meaningful.
...so 6 times post-surgery during 14 days, apparently. It must be the same questionnaire each time in order to get a time series, so I assume that is what the researchers used. Then you get the following problem: most people will remember what they answered last time. And many of them, in order not to appear forgetful or devoid of clear opinions about themselves (how they appear to others, even total strangers, matters to people), may answer similarly to last time(s) - for such reasons alone. Which can help explain why the curve is almost flat for both groups between day 1 and day 14. This would be an argument for waiting at least a week before giving people the questionnaire the first time.
Hey, the people who develop and validate questionnaires are not dum dums. Part of validating a questionnaire is to check for problems like that. If you are trying to measure something that fluctuates, you want a test that captures the fluctuation. If scores stay stable, then either you are measuring a trait, something that stays stable, or else your test somehow traps subjects into responding the way they remember responding, rather than spontaneously and honestly.
Yeah, well... I have done my fair share of questionnaire research, including constructing quite a few, and without wanting to come across as a cynic, Bismarck's old dictum (slightly extended) is not far off the mark: "Those who like laws, sausages and questionnaires should not look while they are being made - or validated." Granted, psychologists are better at validating procedures than the other social sciences (they had better be - "doing tests" is the main corner psychologists occupy in the social sciences). But the question Scott presents - "you ask patients questions like 'how many times have you had thoughts about guilt in the past few weeks?'" - is quite close to the type of question I am sceptical about.

There is this hard-to-shake idea that statements/opinions/reports (about yourself or whatever else) can somehow be "lifted up from inside you", like measures of blood pressure. But statements/opinions/reports about self are not like that. They are low-cost/low-benefit behaviors on display. The cost is only something like the energy to lift a finger over a keyboard and press Y or N; the benefit is the satisfaction of having expressed yourself to others (which also includes the risk that others may interpret the sign you send as indicating that you are low prestige/someone to be pitied/wishy-washy in your self-expressions/something else you dislike being perceived as, or dislike perceiving yourself as).

I could go on, but it would be too long... If I could get the ethics committees off my back, I would much prefer to investigate depression and everything else by studying costly behavior: tag people and their phones during a time interval and check when they get up in the morning, what they eat (if they eat) during the day, if they leave their room, if they talk to others, if they are away from work, if they drink and how much, and so on. In short: Don't listen to what people say (unless what you want to study is only what people answer in response to different types of prompts/frames) - look at what people do.
The effects of a single dose of ketamine wear off after a couple weeks -- at least that's what I recall from early articles about it: Someone, often at an ER, would give it to an extremely depressed, actively suicidal person. They would feel much better in hours, and were no longer suicidal. But the effect would wear off in a week or two.
This is also what saved my life — temporary relief, extended indefinitely — except with opioids instead of ketamine.
I would have sworn as a kid, and now swear as an adult, that I'd been happy maybe a double-digit number of days in my entire life — many of those days, oddly, when I was very ill...
...and then I took some opioid analgesics purposefully and while being old enough to put the pieces together. It was as if... like... whatever, I'll be unoriginal: as if I'd lived in a soundless world of black-and-white, and suddenly could see color and hear music.
As much trouble as they've caused me, well... they earned it, so to speak: I'm absolutely certain I'd have offed myself without them. (I still feel the black despair creeping in if I try to quit, to this day; I'm on bupe, which no longer gives me much euphoria but seems to do the trick regardless.)
To paraphrase Fitzgerald "Omar" Khayyam:
Though Morphine has much play'd the Infidel,
And robb'd me of my Robe of Honour -- well,
I often wonder what Big Pharma buys
One half so precious as the Goods they sell.
---------------------------------------
The unconscious association between hospitals and happiness I'd developed HAS made me wonder if this — or similar; e.g., post-op painkiller regimens — could explain some of the effect in this study, though.
Such recognition when I read your comment! I have had the same experience. There are some very interesting studies on the use of esmethadone (the S- or dextro-isomer of methadone) for depression. It is much less active as an opioid agonist but it affects the NMDA receptor much like ketamine does. I have been on (standard racemic) methadone for opioid dependence for some time, after I found in my youth that opioids were an effective ward against the mind-withering despair that has pursued me throughout my life. I was pushed by the clinic I attend to taper down and consequently suffered a relapse of both depression and opioid use, in that order. Back on methadone at a therapeutic dose I am myself again, but I can't help but wonder if it had been treating my depression all along. More research on this is surely needed. I have heard this experience echoed by friends on buprenorphine as well. Thought you might find this article interesting.
Oh, THAT's so familiar as well, heh — "you gotta get off entirely!" → "okay, I'll try..." → [depressed again] → [addicted again].
Very intriguing link, thank you! Will read in detail as soon as I find my damn phone charger... don't judge me, it was a long night aight!
I have long been sort of skeptical that NMDA-anything can be related to depression — I have been unimpressed by personal experience (also in attempting to use NMDA antagonists for tolerance reduction; I dunno, maybe it's just subtle...) and what I've read so far — but it also doesn't seem IMplausible, either. If there were ever a silver bullet, hitting mu-opioid and NMDA both is surely it...
(...I did truly love methadone, too, although I don't know if the regular stuff is the racemate or the other enantiomer, heh.)
P.S. "Laurencat"? What a cute name. Cats are the other reason I haven't taken The Big Sleep yet. My kitty is right next to me right now!
(...as always. God bless her. oh, kitty, I want you to live forever. now I'm gonna cry don't look at me—)
I have always thought cats are the morphine of the animal kingdom. Warm, comfortable, a little dangerous, and habit forming. I have one curled up with me now!
Your commentary on opiates as a long-term antidepressant resonated with me and mirrored my personal experience as well. I experimented with a lot of drugs in my late teens and early 20s that probably did a number on my brain, but the one drug I can never bring myself to quit is opiates. I was always able to maintain a very low dose of whatever opiate I was taking, and I have been doing that and living a productive life for 20 years. It takes many years for me to build a tolerance, and when it does, I usually switch drugs, and I can start at a low dose again of a new opiate. I started with the bad boy OxyContin, then tramadol, then methadone, and now I am back to oxycodone. I always stayed at low doses, and once it got to more than 1 pill a day I knew that I needed a switch. This system has worked better for me than any other I have tried. I don't smoke or drink, and I live a pretty happy, productive life and raise a family. If one day I weren't able to get these prescribed, I would sink into a permanent mild depressive state that would last indefinitely; after a few months it just becomes too much. I also lie to my doctors about the reason I need them, because no one will prescribe them off label.
Yeah, it is rare to find a doctor that one can truly be honest with — Scott's essay defending "pseudo-addiction" blew my mind; I had the same thoughts before when my then-psychiatrist was "concerned" with how worried I was about getting my buprenorphine refilled, lol... — and so far I've met exactly 1 doctor who was open to the idea of prescribing opioids off-label for depression (and even then, I wasn't asking him to do so; we were just chatting; I bet if I'd actually requested it, he'd have said no way... not that I can blame him, heh, the way the discourse is going these days).
It's interesting you mention the slow build-up of tolerance and don't mention any other side effects — I also take a long, long time to attain any sort of significant tolerance, and have literally never once experienced any side effect such as constipation, difficult urination, nausea, lack of appetite, etc. It's another reason that made me think I was made for 'em, lol!
Bupe is easy to get if you've been on other opioids for a long time, FYI, as a possible emergency route. It's not a full agonist, but it has kept me well and happy enough to enjoy life, no question.
Agreed. Particularly with all the potentially confounding psychological experiences of the surgery itself, anesthesia, pain meds, post-surgical relief, and changing care routines that happen as part of a surgical experience.
I just read your whole post, that was really well done!
In a totally unmathematical way, the study seemed way too small. You raise an important ethical issue too which is the cost to research subjects and to the field being studied if you run a study that's underpowered.
I also wondered whether the potentially swamping effects of the whole physical and psychological experience of surgery constitute confounding variables that need to be addressed?
It seems likely that the combo of the study being underpowered and the many aspects of a surgical experience being so potently affecting to people's moods means this study can't tell us anything useful.
There is a massive cottage industry of scientists who write papers along the lines of "my underpowered study didn't find an effect, therefore there is no effect". Is it true? Probably not. But it gets published, which is what is truly important here.
It may get published, but it won't get published someplace with high standards. It's astounding the shit that gets published. I once searched PubMed for the weirdest, stupidest pseudoscience I could think of, and I actually found published papers about each one -- shit like energy healing and astrology and their ability to cure this and that. The worst was the result of a search for "demonic possession." Yup, actually found a paper arguing that psychotic people do not suffer from an illness, they are possessed by demons and need an exorcism.
> Sometimes researchers try to use an “active placebo” like midazolam - a drug that makes you feel weird and floaty
How do the ethics of "active placebos" work?
In a normal trial you either give a trial drug (which we have a fairly good reason to suspect might be helpful) or a placebo (which we are confident will have no effect at all). But giving an active drug with potential harms and side effects, without any kind of belief that it will cure what ails the patient, just for the purposes of a more realistic placebo, seems ethically fraught.
Also, if the trial group does better than the placebo group, can you be confident that it's not just due to undocumented deleterious side effects of the active placebo?
Yes, they are ok *questions* but what they aren't is moral reasons. There is this very unfortunate tendency, especially in bioethics, to react to anything that's a decent question as if it's a reason to avoid doing the thing. A question isn't a persuasive argument.
I don't mean to jump on you, they are worthwhile questions, but the problem is that if you leave the dialectic here, everyone treats the fact that those are questions (and the fact they are a bit scared to answer them, because every coherent answer requires biting some bullets) as if they are reasons to shy away from performing those studies, and the net effect is that we end up causing more suffering in the name of moral concern.
Thanks for your reply and I suppose I can see how you could interpret my minor comment as something bigger or different from what I meant, which was simply to acknowledge the questions themselves (which I tend to see as a gateway for exploration rather than a reason to discredit or dismiss). Personally, I wouldn't think these questions would prompt a write off but, on the contrary, would urge more interest. That said, while the questions were thought provoking, I'm not sure I would frame it as an ethical dilemma, myself.
As an aside: This article felt timely as I just completed three weeks of IV ketamine treatments after having read studies, speaking to others, following my intuition, and doing some extensive training on supporting others through integration with psychedelic journeys. I'm personally a believer in the power for healing with this approach, and I'm curious about how it's viewed and studied by others. I both feel solid and restful in how it impacted me, and I'm open to hearing/reading things that may offer other insight.
I hope the studies continue, and I hope it reaches a stage where it isn't seen as "experimental" and remains out of reach of so many who, I believe, could find some tremendous healing. And if, in the end, there's some indication of placebo effects, I'm still here and happy to feel better. As with any topic, I hope questions don't become an excuse to "avoid doing the thing."
(Also, none of this is my wheelhouse. I have seen some reports of brain scans pre- and post-treatment and the effects on synaptic connections which I tend to think builds more of a complete picture than relying on questionnaires.)
To be clear, I didn't interpret your comment that way. I tried (unsuccessfully, apparently) to indicate that when I remarked about where it left the dialectic. Your comment is perfectly reasonable and correct.
However, the problem is that you say that, and then there is a tendency for people to treat the fact that there are good questions here as a reason to treat such studies as if they have a partial black mark against them ("well, there are unresolved ethical questions").
The underlying problem isn't your question, it's the fact that people have an asymmetry in how they treat doing a study versus not doing one. In reality there are good questions about whether it's ethical *not* to conduct such studies as well, but because not doing something doesn't register as an action in most people's minds, they don't treat that option as also having a cloud hanging over it.
So I don't mean to criticize you (and sorry if it came off that way) but rather a kind of biased response that many people have to the perfectly correct point you made.
Just as an additional remark, what I think is deeply wrong with the assumptions made in much of bioethics is that it implicitly slights the very kind of concerns and interests like the ones you mention here. You don't just selfishly care about yourself, you care about finding out more about this kind of therapy in the hope that it will benefit others.
Yet the default assumption is to disregard those interests of yours as if they didn't matter or couldn't count as genuine benefits to you. That's not only wrong but, I think, a deeply disrespectful attitude to take toward patients. It's also inconsistent, as we are willing to give weight to totally irrational desires of patients to refuse treatment, such as being a Christian Scientist or thinking vaccines contain mind control.
I think this systematic tendency is the result of two unfortunate incentives in bioethics: the incentive to claim that purely pragmatic concerns are moral principles and the incentive to cater to the medical establishment/avoid criticism not the public at large.
The issue with 1 is simply the incentives of academic philosophy. New arguments get published, and simply saying that there are risks researchers will overstate the benefits of their research when convincing patients to take part, or otherwise that we need to balance consequentialist costs/benefits, doesn't result in many papers - coming up with a new argument for why this thing is really a moral principle does.
The issue with 2 is more obvious if a bit more counterintuitive. You might think the incentive would be to tell researchers they can do whatever they want, but doing that risks blowback on the institution. Moreover, it's not what taking patient interests seriously would require. What that would actually require is that researchers themselves made judgments about the value of their research and presented those honestly to the patients, rather than just throwing the forms the IRB demanded at them and avoiding any hard moral calls themselves.
No, this whole idea that you can never administer anything with the slightest chance of harm to a patient is just total hogwash and never made sense. The patient fucking undertakes a serious risk of death every time they have to drive into the treatment center for an evaluation or followup, but apparently we don't count that.
What matters is that the patient understands the risks and wishes to undertake them. Frankly, I think it's downright near unethical to refuse to consider the patient's own desire to accept a certain risk to help understand a condition -- that's refusing to respect the patient's own beliefs and preferences and substituting the concerns of the person administering the test. That's not ethics, it's selfishness.
This all makes sense, but from reading the ACX posts on IRBs, aren't all the imperfections of the researchers swamped by the IRB the way it is currently defined and practiced?
Yes, it's the people who run and support the IRB system who bear the moral blame here. I don't think I focused on the researcher (the ones who have a choice about this are the IRB and those who entrench that system).
This is why it's so important for researchers to stay up to date on the latest social justice fashion in order to override the IRB concerns by framing any regulatory restriction as an attack on the currently most favored marginalized groups or the defense of existing structural exclusion.
Researchers who put their personal comfort over working the system as it's intended to work now also bear some blame, since it's not as if the IRB system has a strong independent base of support and is immune to political pressures.
"I feel personally uncomfortable accusing individual reviewers of being transphobic white supremacist rape apologists to get my paper through review" may be a laudable personal ethic just like "I feel killing children is wrong", but if your job is to do scientific studies or fly predator drones, that ethic directly conflicts with your career development and job duties in a US context.
I sort of doubt that's even true, to be honest. Maybe the picture has changed, but when I looked into it some years ago I couldn't find any solid reason to believe they *actually* worsen depression at all — just the usual flawed and underpowered studies, using weird metrics (and at least one of which actually found diazepam, IIRC, had the opposite effect!).
I don't think Scott's presentation of the nature and purpose of placebo controls is exactly right, and would point to this article for a more in-depth discussion. https://doi.org/10.1016/S0895-4356(01)00496-6
When a trial is performed, the role of the placebo-control is to help separate signal from noise. They are designed so that comparisons between the groups tell you about the signal (the effect of your intervention compared to placebo), and the rest is noise. Often times, all of the noise gets called "the placebo effect", but that's misleading. The placebo effect would be the difference between getting the infusion of saline (the placebo), and getting nothing - which was not assessed in this study.
Looking at the difference within a single group, such as the pre- to post- comparison of patients who got placebo, is just a cohort study following those patients while they undergo surgery. It would be silly to attribute a reduction in depression scores for people before surgery and after surgery to the placebo - it's the result of the things that happened in the interim (getting through a surgery, possibly related to receiving the anesthetics as mentioned, though I'm skeptical of that). It has nothing to do with placebos or even regression to the mean.
Yep, I heard opioids can make you feel fine for a while. Experienced it, in fact. Very effective antidepressants, were it not for the obstipation and some other problems.
They've worked great for me for about fifteen years!
I feel like maybe I was born with an endogenous opioid issue. I get 0 side effects — neither constipation, nor difficulty urinating, nor nausea, nor even significant tolerance — regardless of how much opioid I'm on; and EVERYTHING in my life got better, even the stuff opioids normally make *worse* for people.
(e.g.: more appetite, no more insomnia, no more cramps and diarrhea, better mood, more sociable, more diligent, less sensitivity to sunlight...)
I love them. Right before I discovered opioids, I told my parents I couldn't remember being happy. I'd been to psychiatrists, tried all the first- and second-line stuff, therapy... didn't help; but the instant that hydrocodone hit my bloodstream, I was cured.
Effect hasn't ever gone away, either. Did I get super addicted for a while? Well, yes - but I've been stable on the same low dose for years now. Thank God for the poppy.
There really seems to be a kind of healthy opioid use, but one has to be careful. I've met a few Persians who used to consume opium in their country and after emigrating ended up in a methadone program. They told me things like "It's like your beer here in Germany".
Tangential: Is anyone else a bit sceptical of the current focus on measures of "effect size" as currently defined? Lots of review articles just quote those, and I suspect that a lot of nonmathematical readers assume from the name that it means something like "how effective the treatment is" , which ought to be scaled such that some number eg 1.0 meant "completely cured". But actually it really seems to mean something more approaching "signal to noise ratio", so it's telling you how reliable the study is. Which is vital to know, but actually I'd like to know how effective the treatment is too. In theory it's useful as a comparator of treatment effectiveness, but I'm not convinced that people are paying sufficient attention to the conditions for that to be the case.
There is some interaction between Protonmail and Substack which means I don't seem to get a notification for a good fraction of Astral Codex Ten posts, even though all the emails are there when I go looking.
No, there is a number that quantifies the effect size in a way that really gives you a feel for how large and how important it is, and most studies use that number. The thing that you're calling the signal to noise ratio, that tells you how reliable the study is -- that's a different statistic. It's usually in the form of p < some number, say .01. That means that the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference, is less than one in 100. The statistic that gives you a picture of how important and substantial a difference is is called standard deviation.
So take caucasian men's heights. The average is 69". So say you get the height of all caucasian men and see how far each is from 69", then average all those numbers. That measure is called the standard deviation. It's the average distance people in the population are from the mean. For this population, it's 2.5". Someone who is 71.5" tall is one standard deviation above the mean. So it turns out that for most measures like this, whether it's measures of height or of depression scores, there's this regularity: 68% of people in the group are within one standard deviation of the mean, half of that subgroup measuring above the mean and half below. Under this rule, 68% of the data falls within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations of the mean. So you can see how that measure gives you a feel for how big and important a change is. If there are 2 guys, and one is 1 standard deviation taller than the other, then he’s 2.5” taller. So that’s a fairly substantial difference — definitely not too subtle to notice. If somebody’s SAT score goes up one standard deviation, that means it rises by 217 points — so enough to make a difference in what colleges they are likely to get into.
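A minimal sketch of that 68/95/99.7 regularity, simulating heights with the quoted mean and SD (the data here is simulated, since no real sample is given):

    import random, statistics

    # Simulate heights using the mean (69") and SD (2.5") quoted above.
    random.seed(0)
    heights = [random.gauss(69, 2.5) for _ in range(100_000)]

    mu = statistics.mean(heights)
    sd = statistics.stdev(heights)

    # The 68/95/99.7 rule: fraction of people within k SDs of the mean.
    for k in (1, 2, 3):
        frac = sum(abs(h - mu) <= k * sd for h in heights) / len(heights)
        print(f"within {k} SD: {frac:.1%}")   # ~68.3%, ~95.4%, ~99.7%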
The study Scott discusses was not finding differences. But I found one meta-analysis of ketamine effects that combined the results of 15 very well done studies, and found that people getting ketamine during surgery had approximately one standard deviation less post-surgical depression than people who got placebo. That gives you a sense of the magnitude of the difference. Other things that would give you a sense of how to judge the effect would be how the average depression score of the ketamine subjects compared to that of non-depressed members of the general population. Then you'll know whether they are actually feeling about as good as a non-depressed person, or just feeling better than they would have without ketamine. You can also go look at the actual depression test, and get a sense of how many I-feel-awful items somebody has to be endorsing to get a certain score. That will give you another yardstick.
"It's usually in the form of p < some number, say .01. That means that the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference, is less than one in 100..."
Oh god. Oh *GOD*. I don't wish to sound like a Doomer, but the fact that a frequent reader and commenter here would make such a mistake, can't help but make me feel like it's a sign that the entire Rationality project is doomed.
Basically, if one thinks p-values are the chance of things just being a coincidence,
1. You get weird results like "A screening test for cancer with a p-value of 0.01 / 99% success rate on cancer-free people, actually has a success rate of only like 10%, in that only 10% of those who get flagged actually have cancer" (the classic Base Rate Fallacy/False Positive Paradox: https://en.wikipedia.org/wiki/Base_rate_fallacy#Example_1:_Disease)
2. You get weird results like a study with an amazing p-value of 1.2*10^(-10)... saying that psychic powers are real, and people can see the future if they vibe hard enough.
Those are very very important things to learn, the foundation of almost everything Scott has ever said about science and the Replication Crisis (e.g. his famous 5-HTTLPR article: https://slatestarcodex.com/2019/05/07/5-httlpr-a-pointed-review/). Habitual Scott readers should be some of the most familiar people on the planet with the idea "p-values are *lying snakes*, don't trust them" and "p-values don't mean what people think they do". *Not* knowing that after years of reading SlateStarCodex should be as impossible as a modern young Democratic Party staffer having no idea who Trump is -- in a way, we talk of nothing else but the adversary we're fighting against.
I just don't know what to say in response, I suppose, beyond "Thank you for teaching me something I didn't know, it shattered a lot of my beliefs in the same way mythical Ozymandias would be shattered if he saw that statue of himself from Shelley's famous poem, "Ozymandias". But as we like to say, 'That which can be destroyed by the truth, should be.' "
... damn, I know I sound like I'm overreacting, but like I said, this should be as impossible as a Bible reader having no idea who Satan is. The fact that it *is* (possible), is not an indictment of you, it's an indictment of me and my warped worldview... God, this must be how universities feel when they review the evidence that they don't actually teach critical thinking, just how to pass tests (e.g. https://www.insidehighered.com/views/2016/06/07/can-colleges-truly-teach-critical-thinking-skills-essay).
... I used to believe that the key to solving problems was being the change you wanted to see in the world. Once again, I have been burned. Damn it.
As taught in any introductory statistics class, the p-value is, in fact, more or less what Eremolalos claims it is by definition. Yes, we all know there are many problems with how the p-value is used in scientific publications, and how it is sometimes interpreted, but that does not change the definition of the p-value!
A more or less accurate explanation of the p-value would be that it answers the question 'supposing there's no real effect here, how likely were we to get a result this strong or stronger purely by chance?' I think the difference between this and the quoted definition is pretty important. If we test a plainly ridiculous hypothesis and get a positive result with p < .05 we should not conclude that the probability of a real effect is >95%.
I fail to see the difference between what you are saying and the original statement, i.e. I thought that was what it was saying in somewhat loose terms, at least by a charitable reading?
I've just replied below (under your reply to my reply to K. Liam Smith) and I think I'd be repeating myself here, so if you feel like carrying on the conversation let's switch to that subthread.
But we live in the universe where it actually means a complex, hard-to-use, and unintuitive thing: "The probability of observing a result at least as extreme as the one observed if the null hypothesis is true." Rewritten in simple English, that's "*If* I'm wrong, this is the probability of seeing what I'm seeing right now anyway due to pure dumb luck."
It is not "The chance of being wrong." This misunderstanding was how Bohannon was able to take 20 completely false facts about chocolate ("Chocolate helps with blood pressure! / Chocolate helps with cholesterol! / Chocolate helps with sleep quality! / Chocolate helps with exercise! / ..."), throw all 20 at a p=0.05 test, find the one that randomly happened to pass ("Chocolate helps with weight loss!"), and let the newspapers publish it as "95% CHANCE CHOCOLATE HELPS WITH WEIGHT LOSS: SCIENCE".
It's not just journalists either, the point of Scott's article about doctors (https://slatestarcodex.com/2013/12/17/statistical-literacy-among-doctors-now-lower-than-chance/) is that 58% of them could not answer this basic question about p-values (46% gave the wrong answer, while 12% could not answer at all). Despite the fact that 63% of them rated their statistical literacy as adequate, only 26% of them got the correct answer on the 5-choice question about the rare cancer screening, i.e. it's entirely plausible that only about 8% actually knew, and the other 92% randomly guessed (cause 8% + [92/5]% = about 26%).
I mean, for goodness sake look at John Ioannidis' classic "Why Most Published Research Findings Are False" (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124). Ioannidis points out that if you use the actual definition of the p-value, rather than the definition people think it has, and work through the implications, you find that a majority of published research should be false for the same reason that the majority of cancer warnings from the rare cancer test example are false. But people had never realized this for years, thinking that Science was *way* more solid than it really was, cause they didn't even realize they didn't know.
That paper was foundational to the field of metascience, the discussion of the Replication Crisis, the push for statistical literacy in science & medicine, etc. - Scott has essentially been banging on Ioannidis' drum for over 10 years, announcing to anyone who will listen what p-values really mean, and thus what they *really* mean once you digest the implications^[Footnote 1]. The fact that people *still don't know what they don't know*, despite listening to the drum for years, means that something must have gone terribly wrong. Even if I still don't know what exactly.
(Footnote 1: Scott *explicitly* discusses things like this at the likes of https://slatestarcodex.com/2015/05/30/that-chocolate-study/ [That Chocolate Study], what people should be taking away from all this. Not just "p-values don't do what you expect", cause that's just foundational, but the implications that flow out from there which people might not see or quickly forget, like "Don't trust lone papers" and "Don't trust science journalism" and "Statistics is unintuitive".)
You know, it's very unpleasant of you to use the tone you are to lay out the case for a different and possibly more deeply true way of thinking about p values etc.. I have not read the stuff you're linking, but I will. I was responding to some people who seemed not to know some basic stuff, so I typed out the clearest brief explanation I could of measures of the sizes of differences and of how much you could trust them. If those people read it, they now at least understand that stuff better. How about taking the trouble to write out a user-friendly summary of your understanding of how one does decide how seriously to take research that finds a difference between 2 groups, instead of several paragraphs of screaming about how only an amazingly unenlightened jerk would fail to know in 2023 certain deep truths of metascience, truths that you link to rather than explaining?
I'm not at this point convinced that the unsophisticated understanding of p values makes a lot of difference in practice. Note that Scott's discussion of the research in the present thread does not challenge its statistics. He also does not challenge the statistics of other findings he mentions about the topic. In fact I have read lots of Scott's threads that mention various research findings, and while he is in general skeptical of conventional truths, and challenges the reasoning of researchers in ways that seem valid to me, I can't recall him challenging it on grounds anything like what you are talking about.
Note that most uninfected people test negative, as they should; this disease test has a good p-value.
Note also that 100% of the infected people test positive, as they should.
The test is still wrong near 100% of the time.
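Concretely - a sketch with assumed numbers, since the comment doesn't pin down the prevalence or false-positive rate:

    # Assumed rates, for illustration only:
    prevalence = 0.001    # 1 in 1,000 people actually has the disease
    sensitivity = 1.0     # "100% of the infected people test positive"
    false_pos = 0.01      # 1% of healthy people are wrongly flagged

    p_flagged = prevalence * sensitivity + (1 - prevalence) * false_pos
    ppv = prevalence * sensitivity / p_flagged
    print(f"P(sick | flagged) = {ppv:.1%}")   # ~9.1%

    # ~91% of positive results are false alarms, even though the test
    # "only" errs 1% of the time on healthy people. That gap between
    # the error rate and the PPV is the base rate fallacy.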
This is the "deep truth of metascience", borne of a slightly more sophisticated understanding of p-values: p-values do not do what people think they do. What people think they do is called "Positive Predictive Value"/PPV/"The chance the thing is actually true", and the Positive Predictive Value of most studies is wildly overstated, much worse than what they imply through their p-values. In fact, most studies in a lot of fields are straight up garbage: https://fantasticanachronism.com/2020/09/11/whats-wrong-with-social-science-and-how-to-fix-it/
"Criticizing bad science from an abstract, 10000-foot view is pleasant: you hear about some stuff that doesn't replicate, some methodologies that seem a bit silly. "They should improve their methods", "p-hacking is bad", "we must change the incentives", you declare Zeuslike from your throne in the clouds, and then go on with your day.
But actually diving into the sea of trash that is social science gives you a more tangible perspective, a more visceral revulsion, and perhaps even a sense of Lovecraftian awe at the sheer magnitude of it all: a vast landfill—a great agglomeration of garbage extending as far as the eye can see, effluvious waves crashing and throwing up a foul foam of p=0.049 papers. As you walk up to the diving platform, the deformed attendant hands you a pair of flippers. Noticing your reticence, he gives a subtle nod as if to say: "come on then, jump in"."
All this is to say what you did was helpful: teaching people that p-values are not effect sizes, and that something that has a good p-value can have any effect size imaginable. Sadly, however, it instead perpetuated the misconception that p-values are good, rather than bad (to put it simply). P-values are bad. No one should trust them. This is the most important thing one can learn, even if it is an oversimplification. Don't trust p-values.
> As taught in any introductory statistics class, the p-value is, in fact, more or less what Eremolalos claims it is by definition.
What? Eremolalos' definition is completely unrelated to the definition of the p-value.
The definition of p-value is "the probability of seeing results similar to what we actually observed under the null hypothesis". Stated as a conditional probability, the p-value is P(observed data | null hypothesis is true), which is easy to calculate, and Eremolalos has erroneously defined it as representing P(null hypothesis is true | observed data). These are unrelated quantities, and the second one obviously cannot be calculated at all, which is why nobody tries to do so.
No, what I said is right -- except for whatever objections Scott is on board with, which I have not read yet. But those I'm pretty sure are more subtle and philosophical, having to do with how one thinks of probability of being wrong -- not simple math objections.
> The definition of p-value is "the probability of seeing results similar to what we actually observed under the null hypothesis".

Right. So let's say our hypothesis was that on average caucasian males with blue eyes are shorter than males with brown eyes. The null hypothesis is that on average blue-eyed males are the same height or taller than brown-eyed ones. So we measure 1000 of each, and get the mean and standard deviation of each group, and we find that on average blue-eyed males in our sample are half an inch shorter than brown-eyed ones. But our hypothesis is not about the 2000 men in our study, it's about *all* caucasian men. So we need a statistic to tell us how reasonable it is to make that generalization about all caucasian men based on our result. What is the chance that we would be wrong? That statistic is p. It's the probability that the null hypothesis is correct: blue-eyed men are not shorter than brown-eyed men.
>Stated as a conditional probability, the p-value is P(observed data | null hypothesis is true), which is easy to calculate,
How would you calculate it, given my result? I gave you the mean height of each of the 2 groups in my study: blue = 68.75, brown = 69.25. If you need standard deviations of each of the 2 groups to make the calculation, let's say that the standard deviation for blue-eyed was 2.3 and for brown-eyed it was 2.6. 1000 blue-eyed and 1000 brown-eyed men were in the study.
I can't really tell what you're trying to say here. Are you looking for the observation that, because you don't have a null hypothesis, you're unable to calculate any p-values? That's the only possible response if I read your comment literally.
Or are you looking for me to supply my own null hypothesis and calculate p-values from it? (The logical choice would be "the blue distribution and the brown distribution are both normal distributions with mean 69 and standard deviation 2.5", since you've already specified that that is the overall distribution.)
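For what it's worth, taking the simplest null (equal population means), the p-value from the quoted summary statistics can be sketched like this; using a normal approximation in place of the t distribution is an assumption, though a harmless one at n = 1000 per group:

    import math
    from statistics import NormalDist

    # Summary statistics quoted above.
    m_blue, sd_blue, n_blue = 68.75, 2.3, 1000
    m_brown, sd_brown, n_brown = 69.25, 2.6, 1000

    # Welch's t statistic for the difference in sample means.
    se = math.sqrt(sd_blue**2 / n_blue + sd_brown**2 / n_brown)
    t = (m_blue - m_brown) / se      # ~ -4.56

    # One-sided p-value: the chance of a gap at least this large in the
    # "blue shorter" direction if the population means were equal.
    p = NormalDist().cdf(t)
    print(t, p)                      # ~ -4.56, p ~ 3e-06

Note what this number is: the probability of data this extreme *given* the null, not the probability of the null given the data.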
I agree with Leppi below. Eremolalos gave an accurate explanation of p-values and how they’re interpreted by the broader scientific community. You might have concerns with p-values and prefer a Bayesian approach, as I do, and the broader rationalist community does. But it’s not like Eremolalos waved a wand and said, “Let there be frequentism.”
I'm curious for (reliable) sources that give Eremolalos' definition. Wikipedia and my memory give: the p-value is the probability that the underlying process would generate the given observations (or observations more extreme) if the actual effect size is zero.
Is there an equivalent (for frequentists) definition where they get the probabilities of the null hypothesis? How do frequentists get around the base rate issue?
> Is there an equivalent (for frequentists) definition where they get the probabilities of the null hypothesis?
The null hypothesis is usually that the control group and the test group are the same. However, let's take a common example: coin flipping. It's usually imagined to be 50/50. But where does this come from? In reality, coins don't actually have 50/50 odds of being heads/tails because metal rubs off from them and they were never perfectly identical to begin with. But coins on average are very close to 50/50. So as a Bayesian we'd say our prior is 50/50. As a frequentist, you just ignore the problem and don't call it anything.
(I'm not endorsing the tone of WindUponWaves's comment, but) I don't see how "the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference" can be read as an accurate explanation of the p-value. The real definition is
> the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
> the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
Therefore the p-value is the probability of making an observation at least as extreme "by chance" given that the null hypothesis is correct. Which was approximately what was stated?
My reading is that "the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference" = 'given we got this result, how likely is it that it's a false alarm and there's no real effect?'.
Whereas "the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct" = 'supposing there's no real effect here, at the beginning of the experiment how likely were we to get results this strong or stronger?'
Those are importantly different because the answer to the first question depends on more than just the answer to the second question. For example if we test a silly hypothesis like 'saying these magic words when flipping a fair coin increases the chance that it will land heads' (null hypothesis: it's still 50/50), and we design and conduct the experiment perfectly but happen to get a positive result with p = 0.04, our estimate of "the chance that the difference found between [flips with and without the magic words] is mere coincidence, rather than an indication of a real difference" should be way more than 4%.
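A back-of-envelope version of that point, with an invented prior and power (neither is given above; both are assumptions for illustration, and using the significance threshold as the false-positive rate is a rough stand-in):

    # Invented numbers, purely for illustration:
    prior = 1e-6    # prior chance the magic words really bias the coin
    power = 0.8     # chance of a positive result if the effect is real
    alpha = 0.04    # the false-positive rate of our test

    posterior = (prior * power) / (prior * power + (1 - prior) * alpha)
    print(f"P(real effect | positive result) ~ {posterior:.0e}")  # ~2e-05

    # So the chance the result is "mere coincidence" is ~99.998%,
    # nowhere near the 4% a naive reading of p = 0.04 would suggest.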
I agree that it's perhaps not as rigorous as you’d like, but I think a charitable reading of it is a reasonable description of p-values in a frequentist paradigm.
It sounds like you’re already familiar with all this, so I’m probably going to say some stuff you already know, but I just want to lay out where I’m coming from.
In a Bayesian context, probability is the chance of an event happening in the future. In a frequentist context, probability is the limit of the frequency of an event after near-infinite trials. It’s inherently retrospective/descriptive rather than predictive. A frequentist would say that, in an experimental trial, the coin landed on heads 47 times out of 100. A Bayesian would say there's about a 1/2 chance it’ll be heads the next time you flip it.
Changing the definition of probability leads to two different definitions of p-value. p-values are unimportant in a Bayesian context, which has led to a lot of blog posts arguing that they don’t exist at all in Bayesianism. This is Gelman’s definition: “From a Bayesian context, a posterior p-value is the probability, given the data, that a future observation is more extreme (as measured by some test variable) than the data.” [http://www.stat.columbia.edu/~gelman/research/published/STS149A.pdf]
Note the shift from a p-value describing past experimental results to predicting a future observation. But again, these aren’t really used in practical application and this is kind of pedantic. It’s clear that we’re talking about the frequentist definition as almost everyone does when talking about p-values, so I’ll stay focused on that.
In a frequentist paradigm, a p-value is the probability of a false positive based on the frequency of results in the experiment. Let’s say you have a hypothesis that a coin is weighted. In frequentism, a p-value asks the question, “If we had a fair coin, what’s the probability we would get 23/100 heads?”
Let’s say we put a sticker on a coin and want to know if that is sufficient to make it weighted. We flip it 100 times with the sticker as a test group and 100 without the sticker as a control group. We get 23 heads and 50 heads respectively. As you wrote, we want to find “the probability of obtaining test results at least as extreme as the result actually observed [23 heads], under the assumption that the null hypothesis is correct [the sticker doesn’t change the frequency].”
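As a side note, the one-sided version of that calculation is short enough to sketch (under the fair-coin null; the 23/100 figure is from the hypothetical above):

    import math

    n, k = 100, 23   # 23 heads in 100 flips of the stickered coin

    # One-sided p-value: probability a fair coin gives 23 or fewer heads.
    p = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    print(p)         # ~3e-08: wildly "significant" under the null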
Eremo wrote, “the chance that the difference found between the subjects [the test results obtained with the coins with and without the stickers] is mere coincidence, rather than an indication of a real difference [what was the probability under the assumption of the null hypothesis]”
I don't think the rant from Wind is really about the definition of p-values, but really arguing that Bayesianism is better. I agree with Wind that Bayesianism is better, but that doesn't make Eremo's description of p-values wrong. Imagine there's a car wreck (aka the replication crisis) and a bystander describes how people were driving before the wreck. I think we're yelling at the bystander for describing how they were driving rather than the drivers for causing a wreck.
I'm not meaning to read uncharitably, but it's possible my pedantry or some assumptions I don't realise I'm making are leading me in that direction. From my perspective, though, you're being overly charitable here:
> Eremo wrote, “the chance that the difference found between the subjects [the test results obtained with the coins with and without the stickers] is mere coincidence, rather than an indication of a real difference [what was the probability under the assumption of the null hypothesis]”
To me the natural reading of "the chance that the difference found... *is* mere coincidence..." is something like 'given we got this result, how likely is it that it's a false alarm and there's really no effect'? Which, even if we're not going full Bayesian, is an importantly different question from "what was the probability under the assumption of the null hypothesis". And if we interpret p-values according to that first definition, we're setting ourselves up to draw some silly conclusions.
> In a frequentist paradigm, a p-value is the probability of a false positive based on the frequency of results in the experiment.
But this is false.
Compare Ronald Fisher writing in 1925:
> The deduction of inferences respecting samples, from assumptions respecting the populations from which they are drawn, shows us the position in Statistics of the Theory of Probability. For a given population we may calculate the probability with which any given sample will occur, and if we can solve the purely mathematical problem presented, we can calculate the probability of occurrence of any given statistic calculated from such a sample. The Problems of Distribution may in fact be regarded as applications and extensions of the theory of probability. [p. 10] Three of the distributions with which we shall be concerned, Bernoulli's binomial distribution, Laplace's normal distribution, and Poisson's series, were developed by writers on probability. For many years, extending over a century and a half, attempts were made to extend the domain of the idea of probability to the deduction of inferences respecting populations from assumptions (or observations) respecting samples. Such inferences are usually distinguished under the heading of Inverse Probability, and have at times gained wide acceptance. This is not the place to enter into the subtleties of a prolonged controversy; it will be sufficient in this general outline of the scope of Statistical Science to express my personal conviction, which I have sustained elsewhere, that the theory of inverse probability is founded upon an error, and must be wholly rejected. Inferences respecting populations, from which known samples have been drawn, cannot be expressed in terms of probability
(I was actually looking for him to provide a definition of P, but as far as I can tell he does not do so. You're expected to be familiar with Pearson's work.)
Fisher says in so many words that once you've analyzed your sample and calculated your p-value (in his words, it's just called P), you cannot make any statement at all about the distribution actually exhibited by the population. That's not what p-values represent. Eremolalos is saying the opposite.
I trust you'll believe that Ronald Fisher in 1925 was not writing from a Bayesian perspective?
The definition that Eremolalos gives seems to suffer from something like the base rate fallacy wrt the base rate of real effects.
Significance tests give you the probability that you would see a result as big as the one you saw, given the null hypothesis.
This doesn't give you P(null hypothesis is false | the data). If there are no real effects in the area you are investigating, a statistically significant result just means you got lucky with the random noise, and P(real effect) is still zero.
Well, no, Eremolalos is right, p-values are much closer to a measure of signal-to-noise ratio than effect sizes, which are just a rescaling to make it easier to interpret the importance of an effect.
Eremolalos is right that p-values are not effect sizes, but wrong about the signal-to-noise part: p-values are not a signal-to-noise ratio or anything like that. A p-value of 0.05 is entirely compatible with picking 20 different falsehoods, throwing them all at a p = 0.05 test, picking out the one that randomly passes, and publishing it as the real deal, for a noise-to-signal ratio of infinity. For example, https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800 (I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here's How.)
In fact, a p-value of !!1.2*10^(-10)!! is entirely compatible with stuff like saying psychic powers are real (https://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/), because p-values do not measure what people think they do. What people think they do is called "Positive Predictive Value"/PPV, and confusing them is why people have to write papers like https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124 (Why Most Published Research Findings Are False) pointing out what PPVs you actually get from the p-values people are proudly slapping on their papers (hint: the PPVs are depressingly bad).
It is probably a mostly semantic debate, which I do not find really interesting.
I think Eremolalos is right in saying that p-values are MORE a measurement of signal to noise than effect sizes are. But:
- p-values are obviously NOT identical to a signal-to-noise ratio; they are conceptually related to one. And yes, of course, p-values can be tortured, like basically any statistical tool, and Ioannidis has a point, but hey, used reasonably, p-values are a simple tool that is frequently useful.
- it of course depends on what is meant by signal-to-noise ratio. If we stay in the context that was discussed - is there a difference between the effects of two treatments - then I do think that p-values are a reasonable measurement of the signal (average difference between treatments) to noise (variation within a treatment). But of course signal-to-noise ratio can also mean something quite different, and in those other contexts, p-values would not correspond to that.
Eremolalos is right to say p-values have nothing at all to do with effect sizes. It's common for things with impressive p-values and even PPVs to have such utterly tiny effect sizes that they simply do not matter.
Unfortunately, p-values have very little to do with the signal-to-noise ratio either. They might have, once. Nowadays, people have found ways to torture them into being whatever value they want, regardless of how little signal and how much noise they're actually dealing with, and the p-value is effectively dead as an honest indicator. That's how you can get a study with an unassailable p-value of 0.00000000012 that's also completely wrong. Or oceans of p=0.049 papers that are near-literal fraud: https://fantasticanachronism.com/2020/09/11/whats-wrong-with-social-science-and-how-to-fix-it/
"Criticizing bad science from an abstract, 10000-foot view is pleasant: you hear about some stuff that doesn't replicate, some methodologies that seem a bit silly. "They should improve their methods", "p-hacking is bad", "we must change the incentives", you declare Zeuslike from your throne in the clouds, and then go on with your day.
But actually diving into the sea of trash that is social science gives you a more tangible perspective, a more visceral revulsion, and perhaps even a sense of Lovecraftian awe at the sheer magnitude of it all: a vast landfill—a great agglomeration of garbage extending as far as the eye can see, effluvious waves crashing and throwing up a foul foam of p=0.049 papers. As you walk up to the diving platform, the deformed attendant hands you a pair of flippers. Noticing your reticence, he gives a subtle nod as if to say: "come on then, jump in"."
> So take caucasian men's heights. The average is 69". So say you get the height of all caucasian men and see how far each is from 69", then average all those numbers. That measure is called the standard deviation. It's the average distance people in the population are from the mean.
Huh. I was mostly following the discussion below you of your definition of p-value, but you should be aware that this is also grossly wrong.
You are right that it is wrong. I simplified it because I was writing an explanation that assumed the reader didn't know a thing about stats. Standard deviation is actually the square root of the average of the squared differences between each subject's height and the mean height. So it's not a simple average, but one that gives more weight to the values with greater distance from the mean. The simplification does not make any difference in the conceptual understanding I was trying to give readers. A conceptual explanation of standard deviation is that it's a way of quantifying how spread out the values are around the mean. That was what I was trying to get across.
> Is anyone else a bit sceptical of the current focus on measures of "effect size" as currently defined? Lots of review articles just quote those, and I suspect that a lot of nonmathematical readers assume from the name that it means something like "how effective the treatment is" , which ought to be scaled such that some number eg 1.0 meant "completely cured". But actually it really seems to mean something more approaching "signal to noise ratio", so it's telling you how reliable the study is. Which is vital to know, but actually I'd like to know how effective the treatment is too.
You've gotten the terminology backwards. "Effect size" is a measurement of "how effective the treatment is", exactly what you claim it isn't. (Except that since there is no concept of "completely cured", effect size is reported in terms of how much of a difference the effect makes, not in terms of a hypothetical absolute scale.)
The measurement of "signal to noise ratio" is known as the "p-value".
I do know what P-values are (it is unfortunate that the rest of the thread has diverted onto discussing those)
It is true that effect size should be *proportional* to the effectiveness of the treatment. But most (all?) mathematical metrics used for effect size, e.g. Cohen's 'd', are normalised to the population standard deviation (hence my comparing it to a signal to noise ratio).
These seem to be used in two ways which are mistaken:
1) to compare studies, which aren't guaranteed to have the same population variance
2) There are standard 'this is a good study' thresholds of effect size, which are often then interpreted as 'this is a good treatment'. But the effect size does not tell you this, as a weakly effective treatment can end up with an 'effect size' over the threshold, depending on the population variance (see the sketch after this list).
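A toy illustration of point 2), with invented numbers and Cohen's d computed the standard pooled-SD way: the identical raw improvement clears or misses the conventional "medium effect" threshold of d = 0.5 purely depending on how spread out the population is.

```python
import numpy as np

def cohens_d(a, b):
    """Standardised effect size: mean difference over the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

rng = np.random.default_rng(2)
n = 100_000  # huge samples, so the estimates are essentially exact

# The identical raw improvement (2 points on some scale) in two populations:
d_narrow = cohens_d(rng.normal(52, 4, n), rng.normal(50, 4, n))    # homogeneous population
d_broad = cohens_d(rng.normal(52, 16, n), rng.normal(50, 16, n))   # heterogeneous population

# d ≈ 0.50 in the narrow population ("medium", clears the usual threshold)
# versus d ≈ 0.125 in the broad one -- same treatment, same raw benefit.
print(d_narrow, d_broad)
```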
I don't see what you're objecting to. The measurement of how effective the treatment is is called the effect size. The effect size is reported in relevant units - for example, if your car generally goes 375 miles on 12 gallons of gas, then the effect size on range of refueling 2 gallons is "62.5 miles". (Well, possibly a bit less, because of overhead.)
You can report that in terms of the standard deviation of total range, as sampled over X driving-hours over various types of terrain, but since the deviation of range is reported in miles, your result will still be 62.5 miles. This is not "proportional" to the effectiveness of the treatment, it _is_ the (averaged) effectiveness of the treatment. The Cohen's d number, interpreted without reference to the size of the standard deviation, seeks to answer the different question "how difficult is it to get this much of an effect?"
(I would expect this normalized effect size to be incredibly large, because I don't think the variation in mpg across a lot of normal usage of the car is going to represent anywhere near as much of a change as increasing the amount of fuel by 17%. But it will still be 62.5 miles.)
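Spelling out the arithmetic in that example (the 20-mile standard deviation below is invented purely for illustration):

```python
mpg = 375 / 12               # 31.25 miles per gallon
effect_miles = 2 * mpg       # adding 2 gallons buys 62.5 miles of range

# Cohen's d divides that same raw effect by a standard deviation.
# With an invented SD of 20 miles of range across varied driving conditions:
sd_range = 20.0
d = effect_miles / sd_range  # ~3.1: "incredibly large", as predicted,
print(effect_miles, d)       # while the miles-based figure stays 62.5 either way
```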
> 2) There are standard 'this is a good study' thresholds of effect size, which are often then interpreted as 'this is a good treatment'.
Study quality is traditionally measured by the p-value. It is not measured by the effect size. Importance of the result is what is notionally measured by the effect size.
You might observe that low study quality directly causes larger effect sizes (after a p-value filter is applied), but this is not part of any standard evaluation of study quality.
I did once encounter someone who adamantly insisted that of course a measured effect size would fall as a study was repeated with larger sample sizes, because that's just how statistics work as sample size increases. There is no limit to the stupidity of people pretending to do statistics. But that has no particular implications for what measured effect size means.
It seems there is a terminology issue. What you are describing as effect size, as being reported in relevant units, is fine. I have no objection to that. According to Wikipedia, numbers normalised to variance like Cohen's d are actually supposed to be called "standardised effect size".
What I object to is review or survey papers where, on the basis of the studies reviewed, different treatments for the same condition are compared solely based on Cohen's d. I don't have time now to go looking for papers where I've seen this; I'll try to find some later. These have the pitfalls that I complained of in my previous comment.
I can understand the objection to reasoning along the lines of "this treatment has an effect size of 0.4 standard deviations, which is Large; therefore it is worthwhile to do this treatment". That's confusing a metric of difficulty with a metric of value.
But I don't see the problem with comparing different treatments for the same condition to each other by relative effect size. You don't even need to normalize against a standard deviation. No matter how you normalize the effect sizes, comparisons between treatments will always make exactly the same amount of sense; if treatment A has an effect size of 0.03 d, and treatment B has an effect size of 0.09 d, that looks identical to treatment A having an effect size of 8 benefits and treatment B having an effect size of 24 benefits. What does one benefit mean? Who cares? In all cases, you know that treatment B has three times the effect of treatment A.
Comparing different treatments would be fine if the populations were the same, but these review papers are looking at separate studies of different treatments. The studies can be drawn from different populations, either due to explicit selection criteria or as an artifact of how and where each study was conducted. For example, one study might exclude the hardest cases; another might be of hospital inpatients. In such cases the population variability should not be assumed to be the same, so 'd' should not be assumed to be comparable.
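To put invented numbers on that: suppose two studies of different treatments for the same condition each find the same 3-point average improvement, but study A excluded the hardest cases while study B took all comers.

```python
raw_improvement = 3.0   # points on the depression scale, identical in both studies

sd_study_a = 6.0        # hardest cases excluded: less variable population
sd_study_b = 12.0       # all comers: more variable population

d_a = raw_improvement / sd_study_a   # 0.50
d_b = raw_improvement / sd_study_b   # 0.25

# A review comparing only the d values would call treatment A twice as effective,
# even though both treatments produced exactly the same raw benefit.
print(d_a, d_b)
```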
I think you have a typo: You list "Third" twice, instead of "Third" and "Fourth."
In any case, I agree with your (first) "Third" reason for doubting this study. The conscious, subjective effects of the drug are likely causal for its antidepressant effects. There is no reason to expect that the drug would or ought to have the same effects if the patient is unconscious when the drug is administered. (I would suggest that the same is true for MDMA, which has achieved good results in the MAPS trials, and psilocybin as well.)
The fact that this simple point seems lost on many suggests a serious misunderstanding as to how these drugs work. There's a bad mental model in play, as if depression were like a bacteria and the drug were an antibiotic that kills it, or some other such simplistic mechanistic model.
I think that might be the right mental model for ketamine, though. Another way of saying that would be to say that ketamine is sort of like a fast-acting SSRI: If it works for someone, it works by correcting something in how the brain is working, rather than by inducing an unusual experience, with the experience itself being the agent that helps improve how the brain is working. And in fact that's how most drugs work, including drugs that work on the brain -- tranquilizers, sleeping pills, Adderall, drugs that combat epileptic seizures. They don't induce a novel experience that is meaningful in a way that leads you to become less anxious, more sleepy, more focused or less likely to have a seizure -- they just tweak the brain directly.
I have experimented with ketamine probably about 10 times, using the pure pharmaceutical drug I obtained through an MD who's a friend. I did not find it to be a rich experience at all. Cannabis, even in moderate amounts, has induced far more intense, rich and novel experiences, some of which were quite important to me, experiences I remember now decades later. My experience was that ketamine really *felt like* an anesthetic, a knockout drug. It made me sort of half-conscious, and while that was novel, it wasn't a rich sort of novel. Nothing from deep in my mind rose up to fill the gaps left in my consciousness. I was just sort of blank and disoriented. I have more to say about this in an earlier post.
I'm not arguing against the *possibility* that it works that way. I'm arguing against the *assumption* that it must work that way, which is what you have to make for its failure when administered while unconscious to be invalidating.
Also, I have a pretty high confidence that the efficacy of MDMA and psilocybin positively correlate with (and may be partially caused by) conscious experience, so I think it's warranted to be cautious about the assumptions made regarding ketamine.
Oh I have no doubt that MDMA and the psychedelics induce experiences that are meaningful in and of themselves, and can be change agents. I'm just skeptical that ketamine works that way. My reasons: (1) It's not even classed as a psychedelic -- it's a dissociative drug. (2) I have taken several doses of ketamine, some pretty large, and my experience was that it definitely fucked me up, but not in a rich way. It was sort of like being drunk to the point that you're on the verge of being in a stupor, and don't have much going on in your mind or your heart. (3) A meta-analysis of ketamine studies that I found looked at the combined results of 3, maybe 4 studies that compared the benefit of ketamine in people who were given it during surgery -- some while knocked out cold, others who had had a spinal and were awake. Both groups benefitted (had less post-surgical depression), and they benefitted equally.
(In another post I tell about that meta-analysis in more detail, and link to it.)
I'm torn on this question because for me personally ketamine has provided me a ton of extremely rich subjective experience that has been very meaningful to me. I've used it maybe a hundred times now over several years at low to medium doses. For me the experience is as rich as other psychedelic experiences I've had. I have about two hundred pages of notes from these ketamine experiences, including documenting specific ways the thought/feeling material while on it shifted my relationship to circumstances I was in at the time as well as observations about how it's affected me more generally across time.
At the same time, I've worked as a therapist with a fair number of people who have been on a variety of ketamine protocols and it seems quite clear to me from that clinical experience that a lot of people are getting benefit from it without having deeply meaningful subjective experiences -- these are people at both low and high dose ranges. As a related aside, I know people on low dose psilocybin protocols who are experiencing significant therapeutic effect from it without any altered consciousness experience.
I just think we have to admit we don't know what's going on yet and to keep being open to the wide variety of experiences people are having without privileging one story over another when we really don't have enough knowledge about mechanism of action to draw those conclusions.
Perhaps related side note: I'm a long-time student of Buddhism and for me, ketamine produces a state of mind that is pretty reliably adjacent to the warm-hearted "no-self" experience that can happen while meditating. It's not the same, but it's in the same ballpark for me. My internal conversation back and forth between ketamine and Buddhism these past few years has been really wonderful.
> I have about two hundred pages of notes from these ketamine experiences, including documenting specific ways the thought/feeling material while on it shifted my relationship to circumstances I was in at the time as well as observations about how it's affected me more generally across time.
Well, I would say that is proof positive that your ketamine experiences and their value were different from mine! I used to write notes a lot when I was high on weed, and when I read them later I still have the feeling I had when I wrote them: They're interesting ways of thinking about things -- they're striking, they're funny. If I had a ketamine journal all it would say is "holy *shit* this sucks, how long til it wears off?"
It may be. I spent some time on the ketamine subreddit, and most there either felt very helped by ketamine, purely as a drug, or else felt their altered consciousness on ketamine was itself helpful. But I think forums like that select for people for whom the drug is helpful, or who at least think it is or hope it will be. People with experiences like mine might visit once or twice to describe what they went through, but then have no reason to want to keep participating in a setting like that. Another factor is that my first ketamine experiences were with substantial doses given intramuscularly. When you take it that way you go from normal to totally gorked in about 3 mins. Maybe I would have felt more positive about the stuff if I'd had a gentler beginning. On the other hand, when I used the troches I was not alarmed or overwhelmed, but also did not feel like my experiences were very rich or valuable. I floated along, listening to music, feeling sort of peaceful and enjoying the music. It was pleasant but just not very substantial.
I personally heard from someone who also received a powerful, quick-acting dose at a clinic for her first experience. She was so put off that she decided not to return for a second time. This should really be avoided.
My view of your experience as atypical stems from more general discussion threads where people rate different drugs, not just from ketamine-focused forums. That should reduce the selection bias to one towards recreational drug users in general, rather than ketamine fans. I've also read several posts from multi-drug users who have compared the various drugs they've taken.
One of my reliable self-treatments for PTSD is to take a THC edible before going to bed. I wake up happy and relaxed, but slightly hung over. Apparently not only does it dehydrate you like alcohol, but it also messes with REM sleep. But then, part of the reason is that I want to be able to sleep at all, and the other part is that I want to be able to sleep without PTSD dreams.
I can't rule out this being a placebo effect from consciously taking the thing, but I've tried with a few varieties of edibles, and nothing except THC seemed to work. Although I do wonder whether I'd still notice a better effect of indica vs. sativa, if it were blinded.
This is absurd! Hallucinogens don't counteract depression while you're asleep. They work (or don't) because you experience something profound that you take with you after it wears off.
Is it impossible for the medical world to understand that a spiritual problem might have a spiritual treatment? That not everything is simple physiology where the goal is to tone down an overactive hormone or kill a deleterious bacteria? That a despair so profound can exist that it becomes a medical condition but it is, still at its heart, dangerous and intense and durable despair?
Anyone who has done hallucinogens recreationally can understand why they could be dangerous for schizophrenics and profoundly helpful for depressives. It seems...absolutely batshit to treat it like something you can do under anesthesia.
> Is it impossible for the medical world to understand that a spiritual problem might have a spiritual treatment? That not everything is simple physiology where the goal is to tone down an overactive hormone or kill a deleterious bacteria?
Yes, this is, by definition, a nonscientific hypothesis, and if you're doing science, it is not a valid idea to consider.
If people don't have souls, then treating their souls cannot have any effect.
The spirituality language is needlessly distracting, but I think that the underlying idea has merit. What does the concept of "depression" refer to in terms of materialist reductionism? The honest answer is that we have no idea; it's a high-level abstraction that we're able to reason about only in terms of correlations and other abstractions, far removed from ground-level material reality.
The concept of "altered state of consciousness" is basically on the same level of abstraction (also ultimately grounding in material reality in some currently not understood way), and there are no a priori reasons to dismiss the notion that it can have effect on "depression".
But there are a priori reasons to dismiss the idea that it can have an effect on depression separate from its effect on simple physiology. There is no such thing.
Well, physiology may be simple in some absolute sense, but that doesn't mean we're currently able to isolate and detect these effects, let alone completely understand them. As far as we are concerned, it might as well be "magic" (in this case known as spirituality).
No more so than talk therapy, which also does not work while under anesthesia, for obvious reasons.
In other words: the most common form of treatment for psychological issues, talk therapy, works on the level of consciousness and not root biology, just as I suspect ketamine does.
And to avoid getting bogged down in quasi-religious language -- what I mean is depression is often a hole so deep of hopelessness and self-hatred and worries that you can't see the world outside of it. And hallucinogens are famous for inducing a sense of connectedness where you see how small you are compared to the universe.
Maybe there's some background physiological effect of ketamine that makes it work like a regular drug, but the fact is that it's sensible and easy for anyone who has done hallucinogens to understand why these types of drug might benefit someone with depression...but not while asleep.
Science can study higher-complexity cognitive processes too. It's not a question of whether it is soul or consciousness or what; it is a question of whether low-dimensional quasi-permanent parameter shifts provide enough resolution for effecting the necessary complicated change. And if not, what other interventions we plausibly have and how they combine. So, in the current context: do psychedelics aid in a kind of self-therapy, and maybe how do we best guide it via external therapy?
Science cannot contemplate the idea "That not everything is simple physiology". That is an invocation of the supernatural, and it is assigned a prior probability exactly equal to zero. If any phenomenon has any detectible effect, that is ultimately down to simple physiology.
Whether you're capable of adequately understanding the phenomenon in terms of the simple physiology is a different question, but there is no question of whether the simple physiology is sufficient to explain the phenomenon.
[unrelated footnote: why is the Firefox spellchecker flagging "detectible"? Why is there no entry for "detectible" in Merriam-Webster? What's going on? It's easy to find citations for the spelling, including very modern ones, and the etymology tells us that the suffix appropriate for "detect" is the same one appropriate for "deduct", where all dictionaries agree on "deductible".
It is true that neither of those appears to be etymologically well-formed, but then why are we using "deductible"?]
Charitably interpreted, «simple physiology» excludes «extremely complicated emergent effects of physiology that are definitely not simple and have to be studied on their own because the reduction is way out of reach». «Sufficient to explain» depends on the available resources for intermediate steps. That's how I chose to read the original claim.
PS. Go all the way the other way, «deductable» is an acceptable alternative spelling, according to some dictionaries (although not Firefox)!
I don't think they're saying that it works through supernatural means. It gives people a way to reframe their experience that feels less distressing. Many people feel more OK about bad experiences if they think it was productive to go through those experiences.
I am an atheist, but I have experiences like this, too. For example, I am cool with the scary and painful medical/dental procedures I was subjected to against my will when I was a child because I understand they were done to me for a purpose, and it's a purpose I believe in. If I found out that this was a lie and I was actually tortured for no good reason at all, that would be really hard to take.
I agree with you generally but I think people working in the ketamine treatment space have different understandings about how crucial various kinds of conscious subjective experiences are to treatment efficacy. For instance, people not infrequently have "bad trips" on ketamine and still experience depression relief afterwards. I don't think 100% of the effect is tied to the quality of the conscious subjective experience. I don't think we know.
There's also the question about whether or not the metabolites of ketamine [(2R,6R)-hydroxynorketamine] are antidepressant. The next question would be if the same might be true about metabolites of the drugs used for the anesthesia.
"Bad trips" are not always interpreted afterwards as something negative by those who take these types of drugs. Interviewing people who do LSD, they often interpreted bad trips as very difficult but yet profound experiences, precisely because they were so difficult and scary. Sort-of framing them according to a "on a hero's journey" narrative. Which is a nice narrative to tell yourself, if you are suffering from depression. The point I am making: Also bad trips can induce a sense of meaning in your life. Perhaps they are particularly well suited for that.
I guess I'll just say the providers I know are less clear than people in this discussion that the conscious experience is 100% of the antidepressant effect. It may be 100% of the subjective psychological meaning-making effect, but I don't think that's necessarily a one-to-one correlation to the anti-depressant effect.
I have a lot of first-hand experience with ketamine and I would say it's not clear to me either. I've also had a number of "bad trips" on various substances, including a few on ketamine, and I wouldn't frame them as "hard but meaningful" though I can see how some people might do that. I have "hard but meaningful" feelings about other life experiences certainly.
A long hike up a mountain might be a hard but meaningful experience. The handful of bad psychedelic experiences I've had in my life just go down as hard and bad.
OMG, my worst ketamine experience was getting caught in a sort of loop. I was sitting on a couch, feeling very disoriented, and having the feeling that I had just been thinking about something important and needed to recapture it. So I would strain and strain with my weak confused mind, and finally I would remember what I had been thinking about: I had been thinking that a little while ago I had had an important thought, and it was crucial to get back to what it was . . . This loop seemed to go on for hours, though in reality it was probably about 15 mins. It was awful -- sort of like being so drunk you have the spins, except this was mental spins not physical ones. The person who was with me kept asking what was going on, and all I could say was "It's a remembering." Oh that was so bad.
I'll add a note from a slightly different angle. Ketamine is also used to good effect with some people who have PTSD. How this helps people seems to vary. For instance, I know someone who on low dose ketamine -- mostly doses below that providing a kind of transporting trip -- over months just gradually became less hyper-vigilant and less easily startled. This was a long-standing pattern despite years of therapy and the only thing that changed was the person taking low-dose ketamine. We can't really attribute the outcome to the subjective trip experience because the trip experience was almost non-existent.
Another person I know who had more inter-personal trauma would get relationally triggered, meaning interactions in his primary relationship could set him off and play out across days. If this person took ketamine after being triggered, he noticed that the ketamine un-hooked him from the triggered state as soon as the drug kicked in, before it had peaked and before there had been time for any subjective "trip" to happen. Just the experience of the drug kicking in turned this person's trigger off and it stayed off after coming down (until they got triggered again some other time, though getting triggered in general decreased over months).
Both of those examples suggest to me that not 100% of the psychological benefit of ketamine is due to the subjective trip experience or to insights gleaned while on the trip.
In a meta-analysis I found just now, the researchers compared the 15 best studies (out of 700+) on the effects of ketamine administered during surgery on post-surgical depression. Some subjects had had general anesthesia, and so were unconscious when the ketamine was administered, and some had had a spinal, so were awake to experience whatever state of mind ketamine induced. Both groups benefitted equally. This result came not from a single study, but from combining the results of all the studies within the group of 15 that were relevant to the question. So this result very much weighs in the direction of the subjective experience of ketamine intoxication being irrelevant to its benefits. In another post I give a link to the meta-analysis and more details.
My own unfounded theory is that the subjective experience can be meaningful and valuable and therapeutic in various ways to some people and therefore may significantly enhance the therapeutic benefit some unknown amount, but that the meaningfulness of the experience does not translate directly into the measured antidepressant effect we see.
I think there's enough we don't understand about the mind and that depression assessments are pretty crude tools so that I'm reluctant to say the subjective experience is irrelevant to the antidepressant effect. I suspect it is significant at the individual level for some subset of people but that the importance of that effect is lost in the group results.
I can wrap my head around the idea that some of the value is in physiological stuff working in the background. I can't wrap my head around a study -- with a small cohort no less -- that removes the entire experience of the drug.
This was also my immediate reaction, and it sounds like the sort of issue listed under "Third" that Scott says he is skeptical about for reasons he will discuss later — but then the discussion didn't seem to come. Perhaps he meant "later, in a separate post"?
Ah, I see how that fits together now. In that case, it's interesting that people like Evan Sp are so convinced that the conscious experience is such a central part of the effect. Are there studies or anecdotal evidence suggesting that higher doses (sufficient to cause dissociation) might be more effective, e.g. in patients who fail to respond to a lower dose? Or that for intermediate doses, patients who experience dissociation are more likely to experience subsequent relief from depression?
Hasn't there been a bunch of research on similarly powerful effects of LSD and psilocybin under the right circumstances? i.e., different chemicals, similar conscious inducement of connectedness, our own smallness in the universe, etc.
This is absurdly distant from any of my areas of expertise, but a cursory literature search didn't turn up any attempts to test therapeutic effects of LSD or psilocybin under anaesthetic. (I could easily have missed something.)
The closest thing that has been examined somewhat extensively is probably microdosing, which is a subject of ongoing research — recent review here: https://pubmed.ncbi.nlm.nih.gov/36381758/
I take it you're not a psychiatrist? Neither am I, but I can see why Scott's instincts might diverge from those of a non-psychiatrist when considering whether conscious experiences (of dissociation, hallucination, etc.) should be necessary to produce therapeutic effects.
There is strong evidence that conscious beliefs about the causes of our mental states are a retrospectively constructed fiction. If we have a powerful conscious experience of dissociation or hallucination and subsequently change our overall outlook on life (e.g. relief from depression), we tend to assume that the experience was responsible for the change in outlook.
But especially in the case of drug-induced experiences, this assumption is liable to be mistakenly assuming causation where there is really only correlation. The drug causes a whole bunch of neurological effects across the brain, and the neurological effects causing the conscious experience might be different from the neurological effects causing the change in outlook. By changing the conditions under which the drug is administered (e.g. lower dose, anaesthetic), it might be possible to get the change in outlook without the conscious experience.
(These rambling thoughts partly inspired by Scott's old post, as well as some of the discussion in the comments:
Yes, it's not a classical hallucinogen but it has hallucinogenic effects, including visual hallucinations, euphoria, time distortion, especially at higher but sub-anesthetic doses (k-hole).
Placebo controls for drugs people can feel are always fun.
The classic for psychedelics is niacin, which gives you a flushed face at high doses, but people nearly always know they haven't been given magic mushrooms or whatever, which makes it less useful as a placebo (from Aday et al.: “divinity school students were assigned to receive psilocybin or niacin, a B vitamin with mild physiological effects, in a group setting at a chapel (Pahnke 1963). Despite some initial confusion because of niacin’s fast-acting effects on vasodilation and general relaxation, before long, it became clear which participants had been assigned to which condition, as those in the psilocybin group had intense subjective reactions and often spiritual experiences, whereas the niacin group “twiddled their thumbs” while watching on”).
On the other hand lower doses of alcohol can be very convincingly placebo-ed by giving orange juice but stuffing vodka soaked cotton buds up everyone’s nose, or secretly dipping the rim of a glass of cranberry juice in vodka (Bowdring 2018).
It is not just propofol and midazz, benzos in general are fairly effective against depression, but are off-patent and so nobody cares. Since benzos are generally administered before anaesthesia/surgery to calm down the patient and to reduce anaesthetic consumption, I would highly suspect this contributes to the large effect in the control group. This likely also renders the midazz controlled studies useless.
I have no personal stock or interest in this debate (not even professional interest, tbh), but it would seem to me that the common link between a lot of mental disorders (i.e. anxiety, depression, OCD) is a sort of runaway of the sympathetic nervous system. This simultaneously explains the overlaps in these syndromes, the overlaps in therapeutic responses to SSRIs observed for a variety of similar disorders, and their kinda mid efficacy against any of them (they simply aren't a very good sympatholytic). It also ties in nicely with the original reason SSRIs were developed (hypertension, also partially thought to be due to increased sympathetic drive). Begs the question: what is causing this?
Right! I had surgery not so long ago and they pumped me full of benzos and fentanyl right before the surgery in addition to whatever else I got on the table. I didn't take opioids afterwards, but they were prescribed to me. Also, everyone is really really nice to you and they call and follow-up and want to see you in a few weeks to see how you're healing and people make you tea and bring you soup (if you're lucky). And you feel really relieved that you don't have to go back in and do it again next week. And managing physical pain -- even with ice or Tylenol, every few hours, and having to treat yourself more gently, etc -- focuses the mind for a while, as does watching the pain slowly get better and the surgical wound heal.
Yup! You would need to be completely closed off to the experience or testimony of hallucinogen users, depressed and otherwise, to think this makes any sense at all.
(Don't mean to sound like a booster or that it's for everyone...it's genuinely not. But, like, it's an experience with an effect, not a medicine that works while you sleep.)
I agree. I think Memory Reconsolidation is the foundation of most if not all therapeutic effect and for that to work you have to 1) Activate the target schema 2) Activate a contradictory feeling. You have to be conscious to do either of those.
But this isn't how we think SSRIs have therapeutic effect for instance? Why would we think that ketamine works like short-term psychodynamic psychotherapy, etc?
It might indirectly. SSRIs could conceivably enhance reconsolidation by reducing overwhelming autonomic nervous system states, for example. I'm speculating.
Well... this study somewhat supports the theory that antidepressant effect of ketamine is entirely psychological. I would be interested in seeing the same study done with SSRIs instead of ketamine.
entirely psychological vs what? (are you differentiating psychological from psychiatric? are these really separable? I'm not 100% structuralist, if anything maybe I'd say there is a spiritual element. But Belief/Thought and neurochemical states are bidirectional)
> And my patients’ experience is that it works even at low doses that produce no dissociative or ego death effect. I usually prescribe it at about 70 mg intranasal. Some of my patients report feeling a little drunk or giddy on this amount, but nothing like the k-hole that people report at the really high levels. Other patients report nothing at all, but still feel better.
So there’s good reason to try to isolate it from the experience and test the effect
Sorry, people report feeling no effects from 70mg intranasal? That's... surprising. Is there some kind of genetic variant that makes you immune to dissociatives?
I think there's another confounding factor here: people are likely to have higher levels of stress due to an upcoming surgery. The authors refer to these as 'minor' surgeries, but a surgical procedure is only routine to the surgeon. The paper says the length of the surgeries was 4 hours (+/- 2 hr SD). That's basically an all-day procedure for a patient. Not an in-and-out kind of thing, even if some of these were outpatient (though I didn't see an inpatient/outpatient breakdown in the paper).
They measured these patients up to 5 days pre-op, meaning the patients knew of their upcoming surgery at least a week in advance (otherwise they'd never have screened into the study) and had all that time to fret about complications and the like. Meanwhile, post-op they were told whether there were complications (looks like only 1 patient had complications) so the uncertainty about long-term bad outcomes would have resolved at that point.
TL;DR: Because of the stress of upcoming surgery, there's clearly still lots of room for regression to the mean in this study design.
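A toy simulation of that mechanism (all numbers invented, and no treatment is applied to anyone): screen people in when a noisy measurement is high, and their next measurement falls on its own.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
trait = rng.normal(20, 5, n)           # each person's stable depression level
screen = trait + rng.normal(0, 5, n)   # noisy pre-op measurement (stress, bad day, ...)

enrolled = screen > 28                 # study only admits those scoring high at screening
followup = trait[enrolled] + rng.normal(0, 5, enrolled.sum())  # later measurement, untreated

# Scores drop by several points at follow-up with no intervention at all, because
# enrollment selected for transient upward noise as well as for the stable trait.
print(screen[enrolled].mean(), followup.mean())
```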
I look forward to reading this researcher's follow-up paper, "Randomized trial of Cognitive Behavioral Therapy masked by surgical anesthesia in patients with depression".
There's also the Dead Twin Study: Living twin and dead twin have the same genes, but very different environments . . . Note the huge differences in behavior, all clearly environmentally determined.
As an anaesthetist, my first impression was that there would be simply too much interference from the anaesthetic itself. Not just from the lack of consciousness that will clearly interfere with any of the putative benefits around changed appreciation of conscious perception. Rather, that the brain state is so different under anaesthesia that it is surely challenging to infer anything relating to non-anaesthetised life based on this. That seems especially true when the feature in question is something as complex and organised as this.
Do doctors prescribe antidepressants at sub-therapeutic doses to see if the placebo effect is enough to help the patient before increasing the dose (to levels where side effects become more of an issue)?
I don't exactly know the story behind this, but my mom has been taking ~35 mg bupropion per day for the last year (half of a 75mg tablet). Standard therapeutic doses for bupropion are more like 150-450 mg/day. She reports very positive results.
(While this is pretty nonstandard, it's more common for doctors to start with a small dose of an antidepressant and taper up. Depending on the medication, lower doses can be basically an active placebo.)
Also if a person is a poor metabolizer of certain classes of drugs because of liver enzyme genetics, much smaller doses may genuinely be as effective for them as standard dose is for everyone else.
The primary mechanism by which grapefruit interacts with medications is by inhibiting CYP3A4 - you can check which drugs have that as a major metabolic pathway. (Apparently there is also a _different_ mechanism by which many citrus fruits, including but not limited to grapefruit, have an additional drug interaction. I don't understand that one.)
Years ago, I knew a psychiatrist in SF who prescribed SSRIs for some of his patients at nearly homeopathic doses and claimed to have a lot of success with that. Though as I say that now I realize he must have had a compounding pharmacy do it since tiny dosages aren't otherwise available.
Not entirely related: I had trouble getting off of the lowest dose of gabapentin and my doc had a compounding pharmacy make me a liquid solution and I was able to taper down so gradually that I didn't have any side effects whereas I'd tried many times before and the withdrawal symptoms were terrible.
I have a patient who swears that 2 mg of Lexapro solved all their problems. As far as I can tell, they're right. No good explanation. My guess is that all drugs have so many effects that sometimes some effect other than the usual one we consider the mechanism of action is what helps, and that effect works at a different dose. But it could also just be a very weird metabolism.
The doctor will be unable to tell whether they're seeing a placebo effect or, alternatively, giving the drug to a person who is just extremely responsive to the drug and only needs a low dose.
That's what most psychiatrists do. For almost all psych meds you start with a low dose and raise it slowly. If the person says, while still at a low dose, that they feel much better, doc stops increasing the dose. If person starts declining, they recommend increasing the dose. Even if the improvement does not diminish, they might after a while recommend trying the next higher dose to see if there is more benefit to be gained.
I guess I was thinking more along what Radar was saying. Starting at say a quarter of what is now considered a starting dose where there is little chance of side effects.
I've wondered about this question myself, and even asked an anesthesiologist about it before he was about to put me under (ketamine/fentanyl/midazolam). I got this mix twice actually, the first time the ketamine was clearly withdrawn later than the other two drugs, and that was a crazy experience when coming back to consciousness - the second time was just waking normally but drowsy. In both cases though my immediate post-surgical state was dominated by pain and dealing with the surgery and opiate haze and I detected no discernible effect on mood.
One theory is that the effect requires neural annealing-type processes, and the presence of a heavy benzo dose basically disrupts any beneficial memory reconsolidation. I could believe this given my experience.
The S-ketamine vs R-ketamine debate is also relevant here. Some folks think R-ketamine helps just as much or more despite not having the psychoactive effects, due to non-NMDA mediated effects. That shouldn't be too disrupted by anesthesia though? Or the most potent mixture is actually racemic ketamine which gives you *both* a positive mood push *and* the psychoactive effects and everything resulting from those.
> Since this happened in both the ketamine and placebo groups, the obvious guess is “placebo effect”.
Unless I'm misunderstanding the experimental design, I'm confused why your obvious guess is "placebo effect" instead of the effect of surgery plus anaesthetic plus post-surgical painkillers.
Was there a third experimental arm where they administered the MADRS with neither ketamine nor a placebo to check whether surgery itself makes you more depressed?
Scott, I'd love to see you address this. Is there a reason to think that people who aren't in a ketamine study at all don't show similar effects post surgery?
Upon second reading, it turns out I misread the sign, and surgery actually makes people _less_ depressed overall, which is a surprise to me. I've had surgery twice and it wasn't fun, but then again I wasn't depressed.
This might be one of those counterintuitive things about depression which I remember from an earlier Scott-post about the failure of Covid lockdown to increase suicide rates as expected. Depression requires you to feel like things are bad _and_ they'll never improve, so a short term shock like surgery, which makes you feel terrible in ways that you're confident will go away within weeks, might relieve depression a bit.
Then again it might just be the effect of a massive cocktail of anaesthetics and subsequent painkillers.
In an interview, Dr. John Krystal, the Yale professor who pioneered ketamine as a treatment for depression, is very specific that dosage is important for getting antidepressant effects from ketamine.
He specifically says that if the dose is too high or too low, you don't get the antidepressant effects.
I'll spare you the long quote about it, but if you go to the transcript here https://tim.blog/2022/10/03/dr-john-krystal-ketamine-transcript/ it starts at about here "And in 1997, she published a paper that showed that ketamine released glutamate in the brain..."
Is there any evidence that depression is something other than persistent defense cascade activation combined with various maladaptive schemas?
Ketamine never worked for me. Then MDMA-therapy was the magic solution (though it took quite a few sessions). My research on the topic leads me to see most mental illness as persistent autonomic nervous system defense cascade (fight or flight, dissociation, etc.) activation mixed with various maladaptive schemas (integrated memory structures containing emotional reactions, episodic memory, and semantic memory). The ANS activation seems to be primarily a response to fear or bodily damage. So I think the primary treatment is healing the underlying fear. And it seems somewhat well established that the primary mechanism of durable treatment of maladaptive schemas (and the fear underlying ANS activation) is memory reconsolidation.
>Is there any evidence that depression is something other than persistent defense cascade activation combined with various maladaptive schemas?
Yes. In some depressions it is pretty clear that there is something biological going on. The person moves and talks slowly and cannot speed up. They become so constipated that they need to be disimpacted by a nurse. They can't stay awake and can sleep for 20 hrs at a time. They can't stand to eat, or can't stop eating.
At my worst, I was sitting on my couch and noticed a fly land on my arm and crawl around, and I made a mental note to myself "I should wash that later", but was unable to move even enough to shake off the fly.
I'd had "chronic depression" on and off for most of my life, which generally responded well to low doses of fluoxetine, but that was nothing next to this thing. I don't even think they should be called by the same name.
Truly terrible. I can relate to the feeling, though I never quite reached that degree of severity. It can be hard for people who've never been depressed to empathize with being in the position of not only needing a path to recovery, but also needing the steps on that path to be compatible with the tightly limited amount of motivated behavior a person with severe depression can muster daily.
If you can forgive me for appending a bit of dry science to your personal story (hey, it's what I do), this fascinating paper ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1572268/ ) shows that a depression-linked serotonin receptor (that is, more receptor <--> more depression) blocks the activity of NMDA signals in the neocortex. These signals, which would eventually travel down the spinal cord and translate into physical activity, now require greater effort / greater sympathetic activation / greater *stress* to send out.
Is this why anxiety and depression so often occur together? If greater stress is necessary to activate a typical amount of physical activity, a depressed person’s body may adapt over time by activating the ANS more frequently in order to compensate.
I can't answer your question, but this does make intuitive sense to me. I feel like it must be the case at least some of the time.
Here is an anecdote for you:
Some decade and a half ago, I was in an abusive relationship that I badly wanted to leave. I had tried many times (I lost count of how many) to break things off over the previous five years, but I was in a state of depression, and I just did not have the mental strength to resist my then-boyfriend's persistent pleas, demands, and threats to take his own life. I always caved because it was easier in that moment, and I simply could not muster my motivation any further.
When the relationship started to get genuinely scary, I suddenly felt the mental fog lift a little. I was still horribly depressed, mind you, but it was enough that I could formulate something of a plan. And I realized that a major cornerstone of that plan needed to be the cultivation of my mind into a more heightened state. There were several things I tried, but the most effective was this: I started writing down everything about my then-boyfriend that made me angry, and everything that made me scared, and I reviewed that list daily. Whenever I thought of a new thing, I added it to the list and really pushed myself to dwell on it.
By the time I was ready to put my plan into action, I was seething and terrified, and that carried me through -- even though the process of leaving turned out to be far more stressful and complicated than I had expected, and ultimately required me to completely upturn my life (including leaving the country).
I look back in wonder that I got through it; even though I am no longer depressed, and indeed absolutely love life now and enjoy a high degree of motivation in my daily life, I don't know that I'd have the mental energy to execute a plan like that again without rage or terror to propel me.
I undoubtedly gave myself some lifelong trauma by doing this but, hey, I'm alive. It's my battle scar.
Sorry, this seems like an exceptionally silly study. I don't think it proves anything other than asking people questions about how many times they've had disturbed sleep, appetite or various kinds of thoughts immediately after major surgery is really silly (as you say but I'd go further and say it doesn't even conflict with claims about placebo).
Why didn't they administer a follow up 2 weeks later? That's an awful lot of trouble and expense to go to not to bother.
They did? 14 days = 2 weeks.
But he said they administered MADRS on post surgical day 1.
Was there a follow-up 2 weeks after that we can compare or do you just mean the whole protocol was 2 weeks (I meant wait 2 weeks after surgery).
It was administered again on days 2, 3, 5, 7, and 14 according to the graph.
Ok that's much more convincing then and makes all the objections about administering MADRS day 1 a bit irrelevant (why not administer it you have them there... though it would be good to remark about how they are likely a poor indicator at that time).
This may be my ignorance, but patients take ketamine on a bunch of different regimen, right? Some folks have intermittent treatments at a clinic, some folks have prescriptions they take daily, etc.
If Scott is referring to those daily methods those objections would still stand, right? Though I think it is likely irrelevant for evaluating the initial single doses if you don’t consider the reflection allowed by the ego death effect as a cause
Even 14 days out from surgery most people are still in the experience of recovering from surgery and so it seems to me psychological effects from surgery may swamp any potential effect.
For most people surgery is pretty scary, there's anticipatory worry, there's rallying resources (internal and external), there may be more support from friends and family, and there's the agency aspect Scott mentions of solving some kind of medical problem that was likely hanging over the person. There's also a lot of purposeful care and self-care that happens in the couple of weeks after surgery that I can imagine would have positive effects on mood. And of course, people are relieved to have the surgery over with, to have made it through.
Not to mention, that some people may still be experiencing mood benefits from the anesthesia and post-op pain meds for most of those 14 days out.
I'm with Scott that this study is unlikely to show anything helpful, but that lower-dose studies with conscious people over some weeks of treatment and follow up over some months is going to be way more meaningful.
I agree, but 14 days is still much better than 1 day in that regard, even if it is overall still unconvincing.
...so 6 times post-surgery during 14 days apparently. It must be the same questionnaire each time in order to get a time series, so I assume that is what the researchers used. Then you get the following problem: Most people will remember what they answered last time. And many of them, in order not to appear as forgetful or devoid of clear opinions about themselves (how they appear to others, even total strangers, matters to people), may answer similarly to last time(s) - for such reasons alone. Which can help explain why the curve is almost flat for both groups between day 1 and day 14. This would be an argument for waiting at least a week before giving people the questionnaire first time.
Hey, the people who develop and validate questionnaires are not dum dums. Part of validating a questionnaire is to check for problems like that. If you are trying to measure something that fluctuates, you want a test that captures the fluctuation. If scores stay stable then either you are measuring a trait, something that stays stable, or else your test somehow traps subjects into responding the way the remember responding, rather than spontaneously and honestly.
Yeah well....I have done my fair share of questionnaire reseach, including construcing quite a few, and without wanting to come acoss as a cynic, Bismarck's old dictum (slightly extended) is not way off the mark: "Those who like laws, sausages and questionnaires should not look while they are being made - or validated." Granted, psychologists are better at validating procedures than the other social sciences (they better be - "doing tests" is the main corner that psychologists occupy in the social sciences). But the question Scott presents: "you ask patients questions like “how many times have you had thoughts about guilt in the past few weeks?”" is quite close to the type of question I am sceptical about. There is this hard-to-shake idea that statements/opinions/reports (about yourself or whatever else) can somehow be "lifted up from inside you", like measures of blood pressure. But statements/ opinions/reports about self are not like that. They are low cost/low benefit behaviors on display. The cost is only something like the energy to lift a finger over a keyboard and press Y or N; the benefit is the satisfaction of having self-expressed yourself to others (which also includes the risk that others may interpret the sign you send as indicating that you are low prestige/someone to be pitied/wishy-washy in your self-expressions/something else your dislike being perceived as, or dislike perceiving yourself as). I could go on, but it would be too long...If I could get the ethics committees off my back, I would much prefer to investigate depression and everything else by studying costly behavior: tag people and their phones during a time interval and check when they get up in the morning, what they eat (if they eat) during the day, if they leave their room, if they talk to others, if they are away from work, if they drink and how much, and so on and so on. In short: Don't listen to what people say (unless what you want to study is only what people answer in response to different types of prompts/frames) - look at what people do.
The effects of a single dose of ketamine wear off after a couple weeks -- at least that's what I recall from early articles about it: Someone, often at an ER, would give it to an extremely depressed, actively suicidal person. They would feel much better in hours, and were no longer suicidal. But the effect would wear off in a week or 2.
This is also what saved my life — temporary relief, extended indefinitely — except with opioids instead of ketamine.
I would have sworn as a kid, and now swear as an adult, that I'd been happy maybe a double-digit number of days in my entire life — many of those days, oddly, when I was very ill...
...and then I took some opioid analgesics purposefully and while being old enough to put the pieces together. It was as if... like... whatever, I'll be unoriginal: as if I'd lived in a soundless world of black-and-white, and suddenly could see color and hear music.
As much trouble as they've caused me, well... they earned it, so to speak: I'm absolutely certain I'd have offed myself without them. (I still feel the black despair creeping in if I try to quit, to this day; I'm on bupe, which no longer gives me much euphoria but seems to do the trick regardless.)
To paraphrase Fitzgerald "Omar" Khayyam:
Though Morphine has much play'd the Infidel,
And robb'd me of my Robe of Honour -- well,
I often wonder what Big Pharma buys
One half so precious as the Goods they sell.
---------------------------------------
The unconscious association between hospitals and happiness I'd developed HAS made me wonder if this — or similar; e.g., post-op painkiller regimens — could explain some of the effect in this study, though.
Such recognition when I read your comment! I have had the same experience. There are some very interesting studies on the use of esmethadone (the S- or dextro-isomer of methadone) for depression. It is much less active as an opioid agonist but it affects the NMDA receptor much like ketamine does. I have been on (standard racemic) methadone for opioid dependence for some time, after I found in my youth that opioids were an effective ward against the mind-withering despair that has pursued me throughout my life. I was pushed by the clinic I attend to taper down and consequently suffered a relapse of both depression and opioid use, in that order. Back on methadone at a therapeutic dose I am myself again, but I can’t help but wonder if it had been treating my depression all along. More research on this is surely needed. I have heard this experience echoed by friends on buprenorphine as well. Thought you might find this article interesting.
https://mghpsychnews.org/esmethadone-a-novel-rapidly-effective-treatment-for-depression/#:~:text=Esmethadone%20(REL%2D1017%2C%20Relmada,opioid%20effects%20of%20racemic%20methadone.
Oh, THAT's so familiar as well, heh — "you gotta get off entirely!" → "okay, I'll try..." → [depressed again] → [addicted again].
Very intriguing link, thank you! Will read in detail as soon as I find my damn phone charger... don't judge me, it was a long night aight!
I have long been sort of skeptical than NMDA-anything can be related to depression — I have been unimpressed by personal experience (also in attempting to use NMDA antagonists for tolerance reduction; I dunno, maybe it's just subtle...) and what I've read so far — but it also doesn't seem IMplausible, either. If there were ever a silver bullet, hitting mu-opioid and NMDA both is surely it...
(...I did truly love methadone, too, although I don't know if the regular stuff is the racemate or the other enantiomer, heh.)
P.S. "Laurencat"? What a cute name. Cats are the other reason I haven't taken The Big Sleep yet. My kitty is right next to me right now!
(...as always. God bless her. oh, kitty, I want you to live forever. now I'm gonna cry don't look at me—)
I have always thought cats are the morphine of the animal kingdom. Warm, comfortable, a little dangerous, and habit forming. I have one curled up with me now!
Your commentary on opiates as a long-term antidepressant resonated with me and mirrored my personal experience as well. I experimented with a lot of drugs in my late teens and early 20s that probably did a number on my brain, but the one drug I could never bring myself to quit was opiates. I was always able to maintain a very low dose of whatever opiate I was taking, and I have been doing that and living a productive life for 20 years. It takes many years for me to build a tolerance, and when it does, I usually switch drugs, and I can start at a low dose again of a new opiate. I started with the bad boy OxyContin, then tramadol, then methadone, and now I am back to oxycodone. I always stayed at low doses, and once it got to more than 1 pill a day I knew that I needed a switch. This system has worked better for me than any other system I have had. I don't smoke or drink, and I live a pretty happy, productive life and raise a family. If one day I wasn't able to get these prescribed I would sink into a permanent mild depressive state that would last indefinitely; after a few months it just becomes too much. I also lie to my doctors about the reason I need them, because no one will prescribe them off label.
Yeah, it is rare to find a doctor that one can truly be honest with — Scott's essay defending "pseudo-addiction" blew my mind; I had the same thoughts before when my then-psychiatrist was "concerned" with how worried I was about getting my buprenorphine refilled, lol... — and so far I've met exactly 1 doctor who was open to the idea of prescribing opioids off-label for depression (and even then, I wasn't asking him to do so; we were just chatting; I bet if I'd actually requested it, he'd have said no way... not that I can blame him, heh, the way the discourse is going these days).
It's interesting you mention the slow build-up of tolerance and don't mention any other side effects — I also take a long, long time to attain any sort of significant tolerance, and have literally never once experienced any side effect such as constipation, difficult urination, nausea, lack of appetite, etc. It's another reason that made me think I was made for 'em, lol!
Bupe is easy to get if you've been on other opioids for a long time, FYI, as a possible emergency route. It's not a full agonist, but it has kept me well and happy enough to enjoy life, no question.
The sample size was only 20 per group; I think this is too small to say anything for sure. I've written my own post here: https://open.substack.com/pub/rationalpsychiatry/p/the-powerful-and-the-damned?r=g83wq&utm_campaign=post&utm_medium=web
Agreed. Particularly with all the potentially confounding psychological experiences of the surgery itself, anesthesia, pain meds, post-surgical relief, and changing care routines that happen as part of a surgical experience.
Agree.
I just read your whole post, that was really well done!
In a totally unmathematical way, the study seemed way too small. You raise an important ethical issue too, which is the cost to research subjects and to the field being studied if you run a study that's underpowered.
I also wondered whether the potential swamping effects of the whole physical and psychological experience of surgery constitute confounding variables that need to be addressed.
It seems likely that the combination of the study being underpowered and the many aspects of a surgical experience that so potently affect people's moods means this study can't tell us anything useful.
There is a massive cottage industry of scientists who write papers along the lines of "my underpowered study didn't find an effect, therefore there is no effect". Is it true? Probably not. But it gets published, which is what is truly important here.
It may get published, but it won't get published someplace with high standards. It's astounding the shit that gets published. I once searched PubMed for the weirdest, stupidest pseudoscience I could think of, and I actually found published papers about each one -- shit like energy healing and astrology and their ability to cure this and that. The worst was the result of a search for "demonic possession." Yup, actually found a paper arguing that psychotic people do not suffer from an illness, they are possessed by demons and need an exorcism.
I am now imagining a study in which real exorcisms ( conducted by a bona fide priest) are compared to placebo exorcisms (conducted by an actor)...
"We've swapped the usual holy water with Folger's crystals, let's see if he notices."
Thank you, this was basically my first thought, but I thought I was going to have to actually do the power calculation myself.
Thanks, will take a look!
Without doing calculations, just looking at the error bars on the graphs, they seem pretty large. Questions:
1) What would be the effect you'd expect by Ketamine if it worked, in units on the MADRS scale?
2) What do the error bars represent? Sample variance? Variance on the mean? At 90%? 95%? 68%?
The error bars in this case are standard deviations. It’s difficult to say what change in MADRS we’d expect, these effect size questions are tricky - https://www.astralcodexten.com/p/all-medications-are-insignificant
> Sometimes researchers try to use an “active placebo” like midazolam - a drug that makes you feel weird and floaty
How do the ethics of "active placebos" work?
In a normal trial you either give a trial drug (which we have a fairly good reason to suspect might be helpful) or a placebo (which we are confident will have no effect at all). But giving an active drug with potential harms and side effects, without any kind of belief that it will cure what ails the patient, just for the purposes of a more realistic placebo, seems ethically fraught.
Also, if the trial group does better than the placebo group, can you be confident that it's not just due to undocumented deleterious side effects of the active placebo?
these are solid questions
Yes, they are ok *questions* but what they aren't is moral reasons. There is this very unfortunate tendency, especially in bioethics, to react to anything that's a decent question as if it's a reason to avoid doing the thing. A question isn't a persuasive argument.
I don't mean to jump on you, they are worthwhile questions, but the problem is that if you leave the dialectic here, everyone treats the fact that those are questions (and the fact they are a bit scared to answer them, because every coherent answer requires biting some bullets) as if they are reasons to shy away from performing those studies, and the net effect is that we end up causing more suffering in the name of moral concern.
Thanks for your reply and I suppose I can see how you could interpret my minor comment as something bigger or different from what I meant, which was simply to acknowledge the questions themselves (which I tend to see as a gateway for exploration rather than a reason to discredit or dismiss). Personally, I wouldn't think these questions would prompt a write off but, on the contrary, would urge more interest. That said, while the questions were thought provoking, I'm not sure I would frame it as an ethical dilemma, myself.
As an aside: This article felt timely as I just completed three weeks of IV ketamine treatments after having read studies, speaking to others, following my intuition, and doing some extensive training on supporting others through integration with psychedelic journeys. I'm personally a believer in the power for healing with this approach, and I'm curious about how it's viewed and studied by others. I both feel solid and restful in how it impacted me, and I'm open to hearing/reading things that may offer other insight.
I hope the studies continue, and I hope it reaches a stage where it isn't seen as "experimental" and remains out of reach of so many who, I believe, could find some tremendous healing. And if, in the end, there's some indication of placebo effects, I'm still here and happy to feel better. As with any topic, I hope questions don't become an excuse to "avoid doing the thing."
(Also, none of this is my wheelhouse. I have seen some reports of brain scans pre- and post-treatment and the effects on synaptic connections which I tend to think builds more of a complete picture than relying on questionnaires.)
To be clear, I didn't interpret your comment that way. I tried (unsuccessfully, apparently) to indicate that when I remarked about how it left the dialectic. Your comment is perfectly reasonable and correct.
However, the problem is that you say that and then there is a tendency of people to treat the fact that there are good questions here as a reason to treat such studies as if they have a partial black mark against them (well there are unresolved ethical questions).
The underlying problem isn't your question, it's the fact that people have an asymmetry in how they treat doing a study versus not doing it. In reality there are good questions about whether it's ethical not to conduct such studies as well, but because that's not doing something, to most people's minds, they don't treat it as also having a cloud hanging over it.
So I don't mean to criticize you (and sorry if it came off that way), but rather a kind of biased response that many people have to the perfectly correct point you made.
Totally get it and I'm glad we cleared it up - haha. A lot of nuance lost in writing when we all read through our own lenses and inflection, hey?
Just as an additional remark, what I think is deeply wrong with the assumptions made in much of bioethics is that it implicitly slights the very kind of concerns and interests like the ones you mention here. You don't just selfishly care about yourself, you care about finding out more about this kind of therapy in the hope that it will benefit others.
Yet the default assumption is to disregard those interests of yours as if they didn't matter or couldn't count as genuine benefits to you. That's not only wrong but, I think, a deeply disrespectful attitude to take toward patients. It's also inconsistent, as we are willing to give weight to totally irrational desires of patients to refuse treatment, such as being a Christian Scientist or thinking vaccines contain mind control.
I think this systematic tendency is the result of two unfortunate incentives in bioethics: the incentive to claim that purely pragmatic concerns are moral principles, and the incentive to cater to the medical establishment and avoid criticism, rather than to serve the public at large.
The issue with 1 is simply the incentives of academic philosophy. New arguments get published, and simply saying that there are risks researchers will overstate the benefits of their research in convincing patients to take part, or just saying we need to balance consequentialist costs/benefits, doesn't result in many papers, but coming up with a new argument for why this thing is really a moral principle does.
The issue with 2 is more obvious if a bit more counterintuitive. You might think the incentive would be to tell researchers they can do whatever they want, but doing that risks blowback on the institution. Moreover, it's not what taking patient interests seriously would require. What that would actually require is that researchers themselves made judgments about the value of their research and presented those honestly to the patients, rather than throwing forms at them that the IRB demanded while avoiding any hard moral calls themselves.
No, this whole idea that you can never administer anything with the slightest chance of harm to a patient is just total hogwash and never made sense. The patient fucking undertakes a serious risk of death every time they have to drive into the treatment center for an evaluation or followup, but apparently we don't count that.
What matters is that the patient understands the risks and wishes to undertake them. Frankly, I think it's downright near unethical to refuse to consider the patient's own desire to accept a certain risk to help understand a condition -- that's refusing to respect the patient's own beliefs and preferences and substituting the concerns of the person administering the test. That's not ethics, it's selfishness.
This all makes sense, but from reading the ACX posts on IRBs, aren't all the imperfections of the researchers swamped by the IRB the way it is currently defined and practiced?
Yes, it's the people who run and support the IRB system who bear the moral blame here. I don't think I focused on the researcher (it's the people who have a choice about this which are the IRB and those who entrench that system).
This is why it's so important for researchers to stay up to date on the latest social justice fashion in order to override the IRB concerns by framing any regulatory restriction as an attack on the currently most favored marginalized groups or the defense of existing structural exclusion.
Researchers who put their personal comfort over working the system as it's intended to work now also bear some blame, since it's not as if the IRB system has a strong independent base of support and is immune to political pressures.
"I feel personally uncomfortable accusing individual reviewers of being transphobic white supremacist rape apologists to get my paper through review" may be a laudable personal ethic just like "I feel killing children is wrong", but if your job is to do scientific studies or fly predator drones, that ethic directly conflicts with your career development and job duties in a US context.
I thought this was pretty well settled in bioethics and that active placebos are used pretty often?
The harms of midazolam are very low. It's a common anti-anxiety medication. Millions of people take it per year and the harms are well-known.
The wikipedia page for Midazolam says, "Benzodiazepines can cause or worsen depression." Ouch!
This is mostly talking about chronic use. Taking midazolam once isn't going to make you depressed.
I sort of doubt that's even true, to be honest. Maybe the picture has changed, but when I looked into it some years ago I couldn't find any solid reason to believe they *actually* worsen depression at all — just the usual flawed and underpowered studies, using weird metrics (and at least one of which actually found diazepam, IIRC, had the opposite effect!).
I'd give it, I dunno, 7:3 artifact:real.
I don't think Scott's presentation of the nature and purpose of placebo controls is exactly right, and would point to this article for a more in-depth discussion. https://doi.org/10.1016/S0895-4356(01)00496-6
When a trial is performed, the role of the placebo-control is to help separate signal from noise. They are designed so that comparisons between the groups tell you about the signal (the effect of your intervention compared to placebo), and the rest is noise. Often times, all of the noise gets called "the placebo effect", but that's misleading. The placebo effect would be the difference between getting the infusion of saline (the placebo), and getting nothing - which was not assessed in this study.
Looking at the difference within a single group, such as the pre- to post- comparison of patients who got placebo, is just a cohort study following those patients while they undergo surgery. It would be silly to attribute a reduction in depression scores for people before surgery and after surgery to the placebo - it's the result of the things that happened in the interim (getting through a surgery, possibly related to receiving the anesthetics as mentioned, though I'm skeptical of that). It has nothing to do with placebos or even regression to the mean.
Easy fix! We can just call it (meaning "all the interim stuff plus a non-ket anesthetic") an "active placebo"!
All that ‘interim stuff’ is happening in the intervention arm too, so it doesn’t make sense to attribute it to an active placebo.
Also, did they give patients opioids after surgery?
Yep, I heard opioids can make you feel fine for a while. Experienced it, in fact. Very effective antidepressants if it wasn't for the obstipation and some other problem.
They've worked great for me for about fifteen years!
I feel like maybe I was born with an endogenous opioid issue. I get 0 side effects — neither constipation, nor difficulty urinating, nor nausea, nor even significant tolerance — regardless of how much opioid I'm on; and EVERYTHING in my life got better, even the stuff opioids normally make *worse* for people.
(e.g.: more appetite, no more insomnia, no more cramps and diarrhea, better mood, more sociable, more diligent, less sensitivity to sunlight...)
I love them. Right before I discovered opioids, I told my parents I couldn't remember being happy. I'd been to psychiatrists, tried all the first- and second-line stuff, therapy... didn't help; but the instant that hydrocodone hit my bloodstream, I was cured.
Effect hasn't ever gone away, either. Did I get super addicted for a while? Well, yes—but I've been stable on the same low dose for years now. Thank God for the poppy.
There really seems to be a kind of healthy opioid use, but one has to be careful. I've met a few Persians who used to consume opium in their country and after emigrating ended up in a methadone program. They told me things like "It's like your beer here in Germany".
Typo thread: "Third" appears twice, should be "Fourth" the second time.
I was curious if this was the sort of thing ChatGPT would catch, it does https://chat.openai.com/share/1f626deb-280e-4264-9138-109b4ae1ddf7 but it raises six other things that aren't actually issues at all.
Tangential: Is anyone else a bit sceptical of the current focus on measures of "effect size" as currently defined? Lots of review articles just quote those, and I suspect that a lot of nonmathematical readers assume from the name that it means something like "how effective the treatment is", which ought to be scaled such that some number, e.g. 1.0, meant "completely cured". But actually it really seems to mean something more approaching "signal to noise ratio", so it's telling you how reliable the study is. Which is vital to know, but actually I'd like to know how effective the treatment is too. In theory it's useful as a comparator of treatment effectiveness, but I'm not convinced that people are paying sufficient attention to the conditions for that to be the case.
Related post: https://www.astralcodexten.com/p/attempts-to-put-statistics-in-context
Personally I don't understand how effect sizes are useful.
Aah another one I missed!
There is some interaction between protonmail/substack which means I don't seem to get a notification for a good fraction of Astral Codex Ten posts, even though I have all the emails when I go looking.
No, there is a number that quantifies the effect size in a way that really gives you a feel for how large and how important it is, and most studies use that number. The thing that you're calling the signal to noise ratio, that tells you how reliable the study is -- that's a different statistic. It's usually in the form of p < some number, say .01. That means that the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference, is less than one in 100. The statistic that gives you a picture of how important and substantial a difference is is called standard deviation.
So take caucasian men's heights. The average is 69". So say you get the height of all caucasian men and see how far each is from 69", then average all those numbers. That measure is called the standard deviation. It's the average distance people in the population are from the mean. For this population, it's 2.5". Someone who is 71.5" tall is one standard deviation above the mean. And it turns out that for most measures like this, whether it's measures of height or of depression scores, there's this regularity: 68% of the data falls within one standard deviation of the mean (half of that subgroup above the mean and half below), 95% within two standard deviations, and 99.7% within three standard deviations of the mean. So you can see how that measure gives you a feel for how big and important a change is. If there are 2 guys, and one is 1 standard deviation taller than the other, then he’s 2.5” taller. So that’s a fairly substantial difference — definitely not too subtle to notice. If somebody’s SAT score goes up one standard deviation, that means it rises by 217 points — so enough to make a difference in what colleges they are likely to get into.
The study Scott covered was not finding differences. But I found one meta-analysis of ketamine effects that combined the results of 15 very well done studies, and found that people getting ketamine during surgery had approximately one standard deviation less post-surgical depression than people who got placebo. You now have a sense of the magnitude of the difference. Other things that would give you a sense of how to judge the effect would be how the average depression score of the ketamine subjects compared to that of non-depressed members of the general population. Then you'll know whether they are actually feeling about as good as a non-depressed person, or just feeling better than they would have without ketamine. You can also go look at the actual depression test, and get a sense of how many I-feel-awful items somebody has to be endorsing to get a certain score. That will give you another yardstick.
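If it helps to see those rules of thumb computed rather than quoted, here is a minimal Python sketch, assuming normally distributed heights and the illustrative numbers above (mean 69", SD 2.5"; not real survey data):

```python
# Minimal sketch of the standard-deviation intuition above, assuming
# heights are normal with mean 69" and SD 2.5" (illustrative numbers only).
from scipy.stats import norm

mean, sd = 69.0, 2.5
for k in (1, 2, 3):
    within = norm.cdf(k) - norm.cdf(-k)    # fraction within k SDs of the mean
    print(f"within {k} SD: {within:.1%}")  # ~68.3%, ~95.4%, ~99.7%

z = (71.5 - mean) / sd                     # a 71.5" man is 1 SD above the mean
print(f"z = {z:.1f}; taller than {norm.cdf(z):.0%} of men")  # ~84%
```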
"It's usually in the form of p < some number, say .01. That means that the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference, is less than one in 100..."
Oh god. Oh *GOD*. I don't wish to sound like a Doomer, but the fact that a frequent reader and commenter here would make such a mistake can't help but make me feel like it's a sign that the entire Rationality project is doomed.
I should be fair I suppose, the posts where Scott talks about exactly this (how p-values are *lying snakes*) are old, almost a decade old, from way back when SlateStarCodex was still in its infancy: e.g. https://slatestarcodex.com/2013/12/17/statistical-literacy-among-doctors-now-lower-than-chance/ (STATISTICAL LITERACY AMONG DOCTORS NOW LOWER THAN CHANCE) & https://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/ (THE CONTROL GROUP IS OUT OF CONTROL).
Basically, if one thinks p-values are the chance of things just being a coincidence,
1. You get weird results like "A screening test for cancer with a p-value of 0.01 / 99% success rate on cancer-free people, actually has a success rate of only like 10%, in that only 10% of those who get flagged actually have cancer" (the classic Base Rate Fallacy/False Positive Paradox: https://en.wikipedia.org/wiki/Base_rate_fallacy#Example_1:_Disease)
2. You get weird results like a study with an amazing p-value of 1.2*10^(-10)... saying that psychic powers are real, and people can see the future if they vibe hard enough.
Those are very very important things to learn, the foundation of almost everything Scott has ever said about science and the Replication Crisis (e.g. his famous 5-HTTLPR article: https://slatestarcodex.com/2019/05/07/5-httlpr-a-pointed-review/). Habitual Scott readers should be some of the most familiar people on the planet with the idea "p-values are *lying snakes*, don't trust them" and "p-values don't mean what people think they do". *Not* knowing that after years of reading SlateStarCodex should be as impossible as a modern young Democratic Party staffer having no idea who Trump is -- in a way, we talk of nothing else but the adversary we're fighting against.
I just don't know what to say in response, I suppose, beyond "Thank you for teaching me something I didn't know, it shattered a lot of my beliefs in the same way mythical Ozymandias would be shattered if he saw that statue of himself from Shelley's famous poem, "Ozymandias". But as we like to say, 'That which can be destroyed by the truth, should be.' "
... damn, I know I sound like I'm overreacting, but like I said, this should be as impossible as a Bible reader having no idea who Satan is. The fact that it *is* (possible), is not an indictment of you, it's an indictment of me and my warped worldview... God, this must be how universities feel when they review the evidence that they don't actually teach critical thinking, just how to pass tests (e.g. https://www.insidehighered.com/views/2016/06/07/can-colleges-truly-teach-critical-thinking-skills-essay).
... I used to believe that the key to solving problems was being the change you wanted to see in the world. Once again, I have been burned. Damn it.
As taught in any introductory statistics class, the p-value is, in fact, more or less what Eremolalos claims it is by definition. Yes, we all know there are many problems with how the p-value is used in scientific publications, and how it is sometimes interpreted, but that does not change the definition of the p-value!
A more or less accurate explanation of the p-value would be that it answers the question 'supposing there's no real effect here, how likely were we to get a result this strong or stronger purely by chance?' I think the difference between this and the quoted definition is pretty important. If we test a plainly ridiculous hypothesis and get a positive result with p < .05 we should not conclude that the probability of a real effect is >95%.
I fail to see the difference between what you are saying and the original statement, i.e. I thought that was what it was saying in somewhat loose terms, at least by a charitable reading?
I've just replied below (under your reply to my reply to K. Liam Smith) and I think I'd be repeating myself here, so if you feel like carrying on the conversation let's switch to that subthread.
If p-values meant the straightforward, useful, understandable thing people think it does, then John Bohannon's "Chocolate = Weight Loss!" study would not have been possible (https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800)
But we live in the universe where it actually means a complex, hard to use, and unintuitive thing: "The probability of observing a result at least as extreme as the one observed if the null hypothesis is true." Rewritten to simple English, that's "*If* I'm wrong, this is the probability of seeing what I'm seeing right now anyways due to pure dumb luck."
It is not "The chance of being wrong." This misunderstanding was how Bohannon was able to take 20 completely false facts about chocolate ("Chocolate helps with blood pressure! / Chocolate helps with cholesterol! / Chocolate helps with sleep quality! / Chocolate helps with exercise! / ..."), throw all 20 at a p=0.05 test, find the one that randomly happened to pass ("Chocolate helps with weight loss!"), and let the newspapers publish it as "95% CHANCE CHOCOLATE HELPS WITH WEIGHT LOSS: SCIENCE".
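For anyone who wants to see that trick run, here is a small simulation with invented numbers (20 independent outcomes, 15 subjects per group; the same logic as the chocolate stunt, not its actual design):

```python
# Sketch of the multiple-comparisons trick described above: test 20 true
# null hypotheses at p < 0.05 and see how often at least one "passes".
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_outcomes, n_subjects = 20, 15
hits = 0
for _ in range(1000):            # 1000 simulated "studies"
    significant = 0
    for _ in range(n_outcomes):  # 20 outcomes, none with a real effect
        a = rng.normal(size=n_subjects)  # "chocolate" group
        b = rng.normal(size=n_subjects)  # control group
        if ttest_ind(a, b).pvalue < 0.05:
            significant += 1
    if significant >= 1:
        hits += 1
print(f"studies with >=1 'significant' outcome: {hits/1000:.0%}")  # ~64%
```

That matches the arithmetic: with 20 independent tests, 1 - 0.95^20 is about 64%.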
It's not just journalists either, the point of Scott's article about doctors (https://slatestarcodex.com/2013/12/17/statistical-literacy-among-doctors-now-lower-than-chance/) is that 58% of them could not answer this basic question about p-values (46% gave the wrong answer, while 12% could not answer at all). Despite the fact that 63% of them rated their statistical literacy as adequate, only 26% of them got the correct answer on the 5-choice question about the rare cancer screening, i.e. it's entirely plausible that only about 8% actually knew, and the other 92% randomly guessed (cause 8% + [92/5]% = about 26%).
I mean, for goodness sake look at John Ioannidis' classic "Why Most Published Research Findings Are False" (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124). Ioannidis points out that if you use the actual definition of the p-value, rather than the definition people think it has, and work through the implications, you find that a majority of published research should be false for the same reason that the majority of cancer warnings from the rare cancer test example are false. But people had never realized this for years, thinking that Science was *way* more solid than it really was, cause they didn't even realize they didn't know.
That paper was foundational to the field of metascience, the discussion of the Replication Crisis, the push for statistical literacy in science & medicine, etc. - Scott has essentially been banging on Ioannidis' drum for over 10 years, announcing to anyone who will listen what p-values really mean, and thus what they *really* mean once you digest the implications^[Footnote 1]. The fact that people *still don't know what they don't know*, despite listening to the drum for years, means that something must have gone terribly wrong. Even if I still don't know what exactly.
(Footnote 1: Scott *explicitly* discusses things like this at the likes of https://slatestarcodex.com/2015/05/30/that-chocolate-study/ [That Chocolate Study], what people should be taking away from all this. Not just "p-values don't do what you expect", cause that's just foundational, but the implications that flow out from there which people might not see or quickly forget, like "Don't trust lone papers" and "Don't trust science journalism" and "Statistics is unintuitive".)
You know, it's very unpleasant of you to use the tone you are using to lay out the case for a different and possibly more deeply true way of thinking about p values etc.. I have not read the stuff you're linking, but I will. I was responding to some people who seemed not to know some basic stuff, so I typed out the clearest brief explanation I could of measures of the sizes of differences and of how much you could trust them. If those people read it, they now at least understand that stuff better. How about taking the trouble to write out a user-friendly summary of your understanding of how one does decide how seriously to take research that finds a difference between 2 groups, instead of several paragraphs of screaming about how only an amazingly unenlightened jerk would fail to know in 2023 certain deep truths of metascience, truths that you link to rather than explaining?
I'm not at this point convinced that the unsophisticated understanding of p values makes a lot of difference in practice. Note that Scott's discussion of the research in the present thread does not challenge its statistics. He also does not challenge the statistics of other findings he mentions about the topic. In fact I have read lots of Scott's threads that mention various research findings, and while he is in general skeptical of conventional truths, and challenges the reasoning of researchers in ways that seem valid to me, I can't recall him challenging it on grounds anything like what you are talking about.
The user friendly summary: https://kharshit.github.io/img/falsepositiveparadox.jpg
Note that most uninfected people test negative, as they should; this disease test has a good p-value.
Note also that 100% of the infected people test positive, as they should.
The test is still wrong near 100% of the time.
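The arithmetic behind a diagram like that, with made-up numbers of the same flavor (a 0.1% prevalence condition, a test that catches every case but false-alarms on 5% of healthy people):

```python
# Base-rate arithmetic behind the false-positive-paradox diagram linked
# above. All numbers are invented for illustration.
prevalence = 0.001           # 1 in 1000 people actually infected
sensitivity = 1.00           # the test catches every real case
false_positive_rate = 0.05   # but wrongly flags 5% of healthy people

true_pos = prevalence * sensitivity
false_pos = (1 - prevalence) * false_positive_rate
ppv = true_pos / (true_pos + false_pos)
print(f"P(actually infected | positive test) = {ppv:.1%}")  # ~2.0%
```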
This is the "deep truth of metascience", borne of a slightly more sophisticated understanding of p-values: p-values do not do what people think they do. What people think they do is called "Positive Predictive Value"/PPV/"The chance the thing is actually true", and the Positive Predictive Value of most studies is wildly overstated, much worse than what they imply through their p-values. In fact, most studies in a lot of fields are straight up garbage: https://fantasticanachronism.com/2020/09/11/whats-wrong-with-social-science-and-how-to-fix-it/
"Criticizing bad science from an abstract, 10000-foot view is pleasant: you hear about some stuff that doesn't replicate, some methodologies that seem a bit silly. "They should improve their methods", "p-hacking is bad", "we must change the incentives", you declare Zeuslike from your throne in the clouds, and then go on with your day.
But actually diving into the sea of trash that is social science gives you a more tangible perspective, a more visceral revulsion, and perhaps even a sense of Lovecraftian awe at the sheer magnitude of it all: a vast landfill—a great agglomeration of garbage extending as far as the eye can see, effluvious waves crashing and throwing up a foul foam of p=0.049 papers. As you walk up to the diving platform, the deformed attendant hands you a pair of flippers. Noticing your reticence, he gives a subtle nod as if to say: "come on then, jump in"."
All this is to say what you did was helpful: teaching people that p-values are not effect sizes, and that something that has a good p-value can have any effect size imaginable. Sadly, however, it instead perpetuated the misconception that p-values are good, rather than bad (to put it simply). P-values are bad. No one should trust them. This is the most important thing one can learn, even if it is an oversimplification. Don't trust p-values.
> As taught in any introductory statistics class, the p-value is, in fact, more or less what Eremolaos claims it is by definition.
What? Eremolalos' definition is completely unrelated to the definition of the p-value.
The definition of p-value is "the probability of seeing results similar to what we actually observed under the null hypothesis". Stated as a conditional probability, the p-value is P(observed data | null hypothesis is true), which is easy to calculate, and Eremolalos has erroneously defined it as representing P(null hypothesis is true | observed data). These are unrelated quantities, and the second one obviously cannot be calculated at all, which is why nobody tries to do so.
Hmm, yes I think you are right - it has probably been too long since I attended that statistics class.
No, what I said is right -- except for whatever objections Scott is on board with, which I have not read yet. But those I'm pretty sure are more subtle and philosophical, having to do with how one thinks of probability of being wrong -- not simple math objections.
> The definition of p-value is "the probability of seeing results similar to what we actually observed under the null hypothesis".
Right. So let's say our hypothesis was that on average caucasian males with blue eyes are shorter than males with brown eyes. The null hypothesis is that on average blue-eyed males are the same height or taller than brown-eyed ones. So we measure 1000 of each, and get the mean and standard deviation of each group, and we find that on average blue-eyed males in our sample are half an inch shorter than brown-eyed ones. But our hypothesis is not about the 2000 men in our study, it's about *all* caucasian men. So we need a statistic to tell us how reasonable it is to use our result to make that generalization about all caucasian men. What is the chance that we would be wrong? That statistic is p. It's the probability that the null hypothesis is correct: Blue-eyed men are not shorter than brown-eyed men.
>Stated as a conditional probability, the p-value is P(observed data | null hypothesis is true), which is easy to calculate,
How would you calculate it, given my result? I gave you the mean height of each of the 2 groups in my study: Blue = 68.75, brown = 69.25. If you need standard deviations of each of the 2 groups in my study to make the calculation, let's say that standard deviation for blue-eyed was 2.3 and for brown-eyed it was 2.6. 1000 blue-eyed and 1000 brown-eyed men were in the study.
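For what it's worth, the mechanics are straightforward once a null is fixed. A sketch using the summary statistics above, testing "no difference in means" against the one-sided alternative that blue-eyed men are shorter (Welch's t-test; note that the output is P(data at least this extreme | null), not P(null | data)):

```python
# Two-sample (Welch) t-test from the summary statistics in the comment above.
# This yields P(data at least this extreme | null), NOT P(null | data).
from scipy.stats import ttest_ind_from_stats

res = ttest_ind_from_stats(mean1=68.75, std1=2.3, nobs1=1000,  # blue-eyed
                           mean2=69.25, std2=2.6, nobs2=1000,  # brown-eyed
                           equal_var=False, alternative='less')
print(res.statistic, res.pvalue)  # t ~ -4.56, p ~ 2.7e-6
```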
I can't really tell what you're trying to say here. Are you looking for the observation that, because you don't have a null hypothesis, you're unable to calculate any p-values? That's the only possible response if I read your comment literally.
Or are you looking for me to supply my own null hypothesis and calculate p-values from it? (The logical choice would be "the blue distribution and the brown distribution are both normal distributions with mean 69 and standard deviation 2.5", since you've already specified that that is the overall distribution.)
I agree with Leppi below. Eremolalos gave an accurate explanation of p-values and how they’re interpreted by the broader scientific community. You might have concerns with p-values and prefer a Bayesian approach, as I do, and the broader rationalist community does. But it’s not like Eremolalos waved a wand and said, “Let there be frequentism.”
I'm curious for (reliable) sources that give Eremolalos' definition. Wikipedia and my memory give: the p-value is the probability that the underlying process would generate the given observations (or observations more extreme) if the actual effect size is zero.
Is there an equivalent (for frequentists) definition where they get the probabilities of the null hypothesis? How do frequentists get around the base rate issue?
I answered the first part of your comment in a comment below.
As for, "How do frequentists get around the base rate issue?"
A snarky Bayesian response would be, "they don't." This is one of the major criticisms that Bayesians have of frequentism in general, so much so that there's even a xkcd comic about it [https://www.explainxkcd.com/wiki/index.php/1132:_Frequentists_vs._Bayesians].
> Is there an equivalent (for frequentists) definition where they get the probabilities of the null hypothesis?
The null hypothesis is usually that the control group and the test group are the same. However, let's take a common example: coin flipping. It's usually imagined to be 50/50. But where does this come from? In reality, coins don't actually have 50/50 odds of being heads/tails because metal rubs off from them and they were never perfectly identical to begin with. But coins on average are very close to 50/50. So as a Bayesian we'd say our prior is 50/50. As a frequentist, you just ignore the problem and don't call it anything.
(I'm not endorsing the tone of WindUponWaves's comment, but) I don't see how "the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference" can be read as an accurate explanation of the p-value. The real definition is
> the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
> the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
Therefore the p-value is the probability of making an observation at least as extreme "by chance" given that the null hypothesis is correct. Which was approximately what was stated?
My reading is that "the chance that the difference found between the subjects is mere coincidence, rather than an indication of a real difference" = 'given we got this result, how likely is it that it's a false alarm and there's no real effect?'.
Whereas "the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct" = 'supposing there's no real effect here, at the beginning of the experiment how likely were we to get results this strong or stronger?'
Those are importantly different because the answer to the first question depends on more than just the answer to the second question. For example if we test a silly hypothesis like 'saying these magic words when flipping a fair coin increases the chance that it will land heads' (null hypothesis: it's still 50/50), and we design and conduct the experiment perfectly but happen to get a positive result with p = 0.04, our estimate of "the chance that the difference found between [flips with and without the magic words] is mere coincidence, rather than an indication of a real difference" should be way more than 4%.
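To put rough numbers on that, here is a minimal sketch with invented inputs: a generous 1-in-1000 prior that the magic words do anything, the observed 4% false-positive rate, and an assumed 50% chance of detecting the effect if it were real:

```python
# Rough Bayes version of the magic-words example above. All numbers are
# assumptions for illustration, not from any real experiment.
prior = 0.001  # P(magic words really work) before the experiment -- assumed
alpha = 0.04   # P(result this extreme | no effect): the observed p-value
power = 0.50   # P(result this extreme | real effect) -- assumed

posterior = (prior * power) / (prior * power + (1 - prior) * alpha)
print(f"P(real effect | positive result) = {posterior:.1%}")  # ~1.2%
# i.e. a >98% chance the "significant" result is a coincidence, not 4%
```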
I agree that it's perhaps not as rigorous as you’d like, but I think a charitable reading of it is a reasonable description of p-values in a frequentist paradigm.
It sounds like you’re already familiar with all this, so I’m probably going to say some stuff you already know, but I just want to lay out where I’m coming from.
In a Bayesian context, probability is the chance of an event happening in the future. In a frequentist context, probability is the limit of the frequency of an event after near-infinite trials. It’s inherently retrospective/descriptive rather than predictive. A frequentist would say, in an experimental trial, the coin landed on heads 47 times out of a 100. A Bayesian would say, there’s about a 1/2 chance it’ll be heads the next time you flip it.
Changing the definition of probability leads to two different definitions of p-value. p-values are unimportant in a Bayesian context, which has led to a lot of blog posts arguing that they don’t exist at all in Bayesianism. This is Gelman’s definition: “From a Bayesian context, a posterior p-value is the probability, given the data, that a future observation is more extreme (as measured by some test variable) than the data.” [http://www.stat.columbia.edu/~gelman/research/published/STS149A.pdf]
Note the shift from a p-value describing past experimental results to predicting a future observation. But again, these aren’t really used in practical application and this is kind of pedantic. It’s clear that we’re talking about the frequentist definition as almost everyone does when talking about p-values, so I’ll stay focused on that.
In a frequentist paradigm, a p-value is the probability of a false positive based on the frequency of results in the experiment. Let’s say you have a hypothesis that a coin is weighted. In frequentism, a p-value asks the question, “If we had a fair coin, what’s the probability we would get 23/100 heads?”
Let’s say we put a sticker on a coin and want to know if that is sufficient to make it weighted. We flip it 100 times with the sticker as a test group and 100 without the sticker as a control group. We get 23 heads and 50 heads respectively. As you wrote, we want to find “the probability of obtaining test results at least as extreme as the result actually observed [23 heads], under the assumption that the null hypothesis is correct [the sticker doesn’t change the frequency].”
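A quick way to put a number on that question, assuming the stated fair-coin null (an exact binomial test is one convenient way to do it):

```python
# Exact binomial test of 23 heads in 100 flips against the fair-coin null.
from scipy.stats import binomtest

res = binomtest(k=23, n=100, p=0.5, alternative='two-sided')
print(res.pvalue)  # ~6e-8: 23/100 heads would be astonishing for a fair coin
```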
Eremo wrote, “the chance that the difference found between the subjects [the test results obtained with the coins with and without the stickers] is mere coincidence, rather than an indication of a real difference [what was the probability under the assumption of the null hypothesis]”
I don't think the rant from Wind is really about the definition of p-values, but is really arguing that Bayesianism is better. I agree with Wind that Bayesianism is better, but that doesn't make Eremo's description of p-values wrong. Imagine there's a car wreck (aka the replication crisis) and a bystander describes how people were driving before the wreck. I think we're yelling at the bystander for describing how they were driving rather than at the drivers for causing the wreck.
I'm not meaning to read uncharitably, but it's possible my pedantry or some assumptions I don't realise I'm making are leading me in that direction. From my perspective, though, you're being overly charitable here:
> Eremo wrote, “the chance that the difference found between the subjects [the test results obtained with the coins with and without the stickers] is mere coincidence, rather than an indication of a real difference [what was the probability under the assumption of the null hypothesis]”
To me the natural reading of "the chance that the difference found... *is* mere coincidence..." is something like 'given we got this result, how likely is it that it's a false alarm and there's really no effect'? Which, even if we're not going full Bayesian, is an importantly different question from "what was the probability under the assumption of the null hypothesis". And if we interpret p-values according to that first definition, we're setting ourselves up to draw some silly conclusions.
> In a frequentist paradigm, a p-value is the probability of a false positive based on the frequency of results in the experiment.
But this is false.
Compare Ronald Fisher writing in 1925:
> The deduction of inferences respecting samples, from assumptions respecting the populations from which they are drawn, shows us the position in Statistics of the Theory of Probability. For a given population we may calculate the probability with which any given sample will occur, and if we can solve the purely mathematical problem presented, we can calculate the probability of occurrence of any given statistic calculated from such a sample. The Problems of Distribution may in fact be regarded as applications and extensions of the theory of probability. [p. 10] Three of the distributions with which we shall be concerned, Bernoulli's binomial distribution, Laplace's normal distribution, and Poisson's series, were developed by writers on probability. For many years, extending over a century and a half, attempts were made to extend the domain of the idea of probability to the deduction of inferences respecting populations from assumptions (or observations) respecting samples. Such inferences are usually distinguished under the heading of Inverse Probability, and have at times gained wide acceptance. This is not the place to enter into the subtleties of a prolonged controversy; it will be sufficient in this general outline of the scope of Statistical Science to express my personal conviction, which I have sustained elsewhere, that the theory of inverse probability is founded upon an error, and must be wholly rejected. Inferences respecting populations, from which known samples have been drawn, cannot be expressed in terms of probability
(I was actually looking for him to provide a definition of P, but as far as I can tell he does not do so. You're expected to be familiar with Pearson's work.)
Fisher says in so many words that once you've analyzed your sample and calculated your p-value (in his words, it's just called P), you cannot make any statement at all about the distribution actually exhibited by the population. That's not what p-values represent. Eremolalos is saying the opposite.
I trust you'll believe that Ronald Fisher in 1925 was not writing from a Bayesian perspective?
The definition that Eremolalos gives seems to suffer from something like the base rate fallacy wrt the base rate of real effects.
Significance tests give you the probability that you would see a result as big as the one you saw, given the null hypothesis.
This doesn't give you P(null hypothesis is false | the data). If there are no real effects in the area you are investigating, a statistically significant result just means you got lucky with the random noise, and p(real effect) is still zero.
Well, no, Eremolalos is right, p-values are much closer to a measure of signal-to-noise ratio than effect sizes, which are just a rescaling to make it easier to interpret the importance of an effect.
Eremolalos is right that p-values are not effect sizes, but wrong about the rest: p-values are not a signal-to-noise ratio or anything like that. A p-value of 0.05 is entirely compatible with picking 20 different falsehoods, throwing them all at a p = 0.05 test, picking out the one that randomly passes, and publishing it as the real deal, for a noise-to-signal ratio of infinity. For example, https://gizmodo.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800 (I Fooled Millions Into Thinking Chocolate Helps Weight Loss. Here's How.)
In fact, a p-value of !!1.2*10^(-10)!! is entirely compatible with stuff like saying psychic powers are real (https://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/), because p-values do not measure what people think they do. What people think they do is called "Positive Predictive Value"/PPV, and confusing them is why people have to write papers like https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124 (Why Most Published Research Findings Are False) pointing out what PPVs you actually get from the p-values people are proudly slapping on their papers (hint: the PPVs are depressingly bad).
It is probably a mostly semantic debate, which I do not find really interesting.
I think Eremolalos is right in saying that p-values are MORE a measurement of signal to noise than effect sizes. But:
- p-values are obviously NOT identical to a signal-to-noise ratio, they are conceptually related to one. And yes, of course, p-values can be tortured, like basically any statistical tool, and Ioannidis has a point, but hey, used reasonably, p-values are a simple tool that is frequently useful.
- it of course depends on what is meant by signal to noise ratio. If we stay in the context that was discussed, is there a difference between the effects of two treatments, then I do think that p-values are a reasonable measurement of the signal (average difference between treatments) to noise (variation within a treatment). But of course signal to noise ratio can also mean something quite different, and in these other contexts, p-values would not correspond to that.
Eremolalos is right to say p-values have nothing at all to do with effect sizes. It's common for things with impressive p-values and even PPVs to have such utterly tiny effect sizes they simply do not matter.
Unfortunately, p-values have very little to do with the signal-to-noise ratio either. They might have, once. Nowadays, people have found ways to torture them into being whatever value they want, regardless of how little signal and how much noise they're actually dealing with, and the p-value is effectively dead as an honest indicator. That's how you can get a study with an unassailable p-value of 0.00000000012 that's also completely wrong. Or oceans of p=0.049 papers that are near-literal fraud: https://fantasticanachronism.com/2020/09/11/whats-wrong-with-social-science-and-how-to-fix-it/
"Criticizing bad science from an abstract, 10000-foot view is pleasant: you hear about some stuff that doesn't replicate, some methodologies that seem a bit silly. "They should improve their methods", "p-hacking is bad", "we must change the incentives", you declare Zeuslike from your throne in the clouds, and then go on with your day.
But actually diving into the sea of trash that is social science gives you a more tangible perspective, a more visceral revulsion, and perhaps even a sense of Lovecraftian awe at the sheer magnitude of it all: a vast landfill—a great agglomeration of garbage extending as far as the eye can see, effluvious waves crashing and throwing up a foul foam of p=0.049 papers. As you walk up to the diving platform, the deformed attendant hands you a pair of flippers. Noticing your reticence, he gives a subtle nod as if to say: "come on then, jump in"."
EDIT: To illustrate things, I found this diagram: https://kharshit.github.io/img/falsepositiveparadox.jpg
Note that most uninfected people test negative, as they should; this disease test has a good p-value.
Note also that 100% of the infected people test positive, as they should.
The test is still wrong near 100% of the time.
> So take caucasian men's heights. The average is 69". So say you get the height of all caucasian men and see how far each is from 69", then average all those numbers. That measure is called the standard deviation. It's the average distance people in the population are from the mean.
Huh. I was mostly following the discussion below you of your definition of p-value, but you should be aware that this is also grossly wrong.
You are right that it is wrong. I simplified it because I was writing an explanation that assumed the reader didn't know a thing about stats. Standard deviation is actually the square root of the average of the squared differences between each subject's height and the mean height. So it's not a simple average of the distances, but one that gives more weight to the values with greater distance from the mean. The simplification does not make any difference in the conceptual understanding I was trying to give readers. A conceptual explanation of standard deviation is that it's a way of quantifying how spread out the values are around the mean. That was what I was trying to get across.
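A quick numeric check of the two versions, on five invented heights:

```python
# "Average distance from the mean" vs. the actual standard deviation,
# computed on five made-up heights.
import numpy as np

heights = np.array([64.0, 66.5, 69.0, 71.5, 74.0])
mean = heights.mean()

mad = np.abs(heights - mean).mean()           # the simplified version: 3.0
sd = np.sqrt(((heights - mean) ** 2).mean())  # root of averaged squares: ~3.54

print(mad, sd)  # close, but not the same number
```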
> Is anyone else a bit sceptical of the current focus on measures of "effect size" as currently defined? Lots of review articles just quote those, and I suspect that a lot of nonmathematical readers assume from the name that it means something like "how effective the treatment is", which ought to be scaled such that some number, e.g. 1.0, meant "completely cured". But actually it really seems to mean something more approaching "signal to noise ratio", so it's telling you how reliable the study is. Which is vital to know, but actually I'd like to know how effective the treatment is too.
You've gotten the terminology backwards. "Effect size" is a measurement of "how effective the treatment is", exactly what you claim it isn't. (Except that since there is no concept of "completely cured", effect size is reported in terms of how much of a difference the effect makes, not in terms of a hypothetical absolute scale.)
The measurement of "signal to noise ratio" is known as the "p-value".
I do know what P-values are (it is unfortunate that the rest of the thread has diverted onto discussing those)
It is true that effect size should be *proportional* to the effectiveness of the treatment. But most (all?) mathematical metrics used for effect size, e.g. Cohen's 'd', are normalised to the population variance (hence my comparing it to a signal to noise ratio).
These seem to be used in two ways which are mistaken:
1) to compare studies, which aren't guaranteed to have the same population variance
2) There are standard 'this is a good study' thresholds of effect size, which are often then interpreted as 'this is a good treatment'. But the effect size does not tell you this, as a weakly effective treatment can end up with an 'effect size' over the threshold, depending on the population variance (the sketch below illustrates this).
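Here is that sketch, with invented numbers: the same 3-point raw improvement yields a conventionally "medium to large" d in a low-variance population and a "small" d in a high-variance one:

```python
# Same raw treatment effect, different population spreads, different Cohen's d.
# All numbers invented for illustration.
import numpy as np

def cohens_d(a, b):
    # pooled-SD version of Cohen's d
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

rng = np.random.default_rng(1)
effect = 3.0  # identical raw improvement on some symptom scale

# Homogeneous population (SD 5) vs heterogeneous population (SD 15):
d_narrow = cohens_d(rng.normal(50 + effect, 5, 500), rng.normal(50, 5, 500))
d_wide = cohens_d(rng.normal(50 + effect, 15, 500), rng.normal(50, 15, 500))

print(d_narrow)  # ~0.6, conventionally "medium to large"
print(d_wide)    # ~0.2, conventionally "small" -- same raw effect
```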
I don't see what you're objecting to. The measurement of how effective the treatment is is called the effect size. The effect size is reported in relevant units - for example, if your car generally goes 375 miles on 12 gallons of gas, then the effect size on range of refueling 2 gallons is "62.5 miles". (Well, possibly a bit less, because of overhead.)
You can report that in terms of the standard deviation of total range, as sampled over X driving-hours over various types of terrain, but since the deviation of range is reported in miles, your result will still be 62.5 miles. This is not "proportional" to the effectiveness of the treatment, it _is_ the (averaged) effectiveness of the treatment. The Cohen's d number, interpreted without reference to the size of the standard deviation, seeks to answer the different question "how difficult is it to get this much of an effect?"
(I would expect this normalized effect size to be incredibly large, because I don't think the variation in mpg across a lot of normal usage of the car is going to represent anywhere near as much of a change as increasing the amount of fuel by 17%. But it will still be 62.5 miles.)
> 2) There are standard 'this is a good study' thresholds of effect size, which are often then interpreted as 'this is a good treatment'.
Study quality is traditionally measured by the p-value. It is not measured by the effect size. Importance of the result is what is notionally measured by the effect size.
You might observe that low study quality directly causes larger effect sizes (after a p-value filter is applied), but this is not part of any standard evaluation of study quality.
I did once encounter someone who adamantly insisted that of course a measured effect size would fall as a study was repeated with larger sample sizes, because that's just how statistics work as sample size increases. There is no limit to the stupidity of people pretending to do statistics. But that has no particular implications for what measured effect size means.
It seems there is a terminology issue. What you are describing as effect size, reported in relevant units, is fine. I have no objection to that. According to Wikipedia, numbers normalised to variance like "Cohen's d" are actually supposed to be called "standardised effect size".
What I object to is review or survey papers where, on the basis of the studies reviewed, different treatments for the same condition are compared solely based on "Cohen's d". I don't have time now to go looking for papers where I've seen this; I'll try and do it later. These have the pitfalls that I complained of in my previous comment.
I can understand the objection to reasoning along the lines of "this treatment has an effect size of 0.4 standard deviations, which is Large; therefore it is worthwhile to do this treatment". That's confusing a metric of difficulty with a metric of value.
But I don't see the problem with comparing different treatments for the same condition to each other by relative effect size. You don't even need to normalize against a standard deviation. No matter how you normalize the effect sizes, comparisons between treatments will always make exactly the same amount of sense; if treatment A has an effect size of 0.03 d, and treatment B has an effect size of 0.09 d, that looks identical to treatment A having an effect size of 8 benefits and treatment B having an effect size of 24 benefits. What does one benefit mean? Who cares? In all cases, you know that treatment B has three times the effect of treatment A.
Comparing different treatments would be fine if the populations were the same, but these review papers are looking at separate studies of different treatments. The studies can be drawn from different populations, either due to explicit selection criteria or as an artifact of how and where the study was conducted. For example, one study might exclude the hardest cases; another might be of hospital inpatients. In such cases the population variability should not be assumed to be the same, so 'd' should not be assumed to be comparable.
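To illustrate that pitfall with hypothetical numbers (not taken from any real review): ranking treatments by d alone can invert the ordering of their raw effects when the study populations differ in spread.

```python
# Hypothetical numbers only. Treatment A was studied in a narrowly
# selected (low-variance) sample; treatment B in a broader, more
# variable inpatient sample.
raw_A, sd_A = 4.0, 5.0    # 4-point improvement, pooled SD of 5
raw_B, sd_B = 6.0, 12.0   # 6-point improvement, pooled SD of 12

d_A = raw_A / sd_A  # 0.80
d_B = raw_B / sd_B  # 0.50

# A review ranking by Cohen's d alone prefers A (0.80 > 0.50), even
# though B produced the larger raw improvement (6 > 4 points).
print(d_A, d_B)
```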
I think you have a typo: You list "Third" twice, instead of "Third" and "Fourth."
In any case, I agree with your (first) "Third" reason for doubting this study. The conscious, subjective effects of the drug are likely causal for its antidepressant effects. There is no reason to expect that the drug would or ought to have the same effects if the patient is unconscious when the drug is administered. (I would suggest that the same is true for MDMA, which has achieved good results in the MAPS trials, and psilocybin as well.)
The fact that this simple point seems lost on many suggests a serious misunderstanding as to how these drugs work. There's a bad mental model in play, as if depression were like a bacterium and the drug were an antibiotic that kills it, or some other such simplistic mechanistic model.
I think that might be the right mental model for ketamine, though. Another way of saying that would be to say that ketamine is sort of like a fast-acting SSRI: If it works for someone, it works by correcting something in how the brain is working, rather than by inducing an unusual experience, with the experience itself being the agent that helps improve how the brain is working. And in fact that's how most drugs work, including drugs that work on the brain -- tranquilizers, sleeping pills, Adderall, drugs that combat epileptic seizures. They don't induce a novel experience that is meaningful in a way that leads you to become less anxious, more sleepy, more focused or less likely to have a seizure -- they just tweak the brain directly.
I have experimented with ketamine probably about 10 times, using the pure pharmaceutical drug I obtained through an MD who's a friend. I did not find it to be a rich experience at all. Cannabis, even in moderate amounts, has induced far more intense, rich and novel experiences, some of which were quite important to me, experiences I remember now decades later. My experience was that ketamine really *felt like* an anesthetic, a knockout drug. It made me sort of half-conscious, and while that was novel, it wasn't a rich sort of novel. Nothing from deep in my mind rose up to fill the gaps left in my consciousness. I was just sort of blank and disoriented. I have more to say about this in an earlier post.
I'm not arguing against the *possibility* that it works that way. I'm arguing against the *assumption* that it must work that way, which is what you have to make for its failure when administered while unconscious to be invalidating.
Also, I have a pretty high confidence that the efficacy of MDMA and psilocybin positively correlate with (and may be partially caused by) conscious experience, so I think it's warranted to be cautious about the assumptions made regarding ketamine.
Oh I have no doubt that MDMA and the psychedelics induce experiences that are meaningful in and of themselves, and can be change agents. I'm just skeptical that ketamine works that way. My reasons: (1) It's not even classed as a psychedelic -- it's a dissociative drug. (2) I have taken several doses of ketamine, some pretty large, and my experience was that it definitely fucked me up, but not in a rich way. It was sort of like being drunk to the point that you're on the verge of being in a stupor, and don't have much going on in your mind or your heart. (3) A meta-analysis of ketamine studies that I found looked at the combined results of 3, maybe 4 studies that compared the benefit of ketamine in people who were given it during surgery -- some while knocked out cold, others who had had a spinal and were awake. Both groups benefitted (had less post-surgical depression), and they benefitted by equal amounts.
(In another post I tell about that meta-analysis in more detail, and link to it.)
I'm torn on this question because for me personally ketamine has provided me a ton of extremely rich subjective experience that has been very meaningful to me. I've used it maybe a hundred times now over several years at low to medium doses. For me the experience is as rich as other psychedelic experiences I've had. I have about two hundred pages of notes from these ketamine experiences, including documenting specific ways the thought/feeling material while on it shifted my relationship to circumstances I was in at the time as well as observations about how it's affected me more generally across time.
At the same time, I've worked as a therapist with a fair number of people who have been on a variety of ketamine protocols and it seems quite clear to me from that clinical experience that a lot of people are getting benefit from it without having deeply meaningful subjective experiences -- these are people at both low and high dose ranges. As a related aside, I know people on low dose psilocybin protocols who are experiencing significant therapeutic effect from it without any altered consciousness experience.
I just think we have to admit we don't know what's going on yet and to keep being open to the wide variety of experiences people are having without privileging one story over another when we really don't have enough knowledge about mechanism of action to draw those conclusions.
Perhaps related side note: I'm a long-time student of Buddhism and for me, ketamine produces a state of mind that is pretty reliably adjacent to the warm-hearted "no-self" experience that can happen while meditating. It's not the same, but it's in the same ballpark for me. My internal conversation back and forth between ketamine and Buddhism these past few years has been really wonderful.
> I have about two hundred pages of notes from these ketamine experiences, including documenting specific ways the thought/feeling material while on it shifted my relationship to circumstances I was in at the time as well as observations about how it's affected me more generally across time.
Well, I would say that is proof positive that your ketamine experiences and their value were different from mine! I used to write notes a lot when I was high on weed, and when I read them later I still have the feeling I had when I wrote them: They're interesting ways of thinking about things -- they're striking, they're funny. If I had a ketamine journal all it would say is "holy *shit* this sucks, how long til it wears off?"
Based on various forums and trip reports, your experience seems quite atypical.
It may be. I spent some time on the ketamine subreddit, and most there either felt very helped by ketamine, purely as a drug, or else felt their altered consciousness on ketamine was itself helpful. But I think forums like that select for people for whom the drug is helpful, or who at least think it is or hope it will be. People with experiences like mine might visit once or twice to describe what they went through, but then have no reason to want to keep participating in a setting like that. Another factor is that my first ketamine experiences were with substantial doses given intramuscularly. When you take it that way you go from normal to totally gorked in about 3 mins. Maybe I would have felt more positive about the stuff if I'd had a gentler beginning. On the other hand, when I used the troches I was not alarmed or overwhelmed, but also did not feel like my experiences were very rich or valuable. I floated along, listening to music, feeling sort of peaceful and enjoying the music. It was pleasant but just not very substantial.
I personally heard from someone who also received a powerful, quick-acting dose at a clinic for her first experience. She was so put off that she decided not to return for a second time. This should really be avoided.
My view of your experience as atypical stems from more general discussion threads where people rate different drugs, not just from ketamine-focused forums. That should shift the selection bias toward recreational drug users in general rather than ketamine fans in particular. I've also read several posts from multi-drug users who have compared the various drugs they've taken.
One of my reliable self-treatments for PTSD is to take a THC edible before going to bed. I wake up happy and relaxed, but slightly hung over. Apparently not only does it dehydrate you like alcohol, but it also messes with REM sleep. But then, part of the reason is that I want to be able to sleep at all, and the other part is that I want to be able to sleep without PTSD dreams.
I can't rule out this being a placebo effect from consciously taking the thing, but I've tried with a few varieties of edibles, and nothing except THC seemed to work. Although I do wonder whether I'd still notice a better effect of indica vs. sativa, if it were blinded.
This is absurd! Hallucinogens don't counteract depression while you're asleep. They work (or don't) because you experience something profound that you take with you after it wears off.
Is it impossible for the medical world to understand that a spiritual problem might have a spiritual treatment? That not everything is simple physiology where the goal is to tone down an overactive hormone or kill a deleterious bacteria? That a despair so profound can exist that it becomes a medical condition but it is, still at its heart, dangerous and intense and durable despair?
Anyone who has done hallucinogens recreationally can understand why they could be dangerous for schizophrenics and profoundly helpful for depressives. It seems...absolutely batshit to treat it like something you can do under anesthesia.
My $.02.
> Is it impossible for the medical world to understand that a spiritual problem might have a spiritual treatment? That not everything is simple physiology where the goal is to tone down an overactive hormone or kill a deleterious bacteria?
Yes, this is, by definition, a nonscientific hypothesis, and if you're doing science, it is not a valid idea to consider.
If people don't have souls, then treating their souls cannot have any effect.
The spirituality language is needlessly distracting, but I think the underlying idea has merit. What does the concept of "depression" refer to in terms of materialist reductionism? The honest answer is that we have no idea; it's a high-level abstraction that we're able to reason about only in terms of correlations and other abstractions, far removed from ground-level material reality.
The concept of "altered state of consciousness" is basically on the same level of abstraction (also ultimately grounding in material reality in some currently not understood way), and there are no a priori reasons to dismiss the notion that it can have effect on "depression".
But there are a priori reasons to dismiss the idea that it can have an effect on depression separate from its effect on simple physiology. There is no such thing.
Well, physiology may be simple in some absolute sense, but that doesn't mean we're currently able to isolate and detect these effects, let alone understand them completely. As far as we are concerned, it might as well be "magic" (in this case known as spirituality).
No more so than talk therapy, which also does not work while under anesthesia, for obvious reasons.
In other words: the most common form of treatment for psychological issues, talk therapy, works on the level of consciousness and not root biology, just as I suspect ketamine does.
And to avoid getting bogged down in quasi-religious language -- what I mean is depression is often a hole so deep of hopelessness and self-hatred and worries that you can't see the world outside of it. And hallucinogens are famous for inducing a sense of connectedness where you see how small you are compared to the universe.
Maybe there's some background physiological effect of ketamine that makes it work like a regular drug, but the fact is that it's sensible and easy for anyone who has done hallucinogens to understand why these types of drug might benefit someone with depression...but not while asleep.
Science can study higher-complexity cognitive processes too. The question is not whether it is soul or consciousness or what; it is whether low-dimensional, quasi-permanent parameter shifts provide enough resolution for effecting the necessary complicated change. And if not, what other interventions we plausibly have, and how they combine. So, in the current context: do psychedelics aid in a kind of self-therapy, and if so, how best to guide it via external therapy?
Science cannot contemplate the idea "That not everything is simple physiology". That is an invocation of the supernatural, and it is assigned a prior probability exactly equal to zero. If any phenomenon has any detectible effect, that is ultimately down to simple physiology.
Whether you're capable of adequately understanding the phenomenon in terms of the simple physiology is a different question, but there is no question of whether the simple physiology is sufficient to explain the phenomenon.
[unrelated footnote: why is the Firefox spellchecker flagging "detectible"? Why is there no entry for "detectible" in Merriam-Webster? What's going on? It's easy to find citations for the spelling, including very modern ones, and the etymology tells us that the suffix appropriate for "detect" is the same one appropriate for "deduct", where all dictionaries agree on "deductible".
It is true that neither of those appears to be etymologically well-formed, but then why are we using "deductible"?]
Charitably interpreted, «simple physiology» excludes «extremely complicated emergent effects of physiology that are definitely not simple and have to be studied on their own because the reduction is way out of reach». «Sufficient to explain» depends on the available resources for intermediate steps. That's how I chose to read the original claim.
PS. Go all the way the other way, «deductable» is an acceptable alternative spelling, according to some dictionaries (although not Firefox)!
That interpretation is not suggested where "simple physiology" is explicitly contrasted with "spirituality".
I don't think they're saying that it works through supernatural means. It gives people a way to reframe their experience that feels less distressing. Many people feel more OK about bad experiences if they think it was productive to go through those experiences.
I am an atheist, but I have experiences like this, too. For example, I am cool with the scary and painful medical/dental procedures I was subjected to against my will when I was a child because I understand they were done to me for a purpose, and it's a purpose I believe in. If I found out that this was a lie and I was actually tortured for no good reason at all, that would be really hard to take.
You might not want to look into prophylactic wisdom teeth removal.
I agree with you generally but I think people working in the ketamine treatment space have different understandings about how crucial various kinds of conscious subjective experiences are to treatment efficacy. For instance, people not infrequently have "bad trips" on ketamine and still experience depression relief afterwards. I don't think 100% of the effect is tied to the quality of the conscious subjective experience. I don't think we know.
There's also the question about whether or not the metabolites of ketamine [(2R,6R)-hydroxynorketamine] are antidepressant. The next question would be if the same might be true about metabolites of the drugs used for the anesthesia.
"Bad trips" are not always interpreted afterwards as something negative by those who take these types of drugs. Interviewing people who do LSD, they often interpreted bad trips as very difficult but yet profound experiences, precisely because they were so difficult and scary. Sort-of framing them according to a "on a hero's journey" narrative. Which is a nice narrative to tell yourself, if you are suffering from depression. The point I am making: Also bad trips can induce a sense of meaning in your life. Perhaps they are particularly well suited for that.
I guess I'll just say the providers I know are less clear than people in this discussion that the conscious experience is 100% of the antidepressant effect. It may be 100% of the subjective psychological meaning-making effect, but I don't think that's necessarily a one-to-one correlation to the anti-depressant effect.
I have a lot of first-hand experience with ketamine and I would say it's not clear to me either. I've also had a number of "bad trips" on various substances, including a few on ketamine, and I wouldn't frame them as "hard but meaningful" though I can see how some people might do that. I have "hard but meaningful" feelings about other life experiences certainly.
Yeah, my most intense experiences with ketamine were in the hard but meaningless category.
Eesh, awful.
A long hike up a mountain might be a hard but meaningful experience. The handful of bad psychedelic experiences I've had in my life just go down as hard and bad.
OMG, my worst ketamine experience was getting caught in a sort of loop. I was sitting on a couch, feeling very disoriented, and having the feeling that I had just been thinking about something important and needed to recapture it. So I would strain and strain with my weak confused mind, and finally I would remember what I had been thinking about: I had been thinking that a little while ago I had had an important thought, and it was crucial to get back to what it was . . . This loop seemed to go on for hours, though in reality it was probably about 15 mins. It was awful -- sort of like being so drunk you have the spins, except this was mental spins not physical ones. The person who was with me kept asking what was going on, and all I could say was "It's a remembering." Oh that was so bad.
I'll add a note from a slightly different angle. Ketamine is also used to good effect with some people who have PTSD. How this helps people seems to vary. For instance, I know someone who on low dose ketamine -- mostly doses below that providing a kind of transporting trip -- over months just gradually became less hyper-vigilant and less easily startled. This was a long-standing pattern despite years of therapy and the only thing that changed was the person taking low-dose ketamine. We can't really attribute the outcome to the subjective trip experience because the trip experience was almost non-existent.
Another person I know who had more inter-personal trauma would get relationally triggered, meaning interactions in his primary relationship could set him off and play out across days. If this person took ketamine after being triggered, he noticed that the ketamine un-hooked him from the triggered state as soon as the drug kicked in, before it had peaked and before there had been time for any subjective "trip" to happen. Just the experience of the drug kicking in turned this person's trigger off and it stayed off after coming down (until they got triggered again some other time, though getting triggered in general decreased over months).
Both of those examples suggest to me that not 100% of the psychological benefit of ketamine is due to the subjective trip experience or to insights gleaned while on the trip.
In a meta-analysis I found just now, the researchers compared the 15 best studies (out of 700+) on the effects of ketamine administered during surgery on post-surgical depression. Some subjects had had general anesthesia, and so were unconscious when the ketamine was administered; some had had a spinal, so were awake to experience whatever state of mind ketamine induced. Both groups benefitted equally. This result came not from a single study, but from combining the results of all the studies within the group of 15 that were relevant to the question. So this result very much weighs in the direction of the subjective experience of ketamine intoxication being irrelevant to its benefits. In another post I give a link to the meta-analysis and more details.
Oh good on you for finding that!
My own unfounded theory is that the subjective experience can be meaningful and valuable and therapeutic in various ways to some people and therefore may significantly enhance the therapeutic benefit some unknown amount, but that the meaningfulness of the experience does not translate directly into the measured antidepressant effect we see.
I think there's enough we don't understand about the mind and that depression assessments are pretty crude tools so that I'm reluctant to say the subjective experience is irrelevant to the antidepressant effect. I suspect it is significant at the individual level for some subset of people but that the importance of that effect is lost in the group results.
I can wrap my head around the idea that some of the value is in physiological stuff working in the background. I can't wrap my head around a study -- with a small cohort no less -- that removes the entire experience of the drug.
This was also my immediate reaction, and it sounds like the sort of issue listed under "Third" that Scott says he is skeptical about for reasons he will discuss later — but then the discussion didn't seem to come. Perhaps he meant "later, in a separate post"?
I was wondering about that myself. I am very much in the camp saying the “experience” is everything.
Sorry, my discussion was that I give some patients pretty low doses that don't cause dissociation at all and they seem to do pretty well.
Ah, I see how that fits together now. In that case, it's interesting that people like Evan Sp are so convinced that the conscious experience is such a central part of the effect. Are there studies or anecdotal evidence suggesting that higher doses (sufficient to cause dissociation) might be more effective, e.g. in patients who fail to respond to a lower dose? Or that for intermediate doses, patients who experience dissociation are more likely to experience subsequent relief from depression?
Hasn't there been a bunch of research on similarly powerful effects of LSD and psilocybin under the right circumstances? i.e., different chemicals, similar conscious inducement of connectedness, our own smallness in the universe, etc.
This is absurdly distant from any of my areas of expertise, but a cursory literature search didn't turn up any attempts to test therapeutic effects of LSD or psilocybin under anaesthetic. (I could easily have missed something.)
The closest thing that has been examined somewhat extensively is probably microdosing, which is a subject of ongoing research — recent review here: https://pubmed.ncbi.nlm.nih.gov/36381758/
I take it you're not a psychiatrist? Neither am I, but I can see why Scott's instincts might diverge from those of a non-psychiatrist when considering whether conscious experiences (of dissociation, hallucination, etc.) should be necessary to produce therapeutic effects.
There is strong evidence that conscious beliefs about the causes of our mental states are a retrospectively constructed fiction. If we have a powerful conscious experience of dissociation or hallucination and subsequently change our overall outlook on life (e.g. relief from depression), we tend to assume that the experience was responsible for the change in outlook.
But especially in the case of drug-induced experiences, this assumption is liable to be mistakenly assuming causation where there is really only correlation. The drug causes a whole bunch of neurological effects across the brain, and the neurological effects causing the conscious experience might be different from the neurological effects causing the change in outlook. By changing the conditions under which the drug is administered (e.g. lower dose, anaesthetic), it might be possible to get the change in outlook without the conscious experience.
(These rambling thoughts partly inspired by Scott's old post, as well as some of the discussion in the comments:
https://slatestarcodex.com/2019/09/10/ssc-journal-club-relaxed-beliefs-under-psychedelics-and-the-anarchic-brain/
Epistemic status: wild speculation, possibly reinventing a square wheel that has already been conclusively demonstrated not to work.)
But ketamine at sub-anesthetic doses really is not much of a hallucinogen. It's classed as a dissociative.
Yes, it's not a classical hallucinogen but it has hallucinogenic effects, including visual hallucinations, euphoria, time distortion, especially at higher but sub-anesthetic doses (k-hole).
Placebo controls for drugs people can feel are always fun.
The classic for psychedelics is niacin, which gives you a flushed face at high doses, but people nearly always know they haven’t been given magic mushrooms or whatever, which makes it less useful as a placebo (from Aday et al.: “divinity school students were assigned to receive psilocybin or niacin, a B vitamin with mild physiological effects, in a group setting at a chapel (Pahnke 1963). Despite some initial confusion because of niacin’s fast-acting effects on vasodilation and general relaxation, before long, it became clear which participants had been assigned to which condition, as those in the psilocybin group had intense subjective reactions and often spiritual experiences, whereas the niacin group ‘twiddled their thumbs’ while watching on”.)
On the other hand lower doses of alcohol can be very convincingly placebo-ed by giving orange juice but stuffing vodka soaked cotton buds up everyone’s nose, or secretly dipping the rim of a glass of cranberry juice in vodka (Bowdring 2018).
It's interesting that niacin is the placebo, given that it's a main ingredient in the so-called 'Stamets stack'.
It is not just propofol and midazz; benzos in general are fairly effective against depression, but they are off-patent and so nobody cares. Since benzos are generally administered before anaesthesia/surgery to calm the patient down and to reduce anaesthetic consumption, I would highly suspect this contributes to the large effect in the control group. It likely also renders the midazz-controlled studies useless.
I have no personal stake or interest in this debate (not even professional interest, tbh), but it would seem to me that the common link between a lot of mental disorders (i.e. anxiety, depression, OCD) is a sort of runaway of the sympathetic nervous system. This simultaneously explains the overlaps in these syndromes, the overlaps in therapeutic responses to SSRIs observed for a variety of similar disorders, and their kinda mid efficacy against any of them (they simply aren't a very good sympatholytic). It also ties in nicely with the original reason SSRIs were developed (hypertension, also partially thought to be due to increased sympathetic drive). Which begs the question: what is causing this?
Right! I had surgery not so long ago and they pumped me full of benzos and fentanyl right before the surgery in addition to whatever else I got on the table. I didn't take opioids afterwards, but they were prescribed to me. Also, everyone is really really nice to you and they call and follow-up and want to see you in a few weeks to see how you're healing and people make you tea and bring you soup (if you're lucky). And you feel really relieved that you don't have to go back in and do it again next week. And managing physical pain -- even with ice or Tylenol, every few hours, and having to treat yourself more gently, etc -- focuses the mind for a while, as does watching the pain slowly get better and the surgical wound heal.
This is so dumb. It’s the experience of the ketamine that’s therapeutic. The experience is nil if you’re under anesthesia. Case closed. Waste of time.
Yup! You would need to be completely closed off to the experience or testimony of hallucinogen users, depressed and otherwise, to think this makes any sense at all.
(Don't mean to sound like a booster or that it's for everyone...it's genuinely not. But, like, it's an experience with an effect, not a medicine that works while you sleep.)
I agree. I think Memory Reconsolidation is the foundation of most if not all therapeutic effect and for that to work you have to 1) Activate the target schema 2) Activate a contradictory feeling. You have to be conscious to do either of those.
http://sequentialpsychotherapy.com/assets/bbs_lane-et-al.pdf
But this isn't how we think SSRIs have therapeutic effect for instance? Why would we think that ketamine works like short-term psychodynamic psychotherapy, etc?
It might indirectly. SSRIs could conceivably enhance reconsolidation by reducing overwhelming autonomic nervous system states, for example. I'm speculating.
Well... this study somewhat supports the theory that the antidepressant effect of ketamine is entirely psychological. I would be interested in seeing the same study done with SSRIs instead of ketamine.
entirely psychological vs what? (are you differentiating psychological from psychiatric? are these really separable? I'm not 100% structuralist, if anything maybe I'd say there is a spiritual element. But Belief/Thought and neurochemical states are bidirectional)
The history of science is full of things that people assumed were dumb or obviously wrong, but then turned out to be true anyway (and vice versa).
True, but I would be willing to bet this isn’t one of them.
Yet Scott says:
> And my patients’ experience is that it works even at low doses that produce no dissociative or ego death effect. I usually prescribe it at about 70 mg intranasal. Some of my patients report feeling a little drunk or giddy on this amount, but nothing like the k-hole that people report at the really high levels. Other patients report nothing at all, but still feel better.
So there’s good reason to try to isolate it from the experience and test the effect
Sorry, people report feeling no effects from 70mg intranasal? That's... surprising. Is there some kind of genetic variant that makes you immune to dissociatives?
Naturally dis-associative?
I think there's another confounding factor here: people are likely to have higher levels of stress due to an upcoming surgery. The authors refer to these as 'minor' surgeries, but a surgical procedure is only routine to the surgeon. The paper says the length of the surgeries was 4 hours (+/- 2 hr SD). That's basically an all-day procedure for a patient, not an in-and-out kind of thing, even if some of these were outpatient (though I didn't see an inpatient/outpatient breakdown in the paper).
They measured these patients up to 5 days pre-op, meaning the patients knew of their upcoming surgery at least a week in advance (otherwise they'd never have screened into the study) and had all that time to fret about complications and the like. Meanwhile, post-op they were told whether there were complications (looks like only 1 patient had complications) so the uncertainty about long-term bad outcomes would have resolved at that point.
TL;DR: Because of the stress of upcoming surgery, there's clearly still lots of room for regression to the mean in this study design.
I look forward to reading these researchers' follow-up paper "Randomized trial of Cognitive Behavioral Therapy masked by surgical anesthesia in patients with depression"
“Unfortunately none of them did the assigned homework.”
That's hilarious.
That’s funny
There's also the Dead Twin Study: Living twin and dead twin have the same genes, but very different environments . . . Note the huge differences in behavior, all clearly environmentally determined.
As an anaesthetist, my first impression was that there would be simply too much interference from the anaesthetic itself. Not just from the lack of consciousness that will clearly interfere with any of the putative benefits around changed appreciation of conscious perception. Rather, that the brain state is so different under anaesthesia that it is surely challenging to infer anything relating to non-anaesthetised life based on this. That seems especially true when the feature in question is something as complex and organised as this.
That makes a lot of sense to me.
Sort of an aside:
Do doctors prescribe antidepressants at sub-therapeutic doses to see if the placebo effect is enough to help the patient before increasing the dose (to levels where side effects become more of an issue)?
This seems like a good idea. What am I missing?
I don't exactly know the story behind this, but my mom has been taking ~35 mg bupropion per day for the last year (half of a 75mg tablet). Standard therapeutic doses for bupropion are more like 150-450 mg/day. She reports very positive results.
(While this is pretty nonstandard, it's more common for doctors to start with a small dose of an antidepressant and taper up. Depending on the medication, lower doses can be basically an active placebo.)
That’s it exactly. Glad it’s helping!
Also if a person is a poor metabolizer of certain classes of drugs because of liver enzyme genetics, much smaller doses may genuinely be as effective for them as standard dose is for everyone else.
Sounds like another good reason to deviate from the standard dosing schedule.
Does the "grapefruit effect" from furanocoumarins apply to these medications?
The primary mechanism by which grapefruit interacts with medications is by inhibiting CYP3A4 - you can check which drugs have that as a major metabolic pathway. (Apparently there is also a _different_ mechanism by which many citrus fruits, including but not limited to grapefruit, have an additional drug interaction. I don't understand that one.)
Many Thanks!
Years ago, I knew a psychiatrist in SF who prescribed SSRIs for some of his patients at nearly homeopathic doses and claimed to have a lot of success with that. Though as I say that now I realize he must have had a compounding pharmacy do it since tiny dosages aren't otherwise available.
Not entirely related: I had trouble getting off of the lowest dose of gabapentin and my doc had a compounding pharmacy make me a liquid solution and I was able to taper down so gradually that I didn't have any side effects whereas I'd tried many times before and the withdrawal symptoms were terrible.
I have a patient who swears that 2 mg of Lexapro solved all their problems. As far as I can tell, they're right. No good explanation. My guess that all drugs have so many effects that sometimes some effect helps them other than the usual one which we consider the mechanism of action, and that effect works at a different dose. But it could also just be a very weird metabolism.
"Placebo effect" isn't a good explanation because they previously were prescribed other drugs/doses that didn't help?
The doctor will be unable to tell whether they're seeing a placebo effect or, alternatively, giving the drug to a person who is just extremely responsive to the drug and only needs a low dose.
In either case it sounds superior to immediately hitting them with a higher dose.
That's what most psychiatrists do. For almost all psych meds you start with a low dose and raise it slowly. If the person says, while still at a low dose, that they feel much better, doc stops increasing the dose. If person starts declining, they recommend increasing the dose. Even if the improvement does not diminish, they might after a while recommend trying the next higher dose to see if there is more benefit to be gained.
I guess I was thinking more along what Radar was saying. Starting at say a quarter of what is now considered a starting dose where there is little chance of side effects.
Yes, a quarter of the usual dose is about where psychiatrists start, sometimes lower if it's a drug that can have especially unpleasant side effects.
But why not start with a homeopathic dose first, without the patient knowing? Something to do with accursed bioethicists, as usual?
I've wondered about this question myself, and even asked an anesthesiologist about it before he was about to put me under (ketamine/fentanyl/midazolam). I got this mix twice actually, the first time the ketamine was clearly withdrawn later than the other two drugs, and that was a crazy experience when coming back to consciousness - the second time was just waking normally but drowsy. In both cases though my immediate post-surgical state was dominated by pain and dealing with the surgery and opiate haze and I detected no discernible effect on mood.
One theory is that the effect requires neural annealing-type processes, and the presence of a heavy benzo dose basically disrupts any beneficial memory reconsolidation. I could believe this given my experience.
The S-ketamine vs R-ketamine debate is also relevant here. Some folks think R-ketamine helps just as much or more despite not having the psychoactive effects, due to non-NMDA mediated effects. That shouldn't be too disrupted by anesthesia though? Or the most potent mixture is actually racemic ketamine which gives you *both* a positive mood push *and* the psychoactive effects and everything resulting from those.
> Since this happened in both the ketamine and placebo groups, the obvious guess is “placebo effect”.
Unless I'm misunderstanding the experimental design, I'm confused why your obvious guess is "placebo effect" instead of the effect of surgery plus anaesthetic plus post-surgical painkillers.
Was there a third experimental arm where they administered the MADRS with neither ketamine nor a placebo to check whether surgery itself makes you more depressed?
Scott, I'd love to see you address this. Is there a reason to think that people who aren't in a ketamine study at all don't show similar effects post surgery?
Upon second reading, it turns out I misread the sign, and surgery actually makes people _less_ depressed overall, which is a surprise to me. I've had surgery twice and it wasn't fun, but then again I wasn't depressed.
This might be one of those counterintuitive things about depression which I remember from an earlier Scott-post about the failure of Covid lockdown to increase suicide rates as expected. Depression requires you to feel like things are bad _and_ they'll never improve, so a short term shock like surgery, which makes you feel terrible in ways that you're confident will go away within weeks, might relieve depression a bit.
Then again it might just be the effect of a massive cocktail of anaesthetics and subsequent painkillers.
In an interview, Dr. John Krystal, the Yale professor who pioneered ketamine as a treatment for depression, is very specific that dosage is important for getting antidepressant effects from ketamine.
He specifically says that if the dose is too high or too low, you don't get the antidepressant effects.
I'll spare you the long quote about it, but if you go to the transcript here https://tim.blog/2022/10/03/dr-john-krystal-ketamine-transcript/ it starts at about here "And in 1997, she published a paper that showed that ketamine released glutamate in the brain..."
Is there any evidence that depression is something other than persistent defense cascade activation combined with various maladaptive schemas?
Ketamine never worked for me. Then MDMA therapy was the magic solution (though it took quite a few sessions). My research on the topic leads me to see most mental illness as persistent autonomic nervous system defense cascade activation (fight or flight, dissociation, etc.) mixed with various maladaptive schemas (integrated memory structures containing emotional reactions, episodic memory, and semantic memory). The ANS activation seems to be primarily a response to fear or bodily damage. So I think the primary treatment is healing the underlying fear. And it seems somewhat well established that the primary mechanism of durable treatment of maladaptive schemas (and of the fear underlying ANS activation) is memory reconsolidation.
https://doi.org/10.1097/HRP.0000000000000065
http://sequentialpsychotherapy.com/assets/bbs_lane-et-al.pdf
>Is there any evidence that depression is something other than persistent defense cascade activation combined with various maladaptive schemas?
Yes. In some depressions it is pretty clear that there is something biological going on. The person moves and talks slowly and cannot speed up. They become so constipated that they need to be disimpacted by a nurse. They can't stay awake and can sleep for 20 hrs at a time. They can't stand to eat, or can't stop eating.
At my worst, I was sitting on my couch and noticed a fly land on my arm and crawl around, and I made a mental note to myself "I should wash that later", but was unable to move even enough to shake off the fly.
I'd had "chronic depression" on and off for most of my life, which generally responded well to low doses of fluoxetine, but that was nothing next to this thing. I don't even think they should be called by the same name.
That sounds a lot like chronic dissociation? Or some schema that includes a belief that nothing is worth doing because nothing helps?
Truly terrible. I can relate in the feeling, though I never quite reached that degree of severity. It can be hard for people who've never been depressed to empathize with being in the position of not only needing a path to recovery, but also needing the steps on that path to be compatible with the tightly limited amount of motivated behavior a person with severe depression can muster daily.
If you can forgive me for appending a bit of dry science to your personal story (hey, it's what I do), this fascinating paper ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1572268/ ) shows that a depression-linked serotonin receptor (that is, more receptor <--> more depression) blocks the activity of NMDA signals in the neocortex. These signals, which would eventually travel down the spinal cord and translate into physical activity, now require greater effort / greater sympathetic activation / greater *stress* to send out.
What a shitty disease.
Is this why anxiety and depression so often occur together? If greater stress is necessary to activate a typical amount of physical activity, a depressed person’s body may adapt over time by activating the ANS more frequently in order to compensate.
I can't answer your question, but this does make intuitive sense to me. I feel like it must be the case at least some of the time.
Here is an anecdote for you:
Some decade and a half ago, I was in an abusive relationship that I badly wanted to leave. I had tried many times (I lost count of how many) to break things off over the previous five years, but I was in a state of depression, and I just did not have the mental strength to resist my then-boyfriend's persistent pleas, demands, and threats to take his own life. I always caved because it was easier in that moment, and I simply could not muster my motivation any further.
When the relationship started to get genuinely scary, I suddenly felt the mental fog lift a little. I was still horribly depressed, mind you, but it was enough that I could formulate something of a plan. And I realized that a major cornerstone of that plan needed to be the cultivation of my mind into a more heightened state. There were several things I tried, but the most effective was this: I started writing down everything about my then-boyfriend that made me angry, and everything that made me scared, and I reviewed that list daily. Whenever I thought of a new thing, I added it to the list and really pushed myself to dwell on it.
By the time I was ready to put my plan into action, I was seething and terrified, and that carried me through -- even though the process of leaving turned out to be far more stressful and complicated than I had expected, and ultimately required me to completely upturn my life (including leaving the country).
I look back in wonder that I got through it; even though I am no longer depressed, and indeed absolutely love life now and enjoy a high degree of motivation in my daily life, I don't know that I'd have the mental energy to execute a plan like that again without rage or terror to propel me.
I undoubtedly gave myself some lifelong trauma by doing this but, hey, I'm alive. It's my battle scar.