137 Comments
Comment deleted (Apr 25, 2023)

quiet_NaN

I don't think so.

In general, LLMs can absorb much more domain-specific knowledge than humans. If you train an LLM on lots of chess games, it will probably use most of its parameters on chess-related things. I am not sure a non-chess person in the intelligence bracket of grandmasters, trained on as many games as the LLM gets as input, would do any worse.

What would be much more impressive would be an LLM which has not seen any chess games at all, is told the rules of chess in its prompt, and from that information alone is able to beat grandmasters. That one might then be a superintelligence.

Isaac King

Relevant to AI forecasting: Someone set up a bot that queries the GPT-4 API and uses its responses to bet on Manifold Markets. It leaves a comment with every bet explaining its reasoning.

It's... not great. Currently in the negatives, and doesn't look to be coming back from that any time soon. But this is comparable to the average Manifold user, so I think it could reasonably be called a "human-level" forecaster.

https://manifold.markets/GPT4

Sheikh Abdur Raheem Ali

The RLHF-tuned model has worse calibration than the base model, so I'm not particularly surprised by this result.
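
For concreteness, one simple way to score this sort of thing is a Brier score (which rewards both calibration and sharpness). A minimal sketch, assuming forecasts are stored as (probability, outcome) pairs rather than any particular API:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities (0-1) and outcomes (0 or 1).
    Lower is better; a well-calibrated, sharp forecaster scores near 0."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Example: forecasts of 90%, 20%, 70% with outcomes yes, no, yes.
print(brier_score([(0.9, 1), (0.2, 0), (0.7, 1)]))  # ~0.047
```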

Arie IJmker

What if you reroll the response to recreate the wisdom of the (AI) crowd?
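
A minimal sketch of that idea; query_model is a hypothetical function (not a real API) that samples one response at nonzero temperature and parses a probability estimate out of it:

```python
import statistics

def crowd_forecast(question: str, query_model, n_samples: int = 20) -> float:
    """Sample the same model many times at nonzero temperature and aggregate
    the parsed probability estimates, wisdom-of-crowds style."""
    estimates = [query_model(question, temperature=1.0) for _ in range(n_samples)]
    return statistics.median(estimates)  # median is robust to a few wild samples
```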

rotatingpaguro

I like that, out of 3 resolved markets it participated in, the GPT-4 bot correctly predicted GPT-4 would pass a certain difficult test:

https://manifold.markets/tftftftftftftftftftftftf/will-gpt4-pass-the-triplebyte-codin

duck_master

> Currently in the negatives, and doesn't look to be coming back from that any time soon.

GPT-4 actually had positive profit from March 19 (when it started) to March 21, as well as from March 28 to April 14 (source: click the "ALL" button), so I'm not that pessimistic about the AI dipping too far into the red.

Gres

Would a bot that always bet no also be human-level in this sense?

Coagulopath

>Actual GPT-4 probably would just give us some boring boilerplate about how the future is uncertain and it’s irresponsible to speculate. But what if AI researchers took some other model that had been trained not to do that, and asked it?

Sounds like a job for a certain Chad McCool...

Roxolan

But then it'll be playing a character that's biased towards the cool or non-PC predictions.

Leon

Who decides architectural significance? And how intact must it be?

4Denthusiast

From the description of that question on Manifold Markets: "Will resolve to Scott Alexander's judgement if he provides it; otherwise, I will use my judgement and reports in reliable sources to try to answer it". The disagreement was basically about whether cultures that could produce things like that existed at all at the time, so presumably it only has to be intact enough for the evidence of what it was originally like to be reasonably clear.

Fedor

Highly relevant for the next ~day.

Please buy 1 No share on this Manifold market if you want prediction markets to succeed: https://manifold.markets/IsaacKing/will-the-whales-win-this-market#R1Q3FCdYceg2c55fvpw8

Isaac King (the guy in the comment thread below) just dropped $22k to win this prediction market. All of that money is going to Manifold. If you bet $1 No, then he'll have to spend $100 more on Manifold. Five seconds of work nets Manifold Markets $100, so they'll have more runway.

Shaked Koplewitz

Anything that makes it more expensive for Isaac King to be on Manifold is a win in my book.

Robert C

Isaac's real world markets have made my time on Manifold much more enjoyable. I'd be very sad if he was no longer on it.

This WvM market has gotten ridiculous.

Damien Laird

Re: Autocast. Interestingly, you can't just use GPT-3 or GPT-4 to perform better on their dataset. This is precluded by the rules because the models were trained after the resolution dates of the curated forecasts, meaning the training data is likely tainted with the correct outcomes. I still agree more powerful LLMs will forecast more accurately, and I'm especially excited about the parallels between "let's think this through step by step" and Fermi estimation. I wrote about these and related topics here: https://damienlaird.substack.com/p/research-forecasting-with-large-language

Ch Hi

More powerful LLMs *will* forecast better than less powerful ones, but I'd be surprised if they averaged as well as a human. Pure LLMs don't have any real idea of physical space, etc.

OTOH, I doubt that we're actually seeing any pure LLMs. A pure LLM would never give you a lecture saying "don't ask for that". The lecture-style responses prove that we aren't seeing pure LLMs. They don't prove that the canned responses are the result of some AI capability beyond an LLM. But they *could* be. (Beyond doesn't mean "more powerful", it means "powerful in a different area". One example of "a different area" would be converting sensory perceptions into ballistic trajectories.)

Eremolalos

I read your article, & that's really interesting stuff. I've been making up puzzles for GPT4 to solve, trying to get a better feel for how it "reasons", and one thing I've noticed is that if the question you ask it is kind of offbeat, it is not good at pulling in the misc bits of info it almost certainly has to help it answer the question. It does much better if it is clear what info it needs to access to answer the question. I think this difficulty with selecting, without guidance, what info is relevant is something that would affect GPT's ability to forecast. I'll say more about that, but first let me explain more what I mean about GPT's not using relevant misc. info it has.

Here's an example: I asked how a person with nothing but blue jeans & a pocket knife could escape from a 40-foot tower, and while it did say the person could cut his jeans into strips and make a rope, there were a lot of things it skipped over: It did not take into consideration that it's desirable to use the thickest strips of cloth possible, to minimize the chance the rope will break, and some factors that affect how wide the strips can be and how long the rope actually has to be. Obviously shorter ropes can use fatter strips. It did not mention that. It did not consider that you really only need a 30-foot rope, because if you drop from a spot where your hands are 10 feet from the ground then your feet are only 3 or 4 feet from the ground and when you drop you won't get hurt. It did not consider rope stretch, though denim often has some stretch, and the rope would certainly gain some length from the tightening of the knots between segments. It did not mention the avg. diameter of adult male jeans legs, which is important info if you're trying to figure out how to use the widest possible strips. I'm sure if you asked GPT about each of these points directly, it would have answered correctly: Should you try to maximize strip thickness? Yes. Is the rope likely to stretch? Yes. How short can we make the rope while still ensuring that the drop from the end won't injure the person descending? About 30 feet. Etc. But it doesn't ask itself these questions on its own.

So the way this is relevant to GPT forecasting is that it seems to me that good forecasters cast sort of a wide net. You ask a forecaster something for which there's no clear way to get an answer -- for instance, will the suicide rate for the country Z-land be at least 5% higher this year than last? You look at Z-land stats for the last 10 years and the suicide rate is zigging and zagging, with no clear trend. After that, it's up to you what you look at. You might look at events in the country -- or trends in similar countries -- or trends in Z-land in things you think are likely to affect suicide rate -- or trends in world suicide rate -- or use some tool that tells you how many dark words ("horror . . . tomb . . . grief") appear in Z-land newspaper headlines now and compare that to the past. But it seems like the process of coming up with something to use as a basis for predicting is not straightforward -- you have to let your mind range widely over a lot of possibilities, none of which have a label on them saying "good suicide predictor." And it is that step that GPT seems to be pretty unable to do.

What do you think of that?

Damien Laird

Thanks for the thorough comment. I agree that what you’re highlighting is a current weakness of LLMs relative to humans in a lot of applications. That being said, I can come up with enough possible mitigations (as a naïve outsider) that haven’t much been tried yet that I’m pretty confident that weakness will vastly diminish over time.

Things that come to mind…

1. Better prompting. In your example in particular I’m very confident a prompt requesting a more detailed explanation could have gotten one (even if not to the full level of detail you were looking for). I don’t think prompting gets us all the way to human-comparable forecasting, but I’m sure it helps and will be combined with the other items on this list.

2. Scale and architecture. GPT-4 seems much, much better at pulling in relevant scattered details from its training than GPT-2 was. As we continue to improve models, how much of this problem will just resolve itself? Not at all clear to me, and this could just be solved over time as a side effect of other improvements.

3. Increased context window. GPT-4 has a 32k token context window coming soon, and recent research is teasing the possibility of context windows of 2 million tokens. These are gargantuan compared to what we’ve had in the past. The human judgmental forecasting paradigm is to draw mostly from your existing knowledge base and then add some relatively light specific research to augment this. LLMs don’t need to work this way if we can pass them a huge amount of specific information to parse through as part of a prompt. To reuse your experiment, with a 2 million token context window you could just pass it books on rope making, stories about prison escapes, or a bunch of examples of explanations with satisfactory levels of detail. Not only does this increase capability, it increases the design space of prompts.

4. Supporting architecture. Just like people are experimenting with agent-y wrappers around GPT-4, like AutoGPT, you could nest an LLM within an architecture of external resources that makes it better at forecasting. Maybe it starts by searching the web for resources related to your forecasting prompt, then it summarizes the most relevant aspects of those for itself in memory, then it combines those into a forecast structured as a list of updates on whatever its original prior was? You could systematize Bayesian updating this way, just as an example of one architecture to try (see the sketch below).

I’m not saying any of these is a magic bullet, nor am I claiming a particular timeline… but my intuition is that forecasting with LLMs is extremely ripe for innovation and that this potential is continuing to grow even with few people currently paying attention to this particular application.
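
A minimal sketch of what item 4 could look like; search, summarize, and estimate_likelihood_ratio are hypothetical stand-ins for a web-search tool and LLM calls, not any real API:

```python
import math

def bayesian_update(prior: float, likelihood_ratios: list[float]) -> float:
    """Fold likelihood ratios P(evidence | yes) / P(evidence | no) into a prior
    probability by adding log-odds (assumes the pieces of evidence are roughly
    independent)."""
    log_odds = math.log(prior / (1 - prior)) + sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))

def forecast(question: str, search, summarize, estimate_likelihood_ratio,
             prior: float = 0.5) -> float:
    """Hypothetical agent loop: gather documents, compress them into notes,
    score each note as evidence, then combine the evidence with the prior."""
    notes = [summarize(question, doc) for doc in search(question)]
    ratios = [estimate_likelihood_ratio(question, note) for note in notes]
    return bayesian_update(prior, ratios)
```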

Lackadaisical Enkrateia

Talking of bets, anyone want to post another attempt at those stained glass windows Scott made a bet about, with today's models? I'd love to see what Midjourney v5 cooks up.

Kenny Easwaran

That sounds like a great idea! It's hard to believe that this was less than 11 months ago!

https://astralcodexten.substack.com/p/a-guide-to-asking-robots-to-design

Eremolalos

I've mostly used DALL-E, which is what Scott was using, but have experimented a little bit with Midjourney. I actually ended up quite frustrated. I would describe what I wanted very very clearly: A large crowd filling an entire valley with this and that in the air above, and in the foreground somebody doing a certain thing -- and it just did a very loose interpretation of what I asked for. For instance I'd get a crowd of 40 or so people, and the thing I'd wanted in the air above the crowd was on the ground next to them. And the whole thing looked very finished and glossy, in a commercial art kind of way. Producing precisely what one asks for is not the only thing that matters, but it sure does matter. It seems to me Midjourney's strength is producing stuff that's very pretty in a commercial art sort of way. Of course my familiar DALL-E won't follow directions either, but it introduces such ugly, peculiar variations that I'm sort of won over by them.

Eremolalos

I actually did get DALL-E to make a good Darwin with finch. It's here:

https://photos.app.goo.gl/AZkeFxN8MD7fqLHr7

Never tried to get better versions of Scott's other images though.

Carl Pham

Prediction, especially based on detection of subtle patterns in enormous noisy data sets, is certainly a good idea for neural network models -- indeed, that's kind of the reason they were invented. But it seems like if you train it on *only* data before "the present" and you reward it for predicting stuff that humans are likely to want to hear, you're going to get the conventional wisdom of "the present" ipso facto -- which is boring. You could have gotten that a lot cheaper with a poll.

But on the other hand, if you train it on predicting stuff that humans "in the future" (when the prediction came true or not) want to hear, then you could indeed be training the network usefully. That seems quite promising, a way to leverage all this enormous text data collection to do something way more practically useful than making a talking robot.

Eremolalos

When I was an adolescent, having vaguely mystical daydreams about math, I used to wonder if there was one formula that summed up absolutely everything -- from what I wore today to exactly how fast a pair of binary stars we don't even know about are circling each other. I pictured it as being enormously long and complicated and ugly. It wouldn't have predictive power, just would sum up everything up til now. The formula that summed up everything up thru tomorrow would be different. It seemed to me that there had to be one. Anyway, talking about GPT improving its ability to predict the future reminds me of that, and creeps me out.

Carl Pham

Well, if it helps, the Einstein field equations, which predict the orbits of all the heavenly bodies, the existence of weird things like black holes and the expansion of space, and with which we attempt to predict the origin and future of the Universe, are these:

G_μν + Λ g_μν = κ T_μν

That's it! I mean, there's a whole lot of subtlety buried in those symbols, of course, but the equations themselves are amazingly simple in form. And if you want to add in electromagnetism, which explains all of chemistry, and pretty much all earthly forces and physics of daily technological importance, from semiconductors to radio and how bridges stand up, that's all contained in Maxwell's equations, which are also very compact and simple looking[1].

There have been passionate arguments recently that physics pays too *much* attention to the quality of elegance and simplicity in governing equations -- that we should accept that the best view we can get of things might be messy and weird, ugly math. But it's still a powerful influence on how people try to come up with mathematical theories: they strive for simplicity and symmetry, as if the esthetics are some weird guide to the truth. Maybe they are -- I mean, I kind of hope they are -- or maybe we are deluding ourselves.

----------------

[1] https://www.amazon.com/Maxwell-Equations-There-Light-Shirt/dp/B07KVSVK8G

Eremolalos

Yes, the esthetics of it have always seemed important to me too. In the Maxwell equations, what do the point-down triangles with arrows over them signify? I've always loved e to the i pi equals minus one. Like that ties it all together with the most elegant of knots.

Carl Pham

Those are gradient operators, a 3-dimensional derivative. For a 1-D function, like the kind you plot in algebra (f(x) = x^2 or something), when we ask for the rate of change (the derivative) we only need another 1-D function (f'(x) = 2x, meaning at every point x, the rate of change of x^2 is 2x, so at x = 4 the value of x^2 is changing by 2x = 8 every time x changes by 1).

For a 3-D function, meaning f(x,y,z), a function defined at every point {x,y,z} in a 3-dimensional space, when we ask for the rate of change we need another 3-D function, because we have to give the rate of change along each of the x, y, and z axes. So we write the derivative differently: instead of "df/dx" in 1-D (meaning "find the derivative of f") we say ∇f, meaning "find the 3-D derivative of f". I don't know why the upside-down delta was chosen to represent this operation. (When we go to four dimensions for relativistic equations, we use a square ☐ which we call the "box operator" and I'm pretty sure that's just because it's the 4D equivalent of the 3D symbol, and since the 3D symbol is a triangle we use a square.)

There's an additional complication in Maxwell's equations, because the electric and magnetic fields are not just 3D functions but 3D vector functions, meaning they have 3 values at every point {x,y,z} in space. That's because the electric and magnetic fields have a direction as well as a magnitude; they point in some direction or other. So we can define still more complicated derivatives, which can be written as combinations of the 3D derivative function and vector multiplication, and that's why we have the ∇ followed by a dot or cross operator before the variables for the fields.
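
For reference, here is the differential (SI) form of Maxwell's equations, where those divergence (∇·) and curl (∇×) combinations appear:

∇·E = ρ/ε₀
∇·B = 0
∇×E = −∂B/∂t
∇×B = μ₀ J + μ₀ ε₀ ∂E/∂t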

Apologies for however much of this you already know.

Eh

And now write down the initial conditions / boundary conditions :-D

Carl Pham

ouch

Thor Odinson

Similarly, the QFT Lagrangian for the universe fits on a mug, with the standard set of abbreviations

Carl Pham

Come, it can hardly claim to be for the universe if gravity is missing. Although...maybe you just need to write down HΨ=EΨ and you're all set:

https://journals.aps.org/prd/abstract/10.1103/PhysRevD.28.2960

Metacelsus

>We’ve talked before about LLMs playing chess; they can sort of do it, but they’re not very good yet. The market thinks 34% chance they’ll get much better in the next five years; I think my estimate is lower.

Can LLMs exceed the performance of human experts? Even if they are trained on data from experts, surpassing that level would require predicting something *different* from what an expert would do. But LLMs are trained to predict the "next move" [token] as accurately as possible given their training data.

Jon Cutchins

This is a good point. Training on a chess dataset is a very good way to win board configurations that someone has already found a way to win and put in the dataset, but it does not seem likely to let a model win a board configuration that is not in the dataset.

I see grandmasters developing a particular anti-SI (Simulated Intelligence; I refuse to describe this technology as Artificial Intelligence) strategy of building board configurations that are different from what is in the training data. It would be a great way to spark creativity in the grandmaster community, but anyone expecting a creative victory from the SI will be very disappointed for the foreseeable future.

beleester

I'm not so sure. GPT does more than just memorize configurations, and a lot of chessboard configurations online are going to appear next to information about chess principles like "the bishop in this position controls the main diagonal" or "here White sacrifices his queen to deflect the enemy rook". And I've seen it do other spatial reasoning tasks which couldn't be easily memorized (things like "draw me a 10x10 grid with X's at these randomly-chosen coordinates.") So it's at least plausible that it could learn some actual chess principles and extrapolate on its own. I don't know if it could actually win a game without further training or prompting, but I wouldn't say it's definitely impossible.

Anti-AI strategies in Go have had some success, but Chess AFAIK doesn't have enough unexplored territory for a human to throw an AI off its game. Once you get up to Stockfish's level any mistakes will just get you crushed.

Jon S

It's plausible they could be trained on play by computers that are better than humans (Stockfish). But presumably if it achieved strong enough reasoning skills, it could do even better ("Imagine you're a next-generation chess AI with an Elo of XXXX...").

Carl Pham

If that worked in general, we could use it. I could just imagine I'm Albert Einstein before tackling the next math problem, and do much better.

Eh

But you are not aiming for next-token prediction, so this won't work on you. LLMs, by contrast, are not striving to solve problems or anything of the sort unless that makes them better at predicting the next token. Hence the trick.

Carl Pham

OK, I will tell myself I am Albert Einstein and I am not attempting to "solve" this equation, I am just attempting to predict the next mathematical token.

Sempervivens

It needn't work in general, just specifically for LLMs. I'm pretty convinced it usually does work for LLMs; do you dispute that claim, or are you just claiming that it doesn't work for all intelligences (notably humans)? If the latter, it seems you're talking past each other.

I've never asked LLMs to play chess, but telling them they're world-class at whatever they're doing does usually improve performance a lot.

Deiseach

"telling them they're world-class at whatever they're doing does usually improve performance a lot."

That sounds like something to be investigated; are they sorting their responses to questions along some scale (e.g. 'give the idiot's answer unless asked for a high-level explanation')? If the training material has been sorted into general-level material for ordinary people, more technical material, and then very technical material for professionals in the field, the AI might be tailoring its answers along those lines.

And I think we should find out if that is what is happening, before we start blindly trusting whatever answer it comes up with.

Carl Pham

Do I dispute the claim that merely telling the LLMs to "be intelligent" actually increases its intelligence? Of course, it would be weirdly naive to even countenance that hypothesis, it's as a priori magical as thinking if Pinocchio wishes hard enough he'll become a real boy. Is it the case that telling the LLM to simulate a response from a more or less intelligent human alters the nature of the prompt, and can alter the nature of the response? Also of course. Does the AI have the ability to distinguish between "intelligent" and "less intelligent" responses? Probably, to the extent that its training corpus contains some markers of what other human beings think is intelligent and non-intelligent discourse.

So that means if you tell the AI to "be smart" it will give responses that correspond more closely to the responses which other human beings have signaled as part of its training data that human beings consider "smart." How this will help solve any useful nontrivial problems I have no idea, by definition you're just going to get the conventional wisdom. As I said originally, it seems to be about as useful as me telling myself "be smart!" before I tackle a problem.

Or to put it another way, if you're using an AI to try to solve an actual problem, and you *fail* to tell it to be as smart as it can be, meaning "restrict your output to only those statements which human beings consider as smart as possible," then you have been careless in your prompt, and are not using the tool to its full capacity. And parenthetically that's bad input interface design. A better designed model, if intended for use in solving problems, would have "be as smart as possible" as its default setting.

But either way, all you're going to get is the conventional wisdom, whatever human beings generally consider to be an "intelligent" response. So you won't be solving any problems that human beings can't already solve, because those solutions, even if they exist, have already been marked by human beings (in the training data) as "dumb" or "not going to work."

quiet_NaN

Human intelligence is not shaped by predicting the next token, not to the degree that LLMs are.

If an LLM is trained on a million chess matches, most of them will probably not be by world-class players, so the shoggoth has no reason to simulate a particularly good chess player. If you prompt it to be especially good, you can try to summon that particular face of the shoggoth. Of course, you are still bound by how well the LLM can play at all, so prompting it about being a planet-sized superintelligence will not make it arbitrarily clever.

I also wonder if LLMs are more likely than chess engines to figure out that their opponent is weak and decide to do some riskier maneuvers to crush them quickly. It seems like more the sort of thing humans (whose utility in online matches includes the number of moves) would do than chess engines.

Carl Pham

Yes? You should mention your first sentence to all the people who think LLMs might be intelligent because they can predict the next token very well.

Metacelsus

Good point.

Scott Alexander

I think yes in theory, because they're also trained with the information on who wins, so they can see things like "Human experts who do X win more often than human experts who do Y". I don't know how efficient this training process is though. I also don't know if it's worth making arguments like "humans also learn from other humans but some of them still manage to surpass their teachers".

Some Guy

My take on the utility of a super predictor (like a really, really good one) is that it would sit between humans and all uses of AI, predicting whether something really bad will happen as a result of a given request and then denying that specific request.
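
A minimal sketch of that gating pattern; predict_harm_probability and run_model are hypothetical stand-ins for the super-predictor and the underlying AI system, not any real API:

```python
class RequestDenied(Exception):
    pass

def guarded_call(request: str, run_model, predict_harm_probability,
                 threshold: float = 0.01):
    """Route every request through the forecaster first; only pass it to the
    underlying AI system if the predicted chance of a bad outcome is low."""
    p_harm = predict_harm_probability(request)
    if p_harm >= threshold:
        raise RequestDenied(f"Predicted harm probability {p_harm:.3f} >= {threshold}")
    return run_model(request)
```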

Silverlock

". . . to decide if some evidence bears on a forecast. . ."

Maybe we should just ask the evidence bears. But we might have to stipulate an anonymity claws.

Deiseach

Re: banning mifepristone, it will depend how you define "nationwide".

If that means "every state in the Union", then unless you imagine that by some fluke California agrees to a ban, the winning bet is "no", and Gavin is doing his best to safeguard the nation against this:

https://www.gov.ca.gov/2023/04/10/california-announces-emergency-stockpile-of-abortion-medication-defending-against-extreme-texas-court-ruling/

"Governor Newsom announced that California has secured an emergency stockpile of up to 2 million pills of Misoprostol, a safe and effective medication abortion drug, in the wake of an extremist judge seeking to block Mifepristone, a critical abortion pill.

California shared the negotiated terms of its Misoprostol purchase agreement to assist other states in securing Misoprostol, at low cost.

While California still believes Mifepristone is central to the preferred regimen for medication abortion, the State negotiated and purchased an emergency stockpile of Misoprostol in anticipation of Friday’s ruling by far-right federal judge Matthew Kacsmaryk to ensure that California remains a safe haven for safe, affordable, and accessible reproductive care. More than 250,000 pills have already arrived in California, and the State has negotiated the ability to purchase up to 2 million Misoprostol pills as needed through CalRx. To support other states in securing Misoprostol at a low cost, California has shared the negotiated terms of the purchase agreement with all states in the Reproductive Freedom Alliance."

If, however, some jiggery-pokery with definitions goes on, such that "by 'nationwide', I meant at least one state in the West, one in the Middle, and one in the East, that is, geographically extending from one coast of the country to the other", then it could happen.

But I still think "no" is the way to bet.

Ted

I think you're missing two points here. First of all, the market is about Mifepristone (as is the Texas lawsuit), and you're linking to a California purchase of Misoprostol, a different drug (that's often taken as part of a two-drug regimen with Mifepristone). Second, the thing that would make the market resolve positive is if the end result of the Texas court case is the FDA directed to withdraw its approval of Mifepristone. There's a note in the comments of the market that this could lead to some vagueness, such as if FDA does so but then announces that it will be declining to enforce the relevant laws on unapproved medicines against Mifepristone marketers.

Deiseach

So like I said - semantic jiggery-pokery. "No, we meant the *other* abortion drug". So the anticipation is a Supreme Court case will make mifepristone illegal in all 50 states, hence why California and other blue states will technically 'ban' it (while issuing other abortifacient pills)?

Because I don't see how a case in Texas would lead the FDA to withdraw its approval, someone versed in the law explain this to me.

EDIT: It does seem to be hinging on the Texas law case, somebody explain that to me. What is the Texas judge's argument that would cause the FDA to withdraw approval - that this drug is unsafe?

Jay C

That the FDA didn't follow the legally mandated procedures when approving it and changing its prescription rules.

Ted

I don't know that it's semantic jiggery-pokery when it's the whole point of the market--I believe the market was started in response to the lawsuit and was specifically intended to predict the result of the lawsuit. I'd think of it as semantic jiggery-pokery if the market had a vaguer title like "Will medication abortion be banned?" and then you had to go to the text to read that what they meant was Mifepristone.

On the details of the lawsuit, first important point is that it's a federal judge in Texas--if it was Texas state court judge, he'd have no power over other states, but federal district judges have the ability to issue orders with nationwide application. He issued a preliminary order on April 7 that directed FDA to reverse its approval of the drug (both the original approval in 2000 and some changes to the approval in 2016), but the federal appellate court that covers Texas stayed the effectiveness of that order on April 13, limiting it to only reversing the 2016 changes, and the Supreme Court ultimately stayed the effectiveness of the whole order on April 21, restoring the status quo. However, these were all orders as part of a preliminary stage of the case, and the case remains live--there will be a number of further steps that could end in a different result.

In terms of how/why a judge can now require the FDA to withdraw an approval that happened in 2000, I'm not interested in starting a legal debate thread, but essentially the judge found that the initial approval was improper based on the law and that legal provisions and principles that would normally make a challenge to a 2000 decision untimely today didn't apply for various reasons. In my opinion this aspect of his decision and many others are wholly incorrect, but that's what he decided.

John Schilling

Assuming that SCOTUS lets Kacsmaryk's ruling stand, how long do you think it will take the FDA to reapprove mifepristone while carefully including language saying "yes, our experts carefully considered all the stuff you say we didn't carefully consider last time"?

Purpleopolis

That's the illuminating part of the current controversy. Suing government agencies because they didn't correctly follow procedures is relatively commonplace and celebrated when it's done to block a building/development permit. This suit could be mooted pretty much any time.

Ash Lael

If the ruling stands, they would have to find a different part of the law that let them do it, as the current approval was done under a legislative process for "serious or life-threatening illnesses". The ruling asserts that pregnancy is not an illness and therefore the FDA was not authorised to approve mifepristone under that provision.

I'm not sure what other pathways may or may not exist.

Doug S.

Pregnancy itself might not be (legally) an illness, but it seems to me that they could plausibly claim that "pregnancy + [insert problem here]" can amount to a "serious or life-threatening illness" that can be treated by using the drug to terminate the pregnancy. (That might create some issues with "off-label" use when people want to use it for terminating normal pregnancies, but I think that for the most part, doctors can prescribe things that aren't controlled substances for any reason they want, you just can't *market* a drug for an off-label use.)

Carl Pham

Ah yes government. One assumes there's no one competent in basic economics who could have cautioned them what suddenly placing an order for 2 million pills to stockpile will do to the price and availability of the drug to those who might seek it in the near future.

Deiseach

Now now, Carl Pham, don't you know all the blue states and California in particular are RICH RICH RICH and indeed net transfers of tax income from them to the conservative knuckle-dragger red states and this is why the blue states should secede and leave the red states high and dry with their Bibles and guns? 😀

So what have economics to do with it, when it's a matter of wrapping themselves in the flag of - let me quote that governor's letter to make sure I get it right - "to ensure that California remains a safe haven for safe, affordable, and accessible reproductive care". I don't want to be accused of misrepresenting anyone, so here are their very own words:

"WHAT GOVERNOR NEWSOM SAID: “In response to this extremist ban on a medication abortion drug, our state has secured a stockpile of an alternative medication abortion drug to ensure that Californians continue to have access to safe reproductive health treatments. We will not cave to extremists who are trying to outlaw these critical abortion services. Medication abortion remains legal in California.”

WHAT PEOPLE ARE SAYING:

Lieutenant Governor Eleni Kounalakis: “Today’s announcement reaffirms California’s commitment to lead the fight against extremist attempts to take away the fundamental right to reproductive care. I applaud Governor Newsom’s swift action to ensure that Californians and those who seek care here can continue to access safe abortions.”

Senate President Pro Tempore Toni Atkins: “We are continually looking for ways to stay ahead of the curve on reproductive access in California. I applaud Governor Newsom on his leadership to ensure decisions made in other states on medication abortion do not prevent Californians from getting reproductive care. I look forward to continuing to work with the Governor and my colleagues in the Legislature on additional efforts to safeguard abortion access in California.”

Assembly Speaker Anthony Rendon: “I applaud this effort by Governor Newsom to ensure that critical abortion medication is available for every woman in need, even while other states fight to strip away that right to bodily autonomy. With the legal future of mifepristone uncertain, taking early action to make sure we are well-supplied with misoprostol will mean continued access to reproductive healthcare for Californians across the state.”

First Partner Jennifer Siebel Newsom: “I’m proud of Governor Newsom’s commitment to safeguarding and expanding the personal freedom that comes from reproductive choice. In a time when targeted backlash against women’s progress persists around the country, California has led in the fight to value women and treat them equitably. I’m thankful to all of the Governors who have joined the Reproductive Freedom Alliance for their partnership and commitment to being bold in protecting women’s freedoms and bodily autonomy.”

I particularly like the "First Partner" formulation. Leave us not offend anybody in any conceivable way by seeming to favour one particular form of domestic arrangement over another, and certainly not by ascribing *gender* to the position. Better that we should sound ashamed to be married, than that!

Carl Pham

I know damn well how much money they have; I send them an absurd amount every year, more than my entire starting salary the year I began work. And the absurd crap on which they spend it, don't get me started, while (just for example) almost the entire ~150 miles of I-15 between Barstow and Vegas jams up every weekend, because nobody can be bothered to add a lane or two extra to a road that was designed and built in the 1950s. (Nevada, less fiscally incompetent, modernized the road on *their* side of the border long ago, which has the perverse effect that going west you can sometimes experience a jam from Vegas itself all the way to California as 6 lanes get slowly squeezed into 2.)

And then there's the electricity and water generation and distribution system, which also hasn't been updated in 60 years, from when the state's population was half what it is now. It's as if this stuff just appears by magic, when Gaia smiles at the orange groves, and it's not the problem of mortal men to plan it or keep it up. (The canonical solution for the past quarter century has been "just conserve more!" as if there is any serious waste left, as if it's no problem for everyone to eventually turn into Fremen, or the intention is for the population to shrink again.)

But if you ask me it's no coincidence that Newsom trots out a squirrel when California's boom 'n' bust budget cycle is turning sour:

https://calmatters.org/california-budget/2023/01/california-budget-newsom-deficit/

They're all like that, politicians. That's why they need to be kept on a very short leash, or you need someone who's just a natural Scrooge, like Jerry Brown, who despite his loony notions wasn't a bad governor *because* he had a zeal for saving money that was 100% genuine. I bet he wears his underwear twice and darns his own socks.

First Partner, ha ha. Geez, I'd despise being called something so clinical, and I would think any red-blooded woman would, too. Once upon a time we feared and hated the notion of the future turning us all into THX-1138, now we freaking revel in it, being turned into shiny plastic robots with detachable multi-functional copulatory apparatus. Even the New Soviet Man was less boring, because he sweated and probably needed a shower after a long day harvesting grain or shooting dissidents.

Deiseach

I was going to make a jokey response about "well you rightfully pay for the privilege of living in paradise on earth", then I read the article about the budget and this is too sad, depressing, and enraging:

"$3.5 million to purchase opioid overdose reversal medication for every middle and high school in the state."

How about, if your 12-15 year olds are *overdosing on opioids*, you fix that before stocking up on abortion pills, Gav? Or is *that* your genius plan? Can't overdose aged 12 if they never got born in the first place?

Carl Pham

We live in an age of decadence, unfortunately. Politicians have found that if they can't solve genuine problems, which of course they generally can't, because genuine problems are genuine for good reason, then they can still get applause, money, and votes if they promise to solve symbolic problems, or make grand gestures to existential problems. We're going to save all life on Earth! Huzzah! The fact that your kids will live more impoverished, shorter, and more stressful lives than you is something you can ignore because -- we're going to save all life on Earth!

Why people believe it I do not know. I see no great evidence that we could not make at least progress on the hard problems -- incremental improvements in health, longevity, material comfort, quality of education and life, modest improvements in justice and harmony. Enough to add a little positive to the life of the next generation, which is not more (although not less) than our plain duty. But that's not the Zeitgeist, it seems. I wonder why?

Bugmaster

> Or is that your genius plan? Can't overdose aged 12 if they never got born in the first place?

You gotta admit, the logic checks out :-/

Ash Lael

I recommend anyone interested in this case have a look at the Federal Court order in question rather than relying on news reports:

https://www.nytimes.com/interactive/2023/04/08/us/court-decision-invalidating-approval-of-mifepristone.html

I think the legal issues in play are a bit more complex than mainstream coverage generally lets on, and wouldn't be surprised to see SCOTUS uphold the mifepristone ban in some form.

IANAL, but for my money, the two strongest arguments are the subpart H issue and the Comstock Act.

The Comstock Act is an old law from the 1800s that (among other things) bans sending abortion drugs in the mail. The meaning of the law is clear and explicit, and it has not been repealed. However, there was a previous assumption under Roe v. Wade that this part was unconstitutional. Under the current Dobbs jurisprudence, it's difficult to see why this part of Comstock would not be good law. The Justice Department took a stab last year at arguing that the law that clearly bans abortion-by-mail doesn't actually do that, but I personally do not find their argument convincing, and I suspect SCOTUS will not either.

The other big issue IMO is that the FDA approved mifepristone via subpart H which allows accelerated approvals for drugs that treat "serious or life-threatening illnesses". The Federal court ruled that the FDA did not have the authority to do this as pregnancy is not an illness.

This in turn runs into a long-running question about how much deference courts should give to government agencies when it comes to interpreting statutes. There is an established idea that courts should defer as much as possible - if the FDA says pregnancy is a serious or life-threatening illness, the court should nod and respect their expertise. The current SCOTUS however is known to be sceptical of that formulation (and the Federal Court ruling says it doesn't apply here anyway, as the FDA has interpreted subpart H narrowly in all other cases, using it only for things like HIV and cancer).

I don't have any special legal expertise, and certainly not in US law, so there's a good chance I'm missing important nuances to the issue. However to my mind people are under-rating the probability that SCOTUS ultimately comes down against the FDA here, operating on the same "this is how it's always been, this is how it will continue" heuristic that said Roe would not be overturned.

I think the reality is that while the current SCOTUS is serious and won't nakedly embrace junk legal theories for ideological reasons, they are also thoroughly willing to rule in a way that conservatives would like when the law clearly supports it - and in this case I think it does.

Deiseach

I am surprised that over all the years since Roe vs Wade, nobody got around to repealing Comstock Act(s):

"The restrictions on birth control in the Comstock laws were effectively rendered null and void by Supreme Court decisions Griswold v. Connecticut (1965) and Eisenstadt v. Baird (1972). Furthermore Congress removed the restrictions on contraception in 1971 but let the rest of the Comstock law stand."

I imagine that was a compromise to get contraception legalised ("don't make a fuss about condoms through the post and we'll leave the bits about abortion on the statute books"), but now the chickens have come home to roost.

And if pregnancy is a serious or life-threatening illness, then *every* pregnancy is serious etc. and if mifepristone is the drug of choice to treat this ailment, then every doctor should be prescribing it to their pregnant patients. Clearly this is absurd, but once again - chickens. roost.

Fifty years since Roe vs Wade, a combination of lack of ability to get it done and complacency means that abortion is still a live issue and things like this are happening. Since the liberal side was perfectly happy with ruling from the bench when decisions were going their way, they're going to have to lump it now.

I think everyone is going to have a damn good try to get around/overturn the Texas decision, but it does sound a lot more complicated than anyone wants to get into the weeds about.

Moon Moth

> I am surprised that over all the years since Roe vs Wade, nobody got around to repealing Comstock Act

I don't know about Ireland, but in America the left seems to be bad at this sort of thing. It's as if they never took seriously the idea that Roe v. Wade might be overturned. There were state laws that were left sitting on the books, too, which were suddenly updated in a mad scramble.

I'm tempted to blame the "right side of history" narrative of progress. But realistically I think it's more that the movement is decentralized and largely moving on its own impulses. I think it'd be more successful if there actually were a left-wing conspiracy, or at least a Leninist intellectual vanguard. IMO, for a while the American right wing had the equivalent of a Leninist intellectual vanguard (if it didn't feel so wrong to apply that term there), but as far as I can tell that went by the wayside after George W. Bush's presidency, never recovered during the Obama years, and was decisively sidelined by Trump.

Ash Lael

There's also the part where the American system makes big controversial legislation really hard, so change tends to come through the courts. I'm not convinced that there's actually any point between 1973 and today where a national abortion law would have passed. Even when Democrats briefly had 60 senate seats under Obama, there would have been defectors on this issue.

Carl Pham

One wonders what penumbra of the Commerce Clause would've been used to argue that the Constitution gives Congress the power to legislate on abortion.

Moon Moth

Gonzales v. Raich should work for just about anything. Clarence Thomas in dissent:

> Respondents Diane Monson and Angel Raich use marijuana that has never been bought or sold, that has never crossed state lines, and that has had no demonstrable effect on the national market for marijuana. If Congress can regulate this under the Commerce Clause, then it can regulate virtually anything—and the Federal Government is no longer one of limited and enumerated powers.

Moon Moth

Yeah, there is that. I don't feel like this adequately explains the state-level stuff, though, especially given the amount of (accurate) rhetoric claiming that Republicans would be going after Roe v. Wade.

Carl Pham

That is a very interesting gloss on the issue, thanks for pointing this out. It's mighty discouraging how often the Executive Branch resorts to short-cuts like this. Even in the cases where I agree with the outcome, this is a piss-poor way to run a republic. If people get used to thinking the President-imperator or some tribune or other can and should just order whatever the fuck The People, bless their hearts, want, that way lies the usual Caesarism by which republics historically commit suicide. Bah.

Husky

Ask AI to forecast the likelihood of consciousness if a trillion data points were achieved. Let's take the Max Tegmark & Future of Life Institute AGI pledge & pause 6 months to get our feet under us.

Ryan L

I'm not an AI expert, but I naively expect an LLM to perhaps asymptotically approach, but always do worse than, the wisdom of the crowd at predicting the future. The reason is that an LLM has no model of the outside world, just a model of human language. I could believe in some hand-wavey way that a large and advanced enough LLM might approximate the wisdom of the crowd, but without a true model of the outside world I don't think it would ever surpass that.

Forstfrost

Apparently an LLM could potentially have a model of the outside world! 3rd point of this paper: https://arxiv.org/abs/2304.00612

Moon Moth

I wouldn't call those things a "model of the outside world". They seem more like a "compact data structure that produces accurate results". And sure, there's a causal connection between the outside world and the data structures, via the data that the LLM was trained on. But I feel that calling such things "models of the outside world", without more technical clarification, will lead to misunderstanding and incorrect inferences.

Forstfrost

How would you say is a human's model of the world different from this? Maybe because of a constant stream of sensory input? Would Bing searching the web then count? (I'm just thinking out loud.)

Moon Moth

Partly, and this is pedantic to the point of silliness, I'd say it's because humans have direct access to the outside world, whereas AIs don't. And that it's the training that produces the model, which is no different than someone programming a classical chess-playing AI with an internal model of the board. It's not that the neural net program figures out how to play on the fly, by observing an object and thinking about it. To me, it's not particularly different than saying that my laptop has an internal model of a keyboard and mouse.

Partly I'd say it's because the data structures might not actually reflect the outside world - they might be more or less accurate, and they might be larger or smaller than a strict one-to-one representation. (If an AI developed an accurate representation of a part of the world which was also more compact, that would be very interesting.) Of course, you could argue that this just means that some models are better or worse than others, for particular purposes. Certainly, people often have incorrect models of things.

And partly it's a bit of verbal hygiene. The cool thing about neural nets is that they can sometimes mimic animal and human behaviors, in surprising ways. It can be easy to anthropomorphize them, when really the lesson should be "that thing that we thought was particularly human, is just a characteristic of neural net behavior". So I'm generally in favor of keeping descriptions as low-level as possible. That said, we have no idea at what point these neural nets might approach something that has ethical standing on par with animals or humans, so it's not a bad idea to keep using the figurative language too.

Scott Alexander

I don't think this is right. Imagine training an AI on the business section of the newspaper. Each day it learns what the economic conditions for that day are, and then it's trying to predict the text in the next day's business section (ie "stocks went up" or "stocks went down"). Eventually even though it's only processing text, it should start trying to learn how to predict if stocks will go up or down; if it's very good at this, it might eventually get better than humans. I think this generalizes to a lot of different areas.

I.e., it's trying to predict what humans will say, but often what humans say is something like "I see stock prices have gone down" or "I see it is raining", so you can't predict this without also being able to predict the external world.

Ryan L

Hmm. That's a more focused approach than I had in mind, and it's interesting. I'm not sure how well it would work in practice -- it still seems like an attempt to centralize decision making, and I think it would run into the same problems that all centralized decision makers face (I'm very much a Hayekian in this regard). But that's an empirical question.

But would this work if you tried it with a general purpose LLM like GPT-4? The training data set you describe is narrowly focused but the corpus for existing LLMs would seem to contain a lot of noise. And how does the model know the difference between a prediction that came true, a prediction that didn't come true, a postdiction that is accurate, and a postdiction that is inaccurate?

Scott Alexander

I'm not literally suggesting training it on the business section, I'm saying that general LLMs have been trained on business sections and many other things beside, and so they are naturally engaging in this process of trying to understand the world.

I agree that the process of "train an LLM on the business section, make it solve the planning problem" wouldn't work, partly because the planning problem is hard, but partly because the business section just doesn't contain that much information about the world. The prices of oil might fall because fusion power gets invented, but in order to predict fusion, you need to know science, not just business.

I'm more saying that an AI trained on the entire linguistic corpus (which includes business pages and many other things referring to the external world) will naturally produce an AI that engages in reasoning about the external world the same way humans do, and although this AI probably won't be an economic calculator that can solve the planning problem (for the same reason humans can't solve the planning problem), it might be able to tell you intelligent things about the economy (for the same reason some humans can tell you intelligent things about the economy). And if you solve many other problems and scale it up and otherwise do lots of very hard work, it might even be able to tell you more intelligent things about the economy than human economists can.

I think the "noise" problem is one that all LLMs face but seem to be able to get around. As for correct predictions, that's the point of the training process - once it's been partly trained, it's using that training to predict other things. If the business section of a newspaper contains the phrase "Stocks went up today", it's trying to predict the token u in up (and distinguish it from the token d which would begin down) and the training process is naturally rewarding/penalizing it for getting that right/wrong.

Bugmaster

> I'm more saying that an AI trained on the entire linguistic corpus (which includes business pages and many other things referring to the external world) will naturally produce an AI that engages in reasoning about the external world the same way humans do...

Is that actually true, though ? For example, humans can learn a simple algorithm that, when used with care (and supplied with enough paper), can be used to multiply arbitrarily large numbers. This algorithm is simple enough to be implemented in silicon or in a few lines of code, but AFAIK modern LLMs cannot discover it on their own. They can sort of multiply some numbers sometimes, but they cannot handle arbitrary multiplication nearly as well as even a cheap calculator can. That goes double for floating-point division (heh).

Granted, you might argue that numbers are not part of "the external world", but I don't think this is true. If I'm trying to split up a pizza equally among N people, one relatively easy (though admittedly not the best) way to do it is to divide it into slices of 360/N degrees each. If I want to build a bridge, it helps to know that I'll need to pay cost_per_yard * sum(beams, beam => length_yards(beam)) dollars for my steel beams... and so on.
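(For what it's worth, a literal rendering of those two back-of-the-envelope calculations; the specific numbers below are made up for illustration.)

```python
# The two examples above, written out. Numbers are illustrative only.

def slice_angle(num_people):
    # Split a pizza equally among N people: each slice spans 360/N degrees.
    return 360 / num_people

def steel_cost(cost_per_yard, beam_lengths_yards):
    # Total cost = cost per yard times the summed lengths of all beams.
    return cost_per_yard * sum(beam_lengths_yards)

print(slice_angle(8))                  # 45.0 degrees per slice
print(steel_cost(250.0, [12, 20, 15])) # 11750.0 dollars
```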

Expand full comment
Scott Alexander's avatar

I think AIs don't have paper, and do about as well as humans do without paper.

My guess is that an AI trained with an attached function "copy this to a (virtual) piece of paper" would eventually (might take a long time) invent the same algorithm humans invented for multiplying (it took us a long time to invent too).

My thought process is something like - an AI has a certain number of parameters, which are a limited resource. Also, it's not allocating this limited resource in a directed way; the parameters are just getting moved back and forth by the (correlated with usefulness but unintelligent) vagaries of training. I think it might take a lot of parameters to hold a multiplication algorithm, with increasingly many parameters per digit, and it's not worth it given the rarity with which multiplying very large numbers appears in the dataset (or the parameter planning process isn't clever/stable enough to come up with it). I think this is probably related to why humans can't multiply large numbers in our heads very well, although it might also have to do with noise, which AIs shouldn't (might not?) face.

Someone who knows more about AI can tell me if I'm totally off base here.
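(For reference, the paper-and-pencil algorithm in question is schoolbook long multiplication, which only ever needs single-digit products, a running carry, and a scratchpad for intermediate digits. The sketch below is just that human algorithm written out; it's not a claim about how an LLM would represent it.)

```python
# Schoolbook long multiplication over a digit "scratchpad", sketched in
# Python. Every step involves only single-digit products and small carries.

def long_multiply(a: str, b: str) -> str:
    digits_a = [int(d) for d in reversed(a)]
    digits_b = [int(d) for d in reversed(b)]
    scratch = [0] * (len(digits_a) + len(digits_b))  # the "piece of paper"
    for i, da in enumerate(digits_a):
        carry = 0
        for j, db in enumerate(digits_b):
            total = scratch[i + j] + da * db + carry
            scratch[i + j] = total % 10
            carry = total // 10
        scratch[i + len(digits_b)] += carry
    result = "".join(str(d) for d in reversed(scratch)).lstrip("0")
    return result or "0"

print(long_multiply("123456789", "987654321"))  # 121932631112635269
print(123456789 * 987654321)                    # same result, as a check
```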

Expand full comment
Carl Pham's avatar

There's a problematic assumption you've made here, that today's news is sort of randomly representative of all the financial data, e.g. you find news about what people consider important but also what they consider unimportant.

My experience is that the only things people talk about in today's news are the data that they think are important, and which explain yesterday's news. Unless they are generally right -- and if they *were*, then humans could predict stocks already -- that means important data is frequently not presented or discussed, because humans (mistakenly) think it isn't important. An AI isn't going to be able to detect a pattern invisible to humans if the data from which that pattern could be extracted is missing, because human beings insensitive to even the possibility of that pattern have thrown it away. You might get lucky and find a pattern that *is* present in the data, and which human beings (and existing neural net tools, of which I'm sure there are already a bunch in the financial industry) have overlooked, but it doesn't seem likely there can be more than a few such lucky accidents waiting to be found.

The only way around the general problem is for the AI to access data directly and more representatively, i.e. to have direct experience of the world, without the intervening filter of what human beings consider important. Which circles right back to the OP's point, which is that it seems unlikely you can deduce more than humans deduce if you limit yourself to analyzing human reactions to data human beings consider interesting.

Expand full comment
Scott Alexander's avatar

Last time I read a local paper, it had a statistics page with things like the price of oil, price of gold, unemployment rate, interest rate, etc, every day (though some of these obviously didn't change daily).

I think this is enough to start forming hypotheses. I agree that it won't be able to get certain things, but humans also can't get certain things (it might be that the economy is predicted very well by the number of negative thoughts people on Wall Street have, but nobody is monitoring this and no human economist can use it).

That is, you can think of newspaper data as a "sense". It's not a perfect sense, but neither is sight, sound, taste, etc. And most human economists aren't tasting the economy, they're reading about it in newspapers (or books, or papers, etc).

Also, as I'm sure you're aware, vision, hearing, etc. are already plenty biased.

Expand full comment
Carl Pham's avatar

To be sure. I didn't say there was zero useful data in the business section. I only doubted whether the data available were anywhere near sufficient -- and more importantly sufficiently unweighted and undistorted -- to do considerably better than human beings (and existing pattern detection methods).

Let me put it another way: suppose ChatGPT itself had been trained not on as much data as the trainers could lay hands on, but instead on only the modest corpus of data that consisted of prior efforts at natural language production, plus the scholarly debate about that data -- what it means, how to do better, et cetera. Would you expect the resulting SLM (we could hardly call it an LLM) to be able to construct good paragraphs on almost any subject, after being trained on only the discussion of experts on how to construct good paragraphs (on any subject)?

You would not. The way to get power in a neural net is through enormous amounts of data, and unfiltered data. So far as I know that's pretty much what distinguishes the latest generation of neural net chatbots from previous versions -- unprecedented amounts of training data (and a larger model, of course, but the model scales naturally with the data -- you can't have more weights than you have data to train them on).

You can't start filtering your data and expect good results, because you will start biasing the training, both failing to discover real patterns and increasing the probability that you detect bogus patterns. I think this is kind of a given in statistical pattern detection, curve fitting, et cetera. And that's what I'm saying here. Any pre-filtering of the data will compromise pattern detection, and reading what human beings say about financial data, rather than accessing financial data directly, is a pretty high degree of pre-filtering. After all, the financial section of the Wall Street Journal represents a compression of the entire economic activity of 300 million people by a staggering degree. Almost all of the data has been thrown away -- and not randomly.

I am of course not saying you're wrong. I'm saying there is a clear problem of pre-filtering going on here, and that gives us good reason to be skeptical. Indeed, my own skepticism is sufficient that I wouldn't even bother to make the attempt, I'd just figure out some way to accumulate way more financial data than anyone had ever scraped up before, and train a special purpose neural net on it. Why screw around with using hearsay instead of the direct testimony, so to speak?

Expand full comment
Scott Alexander's avatar

I agree if you wanted to make a financial-predictor, you would do the direct giant pile of financial data (although I think you would want to somehow combine this with a generic language model so that it could think at all and answer questions).

I'm making the weaker claim that if you're training a language model, on things that include some (even prefiltered) financial data, you will get some effort to understand it. That effort may be far subhuman (at current levels) or potentially human or superhuman (if AI technology advances a lot).

I think in those cases it will be bad that the financial data is pre-filtered, but I also think that's life/the human condition. I personally understand some things about the economy, even though I have never gone to a factory and watched manufacturing happen with my own eyes. I've read about it in books and newspapers, tried to adjust for the biases without necessarily succeeding, and mostly absorbed the conventional wisdom of the people writing it while occasionally having an original thought based on things they didn't think were important but I do. I think future AIs will be basically the same as this, and may be worse/better/the same compared to me at economics based on how big/advanced they are, without having big fundamental differences.

I think the AI trained on a giant pile of unfiltered financial data would be better if it were possible, but I'm not sure it is and it doesn't change my opinion on the LLM case much either way.

Expand full comment
Carl Pham's avatar

Well, I think you may be overlooking two important aspects of how you, as a human being, process data: (1) you have direct experience of the world *as well as* what you read and hear from other people. You test what others say against your own experience -- that is, to be Bayesian, you have priors[1]. An LLM has no equivalent priors, since it has zero experience of the real world. Lacking that experience, it lacks any method to check the appropriateness of any novel output, other than internal consistency with all the speech of others it's read, which one would guess contributes to the "hallucination" problem[2]. (We can build those in, of course -- but then the intelligence and ingenuity here belongs to us, not the LLM.)

And (2) you have initiative, and you consider and reconsider how you think of things, even in the absence of new data. The LLM doesn't do that. It only does anything at all when prompted, and if it doesn't get the appropriate prompt, there are things it will never consider -- in particular, things that would not occur to human beings to ponder -- which is exactly where we're trying to get, if we want the AI to exceed its designers.

Another way to put the latter is that you have your own internal random prompt generator, which pokes you all the time to consider and reconsider your thoughts and the patterns you see in the data you have. The AI doesn't. It seems to me unlikely the AI can surpass human standards of general ingenuity without either access to the world itself (which is where this thread started) or the ability to "self-prompt" and muse on the data in ways that no human has yet asked it to do.

--------------

[1] And notably when you truly do lack any relevant direct experience for a particular issue, you have another prior that suggests caution about your conclusions in the area, which is again probably derived from direct experience.

[2] One assumes that an LLM trained on Malleus Maleficarum would argue eruditely for the existence of witches.

Expand full comment
Feral Finster's avatar

If AIs are so damn smart, why can't Google or MS come up with a spell checker or autocorrect that actually works?

Mine is worse than random chance, and worse, they keep changing correct text to incorrect.

Expand full comment
Martin Blank's avatar

My favorite is when they don't recognize a simple mis-key transposition and suggest something wildly more implausible, where I presumably messed up 12 keystrokes.

Expand full comment
Feral Finster's avatar

I've noticed that a typo in the initial letter sends the spellcheck and autocorrect down some odd rabbit holes.

Expand full comment
John Trent's avatar

Not covering the whale/minnows drama on Manifold is a big miss, Scott

Expand full comment
Alex Power's avatar

I hadn't seen it ... and now that I have seen it, I am glad Scott didn't cover it.

"Why you shouldn't allow penny-auctions or dollar-auctions" would take an entire blog post of its own to cover. Not least because the terms confusingly mean different things: penny-auction generally refers to "pay per bid", while dollar-auction generally refers to "second-highest bidder pays but gets nothing".

EDIT: more specifically, an entire blog post after the dust is settled on current drama.

Expand full comment
Deiseach's avatar

I may regret asking this, but why are people angry at this Isaac King guy in the comments above? What is the drama?

Expand full comment
Moon Moth's avatar

Ditto! (+1, upvote, like, what she said)

Expand full comment
Tom Bennett's avatar

Reminder that Futuur currently has over 700 open real-money markets on a lot of these topics.

https://futuur.com

Also, we just launched our beta API (request access on your Futuur settings page or ping me directly).

I expect there will be a lot of interesting trading opportunities, both via arbitrage, and leveraging the new AI models.

Expand full comment
duck_master's avatar

Does Futuur ban Americans? I thought some of the other real-money prediction markets did.

(Note: I am American.)

Expand full comment
Tom Bennett's avatar

Yes, unfortunately due to the terms of our gaming license we must restrict the US and several other countries from access to real-money markets. However, you're still welcome (and encouraged) to participate on the play-money side!

Expand full comment
Alex Zavoluk's avatar

> This is my Long Bet with Samo Burja - the resolution criteria are slightly different, but close enough to make me feel a little more confident I’m on the right side.

The way this and your other bet are worded has me slightly confused. "Something comparable to GT from slightly before GT" seems plausible to me in a way that "100,000 year old Ice Age civilization that taught the Egyptians how to make pyramids" doesn't.

Expand full comment
Scott Alexander's avatar

I agree. We're betting on GT slightly before GT (which I think is slightly less likely than he does, although we both agree there's significant uncertainty); we both agree the 100,000 year old civilization is unlikely.

Expand full comment
Jack's avatar

Not that I think it can't do it, because LLMs seem to keep making progress that people said they couldn't, but how exactly would it be able to play chess better than a grandmaster?

I can understand, as a first step, recognizing the input as being related to asking for chess moves and outputting things that look like chess moves. Then as a second step, recognizing the patterns for what makes something a legal chess move and not just a capital letter followed by a lowercase letter from a to h followed by a number from 1 to 8.

Then even as a third step recognizing a connection between the prompt and the idea that they're supposed to be good chess moves, along with recognizing what is considered by the stuff in the training data to be a good chess move.

But how do you get from that to beating a grandmaster? Unless the dataset is filled with games that are better performances than what grandmasters do, and labelled as such, but that doesn't seem to be the case now. Maybe if Google dumps like 1 billion AlphaZero chess matches onto an online database somewhere?

This also leads to a related question which is how reliably it can know which ones are the good moves. For something like the scholar's mate, presumably most of the references to it in the model are near words like "bad" and "stupid" and "don't do this" so it can tell they're bad. But it's not clear to me that this same thing would distinguish the best moves from the merely-good moves (which is really a broader question than just being about chess).

If I dump 50 billion chess matches onto an online database but they're all shit, along with a bunch of (AI-generated of course) commentary of "oh what a brilliant move here", would that make GPT-5 really bad at chess?

Expand full comment
Scott Alexander's avatar

Whoever the best human is has learned to beat all other humans while only watching human games.

The easiest way to predict what Magnus Carlsen will do doesn't *just* involve forming a psychological profile of Carlsen, it involves learning how to play chess and what a good move is, and then assuming he will make good moves.

But also, the AI is learning to predict who wins chess games (since any chess games in its corpus include text about someone winning). That means it's learning how X move affects the chance of a player winning. If it learns that well enough, it has a model of which chess moves maximize its chance of winning, which is what you need to win at chess.
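(A hedged sketch of the selection rule this implies: if a model can estimate who wins conditional on a move sequence, looping over the legal moves turns it into a player. The win_probability function below is a hypothetical stand-in for whatever a trained model would actually provide; only the wrapper logic is meant literally.)

```python
# Sketch only: turning a "who wins from here?" estimator into a chess player.
# Assumes the python-chess library for board state and legal-move generation;
# win_probability is a hypothetical model call, not a real API.
import chess

def win_probability(moves_so_far: list[str], candidate_move: str) -> float:
    # Hypothetical: the model's estimate that the side playing
    # candidate_move goes on to win, given the game so far.
    raise NotImplementedError

def choose_move(board: chess.Board, moves_so_far: list[str]) -> chess.Move:
    # Greedy policy: pick the legal move the model scores highest.
    return max(
        board.legal_moves,
        key=lambda mv: win_probability(moves_so_far, board.san(mv)),
    )
```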

Expand full comment
Alex Zavoluk's avatar

> Whoever the best human is has learned to beat all other humans while only watching human games.

At this point, that might not be true. Superhuman chess engines are over 20 years old and I can't imagine that top players haven't tried to learn from them. Of course, there are probably plenty of engine games out there too, but on the other hand humans don't have to only watch human games, or, to be really accurate, read text strings describing games without ever playing.

Expand full comment
Scott Alexander's avatar

Fine, assume I'm talking about 50 years ago and my point still holds.

Expand full comment
Alex Zavoluk's avatar

Sorry, I didn't mean to disagree. I agree this is theoretically possible (but difficult, as you seem to agree).

Expand full comment
Scott Alexander's avatar

Sorry, didn't mean to sound hostile.

Expand full comment
skybrian's avatar

I hate to see "anything can happen" getting made fun of because, to a first approximation, it's often correct. Other than physical and logical impossibilities, we're not usually putting zero probability weight on the things we think won't happen. I should think that anyone worried about existential risks would be well aware of the importance of low-probability scenarios?

Furthermore, most of the time we communicate and reason using metaphor and analogy. They're good for imagining possibilities you might not have considered otherwise. They're pretty rubbish for calculating probabilities or ruling things out. Even for people who like math, we're most often using math as metaphor. Have you calculated anything using a prior probability today?

Expand full comment
glaebhoerl's avatar

I had the same idea w.r.t testing LLM predictions on events already past a few days ago, and quizzed GPT-4 on the first dozen significant-seeming questions that occurred to me: https://twitter.com/glaebhoerl/status/1649547678718500866. Not systematic or scientific in any way, unlike the paper! I hope someone puts in the elbow grease to see how newer and more capable models perform.

Expand full comment
Stephen Pimentel's avatar

>Actual GPT-4 probably would just give us some boring boilerplate about how the future is uncertain and it’s irresponsible to speculate. But what if AI researchers took some other model that had been trained not to do that, and asked it?

How unaligned AGI got created because Scott wanted to test an LLM in a prediction market and forgot to be paranoid.

Only kidding, I don't think anything of the sort is likely, whether for this or any other reason. I just thought it was funny.

Expand full comment
duck_master's avatar

I have asked ChatGPT to make probabilistic predictions (in the prompt, I've told it "Pretend you are the world's best superforecaster"). It mostly agrees with me, I think, except that sometimes there have been important events related to the question since the 2021 data cutoff, so the prediction + reasoning becomes noticeably outdated.
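(If anyone wants to reproduce this, a minimal sketch using the OpenAI Python client's chat-completions call; the system-prompt line is the one quoted above, while the question text and model name are placeholders, not from the comment.)

```python
# Minimal sketch, assuming the OpenAI Python client (openai>=1.0) and an API
# key in the environment. The question is a placeholder.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Pretend you are the world's best superforecaster."},
        {"role": "user",
         "content": "What is the probability that <some event> happens by the end of 2024?"},
    ],
)
print(response.choices[0].message.content)
```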

Expand full comment
Stephen Pimentel's avatar

I tried this with Russia/Ukraine and it demurred with "As an AI language model and not an actual superforecaster ..."

Expand full comment
quiet_NaN's avatar

I feel forecasting is AGI-hard? For questions already discussed in public ("will X invade Y this year"), the best LLMs can probably do is search for op-eds on the question and calculate a weighted average over them, in which step the LLM is only really needed to convert the opinion text to a probability. If we had a neural net whose world model was good enough that it could predict as well as a superforecaster, that already would seem somewhat x-risky?
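(That op-ed procedure, spelled out as a sketch: the text_to_probability step is the only part that needs the LLM, and it is left as a hypothetical stub here.)

```python
# Sketch of "search for op-eds, convert each to a probability, average".
# text_to_probability is a hypothetical LLM call; the weights might encode
# source reliability or recency and are chosen by the aggregator.

def text_to_probability(op_ed_text: str) -> float:
    # Hypothetical: ask an LLM what probability the author implicitly
    # assigns to the question, and parse a number out of the answer.
    raise NotImplementedError

def aggregate_forecast(op_eds: list[str], weights: list[float]) -> float:
    probs = [text_to_probability(text) for text in op_eds]
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)
```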

The question about the biggest twitterer creating a question seems silly? It would be like betting on boxing match outcomes where it was acceptable for contestants to bet against themselves and fake going k.o.: no rational non-contestant would bet on that. Given that this is all play money, nobody cares, I figure?

As a PR move for the platform, it is brilliant. It is basically a welcome gift for the biggest twitterer who spends five minutes to create some market.

Expand full comment
Ash Lael's avatar

Further on the mifepristone issue, Manifold also has this market which currently gives a 35% chance that SCOTUS overrules the Texas decision (and by implication gives a 65% chance that it is upheld): https://manifold.markets/BTE/will-the-supreme-court-reverse-the-304d1ed78694

Putting that and the other question together, it seems like the market expects mifepristone to be banned nationally, but for the case not to be resolved before 2024.

Expand full comment