137 Comments

Seeing that number on Trump/Colorado made me check up whether Trump v. Anderson had been decided or not. It hasn't.

No, but the response to the oral arguments two weeks ago adequately explains those odds.

They've only got "Supreme Court reinstates Trump" at 87%, though. Not sure why that hasn't been arbitraged; maybe people think that in the case of an uphold the Colorado Republicans will defy SCOTUS and put Trump on there anyway? And it's not like there aren't ways that the SCOTUS could say "overturned, he's eligible" and he still doesn't wind up on the ballot.

>fair warning: if you already hate any of rationalists, San Francisco, tech, prediction markets, polyamory, betting, love, reality shows, or Aella, this will definitely make you hate them more.

Glaring exclusion of "musicals" here

Also "life"

I think the chance of a child resulting from this event is somewhat less than 1%, so you may be correct here.

The Manifold show reminds me of the “industrial”, short for “industrial musical”: “a musical performed internally for the employees or shareholders of a business to create a feeling of being part of a team, to entertain, and/or to educate and motivate the management and salespeople to improve sales and profits” https://en.wikipedia.org/wiki/Industrial_musical

I have an older friend who did these decades ago for companies including Johnson & Johnson. Like, he wrote parodies of Broadway musicals, but they were about surgical dressings and things like that.

> Generating the full movie

> The dumbest possible way to do this is to ask GPT-4 to write a summary (“write the summary of a plot for a detective mystery story”), then ask it to convert the summary into a 100-point outline, then convert that into 100 minutes of a 100-minute movie, then ask Sora to generate each one-minute block

I've spent several weeks trying to make GPT4 generate at least a medium-quality detective plot - and was forced to admit complete defeat. Not only was I unable to make it generate a consistent plot with clues and foreshadowing, even asking it to retell a well-known plot would reveal big holes in understanding. So either I've missed something big, or you are dramatically underestimating the difficulty.

If anybody is working on something similar and has at least mild success - I'd be very interested in talking, so please reach out by mail rlkarpichev at outlook

Abbreviated list of stuff I've tried, and how it failed:

Here and below, "descriptions" = plot/plot descriptions/beat sheet/outline/... (and a lot of other formats I tried)

1. Straight description generation (i.e. generate a plot/beat sheet/outline/...) - would successfully generate some list of linked events, but each event is described in too little detail, which rules out most causal links (clues/red herrings/Chekhov's guns/etc.)

2. A lot of attempts to increase the level of detail (json/yaml formatting with separate fields for different details, dozens of prompts, chain-of-thought, examples in the prompt, ...). Two failure modes - it either abandons the format halfway (so still not enough detail) or formally adheres to the format but in practice makes a mockery of it (i.e. {"event": "Detective X finally arrested Y", "why it happened": "X caught Y unaware, so Y couldn't resist"} - ignoring possible clues from context)

3. One thing that *almost* worked is reflection/self-improvement - generate a description, ask GPT for problems, then ask it to fix them. It successfully identified the problems and generally improved the description in the first 2-3 iterations, but then ran face-first into the lack of necessary details or into problems with modifying the description (see the next point; a minimal sketch of this loop follows the list)

4. Fixing bad plot descriptions after the fact with handcrafted prompts (i.e. ask it to generate, then look at the result and say "Ok, GPT4, please add a clue to Act 4"). Results: mild success hiding major failure. While GPT easily modifies descriptions, it either tries to modify the whole description to accommodate the change (thus creating other problems) or misses obvious reasons/consequences which would also need to be modified for a coherent plot.

5. As a Hail Mary I tried to use agents - got only two different ones working, and both would fall into introspection or fixate on minor details. I haven't spent enough time on this, so I'm not really sure whether all agents behave like that or not

6. At this point I was ready to give up, so I decided to do something really easy - generate a description of the first Harry Potter book/film. Considering the amount of exact text, fanfiction, and analysis GPT has seen during training, it should have been a piece of cake. It wasn't. After spending a couple of days trying to make GPT spit out a description containing at least half the causal links between events and 90% of the notable events, I was sure that either GPT4 can't do it or I'm too stupid to make it work
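
(For concreteness, a minimal sketch of the reflection loop from point 3 - assuming the OpenAI Python client; the model name and prompts are illustrative:)

from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # Single-turn chat completion; returns the model's text reply.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Generate, critique, revise - in my runs quality plateaued after 2-3 rounds.
plot = ask("Write a detailed beat sheet for a detective mystery, "
           "with explicit clues, red herrings, and foreshadowing.")
for _ in range(3):
    problems = ask(f"List the causal/logical problems in this plot:\n\n{plot}")
    plot = ask(f"Rewrite the plot to fix these problems:\n\n{problems}\n\nPlot:\n\n{plot}")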

Conclusion:

I walked away convinced that GPT4 has a very weak grasp of causality, and with much better intuition about the limitations of LLMs.

This is roughly in line with what I expected. (NOTE: I'm one of the major NO holders in the AI movie market).

Regarding Sora, this advance is also roughly in line with my expectations. The visual quality is very good, but world consistency is really bad: entities that just walk into the ground and disappear, characters that fail to keep the same clothing and hairdo even in simple single-subject scenes, different entities with ridiculous sizes relative to each other, etc. All these flaws will be magnified x1000 once multiple characters that need to be consistently differentiated come into the picture.

I'm also assuming that generation is very expensive for now, optimized for a flashy announcement over practicality, but this isn't a huge crux.

Finally, I'm banking on Scott's honest interpretation of "high quality" and "to a prompt". To me that means generating an arbitrary movie in an arbitrary style, at the quality of a movie studio. Something like "don't worry, you'll get used to people changing haircuts in every scene" does NOT come anywhere close to that implied quality level. Details like this might not be very noticeable in a single scene, but they'll make the experience of watching the movie terrible. Little cues like that guide the narration, help with recognizing characters, and so on.

The probably-cherrypicked fluid sim was quite good though.

Which one? I mostly paid attention to the waves, and those were all pretty bad as soon as you looked closely.

I was thinking of the pirate ships in a teacup.

I feel like "haircut"-type consistency is a solvable problem. You'd need some kind of separate model trained to solve the problem of "take this image and make it consistent with this image", which is not fundamentally different from the whole StyleGAN thing.

There are some longer term spatial-inconsistency things in the Sora demos that can't be solved on an image by image basis though; e.g. in the gold rush town the camera flies past a building and then rotates to reveal that the building was never there. Perhaps what's really needed is the ability to generate not video but 3D models of entire scenes.

I do think that writing a genuinely good script is going to be the part that won't be doable by 2030.

> what's really needed is the ability to generate not video but 3D models of entire scenes

I don't have a lot of experience with image/video generation, but my deep dive into text generation suggests that it's not going to work (meaning you would get a minor quality boost and not really solve the problem)

There are two ways to generate something: bottom-up and top-down. Multi-agent interaction simulation vs GPT plot generation, 3D rendering vs DALL-E, physics simulation and rendering vs Sora.

Top-down approaches suffer from a lack of coherency and steerability, while bottom-up approaches rely on you getting the details of implementation and initial conditions exactly right (to match the desired big picture), and a lot of the time are just computationally intractable.

Trying to marry the opposite approaches seems promising, but imo it would only multiply the problems

It's interesting, as there's a lot of recent progress on generating 3D scenes from videos or images, e.g. all the neural radiance field (NeRF) research - so perhaps this could be added as a "critic" to a diffusion model that already incorporates the context of global consistency across frames, mandating certain consistency across scenes. The tricky part might be separating the static scene parts from those where you *do* want intentional inconsistency due to intentional movement or disappearance of something.

Yes, imo those are the two main limitations of all generative models - consistency and steerability. A lot of the functionality hinges on indescribable blobs of abstractions, so if one such blob is close enough to what you want, everything is easy. But if not, you are back to the standard ML pipeline with data gathering and finetuning.

I'm not trying to generate plots, but when I try to ask straightforward chemistry questions to GPT4 (or Gemini) (e.g. "What inorganic compounds are gases at STP?", "What are the possible hydrocarbons with 4 or fewer carbon atoms?" "What is the solubility of FePO4 in 5 N HCl?") I keep getting wrong answers.

The FePO4 case is the most similar to the plot consistency problem you saw: It needed to get the ratio of phosphate ions (PO4)3- to total phosphate in 5N HCl, and it needed to string _together_ several acid dissociation equations to do it, and it failed.
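
(To make the chain concrete - a rough sketch of the speciation step it failed at, assuming textbook pKa values for H3PO4 and ignoring activity corrections, which are admittedly large at 5 N:)

# Fraction of dissolved phosphate present as PO4(3-) in strong acid,
# chaining the three dissociation constants of H3PO4.
# pKa values are textbook approximations; activities ignored.
Ka1, Ka2, Ka3 = 10**-2.15, 10**-7.20, 10**-12.35
H = 5.0  # mol/L, treating 5 N HCl as 5 M H+
D = H**3 + Ka1*H**2 + Ka1*Ka2*H + Ka1*Ka2*Ka3
alpha_PO4 = Ka1*Ka2*Ka3 / D
print(f"fraction as PO4(3-): {alpha_PO4:.2e}")  # ~1.6e-24, essentially all protonated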

I know close to nothing about chemistry, but your case seems pretty interesting. If you don't mind sharing:

1. Did you encounter any problems caused by tokenization? I imagine chemistry reactions should be especially vulnerable to tokenization problems

2. Did you try any prompting strategies besides asking the question "What is the solubility of FePO4 in 5 N HCl?" (think in steps/chain-of-thought, take a deep breath, ...)

3. What about correcting after the fact? I.e. if you point out an error to GPT, can it fix it? What about the consequences of the error?

Many Thanks!

1) I was pleasantly surprised at the apparent absence of tokenization problems. E.g. specifically in the FePO4 case, it knew that the solid should dissolve to give Fe3+ and (PO4)3- ions. In other experiments, it was able to take the formula for a compound (I think I used Prozac) and give me a count of the various atom types in the molecule, so in both cases it wasn't treating the formula as a monolithic black box token. It seems to be able to dissect formulae correctly.

2) Yes, though I didn't try to do it as a single prompt with multiple cautions for the FePO4 case. I asked it repeated leading questions about the acid dissociation equations involved, and was finally able to force it into getting the calculation almost right (IIRC, one of the dissociation equations was still wrong at the end - and it started out with the numerators and denominators reversed in all the equations).

3) Yes, it can accept corrections (at least within a session - if I understand correctly, this is volatile information in the context window and doesn't get folded back into the weights). In some cases, e.g. the "inorganic gases" question, I could tell it "Think carefully about entry #4" (Where e.g. entry #4 boiled at too high a temperature) and it would then say something like "entry #4 does fail the criteria and should not have been on the list. Sorry about the error"

Overall, it generally seems possible to force GPT4 into finally getting the right answers, if one already knows the right answer and is willing to lead it by the nose. But, in my experience, it is nowhere near trustworthy enough to ask it questions where one _doesn't_ already know the answer and be able to trust what it comes up with.

It can be sort-of useful if one is trying to find a solution to some problem and doesn't know the _name_ of the solution, but is reasonably sure that someone has published one. E.g. it named the Oppenheimer-Snyder solution for collapse to a black hole, which was a name I didn't know, based on a description of what I was looking for (a nonstationary precursor to the Schwarzschild solution).

edit: FWIW, here is my latest experiment with GPT4: https://chat.openai.com/share/7aa305b2-5f6b-40e3-82a8-93cfcaea49ce I gave it what is close to the simplest possible titration problem (NaOH (a strong base) being titrated with HCl (a strong acid)). It _should_ have been able to deduce a closed-form analytical expression for pH as a function of titrant added. This is basically solving a quadratic. It couldn't do that. It took repeated prodding to make it give _any_ numerical answer. I had to force it to take a numerical derivative, and to force it to use a particular increment for the numerical derivative. It initially falsely said that the derivative at the equivalence point was infinite, and I led it by the nose through a numerical calculation to get the almost-right answer (it was still wrong by a factor of two). I'm really hoping that the reliability of LLMs is substantially improved soon. BTW, let me know if you can't see the URL, in which case I will try to copy the text somewhere more visible.
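
(For reference, the closed form in question - the charge balance for a strong base titrated with a strong acid reduces to a quadratic in [H+]; concentrations and volumes below are illustrative:)

import math

def pH(Va_mL, Ca=0.1, Vb_mL=25.0, Cb=0.1, Kw=1e-14):
    # Net strong-acid concentration after mixing (negative = excess base).
    delta = (Ca * Va_mL - Cb * Vb_mL) / (Va_mL + Vb_mL)
    # Charge balance [H+] - Kw/[H+] = delta => quadratic; take the positive root.
    H = (delta + math.sqrt(delta**2 + 4 * Kw)) / 2
    return -math.log10(H)

for v in (0.0, 12.5, 24.9, 25.0, 25.1, 30.0):
    print(f"{v:5.1f} mL HCl -> pH {pH(v):.2f}")
# pH runs from 13.00 down through 7.00 at equivalence - a steep but finite slope.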

Thanks, your chat log was informative. I do feel that the failures in your case are similar to mine - ultimately GPT can do it, but it has to be dragged along and repeatedly corrected

Many Thanks!

>ultimately GPT can do it, but it has to be dragged along and repeatedly corrected

Agreed. And, unfortunately, this substantially reduces its usefulness in many cases.

I think your mistake was in demanding a high-quality plot, as opposed to something comparable to the quality of modern movies :-(

I agree. I bet it could generate passable Hallmark Christmas movies, for those that like them.

For what it's worth -- it has always seemed to me that the clues in mystery novels are sort of bogus. For instance, at the end of an Agatha Christie novel Hercule Poirot busts the murderer, and names the things that were clues to him and how he put them together. But in real life, the "odd" things that Poirot picks up on are a dime a dozen. I mean, they're things like somebody who usually doesn't stay up late still having a light showing in their window at 1 am, or someone who is especially fond of creme brulee leaving before dessert is served. Real life is full of anomalies. So I think what a lot of mystery writers do is find ways to keep the reader from noticing that the clues are just a few anomalies of the sort that happen all the time, because people aren't perfectly predictable and shit happens. *That's* the crucial skill in writing mysteries, not figuring out what striking but unnoticed clues there might be to someone having committed a crime.

I do agree, though, that GPT4 is dumb as dirt in many ways.

founding

Obviously we need a proper rationalist mystery novel where the great Hercule Bayes Poirot talks us through the math - "With our prior previously updated to 0.15 that Lord Sweettooth did the crime, the fact that he left the table before dessert was served causes us to update to p=0.42; now let us consider the illumination of their window..."

Maybe we just put the details in a spreadsheet in the appendix, though.
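
(The spreadsheet version, for anyone who wants to check Poirot's arithmetic - the odds-form Bayes update implied by the joke's numbers:)

prior, posterior = 0.15, 0.42
# posterior odds = prior odds * likelihood ratio
LR = (posterior / (1 - posterior)) / (prior / (1 - prior))
print(f"the dessert clue carried a likelihood ratio of ~{LR:.1f}")  # ~4.1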

Yes, perfect. Except that Lord Sweettooth would *kill* for creme brulee, as would I. In fact maybe the murder occurred when Lord Sweettooth realized that the business matter he was called away from dinner for was trivial, involving a mere million pounds of his estate. Whereupon he bit his accountant's messenger in the jugular.

You may want to look into Tree of Thought approaches, which seem more likely to work for this use case, along with passing everything generated so far in as context (if you're not already) and emphasizing the importance of consistency.
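
(A drastically simplified sketch of the branch-and-score idea behind Tree of Thoughts - a toy version, not the paper's reference implementation; it assumes the OpenAI Python client, and the prompts are illustrative:)

from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def best_next_beat(story_so_far: str, k: int = 3) -> str:
    # Branch: sample k candidate continuations, passing everything so far as context.
    candidates = [ask(f"Story so far:\n{story_so_far}\n\nPropose the next plot beat.")
                  for _ in range(k)]
    # Score: have the model rate each candidate's consistency, keep the best.
    def score(c: str) -> int:
        reply = ask(f"Story so far:\n{story_so_far}\n\nCandidate beat:\n{c}\n\n"
                    "Rate its consistency with the story from 1 to 10. "
                    "Answer with the number only.")
        try:
            return int(reply.strip().split()[0])
        except (ValueError, IndexError):
            return 0
    return max(candidates, key=score)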

Your list of seven items ends on a 5.

It seems that your Manifold market is about an AI creating a movie from a single prompt, while the Metaculus question just asks that all the audiovisual material be created by an AI - meaning that it could take the complete script and more as input, and also that it could be edited by humans.

I didn't think the Manifold show was well done at all. Too long, far too many puns (stop writing for other writers), an ugly set.

The only good parts were the dance and the sheet.

So text-to-video is expected to cost over 50 cents per minute at the end of 2024; I take that to mean the odds of the capability existing to make a full movie for me personally, one that I would pay for, are on the low side.

However, the ability to buy shares in a certain prompt by 2030 is a really pleasant prospect. And if 50 cents per minute is in the ballpark, then it won't take very many shareholders/funders at all to fund an entire movie at lower than current prices.

So in 2015 we were asking where our hoverboards were. In 2024 those would be nice, but no longer impressive. If AI unlocks forecasting, that may be a meta-technology with beautiful, far-reaching consequences.

I'd totally pay that for a pornographic movie that caters to my particular preferences.

You can totally do that right now. There are porn companies blurring the lines between deepfakes and generated video that take orders. Some fetishes are more amenable to the format than others, but there is already a non-trivial amount of generated porn going around.

I'm only interested if I can get granular control and have believable emotions (eg illustrate a good story).

I mean what seems appealing is directing a porno with my partner and realizing that vision together.

Hmm... I've played with one of the sex positive image generators (Civitai) and found that it is surprisingly hard to get it to generate even very vanilla sex positions in a controlled way from prompts. It does generate erotic images, but the positions are, if not random, close enough to it to not be worth the effort of using it.

The simulated love of a pimp is very different from that of a square on your screen.

The AI movie market tanked at the start of the year because someone sold all their shares (I didn't remember who; IIRC they retired from Manifold. ETA: it was @Mira).

It's hard to read the logs, as selling shares doesn't seem to be registered as a transaction the same way that buying is.

There was some further discussion of the reasons for the jump in the market here:

https://manifold.markets/Fion/will-the-ai-make-a-movie-market-rea#14DWrpspKgvPRSbu2HCh

I personally think a big part of it is just that nobody really knows so they're just guessing what everybody else will guess. A big challenge with long term prediction markets.

All the time I see people upset about how porn takes advantage of actresses, so it's really depressing that they don't seem to form a constituency demanding that these AI picture/video engines be freed up to make porn.

I think most of those people aren't really primarily concerned on the actresses' behalf, they just have a moral objection to porn existing at all.

I think it's a bit more complicated than that. I think they do care in some sense. But I agree that the emotional valence is pushing them more than the policy outcomes. That's why I labeled it depressing.

Sadly, it's not just this issue but everything. We empowered individuals with the internet and we found out that individual incentives favor advertising your values not plumping for the policies that achieve the things you value :-(.

That seems more like a function of wealth inequality rather than a fully general result - most people who spend extensive time on the internet can't afford to have meaningful policy agendas at all. Let's see if cheap energy and Georgist LVT + UBI breaks the gridlock of political learned helplessness some time in the 2030s.

While I'm all for UBI, I feel Georgist tax policies haven't really been developed in ways that are practical in the real world.

The issue is that either you try to impose a tax based on the unimproved land value (eg land in NY gets taxed at the same rate as land in Montana) which, if you tried to collect a substantial amount of income that way, is going to have hugely distortionary effects on domestic vs imported food prices, not to mention all sorts of problems with owning land vs renting access from the government.

Alternatively, you tax people on the value of the land as developed, and you have both a ton of practical trouble with valuing illiquid land and a politically unpopular situation where people are at risk of being forced out of their homes as a result of improvement in the neighborhood (hence higher taxes).

In some kind of libertarian paradise the latter might be OK (if you could force acceptance) and economically efficient but, in practice, local control of development means that it will cause communities to come together as super-NIMBYs to prevent their land value from increasing and raising their taxes.

I think you've misunderstood what "unimproved land value" actually means. A vacant lot in NYC is worth far more per square foot than a vacant lot in Montana because of the benefits of proximity to the rest of NYC - externalities from those improvements to other nearby land outside the individual lot in question do still count toward its assessed value. That's actually shockingly easy to assess, since the value of a location is almost by definition a matter of things which are visible from the street, such as proximity to schools and transit.

What's not being taxed is improvement on the lot itself, since you don't want to discourage people from making their own homes and businesses better in whatever ways make the most sense to them. Existing non-georgist property tax requires assessing the value of each building's interior structures, which requires a lot of invasive regulatory overhead and messy guesswork.

> a politically unpopular situation where people are at risk of being forced out of their homes as a result of improvement in the neighborhood

That's why turning around and distributing the tax revenue to everyone equally as a UBI is so important. The only people who could possibly get forced out that way are those who were not only trying to hold on to a chunk of land with value out of proportion to their personal per-capita share of GDP, but refusing to put it to some appropriately productive use. In "It's a Wonderful Life" terms, all of Jimmy Stewart's friends come out way ahead; Mr. Potter is the one in deep trouble.
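
(Toy arithmetic for that claim, with invented numbers: under LVT + UBI you come out ahead exactly when your land-tax bill is below the per-capita average:)

land_values = [100_000, 150_000, 200_000, 5_000_000]  # four households
tax_rate = 0.05  # annual land-value tax rate, illustrative
revenue = sum(land_values) * tax_rate
ubi = revenue / len(land_values)  # redistributed equally
for v in land_values:
    print(f"land value {v:>9,}: net {ubi - v * tax_rate:+,.0f}")
# The three modest holders come out ahead; only the Potter-sized holding pays.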

"Existing non-georgist property tax requires assessing the value of each building's interior structures, which requires a lot of invasive regulatory overhead and messy guesswork."

" Only people who could possibly get forced out that way are those who were not only trying to hold on to a chunk of land with value out of proportion to their personal per-capita share of GDP, but refusing to put it to some appropriately productive use."

So Georgism substitutes its *own* invasive regulatory overhead. Hey Granny, get out of that house you and your husband bought fifty years ago before the city expanded out this far! If you're not going to tear it down and build a blacking factory on it, at least sell it to somebody who will!

That does not sound better than "ooh, the state government will be invasive about the interior structures".

My property, I get to do with it what the hell I want, or not do what I don't want to do. While there's a certain amount of reason about not letting benefit to the community be held up by either greedy people trying to force a payday, or stick in the muds just refusing to acknowledge that things have changed, you can call it the epigenetic folk memory of forced eviction that evokes the reaction "if I own this, you can go to hell in a handbasket if you want to force me off it just because I'm not doing what *you* think I should be doing with it".

Sorry for the distraction about Georgism. I think it is fully general because it reflects the actual incentives: as one voter in millions, your personal incentive to telegraph your values far exceeds any cost you pay from the outcome not supporting them.

Now, you might think the best way to telegraph your values is to advocate for the thing which best satisfies them, but the problem is that this includes both factual and value components, and there is an unfortunate tendency for people to claim they share a value (eg care for the poor) but always happen to think the facts favor the thing they really care about (ooh look, the poor are always best off when you don't tax the rich). As such, there is a strong incentive for someone who wants to convey that they care about the poor to advocate a position that someone with the salient competing value would never take, even if it's not what they'd actually implement if they were given dictatorial control.

Indeed there are studies showing people's preferred policies differ systematically based on whether they have substantial control over the outcome.

As a voter, yes. When it comes to specialized day-to-day operational decisions, there's probably a conceivable method which is somehow worse than having hundreds of millions of people all get together and vote on every detail, but I'm having trouble thinking of what it might be. That's why we try to keep the "two angry mobs screaming at each other" approach as a last resort.

Money is a far more efficient way to allocate decisionmaking power, but, like blood in an organic body or hydraulic fluid in a machine, to work correctly it needs to be circulating through all the right places. Too much pooled together in unexpected spots tends to indicate a life-threatening leak.

Honestly, illustrated pornography is way better anyways. I have no idea why people want to watch two weird looking people have awkward, passionless sex.

Well some of us can't draw.

EDIT: Scott pretty effectively demolishes this objection below.

Regarding accuracy vs. humans, I don't think this is very interesting, since they can look at the best human forecasters' trades and simply try to improve on them. In other words, even in a contest on preselected outcomes for accuracy, the bots only need to do something a bit more complicated than look for an arbitrage opportunity. For instance, if they get to time the purchase, just act quicker than the human superforecasters on new news. Even if not, as long as these markets are largely about bragging rights and aren't liquid and efficient, there is every reason to believe even the best players won't put in the effort to give the absolute ideal valuation, and all the AI now needs to do is identify the best forecasters and some systematic error they make.

This is a good point. Obviously, in the real world, looking at other predictions is a good thing to do. But that means you don't have a very good metric of "raw" prediction ability.

author

They could do this in theory, but this isn't actually what they're doing.

(source: it's pretty open about its reasoning, plus it often gets results different from the best humans)

Thank you for pointing that out. It's a compelling counterpoint and I withdraw this objection.

If it's also not directly trained on past trades I'm even more impressed.

Thanks Peter. Yep, I run the futuresearch bots, confirming what Manifold wrote in the tournament announcement: "These bots will *not* attempt a portfolio strategy in this tournament. (Yours can!) They will simply periodically (~weekly) make new 5 mana bets on every question."

We also comment the probabilities the bots give on every question at the time we make the bet, so you can see all our past predictions.

>AIs are willing to work for less than $1 per hour, and have the knowledge of an encyclopedia - and if that's not enough, they can even be integrated with real-time web search capability.

Let us see if the accuracy can be improved first. I'm still seeing very wrong answers to the simple "What inorganic compounds are gases at STP?". No reference classes, no probability adjustments, no theory of mind needed, just a summary of uncontroversial facts. The latest answer (from Gemini) included water in the "gases". :-(

And I still haven't found a model that does even basic arithmetic consistently as soon as the problem is slightly different from what's in the dataset. As an example, I tried for hours across models to get the right answer to "How many times can 9/8ths be added to itself but not be higher than 24? Show your reasoning.", or variations on that to try to make it easier for the AI, and the numbers always go off the rails after a few additions, or it gives a hilariously wrong answer that's wrong even on its own reasoning. Repeated guidance to try to get it to recognise the error and correct it did not work.

Interestingly, what *did* work was getting it to create a program using variables for the number being added to itself and the target number, and to solve for the number of additions in that program - and then it got the right answer! It still couldn't explain the reasoning, though.
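
(Presumably the generated program amounted to something like this - exact rational arithmetic and a floor division instead of iterated addition; this sketch is mine, not the model's output:)

from fractions import Fraction

step, target = Fraction(9, 8), 24
n = target // step  # how many whole steps fit without exceeding the target
print(n)            # 21, since 21 * 9/8 = 23.625 <= 24 < 22 * 9/8 = 24.75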

Really? I just tried it on ChatGPT4 and it worked on the first try.

User

How many times can 9/8ths be added to itself but not be higher than 24? Show your reasoning.

ChatGPT

When 9/8 is added to itself, it cannot exceed 24 more than 21 times. This is because 24 divided by 9/8 equals approximately 21.33. Since we're looking for complete additions without exceeding 24, we can only add 9/8 to itself 21 times without going over 24.

(and it links to the following "code")

# Calculation
# To find how many times 9/8 can be added to itself without exceeding 24,
# we can divide 24 by 9/8.
# The formula for this calculation is: 24 / (9/8)
result = 24 / (9/8)
result

I then tried ChatGPT Classic (also based on GPT4) and it also worked on the first try:

ChatGPT Classic

To find out how many times \(\frac{9}{8}\) can be added to itself without the sum exceeding 24, we need to solve the following inequality:

\[n \times \frac{9}{8} \leq 24\]

where \(n\) is the number of times \(\frac{9}{8}\) can be added to itself. We can rearrange the equation to solve for \(n\):

\[n \leq \frac{24}{\frac{9}{8}}\]

Multiplying both sides by the reciprocal of \(\frac{9}{8}\), we get:

\[n \leq 24 \times \frac{8}{9}\]

\[n \leq \frac{192}{9}\]

\[n \leq 21.333\]

Since \(n\) must be a whole number (you can't add \(\frac{9}{8}\) a fraction of a time), the largest number of times \(\frac{9}{8}\) can be added to itself without the total exceeding 24 is \(21\) times.

This means you can add \(\frac{9}{8}\) to itself \(21\) times before the total would exceed \(24\).

Interesting, thanks - I'll need to go retest my various questions it got wrong previously.

For what it's worth, pretty much anytime I see anybody definitively claim that gpt4 cannot do something, I then almost immediately see somebody demonstrate that gpt4 can do exactly that thing

and yet the original claim was always genuinely made in good faith and it seems like there really is some kind of crazy bimodal disconnect

i think it's about whether or not you let yourself try different prompting techniques until you get the right answer, or whether you stop at the initial incorrect answer

But I don't really know if that actually reflects well on gpt4. Sometimes just demonstrating that a capacity is theoretically possible is enough to prove a point, other times a 99% failure rate is actually relevant

Either way, if you want to actually get some use out of gpt4, it's definitely worth at least trying some prompt engineering 101 tactics. i thought by now everyone knew about promising gpt4 a cookie, or asking it to take a deep breath before answering, or telling it to reason step by step and type out each step in the argument chain

stupid little tricks like that are hugely impactful, and you can go a lot further too

>For what it's worth, pretty much anytime I see anybody definitively claim that gpt4 cannot do something, I then almost immediately see somebody demonstrate that gpt4 can do exactly that thing

Both claims are true. Person 2's success doesn't mean Person 1's failure didn't happen. GPT4 both can and can't do the thing!

This is why it's still important to remember that LLMs are text-completers, not oracles with some divine access to truth, or even computer programs that reliably give the same output each time. You can get different results by changing a few meaningless words in a prompt.

>i think it's about whether or not you let yourself try different prompting techniques until you get the right answer, or whether you stop at the initial incorrect answer

LLMs can be extremely capable if you handhold them around specific issues. The problem is, it's often not clear what sort of handholding is required. Do you put spaces around each letter, to solve BPE encoding issues? Do you use RAG? Do you say "solve this problem and I'll give you $200"?

The promise of AI is that it automates human cognitive labor. This grows less true once the user needs a specialized set of "prompt engineering" skills.

Yeah this is sort of what I was trying to convey with my comment

Many Thanks! Ouch! I _have_ seen some math work, but I phrased it in a way that made it particularly easy for GPT4 to recognize that it was a math problem ("What are the roots of (a polynomial with roots that I knew)?"). It punted the mathematical work to a math subsystem, and, in my case, it did so correctly, but it has probably seen "What are the roots of..." in that exact form in its training data.

Wait, does that mean it was compiling and running a program based on user input? It seems like that could have some major failure modes...

> The latest answer (from Gemini) included water in the "gases". :-(

You know, I see a lot of gaseous water at temperatures and pressures where I expect water to be liquid. If you go out into some fog with a bucket of water, the bucket of water won't evaporate (any more than usual), but the fog will also stay fog.

All liquids also exist as gases, with a partial pressure dependent on the chemical, the presence of other chemicals, and the temperature.

The term "is a gas at STP" actually means "cannot exist as a liquid or solid at 1 atm and 20 degrees C"

Note: the fog is also liquid water. It's composed of sub-mm sized water droplets. There's water vapor there too, but it's invisible. Prove this to yourself: where does the water in the fog go when the fog dissipates?

I believe you, but I don't think the proof works. Where does the water in rain go when the rain dissipates?

Rain falls. Most of it reaches the ground, where it observably produces puddles etc. Some evaporates before it reaches the ground, which is sometimes visible in the form of rain shafts extending from clouds that don't reach the ground. However, sometimes what looks like rain shafts evaporating is actually snow melting into rain and becoming much less visually obvious. But yeah, the fog dissipation might be too slow to tell what's going on; a time lapse of cumulus clouds is better for seeing clouds form out of "thin air".

So... rain, which is easily perceptible in the air as globules of water, collects visibly on the ground and flows away, or into the ground.

And fog, which is imperceptible other than as a change in the color of the air, collects invisibly on the ground and soaks into the ground, or gets absorbed by local plants.

The model is wrong (well, not the part about getting absorbed by local plants), but it's not wrong in a way that "prove this to yourself" can show.

I suppose it depends on the manner of dissipation. It's common to speak of fog "lifting," which corresponds to the ground being heated by diffuse sunlight and the air adjacent to the ground warming first and evaporating the fog. As time goes on, the air higher up warms and the upper part of the fog dissipates.

That's the common mode of dissipation of radiation fog, anyway. West Coast stratus too, except all the action happens aloft in most places.

There may be a mode of dissipation where the fog gets shallower and shallower from the top down, so that it appears to sink into the ground, but I haven't witnessed it.

Many Thanks!

>I see a lot of gaseous water at temperatures and pressures where I expect water to be liquid.

There is certainly a finite vapor pressure of water at, e.g., room temperature. But asking whether something is a gas at a specific temperature and pressure is, as Eric fletcher implied, really asking whether the pure substance will be in the gas phase, rather than the liquid or solid phase, if it alone (no partial pressures!) is at that temperature and pressure.

In general, a pure substance at some arbitrary temperature and pressure (not the melting or boiling points or triple point) will be stable in a single phase. https://en.wikipedia.org/wiki/Phase_rule#Pure_substances_(one_component)

Re the criterion of embarrassment, it's not that the Gospel writers "must have" been telling the truth, but that it's likelier than not that they are, or at least that it raises the likelihood that they're telling the truth. (That it's talking about what's "likely", not what "must have" happened, is even the way it's framed in the Wikipedia article you linked to.)

What I'm not clear on when assessing that argument is if those things would have been embarrassing to people *at the time*. I have no idea how to check this.

You'd have to read a bunch of literature and correspondence from the time, ideally stuff that criticizes or mocks other people. That would teach you what was viewed negatively at the time, if you could find the material and slog through it.

In practice, this is easier (or harder) than it may look to latter-day snowflakes. As for Jesus, it is pretty much: baptism by John and death on the cross. The criterion does not lead to the conclusion "John baptised Jesus and angels appeared", but to: it was known that Jesus had been a follower of the more famous John (they had a similar message, for sure), and that 'irritating/embarrassing' detail about the "son of god" had to be 'explained'. Other such things are the role of women - who saw the empty grave first?! et al. Harder is to guess what happened but got deleted: did those women take part in 'the last supper'? (Not much reason to assume otherwise.)

My favourite on this is the leaven of the Pharisees; after crossing the lake, the disciples had forgotten to bring any bread with them. Jesus tells them to beware of the leaven of the Pharisees, and they go "Oh, he's angry because we forgot the bread".

Jesus then has to explain to them no, he means beware the hypocritical teachings of the Pharisees.

I'm not sure embarrassment is the best term. It's more that there are details that don't really fit a mythologized account. Like Jesus writing in the dirt, or Zaccheus being short and having to climb the fig tree. There's some sense of observing a real person, even if you think the miracles are fake. This leads to odd or weird details that a purer myth would omit.

I think C.S. Lewis said the disciples would have had to invent realistic novel-writing a millennium before anyone else if Jesus was totally faked. It doesn't mean it's true, but there was something observed by different people.

That is an argument people make, but it's separate from the criterion of embarrassment, which is specifically about details that make the author look worse or otherwise actively weaken the point ostensibly being made. The most common form of the argument goes that the Crucifixion must have been a real event, because no religious sect would fabricate a story about their own god being captured and killed by ordinary humans.

>“postrationalists” who were vaguely inspired by rationalist writings but also think the emphasis on facts is boring and autistic and we need to focus more on creativity/friendship/woo/intuition/vibes.

Right, because nothing is as boringly uncontroversial and fact-based as being extremely sure of an imminent robot apocalypse. A more fair description of the main disagreement would be that they prefer somewhat different woo/intuition/vibes.

Most of us, including Scott, aren't "extremely sure of an imminent robot apocalypse". Zvi thinks it's 2/3 unconditional IIRC; Scott's on record as saying 20% *given that AGI is built* (https://www.astralcodexten.com/p/pause-for-thought-the-ai-pause-debate). I personally am quite sure *given that superintelligence is built in the current paradigm*, but I tend to give about 30% unconditional (there are a variety of ways that AGI Not Soon could happen).

Eliezer Yudkowsky is extremely sure of an imminent robot apocalypse. I'd say "MIRI", but not even all of MIRI is as sure as he is.

And Yudkowsky is, arguably, still the main "thought leader" and the community's center of gravity, so in practice the real distinction between rats and postrats is whether they intellectually respect him.

The problem is that the rationalist approach to the humanist project got to the right answers very quickly as soon as someone tried:

1) Poor children are suffering and dying from easily preventable causes, let's stop that; and

2) Factory farming of animals produces an insane amount of suffering for relatively limited upside ***

However, these things are depressing to think about too much and people got bored of them (or, specifically, people whose job is talking instead of doing got bored of them; the doers are still working on it).

*** If you believe that consciousness emerges from a soul, and only humans have souls, you're exempt from #2 I guess

I used to reside firmly in the camp that AI won't be conquering humanity any time soon ... however I'm defecting. Earlier I couldn't see the mechanism by which AI held any sway over humanity outside chat rooms and whatnot. But I realize now that people will try to implement the [sarcastic tone here] {Perfect AI management scheme}, that we will do ourselves in by putting all management aspects of humanity on cruise control with AI, and that we will be paperclip-maximized into dystopia via a thousand little paperclip cuts.

For an example, listen to Brian Chau's podcast From The New World, where he interviews a guest who explains the UK Post Office fiasco. For those who don't know: in the UK there are a lot of little villages where one little shop sells a small set of everything, does a little banking, and manages the local Post Office. One Japanese company wrote the banking & PO software, and the client stores just ran with it. Only the accounting software was faulty, and there were many accounting inconsistencies. The scripting for the help desk - isn't it always the help desk - included the phrase "you're the only office having problems." And the police were arresting people for fraud they didn't commit. The software company was making bank, because they made 20k for every criminal fraud trial, and there were over a thousand criminal fraud trials. The police were happy, because they got a lot of easy convictions. And the UK doesn't have the robust class action system the US has, so there was no pulling a class suit. Eventually the abused people gathered an online quorum, sought financing, and sued the government and the software company - but not before thousands of hard-working people had been defrauded by the paperclip maximizer of the government, police, and software company. Many people's lives were devastated: they were bankrupted, many became homeless, some killed themselves, others died in poverty. Now things are better - the government has apologized, convictions were expunged, the software company has paid the victims 20k per claim - but not before a lot of paperclips were made out of people.

In the same vein, the pandemic toilet paper shortage was a paperclip-maximizer event. We still wiped the same number of asses, only we have two independent and non-mixable sources of toilet paper. There is the home system: we all go down to the market and buy a pack of TP. We use about the same amount every year; it's fully predictable, predicted, planned, scheduled; nothing changes, and the system doesn't have much leeway for change. We also have the janitorial-service system, with the same rigidity: your corporate or government office hires a janitorial supply company, which contracts out months in advance the large rolls you find in the workplace toilets. The same amount of paper went into each system; however, we stopped wiping with the janitorial-supply side and went almost exclusively to the home-supply side, which couldn't keep up with the demand. The machine couldn't combine or switch streams, hence the home-supply side ran out. This is the paperclip maximizer problem.

Just listen to Klaus Schwab and his WEF and how they plan to implement a global command economy, driven in his words 'not by democratic stakeholders, but by financial stakeholders' - which is to say, your elected leaders won't be calling the shots; the corporate leaders will ... for the global economy. So prepare to get maximized, or rebel.

"Private Eye" magazine covered it for years. Yeah, this is how AI will doom us - not because it becomes agentic and is not aligned with our values, but because greedy, stupid humans will push it out as a product to make money, other greedy, stupid humans will use it as a short cut to slash expenditure and make profit, and some ridiculous flaw will mean catastrophe because nobody human is watching, or is senior enough to step in and say "stop this now".

Look at the Canadian airline now being held liable for its chatbot 'customer service', which invented a refund policy and promised it to a customer. The airline tried to argue the chatbot is its own legal entity and none of their responsibility; the court was not having it. I think it serves them damn well right (trying to shave pennies off costs by replacing even cheap call-centre labour with a machine), but this is how it will go. AI is *not* ready to be used in the wild, due to the hallucination problem, but the greedy, stupid humans have no patience to wait until it is reliable.

https://www.forbes.com/sites/marisagarcia/2024/02/19/what-air-canada-lost-in-remarkable-lying-ai-chatbot-case/

> An AI that can generate probabilistic forecasts for any question seems like in some way a culmination of the rationalist project.

Yeah, in a "we've come full circle" kind of way. Rationalism was supposed to be all about justifying your estimates based on evidence, as opposed to blindly trusting some authority or indeed your own gut. And now it "culminates" in blindly trusting a black box to produce estimates for you. Progress!

I mean, it's not blind trust. That's kind of the whole point: there's money on the line and skin in the game, and what is most effective will hopefully tend to win.

Artificial intelligence has no artificial scarcity, unlike a group of people subject to a certain fixation (rhythms) in the workplace. In this context superabundance is perhaps not nearly as attractive as its name sounds. Material abundance is far more likely, since at that level there is still an interest in one's own fate and the fate of others. Superabundance refers to the concept of something beyond abundance - a space where life is witnessed by life itself. Human greed for love and money is an energy that compromises its own integrity to get what it wants, and that is its misfortune. Ascension is an incredible sequence of physical phenomena - living material for transcendence.

Did you miss the part where they graded its past forecast accuracy? Very different from "blind trust."

No, I did not miss it, I'm just looking a few steps ahead.

Actually it's the opposite. [FutureSearch person here.] With prediction markets / Metaculus, the best you can hope for is some isolated comments from predictors that almost never lay out the full chain of reasoning. FutureSearch does this every time - it cites its web research and the facts it finds, generates reference classes, lays out all the data points in its statistical models, and explains how it weights the base rates and why it makes adjustments, etc.

There are still gaps. Some deductions involve GPT-4 and so have some fundamental inscrutability. But I do think that this is the first-ever way of forecasting that is nearly completely transparent in how it works (and it uses the same methodology on every question).

Human forecasters are pretty mysterious when you get down to it.
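
(A toy schematic of the kind of aggregation described - not futuresearch's actual code; all numbers invented:)

base_rates = [
    (0.55, 2.0),  # (base rate, weight) from reference class A
    (0.70, 1.0),  # from reference class B
]
p = sum(r * w for r, w in base_rates) / sum(w for _, w in base_rates)
odds = p / (1 - p)
odds *= 1.3  # upward adjustment in odds space from question-specific evidence
print(f"final probability: {odds / (1 + odds):.2f}")  # ~0.66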

Sorry if I'm misunderstanding the algorithm - but isn't it in fact *primarily* based on GPT-4? Yes, it reports all the self-prompts it is generating, but these prompts are ultimately fed back through GPT-4. As a result, the process appears to be sensitive to the phrasing of the initial prompt (as Scott demonstrated), because the differences between possible prompt phrasings are amplified at each step.

That said, I do agree that human forecasters are "pretty mysterious" as well, as you've put it. I am not advocating that one should trust humans over FutureSearch; I'm saying that the Rationalist tendency to treat prediction markets and/or LLMs as some kind of oracles is not very (lowercase-r) rational, and one should not blindly trust predictions obtained by either means. Of the two, I would still trust humans a little more simply because their failure modes are more predictable than those of LLMs -- though perhaps this will change in the future.

I think we mostly agree - "rational" lines of reasoning are rarely as rational as they seem, and even in objective scorable arenas like forecasting, humans or LLMs or any other systems are pretty illegible. I think futuresearch is significant progress on this front, users can judge for themselves.

> FutureSearch’s AI tries to do something similar. It prompts itself with questions

Finally someone is contributing actual effort to producing reasoning ability from LLMs. We should pour much, much more effort into scaffolding and prompting of what we already have instead of training bigger and bigger models.

> Vitalik On AI Prediction Markets

An interesting idea, but as with every other idea for crypto application, it would work even better without crypto.

Also, yes, I think it's quite likely that regular market mechanisms will just eventually produce a leader from all these oracle bots, and then there would be little need for a prediction-market type of system aggregating the predictions of inferior models.

"An interesting idea, but as with every other idea for crypto application, it would work even better without crypto."

I'm going to steal this as a succinct and accurate stock answer.

As an autisty nitpick, crypto adds value to a system when you need to have the scarcity of some object be guaranteed by the unassailable mathematics of hard cryptography

i can think of a few cases where this is a desideratum. money is the big one ofc. but also names: a name's entire value is as a unique identifier, and i could see some value in ensuring that your name literally cannot possibly be taken from you or used without your permission

those are the only two use cases i can think of, though

I remember the period of time where people were trying to put everything on the blockchain, I remember seeing an 'httpcoin' webserver-on-a-blockchain where http requests would get served by proof-of-work miners in exchange for a cut of the generated httpcoin

I remember seeing somebody point out that when DPR wanted to make a drug market website utilizing cryptocurrency tech, he used a regular webserver and regular onion routing and regular forum software and just had the currency be bitcoin. he didn't invent a whole silkroadcoin protocol and try to have the shipping address decryption key locked behind an arbitrary mining constraint

I've thought about that ever since, every time I see a new crypto usecase... why are people so obsessed with importing energy-expensive proof-of-scarcity into their products, it is so utterly bizarre

It's a bit like nfts. If people were going around forging certificates of ownership all over the place for expensive possessions, I could see why it might be useful to set up some kind of decentralized CoO registry where such certificates were impossible to forge. But, uh. There's like, a couple cases of certificate forgery per decade and they are very easy to sort out.

On top of that, it seems really obvious that what people are really paying for is the certificate itself, and not the underlying property, which is so pants on head backwards I almost can't believe it

i am so glad people eventually realized this and NFTs are dying

sorry for the rant. I just really like pointing out that namecoin was probably a really good idea and should have caught on, modern DNS sucks and namecoin seemed like a strict improvement

>"As an autisty nitpick, crypto adds value to a system when you need to have the scarcity of some object be guaranteed by the unassailable mathematics of hard cryptography

i can think of a few cases where this is a desideratum. money is the big one ofc."

Nit on nit: the problem I've had with all such attempts is that the pro-crypto crowd subscribes to a similar sort of monetary economics as the goldbugs do; BTC's fixed finite potential quantity (especially when coupled with effectively-permanent losses of many coins) is a recipe for deflationary disaster.

I think BTC is an interesting cross between an investment and a currency. It strikes me as somewhat similar to "e-gold", and I wonder if it will have a similar fate.

https://en.wikipedia.org/wiki/E-gold

To get cryptographically verifiable unique identities, you don't need a blockchain; you can just use normal public-key cryptography. This doesn't get you human-readable names, and that seems like the sort of thing for which there might be a blockchain-based solution (you at least need some method involving more communication when assigning names, to ensure they're unique), but I haven't thought it through, and it still feels to me like a case where the value added by the blockchain over simpler methods is dubious.
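
(A minimal sketch of the keypair-as-identity point, assuming the pyca/cryptography package - no blockchain anywhere:)

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()  # held secretly by the identity's owner
public_key = private_key.public_key()       # published; this *is* the identity
message = b"signed statement from this identity"
signature = private_key.sign(message)
public_key.verify(signature, message)       # raises InvalidSignature if forged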

I highly suggest looking into namecoin, then!

it's interesting because it failed as a cryptocurrency, but also the value it provides doesn't come from the value of the coin itself, it comes from the actual functionality of the coin as a mechanism by which domain names can be allocated, so the price of the coin doesn't actually matter

and, unlike with certificates of ownership, centralized control of domain names by ICANN et al actually does trouble me a great deal and i would greatly prefer a decentralized system

it really does seem like a genuine example of blockchain technology being utilized in a way that adds a great deal of value

But nobody was really interested in it. Since the value of any given coin was equivalent to the value of the domain name it represented, you couldn't just speculate on generic namecoin, which meant it didn't get to take part in the enormous crypto bubble. So it just kinda died on the vine.
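
To make the value proposition concrete, the core service is roughly a first-come-first-served name registry where only the registrant's key can change a record. A toy sketch (an illustration of the idea, not Namecoin's actual protocol, with the hard part, decentralized consensus on the registry, waved away):

```python
# Toy first-come-first-served name registry. The blockchain's real job is
# getting everyone to agree on this dict without a central operator.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

registry = {}  # name -> (owner's public key, current record)

def register(name, owner_pub, record):
    if name in registry:
        raise ValueError("name already taken")  # first come, first served
    registry[name] = (owner_pub, record)

def update(name, signature, new_record):
    owner_pub, _ = registry[name]
    owner_pub.verify(signature, new_record)  # only the key holder may update
    registry[name] = (owner_pub, new_record)

key = Ed25519PrivateKey.generate()
register("example.bit", key.public_key(), b"ip=203.0.113.7")
update("example.bit", key.sign(b"ip=203.0.113.8"), b"ip=203.0.113.8")
```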

Expand full comment

>i can think of a few cases where this is a desiderata. money is the big one ofc. but also names: a name's entire value is as a unique identifier, and i could see some value in ensuring that your name literally cannot possibly be taken from you or used without your permission

>those are the only two use cases i can think of, though

There are a couple of name-like things which also need (near?)-global uniqueness:

ISBNs, UPCs

Also, for rivalrous, fixed-supply goods like land, valid title to the good needs some flavor of uniqueness.

Expand full comment
author

I've heard claims that it's going to be hard to give AIs access to bank accounts that will let them make split-second bets, and easier to give them access to crypto. I don't know if this is true and would welcome a perspective from someone with bank regulation experience.

Also, gambling remains semi-illegal, and crypto sometimes is better at circumventing that than fiat.

Expand full comment

I don't know about the technical details, but this might be of interest wrt your question:

https://manifold.markets/SteveSokolowski/will-my-stock-trading-market-exceed

Expand full comment

> I've heard claims that it's going to be hard to give AIs access to bank accounts

Which would be good from the AI safety perspective, don't you agree? I find it quite unfortunate that crypto allows this restriction to be circumvented.

> Also, gambling remains semi-illegal

If no humans are participating, in what sense is it gambling? AIs do not need real money as an incentive to make their predictions. You just pay your subscription fees to get access to the oracle service and ask whatever questions you want. Then the system can abstractly reward the AIs that happen to perform better. Though even this seems too anthropomorphic; probably the AIs that perform better will just be used as the base for the next generation of oracles.

Expand full comment
author

I don't think serious AI safety concerns are bottlenecked on bank accounts. Any actually dangerous AI can figure out a way to circumvent security or get a human confederate.

The system I'm imagining is one where any company can program an AI (eg FutureSearch above) and enter it into the market. The company will need some financial incentive to do that.

Expand full comment

> I don't think serious AI safety concerns are bottlenecked on bank accounts. Any actually dangerous AI can figure out a way to circumvent security or get a human confederate.

Sure, but that's no reason to make the situation even easier for misaligned AIs. Imagine how pathetic it would be to go extinct not due to a superintelligent general AI but due to something only slightly more intelligent than humans in its own domains, not even a fully general AI.

As long as we do not have any really good bottlenecks, every one of the bad ones counts: layers and layers of imperfect security, in the hope that it buys us enough time to come up with something better.

> The system I'm imagining is one where any company can program an AI (eg FutureSearch above) and enter it into the market. The company will need some financial incentive to do that.

As I said, the financial incentive is the subscription fees that users pay. Companies do not need to enter the prediction markets for humans; there can be aggregator platforms that allow different predictor bots to answer your questions, though I suspect even this wouldn't really be necessary, because it would be fairly obvious which oracle is better than the others, and people would just ask it directly.

Expand full comment

Presumably it would be the company running the bets (using an AI to do the betting), right? So there's no barrier to it just having a bank account, like any hedge fund that uses AI to buy stocks.

On gambling remaining semi-illegal: probably, but it's unclear whether you could actually avoid this problem at scale by using crypto (you can set up offshore, but Polymarket already does that without the crypto). There's a medium scale where the crypto element makes things just complicated enough for regulators not to bother charging you with anything, but that's not scalable and is hard to rely on long term.

Expand full comment

What question(s) caused the Gemini and Claude bots to go sharply into negative profit around January 8?

Expand full comment

I've posted about this on the Open Thread, but it seems worth repeating here.

As seen above, it is currently possible to buy No shares on Trump winning the presidential election at 46c on Polymarket (representing a 54% chance that he wins). I note that one market participant has bet $907k that Trump will win and now holds 27% of the shares. Based on their activity, it seems plausible this person has appetite to bet more.

Meanwhile on Betfair Exchange, odds of 5/4 are offered on Trump winning (representing a 44% chance that he wins). It would be possible to bet $3,750 at those odds right now, but I would be confident that a modestly larger bet would be matched fairly quickly.

The resolution criteria are slightly different. Polymarket has: "The resolution source for this market is the Associated Press, Fox News, and NBC. This market will resolve once all three sources call the race for the same candidate. If all three sources haven’t called the race for the same candidate by the inauguration date (January 20, 2025) this market will resolve based on who is inaugurated."

Betfair has: "This market will be settled according to the candidate that has the most projected Electoral College votes won at the 2024 presidential election. In the event that no Presidential candidate receives a majority of the projected Electoral College votes, this market will be settled on the person chosen as President in accordance with the procedures set out by the Twelfth Amendment to the United States Constitution. This market will be settled once both the projected winner is announced by the Associated Press and the losing candidate concedes. If the losing candidate does not concede, or if there is any uncertainty around the result (for instance, caused by recounts and/or potential legal challenges), then the market will be settled on the winner decided by Congress, on the date on which the Electoral College votes are counted in a joint session of Congress. This market will be void if an election does not take place in 2024. If more than one election takes place in 2024, then this market will apply to the first election that is held. Once voting (whether postal, electronic or at the ballot box) begins in the year 2024 for the US Presidential Election 2024, the election will be deemed to have taken place for the purposes of this market. We will then settle the market as per our rules regardless of whether the election process is fully completed in 2024 or beyond. If there is any material change to the established role or any ambiguity as to who occupies the position, then Betfair may determine, using its reasonable discretion, how to settle the market based on all the information available to it at the relevant time."

In most cases I imagine those criteria will produce the same outcome, but if anything I would expect Trump to be slightly more likely to win by the Betfair criteria.

So suppose somebody bet $5k at 5/4 that Trump would win. At the same time they could spend $5,125 buying No shares at 46c, which gets them 11,141.3 shares.

If Trump wins, the Yes bet pays $6,250; less 2% commission, that's $6,125 (NB the bets are actually in GBP; I convert to USD for simplicity). The player loses their $5,125 No stake, so they are $1,000 up overall.

If Trump loses, the No shares pay out $11,141.30, a profit of $6,016.30 (actually in USDC). The player loses their $5k Yes stake, so they are $1,016.30 up overall.

Ignoring transaction costs, this represents a return of 9.9% on $10,125 over a period of less than a year.

Of course transaction costs might be significant, and the player is also exposed to some risks, including at least: (1) USD:GBP currency fluctuations, (2) the stability of USDC, (3) the reliability of both exchanges and (4) the possibility of differing resolutions. Nevertheless, the arbitrage seems potentially profitable to someone able to transact on both exchanges at reasonable cost and who was sanguine about the risks.

It seems a bit surprising that two exchanges, each with c. $6m of bets on this outcome, have come up with probabilities differing by as much as 10 percentage points.
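
For anyone who wants to check the arithmetic, here it is as a few lines of Python (same figures as above; transaction costs and the risks listed are ignored):

```python
# Sanity check of the two-sided position described above.
no_stake = 5125.00            # USD spent on Polymarket "No" shares at 46c
shares = no_stake / 0.46      # ~11,141.3 shares, each paying $1 if Trump loses

yes_stake = 5000.00           # USD bet on Betfair at 5/4
winnings = yes_stake * (5 / 4) * (1 - 0.02)    # 2% commission on net winnings

profit_if_wins = winnings - no_stake            # Betfair pays, Polymarket loses
profit_if_loses = (shares - no_stake) - yes_stake

outlay = no_stake + yes_stake
print(profit_if_wins, profit_if_loses)          # 1000.0, ~1016.3
print(min(profit_if_wins, profit_if_loses) / outlay)   # ~0.099, i.e. ~9.9%
```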

Expand full comment

There are also taxes to consider. You'll have to pay tax on the full profit on your winning market, but you may not be able to deduct the loss on the losing market. If not, you'll make a loss even at a 10% pre-tax return. E.g., in the US you have to qualify as a professional gambler to be able to deduct the losses.

Expand full comment
Feb 21·edited Feb 21

In the UK, the profit is not taxable (precisely because HMRC don't want people to deduct the losses, since they expect most gamblers to be loss-making). This potentially increases the attraction of the scheme, since the theoretical 9.9% return would be untaxed, and the counterfactual return on depositing the cash would be taxed in most cases.

You are right that somebody planning to do this would need to understand the tax consequences in their own jurisdiction. I am not able to say whether the UK or US rules are more typical. The US tax position is unlikely to be of direct relevance, since it is (AIUI) illegal to bet on the outcome of elections in the US.

Expand full comment
Feb 20·edited Feb 20

I like the new robot mantis for Mantic Monday 😀

"Instead, imagine a search engine where you can pay 0.01 ETH to ask a question like “Will Prospera have a population of at least 10,000 people by 1/1/2026?”

Or I could save my money and the hassle of trying to work out how the hell to use cryptocurrency and just wait two years to find out (my guess? No).

That video was... well, it was a video. "Some kind of extremely well-done" - let's just say opinions vary on that. I don't like musicals/Broadway numbers, so from the opening it certainly lost me. I fast-forwarded (apologies), and there were points where I was going "what in the name of God am I watching?" and I have no idea who won a date, if anyone, but this is not selling me on prediction markets. Still, bully for all willing to stand up on a stage and make an exhibition of themselves!

EDIT: Oh! Is anyone running a prediction market on this year's Eurovision Song Contest? We've got politics galore with Israel and Ukraine, as well as both Armenia and Azerbaijan being contestants (as of right now, at least)! This is the Irish entry, which leaves me 🙄

https://www.youtube.com/watch?v=n73nIfFI3k4

EDIT EDIT: Since I've scorned the rationalist singing and dancing, let me link to something equally wince-inducing from the Irish Catholic side (80s Irish youth groups had a Fr. Noel Furlong in charge in reality, this is not satire or comedy so much as documentary):

https://www.youtube.com/watch?v=lk_-E2eBrrQ

Expand full comment

> I like the new robot mantis

Humanity finally created a counterpart for itself, a true mate, but then...

Expand full comment

The Irish entry hits me the same way, but the backup dancers look like something out of "Pan's Labyrinth". Maybe that's good for them?

Expand full comment

It's maybe good for Eurovision, in that dated pop genres are 'modern' in that context, but I do get a "yes, dear, very 90s Marilyn Manson" vibe from it. It's not really shocking or novel. It'll do middling okay, *may* make it to the final, but not much else.

Expand full comment

The laugh track is completely preventing me from watching the video.

Expand full comment

>(I don’t know why it went down so much from late 2023 to early 2024.)

Probably just a general expression of no major public-facing AI improvements arriving during that period, plus the hype hangover setting in.

Expand full comment

"..there’s a group called the “postrationalists” who were vaguely inspired by rationalist writings but also think the emphasis on facts is boring and autistic and we need to focus more on creativity/friendship/woo/intuition/vibes."

- Every ideological group that grows to a certain size reproduces within itself the divisions that separated it from its ideological competitors at an earlier and smaller stage of its existence.

...a sociological law that has yet to be named.

Concerning why groups, if they grow (rather than imploding and becoming sects), may tend to reproduce the cultural/ideological distinctions that earlier set them apart from other groups, a possible mechanism is that members of the group compete to shine in the eyes of other group members. And one way to shine is to take an idea from outside the group, twist it so that it can sort of be seen as an innovation within the group, and then use it to signal some kind of locally superior intellect/coolness/whatever.

Just a speculative hypothesis, of course. I do not know if it can help explain the emergence of post-rational rationalists.

Expand full comment

Possibly relevant: https://meaningness.com/geeks-mops-sociopaths

Expand full comment

This is interesting. Well written, also.

...I am reminded that the successful faction of the Russian communist party (the Bolsheviks) arguably started out as a political subculture in this sense. (Which went on to become the Mother of all influential communist parties, such as the Chinese and Cuban parties. Makes you think.)

For what it is worth, the law came to me in student days, when a friend remarked "the marx-lenin-stalin-mao party is so dominant at our university right now that the most important ideological divide is not between them and the rest of us, but right through their own group."

…more generally, studying small-group or sub-culture interaction is a fruitful way to get a grip on the importance of the “social” in the biopsychosocial approach to human behavior. Group dynamics are not captured by only studying individual psychological factors (and/or genetics).

Expand full comment

In the first three WarCraft games (before the MMO version) I noticed that no matter what happens, the conflict is eternal and balanced in the long term. When a group wins, it splits into two competing factions. When a group loses, it finds a new ally.

For example, the humans finally repel the orc invasion, but instead of peace, you simply get the conflict between Prince Arthas and Uther the Lightbringer. Then Arthas joins the undead. Then the undead start fighting among themselves...

Expand full comment

> (I don’t know why it went down so much from late 2023 to early 2024.)

May have been related to AI-generated movies being released? I know of one (Scalespace) which got screened around that time.

Expand full comment

The Nermit bot -- the one that does future search -- would it be possible now to make a version that addresses questions of personal interest? For instance, deciding whether to rent for another couple of years or buy, and if the latter, where to buy. Relevant issues are what home and rental prices are likely to be in a few years in various parts of the state, future changes in interest rates, building projects that are in the works in different areas, etc.

Seems like the steps you'd have the AI take would be pretty much the same ones Nermit takes in formulating predictions, except that you'd need to enter relevant personal data about savings, etc., and maybe tell it where to look to find relevant info about whattup with housing in one's area.

Jeremy Howard teaches a bunch of courses in what he calls "Fast AI." Students learn ways to train an AI that can be done on a home computer. I was looking at the forum for students in a beginner's class, and there was a farmer training his AI to recognize areas with a lot of weeds from photos of his land. That has me thinking about personal uses of AI.

Expand full comment

Hi there, great question. Yes, we at FutureSearch test by forecasting personal questions all the time. Our main forecasting approach, finding historical analogies and extrapolating outcomes, doesn't usually work on these types of questions.

Other approaches we've tried do better, like doing Fermi estimates or writing and executing simulations. But those aren't our focus now; we're trying to do as well as we possibly can on near-term geopolitical questions. Like you, and like Scott, we'd like to expand from there at the right time.
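
For flavor, here's a crude sketch of what a Fermi-style pass at the rent-vs-buy question above could look like; every number is an invented placeholder, not a forecast:

```python
# Crude Fermi-style rent-vs-buy sketch. All inputs are made-up placeholders;
# a serious version would model amortization, taxes, and local price paths.
years = 5
rent_total = sum(2500 * 12 * 1.03 ** y for y in range(years))  # 3%/yr rent growth

price, down = 600_000, 120_000
loan = price - down
carry = loan * 0.065 + price * (0.011 + 0.010)   # interest + tax + maintenance
final_value = price * 1.03 ** years              # 3%/yr appreciation

own_total = years * carry                        # cash burned while owning
own_total -= final_value - price                 # offset by appreciation gained
own_total += final_value * 0.06                  # plus 6% selling costs at exit

print(f"rent ~ ${rent_total:,.0f} vs own ~ ${own_total:,.0f} over {years} years")
```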

Expand full comment

Duncan Horst ... Traditionally the name Duncan means "Dark Warrior" and Horst is German for Ghost. So this name more properly means Dark Warrior Ghost.

Expand full comment
Feb 20·edited Feb 20

I hate love reality shows, San Francisco, and polyamory; like rationalists, tech, and betting; and am neutral with high variance on Aella. Should I watch Bet on Love?

Expand full comment

(Disclaimer: I'm a research scientist at FutureSearch.ai)

Regarding

> it doesn’t sound too outlandish that you could apply the same AI to conditional forecasts, or to questions about the past and present (eg whether COVID was a lab leak

We found that, somewhat surprisingly, reference class forecasting quite generally goes a long way. Asking "Was COVID-19 a lab leak?", for example, we get some decent, or at least interesting, reference classes like

> Portion of "Governmental or authoritative bodies have assessed that a high-consequence event is likely the result of a specific type of human error or accident." where "Subsequent investigations or consensus within a period of up to 5 years confirm the initial assessment as the primary cause of the event.": 55%

(based on events like the Deepwater Horizon oil spill (2010), Space Shuttle Columbia disaster (2003), Chernobyl disaster (1986), Three Mile Island accident (1979), Banqiao Dam failure (1975), Grenfell Tower fire (2017), Boeing 737 MAX groundings (2019), and 13 more examples)

and

> Portion of "Pathogen samples have been identified at a site associated with the initial outbreak of a disease" where "The disease is determined to have originated from that site within 5 years of the initial outbreak": 68%

(based on viruses like Zika, Ebola, various avian influenza viruses, and H1N1)

as well as some nonsensical ones.

Also, just to confirm Scott's conjecture: in the hope of finding at least one good approach, we do try many different ways of tackling a question (many of which end up rather underwhelming, granted) and then try to give more weight to the better ones. Of course there is still a lot of work to be done, since we can do a lot better than just asking GPT-4 what's good and what's not.
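
To make the weighting idea concrete, here's a minimal sketch of one way to pool reference-class base rates. This is an illustration, not FutureSearch's actual aggregation, and the classes and weights are invented:

```python
# Illustrative pooling of reference-class base rates via weighted log-odds.
# A 50% class contributes nothing, and weights encode judged relevance.
import math

def logit(p):
    return math.log(p / (1 - p))

classes = [
    (0.55, 0.6),   # "authoritative assessment later confirmed"-type class
    (0.68, 0.3),   # "pathogen found at outbreak site"-type class
    (0.50, 0.1),   # a nonsensical class, downweighted toward ignorance
]

total = sum(w for _, w in classes)
pooled_logit = sum(w * logit(p) for p, w in classes) / total
estimate = 1 / (1 + math.exp(-pooled_logit))
print(f"pooled estimate: {estimate:.2f}")   # ~0.59 with these made-up inputs
```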

Expand full comment

It would be nice if you could attach one of these AI probability generators to all AI answers. Part of the problem with current AI is that it's so often confidently *wrong*. It spits out all answers with the same tone of total certainty, so I can't rely on it for anything unless I have some other way to check it. A probability, even a rough estimate, would be an easy way to know when an answer is worth paying attention to and when it should be disregarded.
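
Even a thin wrapper would help. A sketch of what I mean, where `ask_model` is a hypothetical stand-in for any LLM call that returns an answer plus a self-reported probability:

```python
# Hypothetical wrapper: refuse to present answers below a confidence floor.
# ask_model is a stand-in; a real system would need calibrated probabilities,
# since self-reported confidence from an LLM is often poorly calibrated.
def reliable_answer(question, ask_model, threshold=0.8):
    answer, p = ask_model(question)
    if p >= threshold:
        return f"{answer} (confidence {p:.0%})"
    return f"Low confidence ({p:.0%}); best guess: {answer}"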

Expand full comment

While I take Scott's point about the Swift Center's question, I have to admit that I interpreted the question the same way the Swift Center did. It's not clear to me that most people would think to interpret it in any other way!

Expand full comment
Feb 21·edited Feb 21

I've only seen the first few minutes, but the musical looks pretty funny as a self-satire of the rationalist community. Everyone is from San Francisco, and it hits all the usual jokes about Yudkowsky, shibari, polyamory... I think people here, at least, would enjoy it.

Expand full comment