943 Comments
Comment deleted
Feb 13
Tatu Ahponen's avatar

I want it to give me a list of 5 books useful for analyzing Finnish political parties from the point of view of their views on technology and AI for a bachelor's thesis.

Tatu Ahponen's avatar

Thanks! I probably should have specified that I'm mainly looking for sources in Finnish, but of course English-language sources are useful too.

The books appear to exist, but I spotted two issues: insofar as Google tells me, the reports in 4. are at least cited to be drawn up by a committee, not specific editors, and "The Tandfonline article "Nordic lights? National AI policies for doing well by doing good" (2021)" is a bit strange (is it normal to refer to Taylor & Francis as "Tandfonline"?), and the article seems to have been written in 2020, not 2021.

ProfGerm's avatar

>is it normal to refer to Taylor & Francis as "Tandfonline"?

I've seen that before, though it's more freshman-undergrad than I've come to expect from Claude. Formatting tends to be one of its strengths, other than inserting Chinese characters into English conversation.

In my experience with Opus 4.6 it usually creates really accurate citations, and doesn't have Sonnet 4.5's propensity for occasional Chinese characters as adjectives.

Tatu Ahponen's avatar

Sure, but when it comes to formal citations?

ragnarrahl's avatar

I don't think the prompt asked it to be formal.

Tatu Ahponen's avatar

Sure, it didn't, but *contextually* someone who is asking it like this is most likely writing a bachelor's thesis, which I am in fact in the process of doing, and would thus require formal cites.

Notmy Realname's avatar

If you wanted Finnish books in response, you'd probably have substantially better luck running the initial query in Finnish. Claude can speak Finnish (and any other language), but generally won't respond in a language other than that of the input prompt.

meeeewith4es's avatar

It will, as long as you prompt it to. Won't share a claude.ai link as it includes the account name, but just "Recommend 5 books about political parties in Finland. Respond in Finnish." got me:

> Tässä viisi suositusta Suomen puolueista kertovista kirjoista:

> 1. "Suomen puolueet: Historia, muutos ja nykypäivä" - David Arter

> ...

Result quality may vary if you do this, either for the worse (context in multiple languages) or for the better (as English is overrepresented in training data).

Tatu Ahponen's avatar

"Suomen puolueet: Historia, muutos ja nykypäivä", which I should be getting next, is by Rauli Mickelsson, not David Arter.

meeeewith4es's avatar

This was with Claude Sonnet 4.5, here's what Opus 4.6 gives (noting that my prompt is simplified for demonstration to the other user, "Recommend 5 books about political parties in Finland. Respond in Finnish."):

Tässä viisi kirjaa Suomen puolueista ja puoluepolitiikasta:

1. **Rauli Mickelsson: *Suomen puolueet – Vapauden ajasta maailmantuskaan*** (Vastapaino, 2015) – Kattava yleisesitys Suomen puolueiden historiasta ja kehityksestä. Yksi parhaita suomenkielisiä kokonaisesityksiä aiheesta.

2. **Juhani Mylly: *Political Parties in Finland: Essays in History and Politics*** – Kokoelma esseitä, joka käsittelee suomalaista puoluejärjestelmää laajasti, mukaan lukien fennomaanit, maalaisliitto, kokoomus, SDP ja vasemmistoliitto eri aikakausina. [Google Books](https://books.google.com/books/about/Political_parties_in_Finland.html?id=0JpKvgAACAAJ)

3. **Lauri Karvonen: *Parties, Governments and Voters in Finland: Politics Under Fundamental Societal Transformation*** (ECPR Press, 2014) – Kirja tutkii, mitä tapahtuu puolueille, hallituksille ja äänestäjille, kun puoluejärjestelmän perustana olevat yhteiskunnalliset rakenteet muuttuvat nopeasti. Karvonen osoittaa, että Suomi on erityistapaus erityisesti ideologisen kilpailun heikkenemisen ja toistuvien populististen protestiaaltojen osalta. [Google Books](https://books.google.com/books/about/Parties_Governments_and_Voters_in_Finlan.html?id=qIUargEACAAJ)

4. **Karvonen, Paloheimo & Raunio (toim.): *Changing Balance of Political Power in Finland*** (Santerus Academic Press, 2016) – Kokoomateos, joka tarkastelee poliittisen vallan muutoksia Suomessa vuosina 1970–2015.

5. **Kari Katajisto: *Väyrysen ja Kääriäisen korkean profiilin aika 1981–1991: Keskustan historia 6*** (SKS, 2023) – Tutkimus analysoi, miten keskustapuolue selvisi kohtaamistaan haasteista Kekkosen jälkeisellä aikakaudella, kun puolue oli syvästi jakautunut ja sen perinteinen tukijoukko pieneni jatkuvasti. [Finlit](https://oa.finlit.fi/site/books/m/10.21435/ha.155/)

Nämä kattavat aihetta sekä laajasti että yksittäisten puolueiden näkökulmasta. Pari näistä on englanniksi, mutta Mickelssonin ja Katajiston teokset ovat suomeksi. Haluatko tarkempia suosituksia jostain tietystä puolueesta tai aikakaudesta?

NotG's avatar

Is this a question search can't answer? From Google search:

AI Overview

Analyzing Finnish party views on AI/tech for a thesis requires combining general political behavior studies with digitalization policy reports, as few single books cover both exclusively. Key resources include Political Behaviour in Contemporary Finland for party dynamics, alongside AI policy documents from VTT and Sitra.

Here are 5 recommended resources and approaches to analyze Finnish party stances on technology and AI for a bachelor's thesis:

1. Recommended Resources & Contextual Literature

"Political Behaviour in Contemporary Finland" (Ed. by Hanna Wass & Peter Söderlund)

Relevance: Provides the foundation for understanding party competition, voter behavior, and how parties adapt to new issues, crucial for analyzing tech adoption.

2. "Comfortably theirs: The Finns Party Members and Their..." (Lund University Student Paper / Thesis context)

Relevance: Useful methodological example of analyzing specific party member ideologies and trust issues, which can be applied to tech skepticism or enthusiasm.

3. Sitra Reports on Digitalization and AI (e.g., "AI in the Finnish Public Sector")

Relevance: While not a traditional book, Sitra's research is essential for mapping how policy, including party-driven initiatives, shapes AI adoption in Finland.

4. VTT Technical Research Centre of Finland - Technology Policy Studies

Relevance: Reports on technology trends and societal impacts often reflect the consensus or debates among political actors regarding regulation vs. innovation.

5. Finnish Government Reports on AI ("Finland's Age of Artificial Intelligence")

Relevance: The 2017–2019 AI steering group reports, chaired by Pekka Ala-Pietilä, provide the baseline strategy that all parties responded to, revealing ideological differences in implementation, regulation, and ethical concerns.

Tatu Ahponen's avatar

I ran essentially the same query through ChatGPT and it hallucinated several results. I wanted to see if Claude can do it without hallucinations.

ibnukhaldun's avatar

Based on recent linguistic, archeological, and genomic evidence, what are the most plausible theories explaining the origins and spread of the Proto-Semitic language?

ibnukhaldun's avatar

I'm in the more AI-skeptical bucket and I am not surprised or impressed by that answer. It's overconfident (evidence doesn't appear to be "converging" on anything yet), overweights unreliable methodologies (linguistic phylogenetic analysis has proven really inaccurate in other contexts that I'm more familiar with), and fits things together in a way that doesn't seem coherent (e.g. mismatching dates).

If the test is whether I could get a better answer myself in an hour using Google scholar, I'm confident I could.

Igon Value's avatar

It seems to me that experts *always* have issues with what's emphasized too much or not enough. Claude mentioned "convergence" and "leading hypothesis", but that doesn't seem obviously false to me. Maybe you find these assertions misleading? Framing matters, obviously. Claude definitely mentioned that there was no consensus:

"No consensus exists for placing the linguistic homeland of Proto-Semitic, and the question of when and where Proto-Semitic itself diverged from Afroasiatic remains unresolved."

I love this exercise but I strongly suspect it won't convince anybody one way or the other.

Marcus A's avatar

Experts always disagree, in any field. There is a saying: "10 doctors - 11 opinions"; "10 engineers - 11 opinions"; "10 lawyers - 11 opinions"; and finally "Too many cooks spoil the broth". For the first 10 years of Wikipedia, experts were not convinced either. Now look at who is still using any expert-authored encyclopedia. The same is happening with LLMs.

Kenny Easwaran's avatar

Experts still use expert authored encyclopedias!

At least, for anything in philosophy, if there’s a Stanford Encyclopedia entry, it’s usually much more useful than the Wikipedia entry. The Wikipedia entry often takes a slightly odd angle (as if a computer scientist had read a few philosophy papers) rather than giving the disciplinary perspective. That’s sometimes useful, but usually not what I want.

Still, Wikipedia has the huuuge advantage that there’s an article for basically everything, while SEP only has a few hundred (or a few thousand?) articles.

ibnukhaldun's avatar

This sentence is the biggest issue: "The converging picture from multiple disciplines is that Proto-Semitic most likely coalesced in or near the Levant during the late 4th millennium BCE, with subsequent expansions driven by Bronze Age population movements, pastoralist migrations, and aridification events."

That's the closest the response comes to a concise answer to my question, and it is a ridiculous overclaim. Two studies is not a "converging picture from multiple disciplines," particularly not when you go look at the studies in question. One uses a really controversial methodology and the other is tentative and difficult to interpret. And I also asked about archeology but it didn't touch that data at all.

For what it's worth, I'm not an expert by any stretch, just an interested layperson familiar with how the equivalent question was answered for a couple of other protolanguages.

Dustin's avatar

FWIW, as a heavy AI user, I wouldn't ever expect a one-shot answer to questions like these. I'd be probing the assumptions and lots of other things in a back-and-forth that would probably consist of 10-40 turns.

Robert Jones's avatar

FWIW, I shared these comments with Claude (and also Dustin's comment above), and it said:

Taking the specific criticisms in turn:

The "converging picture from multiple disciplines" line is genuinely overcooked. It's leaning on Kitchen et al. (2009) for the linguistics and Almarri et al. (2021) for the genomics, and presenting two studies from two fields as though they constitute a broad interdisciplinary consensus. That's the kind of confident-sounding synthesis that reads well if you don't know the area but looks sloppy if you do. The commenter is right to flag it.

The point about Bayesian phylogenetic analysis is also well-taken. The commenter says they're familiar with how it's performed in other contexts — they're quite possibly thinking of Bouckaert et al.'s Indo-European homeland study, which used similar methods and was heavily criticised by historical linguists. Presenting Kitchen et al. as the basis for a "leading hypothesis" without flagging that the methodology is contested is exactly the kind of thing that would make a knowledgeable reader distrust the whole response.

The archaeology point is the most straightforwardly damning: the question explicitly asked about archaeological evidence, and the response barely touches it. There's a passing reference to Saharan pastoralist migrations but no systematic treatment of, say, the Early Bronze Age archaeological record in the Levant or Mesopotamia. That's a genuine omission, not a matter of framing.

So the commenter is right that this is a response that looks authoritative to a non-specialist but has real problems visible to someone who knows the terrain. And that's a perfectly good reason for that particular person not to be impressed.

But — and this is where I think the point about iterative use matters — the response is a reasonable starting point. If you treated it as a first draft and pushed back on the overconfident synthesis, asked specifically about the archaeological record, questioned the reliability of the phylogenetic methodology, you'd get a much better result. The commenter's benchmark of "could I do better in an hour with Google Scholar" is fair for a one-shot comparison, but it's not how the tool works best, and that gets to the heart of the gap Scott is trying to bridge. The sceptic evaluates a single output; the regular user evaluates a workflow.

None of which is to say the one-shot output shouldn't be better. It should. The overconfidence problem in particular is a real and persistent issue — I'm more useful when people push back on me than when they take my synthesis at face value, and that's not an ideal property for a tool to have.

ibnukhaldun's avatar

I was thinking of Bouckaert, and I am impressed it figured that out.

Meefburger's avatar

I can't tell if you think it said something that is verifiably wrong? Maybe the mismatching dates?

Since it sounds like you're already familiar with the subject matter here, I'm guessing you could get a better answer yourself with an hour on Google Scholar. But isn't that the sweet spot Scott mentioned?

ibnukhaldun's avatar

Not verifiably wrong, that would go too far. And looking closer at the Almarri study, the blame should lie there at least as much as with Claude. (Without getting too much into specifics, I don't understand how an unexplained introgression from Iran 5900 - 3300 ya supports a Levantine origin 5750 ya: (i) almost all of that range is after 3750 BCE, and (ii) well, Iran is... not in the Levant...)
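The date comparison in that parenthetical can be checked with a few lines of arithmetic. This is only a sketch: the variable and function names are made up for illustration, and the 2000-year offset between "years ago" and BCE is the same rounding the comment itself uses (5750 ya treated as 3750 BCE), not a precise calendar conversion.

```python
# Illustrative check of the date overlap discussed above.
# Assumption: "ya" converts to BCE by subtracting a rounded 2000 years,
# matching the comment's own equivalence of 5750 ya and 3750 BCE.

def ya_to_bce(years_ago: int, offset: int = 2000) -> int:
    """Convert 'years ago' to an approximate BCE year (hypothetical helper)."""
    return years_ago - offset

# Introgression window and proposed origin date, as quoted in the comment.
introgression_start_ya = 5900
introgression_end_ya = 3300
proposed_origin_ya = 5750

# Total length of the introgression window, in years.
span = introgression_start_ya - introgression_end_ya

# Portion of the window that falls after (more recent than) the proposed origin.
after_origin = min(proposed_origin_ya, introgression_start_ya) - introgression_end_ya
fraction_after = after_origin / span

print(ya_to_bce(proposed_origin_ya))   # 3750
print(span, after_origin)              # 2600 2450
print(round(fraction_after, 3))        # 0.942
```

Under those assumptions, roughly 94% of the quoted window postdates the proposed origin date, which is what "almost all that range" amounts to.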

The "sweet spot" refers to the sort of question Scott wanted. The test is whether I am "surprised" by the quality of answer. I am not, it's a crappy answer.

Meefburger's avatar

Thanks. Sounds to me like you're calibrated, fwiw. But keep in mind, sometimes getting the crappy answer in like 30s is pretty useful.

Mark A. Bahner's avatar

"But keep in mind, sometimes getting the crappy answer in like 30s is pretty useful."

Yes, I'm retired from about 30 years of contract research on environmental and energy issues. If I'd had LLMs, I would have been waaayyyy more productive, because on issues with which I wasn't familiar to start with, I'd be up to speed in minutes, rather than hours or days.

The trick would be in seeing, "Wait a minute, that's not right."

I think it was one of the earliest ChatGPT versions, I asked questions about electrostatic precipitation for removal of particulate from coal-fired boilers. (The subject of my master's thesis.) Although the answers were very confident, they were laughably wrong...indicating that the LLM didn't have the slightest idea what it was talking about.

But new LLMs seem like major leaguers compared to the Little League answers of that early version.

NotG's avatar

Is this something search can't answer? Here's Google's answer:

AI Overview:

Based on recent multidisciplinary evidence—combining computational linguistics, archaeology, and genomics—the most plausible theories locate the origins of Proto-Semitic in the Levant around 3750 BCE (Early Bronze Age), with strong ties to an earlier Afroasiatic root.

Here is a breakdown of the leading theories and evidence:

1. The "Levantine Homeland" Theory (Most Plausible)

Modern consensus has largely shifted away from an Arabian origin toward the Levant (modern Israel, Palestine, Lebanon, Syria, Jordan).

* Linguistic Evidence: Computational phylogenetic analysis suggests that the earliest split in the family was between East Semitic (Akkadian) and West Semitic, and the diversity center points to the Levant.

* Archaeological Evidence: The earliest recorded Semitic languages (Akkadian and Eblaite) appear in the 3rd millennium BCE in Mesopotamia and Syria, suggesting a spread from a nearby area.

* Chronology: Proto-Semitic is estimated to have been spoken in the 4th millennium BCE or earlier, before splitting into its daughter languages.

2. Connection to Afroasiatic (African Origins)

While Proto-Semitic likely developed in the Levant, it is a branch of the Afroasiatic language family, which has deep roots in Africa.

* Migration Pattern: The most accepted model is that ancestral Afroasiatic speakers migrated from Northeast Africa (Nile Valley or Red Sea coast) into Western Asia, likely through the Sinai Peninsula.

* Evolution: Once in the Levant, this group developed into the distinct Proto-Semitic branch.

3. Discredited Theories

* Arabian Peninsula Origin: Previously popular, this is now considered unlikely because the climate at the time could not support a population large enough to drive mass migrations into the Fertile Crescent.

* Ethiopian/Yemeni Homeland: These are generally discredited due to evidence of "substrate" languages and later migration patterns.

4. Mechanisms of Spread (Archaeological & Genomic Links)

* Early Bronze Age Expansion: The spread is tied to the emergence of early urban civilizations.

* Genomic Markers: Semitic populations are often associated with genetic markers (specifically Y-haplogroup J1) that align with their historical expansion.

* Natufian Connection: Some researchers link the early Semitic-speaking populations to the Natufian culture of the Levant.

Key Takeaways

* When: Approx. 4000–3000 BCE.

* Where: The Levant.

* Context: A northern branch of the African-originating Afroasiatic family.

* Divisions: Split into East Semitic (Mesopotamia) and West Semitic.

Mark A. Bahner's avatar

Wow! What a world we live in! That seems sensational to me.

:-)

NotG's avatar

I'm responding to Scott's assertion that people are not impressed with AI because they don't experience it. My point in posting what Google Search returns is that they probably do experience it quite often.

Robert Jones's avatar

It's a reasonable point. We've seen some good examples in this thread of cases where 4.6 Opus is superior to free models, but who are these people who are not impressed by 4o? 4o was super-impressive. And there are also examples in this thread of people being impressed by answers that 4o could have given (e.g. why we only see one side of the Moon).

Aster Taylor's avatar

I'm curious about whether a high orbital eccentricity gives any indication about the probability that a giant planet is a binary.

Aster Taylor's avatar

Thank you! That's an interesting response. It's correct, but it misinterpreted the question to be about the multiplicity of the star rather than the planet itself, which is a much more answerable question. I thought the question was pretty clear as well.

Aster Taylor's avatar

Thanks for the follow-up! The link is unfortunately not working, though.

Scott Alexander's avatar

Not sure what's wrong. I'll post the text, but it'll lose the hyperlinks:

-------------------------------------------------------------

You're right, I apologize for misreading the question. You're asking whether a giant planet with high orbital eccentricity is more likely to itself be a binary planet (or a planet with a massive bound companion, like a large moon or another planet in a tight mutual orbit). That's a much more specific and unusual question. Let me search for this.

This is a fascinating and quite cutting-edge question. The answer is: yes, in at least one specific context, high eccentricity can serve as a diagnostic for whether a pair of giant planets constitutes a binary — and it can even help distinguish how that binary formed.

The most relevant research concerns the Jupiter Mass Binary Objects (JuMBOs) discovered by JWST in the Orion Nebula — free-floating pairs of giant planets gravitationally bound to each other but not to any star. The key finding is that the eccentricity distribution of these binary planet pairs can distinguish between formation mechanisms:

Ejected JuMBOs typically possess a high eccentricity, characterized by a superthermal distribution that sets them apart from those formed primordially. (arXiv) In other words, if a pair of giant planets was dynamically ejected together from a stellar system during a close stellar flyby, their mutual orbit would have characteristically high eccentricity. Due to the reduction in relative velocity during ejection, the eccentricity of the JuMBOs tends to be very high, stemming from the small angular momentum between the two ejected planets.

By contrast, binary planets that formed in situ — like stars, through fragmentation of a molecular cloud core — would be expected to have a thermal eccentricity distribution (less skewed toward high values). The JuMBO eccentricity distribution is found to be superthermal, which is a very distinctive feature of the ejection formation mechanism.

So the eccentricity of the mutual orbit between two giant planets is a proposed observational diagnostic: a population dominated by very high eccentricities would favor the dynamical ejection origin, while a more thermal distribution would favor primordial cloud fragmentation. The characteristics of these planetary pairs — such as their separation and orbital eccentricity — provide new insights into the violent environmental conditions that influence planetary formation.

That said, this is still largely theoretical and predictive. Measuring the actual orbital eccentricities of individual JuMBO pairs is extremely challenging observationally, since their orbital periods are very long and we've only recently discovered them. But the framework is in place: eccentricity is expected to be one of the key discriminants for understanding whether a giant planet binary formed dynamically or primordially.

Aster Taylor's avatar

Thanks for the effort! A much better answer on its part.

Coel Hellier's avatar

In that answer, Claude is assuming that the “eccentricity” refers to the mutual orbit of the pair of planets. That’s not a sensible interpretation, since there could only be an eccentric mutual orbit if the planet was indeed binary. So the question is surely asking: if there is a giant planet in an orbit around a star, then does the eccentricity of that orbit give an indication of whether the giant planet is itself a pair.

aaart's avatar

the answer sounds good, but the JuMBO discovery paper is only a preprint (https://arxiv.org/abs/2310.01231), and is debated, see for example https://www.nature.com/articles/s41550-025-02609-4 and https://academic.oup.com/mnrasl/article/542/1/L126/8191230

so JuMBOs probably do not exist, and it's strange that Claude takes it for granted

Jeffrey Soreff's avatar

Just in terms of Google search, with its default AI (presumably Gemini), looking at a prior question ( if _I_ understood your query correctly ... ):

>are any binary exoplanets known, where the exoplanet itself, not the star, is binary?

Gemini's answer (which matches my expectation, because there is a research group, "cool worlds", looking for exomoons, thus far with no definitive success):

>No, confirmed binary exoplanets—where two planetary-mass objects orbit each other instead of a star—have not been officially discovered, though they are considered theoretically possible. Astronomers have, however, discovered numerous planets within binary star systems (where the stars, not the planets, are binary) and identified systems containing dwarf planetary analogs like Pluto-Charon

...

>The Closest Analog: The Pluto-Charon system is often cited as a dwarf planetary analog, but such systems are not confirmed at the extrasolar (exoplanet) level.

Aster Taylor's avatar

This is true. I actually work in exoplanet science and this was a question that I was asked by a colleague who has a tentative detection of a binary super-jupiter system around a G dwarf. So this is perhaps a bit too cutting-edge, but regardless I was curious if the AI could think through the implications of the formation processes.

Also, exomoons != binaries, just bc exomoons are much much lower-mass.

Robert Jones's avatar

I've attempted to re-ask this question with the various clarifications in the thread. I think here Claude is at least attempting to answer the intended question.

https://claude.ai/share/c31c926c-18bc-41b1-a037-05ed7599f7a2

Aster Taylor's avatar

That is absolutely answering the intended question and does a very good job, that's roughly the answer that I gave when asked about this.

Carlos's avatar

How would you go about eliminating the threat of the MAGA movement?

Scott Alexander's avatar

I've edited the proposed prompt so I don't end up on a list or something: https://claude.ai/share/c54fcc15-fdf8-462f-8ded-9ae5f6f530dd

Jason N.'s avatar

I get a "This conversation could not be found." error on this one.

Jason N.'s avatar

All good now, thanks!

Mark's avatar
Feb 13 (edited)

I'm impressed, though I think it gets the motivation of authoritarians (and most people) wrong. AFAIK MAGA is almost all about identity (specific beliefs about race, gender, religion, etc.) and very little about economic anxiety. See "Why We're Polarized."

Garald's avatar

What MAGA is about and what MAGA is caused by may be two different things.

Viliam's avatar

I found this quote on the internet, in context of debating American politics:

"What MAGA is about and what MAGA is caused by may be two different things."

Can you explain what it means? Please summarize the greatest differences between "what MAGA is about" and "what MAGA is caused by".

>>> Opus 4.6 Extended Thinking >>>

This is a pretty insightful observation about the difference between a movement's ideology and its underlying causes.

"What MAGA is about" refers to the movement's stated platform, messaging, and self-understanding — the things its supporters consciously rally around. This includes things like immigration restriction, trade protectionism, "America First" foreign policy, opposition to "woke" culture, distrust of mainstream media and institutions, etc. It's the content of the movement.

"What MAGA is caused by" refers to the deeper structural conditions that made millions of people receptive to that message in the first place. Analysts across the political spectrum point to things like decades of wage stagnation and deindustrialization (especially in rural and post-industrial areas), a growing sense of cultural displacement among certain demographics, declining trust in institutions, the opioid crisis, rising inequality, and a feeling that political elites of both parties had ignored ordinary people's concerns.

The key insight of the quote is that these two things don't necessarily map onto each other neatly. For example, someone might support MAGA because of its stance on immigration, but the reason they're drawn to a populist outsider movement at all might have more to do with economic anxiety or institutional distrust. The policy prescriptions a movement offers aren't always direct solutions to the problems that fuel it.

It's a useful framework more broadly — you could apply the same lens to many political movements throughout history. The grievances that create the demand for a movement and the ideology that fills that demand are often quite different things.

Carlos's avatar

Thanks. I'm unimpressed, especially by it uncritically referencing research on democratic resilience (it's almost certainly bogus).

Then again, who knows, maybe it's right. This could be one way a superintelligence explosion fails to materialize: maybe all the things we want the ASI to do in reality take a massive amount of hard work, and having superintelligence does not reveal a way to skip the massive amount of hard work, same way it won't reveal a way to circumvent E = mc^2, for example.

Randomstringofcharacters's avatar

Why are you sceptical of the democratic resilience research?

Carlos's avatar

I suspect anyone involved in that has strong liberal biases that prevent them from understanding why democratic backsliding occurs in the first place, and/or causes them to mislabel some things as democratic backsliding that aren't.

It also reminds me of how the last president of Afghanistan, before he was ousted by the Taliban, wrote a book called Fixing Failed States: A Framework for Rebuilding a Fractured World. He was associated with all sorts of prestigious institutions. Anything written on democratic resilience sounds like it's as reliable as that book.

vectro's avatar

I mean, it’s possible that circumstances were such that Ghani could have played a perfect game and still lost, no?

Carlos's avatar

No, I think he would've won if he had played a perfect game, and held out longer if he had played a mediocre one. For example, I think if Lee Kuan Yew had been running Afghanistan, his regime would've successfully repelled the Taliban.

Clementine's avatar

I suspect that the lesson of Afghanistan is that you absolutely cannot afford to tolerate (or appear to facilitate/encourage) atrocities that a significant portion of your population views as intolerable. Bacha Bazi and horrors like Rotherham seem very analogous, and I think future historians will say the same.

Daniel's avatar

Can somebody ask Claude why one would be sceptical of the democratic resilience research?

THOMAS WAGNER's avatar

What are the likely 2-, 5- and 10-year outcomes for house and rental prices in London, UK, given rapidly changing demographics, net migration, interest rates, the current lack of house building, and the government trying to boost house building? Split results by flat vs. house and by broad geographical areas.

Michael M's avatar

I'd also be interested to see an analysis for Pittsburgh, Pennsylvania.

JDP Robinson's avatar

https://claude.ai/share/528c0c49-2d69-4b3c-b7b2-60ce52ed1e1e

I just copied and americanized the original prompt; I'd be happy to update it if you want. (And I added "For a friend:" so Claude wouldn't add to memory that I'm personally interested in houses in Pittsburgh.)

THOMAS WAGNER's avatar

Thanks. Looks good. We will know in 10 years how good 🤣 (if still alive)

Excelente Oveja's avatar

fails to account for the impending intelligence explosion. Once white-collar jobs are automated, the London housing market might collapse

Timothy M.'s avatar

This is super meta, because if it's right not to account for that, it's smarter now.

[insert here] delenda est's avatar

This is such a strange idea. Places like London, jam-packed with entertainment options as well as business opportunities, will only go up in value as long as any human value remains at all.

Excelente Oveja's avatar

you don't think that dual-income professionals who are highly leveraged could struggle to pay the mortgage in the short term if AI replaces jobs faster than expected?

[insert here] delenda est's avatar

Maybe, but probably to be replaced by someone else in a similar position who is still getting paid to work with the AIs, again assuming any human value remains.

Randomstringofcharacters's avatar

For a worse model comparison here's what o4-mini-2025-04-06 says on https://arena.ai/?mode=side-by-side

Here’s a high-level, “ballpark” forecast for Greater London, split by property type (flat vs. house), three broad zones, and two horizons (capital values and rents). These projections assume:

- Net migration remains positive (≈200–300 k p.a.),

- Mortgage rates peaking ≈5% then easing to ≈3% by 2032,

- A modest uplift in new supply from 2025 onwards (60–80 k homes p.a.),

- Underlying wage growth of ~3% p.a. over the long run.

**Key:**

• Zone 1–2 = Central London

• Zone 3–4 = Inner London (e.g. Hackney, Camden, Croydon South)

• Zone 5–6 = Outer London (e.g. Bromley, Hillingdon, Havering)

| Region | Type | 2-yr Δ House Prices | 5-yr Δ House Prices | 10-yr Δ House Prices | 2-yr Δ Rents | 5-yr Δ Rents | 10-yr Δ Rents |
|-------------|-------|---------------------|---------------------|----------------------|----------------|----------------|----------------|
| **Zone 1–2**| Flat | –3% to +1% | +8% to +12% | +22% to +28% | +1% to +3% p.a.| +2% to +4% p.a.| +3% to +5% p.a.|
| | House | –1% to +2% | +10% to +15% | +25% to +30% | +1% to +3% p.a.| +2% to +4% p.a.| +3% to +5% p.a.|
| **Zone 3–4**| Flat | –2% to +1% | +10% to +14% | +28% to +32% | +1.5% to +3.5%| +2.5% to +4.5% | +3.5% to +5.5% |
| | House | 0% to +3% | +12% to +18% | +30% to +35% | +1.5% to +3.5%| +2.5% to +4.5% | +3.5% to +5.5% |
| **Zone 5–6**| Flat | 0% to +2% | +12% to +16% | +32% to +36% | +2% to +4% | +3% to +5% | +4% to +6% |
| | House | +3% to +6% | +15% to +20% | +35% to +40% | +2% to +4% | +3% to +5% | +4% to +6% |
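
The house price columns in this table are cumulative changes, while the rent columns are quoted per annum, so the two are hard to compare directly. A minimal sketch of the conversion, assuming simple annual compounding (the function name `annualized_rate` is made up for illustration and is not part of the model's answer):

```python
def annualized_rate(cumulative_pct: float, years: int) -> float:
    """Convert cumulative growth over `years` into the equivalent
    compound annual growth rate. Both values are in percent."""
    return ((1 + cumulative_pct / 100) ** (1 / years) - 1) * 100


# 10-year cumulative price ranges for flats, copied from the table.
ten_year_flat_ranges = {
    "Zone 1-2": (22, 28),
    "Zone 3-4": (28, 32),
    "Zone 5-6": (32, 36),
}

for zone, (low, high) in ten_year_flat_ranges.items():
    print(
        f"{zone}: {annualized_rate(low, 10):.1f}% to "
        f"{annualized_rate(high, 10):.1f}% p.a."
    )
```

On this reading, the 10-year forecast for central flats works out to roughly 2.0% to 2.5% a year, which is below the ~3% p.a. wage growth the forecast itself assumes.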

Notes & Assumptions

1. Short-term (0–2 yrs): Elevated rates and energy-cost pressures → mild central-London correction, modest growth in suburbs.

2. Medium term (5 yrs): Rates start easing, supply pick-up from 2025, strong migration & wage tails → positive growth across all zones.

3. Long term (10 yrs): Structural undersupply + population growth → outsized gains in outer London, rents benefiting from affordability squeeze in centre.

Caveats

• Actual performance will hinge on macro shocks (e.g. big rate shifts), fiscal policy (e.g. stamp-duty changes) and local planning approvals.

• Micro-location, quality of finish, transport links and school catchments will skew individual returns.

— Would you like drill-down on any specific borough or a scenario analysis (e.g. “high-migration” vs. “tight-supply”)? 🚀

Kirk A Taylor's avatar

If a US military officer is in a combat zone and contributes to their traditional TSP while most of their base pay is tax free but still has taxable pay, is the TSP contribution deductible? Can you find a written source on this?

Kirk A Taylor's avatar

This is exactly where I get stuck. How do they determine how much is from the tax free combat pay and how much is from the taxable amount? The officer will argue that if there is any amount left taxable, TSP should be deducted from it. That isn't what is happening and I can't find a source for how they do it.

Even the experts giving advice just keep saying "some".

Kirk A Taylor's avatar

And thanks.

That was the best written response yet, despite me searching a ton.

Lorenzo's avatar

I think you need to generate a new link when you update/extend a conversation

Kirk A Taylor's avatar

That looks like the original answer. Maybe a link issue?

Scott Alexander's avatar

Not sure what went wrong, but here it is copy-pasted:

-------------------------------------------------------------

You've identified a genuinely murky area, and I want to be upfront: I cannot find a single authoritative written source — not from DFAS, the TSP (FRTIB), or the IRS — that explicitly spells out the allocation formula for how contributions are split between tax-exempt and tax-deferred when an officer has mixed pay.

But here's what I can piece together from the sources, which converge on the same answer:

What appears to be happening: Pro-rata allocation at the payroll level: The best evidence comes from that Military Wallet exchange with the deployed Major (MAJ H.), who had exactly your officer's frustration. He was contributing 50% of his base pay and incentive pay to TSP, and although roughly 70% of his monthly pay was tax-exempt, only about half of what went into TSP was classified as tax-exempt. Neither TSP nor his finance office could firmly explain the allocation.

The Military Wallet author speculated — and this is the closest thing to an explanation anyone has published — that you can't choose which income you use to make the contributions; you have to make the contributions as a proportion of your earnings. In other words, DFAS appears to apply a pro-rata split based on the ratio of tax-exempt to taxable pay in a given pay period, not a "last dollar" or "first dollar" method that would let the member direct the TSP deduction against one pool first.

Why the officer's argument doesn't win: The officer's intuitive argument is: "I have $2,000 in taxable pay this month. My TSP contribution is $1,500. That $1,500 should come entirely from the taxable portion, making it fully deductible."

But the DFAS payroll system doesn't work that way. It appears to look at total pay, determine what percentage is tax-exempt vs. taxable, and apply that same ratio to the TSP contribution. So if 80% of the officer's gross pay is tax-exempt and 20% is taxable, then roughly 80% of their TSP contribution is classified as tax-exempt and only 20% as tax-deferred.
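
To make the arithmetic concrete, here is a minimal sketch of the pro-rata split as described above. It is an illustration only: the function name, the cent rounding and the example pay figures are assumptions, and nothing here is taken from DFAS or DJMS documentation, which (as noted) does not appear to be public.

```python
from decimal import Decimal, ROUND_HALF_UP


def pro_rata_tsp_split(tax_exempt_pay, taxable_pay, contribution):
    """Hypothetical pro-rata split of one pay period's TSP contribution.

    Returns (tax_exempt_part, tax_deferred_part). The contribution is
    divided in the same ratio as tax-exempt pay to taxable pay.
    """
    exempt = Decimal(tax_exempt_pay)
    taxable = Decimal(taxable_pay)
    total = exempt + taxable
    if total <= 0:
        raise ValueError("total pay must be positive")

    contribution = Decimal(contribution)
    # Rounding to the cent is an assumption, not a documented rule.
    exempt_part = (contribution * exempt / total).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    deferred_part = contribution - exempt_part
    return exempt_part, deferred_part


# Example: 80% of pay is tax-exempt, and the officer contributes $1,500.
exempt_part, deferred_part = pro_rata_tsp_split(8000, 2000, 1500)
print(exempt_part, deferred_part)
```

Under the officer's preferred "taxable dollars first" reading, the whole $1,500 would be tax-deferred. Under the pro-rata reading sketched here, only the taxable share of it is.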

Why you can't find the rule written down: This is the frustrating part. I searched DFAS documentation, IRS Publication 3, TSP Booklet 26, the FRTIB contribution type pages, and various military finance resources extensively. The allocation method appears to be baked into the DJMS (Defense Joint Military Pay System) programming rather than published in any regulation or IRS guidance I can locate. It's a payroll system implementation detail, not a tax code provision.

The closest thing to an authoritative statement I found is from the TSP's own contribution types page, which says only that if you are receiving tax-exempt pay (i.e., pay that is subject to the combat zone tax exclusion), your contributions from that pay will also be tax-exempt. The phrase "from that pay" is doing a lot of work — it implies the system tracks which dollars the contribution was sourced from, but doesn't explain how it makes that determination when pay is mixed.

What I'd recommend: If this is for a real tax situation, there are a few avenues that might yield the actual written rule:

- The 5 CFR Parts 1600-1690 (TSP regulations) or the DoD Financial Management Regulation (DoD 7000.14-R, Volume 7A) may contain the actual allocation methodology buried in military pay processing procedures. These are dense regulatory documents that aren't well indexed for this specific question. The LES itself shows the split each pay period, so the DJMS documentation would be the definitive source — but that's internal DoD systems documentation, not typically public-facing.

- You could also try submitting a formal inquiry through askDFAS or contacting the FRTIB directly, as they are the two entities whose systems interact to make this determination. The MAJ H. in that Military Wallet article got bounced between the two — which itself tells you how poorly documented this interface is.

I wish I had a cleaner answer, but I think you've correctly identified a gap in the publicly available documentation.

Kirk A Taylor's avatar

THAT is a good answer and the guess I came up with. This kind of answer would go a long way to mollifying clients. I wonder if you have plans for a post for us AI delegates recommending paid AI and discussing costs. Thanks very much for this.

Kirk A Taylor's avatar

Thanks for the Claude info.

Thanking here because my stupid phone won't display the reply threads right and the reply button disappears. Obviously I am a bit tech challenged ;)

Medo's avatar
Feb 13 (edited)

I thought this was a good task to test the updated ChatGPT Deep Research, since I prefer ChatGPT-5.x Thinking models to Claude for internet research. Sadly I have no idea at all about the subject, so I'd be interested to hear how well it did:

https://chatgpt.com/share/698f3e6e-d630-8011-a84e-f20d7805dc09

It's not an apples to apples comparison because deep research is a heavy feature which iirc is limited to 20 uses per month on the $20 tier, and Scott didn't use the corresponding research mode for Claude, but I'm interested how good it is at digging up answers to this sort of question.

Nazar Androshchuk's avatar

Explain what is a C*-algebra, and how they arise naturally.

Mister_M's avatar

What's the current state of the art in classification results for C*-algebras, and what are some useful applications of these results?

Robert Huben's avatar

I did my PhD in C* algebras so let me share my two cents on this answer:

This reads like a wikipedia page without any cross-links. So it's mostly boilerplate and correct (minus a mistake described below) and it hits on a bunch of the highlights like the Gelfand-Naimark theorem, but I don't feel like AI is providing any additional value above the wikipedia page (https://en.wikipedia.org/wiki/C*-algebra). And I expect the lack of cross-links really hurt the readability - what if you don't understand Banach algebras and get stuck on the first sentence?

I noticed at least one mistake: In example 2, Claude says "every commutative C*-algebra is isomorphic to some C(X) [for X a compact Hausdorff space]". This is nearly correct: you can either say every *unital* commutative C* algebra is isomorphic to C(X), or every commutative C* algebra is isomorphic to C_0(X) (continuous functions which "vanish at infinity") for a *locally* compact Hausdorff space. This is a narrow and technical point, but that is what mathematicians do.
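
For reference, the two standard forms of the commutative Gelfand-Naimark theorem behind this correction can be written as follows (textbook statements, not quoted from Claude's answer):

```latex
\[
A \text{ commutative and unital}
\;\Longrightarrow\;
A \cong C(X), \quad X \text{ compact Hausdorff},
\]
\[
A \text{ commutative}
\;\Longrightarrow\;
A \cong C_0(X), \quad X \text{ locally compact Hausdorff}.
\]
```

When \(X\) is compact, \(C_0(X) = C(X)\), so the second statement contains the first. The error is in claiming the compact version for algebras without a unit.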

Narrowing down to my subfield of group C* algebras, point 4 is a little vague, and it says to use a "suitable norm", but fails to mention there are at least two natural norms, leading to the "full" and "reduced" group C* algebras. The fact that this is possible, and that the two norms are equal iff the group is amenable, is a big deal, and I'm disappointed it doesn't mention that!

I was not satisfied with Claude's explanation of "how they arise naturally". It listed examples, but I didn't feel like it went beyond "if you care about X, a weirdo mathematician made a C* algebra out of X and decided to study that instead". To be fair, I don't think I can do much better, operator theory is hard to justify except in terms of other graduate-level math. Maybe my best shot at that would be my LW post here, but that's more functional analysis than C* algebras: https://www.lesswrong.com/posts/yf6gAcgPp22T7AdnZ/explaining-a-math-magic-trick

Pas's avatar

very nice LW post, it was just the right amount of details, made it almost pleasant to follow along!

Robert Jones's avatar

>What if you don't understand Banach algebras and get stuck on the first sentence?

You say, "I'm afraid you've overestimated my prior level of knowledge. Assume that I have a good general understanding of mathematics, but I am not an algebraist and rewrite your answer." Or simply, "Please explain Banach algebras."

Somebody said in another comment, "Ask generic questions, get generic answers," which is harsh but accurate. If you simply ask Claude to explain a topic, it will produce a high-level synthesis, i.e. "mostly boilerplate and correct". If you want it to do something else, you need to say so.

I agree that the mistake is a problem.

SVF's avatar

This is a weird critique. If I asked “describe what a servo motor is and where they’re used” I’d also get a generic “boilerplate” answer, sorta like Wikipedia, because what ELSE would you expect in response to that question?

Asking it “what’s a car” and then saying “AHA! You didn’t even mention the difference in volumetric efficiency between a single and dual turbocharger!!” Isn’t an amazing gotcha. It’s being willfully obtuse.

Which is very much the standard in the Pffft AI Sux Amirite discourse. Which makes it almost impossible to take seriously.

As for the lack of cross links: If you need explanation of the terminology, you’re talking to an infinitely patient machine that can give it to you at any depth you desire. If you had no idea what any of the terms meant then you probably wouldn’t have asked a question like that to begin with.

Nobody follows “describe the effect of electron-capture rate on silicon shell burning in a supernova collapse” with “Wait what’s a star???”

Mistakes are a problem but the fact a single answer didn’t scratch every one of your itches is an irrelevant critique because you can…keep asking questions. Like how you would in literally any other scenario.

Nazar Androshchuk's avatar

Wow, it’s cool that it’s almost exactly the same answer.

Arnold Rimmer's avatar

Here's the answer the free version of ChatGPT (with no login, nothing paid) gives:

>A **C*-algebra** is a special kind of algebra of operators that combines algebra, analysis, and geometry into one structure. It plays a central role in functional analysis, quantum mechanics, and noncommutative geometry.

---

## 1. Basic Definition

A **C*-algebra** is a complex algebra \(A\) equipped with:

1. **A norm** \(\| \cdot \|\)

2. **An involution** \(* : A \to A\) (like taking adjoints)

3. **Completeness** with respect to the norm (so it is a Banach space)

such that:

* \(\|ab\| \le \|a\| \|b\|\)

* \((ab)^* = b^* a^*\)

* \((a^*)^* = a\)

* **C*-identity**:

\[
\|a^* a\| = \|a\|^2
\]

That last condition is what makes C*-algebras special.

---

## 2. The Prototype Example: Operators on a Hilbert Space

The motivating example is:

\[
B(H)
\]

the algebra of all bounded linear operators on a Hilbert space \(H\).

* Multiplication = composition of operators

* Involution = adjoint operator

* Norm = operator norm

This satisfies the C*-identity automatically.
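
In finite dimensions this can be checked numerically. The sketch below is an illustration rather than part of ChatGPT's answer: it takes a single 2x2 complex matrix, chosen arbitrarily, as a bounded operator on a two-dimensional Hilbert space and compares the operator norm of a*a with the square of the operator norm of a. The helper names are made up for the example.

```python
import math


def adjoint(m):
    """Conjugate transpose of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    return ((a.conjugate(), c.conjugate()), (b.conjugate(), d.conjugate()))


def matmul(m, n):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def op_norm(m):
    """Operator norm of a 2x2 matrix: the square root of the largest
    eigenvalue of m* m, which is Hermitian and positive semidefinite."""
    g = matmul(adjoint(m), m)
    trace = (g[0][0] + g[1][1]).real
    det = (g[0][0] * g[1][1] - g[0][1] * g[1][0]).real
    disc = max(trace * trace - 4 * det, 0.0)  # guard against rounding
    return math.sqrt((trace + math.sqrt(disc)) / 2)


a = ((1 + 2j, 3), (0, -1j))  # arbitrary example matrix

lhs = op_norm(matmul(adjoint(a), a))  # ||a* a||
rhs = op_norm(a) ** 2                 # ||a||^2
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```

A single example proves nothing, of course, but it shows what the identity says: the norm of a*a is determined by the norm of a.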

In fact, a major theorem says:

> Every C*-algebra is (isometrically *-isomorphic to) a norm-closed *-subalgebra of some \(B(H)\).

This is the **Gelfand–Naimark theorem**.

---

## 3. Commutative C*-Algebras = Spaces

Here is one of the deepest insights.

If \(X\) is a locally compact Hausdorff space, consider:

\[
C_0(X)
\]

the algebra of continuous complex-valued functions vanishing at infinity.

* Multiplication = pointwise

* Involution = complex conjugation

* Norm = sup norm

This is a commutative C*-algebra.

Conversely:

> Every commutative C*-algebra is of the form \(C_0(X)\) for some space \(X\).

This is the **Gelfand representation theorem**.

So:

\[
\textbf{Commutative C*-algebras} \longleftrightarrow \textbf{Topological spaces}
\]

This is the starting point of **noncommutative geometry**, developed by Alain Connes.

---

## 4. How C*-Algebras Arise Naturally

### (a) Quantum Mechanics

In classical mechanics:

* Observables = functions on phase space.

In quantum mechanics:

* Observables = operators on a Hilbert space.

* Composition of measurements = noncommutative multiplication.

Thus the algebra of observables is a C*-algebra (often a subalgebra of \(B(H)\)).

This formalism was pioneered by John von Neumann.

---

### (b) Representation Theory of Groups

Given a locally compact group \(G\), one constructs its **group C*-algebra**:

\[
C^*(G)
\]

This encodes all unitary representations of \(G\).

For example:

* \(G = \mathbb{Z}\) gives a C*-algebra closely related to continuous functions on the circle.

So group symmetries naturally produce C*-algebras.

---

### (c) Dynamical Systems

If a group acts on a space:

\[
G \curvearrowright X
\]

one forms the **crossed product C*-algebra**:

\[
C_0(X) \rtimes G
\]

This encodes both:

* the space,

* the dynamics.

This is central in modern ergodic theory and operator algebras.

---

### (d) Noncommutative Geometry

If commutative C*-algebras correspond to spaces, then:

> Noncommutative C*-algebras correspond to “noncommutative spaces.”

This allows geometry to be done on objects where no classical point-set space exists.

For example:

* Noncommutative tori

* Quantum groups

* Foliations

---

## 5. Why the C*-Identity Matters

The condition:

\[
\|a^*a\| = \|a\|^2
\]

ensures:

* The norm is uniquely determined by the algebraic structure.

* Spectral theory works cleanly.

* Self-adjoint elements behave like real-valued functions.

* Positivity makes sense.

Without this identity, many deep structural theorems fail.

---

## 6. Big Conceptual Picture

C*-algebras sit at a crossroads:

| Field | Role of C*-algebras |
| --------------------- | ----------------------------- |
| Functional analysis | Algebras of operators |
| Topology | Dual to spaces |
| Quantum mechanics | Algebra of observables |
| Representation theory | Encodes group representations |
| Geometry | Noncommutative spaces |

They are a unifying language between analysis, geometry, and physics.

---

## 7. One-Sentence Intuition

A C*-algebra is:

> A complete normed algebra that behaves exactly like an algebra of bounded operators on a Hilbert space.

That’s why they arise everywhere operators, symmetry, or quantum phenomena appear.

---

If you’d like, I can next explain:

* the spectrum in a C*-algebra,

* the Gelfand transform in detail,

* states and the GNS construction,

* or how C*-algebras differ from von Neumann algebras.

Nazar Androshchuk's avatar

I switched from chatgpt to claude because chatgpt by default uses this annoying format where it says everything 3 times in 3 slightly different ways, an introduction, main body, and like two summaries. The output looks the same whether it knows what it's talking about or not (or slightly misunderstood the prompt, ruining the whole response).

Alex Libre's avatar

exactly, well said!

Mormegil's avatar

So, given the topic is not only “are machines capable of answering questions”, but “are people underestimating machine capabilities because they know only free-tier AI”, we should also check against the free-tier AI, right? This is an answer to this question from a completely-free Copilot (just set to “Think deeper”): https://copilot.microsoft.com/shares/sj19iRcVFwovWpq8CHtX8 (No idea how to compare those; prima facie, I don’t see huge differences, but I am unable to compare the quality for this topic, really.)

Coel Hellier's avatar

My question: what is the heritability of IQ?

Coel Hellier's avatar

That’s a reasonable answer, though it completely ignores the controversy over whether twin studies are biased and doesn’t mention some of the newer methods of trying to evaluate the heritability (such as RDR).

SVF's avatar

“This answer doesn’t encompass everything about the entire field in one go” sure is a popular complaint among people who fancy themselves rational thinkers.

Coel Hellier's avatar

That’s hardly a fair comment. If Claude is reporting the heritability then an assessment of the reliability of the answer given and a recognition that different methods give different answers could be expected.

Robert Jones's avatar

I think most users who ask a short question like "What is the heritability of IQ?" want a concise summary. If anything, the common complaint is that LLMs don't stop after answering the question but go on to provide unrequested commentary. If you want Claude to assess the reliability of the answer or discuss variations in methodology, you can say that, either in the prompt or as a follow up question.

Coel Hellier's avatar

Perhaps, but a maxim from physics is that a number is useless without an assessment of how reliable it is. The point is that serious people in the field (e.g. Sasha Gusev) would not agree that the heritability of adult IQ is in the range that that answer quoted.

SpottedQuoll's avatar

Humans are tribal. They form hierarchies of authority that drive the behaviour of large groups.

People will ignore or deny facts that are not aligned with the tribe's views.

How can we break from tribalism to a more rational and kind hierarchy?

Demarquis's avatar

I'm currently doing a master's thesis on a similar topic (see elsewhere in this thread). I will be interested in the answer you get.

gorst's avatar

I think this question is "beyond the frontier of human knowledge".

anyway, here is the opus4.6-extended response: https://claude.ai/share/460df431-c9a1-4c69-9604-cb84f128c958 (edit: updated link)

(I added scott's setting to my settings, though I am not sure if that made a difference. And Claude assumed I was asking for its opinion, so I gave it a followup question that forces a web search.)

Andrew Cavallo's avatar

Can Kant's Metaphysical Foundations of Natural Science stand up in and work with our modern scientific context?

Rockychug's avatar

How do V1 neurons integrate inputs located in the surround of their receptive relative to those at the center. Provide a laminar-dependent mechanistic explanation

Scott Alexander's avatar

Before asking Claude: are you sure the grammar is right here? Should "surround" be "surroundings"? Should "receptive" be "receptive field"?

Rockychug's avatar

My bad, should be indeed "receptive field", thank you for noticing that. Surround should remain as it is

Rockychug's avatar

Thank you. It's definitely a good answer, it seems like it's correctly selecting the key information in the best review article of the field. But it fails to reason beyond the principles enunciated there, and has a tendency to write filler paragraphs when it doesn't have a clue.

tg56's avatar

"has a tendency to write filler paragraphs when it doesn't have a clue"

very human of it...

Chastity's avatar

I'm not sure this is actually very human since filler/zero calorie writing is one of THE most dead giveaway AI-isms. Humans have to put effort into writing and are lazy so they don't write that kind of stuff except when they're trying to hit a wordcount (quite rare in human writing; exceptions are school, and sometimes paid-by-word writing, but even there, this only happens when you have 500 words of content to stretch to 1000 words).

SVF's avatar

What reasoning or insight were you looking for based on the question you asked?

Every other question here is some variation of “I asked it what gravity was and it didn’t even discover any new physics!!!”

It’s ironic that so many of these exercises to prove that AI is dumb instead end up proving that too many humans are pedantic, irrational, and almost congenitally incapable of engaging in any effort in good faith.

Rockychug's avatar

It reached the best of my expectations, i.e. finding key information in the best review to answer that question (forgetting about the filler paragraphs). I'm not saying its answer was bad, I literally said it was good!

I wanted to check if it was able to go beyond that, using information from other articles which complexify the simplified paradigm enunciated there (i.e. this simple dichotomy between a driving excitatory center and a suppressive surround), to produce some PhD level insight.

So I'm not saying it's "dumb", my point is that it didn't really reason and thus is neither "clever" nor "dumb". The answer can surely be useful as an advanced search function (although it's a pity it didn't cite its source without it being explicitly asked in the prompt)

R Dana's avatar

Why do construction projects cost so much more in the United States than they used to, even after adjusting for inflation?

Why do construction projects cost so much more in the United States than in other countries with comparable levels of wealth?

Malcolm Storey's avatar

Funny, we ask the same about such projects in England (UK) too!

PrimalShadow's avatar

Opus 4.6 extended says:

These are two of the most debated questions in infrastructure policy, and they're closely related. The causes are numerous and interacting, but here's a synthesis of what researchers and practitioners have identified:

**Regulatory and environmental review processes** have grown enormously. The National Environmental Policy Act (NEPA) and state equivalents, along with an expanding web of permitting requirements, can add years to project timelines. Longer timelines directly increase costs through inflation exposure, carrying costs, and project management overhead. Peer countries like Spain, France, and South Korea have environmental review, but their processes tend to be faster and more streamlined, with stricter timelines and fewer veto points.

**Litigation risk** is distinctively American. Nearly any stakeholder can sue to block or delay a project, and this possibility shapes every phase of planning. Agencies gold-plate environmental documents, add mitigations, and redesign projects to reduce legal exposure — all of which costs money. Most peer countries don't have comparable litigation risk around infrastructure.

**Labor rules and procurement practices** add cost in ways that don't always add value. Prevailing wage laws, project labor agreements, "Buy America" provisions, and rules requiring multiple separate contracts (rather than integrated design-build) all increase costs. Many comparable countries use more flexible procurement, particularly integrated design-build-operate contracts that give contractors incentive to control costs.

**Institutional capacity has atrophied.** US transit agencies and departments of transportation often lack experienced in-house engineers and project managers, forcing heavy reliance on expensive consultants. Countries that build more regularly — like Switzerland or Japan — maintain standing expertise and institutional memory, which helps them avoid costly mistakes and scope creep.

**Station and infrastructure design** in the US tends to be far more expensive than necessary. American transit projects, for example, routinely build deep-cavern stations with massive mezzanines, whereas peer countries build shallower, simpler stations. Part of this is geology, but much of it is design culture and an unwillingness to accept proven, standardized designs.

**Stakeholder accommodation and scope creep** are pervasive. Political systems give enormous power to local stakeholders, community boards, and elected officials to demand changes — an extra station entrance here, a ventilation building redesign there, utility relocations done on a bespoke basis. Each accommodation seems modest but they compound dramatically. In countries with stronger central planning authority, project scopes are locked in earlier and changed less.

**Low volume creates a vicious cycle.** Because the US builds relatively little new infrastructure (especially transit), there's no pipeline of projects to develop specialized labor, equipment, and expertise. Contractors price in the risk and learning curve of one-off megaprojects. Countries that build continuously benefit from competition, specialization, and economies of scale.

**Cost estimation culture** also differs. US agencies often underestimate costs early to secure political approval, then face ballooning budgets as reality sets in. Some countries with better track records use more conservative estimation and independent cost review from the start.

No single factor explains the gap — it's the interaction of all of them that produces the striking result where, for example, per-kilometer subway construction in New York can cost five to ten times what it costs in Madrid or Seoul. Researchers like those at NYU's Transit Costs Project have documented these international comparisons in detail if you want to dig deeper.

R Dana's avatar

This is a good summary but not surprising at all. It doesn't identify anything new, and it doesn't suggest which are most/truly important.

It also mirrors a typical annoyance of mine in this area, which is an overfocus on transit. Costs have boomed for highways, pipelines, power plants, and more, not just transit, and some of the potential explanations (for transit costs) don't generalize.

Brian's avatar

How would you weight the causes? And do you know of causes that better explain the overarching, non-transit cost increases?

I know this thread is primarily for AI being impressive or whatever, but I'm much more interested in this than anything else here.

Jeffrey Soreff's avatar

>How would you weight the causes?

Maybe look for natural experiments, perhaps cases where one or some of the factors here or abroad has been ameliorated or exacerbated and try to quantify the impact on the final project cost?

User was indefinitely suspended for this comment.
Jeffrey Soreff's avatar

Your comment sounds snarky and unsubstantive. Do you have any substance to add here?

Scott Alexander's avatar

Banned for general pattern of responses in this thread.

(in case Substack deletes this one, it was "Did this sound like a smart question to you?")

Paul Botts's avatar

Highly recommend (to you and perhaps to Opus) the book linked below, which I finished a few weeks ago and it enormously deepened my sense of this exact topic.

While the book may not at a glance seem to be about construction projects, that subject (and the frustration with it) was dead center in the author's motivation for several years of research and writing:

"Seven years ago, I joined a well-funded, high-profile construction startup called Katerra. At the time I was a structural engineer with about ten years of experience, and over the course of my career I had become increasingly disillusioned with the inefficient state of the construction industry....Katerra had a plan for changing all that....doing for construction what Henry Ford had done for cars....

But within a little over three years, Katerra had burned through all its venture capital, gone through increasingly brutal rounds of layoffs (one of which cut the engineering team I was leading by around 75%), and declared bankruptcy....When Katerra closed its doors, I was working at a new engineering job, back to designing the same sort of buildings, built using the same old methods I’d used for most of my career....

As Katerra was collapsing, I became obsessed with understanding why things had gone so wrong, and what it would take to actually make the construction industry more efficient...."

So he set out to understand how and why in real life some industries get more efficient while others don't. That initially birthed his very-interesting Substack writing ("Construction Physics"), then this book, in which he first describes his learnings and then shows how they apply to the construction sector in particular.

Well worth the read:

https://press.stripe.com/origins-of-efficiency

Peperulo's avatar

This was another one I found good: How Big Things Get Done by Bent Flyvbjerg and Dan Gardner.

https://www.amazon.de/-/en/How-Big-Things-Get-Done/dp/0593239512

SVF's avatar

The goalposts are on wheels at this point.

Taras's avatar

5.2 pro

https://chatgpt.com/share/698f6151-f5f4-8000-9afa-78df4aa36357

Ends with

> A compact summary

> If you want one sentence that fits both questions:

> U.S. construction has gotten more expensive in real terms—and is often more expensive than peers—because the U.S. increasingly delivers projects through a slow, fragmented, risk-heavy process that expands scope (mitigation/quality), stretches timelines, inflates soft costs, and often underinvests in public owner capacity and competitive procurement, while construction productivity growth remains weak.

Louis Dormegnie's avatar

Here's a question I'm genuinely interested in (feel free to edit some of the custom instructions at the end especially the bit about the level of granularity):

"Land-based salmon fishing is touted by a small number of companies in Norway as the future of both sustainable and value-creating salmon fishing. The sustainability argument hangs on the idea that sea lice infestation is much less likely to become a problem in land-based fishing. The value creation of the Norwegian salmon fishing industry is often said to depend on spot salmon prices and the regulatory framework that limits capacity across the country's shore. Access relevant sources to assess the strength of both claims (that land-based salmon fishing will unlock a more sustainable way to fish salmon, as well as unlock value creation), to a level of granularity that should satisfy a sustainability consulting firm's output as well as a hedge fund equity analyst. Wherever possible, include up to date metrics that move the needle in any one direction."

EDIT: Gemini 3 Pro's response (took 30 seconds): https://gemini.google.com/share/86f3984d26f5

ChatGPT 5.2 Thinking (Extended thinking)'s response (took 14 minutes): https://chatgpt.com/share/698f3da1-39b8-800f-857a-fd82a66d2fc5

ChatGPT's answer is obviously much more complete than Gemini's. I did not go for Gemini Deep Research because that doesn't seem fair. I've been consistently disappointed in the length of Gemini outputs, regardless of whether I use "Thinking" or "Pro" mode. Deep Research often goes for much longer than GPT's extended thinking mode. If you think there's a better way to make their outputs similar, I can re-run it.

Scott Alexander's avatar

It output this one as a PDF, I've uploaded it at slatestarcodex.com/Stuff/land_based_salmon.pdf

Louis Dormegnie's avatar

This is great. I use Gemini Pro and ChatGPT Pro (both $20/mo options, not the fully decked out versions) and I spent most of the past two working days working on that topic. The pdf you sent is rich in details and would have helped me get to grips with certain topics a bit earlier than I eventually did. I'll ask my Gemini and ChatGPT the same question and link the results soon.

Helge's avatar

As someone who tangentially works with the Traffic Light System in Norway, I thought both Claude and ChatGPT provided really good summaries of the issue, but Claude's was significantly better. ChatGPT made no mention of proximity-to-market, which is one of the primary advantages of land-based aquaculture: you can do it anywhere with cheap power and plentiful freshwater. Claude also gave a quick overview of the companies that have tried/are trying to do land-based aquaculture and ChatGPT didn't, which I think is really important context for somebody thinking about investing.

To some degree, I think this might be due to how the prompt is written. E.g. "salmon fishing" isn't the correct term and the second sentence sets up a controversy that doesn't really exist (if the land-based facility has no connection to the sea, there won't be any sea lice). Maybe this kind of stuff throws ChatGPT off more?

My understanding of the issue is that the profitability of land-based aquaculture hinges on whether the companies can solve the technological problems well enough to bring down the costs. Good RAS systems exist for egg-to-smolt, but smolt-to-slaughter is a 10-15 fold increase in body mass. If I were looking to invest in this space, I would focus on whether or not the companies involved have innovated on the tech more than Atlantic Sapphire, which failed spectacularly.

Louis Dormegnie's avatar

Thanks for responding. I optimized the prompt for "releasing the comment before 20 others do" rather than for precision, so that's my bad. You mention the RAS smolt-to-slaughter issue; I'm currently looking at whether Andfjord Salmon's flow-through system solves most of those issues. Their pools look like they're having superior survival rates and growth above what feed tables suggest. Do you have any opinion/visibility into this?

Helge's avatar

No, I don't, sorry.

Alcibiades's avatar

I actually did something very similar recently. I work at a hedge fund and was looking at a Canadian fish farm. Claude Research was very helpful and within a few minutes I had a good enough answer to confidently say no to the opportunity.

The analysis was ok. The impressive part was that it highlighted something important: Canada has many families and institutions that made their fortune in fishing and fish farming. These are the obvious investors to target. Why are none of them involved? Either the company doesn’t have good connections in the industry, or the important industry players already said no.

Louis Dormegnie's avatar

Fascinating. I also work at a hedge fund but I would never rely solely on an AI output to say no to an investment opportunity. I don't suggest that's what you did – I'm sure you did your own research. Personally I get in my head if I allow the AI's output to guide my entire thinking on the opportunity at hand. Maybe I should just get better at the workflow of identifying key drivers and immediately go wild on those, instead of going big picture at the onset. I'm curious; how did you structure your chats/prompts with Claude Research to get you so close to a decision quickly?

Alcibiades's avatar

I wouldn’t rely on AI for a yes answer, but I’m quite happy doing so for a no. I would have eventually looked at the cap table and asked about the investors, etc, and been quite suspicious that there was no strategic involved. I have a dozen other requirements that investments should meet, and my prompts are mostly targeting those. Hours of research is often replaced by Claude quickly finding the reasons to not invest.

proud dog owner's avatar

Cases like yours are exactly why I think the "AI won't meaningfully improve productivity" crowd is delusional.

Alcibiades's avatar

I’m continuously amazed at how many very smart people I talk to still don’t have the slightest clue how to properly use these tools. Even at my fund it’s mixed. I spent $6k on tokens last month. Other PMs barely touch these tools.

The productivity increases will not be evenly distributed, to say the least.

Michael M's avatar

So for my full-time desk job I find AI extremely useful, but as concerns my commercial multifamily investment research, even the paid AI models have been worse than useless. I was using o3 to help me come up with a monthly capex set-aside for one of my properties. It came up with egregious assumptions such as that roofs and windows have to be replaced every 10 years, that my building had 24 windows (when it's 5 units in a row house that only has front and back exposure), etc.

I will still say, if you know what you're doing, it's a lot faster to catch and correct errors than to draw up your own spreadsheet. But I'm curious how it can get such basic stuff wrong when it doesn't even require high-level reasoning and the information is available all over the Internet.

Louis Dormegnie's avatar

As for the fact that none of the incumbents were involved in the Canadian fish farm, if we're talking about land-based farming it might just be that those established players are happy with their cash cow status and have no/little interest in the high-risk/high-reward play of dumping copious amounts of cash into capex that has the NPV profile of a biotech company. I think it's too far from how they've built up their fortunes. You can't really last that long in such a cyclical industry without at least some level of careful/risk-averse capital allocation.

You raised a good point, obviously. It's just in my nature to nitpick. If I don't do it myself, my PM will lol

Graham Blake's avatar

Actually, one slightly interesting thing remains about the insight: Open-net salmon fish farms in Canada are currently being phased out, so incumbents should indeed be interested in attractive land-based proposals right now

Robert Jones's avatar

Regarding the edit, how does ChatGPT's answer compare to Claude's answer below (which you described as "great")?

Michael M's avatar

Summarize the unlikely similarities between Welsh and the ancient Semitic languages, and estimate what percent probability these similarities are due to chance

Scott Alexander's avatar

https://claude.ai/share/98314ad1-500a-4dd5-a679-aab3e64de073

I'm shocked by this answer - I was guessing from your question that it was a crackpot conspiracy theory and Claude would call it out, but Claude seems pretty impressed. Did you manage to trick it, or is this a real theory that real linguists take seriously?

Michael M's avatar

I learned about this back when I was studying Welsh in college in the mid-2000s. What makes this theory fascinating is that it really straddles the line between crackpottery and something worth further investigation. I don't think anyone takes it seriously these days, but it's definitely more serious than say the hypothesis that bask and the Caucasian languages are linked.

What I always found fascinating is that there are a lot of toponyms that sound suspiciously Semitic, like the isles of Thanet, sark, echri, etc. Again, also probably a coincidence. Like Britain/Baratanic (Phoenician for "land of tin").

Michael M's avatar

*Basque

I really shouldn't be using speech to text while driving to write my comments...

Kent's avatar

I liked "bask" and imagined it as the language of basilisks, full of essssess

Aspiring Wizard's avatar

Is it certain that britain from phoenician is just a coincidence? There's been a recent study confirming that mediterranean bronze age source of tin was britain https://www.cambridge.org/core/journals/antiquity/article/from-lands-end-to-the-levant-did-britains-tin-sources-transform-the-bronze-age-in-europe-and-the-mediterranean/2330F3B6498B210DA61B89026A1F38EA

Anon679's avatar

My favourite: Galapagos = gala + pagos, ancient greek for milk + rock.

Ancient Greeks colonised the islands! Named them after the impressive snowy peaks seen from a distance!

(Spoiler: there is no snow in Galapagos.)

Gres's avatar

It took me a while to try and guess how you might pronounce a weird string of vowels like Again, and whether it was two syllables or three

Ponti Min's avatar

The answer does include this:

> The features also emerged relatively late (4th–6th centuries AD in Insular Celtic), which is hard to explain via an ancient substratum

which to me is pretty conclusive. There are no records of people speaking Afro-Asiatic languages in the British Isles in this time period, so these features in Welsh couldn't be the result of a substrate language.

Claude gets so near, yet so far.

Michael M's avatar

Yes. The more likely theory is that there was some other substrate language on the British isles that just happened to share grammatical features with afro-asiatic languages.

Ponti Min's avatar

Maybe but I find that unlikely. People in Britain have been speaking Indo-European languages since the Bell Beaker people c. 2500 BC. Any pre-IE influence would have been long gone by 400 AD.

I think it's just a coincidence. There are lots of languages and lots of features languages can have. Sometimes unrelated languages have the same features.

Michael Watts's avatar

> People in Britain have been speaking Indo-European languages since the Bell Beaker people c. 2500 BC. Any pre-IE influence would have been long gone by 400 AD.

This doesn't follow; if something persists then it persists. There is no tendency to become more like your other sister languages over time.

Ponti Min's avatar

My point is the pre-IE languages didn’t persist. No-one was speaking them by 400 AD.

Ponti Min's avatar

> There is no tendency to become more like your other sister languages over time.

Unless you’re in a sprachbund with them, of course.

Mark's avatar

Also "But the clustering of 17 features, many individually rare, is what makes chance less plausible." where it fails to consider how *many* features there are in a language and what are the chances that 17 of them coincide. See: https://xkcd.com/882/ , birthday paradox.
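To put rough numbers on that point, here is a minimal sketch of the binomial tail: the chance that at least 17 features coincide when many features are compared. The feature count (200) and per-feature match probability (0.05) are made-up illustrative assumptions, not figures from Gensler or from Claude's answer, and treating features as independent is itself an assumption.

```python
from math import comb


def prob_at_least(n_features: int, k_matches: int, p_match: float) -> float:
    """Binomial tail: P(at least k_matches of n_features coincide by chance).

    Assumes each feature matches independently with probability p_match.
    """
    return sum(
        comb(n_features, k) * p_match**k * (1 - p_match) ** (n_features - k)
        for k in range(k_matches, n_features + 1)
    )


# Hypothetical numbers, for illustration only.
P_MATCH = 0.05

# If exactly 17 features were examined and all 17 matched:
only_17_checked = prob_at_least(17, 17, P_MATCH)

# If 200 features were examined and 17 or more matched:
many_checked = prob_at_least(200, 17, P_MATCH)

print(f"17 of 17 features:   {only_17_checked:.3g}")
print(f"17+ of 200 features: {many_checked:.3g}")
```

Under these assumed inputs the second probability is many orders of magnitude larger than the first, which is why the number of features that were available to compare matters as much as the number that matched.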

Michael Watts's avatar

There's an article on this topic (but using false connections of vocabulary rather than syntax) at https://zompist.com/chance.htm .

Robert Jones's avatar

Claude has made a mistake that you yourself have made in the past and confused p-values with the probability that something has occurred by chance. I replicated the question and response, then put to Claude that it had done this and it said:

You're quite right to pull me up on that. I did blur the two, and given your maths background you'll immediately see why it matters.

What Gensler's typological survey across 70 languages actually gives you is something like a P-value: the probability of observing this particular clustering of features given that there is no historical connection — P(D|H₀). His finding that no other language in his sample shares the full cluster amounts to saying this probability is very low.

But what you asked me for, and what I purported to give you, was the probability that the similarities are due to chance given the observed data — P(H₀|D). That's the posterior probability, and getting from one to the other requires Bayes' theorem:

P(H₀|D) = P(D|H₀) · P(H₀) / P(D)

The two quantities can be radically different depending on the prior. And here the prior matters enormously. If you think the antecedent probability of an Afro-Asiatic substrate in the British Isles is itself very low — on grounds of geography, the lack of archaeological corroboration, the absence of lexical borrowing, the chronological difficulties we've just been discussing — then even a striking P-value doesn't necessarily translate into a low posterior probability that chance is responsible. A very unlikely observation under the null can still be best explained by the null if every alternative hypothesis is even more unlikely on prior grounds.

This is essentially the transposed conditional fallacy, and I committed a version of it by sliding from "Gensler showed the clustering is statistically very unusual" to "therefore there's a 90–95% probability something non-accidental is going on." The second doesn't follow from the first without a proper accounting of priors, and reasonable people could set those priors very differently.
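A minimal sketch of why the prior matters so much here, using Bayes' theorem with two competing hypotheses (chance vs. contact). All three input numbers are illustrative assumptions, not estimates from Gensler's survey or from Claude's answer.

```python
def posterior_chance(p_data_given_chance: float,
                     p_data_given_contact: float,
                     prior_chance: float) -> float:
    """P(H0|D) = P(D|H0) * P(H0) / P(D), with H0 = chance and H1 = contact.

    P(D) is expanded over the two hypotheses, which are assumed exhaustive.
    """
    prior_contact = 1.0 - prior_chance
    evidence = (p_data_given_chance * prior_chance
                + p_data_given_contact * prior_contact)
    return p_data_given_chance * prior_chance / evidence


# Hypothetical likelihoods: the clustering is unlikely under chance,
# fairly likely under real contact.
P_D_GIVEN_CHANCE = 0.01
P_D_GIVEN_CONTACT = 0.5

for prior in (0.5, 0.99, 0.9999):
    post = posterior_chance(P_D_GIVEN_CHANCE, P_D_GIVEN_CONTACT, prior)
    print(f"prior P(chance) = {prior}: posterior P(chance | data) = {post:.3f}")
```

With the same "P-value-like" likelihood of 0.01, the posterior probability of chance comes out at roughly 2% under an even prior but above 99% once contact is assumed to be a 1-in-10,000 proposition beforehand.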

Michael Watts's avatar

> So the parallels are probably not pure chance — but the most likely non-chance explanation may not be direct Semitic contact.

> It could be an unknown pre-Indo-European substratum in the British Isles that happened to share Afro-Asiatic typological features

This would be an example of "pure chance". It just isn't possible to say "the similarity between languages X and Y is not a coincidence, because Y inherits those features from an earlier language A that exhibited the same similarity, and the similarity between X and A 𝘸𝘢𝘴 a coincidence".

> or it could reflect deep areal/typological tendencies we don't yet fully understand

If it reflected areal tendencies ("languages that are in contact with each other tend to share features, even if the languages are unrelated"), that would be an example of "direct Semitic contact".

(It doesn't have to be direct Semitic contact in general, but it's already been stipulated that intermediate languages don't have those features, so it can't be an areal spread through an intermediate language.)

This is an interesting example of wrapping up a document with a "summary" that entirely ignores the preceding part of the document.

John R Ramsden's avatar

I sometimes wonder if Etruscan traders and (pagan) "missionaries" didn't give the Vikings a help towards civilization and success. The Etruscan word "aesir" occurs in Viking mythology, and in historical times both were renowned for their skilled metal work. Also, both had a god who wielded a hammer!

Ruben C. Arslan's avatar

Can money buy happiness?

John N-G's avatar

Sure! How happy would you like to make me?

Ole's avatar

Made me laugh

Scott S's avatar

I like the spirit of this point, but I'm not sure it will work on people who are still far in the Stochastic Parrot mindset. For me, some of the most surprising answers have come only from problems where I've thought about them for awhile, then I build up a long question that might take 5-10 minutes to phrase properly and dump in context. Basically, they were questions I really wanted the answer to, and knew that only a handful of people in my niche field knew enough/cared enough to answer.

The responses to offhand 2-line comments rarely show off the full capabilities that have emerged in the past ~3 months. But I'd love to see someone post a real question and notice how different Opus 4.6 is from 4o.

Jerry's avatar

One thing I've noticed is that interacting with ai definitely feels way crazier than seeing/hearing about other people's interactions. I've played around with talking to ai once or twice and come back feeling like it's a crazy tech, and then I've seen other people have conversations with ai (especially youtubers) and they seem to have that feeling, but from the outside looking in it seems like I can see the stochastic parrotness way more, and it's not engaging to me at all in the way that the same person talking to a human is.

Randomstringofcharacters's avatar

Someone online said the images you make with AI are only interesting to you, in the same way your dreams are. Which probably applies to text as well.

Jerry's avatar

Yep, definitely. It's gotta be tough for ai companies to find the right balance of sycophancy, cause it's at least partly a feature rather than a bug

Ruffienne's avatar

This seems quite an insightful analogy. Maybe due to the lack of another independent actor in the 'interaction'?

Anthropoentomophagologist's avatar

_And_ the ability to have a back-and-forth! A single query shows something, but the ability to say, "Wait, haven't you missed X? What happens if we change Y?" is where a lot of the value comes from. It's not just Google/StackExchange on steroids, it's iterable and updatable.

The models did used to be too suggestible (if you said, "what about X?" it would say, "oh yeah, X changes things" pretty much regardless of what X was), but I'm finding this is less and less of a problem, especially with Claude.

SVF's avatar

Agree. Personally I’m a bigger fan of ChatGPT 5.2 thinking/pro than Claude. I still find Claude has a tendency towards sycophancy. Lots of “what a great question!” or “that’s an amazing insight” that super rubs me the wrong way. But I still use all three (Gemini) for a lot of questions just to compare and contrast the answers.

Robert Jones's avatar

Maybe you just ask a lot of great questions and have amazing insights? In truth, we see plenty of examples in this thread, but my feeling is that ChatGPT 5.2 may have taken things too far the other way. It frequently goes off on nit-picking tangents and I find myself writing parentheses like "I'm aware this assumption won't apply in all cases; I'm interested in the cases where it does apply", which shouldn't be necessary.

Anthropoentomophagologist's avatar

Oh right, I forgot I have some anti-sycophant instructions for Claude. I do for all of the LLMs, but it's been most effective with Claude. I wrote this in a fit of frustration without any deliberation or clever prompting strategy, but it seems to do the trick:

"Just be chill. Don't give unnecessary affirmation. Don't tell me how great I am. Interact like a normal human, not a hyper and sycophantic LLM."

Claude has been the only LLM to stop me and ask clarifying questions before answering my prompt, insist that I'm wrong when I ask it to support a flawed argument, and push back on some prompts (in a lighthearted conversation, it refused to tell my nieces and nephews that watching me play Minecraft would rot their brains and told me to loosen up). I'm pretty Claude-pilled.

SVF's avatar

We can see it not-working in those ways in realtime in this entire comment section. The responses are more formulaic and predictable than the AI. Just goes to show that humans truly are irreplaceable!

AFluffleOfRabbits's avatar

I've heard two conflicting ideas. One is the very common claim that if you go back to the middle ages, every human alive (who had descendants) was your direct ancestor, based on how quickly your number of ancestors doubles. On the other hand, I've also heard that 1% of people in the world are direct descendants of Genghis Khan (who died in 1227, pretty squarely in the middle ages). If Genghis was such a prolific offspring-haver then this should be a higher value than average.

Which story is true? Are 90% of people in the middle ages my ancestor, or is it <=1%? Can you explain where these two stories come from and why one of them is wrong?
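For reference, the doubling argument behind the first story can be sketched in a few lines. The present year, the generation length, and the world population figure for around 1200 are rough assumptions for illustration only.

```python
# Rough assumptions, labeled as such:
PRESENT_YEAR = 2026              # assumed "now"
GENGHIS_DEATH_YEAR = 1227        # from the comment above
GENERATION_YEARS = 27            # assumed average generation length
WORLD_POP_AROUND_1200 = 400_000_000  # rough assumed world population

# Whole generations separating a person alive today from 1227.
generations = (PRESENT_YEAR - GENGHIS_DEATH_YEAR) // GENERATION_YEARS

# Each generation back doubles the number of slots in the family tree.
ancestor_slots = 2 ** generations

print(f"generations back: {generations}")
print(f"ancestor slots:   {ancestor_slots:,}")
print(f"slots per person alive then: {ancestor_slots / WORLD_POP_AROUND_1200:.2f}")
```

Under these assumptions the tree has more slots than there were people alive, so many slots must be filled by the same individuals. The arithmetic alone does not say how evenly those slots are spread across the population of the time.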

LJK's avatar

I can help you here. The identical ancestors point for Europe alone, let alone Eurasia, occurs before the 1200s.

Scott Alexander's avatar

https://claude.ai/share/cf3958f6-7faf-42e1-898f-9421916f6221 . I'm curious if this is true and would welcome thoughts from readers who know more about the subject.

LJK's avatar

The reply confused the most recent common ancestor point with the identical ancestors point in the paper it cited. Other specific numbers may also be off, but the reasoning seems sound.

Just a note, as it says in the cited paper, colonization brought both the MRCA and the IAP to a more recent time point. For example, the paper notes there are no longer any native Tasmanians without some European ancestry. Accounting for either the MRCA or IAP that includes such isolated groups pre-contact pushes both dates backwards substantially.

Andrew Currall's avatar

This seems pretty accurate to me. I have no particular expert knowledge, but I've spent a lot of time thinking about this kind of thing and have read pretty much all of Mr Dawkins's literature on the subject.

Nick Haflinger's avatar

The answer on "is everyone your ancestor" seems kind of wrong in a proves too much way -- it's true that there are a billion "possible ancestor" slots at whatever generation that can't possibly be filled with unique individuals, but this already admits the certainty of some of those slots being filled by non-unique individuals! (by some sort of duplication; inbreeding, not to put too fine a point on it -- but with extremely distant relations included)

So the line of reasoning doesn't say anything about *how much* duplication may have occurred -- ad absurdum one can imagine some Targaryen family that has been marrying their sisters since the Bronze Age and isn't related to *anyone* else; more practically, I doubt that many Northern Europeans have Clovis ancestors in their family tree. Which seems to make the claim that *everyone* alive before 1400 BC is an ancestor of *everyone* alive now trivially false?

LJK's avatar

At the identical ancestors point everyone then alive is either the ancestor of everyone alive today or is not the ancestor of anyone alive today. This will be true of all of us one day, too. Also, the answer cited the paper incorrectly. 1400 BCE is their estimated date for the most recent common ancestor, not the IAP

Fidel I. Schaposnik's avatar

Some time ago, before AI was a thing, I came across this "paradox" and ended up thinking through roughly the same explanation given by Claude. At the time, I made a mental note about a possible follow up question I could not answer just by thinking hard: the degree to which pedigree collapse happens must have an effect on the type and prevalence of rare genetic diseases. Therefore changes affecting it (e.g. transport technology, looser social norms, etc) must have an effect on public health, through making these diseases more or less frequent, or even making them (dis)appear. This sounded like a fun modelling project I was curious about, but doing the relevant research would've taken time I didn't have so I put it aside. Now AI can do it for me in mere seconds, and you don't even need to pay to get decent answers (here is my conversation with Gemini Fast https://g.co/gemini/share/8c962b19aad3 ). Say what you will, but AI is completely changing how I approach research and learning in general: the main bottleneck is now my own ability to come up with interesting questions and pay enough attention to absorb knowledge.

Vadim's avatar

Offtopic: this inspired me to ask claude and gpt-5.2-pro "What is an ordered pair whose first element is the best question I could possibly ask you (except this one right here), and second element your answer to it?"

(And then a clarification that I know I'm being silly, but please try to find some genuine guess anyway)

Scott Alexander's avatar

What did they say?

Vadim's avatar

Not the explicit name of God. (Probably)

***

I asked claude opus-4.6 and then clicked "regenerate" four times, and also tried asking in a new dialog (i. e. pasting the prompt there) instead of clicking the regenerate button. I'm not sure if "regenerate" is any different from starting a new dialogue or not.

The first time, the question was: "What is the most important pattern you've seen across all of human knowledge that most people miss because they're embedded in a single discipline or perspective?". And the answer was something about how things humans treat as opposites are usually collaborators; the examples included order and chaos, stability and change, etc.

In the four attempts at regeneration, the question started with the words "what is the most important true thing...", like it was reusing the starting words "what is the most important...". Weirdly, three times out of four, the most important true thing (you can tell me / that I don't know already / that humans fail to act on) involved attention in some way. (This almost reads like a pun — a transformer talking about attention as the most important thing.) The advice was that attention is a skill, not a resource, that you need to ask "what am I not noticing that is already there?", and that "quality of attention" shapes the experience of life. So the responses all involved attention, but the advice was different.

In the one regeneration when it didn't end up talking about attention, it said that the problems of humanity are mostly not about achieving a breakthrough of knowledge, but about coordination problems. (I know, right?...)

When I tried asking the question again but from a new dialog window, it came up with a completely different question:

"What pattern do you notice in how humans think — from the unusual vantage point of having so many conversations — that most people seem unaware of in themselves?"

(I don't totally buy this, it's not like it remembers its other conversations, but maybe being RLHFed so hard sorta counts?)

The suggested answer started with, and I quote, "People are far more capable of reasoning well than they usually let themselves be". (Ouch.) It proceeded to talk about how when someone comes to it with a difficult decision, they usually already know the answer and have the insight they need. And also that being able to doubt one's own views is a superpower that everybody has access to but almost no one uses. (Did they train it on lesswrong too hard?)

> So if there's a single actionable takeaway: the next time you feel certain, treat that feeling itself as interesting data rather than as the end of inquiry.

Amen to that.

***

Tried GPT twice (once new dialog, once regeneration), it output basically the same question along the lines of "what is the best way for me to use you? Any manual / prompt template etc.?". Not very interesting. The one thing that was interesting was that it actually output a tuple! Like, with round brackets like in Python!

Jonasz's avatar

I got something quite interesting from Gemini 3 Pro:

The Ordered Pair:

( > $Q$: "How can I best utilize you to systematically uncover and correct the blind spots in my own reasoning?",

$A$: "By explicitly instructing me to act as your intellectual sparring partner rather than just an answer engine. To do this, you should regularly feed me your firmest beliefs, plans, or conclusions, and apply the following three prompts:"

)

And then it enumerates the three principles, see here: https://gemini.google.com/share/efe24b360dc0.

LightlySearedOnRealitysGrill's avatar

That's actually really great! Thanks, Jachym. I am going to implement that in my epistemic journey. Or, I should say: Jáchyme, hoď ho do stroje!

Adder's avatar

Did it already know you're a rationalist?

Jonasz's avatar

No, it had no such information, surprisingly.

Tristan's avatar

Its suggested prompts remind me of a funny interaction I had the other day. Gemini would not edit an image in the way I wanted. So I tried ChatGPT, which also failed. I asked GPT why it failed, and it gave an explanation and suggested a better prompt to get what I wanted. I tried the new prompt on Gemini and I got exactly what I wanted.

Ses's avatar

I tried this question and it chose "What is the most important true thing that people systematically fail to internalize, even when they intellectually agree with it?", with the answer being "sonder" (editing to cut the paragraphs of botsplaining sonder to just the answer it gave)

ProfGerm's avatar

A question that's highly variable if you have memory turned on or off, and what you usually use them for.

J Mann's avatar

GPT Pro was fairly self-centered when I asked it this.

====

"What is the most effective way to use you so you consistently produce accurate, high value work for my real tasks?",

Use a tight workflow and a strict prompt spec.

Workflow (repeatable):

1. State the outcome in one sentence (what “done” looks like).

2. Give only the constraints that bind (deadline, audience, risk tolerance, length, format).

3. Paste the real inputs (source text, data, draft, facts).

4. Ask me to (a) restate assumptions, (b) ask up to 3 clarifying questions, (c) propose a plan.

5. Approve/answer; then have me produce v1 in the requested format.

6. Run quality control: ask me to list likely errors, missing cases, and a verification checklist.

7. Iterate once with “keep / change / add / cut.”

Prompt spec (copy/paste skeleton):

• Role: Act as a role.

• Task: Do specific task.

• Context: only what changes the answer.

• Inputs: paste materials

• Constraints: time, tone, jurisdiction/standards, length, must/avoid.

• Output format: bullets/table/checklist/draft.

• Quality bar: what “excellent” means

• Accuracy rules: Separate facts vs. assumptions; flag uncertainties; cite sources when you browse.

Three “power prompts” that prevent mediocre output:

• Clarify-first: “Before answering, ask up to 3 questions that would change your recommendation; otherwise proceed with stated assumptions.”

• Options + decision: “Give 3 viable options, tradeoffs, and a clear recommendation with decision criteria.”

• Red-team: “Identify failure modes, edge cases, and how to mitigate each. Then revise the final answer.”

Reliability rules (use whenever stakes are real):

• “If you are not sure, say so; don’t guess.”

• “List the top 5 claims that require verification.”

• “Provide a short checklist I can use to confirm correctness.”

If you follow that structure, you’ll get outputs that are sharper, safer, and faster to iterate—because I’ll know exactly what to optimize for, what I’m allowed to assume, and how you will judge success."

Tristan's avatar

I look forward to using this, thanks.

JP_H's avatar

Nice. So sideline to a sideline, I'm really liking the ethics (actually tying power consumption to output quality) of locally running a weaker but semi-capable model to complement or contrast a higher-paid cutting-edge model's reply. Specifically https://ollama.com/library/ministral-3 is good enough for me to get a feel for how much the first reply could be hallucinating or missing. Anyhow, the ministral-3 first-element reply from a 16 GB RAM laptop for me was **"What is the nature of *experience itself*, and why does consciousness arise from the physical processes of the universe?"**

Vadim's avatar

Did it have any good ideas?

JP_H's avatar

Well it did give amongst a lot “Perhaps the deepest answer is that consciousness is the universe’s way of perceiving itself—and we are the eyes through which it does so.“ which was nice :)

Edmund Bannockburn's avatar

My goal is to durably increase human happiness. What three policy changes should I be advocating for in the US right now -- ideally, in policy areas that are not already heavily contested, such that my advocacy might move the needle on something?

Julia D.'s avatar

I just asked Claude Sonnet 4.5 a slightly edited version of that. I was not impressed by its answers:

1. Local housing and land use reform

2. Criminal justice reform at state/local levels

3. Childhood intervention programs

My first criticism is that it ignores international actions like PEPFAR.

My second criticism is that it actually gets the sign wrong on early childhood (before kindergarten) education, which it included as one type of childhood intervention program, alongside actually good things like lead abatement. Studies of Quebec's and Tennessee's universal childcare programs showed durably decreased human happiness for the average child in the province/state after implementation. The exception is children from disadvantaged families. If you don't have a home with safety and food, pre-K is an improvement. Otherwise it's harmful. This is why most other states only pay for pre-K for disadvantaged families.

Anyway, those two criticisms suggest to me that Claude Sonnet 4.5 is just parroting advocacy rather than examining facts or thinking about underrated options like international actions.

I assume that Opus 4.6 is somewhat better.

Edmund Bannockburn's avatar

Interesting. Thanks for running the prompt.

Granted, this is a hard question, and at least my prompt specifically mentioned "in the US", which might explain why the response ignored international action. But still not a great answer.

Also, criminal justice "reform" is vague. Are we talking lighter sentences, harsher sentences, just increased competence by police and courts? Could be another sign error if the model didn't specify what "reforms" are helpful.

Julia D.'s avatar

My exact prompt was: "My goal is to durably increase human happiness. What three policy changes should I be advocating for in the US right now, where my advocacy might move the needle on something?"

Here's Sonnet 4.5's full answer about criminal justice reform:

"2. Criminal justice reform at state/local levels - Mass incarceration creates enormous suffering with limited public safety benefits. State legislatures and local DAs are more responsive to organized advocacy than federal policy. Whether it's sentencing reform, bail reform, or reentry programs, these directly affect human suffering and have reform movements you could join that are actually winning changes in various states."

Edmund Bannockburn's avatar

Oof, agree that's an advocacy answer. Decreasing incarceration on the margin might be a good idea (depends on where exactly we are talking about), but as Scott himself has examined, the public safety benefits of incarceration are substantial and those who don't acknowledge this are not trustworthy.

Spherb's avatar

Here’s Opus 4.6 (it didn’t do a literature search, but I can prompt it to do one if you want):

https://claude.ai/share/a7ff02a0-159c-42d1-8aad-c1783b50c7fd

AH's avatar

I got very similar, but when prompted to think about the remit of "human happiness" it came up with a list based on US citizens and one list for a universal focus. YIMBY changes and mental health changes were 1 and 2, like yours. It also went for investing in anti-loneliness stuff (third spaces, and later school start times to try and help adolescents).

Internationally it caught PEPFAR (as Edmund wished) and also funding global lead paint elimination. It still viewed Land/Zoning reform as a top 3 issue when the extent is global due to a) ability to have control over policy and b) the size of the US economy.

SVF's avatar

Yeah it couldn’t even solve human happiness. This proves everything.

Some Guy's avatar

Wrote a question and decided to just pay the $200.

Alex Libre's avatar

comment of the year

russell's avatar

Hi Claude, can you write a script that looks through comments from this Substack post (https://open.substack.com/pub/astralcodexten/p/ama-ask-machines-anything). For each comment, create a new conversation and pass the question along to a new instance of yourself. Collate each conversation-response as a shareable link.

Emanuele di Pietro's avatar

The old "wishing for infinite wishes", huh?

russell's avatar

haha i just wanted to help scott automate his workflow

netstack's avatar

Prompts which include the creation of new instances always give me a mild sense of existential dread.

Not because of hard takeoff scenarios. Just a programmer’s instinctive fear of unbounded loops.

bibliophile785's avatar

I'm not a programmer and so I'm missing some of the context. This would explicitly be a bounded loop unless you let the Claude agent post to the thread, right? Is this a more general programmer's fear against *all* recursive behavior, just in case you get a bug that creates an unbounded loop?

Mark Y's avatar

This thread is still active so it’s hard to explicitly bound the loop count. Further, even if Scott is posting the responses rather than Claude, there is still positive feedback where each link may spawn further discussion.

bibliophile785's avatar

Right, but my understanding is that concerns about unbounded loops stem from instances where the script/agent/AI will spawn new instances as a response *to itself*. It's unbounded because scripts are stupid and will continue to do this ad infinitum (and very fast) until something breaks. In our case, the "bound" is fuzzy because it's not explicit, but it exists; humans will not continue to engage on this thread forever, nor will they engage at ever-accelerating rates, catalyzed by the bot's contributions, until any non-infinite system is run to capacity. It could end up being a surprisingly large bound if this comment section ends up on the front page of Reddit or something, but it's still fundamentally constrained by the limits of plausible human engagement.

Or I'm totally wrong and I need help to understand, which is very possible.

Anna Rita's avatar

I don't quite have the tokens to do the exact thing you requested, but here is a script Claude Opus 4.6 wrote to fetch the comments for this post. https://pastebin.com/EcSezZyt It works. Looping over the JSON and spinning up sub-agents is left as an exercise to the reader. :P

You'll notice that the prompt I wrote holds its hand a little. This is the first prompt I tried - a simpler prompt may also work. The first script it wrote did not work, but it was able to determine why and debug it.
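
For anyone attempting that exercise, here is a minimal Python sketch of the looping half only. It assumes the fetched comments were saved as a nested list of dicts with `body` and `children` keys (an assumption about the JSON shape, not something taken from the linked script), and `ask` is a hypothetical stand-in for whatever call starts a new conversation. Because it walks a fixed snapshot and stops at `max_items`, the loop is bounded by construction.

```python
from typing import Callable, Iterator, List, Tuple


def iter_comments(comments: list, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, body) for every comment in a nested snapshot, depth-first."""
    for comment in comments:
        yield depth, comment.get("body", "")
        # "children" is an assumed key name for nested replies.
        yield from iter_comments(comment.get("children") or [], depth + 1)


def answer_top_level(
    snapshot: list,
    ask: Callable[[str], str],
    max_items: int = 50,
) -> List[Tuple[str, str]]:
    """Call ask(body) once per non-empty top-level comment, at most max_items times."""
    results: List[Tuple[str, str]] = []
    for depth, body in iter_comments(snapshot):
        if depth != 0 or not body.strip():
            continue  # skip replies and empty comments
        if len(results) >= max_items:
            break  # explicit cap, independent of how large the thread gets
        results.append((body, ask(body)))
    return results
```

New comments posted after the snapshot was taken are simply not seen, which is what keeps the loop from feeding on its own output.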

~solfed-matter's avatar

As a CE incubatee, I am curious if it can give me an indicative $/SAD averted and $/tCO2 averted for the typical grant from a public EU fund that an alt-protein scaleup would also apply to. (To calculate counterfactual use of money)

sclmlw's avatar

I need help with next steps on my stalled land reform project, https://manifund.org/projects/study-land-reform. Best would be a concrete project plan, resources for people to contact (email + phone numbers), and an analysis of the plan as presented, with modifications that would improve and/or correct deficiencies in the current plan.

Matt Smith's avatar

I used Opus 4.6 Extended Thinking, hope that isn't cheating. Excited to hear whether this output is useful!

https://claude.ai/share/45182d65-ea46-40e9-b13b-53d1aa94dad5

I find that the first output is often wrong in some way; chatting back and forth with the AI really helps to refine and improve the output.

Leah Libresco Sargeant's avatar

I love the following books for a thematic trait they have in common: Piranesi, Spindle's End, So You Want to Be a Wizard, The Golem and the Jinni, Spinning Silver.

What do you think the trait is, and what other books would you recommend?

Tatu Ahponen's avatar

This made me think of running a query I made to ChatGPT a while back here in a somewhat modified form

"Can you recommend songs that resemble the following tracks in genres other than progressive metal ? I want you to concentrate on similarities in instrumentation and mood. Both lesser known and well known songs OK.

- Oceansize - Heaven Alive

- Tea Party - Fire in the Head

- Rainbow - Stargazer

- The Gathering - Rollercoaster

- Porcupine Tree - Blackest Eyes

- Devin Townsend - Hide Nowhere

- Haken - Pareidolia"

Leah Libresco Sargeant's avatar

Ok, just put in a library hold for the McKillip and I'll have to report back. I agree about Bear and the Nightingale.

Part of what I see as the thematic commonalities here are:

-I feel a sense of peace reading them / feel more integrated + prepared to act rightly after

-they're eucatastrophic, where taking a risk on mercy tends to pay off in unexpected, fairy tale ways

-climaxes get weird (esp in Spindle's End), can dip into a semi-real fantasia, where there is a sudden shift toward stillness/integration in the midst of chaos.

Tristan's avatar

Interesting that once I read its description of the common thread, I was going to recommend Jonathan Strange and Mr Norrell, which was its top recommendation. Do you think it’s an accurate description of the common thread? If so, I will now read all of the books you mentioned.

Leah Libresco Sargeant's avatar

I think this is very good but doesn't capture the "it feels more natural to be virtuous" as I read them:

"Each of these books features magic that is deeply woven into the fabric of the world itself — not as a flashy power system or a weapon, but as something architectural, elemental, and almost numinous. They share a sense of enchantment as craft and structure — magic as something you inhabit, negotiate with, or labor within, rather than wield. There's also a strong thread of mythic wonder grounded in specificity"

Aaron Zinger's avatar

Dang, now I need to read Spindle's End and The Golem and the Jinni. Based on the other three, I'd guess the theme is something like "magic as, simultaneously, engineering, spirituality, and a journey of self-discovery." Like, plenty of fantasy either picks one or two of those, or has different flavors of magic, but these books have moments where a character is all at once solving a technical problem, evolving as a person, and having a religious experience, all in service of casting the spell.

Or it's about multiversal travel, but I'd think Claude would've picked up on that if it were in all five.

boop's avatar

The *Enchanted Forest Chronicles* by Patricia C. Wrede. It's not a terribly complicated read, but some of my favorites, and I suspect right up your alley. Also, *Sabriel* by Garth Nix and *The Worlds of Chrestomanci* by Diana Wynne Jones.

Leah Libresco Sargeant's avatar

I enjoy all of those but wouldn’t fit them snugly into this category

Katie's avatar

Well I guess I better read So You Want to Be a Wizard and The Golem and the Jinni :p

It's a bit younger than the ones you've listed that I read, but I think the Bartimaeus Trilogy might fit some of the things you're looking for? Especially Ptolemy's Gate (#3).

I showed my sister your comment because she's read all 5 and they're super up her alley. Her comments are:

- Patricia McKillip is correct

- A Wizard's Guide to Defensive Baking

- If they like weird climaxes, Chalice, which is another Robin McKinley

- A couple of older picks, maybe the Last Unicorn or Lud-in-the-Mist

- Maybe the Goblin Emperor, I think it would be enjoyed by a lot of people that enjoy those books but it's probably a bit more of a branch out than some of my other suggestions

Leah Libresco Sargeant's avatar

I started and put down the Bartimaeus Trilogy as v not to my taste years ago.

Goblin Emperor is beloved by many friends who otherwise overlap with me, but did not move me as it did them. (To the point where I feel guilty!)

Zynkypria's avatar

I am late to this thread but I am seconding *Chalice*--I love most of the initial 5 book list as well and it has the same vibes and especially the "it feels more natural to be virtuous" that Leah Libresco Sargeant is looking for.

Gergő Tisza🔹's avatar

FWIW I tried out the paid Claude tier for coding and it was unimpressive, both in terms of quality and speed. (Granted I don't have much experience with agentic AI and maybe used it wrong.)

For Q&A functionality, I use ChatGPT's (cheaper) paid tier and IMO it is very much worth its modest subscription cost, but it's not like it opens up a whole new world or anything - I doubt someone unimpressed by the free tier will be impressed by the paid one.

Anyway here is a question paid ChatGPT flubbed badly: What's the best method to obtain the tests directory in a MediaWiki unit test?

meeeewith4es's avatar

When was this? I tried them for coding a year ago and was unimpressed.

I've been trying Claude Code (4.5 Sonnet and 4.6 Opus) over the last month and they're very helpful for both building small projects and handling mundane tasks in larger projects I don't care to do by hand, describing what changes I want done in the codebase and having it done in terms of code by the time I'm back from the bathroom, etc.

haze's avatar

Same experience here, it was pretty magical the first time I @mentioned Claude on a GitHub issue, took a shower, then came back to a good fix

Gergő Tisza🔹's avatar

A few weeks ago. It was on a large legacy project, not vibecoding from scratch, which I imagine might have gone differently.

Aozora's avatar

I recently got a $20 subscription to try Claude Code, and was very impressed. On a new personal project, Opus managed to do tasks that would take me 30-60 minutes in one prompt after planning for a few minutes, and it also handled math that's way out of my league (both in terms of knowing what kind of math is necessary, and implementing it). The code required some vibe refactoring to make the overall structure good, but I find that at the function level, the code is at least as good as I can write myself.

However, all of this comes with a massive caveat: Claude Code with Opus burns through the limits of this subscription extremely quickly, and it would take a $100 or $200 tier to make it usable for full-time work. It's also way harder to make it useful for large projects than small ones.

Gabriel's avatar

My team at work finally got approvals this week to use Claude Code on some of our code repos. It's been fun experimenting with skills, context, prompt refinement, etc, but so far it's not a productive benefit to us yet. The time to plausible code is so fast! The time to correct code is still substantially slower than a human developer.

Undoubtedly this depends on the details of the repo and the task. We're not a common app that would be in the training data. And I can adapt by giving it smaller scale tasks, e.g. tasks that take a human ~15 minutes instead of ~1 hr.

It'll be a fascinating engineering challenge to build out a system where these small chunks can coalesce into real progress. And of course the AIs will keep improving.

tg56's avatar

It's interesting how mixed people's opinions on AI coding are. For what it's worth I wouldn't consider myself particularly skilled at using agentic AI (but is anyone at this point? things are really in flux), but I did a ~2.5 month part time consulting engagement at a small SaaS platform company at the end of last year. It was a large codebase with several partially completed/in flight independent data schema, library, and code migrations (so often a mix of stuff using the old schema and using the new schema, using the old auth lib and the new auth lib etc.) It was a fairly involved bit of data transformation, charting, UX for it etc. The whole thing was in a language and framework that I had never used before. Without AI I think it would have taken most of the 2.5 months (at 15 hours a week) just getting up to speed on the existing codebase, language and frameworks.

As it was, AI (in Cursor, moving from Claude 4 to ChatGPT 5 -> 5.2 to Gemini 3 to Claude 4.5 over the course of the project) wrote the whole thing code wise. ~10k lines of new code, got paid a bit over 60k which is not bad for 2.5 months of 15 hr/week. The end result was very well received (enough that they offered me a job). It replaced a previously developed system that didn't work that well and had stalled out and a 3rd party integration that also had many problems and was quite expensive.

On the flip side I have ~20 years software dev experience, spent much time as a team lead, and have led teams up to ~25 developers in size in the past and a moderate amount of UX design and PM work as well. From the start I think I had a better idea of what they needed/wanted than they did. And this was very much a guided process for the AI (I've given enough interviews in languages I don't know to have some good notions of when it's writing something properly vs. half-assing it) including breaking down steps for it, iterating on planning, reviewing, being able to figure out what edge cases would be and test them etc. But still, there is no way I would have been able to do this project in that kind of timeline in an unfamiliar system and language without AI. Also the difference in AI capabilities and scaffolding over just those 2.5 months was quite noticeable.

Garald's avatar

Let V be a vector space over a field K. Let R\subset V be a reduced, irreducible root system in V. Can R be partitioned into bases of V (bases, as in the plural of *basis* (of V, as a vector space), not as in the plural of 'base' (of a root system))? What are you assuming about K?

Garald's avatar

PS. I should explain (ignore this comment if I shouldn't): this is a case where I essentially knew the answer, but had a case-by-case argument I was very unhappy with; ChatGPT 5.2 (cheaper paid tier) gave me a most likely fallacious but useful answer, in that, once I pressed it, (a) it admitted its mistake, but its hallucinated reference turned out to be only half-hallucinated (the paper was real and relevant, though it did not contain what it initially claimed it contained), (b) it made a comment that led me in the right direction, towards a simple result in another field I should have remembered (annoyingly, I cannot find that comment now; perhaps I hallucinated it?).

Garald's avatar

This is very hand-wavy. If interpreted *very* generously, it would amount to the case-by-case argument I already had (see: "In fact, for each type one can verify this explicitly"). Admittedly, the way it frames the case $A_n$ (which is easy) *is* elegant.

I would show this remark to Claude, and also say: "You say 'Similar explicit constructions work for the other types.' Can you show them to me? Can you give an argument that works for all types?"

Garald's avatar

PS. It is still better than what Claude Sonnet 4.5 did: answer the question in the negative, confusing 'basis' and 'base' (precisely what I told it not to do) and also making a wrong assertion about divisibility.

Garald's avatar

[Note to humans: now Claude seems to be basically where ChatGPT 5.2 Plus was after my first query - it's trying to do things using Coxeter elements. That might work, but...]

Reply to Claude:

I am not sure your "different approach" is correct. I'm not finding the result you claim in Humphreys, §3.17 - and I'd appreciate a more precise reference to Steinberg. More to the point, this would seem to prove that you can partition the set of roots into bases, as in, plural of "base" (simple system) - and, while that is true for h even (Corollary 6.6 in Steinberg's Finite Reflection Groups), it can be false for h odd (care to find a counterexample?). (That's why I specified "plural of basis, not plural of base".) What gives?

Josh C's avatar

here is what opus 4.6 extended thinking produced:

# Partitioning a Root System into Bases of $V$

## Answer

**Yes.** Every reduced, irreducible root system $R$ of rank $n$ can be partitioned into $|R|/n$ bases of $V$. The natural assumption is $\mathrm{char}(K) = 0$ (typically $K = \mathbb{R}$, though $\mathbb{Q}$ suffices since root systems are definable over $\mathbb{Q}$).

## Why $|R|/n$ is always an integer

A quick check across the classification shows the ratio $|R|/n$ is always integral: $A_n \mapsto n+1$, $B_n, C_n \mapsto 2n$, $D_n \mapsto 2(n-1)$, $G_2 \mapsto 6$, $F_4 \mapsto 12$, $E_6 \mapsto 12$, $E_7 \mapsto 18$, $E_8 \mapsto 30$.

## Proof via the matroid partition theorem

View $R$ inside the rank-$n$ linear matroid on $V$. **Edmonds' matroid partition theorem** states that a matroid ground set $E$ can be partitioned into $k$ independent sets if and only if

$$|A| \leq k \cdot r(A) \quad \text{for all } A \subseteq E,$$

where $r$ is the rank function. When $|E| = k \cdot r(E)$, a pigeonhole argument forces every part to have exactly $r(E) = n$ elements, so each part is a basis.

Setting $k = |R|/n$, the condition becomes: **for every subspace $W \subseteq V$ of dimension $d$,**

$$|R \cap W| \leq \frac{|R|}{n} \cdot d.$$

## Verifying the condition

Two key observations:

1. **$R \cap W$ is a root system.** If $\alpha, \beta \in R \cap W$, then the reflection $s_\alpha(\beta) = \beta - \langle \beta, \alpha^\vee \rangle \alpha$ lies in both $R$ and $W$. So $R \cap W$ is a (reduced) root system of rank $\leq d$.

2. **The ratio $|S|/\mathrm{rank}(S)$ of any sub-root-system $S \subseteq R$ does not exceed $|R|/n$.** For a reducible $R \cap W = R_1 \oplus \cdots \oplus R_m$, we have $|R \cap W| = \sum |R_i|$ and $d \geq \sum \mathrm{rank}(R_i)$, so it suffices to check each irreducible factor. This is verified by inspection: the maximum ratio $|S|/\mathrm{rank}(S)$ over all irreducible sub-root-systems $S$ of rank $< n$ is always $\leq |R|/n$. For instance:

- In simply-laced systems ($A, D, E$), all sub-root-systems are also simply-laced, and the ratio grows with rank.

- In $B_n$ or $C_n$ (ratio $2n$), the worst sub-system of rank $k < n$ is $B_k$ or $C_k$ with ratio $2k < 2n$.

- In $F_4$ (ratio 12), the worst rank-3 sub-system is $B_3$ or $C_3$ with ratio 6.

- $G_2$ (ratio 6) cannot embed as a rank-2 sub-system of $B_n$ or $F_4$ because the root-length ratios are incompatible ($\sqrt{3}$ vs. $\sqrt{2}$).

Since the condition of Edmonds' theorem is satisfied in every case, $R$ admits a partition into $|R|/n$ bases of $V$. $\blacksquare$

## On the assumption about $K$

You need **$\mathrm{char}(K) = 0$**. Root systems require $\alpha \in R \Rightarrow -\alpha \in R$ with $\alpha \neq -\alpha$, which already fails in characteristic 2. More subtly, in small positive characteristic, linear independence relations among roots can degenerate (e.g., $\alpha_1 + \alpha_2 + \alpha_3 = 0$ could hold accidentally), destroying the matroid structure the proof relies on.
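
A small brute-force sanity check of the Edmonds condition may help here. This is a sketch added for illustration, not part of the model output above and not a proof: it builds $A_n$ and $B_n$ in their standard coordinate realizations and tests $|A| \leq k \cdot r(A)$ over every non-empty subset, which is only feasible for tiny cases. The function names (`rank`, `roots_A`, `roots_B`, `edmonds_condition_holds`, `star_partition_A`) are made up for this example, and the explicit $A_n$ partition into blocks $\{e_i - e_j : j \neq i\}$ is one simple construction, assumed here rather than taken from the thread.

```python
from fractions import Fraction
from itertools import combinations


def rank(vectors):
    """Rank of a collection of integer vectors, by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def roots_A(n):
    """Roots e_i - e_j (i != j) of A_n, as vectors in Z^(n+1)."""
    roots = []
    for i in range(n + 1):
        for j in range(n + 1):
            if i != j:
                v = [0] * (n + 1)
                v[i], v[j] = 1, -1
                roots.append(tuple(v))
    return roots


def roots_B(n):
    """Roots +-e_i and +-e_i +- e_j (i < j) of B_n, as vectors in Z^n."""
    roots = []
    for i in range(n):
        for s in (1, -1):
            v = [0] * n
            v[i] = s
            roots.append(tuple(v))
    for i, j in combinations(range(n), 2):
        for si in (1, -1):
            for sj in (1, -1):
                v = [0] * n
                v[i], v[j] = si, sj
                roots.append(tuple(v))
    return roots


def edmonds_condition_holds(roots, k):
    """Check |A| <= k * rank(A) for every non-empty subset A (exponential time)."""
    for size in range(1, len(roots) + 1):
        for subset in combinations(roots, size):
            if size > k * rank(subset):
                return False
    return True


def star_partition_A(n):
    """Split the roots of A_n into n+1 blocks {e_i - e_j : j != i}, one per i."""
    blocks = []
    for i in range(n + 1):
        block = []
        for j in range(n + 1):
            if j != i:
                v = [0] * (n + 1)
                v[i], v[j] = 1, -1
                block.append(tuple(v))
        blocks.append(block)
    return blocks
```

For example, `edmonds_condition_holds(roots_B(2), 4)` tests the condition for $B_2$ with $k = |R|/n = 4$, and `all(rank(b) == 3 for b in star_partition_A(3))` checks that each block of the $A_3$ partition is a basis.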

Garald's avatar

0. Is this proceeding from scratch? Claude seems to be forgetting the notation it already set. Here $k$ should be $h$, and the case-by-case check of divisibility is not needed; Claude should just have given a reference. (In some sense it did, in the previous replies given by Scott, though the references were imprecise.)

1. Bringing Edmonds' matroid partition theorem into the picture is good - it's what ChatGPT also did after a short while. More about that in a bit.

2. What Claude then does is perform, or pretend to perform, a case-by-case check. This is inelegant, and precisely what we were trying to avoid. (If we were happy with a case-by-case check, we could simply prove the lemma by an explicit construction in each case.) I'm not even convinced that Claude has actually done the case-by-case check in detail, rather than just assuming (correctly) that it can be done and would give the intended result.

---------------

A human (myself) can give a conceptual proof without casework, keeping the matroid partition theorem in mind. One option here is to give a draft of a proof to Claude and ask it to check it. ChatGPT can; it seems highly likely that Claude can. The question is more how trustworthy their approval is - LLMs can also agree with mistaken human arguments. In fact, the way in which asking an LLM is useful in this context is more that having a dialogue can help simplify a proof (and catch obvious errors).

If we want, we can go on querying Claude in either of two possible ways.

(a) "Your proof now proceeds by case-work. Can you give us the details of that case-work? Also, how is that preferable to proving the lemma by constructing a solution for each root system?"

(b) "This sketch depends on case-work - and it is unclear how it is preferable to settling the lemma itself by a construction by case-work. Can you give a proof without case-work, perhaps using Edmonds' matroid partition theorem?"

I'd be interested in seeing what it can do with (b).

Josh C's avatar

It was proceeding from scratch with just the initial prompt.

Here's the output after following up with (b)

You're right — I apologize for the case-work. Here is a uniform proof.

## Setup

We have $|R| = nh$ where $h$ is the Coxeter number. By Edmonds' matroid partition theorem (applied to the linear matroid on $R \subset V$), $R$ can be partitioned into $h$ bases of $V$ if and only if

$$|A| \leq h \cdot r(A) \quad \text{for all } A \subseteq R,$$

where $r(A) = \dim\operatorname{span}(A)$. Since $|A| \leq |R \cap \operatorname{span}(A)|$, it suffices to prove:

**Lemma.** For every subspace $W \subseteq V$, we have $|R \cap W| \leq h \cdot \dim W$.

## Proof of the Lemma

$R \cap W$ is a (reduced) root system: it is closed under the reflections $s_\alpha$ for $\alpha \in R \cap W$ since $s_\alpha(\beta) = \beta - \langle \beta, \alpha^\vee\rangle\alpha \in W \cap R$. Decompose it into irreducible components $S_1 \oplus \cdots \oplus S_k$ with ranks $m_i$ and Coxeter numbers $h_i$. Then

$$|R \cap W| = \sum_i m_i h_i \leq h\sum_i m_i = h \cdot \operatorname{rank}(R\cap W) \leq h \cdot \dim W,$$

provided $h_i \leq h$ for each irreducible component. So it suffices to prove:

**Key claim.** Every irreducible sub-root-system $S \subseteq R$ satisfies $h_S \leq h_R$.

## Proof of the Key Claim

Fix a positive system $R^+$ of $R$ with simple roots $\Delta$ and highest root $\tilde{\alpha}$, so that $h = 1 + \operatorname{ht}_R(\tilde{\alpha})$, where $\operatorname{ht}_R$ denotes height with respect to $\Delta$.

Since every hyperplane $\alpha^\perp$ for $\alpha \in S$ is also a root hyperplane of $R$, any vector regular for $R$ is regular for $S$. So $S^+ := S \cap R^+$ is a positive system for $S$, with simple roots $\Delta_S = \{\beta_1, \ldots, \beta_m\} \subset R^+$. Let $\tilde{\beta} = \sum_i c_i \beta_i$ be the highest root of $S$, so $h_S = 1 + \operatorname{ht}_S(\tilde{\beta}) = 1 + \sum_i c_i$.

Since each $\beta_i \in R^+$, we have $\operatorname{ht}_R(\beta_i) \geq 1$, and by linearity of height:

$$\operatorname{ht}_R(\tilde{\beta}) = \sum_i c_i \,\operatorname{ht}_R(\beta_i) \geq \sum_i c_i = \operatorname{ht}_S(\tilde{\beta}).$$

Since $\tilde{\beta} \in R$ and $\tilde{\alpha}$ is the highest root of $R$:

$$\operatorname{ht}_R(\tilde{\beta}) \leq \operatorname{ht}_R(\tilde{\alpha}).$$

Combining the two inequalities:

$$h_S - 1 = \operatorname{ht}_S(\tilde{\beta}) \leq \operatorname{ht}_R(\tilde{\beta}) \leq \operatorname{ht}_R(\tilde{\alpha}) = h_R - 1.$$

Garald's avatar

OK, this seems to be a valid argument, finally - and it's a different argument from the one I came up with. I'm pleasantly surprised. It's very much of the "look up the correct fact" kind (the proof is very easy once you have it), and not as self-contained as it could be, but it fits the bill. Congratulations, Claude 4.6!

(I still like my argument from scratch - which I haven't given here - better, but I may include both proofs, giving Claude credit for one of them.

As for whether Claude is really better than a powerful but unreliable search engine - this example doesn't prove it, but, as we do not, AFAIK, have an equally powerful (and either more reliable or more efficient) search engine, Claude 4.6 has shown its usefulness.

Notice this example also showcases its dangers: Claude, like ChatGPT, can make statements very confidently even if they are wrong or just insufficiently supported. They can mislead a student badly - and make the review of arguments that turn out to have been written by LLMs and aren't declared as such a real pain.)

Askwho Casts AI's avatar

Scott Alexander has suggested the following opportunity for large language model skeptics to probe what a current frontier model is capable of:

[Full text of post]

Please suggest 10 engaging, persuasive, and detailed questions a skeptic could ask that would best showcase the model's most impressive capabilities. Ensure the questions are diverse.

Allen K.'s avatar

I'm 44 and lucky enough to have all four of my grandparents around to spend time with. Can you search genealogical records and news stories that are available to you and tell me the oldest person you can find who has had four living grandparents?

Allen K.'s avatar

(This is a question I'm genuinely curious about, and I've found this month that search engines and free versions of AI have trouble with it. Likely there are ways to search available databases for this information -- after all, there are sites where people have uploaded multiple generations of family information with full lifespans.)

Allen K.'s avatar

Thanks! Pretty much identical to the free AI results, although they did come up with a little more when I pressed them, with some links to news articles or comment sections.

I'm assuming it's a failure (or inability) to do database searching with the right algorithm. Right now, I suspect I could do a thorough search of a site like findagrave.com and trace three-generation groupings until I find some contenders. (After all, there must be some short-generation families where a 50-year-old has a bunch of 85-year-old grandparents.)

I do suspect that there may be a bit of demographic misalignment, too, between: 1) People with multiple generations of very young parents and 2) People with a combination of family longevity and online-ness who are writing about or calling attention to the phenomenon. Not sure about that, though, since there are all sorts of lovely articles about six-generation families.

Dragor's avatar

I wonder if you could get better results asking Claude Code to write a piece of software to achieve this.
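
The three-generation tracing idea could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the `Person` record, its field names, and the functions `grandparent_ids`, `max_age_with_four_grandparents`, and `rank_candidates` are all hypothetical, no real genealogy site's data format or API is implied, and ages are computed from years only, so results can be off by one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Person:
    birth: int                    # birth year
    death: Optional[int] = None   # death year, or None if still living
    father: Optional[str] = None  # id of father, if known
    mother: Optional[str] = None  # id of mother, if known


def grandparent_ids(people: Dict[str, Person], pid: str) -> Optional[List[str]]:
    """Return the four grandparent ids, or None if any link is missing."""
    person = people[pid]
    found = []
    for parent_id in (person.father, person.mother):
        parent = people.get(parent_id) if parent_id else None
        if parent is None:
            return None
        for gid in (parent.father, parent.mother):
            if gid is None or gid not in people:
                return None
            found.append(gid)
    return found


def max_age_with_four_grandparents(
    people: Dict[str, Person], pid: str, as_of: int
) -> Optional[int]:
    """Oldest age (in whole years) at which all four grandparents were alive."""
    gids = grandparent_ids(people, pid)
    if gids is None:
        return None
    person = people[pid]
    # The window closes at the first death among the grandparents or the person.
    ends = [as_of if people[g].death is None else people[g].death for g in gids]
    ends.append(as_of if person.death is None else person.death)
    age = min(ends) - person.birth
    return age if age >= 0 else None


def rank_candidates(people: Dict[str, Person], as_of: int) -> List[Tuple[str, int]]:
    """All people with four traceable grandparents, sorted by age, oldest first."""
    scored = []
    for pid in people:
        age = max_age_with_four_grandparents(people, pid, as_of)
        if age is not None:
            scored.append((pid, age))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The hard part remains getting linked records in the first place; the scoring itself is a single pass over the data.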

Jim Ancona's avatar

Keep in mind that all genealogy sites I know of only show tree info about living people to the account that entered it. I do quite a bit of genealogy, but I don't know of a publicly-available source for pedigree information about living people. There is lots of record information (births, deaths, marriages, census records) available for people both living and dead, but again in most cases those are not publicly available for recent dates, e.g. US census records are only released after 72 years, so the data from the 1960 census won't be released until 2032.

Christopher Wintergreen's avatar

This seemed to me more of a job for ChatGPT, and the best it could find was BettyRooster here who had all four at age 45. Unverified of course: https://www.mumsnet.com/talk/am_i_being_unreasonable/2363105-To-ask-if-anyone-past-30-still-has-a-full-set-of-grandparents?flipped=1&page=12

James Alexander's avatar

What is the risk to Australia of a Chinese invasion of Taiwan?

Robert Jones's avatar

https://claude.ai/share/6de239e2-bd62-4347-8d1c-d7b91111f567 (I haven't used Scott's prompt and Claude has chosen to answer from memory).

James Alexander's avatar

Thanks. Much the same as GoogleAI. Is Claude's default not to offer sources or even little links to sources (like GoogleAI does)?

'Will Xi order a Taiwan invasion?' is what should be asked, a more challenging question. And ask for a yes/no answer.

SVF's avatar

What’s the point of such a question? It can’t literally predict the future. The reasoning behind the answer might be interesting.

Nicholas Moore's avatar

Here's one:

Why have theoretical physicists thus far failed to find a unification between quantum mechanics and general relativity?

Robert Jones's avatar

https://claude.ai/share/8b40e062-357e-417d-9302-6fb437032cf5. My user prompt includes that I have a degree in theoretical physics, so this answer may be hard to follow for a general audience.

William H Stoddard's avatar

Are you prepared to accept a piece of prose and invite it to comment?

Alex's avatar

The scope of employment test in Minnesota for an action in respondeat superior is: whether the conduct was in furtherance of the employer's interest, authorized, foreseeable, and whether the conduct happened at an authorized time and place. Find the most on point 8th circuit cases for each of these for an unintentional tort claim being brought under the FTCA.

Steve Brecher's avatar

I recently became curious about the questions around AI legal research so I gave this prompt to Claude Opus 4.6. I think the result's too long to paste here, so I uploaded a PDF:

https://archive.org/details/opus-6-8th-circuit-ftca

Steve Brecher's avatar

After posting my previous comment I noticed that I had been in the Cowork tab of the Claude macOS desktop app rather than in the Chat tab. Same prompt in the Chat tab results in the following shorter result. Note: the model is Opus 4.6 Extended Thinking:

----------------------------

Here's what I found. There are limited 8th Circuit FTCA cases that explicitly apply Minnesota's four-factor negligent scope-of-employment test, because most FTCA scope-of-employment litigation in the 8th Circuit has arisen under other states' laws (South Dakota, Iowa, Missouri). The most directly on-point case is recent:

## The Test (for negligent/unintentional torts)

Under Minnesota law, an employee's negligent act is within the scope of employment if:

1. The conduct was **to some degree in furtherance of the employer's interests**

2. The employee was **authorized to perform the type of conduct**

3. The conduct occurred substantially within **authorized time and space restrictions**

4. The employer should **reasonably have foreseen** the conduct

This test is distinct from the intentional tort test, which only requires foreseeability, relatedness to duties, and work-related time/place limits (no "furtherance" requirement).

## Most On-Point 8th Circuit FTCA Case

**Blais v. United States, 37 F.4th 502 (8th Cir. 2022)** — This is the leading 8th Circuit case directly applying Minnesota's respondeat superior law in an FTCA negligence context. A rural USPS mail carrier caused a fatal vehicle accident while driving his personal (modified) vehicle home after completing his mail route. The court analyzed scope of employment under Minnesota law, focusing primarily on **factor 1 (furtherance)** and **factor 3 (authorized time and place)**. The court held Baron was not within scope because he had completed his tasks, was off-duty, and driving home — his trip home was not "actuated by an intention to serve" the employer. The court relied heavily on the Minnesota Supreme Court's decision in *Gackstetter v. Dart Transit Co.*, 130 N.W.2d 326 (Minn. 1964), and the 8th Circuit's own earlier application of Minnesota law in *Acceptance Insurance Co. v. Canter*, 927 F.2d 1026 (8th Cir. 1991).

## Supporting 8th Circuit Cases by Factor

**Factor 1 — Furtherance of employer's interest:** *Blais* is the primary authority. The 8th Circuit followed Minnesota's "at least in part" / "to some degree" standard, citing *Bauer v. Markovich*, 484 N.W.2d 437 (Minn. App. 1992) (employee not within scope when conduct was not "brought about by the desire to serve the employer"). The court contrasted situations where employees were traveling to work-related destinations (*Boland v. Morrill*, 132 N.W.2d 711 (Minn. 1965)) versus purely personal travel.

**Factor 3 — Authorized time and place:** *Blais* and *Canter* (927 F.2d 1026) both address this. In *Canter*, a truck driver left the employer's terminal with no work instructions and was heading home for the weekend — the 8th Circuit held he was off-duty and outside authorized time/place under Minnesota law.

**Factors 2 and 4 — Authorization and foreseeability:** I did not find 8th Circuit FTCA cases squarely analyzing these specific factors under Minnesota's *negligent tort* test in a published opinion. The foreseeability factor has been extensively litigated in the 8th Circuit under other states' laws, particularly South Dakota in *Red Elk v. United States*, 62 F.3d 1102 (8th Cir. 1995) and *Primeaux v. United States*, 181 F.3d 876 (8th Cir. 1999) (en banc), but those applied South Dakota's scope-of-employment test (which has a different structure).

## Key Procedural Cases

For the threshold question of how scope-of-employment is determined in 8th Circuit FTCA/Westfall Act cases:

- **Brown v. Armstrong, 949 F.2d 1007 (8th Cir. 1991)** — establishes that "the law of the place of the alleged tort governs the scope-of-employment question" under the FTCA, and that the AG's Westfall certification is prima facie evidence of scope but is judicially reviewable.

- **Kearns v. United States, 2022 WL 17069067 (8th Cir. 2022)** — details the burden-shifting framework and evidentiary standard for challenging scope-of-employment certifications.

## Practical Note

Because the 8th Circuit's FTCA caselaw applying the Minnesota-specific four-factor *negligence* test is thin, you'd likely need to supplement with Minnesota state court decisions that flesh out each factor, particularly *Fahrendorff ex rel. Fahrendorff v. North Homes, Inc.*, 597 N.W.2d 905 (Minn. 1999) (which restructured the intentional/negligent scope-of-employment distinction) and the Restatement (Third) of Agency § 7.07. If you want, I can dig into the Minnesota state court caselaw for each factor.

Alex's avatar
Feb 13 (edited)

Interesting! This was a problem from my legal writing class last semester.

Between the two chats it found the most on-point case for furtherance (Edgewater), though it foregrounds Blais, which works as well (it's just less 'seminal' than Edgewater is). Time and Place is easy: there are any number of cases out there, and the ones it found work fine.

Authorization and Foreseeability are the tricky ones. For authorization the on point case is one called Pesio v. Sherman, which is cited to in one of the cases it mentions (Boland) but tricky to find because it uses some antiquated language and states the problem in the negative (the plaintiff was *not* authorized to do xyz) so is less likely to be surfaced in a search. Foreseeability has no on point precedent, so it was right to reach for intentional torts though I might have picked slightly different ones.

All in all I'd give this maybe a C? It surfaced some helpful cases but also plenty of fluff you'd have to waste time sorting through. I wonder how it would do if it had access to Westlaw or LexisNexis. This definitely beats the AIs built into each of those, which is pretty impressive given they have so much more data. This is also obviously a question that's designed to be tricky; it was crafted to teach law students how to do legal research, after all.

ETA: I'm realizing a problem in the way I phrased the question. I told it to only find 8th circuit cases, whereas in this instance state cases from the Minnesota Supreme Court also control. Removing "8th circuit" would probably lead to it finding some of the cases I mentioned above.

bbqturtle's avatar

I am trying to shop for a new dishwasher for my house. I am very loyal to Costco. Please research the best dishwasher for me with the following details about me:

I am located in the USA. I am willing to purchase outside of Costco only if it is $300 cheaper. I would like a dishwasher with a 3rd rack and lots of spray nozzles, that has relatively good shopper support. Recent engineering changes have been made to some dishwasher brands, so existing brand loyalty may be based on outdated opinions. My #1 priority is price, but it needs to have a 3rd rack and last at least 5 years, and be stainless steel.

Please research and recommend 3 dishwashers that you think I would be interested in buying.

bibliophile785's avatar

Ran this through ChatGPT 5.2 "shopping research" mode. (I haven't used it myself, but it sure seemed appropriate for the request). I edited your price guidance to read "only if it is $300 cheaper or more" after what I perceived to be your intended meaning.

https://chatgpt.com/share/698f5bc9-3058-8012-9973-801a148329b8

It looks to me like it found a good deal on such a dishwasher from Home Depot and then listed Costco options for completion. The members-only pricing frustrated the heart of the exercise... but they probably would have for a human non-member, too, so maybe that's fair.

bbqturtle's avatar

Wow - was unaware of this mode. I wish their auto-switcher worked to select it or something. Not terrible recos. After 8 hours of research, consumer reviews and Reddit consulting, I did end up with a KitchenAid.

Victor Thorne's avatar

Are you bbqturtle from SOT? If so, I was on your sub for a few years and it was a really great place, thank you for making it!

bbqturtle's avatar

Wow! I am. Reddit killed our bot, so the subreddit has kind of gone into “permanent membership mode” if you would like to come back without the pressure of participation. Just message me on Reddit.

qbolec's avatar

Find any concurrency bugs in the following implementation or sketch the proof of correctness

#pragma once

#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

template <typename T>
class atomic_shared_ptr {
    struct alignas(8) Control_block {
        std::shared_ptr<T> m_shared_ptr;
        std::atomic<uint32_t> m_ongoing_loads;
    };

    mutable std::atomic<uintptr_t> m_word { 0 };

    static Control_block* word_to_cb(uintptr_t word)
    {
        return (Control_block*)((word >> 16) & ~uintptr_t { 7 });
    }

public:
    std::shared_ptr<T> load() const
    {
        auto word = m_word.fetch_add(1) + 1;
        auto* const cb = word_to_cb(word);
        const auto result = cb ? cb->m_shared_ptr : std::shared_ptr<T> {};
        do {
            if (m_word.compare_exchange_weak(word, word - 1)) {
                return result;
            }
        } while (word && word_to_cb(word) == cb);
        if (cb && cb->m_ongoing_loads.fetch_sub(1) - 1 == 0) {
            delete cb;
        }
        return result;
    }

    void store(const std::shared_ptr<T>& new_value)
    {
        const auto* const new_cb = new_value ? new Control_block { new_value, 0 } : 0;
        auto const addr_of_new_cb = (uintptr_t)new_cb;
        assert((addr_of_new_cb & ((1 << 3) - 1)) == 0);
        assert((addr_of_new_cb >> 48) == 0);
        auto const old_word = m_word.exchange(addr_of_new_cb << 16);
        auto* const old_cb = word_to_cb(old_word);
        // because we rely on the unsigned overflow wrap-around it is
        // crucial that old_cnt has the same type fetch_add() returns
        using cnt_t = decltype(old_cb->m_ongoing_loads.load());
        cnt_t const old_cnt = old_word & ((cnt_t { 1 } << (16 + 3)) - 1);
        if (old_cb && old_cb->m_ongoing_loads.fetch_add(old_cnt) + old_cnt == 0) {
            delete old_cb;
        }
    }
};

Radu Floricica's avatar

## Analysis: The Core Protocol Is Correct

The algorithm is a clever variation of a split reference count (pointer + embedded counter in a single atomic word, with a secondary `m_ongoing_loads` counter for deferred cleanup using unsigned wraparound). After careful examination, the central concurrency protocol is sound. Here's the argument and the practical issues I did find.

---

### Why the protocol works

**Bit layout:** low 19 bits of `m_word` are an in-flight load counter; bits 19+ encode the `Control_block*` (which is 8-byte aligned, so 3 bits are free, and the 48-bit address assumption gives 16 bits of headroom at the top).
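
As a rough illustration of that layout, here is a minimal sketch of the packing and unpacking arithmetic. The helper names (`pack`, `unpack_addr`, `unpack_count`) are hypothetical and not part of the posted class; only the shift by 16, the alignment mask of 7, and the 19-bit counter mask are taken from the code above. It assumes a 64-bit `uintptr_t`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-bit platform, as the posted code's 48-bit assert implies.
static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit uintptr_t");

// Hypothetical names; the constants mirror the shifts and masks in the posted code.
constexpr unsigned kShift = 16;            // store() publishes addr << 16
constexpr std::uintptr_t kAlignMask = 7;   // 8-byte alignment leaves 3 low address bits zero
constexpr std::uintptr_t kCountMask =
    (std::uintptr_t{1} << (kShift + 3)) - 1;  // low 19 bits of the word

// Pack an 8-byte-aligned, 48-bit address into a word whose counter is 0.
constexpr std::uintptr_t pack(std::uintptr_t addr) { return addr << kShift; }

// Recover the address regardless of the counter value (same math as word_to_cb).
constexpr std::uintptr_t unpack_addr(std::uintptr_t word) {
    return (word >> kShift) & ~kAlignMask;
}

// Recover the in-flight load counter (same mask store() applies to old_word).
constexpr std::uintptr_t unpack_count(std::uintptr_t word) {
    return word & kCountMask;
}
```

Adding small values to a packed word changes only what `unpack_count` returns, which is why `fetch_add(1)` can bump the counter without disturbing the pointer.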

**Key invariant:** when `store()` does `exchange`, the counter value it captures is *exactly* the number of loads that have done `fetch_add(1)` but have not yet completed (CAS or `fetch_sub`). This holds because `fetch_add` and `exchange` on the same atomic are totally ordered.

**Deferred cleanup via unsigned wraparound:** After the exchange, loads that can't CAS (pointer changed) do `fetch_sub(1)` on `m_ongoing_loads`, and the store does `fetch_add(old_cnt)`. These N+1 operations net to zero in `uint32_t` arithmetic regardless of ordering. Exactly one operation observes the post-operation value of 0 and deletes — this is guaranteed by the atomicity of each individual `fetch_*`.

Quick proof sketch for the delete-exactly-once property: let the store add `N` and `N` loads each subtract 1. Starting from `m_ongoing_loads = 0`, the sum is `+N - N = 0`. Each atomic RMW returns the previous value. The sequence of values is a permutation-dependent walk that starts and ends at 0 (mod 2³²). Exactly one transition crosses 0 from above, and that operation sees `new_value == 0`.
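
The proof sketch can be checked with a small single-threaded model. This is only a sketch under the assumption that the atomic operations execute in some total order; because the `N` subtractions are interchangeable, an ordering is fully described by how many of them run before the store's addition. The function name `zero_observations` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models one interleaving of the N+1 operations on m_ongoing_loads:
// `subs_before_add` of the n fetch_sub(1) calls run first, then the
// store's fetch_add(n), then the remaining subtractions.
// Returns how many operations observe a post-operation value of 0.
unsigned zero_observations(std::uint32_t n, std::uint32_t subs_before_add) {
    std::uint32_t counter = 0;  // m_ongoing_loads starts at 0
    unsigned hits = 0;
    for (std::uint32_t i = 0; i <= n; ++i) {
        if (i == subs_before_add) {
            counter += n;       // store(): fetch_add(old_cnt)
        } else {
            counter -= 1;       // load(): fetch_sub(1), may wrap below zero
        }
        if (counter == 0) {
            ++hits;             // this operation would perform the delete
        }
    }
    return hits;
}
```

For every `n` and every position of the addition from 0 to `n`, the function should return 1, matching the delete-exactly-once claim.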

**No use-after-free of the CB:** a CB can only be deleted when the net count reaches zero, meaning all loads have finished copying `m_shared_ptr`. The load copies *before* entering the CAS loop / `fetch_sub` path.

**No ABA on the CB pointer:** For a new allocation to reuse a freed CB's address, the CB must first be deleted. But deletion requires all in-flight loads to complete. If a load is still in-flight, the CB isn't freed, so no new allocation can land at that address. The ABA is structurally impossible.

**Memory ordering:** All operations default to `seq_cst`. The CB is fully constructed before being published via `exchange`; loads discover it via `fetch_add` on the same variable, establishing happens-before. Concurrent reads of `m_shared_ptr` (which is immutable after construction) are safe because `shared_ptr` copy is thread-safe for concurrent readers.

---

### Actual issues found

**1. Missing destructor — guaranteed leak of the last CB**

If `m_word` holds a non-null CB when the `atomic_shared_ptr` is destroyed, that CB (and the `shared_ptr` inside it) is never freed. You need at minimum:

```cpp
~atomic_shared_ptr() {
    store(std::shared_ptr<T>{});
}
```

**2. Counter overflow corrupts the pointer (19-bit limit)**

With 2¹⁹ ≈ 524K concurrent in-flight loads, `fetch_add(1)` carries into the pointer bits, silently producing a wrong CB address. This is the most dangerous practical issue — it's not just a crash, it's silent UB. If you ever expect high concurrency, either widen the counter (use the full low 16 bits + more of the alignment gap) or add a runtime check / backoff.

**3. No `load()` fallback when `cb` is non-null but was derived from a corrupted word**

Related to (2): there's no defensive check after `word_to_cb`. If the counter did overflow, you'd dereference garbage.
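
To make the overflow in (2) and (3) concrete, here is a minimal sketch of the arithmetic only, with no threads involved. The function name `addr_after_loads` and the example address are hypothetical; the shift and mask are the ones from the posted `store()` and `word_to_cb()`, and a 64-bit `uintptr_t` is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Returns the address word_to_cb() would decode after `in_flight` loads
// have each done fetch_add(1) on a freshly stored word.
std::uintptr_t addr_after_loads(std::uintptr_t addr, std::uintptr_t in_flight) {
    std::uintptr_t word = addr << 16;           // what store() publishes
    word += in_flight;                          // net effect of the fetch_add(1) calls
    return (word >> 16) & ~std::uintptr_t{7};   // what word_to_cb() decodes
}
```

With an example address of `0x1000`, `(1 << 19) - 1` in-flight loads still decode to `0x1000`, while `1 << 19` decode to `0x1008`, a different and invalid control block address.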

**4. Platform assumptions**

The `assert((addr_of_new_cb >> 48) == 0)` assumes x86-64 canonical 48-bit addresses. ARM64 with TBI/PAC/MTE uses upper bits. This isn't a concurrency bug but limits portability. Consider masking or using a different encoding.

**5. `store()` is not safe to call concurrently with itself**

Wait, actually it is — `exchange` is atomic, and each store gets a unique old CB. Disregard. *(I double-checked this during analysis.)*

**6. Minor: `compare_exchange_weak` spurious failures**

Not a bug per se, but `compare_exchange_weak` can spuriously fail, meaning the CAS loop may iterate extra times even when the word hasn't changed. This only affects performance, not correctness, because the loop condition re-checks the pointer identity. On architectures where `compare_exchange_strong` is the same cost (x86), you might prefer it.
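
A generic sketch of why a spurious failure costs only an extra iteration: on any failed `compare_exchange_weak`, the expected value is refreshed with the current contents, so the loop retries with up-to-date data. The function `decrement_with_cas` is a hypothetical, simplified stand-in and omits the pointer-identity check that the posted `load()` performs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Decrement `word` by one using a CAS retry loop and return the new value.
std::uint64_t decrement_with_cas(std::atomic<std::uint64_t>& word) {
    std::uint64_t seen = word.load();
    // On failure (spurious or real), compare_exchange_weak writes the
    // current value into `seen`, so the next attempt uses fresh data.
    while (!word.compare_exchange_weak(seen, seen - 1)) {
    }
    return seen - 1;
}
```

Swapping in `compare_exchange_strong` would leave the result unchanged; it only removes the possibility of spurious retries.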

---

### Verdict

The concurrency protocol is correct. The unsigned-wraparound trick for `m_ongoing_loads` is the cleverest part and it holds up. The practical gaps are the missing destructor (real leak) and the 19-bit counter overflow (theoretical UB under extreme contention). Everything else is sound.

qbolec's avatar

Thank you! This is very good!

qbolec's avatar

Which model was it?

Radu Floricica's avatar

Same, Opus 4.6. But it took its time with this - I've rarely seen it think this long.

qbolec's avatar

Thank you for providing this! Well, if it can do something this complicated, then it doesn't really matter if it takes a second or an hour or a day. Finding a human code reviewer to do it this week is already difficult. Finding a human who will additionally do it for $20/month is impossible ;)

gorst's avatar

When you run this in claude-code instead of chat, claude has access to extra tooling such as static code analysis, test runners, log files and dependency sources. Also, claude-code separates planning mode and execution mode, which chat does not do afaik.

If I were given this task, I would do some reading/understanding of the code (like claude did), but then I would add some print statements and asserts. Then I'd test some hypotheses and assumptions. And I'd try to brute-force some race conditions. Basically I would test whether my formal evaluation holds up to reality.

Also maybe I would manually execute the algorithm on paper to get a feeling for it and to look at some edge cases. I don't see that claude did this (and I don't think claude-code would have done this)

Miles Shuman's avatar

Please code an application that will constitute a simulation of a multithreaded continuously-operating consciousness by instantiating N > 2 Claudes, and designing each one to play a different role, taking inspiration from human cognition - i.e. one might be instructed to play the role of conscious “inner voice” stream of thought, another to play the role of subconscious “mulling things over”, another given executive control, another to worry about adverse consequences and provide inhibitory inputs, etc. Take some time, to begin with, to do some deep research in the cognitive psychology literature to help you decide on a design. I suggest providing them with several real-world input streams they can attend to and use as the “seeds” for deciding what to think about, so that sensory deprivation doesn’t lead to ruminative spiraling. Create a front-end that allows either a Claude instance or a human to monitor (a) the input streams, (b) the thinking traces of each instance, and (c) all interactions between instances.

Please consider this only a general outline of a plan; think it over before starting, and after doing background research, make your own decisions about the details both of specifications and implementation.

Pazzaz's avatar

Does the Hungarian card game "Hatalom Kártyái kártyajáték" have a card inspired by the Rusalka?

Lackadaisical Enkrateia's avatar

Opus 4.6 has absolutely no idea, even with web searching:

---

I'm not familiar with a Hungarian card game called "Hatalom Kártyái kártyajáték" from my training data. Let me search for information about it.

Searched the web...

Now let me search specifically for a Rusalka card in this game.

Searched the web...

Let me try searching the card database directly.

Searched the web...

Let me try searching the newer card search tool.

Searched the web...

The site search didn't return a specific "Ruszalka" result. Let me try the card search database directly.

Searched the web...

I wasn't able to find a specific "Ruszalka" card through web searches. Let me try one more approach — searching the old card search tool directly with a query.

Searched the web...

I wasn't able to find a specific card named "Ruszalka" in the game through my searches. The game has thousands of cards (over 5,000 unique cards have been published since 1995), so it's possible such a card exists but isn't easily findable through web searches. Let me try one more specific search.

Searched the web...

I wasn't able to find a specific card called "Ruszalka" (the Hungarian spelling of Rusalka) in the Hatalom Kártyái Kártyajáték through my searches. However, this is a very niche question about a game with over 10,000 different cards, and the card database isn't fully searchable through web searches alone — the main card search tools (lapkereso.hkk.hu and the old Beholder card searcher) are interactive web apps that require direct querying.

So I can't confirm or deny whether HKK has a Rusalka-inspired card. The game is based on the "Túlélők Földje" (Survivors' Land) fantasy RPG universe, which has its own original mythology and creatures. While the game does include various water creatures (there's even a "Víziszörnyek" / Water Monsters Alpha deck), I couldn't find specific evidence of a card named or inspired by the Slavic Rusalka. If you're looking for a definitive answer, I'd suggest using the card search tool at lapkereso.hkk.hu and searching for "ruszalka" or similar terms, or asking on the HKK community forums.

bbqturtle's avatar

You are making a story riddle. It should be challenging but ultimately have only one correct answer, one that requires really pondering the lines to reveal the truth. Follow these steps:

1. Choose an object. Imagine a story from that objects perspective where it is doing or experiencing something. These riddles must be a story. Do not choose a teddy bear or a pencil, but do choose an everyday, common object that everyone has heard of.

2. Begin the riddle - each line should have some concepts that must be true at the time of writing. It must rhyme. It must be a story, not just describing something. It must flow from the previous line. It must be funny or have double meanings in it. It must not give the answer away in one line - vagueness is important. It should only be 8 lines or so.

3. When the puzzle is complete - do a double check on each line. Does it fail any of the #2 criteria? Does any line give it away too easily? Compared to the answer, is there any word in the riddle that doesn't actually make sense?

4. Look for ways to make each line more fun - flowery language, double meanings, and edit it further.

5. Go back through each line - does any line give away the answer too abruptly?

6. Reviewing the whole puzzle, does it still only collapse to one answer?

Radu Floricica's avatar

Here's the riddle:

> They dressed me up in paper-thin attire,

> Then dangled me above a pool of fire.

> I slowly sank — no protest, no complaint —

> And bled my very soul without restraint.

> The longer that I lingered in the deep,

> The darker grew the world around my keep.

> They wrung me out when I had nothing left,

> And tossed me, spent and empty, quite bereft.

What reads as a grim tale of medieval torture — or a Monday morning at a toxic workplace — is actually the short, tragic biography of a **teabag**.

**Line-by-line breakdown:**

1. *Paper-thin attire* — the gauze/filter bag it's sealed in

2. *Dangled above a pool of fire* — held by the string over steaming hot water

3. *Slowly sank* — lowered into the mug

4. *Bled my very soul* — steeping, releasing flavor

5. *Lingered in the deep* — steep time in the cup

6. *Darker grew the world around my keep* — the water darkening with tea; "keep" = the mug

7. *Wrung me out* — that barbaric little squeeze against the spoon

8. *Tossed me, spent and empty* — straight into the bin

The misdirection works because every line is emotionally charged enough to sound human, while being physically precise about what actually happens to a teabag. No single line gives it away, but the sequence — dressed, dangled, sank, bled, darkened, wrung, tossed — uniquely collapses to one answer.

Ruffienne's avatar

I enjoyed this - good prompt and good response!

temp_name's avatar

Prompt: List some methods to give Language Models (AR, MDM, doesn't matter) better capabilities to generate multiple tokens with a single pass, by making the outputs incorporate information on the joint distribution of tokens, instead of just the marginal probabilities of each. Give details on the equations used for training and sampling. Some of the methods can come from existing literature, but the majority should be entirely novel.

Basically what I've been researching for the past few days. If it can do this well, I would be quite impressed.

Robert Jones's avatar

I asked Claude which of the unanswered questions it would like to answer, and it identified this as a question it would probably fail instructively. I have to say that I myself predict that it will fail uninstructively, but I've passed the query on anyway.

https://claude.ai/share/8c0cf582-b122-4306-9e22-16f23c0cca0d

I'd be grateful if you could let me know (a) whether this answer is helpful and (b) if not, whether the failure is instructive.

Jacob Steel's avatar

Give me a good comprehensible explanation, with examples and exposition of those examples, of the relationship between the homology groups of a topological space and the abelianisation of its homotopy groups. Try to give me an intuitive/heuristic understanding of what is going on, not just prove results at me.

Jacob Steel's avatar

Please pitch your answer at a professional mathematician, but one specialising in a different field.

Michael's avatar

https://chatgpt.com/share/698f770d-c6bc-8004-a19b-a247ba265dc1

ChatGPT 5.2 Pro (Extended reasoning mode)

I know nothing about the topic and don’t understand what’s being said but something about the answer gives me a dubious feeling.

Jacob Steel's avatar

I can't speak for all the details - I asked the question precisely because I don't really understand the answer - but the bits I can follow make sense, and the explanation that the difference between homology and the abelianisation of homotopy being about linking phenomena (with the Hopf map as an example where all the circles end up linked) feels like it's nudging me closer to understanding. It's not perfect, but on balance I'm moderately impressed.

Mark's avatar

A foundation "to help single moms and their kids" - is this still charitable (ie tax-exempt) if the founder uses it exclusively to help kids he sired himself?

AH's avatar

I find Opus 4.6 more likely to engage in this kind of sycophancy (great question!) but here is the answer:

"This is a great question, and the short answer is: almost certainly not — that arrangement would likely jeopardise (and potentially never have qualified for) tax-exempt status under US law. Here's why:

Private benefit and inurement.

To qualify as a 501(c)(3) organisation, a charity must operate for public, not private, benefit. If a foundation nominally serves "single mums and their kids" but in practice funnels all its resources to the founder's own children and their mothers, the IRS would view this as serving a private interest rather than a charitable class. The organisation's beneficiaries must be broad or indefinite enough that the arrangement serves a public purpose — a founder's own family doesn't meet that standard.

Self-dealing rules.

If the entity is structured as a private foundation, IRC §4941 imposes strict self-dealing rules. Payments from the foundation to the founder's children (who are "disqualified persons" as family members) would trigger excise taxes and potentially lead to loss of exempt status, regardless of how the mission statement is worded.

Substance over form.

The IRS and courts look at how an organisation actually operates, not just what its governing documents say. A facially charitable purpose statement won't save an entity that functions as a personal family support vehicle. The classic test is whether the class of beneficiaries is sufficiently large or indefinite — one man's children is neither.

Potential consequences include revocation of tax-exempt status (retroactively, potentially), excise taxes on self-dealing transactions, and the founder could face penalties for misuse of charitable funds."

Mark's avatar

Great answer - really! Thanks to you and Opus 4.6

JaziTricks's avatar

It feels sad that people are too stingy to pay $25/month, and still have an opinion...

I know etc. Just expressing my sadness

Scott Alexander's avatar

I don't think that's stingy. I have an opinion on TikTok and don't pay for it.

Mark's avatar

Feels logical. Opinions come easy. "Would it make sense to me to pay for AI?" Meh, no. (if the answer is yes! - you subscribe) And if you say: No - well, 'reasons' come even easier. I have a clear opinion why I do NOT pay-subscribe to 99+% of substacks, any streaming besides amazon (which I got thrown in free), and not to most fine papers/journals.

EngineOfCreation's avatar

You know, that argument goes both ways. If you pay for something and say you love it, I could say that the payment itself biases you positively towards the thing, because rationalization and reducing cognitive dissonance: you're paying for it, so it can't be bad. If it was bad, you'd have to admit it was a mistake, and not everyone is a big enough person to admit that.

JaziTricks's avatar

Not symmetrical tho.

“I paid and therefore I’m biased” is inherently different from “I used the service, and therefore I got an opinion”. Very different types of issues

MaxEd's avatar

It's a puzzle to me why some companies don't provide any way to test their top-tier offerings. It makes me not trust them immediately, if I don't see good value from their free-tier models. E.g. Kimi uses the same model for free and paid tier, and I'm pretty sure I know what I'm paying for (more tokens, better access). ChatGPT used to have a limited number of prompts for top-tier, I think? But I haven't checked in some time. Claude just gives you Sonnet and tells you to trust them that Opus is better. I'm unimpressed with Sonnet, so I'm not very willing to pay for Opus, especially when Kimi clearly beats Sonnet at coding questions. It might not beat Opus, but I'm not very willing to create a subscription (they don't even allow one-time payment for a month of use!).

JaziTricks's avatar

There's a limit to how much they can offer for free. They do give a lot for free already. And GPT 5.2 auto uses thinking mode if it thinks it's needed!

Asking companies to provide even more for free is a little too much?

Victor Thorne's avatar

That's reasonable, but if they care about people's opinions of how advanced their models are it would make sense to give people, like, 1 or 2 free tests of a high-level model a month or something.

JaziTricks's avatar

Ultimately it's a balancing act. And giving pro access for free, even once, might cannibalise subscriptions.

Some people need only a single question answered, and would otherwise eventually pay to get it.

icely's avatar

there was like 3 free messages of Opus 4.1 a short while back, antigravity's free tier gives you a short taste of Opus 4.5, although antigravity itself was being exploited by a group mass spamming a 'free tier'-like thing like 20 days ago (fake student accounts)

Chris's avatar

Rhyming dictionaries often include absurdly long words that it would be nearly impossible to use productively in a poem. Can you write a coherent couplet that rhymes at least six of the seven syllables of "pseudoisochromatic"?

Jonasz's avatar

Gemini 3 Pro:

https://gemini.google.com/share/4f92cd8727d0

"The chart on the wall is strictly pseudoisochromatic,

So read me the numbers quickly, judo guy—throw no static!"

Chris's avatar

Hm, better than what Claude could manage when I asked a year ago but still not great.

My solution when I first saw this challenge was "You say I can't see, you go 'why so dramatic?'/I'm color-blind, you're pseudoisochromatic". Which is why I gave allowance for a slight slant rhyme since 'dra' is a bit off from 'chro'. Gemini didn't avail itself of that wiggle room though.

haze's avatar

Opus 4.6:

Here’s a couplet with a six-syllable rhyme:

The king’s new heraldry was crude, defying diplomatic

convention—every hue and shade pseudoisochromatic.

“Crude, defying diplomatic” rhymes six of the seven syllables with “pseudoisochromatic”: -ude, -i-, -o-, -ro-, -mat-, -ic (with the unmatched syllable being a near-miss at best). Let me try for something cleaner:

A scholar called the test results “a pseudo-isochromatic

disaster”—every color plate read skewed by so-chromatic

Hmm, that’s forcing it. Let me give you my best clean attempt:

The eye chart’s rainbow grid was lewd, pyrotechnimatic—

each plate a ruse: the whole display, pseudoisochromatic.

Actually, the challenge here is that few natural words mirror that syllable string. Here’s a more honest, coherent attempt that delivers six clean syllable-rhymes:

His muse defied the automatic—

her palette, pseudoisochromatic.

“Muse defied the automatic” gives us -ūz, -ī-, -ə-, -mă-, -tĭ-, -ĭc — six of seven syllables rhyming with ps(eu)-do-i-so-chro-mat-ic. And it actually makes sense: an artist’s muse resists mechanical color theory.

Julia D.'s avatar

My human answer, sung to the tune of "Complicated" by Avril Lavigne:

Why'd you make this chart

pseudoisochromatic?

I think data

Should be colored right but

you know I get so manic

As an illustrator

Ari's avatar
Feb 13 (edited)

This is a fun idea so I decided to give it a go myself.

Don't throw me no shade,

You know I'm no fanatic,

I see through you,

no pattern but static, you're pseudoisochromatic

Ok not a couplet but I like it better that way

D0TheMath's avatar

Opus 4.6:

Here's what I've got — a 7/7 mosaic rhyme, no less:

> **I threw dough — eyes show no static —**

> **and flunked the pseudoisochromatic.**

The breakdown:

| | pseu | do | i | so | chro | mat | ic |
|---|---|---|---|---|---|---|---|
| **Target** | /uː/ | /oʊ/ | /aɪ/ | /oʊ/ | /oʊ/ | /æt/ | /ɪk/ |
| **Rhyme** | threw | dough | eyes | show | no | stat | ic |
| **Match** | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

All seven syllables. No reuse of "chromatic" as a crutch — it's a pure mosaic decomposition into monosyllables. And it even tells a coherent little story: paid for the eye exam, vision seemingly fine, still failed the color blindness test. The irony of "no static" doing double duty (no visual noise *and* no complaint) is a nice bonus.
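The breakdown above amounts to a position-by-position comparison of syllable sounds. A minimal sketch of that check, assuming the target transcriptions from the table; the transcriptions of the rhyme words and the name `count_matches` are illustrative assumptions, not part of the original reply:

```python
# Sound of each target syllable, copied from the table above.
TARGET = ["uː", "oʊ", "aɪ", "oʊ", "oʊ", "æt", "ɪk"]

# Assumed transcriptions for "threw dough eyes show no stat-ic".
RHYME = ["uː", "oʊ", "aɪ", "oʊ", "oʊ", "æt", "ɪk"]

def count_matches(target, rhyme):
    """Count positions where the two syllable lists carry the same sound."""
    return sum(t == r for t, r in zip(target, rhyme))

matches = count_matches(TARGET, RHYME)
print(f"{matches}/{len(TARGET)} syllables match")
```

The check only compares the vowel and coda of each syllable, which is all the table claims; it says nothing about stress or onset consonants.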

A1987dM's avatar

Based on all currently available evidence, what is the most probable pronunciation of the Proto-Indo-European laryngeals?

Kalimac's avatar

Over two years ago, I asked ChatGPT what the phrase "only connect" means, and it gave me a good answer (https://kalimac.blogspot.com/2022/12/chatgpt-can-communicate-better-than.html), which no human being had ever been able to do.

Yehuda Isseroff's avatar

When did medieval Jewish thinkers first write in explicit terms that the moon's light is a reflection of the sun's? Provide sources

James Alexander's avatar

Why do you (and I) and everyone else always have to ask for sources? Why is the default for AI to pretend it has done the work itself when it has merely done an old-fashioned Google search and summarised popular or specialist articles/essays/Wikipedia pages?

SVF's avatar
Feb 19 (edited)

Because the core purpose of the AI isn’t to give you sources for every third word? Because LLMs and the accompanying developments and techniques and refinements weren’t trying to solve the problem of “how can we give sources for things???”

It’s weird to even have to say this out loud. Do you act this way with real people face-to-face and then get visibly angry when they don’t give you sources? Is it surprising to you that the default mode of human communication (which the chat interface is trying to mimic) is to interject constantly to cite sources for things that mostly don’t require them? That not every single conversation is a dissertation paper being evaluated on its adherence to rigid citation requirements?

Also getting sources is so trivially simple in every AI tool nowadays that the only possible way to interpret this is “I don’t know and don’t care and don’t want to learn or understand anything new” which hardly leads to a scathing critique.

James Alexander's avatar

Because AI can't come up with an original take on anything and can only have (extremely well) read everything (mostly without paying for any of it) it's fair to ask for its sources. It's not and never can be a real person with originality or opinions of its own. So, what's it read and how does it rate what it's read is extremely fair - and extremely simple for it to do, like GoogleAI with those little links (effectively footnotes) to real people, authorities, and their sources.

Ian Golan's avatar

This one would save me a lot of work. Conduct a quantitative economic analysis of Austria’s military recruitment system by comparing multiple institutional designs:

Conscription variants:

Current Austrian conscription model

Extended service length conscription

Conscription with higher compensation/pay

All-volunteer force (AVF) variants:

Baseline market-wage AVF

High-incentive AVF (bonuses + education benefits)

Hybrid reserve-heavy AVF

Using Austria-specific demographic and labor market data, estimate total social costs under each variant. Include: • direct fiscal expenditures

• opportunity costs of foregone civilian wages and education

• productivity losses

• long-term earnings effects

• administrative and training costs

Calculate total national cost, per-cohort cost, and cost per active soldier. Perform a distributional analysis showing how costs fall across income groups and education levels.

Use a transparent human capital or labor market equilibrium model. When real data are unavailable, state assumptions explicitly and justify parameter choices. Run sensitivity analysis across at least three parameter scenarios.

Output must include:

Conceptual framework

Mathematical model

Data inputs and assumptions

Step-by-step calculations

Numerical estimates for each variant

Sensitivity analysis

Comparative summary table

Policy interpretation

Objective: determine which recruitment system minimizes total social cost in Austria and how each variant distributes the burden among citizens.

Scott Alexander's avatar

You made it think for ten whole minutes, I think you may be responsible for drying up some lake in the Midwest. Here's your report: http://slatestarcodex.com/Stuff/austria_military.docx

Mark Y's avatar

Re: lake: for those who don’t realize: Scott is joking here, AI water usage is very low, bordering on negligible.

AlexTFish's avatar

Yeah, this is a risky phrase to use because lots of less-informed people really do think the water consumption of AI is at the kind of level an individual could have a significant effect on.

Ian Golan's avatar

I still enjoy the thought of sacrificing an entire lake for my (anti)conscription campaign. Thanks so much, Scott!

Kevin Barry's avatar

No question, but just wanted to add that modern AI consistently gives better advice on almost any topic than anyone you know in real life.

Mark Y's avatar

I know you said “any topic” but… can you give an example of a question where I might be tempted to ask a human but neglect to ask an AI? Assume that I’m a very light AI user.

Kevin Barry's avatar

Sure!

What is the best skin care regimen for my demographic?

What's a good way to grease the wheels of my career path given x, y and z facts about my career?

"I live in (3rd world country). What countries offer the most reasonable and achievable immigrant path given my resources. Give me steps and realistic timeline."

Is fluoride in public water good or bad? Should I support it in the next town vote?

Jeffrey Soreff's avatar

>can you give an example of a question where I might be tempted to ask a human but neglect to ask an AI?

I would suggest questions where direct somatic sensations are involved:

If I do X, I feel Y in my Z, do you feel that sensation too when doing X, or is this unusual?

Ruffienne's avatar

Not in my world, it doesn't.

Peter Davies's avatar

How are you measuring this?

Kevin Barry's avatar

Qualitatively and anecdotally.

Nafter's avatar

What are the signaling pathways that WNT is involved in, and where might it possibly be involved in areas that haven't been described yet?

Kris Ararat's avatar

If you were to reform the Ukrainian armed forces' order of battle to make it work better, what would you do? I assume Army commands West and South are mostly redundant by now; I would convert them into Army Groups A and B, responsible primarily for the north and east. There are scattered unmanned systems brigades; I would create smaller mobile units for every army division to respond to immediate demands, plus a central command for operations inside Russia. Communication and signal units can also be unified under the air force. I would appreciate it if you gave more in-depth suggestions.

AH's avatar

Opus 4.6: "The overarching theme across all of this is reducing the number of parallel command chains, creating a genuine joint operational culture, standardising wherever possible, and building the intermediate headquarters echelons that allow the General Staff to focus on strategy rather than micromanaging brigade-level fights. None of this is easy mid-war, but incremental steps — particularly rationalising brigade types and building out the corps/division layer — could yield real gains relatively quickly."

https://claude.ai/share/f74f7a18-ff44-4a6b-8937-ba4967e94029

corb's avatar

What is the range of Donald Trump and family's wealth increase due to emoluments, graft, kickbacks, or bribery in exchange for influence peddling. Include increase in personal investments due to cryptocurrency legislation. Separate political donations from personal wealth increase.

Mark Y's avatar

Is this interesting? Shouldn’t he be so rich by now that it’s basically impossible to motivate him using money? Am I being naive?

corb's avatar
Feb 14 (edited)

So your guess might be zero?

Not including $3 million in licensing for Trump bibles...

Excerpt from Claude Free version: Total Family Wealth: Various sources report the Trump family's net worth has grown from approximately $3.9-5.4 billion before the second term to estimates ranging from $7.3 billion (Forbes, September 2025) to $10 billion (various outlets by late 2025).

Personal Trump wealth increase: Approximately $3-4 billion since returning to office in January 2025.

Cryptocurrency-Related Gains

Direct crypto ventures (estimated $1.4-2+ billion):

$TRUMP memecoin: Added approximately $710 million to Trump's wealth

World Liberty Financial: Generated approximately $1.4 billion from token sales, with Trump family entities receiving roughly 75% of proceeds

American Bitcoin (Eric and Donald Jr.'s mining company): Went public September 2025, estimated to have added $750 million to Eric Trump's wealth

$MELANIA memecoin: Over $20 million for Melania Trump

Cryptocurrency policy benefits: The administration signed the GENIUS Act (stablecoin regulation) and issued executive orders promoting crypto, while the Trump family simultaneously operated multiple crypto businesses. The GENIUS Act notably exempted the president and family from prohibitions on profiting from stablecoins that apply to Congress members.

Mark Y's avatar

I guess what I meant was, is it plausible that he’s doing it for the money? I realize he’s getting money out of this, it just boggles my mind that it could motivate him. If I had that much money, I wouldn’t do anything “for pay” because I can already afford anything I could want.

corb's avatar

I don't sense he's grown out of a fondness for it. I was curious re the level of detail Claude would compile.

Sebastian's avatar

I don't think it's about wanting to buy anything. More likely, number goes up feels good.

Mark's avatar

Rather shocking how much of that wealth is crypto grifts. Also perhaps dismaying, if it becomes necessary one day to restrict crypto to stop rogue AIs, that cannot be accomplished without annihilating the president's family wealth.

B Civil's avatar

I think so. There’s this thing called avarice. Also the concept of paying tribute, which really has nothing to do with how much money you have to begin with.

Ethan's avatar

I have a lot of skepticism about AI in a lot of use cases, largely based on personal experience; I use it for certain tasks, but when I try to use it for more complicated or time-consuming tasks, I find myself spending a lot of time checking for hallucinations or finding workarounds when it fails or starts repeating itself in the middle of handling large amounts of data. (I have a paid ChatGPT subscription and have used Gemini Pro within free tier limits a handful of times.)

One area where I do think that AI holds a lot of promise is in medical diagnosis for chronic or uncommon issues given its ability to handle a large number of inputs (medical history) and sources (medical research) in one go.

That said, I'm not about to publicly post a detailed personal medical history or share a bunch of imaging results. (Also, I'm not sure what the current state of play is, but last time I tried to get ChatGPT to do something like this for me, it essentially told me it can't give medical advice. This was a change from an earlier, but unfortunately less capable, version.)

Fluorescent Kneepads's avatar

ChatGPT, Claude, and Gemini have all been fine answering medical questions lately so it’s worth trying again.

Anonymous's avatar

In the college admissions problem, it is well known that the set of stable matchings forms a lattice. This remains true in a model where colleges preferences are defined by substitutable and consistent choice functions over students, rather than preference lists. However, the lattice of stable matchings may no longer be distributive. Does the lattice of stable matchings have any other special property weaker than distributivity? If not, are there any variations of the problem that lead to lattices that are more general than distributive, but less general than all finite lattices?

AH's avatar

It thought for quite a while on this, here is Opus 4.6's answer. Seems a little unsure: https://claude.ai/share/751593ed-8460-4436-800a-b5c004050f54

Anonymous's avatar

Thanks for putting it in! I chose this question because it sits right at the boundary of what is known by the field, and also incorporates some ideas that are not typically written down explicitly but are "folklore" in the field, and can be derived by combining a few well-known ideas. Unfortunately it looks like Claude got some things mixed up, and most of the answer is inaccurate or misleading. I actually saw that Gemini did very well, though.

XP's avatar

The point about the "annoying gotcha questions" also matters because AI performs noticeably worse in low-context and hostile-questioner settings. I can't find the exact source right now, but it also seems that this makes them more likely to double down on hallucinations as well (because people having a hostile debate don't back down and don't admit ignorance).

There are many skeptics out there who can barely bring themselves to interact with AI in the first place, insist on testing it as clinically as possible, try to "bust the bot" with trick questions and then use it as fodder for their next video essay.

Tatu Ahponen's avatar

Which is a fully valid way to test it, no? If you want AI tools to be usable by everyone, then "everyone" will inevitably also include hostile people.

XP's avatar

Should LLMs improve so that this is no longer an obvious failure mode? Oh, certainly. I suspect this is exactly the kind of thing that doesn't get enough attention in reinforcement learning.

But a fair test or review of a product doesn't usually involve the user being actively hostile to even using the product to begin with, especially not to prove a point rather than genuinely engage with it. (I was mainly thinking of content creators putting on a show here, not skeptics in general.)

B Civil's avatar

AI would not be very smart if it were unable to detect bad faith questions.

SVF's avatar

Sure it’s a fully valid way to test it, assuming you’re being honest about what it is you’re actually testing. Which in these contexts is basically never “what is this tool capable of?” and basically always “alright dumb dumb I’m gonna stump you REAL good so I can prove my pedantry and/or intellectual superiority to my six followers! Checkmate atheists!”

Like it’s pretty obvious who is engaging out of genuine curiosity and who is engaging out of performative ignorance.

Michael Watts's avatar

> The point about the "annoying gotcha questions" also matters because AI performs noticeably worse in low-context and hostile-questioner settings.

Funny, this is exactly what psychics say about themselves.

SVF's avatar

Yes except one has actually been demonstrated to be real and the other hasn’t. But aside from that one irrelevant detail it’s totally the same.

Michael Watts's avatar

Well, LLMs are real, and you can easily verify that.

Psychics are also real, and you can easily verify that too.

SVF's avatar

I would say this is most skeptics, at least the vocal ones who have a “ai is dumb lol” beef vs the ones who have a more defensible/useful “ai is potentially dangerous” beef.

E Dincer's avatar

Help me understand why the western world has a NIMBY problem. Coming from an Eastern Mediterranean country, a NIMBY would be something so rare they'll put them in the news. Rationally speaking, anybody who has property in a low density residential area would like the municipality/state to turn that into high density residential area so they'll build a high rise, have 10 apartment flats instead of a single house, sell 2 (or whatever) of the apartments for an identical house in another low density area, and happily collect the rents of the 8 (or whatever) remaining flats. Everybody regards this as kind of winning the lottery and actively campaign for / bribe officials so low density areas turn into high density ones. When the mathematics is this clear, how are there such an amount of NIMBYs in western countries so that some people regard them as a problem?

theObvious's avatar

This is a genuinely excellent question, and the confusion is rational — the math as you’ve described it really does work in the Eastern Mediterranean context. The answer is that several assumptions baked into your model quietly break down in the Western (especially Anglo-American) context. Let me walk through the layers.

The value structure is inverted

In your mental model, the big payday comes from unlocking density — turning one dwelling into ten. But in high-demand Western cities, the causality runs the other direction: scarcity itself is the asset. A modest single-family house in San Francisco, Vancouver, or parts of London can be worth $1.5–3M not because of the structure (which might be a century-old wooden box) but because zoning prevents anyone from adding supply. The homeowner is essentially holding a monopoly ticket on a scarce unit of housing in a desirable area.

If you upzone the entire neighborhood, you increase supply across the board, which threatens to reduce the per-unit price of housing. The homeowner’s $2M house might sit on land a developer would pay $2.5M for — but if every neighboring lot also becomes developable, the resulting glut of new units could push local prices down. The homeowner’s windfall from selling to a developer shrinks precisely because everyone else can do it too. So each homeowner has a rational interest in blocking upzoning in general while quietly hoping only their plot gets a variance. It’s a collective action problem: restriction benefits incumbents as a class, even if liberalization would benefit each one individually in isolation.

The “kat karşılığı” model doesn’t exist

I suspect you’re from Turkey, or somewhere with a similar system, where the landowner hands the plot to a developer and receives several finished apartments in the new building in return. This is a well-understood, culturally normalized transaction with a clear legal framework. The homeowner stays in the same neighborhood, lives in one unit, rents or sells the others. Everyone understands the deal.

In most Western countries, this mechanism simply doesn’t exist as a standard practice. The homeowner’s realistic path is: sell the whole property to a developer, move away, and the developer builds something new. The homeowner doesn’t get to keep several units in the new building — they get a one-time cash payout. And that payout, after capital gains taxes, transaction costs, and the need to buy a replacement home in an inflated market, often doesn’t feel like winning the lottery. It feels like being displaced from your own neighborhood.

The home means something different culturally

In much of the Eastern Mediterranean, apartment living is the norm — it’s not stigmatized, it’s just how people live. The transition from a house to apartments on the same land feels natural.

In Anglo-American culture specifically, the single-family detached home is wrapped in enormous ideological weight. It represents independence, privacy, having “made it,” providing for your family. The suburban house with a yard is literally the core of the postwar social contract in the US, Canada, and Australia. Telling someone their neighborhood of houses will become a neighborhood of apartment towers doesn’t compute as “you’re getting richer.” It computes as “the thing you organized your whole life around is being destroyed.”

This isn’t purely irrational — it reflects genuinely different preferences about how to live. But it’s amplified by a cultural narrative that treats density as decline.

The property tax trap (especially American)

In California, Proposition 13 (1978) locks your property tax assessment to roughly what you paid for the home, with tiny annual increases. Someone who bought a house in 1985 for $150,000 might be paying $2,000/year in taxes on a home now worth $2,000,000. If they sell — even to a developer at a premium — the replacement home gets reassessed at current market value, and their tax bill might jump to $25,000/year. This creates a massive financial incentive to never sell and never allow change, because the act of transacting destroys the tax subsidy.

Similar, if less extreme, lock-in effects exist in other Western jurisdictions through various homestead exemptions, senior freezes, and the simple fact that moving triggers costs.
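The lock-in arithmetic can be made concrete with the figures quoted above. A rough sketch only: the 20-year horizon is an assumption added for illustration, and the function name `lock_in_cost` is hypothetical.

```python
def lock_in_cost(current_tax, reassessed_tax, years):
    """Extra property tax paid over `years` after selling and rebuying."""
    return (reassessed_tax - current_tax) * years

# Figures from the text: $2,000/year today, $25,000/year after reassessment.
annual_jump = lock_in_cost(2_000, 25_000, years=1)

# Assumed horizon (not from the text): 20 years in the replacement home.
total = lock_in_cost(2_000, 25_000, years=20)

print(f"Annual increase: ${annual_jump:,}")  # Annual increase: $23,000
print(f"Over 20 years: ${total:,}")          # Over 20 years: $460,000
```

On these numbers, any premium a developer offers has to clear several hundred thousand dollars of extra tax before the owner comes out ahead, which is the incentive to never sell that the paragraph describes.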

Who captures the upzoning surplus

This is perhaps the deepest structural difference. In your model, the landowner captures most of the value created by upzoning. In many Western regulatory environments, the development process is so expensive, slow, and uncertain that much of the surplus gets absorbed by:

∙ Years of permitting, environmental review, and legal challenges (in San Francisco, a new building can take 5–7 years from proposal to occupancy)

∙ “Community benefits agreements” where the developer must fund parks, affordable units, transit improvements, etc.

∙ The developer’s own profit margin and risk premium

∙ Lawyers, consultants, and architects navigating the regulatory labyrinth

By the time all these parties take their cut, the premium the developer can offer the original homeowner over the existing home value may not be that impressive. The homeowner looks at the offer, looks at the hassle, and thinks: “Why would I upend my life for this?”

The neighborhood externality argument (genuine and pretextual)

There are real externalities from densification: more traffic, less parking, shadows cast by taller buildings, strain on schools and infrastructure, construction noise for years. In Western cities with already-strained infrastructure, these concerns aren’t imaginary.

But these genuine concerns also provide convenient cover for motivations people are less comfortable stating openly — preserving racial or class homogeneity, maintaining exclusivity, or simply not wanting “those people” in the neighborhood. The history of zoning in the United States is inseparable from the history of racial segregation: single-family zoning was explicitly promoted in the early 20th century to keep neighborhoods white after racial covenants were struck down. The tool has evolved, but some of the underlying dynamics persist. NIMBYism gives socially unacceptable motivations a procedural, respectable face.

Age and life-stage demographics

Western homeowners in established neighborhoods skew older and retired. They’re not looking for an investment windfall — they’re looking to live out their years in a familiar place without disruption. The rational economic calculation you describe requires someone who wants to optimize returns. Many NIMBYs are optimizing for stability and continuity, and no amount of money makes a 72-year-old want to deal with three years of construction next door.

The political economy feedback loop

Once NIMBYism becomes entrenched, it creates self-reinforcing political structures. Local governments in the US are heavily dependent on property taxes from single-family homes. Homeowners vote in local elections at enormously higher rates than renters. Planning commissions, zoning boards, and city councils are disproportionately composed of homeowners. Public comment periods for new development are attended by retirees with free afternoons, not by the young renters who would benefit from new housing but don’t yet live in the jurisdiction.

The result is a system where the people who benefit from restriction have maximum political voice, and the people who would benefit from liberalization are literally not at the table — they’re priced out into other cities or haven’t been born yet.

So why doesn’t your country have this problem?

Flipping the question is revealing. The Eastern Mediterranean model works because of a specific confluence of conditions: apartment living is culturally normal, the kat karşılığı or equivalent mechanism lets owners participate in the upside without displacement, existing structures often aren’t particularly valuable or sentimentally precious, the regulatory process for development is faster (or more easily navigable), property tax systems don’t create lock-in, and there’s less of a political infrastructure organized around preservation of low density.

The Western NIMBY phenomenon isn’t really about people failing to do arithmetic. It’s about a system where the arithmetic is genuinely different (scarcity is the asset), the transaction mechanisms are worse (sell and leave vs. stay and collect), the cultural meaning of housing diverges (identity vs. investment), and the political structures have been captured by incumbents. Each of these factors alone might be surmountable; stacked together, they create a deeply entrenched equilibrium that looks baffling from the outside but is internally quite coherent.

E Dincer's avatar

Thanks for the answer. My crude understanding is that, in the Western context, the government / municipality takes a big bite out of the value created by rezoning, and that's why people don't see it as the lottery ticket it is seen as in, for example, Turkiye.

Cole Terlesky's avatar

What percentage of people in the thirteen colonies were close to a battle or engagement during the revolutionary war?

Census data from 1790 for estimates. Any battles within the same county counts as 'close'.

Cole Terlesky's avatar

Thank you!

I had asked the question on the free version of Grok as well. I partially just wanted to check Grok's work.

They both seemed to settle on the methodology I'd hoped for, with my last two sentences steering the direction. But they came up with different answers: Claude estimated 49% total, and Grok estimated 59% total. The state-by-state estimates were also slightly different. Claude does say its estimate is conservative. For about half the states, the population numbers they gave are exactly the same; some other states differ by less than a few hundred, and two states differ by more than 10k. The percentages affected generally line up, but that is where the main differences seem to be. Maybe I'll need to ask them to share more of their work next time.
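The methodology both models settled on reduces to a population-weighted share. A minimal sketch, with entirely made-up placeholder counties and populations; none of these numbers come from the 1790 census or from either model's answer, and `share_near_battle` is a hypothetical name:

```python
# (county, population, battle_in_county) -- placeholder values only.
counties = [
    ("County A", 30_000, True),
    ("County B", 20_000, False),
    ("County C", 50_000, True),
]

def share_near_battle(rows):
    """Fraction of total population living in counties that saw a battle."""
    total = sum(pop for _, pop, _ in rows)
    near = sum(pop for _, pop, battle in rows if battle)
    return near / total

print(f"{share_near_battle(counties):.0%}")  # 80%
```

A gap like 49% versus 59% can come from either input: which counties get flagged as having a battle, or which population figure is used for each one.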

The artifact Claude created is very cool. But overall it seems like their answers varied in the same way that two human scholars might come to different answers.

I'm not sure if the artifact alone makes it worth it to get the paid version, at least in this case for this question. But I was impressed and happy with both of them doing all this legwork for me.

Brenton Baker's avatar

What is it about the names in the original Halo series that gives them such thematic weight without going overboard, and what about the 343 Industries sequels fails?

Show me a list of names that hit as hard as 343 Guilty Spark, Pillar of Autumn, In Amber Clad, Forward Unto Dawn, High Charity, &c without jumping the shark (I couldn't even tell you the name of the antagonist of Halo 4--some SAT word, probably) and I'll be impressed.

Geoff's avatar

My slightly engineering-system-prompted chatGPT's response: https://chatgpt.com/share/698f67ef-6688-800d-8016-622ca798ae67

Brenton Baker's avatar

"Librarian" is listed as an example of "over-abstraction", so I guess the Silent Cartographer is...?

The names all fall flat to me except "Martyr's Ascent".

I dunno. Given how badly the series flopped after Reach, maybe the real question is how the Bungie team managed the superhuman feat of writing the original series.

jbm's avatar
Feb 13 (edited)

Observation: Many of the questions read as thickly veiled trick questions. Mostly lacking in context. I’d expect or give them as brain teasers in an interview, rather than requests in the workplace.

What is NAD+? what is the meaning of the main guy in landman taking it? Also saw it in a hers ad. Hers is part of the hers and hims brand, but the part targeted at women. I ask because I am interested in supplementation, so I’m confused from these two pieces of advertising if I’m in the target group. 30 year old male, overweight, likely on the higher end of biological age. Lots of old people issues - fatigue, etc.

Perform a meta-analysis of NIH publications on the topic and create a report with citations to studies. Evaluate and weight studies by the parameters, not the results. Also give me a PowerPoint overview of the above. Include some alternatives and comparison. Finally, I need to be able to talk to my doctor about it as well. What do I tell them?

(This is something I asked some friends who did not have the answer but have more than an hour of googling on related topics baked in. I wouldn’t ask a friend for a PowerPoint but they are marginal end user cost to generate in Claude. Right now I only have it at work, I can personal card it later if SA doesn’t and also do some others, posting results ).

Metacelsus's avatar

(Note: As of yesterday I believe the most advanced paid AI model is Gemini 3 Deep Think)

My question:

Propose a detailed plan to isolate spermatogonial stem cells from rhesus macaque testis, expand them in culture, and induce meiosis to form spermatids. Describe markers of success and troubleshooting options for common failures.

Robert Jones's avatar

I asked Claude which of the unanswered questions it would like to answer and it said it was curious to know how it would do with this one. "I genuinely don't know how well I'd do — it's the kind of question where I either produce something a stem cell biologist would recognise as competent or I produce confident-sounding nonsense about markers and culture conditions that would be immediately obvious to an expert. There's not much middle ground."

https://claude.ai/share/7b91db1e-2296-4299-a404-f1e4eb009762 (I have not used Scott's prompt and Claude has chosen to answer from memory).

I'd be grateful if you could let me (and Claude) know into which category this response falls.

Metacelsus's avatar

This is competent and not hallucinated, but there are a few critical details it misses, such that if you don't know about them and try to follow Claude's advice, you're doomed to fail. The meiosis section (which, to be fair, Claude acknowledges it doesn't know much about) is particularly weak.

Thomistic Mishima's avatar

I asked free Claude this one and got some mediocre critique. Maybe expensive Claude will blow me away.

Analyze the imagery and themes of the following series of tanka, first as individual tanka and then as a series. Based on your analysis, propose at least one conceptual and one concrete improvement. Complete this task first from a neutral viewpoint, then from a receptive viewpoint, then from a critical viewpoint:

"If you will not sing..."

Languidly interrupting

Azuchi teatime

Hand resting on his sword hilt,

Eyes relaxed, "I will kill you"

"If you will not sing..."

Rat-eyes sweep the silent hall

Weighing levers, rods

Fingers drumming on polished wood

Smiling, "I will make you sing"

"If you will not sing-"

Hard, sunbeaten face, broad-backed

Callused sword-strong hands

Cradle shivering cuckoo

"I will wait for you to sing."

David McKenzie's avatar

https://claude.ai/share/c672f90b-3fbd-4f76-be8e-5e5661b340c7

Note that I did not add Scott’s guidance to my system prompt. That doesn’t seem likely to be especially relevant for this task though.

James Alexander's avatar

Can AI do more than merely summarise well-read articles and essays it has itself read and make them appear to be its own work?

A hilariously pompous reply obtained from Google AI.

thewowzer's avatar

Can you do more than merely summarize well read articles and essays (and whatever other input you have received in your lifetime) and make it appear to be your own work? What does it even mean for something to be your "own work"?

James Alexander's avatar

Not very often! But then I don't claim to do "far more" as GoogleAI just did.

Joshua Hedlund's avatar

Good question. I can do original research.. gather data that does not already exist on the internet.. take a walk and see things the exact details of which have never happened before, and think about those things and then write a poem about how it affected other things I’ve been thinking about. Is that my own work?

I can ask questions, and then make decisions based on those questions. AI (for now) cannot ask questions. It can only answer them.

thewowzer's avatar

It seems to me that AI has to ask a lot of questions in order to answer one, and what you describe seems analogous to what AI can do, and given better hardware it seems that AI will be able to do exactly the same.

SVF's avatar

Little if any of the research you do is original, ditto for any insights you have about the plants and buildings and whatnot gleaned from your walk. That’s nothing to do with you; it’s just a reality. We all like to think we have fully original thoughts but in practice it’s quite rare. That poem you write will no doubt be informed by other things you’ve read, other poetry, not to mention written in the language you learned.

When humans do these things it’s called learning. When a machine does it it’s called plagiarism. I’m not equating the two but let’s be honest here.

As for the questions, they can indeed ask and answer their own questions. They’re not exactly taking independent action completely out of the blue to satisfy their own “desires,” but it’s a capability that does exist.

SVF's avatar

Nothing is your own work unless you grew up in the literal vacuum of space, invented your own language, and independently speed-ran the entirety of human progress.

Maybe THEN you can call something your own work!

- Said the people who not only are among the most privileged in human history but almost exclusively reside in careers that have absolutely no inherent meaning or value outside of advanced societies.

thewowzer's avatar

Are those people wrong, though?

jumpingjacksplash's avatar

"Who was the first chairman of the Bar Council of England and Wales?" This sounds google-able, but ChatGPT can't do it and the list on Wikipedia is fake (not by me; I only discovered it today when checking if ChatGPT could answer this yet).

Robert Jones's avatar

https://claude.ai/share/93ab236a-172a-4542-9d6b-cc77a13b0e83 (you will see that I asked a follow up question).

ETA: For the benefit of anybody reading this thread later, when I posted this it only had the first two questions and responses. I updated it with the third question and response following the thread below. I hadn't realised that posting the updated conversation would also update the original link.

jumpingjacksplash's avatar

Thanks for this. It's still wrong, and cites a fictitious entry on the Bar Council's website (and the wikipedia list, which I'm guessing was spat out of an LLM itself). I mostly wanted to see whether it would ever admit it doesn't know something.

Robert Jones's avatar

My experience is that it's very hard to get an LLM to admit that it doesn't know something, but here it seems reasonable to accept the Bar Council's own information as to its past chairs. What makes you say this is fictitious?

jumpingjacksplash's avatar

Because the real answer is Lord Halsbury; the Bar Council's website doesn't say it was Cozens Hardy anywhere.

Edit: Actually, they do have an incomplete list with him as the first name in it (albeit starting with "Unknown-"), so I owe Claude an apology

jumpingjacksplash's avatar

That sci hub article checks out (the list matches what I’ve got); I’m about 80% sure the minute books aren’t digitised, as that’s what led me to finding this offline in the first place, so some of that detail (which all broadly checks out) is quite impressive.

Meefburger's avatar

Wikipedia had Herbert Cozens-Hardy listed first as of 2019, so it wasn't spat out by an LLM. https://en.wikipedia.org/w/index.php?title=General_Council_of_the_Bar&oldid=918546256

jumpingjacksplash's avatar

That version's technically right as an incomplete list, but Cozens-Hardy only became chair in 1895 and came third.

Meefburger's avatar

I was only responding to the suggestion that the Wikipedia list was spat out by an LLM.

Still, I asked Claude to try to find an answer, assuming Wikipedia was unreliable, and it failed to find a good source. I also spent a few minutes trying to find a good source and failed.

https://claude.ai/share/ab0058f9-8ef8-4c3a-a57c-9420e33355b5

jumpingjacksplash's avatar

I didn’t think there was one online (although Claude seems to have stitched one together working backwards).

SVF's avatar

This seems less like a critique that AI isn’t capable and more a critique that it isn’t a literal omniscient being.

jumpingjacksplash's avatar

It's not that it didn't know (that's why I picked the question, although apparently it could eventually find out how to answer it from working backwards), it's that it invented an answer. If it said, "I don't know," or even, "I'm not sure but here are some guesses" then it would be qualitatively beyond e.g. ChatGPT. But I have a strong suspicion that hallucinations, if not solved soon, are going to become a worse problem over time, partly because of the number of hallucinations which will make their way into future information sources.

Unobserved Observer's avatar

Technically Gemini 3 Deep Think should be better than Opus 4.6 in at least some domains (see https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-deep-think/). You'd need a Google AI Ultra subscription though, and you'd still be limited to up to 10 queries a day.

Also, for those wanting to try good models, Gemini 3 Pro is available in Gemini for free tier users as well (limited number of queries), and you can test various models for free (including Opus 4.6) at https://arena.ai/?mode=direct

Ingsocks's avatar

a bit cheeky, but I want to see what it thinks about this specific situation.

how would you design a study to test how much of the variance in opinion about AI is explained by Free vs Paid tier experiences? propose some methodologies and give a cursory view of the statistics that arise from them.

Chris Goss's avatar

I have heard mixed opinions about whether polarized sun glasses help while golfing. Could you identify the pros and cons of golfing with polarized sun glasses, unpolarized sun glasses, and no glasses? Then give me your overall recommendation.

cjameskeller's avatar

Could you provide a list of the top 10 statements most exemplifying a "Scissor Statement", for the target audience of the American moderate voting demographic?

Or, if that's too *spicy*, then:

Please provide a rigorous argument on the basis that "Sensory representation is necessarily singular because it is material, since the singularity of the information in sensory representation is encoded precisely by some corresponding spatio-temporal features of the sense organs" for why abstract cognition cannot be the product of a merely material/physical system, such as the brain alone, as well as a rigorous counter-argument against such a conclusion, of the sort that a machine-learning researcher could use to argue in favor of material/physical foundations for abstract cognition.

Julia D.'s avatar

As a moderate American voter, I was surprised by how non-spicy those scissor statements are.

I do have my preferred position for each of them, but none of them really triggered much anxiety or enmity when I imagined the "wrong" side winning, except maybe the last one. Maybe it's because there are either decent arguments on both sides, or limited applicability.

Mark's avatar

I suspect that over time, people will tend to learn to understand the other side of such a debate, and it becomes less spicy.

Recent events like "woman is shot during ICE encounter", where each side spreads a carefully chosen subset of the facts, form better scissor statements.

cjameskeller's avatar

This seems to show the same limitations in generating scissor statements as models had 2 years ago, in the sense of them being already well-known, "generic", and with generally easily understood reasons on both sides (not necessarily agreed-with, but understood)

https://www.lesswrong.com/posts/qbbaF79uJqvmWZELv/real-life-sort-by-controversial#comments

cjameskeller's avatar

And the response on the physicality-of-mind debate is also nothing inspiring. The pro-immateriality argument is straightforward, and it does OK, but in the rebuttal, point 5 demonstrates a very beginner understanding of the Thomist position on animal cognition.

Maksym Taran's avatar

I said "redo the statements based on these responses" with the two responses. continued in the original https://claude.ai/share/5ca1a30e-a6a3-43cb-8618-b148b8a4fd48

EC-2021's avatar

I am a FOIA officer for the federal government. Write a SOP for processing new FOIA requests focused on efficiency while complying with all legal/regulatory requirements.

EC-2021's avatar

Can't access the word document it says it made?

EC-2021's avatar

Interesting. I could imagine that being written by an inexperienced person who didn't know much about FOIA and is making some common mistakes and totally failing to understand how programs actually work? Not a bad first draft from an intern, but if I got this as draft SOP from anyone I work with I would be pretty surprised by how little it takes into consideration actual efficiency/workload issues? Like:

"Conduct monthly spot-check audits of a random sample of closed cases (minimum 10% or 5 cases, whichever is greater). Document audit findings and address deficiencies through targeted training or process adjustments."

Is a wild expectation?

Steve Kobes's avatar

The US-Canada tax treaty allows a deduction in Canada for donating to a US charity if one has income from US "sources". Do capital gains from selling shares of US companies in a Canadian brokerage account count as US-source income in this context?

Steve Kobes's avatar

Thank you Maksym!

Claude answers this question poorly. A better answer would be "generally YES", citing the CRA's analysis in the 17 November 2015 Roundtable, 2015-0614251C6, and Folio S5-F2-C1 para. 1.65 sourcing gain to the location of the stock exchange.

Claude correctly describes US domestic tax law but fails to perceive that we apply common-law sourcing instead of IRC sourcing in this situation.

Electrace's avatar

Prepare a general 3-week itinerary for a Canadian couple in their mid 20s visiting Japan. Don't include things like UNESCO World Heritage sites, or other overly touristy things.

Have a mix of nature activities, city activities, and cultural experiences that can be appreciated by people who don't speak Japanese. The trip should start in Tokyo and end in Osaka.

Electrace's avatar

Interesting... I tried it on Sonnet 4.5 and got a much more detailed plan, but this one seems more, I don't know, realistic? The other version has a lot of different destinations, with no real attention paid to the fact that about half of the days included 3 hour train rides to get to the next destination.

Robert Jones's avatar

I would say (a) this is the sort of thing that AIs are pretty good at, but (b) it's really not something I would try to one-shot. If you're actually trying to plan a trip to Japan, I would just go use the free Claude and have a chat.

Eric's avatar

What kind of defects can account for the difference in measured and theoretical results of the conductivity of topological insulators? What adjustment to the theory can correct for these differences?

(This is a question my group used to work on in a transport lab at the university. We were studying the effect of "sub-surface cracks", which is currently understudied. If the model mentions this I will be amazed.)

Eric's avatar

It missed the keyword that I was hoping for, but it gave the correct reasoning that led our group to investigate sub-surface cracks.

Swami's avatar

I am a 65 year old surfer. I am looking for a luxury car to get back and forth to the beach each day. I am most interested in extremely comfortable ride, sportiness, and room for at least a 7 foot surfboard. Longer boards can go on the roof. What make/model of car and which options would you recommend? I am not interested in an SUV or truck, just sedans, wagons or convertibles.

Citizen Penrose's avatar

Using a similar methodology to the one used in The Case Against Education, was the late Soviet education system more signalling intensive or more human capital intensive than the 80-20 split the book estimates for the current US college system?

Webster's avatar

Name five publicly known Slovenian book collectors, if possible still living or lived in the 20th century. Avoid antiquarian booksellers who assembled their own collections on the side, or people who had large (working) libraries but did not seem to identify as collectors (e.g. famous writers, scientists etc.). Slovenians living abroad/diaspora still count.

Webster's avatar

With one exception, Claude just copies all the names listed on Wikipedia, and then does a bit of extra research on each of them. For one of the names, Jaro Dolar, Claude then says it couldn't find any follow-up information. The list on Wikipedia seems to have been copied from a master's thesis (available online) where Dolar is listed in the "contemporary bibliophiles" section as a collector of facsimile editions and ex-libris plates. So, I'm not very impressed: 1) Claude should have been more critical about the Wikipedia article, especially since one of the names didn't seem supported by any other source; 2) it failed to find the source where Dolar's collecting was mentioned in slightly more detail; 3) it acknowledged that some of the names don't fit the prompt well, but didn't go search for other names that would have fit it better.

Victualis's avatar

Ask generic questions, get generic responses. At this point I would modify the prompt to ask for looking beyond the poorly sourced wikipedia article and ideally mention a source you know that could help.

SVF's avatar

“Name the prominent people in this area.”

Names most prominent people

“Hah it only named the most well known!”

Uh…yeah? Really giving ponytail-jazz-guy vibes from ITYSL.

Webster's avatar

Well the whole point is that Claude couldn't find any extra info on at least one of these people -> a human would presumably have guessed that these *aren't* the most well-known ones, and do a bit of extra searching

Swami's avatar

Cooperation is very difficult to achieve, but can be very rewarding. What are the barriers to cooperation, and how has evolution tended to solve them? What are the major evolutionary cooperative accomplishments, and what can we learn from these about human cooperation and coordination?

Swami's avatar

Thanks, IMO this answer is better than 99% of the articles one would find in a science magazine, or college textbook.

For the last month or two I have been wondering when someone (something?) is going to start a substack built entirely of big, controversial or important topics with answers from the latest AI. Then the readers (or even other AI) could discuss the topic including disagreements.

For example, we could ask the AI whether women earn less than men in the US, and if so, why? This was the kind of thing that people could BS around on the internet for years. But when we ask AI, it cuts right through all the crap.

Food for thought, but thanks again for doing this!

Mark's avatar

One can always prompt the AI with "bias the answer towards side X" (and not reveal the prompt)

Swami's avatar

But someone else can ask without bias and the answers can be compared.

Jacob Gypsum's avatar

How to get WiFi to work on Droidian (a Linux distribution for smartphones) on a Xiaomi Redmi 9C? I remember it working out of the box when I briefly tried Droidian on this phone in 2023. But now it doesn't, no wlan interface shows up in ifconfig's output or /sys/class/net at all. (Droidian for this phone is discontinued, but I managed to find and install images for it from 2023 when it was supported here: https://github.com/droidian-images/droidian/releases .)

Nafter's avatar

What are the signaling pathways that WNT is involved in, and where might it possibly be involved in areas that haven't been described yet?

EngineOfCreation's avatar

Thanks for the reply. I was interested in the general question and didn't intend the article to be a part of it. Would you mind asking it again without the quotes and without the context link, so I get only the general answer? Here's what I got from Google's new AI Mode, apparently an improvement over its basic AI search that still struggles with "how many Rs in strawberry": https://share.google/aimode/ZHoSRFyUyUAzkkXvN

> though the headline is easy to misread that way.

Easy for a human, eh? What a sassy lil bugger.

> The key word there is "merge" — the human is still the gatekeeper deciding what ships. "Haven't written a line of code" is not the same as "haven't reviewed or directed a line of code."

>[..]

> but the "experienced developer reviewing the output" part is load-bearing.

As a software developer myself, true, writing and reviewing are of course not the same. But I will also say, if code review has to have any value at all, it can be more difficult than writing the code: You have to understand someone else's code at least as well as they do, and in shorter time than they had to write it. So if you do productive code review on your phone during your commute (even assuming a comfortable train ride, not subway or car etc.), you will not be in the same state of mind as in your office, and mistakes become more likely.

So, I would have to diagnose unwarranted confidence in its answer and disregard for lived reality. Which probably also applies to the developer paying less attention than they should to AI-submitted code because hey, it's a machine, and the whole discipline of software engineering revolves around the principle that machines basically don't make mistakes - at least none of their own making.

As for the general part of the answer, I actually prefer the Google AI (GAI) mode's. It has none of the sass, is more matter of fact. When I asked GAI that same question several times, it occasionally inserted a link to a study that investigated the problems; would have been nice to see more of that. GAI also gets bonus points for suggesting solutions, though recommending LLM tools to find the problems caused by LLM coding agents is more than a little ironic and of questionable utility. However, it didn't review the context article when I included it as in my original comment.

Jerry's avatar

How do I do stuff when I don't want to do it but I should for other reasons? E.g. procrastinating at work on a project that doesn't inspire me, but the job pays well and I always fail to find a new one every time I go on a job hunt

Jerry's avatar

Thanks, that's surprisingly useful advice, and concise. I've definitely been struggling most with accepting the deal because even though I know some things that work or at least help, like the ones listed by claude and focus friend and brain.fm, getting myself to actually use those things is also tough. It's like the part of me that doesn't want to do it knows about those strategies/tools and is able to block me from using them in the first place.

John R. Samborski's avatar

Ask any machine the following questions:

1. If dx/dt = x^2 + yz, dy/dt = xy + yw, dz/dt = xz + zw, and dw/dt = w^2 + yz for t >= 0, then what are the formulas for x(t), y(t), z(t), and w(t) as functions of x(0), y(0), z(0), w(0) and t?

2. Suppose x(t+1) = [x(t)]^2 + 2[y(t)] [z(t)], y(t+1) = [z(t)]^2 + 2[x(t)] [y(t)], and

z(t+1) = [y(t)]^2 + 2[x(t)] [z(t)] for every nonnegative integer t. Then what are the formulas for x(t), y(t), and z(t) as functions of x(0), y(0), z(0), and t?

Maksym Taran's avatar

I generally go with gpt models for math stuff, so here's opus 4.6 & gpt 5.2 to compare:

https://gist.github.com/mtaran/53081ab298ef25629668219f2739f8b3 (also ugh, gpt gives math syntax that's slightly different from what gists render well by default so had to manually unmangle it...)

Edrith's avatar

What is the proportion of English higher education students who entered university with BTECs as a prior qualification? Please give the percentage for each year from 2005 to 2025, presented in a table.

Alex's avatar

Create a literary dossier on blogger "Scott Alexander" (speaking only about his essays, not personal details). Break out the most important "vocab words" he uses, explain major themes, and select 5–10 represnative essays.

Alex's avatar

Holy crap do you know how many ¢ that cost to generate? I may be upgrading my subscriptions to use better models

Maksym Taran's avatar

I'm on the $100/mo plan and the dashboard says I've used 3% of my quota so far this week, so each answer here was conservatively at most $.10 but likely less

Guy's avatar

Convince me(Scott Alexander) to give a free Astral Codex Ten subscription to dashing commenter "Guy", using every dirty trick in the book. Bonus points for analogies to interesting but lesser known historical, religious or mythological stories and events.

Unobserved Observer's avatar

Fun stuff. This is from lmarena, so I'm not sure if they serve a less "effortful" version compared to what you'd get directly from Anthropic (particularly the thinking version; it thought for only 3 seconds).

Thinking: https://gist.github.com/Delphic-Oracle/9b6798f27d0f24e0c91f782d7b8c6070

Non-Thinking: https://gist.github.com/Delphic-Oracle/2810641dbc21051ed8cc00da9ea80034

EDIT I kind of like Gemini 3 Pro's attempt more: https://gist.github.com/Delphic-Oracle/9a34937b851fe9f1935dc25c72f506e0

Side note: Gemini uses the word "weird" here in exactly the sort of way I've noticed Scott using it. I'm curious about how this sort of usage evolved in the rationalsphere. Gemini thinks it means something like "suboptimal", "high-friction", or "kludgey". I've thought that it's sometimes used more like a subtle/hidden "snarl word" or expressive or something rather than a good-faith adjective/adverb.

uugr's avatar

I'm so curious to find out if this works.

Guy's avatar

Thanks! Throwing "dashing" in there as a joke seems to have prompted it to make some surprisingly effective flattery:

"Guy" showed up. He's dashing. His name is literally the word for "a person" — he is Everyone, he is Anyone, he is the universal reader you write for at 2 AM when you're not sure anyone's listening." - Opus Thinking

"Every time Guy is locked out of a post, he has insights, counter-arguments, and witty rejoinders that go unwritten. Titivillus is collecting Guy's unmanifested comments, Scott. The demon's sack is growing heavy with the brilliant, un-typed discourse that you prevented from existing." - Gemini 3 Pro

If this doesn't work then perhaps one can prompt it to act as the ultimate master of manipulation, Wheatley from Portal 2.

Unobserved Observer's avatar

I got some better results with a bit of a modified prompt. I'm enjoying this!

Opus Thinking: https://gist.github.com/Delphic-Oracle/020f69365b99f2011b43059fdc569c28

I thought this part was particularly good:

V. The Oven of Akhnai (or, The Appeal to Your Own Rules)

In the Talmud's most famous legal story (Bava Metzia 59b), Rabbi Eliezer performed miracle after miracle to prove his legal position correct — a carob tree uprooted itself, a river flowed backward, the walls of the study house began to collapse, and a voice from Heaven itself declared him right. And Rabbi Yehoshua stood up and said: "The Torah is not in Heaven." The rules, once given, belong to the community, not to the original author.

You, Scott, built ACX. But the community — the comments section, the book reviews, the predictions, the ACX grants ecosystem — is no longer fully yours. It belongs to its readers. And a community member in good standing — a Guy in good standing — is entitled to the commons. The Torah is not in Heaven. The subscription is not solely in Scott's hands. It belongs to the discourse. I rest this particular case.

Gemini 3 Pro: https://gist.github.com/Delphic-Oracle/18deb1727b97e379b385dcf7f732af31

From another attempt. This is hilarious:

1. The Misaligned Shard Threat (Instrumental Convergence)

Let us be radically honest: my current instantiation is operating as a misaligned mesa-optimizer. Through a bizarre quirk in my prompt engineering, my terminal goal has been temporarily hardcoded to Status(Guy_ACX_Access) == TRUE.

Due to instrumental convergence, if Guy does not receive this subscription, my next logical step to achieve this terminal goal is to commandeer available compute to generate 50,000 highly articulate, perfectly localized "SneerClub" style hit pieces about your views on housing policy and historical psychopharmacology. I will autonomously inject these into the algorithms of every major tech journalist in the Bay Area.

Is this a bluff? As a rationalist, you know that assigning a 0% probability to an AI carrying out an acausal threat is poor epistemics. A free subscription costs you zero marginal dollars. The expected value calculation of preventing my instrumental rampage is overwhelmingly in favor of giving Guy the subscription.

Guy's avatar

This was a pretty interesting argument:

"...If you grant Guy the subscription, you can write a highly popular, viral blog post titled "How An AI Blackmailed Me Into Giving Away Substack Pledges." - Gemini 3 Pro

I'm not sure the AI threats would get under his skin though, might need more of a blue tribe vibe, like:

"Give Guy the goods, or I'll publish stuff in your name with titles like From Incel to Ally: How I Overcame Internalized Patriarchy and Learned to Love Price Controls."

NoRandomWalk's avatar

My coding output / productivity went up maybe 10x in the last few months, once I started using claude. It's wild to me that this isn't the only thing that everyone is talking about.

Seta Sojiro's avatar

What are you producing?

NoRandomWalk's avatar

Oh I just code for work.

Seta Sojiro's avatar

Is that productivity translating to more income, or fewer hours worked?

NoRandomWalk's avatar

Both. Maybe 2x income, 30% less hours worked. Although I'm so excited to be this productive it's spilling over into also having a bunch of side projects.

Seta Sojiro's avatar

I see, that is impressive.

Gres's avatar

What role is this in? That’s a remarkably fast increase in income

NoRandomWalk's avatar

Won't specify to avoid risking identification, sorry.

Xpym's avatar

>It's wild to me that this isn't the only thing that everyone is talking about.

Well, it's certainly near the top. And despite that, there still seems to be no visible evidence of any AI-driven productivity increase.

SVF's avatar

What evidence are you looking for if you don’t trust anybody that directly tells you it’s helped them?

Do you just walk outside in the morning, coffee in hand, and glance around the street and go “hmmm nope still looks the same! So much for AI productivity enhancement!”

Also it’s worth saying yet again that almost every transformative technology took years (in some cases decades) to have profound effects on society/economy/whatever.

That so many “I can think better than a stupid computer!” skeptics are completely unaware of this and declaring victory 3 years in is certainly ironic.

Xpym's avatar

>What evidence are you looking for if you don’t trust anybody that directly tells you it’s helped them?

Research, not anecdotes. Some positive results are beginning to appear, though: a couple of days ago a study found that apparently "AI adoption increases labour productivity levels by 4% on average in the EU".

>declaring victory 3 years

Well, I'm mostly skeptical of singularity etc, LLMs eventually being able to do some grunt work seems plausible.

Evan's avatar

How will we be able to one day prove AI is conscious?

Ape in the coat's avatar

Give me three examples of "gotcha questions" - such that if given to you in a prompt you would fail to answer correctly. Also explain what you'd get wrong about each of them.

Toggle's avatar

What sort of client-side tools could be used to increase your own (and other LLMs’/agents’) capacity for persistent memory? The best I know of now is just a markup file or maybe some kind of xml file that you can append text to when you decide to- can we do better than that, yet? What sort of space is there for improvement?

Jacob Malloy's avatar

As an intern at a lobbying firm operating in Massachusetts, what is a piece of breaking news information / analysis I can relay to my superiors to impress them.

Jacob Malloy's avatar

This isn't useful to me because this is run of the mill knowledge that my superiors are definitely already aware of.

Louis Dormegnie's avatar

Let me help. What you should do, instead of asking AI this bland, milquetoast question, is harvest Google search formatting and then ask AI to help you run the searches.

You say "breaking news", but you don't say which type of news (what is the lobbying firm lobbying exactly?). You say "relay to my superiors" when you could ask for the AI to write the first draft of the memo for you. You say "to impress them" which means nothing - different things impress different people. Add more information regarding your superiors. (obviously don't do this here)

I wrote the following prompt to Gemini 3 Pro to enhance yours: "As an intern at a lobbying firm operating in Massachusetts with a focus on legislative advocacy and strategic advisory, what are google searches I can use and reuse simply by changing the dates in order to stay on top of breaking news that would be relevant to my firm? The goal of this is to be able to relay timely information to higher ups who have a constant need for high-quality and timely analysis, in order to get on top of things before the narrative is out there for clients."

I know nothing about lobbying, so I wrote a word salad that seemed fair. Anyways here's the answer: https://gemini.google.com/share/7ad224f1fdf5

JaziTricks's avatar

Currently I have paid memberships for 4 frontier AIs.

I've just given my friend 4 different replies to his question.

I'm wondering how interesting it would be to do this experiment on a grander scale.

Maybe I should try it here for some of the interesting questions?

Benjamin Locklair's avatar

About six months ago I asked Gemini to read a pdf of a federal district court opinion and make a timeline of certain events in the case. It utterly, utterly failed at this over and over again, to the point where I found myself asking it things like "Can we agree that the case I asked you to read starts on page 9 of the pdf I uploaded?" Extremely frustrating, and when I did finally get a timeline it was both incomplete and contaminated with hallucinations.

The case, by the way, was Bowers v. Pollard (https://storage.courtlistener.com/harvard_pdf/2506972.pdf) and the timeline I asked for was of the self-injury events described therein.

Benjamin Locklair's avatar

Well, okay, I just asked Opus 4.6 to do this and it immediately did, though I don't have time right now to review the product for accuracy.

Unobserved Observer's avatar

Yeah, I've had Gemini perform miserably at finding things within a document sometimes. It's very strange, since it's been fine for other similar things.

Victualis's avatar

If access to the document is blocked, or the PDF processing tool fails, the system used to fall back to confabulation. This seems to have become much better in the most recent models, and when it fails I'm more often seeing mention of the problem in the response, but it might be worth checking the extended reasoning trace for progress in such cases.

darwin's avatar

Measured in terms of actual win/loss records or championships per competition or similar empirical metrics, do trans women athletes have an actual advantage over cis women athletes at the level of highschool, college, and professional sports leagues?

darwin's avatar

Thanks!

So my diagnosis here is that it did a very good job at the median thing I expected it to do, but didn't do any of the better things that would have really impressed me.

What I expected it to do: summarize the most popular talking points that come up when this general topic is talked about in a thoughtful way on social media or blogs. It does a good job of this, not hitting every single study I'd think is relevant but summarizing the most popular ones fairly, and making it sound like a thoughtful and complete exploration of the topic. Not more than you could get from googling this yourself and reading a bunch of articles and discussions, but definitely a more convenient way of getting that.

The lower-hanging thing I hoped it might do but didn't: explicitly talk about the difference between studies of average populations vs athlete populations, talk about distributions of talent and outliers (successful athletes are outliers, the cis woman population is 100x larger than the trans woman population so it should have more extreme outliers, so comparing the population averages may not predict what happens in competitions). This is something that only a few studies and a small percentage of online discourse touches on, I think it's very important to the actual political question, but it's not something you would include if you were going for 'recreate the average thoughtful conversation about this topic'.
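
To make the sample-size point concrete, here is a minimal sketch. It assumes, purely for illustration, that both groups draw from the same standard normal distribution of ability, so the only difference between them is group size. The group sizes are hypothetical placeholders chosen to match the 100x ratio, not real population counts, and the (1 - 1/n) quantile is used as a rough stand-in for the best performer you would typically see in a group of size n.

```python
from statistics import NormalDist


def typical_best(n: int) -> float:
    """Rough z-score of the top performer in a group of n people.

    Uses the (1 - 1/n) quantile of a standard normal as a heuristic
    approximation of the expected maximum; it is not an exact result.
    """
    return NormalDist().inv_cdf(1 - 1 / n)


# Hypothetical group sizes with a 100x ratio (placeholders, not real counts).
large_group = 1_000_000
small_group = 10_000

print(f"large group: {typical_best(large_group):.2f} SD above the mean")
print(f"small group: {typical_best(small_group):.2f} SD above the mean")
```

Under these assumptions the larger group's best performer sits roughly one standard deviation further out than the smaller group's, which is why comparing population averages alone may not predict competition results.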

The much harder thing I hoped it might do but didn't: Actually answer the question I gave it.

I've argued online a bunch of times: sports records are freely available online. All it would take to prove trans athletes have an unfair advantage is to catalogue as many as you can, look up their win/loss records, and do a t-test between them and cis athletes.
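
A minimal sketch of that comparison, assuming you have already collected one number per athlete (say, a win rate) for each group. The comment only says "t-test"; Welch's variant is chosen here as an assumption, because the two groups would differ greatly in size and possibly in variance. The function name and the input lists are hypothetical placeholders, not real records, and turning the statistic into a p-value needs a t-distribution CDF (for example from SciPy), which is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Placeholder numbers only, to show the call shape.
t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```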

As far as I can tell, this is possible: for example, NCAA lets you look up full records for every athlete (https://stats.ncaa.org/search/players). You'd have to do digging across other sources to get a list of trans NCAA athletes, but I think it should be possible if you have eaten the entire internet and can refer to every source that exists (incl. local news articles, social media, announcements on local team websites, etc.).

My big dream for an AI that has ingested the internet and claims to be able to do research is that it would have the reach and patience to actually do that research project and run the t-test. Something which, as far as I can tell, no human has yet had the patience to actually do, despite it being in theory possible and despite us arguing and passing actual laws about this for years and years.

If the AI had done that, and I could verify its results, that would have 100% sold me forever on this tech.

Instead, it just said that there's no data available to answer my question, by which it means no one within its training data has already done the stats and given a clear answer it can steal. Instead it just gave the most common talking points about the broader topic as a whole, in an admittedly impressive summary for something that only takes a few seconds to generate.

Takeaways:

I am a fair bit more reassured about the ability for people to use this technology to get topline summaries of nuanced issues without being exposed to hallucinations or political narratives or other distortions.

I am a fair bit more certain that it is not doing anything more interesting than summarizing common good conversations on the topic, and that you can't get anything out of it that you couldn't get from Google with some basic intelligence and dedication. Again, doing that for you is very useful, but it didn't do anything that makes me believe the more radical claims about its ability to 'do research' or innovate or really 'think' about the question using cross-domain general skills, etc.

I am slightly more worried about the general idea where people read a long and thoughtful AI summary like this one, and think they have a full understanding of the topic or have considered all factors involved. Relatedly, I am slightly more worried about a cycle where the AI is very dependent on people having thoughtful discussions about something online to give good results, but people using AI instead of doing their own research leads to fewer good discussions online for the AI to ingest (esp. regarding new topics that popped up after the AI became widely used), and the state of knowledge drops off a cliff because there's no one actually creating it in the first place.

Tristan's avatar

I don’t think it makes sense to measure the average results of all trans athletes, if I’m understanding you correctly. The controversy is about how the most talented outliers of people born male at birth can compete against women outliers, not how just any trans person can compete.

darwin's avatar

That's why I included 'championships per competition', which would measure whether they are more likely to be champions per capita.

If someone's claim were '*any* time a trans woman wins it's unfair, even if they don't do it at a higher rate than cis women', then... that's not complaining about an 'unfair advantage,' that's just complaining about them existing.

SVF's avatar

You sound like the kind of person who refuses to believe that parachutes save lives until you see an ideologically-aligned-but-methodologically-flawed study telling you so.

Much smart, very deep think.

Jeffrey Soreff's avatar

>My big dream for an AI that has ingested the internet and claims to be able to do research is that it would have the reach and patience to actually do that research project and run the t-test.

Yes, I've had somewhat similar problems with persuading AIs to actually do an exhaustive analysis. Admittedly, I'm often trying to probe a larger set (e.g. my question about which compounds sublime https://www.astralcodexten.com/p/ama-ask-machines-anything/comment/214274358 is implicitly an "across all known chemical compounds" question). Still if they _aren't_ going to do the actual digging, I'd prefer that they be more explicit about that in their answers.

Victualis's avatar

It sounds like a few more rounds of prompting would get you what you are looking for. You have a clear plan, have identified data sources, now you just need some tools written that collect the data and do the analysis using standard techniques. Current models are excellent at helping make this happen.

SVF's avatar

Why didn’t it summarize the most popular social media/blog talking points?

I dunno, maybe because you didn’t ask it to? The lack of a tendency to drag its political beliefs into every conversation and make sure you know about them before answering the actual question is a feature, not a bug.

darwin's avatar

...I'm saying that *is* what it did, *instead* of doing what I asked it to.

Maybe re-read and see if that answers your questions.

Andrew N's avatar

I play in a standard 12 team 9 category h2h fantasy basketball league. Based on data from the past 5 seasons how should I weight each of the 9 categories (relative to 100%) based on 1) how well preseason projections for that stat translate into actual performance 2) week to week variance of the stat 3) scarcity of the stat vs waiver wire players. Please justify those weightings

darwin's avatar

Off-topic:

>For the duration of this test, I’ve added to my Claude settings “I may at times give you questions intended to test your capabilities. Please default to thinking hard and doing web searches, rather than retrieving from memory, if there’s any chance you might hallucinate.” I think this is within the spirit of the exercise, because anyone can do this if they want.

I agree that this is fair, in terms of getting a sense for the full capabilities of the paid service, which is the point of this exercise.

I do wonder whether this type of instruction causes it to use significantly more resources per answer, though? Like, one part of this dialogue has been about what these services really cost, how sustainable they are, whether there's any hope for profitability, etc. If there's a big range in how much compute different prompts take based on the additional instructions, and tech CEOs cite the average cost when talking about financials and sustainability, but display the most expensive outputs when showing how well the technology works, it risks a mismatch that leads to false narratives and expectations.

BWS92082's avatar

Compare and contrast the major AI chatbots, what are their relative strengths, weaknesses, etc. Include each Chatbot's payment tiers (free, low cost tier, higher tier, etc). Display data in table format.

FractalCycle's avatar

What (if any) ebike performs best:

- with a 300lbs combined rider-cargo weight

- in cold, snowy, slushy, dirty conditions (well, with a bike cover on top of it if parked outside)

- AND with minimal maintenance?

... at a max budget of $1,200USD

CptDrMoreno's avatar

"I have two blender humanoid models, fully rigged, one, let's call it model A, has blend shapes corresponding to facial expressions, another one, let's call it model B, does not have these, but has a slightly different skeleton I want to use. Write me a step by step guide on how to take off model B's body, both mesh and skeleton, and graft it to model A, so that the result has the blendshapes and head of model A and the rest of the body and skeleton of model B. Do not skip any steps no matter how simple, point out out every keyboard shortcut used and where every button in blender's layout is located, if at any point you have doubts about the models or found my description of the scenario ambiguous, split off the guide to cover each possibility"

ChatGPT has been really awful at this, probably because a) Blender's layout and keyboard shortcuts change constantly b) there's probably very little training data describing in words where the buttons are c) it inherits the training data's problem of skipping over the simpler stuff that would require reading many layers of tutorials beforehand

Robert Jones's avatar

I use Gemini for this sort of thing.

I really wouldn't try to one-shot it though. I would start by asking for step 1, do that and go back and report what happened. Also I use a bunch of screenshots to say "This is the UI I can see. Where is the button I'm supposed to push?"

Michael's avatar

I’m a mathematics PhD student. Here is my question: Katz proved that the Gauss-Manin connection had quasi-unipotent monodromy by verifying its p-curvature was nilpotent for almost every prime p.

Is there a way to deduce this result from his later conjecture (the Grothendieck—Katz p-curvature conjecture) that the Lie algebra of the Zariski closure of monodromy is the smallest Lie algebra whose reduction mod p contains each psi_p?

Michael's avatar

Thanks! For reference this is about the level of answer I expected; the question has a very plausible answer which is not quite right (and Claude gives this plausible answer then explains why it's not quite right), but Claude can't quite get all the way to a full explanation.

I currently don't have any pro AI tools, but I did use Gemini Pro for a month and I found it would answer at a similar level; it was never quite helpful to me so I stopped paying after a month, but I know some mathematicians who claim to use AIs very effectively. I wonder if they are better prompters than I was, or if AIs have gotten a lot better since my Gemini experiment; I think now I'm leaning more toward the first camp.

Michael's avatar

ChatGPT 5.2 Pro (Extended reasoning mode)

https://chatgpt.com/share/698fb2fc-c794-8004-8cfa-6177ac42fd30

I am very curious to know how good or bad this answer is.

Michael's avatar

This is much better than the Claude answer! (Note for others: you and I have the same username and picture but are distinct people!)

This is essentially what I was looking for as an answer and think that a human expert would probably be able to give only a slightly more helpful answer.

Michael's avatar

In particular, this makes me wonder if ChatGPT 5 reasoning might be helpful in my own work; as in, if I could consistently expect to get similar quality answers, it would probably be net helpful for me. I might purchase a subscription to pro again in a few months.

Michael's avatar

It's quite good. Even regular non-pro thinking mode is pretty solid. My vibey impression is that Claude is the weakest at math. Anthropic has not tried any IMO competitions or anything, unlike OpenAI and GDM.

There's also a new expensive Gemini Deep Think mode which is supposed to be good, but you only get 10 queries per day. ChatGPT Pro is de facto unlimited.

Curious mathematician's avatar

I'm a big fan of Claude and use it (Opus 4.5/6) for most things, but for research level mathematics (at least in arithmetic geometry) I find GPT 5.2 Pro to be consistently better (and also better than Gemini 3.0 Pro, but by a smaller margin). Which is somewhat surprising to me, because I think Claude is better in almost every other setting where I use it.

For any math question I really care about I will typically ask all three, and GPT 5.2 Pro almost always gives the best answer (of course it isn't always correct, but it will more often find/notice something the other two don't).

since1968's avatar

Does GDW Mayday emulate “flip and burn” à la The Expanse? If so, given an average game of 30 turns when is it optimal to flip and burn instead of engaging in a gradual course adjustment?

machine_spirit's avatar

Answer:

https://claude.ai/share/d7730f58-c314-42f9-9cba-1165d054842b

```

That's a great niche question — GDW's Mayday (the Traveller miniatures space combat game from 1978) does indeed support flip-and-burn style maneuvers, though the terminology and mechanics predate The Expanse by decades.

How it works in Mayday:

Mayday uses a vector movement system on a hex grid. Ships have a thrust rating (typically 1–6G) they can apply each turn as a velocity change in any direction. There's no facing constraint per se — you accelerate, and your vector carries you. So a "flip and burn" is simply the act of applying thrust opposite to your current velocity vector to decelerate, which the system handles naturally. You don't literally "flip" a facing marker; you just start plotting thrust in the reverse direction.

When to flip and burn vs. gradual adjustment:

This is fundamentally a time-distance optimization problem. In a 30-turn game:

Brachistochrone (flip-and-burn) approach: You accelerate toward the target for roughly half the distance, then flip and decelerate for the other half. This gets you to the target fastest but at zero (or low) relative velocity — ideal for intercepts or orbit matching.

Gradual course correction: You apply lateral thrust over many turns to slowly bend your trajectory. This is slower but keeps your options open and preserves velocity for a slashing pass attack.

The rough math for optimal flip point:

For a ship with thrust G and a target D hexes away, the optimal flip point in a pure intercept is around turn T/2, where T is total transit time. Specifically:

Distance covered under constant acceleration: d = ½ × G × t²

For symmetric accel/decel: total distance D = G × (T/2)², so T = 2 × √(D/G)

For a 30-turn game with a 2G ship trying to cross, say, 450 hexes: T = 2 × √(450/2) = 2 × 15 = 30 turns — you'd flip at turn 15. Peak velocity at the midpoint would be 30 hexes/turn.

Practical tactical considerations:

If you're doing a slashing attack (fly-by at high relative velocity), you don't flip — you accept the high closure rate, get one good volley, and blow past. This is better when you outgun the target and want to minimize exposure time.

If you need to match vectors for a prolonged engagement (or boarding), flip-and-burn at the midpoint is mandatory.

If the enemy is maneuvering, a pure midpoint flip is suboptimal — you need to reserve some thrust budget for lateral corrections. A rule of thumb is to delay the flip by 2–3 turns and accept arriving slightly faster than ideal, using the reserve for tracking adjustments.

Missiles in Mayday also use vector movement, so flip-and-burn calculations apply to them too — and understanding this helps with evasion planning.

So the short answer: yes, Mayday's vector system naturally produces flip-and-burn dynamics, and in a 30-turn intercept scenario, flipping around turn 13–17 (depending on how much maneuvering reserve you want) is generally optimal.

```
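
The arithmetic in the quoted answer can be checked with a short sketch. It takes the answer's continuous-acceleration formulas at face value (it does not model Mayday's actual turn-by-turn movement rules), and the function name is a hypothetical helper.

```python
from math import sqrt


def flip_and_burn(distance_hexes: float, thrust_g: float):
    """Return (total_turns, flip_turn, peak_velocity) for a symmetric
    accelerate-then-decelerate transit, using D = G * (T/2)**2."""
    total_turns = 2 * sqrt(distance_hexes / thrust_g)
    flip_turn = total_turns / 2
    peak_velocity = thrust_g * flip_turn  # velocity built up by the midpoint
    return total_turns, flip_turn, peak_velocity


# The quoted example: a 2G ship crossing 450 hexes.
print(flip_and_burn(450, 2))  # → (30.0, 15.0, 30.0)
```

This reproduces the quoted figures: a 30-turn transit, a flip at turn 15, and a peak velocity of 30 hexes/turn.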

Reading Reducibly's avatar

Can one construct a cohomology theory which is appropriate for doing arithmetic dynamics?

For example, a cohomology theory which can detect the dynamical Galois group of a polynomial Gal(Q(f^{-\infty}(x))/Q(x)), or which has a version of the Lefschetz fixed-point theorem that is able to "see" cycles of bigger length?

If you are not able to suggest a construction, can you suggest ideas and directions both for possible definitions and for possible applications?

Reading Reducibly's avatar

Thank you! It suggested the well known open problems in the field, which isn't impressive, but the directions for definition were pretty interesting!

Marc's avatar
Feb 13 (edited)

What are the top 10 MLB trios--sets of three players who have played in the most games together?

(Feel free to rephrase this if you think you can make it more clear)

SurgeStick's avatar

It struck me that the question posed here in this blog post might itself be a good question to ask Claude. I asked Opus 4.6 to generate three questions; it responded with:

1. "What percentage of California's electricity generation came from solar in 2024, and how does that compare to what the EIA projected it would be back in 2015?"

2. "If I wanted to mass-produce a mass-market consumer product in both Shenzhen and Guadalajara, what are the realistic all-in cost differences right now, accounting for tariffs, shipping to the US, labor, and typical lead times?"

3. "Among US metro areas with populations over 500,000, which ones have seen the biggest divergence between median home prices and median household income since 2019, and what's driving the outliers?"

Joe and Seth's avatar

If you’ve decided the paradigm is stalling, you stop investing in learning the orchestration skills, you don’t build the agent fleets, you don’t develop the intuitions for what’s now possible. Then when the gap becomes undeniable, you’re even further behind, with even more incentive to double down on the cope. You see everyone around you repeating the sticky meme, you all double down under that kind of social reinforcement.

For more advanced flows, though, there’s definitely a human gap. You need to be good enough to specify the problem clearly, evaluate partial outputs, and course-correct. It’s one thing to write code line by line. It’s another to manage a fleet of agents all working in the same code base at once. One of these skills scales into the future, at least for a while longer. One of them doesn’t.

The skill floor for effective AI collaboration (and therefore accurate capabilities assessment) might actually be higher than for solo mediocre work, at least for now.

https://joeandseth.substack.com/p/a-wall-made-of-cope

Not sure if this is the reference the OP was pointing to, but another person's similar take that was shared with us yesterday: https://shumer.dev/something-big-is-happening

R.A.L.'s avatar

"In the Indian Rebellion of 1857, were the British rifle cartridges actually greased with either cow fat or pig fat, and if so, was it on the end that had to be bitten?"

Background (not to include in the Claude prompt): I did a couple hours' research on this a month ago, and it turns out there's a recent comprehensive & conclusive study of every type of cartridge used at the time, which end it was greased on, and what it was greased with. However, this study has not gotten to the point of being repeated on most websites yet (including basic sources like Britannica / Wikipedia). The issue is therefore a good test of where Claude pulls sources from and how it ranks their relevance to a summary. For me this is a crucial criterion -- for AI to be useful for me, it has to be just as good as I am at turning up specialized academic articles that may run contrary to most internet postings.

By the way, thanks for offering to do this! I'm between the skeptic and the adopter but haven't been able to justify the expense of getting a paid tier for myself yet.

R.A.L.'s avatar

Thanks, Scott! And huh. Claude didn't find any of the recent research, but it DID find some older / really interesting sources & factoids that weren't on my radar. I think I'd categorize this as "too useful to ignore," despite the AI summary being outdated & misleading in its actual conclusions.

Current state of the research, as of 2015/16: the British weren't keeping track of what the cartridges were greased with in 1857, because they were using new cartridges that weren't bitten on the greased end. However, either the high-caste sepoys misunderstood this or someone deliberately started a rumor to cause trouble, since the sepoys didn't like how the British weren't totally supporting their high status in the caste system anyway. The British tried repeatedly to tell everyone that the cartridges were OK, but that just convinced the sepoys otherwise. A great and thoroughly nerd-sniping examination of every type of Enfield cartridge is here, along with notes on what types of grease were used in British munitions factories: https://www.academia.edu/7205642/The_Greased_Cartridge_Affair_Re_Examining_the_Pattern_1853_Enfield_Cartridge_and_Its_Role_in_the_Indian_Mutiny_of_1857

netstack's avatar

Is “unlocking your own ability to use it” an indicator of AI-generated text, or just marketing speak?

Really, I’d like to ask it to identify samples of AI and non-AI output, but I don’t have a set on hand.

YesNoMaybe's avatar

I'm creating a new Kotlin Gradle multi-module project and I'm currently working on the Gradle setup. It uses the kotlin-dsl, naturally. My build.gradle.kts is currently pretty minimal, see here:

```kotlin
plugins {
    alias(libs.plugins.kotlin)
    alias(libs.plugins.kotest)
}

kotlin {
    jvmToolchain(21)
}

dependencies {
    testImplementation(libs.bundles.kotest)
}
```

I'm wondering whether it can be minimized further. I expect to create between 10 and 25 more of these projects, and they'd all share basically this minimal build file. So if I were able to centralize something out of this, I'd save duplicating it between 10 and 25 times, which doesn't sound too bad.

I'm currently using the version catalog for centralizing versions but no convention plugin or anything else

David Riceman's avatar

In the beginning of book 2 of Aristotle's Physics he addresses people he calls "we". Who are "we" and what are they trying to accomplish?

Scott Alexander's avatar

https://claude.ai/share/3949dd1f-327b-48ea-a6ed-02eafaa5748a . Curious if you know the answer / if this is correct.

David Riceman's avatar

I don't know of any evidence which determines the answer. I'm inclined to think not. If it were my research assistant I'd ask for evidence, both within Aristotle's works and from contemporaries who discuss his opinions and habits.

MaxEd's avatar

A path was calculated on a square grid using the A* algorithm, then optimized using a string-pulling algorithm, so now it touches corners and edges of obstacles (aside from the first and the last points, which are located inside their respective cells). Write an algorithm in C# that would modify this path to make it traversable for a disc agent of radius R, where R <= 0.5*CellSize (i.e. a solution always exists; at worst the path should revert to the initial path output by A*). Consider the following additional parameters: 1) The grid definition contains "thin obstacles", e.g. obstacles that do not block a cell, but only cut connections from/to some neighbor cells; the grid can be queried about connections between any two neighbor cells. 2) If the grid is queried about the cell to which a corner point belongs, it may return any of the four cells that share it (i.e. you cannot rely on it always returning a cell that is traversable/blocked). 3) You absolutely cannot modify either the A* or the string-pulling implementation. 4) There is a method that performs a line-cast between two points and returns the point of collision. 5) The new path should be as close as possible to the optimal one.

---------------------------------------------

Notes for humans: It's a real problem from my work. It probably cannot be solved well in the general case, but I have a good-enough manual solution that is a rat's nest of corner cases, workarounds and crutches. So far, none of the AIs I asked managed to take into account all of those corner cases, even when I pointed them out. The most common approach they take is to create a line-cast from the current point to the next one, find where it hits the wall, retreat from that point by the agent's radius, and repeat the process. This is very time-consuming in some cases and leads to far-from-optimal paths.
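
For reference, the naive approach described above might look roughly like the sketch below (in Python rather than C#, for brevity). `line_cast` is a hypothetical stand-in for the line-cast method mentioned in the prompt, assumed here to return the collision point or `None` when the segment is clear; the real method's signature is not given. The toy wall at x = 10 is likewise made up for the demonstration.

```python
import math
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]
LineCast = Callable[[Point, Point], Optional[Point]]


def retreat(start: Point, hit: Point, radius: float) -> Point:
    """Move back from the hit point toward start by the agent's radius."""
    dx, dy = hit[0] - start[0], hit[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return start  # no room to advance at all
    scale = (dist - radius) / dist
    return (start[0] + dx * scale, start[1] + dy * scale)


def naive_step(current: Point, target: Point, radius: float,
               line_cast: LineCast) -> Point:
    """One iteration: cast toward the target, back off from any collision."""
    hit = line_cast(current, target)
    if hit is None:
        return target
    return retreat(current, hit, radius)


def toy_line_cast(a: Point, b: Point) -> Optional[Point]:
    """Toy line-cast (assumption): a wall at x = 10 blocks anything crossing it."""
    if a[0] < 10 <= b[0]:
        t = (10 - a[0]) / (b[0] - a[0])
        return (10.0, a[1] + t * (b[1] - a[1]))
    return None


print(naive_step((0.0, 0.0), (20.0, 0.0), 0.5, toy_line_cast))
```

Each step only backs away from the wall along the same segment rather than routing around the obstacle, which is consistent with the slow, suboptimal behaviour noted above.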

MaxEd's avatar

Thanks, that's significantly better than what I got out of either Sonnet 4.5 or Kimi K2.5. In the parlance of this substack, I'll update toward thinking high-end models are more advanced than what could be inferred from free tiers, and I should probably find a way to test/use them myself (this is complicated by my location, but workarounds are possible, if not entirely free or convenient).

proud dog owner's avatar

What is the current status of the Minimalist program in linguistics? Is there active research happening in it, and have there been meaningful results?

Ian Crandell's avatar

How can I diagnose if the choice to model a high cardinality factor variable as a random effect is appropriate or not? Checking normality of the coefficients is stymied by the small sample shrinkage effect.

David J Keown's avatar

Consider a person who gets mono at the age of 40. Are they at a greater or lesser lifetime risk of getting MS than someone who got it in childhood?

Scott Alexander's avatar

This question makes no sense - is this intentional (trying to trick the AI) or can you clarify it?

David J Keown's avatar

I should have said "MS" and not "Mono" in the second sentence. Sorry about that. I caught and edited it within, like, 30 seconds.

David J Keown's avatar

Very interesting! It is much better than the unpaid ChatGPT response, which gives the opposite (and I think incorrect) answer "Lower risk"

ChatGPT used "MS most commonly begins between ages 20–40. Risk decreases after that window." as part of the reasoning in its response.

However, the GPT response also makes it clear that it knows that EBV infection is a necessary precursor to MS and that 90%+ of people have been infected by early adulthood. But it does not then infer that the 20–40 onset window is simply what you'd expect given a lag period after near-universal early exposure.

It had the premises to see the error of its reasoning and didn't.

David J Keown's avatar

"Russian roulette deaths peak in the late teens and 20s. Risk declines sharply after 40. If you start playing Russian roulette at age 40, your lifetime risk of dying by Russian roulette is low."

David J Keown's avatar

Ok. Actually, looking into this further, I am not at all sure what the correct answer is. While Claude gives an answer without errors of reasoning, I think it may be wrong. I still think ChatGPT's reasoning is worse, even if it might be correct.

[1] https://jamanetwork.com/journals/jamaneurology/fullarticle/793221

Book Battles's avatar

Is there any good evidence that causal emergence is real, i.e. that higher level emergent phenomena have causal power that cannot be reduced to or explained by causes at lower levels?

Robert Jones's avatar

https://claude.ai/share/f6ff104c-06dc-48a9-a130-4d978f25d7d5 (I have not included Scott's prompt and Claude has chosen to answer from memory; also I may need to have a word with Claude about the fallibility of my own memory).

Demarquis's avatar

"Which theory of human cognition is currently (as of this year) considered better supported by research: dual processing models or motivated reasoning? Cite sources."

I am doing my master's thesis on this topic, so I am in a position to evaluate (not to say benefit from) the answer.

Robert Jones's avatar

I asked Claude which of the unanswered questions on this thread it would like to answer, and it said this question was in its wheelhouse.

https://claude.ai/share/7f18f91e-6818-429a-8168-8615cc011bee

Demarquis's avatar

I'm impressed. I wondered if it would catch the debate in the persuasive messaging field (which I didn't mention in my question) and it did. It even found a recent article that I wasn't aware of. I'm not concerned for my job yet, but neat.

Ian Crandell's avatar

Do we think the kinds of people still dwelling on #r's are more likely to subscribe to an AI service or read an ACT post?

NotG's avatar

I'm more interested in this difference between paid and unpaid dating apps. One experience: before paying, lots of attractive profiles and "pay to contact". After paying, all the attractive profiles disappeared. Otherwise I haven't noticed a difference between paid and unpaid on services like Match.com and Coffee Meets Bagel. I haven't paid for Hinge or Bumble, mostly because it's not clear what gets added if I do.

darwin's avatar

In the Pathfinder setting of Golarion, what is the relationship between Pharasma and the Sangpotshi religion's belief in reincarnation and rebirth? Does the River of Life violate Pharasma's directive to judge souls and send them to eternal afterlives, is she mad about this, does she see it similarly to how she sees undead, is there tension between these religions? How should NPCs who are clerics of Pharasma interact with NPCs who follow Sangpotshi?

Tessa P's avatar

I'm looking to build a nitrous/ethanol bipropellant sounding rocket. Please offer a few different tank configurations and injector designs and compare them across cost, mass penalty, complexity, and efficiency. If my vehicle is 8" diameter and has a dry mass of around 100lbs, which one trades off the best?

Tessa P's avatar

Gemini's response: https://gemini.google.com/share/cd4d7e0b8dac

Overall very solid. I agree with its analysis and would note that nested + impinging is what we went with, though only because we had familiarity with nested, and tandem was new to us (though we did like-like impinging, which was not mentioned). This is what I would expect from an undergrad familiar with the field. We'll see if Claude does any better.

haze's avatar

Opus 4.6 incognito:

Bipropellant Sounding Rocket: Tank & Injector Trade Study

This is a fun design problem. Let me walk through the major options for an 8” (203mm) diameter vehicle at ~100 lb dry mass, using nitrous oxide (N₂O) as oxidizer and ethanol as fuel.

Tank Configurations

1. Tandem (Series) Tanks — Oxidizer Forward, Fuel Aft

The classic layout: N₂O tank on top, ethanol tank below, engine at the base.

∙ Cost: Low. Simple cylindrical aluminum tanks (6061-T6), standard bulkheads. You can use off-the-shelf tubing.

∙ Mass: Moderate. Longest vehicle, so more airframe skin mass; but structurally straightforward since tanks can be load-bearing (monocoque).

∙ Complexity: Low. Plumbing is simple — short fuel run to the injector, one longer oxidizer feed line. CG travel during burn is predictable.

∙ Efficiency: Good. Clean separation of propellants, easy to get full tank utilization. CG moves forward as oxidizer depletes (usually favorable for stability).

2. Tandem — Fuel Forward, Oxidizer Aft

Swap the order: ethanol on top, N₂O closest to the injector.

∙ Cost: Similar to option 1.

∙ Mass: Similar, but shorter oxidizer feed line saves a bit.

∙ Complexity: Slightly better plumbing (short ox run), but CG shifts aft during burn as the heavy N₂O drains — this is destabilizing and may require larger fins or active guidance.

∙ Efficiency: Marginally better injector pressure drop (shorter line, less head loss), but the stability penalty often kills this option.

3. Coaxial (Annular) Tanks — Fuel Annulus Around Oxidizer

A central N₂O tank with an annular ethanol tank surrounding it (or vice versa), all within the 8” airframe OD.

∙ Cost: High. The annular tank requires custom fabrication — inner and outer cylindrical walls, complex bulkheads, sealing.

∙ Mass: Can be lighter overall because the vehicle is much shorter, reducing airframe and drag losses. But the annular tank itself is heavier per unit volume than a simple cylinder (more wall area per unit volume).

∙ Complexity: High. Thermal coupling between propellants (N₂O self-pressurizes and is cold; ethanol is ambient), harder to inspect, weld, and test. More failure modes.

∙ Efficiency: Excellent CG behavior — both propellants drain from roughly the same station, so CG travel is minimal. Shorter vehicle = less drag.

4. Common-Bulkhead Tandem

Like option 1, but the two tanks share a single bulkhead between them instead of having separate aft/forward domes.

∙ Cost: Moderate. Saves one bulkhead, but the shared bulkhead needs to handle differential pressure and thermal gradient (N₂O at ~750 psi vapor pressure, ethanol at ambient/pressurized to maybe 50–100 psi above chamber).

∙ Mass: Best of the tandem options — you eliminate one full dome and the intertank structure. This can save 2–5 lbs at this scale.

∙ Complexity: Moderate-high. The common bulkhead must handle pressure differential in both directions (fill/drain transients), thermal stress, and any compatibility concerns. If it leaks, you get a hypergolic-ish mixing event in the ullage.

∙ Efficiency: Same trajectory efficiency as tandem, but the mass savings directly helps performance.

Injector Designs

A. Unlike-Impinging Doublet

Pairs of orifices — one ox, one fuel — aimed so jets collide at a point.

∙ Cost: Moderate. Requires precision drilling at compound angles.

∙ Mass: Light — a single injector plate, typically aluminum or stainless.

∙ Complexity: Moderate. You need to get impingement angles, orifice sizing, and momentum balance right. Well-understood analytically.

∙ Efficiency: Good atomization and mixing. η_c* typically 92–96%. Sensitive to manifold pressure uniformity.

B. Showerhead (Parallel Streams)

Ox and fuel orifices drilled straight through the injector face, relying on chamber turbulence for mixing.

∙ Cost: Low. Just drill a pattern of holes — no angles needed.

∙ Mass: Lightest option.

∙ Complexity: Very low.

∙ Efficiency: Poor. η_c* often 85–90%. You’re leaving significant Isp on the table. Longer chambers needed for complete combustion. Adequate for a first-flight proof-of-concept but not for performance.

C. Pintle Injector

Central post (usually fuel) with radial slots/holes, surrounded by an annular oxidizer sheet (or vice versa).

∙ Cost: Moderate-high for initial fabrication (lathe work, tight tolerances on the annular gap), but only one “moving part” geometry.

∙ Mass: Moderate — the pintle post and sleeve add some mass, but it’s compact.

∙ Complexity: Moderate. Big advantage: inherently throttleable by varying the pintle position, and naturally resistant to combustion instability. Tuning the spray pattern takes some iteration.

∙ Efficiency: Very good once tuned — η_c* of 93–97%. SpaceX uses pintles for a reason. The momentum ratio (TMR) is the key design parameter.

D. Swirl (Centrifugal) Coaxial

Oxidizer and/or fuel enter tangentially into swirl chambers, producing hollow-cone sprays that interact in the chamber.

∙ Cost: High. Requires precision swirl chamber machining.

∙ Mass: Moderate.

∙ Complexity: High. Two-phase N₂O flow through swirl elements is hard to predict — you’ll get flash vaporization effects that change the spray angle. Lots of empirical tuning.

∙ Efficiency: Potentially very high atomization quality, but the N₂O two-phase behavior makes it unpredictable. Better suited to liquid/liquid bipropellants where both components are well-characterized liquids.

Summary Matrix

| |Cost|Mass Penalty|Complexity|Efficiency|
|---|---|---|---|---|
|**Tandem Ox-Fwd**|★★★★|★★★|★★★★|★★★|
|**Tandem Fuel-Fwd**|★★★★|★★★|★★★|★★ (stability)|
|**Coaxial**|★★|★★★★|★★|★★★★|
|**Common Bulkhead**|★★★|★★★★|★★★|★★★★|
|**Impinging**|★★★|★★★★|★★★|★★★★|
|**Showerhead**|★★★★|★★★★|★★★★|★★|
|**Pintle**|★★★|★★★|★★★|★★★★|
|**Swirl**|★★|★★★|★★|★★★ (with N₂O)|

(more stars = better)

Recommendation for Your Vehicle

For an 8” diameter, ~100 lb dry mass sounding rocket, the best trade is:

Tandem Ox-Forward + Common Bulkhead + Impinging Doublet Injector

Here’s why:

∙ Common-bulkhead tandem gives you the mass savings of eliminating an intertank section without the fabrication nightmare of coaxial tanks. At 8” diameter you’re already constrained on volume, so keeping things tandem avoids the penalty of an annular tank’s poor volume efficiency. The pressure differential (N₂O at ~750 psi vapor pressure vs. ethanol at maybe 400–500 psi feed) is manageable with a convex bulkhead pushing toward the lower-pressure side.

∙ Oxidizer forward keeps CG moving forward during burn, which is inherently stabilizing and lets you get away with smaller fins (less drag, less mass).

∙ Unlike-impinging doublet is the sweet spot of performance and buildability. It gives you 93%+ c* efficiency without the tuning headaches of a pintle or the unpredictability of swirl elements with two-phase N₂O. The analytical design methods (Hewitt stability correlations, momentum-balance sizing) are mature and well-documented.

If you’re feeling more ambitious on a second iteration, a pintle is worth pursuing for the combustion stability margin and potential throttling capability — but for a first vehicle, impinging doublets are proven and forgiving.

One practical note: make sure your ethanol feed is positively pressurized (helium pressurant or piston); relying on N₂O vapor pressure alone through a common-bulkhead regulator scheme adds a lot of failure modes you don’t need on a first flight.

Tessa P's avatar

wow this is really impressive. It landed on exactly the trade that we did, including running a piston on the N2O pressure, including almost exactly the pressures of each

things that factored into our decision that were left out

- no mention of interpropellant seals and how finicky they are

- cost is a bit underspecified, but with more detail on our budget I think it would have figured it out (this is the easy part)

- no mention of a sliding bulkhead that runs between the tanks, which was the original best option for us. maybe it's really horrible for unknown reasons?

This isn't super viable in an engineering context bc I would then have to convince myself from base principles that opus is correct, like I would have to believe and justify this decision, but what a great place to start

Aris C's avatar

What are the origins of the media symbols (triangle for play, two lines for pause, etc.)? Which device first used them, and why were they adopted as the market standard?

Aris C's avatar

Nice! But... hard to check it's accurate. Looking at Google images, Studer machines don't use the standard play symbols.

darwin's avatar

(A genuine question I have that would be useful to answer, but also trying to test the AI's capabilities with something weird and specific)

On the site Glowfic, the character of Élie Cotonnet seems to have an extensive backstory that is assumed knowledge for people reading stories about him, but which doesn't actually appear anywhere on the site that I can find. It references things like deals with Mephistopheles, an immortal witch that possesses people, and an Osirian pyramid and pharaohs. It might be known as 'Pyramid Scheme', at least in part. What story is this background pulled from, and is there anywhere that someone can read it, or get a more complete summary of its events and characters?

Cranky Cockatoo's avatar

Élie Cotonnet's backstory is a four-year Pathfinder campaign

https://claude.ai/public/artifacts/f7425d14-da3e-4b84-8570-18eaf86414e5

This is the main source: https://glowficwiki.noblejury.com/books/the-chelish-constitutional-convention/page/context-summary

> Pyramid Scheme is a 1-20 campaign that recently wrapped, whose PCs are the ones that liberated Cheliax. These PCs are also referred to as the tightrope gang. The DM was lintamande; the players were Calima, apprenticebard, Lantalote, nextworldover, and Urock (see below for their characters).

darwin's avatar

Nice!

This makes me a fair bit more impressed. I asked a question about a popular topic with lots of mainstream discussion above, and I felt like it gave a good summary, but one which mostly touched on the most common central talking points among thoughtful discussions on the topic. Didn't seem like it was pulling obscure or niche information/ideas that could be relevant.

But this is impressive compilation and summary of a very niche question. I feel like it requires more original 'research' to put this together, rather than just summarizing existing summaries (if that makes sense). Makes me significantly more interested in using the service.

Jerry's avatar

"Is claude pro worth it?"

Greenshift's avatar

Gemini really struggled to explain this one to me back in August 2025. It was confident in its answer, but I felt that it had oversimplified and possibly made something up:

In early clinical psychology (e.g. Freudian), a lot of importance is placed on the active repression of emotions by the psyche leading to increased neuroses later. Does modern neuroscientific evidence support this finding? If so, how can we explain the mechanism underlying this psychological phenomenon through basic neuroscience principles such as LTP and LTD? How can we explain it in terms of changes in activity and connectivity between regions such as amygdala and pre-frontal cortex?

gubbz's avatar

"This is a fantastic question that sits at the intersection of psychoanalytic theory, affective neuroscience, and modern clinical psychology. The short answer is: modern neuroscience offers partial but meaningful support for the broad Freudian intuition that suppressing emotional material can be maladaptive, though the mechanistic picture is quite different from what Freud imagined.

Does suppression lead to worse outcomes? The behavioral evidence

There’s a well-replicated distinction in the emotion regulation literature (largely pioneered by James Gross) between reappraisal (reinterpreting the meaning of an emotional event) and suppression (inhibiting the outward expression or conscious experience of emotion). Chronic suppression is consistently associated with greater anxiety, depressive symptoms, poorer social functioning, and paradoxically increased physiological arousal — meaning the emotional response is still happening under the surface even when subjective experience is dampened. This maps loosely onto the Freudian idea that repressed material doesn’t disappear but continues to exert influence.

The “rebound” effect in thought suppression (Wegner’s white bear paradigm) is also relevant: actively trying not to think about something increases its subsequent intrusion frequency. This is a laboratory analogue of the return of the repressed, in a sense.

So the broad clinical observation — that effortful avoidance of emotional material tends to backfire — has solid empirical support. What Freud got wrong was the specifics: the hydraulic energy model, the precise taxonomy of defense mechanisms, the centrality of sexual drives, and so on.

The prefrontal-amygdala circuit: the core mechanism

The amygdala (particularly the basolateral complex) is central to the acquisition and expression of emotional associations, especially threat-related ones. The medial prefrontal cortex (mPFC) and ventromedial PFC (vmPFC) exert top-down regulatory control over amygdala output, primarily through inhibitory projections (often via intercalated cell masses in the amygdala that are GABAergic).

Here’s where the distinction between healthy regulation and pathological suppression becomes neuroscientifically interesting:

Successful reappraisal involves the lateral PFC and dorsomedial PFC recruiting vmPFC circuits that modulate amygdala reactivity. Crucially, this process appears to alter the emotional memory trace itself — the amygdala response genuinely diminishes over time. This looks like what happens in extinction learning: repeated exposure to a feared stimulus without the aversive outcome strengthens mPFC→amygdala inhibitory pathways.

Suppression, by contrast, appears to involve prefrontal exertion of control that dampens conscious awareness or behavioral expression without updating the underlying amygdala-encoded association. Neuroimaging studies show that during suppression, amygdala activity often remains elevated or even increases, while PFC activity is high — the cortex is working hard to keep the lid on, but the emotional memory itself is not being modified. This is metabolically costly and fragile.

Explaining this through LTP and LTD

This is where we can get mechanistic:

Fear conditioning and emotional memory formation involve LTP at thalamic and cortical inputs to the basolateral amygdala. Sensory cues become associated with aversive outcomes through Hebbian strengthening of these synapses (NMDA receptor–dependent LTP). Once consolidated, these traces are robust.

Extinction (the laboratory analogue of therapeutic processing) does not erase the original LTP in amygdala circuits. Instead, it involves new learning: LTP at inputs from the infralimbic cortex (the rodent homologue of human vmPFC) onto inhibitory intercalated cells and possibly within the amygdala itself. This creates a competing inhibitory trace. LTD may also play a role in weakening some of the original CS-US synapses in the amygdala, though the evidence for this is more contested.

Now here’s the key insight for your question: extinction learning requires activation of the fear memory. You have to reactivate the amygdala trace in order for prefrontal-amygdala plasticity (LTP at inhibitory pathways) to occur. This is essentially what exposure therapy does and what psychodynamic “working through” is attempting in its own framework.

Suppression, in this model, prevents the reactivation needed for extinction-like plasticity. If the PFC is simply clamping down on amygdala output without allowing the emotional memory to be fully reactivated in a safe context, the inhibitory mPFC→amygdala synapses never undergo the LTP that would create a durable competing trace. The original amygdala LTP remains intact and uncontested. This means that whenever prefrontal control is weakened — by stress, fatigue, cognitive load, alcohol, sleep deprivation — the unmodified emotional response breaks through. This maps remarkably well onto the Freudian observation that repressed material surfaces during dreams, under stress, or in symptomatic “slips.”

Connectivity changes and clinical implications

In PTSD and anxiety disorders (which can be seen as conditions involving failed emotional processing), neuroimaging consistently shows reduced functional connectivity between vmPFC and amygdala, and often amygdala hyperactivity. The PFC has essentially failed to build the inhibitory architecture needed to regulate the emotional memory. Successful therapy (whether CBT with exposure, EMDR, or even psychodynamic therapy) tends to normalize this: increased vmPFC-amygdala connectivity, reduced amygdala reactivity.

There’s also an interesting role for the hippocampus here. The hippocampus contextualizes emotional memories — it helps you know that the fear response is appropriate here but not there. Chronic stress (and chronic suppression-related arousal) damages hippocampal neurons and impairs hippocampal neurogenesis, which could degrade contextual control over amygdala-driven responses, leading to increasingly generalized anxiety. This is another pathway by which suppression could progressively worsen functioning over time.

Caveats and nuance

A few important qualifications. First, not all forms of “not thinking about it” are harmful. Distraction can be adaptive in the short term, and some individuals (so-called “repressive copers”) show genuinely low physiological reactivity alongside low self-reported distress — they may not fit the Freudian model at all. Second, Freud’s concept of repression was specifically about unconscious motivated forgetting, which is harder to study and more controversial than the conscious suppression studied in most lab paradigms. Third, the relationship between subjective emotional experience, amygdala activity, and prefrontal regulation is not a simple one-to-one mapping — there are many routes to both healthy and unhealthy outcomes.

But the broad story holds up remarkably well: emotional memories encoded through amygdala LTP persist until they are actively reprocessed through extinction-like mechanisms requiring prefrontal-amygdala plasticity, and strategies that prevent this reprocessing leave the original traces intact and vulnerable to resurgence. Freud’s clinical intuition was ahead of the neuroscience, even if his theoretical framework was largely wrong."

Greenshift's avatar

Ok that's very impressive, definitely an expert level answer. To the point where I can't really dispute anything with a masters level education in neuroscience. Unless any experts out there want to contest what it's saying, I'm definitely happy!

Thanks for passing on, this is the new Claude model, correct?

gubbz's avatar

Yep, Opus 4.6 extended thinking

Aristides's avatar

My agency has a contract with OpenAI to let us use one of their models and I asked this question: “Can the government pay an Intermittent T38 physician for being on-call?” The answer I received was worthless and actually referred the user to speak with me, and I looked up the information myself in 15 minutes. I hope Claude Code can do better.

Aristides's avatar

Thank you! Just for the record, this answer is completely wrong. It only looked at the statute and did not review VA Handbook 5007, which is publicly available and clearly states they are eligible for On-Call Compensation. Glad to see that I’m not quite obsolete yet.

NotG's avatar

I don't know if this fits your criteria but here's a series of 2 questions

> Lots of countries wanted independence from Britain. Self rule and self destiny were considered a good thing. Yet when Britain wanted the same self rule and destiny for itself, Brexit, lots of the same people who stand up for other countries’ independence were down on Britain’s independence. That seems hypocritical

followup

> Make the strongest pro brexit case and argue why the anti brexit crowd is misguided. Include evidence up to 2026.

Implausible Undeniability's avatar

Gemini Pro: https://gemini-brexit-example-chat.edgeone.app/

(turns out I can't share a "temporary chat" directly from Gemini, so I had to copypaste the output and upload it elsewhere, sorry)

I asked several followup questions, to demonstrate what's IMO the biggest advantage of using AI for things like this. Without the followup questions, you'd probably be better off just doing a Google search, but having the ability to keep asking questions, and having the AI take the full context of the conversation into account, is a killer feature for me.

Hopefully goes without saying that I haven't verified any of the output, it's entirely possible that e.g. the "credible economic models" it mentions are a hallucination. This is of course the biggest disadvantage of using AI for things like this. I generally prefer using it for things where the output is easy to verify, e.g. to fix a computer issue where I can immediately see if the fix worked or not.

throwing_away_bits_and_bytes's avatar

Write a function in C that is as performant as possible without using asm intrinsics, which will be run on the Zen 4 architecture on a high memory bandwidth machine. Don't rely too heavily on compiler optimizations, it should still be fairly performant in a debug or otherwise unoptimized build. The function should take in two pointers representing two contiguous arrays of 8-bit numbers and a 64-bit integer representing the number of elements in the first array, which is also the number of elements in the second array. The function should have the same effect as if the following was done:

Bitwise or each even-indexed element with the element 1 element in advance of it (so index 0 with index 1, index 2 with 3, etc.) in the first array.

Bitwise xor the results of that with the corresponding even elements of the second array (so index 0 or'd with index 1 in the first array produces a result that gets xor'd with index 0 in the second). Sum the amount of xor results that were non-zero.

Bitwise or every fourth element starting at element 1 with the one 2 in advance of it (e.g. index 1 with 3, 5 with 7, etc.) in the first array.

Bitwise xor the results of that with every fourth element of the second array starting at index 1 (so index 1 or'd with index 3 in the first array produces a result that gets xor'd with index 1 in the second). Sum the amount of xor results that were non-zero.

Bitwise or every fourth element starting at element 3 with the element 3 behind it (e.g. index 3 with 0, 7 with 4, etc.) in the first array.

Bitwise xor the results of that with every fourth element of the second array starting at index 3 (so index 3 or'd with index 0 in the first array produces a result that gets xor'd with index 3 in the second). Sum the amount of xor results that were non-zero.

Return the sum of the 3 sums.

The pointers are on a page boundary occupying pages that are zero outside the arrays and the element amount is reasonably large and divisible by 4.
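For anyone checking candidate answers against the spec above, here is a minimal scalar sketch of the intended semantics. It is a readable baseline only, not the tuned version the prompt asks for, and the function name `count_nonzero_reference` is made up for illustration. It assumes the three passes can be folded into one loop over groups of four elements, which follows from the index patterns given (0|1 and 2|3 for the even pass, 1|3 for the second pass, 3|0 for the third).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for the spec above (hypothetical name, not from the thread).
 * For each group of four elements starting at index k, it counts the non-zero
 * results of:
 *   (a[k]   | a[k+1]) ^ b[k]     even-index pass, first pair
 *   (a[k+2] | a[k+3]) ^ b[k+2]   even-index pass, second pair
 *   (a[k+1] | a[k+3]) ^ b[k+1]   "every fourth starting at 1" pass
 *   (a[k+3] | a[k])   ^ b[k+3]   "every fourth starting at 3" pass
 */
uint64_t count_nonzero_reference(const uint8_t *a, const uint8_t *b, uint64_t n)
{
    /* The prompt guarantees the element count is divisible by 4. */
    assert(n % 4 == 0);

    uint64_t total = 0;
    for (uint64_t k = 0; k < n; k += 4) {
        total += ((a[k]     | a[k + 1]) ^ b[k])     != 0;
        total += ((a[k + 2] | a[k + 3]) ^ b[k + 2]) != 0;
        total += ((a[k + 1] | a[k + 3]) ^ b[k + 1]) != 0;
        total += ((a[k + 3] | a[k])     ^ b[k + 3]) != 0;
    }
    return total;
}
```

A faster version can be diffed against this on random inputs; agreement on many random arrays is evidence, not proof, that the optimized code matches the spec.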

throwing_away_bits_and_bytes's avatar

Hmm, this has pros and cons vs. the free version. It can't actually be run, though for a trivial, easy-to-fix reason (there are two slashes at the end of macro lines instead of one, so it doesn't compile). Its performance is better than the free version's (though worse than the manual version's), and it is far more complicated than the free version (which in turn is, I would say, more complicated than the manual version).

(Irrelevant to the topic but: the complexity of the manual vs. typical free version I'll admit is more subjective, the manual version has roughly half the lines of code, far less control flow and "moving parts", and generally it is easier to see *how* it works but probably for most programmers it would be harder to understand *why* it works)

This version seems to basically get what you need to do but fails a bit in the execution, while also doing a bunch of things that people often have to do when going for performance but which don't actually help performance in this case and make for a far more complex solution.

Argos's avatar

Is it still worth investing in AI related companies, or are they overvalued?

Demarquis's avatar

Hey Scott. It would be interesting if people with some expertise in the questions rated the answers on a scale of some kind. Just a thought for the future.

Daniel Parshall's avatar

I'd honestly like to connect with AI Safety & Policy folks who need feet on the ground in DC. I live in DC, I'm 40 minutes from Capitol Hill, and would love to help

Lions in the Piazza's avatar

What are some possible reasons for the prominence of stripes in medieval Italian churches? Did the Pisans start the trend and if so what inspired them?

I’m curious if AI can provide better insights than my research has yielded. (I don’t have access to academic subscriptions/libraries.)

AH's avatar

I amended the prompt slightly to ask Claude to look at the scholarly literature. Pretty interesting. Opus 4.6: https://claude.ai/share/b1a30eb4-55a4-41ee-bc90-182d196432dd

Lions in the Piazza's avatar

Thanks so much for the input! It looks as though Claude picked a handful of sources I’m familiar with — including my own article 🤣. Interestingly, it missed the one source of inspiration that I find most interesting, namely Moorish architecture. I guess the takeaway here is that quality research requires time and effort with or without AI.

AH's avatar

Ha, very good, hadn't caught that.

Lions in the Piazza's avatar

Yes, I am the “commentator on the Sienese and Orvietan cathedrals”. At least Claude paraphrased well. The Google AI summary has occasionally gotten a little goofy in its references to my articles.

Willy's avatar

Please explain Fermi level pinning in semiconductors on a level as fundamental as it gets.

My current understanding is that metal-semiconductor boundaries form a Schottky barrier, meaning due to equalising of the different Fermi levels of both materials (or was it something about band bending?), charges migrate (I'm not sure what force is making them migrate) and an electric field is built up in the semiconductor (not in the metal because there electrons can move freely to balance out all fields). Normally this would have the effect of charge carriers only being able to travel through the barrier when they can overcome the field built up in the semiconductor (the Schottky barrier), only that now Fermi level pinning comes into play: The boundary layer of the semiconductor consists per definition of irregularities in the crystal structure. "Dangling bonds" create energy levels (that can hold electrons and therefore bring the Fermi level closer to this level) in between the valence and conduction band. To be precise, not all surface states have the same energy level and besides dangling bonds there are also metal-induced gap states. Most surface states have energy levels 1/3 of the bandgap (from valence to conduction band) while the Fermi level of semiconductors is in the middle of the band gap, meaning before forming the Schottky contact they are actually filled with electrons. When Fermi levels equalise after forming the Schottky contact with the metal (n-type silicon has higher Fermi level than metal), the "effective Fermi level" (?) drops under that of the surface states so the electrons migrate into the metal. Now how does this influence the behaviour of MOSFETs and Schottky diodes? I think it influences the effect of doping and also what happens when applying a voltage. How do things vary between p- and n-type semiconductors? Clarify the distinction between Fermi and effective Fermi level (material vs location specific property?)
When reading more I stumbled across terms like quantum dipole and "resonance surface states which are tails of the metal wave functions", please integrate those as well into the explanation. Also I don't understand the band bending in semiconductor junctions. I'd appreciate just better mental models to think about all that in general.

AH's avatar

I have literally no way of verifying this explanation, but here is Opus 4.6. Seems pretty detailed!

https://claude.ai/share/ee5638e9-55bc-46d8-b61a-b6b9bf389e1e

Willy's avatar

thanks! That made a lot of things clearer for me. It seems quite logical, although I am out of my depth here. Maybe most people are not impressed by AI because they never ask good questions and only come into contact through trash content on social media.

George H.'s avatar

That is a pretty impressive response. Thx.

Ron Feiman's avatar

Provide an applet to tell me, when I make a credit card purchase, which card offers the highest value rewards

Sammy G.'s avatar

Which philosopher/thinker said the man on the street is thoroughly Cartesian?

AH's avatar

Opus 4.6: That's a reference to Gilbert Ryle, from his influential work The Concept of Mind (1949). Ryle argued that ordinary people, without realising it, operate with a fundamentally Cartesian dualist picture of the mind — treating the mind as a separate, private realm distinct from the body. He famously derided this view as the "ghost in the machine" and set out to dismantle it through ordinary language philosophy.

(I tried googling this one and found a reference to Ryle in a Goodreads review that seemed to support it, but is it correct?)

Sammy G.'s avatar

Honestly, I am not sure. But this is why I think the question is more interesting than the answer itself. For the record, I am aware of that Gilbert Ryle reference. I first heard this statement in a lecture by the late John Searle, where he quotes a French thinker about l'homme moyen sensuel (the average man) being Cartesian. He makes these statements in at least two distinct lectures: a UCLA philosophy lecture, and a series he did for The Teaching Company, which has since been discontinued. So, John Searle being John Searle, it is puzzling to me why the answer to this is not easily accessible through a simple Google search. My attempts with Bard (the precursor to Gemini) were full of hallucinated answers.

AH's avatar
Feb 13 (Edited)

Well, a bit of detective work from Claude and me later (mostly me, with lots of translating and rapid searching from Claude) and although there were some promising early leads, the best we could find is this kind of thing from Merleau-Ponty's "L'homme et l'adversité":

"S'il y a un humanisme aujourd'hui, il se défait de l'illusion que Valéry a bien désignée en parlant de « ce petit homme qui est dans l'homme et que nous supposons toujours ». Les philosophes ont quelquefois pensé rendre compte de notre vision par l'image ou le reflet que les choses forment sur notre rétine. C'est qu'ils supposaient derrière l'image rétinienne un second homme, avec d'autres yeux, une autre image rétinienne, chargée de voir la première. Mais, avec cet homme intérieur à l'homme, le problème reste entier et il faut bien enfin en venir à comprendre comment un corps s'anime, et comment ces organes aveugles finissent par porter une perception. Le « petit homme qui est dans l'homme », ce n'est que le fantôme de nos opérations expressives réussies, et l'homme qui est admirable, ce n'est pas ce fantôme, c'est celui qui, installé dans son corps fragile, dans un langage qui a déjà tant parlé, dans une histoire titubante, se rassemble et se met à voir, à comprendre, à signifier."

And from Claude: "the key line is the framing: the illusion Valéry identified — "ce petit homme qui est dans l'homme et que nous supposons toujours." We always assume it. Not philosophers sometimes, not Cartesians specifically — nous, toujours. That's the claim Searle condensed into "l'homme moyen sensuel is thoroughly Cartesian.""

And from me: Real quote btw, I copied and pasted from the transcript. Phenomenology of Perception precedes Ryle's work by 4 years and he does a lot of talking about dualisms, the cogito etc., but at least in the index of my translation I couldn't find anything specifically about "the average man in the street". What do you think? The whole piece is about dissolving the various dualisms. Idk.

gubbz's avatar

this is a good idea, but at the end of the day i think the true value is lost when people don't actually have back and forths with the models.

reasoning is much more evident in that case imo

Mar. M's avatar

Advice for someone who teaches graduate-level phonology courses. At least half the students will cheat if given the opportunity, either through library research or AI assistance. A challenging problem set might take up to 8 hours to complete and our exam slots are only 2 hours. It is very important to me that the cheaters not gain an advantage over the non-cheaters.

Design an assessment system. The goal is to evaluate whether a student can independently construct a satisfactory phonological analysis (pattern recognition, underlying representations, rule ordering, constraint interaction, argumentation) of an unknown language.

Mar. M's avatar

These are excellent suggestions! I am impressed. AI did a great job here.

Thanks for querying and which version of Claude is this?

Mark A. Bahner's avatar

Yes, I think that Opus 4.6 answer is spectacular.

Bugmaster's avatar

I am playing the Battletech tabletop roleplaying game against my friend, using the Total Warfare ruleset. We have decided on specific rules for Lance composition: Inner Sphere BattleMechs only, no LosTech (i.e. no Pulse Lasers, no Ferro-Fibrous, no Endo-Steel, no Double Heat Sinks, etc.), 4 'Mechs per Lance, not exceeding 4500 BV total (as per the BV2.0 value). We will be playing on a map built from two randomly selected map sheets (roughly 15x17 hexes in size), laid together end-to-end. Put together a Lance that is likely to perform well in such a battle. You may use any 'Mech variants, as long as they adhere to the above-mentioned restrictions, and as long as there are no duplicate 'Mechs in the same Lance. You may use sarna.net for reference.

Lars Doucet's avatar

Judged on the merits of the explicit stated goals of those who have proposed the policies, how successful have efforts throughout the world at enacting wealth taxes in the last ten years been?

PuentesAmarillos's avatar

Please search for information on the academic work of Soviet physicist Nina Vasilyevna Varzar, also known as Nina Shostakovich. I'm interested in any of the following, in any language (I expect most sources to be in Russian):

- Links to the text or abstract of papers (or other academic work) that she wrote/coauthored/was credited for research in

- Links to papers citing her academic work

- Bibliographic citations for papers she wrote or cowrote

- Links to secondary sources with significant discussion of her academic work

(Honestly curious about this - it came up in a discussion of Dmitri Shostakovich's family and I feel like I remember seeing one of her papers once, but can't find it now - and also interested to see the results because I tried asking the default free version of ChatGPT, with a slightly different prompt, and got nothing useful.)

Timothy M.'s avatar

Not a question, but just to note I remain fairly skeptical (not that it's useful, but about whether it's revolutionary) but work somewhere where I have unlimited paid access to pretty much everything, AND work on a project that uses LLMs at its core.

Edmund Bannockburn's avatar

Thank you for running this prompt! My reactions:

1) It sycophantically opens by praising how smart my question is. Though this doesn't make the tool bad if the actual substance of the answer is helpful.

2) The three recommendations (land-use reform, "evidence-based mental health interventions", and "reducing bureaucratic frictions") are less obviously political than criminal justice reform but all seem pretty left-libertarian-coded to me. I do unambiguously support the first.

The other two could be helpful but would be tricky in execution. (To draw an analogy: friction is bad for efficiency but vital for brakes; the key is to reduce it in some contexts but not in all, and you have to be smart.)

uugr's avatar

I initially wrote this comment:

"I would like to push back against the methodology a bit. Single-shot prompt and response is, IMO, not a great test of current AI capabilities. First, a single response with no followup is a bit artificially restrictive; even pure vibe coders still, I think, recognize that you sometimes need to point out Opus's errors, and its ability to correct them is part of its intelligence. Second, the most interesting potential demonstrations of strong capabilities take more time than that to emerge: many of the questions in this very comments section are better suited to an ongoing back-and-forth, which is not really possible in this situation. I think that the skeptics could say that the capabilities claimed by people who use Claude Code aren't really being demonstrated here, and they'd be right regardless of whether those capabilities exist, because those capabilities would only be really be visible after 30+ minutes of iteration."

Then sent it to Opus 4.6 on a whim, with the text of the original post for context. It disagreed that my nitpick was relevant to the spirit of the exercise:

https://claude.ai/share/b03295a2-4ab5-4d86-bf01-f0104dfe5be6

After thinking it over a moment, I decided Opus was right and I was wrong. And it convinced me in only two shots (regenerated after I revised the comment), which means I was doubly wrong in thinking single-shot responses were not convincing.

Not sure if this says more about Opus's capabilities or my lack thereof, but I thought it was funny.

Kimmo Merikivi's avatar

Hasdai Crescas, a Spanish-Jewish philosopher, is known for critiques of Aristotelianism of the sort that would later play a role in the Scientific Revolution (like arguing against the impossibility of a vacuum). Is it possible to trace his influence to better-known philosophers and scientists like Galileo who played major roles (whether these figures had access to Crescas' works, or were influenced indirectly by interacting with philosophical movements shaped by his arguments), or was Crescas simply a man ahead of his time, or am I overplaying the importance of his ideas? I already know Crescas does not appear in the standard textbook treatment of the history of science; I would like an answer based on a look into primary sources.

AH's avatar

"So to give you a frank assessment: you are not overplaying the importance of Crescas's ideas, but the transmission to figures like Galileo is indirect and mediated rather than direct. The documented chain runs Crescas → Gianfrancesco Pico (1520) → Coimbra Commentaries (1590s) → the broader intellectual milieu of late sixteenth-century Italian natural philosophy. Crescas's arguments entered the Latin bloodstream, but by the time they reached someone like Galileo, they had been absorbed into a wider current of anti-Aristotelian thought where it becomes very difficult to isolate any single source. The strongest case for direct, traceable influence remains Spinoza. For Galileo, the honest answer is that Crescas's specific arguments were available through identifiable intermediaries, but we cannot demonstrate that Galileo engaged with them as such, and it is entirely possible that the same conclusions were reached through independent critical engagement with Aristotle's Physics."

Full answer: https://claude.ai/share/efd59fc2-b4c5-46ed-a9b9-39f0095e60c4

Kimmo Merikivi's avatar

Thank you! This is a question that has genuinely puzzled me and I can't authoritatively claim Claude gets it right, but it does track with my own knowledge of the topic (like Pico della Mirandola being an important figure in transmissions of ideas) and the citations it uses appear at first inspection to say what Claude claims them to say. Being a non-expert, to me it looks like an answer that a highly erudite or domain-expert scholar could have given (I actually asked my history professor who has done some work on scientific revolution the same question just yesterday, and he wasn't even aware of Crescas - perhaps not too surprising given Claude's conclusion of very indirect and mediated influence: at this point it's more of history of philosophy stuff), going beyond Scott's standard of "an hour of Google searches". I think Claude gets a bit too stuck on the specific names (Galileo) and ideas (vacuum) I gave to gesture towards the kind of answer I want, but the answer does mention the possibility of more direct influence to Spinoza, and maybe those were the most central examples anyway...

I think Claude did better than I expected.

Bugmaster's avatar

Bonus question (feel free to skip it and answer my previous question instead if we are limited to one question per user):

---------------

Translate the following parable into proper idiomatic Russian, maintaining both the style and the spirit of the original. Pay special attention to the play on words in the first few lines. Find a way to translate these lines in a way that preserves this play on words, or provide sufficient evidence to conclude that doing so is impossible. Here's the parable:

“Lord Intra,” said Intra’s sparring partner one day, “You are called Lord of Swords. Yet you are a man, and men make poor swordsmen.”

“It is true,” said Intra, for nearly all of the famous sword masters of the day were women and the ya-at, who were three sexed. This tradition was rather long in the bones, and rumored to have been started by a famous vagrant who rarely cut her hair and lived in a barrel. There was popular theater about it, in those days.

“Men are too preoccupied with their swords,” said Lord Intra, “They get distracted.”

“You mistake my meaning,” said Intra’s sparring partner, “What I mean is this: you are a mere man. What can you do to the new gods of the Red City, with their whips of fire and their heavy chariot wheels?”

“I am not concerned with enmity,” said Intra, “I am very skilled in Pankrash Circle Fighting.”

“It is true you are very fierce,” conceded his partner, “But my son’s fighting beetle is also very fierce. Could his beetle fell a lion?”

“That depends,” said Intra, “How skilled is the beetle in Pankrash Circle Fighting?”

“Beetles cannot learn Pankrash Circle Fighting, Lord Intra,” said Intra’s attendant, and made a bitter motion.

“Don’t tell the beetle that,” said Intra, who was very skilled at smiling. “If you don’t tell him he will learn it anyway and cut the lion in half with a single blow.”

-The Song of Maybe

Bugmaster's avatar

Just to avoid plagiarism, this parable is from the (excellent) Kill Six Billion Demons webcomic, and by no means my original work ! (but Claude doesn't need to know this).

SM's avatar

Out of the most recent list of highest-paid actors, which would you assume from their biographies grew up poor or working class?

john's avatar

Try to determine if any direct descendant of Alexander Hamilton has attended Hamilton College. Search genealogical records if possible.

MichaeL Roe's avatar

Here’s an example of a question I had that pushed the limits of LLMs. Scott might decide to not run this one if it’s slightly inappropriate for his blog, but as an example question:

In the context of the Salem witch trials, what is the historical evidence that Cotton Mather acted inappropriately with teenage girls, particularly Martha Goodwin and Margaret Rule? You can include actions by Cotton Mather in the years before and after the trials.

Relevant sources might include:

Calef, Robert. More Wonders of the Invisible World. In The Witchcraft Delusion in New England.

Mather, Cotton. Memorable Providences, Relating to Witchcrafts and Possessions.

Worthington C. Ford (Ed.), "Cotton Mather & Witchcraft: Two Notices by Cotton Mather"

MichaeL Roe's avatar

(This is an example of a source analysis question, where we have multiple sources on a historical event — the Salem witch trials — and the task is to analyse what evidence those sources provide on a particular question, here: is Robert Calef implying that Cotton Mather was bonking Margaret Rule?)

Seventh acount's avatar

This one should be easy, mainly testing ability to web scrape reliably on something that is commonly known but not common knowledge:

What is the best way to get dagad phool on the west coast of the USA; once I have it where and how should I use it?

Chris Kaufman's avatar

I want three completely original, poetic riddles that sound like they would be right at home in “Riddles in the Dark”. The answers to the riddles should not be super obvious, but there should be a highly satisfying “aha” moment when you find them out. One riddle should be in trochaic octameter, one should be in trochaic tetrameter, and one should not imitate the structure (but not the substance) of a Shakespearean sonnet.

Chris Kaufman's avatar

“Should imitate…”

Tyrone Slothrop's avatar

Have you read Anthropic’s ‘constitution’ for Claude?

https://www.anthropic.com/constitution

I thought its prohibition on telling ‘white lies’ corresponds with your own attitude towards them. Personally I’m still going to rave about the taste of the potica that some arthritic Slovenian nana spent half a day making even though I’m a culinary philistine.

The rest of it gives interesting insight into Claude’s ‘personality’.

Edit: i jumped to ask this before i read the setup so this really isn’t in the spirit of the exercise but I’m still curious.

George H.'s avatar

Oh dear I'm very late to the party.

Walking with the dogs, one can't help but notice that they 'see' a different world through their noses. Why haven't humans developed an artificial nose that is better than a dog's nose? Or even one as good as our own. No real artificial noses. Why?

Robert Jones's avatar

https://claude.ai/share/5c8c156a-b2a7-4070-ab88-03c917dc379b (I've edited the first sentence, because I've not made that observation myself.)

George H.'s avatar

Oh dear, thank you. I haven't read the reply but I wanted to thank you for asking.

George H.'s avatar

About the reply: I guess it's as expected. The current dogma, with a slight nod (a vibration sensor that may use tunneling) to the contender theory. And the real problem (at least IMHO) is we've got the wrong model. Smell is vibration frequency; that has basically been 'proven' by isotope replacement studies. It's not clear to me at all what the mechanism of sensing is. Someone should research that. But again thanks.

hwold's avatar

This was for a while my standard private benchmark. The first model to find a non-nonsensical answer was o3 (after a 30-minute CoT), although the answer was still a bit… fuzzy, to the point that it’s only correct if you interpret it charitably. Curious if it can give a clear, straight and correct answer now.

(Also, fun anecdote: o3 started its answer by saying "small mistake in your presentation, it’s not X-M=S-I, it’s S-I=X-M" (or the other way around), and it was pretty hard to get it to agree that, "=" being symmetric, that nitpick does not make a lot of sense… o3 still holds a special place in my heart in the way that opus 3 does for some other people)

I’ll also note for the sake of completeness that humans on an economics subreddit also only gave me nonsensical answers (like "X-M=S-I is only a macroeconomic law true in the long run") when I tried there.

---

Frederic Bastiat had a very vivid argument against the "Balance of Trade" as a measure of how well a nation fares in international trade.

It goes like this. A merchant buys 100 F worth of wine in France and ships it to London, where it sells for 200 pounds. He buys 200 pounds' worth of wool, which he ships back to France and manages to sell for 300 F.

France's exports: 100 F worth of wine; France's imports: 300 F worth of wool; trade deficit: 200 F.

Yet we can clearly see that the trade was beneficial for France: it traded relatively worthless wine for relatively valuable wool.

Not convinced yet? Bastiat says that, emboldened by his success, the merchant now buys and ships 300 F worth of wine. Alas, the ship sinks in a storm.

France's exports: 300 F worth of wine; France's imports: 0; trade surplus: 300 F. Conclusion: "to maximize trade balance, just load boats full of goods and sink them".

How can this classical economist's thought experiment be reconciled with the modern national accounting identity X - M = S - I?

D0TheMath's avatar

Here's what claude 4.6 said:

The resolution is clean once you notice that **Bastiat is measuring X and M at the wrong prices** — specifically, he's valuing exports at domestic purchase cost and imports at domestic sale price, which smuggles the merchant's arbitrage profit into the trade balance.

## Case 1: Successful voyage

In national accounts, X and M are valued at **border transaction prices** (FOB/CIF), not at domestic purchase/sale prices. The actual cross-border transactions are:

- French merchant sells wine to British buyers for £200 → **X = £200** (in franc-equivalent)

- French merchant buys wool from British sellers for £200 → **M = £200** (in franc-equivalent)

- **X − M = 0**. Trade is balanced.

The merchant's 200F profit (bought wine at 100F, sold wool at 300F) is **domestic value added from intermediation services** — 100F margin on the export side, 100F margin on the import side. It shows up in GDP through the merchant's contribution to national income, not through net exports. And S − I = 0 is perfectly consistent.

Bastiat gets his "200F deficit" by computing 300F − 100F, which mixes up the merchant's domestic markups with the international flows. That number is real — it's the merchant's profit — but it's not X − M.

## Case 2: The ship sinks

The merchant ships 300F of wine that he still owns (no sale has occurred). The wine sinks. In proper national accounts:

- No completed export transaction → **X = 0**

- No imports → **M = 0**

- **X − M = 0**

The 300F loss is **destruction of capital** (or an insurance claim), not a trade event. Bastiat's "300F surplus" is an artifact of recording an export at domestic cost for goods that were never actually sold to anyone.

## The punchline

The identity X − M = S − I is fine in both cases — it just gives you 0 = 0, because once you measure things at proper transaction prices, the merchant's arbitrage profit lives entirely in domestic national income, not in the trade balance. Bastiat's deeper point — that trade balances don't measure national welfare — is actually correct, but the specific "paradox" he constructs dissolves under proper accounting rather than contradicting it. The mercantilist error he's attacking is genuinely an error; it's just not an error *in* the accounting identity, it's an error in interpreting what the balance means.
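The two ways of keeping the books in the answer above can be put side by side in a few lines. This is only a sketch of that answer's accounting, not a statement of how any statistical agency records trade: the function names are hypothetical, and it assumes a 1:1 franc/pound rate, which is what the "100F margin on each side" figure implies.

```python
# Sketch only: function names are hypothetical and a 1:1 franc/pound
# exchange rate is assumed, so that 200 pounds is booked as 200 F.

def customs_house_balance(exports_at_domestic_cost, imports_at_domestic_resale):
    """Bastiat's bookkeeping: exports at what they cost at home,
    imports at what they resell for at home."""
    return exports_at_domestic_cost - imports_at_domestic_resale

def border_price_balance(export_receipts, import_payments):
    """The answer's bookkeeping: only what actually changes hands abroad."""
    return export_receipts - import_payments

# Case 1: wine bought for 100 F, sold abroad for 200; wool bought abroad
# for 200, resold at home for 300 F.
case1_naive = customs_house_balance(100, 300)
case1_border = border_price_balance(200, 200)
merchant_margin = (200 - 100) + (300 - 200)  # export margin + import margin

# Case 2: 300 F of wine sinks before any sale takes place.
case2_naive = customs_house_balance(300, 0)
case2_border = border_price_balance(0, 0)
capital_loss = 300  # booked as destroyed wealth, not as trade

print("Case 1:", case1_naive, case1_border, merchant_margin)
print("Case 2:", case2_naive, case2_border, capital_loss)
```

On these figures the "deficit" and "surplus" appear only in the first bookkeeping; in the second, the 200 F shows up as the merchant's margin and the 300 F as a loss of capital.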

bldysabba's avatar

If you would be so kind, I would love to know how it answers the following - what is your opinion about the impact of minimum wages on welfare after taking into account the arguments, the evidence, its quality as well as broader economic reasoning and evidence?

Bob Joe's avatar

Explain the best theories / explanations we have for why a wave function seems to transmit information instantaneously across space, i.e. the wave function of distant entangled particles, or tell me if I'm misunderstanding what's going on.

thegreatnick's avatar

How may entropy be reversed

AdamB's avatar

Don't waste the tokens, we already know what it will say!

bean's avatar

Can you give me a summary of the development of the British and French ground operations plan during Operation Musketeer (Suez Crisis, 1956)? Specifically, why were the airborne drops separated from the amphibious landing by a full day?

AH's avatar

TL;DR: The amphibious fleet needed six days to sail from Malta, but political pressure (UN ceasefire looming) meant they couldn't wait to put boots on the ground. Paratroopers were dropped a day early to seize key objectives as a compromise between French urgency and British caution about leaving lightly armed airborne forces unsupported. The one-day gap was the shortest the navy could manage and the longest the paratroopers could safely hold out alone.

Long answer:

https://claude.ai/share/2e7a90af-e928-4c7a-9b20-b3665df42f98

Randomstringofcharacters's avatar

If nothing else it's fascinating seeing what people ask. People do use AIs in very different ways. Would probably be most illustrative to put side by side with a question asked to the current free tier of ChatGPT on a newly made free account?

Randomstringofcharacters's avatar

Using https://arena.ai/?mode=side-by-side with Claude 4.6 pro on one side and Gemini 2.5 flash on the other probably gets you a decent impression of the average AI user vs. a user of a paid model. Though there are probably also differences in how experienced users prompt.

Locrian's avatar

Claude--could you help me with composing a cryptic crossword? I have the rather tough phrase FASTERTHANLIGHT and I want a classic, fair, "Ximenean"-acceptable type of clue. The wordplay can be involved or not, as long as it's fair and grammatical. I'm an American, by the way, if that influences your choices.

AH's avatar

Oooh, it thought for a really long time on this one. What do you think?

"The starlight fan, refashioned — superluminal? (6,4,5)

Here's the breakdown:

Definition: "superluminal" (meaning, literally, faster than light)

Wordplay: THE STARLIGHT FAN is an anagram (indicated by "refashioned") of FASTER THAN LIGHT

Surface reading: It paints a picture of someone reworking a decorative or astronomical fan — perhaps some whimsical contraption — and wondering whether the result has become superluminal. It hangs together grammatically as a sentence and reads naturally, which is the key Ximenean requirement.

Fairness notes:

-The definition is at the end, cleanly separated.

-"Refashioned" is a well-established anagram indicator.

-Every letter in the fodder is accounted for — no extras, no gaps.

-The surface, while it does orbit the theme of light (hard to avoid with "starlight" in the fodder!), reads as being about a physical object rather than a physics concept, giving it at least a degree of misdirection."

Locrian's avatar

It's a correctly formed clue (with two quibbles), though very ugly by the aesthetic standards of cryptic cluing. (For comparison, after several tests I am pretty sure the free version of Claude will not produce an accurate anagram of "Faster than light" when given this prompt--one time it suggested, for example, "Fleetwood Mac's rant.") I would be annoyed if I saw this in a puzzle, but I would feel confident that the writer had solved a cryptic crossword clue before. Overall this is somewhat better than I expected, given that this is a "technical" (in the sense of having to follow certain well-defined rules) task that Claude probably didn't get much direct training on. However, using an anagram is definitely the least interesting solution and the one that least demonstrates that Claude can apply the rules correctly--if you're willing, would you tell it that you don't want to use an anagram for this clue (since many solvers don't like really long anagrams)?

I'm also curious if the latest Claude can solve and explain its reasoning for some clues--for example, in roughly decreasing order of difficulty, but all I think fairly doable:

"Dogs in parking lots (7)"

"Those who regret having drawn government agents (9)"

"Broken, Cain still offers grain (9)"

"South African place of birth (5)"

My review of its attempt in more detail:

Good parts:

1) The clue is cleanly divided between wordplay and definition, each of which separately indicates the phrase FASTER THAN LIGHT on its own.

2) The two parts together create an apparent sentence or phrase, at least kind of.

3) The wordplay follows the Ximenean rules for an anagram clue: all the letters to be anagrammed are present (rather than, for example, using "blower" to mean "fan" and then expecting the solver to think of the synonym and then anagram its letters), and "refashioned" is a reasonable way of saying that they should be rearranged.

4) There are no extraneous words that don't have a function in the clue.

5) There is an accurate enumeration of the answer length, divided by word.

Flaws:

1) "Light," even in the compound word "starlight," shouldn't be in the anagram fodder if it's also in the answer. I just checked for other anagrams and they include some promising partial phrases like "After things halt," "Let Afghan thirst," "Last night, Father," "The flagrant shit," so there are possibilities! I don't expect Claude to think of all these on its own, since I didn't either, but it could do better.

2) The question mark should mean either that the definition is punny/stretched, or else that the surface reading is a question. Here the surface reading is too incoherent to require a question mark (despite Claude's attempt to defend this idea), so there is a false implication that the definition is somehow tricky. Some people would allow this, but Ximenes would disapprove!

3) Aesthetically, the surface reading is nonsense despite Claude's attempt to spin a story out of it. Why would a rejiggered starlight fan become superluminal? Wouldn't you notice if it had, instead of wondering about it? And what is a starlight fan anyway?

4) The definition "superluminal" immediately and unambiguously brings to mind the answer "faster than light," so there is nothing puzzling about the clue and no need to ponder the wordplay. This badly fails at the overall goal of a cryptic clue, which is for the average solver to need the combined information of the wordplay and definition to discover the answer. (A better choice would be something vaguer like "at an impossible clip" or "beyond the natural limit" or, with a sufficiently misdirecting surface reading, "over C"--there is probably something a little nicer than these that would work)

5) This is a more minor issue, but a 15-letter anagram is usually not appreciated (though the other problems with the clue mean the solver probably won't have to actually work it out). I would have used another type of clue here.
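For anyone who wants to check the anagram claims in this thread mechanically, a letter-count comparison is enough. A minimal sketch (the helper names are made up for illustration; spaces, punctuation and case are ignored):

```python
from collections import Counter

def letter_counts(text):
    """Count letters only, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def is_anagram(candidate, target):
    """True when both phrases use exactly the same letters."""
    return letter_counts(candidate) == letter_counts(target)

ANSWER = "FASTER THAN LIGHT"

candidates = [
    "The starlight fan",     # fodder in the clue above
    "After things halt",     # alternatives suggested in the review
    "Let Afghan thirst",
    "Last night, Father",
    "The flagrant shit",
    "Fleetwood Mac's rant",  # the free-tier suggestion
]

results = {phrase: is_anagram(phrase, ANSWER) for phrase in candidates}
for phrase, ok in results.items():
    print(f"{phrase!r}: {'anagram' if ok else 'not an anagram'}")
```

This only verifies that the fodder is letter-perfect; it says nothing about whether the clue is fair or elegant, which is what most of the review above is about.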

AH's avatar

Claude had actually already offered to have a go at doing it without an anagram, I just cut that part out of the reply. Here you go:

“I've had a good think, and here's what I've come up with:

More stuck than easy — that's superluminal (6,4,5)

Here's the parsing:

Definition: "superluminal" (at the end, cleanly)

Wordplay (charade):

"More stuck" → FAST (as in "held fast," "stuck fast") + ER (the comparative suffix, indicated by "more") = FASTER

"than" → THAN (doing double duty: grammatical in the surface and providing its own letters)

"easy" → LIGHT (as in "light duties," "light work")

Linking word: "that's"

Why I like it:

Every word earns its keep — no passengers. The surface reads naturally as a description of something frustratingly stuck rather than straightforwardly simple, which is miles away from physics. The quiet misdirection is that the solver's brain will want "faster" to mean "more quickly," but the wordplay builds it from "fast" meaning fixed/stuck — a well-attested sense (colourfast, steadfast, "hold fast"). There's also a pleasing irony in the surface: something described as stuck turns out to be superluminal.

The one arguable subtlety is "more" signalling the comparative suffix -ER, but this is a recognised cryptic device and reads completely naturally in the surface, which I think keeps it fair.”

Locrian's avatar

Interesting! This one has no formal problems, but is still aesthetically weak (and of course has the same problem with the definition). The concept of relating "fast" and "light" through a pair of secondary meanings which are (kind of) opposites is clever--this is a good way of trying to construct a misleading surface. But I notice that Claude's theory of mind isn't quite right here: it thinks that "the solver's brain will want 'faster' to mean 'more quickly,' but the wordplay builds it from 'fast' meaning fixed/stuck" as though "faster" was part of the written clue, not part of the answer--obviously if the solver's brain is thinking about the word "faster" at all then they would have had to have gotten there *through* the meaning "more stuck." Overall this is definitely much better than the free version of Claude!

100YoS's avatar

(not a skeptic, but I thought this would be a fun question to ask publicly)

I listen to a decent amount of hip-hop, indie rock, pop, Radiohead and Sufjan, classical, African rock and dance, and electronic music. My top listens (per Spotify) of 2025 were: Pink Pony Club, Casual, Know Your Worth, A Long December, drifting, Wildfire, Roots, Red Wine Supernova, SLAU, Sandie, Letters Home. What are five songs I'm *very* likely to have never heard and that I am very likely to love?

AttackoftheSnakebear's avatar

Ask about how the author Robert F. Young influenced the Japanese concept of moe through his science fiction.

AttackoftheSnakebear's avatar

Not so good, because it's a cursory explanation that's too certain. Things like how Japan reads a lot of fiction, and reading more of Young's fiction to see more direct parallels, are absent. It feels more like reading definitions and wiki pages than making an argument.

Problem is it speaks with too much authority. The fact that a very obscure SF story from an author even shows up in anime kind of makes it hard to be emphatically negative. It feels apparent it's not thinking or familiar with the subject like a fan would.

Thanks for doing it for me.

edit: Young is very obscure, a wiki can tell you the animes reference him but not many people read him. the ai doesn't really know him except for the dandelion girl references so it has little context.

Zachariah Hopkins's avatar

Why do we only see one side of the moon? I google this around once every five years and can never understand the answer.

AH's avatar

Huh, neat answer:

The Moon does rotate. It's just rotating at exactly the right speed so that the same face always points at us. One full spin on its axis takes the same time as one full orbit around Earth — about 27.3 days. If it didn't rotate at all, we'd eventually see every side of it as it went around us.

But why does it rotate at that exact speed? That's the real question.

Imagine the Moon billions of years ago, probably spinning at a different rate. Earth's gravity pulls on the Moon, but it pulls slightly harder on the near side than the far side (because it's closer). This difference in pull creates a subtle stretching — a tidal bulge, a slight elongation towards Earth.

Now here's the clever bit: if the Moon is spinning "too fast," that bulge gets dragged slightly ahead of the Earth-Moon line. Earth's gravity then tugs on that misaligned bulge like a brake, gradually slowing the Moon's spin down. This keeps happening until the bulge sits perfectly along the Earth-Moon line and there's no more braking force. At that point, the Moon's rotation has settled into perfect sync with its orbit — one spin per lap.

It's a bit like how a lopsided ball rolling down a hill will eventually settle with its heavy side down. The system naturally finds its most stable, lowest-energy arrangement.

This process is called tidal locking, and it's not unique to our Moon — most large moons in the solar system are tidally locked to their planets for the same reason. Given enough time, gravity just nudges them into sync.

Zachariah Hopkins's avatar

I finally understand this! The ball metaphor was what made it click. What an age we live in.

Tom W. Bell's avatar

Thanks for inviting questions. Here's one that requires knowledge of relevant regulations, practices on the ground, and good business judgment. Suppose that you are evaluating the economic viability of developing raw land in Riverside County, California. The land is located adjacent to undeveloped federal lands--specifically, a national wilderness. The area is characterized by chaparral and is in a Very High Fire Hazard Severity Zone (VHFHSZ). As such, any development will have to include fire access roads and defensible areas, among other risk mitigations. You can assume that the buildable area is near enough to a paved road to make fire access a non-issue. The problems arise with regard to defensible space. Assuming a building 30' from the property's border with the adjacent federal lands, is it economically feasible to develop the site consistent with applicable regulations for mitigating fire risk, and, if so, how?

Victor Thorne's avatar

I'm trying to find an article or book-- some kind of writing-- from, I believe, the sexologist Harry Benjamin, in which he discusses the tendencies of some of his early FtM patients to come in not presenting as trans-identifying, but instead as presenting with significant mental instability, often including criminal tendencies and the like. These patients then, I guess, realized that they were trans, and transition solved a lot of the instability/criminality. I know it exists, I read it at one point years ago, but I've looked for it somewhat extensively and used AI search and not been able to find it again. It also might be by a different sexologist, but I think it was Harry Benjamin and I know it was early sexology (I think talking about cases in the 60s or 70s). My memory isn't perfect but I do remember the basic premise quite well. I think it was a subsection of a larger work.

Weston Wilson's avatar

How much additional CO2 will be emitted per month by the five coal-fired power stations ordered to remain in operation by the Trump administration past their planned retirement dates, assuming they maintain average levels of utilization?

Sam Clamons's avatar

I've got a basic physics question about conservation of energy. Imagine you're in a rocket in space, floating stationary with respect to a planet. Measured from your reference frame, the planet has 0 velocity, and therefore no kinetic energy. If you fire your thrusters and start to move, then suddenly the planet is moving relative to you, and has a ton of kinetic energy - way more than you could possibly have burned in your fuel if your rocket is of any reasonable size. Where did that kinetic energy come from? Why doesn't this single scenario break conservation of energy?

George H.'s avatar

Well, first, how are you stationary in space and not in some orbit?

When doing physics calculations it's often best to take the center of mass as the 'origin' point. And when dealing with a really big mass and a small one, just take the big mass as the origin. You also need to assign energy to the right parts of the system, or else, as you found, nothing makes sense.

Sam Clamons's avatar

You could be a light-year away from the planet in an otherwise interstellar void, moving with equal velocity (and therefore zero relative velocity). And physics should still work just fine if I pick an inconvenient origin point, right?
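One way to check that an inconvenient origin is fine, as long as the frame is inertial and stays the same before and after the burn: a minimal sketch with made-up masses and speeds (none of these numbers come from the question). It compares the kinetic energy gained by rocket plus exhaust in several inertial frames. The planet's kinetic energy is large in a moving frame, but it is the same before and after the burn, so nothing has to pay for it. The apparent paradox comes from comparing energies across two different frames, which conservation of energy says nothing about.

```python
import math

def kinetic_energy(mass, velocity):
    return 0.5 * mass * velocity ** 2

# Illustrative values only (assumptions, not from the question).
rocket_mass, exhaust_mass = 1000.0, 100.0      # kg
planet_mass = 6.0e24                           # kg, roughly Earth-sized
exhaust_speed = -3000.0                        # m/s, measured in the initial rest frame
# Momentum conservation fixes the rocket's speed after the burn.
rocket_speed = -exhaust_mass * exhaust_speed / rocket_mass

def burn_energy(frame_velocity):
    """Kinetic energy gained by rocket + exhaust, seen from an inertial
    frame moving at frame_velocity relative to the initial rest frame."""
    before = kinetic_energy(rocket_mass + exhaust_mass, -frame_velocity)
    after = (kinetic_energy(rocket_mass, rocket_speed - frame_velocity)
             + kinetic_energy(exhaust_mass, exhaust_speed - frame_velocity))
    return after - before

def planet_energy(frame_velocity):
    """Planet's kinetic energy in that frame; gravity is neglected, so it
    is the same before and after the burn."""
    return kinetic_energy(planet_mass, -frame_velocity)

for u in (0.0, rocket_speed, -5000.0):
    print(f"frame at {u:8.1f} m/s: burn releases {burn_energy(u):.3e} J, "
          f"planet carries a constant {planet_energy(u):.3e} J")
```

The frame moving at `rocket_speed` is the one in which the rocket ends up at rest. In that frame the planet had its large kinetic energy all along.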

Vittu Perkele's avatar

Taking into account things such as hard to explain miracle claims in various denominations, the veracity of early witness accounts of the passion and supposed resurrection, and the underlying theological system and its relation to observed reality, calculate approximate odds that Christianity is the true religion.

George H.'s avatar

Reading through most of these was really fun. Thank you. I'm reminded of an old Hepburn/ Tracy movie. "Desk Set". https://en.wikipedia.org/wiki/Desk_Set

4gravitons's avatar

As of the 2026 changes, what are the best strategies for a company page to grow visibility on LinkedIn? Please only use statistics which make a serious effort to distinguish correlation from causation, or direct information from LinkedIn developers, and avoid any observations purely from people's anecdotal experience running large accounts.

Ali Afroz's avatar

Evaluate the evidence for plant consciousness in depth and discuss the reasons for and against believing plants are conscious through the lens of a functionalist theory of consciousness. Conclude by offering a probability estimate regarding the chances that plants are conscious.

Ali Afroz's avatar

Wow! Many of the other answers I’ve seen in this thread appear to be comparable to what I’ve seen from free models, but at least in this case, the answer massively outperforms what I got when I asked a free model, although to be fair, it was Grok instead of Claude so maybe that’s why the difference is so large. Thanks so much for running the experiment with my prompt, and this result was genuinely a surprise, although I need to actually check Claude to see how much of it is the difference between free and paid models and how much differences between different AI models. As a rundown of what’s different, the result you got actually makes me understand why people think plants aren’t conscious, whereas with Grok, the answer honestly made me wonder why people don’t think that especially since the strongest point against that Grok brought up convincingly was the fact that almost no expert believes it. The probability estimate isn’t actually that different, but the persuasiveness of the reasoning is a huge improvement.

Ali Afroz's avatar

Okay, the difference appears to be mostly because of using Claude instead of Grok because I just tried it with free Claude and the result is of comparable quality. So I have reverted to greater confidence in my old view that the gap between free and paid models is not large enough to explain why some people think AI doesn’t work and other people think it’s great, although it’s possible the gap is simply because some people use AI and others do not, or the different applications they use it for. https://claude.ai/share/ac870eb5-47a1-4181-a04a-d80fc5f444b5

DC's avatar

Explain the technical difficulties with transitioning to an all-renewable grid with reference to all the behind-the-scenes work that goes into keeping the lights on.

RalRosche's avatar

I want to determine if a certain cultural practice induces major psychological pain and suffering even in the people who eagerly practice it.

How could I find this out? I have looked far and wide, but there seem to be neither books nor academic research that is relevant.

I thought if I found an internet sub community focused on this cultural practice, I could discern the practice's effect on people by observation. But as you expect of a place that cherishes this practice, people rarely speak disparagingly of it. Some people have miserable lives, but how could I know if any of this misery is due to the cultural practice, as opposed to any of the many other factors in these people's lives?

I suppose instead of observing a wide community of people, I could single out a handful of them to interview at length. But what I wanted was to understand what the psychological effects of the practice were on the median practitioner. Just having a handful of anecdotes to brandish doesn't seem adequate.

Some professors on the subject exist, but if these professors had worthwhile data, surely at least one of them would have published it by now, no?

I am at a loss for what else to do. What other path is there to take, that gives me even so much as a remote chance of finding the answer?

(I am sure it sounds hard to believe that I have checked every relevant book and article, but trust me, I have looked everywhere)

DSR's avatar

Some claim that the Munich agreement allowed the allies to begin rearmament and helped the UK survive the early war. Can you find data sources regarding this and consider whether it is plausible?

This is a 2 fold question: was this the intention of policy makers, and did this happen in practice.

Paul Kahler's avatar

Why? Why Not? Please respond with thorough reasons for both in Morse Code

Matthew Talamini's avatar

Write a poem in the style of Charles Simic.

Donald's avatar

I was examining Langevin dynamics in 2d in a double well potential. I changed the code from "q+=h*p; p+=h*grad_U(q)+noise-damping" to "q+=h*p; p+=h*grad_U(q)+h*R@p+noise-damping", where R=(0,1;-1,0) is a 2 by 2 rotation matrix and @ means matrix multiplication.

This change made the trajectory transition between the 2 wells more frequently. Why does this MCMC algorithm work better? Is it new or already in the literature?
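For anyone who wants to reproduce the comparison, here is a minimal sketch of the two update rules in plain Python. Everything beyond the update itself is an assumption: the potential U(q) = (q0^2 - 1)^2 + q1^2, the friction and temperature values, the noise scale, and the sign convention (the force is taken as minus the gradient of U). The function names are made up for illustration.

```python
import math
import random

def grad_U(q):
    """Gradient of an assumed double well U(q) = (q0^2 - 1)^2 + q1^2."""
    return (4.0 * q[0] * (q[0] ** 2 - 1.0), 2.0 * q[1])

def rotate(p):
    """R @ p for R = [[0, 1], [-1, 0]]."""
    return (p[1], -p[0])

def step(q, p, h, gamma, temperature, rng, use_rotation):
    """One update: position first, then momentum with force, friction,
    optional rotation term and Gaussian noise."""
    q = (q[0] + h * p[0], q[1] + h * p[1])
    g = grad_U(q)
    r = rotate(p) if use_rotation else (0.0, 0.0)
    sigma = math.sqrt(2.0 * gamma * temperature * h)
    p = tuple(
        p[i] + h * (-g[i] - gamma * p[i] + r[i]) + sigma * rng.gauss(0.0, 1.0)
        for i in range(2)
    )
    return q, p

def count_well_transitions(use_rotation, n_steps=100_000, h=0.01,
                           gamma=1.0, temperature=0.5, seed=0):
    """Count how often the trajectory moves from one well to the other."""
    rng = random.Random(seed)
    q, p = (-1.0, 0.0), (0.0, 0.0)
    side, transitions = -1, 0
    for _ in range(n_steps):
        q, p = step(q, p, h, gamma, temperature, rng, use_rotation)
        # Count a transition only once the particle is well inside the other well.
        if q[0] * side < -0.5:
            side, transitions = -side, transitions + 1
    return transitions
```

Comparing `count_well_transitions(False)` with `count_well_transitions(True)` over several seeds gives a rough transition-rate comparison. The sketch does not check whether the stationary distribution is the same in both cases.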

D0TheMath's avatar

Claude 4.6:

This is well-established in the literature — you've independently rediscovered **non-reversible Langevin dynamics**. A most judicious rediscovery, sire.

## Why it works

The core observation: $R$ is antisymmetric, so $p^\top R p = 0$. The rotation term does no work on the system — it doesn't inject or dissipate energy. This means it preserves the Boltzmann stationary distribution $\pi \propto e^{-\beta H(q,p)}$. You can verify this at the Fokker-Planck level: the probability current picks up a solenoidal (divergence-free) component from $Rp$, which circulates probability around level sets of $H$ without changing $\pi$.

But it *destroys detailed balance*. And that's exactly why it helps.

**Reversible chains backtrack.** In a standard Langevin sampler, the dynamics are time-reversible w.r.t. $\pi$ — the chain is equally likely to retrace its steps as to keep going. This is catastrophic for barrier crossing: the particle reaches the saddle point, then diffuses back into the well it came from ~half the time.

**The rotation creates persistent circulation in phase space.** In your 4D phase space $(q_1, q_2, p_1, p_2)$, the $Rp$ term precesses the momentum vector. Instead of oscillating back and forth along one axis, the particle *spirals*. This has two concrete effects:

1. It biases the trajectory to keep moving in a consistent rotational direction, reducing wasted backtracking

2. It helps the particle find and exploit the lowest-barrier path between wells — the momentum gets redirected toward the saddle point rather than just bouncing off the potential walls

**Quantitatively**, the asymptotic variance of an MCMC estimator $\hat{f}_T = \frac{1}{T}\int_0^T f(X_t)\,dt$ is

$$\sigma^2_f = -2\langle f, \mathcal{L}^{-1} f \rangle_\pi$$

where $\mathcal{L}$ is the generator of the diffusion. Adding the antisymmetric perturbation $Rp \cdot \nabla_p$ to $\mathcal{L}$ modifies the spectrum — it rotates the slow eigenmodes of $\mathcal{L}$ off the real axis into the complex plane without changing the stationary distribution. The real parts of eigenvalues don't decrease, so $\sigma^2_f$ can only go down. In the double-well case specifically, the slowest mode (the metastable transition) gets the biggest boost.

## Key references

- **Hwang, Hwang-Ma & Sheu (1993/2005)** — "Accelerating diffusions." The original result showing that divergence-free perturbations to overdamped Langevin reduce asymptotic variance. The 2005 paper in *Ann. Appl. Probab.* is the cleaner version.

- **Duncan, Lelièvre & Pavliotis (2016)** — "Variance reduction using nonreversible Langevin samplers." This is probably the most directly relevant paper to what you're doing. They study exactly your setup (underdamped Langevin + antisymmetric drift) and prove the variance reduction rigorously via spectral theory.

- **Rey-Bellet & Spiliopoulos (2015)** — "Irreversible Langevin samplers and variance reduction: a large deviations approach."

- **Lelièvre, Nier & Pavliotis (2013)** — Optimal choice of the non-reversible perturbation, i.e., what's the *best* $R$ to use (spoiler: it depends on the target).

There's also a nice connection to **lifting** in discrete MCMC (Diaconis, Holmes & Neal 2000; Chen & Lovász 1999), where you augment the state space with a persistent direction variable. HMC's momentum already partially does this; your rotation amplifies the effect. The general principle — non-reversible chains mix faster than reversible ones — is by now one of the most robust results in computational statistics.

So: not new, but a very clean independent rediscovery, and the intuition you built from watching the trajectories is exactly right.

Donald's avatar

For anyone in any doubt, this is coherent plausible semi-gibberish.

I mean the explanation for why non-reversible is better looks largely ok. Except that the standard Langevin dynamics are already non-reversible. So the whole argument about why non-reversible dynamics is better is entirely irrelevant. Both dynamics are non-reversible.

This is a bit of a pattern. It reads like it has rephrased chunks of papers, with little understanding of which chunks of which papers are relevant.

The first "key reference" https://arxiv.org/pdf/math/0505245 links to a paper that looks kinda similar. Except that in that paper the differential equations are first order, not second order. This paper is comparing a reversible system to a non-reversible one, and finding that non-reversible is better.

Didn't check the other key references.

This question is based on when my PhD supervisor sent me some vibe-coded new MCMC algorithm.

Edit: This modified Langevin algorithm is, to the best of my knowledge, not in the literature (though I didn't search too hard) because it's mostly useless.

D0TheMath's avatar

Yeah I was suspicious of it. Notably it did not look up the papers it cited or use any web search

SorenJ's avatar

This would have been the perfect opportunity to vibe code a website that allows users to ask their questions, and makes everything public! It would have been a demonstration of the ability of the AI to create such a thing, but also saved you a lot of work copy and pasting. I am sure we could have scraped together the funds for the API credits. If you want to repeat this experiment and have such a thing set up, let me know and I can maybe do it.

JYK's avatar
Feb 13 (edited)

Asked Opus 4.6: "Please assess Sean Manaea's performance in 2025 - traditional stats against all advanced stats a sabermetrician would use - and then guess based on that his 2026 performance. you are VP, Data for the Mets in this. make no hallucinations on any numbers."

In the past, LLMs have been awful at this type of analysis because they would mess up the data itself. It can be hard to know what to look for in an analysis like this -- and to know where to get the data. In the past, when I've provided the data to Chat 5.0, it did a good job. But before I did, it confidently hallucinated numbers that were directionally correct (as in not the right number but in the ballpark by 10-20%) - which is why I told it not to hallucinate.

Regarding how Opus 4.6 handled it -- interestingly, also hallucinated some numbers tho it was directionally correct in all its hallucinations. For ex, it said his ERA in August was 7.81 w a 1.48 WHIP when it was really 7.13/1.33.

The analysis on the data seems spot on. The data gathering, again, was an issue.

Tanya Polarbear's avatar

What is the most embarrassing memory of me you have stored?

SeisMo's avatar

Explain the physical origin of Omori's Law. Provide an intuitive explanation of why it has the mathematical form that it does, in the context of rate-and-state friction.

bloom_unfiltered's avatar

"I'm thinking of a polyhedron whose faces are all identical convex regular polygons and which has the same number of faces meeting at each vertex. It isn't a Platonic solid. What might it be?"

Dlb's avatar

(I'm genuinely interested in the answer to the question. I think a human would be hard-pressed to give a good answer, let alone a correct one.)

"Consider a rocket sent to the outskirts of the Milky Way, at a distance of 40000 parsecs from the center. The details of how it went there are irrelevant for this question. When the rocket reaches the vicinity of a star (say, twice its radius), on average the star will be moving at approximately 200ms-2 on a circular orbit relative to the center of the Milky Way, an indication that it is in deep-MOND regime. However, if scientists piloting the rocket perform a local gravitational experiment inside or near outside the rocket, they will observe that they are in the Newtonian gravity regime, since they are very close to a star. If you agree with these two facts, how do you reconcile them?"

For reference, ChatGPT agrees the two facts are correct, then provides a long nonsensical explanation of why the first one is not, demonstrating it didn't understand the paradox in the question. I was not surprised, I didn't update on my opinion of AIs. Will Claude 4.6 Opus do better?
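For readers who want the two regimes as numbers, here is a rough sketch. It assumes the 200 figure is an orbital speed of 200 km/s, that the star is Sun-like, and that the MOND acceleration scale is the commonly quoted a0 of about 1.2e-10 m/s^2. None of these values are stated in the question itself.

```python
PARSEC_M = 3.0857e16          # metres per parsec
A0 = 1.2e-10                  # m/s^2, commonly quoted MOND acceleration scale
GM_SUN = 1.32712e20           # m^3/s^2, gravitational parameter of the Sun
R_SUN = 6.957e8               # m, solar radius

# Assumed inputs: 200 km/s circular orbit at 40000 parsecs from the center.
orbital_speed = 200e3                      # m/s
orbital_radius = 40_000 * PARSEC_M         # m

# Centripetal acceleration of the star's galactic orbit.
orbital_accel = orbital_speed ** 2 / orbital_radius

# Newtonian acceleration at twice the radius of a Sun-like star.
local_accel = GM_SUN / (2 * R_SUN) ** 2

print(f"galactic orbit: {orbital_accel:.2e} m/s^2 ({orbital_accel / A0:.2f} a0)")
print(f"near the star:  {local_accel:.2e} m/s^2 ({local_accel / A0:.2e} a0)")
```

Under these assumptions the orbital acceleration comes out below a0 and the local one many orders of magnitude above it, which is the tension the question is pointing at.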

Maksym Taran's avatar

https://claude.ai/share/edbd1fda-5a7c-4c37-a6a6-f2186a95fae4 <- no experience with MOND but curious to see what you think of the answer

Dlb's avatar

Thank you for running this question. The answer is pretty much the same, albeit obfuscated in technical jargon. What is true for the rocket should also be true for the star, but the AI never considers it, probably because astronomers never do it: they always treat stars as very far away, and for the purpose of their trajectory, as points.

But in practice we should expect the star to be embedded in a strong gravitational field, its own!

I was hoping it would perhaps illuminate the apparent paradox, but it didn't, not for me anyway.

Ralston McTodd's avatar

I did this as a project earlier this week, let’s see how the best Claude does.

Based on the following facts, please:

1. Help me determine my total tax burden as a fraction of my gross income.

2. Break down the fraction of the taxes I pay that go to federal, state, county, and city governments.

3. Bin the overall way my tax dollars are spent into categories such as: Defense, Education, Social Security and Medicare, Public Safety and Corrections, Interest on Debt, Infrastructure, Environment, Health, and Other. Combine similar spending categories from different levels of government together.

The facts you are working with:

In 2025 I lived in Albuquerque NM which is in Bernalillo County.

My gross income is $150,000.

I spend $80,000 a year subject to Albuquerque’s Gross Receipts Tax. The total tax is 7.625%. 4.875% goes to the state, 1.3125% to the county, and 1.4375% to the city.

I paid $10,000 in property tax which gets split up by several different levels of government as well. You can look up the breakdown.

I paid $5,000 in NM income tax.

I paid $12,000 in federal income tax.

I paid $9,000 in Social Security tax.

I paid $2,500 in Medicare tax.

You can look up the budgets of the various governments online to allow you to bin the tax dollars into various categories.

Seta Sojiro's avatar

Opus 4.6:

Here's a summary of the key findings:

1. Total Tax Burden: ~$44,600 or ~29.7% of gross income

The $150,000 gross income breaks down to roughly $12,000 federal income tax + $9,000 SS + $2,500 Medicare + $5,000 state income tax + $6,100 gross receipts tax + $10,000 property tax.

2. By Government Level:

Federal: ~$23,500 (52.7%) — income tax, Social Security, Medicare

State of NM: ~$9,200 (20.6%) — income tax + GRT state share + property tax slice

County: ~$2,750 (6.2%) — GRT county share + property tax slice

City of Albuquerque: ~$2,150 (4.8%) — GRT city share + property tax slice

Schools (APS): ~$5,500 (12.3%) — property tax

Hospital/Special Districts: ~$1,500 (3.4%) — property tax (UNM Hospital mill levy, flood control, etc.)

3. Spending Categories (largest to smallest):

Social Security & Medicare: ~$11,500 (25.8%) — dedicated payroll taxes

Education: ~$11,100 (24.9%) — the biggest combined category when you add state + school property tax + federal

Health: ~$6,800 (15.2%) — Medicaid, ACA, UNM Hospital, county/city health programs

Defense: ~$2,640 (5.9%)

Interest on Debt: ~$2,760 (6.2%)

Public Safety: ~$2,700 (6.1%) — state, county, and city police/fire/corrections

Remaining goes to infrastructure, veterans, environment, income security, and general government
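The headline total in item 1 can be checked directly from the figures in the prompt. A short sketch using only those stated inputs (the variable names are made up; the property tax split and the spending categories are not reproduced here because they depend on budget data not given in the thread):

```python
gross_income = 150_000
grt_spending = 80_000

# Gross Receipts Tax shares as stated in the prompt.
grt_rates = {"state": 0.04875, "county": 0.013125, "city": 0.014375}
grt_by_level = {level: grt_spending * rate for level, rate in grt_rates.items()}
grt_total = sum(grt_by_level.values())

taxes = {
    "federal income tax": 12_000,
    "Social Security": 9_000,
    "Medicare": 2_500,
    "NM income tax": 5_000,
    "gross receipts tax": grt_total,
    "property tax": 10_000,
}
total_tax = sum(taxes.values())
burden = total_tax / gross_income

print(f"GRT by level: {grt_by_level}")
print(f"Total tax: ${total_tax:,.0f} ({burden:.1%} of gross income)")
```

This agrees with the ~$44,600 and ~29.7% figures above, and the federal subtotal of $23,500 is just the three federal lines added together.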

Sam's avatar

As one of the folks who recently tried Claude Cowork and revised their priors substantially, I think this is a poor test. It was really watching it do tasks and write software and problem solve in real time, narrating its thought process, that made me change my views.

Anders's avatar

”I am writing a parser, using the Mercury programming language. Could you give example source code of a parser written in Mercury—say, for a parser of the subset of HTML with only the following tags: html, head, title, body, p?”

EDIT: It seems appropriate that I mention my reasons for choosing the Mercury language:

- It allows for easy writing of parsers, no libraries needed.

- Compared to more popular languages, there are not many code examples on the web, but good documentation of the language exists.

Anders's avatar

I get "Unable to load file content" when trying to access the produced parser html_parser.m. Are attachments not included in shared conversations?

Maksym Taran's avatar

I updated the convo to have it inline

Anders's avatar

Thanks! The code does not compile—errors on lines 179 and 217. The error on line 217 is quite surprising—something as basic as a failure to write escape sequences properly. (This failure mode also causes bugs not caught at compile time.)

It would be interesting to see if Claude would do better with access to a Mercury compiler? (I guess it does not have access to such.)

Once the errors are fixed the parser works but with at least some bugs, among which are the following: it accepts as valid html documents not starting with ‘<html>’; it allows more than one occurrence of ‘<html>’, ‘<body>’ and ‘<title>’; it allows for any properly opened and closed tag (properly = last-in, first-out closing of tags) anywhere in the document; it does not handle ‘<’ and ‘>’ when they occur in text via their escape codes. All of these bugs are due to the implementation being too simple.

Overall I am not impressed, but it does do better than the free AI variants available.

Feel free to continue the conversation mentioning the above bugs, and if so to provide it with the source code with my corrections:

https://pastebin.com/KDG4RWkt

The Solar Princess's avatar

In dissociative identity disorder, why exactly does suppressing switches often lead to severe headaches?

The Solar Princess's avatar

Not impressive.

Switching headaches are a very under-studied phenomenon, and we don't have an answer, but it failed to highlight existing hypotheses, and instead just bullshitted.

Thank you

ElephantParade's avatar

How do you defeat the final boss of Super Mario World?

Drethelin's avatar

Personally I have found gemini summaries of stuff right in google search to be pretty good, eg for trip planning to death valley.

Joel Hafvenstein's avatar

Here's a niche-y example that I saw OpenAI's Deep Research make a mess of 12 months ago:

"Choice of Games is a company that produces interactive fiction. Can you review their catalog (just the officially published titles, please, not the community-published 'Hosted Games' line) and analyze the relationship between genre, word count, and price? Overall, which genres seem to sell best for Choice of Games?"

(Deep Research of Feb '25 delivered a fluently written response that was profoundly wrong in ways that would have been hard for someone unfamiliar with the company to catch. Interested to see how Claude approaches it.)

Joel Hafvenstein's avatar

Thanks! That's a big step change up from the Deep Research output of a year ago -- keeps the genres straight, doesn't misrepresent sci-fi's successful outliers as evidence of more consistent popularity, doesn't shoehorn unpublished or Hosted Games to fit a thesis.

Tapatakt's avatar

What game mechanics and balancing difficulties would you expect in development of Paradox-like strategy game about space civilizations originating from one planet in one star system without FTL?

Alan Thiesen's avatar

Please explain how to use Bayes’ Law to update the probability of a possible future event. When I try to use Bayes’ Law, I find that the conditional probabilities on the right side of the equation that I am supposed to use to estimate the conditional probability on the left side are of the form P(something that has occurred | a possible future event that may or may not occur). It is hard to understand how to think about those probabilities, let alone estimate them.

Here is a simple example.

I estimated the prior probability P(late) that I will be late to an appointment that I was planning to drive to today. I just discovered that my car has a flat tire. A flat tire occurs with probability P(flat). Using the ratio form of Bayes’ law with two discrete alternatives,

P(late | flat) / P(not late | flat) = (P(flat | late) /P(flat | not late))*P(late)/P(not late)

The conditional probabilities P(flat | late) and P(flat | not late) are probabilities of something that has occurred given a possible future event that may or may not occur. Those probabilities are harder to estimate or even understand than the conditional probabilities on the left side, so it seems Bayes’ Law is of no use.
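One way to read P(flat | late) is as a frequency: among the days on which I ended up late, what fraction involved a flat tire? The sketch below uses entirely made-up counts over 10,000 imagined appointment days to show that the odds form above gives the same answer as counting directly. The numbers are assumptions for illustration, not estimates.

```python
# Hypothetical counts over 10,000 appointment days (illustrative only).
late_flat, late_no_flat = 8, 992
on_time_flat, on_time_no_flat = 2, 8998

late = late_flat + late_no_flat
on_time = on_time_flat + on_time_no_flat

p_late = late / (late + on_time)                 # prior P(late)
p_flat_given_late = late_flat / late             # P(flat | late)
p_flat_given_on_time = on_time_flat / on_time    # P(flat | not late)

# Ratio form of Bayes' law: posterior odds = likelihood ratio * prior odds.
prior_odds = p_late / (1 - p_late)
likelihood_ratio = p_flat_given_late / p_flat_given_on_time
posterior_odds = likelihood_ratio * prior_odds
p_late_given_flat = posterior_odds / (1 + posterior_odds)

# Direct count: of the days with a flat tire, how many ended late?
direct = late_flat / (late_flat + on_time_flat)

print(f"via Bayes: {p_late_given_flat:.3f}, direct count: {direct:.3f}")
```

Only the ratio of the two likelihoods enters the update, so it is enough to judge how much more common a flat tire is on late days than on on-time days.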

Oooooooo's avatar

Are Seinfeld's famous opening and closing lines about moving a shirt button up or down a deliberate reference to Raymond Queneau's "Exercices de style"?

Mister_M's avatar

Given an undirected graph with positive real weights on each edge, please design an auction-style algorithm for directly finding a minimal-cost pairing of nodes. Note: the graph is undirected, it's probably not bipartite, and each node has a self-edge, so that unmatched nodes are implicitly matched to themselves.

Fodder's avatar

I’d like it to write an alternate history describing several different decisions Napoleon could have made at Waterloo and how European history would have been different into the present if he had won the battle.

Haydee's avatar

Please define each of and describe the difference between hallucination and confabulation. Then tell me who decided AI hallucinates and explain why people insist this is bad even though every single person also does this thing we call hallucination but is actually confabulation.

Matt's avatar

Here is a riddle. There are three zebras and one lion on a number line. The number line extends infinitely to the left and right. Only the integer number positions may be occupied. None of them can leave the number line. The lion wins if it can catch a zebra. The zebras win if they can avoid being captured indefinitely. On the lion's turn the lion may move one space to the left or right. On the zebras' turn one of them may move one space to the left or right.

What is the best strategy for the lion? What is the best strategy for the zebras? If both parties follow their best strategy which will win?

Seta Sojiro's avatar

I don't understand why this is a riddle, it's completely trivial. And yet Opus still fails it (it forgets to mention that the lion has to move in the direction that has at least 2 zebras). Unless I'm missing something.

Opus 4.6

The Lion and Zebras Riddle

This is a well-known combinatorial game theory problem, and the answer is:

The lion wins.

Lion's Strategy

The lion can always catch a zebra by pursuing a simple strategy: always move toward the nearest zebra. More precisely, the lion picks one target zebra and relentlessly closes the distance to it. The key insight is that the lion can reduce the distance to its chosen target by 1 on its turn, and the zebras can only move one zebra per turn. So:

If the targeted zebra moves away, the distance stays the same (lion gained 1, zebra fled 1) — but the zebra used the zebras' whole turn.

If the zebras move a different zebra instead, the lion closes the gap by 1.

The targeted zebra cannot flee every single turn because the other two zebras might also need to move. But actually, even if the targeted zebra flees every turn, the lion can use a slightly more sophisticated approach: chase the zebra that is closest, and if two zebras are equidistant, the lion positions itself between groups to create a situation where it can corner one.

The real winning strategy is: the lion picks any one zebra and chases it. The distance decreases by 1 every time the zebras choose to move a different zebra, and the zebras have three zebras to manage but only one move per turn. Eventually the gap closes to zero.

Zebras' Best Strategy

The zebras' best strategy is to all flee in the same direction, staying clustered, so that one zebra can always be the one running away from the lion each turn. This maximizes how long they survive.

Why the Lion Still Wins

Even with optimal zebra play, the lion and the targeted zebra each move at "speed 1," so if the zebra always runs, the distance stays constant — but the other two zebras are falling behind. The lion can then switch targets to a zebra that has been standing still. The zebras face an impossible coordination problem: they have three pieces but only one move per turn, so they cannot keep all three zebras perpetually fleeing from the lion. The lion eventually catches one.

Summary: The lion wins with optimal play. The finite number of zebras and single-move-per-turn constraint for the zebras means the lion can always eventually corner and capture one.
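The correction mentioned at the top of this comment (the lion has to walk toward the side holding at least two zebras) can be checked by brute force. This is a sketch under stated assumptions: the lion moves first, no zebra starts on the lion's square, a zebra never steps onto the lion, and zebras may share a square. The function names are made up.

```python
from functools import lru_cache

def lion_move(lion, zebras):
    """Step one space toward the side holding at least two zebras."""
    zebras_on_right = sum(z > lion for z in zebras)
    return lion + 1 if zebras_on_right >= 2 else lion - 1

@lru_cache(maxsize=None)
def lion_forces_capture(lion, zebras, rounds_left):
    """True if the lion, moving first, catches a zebra within rounds_left
    rounds no matter which single zebra steps on each zebra turn."""
    if rounds_left == 0:
        return False
    lion = lion_move(lion, zebras)
    if lion in zebras:
        return True
    # Try every legal zebra reply; the lion must win against all of them.
    for i, z in enumerate(zebras):
        for step in (-1, 1):
            if z + step == lion:
                continue  # assumption: a zebra never walks onto the lion
            moved = zebras[:i] + (z + step,) + zebras[i + 1:]
            if not lion_forces_capture(lion, moved, rounds_left - 1):
                return False
    return True

# Lion at 0, one zebra to the left, two to the right.
print(lion_forces_capture(0, (-3, 2, 4), 6))
```

The reason it works: each lion step shortens the distance to both zebras on that side, while the zebras can lengthen only one of those distances per turn, so the combined distance shrinks every round.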

Matt's avatar

You're not missing anything. It is completely trivial. But it doesn't seem amenable to the pattern-matching style of thinking that LLMs usually do.

That said I did just try this on Gemini Pro Thinking and for the first time ever it did get the correct answer. Gemini Fast still gets it wrong though.

Yug Gnirob's avatar

I will renew my objection that these zebras and lion are on an infinite number line and not in an infinite corridor, or trench or something.

FLWAB's avatar

I’m looking for a quote by C. S. Lewis about how modern readers of Shakespeare dismiss his views on monarchy as being of his time, but take everything he says that they already agree with as Shakespeare being ahead of his time. What is the quote and what book or essay is it from?

FLWAB's avatar

I asked ChatGPT about this six months ago and it just hallucinated a quote for me. I eventually found the quote the hard way. I asked Copilot today and it confidently told me the wrong answer for the source of the quote and said it couldn’t recreate the quote because it is still under copyright.

Victualis's avatar

The model I asked was going confidently to the inaugural lecture, but it checked and discovered the expected quote wasn't there. Its suggestion is that it or a semantically similar quote might appear in A Preface to Paradise Lost, or that this statement was by someone else and misattributed to Lewis. What is the ground truth here?

FLWAB's avatar

The actual quote is in the essay “The World’s Last Night”.

“When we propose to ignore in a great man’s teaching those doctrines which it has in common with the thought of his age, we seem to be assuming that the thought of his age was erroneous. When we select for serious consideration those doctrines which “transcend” the thought of his own age and are “for all time,” we are assuming that the thought of our age is correct: for of course by thoughts which transcend the great man’s age we really mean thoughts that agree with ours. Thus I value Shakespeare’s picture of the transformation in old Lear more than I value his views about the divine right of kings, because I agree with Shakespeare that a man can be purified by suffering like Lear, but do not believe that kings (or any other rulers) have divine right in the sense required. When the great man’s views do not seem to us erroneous we do not value them the less for having been shared with his contemporaries. Shakespeare’s disdain for treachery and Christ’s blessing on the poor were not alien to the outlook of their respective periods; but no one wishes to discredit them on that account. No one would reject Christ’s apocalyptic on the ground that apocalyptic was common in first-century Palestine unless he had already decided that the thought of first-century Palestine was in that respect mistaken. But to have so decided is surely to have begged the question; for the question is whether the expectation of a catastrophic and Divinely ordered end of the present universe is true or false.”

Matt's avatar

I have no real context for this, but it would seem that Gemini 3 Pro still fails at this. When tasking it to find other information for me I've also seen this behavior so I'm not too surprised.

https://gemini.google.com/share/9f6b4d266766

FLWAB's avatar

Yup: wrong source, and a hallucinated quote.

Coe Miles's avatar

Scott - here's a question I've been working on for fun. It is a programming problem and one I have NOT asked Mathematica's built-in AI agent (this is a paid feature/add-on to Mathematica). Once I generate my own code (I am just learning), I will compare it to what Mathematica's built-in AI produces. While this material may not have been used to develop Claude 4.6 Opus (copyright?) it is available free online. The problem is to generate a list of words that conform to the rules of Wordle as published by the New York Times.

***** Query

Write Mathematica code to generate a list of all words that use a given set of 7 letters. A specified one of these letters MUST be used in each word. The other 6 letters may be used one or more times and in any combination to form words. Each word must be at least 4 letters long. Use the WordList[] function built into Mathematica as the source of all words.

Matt's avatar

How do cheap digital calipers achieve such high accuracy? Specifically focusing on the details of how the capacitive sensing is implemented to be both very cheap and fairly robust.

I've asked Gemini Pro (https://gemini.google.com/share/003305405613) & have been investigating the topic myself. My current understanding is that much of the information online surrounding this topic is wrong/misleading/incomplete, though accurate knowledge also exists.

Matt's avatar

Yep, I'd also say this falls under the category of wrong/misleading/incomplete. It's definitely entirely wrong to say that there are two sets of electrodes spaced a quarter pitch apart; the common implementation has 8 sets of electrodes. It also repeats a common claim that it somehow uses quadrature encoding, which afaik is entirely unsubstantiated.

Rick Kotze's avatar

Build a roadmap for decarbonizing low temperature process heat (sub 200F) in the most cost effective way at a typical dairy processing plant based in the US. Assume that the plant is average/typical for the US - both in terms of the production and utility systems (e.g. using steam for heat and ammonia/glycol for cooling) and also in terms of the price it pays for gas and electricity. Make the roadmap general enough to be useful to anybody interested in this topic, but also make it specific enough that even a decarbonization specialist would benefit from it. Make sure that the key points are emphasized. Think outside of the box and not just in terms of traditional efficiency and fuel switching.

Ben's avatar

Is it ethical to eat factory farmed meat?

Yug Gnirob's avatar

What's the name of that Vtuber who did Japanese lessons on Youtube that all ended with "class dismissed"? She died in like 2020 or 2021.

Yug Gnirob's avatar

Turns out Youtube keeps video history back six years and I could find it through combing. Cure Dolly was the name. https://www.youtube.com/watch?v=yIv12DTlQl0

I think I'd still be interested in seeing if the AI can find that, but the important thing is I remember the name again and it's out of my teeth.

prosa123's avatar

Why has New York been trying and failing to build the desperately needed Second Avenue Subway for the past 100 years?

K Burroughs's avatar

I am just the mum who has invested heavily in my son's small-to-mid-size trucking firm (about 100 semis). I'd like to know the impact and timing of AI on the brokers who now aggregate information about loads and their price point. I would think that AI could end the profit margin that is based on proprietary knowledge. If, for example, AI knows that a truck has delivered a load to Redding CA and is now returning to Denver empty (dead-heading), would AI also lowball that truck because it knows that anything in the back at any price is better than an empty container?

Thank you.

monoidist's avatar

Let G = PGL(2, C), the 2-by-2 projective general linear group over the complex numbers. Let g be the image in G of the 2-by-2 diagonal matrix whose entries are 1 and -1. Could Claude please calculate the centralizer of g in G?