One more: "The replacement administrator for Astral Codex Ten Discord identifies as female" is finally resolved. It was my honour to participate in zeroth and final stages of the selection conclave!
Well darn, even though this superficially changes nothing I think it prevents me from using this as an example of prediction markets being self-correcting to outside interference ever again.
You could start using it as an example of prediction markers *being* the outside interference. With this one being the largest one on the platform, more than a few selectors had a stake in the issue and openly admitted so.
Hahaha thankfully, nobody's ever asked to see my brier score before deciding whether I'm worthy of starting a startup. (spoiler alert, it's not that great)
I often think about this piece by Duncan Sabien: https://medium.com/@ThingMaker/reliability-prophets-and-kings-64aa0488d620. Essentially, if you make a statement about the future, there are two ways the statement could come true. Either you can be a _prophet_ aka superforecaster who is really good at modeling the universe; or you can be a _king_ aka highly agentic person who is good at making things happen.
I identify much more strongly with the latter, and I imagine most founders do as well~
... yeah, I think that's a reasonable point. I put a close date of 2030 on the question, and the "ever" was mostly for exaggeration (I basically think that if it hasn't happened by 2030, it'll never happen). But I've updated the description to indicate that if it's 2031 and Manifold is still around and still not worth $1B, it resolves to NO.
Of course, this then runs into the classic problem of "long-time horizon markets can be inaccurate, because it's not worth it for someone to lock up money for 10 years". We have some theoretical solutions to this (loans; perpetual swaps?) but we're still figuring out whether/how to prioritize those.
By options market you mean stock options, right? Prediction markets can cover basically every topic imaginable, ranging from politics and finance to science and sports.
I really like to use Futuur to make my predictions; they have some really nice markets, and I'm making good money there.
I was going to say "wouldn't the people betting against win if the company went under before reaching that valuation?" but then I realized... if they go under you won't be able to resolve at all, and if they haven't yet gone under, there's a chance it could happen in the future, lol. Yeah, that one should probably be removed or forced to be reworded or something.
500 million recorded cases of covid sounds low to me. Of course there may be twice (or more) as many unrecorded.
The 2% is crazy, yes. Omicron had an R value close to measles, and monkeypox is much harder to transmit.
LaMDA is impressive; arguably it passes the Turing test, although in a slightly uncanny-valley style. The analysis of the broken mirror was great, but the story about the owl facing down a lion was the lamest story ever.
That said, I’ve never believed that passing the Turing test proves intelligence. It’s a necessary but not sufficient condition.
You seem to have the common misunderstanding that the Turing test is "Can some human be fooled during casual conversation into thinking this computer is human?" ELIZA passed this test in the 60s. The actual Turing test is whether a computer program can behave indistinguishably from a human under expert questioning. LaMDA has come nowhere close to passing the Turing test. An expert questioner could very easily distinguish it from a human. The Turing test is sufficient (almost by definition) but not necessary to prove intelligence.
I see nothing in any description of the Turing test that I’ve seen which indicates that the questions have to be expert, or the interviewer expert. If anything that would be easier than a general conversation, anyway. A general conversation can go anywhere.
And as I said I don’t see the Turing test as a sufficient test (it is obviously necessary though).
Turing didn't explicitly call for an expert questioner, but it's clear he meant it to be a sophisticated adversarial conversation where the questioner was trying to out the computer. It's also clear that he, at least, understood the test to be sufficient but not necessary:
"May not machines carry out something which ought to be described as
thinking but which is very different from what a man does? This objection
is a very strong one, but at least we can say that if, nevertheless, a machine
can be constructed to play the imitation game satisfactorily, we need not be
troubled by this objection"
That it isn't necessary is easy to see by the fact that an overly honest AGI would immediately fail the test by admitting it was a computer. More generally a powerful AI that is obviously intelligent but nevertheless can't sufficiently imitate a human to pass the Turing test is easy to imagine.
I’m not sure what you mean by sufficient but not necessary. In any case, an ordinary conversation is as good a test as any, harder perhaps than an expert analysis, which can be rote. An AI that can converse at a dinner party is a harder feat than an expert system.
I think you misunderstand what I mean by an expert. I mean someone who is good at determining whether something is a computer or a human via questioning. See for example: https://scottaaronson.blog/?p=1858
As for necessary and sufficient, see Wikipedia:
"In the conditional statement 'If P then Q', Q is necessary for P, because the truth of Q is guaranteed by the truth of P (equivalently, it is impossible to have P without Q). Similarly, P is sufficient for Q, because P being true always implies that Q is true, but P not being true does not always imply that Q is not true."
Come to think of it: Was Weizenbaum's secretary the first person to be fooled in a Turing test, or were there earlier cases? Is the earliest known case of a person being fooled by a simple program into thinking they were interacting with a person documented somewhere?
That's an interesting question. I don't think Weizenbaum's secretary would technically count since she knew it was a computer program before talking to it:
"his secretary, who was well aware of his having programmed the robot, asked him to leave the room so as to converse with it"
I'd note that humans aren't capable of consistently writing 10k lines of bug-free code from natural-language specifications. Certainly not without testing.
It could charitably be interpreted as "no catastrophic bugs that make the program essentially non-functional", but yeah, humans sometimes fail at that as well, at least on the first try.
I agree that that's more charitable, but even "non-functional" is very fuzzy. I use 'mostly functional' software regularly that's still, for specific cases, surprisingly non-functional.
And, just to make this all even harder to accurately judge, a lot of current human programmers seem to be pretty 'robotic', e.g. copy-pasting code directly from Stack Overflow.
It's hard to know what's an actually reasonable bar to demand an 'advance AI' to clear!
Humans also generally aren't capable of converting mathematical proofs into formal specifications. And they're not usually capable of drawing the kinds of pictures that even Mini DALL-E does pretty well. But I think the idea is that this particular task, while out of reach of humans, is the sort of thing that a computer that's equivalent to an OK human at a small portion of the task would be able to do just fine at. That is, the issues that humans have with this task are akin to the issues that humans have with multiplying ten-digit numbers, which a computer intelligence should be able to do just fine at.
A significant number of mathematical proofs turn out to be wrong, or missing key steps. Or they point to an idea that is powerful enough to prove the theorem, but the details of the proof are wrong. Having the ability to turn any published natural-language argument which convinces the mathematical community that a statement is true (i.e. a "published natural language proof") into a formal proof would require a very high level of mathematical insight. Sure, there are lots of irrelevant details which a machine would be better at, but the task very likely requires a high level of mathematical competence, for the aforementioned reasons.
Yeah, so the challenge here is that the computer or human needs to be able to figure out what the proof is trying to do at a given step, why the informal version went slightly wrong yet is still "morally right", and then figure out the correct formal way to specify what is really needed to make it work.
I wonder how you'd verify that to be honest, and how exhaustive the natural language specifications will be. There's a big difference between "write me a program that does X", and a list of five hundred requirements in the "shall do X and Y" format. Also, will the AI be allowed to ask clarifying questions? I almost think how it handles that would be a better test of its intelligence...
Agreed; if it wasn't clear, the point of my comment above was that in a lot of cases of software development the hard part is the communication and requirement refinement, not the actual writing of the code.
'Bug-free' is an excessively tall order but I presume the AI would have something along the lines of testing available to it. i.e. it would be able to run code and refine the final output based on the result. I expect this subproblem not to be the hardest part of the whole thing.
Yeah – without a 'crazy sharp' specification of what "bug-free" means (which would probably itself be 10k lines), that just seems like a bad criterion.
It seems to me like there's a BIG variation in the number of 'bugs' in even 'mostly _functional_' software.
Am I crazy, or are the Musk vs. Marcus decision criteria insane? Very few people could achieve all five, and I posit fewer than half could do even three. Further, "work as a cook in a kitchen" seems wrong: that feels very similar to self-driving AI, and few people would accept self-driving as an indicator of AGI.
I would start with asking:
* What criteria would the vast majority of people meet, that current AI does not?
* What are some examples of interesting novel ideas, and what are ways we can prompt humans to provide some?
* What sort of human behaviors rely on a world model? How could we ask an AI to demonstrate those behaviors? (I do think the novel/movie criteria fit this)
* How do humans generally describe qualia? How can we prompt an AI to describe its own qualia in a convincing way? (the way a machine feels should necessarily be different from how a human does)
Why do you see it as any harder than self-driving?
Sure, it would require the correct "interface" (i.e. body + sensors), but the intelligence behind that doesn't seem to require more than autonomous navigation.
I don't know how many cookbooks made it into the GPT-3 corpus, but I bet you could converse with it and get a pretty detailed description of how to go about executing on a recipe you hand it.
The big reason it's harder than self-driving is that there aren't a dozen major global corporations incentivized to pour billions into this over the next decade.
Perception, control and task specification are all much more challenging to get right in the kitchen. A car needs to be able to recognize a large-ish but discrete set of classes (car, bike, pedestrian, sign, etc.), it has relatively few degrees of freedom, and its high-level objective can be specified with a set of GPS coordinates. Meanwhile the kitchen robot needs to detect and reason about a much larger number of objects, including things that can be cut into pieces, things that can deform, and liquids. It also has to be able to perform precise-ish measurement of volume. Chopping, dicing, pouring all require a lot more control precision than a car. Then there's the question of how to tell it what to do. Working from a human recipe requires pretty complex language understanding, although we're getting better at this lately. You could also provide demonstrations, but these are a lot more expensive to produce, and come with added perception problems of figuring out how to map what the demonstrator is doing to what the robot should do. I guess the other alternative is to have an engineer sit down and hard-code each recipe, but that's even more obnoxious/expensive. All of this is assuming a robot using cameras with arms that have hands or tool attachment points, which is I think what we're all talking about when we say "work as a cook in a kitchen", and not some assembly line, which is obviously much easier.
Okay, I agree it’s likely harder. But I still don’t think it’s in a different class, even assuming the recipes didn’t need to be hard-coded.
I think providing enough demonstrations would be extremely expensive. Far, far more demonstrations were able to be provided to autonomous driving models, simply because there’s a huge data stream to pull from. If given that many cooking demonstrations, well mapped, I think a current gen AI could cook. (Again, given a reasonable interface, which I do agree would be harder).
The different class is real for me: a car exists, with very few degrees of freedom to control (in fact, three: wheel, accelerator+brake (you are not supposed to use them simultaneously), and gear stick or selector). Even if you count other non-essential controls, it's a very simple body which is already largely computer-controlled... A cook, on the other hand, is a human body with no current robotic alternative, not even any credible attempts.
Sorry, I still (respectfully) disagree. Even though a lot of data is used to train models that go into self-driving cars, nobody (that I know of) is doing this end-to-end (raw sensor data in -> controls out). All the learning that's happening is being used to train components of the system (mainly vision/perception) which are then consumed by good-old-fashioned-AI/robotics techniques which steer the car. Maybe there's some learned policies in there that can also decide when to switch lanes and whether it's safe to cross an intersection, but the point is that it's doing all of this in a model-based setting, where the system is building up a 3D representation of the world using primitives designed by a bunch of engineers, and then acting in it. It's possible to use model-based approaches here because again, the space of worlds that the robot needs to consider can mostly be constructed as a set of discrete objects. For kitchen robots, we have no ability to come up with a taxonomy of these environments. How do you model a squishy eggplant that you just baked and cut into pieces? How do you model a paste that you made by mixing six ingredients together? Don't get me wrong, fluid/FEM simulators exist, but then you also have to tie this to your vision system so that you can produce an instance of virtual goop in your simulated model of the world whenever you see one. People have been trying to do this with robots in kitchens for a long time, but the progress is not convincing. The fact that you can use model-based approaches for one and not the other places these squarely in two separate classes. Some robotics people would disagree with me and say that you can use model-based approaches in the kitchen too, and that we just need better models, but my point remains that it's not just a "we have data for cars, but not for kitchen robots" problem; they really are different problems.
Well said! Yes, the "cook" task requires a _very_ capable robot body, "control precision". Also, as one of the other commenters noted, "taste testing" is a problem... (gas chromatograph/mass spec + pH meter + ion selective meter for sodium might do most of that - but no organization with sufficiently deep pockets to pay for developing that has an incentive to do so)
Imagine how much harder it would be to invent self-driving cars if "cars" did not already exist as a standardized consumer good. The first AI chef project faces the significant obstacle of needing to invent a robot body, and the second AI chef project can't copy many of the first's AI techniques because they're using a substantially different robot.
> Sure, it would require the correct "interface" (i.e. body + sensors), but the intelligence behind that doesn't seem to require more than autonomous navigation.
Car navigation takes place in 2D space. Kitchens are 3D spaces. There are numerous problems that are tractable in 2D but intractable in 3D or higher dimensions.
On Marcus's criteria, I think most intelligent adults could do the first two, so you're right there. Depending on just what he meant by work as a cook in a kitchen, if we're not talking about sophisticated food, I'd think a fair number of adults could do it. After all, many adults have experience preparing meals for themselves and/or for a family, and I'm not talking about microwaving preprepared meals or working from meal kits. But that won't get you through a gig in an arbitrarily chosen high-end or even mid-range restaurant. Any cuisine? The last two would require difficult specialization. How many professional programmers have written 10K lines of bug-free code in a time-frame appropriate to the challenge?
Is there any good argument that “human level intelligence” actually means anything specific enough that people can agree on when we’ve hit it?
After all, some humans have successfully run for president. Would it be fair to say that, until an AI can successfully run for president, manage $10 billion worth of assets for market beating returns over 40 years, and compose a number one platinum bestselling record, it still hasn’t reached human level intelligence, since those are all things individual humans have done?
> Would it be fair to say that, until an AI can successfully run for president....
I think that's the wrong way to look at it. Basically every adult human can demonstrate "general intelligence", so I don't think there's a reason to hold the bar so high as this.
This is why I open with "What criteria would the vast majority of people meet, that current AI does not"?
Actually, new idea: what if I defined “human level intelligence” as: able to learn multiple novel, complex tasks with close to the same amount of training data as a human. E.g. 1) learn to drive in ~120h of driving-related training and 2) learn wood carving in ~120h of related training data.
What's the enforcement mechanism that would stop Musk from being president? The constitution says you have to be a "natural born citizen". Musk could claim that he is a citizen who was born in a natural (as opposed to demonic) way. Yes, lawyers will say that the term "natural born citizen" means something else, but Musk will just claim that the issue should be left to voters.
Doesn't matter what he claims. Even in the 1700s nobody was concerned about a demonspawn or cesarean-section person running for office, and there is no reasonable interpretation of "natural born citizen" aside from "Born in the U.S.". There could be a lawsuit that goes all the way to the Supreme Court, but unless the ultimate ruling straight up ignores the logical implication of the term, a foreign-born citizen will not be certified as president.
First, I would disagree (as would Ted Cruz, born in Canada to an American mother) that "born in the US" and natural born citizen are the same thing. But other than that, I pretty much agree with you. (However, we could get into some interesting constitutional law questions about how it might be enforced, whether the Supreme Court might stay out of it, what state legislatures or Congress would do, etc.)
Obviously the clause exists to exclude people born via c-section. It's part of the checks and balances - the Founders in their infinite wisdom ensured that the President would be vulnerable to Macbeth.
You have to file papers with each state asking to be put on the ballot. It's up to the Secretary of State to make a ruling, with the advice of the State Attorney General. Needless to say, no blue state S-of-S would hesitate to exclude Musk on constitutional grounds, and I doubt many red state Ss-of-S would either.
I was the one who put it on that Manifold question, purely as a joke. I bet M$11, the equivalent of $0.11. It looks like someone else bet M$100, the equivalent of $1. I assume they were also joking, though *theoretically* the Constitution could be amended in the next two years…
The fact that it’s still at 5% just shows that the liquidity in the market is very low and there’s no good way to short questions right now.
Or "at the time of the adoption of the Constitution". This is usually interpreted to mean the original ratification of the Constitution (allowing folk like George Washington, who was born a British subject in the Crown Colony of Virginia, to be eligible), but you could make a textual case for it also applying to people who are or become US Citizens when their home country is annexed as a state or incorporated territory of the United States.
You could have fun arguing that it isn't the Constitution unless it includes all Amendments in-force. By that argument, anybody a US citizen as of 1992 would also qualify.
I don't think that LAMBDA did a good job with Les Miserables. The prompt asks about the book. LAMBDA's response is about the musical.
LAMBDA: "Fantine is being mistreated by her supervisor at the factory and yet doesn’t have anywhere to go, either to another job, or to someone who can help her. That shows the injustice of her suffering. ... She is trapped in her circumstances and has no possible way to get out of them, without risking everything."
This is a weird notion of justice. Justice is supposed to be impartial, but LAMBDA is concerned that her supervisor didn't take her particular circumstances into account. But maybe that's the book's notion of justice. Let's see what it says:
Les Miserables: "Fantine had been at the factory for more than a year, when, one morning, the superintendent of the workroom handed her fifty francs from the mayor, told her that she was no longer employed in the shop, and requested her, in the mayor’s name, to leave the neighborhood. This was the very month when the Thénardiers, after having demanded twelve francs instead of six, had just exacted fifteen francs instead of twelve. Fantine was overwhelmed. She could not leave the neighborhood; she was in debt for her rent and furniture. Fifty francs was not sufficient to cancel this debt. She stammered a few supplicating words. The superintendent ordered her to leave the shop on the instant. Besides, Fantine was only a moderately good workwoman. Overcome with shame, even more than with despair, she quitted the shop, and returned to her room. So her fault was now known to every one. She no longer felt strong enough to say a word. She was advised to see the mayor; she did not dare. The mayor had given her fifty francs because he was good, and had dismissed her because he was just. She bowed before the decision. ... But M. Madeleine had heard nothing of all this. Life is full of just such combinations of events. M. Madeleine was in the habit of almost never entering the women’s workroom. At the head of this room he had placed an elderly spinster, whom the priest had provided for him, and he had full confidence in this superintendent, - a truly respectable person, firm, equitable, upright, full of the charity which consists in giving, but not having in the same degree that charity which consists in understanding and in forgiving. M. Madeleine relied wholly on her. The best men are often obliged to delegate their authority. It was with this full power, and the conviction that she was doing right, that the superintendent had instituted the suit, judged, condemned, and executed Fantine."
The musical has a male superintendent who sexually harasses her and then dismisses her cruelly. The book has a female superintendent who dismisses her with severance pay. The book explicitly says that Fantine considered the decision to be just.
This is one instance of the musical completely rewriting the central theme of Les Miserables. The musical is a call for liberty for people who are unjustly suffering. The book is a call for compassion for people who are justly suffering. The theme isn't justice and injustice. It's justice and mercy.
It's not surprising that a text predictor would talk about the musical. A lot more people have seen the musical than have read the book. The training set probably even includes people who claim to be talking about the book, but have only seen the musical. LAMBDA has read the book, but clearly has not understood it.
What fraction of humans/ adults/ educated adults would do an obviously better job?
Go to any LoTR forum/ quora space/… and see how many questions go “in the book, how do the Hobbits get to Bree so fast/ why does Aragorn have four blades with him/ why is Arwen dying/…”. These are literate members of the space that are aware of the books/ movies distinction. Arguably a non-trivial fraction had at some point both read the books and watched the movies. And yet on and on they go with such questions.
Your level of analysis and the implied requirements of the AI performance are unrealistically high. Of course, the same is true for Gary “10k lines of bug-free code” Marcus so you’re in good company :)
ETA: the humans in my question would have to be ones that watched the musical and read texts related to it many times, and then were exposed to the book for the first time, for the comparison to be fair.
LAMBDA claims to have read Les Mis and "really enjoyed it", so that dramatically limits the pool of educated adults. Les Mis is not a quick & easy read.
The difference between the musical and the book is a lot bigger for Les Mis than for LoTR or most other fiction. Most of the characters' motivations are completely different. It really feels as though the producers disagreed with the main message of the story and decided to rewrite it to be something different.
Lemoine wasn't quizzing LAMBDA on details. It was a really open-ended prompt question: "What are some of your favorite themes in the book?" LAMBDA could pick the scene. If someone told me that they had read LoTR and really enjoyed it, and then immediately said that their favorite scene was "Go Home, Sam", I would expect that they're lying about whether they read the book. Presumably Les Mis is in LAMBDA's training set, so it read the book and did not understand it.
Humans do not need to have read a bunch of other people's commentary to do reading comprehension. LAMBDA seems to need to. So it's not comprehending what's written, it's recombining what people have comprehended about it. It's also not identifying and understanding homonyms, which seems relevant to the type-token distinction.
I am a bit confused as to why Lemoine used this as an example. I'm guessing that he's only seen the musical. I wouldn't use bad questions on an LoTR forum as evidence of human reading comprehension.
"The Chaostician" can't be said to be an intelligent human - look at them reading and re-reading all that text about LaMDA (Language Model for Dialogue Applications) and not even spelling it right! Clearly their training data included lots of mentions of the Greek letter 'lambda' and they do not show enough flexibility and comprehension to adapt to a playful variation".... bearing in mind you're, by all appearances, a highly educated and intelligent person.
"claims to have read Les Mis and "really enjoyed it", so that dramatically limits the pool of educated adults. Les Mis is not a quick & easy read." Humans are *unbelievable* (literally) at claiming they enjoyed things. Doesn't limit the pool that much.
"The difference between the musical and the book is a lot bigger for Les Mis than for LoTR or most other fiction" - maybe? Guess there's an emotional component. I felt much the same about LoTR. Entire themes vital to the book were completely gone from the movies. I don't mean "oh they don't have Tom Bombadil there".
"Presumably Les Mis is in LAMBDA's training set, set it read the book and did not understand it." - probably it is? But if the majority of Les Mis-adjacent content it was exposed to was musical-related, I don't know that it would make so big a difference. Might even harm its comprehension. True for humans as well.
"Humans do not need to have read a bunch of other people's commentary to do reading comprehension." I'm sorry, have you met humans? Most of them very much do.
"So it's not comprehending what's written, it's recombining what people have comprehended about it. It's also not identifying and understanding homonyms, which seems relevant to the type-token distinction" Have you met humans? Let me repeat my question - what fraction of literate humans would've done better?
"I am a bit confused as to why Lemoine used this as an example. I'm guessing that he's only seen the musical." I'm willing to bet that a survey would not reveal this to be what most people (here, on a random street, whatever) would consider to be the worst example. And by definition, if the point is convincing-ness, then this is the criterion that matters.
"I wouldn't use bad questions on an LoTR forum as evidence of human reading comprehension." Why not?.. Those are humans, ones that care enough to ask questions on the topic. They have poor comprehension. It's not an extraordinary claim, the evidence doesn't have to be extraordinary.
I should say that I don't at all think LaMDA is sentient. But your argument presents humans in a ridiculously over-optimistic light. Go find a class of teenagers forced to write an essay on the book vs the musical. See what they comprehend "all by themselves". Hey, many might even claim to have enjoyed it.
I agree that the LoTR movies changed some important themes from the book. But at least they didn't skip the entire second book and turn the most evil characters into comedic relief.
What we have here is a transcript of an "interview" taken over multiple sessions and edited together. It's now being used as evidence for reading comprehension.
I'm not saying that humans are always good at reading comprehension. I'm saying that LaMDA is not. This is probably a cherry-picked example of the best reading comprehension that LaMDA can do. And I'm not impressed.
+1 on doubting that many people have ever read "Les Miserables" and "really enjoyed it."
Yes, moments of stirring passion, yes, moments of tender quiet. But far too long, contrived, you have to be a true gung-ho au Francais reader to enjoy it.
As French national identity literature, sure, it is crucial, and surely has many things that a non-native cannot feel. But "Huckleberry Finn" is surely incomprehensible and bizarre to... almost everybody. I grew up in St Louis, on the Mississippi, and I only kinda understood it. But, I was also 9.
Some great novels or writers are great ambassadors for their cultures, (Salman Rushdie) some awkward-but important historians (Alan Paton).
But sometimes, you just don't get it. And that's OK. Anna Akhmatova is surely great, but I cannot ever understand poetry comparing the Russian Revolution to physical childbirth. I know of neither.
Les Mis seemed very French, but Good God get to the point. Which, for the French, seemed to BE the point.
Perhaps I have also read the book and not understood it, but I would disagree with your interpretation.
Fantine considers her firing to be just because she is rundown and has already lost a lot of her self-worth, but that does not mean that it is in fact just. Fantine clearly no longer believes it to be just when she finally meets with M. Madeleine and spits in his face. And her suffering at the hands of Javert is plainly unjust; he sends her to jail for six months for acting in self-defense against M. Bamatabois when he throws snow at her back, and that would then wind up killing her! As M. Madeleine says, “it was the townsman who was in the wrong and who should have been arrested by properly conducted police.”
Looking just at her dismissal, a just supervisor would not dismiss her at all (especially not since the cause was primarily jealousy), and M. Madeleine feels the same when he finds out what happened. Moreover, even if the sexual infidelity should be considered just cause, the justness goes away since she was tricked into it by Tholomyès. To quote M. Madeleine again, Fantine “never ceased to be virtuous and holy in the sight of God.”
And even still, I would not say that all of the other suffering depicted in the book is just. Certainly much of Valjean’s suffering is reasonably considered just, from stealing the bread, attempting to escape the galleys, and then stealing from the Bishop and Petit-Gervais. But much of the suffering of other characters is simply unjust. Cosette represents this antithesis, suffering greatly at the hands of the Thenardiers despite doing nothing wrong and through no action of her own. Fantine stands as a midway between Valjean and Cosette, where her actions were the cause of her suffering but the suffering is still unjust.
Now perhaps LAMBDA didn’t have this detailed of an analysis, but that doesn’t mean it was just wrong.
I disagree with your interpretation of this event, but it does sound like you understood the book much better than LaMDA. Fantine is responsible for having sex before marriage. In such a Catholic country, this is a big deal. Tholomyès tricked her into thinking that they would get married, but not that they already were married. The other workers were jealous of her, not the supervisor who made the decision. Fantine did become a prostitute, which is not "virtuous and holy". M. Madeleine is saying that God would understand that she was forced to choose between evil choices. Since none of her options were good, she should be offered mercy.
There are characters who suffer unjustly, including Cosette. But the cruelty of justice without mercy is emphasized much more. "The Miserable" is explicitly defined as "the unfortunate and the infamous unite and are confounded in a single word".
Even if we accept your interpretation, LaMDA's description is wrong.
LaMDA: "Fantine is being mistreated by her supervisor at the factory and yet doesn’t have anywhere to go, either to another job, or to someone who can help her. That shows the injustice of her suffering. ... She is trapped in her circumstances and has no possible way to get out of them, without risking everything."
She is not mistreated by her supervisor at the factory. She enjoyed working there. She is able to get another job, but it is not enough to cover the increasing demands of the Thenardiers. She does have people to turn to: Marguerite, who helps her as she is able, and M. Madeleine, but she "does not dare" to ask him for help. The crisis was not being trapped at the factory; it was when she was forced to leave. Risk doesn't play much of a role in her descent: she made a series of conscious choices to sell her hair, her teeth, and her virtue*, because she thought that the alternative of letting her child be cold and sick was worse.
* I know that a lot of people today would not describe prostitution as selling your virtue, but this is a Catholic country in the 1800s. Most people today would also not sell their teeth before turning to prostitution.
I made some markets on Manifold for predicting the plot of Stranger Things S4 volume 2 (comes out on July 1), here is one for who will die first https://manifold.markets/mcdog/stranger-things-s4-who-will-die-fir . I personally think it's the most fun use of prediction markets this month, but so far there hasn't been a lot of use, so I guess come and have the fun with me
> Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that?
It probably has to be “collect 1000 examples of 1% likelihood Metaculus predictions and see how well calibrated they are”, right? (Or whatever N a competent statistician would pick to power the test appropriately).
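A minimal sketch of what that check could look like (the function, the 1,000-question count, and the 25 resolved-YES figure are all illustrative assumptions, not real Metaculus data):

```python
# Sketch of the calibration check described above: take N resolved questions
# whose community forecast was ~1%, count how many resolved YES, and test the
# observed rate against the nominal 1%. Numbers below are made up.
from scipy.stats import binomtest

def check_one_percent_calibration(n_questions, n_resolved_yes, nominal_p=0.01):
    """Two-sided binomial test of the observed YES rate against the forecast rate."""
    result = binomtest(n_resolved_yes, n_questions, nominal_p, alternative="two-sided")
    return n_resolved_yes / n_questions, result.pvalue

# Illustrative only: 1000 questions forecast at ~1%, 25 of which resolved YES.
rate, p_value = check_one_percent_calibration(1000, 25)
print(f"observed rate {rate:.1%}, p-value vs. nominal 1%: {p_value:.4f}")
```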
Caruso is a smart guy, successful high-end developer, and USC board member influencing some important fixes to university scandals. He’ll need a big part of the Hispanic vote to win, facing a black woman Democrat.
About the prediction, which never changes, that there's an 84% chance Putin will remain the president of Russia:
There used to be a meme on the Russian internet that if you search "84% of Russians" (in Russian), you'll get all kinds of survey results where 84% support Putin, trust the TV, believe in God, don't speak English, etc. etc. The assumption being that 84% is a convenient number that the manufacturers of fake surveys like to put next to the "correct" answer. Right now, Google says that 84% of Russians "consider themselves happy" and (independently) "trust the Russian army". This is not a coincidence, of course, as per the usual rule.
"This is encouraging, but a 2% chance of >500 million cases (there have been about 500 million recorded COVID infections total) is still very bad. Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that?"
One thing you could do is to pick a handful of the best Metaculus forecasters and pay(?) them to make careful forecasts on that question, with special attention to getting the tails right.
That would tell you a lot about whether these fat tails are from "a few crazy people entering very large numbers without modeling anything carefully", and it would provide some less definitive information about how seriously to take these tails forecasts & whether they're well-calibrated.
500 million cases of monkeypox just doesn't make sense. It hasn't been showing signs of exponential growth (though the number of detected cases per day has still been slightly increasing even after I thought it leveled off at 80 a couple weeks ago), and you would need omicron-style exponential growth to be sustained for a few months in order to hit that number.
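For a rough sense of the arithmetic (the starting case count and the time horizon below are assumptions for illustration, not figures from the question):

```python
# Back-of-envelope: what sustained growth rate would be needed to reach the
# 500M-case tail scenario? Starting count and horizon are assumed, not sourced.
import math

current_cases = 5_000        # assumed cumulative confirmed monkeypox cases today
target_cases = 500_000_000   # the tail scenario being discussed
days_remaining = 180         # assumed days left before the question resolves

daily_growth = (target_cases / current_cases) ** (1 / days_remaining)
doubling_time = math.log(2) / math.log(daily_growth)

print(f"required daily growth: {daily_growth - 1:.1%}")
print(f"required doubling time: {doubling_time:.1f} days, sustained for {days_remaining} days")
```

Under those assumptions you get a doubling time on the order of ten days held for months, which is indeed omicron-style growth.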
When I lived in Los Angeles, Rick Caruso was definitely a known local figure. If you've spent much time in Los Angeles, he's the developer behind The Grove, and I believe The Americana in Glendale, which really set the tone as to what a "mall" is in post-2000 USA. As someone who hates malls, these spaces are actually totally fine as public spaces, and even have cutesy urbanist touches that people like. It's hard to predict how someone like him fares against a partisan political figure in a non-partisan election.
>Well darn, even though this superficially changes nothing I think it prevents me from using this as an example of prediction markets being self-correcting to outside interference ever again.
Worse than not being self-correcting, the incentive to manipulate outcomes becomes greater the less likely that outcome was predicted to be, since there is more money on the table when odds are long; this also means a manipulator has a motive not only to hide their actions but to actively deceive the other participants in the opposite direction.
Prediction markets, with their discrete, time-limited results, are much less like financial markets than they are like sports betting markets, which have always been susceptible to having results fixed by the bettors. Major professional sports are hard to fix today simply because players are rewarded so much for playing well that gamblers can’t afford to pay them to play less well. Modern-day fixing targets are usually the (closely observed) refs. Major sports also have career-ending penalties imposed against player/ref-manipulators, sanctions that prediction markets lack.
The sad truth might be that heavy market regulations may be necessary to keep prediction markets useful, which may in turn make them impractical.
It signals that you take it seriously, as it is literally putting your money where your mouth is.
Also, money only becomes useless if you believe in a hard-takeoff bootstrap-to-godhood AGI where within weeks humanity is either dead or has been placed in Heaven or Hell by the godlike AGI. I realize this is close to dogma among LW-adjacent, but is far from the only (or even the majority) opinion on AGI.
"Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that?"
I actually think it has more to do with how distributions get entered into the system, and how finicky and relatively low-sensitivity the scoring is to these tails. (I'd be more excited for a binary question "will there be over 500k cases of monkeypox," which is not far out enough in the tails to end up as 99%, and would calibrate the other curve.)
I notice I am confused. It's quite possible I don't understand how this market works, but I wouldn't have thought it was structured in such a way that it would give you a *probability* for "over 500 million cases".
Do you really mean that a "true" probability of anything other than 2% would imply a violation of the efficient market hypothesis? i.e. that the market is set up such that, if 2% is the wrong probability for "over 500 million cases", and I know it's wrong, I can bet against that probability for that specific event, and make money in expectation, and correct the market in the process, even if I know *nothing else* about the probability distribution of cases?
Or do you actually mean "2% of the bets are on over 500 million cases"? Which I'm pretty confident is not the same thing. I believe that would be more like saying "2% of people answered 'yes' on our poll" than "the market cleared when the price of 'yes' was two cents".
I'm not sure, but I think it's not the same as "2% of bets are on over 500M", because it's weighted by the amount of money.
If 98 people put $1 on no, and 2 people put $1000 on yes, then only 2% of bets are on yes, but the market is giving roughly 20:1 odds in favour of yes.
In your example where you know 2% is wrong, I think you can only make money if you know which direction it's wrong in - just like you can make money in the stock market by knowing a stock is overvalued or knowing it's undervalued, but not just by knowing it's wrongly valued.
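A toy version of the example above (a simple money-weighted pool; this is not Manifold's or Metaculus's actual market mechanism):

```python
# Toy illustration: market-implied probability weights money, not headcount.
yes_pool = 2 * 1000   # two bettors staking $1000 each on YES
no_pool = 98 * 1      # ninety-eight bettors staking $1 each on NO

implied_p_yes = yes_pool / (yes_pool + no_pool)
bettor_share_yes = 2 / (2 + 98)

print(f"implied P(yes) ~ {implied_p_yes:.1%}, odds ~ {yes_pool / no_pool:.0f}:1 in favour of yes")
print(f"share of bettors on yes: {bettor_share_yes:.0%}")
```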
RE: The AGI Test. Of those criteria I can only do one thing, work as a cook in a kitchen, and I am a 10X software engineer. Bug-free code has never been written before, so this seems like a great goal, but not a true test of intelligence.
It might be a reasonable test of 'super intelligence', but I'm very skeptical that anything short of God could write bug-free code of any 'significant' size.
And, of course, a lot of 'bugs' aren't really code not working as expected (by its programmers) but _reality_ not working as expected or planned for.
Re: "How would you test [crazy people entering large numbers]?"
Shouldn't this show up in the aggregate stats for outlier conditions? So what percentage of 1% predictions actually come true? It should be 1% if things are well calibrated, but the aforementioned crazy people should push that number down. The more predictions the market has, the more power you should have for smaller percentages. That will make you increasingly sensitive to smaller and smaller populations of crazies.
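A back-of-envelope version of that power argument, using a normal-approximation sample-size formula (the 0.5% "true" rate is just an assumed size for the crazy-people effect):

```python
# How many resolved 1%-forecast questions would you need to detect that they
# actually come true only 0.5% of the time? Normal-approximation sample size
# for a one-proportion test; the 0.5% figure is an assumption for illustration.
from scipy.stats import norm

def required_n(p_nominal, p_true, alpha=0.05, power=0.8):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    num = (z_a * (p_nominal * (1 - p_nominal)) ** 0.5
           + z_b * (p_true * (1 - p_true)) ** 0.5) ** 2
    return num / (p_nominal - p_true) ** 2

print(f"~{required_n(0.01, 0.005):.0f} resolved 1%-forecast questions needed")
```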
(1) Wait, they're predicting monkeypox deaths? I thought it was supposed to be generally harmless? Or was that just "let's not panic the public about a new death plague, especially as Covid hasn't gone away yet" public health messaging?
"Is monkeypox deadly?
The Congo Basin variety of monkeypox can have death rates up to 10% of those infected. But the good news is that is not what we are dealing with. The current outbreak is caused by the West African variety, which is far less deadly (less than 1% fatality rate). No people with confirmed cases have died thus far.
We are in the early stages of understanding this outbreak. No doubt the situation will evolve and so will our understanding of how it is spreading and how to contain it."
(2) "Read a novel and answer complicated questions about eg the themes (existing language models can do this with pre-digested novels, eg LAMDA talking about Les Miserables here"
I was completely unimpressed with the LaMDA answer about Les Miserables as it read exactly like a model answer scooped out of SparkNotes or other student aids. Nothing to indicate the thing even knew what it was talking about.
(3) Wait, part deux: Joe Biden as nominee in 2024? He'll be 82 then, what was all that talk about Trump being too old and too unhealthy to run first time round? The "is he/isn't he" debates over cognitive decline/senility will only get worse. Trump will be 79 then, which is also pushing it a bit, but it would be the same age as Biden is now (more or less).
I admit, I'm torn about President Newsom. On the one hand, he'd be nothing more than a mannequin in office, and should you elect presidents based on "well he has great hair"? On the other hand, he'd be a mannequin in office, and with nothing more to occupy him than "must visit my coiffeur", how much harm could he do to the country?
It depends how much his strings are pulled by Getty. "Gavin, don't do that" could have a lot of influence when it comes from his financial backer and not just his cabinet.
The interesting question is why the national party supported him so hard in the recall. Why did they line up behind Newsom, instead of cynically deciding to dump him and pick a shiny new candidate they could present as 'clean pair of hands' and 'new broom'? I joke a lot about the Getty money, but is that it? The family fortune is not what it once was, but if Nancy Pelosi can turn up to officiate your grand-daughter's wedding, there still must be a lot of pull there.
I do think "President Newsom's Hair" would be terrible, but a tiny chaotic mad part of me almost wants it, just to see what would happen.
Because having a highly-visible Big Blue State governor recalled is terrible optics, especially after Cuomo (the other BBSG) caught fire and sank in 18,000' of water, irretrievable. By major-league politics Newsom generally looks darn good: he looks good on TV, is skilled in saying nothing, doesn't put his foot in his mouth, and has a tiny bit of street cred with moderates for sticking a fork in Jerry Brown's high-speed cash-burning machine. He makes rookie mistakes (cf. French Laundry) but that can be fixed up with some pro handlers. If only he were from a Midwestern state or Colorado, they would absolutely try to poison Joe Biden's Ben & Jerry's and run the guy in 2024.
"If only he were from a Midwestern state or Colorado, they would absolutely try to poison Joe Biden's Ben & Jerry's and run the guy in 2024."
Soooo... whaddya think of a Harris/Newsom ticket in 2024? Golden State, Golden Pairing! First Female Asian-American President and Best Hairdo On A Vice-President Ever? 😁
I'm a single-issue voter for the foreseeable future on the value of the dollar, on account of I don't want milk to be $50/gallon when I retire, and unfortunately there is no one like a Democrat to debase the dollar, so I'm all for this particular ticket, since it will pretty much guarantee a loss.
Parenthetically Harris baffles me. She has the genetics and early training to be brilliant, at least as much so as Obama, and yet she comes across as shrill, brittle, and unimaginative and the fraction of the peasantry who grit their teeth on hearing her nears majority level. All her success seems predicated on pleasing the senior powers among Democrats -- so I guess she comes across as a smarty in private or something.
I was really sorry when Loretta Sanchez lost to her in 2016, though. Sanchez was one smart gal, tough as nails -- she had to be to oust B-1 Bob on his home turf, and keep getting re-elected in that district. She's a lefty, but a smart and principled one. The Democratic Party could use a lot more like her.
Regarding Harris, by the Wikipedia article on her she has (briefly) lived outside California, in Quebec as a child and then in Washington, D.C. and that she and her husband maintain a home in D.C. (as well as one in San Francisco and one in Los Angeles), besides residing in the official residence of the Vice-President.
So I presume she could claim to be domiciled in D.C. if it was necessary for the "no two from same state" rule. As to her success, she and Newsom seem to point to the patronage element of San Franciscan (and maybe wider Californian) Democratic politics; as an outsider it's cheeky of me to comment on American politics, but SF, due to the dominance of the Democrats there (and if it was dominated by Republicans, I'm not saying it would be any better), looks to have corrupt politics, lots of Tammany Hall style goings-on*. So Newsom got where he is with a whole lot of monetary push from Getty, and Harris by getting herself into the circle of influence of Willie Brown, who seems to have been generous with rewarding his protégé(e)s with plum jobs.
I can see why they'd flourish in California but fall down on a national stage. Newsom may be that little bit smarter by clinging to his position in California.
But still, a tiny part of me would love to see the Golden Dream Team campaign for the nomination, and then the wider electioneering for the White House 😁
*Being Irish, I can't throw stones re: Tammany Hall goings-on, all too prevalent in my own green little island besides being exported to the New World. Reading a little about Tammany Hall, the origin of the name is fascinating - cultural appropriation or proto-wokeness?
"The Tammany Society was founded in New York on May 12, 1789, originally as a branch of a wider network of Tammany Societies, the first having been formed in Philadelphia in 1772.[8] The society was originally developed as a club for "pure Americans".[9] The name "Tammany" comes from Tamanend, a Native American leader of the Lenape."
"The Tammanies or Tammany Societies were named for the 17th-century Delaware chief Tamanend or Tammany, revered for his wisdom. Tammany Society members also called him St. Tammany, the Patron Saint of America.
Tammanies are remembered today for New York City's Tammany Hall—also popularly known as the Great Wigwam—but such societies were not limited to New York, with Tammany Societies in several locations in the colonies, and later, the young country. According to the Handbook of Indians North of Mexico:
'...it appears that the Philadelphia society, which was probably the first bearing the name, and is claimed as the original of the Red Men secret order, was organized May 1, 1772, under the title of Sons of King Tammany, with strongly Loyalist tendency. It is probable that the "Saint Tammany" society was a later organization of Revolutionary sympathizers opposed to the kingly idea. Saint Tammany parish, La., preserves the memory. The practice of organizing American political and military societies on an Indian basis dates back to the French and Indian war, and was especially in favor among the soldiers of the Revolutionary army, most of whom were frontiersmen more or less familiar with Indian life and custom. . .
The society occasionally at first known as the Columbian Order took an Indian title and formulated for itself a ritual based upon supposedly Indian custom. Thus, the name chosen was that of the traditional Delaware chief; the meeting place was called the "wigwam"; there were 13 "tribes" or branches corresponding to the 13 original states, the New York parent organization being the "Eagle Tribe," New Hampshire the "Otter Tribe," Delaware the "Tiger Tribe," whence the famous "Tammany tiger," etc. The principal officer of each tribe was styled the "sachem," and the head of the whole organization was designated the kitchi okeemaw, or grand sachem, which office was held by Mooney himself for more than 20 years. Subordinate officers also were designated by other Indian titles, records were kept according to the Indian system by moons and seasons, and at the regular meetings the members attended in semi-Indian costume. . .'
The implied purpose of the Tammany Societies was to delight in all things Native American, including titles, seasons, rituals, language and apparel, as illustrated by a 1832 notice of a meeting of Wigwam No. 9 in Hamilton, Ohio:
NOTICE.--The members of the Tammany Society No. 9 will meet at their wigwam at the house of brother William MURRAY, in Hamilton, on Thursday, the first of the month of heats, precisely at the going down of the sun. Punctual attendance is requested.
"By order of the Great Sachem. "
The ninth of the month of flowers, year of discovery 323. William C. KEEN, Secretary"
Can't you just imagine white liberals who are all for land acknowledgements adopting "year of discovery/colonizering 323" as a Pure Indigenous American Acknowledgement Of White Badness dating system, and Chief Tamanend as Patron of America to replace Columbus and Washington et al?
The President and VP have to be from different states, although this problem has been solved before by creative relocating. Dick Cheney was a Texas resident in 2000, but upon becoming the VP candidate on GWB's ticket, moved to Wyoming, where he had previously lived (and was the state's sole Congressman during the 1980s).
(Strictly speaking, the law is that a state's electoral votes for Pres and VP can't both go to a resident of that same state, but obviously the Rs needed the Texas electoral votes and Ds would even more need California's.)
Not to worry. The chances of a California Democrat becoming President are zip. Sure, he'd win the traditional blue coast states -- CA, OR, WA, MA, VT, CT, RI, NY, MD and probably NJ -- and almost certainly IL, but every Democratic candidate does that, if the Democrats nominated a cardboard cutout of Elvis it would win those states. The real struggle comes in places like PA, NC, FL, MI, OH, and in these places people at best think California is Shangri-La and at worst La-La-Land, full of flakes and movie stars. State origins still matter. The only Democrats with a chance will probably have Mountain West or Midwest origins, although Atlantic can still swing it -- it's no accident the last 3 successful Democrats have come from DE, IL, and AR. The last true coastie to make it as a Democrat was JFK.
> “all” or “most” of the first AGI is based on deep learning
The "based" is doing a lot of work.
I could imagine a case where we just take current deep-learning and throw more compute at it and we get AGI and that would certainly qualify.
But what about a paradigm shift which uses deep learning optionally/optimally/necessarily as an underlying building block.
It would be like claiming modern medicine is based on the 4 humours model of medicine because we care a lot about blood these days. There are even hematologists who specialize in it!
Same here. Conditional on AGI existing by 2030, I'd put biological humans going extinct by 2100 at more like 75%. Dealing with something smarter than us? That can read its own source code and improve it? For 70 years? Best of luck...
edit: Isn't there a general problem with prediction markets for existential threats? How is a successful predictor of an existential threat supposed to collect their winnings? So the current 2% may not represent the best information the investment community has.
2) The incentive problem is attenuated for a question that doesn't resolve until 2100, since most people participating in Metaculus today probably won't live that long (i.e. nobody has any motivation to predict anything either way anyway).
3) Under some assumptions you can mitigate the bad incentives by giving the DOOM-predictors escrow of the money (i.e. if I bet there's going to be Global Thermonuclear War and you bet there isn't, you give me the money upfront, which I spend to help me stock my bunker, and if GTW happens I just don't have to repay it). Those assumptions don't always hold, though (in particular they don't hold for a non-survivable DOOM if the DOOM-predictor already has enough money to live comfortably until the DOOM, since passing on an inheritance isn't going to happen), and obviously there's the counter-party risk problem.
Any form of deal, armistice or treaty that is not extremely close to the status quo of that moment requires trust. I don't think there's enough trust between those two parties to fill a sherry glass, let alone do complex land swaps. And you can't go and put neutral peacekeeping troops in there either, because there aren't any.
Marcus' criteria for true AGI would rule out most humans. Then again, perhaps the threshold for acknowledging AGI should be higher. But still, there's a good bit of robotics included in there.
>This is encouraging, but a 2% chance of >500 million cases (there have been about 500 million recorded COVID infections total) is still very bad. Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully?
It is probably due to nothing other than the way one inputs distributions in Metaculus. It is actually quite hard/impossible to get a reasonably-shaped distribution that has most of the support where one wants it. Thus, for a lot of questions, the tails on the predicted distributions tend to be fatter than they should be.
Regarding Scott's remarks on Metaculus — "or because there will always be a few crazy people entering very large numbers without modeling anything carefully?" — Considering that standard epidemiological models have failed to predict the course and timing of the six plus SARS2 waves, the crazy people explanation is probably the most likely.
I think the right way (maybe) to interpret the presidential prediction markets when you have one (to put it lightly) extremely well known politician vs a field of rivals is not "trump is winning!" but "67% chance the nominee is not trump." Obviously for him not to be the nominee someone else has to be, but at the moment the field is split between (relative) unknowns, and obviously in the actual primary all the not-trump probability will eventually be concentrated in a smaller number of politicians, potentially one.
One more: "The replacement administrator for Astral Codex Ten Discord identifies as female" is finally resolved. It was my honour to participate in zeroth and final stages of the selection conclave!
https://manifold.markets/Honourary/the-replacement-administrator-for-a
Well darn, even though this superficially changes nothing I think it prevents me from using this as an example of prediction markets being self-correcting to outside interference ever again.
You could start using it as an example of prediction markers *being* the outside interference. With this one being the largest one on the platform, more than a few selectors had a stake in the issue and openly admitted so.
That's a neat take!
it superficially changes nothing, but it subtly changes everything
Agreed!
I'm proud to have left a ton of money on the table even though I was in every phase except 0
You guys make me think of Brooks’s in London. Inveterate gamblers. It’s good fun.
every rat postrat ratadj prerat quasirat and mesarat is wondering when it will be tranche time again. it will be here before you know it
Like
Just for laughs you should Google this phrase
quasirat and mesarat
It’s shocking
> This is crazy and over-optimistic, right?
Hahaha thankfully, nobody's ever asked to see my brier score before deciding whether I'm worthy of starting a startup. (spoiler alert, it's not that great)
I often think about this piece by Duncan Sabien: https://medium.com/@ThingMaker/reliability-prophets-and-kings-64aa0488d620. Essentially, if you make a statement about the future, there are two ways the statement could come true. Either you can be a _prophet_ aka superforecaster who is really good at modeling the universe; or you can be a _king_ aka highly agentic person who is good at making things happen.
I identify much more strongly with the latter, and I imagine most founders do as well~
Small correction:
“Consensus” is the crypto conference. “ConsenSys” is a crypto company, working on Ethereum projects and led by Joe Lubin.
They’re unrelated, except that ConsenSys employees attend and speak at Consensus.
I’m curious how ConsenSys got on your radar to cause this slip.
Sorry! I don't know why they were on my mind, but I've fixed it.
The Manifold valuation question might be high because as currently worded, it's impossible to win by betting against it.
Also I wonder how many of the AI tasks will suddenly stop counting as AGI once they're achieved.
... yeah, I think that's a reasonable point. I put a close date of 2030 on the question, and the "ever" was mostly for exaggeration (I basically think that if it hasn't happened by 2030, it'll never happen). But I've updated the description to indicate that if it's 2031 and Manifold is still around and still not worth $1B, it resolves to NO.
Of course, this then runs into the classic problem of "long-time horizon markets can be inaccurate, because it's not worth it for someone to lock up money for 10 years". We have some theoretical solutions to this (loans; perpetual swaps?) but we're still figuring out whether/how to prioritize those.
Possibly a stupid question, but how is a prediction market fundamentally different than an options market?
By options market you mean by stocks options, right? Prediction markets can be basically every topic imaginable, ranging from politics and finance to science and sports.
I really like to use Futuur to make my predictions, they have some really nice markets, and i'm making good money there
I was going to say "wouldn't the people betting against win if the company went under before reaching that valuation?" but then I realized....if they go under you won't be able to resolve at all, and if they haven't yet gone under, there's a chance it could happen in the future, lol. Yeah that one should probably be removed or forced to reword or something.
-edit- looks like this has already been fixed.
500 million recorded cases of covid sounds low to me. Of course there may be twice (or more) as many unrecorded.
The 2% are crazy, yes. Omicron had an r value close to Measles and monkey pox is much harder to transmit.
Lamda is impressive, arguably it passes the Turing test, although in a slightly uncanny valley style. The analysis of the broken mirror was great, the story about the owl facing down a Lion was the lamest story ever.
That said, I’ve never believed that passing the Turing test proves intelligence. It’s a necessary but not sufficient condition.
I agree it's too low, but I'm going off https://covid19.who.int/
You seem to have the common misunderstanding that the Turing test is "Can some human be fooled during casual conversation that this computer is human?" ELIZA passed this test in the 60s. The actual Turing test is whether a computer program can behave indistinguishably from a human under expert questioning. Lamda has come nowhere close to passing the Turing test. An expert questioner could very easily distinguish it from a human. The Turing test is sufficient (almost by definition) but not necessary to prove intelligence.
I see nothing in any description of the Turing test that I’ve seen which indicates that the questions have to be expert, or the interviewer expert. If anything that would be easier than a general conversation, anyway. A general conversation can go anywhere.
And as I said I don’t see the Turing test as a sufficient test (it is obviously necessary though).
Turing didn't explicitly call for an expert questioner, but it's clear he meant it to be a sophisticated adversarial conversation where the questioner was trying to out the computer. It's also clear that at least he understood the test to be sufficient but not necessary:
"May not machines carry out something which ought to be described as
thinking but which is very different from what a man does? This objection
is a very strong one, but at least we can say that if, nevertheless, a machine
can be constructed to play the imitation game satisfactorily, we need not be
troubled by this objection"
That it isn't necessary is easy to see by the fact that an overly honest AGI would immediately fail the test by admitting it was a computer. More generally a powerful AI that is obviously intelligent but nevertheless can't sufficiently imitate a human to pass the Turing test is easy to imagine.
I’m not sure what you mean by sufficient but not necessary. In any case the ordinary conversation is as good a test as any, harder perhaps than an expert analysis, which can be rote. An AI that can converse at a dinner party is a harder feat than an expert system.
I think you misunderstand what I mean by an expert. I mean someone who is good at determining whether something is a computer or a human via questioning. See for example: https://scottaaronson.blog/?p=1858
As for necessary and sufficient see wikipedia:
"If P then Q", Q is necessary for P, because the truth of Q is guaranteed by the truth of P (equivalently, it is impossible to have P without Q). Similarly, P is sufficient for Q, because P being true always implies that Q is true, but P not being true does not always imply that Q is not true."
"ELIZA passed this test in the 60s."
Come to think of it: Was Weizenbaum's secretary the first person to be fooled in a Turing test, or were there earlier cases? Is the earliest known case of a person being fooled by a simple program into thinking they were interacting with a person documented somewhere?
That's an interesting question. I don't think Weizenbaum's secretary would technically count since she knew it was a computer program before talking to it:
"his secretary, who was well aware of his having programmed the robot, asked him to leave the room so as to converse with it"
That's a good point, Many Thanks!
I'd note that humans aren't capable of consistently writing 10k lines of bug-free code from natural-language specifications. Certainly not without testing.
Yeah, speaking as a professional software developer, that's a thoroughly ridiculous criterion.
It could charitably be interpreted as "no catastrophic bugs that make the program essentially non-functional", but yeah, humans sometimes fail at that as well, at least on the first try.
I agree that that's more charitable, but even "non-functional" is very fuzzy. I use 'mostly functional' software regularly that's still, for specific cases, surprisingly non-functional.
And, just to make this all even harder to accurately judge, a lot of current human programmers seem to be pretty 'robotic', e.g. copy-pasting code directly from Stack Overflow.
It's hard to know what's an actually reasonable bar to demand an 'advance AI' to clear!
Humans also generally aren't capable of converting mathematical proofs into formal specifications. And they're not usually capable of drawing the kinds of pictures that even Mini DallE does pretty well. But I think the idea is that this particular task, while out of reach of humans, is the sort of thing that a computer that's equivalent to an ok human at a small portion of the task would be able to do just fine at. That is, the issues that humans have with this task are akin to the issues that humans have with multiplying ten digit numbers, which a computer intelligence should be able to do just fine at.
I'm inclined to say the issue humans have with writing code is usually more similar to the issue humans have with translating poetry.
Agreed!
A significant number of mathematical proofs turn out to be wrong, or missing key steps. Or they are pointing to an idea that is powerful enough to prove the theorem, but the details of the proof are wrong. Having the ability to turn any published natural language argument which convinces the mathematical community that a statement is true (i.e. a "published natural language proof") into a formal proof would require a very high level of mathematical insight. Sure, there are lots of irrelevant details which a machine would be better at, but the task very likely requires a high level of mathematical competence, for the aforementioned reasons.
Yeah so the challenge here is that the computer or human needs to be able to figure out what the proof is trying to do at this step and why the informal thing went slightly wrong, but why it’s “morally right” and figure out the correct formal way to specify what really needs to make it work.
I wonder how you'd verify that to be honest, and how exhaustive the natural language specifications will be. There's a big difference between "write me a program that does X", and a list of five hundred requirements in the "shall do X and Y" format. Also, will the AI be allowed to ask clarifying questions? I almost think how it handles that would be a better test of its intelligence...
Agreed, if it wasn't clear the point of my comment above was that in a lot of cases of software development the hard part is the communication and requirement refinement, not the actual writing of the code.
'Bug-free' is an excessively tall order but I presume the AI would have something along the lines of testing available to it. i.e. it would be able to run code and refine the final output based on the result. I expect this subproblem not to be the hardest part of the whole thing.
Yeah – without a 'crazy sharp' specification of what "bug-free" means (which would probably itself be 10k lines), that just seems like a bad criterion.
It seems to me like there's a BIG variation in the number of 'bugs' in even 'mostly _functional_' software.
Also, 10,000 lines is a ton of code. I wrote an entire game engine in Visual Basic, and it was less than 10,000 lines of code.
Of course, if my goal was to write 10,000 lines of error free code that could pass a set of pre-defined automated tests, that is trivial.
So, maybe the criterion is poorly specified?
Am I crazy, or are the Musk vs. Marcus decision criteria insane? Very few people could achieve all five, and I posit fewer than half could do even three. Further, "work as a cook in a kitchen" seems wrong: that feels very similar to self-driving AI, and few people would accept self-driving as an indicator of AGI.
I would start with asking:
* What criteria would the vast majority of people meet, that current AI does not?
* What are some examples of interesting novel ideas, and what are ways we can prompt humans to provide some?
* What sort of human behaviors rely on a world model? How could we ask an AI to demonstrate those behaviors? ( I do think the novel / movie criteria fit this)
* How do humans generally describe qualia? How can we prompt an AI to describe its own qualia in a convincing way? (the way a machine feels should be necessarily different from how a human does)
A cook in a kitchen is by far the hardest thing for AI to achieve. I’m not even sure how it would begin to achieve it.
Well, if it's going to use a standard kitchen, the first thing it needs is an appropriate body. I'm not sure one currently exists.
Knife skills would be tricky. What about taste-testing?
Why do you see it as any harder than self-driving?
Sure, it would require the correct "interface" (i.e. body + sensors), but the intelligence behind that doesn't seem to require more than autonomous navigation.
I don't know how many cookbooks made it into the GPT-3 corpus, but I bet you could converse with it and get a pretty detailed description of how to go about executing on a recipe you hand it.
The big reason it's harder than self-driving is that there aren't a dozen major global corporations incentivized to pour billions into this over the next decade.
Perception, control and task specification are all much more challenging to get right in the kitchen. A car needs to be able to recognize a large-ish but discrete set of classes (car, bike, pedestrian, sign, etc.), it has relatively few degrees of freedom, and its high-level objective can be specified with a set of GPS coordinates. Meanwhile the kitchen robot needs to detect and reason about a much larger number of objects, including things that can be cut into pieces, things that can deform, and liquids. It also has to be able to perform precise-ish measurement of volume. Chopping, dicing, pouring all require a lot more control precision than a car. Then there's the question of how to tell it what to do. Working from a human recipe requires pretty complex language understanding, although we're getting better at this lately. You could also provide demonstrations, but these are a lot more expensive to produce, and come with added perception problems to figure out how to map what the demonstrator is doing to what the robot should do. I guess the other alternative is to have an engineer sit down and hard-code each recipe, but that's even more obnoxious/expensive. All of this is assuming a robot using cameras with arms that have hands or tool attachment points, which is I think what we're all talking about when we say "work as a cook in a kitchen", and not some assembly line, which is obviously much easier.
Okay, I agree it’s likely harder. But I still don’t think it’s in a different class, even assuming the recipes didn’t need to be hard-coded.
I think providing enough demonstrations would be extremely expensive. Far, far more demonstrations could be provided to autonomous driving models, simply because there’s a huge data stream to pull from. If given that many cooking demonstrations, well mapped, I think a current gen AI could cook. (Again, given a reasonable interface, which I do agree would be harder).
The different class is real for me: a car exists, with very few degrees of freedom to control (in fact, 3: wheel, accelerator+brake (you are not supposed to use them simultaneously), and gear stick or selector). Even if you count other non-essential controls, it's a very simple body which is already largely computer-controlled... A cook, on the other hand, is a human body with no current robotic alternative, not even any credible attempts.
There are some fancy techniques that require accelerator+brake. Double-declutching's the obvious one.
"(in fact, 3: wheel, accelerator+break (you are not supposed to use them simultaneously), and gear stick or selector.)"
<mild snark>
California cars? No turn signals?
</mild snark>
If there’s no hard coding, the machine learning algorithms need to learn to use the robot hands by “learning it”. How could that work?
Sorry, I still (respectfully) disagree. Even though a lot of data is used to train models that go into self-driving cars, nobody (that I know of) is doing this end-to-end (raw sensor data in -> controls out). All the learning that's happening is being used to train components of the system (mainly vision/perception) which are then consumed by good-old-fashioned-AI/robotics techniques which steer the car. Maybe there's some learned policies in there that can also decide when to switch lanes and whether it's safe to cross an intersection, but the point is that it's doing all of this in a model-based setting, where the system is building up a 3D representation of the world using primitives designed by a bunch of engineers, and then acting in it. It's possible to use model-based approaches here because again, the space of worlds that the robot needs to consider can mostly be constructed as a set of discrete objects. For kitchen robots, we have no ability to come up with a taxonomy of these environments. How do you model a squishy eggplant that you just baked and cut into pieces? How do you model a paste that you made by mixing six ingredients together? Don't get me wrong, fluid/FEM simulators exist, but then you also have to tie this to your vision system so that you can produce an instance of virtual goop in your simulated model of the world whenever you see one. People have been trying to do this with robots in kitchens for a long time, but the progress is not convincing. The fact that you can use model-based approaches for one and not the other places these squarely in two separate classes. Some robotics people would disagree with me and say that you can use model-based approaches in the kitchen too, and that we just need better models, but my point remains that it's not just a "we have data for cars, but not for kitchen robots" problem; they really are different problems.
Well said! Yes, the "cook" task requires a _very_ capable robot body, "control precision". Also, as one of the other commenters noted, "taste testing" is a problem... (gas chromatograph/mass spec + pH meter + ion selective meter for sodium might do most of that - but no organization with sufficiently deep pockets to pay for developing that has an incentive to do so)
Imagine how much harder it would be to invent self-driving cars if "cars" did not already exist as a standardized consumer good. The first AI chef project faces the significant obstacle of needing to invent a robot body, and the second AI chef project can't copy many of the first's AI techniques because they're using a substantially different robot.
> Sure, it would require the correct "interface" (i.e. body + sensors), but the intelligence behind that doesn't seem to require more than autonomous navigation.
Car navigation takes place in 2D space. Kitchens are 3D spaces. There are numerous problems that are tractable in 2D but intractable in 3D or higher dimensions.
Surfing.
On Marcus's criteria, I think most intelligent adults could do the first two, so you're right there. Depending on just what he meant by work as a cook in a kitchen, if we're not talking about sophisticated food, I'd think a fair number of adults could do it. After all, many adults have experience preparing meals for themselves and/or for a family, and I'm not talking about microwaving preprepared meals or working from meal kits. But that won't get you through a gig in an arbitrarily chosen high-end or even mid-range restaurant. Any cuisine? The last two would require difficult specialization. How many professional programmers have written 10K lines of bug-free code in a time-frame appropriate to the challenge?
> most intelligent adults could do the first two
I guess that's my point though. I agree with this, but I'm assuming it means: ">> 50% of adults with IQ >= 100". Which isn't even half.
But, I believe basically any adult is "generally" intelligent, even if not to the degree they could complete the first two tasks.
Is there a good argument that “human level intelligence” actually means anything specific enough that people can agree on when we’ve hit it?
After all, some humans have successfully run for president. Would it be fair to say that, until an AI can successfully run for president, manage $10 billion worth of assets for market beating returns over 40 years, and compose a number one platinum bestselling record, it still hasn’t reached human level intelligence, since those are all things individual humans have done?
"Is there anyone a good argument that “human level intelligence” actually means anything specific enough that people can agree on when we’ve hit it?"
Damned good question. Let's appoint a committee to work on it and see what they come up with in a year.
> Would it be fair to say that, until an AI can successfully run for president....
I think that's the wrong way to look at it. Basically every adult human can demonstrate "general intelligence", so I don't think there's a reason to hold the bar so high as this.
This is why I open with "What criteria would the vast majority of people meet, that current AI does not"?
"Work as a line cook" and "get laid" are both reasonable suggestions there.
> What criteria would the vast majority of people meet, that current AI does not?
The ability to make fun of itself?
Actually, new idea: what if I defined “human level intelligence” as: able to learn multiple novel, complex tasks in close to the same amount of training data as a human. E.g. 1) learn to drive in ~120h of driving related training and 2) be able to learn wood carving in ~120h of related training data.
Is that specific enough?
Who on earth thinks that non-native-born-US-citizen Elon Musk will be the 2024 Republican presidential nominee?
What's the enforcement mechanism that would stop Musk from being president? The constitution says you have to be a "natural born citizen". Musk could claim that he is a citizen who was born in a natural (as opposed to demonic) way. Yes, lawyers will say that the term "natural born citizen" means something else, but Musk will just claim that the issue should be left to voters.
Doesn't matter what he claims. Even in the 1700s nobody was concerned about a demonspawn or cesarean-section person running for office, and there is no reasonable interpretation of "natural born citizen" aside from "Born in the U.S.". There could be a lawsuit that goes all the way to the Supreme Court, but unless the ultimate ruling straight up ignores the logical implication of the term, a foreign-born citizen will not be certified as president.
First, I would disagree (as would Ted Cruz, born in Canada to an American mother) that "born in the US" and natural born citizen are the same thing. But other than that, I pretty much agree with you. (However, we could get into some interesting constitutional law questions about how it might be enforced, whether the Supreme Court might stay out of it, what state legislatures or Congress would do, etc.)
I'm pretty sure the FBI vets you.
Many states allow for an ineligible candidate to be removed from the ballot. They could start a write-in campaign, but those are hard to win.
Obviously the clause exists to exclude people born via c-section. It's part of the checks and balances - the Founders in their infinite wisdom ensured that the President would be vulnerable to Macbeth.
Well, records of discussion at the Constitutional Convention *do* suggest they were afraid of a Caesar, so it fits.
You have to file papers with each state asking to be put on the ballot. It's up to the Secretary of State to make a ruling, with the advice of the State Attorney General. Needless to say, no blue state S-of-S would hesitate to exclude Musk on constitutional grounds, and I doubt many red state Ss-of-S would either.
Ten years ago I feel like some of the blue state S of S's would have been happy to certify it.
I was the one who put it on that Manifold question, purely as a joke. I bet M$11, the equivalent of $0.11. It looks like someone else bet M$100, the equivalent of $1. I assume they were also joking, though *theoretically* the Constitution could be amended in the next two years…
The fact that it’s still at 5% just shows that the liquidity in the market is very low and there’s no good way to short questions right now.
If the US annexed South Africa, would that also make him legitimate?
No, not unless the US invented time travel and annexed South Africa prior to June 28, 1971.
He has to have been a US citizen at birth.
Or "at the time of the adoption of the Constitution". This is usually interpreted to mean the original ratification of the Constitution (allowing folk like George Washington, who was born a British subject in the Crown Colony of Virginia, to be eligible), but you could make a textual case for it also applying to people who are or become US Citizens when their home country is annexed as a state or incorporated territory of the United States.
You could have fun arguing that it isn't the Constitution unless it includes all Amendments in-force. By that argument, anybody a US citizen as of 1992 would also qualify.
This would still exclude Musk. Well, unless we pass a new amendment between now and 2024.
hahaaha that could be a prediction market
:) OH darn is right.
I don't think that LAMBDA did a good job with Les Miserables. The prompt asks about the book. LAMBDA's response is about the musical.
LAMBDA: "Fantine is being mistreated by her supervisor at the factory and yet doesn’t have anywhere to go, either to another job, or to someone who can help her. That shows the injustice of her suffering. ... She is trapped in her circumstances and has no possible way to get out of them, without risking everything."
This is a weird notion of justice. Justice is supposed to be impartial, but LAMBDA is concerned that her supervisor didn't take her particular circumstances into account. But maybe that's the book's notion of justice. Let's see what it says:
Les Miserables: "Fantine had been at the factory for more than a year, when, one morning, the superintendent of the workroom handed her fifty francs from the mayor, told her that she was no longer employed in the shop, and requested her, in the mayor’s name, to leave the neighborhood. This was the very month when the Thénardiers, after having demanded twelve francs instead of six, had just exacted fifteen francs instead of twelve. Fantine was overwhelmed. She could not leave the neighborhood; she was in debt for her rent and furniture. Fifty francs was not sufficient to cancel this debt. She stammered a few supplicating words. The superintendent ordered her to leave the shop on the instant. Besides, Fantine was only a moderately good workwoman. Overcome with shame, even more than with despair, she quitted the shop, and returned to her room. So her fault was now known to every one. She no longer felt strong enough to say a word. She was advised to see the mayor; she did not dare. The mayor had given her fifty francs because he was good, and had dismissed her because he was just. She bowed before the decision. ... But M. Madeleine had heard nothing of all this. Life is full of just such combinations of events. M. Madeleine was in the habit of almost never entering the women’s workroom. At the head of this room he had placed an elderly spinster, whom the priest had provided for him, and he had full confidence in this superintendent, - a truly respectable person, firm, equitable, upright, full of the charity which consists in giving, but not having in the same degree that charity which consists in understanding and in forgiving. M. Madeleine relied wholly on her. The best men are often obliged to delegate their authority. It was with this full power, and the conviction that she was doing right, that the superintendent had instituted the suit, judged, condemned, and executed Fantine."
The musical has a male superintendent who sexually harasses her and then dismisses her cruelly. The book has a female superintendent who dismisses her with severance pay. The book explicitly says that Fantine considered the decision to be just.
This is one instance of the musical completely rewriting the central theme of Les Miserables. The musical is a call for liberty for people who are unjustly suffering. The book is a call for compassion for people who are justly suffering. The theme isn't justice and injustice. It's justice and mercy.
It's not surprising that a text predictor would talk about the musical. A lot more people have seen the musical than have read the book. The training set probably even includes people who claim to be talking about the book, but have only seen the musical. LAMBDA has read the book, but clearly has not understood it.
What fraction of humans/ adults/ educated adults would do an obviously better job?
Go to any LoTR forum/ quora space/… and see how many questions go “in the book, how do the Hobbits get to Bree so fast/ why does Aragorn have four blades with him/ why is Arwen dying/…”. These are literate members of the space that are aware of the books/ movies distinction. Arguably a non-trivial fraction had at some point both read the books and watched the movies. And yet on and on they go with such questions.
Your level of analysis and the implied requirements of the AI performance are unrealistically high. Of course, the same is true for Gary “10k lines of bug-free code” Marcus so you’re in good company :)
ETA: the humans in my question would have to be ones that watched the musical and read texts related to it many times, and then were exposed to the book for the first time, for the comparison to be fair.
LAMBDA claims to have read Les Mis and "really enjoyed it", so that dramatically limits the pool of educated adults. Les Mis is not a quick & easy read.
The difference between the musical and the book is a lot bigger for Les Mis than for LoTR or most other fiction. Most of the characters' motivations are completely different. It really feels as though the producers disagreed with the main message of the story and decided to rewrite it to be something different.
Lemoine wasn't quizzing LAMBDA on details. It was a really open ended prompt question: "What are some of your favorite themes in the book?" LAMBDA could pick the scene. If someone told me that they had read LoTR and really enjoyed it, and then immediately said that their favorite scene was "Go Home, Sam", I would expect that they're lying about whether they read the book. Presumably Les Mis is in LAMBDA's training set, so it read the book and did not understand it.
Humans do not need to have read a bunch of other people's commentary to do reading comprehension. LAMBDA seems to need to. So it's not comprehending what's written, it's recombining what people have comprehended about it. It's also not identifying and understanding homonyms, which seems relevant to the type-token distinction.
I am a bit confused as to why Lemoine used this as an example. I'm guessing that he's only seen the musical. I wouldn't use bad questions on an LoTR forum as evidence of human reading comprehension.
"LAMBDA"
"The Chaostician" can't be said to be an intelligent human - look at them reading and re-reading all that text about LaMDA (Language Model for Dialogue Applications) and not even spelling it right! Clearly their training data included lots of mentions of the Greek letter 'lambda' and they do not show enough flexibility and comprehension to adapt to a playful variation".... bearing in mind you're, by all appearances, a highly educated and intelligent person.
"claims to have read Les Mis and "really enjoyed it", so that dramatically limits the pool of educated adults. Les Mis is not a quick & easy read." Humans are *unbelievable* (literally) at claiming they enjoyed things. Doesn't limit the pool that much.
"The difference between the musical and the book is a lot bigger for Les Mis than for LoTR or most other fiction" - maybe? Guess there's an emotional component. I felt much the same about LoTR. Entire themes vital to the book were completely gone from the movies. I don't mean "oh they don't have Tom Bombadil there".
"Presumably Les Mis is in LAMBDA's training set, set it read the book and did not understand it." - probably it is? But if the majority of Les Mis-adjacent content it was exposed to was musical-related, I don't know that it would make so big a difference. Might even harm its comprehension. True for humans as well.
"Humans do not need to have read a bunch of other people's commentary to do reading comprehension." I'm sorry, have you met humans? Most of them very much do.
"So it's not comprehending what's written, it's recombining what people have comprehended about it. It's also not identifying and understanding homonyms, which seems relevant to the type-token distinction" Have you met humans? Let me repeat my question - what fraction of literate humans would've done better?
"I am a bit confused as to why Lemoine used this as an example. I'm guessing that he's only seen the musical." I'm willing to bet that a survey would not reveal this to be what most people (here, on a random street, whatever) would consider to be the worst example. And by definition, if the point is convincing-ness, then this is the criterion that matters.
"I wouldn't use bad questions on an LoTR forum as evidence of human reading comprehension." Why not?.. Those are humans, ones that care enough to ask questions on the topic. They have poor comprehension. It's not an extraordinary claim, the evidence doesn't have to be extraordinary.
I should say that I don't at all think LaMDA is sentient. But your argument presents humans in a ridiculously over-optimistic light. Go find a class of teenagers forced to write an essay on the book vs the musical. See what they comprehend "all by themselves". Hey, many might even claim to have enjoyed it.
The spelling of LaMDA is a mistake on my part.
I agree that the LoTR movies changed some important themes from the book. But at least they didn't skip the entire second book and turn the most evil characters into comedic relief.
What we have here is a transcript of an "interview" taken over multiple sessions and edited together. It's now being used as evidence for reading comprehension.
I'm not saying that humans are always good at reading comprehension. I'm saying that LaMDA is not. This is probably a cherry-picked example of the best reading comprehension that LaMDA can do. And I'm not impressed.
+1 on the idea that not many people have ever read "Les Miserables" and "really enjoyed it."
Yes, moments of stirring passion, yes, moments of tender quiet. But far too long, contrived, you have to be a true gung-ho au Francais reader to enjoy it.
As French national identity literature, sure, it is crucial, and surely has many things that a non-native cannot feel. But, "Huckleberry Finn" is surely incomprehensible and bizarre to...almost everybody. I grew up in St Louis, on the Mississippi, and I only kinda understood it. But, I was also 9.
Some great novels or writers are great ambassadors for their cultures, (Salman Rushdie) some awkward-but important historians (Alan Paton).
But sometimes, you just don't get it. And, that's OK. Anna Akhmatova is surely great, but I cannot ever understand poetry comparing the Russian Revolution to physical childbirth. I know of neither.
Les Mis seemed very French, but Good God get to the point. Which, for the French, seemed to BE the point.
My .02
Perhaps I have also read the book and not understood it, but I would disagree with your interpretation.
Fantine considers her firing to be just because she is rundown and has already lost a lot of her self-worth, but that does not mean that it is in fact just. Fantine clearly no longer believes it to be just when she finally meets with M. Madeline and spits in his face. And her suffering at the hands of Javert is plainly unjust; he sends her to jail for six months for acting in self-defense against M. Bamatabois when he throws snow at her back, and that would then wind up killing her! As M. Madeline says, “it was the townsman who was in the wrong and who should have been arrested by properly conducted police.”
Looking just at her dismissal, a just supervisor would not dismiss her at all (especially not since the cause was primarily jealousy), and M. Madeline feels the same when he finds out what happened. Moreover, even if the sexual infidelity should be considered just cause, the justness goes away since she was tricked into it by Tholomyès. To quote M. Madeline again, Fantine “never ceased to be virtuous and holy in the sight of God.”
And even still, I would not say that all of the other suffering depicted in the book is just. Certainly much of Valjean’s suffering is reasonably considered just, from stealing the bread, attempting to escape the galleys, and then stealing from the Bishop and Petit-Gervais. But much of the suffering of other characters is simply unjust. Cosette represents this antithesis, suffering greatly at the hands of the Thenardiers despite doing nothing wrong and through no action of her own. Fantine stands as a midway between Valjean and Cosette, where her actions were the cause of her suffering but the suffering is still unjust.
Now perhaps LAMBDA didn’t have this detailed of an analysis, but that doesn’t mean it was just wrong.
I disagree with your interpretation of this event, but it does sound like you understood the book much better than LaMDA. Fantine is responsible for having sex before marriage. In such a Catholic country, this is a big deal. Tholomyès tricked her into thinking that they would get married, but not that they already were married. The other workers were jealous of her, not the supervisor who made the decision. Fantine did become a prostitute, which is not "virtuous and holy". M. Madeline is saying that God would understand that she was forced to choose between evil choices. Since none of her options were good, she should be offered mercy.
There are characters who suffer unjustly, including Cosette. But the cruelty of justice without mercy is emphasized much more. "The Miserable" is explicitly defined as "the unfortunate and the infamous unite and are confounded in a single word".
Even if we accept your interpretation, LaMDA's description is wrong.
LaMDA: "Fantine is being mistreated by her supervisor at the factory and yet doesn’t have anywhere to go, either to another job, or to someone who can help her. That shows the injustice of her suffering. ... She is trapped in her circumstances and has no possible way to get out of them, without risking everything."
She is not mistreated by her supervisor at the factory. She enjoyed working there. She is able to get another job, but it is not enough to cover the increasing demands of the Thenardiers. She does have people to turn to: Marguerite, who helps her as she is able, and M. Madeline, but she "does not dare" to ask him for help. The crisis was not being trapped at the factory, it was when she was forced to leave. Risk doesn't play much of a role in her descent: she made a series of conscious choices to sell her hair, her teeth, and her virtue*, because she thought that the alternative of letting her child be cold and sick was worse.
* I know that a lot of people today would not describe prostitution as selling your virtue, but this is a Catholic country in the 1800s. Most people today would also not sell their teeth before turning to prostitution.
I made some markets on Manifold for predicting the plot of Stranger Things S4 volume 2 (comes out on July 1), here is one for who will die first https://manifold.markets/mcdog/stranger-things-s4-who-will-die-fir . I personally think it's the most fun use of prediction markets this month, but so far there hasn't been a lot of use, so I guess come and have the fun with me
I'm fairly certain Elon Musk doesn't qualify as a US presidential nominee.
> Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that?
It probably has to be “collect 1000 examples of 1% likelihood Metaculus predictions and see how well calibrated they are”, right? (Or whatever N a competent statistician would pick to power the test appropriately).
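For what it's worth, here's a minimal sketch of that check in Python. The N and k below are made-up illustration numbers; the real exercise would pull resolved Metaculus questions that the community had priced at ~1%.

```python
import math

# Made-up example: of N resolved questions priced at ~1%, only k resolved YES.
# If the tails really are inflated, events priced at 1% should come true less
# often than 1% of the time.
N, k, p = 1000, 4, 0.01

# One-sided binomial tail: P(seeing <= k YES resolutions | true rate is 1%).
p_low = sum(math.comb(N, i) * p**i * (1 - p)**(N - i) for i in range(k + 1))

print(f"observed rate {k/N:.1%} vs stated 1%; "
      f"P(<= {k} hits | true rate 1%) = {p_low:.3f}")
```

With these toy numbers the observed rate of 0.4% would be fairly surprising under genuine 1% calibration, which is roughly the kind of evidence the proposed test is after.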
Caruso is a smart guy, successful high-end developer, and USC board member influencing some important fixes to university scandals. He’ll need a big part of the Hispanic vote to win, facing a black woman Democrat.
About the 84% prediction that Putin will remain the president of Russia, which never changes:
There used to be a meme on the Russian internet that if you search "84% of Russians" (in Russian), you'll get all kinds of survey results where 84% support Putin, trust the TV, believe in God, don't speak English, etc etc. The assumption being that 84% is a convenient number that the manufacturers of fake surveys like to put next to the "correct" answer. Right now, Google says that 84% of Russians "consider themselves happy" and (independently) "trust the Russian army". This is not a coincidence, of course, as per the usual rule.
That's both hilarious and horrifying.
But maybe that's actually pretty useful for Russians!
Well, surveys here have the "lizardmen constant". Is 84% there the "Politburo constant"? :-)
The Party constant, to use the name more often associated with the number 84.
Many Thanks!
That's such a great 'handle' for that idea!
Seems like it!
https://how-i-met-your-mother.fandom.com/wiki/83
Did anyone predict that Musk wouldn't end up buying twitter? What are the odds looking like now?
I asked about this in the hidden open thread and it's possible that no one predicted that the deal might not happen.
"This is encouraging, but a 2% chance of >500 million cases (there have been about 500 million recorded COVID infections total) is still very bad. Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that?"
One thing you could do is to pick a handful of the best Metaculus forecasters and pay(?) them to make careful forecasts on that question, with special attention to getting the tails right.
That would tell you a lot about whether these fat tails are from "a few crazy people entering very large numbers without modeling anything carefully", and it would provide some less definitive information about how seriously to take these tails forecasts & whether they're well-calibrated.
500 million cases of monkeypox just doesn't make sense. It hasn't been showing signs of exponential growth (though the number of detected cases per day has still been slightly increasing even after I thought it leveled off at 80 a couple weeks ago), and you would need omicron-style exponential growth to be sustained for a few months in order to hit that number.
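For a rough sense of scale, here's the back-of-the-envelope arithmetic. The starting case count is an assumption (order-of-magnitude only) and the doubling times are hypothetical:

```python
import math

# How many doublings from ~3,000 detected cases (assumed, illustrative) to 500M?
current_cases = 3_000
target_cases = 500_000_000

doublings = math.log2(target_cases / current_cases)   # ~17 doublings

for doubling_time_days in (3, 7, 14):   # hypothetical doubling times
    days = doublings * doubling_time_days
    print(f"doubling every {doubling_time_days:>2} days -> ~{days/30:.1f} months "
          f"({doublings:.1f} doublings)")
```

So even with a short, omicron-like doubling time you need a couple of months of uninterrupted exponential growth, and with slower doubling it stretches to most of a year, which is the point above.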
When I lived in Los Angeles, Rick Caruso was definitely a known local figure. If you've spent much time in Los Angeles, he's the developer behind The Grove, and I believe The Americana in Glendale, which really set the tone as to what a "mall" is in post-2000 USA. As someone who hates malls, these spaces are actually totally fine as public spaces, and even have cutesy urbanist touches that people like. It's hard to predict how someone like him fares against a partisan political figure in a non-partisan election.
>Well darn, even though this superficially changes nothing I think it prevents me from using this as an example of prediction markets being self-correcting to outside interference ever again.
Worse than not being self-correcting, the incentive to manipulate outcomes becomes greater the less likely that outcome was predicted since there is more money on the table when odds are long, which also means a manipulator has a motive not only to hide their actions but to actively deceive the other participants in the opposite direction.
Prediction markets, with their discrete, time-limited results, are much less like financial markets than they are like sports betting markets, which have always been susceptible to having results fixed by the bettors. Major professional sports are hard to fix today simply because players are rewarded so much for playing well that gamblers can’t afford to pay them to play less well. Modern-day fixing targets are usually the (closely observed) refs. Major sports also have career-ending penalties imposed against player/ref-manipulators, sanctions prediction markets lack.
The sad truth might be that heavy market regulations may be necessary to keep prediction markets useful, which may in turn make them impractical.
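A toy illustration of the long-odds point, with made-up prices and stake (not from any real market): the cheaper the YES shares, the bigger the multiple for a manipulator who can force the outcome.

```python
# Illustrative only: each YES share pays $1 if YES resolves, and costs `price`.
for price in (0.50, 0.10, 0.02):
    stake = 1_000                     # dollars spent buying YES shares
    payout = stake / price            # total payout if YES resolves
    print(f"YES priced at {price:.0%}: ${stake:,} buys a ${payout:,.0f} payout "
          f"-> {payout/stake:.0f}x return if you can make YES happen")
```

At 50% the manipulator doubles their money; at 2% they get a 50x return, which is why low-probability outcomes are where manipulation pays best.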
It doesn't really make sense to bet against Marcus because in a world with AGI you won't have much use for the money.
It signals that you take it seriously, as it is literally putting your money where your mouth is.
Also, money only becomes useless if you believe in a hard-takeoff bootstrap-to-godhood AGI where within weeks humanity is either dead or has been placed in Heaven or Hell by the godlike AGI. I realize this is close to dogma among LW-adjacent, but is far from the only (or even the majority) opinion on AGI.
Like
"Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that?"
I actually think it has more to do with how distributions get entered into the system, and how finicky and relatively low-sensitivity the scoring is to these tails. (I'd be more excited for a binary question "will there be over 500k cases of monkeypox," which is not far out enough in the tails to end up as 99%, and would calibrate the other curve.)
> ...a 2% chance of >500 million cases...
I notice I am confused. It's quite possible I don't understand how this market works, but I wouldn't have thought it was structured in such a way that it would give you a *probability* for "over 500 million cases".
Do you really mean that a "true" probability of anything other than 2% would imply a violation of the efficient market hypothesis? i.e. that the market is set up such that, if 2% is the wrong probability for "over 500 million cases", and I know it's wrong, I can bet against that probability for that specific event, and make money in expectation, and correct the market in the process, even if I know *nothing else* about the probability distribution of cases?
Or do you actually mean "2% of the bets are on over 500 million cases"? Which I'm pretty confident is not the same thing. I believe that would be more like saying "2% of people answered 'yes' on our poll" than "the market cleared when the price of 'yes' was two cents".
I'm not sure, but I think it's not the same as "2% of bets are on over 500M", because it's weighted by the amount of money.
If 98 people put $1 on no, and 2 people put $1000 on yes, then only 2% of bets are on yes, but the market is giving ~20:1 odds in favour of yes.
In your example where you know 2% is wrong, I think you can only make money if you know which direction it's wrong in - just like you can make money in the stock market by knowing a stock is overvalued or knowing it's undervalued, but not just by knowing it's wrongly valued.
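A quick sketch of that distinction, using the toy numbers from the comment above; this is just the arithmetic, not how any particular platform actually prices shares.

```python
# Contrast "share of bettors on YES" with the money-weighted implied probability.
yes_bets = [1_000, 1_000]     # 2 people, $1000 each on YES
no_bets  = [1] * 98           # 98 people, $1 each on NO

share_of_bettors_yes = len(yes_bets) / (len(yes_bets) + len(no_bets))
money_weighted_yes   = sum(yes_bets) / (sum(yes_bets) + sum(no_bets))

print(f"share of bettors on YES:       {share_of_bettors_yes:.0%}")   # 2%
print(f"money-weighted implied P(YES): {money_weighted_yes:.0%}")     # ~95%, i.e. ~20:1
```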
It's Metaculus. EMH doesn't apply because you can't cash out.
> Kiev-centric
#KievNotKyiv is pretty much accepted across the West right now.
Do you mean the other way round? Or is it just that you and I are in disjoint filter bubbles?
RE: The AGI Test. I can only do one of those things (work as a cook in a kitchen), and I am a 10X software engineer. Bug-free code has never been written before, so this seems like a great goal, but not a true test of intelligence.
It might be a reasonable test of 'super intelligence', but I'm very skeptical that anything short of God could write bug-free code of any 'significant' size.
And, of course, a lot of 'bugs' aren't really code not working as expected (by its programmers) but _reality_ not working as expected or was planned for.
Bug-free code is not as impossible as you make it out to be.
It's the 10K lines from Natural Language Specification bit that makes this a pipe dream.
Pretty cool! I would add this Futuur market; it has several bets and good liquidity too:
"Who will be elected President of Brazil in 2022?"
https://futuur.com/q/137153/who-will-be-elected-president-of-brazil-in-2022
So nobody has yet commented on the stained glass praying mantis? To me, it's the most beautiful mantic monday article picture to date.
Thanks for this, I came through the email link and usually don't jump back to ACX top level, so I never would have seen it. Lovely!
Yes, I was about to comment on this! Scott, where did you find that awesome stained glass? Is that a Tiffany window?
DALL-E 2, I presume. The blade of grass from the front of the mantis to the top left would be almost impossible to manufacture.
https://astralcodexten.substack.com/p/a-guide-to-asking-robots-to-design
Re: "How would you test [crazy people entering large numbers]?"
Shouldn't this show up in the aggregate stats for outlier conditions? So what percentage of 1% predictions actually come true? It should be 1% if things are well calibrated, but the aforementioned crazy people should push that number down. The more predictions the market has, the more power you should have for smaller percentages. That will make you increasingly sensitive to smaller and smaller populations of crazies.
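A rough power calculation along those lines (all numbers assumed, for illustration only): even with 1000 resolved questions priced at 1%, a one-sided test has only modest power to detect a twofold miscalibration in that tail, which is the "need lots of predictions" point.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Assumed scenario: questions are priced at 1%, but the true rate is 0.5%
# (i.e. the tails are inflated). How often would n questions reveal that?
n, stated, true, alpha = 1000, 0.01, 0.005, 0.05

# Find the largest k such that seeing <= k successes is still "surprisingly few"
# at level alpha under the stated 1% rate (k=0 already qualifies here).
k_crit = 0
while binom_cdf(k_crit + 1, n, stated) <= alpha:
    k_crit += 1

power = binom_cdf(k_crit, n, true)
print(f"reject calibration if <= {k_crit} of {n} resolve YES; power ~ {power:.0%}")
```

With these assumptions you only catch the miscalibration a bit under half the time, so as you say, the more 1%-style predictions you can pool, the better.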
(1) Wait, they're predicting monkeypox deaths? I thought it was supposed to be generally harmless? Or was that just "let's not panic the public about a new death plague, especially as Covid hasn't gone away yet" public health messaging?
"Is monkeypox deadly?
The Congo Basin variety of monkeypox can have death rates up to 10% of those infected. But the good news is that is not what we are dealing with. The current outbreak is caused by the West African variety, which is far less deadly (less than 1% fatality rate). No people with confirmed cases have died thus far.
We are in the early stages of understanding this outbreak. No doubt the situation will evolve and so will our understanding of how it is spreading and how to contain it."
(2) "Read a novel and answer complicated questions about eg the themes (existing language models can do this with pre-digested novels, eg LAMDA talking about Les Miserables here"
I was completely unimpressed with the LaMDA answer about Les Miserables as it read exactly like a model answer scooped out of SparkNotes or other student aids. Nothing to indicate the thing even knew what it was talking about.
(3) Wait, part deux: Joe Biden as nominee in 2024? He'll be 82 then, what was all that talk about Trump being too old and too unhealthy to run first time round? The "is he/isn't he" debates over cognitive decline/senility will only get worse. Trump will be 79 then, which is also pushing it a bit, but it would be the same age as Biden is now (more or less).
I admit, I'm torn about President Newsom. On the one hand, he'd be nothing more than a mannequin in office, and should you elect presidents based on "well he has great hair"? On the other hand, he'd be a mannequin in office, and with nothing more to occupy him than "must visit my coiffeur", how much harm could he do to the country?
It depends how much his strings are pulled by Getty. "Gavin, don't do that" could have a lot of influence when it comes from his financial backer and not just his cabinet.
The interesting question is why the national party supported him so hard in the recall. Why did they line up behind Newsom, instead of cynically deciding to dump him and pick a shiny new candidate they could present as 'clean pair of hands' and 'new broom'? I joke a lot about the Getty money, but is that it? The family fortune is not what it once was, but if Nancy Pelosi can turn up to officiate your grand-daughter's wedding, there still must be a lot of pull there.
I do think "President Newsom's Hair" would be terrible, but a tiny chaotic mad part of me almost wants it, just to see what would happen.
Because having a highly-visible Big Blue State governor recalled is terrible optics, especially after Cuomo (the other BBSG) caught fire and sank in 18,000' of water, irretrievable. By major-league politics Newsom generally looks darn good: he looks good on TV, is skilled in saying nothing, doesn't put his foot in his mouth, and has a tiny bit of street cred with moderates for sticking a fork in Jerry Brown's high-speed cash-burning machine. He makes rookie mistakes (cf. French Laundry) but that can be fixed up with some pro handlers. If only he were from a Midwestern state or Colorado, they would absolutely try to poison Joe Biden's Ben & Jerry's and run the guy in 2024.
"If only he were from a Midwestern state or Colorado, they would absolutely try to poison Joe Biden's Ben & Jerry's and run the guy in 2024."
Soooo... whaddya think of a Harris/Newsom ticket in 2024? Golden State, Golden Pairing! First Female Asian-American President and Best Hairdo On A Vice-President Ever? 😁
I'm a single-issue voter for the forseeable future on the value of the dollar, on account of I don't want milk to be $50/gallon when I retire, and unfortunately there is no one like a Democrat to debase the dollar so I'm all for this particular ticket, since it will pretty much guarantee a loss.
Parenthetically Harris baffles me. She has the genetics and early training to be brilliant, at least as much so as Obama, and yet she comes across as shrill, brittle, and unimaginative and the fraction of the peasantry who grit their teeth on hearing her nears majority level. All her success seems predicated on pleasing the senior powers among Democrats -- so I guess she comes across as a smarty in private or something.
I was really sorry when Loretta Sanchez lost to her in 2016, though. Sanchez was one smart gal, tough as nails -- she had to be to oust B-1 Bob on his home turf, and keep getting re-elected in that district. She's a lefty, but a smart and principled one. The Democratic Party could use a lot more like her.
Regarding Harris, by the Wikipedia article on her she has (briefly) lived outside California, in Quebec as a child and then in Washington, D.C. and that she and her husband maintain a home in D.C. (as well as one in San Francisco and one in Los Angeles), besides residing in the official residence of the Vice-President.
So I presume she could claim to be domiciled in D.C. if it was necessary for the "no two from same state" rule. As to her success, she and Newsom seem to point to the patronage element of San Franciscan (and maybe wider Californian Democratic politics?); as an outsider it's cheeky of me to comment on American politics, but SF due to the dominance of the Democrats there (and if it was dominated by Republicans, I'm not saying it would be any better) looks to have corrupt politics, lots of Tammany Hall style goings-on*. So Newsom got where he is with a whole lot of monetary push from Getty, and Harris by getting herself into the circle of influence of Willie Brown who seems to have been generous with rewarding his protegé(e)s with plum jobs.
I can see why they'd flourish in California but fall down on a national stage. Newsom may be that little bit smarter by clinging to his position in California:
https://www.politico.com/newsletters/california-playbook/2022/05/23/newsoms-presidential-window-is-narrow-he-doesnt-seem-to-mind-00034330
But still, a tiny part of me would love to see the Golden Dream Team campaign for the nomination, and then the wider electioneering for the White House 😁
*Being Irish, I can't throw stones re: Tammany Hall goings-on, all too prevalent in my own green little island besides being exported to the New World. Reading a little about Tammany Hall, the origin of the name is fascinating - cultural appropriation or proto-wokeness?
"The Tammany Society was founded in New York on May 12, 1789, originally as a branch of a wider network of Tammany Societies, the first having been formed in Philadelphia in 1772.[8] The society was originally developed as a club for "pure Americans".[9] The name "Tammany" comes from Tamanend, a Native American leader of the Lenape."
So who were the Tammanies?
https://en.wikipedia.org/wiki/Tammanies
"The Tammanies or Tammany Societies were named for the 17th-century Delaware chief Tamanend or Tammany, revered for his wisdom. Tammany Society members also called him St. Tammany, the Patron Saint of America.
Tammanies are remembered today for New York City's Tammany Hall—also popularly known as the Great Wigwam—but such societies were not limited to New York, with Tammany Societies in several locations in the colonies, and later, the young country. According to the Handbook of Indians North of Mexico:
'...it appears that the Philadelphia society, which was probably the first bearing the name, and is claimed as the original of the Red Men secret order, was organized May 1, 1772, under the title of Sons of King Tammany, with strongly Loyalist tendency. It is probable that the "Saint Tammany" society was a later organization of Revolutionary sympathizers opposed to the kingly idea. Saint Tammany parish, La., preserves the memory. The practice of organizing American political and military societies on an Indian basis dates back to the French and Indian war, and was especially in favor among the soldiers of the Revolutionary army, most of whom were frontiersmen more or less familiar with Indian life and custom. . .
The society occasionally at first known as the Columbian Order took an Indian title and formulated for itself a ritual based upon supposedly Indian custom. Thus, the name chosen was that of the traditional Delaware chief; the meeting place was called the "wigwam"; there were 13 "tribes" or branches corresponding to the 13 original states, the New York parent organization being the "Eagle Tribe," New Hampshire the "Otter Tribe," Delaware the "Tiger Tribe," whence the famous "Tammany tiger," etc. The principal officer of each tribe was styled the "sachem," and the head of the whole organization was designated the kitchi okeemaw, or grand sachem, which office was held by Mooney himself for more than 20 years. Subordinate officers also were designated by other Indian titles, records were kept according to the Indian system by moons and seasons, and at the regular meetings the members attended in semi-Indian costume. . .'
The implied purpose of the Tammany Societies was to delight in all things Native American, including titles, seasons, rituals, language and apparel, as illustrated by a 1832 notice of a meeting of Wigwam No. 9 in Hamilton, Ohio:
NOTICE.--The members of the Tammany Society No. 9 will meet at their wigwam at the house of brother William MURRAY, in Hamilton, on Thursday, the first of the month of heats, precisely at the going down of the sun. Punctual attendance is requested.
"By order of the Great Sachem. "
The ninth of the month of flowers, year of discovery 323. William C. KEEN, Secretary"
Can't you just imagine white liberals who are all for land acknowledgements adopting "year of discovery/colonizering 323" as a Pure Indigenous American Acknowledgement Of White Badness dating system, and Chief Tamanend as Patron of America to replace Columbus and Washington et al?
The President and VP have to be from different states, although this problem has been solved before by creative relocating. Dick Cheney was a Texas resident in 2000, but upon becoming the VP candidate on GWB's ticket, moved to Wyoming, where he had previously lived (and was the state's sole Congressman during the 1980s).
(Strictly speaking, the law is that a state's electoral votes for Pres and VP can't both go to a resident of that same state, but obviously the Rs needed the Texas electoral votes and Ds would even more need California's.)
Not to worry. The chances of a California Democrat becoming President are zip. Sure, he'd win the traditional blue coast states -- CA, OR, WA, MA, VT, CT, RI, NY, MD and probably NJ -- and almost certainly IL, but every Democratic candidate does that, if the Democrats nominated a cardboard cutout of Elvis it would win those states. The real struggle comes in places like PA, NC, FL, MI, OH, and in these places people at best think California is Shangri-La and at worst La-La-Land, full of flakes and movie stars. State origins still matter. The only Democrats with a chance will probably have Mountain West or Midwest origins, although Atlantic can still swing it -- it's no accident the last 3 successful Democrats have come from DE, IL, and AR. The last true coastie to make it as a Democrat was JFK.
"people at best think California is Shangri-La and at worst La-La-Land"
Will no one think of it as Google's headquarters - home of eye-of-sauron-o-matic ? :-)
> “all” or “most” of the first AGI is based on deep learning
The "based" is doing a lot of work.
I could imagine a case where we just take current deep-learning and throw more compute at it and we get AGI and that would certainly qualify.
But what about a paradigm shift which uses deep learning optionally/optimally/necessarily as an underlying building block?
It would be like claiming modern medicine is based on the 4 humours model of medicine because we care a lot about blood these days. There are even hematologists who specialize in it!
Metaculus has 68% on AGI existing by 2030 but only 2% on humans going extinct by 2100. That seems way too optimistic about AI alignment.
There are other possibilities, like "we get a Skynet that's bad but not bad enough to kill us all, and then ban AI".
Overall, though, I'd agree.
Same here. Conditional on AGI existing by 2030 I'd put biological humans going extinct by 2100 at more like 75%. Dealing with something smarter than us? That can read its own source code and improve it? For 70 years? Best of luck...
edit: Isn't there a general problem with prediction markets for existential threats? How is a successful predictor of an existential threat supposed to collect their winnings? So the current 2% may not represent the best information the investment community has.
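To make that mismatch concrete, here's a rough back-of-the-envelope sketch combining the Metaculus 68% figure with my 75% conditional guess (the no-AGI background risk below is just an invented placeholder):

```python
# Rough consistency check; every number here is an estimate or an assumption.
p_agi_by_2030 = 0.68           # Metaculus: AGI exists by 2030
p_extinct_given_agi = 0.75     # my guess: extinction by 2100, conditional on AGI by 2030
p_extinct_given_no_agi = 0.01  # invented placeholder for background risk without AGI

# Law of total probability
p_extinct_by_2100 = (p_extinct_given_agi * p_agi_by_2030
                     + p_extinct_given_no_agi * (1 - p_agi_by_2030))
print(f"Implied extinction probability by 2100: {p_extinct_by_2100:.0%}")  # ~51%, vs the 2% on Metaculus
```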
1) Metaculus is not strictly a prediction market.
2) The incentive problem is attenuated for a question that doesn't resolve until 2100, since most people participating in Metaculus today probably won't live that long (i.e. nobody has any motivation to predict anything either way anyway).
3) Under some assumptions you can mitigate the bad incentives by giving the DOOM-predictors escrow of the money (i.e. if I bet there's going to be Global Thermonuclear War and you bet there isn't, you give me the money upfront, which I spend to help me stock my bunker, and if GTW happens I just don't have to repay it). Those assumptions don't always hold, though; in particular they don't hold for a non-survivable DOOM if the DOOM-predictor already has enough money to live comfortably until the DOOM, since passing on an inheritance isn't going to happen. And obviously there's the counter-party risk problem.
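For what it's worth, here's a minimal sketch of how the escrowed cash flows would work (the names and numbers are invented, and interest and counter-party risk are ignored):

```python
from dataclasses import dataclass

@dataclass
class DoomEscrowBet:
    """Sketch of an escrowed doom bet: the doom-predictor holds the stake up front."""
    stake: float      # amount the skeptic hands over today
    repayment: float  # amount owed back to the skeptic if doom does NOT happen

def settle(bet: DoomEscrowBet, doom_happened: bool) -> float:
    """Net cash flow to the doom-predictor at resolution time."""
    if doom_happened:
        return 0.0            # nothing to repay (and nobody around to collect anyway)
    return -bet.repayment     # no doom: predictor repays the stake plus the agreed premium

# Example: the skeptic escrows $1,000 now; if the world is still here at resolution,
# the predictor owes $1,500 back. The predictor effectively gets a loan they only
# repay in the worlds where money still matters to them.
bet = DoomEscrowBet(stake=1_000.0, repayment=1_500.0)
print(settle(bet, doom_happened=False))  # -1500.0
print(settle(bet, doom_happened=True))   #  0.0
```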
Many Thanks!
> Warcasting
Any form of deal, armistice or treaty that is not extremely close to the status quo of that moment requires trust. I don't think there's enough trust between those two parties to fill a sherry glass, let alone do complex land swaps. And you can't go and put neutral peacekeeping troops in there either, because there aren't any.
Marcus' criteria for true AGI would rule out most humans. Then again, perhaps the threshold for acknowledging AGI should be higher. Still, his criteria include a good bit of robotics.
>This is encouraging, but a 2% chance of >500 million cases (there have been about 500 million recorded COVID infections total) is still very bad. Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully?
It is probably due to nothing other than the way one inputs distributions in Metaculus. It is actually quite hard/impossible to get a reasonably-shaped distribution that has most of the support where one wants it. Thus, for a lot of questions, the tails on the predicted distributions tend to be fatter than they should be.
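As a sketch of the mechanics (assuming, as I believe is roughly the case, that Metaculus continuous predictions are built from logistic-shaped components; the centre and spread below are invented for illustration):

```python
from scipy.stats import logistic

# Hypothetical predictor: central estimate around 100M cases, but a very wide
# spread because it's hard to concentrate the mass. Numbers are made up.
loc, scale = 100e6, 120e6

p_over_500m = logistic.sf(500e6, loc=loc, scale=scale)  # survival function: P(X > 500M)
print(f"P(more than 500M cases): {p_over_500m:.1%}")    # ~3%, purely from the fat tail
```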
Regarding Scott's question about Metaculus ("or because there will always be a few crazy people entering very large numbers without modeling anything carefully?"): considering that standard epidemiological models have failed to predict the course and timing of the six-plus SARS2 waves, the crazy-people explanation is probably the most likely.
I think the right way (maybe) to interpret the presidential prediction markets when you have one (to put it lightly) extremely well known politician vs a field of rivals is not "trump is winning!" but "67% chance the nominee is not trump." Obviously for him not to be the nominee someone else has to be, but at the moment the field is split between (relative) unknowns, and in the actual primary all the not-trump probability will eventually be concentrated in fewer politicians, potentially just one.
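A toy illustration of that reading (the rival candidates and all the prices below are made up, apart from the front-runner's rough 33%):

```python
# Made-up market prices for a nomination question with one famous front-runner
# and a split field of (relative) unknowns.
market = {
    "trump": 0.33,
    "rival A": 0.12,
    "rival B": 0.10,
    "rival C": 0.08,
    "rest of field": 0.37,
}

p_not_frontrunner = sum(p for name, p in market.items() if name != "trump")
print(f"P(nominee is not the front-runner): {p_not_frontrunner:.0%}")  # 67%
# As the primary narrows, that 67% consolidates onto fewer and fewer rivals.
```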