335 Comments

On the question of who is a great K-12 teacher, just ask the parents. Or the admin. They all just know who it is. It is the one everyone wants next year.

No need to spend resources coming up with a metric to assess which teacher is most deserving of a merit-based pay raise.

Fantasy football, but with grade school teachers.

"How many minutes do you spend in a Codex-like tool?" is a badly worded question. Google already has AI-driven auto-completion in its internal code editor. I don't see how to measure its usage in minutes. (You could theoretically count _saved_ minutes, but that's not easy to estimate.)

Question to any programmer: how many minutes per week do you spend in the auto-complete of your editor of choice?

> If you treat “teachers whose students get high test scores = good”, then you’ll just promote teachers who work in rich areas, or who get lots of smart students, or some other confounder related to student selection effects.

Give the class to the teacher that will accept the lowest base salary. A class that will predictably have worse scores, granting a lower grades bonus, will grant a higher resulting base salary. Teachers that prefer teaching smart kids can accept a lower salary to do so. Teachers that prefer making money will go for the class where they can improve scores more than others.

https://www.gwern.net/CO2-Coin has some good ideas on a CO2 Blockchain. The piece contains some interesting tangents on prediction markets.

“and we should also replace all lower courts with prediction markets about what the Supreme Court would think”

We should replace the Circuit Courts of Appeals.

We should not replace courts of first instance, though. There is a need for some organisation to create a transcript of the evidence for the prediction market to work with - and also to produce decisions in the cases that aren't especially controversial. The Federal District Courts do this for the federal system and are necessary; it is their transcript of evidence that the higher courts rely upon when generating a verdict, and that is also what the prediction market participants need.

I'm a doctor, and I've suggested to colleagues, even wrote a concept paper for a small conference, that we use prediction markets for prognosis, diagnosis, and treatment of patients. It has gone over as one might expect.

For "how do you know how good a teacher is", the obvious answer to me is "use standardized tests and use a model to adjust for expected student performance".

The hard part isn't in constructing the model (you could use a Prediction Market, or you could just hire two college students to write R for a month), it is in constructing the standardized tests.

Both "what should be tested" and "does the test actually measure that" are Hard problems that I'm not sure the government can actually solve in the long term; it seems certain to me that most school districts aren't solving that well today.
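The "model to adjust for expected student performance" part really can be small. A minimal sketch, where all data is synthetic and every number is invented for illustration: regress end-of-year scores on prior-year scores, then treat each teacher's mean residual as their value-added estimate.

```python
# Minimal value-added sketch on invented data: fit expected performance
# from prior scores alone, then score teachers by their students' mean
# residual. All names and numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_students = 20, 30

true_effect = rng.normal(0, 2, n_teachers)            # "true" teacher quality
prior = rng.normal(50, 10, (n_teachers, n_students))  # incoming student scores
final = (0.8 * prior
         + true_effect[:, None]
         + rng.normal(0, 5, (n_teachers, n_students)))

# Expected performance from prior scores alone (pooled OLS fit).
slope, intercept = np.polyfit(prior.ravel(), final.ravel(), 1)
residuals = final - (slope * prior + intercept)

# A teacher's VAM estimate is their students' mean residual.
vam = residuals.mean(axis=1)
print(np.corrcoef(vam, true_effect)[0, 1])  # high when the model is well specified
```

The hard part, as noted, sits upstream of code like this: whether `prior` and `final` measure anything worth measuring.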

You can solve the 2100 prediction problem with another kind of recursion - have them predict what the market 20 years from now will predict (conditional on it existing). And of course the hypothetical 2041 market will predict what the 2061 market will predict, and so on.
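A toy sketch of why the recursion might stay anchored, with every number invented: if each market in the ladder trades toward its successor's price with bounded noise, and the final market resolves against reality, early prices stay tethered to the truth.

```python
# Toy sketch of laddered "predict the next market" markets. This walks
# backward from the hypothetical resolvable 2100 market; the true
# probability, noise scale, and chain length are all made up.
import random

random.seed(1)
true_prob = 0.12   # hypothetical true chance of the 2100 event
n_links = 8        # markets at 2100, 2080, ..., each predicting the next

price = true_prob
for _ in range(n_links):
    # Each earlier market estimates its successor with some noise.
    price = min(1.0, max(0.0, price + random.gauss(0, 0.02)))

print(round(price, 3))  # stays near true_prob while the chain is anchored
```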

Regarding using prediction markets for long-term predictions: I discussed this with Polymarket's founders. They are aware of the problem, but choose to sidestep it for now by limiting Polymarket to short-term predictions.

A solution for longer-term markets would be to make stakes in a currency that grows at the rate of the market. The guy from Polymarket mentioned Compound USDC, but in my eyes it would be much easier to just make stakes in ETH. Considering that you don't need to wait until the market is resolved and can sell your stake at any time, I don't see why it wouldn't work.

>Vitalik didn’t end with “and we should also replace all lower courts with prediction markets about what the Supreme Court would think”, but I’m not sure why not.

I'd disagree with this, even if we were good at executing it. I think the lower courts help shape SCOTUS decisions on newer/more controversial cases (on others, they do just apply precedent, and do ok at it). It's good for the problem to be fairly worked over by the time it reaches SCOTUS, and for the justices and clerks to have a number of perspectives to sort through.

A lot of my opinion on SCOTUS was shaped by a stage play, Arguendo, that simply staged oral arguments from a case about whether exotic dancing was first amendment protected speech. (I reviewed the show here: https://www.the-american-interest.com/2014/09/15/the-art-of-argument/).

The Justices seemed less focused on justice, or even on applying past precedent, than on crafting a narrow decision that would be minimally harmful when it came back to bite us all as Established Precedent. And thus, seeing how lower courts wrestle with the issue, what they think different decisions would imply down the line, etc. is helpful.

Isn’t the best way to evaluate teachers to let kids choose their next teacher? Sure, some kids will opt for a soft option, but if that’s what they want, who are we to argue? I would bet most teenagers have a pretty good understanding of what’s in their best interest and can easily get feedback from the current pupils. If a teacher gets little uptake, it’s a pretty good signal that they are not providing what the market wants.

For long-term predictions like the chance of nuclear war by 2100, could you augment the Keynesian beauty pageant approach by “laddering” out 80 years, 10 years at a time. So basically, you’d be guessing today what you think the outcome of a Keynesian beauty pageant run in ten years would say. And that one would be trying to guess the one 20 years away, which would be trying to guess the one 30 years away, etc.

The benefit is that there is an answer within 10 years (not in 80 but not right away) but if you have insider information that you think will be public (or at least known by enough participants) in 10 years, you should factor that into your guess. You don’t have a perpetual liquid market, just run the beauty pageant every 10 years.

Of course, I’m not sure how this is any better than having a shorter-term futures (or options) market on top of the underlying long-term question (e.g. what will the market price be for chance of nuclear war by 2100 in 2030). The underlying long-term market may not directly get interest from knowledgeable forecasters (because they don’t care about a payout in 80 years) but the short-term one should and should incorporate information (or arguments) that will come to light in the next 10 years. Plus, the long-term market price should adjust towards what the ten-year out market predicts (barring new, real information entering it) because otherwise you could trade on the difference.

But wait, how is that different from someone buying (selling) the 80-year contract because they think it will be higher (lower) in ten years and they’ll be able to cash out then? If the contracts are tradable and the market liquid, you don’t have to wait for them to expire, you can profit based on the change you expect over any time horizon that matches your preferred holding time.

So is the approach only really relevant when you can’t provide a permanent liquid market?

I'm pretty sure there's a teacher shortage that's only gotten worse since the pandemic. Not sure why teacher merit pay is a question that anyone cares about, as opposed to just raising teacher pay across the board in order to address the shortage, and also improving working conditions to reduce burnout to ensure that it doesn't just happen again in a few years.

Here's a fun fact: experience makes people better at their jobs, and getting teachers to stick around for more than five years would do a lot more to improve performance across the board than questionable schemes to identify the best teachers.

How about thinking about ways to get teachers more training and professional development? More time to do lesson/unit planning? Did you know that teachers can write off school supplies that they buy for their classroom on their taxes? Seems nice, until you realize that it means that students whose teachers don't have disposable income aren't getting fully-stocked classrooms! Why does the richest country on earth have its overworked, underpaid teachers buying school supplies for their classrooms?

What's the point of merit pay? Do you know any teachers who have the potential to be great, but choose not to do their best because they aren't getting paid more than their peers? What do you think it is, exactly, that motivates teachers?

Sorry to come off as forceful here but merit pay is the kind of scheme that is just so completely divorced from the problems and realities of education that it makes me despair that we'll ever get serious solutions.

I think we need to decide what the point of education is before we decide how to measure it. I was reading just the other day about the Prussian education system and about how, in addition to the subversive socialist counterculture, there was a movement by middle class burghers to preserve their own educational norms. It was the best system in Germany measured by things like literacy, income of graduates, etc. It was normal for them to send orphans and homeless children through the system too so it wasn't selection effects. Yet it ultimately lost.

The three lines of attack against it (afaict) were that it was insufficiently national (whether in terms of socialist class struggle or conservative nationalism), that it was insufficiently professional, and that it focused too little on things like the cultivation of virtues or poetry. It was stereotyped as creating wealthy people who thought they were educated but only had a kind of shallow cosmopolitanism combined with an obsession with money.

But those aren't attacks on its efficacy. Those are attacks on its purpose. The attackers' position was that education was supposed to prepare you for a career; the reply was that that wasn't the point of education. That's a normative argument, and one that Germany ultimately decided against them. Instead, both the conservative nationalists and the liberal socialists wanted a national professional system.

Likewise, the original purpose of the American national educational system was to inculcate a sort of republican (as in the republic, not the party) virtue and civic nationalism. Before that it was to save your soul. I don't think those are its purpose today. But what is that purpose? I suspect you'd get ten different answers. The last time we had a concerted answer was No Child Left Behind which, more or less, said it was to pass tests proving you had baseline skills like literacy. If we agree on that we can make that happen. But do we agree on that? Even at the time I don't think we did.

Seems to me that the problem with prediction markets is that they have finite liquidity and are subject to irrational/dishonest bets. Emotionally motivated voters can make money-losing bets, or intentional market manipulators can make bets to drive the market up or down.

In an efficient market, any $ of irrational bets would immediately be overwhelmed by opposite bets from smart investors, who would happily take the money. However, in real life there's a finite trading volume on any given bet, and if irrational bettors spend $$$ faster than smart-money bettors can capture it, the market remains irrational.

We've seen this in multiple US elections where the political prediction markets can behave in very irrational ways despite a small number of smart-money investors plundering as many bad bets as they can.

The problem of irrational money outweighing smart money gets even worse if you try to routinely use prediction markets to answer real-world questions. Let's say you use prediction markets to answer 10 salient questions, and you manage to draw a big enough trade dollar value to swamp any effect of irrational bets.

Since your markets are really practical and useful, you now expand them to 100 salient questions. Either you have to somehow attract 10x the trade volume, which could be difficult or impossible if your volume was already quite high... or each question only gets 1/10 as much trade volume as before. The more questions you add, the less confident you are in any one answer.

It feels like forking a cryptocurrency into many mutually-incompatible coins until each one is vulnerable to a 51% attack.
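One way to make the cost of manipulation explicit: under an automated market maker such as Hanson's logarithmic market scoring rule (used here purely as an illustration; real-money prediction markets mostly run order books instead), the spend needed to move the price a given distance has a closed form, and whatever a manipulator spends is there for informed traders to collect. A sketch with an arbitrary liquidity parameter:

```python
# Manipulation cost under a two-outcome LMSR market maker. The liquidity
# parameter b and the prices below are illustrative choices.
import math

def lmsr_cost_to_move(p0: float, p1: float, b: float) -> float:
    """Cost of buying YES until the LMSR price moves from p0 to p1.

    Derived from the cost function C(q) = b*log(exp(q_yes/b) + exp(q_no/b)):
    holding q_no fixed, the spend works out to b*log((1-p0)/(1-p1)).
    """
    return b * math.log((1 - p0) / (1 - p1))

b = 100.0
print(round(lmsr_cost_to_move(0.50, 0.70, b), 2))  # 51.08
print(round(lmsr_cost_to_move(0.50, 0.95, b), 2))  # 230.26
```

The cost grows sharply near the extremes, which bounds how far irrational money can push the price per dollar spent; but, as the comment above says, none of this helps if the informed money simply isn't there to collect it.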

On the valuation of SpaceX - I'm not entirely sure, but could there be some distortion introduced by looking at the *median* valuation rather than some probabilistic average? I suppose that if the valuation can't go negative, then the mean must be at least half of the median, so there must still be some mismatch, but it might not be as extreme as you get by looking just at the median.
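The mean-versus-median step can be made precise: for a nonnegative valuation $X$ with median $m$,

```latex
\mathbb{E}[X] \;\ge\; \mathbb{E}\bigl[X\,\mathbf{1}\{X \ge m\}\bigr]
            \;\ge\; m \cdot \Pr(X \ge m)
            \;\ge\; \frac{m}{2},
```

so the mean can undershoot the median by at most a factor of two, but can exceed it without bound (e.g. a heavy right tail of very-high-valuation scenarios).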

I don't think your moderation problem (really bad spammy posts) is as bad as you think. In principle there's no reason a sufficiently liquid betting market can't target really unlikely events. But you need to pay out at the odds of the event. E.g. Kalshi could handle this moderately well.

But really what you need to do is re-orient slightly. The reddit/metaculus approach of giving everyone internet points works surprisingly well, and doesn't have problems with these low-probability events being expensive -- because internet points are cheap. Add a ranking system for points holders and it gets much easier. Then if, e.g., >95% of bets were that you'd block something, you temporarily block it -- subject to review.

"In some sense, the definition of probability is what a smart person who knows a certain amount of information should estimate"

That doesn't sound right. Probability would exist even if no people did.

"Certain classes, races, and genders of students consistently produce higher VAM than others, and a teacher’s VAM can apparently predict their students’ past performance, which makes no sense unless there’s some kind of bias going on."

This was very intriguing. I wish you'd have done a deeper dive on these VAM issues. It seems to me any problems with this metric are simply due to a failure to define educational "value" correctly. And rigorously examining this definition should also trigger a highly useful debate about exactly what we are trying to accomplish with education and how much it is worth.

For example, is the "value added" metric defined solely in terms of the raw number of additional test questions answered? In that case, it would be easier to add "value" by just selecting a bunch of 99th percentile smart kids who will learn the material better and faster. But if they started and ended in the 99th percentile (and probably would have done so even if they skipped school and taught themselves), has any "value" really been added by the teacher?

But what if "value added" for an individual teacher is the ability to raise the percentile rank of her students? In that case, low-motivation and "underperforming" kids are the low-hanging fruit who have the most room for improvement. But maybe this "value" isn't being added by the teacher and is just an artifact of underperformers regressing (upward) to their mean ability.

In any event, "teacher value added" should be graded by "degree of difficulty." For example, improving results for kids with IQs of 75-85 and issues with focus and impulsivity should count for more than simply presiding over smart kids being smart.

Finally, have educational researchers ever heard of Bill James? He already figured out how to do "value added" analysis for individual performers. According to Sabermetrics principles, once we correctly define what result we consider an educational "win" we should simply rank teachers like baseball players in the draft -- i.e., based on their expected "wins above replacement."

Quick comment before I finish reading the post but this sounds confused:

> For example, one stable equilibrium is that the right answer is the obvious Schelling point so everyone tries to coordinate around that. But another stable equilibrium is that “one thousand” is a very round number, so everyone tries to coordinate around that.

Is there an obvious Schelling point I'm failing to think of? Zero years (until nuclear war destroys civilization)? Zero is an extremely focal number.

But the confusion is in the next sentence -- "But another stable equilibrium..." -- which goes on to describe exactly what a Schelling point is, so starting it with "but" doesn't make sense.

(Also "one thousand" isn't a coherent answer to "will nuclear war destroy civilization by the year 2100" so I guess there are layers of confusion here.)

The overall point is clear (and well taken!) to those of us who know all about Schelling points and Keynesian beauty contests but I suspect the less initiated are going to get lost there.

I think citing Caplan as your authority on education -> income is a lot like citing Borjas as an authority on immigration. He's not a crank, but definitely represents a view that is disputed by other people who know how to run a regression.

(1) Re: the Virginia governor election - I have no idea about the two candidates apart from one seems like a standard Democrat and one seems like a standard Republican. But going to the linked "Washington Post" story sends me on to a link to an explainer ad on Twitter from Governor McAuliffe, and I would vote for his opponent on this alone:

"Husband, father, former Governor, and proud dog dad to Trooper and Dolly. Now, running to be the next Governor of the Commonwealth of Virginia."

He's 64 years old, he's got five human kids, and he burbles on about being a furry. Oh, you didn't mean it *that* way, Terry? Then why pretend you are the equivalent of an adoptive father to non-human animals, and that they are the equivalent of your human children whom you do not put into your Twitter bio (are you afraid potential voters will see you have five kids and imagine you are some religious zealot conservative?)

Yes, I'm cranky about this. Even in a fun, just-joshing way - animals are not your babies, you are not their parent, it's not an equal relationship.

(2) Setting up automated bans triggered by how much money one person, or a co-ordinated group of people, is willing to spend to get some comment or some commenter banned? Yes, I see no way at all that can be gamed!

Re: Decision making by market prediction: What stops a billionaire from putting their thumbs on the scale? There's been value shown in vote-manipulating Reddit posts by companies and governments. If we assigned a dollar value, this would become more expensive, but still a good return on investment for the people with this kind of capital.

> Keynesian Beauty Contests

You could also predict what people will predict in a subsequent market t years from now, e.g. "What will the prediction be when this same poll is run 3 years from now?", under the assumption that as you approach the terminal endpoint, the predicted values stochastically converge to the true ones.

Scott,

I suppose you have forgotten this, but in 2015 there was a website called Omnilibrium that already implemented a very similar idea (the old SSC website had a link to it for a while). On Omnilibrium, though, instead of making predictions just for the moderator, the recommendation algorithm made predictions for each individual user. The website never acquired many active users and went inactive after about a year.

<And you can’t do an open tournament, because then lots of stupid people would be in it and the challenge would be figuring out what stupid people would guess. >

You know, you could just ask me.

For content moderation, you could have reddit-style up and down votes, and then Scott's votes are authoritative. So if Scott downvotes a comment, then everyone who upvoted it will be downweighted in terms of how much their votes counted for moderating other posts. And if someone is heavily downweighted, then anyone who has a similar voting pattern to them will be downweighted as well. This would work best if Scott's votes were invisible and no one knew about the algorithm so they couldn't try to game the system. It obviously works less well for something like Facebook where users don't trust the company's opinions.
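A minimal sketch of that weighting scheme; the names, data structures, and penalty factor are all invented, and the second-order step (downweighting users whose voting patterns resemble a downweighted user's) is omitted for brevity.

```python
# Voters start with weight 1.0; upvoting something the moderator
# downvotes (or vice versa) shrinks their future influence.
from collections import defaultdict

weights = defaultdict(lambda: 1.0)  # voter id -> moderation weight

def score(comment_votes):
    """Weighted score of a comment; comment_votes maps voter -> +1/-1."""
    return sum(v * weights[u] for u, v in comment_votes.items())

def apply_moderator_vote(comment_votes, mod_vote, penalty=0.5):
    """Shrink the weight of every voter who disagreed with the moderator."""
    for u, v in comment_votes.items():
        if v != mod_vote:
            weights[u] *= penalty

votes = {"alice": +1, "bob": +1, "carol": -1}
apply_moderator_vote(votes, mod_vote=-1)   # the moderator downvotes
print(weights["alice"], weights["carol"])  # 0.5 1.0
```

As the comment notes, this only stays hard to game while the penalty schedule and the moderator's votes remain invisible to participants.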

Vitalik also had a neat sketch of a mechanism that would use smart contracts to disincentivize spam.

https://ethresear.ch/t/conditional-proof-of-stake-hashcash/1301

"The idea here is that we set up a smart contract mechanism where along with an email the recipient gets a secret key (the preimage of a hash) that allows them to delete some specified amount (eg. $0.5) of the sender’s money, but only if he wants to; we expect the recipient to not do this for legitimate messages, and so for legitimate senders the cost of the scheme is close to zero (basically, transaction fees plus occasionally losing $0.5 to malicious receivers)."

The Lincoln Project false-flag operation feels like a Too Good To Check playing out and actually swinging the election.

Act 1: Nazis support Youngkin! Youngkin sinks in the polls.

Act 2: Lauren Windsor of the Lincoln Project admits that the McAuliffe-supporting PAC organized the "Nazis" as a false-flag operation, but refuses to apologize, saying that it was fair since Republicans are basically Nazis anyway. McAuliffe sinks in the polls.

Act 3: Leaked documents reveal that the Lincoln Project is a false-flag operation by Youngkin supporters. Youngkin sinks in the polls.

Act 4: Turns out the Nazis were a false-flag operation by Democrats pretending to be Republicans pretending to be Democrats pretending to be Republicans. McAuliffe sinks in the polls.

Or does he? It depends on where people stop.

You can focus on the people whose minds will never be changed, and say that people will keep going until they find a conclusion they like and then stop. But median voters exist. You never hear from them, because they don't produce any of the national conversation, but they do consume it. Admittedly, there's no way to be sure that this actually did move the polls at all. But it could have.

SpaceX can't be taken public if the plan is to build a Martian colony. An off-world colony is very far from a profit-maximizing idea; it's highly unlikely that public shareholders would be persuaded that SpaceX's profits should be invested in it.

If the assignment of students to teachers is randomized, can't you just decide which teacher is better by looking at their scores (or improvements) afterwards? The randomization should even out the demographic differences, as long as the number of randomly assigned students is large enough.

-"Probably you can prevent that by hiring one expert to make an educated guess outside of the beauty contest, and including that in the mix."

I don't see how one expert would make much of a difference; if there are significant advantages to using a Schelling point like $1000 then it seems they would overwhelm the small penalty for being far away from this expert. Though maybe it would psychologically make a difference.

A fundamental problem of teacher evaluation is sample size. An average K-12 teacher will have 30-90 students per year. Even if a prediction market manages to actually account for confounders, you're still not going to get reliable measurement with an n of 30.
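Rough numbers make the sample-size point vivid. Both standard deviations below are assumptions chosen only to illustrate the orders of magnitude involved:

```python
# With ~30 students per year, the standard error of a class-mean score
# gain is the same order as plausible between-teacher differences.
# student_sd and teacher_sd are assumed values, not measured ones.
import math

student_sd = 15.0  # SD of individual student score gains (assumed)
teacher_sd = 4.0   # SD of true between-teacher effects (assumed)
n = 30             # students observed per teacher per year

se = student_sd / math.sqrt(n)
print(round(se, 2))  # 2.74 -- noise comparable to the teacher signal itself
```

Under these assumptions, one year of data gives a standard error comparable to the teacher effect you're trying to detect, so single-year rankings mostly reshuffle noise; averaging over several years helps, but only at a 1/sqrt(years) rate.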

But no system can account for one of the most important confounders—other students in the class. For example, there are those students who are a constant disruption, and being in a class with them makes learning much harder (regardless of who the teacher is); and often, how disruptive a student is depends on which other students (i.e., their friends) are in a class with them. The opposite is also true, where some students can learn much better by being in a class with certain other students. To evaluate how much a student learned from a teacher, you have to be able to control for how that student was affected by the other students in the class. But given the number of permutations of possible classes of students, no model will be able to actually account for this effect.

Prediction markets could probably still be a useful tool for evaluating questions with a large sample size, like the public-vs-private schools example. But it won't work for evaluating an individual teacher, and probably not even for an individual school. Maybe the question "what system can accurately and reliably measure the effectiveness of a teacher using only their students as inputs?" is "there isn't one".

Here's the question that should be asked instead: "how do we make education better?" Obviously, in general, having better teachers leads to better education. But has anyone actually tried to figure out how much a more accurate teacher evaluation system would improve educational outcomes? It's not at all self-evident that this is the highest-return change we could be pushing for in education.

Maybe a better approach towards improving teacher quality would be to just ask teachers, "what would make you a better teacher?" Teachers know a lot about teaching, so they're a good resource. If you ask teachers, they'll almost all say smaller class sizes. Smaller class sizes obviously cost more money, because it means hiring more teachers, so it's often not even considered. But it should be obvious that teachers can reach each student better if they can spend more time and effort focusing on each student individually. And beyond that, we should also expect that for a student, more individual attention from a decent teacher might actually be a lot better than no individual attention from a great teacher.

I was under the impression that Keynesian Beauty Contests were a metaphor for a failure mode in real markets, including prediction markets.

If I know that Enron stock will ultimately crash to zero and they aren't profitable, or that Trump will not successfully execute a coup before 2020, BUT that there are a bunch of idiots who believe the opposite and will drive up the price, then my optimal strategy is to ride that bubble and try to exit before it bursts - NOT to fight it and give my true prediction. The market can stay irrational longer than you can stay solvent.

In the worst case, it's possible for most or even ALL of the investors driving up a stock's price to believe it's actually worthless and will not pay out, but nevertheless correctly believe that it would be irrational for any individual investor to act on this. Many "meme stocks" and minor crypto scams seem to work this way.

The Wikipedia page seems to agree:

A Keynesian beauty contest is a concept developed by John Maynard Keynes ... to explain price fluctuations in equity markets...

Keynes described the action of rational agents in a market using an analogy ...

"Also, if I were to play this prediction market, I could insider trade and steal all your money. I guess if you trust me enough to make me a moderator, maybe you also trust me enough not to do that?"

I think there are lots of people I would trust to moderate a forum or comments section, but very few that I would trust not to insider trade in that situation (probably including Scott). The incentive just seems too strong, and it also seems hard to have strong enough verification procedures to prevent insider trading.

I have no idea what you will ban, so I will not bet. How many people will be involved in your betting pool?

>The main flaw I can come up with in five minutes of thinking about this: suppose there’s some obviously terrible post, like outright spam. Nobody would predict I don’t ban it, so how would there be any money to reward the people who correctly predict I will? Maybe there’s a 1% tax on all transactions, which goes to subsidizing every post with a slight presumption toward don’t-ban.

You would have to provide a marginal amount of liquidity (say, $10) per post.

The system should also allow participants to add liquidity.

On an obviously bannable post, a sharp comment reader will just snatch the $20.

This is how Polymarket kind of works.

> I have no source for this, someone told me about it at a meetup.

This might have been me? (Unless it was a recent meetup.) Anyway, there's some work under the name "Bayesian Truth Serum" which is interesting here.

I have a few questions about the teacher evaluation idea:

[1]

I'm confused about why the prediction market for evaluating teachers would be better than existing methods like VAM. Presumably, any bias in existing methods like VAM is because of non-random assignment of students to teachers. But if the prediction market is predicting things like "performance of Alice conditional on being assigned to Mr. Smith's class", and there's non-random assignment, won't the best strategy for predictors be to reproduce the bias? In other words, if there's some bias that makes Mr. Smith get assigned all the best students, then won't a predictor use the fact that Alice got assigned to Mr. Smith as evidence that Alice is a good student?

It seems to me that the only way out of this would be to make sure you were able to randomize the students to teachers. But if you had a way to effectively randomize students to teachers, then wouldn't you be able to do a properly controlled and unbiased study in the first place, and not need the prediction market? What information does the prediction market give you that the actual test scores don't give you? (Prediction markets might make sense if you have to make a decision now based on something that won't resolve until much later, but that's not the case here. We're perfectly fine waiting until we get the actual test scores before we finalize the evaluations.)

[2]

The proposal here seems to be that the teachers are evaluated on the *market prediction* of their students' performance, not the students' actual performance. Doesn't this create a perverse incentive for teachers to do things that look to market participants like they help, even if they don't actually help? Even if you accept that market participants will never be fooled (which seems optimistic; even in the real stock markets, companies do often try to fool analysts, e.g. https://www.bloomberg.com/opinion/articles/2021-05-04/under-armour-earnings-were-a-bit-misleading), you still have the problem that there *is no incentive* to do anything that's not visible to market participants, no matter how much you think it will help.

I'm a conspiracy theory enthusiast, but I don't recall the specifics of the Dath Ilani conspiracy theory. Is it the one where superior people from a more rational reality come over here and infiltrate our society in order to cause evolutionary uplift? I thought that was just one of Eliezer's thought experiments; is he now claiming it to be genuine?

Could somebody please tell me more? I'm glad that prominent members of the rationalist community are finally getting into conspiracy theories a bit more because they're great fun and really brighten up people's boring days. And since spreading them is very low-cost and carries just great emotional benefits, obviously it's the height of Effective Altruism. ;-)

Expand full comment

The highlight of the Steinhardt post: by 2025 "forecasters predicted 52% on MATH, when current accuracy is 7% (!)" for performance of AI on free-response math problems expressed in natural language. The example problems were generally too hard for me to do in my head. A Berkeley PhD got 75%.

If this is right, it seems to me we ought to expect very good Codex-style automatic programming too in roughly that timeframe, at least for the kind of problems you get in a coding interview, if not on a larger scale.

Expand full comment

Note that there is currently a flaw in the title for the Metaculus question about Robin Hanson's Twitter poll. Scott correctly interpreted the community distribution, but many predictors on Metaculus were confused (as it was unclear whether you were predicting "minutes" or "hours").

Expand full comment

What's going on with the 2024 US Presidential election market? https://www.betfair.com/exchange/plus/politics/market/1.176878927

Expand full comment

On teacher evaluation, why do we assume teachers have to be treated differently/more objectively than other professions? School administrators are far from perfect (like all bosses and indeed other humans). I struggle to think of other professions where we demand such a super-objective system and so highly distrust supervisors to evaluate employee performance. Will all supervisors do this well and fairly? No, but this is true for nearly every other profession. Even other public employees have annual performance plans that are approved by their supervisors and then rated according to the performance plans. Public employees typically have appeal rights for adverse actions (including poor evaluations). Although the systems are often sclerotic, public employees can be rated poorly and miss out on performance bonuses, promotions, etc.

I’m open to the argument that public employers have fewer incentives than private employers to value good performance, but this is an argument for holding school administrators, supervisors, school boards, etc. responsible for the success of the school not for taking teacher performance evaluation out of the hands of school administrators.

Expand full comment

Step 1:

Produce a really bad comment.

Step 2:

Bet that it gets banned

Step 3:

Profit

Expand full comment

>An upvote invests $1 (Vitalik says 1 ETH, but the post is from 2018 and maybe he didn’t expect that to be worth $4325) in a prediction of “Scott won’t ban this”. A downvote invests $1 in a prediction of “Scott will ban this”.

What if I think something is a terrible comment that shouldn't appear high up on the page, but I don't think it deserves banning?

Expand full comment

I feel like you are getting your characterization of the Keynesian beauty contest as a solution to the prediction problem precisely backwards. While it might solve the problem in practice, it does not solve it in theory.

In theory, any prediction is an equilibrium - it works if we all coordinate on it - and no equilibrium is stable by any definition I am aware of - or perhaps all are (I think the Mertens stable set is the set of all equilibria; I'm asking a more knowledgeable friend). There is certainly not a prediction that is singled out by the formal description of the game, which is what I would call the theory. If the truth is a Schelling point - a dubious proposition if the truth is not well established - then that is a psychological observation, not a game theoretic one. In other words, if Keynesian beauty contests solve the problem, it is because of highly contingent factors. It might work in practice, not in theory.

Expand full comment

On GPT Codex: The question becomes slightly confounded by the question of who is a programmer. If GPT Codex works, I'll use it: I can't program, but there's an app I'd like to make (and am not working on because I lack the technical skills). The definition of programmer may just shift to "those who do the programming that GPT Codex can't do".

Expand full comment

I'm a software engineer on sabbatical with experience making software that gloms onto existing websites and an interest in prediction markets.

I'd be keen to whip up a prototype to use as an experiment on a post! Shoot me an email at jarred.filmer at gmail dot com if you'd like to give it a go 😊

Expand full comment

Prediction markets don't work very well in a society as economically unequal as ours.

Expand full comment

Side note on Moderation: Reddit's problem isn't just that it's vulnerable to cliques. It's that, well, for whatever reason, it's one of the ruder places I encounter. Downvoting trolls doesn't seem to have kicked in. Similarly, StackOverflow noted that regulars are pretty rude to newcomers and put in a new policy of asking everyone to be nice. It's still pretty challenging for timid newcomers working with languages they don't understand yet.

Expand full comment

By the time students graduate and get accepted to college they have had 30 or more teachers. I don't see any way you could use grad rates to evaluate individual teachers. The average class size in a US elementary school is around 20 kids. Is that going to provide enough data to offset confounds like this teacher got more difficult students, they have the kid with Down syndrome in their classroom, they got the smaller classroom by the noisy gym, they have fewer students than their colleagues?

I would look at how much gain in student achievement a "good teacher" gets, and then see how evident those gains would be in a sample of only 20. How easy would it be to offset those gains by giving them difficult students and other confounds?

Some teachers are martyrs and they like helping difficult kids, are willing to take one for the team and have the smaller classroom etc. Tying their pay to data disincentivizes this generous behavior. Instead you would be encouraged to look at your students, and think about who provides maximum potential gains in the metrics, and then focus on them.

Expand full comment

We oughtn't throw more standardised testing at kids than is necessary. Yes, it's needed to ration access to higher education and for hiring but this proposal seems to be to force teachers to teach to the test at the expense of everything else.

Expand full comment

The story about #tweets is an example of why I'm a lot less bullish on prediction markets as a silverish bullet today than a couple years ago.

It's relatively easy to manipulate outcome of narrow prediction markets. Be that [assassination of particular individual](https://en.wikipedia.org/wiki/Assassination_market), [chance of streaker during superbowl](https://www.insider.com/super-bowl-streaker-bet-on-himself-prop-bet-2021-2) or number of tweets by individual in last two weeks.

Expand full comment

> GPT Codex is an AI that auto-completes code for programmers. You can see a really amazing and/or rigged demo here:

I use GitHub Copilot (as I understand it, it's basically the same model: GPT-3 tweaked with additional "RAM"). I have to say the demo is not rigged at all. Once I learned how to seed it, it produces fully correct 10-15 lines of code I'd say 50% of the time, and mostly correct code the other half.

I use it to write tests, and obviously nobody likes writing tests, but it became very fun now – I try to write the prompt so that Copilot creates a correct test case on the first go.

One of the most mind-blowing examples for me was when I wrote code for SVG image and I typed:

// output an image with old paper look

AND IT blasted out a distortion filter that did exactly that: created a randomized outline around the image.

Of course that's cherry-picking and that could have been a copy-paste example from the internet, but still it's impressive how deep it can go.

It goes without saying that's a lifesaver for some file or JSON operations, for data manipulations, for-each loops etc.

Say I want to open each file in a directory, parse its content, and then output it as a JSON object in one file, where keys are the filenames and values are the file contents. Copilot doesn't even stutter and nails this every time.
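For reference, the directory-to-JSON task described above is small enough to sketch by hand; here's a minimal Python version (the function name and any paths are made up for illustration):

```python
import json
from pathlib import Path

def files_to_json(directory: str) -> str:
    """Read every regular file in `directory` and return a JSON object
    mapping filename -> file contents."""
    contents = {
        path.name: path.read_text()
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
    return json.dumps(contents, indent=2)
```

The point of the comment, of course, is that Copilot writes boilerplate like this for you from a one-line prompt.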

So I'd say the number is underestimated. As soon as people try it, and learn how to get value from it, it'll be irreplaceable.

P.S. Another fun story: I needed a URL example and put something like "use random URL for this test". Can you guess which URL it used, without opening? https://www.youtube.com/watch?v=dQw4w9WgXcQ

Expand full comment

> Vitalik didn’t end with “and we should also replace all lower courts with prediction markets about what the Supreme Court would think”, but I’m not sure why not.

I think this system allows you to risk money to get a ruling in your favour. With comments it's not that bad: I can bet a lot of money on my comment not being banned. And it sounds like if it doesn't get reviewed (because it got a good rating because I bet a lot on it not getting banned), then I get my money back after a while.

With actual court cases it's a much bigger problem. Because people involved in a case can gain more from winning the case than just the money they stake on the prediction market, they might want to bet at odds which are not exactly right. And the Supreme Court doesn't have that much capacity to look at cases, so the vast majority of bets don't resolve.

Expand full comment

The situation with insider trading isn’t quite as bad as all that with Keynesian beauty contests. If you have some private but verifiable-if-made-public information, you can predict higher than usual averages, then publish the info to whatever forum the predictors use, send it to news agencies, or whatever. You’ve bought up all the shares incorporating your information on the cheap, the other predictors see the information, update, and your on-sale shares turn out to get the big pay out. Even if broadcasting the information enough that the other predictors see it is costly, you can pay it out of your expected profit and still come out ahead.

There is an issue where information that’s either hard to broadcast or to verify is less likely to be incorporated, but that seems at-worst ambiguously bad. There will be some people with genuinely good evidence that’s just illegible for whatever reason, but there will also be people whose “evidence” is the product of a delusion or flight of fancy. I don’t know what sorts of relative magnitudes we’re looking at there, though, and that is the crucial variable.

Expand full comment

Your paragraph about SpaceX is a bit incorrect because growth in market value is not the same thing as return. Suppose SpaceX is worth 100B and raises 100B: the market value of SpaceX doubles but the wealth of existing shareholders is unchanged.
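The distinction can be made concrete with a toy dilution calculation (numbers taken from the hypothetical above; the function is purely illustrative):

```python
def raise_equity(pre_money: float, new_cash: float):
    """Return (post-money valuation, existing shareholders' stake value)
    after a primary share issuance at the pre-money price."""
    post_money = pre_money + new_cash
    existing_fraction = pre_money / post_money
    return post_money, existing_fraction * post_money

# A $100B company raising $100B of new money:
post, existing_value = raise_equity(100e9, 100e9)
# post == 200e9: the "market value" doubles,
# existing_value == 100e9: old shareholders' wealth is unchanged.
```

Growth in market value includes the new cash raised, so it overstates the return earned by existing shareholders.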

Expand full comment

About Teacher Pay, etc.. I have a related question: What made (public) school so horrible in the first place and being a teacher so stressful and dangerous?

I don't know tons of schoolteachers, but in conversations with the half-dozen acquaintances who are, I've learned there's basically two things. First, there's chasing accreditations and test scores over giving autonomy and freedom-of-action to local administrators to use their judgment of what's best (a kind of de-localization). Second, in many schools kids are just straight-up dangerous, from broken or destitute families with little concern for their education and future, and teachers have little recourse to restrain them. It seems like things were okay until the 90s, and then by the early 00s the freefall began in earnest.

Expand full comment

I (weakly) believe that there is enough random variance in student performance over a year, and that student performance is robust enough to bad teachers, that building a model to predict teacher performance might be impossible.

Expand full comment

I may have a tip on fusion predictions. On r/fusion, a Metaculus prediction was just posted with an average guess of 2041 for a fusion-only reactor delivering 100 MW of net electricity.

Commonwealth Fusion Systems plans to build a high-field pilot plant in the early 2030s that I assume would be hooked to the grid, with a projected 200+ MW output.

They are currently building a smaller demonstration plant for 2025 to validate the physics. It's a tokamak, the design with the largest track record of experimental results, and is expected to hit a plasma Q of 10.

Since the larger plant is apparently conservatively listed as a Q of 13 and a net Q of 3 while putting out those 200+ MW, it seems likely that this prediction will pan out about a year after the larger plant is built.

Expand full comment

If we pay teachers based on the progress of their students, then students can easily blackmail their teachers. "If you won't let me play with my smartphone during your lessons, I will intentionally screw up the exam, and you can kiss your salary goodbye."

Like, get a C instead of an A, so that you punish your teacher, but don't ruin your own career. Especially if you can coordinate with half of the class; then it will totally seem like the teacher's fault.

Expand full comment

Scott, you are really going crazy with this prediction market stuff.

In an unsubsidized market, for every long-term winner there is a long-term loser. In your prediction market utopia, WHO WILL BE THE LONG-TERM LOSERS?

If you want prediction markets for comment moderation, or court decisions, or student performance, and if you're not subsidizing these substantially, you MUST answer this question.

WHO WILL BE THE LONG-TERM LOSERS?

No long-term losers means no long-term winners, which means the equilibrium will be "nobody bets in your prediction markets".

Expand full comment

I know nothing about the technical details of prediction markets where actual money is involved. But, a question.

Can one make long-term prediction bets (or whatever name the instrument should have) transferable in ownership? A large enough correct prediction-bet about an event predicted to happen 50 or 100 years in the future should be valuable property worth something in 50 or 100 years, similar to stocks or a house, and could be passed on to the next generation like stocks or a house or other property.

People don't treat any other property as a joke, even though they are not going to be around 100 years later.

This won't solve the problem with predictions about events that involve collapse of the financial system, when the prediction payout depends on the financial system existing. To that end, has anyone suggested something like the following:

Party A makes a prediction that nuclear annihilation will happen in 20 years. Party B predicts nuclear annihilation happens only after 50 years. Rough sketch: they could make a two-part loan-like instrument, where the holder of part B agrees to pay the holder of part A x units of money per year for the first 20 years, on the condition that A will pay y units of money per year for years 20 to 50. Amounts x and y are adjusted for the parties' respective confidence at the moment of signing the contract. Possibly involve collateral assumed worth (50 - 20) = 30 years times y. Ownership of both parts of the instrument should be transferable, so that party B, if they expect not to be around in 20 years to collect winnings for reasons unrelated to nuclear armageddon, is able to sell it to someone who expects to be around to collect the winnings.
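A rough sketch of the payoff structure being described, with made-up numbers, no discounting, and the assumption that all payments stop at annihilation:

```python
def value_to_A(x: float, y: float, war_year: float) -> float:
    """Net undiscounted value to party A if annihilation happens at
    `war_year`. A receives x/year for years 0-20, then pays y/year for
    years 20-50; all payments stop at annihilation. Illustrative only."""
    receive_years = min(war_year, 20)
    pay_years = max(0.0, min(war_year, 50) - 20)
    return receive_years * x - pay_years * y

# With x = y = 1: under A's belief (war at year 20), A nets +20 and
# never pays; under B's belief (war at year 50), A nets -10 (so B +10).
```

With any symmetric x and y, each side expects a positive payoff under its own forecast, which is what makes the contract signable without either party needing the financial system to survive past their predicted date.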

Expand full comment

You don't actually need to use real money for the prediction market system - bet something similar to Reddit karma instead.

You accumulate karma by making comments that don't get banned, or by betting correctly on comments that do get banned. You lose karma by losing these bets. There is also a count of comments posted, to compare against current karma balance, as well as account age.

At a glance, you can distinguish between:

Lurkers who do a lot of your content moderation who are good at it.

Someone who makes a lot of rule-breaking comments

Someone who makes a lot of comments and is also doing a lot of good moderation

Next you need an algorithm that weighs all of these factors when deciding the sway of each individual vote - votes from users with high karma-to-comment ratios count for much more than votes from low-karma users.

The trick is that karma should be visible to the user, as an incentive to actually use the system. Another incentive could be participation in the comments of controversial posts... which would motivate a decent chunk of this blog's audience, IMO.
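One way to sketch such a weighting (the formula and thresholds are my own invention, purely illustrative):

```python
import math

def vote_weight(karma: float, comments_posted: int, account_age_days: int) -> float:
    """Weight a moderation vote by the voter's karma-to-comment ratio,
    damped for very young accounts. Entirely illustrative."""
    ratio = karma / max(1, comments_posted)
    age_factor = min(1.0, account_age_days / 90)  # full weight after ~3 months
    return math.log1p(max(0.0, ratio)) * age_factor

# A long-standing, well-regarded moderator-lurker outweighs a new
# low-karma account:
veteran = vote_weight(karma=500, comments_posted=50, account_age_days=400)
newbie = vote_weight(karma=2, comments_posted=10, account_age_days=10)
```

The log keeps a single high-karma whale from dominating outright, and the age damping makes sockpuppet farming expensive.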

Expand full comment

Tying a person's pay to someone else's performance does not seem fair.

Expand full comment

> DARPA investigates how prediction markets do vs. expert surveys when guessing the results of social science studies. Answer: neither of them does well.

Given only 20-40% of social science studies survive replication, I'm not sure how reliable this result is.

Expand full comment

I got a prototype for a comment moderation prediction market working 😁! I've emailed you the details.

Expand full comment

The major reason that merit pay won't work (even with good value added metrics) is that there's very little variation to measure. In the United States today, within the same school system, students do not learn noticeably more or less from one teacher than another. Partly because they don't learn much at all; for most students most of the time, things are learned, used for a test or project, and then forgotten--a process that begins the next hour and is largely complete in a year. But mostly because 1) most teachers are similarly good at presenting testable information, and 2) how much is remembered depends mostly on the student: how smart the student is and how interested.

It is certainly true that some teachers are more interesting. Some are nicer. Some have a better voice or personality. Some present things in a more sophisticated or challenging way. All that may make a big difference to how enjoyable a class is. But for the vast majority of students, it won't make much of a difference in how much is remembered or how much can be used after the class is over.

Expand full comment

What would happen if we set up a question about how much a stock or crypto will cost in, say, 3 years? Would the live price closely follow the predicted one, or would the predicted one fluctuate along with the live price?

Expand full comment

Better than prediction markets is: the market. See: driving schools. You only open one if you're confident you'll be a good instructor/manager. And you take great care that the instructors you hire do their jobs well; otherwise your school is gone very soon indeed. Potential customers look at the reviews / ask around - no big worry about not getting actual instruction in car-driving. - Dath Ilan probably has vouchers - though even poor Indians manage to just pay up for real instruction: https://www.economist.com/asia/2018/10/11/indian-states-are-struggling-to-lift-public-school-attendance - the first 9 sentences are free to read, and all you need to know.

Expand full comment

Prediction markets don't exist because no one cares. The two big prediction markets in the world, i.e. financial markets and sports betting, exist because either they are economically useful (and, importantly, can be hedged by a market maker) or people care about the outcome being gambled on. That's also why the largest 'prediction market' event is US elections.

On the Cowen article about why there aren't markets to bet on home price appreciation: futures markets for this literally exist and are available to be traded on the CME, but they have no liquidity, because you can't hedge them and no one cares.

Expand full comment

> What if you promote teachers whose students tend to gain many points on their (relative position in) test scores compared to last year? This is the idea behind value-added models [...] Various studies show this works much less well than you would think.

I mean, I immediately think this is a terrible idea, so I'm not sure how much worse it can get.

An average teacher can game the system in this model by just selecting students with (high test score variance and) low (read: below-mean) test scores, and rely on regression to the mean. (It's not _quite_ that simple, because most of the time a teacher can't directly select which students they are teaching... but there are indirect methods & correlations.) Of course, this would show up in correlations between teacher ratings and _past_ performance of students...
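A quick simulation shows the regression-to-the-mean effect: select students whose *measured* scores were below average, and their next measurement rises with no teaching effect at all (all distribution parameters are made up):

```python
import random

random.seed(0)

def simulate_gain(n_students: int = 10000, true_sd: float = 10, noise_sd: float = 10) -> float:
    """Each student has a fixed true ability; each test adds fresh noise.
    Return the average score gain for students who scored below the mean
    on test 1, with zero actual learning between tests."""
    gains = []
    for _ in range(n_students):
        ability = random.gauss(100, true_sd)
        test1 = ability + random.gauss(0, noise_sd)
        test2 = ability + random.gauss(0, noise_sd)
        if test1 < 100:  # "select" the below-mean students
            gains.append(test2 - test1)
    return sum(gains) / len(gains)
```

With these parameters, the below-mean group "gains" around five points per student purely from measurement noise, which is exactly the headroom an average teacher could harvest by student selection.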

> [...] a teacher’s VAM can apparently predict their students’ past performance [...]

...oh.

Expand full comment