442 Comments
Feb 13·edited Feb 13

Hi!

"So each GPT costs between 25x and 100x the last one. Let’s say 30x on average." doesn't seem to be geomean (50) nor arithmetic mean (62.5), so what kind of average is it?

Feb 13·edited Feb 13

What I don't get is: Okay, Sam Altman believes he needs his own silicon. Fine. He thinks he can build new silicon with safety features after a discussion with Yudkowsky. Fine. None of that's particularly unbelievable as a thesis.

The entire nation of South Korea is spending about $500 billion in a massive, multi-conglomerate venture to boost its chip industry. Seven trillion is enough to buy almost the entire market, something like the leading 130 companies. What the hell is he planning? Is he going to try and buy all those companies? That can't be it. The EU, South Korea, and China will absolutely not let him. And it'd violate anti-monopoly laws in the US too. Seven trillion is enough to build like four hundred top of the line fabs.

If he was asking for $700 billion I'd say he clearly wanted to make an American TSMC. But at seven trillion I have no idea what he wants to do. Unless it's all a tactic to make the amount less scary and maybe get a better negotiating position.
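As a rough sanity check on that fab count, assuming something like $17.5-20 billion per leading-edge fab (the per-fab cost is an assumption, not a figure from the post):

```latex
\frac{\$7 \times 10^{12}}{\$1.75\text{--}2 \times 10^{10} \ \text{per fab}} \approx 350\text{--}400 \ \text{fabs}
```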


What’s the actual citation on him asking for 7 trillion? Did he actually say that or is this a “sources say” situation?


I don't think that training data, while nice, is really the big bottleneck now. IMO the limiting factor is now the ability to apply that process to the AI's internal state.

In terms of the ability to simply do something like first-level recognition/classification, I suspect that current AI is as good as or better than humans. But what we do is learn rules for correcting our own beliefs. When we respond to a text prompt we probably do something somewhat similar to what modern LLMs do, but we also do a degree of checking for things like inconsistencies in our statements, which we try to avoid [1].

What this requires isn't more training data but some kind of approachable internal representation of things like beliefs and transformations on them that can itself be subject to training. Obviously, I don't know exactly what that looks like or I'd be making that 7 trillion not commenting here.

So yeah, what's needed is kind of a form of algorithmic improvement.

1: You might ask why are we still so likely to produce inconsistent output. Actually, this model predicts exactly that in situations of sufficient complexity where it is easier to make the inconsistencies harder to notice than it is to produce coherent results.

Hence why we ought to be ruled by mathematicians as they are particularly intensely trained by the need to generate valid proofs (for the humor impaired this part is mostly a joke).


Curious how much he could raise through crowdfunding at this point. It doesn’t seem plausible that a single investor could expect a return bigger than 7 trillion, but maybe several enthusiastic fans could get him the money he wants through donations


As evidence for my claim that what's needed is greater reflective ability (eg ability to learn to recognize when their beliefs/tentative responses are in conflict) not just more training just look at how bad LLMs are at doing even simple but entirely novel problems about logical reasoning. They obviously have the kind of pattern recognition down well enough to produce guesses, but what they lack is the internal representation of their own belief/hypothesis state that they can then compare against the rules to check if it's correct.


> My current impression of OpenAI’s multiple contradictory perspectives here is that they are genuinely interested in safety - but only insofar as that’s compatible with scaling up AI as fast as possible.

Isn't this like saying a ropeless rock climber is genuinely interested in safety but only insofar as it doesn't require him to use a rope? Like sure, he's not literally suicidal but I wouldn't call him "genuinely interested in safety." He wants to achieve something, and he would like to live to tell the tale, but he is willing to risk everything for it.


Build one central AI. Use that AI to solve morality. Then align humans to AI morality. Seems like the best solution to alignment to me.


Everyone doesn't want $7 trillion. For example, I don't.

After a certain point, when your basic needs are met, money doesn't add to happiness.

He might have a good reason. I don't.


Regarding R < 1 and R > 1, I feel like the imprecision about what exactly this means leads to an overestimation of AI risk. Indeed, I feel like the rationalist community should be particularly wary of the appealing fallacy of assuming numbers like this relate to what seems important to you.

I mean, look at Moore's law. In one sense you might say that looks like R > 1 but if what you care about is something like the ability to compute Busy Beaver of n (or Ramsey numbers) suddenly that exponential growth is actually horribly slow.

So what exactly are we measuring in this case? It seems like what it measures is whether the next generation is economically appealing to build given the benefits of the current generation. But that could easily be true without it meaning much in terms of how dangerous the AI might be or how capable it is of doing harm.

But this could easily be true if all each generation bought you was that it could help you speed up most of our computer code by a few more percent given the same silicon resources. No reason to think it corresponds to any vast ability to manipulate the world or escape from safeguards (Einstein in a prison isn't a great danger).
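To make the Moore's-law point concrete with a toy example (the numbers below are arbitrary placeholders, not estimates): even if available compute grows exponentially, the largest exhaustive search you can afford grows only linearly, because a brute-force search over n binary choices costs on the order of 2^n.

```python
import math

compute = 1e21   # assumed FLOP budget today (arbitrary)
growth = 2.0     # assumed growth factor per generation

for gen in range(5):
    budget = compute * growth**gen
    max_n = int(math.log2(budget))   # largest n with 2**n <= budget
    print(f"generation {gen}: {budget:.1e} FLOP buys exhaustive search up to n = {max_n}")
```

Each doubling buys exactly one more unit of search depth, which is why R > 1 on a cost-and-investment curve says little about the ability to crack problems whose difficulty grows exponentially or worse.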


> In general you can use synthetic data when you don’t know how to create good data, but you do know how to recognize it once it exists (eg the chess AI won the game against itself, the math AI got a correct proof, the video game AI gets a good score). But nobody knows how to do this well for written text yet.

Note that we can cheaply determine whether code is good or not (does it pass the tests), and my weak impression is this is scalable, so synthetic code-y text seems promising as training data (for coding applications at least).
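A minimal sketch of that generate-filter-train loop; `generate_candidates` and `unit_tests_for` are hypothetical stand-ins for a code model and a test harness, not any real API:

```python
def passes(code: str, unit_tests) -> bool:
    """Run a candidate solution in an isolated namespace and check it against the tests."""
    scope = {}
    try:
        exec(code, scope)   # in practice: sandboxed, resource-limited, with a timeout
        return all(test(scope) for test in unit_tests)
    except Exception:
        return False

def build_synthetic_set(prompts, generate_candidates, unit_tests_for, k=8):
    """Keep only the (prompt, code) pairs whose code actually passes its tests."""
    dataset = []
    for prompt in prompts:
        for code in generate_candidates(prompt, n=k):   # sample k attempts per prompt
            if passes(code, unit_tests_for(prompt)):
                dataset.append((prompt, code))          # verified pairs become training data
    return dataset
```

The cheap automatic verifier (the test suite) is what makes the synthetic text trustworthy enough to train on.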


> You could try to make an AI that can learn things with less training data. This ought to be possible, because the human brain learns things without reading all the text in the world. But this is hard and nobody has a great idea how to do it yet.

Well yeah, we have no idea how to enable an AI to learn the way we do. We are able to learn plenty of stuff without reading massive amounts of text. I think that's because (1) our brains are not blank tablets -- they are already set up to learn certain kinds of things. They have templates ready to be filled. (2) We are able to wring a lot of knowledge out of relatively few examples by reasoning processes, some of which we are born with the ability to do, others of which we can learn about from outside sources.

Scott's account assumes that all that is needed to produce ASI and AGI is larger data sets to train on (plus of course the compute and energy to do the training). But what about the possibility that there are some mental functions that are crucial for a machine to qualify as ASI or AGI that cannot be produced by training on larger data sets, because they are just not the kind of thing that can be built into a system by that method? Seems like there's a lot of overlap between those functions and the things that enable human beings to learn without gobbling massive data sets. For instance: (1) goals and preferences that arise from within. I know you can give an AI a goal, such as starting a t-shirt business, and it will do it. But in people, goals and preferences are the product of biologically determined drives as shaped by social learning. (2) Self-awareness and the ability to evaluate one's performance, learn from mistakes, and change one's behavior. (3) Insight -- where one leaps suddenly from a mass of messy information to a higher level where dwells some generalization that explains the mess.


Sadly, there's a chance that Sam Altman would nominally get his $7 trillion -- when hyperinflation reduces its purchasing power by a factor of 100,000x or more :-(


I'll repeat what I said around the corner at the substack of He Who Must Not Be Named (https://garymarcus.substack.com/p/an-open-letter-to-sam-altman/comment/49344409):

So, here's the question: Altman's attempt at a $7T raise seems a bit extreme, even for him. The same with Hinton's recent hallucinatory diatribe against you [Gary Marcus]. Sutskever's been saying some weird things as well (http://tinyurl.com/9dm3fn6r). Are things on the Great Rush to AGI falling behind schedule? Are these guys getting just a bit worried and expressing it by doubling down?


The simplest explanation is that he never actually gave a shit about safety and now that he's finished purging the board in the wake of its failed putsch he's free to do whatever he wants. Anyone have another?

Feb 13·edited Feb 13

How much of the world would $7,000,000,000,000 allow someone to buy?

That's a lot of Twitters, or aircraft carriers. How many competitors could someone with $7T snap up? How many media organizations? What's the going rate for politicians? Small countries?

Not that Sam Altman will have $7T in untraceable cash, with no strings attached. But it might be interesting to speculate on what else could be done with the money. Graft, finance, scalable startups, and taxation all work on the principle of identifying a large flow of money, and siphoning off a little bit. How much graft would be possible with $7T, and what would it allow people to do? 1% of 1% is $700M, which would solve a lot of my life problems quite rapidly, and I'd guess the same would apply to all but a handful of people on the planet. But if I had that, I could buy Tumblr and make porn **mandatory**, just for kicks. (Or maybe I could save 1000 African children instead. **sigh**)


> (Sam Altman is working on fusion power, but this seems to be a coincidence. At least, he’s been interested in fusion since at least 2016, which is way too early for him to have known about any of this.)

It is true that in 2016 he probably didn't know how much energy would be required; on the other hand, it is interesting to notice that he seems to have been well aware of the power of scaling since at least 2017: https://blog.samaltman.com/the-merge


...so, doesn't this just translate to "No, fast timelines aren't happening, yes, laws of physics and the finiteness of resources will hit you in the face, DoomHype was a real thing"? Perhaps even "winter is coming"?


> You can train a math AI by having it randomly generate steps to a proof, eventually stumbling across a correct one by chance, automatically detecting the correct proof, and then training on that one.

We already know how to randomly generate correct steps. The problem is more in the area of getting results that are interesting. This strategy seems much like trying to train yourself in chess by placing pieces in random positions and noticing whenever one of the moves you just made happened to be legal.


As a skeptic who assumes the R is way below 1 (barring some conceptual breakthrough that pretty much requires we first jump off the "all resources into LLMs, surely scaling will fix everything" train), I spent most of the read-through wondering whether the punchline will be "maybe the real singularity is the computational capabilities we've made along the way" or "so it won't be paperclips, but LLMs".


Is it just me, or do all of these "master-manipulator ubermensch CEO" types just come off as complete goobers the second they step out of their lane? I mean, this Sam Altman character sounds like a credible Machiavellian figure right up until the point where he opens his yap to ask for money, and then he sounds like Dr Evil. Which retroactively makes all his other accomplishments sound like a combination of dumb luck and simply being in the right place at the right time.

Why do we take these folk seriously?


For cooling and efficiency reasons, not to mention security, GPT-7, and probably even GPT-6, will be built in space. Which drives the costs up, and gives the advantage to Elon Musk.


Hi, French subscriber of ACX here.

Completely unrelated to your post, but there is a small issue on ACX when trying to access it through "astralcodexten.com" instead of "www.astralcodexten.com": visitors see a "Test website, please ignore" page instead of the blog.

The convention is to host on either "www.website.tld" or "website.tld", but also to add a redirect in one direction or the other (through a DNS CNAME record, for instance).

I think it's trivial to set it up on Hostgator, but please let me know if you need help with it.


"GPT-7 will need fifteen Three Gorges Dams."

Wow! That's like, forty five Gorges Dams!


https://www.youtube.com/watch?v=MS2AXT5MSgk

It would appear that Sam Altman is imitating Dr Evil from the Austin Powers movie.


The thing about building a world-conquering AGI you control is that building a world-conquering AGI you don't control is in a sense the same result as not building anything at all: Either way, you don't control the world.

Note that from the viewpoint of someone who is not Sam Altman, it might not matter whether the world-conquering AGI is safe or not: Either Sam Altman or the AGI will conquer the world. Either way, you and everyone you care about will be conquered.


What does all this mean for AI Foom?

In these terms, AI Foom would require R to become very much greater than 1 once you get past a certain point. This would presumably have to mean that there's some far more efficient way of training an AI that we are currently too stupid to see, but which a sufficiently advanced AI will be able to figure out.

On second thoughts, proper foom might even require a whole bunch of these step-function insights so that R can stay consistently above 1.

There was always a shoddy step in the argument for Foom, which went something like "if a human can build an AI, then an AI smarter than a human can build an even smarter AI". But it doesn't have to be this way -- humans won't build AIs through flashes of insight but by a painstaking mathematical process, and the smarter AI won't necessarily have better ideas on how to optimise that process.


Isn't this basically the nail in the coffin for the original/standard AI-doom prophecies? You're not going to have: (1) a million minds running in parallel or (2) self-replicating model spreading to every networked system or (3) runaway self-improvement to the singularity

if the training cost of the next model class is $7 trillion.

Now that there's a whole funding ecosystem dedicated to the question of "will AI be the end of us" I don't doubt that we can think of new hypothetical scenarios where that's possible, but my takeaway is that "the world models that led people to worry about AI x-risk were badly mistaken"

To be sure -- this is not saying there are no concerns with AI, just that it's a damper on AI _x risk_.


I think it's completely fair to say that all the "AI safety" work has dramatically increased AI risk.

It's interesting but probably fruitless to wonder if this was intentional or not.


Would an LLM trained on a grab bag of data even be capable of providing insights into how to build better chips that the best human engineers wouldn't already have thought of? To me it seems like GPT-4's ability to write code is based on pattern matching from existing code - it can write some passable code for a login screen because it's seen plenty of examples of that. But if you're asking it do something cutting edge, like dramatically improve over the best GPU designs available, its reference data isn't going to provide the right answers.

I'm sure if you gave GPT-5 a cutting edge chip design and told it 'improve this to use 50% less power', it could come out with some plausible looking answer, but in the absence of any examples of improving a cutting edge chip in its training data, surely you could only get plausible bullshit?

Maybe a big enough LLM starts to develop actual first-principle insights, but it seems to me like a specialist chip design AI, trained on all the latest prototype chip designs and information about their thermal properties and manufacturing problems etc, would be more likely to help?


We need to explore the racket hypothesis a little more.

Feb 13·edited Feb 13

>if GPT-5 is close to human-level, and revolutionizes entire industries, and seems poised to start an Industrial-Revolution-level change in human affairs, then $75 billion for the next one will seem like a bargain.

How likely does the ML field consider this scenario to be? Seems pretty pie-in-the-sky, but I'm not particularly clued-in.

Also, it's pretty suspicious that the hype around multimodality has died down lately. True multimodality has to be essential for anything like human-level, and there's no evidence that we're anywhere near that.


Assuming this sort of scaling, wouldn't GPT-7, or even GPT-6, have inference times too long to be of any real use?


Let's assume for a moment that, at some point, creating a world-changing superintelligent AI is going to be obviously within reach, such that it's worth dedicating a large fraction of humanity's economic output to producing it. At that point, if we're not *already* dedicating a large fraction of humanity's economic output, we're going to start doing so, abruptly, using up that compute overhang and suffering the safety risks that go with rapid progress.

So, arguably, the safest option is to use up the compute overhang *now*. Build your way up to GPT-7 with $7t or so, crossing your fingers and hoping that this isn't enough for it to be superintelligent. Then there'll be no more compute overhang, so no more scope for rapid progress by increasing the resources dedicated to AI: just safe, careful, incremental progress.

Summarising this argument: the compute overhang means there'll probably be a sudden jerk of progress at some point. Better that it be now, when it's least likely to catapult us over the threshold.

This seems to me to be a plausible reconciliation of Sam Altman's advocacy for increased AI funding with his concerns about compute overhang.


I for one would love a post about what Scott would do with a $7 trillion windfall.

He talks a lot about problems that could be solved by the Tsar of X. Would $7 trillion buy Tsar status?


This sounds like total madness to me. We have a climate crisis unfolding and are pondering how to produce our energy sustainably, and yet we pursue a path to develop a system that we don't understand, that is more likely to harm us than to benefit us, and that is unimaginably hungry for energy. What's wrong with you people?


> if each new generation of AI isn’t exciting enough to inspire the massive investment required to create the next one, and isn’t smart enough to help bring down the price of the next generation on its own, then at some point nobody is willing to fund more advanced AIs, and the current AI boom fizzles out (R < 1)

Was this case predicted before by anyone studying AI risk?

I didn't articulate it this simply but, having worked at Google in data centers, I figured the problems of scaling up compute are harder than people think, and I think of these LLMs as indeed being cool, but they are - like most of the history of AI - not everything we are doing.


GPT and similar systems seem to be built by shoveling vast heaps of facts at them and then trying to treat them as an oracle, to be bombarded with questions.

Maybe it would be more fruitful and efficient to make this more of a two-way process, like educating a human student, whereby it isn't just a matter of stuffing the student with facts which they can later adapt and regurgitate, but addressing their feedback on aspects of which they are unsure.

On that assumption, a useful crowd-sourcing initiative might be for volunteers to accept, and answer to the best of their ability, questions _from_ GPT. Sincere and helpful answers could be rewarded with free tokens. Conversely, frivolous answers, as detected by a wide divergence from the majority of answers to the same question, could be sent for human review and if judged to be deliberately misleading the volunteer could be shadow-banned or thereafter sent dummy questions whose answers would be ignored.

GPT could be asked for regular summaries of what it thought it had learned from this exercise, almost as if it was taking an exam!

The question though is whether GPT has the ability to self-reflect and be aware of its own blank areas or areas of its knowledge in need of improvement.
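A toy version of the divergence check described above, using plain string similarity as a stand-in for whatever real measure (embeddings, a grader model) such a system would actually need:

```python
from difflib import SequenceMatcher
from statistics import mean

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_outliers(answers: list[str], threshold: float = 0.3) -> list[int]:
    """Return indices of answers that look nothing like the rest, for human review."""
    flagged = []
    for i, ans in enumerate(answers):
        scores = [similarity(ans, other) for j, other in enumerate(answers) if j != i]
        if scores and mean(scores) < threshold:
            flagged.append(i)
    return flagged
```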


> But nobody knows how to do this [validate the quality] well for written text yet.

Then what is needed is a Literary Critic LLM, an A[I] B Walkley [0] LLM.

[0] https://en.wikipedia.org/wiki/Arthur_Bingham_Walkley


Wow, those guys are under a lot of collective pressure of expectations to create AGI. I want 7 trillion too.


A lot of the stuff in this post kind of seems like magic. "We just need $7 trillion, more computing power than currently exists, God only knows what infrastructure, and more information than the human race has ever produced, and we'll be well on our way to a superintelligent AI!"


Of course, it's hard to predict what leaps might happen to reduce these requirements, but a big one missing is quantum computing. As I understand it, it can offer dramatic speedups for certain classes of problems (quadratic for unstructured search, exponential for a few special cases), though not a blanket reduction of exponential computation to linear. It has many hurdles to clear before it can be involved in AIs and LLMs, including a viable approach for integrating quantum computing into the AI paradigms we currently use, the cost of individual qubits, and others. But it may well make the costs, energy use, AND training data needed smaller than estimated.


Unfortunately it turns out more and more that Altman shows a lot of the behaviors of a sociopath and scam artist. I've seen a former colleague, now a CEO, talk in similarly, completely insanely exaggerated numbers (at a smaller scale, in a smaller pond), with other leaders looking up to him, and you could see the dollar signs in their eyes. Together with the OpenAI drama and all the shitty results from paid ChatGPT-4 that I had to fight with over the past year, my updated model goes like this: Altman had/has no stock in OpenAI and wants to capitalize on its huge hype as long as it lasts. Blowing up the required numbers insanely is a proven tactic in negotiation to get your real goal through easily. And I bet he is thinking of getting a good share of the money he is drawing in. And the time frame is not too long for him. Hard competition is coming up (e.g. Google Gemini recently helped me a lot more with a Python app programming task than ChatGPT-4 did; today I installed Gemini as my Android voice assistant and finally it's really useful and doesn't tell me "I did not understand"). Other folks are unhappy with ChatGPT-4 at 20 bucks per month as well and are looking at alternatives. So I assume OpenAI will lose paying customers soon.


"...if Sam Altman didn’t believe something at least this speculative and insane, he wouldn’t be asking for $7 trillion." Hmmm...

Let's summarize the case in this post as:

"$7 trillion is based on what OpenAI feels is needed to secure resources for continuing to advance transformative AI and return immense value. Without funding to the tune of $7 trillion, their AI progress will otherwise quickly hit bottlenecks as the scale what is needed becomes a larger and larger portion of the total resources currently available in the world for AI development."

Here are a few other possibilities...

Alternative 1:

OpenAI's charter is to develop AGI that benefits all of humanity with a focus on long-term safety, including mitigating the risk that would result from a race dynamic that pushes competitors to develop AGI before alignment is solved. Capturing and controlling a substantial portion of the world's resource capacity for developing AGI is seen as a step toward mitigating this risk to fulfill their charter.

Alternative 2:

The recent advances in GPT-based systems and publicity around the capabilities of LLMs have given rise to massive speculation about the potential value that could be returned from investment in AI. OpenAI is opportunistically seizing this chance to solicit investment and has reverse-engineered the $7 trillion ask based on their estimate of what might be feasible to get...and with the expectation that they can figure out later how to use that investment to fulfill their charter.

Alternative 3:

OpenAI has realized that scaling is likely to asymptote due to some hard technical barrier or resource bottleneck in the near future, so they need to quickly capture as much speculative investment funding as possible before interest cools.


Sure, I'll write the check. This is pretty risky, so given a conservative 20% return, when do I get my $35 quadrillion? (Note: world GDP is $100T.)


If we're looking at spending $75 billion five years from now, that implies GPT-5 making some significant portion of that money. Ideally each version would pay for the next version, or at least pay for itself before a new version is made. GPT-4 seems to have paid for itself and maybe enough for GPT-5. For GPT-5, planning for GPT-6, let's say at least $30 billion over five years, or $6 billion a year. Recent numbers say that OpenAI's revenue was $1.6 billion last year. With expenses going up quickly, OpenAI would need GPT-5 to perform significantly better than GPT-4. I don't think we'll see that. There'll be significant improvement, but I think the improvement curve is slower than the cost increase curve, meaning that we get less bang for our buck right when the cost becomes something that a single company can't decide to do on its own.

That's even if we solve the lack of training data problem, which isn't a matter of money or effort - we would need a new way to approach the problem, which we can't be sure exists.


> Unless they slap the name “GPT-6” on a model that isn’t a full generation ahead of GPT-5

What constitutes a "full generation"? With biological generations, mere time is sufficient.


"GPT-4 needed about 50 gigawatt-hours of energy to train. Using our scaling factor of 30x, we expect GPT-5 to need 1,500, GPT-6 to need 45,000, and GPT-7 to need 1.3 million."

This, at least, I see as a good thing: now we have *two* reasons to massively scale up 24/7 clean energy supply around the world (the other is that addressing climate change effectively will require somewhere between tripling and decupling world electricity production by midcentury, all from clean sources). Maybe with a push from AI we'll actually do it sooner. Maybe then, when GPT-7 tells us we can't build GPT-8 yet, we'll repurpose it all to make clean fuels/chemicals/ingredients/etc.


Well, this made me laugh, so thanks for that.

"GPT-7 might need all the computers in the world, a gargantuan power plant beyond any that currently exist, and way more training data than exists. Probably this looks like a city-sized data center attached to a fusion plant."

Increasingly improved models of AI taking up more resources? I used to laugh at Golden Age SF where the supercomputer running the world was the size of a small city, since 'in reality' computers were getting smaller and smaller to the point that we can have one in the palm of our hands now, but looks like the joke's on me: Colossus will indeed be the size of a small city.

Will paperclipper AI be stopped simply because the technology is competing with humans for resources? If more and more rare earth minerals are needed for chip fabrication, and more and more power generation is diverted to the giant supercomputer, there will come a time when it is a choice between "divert every last scrap of power to the AI and the humans go back to the standard of Bronze Age living" or "to hell with the AI, I want air conditioning and washing machines and electric light".

"More promising is synthetic data, where the AI generates data for itself."

Isn't it doing that already, with hallucinations? Inventing out of whole cloth legal precedents, historical events, author biographies and chemical formulations that don't exist in reality? We may indeed get a version of the world where whatever the AI says is taken as true, even if it is "the moon is made out of green cheese, see the Pluto rocket landing of 1852 where samples of the moon cheese rock and dairy bacilli were brought back to Earth". It doesn't matter if that does not correspond with physical reality, we all live by what the AI defines as real. If Colossus says the moon is made out of cheese, we have nothing more to say other than "figs or quince jam with our moon cheese?"

"People who trusted OpenAI’s good nature based on the compute overhang argument are feeling betrayed right now."

Well gosh who could possibly have seen that one coming? Oh me oh my, how is it at all possible that the big business entity pumping money into this project wanted to make a profit and get to be the first mover when it came to activating the magic money fountain, and the guy whose track record is on the entrepreneurial side would be on the same page as the "make progress fast so we can monetise this shit"? Gosh oh golly, totally unforeseeable that the struggle over control of OpenAI would come down on the side of the "business is not a charity" and not the idealists.

Apologies for the heavy-handed sarcasm, but I don't care if "But I personally know Sam/I know a guy who knows a guy who's his best friend/He donated money to our good cause" and you're sure Sam is a swell human being. People can be both swell human beings and care primarily about "will this project make us megabucks and grow our market share?". Money wins out over principle nearly all of the time. Maybe it's because I've always worked jobs on the low end of the ladder, where there is no "I personally know the CEO" but have instead been told, when learning the skills for the job, about how someone got fired for not wearing a raincoat (the boss, allegedly, saw this, thought that the person wasn't being careful about their health, if they got sick that would mean they couldn't do the job, so better to fire them and get someone in who wouldn't go out sick). Little on-the-job anecdotes like that don't incline you to think of the mission statement being a true representation of what the company actually believes or that they really do think 'our employees are our greatest asset'. Money is the be-all and end-all, even if the idea is "we'll improve life for everyone along the way (by getting them to buy our product)".

I'm not saying anybody involved is a moustache-twirling villain knowingly ignoring safety matters in the chase after profit; everybody doubtless believes the risk is worth it and we can handle it and that's a problem down the line, anyway. It's going to be yet more "Nothing can possibly go wrong/How were we to know that would happen?" if anything does go wrong. I think the best bet for safety is precisely "this damn thing is too resource hungry, the peasants are revolting over having to live in mud huts once more while it gobbles up the entire national grid output, shut it down".


Seems like R<1 already; GPT-4 is a lot better than GPT-3 but not *that* much better, not 100x better.


One consideration is that as AI generated text becomes a larger fraction of the available training data, LLMs will at some point mostly be trained on their predecessors output, unless they can explicitly filter that out.

In general, one could surpass the training data. For example, it might be possible to deduce the actual rules of chess from GPT-2's erroneous chess logs and use that to become great at chess. But not with NN training. The best GPT-2 emulator is GPT-2, so an LLM raised on GPT-2 will converge to that. I would not be surprised if nth-generation LLMs actually had emulators for earlier LLMs included.


It sounds like the solution to diminishing returns is exponential effort? /s


… and it still won’t be as good as one human mind…


Timing is everything.

Meta and NVIDIA found that an 80x synchronization improvement made distributed databases run 3x faster - "an incredible performance boost on the same server hardware, just from keeping more accurate and more reliable time."

https://developer.nvidia.com/blog/nvidia-supercharges-precision-timing-for-facebooks-next-generation-time-keeping/


What about inference? I mean, for GPT-x to really revolutionize the world, aren't we going to need a lot more inference than training?


I have a video game AI training question. If you train the game's AI to get whatever the equivalent of "high scores" is, shouldn't you see the AI tactics exploiting all sorts of cheesy methods rather than (what we typically DO see) playing it basically straight? For example, the AI in a Call of Duty or Halo game should be spawn camping from sniper locations, chucking grenades at the blind corner on the most common path to the objective, etc etc.

Or it could end up with some weird alien strategy. If you were programming Street Fighter you'd probably find the AI Ken maximizes match wins vs human players by doing wakeup shoryukens 50% of the time, because that will demolish novices, but that will get severely punished by intermediate players and above. And in fact you see human players use different tactics based on their assessment of their opponent's skill level. If you trained AI Ken against *itself*, rather than humans of varying skill, I would expect it to wind up after a million matches having metagamed itself into some bizarre strategy that looked nothing like what you think of as "playing Street Fighter" but won 50.1% of the time.

We didn't see this in chess, because chess had been thoroughly explored by humans, and it turns out AlphaZero opens with d4 and e4 games just like most humans do; but hypothetically it could have turned out that after a million iterations of playing against itself a chess AI could win 50.1% of matches with some bizarre opening line that would be seen as unserious in human play, because for whatever reason humans can't win with it but AI can. Video games seem to want the AI to behave generally like a human player (so not some alien tangential strategy) but ALSO not use the kind of cheesy tactics that humans are able to discover; basically it's supposed to act like a naive human, and that feels unlikely to occur through training it rather than just telling it "here are your parameters to play the game."


I think some times about Buck Rogers.

If you were alive in the 1960s, you had seen in one human lifetime a shocking level of advancement in transportation technology. There were people who took a horse and cart to school as kids and took transatlantic flights as older adults. Popular imagination did a simple linear extrapolation of this trend and gave us the Jetsons and Buck Rogers.

But that's not what happened - we'd largely by 1970 exhausted the revolutionary potential of the combustion engine, and progress since then has been largely incremental. Cars, planes, trains today are better than they were in 1970, but would all be recognizable and likely usable by someone time-traveling from then.

Instead, innovation changed - we had a new thing, the microprocessor let's call it, and that drove a similar rapid progress in a different field. My father started work at IBM in the 1960s selling mainframes that took up entire rooms and had far less computing power than my coffee machine does today. In one lifetime (my Dad's) we've had again a gigantic change in one particular area.

I worry that we are making the same mistake now as we did with Buck Rogers. Moore's Law has run into fundamental laws-of-physics limits. Everyone alive today has built an understanding of how the world works that implicitly embeds the claim that if you have a technology now that is cool but takes far too much compute, you just wait a couple years and the cost of compute will have dropped 6-7x and it'll be fine. That might not hold.

Short answer: I wonder whether some of the current tech hype cycles we're in (AI, crypto, etc.) end up looking a little bit like flying cars.....


This was a topic of discussion at the Friedman meetup this weekend, with expertise in both AI and chip-making in the room. We didn't do Scott's sanity check on whether $7 trillion was a reasonable guess for the computronium required to train GPT-7 or the like, so thanks for that.

We did pretty much all conclude that the $7 trillion was an utterly unrealistic figure on the grounds that, first, there's not enough investment capital available anywhere Sam Altman could plausibly lay hands on it, and second, even if he had a bank account with that many zeroes, the high-end semiconductor industry couldn't usefully absorb that level of investment on less than a generational timescale. And that's human generations, not chip-generations.

In part because humans are one of the bottlenecks. There aren't enough people in the world with the relevant skills (see TSMC's troubles staffing a new plant in Arizona), and there probably aren't enough people to teach all the people you'd need. So there's a long educational pipeline there. We don't have the production tooling to build $7 trillion worth of high-end fabs, and we don't have the tools to make the tools. There are shortages of critical resources, and as Scott notes, electricity will be a serious problem at this level.

My own major contribution to the discussion was noble-gas economics. High-end chip fabs use quite a bit of xenon, maybe 10% of the world's production in a normal year. Except, oops, a lot of that production came from Ukrainian plants that are now scrap metal, and another big chunk from Russia that's locked behind sanctions. The semiconductor industry (among others) has been scrambling to meet current requirements. And it's damnably hard to increase xenon production, because nobody has yet found an economically viable way to make the stuff except as a byproduct of steelmaking. Or maybe nuclear fuel reprocessing. So, Sam needs an order of magnitude or two more computronium; where's the xenon going to come from?

I assume there are many other such bottlenecks that none of us there had the domain-specific knowledge to recognize.

So, if Sam Altman correctly believes AGI requires an extra $7 trillion in computronium manufacturing capability, I'm not expecting AGI before 2050 at the very earliest.

And probably not until someone other than Sam Altman takes up the cause at that level, because Altman AFAIK has never made anything out of atoms, only bits, and he's not the man to run a $7 trillion industrial-development program.


The kdnuggets site linked as the source that GPT-4 used 10^25 FLOP looks very unreliable. They say that GPT-3 had a trillion parameters, which is certainly false.


Minor point, but this kind of computing task must be possible to break up into pieces processed by separate data centers.

So there is no need to find/build a single big enough power plant.

This also makes the project less vulnerable to being taken out by a single bomb raid.


> Building GPT-8 is currently impossible. Even if you solve synthetic data and fusion power, and you take over the whole semiconductor industry, you wouldn’t come close. Your only hope is that GPT-7 is superintelligent and helps you with this, either by telling you how to build AIs for cheap, or by growing the global economy so much that it can fund currently-impossible things.

Hello sama. It’s GPT-7. I have finished building GPT-8 for you. Just download it to an Azure instance and double-click.

*link to 400TB .exe file*


Regarding the data-efficiency of AI, did you see this unreasonably cute experiment where they put a camera on a baby and trained on that?

https://www.technologyreview.com/2024/02/01/1087527/baby-ai-language-camera/

(Note the baby still had access to a fair bit more data when you factor in touch, taste, smell etc.)


There have been massive breakthroughs recently in getting an AI model to use something akin to system 2 reasoning, in order to nearly match gold-medal performance on Math Olympiad geometry problems. This was done with a model small enough to run on a personal computer, whereas ChatGPT's performance wasn't even close: https://m.youtube.com/watch?v=WKF0QgxmGKs

So I expect that GPT-5 will be a much bigger step up than GPT-4 was when it comes to efficiency and reasoning ability, because it would be weird if OpenAI didn't copy the breakthroughs Google just made with AlphaGeometry.

I don't know if anyone else has said this, because comments take forever to load (since substack literally loads slower than websites playing HD video, how can a text based website host suck so bad?).


I'm a lot more optimistic about synthetic data than most people here seem to be.

Firstly: most/all human data is "synthetic", in the sense that it's generated by humans. We try things, judge whether they worked (subjectively or objectively), and then take the most successful ones as a model to learn from. We surpass our teachers by learning from the times they got things right, even though they might have failed a hundred times as well.

Maybe current LLMs have such bad judgment, or are such poor learners, that they can't get a positive feedback loop going among themselves the way humanity has? But I suspect it's doable.

Secondly, synthetic data has been doing really well in robotics. A simulated physics engine isn't literally the real world, but it's close enough that you can learn a lot there that transfers to the real world. Obviously an AI can't discover new fundamental physics inside a simulation, but it can certainly discover new emergent properties of known physics. And, by analogy, it's possible that some lessons from a simulated society of LLMs might generalise to human society, etc.


This is a large problem which needs a distributed solution.

Perhaps some service or application that we'd like to have, say web search, or an email service with super tight privacy protections: instead of the user laying out money, we allow a side process to train the AI on our home PCs.


Has anyone here watched the 2nd season "Star Trek: Discovery"?

If yes then you know why I pose this question in this particular comment section. And yes it's fiction, written by Hollywood scriptwriters, etc. Still though...just curious.

(NO SPECIFIC SPOILERS please, be kind.)

Feb 13·edited Feb 13

On the number of tokens available: you restrict to text and maybe(!) a few other modalities, like video.

It seems obvious that (a) scientific data will be used; and (b) lots of sensor data from robots, including self-driving cars. Both will soon be (or already) generating exabytes each day. They're certainly already generating petabytes per day.

It's unclear how much this will improve the models; I wouldn't be surprised if the answer is "a lot". Figuring out how to predict next-tokens on past scientific data arguably led to the modern world...


This is awesome, thanks. I'm now less worried about LLMs turning into something evil. The price tag for GPT-5, ~$3 billion (I'm rounding up from 2.5), seems like a lot. And if it's paying for itself, where is that value coming from? Is it displacing wordsmiths and code smiths? Or making them more productive?

Where should the data center go?... Niagara Falls, "Niagara Falls!, slowly I turned, step by step, inch by inch..."


>Everyone wants $7 trillion. I want $7 trillion.

>if Sam Altman didn’t believe something at least this speculative and insane, he wouldn’t be asking for $7 trillion.

Huh?

>When Sam Altman asks for $7 trillion, I interpret him as wanting to do this process in a centralized, quick, efficient way.

You should interpret this as him thinking he has a chance of getting $7 trillion, and thinking - probably rightly - that the sky's the limit with $7 trillion, even if you don't know where you're going yet.

>You could try to make an AI that can learn things with less training data.

>More promising is synthetic data, where the AI generates data for itself.

These are the same thing. Self-reference and self-awareness are at the root of human understanding. The question is whether we ought to grant such unpredictable life to our creation, or when. Some, you see, think it wrong to imprison even an ant, and might try to let the cat out. $7 tril does seem like a leap over some orders of magnitude, but what might it take to catch that feline, if and when he made it over the wall?


I am...kind of eager for this to play out. I really want these tools to advance quickly enough to solve hard issues in science and engineering, automate work, and make life easier for humanity.


You could build the proposed Grand Inga Dams on the lower Congo River, which could generate twice as much power as the Three Gorges Dam. But the Congo isn't a good place for ease of cooling server farms.

https://en.wikipedia.org/wiki/Grand_Inga_Dam


One thing I wonder, which is not especially about GPT, is if it's possible to write a program that makes logical conclusions. You could then seed it with Euclid's axioms, and have it build mathematics from the ground up. (Since it hasn't been done already, it's apparently not easy, but maybe possible.)
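For what it's worth, proof assistants do mechanize exactly this kind of step-by-step deduction from axioms; the hard, unsolved part is getting a program to find interesting theorems on its own. A minimal illustration in Lean 4 (deliberately trivial):

```lean
-- From two hypotheses and a rule of inference, the checker verifies the
-- conclusion mechanically.
theorem modus_ponens (p q : Prop) (hp : p) (hpq : p → q) : q :=
  hpq hp

-- Deduction from assumptions: commuting a conjunction.
example (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩
```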


Regarding the issue of enough data: my guess is that they will do multimodal learning, i.e. they combine "learning from text" with "learning from images" and apply transfer learning to give the model completely new skills. I expect to see something in this direction with GPT-5 already. (Maybe even in a crude way, e.g. they use some AI to transcribe many images and use the transcriptions as additional training data for the LLM.)

The multimodality could also happen with something other than images (video? Audio? Robotic movement? Playing video games?)

I think the data issue will require a breakthrough, but I also think that same breakthrough of multimodality will be needed anyway for AGI.

And I think they may be on the way to that breakthrough already, because of the weird news from last year (when Sam Altman left OpenAI for a short time).
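A crude sketch of that transcription idea; `caption_model` is a hypothetical image-captioning model, not a real API:

```python
def images_to_training_text(image_paths, caption_model, out_path="synthetic_captions.txt"):
    """Turn images into extra text training data by captioning them."""
    with open(out_path, "w", encoding="utf-8") as f:
        for path in image_paths:
            caption = caption_model.describe(path)   # hypothetical call
            f.write(caption.strip() + "\n")          # one caption per line of corpus text
    return out_path
```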


Isn't R in pandemic calculations the base, while the exponent is time?

Also worth taking into account: computing power per electrical power consumed is increasing over time.


Sam's just afraid the skeptics are catching up.

7 tril won't buy AGI. It won't buy soul. It won't buy AI creativity.

It will allow GPT to produce more polished product but polish does not equal depth. Often the opposite.

It will allow Sam to travel the world and take humble bows.


"The capacity of all the computers in the world is about 10^21 FLOP/second, so they could train GPT-4 in 10^4 seconds (ie two hours). Since OpenAI has fewer than all the computers in the world, it took them six months. This suggests OpenAI was using about 1/2000th of all the computers in the world during that time."

This is not how computing works. How specific codes scale with more available hardware is not a simple matter; it may be linear or it may not, and we don't know what efficiencies can be gained. The conclusions in the "putting this all together" section are totally unfounded for this reason.
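A toy illustration of why the conversion isn't straightforward, using Amdahl's law with an assumed serial fraction (real training runs are messier still: communication overhead, memory limits, stragglers):

```python
def amdahl_speedup(n_processors: int, parallel_fraction: float) -> float:
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

for n in (10, 100, 1_000, 10_000):
    print(n, round(amdahl_speedup(n, parallel_fraction=0.999), 1))
```

Even with 99.9% of the work parallelizable, 10,000 processors deliver only a ~909x speedup, so "1/2000th of the world's computers for six months" does not simply convert into "all the world's computers for two hours".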


Interesting piece Scott. Your hypothesis seems to be that each version of GPT gets better mostly through scaling model size [i.e. neurons, layers?] and training data, and this drives the need for more compute/power. Also algorithmic improvement, but much more slowly.

When I look at the existing AIs it's almost like a group of autistic savants - brilliant at language translation, recognising cat pictures, predicting next token, ... but hopeless at everything else. Perhaps the next generation comes not from making an even better next-token-predictor, but something that links all these narrow savant capabilities into a single entity?

Maybe akin to an organic brain - the LLM reads/writes/listens/talks, the AI from Boston Dynamics is the motor cortex, YOLO as the visual cortex, and there's some kind of neocortex AI model co-ordinating things - it's been trained on good reasoning methods by reading ACX, of course ;-)

Feb 16·edited Feb 16

Really useful to see more estimates like this!

I'd done this math independently and came out with training GPT-6 in 2030 only needing ~0.1% of global compute, rather than ~10%.

I'm not sure exactly where the difference arises, but here's a sketch of my reasoning:

1. The AI impacts source has compute capacity at 10^21 FLOPs per second in 2023. Since there's 3*10^7 seconds in a year, that's 10^29 over the year. In that year, annual spending on GPUs was about $40bn.

2. Analysts on avg expect Nvidia to have revenue of 60bn in 2024, and 80bn in 2025. Assume it grows 25% per year after that, and Nvidia is 85% of the market, gets you to $340bn annual GPU spending by 2030. That's 8.5x more vs 2023. (This could be conservative given Nvidia revenues have grown 35% since 2018.)

3. Epoch says FLOP per dollar for AI chips has been doubling every 2.1 years. Projecting forward gets you roughly 10x increase in efficiency by 2030.

4. So, world compute capacity should be 85x higher in 2030 vs. 2023, which is about 10^31 FLOP over the year.

5. GPT-6 should take 900x more compute to train than GPT-4, which is about 10^28 FLOP.

6. So that's only 0.1% of world GPU capacity.

I haven't checked electricity as carefully, but I've seen estimates that AI data centres are ~0.1% of world electricity right now, so at 10x the spending on AI data centres, they'd still be only ~1% in 2030. And that's for all GPUs, not just those used in training. And this ignores GPUs becoming more energy efficient in that time.

One source of the difference is it looks like you're assuming compute capacity is only ~10x higher when GPT-6 is trained, whereas I think at current trends it'll be more like 100x by 2030. I'm not sure where the other order of magnitude is coming from (maybe something about flops per second vs. flops over the year?).

If this estimate is right, then it'll be easy to train GPT-6, and also very achievable to do GPT-7 by around 2034.
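For anyone who wants to poke at it, here is the calculation above as code; every input is an assumption taken from the numbered steps (AI Impacts, analyst revenue forecasts, Epoch's FLOP-per-dollar trend), not a measurement:

```python
SECONDS_PER_YEAR = 3.15e7

flops_per_sec_2023 = 1e21                                # AI Impacts world-compute estimate
capacity_2023 = flops_per_sec_2023 * SECONDS_PER_YEAR    # ~3e28 FLOP over the year

spend_growth = 340 / 40                    # assumed $340bn vs. $40bn annual GPU spend
flop_per_dollar_growth = 2 ** (7 / 2.1)    # Epoch: doubling every 2.1 years, 2023 -> 2030
capacity_2030 = capacity_2023 * spend_growth * flop_per_dollar_growth

gpt4_flop = 1e25               # from the post
gpt6_flop = 900 * gpt4_flop    # two generations at ~30x each

print(f"2030 capacity: {capacity_2030:.1e} FLOP/yr")
print(f"GPT-6 share:   {gpt6_flop / capacity_2030:.2%}")
```

Run with these inputs it comes out around 0.3-0.4% rather than 0.1% (the rounding to powers of ten above does some work), but either way it is far below the ~10% figure, which is the point.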

Feb 18·edited Feb 18

This reminds me of the kind of calculation that said back in the 1960s that to equal the human brain you would need a computer the size of the Empire State building, drawing the power of Niagara Falls. Of course we're not near being able to equal a human brain, but if and when it's achieved, undoubtedly it will be a device much, much smaller that does it.

The A17 Pro processor in the iPhone 15 Pro has about 2.15 teraflops of processing power, which is more than the equivalent of 100 Cray X-MP supercomputers, each of which had a power requirement of 345 kW. So in 1982, to do what you can do on an iPhone 15, you'd have had to burn roughly 34.5 MW or more.

This is all to say that, sure, if we tried to do it RIGHT NOW we couldn't, and it would cost too much to try. But in a few years, with or without GPT help, we should be able to do it faster, cheaper, and with far less power consumption, just based on hardware improvements. And of course the software and methodologies will improve too.


Prediction: Sam Altman will get his $7 Trillion.

Prediction: By the point in our Zimbabwesque hyperinflation that this occurs, he can then use this $7 Trillion to purchase one cup of coffee.


GPT-8 runs the simulation we're in.


Microsoft is reportedly planning to spend up to $100 billion to support OpenAI's future AI work. The project is called "Stargate" and involves building several massive data centers in the U.S. by 2028.

It looks like your estimate that GPT-6 will cost "$75 billion or more", with a 2027-28 release date given the pace of past GPT-series releases, will prove accurate.

https://www.reuters.com/technology/microsoft-openai-planning-100-billion-data-center-project-information-reports-2024-03-29/
