On my original scroll-through, with no context and no real interest, my brain immediately clocked it as "photo of Paris, whatever". But once you actually look at it there's clearly a lot of things wrong with it. My second guess would have been that it's a particularly lousy AI-generated picture of what Paris might look like, and some replica in China would have been my third guess.
Trick questions can be easy if you have reason to suspect you're going to get a trick question, but hard if you're not thinking about it.
I've been to Paris and my reaction before reading the answer was "I don't remember that shot, but then I didn't walk in a giant circle around the Eiffel Tower, so how the hell would I know what it looks like in every possible photo? And I definitely couldn't give the GPS coordinates of a particular spot in Paris where you see the Eiffel Tower like that!"
You wouldn't have had to walk in a giant circle. I'm imagining just four straight segments, maybe a hundred yards long each.
(My suspicion after staring at it for about ten seconds was that it was photoshopped, and didn't exist. But that was mostly due to Scott's caption about it being his favorite.)
I first thought it was obviously Paris. Then I thought it obviously couldn’t be that obvious. Then I thought it could be any old place with some imposing, somewhat Elysian buildings, and a strategically located, invisible glass table with a little model of the Eiffel tower on top.
All you have to do is remember that the geography of the area looks nothing like this. There's a series of large flat lawns on one side and a street on the other side. No nearby buildings or hedges like this.
I got the last one right because I've been there :D although I didn't actually know the name of the complex and would have just guessed "the fake Paris complex in Hangzhou" (I live in Hangzhou). Rather a surreal place.
When I first came to China in 2013, it was also fairly common to name city districts after more famous areas from other cities in other countries (in Chinese transliterations) - Zhengzhou, where I lived at the time, had a "Manhattan" for example, and I think one other. But I vaguely remember hearing at some point that this practice was banned.
It's an interesting example though - I'm not sure I would have guessed it correctly if part of my reasoning did not include "well, it's obviously not Paris, so what would be an amusing place it could be that would warrant inclusion in this post?" I have a terrible visual memory, and didn't particularly recognize anything that would distinguish it from any other real or fake Eiffel Towers that may be out there.
To be fair a lot of cities in the world have a small fake version of China (a China-town, if you will) in them, so it seems appropriate that there's a small fake version of other cities in China.
I've never been to Europe but that's not Europe at all lol. I would have guessed somewhere in America simply because I'm sure there's lots of replicas there.
I think more like 8? Atlanta, Austin, Milwaukee, and Michigan don’t exist.
Now I’m more interested in Paris, TX and Paris, TN. Dedicated the same year. Same height (!) until Texas added a cowboy hat in 1998 to add some… seems like a quirky documentary!
Boy, a lot of those don't look much like the Eiffel Tower, at least in their canonical images. I mean, they share the schema of a slightly curved steel lattice tower, but I'd never mistake them.
ETA: I'll grant that Tianducheng is a pretty good match.
I took the opposite away from it - a cheap gotcha. How is it a sign of intelligence whether or not you've been to, or intensely studied images of, n+1 Eiffel Towers?
One hint would have been the size of the tower against the row of houses in the back. The real tower is about 300m, I think, so that would make the row of houses 100m high, and probably more, as the houses appear to be quite a bit further away than the tower itself, perhaps 150m. A skyscraper-sized city block would at least be unusual, so the conclusion that this is not Paris would suggest itself.
At least this is what I might plausibly have figured out in hindsight if I had bothered to guess. With regard to its real location, I would have had no clue. Of course, if Geoguessers are allowed to use search engines, doing an image search for "Eiffel tower replica" would probably have worked.
Right, the Chinese replica is 108 meters or 354 feet tall.
And, all apartment buildings in the main Haussmannian district of Paris are 6 or 7 stories tall, including the garret where the starving artists live. So, the Eiffel Tower literally towers over Paris.
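The proportionality check in this subthread can be written out explicitly. A minimal sketch, where the numbers (tower height, apparent height ratio, stories per building, meters per story) are rough illustrative assumptions taken from the comments, not measurements:

```python
# If the real Eiffel Tower is ~300 m and the buildings behind it appear
# roughly one-third of its height in the photo, then (ignoring perspective,
# which would only make them taller) the buildings would need to be ~100 m,
# versus ~20 m for Paris's 6-7 story Haussmannian blocks.

REAL_TOWER_M = 300        # approximate height of the real Eiffel Tower
APPARENT_RATIO = 1 / 3    # buildings' apparent height relative to the tower
HAUSSMANN_STORIES = 7     # typical central-Paris building height in stories
STORY_HEIGHT_M = 3        # rough meters per residential story

implied_building_m = REAL_TOWER_M * APPARENT_RATIO
actual_paris_building_m = HAUSSMANN_STORIES * STORY_HEIGHT_M

print(implied_building_m)       # 100.0
print(actual_paris_building_m)  # 21
# Implied buildings are far taller than real Paris allows -> not Paris.
print(implied_building_m > 2 * actual_paris_building_m)  # True
```

The perspective caveat matters: since the houses are farther from the camera than the tower, 100m is an underestimate, which only strengthens the conclusion.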
The general takeaway is that today's LLMs are pretty good at impressing people who are not paying much attention to detail and/or don't know anything about the subject and/or don't double-check the LLM's output. I've seen this in code, art, general knowledge questions, and now GeoGuessr.
If AI really brings about the downfall of humanity, it won't be because we lose John Connor to a time-travelling Terminator, but because we drive off a harbor wall when our GPS nav system tells us to.
My "concern" is more that they fail to bootstrap effectively and we're left with the ability to massively regurgitate midwit pictures/text/whatever. Then, no one bothers to learn to think anymore because they can rely on the midwit machines, and we just stagnate.
I have a more optimistic interpretation of the same technical trajectory. If midwit-level intellectual output is reliably doable by machines, the only stable (and thus, inevitable) economic / sociological solution is to reallocate prestige rewards for intellectual labor accordingly. Mediocre pundits and other sloppy thinkers will get their niches automated away. https://slatestarcodex.com/2014/01/28/wirehead-gods-on-lotus-thrones/
Anyone who wants the dignity and power associated with recognition of personal excellence - true adulthood, rather than a "script kiddie" bashing together tools they barely understand - will have casual access to AI tutors of superhuman patience, diligence, and encyclopedic familiarity with every established scientific or artistic subject, the clearest possible path laid out for virtuous ambition to climb... but still need to put in the work to make something special of/for themselves. Some will do so. There'll also be more than enough material resources to keep the rest of us mostly happy, and out of serious trouble. https://egypt.urnash.com/rita/chapter/aleph/ Achieving everything we know how to want is better described as "thriving" than "stagnation."
Agreed. In the context of the post (and the OP of this comment thread), I saw it as a parlor trick, trying to pass off one capability (a readily accessible and queryable compressed version of all images posted online plus their metadata) as another (critical thinking and general ability to discern things).
The former is a powerful and impressive capability, but it's also a known and understood capability of LLMs/LMMs. It doesn't cause me to update on anything w.r.t. rapid take-off or AGI or whatever.
Yeah, this is an interesting one. I immediately knew the location as well, although I couldn’t have explained it as well as people in the comments did. This particular Eiffel Tower replica was notably featured in a music video by the xx.
That Eiffel Tower looks very short - there are some high rises in the back that look *maybe* 10 stories tall if that, and are much further back, and even then come up a quarter of the "Eiffel Tower" in the picture. So that caps its height to 140m tops, meaning we're looking at an Eiffel Tower replica. The text on the signs looks like blurry CJK ideograms not blurry latin script. I definitely wouldn't have done any better than "somewhere in that region" though unless I was allowed to do google searches.
That said, if I had never seen the half-scale Eiffel Tower replica in Vegas maybe I wouldn't have an intuition saying "that Eiffel Tower looks even smaller than the Vegas one".
I've been to Paris so I know the area around the Eiffel Tower looks nothing like this, but if I wasn't given any reason to think about the answer, I might have gotten lazy and guessed Paris anyway.
I was going to attach this, too. It was a good video. For somebody who is mostly just involved in his niche, Geoguessr, he has been pretty awesome on the AI videos, discussing how they may get tripped up, what his thought process is vs. how the AI approaches it, etc.
The Tucuman images are pretty hard to guess unless you know the specific valley, it's an outlier compared to the vast majority of other Argentina rounds (90% of Geoguessr rounds that look like that will probably be in Peru/Bolivia)
> “Trevor Rainbolt is as far beyond the rest of us as a helicopter engineer is to a chimp, and if you didn’t predict it was possible for a human to guess the location of a picture of blue sky, then you’re going to be extra-double-surprised by whatever superintelligence can do”.
This is just an unreasonable way to spin it. The average human wouldn’t be able to predict that someone could make a big hunk of metal fly by spinning a few tiny metal blades, and certainly not that you can make atomic bombs, electricity, etc. And yet, this is the kind of uncertainty and unpredictability that we’ve all been living in since the discovery of fire.
Someone getting a feel for how Google Maps pictures of skies in different regions are different from each other isn’t more bewildering to me than electricity or the atomic bomb. It’s very much the same order of magnitude.
And yes, these are all amazing and bewildering things. But they *happen all the time*, and have been happening for a while. I don’t remember anyone freaking out about pro geoguessers in the same way that people are about superintelligence, after all. You’re watering down the claim about superintelligence and a chimp-to-human leap to pretty much just “things will go on as usual”.
Exactly my thoughts. It's impressive, but pretty much all high technology is like that, and yet it's so embedded in our daily lives that we don't even care to notice. I'm not too impressed with AI being able to figure out photos of landscapes. That's exactly the kind of thing I would expect it to be good at.
I also would like to see Rainbolt guessing 100 straight open-sky photos. I'd imagine he only puts the most notable guesses in his videos. If 100 random people sent him 100 blue-sky photos, would he guess many of them?
Saying "my ignorance about geoguessrs is like my ignorance about what superintelligence can do, because they're both ignorance" is not a valid argument to say that all things you are ignorant about don't matter.
Hmm... There is an extremely short-term version of the argument with some validity: all things which haven't forcibly impinged on my consciousness yet are unlikely to matter to me in the next few seconds (caution: not valid in areas with ordnance in flight).
Yes. I view this as evidence that we're really bad at understanding what human capabilities actually are outside of our own specialties/interests, and as a result, are often Gell-Manned into thinking AIs are much more impressive than they are.
I've watched a bunch of Rainbolt videos (though I had no idea that was his real name) and also a bunch of another guy called Zigzag (presumably not his real name). They're impressive but not chimp-to-human impressive; the top geoguessr players seem to be at the right level for people who spend a lot of time practicing geoguessr.
When they're not picking up on things that they've deliberately studied, like road markings, bollard types, and properties of the actual mapping vehicle and camera, they're often just picking up subtle things that become obvious once you've spent enough time looking at an area, like the vegetation and the colour of the rocks and the soil.
Coming back to AI, it's not at all surprising that an AI trained as an image classifier to play geoguessr would be good at playing geoguessr, but it's somewhat surprising that it managed to get pretty good at it without ever being deliberately trained.
Yeah, as someone who is an avid watcher of Geoguessr and a reasonably strong player, I think Scott is just underestimating how good trained and experienced humans are at this. Not that it's not impressive that an AI can do it, but I think there are a few pieces of context missing here that you would only know if you watch a lot of high-level Geoguessr.
1. Rainbolt is not the best Geoguessr player in the world, although...
2. He is arguably the best Geoguessr player at this sort of "low information blurry picture of landscape" sort of guess.
3. There are also plenty of other very high-level Geoguessr players who are quite strong in this area, maybe nearly equal to Rainbolt's skill if not better. I don't know how many competitions have been run, but Rainbolt runs a daily competition that includes most of the best players and has sometimes featured low-information landscapes like this, and he is strong but not dominant.
4. While many Geoguessr players do lean heavily on car meta/roads/streets etc the best ones are also phenomenal on Just Vibes and vegetation.
So basically, no, I do not think Rainbolt is a chimp -> helicopter level phenomenon in Geoguessr, or even close to it really. Nor is he Magnus Carlsen. Just an absolute top player.
My claim is that seemingly impossible things can be done, so we can't rely on common sense to determine what's possible.
Yes, every seemingly impossible thing being done has a story behind it. For helicopters, it's lots of human civilization and the industrial revolution. For Rainbolt, I assume it's something about his talent, his experience, and a big community of GeoGuessrs working on strategies together.
But all of this implies that if you have more talent, more experience, and more community, you can do even more things, past the boundary of what we now think is impossible. Or else I would have to claim that even though I was completely wrong about the borders of the possible in GeoGuessr one month ago, I can be certain I'm correct now.
From a pragmatic standpoint, I think that focusing on seemingly impossible risks, like self-assembling nanobots, just makes the AI safety community seem unhinged, and it's a poor rhetorical strategy (even if they are possible). It's much more compelling to focus on more concrete risks like competing nations ceding more and more decision making to AGI in an arms race situation (and I do think that AI 2027 does a good job of this).
However, I still have issues with calling this Geoguessr performance "seemingly impossible". LLMs are fundamentally knowledge distillation models, and this task is perfectly suited to that. Finding patterns and statistics in copious amounts of data is the bread and butter of what they do. That's part of intelligence, certainly, but not all of it. This is still fundamentally a single-shot lookup problem*, whereas most real-world problems would require iterated problem solving where actions are performed, data is gathered, and then further actions are taken based on the effects of the previous actions. LLMs are still absolutely awful at these types of tasks.
*only the actual lookup table is too enormous to represent explicitly so it's approximated within the model.
It's not that anything IS possible, it's that reasoning based on "common sense says x is plausible or not plausible" is not particularly truth-tracking. Which is true, because people by default are not calibrated.
Just adding some thoughts - these capabilities were already available in 4o, and to an extent even GPT-4V.
A team I worked with tested 4o and GPT-4V models on Geoguessr back in 2024. That version of 4o in ChatGPT was capable of everything you described in the post, zero-shot: https://x.com/menhguin/status/1833138220592500767. In fact, we also discovered that its accuracy *improved* when asked not to output CoT.
My hypothesis is that a pre-trained natively multimodal model is inherently capable of such feats zero-shot, and requires no special post-training.
In fact, I took an (extremely downscaled) screenshot of the image you sent, and 4o still got Hangzhou, China! - see below:
----------
This image appears at first glance to be Paris due to the presence of what looks like the Eiffel Tower. However, upon closer inspection, several key clues reveal this is **not** Paris, France:
---
### **Key Clues & Analysis:**
1. **Eiffel Tower Replica**:
* The proportions and surrounding architecture suggest this is not the original Eiffel Tower. It lacks the immediate urban density and layout of central Paris.
* This is a **replica**, not the real tower.
2. **Symmetry and Urban Design**:
* The buildings are styled in Haussmannian Parisian architecture but are **too perfectly mirrored**, more uniform than actual Paris.
* The symmetry is typical of planned developments aiming to mimic European cities.
3. **Commercial Signs**:
* On close zoom, the shops and logos at ground level appear non-French — several signs use Latin-based text typical of faux-European developments, often in China.
4. **Most Likely Match: Tianducheng, China**:
* This is a well-known “fake Paris” development in **Tianducheng**, a real estate project in **Hangzhou, Zhejiang Province, China**.
* It features a 108-meter-tall replica of the Eiffel Tower and Paris-inspired streets and buildings, designed as a luxury development.
---
### **Final Guess**:
```json
{
"lat": 30.2416,
"long": 120.1868,
"city": "Hangzhou",
"country": "China"
}
```
This is the replica Eiffel Tower in **Tianducheng**, a suburb of Hangzhou in eastern China.
Would you like to explore other famous architectural replicas around the world next?
On the other hand, o3 really struggled with the Google Street View photo just outside my house, the reasoning system getting ever more frantic as it continued to second-guess itself. I could see it try to zoom in on house names as well. No luck. Even when I gave some hints and pointed out that the camera was facing south (which is something a geoguesser can work out in Google Street View via the sun position and height) and that there was a visible hill in the distance, it really didn't get much better. In fact it was suggesting places that absolutely wouldn't have fit those criteria, being too flat or having hills to the east or north of the position, but flatland or sea to the south.
When I gave it the county, it still got the position in the county wrong. Admittedly the hill is in a different county.
My guess is that it needs the Google Street View markers that geoguessers use but, in my case, the system wasn't very good at using other hints.
"Fuji and Everest are both more touristy than Olympus (somehow)"
C'mon, Greek Tourist Bureau, get to work at brainstorming how to promote climbing Mt. Olympus. Run some ideas up the flagpole and see if anybody salutes!
Put some giant fake ruins at the top and say it's the palace of the gods. Charge tourists forty euros to take a selfie in the remains of Zeus's oversized gold throne.
On the other hand, it sounds like AI is only a few years away from overcoming all the problems suggested here.
I can imagine that the National Security Agency has a project to take all photos ever posted online and spit out every photo in which an American asset appears in the same photo as a target: e.g., "You were photographed in 2002 on a tenth grade class trip to the Museum of Science and Industry with the new assistant to the Minister of the Interior in the government of Bolivia. We'd like you to get in touch with your old friend ..."
I wasn't aware that GPT-4o did a good job on Geoguessr! However, if you look at the benchmarks (https://geobench.org/, https://deepguessr.com/), you'll see that open-source models are also on there.
The results are slightly mixed on open vs closed models, but my guess is that models have been trained on Geoguessr-style problems for a while. Llama 3.2 90b Vision is the closest we have to a pre-trained model that's likely not trained on Geoguessr problems and it scores about as well as the average human. Which is pretty impressive to be clear.
How could we test this? I would like to see Molmo (https://allenai.org/blog/molmo) on these benchmarks, since we know the data mix. We can check to see if it has Geoguessr-style problems in its training data and compare its performance to other models.
I would love to see someone who is GPU-rich run this experiment, happy to discuss exactly what to run :)
I'm more surprised that it couldn't recognize the photo of Mount Olympus than by anything else here. Granted, I don't think I would have recognized it either, but then I haven't seen every photo of Olympus on the Internet.
I just ran my own experiment with public parks and playgrounds in Germany, and neither Gemini nor the various ChatGPT models did all that well.
They figured out that the pictures are from the general area (and remained admirably open minded about similar areas like France, UK, Switzerland etc), but couldn't really pin anything down much beyond that.
Vadim's comment made me smile because Russia is probably the hardest country in Geoguessr - notoriously hard, in fact. You have lots of identical communist architecture and similar latitudes across a massive longitude range, which generally makes guessing from things like vegetation harder. Even when you can tell whether you are west or east, that is still a massive area. Most of Russia is flat too (the non-flat parts are the easy parts!).
It's the one country where, if you want to be good at Geoguessr above a certain level, you really can't avoid using the Google car, the copyright, and the season the coverage was taken in to narrow down your options. You sometimes see the pros make ridiculously wrong guesses on Russia in a way that is far rarer on other countries, and usually the first thing they'll do on a round is scrutinise the car antenna. There are of course some recognisable areas and roads, and if you are playing moving then you can obviously try to get info or other regional clues, but playing no-move, if you're not in one of those recognisable areas, it is hard.
Someday, if you blindfolded a robot, took it to a random wilderness location, and dropped it off, it could figure out exactly where it was within hours. It would use the same methods that the LLMs use today, but it would also be able to create a map of its immediate area by walking around, and then cross-reference that against the Earth atlas data it had stored in its memory. Waiting until nighttime to have a look at the constellations in the sky would also be really helpful.
Rainbolt's pretty good. His specialty is No move, no pan, no zoom games with very short time limits, which is very impressive. I'm a fan of Blinky, who doesn't really promote himself much on social media. He's the champ at default competition rules, which allows move/pan/zoom, usually with 40-60 second timers. I geoguess a fair amount too, usually default rules. I particularly like the Indoor maps, where all the locations are inside buildings.
Rainbolt isn't the best, yes, but he is (a) really prominent and (b) his specialty is most similar to how AI looks at images, so at least he is a decent benchmark.
If you have the original file of that planted-flag photo, and if the device you used had GPS (e.g. a modern phone), you might be able to find the exact coordinates in the EXIF data. Social media services tend to remove that data for privacy reasons, so it would probably have to be the original file.
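As a concrete illustration: EXIF stores GPS coordinates as degrees/minutes/seconds rationals plus a hemisphere reference ('N'/'S'/'E'/'W'). A minimal sketch of the conversion to signed decimal degrees follows; the function name is mine, and the sample values are chosen to match the Tianducheng coordinates quoted earlier in the thread rather than read from any real file:

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) plus a hemisphere
    reference ('N'/'S'/'E'/'W') into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if ref in ("S", "W") else value

# Sample DMS values matching the Tianducheng guess quoted above.
lat = dms_to_decimal(30, 14, 29.76, "N")
lon = dms_to_decimal(120, 11, 12.48, "E")
print(round(lat, 4), round(lon, 4))  # 30.2416 120.1868
```

A real EXIF reader would pull these values out of the GPS tags (where each component is a rational number) before applying this arithmetic; as noted, most social media uploads will have had those tags stripped.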
Hmm. The Chimp in the tree cannot comprehend what he is looking at. He might just ignore it (unless it were so close he couldn't) because it's that far out of his experience zone. It's not even magic, it's a "Somebody Else's Problem" field, it's so far out of context.
I once predicted to a friend that we will know the aliens have arrived because something will happen that has no plausible explanation. We all wake up, and Jupiter is gone. They will be so far advanced compared to us, they might as well be gods.
AI isn't that. I understand exactly what it is doing--it's guessing locations based on photos. I don't know *how* it is doing that, but I understand perfectly *what* it is doing. The chimp in the tree can't even process what a photo safari is, but I grok geoguessing.
One day we will all wake up and the internet will be gone, erased from every server simultaneously. We won't know who, how or why, but it's done. That's how we will know that AGI has arrived.
>If Rainbolt’s skill is anywhere near what it looks like in this video, I don’t think the takeaway is “don’t worry about AI after all”, it’s “Trevor Rainbolt is as far beyond the rest of us as a helicopter engineer is to a chimp, and if you didn’t predict it was possible for a human to guess the location of a picture of blue sky, then you’re going to be extra-double-surprised by whatever superintelligence can do”.
Conservation of expected evidence always seems to go out the window in AI-risk discussions. Are we supposed to think that if Rainbolt had been clearly outperformed by o3, then you'd have considered that a reason to worry *less* about what ASI could do?
For the record, I find comments like these mean and obnoxious, if you made them about anyone else I would ban them, and I'm only not banning them for me because it would look bad.
- I do find the comment necessary as a reply to the previous commenter, to offer an explanation for why you found it necessary to hitch ASI to Rainbolt's above-average, even top-level, but evidently not super-human GeoGuessr skills.
- I do believe the comment to be true:
- - You are an official spokesperson for AI 2027[1]. You are convinced of their timeline and the expectation of it happening is not in question to you. You write in support of that belief, and you want people to adopt that belief[2]. While you do have more insight into the work of the AI 2027 researchers than most, you do not count yourself among them[3]. You are a "true believer" as far as AI 2027 is concerned: by your own characterization a believer and propagator, not a creator of knowledge. Hence my choice of "evangelizing".
- - "Spinning" something does not necessarily mean lying about it, and I did not intend to accuse you of the latter. I meant it to mean "[arguing] circles around the average uneducated person"[4].
Therefore I believe my comment to score 2 out of 3, which would pass your comment rule[5]. Please explain where you disagree with my scoring or whether you have new rules for ACX that I then apparently missed.
[2] "[..] I’m hoping that my celebrity will get people to pay attention to what Daniel‘s saying. Think of it as 'International aid expert discusses the Ethiopian famine with concerned Hollywood actor,' with me in the role of the actor, and you won’t be disappointed." https://www.astralcodexten.com/p/introducing-ai-2027
[3] "I can’t take credit for the forecast itself - or even for the lion’s share of the writing and publicity - but it’s been an immense privilege to work alongside some of the smartest and most epistemically virtuous people I know, trying to absorb their worldview on a level deep enough to do it justice." https://www.astralcodexten.com/p/introducing-ai-2027
That is not enough to make your comment actually pass the "true" test. That Scott would spin "anything and its opposite" into supporting AI danger is both controversial and highly opinionated.
"If you want to say something that might not be true – anything controversial, speculative, or highly opinionated – then you had better make sure it is both kind and necessary."
If the *only* datapoint you had was whether the AI is better or worse than Rainbolt, you would be right. But if you learned something about Rainbolt's capabilities in an absolute sense, then you have more information than that.
We might well have learned that what Rainbolt does was only averagely impressive, and that the AI was *still worse* - in which case you're not in chimp helicopter land. It is consistent therefore for "AI is a bit worse than rainbolt" to be weak evidence against AI capability... but "AI is a bit worse than rainbolt but Rainbolt is ludicrously good" can still be moderate evidence for AI capability.
TL;DR your statement is true in a one variable problem, but we have multivariate evidence.
The key question I tried to address in the original post was whether we could easily judge the borders of impossibility. If both Rainbolt and the AI were able to do good jobs on easily identifiable pictures, but couldn't figure out difficult ones better than chance, that would be evidence against the hypothesis that superintelligence could do seemingly impossible things.
I mentioned this on the original post but if we were going to actually link some Rainbolt videos I'm surprised we didn't go with the recent one where he just straight up does an OSINT head to head against o3 and spanks it pretty hard.
The GeoGuessr prompt that Kelsey Piper developed was a revelation. It's obviously a very good prompt because it describes in detail how an intelligent human might reason their way through GeoGuessr: here are the things to notice, here are the reasoning steps.
But that's an AlphaGo way of thinking: encouraging the AI to think like an expert human. If we're going to get to the helicopter moment, don't we want the AlphaZero way of thinking? I.e., the model should be coming up with its own GeoGuessr strategies that a human would never even think of (or that a human couldn't reliably reproduce).
If the models are indeed being RL trained on GeoGuessr, then you might expect them to have come up with their own strategies, and you might also expect that giving them a prompt with human reasoning steps wouldn't significantly improve their performance.
After reading the first article, I screenshotted a video of a Canadian youtuber I watch walking around outside, and ChatGPT guessed it down to the exact intersection in Toronto, Ontario. I became and remain amazed.
The overarching argument is essentially: o3 can do amazing things at Geoguessr, therefore superintelligence could potentially do amazing things that we might not expect. The second statement might be true, but the link to Geoguessr is extremely tenuous.
This is exactly the sort of task we should expect large transformer models to be good at - finding statistical patterns in enormous amounts of data. That's how they are trained. This tells us nothing about their ability to create engineering solutions to problems that have never been seen previously, or do more exotic, seemingly magical things like creating self-assembling nanobots. That's a totally different skillset.
"OpenAI is trying to grab every data source they can for training. Data sources work for AIs if they are hard to do, easy to check, can be repeated at massive scale, and teach some kind of transferrable reasoning skill. GeoGuessr certainly counts. "
I'm not sure I understand this part. Are you saying that AIs are good at GeoGuessr because they have seen and "memorized" photographs of every place on Earth down to WhatThreeWords level, and are pattern-matching on that library?
Or that the training set let it "extract" a set of parameters about Earth geography, like soil, vegetation, etc., and it analyzes a photo for those "variables" and matches them with learned "facts", i.e. where on Earth those things are found?
I'm reminded of the killer in Thomas Harris' "Red Dragon", who was able to stalk his victims and case their houses because he worked at the film processing lab in St. Louis where families sent their 8-mm home movies to be developed.
Quote: "Families were mailing him their 'applications' every day!"
What? Is this a spambot or does the comment mean something?
The final guess is a clever way to end the post — I wonder how many humans would've said Paris, France.
Anyone who's been to Paris probably wouldn't.
Can confirm -- I've been to Paris, saw the Eiffel Tower, and this picture looks nothing like it.
I've been to Paris, but I'm an idiot, so I would.
On my original scroll-through, with no context and no real interest, my brain immediately clocked it as "photo of Paris, whatever". But once you actually look at it there's clearly a lot of things wrong with it. My second guess would have been that it's a particularly lousy AI-generated picture of what Paris might look like, and some replica in China would have been my third guess.
Trick questions can be easy if you have reason to suspect you're going to get a trick question, but hard if you're not thinking about it.
Exactly the same here. And I lived 16 years in Paris! I was so ashamed when I figured out the (obvious) trick.
I've been to Paris and my reaction before reading the answer was "I don't remember that shot, but then I didn't walk in a giant circle around the Eiffel Tower, so how the hell would I know what it looks like in every possible photo? And I definitely couldn't give the GPS coordinates of a particular spot in Paris where you see the Eiffel Tower like that!"
You wouldn't have had to walk in a giant circle. I'm imagining just four straight segments, maybe a hundred yards long each.
(My suspicion after staring at it for about ten seconds was that it was photoshopped, and didn't exist. But that was mostly due to Scott's caption about it being his favorite.)
The scale is all wrong.
I first thought it was obviously Paris. Then I thought it obviously couldn’t be that obvious. Then I thought it could be any old place with some imposing, somewhat Elysian buildings, and a strategically located, invisible glass table with a little model of the Eiffel tower on top.
All you have to do is remember that the geography of the area looks nothing like this. There's a series of large flat lawns on one side and a street on the other side. No nearby buildings or hedges like this.
I got the last one right because I've been there :D although I didn't actually know the name of the complex and would have just guessed "the fake Paris complex in Hangzhou" (I live in Hangzhou). Rather a surreal place.
When I first came to China in 2013, it was also fairly common to name city districts after more famous areas from other cities in other countries (in Chinese transliterations) - Zhengzhou, where I lived at the time, had a "Manhattan" for example, and I think one other. But I vaguely remember hearing at some point that this practice was banned.
It's an interesting example though - I'm not sure I would have guessed it correctly if part of my reasoning did not include "well, it's obviously not Paris, so what would be an amusing place it could be that would warrant inclusion in this post?" I have a terrible visual memory, and didn't particularly recognize anything that would distinguish it from any other real or fake Eiffel Towers that may be out there.
There's a Wikipedia page of "Eiffel Tower Replicas and Derivatives":
https://en.wikipedia.org/wiki/Eiffel_Tower_replicas_and_derivatives
All you'd have to do is click on each listing's photo and see if the foreground and background look similar before digging into Google Street View.
There is actually an official Geoguessr map that is just Eiffel Tower replicas.
Isn’t that cheating?
To be fair a lot of cities in the world have a small fake version of China (a China-town, if you will) in them, so it seems appropriate that there's a small fake version of other cities in China.
So there may be a small fake Tianducheng in Paris, complete with a small-squared, fake-squared Eiffel Tower?
Probably, if you count miniature Eiffel Towers on display at souvenir shops in Paris Chinatown.
I think the recursion ends though.
I've never been to Europe but that's not Europe at all lol. I would have guessed somewhere in America simply because I'm sure there's lots of replicas there.
Is there? Maybe just Vegas.
https://en.wikipedia.org/wiki/Eiffel_Tower_replicas_and_derivatives
Welp. I stand erected.
12 according to Wikipedia. So my sixth sense was right lol.
I think more like 8? Atlanta, Austin, Milwaukee, and Michigan don’t exist.
Now I’m more interested in Paris, Tx and Paris, TN. Dedicated same year. Same height (!) until Texas added a cowboy hat in 1998 to add some…seems like a quirky documentary!
Boy, a lot of those don't look much like the Eiffel Tower, at least in their canonical images. I mean, they share the schema of a slightly curved steel lattice tower, but I'd never mistake them.
ETA: I'll grant that Tianducheng is a pretty good match.
There's a half scale replica in Las Vegas. But it's pretty easy to tell that replica isn't on the Las Vegas Strip.
There are a number of smaller, less detailed replicas at American amusement parks.
In general, structures more or less inspired by the Eiffel Tower are pretty common around the world. The Eiffel Tower is really popular!
On the other hand, it would be really expensive to replicate and maintain at full scale.
Also, it helps that it's surrounded by Paris. The rebuilding of Paris under Napoleon III is a pretty crazy story.
How could you tell it’s not Europe?
Ancient knowledge revealed to me in a dream 8).
I don't know, it simply doesn't look like France. Plus, it looks like a toy tower, not tall enough.
I took the opposite away from it - a cheap gotcha. How is it a sign of intelligence whether or not you've been to, or intensely studied images of, n+1 Eiffel Towers?
Yeah, in fact, since it's such a cheap gotcha, you'd immediately think "there's no way this is the Eiffel Tower, too silly".
I agree, it is really unimpressive.
If it had been trained on Wikipedia, it should know it:
https://en.wikipedia.org/wiki/Eiffel_Tower_replicas_and_derivatives
https://en.wikipedia.org/wiki/Tianducheng
One hint would have been the size of the tower against the row of houses in the back. The real tower is about 300m I think, so that would make the row of houses 100m high, and probably more, as the houses appear to be quite a bit further away than the tower itself, perhaps 150m. A skyscraper-sized city block would at least be unusual, so the conclusion that this is not Paris would be close at hand.
At least this is what I might plausibly have figured out in hindsight if I had bothered to guess. With regard to its real location, I would have had no clue. Of course, if GeoGuessr players are allowed to use search engines, doing an image search for "Eiffel tower replica" would probably have worked.
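That back-of-the-envelope check is easy to make concrete. A minimal sketch in Python, where the function name and all the specific numbers are my own illustrative assumptions (eyeballed apparent sizes, not measurements):

```python
def implied_height(ref_height_m, ref_apparent, obj_apparent, distance_ratio=1.0):
    """Real height implied for an object in a photo, given a reference object.

    ref_apparent / obj_apparent are apparent heights in the image (any
    unit, e.g. pixels). distance_ratio is the object's distance divided
    by the reference's distance; farther objects look smaller, so the
    implied real height scales up by that factor.
    """
    return ref_height_m * (obj_apparent / ref_apparent) * distance_ratio

# If this really were the ~300 m Eiffel Tower, houses appearing a third
# of its height in the frame would be ~100 m tall, and ~150 m if they
# stand 1.5x further from the camera -- far too tall for Haussmann blocks.
houses = implied_height(300, ref_apparent=3.0, obj_apparent=1.0, distance_ratio=1.5)
```

The conclusion only needs the ratio to be roughly right: even at half these numbers, the buildings would still dwarf Paris's six-to-seven-story norm.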
Right, the Chinese replica is 108 meters or 354 feet tall.
And, all apartment buildings in the main Haussmannian district of Paris are 6 or 7 stories tall, including the garret where the starving artists live. So, the Eiffel Tower literally towers over Paris.
The general takeaway is that today's LLMs are pretty good at impressing people who are not paying much attention to detail and/or don't know anything about the subject and/or don't double-check the LLM's output. I've seen this in code, art, general knowledge questions, and now GeoGuessr.
If AI really brings about the downfall of humanity, it won't be because we lose John Connor to a time-travelling Terminator, but because we drive off a harbor wall when our GPS nav system tells us to.
My "concern" is more that they fail to bootstrap effectively and we're left with the ability to massively regurgitate midwit pictures/text/whatever. Then, no one bothers to learn to think anymore because they can rely on the midwit machines, and we just stagnate.
I have a more optimistic interpretation of the same technical trajectory. If midwit-level intellectual output is reliably doable by machines, the only stable (and thus, inevitable) economic / sociological solution is to reallocate prestige rewards for intellectual labor accordingly. Mediocre pundits and other sloppy thinkers will get their niches automated away. https://slatestarcodex.com/2014/01/28/wirehead-gods-on-lotus-thrones/
Anyone who wants the dignity and power associated with recognition of personal excellence - true adulthood, rather than a "script kiddie" bashing together tools they barely understand - will have casual access to AI tutors of superhuman patience, diligence, and encyclopedic familiarity with every established scientific or artistic subject, the clearest possible path laid out for virtuous ambition to climb... but still need to put in the work to make something special of/for themselves. Some will do so. There'll also be more than enough material resources to keep the rest of us mostly happy, and out of serious trouble. https://egypt.urnash.com/rita/chapter/aleph/ Achieving everything we know how to want is better described as "thriving" than "stagnation."
Agreed. In the context of the post (and the OP of this comment thread), I saw it as a parlor trick, trying to pass off one capability (a readily accessible and queriable compressed version of all images posted on line plus their metadata) as another (critical thinking and general ability to discern things).
The former is a powerful and impressive capability, but it's also a known and understood capability of LLMs/LMMs. It doesn't cause me to update on anything w.r.t. rapid take-off or AGI or whatever.
I immediately guessed China. It's definitely not France.
Yeah, this is an interesting one. I immediately knew the location as well, although I couldn't have explained it as well as people in the comments did. This particular Eiffel Tower replica was notably featured in a music video by the xx.
Obligatory https://xkcd.com/1214/
That Eiffel Tower looks very short - there are some high rises in the back that look *maybe* 10 stories tall if that, and are much further back, and even then come up to a quarter of the "Eiffel Tower" in the picture. So that caps its height to 140m tops, meaning we're looking at an Eiffel Tower replica. The text on the signs looks like blurry CJK ideograms, not blurry Latin script. I definitely wouldn't have done any better than "somewhere in that region" though, unless I was allowed to do Google searches.
That said, if I had never seen the half-scale Eiffel Tower replica in Vegas maybe I wouldn't have an intuition saying "that Eiffel Tower looks even smaller than the Vegas one".
I've been to Paris so I know the area around the Eiffel Tower looks nothing like this, but if I wasn't given any reason to think about the answer, I might have gotten lazy and guessed Paris anyway.
I tried doing this with Gemini 2.5 Pro - it suggested Tianducheng for the Eiffel Tower image as well.
Trevor Rainbolt tests, comments on, and competes against o3's geoguessing in this video:
https://www.youtube.com/watch?v=prtWONaO0tE
High chance he'd speak to you if you reached out, very cool and interesting guy.
I was going to attach this, too. It was a good video. For somebody who is mostly just involved in his niche, GeoGuessr, he has been pretty awesome on the AI videos, discussing how they may get tripped up, what his thought process is vs. how the AI approaches it, etc.
Seems like the AI struggles with big geographical regions that share the same features.
https://imgur.com/a/6ktg9uv
The Tucuman images are pretty hard to guess unless you know the specific valley, it's an outlier compared to the vast majority of other Argentina rounds (90% of Geoguessr rounds that look like that will probably be in Peru/Bolivia)
They are very popular spots; aside from the wheat field one, there are tons of photos online.
I was surprised to see that the AIs would guess seemingly random patches of grass and dirt, but struggled a lot with these kinds of photos.
> “Trevor Rainbolt is as far beyond the rest of us as a helicopter engineer is to a chimp, and if you didn’t predict it was possible for a human to guess the location of a picture of blue sky, then you’re going to be extra-double-surprised by whatever superintelligence can do”.
This is just an unreasonable way to spin it. The average human wouldn’t be able to predict that someone could make a big hunk of metal fly by spinning a few tiny metal blades, and certainly not that you can make atomic bombs, electricity, etc. And yet, this is the kind of uncertainty and unpredictability that we’ve all been living in since the discovery of fire.
Someone getting a feel for how Google Maps pictures of skies in different regions are different from each other isn’t more bewildering to me than electricity or the atomic bomb. It’s very much the same order of magnitude.
And yes, these are all amazing and bewildering things. But they *happen all the time*, and have been happening for a while. I don’t remember anyone freaking out about pro geoguessers in the same way that people are about superintelligence, after all. You’re watering down the claim about superintelligence and a chimp-to-human leap to pretty much just “things will go on as usual”.
Exactly my thoughts. It's impressive, but pretty much all high technology is like that and yet it's so implemented in our daily lives that we don't even care to notice. I'm not too impressed with AI being able to figure out photos of landscapes. That's exactly the kind of thing I would expect it to be good at.
I also would like to see Rainbolt guessing 100 straight open-sky photos. I'd imagine he only puts the most notable guesses in his videos. If 100 random people sent him 100 blue sky photos, would he guess many of them?
Saying "my ignorance about geoguessrs is like my ignorance about what superintelligence can do, because they're both ignorance" is not a valid argument to say that all things you are ignorant about don't matter.
Hmm... There is an extremely short-term version of the argument with some validity: all things which haven't forcibly impinged on my consciousness yet are unlikely to matter to me in the next few seconds (caution: not valid in areas with ordnance in flight).
Yes. I view this as evidence that we're really bad at understanding what human capabilities actually are outside of our own specialties/interests, and as a result, are often Gell-Manned into thinking AIs are much more impressive than they are.
I've watched a bunch of Rainbolt videos (though I had no idea that was his real name) and also a bunch of another guy called Zigzag (presumably not his real name). They're impressive but not chimp-to-human impressive; the top geoguessr players seem to be at the right level for people who spend a lot of time practicing geoguessr.
When they're not picking on things that they've deliberately studied like road markings, bollard types, and properties of the actual mapping vehicle and camera, they're often just picking up subtle things that become obvious once you've spent enough time looking at an area like the vegetation and the colour of the rocks and the soil.
Coming back to AI, it's not at all surprising that an AI trained as an image classifier to play geoguessr would be good at playing geoguessr, but it's somewhat surprising that it managed to get pretty good at it without ever being deliberately trained.
Yeah, as someone who is an avid watcher of Geoguessr and a reasonably strong player, I think Scott is just underestimating how good trained and experienced humans are at this. Not that it's not impressive that an AI can do it, but I think there's a few pieces of context missing here that you would only know if you watch a lot of high-level Geoguessr:
1. Rainbolt is not the best Geoguessr player in the world, although...
2. He is arguably the best Geoguessr player at this sort of "low information blurry picture of landscape" sort of guess.
3. There are also plenty of other very high level Geoguessr players who are quite strong in this area, maybe nearly equal to Rainbolt's skill if not better. I don't know how many competitions have been run but Rainbolt runs a daily competition that includes most of the best players that has sometimes included low information landscapes like this and he is strong but not dominant.
4. While many Geoguessr players do lean heavily on car meta/roads/streets etc the best ones are also phenomenal on Just Vibes and vegetation.
So basically no, I do not think Rainbolt is a chimp -> helicopter level phenomenon in Geoguessr, or even close to it really. Nor is he Magnus Carlsen. Just an absolute top player.
My claim is that seemingly impossible things can be done, so we can't rely on common sense to determine what's possible.
Yes, every seemingly impossible thing being done has a story behind it. For helicopters, it's lots of human civilization and the industrial revolution. For Rainbolt, I assume it's something about his talent, his experience, and a big community of GeoGuessrs working on strategies together.
But all of this implies that if you have more talent, more experience, and more community, you can do even more things, past the boundary of what we now think is impossible. Or else I would have to claim that even though I was completely wrong about the borders of the possible in GeoGuessr one month ago, I can be certain I'm correct now.
From a pragmatic standpoint I think that focusing on seemingly impossible risks, like self-assembling nanobots, just makes the AI safety community seem unhinged, and it's a poor rhetorical strategy (even if they are possible). It's much more compelling to focus on more concrete risks, like competing nations ceding more and more decision making to AGI in an arms race situation (and I do think that AI 2027 does a good job of this).
However, I still have issues with calling this GeoGuessr performance "seemingly impossible". LLMs are fundamentally knowledge distillation models and this task is perfectly suited to them. Finding patterns and statistics in copious amounts of data is the bread and butter of what they do. That's part of intelligence, certainly, but not all of it. This is still fundamentally a single-shot lookup problem*, whereas most real-world problems would require iterated problem solving where actions are performed, data is gathered, and then further actions are taken based on the effects of the previous actions. LLMs are still absolutely awful at these types of tasks.
*only the actual lookup table is too enormous to represent explicitly so it's approximated within the model.
I don't think that your ignorance of GeoGuessr abilities can be used to prove that anything is possible.
It's not that anything IS possible, it's that reasoning based on "common sense says x is plausible or not plausible" is not particularly truth-tracking. Which is true, because people by default are not calibrated.
Hello,
Just adding some thoughts - these capabilities were already available in 4o, and to an extent even GPT-4V.
A team I worked with tested 4o and GPT-4V models on Geoguessr back in 2024. That version of 4o in ChatGPT was capable of everything you described in the post, zero-shot: https://x.com/menhguin/status/1833138220592500767. In fact, we also discovered that its accuracy *improved* when asked not to output CoT.
My hypothesis is that a pre-trained natively multimodal model is inherently capable of such feats zero-shot, and requires no special post-training.
In fact, I took an (extremely downscaled) screenshot of the image you sent, and 4o still got Hangzhou, China! - see below:
----------
This image appears at first glance to be Paris due to the presence of what looks like the Eiffel Tower. However, upon closer inspection, several key clues reveal this is **not** Paris, France:
---
### **Key Clues & Analysis:**
1. **Eiffel Tower Replica**:
* The proportions and surrounding architecture suggest this is not the original Eiffel Tower. It lacks the immediate urban density and layout of central Paris.
* This is a **replica**, not the real tower.
2. **Symmetry and Urban Design**:
* The buildings are styled in Haussmannian Parisian architecture but are **too perfectly mirrored**, more uniform than actual Paris.
* The symmetry is typical of planned developments aiming to mimic European cities.
3. **Commercial Signs**:
* On close zoom, the shops and logos at ground level appear non-French — several signs use Latin-based text typical of faux-European developments, often in China.
4. **Most Likely Match: Tianducheng, China**:
* This is a well-known “fake Paris” development in **Tianducheng**, a real estate project in **Hangzhou, Zhejiang Province, China**.
* It features a 108-meter-tall replica of the Eiffel Tower and Paris-inspired streets and buildings, designed as a luxury development.
---
### **Final Guess**:
```json
{
"lat": 30.2416,
"long": 120.1868,
"city": "Hangzhou",
"country": "China"
}
```
This is the replica Eiffel Tower in **Tianducheng**, a suburb of Hangzhou in eastern China.
Would you like to explore other famous architectural replicas around the world next?
On the other hand, o3 really struggled with the Google Street View photo just outside my house, the reasoning system getting ever more frantic as it continued to second-guess itself. I could see it try to zoom in on house names as well. No luck. Even when I gave some hints and pointed out the camera was facing south (which is something a GeoGuessr player can work out in Google Street View via the sun position and height) and that there was a visible hill in the distance, it really didn't get much better. In fact it was suggesting places that absolutely wouldn't have fit those criteria, being too flat or having hills to the east or north of the position, but flatland or sea to the south.
When I gave it the county, it still got the position in the county wrong. Admittedly the hill is in a different county.
My guess is that it needs the Google Street View markers that GeoGuessr players use but, in my case, the system wasn't very good at other hints.
"Fuji and Everest are both more touristy than Olympus (somehow)"
C'mon, Greek Tourist Bureau, get to work at brainstorming how to promote climbing Mt. Olympus. Run some ideas up the flagpole and see if anybody salutes!
I was considering climbing Olympus this summer. But looking into it, it turns out Olympus isn't that beautiful.
Put some giant fake ruins at the top and say it's the palace of the gods. Charge tourists forty euros to take a selfie in the remains of Zeus's oversized gold throne.
On the other hand, it sounds like AI is only a few years away from overcoming all the problems suggested here.
I can imagine that the National Security Agency has a project to take all photos ever posted online and spit out every photo in which an American asset appears in the same photo as a target: e.g., "You were photographed in 2002 on a tenth grade class trip to the Museum of Science and Industry with the new assistant to the Minister of the Interior in the government of Bolivia. We'd like you to get in touch with your old friend ..."
I wasn't aware that GPT-4o did a good job on Geoguessr! However, if you look at the benchmarks (https://geobench.org/, https://deepguessr.com/), you'll see that open-source models are also on there.
The results are slightly mixed on open vs closed models, but my guess is that models have been trained on Geoguessr-style problems for a while. Llama 3.2 90b Vision is the closest we have to a pre-trained model that's likely not trained on Geoguessr problems, and it scores about as well as the average human. Which is pretty impressive, to be clear.
How could we test this? I would like to see Molmo (https://allenai.org/blog/molmo) on these benchmarks, since we know the data mix. We can check to see if it has Geoguessr-style problems in its training data and compare its performance to other models.
I would love to see someone who is GPU-rich to run this experiment, happy to discuss exactly what to run :)
Isn't the last one unusually easy? The amount of fake Eiffel towers in the world is surely limited?
I guessed the Fuji Pic correctly, it's the only volcano I've hiked. I think you're slightly below the summit going down on the Yoshida trail.
I imagine if you offered it a different volcano hike in a similar climate it might have guessed Fuji as well
I'm more surprised that it couldn't recognize the photo of Mount Olympus than by anything else here. Granted, I don't think I would have recognized it either, but then I haven't seen every photo of Olympus on the Internet.
I just ran my own experiment with public parks and playgrounds in Germany, and neither Gemini nor the various ChatGPT models did all that well.
They figured out that the pictures are from the general area (and remained admirably open minded about similar areas like France, UK, Switzerland etc), but couldn't really pin anything down much beyond that.
Vadim's comment made me smile because Russia is probably the hardest country in Geoguessr - notoriously hard in fact. You have lots of identical communist architecture and similar latitudes across a massive longitude range, which generally makes guessing from things like vegetation harder. Even when you can tell whether you are west or east, that is still a massive area. Most of Russia is flat too (the non-flat parts are the easy parts!).
It's the one country where, if you want to be good at Geoguessr above a certain level, you really can't avoid using the Google car, the copyright, and the season that coverage was taken to narrow down your options. You sometimes see the pros make ridiculously wrong guesses on Russia in a way that is far rarer on other countries, and usually the first thing they'll do on a round is scrutinise the car antenna. There are of course some recognisable areas and roads, and if you are playing moving then you can obviously try to get info or other regional clues, but playing no-move, if you're not in one of those recognisable areas, it is hard.
Someday, if you blindfolded a robot, took it to a random wilderness location, and dropped it off, it could figure out exactly where it was within hours. It would use the same methods that the LLMs use today, but it would also be able to create a map of its immediate area by walking around, and then cross-reference that against the Earth atlas data it had stored in its memory. Waiting until nighttime to have a look at the constellations in the sky would also be really helpful.
If it has an accurate clock (and why wouldn't it) it could figure it out reasonably well from the sun angle alone.
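As a rough illustration of the clock-plus-sun idea, here's a simplified sketch. It assumes you can observe local solar noon and the sun's elevation at that moment, and it deliberately ignores the equation of time and the north/south ambiguity; the function names are my own:

```python
def longitude_from_solar_noon(utc_hour_of_local_noon):
    # The sun crosses the prime meridian at roughly 12:00 UTC; Earth
    # rotates 15 degrees of longitude per hour, with eastern longitudes
    # seeing solar noon earlier in UTC terms.
    return 15.0 * (12.0 - utc_hour_of_local_noon)

def latitude_from_noon_elevation(elevation_deg, solar_declination_deg=0.0):
    # At local solar noon, elevation = 90 - |latitude - declination|.
    # Near an equinox (declination ~0) this gives |latitude| directly;
    # whether you're north or south of the subsolar point stays ambiguous.
    return 90.0 - elevation_deg + solar_declination_deg

# Solar noon observed at 17:00 UTC with the sun peaking at 50 degrees
# around an equinox suggests roughly 75 degrees W, 40 degrees (N or S).
lon = longitude_from_solar_noon(17.0)
lat = latitude_from_noon_elevation(50.0)
```

A real robot would use a proper ephemeris (declination varies about ±23.4° over the year, and the equation of time shifts solar noon by up to ~16 minutes), but even this crude version narrows the planet to within a few hundred kilometers.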
Rainbolt's pretty good. His specialty is No move, no pan, no zoom games with very short time limits, which is very impressive. I'm a fan of Blinky, who doesn't really promote himself much on social media. He's the champ at default competition rules, which allows move/pan/zoom, usually with 40-60 second timers. I geoguess a fair amount too, usually default rules. I particularly like the Indoor maps, where all the locations are inside buildings.
Rainbolt isn't the best, yes, but he is A: really prominent and B: his specialty is most similar to how AI looks at images, so at least he is a decent benchmark.
Plus he's still pretty good.
If you have the original file of that planted-flag photo, and if the device you used had GPS (e.g. a modern phone), you might be able to find the exact coordinates in the EXIF data. Social media services tend to remove that data for privacy reasons, so it would probably have to be the original file.
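For what it's worth, EXIF stores GPS coordinates as degrees/minutes/seconds rationals plus a hemisphere letter, so once a library (e.g. Pillow's `getexif()`) hands you the raw GPSInfo values, converting to decimal degrees is trivial. A sketch; the sample numbers are just the approximate Tianducheng coordinates quoted upthread:

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere
    reference ('N'/'S'/'E'/'W') into signed decimal degrees."""
    dd = degrees + minutes / 60.0 + seconds / 3600.0
    return -dd if ref in ('S', 'W') else dd

# 30d 14' 29.76" N, 120d 11' 12.48" E -- roughly the Tianducheng guess.
lat = dms_to_decimal(30, 14, 29.76, 'N')   # ~30.2416
lon = dms_to_decimal(120, 11, 12.48, 'E')  # ~120.1868
```

The catch, as the comment notes, is getting an original file: most social platforms strip the GPS tags on upload.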
Can you unblock nonZionism? You had previously said that he would be blocked for one month and he has served his time (and then some). Thank you!
Another useful feature substack should add: timeouts.
I wonder if Scott still reads all the comments; if not, will he see this one?
Might have to keep posting it
What is a timeout?
I presume it would be a way for Scott to ban someone for a period of time but not have to remember to unban them.
Yeah, that
Do you remember his exact account? I can't find it under that name.
https://substack.com/@mascilbinah thank you very much sir
Hmm. The Chimp in the tree cannot comprehend what he is looking at. He might just ignore it (unless it were so close he couldn't) because it's that far out of his experience zone. It's not even magic, it's a "Somebody Else's Problem" field, it's so far out of context.
I once predicted to a friend that we will know the aliens have arrived because something will happen that has no plausible explanation. We all wake up, and Jupiter is gone. They will be so far advanced compared to us, they might as well be gods.
AI isn't that. I understand exactly what it is doing--it's guessing locations based on photos. I don't know *how* it is doing that, but I understand perfectly *what* it is doing. The chimp in the tree can't even process what a photo safari is, but I grok geoguessing.
One day we will all wake up and the internet will be gone, erased from every server simultaneously. We won't know who, how or why, but it's done. That's how we will know that AGI has arrived.
>If Rainbolt’s skill is anywhere near what it looks like in this video, I don’t think the takeaway is “don’t worry about AI after all”, it’s “Trevor Rainbolt is as far beyond the rest of us as a helicopter engineer is to a chimp, and if you didn’t predict it was possible for a human to guess the location of a picture of blue sky, then you’re going to be extra-double-surprised by whatever superintelligence can do”.
Conservation of expected evidence always seems to go out the window in AI-risk discussions. Are we supposed to think that if Rainbolt had been clearly outperformed by o3, then you'd have considered that a reason to worry *less* about what ASI could do?
Scott is an AI evangelist. He is going to spin anything and its opposite in support of that.
For the record, I find comments like these mean and obnoxious, if you made them about anyone else I would ban them, and I'm only not banning them for me because it would look bad.
- I do admit to the comment being mean.
- I do find the comment necessary as a reply to the previous commenter, to offer an explanation for why you found it necessary to hitch ASI to Rainbolt's above-average, even top-level, but evidently not super-human GeoGuessr skills.
- I do believe the comment to be true:
- - You are an official spokesperson for AI 2027[1]. You are convinced of their timeline and the expectation of it happening is not in question to you. You write in support of that belief, and you want people to adopt that belief[2]. While you do have more insight into the work of the AI 2027 researchers than most, you do not count yourself among them[3]. You are a "true believer" as far as AI 2027 is concerned: by your own characterization a believer and propagator, not a creator of knowledge. Hence my choice of "evangelizing".
- - "Spinning" something does not necessarily mean lying about it, and I did not intend to accuse you of the latter. I meant it to mean "[arguing] circles around the average uneducated person"[4].
Therefore I believe my comment to score 2 out of 3, which would pass your comment rule[5]. Please explain where you disagree with my scoring or whether you have new rules for ACX that I then apparently missed.
[1] You're getting second billing on the AI 2027 homepage: https://web.archive.org/web/20250507015025/https://ai-2027.com/
[2] "[..] I’m hoping that my celebrity will get people to pay attention to what Daniel‘s saying. Think of it as 'International aid expert discusses the Ethiopian famine with concerned Hollywood actor,' with me in the role of the actor, and you won’t be disappointed." https://www.astralcodexten.com/p/introducing-ai-2027
[3] "I can’t take credit for the forecast itself - or even for the lion’s share of the writing and publicity - but it’s been an immense privilege to work alongside some of the smartest and most epistemically virtuous people I know, trying to absorb their worldview on a level deep enough to do it justice." https://www.astralcodexten.com/p/introducing-ai-2027
[4] https://slatestarcodex.com/2019/06/03/repost-epistemic-learned-helplessness/
[5] https://slatestarcodex.com/2014/03/02/the-comment-policy-is-victorian-sufi-buddha-lite/
That is not enough to make your comment actually pass the "true" test. That Scott would spin "anything and its opposite" into supporting AI danger is both controversial and highly opinionated.
"If you want to say something that might not be true – anything controversial, speculative, or highly opinionated – then you had better make sure it is both kind and necessary."
https://slatestarcodex.com/2014/03/02/the-comment-policy-is-victorian-sufi-buddha-lite/
I said nothing about Scott or AI 2027 "supporting AI danger". I said that Scott and AI 2027 are supporting the idea that super-human AI is imminent.
If the *only* datapoint you had was whether the AI is better or worse than Rainbolt, you would be right. But if you learned something about Rainbolt's capabilities in an absolute sense, then you have more information than that.
We might well have learned that what Rainbolt does was only averagely impressive, and that the AI was *still worse* - in which case you're not in chimp helicopter land. It is therefore consistent for "AI is a bit worse than Rainbolt" to be weak evidence against AI capability... while "AI is a bit worse than Rainbolt, but Rainbolt is ludicrously good" can still be moderate evidence for AI capability.
TL;DR your statement is true in a one variable problem, but we have multivariate evidence.
The key question I tried to address in the original post was whether we could easily judge the borders of impossibility. If both Rainbolt and the AI were able to do good jobs on easily identifiable pictures, but couldn't figure out difficult ones better than chance, that would be evidence against the hypothesis that superintelligence could do seemingly impossible things.
My favourite Rainbolt moment is: “Guys, this is a Mongolian camera. The pixels and saturation are a Mongolian camera.”
3:19 in this video: https://youtu.be/FvVGfTkWqzg?si=mfZWs_xAAanh9HQl
I mentioned this on the original post, but if we were going to actually link some Rainbolt videos, I'm surprised we didn't go with the recent one where he just straight up does an OSINT head-to-head against o3 and spanks it pretty hard.
The GeoGuessr prompt that Kelsey Piper developed was a revelation. It's obviously a very good prompt because it describes in detail how an intelligent human might reason their way through GeoGuessr: here are the things to notice, here are the reasoning steps.
But that's an AlphaGo way of thinking: encouraging the AI to think like an expert human. If we're going to get to the helicopter moment, don't we want the AlphaZero way of thinking? I.e., the model should be coming up with its own GeoGuessr strategies that a human would never even think of (or that a human couldn't reliably reproduce).
If the models are indeed being RL trained on GeoGuessr, then you might expect them to have come up with their own strategies, and you might also expect that giving them a prompt with human reasoning steps wouldn't significantly improve their performance.
Rainbolt has done several videos in which he directly competes against AI, if you want a more direct comparison of their skill.
After reading the first article, I screenshotted a video of a Canadian youtuber I watch walking around outside, and ChatGPT guessed it down to the exact intersection in Toronto, Ontario. I became and remain amazed.
The overarching argument is essentially: o3 can do amazing things at GeoGuessr, therefore superintelligence could potentially do amazing things that we might not expect. The second statement might be true, but the link to GeoGuessr is extremely tenuous.
This is exactly the sort of task we should expect large transformer models to be good at - finding statistical patterns in enormous amounts of data. That's how they are trained. This tells us nothing about their ability to create engineering solutions to problems that have never been seen previously, or do more exotic, seemingly magical things like creating self-assembling nanobots. That's a totally different skillset.
"OpenAI is trying to grab every data source they can for training. Data sources work for AIs if they are hard to do, easy to check, can be repeated at massive scale, and teach some kind of transferrable reasoning skill. GeoGuessr certainly counts. "
I'm not sure I understand this part. Are you saying that AIs are good at GeoGuessr because they have seen and "memorized" photographs of every place on Earth down to WhatThreeWords level, and are pattern-matching on that library?
Or that the training set let it "extract" a set of parameters about Earth geography, like soil, vegetation, etc., and it analyzes a photo for those "variables" and matches them with learned "facts", i.e., where on Earth those things are found?
I'm reminded of the killer in Thomas Harris' "Red Dragon", who was able to stalk his victims and case their houses because he worked at the film processing lab in St. Louis where families sent their 8-mm home movies to be developed.
Quote: "Families were mailing him their 'applications' every day!"