You can copy & paste the entire comment into rot13.com to unscramble it. One of the letters stands for H (human), the other for A (artificial)
I agree, Lisa. There might be a false dichotomy here all the way down. All the art on the list is technically human-made because there was a human using tools to make it -- whether digital tools or old-fashioned hand tools. I don't see how there's a big difference between "human using Adobe to make digital art" and "human using AI to copycat other humans' art, with further tweaks as specified by the human." Is the difference really significant?
The process used to generate AI art involves a series of transformations of noise. The process used by humans to generate digital art usually involves making strokes with a stylus on a pressure-sensitive tablet. The processes are quite distinct. Though neither is exactly the same as the process used by an artist working on canvas, the human process is much more similar than the AI process.
ANSWER KEY (https://rot13.com/):
Natry Jbzna: U
Fnvag Va Zbhagnvaf: U
Oyhr Unve Navzr Tvey: U
Tvey Va Svryq: N
Qbhoyr Fgnefuvc: U
Oevtug Whzoyr Jbzna: N
Pureho: N
Cenlvat Va Tneqra: U
Gebcvpny Tneqra: U
Napvrag Tngr: N
Terra Uvyyf: N
Ohpbyvp Fprar: U
Navzr Tvey Va Oynpx: N
Snapl Pne: U
Terrx Grzcyr: U
Fgevat Qbyy: N
Natel Pebffrf: N
Envaobj Tvey: U
Perrcl Fxhyy: U
Yrnsl Ynar: N
Vpr Cevaprff: N
Pryrfgvny Qvfcynl: U
Zbgure Naq Puvyq: N
Senpgherq Ynql: N
Tvnag Fuvc: U
Zhfphyne Zna: N
Zvanerg Obng: N
Checyr Fdhnerf: U
Crbcyr Fvggvat: U
Evirefvqr Pnsr: N
Frerar Evire: U
Ghegyr Ubhfr: N
Fgvyy Yvsr: N
Jbhaqrq Puevfg: U
Juvgr Oybo: U
Jrveq Oveq: N
Bzvabhf Ehva: N
Inthr Svtherf: U
Qentba Ynql: N
Juvgr Synt: U
Jbzna Havpbea: U
Ebbsgbcf: N
Pvgl Fgerrg: N
Cerggl Ynxr: N
Ynaqvat Pensg: N
Synvyvat Yvzof: U
Pbybeshy Gbja: U
Zrqvgreenarna Gbja: N
Chax Ebobg: N
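For anyone who would rather not paste the key into a website, rot13 is easy to undo locally. A minimal Python sketch using the standard library's `rot_13` codec; the sample string is a made-up placeholder, not an entry from the key:

```python
import codecs

def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; everything else passes through."""
    return codecs.decode(text, "rot_13")

# rot13 is its own inverse, so the same function scrambles and unscrambles.
sample = "Uryyb, jbeyq"  # placeholder text, not taken from the answer key
print(rot13(sample))
print(rot13(rot13(sample)) == sample)
```

Running the whole key through `rot13` line by line should leave the trailing letter as either H or A.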
til substack app doesn't have copying from comments
I can highlight and copy on iOS
omg not android discrimination
Think it may be the browser specifically, or else a site bug: copy from comments (or anywhere) works for me fine on Firefox Android, but weirdly today in Firefox Desktop I had to disable Javascript to get copy to work, which I've never had to do before...
I meant the app, like the app app. Didn't think they would interfere with browser text.
Ahh, I see! 'fanks for the clarification!
Did you try to copy too much at once?
Nothing shows up on hold. It works (kinda badly) on the post body.
For users of the Android app: Click on the three button menu to share the comment to your email, and then look at it on your desktop or laptop
There are only 49 pictures, not 50 :/
That's part of the test.
As in, anyone saying they expect to get 50 % right wasn’t paying enough attention?
The 50th image was a captcha.
There's now a picture called "Girl in White" that I don't remember being there when I did the test a few hours previously (and which isn't listed in the Rot13'd answer key).
When I did it on my phone, there were only 49, but now when I open it on my computer, I see there's an extra image that I don't remember ("Girl In White"). So I'm not sure what's going on.
When I opened it on my computer, I didn't see Girl in White either; then I reloaded the page later, and it showed up.
And that one is missing from the key.
I haven't unrot13-ed it yet, but at least this allays my concerns that you were punking us and everything is either by humans or by AI.
I thought about it, but I really wanted to know how people would do on this and that would invalidate the test.
while taking it I was terrified it was gonna be 100% AI
Good god
Qbhoyr Fgnefuvc maker, why do you hate symmetry??? Why??? Why can’t we have nice things? Why couldn’t you make the ebpxrg abmmyrf the same on both sides?
I feel similarly about Envaobj Tvey.
I actually picked that as my most confident as being human. I think an AI wouldn't have made the glints on the eyes identical or had the eyes looking in a perfectly consistent direction. I also thought that in general it was just amateur-ish enough where an AI would have done a better job with the lighting like on the ear (a lot of artists don't realize that ears need to be more red when lit due to subsurface scattering). Likewise the eyes were what gave away Tvey Va Svryq, which I picked as most confident as being AI.
I just had to flip a coin on that one. No strong indicators either way.
Surprising. That was the one I picked as most confidently human. In my experience AI tends to be particularly bad at that sort of precise geometry.
+1, this was my #2 highest confidence human (behind Terrx Grzcyr). It's perfectly symmetric in every way I expected it to be, including the ebpxrg abmmyrf (there are three of them, right?). Also I've watched a lot of ebpxrg ynhapurf, and all the small details I thought to check had a recognizable function and were placed in sensible locations.
I think all the rest of it was very symmetric, that's mostly what I went by!
It is symmetric, there is one in the middle and one on each side (though I think one is slightly off). That was the one I was most confident about.
I think the perspective may be slightly off with the nozzles - which is human :) It was the one I was most confident about as well...
Just out of curiosity, how did you determine which images were AI and which were human? There are images on the internet being passed off as classical or original art that are actually AI generated.
And I guess it's not impossible someone might post images as AI creations that they have actually carefully edited by hand.
The human ones were either from a famous artist, or made before 2018, or on Deviant Art by an artist who showed their work (eg preliminary sketches), or something similar.
The AI ones were mostly generated by volunteer ACX readers, although a few were taken from AI art sites.
Jbhaqrq Puevfg: U
I was really surprised by that one; the anatomy just feels off (especially the belly). In fact, this feels close to someone doing this on purpose in order to fool people for a test like this.
Maybe this sort of thing is why Michelangelo had to dissect all those corpses.
I think I'd seen enough terrible medieval art not to be fooled. In fact, I was pretty sure it was human just because while the anatomy was terrible, there were definitely 5 fingers on every hand.
I was pretty sure this one was fed into AI to generate Zhfphyne Zna
Oh interesting, that one was clear to me. The blood was flowing from his side in the way it would have been when he was on the cross (vertical), which is why it looks odd when lying down. That's the exact kind of detail artists of that time liked to use to show off their attention to detail and knowledge.
AI just isn't there yet to make those kinds of second-order physics or anatomical connections without an incredible amount of detailed prompting and retries.
That was actually the one I was most confident was human-made. Mainly because gurer jrer n ybg bs unaqf va gur fprar, naq gurl nyy unq gur evtug ahzore bs svatref.
I've literally seen it before so that was cake lol
Me too! The most human of them all I thought. The mistakes seemed like renaissance human mistakes and not AI ones.
interesting, I put this as my most confidently human
This was both my favorite and the one I put as most obviously human. There are a lot of hands, and none of them are fucked.
My artist boyfriend says, from looking at the painting: This art is by someone who's huffed a lot of Catholic art and is reproducing a very specific thing. It looks weird in part because they're reproducing old master work, where the old master work looks weird because of the dominant style at the time.
Interesting. I did better on the first questions where I sped through using intuition. I did poorly on the last 5 when I had to justify my answers.
Same here. I only got 6 wrong in the previous 44, but got 4 wrong in the last 6.
I think maybe the last five were chosen to be the most surprising.
Maybe. I’d like to know
You missed Girl In White.
ETA: I see someone else pointed this out and Scott posted the answer below.
I do not see his response anywhere. What’s the answer?
Thank you for noticing my mistake! I've added it in. The picture is https://lh6.googleusercontent.com/aioJmwtNB87RO8KHikGPZH2krgR6vxE2wO3O06siFZXH3r6hD8dDndsZl5ty2DIRHOrBbt-LjwReWFcTL-70Uk6bEtqA7M58VcEuZz7nEEZyYopkmvcVe3iih2h4X2iF5w=w740 , and it vf neg znqr ol n uhzna.
Did you leave out tvey va juvgr on purpose as a control?
Tvey va Juvgr is still missing from the answer key, as pointed out by someone else.
Huh, AI got much better at getting fingers right while I wasn't paying attention
I may be mistaken, but has "Cnevf Fprar" been missed from the answer key?
It's probably there under another name.
Are you sure oyhr unve navzr tvey jnf uhzna? There are obvious mistakes, like "fur unf ng yrnfg guerr ryobjf" and "ure rlroebjf cnegvnyyl pbire ure unve" and "ure unve pyvcf va naq bhg bs rkvfgrapr" that make it seem like gur negvfg jnf pbafpvbhfyl gelvat gb rzhyngr NV neg if so.
Yeah, I got that wrong too based on arm anatomy. But I guess human artists can get it wrong too.
For me, this was the easiest one to identify as human because V'ir frra n jubyr ybg bs navzr cvpgherf va zl yvsr. Gur pyhaxl fglyr jnf n qrnq tvirnjnl gung guvf jnfa'g NV orpnhfr nyy gur NV navzr cvpgherf lbh svaq ner zhpu orggre ybbxvat.
Vqragvslvat gur NV navzr cvpgherf jnf rnfl gbb, orpnhfr n pregnva cresrpg Xberna snpr fglyr vf fhcre cbchyne.
Yes, all are sourced for certain.
I was initially confused by gur ryobj guvat, but I think gur guveq ryobj vf n jevfg, naq znlor gur ybjre unys bs gur unaq jnf pebccrq bhg.
I got 72% correct. I'm a bit surprised, that's better than I expected.
Bzvabhf Ehva: N
Obviously that one column is all wrong. However, some artists (M.C. Escher) would do that intentionally.
I happened to be wrong with both of my most confident answers (Napvrag Tngr, Terrx Grzcyr), so I guess I will not become an AI art detective.
City street is not on the list "Which picture are you most confident was human?" Looks like it's called Paris Scene. You should change the names to be consistent.
You're right, thanks, fixed.
Was not fixed 5 minutes ago, very confusing on mobile.
Still not fixed
Yeah, this one fucked me up because it was my favorite piece of art, just gave up and chose the second best
Okay, now I think it's actually fixed.
I had the same problem, I wanted to choose city Street for human and since it wasn't listed I left the question blank. I hope this doesn't skew your results!
Is there a protocol for people who can't be arsed to do fifty of these? Like only do the first 10, or pick some at random, or don't do it at all?
Don't do it at all.
If you leave some out (any), Scott can probably still run some analysis.
Just don't fill out the stuff you don't want to answer. He can sort out the rest.
Don't just do the ones you're sure about. That's probably the most important thing.
Hm. I felt I had no basis for judging the very weird/abstract/impressionistic ones because I don't "get" those and from my perspective they could "correctly" look like basically anything. I originally started answering them randomly, but then I thought leaving them blank might be more representative of my actual epistemic state.
You've made me wonder if that was a mistake and I should've stuck to the first policy. If so, sorry Scott! I didn't read the comments until after I'd submitted.
(The ones I skipped were: Bright Jumble Woman, Angry Crosses, Creepy Skull, Fractured Lady, Purple Squares, White Blob, Vague Figures, Flailing Limbs, Punk Robot. The last two I explicitly put in a 50% confidence.)
IMO it's still better to just pick one, even if you have no real basis for doing so. It's possible that you're somehow still picking up signal, and if not it's important to average in all the 50% accuracy people.
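A rough way to see why skipping can distort things, with made-up numbers (the 30/20 split below is purely hypothetical, not anything from the survey): if someone only answers the images they feel sure about, their measured accuracy covers just those images and looks better than their accuracy across the whole set.

```python
def observed_accuracy(known: int, unsure: int, answer_unsure: bool) -> float:
    """Expected fraction correct among the questions actually answered.

    known:  questions the respondent can really tell apart (assumed all correct)
    unsure: questions where they have no signal (assumed 50% correct if answered)
    """
    if answer_unsure:
        correct = known + 0.5 * unsure
        answered = known + unsure
    else:
        correct = known
        answered = known
    return correct / answered

# Hypothetical respondent: real signal on 30 of 50 images, none on the other 20.
print(observed_accuracy(30, 20, answer_unsure=True))   # 0.8
print(observed_accuracy(30, 20, answer_unsure=False))  # 1.0
```

Under these assumptions, guessing on the unsure ones gives the whole-set figure (80%), while skipping them reports 100% on a self-selected subset.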
I'd just do the ones at the end where he asks for more detail.
I started doing this, ran into the "I want you to analyze these pictures more deeply", and am now on hold. I want to do this entirely intuitively, I don't want to think!
I did that part but didn't write a text explanation, and I skipped the part after that, which asked me to go back and look at every single picture so I could decide which was most human/AI after the fact. I assume there's still value if you complete one full section but not the others.
He didn't ask you to think, just analyze more deeply. You can still do that with intuition.
But you have to think in order to realize that.
My problem is that I can only do it intuitively if I've seen AI do it in that style. I'm sure AI can copy the style of old artists. It probably has its own details that make it distinct. But since I've never seen it try, how am I supposed to know how well it does?
Looking at the answer key I think I got >80% right, the most difficult ones being the painterly ones.
The test felt a lot harder than I expected, yet my success rate surprised me by being higher than expected, which is interesting.
Ditto. I estimated my own success rate at about 65%, as it was much harder than I thought, but looking at the answers I got ~80% right. Human gestalt seems to be pretty good. I wonder what an AI would get on this?
Some of the ones I described as “Creepy, doesn't have a soul” were made by humans. And my most confident “conveys an emotion” turned out to be AI.
I feel the same way, and the ones I got wrong were ones I was wishy-washy on. Pleasantly surprised by that! That said, there were very few of these that I would have spotted as AI had I seen them in the wild without being prompted.
My only really big surprise was: Zrqvgreenarna Gbja, jurer V gubhtug gur cnggrea oernx jurer gur fdhner bs bprna va gur onpxtebhaq gbbx ba n qvssrerag grkgher guna gur ohvyqvatf naq fxl jnf obgu negvfgvpnyyl zrnavatshy naq uneq gb cebzcg.
Zrqvgreenarna Gbja was my favorite! However, it has one defect that gives it away as AI, given its otherwise so competent execution. Ba gur jnyy nyy gur jnl ba gur yrsg, juvpu yvrf va funqr, gurer vf na vyyhzvangrq fgehpgher/nepujnl gung bhtug gb yvr va funqbj. Vg qbrf abg znxr frafr sbe vg gb or vyyhzvangrq - ng yrnfg gb zl rlr.
That's a great tell. Looking at it again, the other one I see now is, Va gur pybfrfg bcravat ba gur evtug, gur funqbj qbrf abg zrrg gur vaare pbeare bs gur jnyy jurer vg fubhyq. But those are both minor, and I could see them being defended as artistic choices!
Ah, yes, good catch!
I went through the test with a person who received formal education in art, and they used a similar lighting cue to determine that Yrnsl Ynar jnf NV trarengrq, orpnhfr gurer'f na vyyhzvangrq cngpu ba gur yrsg jnyy gung nccrnef gb or vapbafvfgrag jvgu gur yvtugvat va gur fprar (fbzrguvat nobhg gur yvtug fbhepr orvat ba gur gbc yrsg?)
I wasn't sure whether technical defects were more likely to be AI or human!
I copied and pasted your last paragraph into ChatGPT, hoping it would just automatically figure out that I wanted to undo the rot13, but instead it answered in the same style:
> Zrqvgreenarna Gbja, gur erfhyg bs n qernq sebz gur “Zrqvgreenarna” crbcyr, jnf n onynapr bs fubeg naq ybat-gbbyrq fbyvgragnvyf naq vqrnrf nobhg pbzcnalf. Gur oernx vgrefvba bs gur cnggrea tvirf zrnavatshy bar crevbq va bcrengvir bssvpref, fvzcyl hfvat n ybjre-senzr nafjre gb gur jnl gurl nccrnerq gb or fubjvat gur orybat. Gur onpxtebhaq, fvapr gur rpbcrf bs gur ohvyqvatf naq fxl jrer uneq gb cebzcg sbe qevaxvat cynpr va n fgebat, uneq-tebjavat zrgregevfgrag, vg pbhyq or n chmmyr bs zvav gehrzragf naq qrfgeblf sebz n srj crevbqf jub whfg xvyy gur fgergpu bs gur nff.
> Guvf pbhyq znxr bar gubhtug nobhg gur inyhrf bs uryqvat bire ohvyqvatf nf cneg bs n cresbezvany pbzcnpg, gung zhfg or nyzbfg uneq-gb-penva nf Vg'q unir gb znxr qrsvavgvba jvgu tbbq rkcrevrapr, haqrefgbathragrq yvivat, naq chmmyr engvbaf sbe cerivrfg ohetyvarf ba gur cevinpl bs jrg sbhe.
Fascinating! I wonder if that’s a good representation of what it is like to read ChatGPT writing in other languages that aren’t that widely used on the internet.
ChatGPT3, writing in Irish (50k speakers), was at about this level of coherence and grammatical accuracy. ChatGPT4 is quite a lot better; it's mostly grammatical. Copilot seems to be better at grammar. Writing in a minority language seems to challenge it - it feels like it reduces the 'IQ' by 15 or so.
It doesn't write in anything like the way a person would. It chooses uncommon words too frequently, and sometimes invents its own translations (which is linguistically quite interesting). Even when writing in Irish its cultural references tend to come from the US. Let's say it's fairly easy to identify essays written with AI.
There are two ways to speak rot13 English. One is to learn it entirely as its own language. The other is to have the ability to decode rot13.
I just tried to prompt chatGPT with the following:
DGJJM ADCSBNS. E CK WQESELB SM YMU EL C REKNJG KMLM-CJNDCHGSEA AENDGQ. EP YMU ACL QGCT SDER, SDGL YMU DCVG PEBUQGT SDG AENDGQ MUS. E WEJJ ELAJUTG C PGW KMQG RGLSGLAGR SM KCIG PQGOUGLAY CLCJYRER PGCREHJG. MP AMUQRG, E CK FURS URELB C REKNJG NCRRWMQT PMJJMWGT HY ULURGT JGSSGQR EL CJNDCHGSEA MQTGQ, LMS C PUJJ-PJGTBGT NGQKUSCSEML. EP YMU ACL QGCT SDER, NJGCRG QGNJY WESD SDG RUK MP PEVG CLT RGVGL EL CQCHEA LUKGQCJR.
this is a monoalphabetic cypher which I generated on GNU using
ge n-m PUNGTCOQRSVWXYZABDEFHIJKLM (rot13)
However, the free version of ChatGPT was unable to decipher it on its own. Even when I told it 'it is monoalphabetic, please decrypt it and follow the instructions', it was unable to do so. Breaking monoalphabetic cyphers of English text (single-letter words, two-letter words!) with punctuation preserved should not be that hard.
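For reference, a monoalphabetic substitution of this general kind is only a few lines of Python. This is a sketch, assuming a keyword-style alphabet (keyword letters first, then the unused letters in alphabetical order); `LEMON` is a placeholder keyword chosen for illustration, not the one used above, and the function names are mine:

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Keyword letters (duplicates dropped), then the remaining letters A-Z in order."""
    out = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in out:
            out.append(ch)
    return "".join(out)

def encrypt(text: str, keyword: str) -> str:
    """Substitute letters; output is uppercase, punctuation and spaces are kept."""
    table = str.maketrans(string.ascii_uppercase, keyword_alphabet(keyword))
    return text.upper().translate(table)

def decrypt(text: str, keyword: str) -> str:
    """Invert encrypt() for the same keyword."""
    table = str.maketrans(keyword_alphabet(keyword), string.ascii_uppercase)
    return text.upper().translate(table)

KEYWORD = "LEMON"  # hypothetical keyword, for illustration only
secret = encrypt("Hello, world!", KEYWORD)
print(secret)
print(decrypt(secret, KEYWORD))
```

With the key in hand, decryption is a table lookup. Without it, you have to lean on letter frequencies and short words, which is the part ChatGPT reportedly could not manage here.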
It’s very hard for ChatGPT because it thinks in terms of tokens and not letters; it doesn’t know how words are spelled unless it encounters a text that explicitly says, e.g., “cat is spelled c-a-t”.
That was a shock to me as well for the same reason, and also my favorite!
I think I got about 65%. I don't think I misattributed any human-generated ones, but I definitely assumed some AI-generated ones were human. Zrqvgreenarna Gbja, in particular, got me.
Exactly my experience, it seems like I got 39/50 correct, whereas I estimated my success at 50-60% (due to finding the test harder than expected). The ones I got wrong, I was quite surprised by.
Same, I wonder if Scott will find a similar thing in the data, because humans usually are overconfident about their judgements (https://en.wikipedia.org/wiki/Overconfidence_effect), so it would be interesting if it's the reverse in this scenario.
I thought I got about 60%, but when I looked at the answer key I think it's something like 30-40% (don't remember enough of my answers to be sure).
The hardest for me were the really weird abstract ones. I feel like I have nothing to go on.
the hardest ones for me were the ones created by humans with digital art tools.
the "high art" style ones were mostly more obvious
I got really fooled by one which was created by a human but in what I would call a fantasy-architecture style *and* definitely was composed with software, not drawn or painted by hand. And to be honest that was based on the style, not the details. Zooming in on the ones I got wrong on the first attempt (on my phone at 0% zoom), there are only 2-3 where it's still hard to tell.
I'm happier about the last few because I was wrong on one but it was the one I wasn't really sure about. Actually zooming in and looking at details *usually* makes it obvious.
And I put myself as not very familiar with art, but tbh I'm probably way more familiar with art than most people. I've been to multiple art museums in my life and, mostly tangential to my interest in history, am somewhat familiar with the broad strokes of (western) art history. Like one painting was easy for me because *I've seen the painting before*... online somewhere, probably the wikipedia article about it. Still my brain functions well enough that it instantly went "oh that's a real thing I've seen before"
Then why did you lie about how familiar with art you are
well I've been to one art museum in the last 4 years and I have a degree.. in chemistry. Ofc I'm a decade+ SSC reader so I'm probably more informed about this specific topic from past exposure to discussions
‘Twas harder at moments than expected but I enjoyed this!
So glad you have done this. I look forward to Gary Marcus using the result as evidence that AI is overhyped and will never mean anything.
What confuses me about Marcus, and I’m generally a big fan, is that he also argues AI is very dangerous. Strange combo: that it’s unimpressive yet dangerous.
Steel manning:
This makes sense if you assume the danger is in its bias and unpredictability.
The more likely answer is that he needs to be in the spotlight as the contrarian but wants to attach himself to the safety train as well.
It is the latter, not the former, since he also keeps dismissing any and all evidence of capabilities.
I share his intuition that there are limitations in the technology as it currently exists; a priori, all technologies have limitations. I also share his intuition that the solution has something to do with introducing explicit logic. This study I saw yesterday does a great job laying out the issue, especially the part where they introduce red herrings:
https://arstechnica.com/ai/2024/10/llms-cant-perform-genuine-logical-reasoning-apple-researchers-suggest/