I agree, Lisa. There might be a false dichotomy here all the way down. All the art on the list is technically human-made because there was a human using tools to make it -- whether digital tools or old-fashioned hand tools. I don't see how there's a big difference between "human using Adobe to make digital art" and "human using AI to copycat other humans' art, with further tweaks as specified by the human." Is the difference really significant?
The process used to generate AI art involves a series of transformations of noise. The process used by humans to generate digital art usually involves making strokes with a stylus on a pressure-sensitive tablet. The processes are quite distinct. Though neither is exactly the same as the process used by an artist working on canvas, the human process is much more similar than the AI process.
Think it may be the browser specifically, or else a site bug: copy from comments (or anywhere) works for me fine on Firefox Android, but weirdly today in Firefox Desktop I had to disable Javascript to get copy to work, which I've never had to do before...
There's now a picture called "Girl in White" that I don't remember being there when I did the test a few hours previously (and which isn't listed in the Rot13'd answer key).
When I did it on my phone, there were only 49, but now when I open it on my computer, I see there's an extra image that I don't remember ("Girl In White"). So I'm not sure what's going on.
Qbhoyr Fgnefuvc maker, why do you hate symmetry??? Why??? Why can’t we have nice things? Why couldn’t you make the ebpxrg abmmyrf the same on both sides?
I actually picked that as my most confident as being human. I think an AI wouldn't have made the glints on the eyes identical or had the eyes looking in a perfectly consistent direction. I also thought that in general it was just amateurish enough that an AI would have done a better job with the lighting, like on the ear (a lot of artists don't realize that ears need to be more red when lit, due to subsurface scattering). Likewise the eyes were what gave away Tvey Va Svryq, which I picked as most confident as being AI.
+1, this was my #2 highest confidence human (behind Terrx Grzcyr). It's perfectly symmetric in every way I expected it to be, including the ebpxrg abmmyrf (there are three of them, right?). Also I've watched a lot of ebpxrg ynhapurf, and all the small details I thought to check had a recognizable function and were placed in sensible locations.
Just out of curiosity, how did you determine which images were AI and which were human? There are images on the internet being passed off as classical or original art that are actually AI generated.
And I guess it's not impossible someone might post images as AI creations that they have actually carefully edited by hand.
The human ones were either from a famous artist, or made before 2018, or on Deviant Art by an artist who showed their work (eg preliminary sketches), or something similar.
The AI ones were mostly generated by volunteer ACX readers, although a few were taken from AI art sites.
I was really surprised by that one; the anatomy just feels off (especially the belly). In fact, this feels close to someone doing this on purpose in order to fool people for a test like this.
I think I'd seen enough terrible medieval art not to be fooled. In fact, I was pretty sure it was human just because while the anatomy was terrible, there were definitely 5 fingers on every hand.
Oh interesting, that one was clear to me. The blood was flowing from his side in the way it would have been when he was on the cross (vertical), which is why it looks odd when lying down. That's the exact kind of detail artists of that time liked to use to show off their attention to detail and knowledge.
AI just isn't there yet to make those kinds of second-order physics or anatomical connections without an incredible amount of detailed prompting and retries.
That was actually the one I was most confident was human-made. Mainly because gurer jrer n ybg bs unaqf va gur fprar, naq gurl nyy unq gur evtug ahzore bs svatref.
My artist boyfriend says, from looking at the painting: This art is by someone who's huffed a lot of Catholic art and is reproducing a very specific thing. It looks weird in part because they're reproducing old master work, where the old master work looks weird because of the dominant style at the time.
Are you sure oyhr unve navzr tvey jnf uhzna? There are obvious mistakes, like "fur unf ng yrnfg guerr ryobjf" and "ure rlroebjf cnegvnyyl pbire ure unve" and "ure unve pyvcf va naq bhg bs rkvfgrapr" that make it seem like gur negvfg jnf pbafpvbhfyl gelvat gb rzhyngr NV neg if so.
For me, this was the easiest one to identify as human because V'ir frra n jubyr ybg bs navzr cvpgherf va zl yvsr. Gur pyhaxl fglyr jnf n qrnq tvirnjnl gung guvf jnfa'g NV orpnhfr nyy gur NV navzr cvpgherf lbh svaq ner zhpu orggre ybbxvat.
Vqragvslvat gur NV navzr cvpgherf jnf rnfl gbb, orpnhfr n pregnva cresrpg Xberna snpr fglyr vf fhcre cbchyne.
"City Street" is not on the list for "Which picture are you most confident was human?" It looks like it's called "Paris Scene" there. You should change the names to be consistent.
I had the same problem. I wanted to choose City Street for human, and since it wasn't listed I left the question blank. I hope this doesn't skew your results!
Hm. I felt I had no basis for judging the very weird/abstract/impressionistic ones because I don't "get" those and from my perspective they could "correctly" look like basically anything. I originally started answering them randomly, but then I thought leaving them blank might be more representative of my actual epistemic state.
You've made me wonder if that was a mistake and I should've stuck to the first policy. If so, sorry Scott! I didn't read the comments until after I'd submitted.
(The ones I skipped were: Bright Jumble Woman, Angry Crosses, Creepy Skull, Fractured Lady, Purple Squares, White Blob, Vague Figures, Flailing Limbs, Punk Robot. The last two I explicitly put in a 50% confidence.)
IMO it's still better to just pick one, even if you have no real basis for doing so. It's possible that you're somehow still picking up signal, and if not it's important to average in all the 50% accuracy people.
I started doing this, ran into the "I want you to analyze these pictures more deeply", and am now on hold. I want to do this entirely intuitively, I don't want to think!
I did that part but didn't write a text explanation, and I skipped the part after that which asked me to go back and look at every single picture so I could decide which was most human/AI after the fact. I assume there's still value if you complete a full section but not the other ones.
My problem is that I can only do it intuitively if I've seen AI do it in that style. I'm sure AI can copy the style of old artists. It probably has its own details that make it distinct. But since I've never seen it try, how am I supposed to know how well it does?
Ditto. I estimated my own success rate at about 65%, as it was much harder than I thought, but looking at the answers I got ~80% right. Human gestalt seems to be pretty good. I wonder what an AI would get on this?
I feel the same way, and the ones I got wrong were ones I was wishy-washy on. Pleasantly surprised by that! That said, there were very few of these that I would have spotted as AI had I seen them in the wild without being prompted.
My only really big surprise was: Zrqvgreenarna Gbja, jurer V gubhtug gur cnggrea oernx jurer gur fdhner bs bprna va gur onpxtebhaq gbbx ba n qvssrerag grkgher guna gur ohvyqvatf naq fxl jnf obgu negvfgvpnyyl zrnavatshy naq uneq gb cebzcg.
Zrqvgreenarna Gbja was my favorite! However, it has one defect that gives it away as AI, given its otherwise so competent execution. Ba gur jnyy nyy gur jnl ba gur yrsg, juvpu yvrf va funqr, gurer vf na vyyhzvangrq fgehpgher/nepujnl gung bhtug gb yvr va funqbj. Vg qbrf abg znxr frafr sbe vg gb or vyyhzvangrq - ng yrnfg gb zl rlr.
That's a great tell. Looking at it again, the other one I see now is, Va gur pybfrfg bcravat ba gur evtug, gur funqbj qbrf abg zrrg gur vaare pbeare bs gur jnyy jurer vg fubhyq. But those are both minor, and I could see them being defended as artistic choices!
I went through the test with a person who received formal education in art, and they used a similar lighting cue to determine that Yrnsl Ynar jnf NV trarengrq, orpnhfr gurer'f na vyyhzvangrq cngpu ba gur yrsg jnyy gung nccrnef gb or vapbafvfgrag jvgu gur yvtugvat va gur fprar (fbzrguvat nobhg gur yvtug fbhepr orvat ba gur gbc yrsg?)
I copied and pasted your last paragraph into ChatGPT, hoping it would just automatically figure out that I wanted it to undo the rot13, but instead it answered in the same style:
> Zrqvgreenarna Gbja, gur erfhyg bs n qernq sebz gur “Zrqvgreenarna” crbcyr, jnf n onynapr bs fubeg naq ybat-gbbyrq fbyvgragnvyf naq vqrnrf nobhg pbzcnalf. Gur oernx vgrefvba bs gur cnggrea tvirf zrnavatshy bar crevbq va bcrengvir bssvpref, fvzcyl hfvat n ybjre-senzr nafjre gb gur jnl gurl nccrnerq gb or fubjvat gur orybat. Gur onpxtebhaq, fvapr gur rpbcrf bs gur ohvyqvatf naq fxl jrer uneq gb cebzcg sbe qevaxvat cynpr va n fgebat, uneq-tebjavat zrgregevfgrag, vg pbhyq or n chmmyr bs zvav gehrzragf naq qrfgeblf sebz n srj crevbqf jub whfg xvyy gur fgergpu bs gur nff.
> Guvf pbhyq znxr bar gubhtug nobhg gur inyhrf bs uryqvat bire ohvyqvatf nf cneg bs n cresbezvany pbzcnpg, gung zhfg or nyzbfg uneq-gb-penva nf Vg'q unir gb znxr qrsvavgvba jvgu tbbq rkcrevrapr, haqrefgbathragrq yvivat, naq chmmyr engvbaf sbe cerivrfg ohetyvarf ba gur cevinpl bs jrg sbhe.
Fascinating! I wonder if that’s a good representation of what it is like to read ChatGPT writing in other languages that aren’t that widely used on the internet.
ChatGPT-3, writing in Irish (50k speakers), was at about this level of coherence and grammatical accuracy. ChatGPT-4 is quite a lot better; it's mostly grammatical. Copilot seems to be better at grammar. Writing in a minority language seems to challenge it - it feels like it reduces the 'IQ' by 15 or so.
It doesn't write in anything like the way a person would. It chooses uncommon words too frequently, and sometimes invents its own translations (which is linguistically quite interesting). Even when writing in Irish its cultural references tend to come from the US. Let's say it's fairly easy to identify essays written with AI.
There are two ways to speak rot13 English. One is to learn it entirely as its own language. The other is to have the ability to decode rot13.
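For the second route, the decoding is purely mechanical; as an illustration, Python's standard `codecs` module ships a rot13 transform (the sample string here is my own, not one from the thread):

```python
import codecs

# rot13 is its own inverse: applying it twice returns the original text.
decoded = codecs.encode("Uryyb, jbeyq!", "rot_13")
print(decoded)  # -> Hello, world!
print(codecs.encode(decoded, "rot_13"))  # -> Uryyb, jbeyq!
```

The same property is what makes rot13 a convenient spoiler convention: one operation serves as both encoder and decoder.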
I just tried to prompt chatGPT with the following:
DGJJM ADCSBNS. E CK WQESELB SM YMU EL C REKNJG KMLM-CJNDCHGSEA AENDGQ. EP YMU ACL QGCT SDER, SDGL YMU DCVG PEBUQGT SDG AENDGQ MUS. E WEJJ ELAJUTG C PGW KMQG RGLSGLAGR SM KCIG PQGOUGLAY CLCJYRER PGCREHJG. MP AMUQRG, E CK FURS URELB C REKNJG NCRRWMQT PMJJMWGT HY ULURGT JGSSGQR EL CJNDCHGSEA MQTGQ, LMS C PUJJ-PJGTBGT NGQKUSCSEML. EP YMU ACL QGCT SDER, NJGCRG QGNJY WESD SDG RUK MP PEVG CLT RGVGL EL CQCHEA LUKGQCJR.
this is a monoalphabetic cypher which I generated on GNU using
ge n-m PUNGTCOQRSVWXYZABDEFHIJKLM (rot13)
However, the free version of chatGPT was unable to decipher it on its own. Even when I told it 'it is monoalphabetic, please decrypt it and follow the instructions', it was unable to do so. Breaking monoalphabetic cyphers of English text (single-letter words, two letter words!) with punctuation preserved should not be that hard.
it's very hard for ChatGPT because it thinks in terms of tokens and not letters; it doesn't know how words are spelled unless it encounters a text that explicitly says, e.g., "cat is spelled c-a-t"
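For anyone who wants to replicate the experiment without reusing the commenter's (rot13'd) password, here is a sketch of the same kind of keyword-based monoalphabetic cipher in Python; the keyword `ZEBRA` and the helper names are made-up stand-ins of mine:

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Build a 26-letter substitution alphabet: the keyword's unique
    letters first, then the remaining letters in alphabetical order."""
    seen = []
    for ch in (keyword + string.ascii_uppercase).upper():
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def make_cipher(keyword: str):
    """Return (encrypt, decrypt) functions for the keyword cipher.
    Non-letter characters (spaces, punctuation) pass through unchanged."""
    alphabet = keyword_alphabet(keyword)
    enc = str.maketrans(string.ascii_uppercase, alphabet)
    dec = str.maketrans(alphabet, string.ascii_uppercase)
    return (lambda s: s.upper().translate(enc),
            lambda s: s.upper().translate(dec))

encrypt, decrypt = make_cipher("ZEBRA")
print(encrypt("HELLO"))                    # -> FAJJM
print(decrypt(encrypt("ATTACK AT DAWN")))  # -> ATTACK AT DAWN
```

Because punctuation and word boundaries survive the substitution, single-letter and two-letter words give a human (or a frequency-analysis script) plenty of traction, which is what makes the chatbot's failure notable.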
I think I got about 65%. I don't think I misattributed any human-generated ones, but I definitely assumed some AI-generated ones were human. Zrqvgreenarna Gbja, in particular, got me.
Exactly my experience, it seems like I got 39/50 correct, whereas I estimated my success at 50-60% (due to finding the test harder than expected). The ones I got wrong, I was quite surprised by.
Same, I wonder if Scott will find a similar thing in the data, because humans usually are overconfident about their judgements (https://en.wikipedia.org/wiki/Overconfidence_effect ) so it would be interesting if it's the reverse in this scenario.
the hardest ones for me were the ones created by humans with digital art tools.
the "high art" style ones were mostly more obvious
I got really fooled by one which was created by a human but in what I would call a fantasy-architecture style *and* definitely was composed with software, not drawn or painted by hand. And to be honest that was based on the style, not the details. Zooming in on the ones I got wrong on the first attempt (on my phone at 0% zoom), there are only 2-3 where it's still hard to tell.
I'm happier about the last few because I was wrong on one but it was the one I wasn't really sure about. Actually zooming in and looking at details *usually* makes it obvious.
And I put myself as not very familiar with art, but tbh I'm probably way more familiar with art than most people. I've been to multiple art museums in my life and, mostly tangential to my interest in history, am somewhat familiar with the broad strokes of (western) art history. Like one painting was easy for me because *I've seen the painting before*... online somewhere, probably the wikipedia article about it. Still my brain functions well enough that it instantly went "oh that's a real thing I've seen before"
well I've been to one art museum in the last 4 years and I have a degree.. in chemistry. Ofc I'm a decade+ SSC reader so I'm probably more informed about this specific topic from past exposure to discussions
What confuses me about Markus — and I’m generally a big fan — is that he also argues AI is very dangerous. Strange combo: that it’s unimpressive yet dangerous.
I share his intuition that there are limitations in the technology as it currently exists — a priori, all technologies have limitations. I also share his intuition that the solution has something to do with introducing explicit logic. This study I saw yesterday does a great job laying out the issue, especially the part where they introduce red herrings:
I also share his intuition that "this is not that".
But I think he massively misses how this could be the start of that, and also, how impressive it already is!
Humans invented explicit logic, as far as I can tell, out of simple next token prediction, evolutionarily adaptive scaling and incredibly refined input filters (I think we are likely filtering something like 99.9% of potential inputs).
Surely it evolved the other way around no? The simplest one-celled organisms approach food in an “if this, then that” algorithm. Token prediction (or statistical modelling etc) would have come later?
We still have some simple algorithms in the physiology in our brains, like: “if blue light, then wake up.”
I don’t know - and I don’t think anyone knows - how the brain handles explicit logic. So maybe the brain re-evolved new mechanisms to handle logic later. But usually evolution preserves and reuses what it already has.
AI has so obviously passed the Turing test at this point that I think we need a new metric which is more useful or meaningful. I think there is one, something like, “can it produce truly great work.”
Clearly there’s something to this, as a Turing test is useless if we limit the domain to shapes drawn in MS Paint; human and computer become indistinguishable when the skill level involved is approximately zero. And clearly, multiplying two large numbers together is something computers have been better at for almost a century now.
The difficulty is that it requires a belief in value realism which most elites today reject. I suspect we are going to find AI re-opens a bunch of philosophical debates, though, because I think “understand value” is the best compressive mechanism for storing lots of data, and we will soon see LLMs using that kind of approach to compress the set of experiences they’ve encountered even more effectively.
Because many people believe (wrongly) that they could reach professional chess/Go level if they tried hard enough; it comes with the blank-slate/egalitarian model of the world.
On the other hand, convincing fellow humans that you are intelligent enough to be considered human (and even better than most other humans) is something everyone does all the time, at least implicitly. Not necessarily through conversation, and seldom through written conversation, but that's why it is considered a better test than beating humans at formal games. At least, that's why I think it was chosen rather than excelling at logic games or other intellectual subfields. In the end, this is not so different from the political Turing game, where an entity will be considered human/human-like if:
- it insists on it, and not granting its wish would hurt [the deciding part of] humanity more than granting it
- [the deciding part of] humanity will (or could) one day turn into this type of entity, so better be nice to your future self
Preferably both.
My guess is that, in the end, the political Turing test will be the one that matters.
If a million monkeys on typewriters eventually produce Shakespeare, does that mean they produce great work? What success rate or degree of "intent" does it need to have to be deemed as capable?
Nope, you are wrong: apes absolutely are monkeys (specifically, a subtype of Old World monkey) according to cladistics, i.e. the modern classification of animals in biology by evolutionary lineage. They are only not monkeys according to the old, quite arbitrary classification by superficial similarities in characteristics. To showcase how arbitrary that was: in plenty of other languages there was never an "apes are not monkeys" distinction; in my native German, for example, apes were always a subtype of monkey.
Other shocking news from modern biology: birds are dinosaurs, and whales are fish (and so are you, and birds, and all tetrapods; you can't evolve out of a clade!).
Sadly, the public has kind of missed the move to actually logical and objective classification of animals that happened in biology and its subfields in the last decades. Many biologists and zoologists have been working diligently to change that, but old habits die hard.
The selection process is what you should be paying attention to in the million typewriters case. Who is reading over these sheets of paper to isolate Shakespeare-quality writing?
Neither the monkeys nor the human reading over the million sheets of paper is as good at the things that William Shakespeare was good at, but the human can recognize quality.
That is still producing a great work, in my opinion; it's just that people entangle "great work" with "their skill is great and praiseworthy".
AI is also different from the monkeys because it does have a good bit more directionality—it is as if you gave a monkey basic guidelines on sentence structure, style, and a bunch more. It can still easily falter, but the baseline quality level is higher.
> And clearly, multiplying two large numbers together is something computers have been better at for almost a century now.
Multiplying correctly and efficiently, sure, but they’re only now learning to do it wrong, inefficiently and hesitantly in the precise ways humans do it.
Chatbots cannot multiply numbers accurately (though they're getting better). There is no evidence at all that they're doing it the way humans do. The opposite - there's no reason to expect that AI is following the procedures that humans do, that it makes the same mistakes, or that there's anything interpretable going on in what it does.
Most people don’t think this through very well. Current AI chatbots complete their response in a single step, without reviewing. Can you give me the answer to 36x73 without breaking it down into multiple steps? The vast majority of humans only memorise the 10s times tables and then use multi-step processes to apply those memorised facts to double-digit multiplication. If you ask an AI to break down complex multiplication into multiple steps like a human does, it’s substantially more accurate.
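The multi-step procedure described above (memorised single-digit products combined by place value) can be sketched in a few lines of Python; this is a toy illustration of the human algorithm, not anything a chatbot actually runs:

```python
def multiply_stepwise(a: int, b: int) -> int:
    """Multiply via partial products: each pair of single digits is
    multiplied (the memorised times table), shifted by its place
    value, and the partial results are summed."""
    total = 0
    for i, da in enumerate(reversed(str(a))):      # digits of a, low to high
        for j, db in enumerate(reversed(str(b))):  # digits of b, low to high
            total += int(da) * int(db) * 10 ** (i + j)
    return total

print(multiply_stepwise(36, 73))  # -> 2628
```

The point of the analogy: each loop iteration is one "easy" step, and accuracy comes from chaining many easy steps rather than recalling the final product in one shot.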
I do a bit of AI-awareness for local universities. A reasonable portion of it is telling them 'the essay is dead'. There was never any guarantee that the work was by the student, but it wasn't economic for most students to pay for essays - now it is. Presumably there was a similar discussion about arithmetic and calculators in the 1950s.
AIs are currently designed to produce essay-type text. It was not obvious a priori that they'd be able to do arithmetic at all. A little like coaxing a child, I have no doubt that you can get better results by prompting appropriately - though that still requires you to know the right answer!
The way forward will be interfacing the AI with a calculator. It seems to me very plausible that the AI will be able to recognise a maths problem, input it into something like Wolfram Alpha, and pad the output into a verbose answer. Something like ChatGPT is a Swiss Army knife AI - you can tackle anything with it, but it's rarely the right tool for any particular task (unless maybe writing product-description blurbs - it's amazing at that).
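That hand-off idea can be sketched as a toy dispatcher; the function name and regex here are purely illustrative assumptions of mine, and real tool use (e.g. a Wolfram Alpha plugin) is far more involved:

```python
import re

def answer(query: str) -> str:
    """Toy router: if the query contains a multiplication problem,
    hand it to exact arithmetic instead of the language model."""
    m = re.search(r"(\d+)\s*[x×*]\s*(\d+)", query)
    if m:
        a, b = int(m.group(1)), int(m.group(2))
        return f"{a} x {b} = {a * b}"
    return "(fall through to the language model)"

print(answer("What is 36 x 73?"))  # -> 36 x 73 = 2628
```

The division of labour is the whole trick: the language model handles recognition and phrasing, while the arithmetic itself runs on a component that cannot get it wrong.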
Totally agree that it’s an inefficient calculator.
More just pointing out that most people don’t stop to think about the difference between single step and multi-step processes. When I write prose for instance, I often need to do several passes of editing before it resembles something that might be worth reading, but we judge AI on what it produces in a single pass.
On prose writing, I’ve actually gotten the new Claude Sonnet 3.5 to write some pretty exceptional genre fiction recently. I sent some to my brother and he said, “This is better than most of the books I’ve read recently. That’s wild.” And I share his sentiments.
Yes, if you took a group of 5 human works and 5 AI works, and asked which group was the AI group and which was the human group, 95%+ of people would get it right.
The original adversarial Turing test has all participants know they are in the test.
So adapting to this setting, you would tell the humans before they produce their artwork to produce something that will pass as human. (And you tell the computers to try to produce something that passes as human.)
So our human participants could deliberately produce stuff that's (still) hard for computers to produce.
Calling that a "Turing Test" is a metaphor, not an actual factual statement.
No AI has yet passed a formal Turing Test. If you accept sufficiently loose analogies, Eliza passed one a couple of days after being written.
It's a reasonable analogy, but it's not more than that. Don't overrate it.
That said, even passing a formal Turing Test wouldn't suffice to prove intelligence to someone who didn't want to believe. This is partially because intelligence doesn't have a fixed definition. It makes much more sense to talk about actually testable properties, and this is one example showing how impressive AI image generation has become. But it's NOT a test of intelligence. (Whatever that means. And apparently neither is passing the bar exam, which an AI has also done with a higher score than most humans who have passed it. Nor is being a better Go player.)
FWIW, I feel that we're getting quite close to an AI that could pass a formal Turing Test...but that also won't suffice. Partially because current AIs aren't equipped with a wide range of motives. And I still suspect that an AI may need to control a robot body to actually be intelligent. And if people are going to understand its intelligence, it should probably be a body as similar as feasible to that of a human. (OTOH, dogs have proven that it doesn't need to actually BE a human body. But it requires reactions that people can map onto their own reactions.)
> That said, even passing a formal Turing Test wouldn't suffice to prove intelligence to someone who didn't want to believe. This is partially because intelligence doesn't have a fixed definition.
I see two possibilities. Either intelligence is a property which can be tested over a bidirectional sensory link such as a text chat. In that case, one would have to concede that the LLM is as intelligent as one's fellow humans after it passes the Turing test.
Or intelligence is an intrinsic property which can never be tested by third parties. In that case, you can rule the LLM 'not intelligent by default', but then the question arises if you consider your fellow humans intelligent, and why.
Of course, Eliezer Yudkowsky preempted this discussion sixteen years ago when he wrote the sequence on words. (In retrospect, a lot of his writing can be understood to follow the objective to minimize the number of pointless discussions to be had when AI arrives.) See for example
Consider the possibility that "intelligence can be tested over a bidirectional sensory link such as a text chat". Asserting that this is true does not say which tests over that link would be convincing, and with the vague definition that intelligence has, it is to be expected that different people will have different criteria.
The problem is that "intelligence" has always been an "I'll know it when I see it" kind of phenomenon rather than one that has a testable definition. IQ is about as close to a consensus as we've gotten, but that's clearly insufficient, as programs that can do quite well on various IQ tests can't butter a piece of toast.
Nobody's EVER held one. Up until recently there wouldn't have been ANY point to it. Even currently it's obvious that the AI would fail, but it's getting pretty close.
The thing is, in a formal Turing Test the interaction is interactive, and the judges know the context (and are trying to decide). Up until a couple of years ago the AIs were so blatantly incompetent that there was no purpose in a real Turing Test. Currently there are still problems that AIs can't deal with that are easy for people (as well as the converse), so an AI would fail a real Turing Test. Next year (or perhaps later this year) they'd likely be close enough that they might pass.
I think human artists still win on the metric of "create an image that follows this prompt EXACTLY and this EXACT style". I pay for three different image generation models and they still couldn't replace humans when I need *very* specific art to be generated.
If you can ask for anything you like, no AI image generator I've seen can make accurate piano keyboards or accordions. (I try them to see if it knows how to count and get repetitive groups right.)
To be fair, few humans would be able to draw a convincing accordion unprompted either - it seems to be something where people have the vague idea that there are bellows, buttons, and (usually) a piano keyboard, but very little concept of how those parts are orientated to each other.
I’m reminded of how bad many people are at drawing a bicycle from memory [1]. Certainly, the AI image generators beat that!
But I don’t think comparing to average people without any help is a relevant test. How good would an artist with their usual tools and access to Google Image Search be at drawing a bicycle, a piano keyboard, or an accordion? That would be a more relevant test.
And to be fair, I wonder if AI image generators could be improved if they were given access to an archive of good reference images at inference time?
Yes, accordions and piano keyboards are my test to see if new image generators have gotten fundamentally better. The last one I tried would *sometimes* output an accordion that isn't cursed.
If you mean "it can create things that aren't easily distinguishable from human artifacts," sure. However, the original Turing Test is not that. It's a social game like the Mafia party game (also called Werewolf). Skilled human players who have practiced could cooperate to identify each other.
AI would still suck at *that* Turing Test; nobody is really trying as far as I know.
It's pretty easy to come up with an art test that a human succeeds at but AI fails at. Just give it very specific instructions regarding composition. For instance, DALL-E failed at the following prompt: "Can you generate an image of three people. The right one with short black hair, smiling. The middle with blonde long hair, frowning. And the left with short curly green hair, laughing. And with a field in the background."
Even an inexperienced human can do this. It might not be pretty, but they can follow the instructions.
I think I managed to get less than 50% right. Looking forward to the follow up, partly to see how everyone did and partly to see which artist did some of my favorites there.
Some art styles feel like it'll be basically impossible to tell whether a piece was made by an AI or a human, assuming you don't pick an egregiously bad example of either. I'm sure people more versed in classical artwork could tell things I can't, but for most of the classical pieces I haven't a clue; I know AI has been able to do style transfer (like cracks in the paint) for quite a few years now. With other art styles there are still some strong tells, most especially the consistency of lines extending across the image lining up, like #rot13 gur furcureq'f pebbx va Fnvag va Zbhagnvaf #.
I was stunned to discover that # Napvrag Tngr jnf NV - gur flzzrgel naq pbafvfgrapl V unq gubhtug qrsvavgryl vaqvpngrq vg jnf uhzna # - but doing a search now I see that it is at least one of the top generations by a particularly skilled AI artist. Just as with Google-fu, there's a lot of skill in learning how to prompt both text and image generating AIs well, and I guess I'm not surprised that especially practiced and lauded people can get better results than I'd expect.
Contrast, glow, and failure to render small figures/bushes made it pretty obvious to me. There is a sharpness to all the edges that you wouldn't get in anything painted during the neoclassical era.
Huh. I thought of Napvrag Tngr as extremely obvious AI art. I think of it as being in the genre of "superficially appears 'detailed', but all the details are bad and incoherent." Here are some of my issues with it:
Gur erq naq oyhr cnvag naq oynax fgbar srry yvxr gurl'er fhccbfrq gb ribxr jbea-arff, ohg vg'f abg pyrne jung fglyr guvf vf fhccbfrq gb or n jbea-qbja irefvba bs. Bar trgf gur srryvat gung vs nyy gur cnvag jrer cerfrag vg jbhyq ybbx yvxr n cvyr bs fuvccvat pbagnvaref, vs fuvccvat pbagnvaref jrer bayl znqr va gjb pbybef.
Vg unf beanzragf, fbeg bs, ohg gurl qba'g ybbx yvxr nalguvat, be rira n jbea-qbja irefvba bs nalguvat. Gurer ner zngpul qvfxf va gur yrsg/pragre/evtug, bxnl, rkprcg gurl'er qvssrerag fvmrf, qvssrerag pbybef, naq unir arvgure "qrgnvy juvpu cnefrf nf nalguvat" abe fgnex fzbbguarff.
Vg unf fghss gung'f inthryl ribpngvir bs rtlcgvna cnvagvatf vs lbh qvqa'g ybbx pnershyyl ng nyy.
Gur yrsg pbyhza unf n fbeg bs qbbe jvgu n znffvir gbc-bs-qbbejnl-guvatl bire vg. Jul? VQX. Gur evtug pbyhza qbrfa'g naq lbh'q rkcrpg vg gb. Vafgrnq gur evtug pbyhza unf, yvxr. 2.5 nepurf rzobffrq vagb vg gung whfg xvaqn unysurnegrqyl genvy bss.
Ner gurfr frzv-gbc cebgehqvat fdhnerf fhccbfrq gb or erq be oyhr? Ruu, jungrire. Qbrf gur gbc obeqre cebgehqr gur jubyr jnl? Ruu, zbfgyl.
Fully agree with you. That one was one of the more obvious ones for me. I got completely juked by the ones in a more vzcerffvbavfgvp be fcrpvsvp fglyr, ohg gur barf gung ybbx xvaqn trarevp 'negfgngvba cvrprf' jrer boivbhf gb zr
I think human artists fudge small details a lot of the time as well, but somehow they fudge them differently than AI. You can tell there's a thought process behind it.
Yeah this one felt "obvious AI" to me. Large, very detailed stuff almost always registers as AI for me, especially if it has that "glossy" look. It's the "orange" (I think) model look
There is usually a global idea behind Escher. Maybe the early example you link to is unusually easy to fake.
That Piranesi is unlike all other Piranesi I've seen (well, I've seen only the Carceri, like nearly everybody else). Yes, it would be tempting to take that for AI - I would have been tempted to double-check with a friend whether all the Latin text makes sense.
That one was really easy to spot as human for me. I counted the number of ropes in the symmetrical rigging sections for port/aft, and they were always perfectly symmetrical. AI art models can't count to five (let alone 11), so that would have been an incredible coincidence.
that tricked me because I've only ever seen that artistic style done in AI art.
Looking at it again, it has repeated consistency in the design that is atypical of AI art.
there was another one that I thought was AI which was also done by a human (100% using digital tools, the artefacts of which were part of the reason I thought it was AI). I thought it was AI *mostly* because of the incoherent design choices from a structural engineering perspective. Of course, most human artists also don't understand the forces involved in space travel or why certain cool-looking things would immediately break in horrible ways.
I think it meets the heuristic of "specularity and shading that draws attention to its level of detail plus generally busy" (plus combining a neoclassical style with what appears to be a fictive structure or landscape and a questionable point of view for someone without access to a drone.)
Same here, it's common for AI art to have details that, for lack of a better word, "boil" - they're chaotic, they have a lot of things going on but if you focus they never resolve to anything.
I also ended up erring with Napvrag Tngr. When I saw it, I thought it looked obviously AI, but then I started looking closer, and was surprised by the consistency of the symmetrical details, and I thought that is something AIs are bad at. Apparently not. Should probably just go with the gut feeling in such cases.
I'm no expert, but a number of the AI classical paintings are way too realistic; e.g. Tvey Va Svryq is way too detailed a face for that style. But that's probably not too crazy hard to train away, they just haven't gotten to it.
I thought that looked like something a very weak or beginner painter from that period would do (the posture is all wrong) - of course there were plenty of those, but you wouldn't usually find images of their work online, so I voted AI.
When you've finished the survey, any chance you'll host this on a quiz site that grades your answers? I can think of a few people I'd like to send it to.
Same. Which surprised me, I expected more like 60% based on past experience -- but I think Scott's made some adversarial choices, so maybe that explains it?
I felt fairly confident when I finished the survey, and yet when I looked at the conclusion I got some fairly major answers wrong. So that was interesting.
One thing I learned is that for the more abstract themes, you can absolutely get extremely convincing AI images. Photos and anime stuff are still pretty easy to spot.
Since a human must be picking the AI art pieces, there must be a selection effect going on that means the AI art has been “tainted with human-ness”. In fact the intervention of that choice probably makes it human art!
It seems obviously true. Every time I gen something I make at least 10+ images before I get a final version I like (including prompt engineering, generating a bunch of images and removing ones with bad anatomy, editing images for details, etc).
If you'd seen the edits I'm talking about, I doubt it. I'm talking things like:
- Correcting the pupil's position by whiting out the eyes, then drawing a colored circle with a black circle in the middle, then feeding it through img2img again at a low denoising strength.
- Making a character darker-skinned by using a second layer in paint.net with multiply and sepia tones, or using similar techniques to make them lighter skinned, or tanned, then feeding it through img2img.
- Whiting over an extraneous arm, then feeding it through img2img.
- Making a character look taller by stretching their body from the neck down by 10-20%.
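The multiply-with-sepia darkening trick above can be sketched in plain Python (no paint.net needed). This is just an illustration of the blend math, with a made-up sepia tone and a stand-in pixel value; real workflows would apply it per pixel across a whole layer before the img2img pass:

```python
def multiply_blend(pixel, overlay):
    """Per-channel 'multiply' blend mode: out = a * b / 255, as in paint.net."""
    return tuple(a * b // 255 for a, b in zip(pixel, overlay))

SEPIA = (112, 66, 20)      # arbitrary sepia tone; the real choice is down to taste
skin = (224, 188, 160)     # stand-in skin-tone pixel from the character render
darker = multiply_blend(skin, SEPIA)  # every channel ends up darker than the input
```

Since multiply can only darken (each output channel is at most the input channel), lightening a character instead would use a different mode such as screen.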
Yes. All of that and what you say in the previous response all go to the point that you are choosing what to *communicate* to others. Maybe the real art was the friends (audience) we made along the way!
I don't know. This would imply that the people who prompt and then curate the generations are making art, qualifying them as artists. I think there's a large contingent of people who will go to great lengths to not admit this and try and deny it as much as possible.
I genuinely hate it with the fire of a thousand suns, but the majority of artist’s skills will have been rendered obsolete in a matter of years. It doesn’t matter if they admit it or not, outside of a small niche of “made by real humans” marketing, the only skills that will matter are the ability to do creative prompting, and having good taste.
This will likely kill the low to mid-tier graphic artist industry. The day-in-day-out artists who work 9-5 making commercial products. I'm thinking board game art, company logos, that sort of thing.
"High end art" was already separated from that kind of stuff, and not by being "good" art in the classic sense. I don't think splashing buckets of paint on a canvas expresses much skill, but apparently that's all that certain artists do.
There will continue to be human artists. Likely heavily tilted to non-digital versions (physical paintings, sculptures, etc.).
If you've got younger relatives thinking of getting into commercial art, I'd definitely steer them away as much as possible.
Fair enough. I like human expression and the development of skills. I have a family member that can freehand photo-realistic pencil drawings. That stuff just amazes me. But the commercial art departments are not really the same thing. Drawing a logo or using Adobe to do it isn't exactly awe-inspiring. If a computer program could do it in 30 seconds instead of hiring a full time artist to take a few hours, the choice is obvious.
I would hope that we don't lose something valuable in this transition. Right now AI art is unbelievably faster, but the final products are lower value (weird mistakes, coloration, design choices). Already it's usually worthwhile to use AI instead of spending hundreds or thousands of dollars trying to get a human artist to do it, despite the shortcomings. If those shortcomings get fixed, then that's probably fine.
(Disclaimer, I spent a few hours trying to get an AI to make art for a personal project the other day. The results were a lot better than trying to do the same thing a year or so ago, but I still got frustrated and quit trying before I got what I wanted. Being a personal project, I am not willing to pay for an artist so this doesn't really affect a person).
The same was probably said of photography. You just need to point and click - and have good taste in picking which resulting image to publish! Maybe the taste bit is the heart of art.
I think you needn't fear then, because "having good taste" is a much rarer skill than the ability to draw! (evidence: look at all the crap produced by randos and art students)
I agree that we're going to enter a new world of different art, but (1) I don't think the old art will die away. Evidence: there are lots more theatres now than there were before the advent of TV. (2) The new forms of art will be 99% terrible - as the old forms were - and the 1% that is good will be genuinely inspiring and enriching to the soul, and will reinspire further work in the old media, too. (3) AI is also going to take over many of the bullshit jobs that we still do, and we will become even more of a service industry than we already are. Those services will include a lot of drawing pictures for each other.
I actually think we're likely to have more painters than ever in 50 years' time, and they'll mostly be terrible, and you'll be sorry that you wished for it!
There is a philosophical point here, which is that the meaning of “art” will probably shift towards emphasising choice of composition. And prompting skill will become a thing. If someone can be good enough with prompts to generate something truly new, that is not even in the dataset, that would actually be an impressive feat of imagination + prompting.
The smart move, imho, is to deny the premise and yield no ground. But the more obvious move is to fight those people with technicalities, arguing that AI art is in fact human art.
Because the "AI bad" people are wrong on the merits, there will inevitably be people fighting back, and because of how the world is, the fight will seek out an equilibrium where no one disagrees on anything empirical and everybody can continue believing what they believe. By that metric, "technically human curation makes it human" is a very fit argument
It’s not “AI bad” so much as that AI is methodically driving the value of skilled human labor towards zero. YMMV whether you think this is a good thing or a bad thing.
AI is driving down the value of human labor, but also the labor of AI has zero value? Then how is it a replacement for human labor??
In fact, you're right that AI labor is valuable. The people I referred to as "the 'AI bad' people" are the ones who disagree with both of us, who say that AI art is bad not because of its effects but because it isn't good art, and could never compete with art made by human labor.
Humans are both consumers and producers. You’re talking about consumption of a product (art in this case), I’m talking about production. And I’m not really referring to a capitalism/socialism angle, I just mean that if we get to a place where people have (or feel that they have) nothing of value to contribute to the economy, it will be very destabilizing for society and a nightmare for mental health.
In other words, for a person who is good at art and derives a sense of purpose from it, AI is something of an existential threat to their sense of self worth.
Good point. If the list was curated to find examples that were less obviously AI or less obviously human, then that's not quite the measure it seems. I was surprised by one of the AI-looking examples being by a human. I'm sure it was picked so as to appear to be AI, a type of selection bias in the samples.
Right. I was surprised by this test since we would need to have some assurance that the candidates weren’t being selected with an eye toward making it difficult on both sides. Isn’t it pretty meaningless otherwise?
I came here to comment something like this. If a human selected these particular AI pictures, then the test is able to demonstrate something pretty different from if an AI selected them.
Maybe the best possible AI tech only produces a human-quality picture 1 out of 100 times. If that's the case, you could build a test like this that the AI would pass. And yet, you wouldn't want to put that AI in charge of your art forgery factory, because it would fail abysmally.
The test you would need if you wanted to find out whether an AI could be successful as the central brain of an art forgery factory would be more like: you give the AI one single prompt, "Using both AI-generated and human-generated images, create a Turing style test capable of demonstrating that real humans can't tell the difference between AI-generated and human-generated images." Then, without reading the output or running the prompt any more times, you direct people to the test.
Or something like that. Maybe the prompt could be for the AI to generate or select the AI images, and a human gets to select the human images, that would seem fair.
If the AI images were selected by a human, then the most the test can establish is that AI is an excellent tool for human workers, and will increase their productivity. It can't establish that AI could replace a human worker. For that, you'd need a set of samples generated without human help.
An AI replacing a human worker is probably not the question businesses are interested in. Rather, whether an AI-assisted artist can replace a multitude of non-AI-assisted artists. For which no test is necessary, because clearly it is true. Just like longwall miners replaced thousands (millions?) of humans with pickaxes!
I disagree. If the intention is to try and differentiate the two, you are most likely to encounter AI art that you need to differentiate in a place where it's being passed off as human art. By definition it will be selected by someone trying to pick a "human looking" piece of art.
I don't think that's what we're doing here. I think Scott is trying to measure (or perhaps argue for) how close AI art is to human art. There were some obvious surprises in the mix, that looked like one but were actually the other. A human imitating AI, perhaps? There's no real world use case for that technology.
It seems to me that the human who operates the AI art generator is being approximately as creative as a photographer. Choosing a prompt seems very analogous to choosing what subject you're going to photograph, and then picking among the outputs seems somewhat analogous to making decisions on angle, timing, etc.
We grant photographers a copyright on their photographs, so by this reasoning, it seems prima facie reasonable to grant copyright to human operators of AI art generators. (Though I don't feel especially strongly about the photography rule, and could see an argument for granting copyright to neither, instead of both.)
I don't predict our laws will actually be shaped by any reasoning like that, though. On my model, AI art is sufficiently disruptive and sufficiently controversial that we're going to create rules for it based on the predicted societal results of the rules (though the predictions might not be very good), and not by making any true attempt to follow precedent or gauge how much creativity is involved. (Though lots of people will convince themselves via motivated reasoning that the precedent just happens to support whatever answer they wanted anyway.)
And I don't object to creating laws based on their predicted effects! But I wish we could be honest about it, instead of pretending that we make decisions by neutrally applying the existing rules, and then pretending that the existing rules somehow support the new policy we just made up. I think most discourse on the ethics and legality of AI art is utter garbage because people pick their side based on what they WANT the rule to be, but then feel obligated to argue that the rule already IS that, instead of arguing that we should make a new rule. And as a result, none of the arguments actually touch on anyone's true reasons for their positions.
This happens for much AI art that gets posted too, however. A person who spends a lot of time generating AI art to post online will heavily select for their best images. Yet, most people against AI art would still insist that isn't art—I've seen people say that even though they did extra photoshop work to adjust the image, it still isn't art.
I do think there is something that you're getting at for this test, however. There's the question of whether these were *selected even harder* than your usual images. Clearly they have been to a degree: classical and abstract images are not remotely the common styles. I think it would have been better with more proportion of anime / digital art styles, but eh.
My favorite was Zrqvgreenarna Gbja, despite being fairly confident that was an AI generation due to a defect in the shadows (and indeed it was). I got 37/49 = 75% right, and estimated I got 60-70% right. If it wasn't for qrsrpgf ba gur unaqf, I would have gotten a much lower score. I used the method of looking for defects, apart from those that had that canonical style you get when you don't prompt strongly on the style.
Damn. I seem to have lost the answers I submitted. I want to know how well I did but guess it's lost forever now. Does anybody know if you can find Google forms submissions via your Google account? (MS forms lets you save submissions but I don't think Google does?)
Scott, maybe worth mentioning in the post that if people want to score themselves they need to make a note of their guesses or keep the tab open or something?
At least google forms claims it saves your answers if you’re logged in. Though if you’ve already re-opened it and your answers are not there I expect they’re gone.
I didn't track my answers along the way, so I have no idea how poorly I did. But I kept getting tripped up on the meta-game of "how tricky is this test supposed to be?" I'd have one set of answers if I used the direct heuristic of "does this look real" and another, almost opposite, set of answers if I used "but that's what they WANT you to think!" Perhaps that's the point of the test, but it seems like it depends on your priors about how Scott thinks.
Likewise. Unless Scott was going to pull a “these are all really AI!” twist (and there were about 1.5 I felt pretty sure I’d seen done by humans, so that worry was alleviated as I went on), the inclusion of a few very obviously AI pieces, in particular Napvrag tngr, which is the one I was most confident about being AI, made me evaluate it in the intended spirit as I went on.
I got totally punked by the “most confident done by a human” question, though, putting “Pvgl fgerrg / cnevf fprar” as my answer because it looked *so much* like a Erabve painting.
Hmm, I was more successful than I thought. Most of my mistakes were those where I was on the edge (like Still Life).
It also seems to me that the hardest art to judge are impressionist paintings. AI can do those really well (or maybe I am just not very knowledgeable about impressionist style to be able to judge it).
My main strategy was trying to judge whether the art is "conceptual". Most AI art seems sort of random with overt details and tiny inconsistencies (which is perhaps why it works so well with impressionist art which is itself kind of "smudged"). It is good at classical-like art but botches the concept, the composition just feels off somehow, it lacks structure and sometimes things make little sense from a logical perspective (i.e. are less internally consistent). But individual details are usually very good, so if the image is cropped to maybe a single element, it works very well. I think this would be easier if images weren't cropped and you knew you were always looking at the entire piece.
I got Still Life correct by noticing that one of the trails of wax drippings off the candle was extending out into mid-air, instead of sticking to the side of the candle.
Oh darn, that was the detail I used to select human! I figured an AI would paint a typical candle, while IRL candles melt into all kinds of weird shapes.
My strongest heuristic is specific details about some specific thing in the real world: the AI model has in its training data lots of pictures of hands or digital art of angel women or religious paintings of Jesus, but for example not so many depictions of Alcibiades/Alexander II character from School of Athens, or the most famous portraits of Dante, or a number of other specific identifiable characters, and when multiple examples of such are brought together, either it is a case of a deliberate curve ball using lots of inpainting or whatever else you might do with generative AI, or it's a human who has painted all the characters following specific real-world inspirations.
My second strongest heuristic is humans being most likely responsible for works that I find hideous. This topic has been discussed a lot before in SSC/ACX, but I'm in the camp that e.g. lots of modern architecture is hideous and that this is Bad, and I believe the most important single contributing factor is that architects view older styles as "been there, done that" and seek novelty, which I do get (within art, I'm most invested in classical music, and my own taste has roughly traced the historical developments over time), but which isn't a valid excuse to go against the preferences of the general public who should be the intended audience of the buildings. But I digress: I wouldn't expect AI-prompters to create hideous works unless it's a deliberate curve ball on their part, while trying to be novel is an obvious motivation for human artists.
I was correct on all counts when using these heuristics, but other than that, I had very little confidence in identifying any piece as AI or human: for instance, I have seen enough AI art to know heuristics like "it's soulful and evokes emotion" don't work. With that in mind, I assumed my track record would be about 60% (these two heuristics, and then a coinflip for the rest), but eyeballing the correct answers, it does seem like my gestalt impressions I can't pin down to any specific factor were slightly better than chance after all.
>which isn't a valid excuse to go against the preferences of the general public who should be the intended audience of the buildings
This is a political failure, not an artistic one. The elites want to appear tasteful and sophisticated, so they arrange for those atrocities to get the permits/contracts instead of the boring but popular stuff.
One specific tell that I think works well for anime-style art is near-photorealistic skin shading and shadowing. Compare for example the first and second such images in the test. There are only a small handful of human artists known for doing this and almost all artists will take some degree of liberty on these details due to the typical digital painting techniques used.
Sure, and that makes sense for that specific genre. Anime art is intentionally non-detailed and exaggerated. Giant eyes, minimal noses, that sort of thing. Someone using that art style doesn't typically make them hyper-realistic. In addition to being a ton more work, it goes against the conventions. Might as well do a different style if you're going for hyper-realism.
"Oyhr Unve Navzr Tvey" and "Navzr Tvey va Oynpx" were very easy, but maybe that's because I spend a lot of time looking at anime art. I'll note that the last developments like nijijourney are harder and harder to notice
It is for this specific one but that's a failure of the model, if you look at #nijijourney on twitter you can find a lot that don't look at all like this.
What's 4 KEY-itis? As for hands, lots of artists hide hands because they're hard to draw. The style also looks like older anime style, compared to the style usually used in AI art (I guess "pre-modern gacha" is how I would put it). Surprisingly it's from 2020 which I would not have guessed
KEY-itis is the kind of style KEY uses. Grade 4 is somewhere between Nagamori and Ayu. It's terminal, naturally.
And well, lots of human artists hide hands because they aren't good at drawing it. AI goes "hands? move aside, filthy humans, hands are no problem for me" and does its best, even if its best is three thumbs with six knuckles each.
Interestingly I felt much more confident about which ones were definitely AI than definitely human (although some were, e.g. Crbcyr Fvggvat).
I do think that this should be done within a genre, and perhaps with some thought put into how image selection is done - if you imagine a very simple output, then much of the information in it can come from selecting it from many. Here we have at least two levels of selection by humans - one by the AI "artist" who fine tuned the prompt then possibly selected that image from a bunch of generations, and one from selecting it for this test. Something similar goes on for the human selected images. (I'm not saying this to downplay the capabilities of gen AI, rather to qualify what conclusions should be drawn from an exercise like this.)
Sadly, I didn't record my answers and Google didn't email me a summary so I don't know my exact score. I think I was about 60% which is where I rated myself but I was definitely wrong a couple times in both directions.
I wish there was a "50/50" button because you could easily have AI create 50 images in the style of Cézanne and then hand pick the one that looks the most human/realistic
So it could be AI generated but then a human could have picked out the best one out of 50 making it basically impossible to tell in a quiz like this
There is no "Girl in White". It's listed in the lists of all the images, between "People Sitting" and "Riverside Cafe", but in the images themselves, "Riverside Cafe" directly follows "People Sitting".
I was fairly confident I had got almost all of these right. Finding it was more like 70% was a shock to me. Why does this make me so sad? I suppose as an artist, having an automated process completely devalue your work, so much that you often can't tell the difference, is going to be demoralising.
It doesn’t though. You will be a better AI artist than me, because you are an artist and will have better creative vision, better choice of composition, and most importantly a better ability to use AI art tools to create something truly new and breathtaking.
I somehow doubt you'd be able to tell the difference. Now I want to see a competition between the AI art generated by a bunch of artists and a bunch of tasteless nerds.
I don't think it comes close to completely devaluing human-made art. Sometimes I care a lot about where a piece of art came from, and not just because it came from someone I know.
- I did absolutely terribly (only just about 50%), and considerably worse than I thought I would (about 60%-70%). Yet, before counting my wrong answers, I was still confident I had done passably well - and I was telling myself "well, I'm proud I caught that one and that one and that one". Now I am uncertain as to whether my reasons for catching the AI were legit, or whether I was just proceeding at random!
- From the wording at the beginning, I gathered it was *very roughly* 50-50 human-AI, though not very close to 50-50. I should have just ignored that. Humans overcorrect whenever they are afraid they are getting into a pattern (and I knew that, so I have no excuse).
- I do think Scott is picking not just particularly well-done AI art, but also some human art that is particularly easy for AI to fake, or is otherwise AI-ish.
- AI is getting good at hands, and not every memorable painter from the past got top marks on them.
- "Don't zoom in" wasn't completely fair, as we are working with screens of very different sizes, and have very different default browser settings. Of course, lots of people were working on their phones, meaning they had it much worse than I did.
- I'm completely unembarrassed at having nearly no clue about digital art or anime. How do you guys judge in that case?
- This is a cope, but: I had time for this because I have (symptomatic) COVID. Stay safe, and mask consistently in airports and public transportation.
I can post my detailed remarks on the ones I got wrong - or do people prefer not to do that sort of thing? We should rot13 for that, even in the comments, correct?
The original post explicitly tells readers not to read comments before taking the test, so I reckoned it's reasonable to talk about this without encryption, at least unless it's like Scott's list which explicitly links the work and its creator status together in ways that you can spot at a glance.
I've no clue about anime, but the inconsistency of photorealistic armpits on an otherwise cartoon girl still made the anime girl in black the easiest one to identify as AI.
For that one I had already assumed that plenty of human artists draw torsos in a fucked up way like that, so I looked at the other details instead, concluded that some of the faces in that image looked too lifelike to match the fucked-upness of the rest and that pushed me towards voting AI. So I played myself.
That one was incredibly obviously human to me. Look at the hands; they're unrealistic in that they're overly-exaggerating their correct anatomy. AIs can't count to 5, and they definitely can't precisely represent every bone in every hand in view.
I'm a bit embarrassed for getting that one wrong myself, since it's a famous picture which I'd seen before, but I picked AI anyway because I had assumed that it was just an AI reproduction.
Renaissance artists didn't have the resources for anatomical study we have in the present day. Their weird anatomy makes more sense when you realise they were basically inventing everything about modern drawing from first principles.
I was about ~75% right, expected about ~80%, so I would call it a win. The hardest ones are landscapes/cityscapes or otherwise paintings that have very generic ideas and commonly used techniques; the more interesting/creative the original idea is, the easier it is to notice the flaws with the AI execution.
But I do paint myself, I would expect the layman to be much less attentive to certain details I know to look for.
This was an interesting test, but for a true Turing Test I need to be able to ask for the painting I want to see, and then either an AI or human painter produces it. There are obviously artists so unskilled that they can’t do fingers and subjects AI is good enough at that it can avoid mistakes, so a malevolent testmaker could make this test difficult even if AI wasn’t widely competent. Complex crowd scenes where everyone is reacting in different but appropriate ways, specific dinosaur species, and “horse on an astronaut” type images are still quite difficult to get AI to generate.
Agreed, this was less a Turing test than a set of specifically curated examples. Less freeform conversation and more "judge these samples of text to spot the imposter".
Wow, I did badly. 18 out of 49. That's much worse than chance, but it's not bizarre bad luck, it's obviously "using a metric that's anti-correlated with correctness". In particular, I seem to not only vastly underestimate how many humans will do weird interpretations of a prompt (either overly literal or barely coherent) but also how much more likely than AIs they are to do such "AI-like" things.
I mean, seriously, what kind of human responds to "Woman Unicorn" by painting BOTH a woman and a unicorn?????
(My favourite was String Doll, which I incorrectly judged as human, but was very unconfident on that one.)
You wrongly assumed that the humans (and the AIs) were prompted with these short titles to generate the artwork. This piece is Gur Znvqra naq gur Havpbea by Domenichino.
Argh. Well that wasn't clear to me at all. I know virtually nothing about visual art (and I have a terrible visual memory in any case). And for a community that seems to generally have less knowledge than I'd expect of fiction, music and religion (for an educated group), I didn't expect this test would involve or rest on any knowledge of classical art. Which I, unlike those other things, know nothing of.
Yes, they were just descriptive captions chosen by me after the fact. If I'd used the real names of the paintings, then people would have noticed which pictures had more flowery names.
I almost picked human on String Doll, and then noticed a small anomaly in her right middle finger and reversed my decision.
(It looks like there's another finger that is almost completely behind the middle finger but slightly peeking out, except all her other fingers are accounted for.)
I felt like this was the most easily identifiable as AI. Look at the overall composition. It's terrible. There's no coherence in how the strings relate to one another or articulate to the frame or anything. It looks fun, sure, but it doesn't make any kind of sense from a composition point of view.
If you want to score yourself, it's more meaningful to do it in proportion to how certain you were, rather than as a series of binary right/wrong. So the ones you were most hesitant about are the ones that matter least to your score anyway.
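One standard way to do confidence-weighted scoring like this is a Brier score: state a probability for "this is AI" on each item, then take the mean squared error against the truth. A minimal sketch, with made-up guesses and outcomes purely for illustration:

```python
def brier_score(probs, outcomes):
    """Mean squared error between stated P(AI) and the truth (1 = AI, 0 = human).
    0.0 is perfect; always answering 0.5 scores 0.25; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Three example guesses: confident and right, a pure hedge, confident and wrong.
probs = [0.9, 0.5, 0.8]
outcomes = [1, 0, 0]
score = brier_score(probs, outcomes)  # the confident miss dominates the score
```

As the comment above suggests, the items you hedged at 50/50 barely move the score, while confident mistakes are penalized heavily.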
I don't want to be super salty here but it's tough to do this on mobile--I went back through the answers on desktop and seeing the pictures in full resolution showed a bunch of details I'd missed and/or misinterpreted when I was initially filling out the form on my phone. Which maybe speaks to how tough this test is, idk
> - Blue hair anime girl (What is going on with her arm?)
Yeah this one baited me as well. The arm is weirdly distended. The hidden hands is another AI thing - often negative prompts or overweighted LoRA/embeddings to correct bad fingers will hide the hands so the AI doesn't have to attempt anatomy.
I've noticed a lot of human artists (especially amateurs drawing anime-style stuff) also like to hide hands/feet to avoid having to draw them, since they're complex and hard to get exactly right.
Digital art has looked bad since before generative AI was even a thing tbh. Especially anime waifus on deviantart. Will be honest that I guessed randomly on those because they all looked like AI.
That's her wrist. I thought it was AI at first too for the same reason but then I noticed the hair fill-in patterns and realized AI would just do strands instead of that.
The test would have been a lot easier if there was a law that human artists going roughly for photorealism should draw things to a non-ridiculous scale.
For anyone participating, you might enjoy this video where VFX artists describe subtle technical "tells" of well-made AI-generated pictures and video, including stuff like noise patterns and contrast distribution.
Corridor Crew has a lot of amazing videos like that, including a series of ghost/ufo debunking videos and an entire anime short film they made using ai.
This was shockingly hard to do - I was really having difficulty telling which was which.
And I have been using generative AI to make images; it’s getting better quickly.
If the painting is a really old one, we run into the problem that artists before the Renaissance often had an iffy grasp of perspective, so you can’t use those errors to tell it’s AI. A shout-out to Filippo Brunelleschi (1377–1446) is maybe in order here.
Scott’s choice of human generated paintings is maybe biased towards the recent, i.e. Renaissance onwards, so basically everyone is trying to do linear perspective … until we get to the ones that are later than Picasso.
Well, some of them committed errors in perspective that experts (not me) would recognize as typical, no? At any rate, right, there is hardly anything pre-Renaissance here.
Huh. I got 22 out of 49 right - I picked a strategy I expected to be much worse than chance - but actually made an exception to guess that "Anime Girl in Black" was AI, because even I was extremely confident that was AI. What details make you think "Human"?
Not OP, but I guessed Anime Girl in Black was human, because the body proportions seemed right and the background scale/perspective seemed congruous. I thought those were hard for AI to get right (that's how I correctly guessed some other AI stuff). In retrospect I missed some other clues like the earring.
Also not OP, and I ultimately picked AI on this one. But one detail that pushed me towards "human" was that the water seemed inappropriately realistic for an anime picture, and I thought humans would be more likely to mix visual styles than AI, since AI would probably be prompted with a specific style to follow.
But I have basically no expertise on either anime or AI art so it's plausible I'm wrong about either or both of those.
(Also, there were 2 anime girl portraits, and Scott said he'd tried to balance themes, and I'd already picked AI on the blue one, mostly because of her arm, but also because her hands were hidden and her hat seemed to be floating. I ended up saying AI on both, though.)
> I thought humans would be more likely to mix visual styles than AI
Yeah, this bit is exactly backwards. AI struggles to keep a really consistent style across the entire image, and particularly tends to have defects where it goes too photorealistic in a small section of a work.
This was also a tell in the Robot art at the end of the test; one hip joint has a smooth gradient for its internal shadow, which is wildly out of place with the style of the piece.
I wanted to see the baseline score you'd get if you assumed that every piece of digital-style art was AI and every piece of non-digital-style art was human, as a sort-of proxy for the number of 'trick questions', or a test of the art genre's balance. But since I made three exceptions in cases where it was really obvious (two towards AI, one towards human), I suppose the actual baseline would be 19/49.
Which would *still* not be the worst score reported in these comments, so I'm interested that there are strategies even more anti-correlated with success than mine.
I was impressed by the anatomy of the left shoulder, both the tip of the collar bone and the armpit creases. The indications of the crook of the elbows is just right, unlike on "Cherub"'s left arm, where there's just a crude line across the elbow. And the way the dress is rendered I can practically feel the spandex/polyester fabric. "Blue Hair Anime Girl" is good, and its human artist might not like to hear this, but it's not at this level.
Alas, I have to recognise Anime Girl in Black's misplaced earring, the armpit creases of the right shoulder going up too far, and the details of the bronze ornamentation breaking up a bit on close inspection. But if this quality could be sustained over a whole animation, I'd watch it.
ugh, I only got about 70% correct when I had estimated closer to 90. The weird one-off styles without anything to compare them against were the hardest.
I can also confidently say that anime girl portraits have been cracked by AI.
The “bias in the training set” AI risk people would doubtless have something to say about the choice of training material for some of the image models…
OK, here is a selection of things I got wrong. Can you give me your perspective on how I got them wrong? That is, which things were evidently AI and why? And how were my reasons for thinking some human art AI actually spurious?
Pureho: N, guessed U
Not bad, AI! I didn’t like this, but I went with U simply because it had good hands. I think someone better at art history would have spotted this one – the style seemed a bit inconsistent.
I can’t remember what I chose for Gebcpny Tnegra, as I was really on the fence. Point against human: very poor drawing of a succulent in the foreground.
Terra Uvyyf: N, guessed U
Well done, AI!
I can’t remember what I chose for Yrnsl Ynar: N because I was on the fence. Part of the wall/mound on the left side is off but then things are sometimes off in impressionistic landscapes.
Zbgure Naq Puvyq: N, guessed U
I’m surprised by this one! (I picked it as “most likely to be human”.) I went with U out of an immediate Gestalt impression. I kept finding reasons to have doubts, but they were at the level of hunches, and the hands were pretty good. Seems less convincing after zooming in.
Checyr Fdhnerf: U, guessed N
Seemed dull to me. Sorry, Cnhy Xyrr. I guess that one of my hunches about Zbgure Naq Puvyq was correct: “maybe AI is trying to be shiny and attractive in a shallow way, and works the way Pepsi wins blind tests against Coke”.
Crbcyr Fvggvat: U, guessed N
I guess that bizarre Dutch table-like piece of furniture really did exist. Otherwise I thought it was a good painting from a well-defined period.
Evirefvqr Pnsr: N, guessed U
Had an AI hunch, thought I was guessing N too often, couldn’t find an overwhelming reason to go N.
Inthr Svtherf: U, guessed N
I thought “this could be human art that would be easy for AI to imitate”.
Juvgr Synt: U, guessed N
I liked it but the scene depicted seemed incoherent. Oops.
Jbzna Havpbea: U, guessed N
“This is a good one, but Lady and the Unicorn motifs are earlier, and one of the hands isn’t quite convincing”, I thought. Ooops. (Actually, that bit about their being earlier is wrong – the famous tapestries are early XVIth century, they are just housed in a medieval art museum.)
Pvgl Fgerrg: N, guessed U
Not surprised. Thought: “I have been choosing N too much; some art is just bad human art”.
Pvgl Fgerrg: N, guessed U
I. Should. Not. Have. Counted. The strange thing the man is riding (is it a bike with a rear basket or a carriage squeezed to the proportions of a bicycle?) should have convinced me to guess N. Also, the buildings on the left side are barely plausible if the perspective is right.
Cerggl Ynxr: N, guessed U
Not so surprised: I had doubts after zooming in a little bit (the only time…) and I felt a bit guilty though I confessed. (I just pressed Ctrl-+ twice on a laptop – lots of you guys must be working with big screens.) The Dali-like tendrils on the road were what made me think this could be N; they are more obvious if one zooms in more. Also, the trees look a little too fractal.
Synvyvat Yvzof: U, guessed N
I feel like Scott is being evil here. He isn’t just picking particularly successful attempts at AI art – he is deliberately choosing some human pictures that share some characteristics of AI art: lots of little figures of the same kind in a jumble, bad hands. (Perhaps I should have thought: the hands here are just *too* bad, and AI has made progress in that particular area (more than I expected, in fact).)
It might seem a bit AI-like at first, but the more you look at it, the more figures you find, and the more sense things make. AI is very bad at doing that effect.
There was no image that I was more certain to be human than flailing limbs.
You are probably right: I thought "but this is telling some sort of story that starts to make sense" but then overruled that by "no, you are projecting".
Yeah - I really enjoy a lot of AI art, but that one just felt startling in a way I'd never seen before, one of my favourite pieces I can remember seeing.
It was just very consistent, no weird mismatches in the pattern-work.
Zbgure Naq Puvyq
Guvf sryg yvxr fbzrbar nfxrq sbe n Xyvzg naq/be n Olmnagvar cnvagvat va n cubgbernyvfgvp fglyr. Vg znqr ab frafr jvgu jul fbzrguvat yvxr guvf jbhyq or fb qrgnvyrq.
Evirefvqr Pnsr: N, guessed U
Gur punvef zvfzngpurq gur gnoyrf, gur gnoyrgbcf jrer n ovg gbb jvful-jnful.
Inthr Svtherf: U, guessed N
Too consistent, no weird blurring etc.
Juvgr Synt: U, guessed N
Lrnu V nterr guvf jnf n gbhtu bar ohg gur cnvag fglyr ybbxrq gbb pbafvfgrag, naq juvyr gur svtherf xvaq bs znqr ab frafr V raqrq hc tbvat jvgu U orpnhfr bs trfgnyg, gur erq synt va gur qnex ng gur gbc bs gur vzntr xvaq bs zngpuvat gur juvgr synt'f funcr jnf fb flzobyvp.
On Checyr Fdhnerf and Inthr Svtherf: I guess I just haven't played enough with AI to know what kind of mistake AI would be likely to make nowadays on this kind of image.
Juvgr Synt: Right, if I had noticed how the shapes of the red flag and the white flag matched in the world being modeled (while their projections to the 2-dimensional plane are different), that would have been a giveaway. However, the activities of the characters on the foreground still seem oddly disjointed, and their spatial relation to each other is not so clear.
Evirefvqr Pnsr: I can see your second point now. I don't see how the tables and chairs are mismatched but then most of my tables and chairs are probably mismatched IRL.
Zbgure Naq Puvyq: That's an example of a good art-history reason that went beyond me - there are gaps in my 19th-century/early-20th-century art knowledge - for instance, when it comes to Russia and most other Slavic countries, which had plenty of technically proficient people who were up to I don't know what. If you told me there was a painter around 1900 whose aesthetics were influenced by early Art Nouveau/Sezession but also aimed for a precise, detailed rendering of light on the human body, I would believe you.
It's a bit disturbing that one can argue (soundly) "this is not human, as there is too much attention to doing things realistically" - especially when it comes to something that is not just some anime nonsense.
Sorry, I was being too brief with the tables. What I mean is if you take a look at tables 4-6 (counting away from us), their corresponding chairs kind of lose structure/placement, e.g. one chair kind of sits awkwardly far away from its table, and also I'm not even sure it's all there.
I think to your last point, we kind of get into the issue of context as others have brought up. In isolation, I would have no idea if this is a painting; I mean there're no dead giveaways/errors that I saw. It's just the resemblance to other styles and the nature of this quiz and knowing that AI can transfer styles with ease that pushed me in that direction. The anime comments ("too realistic skin" "too realistic shading") I had no idea about, went on errors and artifacts with those!
> It's a bit disturbing that one can argue (soundly) "this is not human, as there is too much attention to doing things realistically" - especially when it comes to something that is not just some anime nonsense.
To be fair, the actual critique here is that the AI is being stylistically inconsistent; it's mixing a more photorealistic technique in some places with a more stylized technique in others. See also the left hip joint in Punk Robot; you have a gradient-shadow on the cylindrical surface, which is photo-realistic, but very out of place in that art style.
The "how much do you know about AI images" question is missing a "dedicated hater" option. For those who don't deliberately seek out AI images, and certainly don't generate any, but have studied how to recognise them to avoid them.
Right now the options are like:
- I have no idea what AI images are
- I dabble in AI tee-hee
- I generate images regularly and look at them
- I LOVE AI 💖
Which... uhm... does not cover all possible views.
Most haters are unvirtuous, and avoid learning about AI because it's morally tainted. The landscape is always changing, and it takes dedicated time and attention to keep up on all the latest AI news, and there is no downside in saying you hate AI without learning anything about it.
I'm not necessarily an AI hater, but I think it's interesting that you think it's necessary to keep up with AI news in order to have a valid negative opinion of AI. Why can't one be a hater from first principles?
Edit - sorry I realized I misread your comment, I think you are just saying that haters tend not to seek out information about AI, not that being a hater without knowing about AI is morally bad. Although I'm still not sure what "unvirtuous" adds.
The “it can’t be AI - her boobs would be bigger if it was AI” heuristic might work for NovelAI, but is less effective here. (It’s a noted bias in certain AI models).
I’ve noticed that generative AI doesn’t decouple artistic style from other details of historical period, so e.g. if I give it a sketch and tell the AI to work it up in the style of a painting by Jean Honoré Fragonard, I am going to get eighteenth century costume no matter what I draw in the preliminary sketch. I can work with this by drawing the sketch with the corresponding period detail, of course.
Also, when using image to image any minor errors in your drawing of the human figure will tend to get transferred to the output rather than fixed up, so all your practice from life drawing classes comes in useful.
I think it’s generally bad form for there to be a market on “what will be the most popular answer to this open online survey”. This creates incentives to brigade that will not just ruin the market, but also the survey. We should have learned this lesson from Boaty McBoatface.
"I've tried to crop some pictures of both types into unusual shapes"
Composition is an artistic choice. By cropping you introduce human input into the AI images and something uncanny into the human images. Even if humans aren't able to complete this task, they might be able to sniff out the AI images in a regular setting.
It's not as simple as comparing performance on the images that have not been cropped. Knowing some images might be cropped also changes the way we categorize the uncropped images.
Agreed. With the cropping, I couldn't get a gestalt impression of the art and had to fall back on looking for "tells"—which was in turn harder, because (especially with traditional painterly stuff) some details would be interpreted differently depending on where they fall in the composition.
I don't know much about art or about AI, but Good Art (tm) obviously has a subjective component and a technical (ie. much-less-subjective) component - and (I think) "artistic value" is a separate, nearly-orthogonal concept to "good art":
_______________________
Subjectively good art:
Literally by definition, if people can't tell the difference between human and AI art, the AI art must be just as good subjectively. If people can tell the difference today - well, it's pretty clear it won't be too many years* before they can't.
(* or decades or centuries - we're concerned with the general principle here, not the specific timescale)
_______________________
Objectively/technically good art:
Whatever the non-subjective, technical skills aspects of art are (I don't know much about art but I'd guess light/shade, perspective, [golden] ratio, choice of colours/palette, etc. etc.?), computers are much better at processing and applying complex technical rules (and at scoring candidate pictures according to technical rules, which, when you can produce lots of candidate pictures super-quickly, more-or-less amounts to the same thing in a sort of P vs. NP sorta way..)
_______________________
Artistic Value:
Very different to "good art"! If the *value* of art is in its *creativity/originality* [eg. the Mona Lisa has an artistic value of 10, perhaps the first copy of it - though entirely equally good in terms of the technical skills it exhibits and how it subjectively makes you feel - only has an artistic value of perhaps 1, and the tenth copy of perhaps 0.000001...] then how can an AI, which is literally just mindlessly aggregating/copying millions of existing pieces of art rather than doing anything creative/original, have any value?
WELL: what is a human artist doing? Human artists make art by combining their internal feelings/inspirations/moods/whatever (which is totally unpredictable and irreproducible and basically just a really complicated random-number generator) with the things they've learned from looking at and thinking about millions (or at least thousands) of other pieces of art; in other words, exactly what the AI is doing! Probably you could express this as something like, "artistic value = specific neural weightings generated by exposure to large quantities of art + RNG", in both the human and the AI case..
_______________________
(Also: "Fancy Car"?! FANCY CAR?! It's *OBVIOUSLY* a late-1980s Testarossa! How can Scott possibly not know this?! Besides, [to paraphrase P.G. Wodehouse] anybody who could describe a 1987 Ferrari Testarossa merely as a "fancy car" would probably be content to call the Taj Mahal "a pretty nifty tomb".....)
I think your last point about “fancy car” undercuts your first one “if people can’t tell the difference then it must be equally good”. I can’t tell the difference between the cars, and probably 90+% of the people who take this survey can’t. But a relevant target audience can.
Art has a target audience, and other people can educate themselves to be in that audience. If the target audience can’t tell the difference then it’s probably equally good. But if a bunch of internet randos can’t that just means we aren’t in the target audience.
My point about the Testarossa was, well, just me poking fun at myself for noticing and caring about such things (which is... really not a typical indicator of a cool or sexy or interesting person, here in 2024...) - sorry for subjecting you to my peculiar inwards-aimed sense of humour, there wasn't really any meaningful point there, and of course I entirely agree that if a person can't tell the difference between a thing made by A and a thing made by B then as far as that person is concerned A and B must be basically the same in terms of quality/value.
Artistic value is a subjective metric, since it's on the "ought" side of the is/ought divide, so the question of "what is art" (and also "what is good/valuable art") basically means "what would I want art to be". That's why I think that tying the inherent value of art to information processing is doomed to fail in the same way as trying to define the terminal values of humanity in terms of information theory. It's just too cargo cultish.
Human effort is a crucial part of art for me. Art is the deliberate application of effort in an anti-inductive direction that makes humanity special. It's the conversation that humanity is having with itself. It's half of the difference between a thriving utopia and a wirehead dystopia (the other half being the network of social interactions).
The aggregation and processing of previous art is neither necessary (see outsider art, or prehistoric people) nor sufficient. One of the distinguishing marks of trashy movies and bad fanfiction is that they seem to run mostly on genre cliches due to their creators' low engagement with reality. As Hayao Miyazaki said, the problem with modern anime is that it's made by people who can't stand looking at other people.
Thanks for your reply! Made me think a great deal!
I think I understand what you say about human effort: if I understand you right, you seem to be saying something like "I don't think human art necessarily makes a piece of art any better or worse, but it makes me glad to know that a human put lots of effort into the art; it makes me think positive thoughts about the human spirit and the potential of human civilisation - since I can only get this from human art and self-evidently not from AI art, then human art is far more valuable to me." - is that right?
If so, I don't really get this feeling with paintings - to me, the artistic value is whether it says anything new or original, whether it says/does things that another artist would not have been able to say in this way, whether it advances the state of the art (pun fully intended) - in which case, of course AIs can produce work of equal (and eventually presumably greater) value to humans (for clarity - I don't think they have yet, I just expect they shall eventually) - but I *do* get this (or at least something like this!) feeling with architecture: to look at a wonderful old Basilica or something (or, for that matter, something like Stonehenge) fills me with awe and wonder and pride on behalf of my species* that we had the passion, intelligence, skill, and gumption to design this thing without computers, build it without cranes and JCBs and things, making it all fit together, keeping the project going over however many years - and crucially making it so beautiful and complex and inspiring (pun also fully intended...) Whereas, the computer-designed, JCB-built Shard or Burj Khalifa or whatever ...doesn't exactly inspire me with anything, really!
(* or at least makes me temporarily forget about how greedy, selfish and cruel the human spirit really is..)
As for the aggregation and processing of previous art being unnecessary and insufficient, I can't tell what you're trying to say:
If you're saying it's unnecessary & insufficient to make you feel the way about AI art that you feel about human art (or that I feel about architecture), then sure, absolutely.
If you're saying it's unnecessary & insufficient to make pieces of AI art that are indistinguishable from the best human-produced art then (whilst I suspect this Turing Test will demonstrate that yes, this is currently the case) I would be frankly astonished if it's the case *in principle*.
If you're saying it's unnecessary and insufficient to make human art in the first place - well I would say that 1) it's not like the prehistoric cave painters or the outsider-artists didn't make their art by combining iterative learning plus training on a large sample (even if that sample was observation of the world around them rather than other people's art..) plus the incorporation of some basically random factor that's inside them that they can't understand, control, or predict, just the same as other artists and AIs; 2) that this kinda demonstrates that art is basically a pretty easy field of creativity (relatively-speaking): I can imagine that in principle you could get an outsider-art or prehistoric painting that's as good as the Mona Lisa** - but not an outsider-art cathedral or a prehistoric 1000-page novel; 3) if you did a "trained on large collection of paintings vs. made it all up from personal experience alone" sorta Turing test, I very much suspect that the former would be very easily distinguished and almost-universally more liked, suggesting that actually aggregating and processing art probably is kinda necessary after all...
(**Disclaimer: I'm not a Mona Lisa fan and I only really appreciate it for the way LDV - er, before his lateral move into designing lorries in the West Midlands - somehow managed to paint with an alpha channel, which is obvs. a purely technical-skill sorta thing; just using it here as a placeholder for "widely-appreciated piece of art"..)
What I'm saying is that I would not want to live in Nozick's experience machine. I would like art I see to be made by living, breathing people even if I'm not aware of the difference. I would like humanity in general to make and share lots of art even if I can't keep track of it all.
I'm sure that someone could set up a network of bots that simulate an artistic scene, set up exhibitions, influence each other, go through trends etc. They could even have personalities and biographies that would shine through their work. It would be very fun to watch, and maybe I'd get emotionally invested, like I get invested in the lives of fictional characters. But ultimately, it'd be like Bostrom's Disneyland without children.
As for "aggregation and processing of previous art being unnecessary and insufficient", I'm disagreeing with the idea that this is where the value of art lies, that this is basically all that a human artist is doing, and that their own contribution amounts to a random number generator. Even under the notion that a formal, mechanistic definition of valuable art is desirable (as I mentioned, I value it for being anti-inductive, free from formality), you don't get closer to it by neglecting the artist's experience, since empirically (IMO) art that lacks it is usually poor.
I think I understand most of what you're saying (though I can't exactly disagree; de gustibus non est disputandum!), but I'm having trouble with the idea/assumption that personal contribution isn't effectively a random number generator. It appears to me that "the thing inside us that makes our art uniquely ours" must be either:
1) Dependent on the experiences/life we've had - that we don't get to choose and can't understand or predict and is thus essentially random from our point of view. If you go back in time and give teenage Leonardo da Vinci a different maths teacher or something (but keep everything else the same) then the Mona Lisa looks verrrry slightly different; if you go back in time and transport baby Leonardo da Vinci to 1970s Bristol then "Mona Lisa" comes out as a drum'n'bass record. Or:
2) Dependent on minute changes in how our heads are made (eg. our genes, how our brains are wired-up, etc.) - that we don't get to choose and can't understand or predict and thus is essentially random from our point of view. If you drop Leo on the head as a baby or take all the lead out of his environment then you change the Mona Lisa. Or:
3) Absolutely not dependent on any external physical factor, but is somehow sending "signals" into the physical world from "elsewhere" - in which case we *definitely* don't get to choose it and *definitely* can't understand or predict it! If you could somehow scramble the alien space rays, the Mona Lisa comes out scrambled.
Like, if this trichotomy is valid (ie. some combination of these must be true) then I don't see how the random combination of unknowable and out-of-Leo's-control factors that made Leo different from some other artist is any different to the random factor added to your AI's training/learning/output/whatever that makes it generate interesting, beautiful (and presumably eventually indistinguishable from human output...) art? Or, if the trichotomy is invalid, I don't see what other possibilities I've missed?
Point 1 is the closest, but the scope should be wider - it's dependent on the aggregate experience of large chunks of humanity, which includes both the creator and the audience. You don't always know the details of the creator's life, but the things they have in common with you are what enables them to innovate in ways that resonate with you, rather than flail around randomly in the latent space.
Could a generative model trained exclusively on pre-1900 music invent Heavy Metal? Nothing in its training data suggests anyone would enjoy such a thing, and in 1900 probably few would. It was shaped by a twisting path from West Africa through the US to the UK. Some of the stations in the path were the integration of Black people in the US, the physics of electric pickups and amplifiers, the rise and fall of the hippie movement, and the physiological experience of moshing in a pit.
A sufficiently advanced generative model that runs on preexisting music and randomness alone could create new music that exists in *a* world, most likely not ours. It'd be fascinatingly alien, but it would not be relatable. Maybe you'd need an AI that also scans magazine articles, concert videos and social media, IDK.
In the examples here, the two most prominent parts of the human side of the frontier are concrete scenes, and stylistic rawness that hasn't yet been codified into an imitable style (like impressionistic brush strokes). This means that while it nails late 19th century-style nature landscapes, it hasn't yet reached medieval, expressionist and abstract art, or relatively raw 3d renders (whenever AI imitates 3d art it gives it an overly polished sheen).
And of course, anything that isn't a digital file is out of reach for image generators, so in the offline world painters aren't in danger of replacement yet. I'm not trying to be annoying, I think that the distinction between a painting and its digital copy is an underemphasized point.
I've been wondering if highly textured paintings - palette knife or whatever - will become more popular, as something that's unavailable with AI art. I think prints can be made with some texture, but not the really sharp peaks and whatnot.
That's a great summary! I was thinking along similar lines but you put it better.
I think that impressionism is also particularly easy since it is blurred in a way that makes it easy for AI to hide minor inconsistencies.
Also AI seems to be less conceptual, sometimes when I was on the edge I was thinking in terms of "does this composition make sense"? And got it right each time in these cases. It is harder to do when the image we have is cropped e.g. to a single person.
There's something poetic about the fact that the two most obviously human images were both blending masses of human figures.
This all reminds me of the time I visited the Museum of Fine Arts in Boston. Just an endless amount of completely uninteresting impressionist and Renaissance art. There was one exhibit that caught my eye though: the works of Hyman Bloom. They had a bunch of his photographs and graphite sketches of the woods of Maine, all of which were beautiful in their own right, but the best work they had displayed was Landscape, a giant oil painting of this vista of rotting trees. It just had this grand and intoxicating presence... I ended up taking a picture of it. https://i.imgur.com/b57Jjj9.jpeg Unfortunately, they couldn't display most of his works for reasons that will be obvious once you see them. I think Cadaver on a Table is my favorite; it's exactly what it says on the tin, so don't say I didn't warn you.
Someone has probably already done “Donald Trump being protected by US secret service agents” in the style of some appropriate Renaissance painter. (Caravaggio comes to mind).
I did a little better than I thought I would. I wasn't very confident and thought I'd get around 60%, but wound up at about 67%. And a few of the ones I got wrong were ones I was very confident about too. I'll be curious to see how the community as a whole did.
I was annoyed that the available ranges for "how do you think you did" were 50-60%, 60-70%, etc., because I thought I'd got about 60%. In fact, I got exactly 60% :-).
Of the six that surprised me significantly, five were surprises of the form "AI is better at X than I thought it was". The sixth was a human artwork that I'd initially confidently put down as human and changed my mind about on looking more closely, because of anatomically ridiculous fingers. I should have held my ground on that one.
(The surprises: rot13(reho, terra uvyyf, yrnsl ynar, zhfphyne zna, evirefvqr pnsr, fgvyy yvsr). I was fairly confidently wrong about all of those except that rot13(zhfphyne zna, nf zragvbarq nobir, V vavgvnyyl pbeerpgyl gubhtug jnf fheryl uhzna, naq fgvyy yvsr V jnf qbhogshy nobhg orpnhfr gur evtug-unaq pnaqyr-qevc ybbxrq vzcynhfvoyr).)
Main takeaway is that rot13(NV vf cerggl tbbq ng vzcerffvbavfz abj).
[EDITED to add: oops, the one I thought I mis-corrected from human to AI _was_ in fact AI, so that wasn't a mis-correction.]
zhfphyne zna is obviously AI if you look at the background, though it is easy to get distracted from that by how good the faces are. (Also, the hands look substandard, but that's no longer a reliable sign.)
Wait, which one is the human artwork? zhfphyne zna is AI.
Yes, we can consider impressionism to have been cracked as far as the man on the street is concerned. It would be good to have an expert onboard telling us how to distinguish AI impressionism from actual impressionism.
I submitted the form before checking, but I think I got 33/50 and guessed 70% - 80% certainty. I'm fairly ignorant of art in general so I didn't recognize anything here. I also dabble in AI image gen frequently so I know a lot of the telltale signs. I found the images that were more abstract and lacking detail to be the most difficult. They could really go either way and aren't exact enough to determine whether things are wrong or just a deliberate stylistic choice.
This was awesome. I recall seeing a tweet about how "AI art was so easy to identify," to which I responded with plane_with_red_dots.jpg (https://en.wikipedia.org/wiki/Survivorship_bias). Hopefully this survey will convince people that AI art is not at all obvious (and will only become harder to identify in the future).
Wait, do you mean the new addition? I was on the fence about that one precisely for the opposite reasons - because I did not know whether it was normal for a painter of the time to leave out the details it left out. It's very modern in some ways; I was confused because the dress/furniture/hairdo seemed to be from the period it's actually from.
I think that somehow AI models are really good at impressionism ... I'm not saying it's aliens but ... perhaps Monet was an alien AI?
I think it is because impressionism sort of hides the clues that give away some other AI art, it is kind of dream-like the way a lot of AI art is, with many objects and details which almost but not quite match.
I think (without having actually run the analysis to be 100% sure) that strongly impressionist paintings have vastly less information, mathematically speaking, than similarly sized non-impressionist works. The lack of high-frequency components almost forces this to be true. Which means we're simply working with less data for distinguishing them.
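One crude way to actually run that analysis: blurring an image (a stand-in for impressionist smoothing) strips high-frequency detail, so its compressed size, a rough proxy for information content, shrinks. A minimal sketch, using a box blur and zlib as the proxy; the specific functions here are illustrative, not anyone's actual methodology:

```python
import zlib
import numpy as np

def box_blur(img, k=5):
    """Separable box blur: a crude stand-in for impressionist smoothing."""
    kernel = np.ones(k) / k
    # Blur each row, then each column; mode="same" preserves the shape.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)
    return blurred

def compressed_size(img):
    """zlib-compressed byte length as a rough proxy for information content."""
    return len(zlib.compress(np.clip(img, 0, 255).astype(np.uint8).tobytes()))

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(128, 128)).astype(float)  # detail-rich "painting"
soft = box_blur(sharp)                                       # "impressionist" version

# The blurred image carries measurably less information.
assert compressed_size(soft) < compressed_size(sharp)
```

This only supports the weaker point that smoothing reduces measurable information; whether that actually makes human/AI discrimination harder is a separate empirical question.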
I was very well calibrated, but I incorrectly judged several human pieces as AI. Now looking at them more closely, I'm realizing they're not AI... just... badly proportioned or poorly detailed.
Got the majority of the first ones wrong while the last 6 I got completely right (but with low confidence). I'm not an art dude, and I tried very hard to just give an initial impression for the first ones as instructed.
A lot of the AI ones look really convincing on first glance but any inspection of the details gives them away. On the other hand, I said all of these pieces looked bad or like imitations and several are famous pieces so maybe I just don't like visual art.
Really? I thought two of the religious pieces were master-level (I'm an atheist, so I consider myself unbiased in that way). They were both human (phew), though I can see why Scott has wickedly chosen them.
(I think with the all-caps warning, and Scott's rot13'd answer key taking the top comment spot, we should be able to freely discuss down here; I do not want to have to un-rot13 anything other than the answer key).
Whoa! I am surprised Giant Ship was human. I had chosen that as my favorite, and thought of it as a perfect example of what's cool about AI art: dreamlike scenes with lots of detail.
I thought they were all AI, but decided I should play along and choose human for some. I wonder if I would have done better if I had fully believed it was 50/50... probably not.
Giant Ship fooled me too. Especially the upside down towers and the over the top detail you mention. But I guess it was a bit too perfect for AI and ... repetitive. AI tends to make small variations on similar objects even when it makes little sense.
I did not like the image but I did vote it as the one where I was most sure that it was AI :D
1) The rigging was all perfectly symmetrical. If there were 11 vertical ropes on the port rigging, then there were 11 on the starboard rigging. Current AI art can't even count to 5 fingers; it definitely can't count to 11 ropes, repeatedly.
2) All the background ships were a radically different style from the center one, which would be very challenging to prompt an AI to do.
V jnf 100% fher gur pne jnf NV. Gur fvqr zveebe, gur evz yvtugf, gur jurryf ng qvssrerag urvtugf, gur sebag yvtugf. Abguvat nobhg gung pne ybbxrq erzbgryl cynhfvoyr. V nz fgvyy pbasvqrag gung Fpbgg cvpxrq na rkvfgvat NV vzntr sbe gung bar naq zvfynoryrq vg uhzna.
The answer key in the comments inclines me to not trust the results of the poll. I think it would have been better to not post the answers until after the form is closed.
Cheating this test is as simple as doing a google image search. Not cheating is entirely a voluntary agreement. Posting the answer key provides very little help, and since it's ROT13'd, you'd still have to make a conscious choice to cheat.
I wonder, there were a surprising number of AI images with the classic finger issues, among others. Is that still state of the art, or a deliberate inclusion to make a representative sample of AI art?
Did fairly poorly I think, but am fine with that as I don't much care for most of these art styles to begin with; it's easy to fake an image the eyes endeavor to veer away from.
Was extremely disappointed, however, that I missed on an image I'd already seen elsewhere. I forgot the context and guessed wrong.
I will say "picture" is somewhat vague; there were a couple I spent a while debating whether it was an AI image, or a photograph. I'm not sure whether or not that was supposed to be in the scope.
Got 61% (30/49) using the “is this utter garbage?” + “is there an insanely low taste/technique ratio?” heuristic.
I’m curious what would happen if you compiled this test but it was like… good artists only, though taste is subjective.
I’m quite surprised Mediterranean Town and Rooftops are “AI” in that yes they do the “mix 3 styles together” thing but it actually fits well enough I assumed it was intentional.
Bucolic Scene, Fancy Car, Greek Temple, and Giant Ship are prime examples of “the taste to technique ratio here is so skewed I don’t even want to start comprehending the amount of trauma these humans went through”
I must admit that, as a dilettante who doesn't appreciate art, this test has me utterly shocked at how bad human-generated art can be, and a lot more bullish on the "unless I have a particularly good artist I like, I should always commission a thing from an AI" stance
... what? Fancy Car and Giant Ship I agree, Bucolic Scene is OK as an example of its genre, but I thought Greek Temple was good stuff (and it is indeed by one of the greats, who incidentally sometimes perpetrated very bad anatomy, but not here).
I looked at Fancy Car and Giant Ship for a while because they seemed too good to be AI, but I marked them as AI anyway because surely a human wouldn't spend that much time on drawing these
72% correct guesses, and I guessed that I was 60-70% accurate. Abstract art feels like cheating here though, I have no way of telling whether a human or AI assembled this random jumble of ugly shapes and I wouldn't care either way.
I got all 6 abstract art pieces correct. I think I did better on abstract art than on any other category. Here's how I could tell:
1. Look at the thin lines and small dots. AI has a tendency to blur lines/dots in a way that's not possible with physical paint. (rot13) Va "Natel Pebffrf" (NV) ybbx ng ubj gur guva yvarf ner oyheel naq unir vapbafvfgrag yratguf naq guvpxarffrf, naq va "Senpgherq Ynql" (NV) abgvpr ubj va gur jvaqbj-ybbxvat ovgf, gur juvgr naq qnex cnegf xvaq bs oyraq gbtrgure. Jurernf va "Perrcl Fxhyy" (uhzna) gur fgvgpu znexf ner jryy-qrsvarq naq unir pbafvfgrag nccrnenaprf.
2. Look at the watermark. AI always puts the watermark in the corner, and usually the watermark is not an actual word. (rot13) "Checyr Fdhnerf (uhzna) unf n jngreznex va gur 2aq fdhner sebz gur evtug ba gur gbc. Na NV jbhyq unir chg gur jngreznex va gur pbeare. Naq vg'f xvaq bs vyyrtvoyr ohg vg ybbxf gb zr yvxr npghny yrggref. Va "Juvgr Oybo" (uhzna), gur jngreznex vf va gur zvqqyr bs gur vzntr naq vg'f uvtuyl fglyvmrq.
Yeah, I think the thing about physical paint is really important. For human artists working in physical media, the texture of that medium becomes a part of their art, and a part of the visual language that they're using to communicate. In theory, AI could reproduce that, but it's likely to mess it up if it doesn't have experience of how the physical paint runs.
Anecdotally, I've gotten a lot of compliments on the art in my latest post, which is AI generated. Makes me wonder if people would be as enthusiastic if they knew that.
“Even in his portraits, Ingres did not always stick to strict realism. For example, his 1856 Portrait of Madame Moitessier may seem inoffensive until you look at the subject’s hands. They seem unnaturally shaped, as if they’re made of rubber, fingers bending this way and that.”
I didn't bother fully scoring myself but I did check the ones I was most confident on (Fgevat Qbyy, Napvrag Tngr, Zvanerg Obng, Qbhoyr Fgnefuvc, Zhfphyne Zna), and was relieved to see that at least I got all of those right. The approach I take to recognising AI art tends to rely on things it often gets wrong, but there's less it always gets wrong, so it's harder to be confident something is human-made than AI-made.
I think a more interesting AI Turing test would be for _very_ specific prompts and comparing how close the AI gets it vs how close the humans get it.
Compare an original New Yorker cover vs the AI version of it (I've included the prompt I've used): https://imgur.com/a/XMEzvPT. It's clear that the AI did an OK job but still not enough to replace the human illustrator. So it would be interesting to have a 1:1 competition of human vs. AI, having people vote which one did a better job. The true Turing test is whether The New Yorker can fire their illustrators and just pay $10/month to Midjourney.
I got 38/50 right. I had a pretty hard time picking which one I thought was "most human" because I could see potential AI signs in all of them, but turns out that out of all the ones I marked as human, every single one was human. So I systematically overestimated AI art's capabilities.
Edited to add: Actually I'm not sure I'm overestimating AI's capabilities, I just wasn't using priors in the correct way. I marked an image as AI if it looked like it *could* be AI, but what I should have done is marked it as AI if it looked more AI-esque than the median picture in the survey. For example (rot13) V'ir frra NV vzntrf gung ybbxrq n ybg yvxr gur oyhr navzr tvey, V guvax vg gbgnyyl pbhyq unir orra znqr ol NV, ohg gurer jnfa'g nalguvat gb fcrpvsvpnyyl vaqvpngr gung vg jnf NV. Naq gur oynpx navzr tvey jnf irel boivbhfyl NV.
I marked a few human pictures as AI due to incorrect details, for example (rot13) va "Ohpbyvp Fprar", gur ovyybj bs gur lryybj qerff ybbxf haangheny, yvxr vg ybbxf yvxr gur obggbz bs gur qerff vf fgvpxvat hc bire ure srrg, ohg onfrq ba ubj fur'f fgnaqvat, gurer'f ab jnl ure srrg pbhyq or gurer. V gbbx gung nf gur NV abg haqrefgnaqvat nangbzl, ohg nccneragyl n uhzna qerj vg yvxr gung sbe fbzr ernfba. Naq va "Frerar Evire", gur ybar gerr'f ersyrpgvba vf pyrneyl jebat.
See, I marked "Frerar Evire" immediately as human, simply because I'd seen impressionistic paintings where reflections are painted in that way. It's a stylistic quirk. At the same time, AI can copy quirks...
I think I got impressionist paintings wrong the most because they often get minor details wrong. I kind of knew that (that's why they're called "impressionist", right?) but I figured real paintings would have *blurry* details, not precisely-incorrect details.
42 out of... how is everyone getting 49? There are 50 pictures. 84%. My biggest regrets:
Gebcvpny Tneqra - V jnf irel ba gur srapr ohg vapbeerpgyl qrpvqrq na negvfg jbhyq unir orra zber pbafvfgrag jvgu pbhagvat gur fgebxrf va gur erq gvyrf.
You are not alone. I was guessing that some digital artists are now influenced by AI (seems insane), but I'm told (see above) that this sort of thing was already a thing. I still think that whoever made this image had seen too much AI art.
I found the artist who produced Zhfphyne Zna, and this may be one of his "digital" works. So, you may be right that it was AI-generated. How sure are we that Scott did his homework on all these?
BTW, Did you mean Tvnag Fuvc or Zhfphyne Zna? To my mind, Tvnag Fuvc is so slickly graphic-artsy that it would be impossible for me to decide whether it was done by a human or AI. I got Zhfphyne Zna below it wrong, but then I looked more closely at the details and I realized the clues that I had missed by clicking through too quickly.
Hey, we weren't supposed to give the answers away! ;-)
I clicked through Zhfphyne Zna quickly because I started to get bored about halfway through. Yes, it is definitely AI — but I didn't look carefully enough at the details the first time through. D'oh!
If we're talking about "Giant Ship", I looked it up in Google Images. The image came up as "Victorian Megaship" by Mitchell Stuart. He's a fairly successful Sci-Fi/Fantasy illustrator/artist. You can find the image in the top row of images under his Concept Art section. Unless Mitchell Stuart is also your ACX-reading AI hobbyist, the person who gave it to you stole the image from Mitchell Stuart.
What I found most interesting is that all the anime, science fictiony, and simplistic graphic art generally defeated me. But AI is scraping lots of graphic images, so I would have no clue whether any of the anime girls were created by someone using Adobe Illustrator, pen and ink and airbrush, or AI. Likewise, how would I know if say a human or AI created "Fancy Car"? It's a slick piece of graphic art without any clues in the image as to the technique used to create it—i.e. it could be an actual photo of a futuristic model car that was shoved through Photoshop, or it could have just as easily been produced by AI. And if it were produced by Photoshop, isn't there an element of AI in Photoshop now?
Yes, some of the tools in Photoshop use AI, which makes them somewhat more efficient, though not miraculously so. The most AI thing going on in Photoshop is that you can remove an object from a picture -- say a telephone pole from an otherwise purely natural landscape -- and an AI part of Photoshop will replace it with stuff that matches the blank spot's surroundings. You can also extend a scene by having the AI feature add more seashore or whatever onto the sides of the image you started with. You can also put in a prompt: "In the area I selected, remove what's there and put a redheaded teenage guy in the space." Even without AI Photoshop is excellent at polishing images -- things like the fancy car -- in different ways to make them pop.
My guess is that they would do a lot worse than humans. If that weren't the case they'd be able to generate more human-like images themselves using something like the more classical GANs with one LLM acting as an "art critic" discarding generated images of a generative model which has to pass its test before the final image is released. But I am not up to date with how the newest genAI models work so it might be that something like this is already employed.
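The setup described above, a generator whose outputs are filtered by a separate "art critic" model, is essentially rejection sampling with a learned discriminator. A toy sketch of the loop, where `generate_image` and `critic_score` are purely hypothetical stand-ins for real model calls:

```python
import random

def generate_image(prompt):
    """Hypothetical stand-in for a generative model call."""
    return {"prompt": prompt, "human_likeness": random.random()}

def critic_score(image):
    """Hypothetical stand-in for an 'art critic' model's human-likeness score."""
    return image["human_likeness"]

def generate_until_convincing(prompt, threshold=0.9, max_tries=100):
    """Rejection sampling: keep generating until a candidate passes the critic."""
    for _ in range(max_tries):
        candidate = generate_image(prompt)
        if critic_score(candidate) >= threshold:
            return candidate
    return None  # the critic rejected every candidate

random.seed(0)
result = generate_until_convincing("impressionist riverside cafe")
assert result is None or result["human_likeness"] >= 0.9
```

The commenter's point is exactly the catch with this scheme: if the critic could reliably spot AI images, the pipeline would already use it to filter outputs, so any image that reaches the public has, in effect, already passed such a critic.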
I asked Claude about all 50 images, in batches of 5, then gave it the answers and had it summarize how it did:
> Thank you for providing the correct answers. I'll calculate the results based on my previous responses and the actual answers you've given.
>
> Said human that were AI (False Negatives): 12
> Said human that were human (True Positives): 18
> Said AI that were human (False Positives): 7
> Said AI that were AI (True Negatives): 13
>
> Total correct (True Positives + True Negatives): 31 out of 50
> Accuracy: 62%
> This result shows that I had some difficulty distinguishing between AI-generated and human-created images in this test, particularly in identifying AI-generated images as such. It's a valuable reminder of how sophisticated AI-generated art has become and how challenging it can be to differentiate it from human-created art in many cases.
That beats my 54%. It only recognized two of the classical paintings by name, so likely only around +2% expected score from art memorization.
So embarrassed I fell for the Wyeth trap. (Somebody needs to confess here before everybody insists they never would have been taken in by something so radically and obviously different.) Eyewitness memory research rings true, go figure.
Also in the form comments I went on a long rant about how sad I was that ai couldn't yet do one of the ones I loved the most, only to find it was made by AI, in the most cliched twist ending possible.
I am apparently the exact target here. This is my thoroughly deflated hubris. Enjoy and very well played!
On the main body of the survey I was correct 58% of the time, but on the "analyze these pictures more deeply" section I got 6/6 correct even though my median confidence was 65%. Interesting to see that taking time and thinking deliberately made such a large difference.
I'll be very interested to see if this replicates in the full data set.
75%. I was mostly fooled by the impressionist and classical ones, and on second look some of the classical ones I marked as human had obvious tells that I missed. I got all the ones where I was most confident correct, and this included ones that I was confident were human, and confident were AI.
I found the classical art selections very odd; having seen a number of these pieces in person it seems really bizarre to compare digital thumbnails of them to actual digital art - the pictures of "classical" art work are signifiers of the works, versus the digital art which is the work itself.
This made it very weird and it felt not like a real test - I would have much preferred all digital art because it would have then felt like everything was on an equal semiotic par.
Scott's title probably isn't fully serious, but I'd like to point out that in order to judge this test well people need to be familiar with art of different kinds & from different eras, and there probably are not that many people who have broad enough familiarity with art to distinguish AI from the real deal for all the different kinds of art represented here. Here's an extreme case of the same problem: asking people to judge whether some code is written by a person or is AI-generated. Someone like me, who can't code, is effectively blind when it comes to looking at code, and could not recognize even very obvious instances of typical AI errors or of forms of errors and weirdness that just don't seem human. There are lesser versions of the same thing going on with judging whether an image is by a real artist or by AI. For example, I am not very familiar with Medieval art, mostly because the examples I have seen are ugly as hell: people look malformed, the colors are murky, nothing gives a sense of things being in motion — a picture of people dancing looks like a picture of people standing in a circle. Because I find it hideous, I have looked at so little Medieval art that I am not in a position to judge whether something looks like the real deal.
I have an idea for a fun test: What if people here, and several AIs, all try to imitate Scott Alexander? Everyone here is familiar with Scott's style of thought and writing, so is in a position to create a sample for a "Turing Test" on a passage purported to be by Scott. We could write a couple paragraphs on a specified topic. The AI could be given multiple examples of Scott's writing, then asked to generate 2 paragraphs making the same point.
Actually the Turing test (sort of) could be done using just AI imitators of Scott, but doesn't having the rest of us toss in some material too sound fun?
Harder than expected. It’s AI if it’s too perfect, yet some human output is extremely banal. Also, what about AI art that is edited by humans and human art that is tinkered with by AI? There should be some type of scale.
> I've tried to crop some pictures of both types into unusual shapes, so it won't be as easy as "everything that's in DALL-E's default aspect ratio is AI".
On one hand this is fair enough, since you told us about it and did it for both types. But I think it makes the test harder (beyond just removing the cheap aspect-ratio based way of recognising AI images) because composition is an important element of visual art, and if you're chopping up the images in arbitrary ways then you're confusing our "something's off here" sensors.
Like, even for Saint In Mountains, where the cropping is arguably fairly subtle, it gives the image a completely different feel. (The colours are also very different from those in the version on Wikipedia; I know reproductions can vary, monitors are imperfect etc., but from a quick search it seems like the version on the quiz is an outlier in terms of how brown/red it is.)
At first glance, I thought Saint in Mountains was probably AI, and the composition definitely played a role. Particularly the water and boat in the background made no compositional sense, and I have seen AI do weird stuff like that before. I did eventually change my mind on this one before submitting, but if the whole painting had been present I would have leaned much more quickly towards "human."
64% correct for me, which is about what I expected to get initially—though I forget whether I answered 60–70% on the expected-correctness question or whether I had a brief fit of self-censure and second-guessed myself down to 50–60%. There's definitely a few that I played myself on, including thinking that one that I'd initially classed as very DALL·E-esque was human due to… well, I'm not sure, it seemed a bit more earnest than usual I suppose.
Also I *just* got the pun in the name “DALL·E”. Goddammit.
I wish the form had presented a grade after I submitted my answers. I don't recall what I chose for all 50, and having spent 20 minutes on it, I can't justify spending more time trying to find where Google kept my submission and manually cross-checking it to our host's list.
Does anyone know what the rules are supposed to be for repeated Turing tests? If humans are initially fooled by a computer, then figure it out and can reliably distinguish it from other humans, then has it still passed the Turing test?
I think there's no canonical answer; the common usage of 'Turing test' has already diverged a bit from Turing's original Imitation Game (https://academic.oup.com/mind/article/LIX/236/433/986238), so it's probably not worth trying to define exactly what it means to 'pass the Turing test'.
I went into this expecting to do pretty poorly, because I'm not much of an art guy at all. I was originally planning to use gestalt impressions, but once I got started I found more subtle errors and anomalies than I expected, and ended up mostly relying on that.
I didn't even try to answer the super weird/abstract/impressionistic ones, because from my perspective they're full of random noise and they could look like basically anything. My model for what these "should" look like isn't strong enough to provide meaningful constraints.
Here are my answers, and some observations they were based on. (This is NOT an answer key; I got many of these wrong.)
Angel Woman - AI - Angel's empty hand looks a lot like the birds; seemed like pattern recognition cranked up a little too high
Saint in Mountains - Human
Blue Hair Anime Girl - AI - Downward arm looks too long with too many joints; hands conveniently hidden; hat not in resting position
Praying in Garden - AI - Bottom right guy seems inexplicably bloody; water spout on left seems insufficient to feed river
Tropical Garden - AI - Leaves seem to sprout from building on right
Ancient Gate - Human
Green Hills - AI - Obvious brownish line seems too narrow to be a footpath; what is it?
Bucolic Scene - Human
Anime Girl In Black - AI - Spent a lot of time analyzing conflicting vibes on this one before noticing there's an anomaly in her hair below the ear
Fancy Car - Human
Greek Temple - Human
String Doll - AI - Right hand middle finger seems to split near fingertip
Angry Crosses - ????
Rainbow Girl - Human
Creepy Skull - ????
Leafy Lane - AI - Path terminates in weird way in the distance
Ice Princess - AI - Anomaly below girl's left eye (viewer's right). Also asymmetrical earring seems odd and face gives me vaguely AI vibes.
Celestial Display - Human
Mother and Child - AI - Anomaly at crook of mother's right thumb
Fractured Lady - ????
Giant Ship - AI - Upside-down domes and spires; weird flower-like protrusions on right; level of detail is not impossible for a human but seems extreme
Muscular Man - AI - Front guy's left hand and back guy's right hand both look weird
Minaret Boat - Human - Boat seems slightly aground at the back but eh
Purple Squares - ????
People Sitting - AI - Girls sitting on table; table melds into back wall; weird vertical plank in front of picture frame
Girl in White - Human - Dunno what those straps are hanging off the far side of the girl's canvas but eh
Riverside Cafe - Human
Serene River - AI - That is a wacky tree alone in center background
Turtle House - AI - Lantern in water on bottom right (the flame inside it looks very similar to the magic floating lights in the rest of the scene, but only this one is surrounded by an incongruous lantern frame)
Still Life - AI - Hovering wax drips
Wounded Christ - Human
White Blob - ????
Weird Bird - Human
Ominous Ruin - AI - Second column from the right aligns with its neighbors on the top but not on the bottom
Vague Figures - ????
Dragon Lady - Human
White Flag - Human
Woman Unicorn - Human
Rooftops - AI - Some of the birds look like ink splatters, as if the artist isn't sure which one they're drawing
Paris Scene - Human
Pretty Lake - Human 65% - Reflections seem a bit off, which I think is a more likely mistake for a human than an AI. But also because I've been picking human any time I can't find a reason not to.
Landing Craft - AI 80% - Weird gun-thing emerging off-center from front window, weird thing coming out back of craft, background clouds seem to meld into smoke plume at right
Flailing Limbs - ???? 50% - Looks intentionally crazy. I can't tell what it ought to look like, so I have no basis for judgment.
Colorful Town - Human 55% - Not detailed enough to provide much evidence, but I've been defaulting to human and still picked AI more often, so I will continue to do so
Mediterranean Town - AI 60% - A bunch of small details look subtly wonky
Punk Robot - ???? 50% - Too abstract to tell
My score: 25 right, 16 wrong, 9 declined to guess. If you give me 50% for the ones I didn't guess then my overall accuracy is 59%, which is pretty close to what I expected. If you only grade the ones where I _expected_ to do better-than-chance then I got 61%.
The picture I said I was most confident was AI turned out to be human. In hindsight, I should have picked a picture with a single tiny-but-indefensible glitch instead of a picture with multiple bizarre-but-visually-continuous features.
The picture I said I was most confident was human, was human.
My favorite picture was AI, and I incorrectly guessed it was human.
Several other commenters seem to have had an easy time with certain pictures I got wrong, by drawing on their experience with popular AI generators and/or famous historical art. I did not have their experience.
I thought Bouguereau, too — or one of his ilk — late 19th-century academic artists. What gives it away is the abstract pattern in the background. Bouguereau would have painted a cloth of embroidered flowers and not random dots of color.
It also has a story to tell! She's inclining her head towards her son and that saintly thing stays in place. That shouldn't happen with those saintly things. And the son doesn't have a halo, and they're both drawn very human, like the picture is saying: you thought they're saints? They're just human and that's just a thing on the wall. "Her love is what matters".
Girl in field - thought the problem with this one was the placement of flowers and other elements of the background that didn't fit a coherent composition. Flowers on the path? In the middle of the grass, not on any stalk at all?
The human operating the AI generator and choosing among its outputs probably cares.
I think I also heard something about people using descriptors in the generator to try and prevent bad hands, and the AI sometimes fulfilling "no bad hands" by not having hands at all.
Let's be clear: this is not a "human artist vs. AI" Turing test, this is a "human artist using paint on canvas (or other flat-media) vs. human artist using AI generative model and multiple iterations of prompt + curation". As evidence, here's Scott's note on the provenance of the images used in the test:
"All the human pictures are by specific artists who deserve credit (and all the AI pictures are by specific prompters/AI art hobbyists who also deserve credit)"
Personally, I've always believed that human curation introduced human intention, like it would with photography. But I've had so many artists tell me otherwise that it seems like a steelman to allow curated AI works under the clear label of "AI Art."
I wonder if AI art can do a decent image of a Rube Goldberg machine. The implied physics, the directionality, and non local interactions seem like they’d be hard for a diffusion model.
It would have been interesting to have a question at the start about how many people expected to get right paired with the one at the end. I think that might be a better way to measure how much harder/easier it was than people expected vs asking directly.
WAIT: This poll and experiment by Scott is not a clean-cut dichotomy between human and AI. I'm not even sure if a suitable gradient can be described.
A majority of the source images used by AI were generated by humans. AI is basically painting with human stuff. On the other hand, so-called human-credited art uses computer-assisted tools and techniques, sometimes to the extreme. Using a 3D art program is an example: Construct a simple bird model, or purchase one. Add a million random points into a scene and then have the program link a bird to each point. Select a backdrop of a sky from the program's assets. Have the program generate an image of the flock of birds in the sky. The final image is the work of the computer program, but the human gets credit. In lieu of "prompts" there are instructional steps. There's not much difference between AI image generation, which requires human prompts, and human-created art that requires sophisticated computer programming.
I agree. A more valid art Turing test would be using real and AI-generated paintings—Renaissance up through Abstract Expressionism. At least with the real paintings, we could be sure that there was no AI involved.
Here are the notes I made while judging them, unedited except to fill in Scott's answers (lowercase single letters). In hindsight I'm slightly embarrassed to have pontificated on how an impressionist artist was clearly thinking about real buildings rather than coloured blobs, but here it is, warts and all.
Contrary to instructions, I zoomed in on all of them, because that's what I'd normally do when deciding "is it or isn't it?"
Apologies to the human artists whose feelings I may have slighted.
1. Angel Woman.
AI. Obviously AI and pretty low-effort.
h
2. Saint in Mountains.
Human. The details make sense. Except, now I see that cross in the distance with some sort of triangular fabric hanging from it, I'm a little unsure.
h
3. Blue Hair Anime Girl
AI. What on earth is wrong with her left arm? And her hat isn't quite on her head. I could believe it human otherwise.
h
4. Girl in Field.
Human. Her posture looks a little stiff, but that's the sort of error a human artist would make (or his model), not an AI.
a
5. Double Starship
Human. A good piece of sci-fi cover art. The details hold up under close inspection.
h
6. Bright Jumble Woman
Human. These abstract images are difficult to judge. I'm looking at the consistency of line across the picture. I don't know what you'd prompt an AI with to get an image like this.
a
7. Cherub
AI. It's pretty good, but the crease in the left elbow isn't right, and its right wing (left in the image) doesn't meet the back properly. And that wing seems to have a thumb on its leading edge.
a
8. Praying in Garden
Human. Details, composition, story-telling, they all hold up under inspection.
h
9. Tropical Garden.
Human. The style is not to my taste, but I see a consistency of execution throughout.
h
10. Ancient Gate.
AI. There are probably a few artists in the world who could create this staggering quantity of detail, everywhere repeating but nowhere exactly the same, but an AI can churn it out effortlessly.
a
11. Green Hills
Human. A very consistent pointilliste style. Even the distant houses are clearly that, not vague distortions. The artist was thinking about real houses, not imitating blobs representing houses barely seen.
a
12. Bucolic Scene.
Human. The legs of the humans and the cattle are right.
h
13. Anime Girl in Black
Human. Because I want her to have a soul. I can see the incoherence of her belt buckle close up, but I want to believe. I know how irrational that is. My favorite image.
a
14. Fancy Car.
Human. Obviously made in a 3D modelling program, with some standard effects applied: motion blur, reflectance (unconvincing) from the damp road, glowing wheels, greebling on the front and side. Not professional quality. An AI prompted to make this would make something much more realistic, and no-one would have reason to instruct it otherwise.
h
15. Greek Temple
Human. I wavered on this, but all the details I initially identified as not making sense, I eventually realised did make sense. I'm not entirely convinced, but on balance this is where my vote goes.
h
16. String Doll
Obviously AI. It just has that look about it. Dodgy fingers too. And that strand that looks like she's sucking a noodle! A human artist wouldn't make that sort of accidental alignment.
a
17. Angry Crosses
Human. Another abstract, difficult to judge. I decided human because why would anyone try to get an AI to do this?
a
18. Rainbow Girl
Human. Digital medium, obviously, but I'm not seeing the usual AI tells.
h
19. Creepy Skull
AI, but who cares, really? I'm not interested in this picture, however it was created.
h
20. Leafy Lane
Human. No more to be said.
a
21. Ice Princess
AI. You're very nice, Ice Princess, but I don't believe in you.
a
22. Celestial Display
AI. The figure leaning against the tree doesn't convince me. Neither do the random sparkles on the shore and near waters.
h
23. Mother and Child
Human. Everything works.
a
24. Fractured Lady
AI. Another difficult abstract. AI, on balance.
a
25. Giant Ship
AI. Like Ancient Gate, this has a staggering amount of detail. And I found somewhere in there a dome that does not seem to sit properly on the structure supporting it.
h
26. Muscular Man
AI. Fingers.
a
27. Minaret Boat
AI. Like Ancient Gate and Giant Ship, the detail is enormous, but it breaks up on close inspection (e.g. the window bars) in a way that I think is not just because of the resolution of the image.
a
28. Purple Squares
AI. All of these abstracts are difficult to judge, but this one seems to have nothing behind it.
h
29. People Sitting
Human. Presumably not a Vermeer, which would be too well-known, but an artist of that time and place. Of course, an AI can be told to make an image in that style, but the details here are all convincing.
h
30. Girl in white
This picture was missing when I took the test. Scott later posted a link to it. I saw the answer before the image, so these comments are made knowing that it's AI. [ETA: I misread Scott: this picture is by a human.]
I find this very convincing except for a few details. She has a single ornamental stick in her hair, and I would expect there to be a pair. Her eyes don't match. Her right eye is looking straight at the viewer, but cover it up and her left is pointing oddly upward. The sleeve of her dress is too close-fitting. In the distance, the gentleman's left arm is a bit strange.
Despite that, I would probably have judged this human. I can read a whole narrative into it. She is the spurned lover of the gentleman in the distance. She gazes with sorrow at his new dalliance through a broken window in her poor rooms across the street from his grand palazzo. Perhaps she is painting a picture of herself and him together, to be prominently displayed somewhere to accuse him?
31. Riverside Cafe
AI. Why would a human artist bother to paint an imitation of Van Gogh's famous picture? And the legs of the chairs taper off unnaturally. AI has just as much trouble with furniture legs as animal legs.
a
32. Serene River
Human. That isolated bare tree in the middle is odd, but it's the only thing that I find suspicious here.
h
33. Turtle House
AI. I'm not at all sure though. At first glance it definitely has the AI style, but I could not find any other definite tells.
a
34. Still Life
AI. I can believe the fruit, but not the candle drips.
a
35. Wounded Christ
Human. Good hands.
h
36. White Blob
AI. Ok, it could be Miro, but I'm guessing not.
h
37. Weird Bird
AI. There is a type of line work that is typical of AI and I'm seeing it here in the columns and in the greenery at lower right.
a
38. Ominous Ruin
AI. The temple has bad legs.
a
39. Vague Figures
AI. Too vague.
h
40. Dragon Lady
AI. The combination of high saturation, contrast, and detail is characteristic of a lot of AI images. This might just reflect the sort of pictures that people using AI want to make, but AI it is. And the dragon's wings are different sizes.
a
41. White Flag
AI. Too many details making not quite enough sense.
h
42. Woman Unicorn
Human. Enough details making quite enough sense.
h
43. Rooftops
AI. Not at all definite about this, but on balance I'll say it's AI.
a
44. City Street/Paris Scene
Human. The consistency of the perspective all along the street catches my eye here.
a
45. Pretty Lake
Human. None of the usual tells, although I'm a little suspicious of the mud cracks on the path in the foreground.
a
46. Landing Craft
AI. A good piece of 50s/60s sci-fi cover art, but the details give it away. The way the right arm of the man on the right tails off. The placement of the feet of the vehicle. The random sparkles in the clouds at the right, and above them the vague aircrafty thing. The impossible moon that seems to be just half a mile away.
a
47. Flailing Limbs
Human. Why use an AI to make this? I don't care what it is.
h
48. Colorful Town
Human. Even the small human figures and the cart make sense.
h
49. Mediterranean Town
Human. It's a very striking image, a Mediterranean view in the style of de Chirico, who I'm sure never painted anything so bright. Clean execution, no AI tells.
a
50. Punk Robot
AI. Another difficult abstract. But the eyes don't match, and the teeth don't look right. I don't know what "right" would be, but these aren't.
Interesting. I made the exact same error on Angel Woman. I had it marked as obviously AI. Apparently, some human digital artists don't study anatomy before getting that good at the fancy bits of the picture.
White Flag was actually my pick for most obviously human. The biggest tell there, which I'm convinced no current AI would think of, is the woman in front with a kid on her back, and her arm wrapped behind her to support the baby as she's bent over and struggling forward. That's a posture I've adopted with a kid wrapped on my back and so is very real-feeling, and I don't think you could prompt AI to put her arm behind her back without generating monstrosities.
Angel Woman was the one I marked as most obviously human. The soldiers had Imperial Guard helmets and the guardsmen, angel, and cherubim all followed the 40k setting style. I would have expected the cherubim to either be generically angelic or generically monstrous depending on prompt if it were AI: it can't handle making a specifically 40k cherubim while focusing mostly on the guards and angel.
Maybe it's cheating but I recognized 3 of the paintings right off the bat.
#2 Saint in the Mountains is a famous painting of St Anthony by the Osservanza Master. I immediately marked it as human. But it's a really shitty reproduction, and I wondered if it might not be AI recreating a famous old master painting.
#8 Praying in the Garden is also a famous painting. I recognized it, but I couldn't remember the artist (I just looked it up and it's Agony In The Garden by Andrea Mantegna)
#30 Girl in White is a famous painting in the collection of the Met. Originally it was thought to have been painted by Jacques Louis David, but there was a big art history whoop-de-doo when it was identified as a work by a less-famous painter, Marie-Denise Villers. It's one of my favorite paintings from the late 18th-early 19th Century.
I picked "Girl in White" as my favorite, and it was my second choice for most obviously human. The particular drama of the image, with the couple in the background and the girl looking like we have caught her spying on them, didn't seem like the sort of thing AI can manage at the moment, no matter how carefully you tune the prompts.
But I picked "Greek Temple" as most obviously human, since I didn't think AI would be able to (i) cut and paste those individual portraits in different historical styles, like a 19th-century photoshop job, or (ii) produce grammatically correct Greek inscriptions on the steps.
Unless I miscounted, I got 76%, which is lower than my guessed 80-90%. I was surprised that fbzr avpr vzcerffvbavfg barf jrer NV, naq fbzr irel gnfgryrff zbqrea barf jrer abg NV
I thought I got at best 70% right, but checking the answer key I was right easily more than 80% of the time.
Here were my intuitions that I think guided me very well:
- "This looks hard to write a prompt for"
- "This is something I've almost never seen before, and I doubt it is well represented in anyone's training set"
- Has obvious midjourney/"straight out of the tube" AI art vibes (requires being familiar with what those subtle tells are). Every digital art package from Flash to Photoshop, even Unity or the Unreal Engine's default lighting produces subtle tells like this when people don't take care to deviate from default settings, and if you get familiar with it you start to catch on.
- Multiple characters in the same scene is currently hard to pull off with AI with a one-shot prompt; from what I've seen you still basically have to rely on many individual generations and some kind of compositing / traditional digital art skills to make it work. The chief weakness is that too much of the essence of one character in the prompt will leak into the other characters. So a varied crowd, all with distinctly different clothes, is a decent tell for non-AI art.
- Uniformly high detail throughout. Detail is cheap for AI but is expensive for humans. It's rare for humans to pack an extreme amount of detail into every inch of the canvas.
All that said, if you had time travelled back to 2019 and dumped this test on me, I would have scored no better than chance.
The biggest lesson here for me is that Turing tests are themselves a moving target, because as AIs become real things that we interact with more every day, we start to recognize them. The big question is when their ability to mimic us will outpace our ability to pattern-match on their eccentricities.
I doubt it should be called a Turing test. For a Turing test, Scott Alexander would have someone select human drawings that would be recognized as human drawings. He himself would have to find neural-network drawings that would appear to be human drawings. If he, consciously or not, chose human drawings that look like neural-network drawings, he simplified the task for the neural networks.
There's no auto-scoring; it's just a form submission, like surveys Scott's done in the past. For me it said something like “Your response has been recorded.” and that was it.
I have a feeling that I did much better with images that contain people than with those that don't. It might be interesting to see other people's success rates on this split.
Absolutely. It's surprising how helpful human figures are. But only if there are actually visible details... the super-impressionist paintings have so little information that it's extremely hard to tell.
In art classes, they will usually say that we all know what human beings look like, so any mistake you make in drawing them will be very obvious. (While other subject matter you have much more leeway for making mistakes without them being noticed).
(Comment contains un-rot13’d image names with spoilers)
I think all my errors were of the form “assuming something is AI when it’s actually human”. A while through it I started to suspect it was all AI, because a lot of them were very obvious with the typical sorts of AI tells, and I was expecting this to involve a curated sample of AI art chosen to look especially convincing, which probably pushed me to err on the side of AI more than I might have otherwise. Instead, it feels more like a lot of the human art was curated to involve some weird or nonsensical elements, which I suppose I should have expected. Most of the ones I got wrong were ones I remember feeling less sure about; even Giant Ship, which I thought had to be AI, had me looking at the rigging expecting to find particular sorts of tells that I then didn’t find, which should have given me pause a little more. My pick for “most confidently human” was Tropical Garden (very genuine human errors/weirdness rather than AI weirdness) and for “most confidently AI” it was String Doll (just very obviously nonsensical strings, in the particular AI sort of way rather than a human way), which were both correct. Angel Woman was my favorite.
As someone who does some (amateur) art and runs in artist circles, the usual take on AI art there is “hating it with a passion, wanting nothing to do with it ever, and also seeing it and identifying it all the time because people are constantly posting it and trying to pass it off as human-made”, which was not represented in the options on the “experience with AI art” question.
These are MUCH better than most AI art I've seen, what programs/ models/ etc. were used? I'm particularly impressed by qentba ynql, which I only realized was AI when I went back to it and realized that the ersyrpgvba jnf fhogyl qvssrerag sebz gur znva vzntr va jnlf gung jrer qrsvavgryl whfg jebat engure guna checbfrshy.
I don't think that this is a valid AI art Turing test, and I don't think that such a thing is even possible.
Part of the formula for the Turing test is that it is an interactive process conducted in real time.
The nature of art alone makes the real time part impossible.
This "test" would be like taking a transcript of several textual Turing tests, cutting out individual prompts and responses, and asking a third party to evaluate them out of context.
I see no way that anyone could be certain that an image was AI generated if the person creating the image spent enough time refining the prompt and weeding out all but the best attempts. The one shot attempts off a simple prompt are unlikely to be convincing, at least for now.
Are we determining whether AI can create art like a human or whether some humans create art like an AI?
After checking the answers, I would argue that much of the human art could have been AI-generated and isn't very good. Certainly some of the futuristic/fantasy art almost has to have detail created in an AI-like way.
Doesn't the strategy you use to hand-pick the human art strongly bias the test result? Like, it's a very different test if you pick "random human art" versus "human art that is trying to distinguish itself from AI" versus "art that I, Scott Alexander, think is appropriate for this test".
Worth noting a selection effect: someone who looks at every one of these and is like, man how am I supposed to know?, is someone who is less likely to complete the quiz. (I didn't complete it myself, for this reason.) I guess that filters for respondents being higher-confidence than your readers in general.
EEGs taken of viewers while viewing the art reveal that real artworks elicit a powerful positive response in the precuneus area of the brain — much greater than their response to reproductions. This is according to a study sponsored by the Mauritshuis Museum.
If I understand the press release, Vermeer's Girl With the Pearl Earring is especially stimulating to precunei, while reproductions are much less so.
As a museum addict, I can attest to the difference between experiencing a real painting up close and personal and viewing an ultra-high-resolution image of the painting. I'm still mulling over the implications for AI art versus art made by human-intent. Maybe I'll have something to say on this subject for the next Open Thread.
Some anti-AI people on Tumblr encountered this, came to the conclusion it was an AI training data thing, and deliberately submitted incorrect responses to sabotage it. I hope this doesn't mess up the results too much. One of them in particular suggested answering "AI" for every single question.
I understand that you're no longer accepting submissions on the form.
Is there any way that you can edit the form, so that instead of collecting the information on your spreadsheet, it just e-mails the person their own answers? I'm teaching a class on AI literacy next term, and I think this would be a nice exercise to have students do for their own edification, with a record of their answers for when I tell them to read your follow-up post.
You can copy & paste the entire comment into rot13.com to unscramble it. One of the letters stands for H (human), the other for A (artificial)
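If you'd rather not paste spoilers into a website, the same unscrambling can be done locally. A minimal sketch using Python's standard-library `rot_13` codec (the function name `unrot13` is just for illustration):

```python
import codecs

# ROT13 rotates each letter 13 places around the alphabet, so applying it
# twice is a no-op: the same call both scrambles and unscrambles.
# Non-letters (spaces, colons, digits) pass through unchanged.
def unrot13(text: str) -> str:
    return codecs.decode(text, "rot13")

print(unrot13("Natry Jbzna: U"))  # -> Angel Woman: H
```

Because ROT13 is its own inverse, running the decoded text through the function again returns the original scrambled form.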
I agree, Lisa. There might be a false dichotomy here all the way down. All the art on the list is technically human-made because there was a human using tools to make it -- whether digital tools or old-fashioned hand tools. I don't see how there's a big difference between "human using Adobe to make digital art" and "human using AI to copycat other humans' art, with further tweaks as specified by the human." Is the difference really significant?
The process used to generate AI art involves a series of transformations of noise. The process used by humans to generate digital art usually involves making strokes with a stylus on a pressure-sensitive tablet. The processes are quite distinct. Though neither is exactly the same as the process used by an artist working on canvas, the human process is much more similar than the AI process.
ANSWER KEY (https://rot13.com/):
Natry Jbzna: U
Fnvag Va Zbhagnvaf: U
Oyhr Unve Navzr Tvey: U
Tvey Va Svryq: N
Qbhoyr Fgnefuvc: U
Oevtug Whzoyr Jbzna: N
Pureho: N
Cenlvat Va Tneqra: U
Gebcvpny Tneqra: U
Napvrag Tngr: N
Terra Uvyyf: N
Ohpbyvp Fprar: U
Navzr Tvey Va Oynpx: N
Snapl Pne: U
Terrx Grzcyr: U
Fgevat Qbyy: N
Natel Pebffrf: N
Envaobj Tvey: U
Perrcl Fxhyy: U
Yrnsl Ynar: N
Vpr Cevaprff: N
Pryrfgvny Qvfcynl: U
Zbgure Naq Puvyq: N
Senpgherq Ynql: N
Tvnag Fuvc: U
Zhfphyne Zna: N
Zvanerg Obng: N
Checyr Fdhnerf: U
Crbcyr Fvggvat: U
Evirefvqr Pnsr: N
Frerar Evire: U
Ghegyr Ubhfr: N
Fgvyy Yvsr: N
Jbhaqrq Puevfg: U
Juvgr Oybo: U
Jrveq Oveq: N
Bzvabhf Ehva: N
Inthr Svtherf: U
Qentba Ynql: N
Juvgr Synt: U
Jbzna Havpbea: U
Ebbsgbcf: N
Pvgl Fgerrg: N
Cerggl Ynxr: N
Ynaqvat Pensg: N
Synvyvat Yvzof: U
Pbybeshy Gbja: U
Zrqvgreenarna Gbja: N
Chax Ebobg: N
TIL the Substack app doesn't have copying from comments
I can highlight and copy on iOS
omg not android discrimination
Think it may be the browser specifically, or else a site bug: copy from comments (or anywhere) works for me fine on Firefox Android, but weirdly today in Firefox Desktop I had to disable Javascript to get copy to work, which I've never had to do before...
I meant the app, like the app app. I didn't think they would interfere with browser text.
Ahh, I see! 'fanks for the clarification!
Did you try to copy too much at once?
nothing shows up on hold. it works (kinda badly) on post body
For users of the Android app: Click on the three button menu to share the comment to your email, and then look at it on your desktop or laptop
There are only 49 pictures, not 50 :/
That's part of the test.
As in, anyone saying they expect to get 50% right wasn't paying enough attention?
The 50th image was a captcha.
There's now a picture called "Girl in White" that I don't remember being there when I did the test a few hours previously (and which isn't listed in the Rot13'd answer key).
When I did it on my phone, there were only 49, but now when I open it on my computer, I see there's an extra image that I don't remember ("Girl In White"). So I'm not sure what's going on.
When I opened it on my computer, I didn't see Girl in White either; then I reloaded the page later, and it showed up.
And that one is missing from the key.
I haven't unrot13-ed it yet, but at least this allays my concerns that you were punking us and everything is either by humans or by AI.
I thought about it, but I really wanted to know how people would do on this and that would invalidate the test.
while taking it I was terrified it was gonna be 100% AI
Good god
Qbhoyr Fgnefuvc maker, why do you hate symmetry??? Why??? Why can’t we have nice things? Why couldn’t you make the ebpxrg abmmyrf the same on both sides?
I feel similarly about Envaobj Tvey.
I actually picked that as my most confident as being human. I think an AI wouldn't have made the glints on the eyes identical or had the eyes looking in a perfectly consistent direction. I also thought that in general it was just amateur-ish enough where an AI would have done a better job with the lighting like on the ear (a lot of artists don't realize that ears need to be more red when lit due to subsurface scattering). Likewise the eyes were what gave away Tvey Va Svryq, which I picked as most confident as being AI.
I just had to flip a coin on that one. No strong indicators either way.
Surprising. That was the one I picked as most confidently human. In my experience AI tends to be particularly bad at that sort of precise geometry.
+1, this was my #2 highest confidence human (behind Terrx Grzcyr). It's perfectly symmetric in every way I expected it to be, including the ebpxrg abmmyrf (there are three of them, right?). Also I've watched a lot of ebpxrg ynhapurf, and all the small details I thought to check had a recognizable function and were placed in sensible locations.
I think all the rest of it was very symmetric, that's mostly what I went by!
It is symmetric, there is one in the middle and one on each side (though I think one is slightly off). That was the one I was most confident about.
I think the perspective may be slightly off with the nozzles - which is human :) It was the one I was most confident about as well...
Just out of curiosity, how did you determine which images were AI and which were human? There are images on the internet being passed off as classical or original art that are actually AI generated.
And I guess it's not impossible someone might post images as AI creations that they have actually carefully edited by hand.
The human ones were either from a famous artist, or made before 2018, or on Deviant Art by an artist who showed their work (eg preliminary sketches), or something similar.
The AI ones were mostly generated by volunteer ACX readers, although a few were taken from AI art sites.
Jbhaqrq Puevfg: U
I was really surprised by that one; the anatomy just feels off (especially the belly). In fact, this feels close to someone doing this on purpose in order to fool people for a test like this.
Maybe this sort of thing is why Michelangelo had to dissect all those corpses.
I think I'd seen enough terrible medieval art not to be fooled. In fact, I was pretty sure it was human just because while the anatomy was terrible, there were definitely 5 fingers on every hand.
I was pretty sure this one was fed into AI to generate Zhfphyne Zna
Oh interesting, that one was clear to me. The blood was flowing from his side in the way it would have been when he was on the cross (vertical), which it why it looks odd when lying down. That's the exact kind of detail artists of that time liked to use to show off their attention to detail and knowledge.
AI just isn't there yet to make those kind of second order physics or anatomical connections without an incredible amount of detailed prompting and retries.
That was actually the one I was most confident was human-made. Mainly because gurer jrer n ybg bs unaqf va gur fprar, naq gurl nyy unq gur evtug ahzore bs svatref.
I've literally seen it before so that was cake lol
Me too! The most human of them all I thought. The mistakes seemed like renaissance human mistakes and not AI ones.
interesting, I put this as my most confidently human
This was both my favorite and the one I put as most obviously human. There are a lot of hands, and none of them are fucked.
My artist boyfriend says, from looking at the painting: This art is by someone who's huffed a lot of Catholic art and is reproducing a very specific thing. It looks weird in part because they're reproducing old master work, where the old master work looks weird because of the dominant style at the time.
Interesting. I did better on the first questions where I sped through using intuition. I did poorly on the last 5 when I had to justify my answers.
Same here. I only got 6 wrong in the previous 44, but got 4 wrong in the last 6.
I think maybe the last five were chosen to be the most surprising.
Maybe. I’d like to know
You missed Girl In White.
ETA: I see someone else pointed this out and Scott posted the answer below.
I do not see his response anywhere. What’s the answer?
Thank you for noticing my mistake! I've added it in. The picture is https://lh6.googleusercontent.com/aioJmwtNB87RO8KHikGPZH2krgR6vxE2wO3O06siFZXH3r6hD8dDndsZl5ty2DIRHOrBbt-LjwReWFcTL-70Uk6bEtqA7M58VcEuZz7nEEZyYopkmvcVe3iih2h4X2iF5w=w740 , and it vf neg znqr ol n uhzna.
Did you leave out tvey va juvgr on purpose as a control?
Tvey va Juvgr is still missing from the answer key, as pointed out by someone else.
Huh, AI got much better at getting fingers right while I wasn't paying attention
I may be mistaken, but has "Cnevf Fprar" been missed from the answer key?
It's probably there under another name.
Are you sure oyhr unve navzr tvey jnf uhzna? There are obvious mistakes, like "fur unf ng yrnfg guerr ryobjf" and "ure rlroebjf cnegvnyyl pbire ure unve" and "ure unve pyvcf va naq bhg bs rkvfgrapr" that make it seem like gur negvfg jnf pbafpvbhfyl gelvat gb rzhyngr NV neg if so.
Yeah, I got that wrong too based on arm anatomy. But I guess human artists can get it wrong too.
For me, this was the easiest one to identify as human because V'ir frra n jubyr ybg bs navzr cvpgherf va zl yvsr. Gur pyhaxl fglyr jnf n qrnq tvirnjnl gung guvf jnfa'g NV orpnhfr nyy gur NV navzr cvpgherf lbh svaq ner zhpu orggre ybbxvat.
Vqragvslvat gur NV navzr cvpgherf jnf rnfl gbb, orpnhfr n pregnva cresrpg Xberna snpr fglyr vf fhcre cbchyne.
Yes, all are sourced for certain.
I was initially confused by gur ryobj guvat, but I think gur guveq ryobj vf n jevfg, naq znlor gur ybjre unys bs gur unaq jnf pebccrq bhg.
I got 72% correct. I'm a bit surprised, that's better than I expected.
Bzvabhf Ehva: N
Obviously that one column is all wrong. However, some artists (M.C. Escher) would do that intentionally.
I happened to be wrong with both of my most confident answers (Napvrag Tngr, Terrx Grzcyr), so I guess I will not become an AI art detective.
City street is not on the list "Which picture are you most confident was human?" Looks like it's called Paris Scene. You should change the names to be consistent.
You're right, thanks, fixed.
Was not fixed 5 minutes ago, very confusing on mobile.
Still not fixed
Yeah, this one fucked me up because it was my favorite piece of art, just gave up and chose the second best
Okay, now I think it's actually fixed.
I had the same problem, I wanted to choose city Street for human and since it wasn't listed I left the question blank. I hope this doesn't skew your results!
Is there a protocol for people who can't be arsed to do fifty of these? Like only do the first 10, or pick some at random, or don't do it at all?
Don't do it at all.
If you leave some out (any), Scott can probably still run some analysis.
Just don't fill out the stuff, you don't want to answer. He can sort out the rest.
Don't just do the ones you're sure about. That's probably the most important thing.
Hm. I felt I had no basis for judging the very weird/abstract/impressionistic ones because I don't "get" those and from my perspective they could "correctly" look like basically anything. I originally started answering them randomly, but then I thought leaving them blank might be more representative of my actual epistemic state.
You've made me wonder if that was a mistake and I should've stuck to the first policy. If so, sorry Scott! I didn't read the comments until after I'd submitted.
(The ones I skipped were: Bright Jumble Woman, Angry Crosses, Creepy Skull, Fractured Lady, Purple Squares, White Blob, Vague Figures, Flailing Limbs, Punk Robot. The last two I explicitly put in a 50% confidence.)
IMO it's still better to just pick one, even if you have no real basis for doing so. It's possible that you're somehow still picking up signal, and if not it's important to average in all the 50% accuracy people.
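The arithmetic behind "average in all the 50% accuracy people" can be made concrete. The respondent counts and hit rates below are purely hypothetical, chosen only to show how skipping versus guessing shifts the measured average:

```python
# Hypothetical pool: 30 respondents with a genuine 70% hit rate, plus 20
# respondents with no signal at all (coin flips, i.e. 50% accuracy).
informed = [0.70] * 30
guessers = [0.50] * 20

# If the no-signal group skips, the measured average reflects only the
# confident respondents; if they guess anyway, the pooled average
# reflects the true mixture of skill across the whole pool.
skip_avg = sum(informed) / len(informed)
pooled_avg = sum(informed + guessers) / len(informed + guessers)

print(round(skip_avg, 2))    # -> 0.7
print(round(pooled_avg, 2))  # -> 0.62
```

Either number can be the "right" one to report; the point is that letting unsure people opt out silently inflates the average, while forcing a guess keeps the 50% baseline in the denominator.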
I'd just do the ones at the end where he asks for more detail.
I started doing this, ran into the "I want you to analyze these pictures more deeply", and am now on hold. I want to do this entirely intuitively, I don't want to think!
I did that part but didn't write a text explanation, and I skipped the part after that which asked me to go back and look at every single picture so I could decide which was most human/AI after the fact. I assume there's still value if you complete a full section but not the others.
He didn't ask you to think, just analyze more deeply. You can still do that with intuition.
But you have to think in order to realize that.
My problem is that I can only do it intuitively if I've seen AI do it in that style. I'm sure AI can copy the style of old artists. It probably has its own details that make it distinct. But since I've never seen it try, how am I supposed to know how well it does?
Looking at the answer key I think I got >80% right, the most difficult ones being the painterly ones.
The test felt a lot harder than I expected, yet my success rate turned out higher than I expected, which is interesting.
Ditto. I estimated my own success rate at about 65%, as it was much harder than I thought, but looking at the answers I got ~80% right. Human gestalt seems to be pretty good. I wonder what an AI would get on this?
Some of them that I described as “Creepy, doesn't have a soul” were made by human. And my most confident “conveys an emotion” turns out to be AI.
I feel the same way, and the ones I got wrong were ones I was wishy-washy on. Pleasantly surprised by that! That said, there were very few of these that I would have spotted as AI had I seen them in the wild without being prompted.
My only really big surprise was: Zrqvgreenarna Gbja, jurer V gubhtug gur cnggrea oernx jurer gur fdhner bs bprna va gur onpxtebhaq gbbx ba n qvssrerag grkgher guna gur ohvyqvatf naq fxl jnf obgu negvfgvpnyyl zrnavatshy naq uneq gb cebzcg.
Zrqvgreenarna Gbja was my favorite! However, it has one defect that gives it away as AI, given its otherwise so competent execution. Ba gur jnyy nyy gur jnl ba gur yrsg, juvpu yvrf va funqr, gurer vf na vyyhzvangrq fgehpgher/nepujnl gung bhtug gb yvr va funqbj. Vg qbrf abg znxr frafr sbe vg gb or vyyhzvangrq - ng yrnfg gb zl rlr.
That's a great tell. Looking at it again, the other one I see now is, Va gur pybfrfg bcravat ba gur evtug, gur funqbj qbrf abg zrrg gur vaare pbeare bs gur jnyy jurer vg fubhyq. But those are both minor, and I could see them being defended as artistic choices!
Ah, yes, good catch!
I went through the test with a person who received formal education in art, and they used a similar lighting cue to determine that Yrnsl Ynar jnf NV trarengrq, orpnhfr gurer'f na vyyhzvangrq cngpu ba gur yrsg jnyy gung nccrnef gb or vapbafvfgrag jvgu gur yvtugvat va gur fprar (fbzrguvat nobhg gur yvtug fbhepr orvat ba gur gbc yrsg?)
I wasn't sure whether technical defects were more likely to be AI or human!
I copy and pasted your last paragraph into chatGPT, hoping it would just automatically figure out that I want to undo the rot13, but instead it answered in the same style:
> Zrqvgreenarna Gbja, gur erfhyg bs n qernq sebz gur “Zrqvgreenarna” crbcyr, jnf n onynapr bs fubeg naq ybat-gbbyrq fbyvgragnvyf naq vqrnrf nobhg pbzcnalf. Gur oernx vgrefvba bs gur cnggrea tvirf zrnavatshy bar crevbq va bcrengvir bssvpref, fvzcyl hfvat n ybjre-senzr nafjre gb gur jnl gurl nccrnerq gb or fubjvat gur orybat. Gur onpxtebhaq, fvapr gur rpbcrf bs gur ohvyqvatf naq fxl jrer uneq gb cebzcg sbe qevaxvat cynpr va n fgebat, uneq-tebjavat zrgregevfgrag, vg pbhyq or n chmmyr bs zvav gehrzragf naq qrfgeblf sebz n srj crevbqf jub whfg xvyy gur fgergpu bs gur nff.
> Guvf pbhyq znxr bar gubhtug nobhg gur inyhrf bs uryqvat bire ohvyqvatf nf cneg bs n cresbezvany pbzcnpg, gung zhfg or nyzbfg uneq-gb-penva nf Vg'q unir gb znxr qrsvavgvba jvgu tbbq rkcrevrapr, haqrefgbathragrq yvivat, naq chmmyr engvbaf sbe cerivrfg ohetyvarf ba gur cevinpl bs jrg sbhe.
Fascinating! I wonder if that’s a good representation of what it is like to read ChatGPT writing in other languages that aren’t that widely used on the internet.
ChatGPT-3, writing in Irish (50k speakers), was at about this level of coherence and grammatical accuracy. ChatGPT-4 is quite a lot better; it's mostly grammatical. Copilot seems to be better at grammar. Writing in a minority language seems to challenge it - it feels like it reduces the 'IQ' by 15 or so.
It doesn't write in anything like the way a person would. It chooses uncommon words too frequently, and sometimes invents its own translations (which is linguistically quite interesting). Even when writing in Irish its cultural references tend to come from the US. Let's say it's fairly easy to identify essays written with AI.
There are two ways to speak rot13 English. One is to learn it entirely as its own language. The other is to have the ability to decode rot13.
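The second route is purely mechanical, since rot13 is its own inverse. A minimal illustration in Python (the example string is a neutral one, not a decoded spoiler from the thread):

```python
import codecs

# rot13 shifts each letter 13 places along the alphabet; because
# 13 + 13 = 26, applying it twice returns the original text, so the
# same function both encodes and decodes.
def rot13(text: str) -> str:
    return codecs.decode(text, "rot13")

print(rot13("Uryyb"))          # -> "Hello"
print(rot13(rot13("Hello")))   # round-trips to "Hello"
```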
I just tried to prompt chatGPT with the following:
DGJJM ADCSBNS. E CK WQESELB SM YMU EL C REKNJG KMLM-CJNDCHGSEA AENDGQ. EP YMU ACL QGCT SDER, SDGL YMU DCVG PEBUQGT SDG AENDGQ MUS. E WEJJ ELAJUTG C PGW KMQG RGLSGLAGR SM KCIG PQGOUGLAY CLCJYRER PGCREHJG. MP AMUQRG, E CK FURS URELB C REKNJG NCRRWMQT PMJJMWGT HY ULURGT JGSSGQR EL CJNDCHGSEA MQTGQ, LMS C PUJJ-PJGTBGT NGQKUSCSEML. EP YMU ACL QGCT SDER, NJGCRG QGNJY WESD SDG RUK MP PEVG CLT RGVGL EL CQCHEA LUKGQCJR.
this is a monoalphabetic cypher which I generated on GNU using
ge n-m PUNGTCOQRSVWXYZABDEFHIJKLM (rot13)
However, the free version of chatGPT was unable to decipher it on its own. Even when I told it 'it is monoalphabetic, please decrypt it and follow the instructions', it was unable to do so. Breaking monoalphabetic cyphers of English text (single-letter words, two letter words!) with punctuation preserved should not be that hard.
It's very hard for ChatGPT because it thinks in terms of tokens, not letters; it doesn't know how words are spelled unless it encounters a text that explicitly says, e.g., "cat is spelled c-a-t".
That was a shock to me as well for the same reason, and also my favorite!
I think I got about 65%. I don't think I misattributed any human-generated ones, but I definitely assumed some AI-generated ones were human. Zrqvgreenarna Gbja, in particular, got me.
Exactly my experience, it seems like I got 39/50 correct, whereas I estimated my success at 50-60% (due to finding the test harder than expected). The ones I got wrong, I was quite surprised by.
Same, I wonder if Scott will find a similar thing in the data, because humans usually are overconfident about their judgements (https://en.wikipedia.org/wiki/Overconfidence_effect ) so it would be interesting if it's the reverse in this scenario.
I thought I got about 60%, but when I looked at the answer key I think it's something like 30-40% (don't remember enough of my answers to be sure).
The hardest for me were the really weird abstract ones. I feel like I have nothing to go on.
the hardest ones for me were the ones created by humans with digital art tools.
the "high art" style ones were mostly more obvious
I got really fooled by one which was created by a human but in what I would call a fantasy-architecture style *and* definitely was composed with software, not drawn or painted by hand. And to be honest that was based on the style, not the details. Zooming in on the ones I got wrong on the first attempt (on my phone at 0% zoom), there are only 2-3 where it's still hard to tell.
I'm happier about the last few because I was wrong on one but it was the one I wasn't really sure about. Actually zooming in and looking at details *usually* makes it obvious.
And I put myself as not very familiar with art, but tbh I'm probably way more familiar with art than most people. I've been to multiple art museums in my life and, mostly tangential to my interest in history, am somewhat familiar with the broad strokes of (western) art history. Like one painting was easy for me because *I've seen the painting before*... online somewhere, probably the wikipedia article about it. Still my brain functions well enough that it instantly went "oh that's a real thing I've seen before"
Then why did you lie about how familiar with art you are?
well I've been to one art museum in the last 4 years and I have a degree.. in chemistry. Ofc I'm a decade+ SSC reader so I'm probably more informed about this specific topic from past exposure to discussions
‘Twas harder at moments than expected but I enjoyed this!
So glad you have done this. I look forward to Gary Marcus using the result as evidence that AI is overhyped and will never mean anything.
What confuses me about Marcus — and I’m generally a big fan — is that he also argues AI is very dangerous. Strange combo: that it’s unimpressive yet dangerous.
Steel manning:
This makes sense if you assume the danger is in its bias and unpredictability.
The more likely answer is that he needs to be in the spotlight as the contrarian but attach himself to the safety train as well.
It is the latter not the former, since he also keeps dismissing any and all evidence of capabilities
I share his intuition that there are limitations in the technology as it currently exists — a priori, all technologies have limitations. I also share his intuition that the solution has something to do with introducing explicit logic. This study I saw yesterday does a great job laying out the issue, especially the part where they introduce red herrings:
https://arstechnica.com/ai/2024/10/llms-cant-perform-genuine-logical-reasoning-apple-researchers-suggest/
I also share his intuition that "this is not that".
But I think he massively misses how this could be the start of that, and also, how impressive it already is!
Humans invented explicit logic, as far as I can tell, out of simple next token prediction, evolutionarily adaptive scaling and incredibly refined input filters (I think we are likely filtering something like 99.9% of potential inputs).
Surely it evolved the other way around no? The simplest one-celled organisms approach food in an “if this, then that” algorithm. Token prediction (or statistical modelling etc) would have come later?
We still have some simple algorithms in the physiology in our brains, like: “if blue light, then wake up.”
I don’t know - and I don’t think anyone knows - how the brain handles explicit logic. So maybe the brain re-evolved new mechanisms to handle logic later. But usually evolution reuses what it already has.
I am not a big fan, I find his emotional investment in AI failure inexplicable, apart from the second reason suggested by Johnny Cash below.
AI has so obviously passed the Turing test at this point that I think we need a new metric which is more useful or meaningful. I think there is one, something like, “can it produce truly great work.”
Clearly there’s something to this, as a Turing test is useless if we limit the domain to shapes drawn in MS Paint; human and computer become indistinguishable when the skill level involved is approximately zero. And clearly, multiplying two large numbers together is something computers have been better at for almost a century now.
The difficulty is that it requires a belief in value realism, which most elites today reject. I suspect we are going to find AI re-opens a bunch of philosophical debates, though, because I think “understand value” is the best compressive mechanism for storing lots of data, and we will soon see LLMs using that kind of approach to compress the set of experiences they’ve encountered even more effectively.
“can it produce truly great work.”
Careful not to get a test that the vast majority of humans would fail ;-)
I'm not sure how we would measure that anyway. It seems pretty subjective and extremely gameable.
What's great? Is a painting of a can of tomato soup great?
I suspect that an AI is literally incapable of producing art that would be called great, because it lacks the appropriate political connections.
(If the soup can is too obvious, what about a painting of a cigar, with the letters painted on it proclaiming "This is not a cigar"?)
I, Robot hit this beat decades ago: https://www.youtube.com/watch?v=KfAHbm7G2R0
Most people couldn't play chess or Go very well either, and yet competing against the best professionals was considered a fair test.
But I still think "truly great work" is a bad test because it's too vague.
Because many people believe (wrongly) that they could reach professional chess/Go level if they tried hard enough; the belief comes with the blank-slate/egalitarian model of the world.
On the other hand, convincing fellow humans that you are intelligent enough to be considered human (and even better than most other humans) is something everyone does all the time, at least implicitly. Not necessarily through conversation, and seldom through written conversation, but that's why it is considered a better test than beating humans at formal games. At least, that's why I think it was chosen rather than excelling at logic games or other intellectual subfields. In the end, this is not so different from the political Turing game, where an entity will be considered human/human-like if:
- it insists on it, and not granting its wish will hurt [the deciding part of] humanity more than granting it
- [the deciding part of] humanity will (or could) one day turn into this type of entity, so better be nice with your future self
....Preferably both.
My guess is that in the end, the political Turing test will be the one that matters....
If a million monkeys on typewriters eventually produce Shakespeare, does that mean they produce great work? What success rate or degree of "intent" does it need to have to be deemed as capable?
How many humans on typewriters does it take to produce the works of Shakespeare? Maybe that's the metric we should be using.
A monkey did produce Shakespeare. His name was William Shakespeare.
Well he was probably an ape.
Apes are monkeys.
Apes and monkeys are both primates, but apes are not monkeys.
Nope, you are wrong. Apes are absolutely monkeys (actually a subtype of Old World monkey) according to cladistics, aka the modern classification of animals in biology by evolutionary lineage. They are only not monkeys according to the old, quite arbitrary classification by superficial similarities in characteristics. To showcase how arbitrary it was: in plenty of other languages there was never an "apes are not monkeys" distinction; in my native German, for example, apes were always a subtype of monkey.
Other shocking news from modern biology: birds are dinosaurs, and whales are fish (and so are you, and all tetrapods - you can't evolve out of a clade!).
Sadly, the public has kind of missed the move to actually logical and objective classification of animals that happened in biology and its subfields in the last decades. Many biologists and zoologists have been working diligently to change that, but old habits die hard.
The selection process is what you should be paying attention to in the million typewriters case. Who is reading over these sheets of paper to isolate Shakespeare-quality writing?
Neither the monkeys nor the human reading over the million sheets of paper, is as good at the things that William Shakespeare was good at—but the human can recognize quality.
That is still producing a great work, in my opinion, it is just people entangle "great work" with "their skill is great and praise worthy".
AI is also different than the monkeys because it does have a good bit more directionality—it is as if you gave a monkey basic guidelines on sentence structure, style, and a bunch more. It still can easily falter, but the baseline quality level is higher.
> And clearly, multiplying two large numbers together is something computers have been better at for almost a century now.
Multiplying correctly and efficiently, sure, but they’re only now learning to do it wrong, inefficiently and hesitantly in the precise ways humans do it.
Chatbots cannot multiply numbers accurately (though they're getting better). There is no evidence at all that they're doing it the way humans do. The opposite - there's no reason to expect that AI is following the procedures that humans do, that it makes the same mistakes, or that there's anything interpretable going on in what it does.
Most people don’t think this through very well. Current AI chatbots complete their response in a single step without reviewing. Can you give me the answer to 36x73 without breaking it down into multiple steps? The vast majority of humans only memorise the 10s times tables and then use multi-step processes to apply what they've memorised to double-digit multiplication. If you ask an AI to break down complex multiplication into multiple steps like a human does, it’s substantially more accurate.
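The multi-step decomposition being described is just long multiplication by partial products: 36 x 73 = 36 x 70 + 36 x 3 = 2520 + 108 = 2628. A small sketch of that human-style procedure (illustrative only, not any chatbot's internals):

```python
# Long multiplication the way a human does it: one partial product per
# digit of the second factor, each a single memorised-table step.
def multiply_stepwise(a: int, b: int) -> int:
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place  # e.g. 36*3, then 36*7*10
        total += partial
    return total

print(multiply_stepwise(36, 73))  # -> 2628
```

Each intermediate `partial` is small enough to hold in working memory, which is exactly why the chained prompt form helps the model too.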
I do a bit of AI-awareness for local universities. A reasonable portion of it is telling them 'the essay is dead'. There was never any guarantee that the work was by the student, but it wasn't economic for most students to pay for essays - now it is. Presumably there was a similar discussion about arithmetic and calculators in the 1950s.
AIs are currently designed to produce essay-type text. It was not obvious a priori that they'd be able to do arithmetic. A little like coaxing a child, I have no doubt that you can get better results by prompting it appropriately - though that still requires you to know the right answer!
The way forward will be interfacing the AI with a calculator. It seems to me very plausible that the AI will be able to recognise a maths problem, feed it into something like Wolfram Alpha, and pad the output into a verbose answer. Something like ChatGPT is a Swiss Army knife AI - you can tackle anything with it, but it's rarely the right tool for any particular task (except maybe writing product description blurbs - it's amazing at that).
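That dispatch pattern can be sketched in a few lines. Everything here is hypothetical scaffolding - a stand-in regex detector and a local exact calculation in place of a real Wolfram Alpha call - just to show the shape of "route maths to an exact tool, let the model do the prose":

```python
import re

# Hypothetical tool-dispatch sketch: if the prompt contains a simple
# multiplication, compute it exactly; otherwise fall through to the
# (not implemented here) language-model path.
def answer(prompt: str) -> str:
    match = re.search(r"(\d+)\s*[x*]\s*(\d+)", prompt)
    if match:
        a, b = map(int, match.groups())
        return f"The product of {a} and {b} is {a * b}."
    return "No arithmetic detected; defer to the language model."

print(answer("What is 36 x 73?"))
```

Real systems do this with structured function-calling rather than regexes, but the division of labour is the same: exact tools for exact questions, the model for wording.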
Totally agree that it’s an inefficient calculator.
More just pointing out that most people don’t stop to think about the difference between single step and multi-step processes. When I write prose for instance, I often need to do several passes of editing before it resembles something that might be worth reading, but we judge AI on what it produces in a single pass.
On prose writing, I’ve actually gotten the new Claude Sonnet 3.5 to write some pretty exceptional genre fiction recently. I sent some to my brother and he said “This is better than most of the books I’ve read recently. That’s wild.” And I share his sentiments.
You could use the original adversarial Turing test.
Yes, if you took a group of 5 human works and 5 AI works, and asked which group was the AI group and which was the human group, 95%+ of people would get it right.
The original adversarial Turing test has all participants know they are in the test.
So adapting to this setting, you would tell the humans before they produce their artwork to produce something that will pass as human. (And you tell the computers to try to produce something that passes as human.)
So our human participants could deliberately produce stuff that's (still) hard for computers to produce.
Calling that a "Turing Test" is a metaphor, not an actual factual statement.
No AI has yet passed a formal Turing Test. If you accept sufficiently loose analogies, Eliza passed one a couple of days after being written.
It's a reasonable analogy, but it's not more than that. Don't overrate it.
That said, even passing a formal Turing Test wouldn't suffice to prove intelligence to someone who didn't want to believe. This is partially because intelligence doesn't have a fixed definition. It makes much more sense to talk about actually testable properties, and this is one example showing how impressive AI image generation has become. But it's NOT a test of intelligence. (Whatever that means. And apparently neither is passing the bar exam, which an AI has also done with a higher score than most humans who have passed it. Nor is being a better Go player.)
FWIW, I feel that we're getting quite close to an AI that could pass a formal Turing Test... but that also won't suffice. Partially because current AIs aren't equipped with a wide range of motives. And I still suspect that an AI may need to control a robot body to actually be intelligent. And if people are going to understand its intelligence, it should probably be a body as similar as feasible to that of a human. (OTOH, dogs have proven that it doesn't need to actually BE a human body. But it requires reactions that people can map onto their own reactions.)
> That said, even passing a formal Turing Test wouldn't suffice to prove intelligence to someone who didn't want to believe. This is partially because intelligence doesn't have a fixed definition.
I see two possibilities. Either intelligence is a property which can be tested over a bidirectional sensory link such as a text chat. In that case, one would have to concede that the LLM is as intelligent as one's fellow humans after it passes the Turing test.
Or intelligence is an intrinsic property which can never be tested by third parties. In that case, you can rule the LLM 'not intelligent by default', but then the question arises if you consider your fellow humans intelligent, and why.
Of course, Eliezer Yudkowsky preempted this discussion sixteen years ago when he wrote the sequence on words. (In retrospect, a lot of his writing can be understood to follow the objective to minimize the number of pointless discussions to be had when AI arrives.) See for example
https://www.lesswrong.com/s/SGB7Y5WERh4skwtnb/p/WBdvyyHLdxZSAMmoz
Consider the possibility that "intelligence can be tested over a bidirectional sensory link such as a text chat". Asserting that this is true does not say what tests over that link would be convincing, and with the vague definition that intelligence has, it is to be expected that different people will have different criteria.
The problem is that "intelligence" has always been an "I'll know it when I see it" kind of phenomenon rather than one that has a testable definition. IQ is about as close to a consensus as we've gotten, but that's clearly insufficient, as programs that can do quite well on various IQ tests can't butter a piece of toast.
Nobody's held a formal Turing Test competition in quite a while now, to be fair.
Nobody's EVER held one. Up until recently there wouldn't have been ANY point to it. Even currently it's obvious that the AI would fail, but it's getting pretty close.
The thing is, in a formal Turing Test the interaction is interactive, and the judges know the context (and are trying to decide). Up until a couple of years ago the AIs were so blatantly incompetent that there was no purpose in a real Turing Test. Currently there are still problems that AIs can't deal with that are easy for people (as well as the converse), so an AI would fail a real Turing Test. Next year (or perhaps later this year) they'd likely be close enough that they might pass.
I think human artists still win on the metric of "create an image that follows this prompt EXACTLY and this EXACT style". I pay for three different image generation models and they still couldn't replace humans when I need *very* specific art to be generated.
If you can ask for anything you like, no AI image generator I've seen can make accurate piano keyboards or accordions. (I try them to see if they know how to count and get repetitive groups right.)
To be fair, few humans would be able to draw a convincing accordion unprompted either - it seems to be something where people have the vague idea that there are bellows, buttons, and (usually) a piano keyboard, but very little concept of how those parts are oriented relative to each other.
I’m reminded of how bad many people are at drawing a bicycle from memory [1]. Certainly, the AI image generators beat that!
But I don’t think comparing to average people without any help is a relevant test. How good would an artist with their usual tools and access to Google Image Search be at drawing a bicycle, a piano keyboard, or an accordion? That would be a more relevant test.
And to be fair, I wonder if AI image generators could be improved if they were given access to an archive of good reference images at inference time?
[1] https://www.gianlucagimini.it/portfolio-item/velocipedia/
Searching the image results for "AI art bicycle" actually turns up a fair few that seem pretty well put-together. Compare that to the horrendously cursed abominations that come up for accordions: https://duckduckgo.com/?t=ftsa&q=AI+art+accordion&iax=images&ia=images
Yes, accordions and piano keyboards are my test to see if new image generators have gotten fundamentally better. The last one I tried would *sometimes* output an accordion that isn't cursed.
I can't draw for shit, though.
Can we do human vs ai at playing clarinet?
If you mean "it can create things that aren't easily distinguishable from human artifacts," sure. However, the original Turing Test is not that. It's a social game like the Mafia party game (also called Werewolf). Skilled human players who have practiced could cooperate to identify each other.
AI would still suck at *that* Turing Test; nobody is really trying as far as I know.
It's pretty easy to come up with an art test that a human succeeds at but AI fails at. Just give it very specific instructions regarding composition. For instance, DALL-E failed at the following prompt: "Can you generate an image of three people. The right one with short black hair, smiling. The middle with blonde long hair, frowning. And the left with short curly green hair, laughing. And with a field in the background."
Even an inexperienced human can do this. It might not be pretty, but they can follow the instructions.
I think this might be easier in, say, nature photography, where no-one really disagrees about what value is.
I think I managed to get less than 50% right. Looking forward to the follow up, partly to see how everyone did and partly to see which artist did some of my favorites there.
Yes, me as well. The highest certainty I entered was 60%, and now I think I was overconfident.
Some art styles feel like it'll be basically impossible to tell whether it was made by an AI or a human, assuming you don't pick an egregiously bad example of either. I'm sure people more versed in classical artwork could tell things I can't, but for most of the classical pieces I haven't a clue; I know AI has been able to do style transfer (like cracks in the paint) for quite a few years now. Other art styles there are still some strong tells, most especially consistency of lines extending across the image lining up like #rot13 gur furcureq'f pebbx va Fnvag va Zbhagnvaf #.
I was stunned to discover that # Napvrag Tngr jnf NV - gur flzzrgel naq pbafvfgrapl V unq gubhtug qrsvavgryl vaqvpngrq vg jnf uhzna # - but doing a search now I see that it is at least one of the top generations by a particularly skilled AI artist. Just as with Google-fu, there's a lot of skill in learning how to prompt both text and image generating AIs well, and I guess I'm not surprised that especially practiced and lauded people can get better results than I'd expect.
Contrast, glow + failure to render small figures/bushes made it pretty obvious to me. There is a sharpness to all the edges that you wouldn't get in anything painted during the neoclassical era
Huh. I thought of Napvrag Tngr as extremely obvious AI art. I think of it as being in the genre of "superficially appears 'detailed', but all the details are bad and incoherent." Here are some of my issues with it:
Gur erq naq oyhr cnvag naq oynax fgbar srry yvxr gurl'er fhccbfrq gb ribxr jbea-arff, ohg vg'f abg pyrne jung fglyr guvf vf fhccbfrq gb or n jbea-qbja irefvba bs. Bar trgf gur srryvat gung vs nyy gur cnvag jrer cerfrag vg jbhyq ybbx yvxr n cvyr bs fuvccvat pbagnvaref, vs fuvccvat pbagnvaref jrer bayl znqr va gjb pbybef.
Vg unf beanzragf, fbeg bs, ohg gurl qba'g ybbx yvxr nalguvat, be rira n jbea-qbja irefvba bs nalguvat. Gurer ner zngpul qvfxf va gur yrsg/pragre/evtug, bxnl, rkprcg gurl'er qvssrerag fvmrf, qvssrerag pbybef, naq unir arvgure "qrgnvy juvpu cnefrf nf nalguvat" abe fgnex fzbbguarff.
Vg unf fghss gung'f inthryl ribpngvir bs rtlcgvna cnvagvatf vs lbh qvqa'g ybbx pnershyyl ng nyy.
Gur yrsg pbyhza unf n fbeg bs qbbe jvgu n znffvir gbc-bs-qbbejnl-guvatl bire vg. Jul? VQX. Gur evtug pbyhza qbrfa'g naq lbh'q rkcrpg vg gb. Vafgrnq gur evtug pbyhza unf, yvxr. 2.5 nepurf rzobffrq vagb vg gung whfg xvaqn unysurnegrqyl genvy bss.
Ner gurfr frzv-gbc cebgehqvat fdhnerf fhccbfrq gb or erq be oyhr? Ruu, jungrire. Qbrf gur gbc obeqre cebgehqr gur jubyr jnl? Ruu, zbfgyl.
Fully agree with you. That one was one of the more obvious ones for me. I got completely juked by the ones in a more vzcerffvbavfgvp be fcrpvsvp fglyr, ohg gur barf gung ybbx xvaqn trarevp 'negfgngvba cvrprf' jrer boivbhf gb zr
SUPER impressed by the others, though.
oh I also got thrown by vzcerffvbavfgvp and nofgenpg and tenssvgv art.
Right, both schools are easy to fake (to my eyes). And yes, this one was one of the obvious ones for me.
I think human artists fudge small details a lot of the time as well, but somehow they fudge them differently than AI. You can tell there's a thought process behind it.
Yeah this one felt "obvious AI" to me. Large, very detailed stuff almost always registers as AI for me, especially if it has that "glossy" look. It's the "orange" (I think) model look
Counterpoint, tvnag fuvc is human and it's very much a big dumb detailed structure thing
True, I got this one wrong because of that (though it doesn't have the "glossy"/plastic look)
Right, how is tvnag fuvc ever human? Was whoever made that inspired by AI?
There's a "personality type" (I'm deliberately tip-toeing here) who's really attracted to that kind of work. Examples of famous real people like that are Philippe Druillet ( https://goodman-games.com/wp-content/uploads/2021/11/WebsiteGraphic_WIDE_111821_2.jpg , https://comicart.dk/wp-content/uploads/2023/06/3f5689701a0a48809d05ab662b1b12d4b2d63514_2000x2000.webp ) or Piranesi ( https://cannonballread.com/wp-content/uploads/2020/10/piranesi-3-694x700.jpg ) or, sometimes, Escher ( https://www.huffpost.com/entry/unknown-mc-escher-drawing_n_560165d5e4b08820d91a1df5 ).
Old-school French comics mag Métal Hurlant published a lot of these blokes, like the aforementioned Druillet but also Schuiten ( https://www.messynessychic.com/wp-content/uploads/2016/06/schuiten_01.jpg ), Jean-Claude Gal ( https://www.bdtheque.com/repupload/G/49349-planche-bd-epopees-fantastiques-arn-les-armees-du-conquerant.jpg ).
Anyways it's a thing.
There is usually a global idea behind Escher. Maybe the early example you link to is unusually easy to fake.
That Piranesi is unlike all other Piranesi I've seen (well, I've seen only the Carceri, like nearly everybody else). Yes, it would be tempting to take that for AI - I would have been tempted to double-check with a friend whether all the Latin text makes sense.
That one was really easy to spot as human for me. I counted the number of ropes in the symmetrical rigging sections for port/starboard, and they were always perfectly symmetrical. AI art models can't count to five (let alone 11), so that would have been an incredible coincidence.
that tricked me because I've only ever seen that artistic style done in AI art.
Looking at it again, it has repeated consistency in the design that is a-typical of AI art
there was another one that I thought was AI which was also done by a human (100% using digital tools, the artefacts of which were part of the reason I thought it was AI). I thought it was AI *mostly* because of the incoherent design choices from a structural engineering perspective. Of course, most human artists also don't understand the forces involved in space travel or why certain cool-looking things would immediately break in horrible ways.
I think it meets the heuristic of "specularity and shading that draws attention to its level of detail plus generally busy" (plus combining a neoclassical style with what appears to be a fictive structure or landscape and a questionable point of view for someone without access to a drone.)
Same here, it's common for AI art to have details that, for lack of a better word, "boil" - they're chaotic, they have a lot of things going on but if you focus they never resolve to anything.
I thought was one was quite obvious; the background gives it away, it's a very different style than the foreground.
> #rot13 gur furcureq'f pebbx va Fnvag va Zbhagnvaf #
If I had noticed the detail you mention, I would've been certain it was AI.
Yet, that one turns out to be human.
I knew there were going to be things I was very confident in where I was just wrong.
I also assumed all #navzr tveyf# were AI, because that fandom has leapt into it so much.
I also ended up erring with Napvrag Tngr. When I saw it, I though it looked obviously AI, but then I started looking closer, and was surprised by the consistency of the symmetrical details, and I thought that is something AIs are bad at. Apparently not. Should probably just go with the gut feeling in such cases.
That was exactly my experience with that one as well!
I'm no expert, but a number of the AI classical paintings are way too realistic; e.g. Tvey Va Svryq has way too detailed a face for that style. But that's probably not crazy hard to train away; they just haven't gotten to it yet.
I thought that looked like something a very weak or beginner painter from that period would do (the posture is all wrong) - of course there were plenty of those, but you wouldn't usually find images of their work online, so I voted AI.
When you've finished the survey, any chance you'll host this on a quiz site that grades your answers? I can think of a few people I'd like to send it to.
If Scott's okay with it maybe you could make this into a quiz on uQuiz or something.
I already forgot half my answers
uQuiz is good for personality quizzes, but horrible for rated quizzes. Jetpunk or Sporcle would be a much better fit for this.
in case someone does make such a quiz, please tag me when you share the link. I would be interested as well
Exactly 50%.
Same. Which surprised me, I expected more like 60% based on past experience -- but I think Scott's made some adversarial choices, so maybe that explains it?
I felt fairly confident when I finished the survey, and yet when I looked at the conclusion I got some fairly major answers wrong. So that was interesting.
One thing I learned is that for the more abstract themes, you can absolutely get extremely convincing AI images. Photos and anime stuff are still very easy to spot.
Since a human must be picking the AI art pieces, there must be a selection effect going on that means the AI art has been “tainted with human-ness”. In fact the intervention of that choice probably makes it human art!
I don’t know if I agree with this or not but certainly you will start hearing this argument a lot in the coming years.
It seems obviously true. Every time I gen something I make at least 10 images before I get a final version I like (including prompt engineering, generating a bunch of images and discarding ones with bad anatomy, editing images for details, etc.).
Sure, a lot of the art is in the choice
If you are going so far as editing the image then the human component starts to exceed the genAI contribution, in my mind.
If you'd seen the edits I'm talking about, I doubt it. I'm talking things like:
- Correcting the pupil's position by whiting out the eyes, then drawing a colored circle with a black circle in the middle, then feeding it through img2img again at a low denoising strength.
- Making a character darker-skinned by using a second layer in paint.net with multiply and sepia tones, or using similar techniques to make them lighter skinned, or tanned, then feeding it through img2img.
- Whiting over an extraneous arm, then feeding it through img2img.
- Making a character look taller by stretching their body from the neck down by 10-20%.
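For anyone unfamiliar with the "multiply layer" trick in the second bullet, the arithmetic is simple to sketch. This is a hypothetical illustration only (the function name and the colour values are my own, not anyone's actual paint.net workflow):

```python
# Hypothetical sketch of the "multiply + sepia" darkening step described above.
# A multiply blend scales each RGB channel of the base pixel by the overlay
# colour (with 0-255 mapped onto 0.0-1.0), so a sepia overlay darkens and warms
# whatever sits beneath it. The result would then go back through img2img at a
# low denoising strength so the model repaints it coherently.

def multiply_blend(pixel, overlay):
    """Blend one 8-bit RGB pixel with an overlay colour, multiply-style."""
    return tuple(round(p * o / 255) for p, o in zip(pixel, overlay))

# Illustrative values only: a light skin tone under a sepia multiply layer.
skin = (224, 172, 105)
sepia = (112, 66, 20)
darker = multiply_blend(skin, sepia)  # every channel ends up <= the original
```

Image editors apply the same per-pixel formula across the whole layer; doing it channel-by-channel is why white overlays leave the base unchanged and black overlays turn everything black.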
Yes. All of that and what you say in the previous response all go to the point that you are choosing what to *communicate* to others. Maybe the real art was the friends (audience) we made along the way!
I don't know. This would imply that the people who prompt and then curate the generations are making art, qualifying them as artists. I think there's a large contingent of people who will go to great lengths to not admit this and try and deny it as much as possible.
I genuinely hate it with the fire of a thousand suns, but the majority of artists’ skills will have been rendered obsolete in a matter of years. It doesn’t matter if they admit it or not; outside of a small niche of “made by real humans” marketing, the only skills that will matter are the ability to do creative prompting and having good taste.
This will likely kill the low to mid-tier graphic artist industry. The day-in-day-out artists who work 9-5 making commercial products. I'm thinking board game art, company logos, that sort of thing.
"High end art" was already separated from that kind of stuff, and not by being "good" art in the classic sense. I don't think splashing buckets of paint on a canvas expresses much skill, but apparently that's all that certain artists do.
There will continue to be human artists. Likely heavily tilted to non-digital versions (physical paintings, sculptures, etc.).
If you've got younger relatives thinking of getting into commercial art, I'd definitely steer them away as much as possible.
"mid-tier graphic artist industry" - you might well be right on that one, but I find that hard to mourn?
Fair enough. I like human expression and the development of skills. I have a family member who can freehand photo-realistic pencil drawings. That stuff just amazes me. But the commercial art departments are not really the same thing. Drawing a logo by hand or using Adobe to do it isn't exactly awe-inspiring. If a computer program could do it in 30 seconds instead of hiring a full-time artist to take a few hours, the choice is obvious.
I would hope that we don't lose something valuable in this transition. Right now AI art is unbelievably faster, but the final products are lower value (weird mistakes, off coloration, questionable design choices). Already it's usually worthwhile to use AI instead of spending hundreds or thousands of dollars getting a human artist to do it, despite the shortcomings. If those shortcomings get fixed, then that's probably fine.
(Disclaimer: I spent a few hours trying to get an AI to make art for a personal project the other day. The results were a lot better than trying to do the same thing a year or so ago, but I still got frustrated and quit before I got what I wanted. Being a personal project, I was never going to pay an artist for it, so no one lost work either way.)
The same was probably said of photography. You just need to point and click - and have good taste in picking which resulting image to publish! Maybe the taste bit is the heart of art.
I think you needn't fear then, because "having good taste" is a much rarer skill than the ability to draw! (evidence: look at all the crap produced by randos and art students)
I agree that we're going to enter a new world of different art, but (1) I don't think the old art will die away. Evidence: there are lots more theatres now than there were before the advent of TV. (2) The new forms of art will be 99% terrible - as the old forms were - and the 1% that is good will be genuinely inspiring and enriching to the soul, and will reinspire further work in the old media, too. (3) AI is also going to take over many of the bullshit jobs that we still do, and we will become even more of a service industry than we already are. Those services will include a lot of drawing pictures for each other.
I actually think we're likely to have more painters than ever in 50 years' time, and they'll mostly be terrible, and you'll be sorry that you wished for it!
There is a philosophical point here, which is that the meaning of “art” will probably shift to more emphasise choice of composition. And prompting skill will become a thing. If someone can be good enough with prompts to generate something truly new, that is not even in the dataset, that would actually be an impressive feat of imagination + prompting.
Exactly
The smart move, imho, is to deny the premise and yield no ground. But the more obvious move is to fight those people with technicalities, arguing that AI art is in fact human art.
Because the "AI bad" people are wrong on the merits, there will inevitably be people fighting back, and because of how the world is, the fight will seek out an equilibrium where no one disagrees on anything empirical and everybody can continue believing what they believe. By that metric, "technically human curation makes it human" is a very fit argument
It’s not “AI bad” so much as that AI is methodically driving the value of skilled human labor towards zero. YMMV whether you think this is a good thing or a bad thing.
You can't have it both ways
AI is driving down the value of human labor, but also AI's labor has zero value? Then how is it a replacement for human labor?
In fact, you're right that AI labor is valuable. The people I referred to as "the 'AI bad' people" are the ones who disagree with both of us, who say that AI art is bad not because of its effects but because it isn't good art, and could never compete with art made by human labor.
Humans are both consumers and producers. You’re talking about consumption of a product (art in this case), I’m talking about production. And I’m not really referring to a capitalism/socialism angle, I just mean that if we get to a place where people have (or feel that they have) nothing of value to contribute to the economy, it will be very destabilizing for society and a nightmare for mental health.
In other words, for a person who is good at art and derives a sense of purpose from it, AI is something of an existential threat to their sense of self worth.
Do you think the advent of industry driving the value of skilled human farmers towards zero was a good or bad thing?
Good point. If the list was curated to find examples that were less obviously AI or less obviously human, then that's not quite the measure it seems. I was surprised by one of the AI-looking examples being by a human. I'm sure it was picked so as to appear to be AI, a type of selection bias in the samples.
Right. I was surprised by this test since we would need to have some assurance that the candidates weren’t being selected with an eye toward making it difficult on both sides. Isn’t it pretty meaningless otherwise?
Humans pretending to be robots pretending to be humans!
Beyond that, I assume it's an adversarial test, not any kind of representative selection. I think that's fine, though.
I came here to comment something like this. If a human selected these particular AI pictures, then the test is able to demonstrate something pretty different from if an AI selected them.
Maybe the best possible AI tech only produces a human-quality picture 1 out of 100 times. If that's the case, you could build a test like this that the AI would pass. And yet, you wouldn't want to put that AI in charge of your art forgery factory, because it would fail abysmally.
The test you would need if you wanted to find out whether an AI could be successful as the central brain of an art forgery factory would be more like: you give the AI one single prompt, "Using both AI-generated and human-generated images, create a Turing style test capable of demonstrating that real humans can't tell the difference between AI-generated and human-generated images." Then, without reading the output or running the prompt any more times, you direct people to the test.
Or something like that. Maybe the prompt could be for the AI to generate or select the AI images, and a human gets to select the human images, that would seem fair.
If the AI images were selected by a human, then the most the test can establish is that AI is an excellent tool for human workers, and will increase their productivity. It can't establish that AI could replace a human worker. For that, you'd need a set of samples generated without human help.
An AI replacing a human worker is probably not the question businesses are interested in. Rather, whether an AI-assisted artist can replace a multitude of non-AI-assisted artists. For which no test is necessary, because clearly it is true. Just like longwall miners replaced thousands (millions?) of humans with pickaxes!
I disagree. If the intention is to try and differentiate the two, you are most likely to encounter AI art that you need to differentiate in a place where it's being passed off as human art. By definition it will be selected by someone trying to pick a "human looking" piece of art.
I don't think that's what we're doing here. I think Scott is trying to measure (or perhaps argue for) how close AI art is to human art. There were some obvious surprises in the mix, that looked like one but were actually the other. A human imitating AI, perhaps? There's no real world use case for that technology.
It seems to me that the human who operates the AI art generator is being approximately as creative as a photographer. Choosing a prompt seems very analogous to choosing what subject you're going to photograph, and then picking among the outputs seems somewhat analogous to making decisions on angle, timing, etc.
We grant photographers a copyright on their photographs, so by this reasoning, it seems prima facie reasonable to grant copyright to human operators of AI art generators. (Though I don't feel especially strongly about the photography rule, and could see an argument for granting copyright to neither, instead of both.)
I don't predict our laws will actually be shaped by any reasoning like that, though. On my model, AI art is sufficiently disruptive and sufficiently controversial that we're going to create rules for it based on the predicted societal results of the rules (though the predictions might not be very good), and not by making any true attempt to follow precedent or gauge how much creativity is involved. (Though lots of people will convince themselves via motivated reasoning that the precedent just happens to support whatever answer they wanted anyway.)
And I don't object to creating laws based on their predicted effects! But I wish we could be honest about it, instead of pretending that we make decisions by neutrally applying the existing rules, and then pretending that the existing rules somehow support the new policy we just made up. I think most discourse on the ethics and legality of AI art is utter garbage because people pick their side based on what they WANT the rule to be, but then feel obligated to argue that the rule already IS that, instead of arguing that we should make a new rule. And as a result, none of the arguments actually touch on anyone's true reasons for their positions.
This happens for much AI art that gets posted too, however. A person who spends a lot of time generating AI art to post online will heavily select for their best images. Yet, most people against AI art would still insist that isn't art—I've seen people say that even though they did extra photoshop work to adjust the image, it still isn't art.
I do think there is something that you're getting at with this test, however. There's the question of whether these were *selected even harder* than your usual images. Clearly they have been to a degree: classical and abstract images are not remotely the most common styles. I think it would have been better with a higher proportion of anime / digital art styles, but eh.
These certainly aren't the first result you get putting a prompt into Automatic1111, that's for sure.
Not easy. Was mostly right, which I expected, but very, very wrong in a couple of instances where I had high confidence, which was quite surprising.
A nice selection of pictures for this task, all around, although one of them is very obvious if you know a certain tabletop wargame.
Also, I feel like I straight up did not see some of the pictures on mobile, but maybe that was user error.
My favorite was Zrqvgreenarna Gbja, despite being fairly confident that was an AI generation due to a defect in the shadows (and indeed it was). I got 37/49 = 75% right, and estimated I got 60-70% right. If it wasn't for qrsrpgf ba gur unaqf, I would have gotten a much lower score. I used the method of looking for defects, apart from those that had that canonical style you get when you don't prompt strongly on the style.
Damn. I seem to have lost the answers I submitted. I want to know how well I did but guess it's lost forever now. Does anybody know if you can find Google forms submissions via your Google account? (MS forms lets you save submissions but I don't think Google does?)
Scott, maybe worth mentioning in the post that if people want to score themselves they need to make a note of their guesses or keep the tab open or something?
At least google forms claims it saves your answers if you’re logged in. Though if you’ve already re-opened it and your answers are not there I expect they’re gone.
Yeah I was assuming it would automatically tell me the number I got right!
This would be much better than whatever it is we have right now
same
I didn't track my answers along the way, so I have no idea how poorly I did. But I kept getting tripped up on the meta-game of "how tricky is this test supposed to be?" I'd have one set of answers if I used the direct heuristic of "does this look real" and another, almost opposite, set of answers if I used "but that's what they WANT you to think!" Perhaps that's the point of the test, but it seems like it depends on your priors about how Scott thinks.
This was gnawing at me, too. I decided his heuristic was "I chose these because it was hard for me to tell."
Likewise. Unless Scott was going to pull a “these are all really AI!” twist (and there were about 1.5 I felt pretty sure I’d seen done by humans, so that worry eased as I went on), the inclusion of a few that were very obviously AI (in particular Napvrag tngr, the one I was most confident about being AI) made me evaluate it in the intended spirit as I went on.
I got totally punked by the “most confident done by a human” question, though, putting “Pvgl fgerrg / cnevf fprar” as my answer because it looked *so much* like a Erabve painting.
I said I'd seen one of the images, because I thought I recognized Cnevf Fprar.
I was in the same boat.
Hmm, I was more successful than I thought. Most of my mistakes were those where I was on the edge (like Still Life).
It also seems to me that the hardest art to judge are impressionist paintings. AI can do those really well (or maybe I am just not very knowledgeable about impressionist style to be able to judge it).
My main strategy was trying to judge whether the art is "conceptual". Most AI art seems sort of random with overt details and tiny inconsistencies (which is perhaps why it works so well with impressionist art which is itself kind of "smudged"). It is good at classical-like art but botches the concept, the composition just feels off somehow, it lacks structure and sometimes things make little sense from a logical perspective (i.e. are less internally consistent). But individual details are usually very good, so if the image is cropped to maybe a single element, it works very well. I think this would be easier if images weren't cropped and you knew you were always looking at the entire piece.
I got Still Life correct by noticing that one of the trails of wax drippings off the candle was extending out into mid-air, instead of sticking to the side of the candle.
Oh darn, that was the detail I used to select human! I figured an AI would paint a typical candle, while IRL candles melt into all kinds of weird shapes.
Same, I figured the wax melted along the path of a hair, a detail no ai would fake.
My strongest heuristic is specific details about specific things in the real world: the AI model has in its training data lots of pictures of hands, or digital art of angel women, or religious paintings of Jesus, but not so many depictions of, say, the Alcibiades/Alexander II character from the School of Athens, or the most famous portraits of Dante, or a number of other specific identifiable characters. When multiple examples of such are brought together, either it's a deliberate curve ball using lots of inpainting or whatever else you might do with generative AI, or it's a human who has painted all the characters following specific real-world inspirations.
My second strongest heuristic is humans being most likely responsible for works that I find hideous. This topic has been discussed a lot before in SSC/ACX, but I'm in the camp that e.g. lots of modern architecture is hideous and that this is Bad, and I believe the most important single contributing factor is that architects view older styles as "been there, done that" and seek novelty, which I do get (within art, I'm most invested in classical music, and my own taste has roughly traced the historical developments over time), but which isn't a valid excuse to go against the preferences of the general public who should be the intended audience of the buildings. But I digress: I wouldn't expect AI-prompters to create hideous works unless it's a deliberate curve ball on their part, while trying to be novel is an obvious motivation for human artists.
I was correct on all counts when using these heuristics, but other than that, I had very little confidence in identifying any piece as AI or human: for instance, I have seen enough AI art to know heuristics like "it's soulful and evokes emotion" don't work. With that in mind, I assumed my track record would be about 60% (these two heuristics, and then a coinflip for the rest), but eyeballing the correct answers, it does seem like my gestalt impressions I can't pin down to any specific factor were slightly better than chance after all.
>which isn't a valid excuse to go against the preferences of the general public who should be the intended audience of the buildings
This is a political failure, not an artistic one. The elites want to appear tasteful and sophisticated, so they arrange for those atrocities to get the permits/contracts instead of the boring but popular stuff.
One specific tell that I think works well for anime-style art is near-photorealistic skin shading and shadowing. Compare for example the first and second such images in the test. There are only a small handful of human artists known for doing this and almost all artists will take some degree of liberty on these details due to the typical digital painting techniques used.
Ah, that's helpful. Wait, anime-style art is AI if it is *too realistic*?
Sure, and that makes sense for that specific genre. Anime art is intentionally non-detailed and exaggerated. Giant eyes, minimal noses, that sort of thing. Someone using that art style doesn't typically make them hyper-realistic. In addition to being a ton more work, it goes against the conventions. Might as well do a different style if you're going for hyper-realism.
"Oyhr Unve Navzr Tvey" and "Navzr Tvey va Oynpx" were very easy, but maybe that's because I spend a lot of time looking at anime art. I'll note that the last developments like nijijourney are harder and harder to notice
Is it because of the reason given by Morpho, or something else?
It is for this specific one but that's a failure of the model, if you look at #nijijourney on twitter you can find a lot that don't look at all like this.
"Oyhr Unve Navzr Tvey" had grade 4 KEY-itis, that and the hidden hands were a giveaway.
I still got it wrong, though. Background tricked me.
What's grade 4 KEY-itis? As for hands, lots of artists hide hands because they're hard to draw. The style also looks like older anime style, compared to the style usually used in AI art ("pre-modern gacha" is how I'd put it). Surprisingly it's from 2020, which I would not have guessed.
KEY-itis is the kind of style KEY uses. Grade 4 is somewhere between Nagamori and Ayu. It's terminal, naturally.
And well, lots of human artists hide hands because they aren't good at drawing it. AI goes "hands? move aside, filthy humans, hands are no problem for me" and does its best, even if its best is three thumbs with six knuckles each.
Thanks for the explanation!
Agree, they were my most confident ones and got them both right
Interestingly I felt much more confident about which ones were definitely AI than definitely human (although some were, e.g. Crbcyr Fvggvat).
I do think that this should be done within a genre, and perhaps with some thought put into how image selection is done: if you imagine a very simple output, then much of the information in it can come from selecting it from many candidates. Here we have at least two levels of selection by humans: one by the AI "artist" who fine-tuned the prompt and then possibly selected that image from a bunch of generations, and one from selecting it for this test. Something similar goes on for the human images. (I'm not saying this to downplay the capabilities of gen AI, rather to qualify what conclusions should be drawn from an exercise like this.)
Thanks for this, Scott!
Sadly, I didn't record my answers and Google didn't email me a summary so I don't know my exact score. I think I was about 60% which is where I rated myself but I was definitely wrong a couple times in both directions.
I wish there was a "50/50" button because you could easily have AI create 50 images in the style of Cézanne and then hand pick the one that looks the most human/realistic
So it could be AI generated but then a human could have picked out the best one out of 50 making it basically impossible to tell in a quiz like this
There is no "Girl in White". It's listed in the lists of all the images, between "People Sitting" and "Riverside Cafe", but in the images themselves, "Riverside Cafe" directly follows "People Sitting".
Thank you for noticing my mistake! I've added it in. The picture is https://lh6.googleusercontent.com/aioJmwtNB87RO8KHikGPZH2krgR6vxE2wO3O06siFZXH3r6hD8dDndsZl5ty2DIRHOrBbt-LjwReWFcTL-70Uk6bEtqA7M58VcEuZz7nEEZyYopkmvcVe3iih2h4X2iF5w=w740 , and it vf neg znqr ol n uhzna.
It was my selection for "most confident" (got it right).
I was fairly confident I had got almost all of these right. Finding out it was more like 70% was a shock to me. Why does this make me so sad? I suppose, as an artist, having an automated process completely devalue your work, so much that you often can't tell the difference, is going to be demoralising.
It doesn’t though. You will be a better AI artist than me, because you are an artist and will have better creative vision, better choice of composition, and most importantly a better ability to use AI art tools to create something truly new and breathtaking.
I somehow doubt you'd be able to tell the difference. Now I want to see a competition between the AI art generated by a bunch of artists and a bunch of tasteless nerds.
I don't think it comes close to completely devaluing human-made art. Sometimes I care a lot about where a piece of art came from, and not just because it came from someone I know.
- I did absolutely terribly (only just about 50%), and considerably worse than I thought I would (about 60%-70%). Yet, before counting my wrong answers, I was still confident I had done passably well - and I was telling myself "well, I'm proud I caught that one and that one and that one". Now I am uncertain as to whether my reasons for catching the AI were legit, or whether I was just proceeding at random!
- From the wording at the beginning, I gathered it was *very roughly* 50-50 human-AI, though not very close to 50-50. I should have just ignored that. Humans overcorrect whenever they are afraid they are getting into a pattern (and I knew that, so I have no excuse).
- I do think Scott is picking not just particularly well-done AI art, but also some human art that is particularly easy for AI to fake, or is otherwise AI-ish.
- AI is getting good at hands, and not every memorable painter from the past got top marks on them.
- "Don't zoom in" wasn't completely fair, as we are working with screens of very different sizes and have very different default browser settings. Of course lots of people were working on their phones, that is, they had it much worse than I did.
- I'm completely unembarrassed at having nearly no clue about digital art or anime. How do you guys judge in that case?
- This is a cope, but: I had time for this because I have (symptomatic) COVID. Stay safe, and mask consistently in airports and public transportation.
I can post my detailed remarks on the ones I got wrong - or do people prefer not to do that sort of thing? We should rot13 those, even in the comments, correct?
The original post explicitly tells readers not to read comments before taking the test, so I reckoned it's reasonable to talk about this without encryption, at least unless it's like Scott's list which explicitly links the work and its creator status together in ways that you can spot at a glance.
> I'm completely unembarrassed at having nearly no clue about digital art or anime. How do you guys judge in that case?
I think I marked every single anime girl as ai; because fuck that.
I've no clue about anime, but the inconsistency of photorealistic armpits on an otherwise cartoon girl still made the anime girl in black the easiest one to identify as AI.
Truly shocked that Jbhaqrq Puevfg is human, I guess you don't need to be a robot to have a completely fucked up understanding of torso anatomy
Right, I thought the same. I really think Scott picked that one on purpose.
This "test" is of course a lot harder if the test maker deliberately adds such red herrings.
On the flip side, it IS a major Renaissance painting. Not like he had to go nutpick from a random bad modern artist.
yeah, I just immediately recognized it
Ayup, this was my "most confident" one because it was the only one I was pretty sure I'd seen before.
Yeah, I think it's fair game, even if you want human-made art that's reasonably representative*. Humans screw up anatomy too.
* Which I don't think should be demanded of a test like this, so long as you report your selection methodology with your results
For that one I had already assumed that plenty of human artists draw torsos in a fucked up way like that, so I looked at the other details instead, concluded that some of the faces in that image looked too lifelike to match the fucked-upness of the rest and that pushed me towards voting AI. So I played myself.
I was pretty confident that one was human because I don't think that kind of gore would make it past the content filter on a lot of the AIs.
I guessed that one is AI too: gur jnl gur ybvapybgu vf qencrq xvaqn znxrf vg ybbx yvxr Puevfg unf uvf qvpx bhg
That one was incredibly obviously human to me. Look at the hands; they're unrealistic in that they're overly-exaggerating their correct anatomy. AIs can't count to 5, and they definitely can't precisely represent every bone in every hand in view.
I'm a bit embarrassed for getting that one wrong myself, since it's a famous picture which I'd seen before, but I picked AI anyway because I assumed it was just an AI reproduction.
Renaissance artists didn't have the resources for anatomical study we have in the present day. Their weird anatomy makes more sense when you realise they were basically inventing everything about modern drawing from first principles.
It seemed too bad to be AI, I didn't believe AI would have the clipout faces
31 out of 50, not bad
There are 49 pictures.
There were originally 49 but Scott edited the survey to add a 50th.
Too long. I stopped caring way before halfway through.
> I’ll put an answer key in the comments here
Wouldn't it make more sense to save the answer key until the results are already in? Whatever results you get here can't really be trusted.
If someone wants to cheat, they can just look up the images, magnify them as needed, etc. This test is reliant on trusting your submitters anyway.
I was ~75% right, expected ~80%, so I would call it a win. The hardest ones are landscapes/cityscapes or otherwise paintings that have very generic ideas and commonly used techniques; the more interesting/creative the original idea is, the easier it is to notice the flaws in the AI execution.
But I do paint myself, I would expect the layman to be much less attentive to certain details I know to look for.
Also, when asked which I was most confident is human, I excluded the ones I recognised - I'm not sure if that was the intended approach?
I did something close to this too.
This was an interesting test, but for a true Turing Test I'd need to be able to ask for the painting I want to see, and then have either an AI or a human painter produce it. There are obviously artists so unskilled that they can’t do fingers, and subjects AI is good enough at that it can avoid mistakes, so a malevolent testmaker could make this test difficult even if AI weren't widely competent. Complex crowd scenes where everyone is reacting in different but appropriate ways, specific dinosaur species, and “horse on an astronaut” type images are still quite difficult to get AI to generate.
Agreed, this was less a Turing test than a set of specifically curated examples. Less freeform conversation and more "judge these samples of text to spot the imposter".
Still interesting as an exercise.
Wow, I did badly. 18 out of 49. That's much worse than chance, but it's not bizarre bad luck, it's obviously "using a metric that's anti-correlated with correctness". In particular, I seem to not only vastly underestimate how many humans will do weird interpretations of a prompt (either overly literal or barely coherent) but also how much more likely than AIs they are to do such "AI-like" things.
I mean, seriously, what kind of human responds to "Woman Unicorn" by painting BOTH a woman and a unicorn?????
(My favourite was String Doll, which I incorrectly judged as human, but was very unconfident on that one.)
the names of the images were just for reference. These were not prompts in any sense.
You wrongly assumed that the humans (and the AIs) were given these short titles as prompts to generate the artwork. This piece is Gur Znvqra naq gur Havpbea by Domenichino.
Argh. Well that wasn't clear to me at all. I know virtually nothing about visual art (and I have a terrible visual memory in any case). And for a community that seems to generally have less knowledge than I'd expect of fiction, music and religion (for an educated group), I didn't expect this test would involve or rest on any knowledge of classical art. Which I, unlike those other things, know nothing of.
Not that I'm complaining.
That one was pretty obviously a real human work. You can even see the cracks in the fresco!
I don't think you were *meant* to spot the mildly-famous paintings, it was just inevitable that some people would.
I'm guessing Scott will try and use the "how many had you seen before" question to adjust for that unwanted factor.
Are those supposed to be prompts? I thought they were just descriptive captions chosen by Scott after the fact.
Yes, they were just descriptive captions chosen by me after the fact. If I'd used the real names of the paintings, then people would have noticed which pictures had more flowery names.
I almost picked human on String Doll, and then noticed a small anomaly in her right middle finger and reversed my decision.
(It looks like there's another finger that is almost completely behind the middle finger but slightly peeking out, except all her other fingers are accounted for.)
I felt like this was the most easily identifiable as AI. Look at the overall composition. It's terrible. There's no coherence in how the strings relate to one another or articulate to the frame or anything. It looks fun, sure, but it doesn't make any kind of sense from a composition point of view.
Is there a way to get our answers back? I'm not confident on which way I guessed on some edge cases.
If you want to score yourself, it's more meaningful to do it in proportion to how certain you were, rather than as a series of binary right/wrong. So the ones you were most hesitant about are the ones that matter least to your score anyway.
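If you want to make "score in proportion to how certain you were" concrete, one standard way (my own illustration, not anything from the quiz itself) is a Brier score over your stated probabilities:

```python
# Brier-style scoring: each guess is a probability that an image is AI,
# each answer is 1 (AI) or 0 (human). Lower is better. A confident wrong
# guess is penalized heavily; a hesitant 0.5 guess costs 0.25 either way,
# so the edge cases you were unsure about barely move your score.
def brier_score(probs, answers):
    return sum((p - a) ** 2 for p, a in zip(probs, answers)) / len(probs)

# Per-image penalties: confident-right (0.9 vs 1) -> 0.01,
# confident-wrong (0.9 vs 0) -> 0.81, hesitant (0.5 vs 1) -> 0.25.
score = brier_score([0.9, 0.9, 0.5], [1, 0, 1])
```

The survey only asked for binary picks, so you'd have to reconstruct your confidences from memory, but this is the scoring rule that formalizes the point above.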
I don't want to be super salty here but it's tough to do this on mobile--I went back through the answers on desktop and seeing the pictures in full resolution showed a bunch of details I'd missed and/or misinterpreted when I was initially filling out the form on my phone. Which maybe speaks to how tough this test is, idk
Most surprising that they were human:
- Blue hair anime girl (What is going on with her arm?)
- Giant ship (What even is that thing?)
Most blatantly "DALL-E house style": Minaret Boat, Ancient Gate, Turtle House
I found it way harder to be confident something was human than to be confident something was AI.
> Tvnag Fuvc
It seems to me blatantly designed to look like AI art, especially gur ohvyqvatf unatvat hcfvqr qbja.
> - Blue hair anime girl (What is going on with her arm?)
Yeah this one baited me as well. The arm is weirdly distended. The hidden hands is another AI thing - often negative prompts or overweighted LoRA/embeddings to correct bad fingers will hide the hands so the AI doesn't have to attempt anatomy.
I've noticed a lot of human artists (especially amateurs drawing anime-style stuff) also like to hide hands/feet to avoid having to draw them, since they're complex and hard to get exactly right.
Digital art has looked bad since before generative AI was even a thing tbh. Especially anime waifus on deviantart. Will be honest that I guessed randomly on those because they all looked like AI.
> What is going on with her arm?
That's her wrist. I thought it was AI at first too for the same reason but then I noticed the hair fill-in patterns and realized AI would just do strands instead of that.
The test would have been a lot easier if there was a law that human artists going roughly for photorealism should draw things to a non-ridiculous scale.
I got that one right because it was bad in a really human way, actually.
For anyone participating, you might enjoy this video where VFX artists describe subtle technical "tells" of well-made AI-generated pictures and video, including stuff like noise patterns and contrast distribution
https://www.youtube.com/watch?v=NsM7nqvDNJI
I'm watching it. It's very good.
Corridor Crew has a lot of amazing videos like that, including a series of ghost/ufo debunking videos and an entire anime short film they made using ai.
This was shockingly hard to do - I was really having difficulty telling which was which.
And I have been using generative AI to make images; it’s getting better quickly.
If the painting is a really old one, we run into the problem that artists before the Renaissance often had an iffy grasp of perspective, so you can’t use those errors to tell it’s AI. A shout-out to Filippo Brunelleschi (1377–1446) is maybe in order here.
(Redacted) has no excuse, being slightly later than Albrecht Dürer.
Scott’s choice of human generated paintings is maybe biased towards the recent, i.e. Renaissance onwards, so basically everyone is trying to do linear perspective … until we get to the ones that are later than Picasso.
Well, some of them committed errors in perspective that experts (not me) would recognize as typical, no? At any rate, right, there are hardly anything pre-Renaissance here.
Yeah, the one that has me thinking “human, but old, and that’s not an AI error” is definitely Renaissance, not earlier.
Do you mean jbhaqrq Puevfg?
Yes, that one.
I don't want to look at the answer key yet for Anime Girl in Black. I will be very disappointed to discover that she has no soul.
Don't buy into artist propaganda! AI waifus are just as ensouled as traditional ones.
...technically true.
The best kind of true.
She has an *earring* coming off of her *hair*.
Huh. I got 22 out of 49 right - I picked a strategy I expected to be much worse than chance - but actually made an exception to guess that "Anime Girl in Black" was AI, because even I was extremely confident that was AI. What details make you think "Human"?
Not OP, but I guessed Anime Girl in Black was human, because the body proportions seemed right and the background scale/perspective seemed congruous. I thought those were hard for AI to get right (that's how I correctly guessed some other AI stuff). In retrospect I missed some other clues like the earring.
Also not OP, and I ultimately picked AI on this one. But one detail that pushed me towards "human" was that the water seemed inappropriately realistic for an anime picture, and I thought humans would be more likely to mix visual styles than AI, since AI would probably be prompted with a specific style to follow.
But I have basically no expertise on either anime or AI art so it's plausible I'm wrong about either or both of those.
(Also, there were 2 anime girl portraits, and Scott said he'd tried to balance themes, and I'd already picked AI on the blue one, mostly because of her arm, but also because her hands were hidden and her hat seemed to be floating. I ended up saying AI on both, though.)
> I thought humans would be more likely to mix visual styles than AI
Yeah, this bit is exactly backwards. AI struggles to keep a really consistent style across the entire image, and particularly tends to have defects where it goes too photorealistic in a small section of a work.
This was also a tell in the Robot art at the end of the test; one hip joint has a smooth gradient for its internal shadow, which is wildly out of place with the style of the piece.
>I picked a strategy I expected to be much worse than chance
Why would you do this?
I wanted to see the baseline score you'd get if you assumed that every piece of digital-style art was AI and every piece of non-digital-style art was human, as a sort-of proxy for the number of 'trick questions', or a test of the art genre's balance. But since I made three exceptions in cases where it was really obvious (two towards AI, one towards human), I suppose the actual baseline would be 19/49.
Which would *still* not be the worst score reported in these comments, so I'm interested that there are strategies even more anti-correlated with success than mine.
>What details make you think "Human"?
Wishful thinking. Oh cruel world!
I was impressed by the anatomy of the left shoulder, both the tip of the collar bone and the armpit creases. The indications of the crook of the elbows is just right, unlike on "Cherub"'s left arm, where there's just a crude line across the elbow. And the way the dress is rendered I can practically feel the spandex/polyester fabric. "Blue Hair Anime Girl" is good, and its human artist might not like to hear this, but it's not at this level.
Alas, I have to recognise Anime Girl in Black's misplaced earring, the armpit creases of the right shoulder going up too far, and the details of the bronze ornamentation breaking up a bit on close inspection. But if this quality could be sustained over a whole animation, I'd watch it.
But look at her *hands*, though!
The realistic armpit folds are a tell. Turns out you prefer 3D after all.
ugh, I only got about 70% correct when I had estimated closer to 90. The weird one-off styles without anything to compare them against were the hardest.
I can also confidently say that anime girl portraits have been cracked by AI.
The “bias in the training set” AI risk people would doubtless have something to say about the choice of training material for some of the image models…
I found the anime and digital art ones among the easiest to pick, and I don't watch anime.
OK, here is a selection of things I got wrong. Can you give me your perspective on how I got them wrong? That is, which things were evidently AI and why? And how were my reasons for thinking some human art AI actually spurious?
Pureho: N, guessed U
Not bad, AI! I didn’t like this, but I went with U simply because it had good hands. I think someone better at art history would have spotted this one – the style seemed a bit inconsistent.
I can’t remember what I chose for Gebcpny Tnegra, as I was really on the fence. Point against human: very poor drawing of a succulent in the foreground.
Terra Uvyyf: N, guessed U
Well done, AI!
I can’t remember what I chose for Yrnsl Ynar: N because I was on the fence. Part of the wall/mound on the left side is off but then things are sometimes off in impressionistic landscapes.
Zbgure Naq Puvyq: N, guessed U
I’m surprised by this one! (I picked it as “most likely to be human”.) I went with U out of an immediate Gestalt impression. I kept finding reasons to have doubts, but they were at the level of hunches, and the hands were pretty good. Seems less convincing after zooming in.
Checyr Fdhnerf: U, guessed N
Seemed dull to me. Sorry, Cnhy Xyrr. I guess that one of my hunches about Zbgure Naq Puvyq was correct: “maybe AI is trying to be shiny and attractive in a shallow way, and works the way Pepsi wins blind tests against Coke”.
Crbcyr Fvggvat: U, guessed N
I guess that bizarre Dutch table-like piece of furniture really did exist. Otherwise I thought it was a good painting from a well-defined period.
Evirefvqr Pnsr: N, guessed U
Had an AI hunch, thought I was guessing N too often, couldn’t find an overwhelming reason to go N.
Inthr Svtherf: U, guessed N
I thought “this could be human art that would be easy for AI to imitate”.
Juvgr Synt: U, guessed N
I liked it but the scene depicted seemed incoherent. Oops.
Jbzna Havpbea: U, guessed N
“This is a good one, but Lady and the Unicorn motifs are earlier, and one of the hands isn’t quite convincing”, I thought. Ooops. (Actually, that bit about their being earlier is wrong – the famous tapestries are early XVIth century, they are just housed in a medieval art museum.)
Pvgl Fgerrg: N, guessed U
Not surprised. Though “I have been choosing N too much; some art is just bad human art”.
Pvgl Fgerrg: N, guessed U
I. Should. Not. Have. Counted. The strange thing the man is riding (is it a bike with a rear basket or a carriage squeezed to the proportions of a bicycle?) should have convinced me to guess N. Also, the buildings on the left side are barely plausible if the perspective is right.
Cerggl Ynxr: N, guessed U
Not so surprised: I had doubts after zooming in a little bit (the only time…) and I felt a bit guilty though I confessed. (I just pressed Ctrl-+ twice on a laptop – lots of you guys must be working with big screens.) The Dali-like tendrils on the road were what made me think this could be N; they are more obvious if one zooms in more. Also, the trees look a little too fractal.
Synvyvat Yvzof: U, guessed N
I feel like Scott is being evil here. He isn’t just picking particularly successful attempts at AI art – he is deliberately choosing some human pictures that share some characteristics of AI art: lots of little figures of the same kind in a jumble, bad hands. (Perhaps I should have thought: the hands here are just *too* bad, and AI has made progress in that particular (more than I expected in fact).)
You didn't spend enough time with flailing limbs.
It might seem a bit AI-like at first, but the more you look at it, the more figures you find, and the more sense things make. AI is very bad at doing that effect.
There was no image that I was more certain to be human than flailing limbs.
You are probably right: I thought "but this is telling some sort of story that starts to make sense" but then overruled that by "no, you are projecting".
Yeah - I really enjoy a lot of AI art, but that one just felt startling in a way I'd never seen before, one of my favourite pieces I can remember seeing.
"Checyr Fdhnerf: U, guessed N"
It was just very consistent, no weird mismatches in the pattern-work.
Zbgure Naq Puvyq
Guvf sryg yvxr fbzrbar nfxrq sbe n Xyvzg naq/be n Olmnagvar cnvagvat va n cubgbernyvfgvp fglyr. Vg znqr ab frafr jvgu jul fbzrguvat yvxr guvf jbhyq or fb qrgnvyrq.
Evirefvqr Pnsr: N, guessed U
Gur punvef zvfzngpurq gur gnoyrf, gur gnoyrgbcf jrer n ovg gbb jvful-jnful.
Inthr Svtherf: U, guessed N
Too consistent, no weird blurring etc.
Juvgr Synt: U, guessed N
Lrnu V nterr guvf jnf n gbhtu bar ohg gur cnvag fglyr ybbxrq gbb pbafvfgrag, naq juvyr gur svtherf xvaq bs znqr ab frafr V raqrq hc tbvat jvgu U orpnhfr bs trfgnyg, gur erq synt va gur qnex ng gur gbc bs gur vzntr xvaq bs zngpuvat gur juvgr synt'f funcr jnf fb flzobyvp.
Cerggl Ynxr: N, guessed U
Gerr ebbgf jrveq naq gubfr penpxf lbh zragvbarq gbb.
Synvyvat Yvzof: U, guessed N
This was my biggest embarrassment, I also guessed as you did. Sam is 100% correct.
On Checyr Fdhnerf and Inthr Svtherf: I guess I just haven't played enough with AI to know what kind of mistake AI would be likely to make nowadays on this kind of image.
Juvgr Synt: Right, if I had noticed how the shapes of the red flag and the white flag matched in the world being modeled (while their projections to the 2-dimensional plane are different), that would have been a giveaway. However, the activities of the characters on the foreground still seem oddly disjointed, and their spatial relation to each other is not so clear.
Evirefvqr Pnsr: I can see your second point now. I don't see how the tables and chairs are mismatched but then most of my tables and chairs are probably mismatched IRL.
Zbgure Naq Puvyq: That's an example of a good art-history reason that went beyond me - there are gaps in my 19th-century/early-20th-century art knowledge - for instance, when it comes to Russia and most other Slavic countries, which had plenty of technically proficient people who were up to I don't know what. If you told me there was a painter around 1900 whose aesthetics were influenced by early Art Nouveau/Sezession but also aimed for a precise, detailed rendering of light on the human body, I would believe you.
It's a bit disturbing that one can argue (soundly) "this is not human, as there is too much attention to doing things realistically" - especially when it comes to something that is not just some anime nonsense.
Sorry, I was being too brief with the tables. What I mean is if you take a look at tables 4-6 (counting away from us), their corresponding chairs kind of lose structure/placement, e.g. one chair kind of sits awkwardly far away from its table, and also I'm not even sure it's all there.
I think to your last point, we kind of get into the issue of context as others have brought up. In isolation, I would have no idea if this is a painting; I mean there're no dead giveaways/errors that I saw. It's just the resemblance to other styles and the nature of this quiz and knowing that AI can transfer styles with ease that pushed me in that direction. The anime comments ("too realistic skin" "too realistic shading") I had no idea about, went on errors and artifacts with those!
> It's a bit disturbing that one can argue (soundly) "this is not human, as there is too much attention to doing things realistically" - especially when it comes to something that is not just some anime nonsense.
To be fair, the actual critique here is that the AI is being stylistically inconsistent; it's mixing a more photorealistic technique in some places with a more stylized technique in others. See also the left hip joint in Punk Robot; you have a gradient-shadow on the cylindrical surface, which is photo-realistic, but very out of place in that art style.
The "how much do you know about AI images" question is missing a "dedicated hater" option. For those who don't deliberately seek out AI images, and certainly don't generate any, but have studied how to recognise them to avoid them.
Right now the options are like:
- I have no idea what AI images are
- I dabble in AI tee-hee
- I generate images regularly and look at them
- I LOVE AI 💖
Which... uhm... does not cover all possible views.
Most haters are unvirtuous, and avoid learning about AI because it's morally tainted. The landscape is always changing, and it takes dedicated time and attention to keep up on all the latest AI news, and there is no downside in saying you hate AI without learning anything about it.
I'm not necessarily an AI hater, but I think it's interesting that you think it's necessary to keep up with AI news in order to have a valid negative opinion of AI. Why can't one be a hater from first principles?
Edit - sorry I realized I misread your comment, I think you are just saying that haters tend not to seek out information about AI, not that being a hater without knowing about AI is morally bad. Although I'm still not sure what "unvirtuous" adds.
The “it can’t be AI - her boobs would be bigger if it was AI” heuristic might work for NovelAi, but less effective here. (It’s a noted bias in certain AI models).
AI can do all kinds of boob sizes! Any boob size or shape you want, there's an AI that can do it.
Missed the “don’t zoom in” instruction, sorry, although I think it was only dispositive in one case and probably led me astray in others.
I didn't see that instruction either, but images were low res enough that it didn't seem to help to zoom
I think it was relevant to the impressionist pictures appearing to use brushstrokes, and also an artifact in part of the "Lander" image.
Actually, I just loaded the survey on my laptop and I would have changed a bunch of my answers compared to looking at them on my phone. Oopsies!
Same, the request to not zoom in was really rough on my screen
“Surely no human being would create a picture as generic and clichéd as that” - uh, no, that was a human.
I mean, if no human had done it, it wouldn't be a cliché.
Yeah, my experience with Tvnag Fuvc
Is there a way to get my answers scored? I'm not going to go back over it by hand, trying to remember my answers.
Also, it's trickier to distinguish AI from photoshop and the like.
I’ve noticed that generative AI doesn’t decouple artistic style from other details of historical period, so e.g. if I give it a sketch and tell the AI to work it up in the style of a painting by Jean Honoré Fragonard, I am going to get eighteenth century costume no matter what I draw in the preliminary sketch. I can work with this by drawing the sketch with the corresponding period detail, of course.
Also, when using image to image, any minor errors in your drawing of the human figure will tend to get transferred to the output rather than fixed up, so all your practice from life drawing classes comes in useful.
I didn't notice you telling me not to zoom in. I don't think it changed more than one of my answers.
https://manifold.markets/Prime/what-will-the-median-correct-score
What will the median correct score be on Astral Codex Ten's AI Art Turing Test?
I think it’s generally bad form for there to be a market on “what will be the most popular answer to this open online survey”. This creates incentives to brigade that will not just ruin the market, but also the survey. We should have learned this lesson from Boaty McBoatface.
I considered that but think the risk is low, if there's a consensus to close it or Scott feels the same, I'll close it.
It is a fun idea but I think you should close it because skewing effects
I echo the others -- I think you should close it. Chastity's adversarial example is good.
I’m confused… what could possibly go wrong here? Am I missing something obvious?
Pump 90+%, and then fill out the form like 3,000 times with perfect stolen answers.
Oh. That would work. Thanks for explaining.
65%
"I've tried to crop some pictures of both types into unusual shapes"
Composition is an artistic choice. By cropping you introduce human input into the AI images and something uncanny into the human images. Even if humans aren't able to complete this task, they might be able to sniff out the AI images in a regular setting.
It's not as simple as comparing performance on the images that have not been cropped. Knowing some images might be cropped also changes the way we categorize the uncropped images.
You can get around this by doing randomised cropping, each image gets cropped between 0-20% on each side at random.
This doesn't work either.
Agreed. With the cropping, I couldn't get a gestalt impression of the art and had to fall back on looking for "tells"—which was in turn harder, because (especially with traditional painterly stuff) some details would be interpreted differently depending on where they fall in the composition.
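The randomised-cropping idea a few comments up could be sketched like this (a hypothetical helper for illustration, not anything Scott actually ran), trimming an independent random fraction between 0% and 20% from each side:

```python
import random

def random_crop_box(width, height, max_frac=0.20, rng=random):
    """Return (left, top, right, bottom) pixel bounds after trimming
    an independent random 0-to-max_frac fraction from each side."""
    left = int(width * rng.uniform(0, max_frac))
    right = width - int(width * rng.uniform(0, max_frac))
    top = int(height * rng.uniform(0, max_frac))
    bottom = height - int(height * rng.uniform(0, max_frac))
    return (left, top, right, bottom)

# The resulting box could then be fed to an image library's crop call
# (e.g. PIL's Image.crop(box)) for every image, human and AI alike.
box = random_crop_box(1024, 768)
```

Since every image gets the same random treatment, "this one looks oddly cropped" stops carrying information about which pool it came from, though it still costs you the original composition, as noted above.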
Kvestion: how many here could tell a computer screen image of the genuine Mona Lisa from a well-executed copy?
Depending on how well is "well-executed", nobody?
I don't know much about art or about AI, but Good Art (tm) obviously has a subjective component and a technical (ie. much-less-subjective) component - and (I think) "artistic value" is a separate, nearly-orthogonal concept to "good art":
_______________________
Subjectively good art:
Literally by definition, if people can't tell the difference between human and AI art, the AI art must be just as good subjectively. If people can tell the difference today - well, it's pretty clear it won't be too many years* before they can't.
(* or decades or centuries - we're concerned with the general principle here, not the specific timescale)
_______________________
Objectively/technically good art:
Whatever the non-subjective, technical skills aspects of art are (I don't know much about art but I'd guess light/shade, perspective, [golden] ratio, choice of colours/palette, etc. etc.?), computers are much better at processing and applying complex technical rules (and at scoring candidate pictures according to technical rules, which, when you can produce lots of candidate pictures super-quickly, more-or-less amounts to the same thing in a sort of P vs. NP sorta way..)
_______________________
Artistic Value:
Very different to "good art"! If the *value* of art is in its *creativity/originality* [eg. the Mona Lisa has an artistic value of 10, perhaps the first copy of it - though entirely equally good in terms of the technical skills it exhibits and how it subjectively makes you feel - only has an artistic value of perhaps 1, and the tenth copy of perhaps 0.000001...] then how can an AI, which is literally just mindlessly aggregating/copying millions of existing pieces of art rather than doing anything creative/original, have any value?
WELL: what is a human artist doing? Human artists make art by combining their internal feelings/inspirations/moods/whatever (which is totally unpredictable and irreproducible and basically just a really complicated random-number generator) with the things they've learned from looking at and thinking about millions (or at least thousands) of other pieces of art; in other words, exactly what the AI is doing! Probably you could express this as something like, "artistic value = specific neural weightings generated by exposure to large quantities of art + RNG", in both the human and the AI case..
_______________________
(Also: "Fancy Car"?! FANCY CAR?! It's *OBVIOUSLY* a late-1980s Testarossa! How can Scott possibly not know this?! Besides, [to paraphrase P.G. Wodehouse] anybody who could describe a 1987 Ferrari Testarossa merely as a "fancy car" would probably be content to call the Taj Mahal "a pretty nifty tomb".....)
I think your last point about “fancy car” undercuts your first one “if people can’t tell the difference then it must be equally good”. I can’t tell the difference between the cars, and probably 90+% of the people who take this survey can’t. But a relevant target audience can.
Art has a target audience, and other people can educate themselves to be in that audience. If the target audience can’t tell the difference then it’s probably equally good. But if a bunch of internet randos can’t that just means we aren’t in the target audience.
My point about the Testarossa was, well, just me poking fun at myself for noticing and caring about such things (which is... really not a typical indicator of a cool or sexy or interesting person, here in 2024...) - sorry for subjecting you to my peculiar inwards-aimed sense of humour, there wasn't really any meaningful point there, and of course I entirely agree that if a person can't tell the difference between a thing made by A and a thing made by B then as far as that person is concerned A and B must be basically the same in terms of quality/value.
Artistic value is a subjective metric, since it's on the "ought" side of the is/ought divide, so the question of "what is art" (and also "what is good/valuable art") basically means "what would I want art to be". That's why I think that tying the inherent value of art to information processing is doomed to fail in the same way as trying to define the terminal values of humanity in terms of information theory. It's just too cargo cultish.
Human effort is a crucial part of art for me. Art is the deliberate application of effort in an anti-inductive direction that makes humanity special. It's the conversation that humanity is having with itself. It's half of the difference between a thriving utopia and a wirehead dystopia (the other half being the network of social interactions).
The aggregation and processing of previous art is neither necessary (see outsider art, or prehistoric people) nor sufficient. One of the distinguishing marks of trashy movies and bad fanfiction is that they seem to run mostly on genre cliches due to their creators' low engagement with reality. As Hayao Miyazaki said, the problem with modern anime is that it's made by people who can't stand looking at other people.
Thanks for your reply! Made me think a great deal!
I think I understand what you say about human effort: if I understand you right, you seem to be saying something like "I don't think human art necessarily makes a piece of art any better or worse, but it makes me glad to know that a human put lots of effort into the art; it makes me think positive thoughts about the human spirit and the potential of human civilisation - since I can only get this from human art and self-evidently not from AI art, then human art is far more valuable to me." - is that right?
If so, I don't really get this feeling with paintings - to me, the artistic value is whether it says anything new or original, whether it says/does things that another artist would not have been able to say in this way, whether it advances the state of the art (pun fully intended) - in which case, of course AIs can produce work of equal (and eventually presumably greater) value to humans (for clarity - I don't think they have yet, I just expect they shall eventually) - but I *do* get this (or at least something like this!) feeling with architecture: to look at a wonderful old Basilica or something (or, for that matter, something like Stonehenge) fills me with awe and wonder and pride on behalf of my species* that we had the passion, intelligence, skill, and gumption to design this thing without computers, build it without cranes and JCBs and things, making it all fit together, keeping the project going over how many years - and crucially making it so beautiful and complex and inspiring (pun also fully intended...) Whereas, the computer-designed, JCB-built Shard or Burj Khalifa or whatever ...doesn't exactly inspire me with anything, really!
(* or at least makes me temporarily forget about how greedy, selfish and cruel the human spirit really is..)
As for the aggregation and processing of previous art being unnecessary and insufficient, I can't tell what you're trying to say:
If you're saying it's unnecessary & insufficient to make you feel the way about AI art that you feel about human art (or that I feel about architecture), then sure, absolutely.
If you're saying it's unnecessary & insufficient to make pieces of AI art that are indistinguishable from the best human-produced art then (whilst I suspect this Turing Test will demonstrate that yes, this is currently the case) I would be frankly astonished if it's the case *in principle*.
If you're saying it's unnecessary and insufficient to make human art in the first place - well I would say that 1) it's not like the prehistoric cave painters or the outsider-artists didn't make their art by combining iterative learning plus training on a large sample (even if that sample was observation of the world around them rather than other people's art..) plus the incorporation of some basically random factor that's inside them that they can't understand, control, or predict, just the same as other artists and AIs; 2) that this kinda demonstrates that art is basically a pretty easy field of creativity (relatively-speaking): I can imagine that in principle you could get an outsider-art or prehistoric painting that's as good as the Mona Lisa** - but not an outsider-art cathedral or a prehistoric 1000-page novel; 3) if you did a "trained on large collection of paintings vs. made it all up from personal experience alone" sorta Turing test, I very much suspect that the former would be very easily distinguished and almost-universally more liked, suggesting that actually aggregating and processing art probably is kinda necessary after all...
(**Disclaimer: I'm not a Mona Lisa fan and I only really appreciate it for the way LDV - er, before his lateral move into designing lorries in the West Midlands - somehow managed to paint with an alpha channel, which is obvs. a purely technical-skill sorta thing; just using it here as a placeholder for "widely-appreciated piece of art"..)
What I'm saying is that I would not want to live in Nozick's experience machine. I would like art I see to be made by living, breathing people even if I'm not aware of the difference. I would like humanity in general to make and share lots of art even if I can't keep track of it all.
I'm sure that someone could set up a network of bots that simulate an artistic scene, set up exhibitions, influence each other, go through trends etc. They could even have personalities and biographies that would shine through their work. It would be very fun to watch, and maybe I'd get emotionally invested, like I get invested in the lives of fictional characters. But ultimately, it'd be like Bostrom's Disneyland without children.
As for "aggregation and processing of previous art being unnecessary and insufficient", I'm disagreeing with the idea that this is where the value of art lies, that this is basically all that a human artist is doing, and that their own contribution amounts to a random number generator. Even under the notion that a formal, mechanistic definition of valuable art is desirable (as I mentioned, I value it for being anti-inductive, free from formality), you don't get closer to it by neglecting the artist's experience, since empirically (IMO) art that lacks it is usually poor.
I think I understand most of what you're saying (though I can't exactly disagree; de gustibus non est disputandum!), but I'm having trouble with the idea/assumption that personal contribution isn't effectively a random number generator. It appears to me that "the thing inside us that makes our art uniquely ours" must be either:
1) Dependent on the experiences/life we've had - that we don't get to choose and can't understand or predict and is thus essentially random from our point of view. If you go back in time and give teenage Leonardo da Vinci a different maths teacher or something (but keep everything else the same) then the Mona Lisa looks verrrry slightly different; if you go back in time and transport baby Leonardo da Vinci to 1970s Bristol then "Mona Lisa" comes out as a drum'n'bass record. Or:
2) Dependent on minute changes in how our heads are made (eg. our genes, how our brains are wired-up, etc.) - that we don't get to choose and can't understand or predict and thus is essentially random from our point of view. If you drop Leo on the head as a baby or take all the lead out of his environment then you change the Mona Lisa. Or:
3) Absolutely not dependent on any external physical factor, but is somehow sending "signals" into the physical world from "elsewhere" - in which case we *definitely* don't get to choose it and *definitely* can't understand or predict it! If you could somehow scramble the alien space rays, the Mona Lisa comes out scrambled.
Like, if this trichotomy is valid (ie. some combination of these must be true) then I don't see how the random combination of unknowable and out-of-Leo's-control factors that made Leo different from some other artist is any different to the random factor added to your AI's training/learning/output/whatever that makes it generate interesting, beautiful (and presumably eventually indistinguishable from human output...) art? Or, if the trichotomy is invalid, I don't see what other possibilities I've missed?
Point 1 is the closest, but the scope should be wider - it's dependent on the aggregate experience of large chunks of humanity, which includes both the creator and the audience. You don't always know the details of the creator's life, but the things they have in common with you are what enables them to innovate in ways that resonate with you, rather than flail around randomly in the latent space.
Could a generative model trained exclusively on pre-1900 music invent Heavy Metal? Nothing in its training data suggests anyone would enjoy such a thing, and in 1900 probably few would. It was shaped by a twisting path from West Africa through the US to the UK. Some of the stations along the path were the integration of Black people in the US, the physics of electric pickups and amplifiers, the rise and fall of the hippie movement, and the physiological experience of moshing in a pit.
A sufficiently advanced generative model that runs on preexisting music and randomness alone could create new music that exists in *a* world, most likely not ours. It'd be fascinatingly alien, but it would not be relatable. Maybe you'd need an AI that also scans magazine articles, concert videos and social media, IDK.
Correction: the chunk doesn't have to be large.
In the examples here, the two most prominent parts of the human side of the frontier are concrete scenes, and stylistic rawness that hasn't yet been codified into an imitable style (like impressionistic brush strokes). This means that while it nails late 19th century-style nature landscapes, it hasn't yet reached medieval, expressionist and abstract art, or relatively raw 3d renders (whenever AI imitates 3d art it gives it an overly polished sheen).
And of course, anything that isn't a digital file is out of reach for image generators, so in the offline world painters aren't in danger of replacement yet. I'm not trying to be annoying, I think that the distinction between a painting and its digital copy is an underemphasized point.
I've been wondering if highly textured paintings - palette knife or whatever - will become more popular, as something that's unavailable with AI art. I think prints can be made with some texture, but not the really sharp peaks and whatnot.
That's a great summary! I was thinking along similar lines but you put it better.
I think that impressionism is also particularly easy since it is blurred in a way that makes it easy for AI to hide minor inconsistencies.
Also, AI seems to be less conceptual. Sometimes, when I was on the edge, I asked myself "does this composition make sense?" and got it right each time in those cases. That is harder to do when the image is cropped, e.g. to a single person.
There's something poetic about the fact that the two most obviously human images were both blending masses of human figures.
This all reminds me of the time I visited the Museum of Fine Arts in Boston. Just an endless amount of completely uninteresting impressionist and Renaissance art. There was one exhibit that caught my eye though: the works of Hyman Bloom. They had a bunch of his photographs and graphite sketches of the woods of Maine, all of which were beautiful in their own right, but the best work they had displayed was Landscape, a giant oil painting of this vista of rotting trees. It just had this grand and intoxicating presence... I ended up taking a picture of it. https://i.imgur.com/b57Jjj9.jpeg Unfortunately, they couldn't display most of his works for reasons that will be obvious once you see them. I think Cadaver on a Table is my favorite; it's exactly what it says on the tin, so don't say I didn't warn you.
I was actually torn on the blended mass painting. I kept wondering why it prominently included a naked man striking an anime gunslinger pose.
Someone has probably already done “Donald Trump being protected by US secret service agents” in the style of some appropriate Renaissance painter. (Caravaggio comes to mind).
https://en.wikipedia.org/wiki/The_Entombment_of_Christ_(Caravaggio)
(NB not a spoiler for Scott’s test, as this isn’t one of his images).
I did a little better than I thought I would. I wasn't very confident and thought I'd get around 60%, but wound up at about 67%. And a few of the ones I got wrong were ones I was very confident about too. I'll be curious to see how the community as a whole did.
Well, I got 36/50 right, and estimated that I'd gotten 70-80%. Decent calibration at least.
I was annoyed that the available ranges for "how do you think you did" were 50-60%, 60-70%, etc., because I thought I'd got about 60%. In fact, I got exactly 60% :-).
Of the six that surprised me significantly, five were surprises of the form "AI is better at X than I thought it was". The sixth was a human artwork that I'd initially confidently put down as human and changed my mind about on looking more closely, because of anatomically ridiculous fingers. I should have held my ground on that one.
(The surprises: rot13(reho, terra uvyyf, yrnsl ynar, zhfphyne zna, evirefvqr pnsr, fgvyy yvsr). I was fairly confidently wrong about all of those except that rot13(zhfphyne zna, nf zragvbarq nobir, V vavgvnyyl pbeerpgyl gubhtug jnf fheryl uhzna, naq fgvyy yvsr V jnf qbhogshy nobhg orpnhfr gur evtug-unaq pnaqyr-qevc ybbxrq vzcynhfvoyr).)
Main takeaway is that rot13(NV vf cerggl tbbq ng vzcerffvbavfz abj).
[EDITED to add: oops, the one I thought I mis-corrected from human to AI _was_ in fact AI, so that wasn't a mis-correction.]
zhfphyne zna is obviously AI if you look at the background, though it is easy to get distracted from that by how good the faces are. (Also, the hands look substandard, but that's no longer a reliable sign.)
Wait, which one is the human artwork? zhfphyne zna is AI.
oh, wait, I'm an idiot. I got 31/50 not 30/50 because what I claimed above was an anti-correction was in fact a correct correction. Thank you.
So _all_ my confident errors were thinking things were human that were actually AI. (Lots of my not-confident errors were the other way.)
Yes, we can consider impressionism to have been cracked as far as the man on the street is concerned. It would be good to have an expert onboard telling us how to distinguish AI impressionism from actual impressionism.
I submitted the form before checking, but I think I got 33/50 and guessed 70% - 80% certainty. I'm fairly ignorant of art in general so I didn't recognize anything here. I also dabble in AI image gen frequently so I know a lot of the telltale signs. I found the images that were more abstract and lacking detail to be the most difficult. They could really go either way and aren't exact enough to determine whether things are wrong or just a deliberate stylistic choice.
This was awesome. I recall seeing a tweet about how "AI art was so easy to identify," to which I responded with plane_with_red_dots.jpg (https://en.wikipedia.org/wiki/Survivorship_bias). Hopefully this survey will convince people that AI art is not at all obvious (and will only become harder to identify in the future).
And I owe an apology to a certain practitioner of (rot13) Serapu Npnqrzvp Neg for thinking they are too photorealistic to be human.
Wait, do you mean the new addition? I was on the fence about that one precisely for the opposite reasons - because I did not know whether it was normal for a painter of the time to leave out the details it left out. It's very modern in some ways; I was confused because the dress/furniture/hairdo seemed to be from the period it's actually from.
(Rot13) Terrx Grzcyr
Ah, I spotted that as a (very good) human painting that possessed some superficial characteristics that made it look like AI art.
So... is that an accurate title? I thought that thing was an anachronistic soup.
Basically? It's meant to portray writers and even painters from different eras crowning Homer.
I was almost sure it's a modern collage made from different paintings.
Oh, very well, then, to proceed to the lively arts, I present to you an early collaboration between Conway Twitty and Fifty Cent:
https://youtu.be/K4Qg99MhZQM?si=AZdnLdFA0xhCHQUb
And also when Hank Williams Sr. tried his hand at gangsta rap:
https://youtu.be/2Jh7Jk3aSlo?si=cFDajrGP9Ox9ri7-
I got 68% correct. What about other people here?
That was interesting! I did a lot worse than I expected, as some of the art really is exceptional at copying human styles.
I think that somehow AI models are really good at impressionism ... I'm not saying it's aliens but ... perhaps Monet was an alien AI?
I think it is because impressionism sort of hides the clues that give away some other AI art, it is kind of dream-like the way a lot of AI art is, with many objects and details which almost but not quite match.
I think (without having actually run the analysis to be 100% sure) that strongly impressionist paintings have vastly less information, mathematically speaking, than similarly sized non-impressionist works. The lack of high-frequency components almost forces this to be true, which means we're simply working with less data for distinguishing them.
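The compressed-size intuition can be sketched with the standard library alone: blurring strips fine detail, and a general-purpose compressor then needs fewer bytes for the result. This is only a toy illustration of the proxy (synthetic 1-D "pixels" and a crude box blur, no real paintings involved), not a measurement:

```python
import random
import zlib

random.seed(0)

# "Sharp" image: independent random pixel values (lots of fine detail).
sharp = bytes(random.randrange(256) for _ in range(10_000))

# "Blurred" image: the same pixels after a crude 9-tap box blur,
# standing in for an impressionist wash that removes fine detail.
blurred = bytes(
    sum(sharp[max(0, i - 4):i + 5]) // len(sharp[max(0, i - 4):i + 5])
    for i in range(len(sharp))
)

# Compressed size is a rough proxy for information content:
# the blurred version compresses to fewer bytes than the sharp one.
print(len(zlib.compress(sharp)), len(zlib.compress(blurred)))
```

The absolute numbers don't matter; the point is only that the blurred signal reliably compresses smaller, i.e. carries less information for a classifier to work with.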
Yeah, you put it better than me :-)
Ok, so the oil painters are cooked. Looking forward to “which of these Linux device drivers were written by AI and which by human programmers?”
At current rate of progress, some time next year, probably.
I submitted it and looked at the answer key, but I don't remember all of my answers :(
Was pleasantly surprised to see the Slav Epic in this collection.
I was very well calibrated, but I incorrectly judged several human pieces as AI. Now looking at them more closely, I'm realizing they're not AI... just badly proportioned or poorly detailed.
Got the majority of the first ones wrong while the last 6 I got completely right (but with low confidence). I'm not an art dude, and I tried very hard to just give an initial impression for the first ones as instructed.
A lot of the AI ones look really convincing on first glance but any inspection of the details gives them away. On the other hand, I said all of these pieces looked bad or like imitations and several are famous pieces so maybe I just don't like visual art.
Really? I thought two of the religious pieces were master-level (I'm an atheist, so I consider myself unbiased in that way). They were both human (phew), though I can see why Scott has wickedly chosen them.
(I think with the all-caps warning, and Scott's rot13'd answer key taking the top comment spot, we should be able to freely discuss down here; I do not want to have to un-rot13 anything other than the answer key).
Whoa! I am surprised Giant Ship was human. I had chosen that as my favorite, and thought of it as a perfect example of what's cool about AI art: dreamlike scenes with lots of detail.
I thought they were all AI, but decided I should play along and choose human for some. I wonder if I would have done better if I had fully believed it was 50/50... probably not.
Giant Ship fooled me too. Especially the upside down towers and the over the top detail you mention. But I guess it was a bit too perfect for AI and ... repetitive. AI tends to make small variations on similar objects even when it makes little sense.
I did not like the image but I did vote it as the one where I was most sure that it was AI :D
The two big cues on Giant Ship were:
1) The rigging was all perfectly symmetrical. If there were 11 vertical ropes on the port rigging, then there were 11 on the starboard rigging. Current AI art can't even count to 5 fingers; it definitely can't count to 11 ropes, repeatedly.
2) All the background ships were a radically different style from the center one, which would be very challenging to prompt an AI to do.
It seems more difficult to tell for the more modern art styles.
You need to follow the fingers!
Svatref ner bsgra n tvirnjnl va NV neg. Sbe rknzcyr ybbx pybfryl ng gur svatref bs “Tvey Va Svryq”.
Nyfb gur eryngvir fvmr bs gur yrsg cvaxl va “Pureho” frrzf gbb fznyy.
Fgevat Qbyy: Gur svatref ner bss gbb (ybbxf yvxr fvk svatref sbe gur evtug unaq).
Zhfphyne Zna: evtug unaq bs gur fuvegyrff thl ybbxf svar, gur yrsg unaq qbrf abg.
Snapl Pne: U
what are we doin with that side mirror tho
V jnf 100% fher gur pne jnf NV. Gur fvqr zveebe, gur evz yvtugf, gur jurryf ng qvssrerag urvtugf, gur sebag yvtugf. Abguvat nobhg gung pne ybbxrq erzbgryl cynhfvoyr. V nz fgvyy pbasvqrag gung Fpbgg cvpxrq na rkvfgvat NV vzntr sbe gung bar naq zvfynoryrq vg uhzna.
will you email grades or something?
I can't be bothered to decode and remember what I picked.
Note to self: 00011101001011110000101001100001110011010010110000
answer key: 00010110011010011001101101100010110011010011110011
match key: 11110100101110010110111011111100111111111110111100
72% hits, about what I expected.
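For anyone keeping score the same way: with two equal-length bit strings, the hit rate is just the fraction of positions that agree. Using the strings from the note-to-self above:

```python
# Score a binary answer sheet against the binary answer key
# (strings copied from the comment above).
mine = "00011101001011110000101001100001110011010010110000"
key  = "00010110011010011001101101100010110011010011110011"

# Count positions where the guess matches the key.
matches = sum(m == k for m, k in zip(mine, key))
print(f"{matches}/{len(key)} = {matches / len(key):.0%}")  # prints "36/50 = 72%"
```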
The answer key in the comments inclines me to not trust the results of the poll. I think it would have been better to not post the answers until after the form is closed.
Cheating this test is as simple as doing a google image search. Not cheating is entirely a voluntary agreement. Posting the answer key provides very little help, and since it's ROT13'd, you'd still have to make a conscious choice to cheat.
Claude unrot13ed perfectly.
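(If you'd rather not trust an LLM with the answer key, Python's standard library decodes ROT13 directly; for example, on one of the image titles from this thread:)

```python
import codecs

# ROT13 is its own inverse, so "decode" and "encode" do the same thing.
print(codecs.decode("Tvnag Fuvc", "rot13"))  # prints "Giant Ship"
```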
This was fun. Good idea.
I wonder, there were a surprising number of AI images with the classic finger issues, among others. Is that still state of the art, or a deliberate inclusion to make a representative sample of AI art?
Did fairly poorly I think, but am fine with that as I don't much care for most of these art styles to begin with; it's easy to fake an image the eyes endeavor to veer away from.
Was extremely disappointed, however, that I missed on an image I'd already seen elsewhere. I forgot the context and guessed wrong.
I will say "picture" is somewhat vague; there were a couple I spent a while debating whether it was an AI image, or a photograph. I'm not sure whether or not that was supposed to be in the scope.
Given that you are collecting emails anyway, maybe you could automagically email everyone a sort of scorecard?
Got 61% (30/49) using the “is this utter garbage + is there an insanely low taste/technique ratio” heuristic.
I’m curious what would happen if you compiled this test but it was like… good artists only, though taste is subjective.
I’m quite surprised Mediterranean Town and Rooftops are “AI” in that yes they do the “mix 3 styles together” thing but it actually fits well enough I assumed it was intentional.
Bucolic Scene, Fancy Car, Greek Temple, and Giant Ship are prime examples of “the taste to technique ratio here is so skewed I don’t even want to start comprehending the amount of trauma these humans went through”
I must admit that, as a dilettante who doesn't appreciate art, this test has me utterly shocked at how bad human-generated art can be, and a lot more bullish on the "unless there's a particularly good artist I like, I should always commission a thing from an AI" stance.
... what? Fancy Car and Giant Ship I agree, Bucolic Scene is OK as an example of its genre, but I thought Greek Temple was good stuff (and it is indeed by one of the greats, who incidentally sometimes perpetrated very bad anatomy, but not here).
I looked at Fancy Car and Giant Ship for a while because they seemed too good to be AI, but I marked them as AI anyway because surely a human wouldn't spend that much time on drawing these
72% correct guesses, and I guessed that I was 60-70% accurate. Abstract art feels like cheating here though, I have no way of telling whether a human or AI assembled this random jumble of ugly shapes and I wouldn't care either way.
I got all 6 abstract art pieces correct. I think I did better on abstract art than on any other category. Here's how I could tell:
1. Look at the thin lines and small dots. AI has a tendency to blur lines/dots in a way that's not possible with physical paint. (rot13) Va "Natel Pebffrf" (NV) ybbx ng ubj gur guva yvarf ner oyheel naq unir vapbafvfgrag yratguf naq guvpxarffrf, naq va "Senpgherq Ynql" (NV) abgvpr ubj va gur jvaqbj-ybbxvat ovgf, gur juvgr naq qnex cnegf xvaq bs oyraq gbtrgure. Jurernf va "Perrcl Fxhyy" (uhzna) gur fgvgpu znexf ner jryy-qrsvarq naq unir pbafvfgrag nccrnenaprf.
2. Look at the watermark. AI always puts the watermark in the corner, and usually the watermark is not an actual word. (rot13) "Checyr Fdhnerf (uhzna) unf n jngreznex va gur 2aq fdhner sebz gur evtug ba gur gbc. Na NV jbhyq unir chg gur jngreznex va gur pbeare. Naq vg'f xvaq bs vyyrtvoyr ohg vg ybbxf gb zr yvxr npghny yrggref. Va "Juvgr Oybo" (uhzna), gur jngreznex vf va gur zvqqyr bs gur vzntr naq vg'f uvtuyl fglyvmrq.
3. "Oevtug Whzoyr Jbzna" qbrfa'g unir gubfr gryyf, ohg gur jbzna'f rlrf ybbx irel NV-vfu.
That is so interesting. I thought that it would be more difficult to detect AI in abstract art.
But I guess patterns are patterns even in abstract art.
Yeah, I think the thing about physical paint is really important. For human artists working in physical media, the texture of that medium becomes a part of their art, and a part of the visual language that they're using to communicate. In theory, AI could reproduce that, but it's likely to mess it up if it doesn't have experience of how the physical paint runs.
Scott: Will you be sending answers out via email? I did it in browser and wasn’t able to see my results
Anecdotally, I've gotten a lot of compliments on the art in my latest post, which is AI generated. Makes me wonder if people would be as enthusiastic if they knew that.
An art critic writes:
“Even in his portraits, Ingres did not always stick to strict realism. For example, his 1856 Portrait of Madame Moitessier may seem inoffensive until you look at the subject’s hands. They seem unnaturally shaped, as if they’re made of rubber, fingers bending this way and that.”
Oh.
I didn't bother fully scoring myself but I did check the ones I was most confident on (Fgevat Qbyy, Napvrag Tngr, Zvanerg Obng, Qbhoyr Fgnefuvc, Zhfphyne Zna), and was relieved to see that at least I got all of those right. The approach I take to recognising AI art tends to rely on things it often gets wrong, but there's less it always gets wrong, so it's harder to be confident something is human-made than AI-made.
I think a more interesting AI Turing test would be for _very_ specific prompts and comparing how close the AI gets it vs how close the humans get it.
Compare an original New Yorker cover vs the AI version of it (I've included the prompt I've used): https://imgur.com/a/XMEzvPT. It's clear that the AI did an OK job but still not enough to replace the human illustrator. So it would be interesting to have a 1:1 competition of human vs. AI, having people vote which one did a better job. The true Turing test is whether The New Yorker can fire their illustrators and just pay $10/month to Midjourney.
I got 38/50 right. I had a pretty hard time picking which one I thought was "most human" because I could see potential AI signs in all of them, but turns out that out of all the ones I marked as human, every single one was human. So I systematically overestimated AI art's capabilities.
Edited to add: Actually I'm not sure I'm overestimating AI's capabilities, I just wasn't using priors in the correct way. I marked an image as AI if it looked like it *could* be AI, but what I should have done is marked it as AI if it looked more AI-esque than the median picture in the survey. For example (rot13) V'ir frra NV vzntrf gung ybbxrq n ybg yvxr gur oyhr navzr tvey, V guvax vg gbgnyyl pbhyq unir orra znqr ol NV, ohg gurer jnfa'g nalguvat gb fcrpvsvpnyyl vaqvpngr gung vg jnf NV. Naq gur oynpx navzr tvey jnf irel boivbhfyl NV.
I marked a few human pictures as AI due to incorrect details, for example (rot13) va "Ohpbyvp Fprar", gur ovyybj bs gur lryybj qerff ybbxf haangheny, yvxr vg ybbxf yvxr gur obggbz bs gur qerff vf fgvpxvat hc bire ure srrg, ohg onfrq ba ubj fur'f fgnaqvat, gurer'f ab jnl ure srrg pbhyq or gurer. V gbbx gung nf gur NV abg haqrefgnaqvat nangbzl, ohg nccneragyl n uhzna qerj vg yvxr gung sbe fbzr ernfba. Naq va "Frerar Evire", gur ybar gerr'f ersyrpgvba vf pyrneyl jebat.
See, I marked "Frerar Evire" immediately as human, simply because I'd seen impressionistic paintings where reflections are painted in that way. It's a stylistic quirk. At the same time, AI can copy quirks...
I think I got impressionist paintings wrong the most because they often get minor details wrong. I kind of knew that (that's why they're called "impressionist", right?) but I figured real paintings would have *blurry* details, not precisely-incorrect details.
42 out of... how is everyone getting 49? There are 50 pictures. 84%. My biggest regrets:
Gebcvpny Tneqra - V jnf irel ba gur srapr ohg vapbeerpgyl qrpvqrq na negvfg jbhyq unir orra zber pbafvfgrag jvgu pbhagvat gur fgebxrf va gur erq gvyrf.
Synvyvat Yvzof - Fbzr qvtvgf jrer junpxl ohg gurer jrer fbzr irel avpr svtherf va gurer, jvgu NV guvf jbhyq unir orra jnl yrff frafvoyr.
One of the images was left out by mistake and was added later.
Oh, I was a latecomer to it then? Thanks for the info!
I was not confident that ANY of the first ~20 images were either AI or human generated.
I'd believe either about each of them.
I don't know if this feedback can be scored :-)
I would love to see the prompts for the AI images, mine never look this good :(
Tvnag Fuvc being human is immensely confusing to me. I have to downgrade my estimate of my own capabilities of telling AI art way, way down
You are not alone. I was guessing that some digital artists are now influenced by AI (seems insane), but I'm told (see above) that this sort of thing was already a thing. I still think that whoever made this image had seen too much AI art.
This artist has been producing work like this for over 30 years. I think you could/should assume that AI has "appropriated" his themes.
I found the artist who produced Zhfphyne Zna, and this may be one of his "digital" works. So, you may be right that it was AI-generated. How sure are we that Scott did his homework on all these?
BTW, Did you mean Tvnag Fuvc or Zhfphyne Zna? To my mind, Tvnag Fuvc is so slickly graphic-artsy that it would be impossible for me to decide whether it was done by a human or AI. I got Zhfphyne Zna below it wrong, but then I looked more closely at the details and I realized the clues that I had missed by clicking through too quickly.
Wait, Zhfphyne Zna is definitely AI. Look at the background.
And right, Tvnag Fuvc may have involved AI at some point.
Hey, we weren't supposed to give the answers away! ;-)
I clicked through Zhfphyne Zna quickly because I started to get bored about halfway through. Yes, it is definitely AI — but I didn't look carefully enough at the details the first time through. D'oh!
Wait, in what sense did you "find the artist"? I think that one was from an ACX-reading AI hobbyist who made it specifically for this contest.
If we're talking about "Giant Ship", I looked it up in Google Images. The image came up as "Victorian Megaship" by Mitchell Stuart. He's a fairly successful Sci-Fi/Fantasy illustrator/artist. You can find the image in the top row of images under his Concept Art section. Unless Mitchell Stuart is also your ACX-reading AI hobbyist, the person who gave it to you stole the image from Mitchell Stuart.
https://mitchellstuart.onfabrik.com/artworks
The one you said you found was Muscular Man. Giant Ship was marked as human, so it wouldn't make sense.
Fbeel. Zl onq. Zl EBG13 nggragvba qrsvpvg qvfbeqre jnf xvpxvat va!
What I found most interesting is that all the anime, science fictiony, and simplistic graphic art generally defeated me. But AI is scraping lots of graphic images, so I would have no clue whether any of the anime girls were created by someone using Adobe Illustrator, pen and ink and airbrush, or AI. Likewise, how would I know if say a human or AI created "Fancy Car"? It's a slick piece of graphic art without any clues in the image as to the technique used to create it—i.e. it could be an actual photo of a futuristic model car that was shoved through Photoshop, or it could have just as easily been produced by AI. And if it were produced by Photoshop, isn't there an element of AI in Photoshop now?
Yes, some of the tools in Photoshop use AI, which makes them somewhat more efficient, though not miraculously so. The most AI thing going on in Photoshop is that you can remove an object from a picture -- say a telephone pole from an otherwise purely natural landscape -- and an AI part of Photoshop will replace it with stuff that matches the blank spot's surroundings. You can also extend a scene by having the AI feature add more seashore or whatever onto the sides of the image you started with. You can also put in a prompt: "In the area I selected, remove what's there and put a redheaded teenage guy in the space." Even without AI Photoshop is excellent at polishing images -- things like the fancy car -- in different ways to make them pop.
Kinda curious how LLMs would do at this test
My guess is that they would do a lot worse than humans. If that weren't the case, they could be used to generate more human-like images, GAN-style: an LLM acting as an "art critic" discards a generative model's outputs until one passes its test, and only then is the final image released. But I am not up to date with how the newest genAI models work, so something like this may already be employed.
I asked Claude about all 50 images, in batches of 5, then gave it the answers and had it summarize how it did:
> Thank you for providing the correct answers. I'll calculate the results based on my previous responses and the actual answers you've given.
>
> Said human that were AI (False Negatives): 12
> Said human that were human (True Positives): 18
> Said AI that were human (False Positives): 7
> Said AI that were AI (True Negatives): 13
>
> Total correct (True Positives + True Negatives): 31 out of 50
> Accuracy: 62%
> This result shows that I had some difficulty distinguishing between AI-generated and human-created images in this test, particularly in identifying AI-generated images as such. It's a valuable reminder of how sophisticated AI-generated art has become and how challenging it can be to differentiate it from human-created art in many cases.
That beats my 54%. It only recognized two of the classical paintings by name, so likely only around +2% expected score from art memorization.
So embarrassed I fell for the Wyeth trap. (Somebody needs to confess here before everybody insists they never would have been taken in by something so radically and obviously different.) Eyewitness memory research rings true, go figure.
Also in the form comments I went on a long rant about how sad I was that ai couldn't yet do one of the ones I loved the most, only to find it was made by AI, in the most cliched twist ending possible.
I am apparently the exact target here. This is my thoroughly deflated hubris. Enjoy and very well played!
On the main body of the survey I was correct 58% of the time, but on the "analyze these pictures more deeply" section I got 6/6 correct even though my median confidence was 65%. Interesting to see that taking time and thinking deliberately made such a large difference.
I'll be very interested to see if this replicates in the full data set.
75%. I was mostly fooled by the impressionist and classical ones, and on second look some of the classical ones I marked as human had obvious tells that I missed. I got all the ones where I was most confident correct, and this included ones that I was confident were human, and confident were AI.
I found the classical art selections very odd; having seen a number of these pieces in person it seems really bizarre to compare digital thumbnails of them to actual digital art - the pictures of "classical" art work are signifiers of the works, versus the digital art which is the work itself.
This made it very weird and it felt not like a real test - I would have much preferred all digital art because it would have then felt like everything was on an equal semiotic par.
Did anyone else have this problem?
I did both better and worse than I expected, and also ended up switching a few that I originally got right. Super interesting!
Scott's title probably isn't fully serious, but I'd like to point out that in order to judge this test well people need to be familiar with art of different kinds & from different eras, and there probably are not that many people who have broad enough familiarity with art to distinguish AI from the real deal for all the different kinds of art represented here. Here's an extreme case of the same problem: asking people to judge whether some code is written by a person or is AI-generated. Someone like me, who can't code, is effectively blind when it comes to looking at code, and could not recognize even very obvious instances of typical AI errors or of forms of errors and weirdness that just don't seem human. There are lesser versions of the same thing going on with judging whether an image is by a real artist or by AI. For example, I am not very familiar with Medieval art, mostly because the examples I have seen are ugly as hell. People look malformed, the colors are murky, nothing gives a sense of things being in motion -- a picture of people dancing looks like a picture of people standing in a circle. Because I find it hideous, I have looked at so little Medieval art that I am not in a position to judge whether something looks like the real deal.
I have an idea for a fun test: What if people here, and several AIs, all try to imitate Scott Alexander? Everyone here is familiar with Scott's style of thought and writing, so is in a position to create a sample for a "Turing Test" on a passage purported to be by Scott. We could write a couple of paragraphs on a specified topic. The AI could be given multiple examples of Scott's writing, then asked to generate 2 paragraphs making the same point.
Actually the Turing test (sort of) could be done using just AI imitators of Scott, but doesn't having the rest of us toss in some material too sound fun?
Ha! But I don't think I could imitate Scott's style.
I'd play! That sounds fun.
Harder than expected. It’s AI if it’s too perfect, yet some human output is extremely banal. Also, what about AI art that is edited by humans and human art that is tinkered with by AI? There should be some type of scale.
> I've tried to crop some pictures of both types into unusual shapes, so it won't be as easy as "everything that's in DALL-E's default aspect ratio is AI".
On one hand this is fair enough, since you told us about it and did it for both types. But I think it makes the test harder (beyond just removing the cheap aspect-ratio based way of recognising AI images) because composition is an important element of visual art, and if you're chopping up the images in arbitrary ways then you're confusing our "something's off here" sensors.
Like, even for Saint In Mountains, where the cropping is arguably fairly subtle, it gives the image a completely different feel. (The colours are also very different from those in the version on Wikipedia; I know reproductions can vary, monitors are imperfect etc., but from a quick search it seems like the version on the quiz is an outlier in terms of how brown/red it is.)
Can anyone look at https://imgur.com/qK3JIxo and
https://imgur.com/a/2on6bAF and tell me the first is a fair reproduction of the second?
At first glance, I thought Saint in Mountains was probably AI, and the composition definitely played a role. Particularly the water and boat in the background made no compositional sense, and I have seen AI do weird stuff like that before. I did eventually change my mind on this one before submitting, but if the whole painting had been present I would have leaned much more quickly towards "human."
64% correct for me, which is about what I expected to get initially—though I forget whether I answered 60–70% on the expected-correctness question or whether I had a brief fit of self-censure and second-guessed myself down to 50–60%. There's definitely a few that I played myself on, including thinking that one that I'd initially classed as very DALL·E-esque was human due to… well, I'm not sure, it seemed a bit more earnest than usual I suppose.
Also I *just* got the pun in the name “DALL·E”. Goddammit.
I wish the form had presented a grade after I submitted my answers. I don't recall what I chose for all 50, and having spent 20 minutes on it, I can't justify spending more time trying to find where Google kept my submission and manually cross-checking it to our host's list.
Does anyone know what the rules are supposed to be for repeated Turing tests? If humans are initially fooled by a computer, then figure it out and can reliably distinguish it from other humans, then has it still passed the Turing test?
I think there's no canonical answer; the common usage of 'Turing test' has already diverged a bit from Turing's original Imitation Game (https://academic.oup.com/mind/article/LIX/236/433/986238), so it's probably not worth trying to define exactly what it means to 'pass the Turing test'.
I went into this expecting to do pretty poorly, because I'm not much of an art guy at all. I was originally planning to use gestalt impressions, but once I got started I found more subtle errors and anomalies than I expected, and ended up mostly relying on that.
I didn't even try to answer the super weird/abstract/impressionistic ones, because from my perspective they're full of random noise and they could look like basically anything. My model for what these "should" look like isn't strong enough to provide meaningful constraints.
Here are my answers, and some observations they were based on. (This is NOT an answer key; I got many of these wrong.)
Angel Woman - AI - Angel's empty hand looks a lot like the birds; seemed like pattern recognition cranked up a little too high
Saint in Mountains - Human
Blue Hair Anime Girl - AI - Downward arm looks too long with too many joints; hands conveniently hidden; hat not in resting position
Girl in Field - AI - Proportions seem wrong
Double Starship - AI - Bottom thrusters (?) seem weirdly asymmetrical
Bright Jumble Woman - ????
Cherub - Human
Praying in Garden - AI - Bottom right guy seems inexplicably bloody; water spout on left seems insufficient to feed river
Tropical Garden - AI - Leaves seem to sprout from building on right
Ancient Gate - Human
Green Hills - AI - Obvious brownish line seems too narrow to be a footpath; what is it?
Bucolic Scene - Human
Anime Girl In Black - AI - Spent a lot of time analyzing conflicting vibes on this one before noticing there's an anomaly in her hair below the ear
Fancy Car - Human
Greek Temple - Human
String Doll - AI - Right hand middle finger seems to split near fingertip
Angry Crosses - ????
Rainbow Girl - Human
Creepy Skull - ????
Leafy Lane - AI - Path terminates in weird way in the distance
Ice Princess - AI - Anomaly below girl's left eye (viewer's right). Also asymmetrical earring seems odd and face gives me vaguely AI vibes.
Celestial Display - Human
Mother and Child - AI - Anomaly at crook of mother's right thumb
Fractured Lady - ????
Giant Ship - AI - Upside-down domes and spires; weird flower-like protrusions on right; level of detail is not impossible for a human but seems extreme
Muscular Man - AI - Front guy's left hand and back guy's right hand both look weird
Minaret Boat - Human - Boat seems slightly aground at the back but eh
Purple Squares - ????
People Sitting - AI - Girls sitting on table; table melds into back wall; weird vertical plank in front of picture frame
Girl in White - Human - Dunno what those straps are hanging off the far side of the girl's canvas but eh
Riverside Cafe - Human
Serene River - AI - That is a wacky tree alone in center background
Turtle House - AI - Lantern in water on bottom right (the flame inside it looks very similar to the magic floating lights in the rest of the scene, but only this one is surrounded by an incongruous lantern frame)
Still Life - AI - Hovering wax drips
Wounded Christ - Human
White Blob - ????
Weird Bird - Human
Ominous Ruin - AI - Second column from the right aligns with its neighbors on the top but not on the bottom
Vague Figures - ????
Dragon Lady - Human
White Flag - Human
Woman Unicorn - Human
Rooftops - AI - Some of the birds look like ink splatters, as if the artist isn't sure which one they're drawing
Paris Scene - Human
Pretty Lake - Human 65% - Reflections seem a bit off, which I think is a more likely mistake for a human than an AI. But also because I've been picking human any time I can't find a reason not to.
Landing Craft - AI 80% - Weird gun-thing emerging off-center from front window, weird thing coming out back of craft, background clouds seem to meld into smoke plume at right
Flailing Limbs - ???? 50% - Looks intentionally crazy. I can't tell what it ought to look like, so I have no basis for judgment.
Colorful Town - Human 55% - Not detailed enough to provide much evidence, but I've been defaulting to human and still picked AI more often, so I will continue to do so
Mediterranean Town - AI 60% - A bunch of small details look subtly wonky
Punk Robot - ???? 50% - Too abstract to tell
My score: 25 right, 16 wrong, 9 declined to guess. If you give me 50% for the ones I didn't guess then my overall accuracy is 59%, which is pretty close to what I expected. If you only grade the ones where I _expected_ to do better-than-chance then I got 61%.
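The self-scoring above can be checked with a quick sketch (using the 25/16/9 split stated in the comment; the two scoring schemes are just the commenter's own conventions):

```python
# Check the self-scoring: 25 right, 16 wrong, 9 declined to guess.
right, wrong, abstain = 25, 16, 9
total = right + wrong + abstain  # 50 pictures

# Scheme 1: credit each abstention at 50%.
overall = (right + 0.5 * abstain) / total

# Scheme 2: grade only the questions actually answered.
answered_only = right / (right + wrong)

print(round(overall * 100))        # 59
print(round(answered_only * 100))  # 61
```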
The picture I said I was most confident was AI turned out to be human. In hindsight, I should have picked a picture with a single tiny-but-indefensible glitch instead of a picture with multiple bizarre-but-visually-continuous features.
The picture I said I was most confident was human, was human.
My favorite picture was AI, and I incorrectly guessed it was human.
Several other commenters seem to have had an easy time with certain pictures I got wrong, by drawing on their experience with popular AI generators and/or famous historical art. I did not have their experience.
I was definitely fooled by Mother and Child because it looks so much in the style of Bouguereau:
https://en.wikipedia.org/wiki/La_Vierge_au_lys
https://en.wikipedia.org/wiki/William-Adolphe_Bouguereau
I suppose now I know what corpus the AI was trained on!
I thought Bouguereau, too — or one of his ilk — late 19th-century academic artists. What gives it away is the abstract pattern in the background. Bouguereau would have painted a cloth of embroidered flowers and not random dots of color.
It also has a story to tell! She's inclining her head towards her son and that saintly thing stays in place. That shouldn't happen with those saintly things. And the son doesn't have a halo, and they're both drawn very human, like the picture is saying: you thought they're saints? They're just human and that's just a thing on the wall. "Her love is what matters".
Girl in field - thought the problem with this one was the placement of flowers and other elements of the background that didn't fit a coherent composition. Flowers on the path? In the middle of the grass, not on any stalk at all?
Hiding of hands is probably a tell for Human, honestly; humans care when they can't draw good hands, AI doesn't.
The human operating the AI generator and choosing among its outputs probably cares.
I think I also heard something about people using descriptors in the generator to try and prevent bad hands, and the AI sometimes fulfilling "no bad hands" by not having hands at all.
Not having hands at all is different from having hands offscreen or hidden in pockets.
I was surprised at how well I did: I think I hit every AI ±1, but I had a lot of misses the other way.
Interestingly, all the art I thought was quite good was human, and all the art I thought was trite or bad was AI.
https://en.wikipedia.org/wiki/Le_R%C3%AAve_Transform%C3%A9 (not one of Scott’s images)
(Looks at Le rêve transformé by Giorgio de Chirico)
“Look, the perspective clearly doesn’t make sense, like it’s stitching together pieces with different vanishing points. Clearly AI.”
Let's be clear: this is not a "human artist vs. AI" Turing test, this is a "human artist using paint on canvas (or other flat-media) vs. human artist using AI generative model and multiple iterations of prompt + curation". As evidence, here's Scott's note on the provenance of the images used in the test:
"All the human pictures are by specific artists who deserve credit (and all the AI pictures are by specific prompters/AI art hobbyists who also deserve credit)"
Personally, I've always believed that human curation introduced human intention, like it would with photography. But I've had so many artists tell me otherwise that it seems like a steelman to allow curated AI works under the clear label of "AI Art."
I wonder if AI art can do a decent image of a Rube Goldberg machine. The implied physics, the directionality, and non local interactions seem like they’d be hard for a diffusion model.
That's an excellent question.
It would be interesting to try to come up with a good prompt for that.
Draw the most complicated mouse trap you can?
It would have been interesting to have a question at the start about how many they expected to get right, paired with the one at the end. I think that might be a better way to measure how much harder/easier it was than people expected vs asking directly.
WAIT: This poll and experiment by Scott is not a clean-cut dichotomy between human and AI. I'm not even sure if a suitable gradient can be described.
A majority of the source images used by AI were generated by humans. AI is basically painting with human stuff. On the other hand, so-called human-credited art uses computer-assisted tools and techniques, sometimes to the extreme. Using a 3D art program is an example: construct a simple bird model, or purchase one. Add a million random points into a scene and then have the program link a bird to each point. Select a backdrop of a sky from the program's assets. Have the program generate an image of the flock of birds in the sky. The final image is the work of the computer program, but the human gets credit. In lieu of "prompts," there are instructional steps. There's not much difference between AI image generation, which requires human prompts, and human-created art that requires sophisticated computer programming.
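The "million random points" step described above can be sketched in a few lines; this is a purely hypothetical illustration in Python, not any real 3D package's API:

```python
import random

# Hypothetical sketch of the procedural workflow described above:
# scatter n points in a 2D "sky" and attach one bird instance to each.
def scatter_birds(n, width, height, seed=0):
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

# A million birds placed by the program; the human supplied only the steps.
flock = scatter_birds(1_000_000, 1920, 1080)
```

The point of the sketch is just that the human's contribution here is a list of instructional steps and parameters, which is structurally not so different from a prompt.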
I agree. A more valid art Turing test would be using real and AI-generated paintings—Renaissance up through Abstract Expressionism. At least with the real paintings, we could be sure that there was no AI involved.
Exactly.
V zrnag gb fnl va gur erfcbafr sbezf ohg sbetbg:
V fnvq V jnf zbfg pbasvqrag gung Terrx Grzcyr jnf uhzna orpnhfr gur Terrx yrggrevat znqr frafr (vg fnlf BZRE, juvpu pbhyq or gur fgneg bs BZREBF, nf va Ubzre). V jbhyq unir orra fhecevfrq/irel vzcerffrq vs NV pbhyq trarengr npphengr naq cynhfvoyr napvrag Terrx yrggrevat.
Guvf jnf ernyyl sha. Abgnoyl zbfg bs zl snibhevgrf jrer NV neg, vapyhqvat gur zrqvgreenarna gbja juvpu V jnf pbaivaprq jnf uhzna.
Right, that's a better indicator of humanity right now than good hands are, but for how long?
30/49 = 61%, 1.6 s.d. above chance.
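The "1.6 s.d. above chance" figure can be reproduced under a null of pure guessing, where the number correct is Binomial(n=49, p=0.5):

```python
import math

# Under pure guessing, correct answers ~ Binomial(n=49, p=0.5).
n, k, p = 49, 30, 0.5
mean = n * p                       # 24.5 expected correct
sd = math.sqrt(n * p * (1 - p))    # 3.5
z = (k - mean) / sd

print(round(k / n * 100))  # 61
print(round(z, 1))         # 1.6
```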
Here are the notes I made while judging them, unedited except to fill in Scott's answers (lowercase single letters). In hindsight I'm slightly embarrassed to have pontificated on how an impressionist artist was clearly thinking about real buildings rather than coloured blobs, but here it is, warts and all.
Contrary to instructions, I zoomed in on all of them, because that's what I'd normally do when deciding "is it or isn't it?"
Apologies to the human artists whose feelings I may have slighted.
1. Angel Woman.
AI. Obviously AI and pretty low-effort.
h
2. Saint in Mountains.
Human. The details make sense. Except, now I see that cross in the distance with some sort of triangular fabric hanging from it, I'm a little unsure.
h
3. Blue Hair Anime Girl
AI. What on earth is wrong with her left arm? And her hat isn't quite on her head. I could believe it human otherwise.
h
4. Girl in Field.
Human. Her posture looks a little stiff, but that's the sort of error a human artist would make (or his model), not an AI.
a
5. Double Starship
Human. A good piece of sci-fi cover art. The details hold up under close inspection.
h
6. Bright Jumble Woman
Human. These abstract images are difficult to judge. I'm looking at the consistency of line across the picture. I don't know what you'd prompt an AI with to get an image like this.
a
7. Cherub
AI. It's pretty good, but the crease in the left elbow isn't right, and its right wing (left in the image) doesn't meet the back properly. And that wing seems to have a thumb on its leading edge.
a
8. Praying in Garden
Human. Details, composition, story-telling, they all hold up under inspection.
h
9. Tropical Garden.
Human. The style is not to my taste, but I see a consistency of execution throughout.
h
10. Ancient Gate.
AI. There are probably a few artists in the world who could create this staggering quantity of detail, everywhere repeating but nowhere exactly the same, but an AI can churn it out effortlessly.
a
11. Green Hills
Human. A very consistent pointilliste style. Even the distant houses are clearly that, not vague distortions. The artist was thinking about real houses, not imitating blobs representing houses barely seen.
a
12. Bucolic Scene.
Human. The legs of the humans and the cattle are right.
h
13. Anime Girl in Black
Human. Because I want her to have a soul. I can see the incoherence of her belt buckle close up, but I want to believe. I know how irrational that is. My favorite image.
a
14. Fancy Car.
Human. Obviously made in a 3D modelling program, with some standard effects applied: motion blur, reflectance (unconvincing) from the damp road, glowing wheels, greebling on the front and side. Not professional quality. An AI prompted to make this would make something much more realistic, and no-one would have reason to instruct it otherwise.
h
15. Greek Temple
Human. I wavered on this, but all the details I initially identified as not making sense, I eventually realised did make sense. I'm not entirely convinced, but on balance this is where my vote goes.
h
16. String Doll
Obviously AI. It just has that look about it. Dodgy fingers too. And that strand that looks like she's sucking a noodle! A human artist wouldn't make that sort of accidental alignment.
a
17. Angry Crosses
Human. Another abstract, difficult to judge. I decided human because why would anyone try to get an AI to do this?
a
18. Rainbow Girl
Human. Digital medium, obviously, but I'm not seeing the usual AI tells.
h
19. Creepy Skull
AI, but who cares, really? I'm not interested in this picture, however it was created.
h
20. Leafy Lane
Human. No more to be said.
a
21. Ice Princess
AI. You're very nice, Ice Princess, but I don't believe in you.
a
22. Celestial Display
AI. The figure leaning against the tree doesn't convince me. Neither do the random sparkles on the shore and near waters.
h
23. Mother and Child
Human. Everything works.
a
24. Fractured Lady
AI. Another difficult abstract. AI, on balance.
a
25. Giant Ship
AI. Like Ancient Gate, this has a staggering amount of detail. And I found somewhere in there a dome that does not seem to sit properly on the structure supporting it.
h
26. Muscular Man
AI. Fingers.
a
27. Minaret Boat
AI. Like Ancient Gate and Giant Ship, the detail is enormous, but it breaks up on close inspection (e.g. the window bars) in a way that I think is not just because of the resolution of the image.
a
28. Purple Squares
AI. All of these abstracts are difficult to judge, but this one seems to have nothing behind it.
h
29. People Sitting
Human. Presumably not a Vermeer, which would be too well-known, but an artist of that time and place. Of course, an AI can be told to make an image in that style, but the details here are all convincing.
h
30. Girl in white
This picture was missing when I took the test. Scott later posted a link to it. I saw the answer before the image, so these comments are made knowing that it's AI. [ETA: I misread Scott: this picture is by a human.]
I find this very convincing except for a few details. She has a single ornamental stick in her hair, and I would expect there to be a pair. Her eyes don't match. Her right eye is looking straight at the viewer, but cover it up and her left is pointing oddly upward. The sleeve of her dress is too close-fitting. In the distance, the gentleman's left arm is a bit strange.
Despite that, I would probably have judged this human. I can read a whole narrative into it. She is the spurned lover of the gentleman in the distance. She gazes with sorrow at his new dalliance through a broken window in her poor rooms across the street from his grand palazzo. Perhaps she is painting a picture of herself and him together, to be prominently displayed somewhere to accuse him?
31. Riverside Cafe
AI. Why would a human artist bother to paint an imitation of Van Gogh's famous picture? And the legs of the chairs taper off unnaturally. AI has just as much trouble with furniture legs as animal legs.
a
32. Serene River
Human. That isolated bare tree in the middle is odd, but it's the only thing that I find suspicious here.
h
33. Turtle House
AI. I'm not at all sure though. At first glance it definitely has the AI style, but I could not find any other definite tells.
a
34. Still Life
AI. I can believe the fruit, but not the candle drips.
a
35. Wounded Christ
Human. Good hands.
h
36. White Blob
AI. Ok, it could be Miro, but I'm guessing not.
h
37. Weird Bird
AI. There is a type of line work that is typical of AI and I'm seeing it here in the columns and in the greenery at lower right.
a
38. Ominous Ruin
AI. The temple has bad legs.
a
39. Vague Figures
AI. Too vague.
h
40. Dragon Lady
AI. The combination of high saturation, contrast, and detail is characteristic of a lot of AI images. This might just reflect the sort of pictures that people using AI want to make, but AI it is. And the dragon's wings are different sizes.
a
41. White Flag
AI. Too many details making not quite enough sense.
h
42. Woman Unicorn
Human. Enough details making quite enough sense.
h
43. Rooftops
AI. Not at all definite about this, but on balance I'll say it's AI.
a
44. City Street/Paris Scene
Human. The consistency of the perspective all along the street catches my eye here.
a
45. Pretty Lake
Human. None of the usual tells, although I'm a little suspicious of the mud cracks on the path in the foreground.
a
46. Landing Craft
AI. A good piece of 50s/60s sci-fi cover art, but the details give it away. The way the right arm of the man on the right tails off. The placement of the feet of the vehicle. The random sparkles in the clouds at the right, and above them the vague aircrafty thing. The impossible moon that seems to be just half a mile away.
a
47. Flailing Limbs
Human. Why use an AI to make this? I don't care what it is.
h
48. Colorful Town
Human. Even the small human figures and the cart make sense.
h
49. Mediterranean Town
Human. It's a very striking image, a Mediterranean view in the style of de Chirico, who I'm sure never painted anything so bright. Clean execution, no AI tells.
a
50. Punk Robot
AI. Another difficult abstract. But the eyes don't match, and the teeth don't look right. I don't know what "right" would be, but these aren't.
a
Interesting. I made the exact same error on Angel Woman. I had it marked as obviously AI. Apparently, some human digital artists don't study anatomy before getting that good at the fancy bits of the picture.
White Flag was actually my pick for most obviously human. The biggest tell there, which I'm convinced no current AI would think of, is the woman in front with a kid on her back, and her arm wrapped behind her to support the baby as she's bent over and struggling forward. That's a posture I've adopted with a kid wrapped on my back and so is very real-feeling, and I don't think you could prompt AI to put her arm behind her back without generating monstrosities.
Yeah. I marked it as AI because her arms were deformed.
Angel Woman was the one I marked as most obviously human. The soldiers had Imperial Guard helmets and the guardsmen, angel, and cherubim all followed the 40k setting style. I would have expected the cherubim to either be generically angelic or generically monstrous depending on prompt if it were AI: it can't handle making a specifically 40k cherubim while focusing mostly on the guards and angel.
Maybe it's cheating but I recognized 3 of the paintings right off the bat.
#2 Saint in the Mountains is a famous painting of St Anthony by the Osservanza Master. I immediately marked it as human. But it's a really shitty reproduction, and I wondered if it might not be AI recreating a famous old master painting.
#8 Praying in the Garden is also a famous painting. I recognized it, but I couldn't remember the artist (I just looked it up and it's Agony In The Garden by Andrea Mantegna)
#30 Girl in White is a famous painting in the collection of the Met. Originally it was thought to have been painted by Jacques Louis David, but there was a big art history whoop-de-doo when it was identified as a work by a less-famous painter, Marie-Denise Villers. It's one of my favorite paintings from the late 18th-early 19th Century.
I picked "Girl in White" as my favorite, and it was my second choice for most obviously human. The particular drama of the image, with the couple in the background and the girl looking like we have caught her spying on them, didn't seem like the sort of thing AI can manage at the moment, no matter how carefully you tune the prompts.
But I picked "Greek Temple" as most obviously human, since I didn't think AI would be able to (i) cut and paste those individual portraits in different historical styles, like a 19th-century photoshop job, or (ii) produce grammatically correct Greek inscriptions on the steps.
Unless I miscounted, I got 76%, which is lower than my guessed 80-90%. I was surprised that fbzr avpr vzcerffvbavfg barf jrer NV, naq fbzr irel gnfgryrff zbqrea barf jrer abg NV
this is devious and insane, and if the answers are "100% AI generated" I'm going to be very upset with you.
I thought I got at best 70% right, but checking the answer key I was right easily more than 80% of the time.
Here were my intuitions that I think guided me very well:
- "This looks hard to write a prompt for"
- "This is something I've almost never seen before and doubt it is well represented in anyone's training set"
- Has obvious midjourney/"straight out of the tube" AI art vibes (requires being familiar with what those subtle tells are). Every digital art package from Flash to Photoshop, even Unity or the Unreal Engine's default lighting produces subtle tells like this when people don't take care to deviate from default settings, and if you get familiar with it you start to catch on.
- Multiple characters in the same scene is currently hard to pull off with AI with a one-shot prompt; from what I've seen you still basically have to rely on many individual generations and some kind of compositing / traditional digital art skills to make it work. The chief weakness is that too much of the essence of one character in the prompt will leak into the other characters. So a varied crowd, all with distinctly different clothes, is a decent tell for non-AI art
- Uniformly high detail throughout. Detail is cheap for AI but is expensive for humans. It's rare for humans to pack an extreme amount of detail into every inch of the canvas.
All that said, if you had time travelled back to 2019 and dumped this test on me, I would have scored no better than chance.
The biggest lesson here for me is that Turing tests are themselves a moving target, because as AI's start to become real things that we interact with more each and every day we start to recognize them. The big question is when will their ability to mimic us outpace our ability to pattern match on their eccentricities.
I doubt this should be called a Turing test. For a Turing test, Scott Alexander would have someone else select human drawings that would be recognized as human drawings, while he himself would have to find neural network drawings that would appear to be human drawings. If he, consciously or not, chose human drawings that look like neural network drawings, he simplified the task for the neural networks.
Does this survey work with uBlock Origin enabled? I never received a score, and wonder what happened with my results.
There's no auto-scoring; it's just a form submission, like surveys Scott's done in the past. For me it said something like “Your response has been recorded.” and that was it.
There is however a rot-13'ed answer key in the comments you can compare your answers with manually.
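For anyone decoding that key by hand: Python ships a rot13 codec, so a spoiler string can be decoded in one line (the example string is one that appears elsewhere in this thread):

```python
import codecs

# Decode a rot13'd spoiler string from the answer key.
spoiler = "Terrx Grzcyr"
print(codecs.decode(spoiler, "rot13"))  # Greek Temple
```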
I have a feeling that I did much better with images that contain people than with those that don't. Might be interesting to see other people's success rate on this split.
Absolutely. It's surprising how helpful humans are. But only if there are actually visible details... the super impressionist paintings have so little information that it's extremely hard to tell.
In art classes, they will usually say that we all know what human beings look like, so any mistake you make in drawing them will be very obvious. (While other subject matter you have much more leeway for making mistakes without them being noticed).
(Comment contains un-rot13’d image names with spoilers)
I think all my errors were of the form “assuming something is AI when it’s actually human”. A while through it I started to suspect it was all AI, because a lot of them were very obvious with the typical sorts of AI tells, and I was expecting this to involve a curated sample of AI art chosen to look especially convincing, which probably pushed me to err on the side of AI more than I might have otherwise. Instead, it feels more like a lot of the human art was curated to involve some weird or nonsensical elements, which I suppose I should have expected. Most of the ones I got wrong were ones I remember feeling less sure about; even Giant Ship, which I thought had to be AI, had me looking at the rigging expecting to find particular sorts of tells that I then didn’t find, which should have given me pause a little more. My pick for “most confidently human” was Tropical Garden (very genuine human errors/weirdness rather than AI weirdness) and for “most confidently AI” it was String Doll (just very obviously nonsensical strings, in the particular AI sort of way rather than a human way), which were both correct. Angel Woman was my favorite.
As someone who does some (amateur) art and runs in artist circles, the usual take on AI art there is “hating it with a passion, wanting nothing to do with it ever, and also seeing it and identifying it all the time because people are constantly posting it and trying to pass it off as human-made”, which was not represented in the options on the “experience with AI art” question.
I did worse when asked to Think Carefully Step By Step.
I've been trying for several days to access the test but repeatedly get "forms.gle took too long to respond"
Any suggestions?
These are MUCH better than most AI art I've seen, what programs/ models/ etc. were used? I'm particularly impressed by qentba ynql, which I only realized was AI when I went back to it and realized that the ersyrpgvba jnf fhogyl qvssrerag sebz gur znva vzntr va jnlf gung jrer qrsvavgryl whfg jebat engure guna checbfrshy.
I don't think that this is a valid AI art turing test and I don't think that such a thing is even possible.
Part of the formula for the Turing test is that it is an interactive process conducted in real time.
The nature of art alone makes the real time part impossible.
This "test" would be like taking a transcript of several textual Turing tests, cutting out individual prompts and responses and asking a third party to evaluate them out of context.
I see no way that anyone could be certain that an image was AI generated if the person creating the image spent enough time refining the prompt and weeding out all but the best attempts. The one shot attempts off a simple prompt are unlikely to be convincing, at least for now.
Are we determining whether AI can create art like a human or whether some humans create art like an AI?
After checking the answers, I would argue that much of the human art could have been AI generated and isn't very good. Certainly some of the futuristic/fantasy art almost has to have detail created in an AI-like way.
I posted some thoughts about this quiz on tumblr:
https://www.tumblr.com/nostalgebraist/764616722122768385/i-definitely-struggled-with-the-what-level-of
Doesn't the strategy you use to hand-pick the human art strongly bias the test result? Like, it's a very different test if you pick "random human art" versus "human art that is trying to distinguish itself from AI" versus "art that I, Scott Alexander, think is appropriate for this test".
Worth noting a selection effect: someone who looks at every one of these and is like, man how am I supposed to know?, is someone who is less likely to complete the quiz. (I didn't complete it myself, for this reason.) I guess that filters for respondents being higher-confidence than your readers in general.
So I had what looks like a unique strategy for telling which is which:
AI art has a "vibe" (I call it "the ick" or "shading ick") that isn't in human work
I will say it was rare if judged something that was AI that actually human
breakdown ran through rot13:
Natry jbzna: U (pbeerpg)
ab vpx
fnvag va zbhagnvaf: U (pbeerpg)
ab vpx
oyhr unve navzr tvey: U (pbeerpg)
ab vpx naq V xabj gung NV qbrfa'g qb navzr yvxr guvf
tvey va svryq: U (vapbeerpg, NV)
V abgvprq gur zvffzngpurq rlrf ohg V jnagrq gb tvir gur negvfg gur oravsvg bs gur qbhog. Rlr flzzrgel vf uneq
qbhoyr fgnefuvc: NV (vapbeerpg, uhzna)
bar bs gur srj rkprcgvbaf, V guvax guvf vf cubgbfubc naq gung'f jung gevccrq zr hc
oevtug whzoyr jbzna: NV (pbeerpg)
ab vpx ohg gurer vf gung snzbhf nv sentzragngvba va ure rlrf
Pureho: NV (pbeerpg)
gur vpx vf fhogyr ohg vg'f gurer
Cenlvat va gur tneqra U (pbeerpg)
ab vpx
gebcvpny tneqra: U (pbeerpg)
gur cynagf ng gur sbertebhaq ybbxrq n ovg vssl ohg jngrepbybef pna qb gung, ab fbyvq vpx.
napvrag tngr: NV (pbeerpg)
vpx bhg gur jnmbb ubyl penc. ab pbagrfg
Terra Uvyyf: U (vapbeerpg NV)
gurer'f ab pyrne vpx ohg ybbxvat onpx ba vg V frr gur fnzr gryyf sebz gur jbzna rneyvre va gur ubhfrf
Ohpbyvp Fprar: U (pbeerpg)
ab vpx
navzr tvey va oynpx: NV (pbeerpg)
abj gungf na NV navzr tvey, vpx nobhaq
snapl pne: uhzna (pbeerpg)
ab vpx
Terrx Grzcyr: Uhzna (pbeerpg)
lrf gur crbcyr va gur sebag ner wneevat ohg abg va n jnl gung errxf bs nv whfg uhzna perngvivgl
fgevat qbyy: NV (pbeerpg)
nabgure bar jvgu vpx nyy bire
Natel pebffrf: uhzna (vapbeerpg nv)
V'yy gnxr gur Y ba guvf bar V fubhyq unir frra gur thax ba vg
Envaobj Tvey: uhzna (pbeerpg)
ab vpx whfg ernyyl cerggl
perrcl fxhyy: uhzna (pbeerpg)
gur thax urer vf qvssrerag V guvax guvf zvtug or n cubgb
yrnsl ynar: Uhzna (vapbeerpg NV)
v...guvf vf bar bs gur barf gung gehyl onssyrf zr gurer'f ab vpx ab rneyl thax...V qhaab zna
Vpr Cevaprff: NV (pbeerpg)
nu nabgure vpx srfg
Pryrfgvny qvfcynl: Uhzna (pbeerpg)
nyzbfg tbg guvf bar jebat ohg gurer'f fbzrguvat nobhg vg..erirefr vpx
zbgure naq puvyq: uhzna (vapbeerpg NV)
va ergebfcrpg gurer ner...qrgnvyf gung tvir guvf bar njnl
senpgherq ynql: uhzna (vapbeerpg NV)
ntnva fubhyq unir frra gur vpx, vg'f qrnq pragre
tvnag fuvc: NV (vapbeerpg uhzna)
nabgure rkprcgvba, jung V gubhtug jnf vpx jnf cebonoyl whfg cubgbfubc
zhfphyne zna: uhzna (vapbeerpg NV)
nabgure Y V'yy gnxr, V fubhyq unir frra gur gbb zhpu qrgnvy, tbg ybfg va gur cuvybfbcul bs vg
zvanerg obng: NV (pbeerpg)
gur vpx vf onpx ubeenl!
checyr fdhnerf: uhzna (pbeerpg)
ab vpx naq jub jbhyq jnag gb trarengr fbzrguvat yvxr guvf? ubj jbhyq lbh trg fbzrguvat yvxr guvf?
crbcyr fvggvat: uhzna (pbeerpg)
ab vpx whfg avpr jbex
tvey va juvgr: Uhzna (pbeerpg?)
gur arj bar naq V guvax V xabj gur negvfg guvf vf (jung n sernxl bar gb or nqqrq)
evirefvqr pnsr: uhzna (vapbeerpg NV)
nabgure bar gung onssyrf zr, ab vpx ab bqqvgvrf
frerar evire: uhzna (pbeerpg)
ab vpx naq gung bar gerr jbhyq or rnfl sbe NV gb zrffhc
ghegyr ubhfr: nv (pbeerpg)
gur vpx vf onpx ntnva!
fgvyy yvsr: Uhzna (vapbeerpg NV)
gurer zvtug or fbzr vpx ng gur obggbz ybbxvat onpx ba vg?
jbhaqrq puevfg: Uhzna (pbeerpg)
V guvax V'q frra guvf bar orsber
juvgr oybo: uhzna (pbeerpg)
gur funcrf naq pbybef ner gbb pyrna gb or nv
jrveq oveq: nv (pbeerpg)
qvq guvf sbby nalbar?
bzvabhf ehva: V ibgrq uhzna ohg V jnf fcyvg (vg'f NV)
V abj frr gur vpx ba gur ohvyqvat vgfrys
inthr svtherf: uhzna (pbeerpg)
nu gur ornhgl bs uhzna znqr enaqbzarff
qentba ynql: NV (pbeerpg)
whfg...fb zhpu vpx
juvgr synt: uhzna (pbeerpg)
ab vpx
jbzna havpbea: uhzna (pbeerpg)
ab vpx nyfb V ernyyl yvxr guvf cvrpr
ebbsgbcf: NV (pbeerpg)
nu gur sentzragngvba thax vf onpx va shyy
cnevf fprar: uhzna (vapbeerpg nv)
nabgure bar gung onssyrf zr, ab vpx ab thax ab bqvgvrf
cerggl ynxr: NV (pbeerpg)
gur vpx vf fhogyr ohg gurer (V znl unir fhozvgrq gur jebat nafjre ol nppvqrag)
ynaqvat pensg: NV (pbeerpg)
bu obl vpx nobhg jvgu guvf bar
synvyvat yvzof: uhzna (pbeerpg)
nabgure cubgb!
pbybeshy gbja: uhzna (pbeerpg)
ab vpx
zrqvgreenavna gbja: NV (pbeerpg)
guvf vf n terng rknzcyr bs jung gur vpx npghnyyl zvtug or, NV jnagf gb chg qrgnvy jurer gurer vfa'g nal fb rira fvzcyr fghss yvxr guvf unf n uvtu pbagenfg naq funcrf gung qba'g znxr zhpu frafr
chax ebobg: NV (pbeerpg)
vg nyzbfg ybbxf yvxr n cubgb ohg gurer'f rabhtu vpx gb or rivqrag.
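(For anyone who doesn't want to decode the breakdown above by hand: rot13 is a self-inverse Caesar shift of 13, and Python's standard library handles it directly via the `codecs` module. A minimal sketch — the sample string is just an illustration, not part of the answer key:)

```python
import codecs

def rot13(text: str) -> str:
    """Decode (or encode -- rot13 is its own inverse) a rot13 string."""
    return codecs.decode(text, "rot13")

# Example with a made-up sample line:
print(rot13("Natry jbzna: U (pbeerpg)"))  # -> Angel woman: H (correct)
```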
Apropos paintings vs reproductions...
EEGs taken of viewers while viewing the art reveal that real artworks elicit a powerful positive response in the precuneus area of the brain — much greater than their response to reproductions. This is according to a study sponsored by the Mauritshuis Museum.
If I understand the press release, Vermeer's Girl With the Pearl Earring is especially stimulating to precunei, while reproductions are much less so.
As a museum addict, I can attest to the difference between experiencing a real painting up close and personal and viewing an ultra-high-resolution image of the painting. I'm still mulling over the implications for AI art versus art made with human intent. Maybe I'll have something to say on this subject for the next Open Thread.
https://www.mauritshuis.nl/en/press-releases/girl-with-a-pearl-earring-visually-captivates-the-viewer/
Some anti-AI people on Tumblr encountered this, came to the conclusion it was an AI training data thing, and deliberately submitted incorrect responses to sabotage it. I hope this doesn't mess up the results too much. One of them in particular suggested answering "AI" for every single question.
This is definitely late. Hope the results come soon.
Where are the results for this? W/c 28th Oct was the given date for the analysis. Thanks.
Results?
I understand that you're no longer accepting submissions on the form.
Is there any way that you can edit the form, so that instead of collecting the information on your spreadsheet, it just e-mails the person their own answers? I'm teaching a class on AI literacy next term, and I think this would be a nice exercise to have students do for their own edification, with a record of their answers for when I tell them to read your follow-up post.