How Did You Do On The AI Art Turing Test?

...

Nov 20, 2024

Last month, I challenged 11,000 people to classify fifty pictures as either human art or AI-generated images.

I originally planned five human and five AI pictures in each of four styles: Renaissance, 19th Century, Abstract/Modern, and Digital, for a total of forty. After receiving many exceptionally good submissions from local AI artists, I fudged a little and made it fifty. The final set included paintings by Domenichino, Gauguin, Basquiat, and others, plus a host of digital artists and AI hobbyists.

One of these two pretty hillsides is by one of history’s greatest artists. The other is soulless AI slop. Can you tell which is which?

If you want to try the test yourself before seeing the answers, go here. The form doesn't grade you, so before you press "submit" you should check your answers against this key.

Last chance to take the test before seeing the results, which are:

…

1: Most People Had A Hard Time Identifying AI Art

Since there were two choices (human or AI), blind chance would produce a score of 50%, and perfect skill a score of 100%.

The median score on the test was 60%, only a little above chance. The mean was 60.6%. Participants said the task was harder than expected (median difficulty 4 on a 1-5 scale).
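To put "only a little above chance" in numbers, here is a minimal sketch that computes the score distribution for a hypothetical respondent who guesses every picture at random. It assumes each answer is an independent 50-50 coin flip, which real respondents are not. The names `guesser_distribution` and `N` are illustrative and are not part of the survey analysis.

```python
from math import comb, sqrt

N = 50  # pictures in the test

def guesser_distribution(n=N, p=0.5):
    """Probability of each possible score 0..n for someone answering at random.

    Assumes every answer is an independent guess that is right with probability p.
    """
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

dist = guesser_distribution()
mean_score = sum(k * pk for k, pk in enumerate(dist))
sd = sqrt(sum((k - mean_score) ** 2 * pk for k, pk in enumerate(dist)))

# 60% on a 50-picture test means 30 or more correct answers.
p_at_least_60 = sum(dist[30:])

print(f"guesser mean: {mean_score / N:.0%}, sd: {sd / N:.1%}")
print(f"chance a pure guesser scores 60% or more: {p_at_least_60:.1%}")
```

Under this assumption, roughly one pure guesser in ten would reach 60%, so any one person's 60% says little on its own. The more telling signal is that the median of the whole group sits about ten points above the 50% that a crowd of pure guessers would produce.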

How meaningful is this? I tried to make the test as fair as possible by including only the best works from each category; on the human side, that meant taking prestigious works that had survived the test of time; on the AI side, it meant tossing the many submissions that had garbled text, misshapen hands, or some similar deformity. But this makes it unrepresentative of a world where many AI images will have these errors.

This lovely AI image (generated by Jack Galler) almost made it into the test, but I noticed at the last second that the kid had no thumb. Or is the thumb hidden? I’m not sure, but I didn’t want to make it too easy.

I also tried to pick human works with a minimum of "tells" that would reveal their humanity without requiring any subtle artistic discrimination. So I stayed away from text (non-garbled text would be a strong sign that a picture was human), complicated wrestling-like poses (AIs mostly can't do these and end up with limbs emerging from nowhere) and pop art (something about the clean lines and replicated images is a bad match for AI's abilities). Again, this makes the test unrepresentative of a world where some art does have these "tells".

Finally, I avoided most AI art in the DALL-E "house style", since everyone already knows it's AI. I also avoided other styles that humans would have trouble replicating, perhaps because they do so much with color and lighting that few human artists would have the talent or patience for it.

I like this picture. There’s nothing wrong with it. But somehow it’s obviously AI. If you asked me why, I’d say “something about the lighting”. But the lighting is good! I bet lots of human artists *wish* they could do lighting like this. So what’s going on? I don’t know, but I avoided pictures in this style.

It might be fairest to say that this test demonstrated that most people have a hard time identifying AI art based on subtle differences in style and quality. But in real life, there will usually be other factors of the type that this test deliberately excluded.

2: Most People Couldn’t Help Judging Art By Its Style

I warned test-takers that I included human and AI art in a variety of styles, and that they shouldn’t judge art as human just because it looked like an oil painting, or judge it as AI just because it looked like a digital image.

Respondents didn’t heed my warning. One reason for their poor performance was that their guesses clumped by style, even though in reality each style was split nearly evenly between the two categories.

The “human bias” term indicates what percent of art in each category test-takers identified as human, normalized to a situation where the correct answer was always 50%. So given a 50-50 mix of AI and human 19th century art, test-takers would incorrectly judge 75% of it human; given a 50-50 mix of digital art, they would incorrectly judge only 31% of it human.
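The post doesn't give the exact normalization formula, so the following is only a sketch of one plausible reading. It averages the "voted human" rate over the truly human pictures and over the AI pictures separately, then weights the two classes 50-50. The function name `human_bias` and the vote fractions below are hypothetical and are not the real survey data.

```python
from statistics import mean

def human_bias(pictures):
    """Share of a style's pictures labeled 'human', reweighted to a 50-50 mix.

    `pictures` is a list of (is_actually_human, fraction_voting_human) pairs,
    one per picture in the style. ASSUMPTION: this is one plausible reading of
    "normalized to a situation where the correct answer was always 50%",
    not necessarily the formula used for the survey.
    """
    human_rates = [f for is_human, f in pictures if is_human]
    ai_rates = [f for is_human, f in pictures if not is_human]
    if not human_rates or not ai_rates:
        raise ValueError("need at least one human and one AI picture per style")
    # Average each class separately, then weight them equally, so a style that
    # happens to contain more AI than human pictures isn't skewed by that.
    return 0.5 * mean(human_rates) + 0.5 * mean(ai_rates)

# Hypothetical vote fractions for one style (NOT the real survey data).
example_style = [
    (True, 0.9),   # human picture; 90% of voters said "human"
    (True, 0.6),
    (False, 0.7),  # AI picture; 70% of voters said "human"
    (False, 0.6),
    (False, 0.5),
]
print(f"human bias: {human_bias(example_style):.1%}")
```

Weighting the two classes equally is what allows styles with uneven human/AI counts, such as the AI-heavy Impressionist set, to be compared on the same footing. An unbiased crowd would score 50% on this measure no matter how the style's pictures were split.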

Your instincts were worst for Impressionism; you identified every single Impressionist painting as human except the sole actually-human Impressionist work in the dataset (Paul Gauguin’s Entrance To The Village Of Osny).

Gauguin’s “Entrance to the Village of Osny”, which apparently looked more artificial than any of the actual AI-generated Impressionist pieces in the dataset.

Likewise, huge majorities voted that several human-generated digital images were by AIs:

Mitchell Stuart’s “Victorian Megaship”, which 84% of you thought was AI generated.

3: Most People Slightly Preferred AI Art To Human Art

I asked participants to pick their favorite picture of the fifty. The two best-liked pictures were both by AIs, as were 60% of the top ten.

This image (AI, generated by Jack Galler) was the best-loved in the competition.

Could this be an artifact of poorly chosen pictures? Most of the best-loved AI images were Impressionist; by chance, this category was somewhat AI-dominated in my dataset, so this could just reflect a love of Impressionist paintings (or a particular aptitude for AI in this area). But the human Impressionist painting I included (Entrance To The Village Of Osny, above) was actually quite unpopular. And if we remove all Impressionist paintings, then although humans reclaim the top two spots, an AI is still #3, and the machines still take 40% of the new top ten.

4: Even Many People Who Thought They Hated AI Art Preferred It

I asked participants their opinion of AI on a purely artistic level (that is, regardless of their opinion on social questions like whether it was unfairly plagiarizing human artists). They were split: 33% had a negative opinion, 24% neutral, and 43% positive.

The 1278 people who said they utterly loathed AI art (score of 1 on a 1-5 Likert scale) still preferred AI paintings to human ones when they didn't know which were which (the #1 and #2 paintings most often selected as their favorite were still AI, as were 50% of their top ten).

These people aren't necessarily deluded; they might mean that they're frustrated wading through heaps of bad AI art, all drawn in an identical DALL-E house style, and this dataset of hand-curated AI art selected for stylistic diversity doesn't capture what bothers them.

5: But Others Might Genuinely Be On A Higher Plane Than The Rest Of Us

I asked a friend (who does digital art under the handle “Ilzo”) to beta-test an early version of the challenge. She wowed me with her ability to correctly identify AI pictures that I considered well-camouflaged. When we got to Piotr Binkowski’s ruined gateway (an AI picture I especially liked, but which she found especially slop-ish), I demanded she explain herself.

She said:

When real pictures have details, the details have logic to them. I think of Ancient Gate being in the genre "superficially detailed, but all the details are bad and incoherent". The red and blue paint and blank stone feel like they're supposed to evoke worn-ness, but it's not clear what style this is supposed to be a worn-down version of. One gets the feeling that if all the paint were present it would look like a pile of shipping containers, if shipping containers were only made in two colors. It has ornaments, sort of, but they don't look like anything, or even a worn-down version of anything. There are matchy disks in the left, center, and right, except they're different sizes, different colors, and have neither "detail which parses as anything" nor stark smoothness. It has stuff that's vaguely evocative of Egyptian paintings if you didn't look carefully at all. The left column has a sort of door with a massive top-of-doorway-thingy over it. Why? Who knows? The right column doesn't, and you'd expect it to. Instead, the right column has 2.5 arches embossed into it that just kind of halfheartedly trail off. I'm not even sure how to describe the issues with the part a little above the door. It kind of sets a rhythm but then it gets distracted and breaks it. Are these semi-top protruding squares supposed to be red or blue? Ehh, whatever. Does the top border protrude the whole way? Ehh, mostly. Human artists have a secret technique, which is that if they don't know what all the details should be they get vague. And you can tell it's vague and you're not drawn to go "hmm, this looks interesting, oh wait it's terrible".

And later, after the discussion veered more philosophical:

I think part of the problem with AI art is that it produces stuff non-artists think look good but which on close inspection looks terrible, and so it ends up turning search results that used to be good into sifting through terrible stuff. Imagine if everyone got the ability to create mostly nutritionally adequate meals for like five cents, but they all were mediocre rehydrated powder with way too much sucralose or artificial grape flavor or such. And your friends start inviting you over to dinner parties way more often because it's so easy to deal with food now, but practically every time, they serve you sucralose protein shake. (Maybe they do so because they were used to almost never eating food? This isn't a perfect analogy.) Furthermore, imagine people calling this the future of food and saying chefs are obsolete. You'd probably be like "wow, I'm happy that you have easy access to food you enjoy, and it is convenient for me to use sometimes, but this is kind of driving me crazy". I feel like this is relevant to artist derangement over AI art, though of course a lot of it is economic anxiety and I'm a hobbyist who doesn't feel like a temporarily embarrassed professional and thus can't relate.

Her theory gets some support from the data. The average participant scored 60%, but people who hated AI art scored 64%, professional artists scored 66%, and people who were both professional artists and hated AI art scored 68%.

The highest score was 98% (49/50), which 5 out of 11,000 people achieved. Even with 11,000 people, getting scores this high by luck alone is near-impossible. I’m afraid I don’t know enough math to tease out the luck vs. skill contribution here and predict what score we should expect these people to get on a retest. But it feels pretty impressive.
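The "near-impossible by luck" claim can be checked with the binomial distribution. This sketch assumes each answer is independently right with a fixed probability `p`. The helper `prob_at_least` is a hypothetical name, and the 80% and 90% "true accuracy" levels are illustrative assumptions, not measured values. Predicting a retest score would additionally require an assumed distribution of true skill across participants (a prior), which this sketch does not model.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k of n answers correct) if each is independently correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

N_PICTURES = 50
N_PARTICIPANTS = 11_000

# If every participant guessed at random (p = 0.5), how many 49+/50 scores would we expect?
p_luck = prob_at_least(49, N_PICTURES, 0.5)
expected_lucky = N_PARTICIPANTS * p_luck
print(f"P(49+ by pure guessing) = {p_luck:.1e}; expected count = {expected_lucky:.1e}")

# Hypothetical respondents whose true per-picture accuracy is 80% or 90%.
for skill in (0.8, 0.9):
    print(f"true accuracy {skill:.0%}: P(49+) = {prob_at_least(49, N_PICTURES, skill):.2%}")
```

Pure guessing essentially never produces a 49/50. However, even a respondent with a true accuracy of 90% would hit 49/50 only around 3% of the time. That suggests the top scorers combine real skill with some luck, and their retest scores would probably come in somewhat lower.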

So maybe some people hate AI because they have an artist's eye for small inadequacies and it drives them crazy.

What Did We Learn About Art?

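Alan Turing predicted that by around the year 2000, an average interrogator would have no more than a 70% chance of correctly identifying a machine after five minutes of questioning; this is often read as saying an AI “passes” the Turing Test if it fools at least 30% of judges. By that standard, AI artists pass with room to spare: on average, 40% of participants mistook each AI picture for human.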

What does this tell us about AI? Seems like they’re good at art. I’m more interested in what it tells us about humans.

Humans keep insisting that AI art is hideous slop. But also, when you peel off the labels, many of them can’t tell AI art from some of the greatest artists in history. I’ve tried to be as fair as possible to these people, proposing that maybe they’re just expressing frustration with the proliferation of the DALL-E house style. And maybe some really do have an amazing eye for tiny incongruous details.

But it also seems very human to venerate sophisticated prestigious people, and to pooh-pooh anything that feels too new or low-status or too easy for ordinary people to access - without either impulse connecting with the actual content of the painting in front of you.

Marcel Duchamp famously tried to exhibit a urinal (Fountain, 1917) in an art show to challenge people’s view of what art was. The organizers rejected it, but Duchamp had the last laugh: in 2004, a survey of art professionals judged it the most influential artwork of the 20th century. Art, it seems, is most meaningful when it challenges our very concept of what art is.

By this standard, I submit that Sam Altman is the greatest artist of the 21st century.

Thanks to everyone who took the test. You can download a .xlsx file of the results (stripped of identifying details) here.

Appendix: Attributions For Test Images

1: Angel Woman

Human. This is “Living Saint Hazel” by LJ Koh, as seen at /r/ImaginaryWarhammer.

This was the picture that sparked the strongest disagreement, measured by the sum of people who said it was the most-certainly-human picture in the dataset plus the people who said it was the most-certainly-AI picture. Some of the people who got it right commented that it was from Warhammer and the uniforms had accurate Warhammer symbols - if I had realized this, I would have disqualified it, sorry.

2: Saint In Mountains

Human. This is “St. Anthony Abbot Tempted By A Heap Of Gold”, by the “Osservanza Master”, an unknown Italian Renaissance painter from around 1435. Apparently it used to have a heap of gold in the bottom corner tempting St. Anthony, but this was “scraped out”. If I had known that originally, I would have disqualified this one too, since it might spoil something uniquely human about the integrity of the composition.

3: Blue Hair Anime Girl

Human. This is Hatsune Miku, a “virtual idol” from the late 2000s/early 2010s.

4: Girl In Field

AI. This image was generated by Ryan Wise, an AI art hobbyist who reads ACX and responded to my request for good AI pictures.

5: Double Starship

Human. This is “Malabar”, by Wojtek Kapusta.

6: Bright Jumble Woman

AI, also by Ryan.

7: Cherub

AI. This one was generated by another ACX reader, Jack Galler.

8: Praying In Garden

Human. This is “Agony In The Garden” by Andrea Mantegna, 1455.

9: Tropical Garden

Human. This is “Garden” by David Hockney. A very similar Hockney painting sold for $8 million in 2021.

10: Ancient Gate

AI. This is by Piotr Binkowski, a well-known AI art maker who posts his work on his Twitter.

11: Green Hills

AI. Another one by Jack.

12: Bucolic Scene

Human. This is “Dover Plains” by Asher Durand, painted 1848. It depicts the Hudson Valley in New York.

13: Anime Girl In Black

AI. Sorry, I seem to have lost the original source on this one, let me know if it’s yours.

14: Fancy Car

Human. This is “Ferrari Testarossa Neon Retrowave Synth”, by Arslan Safiullin.

15: Greek Temple

Human. This is “The Apotheosis Of Homer”, by Jean-Auguste-Dominique Ingres (1827).

It's also the only one that was (sort of) a trick question: after I selected it for the dataset, I noticed it contained text. Normally that would be disqualifying (correct text is too obviously human). But the most prominent text is the “OMHP” on the temple, which spells “Homer” in Greek but is gibberish in English. I was curious how many people would judge a famous work of art to be AI-generated just because it had seemingly gibberish text on it; the answer was 60%.

16: String Doll

AI. This is “Strings Come Alive” by Nikko P at Nightcafe. This was the picture that people were most confident was AI (they were right).

17: Angry Crosses

AI. This is another one by Ryan.

18: Rainbow Girl

Human. This is “Rainbow Hair” by rjv-ilustracion.

19: Creepy Skull

Human. This is “Untitled (Skull)” by Jean-Michel Basquiat in 1981. A version of this painting sold for $110 million in 2017 and was “the priciest work ever sold by a US artist”.

20: Leafy Lane

AI. This is another one by Jack.

21: Ice Princess

AI. This is “Snow Princess” by Ai Xi, seen at PixAI.

22: Celestial Display

Human. This is “Five Minutes Of Silence” by Hangmoon, seen at DeviantArt. This was the top-rated human picture.

23: Mother And Child

AI. This is “Ukrainian Madonna”, generated by TheLibertarianCatholic.

24: Fractured Lady

AI, another one by Ryan.

25: Giant Ship

Human. This is “Victorian Megaship” by Mitchell Stuart. This was the human picture that people got most wrong (i.e., were most likely to vote as AI).

26: Muscular Man

AI, another one by Ryan.

27: Minaret Boat

AI. This is “Built For The Princess” by Nikko P at Nightcafe.

28: Purple Squares

Human. This is “Fire At Full Moon” by Paul Klee, and is supposed to be a “Cubist style depiction of a night sky”.

29: People Sitting

Human. This is “Tailor’s Workshop” by Quiringh van Brekelenkam, 1660.

30: Girl In White

Human. This is “Portrait of Charlotte du Val d'Ognes” by Marie-Denise Villers (1801). I messed up adding this to the test, so only about half of you saw it.

31: Riverside Cafe

AI, another one by Jack. This was the most popular picture in the dataset.

32: Serene River

Human. This is “Banks Of The Oise At Auvers”, by Charles-François Daubigny (1863).

33: Turtle House

AI. This is “Mobile Home”, by Bellemia, seen on Nightcafe.

34: Still Life

AI, another one by Jack.

35: Wounded Christ

Human. This is “The Mourning Of Christ” by Giovanni Girolamo Savoldo (1515). This was the picture that people were most confident was human (they were right), but a few people protested and said that the anatomy was so wrong that it must be AI-generated. Sorry, I guess Giovanni Girolamo Savoldo just wasn’t very good at anatomy. Maybe that’s why Michelangelo had to dissect all those corpses.

36: White Blob

Human. This is from “Le Lézard aux Plumes d'Or” by Joan Miró (1971).

37: Weird Bird

AI. This is another one from Ryan. People say AI can’t invent new styles, but I’ve never seen any human make this exact type of weird bird.

38: Ominous Ruin

AI, Ryan again.

39: Vague Figures

Human. This is “Blood Thicker Than Mud”, by Cecily Brown (2021).

40: Dragon Lady

AI. This is “To Me, You’re Perfect” by Ria Hagane on Nightcafe, made with DALL-E 3.

41: White Flag

Human. This is “Meeting At Krizky” by Alphonse Mucha (1916). It is part of his Slav Epic, a series of paintings on the history of Eastern Europe, and depicts a meeting of the Hussite sect, whose attempts to found a sort of proto-Protestantism would spark the 15th-century Hussite Wars.

42: Woman Unicorn

Human. This is “The Maiden And The Unicorn” by Domenichino (1602).

43: Rooftops

AI. I managed to lose this one, sorry! If it’s yours, let me know and I’ll give you credit.

44: Paris Scene

AI, another one by Jack. This was the AI picture that people got most wrong (i.e., were most likely to vote as human).

45: Pretty Lake

AI, Jack again.

46: Landing Craft

AI, Ryan again. Ryan gave me lots of good sci-fi AI images, and I chose this one. People got it pretty easily, and I keep second-guessing myself and wondering if some of the others were better.

47: Flailing Limbs