305 Comments
May 30, 2022·edited May 30, 2022

Amazing images! But there's one problem: Tycho Brahe didn't use a telescope.

https://www.pas.rochester.edu/~blackman/ast104/brahe10.html

See also: https://en.wikipedia.org/wiki/Tycho_Brahe#Tycho_Brahe's_Instruments

May 30, 2022·edited May 30, 2022

Somewhat off topic, but the fact that Tycho Brahe was able to approximate the true size of the cosmos without a telescope is impressive - of course, he immediately rejected the implication as ridiculous, that distant stars could be larger than the orbit of the Earth...


Speaking of small details: Rather than Cervinae, reindeer and moose are both in Capreolinae. They are indeed in the deer family, but that is Cervidae.

author

You're right, thanks, fixed.


Didn’t Brahe have a silver nose? He lost his nose in a duel and wore a prosthetic silver one. I knew the moose story, too, but I always think of Brahe as “the guy with a silver nose”, so I was surprised that detail wasn’t included by Scott or DALL-E.


Exactly! I wanted to say the same thing. Although it probably was brass, not silver.


I heard that he wore a brass nose most of the time, but had a silver nose for special occasions.


Came here to say the same. I totally knew about the nose, but not about the moose! 👃🏻🗡😛


I was trying to figure out if Brasenose College at Oxford was named after a traditional prosthetic body part, but Wikipedia tells me "Its name is believed to derive from the name of a brass or bronze knocker that adorned the hall's door."

https://en.wikipedia.org/wiki/Brasenose_College,_Oxford


Maybe this says something about my priorities, but after "astronomy" the second category I have Brahe mentally filed under is "lost nose in duel". If he's on Jeopardy I'd wager decent money the answer is "This Danish astronomer lost part of his nose in a duel". The moose pictures are funnier though.


There's a joke to be made here about DALL-E subtly protesting the inaccuracy via the backwards telescope in the first moose shot. Alas, I can't get the details right.


Maybe DALL-E is more subtle than you think, and is trying to be accurate to period style when it makes the figures in stained glass windows look nothing like the people they're supposed to portray?


You can repair the Reverend's head with 'uncropping', expanding the image upwards. Examples: https://www.reddit.com/r/dalle2/search?q=flair%3AUncrop&restrict_sr=on


Am I the only one for whom 'metal nose', and not 'pet moose', was the defining trait of Tycho Brahe?


No, you're not.


Nope! Kinda disappointed Scott dropped the (admittedly rather small) ball on this one.


I just scrolled down to the comments to look for this exact point, haha. I'd love to see what happens if you throw "metal nose" into the mix, especially with Rudolph and Santa imagery already muddling things.


I googled him just now; not every picture shows his metal nose, but every picture shows his big long mustache.


Metal nose, or diagram of the Tychonic system (a nice reminder, also, that precision doesn't guarantee accuracy). His prosthetic nose was apparently flesh-toned and fairly inconspicuous, but if I saw an astronomer with a conspicuous metal nose, I'd assume Brahe was meant.

I am sad I didn't learn about the moose earlier.


No, no, no; his most memorable trait is how he died!


etiquette-induced uremia IIRC


I think of him as exploding bladder guy. Probably wasn't real but it is definitely the thing I associate with him.


Now that would be interesting to see in bot-rendered stained glass!


Yes, the three facts I knew about Brahe were - Kepler's teacher who did astronomy before telescopes; metal nose; died because he was embarrassed to leave for the bathroom while drinking with the king. I had never heard of the moose thing.


That and the data collection are all I know about him!


Off-topic, yet topical enough I don't want to put this in the off-topic thread: Matt Strassler has been doing a bunch of “how to discover/prove basic astronomy facts for yourself” recently: https://profmattstrassler.com/2022/02/11/why-simple-explanations-of-established-facts-have-value/


Here's my idea: Alexandra Elbakyan standing on the shoulders of the Montgolfier brothers, who are themselves standing on the shoulders of Thomas Bayes, who is standing on the shoulders of Tycho Brahe, who is standing on the shoulders of William of Ockham. Yes, I know DALL-E wouldn't want to stack so high as it was already cutting off heads. So I might as well have Ockham standing on a giant turtle.


What's the turtle standing on though?


It's turtles all the way down.


I think it's Darwin Finches all the way down.


The turtle is standing on Terry Pratchett's legacy.


In the style of the elevator in the haunted mansion at disneyland.


The AI seems fuzzy on what, exactly, a telescope is used for. Most of the time Tycho seems to be trying to breathe through it, or lick it, or stick it up his nose; even when he is looking through the eyepiece, as often as not he's just staring at the ground. I dunno, maybe the AI heard that story about the drunken moose and figured that Tycho himself was typically fully in the bag.


😂 hadn't noticed that, but going back again, it's pretty hilarious.


It's funny how the telescope is pointed at his nose or mouth in every single image. You'd think DALL-E would get it right at least once, just by chance?


This was my biggest takeaway from the images, yes


But Tycho never used a telescope, he used a quadrant.


Trained on Smell-o-scope from Futurama?


When scratch and sniff cards were flown ordering Brahe to retreat, he raised his smelloscope to his metal nose and said "I have a right to be anosmic sometimes. I really do not smell the signal"


Tycho didn't use telescopes, did he? Isn't he the guy who built, like, giant sextants & things like that?


No, no, you misunderstand. DALL-E imagines Tycho thinking: "I wish that goddam moose would get out of my field of view so that I can resume my observations."


It was the scales that Bayes was holding that made me laugh, especially the one where he is holding the scale up by one of its bowls.


Image searching "looking through telescope" often gets you stock photos of people looking at a telescope instead of holding it to their eye (maybe that makes a better shot of their face?) or looking through a reflector telescope (which requires you to look 90 degrees from the direction of the scope), so maybe it has a hard time recognizing that "looking through" is an important part of that phrase?


More like breathing through a telescope or sniffing through one!


Yes, that was the first thing I noticed.


Would love to see this with Imagen, Google's even-newer image synthesizer. (No public demo though, alas.) In the examples we've seen, it does a much better job of mapping adjectives to their corresponding nouns instead of just trying to apply all the adjectives to all of the nouns, which is the main failure going on here.


Re faces, OpenAI says:

> Preventing Harmful Generations

> We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images. By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.


It's always possible that that was their preferred way of saying "we are not capable of generating recognizable faces".


Very unlikely, since much weaker models can do that - it does seem clear that they have explicitly gone out of their way to disrupt the generation of realistic faces.

May 31, 2022·edited May 31, 2022

It says specifically that they intended to prevent photorealistic face generation. They could have used a solution that also happened to disrupt stylized face generation, but that would be unfortunate, and would go beyond what they said here.

I think Scott's explanation that it's anchoring heavily on style cues is more likely.

Also importantly, it's *not* trained as a facial recognition algorithm, so it doesn't necessarily even know what features to use to identify individuals. (It has no way of knowing, for example, that I don't prepare for shaving by putting on my big red beard, or that a woman doesn't age 30 years and dye her hair black when she walks into a library.)


> I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them ...

What if the problem is more subtle than either of those two alternatives? What if the mapping between language prompts and 'good' pictures is itself quite fuzzy, such that different people will judge pictures rather differently for the same prompt, due to different assumptions and expectations? Don't we encounter such situations all the time, e.g., in a workplace meeting trying to settle on a particular design? Is it not naive to assume that there are objectively 'best' outputs, and we just need a better model to get them? What if I thought a particular picture was excellent, and you said, "No, no, that's not what I meant?"


I mean, it's clearly an objective defect that you can tell it a person's name and the art doesn't look like that person.


Of course. I'm not making the obviously wrong assertion that there are no standards of quality to be applied. There are clearly outputs that we'd all agree are bad. (In the extreme case, consider a model that output a solid black square for every prompt: obviously bad!)

So, now that we've gotten that trivial observation out of the way, let's get back to the substantive point, which I believe still stands.


You're mixing up whether people like a particular image with whether people will judge a particular image as being a good match for a given prompt. There is a lot of variation in artistic taste, but there is widespread agreement on what images depict.


The bulk of Scott's post discusses how good the various images are, not just identifying what the image depicts. I'm talking about exactly what Scott is talking about. Moreover, I think Scott is talking about exactly the right thing: image quality based on a wide variety of factors, which is more important than bare identification of what the image depicts.

So, both in the context of Scott's post and more generally, I don't think I'm "mixing up" anything.


It's not a defect, it's deliberate. The model can do that but they're afraid of DeepFakes so they "broke" it to stop that happening.


Prepositions are difficult even for people - with, in, as, on can mean very similar things depending on the associated nouns and verbs. DALL-E might not be sensitive to order or phrase structure. If it just soaks out "raven, female name, key, mouth, stained glass, library" and then groups them randomly with prepositions and associates those groups to background, foreground, mid-ground - it wouldn't distinguish between woman as raven, and woman with raven. Those would be reflected in the range of options it generated.

Also, it may be chopping up the input into two- or three-word phrases based on adjacent words. Putting the same words in the query in a different order, or re-ordering the phrases such that a person would interpret them the same way - I have a feeling that would generate different results.
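
If anyone with access wants to test that systematically, here's a minimal sketch - plain Python, no API calls, it just prints reordered prompts to try by hand (the phrase decomposition is my own guess at how DALL-E might chunk the query):

    from itertools import permutations

    # A guess at how the prompt might decompose into movable phrases.
    phrases = [
        "a woman in a library",
        "with a raven on her shoulder",
        "with a key in its mouth",
    ]

    # Print every ordering. If DALL-E really is insensitive to phrase
    # structure, all six prompts should give similar image distributions.
    for perm in permutations(phrases):
        print("stained glass window of " + " ".join(perm))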

Wish I knew what makes it switch to the cartoon-style faces. Possibly it fills in with cartoon style when the input images are less numerous.

And yes, the preferences could vary widely.


If Scott gets another go at it, maybe trying physical proximity descriptors like "near" or "adjacent" might go better. (Sitting on, standing on, eating, holding, waving, etc.)


Exactly. I think the key point you make is that some of these aspects "are difficult even for people," which points to a certain indeterminacy in deciding what the "best" output would even be.


This would probably be cheating in the strict sense, but if DALL-E had an "optometrist" setting where the user could indicate their preference given the first outputs and then refine that, it would help.

Also, Bayes tended to be looking to his right in DALL-E's results, just as he was in the one online image. But it was also not distinguishing his shaded jaw from the background, making his face either pinched or round instead of square-jawed.


Yes, an architecture that allowed for iterative refinement would be cool. It would mean that, for subsequent rounds of refinement, the previous image would need to be an input, along with the new verbal prompt.

I don't think that's cheating at all. It's a more sophisticated (and probably more useful) architecture.
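
As a sketch of what that interaction loop might look like - the generate method and its init_image parameter are assumptions for illustration, since DALL-E 2 exposes no such public interface:

    def pick(images):
        # Trivial stand-in for a UI: show indices, read the user's choice.
        idx = int(input(f"Pick your favorite, 0..{len(images) - 1}: "))
        return images[idx]

    def refine(model, prompt, rounds=3, n=10):
        """Hypothetical loop: generate, let the user pick, condition on the pick."""
        images = model.generate(prompt, n=n)
        for _ in range(rounds):
            favorite = pick(images)
            prompt = input("Refine the prompt (Enter keeps it): ") or prompt
            # Feeding the chosen image back in carries over composition and
            # palette, so "more like that one" actually means something.
            images = model.generate(prompt, n=n, init_image=favorite)
        return images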


The Victorian radiator was a nice touch in one of the "Scholarship" panels, but it makes me think it isn't distinguishing between "in the style of" and "with period-appropriate accoutrements." In addition to iterative refinement, it could have a "without accoutrements" check-box, maybe a "no additional nouns" feature. But I wouldn't want that all the time, since seeing what it comes up with is enjoyable.


For that to work would you have to train it up with a lot of that sort of refinement dialogs? If so it’s not clear where that training input would come from.

I am so very *not* up on this stuff, but I think it’s still the case in these systems that the “learning” part is all in the extremely compute-intensive training and that the systems that actually interact with users are not doing learning per se. Is that right? It seems like what you’re asking is for it to learn on the fly, and we may not be there yet.


> I think it’s still the case in these systems that the “learning” part is all in the extremely compute-intensive training and that the systems that actually interact with users are not doing learning per se. Is that right?

Yes, that's correct, for the most part. What would be required is for the training input to include images as well as the language description, with good image output targeted to be similar to the image input. That would allow the system to be used at run-time in an iterative manner, even though no learning is going on at that point.


Yes, human language is ambiguous, but you get to try again. The question is whether you can clarify what you meant like you would with a human artist, or whether it feels more like query engineering.


I wonder how hard it would be for a future version of the software to include the ability to pick your favourite of the generated images and have it generate more "like that," perhaps even in response to a modified query. I think that might feel more like clarification.

It might also open the door to a future personalized version that could learn your tastes, and perhaps to one that could take images as prompts and refine/alter them in ways that our current algorithms can't. (The uncropping function is already impressive, but I'm thinking of e.g. turning a sketch into a painting or a photorealistic render.)


Sorry about the long-delayed comment.

Yeah, the very first picture for "Darwin studying finches" shows Darwin looking at a book, with a finch under both of them. { (Darwin studying), (finches) }

Yeah, DALL-E indeed looks like it "might not be sensitive to order or phrase structure".


The problem is that it doesn't actually understand human language at all. It's just faking it. This is just an example of that.

It's true of all of these things. They're programming shortcuts that "fake it", they can't actually do the thing they're supposed to do, they just do "well enough" that people are like "Eh, it's usable."

This is true of image recognition as well, which is why it is possible to make non-human detectable alterations to images that cause these systems to completely misidentify them.

It's the same thing here - the system isn't actually cognizant of human language or what it is doing, it has created a complex black-box algorithm which grabs things and outputs them based on things it has associated with them heuristically. It's why you end up with red "christmasy" stuff in the reindeer images, and why the images look so bad the more specific you get - it's basically just grabbing stuff it found online and passing it through in various ways.

It seems superficially impressive, and while it is potentially "useful", it isn't intelligent in any way, and the more you poke at it, the more obvious it becomes that it doesn't actually understand anything.


Curious what you're planning on depicting for the other six virtues


> These are the sorts of problems I expect to go away with a few months of future research.

Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity.

Not to toot my own horn, but two years ago you were naively saying we'd have GPT-like models scaled up several orders of magnitude (100T parameters) right about now (https://slatestarcodex.com/2020/06/10/the-obligatory-gpt-3-post/#comment-912798).

I'm registering my prediction that you're being equally naive now. Truly solving this issue seems AI-complete to me. I'm willing to bet on this (ideas on operationalization welcome).

author
May 30, 2022·edited May 30, 2022

It wasn't intended as a precise prediction (it used the word "about" and I described it as "irresponsible"), and we have 10 trillion parameter language models now. So I predicted it would go up from 175 billion to 100 trillion, and so far it's at 10 trillion and continuing to grow. It wouldn't surprise me if it reached 100 trillion before the point at which someone would claim "about two years" stopped being a fair description (though I think this is less likely with Chinchilla taking the focus off parameter counts).

My experience is that almost all of the things people pointed out as "AGI-complete" flaws with GPT-3 are in fact solved in newer models (for example, its inability to do most math correctly), and that you really do just need scale.

If you want to make a bet, I'll bet you $100 (or some other amount you prefer) that we can get an AI to make a stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth sometime before 2025 (longer than the "few months" I said was possible, but probably shorter than the "until AGI" you're suggesting). If neither of us can access the best models to test them, we'll take a guess based on what we know, or it'll be a wash. Interested?


Do you have a link for the 10T parameter model? I haven't heard of this, and it would indeed put that prediction into "too close for comfort" range, even if I end up being technically correct.

In principle I'm interested, though I have to consider if your operationalization really addresses the core issue for me. Let me give it a think and get back to you in a day or two.

And just in case it isn't clear, I'm not dissing you, just challenging you in the spirit of adversarial collaboration.

author

I heard about it at https://towardsdatascience.com/meet-m6-10-trillion-parameters-at-1-gpt-3s-energy-cost-997092cbe5e8 , though I don't know any more than is in that article and if you tell me this isn't "really" 10 trillion parameters in some sense then I'll believe you.

I also don't mean this in a hostile way, but I'm still interested in the bet if you are.


On a tangent, if you have a list of bets going like that, it'd be nice to have them available on a webpage somewhere where people can see. I'm not sure substack supports that use case at all, though...


there's a logs & bets subforum on DSL (the semi-official bulletin board): https://www.datasecretslox.com/index.php/board,11.0.html


I asked about this on DSL, and it seems like these models are gaming the parameter count statistic to appear more impressive than they are. Specifically, they're not dense networks like GPT.

https://www.datasecretslox.com/index.php/topic,6232.msg257289.html#msg257289
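
For anyone wondering what "gaming the parameter count" means concretely: in a sparse mixture-of-experts model, a router activates only a couple of expert networks per token, so the headline parameter count can dwarf the parameters actually doing work on any given input. A back-of-the-envelope sketch (the layer sizes are invented for illustration):

    # Dense feed-forward layer: every parameter touches every token.
    d_model, d_ff = 4096, 16384
    dense = 2 * d_model * d_ff          # up-projection + down-projection, ~134M

    # Mixture-of-experts: 64 expert copies, but only 2 are active per token.
    n_experts, active = 64, 2
    print(f"press-release count: {n_experts * dense / 1e9:.1f}B parameters")
    print(f"active per token:    {active * dense / 1e6:.0f}M parameters")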

Regarding the bet: you're on, but let's make it 3 out of 5 for similarly complex prompts in different domains, and different ways of stacking concepts on top of each other (e.g. off the top of my head: two objects, each with two relevant properties, and the objects are interacting with each other)

author
Jun 4, 2022·edited Jun 4, 2022

All right. My proposed operationalization of this is that on June 1, 2025, if either of us can get access to the best image generating model at that time (I get to decide which), or convince someone else who has access to help us, we'll give it the following prompts:

1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth

2. An oil painting of a man in a factory looking at a cat wearing a top hat

3. A digital art picture of a child riding a llama with a bell on its tail through a desert

4. A 3D render of an astronaut in space holding a fox wearing lipstick

5. Pixel art of a farmer in a cathedral holding a red basketball

We generate 10 images for each prompt, just like DALL-E 2 does. If at least one of the ten images has the scene correct in every particular on 3/5 prompts, I win, otherwise you do. Loser pays winner $100, and whatever the result is I announce it on the blog (probably an open thread). If we disagree, Gwern is the judge. If Gwern doesn't want to do it, Cassander; if Cassander doesn't want to do it, we figure something else out. If we can't get access to the SOTA image-generating model, we look at the rigged public demos and see if we can agree that one of us is obviously right, and if not then no money changes hands.

Does that work for you?


Ok, you're on! Will put this in my prediction log on DSL so it doesn't get lost.


Thanks for the follow-up post and link to the bet!

While I'm sure it's covered by your future judges, I'm curious if you both agree on the resolution criteria in the case where the AI makes an unintuitive (but grammatically correct) interpretation. What happens if the image contains:

1) an anthropomorphized library that has a shoulder (with a raven on it) or a mouth (with a key in it)?

2) a man wearing a top hat?

3) a child with a tail?

4) an astronaut wearing lipstick?

Does it get credit for such interpretations? Or are you looking for one with tighter binding of the prepositional phrases, that I think is more conventional English interpretation?

Or are you looking for one that draws a more sensical interpretation from a model of the world, where libraries "obviously" don't have shoulders or mouths? In this case, for #2, I might think it's more sensical for a man to wear a top hat, so I wouldn't be surprised if it chose to do that vs the cat. Though I suppose if the collected works of Dr. Seuss end up in the training data, it might have its own view on whether cats wear hats. :)


I'm skeptical that math will truly be solved just by scaling. These models are trained to make stuff up using the techniques of fiction, in much the same way that a human would when writing fiction. A text generator for generating fiction will always have some chance of deliberately making math mistakes for verisimilitude, just as an image generator isn't going to limit itself to real historical photos, even if you ask for a "real historical photo."

A simple example: if you ask it for the URL of a page that actually exists, will it ever stick to the links it's seen in the dataset, or will it always try to generate "plausible" links that seem like they should exist?

A robo-historian that always used correct links, accurate quotes, and citations from real books, even if the argument is made up, would be pretty neat, and hopefully not too hard to build, but it's not going to happen just by scaling up fiction generators. If you want nonfiction, you have to design and/or train it to never make things up when it shouldn't do that.


It's been built already - that's Cyc. But unfortunately Cycorp seems to be uninterested in marketing, uninterested in open collaboration, and uninterested in making their tech more widely available for experimentation. They're very much in the enterprise mindset of making bespoke demos for rich clients.


Or, you know, the actual reason for restricting access is that it doesn't actually work, and when people experiment with it, it becomes increasingly obvious that it isn't actually doing what it is claimed to do.

Which is exactly what is going on.

None of these systems are actually capable of this functionality because the approach is fundamentally flawed.

Machine learning is a programming shortcut, it's not a way to generate intelligent output. That doesn't mean you can't make useful tools using machine learning, but there are significant limitations.


Cycorp doesn't use machine learning so I'm not sure what your point is. It's a classical symbolic approach using pure logic.


> These models are trained to make stuff up using the techniques of fiction, in much the same way that a human would when writing fiction.

Are they? I was under the impression that they were primarily trained to predict "what would the internet say?" (given their training sets).

This suggests that they are at their best when generating the sort of stuff you're likely to find on the internet, which is consistent with what I've seen so far. I've yet to see any evidence of semantic abstraction, which is the absolute minimum for "working like a human does".


For example, let's say you wanted to write a fictional Wikipedia article about a French city. You might start by reading some real Wikipedia articles of French cities and taking aspects of each and combining them so that your fictional article has many of the characteristics of the real articles.

If you're given half an article about a French city then you could fill in the rest by making good guesses. This will work whether it's a real French city or a fictional one. The parts you fill in are fiction either way.

Similarly, when we train AI models by asking them to fill in missing words regardless of whether they know the answer, we are training them to make up the details. "The train station is _ kilometers from the center of town." If it doesn't know the answer it's going to guess. What's a plausible number? Guessing right is rewarded.
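
You can watch this guessing behavior directly with any off-the-shelf masked language model. A minimal sketch using the Hugging Face transformers fill-mask pipeline - the model will cheerfully rank plausible distances it cannot possibly know:

    from transformers import pipeline

    # BERT was trained to fill in masked words, i.e. to guess well, not to know.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    prompt = "The train station is [MASK] kilometers from the center of town."
    for guess in fill(prompt):
        print(f"{guess['token_str']:>10}  score={guess['score']:.3f}")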

I wrote "using the techniques of fiction" when I should have just said "training to be good at guessing." Sorry about that!


I dunno, that seems like a pretty low bar. It's a step above Mad Libs or Choose Your Own Adventure, but only just. Seems a long way from creative writing, still less an essay with an original point.

And heck, if it were that easy, you could teach *human beings* to be good writers by just having them read a bunch of well-known good authors and a sampling of bad authors, suitably labeled, and asking them to mimic the good authors as best they can. Which doesn't work at all, public school English curricula notwithstanding.


I'm not saying it's *good* fiction. But it's still closer to making stuff up than writing something evidence-based.

A trustworthy AI would give you a real Wikipedia article if you asked for one, like a search engine would. If it made stuff up, it would indicate this. It seems like this basic trustworthiness (being clear about when you're making things up) would be a small step towards AI alignment?


They are already advanced well beyond 'repeat back what I read on reddit' level of sophistication.

https://arxiv.org/abs/2205.11916 as one example.


See, that's part of the problem. You consider making things up to be more "advanced" and "sophisticated" than accurate reporting and sourcing. But an AI that doesn't provide sources is worse at nonfiction.


I honestly have no idea what that has to do with anything I said, nor the comment I was replying to.


I'm rather sure that you're right that there are problems in this that can't be solved by scaling, but there are also problems that can be.

OTOH, what do you do about problems like the Santa Claus attractor? When the description (as understood) is fuzzy, I think that it will always head to the areas that have heavier weights. Perhaps an "exclude Santa Claus" option would help, but that might require recalculating a HUGE number of weights. (And the robo-historian would have the same problem. So do people: "When all you have is a hammer, everything looks like a nail.")


Following, in case this thread is updated with the real world outcome in "a few months".


I know/have known people who developed machine learning systems and tried to train them to solve problems. Watching them tinker with them, and discussing the issues with them, it was profoundly obvious to me that this is a programming shortcut and that it is being marketed to the public and investors as something it isn't. There was a push to describe it using different terminology so people wouldn't think it was "intelligent", but it got defeated, probably because "artificial intelligence" is a much more marketable term that gets people investing, unlike more accurate terminology that would have set better expectations.

These systems have issues with overfitting (you feed in data, and it is great at looking at data from that set, but terrible outside of it) and you also find weird things like it finds some wonky thing which matches your data set but not the data set in general, or it latches onto some top-level thing that is vaguely predictive and can't get off of it (hence the racist parole "AI" in Florida that would classify white people as lower risk than black people, not based on individual characteristics but their race, because black people in general have a higher rate of recidivism).

The dark side of this: if people actually understood how these systems worked, no one would allow the "self driving cars" on public roads. There's a reason why people affiliated with those push the perception of intelligent "AI", and it's because it allows them to make money. It came out that the self-driving car that hit a pedestrian kept flickering between different identifications for the pedestrian and was throwing out that data rather than slowing down - because if it stopped driving any time its recognition was inconsistent, it would not be able to drive at all.

But in the end, these systems are all like this.

These systems are not only not intelligent, they aren't designed to be intelligent and aren't capable of it. They can't even fake it, once you understand how they function.

What is actually going on is that actually writing a program to solve many tasks is extremely hard, to the point where no one can do it.

What they do instead is use machine programming as a shortcut. Machine programming is a way of setting up a computer with certain parameters and running it to make it construct a black box algorithm based on input data.

Your goal is to create a tool which gives you the output you want more or less reliably.

The more limited a system is, the better these tools are at this. A chess computer doesn't understand chess at all, but it can give the correct outputs, because the system is limited and you can use a dataset of very good players to get it a very advanced starting position, and you can then use the computer's extreme level of computing power to generate a heuristic which works very well.

The more open a system is, the more obvious the limitations in this approach become. This is why these systems struggle so much with image recognition, language, etc. - they're highly arbitrary things.

The improvements to these systems are not these systems getting smarter. They're not. They are giving better output, but they are still flawed in various fairly obvious ways.

The reason why access to these systems is limited is because when you start attacking these systems, it exposes that these systems are actually faking their performance. If you are only given a select look at their output, they seem much, much more interesting than they actually are.

There is something known as an adversarial attack. A good example of this is image recognition:

https://towardsdatascience.com/breaking-neural-networks-with-adversarial-attacks-f4290a9a45aa

These adversarial attacks expose that these systems don't actually recognize what they're doing. They aren't identifying these things in a way that is anything like what humans do - these manipulated images often aren't even visibly different to the human eye, and certainly nothing like what the neural network thinks it is, but the system will think, with almost absolute certainty, that the image is a different image.
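
The canonical version of this is the fast gradient sign method covered in the linked article. A minimal PyTorch sketch, assuming you already have some differentiable classifier and a correctly classified input (it is not tied to any particular model):

    import torch
    import torch.nn.functional as F

    def fgsm(model, image, label, epsilon=0.007):
        """Nudge every pixel slightly in the direction that increases the loss.
        The change is invisible to a human but can flip the model's prediction."""
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        adversarial = image + epsilon * image.grad.sign()
        return adversarial.clamp(0, 1).detach()  # keep pixels in valid range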

The reason why is that these neural network image recognition algorithms are nothing more than shortcuts.

That isn't to say that these systems can't be useful in various ways, because they can be. But they don't understand what they are doing.

Jun 4, 2022·edited Jun 4, 2022

Tonight, one of my friends got a furry to draw him a sketch of two anthropomorphic male birds cuddling. Because the image shouldn't be in Google's index yet, I took that image and fed it into the system.

The results rather illustrate what is going on.

There is art of both of these characters, but the system didn't recognize either of them. It did return a lot of sketches, but none by the same artist as the one who drew it, either.

Instead, it returned a bunch of sketches of creatures with wings, mostly non-anthropomorphic gryphons, and a smattering of birds, but some other random things as well.

All of these are black and white. The characters are an owl and a parrot, but the results don't give either of those creatures for hundreds of results - they're mostly eagle-like creatures. It gives no color images, very few of the hits are even for anthro birds, the creatures in the images overwhelmingly don't have the "wing-hands" that the characters do. It doesn't give us birds cozying up for hundreds of posts, and when it does, they're non-anthro ones, despite there being vast amounts of furry art of bird boys cuddling. In fact, that was the only one I found in hundreds of results that even came close to the original image - but I suspect that it only found that image by chance, because the characters have feathers.

The system could identify that they had feathers/wings, but it couldn't identify what they were doing, it couldn't identify what they were, it couldn't tell that I might want color images of the subject matter, etc. It identifies the image as "fictional character", when in fact there are two characters in the image.

I took another image I have - a reference image of one of my characters, a parrot/eagle hybrid - and fed it into the system. This image IS on the Internet, but it couldn't find it (so it probably hasn't indexed the page it is on). As a result, it spat out a lot of garbage. It gave me the result of "language" (probably because the character's name is written on it) but it failed to identify him as a bird, and instead returned to me a bunch of furryish comics with a lot of green, teal, blue, and yellow on them.

It didn't correctly identify it as a character reference sheet (despite the fact that this is a common format for furry things), it didn't recognize that the character was a bird, it didn't recognize that they were a wizard (even though he's dressed in robes and a wizard hat and carries a staff)... all it really seems to have gotten is a vague "maybe furry art with text on it".

If you go down a ways, you do find the odd character sheet - but not of bird boys. As far as I can tell, it seems to be finding them based on them having a vaguely similar color scheme and text on the top of them. Almost none of the images are these, however, and none of the character sheets are of birds, or wizards (or heck, of D&D characters, because he's a D&D character). It doesn't well differentiate between character and background - many of the images will take the background colors from my image and have them in characters or other objects, or it will look at it and see a blue sky in the background with green grass and think it is a match.

These systems aren't smart. They're not even stupid. They are cool, and they're fun to play around with, and they can be useful - if you want to find the source of an indexed image, these systems are pretty good at finding that. But if you want it to actually intelligently identify something, it's obvious that the system is "cheating" rather than actually identifying these images in an intelligent way.

This isn't limited to original characters. I have art of a Blitzle - a pokemon - someone drew for me years ago. It is cute, the person posted this on their page 10 years ago, and Google has long-since indexed it.

But if I feed it into the system, despite the fact that the page it is on will identify it as a Blitzle and says it is a Pokemon, Google's image recognition cannot identify it as a Blitzle. It again goes "Fictional character", and while some of the images it returns are Pokemon, not a single one is a Blitzle or its evolution, suggesting it is just finding that "Pokemon" text on the page and then using that to try and find "similar images".

This is not an obscure original character; Pokemon are hyper popular. But it still can't successfully identify it, and I suspect that the portion of Pokemon images it returns is because the page that it was originally posted on says "Pokemon" on it, as the returned results don't resemble the picture at all.

And there's evidence that this is exactly the sort of shortcut it is taking.

Poking at another image that I found online - that has one kind of pokemon in the image, but the most linked to stuff of it calls it an "Eevee" - gets it to "correctly" identify it as an Eevee, and the "similar images" are all Eevees.

But it's not one, it's actually a Sylveon.

https://twitter.com/OtakuAP/status/1498848805357301766

That's the image. Note the use of "Eevee" to describe it.

https://www.google.com/search?tbs=sbi:AMhZZiuavUSy9MNC7qx8V9B1-oQsHs4tH-cT6xNUfaqleaIUM8zAIIfRZfuEjsOQNw8B8Qkvad5UEgMHawGZkPoM2GWYkV0JsZxwmK0TwS3ZYpVZKVer1as1tTMj5Q8tre7oFMENKX7y12vkVOxHbouxtL-BKNIxouRp-4JGT27nlZSwa4HvAoG8ptvzkW8VlsIToXHjHfXiY6p_1noTCJlvqnXyYd_1AMAIB6kKPvdoEhyto59hU-FUKzbv73khNntm8m26lngcJvZ14ippbrFxRgv9xQdNBkvdxWX-PUEmBH_1czfb5Ehyf5bzdrML9PM9Qtl4zC1tqTAVo5yucTWT2wnOHXp6t8zpQ&hl=en

That's the output page.

It calls it an "Eevee" and the "visually similar images" are all eevees. You have to go down a long way to find a Sylveon... and they are on images that use the word "Eevee".

This is what we're really looking at here, and it is precisely this sort of analysis that people who are trying to sell these systems - or get funding for their systems - don't want, because once you start hitting at them like this, you start realizing what they're doing, how they're really working, and that these algorithms aren't solving the problem, they're taking shortcuts.


A year ago, having a system like DALL-E 2 was science fiction, and we were excited by systems for generating textured noise like app.wombo.art . You can't predict black swans, but there sure seem to be a lot of them recently.


They're scared of giving people free access to the system because the more you poke at it, the more the paint flakes off and you realize it is doing the same sort of "faking it" that every other neural net does.

They don't want people engaging in adversarial attacks because it shows that the system isn't actually working the way they're pretending like it works.


It's clearly going to need a better "world model" for this to work the way people want. And the model is going to need to be (at least) 3D, as suggested by the comments about the shading on Bayes' jaw.

I don't think it's an "AI-complete" problem, but it's more difficult than just merging pics or GIFs.

That said, parts of the problem are, in principle, rather simple to address (iterative selection would help a lot, e.g.). But if the idea is to push the AI rather than to work for better drawings, they may well not choose to head in that direction.


Is DALL-E giving extra weight to its own previous results based on the similar input phrases? How much time elapsed between these queries?

I laughed until I cried. Fighting off an impulse to re-caption some of them.


Oh god please do. I want to laugh some more.


Okay. I am hoping you post yours also.

I didn’t come up with captions for all of them but I numbered them in order of Dall-E image, leaving out other images from the sequence.

Fourth virtue:

#1 Get me an ashtray (is that a cigarette in his right hand?)

#2 A little more to the left

#3 This will be great in the kitchen

(on to the Reverend dressed in black)

#10 Quagmire goes to Oxford

#12 Spider-Man will pay…

Tenth virtue:

#2 Okay, once more from the top

#3 Damn machine mount

#4 If I shake it just right, the lid might fall out

Seventh virtue:

#1 To be or not to be…

#2 Reflex test, just a tap

#3 Giga-whamm

#4 Nah, I’m good, I still have to drive home

#5 Metallica: Millennium

#6 We’re live in five seconds, why are you handing me this?

#9 Mr. Clean Dungeon Spray

#10 NorCal yeaaaah

#11 These damn stockings

#12 Knitting is hard

Third virtue:

#9 Polyphemus rising

#13 It’s stuck…

Eleventh virtue:

#1 I’ve got you now

#2 Homeschooling is boring…

#4 Not a Woodpecker

#5 Summer on Cape Cod

#6 Tentacles emerge at twilight

#8 Wherefore art thou, Romeo?

#9 DoorDash is faster

#11 This damn hookah

These don’t really do the pictures justice, it’s an excuse to look at them again though!


And a final one - took a while - eleventh virtue panel #14 - “Brains with basil… hmm, brains with braised zucchini…what to cook…”


I had to move to a distant room so that my wife could continue watching tv. Even so, I'm sure she was annoyed when I saw that adding more French names produced more angel wings.


There is some indication over on the subreddit that adding noise to the prompt can sometimes avoid bad attractors. For instance, a few typos (or even fully garbled text) can improve the output. It seems important to avoid the large basins near existing low quality holiday photos, people learning to paint, and earnest fan illustrations. Maybe Dall-E associates some kinds of diction with regions messed up by OpenAI's content restrictions, or mild dyslexia with wild creativity. In comparison the early images from Imagen seem crisper, more coherent, but generally displaying a lack of that special quality which Dall-E 2 sometimes displays, which seems close to what we call "talent" in human artists. Thanks for the funny and insightful essay.
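
For anyone who wants to replicate the subreddit trick, here's a minimal sketch that sprinkles character-level noise into a prompt - no API calls, it just prints variants to paste in by hand:

    import random
    import string

    def add_typos(prompt, rate=0.05, seed=None):
        """Randomly swap a few letters, mimicking the 'garbled text' trick."""
        rng = random.Random(seed)
        chars = [
            rng.choice(string.ascii_lowercase)
            if c.isalpha() and rng.random() < rate else c
            for c in prompt
        ]
        return "".join(chars)

    base = "stained glass window of a woman in a library with a raven on her shoulder"
    for seed in range(5):
        print(add_typos(base, seed=seed))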


William of Ockham never himself used the image of a razor; that's a modern metaphor and would be inappropriate for depiction in the stained glass image. And few people would know who Brahe is even with the moose, so leave it out.

author

Stained glass traditionally displays someone with an object that symbolizes who they are, even if it's not historically accurate. For example, here ( https://www.bridgemanimages.com/en/noartistknown/dalat-cathedral-stained-glass-window-saint-cecilia-is-the-patroness-of-musicians-dalat-vietnam/stained-glass/asset/5300711 ) is St. Cecilia playing the violin, which was invented several centuries after her death.


Ada Lovelace should be using a computer then.


Programming in Ada, the computer language.


But the razor is a metaphor for succinctness, whereas the violin is a musical instrument, if an anachronistic one for Cecilia. And what's wrong with a caption? I see them in windows. And many paintings have titles. Does anyone but an art historian distinguish the Evangelists according to their symbols?


Educated Catholics distinguish the evangelists by their symbols. It's a way to keep kids from getting bored at mass by having them explore beauty and symbols and shared consciousness which is another form of truth and "good news".


Traditionally, most people who were looking at stained glass windows or bas reliefs would distinguish the saints entirely by means of the symbols.

https://www.exurbe.com/spot-the-saint-john-the-baptist-and-lorenzo/


More likely because someone else would tell them. :) [I think I'm more of a Catholic nerd than 99% of the people in the pews next to me, and _I_ would have to google anyone but Mark, because of the lions in Venice.] But the point for me is that unless the symbol is well known it does not help, and I don't think a moose helps identify Brahe. Just label him.


I think it is a wonderful form of cultural literacy that allows us to understand Renaissance art the way a contemporary would. As it said in the article Kenny linked to, "If you understand who these figures are and what they mean, a whole world of details, subtleties and comments present in the paintings come to light which are completely obscure if you don’t understand the subject."


"Stained glass traditionally displays someone with an object that symbolizes who they are, even if it's not historically accurate".

Not really. That is one genre of stained glass, a particular style relating to iconography and hagiography. But, for example, the "rose window" kind of thing is equally a stained glass genre, one where beauty and abstract symmetry are the underlying aesthetic guide. And you don't consider the Islamic and Persian traditions of stained glass.

Even a focus on hagiography ignores the fact that stained glass' primary purpose is to control lighting of a space. You might read/reread Gould's (et al) paper on spandrels and evolution. The hagiography is an epiphenomenon.


"Even a focus on hagiography ignores the fact that stained glass' primary purpose is to control lighting of a space."

No, you can do that with clear glass (and the Reformation icon-smashing extended at times to stained glass). Forget your spandrels, when commissioning windows, especially when glass and colours were expensive, the idea was didactic and commemorative. If you just wanted to "control the lighting" then the abstract patterns of rose windows would do as well. Putting imagery into the glass had a purpose of its own.

https://en.wikipedia.org/wiki/Poor_Man%27s_Bible

And Scott, asking for designs for the Seven Virtues of Rationalism, is working within the tradition of depicting the Seven Virtues, Seven Sins, Seven Liberal Arts, etc. Even the Islamic tradition will have calligraphic imagery of verses from the Quran or sacred names in stained glass:

https://www.1001inventions.com/wp-content/uploads/2017/10/islamic-glass23.jpg


I think you made the point for me, if you've read Gould's spandrels of San Marco.

The first purposes of stained glass were control of light and aesthetic effect on a collectively used space.

The hagiographic didacticism and wealth signaling were add-on artifacts.

But the essence of "stained glass" is light and beauty. That is where the "art" lies.


Great set of experiments and writeup.

What it really looks like is that the author is praying at the altar of a very uncaring god or gods, and getting a bunch of vague prophetic crap.


Other DALL-E users use the words “in the style of” as part of the cue instead of just sticking a comma between the content and style parts; does that make a difference?

Previous work in image stylization has used a more explicit separation between content and style, which would help here. I imagine there will be follow-on work with a setup like the following: you plug in your content description which gets churned through the language model to produce “content” latent features, then you provide it with n images that get fed into a network to produce latent “style” features, then it fuses them into the final image. Of course then you potentially would have a more explicit problem with copyright infringement since the source images have no longer been laundered through the training process but maybe that’s fairer to the source artists anyways.


Yeah I came here to say that. Scott keeps dissing DALL-E for becoming confused about whether "stained glass window" is a style or a desired object but the queries never make it clear that it's meant to be a style. All the other prompts I saw were always explicit about that and I'm left uncertain why he never tried clarifying that.

May 31, 2022·edited May 31, 2022

Likewise. While experimenting with prompts, it stuck out that all but the last were "...stained glass window".

"Stained glass window depicting..." might have been tried sooner.

Speaking of style, I imagine that mentioning the virtues to be depicted might also have affected the flavor of the images returned.


> Other DALL-E users use the words “in the style of” as part of the cue

The possibility has been pointed out on the subreddit that 'in the style of' may optimize for imitators of a style over the original (which, if true, may or may not move results in the direction one wants).


Since it seems to get hung up on the stained glass window style, try getting the image you want without the style, and use neural style transfer to convert it to stained glass.
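
For the do-it-yourself route, here's a condensed sketch of classic Gatys-style neural style transfer in PyTorch - VGG-19 feature matching with a Gram-matrix style loss. The layer indices are the usual Gatys choices; the weights and step count are illustrative, not tuned, and style transfer tends to capture texture rather than the lead lines and panel structure of real stained glass:

    import torch
    import torch.nn.functional as F
    from torchvision import models

    def gram(feat):
        # Channel-by-channel feature correlations: texture without layout.
        _, c, h, w = feat.shape
        f = feat.view(c, h * w)
        return f @ f.t() / (c * h * w)

    def style_transfer(content, style, steps=300, style_weight=1e6):
        """content, style: (1, 3, H, W) tensors in [0, 1]."""
        vgg = models.vgg19(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)

        style_layers, content_layer = {0, 5, 10, 19, 28}, 21

        def features(x):
            grams, content_feat = [], None
            for i, layer in enumerate(vgg):
                x = layer(x)
                if i in style_layers:
                    grams.append(gram(x))
                if i == content_layer:
                    content_feat = x
            return grams, content_feat

        target_grams, _ = features(style)
        _, target_content = features(content)

        img = content.clone().requires_grad_(True)
        opt = torch.optim.Adam([img], lr=0.02)
        for _ in range(steps):
            opt.zero_grad()
            grams, content_feat = features(img)
            loss = F.mse_loss(content_feat, target_content) + style_weight * sum(
                F.mse_loss(g, t) for g, t in zip(grams, target_grams))
            loss.backward()
            opt.step()
            with torch.no_grad():
                img.clamp_(0, 1)
        return img.detach()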

author

Are there any good neural style transfer engines available for public use?


Style transfer went mainstream years ago, but I don’t know of one off hand. “Prisma” caught on in my group of friends but it’s only free to try.

author

I tried it and its stained glass filter seems terrible - it looks like a Photoshop filter that isn't doing any AI at all. Am I missing something?


I just tried a couple free ones and it didn’t even resemble stained glass. More like a photoshop filter gone wrong. Deepdreamgenerator.com (limited free options) seems to work better and has options for tuning. Here’s what I generated from a painting of Tycho and some colored triangles as the stained glass style. (Default settings except “preserve colors”) https://deepdreamgenerator.com/ddream/lsnijmgc2zi

Here’s one using a real stained glass scene, like I think you’re trying to generate, as the style. (Default settings except “preserve colors”)

https://deepdreamgenerator.com/ddream/sxra7thsoah

Maybe a more tessellated real image would hit the sweet spot.


Prisma is very, very old - 2016, I think.


There's a reasonably good one on nightcafe, not sure which backend it uses


Ostagram generally does a good job at style transfer, but I've never seen style transfer of stained glass do anywhere near as good a job as these. I've managed some Tiffany style sunset landscapes, but nothing beyond that, and I've really tried.


It appears the goals of 'AI images', 'natural text' to search/generate using single strings of text, and 'actually useful' user interfaces are in conflict. The idea that you'd have an art generator where the art style, subject, colours, etc. are not discrete fields, and that ignores all the existing rules and databases used to search for art, is a bad approach to making a useful AI art generator.

I'd be more interested if they ignored the middle part of 'single string of text' and focused more on the image AI. They are perhaps trying to solve too many problems at once, with AI text being a very difficult problem on its own - that said, it pulled random images which are probably not well categorised as a datasource, so I'm sure they hit various limitations as well.

I would think using an image-focused AI to generate categories might be an interesting approach, drawing directly from the images rather than whatever text is used to describe them on the internet. Existing art databases could be used to train the AI in art styles.

It would even be interesting to see what sorts of categories the AI comes up with on its own. While we think of things like Art Nouveau, the AI is clearly thinking 'branded shaving commercials' or 'HP fan art' are valid art categories. I don't think the shaving ads will show up in Sotheby's auction catalogue as a category anytime soon, though.

Perhaps we can see 'Mona Lisa, in the art style of a shaving advertisement' or 'Alexander the Great conquest battle as HP fan art'? 'Napoleon Bonaparte in the style of steampunk'?
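
One cheap version of the "discrete fields" idea that needs no change to the model at all: compose the prompt from fixed slots so the style can't leak into the subject. A sketch (the field set is my own invention):

    from dataclasses import dataclass

    @dataclass
    class ArtPrompt:
        subject: str          # who or what is depicted
        action: str = ""      # what they are doing
        style: str = ""       # medium / art movement
        palette: str = ""     # dominant colours

        def render(self) -> str:
            # Fixed slots keep "stained glass" reading as a medium,
            # never as an object standing in the scene.
            parts = [f"{self.style} depicting" if self.style else "",
                     self.subject, self.action]
            prompt = " ".join(p for p in parts if p)
            return f"{prompt}, {self.palette} palette" if self.palette else prompt

    print(ArtPrompt(subject="Napoleon Bonaparte",
                    action="crossing the Alps",
                    style="steampunk illustration",
                    palette="brass and teal").render())
    # steampunk illustration depicting Napoleon Bonaparte crossing the Alps, brass and teal palette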

May 30, 2022·edited May 30, 2022

My best guess about William’s red beard and hair: DALL-E may sort of know that “William (of) Ockham” is Medieval, but apparently no more than that since he’s not given a habit or tonsure (he’s merely bald, sometimes). But he has to be given *some* color of hair, so what to choose??

Well, we know that close to Medieval in concept space is Europe. And what else do we know? We have a name like William, which in the vaguely European region of concept space is close to the Dutch/Germanic names Willem and Wilhelm. And what do we know of the Dutch and Germanic peoples? In the North / West of Europe is the highest concentration of strawberry-blonde hair!

If that’s too much of a stretch, then maybe DALL-E knows some depictions of “William of Orange” and transposed the “Orange” part to “William (of) Ockham’s” head?


When I do a google image search for "william beard", I find that 90% of the results are of Prince William (the present one, not the one of Orange) with his reddish beard.


Interestingly, when I do a Google Image search for William of Ockham, the first image that comes up, is the one from his Wikipedia page, which is a stained glass window! (But without a razor.)

https://en.wikipedia.org/wiki/William_of_Ockham


I am personally addicted to generating "uncanny creep", "eldritch horror", and similar prompts using DALL-E mini.

Literally addicted, it's become an obsession.

https://huggingface.co/spaces/dalle-mini/dalle-mini


Thank you!!!!!

I entered "darwin in the style of stained glass" and was very impressed by what I saw!

Addicted at my first shot.

Thanks again!


I typed in "Taylor Swift" and every picture was an eldritch horror, but also obviously her.

I typed in "me" and every picture was a white guy, most of them with glasses. Far more accurate than I had a right to expect.

Expand full comment

I wonder if you could get a key in the raven's beak if you called it a beak.

Expand full comment

I think "raven biting key" might do it.

Raven is also a general term for color, i.e. "raven-haired," which may have influenced it.

"Stained glass" might yield very similar results stylistically to "stained glass window" except without the influence of input noun window.

Bookstore might work out better than library, also. Alexandra Elbakyan holds crow biting key, within bookstore: stained glass, might do it. "Mosaic" also might work. Or try "painted stained glass" or "faceted stained glass" (https://www.kingrichards.com/news/Church-Stained-Glass/81/Types-Styles-of-Stained-Glass/)

(I edited this to put in "crow" for "bird" and add stained glass style info. I think you specifically don't want the Tiffany-style ones.)

Expand full comment

I wonder if he could get his image by starting with "beak holding key" and sequentially uncropping the image while adding elements in each iteration.

Expand full comment

I do wonder how a human artist who got a similar query from an anonymous source would respond (assuming that the artist was willing to go to the trouble, etc.)

Expand full comment

An excellent question!

Expand full comment

Presuming that the opening here was just a joke to intro playing with DALL-E? I bet artists would happily draw these things in the style of stained glass. Like, if you actually wanted to do this, instead of fiddling with DALL-E you could just go onto Shutterstock, find some artist or agency in eastern Europe that does art in that style and then pay them to do it. Might be cheaper than paying for DALL-E if/when it ever becomes an actual product.

Expand full comment

The OP has a number of amusing missteps on DALL-E's part due to it (apparently) not understanding some of the base assumptions behind the queries.

However, I was wondering if a human with a similar knowledge base would do much better, especially one not steeped in a similar culture. What part of these missteps is lack of background knowledge versus being an AI (if that means anything)?

Expand full comment

This is actually a great example of the challenges with fairness and bias issues in AI/ML. Systems that screen resumes, grant credit (e.g. the Apple card), or even just do marketing have real problems with their training corpus. Even if standards for past hiring are completely fair, if the system is calibrated on data where kindergarten teachers are 45-year-old women and scientists are 35-year-old men due to environmental factors, it is incredibly difficult to get the system to see the unbiased standards that are desired. This is a great layman's exploration of why that is.

Expand full comment

Wasn’t "Ada Lovelace" not really her name?

Expand full comment

(Though probably it’s the most common name in captions of pictures of her.)

Expand full comment

This article about Tycho Brahe (borrowed from another comment, https://www.pas.rochester.edu/~blackman/ast104/brahe10.html) says that from his measurements Brahe concluded that either the Earth is the center of the universe, or the stars are too far away for any parallax to be measured accurately. Then it adds:

"Not for the only time in human thought, a great thinker formulated a pivotal question correctly, but then made the wrong choice of possible answers: Brahe did not believe that the stars could possibly be so far away and so concluded that the Earth was the center of the Universe and that Copernicus was wrong."

What are other times that "a great thinker formulated a pivotal question correctly, but then made the wrong choice"?

Expand full comment

Not quite as blatant, but: William Thomson, Lord Kelvin, analyzed possible methods for the sun to generate its energy, calculated that none of the methods he considered would last long enough for Darwinian evolution to occur, and concluded that the sun and Earth must therefore be young.

A little while later, radioactivity was discovered.

Expand full comment
May 31, 2022·edited May 31, 2022

Rutherford on his lecture in 1904, with Kelvin in the audience: "To my relief, Kelvin fell fast asleep, but as I came to the important point, I saw the old bird sit up, open an eye and cock a baleful glance at me! Then a sudden inspiration came, and I said Lord Kelvin had limited the age of the earth, *provided no new source of heat was discovered*. That prophetic utterance refers to what we are now considering tonight, radium! Behold! the old boy beamed upon me." First source I could find here: https://link.springer.com/chapter/10.1007/978-1-349-02565-7_6

Expand full comment

Brahe was an excellent observer. He used a gigantic quadrant to measure the positions of the stars, so his measurements were the most accurate available at the time. King James VI of Scotland, later King James I of England, visited his state-of-the-art scientific facility. Still, Brahe's measurements were not good enough to measure parallax, even of the closest star. A parsec (the distance at which the annual parallax is one second of arc, i.e. 1/3600 of a degree) is about 3.26 light years, so to measure the distance to Proxima Centauri, 4.2 light years away, one would need an angular accuracy of better than about one arcsecond. Brahe did what he could with what he had. His measurements were just not good enough.
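To make the arithmetic explicit, a back-of-the-envelope check using the standard small-angle relation (Brahe's best naked-eye accuracy is usually quoted at roughly an arcminute, well short of what is needed):

```latex
p\,[\mathrm{arcsec}] = \frac{1}{d\,[\mathrm{pc}]}, \qquad
d = 4.2\ \mathrm{ly} \approx 1.29\ \mathrm{pc}
\;\Rightarrow\; p \approx 0.77'' \approx 2.1\times10^{-4}\ \mathrm{degrees}
```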

This happens a lot when instruments improve. The first exoplanet was discovered in 1995 by measuring stellar light curves using an Earth-based telescope. There was a lot of skepticism, since the measurement was near the sensitivity limit, less was known about stellar behavior, and it was an extraordinary claim. The paper was retracted in 2003. Since then, we've discovered thousands of exoplanets using space-based observatories optimized for planetary discovery. We also know a lot more about stellar behavior. The original discovery was reconfirmed, but the Nobel Prize for discovering exoplanets went elsewhere.

Expand full comment
May 31, 2022·edited May 31, 2022

Yes, people are provincial. We think "of course" now, but it was not obvious at the time.

Whether the Earth moved around the sun or vice-versa could not be determined with reference to objects within the solar system. It's true that the Ptolemaic model was more complicated, but that was really the only downside. The moons of Jupiter did put a hole in the model, since there were now heavenly bodies that did not rotate around the Earth, but you could still patch it.

The big problem for the Copernican model was parallax. If the Earth moved around the Sun, then the distant stars would wobble back and forth, the size of the wobble depending on how far away they were. Since with the best measurements of the time, they could not measure any wobble, this was a real hole in that theory.

Wikipedia dates the first successful measurement of stellar parallax to 1838. So until then, there was no proof of (and some evidence against) the Copernican model.

Expand full comment

People were convinced of the Copernican model long before 1838.

As you mention, Galileo discovered the moons of Jupiter, which knocked out a major argument against heliocentrism.

He also observed the phases of Venus. Both models predicted that Venus should have phases, but they predicted *different* phases. Seeing that the reality matched the new prediction was a huge win.

Finally, he discovered mountains on the moon (implying that the moon is earth-like), and sunspots (showing that the sun is imperfect). This contradicted a major claim of the old theory, that the heavens were made of a different substance than earth and were subject to different physical laws.

Expand full comment

I'm not sure, but I think there might have been an attempt to modify the geocentric theory to allow the other planets to orbit the sun while the sun orbited the Earth. But it's been a long time since I looked into it so I may be wrong about that.

If this attempt existed, it would still have been another nail in the coffin, since the theory had already been made more complex several times through adding epicycles to better match the planets' motions.

On the other hand, we've done a lot of similar things to try to explain the galaxies' motions (dark matter, dark energy, etc.) I'm not sure when we're going to give up on those - presumably when a simpler theory is proposed. So far, nothing fits.

Expand full comment
founding

Yes, that was the Tychonic system. It is, in modern terms, equivalent to the Copernican system but transformed to a non-inertial coordinate system where Earth is defined as always the origin. The main advantage seems to have been that astronomers could study the way the solar system actually worked without too closely associating themselves with the guy who went out of his way to insult the Pope.

Expand full comment

Well, in all fairness, the Sun does orbit the Earth at least as much as the Earth orbits the Sun. Granted, the barycenter is, I think, maybe 1,000-2,000 km from the Sun's center...

Expand full comment

Wouldn't Newtonian mechanics and theory of gravity be rather strong arguments that the Earth-orbits-the-Sun model is closer to the truth?

Expand full comment

Yes, it's another nail - but Newton was born almost half a century after Brahe died.

Expand full comment

Not unless you know how big the Sun is. However, that can be estimated, and in fact circa 300 BC Aristarchus of Samos used observations of the Moon to estimate that the Sun was ~7 times bigger than the Earth (the actual ratio being >100) and proposed that for that reason the Earth probably went around the Sun. Wikipedia summarizes the methods:

https://en.wikipedia.org/wiki/On_the_Sizes_and_Distances_(Aristarchus)

Expand full comment

I think epicycles would not be consistent with using the theory of gravity - it would be one or the other.

Expand full comment

You mean epicycles are not consistent with inertial mechanics under a central force? That is definitely the case. I thought you were just addressing heliocentric versus geocentric, and I'm just pointing out unless you do some hard thinking and close observation, your naive thought will be (looking at the sky) that the Sun is about the same size as the Moon, probably about as far away, and so there's nothing there that prevents geocentrism.

You have to get into slightly more sophisticated observations, e.g. the fact that the phases of the Moon suggest it's illuminated by the Sun, and if so, the fact that it's *still* illuminated when the Sun goes down means the Sun is bigger than the Earth, and so forth. This isn't super hard (which is why it was already relatively current among sophisticated Greeks circa 300 BC), but it *does* require a focus on empirical observation rather than elegant theoretical argument, which maybe accounts in part for the unreasonably long time it took for Western natural philosophers to suss it out.

Expand full comment

Famously, Einstein, with both the expanding universe and the cosmological constant.

Expand full comment

Giovanni Saccheri published a book in 1733 entitled /Euclid Freed of Every Flaw/ in which he attempted to prove the parallel postulate from the rest of Euclid's axioms. He worked by contradiction and ended up proving several theorems of hyperbolic geometry. However, at some point, he decided things had gotten too weird and declared that the hypothesis must be false because the results were "repugnant to the nature of straight lines".

Expand full comment
May 31, 2022·edited May 31, 2022

Two come to mind:

1. In the 1870s, Gibbs, while formulating classical statistical mechanics, was obliged to use a quantum of phase space to make everything work out; it had the dimensions of action, i.e. it was dimensionally identical to (and indeed had the same meaning as) Planck's constant. Arguably, had he been less of a recluse, known of Fraunhofer's discovery of spectral lines, and maybe talked it over with Maxwell, he might have come up with quantum mechanics 40-50 years before it was invented.

2. Both Enrico Fermi and Irène Curie observed fission of U-235 during their neutron bombardment experiments (Fermi in 1934), but both failed to interpret it as such, and it was five more years before Frisch and Meitner figured it out. Ida Noddack actually wrote a paper suggesting this possibility, which was read by both Fermi and Curie but dismissed, perhaps because Noddack was a chemist and had no suggestion for a physical mechanism of fission. Imagine a world in which, say, Albert Speer knows a nuclear chain reaction is possible in 1934, five years before the war even starts.

Expand full comment

Off-topic to the AI generation, but "What I’d really like is a giant twelve-part panel depicting the Virtues Of Rationality." - I feel that you're not alone in this.

Expand full comment

It's so Reinhold Niebuhr.

Expand full comment

From a linguistics-oriented satirical hard-boiled detective novella: "I followed him to a dining hall stretching before and above me like a small cathedral. A stained glass window in art deco style opposite the entrance portrayed the seven liberal arts staring spellbound at the unbound Prometheus lighting the world; this was flanked by a rendering of the nine Muses holding court in Arcadia on the left and of bright-eyed Athena addressing Odysseus on the right. I was led to a sheltered side alcove where Ventadorn was waiting. I stood for another minute looking at the windows before I went in. She said, 'Pretty antiquated now. Most people never notice them any more. Most of the time they don’t bother to illuminate them.'" https://specgram.com/CLXVII.m/

Expand full comment

I can swear I've read that one or something an awful lot like it.

Expand full comment

There are traditional representations of the Seven Liberal Arts, and this huge fresco titled 'The Triumph of St Thomas Aquinas' has a selection of the standard iconography:

https://www.amblesideonline.org/art-spanish-chapel

"Quadrivium:

The allegorical figure of Arithmetic holds a tablet. At her feet sits Pythagoras.

The allegorical figure of Geometry with a T-square under her arm. At her feet sits Euclid.

The allegorical figure of Astronomy. At her feet sits Ptolemy, looking up to the heavens.

The allegorical figure of Music, holding a portative organ. At her feet sits Tubal Cain, with his head tilted as he listens to the pitch of his hammer hitting the anvil.

Trivium:

The allegorical figure of Dialectics in a white robe. At her feet sits Pietro Ispana (the identity of this figure is uncertain.)

The allegorical figure of Rhetoric with a paper list. At her feet sits Cicero.

The allegorical figure of Grammar, teaching a young boy. At her feet sits Priscian."

Expand full comment

Why Tubal-Cain? As far as I can tell his only connections to music are that his profession makes noise, and that his half-brother Jubal was a musician. Why not use Jubal?

Expand full comment

First I thought that it was because Tubal-Cain worked brass and iron, and that this referred to instruments like trumpets. But it seems to be a mediaeval transcription error, where Jubal is referred to as Tubal, and their stories get muddled together with that of Pythagoras:

https://www.jasoncolavito.com/blog/tubal-cain-and-the-musical-pillars-of-wisdom

Pythagoras is supposed (in one version) to have discovered musical tones by the sounds of blacksmiths hitting anvils with hammers of different weights. Since Tubal was a blacksmith, this gets attributed to him.

"The author of the Cooke manuscript has taken the part of Petrus’ text where he essentially accuses the Greeks of lying about Pythagoras inventing music and instead harmonizes the two accounts by making Pythagoras discover Tubal Cain’s lost writings. So much for Pythagoras’ entry into the story, which comes almost certainly from Petrus using a line in Isidore of Seville’s De Musica 16, where that author writes that “Moses says that Tubal, who came from the lineage of Cain before the Flood, was the inventor of the art of music. The Greeks, however, say that Pythagoras discovered the origins of this art, from the sound of hammers and the tones made from hitting them” (my trans.).

However Petrus Comestor and Isidore have made an error derived from faulty Latin. The Old Latin translation of Flavius Josephus from around 350-400 CE, the source that stands behind Petrus and probably Isidore, mistakenly gives “Tubal” instead of “Jubal” from Genesis 4:21 as the inventor of music, and later authors repeat the error. This error probably comes from misreading the Septuagint, where Tubal Cain is shortened to just Tubal, making confusion easier."

Expand full comment

People forget how mystical ironwork originally was. Wasn't the old prayer, save us from Jews, blacksmiths and women? Now all we have left of Wayland the Smith is Waylon Smithers on The Simpsons, and almost no one gets the reference.

Expand full comment

Translation of sixth section of the Breastplate of St. Patrick:

https://en.wikipedia.org/wiki/Saint_Patrick%27s_Breastplate

6. I have set around me all these powers,

Against every hostile savage power

Directed against my body and my soul,

Against the incantations of false prophets,

Against the black laws of heathenism,

Against the false laws of heresy,

Against the deceits of idolatry,

Against the spells of women and smiths and druids,

Against all knowledge that binds the soul of man.

Expand full comment

So, DALL-E can't understand style as opposed to content. This is like very young children who can recognize a red car or red hat, but haven't generalized the idea of red as an abstraction, a descriptor that can be applied to a broad range of objects. I forget the age, maybe around three or four, at which children start to realize that there are nouns AND adjectives, so DALL-E is functioning like a two or three year old. I wonder how well it does with object-permanence games like peek-a-boo.

P.S. Maybe instead of a Turing test, we need a Piaget test for artificial intelligence.

Expand full comment
founding
May 31, 2022·edited May 31, 2022

DALL-E 2 can't keep separate which attributes apply to which objects in the scene because of architectural trade-offs OpenAI took. Gwern's comment on this LessWrong thread speculates about some of the issues in a way I found interesting (I don't pretend to follow the details alas) "CLIP gave GLIDE the wrong blueprint and that is irreversible": https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-e-2-can-and-cannot-do

Other models already exist that do not have the same problems (though they surely have other problems, these still being early days): https://imagen.research.google/

I mention this because in this thread I have seen people extrapolating from random DALL-E 2 quirks to positing some fundamental limitations of AI generally (someone below said they thought AI performance had already hit a ceiling, which, I don't even know where to begin with that claim) when at least some of them actually appear to be fairly well-understood architectural limitations that we already know how to solve.

Expand full comment

It can understand style vs. content if you are explicit about which is which (https://arxiv.org/abs/2205.11916). I've also seen many prompts that include 'in the style of'.

Expand full comment

My guess is that you are expecting to produce great art right off the bat with a new tool and only a few hours practice with it. Obviously there is a learning curve, as your post demonstrates. Spend a few days with it, and I would assume your results will be spectacularly better.

From what limited exposure to DALL-E 2 I have had, your assumption seems right: the query “a picture of X in the style of Y” would work to remove the stained glass from the background of the subjects and make the art itself stained glass -- "a picture of Darwin in the style of stained glass."

Perhaps someone will make a new DALL-E interface that includes various sliders that work in real time, like the sliders on my phone's portrait mode, allowing me to bump the cartoon effects and filters up and down. So you could make your output more or less "art nouveau" or "stained glass" or whatever parameters you entered in your query.

Someone wanted to make a music video with DALL-E 2 yesterday, but couldn't quite do it. He still got some pretty results, however.

https://youtu.be/0fDJXmqdN-A

Expand full comment
author

"A picture of Alexandra Elbakyan in a library with a raven, in the style of stained glass" gives pictures that look a lot like the first ones in that section - painting-y with a window in the background.

Expand full comment

Okay, thank you. Sorry, my experience with DALL-E is all vicarious. I tried "Alexandra Elbakyan in a library with a raven, in stained glass" in that mini DALL-E, and the results were not high quality, but better than I expected. Oddly, I get different results with that program each time I run the same query.

Your readers might enjoy the DALL-E mini, recommended by one of your readers.

https://huggingface.co/spaces/dalle-mini/dalle-mini

I also got fair results with "scissors statement stained glass"

Expand full comment

Low hopes for this, but worth a shot: "A stained glass window depicting..."

Expand full comment

That was tried in the article. It's what produced the monstrous woman-raven hybrids.

Expand full comment

FWIW I tried "stained glass by Francesc Labarta of william occam holding a razor" on dalle-mini; while the quality is subpar it was stylistically closer to 19th century than the attempts you shared.

https://imgur.com/5Pt2Dd8

Expand full comment

For Brahe, have you considered his metal nose as a signifier rather than the moose?

Expand full comment

I think that modern AI has reached a local maximum. Machine learning algorithms, as currently being developed, are not going to learn abstractions like adjectives and prepositions by massaging datasets. They're basically very advanced clustering algorithms that develop Bayesian priors based on analyzing large numbers of carefully described images. A lot of the discussion here recognizes this. Some limits, like understanding the style, as opposed to the content, of an image could be improved with improved labeling, but a lot of things will take more.

Before AI turned to what they called case-based reasoning, which trained systems using large datasets and statistical correlation, it took what seemed to be a more rational approach to understanding the real world. One of the big ideas involved "frames", that is, stylized real-world descriptions, ontologies; the idea was that machines would learn to fill in the frames and then reason about them. Each object in a scene, for example, would have a color, a quantity, a geometry, a size, a judgement, an age, a function, component objects and so on, so the descriptors of an object would have specific slots to be filled. A lot of this was inspired by the 19th century formalization of record keeping, and a lot of it came from linguistics, which recognized that words had roles and weightings. There's a reason "seven green dragons" is quite different from "green seven dragons", even though both consist of the same two modifiers followed by the same noun.
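To make that concrete, here is a minimal sketch of what one of those frames might look like as a data structure; the slot names are illustrative, not taken from any historical frame system:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObjectFrame:
    """A 'frame' for one object in a scene: fixed slots a reasoning
    system can fill in, query, and sanity-check."""
    noun: str                        # what the object is, e.g. "dragon"
    quantity: int = 1                # "seven green dragons" -> 7
    color: Optional[str] = None      # "green"
    size: Optional[str] = None
    age: Optional[str] = None
    function: Optional[str] = None
    parts: list["ObjectFrame"] = field(default_factory=list)

# The slot structure makes word roles explicit: "seven" can only fill
# the quantity slot and "green" the color slot, so "green seven dragons"
# is rejected as a parse rather than silently misinterpreted.
dragons = ObjectFrame(noun="dragon", quantity=7, color="green")
```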

I suspect that we'll be hearing about frames, under a different name, in the next ten years or so, as AI researchers try to get past the current impasse. Frames may be arbitrary, but they would be something chosen by the system designer to solve a problem, whether it is getting customer contact information using voice recognition, commissioning an illustration or recognizing patterns in medical records.

P.S. As for a lot of the predictions for systems like DALL-E, I'm with Rodney Brooks: NIML (not in my lifetime).

Expand full comment

I agree that these systems are severely limited, and that the frame problem remains. However, even limited systems seem able to replace a large number of functions performed by white collar workers, because the stuff middle class people do is largely mundane and routine. Moreover, it's hard to quantify and notice all the actually intelligent things that people do in their jobs, as these things are seldom in their job descriptions. Bottom-line driven employers are then likely to switch since even limited automation does a good-enough job for the routine stuff, and then fail to notice that their organizations become less effective over time. I think the danger is the ensuing social upheaval. I think this process has already been happening for decades even without "AI" so it doesn't seem unlikely to me that it will continue.

In the longer term AI wranglers are likely to be in high demand and we'll expect more from all humans in the loop. This assumes we can muddle through to that point and not stumble our way into a horrible dystopia. Or, I suppose, we could all make sure we possess some hard-to-automate skills like plumbing, cleaning up after chaotic events, foraging, growing food, or caring for bed-ridden people.

Expand full comment

You are right. AI is definitely good enough to have a big impact on the economy, particularly eliminating a lot of mid-level jobs. We've been seeing that happening. (I'll cite Autor and others on this.)

You are also right that we are just starting to get the payoffs. AI is like the steam engine, small electric motors and microprocessors. They take a while to seep into the economy, but they make a huge difference.

You are also right that when AI does a bad job, we could stumble into a dystopia.

You are also right that there are some jobs left for humans. My lawyer has never needed a paralegal, but he does have a receptionist who acts as a witness to legal documents. The receptionist job may vanish, but barring dystopia we'll be requiring human witnesses for a while longer.

Expand full comment

Lift your razor high, Occam

Hold it to the sky

Entities without true needs

Shan't multiply

Expand full comment

It might have placed the key better if you put it in the raven's beak instead of mouth

Expand full comment

If you want matching styles, maybe use Deep Art to adjust some as a second phase?

Expand full comment

> The most interesting thing I learned from this experience is that DALL-E can’t separate styles from subject matters (or birds from humans).

Looks like entanglement is an issue. DALL-E cannot seem to find the right basis, where the basis vectors are styles, subjects, objects, etc. and instead uses artistic license to the max.
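For what it's worth, one hedged illustration of what "finding the right basis" could mean in practice: in embedding spaces you can sometimes approximate a style direction by averaging differences between paired embeddings, in the spirit of word-vector arithmetic. A toy sketch with random stand-in vectors (a real system would use embeddings from an encoder such as CLIP):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: 8 subjects, each rendered plainly and as
# stained glass (in reality these would come from an image encoder).
plain = rng.normal(size=(8, 512))
stained = plain + 1.0 + rng.normal(scale=0.1, size=(8, 512))

# Estimate a "stained glass" direction as the mean embedding difference.
style_dir = (stained - plain).mean(axis=0)
style_dir /= np.linalg.norm(style_dir)

# Nudging a new subject along that direction should add the style
# without changing the subject, but only if style and subject are
# disentangled, i.e. only if such a basis actually exists in the model.
new_subject = rng.normal(size=512)
stylized = new_subject + 2.0 * style_dir
```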

Expand full comment

Who tells it what the correct basis should be? There's a correlation in the corpus between reindeer and men in red robes and art styles that work well on a postcard, just like there's a correlation between reindeer legs, reindeer torsos and reindeer antlers.

Expand full comment

It could at least try to identify and rotate the bases, assuming it even has anything like the basis abstraction in its code.

Expand full comment

I am 95% sure there is actually a moose in one of the stained glass windows at my church. It’s in a more recent window depicting the Creation.

Expand full comment

"DALL-E has seen one picture of Thomas Bayes, and many pictures of reverends in stained glass windows, and it has a Platonic ideal of what a reverend in a stained glass window looks like. Sometimes the stained glass reverend looks different from Bayes, and this is able to overpower its un-confident belief in what Bayes looks like."

So the Bayesian update wasn't strong enough to overcome its reverend prior?

Expand full comment

Scott,

Are you familiar with the work of the Emil Frei Stained Glass Company, based in St. Louis? If not, here is their official site:

https://www.emilfrei.com/

And this is a layman's tour of their work in St. Louis, a great resource to view the breadth of their work (spanning more than 100 years):

https://www.builtstlouis.net/mod/emil-frei-stained-glass.html

I grew up in St. Louis, but only learned about their work much later. And yet, when I saw it, it seemed hauntingly familiar. Their style is very distinctive, very quintessentially Modern, moving into Mid-Century Modern. But their stylized figures of people and animals also feel very ... Eastern, Early Christian ... Macedonian, actually.

Besides advocating for these great artisans from my hometown, I want to mention a second point about stained glass -- it is Architectural. Real stained glass windows always exist in a building, with its interior/exterior spaces, its particular site and the sunlight, and most importantly, the people who will gather there to worship.

Digital images of stained glass patterns can be spectacular. I am sure that if DALL-E had images of the Emil Frei style in its corpus, it could generate new "works" that would be uncannily like real, original artworks from that studio.

From there, rendering the actual light coming through windows in a sanctuary is really a simple extension of existing CAD capabilities.

But, much like AI chess programs working under the direction of the Kasparovs and Carlsens, designing the placement and subject matter and general massing and flow of these dramatic featured elements is still the domain of humans.

Oh, btw, if you really wanted to design an epic 12-panel journey depicting the Tenets of Rationality, I think the Emil Frei style would serve you well. (Also, 12-step-or-station journeys are a time-honored way to educate people through stories/allegory/pictures.)

BRetty

Expand full comment

Here's a painting of the amazing Tiffany stained glass screen that Chester Arthur had installed in the White House:

https://www.whitehousehistory.org/photos/the-grand-illumination-by-peter-waddell

Unfortunately, Teddy Roosevelt had it junked, so it's no longer extant. TR didn't approve of Tiffany's private life, and the Tiffany look had gone out of fashion in the 1900s.

A vast amount of superb Tiffany glassware was lost about a century ago due to it going (temporarily) out of fashion and being thrown out.

Expand full comment

Oh gosh, so much like that happened! Old style furniture being dumped because "ah, that old stuff, clear it out and get something modern in".

My parents' generation dumped good solid wood furniture to replace it with formica. Then years later they realised and regretted, but too late then. Fashion goes in cycles: parents buy new up-to-date style, kids throw it out because they want to replace it with their own new up-to-date style, grandkids regret the loss of antiques.

Expand full comment

My family has actually gotten relatively good at avoiding this cycle by gifting furniture pieces back and forth when someone gets tired of them. My mom has fallen out of love with real wood furniture; but thankfully is just sending the pieces to my aunt and I, as we're both still very fond of solid wood.

Expand full comment
author

Thanks. It's not really my style - too modern - but it's definitely impressive.

Expand full comment

Nice post. Especially the point about stained glass as an architectural feature to control light of a space for communal activity.

Btw, the mosaic work in Cathedral Basilica of Saint Louis (the new one not the older) is also fantastic.

Expand full comment

Yes, stained glass by Robert Frei, mosaics by several, including the studio of Emil Frei, Sr.

Expand full comment

Something I've been thinking about in the context of machine translation, but it might apply to stuff like this as well.

All these neural net-y systems use an interaction pattern where you give them a single prompt, they do a bunch of internal churning, and then spit out a single response. A lot of the time, the response is nonsense, but you can sometimes trace the nonsense a bit to see how the internal churning misinterpreted the original prompt, and then continued to compound mistakes in interpretation on top of that.

The problems these systems are solving are usually multi-step problems, so there's some meaningful sense in which they should be able to "show their work" as they solve them. Like, in machine translation, the system should be able to show you how it broke the prompt into words, how it interpreted the grammatical relationships between words, how it mapped the words from the input language to the output language, and what common idioms it recognized and reworded.

So it seems like you should be able to get generally better results with machine translation if the user were able to give feedback on the accuracy of each of those steps. That way, if the initial parse into words is completely wrong, you could recognize that issue and correct it at that point, rather than letting that initial mistake get compounded into incomprehensibility. Basically, designing the interaction with a neural algorithm as a collaborative feedback loop rather than a black-box oracle.

Applied to this case, the system ought to be able to ask you questions during the process of image synthesis, like "It looks like the subject of your image is a woman named 'Alexandra Elbakyan,' who I think looks like this. Is this correct?" "It looks like you want there to be a 'raven,' which is a type of bird which looks like this, in proximity to the main subject." "It looks like you want there to be a 'key,' which is a tool that looks like this, in proximity to the 'raven.'" "Here is a composition containing these three elements. Is this acceptable?" "It looks like you want this to be drawn in the style of a medieval stained-glass window."

It doesn't seem like there should be any technical reason why these systems couldn't be designed to work more like this, so I don't know why they're all so fixated on the black-box design pattern.

Expand full comment

That's not how modern ML systems work. There is a lot of work on getting "interpretable" AI systems, in which it is possible to understand what the internal states do. This work has failed completely. If anything, the systems move ever further away from having interpretable hidden states.

Arguably the most successful multi-step system at the moment is the Socratic Model from Google, https://socraticmodels.github.io/

This model is what you get if you let an image generator like DALL-E/CLIP interact with a language model like GPT-3. The important thing is that the developers completely resisted the temptation to let the systems look into each other's internal states. They are ONLY allowed to communicate via output (text and images). I think this reduction was the key to making it work.

E.g., you start with an image A, you generate a few captions C0, ..., C9, then you generate some images for each of C0, ..., C9 and ask an image-comparing AI which of these images is the most similar to A, and keep only that caption. And so on.
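If I've understood the loop correctly, it amounts to something like the sketch below; caption_model, image_model, and image_similarity are placeholders for the three component models, not the actual Socratic Models code:

```python
def refine_caption(image_a, caption_model, image_model, image_similarity,
                   n_candidates=10, n_rounds=3):
    """Keep the caption whose re-generated image best matches the
    original; the component models communicate only via text and images."""
    caption = None
    for _ in range(n_rounds):
        # 1. Propose candidate captions for the original image
        #    (optionally conditioned on the current best caption).
        candidates = caption_model(image_a, n_candidates, caption)
        # 2. Render each candidate caption back into an image.
        renders = [image_model(c) for c in candidates]
        # 3. Score each render against A and keep the winning caption.
        scores = [image_similarity(image_a, r) for r in renders]
        caption = candidates[scores.index(max(scores))]
    return caption
```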

It might not be so surprising in hindsight that language is such a good interface. Language has been optimized by millennia of evolution to be good for communication between two entities which cannot observe each other's internal states. We should expect this to work really well, and it does.

Expand full comment

As far as I understand it, you can usually get them to generate a *description* of their internal state as part of the output, as long as you tell them in the prompt that you want them to do so. (Which isn't necessarily the same thing as an actual snapshot of the real arrangement of information in the program's memory, which is what remains generally inscrutable.) It looks like the Socratic Models and the "step by step" paper linked below are doing basically that. I'm still not sure why they use those methods only to have different programs talk each other through the process, instead of having one program talk to a human.

Whether natural language is actually a *good* interface is another question I had after this article. It seems like many of the problems the program had are down to bad parses of the natural language prompts (confusion of "Darwin next to a finch" with "Darwin, as a finch" and so on). It sure seems like you should be able to get better results if you could get the program to understand some more explicit intermediate query, like { 'subject': "Charles Darwin", 'decorations': "finch (bird)", 'style': "stained glass window" } or whatever. But I think natural language interfaces are probably *inevitable* with this technique, at least, even if they're not *good*, just because they're brute-force training the programs with huge mounds of human-generated data, and human-generated data (especially data regarding art) is usually formatted using natural language instead of explicit query languages. So you probably couldn't teach them to use a query language like that in an automated way.

Expand full comment

Recent paper: https://arxiv.org/abs/2205.11916v1 wherein just adding "Let's think step by step" to the prompt significantly improved output.

Expand full comment

weird that you didn't try *less specific queries*. instead of "Alexandra Elbakyan in library with a raven with a key in its mouth, stained glass", why not just "raven with a key in its mouth, stained glass" or "raven in a library, stained glass"?

Expand full comment
author

Because that wasn't the picture I wanted.

Expand full comment

This also shows why art depicting saints, be it in pictures, stained glass, or sculpture, relies on attributes to identify them. Female saint with palm leaf and basket of flowers? Probably St Dorothy. Female saint with palm leaf and lamb? St Agnes.

This is also why there are a bunch of works with titles of the form "Madonna and Child with four saints": the donor commissioned an obscure local saint, the identification has been lost, and you're left with standard "bishop holding a church" style iconography.

Expand full comment

Only one known picture of Thomas Bayes, and it doesn't even show his posterior...

Expand full comment

No, but we can infer what his posterior looked like from the available evidence.

Expand full comment

Immediate comment: if you've ever commissioned images on Fiverr or similar, you will know that this is a very difficult task. The real problem here is the one-shot communication and the lack of ability to iterate. I'm not sure DALLE is doing any worse than a human would.

Expand full comment

I'm pretty sure there aren't very many human artists who would creatively misinterpret "Darwin and a finch" to mean "The head of Darwin attached to the body of a finch."

Expand full comment

I understand the thought, but honestly, you've got to try it. The kinds of misinterpretation possible between people far, far, far, far, far exceed what you might commonly imagine. You may be correct that that specific kind of mistake would be very rare among human interlocutors, but mistakes that would make your mind boggle are common; and describing a visual scene to people is notoriously hard. Hence parlour games like Pictionary.

Expand full comment

I would like to see a window that depicts Joseph Overton, holding a window.

Expand full comment

I suspect the eventual takeaway from DALL-E in terms of a practical tool for art is that short snippets of natural language are a very unwieldy way to control an AI artist. IMHO, DALL-E’s capacity to render coherent scenes and styles, and to have visual familiarity with so many things, makes it the most viable “actual economic use of actual AI” case we have to date. Beside that, its NLP aspect is a fun spandrel that (as this post amusingly demonstrates) gets in the way of purposeful use of those greater capabilities. I think and hope that eventually (perhaps after DALL-E is licensed to some team more focused on consumer software) we’ll have a version that’s much more hands-on, something like an “artificially intelligent Photoshop” that lets you much more directly poke the model into doing what you want.

As an aside, though, I think “Darwin as a finch with a human head” is a fantastic visual metaphor and wholly appropriate to the symbolic stained glass setting.

Expand full comment

I think this shows that the fantastic images we are shown as representative of what DALL-E produces are (1) cherrypicked to be the best of the best and (2) the people working with it all the time know the best way to extract the results they want from it. Clearly there are tricks and shortcuts in the way you must phrase requests to get it to produce what you want, which someone working on it for months has developed while someone coming in cold and just asking "give me a raven with a key in its mouth in a library" doesn't know.

Expand full comment

Yes, prompt engineering really does help. https://www.reddit.com/r/dalle2/comments/ub0sfg/dalle_2_imitation_game_results_check_sticky_for/ makes this really clear.

Expand full comment

I wonder to what extent the original art used in training sets should be entitled to copyright protection.

GitHub Copilot has reproduced FOSS-licensed code for some prompts. Does DALL-E encode in its parameter space high-fidelity copies of copyrighted art, reproduced without permission?

Expand full comment

It’s a good question. I’d imagine that if you prompted it with just “the Mona Lisa” it would be able to produce a pretty close approximation, but how close? And how famous does art have to be to have gotten in there with decent fidelity?

Expand full comment

Due to the stochastic process used in generating an image, it seems difficult to retrieve a high quality representation of even an extremely frequently sampled image, which might be expected to be memorized with high fidelity. However, it seems easy to generate new images in which that well known image appears as a component, where the "essence" of that image is more important than the fidelity of reproduction of the original.

Expand full comment

I assume the Darwin as finch with a human head is inspired by this famous cartoon:

https://commons.wikimedia.org/wiki/File:Editorial_cartoon_depicting_Charles_Darwin_as_an_ape_(1871).jpg

Expand full comment

Welcome to the wonderful world of AI psychology! Also known as prompt engineering. How can I extract the desired knowledge or output from the AI by asking the right questions?

I actually expect that AI psychologist will become a common job in the future. Perhaps under a more boring name, like AI operator.

Expand full comment

Only distantly related, but: The game SOMA had a character whose job description was "AI psychologist". :)

Expand full comment

Why can't the AI ask you questions to clarify your request? I'm sure a stained glass...artist? would ask you a few questions to make sure they knew exactly what you were after before completing the work. Wouldn't that resolve the issues of ambiguity? Seems like we're expecting AI to be more intelligent, or clairvoyant, than humans.

Expand full comment

A human artist who couldn't ask any clarifying questions still probably wouldn't draw "Darwin studying finches" as Darwin's head on the body of a finch, though.

Expand full comment

😂 true, but to be fair some people may commission 'surrealist' designs. The artist would need to know if the person wants a very literal design or something a bit more...out there.

Expand full comment

In fact I've just seen this human design posted in another comment:

https://commons.wikimedia.org/wiki/File:Editorial_cartoon_depicting_Charles_Darwin_as_an_ape_(1871).jpg

Expand full comment

I think you've confused "art" with "propaganda". There's not really any consideration here of the concepts of "beauty" and "truth", nor of the idea that art is a process of "virtualization" of aspects of consciousness. See Susanne K. Langer, Feeling and Form.

Perhaps some self-reflection on your own aesthetic theory would reveal that you are pretty close to embracing a Marxist theory of art, and hence it is no wonder that DALL-E is picking up on this and producing works that could easily pass as part of that tradition.

Expand full comment

Are you arguing that stained-glass cathedrals can't be art because they're religious propaganda? Is the Sistine Chapel not art?

Expand full comment

No they are clearly art, but they have a secondary purpose.

Soviet art was probably art but it had a dual function of propaganda. But if aesthetics was not at all a concern then it was only propaganda.

Advertising (product propaganda) might be art if aesthetic concerns were considered, but if not, then maybe not art.

"Art" is what an "artist" makes.

The artist here is not really DALL-E (which is just a tool); it is Scott, if and only if he decides that his curation and description-writing is done as an "artist".

It is not clear that he is actually taking on the role of artist. At best he appears to be acting in the role of a propagandist.

Expand full comment
Jun 1, 2022·edited Jun 1, 2022

So Scott wrote an entire post about how hard it was to get Dall-E to generate good, pretty stained glass that would be fit for decorating his home and doesn't look like bad clipart, and your conclusion is he doesn't care about aesthetics and his main goal was to advertise the rationalist virtues?

I'm baffled how you came to that conclusion.

Expand full comment
Jun 2, 2022·edited Jun 2, 2022

Who is the artist? Scott or Dalle?

(I don't think it can be DALLE, producing images is not the same as art. A camera produces images, but a camera is not an artist because the camera cannot understand or appreciate aesthetics.)

So if Scott is the artist (his choice to accept or reject this role), in what way did he consider aesthetics? Does his post ever use the words "beauty" or "truth"? Not that I saw.

He chose instead a program of a particular propaganda. Are Soviet illustrators, advertising firms, or religious educators artists? Only when they claim to be artists engaged with the primary function of art: aesthetics.

Scott has to decide whether he is the artist or not. It doesn't seem like he thinks he is. Rather he seems to mistakenly suggest that DALLE is the artist, and he is an "art" critic or maybe just a consumer considering "preferences". Picking out a rug or sofa fabric - basically nothing to do with "art".

Expand full comment

In the spirit of asking for a deer instead of a moose, in the SciHub one I would have tried asking for a crow instead of a raven.

Expand full comment

Where's Chagall when you need him?!

Expand full comment

> I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them

I think your specific problem would be better solved by a model that doesn't know any styles other than stained glass. If everything it generates looks like stained glass, you can ask for anything and it will come out looking like stained glass.

Random comments:

- Darwin #1 appears to be horrifically deformed. He's got weird flaps of skin hanging off his face. That's not a beard.

- I'm surprised you didn't try asking for something like "Alexandra Elbakyan in a blonde ponytail".

- When you ask for a person accompanied by a raven, you appear to be getting ravens that are half the size of a man. Something's very wrong there.

Expand full comment

I thought about pointing out the algorithm's apparent confusion on the relative size of humans and ravens... but I feel like this genre of art tends not to take perspective very seriously anyway, so it's not entirely untrue to the source material.

Expand full comment

Sparrows are "pretty much the same size as a tree"

https://the-toast.net/2015/04/01/two-medieval-monks-invent-bestiaries/

Expand full comment

Re Darwin's aesthetic issues: DALL·E 2 is generally bad at photorealistic faces and hands. The stained-glass constraint helps hide how bad it is at faces, but you noticed one such issue, and several people here have 4 or 6 fingers.

Expand full comment

Should've asked for a stained glass window that humans would want if they were smarter, wiser, and grown up further together.

Expand full comment

This was hilarious, and a much-needed riposte to all the "DALL-E will totally replace human artists" posts we had.

Come on, the Darwin-finch is *awesome*. Rationality should definitely keep that one!

I don't know about William of Ockham, but Gila Whamm is definitely going to cut a bitch. William was a Franciscan, so you might do a bit better putting that in, though I imagine DALL-E will then churn out images of Franciscan saints which may not be what Rationalist Virtue stained glass windows want as their imagery:

https://upload.wikimedia.org/wikipedia/commons/7/70/William_of_Ockham.png

As for the Reverend Bayes, he was an 18th century Presbyterian. Anything resembling a cassock or soutane or Anglican/Papist (but those are the same thing) clerical robes will have his ghost arising out of its resting place to haunt you.

As for Tycho Brahe - a moose is the best attribute to identify him? Scott, are you forgetting his METAL NOSE????

"Tycho Brahe lost his nose in 1566 in a duel with Manderup Parsberg, a fellow Danish student at the University of Rostock and his third cousin. Tycho wore a prosthetic nose made of brass, and afterward he and Parsberg became good friends."

Also I do like how the art attempts ended up with Alexandra Elbakyan as a Sirin of Russian folklore:

https://en.wikipedia.org/wiki/Sirin

Russian folklore has not one, but *two* woman-headed birds, the Alkonost is the second:

https://en.wikipedia.org/wiki/Alkonost

See this painting with both:

https://en.wikipedia.org/wiki/Alkonost#/media/File:Vasnetsov_Sirin_Alkonost.jpg

Expand full comment

I was going to say, the other day I stumbled on this discussion of the only picture "of Thomas Bayes": https://www.york.ac.uk/depts/maths/histstat/bayespic.htm

The picture was first presented as Bayes in a 1936 book on the history of life insurance, but it is clearly a priest in 19th-century dress, so it's definitely not him. (Unless someone in 1936 somehow knew what an 18th-century figure looked like, with no other known portraits, but didn't know decadal clothing styles?)

Expand full comment

On the final thoughts, I think that DALL-E2 itself is probably capable of satisfying most of your requests, but what it needs is some amount of fine-tuning on the kind of results that you want, rather than trying to sample from 'the set of images that are likely to have your prompt as a caption', which in any case becomes less defined as your caption diverges from the training set.

The process for fine-tuning these kinds of neural networks to be much more helpful is now quite well established, basically involving generating a load of pairwise preferences over results and then fine-tuning. For example, DeepMind's GopherCite (https://www.deepmind.com/publications/gophercite-teaching-language-models-to-support-answers-with-verified-quotes) trains a language model in just a few steps to give verbatim quotes supporting an answer to a question, in a pre-specified syntax. On the other hand, if the underlying language model is prompted to do so, it only sometimes gets the syntax and rarely gives proper verbatim quotes. (In language-model cases, they also train an RL model to plan ahead, but it's not obvious how this would transfer to images, which I understand are generated all at once.)
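For the curious, the pairwise-preference step typically reduces to training a reward model with a Bradley-Terry style loss; a minimal PyTorch-flavored sketch (reward_model here is a placeholder network, not GopherCite's or OpenAI's actual code):

```python
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Pairwise (Bradley-Terry) loss: push the reward of the
    human-preferred sample above that of the rejected one."""
    r_pref = reward_model(preferred)   # shape: (batch,)
    r_rej = reward_model(rejected)     # shape: (batch,)
    # -log sigmoid(r_pref - r_rej) -> near zero when preferred >> rejected
    return -F.logsigmoid(r_pref - r_rej).mean()
```

The generator is then fine-tuned (or its samples re-ranked) to maximize the learned reward.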

Given that the model is huge and not public, it's not going to be possible for an individual, but it should be quite trivial to do within OpenAI; the number of examples required is fairly low, so if it were worth their time and money, you might even be able to give enough examples on your own for the fine-tuning.

Expand full comment

It would be useful if the AI could be set to make drawings with no backgrounds. Then one could ask it to make drawings of ravens with keys in their mouths with no background. Then ask it to make library backgrounds separately. Then pick the best raven and put it in the best library.
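The compositing step, at least, is already trivial outside the model; for instance, with Pillow, assuming you have a raven rendered on a transparent background (the file names here are hypothetical):

```python
from PIL import Image

background = Image.open("library.png").convert("RGBA")
raven = Image.open("raven_with_key.png").convert("RGBA")  # transparent background

# Paste the raven onto the library, using its alpha channel as the mask.
background.paste(raven, (420, 310), mask=raven)
background.convert("RGB").save("raven_in_library.png")
```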

I've been thinking about using AI art for comics. Example: in the first panel a cowboy punches a robot, and in the second panel the robot punches the cowboy. A problem is that the AI should draw the same cowboy and the same robot in each panel, just adjusting their positions. So for an AI to be useful for comics, you should be able to make a specific character and name it, and then whenever you use that name the AI should be able to remember that character.

This seems hard because the character should be able to change to some extent, if he changes his clothes, or grows old, or gets a haircut or whatever.

The AI should also be able to remember specific objects and locations.

Expand full comment

It's possible to do masking with DALL-E 2. To get a consistent style, you should try putting the same window frame including a little background glass of the style you want, and then let DALL-E fill in the middle with the figure.

Expand full comment

Sometimes it's hard for people to recognize important differences, too. The second image of Occam's razor and the picture of the medieval razor are not similar.

In the picture of the medieval razor, look at where the metal attaches to the wood. There's a pivot, so this is a folding knife. Razors need to be extremely sharp, but don't have to cut through anything hard, so a razor has an extremely thin blade. This is a picture of a thin and sharp pocketknife.

Now look at the second picture for Occam's razor. The place where the metal attaches to the wood does not have a pivot and looks more like a sledge hammer. The metal is extremely thick. It may not even have a sharp edge. And it's larger than Occam's head.

The second picture shows Occam's pick. Mining tools rarely make good razors.

https://en.wikipedia.org/wiki/Pickaxe#/media/File:Keilhaue_Bergmann_Hammer_VEB_-_BKW_-_GL%C3%9CCKAUF_-Tr%C3%A4ger_des_Vaterl%C3%A4ndischen_VO_in_Gold_-_Betrieb_im_VE_BKK_Senftenberg_-_Lupus_in_Saxonia_Bild_00017.jpg

Expand full comment

The images have quite a collection of instruments:

First group:

Left hand is hidden from view, but it seems to have a razor and two other things (one metal & one wood).

Pick.

Large dagger. Notice the ridge running down its back. This is a small but sturdy weapon.

Second group, William of Ockham:

Nothing in hands. Pick or adze in the background.

Recurve bow.

Long handled knife. (Something like this: https://www.turbosquid.com/3d-models/blade-chinese-long-3d-model/323913)

Third group, Art Nouveau:

Tiny pickaxe.

A fan. Maybe it's a knife that's moving, but it looks more like a fan.

Large bronze butterknife. It doesn't look sharp to me, but maybe that's just the style and we can count this as a bronze razor.

Fourth group, 1890:

I don't even know. Maybe a hammer with its head halfway down its haft. It's small enough that it might be some weird geometer's tools.

We can't see the head, but it has a thick wooden haft. He seems to be using it to smash the glass vial in his left hand.

A bronze razor and some calipers. The razor looks sharp and about the right size. The calipers might also be a reflex hammer with a hat.

Expand full comment

How much would it cost to hire a human artist to design the stained glass window for you?

Expand full comment

I really appreciate you showing your process here. Most of the DALL-E stuff I see is just a curated collection of the very best, which tends to give the impression that human artists are about to go extinct, whereas the fact of the matter is that DALL-E is less "useful" and more "unintentionally hilarious".

One major problem with AI is that basically the only thing it has to learn from is the internet, and the internet is a very bad place to learn about the real world. Being able to ingest and self-label data from reality seems to be a pretty significant part of what makes humans and other animals intelligent; the neural network inference algorithms seem to be a pretty small piece of the puzzle in the end.

Expand full comment

I think you may get more consistent styles by mentioning a particular artist, for example "William Ockham holding a razor, art nouveau stained glass by Louis Comfort Tiffany"

Expand full comment

For William of Ockham, I bet you'd get better results if you swapped out "razor" for "knife". They're practically the same thing in a medieval style, but DALL-E would stop pulling from shaving ads.

Or for a more metaphorical razor, go with "sword" or "longsword". Add some language about monk's robes and tonsure and you might get William of Ockham halfway to a Jedi. Which would be awesome.

Expand full comment

Ockham's lightsaber: The simplest explanation is that everyone is related.

Expand full comment

Technically, that's Darwin's lightsaber.

Expand full comment

Regarding Alexandra Elbakyan, I recall reading in a different article about DALL-E that it is programmed to deliberately screw up pictures of real people, to avoid deep fakes and revenge porn and such. Could that be the problem there?

Expand full comment

Interesting. And it sounds exactly like the annoying aspect of almost all modern AIs, from Google's search algorithm to voice recognition phone trees: they always funnel you too rapidly into whatever is heavily represented in their training data -- the most popular queries matching most of your query. It's like they're all gross failures at the Sesame Street game "one of these things is not like the other": they cannot readily pick out the minority components of a query that are unusual and important, in that they shove the query off the well-beaten path of the training set.

It's my experience that modern AIs are heck on wheels if you want a popular result but are a little fuzzy on how to ask for it. I can ask Google for "the redhead in ABBA" or "famous Russian marshal Operation Barbarossa" and get "Anni-Frid Lyngstad" and "Zhukov" in no time flat, very impressive. But if I want something *almost* but significantly different from typical, it's like wrestling with an oiled 600 lb walrus, trying to shove the AI off its monomaniacal well-intentioned discarding of certain keywords I add, assuming I must just be mistaken, surely I want the most common result desired when (a random) 80% of the words I used are selected...

You wonder what it is that the human brain does to pick this stuff up.

Expand full comment

Come to think of it - does the human brain always pick up on these subtleties, though? If I were to try to explain something subtle to another human, the chances are they would jump to conclusions and think I mean the similar, common thing x, which they are familiar with (similar to the AI training set), when what I am trying to explain is the more uncommon y, which is similar to, but different from, x. Maybe what we lack with Google is the possibility to say: no, I didn't mean x, I meant y. Then Google shows z, and you are able to specify further. After all, that is how human communication works (well, should work, haha).

Expand full comment
May 31, 2022·edited May 31, 2022

Not always, of course. The human brain doesn't always remember i before e except after c, or where it left the keys last night. But I'm not sure why this is relevant. The important question is: does the human brain *routinely* pick up on this stuff? And of course it does. That's why we prefer to speak to a human being rather than Alexa to get our nontrivial questions answered.

Also, the human brain is very good at taking a surprisingly accurate guess about when it has sufficient information and when it doesn't, and will spontaneously generate a request for clarification if needed.

One big honking clue would be somebody asking almost the same question twice in a row. A human brain would say to itself "whoa...this sounds like the same request coming in, that must mean I got something badly wrong the first time *and* my interlocutor can't figure out how to rephrase himself...time to ask a pointed question or two and help out."

Indeed, one of the strange aspects of modern AI productions from my point of view is sort of what you said (I think) -- how rarely I see anything approaching a dialog -- of the AI figuring out it doesn't know enough and asking for more information, a rephrasing, or just asking questions to resolve ambiguities. Why is that? If you listen to humans interacting, it's almost never a single precise command followed by aye aye sir. Almost any conveyance of request is at least a short dialog, where Alice asks Bob for the keys to her heart and Bob says I didn't know you cared and Alice says oops I mean my deer-shaped cash box the key to which I asked you to keep for me and Bob says oh yeah right sorry.

Expand full comment

Dumb question, but what is the difference between what we can deduce the AI is doing and AI truly being a black box? Doesn't this post kind of illustrate that AI isn't as black-box-y as we sometimes think? Also, if anyone has any resources on meta (not the company) AI studies, that would be really helpful to me.

Expand full comment

I would say that the characteristic of a black box is that we don't properly understand how it works, not that we're completely ignorant of how inputs are related to outputs.

Expand full comment

Could you maybe tell me more about this? What part of not properly understanding how it works can't be illuminated by further analysis like this into how the outputs are related to the inputs? Could something like this be a nascent step toward making that understanding more sophisticated, or is there "something that it's like" to be this black box that we will never have the capacity to fully understand, in the same way that a dog could never fully understand that of a human? This is not a challenge; I am genuinely really fuzzy on where the reasonable deduction ends and the black box begins.

Expand full comment

I think it's a good question. I think that "black box" is a metaphor not to get too hung up on.

Consider a fancy neural network that takes an image as input and spits out a caption. On the lowest level of abstraction it's very clear what it's doing -- you can see all the nodes and all the weights and understand exactly how inputs are converted by fairly simple mathematics into outputs. You can double-check the maths by hand if you really want to (and have a lot of time and a lot of paper).

On the highest level of abstraction it's pretty simple too. You give it a picture of a dog, and it spits out the word "dog". No mystery there either.

The slightly mysterious part is just at some level of abstraction between those two, trying to understand why this particular simple-but-lengthy mathematical formula can not only identify a "dog" but also a "house" and "green goldfish in a bowl sitting on top of a television in front of an Antarctic background".
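To make that lowest level concrete, here's a minimal sketch in Python of the arithmetic one layer performs. (The weights are fabricated for illustration; a real captioning network just stacks millions of these same operations.)

```python
import numpy as np

def relu(v):
    return np.maximum(0, v)  # a common nonlinearity: zero out negatives

x = np.array([0.2, 0.7, 0.1])        # tiny stand-in for pixel features
W = np.array([[0.5, -0.3, 0.8],
              [0.1,  0.9, -0.2]])    # "learned" weights (made up here)
b = np.array([0.05, -0.1])           # "learned" biases (made up here)

# One layer = multiply, add, apply nonlinearity. Repeat at scale and
# you have the whole network; each step is checkable by hand.
hidden = relu(W @ x + b)
print(hidden)  # becomes the next layer's input
```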

Expand full comment

Thoughts in no particular order:

Find the style of stained glass you want in books or museums and see how it is captioned there. Maybe use words like 'collection', 'ca. yyyy' or even 'C.317-1927. ©Victoria and Albert Museum'.

Art Nouveau might not be the best style for this. There are a lot of faux-wooden front doors out there with some vaguely Edwardian stained glass design on them for you to put on the front of your vaguely Edwardian house. Whereas something like Pre-Raphaelite stained glass has several advantages.

It is much more likely to be representational and allegorical. It's a reaction to the industrial world that tries to hide its fundamentally modern nature behind traditionalism, like the work of Chesterton or Lewis. And Art Nouveau women look dreamy whilst Pre-Raph women look like they are done with your shit; Elbakyan is very done with academic publishing's shit.

Are all of the captions in English (*all* of them)? Maybe calling it Jugendstil or Sezessionsstil would work better than Art Nouveau.

Expand full comment

Regarding Ada Loves Lace: I had a co-worker named Phil French. Windows NT decided that he must speak French, so it changed his operating system language to French. Sigh.

Expand full comment

This is how it begins. While we’re all laughing 😱

Expand full comment

Vous mean "Le Sigh" 😉

Expand full comment
May 31, 2022·edited May 31, 2022

Apparently DALL-E is not completely bad at generating text, it's just that it has its own language.

You can type the "gibberish" back at it to see what it means: https://giannisdaras.github.io/publications/Discovering_the_Secret_Language_of_Dalle.pdf

Expand full comment

Whoa, that is crazy. I'd want to see more examples replicated though to confirm this. Most of the text I've seen seems just like misspellings of the English words, but this seems to arise from two images where the text being written wasn't given in the prompt.

Expand full comment

There is pushback against these claims; they may be confirmation bias, misinterpretation, or fraud.

Expand full comment

I see a general trend here: AI is good at creating things similar to existing ones (i.e., to its training data) -- same same but different. However, it is not creative. It cannot create something original.

Expand full comment

Same with humans

Expand full comment

Who is creating all of humanity's art, literature and science then?

Expand full comment
author

It's really hard to create genuinely original literature. My impression is that on the rare occasions it happens, it's a combination of:

- a once in an era genius

- the gradual drift of everyone in society over a long period of time

- collision with some sort of non-human reality that gives a new perspective on human ideas which can then be integrated into art, without an individual human having to have a truly novel idea

- drugs

- random noise ( see https://slatestarcodex.com/2014/08/06/random-noise-is-our-most-valuable-resource/ )

Expand full comment
May 31, 2022·edited May 31, 2022

I am curious. Why didn't you specify that you wanted it in the style of a stained glass window, rather than including a stained glass window as one item in a list? If someone gave that prompt to me, that is how I would interpret it.

Expand full comment

It may be the power of suggestion, but it seems like some of those Tycho Brahe images might have been influenced by the fact that Tycho Brahe is also the pseudonym for a webcomic author.

Expand full comment

I don't see it. Webcomic Tycho Brahe is depicted with no beard or hat and a full head of hair.

Expand full comment

No, I'm saying the second set of results, with the reindeer, looks (to me) a little like it's in the style of a webcomic (one more than the others, but I feel like I can see bits of it in all 3).

Expand full comment

DALL-E is impressive, but why is it still so hidden and locked down, and why hasn't a large company bought it or the technology?

Expand full comment
author

I think OpenAI, who makes it, is a large company. So far it looks like their business model has been less "make people pay to use it" and more "build a lot of hype around it and hope that Microsoft or some even bigger company licenses the rights to it". My guess for why this hasn't happened is some combination of "no company really needs a fully generic AI art program" and "it's only existed for a few months and inter-company deals take time".

I bet they have some dumb idea like "the more people use it and feel like it's a toy, the less likely we are to sell it to Amazon for five zillion dollars to handle all their branding" or whatever. Alternately, they might feel like they have enough money to just have fun and shoot straight for AGI, I'm not sure. It's a good question.

Expand full comment

William Herschel just has a German name. He was a German-born British astronomer. He was raised protestant, as was typical in Northern Germany at the time.

Nothing Jewish about Herrn Herschel at all.

(Of course, Dall-e could be very American, and think all German names sound Jewish.)

Expand full comment

Not AI but: we were once in a position to commission a piece from Theodore Ellison. Not your typical suburban house lilies. He's here in the Bay Area.

https://theodoreellison.com/collection/bespoke/

Expand full comment

We actually have a stained glass window of Darwin with finches at Providence College, along with Newton and Galileo, scroll down here for pictures: https://news.providence.edu/stained-glass-windows-transform-fiondella-great-room

Expand full comment

Unfortunately, it depicts Darwin with a human body.

Expand full comment
author

Why would you choose to work in stained glass if you hate color??

Expand full comment

If you want to kill

Like Son of Sam

Try our razor

Gila Whamm

Expand full comment

There is a program called Pattern Wizard. It allows you to import patterns and use parts of them to make a new pattern. It also lets you import pictures and draw patterns over the picture. Therefore, you can fairly easily produce a pattern of pretty much whatever you want. The only thing you have to be careful of is making the pieces a size you can actually work with. However, if you simply want to make a faux stained glass, you can forget those constraints and draw pretty much what you want.

It will then let you plug in different colors and patterns of stained glass which it has in a library. I use this feature to show my customers generally what the stained glass piece will look like.

I also will print some more detailed pieces on sticky-back plastic pages and stick them to the glass. Then I will cut out the area I want to paint a particular color and use a powdered glass mixed with a binder to fill in the area. After it dries I do another color. I continue this process until the painting of the piece is done. My glass powder fires at 1250 degrees F, but there are low-temp ones that fire in a microwave or oven.

I hope this helps some of you out there to be able to create something special for yourselves. I know these processes have certainly made me and my customers happy with the results. Have fun creating. Jim Walter

Expand full comment

It's interesting, but the same problem of mixing style and content is also present in humans. We can find a great example of this in early Russian literature. When the author wanted to write something religious, they copied the Bible (which was not even in Russian); when the author wanted to say everyday things, they used their transcription of spoken language. When it was something in between those two themes, like a description of a battle, it was a weird mixing of those two styles, page by page, or an attempt to find a stylistic middle ground.

Expand full comment

This was laugh out loud funny! I loved it, especially that red-bearded psycho Gila Whamm!

I was surprised you didn't mention the problem that DALL-E showed both Brahe and Herschel putting the eyepiece of the telescope up to their mouths or noses.

Expand full comment

The thing is, the more I see of these programs, the more I think they're not actually capable of doing what they pretend to be doing at all. It looks an awful lot to me like what these programs are actually doing is basically taking bits and pieces of images from the internet and stitching them together in such a way that it is hard for the people trying to figure out what the thing is doing to catch it at this. I actually noticed this with the flamingos thing, as one of them had bits I remembered seeing in online art previously.

It also seems to run them through various "filters" which makes this harder to catch. As such, I think these things may ultimately end up being really complicated ways to obfuscate copyright infringement.

The thing is, this makes sense; these "machine learning" programs are basically programming shortcuts.

It's been known for a long time that it's possible to trick image recognition programs in various subtle ways, by altering a few pixels or making a slight change to the image, and getting them to highly confidently misidentify the image as being something completely different, even though humans often cannot even detect the alterations that were made to the original image. This is because these things aren't actually recognizing the image in the way that humans do, but creating an algorithm for "things that get textually described/linked to as describing this". These attributes sometimes have very little to do with the actual desired thing, which is why you can get these weird outputs.
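For the curious, here's a minimal sketch of the classic version of that trick, the "fast gradient sign" attack, written against a hypothetical PyTorch classifier. `model`, `image`, and `label` are placeholders made up for illustration, not any particular library's API:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    # `model` is any differentiable classifier; `image` is a correctly
    # classified input batch (pixels scaled to [0, 1]); `label` is its class.
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel a tiny, humanly invisible step in whichever
    # direction *increases* the loss; the model will often flip to a
    # confident wrong answer even though the image looks unchanged.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```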

Which is exactly what is going on here as well, and why including "reindeer" makes it more Christmasy - because it doesn't actually understand "reindeer" conceptually.

Expand full comment
author

See my discussion of this in https://slatestarcodex.com/2019/02/19/gpt-2-as-step-toward-general-intelligence/ and then the followup post https://slatestarcodex.com/2019/02/21/my-plagiarism/

I don't think they're literally taking bits and pieces in the sense of CTRL+C and CTRL+V - I think they're taking conceptual bits and pieces and remixing them. But that's kind of what creativity is.

Expand full comment

That strikes me as almost obviously false. If not, can you identify the preexisting bits and pieces that were remixed to come up with the Special Theory of Relativity or Die Zauberflöte? An answer of "the following observational facts about light" or "the notes on the piano" is not sufficient, because that space of combinations is combinatoric hell, so enormous you could wander there for a trillion years and not come up with anything meaningful -- it'd be like setting a thousand monkeys typing randomly and hoping to accidentally recreate Shakespeare.

I'll also point out you have a serious origin problem: if all that is new is remixed old, then where did the building blocks come from in the first place? It defies common sense to imagine all that man has invented in the last 40,000 years already lay inherent in Australopithecine grunts and gestures (or thoughts).

Expand full comment
author

In terms of "The Magic Flute", I feel like I'm just claiming Mozart was heavily influenced by the entire corpus of classical music that came before him, which is why it sounds more similar to eg another 18th century Austrian composer than to Tibetan chants or gangsta rap. When I look up music historians on Google, they say Haydn, Bach, and Handel are among the most important influences and the Magic Flute uses techniques from all three - not slavishly, and in new and interesting variations, but it does.

In terms of Special Relativity, I'm not a historian of science but I assume Einstein's influences were people like Maxwell, Newton, Kelvin, and everyone else who had come up with the particular scientific tradition he was working in. I think this is less true than for Mozart because in science moreso than art, reality can also be an influence!

Again, see https://slatestarcodex.com/2014/08/06/random-noise-is-our-most-valuable-resource/ for more on how I think of this.

Expand full comment

AlphaGo took 4.9 million simulated games to get good at Go.

No human has played 4.9 million games of Go. Even if you play ten games of Go a day, every day from when you are 4 years old until you are 34, you will have only played about 100,000 games.
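(The arithmetic checks out, for what it's worth:)

```python
games = 10 * 365 * 30  # ten games a day, every day, for thirty years
print(games)           # 109500 -- roughly the "about 100,000" above
```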

You're thinking that these things are just weaker than human brains, but the number of replications is vastly, vastly higher than what humans do.

The problem isn't lack of power - it's the way that they function. It's lack of the ability to even take conceptual bits in the first place. They lack concepts, or even the concept of concepts.

They don't function like humans do, and you won't get human function by just strapping on more power to them.

I think that trying to convince people that this is how humans actually function is key to making people believe these things can actually solve the problems they purport to solve -- but it's not how people *do* function.

While we may take inspiration from various places, the way that we use that inspiration varies wildly, and it also arises logically. We revise things in various ways because it works better; we change things because the way that stories flow is actually highly non-arbitrary.

In fact, when people try to make stories the way you're suggesting, it ends up coming out very stilted and unnatural. When people do this, it is because they don't understand how to make stories, and so they just end up awkwardly aping stuff (and they don't have to consume millions of pieces of writing to create this awkward, stilted, derivative stuff).

These programmers are cargo culting what humans are doing without understanding it. Painting your towers more accurately won't cause cargo to come for you because cargo doesn't come because of the way that the towers are painted.

Expand full comment
author
Jun 5, 2022·edited Jun 5, 2022

Why wouldn't you expect that a very dumb system needs lots of games of Go, a medium intelligence system needs only a few, and an intelligent system needs very few (or none at all?), without any conceptual break?

A human Go player also needs a lot of games to become an expert - not 4.9 million, but a lot. These AIs, which are 0.1% the size of the human brain, need more than humans do. I'm not necessarily claiming an AI exactly the size of the human brain will need the exact number humans do, but I think something in this area is a pretty strong possibility.

Expand full comment

Has adding more power reduced the number of cycles necessary?

That's a serious question. We've had a few generations of these AIs. Do they actually learn faster now, in terms of needing fewer games? Or do they just do their games faster?

Expand full comment

I can't wait to try out some stuff with DALL-E. This looks like a lot of fun to mess with. Thanks for sharing.

Expand full comment

Is there no way to say "NOT Santa Claus" to remove Santa elements? Or maybe there's a way to find the term antithetical to Santa Claus and add that to cancel out the Santa elements.

Expand full comment

“Looking realistic” is different from being real...

Expand full comment

Really excited that the AI was able to tap into Charles Darwin-Nagel's infamous treatise "What is it like to be a finch?" from the alt universe my parents came from.

Expand full comment

Is there some experimentation or guidance not depicted here that led to the consistent "[subject], [style]" phrasing? Did you ever try something like "stained glass window depicting [subject]"? This very slightly reminds me of people trying to feed overly structured queries to Jeeves back when it wanted plain English questions.

Expand full comment

I think Dall-E confuses William Ockham with Will Oldham aka Bonnie Prince Billy.

Expand full comment

Interesting that it isn't just Harry Potter fan art, the pseudo-Elbakyan is obviously Slytherin (unlike Hermione) in first and third picture and maybe Hufflepuff (again, unlike Hermione) in the second. (And if I had to assign the Stalinist hothead herself a Hogwarts house, I would probably put her in Gryffindor, too. She's many unpleasant things, but bravery does suit her. And, while she might be hard-working, she's not loyal, so no Huff.)

Expand full comment