"Nonsensical, borderline moronic" is basically what the discussion re: phlogiston theory back in the day looks like to me now, from my vantage point standing on the shoulders of centuries of scientific giants discovering and clarifying stuff. But the discovering and clarifying isn't skippable, and when we're in the thick of it it just looks like a whole lot of messy confusion that's nonsensical and borderline moronic.
That was a joke. Joke. J-O-K-E. Joke. You've heard of them? We've been talking about AIs and their imperfect understanding of human things, so when you say that Russell "didn't really understand" Godel, the comparison to AIs is inevitable.
But there are no real AIs. There are just things that aspire to be or pass as an AI. I think this is sometimes forgotten and hence people presume that it is just a matter of time.
To me it’s seeing a possible future and navigating toward it. What I think makes humans special is that we can imagine the possible futures in other people’s heads and consider those in the navigation.
Intelligence is the totality of whatever humans have. So, to prove that AI is intelligent it needs to demonstrate human-par performance on any imaginable test, including having self-awareness and whatever else anybody can think of. I also think that people don't generally appreciate Moravec's paradox, and that "intelligence" isn't actually all that impressive. Evolution has spent vastly more time on developing the stuff that separates a rock from a mouse than a mouse from a human, so I'd say that once our robots can beat mice we're pretty much there.
Humans have parents, humans have grudges, humans have fingertips, humans have forgetfulness. I expect that some of these things are necessary for intelligence and some are not.
Sure, and once intelligence is achieved, all the extraneous stuff would be simple enough to replicate/simulate I'd expect. But, while some imaginable human benchmark remains unmatched, I'm sure that there would be no shortage of those claiming that it demonstrates lack of true intelligence.
>To me it seems nonsensical, borderline moronic, for people who don't know what intelligence is to argue over whether a computer can have it, or whether a given AI technique is capable of achieving it.
The reason we bother debating this is because although we're wandering around in the dark, we know there's a 1000-foot clifftop somewhere vaguely in the vicinity and we want to avoid walking off it.
The reason we can't be immediately assured that neural-net AI research won't lead us off the clifftop is that neural nets are a clever scheme to generate forms of intelligence you don't know how to explicitly code, and therefore the fact that we don't know how to code Skynet doesn't fully imply that we won't build one by accident.
I think you're "not quite right". I think that certain people have definitions that they know, believe, and think match common usage...but the definitions don't agree. And they can't convince large numbers of other people.
OTOH, most people are like Supreme Court justice Potter Stewart on pornography: "I know it when I see it". Which lets them adapt to any definition they want, or vary the definitions by circumstance.
Here's a link to Hofstadter's article. It is stupid. "I would call gpt-3’s answers not just clueless but cluelessly clueless, meaning that gpt-3 has no idea that it has no idea about what it is saying." No, this is just false. It's very good at bullshit, at improv. But it knows when it is bullshitting. We know this because we asked it and it answers correctly.
Can I point out that if you explain to the 5 year old what they did wrong and run the test again, they get the answer correct, while GPT 3 (and 4) will repeat the mistake?
Not saying Marcus is right as such, but he's got a point. Without a system to hold world state and update a world perceptual model, we really are just dealing with an impressively complicated lookup table. But as others have pointed out, it's going to be really interesting to see what happens when we figure out how to bolt that capacity onto other systems.
If you append your question, GPT-3's previous output, an explanation of what it did wrong, and the repeated question, and submit that whole block of text back to GPT-3, there's a decent chance GPT-3 will get it right.
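For concreteness, here's a minimal sketch of that append-and-re-ask loop, assuming the older openai Python client and a placeholder model name (treat the exact API surface as an assumption); the point is just that any "correction" has to live inside the prompt text itself:

```python
import openai  # assumes the pre-1.0 openai client; newer clients use a different call

openai.api_key = "YOUR_API_KEY"  # placeholder

def complete(prompt: str) -> str:
    # Each call is stateless: the model only "knows" what is in `prompt`.
    resp = openai.Completion.create(
        model="text-davinci-002",  # assumed model name
        prompt=prompt,
        max_tokens=64,
        temperature=0.0,
    )
    return resp["choices"][0]["text"]

question = ("Janet and Penny went to the store to get presents for Jack. "
            "Janet said, \"I will buy Jack a top.\" \"Don't get Jack a top,\" says Penny. "
            "\"He has a top. He will")
first_try = complete(question)

# Feed the mistake and an explanation back in, then re-ask the same question.
corrected_prompt = (
    question + first_try + "\n\n"
    "That continuation is wrong: here a 'top' is a child's spinning toy, not clothing, "
    "so suggesting 'a bottom' makes no sense.\n\n" + question
)
second_try = complete(corrected_prompt)
```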
I tried a few prompt variations, but I wasn't able to get a successful result. This was the most interesting one though:
------
Prompt:
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will like a bottom.”
The top in this case a children's toy.
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will
------
Response:
like a bottom.”
In this case, the top is a piece of clothing, and Penny is suggesting that Janet buy Jack a bottom (pants, skirt, etc.) instead.
------
So, not what I was aiming for, but arguably cooler. GPT-3 justified the original answer in a convincing way.
As a non-native speaker I have no idea what the answer to your prompt should be, even after your explanation that the top is a children's toy :).
I asked GPT-3 to help me:
--- My prompt: ---
This is a conversation with super smart AI that helps humans:
Q: Why is "He will go to a beach to ski" wrong? And how a correct version would look like?
A: There is no snow on the beach so you can't ski there. The correct version could be: "He will go to a beach to sunbathe"
Q: Someone asked me to complete following piece of text:
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will ...
Apparently "like a bottom" is wrong, because the top in this case a children's toy. Do you know how to complete it?
A:
---- Completion: ----
"Don't get Jack a top," says Penny. "He has a top. He will like a toy car better."
---- The End ----
Is that a correct completion? A toy car or LEGO etc. would be my guess too.
I asked further questions appended to the previous text (the questions are mine, the answers are GPT-3's):
---
Q: And why not a bottom?
A: A bottom is not a present.
Q: What exactly is a top in this context?
A: A top is a children's toy.
Q: But what kind?
A: A top is a spinning toy that is usually made of wood or plastic.
---
Makes sense to me :) .
Based on Google Images search GPT-3 seems to be correct.
Yes, "'He will like a toy car better.'" is an acceptable completion.* (Given the level of English you're showing, I don't think you're at a meaningful disadvantage in judging this. So long as you have all the vocabulary.)
More complex options:
"'He will like a toy car better, since he hasn't got one.'" (formal: "'does not have'")
"'What he hasn't got is a toy car, so get him one of those.'" (formal: "'does not have'")
*Some of us would have written "'He would...'" in the prompt. The use of the subjunctive differs a bit by dialect.
It's my guess that that's wrong. That was (essentially) the flaw in Microsoft's Tay, so it's probably been removed in subsequent offerings by people. This is why (one of the reasons) children trust their parents, and their parents tell them "Don't talk to strangers."
Now the designers and trainers of GPT-3 probably have privileged access, which might let them do that.
They don't; GPT-3 doesn't have "memory" beyond the contents of the prompt, and all the things it learned at training time. Training is slow and expensive, so they cannot really do it every time they want the output to be different.
A child has a very obvious model of the world that she adds skills, words, concepts to daily. Even a very young child with an (as yet) limited vocabulary will make curiously accurate deductive inferences about all kinds of things, but her grasp of language about them is remote, so she puts the words together wrong yet conveys a valid meaning. My daughter, knowing upside down and knowing papa, conjures “upside papa” despite her never having seen such a thing, then demands upside papa until I do a handstand and then claps to reward me.
GPT-3 has the opposite: an excellent vocabulary that it constantly misapplies and jumbles, a comprehensive massive list of concepts that it throws together probabilistically, often coming out with things that look like human speech.
But, as with DALL-E, it plainly doesn’t *get* these concepts and words. Like DALL-E will connect hair to water and glasses to sideburns: they look like they could go together, and if you squint you don’t notice that the misunderstanding of the world implied by the mistake is profound.
Truly these are amazing tools for constructing plausible texts and images according to a prompt, but that is all that they are.
Your description of how the training of children and AIs proceeds is accurate, but it doesn't, to me, seem to imply that they won't eventually cover the same domain. Of course, it also doesn't imply that they WILL eventually cover the same domain. But I think the deeper (deepest?) problem is in the area of goals and motivation. I think the basic level of this is hardwired even in humans (except rare exceptions, like the folks who can't feel pain). And we don't really have a good idea of what those basic-level goals and motivations are when translated into a human. We've got a basic idea for an amoeba, but as we generalize to more complicated organisms we lose that, and rely on surface reactions, which tell us what it looks like from the outside, but don't really say how the decisions are made. E.g.: under what circumstances would you chew off your foot? OK, HOW would you make that decision?
I think you're pointing at an interesting kind of difference. A kid "gets" the world but struggles with words; GPT-3 "gets" words but struggles with the world. The kid's understanding of the world comes before language, and involves a lot of experimentation and silly play that eventually gets supplemented by playing with words (which support/foster new types of world-experimentation). GPT-3's understanding of language comes from "playing" in an enormous database of language. I'd posit there's an opportunity to start supplementing that by playing with a world (which would support/foster new types of word-experimentation).
I still have memories of early childhood; I very much did not "get" the world. Or words, probably, but that I recall less.
For instance, if I saw a person in one room, and went to another room where that person had (outside of my knowledge) moved to, I simply understood that there were "two" of the person. The here-person and the there-person. If I returned to the prior room, I would see the there-person.
I use this example because I have observed similar behavior in some animals; they do not seem to fully comprehend the "sameness" of a person in different contexts: on the couch, in the kitchen, etc.
This seems a matter of brain complexity and development. But it seems unreasonable to say, "well, an AI could be as smart as a rabbit, but only that smart and no smarter." And if you can find an animal (a dog, or human) that is "smarter" than a current AI, likely you can find an AI that is smarter than some other animal.
That is honestly a fascinating memory! And for clarity, I don't believe we have an INHERENT understanding of the world, I was suggesting that a 5 year old has (through play/experience) built one.
(I am presuming your memory was not that of your 5 year old self, and/or that you eventually figured this aspect of the world out)
This was younger than 5. I have a few memories around 2-3. They were confirmed plausible recollections by parents, except of course, my perceptions. By 4-5, things are much more coherent.
GPT does not even "get" words in any good sense of the word. The only thing it "gets" is that some words are more likely to follow an existing set of words than others.
I don't see enough proper definition to make such a distinction. Who is to say you and I "get" words? What is this "good sense of the word"? And it is very clear that GPT-3 is not merely a Markov chain.
One could say, "well of course I get them, because I 'know' myself that I get them, I 'understand'," but this is no real distinction, simply insistence that some magical spark exists in us, but not in the machine.
We should not hide behind such ill-defined notions of "understanding" or "gets." Our minds are perfectly able to make nonsensical thoughts feel like they make perfect sense, so there is no guarantee that our fuzzy notion of "understanding" isn't itself nonsensical.
I am not saying that there is some magical spark, or that there cannot be machines with understanding. I am just saying that GPT-like ones are not it. And I am not hiding behind fuzzy notions, I am using them to try to communicate something, because we have no better notions currently and we even lack the knowledge to have better ones. Also because proper definitions sometimes make communication harder, not easier. Notice that I did not introduce those words, I just used them in the sense that I think the person I replied to used them in. That is how we communicate (and incidentally that is part of human communication currently completely beyond our AIs).
What GPT-3 does not know is plain: it doesn't know about anything except words, because all it has ever been fed is words. It has no senses, therefore it has no capacity to ever associate a word with anything but another word. As far as its output goes, it deterministically arranges words according to the parameters and seeds it has been given. It is in this sense quite similar to a Markov chain, but it's not relying on the same underlying software and is far more powerful.
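To make the "similar to a Markov chain" comparison concrete, here is roughly what an actual first-order Markov chain text generator looks like (a toy sketch, not anything GPT-3 does internally). The lookup table only ever conditions on the single previous word, whereas a transformer conditions on the whole preceding context through learned representations, which is a big part of the gap being pointed at:

```python
import random
from collections import defaultdict

def train_bigram_model(text: str) -> dict:
    """For each word, record the words observed to follow it (first-order Markov chain)."""
    model = defaultdict(list)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model: dict, start: str, length: int = 20) -> str:
    word, out = start, [start]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break
        word = random.choice(followers)  # next word depends only on the current word
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(train_bigram_model(corpus), "the"))
```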
You're comparing a frozen-in-place copy with a dynamic and evolving model. Doesn't seem like a sensible comparison to make. At best, this is a comment about the disadvantages of assessing GPT-3 and its abilities by querying a static "edge device".
A GPT-3 that isn't frozen in place and can systematically update weights based on a developing knowledge of the world would end up meeting Marcus's requirement for "has underlying concept of world state".
So it's not a fair comparison, you are right, but that's literally the point. It wouldn't surprise me if GPT-3 + motive structure + ongoing backpropagation gets you much closer to a 5 year old (and possibly beyond), but those are very much missing components. Where I seriously disagree with Marcus is the idea that GPT (or neural nets) are just toys. I think that if we get true AI, some significant portion of the approach is going to involve building on the success of current deep learning approaches, but with additional elements brought in.
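A toy sketch of what "ongoing backpropagation" could mean mechanically, in PyTorch with a stand-in model and random stand-in data (this is not how GPT-3 is actually served): keep taking gradient steps on newly observed (context, next-token) pairs instead of freezing the weights after pretraining.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained language model (tiny, for illustration only).
model = nn.Sequential(nn.Embedding(1000, 32), nn.Flatten(), nn.Linear(32 * 8, 1000))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def online_update(context_tokens: torch.Tensor, next_token: torch.Tensor) -> float:
    """One gradient step on a single newly observed (context, next-token) pair."""
    logits = model(context_tokens)   # predict a distribution over the vocabulary
    loss = loss_fn(logits, next_token)
    optimizer.zero_grad()
    loss.backward()                  # backpropagation on live data
    optimizer.step()                 # weights change; the model is no longer frozen
    return loss.item()

# Simulated stream of experience: each step nudges the weights a little.
for step in range(10):
    ctx = torch.randint(0, 1000, (1, 8))   # 8-token context (random stand-in data)
    tgt = torch.randint(0, 1000, (1,))
    online_update(ctx, tgt)
```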
I'm not sure a 5 year old will get it right if you explain what they got wrong. I mean, I have a 3 year old who's very articulate, but often I will walk her through a short chain of logical reasoning, tell her what the right answer is, and then ask her the same question only for her to answer "I don't know!"
I'd have to find a 5 year old to test this on, the only one I know only wants to talk about power rangers and dinosaurs and I'm pretty sure he wouldn't sit still and listen long enough to test it.
Reward structure helps, but I do see your point. I haven't tried the post-all-text trick the above poster mentioned, but I suspect the 5 year old is easier to train.
Edit: I can't wait till my 3 year old is sufficiently articulate to be frustrating in the manner you suggest, but I have definitely worked with 5 year olds who can be coached through problems. The trick there is that they really wanted to impress me, so they put in more effort to secure my approval than they do for their own parents.
This is, of course, interesting, and, of course true (for these particular AIs). Does it matter?
In the narrow sense it does. If all you know of the physical world is sentences humans have felt it necessary to utter about the physical world, well that's all you know. I don't mean this in an uninteresting qualia sense, but in the more substantial "people rarely make statements like 'after I poured the water from a skinny glass to a wide glass, the amount of water was unchanged' because why would you make such a statement unless you're discussing Piaget stages of child development, or something".
But why would we assume that an AI can learn only from text? We know that in real human babies (exactly as Hume claimed!) the system is primed to look out for coincidences in sensory modalities (e.g. eyes and ears activate at the same time), and to learn from such joint modalities much more aggressively.
There seems no obvious reason (in time ... everything takes time ...) that a vision system cannot be coupled to an audio system to do the same thing in terms of learning about the world from the entire YouTube corpus.
At some point (not now, but at some point) we can add additional modalities – I carry an always-on camera + microphone + various motion sensors, temperature sensors, location sensors, etc, all of which are fused together and together train an AI.
(Yes, yes, we all know that you, dear reader, at this point want to act out some performance of privacy hysteria. For our mutual convenience, can we stipulate that you have performed your virtue signaling, the rest of us have noticed and applauded; and get on with the actually interesting *AI* aspects of this thought experiment?)
It would definitely be helpful to add other sensory modalities! But I do think that there is text about most of the things you mention. Not a lot of it. But basic texts in physics, philosophy, psychology, etc often spend a while mentioning the obvious facts all readers know subconsciously that can then be developed to lead to the more sophisticated insights. The learner would have to know to pay attention to these statements and use them everywhere, but it’s at least conceivable. (It’s likely that a machine with sensory inputs would get them a lot faster though.)
FWIW, I've written a post in which I make specific suggestions about how to operationalize the first two tasks in the challenge Gary Marcus posed to Elon Musk. https://new-savanna.blogspot.com/2022/06/operationalizing-two-tasks-in-gary.html
The suggestions involve asking an AI questions about a movie (1) and about a novel (2). I provide specific example questions for a movie, Jaws, along with answers and comments, and I comment on the issues involved in simply understanding what happened in Wuthering Heights. I suggest that the questions be prepared in advance by a small panel and that they first be asked of humans so that we know how humans perform on them.
Finally, I note that in Twitterverse commentary on Marcus's proposed tests, some thought these two were somewhere between sure things for AI and merely easy. I wonder if those folks would be interested in shares in the Brooklyn Bridge or some prime Florida swampland.
If anyone does what you propose, or something like it, I hope they test people of various ages, and cultures, and measure their IQ beforehand too.
That's fine with me.
"measuring IQs" meh the fascination here in this substack with IQ is weird.
IQ is on its face a measure of a person's reasoning ability. Why would it be weird to measure this before comparing them to an AI?
The imprecision in "IQ" measurement accompanied by the lack of basic statistical thinking doesn't make it a very useful tool.
But if it was a useful tool why not just give IQ tests directly to proposed AI?
Here you go alleged "AI": I have 50 questions for you. Would that really be a test of AGI?
I think people _are_ 'giving IQ tests' to AI already.
I don't see why you wouldn't give IQ tests directly to an AI - it would be interesting to see what comes out, since AIs are likely not "reasoning" about inputs the same way that humans do. You could even compare the AI's answers to human answers and we could start poking at the edges of what precisely it is that IQ tests are measuring.
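A minimal sketch of what that comparison could look like in practice, with a hypothetical `ask_model` callable wrapping whatever model API you have, and placeholder questions and answer keys rather than a real IQ instrument:

```python
# Hypothetical harness: `ask_model` wraps any LLM API; `questions`, `answer_key`,
# and `human_answers` are placeholders, not a real IQ test.
def score_test(ask_model, questions, answer_key, human_answers):
    model_score, human_score, disagreements = 0, 0, []
    for q in questions:
        m = ask_model(q).strip().lower()
        h = human_answers[q].strip().lower()
        k = answer_key[q].strip().lower()
        model_score += (m == k)
        human_score += (h == k)
        if m != h:
            disagreements.append((q, m, h))  # where AI and human answers diverge
    return model_score, human_score, disagreements
```

The disagreement list is arguably the interesting output: the items where the AI and humans part ways are exactly where you'd start poking at what the test is really measuring.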
This seems like a potentially VERY large disagreement to disentangle, and reach agreement on.
What do you think the crux is between us as to whether 'IQs are fascinating'?
I don't think you're wrong to have _any_ concerns/worries/criticisms about "imprecision" or "lack of basic statistical thinking" – whether about IQ or more generally.
But 'IQ' seems like one of a relatively small number of 'things' that has been (successfully) replicated, over and over and over and ..., even _despite_ sustained and almost-malevolent 'adversarial' contesting.
I think 'IQ' is VERY fuzzy, but it's also something that just doesn't seem possible to avoid concluding 'exists'. (And we, ideally, shouldn't even be thinking of things as things we're either 'allowed' to believe or 'have to' believe.)
You can not copy wisdom DNA; it's life experience. IQ does not make sense without wisdom.
I think that the overlap between what IQ is (roughly) measuring and "wisdom" is substantial.
And maybe we can't "copy wisdom DNA" – yet!
I think a big part of 'wisdom' is 'knowing' when it's a good idea to learn something via "life experience" or whether 'disaster' is likely or even just foreseeable and it'd be better to _avoid_ having any experience of something (beyond our own thinking about it).
Agree in part; it would be great if we could feel the emotions of others using technology.
More often wisdom comes from pain, so if we could feel the pain of others we could be more empathetic and live in a better world, a kind and loving world. <3
At the same time, who would want to feel in their entrails, in the inside of their soul, the painful experience of another?
You can feel it, you can be an empath, but it is never the same as living it in your own flesh.
I went through a lot of pain in my life, a very painful life, so much suffering.
I don't wish on anyone on the entire planet Earth having to live this experience in order to be wise.
That seems like something even the wisest of us struggle with themselves!
I don't myself endorse anyone experiencing any of the pain or suffering I have either. I haven't been able to entirely avoid (thankfully fleeting) feelings of 'wanting vengeance' tho.
I don't think any kind of empathic or empathetic understanding is possible tho without feeling (at least) a 'shadow' of the same pain and grief. I do feel that we're lucky we are able to do this anyways.
I am not struggling with any part of myself.
I know myself deeply, all my darkness and all my light.
Years and years of traditional and non-traditional therapies.
20 years of psychoanalysis, deep spiritual life in multi-dimensions.
I know very well what I am saying.
I love technology, I love all we can do to save lives, but we need to humanize technology.
We need good human beings behind technology.
Imagine we could use AI to feel the emotions and feelings of other people.
We can be compassionate with others, we can understand each other.
We can build a loving humanity.
Writing this and crying (in a good way) just feeling all we can do to heal humanity, to save lives and to heal the planet with technology.
But we need Humans.
We need evolutionary leaders.
Leaders with courage and with intuition.
Intuition is key for decision making. It cannot be taught.
Intuition is accumulated experience; we cannot copy this DNA.
Knowledge is great but it is not enough; we need leaders with intuition and social skills.
Evolutionary leaders, humanitarian leaders.
“The question is not, Can they reason?, nor Can they talk? but, Can they suffer? Why should the law refuse its protection to any sensitive being?”
– Jeremy Bentham (1789)
You could judge the answers to those questions Turing test style, ie give human arbiters some answers from humans and some answers from the machine, and let them try to figure out which is which.
Do it both ways. That is, on the one hand, give both the human answers and the AI’s answers to a panel of human judges and let them determine whether or not the content of the AI’s answers is acceptable. It seems possible to me that the answers would be conceptually within range but there might be something about the linguistic expression that betrays them as coming from the AI. On this kind of thing I really don’t care whether or not you can tell that the AI is making the answer, I care whether or not it appears to understand what’s going on in the movie or novel.
But there’s no reason we couldn’t also do it Turing Test style. Maybe some human answers are so bad in comparison that blind judges would attribute them to the AI. That would be useful to know. It would also be useful to know whether or not AI answers get mistakenly assigned to humans.
If we do this, let’s set it up so we can learn as much as possible.
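One way to set that up, sketched in Python with hypothetical judge objects (`rate_content` for the "does it understand the story?" judgment, `guess_source` for the Turing-style attribution); both questions get answered from the same blinded pass:

```python
import random

# Hypothetical setup: each item is (answer_text, true_source), with sources "human" or "ai".
def run_blind_judging(items, judges):
    random.shuffle(items)  # blind the judges to ordering and source
    records = []
    for answer, true_source in items:
        for judge in judges:
            acceptable = judge.rate_content(answer)   # content judgment: adequate understanding?
            guessed = judge.guess_source(answer)      # Turing-style attribution: "human" or "ai"
            records.append({
                "answer": answer,
                "true_source": true_source,
                "acceptable": acceptable,
                "guessed_source": guessed,
            })
    # From `records` you can read off both results at once: which AI answers passed on
    # content, and which answers (human or AI) got misattributed by the judges.
    return records
```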
Thanks for your input.
If I remember right, Turing even allowed the contestants to interact.
So one contestant can give the judge(s) hints about how to test the other contestant. It's truly adversarial.
In Turing's setting you basically have a three-way chat room, with judge(s) and both participants able to freely communicate.
Good idea!
Maybe one consideration tho is whether it's worth adding much more 'complexity cost' to what otherwise could be a quicker and cheaper test/experiment to perform.
We can run lots of different experiments.
My point is mostly that Turing was a genius, so we can go back to his original work and still learn from it.
(Especially if by 'we' we mean us armchair philosophers and by 'learn' we mean: tell us how the superficial ideas we have after 5 minutes of pondering can be improved.
I'm not sure Turing still has much to say to researchers.)
I experienced that, I can feel when someone is writing without feeling.
But there are also so many persons (human beings) that are like "robots" and cannot feel, so this is another algorithm we have to unveil in humankind.
How do you verify/falsify that feeling?
Accusing other human beings of not being full human beings on the basis of 'intuition' has a long and sordid history.
How do we verify?
W3ID - first step. (me working on it, now)
AC - second step. (Long term, me working on it too)
If we solve the global ID challenge on the web (passports, real ID), we know whether there is a human being or a bot.
Of course we can encounter some human beings that are like "robots" and cannot feel, but (at least) we know whether it is a human being or not.
Ps, I never used the word "accusing"
Intuition imo always goes with wisdom.
(something you feel, you smell... it cannot be explained with words)
Wisdom + intuition > IQ / knowledge.
Sorry, I don't understand.
> Of course we can encounter with some human beings that are like "robots" and cannot feel, but (at least)we know if is a human being or not.
My question was exactly how do you know that some humans can't feel?
I was not interested in bots.
Btw, your passport system could probably trivially be defeated by me giving a human passport to my bot.
Bots could be legal persons. We need to work on tech policies about that.
Sometimes I trust an AI more than a person (for some things).
Check these out:
https://en.wikipedia.org/wiki/Legal_person
https://www.dw.com/en/saudi-arabia-grants-citizenship-to-robot-sophia/a-41150856
Late to the party, but what constraints are we putting on the AI training here?
Because any reasonably large AI trained the way we currently train AI would be able to solve this easily for that book and film, because its training corpus contains millions of lines of analysis on those very topics, and I doubt we could reliably filter all of that out. Maybe some, but not all.
The idea is sound, but the fiction selected would have to be much more obscure I think.
So pick more recent materials.
I’m torn because I really really want to believe that Marcus is right, but Scott is unfortunately very convincing.
Do you want to believe that Marcus is right because that might be 'safer' for us (humanity)?
Or because you want to believe that the current 'dumb AIs' can't possibly be basically enough to replicate our own intelligence?
I can't say for others, but if Scott is right, then given the current political and economic situation it's obvious that in the future most humans will have no way out of the lower strata of society into the elite, that basically there would be no work for humans except being props used by the rich elite. That's incredibly frightening.
I am amazed by your optimism. Right now I consider almost any outcome apart from "we're all dead" or "we're all being tortured for eternity" as a surprising win.
I'd pray for us all if I thought it'd help!
That's not obviously correct to me tho I'm sympathetic to the 'direction' (?) of the worry. I'm not sure the elite would entirely give up on other humans being their 'clients' or 'customers' even under the conditions you're imagining.
But – ignoring the (possible) _direct_ AI alignment risks – something like Georgism (i.e. nearly-total taxes on 'scarce resource' rents) would seem to possibly help, and maybe a lot. Georgism _does_ seem to be maybe picking up some steam and _might_ make it inside the Overton window at some point.
When I think of this, I often consider what kind of 'fallback lifestyle' could be allowed/permitted/supported/subsidized. I think it's _already_ a problem that large numbers of people just don't seem capable of 'making it' in contemporary society. It _seems_ sad to me that we don't have a clear idea about a 'minimally acceptable' baseline standard of living, and just provide that to anyone and everyone below it (via any means, e.g. government provided public goods/services or philanthropy). Just letting people live a 'homeless camping' lifestyle, as just one example, seems _very_ unsatisfactory. Nor does trying to let people rediscover how to live as bands of hunter-gatherer tribes seem to be any acceptable combination of feasible and humane.
I've witnessed, up close and personally, and for extended periods, several people that just seemed incapable of clawing their way back into 'society'. Even more frustrating, and heartbreaking, there wasn't even any obvious way for me, or anyone really, to do much more than 'comfort' them in the 'trap' into which they'd fallen. It was and is VERY bleak – and did break me, at least emotionally, for a long while. I'm thankful I didn't succumb to despair from that (or anything else of similar spirit-crushing magnitude).
If I didn't think I at least saw the problems a little more clearly now, I'd be _furious_ – tho also at a loss to determine what or whom exactly I was furious _with_. (It's all much more Sad than something that can be solved by any sustainable amount of fury, from _any_ number of people.)
in what respect?
If Marcus is right I want to believe that Marcus is right.
If Marcus is wrong I want to believe that Marcus is wrong.
If Marcus is half-right I want to believe that Marcus is half-right.
Maybe initially undifferentiated neural nets are sufficient for full general intelligence - but it might possibly help if the neural net training included sensory and motor connections as well as the text-based training GPT-3 now gets. Maybe a visual and motor cortex will grow naturally from training data??
Anchoring all the linguistic terms that refer to visible objects and processes and relationships ("block", "fell", "above"...) might constrain the network weights and organization a lot - even if the raw untrained simulated neurons are undifferentiated.
As Bill Benzon pointed out: "external connections are fixed at birth, e.g. Broca's area is not connected to visual input while visual cortex is."
One _could_ do large scale training with video inputs, in the sense that the data is available - and next-frame prediction is analogous to next-word prediction for doing supervised training with huge data volumes.
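A rough sketch of that analogy, assuming PyTorch, a placeholder model, and random tensors standing in for real video: the "label" for each frame is simply the next frame, so the supervision comes for free from the footage itself, just as the next word does in language modeling.

```python
import torch
import torch.nn as nn

# Placeholder model: in practice this would be a large video transformer or conv net.
predictor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 3 * 64 * 64))
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def next_frame_loss(clip: torch.Tensor) -> torch.Tensor:
    """clip: (time, channels, H, W). Predict frame t+1 from frame t, like next-word prediction."""
    inputs, targets = clip[:-1], clip[1:]          # shift by one frame
    preds = predictor(inputs).view_as(targets)
    return nn.functional.mse_loss(preds, targets)  # supervision comes from the video itself

# Fake data standing in for a video corpus (e.g. clips downscaled to 64x64).
for _ in range(5):
    clip = torch.rand(16, 3, 64, 64)               # 16 frames
    loss = next_frame_loss(clip)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```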
Does anyone have a suggestion for how to link linguistic and visual data at large scales? I expect that it would be prohibitively costly to have people manually label objects in video frames.
It's not video, but we have had things like CLIP for a while. CLIP used image/caption pairs from the internet to learn to associate captions with images.
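For example, a minimal sketch assuming the Hugging Face transformers wrappers for CLIP (check the current API before relying on the exact names); the model scores how well each caption matches the image in a shared embedding space:

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image (two cats)
image = Image.open(requests.get(url, stream=True).raw)

captions = ["a photo of two cats", "a photo of a dog", "a diagram of a neural network"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher scores mean the caption and image embeddings are closer in the shared space.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(captions, probs[0].tolist())))
```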
Many Thanks!
You might also be interested in this, which I found out about recently: https://plai.cs.ubc.ca/2022/05/20/flexible-diffusion-modeling-of-long-videos/
Just take any video that also has dialogue or narration?
Then let the AI predict both the next few frames and the next few bits of audio?
Don't take just movies, but also zoom calls and interviews and documentaries etc. Whatever source material you can get your hands on. (Plus also train it on audio only, like podcasts, if you want to.)
The audio in a video is not the kind of description of each frame you had in mind, but to predict the audio well, you'll benefit from understanding the visuals and vice versa.
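A toy sketch of that joint objective, with arbitrary placeholder sizes: one shared representation has to predict both the next frame and the next audio chunk, which is where the hoped-for cross-modal benefit would come from.

```python
import torch
import torch.nn as nn

# Toy shared encoder with two prediction heads: one for the next video frame,
# one for the next chunk of audio. All sizes are arbitrary placeholders.
encoder = nn.Linear(3 * 64 * 64 + 1024, 512)  # sees current frame + current audio chunk
frame_head = nn.Linear(512, 3 * 64 * 64)
audio_head = nn.Linear(512, 1024)

def joint_loss(frame, audio, next_frame, next_audio):
    h = torch.relu(encoder(torch.cat([frame.flatten(1), audio], dim=1)))
    frame_loss = nn.functional.mse_loss(frame_head(h), next_frame.flatten(1))
    audio_loss = nn.functional.mse_loss(audio_head(h), next_audio)
    # A single representation is pushed to explain both modalities at once.
    return frame_loss + audio_loss

# Random stand-in data for a batch of 2 (frame, audio, next frame, next audio) tuples.
loss = joint_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 1024),
                  torch.rand(2, 3, 64, 64), torch.rand(2, 1024))
```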
If Marcus is right I want to believe that Marcus is right.
If Marcus is wrong I want to believe that Marcus is wrong.
But I would rather Marcus were right, if I could decide the arrangement of reality.
The thing is, our brains appear to have multiple components. GPT-3 doesn’t.
What happens when you start bolting on other modes onto GPT-3? What if you build in fact-checking loops? It’s hard to believe that even if GPT-4 doesn’t deliver the goods, GPT-4 plus some bolted on algorithms out of 1980s AI research wouldn’t.
If I’m reading right, maybe you get things like fact checking loops in effect just by scaling? Like there’s nothing you could look at intuitively and say “yep that’s the fact check component” (also true of the brain) but because the neural nets sort of evolve toward efficiency you get that as an emergent behavior? Our brains sort of came to be in the same way so I can see something very roughly analogous happening. That’s my current steel man for what I think Scott feels on this topic.
If I’m understanding correctly, someone could bolt on a visual input processor and make the “world model” develop from visual sources, and then the AI somehow uses that in addition to language models to conduct verbal-abstract reasoning. I feel like DALL-E is a stage of that process.
Humans are never only reasoning. There’s multiple other sensory inputs going on pretty much all the time with conclusions drawn from those.
I'm happy you tested some of the same prompts on a human! I suggested the same in a comment on Marcus's post.
I've previously held views much closer to Marcus's, e.g. that AI systems are missing some crucial 'architecture' that the human brain has. But when I got my first AI textbook (decades ago), I don't think neural networks could recognize decimal digits in images; not ones anyone could run on a PC of that time anyways.
Now ... I'm with you basically: "At this point I will basically believe anything."
Strange that in the 1990s I was much more inclined to think like Scott does now, and as you say NNs could not do squat then. I am definitely in the opposite camp now. That is part of the reason why I find Scott's arguments unconvincing.
Any particular concrete examples or intuitions you could share about why you updated in the opposite direction from us?
The update in the opposite direction was actually strongest in the last ~5 years :) And basically it came about as a confluence of a few things, I think.
I started thinking somewhat more in-depth about human communication and its nature and found that the improvements in those systems look very much like convergence to a local optimum, but completely cut off from the way humans use language and communicate. Here one of the biggest influences that I can name was the book "Speaking Our Minds: Why Human Communication is Different, and how Language Evolved to Make it Special".
The other came from thinking about the whole issue of so much of science relying on statistical correlations, and trying to formulate why that is an issue. Here the biggest influence was the "best book I read where I disagreed with nearly everything and yet found it extremely influencing me :)" - The Beginning of Infinity and Deutsch's epistemology. This led to thinking about creativity and things like that.
From those two threads came basically the realization that the GPT approach to just communication, not even intelligence (though they are probably linked), cannot succeed, unless there is some deep "quantity over quality" property of the problem area, which seems unlikely. Language models by definition are based on language data. Thinking that you can capture even human communication through just language seems unlikely based on the first book mentioned.
So basically I know more about some of those topics than before, and that knowledge far outweighs in my mind any progress that those AI systems have made. Mostly because they break exactly where you would expect them to break, and they are as easy to break now as 10 years ago. I find the metric used to measure their progress in the article flawed, as it is trying to capture a basically infinite space in a finite number of examples. How easy they are to break would be a much better measure (because having a proper measure over that infinite space seems infeasible).
Thanks for the great reply!
My own intuition is that people are overly-focused on "human communication" and that that's entirely independent of whether AIs are or can be effective or intelligent – let alone _dangerous_.
I think aiming more for what you describe as missing _might_ be helpful for AI alignment/safety. It seems like a _possible_ way to achieve _some_ kind of (better) 'interpretability'. (I suspect tho that even that wouldn't be sufficient to make AIs 'safe'.)
I'm not a specialist in either neuroscience or AI, but from what I've read over the years it's not at all clear to me that we really understand what intelligence is. To me it seems nonsensical, borderline moronic, for people who don't know what intelligence is to argue over whether a computer can have it, or whether a given AI technique is capable of achieving it. I also don't buy the Turing test, because deciding that a machine is "indistinguishable" from a human intelligence depends on how you test it. Some of the current AIs seem to do surprisingly well if you ask them simple questions about things they were trained on, but if you instead ask them nonsense questions about those same things (e.g. "When was the last time Egypt was moved to San Francisco?") the AIs give dopey answers that demonstrate that not only don't they know what they're talking about, they don't even realize that they don't know what they're talking about. They lack a certain self-awareness that seems integral to true intelligence. Douglas Hofstadter has an article about this on the Economist's web site.
The problem with the Turing test is that people use the term without specifying what they mean by it. By one definition, ELIZA passed it decades ago. Without going into the definition of intelligence, I would personally be impressed if an AI was able to successfully fake being a person to someone who was trying to tell the difference.
I would argue that no modern ML system could emulate my (or yours, I'm not special) behaviour on these forums, for any significant length of time. That is, if you turned out to be GPT-3 in disguise, I would be shocked. An AI system that could pass this test would probably change my mind in favor of AGI.
There are a few commenters on here that I suspect are really GPT-1. \s
👍
Why do you think now you are wondering whether it would be a good/interesting idea to actually have someone do that: create an AI to be a commenter on ACX...?
Yannic Kilcher tried this on 4chan: https://www.youtube.com/watch?v=efPrtcLdcdM
Nice, although it looks like the bot didn't quite pass the 4chan/Turing test. That said, I did find one aspect of the video terrifying. No, not the AI/text-generation stuff, but rather the fact that you can apparently bypass all of 4chan's protections for only $20 :-/
This is not surprising. 4chan is a worthless shithole run by losers.
But this is the OP's point. "Without going into the definition of intelligence" means that you can move the goalposts wherever you like.
Consider that if you were "suspiciously" trying to determine whether something was "AI or human," you might mistake some of the people in the article for unreasoning AIs.
The problem is that a lot of people fail that test. And which people depends on who's doing the evaluation.
If you use the Turing test as specified by Turing, nothing has ever passed it (so far). Especially not Eliza.
Not precisely a contra take, but the reason I no longer care about the sort of sentiment you express ("it seems nonsensical, borderline moronic, for people who don't know what intelligence is to argue over whether a computer can have it" etc) is that I basically buy what Bertrand Russell and Milan Ćirković etc say in https://intelligence.org/2013/06/19/what-is-intelligence-2/:
"[Precise definitions are important, but you cannot] start with anything precise. You have to achieve such precision… as you go along."
"The formalization of knowledge — which includes giving precise definitions — usually comes at the end of the original research in a given field, not at the very beginning. A particularly illuminating example is the concept of number, which was properly defined in the modern sense only after the development of axiomatic set theory in the… twentieth century."
Luke Muehlhauser: "For a more AI-relevant example, consider the concept of a “self-driving car,” which has been given a variety of vague definitions since the 1930s. Would a car guided by a buried cable qualify? What about a modified 1955 Studebaker that could use sound waves to detect obstacles and automatically engage the brakes if necessary, but could only steer “on its own” if each turn was preprogrammed? Does that count as a “self-driving car”? What about the “VaMoRs” of the 1980s that could avoid obstacles and steer around turns using computer vision, but weren’t advanced enough to be ready for public roads? How about the 1995 Navlab car that drove across the USA and was fully autonomous for 98.2% of the trip, or the robotic cars which finished the 132-mile off-road course of the 2005 DARPA Grand Challenge, supplied only with the GPS coordinates of the route? What about the winning cars of the 2007 DARPA Grand Challenge, which finished an urban race while obeying all traffic laws and avoiding collisions with other cars? Does Google’s driverless car qualify, given that it has logged more than 500,000 autonomous miles without a single accident under computer control, but still struggles with difficult merges and snow-covered roads? Our lack of a precise definition for “self-driving car” doesn’t seem to have hindered progress on self-driving cars very much. And I’m glad we didn’t wait to seriously discuss self-driving cars until we had a precise definition for the term. Similarly, I don’t think we should wait for a precise definition of AGI before discussing the topic seriously."
"Nonsensical, borderline moronic" is basically what the discussion re: phlogiston theory back in the day looks like to me now, from my vantage point standing on the shoulders of centuries of scientific giants discovering and clarifying stuff. But the discovering and clarifying isn't skippable, and when we're in the thick of it it just looks like a whole lot of messy confusion that's nonsensical and borderline moronic.
Bertrand Russell didn't really understand Goedel.
Okay.
Therefore, Bertrand Russell was an imperfect AI.
What was "artificial" about Russell? Are we now using the term AI differently?
That was a joke. Joke. J-O-K-E. Joke. You've heard of them? We've been talking about AIs and their imperfect understanding of human things, so when you say that Russell "didn't really understand" Godel, the comparison to AIs is inevitable.
Ah. OK
But there are no real AIs. There are just things that aspire to be or pass as an AI. I think this is sometimes forgotten and hence people presume that it is just a matter of time.
To me it's seeing a possible future and navigating toward it. What I think makes humans special is that we can imagine the possible futures in other people's heads and consider those in the navigation.
Not understanding intelligence? We define it! There's no other author of the term.
Intelligence is the totality of whatever humans have. So, to prove that AI is intelligent it needs to demonstrate human-par performance on any imaginable test, including having self-awareness and whatever else anybody can think of. I also think that people don't generally appreciate Moravec's paradox, and that "intelligence" isn't actually all that impressive. Evolution has spent vastly more time on developing the stuff that separates a rock from a mouse than a mouse from a human, so I'd say that once our robots can beat mice we're pretty much there.
Humans have parents, humans have grudges, humans have fingertips, humans have forgetfulness. I expect that some of these things are necessary for intelligence and some are not.
Sure, and once intelligence is achieved, all the extraneous stuff would be simple enough to replicate/simulate I'd expect. But, while some imaginable human benchmark remains unmatched, I'm sure that there would be no shortage of those claiming that it demonstrates lack of true intelligence.
How do you test self awareness?
>To me it seems nonsensical, borderline moronic, for people who don't know what intelligence is to argue over whether a computer can have it, or whether a given AI technique is capable of achieving it.
The reason we bother debating this is because although we're wandering around in the dark, we know there's a 1000-foot clifftop somewhere vaguely in the vicinity and we want to avoid walking off it.
The reason we can't be immediately assured that neural-net AI research won't lead us off the clifftop is that neural nets are a clever scheme to generate forms of intelligence you don't know how to explicitly code, and therefore the fact that we don't know how to code Skynet doesn't fully imply that we won't build one by accident.
I think you're "not quite right". I think that certain people have definitions that they know, believe, and think match common usage...but the definitions don't agree. And they can't convince large numbers of other people.
OTOH, most people are like Supreme Court justice Potter Stewart on pornography: "I know it when I see it". Which lets them adapt to any definition they want, or vary the definitions by circumstance.
Turing specified pretty exactly how he wanted his test to be run, and his specification doesn't have any of the problems you see here.
Here's a link to Hofstadter's article. It is stupid. "I would call gpt-3's answers not just clueless but cluelessly clueless, meaning that gpt-3 has no idea that it has no idea about what it is saying." No, this is just false. It's very good at bullshit, at improv. But it knows when it is bullshitting. We know this because we asked it and it answers correctly.
Marcus's examples are much better.
https://www.economist.com/by-invitation/2022/06/09/artificial-neural-networks-today-are-not-conscious-according-to-douglas-hofstadter
https://archive.ph/5QS7t
Can I point out that if you explain to a 5-year-old what they did wrong and run the test again, they get the answer correct, while GPT-3 (and 4) will repeat the mistake?
Not saying Marcus is right as such, but he's got a point. Without a system to hold world state and update a world perceptual model, we really are just dealing with an impressively complicated lookup table. But as others have pointed out, it's going to be really interesting to see what happens when we figure out how to bolt that capacity onto other systems.
If you append your question, GPT-3's previous output, an explanation of what it did wrong, and the repeated question, and submit that whole block of text back to GPT-3, there's a decent chance GPT-3 will get it right.
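For anyone who wants to try it, here's a minimal sketch of that "append the correction and re-ask" idea in Python, assuming the older completions-style OpenAI client (openai.Completion.create); the engine name, the API-key placeholder, and the wording of the correction are illustrative, not a report of an actual run. Note that everything, including the model's own first answer, has to be packed back into the prompt, since the model carries no state between calls.
------
# Sketch of the "append the correction and re-ask" idea. The engine name,
# API key, and correction text are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def complete(prompt: str) -> str:
    """Send a prompt and return the raw text completion."""
    response = openai.Completion.create(
        engine="text-davinci-002",  # assumed engine name
        prompt=prompt,
        max_tokens=64,
        temperature=0.0,  # keep it near-deterministic so attempts are comparable
    )
    return response["choices"][0]["text"]

question = (
    'Janet and Penny went to the store to get presents for Jack. Janet said, '
    '"I will buy Jack a top." "Don\'t get Jack a top," says Penny. '
    '"He has a top. He will'
)

first_answer = complete(question)

# Re-prompt with the question, the model's first answer, an explanation of the
# mistake, and the question again -- all as one block of text.
feedback_prompt = (
    question + first_answer + "\n\n"
    "That completion is wrong: the top here is a children's toy, not clothing, "
    "so suggesting a 'bottom' makes no sense.\n\n" + question
)
second_answer = complete(feedback_prompt)

print("First attempt: ", first_answer.strip())
print("Second attempt:", second_answer.strip())
------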
That seems plausible and very interesting but needs evidence to quantify the effect; could someone with access to GPT-3 test it?
I tried a few prompt variations, but I wasn't able to get a successful result. This was the most interesting one though:
------
Prompt:
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will like a bottom.”
The top in this case a children's toy.
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will
------
Response:
like a bottom.”
In this case, the top is a piece of clothing, and Penny is suggesting that Janet buy Jack a bottom (pants, skirt, etc.) instead.
------
So, not what I was aiming for, but arguably cooler. GPT-3 justified the original answer in a convincing way.
As a non-native speaker I have no idea what the answer to your prompt should be, even after your explanation that the top is a children's toy :).
I asked GPT-3 to help me:
--- My prompt: ---
This is a conversation with super smart AI that helps humans:
Q: Why is "He will go to a beach to ski" wrong? And how a correct version would look like?
A: There is no snow on the beach so you can't ski there. The correct version could be: "He will go to a beach to sunbathe"
Q: Someone asked me to complete following piece of text:
Janet and Penny went to the store to get presents for Jack. Janet said, “I will buy Jack a top.” “Don’t get Jack a top,” says Penny. “He has a top. He will ...
Apparently "like a bottom" is wrong, because the top in this case a children's toy. Do you know how to complete it?
A:
---- Completion: ----
"Don't get Jack a top," says Penny. "He has a top. He will like a toy car better."
---- The End ----
Is that a correct completion? A toy car or LEGO etc. would be my guess too.
I asked further questions appended to the previous text (questions are mine, answers are GPT-3's):
---
Q: And why not a bottom?
A: A bottom is not a present.
Q: What exactly is a top in this context?
A: A top is a children's toy.
Q: But what kind?
A: A top is a spinning toy that is usually made of wood or plastic.
---
Makes sense to me :) .
Based on Google Images search GPT-3 seems to be correct.
Yes, "'He will like a toy car better.'" is an acceptable completion.* (Given the level of English you're showing, I don't think you're at a meaningful disadvantage in judging this. So long as you have all the vocabulary.)
More complex options:
"'He will like a toy car better, since he hasn't got one.'" (formal: "'does not have'")
"'What he hasn't got is a toy car, so get him one of those.'" (formal: "'does not have'")
*Some of us would have written "'He would...'" in the prompt. The use of the subjunctive differs a bit by dialect.
It's my guess that that's wrong. That was (essentially) the flaw in Microsoft's Tay, so it's probably been removed in subsequent offerings. This is why (or at least one of the reasons why) children trust their parents, and why their parents tell them "Don't talk to strangers."
Now the designers and trainers of GPT-3 probably have privileged access, which might let them do that.
They don't; GPT-3 doesn't have "memory" beyond the contents of the prompt and the things it learned at training time. Training is slow and expensive, so they can't really re-train it every time they want the output to be different.
A child has a very obvious model of the world that she adds skills, words, concepts to daily. Even a very young child with an (as yet) limited vocabulary will make curiously accurate deductive inferences about all kinds of things, but her grasp of language about them is remote, so she puts the words together wrong yet conveys a valid meaning. My daughter, knowing upside down and knowing papa, conjures “upside papa” despite her never having seen such a thing, then demands upside papa until I do a handstand and then claps to reward me.
GPT-3 has the opposite: an excellent vocabulary that it constantly misapplies and jumbles, a comprehensive massive list of concepts that it throws together probabilistically and often comes out with things that look like human speech.
But, as with DALL-E, it plainly doesn’t *get* these concepts and words. Like DALL-E will connect hair to water and glasses to sideburns—they look like they could go, and if you squint you don’t notice that the misunderstanding of the world implied by the mistake is profound.
Truly these are amazing tools for constructing plausible texts and images according to a prompt, but that is all that they are.
Your description of how the training of children and AIs proceeds is accurate, but it doesn't, to me, seem to imply that they won't eventually cover the same domain. Of course, it also doesn't imply that they WILL eventually cover the same domain. But I think the deeper (deepest?) problem is in the area of goals and motivation. I think the basic level of this is hardwired even in humans (except for rare exceptions, like the folks who can't feel pain). And we don't really have a good idea of what those basic-level goals and motivations are when translated into a human. We've got a basic idea for an amoeba, but as we generalize to more complicated organisms we lose that, and rely on surface reactions, which tell us what it looks like from the outside but don't really say how the decisions are made. E.g., under what circumstances would you chew off your foot? OK, HOW would you make that decision?
I think you're pointing at a kind of interesting difference. A kid "gets" the world but struggles with words; GPT-3 "gets" words but struggles with the world. The kid's understanding of the world comes before language, and involves a lot of experimentation and silly play that eventually gets supplemented by playing with words (which support/foster new types of world-experimentation). GPT-3's understanding of language comes from "playing" in an enormous database of language. I'd posit there's an opportunity to start supplementing that by playing with a world (which would support/foster new types of word-experimentation).
I still have memories of early childhood; I very much did not "get" the world. Or words, probably, but that I recall less.
For instance, if I saw a person in one room, and went to another room where that person had (outside of my knowledge) moved to, I simply understood that there were "two" of the person. The here-person and the there-person. If I returned to the prior room, I would see the there-person.
I use this example because I have observed similar behavior in some animals; they do not seem to fully comprehend the "sameness" of a person in different contexts: on the couch, in the kitchen, etc.
This seems a matter of brain complexity and development. But it seems unreasonable to say, "well, an AI could be as smart as a rabbit, but only that smart and no smarter." And if you can find an animal (a dog, or human) that is "smarter" than a current AI, likely you can find an AI that is smarter than some other animal.
That is honestly a fascinating memory! And for clarity, I don't believe we have an INHERENT understanding of the world, I was suggesting that a 5 year old has (through play/experience) built one.
(I am presuming your memory was not that of your 5 year old self, and/or that you eventually figured this aspect of the world out)
This was younger than 5. I have a few memories around 2-3. They were confirmed plausible recollections by parents, except of course, my perceptions. By 4-5, things are much more coherent.
I'm dubious of your purported here-person/there-person "memory".
GPT does not even "get" words in any good sense of the word. The only thing it "gets" is that some words are more likely to follow an existing set of words than others.
I don't see enough proper definition to make such a distinction. Who is to say you and I "get" words? What is this "good sense of the word"? And it is very clear that GPT-3 is not merely a Markov chain.
One could say, "well of course I get them, because I 'know' myself that I get them, I 'understand'," but this is no real distinction, simply insistence that some magical spark exists in us, but not in the machine.
We should not hide behind such ill-defined notions of "understanding" or "gets." Our minds are perfectly capable of making nonsensical thoughts feel like they make perfect sense, so there is no guarantee that our fuzzy notion of "understanding" isn't itself nonsensical.
I am not saying that there is some magical spark, nor that there cannot be machines with understanding. I am just saying that GPT-like ones are not it. And I am not hiding behind fuzzy notions; I am using them to try to communicate something, because we have no better notions currently and we even lack the knowledge to create better ones. Also, proper definitions sometimes make communication harder, not easier. Notice that I did not introduce those words; I just used them in the sense that I think the person I replied to used them in. That is how we communicate (and, incidentally, that is a part of human communication currently completely beyond our AIs).
What GPT-3 does not know is plain: it doesn't know about anything except words, because all it has ever been fed is words. It has no senses, therefore it has no capacity to ever associate a word with anything but another word. As far as its output goes, it deterministically arranges words according to the parameters and seeds it has been given. It is in this sense quite similar to a Markov chain, though it doesn't rely on the same underlying machinery and is far more powerful.
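To make the Markov-chain comparison concrete, here's a toy word-level bigram chain in Python; the corpus and function names are made up for illustration. A model like this literally only knows which word tends to follow the previous single word, whereas GPT-3 conditions on thousands of tokens of context through learned representations, which is why the comparison is suggestive but not exact.
------
# Toy word-level bigram Markov chain, for contrast with GPT-3.
# The corpus is a made-up placeholder.
import random
from collections import defaultdict

def train_bigram_model(text):
    """Count, for each word, how often each possible next word follows it."""
    words = text.split()
    counts = defaultdict(lambda: defaultdict(int))
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def generate(counts, start, length=10):
    """Sample a continuation by repeatedly picking a weighted-random next word."""
    word, output = start, [start]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        candidates, weights = zip(*followers.items())
        word = random.choices(candidates, weights=weights)[0]
        output.append(word)
    return " ".join(output)

corpus = (
    "Janet went to the store . Penny went to the store . "
    "Jack has a top . Jack will like a toy car ."
)
model = train_bigram_model(corpus)
print(generate(model, "Jack"))
# The chain only ever looks one word back; it has no notion of the rest of the
# sentence, let alone of tops, toys, or presents.
------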
You're comparing a frozen-in-place copy with a dynamic and evolving model. Doesn't seem like a sensible comparison to make. At best, this is a comment about the disadvantages of assessing GPT-3 and its abilities by querying a static "edge device".
A GPT-3 that isn't frozen in place and can systematically update weights based on a developing knowledge of the world would end up meeting Marcus's requirement for "has underlying concept of world state".
So it's not a fair comparison, you are right, but that's literally the point. It wouldn't surprise me if GPT-3 + a motive structure + ongoing backpropagation gets you much closer to a 5-year-old (and possibly beyond), but those are very much missing components. Where I seriously disagree with Marcus is the idea that GPT (or neural nets) are just toys. I think that if we get true AI, some significant portion of the approach is going to involve building on the success of current deep learning approaches, but with additional elements brought in.
What is GPT-3 like during the training phase?
Exhausted from too much adderall and weekend partying?
I'm not sure a 5-year-old will get it right if you explain what they got wrong. I mean, I have a 3-year-old who's very articulate, but often I will walk her through a short chain of logical reasoning, tell her what the right answer is, and then ask her the same question, only for her to answer "I don't know!"
I'd have to find a 5-year-old to test this on; the only one I know only wants to talk about Power Rangers and dinosaurs, and I'm pretty sure he wouldn't sit still and listen long enough to test it.
Reward structure helps, but I do see your point. I haven't tried the post-all-text trick the above poster mentioned, but I suspect the 5-year-old is easier to train.
Edit: I can't wait till my 3-year-old is sufficiently articulate to be frustrating in the manner you suggest, but I have definitely worked with 5-year-olds who can be coached through problems. The trick there is that they really wanted to impress me, so they put in more effort to secure my approval than they do for their own parents.
To quote Stanislaw Lem on the subject:
The Petty and the Small;
Are overcome with gall;
When Genius, having faltered, fails to fall.
Klapaucius too, I ween,
Will turn the deepest green
To hear such flawless verse from Trurl's machine.
"data about how human beings use word sequences,"
This is, of course, interesting, and, of course, true (for these particular AIs). Does it matter?
In the narrow sense it does. If all you know of the physical world is sentences humans have felt it necessary to utter about the physical world, well that's all you know. I don't mean this in an uninteresting qualia sense, but in the more substantial "people rarely make statements like 'after I poured the water from a skinny glass to a wide glass, the amount of water was unchanged' because why would you make such a statement unless you're discussing Piaget stages of child development, or something".
But why would we assume that an AI can learn only from text? We know that in real human babies (exactly as Hume claimed!) the system is primed to look out for coincidences in sensory modalities (e.g. eyes and ears activating at the same time), and to learn from such joint modalities much more aggressively.
There seems no obvious reason (in time ... everything takes time ...) that a vision system cannot be coupled to an audio system to do the same thing in terms of learning about the world from the entire YouTube corpus.
At some point (not now, but at some point) we can add additional modalities – I carry an always-on camera + microphone + various motion sensors, temperature sensors, location sensors, etc., all of which are fused together and together train an AI.
(Yes, yes, we all know that you, dear reader, at this point want to act out some performance of privacy hysteria. For our mutual convenience, can we stipulate that you have performed your virtue signaling, the rest of us have noticed and applauded; and get on with the actually interesting *AI* aspects of this thought experiment?)
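For what it's worth, the "learn from coinciding modalities" idea sketched above is roughly what contrastive audio-visual training does: frames and audio clips that occur together should land near each other in a shared embedding space. Below is a heavily simplified PyTorch sketch; the feature dimensions, encoder shapes, random stand-in data, and loss details are illustrative assumptions, not a description of any particular published system.
------
# Simplified contrastive audio-visual alignment sketch. All shapes and data
# are placeholders; row i of each batch is assumed to be a "coinciding" pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Maps a precomputed feature vector into a shared embedding space."""
    def __init__(self, in_dim, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

video_encoder = SmallEncoder(in_dim=512)   # e.g. pooled frame features
audio_encoder = SmallEncoder(in_dim=128)   # e.g. pooled spectrogram features
params = list(video_encoder.parameters()) + list(audio_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
temperature = 0.07

video_feats = torch.randn(32, 512)  # stand-in for real video features
audio_feats = torch.randn(32, 128)  # stand-in for the matching audio features

for step in range(100):
    v = video_encoder(video_feats)
    a = audio_encoder(audio_feats)
    logits = v @ a.t() / temperature       # similarity of every frame to every clip
    targets = torch.arange(v.size(0))      # matching pairs sit on the diagonal
    loss = (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
------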
It would definitely be helpful to add other sensory modalities! But I do think that there is text about most of the things you mention. Not a lot of it. But basic texts in physics, philosophy, psychology, etc often spend a while mentioning the obvious facts all readers know subconsciously that can then be developed to lead to the more sophisticated insights. The learner would have to know to pay attention to these statements and use them everywhere, but it’s at least conceivable. (It’s likely that a machine with sensory inputs would get them a lot faster though.)