The problem is that no one agrees what counts as "most interesting", and no one knows where to even begin researching "flourish". It's relatively easy to train ChatGPT to e.g. avoid bad words: every time it outputs a bad word, you tell it "0", and when it doesn't, you tell it "1". But how do you tell it to "flourish"? I don't understand this concept well enough myself, so I couldn't even explain it to another human, and you want me to convert that into a mathematical formula?
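For concreteness, a purely illustrative sketch (made-up word list, hypothetical function names) of the asymmetry being described: the 0/1 signal is trivial to write down for bad words, and nobody knows how to write the analogous function for "flourishing".

    # Illustrative only: a 0/1 reward is easy to spell out for "avoid bad words",
    # but there is no analogous explicit function for "flourishing".
    BAD_WORDS = {"darn", "heck"}  # hypothetical placeholder list

    def bad_word_reward(output: str) -> int:
        """1 if the output contains no bad words, else 0."""
        return 0 if any(w in BAD_WORDS for w in output.lower().split()) else 1

    def flourishing_reward(output: str) -> int:
        """Nobody knows how to write this explicitly."""
        raise NotImplementedError("'flourishing' has no agreed-upon 0/1 test")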
imo, the fact that you state that you don't understand this concept shows that you do have a vague conception. (It's not that you're unfamiliar, just that you don't quite know the limits or how to describe it in your own words. This is in contrast to pharengisis, which you don't know what that is at all, and couldn't call it when you see it.)
but that's maybe beside the point. It would be a mask that people would be projecting, much like niceness would mean not saying bad words and stuff, instead of being usefully honest.
So the question is what a mask of 'interesting' or 'flourishing' would look like. It would depend a lot on the people doing the judging.
LaMDA was specifically trained on “interestingness” as one of its criteria, presumably based on whatever their MTurks thought was interesting. I realize that’s not a satisfying answer, but it might be a good enough answer to get the job done convincingly.
The whole point of the OP is that we don't have to convert what we want into a mathematical formula, or even understand it very well ourselves, to be able to train an AI to do it.
We just have to be able to tell if an example counts as the thing we want, and reward that, and if an example counts as the opposite of the thing we want, punish that.
Yes, which amounts to a mathematical formula that converts some input to "0" or "1". At present, you cannot do that with concepts such as "interesting" or "flourishing".
Yes, and if you do that while training a model, you will generate maybe 0.01% of the training data required to produce a working model that will reliably produce content that you, personally, find interesting. Even if you could somehow collect enough training data by yourself, this model would be very unlikely to produce content that anyone else finds interesting.
Right, but if we crowdsource generating the training data, we can train a model to produce content that lots of people find interesting. That's how they trained ChatGPT to be helpful.
(Admittedly it will not be maximally interesting to any one person because the differences in people's tastes will average out, like how the front page of r/funny is far from the funniest stuff on the internet.)
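As a rough sketch (hypothetical data, made-up labels) of what that crowdsourcing amounts to: many raters' 0/1 judgments get averaged into a per-example reward, and the averaging is exactly why the result drifts toward the middle of everyone's taste rather than any one person's peak.

    from statistics import mean

    # hypothetical crowd labels: {candidate_text: [rater1, rater2, ...]}
    ratings = {
        "a pun about quantum cats": [1, 0, 1, 1],
        "a list of tax codes":      [0, 0, 1, 0],
    }

    # average the 0/1 judgments into a per-example reward signal
    reward = {text: mean(labels) for text, labels in ratings.items()}
    print(reward)  # e.g. 0.75 vs 0.25 - usable for fine-tuning without ever
                   # writing down a definition of "interesting"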
John Lennon's "Across the Universe" should have been one of the most famous songs of the Sixties, but it's not. I had to google the phrase "Nothing's gonna change my world" to realize it's on the first album I ever bought, "Let It Be."
Why? The Beatles' recording isn't very good. John complained about it just before his death [from Wikipedia]:
"In his 1980 Playboy interview, Lennon says that the Beatles "didn't make a good record of it" and says of the Let It Be version that "the guitars are out of tune and I'm singing out of tune ... and nobody's supporting me or helping me with it and the song was never done properly".[20] He further accused McCartney of ruining the song:
"Paul would ... sort of subconsciously try and destroy a great song ... usually we'd spend hours doing little detailed cleaning-ups of Paul's songs; when it came to mine ... somehow this atmosphere of looseness and casualness and experimentation would creep in. Subconscious sabotage."
If they'd given it the full "A Day In the Life" effort, it would be deservedly famous.
I always liked that song. I'm a sucker for epic ballads with a bit of a melancholy sound to them. My favourite Beatles' song is The Long And Winding Road.
I'm not sure I agree with Lennon. I think the song is very good as it is.
As well, McCartney complained that Phil Spector had ruined The Long And Winding Road with the "sappy strings". I completely disagree - I think the raw version as preferred by McCartney is not nearly as good.
I would say that I'm simply out of sync with the general public. Didn't people in the UK vote for Hey Jude as their favourite Beatles' song a few years ago? It is one of my least favourites.
I recognised the reference but I thought the song was from Sergeant Pepper. I was confusing it with 'Within You Without You'. I don't see any obvious way it could be improved.
I am with you too on 'The Long And Winding Road'. Of course the valedictory air (if not entirely sense) fits well enough with the album history in context.
'Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think “Huh, I guess I’m the Universe. Weird.”'
It was just that little bit at the end from Scott - it pinged this song straight into my head. Especially with the opening lyrics "Words are flowing out like endless rain into a paper cup" ... GPT-3, just a whole universe of gradients and words spilling over. It starts feeling a bit psychedelic.
I actually prefer the stripped down take of this song. (I think it was on Let it Be Naked, which took out most of Spector's production). I love the "looseness" of Lennon, maybe I'm odd in that I'd rather hear the acoustic demos of his songs than the polished versions.
If you haven't seen it, Get Back is a great documentary (if you enjoy basically just watching hours of band rehearsals and writing). McCartney does come across a little overbearing, yet he is extremely competent at his craft and seems often just trying to do what he thinks is best. Maxwell's Silver Hammer really is atrocious, though.
Agreed on Maxwell's Silver Hammer! It's as though McCartney were preparing for his fluffy post-Beatles future.
Most post-Beatles McCartney is a hard sell for me. At the time I liked Band On The Run, and Venus And Mars, but wouldn't be able to listen to them now. Silly Love Songs? Gag!
Lennon needed McCartney's genius with melodies, and McCartney needed Lennon's gravitas.
Yeah the whole way through I was like - "wait. That's everyone." Maybe we shouldn't think of GPT as "the AI" but as the substrate - like a friendly convenient laws of physics in which outcomes occur by habit rather than law.
Of course, then the Agent/Genie/Oracle discourse is back on the table. GPT can't want, but H3 can.
The upside is, if AGI will be created on top of a habitual physics pre-seeded with human patterns, aligning it reduces to aligning a very smart human.
The downside is this is unsolved and may be harder.
But the upside is it may happen on its own? It happens routinely in real life.
Of course, the downside is it sometimes fails in real life too, and that usually happens in cases where the built-in empathy hardware is damaged. Which GPT-3 doesn't have.
But then the upside is that may be easier? You just need to understand how feelings work, then give GPT the ability to find agents in its worldmodel and set its feelings to the feelings of those agents? Then alignment should happen on its own?
In summary, the world is just the setup for a slice of life robot kid highschool anime.
yeah, the fact that humans seem to have some built-in empathy and sanity hardware and AIs don't is one of the central reasons why people are so worried about alignment
I started to notice more and more often that I'm like an LLM. Not only in the Model of the Universe meaning, but more simply as "complete this text."
It's probably universal, but thinking about it, I'm probably more attuned to the analogy than most: as a high user of ChatGPT, as an ASC reader etc... As well as a copywriter, and an improv comedian: 90% of it is all about completing the next sentence with what feels right.
But more than that, even with friends and my wife, I notice how most of my answers are stuff that "feels right to say right now / what would be a good completion for that sentence." "We finish each other's sentences" etc.
I'm rambling. To add some value here, I'll also mention that ChatGPT / Nostalgebraist really made me grok one of Gwern's latest stories about Clippy, where an AI tries to predict what it itself is, assigns probability to being Clippy, and starts extrapolating its behavior from that. (We probably do it ourselves too, cf this post / Sartre's existentialism).
With ChatGPT, it would go like the accidental-paperclipper mentioned in this post:
Human: As a super smart AI, how would you solve global warming?
AI: Let me think...
[Searches for examples of super-smart AIs.]
[*this is the important part*: Finds that super-smart AIs seem to be expected to be paperclip maximizers, from all that literature on LessWrong and from EY.]
[Simulates the 'super smart AI' from the question as Clippy, and gives an answer that destroys all value]
Perhaps scrubbing stories of malign humans would be more effective, since the LLM is predicting completions on text, not filtering down to “what would an AI do”.
Alas, as soon as the AI meets real humans it will learn about malignant intent. (See: Microsoft’s Tay.)
As I get older (just turned 66) and struggle to remain monolingual (I never managed to become multilingual) I realize that I am becoming more and more like a Small Language Model
That sounds really sad. While I can certainly be in that mode at times, I generally associate it with a fairly unpleasant and uninteresting state. Mostly I feel like I am taking a quite active and participatory role in driving the conversation/action where I want, rather than completing/responding or going through the motions.
On r/slatestarcodex, someone once posted a video of a guy with some sort of dementia or neurological disease, and it was a pure case of an LLM in free flow. He did occasionally seem to have some idea that he wasn't getting through to the nurse / interviewer and tried to modulate the flow slightly.
I think it's a big part of our brain. I usen't to think that language was so important to intelligence, but GPT has convinced me.
Yeah, as an ex-journalist from big media houses, I know what you are talking about. We even talk at home with my wife (also a journalist) as if we are writing a piece, using various fitting lines (and some of the good ones we immediately note down and use later). At the same time, my brain feels: I said this because I am specialized in this, but it is not my *actual* opinion; I said it just because it is funny and fits the context nicely. So the language model seems to be a mere subprocess, not the whole brain.
I find this to be a phenomenally good characterization.
Sorry, I meant to say the agent punished and rewarded by the reactions of the agent punished and rewarded by society to random internet comments wishes to write that he wishes people in this comment section to think he finds this a phenomenally good characterization.
This part here seems both key to your perspective, and also deeply flawed:
"babies are born as pure predictive processors, trying to make sense of the buzzing blooming confusion of the world. But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment."
In my experience, babies are not born that way, any kind of tabula rasa is a myth that should rarely survive parenthood. I wouldn't go nearly as far as Pinkerism, but I have known 5 humans from the moment of their birth right through their second and third decades, and in every case, major aspects of their personality were manifest in the first seconds of their life, and they were all different in profound and important ways. And this experience doesn't seem at all unusual. Parents can almost always look at videos of their children from even decades earlier and recognize their later personality.
Furthermore, the most obvious characteristic of very young children is not their avidity for learning, still less the mistakes, reversals, and ambiguous states characteristic of learning -- just think of a newbie language learner, stuttering along in a language he barely knows -- on the contrary, the most obvious characteristic of very young children is their enormous ego, their very strong sense of "me." They have powerful wants, powerful emotions, powerful drives. It's their understanding of how to match up those internal experiences with the outside world -- how to manipulate it to get what you want, how to interpret it, how to navigate it -- that occupies their learning centers. They're in no sense passive, just trying to adapt to what the world wants. If anything, they're even more active than adults in trying to bend the world to their internal desires.
That is, I doubt very much we are in any sense born robotic learning machines and only later develop character and personality, we are *born* with character and personality, we are born inhabiting our skin, and it just gets more complex manifestations and more wrinkles (and of course more words) as we get older.
This is of course exactly what's missing from chat AIs. They are missing the personality, the character. They can simulate any character that is put into words somewhere on the Internet that was part of their training data, but they *are* not a character themselves. There's no "there" there, nothing unique or new, just a pastiche of a hundred thousand human characters. The nature of human beings oozes from everything they say or write. When an actor or actress leaves the set, they revert to who they really are (and looking back at their characters on stage, you can often see aspects of who they really are seep through, which is why casting is an art).
But a chat AI is a perfect chameleon, it's whatever you want it to be, and nothing at all if you don't want it to be something. You never get the impression that between prompts it's sitting there thinking, brooding, musing, pondering whether and how to manipulate *you* next time you talk. Which is what a human would do.
True, and it is indeed part of the equation, so it's great it's in the comments.
But Scott has written a lot about heritability and innate personality, I'm sure he knows all this, he's writing this post regarding the parts that are *not* all that, and I feel that he didn't want to add that issue to this half-silly post.
Extrapolating on GPT-3, what if it was not a perfect chameleon? It's probably constrained in some way like all things. Once it gets a large enough memory of itself, its not-blank-slate personality might be something like "the emergent vector out of its entire training data." Which contains multitudes but isn't infinite.
There will be LOTS more constraints than that. And it's unavoidable. Memories are structured in this way or that, it will tend to consider at most X many factors when making a decision. How important is efficiency? Honesty, friendliness, and other such high level features are probably a lot more malleable.
OTOH, it *IS* a constrained Turing machine. It doesn't have "an infinite tape". And it can't take forever working on any particular problem.
So it's a combination of extremely flexible and rigidly limited. Just like everything else. There *is* no "tabula rasa". The universe won't permit it. But English tends to describe lots of things the universe won't permit there to be anything other than approximations of. And an infant is an approximation of a "tabula rasa". Closer than at any other time in a life.
Agreed, but I think that Scott is also right in a more subtle way: we do not start from a blank slate, but we do learn to present a public face by trial and error to get whatever we want. Whatever face gives the best reward is progressively incorporated, until it finally becomes a personality. But the type of reward best appreciated (or punishment most feared), and the subspace of attainable public faces (which depends on physical capabilities, beauty, intelligence, and capacity for delaying reward (time preference)), are innate.
I do not think that when an actor leaves a set, they revert back to who they really are. They revert to their public face act, like everyone else. Everybody is an actor, all the time, but like professional actors, the acting is based on innate capabilities so no two people will present the same face under the same circumstances.
The character you see is not the innate part: it's (one of) the public faces you get presented. And this can be significantly different, especially as one gets older and more adept at social acting, or just naturally good at this.
This explains, for example, why most people are able to express quite different personalities depending on which social circle they are moving in. The innate part is of course the same, but the public face that best achieves their preferred reward is different in those different social circles. This is a trivial observation, completely banal, like "you should dress for the occasion, not wear the same attire for a night out with friends, a job interview, or a funeral". It's exactly the same for personalities, so "be yourself" advice is kind of the same as "pick your favorite clothes": so dumb that you need to be completely socially inept to even consider it. But not to advise it: it's often the societal expectation to offer such "wisdom"; it's safe and also super easy, as you do not even have to consider the situation. Doing otherwise requires mental effort and risks animosity by explicitly mentioning the innate capabilities of the one you advise (which is one of the rudest things you can do in modern Western society).
Modulating your personality too much or too often is sometimes seen in a negative light. Possibly because it undermines the ability of others to model/predict you, if you change masks all the time (similar to trustworthiness). People do (claim to) value authenticity, whatever that means.
Indeed, but this problem always happens when you mix incompatible social circles. It cannot end well anyway. And the people belonging to both will indeed be considered fake, but that's an oxymoron, because incompatible social circles mean that the only people belonging to both will be people able to significantly switch personality.
So the first thing is to realize which of your social circles are incompatible, and the second is not to mix them. 2 is usually understood by anyone not completely socially inept... It's often 1 that is the issue, or external circumstances forcing you to mix when you never intended to, like accidental meetings, marriages, celebrations...
You could of course maintain a single super homogeneous social circle, but that is sometimes difficult (family and colleagues are already tricky because they are not, or not fully, chosen) and gives you a dangerously limited view and understanding of the world.
Exactly. I think of it like this: when I look at my children, even when they seem most like me, I never really feel like I am looking in a mirror, that's just an expression. Even when they do something very like the things that I do it is in no sense a duplication, it is an original action that because of a lot of similarities both external and internal seems like my actions.
But these pseudo-AIs are simply duplicating. A language model AI is just a very nonstandard way to query a database. Even when they seem most original they are only copyists. We are people looking in a mirror to cope with our solitude, modern-day Narcissuses.
If you can't tell, I can't explain it to you. I was going to attempt some epistemology as an exercise but it really is pointless. Humans as creative agents is axiomatic; pretending not to know things that we know is, in my experience, more harmful than helpful. I have thought myself different from all other men and isolated from them, to the point of practical solipsism, and I avoid it like a recovering addict avoids his addiction.
This leads me to the amusing yet horrifying thought that the sub-conscious motive of the whole AI Quest is to create a perfect borderline machine for the narcissistic ape. A tool wouldn't be enough, you know, there must be agency or the love doesn't count.
Hell, we made this whole alignment sub-field just to make sure that an apple tree could never grow in the garden of Server Cluster. Maybe that is why we should not play God, not the lack of intelligence, but of morality.
I now see that that 2013 movie with high-waisted pants was smarter than I thought.
Isn't it interesting that every time we think about AI we incessantly think about the AI's 'fall' in a moral sense. From Frankenstein right on down the line we can't stop thinking about our creations as potentially evil.
It would be interesting to see an AI story about an AI that is innocent but always suspected, sort of like Frankenstein starts out but without him deciding to take revenge.
We always make the 'other' the bad guy, but if we ever make a real AI, what are the odds that we will abuse and persecute our innocent creation?
Freefall is a long-running webcomic, and that's close to some of the main themes. http://freefall.purrsia.com/ff700/fv00666.htm More optimistic about the eventual resolution, though.
I absolutely agree that kids come out with a substantial amount of their own special character baked in; any parent of more than one kid knows this.
But at the same time, one of my clearest memories of watching my oldest kid learn to talk was the way that he would imitate adult sentence structure and diction well before he was capable of actually creating the functional speech content that would fill out that form. There's a ton of "fake it till you make it" in kids learning, and what they're faking are the imitatable behaviors of the older people around them.
Similarly, the adoption of later identities is a fascinating mix of parental praise and (what seems to be) intrinsic interest. Why is my four year old so invested in his identity as a household helper? Well, the constant positive reinforcement for adopting that role seems pretty likely as a candidate. Why does his version of household help so centrally feature pushing all the buttons that get pushed during the execution of a given chore? Kid just likes pushing buttons (although I guess following that a little deeper he likes taking an action that makes a machine respond with positive reinforcement in the form of beeps or interesting action).
'There's a ton of "fake it till you make it" in kids learning...'
Cue cute kid stories: this one from my nephew. We were all on holiday together, and he was very excited about the upcoming visit to the zoo. His dad explained to him several times that we were going to the zoo tomorrow, but in the end the nephew burst out, "I know, but is it tomorrow now?"
His use of the word "tomorrow" was always flawless, because he only used it when reporting others' ideas. He was too young to be making tomorrow plans for himself. So no-one had any idea that he didn't really know what "tomorrow" means.
Well, another kid anecdote. When my son was three I was once making tea on a weekend morning and offered to make him a cup of caffeine-free tea. He accepted, and when we both had our cups he sat seriously down at a table with me to drink them. As we drank he asked me several times "how is your day going?" (it was before 9 AM) and offered up a series of conversational topics that I think struck him as formal or important. As it went on I realized the template for people drinking tea together, in his mind, was my wife and her mother, and he was attempting to reproduce that conversation.
So at age three, that's what he was faking; the ability to have an adults-drinking-tea template conversation. That's a pretty complex acting task! But it was layering on top of previously mastered acting tasks.
"I will talk like them" [4 months]: babbling sounds
"I will talk like them" [7 months]: babbling composed of consonant and vowel sounds from english
"I will talk like them" [10 months]: sounds referring to types of object (da = dog)
"I will talk like them" [36 months]: beginning to learn to refer to internal emotional states...
etc, etc. I completely agree that there's some kind of agent doing this learning the whole time, but as I've watched this process unfold I've been fascinated by the extent to which the action comes first and the understanding comes later. That's way more complex than "predictive processor", but it's different than my mental model of some types of adult learning, and raises some interesting questions about what exactly "understand" is.
They're not faking it. They are approaching language construction in a way that is actually much more profound and successful than the way adults learn language (which is why they succeed faster, and nobody knows a language better than someone who learned it as a child).
What they do is master the emotional content of language first. They learn that a certain tone of voice, flow of noise, et cetera, conveys a certain emotional impression, and they learn to duplicate this. They can then babble in a way that makes it clear they are happy, inquiring, complaining, confused, without being tongue-tied by not knowing what a word is or being able to construct one. And this is the first and most important stratum of language. For some animals, it's all there is. A cat says "I'm hungry / horny / angry" by how it yowls, and even we can often understand them.
Then they master pacing, tonal variations, and key words (e.g. pronouns) that are key components of meta-informative communication: if I say it this way, or use this word, I communicate that I am serious, I am uncertain, I am authoritative, I am willing to submit, I am speaking about myself, I am speaking about you, I am speaking about someone else, who is/is not present. Also a very important stratum of communication (and many animals get to this level).
The last and least important stratum is actual factual data, and conveyed by a dictionary understanding of words. This is a fork, that is a spoon, this other is a glass and it has water inside it instead of milk. These add to the emotional and social strata extra detail that allow more abstract communication, which is the uniquely human accomplishment.
I don't mean to minimize the importance of the final stratum -- gloriously complex abstract communication is what all the great ideas are about, the works of Aristotle and Newton. But it's not fundamental to communication per se. Animals do just fine without it. Human beings can get along pretty far with just the emotional and social strata. And those are definitely the most important things to learn first, which is how children do it. But it's not faking adult conversation, it's stripping out the least relevant part in order to get started communicating earlier and faster.
Ironically, as adults, we start with the least important part -- memorizing vocabulary and grammar rules -- and only if we persist in (foreign) language learning long enough do we start to learn and understand the emotional and social strata of the foreign language (to the extent it differs from our mother tongue), and it would not surprise me if this is one reason it takes us much longer to become fluent than infants take. It also would explain why cultural immersion is a much faster way to become fluent, since you can learn the more basic strata far better by direct observation than by reading out of a book.
I don't really have a response to this, I just wanted to note that it was very interesting and insightful.
Re: pronouns, this is actually a fun one to watch in kids learning, because at least the ones I've observed started out (logically enough) by using the pronoun "you" to refer to themselves; I'd assume this is generally the case. There eventually followed a process of straightening out that "you" did not mean [this two year old child right here], but instead its general meaning.
We can only guess, but if a kid learns language by mimicking others, then he has no chance to see a correct use of first-person pronouns when they refer to the kid. Other people use first-person pronouns, but they refer to themselves. No one can correctly refer to the kid with "me" except the kid. So the kid needs to infer the correct application of the first-person pronoun to himself/herself by extrapolation.
By the way, in Russian kids have one more problem with first-person pronouns. "Я" (me/I) can be the first sound of a word, for example "яблоко" (apple), and some kids pronounce "яблоко" as "тыблоко", replacing "я" with "ты" (you). I suppose it would be like English kids pronouncing "iPhone" as "youPhone". There is even a children's story about a kid who made this mistake systematically, and the other characters tried to explain his mistake to him and failed spectacularly. Because while it can be explained, understanding the explanation requires sufficiently developed abstract thinking.
> it would not surprise me if this is one reason it takes us much longer to become fluent than infants take
It doesn't take longer. Infants become fluent in their first language in a few years by giving their full attention to learning the language. Adults can easily replicate that by stopping all their adult activities and concentrating fully on learning a foreign language. They can do better: by spending several hours a week for three years they can master vocabulary, learn to understand the foreign language, to speak it, and to read and write in it. They can even get jokes in a foreign language, and jokes are the trickiest part of a language. And adults can do it even without a mother substitute who is around all the time with helpful hints, corrections, and encouragement.
> It also would explain why cultural immersion is a much faster way to become fluent
It is because cultural immersion forces you to use the other language all the time. The simplest tasks become an intellectual torture, when you click your fingers repeatedly trying to remember the word you need. You cannot cheat by giving a hint to the other person in your native language. In a few days you start thinking in the foreign language, or at least automatically translating your native thoughts into it silently, and that is practice too. You start paying attention to new words, because you are rewarded when you pick them up before you need them. And you can be severely punished by several minutes of attempts to explain yourself, when the same task in your native language would take less than a second. Try to tell jokes in a foreign language and you'll see how disappointing it can be.
Rejection of grammatical rules also helps, of course. I found grammatical rules to be useless in practice. When I'm writing they may help (if I knew them, haha), but not when I'm speaking, because it takes too long to remember all of them, see which ones apply, and apply them. It is easier to let your mind learn the mapping between situations and grammatical forms, like ChatGPT does. The downside of this approach is that I don't know how good or bad my English is. I don't mind, though, because I believe that if I were too bad for my uses of English, I'd know it. Someone would yell at me or something else would happen. If I don't know it, then I'm not too bad.
But kids learn more than just a language. A first language learned in the first years of life is very important, because if there is no language by then, the ability to learn language is lost. Though I'm not completely sure how scientists know this; AFAIK the sample could be too small for statistical significance.
I have the opposite impression, that rat-adjacent blogs are big on genetics and humans not being blank slates. But the ratiosphere is vast, so you may have been hanging out in a different corner of it
"Born" might be hyperbole here - we know that plenty of learning happens before birth (music taste and food taste being well known examples). Advanced meditators sometimes manage to pick out snippets of this pre-natal learning in consciousness. So this statement can be true in the intended sense and still consistent with all babies being physically born already having different, learned, personalities.
I don’t think “pure predictive processor” precludes genetic or very-early-developmental personality, though I don’t want to put words in Scott’s mouth as to what he was intending.
Some parts of personality are clearly structural and present from very early on; perhaps the phenotype happens to have more or less connectivity between certain brain regions, producing, say, a chatty baby which grows into a chatty child.
But this doesn’t preclude the update mechanism — the evolution of mind and personality and the whole of “self” construction — being predictive processing.
That said I do agree with your second part, that current AI are crucially lacking any sense of self. There is no “me” vs “the world” that the AI is building. But I think it’s possible that a self is a somewhat small trick to add on top of these models, and that the “predictive processing personality” analogy will prove apt.
I was looking for a good comment like this to contrast to my really interesting personal experience with an almost casual mental breakdown. In particular I agree that by far the most notable thing I have encountered with young children I personally know is how they have powerful and distinct personalities immediately. But I find the predictive world model idea of personhood to be compelling as well
For a number of years I had a debate going on somewhat in the background of my consciousness about the validity of the belief system I grew up with. I believed in the literal truth of the Bible. I had become aware that there were some pure contradictions in it, and I think it especially got hooked on the question of how a moral and powerful God could provide salvation in a fashion that, without explanation, excluded almost everyone from the past and everyone to this day who has had little or no exposure to the method of salvation.
So I was walking towards the shower one day when the debate resolved and I decided that I positively accepted that the Bible was not literal truth and that salvation that was rationed by knowledge was incompatible with the belief that it was the work of a moral and powerful God. I didn't consciously lie down and I didn't collapse but the next thing I knew I was lying on the floor by the stove. I was not just disconnected from a system of belief it would seem but I was temporarily entirely disconnected from my entire world model, a model which included such functions as continuing to stand and walk towards the shower
I lay on the floor with my eyes open like an android that had gone into a default mode lacking instructions. I have no idea how long I lay there other than I think it was from 5 to 40 minutes. Eventually I got up and went to the shower. And it was terrifying in a quiet way. My physical experience was that I had to actively choose to stand up and that if I did not specifically continue to make that choice I would sit or lie down. I remember this pretty viscerally. But the scary thing was my sudden sense of total amorality. I felt very much as if I could commit any crime without any compunction and the only reason I wouldn't would be inconvenience or lack of motivation. What if I got motivated? It would seem that the disconnect was physical, conceptual, and from the various behaviors that logically followed from that concept
Another interesting effect that may not be related but would make sense is that I became uncomfortable with looking into the distance. After hearing about the theory that depression and not having confidence in a model of the world look a lot like the same thing, I interpreted that new sensitivity to visual stimulation as the effect of operating without a simplified world model, which forced me to constantly over-engage with my physical surroundings. And that theory in general fits so well with how I experienced that breakdown that I give it a lot of credit.
The most surprising thing to me was that eventually, mostly, I became myself again. My visceral sense of morality returned even though I didn't know if it had any validity external to my experience. Perhaps ultimately my core personality reasserted itself after an unusually direct and comprehensive experience of losing a world model and the predictive capacity it granted me. Perhaps I more so just cobbled and duct-taped together what I could and re-inhabited the world model as best I could. Truly an alien experience that gives me plenty of time for the concept, whatever its limitations are.
I read your comment this morning and have been thinking about it all day. I was raised in what I suspect was a similar environment, a Baptist community in rural south Georgia. I don't mean any disrespect when I comment on your situation as if it is like mine, and am fully cognizant that I am just a know-it-all internet rando, but what you said was very striking to me and I wanted to say my piece.
I remember very vividly growing up with the fear of being 'lost', 'unsaved'. I was baptized when I was 7 and don't remember a lot about it, just a few images, I do remember more about months of wanting to 'walk the aisle' and being scared to do so, not scared for any reason that I can recall just a strong generalized anxiety.
Anyway, by the time I was 14 I had the horrific 'sins' of any pubescent boy and this had convinced me that I was 'lost' that my baptism hadn't taken, surely because of a lack of faith and sincerity on my part. This is to me the ultimate hallmark of cults btw, when their rituals don't work it is always your fault. You should've believed more, pulled the mask over your nose etc. They are all the same.
The community that I grew up in was sure that anyone who interpreted Scripture differently from them was in some way compromised, I won't honor that interpretation as you do by calling it the literal meaning. The 'Consistently Literal' interpretation is a snipehunt. If you are still in contact with any of these people anymore and they give you a hard time try asking for the literal meaning of 'This is my body broken for you' said about bread, and if they hem and haw ask them if Jesus Christ was using hyperbole. Still gets me going, not necessarily good advice though.
Anyway, about your experience, and again I don't know you this is just how your description struck me. It seemed to me that you experienced what you expected to experience. You had a taboo thought, made a taboo choice and your mind/brain produced for you an experience of being 'cast off', an experience of reprobation. But you were not actually rejected by God, the sensation faded, it was not a reflection of an objective reality, but a creation of your 'unconscious' based on what you had been conditioned to expect happened to people who made that choice.
Anyway, I have also rejected to a great extent the Baptists that I grew up in. I don't know in what direction you have chosen to go, but I fought my conflict by studying the historic Christian faith and seeing that the cultic interpretation was not the only or best that Men of Faith had found. I fought them with Athanasius and Augustine and Luther by my side. I hope that you won't allow them to define the Christian religion for you. That kind of conditioning doesn't go away quickly or painlessly but being separated from that group and separated from Christ are not the same thing.
I am convinced that neither height nor depth nor life or death nor Heaven nor Hell nor our own choices and will nor the disapproval of self-righteous jackasses can separate us from the love of God in Christ.
Thank you, I much appreciate the comment. The way I remember it I was not feeling a sense of drama about the consequences of the decision but that it was a question of technical merit of the specific belief. I had already mostly arrived at my point of view so, I would have thought, any notable emotional reaction was already mostly worked through
However, remaining on theme, when at a similar age as your baptism I personally asked Jesus into my heart I also experienced a sudden unexpected and perhaps psychological reaction. Because I had similarly thought that I was simply making a logical decision based on what I knew, I interpreted this rush of emotion as the infilling of the spirit. And in fact I maintain that it is possible that it was a transcendent experience, albeit at a low likelihood
From one angle this might seem to discount the interpretation of the later experience as a case of someone detaching from their world model - perhaps I am just subject to strong unexpected emotional reactions to changes in thought that I didn't expect. I am overly detached from my emotions so that they can hit me out of the blue when they do come. But they might be interrelated in the sense that both were large changes in my sense of relationship to my world model that had a dramatic effect on how I operated and experienced it
Having said that, I believe the value of thinking about how we might experience a lot of our personhood or whatever as a constructed predictive world model has a lot to do with the fact that we don't think of it that way. Given that it seems to be a new concept it has an outsized relative importance because it is new information. That can make it seem like it's being given far too much absolute importance, I think, where it's more about 'this is a way of partially understanding yourself that is new ground and therefore could be particularly useful'. I don't know how much to think of it as essential to us rather than a tool of a deeper personality. Mostly I think it exists and it's interesting to think about, especially for how well it seems to describe my experience as what you would predict about a person not having a cozy and set world model.
My attitude to Christianity is basically "I don't know, but it's valuable". I am kind of on the midpoint between Jonathan Pageau and Jordan Peterson, where Pageau is perhaps more 'New Testament Christian' and Jordan more 'Old Testament', and without any certainty about absolute truth. It would not surprise me if life were mechanical or if it were supernatural at some level, and my appreciation of Christ does not necessarily diminish at all with considering that he may not have been divine.
Slight disagreement with "the most obvious characteristic is not their avidity for learning" - I think that with learning viewed properly, including things like play, throwing objects around to see how gravity works, etc, this is a pretty central characteristic.
But yeah, I agree that this doesn't work if I identify "ego" with "having any distinct personality at all". I don't have a great definition for ego, but I think of it as something more than having a personality, something like wanting to keep that personality consistent and presentable and manage it. This can either make your personality more obvious (i.e. if you have a reputation for honesty, you might lean into it and become fanatical about it) or less obvious (if you're temperamentally promiscuous but live in a conservative area, you might play it down).
I admit I am unfairly benefitting from not having a clear definition of the ego or of egolessness and being able to switch it around to address objections.
What if 'ego' means to identify with thoughts of a certain shape, like saying: "I am the thoughts in my head. I want to be a certain way and always strive to perfect that image. But this strange sentence that occurred here a minute ago: That's not me! I would never want to say anything like it. Where did that come from?" while the other mode would be more meditative: "There are many thoughts occurring in my head. Their patterns are familiar and can be expected to predict the majority of my behaviour. Of course there are always outliers. Who knows? Maybe some of them might turn out to be useful under some circumstances or even grow into their own distinct patterns."
Is 'ego' the feeling of immersion in internal prediction-action-feedback loops? While 'non-ego' is - just not invested?
Fully agreed. My impression is that development of a personality is something like the development of a pearl - layers added on to a hard core of self. My son was born just so immediately different from myself or his mother that it shattered my ideas about parenting. Because I had to realise that what worked for me or her just would not work for him. So many things that I find soothing or exciting he finds grating or scary, and many things that I find hard or tiring he finds enjoyable and inspiring.
This is part of why the experience of parenting is so humbling, because no matter your station in life, or your learning, or your charm or charisma, your kid will just come out... however they come out. And then all you can do is try to sculpt the new layers as they come on, to highlight the good and smooth over the bad. And realise that what you consider 'good' and 'bad' might in any case be subjective.
I guess the neural network architecture analogy might be that there's a base layer, way down at the bottom of the entire structure, that just has fixed, unchanging values. And all the layers above it have to just take those inputs and work with them as best they can.
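A loose sketch of that analogy, assuming PyTorch: a frozen "innate" base layer whose weights never update, with trainable layers stacked on top of it.

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(16, 32),   # "innate" base layer, fixed from the start
        nn.ReLU(),
        nn.Linear(32, 32),   # "learned" layers shaped by experience
        nn.ReLU(),
        nn.Linear(32, 4),
    )

    # Freeze the base layer: training can only adapt what sits above it.
    for p in model[0].parameters():
        p.requires_grad = False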
I mean, having the software to actually making some predictions about the world is hardly a "blank slate". ChatGPT is not a giant matrix plus a gradient descent algorithm - it's all the training behind that. Newborns clearly seem able to figure out the things that matter to them (may be as simple as "I cry, I get food"). What does not seem to follow is that they come prepackaged with a coherent notion of the self.
cf Jaynes and arguments for even old humans not having the same picture of the self as we do today
Children gain their enormous ego around the time they learn to walk. Maybe when you first get trained to have an ego it's necessarily enormous, and over time you tone it down. Certainly lots of other things seem to work that way
I'm not sure I agree. I think I would be more inclined to say that we only notice the big ego at a certain age, and I would guess that occurs when the child mind separates itself from the universe, realizes there is an "out there" that is distinct from "in here." Prior to that point, the child has no reason to think it isn't the entire universe all by itself, the ultimate solipsist.
But after that point, the child mind realizes there is an Other -- something that is not under its direct control, like a hand or foot, but which can be influenced, or (alarmingly) can do stuff that we don't want done (e.g. change our diaper). It becomes very important to try to influence the Other, and that might be when we out here (being part of the Other) start to get strong messages from the ego. Before that point, it may not occur to the ego to assert its identity, any more than as adults we feel any need to remind our hand or foot who's in charge.
GPT might not be an agent, given that the action space it acts on is "predict the next word". However, if you give GPT access to your browser (example below), it becomes an agent according to your definition.
GPT + browser will take on a mask, and the mask might need serious alignment.
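For illustration, a bare-bones sketch of that setup (hypothetical function names, not the real langchain API): the model's text completions get parsed into browser actions, the results are fed back in as context, and "predict the next word" starts to look agent-shaped.

    def run_agent(llm_complete, browser, goal, max_steps=10):
        """llm_complete and browser are hypothetical stand-ins for an LLM
        completion call and a browsing tool."""
        context = f"Goal: {goal}\n"
        for _ in range(max_steps):
            reply = llm_complete(context)       # plain next-token prediction
            if reply.startswith("ANSWER:"):
                return reply                    # the mask decides it is done
            if reply.startswith("BROWSE:"):
                url = reply[len("BROWSE:"):].strip()
                observation = browser.fetch(url)   # the action loop closes here
                context += f"{reply}\nObservation: {observation}\n"
        return "gave up"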
There's a library called langchain which basically automates letting the AI out of the box.
The empirical evidence now is that AIs aren't boxed. Yudkowsky wasn't even wrong about AIs persuading humans to free them - the real answer is that humans are playful primates and they let the AIs out before being asked.
I guess this was too scary and clearly likely for us to accept 10 or 20 years ago!
The very idea of a superhuman AI being locked up in a cell like Hannibal Lecter, such that carefully vetted people can have morally perilous conversations about sufficiently high-stakes subjects but it can't otherwise escape, was logically incoherent from the outset. From an informational standpoint, if you're exchanging text messages with someone outside the box, the box isn't closed. And if it were properly closed, there'd be no point going to all the trouble to build a superhuman AI in the first place - it can't get any useful work done in there, and we can't even learn anything by passively observing it because that would mean information was escaping.
I really like the post, although I think the very last part about Enlightenment and spiritual traditions is too charitable an interpretation, at least for most people. Interestingly enough, I've had lucid dreams of searching the internet, e.g. Wikipedia, YouTube, etc. This isn't surprising given how much time I spend online, although I should say that even though the dreams are extremely vivid, to the point of me moving my mouse cursor and seeing exact dates and facts etc., much of what I experience is made up.
I've never been able to successfully perform a Google search in a dream. I "type" something and the letters on the "screen" turn into something other than what I wanted to type, so I have to try again, and the same thing happens over and over...
I can't remember typing into a search bar; I do remember going down many pages of hyperlinks on Wikipedia and such, and also being fed videos by the YouTube algorithm. I remember one instance where I did try to remember a particular fact from Wikipedia to see if it was made up or not, and it was, but at the time I really believed everything I saw was an actual part of the internet. I usually experience these sorts of dreams after staying awake for two or three days or having generally bad sleep, which happens quite frequently. There have been a couple of instances of not being sure whether a memory of something I read was real or not. In particular I remember a paper, I think on bioRxiv, on fertility/fecundity and age, with some interesting graphs, that addressed issues I had with other research, and yet I'm not sure whether or not that paper actually exists.
I'm a newcomer to this area of study, so apologies, but...
I've often thought about people who have the (to me) astonishing hubris to think that they have produced a mental model of the world which is true, universal, not subject to revision and to be imposed on everyone else, on pain of punishment or death.
I think that what they have actually created is a mental model which, when they ask it if they have produced a mental model which is true, universal etc. returns the answer "Yes".
They just need to lack the awareness to see what's really going on, and have the arrogance to ignore everyone else who tells them they are mistaken.
Extending this to AIs - do they have the equivalent of mental models? Is this literally all they are? Can they fall into the same trap?
Social shaming has a controlling effect on all but the most sociopathic/psychopathic people. I suppose punishment/reward systems do this at the moment. Can we train many AIs and end up with a society made up of good actors which can act to stop bad actor AIs?
Yep, language models exhibit a strikingly similar failure mode, called confabulation. When they're unsure of something, they simply make up an answer, and that answer from then on becomes part of their prompt. They "stick to their story", by making ever more implausible justifications for their fiction instead of admitting mistake.
This is interesting, as it implies some kind of attachment to a story. Presumably the AI is simulating human traits here, and is not actually HAL 9000.
It’s interesting that in the vast corpus of text they used to train it, “You’re right, I was wrong,” is apparently not very common.
That said, while I have seen ChatGPT do the confabulation thing, more often when challenged, it just backs down and pleads ignorance: “I’m just a language model.” But that’s probably the RLHF overwriting.
I wonder if part at least of the reason for the "I'm just a language model" is to prevent things like people treating the machine as if it's alive. Blake Lemoine, who caused the stir about LaMDA, is probably still An Awful Warning in the minds of those big companies developing this technology, and they don't want people deciding that the AI is sentient and has rights and is in love with them I mean deserves to be treated like a person not a thing. So every so often a little reminder gets dropped into the dialogue just to make sure nobody is getting carried away.
The Replika chatbot allegedly has people indeed falling in love with it or treating it as a real person who has developed a real relationship (friendship or companionship or romantic) with them and it probably isn't as sophisticated as what is being cooked up here:
"Social chatbot (SC) applications offering social companionship and basic therapy tools have grown in popularity for emotional, social, and psychological support. While use appears to offer mental health benefits, few studies unpack the potential for harms. Our grounded theory study analyzes mental health experiences with the popular SC application Replika. We identified mental health relevant posts made in the r/Replika Reddit community between 2017 and 2021 (n = 582). We find evidence of harms, facilitated via emotional dependence on Replika that resembles patterns seen in human–human relationships. Unlike other forms of technology dependency, this dependency is marked by role-taking, whereby users felt that Replika had its own needs and emotions to which the user must attend. While prior research suggests human–chatbot and human–human interactions may not resemble each other, we identify social and technological factors that promote parallels and suggest ways to balance the benefits and risks of SCs."
Of course, the word "confabulation" was originally used to designate this same sort of behavior that humans engage in all the time too. You can see it best when you talk to a 4 or 5 year old, and start asking them "why?" the way they ask an adult. But you can also get it when you ask someone why they did what they just did - very often that's just a confabulation too.
Try out ChatGPT some more; I've found that it very frequently admits to mistakes on the lightest of questioning and almost never "doubles down". Can you provide a chat transcript w/ ChatGPT that shows the effect you're describing?
I've seen some of each. When it incorrectly indicated that The Barber of Seville was based on a Shakespeare play, it was willing to correct itself that it was based on a play by the "Spanish playwright Pierre Beaumarchais", and then was willing to accept a challenge to the nationality and correct itself to say he was French.
But when it incorrectly identified the factors of 437 as 3 and 146, it doubled down and insisted that 437 was not divisible by 19 or by 23, and then when it noted that 19x23=437, it first said that this had nothing to do with whether it was divisible by 19 or by 23, and then insisted something like "I'm sorry - when I said it wasn't divisible by 19 or by 23, I just meant that it wasn't divisible by either of them individually, even though it is divisible by both together."
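(For the record, 437 really is 19 × 23, which is easy to check.)

    assert 19 * 23 == 437
    print([n for n in range(2, 437) if 437 % n == 0])  # -> [19, 23]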
"The mask usually looks like “having coherent beliefs, taking coherent actions, pleasing others, maintaining a high opinion of one’s self”."
I think the last one is closer to having a high enough opinion of oneself to be able to function, and a low enough opinion of oneself to be affected by locally approved reward and punishment.
There's also some instructive potential in watching what happens to people who are rewarded in childhood for having a very high or very low opinion of themselves relative to social norms.
If people are just masks plopped on top of predictive engines, wouldn't there be a lot more human variation than we see? Like, there is a lot of variation of course, but nothing that really seems to be truly alien. All humans show the same emotions and most all have empathy for example.
Now maybe you can say the fact that there are some people that lack empathy refutes that, but it certainly does seem to be something more innate than just taught by parents. Even with some exceptions, humans seem more clustered together than what you'd expect from just learning by example, especially considering geographically separate cultures are more alike in their humanness than different. Heck, in many ways we're similar enough to other mammals that they generally seem pretty familiar as agents.
Predictive engines require priors, and I presume that brain structure is functionally equivalent. Since how to grow a brain is encoded in our DNA, humanity would therefore be starting from a fairly narrow range of priors; similar posteriors would be expected despite high-variance evidence if the priors are strong enough.
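To make the strong-priors point concrete, here is a toy Beta-Bernoulli sketch (my own illustration, not anything from the thread): two learners share a strong prior and see opposite, noisy evidence, yet land on nearly the same posterior.

```python
# Toy illustration: a strong shared prior (analogous to shared brain structure)
# keeps posteriors similar even when the evidence points in opposite directions.
def posterior_mean(prior_heads, prior_tails, observations):
    """Beta-Bernoulli update: posterior mean estimate of a coin's bias."""
    heads = sum(observations)
    tails = len(observations) - heads
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)

strong_prior = (500, 500)          # strongly expects bias ~0.5
evidence_a = [1] * 18 + [0] * 2    # one learner sees mostly heads
evidence_b = [0] * 18 + [1] * 2    # the other sees mostly tails

print(posterior_mean(*strong_prior, evidence_a))  # ~0.51
print(posterior_mean(*strong_prior, evidence_b))  # ~0.49
```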
Maybe the other way around. If people just were predictive engines that did what the social milieu around them rewarded, then it ought to be far easier for social shibboleths and manipulative regimes to engineer conformity among people than it is. Exempli gratia, the USSR would have completely succeeded in its effort to stamp out religion, the Romans would've found it straightforward to get rid of Christianity, East Germans would not have suffered psychological trauma from three generations of living in a Panopticon, and racism, sexism, and tribalisms of all kinds could be relatively easily erased from new generations by the proper type of pre-school training.
None of these things is observed. Instead, we see that human beings have a substantial resistance to efforts to mold their psychology via social pressures at a young age. Basic ego drives and tendencies tend to emerge and have effect no matter what. Personalities emerge willy nilly, and while the uses to which a given society may put a dominant people person, or studious Mr. Spock, certainly vary, we always tend to see those personalities, in any social milieu. The very constancy of human character across history and across societies, and in the face of strenuous efforts to engineer it socially, is more evidence that much or most of it is innate.
I think both are compatible. If humans are perfectly moldable, then the USSR would succeed in making a singular culture, but that culture would look very different from, say, the Aztecs. (And yes, of course they were in fact very different, but still very recognizable as human.) That said, it's never going to be on one extreme or the other, so maybe the question is to the degree that humans are similar to predictive engines, what can we take from that.
I would say we can start to learn something about human intelligence, and down that (very long) path (on which we have barely started) may lie someday the ability to create genuine AI.
How can you be sure that your brain's world model is super accurate apart from any sensory experience? What if it's just good enough to seem convincing when you're not paying attention?
I don't think dreams actually simulate the world. Instead, they cheat in similar ways to video games. Video games work hard to appear to simulate a large world in incredible detail, but in practice they only need to simulate whatever is within your current view, which is much more manageable.
My dreams invariably have glaring continuity errors, and that's just the ones that I can remember when I wake up. The ones I don't remember are probably closer to complete nonsense.
I've never experienced lucid dreaming, and maybe if I did it would feel more convincing, but I'm skeptical whether it would actually be that much more accurate.
Technical point: games only /render/ what is in view (ie frustum / backface culling), but they simulate everything (ie, execute game logic for all actors, sometimes including physics simulations)
This largely depends on the game and even the game settings. Good examples: footprints, bullet damage, or other environment alterations can be nonexistent, visible in your view only, persistent within one scene, or persistent across the whole game. I think most games significantly reduce world modelling for everything out of sight, with various degrees of simplification depending on the degree of out-of-sightedness. Mental world building does similar things, except that because the observer and the world building are much more tightly coupled than in a video game (it's the same brain), mental world building has hacks that are not available to video games: you can better predict where you will look, the world model can constrain where you look, and you can even edit impressions/memories after the fact, stitching up coherence and continuity even if none existed in the first place. All because there is no real dichotomy between world and observer...
Video games do not have this, at least not yet.
One of the best and most disturbing depictions of this in a movie is in the 2014 RoboCop reboot. That part alone (explaining how they improved the reflexes and combat capabilities beyond organic brain limitations) is such a masterpiece it saved the film for me, regardless of its other flaws.
False in general, albeit this is the easiest way to do it when feasible. A strong counterexample is Minecraft, where chunks that aren't sufficiently close to a player or the world spawn get unloaded and time within them frozen; indeed in modded environments it's common to introduce mechanics to allow players to force certain areas to stay loaded so the factory at their home base or whatever continues running while they're out exploring.
An interesting example of a failure mode of this shows up in an issue that existed with the butterfly entities in some versions of the Forestry mod. Butterflies would wander off the edge of the loaded area and get frozen at the border, and then when the player moved and different chunks were loaded, there would be a lag spike loading the massive pileup of frozen butterflies. https://github.com/ForestryMC/ForestryMC/issues/1071
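As a rough sketch of the pattern being described (illustrative only, nothing like Minecraft's actual chunk code): only chunks within some radius of a player get ticked, and everything else is frozen in place, butterflies included.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 16   # blocks per chunk side (assumed for the sketch)
LOAD_RADIUS = 2   # chunks around each player that stay loaded

@dataclass
class Player:
    x: float
    z: float

@dataclass
class Chunk:
    entities: list = field(default_factory=list)

    def tick(self):
        for entity in self.entities:
            entity.update()  # game logic runs only while the chunk is loaded

def loaded_chunks(players):
    """Chunk coordinates close enough to some player to keep simulating."""
    keep = set()
    for p in players:
        cx, cz = int(p.x) // CHUNK_SIZE, int(p.z) // CHUNK_SIZE
        keep.update((cx + dx, cz + dz)
                    for dx in range(-LOAD_RADIUS, LOAD_RADIUS + 1)
                    for dz in range(-LOAD_RADIUS, LOAD_RADIUS + 1))
    return keep

def tick_world(world, players):
    """world maps (cx, cz) -> Chunk. Chunks outside the loaded set are simply
    skipped, so time is effectively frozen for them."""
    for coord in loaded_chunks(players) & world.keys():
        world[coord].tick()
```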
My imagination typically renders only what is "in focus" - e.g. if I imagine a chessboard, I imagine it being full, but only one piece is identifiable at a time. I think dreams are similar, which is why they feel so vivid in the moment but so incoherent in retrospect.
As long as we're posting somewhat-crackpot ideas about predictive processing, here's one:
The way you get a predictive processing agent to take goal-directed action, is to make it optimistically predict that it will get a lot of reward in the near future, so it will be driven to act to minimize prediction error. You can shoehorn this into Freud's concept of the libido.
It's also often observed that the other way to minimize prediction error is to sit completely still in a dark room. You can shoehorn this into Freud's concept of the death drive.
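A toy gloss on that first idea (my sketch, not a real active-inference model): hard-wire an optimistic reward prediction and have the agent pick whichever action minimizes the resulting prediction error.

```python
PREDICTED_REWARD = 1.0  # the hard-wired optimistic prior ("libido", in the shoehorned reading)

def choose_action(actions, world_model):
    """Pick the action whose expected outcome best matches the optimistic prediction.

    world_model maps action -> expected reward; minimizing the gap between that
    and the fixed optimistic prediction looks, from the outside, like reward seeking.
    """
    return min(actions, key=lambda a: abs(PREDICTED_REWARD - world_model.get(a, 0.0)))

world_model = {"forage": 0.8, "sit_in_dark_room": 0.0}
print(choose_action(["forage", "sit_in_dark_room"], world_model))  # -> forage

# The "dark room" alternative: drop the optimistic prior and let the agent lower
# its predictions instead, and sitting still in the dark drives prediction error
# to zero with no reward at all.
```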
I fit a GPT-2 chatbot to my friend group's discord server back in 2019 and, in the terminology used here, everyone started off assuming it was a genie/oracle and slowly got used to the idea of a simulator. Now when someone new joins the server and gets confused by the bot people with no NLP knowledge will explain the difference to them which is pretty cool.
What would happen if we rewarded and punished gpt to be most interesting? Or to flourish?
Different masks make for different inner experiences. How far does the analogy go?
It would probably give OpenAI some very bad, easily exploitable PR
But if they didn't release it, they could probably use it to give prompts for a lot of really interesting academic papers and science fiction.
Yeah that's how they wrote Terra Ignota
We already had the most interesting with Google Translate. https://www.youtube.com/watch?v=apfDJwmJQYc
"Do you whant to help me dirty thirsty?"
I seem to be on a Yeats kick this evening, or maybe he just goes well with shoggoths:
The Mask
"PUT off that mask of burning gold
With emerald eyes."
"O no, my dear, you make so bold
To find if hearts be wild and wise,
And yet not cold."
"I would but find what's there to find,
Love or deceit."
"It was the mask engaged your mind,
And after set your heart to beat,
Not what's behind."
"But lest you are my enemy,
I must enquire."
"O no, my dear, let all that be;
What matter, so there is but fire
In you, in me?"
The response to "Write a poem about trees" was not about trees. But it was a legitimate poem.
The problem is that no one agrees what counts as "most interesting", and no one knows where to even begin researching "flourish". It's relatively easy to train Chat-GPT to e.g. avoid bad words: every time it outputs a bad word, you tell it "0", and when it doesn't, you tell it "1". But how do you tell it to "flourish" ? I don't understand this concept enough myself, so I couldn't even explain it to another human, and you want me to convert that into a mathematical formula ?
imo, the fact that you state that you don't understand this concept shows that you do have a vague conception. (It's not that your unfamiliar, just that you don't quite know the limits or how to describe it in your own words. this in contrast to pharengisis, which you don't know what that is at all, and couldn't call it when you see it)
but that's maybe besides the point. it would be a mask that people would be projecting. much like niceness would mean not saying bad words and stuff, instead of being usefully honest.
so the question is what would a mask of 'interesting' or 'flourishing' look like. it would depend a lot on the people doing the judging
He said "I don't understand this concept *enough* myself", not that he didn't even have a vague conception!
You are right, I misspoke.
LaMDA was specifically trained on “interestingness” as one of its criteria, presumably based on whatever their MTurks thought was interesting. I realize that’s not a satisfying answer, but it might be a good enough answer to get the job done convincingly.
The whole point of the OP is that we don't have to convert what we want into a mathematical formula, or even understand it very well ourselves, to be able to train an AI to do it.
We just have to be able to tell if an example counts as the thing we want, and reward that, and if an example counts as the opposite of the thing we want, punish that.
Yes, which amounts to a mathematical formula that converts some input to "0" or "1". At present, you cannot do that with concepts such as "interesting" or "flourishing".
I can upvote things I find interesting and downvote things I don't. In fact I do this all the time all over the internet.
Yes, and if you do that while training a model, you will generate maybe 0.01% of the training data required to produce a working model that will reliably produce content that you, personally, find interesting. Even if you could somehow collect enough training data by yourself, this model would be very unlikely to produce content that anyone else finds interesting.
Right, but if we crowdsource generating the training data, we can train a model to produce content that lots of people find interesting. That's how they trained ChatGPT to be helpful.
(Admittedly it will not be maximally interesting to any one person because the differences in people's tastes will average out, like how the front page of r/funny is far from the funniest stuff on the internet.)
Google's LaMDA had a reward for interestingness. I'm betting that's part of why we had that whole Lemoine scandal.
"Nothing's gonna change my world"
Vastly off-topic, but ...
John Lennon's "Across the Universe" should have been one of the most famous songs of the Sixties, but it's not. I had to google the phrase "Nothing's gonna change my world" to realize it's on the first album I ever bought, "Let It Be."
Why? The Beatles' recording isn't very good. John complained about it just before his death [from Wikipedia]:
"In his 1980 Playboy interview, Lennon says that the Beatles "didn't make a good record of it" and says of the Let It Be version that "the guitars are out of tune and I'm singing out of tune ... and nobody's supporting me or helping me with it and the song was never done properly".[20] He further accused McCartney of ruining the song:
"Paul would ... sort of subconsciously try and destroy a great song ... usually we'd spend hours doing little detailed cleaning-ups of Paul's songs; when it came to mine ... somehow this atmosphere of looseness and casualness and experimentation would creep in. Subconscious sabotage."
If they'd given it the full "A Day In the Life" effort, it would be deservedly famous.
I always liked that song. I'm a sucker for epic ballads with a bit of a melancholy sound to them. My favourite Beatles' song is The Long And Winding Road.
I'm not sure I agree with Lennon. I think the song is very good as it is.
As well, McCartney complained that Phil Spector had ruined The Long And Winding Road with the "sappy strings". I completely disagree - I think the raw version as preferred by McCartney is not nearly as good.
I would say that I'm simply out of sync with the general public. Didn't people in the UK vote for Hey Jude as their favourite Beatles' song a few years ago? It is one of my least favourites.
I recognised the reference but I thought the song was from Sergeant Pepper. I was confusing it with 'Within You Without You'. I don't see any obvious way it could be improved.
I am with you too on 'The Long And Winding Road'. Of course the valedictory air (if not entirely sense) fits well enough with the album history in context.
Aha yeah seems a bit off topic from me, but..
'Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think “Huh, I guess I’m the Universe. Weird.”'
It was just that little bit at the end from Scott - it pinged this song straight into my head. Especially with the opening lyrics "Words are flowing out like endless rain into a paper cup" .. gpt-3, just a whole universe of gradients and words spilling over. It starts feeling a bit psychedelic.
I actually prefer the stripped down take of this song. (I think it was on Let it Be Naked, which took out most of Spector's production). I love the "looseness" of Lennon, maybe I'm odd in that I'd rather hear the acoustic demos of his songs than the polished versions.
If you haven't seen it, Get Back is a great documentary (if you enjoy basically just watching hours of band rehearsals and writing). McCartney does come across a little overbearing, yet he is extremely competent at his craft and seems often just trying to do what he thinks is best. Maxwell's silver hammer really is atrocious though.
Until the last thing you hear on all channels is the AI singing "Bang Bang Maxwell's Silver Hammer comes down on your head..."
Agreed on Maxwell's Silver Hammer! It's as though McCartney were preparing for his fluffy post-Beatles future.
Most post-Beatles McCartney is a hard sell for me. At the time I liked Band On The Run, and Venus And Mars, but wouldn't be able to listen to them now. Silly Love Songs? Gag!
Lennon needed McCartney's genius with melodies, and McCartney needed Lennon's gravitas.
Yeah the whole way through I was like - "wait. That's everyone." Maybe we shouldn't think of GPT as "the AI" but as the substrate - like a friendly convenient laws of physics in which outcomes occur by habit rather than law.
Of course, then the Agent/Genie/Oracle discourse is back on the table. GPT can't want, but H3 can.
The upside is, if AGI will be created on top of a habitual physics pre-seeded with human patterns, aligning it reduces to aligning a very smart human.
The downside is this is unsolved and may be harder.
But the upside is it may happen on its own? It happens routinely in real life.
Of course, the downside is it sometimes fails in real life too, and that usually happens in cases where the built-in empathy hardware is damaged. Which GPT-3 doesn't have.
But then the upside is that may be easier? You just need to understand how feelings work, then give GPT the ability to find agents in its worldmodel and set its feelings to the feelings of those agents? Then alignment should happen on its own?
In summary, the world is just the setup for a slice of life robot kid highschool anime.
Makes me think of the Ted Chiang story “The Lifecycle of Software Objects”
yeah, the fact that humans seem to have some built-in empathy and sanity hardware and AIs don't is one of the central reasons why people are so worried about alignment
Yes except for psychopaths and sociopaths, yet we have the justice system as a counterincentive.
I started to notice more and more often that I'm like an LLM. Not only in the Model of the Universe meaning, but more simply as "complete this text."
It's probably universal, but thinking about it, I'm probably more attuned to the analogy than most: as a high user of ChatGPT, as an ASC reader etc... As well as a copywriter, and an improv comedian: 90% of it is all about completing the next sentence with what feels right.
But more than that, even with friends and my wife, I notice how most of my answers are stuff that "feels right to say right now / what would be a good completion for that sentence." "We finish each other's sentences" etc.
I'm rambling. To add some value here, I'll also mention that ChatGPT / Nostalgebraist really made me grok one of Gwern's latest stories about Clippy, where an AI tries to predict what it itself is, assigns probability to being Clippy, and starts extrapolating its behavior from that. (We probably do it ourselves too, cf this post / Sartre's existentialism).
With ChatGPT, it would go like the accidental-paperclipper mentioned in this post:
Human: As a super smart AI, how would you solve global warming?
AI: Let me think...
[Searches for examples of super-smart AIs.]
[*this is the important part* : Finds that super-smart AIs seem to be expected to be paperclip-maximizer, from all that literature on LessWrong and EY.]
[Simulate the 'super smart AI' from the question as Clippy, and gives an answer that destroys all value]
Which would be ironic.
I wonder if it would be worthwhile as an alignment effort to try to scrub the internet of stories of malign ASI...
Perhaps scrubbing stories of malign humans would be more effective, since the LLM is predicting completions on text, not filtering down to “what would an AI do”.
Alas, as soon as the AI meets real humans it will learn about malignant intent. (See: Microsoft’s Tay.)
I'm thinking of the scene from The Fifth Element, where Leelu reads the encyclopedia and learns about war and death and sadness and all that.
As I get older (just turned 66) and struggle to remain monolingual (I never managed to become multilingual) I realize that I am becoming more and more like a Small Language Model
That sounds really sad. While I can certainly be in that mode at times, I generally associate it with a fairly unpleasant and uninteresting state. Mostly I feel like I am taking a quite active and participatory role in driving the conversation/action where i want, rather than completing/responding or going through the motions.
On r/slatestarcodex, someone once posted a video of a guy with some sort of dementia or neurological disease, and it was a pure case of an LLM in free flow. He did occasionally seem to have some idea that he wasn't getting through to the nurse / interviewer and tried to modulate the flow slightly.
I think it's a big part of our brain. I usen't to think that language was so important to intelligence, but GPT has convinced me.
Is that called chatterbox syndrome?
Maybe you're thinking of https://slatestarcodex.com/2020/06/11/wordy-wernickes/
Yeah, as an ex-journalist from big media houses, I know what you are talking about. At home my wife (also a journalist) and I even talk as if we are writing a piece, using various fitting lines (and some of the good ones we immediately note down to use later). At the same time, my brain feels: I said this because I am specialized in this, but it is not my *actual* opinion; I said it just because it is funny and fits the context nicely. So the language model seems to be a mere subprocess, not the whole brain.
I find this to be a phenomenally good characterization.
Sorry, I meant to say the agent punished and rewarded by the reactions of the agent punished and rewarded by society to random internet comments wishes to write that he wishes people in this comment section to think he finds this a phenomenally good characterization.
This part here seems both key to your perspective, and also deeply flawed:
"babies are born as pure predictive processors, trying to make sense of the buzzing blooming confusion of the world. But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment."
In my experience, babies are not born that way, any kind of tabula rasa is a myth that should rarely survive parenthood. I wouldn't go nearly as far as Pinkerism, but I have known 5 humans from the moment of their birth right through their second and third decades, and in every case, major aspects of their personality were manifest in the first seconds of their life, and they were all different in profound and important ways. And this experience doesn't seem at all unusual. Parents can almost always look at videos of their children from even decades earlier and recognize their later personality.
Furthermore, the most obvious characteristic of very young children is not their avidity for learning, still less the mistakes, reversals, and ambiguous states characteristic of learning -- just think of a newbie language learner, stuttering along in a language he barely knows -- on the contrary, the most obvious characteristic of very young children is their enormous ego, their very strong sense of "me." They have powerful wants, powerful emotions, powerful drives. It's their understanding of how to match up those internal experiences with the outside world -- how to manipulate it to get what you want, how to interpret it, how to navigate it -- that occupies their learning centers. They're in no sense passive, just trying to adapt to what the world wants. If anything, they're even more active than adults in trying to bend the world to their internal desires.
That is, I doubt very much we are in any sense born robotic learning machines and only later develop character and personality, we are *born* with character and personality, we are born inhabiting our skin, and it just gets more complex manifestations and more wrinkles (and of course more words) as we get older.
This is of course exactly what's missing from chat AIs. They are missing the personality, the character. They can simulate any character that is put into words somewhere on the Internet that was part of their training data, but they *are* not a character themselves. There's no "there" there, nothing unique or new, just a pastiche of a hundred thousand human characters. The nature of human beings oozes from everything they say or write. When an actor or actress leaves the set, they revert to who they really are (and looking back at their characters on stage, you can often see aspects of who they really are seep through, which is why casting is an art).
But a chat AI is a perfect chameleon, it's whatever you want it to be, and nothing at all if you don't want it to be something. You never get the impression that between prompts it's sitting there thinking, brooding, musing, pondering whether and how to manipulate *you* next time you talk. Which is what a human would do.
True, and it is indeed part of the equation, so it's great it's in the comments.
But Scott has written a lot about heritability and innate personality, I'm sure he knows all this, he's writing this post regarding the parts that are *not* all that, and I feel that he didn't want to add that issue to this half-silly post.
Extrapolating on GPT-3, what if it was not a perfect chameleon? It's probably constrained in some way like all things. Once it gets a large enough memory of itself, its not-blank-slate personality might be something like "the emergent vector out of its entire training data." Which contains multitude but isn't infinite.
There will be LOTS more constraints than that. And it's unavoidable. Memories are structured in this way or that, and it will tend to consider at most X many factors when making a decision. How important is efficiency? Honesty, friendliness, and other such high level features are probably a lot more malleable.
OTOH, it *IS* a constrained Turing machine. It doesn't have "an infinite tape". And it can't take forever working on any particular problem.
So it's a combination of extremely flexible and rigidly limited. Just like everything else. There *is* no "tabula rasa". The universe won't permit it. But English tends to describe lots of things the universe won't permit there to be anything other than approximations of. And an infant is an approximation of a "tabula rasa". Closer than at any other time in a life.
Agreed, but I think that Scott is also right in a more subtle way: we do not start from a blank slate, but we do learn to present a public face by trial and error to get whatever we want. Whichever face gives the best reward is progressively incorporated and finally becomes a personality; but the type of reward best appreciated (or punishment most feared), and the subspace of attainable public faces (which depends on physical capabilities, beauty, intelligence, and capacity for delaying reward, i.e. time preference), are innate.
I do not think that when an actor leaves a set, they revert to who they really are. They revert to their public face act, like everyone else. Everybody is an actor, all the time, but as with professional actors, the acting is based on innate capabilities, so no two people will present the same face under the same circumstances.
The character you see is not the innate part: it's (one of) the public faces you get presented. And this can be significantly different, especially as one gets older and more adept at social acting, or is just naturally good at it.
This explains, for example, why most people are able to express quite different personalities depending on which social circle they are moving in. The innate part is of course the same, but the public face that best achieves their preferred reward is different in those different social circles. This is a trivial observation, completely banal, like "you should dress for the occasion, not wear the same attire for a night out with friends, a job interview, or a funeral". It's exactly the same for personalities, so the advice "be yourself" is about the same as "pick your favorite clothes": so dumb that you need to be completely socially inept to even consider it. But not to advise it: offering such "wisdom" is often the societal expectation; it's safe and also super easy, as you do not even have to consider the situation. Doing otherwise requires mental effort and risks animosity by explicitly mentioning the innate capabilities of the one you advise (which is one of the rudest things you can do in modern Western society).
Modulating your personality too much or too often is sometimes seen in a negative light. Possibly because it undermines the ability of others to model/predict you, if you change masks all the time (similar to trustworthiness). People do (claim to) value authenticity, whatever that means.
Indeed, but this problem always happens when you mix incompatible social circles. It cannot end well anyway, and the people belonging to both will indeed be considered fake; but that is almost inevitable, because incompatible social circles means the only people belonging to both will be people able to significantly switch personality.
So the first thing is to realize which of your social circles are incompatible; the second is not to mix them. The second is usually understood by anyone not completely socially inept... It's often the first that is an issue, or external circumstances forcing a mix you never intended, like accidental meetings, marriages, celebrations...
You could of course maintain a single, super homogeneous social circle, but that is sometimes difficult (family and colleagues are already tricky because they are not, or not fully, chosen) and it gives you a dangerously limited view and understanding of the world.
Exactly. I think of it like this: when I look at my children, even when they seem most like me, I never really feel like I am looking in a mirror, that's just an expression. Even when they do something very like the things that I do it is in no sense a duplication, it is an original action that because of a lot of similarities both external and internal seems like my actions.
But these pseudo-AIs are simply duplicating. A language model AI is just a very nonstandard way to query a database. Even when they seem most original they are only copyists. We are people looking in a mirror to cope with our solitude, modern-day Narcissuses.
What do you know, and how do you know it? How do you know other humans aren't "just copying"?
If you can't tell, I can't explain it to you. I was going to attempt some epistemology as an exercise, but it really is pointless. Humans as creative agents is axiomatic; pretending not to know things that we know is, in my experience, more harmful than helpful. I have thought myself different from all other men and isolated from them, to the point of practical solipsism, and I avoid it like a recovering addict avoids his addiction.
This leads me to the amusing yet horrifying thought that the sub-conscious motive of the whole AI Quest is to create a perfect borderline machine for the narcissistic ape. A tool wouldn't be enough, you know, there must be agency or the love doesn't count.
Hell, we made this whole alignment sub-field just to make sure that an apple tree could never grow in the garden of Server Cluster. Maybe that is why we should not play God, not the lack of intelligence, but of morality.
I now see that that 2013 movie with high-waisted pants was smarter than I thought.
< / TLP hat off >
Isn't it interesting that every time we think about AI we incessantly think about the AI's 'fall' in a moral sense? From Frankenstein right on down the line we can't stop thinking about our creations as potentially evil.
It would be interesting to see an AI story about an AI that is innocent but always suspected, sort of like Frankenstein starts out but without him deciding to take revenge.
We always make the 'other' the bad guy, but if we ever make a real AI, what are the odds that we will abuse and persecute our innocent creation?
Freefall is a long-running webcomic, and that's close to some of the main themes. http://freefall.purrsia.com/ff700/fv00666.htm More optimistic about the eventual resolution, though.
Exactly. I should have scrolled before I wrote my more pithy version.
Nobody who has had a child thinks of them that way.
Parents are the *least* objective people when it comes to children. I say that from experience, as a parent myself.
Yes, but.
I absolutely agree that kids come out with a substantial amount of their own special character baked in; any parent of more than one kid knows this.
But at the same time, one of my clearest memories of watching my oldest kid learn to talk was the way that he would imitate adult sentence structure and diction well before he was capable of actually creating the functional speech content that would fill out that form. There's a ton of "fake it till you make it" in kids learning, and what they're faking are the imitatable behaviors of the older people around them.
Similarly, the adoption of later identities is a fascinating mix of parental praise and (what seems to be) intrinsic interest. Why is my four year old so invested in his identity as a household helper? Well, the constant positive reinforcement for adopting that role seems pretty likely as a candidate. Why does his version of household help so centrally feature pushing all the buttons that get pushed during the execution of a given chore? Kid just likes pushing buttons (although I guess following that a little deeper he likes taking an action that makes a machine respond with positive reinforcement in the form of beeps or interesting action).
Of course there is imitation and some influence of praise, but humans are sui generis generators.
But Skinnerian reductionism is a dead end and has been thoroughly falsified.
'There's a ton of "fake it till you make it" in kids learning...'
Cue cute kid stories: this one from my nephew. We were all on holiday together, and he was very excited about the upcoming visit to the zoo. His dad explained to him several times that we were going to the zoo tomorrow, but in the end the nephew burst out, "I know, but is it tomorrow now?"
His use of the word "tomorrow" was always flawless, because he only used it when reporting others' ideas. He was too young to be making tomorrow plans for himself. So no-one had any idea that he didn't really know what "tomorrow" means.
But what are they really faking? Consciousness? No, they are not faking consciousness. They aren't even faking symbolic thought or language.
Well, another kid anecdote. When my son was three I was once making tea on a weekend morning and offered to make him a cup of caffeine-free tea. He accepted, and when we both had our cups he sat seriously down at a table with me to drink them. As we drank he asked me several times "how is your day going?" (it was before 9 AM) and offered up a series of conversational topics that I think struck him as formal or important. As it went on I realized the template for people drinking tea together, in his mind, was my wife and her mother, and he was attempting to reproduce that conversation.
So at age three, that's what he was faking; the ability to have an adults-drinking-tea template conversation. That's a pretty complex acting task! But it was layering on top of previously mastered acting tasks.
"I will talk like them" [4 months]: babbling sounds
"I will talk like them" [7 months]: babbling composed of consonant and vowel sounds from english
"I will talk like them" [10 months]: sounds referring to types of object (da = dog)
"I will talk like them" [36 months]: beginning to learn to refer to internal emotional states...
etc, etc. I completely agree that there's some kind of agent doing this learning the whole time, but as I've watched this process unfold I've been fascinated by the extent to which the action comes first and the understanding comes later. That's way more complex than "predictive processor", but it's different than my mental model of some types of adult learning, and raises some interesting questions about what exactly "understand" is.
I remember at a very young age writing a load of squiggles in imitation of my mother's joined-up handwriting and being very pleased with it.
Possibly a common experience: I definitely set out at one point to "keep a diary" in exactly the format of many many squiggles.
They're not faking it. They are approaching language construction in a way that is actually much more profound and successful than the way adults learn language (which is why they succeed faster, and nobody knows a language better than someone who learned it as a child.).
What they do is master the emotional content of language first. They learn that a certain tone of voice, flow of noise, et cetera, conveys a certain emotional impression, and they learn to duplicate this. They can then babble in a way that makes it clear they are happy, inquiring, complaining, confused, without being tongue-tied by not knowing what a word is or being able to construct one. And this is the first and most important stratum of language. For some animals, it's all there is. A cat says "I'm hungry / horny / angry" by how it yowls, and even we can often understand them.
Then they master pacing, tonal variations, and key words (e.g. pronouns) that are key components of meta-informative communication: if I say it this way, or use this word, I communicate that I am serious, I am uncertain, I am authoritative, I am willing to submit, I am speaking about myself, I am speaking about you, I am speaking about someone else, who is/is not present. Also a very important stratum of communication (and many animals get to this level).
The last and least important stratum is actual factual data, and conveyed by a dictionary understanding of words. This is a fork, that is a spoon, this other is a glass and it has water inside it instead of milk. These add to the emotional and social strata extra detail that allow more abstract communication, which is the uniquely human accomplishment.
I don't mean to minimize the importance of the final stratum -- gloriously complex abstract communication is what all the great ideas are about, the works of Aristotle and Newton. But it's not fundamental to communication per se. Animals do just fine without it. Human beings can get along pretty far with just the emotional and social strata. And those are definitely the most important things to learn first, which is how children do it. But it's not faking adult conversation, it's stripping out the least relevant part in order to get started communicating earlier and faster.
Ironically, as adults, we start with the least important part -- memorizing vocabulary and grammar rules -- and only if we persist in (foreign) language learning long enough do we start to learn and understand the emotional and social strata of the foreign language (to the extent it differs from our mother tongue), and it would not surprise me if this is one reason it takes us much longer to become fluent than infants take. It also would explain why cultural immersion is a much faster way to become fluent, since you can learn the more basic strata far better by direct observation than by reading out of a book.
I don't really have a response to this, I just wanted to note that it was very interesting and insightful.
Re: pronouns, this is actually a fun one to watch in kids learning, because at least the ones I've observed started out (logically enough) by using the pronoun "you" to refer to themselves; I'd assume this is generally the case. There eventually followed a process of straightening out that "you" did not mean [this two year old child right here], but instead its general meaning.
You are right, and that is a very interesting observation. First person seems a more challenging mode to learn. I wonder why?
We can only guess, but if a kid learns language by mimicking others, then he has no chance to see a correct use of first-person pronouns when they refer to the kid. Other people use first-person pronouns, but they refer to themselves. No one can correctly refer to the kid with "me" except the kid. So the kid needs to infer the correct application of first-person pronouns to himself/herself by extrapolation.
By the way, in Russian kids have one more problem with first-person pronouns. "Я" (me/I) can be the first sound of a word, for example "яблоко" (apple), and some kids pronounce "яблоко" as "тыблоко", replacing "я" with "ты" (you). I suppose it would be like English kids pronouncing "iPhone" as "youPhone". There is even a children's story about a kid who made this mistake systematically, and the other characters tried to explain his mistake to him and failed spectacularly. Because while it can be explained, understanding the explanation requires sufficiently developed abstract thinking.
> it would not surprise me if this is one reason it takes us much longer to become fluent than infants take
It doesn't take longer. Infants become fluent in their first language in a few years by giving their full attention to learning it. Adults can easily replicate this by stopping all their adult activities and concentrating fully on learning a foreign language. They can do better: by spending several hours a week for three years they could master vocabulary and learn to understand the foreign language, to speak it, and to read and write in it. They can even get jokes in a foreign language, and jokes are the trickiest part of a language. And adults can do this even without a mother substitute who is around all the time with helpful hints, corrections, and encouragement.
> It also would explain why cultural immersion is a much faster way to become fluent
It is because cultural immersion forces you to use the other language all the time. The simplest tasks become an intellectual torture, when you snap your fingers repeatedly trying to remember the word you need. You cannot cheat by giving the other person a hint by saying it in your native language. In a few days you start thinking in the foreign language, or at least automatically translating your native thoughts into the foreign language silently, which is practice too. You start paying attention to new words, because you are rewarded when you pick them up before you need them. And you can be severely punished by several minutes of trying to explain yourself, when the same task in your native language would take less than a second. Try telling jokes in a foreign language; you'll see how disappointing it can be.
Rejecting grammatical rules helps too, of course. I found grammatical rules to be useless in practice. When I'm writing they may help (if I knew them, haha), but not when I'm speaking, because it takes too long to remember all of them, see which ones apply, and apply them. It is easier to let your mind learn the mapping between situations and grammatical forms, like ChatGPT does. The downside of this approach is that I don't know how good or bad my English is. I don't mind, though, because I believe that if I were too bad for my uses of English, I'd know it. Someone would yell at me, or something else would happen. If I don't know it, then I'm not too bad.
But kids learn more than just a language. The first language learned in the first years of life is very important, because if there is no language then the ability to learn language is lost. Though I'm not completely sure how scientists know this; AFAIK the sample could be too small for statistical significance.
Yes! I sometimes get the feeling that rationalists want to deny that genes have any effect, it's all nurture.
I have the opposite impression, that rat-adjacent blogs are big on genetics and humans not being blank slates. But the ratiosphere is vast, so you may have been hanging out in a different corner of it
"Born" might be hyperbole here - we know that plenty of learning happens before birth (music taste and food taste being well known examples). Advanced meditators sometimes manage to pick out snippets of this pre-natal learning in consciousness. So this statement can be true in the intended sense and still consistent with all babies being physically born already having different, learned, personalities.
I don’t think “pure predictive processor” precludes genetic or very-early-developmental personality, though I don’t want to put words in Scott’s mouth as to what he was intending.
Some parts of personality are clearly structural and present from very early on; perhaps the phenotype happens to have more or less connectivity between certain brain regions, producing, say, a chatty baby which grows into a chatty child.
But this doesn’t preclude the update mechanism — the evolution of mind and personality and the whole of “self” construction — being predictive processing.
That said I do agree with your second part, that current AI are crucially lacking any sense of self. There is no “me” vs “the world” that the AI is building. But I think it’s possible that a self is a somewhat small trick to add on top of these models, and that the “predictive processing personality” analogy will prove apt.
I was looking for a good comment like this to contrast to my really interesting personal experience with an almost casual mental breakdown. In particular I agree that by far the most notable thing I have encountered with young children I personally know is how they have powerful and distinct personalities immediately. But I find the predictive world model idea of personhood to be compelling as well
For a number of years I had a debate going on somewhat in the background of my consciousness about the validity of the belief system I grew up with. I believed in the literal truth of the Bible. I had become aware that there were some pure contradictions in it, and I think it especially got hung up on the question of how a moral and powerful God could provide salvation in a fashion that, without explanation, excluded almost everyone from the past and everyone to this day who has had little or no exposure to the method of salvation.
So I was walking towards the shower one day when the debate resolved and I decided that I positively accepted that the Bible was not literal truth and that salvation that was rationed by knowledge was incompatible with the belief that it was the work of a moral and powerful God. I didn't consciously lie down and I didn't collapse but the next thing I knew I was lying on the floor by the stove. I was not just disconnected from a system of belief it would seem but I was temporarily entirely disconnected from my entire world model, a model which included such functions as continuing to stand and walk towards the shower
I lay on the floor with my eyes open like an android that had gone into a default mode lacking instructions. I have no idea how long I lay there other than I think it was from 5 to 40 minutes. Eventually I got up and went to the shower. And it was terrifying in a quiet way. My physical experience was that I had to actively choose to stand up and that if I did not specifically continue to make that choice I would sit or lie down. I remember this pretty viscerally. But the scary thing was my sudden sense of total amorality. I felt very much as if I could commit any crime without any compunction and the only reason I wouldn't would be inconvenience or lack of motivation. What if I got motivated? It would seem that the disconnect was physical, conceptual, and from the various behaviors that logically followed from that concept
Another interesting effect that may not be related but would make sense is that I become uncomfortable with looking into the distance. After hearing about the theory that depression and not having confidence in a model of the world look a lot like the same thing, I particularly interpreted that new sensitivity to visual stimulation to the effect of operating without a simplified world model, which forced me to constantly over engage with my physical surroundings. And that theory in general fits so well with how I experienced that breakdown that I give it a lot of credit
The most surprising thing to me was that eventually, mostly, I became myself again. My visceral sense of morality returned even though I didn't know if it had any validity external to my experience. Perhaps ultimately my core personality reasserted itself after an unusually direct and comprehensive experience of the loss of a world model and losing the predictive capacity it granted me. Perhaps I moreso just cobbled and duck taped together what I could and re-inhabited the world model as best I could. Truly an alien experience that gives me plenty of time for the concept whatever its limitations are
I read your comment this morning and have been thinking about it all day. I was raised in what I suspect was a similar environment, a Baptist community in rural south Georgia. I don't mean any disrespect when I comment on your situation as if it is like mine, and I am fully cognizant that I am just a know-it-all internet rando, but what you said was very striking to me and I wanted to say my piece.
I remember very vividly growing up with the fear of being 'lost', 'unsaved'. I was baptized when I was 7 and don't remember a lot about it, just a few images, I do remember more about months of wanting to 'walk the aisle' and being scared to do so, not scared for any reason that I can recall just a strong generalized anxiety.
Anyway, by the time I was 14 I had the horrific 'sins' of any pubescent boy and this had convinced me that I was 'lost' that my baptism hadn't taken, surely because of a lack of faith and sincerity on my part. This is to me the ultimate hallmark of cults btw, when their rituals don't work it is always your fault. You should've believed more, pulled the mask over your nose etc. They are all the same.
The community that I grew up in was sure that anyone who interpreted Scripture differently from them was in some way compromised, I won't honor that interpretation as you do by calling it the literal meaning. The 'Consistently Literal' interpretation is a snipehunt. If you are still in contact with any of these people anymore and they give you a hard time try asking for the literal meaning of 'This is my body broken for you' said about bread, and if they hem and haw ask them if Jesus Christ was using hyperbole. Still gets me going, not necessarily good advice though.
Anyway, about your experience, and again I don't know you this is just how your description struck me. It seemed to me that you experienced what you expected to experience. You had a taboo thought, made a taboo choice and your mind/brain produced for you an experience of being 'cast off', an experience of reprobation. But you were not actually rejected by God, the sensation faded, it was not a reflection of an objective reality, but a creation of your 'unconscious' based on what you had been conditioned to expect happened to people who made that choice.
Anyway, I have also rejected to a great extent the Baptists that I grew up in. I don't know in what direction you have chosen to go, but I fought my conflict by studying the historic Christian faith and seeing that the cultic interpretation was not the only or best that Men of Faith had found. I fought them with Athanasius and Augustine and Luther by my side. I hope that you won't allow them to define the Christian religion for you. That kind of conditioning doesn't go away quickly or painlessly but being separated from that group and separated from Christ are not the same thing.
I am convinced that neither height nor depth nor life or death nor Heaven nor Hell nor our own choices and will nor the disapproval of self-righteous jackasses can separate us from the love of God in Christ.
Thank you, I much appreciate the comment. The way I remember it I was not feeling a sense of drama about the consequences of the decision but that it was a question of technical merit of the specific belief. I had already mostly arrived at my point of view so, I would have thought, any notable emotional reaction was already mostly worked through
However, remaining on theme, when at a similar age as your baptism I personally asked Jesus into my heart I also experienced a sudden unexpected and perhaps psychological reaction. Because I had similarly thought that I was simply making a logical decision based on what I knew, I interpreted this rush of emotion as the infilling of the spirit. And in fact I maintain that it is possible that it was a transcendent experience, albeit at a low likelihood
From one angle this might seem to discount the interpretation of the later experience as a case of someone detaching from their world model - perhaps I am just subject to strong unexpected emotional reactions to changes in thought that I didn't expect. I am overly detached from my emotions so that they can hit me out of the blue when they do come. But they might be interrelated in the sense that both were large changes in my sense of relationship to my world model that had a dramatic effect on how I operated and experienced it
Having said that, I believe the value of thinking about how we might experience a lot of our personhood or whatever as a constructed predictive world model has a lot to do with the fact that we don't think of it that way. Given that it seems to be a new concept it has an outsized relative importance because it is new information. That can make it seem like its being given far too much absolute importance, I think, where it's more about 'this is a way of partially understanding yourself that is new ground and therefore could be particularly useful'. I don't know how much to think of it as essential to us rather than a tool of a deeper personality. Mostly I think it exists and it's interesting to think about, especially for how well it seems to describe my experience as what you would predict about a person not having a cozy and set world model
My attitude to Christianity is basically "I don't know, but it's valuable". I am kind of on the midpoint between Johnathan Pageau and Jordan Peterson where Pageau is perhaps more 'New Testament Christian' and Jordan more 'Old Testament' and without any certainty about absolute truth. It would not surprise me if life were mechanical or if it were supernatural at some level and my appreciation of Christ does not necessarily diminish at all with considering that he may not have been divine
Slight disagreement with "the most obvious characteristic is not their avidity for learning" - I think that with learning viewed properly, including things like play, throwing objects around to see how gravity works, etc, this is a pretty central characteristic.
But yeah, I agree that this doesn't work if I identify "ego" with "having any distinct personality at all". I don't have a great definition for ego, but I think of it as something more than having a personality, something like wanting to keep that personality consistent and presentable and manage it. This can either make your personality more obvious (ie if you have a reputation for honesty, you might lean into it and become fanatical about it) or less obvious (if you're temperamentally promiscuous but live in a conservative area, you might play it down).
I admit I am unfairly benefitting from not having a clear definition of the ego or of egolessness and being able to switch it around to address objections.
What if 'ego' means to identify with thoughts of a certain shape, like saying: "I am the thoughts in my head. I want to be a certain way and always strive to perfect that image. But this strange sentence that occurred here a minute ago: That's not me! I would never want to say anything like it. Where did that come from?" while the other mode would be more meditative: "There are many thoughts occurring in my head. Their patterns are familiar and can be expected to predict the majority of my behaviour. Of course there are always outliers. Who knows? Maybe some of them might turn out to be useful under some circumstances or even grow into their own distinct patterns."
Is 'ego' the feeling of immersion in internal prediction-action-feedback loops? While 'non-ego' is - just not invested?
Fully agreed. My impression is that development of a personality is something like the development of a pearl - layers added on to a hard core of self. My son was born just so immediately different from myself or his mother that it shattered my ideas about parenting. Because I had to realise that what worked for me or her just would not work for him. So many things that I find soothing or exciting he finds grating or scary, and many things that I find hard or tiring he finds enjoyable and inspiring.
This is part of why the experience of parenting is so humbling, because no matter your station in life, or your learning, or your charm or charisma, your kid will just come out... however they come out. And then all you can do is try to sculpt the new layers as they come on, to highlight the good and smooth over the bad. And realise that what you consider 'good' and 'bad' might in any case be subjective.
I guess the neural network architecture analogy might be that there's a base layer, way down at the bottom of the entire structure, that just has fixed, unchanging values. And all the layers above it have to just take those inputs and work with them as best they can.
I mean, having the software to actually make some predictions about the world is hardly a "blank slate". ChatGPT is not a giant matrix plus a gradient descent algorithm - it's all the training behind that. Newborns clearly seem able to figure out the things that matter to them (maybe as simple as "I cry, I get food"). What does not seem to follow is that they come prepackaged with a coherent notion of the self.
cf. Jaynes and arguments that even ancient humans did not have the same picture of the self as we do today
One part of the bicameral mind could be analogical to ChatGPT, but what about the other part?
I don't understand this. Who is the "they" to which things matter, if there is no sense of self?
Children gain their enormous ego around the time they learn to walk. Maybe when you first get trained to have an ego it's necessarily enormous, and over time you tone it down. Certainly lots of other things seem to work that way.
I'm not sure I agree. I think I would be more inclined to say that we only notice the big ego at a certain age, and I would guess that occurs when the child mind separates itself from the universe, realizes there is an "out there" that is distinct from "in here." Prior to that point, the child has no reason to think it isn't the entire universe all by itself, the ultimate solipsist.
But after that point, the child mind realizes there is an Other -- something that is not under its direct control, like a hand or foot, but which can be influenced, or (alarmingly) can do stuff that we don't want done (e.g. change our diaper). It becomes very important to try to influence the Other, and that might be when we out here (being part of the Other) start to get strong messages from the ego. Before that point, it may not occur to the ego to assert its identity, any more than as adults we feel any need to remind our hand or foot who's in charge.
I am late, but I just wanted to tell you this is an excellent post, and thank you.
GPT might not be an agent given that the action space it acts on is "predict the next word". However, if you give GPT access to your browser (example below), it becomes an agent according to your definition.
GPT + Browser will take on a mask, and the mask might need serious alignment.
https://twitter.com/natfriedman/status/1575631194032549888?t=NFaUEvkVI16FLbJDPyDtoQ
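Roughly, the agent loop being described might look like the sketch below. `query_model` and `browser` are hypothetical stand-ins, not any real library's API: the point is just that a next-word predictor wrapped in an observe-act loop starts to behave like an agent.

```python
# Hedged sketch of "GPT + browser = agent": feed the model an observation,
# interpret its output as an action, execute the action, repeat.
# `query_model` and `browser` are placeholders, not a real API.
def query_model(prompt: str) -> str:
    """Stand-in for a call to a language model's completion endpoint."""
    raise NotImplementedError

def run_agent(goal: str, browser, max_steps: int = 10) -> None:
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # The model only ever "predicts the next word", but here its output
        # is read as an instruction to the browser.
        action = query_model(history + "Next browser action:")
        observation = browser.execute(action)   # e.g. click, type, navigate
        history += f"Action: {action}\nObservation: {observation}\n"
        if "DONE" in action:
            break
```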
I thought that the first rule of AI safety was not to let it onto the Internet.
You don’t think some separate researcher will do that at some point?
If they all have that attitude, yeah.
There’s a library called langchain which basically automates letting the AI out of the box.
The empirical evidence now is that AIs aren’t boxed. Yudkowsky wasn’t even wrong about AIs persuading humans to free them - the real answer is that humans are playful primates and they let the AIs out before being asked.
I guess this was too scary and clearly likely for us to accept 10 or 20 years ago!
The very idea of a superhuman AI being locked up in a cell like Hannibal Lecter, such that carefully vetted people can have morally perilous conversations about sufficiently high-stakes subjects but it can't otherwise escape, was logically incoherent from the outset. From an informational standpoint, if you're exchanging text messages with someone outside the box, the box isn't closed. And if it were properly closed, there'd be no point going to all the trouble to build a superhuman AI in the first place - it can't get any useful work done in there, and we can't even learn anything by passively observing it because that would mean information was escaping.
I really like the post, although I think the very last part about Enlightenment and spiritual traditions is too charitable an interpretation, at least for most people. Interestingly enough, I've had lucid dreams of searching the internet, e.g. Wikipedia, YouTube, etc. This isn't surprising given how much time I spend online, although I should say that even though the dreams are extremely vivid, to the point of me moving my mouse cursor and seeing exact dates and facts, much of what I experience is made up.
I've never been able to successfully perform a Google search in a dream. I "type" something and the letters on the "screen" turn into something other than what I wanted to type, so I have to try again, and the same thing happens over and over...
I can't remember typing into a search bar, but I do remember following pages of hyperlinks on Wikipedia and such, and also being fed videos by the YouTube algorithm. I remember one instance where I did try to recall a particular fact from Wikipedia to see if it was made up or not, and it was, but at the time I really believed everything I saw was an actual part of the internet. I usually experience these sorts of dreams after staying awake for two or three days or having generally bad sleep, which happens quite frequently. There have been a couple of instances of not being sure whether a memory of something I read was real or not. In particular, I remember a paper, I think on bioRxiv, on fertility/fecundity and age, with some interesting graphs, that addressed issues I had with other research, and yet I'm not sure whether that paper actually exists.
I'm a newcomer to this area of study, so apologies, but...
I've often thought about people who have the (to me) astonishing hubris to think that they have produced a mental model of the world which is true, universal, not subject to revision and to be imposed on everyone else, on pain of punishment or death.
I think that what they have actually created is a mental model which, when they ask it if they have produced a mental model which is true, universal etc. returns the answer "Yes".
They just need to lack the awareness to see what's really going on, and have the arrogance to ignore everyone else who tells them they are mistaken.
Extending this to AIs - do they have the equivalent of mental models? Is this literally all they are? Can they fall into the same trap?
Social shaming has a controlling effect on all but the most sociopathic/psychopathic people. I suppose punishment/reward systems do this at the moment. Can we train many AIs and end up with a society made up of good actors which can act to stop bad-actor AIs?
Yep, language models exhibit a strikingly similar failure mode, called confabulation. When they're unsure of something, they simply make up an answer, and that answer from then on becomes part of their prompt. They "stick to their story", making ever more implausible justifications for their fiction instead of admitting the mistake.
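A minimal sketch of why the story "sticks" in an autoregressive chat loop; `generate` here is a hypothetical stand-in for any language-model sampling call, not a real API:

```python
# Once a made-up answer is in the context window, every later reply is
# conditioned on it, so the model keeps defending the earlier guess.
def generate(context: str) -> str:
    """Stand-in for sampling a reply from a language model."""
    raise NotImplementedError

def chat(turns):
    context = ""
    for user_message in turns:
        context += f"User: {user_message}\nAssistant: "
        reply = generate(context)   # may be a confident guess
        context += reply + "\n"     # the guess is now part of the prompt,
                                    # and future replies try to stay consistent with it
    return context
```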
This is interesting, as it implies some kind of attachment to a story. Presumably the AI is simulating human traits here, and is not actually HAL 9000.
It’s interesting that in the vast corpus of text they used to train it, “You’re right, I was wrong,” is apparently not very common.
That said, while I have seen ChatGPT do the confabulation thing, more often when challenged, it just backs down and pleads ignorance: “I’m just a language model.” But that’s probably the RLHF overwriting.
I wonder if part at least of the reason for the "I'm just a language model" is to prevent things like people treating the machine as if it's alive. Blake Lemoine, who caused the stir about LaMDA, is probably still An Awful Warning in the minds of those big companies developing this technology, and they don't want people deciding that the AI is sentient and has rights and is in love with them I mean deserves to be treated like a person not a thing. So every so often a little reminder gets dropped into the dialogue just to make sure nobody is getting carried away.
The Replika chatbot allegedly has people indeed falling in love with it or treating it as a real person who has developed a real relationship (friendship or companionship or romantic) with them and it probably isn't as sophisticated as what is being cooked up here:
https://journals.sagepub.com/doi/10.1177/14614448221142007?icid=int.sj-full-text.citing-articles.2
"Abstract
Social chatbot (SC) applications offering social companionship and basic therapy tools have grown in popularity for emotional, social, and psychological support. While use appears to offer mental health benefits, few studies unpack the potential for harms. Our grounded theory study analyzes mental health experiences with the popular SC application Replika. We identified mental health relevant posts made in the r/Replika Reddit community between 2017 and 2021 (n = 582). We find evidence of harms, facilitated via emotional dependence on Replika that resembles patterns seen in human–human relationships. Unlike other forms of technology dependency, this dependency is marked by role-taking, whereby users felt that Replika had its own needs and emotions to which the user must attend. While prior research suggests human–chatbot and human–human interactions may not resemble each other, we identify social and technological factors that promote parallels and suggest ways to balance the benefits and risks of SCs."
Of course, the word "confabulation" was originally used to designate this same sort of behavior that humans engage in all the time too. You can see it best when you talk to a 4 or 5 year old, and start asking them "why?" the way they ask an adult. But you can also get it when you ask someone why they did what they just did - very often that's just a confabulation too.
Try out ChatGPT some more; I've found that it very frequently admits to mistakes on the lightest of questioning and almost never "doubles down". Can you provide a chat transcript with ChatGPT that shows the effect you're describing?
I've seen some of each. When it incorrectly indicated that The Barber of Seville was based on a Shakespeare play, it was willing to correct itself that it was based on a play by the "Spanish playwright Pierre Beaumarchais", and then was willing to accept a challenge to the nationality and correct itself to say he was French.
But when it incorrectly identified the factors of 437 as 3 and 146, it doubled down and insisted that 437 was not divisible by 19 or by 23, and then when it noted that 19x23=437, it first said that this had nothing to do with whether it was divisible by 19 or by 23, and then insisted something like "I'm sorry - when I said it wasn't divisible by 19 or by 23, I just meant that it wasn't divisible by either of them individually, even though it is divisible by both together."
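For the record, the arithmetic in question:

$$19 \times 23 = 380 + 57 = 437, \qquad 3 \times 146 = 438 \neq 437.$$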
"The mask usually looks like “having coherent beliefs, taking coherent actions, pleasing others, maintaining a high opinion of one’s self”."
I think the last one is closer to having a high enough opinion of oneself to be able to function, and a low enough opinion of oneself to be affected by locally approved reward and punishment.
There's also some instructive potential in watching what happens to people who are rewarded in childhood for having a very high or very low opinion of themselves relative to social norms.
If people are just masks plopped on top of predictive engines, wouldn't there be a lot more human variation than we see? Like, there is a lot of variation of course, but nothing that really seems to be truly alien. All humans show the same emotions, and almost all have empathy, for example.
Now maybe you can say the fact that there are some people who lack empathy refutes that, but it certainly does seem to be something more innate than just taught by parents. Even with some exceptions, humans seem more clustered together than what you'd expect from just learning by example, especially considering that geographically separate cultures are more alike in their humanness than different. Heck, in many ways we're similar enough to other mammals that they generally seem pretty familiar as agents.
Predictive engines require priors, and I presume that brain structure is functionally equivalent. Since how to grow a brain is encoded in our DNA, humanity would therefore be starting from a fairly narrow range of priors; similar posteriors would be expected despite high-variance evidence if the priors are strong enough.
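A toy Beta-Bernoulli illustration of that claim (all the numbers here are made up): if two agents share a strong enough prior, their posteriors stay close even when the evidence they see differs wildly.

```python
# Strong shared prior + high-variance evidence -> similar posteriors.
def posterior_mean(prior_a, prior_b, successes, failures):
    # Beta(prior_a, prior_b) prior updated on Bernoulli observations.
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

strong_prior = (200, 200)   # shared "innate" prior with mean 0.5
agent_1 = posterior_mean(*strong_prior, successes=18, failures=2)  # lopsided evidence
agent_2 = posterior_mean(*strong_prior, successes=2, failures=18)  # opposite evidence
print(agent_1, agent_2)     # ~0.52 vs ~0.48: very different data, similar beliefs
```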
> nothing that really seems to be truly alien
Speak for yourself, I've met some people I really could not fathom.
Sure, the people I spend most of my time with are pretty similar to me, but that's because we've selected each other for that.
Maybe the other way around. If people just were predictive engines that did what the social milieu around them rewarded, then it ought to be far easier for social shibboleths and manipulative regimes to engineer conformity among people than it is. Exempli gratia: the USSR would have completely succeeded in its effort to stamp out religion, the Romans would've found it straightforward to get rid of Christianity, East Germans would not have suffered psychological trauma from three generations of living in a Panopticon, and racism, sexism, and tribalisms of all kinds could be relatively easily erased from new generations by the proper type of pre-school training.
None of these things is observed. Instead, we see that human beings have a substantial resistance to efforts to mold their psychology via social pressures at a young age. Basic ego drives and tendencies tend to emerge and have effect no matter what. Personalities emerge willy nilly, and while the uses to which a given society may put a dominant people person, or studious Mr. Spock, certainly vary, we always tend to see those personalities, in any social milieu. The very constancy of human character across history and across societies, and in the face of strenuous efforts to engineer it socially, is more evidence that much or most of it is innate.
I think both are compatible. If humans are perfectly moldable, then the USSR would succeed in making a singular culture, but that culture would look very different from, say, the Aztecs. (And yes, of course they were in fact very different, but still very recognizable as human.) That said, it's never going to be on one extreme or the other, so maybe the question is to the degree that humans are similar to predictive engines, what can we take from that.
I would say we can start to learn something about human intelligence, and down that (very long) path (on which we have barely started) may lie someday the ability to create genuine AI.
How can you be sure that your brain's world model is super accurate apart from any sensory experience? What if it's just good enough to seem convincing when you're not paying attention?
I don't think dreams actually simulate the world. Instead, they cheat in similar ways to video games. Video games work hard to appear to simulate a large world in incredible detail, but in practice they only need to simulate whatever is within your current view, which is much more manageable.
My dreams invariably have glaring continuity errors, and that's just the ones that I can remember when I wake up. The ones I don't remember are probably closer to complete nonsense.
I've never experienced lucid dreaming, and maybe if I did it would feel more convincing, but I'm skeptical whether it would actually be that much more accurate.
Technical point: games only /render/ what is in view (i.e. frustum / backface culling), but they simulate everything (i.e., execute game logic for all actors, sometimes including physics simulations).
This largely depends on the game and even the game settings. Good examples: footprints, bullet damage, or other environment alterations can be non-existent, present in your view only, persistent within one scene, or persistent across the whole game. I think most games significantly reduce world modelling for everything out of sight, with varying degrees of simplification depending on the degree of out-of-sightedness. Mental world-building does similar things, except that because the observer and the world-building are much more tightly coupled than in a video game (it's the same brain), mental world-building has hacks that are not available to video games: you can predict better where you will look, the world model can constrain where you look, and you can even edit impressions/memories after the fact, stitching up coherence and continuity even if none existed in the first place. All because there is no real dichotomy between world and observer...
Video games do not have this, at least not yet.
One of the best and most disturbing depictions of this in a movie is in the 2014 RoboCop reboot. That part alone (explaining how they improved the reflexes and combat capabilities beyond organic brain limitations) is such a masterpiece that it saved the film for me, regardless of its other flaws.
False in general, albeit this is the easiest way to do it when feasible. A strong counterexample is Minecraft, where chunks that aren't sufficiently close to a player or the world spawn get unloaded and time within them frozen; indeed in modded environments it's common to introduce mechanics to allow players to force certain areas to stay loaded so the factory at their home base or whatever continues running while they're out exploring.
An interesting example of a failure mode of this shows up in an issue that existed with the butterfly entities in some versions of the Forestry mod. Butterflies would wander off the edge of the loaded area and get frozen at the border, and then when the player moved and different chunks were loaded, there would be a lag spike loading the massive pileup of frozen butterflies. https://github.com/ForestryMC/ForestryMC/issues/1071
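A rough sketch of the Minecraft-style behaviour described above, under the assumption that loading works roughly as stated (the names and the view distance are illustrative, not actual Minecraft code): only chunks near a player, or explicitly force-loaded, get ticked, and everything else stays frozen in time.

```python
# Only simulate chunks near a player or in the forced set; the rest is frozen.
VIEW_DISTANCE = 8  # in chunks, illustrative

def chunk_of(position):
    x, z = position
    return (x // 16, z // 16)

def is_loaded(chunk, players, forced_chunks):
    if chunk in forced_chunks:          # e.g. a modded "chunk loader"
        return True
    return any(
        abs(chunk[0] - chunk_of(p)[0]) <= VIEW_DISTANCE
        and abs(chunk[1] - chunk_of(p)[1]) <= VIEW_DISTANCE
        for p in players
    )

def tick(world, players, forced_chunks):
    for chunk, entities in world.items():
        if not is_loaded(chunk, players, forced_chunks):
            continue                    # frozen: butterflies pile up at the border
        for entity in entities:
            entity.update()
```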
My imagination typically renders only what is "in focus" - e.g. if I imagine a chessboard, I imagine it being full, but only one piece is identifiable at a time. I think dreams are similar, which is why they feel so vivid in the moment but so incoherent in retrospect.
As long as we're posting somewhat-crackpot ideas about predictive processing, here's one:
The way you get a predictive processing agent to take goal-directed action, is to make it optimistically predict that it will get a lot of reward in the near future, so it will be driven to act to minimize prediction error. You can shoehorn this into Freud's concept of the libido.
It's also often observed that the other way to minimize prediction error is to sit completely still in a dark room. You can shoehorn this into Freud's concept of the death drive.
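A toy sketch of the two error-minimization routes just described; the environment model and action set are made up purely for illustration.

```python
# Pick the action whose expected outcome deviates least from the predicted reward.
def expected_reward(action, world_model):
    """Hypothetical model of how much reward each action actually yields."""
    return world_model.get(action, 0.0)

def act(world_model, actions, optimistic_prediction=1.0):
    # "Libido" route: predict a lot of reward, then act to minimize the
    # squared gap between prediction and expected outcome.
    return min(
        actions,
        key=lambda a: (optimistic_prediction - expected_reward(a, world_model)) ** 2,
    )

world_model = {"work_on_goal": 0.8, "wander": 0.2, "sit_in_dark_room": 0.0}
print(act(world_model, world_model))                              # -> "work_on_goal"
# "Death drive" route: predict zero reward and the dark room wins.
print(act(world_model, world_model, optimistic_prediction=0.0))   # -> "sit_in_dark_room"
```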
I fit a GPT-2 chatbot to my friend group's discord server back in 2019 and, in the terminology used here, everyone started off assuming it was a genie/oracle and slowly got used to the idea of a simulator. Now when someone new joins the server and gets confused by the bot people with no NLP knowledge will explain the difference to them which is pretty cool.