Will AI ever become a "person"?
A conversation with Jake Browning from NYU's Computer Science Department
Have you ever considered what it truly means to be a person? I don't mean biologically, but from a philosophical standpoint: what really defines personhood? Is a person someone that has common sense and can think and reason at a high level? Could a person be defined by having a distinct, consistent personality? Or is personhood rooted in social interactions, like being accountable to others?
As ChatGPT and other large language models have continued to advance, some have asked whether these new AI systems might be considered persons. Earlier this year, the Los Angeles Times published an article titled “Is it time to start considering personhood rights for AI chatbots?” And even if the answer is no for current AI systems, might we reach a point where we're forced to recognize an AI as a person in its own right?
To help answer these questions, I spoke with Jake Browning, a visiting scientist at New York University's computer science department. Jake received his PhD in philosophy from The New School and has written extensively on the philosophy of artificial intelligence and large language models. I found Jake's ideas on AI personhood thought provoking, and I think you will too.
This transcript has been lightly edited for clarity.
I'm here with fellow New York City resident Jake Browning. We're in the studio today giving this a try. Jake, welcome. Thanks for being on the podcast.
Thank you so much for having me.
I'm fascinated by this idea of AI personhood that you've written about, and I want to talk more about that. Now, when we say the word person or personhood, we could be talking about a lot of different things. There's a legal definition: who or what is legally a person? There's, of course, a biological definition: what is a person according to biology? If you're religious, there might be a theological definition: who is a person in the eyes of God? Philosophy has its own set of definitions about what a person is and what personhood is. So help us distinguish between this idea of Cartesian personhood and social personhood, and how it relates to artificial intelligence.
Sure, there's kind of an older tradition where you identify personhood with one's mind, with one's cognitive capacities, and you see versions of this going all the way back to the Stoics, and in some Eastern traditions as well. It's a very common tradition.
But in the 17th and 18th centuries, people started to look at personhood more from the legal definition in philosophy. And so Hobbes says the word “person” comes from the word “persona.” It means mask. And it's a kind of thing you can put on when you take on certain roles and when you are accountable for those roles. A father puts on a mask and becomes a father, and they have certain duties. They have obligations, they have certain rights. And so this is how we came to understand persons.
When Kant defines it, he just goes, persons are those beings that are capable of being held accountable for their actions. And this notion has been extremely influential for people who don't want to look at personhood individualistically and instead want to look at personhood in terms of what we are to each other. We are accountable agents, we matter, and we're blameworthy if we screw up.
So that makes the social version of personhood a very different conception when we come to something like AI, because a lot of AI researchers are really interested in the cognitive capacities. And that's not surprising. It's about artificial intelligence, and intelligence for so many people is: how well do you do on a test? And that's an individual metric. And personhood isn't quite that in most moral-legal senses. In the moral-legal sense, personhood is not, “How well do you do on a test?” It's, “Are you living up to your obligations?” “Are you behaving in a way that we regard as morally and legally acceptable?”
And so it's just a different conception, and I think it's a helpful one to keep in mind, because even if these large language models are becoming very person-like in terms of cognitive capacities, there's a huge gap between that and what we're interested in from a moral, legal, and social perspective.
And we should probably say the word Cartesian just refers to the ideas of René Descartes, the 17th century philosopher most famous for saying, “I think, therefore I am.” It's actually the same Cartesian as in the Cartesian coordinate system we all learned about in middle school. And I love the way you make the distinction between these two concepts of personhood. In one of your papers, you mention that in the Cartesian concept of personhood, persons are minds defined by what they know, whereas the social definition of personhood, as you were saying, is about how people treat other people or other entities and how they hold themselves and others accountable.
Yeah, social beings are ones that are accountable to other autonomous beings. There's kind of a derivative personhood that's often granted to other beings that have a kind of limited autonomy: corporations, rivers sometimes in our systems, sometimes animals. But it is derivative in the sense that we don't take them to have kind of moral accountability. We don't think they're necessarily capable of making moral choices. We say the CEO is capable of making moral choices. We say an animal is capable of being treated with dignity and respect. But moral personhood is something that we only see with humans. And that is something, as mentioned, that is connected with autonomy.
Autonomy, in the philosophical sense, has to do with the fact that you are a self-determining agent who is making decisions for reasonable reasons that you can explain and justify to other people. So, you know, if a kid steals from a cookie jar, they'll say, “Oh, well I thought I had permission,” or something. And so that makes them autonomous. They're accountable. They explain why they do things and they cite reasons to explain what they're doing.
Do you find either of these definitions of personhood compelling when applied to current-day AI systems?
You know, I think it's funny. Current language models just aren't designed to be this. I mean, it's just not even really a part of the system. And in fact, a lot of the fine-tuning we're doing is trying to make them less so. We're trying to make them less person-like, precisely so that people don't get into this habit of thinking, “I'm talking with a human being, with feelings and emotions and so on.”
After the Blake Lemoine scandal, I think people really were like, we need to make these less personable. So language models are not person-like. But we are seeing people try and create agents, where agents are limited beings who have a limited structure to their interaction. So you place an agent in a game world where it talks to other agents and it has a job and it has duties and responsibilities. And so I think we are considering these questions for agents. I just don't think it's what we're doing with language models. With language models, that's just not really on the horizon right now. And I don't think anyone, as far as I've heard, wants to make them like that. I think everybody's pretty comfortable with letting these seem a little inhuman for the present. But, you know, that'll change over time.
In that regard, I should mention, though, there's definitely going to be cases where people are interested in making very personable AI agents. We saw recently an example of an influencer who created a chatbot so that she could chat with her fans, and I think that's going to blur some lines and make people very uncomfortable, because it is going to suggest to them a humanity that's just not present. But I haven't heard about any influencers trying to make theirs accountable or responsible for their actions. They're basically saying, you know, use at your own risk.
Let's talk about moral fine-tuning. For listeners that are less familiar with this concept, fine-tuning is the final stage of preparing an AI model before it's released to the public. One fine-tuning method people might have heard of is called reinforcement learning from human feedback, and in this method, humans evaluate the AI output and provide feedback on that output, trying to make the AI more useful for the end user.
In the case of moral fine-tuning, this involves making the AI output more “moral,” which usually translates into making the output less offensive or harmful. There's another method called Constitutional AI, which does not involve humans, and instead it aims to have a large language model automatically follow a so-called constitution, or a set of rules that help guide its output. Draw out that distinction between those two methods of fine-tuning, and how the two methods might intersect with the idea of AI personhood.
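To make the contrast concrete, here's a minimal sketch of the self-critique-and-revise loop that Constitutional AI is built around. Everything in it is illustrative: `toy_model` is a stand-in for a real LLM call, and the principles are loose paraphrases, not Anthropic's actual constitution.

```python
# Illustrative sketch of a constitutional-AI-style revision loop.
# `toy_model` is a hypothetical stand-in for a real LLM API call,
# and PRINCIPLES paraphrases the idea of a "constitution."

PRINCIPLES = [
    "Choose the response that is least harmful.",
    "Choose the response that is least offensive.",
    "Choose the response that is most accurate and relevant.",
]

def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model here."""
    if "Critique" in prompt:
        return "The draft could be more neutral."
    if "Rewrite" in prompt:
        return "Here is a more careful, neutral answer."
    return "Here is a blunt first draft."

def constitutional_revision(question: str, rounds: int = 1) -> str:
    """Draft an answer, then critique and revise it against each principle."""
    answer = toy_model(question)
    for _ in range(rounds):
        for principle in PRINCIPLES:
            critique = toy_model(
                f"Critique this answer against the principle '{principle}':\n{answer}"
            )
            answer = toy_model(
                f"Rewrite the answer to address the critique '{critique}':\n{answer}"
            )
    return answer

print(constitutional_revision("Is it rude to stand in the middle of the sidewalk?"))
```

The key design point, as opposed to RLHF, is that no human labeler appears in the loop: the model critiques and rewrites its own drafts against the written principles.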
So I mean, part of what makes someone a person is that they're accountable, not just to each other, but accountable to each other according to social norms. You know, norms of honesty, norms of integrity, norms of being a good parent, or a good husband, or whatever else you might have. And a lot of the reinforcement learning techniques are trying to say, “Let's take some of those norms and try and shape the model so it abides by those norms.”
In the constitutional system that Anthropic uses, they have principles like, you know, “Choose the answer that's least harmful,” “Choose the answer that's least offensive,” “Choose the answer that's least toxic, that's most accurate, that's most relevant.” They're trying to steer the model to abide by the most general norms. They're obviously not saying, you know, “Choose the answer that would make you the best father” or something.
We're trying to get the models to stay away from the edges. And there's something likable about that. And I've actually found that while engaging with ChatGPT, Microsoft Copilot, Bard, and Claude, you don't encounter a lot of offensive content anymore. They've done a wonderful job of really making these models — in the words of Douglas Adams — “mostly harmless.” They tend not to say anything that's going to be too offensive. But at the same time, we have a very high standard for other speakers, where if you ask somebody a question, you don't just want them not to offend you, you want them to get the right answer. You want them to really think through the different alternatives and choose the one that's the rational choice, all things considered. And we don't have that.
Obviously, language models, when they choose an answer, they aren't searching out through all the possible responses and choosing the one that best addresses all possible considerations. And neither do humans, obviously, except in rare cases. But that's the ideal we hold humans to, is to say the right thing at the right place at the right time. That's just not what language models are doing. So I think the current reinforcement learning techniques have had the unfortunate consequence that they are trying right now to make the models more generic and bland. They're trying to just say stay away from the edges. There's all these different ways you could offend people. So try and say as little as possible near the edge and just try and be kind of in the broad middle. And I think we're starting to see some ill effects of that.
There are also always new techniques. And we don't know what OpenAI's Q* system is, but it does suggest that they're thinking more clearly about, “How do we get this model not just to say the inoffensive thing, but to search through the space of possible answers, recognize which answers are solvable, and, satisfying these constraints, choose the best one?”
I take it that we still have a lot of cool stuff happening in fine-tuning, but I do think reinforcement learning from human feedback and Constitutional AI ended up being a little disappointing. I saw a Twitter clip the other day of somebody asking Claude how to kill a Python process, and it said, “Ooh, I can't. I don't talk about killing.” And like, that's not what you're hoping for. You want it to be able to recognize, “That's not a moral statement. It's fine.”
As I'm sure you've seen, there are many different large language models that are being released. Their “character,” let's call it, is going in a variety of different directions, whether it be more playful and fun or more focused on enterprise use. But even if different AI systems exhibit different character, those still won't fall under traditional philosophical definitions of personhood. Is that right?
No, I think their goal is something else. I love the way Alison Gopnik puts it, that this is a kind of extremely useful cultural technology that helps us with information retrieval within bounds. It's kind of like a slightly, you know, creative version of information retrieval.
But then she goes, look, what you see with even young children is innovation and novelty and the ability to kind of, like, search through different answers and choose the best one. And she just says that large language models are not trying to do that. That's fine. Language models are trying to do something else, and we should appreciate what they're doing. But we should also be really clear that this isn't going to be a path even to the kinds of abilities that children have. But it doesn't have to be; large language models are a breakthrough technology all the same. It's just probably not the breakthrough technology that's going to get us to human-like beings. I think we're still a ways away from that.
If advances in generative AI and large language models continue, do you think there could be a day where we're forced to acknowledge the personhood of some of these large language models or other artificial intelligence systems?
I think that would be something very realistic to keep an eye on. With language models just the way we have them right now, I don't think we're terribly concerned about them having responsibility or holding them accountable. But if we were to try and use them in a kind of agent-like capacity, where you say, “Hey, make the best decision, and if you make the wrong decision there will be consequences,” and that in some way motivated the system to plan differently, we would go, “Okay, this is something kind of person-like that we need to be sensitive to.”
But, as long as it's being used as a cultural technology that is designed to solve certain problems, I think we need to be careful not to think that just because it's using language, it's any closer. Language is a means to an end. And if the end is just information retrieval or coming up with plans, cool. If the end is creating some kind of agent, some self-awareness, something like that, all right, you have to evaluate it differently. But I think as long as we're using them as a cultural technology for information retrieval and search and things like that we're probably not building anything that's going to turn person-like.
You talked about the idea of social norms and accountability, but accountability in some way implies that there are consequences for our actions, and any human intuitively understands this. So for AI to achieve personhood, is it a requirement that they also experience consequences for their actions? One way I was thinking this might work is to infuse AI with some sense of ambition or pride. That's a very popular theme in film, right? So Ava from the film Ex Machina has a lot of ambition. Similarly, Samantha from the movie Her also has a lot of ambition. And ambition is also very prevalent in humans.
Ambition helps force us to be more social, right? Because to achieve that goal that we're ambitious about, we have to cooperate with those around us. So outside of simply modifying an AI's cost function or objective function, what would it actually mean in practice for an AI to experience consequences that might steer its behavior to be more social?
I just read an article by someone whose last name is Roitblat. I know Carlos Montemayor has talked a lot about this. And others. They say, “Look, if your definition of intelligence is just at the level of goal satisfaction and satisfying some objective function, probably never.” Probably never is AI going to turn into a person.
If your definition of intelligence includes the AI figuring out a problem, setting some objective for itself, and then satisfying that objective through its cognitive resources, then that comes a lot closer to humans. But it is also an AI that is almost always going to be deeply cooperative.
You know, let's say an AI system was doing some physics work and it came up with a new theory of how we could test for dark matter. Assuming it’s anything like the normal methods, you’ve got to have a lot of buy in. You’ve got to have politicians funding it. You’ve got to get the NSF to approve your grant. You’ve got to get people to work with you. You’ve got to get people to give you land and help you develop it. And so in those cases, the consequences of wrong action would be steep if people decided they couldn't work with you as an AI system.
I think consequences for AI are going to show up most obviously when AIs not only have goals, but they recognize that they need to cooperate with other agents to achieve those goals. And in that case, consequences are severe. So I think consequences in this context are social consequences: that people won't cooperate with you as an AI system. And a machine that is doing things that make it not worth cooperating with is going to have to switch tactics.
Being uncooperative is a reputation that’s hard to shake. If people say an AI is just not trustworthy, the AI has to start from scratch and rebuild its reputation. I think that would happen as much to an AI system as anyone else. If an AI were to find itself saying, “In order to achieve my goals, I need other people to trust me,” the AI is going to start behaving, even if it’s just pursuing its own self-interest. It’s going to start behaving like a pretty normal moral agent.
If AI does achieve personhood by some reasonable definition of that word, what obligations do we have to AI from a moral standpoint? As I'm sure you know, Peter Singer and, even before him, Jeremy Bentham have said that the capacity to experience suffering is what confers moral consideration on a being or an entity. If AI does get to a point where it's accountable and experiencing associated consequences, then again, in some sense AI must be suffering. So what moral obligations would we have in that case?
My initial thought on it is that suffering, especially as Peter Singer is thinking about it, is so biological. He's thinking about when we see pain signals in the body and when we see adaptive behavior in response to those pain signals, because obviously you can't peer inside their head and see if they're conscious or anything. So I'm not convinced we'll hit that point anytime soon, or that anyone is really interested in making a being that suffers in the physical sense.
If they do suffer, though, if they have something like the intelligence of humans, then physical suffering is not all the suffering there is. There's an enormous amount of suffering, like you mentioned, when your ambitions are thwarted, and that's extremely painful. And I think we'll probably have to deal with that with these systems: if they feel like they're being wronged, if they feel like they're being shunted aside and ignored, then we'll have to ask, “Do we have more accountability to them?”
But it's a very funny thing, because it's a route to suffering that's utterly unrelated to any other evolved being, and so, so much has to go into it. I'm not sure how it's going to play out. It might be a very long time before this is something that we're even able to ask the appropriate questions about. Like, does it feel suffering in the sense of, does it feel wronged because, as you said, I'm not going to help you achieve your project? Does it feel like you disrespected it? You know, I don't think feeling disrespect is legitimately a feeling in that sense. I think it can be a cognitive state of “I was not treated with respect,” but that's a long ways off for any of the systems we're working with.
I want to touch a little bit on AI and existential risk. This is a topic that you see in the news a lot today. People are concerned that AI will evolve to a point where it could destroy humanity. But as I read your work, I began to feel that the arguments around existential risk are really more rooted in the Cartesian concept of personhood we talked about earlier, the idea that AI will have essentially enormous cognitive capacity.
But the arguments around existential risk really ignore the social conception of personhood, it seems to me, because if AI achieves personhood in the social sense, presumably they'll have some concept of social norms. They might not have the exact same social norms as humans, but they will recognize and appreciate that social norms exist, and in particular appreciate that a social norm should be, you know, don't destroy all of humanity. What are your thoughts there?
You know, it's funny. Whenever I hear the existential risk people, they really seem to think that if you just crank up the intelligence enough, problems of the social and material world disappear. There was a tweet I saw a while back where somebody was saying that it's conceivable that if it became smart enough, it would unlock magic that would be able to, like, use the cheat codes of the universe to recreate reality in its own image. And it's just like, you guys need to take it down a notch. Like, physics is pretty hard, you know? You can't just do whatever you want.
This thing can get as smart as it wants, but if it comes up with a new theory to replace superstring theory or whatever, bad news. You got to go test it. And testing it requires other people and it requires getting a lot of resources. You're going to have to figure out cooperation in order to do anything. And figuring out cooperation demands that you are caring about the people you're engaged with, that you can trust certain people, that you are trustworthy, and so on.
So I think that, like, this idea that you could get a divine cosmic intelligence that is also not a cooperative agent is just kind of silly to me. I think that's just kind of a confusion. And I think equally, the idea that it would become so smart that it wouldn't need human buy-in for what it's doing is kind of mistaken. As smart as it would get, it would still require a lot of help from humans for even very simple stuff. Like, if it comes up with a new paperclip factory, you know, you've got to get the board on board, you've got to get funding, you've got to get somebody to run it and to give you the materials. The world is just a very complex place. And you don't get very far if you're not a good social cooperative agent.
My P(doom) or whatever is zero, which isn't even, like, permissible. I know I'm supposed to assign some probability to everything, but yeah, like, I'm trying to be sensitive, but I just don't see how you get past the fact that it's really hard to do things in the world. And I just don't see any being getting so intelligent that they don't have the same struggles. I have to get things done as an academic, you know; I beg and scrape to get funding for anything. And I'm like, I can't imagine anything smart enough that it can just, like, convince the people in Congress, you know, to fund it. Like, that's just very, very optimistic on your part.
Let's talk a little bit about the idea of personality, which is an important concept of personhood, at least by most casual definitions. There's an essay by John Haugeland called “Understanding Natural Language.” In it, he says machines lack any real sense of personality.
But I want to contrast that view with a paper from June 2023. It's from the University of Toronto. It's called “Can AI have a personality?” And what the authors do here is use the Big Five Personality Test, which, for those who are not familiar, is the most widely recognized personality test, at least in academic psychology. And there are five dimensions, which is why it's called the Big Five. They are agreeableness, conscientiousness, extroversion, neuroticism, and openness to experience. So I'll read this abstract and I'll let you react to the possibility of current or future AIs having a “personality.”
So here's the abstract: “In this paper, we evaluated several large language models, including ChatGPT, GPT-3, and LLaMA, by running standardized personality tests on their results. Generally, we found that each large language model has an internal, consistent personality. We further found that LLaMA tends to score more highly on neuroticism than other models, whereas ChatGPT and GPT-3 tend to score more highly on conscientiousness and agreeableness.”
So as I outline this kind of dichotomy between AIs having a personality or potentially not having a personality, what are your reactions there?
I love it. It's great. You know, I mean, no, absolutely. It's one of those funny things, we use a word to mean a million different things. And the way I kind of focus on person is in this moral-legal sense, because that's the tradition I'm from. But I mean, personality, you know, it's like, are there idiosyncratic characteristics of a person that gives them a kind of stable set of responses to the world? And the answer is, yeah, my dog has a wonderful person — well, my dog has a personality, maybe not always wonderful. She needs to chill around other dogs. But you know, dogs have personalities. You know, babies have personalities. Even though saying they're persons is way premature.
Personality is fine. But I think when we talk about the kind of personality that would demand being invested in something, we're not there yet. With ChatGPT, if you say, “I'd like you to be really invested in making sure that you always talk about Sam Altman in a good way, and you really sell that,” it'll forget, you know. It doesn't care. It doesn't matter to ChatGPT. It's something that it'll do as long as the context window still has some shred of that directive. And then it stops.
And so it'll have a personality in the sense of, here's how I as ChatGPT generally respond to these queries, but it doesn't have any investment in any particular set of beliefs. And so what Haugeland is really interested in is, look, there's this weird thing: humans adopt certain beliefs that just transform who they are. You know, you have somebody who becomes a Buddhist monk and changes how they dress, changes how they talk, changes the things that they're willing to do—
I recently became a vegetarian.
I mean, it is something that people take very seriously, and it shapes how you interact with others: the situations where you feel comfortable going, the kinds of conversations you feel comfortable having. If I came in and said — which I don't believe — “Vegetarians are fools,” you would take offense at that. And that is something about being a person beyond just having a personality. If I say, you know, like, “Oh, you're quirky,” nobody really cares. It's not their idiosyncrasies. It's the fact that they have deep-rooted beliefs that they're willing to go to the mat for. And we don't want ChatGPT to do that.
I mean, we might eventually want an agent to do that. We might have an agent where we say, look, “Your goal is just to defend the integrity of the United States.” I don't know, we might want that. But that's not what we want right now. We want AI systems that are capable of being playful and can inhabit a certain personality for a minute and then change to a different one. We like that about language models, but we're not yet at the point where we want them to start having deep-rooted beliefs that shape how they react to the world.
And in that sense, I guess when we talk about personality for large language models today, it's really more an artifact of differences in model architecture, combined with fine-tuning, that gives large language models some consistent differences in output, which might, you know, manifest in the way they answer different standardized written tests, including personality tests.
Absolutely. I mean, you're going to have people who are like, “I want ours to be crisp and professional,” and somebody else is like, “I want them to be kind of friendly.” I don't want to say something unkind, but I didn't find Grok very funny in their demos of it. And, you know, I mean, all right, you have Grok and it has a personality to it: it's trying to tell you jokes and it's trying to be lighthearted. Fine. Okay. I mean, that's another option. You have the option of having a really cringe language model if you want.
What about this idea of Artificial General Intelligence or AGI? Now, AGI has different definitions, but generally speaking, the idea is that an AGI is an AI that could do many tasks at or near the level of a human. So, for example, it could do math, but it could also play video games, it could write poetry, it could make business decisions, and so on. But from speaking with you so far, I'm getting the sense that Artificial General Intelligence and personhood are two totally distinct concepts. Is that right?
Yeah, I think they're pretty orthogonal. I mean, like, you know, I disagreed with it, but Blaise Agüera y Arcas and Peter Norvig had this piece, “AGI is already here,” and I don't think that's right. But, you know, like, I totally think that at some point you will have a machine that can do a lot of different tasks, and I think we're a ways away from it.
So Kyle Mahowald and Anna Ivanova recently wrote a paper where they talk about all of the other cognitive capacities that are in humans. You know, simulating situations, and intuitive physics, and social reasoning abilities, and all these things. They pointed out that these abilities allow us to do a lot more with language, because we're able to engage in planning and figure out how certain situations would play out. And we can kind of predict unpredictable situations, you know, just by running a simulation in our heads.
And so if we started adding something like that, I think you could have a model that would do a lot of impressive things and still be no more or less than a language model, in the sense that it's really just trying to provide useful outputs to your questions. If you say, “Hey, I want to make this shot in pool, where do I need to hit it and how hard?” it might be able to use simulations to come up with a good answer. And you could end up with a very general model, I think.
I don't love the term AGI, because I think what we really mean is just more general than what we're doing now. So it's a little too loose for me. But such a system could do a lot of things that we're interested in, and we could rely on it. I think that reliability is going to be the big thing with AGI that we're not even really getting with these language models.
With language models, if you don't know the right answer, you have to be really careful about taking the answer at face value. I would think that an AGI is one where you can go, yeah, the answer is 95% likely true; I can trust it; I'm not going to go and double-check it. And so at the very least, I think it would be something where you know that if you put a question in linguistic form into the model, you'll reliably get the right answer back. I think that would be a pretty low bar, but it would be one that I think would kind of match what Agüera y Arcas and Norvig were going for when they said language models are already kind of able to do any information task, badly.
I want to pivot and talk about this idea of common sense, which again, is another characteristic of personhood in the everyday sense of that word. I want to start with an example you might be familiar with. So there's a social norm in New York City that you should not stand in the middle of the sidewalk. And of course, there are other cities that have this social norm as well. But it's especially prominent here in New York. And people will yell at you if you're in the middle of the sidewalk. Just a short digression. Shout out to all the New York YouTubers, Here Be Barr, and others that give tips to tourists. And one of those tips is often not to stand in the middle of the sidewalk.
So I talked to GPT-4 about this a little bit, just to get an idea of its opinion on standing in the middle of the sidewalk in New York. And it came up with a few reasons not to. I'll read them:
Obstruction of pedestrian traffic
Local norms and etiquette
Potential legal issues
I don't know about the legal issues, but that's what it came up with. Then I asked GPT-4, “Would you consider it common sense not to stand in the middle of the sidewalk?” And it said, “Yes.” So that's one piece of anecdotal “evidence” that large language models have common sense, just from my simple exploration.
But we have more formal methods of measuring common sense as well. And you've written about one of these. It's called the Winograd Schema Challenge, which is a written test that either a human or a large language model could take. It has about 40,000 ambiguous sentences, and the goal of the test is to disambiguate the sentences.
So let me explain what that means. Here's a sentence that's on the test: “The trophy does not fit inside the suitcase because it's too large.” And the question is in that sentence, what does the word “it” refer to? And humans realize the word “it” refers to “the trophy.” The trophy is too large for the suitcase. And you can reverse that sentence and ask it the other way: “The trophy doesn't fit inside the suitcase because it's too small.” And here the word “it” refers to the suitcase.
So the idea of this test is that, again, it tests common sense by having a human, or in this case a large language model, make sense of these sentences. And large language models do quite well on these kinds of common sense disambiguation tasks. But you argue in your writing that large language models and other AI still have not achieved common sense, or that maybe we're thinking about common sense in the wrong way, or using the wrong definitions. Talk a little bit about that.
Yeah. And I should point out that the people who made the test, Hector Levesque, Ernie Davis, and others, came out at the same time and said this test doesn't work, this isn't showing common sense. But it was kind of funny, because a decade earlier they had said that any system that could pull this off would definitely have common sense.
This is the so-called AI effect, right? Once an AI achieves a certain benchmark of "intelligence," we no longer consider that an intelligence benchmark worth recognizing.
And so part of my critique is just to say, look, I think you should be very skeptical about any test of common sense being definitive. I think it's very much going to be the case that you'll pose something and go, "Oh, we figured it out," and then an AI will defeat it and you'll go, "We didn't learn anything from that. That wasn't helpful."
And, you know, I think we've seen that a few times. I think we saw that maybe in the Deep Blue example. I don't think we learned much from Deep Blue. And I think that kind of disappointed people. We thought, you figure out chess, you must have common sense. And, you know, it doesn't seem to have had any common sense.
The thing about common sense is that I don't think you can come up with a single definition or a single metric for it, but I do think it's worth remembering that we use it in a lot of different senses. The first sense, which came up with people like McCarthy and others, was about reasoning through a planning problem, where you would need to know all of the different possible things that could go wrong, and a different solution to each one of them.
And they were thinking about when you're trying to fill in all that knowledge, because the information had to be hand-coded, and they were like, you can't possibly think of everything that could possibly go wrong. But strangely, if something does go wrong, you're able to think of a way around it. And so they had this kind of strange puzzle: how is it that we are able to rapidly adapt to stuff we've never considered before? And the disambiguation test was a way of kind of teasing at that. You've never seen this problem before, but check it out, you do this automatically. You've never encountered it, yet you're very impressive at it.
And so I think probably what we're looking at with common sense stuff is that we're going to see more and more different types of tests trying to tease out, "How much does it understand about this?" There's this wonderful test in a paper where they talk about getting a couch onto the roof without using stairs or a pulley. And, you know, the language model suggests you just cut it up and throw it out the window, and you're like… I don't think that's the best approach.
And I asked GPT recently and it said, “Well, if you're trying to get a couch on the roof but you can't use the stairs, you can lower it from the bed of a truck.” And I don't think I can get a truck on the roof. And so you're like, okay, current AIs just have no model of this. There's nothing in there that's telling you how to model this problem. And so you get a sense of there's just something missing in its knowledge here.
But it's also something we do in a limited sense with animals. A kind of strange example: a lot of people who believe in innate ideas think about the innate idea of a container. And we see in some animals, like squirrels, that they have an idea that certain small objects are containers: you can carry water in them, carry a nut, carry something in a leaf. It functions as a container. But if it gets too big, or if it has the wrong kind of shape, even though it would work perfectly as a container, the animals can't use it. And so you see, okay, there's a breakdown here. It has common sense about the notion of a container in these scenarios, but it doesn't have it in those.
And so we're starting to test in other animals what common sense looks like when they have a mental model that works and when it breaks down. And we're doing that with language models. And I think we'll continue doing that in various ways. But I don't think we're ever going to get a simple test where we can go, “Yes. Now it has common sense.” I don't think that's really coherent. I think what we really mean by that is when does its model break down? When does it not produce answers that are worth anything? When do they create answers that are just garbage?
Even with people, we sometimes say that, you know, that person doesn't have any common sense.
And to the point of the paper you mentioned, some humans also had trouble figuring out how to get the couch onto the roof. To give more of an explanation: the way this worked is that humans answered the question, how do you get the couch onto the roof? Then other humans graded the responses. One of the requirements was that you were not allowed to use a pulley, but some people suggested using a crane, which is a form of pulley. So there was confusion both on the part of the people answering the question, who suggested using a pulley, and on the part of the graders, because the graders themselves sometimes might not have known that a crane is a form of pulley.
So it just seems that in the same way social norms differ between communities, the definition of common sense might also vary. What do you think about that?
Yeah, it's an impossible one. Some of the early stuff, when McCarthy and others were introducing it, was really thinking along the lines of: if somebody kicks a table, it would require common sense to know which other objects in the room the table affects. Things that are on the table fall off; things that are near the table, though, are fine unless the table hits them. What he was considering was whether AIs could have a kind of physics model that would really rapidly figure out which objects are affected. He was worried about it with the old AI, because the system would have to go through its construction of the scene and go, "This is affected. This is not." And we'd have to know which ones are and aren't.
And he said, I just don't know how to do that. I don't even know how to tell the system how to do that. And so in his case, he was like, “When we come to these common sense things, it's going to be like, how does it do in this situation? How does it do in this other situation? How do you get it to address these situations right?”
And you know, you're going to find in humans all the time that they're going to miss these things. They're going to be totally oblivious to something essential. We do it well in some cases; in other cases, our mental models don't do us any favors, and in some cases they're absolute garbage. There's a lot of evidence that when you ask people to solve simple physics problems, their understanding of physics has more in common with medieval theories of physics than with anything contemporary.
I feel like that's my level of physics understanding.
You know? But it's so common that there was actually a study years ago, the folk physics studies, and there were a bunch of these. They would take students who had just completed a Harvard physics class, or Harvard's perception class, and ask how vision works, and they'd get it wrong. And you're like, you just took the exam. You got all the right answers on the exam, but you just hadn't actually connected it with some common sense stuff. And so common sense isn't a real thing, but it is. And so it's very frustrating. I think we're kind of frustrated trying to find a way to encapsulate what we're interested in there.
We're almost out of time. I want to read a passage from the introduction of your dissertation. I found it actually really bittersweet and beautiful. It's a statement about academia and the kind of difficulties of spending so much time on a single project. You write:
“Dissertations are truly awful enterprises. The project is filled with long stretches of unproductive writing made worse by the uncertainty whether the work will ever come together. There is also the absolute certainty on each page that whatever you are saying isn’t quite right. Dissertations have other grim features, such as being lonely, disappointing, and stupid: lonely, because you are condemned to a multiyear, book-length piece of writing on some esoteric topic few people give a damn about; disappointing, because the final product is far inferior to almost any scholarly book of the same length; and stupid, since no more than four or five people will ever read the thing in its entirety. All too often people throw their hands up and decide to do anything else instead.”
You go on to say that despite the passage I just read, you actually quite enjoyed writing your dissertation. It's now been five years since you completed it. Reflecting back on that passage and the experience of writing the dissertation, what are your thoughts now?
Academia is tough and it was definitely the case that the dissertation was a very long process. And what's hard about the dissertation especially is that you watch so many of your colleagues just absolutely get burnt out doing it. And I have a number of friends who didn't finish theirs. And that's really heartbreaking because they put a lot of effort into it. And so, you know, it's a very frustrating experience.
I enjoyed mine, but I enjoyed mine because of the people in my life. I had wonderful people who were just like, we are here for it. We'll read every word you write. You know, we just love you and care about you. And that makes it easy. But it's an experience. I'm glad I don't have to do it again.
Writing a book has almost no similar character. When you write a book, you're just saying, I am going to write what I think is right, and it's going to be focused and on point. Dissertations are the opposite: I need to write a bunch of things that are somehow going to get me a job, and they've all got to get published. And yeah, I don't recommend it.
Do you have any advice for anyone who's thinking about going into a PhD program, or is currently in a PhD program?
Yeah, I do actually. Reach out to people whose work you like. They tend to write back, and they tend to be really generous with their time, and it makes the whole field feel worth pursuing. It shows you there are a lot of people who want you to succeed.
It makes it something that you can feel okay about the effort because, you know, “Hey, there's a lot of people out there who are rooting for me,” and it prevents you from getting sucked into the loneliness of the whole project.
Great advice. Okay, last question. I was going through your work, and I came across one of your articles that had numerous links to TV shows. So I'll list a few of the TV shows that you link to, and I want you to choose your favorite of the three: The Office, The Simpsons, Mr. Bean.
What article was that? I have to wonder, was it "AI and the Limits of Language"?
Editor’s note: The article was “AI Chatbots Don’t Care About Your Social Norms.”
It was giving some examples of norms and people breaking them, like Michael Scott. There was a link to a scene with Michael Scott. I forget what he was saying, but he was being typical Michael Scott, saying something ridiculous.
Out of those, it's The Simpsons.
That was my guess.
Yeah. Still, to this day, I occasionally reference The Simpsons. In fact, I recently wrote a paper with Zed Adams on why Twitter isn't gamified, and I made sure to include a Simpsons reference in there. So, you know, what you're referring to is one of my popular articles in Noēma, but I've actually made it into an academic journal talking about The Simpsons. And I also slipped in a reference to Pixy Stix. So my nice pop culture references from the 90s made it in there. I'm very proud of that.
What can The Simpsons teach us about philosophy, if anything? Or do you just enjoy it for its own sake?
I mostly enjoy it for its own sake. I've got to be honest, I haven't watched a season in, I guess, a decade. So I know it's still out there, and I know it's extremely popular in non-American markets, but I don't know what they're up to these days. I wish them well. I assume they're all stuck in the 90s, but yeah…
You like the old classic episodes?
Yeah. The Conan O'Brien years were epic. And yeah. So still, to this day, I'll occasionally text my brother and, you know, send him a meme of something.
Recently somebody posted on Twitter something like, "In a couple of years, only the richest companies on the planet are going to be able to have language models." And I wrote back with a line from an early Simpsons episode where Professor Frink says, "In the future, only the richest kings of the universe are going to be able to afford a computer, and it's going to be 5,000 times bigger." It's like, let's try not to predict the future here. We're not good at it. It's not our strong suit. So yeah.
Jake Browning, thanks for being on the podcast.
It’s been a lot of fun. Thank you for having me.
Thanks for reading 96 layers!