Could LLMs be sentient?

written in llms, machine learning

In June 2022, a story hit international news that a Google engineer believed that one of their LLMs had achieved sentience.

Blake Lemoine was testing Google’s conversational LLM LaMDA (the model that went on to power the original Bard) through a series of chats. Over time, as Lemoine completed more and more of these tests, he began to believe that this model was showing signs of sentience - in fact that it had a soul. Alarmed by the idea of a sentient entity being potentially exploited, Lemoine went to the press to advocate for policies to protect LaMDA and similar models. He was promptly fired for violating Google’s privacy policies as he had released confidential transcripts as part of his work with the press, and Google released their own statement, saying:

Our team … have reviewed Blake’s concerns per our AI principles and have informed him that the evidence does not support his claims. He was told that there was no evidence that LaMDA was sentient (and lots of evidence against it).

On first glance, this seems obvious. Of course this LLM was not sentient! But when you think about it for a second, you might wonder: what evidence did Google consider to dismiss Lemoine’s claims of LaMDA’s sentience? And what convinced Lemoine that it was sentient in the first place?

This and the other two blog posts in this series are based on a keynote I delivered at PyCon Italia this year, as well as a talk at NDC Oslo, where I debunk some of the more outrageous claims about LLMs demonstrating human-level traits and behaviours. If you also want to give the whole talk a watch, you can see it below. You can also read the previous post in this series, where I discuss why LLMs are not using language in the same we do as humans.

I have based a lot of my work for this post on the excellent paper, “Could a Large Language Model be Conscious?” by David Chalmers, which is an great read and dives deep into this complex and fascinating topic.

What is sentience?

Let’s start by defining what sentience is. Sentience can be summed up as any subjective experience we have, collectively called “quaila”. This can include anything from any sort of cognition, the experiences we have, feelings and perceptions, awareness and self-awareness, and having a sense of selfhood.

As you can see, what sentience is is pretty hard to pin down and define concretely.

A way to make what sentience is more intuitive is to think of it as the subjective experience of being “like” something. We know intuitively that there is a subjective experience of being “like” some things in the world, like being a bat, just as we know that there is no subjective experience of being “like” other things, like jackets or water bottles.

This is illustrated really nicely by the philosopher Thomas Nagel in his famous paper, “What is it like to be a bat?”

… imagine that one has webbing on one’s arms, which enables one to fly around at dusk and dawn catching insects in one’s mouth; that one has very poor vision, and perceives the surrounding world by a system of reflected high-frequency sound signals; and that one spends the day hanging upside down by one’s feet in an attic.

In so far as I can imagine this (which is not very far), it tells me only what it would be like for me to behave as a bat behaves.

Image credit

What we can see from Nagel’s exercise is that subjective experiences of being like an organism are unique to that organism, and are cohesive experiences shaped by that organisms’ sensory experiences of the world. This is what makes up an organism’s sentience.

So when do organisms develop sentience? A better question is to ask when an organism needs to have sentience.

As we go about our lives, we are constantly receiving sensory inputs that our brains need to process. Some of these inputs are so common and need such a fast reaction, that they are processed at the level of mere signal processing, and our brains react with reflexes, such as strong light causing a narrowing of the pupil.

However, most of the sensory information we need to process needs to be integrated and abstracted in some way in order to be useful. This is because processing individual signals would be an overload of information and not particularly useful. Our brain does this by creating some sort of higher-order meaning from incoming signals, and then passes this meaning on to other areas.

One of the easiest ways of understanding these levels of representation is by looking at how the visual system works. When we first see something, like the strawberries in the picture below, it is in the form of light reflecting off an object, which hits our retinas and is converted into electrical signals. But these individual electrical signals don’t tell us this object is a strawberry - this information is way too low level.

Image credit

So what the visual system does is passes information through different parts of the brain, where a bottom-up integration of this information happens. In the visual cortex, the orientation of individual lines is first detected, and these are built up into features like textures and contours. You can see we’re getting more meaning here, but our brain has still not consolidated this into the concept of a strawberry. It’s not until additional information gets pulled in from other parts of the brain, such as emotions and long-term memories that we become consciously aware we’re looking at a strawberry. (This is an oversimplification, but the general principle is the same.)

So we’ve investigated when sentience may evolve. But we haven’t really talked about why.

The need for sentience is thought to be driven by evolutionary selection pressures, coming from the environment that an organism evolved in. The reason we create subjective experiences is that they offer us a convenient way of processing, and importantly, also reacting to information at the right level. Subjective experience is therefore not a neutral, passive acquisition of information: it is an active process of sense-making which is designed to help us make effective decisions about how to act, such as in response to a threat like this lion.

Image credit

What this means is that while there is no consensus among researchers about what turns simple information processing into sentient experience, we know that the things that we can think about, the things we can become conscious of, must be things that are sufficiently important in our environment to warrant such a level of representation in our brains.

Was LaMDA sentient?

So let’s go back to LLMs: what sort of evidence might Lemoine and Google have been talking about when they said LaMDA did and did not have sentience?

Well, we know why Lemoine thought LaMDA was sentient: it told him it was sentient. Here is an example from one of the transcripts Lemoine released:

Convincing stuff, huh?

However, the thing with LLMs is that it’s very easy to get them to tell you what you want to hear. Reed Berkowitz did an experiment with GPT-3, seeing what happened when he made a slight alteration to Lemoine’s prompt that completely negated it. Most of GPT’s answers wholeheartedly agree with the input prompt, a selection of which are below:

As you can see, relying on LLMs to tell us they are sentient is not very compelling evidence.

That last output becomes even funnier when you consider that LaMDA (or as it became, Bard) was in fact not very good at math, as you can see from all of these Reddit posts complaining about it. To be fair, this is typical of language models - as models designed for language tasks, they tend to fall down when asked to handle any maths tasks, from simple arithmetic to proofs that require complex reasoning.

World-models

So let’s look beyond self-report evidence, and go a bit deeper. We saw from our investigation of sentience in humans that sentience depends on some sort of ability to represent meaningful information about the environment in a cohesive way. We can think of this as a world-view or a self-view. Is there any evidence that LLMs have these?

Revisiting the previous post on language, we know that LLMs can integrate information to a degree. They can form models of syntactic patterns, and it certainly seems they can create some limited sense of meaning. We also know that other types of neural nets can create internal models. Take the example of convolutional neural networks (CNNs), a very popular type of model for computer vision. These models are able to derive features from images in a hierarchical fashion - just like our own visual cortex. We can see here, CNNs start detecting edges, then textures, patterns, parts, all the way up to whole objects.

Image source

However, the main issue is that LLMs seem to lack coherence between the different models they’ve created. A basic example of this is that LLMs are prone to contradicting themselves even with simple factual statements. We can see this in an example here, where a prompt about Black Mirror, fed into the same model twice, yields contradictory information. The first time, the model correctly outputs that Black Mirror started on Channel 4 then moved to Netflix, but then the same model outputs that Black Mirror started on Netflix, showing how LLMs cannot correctly process the same inputs in a consistent way.

This inability to integrate different models is, for me at least, one of the strongest arguments for LLMs not showing sentience. If we go back to our example of how we perceive objects: this would be like if we saw the strawberry, but only used information from the visual system to react to it. We wouldn’t know that it was something edible, that it was something we liked to eat, we wouldn’t know what to do with the strawberry. Such a lack of coherence means it is impossible for LLMs to act in a predictable, action-orientated way in response to environmental inputs.

And this is likely because LLMs don’t have an environment, as we already discussed when talking about language. LLMs don’t have the same sensory inputs we have - they cannot feel, smell, see, they can only “speak” and “listen”. LLMs also don’t really have any selection pressure the same way our ancestors did, meaning they have no reason to act in a coherent manner that can ensure their survival.

Self-models

On top of not having integrated world views, LLMs certainly don’t act with an integrated self-view, or unified persona. You can see this when you use an LLM: it’s easy to get them to write colloquially, like an academic, like a tabloid newspaper, or even like your nice grandmother.

One attempt to overcome this are agent models - these are LLMs which are fine-tuned or prompted so that they simulate a specific persona. However, the models are quite limited, and still show signs of disunity.

Screenshot from character.ai

There are also ethical concerns with creating such models. It is fun to have agent models mimicking Alexander the Great, or, for some reason, Dobby from Harry Potter, but it gets into uncomfortable ethical grounds to have ones that behave like real, living people like Barack Obama or Lula da Silva. And let’s not get started on use cases like “Talk to your ex”, an app that creates an agent model of your ex from your chat history with them, allowing you to talk to them long after they dumped you.

Is it possible that LLMs have achieved sentience?

Overall, sentience is one of the slipperiest claims made about LLMs, partially because it is not even well understood how to define it in humans.

What I can say is that the evidence seems to be pointing strongly to the idea that LLMs do not possess sentience. So does this mean that we still have work to do? Should we be aiming to create an AI system with sentience? Well, maybe?

The question that remains unanswered is what the subjective experiences of being “like” a large language model would be, and therefore we don’t know what this implies.

As we don’t really understand the role of sentience in humans and other organisms, we don’t really know the benefits of having a sentient AI. There is no promise that it’s capabilities will drastically improve, because we don’t really understand the role that sentience plays in our own capabilities.

Moreover, we don’t know whether a sentient AI would be “like” us. Sentience is by nature, subjective, and its not guaranteed that the subjective experience of an AI would be compatible with the experience of being like a human. All of this raises questions about both the benefits and ethical implications of conscious AI, and certainly suggests this is something we should not embark on lightly.

If you liked this post, check out the previous one in this series, where I discuss whether LLMs have human-level language use. In the next post, I’ll end this series by exploring whether LLMs could be showing signs of artificial general intelligence.