
How Most Chatterbots Work

Most chatterbots in use today work by keyword spotting, either at the level of individual words and phrases or, occasionally, at the level of multiple utterances. This means they are little better than lookup tables, and they certainly don't have any understanding of the conversations they take part in.

Simple Keyword Spotting

Most conversational systems work on a straightforward stimulus-response principle, where a given input from the user gives rise to an associated output. A good example of the use of this technique is Dr Richard Wallace's ALICE. In this type of chatterbot, a pattern-matching algorithm detects certain keywords or key phrases uttered by the user, and replies with a pre-programmed response.

Here's a hypothetical (and very much over-simplified) example of how a pattern-matching chatterbot might work. The software contains a lookup table like this:

Stimulus      Response
britney       Britney Spears is a very popular singer!
hello         Hiya, how are you today?
music         What's your favourite kind of music?
talk about    Let's talk about Britney Spears.

If the bot receives a certain stimulus it replies with the corresponding response. If all goes well, the conversation might proceed as follows.

User: Hello there!
Bot: Hiya, how are you today?
User: Good, thanks. What shall we talk about?
Bot: Let's talk about Britney Spears.
User: I don't like Britney Spears much!
Bot: Britney Spears is a very popular singer!
User: I hate her music.
Bot: What's your favourite kind of music?
User: I like heavy metal.

In this example the conversation flows well, and the bot does a reasonably good simulation of human conversation. But using the same lookup table, things can go badly wrong:

User: I hope you don't want to talk about Miss Spears again.
Bot: Let's talk about Britney Spears.
User: Hello? Did you hear what I said?
Bot: Hiya, how are you today?
User: I'm fine, but please don't mention Britney Spears again.
Bot: Britney Spears is a very popular singer!
User: Look, let's keep off the subject of music!
Bot: What's your favourite kind of music?
User: Are you deliberately ignoring me? Not music!
Bot: What's your favourite kind of music?
User: NOT MUSIC!!!!!
Bot: What's your favourite kind of music?
User: Aargh! You irritating bot! Why are you deliberately annoying me?!?!?!?

Clearly this simple keyword-spotting technique has severe limitations: the context in which a keyword appears is vitally important, but the program totally ignores it.
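The lookup table above can be sketched in a few lines of Python. This is only a minimal illustration, not how ALICE or any real bot is implemented: the names LOOKUP and reply are made up for the example, matching is a plain case-insensitive substring test, and returning None when nothing matches is an assumption, since the example above doesn't say what the bot does in that case.

```python
# Lookup table from the example above: keyword stimulus -> canned response.
LOOKUP = {
    "britney": "Britney Spears is a very popular singer!",
    "hello": "Hiya, how are you today?",
    "music": "What's your favourite kind of music?",
    "talk about": "Let's talk about Britney Spears.",
}


def reply(user_input):
    """Return the response for the first stimulus found in the input, or None."""
    text = user_input.lower()
    for stimulus, response in LOOKUP.items():
        # Plain substring test: the words around the keyword are ignored entirely.
        if stimulus in text:
            return response
    return None  # assumption: no keyword spotted, so the bot has nothing to say


print(reply("Hello there!"))
print(reply("NOT MUSIC!!!!!"))
```

Because the test only asks whether the keyword is present, "I hate her music" and "NOT MUSIC!!!!!" both produce exactly the same reply.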

Multiple Word Meanings

A major problem with keyword-spotting bots is that they operate at a superficial level. Words and phrases are the building blocks of language, but it's important to realise that they're not just strings of letters, but that they're also associated with underlying meanings.

In English and many other languages, a single word can have multiple meanings. Take the word bank, for example: it can mean a financial institution or the side of a river, and a dictionary will point out several more shades of meaning. It's clear that the stimulus bank can't be associated with a single response in a lookup table.

When we converse, as humans we have some understanding of the context in which a word is used. For example, the question "Did you go to the bank at lunchtime?" probably refers to a financial bank and not a river bank. We know this because, as humans living in the modern world, we know that going to the bank is an activity often performed by people during their midday break. A computer program lacks this everyday knowledge, and currently there is no satisfactory way to give it such knowledge, despite the best efforts of artificial intelligence researchers.

Although people have sometimes come up with schemes that allow chatterbots to attempt to differentiate among the various meanings of a word, these schemes are not very robust and they often fail in practice. For example, if you associate the phrase go to the bank with the financial meaning of bank, you also have to cope with variants like going to the bank, been to the bank, arrived at the bank, go to the nearest bank, etc. Some of these are not unambiguously associated with the financial institution: for example, go to the nearest bank might also be used in a sailing context. There are just too many different ways in which people can use a word.
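To see how fragile such a scheme is, here is a rough sketch of the kind of rule described above. The pattern and the names FINANCIAL_BANK and guess_sense are invented for this illustration. The rule treats go, going, been or arrived followed by to or at and then bank as a sign of the financial meaning, allowing one optional word such as nearest in between.

```python
import re

# Hypothetical rule: a verb of motion followed by "to/at (the) (word) bank"
# is taken to mean the financial institution.
FINANCIAL_BANK = re.compile(
    r"\b(go|going|been|arrived)\s+(to|at)\s+(the\s+)?(\w+\s+)?bank\b",
    re.IGNORECASE,
)


def guess_sense(utterance):
    """Guess which meaning of 'bank' is intended, using the rule above."""
    return "financial" if FINANCIAL_BANK.search(utterance) else "unknown"


print(guess_sense("Did you go to the bank at lunchtime?"))
print(guess_sense("Head for the shore and go to the nearest bank"))
```

Both calls report the financial meaning, even though the second utterance comes from a sailing context. Every extra variant added to the pattern makes this kind of false match more likely.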

Taking Context Into Account

More advanced programs, for example Rollo Carpenter's Jabberwacky, keep track of the conversational context. Rather than considering only the most recent user input, they take into account the progress of the conversation so far. This gives far better conversational ability, as the bot has an improved ability to stay on topic and converse in a more human way. With this approach the multiple meanings of words are likely to be taken into account, because they're embedded in an overall conversational context.

But there's also a hidden problem that's much more fundamental: the program has no understanding of the conversation it's engaged in. It doesn't even understand the meaning of the words and phrases it uses. It just sees a particular pattern of letters and responds by outputting another pattern of letters. The patterns could be in English, Dutch or Swahili and it would make no difference. This is well illustrated by Jabberwacky's ability to converse in many different languages. Essentially it matches its responses to the stimuli it gets from the user. But it understands nothing.
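The idea of taking context into account can be illustrated with a very small sketch. This is not Jabberwacky's actual algorithm, which isn't described here; it's a hypothetical bot (the ContextBot class, its table and its fallback reply are all invented for the example) whose rules are keyed on the bot's own previous line as well as on a keyword.

```python
# Hypothetical table: (bot's previous line or None, keyword) -> response.
# A context of None means the rule applies whatever was said before.
TABLE = {
    (None, "hello"): "Hiya, how are you today?",
    ("Hiya, how are you today?", "hello"): "Sorry, I did hear you. What shall we talk about?",
    (None, "music"): "What's your favourite kind of music?",
    ("What's your favourite kind of music?", "not music"): "OK, let's drop the subject of music.",
}


class ContextBot:
    def __init__(self, table, fallback="Tell me more."):
        self.table = table
        self.fallback = fallback  # assumption: reply used when no rule matches
        self.last_reply = None    # the only conversational context we keep

    def reply(self, user_input):
        text = user_input.lower()
        # Try rules tied to the bot's previous line first, then context-free rules.
        for context in (self.last_reply, None):
            for (rule_context, keyword), response in self.table.items():
                if rule_context == context and keyword in text:
                    self.last_reply = response
                    return response
        self.last_reply = self.fallback
        return self.fallback


bot = ContextBot(TABLE)
print(bot.reply("Hello there!"))
print(bot.reply("Hello? Did you hear what I said?"))
```

The second hello now gets a different reply from the first, so the conversation stays on track. But the bot is still only comparing strings of letters: translate the table into Dutch or Swahili and the code would work unchanged.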

Grammatical Analysis

Many people have tried the idea of analysing conversational utterances grammatically. The aim is to work out the grammatical structure of each sentence, labelling each word as a part of speech (noun, verb, adjective, etc), a process known as parsing. The next stage would be to somehow extract meaning from the parsed sentences. Unfortunately, most people are very ungrammatical in their use of language. You only have to visit an internet chatroom or forum to appreciate this. Any program based on parsing sentences is bound to fail when presented with ungrammatical input; in other words, it won't work with real people, because we don't talk proper.

What people say may not be grammatically correct, but it usually manages to convey meaning. On the other hand, the linguist Noam Chomsky famously pointed out that a sentence like "Colourless green ideas sleep furiously" is grammatically correct but carries no real meaning.

Another problem is that the parsing of a sentence is not a separate activity from extracting its meaning: the two are interdependent. For example, "Are the chickens ready to eat?" contains a serious ambiguity. Does it mean "Are the chickens hungry?" or "Are the chickens fully cooked?" In the first case to eat is an active infinitive, but in the second case it's effectively passive, meaning to be eaten. There's no way to tell which is the correct grammatical analysis, unless you know whether the context is raising poultry or cooking dinner. There are also pathologically ambiguous examples such as "He saw that gas can explode" or "The blind man picked up the hammer and saw", which can be parsed in two quite different ways resulting in two different meanings.
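The labelling problem can be made concrete with a toy example. The sketch below is not a real parser: it uses a tiny hand-written lexicon (an assumption made purely for illustration, with simplified tag names) that lists the possible parts of speech for each word in "He saw that gas can explode", then enumerates every combination.

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to the parts of speech it might be.
LEXICON = {
    "he": ["PRON"],
    "saw": ["VERB"],
    "that": ["CONJ", "DET"],   # "saw that ..." versus "that gas can"
    "gas": ["NOUN"],
    "can": ["AUX", "NOUN"],    # "can explode" versus "a gas can"
    "explode": ["VERB"],
}


def labellings(sentence):
    """Return every possible part-of-speech labelling of the sentence."""
    words = sentence.lower().rstrip(".!?").split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


for labelling in labellings("He saw that gas can explode"):
    print(labelling)
```

Two of the four labellings correspond to the two readings of the sentence, and nothing in the words themselves says which one the speaker meant. Choosing between them requires the meaning, which is exactly what the parse was supposed to deliver.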

Even if a parser analyses a sentence correctly, there are many problems with interpreting the results. A simple example is "Mary saw a book in the shop and wanted to buy it." Buy what? A naive rule would tell us that it must refer to the shop, the nearest preceding inanimate object in the sentence. But as knowledgeable humans, we know that Mary is much more likely to want to buy a book than a shop! A parser can identify it as the object of the verb buy, but the program could easily err in assuming that it means the shop. If this seems too easy to fix, consider "Mary saw a poem in a book and started to read it". Read what? The poem or the book as a whole? Even we humans have problems with ambiguous examples like this.
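Here is a minimal sketch of the nearest-preceding-object rule at work. The function name resolve_it and the hand-supplied set of inanimate nouns are assumptions for the example; a real system would have to get that information from a parser and a dictionary.

```python
def resolve_it(sentence, inanimate_nouns):
    """Return the nearest inanimate noun before the first 'it', or None."""
    words = [word.strip(".,!?").lower() for word in sentence.split()]
    if "it" not in words:
        return None
    position = words.index("it")
    # Walk backwards from the pronoun and stop at the first candidate noun.
    for word in reversed(words[:position]):
        if word in inanimate_nouns:
            return word
    return None


print(resolve_it("Mary saw a book in the shop and wanted to buy it.", {"book", "shop"}))
print(resolve_it("Mary saw a poem in a book and started to read it", {"poem", "book"}))
```

The rule picks the shop in the first sentence and the book in the second. Getting the first one right needs the everyday knowledge that people buy books far more often than shops, and no amount of grammatical analysis supplies that.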

Character and Personality

If a conversational system is to appeal to users, it must have some kind of personality and character. If you ask it "What's your favourite colour?" it needs to know what it prefers. If you ask "Where do you live?", it needs to give a sensible response (and not something like "In a server somewhere in Germany"). And if you tell it "You're stupid!", it needs to react with a suitable emotion, as a human would. Without these abilities, a program will never be interesting to converse with.

Many people have tried to give their chatterbot a character by presenting users with a visual image of the character and, to a lesser extent, by authoring suitable responses. (A good example of the latter is Jürgen Pirner's Jabberwock, which has been given the character of a dragon.) However, unless the character is well constructed and self-consistent, it will fail to convince. Because chatterbots are currently so arbitrary in their responses, zany or robotic characters are possibly the most convincing, though their conversation leaves much to be desired.

Conclusion

Clearly there is a lot more to a conversational system than just responding to individual words or phrases. The system needs to know how the same word can be used in many different ways, and it needs to have general knowledge about the world in which we humans live. Both of these are very difficult problems to solve. A chatterbot should also have a consistent and convincing character if users are to hold satisfying conversations with it.
