Talk to Ditto the donkey and help him learn English Convo.co.uk - Learning bit by bit

How Experiment 5 Works

Experiment 5, Babbling, generates "words" at random using character trigram probabilities. The trigrams and their associated probabilities come from two sources: the user's input and Ditto's experiences in Experiment 2. This experiment also uses the same emotion model as Experiment 2.

Introduction

Babbling is the emission of random utterances, a stage that babies go through when first learning to talk. Ditto the donkey's babbling is not quite the same as a baby's: a baby is learning to form the sounds of the language (phonemes), whereas the elements of Ditto's babbling are character trigrams (e.g. thr, eas, oul).

Ditto responds to what you type by a combination of imitation and his own made-up words. Sometimes his output seems complete gibberish, or possibly a foreign language, but occasionally an English word is recognisable. Very occasionally Ditto will say a whole English sentence, but this is a chance occurrence, not a programmed event. If Ditto's random babblings are sometimes recognisable as meaningful, this is because we read meaning into them, as we often do with babies' babbling.

Rationale

Why bother with generating random utterances? For babies, random utterances are the first stage in a learning process. Parents are very proud when their infant, after weeks of making meaningless sounds, says its first word (i.e. babbling that the parents recognise as English!). The infant itself soon learns that emitting certain sounds (which we call words) gets results, as well as parental attention, and begins to associate these sounds with actions and objects. In effect, it is learning to speak and to convey its desires and intentions.

Similarly, generating random utterances is the first stage in Ditto's learning of English. One way to teach him words would be to implement a learning scheme and reward the production of real English words in some way, possibly also penalising the production of non-words. If the learning process succeeds, he will soon be producing mostly good English words. (In fact Ditto is already learning to recognise English by another route in Experiment 3, English or Gibberish.)

Equipped with a vocabulary of English words, Ditto could next be programmed to construct random utterances, using word pairs (say) in place of character trigrams. Again, he could be rewarded for producing good English sentences rather than meaningless strings of words.

In theory this process could be continued, with Ditto learning to produce responses relevant to a user's utterance, and so on. However, this would be a long and tedious process, and there may be better ways to achieve the same aims.

Trigram Frequencies

Now to some technical details.

The training examples from Experiment 2, Simple Emotion Modelling, were analysed to find the character trigram frequencies occurring in user utterances that Ditto has been exposed to. For example, the trigrams _yo, you and ou_ (where the underscore character _ represents a space) are the most frequent, because people have used the word you a lot when speaking to Ditto. On the other hand, the trigrams w_p, thm and sig have been very infrequently encountered in Ditto's experience. Therefore he is quite likely to produce the word you and much less likely to say something involving infrequently encountered trigrams.
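A minimal Python sketch of this frequency analysis follows. It assumes the utterances are already cleaned up, that spaces are written as underscores, and that an underscore is also added at the start and end of each utterance so that word-initial and word-final trigrams such as _yo and ou_ are captured. The names trigrams and trigram_frequencies are hypothetical.

```python
from collections import Counter

def trigrams(utterance: str) -> list[str]:
    """Return the character trigrams of an utterance, with '_' standing for a space.

    The text is padded with '_' at both ends (an assumption) so that trigrams
    at word boundaries, e.g. '_yo' and 'ou_', are counted.
    """
    text = "_" + "_".join(utterance.split()) + "_"
    return [text[i:i + 3] for i in range(len(text) - 2)]

def trigram_frequencies(utterances: list[str]) -> dict[str, float]:
    """Count trigrams over all utterances and convert counts to relative frequencies."""
    counts = Counter()
    for u in utterances:
        counts.update(trigrams(u))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

freqs = trigram_frequencies(["how are you", "you are nice"])
```

Here _yo, you and ou_ each occur twice, so they outrank one-off trigrams such as nic, mirroring the way you dominates Ditto's experience.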

Ditto's babbling draws on two sources of trigrams: the Experiment 2 training examples, and the user's recent utterances within the current "conversation". The second source allows Ditto's babbling to imitate the user, after a fashion. To achieve this, the user's utterances are first cleaned up as in Experiment 2 and then analysed into a set of trigrams.
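The second source, recent user utterances, can be sketched as a small rolling buffer. The article does not describe the Experiment 2 clean-up rules, so the clean function below (lower-case, letters only) is a stand-in, and the window of five utterances is an arbitrary assumption; the text says only "recent utterances within the current conversation". All names are hypothetical.

```python
import re
from collections import Counter, deque

def clean(utterance: str) -> str:
    """Stand-in clean-up: lower-case and replace anything but letters with spaces.
    The real Experiment 2 rules are not described in this article."""
    return re.sub(r"[^a-z]+", " ", utterance.lower())

def trigrams(text: str) -> list[str]:
    """Character trigrams of a cleaned utterance, with '_' standing for a space."""
    padded = "_" + "_".join(text.split()) + "_"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

class RecentUtterances:
    """Trigram source built from the user's last few utterances."""

    def __init__(self, window: int = 5):  # window size is an assumption
        self.utterances = deque(maxlen=window)  # oldest utterance drops out first

    def add(self, utterance: str) -> None:
        self.utterances.append(clean(utterance))

    def counts(self) -> Counter:
        """Recompute trigram counts over the utterances currently in the window."""
        c = Counter()
        for u in self.utterances:
            c.update(trigrams(u))
        return c
```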

Random Utterance Construction

Ditto's responses are built up using trigrams selected randomly from both sources (training examples and recent user utterances). There is currently (October 2006) an equal probability of choosing a trigram from each source. Within each source, the probability of a trigram being selected is determined by its relative frequency.
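Selecting a single trigram might look like the sketch below: a source is picked with probability 0.5 each, then a trigram is drawn from that source with probability proportional to its count, which is equivalent to drawing by relative frequency. The function name and the fallback for an empty source are assumptions.

```python
import random

def pick_trigram(training: dict[str, int], recent: dict[str, int],
                 rng: random.Random) -> str:
    """Choose a source with equal probability, then a trigram from it in
    proportion to its count (equivalently, its relative frequency)."""
    source = training if rng.random() < 0.5 else recent
    if not source:  # assumed fallback when the chosen source is empty
        source = training or recent
    return rng.choices(list(source), weights=list(source.values()))[0]
```

Because the source is chosen before the trigram, a short conversation carries as much weight as the whole training set, which is what lets Ditto echo the user.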

The first step in building an utterance is to search randomly for a trigram beginning with the underscore character, which marks the beginning of a word. For example, _tr might be chosen.
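One reading of "search randomly" is rejection sampling: draw trigrams by frequency and keep drawing until one meets the requirement, giving up after a fixed number of tries. The sketch assumes that reading; the 1000-try limit and the names random_search and start_trigram are hypothetical.

```python
import random

def random_search(freqs: dict[str, float], wanted, rng: random.Random,
                  max_tries: int = 1000):
    """Repeatedly draw trigrams in proportion to their frequency until one
    satisfies `wanted`; return None if none turns up within `max_tries`."""
    trigrams = list(freqs)
    weights = list(freqs.values())
    for _ in range(max_tries):
        t = rng.choices(trigrams, weights=weights)[0]
        if wanted(t):
            return t
    return None

def start_trigram(freqs: dict[str, float], rng: random.Random):
    """First step: find a trigram beginning with '_' (the start of a word)."""
    return random_search(freqs, lambda t: t.startswith("_"), rng)
```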

The next step is to search for a trigram of the form tr?, where ? represents any character; in general, each new trigram must begin with the last two characters of the partial utterance. Suppose tre is found; the partial utterance is then extended to _tre. This process continues iteratively until a sufficiently long string of characters has been formed. The string is then trimmed back to its last underscore character, discarding any incomplete final word; for instance _tremin_i_byest_dittop_rema would be trimmed to _tremin_i_byest_dittop_.
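The extension and trimming steps can be sketched as follows. For brevity this version draws from a single merged frequency table rather than choosing between the two sources, and the target length of 30 characters is a hypothetical stand-in for "sufficiently long". Returning None on a failed search leaves the restart decision to the caller.

```python
import random

def draw_matching(freqs: dict[str, float], prefix: str, rng: random.Random,
                  max_tries: int = 1000):
    """Draw trigrams by frequency until one starts with `prefix`; None on failure."""
    trigrams, weights = list(freqs), list(freqs.values())
    for _ in range(max_tries):
        t = rng.choices(trigrams, weights=weights)[0]
        if t.startswith(prefix):
            return t
    return None

def extend(freqs: dict[str, float], rng: random.Random, target_len: int = 30):
    """Grow a string from a word-initial trigram, one character at a time.

    Each new trigram must begin with the last two characters built so far,
    e.g. after '_tr' a trigram of the form 'tr?' is needed.
    """
    s = draw_matching(freqs, "_", rng)
    if s is None:
        return None
    while len(s) < target_len:
        nxt = draw_matching(freqs, s[-2:], rng)
        if nxt is None:
            return None
        s += nxt[2]  # append only the new third character
    return s

def trim_to_last_underscore(s: str) -> str:
    """Cut off any incomplete final word after the last '_'."""
    return s[: s.rindex("_") + 1]
```

For example, trim_to_last_underscore("_tremin_i_byest_dittop_rema") gives "_tremin_i_byest_dittop_", matching the example above.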

At any stage it can happen that, after a large amount of random searching, a suitable trigram is not found. This can be either because there are no suitable ones available or because the search has not turned one up yet. In either case the partial utterance is discarded, and the construction process is restarted from the beginning. This can happen several times before a suitable string is finally formed.
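The discard-and-restart behaviour amounts to a simple retry loop. In this sketch try_once is a hypothetical function representing one complete construction attempt that returns None whenever a search fails. The text does not mention a restart limit; max_restarts is an added safeguard so the loop cannot spin forever.

```python
from typing import Callable, Optional

def build_with_restarts(try_once: Callable[[], Optional[str]],
                        max_restarts: int = 100) -> str:
    """Keep calling `try_once` until it returns a string.

    A None result means a trigram search failed; that partial utterance is
    simply discarded and construction starts again from the beginning.
    """
    for _ in range(max_restarts):
        result = try_once()
        if result is not None:
            return result
    raise RuntimeError(f"no utterance built after {max_restarts} attempts")
```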

Moreover, when the utterance is complete it is scanned for certain undesirable substrings. In other words, it is censored in the hope that Ditto will not swear at the user or insult them. (However, this procedure is not guaranteed to be foolproof, and more banned substrings may have to be added as time goes by!) If an unacceptable utterance is detected, the whole construction process is restarted.
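The censorship step reduces to a substring check on the finished utterance. The actual banned list is not published, so BANNED_SUBSTRINGS below holds hypothetical placeholders only.

```python
BANNED_SUBSTRINGS = ("badword", "rudeword")  # hypothetical placeholders

def is_acceptable(utterance: str, banned=BANNED_SUBSTRINGS) -> bool:
    """Return False if the finished utterance contains any banned substring."""
    text = utterance.lower()
    return not any(b in text for b in banned)
```

Under the scheme described above, a False result would be handled like a failed search: the whole utterance is thrown away and construction starts again.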

For these reasons, Ditto's response time can vary. A more efficient backtracking algorithm may be used in the future, but this one has the advantage of being simple and working well in practice, even though it involves a lot of computation.

Post-Processing

Once an utterance has been constructed and accepted, it is subjected to two further processes:

For example, _tremin_i_byest_dittop_ might be transformed into Tremin I byest Dittop?, and this would be Ditto's response.
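Judging only from the single example, post-processing appears to turn underscores into spaces, capitalise the sentence, the pronoun i and the name Ditto, and add end punctuation. The sketch below implements those inferred rules; they are assumptions rather than the documented processes, and the random choice among ".", "?" and "!" is likewise hypothetical.

```python
import random
from typing import Optional

def post_process(raw: str, end: Optional[str] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Turn '_tremin_i_byest_dittop_' into a sentence such as 'Tremin I byest Dittop?'.

    Inferred rules (assumptions): underscores become spaces, 'i' and words
    starting with 'ditto' are capitalised, the first letter is capitalised,
    and a final punctuation mark is appended (random unless `end` is given).
    """
    words = [w for w in raw.split("_") if w]  # drop empty pieces from '_' runs
    words = [w.capitalize() if w == "i" or w.startswith("ditto") else w
             for w in words]
    sentence = " ".join(words)
    sentence = sentence[:1].upper() + sentence[1:]
    if end is None:
        end = (rng or random.Random()).choice(".?!")
    return sentence + end
```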

Emotion Modelling

In this experiment Ditto's webcam images are selected by the same process as used in Experiment 2, Simple Emotion Modelling. The user's utterances are analysed in exactly the same way, using the results of the Experiment 2 training. The only difference is that the responsiveness parameters of the model are set to lower values in this experiment, so Ditto's emotions are likely to remain closer to neutral.

The emotion model is totally separate from the babbling and there is no interaction between the two.

Conclusion

This experiment is an analogue of the babbling stage in child language acquisition. Random utterances are constructed from trigrams sourced from previous user utterances, either the training examples of Experiment 2 or recent utterances within the current "conversation". The probability of selecting a particular trigram depends on its relative frequency in the user utterances. Sometimes real English words will be produced as a result of this probabilistic approach.
