
Back channels


The term back-channel indicates the communicative behaviours that participants in a conversation produce as feedback on their reception of the other participants' communicative behaviours.

As Levinson [1] has paraphrased it: communication consists of the ‘sender’ intending to cause the ‘receiver’ to think or do something, just by getting the ‘receiver’ to recognize that the ‘sender’ is trying to cause that thought or action. Communication is therefore a complex kind of intention that is achieved or satisfied simply by being recognized.

A general assumption behind the concept of back-channel is that all the participants in a face-to-face conversation are both producers and recipients of communicative signals, but that this occurs on different levels. During a conversation the listener is called on to provide information on the success of the communication. Through one or more channels such as voice, head, face, gaze, posture and gesture, listeners provide backchannel signals of perception, attention, interest, understanding, attitude (belief, liking...) and acceptance towards what the speaker is saying [2, 3, 4].

A major challenge in the design of virtual agents is credibility, not only in the agent’s appearance but also in its behaviour. Users tend to react as they would in a real human-human interaction when the virtual agent behaves in a natural, human manner.

Implementing back-channel behaviour in a conversational agent makes the ECA more realistic and human-like, and lets the user feel that the ECA participates actively in the conversation. Back-channels provide information about the listener’s mental state towards the speaker’s speech (e.g., whether or not s/he believes what the speaker is saying) and also about its behaviour tendencies (or Baseline), that is, the particular way of producing non-verbal signals that characterizes the agent.

  • Baseline: behaviour tendencies are defined by the agent’s preference for each available communicative modality (head, gaze, face, gesture and torso) and by a set of parameters (expressivity parameters) that affect the qualities of the agent’s behaviour (e.g. wide vs. narrow gestures).
  • Agent’s mental state: how the agent reacts to the interaction, that is, how the agent reacts to the user’s speech (whether it agrees with, refuses, understands... what is being said). With this information the system specifies which communicative intentions (agree, refuse, understand, ...) it will convey through its backchannel signals. We consider twelve communicative intentions related to backchannels, chosen from the literature: agreement, disagreement, acceptance, refusal, belief, disbelief, interest, no interest, liking, disliking, understanding, no understanding [6]. A minimal sketch of both notions is given right after this list.
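
The following is a minimal, purely illustrative Java sketch of these two notions; the class and field names are invented for this page and are not the actual Greta types. The baseline stores a preference per modality plus expressivity parameters, and the mental state stores an activation degree for each of the twelve communicative intentions.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only; these are not the real Greta classes.
public class BaselineAndMentalStateSketch {

    enum Modality { HEAD, GAZE, FACE, GESTURE, TORSO }

    enum Intention {
        AGREEMENT, DISAGREEMENT, ACCEPTANCE, REFUSAL, BELIEF, DISBELIEF,
        INTEREST, NO_INTEREST, LIKING, DISLIKING, UNDERSTANDING, NO_UNDERSTANDING
    }

    // Baseline: preference (0..1) for each communicative modality plus
    // expressivity parameters shaping how signals are produced.
    static class Baseline {
        Map<Modality, Double> modalityPreference = new EnumMap<>(Modality.class);
        double spatialExtent;   // wide vs. narrow gestures
        double temporalExtent;  // fast vs. slow movements
    }

    // Agent's mental state: activation degree (0..1) of each intention,
    // i.e. how the agent reacts to what the user is saying.
    static class MentalState {
        Map<Intention, Double> activation = new EnumMap<>(Intention.class);
    }

    public static void main(String[] args) {
        Baseline baseline = new Baseline();
        baseline.modalityPreference.put(Modality.FACE, 0.9);  // mostly facial expressions
        baseline.spatialExtent = 0.8;                         // rather wide gestures

        MentalState state = new MentalState();
        state.activation.put(Intention.LIKING, 0.7);          // the agent likes what it hears
        System.out.println(state.activation);
    }
}
```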

To display believable listener behaviour, a virtual agent must be able to decide when a backchannel signal should be emitted and to select which communicative intentions it should transmit through that signal. The backchannel trigger needs three inputs:

  • the user’s verbal and non-verbal behaviour, tracked through a video camera and a microphone;
  • the user’s estimated interest level, an emotional state linked to the speaker’s goal of obtaining new knowledge; this level is computed by evaluating the user’s gaze, head and torso direction within a temporal window;
  • the agent’s mental state towards the interaction.

The probability that the user’s behaviour provokes a backchannel signal from the agent depends on the user’s estimated level of interest. The system uses this value to vary the backchannel emission frequency: when the interest level decreases, the user might want to stop the conversation, so the agent provides fewer and fewer backchannels.

When a backchannel must be emitted, the backchannel generator module uses the information about the agent’s mental state to decide which communicative intentions the agent should convey.
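
A minimal sketch of the trigger logic described above could look as follows; the names, thresholds and probabilities are invented for illustration. The emission probability is scaled by the estimated interest level, and the intentions to convey are the most active ones in the agent's mental state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only; values and names are invented.
public class BackchannelTriggerSketch {

    static final Random RNG = new Random();

    // Lower interest level -> lower emission probability -> fewer backchannels.
    static boolean shouldEmitBackchannel(double baseProbability, double interestLevel) {
        return RNG.nextDouble() < baseProbability * interestLevel;
    }

    // Convey the intentions whose activation in the mental state is high enough.
    static List<String> selectIntentions(Map<String, Double> mentalState, double threshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Double> entry : mentalState.entrySet()) {
            if (entry.getValue() >= threshold) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Double> mentalState =
                Map.of("agreement", 0.8, "interest", 0.6, "disbelief", 0.1);
        if (shouldEmitBackchannel(0.4, 0.9)) {
            System.out.println("Backchannel intentions: " + selectIntentions(mentalState, 0.5));
        }
    }
}
```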

Back-channels in the Greta platform

The figure below shows a basic configuration to which the modules needed to have the agent produce back-channels are added, namely the ListenerIntentPlanner, SSIXMLToFrameTranslator and SSIFrameToSignalTranslator modules. The last two modules recognize the user’s verbal and nonverbal behaviours and send them to the ListenerIntentPlanner module.

[Figure: basic Greta configuration extended with the ListenerIntentPlanner, SSIXMLToFrameTranslator and SSIFrameToSignalTranslator modules]

The SSIXMLToFrameTranslator receives an XML file (Figure below) containing the detected information about the user. Audio-visual signals are captured from the user’s behaviours (via camera and microphone); in particular, acoustic cues linked to prosody are gathered. Head and gesture positions are also stored.

This XML file is the input that the Greta platform needs to trigger the back-channels.

The file can be created via SSI or EyesWeb and then sent to Greta via ActiveMQ. The SSIXMLToFrameTranslator receives the file (the receiving frequency can be set in SSI or EyesWeb) and translates it into frames according to the type of information provided, i.e. headframe, prosodyframe or bodyframe. The frames are sent to the SSIFrameToSignalTranslator module, which translates them into signals (headSignal, audioSignal, AUSignal, etc.). The signals produced are called UserSignals, i.e. the translation, into the Greta platform’s language, of what the user has just performed. The UserSignals are the key to triggering the back-channels.
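
As a minimal sketch of this translation chain, assuming purely illustrative class names rather than the actual Greta/SSI types, incoming frames could be mapped onto UserSignals like this:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; Frame and UserSignal are invented stand-ins
// for the frame and signal types mentioned above.
public class UserSignalTranslationSketch {

    // A frame carries one kind of detected information (head, prosody, body).
    record Frame(String type, String value) {}

    // A UserSignal describes, in the platform's terms, what the user just did.
    record UserSignal(String name) {}

    static List<UserSignal> toUserSignals(List<Frame> frames) {
        List<UserSignal> signals = new ArrayList<>();
        for (Frame frame : frames) {
            signals.add(new UserSignal(frame.type() + ":" + frame.value()));
        }
        return signals;
    }

    public static void main(String[] args) {
        List<Frame> frames = List.of(new Frame("head", "nod"), new Frame("prosody", "pitch_rise"));
        toUserSignals(frames).forEach(signal -> System.out.println(signal.name()));
    }
}
```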

The ListenerIntentPlanner receives the UserSignals, checks whether all of them appear in any of its rules and, if so, can trigger the signals to be performed by the agent according to that rule.

[Figure: example of the input XML file]
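
A minimal sketch of the rule check described above, with an invented representation of rules and signals as plain strings and sets, could fire a rule when all the UserSignals it requires have been observed:

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only; the real ListenerIntentPlanner rules are
// defined elsewhere in the platform.
public class ListenerRuleSketch {

    record Rule(Set<String> requiredUserSignals, List<String> agentSignals) {
        boolean matches(Set<String> observed) {
            // The rule fires only if every required UserSignal was observed.
            return observed.containsAll(requiredUserSignals);
        }
    }

    public static void main(String[] args) {
        Rule rule = new Rule(Set.of("head:nod"), List.of("head=nod", "face=smile"));
        Set<String> observed = Set.of("head:nod", "prosody:pitch_rise");
        if (rule.matches(observed)) {
            System.out.println("Agent performs: " + rule.agentSignals());
        }
    }
}
```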

The ListenerIntentPlanner computes the backchannel signals that the agent provides while listening. This module implements three types of backchannels:

  • Reactive: derived from a first process of perception of the speaker's speech; they show contact and perception;
  • Response: generated by a more aware evaluation that involves memory and cognitive processes;
  • Mimicry: derived from the imitation of the speaker's behaviour.

Response/Reactive back-channels are linked to the agent’s mental state, which is used to decide which communicative functions the agent should convey.

The mimicry module determines which of the speaker's signals the agent should mimic. So far, we consider only the speaker's head movements (nod, shake) and some facial expressions (smile) as signals to mimic.

[Figure]
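
A minimal sketch of the mimicry selection described above, with an invented list of mimicable signals reflecting the text (head nod, head shake, smile), simply keeps the observed user signals that the agent is allowed to copy:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; the signal names are invented.
public class MimicrySketch {

    static final Set<String> MIMICABLE = Set.of("head:nod", "head:shake", "face:smile");

    static List<String> signalsToMimic(List<String> userSignals) {
        List<String> toMimic = new ArrayList<>();
        for (String signal : userSignals) {
            if (MIMICABLE.contains(signal)) {
                toMimic.add(signal);   // the agent will copy this signal back
            }
        }
        return toMimic;
    }

    public static void main(String[] args) {
        List<String> observed = List.of("head:nod", "gesture:beat", "face:smile");
        System.out.println("Agent mimics: " + signalsToMimic(observed));
    }
}
```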

In the Greta platform, the agent's mental state is an XML file in which the communicative intentions can be specified. All of them are listed in the example XML file shown in the figure below.

[Figure: example XML file specifying the agent's mental state (communicative intentions)]
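
If one wanted to read such a file programmatically, a minimal sketch using the standard Java DOM API could look like the following. The element and attribute names (intention, name, importance) and the file name are invented for illustration; the actual schema is the one shown in the figure above.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative sketch only; the element/attribute names are hypothetical.
public class MentalStateReaderSketch {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("mental_state.xml"));             // hypothetical file name
        NodeList intentions = doc.getElementsByTagName("intention");  // hypothetical tag
        for (int i = 0; i < intentions.getLength(); i++) {
            Element intention = (Element) intentions.item(i);
            System.out.println(intention.getAttribute("name")
                    + " -> " + intention.getAttribute("importance"));
        }
    }
}
```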

The type of back-channels can change according to the agent’s persona. For example, a happy and positive agent tends to communicate mainly through facial expressions and to provide back-channel signals that express positive communicative intentions, such as liking, acceptance and interest. A gloomy and sad agent, instead, tends to produce back-channel signals mostly with the head and to convey negative communicative intentions, in particular disbelief, refusal and no understanding. A pragmatic and sensitive agent tends to perform slow movements, mainly on the head and face modalities, and conveys positive communicative intentions, in particular agreement, belief and understanding.
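
Following the examples in the paragraph above, a persona could be mapped, in a purely illustrative way, onto a modality preference and a set of favoured communicative intentions; the names and values below are invented, not Greta's actual configuration.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only; personas, preferences and values are invented.
public class PersonaSketch {

    record Persona(Map<String, Double> modalityPreference, List<String> favouredIntentions) {}

    static final Map<String, Persona> PERSONAS = Map.of(
            "happy", new Persona(Map.of("face", 0.9, "head", 0.5),
                    List.of("liking", "acceptance", "interest")),
            "gloomy", new Persona(Map.of("head", 0.8, "face", 0.3),
                    List.of("disbelief", "refusal", "no understanding")),
            "pragmatic", new Persona(Map.of("head", 0.7, "face", 0.7),
                    List.of("agreement", "belief", "understanding")));

    public static void main(String[] args) {
        Persona persona = PERSONAS.get("happy");
        System.out.println("Preferred modalities: " + persona.modalityPreference());
        System.out.println("Favoured intentions:  " + persona.favouredIntentions());
    }
}
```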

References

[1] Stephen C. Levinson, ‘Putting linguistics on a proper footing: explorations in Goffman’s concept of participation’, in Erving Goffman. Exploring the Interaction Order, eds., Paul Drew and Anthony Wootton, 161–227, Polity Press, Cambridge, (1988).

[2] Adam Kendon, ‘Movement coordination in social interaction: some examples described’, Acta Psychologica, 32, 100–125, (1970).

[3] W.S. Condon and W.D. Ogston, ‘A segmentation of behavior’, Journal of Psychiatric Research, 5, 221–235, (1967).

[4] A.E. Scheflen, ‘The significance of posture in communication systems’, Psychiatry, 27, 316–331, (1964).

[5] J. Allwood, J. Nivre, and E. Ahlsén, ‘On the semantics and pragmatics of linguistic feedback’, Semantics, 9(1), (1993).

[6] J. Allwood, J. Nivre, and E. Ahlsén, ‘On the semantics and pragmatics of linguistic feedback’, Semantics, 9(1), (1993).

[7] E. Bevacqua, D. Heylen, M. Tellier, and C. Pelachaud, ‘Facial feedback signals for ECAs’, in AISB'07 Annual Convention, workshop “Mindful Environments”, 147–153, Newcastle upon Tyne, UK, (2007).

[8] D. Heylen, E. Bevacqua, M. Tellier, and C. Pelachaud, ‘Searching for prototypical facial feedback signals’, in Proceedings of the 7th International Conference on Intelligent Virtual Agents (IVA 2007), 147–153, Paris, France, (2007).

[9] J. L. Lakin, V. A. Jefferis, C. M. Cheng, and T. L. Chartrand, ‘The chameleon effect as social glue: evidence for the evolutionary significance of nonconscious mimicry’, Journal of Nonverbal Behavior, 27(3), 145–162, (2003).
