A New Way to Practice What You Preach

USC Viterbi computer scientists create a virtual audience for public speaking training.

Stefan Scherer, research assistant professor at ICT and the USC Viterbi Department of Computer Science.

Cicero aims to create expert public speakers in the tradition of the famous Roman orator.

Even for the greatest orators, a polished performance requires practice and feedback. But sharpening a speech in front of a crowd, or even a close friend, can induce anxiety in almost anyone. And for those who do want to rehearse in front of real people, a willing group is not always available.

Enter Cicero, an interactive virtual audience solution being developed by researchers at the USC Institute for Creative Technologies (ICT) and the USC Viterbi School of Engineering. Named for the Roman rhetorician, Cicero combines machine learning models and Toastmasters tips to automatically evaluate a person’s delivery and provide constructive critiques for improvement.

“We’ve all had the experience practicing a presentation in front of mirrors or empty chairs,” said Stefan Scherer, co-leader of this effort and research assistant professor at ICT and the USC Viterbi Department of Computer Science. “But in order to get better, you need audience feedback, including nonverbal signals like nodding heads or downcast eyes that tell you if you are doing well or not. The goal of this project is to give people that feedback before it’s too late.”

To begin that process, Scherer and project co-leader Louis-Philippe Morency, director of ICT’s MultiComp Lab and a research assistant professor in the Department of Computer Science, made a science out of studying public speaking. They compiled the characteristics that studies and expert elocutionists have found will put an audience on the edge of their chairs, and those that will send them slumping in their seats.

Next, they brought in study subjects who gave speeches in front of a static virtual audience. Researchers recorded their performances, tracking components of the presenters’ speech, gaze and body movement, and measuring and monitoring over 20 nonverbal characteristics associated with good or bad speaking performances.

The focus was on style, not substance. The team did not address the content of what people said—that aspect might be added later—but rather looked and listened for the way in which the speech was delivered. Were voices monotone or did inflections change? Did people speak in a breathy whisper or with a strong timbre? Did they make the most of the space on the stage, direct their eyes to specific people, wave their arms or clasp their hands?
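To make one of these questions concrete, here is a minimal sketch of how "monotone or not" could be turned into a number. It is an illustration only, not the Cicero team's method: it assumes a pitch tracker has already produced one pitch estimate in hertz per audio frame (with 0 marking unvoiced frames), and the function names and the 0.1 threshold are hypothetical choices made for this example.

```python
from statistics import mean, pstdev

def pitch_variation(pitch_hz):
    """Coefficient of variation of the voiced pitch samples (0.0 = perfectly flat)."""
    # Assumption: a value of 0 marks an unvoiced frame and is ignored.
    voiced = [p for p in pitch_hz if p > 0]
    if len(voiced) < 2:
        return 0.0
    return pstdev(voiced) / mean(voiced)

def is_monotone(pitch_hz, threshold=0.1):
    """Flag a delivery as monotone when pitch varies less than the threshold.

    The threshold is a made-up illustrative value, not a published one.
    """
    return pitch_variation(pitch_hz) < threshold
```

A speaker whose pitch stays near a single value scores close to zero, while one whose inflections rise and fall scores higher. A real system would presumably combine many such measurements rather than rely on any single one.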

“These are all measurable factors that go into determining whether a performance is effective or not,” Morency said. “People make these calculations automatically, and what we discovered is that computers can be taught to do the same.”

In results presented last year at the International Conference on Intelligent Virtual Agents, the researchers reported that the initial Cicero prototype recognized properties of effective speeches, including strong voice quality, eye contact and gesturing, nearly as accurately as trained Toastmasters who had volunteered to appraise the talks.

The evaluative engine driving Cicero is MultiSense. Developed by ICT research programmer Giota Stratou, MultiSense can instantly quantify facial expressions, posture and speech patterns. The framework, combined with simple cameras, microphones and a Microsoft Kinect sensor, can automatically analyze people’s gestures, voices, eye contact and facial expressions to provide intelligent feedback that helps them improve their performances in public speaking.

In SimSensei, another ICT research project, MultiSense automatically assesses nonverbal behaviors associated with depression and allows a virtual interviewer to respond appropriately.

In the case of Cicero, the researchers’ next challenge is to combine MultiSense and SmartBody, a character animation system overseen by ICT research scientist and Cicero co-investigator Ari Shapiro. SmartBody determines individualized feedback behaviors for each member of the virtual audience—behaviors that are driven by the practice performance and informed by learning strategies designed to effect positive change.

“We are realizing we don’t need to model everything a real audience would do,” said Morency. “Rather than have a virtual listener quietly fall asleep, we might have them shift their body and cough to signal to the speaker that people seem bored. If a speaker avoids eye contact, we might have an audience member clear his or her throat to get the presenter’s attention.”
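The two examples Morency describes can be read as simple rules that map a speaker's behavior to an audience reaction. The sketch below is only an illustration of that idea, not how SmartBody or Cicero actually decides what to do: the input measures (an engagement estimate and an eye-contact ratio, each between 0 and 1), the thresholds and the function name are all assumptions made for this example.

```python
def audience_cues(eye_contact_ratio, engagement,
                  min_eye_contact=0.3, min_engagement=0.4):
    """Return the reactions a virtual listener might show.

    Both inputs are hypothetical scores between 0 and 1; the thresholds
    are illustrative defaults, not values from the research.
    """
    cues = []
    if engagement < min_engagement:
        # Signal boredom actively instead of quietly falling asleep.
        cues.append("shift posture and cough")
    if eye_contact_ratio < min_eye_contact:
        # Prompt the presenter to look up at the audience.
        cues.append("clear throat")
    return cues
```

With these defaults, `audience_cues(0.8, 0.9)` returns an empty list, while `audience_cues(0.1, 0.2)` returns both cues.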

The team is currently conducting a study with 60 people who each give a presentation. Some people get no feedback, while others receive feedback in the form of green or red color bars that indicate levels of audience engagement. A third group gets their feedback from an interactive virtual audience. After receiving feedback (or not), each person presents again. After a comparison, the team hopes to determine which form of feedback is most effective.
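The comparison at the heart of this design can be sketched in a few lines. The code below is an assumption-laden illustration rather than the team's analysis: it supposes each presentation receives a single numeric score, and it simply averages the change from the first talk to the second within each feedback condition. The condition labels and the function name are hypothetical, and a real study would also test whether the differences are statistically significant.

```python
from statistics import mean

def mean_improvement(sessions):
    """Average score gain per feedback condition.

    sessions: iterable of (condition, score_before, score_after) tuples.
    Scores are assumed to be on one common numeric scale.
    """
    gains = {}
    for condition, before, after in sessions:
        gains.setdefault(condition, []).append(after - before)
    return {condition: mean(values) for condition, values in gains.items()}

# Made-up placeholder scores, for illustration only (not study data).
example = [
    ("no feedback", 5, 5),
    ("no feedback", 6, 7),
    ("color bars", 5, 7),
    ("virtual audience", 4, 7),
]
print(mean_improvement(example))
```

Whichever condition shows the largest average gain would be the candidate for the most effective form of feedback, subject to the usual caveats about sample size and scoring.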

Aside from helping to improve speechmakers’ skills, this phase of the Cicero study aims to advance ICT’s existing research on developing interactive virtual humans, including improving how these characters move, listen, react and perceive as they communicate with real people. The researchers are also using this project to better understand how to implement effective computer-delivered instruction, provide automatic assessments and model individualized behaviors for diverse groups of virtual humans.

ICT has long specialized in training systems to improve interpersonal skills. Cicero is sponsored in part by the U.S. Army to encourage the development of leaders who are confident speaking in front of a crowd. The National Science Foundation also provides funding. The team sees other potential applications, such as preparing politicians for press conferences or job candidates for group interviews, that can improve how people present themselves and inform future human-computer interaction research.

If Morency and Scherer succeed in making it easier for people across professions to more clearly get their points across, even the most apprehensive of announcers might be prompted to propose a toast.

But the researchers caution that practice, whether with a virtual audience or a real one, is not the only factor in delivering a crowd-pleasing presentation.

“People project more confidence when they are enthusiastic about the message they want to deliver,” said Scherer. “There may be people who have plenty of training but they don’t believe in what they are saying.”

And that may be the most valuable feedback of all.