Chatbots. Increasingly ubiquitous and (for this writer, anyway) unfailingly annoying, these computer systems can make or break a user’s experience. Sadly, perhaps, FAQ pages aren’t enough anymore. When we land on a website, we apparently need “interaction.” (Whatever that means, since a chatbot is just an algorithm designed to stand in for an actual human.) Enter the chatbot or other dialog system.
Pre-populated with answers to standard questions, chatbots have a closed and narrow list of possible responses, which means the human user has to adjust. That adjustment, in turn, restricts communication in ways that often feel unnatural. Indeed, the experience often feels as if the human must accommodate the chatbot, rather than the other way around. You know you’re in trouble when you long for the days of interminable hold times followed by an unhelpful, but still human, customer service representative.
The general problem with the chatbot seems to be that it can’t think. Consider the Turing test. In his 1950 paper, mathematician Alan Turing proposed the Imitation Game as a test of machine intelligence. If a human judge, after a text-based chat, can’t reliably tell whether the other party is a machine or a fellow human, then the machine is said to have exhibited thought or intelligence.
But let’s suppose the human types this joke: A vacuum cleaner salesman turns up on a person’s doorstep. “I’ve got a vacuum that sounds too good to be true,” he says, “but it’s real. This vacuum is so effective, it’ll cut your housework by half!” The person thinks for a moment and says, “One vacuum cuts my housework in half, you say?” The salesman nods vigorously. “Why, yes. Yes, it does!” After a moment’s more thought, the person says, “Okay, I’ll take two!”
An ordinary chatbot won’t get why it’s supposed to be funny, not unless it’s programmed to detect the ambiguity at the heart of the joke. Such detection would be no small feat for a program limited to queries about things like product returns. Chatbots require an enormous amount of conversational data as training material, so expanding the program’s repertoire might be more trouble than it’s worth, particularly when jokes are so obviously outside the scope of, say, an ordinary customer service exchange. (Siri and Alexa have apparently learned to engage with the ha-ha’s, but we’re still left to wonder whether they have a sense of humor. That, however, is a topic for another article!)
Ordinary language is replete with ambiguities that elude dialog systems such as chatbots. In a recent study, “Investigating Robustness of Dialog Models to Popular Figurative Language Constructs,” computer scientists from Carnegie Mellon University and UC San Diego found that computer dialog systems, like virtual personal assistants and chatbots, aren’t proficient with figurative language. Such systems stumble over idioms like, “He got cold feet when it came time to parachute from the plane.” When conversations emphasized this sort of language, the systems’ performance suffered.
Lead authors Harsh Jhamtani and Taylor Berg-Kirkpatrick set out to document when, and by how much, dialog systems stumble, and to test one candidate for helping them improve. First, they found that dialog system performance dropped considerably when the conversation data sets emphasized figurative language. Then, when figurative expressions were automatically translated into literal ones via dictionary look-ups before being fed to the models, dialog system performance improved by up to 15%. There is more work to be done, for example in preserving the full meaning of a figurative expression, but the hope is that such refinements will make dialog systems more engaging for their human users.
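To make the idea concrete, here is a minimal sketch, in Python, of that kind of idiom-to-literal preprocessing. The tiny dictionary, the function name literalize, and the example sentence are illustrative assumptions for this article, not the study’s actual code or resources.

    # Minimal sketch of idiom-to-literal preprocessing (illustrative only).
    # The tiny dictionary below stands in for the far larger idiom
    # dictionaries a real system would consult.
    IDIOM_TO_LITERAL = {
        "got cold feet": "became too nervous to go through with it",
        "cut your housework by half": "reduce your housework by half",
    }

    def literalize(utterance: str) -> str:
        """Replace known figurative phrases with literal paraphrases
        before handing the text to a dialog model."""
        text = utterance.lower()
        for idiom, literal in IDIOM_TO_LITERAL.items():
            text = text.replace(idiom, literal)
        return text

    print(literalize("He got cold feet when it came time to parachute from the plane."))
    # -> he became too nervous to go through with it when it came time to parachute from the plane.

Even this toy version shows why the approach can help: the downstream model only ever sees the kind of literal language it was most likely trained on. A real pipeline, of course, also has to decide when a phrase is being used figuratively in the first place, which is part of the remaining work the authors point to.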
Sources: Neuroscience News, Cornell University arXiv