You can’t see him, talk to him on the phone, or interact with him except by texting messages to him. Communicate with Eugene Goostman, and he’ll tell you he’s a 13-year-old Ukrainian, from Odessa. And like many teenagers, you’ll discover he’s flippant, evasive, over-confident, and prone to misdirection and dissembling. He quips. He’s sarcastic. Unsurprisingly given his Ukrainian roots, he speaks in broken English. Unlike real teenagers, however, Eugene is a computer program, designed by a team of Russian researchers to convince human judges that he’s real flesh and blood.
In a much-ballyhooed event hosted by the University of Reading and conducted at the Royal Society in London, the Goostman program convinced 33 percent of a panel of human judges, in five-minute text exchanges, that he was human. Robert Marks and David Klinghoffer have already commented on the story. Here’s my take.
The event made major news — blogs and news organizations around the world covered it — because the test is based on the original Turing test, the iconic measure of machine intelligence first proposed by computer pioneer Alan Turing in his landmark 1950 paper "Computing Machinery and Intelligence." Pose questions to a computer and a human over a Teletype (or, today, via texts), and if a human judge can’t tell the difference, Turing proposed that we should consider the machine intelligent. Why? Well, for one thing, because we can’t "see" consciousness in other people or in computers, and the very concepts of consciousness, mind, and intelligence are open to endless philosophical debate. Turing, ever the scientist and engineer, simply proposed that we converse with machines purporting to be humans, and if we judge them humanly intelligent, then that’s the story. They are.
And so too with Goostman, in this limited version of a Turing test. As a few of the judges decided he was human, the academics at Reading quickly claimed the official date of the first successful Turing test: June 7, 2014, sixty years to the day after Turing’s death. This would be big news, of course, if it were remotely true. It’s not. But let’s start with the hype, then the actual test, then what it all really means.
Any ostensible success in AI — whatever the reality — will, of course, be immediately seized upon as a major success with potentially earth-shaking ramifications for humans. I’m not a psychologist, but there’s something almost delusional about the emotional reporting on Artificial Intelligence. The Independent announced the ostensible Goostman victory as a "breakthrough," claiming also that the program is a "supercomputer." Time Magazine — in an astonishing bit of journalistic emotionalism and excess even for Time — simply proclaimed that "The Age of Robots is Here." The BBC called it a "world first," while the popular tech blog Gizmodo announced to their readers "This is big." And on and on.
It’s not big. I’m not sanguine about real successes in AI, but as Aesop warned us about crying wolf, the field is not advanced by pseudo-successes based on tricks and gimmicks. Indeed, Gary Marcus, a cognitive scientist at NYU who writes for The New Yorker, denounced the Goostman flap as a parlor trick, and serious computer scientists like Hector Levesque of Toronto have called such performances "cheap tricks." The sober criticism stems from the very real observation that Goostman is designed to feign human responses, without actually understanding any of the questions posed to it. As Marcus explains, such misdirection creates an illusion of intelligence without requiring any:
Marcus: Do you read The New Yorker?
Goostman: I read a lot of books … So many — I don’t even remember which ones.
Goostman’s response relies on a shallow parse of Marcus’s question: the verb "read" followed by a proper noun suggests the title of a book (The New Yorker is in fact a magazine). It reflects no understanding of the question; it’s a canned response to any question about reading. Since the program knows nothing about the question beyond shallow syntactic cues (the word "read," and so on), the misdirection isn’t even clever. It’s a trick.
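The keyword-trigger strategy described above is easy to make concrete. The following is a minimal, hypothetical sketch — not Goostman’s actual code, which was never published — of a responder that matches a surface cue like the word "read," ignores everything else in the question, and returns a canned deflection:

```python
# Hypothetical sketch of a keyword-triggered chatbot response
# (illustrative only; NOT the actual Goostman program).
import re

# Canned replies keyed on shallow surface cues, not on meaning.
CANNED = {
    r"\bread\b": "I read a lot of books ... So many, I don't even remember which ones.",
    r"\b(where|live|from)\b": "I live in Odessa. It is a big city in Ukraine.",
}

FALLBACK = "It is an interesting question, I think. Could you ask something else?"

def reply(question: str) -> str:
    """Return a canned response based only on keyword matches."""
    q = question.lower()
    for pattern, answer in CANNED.items():
        if re.search(pattern, q):
            return answer
    return FALLBACK  # deflect when no cue matches at all

print(reply("Do you read The New Yorker?"))
```

Note that the program never represents what "The New Yorker" is, or even that a question was asked about it; the verb "read" alone selects the reply, which is exactly why the illusion collapses under sustained questioning.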
Beyond the Goostman designers’ basic strategy of fooling judges rather than understanding their questions, the so-called Turing test itself was unfairly limited. The designers created a character human judges expect little from — not just because he’s 13, but because he’s not a native English speaker. The boyish know-nothing sarcasm, combined with the limitations baked into the character, means the bar was set artificially low to begin with. Add to this that the test had a five-minute time limit, unlike Turing’s original, and required only a minority of the judges to be fooled for a "victory." Fully 66 percent of the judges thought Goostman was not human. This is certainly not the original Turing test, yet the University of Reading researchers, along with the Pavlovian press, seemed unperturbed by such cheats. But the details matter, of course. Double the time to ten minutes, and how many judges would still be duped? One? None?
So, yes, Goostman is a fraud. That’s been the story all along with attempts at passing the Turing test. In the 1960s, a silly program called ELIZA posed as a Rogerian psychotherapist and created a temporary illusion of understanding, too. A typical ELIZA exchange was eerily human:
Patient: Well, I’ve been having problems with my husband.
ELIZA: Tell me more about your husband.
Patient: He just doesn’t listen.
ELIZA: Does your husband’s not listening bother you?
And so on. We’re just not making any headway on actual Artificial Intelligence when we’re engaged in these admittedly fun and humorous exercises. Serious AI researchers like Levesque know this, of course, and have argued that it points to a limitation in Turing’s original test. Levesque and Marcus, for instance, argue that we might eliminate the fraudulent element by reformulating the test as a question-and-answer session requiring substantive answers to informative questions. Another strategy, and no doubt the one Turing originally envisioned, would be to remove the time limit. Given enough time, cheap tricks are inevitably exposed.
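The machinery behind exchanges like the ELIZA transcript above is almost embarrassingly simple, and a toy version makes the point. This is a hedged sketch of the general pattern-and-reflection technique, not Weizenbaum’s original code: match a keyword pattern, swap first-person words for second-person ones, and echo the fragment back as a question.

```python
# Toy ELIZA-style responder (illustrative sketch, not Weizenbaum's original).
import re

# First-person words are swapped for second-person before echoing back.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

def reflect(fragment: str) -> str:
    """Flip pronouns so the fragment can be echoed back to the speaker."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

# (pattern, response template) pairs; {0} receives the reflected match.
RULES = [
    (r"i'?ve been having problems with (.+)", "Tell me more about {0}."),
    (r"my (.+)", "Why do you say your {0}?"),
]

def eliza(statement: str) -> str:
    s = statement.lower().rstrip(".!?")
    for pattern, template in RULES:
        m = re.search(pattern, s)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))
    return "Please go on."  # content-free prompt when nothing matches

print(eliza("Well, I've been having problems with my husband."))
```

The "eerily human" reply — "Tell me more about your husband" — is produced by nothing more than a regular expression and a pronoun swap, which is precisely why such programs tell us nothing about machine understanding.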
Ultimately, the question of whether a computer can reproduce a mind has two sides. On the one hand, we have a tangle of scientific and engineering questions about the nature of computation, the human brain, and intelligence itself. On the other hand, we have social, cultural, and psychological questions about how, or why, we’re willing to change or reduce our image of ourselves in order to fit our hopes for machines.
Jaron Lanier, a pioneer of Artificial Intelligence applications like Virtual Reality and best-selling author of You Are Not a Gadget and Who Owns the Future?, calls this a "dumbing down" of ourselves. In our quest to find a machine that is like us, we’re willing to set the bar lower and lower for what counts as human intelligence. This seems like a bad strategy. Artificial Intelligence — whether it’s possible at all, in the end, or not — is a high bar. We’re on the right track when we acknowledge this.
Founder and CEO of a software company in Austin, Texas, Erik Larson has been a Research Scientist Associate at the IC2 Institute, University of Texas at Austin, where he received his PhD in 2009, focusing on computational linguistics, computer science, and analytic philosophy. He now resides in Seattle.