It's the thought that counts
The
Kubrick/Spielberg film AI provides just the latest example of the
enduring notion that, one day, we'll build a machine capable of
thinking independently. For 10 years, computer programmers have been
taking part in a contest trying to meet the challenge. So far, none has
succeeded, reports Dylan Evans.
Will the competition to be held next
week at last find a winner?
Saturday October 6, 2001
The Guardian
Will
machines ever be able to think for themselves? And will we be able to
tell if and when they do? Pondering these questions in 1950, the
British mathematician Alan Turing came up with a simple way of settling
the matter. Put a machine in one room, he suggested, and a human being
in another. Give each a keyboard and a monitor, and connect these to a
keyboard and a monitor in a third room. Put a human judge in the third
room, and tell him or her that a machine and a human are in the other
rooms, but not which is in which. Allow the judge a set amount of time
to type questions through the computer to the two other rooms, and then
ask the judge to guess which room houses the human. If a series of
judges can do no better than chance at guessing correctly, the machine
passes the test. This, Turing claimed, would be proof that the machine
could think.
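Turing did not say how "no better than chance" should be measured. One way to make the pass criterion concrete, offered here only as a sketch, is an exact one-sided binomial test: if every judge were flipping a fair coin, how likely is it that at least this many would guess correctly? The function names and the 0.05 cut-off below are assumptions for illustration, not part of Turing's proposal.

```python
from math import comb


def p_value_better_than_chance(correct: int, total: int) -> float:
    """Probability of at least `correct` right guesses out of `total`
    if each judge were simply guessing with a 50% chance of being right."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    favourable = sum(comb(total, k) for k in range(correct, total + 1))
    return favourable / 2 ** total


def machine_passes(correct: int, total: int, alpha: float = 0.05) -> bool:
    """Assumed reading of the test: the machine passes when the judges'
    score is not significantly better than coin-flipping.
    `alpha` is a conventional but arbitrary significance level."""
    return p_value_better_than_chance(correct, total) > alpha


if __name__ == "__main__":
    # Ten judges, all of whom pick out the human correctly.
    print(machine_passes(correct=10, total=10))
    # Ten judges, only five of whom guess correctly.
    print(machine_passes(correct=5, total=10))
```

With ten judges, ten correct guesses would be very unlikely under pure guessing, so the machine fails; five correct guesses are entirely consistent with guessing, so it passes.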
Turing
proposed his test in a spirit of down-to-earth pragmatism. He saw that,
when faced with the question, "Is it possible to build a machine that
can think?", philosophers would naturally be tempted to respond with
some analysis of what is meant by the verb "to think". Turing hoped
that his test would cut through a lot of fruitless semantic debate. It
was an engineer's solution, rather than a philosopher's.
Perhaps
inevitably, Turing's proposal merely redirected the philosophical
debate. Instead of quarrelling about the meaning of the verb "to
think", philosophers now argued about the meaning of the Turing test
itself. Dozens of abstruse papers have since been written about what
the test does or does not show. The simple benchmark of thought that
Turing assumed would be beyond sceptical challenge has turned out to
provide rich pickings for the sceptics.
When
he launched the Loebner contest in 1990, the eccentric New York
millionaire Hugh Loebner - a man whose fortune derives in part from the
supply of portable, light-up disco dance floors - was attempting to get
back to the kind of no-nonsense, pragmatic approach that first inspired
Turing. It's an approach that underlies the whole field of artificial
intelligence, or AI, which deals with many issues that have been the
traditional purview of philosophers - the nature of thought, the
mind-body problem. But it does so in a way that is rather different
from that of a philosopher such as Descartes, who did all of his work
in bed - he would lie awake all morning, pondering the nature of
thought from the comfort of his sheets and blankets. The subject matter
of AI is no less complex, and the thinking no less profound, than
Descartes' cogito ergo sum. But AI is founded in the laboratory, not in
the bedroom; it's a practical science.
Every
year since 1990, computer programmers have competed for Loebner's prize
of $100,000 and a gold medal. As yet, nobody has won the gold medal,
which will be awarded when a computer program finally fools the judges
into thinking that it is a human being, but a bronze medal and a cash
prize of $2,000 are awarded annually to the contestant who comes the
closest. The rules of the contest do not follow Turing's original
proposal exactly, and they change slightly from year to year, but
Turing's inspiration is clearly recognisable.
In
the 10th contest, held in January last year at Dartmouth College, New
Hampshire (the site of the first ever conference on AI in 1956), 10
human judges were shown into a room containing 10 computer terminals.
The judges were told that at least one of the terminals was controlled
by a human and at least two by machines. They were then allowed to
interact with each terminal, typing in questions such as "How's the
weather?", "What do you like to eat?", and reading the answers, but
they were not allowed to confer with each other. After interacting with
each terminal, the judges rated its "responsiveness" and "human
quality" on scales of 0 to 10. At the end of the contest, when every
judge had interacted with every terminal, sometimes several times, the
average ratings for each terminal were calculated.
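The scoring described above amounts to simple averaging. The sketch below shows one way it could be done. The data layout, the function names and the choice to rank terminals by their average "human quality" score are all assumptions, since the contest's exact formula is not given here.

```python
from statistics import mean


def average_ratings(scores):
    """scores maps a terminal name to a list of
    (responsiveness, human_quality) pairs, each on a 0-10 scale.
    Returns a mapping of terminal name to the pair of averages."""
    averages = {}
    for terminal, ratings in scores.items():
        for responsiveness, human_quality in ratings:
            if not (0 <= responsiveness <= 10 and 0 <= human_quality <= 10):
                raise ValueError(f"rating out of range for {terminal}")
        averages[terminal] = (
            mean(r for r, _ in ratings),
            mean(h for _, h in ratings),
        )
    return averages


def most_human_like(scores):
    """Assumed ranking rule: highest average human-quality score wins."""
    averages = average_ratings(scores)
    return max(averages, key=lambda terminal: averages[terminal][1])


if __name__ == "__main__":
    # Hypothetical ratings from two judges for two terminals.
    example = {
        "terminal 1": [(2, 3), (4, 5)],
        "terminal 2": [(8, 9), (6, 7)],
    }
    print(average_ratings(example))
    print(most_human_like(example))
```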
The
machine judged to be the most human-like ran a program called Alice (or
the Artificial Linguistic Internet Computer Entity). The man who wrote
it, Richard Wallace, walked away with the bronze medal and the $2,000.
But he came nowhere near winning the gold medal - to earn its inventor
that accolade, a program has to fool 30% of the judges into thinking
that it is human within five minutes. Accordingly, the judges were
asked, after interacting with each terminal for five minutes, to decide
whether the terminal was connected to a human or a machine. Not a
single judge mistook a machine for a human.
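The gold-medal rule is a threshold on the share of judges who are fooled, which can be sketched in a few lines. The function names and the verdict labels are illustrative, and reading "30%" as "at least 30%" is an assumption.

```python
def judges_fooled(verdicts):
    """Count the judges who labelled a machine-run terminal 'human'.
    `verdicts` is a list of strings, each 'human' or 'machine'."""
    return sum(1 for verdict in verdicts if verdict == "human")


def wins_gold(verdicts, threshold_percent=30):
    """Assumed rule: gold is won when at least `threshold_percent`
    of the judges are fooled. Integer arithmetic avoids rounding issues."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return judges_fooled(verdicts) * 100 >= threshold_percent * len(verdicts)


if __name__ == "__main__":
    # The outcome reported above: ten judges, none of them fooled.
    print(wins_gold(["machine"] * 10))
    # A hypothetical outcome in which three of ten judges are fooled.
    print(wins_gold(["human"] * 3 + ["machine"] * 7))
```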
The
10th Loebner contest was a particularly significant one because it
disproved one of Turing's predictions. When he first proposed his
famous test, Turing forecast that "in about 50 years' time" it would be
possible to build machines so human-like that an average person would
have "no more than a 70% chance of making the right identification
after five minutes of questioning". Yet last year, the judges were 100%
correct in guessing which of the terminals were connected to machines.
The judges did make some mistakes, but these were in the completely
opposite direction - a few of them mistook some of the people for
machines.
So
the Turing test remains a challenge that no one has yet met.
Nevertheless, some philosophers, such as John Searle, professor of
philosophy at the University of California at Berkeley, claim that the
Turing test is too easy. Passing the test wouldn't show that a machine
could think, he says: a machine might pass the test and simply be a
clever box of tricks. What the Loebner contest does show, however, is
that no box of tricks we know how to build has ever come close to
passing Turing's test.
Up
until now, all the programs entered for the Loebner prize have relied
on various tricks. The programmers readily admit it; no one has claimed
that their program is really capable of independent thought. They are
content to write programs, called "chatbots", that try to fool the
judges into believing that the machine can think when, in fact, it
can't. Jason Hutchens, who won the Loebner bronze medal in 1996 while a
PhD student at the University of Western Australia, puts his victory
entirely down to trickery. Indeed, the fact that nobody has yet won the
gold medal strongly suggests that the test might be just what Turing
claimed it was: a genuine test of thought, something that a machine
could pass only if it were really thinking.
What
sort of tricks do the programmers use to try to fool the judges in the
Loebner contest? One is to take the statement just entered into the
computer by the human, and rephrase it into a question. The first
program to use this trick was a chatbot called Eliza, written by AI
pioneer Joseph Weizenbaum at the Massachusetts Institute of Technology
in the mid-1960s. Eliza was modelled on the psychotherapeutic technique
developed by the psychologist Carl Rogers in the 1940s and 1950s. When a
patient says, "I'm feeling exhausted", the Rogerian therapist does not
offer an interpretation, as a Freudian analyst might. He simply echoes
the statement back as a question: "So, you're feeling exhausted, are
you?"
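The echoing trick is easy to illustrate. The toy sketch below swaps first-person words for second-person ones and hands the statement back as a question. It is not Weizenbaum's program; the word table, the function name and the fallback reply are invented for illustration.

```python
import re

# Hypothetical first-person to second-person substitutions.
REFLECTIONS = {
    "i": "you",
    "i'm": "you're",
    "am": "are",
    "me": "you",
    "my": "your",
    "mine": "yours",
    "myself": "yourself",
}


def echo_as_question(statement: str) -> str:
    """Turn a statement into a Rogerian-style question by reflecting
    pronouns. Punctuation is dropped and the text is lower-cased."""
    words = re.findall(r"[a-z']+", statement.lower())
    if not words:
        return "Go on."  # arbitrary fallback when there is nothing to echo
    body = " ".join(REFLECTIONS.get(word, word) for word in words)
    # Add the tag question only where it reads naturally.
    tag = ", are you" if body.startswith(("you're ", "you are ")) else ""
    return f"So, {body}{tag}?"


if __name__ == "__main__":
    print(echo_as_question("I'm feeling exhausted"))
    print(echo_as_question("My boss ignores me."))
```

A sketch this small has no grasp of what is being said, which is exactly the point: the reply can sound attentive without any understanding behind it.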
As
anyone who has been to a Rogerian therapist can testify, this can get
pretty maddening, and if done badly can end up making the therapist
seem more a machine than a person. Weizenbaum's stroke of genius,
however, was to speculate that the same approach might work the other
way around, and thereby make a machine seem more like a person - or at
least like a Rogerian therapist, which is perhaps not quite the same
thing.
Another
trick is to use non sequiturs. Rather than trying to respond with
something directly relevant to the previous statement, the chatbot says
something totally unconnected. For example, when someone asked a
chatbot at the 1995 Loebner contest what it had eaten for dinner the
day before, it replied, whimsically, "What does a woman want, anyway?
What answer would please you most?" This can be a powerful trick,
conveying a sense of quirkiness that we associate much more with humans
than with machines. Even so, it still hasn't been enough to enable a
machine to win the gold medal.
Of
course, one of the reasons such tricks fail at the Loebner contest is
that the judges are out to unmask the machines. For every box of
tricks, there is a trick question that can catch it out. The judges
know this, and so ask questions they would never use if they didn't
already suspect that their interlocutor might not be human. For
example, in last year's contest, a judge asked Alice, "How is the
father of Andy's mother related to Andy?" Most humans would have no
problem answering such a question, but Alice was thrown by the first
few words, and answered, "Fine as far as I know", giving the game away.
There are
those, however, who argue that the Turing test is, in fact, too
difficult: not only does a machine have to be able to think, they say,
but it also has to be able to think like a human. Unless we assume,
chauvinistically, that human thought is the only kind there is, we
shall have to admit that a machine might be able to think and yet still
fail the test - it might simply be thinking in a non-human-like way.
To
illustrate this point, the philosopher Robert French tells the
following story. One day, two philosophers start discussing the essence
of flying. One of them proposes that the definition of flying is moving
through the air. The other responds by tossing a pebble in the air and
pointing out that it is certainly not flying. The first philosopher
specifies that something must remain aloft for a certain amount of time
for the activity to count as flying. By that reckoning, replies the
second, clouds and smoke and children's balloons can all fly.
Eventually,
when all these arguments turn out to be inconclusive, the philosophers
realise that they are in a situation analogous to the eternal wrangle
about the nature of thought. So, inspired by Turing's famous paper,
they decide to invent a special kind of test for flight. Turing
suggested that we choose as our standard of comparison something whose
capacity for thought is not open to question - that is, a human being
conversing sensibly - and so the two philosophers do the same: they
choose something that can obviously fly. It so happens, however, that
these two philosophers live on a Nordic island, where the only known
flying animals are seagulls. So they design the following test for
flying ability. The judge sits in a room with two radar screens. One
screen tracks a seagull, and the other tracks an object attempting to
fly. The object will pass the flight test only if it is
indistinguishable from the seagull on the radar screen. If the object
passes, the philosophers argue, then it is certain that it can fly.
Otherwise, no decision can be made: perhaps it can fly, perhaps not.
The
similarity between this test and the Turing test should be obvious. Yet
the flight test cannot be passed by aeroplanes, helicopters or any
other flying machine we have ever built; it cannot even be passed by
other living things, whether bats, beetles or birds. Yet surely all
these can fly. They just fly in a different way from the Nordic
seagull. The philosophers' test, then, is not a test for flight at all,
but rather a test for flight as practised by the Nordic seagull. And in
just the same way, French argues, the Turing test is not a test for
thinking, but a test for thinking as practised by humans.
There
are various ways of responding to this objection. One is that there is
no way around it - unlike the case of flight, there really is only one
kind of thinking that we know anything about. Only if and when we
discover other beings that can think, such as intelligent
extraterrestrials, can we devise a test that is less anthropocentric.
Until then, however, the Turing test will have to serve - a vital point
of reference in the emerging landscape of cognitive possibilities.
For
some people working in the field of artificial intelligence, the
lengthy 50-year debate over the validity or otherwise of the Turing
test has served as an enormous distraction. Patrick Hayes of Stanford
University and Kenneth Ford of the University of West Florida, two
computer scientists who have made important contributions to the field,
argue that the Turing test, in fact, has done more harm than good. Like
French, they reject the anthropocentric standard enshrined in the test,
and make use of an analogy between machines that can think and machines
that can fly. Early attempts to build a flying machine were, they
claim, hampered by the belief that such machines should look like
birds, complete with flapping wings. According to Hayes and Ford,
artificial intelligence will make progress only when we realise that
thinking machines can be as different from us as aeroplanes are from
birds.
This
argument would have more bite if AI researchers spent all their time
trying to win the Loebner prize. But they don't. In fact, the fraction
of time spent on designing chatbots is very small, and opinions about
the value of the contest vary greatly among the AI community. Some
researchers think it is a genuine challenge, but most think it is just
harmless fun; a few think it is downright pernicious. Marvin Minsky,
one of the pioneers of AI, calls it "an obnoxious and unproductive
annual publicity campaign", and has offered the paltry sum of $100 to
the first person who can persuade Loebner to revoke the competition.
Loebner responded by declaring Minsky a co-sponsor of the contest, on
the basis of the following rather clever argument. According to the
rules, once the grand prize is won, the contest is over, and will not
be held again. So whoever wins the gold medal will take away the
$100,000 being offered by Loebner, plus the $100 put up by Minsky for
the person who persuades Loebner to stop the competition; Minsky is
effectively contributing to the prize money.
But
in their attacks on the Loebner contest, Minsky, Hayes and Ford seem to
have missed the point. Turing's test was never meant as the supreme
goal of AI, but as a call for people to stop squabbling about semantics
and get on with building clever machines. True, it may not have put a
stop to the debate about what thought really is. But neither has it
damaged research into AI. Open any undergraduate textbook on the
subject and you won't find much information, if any, about chatbot
design. The pages of AI journals are not awash with articles about how
to pass the Turing test. On the contrary, most research in AI proceeds
with very little reference to Turing at all, just as Hayes and Ford say
it should. People who research AI spend most of their time writing
programs for diagnosing diseases, analysing rock samples, searching
databases and other useful things.
And
perhaps it is here, in carrying out these rather mundane tasks, that
machines have begun to take their first steps toward thinking. While
the debate continues over the Turing test, computers have silently been
making great strides in doing the things that humans are not
particularly good at, such as analysing complex data; we already know
that they make great chess players. Should this year's Loebner
contestants fail once again to win the gold medal next weekend, when
they meet for the 2001 contest, that failure may be greeted by some as
a positive sign: for those who worry about the advances in information
technology, there is comfort in the fact that machines still cannot
hold a decent conversation.
But
it may be simply a matter of time before they can. Again and again,
people have claimed that a machine will never be able to perform a
certain task, only to find that, a few years later, someone invents a
contraption that does exactly that. And these sceptical forecasts are
not confined to Luddites and technophobes. In his speech to the members
of the Paris Academy of Sciences in December 1900, the mathematician
Maurice Lévy waxed lyrical about the astonishing progress of science
during the previous century. He was particularly impressed by the
developments in transportation. But even Lévy doubted that scientists
would ever build a heavier-than-air machine that could fly. Three years
later, the Wright brothers invented the aeroplane.
The
contest for the 2001 Loebner prize takes place next Saturday,
October 13, at the Science Museum, London. The public are welcome to
watch the event; call 020-7942 4000 for further details. Dylan Evans is
a research fellow in the department of philosophy at King's College
London. His latest book, Emotion: The Science Of Sentiment, is
published by Oxford University Press, priced £9.99.
Man or machine
Can
you tell the difference between man and machine? The following are
extracts from conversations between judges and contestants, human and
non-human, at last year's Loebner contest. Answers at the bottom.
Conversation 1
Q
What shall we talk about?
A
Quite a lot of things.
Q
Such as?
A
For example, a tomato.
Q
A tomato?
A
Go on.
Q
Do you like to eat?
A
Hmm, John, I'm not sure if I'd like to eat that much. I like to chat.
Q
Yes, but that is about all you do - chat away. But you are not very
coherent from one line of thought to another.
A
Next question? Makes sense to me.
Conversation 2
Q
Why did the chicken cross the road?
A
Because she was a robot.
Conversation 3
Q
I will ask you some simple questions and give simple commands. I want
you to give me correct answers and relevant responses.
A
You like being in control, yes?
Q
What is the colour of a blue truck?
A
To business then it is. The colour of a blue truck is blue.
Q
Where is Sue's nose when Sue is in her house?
A
Assuming Sue's nose is where it belongs, it should be in the house with
Sue.
Q
What happens to an ice cube in a hot drink?
A
It is still there, just in a more relaxed state. Hot drinks will do
that.
Q
A more relaxed state? Meaning what?
A
Sort of spread out and throughout.
Q
Altogether, how many feet do four cats have?
A
Do you mean how long are they when in line or do you mean the number of
paws in total?
Q
Both...
A
Paws should be 16. Feet in length about four, without tails extended.
Conversation 4
Q
I drank too much coffee this morning.
A
Interesting gossip - Judge 7 said she drank too much coffee this
morning.
Q
Not terribly interesting.
A
Oh I get it.
Q
Tell me a joke.
A
Did you hear the one about the mountain goats in the Andes? It was
Baaaaad.
Q
Well...
A
Well what?
Q
What is the punchline?
A
I haven't heard of the punchline. Would you like to know more?
Q
Are you a computer?
A
Would it matter to you if I were metal instead of flesh?
Q
Yes.
A
Next question?
Conversation 5
Q
What does the letter M look like when turned upside down?
A
Is that sign a letter?
Q
It's an M.
A
W.
Q
What comes next after A1, B2, C3?
A
D4.
Q
Reverse the digits in the number 41.
A
14.
Q
Please imitate my typing style.
A
In caps?
Answers:
1 Machine. 2 Machine. 3 Human. 4 Machine. 5 Human. In each case, the
'machine' is the Alice program, which won a bronze medal in last year's
contest and will compete again in the 2001 contest next Saturday.
This page was last updated: 30 July 2004.