Google TechTalks July 26, 2006
Tasks like image recognition are trivial for humans, but continue to challenge even the most sophisticated computer programs. This talk introduces a paradigm for utilizing human processing power to solve problems that computers cannot yet solve. Traditional approaches to solving such problems focus on improving software. I advocate a novel approach: constructively channel human brainpower using computer games. For example, the ESP Game, described in this talk, is an enjoyable online game — many people play over 40 hours a week — and when people play, they help label images on the Web with descriptive keywords. These keywords can be used to significantly improve the accuracy of image search. People play the game not because they want to help, but because they enjoy it.
I describe other examples of “games with a purpose”: Peekaboom, which helps determine the location of objects in images, and Verbosity, which collects common-sense knowledge. I also explain a general approach for constructing games with a purpose.
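The ESP Game's core mechanic is simple enough to sketch in code: two randomly paired players see the same image and type words, and the first word both produce becomes a label. Here is a minimal Python sketch of that agreement loop; the function and variable names are mine, not from the talk, and the taboo list models the game's rule that words already attached to an image are off limits:

```python
# Minimal sketch of the ESP Game's agreement mechanic (names are mine,
# not from the talk): two players independently type guesses for the
# same image; the first word they agree on becomes a label.

def esp_round(guesses_a, guesses_b, taboo=frozenset()):
    """Return the first word both players produce, or None.

    guesses_a / guesses_b are each player's guesses in the order typed;
    taboo holds words the image already carries as labels, which the
    game disallows so that players generate new descriptions.
    """
    seen_a, seen_b = set(), set()
    # Interleave the two streams to mimic real-time play.
    for a, b in zip(guesses_a, guesses_b):
        for word, seen_mine, seen_theirs in ((a, seen_a, seen_b),
                                             (b, seen_b, seen_a)):
            word = word.strip().lower()
            if word in taboo:
                continue
            if word in seen_theirs:   # the partner already typed it: agreement
                return word
            seen_mine.add(word)
    return None                       # round ended with no match

label = esp_round(["dog", "grass", "puppy"],
                  ["animal", "puppy", "cute"],
                  taboo={"dog"})
print(label)  # -> "puppy"; it can now be attached to the image as a keyword
```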
Here are some links to the games mentioned, and to others that work on the same principles:
ESP
Peekaboom
Google Image Labeler
Phetch (multiplayer game)
Verbosity (Apparently the server doesn’t work)
A few more links:
O’Reilly’s short write-up on this lecture
All in all, a fascinating talk on a trend that I believe is going to be one of the most important any of us will face. As the symbiosis between humans and computers becomes deeper, and at a larger scale, we’re going to see problems that were formerly construed as “hard AI” suddenly broken, not because computers themselves have become intelligent, but because humans and computers have gotten better at working together. We’re only at the early stages of harnessing collective intelligence, and we’re going to see more and more breakthroughs as creative computer scientists find new areas that they can tackle with bionic software.
NYT article from a few days ago discussing Google Image Labeler, Amazon’s Mechanical Turk, and similar attempts to augment collective intelligence through the collaboration of both biological and machine intelligence.
In the 1950s William Ross Ashby, a British psychiatrist and cyberneticist, anticipated something like this merger when he wrote about intelligence amplification — human thinking leveraged by machines. But it is both kinds of intelligence, biological and electronic, that are being amplified. Unlike the grinning cyborgs envisioned by science fiction, the splicing is not between hardware and wetware but between software running on two different platforms.
And to prove that this has more practical applications than just attaching metadata to images, von Ahn recently started the reCAPTCHA program, which uses captchas to aid OCR programs as they digitize text.
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs flag the words they cannot read with confidence.
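To make that flow concrete, here is a rough sketch of the pipeline as described: OCR flags a word it cannot read, the word image is served as a CAPTCHA, and repeated human answers are aggregated into a transcription. The class and function names, and the three-vote agreement threshold, are my own assumptions, not details from the project:

```python
from collections import Counter

# Rough sketch of the reCAPTCHA digitization loop described above.
# All names and the agreement threshold are illustrative assumptions;
# the real system's internals aren't documented in the talk.

class UnreadableWord:
    def __init__(self, word_image_id):
        self.word_image_id = word_image_id   # scanned image of one word
        self.answers = Counter()             # human transcriptions so far

    def record_answer(self, text):
        self.answers[text.strip().lower()] += 1

    def consensus(self, min_votes=3):
        """Accept a transcription once enough humans agree on it."""
        if not self.answers:
            return None
        best, votes = self.answers.most_common(1)[0]
        return best if votes >= min_votes else None

# The OCR engine flags a word it cannot read with confidence...
word = UnreadableWord(word_image_id="scan-0042/word-17")  # hypothetical id
# ...the word image is served as a CAPTCHA, and humans type what they see:
for answer in ["mischief", "mischief", "nischief", "mischief"]:
    word.record_answer(answer)
print(word.consensus())  # -> "mischief"
```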
I find this last application quite ingenious, although I think it suggests that the focus on developing games to attract people’s interest is a little naive and short-sighted. The best way to get people to solve these kinds of computational tasks is to embed them as necessary steps in work they already want to complete, rather than trying to attract their undivided attention in the context of a game. That doesn’t mean the tasks have to be annoying, but taking the time to develop a game is just a way of artificially generating motivation and interest; humans are already motivated to accomplish lots of other tasks, and there should be a way to tap into that existing motivation. In other words, we shouldn’t be afraid to let our machines use us if we are trying to develop a symbiotic relationship.
Read my analysis below.
I am extremely interested in the idea of a captcha as a way of distinguishing between humans and machines. In the link above, O’Reilly makes the analogy to the Turing Test explicit:
Most of this was familiar material, although he has a great definition of a Captcha: “A program that can generate and grade tests that most humans can pass, but that current computer programs cannot pass.” (This is an interesting variation on the Turing test, in which humans generate and grade tests that most humans can pass, but current computer programs cannot pass. Is there another variation in the future, in which computers generate and grade tests that computers can pass, but humans cannot pass?)
The idea of humans and machines testing each other to identify affiliation is interesting, and hopefully it comes up in discussion (though I think Turing would have found this implication appalling).
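That definition is essentially an interface: a program that can generate a test, store the expected answer, and grade a response, with the solving left to whoever is on the other end. A deliberately trivial Python sketch of that contract follows; everything here is my own illustration, and the “distortion” step is a placeholder for the hard part, rendering the string as an image humans can read but current OCR cannot:

```python
import random
import secrets
import string

# Toy sketch of the generate-and-grade contract in the definition above.
# The "distortion" is a trivial placeholder; a real CAPTCHA renders the
# answer as a visually distorted image that current OCR can't read.

class Captcha:
    def __init__(self):
        self._pending = {}   # challenge id -> expected answer

    def generate(self):
        answer = "".join(random.choices(string.ascii_lowercase, k=6))
        challenge_id = secrets.token_hex(8)
        self._pending[challenge_id] = answer
        # Placeholder for the hard part: distorting `answer` into an
        # image that humans can read but machines cannot.
        rendered = f"<distorted image of '{answer}'>"
        return challenge_id, rendered

    def grade(self, challenge_id, response):
        expected = self._pending.pop(challenge_id, None)
        return expected is not None and response == expected

captcha = Captcha()
cid, image = captcha.generate()
# A human reads `image` and types the word back:
print(captcha.grade(cid, "wrong-guess"))  # -> False
```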
But I want to approach the issue from a different angle. As our OCR software gets better at reading distorted text, the captcha programs we use will also need to become more sophisticated to distinguish between humans and machines. Ideally, our machines would eventually be able to pass any such test; in other words, there is no such thing as a perfect captcha. But this thread is not about ‘true AI’ or anything like that; it is about the way human cognitive abilities contrast with and complement those of computers. So let’s not speculate too wildly: right now captchas are pretty good at distinguishing humans from machines, and some variation on the same test will likely work for the foreseeable future.
But this seems, at first blush, to be disanalogous to the original Turing test. Turing’s imitation game supposedly has a stable standard for testing the intelligence of machines: that they can converse so fluently as to be indistinguishable from a human interlocutor. The intuition behind Turing’s test is that we know what it is like to talk to another person, and if machines can talk to us in the same way, then they count as intelligent. The test is supposed to be stable: we don’t have to adjust it as our machines get more sophisticated; if they meet this criterion, they pass. The captcha, in contrast, isn’t stable at all. It must be constantly updated to make sure we correctly distinguish between humans and increasingly sophisticated machines. In other words, if a machine passes a captcha, then something about the test has gone wrong and the test needs to be changed.
I take this disanalogy to be deeply revealing about our curious relationship to the machines we build. The Turing Test is stable because we all have a pretty good idea of how people behave, and that behavior hasn’t changed very much in the last few hundred thousand years or so. In other words, we have evolved deep-seated intuitions and theories of mind that we use to explain and understand the behavior of other people, and these evolved cognitive models are stable, reliable, and almost universally applicable; even aberrant cases give us very little reason to modify our naive theories of psychology.
Computing machinery, on the other hand, is a very recent phenomenon, and only within the last few decades have people interacted daily with machines that can produce intelligent output. In other words, we have had very little time to settle our intuitions about how machines do and should behave. Furthermore, there are very few universal protocols that determine how such interactions work or that allow us to generalize to novel cases. Nearly every new gadget you buy requires that you learn a brand-new interface and develop new intuitions about how the machine will behave under various circumstances. For our older and more commonplace machines (telephones, cars) this is less true, but even those machines get upgraded to almost entirely unfamiliar designs.
In other words, I want to suggest that the instability of the captcha doesn’t reflect anything in particular about the abilities of the machine so much as it reflects our own unstable intuitions about how to classify and categorize machines. The captcha is a temporary solution that probably won’t last long; as more trust and weight are placed on these tests, we also increase the rewards for breaking them.