Thursday, February 17, 2011

On robot overlords and other things.

To those of you who watched Watson handily defeat Jeopardy! masters Ken Jennings and Brad Rutter, I say to you: PANIC. Now is the time to stock up on water and essential food supplies. If you haven't built your nuclear fallout shelter and formulated a post-apocalyptic plan yet, you should definitely get on that. It's only a matter of time before Watson decides "Skynet" is a better name and proceeds to hack the US Defense Department's computers, hijack all the world's nuclear weapons, and rain fire and death down upon us hapless meatbags. This will probably happen on December 21, 2012, just like those sneaky Mayans predicted.


Now, for those of you who care enough to have read to this point, let me start out by saying that Watson poses no more of a threat to humanity than any other computer on the planet, by the sheer virtue of Watson being no more aware of its own existence than, say, my coffee mug. Watson was not designed as a machine to replicate human intelligence (I'll explain this below), but instead represents a significant step forward in a machine's understanding of basic human language, both semantically and grammatically.


Although I don't know the exact algorithms underlying Watson's surprisingly "human" ability to understand and propose answers to complex questions (I have my guesses, but they're just that: guesses), my own doctoral research in the fields of artificial intelligence, natural language processing, topic modeling, and decision theory gives me some insight into Watson's likely inner workings. In language, even relatively simple phrases have complex logical underpinnings. The difference between "I do like apples" and "I do not like apples" may seem obvious to us, but that's because we're intuitively familiar with the inverting nature of "not" and its role as an adverb. In our brains, at a fundamental layer of consciousness, most of us in America think in English (because that's our first language). A computer thinks entirely in binary and has absolutely no idea what "not" means unless we program it in (a highly inefficient task when you start thinking about the number of different words in the English language, the subtly different ways we use them, the unpredictable nature of colloquial language, and the different meanings a single word can take on). Thus, to most "state-of-the-art" topic modeling algorithms, "I do like apples" and "I do not like apples" are really the same sentence; the only difference between them is the subtle insertion of the word "not".

The best we can currently do is have a computer learn co-occurrences of words: if we see the word "airplane", we should also expect to see words such as "runway", "airport", and "take-off", but will probably not see words that have little to do with airplanes, such as "fish" or "ophthalmology". In pre-processing large corpora, we often remove what we call "stop words" -- words that have little to no semantic meaning, such as "I", "a", "not", "do", etc. In doing so, we allow our models to better learn co-occurrences of semantic words, but we discard much of the rich syntactic and contextual nature of language.
It is for this reason that synthesized speech, while usually semantically relevant and grammatically correct, often lacks that certain je ne sais quoi that makes human speech and conversations so rich and compelling.
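To make the point concrete, here is a toy sketch (my own illustration, not anything resembling Watson's actual pipeline) of the two ideas above: after stop-word removal, "I do like apples" and "I do not like apples" collapse into the same bag of words, while a simple co-occurrence counter still learns that "airplane" goes with "runway" but not with "fish". The stop-word list and the tiny corpus are hand-picked assumptions for the example.

```python
from collections import Counter
from itertools import combinations

# A tiny hand-picked stop-word list (real lists run to hundreds of words).
STOP_WORDS = {"i", "a", "an", "the", "do", "not", "to", "at", "on", "before"}

def bag_of_words(sentence):
    """Lowercase, tokenize on whitespace, and drop stop words."""
    return {w for w in sentence.lower().split() if w not in STOP_WORDS}

# Negation vanishes: both sentences reduce to {'like', 'apples'}.
positive = bag_of_words("I do like apples")
negative = bag_of_words("I do not like apples")
print(positive == negative)  # True -- the model cannot tell them apart

# Co-occurrence: count which remaining words appear in the same document.
docs = [
    "the airplane taxied to the runway at the airport",
    "the airplane waited on the runway before take-off",
    "fresh fish sold at the harbor market",
]
cooc = Counter()
for doc in docs:
    for pair in combinations(sorted(bag_of_words(doc)), 2):
        cooc[pair] += 1

print(cooc[("airplane", "runway")])  # 2 -- strongly associated
print(cooc[("airplane", "fish")])    # 0 -- unrelated words rarely co-occur
```

The co-occurrence counts are exactly the kind of shallow statistical signal the paragraph describes: useful for topical association, but blind to syntax and negation.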


Although recent (and much ongoing -- including, as a shameless plug, my own) research has increased a machine's understanding of grammatical and syntactic structure at the phrase, sentence, and paragraph levels (natural language, if you will), we are still quite a long way from approaching human levels of linguistic intuition. We are limited on two main fronts. The first is computing power -- most of these cutting-edge algorithms introduce complex statistical dependencies resulting in high-dimensional solution spaces, where the task of finding optimized solutions is, if not impossible, then often intractable given the omnipresent constraints of time and computational power. The second issue is the highly linearized approximations we use in most cognitive computing models. The simple truth is that the human brain's power comes not from raw computing power (we actually possess very little raw computing power; try adding large numbers quickly in your head, as an example), but from its highly nonlinear mode of operation**. It is through this remarkable nonlinearity that humans (and possibly other higher-order organisms such as whales) are able to link seemingly disparate ideas and concepts in the synthesis of new thought and knowledge.
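The limits of "highly linearized" models can be seen in a classic miniature: no single linear decision rule can compute XOR, no matter how you tune its weights. The sketch below (my own illustration; the coarse weight grid is an assumption, though the non-separability holds for all real weights by the standard geometric argument) searches exhaustively and comes up empty.

```python
# A single linear unit: predict 1 when w1*x1 + w2*x2 + b > 0, else 0.
def linear_rule(w1, w2, b, x1, x2):
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# XOR truth table: the target a linear rule should reproduce.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates_xor(w1, w2, b):
    """True if this weight setting matches XOR on all four inputs."""
    return all(linear_rule(w1, w2, b, x1, x2) == y for (x1, x2), y in XOR.items())

# Scan weights and bias over a coarse grid from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
found = any(separates_xor(w1, w2, b) for w1 in grid for w2 in grid for b in grid)
print(found)  # False: no linear separator exists for XOR
```

Linking "seemingly disparate ideas" is, loosely speaking, an XOR-like problem: the answer depends on a nonlinear combination of inputs, which is exactly what a stack of linear approximations struggles to capture.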


Watson solves the first issue through brute force, employing 32 quad-core 3.5 GHz processors and some 16 terabytes of RAM. However, it confidently arrived at glaringly wrong answers on several occasions (e.g. the "finis" vs. "terminus" answer) when an educated human would probably not have made the same mistake. If you go back and re-watch the three episodes, you'll notice that Watson tended to fail more when asked to synthesize multiple thoughts together (it tended to answer one part of the question fully, but not the entire question), whereas the human competitors' mistakes were more of the "intuitively wrong" category; that is, they synthesized together the wrong ideas. While I believe IBM has made enormous strides in computational natural language processing, it's clear that challenges still remain in terms of artificial cognition and intuition. For this reason, I believe Watson and IBM have not solved the second issue (though certainly they've made great strides in this direction as well).
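Watson's publicly described strategy was to buzz in only when its confidence in the top answer cleared a threshold, which is why its wrong answers were delivered so "confidently". Here is a hedged sketch of that idea; the scoring function, the threshold value, and the candidate scores are all my own illustrative assumptions, not IBM's actual internals.

```python
def combined_confidence(evidence_scores):
    """Collapse several independent evidence scores into one value in [0, 1].

    A simple average stands in for Watson's far more elaborate
    machine-learned score combination (an assumption for this sketch).
    """
    return sum(evidence_scores) / len(evidence_scores)

def decide_to_buzz(candidate_answers, threshold=0.5):
    """Return (answer, confidence) if the best candidate clears the
    threshold, else (None, confidence) -- i.e., stay silent."""
    best, best_conf = None, 0.0
    for answer, scores in candidate_answers.items():
        conf = combined_confidence(scores)
        if conf > best_conf:
            best, best_conf = answer, conf
    return (best, best_conf) if best_conf >= threshold else (None, best_conf)

# Hypothetical candidates, each with made-up per-source evidence scores:
answer, conf = decide_to_buzz({"terminus": [0.9, 0.7, 0.8],
                               "finis": [0.4, 0.5, 0.3]})
print(answer, round(conf, 2))  # buzzes with 'terminus' at 0.8
```

The failure mode in the episodes fits this picture: when the evidence scores themselves are systematically wrong (say, because only half the clue was understood), the confidence gate happily waves a bad answer through.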


I don't say any of this to disparage IBM's efforts. Watson is a monumental achievement in computing, and represents a milestone in our understanding of language modeling and synthesis. We should be proud that our species, newcomers on the universal stage, with our cosmologically infinitesimal life spans and our biological feebleness, clinging to life in a thin film on the surface of a smallish iron lump in the outskirts of an average spiral galaxy, possesses what Carl Sagan once called a "great soaring passionate intelligence." We should look to markers of progress like Watson with pride, and with hope that we hold the tools for understanding ourselves and our universe, and for empowering humanity's future.


Of course, I could be wrong. We might already have the tools to develop highly human-like synthetic intelligence. This would explain Justin Bieber.


**To see an example of this nonlinearity for yourself, consider taking a road trip and how we often perceive the trip home to be shorter than the way there, despite the total passage of time in both directions being almost the same. The very organic (human?) experience of emotions, expectations, and anticipation of the unknown/returning to the familiar warps how we perceive our realities. In fact, each of our "realities" is unique, and is actually a synthesis of what our senses report and what we individually believe we sense. Our cognitive and computational abilities, not to mention our noisy, unreliable, and low-bandwidth nervous systems, are too limited to process more than a small sample of the world at any moment. It is up to our brains to fill in the rest. It looks like Aldous Huxley was right: each man truly is an island.


Computers, as they currently stand, are utterly incapable of being "fooled" in such fashion (at least not until you get down to the quantum-mechanical level, travel at an appreciable fraction of light speed, or approach a large star or a black hole). Unfortunately, it is this precise "preciseness" of the transistor-borne computer that makes replicating human intuition and emotion so difficult, but we are making progress.
