Thursday, February 17, 2011

On robot overlords and other things.

To those of you who watched Watson handily defeat two Jeopardy! masters, Ken Jennings and Brad Rutter, I say to you: PANIC. Now is the time to stock up on water and essential food supplies. If you haven't built your nuclear fallout shelter and formulated a post-apocalyptic plan yet, you should definitely get on that. It's only a matter of time before Watson decides "Skynet" is a better name and proceeds to hack the US Defense Department's computers, hijack all the world's nuclear weapons, and rain fire and death down upon us hapless meatbags. This will probably happen on December 21, 2012, just like those sneaky Mayans predicted.


Now, for those of you who care enough to have read to this point, let me start by saying that Watson poses no more of a threat to humanity than any other computer on the planet, for the simple reason that Watson is no more aware of its own existence than, say, my coffee mug. Watson was not designed as a machine to replicate human intelligence (I'll explain this below); instead, it represents a significant step forward in a machine's understanding of basic human language, both semantically and grammatically.


Although I don't know the exact algorithms underlying Watson's surprisingly "human" ability to understand and propose answers to complex questions (I have my guesses, but they're just that: guesses), my own doctoral research in the fields of artificial intelligence, natural language processing, topic modeling, and decision theory gives me some basis for speculating about Watson's inner workings. In language, even relatively simple phrases have complex logical underpinnings. The difference between "I do like apples" and "I do not like apples" may seem obvious to us, but that's because we're intuitively familiar with the inverting nature of "not" and its role as an adverb. In our brains, at a fundamental layer of consciousness, most of us in America think in English (because that's our first language). A computer thinks entirely in binary and has absolutely no idea what "not" means unless we program it in (a highly inefficient task when you start thinking about the number of different words in the English language, the subtly different ways we use them, the unpredictable nature of colloquial language, and the different meanings a single word can take on). Thus, to most "state-of-the-art" topic modeling algorithms, "I do like apples" and "I do not like apples" are really the same sentence; the only difference between them is the subtle insertion of the word "not".

The best we can currently do is have a computer learn co-occurrences of words: if we see the word "airplane", we should also expect to see words such as "runway", "airport", and "take-off", but we will probably not see words that have little to do with airplanes, such as "fish" or "ophthalmology". In pre-processing large corpora, we often remove what we call "stop words", words that carry little to no semantic meaning, such as "I", "a", "not", and "do". In doing so, we allow our models to better learn the co-occurrence of semantic words, but we discard much of the rich syntactic and contextual nature of language. It is for this reason that synthesized speech, while usually semantically relevant and grammatically correct, often lacks that certain je ne sais quoi that makes human speech and conversation so rich and compelling.
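
To make the stop-word point concrete, here's a minimal sketch in plain Python (the tiny stop-word list is my own illustrative choice, not any standard list) showing a bag-of-words pipeline collapsing the two apple sentences into the same thing:

```python
# Illustrative subset of a stop-word list; real lists are much longer.
STOP_WORDS = {"i", "a", "an", "the", "do", "not", "to"}

def bag_of_words(sentence):
    """Lowercase, split on whitespace, and discard stop words."""
    return {t for t in sentence.lower().split() if t not in STOP_WORDS}

s1 = bag_of_words("I do like apples")      # {'like', 'apples'}
s2 = bag_of_words("I do not like apples")  # {'like', 'apples'}
print(s1 == s2)  # True -- the negation vanished before modeling even began
```

Any topic model downstream of this pre-processing step literally cannot distinguish the two sentences, no matter how sophisticated it is.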


Although recent (and much ongoing -- including, as a shameless plug, my own) research has improved machines' understanding of grammatical and syntactic structure at the phrase, sentence, and paragraph level (natural language, if you will), we are still quite a long way from approaching human levels of linguistic intuition. We are limited on two main fronts. The first is computing power -- most of these cutting-edge algorithms introduce complex statistical dependencies, resulting in high-dimensional solution spaces where the task of finding optimized solutions is, if not impossible, then often intractable given the omnipresent constraints of time and computational power. The second issue is the highly linearized approximations we use in most cognitive computing models. The simple truth is that the human brain's power comes not from raw computing power (we actually possess very little raw computing power; try adding large numbers quickly in your head, for example), but from its highly nonlinear mode of operation**. It is through this remarkable nonlinearity that humans (and possibly other higher-order organisms such as whales) are able to link seemingly disparate ideas and concepts in the synthesis of new thought and knowledge.
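
As a toy illustration of that second limitation (my own example, not anything specific to Watson or to any particular cognitive model): no linear model, however well fit, can represent even the humble XOR function, yet adding a single nonlinear feature solves it exactly.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR of the two inputs

# Best possible linear fit (inputs plus a bias term), via least squares.
X_lin = np.hstack([X, np.ones((4, 1))])
w_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
print(X_lin @ w_lin)  # [0.5 0.5 0.5 0.5] -- no better than a coin flip

# Now add one nonlinear feature, the interaction x1 * x2.
X_nl = np.hstack([X, X[:, :1] * X[:, 1:], np.ones((4, 1))])
w_nl, *_ = np.linalg.lstsq(X_nl, y, rcond=None)
print(np.round(X_nl @ w_nl, 6))  # [0. 1. 1. 0.] -- exact
```

Stack enough of those nonlinear interactions together and you start to get something like a neural network; leave them out, and whole categories of relationships simply cannot be expressed.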


Watson solves the first issue through brute force: under the hood it's a cluster of IBM Power 750 servers totaling some 2,880 POWER7 processor cores running at 3.5 GHz, backed by around 16 terabytes of RAM. However, it confidently arrived at glaringly wrong answers on several occasions (e.g. the "finis" vs "terminus" answer) when an educated human would probably not have made the same mistake. If you go back and re-watch the three episodes, you'll notice that Watson tended to fail more when asked to synthesize multiple thoughts together (it tended to answer one part of the question fully, but not the entire question), whereas the human competitors' mistakes were more of the "intuitively wrong" category; that is, they synthesized together the wrong ideas. While I believe IBM has made enormous strides in computational natural language processing, it's clear that challenges still remain in terms of artificial cognition and intuition. For this reason, I believe Watson and IBM have not solved the second issue (though they've certainly made great strides in this direction as well).


I don't say any of this to disparage IBM's efforts. Watson is a monumental achievement in computing, and it represents a milestone in our understanding of language modeling and synthesis. We should be proud that our species, newcomers on the universal stage, with our cosmologically infinitesimal life spans and our biological feebleness, clinging to life in a thin film on the surface of a smallish iron lump in the outskirts of an average spiral galaxy, possesses what Carl Sagan once called a "great soaring passionate intelligence." We should look to markers of progress like Watson with pride, and with hope that we hold the tools for understanding ourselves and our universe, and for empowering humanity's future.


Of course, I could be wrong. We might already have the tools to develop highly human-like synthetic intelligence. This would explain Justin Bieber.


**To see an example of this nonlinearity for yourself, consider taking a road trip and how we often perceive the trip home to be shorter than the way there, despite the total passage of time in both directions being almost the same. The very organic (human?) experience of emotions, expectations, and the anticipation of the unknown or of returning to the familiar warps how we perceive our realities. In fact, each of our "realities" is unique: a synthesis of what our senses report and what we individually believe we sense. Our cognitive and computational abilities, not to mention our noisy, unreliable, and low-bandwidth nervous systems, are too limited to process more than a small sample of the world at any moment. It is up to our brains to fill in the rest. It looks like Aldous Huxley was right: each man truly is an island.


Computers, as they currently stand, are utterly incapable of being "fooled" in such fashion (at least not until you get down to the quantum mechanical level, travel at an appreciable fraction of light speed, or approach a large star or a black hole). Unfortunately, it is precisely this "preciseness" of the transistor-borne computer that makes replicating human intuition and emotion so difficult, but we are making progress.

Sunday, April 27, 2008

Towels are a good idea.

So I've finally decided to join interweb two dot oh. I've always held the belief that blogs are pointless and stupid, but lately my point of view has changed, and I'll explain why. I've had a bit of a cognitive surplus lately (kudos to Clay Shirky for that phrase; check it out here), and I've been thinking a lot about communication.

Your cellphone and laptop are a pretty powerful communications tag-team. These two pieces of electronics give you access to a wealth of communications and information options that even the CIA couldn't have dreamed of ten years ago. We can "google" anyone in the world and find information on them; we can contact people we know who are halfway around the world. For the first time in human history, our ability to communicate is not limited by the medium. And the medium of communication, of information dissemination between one person or a group of people, is what has shaped our world.

Think of our social structures: towns, states, provinces, countries. These were founded in an era when the speed of the message was limited by how fast a horse could run or a ship could sail. This one limitation determined the size of countries. It's no surprise, then, that so many European countries are roughly the same size. We drew borders at the edges of our communication envelopes, and those are the same borders that stand today. America was founded in a time when electronic communication was just around the corner, when mail could be delivered at the speed of a train. Thus, America stretches the length of the North American continent. And China, for all its ancient history, struggled to govern its vast territory as one truly centralized state until modern communication made that kind of unification practical in the 20th century.

So what am I getting at?

We're on the event horizon of something extraordinary. I predict that in the next century, the old models of government, economics, and society will either evolve so as to be unrecognizable or disappear entirely. I'm not saying that society will devolve into anarchy; far from it. What we view as the "virtual" world, the one in which World of Warcraft, Second Life, Xbox Live, online colleges, Facebook, eBay, Craigslist, blogs, and Myspace all exist, will become much more pervasive, to the point that we will no longer consider it virtual at all. A person's electronic existence will be just as important as their physical existence. Anyone who's ever been locked out of their Facebook or Myspace account has firsthand experience of how true this is already becoming.

The concept of "nations" will change in this new world. We will still draw boundaries in the electronic world. Humans love boundaries; they serve to separate and insulate. However, these new boundaries will be drawn around ideas rather than geographical locations. This difference will allow electronic nations to be far more efficient and effective than their physical counterparts. They will be more stable, as the citizens of electronic countries will share common principles and ideas. They will be more specialized and hierarchically flatter. They will practice truer democracy than anything we as a society have ever experienced. They will be smaller than current nations, and highly willing to trade with other nations that specialize in other fields. And people will be able to belong to as few or as many electronic nations as they want.

Eventually, technology will reach the point that we will be able to be plugged into the social network all the time. This is when the idea of the virtual world truly drops away; every human being then lives in a duality of existence. The important question I've been asked is "How will people keep their two lives separate?" My answer to that is simple: they won't. As Schroedinger's cat-in-a-box argument reminds us, macroscopic beings cannot exist in two states at once. Instead, think of navigating the electronic world as adding extra dimensions to our physical existence. I've long postulated that higher-level thought occurs in dimensions far greater than the 4-dimensional physical world we perceive (i.e. our brains are fully capable of processing data in high-dimensional spaces, called "manifolds", despite the limited 4-dimensional data that our senses feed them; in fact, I think our brains map our sensory inputs into a much higher-dimensional space to compute, then map the results back down to 4 dimensions for output, and we just don't realize it). If you buy this argument, then living in a higher-dimensional world, navigating a world defined simultaneously by 3-d physical topography, 1-d constant temporal displacement, and n-d thought topography, presents little problem to our cognitive abilities, with the only danger being some initial disorientation.
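
If you want a concrete (if hand-wavy) analogue of this "lift into a higher dimension to compute" idea, machine learning has one: points that are hopelessly tangled in a low-dimensional space often become trivially separable once you map them upward. The sketch below is my own illustration, borrowing the standard kernel-trick intuition; it's an analogy for the brain claim above, not evidence for it.

```python
import numpy as np

# Two concentric rings in the plane: no straight line can separate them.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
inner = np.column_stack([np.cos(theta[:100]), np.sin(theta[:100])])      # radius 1
outer = 3 * np.column_stack([np.cos(theta[100:]), np.sin(theta[100:])])  # radius 3

def lift(points):
    """Map each 2-d point (x, y) up to 3-d as (x, y, x**2 + y**2)."""
    r2 = (points ** 2).sum(axis=1, keepdims=True)
    return np.hstack([points, r2])

# In the lifted space the new coordinate is 1 for every inner point and 9
# for every outer point, so the flat plane z = 5 separates them perfectly.
print(lift(inner)[:, 2].max(), lift(outer)[:, 2].min())  # ~1.0 vs ~9.0
```

The computation happens comfortably in the higher-dimensional space, and the answer can always be projected back down; which, if my speculation holds, is roughly what our brains are doing all day.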

Eventually (and all you AI and ML people, you knew this part was coming), this leads to what's known in the cottage AI industry as a "singularity", or collective awareness and consciousness. Actually, each electronic nation will be its own singularity. I'm not saying this is good or evil, but I am convinced it's going to happen.


So here we find ourselves, at the edge of a great social event. This is a paradigm shift on the order of the invention of fire, the construction of the first cities, and the industrial revolution. It may even be greater than all of those events, because we are now able to expand and redefine the "human experience."

So I've joined the web 2.0 revolution, but with the realization that even the internet as we know it is a bit of an experiment. And experiments sometimes fail. Actually, experiments usually fail. Nonetheless, critical mass has already been reached. The human race is too connected now to ever go back to the way it was. We've already crossed the event horizon, past the point of no return.

I hope you brought a towel.