Watson is an artificially intelligent computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first CEO, industrialist Thomas J. Watson. The computer system was specifically developed to answer questions on the quiz show Jeopardy! In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings. Watson received the first-place prize of $1 million.
Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage including the full text of Wikipedia, but was not connected to the Internet during the game. For each clue, Watson's three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game's signaling device, but had trouble responding to a few categories, notably those having short clues containing only a few words.
In February 2013, IBM announced that the Watson software system's first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan–Kettering Cancer Center in conjunction with health insurance company WellPoint. IBM Watson's business chief Manoj Saxena says that 90% of nurses in the field who use Watson now follow its guidance.
Watson is a question answering (QA) computing system that IBM built to apply advanced natural language processing, information retrieval, knowledge representation, automated reasoning, and machine learning technologies to the field of open domain question answering.
The key difference between QA technology and document search is that document search takes a keyword query and returns a list of documents, ranked in order of relevance to the query (often based on popularity and page ranking), while QA technology takes a question expressed in natural language, seeks to understand it in much greater detail, and returns a precise answer to the question.
According to IBM, "more than 100 different techniques are used to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses."
Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework. The system was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system, using the Apache Hadoop framework to provide distributed computing.
The system is workload-optimized: it integrates massively parallel POWER7 processors and is built on IBM's DeepQA technology, which it uses to generate hypotheses, gather massive evidence, and analyze data. Watson is composed of a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-core processor, with four threads per core. In total, the system has 2,880 POWER7 processor cores and 16 terabytes of RAM.
According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM's master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about $3 million. Its performance stands at 80 teraFLOPS, which is not enough to place it on the Top 500 Supercomputers list. According to Rennie, the content was stored in Watson's RAM for the game because data stored on hard drives are too slow to access.
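The hardware figures quoted above can be cross-checked with some quick arithmetic. The Python sketch below uses only numbers stated in the text. Decimal units (1 GB = 1,000,000 KB) are an assumption, and the "processors per server" value is an inference from dividing the stated totals, not a figure given by the source.

```python
# Figures quoted in the text.
SERVERS = 90
TOTAL_CORES = 2_880
CORES_PER_PROCESSOR = 8
THREADS_PER_CORE = 4
RAM_TB = 16
CONTENT_TB = 4                # "four terabytes of disk storage"
THROUGHPUT_GB_PER_S = 500     # Rennie's estimate
BOOKS_PER_S = 1_000_000       # "the equivalent of a million books"

# Derived values (inferences, not statements from the source).
cores_per_server = TOTAL_CORES // SERVERS
processors_per_server = cores_per_server // CORES_PER_PROCESSOR
total_threads = TOTAL_CORES * THREADS_PER_CORE

# Assumes decimal units: 1 GB = 1,000,000 KB.
kb_per_book = THROUGHPUT_GB_PER_S * 1_000_000 / BOOKS_PER_S

# Consistent with the statement that the content was held in RAM.
content_fits_in_ram = CONTENT_TB <= RAM_TB

print(f"cores per server:      {cores_per_server}")
print(f"processors per server: {processors_per_server}")
print(f"hardware threads:      {total_threads}")
print(f"implied book size:     {kb_per_book:.0f} KB")
print(f"content fits in RAM:   {content_fits_in_ram}")
```

Under these assumptions, the stated total of 2,880 cores across ninety servers works out to 32 cores per server, which would correspond to four eight-core processors in each server rather than one. The "million books" comparison implies roughly 500 KB of text per book.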
The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles, and literary works. Watson also used databases, taxonomies, and ontologies. Specifically, DBPedia, WordNet, and Yago were used.
The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias, and other reference material that it could use to build its knowledge. Although Watson was not connected to the Internet during the game, it contained 200 million pages of structured and unstructured content consuming four terabytes of disk storage, including the full text of Wikipedia.
When playing Jeopardy!, all players must wait until host Alex Trebek reads each clue in its entirety, after which a light is lit as a "ready" signal; the first to activate their buzzer button wins the chance to respond. Watson received the clues as electronic texts at the same moment they were made visible to the human players. It would then parse the clues into different keywords and sentence fragments in order to find statistically related phrases. Watson's main innovation was not the creation of a new algorithm for this operation but rather its ability to quickly execute thousands of proven language analysis algorithms simultaneously to find the correct answer. The more algorithms that found the same answer independently, the more likely Watson was to be correct. Once Watson had a small number of potential solutions, it checked them against its database to ascertain whether each solution made sense.
In a sequence of 20 mock games, human participants were able to use the average six to seven seconds that Watson needed to hear the clue and decide whether to signal for responding. During that time, Watson also had to evaluate the response and determine whether it was sufficiently confident in the result to signal. Part of the system used to win the Jeopardy! contest was the electronic circuitry that received the "ready" signal and then examined whether Watson's confidence level was great enough to activate the buzzer. Given the speed of this circuitry compared to the speed of human reaction times, Watson's reaction time was faster than that of the human contestants except when a human anticipated (instead of reacted to) the ready signal.
After signaling, Watson spoke with an electronic voice and gave its responses in Jeopardy!'s question format. Watson's voice was synthesized from recordings that actor Jeff Woodman made for an IBM text-to-speech program in 2004.
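The idea that agreement among independent algorithms raises confidence, followed by a threshold check before buzzing, can be illustrated with a small Python sketch. This is not DeepQA's actual method: the noisy-OR combination rule, the function names `merge_and_rank` and `should_buzz`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def merge_and_rank(candidates):
    """Merge (answer, score) pairs from independent scorers and rank them.

    Assumption: each score is treated as the probability that the answer
    is correct, and scores are combined with a noisy-OR rule,
    confidence = 1 - product(1 - score), so that several scorers agreeing
    on one answer yield a higher merged confidence than any one alone.
    """
    miss_probability = defaultdict(lambda: 1.0)
    for answer, score in candidates:
        key = answer.strip().lower()  # treat trivially different spellings as one answer
        miss_probability[key] *= 1.0 - score
    ranked = [(answer, 1.0 - miss) for answer, miss in miss_probability.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def should_buzz(ranked, threshold=0.5):
    """Signal only if the best answer clears the (hypothetical) threshold."""
    return bool(ranked) and ranked[0][1] >= threshold


# Hypothetical scorer outputs for a single clue.
scorer_outputs = [
    ("Answer A", 0.4),
    ("answer a", 0.5),
    ("Answer B", 0.3),
    ("Answer A", 0.2),
]
ranked = merge_and_rank(scorer_outputs)
top_three = ranked[:3]  # the broadcast displayed the three most probable responses
print(top_three, should_buzz(ranked))
```

In this toy run, three scorers that each give "Answer A" only moderate support combine into a merged confidence higher than any single score, which is the effect the paragraph above describes.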
Comparison with human players
Watson's basic working principle is to parse keywords in a clue while searching for related terms as responses. This gives Watson some advantages and disadvantages compared with human Jeopardy! players. Watson has deficiencies in understanding the contexts of the clues. As a result, human players usually generate responses faster than Watson, especially to short clues. Watson's programming prevents it from using the popular tactic of buzzing before it is sure of its response. Watson has consistently better reaction time on the buzzer once it has generated a response, and is immune to human players' psychological tactics.
The Jeopardy! staff used different means to notify Watson and the human players when to buzz, which was critical in many rounds. The humans were notified by a light, which took them tenths of a second to perceive. Watson was notified by an electronic signal and could activate the buzzer within about eight milliseconds. The humans tried to compensate for the perception delay by anticipating the light, but the variation in the anticipation time was generally too great to fall within Watson's response time. Watson did not operate to anticipate the notification signal.
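The timing argument above can be made concrete with a toy model in Python. The eight-millisecond figure comes from the text; everything else is an assumption for illustration, including the function name `first_to_buzz`, the rule that a press before the ready signal simply does not count, and the simplification that a human presses only once.

```python
WATSON_LATENCY_S = 0.008  # "about eight milliseconds", per the text


def first_to_buzz(human_press_offset_s, watson_confident=True,
                  watson_latency_s=WATSON_LATENCY_S):
    """Return who wins the buzz in a simplified single-press model.

    human_press_offset_s: when the human presses, in seconds relative to
    the ready signal (negative means the human anticipated too early).
    Assumption: a press before the signal is ignored, and the human does
    not press again.
    """
    human_valid = human_press_offset_s >= 0.0
    if not watson_confident:
        # Watson does not signal when its confidence is too low.
        return "human" if human_valid else "nobody"
    if human_valid and human_press_offset_s < watson_latency_s:
        return "human"
    return "watson"


# A pure reaction to the light (tenths of a second) loses to Watson.
print(first_to_buzz(0.2))
# A well-timed anticipation that lands inside Watson's latency window wins.
print(first_to_buzz(0.004))
# An anticipation that comes too early is wasted in this model.
print(first_to_buzz(-0.05))
```

In this model a human wins against a confident Watson only by landing inside a window a few milliseconds wide, which illustrates why variation in human anticipation timing was generally too great to beat Watson's response time.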
Since Deep Blue's victory over Garry Kasparov in chess in 1997, IBM had been on the hunt for a new challenge. In 2004, IBM Research manager Charles Lickel, over dinner with coworkers, noticed that the restaurant they were in had fallen silent. He soon discovered the cause of this evening hiatus: Ken Jennings, who was then in the middle of his successful 74-game run on Jeopardy!. Nearly the entire restaurant had piled toward the televisions, mid-meal, to watch the phenomenon. Intrigued by the quiz show as a possible challenge for IBM, Lickel passed the idea on, and in 2005, IBM Research executive Paul Horn backed Lickel up, pushing for someone in his department to take up the challenge of playing Jeopardy! with an IBM system. Though he initially had trouble finding any research staff willing to take on what looked to be a much more complex challenge than the wordless game of chess, eventually David Ferrucci took him up on the offer. In competitions managed by the United States government, Watson's predecessor, a system named Piquant, was usually able to respond correctly to only about 35% of clues and often required several minutes to respond. To compete successfully on Jeopardy!, Watson would need to respond in no more than a few seconds, and at that time, the problems posed by the game show were deemed to be impossible to solve.
In initial tests run during 2006 by David Ferrucci, the senior manager of IBM's Semantic Analysis and Integration department, Watson was given 500 clues from past Jeopardy! programs. While the best real-life competitors buzzed in half the time and responded correctly to as many as 95% of clues, Watson's first pass could get only about 15% correct. During 2007, the IBM team was given three to five years and a staff of 15 people to solve the problems. By 2008, the developers had advanced Watson such that it could compete with Jeopardy! champions. By February 2010, Watson could beat human Jeopardy! contestants on a regular basis.
Although the system is primarily an IBM effort, Watson's development involved faculty and graduate students from Rensselaer Polytechnic Institute, Carnegie Mellon University, University of Massachusetts Amherst, the University of Southern California's Information Sciences Institute, the University of Texas at Austin, the Massachusetts Institute of Technology, and the University of Trento, as well as students from New York Medical College.
In 2008, IBM representatives communicated with Jeopardy! executive producer Harry Friedman about the possibility of having Watson compete against Ken Jennings and Brad Rutter, two of the most successful contestants on the show, and the program's producers agreed. Watson's differences with human players had generated conflicts between IBM and Jeopardy! staff during the planning of the competition. IBM repeatedly expressed concerns that the show's writers would exploit Watson's cognitive deficiencies when writing the clues, thereby turning the game into a Turing test. To address that concern, a third party randomly picked the clues from previously written shows that were never broadcast. Jeopardy! staff also raised concerns over Watson's reaction time on the buzzer. Originally Watson signaled electronically, but show staff requested that it press a button physically, as the human contestants would. Even with a robotic "finger" pressing the buzzer, Watson remained faster than its human competitors. Ken Jennings noted, "If you're trying to win on the show, the buzzer is all," and that Watson "can knock out a microsecond-precise buzz every single time with little or no variation. Human reflexes can't compete with computer circuits in this regard." Stephen Baker, a journalist who recorded Watson's development in his book "Final Jeopardy", reported that the conflict between IBM and Jeopardy! became so serious in May 2010 that the competition was almost canceled. Watson learned from its mistakes. In one practice round, for example, it was given the clue "This trusted friend was the first non-dairy powdered creamer," to which it replied, "What is milk?", mistaking the clue as asking for a dairy product. As part of the preparation, IBM constructed a mock set in a conference room at one of its technology sites to model the one used on Jeopardy! Human players, including former Jeopardy! contestants, also participated in mock games against Watson with Todd Alan Crain of The Onion playing host. About 100 test matches were conducted with Watson winning 65% of the games.
To provide a physical presence in the televised games, Watson was represented by an "avatar" of a globe, inspired by the IBM "smarter planet" symbol. Jennings described the computer's avatar as a "glowing blue ball criscrossed by 'threads' of thought—42 threads, to be precise," and stated that the number of thought threads in the avatar was an in-joke referencing the significance of the number 42 in Douglas Adams' Hitchhiker's Guide to the Galaxy. Joshua Davis, the artist who designed the avatar for the project, explained to Stephen Baker that there are 36 triggerable states that Watson was able to use throughout the game to show its confidence in responding to a clue correctly; he had hoped to be able to find forty-two, to add another level to the Hitchhiker's Guide reference, but he was unable to pinpoint enough game states.
A practice match was recorded on January 13, 2011, and the official matches were recorded on January 14, 2011. All participants maintained secrecy about the outcome until the match was broadcast in February.
In a practice match before the press on January 13, 2011, Watson won a 15-question round against Ken Jennings and Brad Rutter with a score of $4,400 to Jennings's $3,400 and Rutter's $1,200, though Jennings and Watson were tied before the final $1,000 question. None of the three players responded incorrectly to a clue.
The first round was broadcast February 14, 2011, and the second round, on February 15, 2011. The right to choose the first category had been determined by a draw won by Rutter. Watson, represented by a computer monitor display and artificial voice, responded correctly to the second clue and then selected the fourth clue of the first category, a deliberate strategy to find the Daily Double as quickly as possible. Watson's guess at the Daily Double location was correct. At the end of the first round, Watson was tied with Rutter at $5,000; Jennings had $2,000.
Watson's performance was characterized by some quirks. In one instance, Watson repeated a reworded version of an incorrect response offered by Jennings (Jennings said "What are the '20s?" in reference to the 1920s. Then Watson said "What is 1920s?") Because Watson could not recognize other contestants' responses, it did not know that Jennings had already given the same response. In another instance, Watson was initially given credit for a response of "What is leg?" after Jennings incorrectly responded "What is: he only had one hand?" to a clue about George Eyser (The correct response was, "What is: he's missing a leg?"). Because Watson, unlike a human, could not have been responding to Jennings's mistake, it was decided that this response was incorrect. The broadcast version of the episode was edited to omit Trebek's original acceptance of Watson's response. Watson also demonstrated complex wagering strategies on the Daily Doubles, with one bet at $6,435 and another at $1,246. Gerald Tesauro, one of the IBM researchers who worked on Watson, explained that Watson's wagers were based on its confidence level for the category and a complex regression model called the Game State Evaluator.
Watson took a commanding lead in Double Jeopardy!, correctly responding to both Daily Doubles. Watson responded to the second Daily Double correctly with a 32% confidence score.
Although it wagered only $947 on the clue, Watson was the only contestant to miss the Final Jeopardy! response in the category U.S. CITIES ("Its largest airport was named for a World War II hero; its second largest, for a World War II battle"). Rutter and Jennings gave the correct response of Chicago, but Watson's response was "What is Toronto?????" Ferrucci offered reasons why Watson would appear to have guessed a Canadian city: categories only weakly suggest the type of response desired, the phrase "U.S. city" didn't appear in the question, there are cities named Toronto in the U.S., and Toronto in Ontario has an American League baseball team. Dr. Chris Welty, who also worked on Watson, suggested that it may not have been able to correctly parse the second part of the clue, "its second largest, for a World War II battle" (which was not a standalone clause despite it following a semicolon, and required context to understand that it was referring to a second-largest airport). Eric Nyberg, a professor at Carnegie Mellon University and a member of the development team, stated that the error occurred because Watson does not possess the comparative knowledge to discard that potential response as not viable. Although not displayed to the audience as with non-Final Jeopardy! questions, Watson's second choice was Chicago. Both Toronto and Chicago were well below Watson's confidence threshold, at 14% and 11% respectively. (This lack of confidence was the reason for the multiple question marks in Watson's response.)
The game ended with Jennings with $4,800, Rutter with $10,400, and Watson with $35,734.
During the introduction, Trebek (a Canadian native) joked that he had learned Toronto was a U.S. city, and Watson's error in the first match prompted an IBM engineer to wear a Toronto Blue Jays jacket to the recording of the second match.
In the first round, Jennings was finally able to choose a Daily Double clue, while Watson responded to one Daily Double clue incorrectly for the first time in the Double Jeopardy! Round. After the first round, Watson placed second for the first time in the competition after Rutter and Jennings were briefly successful in increasing their dollar values before Watson could respond. Nonetheless, the final result ended with a victory for Watson with a score of $77,147, besting Jennings who scored $24,000 and Rutter who scored $21,600.
The prizes for the competition were $1 million for first place (Watson), $300,000 for second place (Jennings), and $200,000 for third place (Rutter). As promised, IBM donated 100% of Watson's winnings to charity, with 50% of those winnings going to World Vision and 50% going to World Community Grid. Similarly, Jennings and Rutter donated 50% of their winnings to their respective charities.
In acknowledgment of IBM and Watson's achievements, Jennings made an additional remark in his Final Jeopardy! response: "I for one welcome our new computer overlords", echoing a similar memetic reference to the episode "Deep Space Homer" on The Simpsons, in which TV news presenter Kent Brockman speaks of welcoming "our new insect overlords". Jennings later wrote an article for Slate, in which he stated "IBM has bragged to the media that Watson's question-answering skills are good for more than annoying Alex Trebek. The company sees a future in which fields like medical diagnosis, business analytics, and tech support are automated by question-answering software like Watson. Just as factory jobs were eliminated in the 20th century by new assembly-line robots, Brad and I were the first knowledge-industry workers put out of work by the new generation of 'thinking' machines. 'Quiz show contestant' may be the first job made redundant by Watson, but I'm sure it won't be the last."
Philosopher John Searle argues that Watson—despite impressive capabilities—cannot actually think. Drawing on his Chinese room thought experiment, Searle claims that Watson, like other computational machines, is capable only of manipulating symbols, but has no ability to understand the meaning of those symbols; however, Searle's experiment has its detractors.
Match against members of the United States Congress
On February 28, 2011, Watson played an untelevised exhibition match of Jeopardy! against members of the United States House of Representatives. In the first round, Rush D. Holt, Jr. (D-NJ, a former Jeopardy! contestant), who was challenging the computer with Bill Cassidy (R-LA), led with Watson in second place. However, combining the scores between all matches, the final score was $40,300 for Watson and $30,000 for the congressional players combined.
IBM's Christopher Padilla said of the match, "The technology behind Watson represents a major advancement in computing. In the data-intensive environment of government, this type of technology can help organizations make better decisions and improve how government helps its citizens."
According to IBM, "The goal is to have computers start to interact in natural human terms across a range of applications and processes, understanding the questions that humans ask and providing answers that humans can understand and justify." It has been suggested by Robert C. Weber, IBM's general counsel, that Watson may be used for legal research. The company also intends to use Watson in other information-intensive fields, such as telecommunications, financial services, and government.
Watson is based on commercially available IBM Power 750 servers that have been marketed since February 2010. IBM also intends to market the DeepQA software to large corporations, with a price in the millions of dollars, reflecting the $1 million needed to acquire a server that meets the minimum system requirement to operate Watson. IBM expects the price to decrease substantially within a decade as the technology improves.
Commentator Rick Merritt said that "there's another really important reason why it is strategic for IBM to be seen very broadly by the American public as a company that can tackle tough computer problems. A big slice of [IBM's profit] comes from selling to the U.S. government some of the biggest, most expensive systems in the world."
In 2013, it was reported that three companies were working with IBM to create apps embedded with Watson technology. Fluid is developing an app for the retailer The North Face, designed to provide advice to online shoppers. Welltok is developing an app designed to give people advice on ways to engage in activities to improve their health. MD Buyline is developing an app to advise medical institutions on equipment procurement decisions.
In November 2013, IBM announced it would make Watson's API available to software application providers, enabling them to build apps and services that are embedded with Watson's capabilities. To build out its base of partners who create applications on the Watson platform, IBM consults with a network of venture capital firms, which advise IBM on which of their portfolio companies may be a logical fit for what IBM calls the Watson Ecosystem. Thus far, roughly 800 organizations and individuals have signed up with IBM, with interest in creating applications that could use the Watson platform.
On January 30, 2013, it was announced that Rensselaer Polytechnic Institute would receive a successor version of Watson, which would be housed at the Institute's technology park and be available to researchers and students. By summer 2013, Rensselaer had become the first university to receive a Watson computer.
On February 6, 2014, it was reported that IBM plans to invest $100 million in a 10-year initiative to use Watson and other IBM technologies to help countries in Africa address development problems, beginning with healthcare and education.
On June 3, 2014, three new Watson Ecosystem partners were chosen from more than 400 business concepts submitted by teams spanning 18 industries from 43 countries. "These bright and enterprising organizations have discovered innovative ways to apply Watson that can deliver demonstrable business benefits," said Steve Gold, vice president, IBM Watson Group. The winners are Majestyk Apps with their adaptive educational platform, FANG (Friendly Anthropomorphic Networked Genome); Red Ant with their retail sales trainer; and GenieMD with their medical recommendation service.
On July 9, 2014, Genesys Telecommunications Laboratories announced plans to integrate Watson to improve its customer experience platform, citing the staggering volume of customer data that must be analyzed.
Watson has been integrated with databases, including that of Bon Appétit magazine, to power a recipe-generating platform.
In healthcare, Watson's natural language, hypothesis generation, and evidence-based learning capabilities allow it to function as a clinical decision support system for use by medical professionals. To aid physicians in the treatment of their patients, once a doctor has posed a query to the system describing symptoms and other related factors, Watson first parses the input to identify the most important pieces of information; then mines patient data to find facts relevant to the patient's medical and hereditary history; then examines available data sources to form and test hypotheses; and finally provides a list of individualized, confidence-scored recommendations. The sources of data that Watson uses for analysis can include treatment guidelines, electronic medical record data, notes from doctors and nurses, research materials, clinical studies, journal articles, and patient information. Despite being developed and marketed as a "diagnosis and treatment advisor," Watson has never actually been involved in the medical diagnosis process, only in assisting with identifying treatment options for patients who have already been diagnosed.
In February 2011, it was announced that IBM would be partnering with Nuance Communications for a research project to develop a commercial product during the next 18 to 24 months, designed to exploit Watson's clinical decision support capabilities. Physicians at Columbia University would help to identify critical issues in the practice of medicine where the system's technology may be able to contribute, and physicians at the University of Maryland would work to identify the best way that a technology like Watson could interact with medical practitioners to provide the maximum assistance.
In September 2011, IBM and WellPoint, a major American healthcare solutions provider, announced a partnership to utilize Watson's data crunching capability to help suggest treatment options to doctors. Then, in February 2013, IBM and WellPoint gave Watson its first commercial application, for utilization management decisions in lung cancer treatment at Memorial Sloan–Kettering Cancer Center.
IBM announced a partnership with Cleveland Clinic in October 2012. The company has sent Watson to the Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, where it will increase its health expertise and assist medical professionals in treating patients. The medical facility will utilize Watson's ability to store and process large quantities of information to help speed up and increase the accuracy of the treatment process. "Cleveland Clinic's collaboration with IBM is exciting because it offers us the opportunity to teach Watson to 'think' in ways that have the potential to make it a powerful tool in medicine," said C. Martin Harris, MD, chief information officer of Cleveland Clinic.
On February 8, 2013, IBM announced that oncologists at the Maine Center for Cancer Medicine and Westmed Medical Group in New York have started to test the Watson supercomputer system in an effort to recommend treatment for lung cancer.
- Baker, Stephen (2012) Final Jeopardy: The Story of Watson, the Computer That Will Transform Our World, Mariner Books