SPEECH AND MOBILITY, June 2010, notes

Speaker:

· Adam Cheyer, Engineering Director, Apple, and co-founder, Siri

Discussants:

· Victor Abrash

· Jordan Cohen

· Mike Cohen

· Farzad Ehsani

· Beth Ann Hockey (with special guest appearance of Venice)

· Jaya Jagannath

· Todd Mozer

· Leo Neumeyer

· Patti Price

· Ben Reaves

· Min Yin

· Fuliang Weng

· Silke Witt

· Jing Zheng

· Jua? Bosch

NOTES:
Background on Adam Cheyer:
Adam Cheyer is currently an engineering director at Apple. He is a co-founder of Siri, Inc., a company aiming to fundamentally redesign the face of the consumer internet experience. He is also a founding member of Change.org, the premier social network for positive social change, and a founder of Genetic Finance, LLC (creator of Sandwalk Capital). Previously, Adam was a Program Director in SRI's Artificial Intelligence Center, where he served as Chief Architect of the CALO/PAL project, an ambitious effort to create a next-generation personal cognitive assistant that learns and self-improves "in the wild" (i.e., with no code changes). Adam was also VP of Engineering at Dejima and at Verticalnet. He has many years of experience in a variety of roles, including executive, software engineer, research scientist, consultant, lecturer, and technical manager. A pioneer in the areas of distributed computing, intelligent agents, and advanced user interfaces, he is the author of more than fifty peer-reviewed publications and nine patents. As Senior Scientist and Co-Director of the Computer Human Interaction Center (CHIC) at SRI International, he led a multidisciplinary team of researchers exploring web services, distributed knowledge, and pervasive computing. While at Bull S.A., he was lead developer and architect for NOEMIE, a configuration expert system used to manage Bull's line of 30,000 hardware and software products worldwide. He received his bachelor's degree with highest honors from Brandeis University and his master's degree, with an "outstanding master's student" award, from UCLA.

Overview

The goal of Siri is to provide a virtual assistant to help manage meetings, social events, and other activities. Just like a real assistant, Siri aims to understand what you say, accomplish tasks for you, and adapt to your preferences over time. Today, Siri can help you find and plan things to do. You can ask Siri to find a romantic place for dinner, tell you what’s playing at a local jazz club, or get tickets to a movie for Saturday night. Siri may occasionally misunderstand requests, even within its range of understanding. Nonetheless, Siri will improve by getting to know you better and understanding a broader set of tasks. In fact, right now, Siri is learning how to handle reminders, flight stats, and reference questions. Our vision is that, over time, you’ll trust Siri to manage many personal details in your life, from recommending a wine you might enjoy to managing your to-do list. The current version of Siri is built for the iPhone 3GS and the iPod touch and works only in the US. Soon, Siri will run on the iPhone 3G and additional mobile platforms as well.

Background

Part of the genesis of Siri was SRI’s participation in the DARPA-sponsored ATIS (Air Travel Information System) task, the first benchmark that involved both speech recognition and natural language understanding. An early demo of the Open Agent Architecture (OAA) showed speech and language technology as part of the services provided. In this architecture, new services could sign on by declaring the services they offered and the resources they needed to offer them. One dramatic demo for a DARPA funder came about when the video card came loose while the cart was being rolled down to the demo room. The system detected that video was unavailable, found Adam’s schedule and the fact that he was in the demo room, looked up the phone number there, and used text-to-speech synthesis to deliver the output over the phone instead of on the screen. Adam worked on various components of the vision at SRI: the Open Agent Architecture, speech and natural language as services, and computer-human interfaces in the context of the CHIC program and the CALO/PAL project. Later work at VerticalNet fleshed out the distributed part of the vision, and when he worked at Dejima, the natural language components were delivered in a product to Salesforce.
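The sign-on-and-delegate idea, and the fallback in the video-card demo, can be illustrated with a toy capability registry. This is a minimal Python sketch under stated assumptions: the `Facilitator` class, the capability name `present_output`, and the agent names are hypothetical, providers are simply tried in registration order, and the schedule and phone-number lookups from the demo are omitted. It is not OAA's actual interface, which used its own inter-agent communication language.

```python
from typing import Callable, Dict, List, Tuple

# A handler returns True if it handled the request, False if it could not.
Handler = Callable[[str], bool]


class Facilitator:
    """Hypothetical toy registry: agents declare capabilities, and each
    request is routed to the first provider that succeeds."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Tuple[str, Handler]]] = {}

    def register(self, capability: str, agent: str, handler: Handler) -> None:
        # An agent "signs on" by declaring a capability it offers.
        self._providers.setdefault(capability, []).append((agent, handler))

    def request(self, capability: str, payload: str) -> str:
        # Try providers in registration order, falling back on failure;
        # return the name of the agent that handled the request.
        for agent, handler in self._providers.get(capability, []):
            if handler(payload):
                return agent
        raise LookupError(f"no provider could satisfy {capability!r}")


spoken: List[str] = []


def screen(message: str) -> bool:
    return False  # the video card came loose, so the display is unavailable


def phone_tts(message: str) -> bool:
    spoken.append(message)  # stand-in for text-to-speech over the phone
    return True


facilitator = Facilitator()
facilitator.register("present_output", "screen", screen)
facilitator.register("present_output", "phone_tts", phone_tts)
used = facilitator.request("present_output", "Here are your results")
print(used)  # phone_tts
```

Because the requester names only a capability, not a device, switching from the screen to the phone needed no change in the code that asked for output. A fuller design would also match requests on constraints and context rather than on capability names alone.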

While working at SRI on a five-year project involving some 400 people at 25 institutions, he hoped that with all those smart people there would be a major breakthrough. But in the end, what gets measured is what advances: in the controlled environment, the systems were ‘toy’ playgrounds where a few more percentage points of accuracy were eked out of the various systems. He gradually came to enjoy the commercial world and seeing systems roll out to real users.

He left for VerticalNet, planning to spend 10 months with 10 people to make something real out of the CALO project (think of Visual Basic for AI). Shortly afterward he saw a Vlingo demo that seemed fast, accurate, and usable, and decided that speech recognition was ready to be integrated into a system that could adapt both to you and to the resources available on the internet. This became Siri (now owned by Apple).

The initial focus was local search and command and control, where plenty of ambiguities make things complex. In February, a sample application was offered to 100 people to gather some initial data. Drawing on allmenus.com for restaurant menus and on other web resources, and accepting tap/type input or speech with auto-completion, the system can answer questions such as ‘who has the best lasagna in L.A.?’ or ‘what romance movies are playing locally?’ For example, ‘book a table for 2 at Zibibbo’s in Palo Alto’ requires resolving the ambiguity of the word ‘book’, finding Zibibbo’s and how to make a reservation there, and collecting the information needed to do that, using paraphrasing as confirmation. Demos and more examples are on the siri.com website. People in conversation don’t back out and start over; they just repair and move on. That’s the strategy Siri takes. Today there are about 40 services. Some defaults are set, but these are just data values that could be set by the user or adapted automatically.
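The ‘book a table’ example bundles three steps: picking the verb sense of ‘book’, filling the slots a reservation needs, and paraphrasing the interpretation back for confirmation. The Python sketch below illustrates those steps with one hand-written regular expression. The pattern, the `Task` type, the intent names, and the slot list (including `time`) are all hypothetical assumptions for illustration, not how Siri actually parsed requests.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical pattern for the verb sense of "book" in a restaurant request.
RESERVATION = re.compile(
    r"book a table for (?P<party_size>\d+) at (?P<restaurant>.+?) in (?P<city>.+)",
    re.IGNORECASE,
)


@dataclass
class Task:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)
    required: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        # Slots the assistant still has to ask the user about.
        return [name for name in self.required if name not in self.slots]


def interpret(utterance: str) -> Task:
    text = utterance.strip().rstrip(".?!")
    match = RESERVATION.fullmatch(text)
    if match:
        # "book a table ..." resolves "book" to the verb sense: make a reservation.
        return Task("reserve_table", match.groupdict(),
                    ["restaurant", "city", "party_size", "time"])
    if re.search(r"\bbook\b", text, re.IGNORECASE):
        # Otherwise treat "book" as a noun and fall back to a search.
        return Task("find_book", {"query": text}, ["query"])
    return Task("unknown")


def paraphrase(task: Task) -> str:
    # Restate the interpretation so the user can confirm or repair it.
    if task.intent == "reserve_table":
        s = task.slots
        return f"Reserve a table for {s['party_size']} at {s['restaurant']} in {s['city']}?"
    if task.intent == "find_book":
        return f"Search for: {task.slots['query']}?"
    return "Sorry, I didn't understand that."


task = interpret("book a table for 2 at Zibibbo's in Palo Alto")
print(paraphrase(task))  # Reserve a table for 2 at Zibibbo's in Palo Alto?
print(task.missing())    # ['time']
```

The missing `time` slot is where the assistant would ask a follow-up question, and the paraphrase gives the user a chance to repair a misreading instead of starting over.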

Siri is not a speech company. Siri initially partnered with Vlingo but went to market with Nuance. It’s not clear what will happen going forward. Apple does still have a small speech group, and he is starting to meet the relevant people. He thought of SRI as a big toy store. Steve Jobs told him to think of Apple as his candy store.

Discussion

There was plenty of time for discussion and socializing, and the real killer app was found: texting while driving (it’s a joke, get it?).