SPEECH AND PERSONAL ASSISTANCE, October 2010

Speaker:

·        Sunil Vemuri, co-founder and CPO, reQall

Discussants:

·        Jared Bernstein, Ordinate/Pearson

·        Jordan Cohen, Spelamode

·        Farzad Ehsani, Fluential

·        Dilek Hakkani-Tur, Microsoft

·        Nancy Jamison, Jamison Consulting

·        Alan Levi, Palm/HP

·        Patti Price, PPRICE Speech and Language Technology

·        Fuliang Weng, Bosch

 

NOTES:
Background on Sunil Vemuri

Dr. Sunil Vemuri is Co-Founder and Chief Product Officer at reQall Inc.  His interests include human memory assistance; information retrieval, extraction, and visualization; knowledge acquisition; organizational memory; speech recognition; and interface and interaction design.

At reQall, Dr. Vemuri oversees product development and contributes to the company’s product vision. Previously, Sunil worked in the Apple Technology Group at Apple Computer, the Beckman Research Institute of the City of Hope hospital near Los Angeles, and France Telecom. He has four patents, and his research has been covered by The New York Times, The Boston Globe, Newsweek, CNN, and MIT Technology Review.

Overview

ReQall is a voice-enabled memory aid that seamlessly integrates your mobile phone, email, text messaging and IM into an organizer, reminder system and productivity assistant. reQall lets you capture ideas, tasks and commitments before you forget them, and it proactively reminds you of your tasks.

Background

The roots of reQall are in Sunil’s thesis at MIT, a memory prosthesis.  He recorded 2 years of data, including conversations with anyone who gave permission for the recordings, weather data, email, etc.  The specific focus of the thesis work was cases in which something had been forgotten and an attempt was made to recover the information from memory. The recording of such data did help with both forgetting and ‘blocking’ (the ‘tip of the tongue’ syndrome).  There were some differences among the subjects studied.  Those who were particularly overloaded with conversations with many people on similar topics tended, for example, to misattribute who said what more often.  The user sample was not large, however.

The paradigm was to give a subject a question based on the recorded data and ask them to remember.  They then got access to a tool to help search through the recordings (they could listen to a sped-up version while an errorful transcript streamed by, with the recognizer’s confidence in each part indicated by the font). Error rates at that time were on the order of 35%. But it still helped as a search tool.
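
As a rough illustration of the confidence-by-font idea, here is a minimal Python sketch. It is not the thesis tool: the thresholds, the style names, and the plain-text markers (asterisks for high confidence, parentheses for low) are all assumptions made for the example, since the notes do not say how confidence was mapped to font.

```python
from typing import List, Tuple

# Hypothetical thresholds; the notes do not give the real mapping.
HIGH, LOW = 0.8, 0.5


def style_for(confidence: float) -> str:
    """Bucket a recognizer confidence score into a display style."""
    if confidence >= HIGH:
        return "bold"
    if confidence >= LOW:
        return "normal"
    return "faint"


def render(words: List[Tuple[str, float]]) -> str:
    """Render (word, confidence) pairs as plain text.

    High-confidence words are wrapped in *asterisks*, low-confidence
    words in (parentheses); a real display would vary the font instead.
    """
    out = []
    for word, conf in words:
        style = style_for(conf)
        if style == "bold":
            out.append(f"*{word}*")
        elif style == "faint":
            out.append(f"({word})")
        else:
            out.append(word)
    return " ".join(out)
```

A reader skimming such a display can trust the emphasized words and treat the de-emphasized ones as hints, which is one plausible reason an errorful transcript still worked as a search aid.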

A key lesson from that work was that it was not a good idea to record everything: that results in so much bathwater that it is hard to find the baby; it’s better to let users choose what to keep. Sunil found that he used the system often when he was writing a paper or preparing a talk.  In two previous studies people wrote notes, but there were no recordings.

After finishing his thesis, he worked at Apple. He had hoped to see if industrial (group) memory could be improved, but that question has not been answered. He was interested in finding out what part partial memory might play in helping to recover more details.  It seems that it can help for up to about a year, and not much after that.  Within that window, transcripts, even very errorful ones, can help to contextualize things and recover memories through that context.

He did not look at memory dysfunction (such as aphasia or Alzheimer’s). The total number of subjects was about 12, with most of the data focused on 3.  Recording devices are better now. Also, the fact that recording was taking place probably affected the data.  To understand these consequences, he walked around with a visible microphone on: some people backed away, while others seemed attracted to it and came up to talk.

His thesis garnered some press attention and he was recruited to help start a company.  For the product, the typical recordings are very short, about 10 seconds or less.  There is also a ‘nag’ function, to remind others.  The research at MIT focused on those times when you know you are having a memory problem.  The product focuses on setting reminders to help prevent those situations. The system knows your reminders, your calendar, and your location.  You can look at your reminders at any time, but the system is also proactive and will make decisions on when to remind you of things.

People typically use it in three areas: lists (shopping lists, for example), relationships/contact management, and health/fitness. But there is a very long tail with many, many uses.  Speech recognition is supplemented by natural language processing (based on keyword spotting).
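
To make the keyword-spotting remark concrete, here is a minimal Python sketch that routes a transcribed utterance to one of the three usage areas. It is only an illustration under stated assumptions: the category names, the keyword lists, and the `spot_category` function are hypothetical, and the notes do not describe how reQall actually implements this step.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists, loosely following the three usage areas
# mentioned in the notes; not reQall's actual vocabulary.
KEYWORDS: Dict[str, List[str]] = {
    "shopping": ["buy", "pick up", "groceries"],
    "contact": ["call", "email", "meet"],
    "health": ["gym", "run", "doctor"],
}


def spot_category(utterance: str) -> str:
    """Return the first category whose keyword appears as a whole word.

    Falls back to "note" when nothing matches, which is where the
    long tail of other uses would land in this sketch.
    """
    text = utterance.lower()
    for category, phrases in KEYWORDS.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return category
    return "note"
```

Matching on whole words keeps "run" from firing inside "brunch"; a spotter like this stays usable on errorful transcripts because it needs only one keyword to survive recognition, not the whole sentence.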

There is a free version and a paid version.  Their one-week retention is a little less than average, but their conversion rate to the paid version is above average.

They have been fortunate with press: David Pogue had a very nice column discussing it, for example.

Discussion

Much of the discussion focused on how the system might take advantage of demographic information about its users. Perhaps cheaper rates could be offered in exchange for revealing more personal information.  There is a lot more information in recordings, because speech is easier than typing. (The average Google search is about 2.3 words, whereas reQall user utterances contain 7 – 8 words, and also carry the location of the speaker.)

The company is not currently looking at the use of the system for an aging population (which is likely to suffer more memory issues than average), but they would love for someone else to do this.  They are looking at fulfilling some of the requests through third-party partners, e.g., an Amazon buy button if the reminder is to buy a certain book.

The system uses Google’s recognition at present, and Yap’s on the iPhone.  Users can opt in to get human correction or not.  The early system used human transcribers, and it is hard to migrate those users to the more errorful fully automated system.  They are not a recognition company and are looking at all recognizers.  Yap seems a little better than Google for some things, but they have used Google, Yap, Nuance, U. Colorado, and Sphinx.  Yap has been very responsive.  Since the quality of the recognition is EVERYTHING, it is worth spending more money on it, including using multiple recognizers.

On the iPhone, because of the human transcription, there is a half-hour delay between the recording and the transcription. On Android, because there is no human in the loop, the transcription is immediate. Now, with Yap, the transcription CAN be done faster on the iPhone, but the users are used to the less errorful human transcription. So they need to reset expectations.