Why isn't voice-based UI mainstream?

13 May 2008 - 1:36pm
32 replies
1178 reads
Jeff Garbers
2008

Most of us old-timers probably expected voice I/O to be a common part
of personal computing by now. But here we are in 2008, and I don't see
even early signs of voice emerging into the mainstream. Products like
Naturally Speaking have some popularity, but my sense is that they're
used far more for dictation than any sort of command and response
interface. Both Mac OS X and Windows Vista have built-in speech
recognition capability, but does anybody use them (or even know
they're there)?

So my question for the group is: why? Is it due to technical
shortcomings, like recognition accuracy and dealing with background
noise? Are there social issues, like not wanting to be overheard or
feeling silly talking to a machine?

Or is it that splicing a voice-based UI into current graphical
interfaces just doesn't give a satisfactory user experience?

This, to me, is the most intriguing possibility. Voice command today
reminds me of the earliest versions of mice for PCs, which generated
arrow keystrokes as you moved them around; although they were
ostensibly compatible with the existing applications, they just didn't
work well enough to justify using them. Could it be that an effective
voice-based UI requires a more basic integration into the OS and
applications? Perhaps we need an OS-defined structure for a spoken
command syntax and vocabulary rather than just expecting users to
speak menu items?

Why aren't we talking to our computers yet? Should we be?
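A rough way to picture the OS-defined command structure suggested above: applications register small command grammars with the OS, and the recognizer only has to match an utterance against those short lists rather than open speech. A minimal sketch (all names and APIs here are invented for illustration):

```python
# Hypothetical sketch of an OS-level spoken-command registry: instead of
# users speaking menu items, each app registers a small grammar with the
# OS, and utterances are matched only against registered phrases.

from dataclasses import dataclass, field

@dataclass
class CommandGrammar:
    """A spoken-command vocabulary one application registers with the OS."""
    app: str
    commands: dict = field(default_factory=dict)  # phrase -> action name

class VoiceShell:
    def __init__(self):
        self.grammars = []

    def register(self, grammar):
        self.grammars.append(grammar)

    def dispatch(self, utterance):
        """Match a recognized utterance against every registered grammar."""
        phrase = utterance.strip().lower()
        for g in self.grammars:
            if phrase in g.commands:
                return (g.app, g.commands[phrase])
        return None  # no grammar claims the phrase -> ask the user to rephrase

shell = VoiceShell()
shell.register(CommandGrammar("Mail", {"check mail": "fetch", "next message": "next"}))
shell.register(CommandGrammar("Browser", {"scroll down": "scroll_down"}))

print(shell.dispatch("Next Message"))   # ('Mail', 'next')
print(shell.dispatch("open pod bay"))   # None
```

The point of the sketch is only that a shared, OS-level grammar gives every application the same predictable command structure, instead of each app improvising over its menus.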

Comments

13 May 2008 - 2:43pm
Jeffrey D. Gimzek
2007

Imagine 4 people in a small office all talking to their computers
every 2 seconds to say "new window....scroll down..... stop...up...
select file...."

I think it is mostly social, although everyone I know that has tried
voice command has given it up, even when trying it at home alone in a
quiet house, so the tech isn't there either.

Plus, talking is WAY slower than your hands.


13 May 2008 - 2:55pm
Kristopher Kinlen
2008

I am currently dealing with the same questions / problems. I work in
the clinical space where the user's hands are often gloved up and
covered in fluids. Interacting with software via a touchscreen or
hardware device presents sterility issues so voice is the natural
solution. As simple an answer as that seems, to date, few people in
the industry actually use the voice solutions that are available.

It seems to be creeping in... Sync in cars is becoming more common,
and the touch-tone menus on the other end of many 1-800 numbers are
being replaced by voice.

I had the same sort of thoughts...

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=29005

13 May 2008 - 3:02pm
gretchen anderson
2005

>plus, talking is WAY slower than your hands.
You bet. At least for some things.

We just did a related project that looked at voice, and one thing that
came up is that Star Trek really set an expectation that's hard to
deliver on. The whole "computer: [insert your open-ended, humanly voiced
question/command here]" thing isn't quite ready for prime time. Plus,
people have a hard time remembering voice commands, where a GUI can
give you prompts.

Another reason is that many platforms make invoking voice command hard.
You often have to go somewhere/do something special and then start
talking. I subscribed to Jott, thinking it would be my new fave way to
set reminders for myself. But in reality I don't remember to make a
special phone call to set a reminder, I go to my calendar on my phone.
"Input where you output!"

13 May 2008 - 2:23pm
Christine Boese
2006

David Pogue at NYTimes has been out front about how in love with voice
systems he is. You can search his past columns.

Chris


13 May 2008 - 3:09pm
Christine Boese
2006

It just struck me: I wonder how much of the resistance to it is
because of "Open the pod bay doors, HAL."

Another freaky thing hit me the other day, very disconcerting. I
listen to public radio constantly at home, every morning. I imagine
public radio has many reasons to want to cut costs, but unlike NOAA
(the automated weather repeater you get on your weather radio as you
drive through thunderstorms and tornadoes cross-country... sometimes I
just listen so I can feel like Stephen Hawking is riding with me in
the car, if only I could get it to talk about string theory or
something fun), public radio would use REALISTIC-sounding automated
voice announcers, wouldn't they?

I really don't think NPR is running segues and other bits from automated
voice generators, but the trick of my ear is that I sometimes HEAR it that
way. Maybe it is in the nature of the digital signal, I don't know, but
either the fake voices being created now are being modeled on the
inflections of NPR announcers (segue announcers, not story readers, who are
clearly real people), or something about the transmission of those announcer
voices is making them sound synthesized.

I definitely have a few HAL moments while listening some mornings, that's
for sure. Except it is usually that woman's synthesized voice, more like the
411 numbers. Calm and NPR-sounding women. I'm sure they test out great for
delivering info in a style to keep us calm while we are being kept on hold.

Chris


13 May 2008 - 3:23pm
Scott McDaniel
2007

I think it'd be fair to say that voice controls would largely need to
be an enhancement to screen/key/mouse-driven input, for all the
reasons mentioned before. I fear, too, that many of the approaches to
voice UI are following the past 20 years of visual UI design, based on
the products out there instead of starting from the ground up with
"What would someone want a voice UI to do?"

At least if voice-command phone systems and the navigation system on
my Prius are any indication, anyway :)

Scott

--
'Life' plus 'significance' = magic. ~ Grant Morrison

13 May 2008 - 3:25pm
Peyush Agarwal
2007

I think the problems w/ voice-based UIs are/would be:
1. Technical - dealing w/ accents, sound levels, ambient noise, etc.
2. The computer would need to understand what we 'mean', as opposed to a visual UI where we click what the computer has to offer.
3. Humans work better by recognition than by recall. Visual UIs aid recognition, while voice UI basically requires good recall. You'd have to remember the exact command that'd generate the desirable response or else you're back to #2.
4. This is one of the biggest drawbacks of voice-based interaction with a computer - it is essentially serial, as opposed to visual UI, which is parallel. This is one of the reasons why I think the iPhone's visual vmail was such a hit. In this respect, the computer would really need to get to the level of human-human interaction - just "knowing" when to interrupt and when to be interrupted in order to carry a serial interaction with almost parallel efficiency.
5. Probably I'm just used to the keyboard/mouse, but I think talking to the computer would be tiring - unless of course you're doing Star Trek (volume, tone, tenor, clarity, noise no bar), and maybe then it'll be workable enough...

-Peyush

13 May 2008 - 2:46pm
Dave Malouf
2005

I think I would only be happy with one if it worked as well, and was
as kooky, as those in Iron Man. There are 2 clear examples of this:
1) Jarvis, the incredible AI: very, very natural speech in both
directions.
2) Even his robotic arms responded to incredibly natural and often
colloquial speech as well.

The issue is mode changing. Going into that unnatural mode is very
disconcerting.

I also think you have a lot of good points as well. But I really think
the technology isn't there yet. I recently demoed a new Ford Sync
system (co-developed w/ Microsoft) and while it was novel, with good
surprises, I think as a total UX it was, well, quite sub-par.

In the end I don't think people "trust" these systems enough b/c
the ones we are forced through are such a negative experience (even
if they are pretty darn functional). Meaning that the total
experience design is flawed, so even if the technical side works
correctly, our total emotional experience is tied to a very
negative response.

- dave


13 May 2008 - 3:36pm
Will Parker
2007


I can think of several reasons why voice commands in a surgical
environment would be problematic.

The best reported reliability I've seen for a simple voice command
system was around 98%, and frankly, I didn't believe that number when
I saw it. Most trials involving voice to text systems report about 95%
reliability, and those usually involved a period of training the
software to recognize individual users' utterances.

Is ~95% reliability sufficient in the operating room? That depends, I
suppose, on which functions could not be performed more reliably by
the operating room staff without adding to the overall cognitive load
for any one of the staff.

And if we're talking about introducing slightly-unreliable
functionality into a risk-sensitive, cognitive-load-sensitive process,
I have to ask what *actual* improvements in surgical practice (other
than reducing the cost of staffing an operating room) would come from
voice command systems?
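The arithmetic behind that question is worth making explicit: per-command reliability compounds across a session, so even a 95%-accurate recognizer makes some misrecognition near-certain over a long procedure. A quick sketch (the 0.95 figure is from the trials mentioned above; the command counts are hypothetical):

```python
# Back-of-the-envelope: if each voice command is recognized correctly
# with probability p, the chance an entire sequence of n commands goes
# through without a single misrecognition is p ** n.

def error_free_session(p, n):
    return p ** n

for n in (1, 10, 50):
    print(n, round(error_free_session(0.95, n), 3))
# 10 commands: ~0.599; 50 commands: ~0.077
```

At 50 commands the chance of a perfectly recognized session is under 8%, which is why any risky command ends up needing a confirmation step.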

-Will

Will Parker
wparker at channelingdesign.com

13 May 2008 - 3:39pm
Loredana
2008

This is an interesting topic.
I'm currently working on a Voice UI for a consumer product application.

It seems to me that while voice I/O promises to deliver an enhanced
experience, the technology does not and cannot yet live up to its
promise.
Aside from the social awkwardness of talking to your computer in an
office full of people, here's what makes matters even worse:

1. Recognizers usually tend to misrecognize short words that would
feel intuitive to the user, such as "back" and "next" and "stop".
What you are left with as a designer is "Go back, Play next, Stop
now" - words that consumers would never think to say, and which
frankly irritate them.

2. Let's assume though that they do make the effort to learn the
keywords, and are alone (or ignore the folks at the office). They
open their mouth wide and say "Plaaay Neeeext." only to be faced
with their worst fear: "I'm sorry, I couldn't understand that."

We humans rely heavily on being able to communicate. Our survival as
a species depends on it, and our success is a direct result of our
ability to understand each other.
We are hard-wired to be really upset when we cannot make ourselves
understood. At gut level, miscommunication is a threat.

The application I'm working on gives users the option to interact
either via keypad input or voice input. Only about 30% choose voice.
It's convenient when they're driving, when they absolutely need to
focus their eyes on something else.

But in truth, even with the current technology, there are
circumstances in which the advantage of using voice to communicate
with a machine is greater than its drawbacks.

Loredana

13 May 2008 - 3:05pm
Tim Ostler
2007

1. Above all, it is social. Working amongst fellow workers all talking to
their computers would be like working in a call centre - only without the
scope for eavesdropping on something interesting.
2. It creates more cognitive load for both human and computer:
- for the human, to verbalise what you want something on screen to do and
then say it, then confirm that it has worked;
- for the computer, to interpret the sound it detects and convert that into
interface instructions

I am not surprised that voice recognition is more widely used for
dictation than for commands, as that is a situation where it can offer
real productivity benefits. Even here, some people just prefer to
express themselves with a keyboard; personally, I never got used to
using a dictaphone or dictating to a secretary (remember them?).

--
Tim Ostler
London

13 May 2008 - 4:06pm
Will Parker
2007

On May 13, 2008, at 1:25 PM, Peyush Agarwal wrote:

> 4. This is one of the biggest drawbacks of voice based interaction
> with a computer - it is essentially serial, as opposed to visual UI
> which is parallel. This is one of the reasons why I think the
> iPhone's visual vmail was such a hit. In this respect, the computer
> would really need to get to the level of a human-human interaction -
> just "knowing" when to interrupt and when to get interrupted in
> order to carry a serial interaction with almost parallel efficiency.

"Almost parallel efficiency" is indeed the key victory condition for
voice UI.

Even at 99.99% voice recognition reliability (plus the absurd 100%
natural language parsing reliability we see in the movies), every
command interaction that involves a non-trivial, unrecoverable change
in state is going to require a confirmation phase: "I think you said
'Go Left'. Is that correct?"

One-way auditory *signals* are a great thing, even under high-stress
conditions. Two-way auditory *communication* requires a mix of trust
and half-duplex handshake negotiation, and that last bit is the
deal-breaker for unreliable computer voice recognition.

-Will

Will Parker
wparker at channelingdesign.com

13 May 2008 - 6:58pm
Dave Malouf
2005

Where I work @ Motorola Enterprise Mobility, our partners create a lot of voice activation systems. The main application is in item picking (think warehouse setting) or other finite tasking systems.

These systems
1. learn @ the individual level
2. usually include an ear piece & boom mic
3. have short & broad menuing systems that are filtered by role & individual
4. put both the worker & the system through training

This is fairly successful, but also pretty far from mainstream.

- dave

13 May 2008 - 8:07pm
Jeffrey D. Gimzek
2007

On May 13, 2008, at 1:36 PM, Will Parker wrote:

> And if we're talking about introducing slightly-unreliable
> functionality into a risk-sensitive, cognitive-load-sensitive process,
> I have to ask what *actual* improvements in surgical practice (other
> than reducing the cost of staffing an operating room) would come from
> voice command systems?

You can push the malpractice suits over to the computer company?

- -

Jeffrey D. Gimzek | Senior User Experience Designer

http://www.glassdoor.com

13 May 2008 - 8:14pm
Brandon E.B. Ward
2008

A friend of mine had voice-recognition software that locked his computer. To unlock the machine, he just talked into the mic (think Sneakers: "My voice is my passport").

But if people in the office were being noisy, or the air conditioner kicked on, or someone was walking by, or he had a cold - he couldn't get into the system, because the audio the machine was receiving differed too much from what it initially recorded when he set up his password.

B

13 May 2008 - 5:08pm
Victoria Stanbach
2008

I used to work for a start-up called AgileTV. We developed a very
robust speech-driven TV control interface. The company is now called
Promptu. Check them out: www.promptu.com

- Speech recognition is very advanced today. You can have anyone
speak a number of specific words into a microphone and the computer
adapts to their speech. Promptu's speech input > server > response
pipeline is very fast.

- In many user tests the system was found to be very interesting and
useful to some - mainly the elderly and disabled - but when we ran a
regional test with a local cable company, typical users found it just
as complicated to learn the new speech interface as to navigate the
on-screen guides.


14 May 2008 - 2:14am
keyur sorathia
2007

Hi all,

I work at IIT (Indian Institute of Technology) Mumbai, India, as an
interaction designer. Currently we are doing a research project called
"galla - a low-cost retail management system". We are designing
hardware and software for small grocery shopkeepers for better
customer management, item management and vendor management.

In one of our explorations, we tried a context-based speech
recognition system, which works pretty well. We designed a UI
particularly for this application. While making a bill for particular
items in a grocery shop, these context-based words help make billing
faster. As grocery shops in India are noisy, the system is facing
some accuracy problems because of background noise. The system is
designed so that there is no need to train it; one can directly start
operating it. Currently the system works only for English words; we
are also trying it out with regional languages. But for sure, as it is
a context-based voice recognition system, it works much better than a
normal speech recognition system.

We are still trying to find a good solution for reducing the
background noise and making this system more effective.
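The idea behind such a context-based system can be sketched roughly as follows: the active task shrinks the candidate vocabulary, so a noisy transcript is snapped to the nearest allowed word instead of being matched against open speech. (This is an illustration only; the vocabulary and matching here are invented, not galla's actual implementation.)

```python
# Toy illustration of context-constrained recognition: within the
# "billing" context only item names are valid, so a noisy transcript is
# snapped to the closest allowed word rather than matched against an
# open vocabulary.

import difflib

CONTEXT_VOCAB = {
    "billing": ["rice", "sugar", "wheat", "oil", "salt"],
    "vendor": ["order", "payment", "delivery"],
}

def recognize(transcript, context, cutoff=0.6):
    """Snap a (possibly noisy) transcript to the context's small vocabulary."""
    matches = difflib.get_close_matches(transcript.lower(),
                                        CONTEXT_VOCAB[context],
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(recognize("shugar", "billing"))  # 'sugar'
print(recognize("xyz", "billing"))     # None
```

The smaller the per-context word list, the more noise the matcher can tolerate, which is why a context-based recognizer can outperform a general one in a loud shop.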

Cheers!!!


--
Keyur Sorathia
Interaction Designer,
Media Lab Asia,
IIT Mumbai.
mobile : +91 98198 15448

email : keyurbsorathia at gmail.com

14 May 2008 - 4:49am
Anders Ljung
2008

Jeff, I think all of the reasons you mentioned apply. Speech
recognition and synthesis, in my opinion, add very little in the
keyboard/mouse/screen paradigm we are currently in.

Studying this also reveals how much information there is in the way
we say things, human to human, which is very tricky for computers to
analyze. A "Hmm" can mean so many things depending on timing,
intonation, etc. Gabriel Skantze at KTH built a pretty nice system for
"pedestrian navigation" which tries to overcome this.

http://www.speech.kth.se/~gabriel/software.html


14 May 2008 - 9:28am
Kristopher Kinlen
2008

The focus is more on the "cover my ass" side of things than on
actually using software to help perform procedures. There is all kinds
of information that has to be documented and charted during/after a
procedure, and many doctors are looking to improve productivity and
profitability, so they look to software.

You are right on with the reliability, though... I personally am not
comfortable with a 95%-reliable surgeon, hehe.

Regards,
Kristopher Kinlen
x63331


14 May 2008 - 12:53pm
Will Parker
2007


Can you give an example or two of the type of procedure documentation
required?

I'm wondering why a non-interactive audiovisual record wouldn't fill
the bill. (Like the no-doubt-fascinating-to-surgical-interns knee
reconstruction videos that keep popping up on the University of
Washington cable channel.)

Why impose the additional workload of managing the data collection
system on the *most critical personnel* in the process?

It's quite cheap (financially and technically) to add massive-but-dumb
data collection functionality to an already-wired venue like the
modern surgical theater. Grab the entire event as fine-grained raw
data and emulate Google to find the interesting bits. Or let your pet
intern do that for you. (Oh ... wait ... that last bit isn't
monetizable. Forget I said that.)

-Will

Will Parker
wparker at channelingdesign.com

14 May 2008 - 1:10pm
Kevin Doyle
2007

Jeffrey is right -- the workplace is what's keeping voice UI from
becoming commonplace. It's where most computers are used -- imagine
how noisy a cube farm of just 20 people talking to their computers
would get.

I've read about some great HCI coming to the home -- you'll be able
to start your dishwasher, check what's in your fridge while at the
grocery store (or order from home using your fridge) and turn on the
AC/heat very soon. I could see the inside of a home being controlled
by voice once things get that wired... but until then, I don't see
much voice happening.


14 May 2008 - 2:27pm
Scott Berkun
2008

I'm sure I'll be forever labeled as the curmudgeonly luddite on the list,
but I really do not want to *ever* debug or reboot my refrigerator, even if
that means I'll always have to make shopping lists the old-fashioned way:
the upside of automation is totally outweighed for me by the likelihood of
adding more fragility. Frankly, in 2008 it's still pretty damn hard to find
a thermostat that doesn't totally suck to use - my faith in the ease of use
of web-programmable kitchen appliances is comically low.

More in line with this thread: why do we assume homes have less background
noise than offices? If the TV or radio is on, doesn't that create nearly as
many problems?

-Scott

Scott Berkun
www.scottberkun.com


14 May 2008 - 2:34pm
Loredana
2008

I was reading an interesting book - "It's Better to Be a Good Machine
than a Bad Person" - which described how controlling your home
appliances by voice can prove to be... um, challenging.
Background noise is an issue - let's say you're watching a movie and
the main character shouts "Turn that off!", and at the same time your
dishwasher stops. Grr!
High error rates in this type of application are common.

But I believe that the main problem with voice is still social/
psychological. How do you talk to a machine?
I've looked at a bunch of Sync videos on YouTube - people are
obviously uneasy talking to their car. I'd love to read about
the psychology of IVR...

How do you folks feel when you have to use an interactive voice
response system?

On May 14, 2008, at 12:27 PM, Scott Berkun wrote:

> I'm sure I'll be forever labeled as the curmudgeonly Luddite on
> the list, but I really do not want to *ever* debug or reboot my
> refrigerator, even if that means I'll always have to make shopping
> lists the old-fashioned way: the upside of automation is totally
> outweighed for me by the likelihood of adding more fragility.
> ...

14 May 2008 - 2:49pm
Jeff Garbers
2008

On May 14, 2008, at 3:34 PM, Loredana Crisan wrote:
> How do you folks feel when you have to use an interactive voice
> response system?

Anxious, because I don't trust them; given the choice to "press or say
your account number" I *always* use the keypad, figuring DTMF is a lot
less ambiguous than English.

Irritated, because they often ask questions as if they understand
natural language, but they don't. "What can I help you with today?"
is a pretty generic prompt, and I have very low confidence that
anything good will happen if I go into my 30-second description of why
my check got credited to the wrong account, etc. etc. Best thing that
can happen is to have it say "Okay, let me get you a representative to
help you with that problem."

Maybe we have the same sort of "uncanny valley" phenomenon with IVR as
we do with CG human characters in movies... perhaps it's better not to
try to simulate human behavior, since you lead people to focus on the
differences and not the similarities.

14 May 2008 - 3:04pm
Brandon E.B. Ward
2008

>> Anxious, because I don't trust them;

I remember my mom telling me about the first electronic calculators
they had way back in the day. They'd just switched from mechanical
adding machines to electronic calculators. She said they were great -
small, light, fast - but they couldn't be trusted 100% of the time.
Sometimes the answer they gave was wrong. So after initially doing
everything quickly with the calculator (can't remember what - some
data entry/books/accounting type stuff), they'd do it all over again
in their heads, by hand, or on the old mechanical system to verify
the answer. It didn't take long before they didn't have to do this
anymore, but she recalled being untrusting of the newfangled
technology.

I'm guessing that Voice-Rec. has a similar hurdle to jump - but it probably will someday.

B

14 May 2008 - 3:06pm
Jackie O'Hare
2008

On May 14, 2008, at 3:34 PM, Loredana Crisan wrote:
> How do you folks feel when you have to use an interactive voice
> response system?

"Anxious, because I don't trust them; given the choice to "press or say
your account number" I *always* use the keypad, figuring DTMF is a lot
less ambiguous than English.

Irritated, because they often ask questions as if they understand
natural language, but they don't. "What can I help you with today?"..."

...

I find interactive voice response difficult at best, and frequently
infuriating. As many people have noted, error rates are high, and
what you intuitively think you need to say is not necessarily the
command the system requires to perform that action. My own
experiences with interactive voice response have generally ended
with me trying to defeat the system by pressing the * key
repeatedly, which usually does boot you out of the system and land
you on the phone with a real live human.

Unfortunately, when someone is already mad about something, it's not
a great time to engage them in a challenging user environment.

This doesn't really apply to more neutral situations - but how long
does it take someone to become infuriated when the voice command for
a simple task repeatedly malfunctions?

14 May 2008 - 8:47pm
Jeff Seager
2007

I work with a number of people with disabilities who actually use
Dragon Dictate and Dragon Naturally Speaking, and some of you who
have some experience with this will understand that the technology is
imperfect at best.

If you've never seen this in action, you should know that the
software must be trained by each user. It's a rather painstaking
process, but it's worth it for people who have physical impairments
that limit their options.

Even after customizing it for your voice, dictating letters with this
software is an exercise in extreme patience. I resist entering one
co-worker's office when she's drafting a letter or e-mail or some
other document, because she has to say "go to sleep" before we can
carry on a discussion. And even though the software has "learned"
her accent and vocal inflections, she's constantly having to back it
up to correct the spelling of an uncommon word or name. I would call
her an expert user, and she still says, "Dragon Dictate sucks!"

Having seen quite a few people struggle with this, I think current
voice recognition software is sufficient only for discrete purposes
where you have a limited command set. A cell phone or address book
has a finite number of stored contact names, for example. I don't
know how this would work for a TV or car stereo, since it seems to
me the sound of the device itself would interfere with recognition
of the command.

For people who need it to get any work done on a computer, it's
worth the hassle. But probably not for anyone else. I believe that
anyone who can improve this will reap some very good karma.


15 May 2008 - 12:14am
Joe Pemberton
2007

Great discussion...

We did some UI (voice and graphical) design for Promptu, a voice search company focused on set-top TV and mobile device voice search. The key learnings were mostly around the awkwardness of dealing with a hybrid UI -- one where you're providing voice input and receiving visual output. In a graphical UI we may take for granted the cues we're giving users all the time -- hover states, loading indicators, etc. I think you have to compensate harder in a voice UI.

This hybrid approach is actually an asset, but just requires new thinking. The alternative (voice input, aural output) is more like what we experience with telephone banking. Ugh.

The other learning was that people don't know what to say, and the consequences of errors are high. Users were afraid to make mistakes because returning search results and dealing with mistakes was time consuming and burdensome.

Promptu actually has some great "did you mean" functionality akin to Google's and just as intuitive. Further, because the search was within categories (e.g. movie titles or album names) the accuracy was excellent. When input is open ended, as with dictation, it's dismal by comparison.
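
Closed-category matching like this can be sketched in a few lines. The
snippet below is purely illustrative -- not Promptu's actual system, and
the title list is invented -- but it shows why constraining recognition
to a known vocabulary makes "did you mean" suggestions tractable in a
way that open-ended dictation is not.

```python
import difflib

# Illustrative sketch only (hypothetical data, not Promptu's system).
# The category is a closed vocabulary: every valid answer is known in
# advance, so a noisy transcript can be snapped to its nearest entry.
MOVIE_TITLES = ["The Godfather", "Gone with the Wind", "Goldfinger"]

def did_you_mean(transcript, candidates=MOVIE_TITLES):
    """Return the closest known titles for a possibly garbled transcript."""
    return difflib.get_close_matches(transcript, candidates, n=3, cutoff=0.6)

print(did_you_mean("the codfather"))  # -> ['The Godfather']
```

With an open vocabulary there is no such short candidate list to fall
back on, which is one way to see why dictation accuracy lags so far
behind category search.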

Lastly, despite mobile handsets being used for voice all the time, people are still awkward talking at a visual UI. Users didn't know how to hold the device and would talk at the screen, not the microphone -- even though they use their mobiles for voice more than they do for mobile data.

I think we'll get there with voice making incremental inroads into UI where it makes sense -- cars for one and where the scope of spoken input is well defined as with categorized search.

15 May 2008 - 12:29am
Pankaj Chawla
2008

I think the real reason is the question of how you want to interact
with the system. For example, if you have to check the balance of an
account maintained in a spreadsheet, what will your sequence of voice
commands be?
1. Start
2. Programs
3. Windows Explorer
4. My Documents
5. Open Account.xls
6. Select Column D
7. Sum to Cell D6
8. Speak D6
9. Close
10. Exit

or

1. Hey computer, can you tell me the balance in my account?

If it's the first case, we are about 40% of the way there; but if
it's the second scenario, we haven't even started yet. To me it's a
question of the mental model vs. the implementation model, with an
as-yet-undefined designer model. I am not sure we will be able to
reach the current mental model anytime soon, so as designers it's
imperative that a designer model is first brought forward, one that
can bridge the gap between the mental and implementation models
within the limitations of currently available technology and
business needs.
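
The gap between the two sequences above is the gap between dictating
GUI steps and registering whole tasks with the system. A minimal,
purely hypothetical sketch of the second approach (the command
patterns and task names here are invented for illustration):

```python
import re

# Hypothetical sketch: instead of the user speaking ten GUI steps,
# the OS keeps a registry mapping one spoken intent to one task.
INTENTS = [
    (re.compile(r"balance.*account", re.IGNORECASE), "report_account_balance"),
    (re.compile(r"(what's|read) my mail", re.IGNORECASE), "read_mail"),
]

def interpret(utterance):
    """Return the registered task for an utterance, or None if unrecognized."""
    for pattern, task in INTENTS:
        if pattern.search(utterance):
            return task
    return None

print(interpret("Hey computer, can you tell me the balance in my account?"))
```

The hard parts, of course, are everything this sketch waves away:
large vocabularies, ambiguous phrasings, and graceful recovery when
an utterance matches nothing.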

My 2 cents.

Cheers
Pankaj
---------------------------------------------
http://13degree.wordpress.com
Do your dreams!

15 May 2008 - 10:42am
Michael Micheletti
2006

On Tue, May 13, 2008 at 11:36 AM, Jeff Garbers <jgarbers at xltsoftware.com>
wrote:

> Why aren't we talking to our computers yet? Should we be?
>

Or cars. I've been thinking about BMW's iDrive while following this thread.
This is a screen-based control system that also has a voice control
interface. I remember reading comments elsewhere from a BMW salesperson who
said that he sat new owners down in their cars and helped them train the air
conditioner. That way they could issue commands vocally without taking their
eyes off the road.

The idea of using voice recognition as a navigation layer superimposed on
other controls is interesting, but I'll admit I'm glad my older 3-series has
pushbuttons.

Michael Micheletti

15 May 2008 - 11:14am
Victor Lombardi
2003

On Thu, May 15, 2008 at 11:42 AM, Michael Micheletti
<michael.micheletti at gmail.com> wrote:
> On Tue, May 13, 2008 at 11:36 AM, Jeff Garbers <jgarbers at xltsoftware.com>
> wrote:
>
>> Why aren't we talking to our computers yet? Should we be?

Apple includes basic speech recognition on Macs:
http://www.apple.com/accessibility/physical/

Now that I'm a parent, talking with other parents, I've found a
common use case for speech recognition: the many hours of feeding
and soothing a baby that require both hands. The baby is happy and
occupied, but caregivers would like to do something too. I've talked
to parents who started reading coil-bound books just because they
lie flat without constant handling. So a hands-free software UI for
caregivers could be a big hit, especially if it were optimized for
reading email, web browsing, and writing rather than general-purpose
control.

16 May 2008 - 12:04am
Sachendra
2005

I think voice UI will not become the primary mode of interaction in
the near future, for obvious reasons. It'll be used mostly when a
visual UI is difficult to use, e.g. while driving a car, taking care
of a baby, or performing a surgical procedure.

Sachendra Yadav
http://sachendra.wordpress.com
