Voice interfaces aren't Visual interfaces (was: Any data on users making use of Help?)

10 May 2009 - 4:58pm
5 replies
DampeS8N
2008

http://www.ixda.org/discuss.php?post=41773

My experiences with voice-based interfaces have always been pretty
caustic. Often you have a voice command line, wherein the user can
speak commands that the computer understands (let's ignore
imperfections in recognition for now) and the computer performs the
task:

"Play Metallica" "Skip" "Play Genre Jazz" and so on.

Every kind of voice-controlled system I've dealt with has boiled
down to some kind of flat list of context-sensitive keywords. If you
say these words here, this will happen; if you say those, that will.

In the thread: "Any data on users making use of Help?" I mentioned
treating these systems like text adventures and RPGs. And I think
there is more we can learn from this that goes far beyond simple
help.

Consoles have replaced the PC for most RPGs, and tight plots have
replaced the more freeform text-driven commands of the past. Even so,
I think there are a variety of interesting techniques used in these
games to make interaction with fictional people easier. Ones we can lift.

Oftentimes our current systems lack memory. They attempt to
tell the user all that they can do right away, and every time they
interact with the system. Recently, in the game Left 4 Dead, Valve
has added a teaching mechanic to the game that alerts the player when
something new or perhaps not completely learned comes up. And it goes
beyond a list of tips that pop up a few times. The game tracks, for
example, how often the player crouches and how quickly they crouch
when they reach an obstacle they can crawl under. It also watches the
player in combat and looks to see if they crouch to let the other
players behind them have clear shots. When the game feels the user
isn't using this core mechanic enough, it lets the player know he
can do it with a tip that doesn't break the flow of the game.

We can also do this. Perhaps you have a voice-activated music system.
You could watch the user and note that they never rate songs they are
listening to. Perhaps ratings are one of the primary ways you pick
the songs they are most likely to want to hear at any given moment.
One way to alert them might be to have the system say, "Remember,
say [Add to Favorites] to let me know what you like!"

The user may not have ever known they could do that, or they may have
just forgotten. But either way, now they know.
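
Here is a rough sketch of what that kind of usage tracking might look
like, in Python. Everything in it is an assumption made up for
illustration: the UsageTracker class, the "add to favorites" command
name, and the threshold of 20 songs are not taken from any real system.

```python
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Counts how often each voice command is used (hypothetical helper)."""

    songs_played: int = 0
    command_counts: dict = field(default_factory=dict)

    def record_command(self, command: str) -> None:
        # Called whenever the recognizer reports a command.
        self.command_counts[command] = self.command_counts.get(command, 0) + 1

    def record_song_played(self) -> None:
        self.songs_played += 1

    def rating_hint(self, min_songs: int = 20):
        """Return a spoken hint if the user has heard plenty of songs
        but has never rated one; otherwise return None."""
        never_rated = self.command_counts.get("add to favorites", 0) == 0
        if self.songs_played >= min_songs and never_rated:
            return "Remember, say [Add to Favorites] to let me know what you like!"
        return None
```

The point of returning None is that the system stays quiet unless it
has a reason to speak, so the tip doesn't break the flow.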

Along with this, we can track what users have recently heard, and how
much help we are giving them. We can prioritize our commands and alert
the user about the most important features first, such as "Skip" or
"Pause".

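One possible way to sketch the prioritizing and the "how much help are
we giving" part, again in Python. The HINTS table, the HintScheduler
class, and the ten-minute gap between tips are all hypothetical choices
for the example, not something from an existing product.

```python
import time

# (priority, command, hint text); a lower number means more important.
# The entries and their wording are assumptions for illustration.
HINTS = [
    (0, "skip", 'Say "Skip" to move to the next song.'),
    (0, "pause", 'Say "Pause" to stop the music for a moment.'),
    (1, "add to favorites", 'Say "Add to Favorites" to let me know what you like.'),
]


class HintScheduler:
    """Offers one hint at a time, most important first, and not too often."""

    def __init__(self, min_gap_seconds=600.0, clock=time.monotonic):
        self.min_gap_seconds = min_gap_seconds
        self.clock = clock
        self.last_hint_time = None
        self.already_known = set()

    def mark_used(self, command):
        # The user said the command, so they don't need a hint for it.
        self.already_known.add(command)

    def next_hint(self):
        now = self.clock()
        too_soon = (
            self.last_hint_time is not None
            and now - self.last_hint_time < self.min_gap_seconds
        )
        if too_soon:
            return None
        # sorted() is stable, so equal priorities keep their table order.
        for _priority, command, text in sorted(HINTS, key=lambda h: h[0]):
            if command not in self.already_known:
                self.already_known.add(command)
                self.last_hint_time = now
                return text
        return None
```

Passing the clock in as a parameter is only there so the timing can be
tested without waiting around.
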
Something else that was very common in old text adventures was
keeping a large list of equivalent words. In Zork, you have to face a
giant cyclops. To defeat him, you must scare him away. Doing so
requires saying a certain hero's name. Problem? He has two acceptable
names: Ulysses and Odysseus. No problem, either one works.

Yes, Yeah, Yup, Sure, Affirmative, Certainly, and so on are all
acceptable alternatives for each other. And they should all be
acceptable to your software. But this rule goes far beyond yes and
no. It should apply to anything and everything. Skip, pass, next.
Stop, halt, pause, break, hold on, wait. Everything should have as
many alternatives as possible.
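
As a minimal sketch, the equivalent-word list could be a plain lookup
table. The SYNONYMS table and canonical_command function are names
invented for this example, and filing everything from "stop" to "wait"
under a single "pause" command is my assumption based on the list above.

```python
# Each canonical command maps to every phrase that should trigger it.
SYNONYMS = {
    "yes": ["yes", "yeah", "yup", "sure", "affirmative", "certainly"],
    "skip": ["skip", "pass", "next"],
    "pause": ["stop", "halt", "pause", "break", "hold on", "wait"],
}

# Invert the table once so each lookup is a single dictionary access.
LOOKUP = {
    phrase: command
    for command, phrases in SYNONYMS.items()
    for phrase in phrases
}


def canonical_command(utterance):
    """Map recognized text to a canonical command, or None if unknown."""
    # Lowercase and collapse whitespace so "Hold  On" matches "hold on".
    normalized = " ".join(utterance.lower().split())
    return LOOKUP.get(normalized)
```

With this, canonical_command("Yup") and canonical_command("Certainly")
both come back as "yes", and an unknown phrase comes back as None so
the system can ask again instead of guessing.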

Along with this, avoid mapping words that mean almost the same thing
to different functions. I know that "Forward" might be shorter than
"Fast Forward", but "Forward" is ambiguous: it could also mean
"Skip". On the other hand, "Seek" is viable for "Fast Forward".
You have to be careful. If the user is going to remember this long
term, it has to make sense, and there can't be confusion about what
will do what. If this means limiting the functionality of your
software, so be it.
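
One way to catch this kind of clash before it ships is to check the
vocabulary for phrases claimed by more than one command. This is only a
sketch: find_ambiguous_phrases and the sample vocabulary are
hypothetical, and it catches exact duplicates only, not words that
merely sound or feel alike.

```python
from collections import defaultdict


def find_ambiguous_phrases(vocabulary):
    """Return phrases assigned to more than one command.

    vocabulary maps a command name to the list of phrases that trigger it.
    The result maps each clashing phrase to the sorted commands claiming it.
    """
    owners = defaultdict(set)
    for command, phrases in vocabulary.items():
        for phrase in phrases:
            owners[phrase.lower()].add(command)
    return {
        phrase: sorted(commands)
        for phrase, commands in owners.items()
        if len(commands) > 1
    }


# Example vocabulary with the "Forward" problem described above.
vocabulary = {
    "skip": ["skip", "next", "forward"],
    "fast forward": ["fast forward", "seek", "forward"],
}
clashes = find_ambiguous_phrases(vocabulary)
```

Here clashes flags "forward" as belonging to both "fast forward" and
"skip", which is the cue to drop it from one of them.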

What thoughts do the rest of you have about this? Clearly this is
only the tip of the iceberg. And for call-taking systems that are
likely to only be used once or twice, it isn't very helpful. What
ways might we make the user's life easier with those? Or with any
voice entry system?

Comments

10 May 2009 - 8:11pm
DampeS8N
2008

I'm not sure what all these links have to do with each other. Or what
they have to do with the topic. The last one, sure. But Morse Code is
difficult to learn and doesn't really offer anything to a modern
voice-based interface... Unless I'm really missing something.

And I understand the implications of tonal communication for
interspecies relations. I'm not sure what it would do for us here. That
was a fantastic movie, though. One of my all time favs. :)

Angel, I'm normally really on board with what you have to say. And
the things you offer have a fun way of not making any sense at first
and then slapping you in the face several weeks later at just the
right moment.

But I'm afraid you lost me, bro.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=41891

10 May 2009 - 9:21pm
Angel Marquez
2008

What are my options?

11 May 2009 - 10:12pm
Phillip Hunter
2006

William,

Actually, most of the ideas you mention are useful for, and being used
in, over-the-phone systems, which I've worked in for almost too long.
Part of the problem you, I, and others face in using these systems is
the highly imperfect application of expertise to the design issues:
GUI designers who don't understand the linguistics involved, speech
engineers and linguists who don't understand design, etc.

But, forgoing the rant, rest assured that synonyms are heavily used
and in the best systems are driven by the data of actual usage, and
that just-in-time contextual tips are a hallmark of good voice
design.

As you point out, if the system is called infrequently, the
individual user might not directly benefit, but later callers can.

Phillip


11 May 2009 - 10:29pm
Angel Marquez
2008