Article on Number of Usability Test Participants

1 Oct 2009 - 1:05pm
Chris Ryan
2004

I have been looking, unsuccessfully, through back issues of interactions magazine for an article, published a few years back, written I believe by someone from Microsoft as part of a debate about statistical significance in usability testing. There was something of a debate about testing with large numbers of users, and this article, as I recall, made an eloquent case for sticking to six to eight participants. Does anyone remember this? Perhaps I'm wrong in recalling that it was in interactions.

Comments

1 Oct 2009 - 2:26pm
Robert Wünsch
2009

Are you looking for that specific article or are you looking for arguments on your topic?

Because, if you google for your own topic, you'll find these 2 articles:
http://www.useit.com/alertbox/20000319.html
http://usability.gov/pubs/092006news.html

Both say much the same thing: 5 participants are good enough, and you don't need more than 8.

1 Oct 2009 - 1:50pm
bminihan
2007

Here's the link I've used before, from Jakob Nielsen. Argue his
credibility if you'd like, but in practice I've seen "testing a small number
of representative users" be as effective as "a lot of random users".

http://www.useit.com/alertbox/20000319.html

I haven't seen any justification that 5-6 users is statistically accurate,
according to strict mathematical rules, but for practical hands-on work, I'm
not entirely sure that's necessary, either. Our six-sigma folks at one
company argued heavily that we needed to test 10-20% of a 100K population
for statistical accuracy, to which I replied: And meanwhile, we'll test 6-8
folks from each core group and get back to you when we hire an army of
practitioners.

Bryan Minihan


1 Oct 2009 - 2:24pm
Angel Anderson
2010

Hi Ryan,
Are you perhaps thinking of Jakob Nielsen's rule of 5?
http://www.useit.com/alertbox/20000319.html

Kind regards,

Angel Anderson
Senior Interaction Designer
HUGE
----------------------------------
IxDA Los Angeles
----------------------------------
Email: angel.j.anderson at gmail.com
Twitter: AngelAnderson
Skype: AngelJAnderson


1 Oct 2009 - 2:25pm
Brooke Baldwin
2008

Chris,
I'm not sure I know the article you're asking about, but you can
also take a look at what Jeff Sauro (Oracle) has done and written
about statistical significance:

http://www.measuringusability.com/statistics.php

Scroll to the section at the bottom, 'Sample Size'.
Good luck


1 Oct 2009 - 3:19pm
Alan James Salmoni
2008

The number of participants you need to achieve statistical power will
depend upon the design of your study - which will be determined (in
large part) by the questions you are trying to answer. This assumes
you want statistical power, of course; many studies don't feel the
need for it.

Sorry it's not much help, but statistical questions rarely have
simple answers, IMHO.


1 Oct 2009 - 3:17pm
Chauncey Wilson
2007

Laura Faulkner has written a reasoned article on sample size. You can
find a copy at:

http://www.geocities.com/faulknerusability/Faulkner_BRMIC_Vol35.pdf

How many participants you need depends on a number of factors,
including the risk inherent in the product, the number of distinct
user groups, whether you are using the sample in many rounds of
iterative evaluation designed to filter out problems over the course
of the design cycle (formative versus summative), the complexity of
the UI, the number of possible paths, and so on.

If you look in the ACM Digital Library, you will find a number of
articles related to the number of participants.

Chauncey


1 Oct 2009 - 5:02pm
Steve Baty
2009

Sorry Bryan, but I need to call this out: "testing a small number of
representative users" as effective as "a lot of random users".

You give the impression that larger studies choose random users as test
participants. You'll find that testing sessions run to meet statistical
standards are required to select a representative sample in a highly
structured and formalised manner. They choose 'users at random'; they don't
choose random users. And the result is a much more rigorous representation
of your audience.

However, what happens on this large scale is not very different to what we
do on a small scale when choosing users from each persona. This is a type of
stratified random sample, and the way you select the representative from
each is likely to be a fairly random method.

None of which changes the point you were trying to make, which is that
smaller tests can be highly effective, and a much more efficient use of your
budget.

Regards
Steve


--
Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
stevebaty at meld.com.au | Twitter: docbaty | Skype: steve_baty | LinkedIn:
www.linkedin.com/in/stevebaty

1 Oct 2009 - 5:30pm
Charlie Kreitzberg
2008

Hi All:

I am not a mathematician but I have conducted many usability tests.
Sometimes clients have demanded large samples in tests that have spanned
multiple days. In my experience, this was not productive. I generally felt
that I learned everything I could from the first 6 or so users.

I've thought a lot about why this might be and would like to offer the
following thoughts...

Generally in usability testing (at least in formative testing) we are not
looking for statistical significance. Rather we are looking for problems to
address. We don't particularly care, for example, if 40% vs. 60% of users
make a particular error -- what is important is that we are seeing that a
problem exists so we can address it.

As a designer, I benefit most from the qualitative aspects of usability
testing. Often I find the metrics less useful, though they do play well
with management.

As I practice them, usability tests are deep structured interviews during
which I can observe behaviors against a controlled set of tasks and really
learn a lot about the user's mental models and where they clash with the
design. With this perspective I learn a lot from 6 users and usually test
8-10 just to make certain. But by the end of the tests I am hearing the same
things over and over again.

Similar debates have been part of social science for a long time. Much
scientific research is statistical (nomothetic) and relies on finding the
shared characteristics of a group. This is great for assessing the outcomes
of treatments but does not generate a lot of in-depth information. The
alternative is the case-study approach (idiographic research), which probes
individuals in depth.

I suspect that a lot of metrically-inclined people will disagree with me but
I find that thinking of usability testing as case-studies yields the most
information.

I might take a different position for a summative test whose purpose is to
demonstrate the usability of an entire product and not as a design tool.

Best,

Charlie

============================
Charles B. Kreitzberg, Ph.D.
CEO, Cognetics Corporation
============================


1 Oct 2009 - 6:32pm
dszuc
2005

Hi:

Testing with a smaller number can yield useful insights, and you can
use other portions of your budget to re-test what you have found
in a first round of testing. I've never understood the need to see
the same problem repeat over and over again when the money could be
better spent prioritizing it, mapping it against a business goal and
working out how and where to fix it.

My question is: Where does the question of statistical significance
in usability testing come from?

It seems that when we have faced this question from business, it's in
situations where the business:

* Is testing for the first time
* Knows little about usability/UX/iterative research
* Is trying to win an internal battle against another team (yikes!)
* Is carrying over the need for larger participant numbers from other
methods like surveys or focus groups (historical)
* Doesn't trust the results from a usability test (maturity)
* Left testing too late, so wants to test with larger numbers to cover
their behinds (political)
* Fill in your own :)

Something always scares me a little when we are asked the
"statistical significance" question while the same question is not
applied to other parts of the business. Perhaps the question comes
from a lack of understanding and maturity around what we do? I'd be
pleased to see this question disappear forever!

I suggest that by identifying where the question is coming from, we may
all get better at finding ways to inform and educate the business.

Thoughts?

rgds,
Dan


1 Oct 2009 - 7:18pm
bminihan
2007

You're right, Steve, and I agree.

I didn't mean to imply that a statistically accurate study is less
effective, or less rigorous. I ran such a survey for one company, and
we were very rigorous in ensuring our sampling was randomly
distributed, and had a lot of help from some brilliant statisticians
to ensure we picked the right people. We didn't learn much more than
we did from the first 30-odd participants. However, the weight of the
results meant much more to our executives, because the stats seemed
much more thorough to their thinking. I'd say I actually felt more
confident, knowing we had gone the extra mile.

Sorry about the slip. Just meant to say you can get pretty good
results with a small sampling, which is often as much as you need, and
as much as you have resources to test =]

Cheers =]

Bryan Minihan


1 Oct 2009 - 8:44pm
Will Hacker
2009

Chris,

There is not any statistical formula or method that will tell you the
correct number of people to test. In my experience it depends on the
functions you are testing, how many test scenarios you want to run
and how many of those can be done by one participant in one session,
and how many different levels of expertise you need (e.g. novice,
intermediate, and/or expert) to really exercise your application.

I have gotten valuable insight from testing 6-10 people for e-commerce
sites with fairly common functionality that people are generally
familiar with, but have used more participants for more complex
applications where there are different tiers of features that some
users rely on heavily and others never use.

I do believe that any testing is better than none, and realize you
are likely limited by time and budget. I think you can usually get
fairly effective results with 10 or fewer people.

Will


2 Oct 2009 - 2:32am
James Page
2008

It depends on how many issues there are, the cultural variance of your
user base, and the margin of error you are happy with. Five users, or even
10, is not enough on a modern, well-designed web site.

An easy way to think of a usability test is as a treasure hunt. If the
treasure is very obvious, you will need fewer people; if it is less obvious,
you will need more. If you increase the area of the hunt, you will also need
more people. Most advocates of testing with only 5 to 10 users draw their
experience from a single country, and behaviour changes significantly
country by country, even in Western Europe. See my blog post here:
http://blog.feralabs.com/2009/01/does-culture-effect-online-behaviour/

If your client website has 1 million visitors a year, a usability issue that
affects 10% of the users would be unlikely to be discovered in a test of
only 5 to 10 users, but would give 100,000 people a bad experience when they
visit the site.

Can you find treasure with only five or ten users? Of course you can. But
how sure can you be that you have found even the significant issues?

A very good argument for why 10 is not enough is Woolrych and Cockton (2001).
They point out a problem with Nielsen's formula: it does not take into
account the visibility of an issue. They show that using only 5 users can
significantly undercount even significant usability issues.

The following PowerPoint from an eyetracking study demonstrates the problem
with using only a few users:
http://docs.realeyes.it/why50.ppt

You may also want to look at the margin of error for the test that you are
doing.

All the best

James
blog.feralabs.com


2 Oct 2009 - 2:44am
Steve Baty
2009

"If your client website has 1 million visitors a year, a usability issue
that
effects 10% of the users would be unlikely to be discovered on a test of
only 5 to 10 users, but would give 100,000 people a bad experience when they
visit the site."

Actually, that's not true. You'd be fairly likely to discover it with only
5-10 users - in the 65%+ range of 'likely'. Manufacturing quality control
systems and product quality testing have been using such statistical methods
since the 1920s, and they went through heavy refinement and increasing
sophistication in the 1960s, '70s and '80s.

It's also worth repeating the message both Jakob & Jared Spool are
constantly talking about: test iteratively with a group of 5-10
participants. You'll find that 65%+ figure above rises to 99%+ in that case.
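
(For anyone who wants to check those figures, here is a minimal sketch of the
arithmetic, assuming the standard discovery model P(detect) = 1 - (1 - p)^n,
Nielsen's published average issue visibility of 31%, and the 10% incidence
from the example quoted above. Real iterative testing also fixes problems
between rounds, which this toy calculation ignores.)

    # Discovery model commonly cited in this debate: the chance that at least
    # one of n participants hits an issue that affects a proportion p of users.
    def p_detect(p, n):
        return 1 - (1 - p) ** n

    for p in (0.31, 0.10):  # 0.31 = Nielsen's average visibility; 0.10 = example above
        for n in (5, 10):
            print("p=%.2f, %2d users, one round: %3.0f%%" % (p, n, 100 * p_detect(p, n)))
        # Three independent rounds of 5 users compound the odds of seeing the
        # issue at least once (assuming the issue survives between rounds).
        missed = (1 - p_detect(p, 5)) ** 3
        print("p=%.2f, 3 rounds of 5 users:  %3.0f%%" % (p, 100 * (1 - missed)))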

Again, doesn't change your basic points about cultural diversity and
behaviour affecting the test parameters, but your above point is not
entirely accurate.

Cheers
Steve



2 Oct 2009 - 3:59am
William Hudson
2009

Chris -

I wrote an article on this topic for the SIGCHI Bulletin (while it was
still a printed publication sent to all SIGCHI members). It's at
http://www.syntagm.co.uk/design/articles/howmany.htm

Regards,

William Hudson
Syntagm Ltd
Design for Usability
UK 01235-522859
World +44-1235-522859
US Toll Free 1-866-SYNTAGM
mailto:william.hudson at syntagm.co.uk
http://www.syntagm.co.uk
skype:williamhudsonskype



2 Oct 2009 - 5:24am
James Page
2008

Steve,

The real issue is that the example I have given is that it is over
simplistic. It is dependent on sterile lab conditions, and the user
population been the same in the lab and in the real world. And there only
being one issue that effects 10% of the user population. One of the great
beauties of the world is the complexity and diversity of people. In the
sterile lab people are tested on the same machine (we have found machine
configuration such as screen size has a bearing on behaviour), and they
don't have the distractions that normally effect the user in the real
world.

> Actually, that's not true. You'd be fairly likely to discover it with only
> 5-10 users - in the 65%+ range of 'likely'.

For 5 users that is only 41% (1 - (1 - 0.1)^5), and for 10 it is 65%. This is
far off from Nielsen's figure that 5 users will find 84% of the issues
(1 - (1 - 0.31)^5).

If I were manufacturing cars and there was a 45% chance that 10% of my cars
would leave the production line with a fault, there is a high chance that
consumers would stop buying my product, the company would go bust, and I
would be out of a job. From my experience of production lines, a sample size
of 10 for a production run of one million units would be considered extremely
low.

We have moved a long way since 1993, when Nielsen and Landauer's paper was
published. The web was not around, and the profile of users was very
different; the web has changed that. We will need to test with more people
as website traffic increases and we get better at web site design. For
example, if we assume that the designers of a web site have been using good
design principles and an issue therefore affects only 2.5% of users, then 10
users in a test will discover that issue only 22% of the time. But using our
1-million-visitors-a-year example, that issue will mean 25,000 people
will experience problems.
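
(As a rough check of that 2.5% example - the incidence figure is an assumption
carried over from the paragraph above, not a measurement - here is a small
sketch of the same model, which also shows how many participants you would
need to be reasonably confident of seeing such a low-incidence issue at least
once.)

    import math

    # Chance that at least one of n participants hits an issue affecting a
    # proportion p of users, and the smallest n that reaches a target chance.
    def p_detect(p, n):
        return 1 - (1 - p) ** n

    def users_needed(p, target):
        return math.ceil(math.log(1 - target) / math.log(1 - p))

    print("10 users, 2.5%% issue: %.0f%% chance of seeing it" % (100 * p_detect(0.025, 10)))
    print("users needed for an 85%% chance: %d" % users_needed(0.025, 0.85))
    print("users needed for a 95%% chance: %d" % users_needed(0.025, 0.95))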

But we do agree that each population needs its own test, and I totally
agree that testing iteratively is a good idea.

@William -- Woolrych and Cockton's 2001 argument applies to simple task-based
tests. See http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf

All the best

James
blog.feralabs.com

PS (*Disclaimer*): Because I believe usability testing needs not just to be
more statistically sound, but also able to test a wide range of users from
different cultures, I co-founded www.webnographer.com, a remote usability
testing tool. So I am an advocate for testing with more geographically
diverse users than in normal lab tests.


2 Oct 2009 - 6:20am
Steve Baty
2009

James,

Excellent points.

Nielsen argues that 5 users will discover 84% of the issues; not that the
likelihood of finding a particular issue is 84% - thus the discrepancy in
our figures (41% & 65% respectively).

(And I can't believe I'm defending Nielsen's figures, but this is one of his
better studies) The results from '93 were re-evaluated more recently for
Web-based systems with similar results. There's also some good theory on
this from sociology and cultural anthropology - but I think we're moving far
afield from the original question.

Regarding the manufacturing reference - which I introduced, granted - units
tend to be tested in batches for the reason you mention. The presence of
defects in a batch signals a problem and further testing is carried out.

I also like the approach Amazon (and others) take in response to your last
point, which is to release new features to small (for them) numbers of users
- 1,000, then 5,000 etc - so that these low-incidence problems can surface.
When the potential impact is high, this is a really solid approach to take.

Regards
Steve



2 Oct 2009 - 6:51am
Thomas Petersen
2008

"It's also worth repeating the message both Jakob & Jared Spool are
constantly talking about: test iteratively with a group of 5-10
participants. You'll find that 65% figure above rises to 99% in
that case"

I find this an absurd statement. The above can only have some merit
if we are talking about the actual product being tested.

If we are talking about wireframes or any other stand-ins for the real
thing, whatever you find will have very little, if anything, to do with
what you find in the end.

The real issues arise after the launch, not before, and the real
question is not how many participants, but at what point participants
should be used.


2 Oct 2009 - 7:05am
Steve Baty
2009

I'm not sure I understand your line of reasoning, Thomas. What issues are we
identifying in the wireframes if not those same issues that might otherwise
make it through into the final product? Certainly at a different level of
detail; and definitely our early tests aren't able to show up everything;
but that hardly makes it an absurd statement.



2 Oct 2009 - 7:36am
James Page
2008

Steve,

Woolrych and Cockton argue that the discrepancy lies in Nielsen's constant of
.31: Nielsen assumes all issues have the same visibility. And we have not
even added the extra dimension of the evaluator effect :-)

Do you have a reference for the more recent paper? I would be interested in
reading it.

On the manufacturing side, most of the metrics use a margin of error. With
just 10 users your margin of error will be about +/-35% (a very rough
calculation). That is far better than no test, but it would still be
considered extremely low in a manufacturing process.
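
(For a back-of-the-envelope version of that +/-35% figure, here is a small
sketch using the simple Wald interval for an observed success or failure
rate. With 10 users and a worst-case observed rate of 50% it comes out near
+/-31%; adjusted intervals for small samples, such as the adjusted Wald
interval Jeff Sauro has written about, shift the numbers slightly but tell
the same story.)

    import math

    # Half-width of the 95% Wald confidence interval for a proportion
    # (observed rate p_hat from a sample of n participants).
    def wald_margin(p_hat, n, z=1.96):
        return z * math.sqrt(p_hat * (1 - p_hat) / n)

    for n in (5, 10, 20, 100):
        print("n=%3d: +/- %.0f%%" % (n, 100 * wald_margin(0.5, n)))  # worst case: 50% observed rate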

In anthropology, most of the papers I have read use far greater sample sizes
than just 10. Yes, it depends on the subject matter. The anthropologist will
use techniques like working with informants, which increases the number of
participants, and the anthropologist is studying the population over months
if not years, so there are far more observations.

@thomas: testing the wireframe will only show up what is already visible in
it. But if a feature has an issue, and that feature is implemented in the
wireframe, then a test will show it up. Discovering an issue early is surely
better than later. I think your point reinforces the idea that testing
frequently is a good idea.

All the best

James
blog.feralabs.com


2 Oct 2009 - 7:55am
Steve Baty
2009

James,

More good points. I did some calculations a while back on the confidence
intervals for pass/fail user tests -
http://www.meld.com.au/2006/05/when-100-isnt-really-100-updated - the more
interesting part being the link to a paper on estimators of expected values.
Worth a read if you haven't seen it.
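
To make the pass/fail arithmetic concrete, here is a minimal Python sketch (an
illustration, not the calculation from the linked article) of a Wilson score
interval, which behaves better than the plain Wald interval at the small
sample sizes usability tests tend to use:

    import math

    def wilson_interval(successes, n, z=1.96):
        """95% Wilson score interval for an observed pass/fail proportion."""
        p_hat = successes / n
        denom = 1 + z**2 / n
        centre = (p_hat + z**2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
        return centre - half, centre + half

    # 8 of 10 participants completed the task: the "80% success" headline
    # actually spans roughly 49% to 94% at 95% confidence.
    low, high = wilson_interval(8, 10)
    print(f"observed 80%, 95% CI roughly {low:.0%} to {high:.0%}")

With only 10 participants the interval spans tens of percentage points either
way, the same ballpark as the rough +/-35% margin of error quoted elsewhere in
this thread.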

I'll try to dig up the more recent paper - working from memory on that one.

Regarding the anthropology & sociology references - I was referring more to
the notion of uncovering societal norms rather than the specific 'supporting
a sample size of x'.

Coming back to your first point: yes, the use of .31 is a simplification for
the sake of one of his free articles; it's a modal figure based on (his words)
a large number of projects. So, looking at a range of figures, you would have
some projects where more users were needed (to your earlier point), and in
some cases, a few, you could get away with fewer (although I admit that using
fewer than 5 participants causes me some concern).
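
The model behind these figures is simple enough to check directly. A minimal
Python sketch of the 1-(1-p)^n discovery formula, using 0.31 (the modal
figure) alongside the lower visibilities discussed in this thread:

    def p_discovered(p, n):
        """Chance that at least one of n participants hits an issue that each
        participant has probability p of encountering."""
        return 1 - (1 - p) ** n

    for p in (0.31, 0.10, 0.025):
        for n in (5, 10, 15):
            print(f"visibility {p:6.1%}, {n:2d} users -> found {p_discovered(p, n):5.1%}")

    # visibility 31.0%,  5 users -> found 84.4%   (Nielsen's ~85%)
    # visibility 10.0%,  5 users -> found 41.0%
    # visibility 10.0%, 10 users -> found 65.1%
    # visibility  2.5%, 10 users -> found 22.4%

Whether 5 users are "enough" therefore turns almost entirely on what you
assume the visibility p to be, which is exactly the point Woolrych and Cockton
raise.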

Anyway, enjoying the discussion, and I still think we're violently in
agreement on the basic point :)

Cheers
Steve

2009/10/2 James Page <jamespage at gmail.com>

> Steve,
>
> Woolrych and Cockton argue that the discrepancy is Nielsen's constant of
> .31. Nielsen assumes all issues have the same visibility. We have not even
> added the extra dimension of the evaluator effect :-)
>
> Do you have a reference for the more recent paper? I would be interested in
> reading it.
>
> On the manufacturing side most of the metrics use a margin of error. With
> just 10 users your margin of error will be about +/-35% (a very rough
> calculation). That is far better than no test, but a sample that small would
> be considered extremely low in a manufacturing process.
>
> In anthropology most of the papers I have read use far greater sample sizes
> than just a population of 10. Yes, it depends on the subject matter. The
> anthropologist will use techniques like working with informants, which
> increases the number of participants. And the anthropologist is studying the
> population over months if not years, so there are far more observations.
>
> @thomas testing the wireframe will only reveal what is already visible in
> it. But if a feature has an issue, and it is implemented in the wireframe,
> then a test will reveal it. Discovering an issue early is surely better than
> later. I think your statement reinforces the idea that testing frequently is
> a good idea.
>
> All the best
>
> James
> blog.feralabs.com
>
>
> 2009/10/2 Steve Baty <stevebaty at gmail.com>
>
>> James,
>>
>> Excellent points.
>>
>> Nielsen argues that 5 users will discover 84% of the issues; not that the
>> likelihood of finding a particular issue is 84% - thus the discrepancy in
>> our figures (41% & 65% respectively).
>>
>> (And I can't believe I'm defending Nielsen's figures, but this is one of
>> his better studies) The results from '93 were re-evaluated more recently for
>> Web-based systems with similar results. There's also some good theory on
>> this from sociology and cultural anthropology - but I think we're moving far
>> afield from the original question.
>>
>> Regarding the manufacturing reference - which I introduced, granted -
>> units tend to be tested in batches for the reason you mention. The presence
>> of defects in a batch signals a problem and further testing is carried out.
>>
>> I also like the approach Amazon (and others) take in response to your last
>> point, which is to release new features to small (for them) numbers of users
>> - 1,000, then 5,000 etc - so that these low-incidence problems can surface.
>> When the potential impact is high, this is a really solid approach to take.
>>
>> Regards
>>
>> Steve
>>
>> 2009/10/2 James Page <jamespage at gmail.com>
>>
>>> Steve,
>>>
>>> The real issue is that the example I have given is oversimplistic. It is
>>> dependent on sterile lab conditions, on the user population being the same
>>> in the lab and in the real world, and on there being only one issue that
>>> affects 10% of the user population. One of the great beauties of the world
>>> is the complexity and diversity of people. In the sterile lab people are
>>> tested on the same machine (we have found machine configuration such as
>>> screen size has a bearing on behaviour), and they don't have the
>>> distractions that normally affect the user in the real world.
>>>
>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>> only 5-10 users - in the 65%+ range of 'likely'.
>>>>
>>> For 5 users that is only 41% (1-(1-0.1)^5), and for 10 it is 65%. This is
>>> far off from Nielsen's figure that 5 users will find 84% of the issues
>>> (1-(1-0.31)^5).
>>>
>>> If I were manufacturing cars and my testing gave only a 45% chance of
>>> catching a fault that affects 10% of the cars leaving the production line,
>>> there is a high chance that consumers would stop buying my product, the
>>> company would go bust, and I would be out of a job. From my experience of
>>> production lines, a sample size of 10 for a production run of one million
>>> units would be considered extremely low.
>>>
>>> We have moved a long way since 1993, when Nielsen and Landauer's paper was
>>> published. The web was not around, and the profile of users was very
>>> different. The web has changed that. We will need to test with more people
>>> as website traffic increases and we get better at web site design. For
>>> example, if we assume that the designers of a web site have been using good
>>> design principles and therefore an issue affects only 2.5% of users, then
>>> 10 users in a test will discover that issue only 22% of the time. But using
>>> our 1 million visitors a year example, that issue will mean that 25,000
>>> people will experience problems.
>>>
>>> But we do agree that each population needs its own test. And I totally
>>> agree that testing iteratively is a good idea.
>>>
>>> @William -- Woolrych and Cockton's 2001 argument applies to simple
>>> task-based tests. See
>>> http://osiris.sunderland.ac.uk/~cs0awo/hci%202001%20short.pdf
>>>
>>> All the best
>>>
>>> James
>>> blog.feralabs.com
>>>
>>> PS (*Disclaimer*) Due to my belief that usability testing needs not just
>>> to be more statistically sound but also to be able to test a wide range of
>>> users from different cultures, I co-founded www.webnographer.com, a remote
>>> usability testing tool. So I am an advocate for testing with more
>>> geographically diverse users than normal lab tests allow.
>>>
>>> 2009/10/2 Steve Baty <stevebaty at gmail.com>
>>>
>>> "If your client website has 1 million visitors a year, a usability issue
>>>> that
>>>> effects 10% of the users would be unlikely to be discovered on a test of
>>>> only 5 to 10 users, but would give 100,000 people a bad experience when
>>>> they
>>>> visit the site."
>>>>
>>>> Actually, that's not true. You'd be fairly likely to discover it with
>>>> only 5-10 users - in the 65%+ range of 'likely'. Manufacturing quality
>>>> control systems and product quality testing have been using such statistical
>>>> methods since the 20's and they went through heavy refinement and
>>>> sophistication in the 60's, 70's and 80's.
>>>>
>>>> It's also worth repeating the message both Jakob & Jared Spool are
>>>> constantly talking about: test iteratively with a group of 5-10
>>>> participants. You'll find that 65%+ figure above rises to 99%+ in that case.
>>>>
>>>> Again, doesn't change your basic points about cultural diversity and
>>>> behaviour affecting the test parameters, but your above point is not
>>>> entirely accurate.
>>>>
>>>> Cheers
>>>> Steve
>>>>

--
Steve 'Doc' Baty | Principal | Meld Consulting | P: +61 417 061 292 | E:
stevebaty at meld.com.au | Twitter: docbaty | Skype: steve_baty | LinkedIn:
www.linkedin.com/in/stevebaty

2 Oct 2009 - 7:23am
Anonymous

Thomas Petersen said:
> If we are talking wireframes or any other replacements for the real
> thing whatever you will find have very little if anything to do with
> what you find in the end.

Hi, Thomas,

Are we talking about design issues or defects? Apologies if I totally
misread you, but it sounds like you're talking about defects.

I've run into that misconception a few times lately--that usability testing
is an extension of quality assurance, intended to surface bugs or defects in
the product. In reality, usability testing is best suited for sussing out
problems with the strategic level of the design--are mental models
appropriate and intuitive enough that people can easily complete the
principal tasks associated with the product. And testing wireframes or
prototypes is a fantastic way to flush out mental model problems at an early
enough stage that course correction is financially feasible (not so if the
first usability test occurs when the product is thought to be completed).

To find the tactical level issues of implementation--the kind that you'd
find after launch--you need a robust QA process. Usability testing is a poor
substitute for quality assurance.

Will Sansbury

2 Oct 2009 - 8:34am
James Page
2008

Totally agree with your article....

> So you can get a much narrower range for your estimate, but 30+ users is a
> significant undertaking for a usability test.
>
One of our own findings from a study was that people got bored with testing
more than about 8 users.

James


2 Oct 2009 - 9:12am
Dana Chisnell
2008

On Oct 2, 2009, at 9:34 AM, James Page wrote:

> Totally agree with [Steve's] article....
>
>> So you can get a much narrower range for your estimate, but 30+
>> users is a
>> significant undertaking for a usability test.
>>
> One of our own findings from a study was that people got bored with
> testing
> more than about 8 users.
>
> James
>
I've found this with teams, too. Jared Spool calls this reaching the
point of least astonishment, and I think he's right. After you start
seeing similar problems repeat a few times, it's enough to know you
have a problem to solve, you've learned a ton about users, and it's
time to go make some inferences about what the issues are and iterate the
design. For most formative usability tests -- that is, usability tests
early in the design cycle, where the team is still testing out ideas --
having more than 5-10 participants is just punishing for the team.

Instead, learn about users, see what they do with your design, and
move on to learn more on another round.

Dana

:: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: ::
Dana Chisnell
415.519.1148

dana AT usabilityworks DOT net

www.usabilityworks.net
http://usabilitytestinghowto.blogspot.com/

2 Oct 2009 - 12:02pm
Adam Korman
2004

There are a couple of points I wanted to follow up on in this
discussion:

Will Sansbury talked about how usability testing is not meant as a
replacement for QA. I think this is a really important point --
usability testing isn't a good way to measure (or improve) product
quality, but it is a good way to find out if you built the wrong
thing. In this context, terms like "sample size" and "margin of
error" are just not that meaningful.

My practical experience has been that usability testing with just a few
participants usually uncovers enough issues to keep the development
team plenty busy. If you test with 5 people, 80% of them encounter a
bunch of the same issues, and it takes the team several weeks to fix
those issues, what good does it do to keep running the same test on
another 25+ people to identify additional issues that only 10% will
encounter and that the team doesn't have the capacity to work on? As
Steve Baty said, it's much more effective to test iteratively with small
numbers than to run big, infrequent studies.
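
The diminishing returns can be put in rough numbers with the same 1-(1-p)^n
model quoted earlier in the thread; the 0.31 visibility below is Nielsen's
modal figure, used here only as an assumption:

    def found(p, n):
        # share of issues of visibility p expected to be seen by n participants
        return 1 - (1 - p) ** n

    p = 0.31  # assumed per-participant visibility (Nielsen's modal figure)
    for n in (5, 10, 20, 30):
        marginal = found(p, n) - found(p, n - 1)
        print(f"{n:2d} users: {found(p, n):6.1%} of such issues found; "
              f"the last participant added {marginal:5.1%}")

Several small rounds, with fixes in between, give roughly the same coverage as
one large study while letting the team act on what each round finds.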

On Oct 2, 2009, at 4:51 AM, Thomas Petersen wrote:

> If we are talking wireframes or any other replacements for the real
> thing whatever you will find have very little if anything to do with
> what you find in the end.

I basically agree with this, except I would say that testing
wireframes isn't really usability testing.

-Adam

2 Oct 2009 - 12:14pm
Adam Korman
2004

I just re-read this and want to clarify what I meant by "...usability
testing isn't a good way to measure (or improve) product quality..." I
meant this in the sense that it's an inefficient way to find defects
in the execution, but a good way to find defects in the decision
making (it's broken vs. it doesn't make sense).


2 Oct 2009 - 2:42pm
Thomas Petersen
2008

I have made this point before.

I really don't in general see the usage of testing during the design
process.

I see great benefit in testing before starting on the actual design
process in order to figure out what kind of problems, issues and
tasks users want. But testing usability in an environment that is not
final is IMO a waste of both time and money. Only if we are dealing
with entire new paradigms do I see any reason to test.

Most people who call them selves either information architects or
UX'ers or designers should be able to deliver their part without
needing to involve the users once the problems, tasks and purpose
have been established.

It is my claim that you can't really test usability before you
launch the final product and that you should factor this in instead.
I find the current state of UCD troubling to say the least.

Jakob Nielsen is to me someone to read to get an understanding of
users in general.

But I just need to look at his website and then look around at other
sites and applications to understand that his work, as great as it is,
is only a fraction of the whole story.


2 Oct 2009 - 1:11pm
Katie Albers
2005

My comments are interleaved...

Katie Albers
katie at firstthought.com

On Oct 2, 2009, at 12:42 PM, Thomas Petersen wrote:

> I have made this point before.
>
> I really don't in general see the usage of testing during the design
> process.

Well, that's unfortunate.

> I see great benefit in testing before starting on the actual design
> process in order to figure out what kind of problems, issues and
> tasks users want. But testing usability in an environment that is not
> final is IMO a waste of both time and money. Only if we are dealing
> with entire new paradigms do I see any reason to test.

I'm not entirely sure what you mean by paradigms in this context.
Perhaps you mean a function we've never seen before? In any case, you
will generally find that very few users want problems or issues. They
want functions. They want to be able to find those functions, and
perform them with minimal exertion. And that's why we test.

> Most people who call them selves either information architects or
> UX'ers or designers should be able to deliver their part without
> needing to involve the users once the problems, tasks and purpose
> have been established.

Of course, they can, as long as they have the users' input. What
appears to be a completely reasonable process, or an obvious button,
or a clear name to someone working on the creation of an interface is
likely to turn out to be obscure, hard to follow or incomprehensible
when you put it in front of actual users. I suspect that everyone who
tests throughout the process has had the experience of a test in which
the "perfect element" turns out to be something that *none* of the
users gets.

> It is my claim that you can't really test usability before you
> launch the final product and that you should factor this in instead.
> I find the current state of UCD troubling to say the least.

Can you test the usability of the product? No. You don't have a
finished product. But you can test all the elements that are going in
to the product. If no one notices the critical button on the second
step even though your visual designer went to great lengths to
position it and color it and so forth, precisely to make it obvious,
it's better to know that before you've built an entire product that
relies on users pressing that button.

> Jakob Nielsen is to me someone to read to get an understanding of
> users in general.
>
> But I just need to look at his website and then look around at other
> sites and applications to understand that his work, as great as it is,
> is only a fraction of the whole story.

Jakob's site is built to highlight Jakob's group's expertise. It does
so admirably. To generalize from that very particular example to "what
Jakob thinks all sites should be like" is foolish in the extreme.

As for the rest of your statement here: Of course it's only a fraction
of the story. But it is a piece of the story. Testing as you go is a
central tenet of all aspects of development. Software developers test
pieces of their code to make sure they do the right thing. Design
engineers test screens to make sure that everything shows up properly
and in the correct space. UXers test the aspects and versions of the
product to make sure they are producing the desired results.

In each of these cases the goal is the same: it's a lot cheaper to
find something wrong in a piece earlier in the process and correct it
then than it is to have to go back and redevelop the whole product to
set things right that you should have corrected months ago. It's like
building a house on an improperly laid foundation. It's cheaper to fix
the foundation alone than it is to fix the whole house.

2 Oct 2009 - 3:52pm
Thomas Petersen
2008

"Well, that's unfortunate. "

Not really.

"I'm not entirely sure what you mean by paradigms in this context.
Perhaps you mean a function we've never seen before? In any case,
you will generally find that very few users want problems or issues.
They want functions. They want to be able to find those functions,
and perform them with minimal exertion. And that's why we test."

Who talks about wanting problems? They HAVE problems/issues and you
need to understand what those are.

"Of course, they can, as long as they have the users' input. What
appears to be a completely reasonable process, or an obvious button,
or a clear name to someone working on the creation of an interface is
likely to turn out to be obscure, hard to follow or incomprehensible
when you put it in front of actual users. I suspect that everyone who
tests throughout the process has had the experience of a test in which
the "perfect element" turns out to be something that none of the
users gets. "

Which might just as well be a problem of testing an unfinished
product. Nonetheless, personally I have found much better value in
testing the actual product/service rather than a pseudo scenario.

It seems that many UCD proponents completely ignore how big an impact
the actual real environment has on the experience of usability and
are more interested in the process leading up to the design and
development.

A button might not make sense when you experience it on a screen, but
if it's experienced in the actual context things often change quite
drastically. A rollover, other choreography, or a well designed
layout can make all the difference.

"But you can test all the elements that are going in to the product.
If no one notices the critical button on the second step even though
your visual designer went to great lengths to position it and color
it and so forth, precisely to make it obvious, it's better to know
that before you've built an entire product that relies on users
pressing that button."

You are assuming that when the visual designer "goes to great
lengths" they don't understand anything about usability in general;
otherwise the above example is absurd.

Why should the user know better where the button should be
positioned?

It is obvious that if you really were in such a situation, where a
button you went to great lengths to position and highlight still
doesn't do the trick, you are dealing with a completely different
problem that has nothing to do with asking the users, but rather with
doing A/B tests to figure out where you have the most success.

"Jakob's site is built to highlight Jakob's group's expertise. It
does so admirably. To generalize from that very particular example to
"what Jakob thinks all sites should be like" is foolish in the
extreme."

When did I say that Jakob Nielsen said anything about how all sites
should look? Can you at least respond to what I write instead of
creating claims I never made?

"In each of these cases the goal is the same: It's a lot cheaper to
find something wrong on a piece or earlier in the process and correct
it then than it is to have to go back and redevelop the whole product
to set things right that you should have corrected months ago. "

All that would make sense if testing would rid us of bad
products/services. Yet what often happens is that the process becomes
such a piece of committee work that it loses clarity and focus. UCD
is not by any means insurance against bad feature decisions; it's not
even insurance against bad usability.

"It's like building a house on an improperly laid foundation. It's
cheaper to fix the foundation alone than it is to fix the whole
house."

It's nothing at all like building a house, since building a house
doesn't mean having the users of the house test the foundation. They
wouldn't know the difference most of the time. That is why you have
experts with experience who know what they are doing.


2 Oct 2009 - 6:21pm
Todd Warfel
2003

On Oct 2, 2009, at 12:42 PM, Thomas Petersen wrote:

> I really don't in general see the usage of testing during the design
> process.

Whoa! Red flag alert!

Usability testing helps evaluate a design concept that tries to
address a design problem. That testing can be a baseline test,
something you do on a production system, or it can be used as a
validation mechanism on a newly proposed design/prototype.

To think that usability testing is only useful for finding problems or
holes in a current production system, but not in your proposed design
solution, is short-sighted. Any given problem has multiple design
solutions. How do you know you've selected the right one?

> I see great benefit in testing before starting on the actual design
> process in order to figure out what kind of problems, issues and
> tasks users want. But testing usability in an environment that is
> not final is IMO a waste of both time and money. Only if we are
> dealing with entire new paradigms do I see any reason to test.

It's not April fools and this isn't the Onion, but... ;)

It can be an exploration technique (this is one of the ways we use it)
to find out what users/consumers want, but that's really more
exploratory research than usability. Usability is more about
identifying whether or not the product/service meets the needs of the
user/consumer, enables them or impedes them, and gives them a
satisfying experience. Those measures apply to any system, production
or prototype.

> Most people who call them selves either information architects or
> UX'ers or designers should be able to deliver their part without
> needing to involve the users once the problems, tasks and purpose
> have been established.

Big mistake in doing this. That's how we got into the problem in the
first place. Someone designed the system w/o inviting users to kick it
around for a test drive. How do you know it wasn't a designer who did
it in the first place?

We do usability testing as part of our design process and as a
separate service offering to our clients. I can say that in both
cases, when we've designed something or our clients have designed
something, we find opportunities for improvement through testing.

Thinking that because you're a designer you know the right design, you
have the right decision, and it doesn't need validation is arrogant,
short-sighted, and ignorant. The best designers and the best systems
use a validation and feedback loop. Usability testing is one of those
feedback loops that's really important.

> It is my claim that you can't really test usability before you
> launch the final product and that you should factor this in instead.
> I find the current state of UCD troubling to say the least.

The current state of UCD is troubling, I'll agree with that, but it's
because so many people in charge of designing systems are leaving out
validation. The attitude that it's only good for finding problems on
existing production systems and not validating your proposed solution
is only going to make that worse. I'm a bit shocked, frankly, that you
don't see the flaw in the claim that "you can't really test usability
before you launch the final product."

Perhaps your definition of usability testing needs to be tested?

Cheers!

Todd Zaki Warfel
Principal Design Researcher
Messagefirst | Designing Information. Beautifully.
----------------------------------
Contact Info
Voice: (215) 825-7423
Email: todd at messagefirst.com
AIM: twarfel at mac.com
Blog: http://toddwarfel.com
Twitter: zakiwarfel
----------------------------------
In theory, theory and practice are the same.
In practice, they are not.

2 Oct 2009 - 10:51pm
Jared M. Spool
2003

On Oct 2, 2009, at 12:42 PM, Thomas Petersen wrote:

> I have made this point before.
>
> I really don't in general see the usage of testing during the design
> process.

Yah.

It didn't make any sense then. Still doesn't.

3 Oct 2009 - 12:30am
Mark Schraad
2006

I am dumbfounded... wow.

On Oct 2, 2009, at 12:42 PM, Thomas Petersen wrote:

> I really don't in general see the usage of testing during the design
> process.

3 Oct 2009 - 12:54am
David Drucker
2008

Talking to users, testing prototypes (paper, screen, etc.) and
analyzing their feedback teaches a designer what they don't know about
the problem at hand. To ignore these is to proceed at your own peril.

"The wise man knows he doesn't know."
- Lao Tzu

>
>
> On Oct 2, 2009, at 12:42 PM, Thomas Petersen wrote:
>
>> I really don't in general see the usage of testing during the design
>> process.

3 Oct 2009 - 2:30am
Thomas Petersen
2008

"Talking to users, testing prototypes (paper, screen, etc.) and
analyzing their feedback teaches a designer what they don't know
about the problem at hand. To ignore these is to proceed at your own
peril."

I am all for talking to users, and I am all for analyzing their feedback.

I just don't believe it should be done in the middle of the process
but rather before (user research) and after (analyzing actual user
behavior).

Just insisting that usability testing is necessary does not make it
so.


2 Oct 2009 - 11:01pm
Anonymous

Thomas Petersen <tp at hellobrand.com> wrote:
> Most people who call them selves either information architects or
> UX'ers or designers should be able to deliver their part without
> needing to involve the users once the problems, tasks and purpose
> have been established.

This is akin to handing a contractor the well defined blueprints for
your dream home, and then not seeing it for the first time until after
the movers have already arranged the furniture.

Will Sansbury

3 Oct 2009 - 6:17am
Elizabeth Buie
2004

At 12:42 PM -0400 10/2/09, Thomas Petersen wrote:

>I see great benefit in testing before starting on the actual design
>process in order to figure out what kind of problems, issues and
>tasks users want. But testing usability in an environment that is not
>final is IMO a waste of both time and money. Only if we are dealing
>with entire new paradigms do I see any reason to test.
>
>Most people who call them selves either information architects or
>UX'ers or designers should be able to deliver their part without
>needing to involve the users once the problems, tasks and purpose
>have been established.

Tell me you're being facetious. Please.

Elizabeth

--
Elizabeth Buie
Luminanze Consulting, LLC
www.luminanze.com
@ebuie

3 Oct 2009 - 6:48am
Thomas Petersen
2008

"This is akin to handing a contractor the well defined blueprints for
your dream home, and then not seeing it for the first time until after
the movers have already arranged the furniture."

Since when has this been a question of taste?

No, what UCD does is tell the customer that as long as you solve the
problems in the blueprint, you have solved most issues.


3 Oct 2009 - 6:59am
Paul Bryan
2008

The state of UCD and the overall usefulness of design testing are
fascinating topics, but I'd like to return to the original topic
of usability testing, sample size and statistical significance,
because I think it is relevant in these times of tight research
budgets.

Research methods like usability testing are not quantitative or
qualitative in and of themselves. It's the manner in which the
data is collected and analyzed that makes the results either
quantitative or qualitative. You can have quantitative usability
testing or user interviews, and you can have qualitative surveys.
(More on this at: http://www.virtualfloorspace.com/?p=22)

The companies I work with would find it financially impractical to
undertake a statistically valid usability test, because of the
resources required to operationalize the concept of usability into
quantifiable variables that can be consistently and reliably
measured, and to engage a sample large enough to reach a satisfactory
confidence interval. A company like Microsoft, on the other hand, with
products that last for many years in a consistent form, and millions
of users performing repetitive operations, could get value from
quantitative usability testing.
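
As a rough sketch of why the numbers get impractical, here is the standard
sample-size formula for estimating a proportion (say, a task success rate) to
a given margin of error; the margins below are illustrative assumptions, not
figures from any particular study:

    import math

    def sample_size(margin, p=0.5, z=1.96):
        """Participants needed so a 95% interval around a proportion is no
        wider than +/- margin (worst case p = 0.5)."""
        return math.ceil(z**2 * p * (1 - p) / margin**2)

    for margin in (0.20, 0.10, 0.05):
        print(f"+/-{margin:.0%} margin of error -> about {sample_size(margin)} participants")
    # +/-20% -> about 25, +/-10% -> about 97, +/-5% -> about 385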

The web sites I conduct usability testing for are large scale
e-commerce sites. They are trying to do something different and new
with every major release, and the usability of the site design will
have a dramatic impact on the bottom line. So they agree to user
testing at reasonable intervals to discover challenges that people
who know nothing about web site design may have, people who are in
their underwear at 2 a.m. buying a pair of shoes online or a new
appliance to replace one that broke down.

It's possible that genius designers are so in tune with their
customers that they don't need to run their designs at
successive stages of fidelity by a sample of customers to gain a
better understanding of how they will interpret and respond to new
interactive features, the kinds of supporting content they need, the
points in the process when they are likely to stop and consult
discussion boards or chat, etc. etc., but I haven't met these
designers yet.

In qualitative research, regardless of data collection method, sample
selection and size are always part science and part art. The science
part uses an understanding of different types of samples for
qualitative research and how to ensure that you are seeing a broad
enough range of people based on their variance along key dimensions
relevant to the site you are testing. A good source for this type of
information is Qualitative Evaluation Methods, by Michael Patton.

The art is that an experienced design researcher can estimate the
variability they are likely to see for a given system and set of user
segments, and balance that with the research goals and budget to
designate a sample size that is likely to result in enough repetition
to give the team confidence in the results. To publish a paper about
this number of participants and have people apply it to their
projects without understanding the impact of different design
variables, different goals, different user segment characteristics,
etc., is to sell your audience a bill of defective goods.

Paul Bryan
Usography (www.usography.com)
Linked In: http://www.linkedin.com/in/uxexperts

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=46278

3 Oct 2009 - 11:48am
Jared M. Spool
2003

[Ok. I started to write a simple post about how you need to talk about
what you want to learn from your study before you can ask about number
of participants, but then it evolved into this 1200+ word history
lesson. I left that part in, but you can skip to the very end to see
my point. - Jared]

We're talking about usability testing as if it's a single procedure
that everyone performs exactly the same way. (This is not the only
problem with this thread, but it's the one that I won't be all snarky
about.)

As a field, we're not very good about making sure everyone knows about
our history. In the case of usability testing methods, history is
pretty important.

-> BACKGROUND - Skip to the next section if you just want to get to
the IMPORTANT RELEVANT STUFF

The first usability tests (as we know them today) were conducted by
cognitive psychologists in the 1970s. (You can trace usability testing
back to time-and-motion studies in the '20s and '30s, but I don't
think we need to go back that far for this conversation.)

When the cog. psychs. were testing, they were using the test
methodology as a technique to understand human behavior and cognition:
how did people react to stimuli (physical and digital)? They were
looking at reaction times, memory, motor response, and other basics. A
lot of this work was being done at universities and corporate research
labs, like Bell Labs and Xerox PARC. NASA, DARPA, and the DOD were
also involved. (Interestingly, they all discovered a lot of stuff that
we take for granted today in design -- back then it was all new and
controversial, like Fitts's Law.)

In the late '70s, early '80s, we started applying usability testing
into engineering processes. I was part of one of the first teams (at
Digital Equipment Corporation) to use usability tests in the process
of developing products. Engineering teams at IBM, HP, WANG, Boeing,
Siemens, GTE, and Nortel were doing similar things. (I'm sure there
were others that I've forgotten or didn't know about.)

At DEC, the first engineering uses of usability testing were for
either research-based prototype evaluation or very late-stage product
defect determination. Meanwhile, John Gould and his team at IBM
published a seminal paper about using an iterative process for
designing a messaging system at the 1984 Summer Olympics. Jim
Carroll's team were using testing methods for understanding
documentation needs in office systems. Ron Perkins & co at WANG were
doing similar things. Industrial design groups at many companies were
using usability testing for studying behavioral responses and
ergonomic constraints for interactive system devices.

It was still a few years until we saw labs at companies like
Microsoft, Word Perfect, and Apple. By the time they'd gotten
involved, we'd evolved many of the methods and protocols to look at
the design at a variety of points throughout the development process.
But the early testing methods were too expensive and too time
consuming to effectively use within the engineering practice. It was
always a special case, reserved for the most important projects.

All of these studies involved laboratory-based protocols. In the very
late '80s and early '90s, many of us pushed for laboratory-less
testing techniques, to lower the costs and time constraints. We also
started experimenting with techniques, such as paper prototypes, which
reduced the up-front cost of building the design to test it.

Others, such as those behind the participatory design movement in
Scandinavia and the ethnographic/contextual design methods emerging in
the US and central Europe, were looking at other methods for gleaning
information. (This is when Jakob started popularizing Discount
Usability Engineering, which had a huge impact on the adoption of the
techniques within the design process.)

Today, we see that the cost of conducting a usability test has dropped
tremendously. When I started in the '70s, a typical study would easily
cost $250,000 in today's dollars. Today, a team can perform an
eight-participant in-person study for much less than $5,000, and remote
methods are even cheaper.

-> IMPORTANT RELEVANT STUFF (in case you decided to skip the BACKGROUND)

All this is relevant to the conversation, because usability testing
has morphed and changed in its history. When we used it for scientific
behavioral and cognitive studies, we needed to pay close attention to
all the details. Number of users was critical, as was the recruiting
method, the moderation protocols, and the analysis methods. You
couldn't report results of a study without describing, in high detail,
every aspect of how you put the study together and came to your
conclusion. (You still see remnants of this today in the way CHI
accepts papers.)

When we were using it for defect detection, we needed to understand
the number of users problem better. That's when Nielsen & Landauer,
Jim Lewis, Bob Virzi, and Will Schroeder & I started looking at the
variables.

But we've moved past defect detection for common usage. And in that
way, usability testing has morphed into a slew of different
techniques. As a result, the parameters of using the method change
based on how you're using it.

Today, the primary use is for gleaning insights about who our users
are and how they see our designs. It's not about finding problems in
the design (though, that's always a benefit). Instead, it's a tool
that helps us make decisions in those thousands of moments during the
design process when we don't have access to our users.

Sitting next to a single user, watching them use a design, can be, by
itself, an enlightening process. When we work with teams who are
watching their users for the first time (an occurrence that happens
way too often still), they come out of the first session completely
energized and excited about what they've just learned. And that's just
after seeing 1 definitely-not-statistically-significant user.

Techniques like usability testing are used today to see the design
through the eyes of the user. Because a lot of hard work has been done
through the years to bring the costs of testing down significantly, we
can use it in this way, which was never possible back when I started
in this business.

But, there are uses of usability testing that still need to take
sample size into account. For example, when we conduct our Compelled
Shopping Analysis, we typically have 50 or more participants in the
study. (The largest so far had 72 participants in the main study with
12 pilot/rehearsal participants to work the bugs out of the
protocols.) These studies are very rigorous comparisons of multiple
aspects of live e-commerce sites and we need to ensure we're capturing
all the data accurately. Interestingly, we regularly find show-
stopping design problems in the last 5 participants that weren't seen
before in the study.

-> MY POINT (finally)

So, usability testing has evolved into a multi-purpose tool. You can't
really talk about the minimum number of participants without talking
about how you want to use the tool. And you can't talk about how you
want to use the tool without talking about what you want to learn.

If you just want to gain insights about who your users are and how
they'll react to your design ideas, you only need a small number (1-5)
to get really interesting, great insights. Other techniques (such as 5-
second tests, defect detection, Compelled Shopping, Inherent Value
studies) require different numbers of participants.

And the different techniques also require different recruiting
protocols, different moderating protocols, and different data analysis
protocols. So, if we're talking about the number of participants, we
need to talk about those differences too.

Hopefully, that will clear all this up. If you want to ask about the
number of participants, tell us first about what you hope to learn.

Jared

4 Oct 2009 - 4:40am
James Page
2008

Jared,

I enjoyed your post, and it is interesting how there was a paradigm shift
from large to small studies. Surely the web's advent in the late '90s means
that the techniques developed in the late '80s and early '90s need updating,
to leverage the technology change that has happened since then. Do we need a
new paradigm shift?

Of course the number of participants depends on what you want to learn. If
you are interested in just one user, one user is enough. But for a website
aimed at a diverse selection of users, does that hold true? Doesn't the
number of users boil down to a cost issue?

You argue that techniques like usability testing are used today to see the
design through the eyes of the user. For that we do need more than 5 or 10
users, as the diversity, background, frequency of use, and experience of
users have changed significantly since the days of the mainframe computers
that the discount methods were designed around.

When Jakob Nielsen and others like yourself came up with the discount
methods in the 1980s, most use of a computer was by people who were trained
to do a task, which they did frequently. (I wonder whether your own website
sees more users in a year, from more countries, than ever used one of the
systems you originally worked on, like Digital Equipment Corporation's
PDP-10?)

An example of this: before Internet flight-booking systems came around in
the late '90s, it took months of training to be able to book a flight on a
computer. Now nobody trains to be able to book a flight or a hotel. Times
have changed significantly since the discount methods were developed.

The issue I have with testing with just a few users is that it can exclude a
significant issue.
Nielsen makes a claim that his useit site might look awful, but that it is
readable, which is not the case for me. I am dyslexic, and I find
Nielsen's useit website hard going, because he uses very wide column widths.
(I can read a narrow column twice as fast as I can read a wide one.) Now,
the chance that one of the participants was dyslexic when he tested the site
with only 8 people would be low. But there are still many millions of us. If
he had either used the heuristics from magazine or newspaper design, or
tested the site with a decent sample size, he would know that he had an
issue. Or maybe the reason he did not discover the issue is that when he
built the site in 1995 screen sizes were smaller and therefore the columns
were too, but times have changed and he needs to retest.

Nielsen's constant of .33, used to show that 5 users are enough, assumes
that 33% of the test participants will experience a given issue. My guess is
that between 5% and 10% would experience the column-width issue, but I may
be wrong, and that is why testing is important.
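
To make the arithmetic concrete, the usual problem-discovery formula is
1 - (1 - p)^n: the chance that at least one of n participants hits an
issue that affects a proportion p of users. A quick sketch in Python
(the function and the comparison values are just my illustration, not
anything Nielsen published beyond the .33 figure):

# Chance of seeing an issue at least once among n participants,
# assuming each participant independently hits it with probability p.
def chance_of_seeing(p, n):
    return 1 - (1 - p) ** n

for p in (0.33, 0.10, 0.05):
    for n in (5, 8, 50):
        print("p=%.2f  n=%2d  ->  %3.0f%%" % (p, n, 100 * chance_of_seeing(p, n)))

With p = 0.33, five participants give roughly an 86% chance of seeing the
issue at least once; at p = 0.05 that drops to about 23%, which is why a
rare-but-real problem can sail through a small test.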

If Nielsen only tests with 5 or 10 people, he has no way of knowing whether
this is an issue he needs to fix. Does it only affect me in the United
Kingdom, or are there many more people who have an issue with it? I am sure
that Nielsen is a very busy person, so is it worth his effort to fix the
issue? I have heard he built the site himself. If he solves the problem, how
will that affect the users who are used to the original design? Testing with
only eight people makes it hard to construct an argument.

By only using a few people for user research in one location, are you not
excluding a significant number of your sites audience?

All the best

James
blog.feralabs.com

My disclaimer is that I co-founded www.webnographer.com, an online usability
testing tool. The reason my partner and I have sweated hours developing the
tool and developing remote methods is that we believe usability testing
needs to become cheaper, and to test a more diverse selection of users than
current methods do.


4 Oct 2009 - 3:20pm
Samantha LeVan
2009

In the corporate world, rarely do we have the budget and time to test
a website or app with hundreds, if not thousands, of users. What
matters most is deciding what you need to learn from a study - how
many critical tasks should be evaluated, is it a comparison study,
etc. Then you can start with a smaller sample of users and, if
necessary, add more.

Rarely do I find a need to have more than five participants per task
(in many cases, they complete multiple task workflows). After five, I
see the patterns. I see the critical issues. Then I make my
recommendations and move on. If there is obvious inconsistency, I
continue to evaluate until the glaring issues are exposed. It works.
It's quick. And it's cheap.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=46278

4 Oct 2009 - 2:01pm
Jared M. Spool
2003

On Oct 4, 2009, at 5:40 AM, James Page wrote:

> The issue I have with testing with just a few users is that it can
> exclude a significant issue.

James,

I think that's the major flaw in your thinking. You're trying to use
usability testing primarily for issue detection and it's a very
inefficient tool for that.

> Nielsen makes a claim that his useit site might look awful, but that
> it is readable, which is not the case for me. I am dyslexic, and
> I find Nielsen's useit website hard going, because he uses very wide
> column widths.

I too am dyslexic, but the column widths aren't the big issue I have
with Jakob's site. The big issue I have is his content.

> By only using a few people for user research in one location, are
> you not excluding a significant number of your site's audience?

Yes.

Which is why using usability testing as a sole source for issue
detection will inevitably fail.

There's no way you could put together a cost-effective study (even
with super-duper remote testing applications) that would include
participants representing every possible variance found in humans.

By trying to use usability testing in this way, you're creating a
final-inspection mentality, which Deming and the world of statistical
quality control have taught us (since the '40s) is the most expensive
and least reliable way of ensuring high quality. Issues will be missed
and users will be less satisfied using this approach.

Instead, a better approach is to prevent the usability problems from
being built into the design in the first place. Jakob shouldn't need
to conduct usability tests to discover that longer column widths could
be a problem for people with reading disabilities. In fact, those of
us who've paid attention to the research on effective publishing
practices have known for a long time that shorter columns are better.

Larger sample sizes, even when the testing is dirt cheap, are too
expensive for finding problems like this. We need to shift away from
the mentality that usability testing is a quality-control technique.

Because of this, we've found in our research that teams get the most
value from usability testing (along with the other user research
techniques) when they use it to inform their design process. By
getting exposure to the users, the teams can make informed decisions
about their design. The more exposure, the better the outcomes of the
designs.

To research this, we studied teams building a variety of online
experiences. We looked for correlations between those teams' user
research practices and how effective the team was at producing great
designs. We looked at the range of techniques they employed, whether
they hired experienced researchers, how many studies they ran, how
frequently the studies were run, and about 15 other related variables.

We found that many of the variables, including the nature of the
studies (lab versus field, for example) or number of study
participants did not correlate to better designs.

More importantly, we found that 2 key variables did correlate
substantially to better designs: the % of hours of exposure each team
member had to primary observation and the frequency of primary
observation.

This led us to start recommending that teams try to get every team
member exposed to as many hours of observing users as possible
throughout the design process. The minimum we're recommending is 2
hours of observation every 6 weeks. The best teams have their team
members observing users for several hours every week or so.

Based on our research, we can confidently predict that having each
team member watch two users for two hours every 3 weeks will result in
a substantially better design than hiring the world's most experienced
user researchers to conduct a 90-participant study that none of the
team members observe.

So, number of participants in the study is a red herring. The real
value is number of hours each team member is exposed to users.

That's my opinion, and it's worth what you paid for it.

Jared

p.s. Is Webnographer an unmoderated remote usability testing tool? It
occurred to me this morning that it would be great to combine
unmoderated remote usability testing with eye tracking. Then we could
throw out all the data in a single step, instead of having to ignore
it piecemeal. A huge step forward in efficiency, I would think.

Jared M. Spool
User Interface Engineering
510 Turnpike St., Suite 102, North Andover, MA 01845
e: jspool at uie.com p: +1 978 327 5561
http://uie.com Blog: http://uie.com/brainsparks Twitter: @jmspool

5 Oct 2009 - 2:29am
Harry Brignull
2004

A number of the discussions on this list are reminding me of the Ron
Jeffries article "We tried baseball and it didn't work" -
http://xprogramming.com/xpmag/jatBaseball

In other words, I can't help wondering whether discussions about methods
being great or rubbish boil down to past experiences with a method rather
than the inherent qualities of that method. In my next piece of research,
I'm going to do remote unmoderated usability testing alongside classic
face-to-face usability testing. Unfortunately I can't share the findings -
another core problem with this sort of discussion - we are stuck in
vagueness because NDAs prevent us from sharing findings the way academics can.

I don't think I agree with Jared's conclusion about throwing out eye
tracking and unmoderated usability testing (in the absence of more evidence,
at least). Eye tracking is inherently expensive, but I suspect remote
unmoderated usability testing has the potential to bring affordable
usability testing to the masses.

Anyone else care to comment?

5 Oct 2009 - 6:56am
Dana Chisnell
2008

On Oct 5, 2009, at 3:29 AM, Harry wrote:

> I suspect remote
> unmoderated usability testing has potential to bring affordable
> usability
> testing to the masses.

I think that remote unmoderated usability testing can, on its surface,
be very affordable. There are plenty of services out there that make
it quite accessible to anyone who wants to use it. There are several
important considerations that I can think of in going with remote
unmoderated tests:

- By its nature, it must be a summative, validating test of a pretty
solid design. Doing remote, unmoderated tests of an early design or a
prototype of any sizable design is dangerous. If your customer pool is
large enough to do remote, unmoderated tests, you probably need to be
doing A-B testing.

- The test itself must be very, very well designed to get the data out
of it that will help teams make decisions that are as well informed as
they get out of doing live, in-person testing. It must be much better
designed than a live, in-person, moderated test, because there's no
opportunity after it is out there to clarify questions or ask follow-
up questions. Or to actually see or hear what happened, thus
demystifying responses.

- The UI of the remote testing tool must not add a burden for the
participant on top of whatever trouble they're having using the thing
being tested. *That* needs to be tested, too. Otherwise, your results
may be muddled, conflated, or invalid.

- You have to rely on participants being good writers if you ask them
to offer up comments on what they're having problems with. Many who
answer the intercept for remote unmoderated tests simply are not
willing to invest the time. And you have no way to ask them follow-up
questions to clarify.

Dana

:: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: ::
Dana Chisnell
415.519.1148

dana AT usabilityworks DOT net

www.usabilityworks.net
http://usabilitytestinghowto.blogspot.com/

6 Oct 2009 - 2:09am
Jonas Söderström
2009

Jared,

loved your post about getting the team to observe users, instead of doing traditional usability tests. (Loved the way you presented the same thoughts at IA Summit in Miami last year, too, btw!)

Let's say we're developing a new version of an existing service. Based on the insights from your research - what do you think would be the best strategy?

To stick with letting the team watch users use the existing version - and thus, over the project, collect richer and richer real experience, and trust that the team's design skills will provide us with good solutions for the new version?

Or should we make the users try our gradually developed prototypes of the new product, in session after session?

BTW, do the successful teams require their team members to document their observations of users? Or is it more efficient to let them use this input and the insights in an informal way?

Jonas Söderström
senior information architect
Sweden

--------------------------------------------------------
For the lesson lies in learning and by teaching I'll be taught
for there's nothing hidden anywhere, it's all there to be sought
- Keith Reid
---------------------------------------------------------


6 Oct 2009 - 11:17am
Adrian Howard
2005

Hi Jonas,

On 6 Oct 2009, at 08:09, Jonas Söderström wrote:

> Jared,
>
> loved your post about getting the team to observe users, instead of
> doing traditional usability tests. (Loved the way you presented the
> same thoughts at IA Summit in Miami last year, too, btw!)
>
> Let's say we're developing a new version of an existing service.
> Based on the insights from your research - what do you think would
> be the best strategy?

I'm not Jared - but my personal experience would be to avoid this:

> To stick with letting the team watch users use the existing version
> - and thus, over the project, collect richer and richer real
> experience, and trust that the team's design skills will provide us
> with good solutions for the new version?

and do this:

> Or should we make the users try our gradually developed prototypes
> of the new product, in session after session?

There's only so much information you can get from the product as it
stands. Once you start changing the design some of that information
becomes invalid. Maybe v2.0 does make doing Foo much easier, but has
it made doing Bar much harder? You need to validate the new design
decisions that you're making - and learn from the feedback.

Discovering more information about the old product is going to become
less and less useful as the old and new products diverge.

> BTW, do the successful teams require their team members to document
> their observations of users? Or is it more efficient to let them use
> this input and the insights in an informal way?

I personally find informal mechanisms to be much more effective.
However, unless you can spend significant amounts of time with the
team, you may need to fall back on more formal communication
mechanisms. That said, I find informal techniques so much more
effective that I'd fight quite hard to change the environment so I can
use them :-)

Cheers,

Adrian
--
http://quietstars.com - twitter.com/adrianh - delicious.com/adrianh

7 Oct 2009 - 4:34am
Harry Brignull
2004

Dana,
I'm interested in a point you made earlier in this thread: "Doing remote,
unmoderated tests of an early design or a prototype of any sizable design is
dangerous. If your customer pool is large enough to do remote, unmoderated
tests, you probably need to be doing A-B testing."

Why dangerous? I'm intrigued...

thanks

Harry

7 Oct 2009 - 10:36am
Dana Chisnell
2008

The point of doing usability testing is to get data on which to base
design decisions. Early in a design, you're looking a lot at *why*
people are doing what they're doing, not just what they're doing or
whether they can use it. If you're not observing the interaction
somehow, all you have to rely on is what people tell you in their
written comments. The problem with that is that most people are not
very good writers. And you can't ask them follow-up questions if the
test is remote and unmoderated.

If what you're working on is a mature design with a very large user
base and you want to learn whether subtle changes have an effect, then
that's where to go with A-B testing. Amazon does this. They make a
small change, release it to 5,000 people, and then turn it off and go
look at the data. They keep doing this until they get the effect they
want. But they also have millions of people using their site every day.
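
The mechanics are simple enough to sketch. Roughly (the bucket share,
visitor IDs, and metric below are made up for illustration, not Amazon's
actual setup):

import hashlib

# Route a small, stable share of visitors to the variant; leave the rest
# on the control. Then compare a success metric between the two buckets.
def bucket(visitor_id, variant_share=0.05):
    h = int(hashlib.sha256(visitor_id.encode()).hexdigest(), 16)
    return "variant" if (h % 10000) < variant_share * 10000 else "control"

def conversion_rate(outcomes):
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

results = {"control": [], "variant": []}
for visitor_id, converted in [("v1", True), ("v2", False), ("v3", True)]:
    results[bucket(visitor_id)].append(converted)
print({name: conversion_rate(flags) for name, flags in results.items()})

Hashing the visitor ID keeps the assignment stable, so the same person
always sees the same version while the test is running.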

Dana

:: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: :: ::
Dana Chisnell
415.519.1148

dana AT usabilityworks DOT net

www.usabilityworks.net
http://usabilitytestinghowto.blogspot.com/


7 Oct 2009 - 11:40am
Larry Tesler
2004

I've been running usability studies since 1974. In 1980-81, the
strategy I settled upon was to continue adding testers (qualified
subjects) until I had learned nothing new from two people in a row.
Depending on the circumstances, that point was usually reached after
4-8 tests, in the same range that other usability professionals have
discovered.
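
That stopping rule is easy enough to sketch in code; here is a minimal
illustration (the issue labels are invented - only the two-in-a-row
threshold comes from the rule above):

# One set of observed issue labels per qualified subject, in the order run.
# Stop once two consecutive sessions teach us nothing new.
def test_until_saturated(sessions, streak_needed=2):
    seen = set()
    streak = 0
    count = 0
    for count, findings in enumerate(sessions, start=1):
        new_findings = findings - seen
        seen |= findings
        streak = 0 if new_findings else streak + 1
        if streak >= streak_needed:
            break
    return count, seen

sessions = [{"ambiguous copy"}, {"ambiguous copy", "hidden control"},
            {"slow search"}, set(), set()]
print(test_until_saturated(sessions))  # stops after the 5th subject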

Often, an early tester--sometimes the very first--would encounter a
problem such as ambiguous or misleading copy that would obviously
affect a lot of users. We'd fix such problems immediately, often
before the next tester arrived.

Since that time, I haven't seen many situations where most serious
problems weren't exposed by a well-designed and carefully observed
study of 4-8 qualified subjects, people who would be likely users of
the software with typical levels of prior experience and knowledge.

Problems encountered too rarely to detect with a small number of
testers can still be serious. But most such problems are more
effectively discovered in other ways, such as analysis of customer
service and usage logs.

Errors with profound consequences, such as physical injury or
substantial economic loss, require different approaches, including
careful walkthroughs with security experts, large beta trials in which
adverse consequences are artificially limited, and exposure of test
subjects to intentionally risky scenarios within simulated environments.

Larry Tesler

7 Oct 2009 - 12:36pm
Harry Brignull
2004

Larry, I couldn't agree more. I'm pretty much born and bred in your approach
to design research, but I guess I'm just keen to learn for myself whether
I'll get any distance out of remote UT with prototypes. Because the costs
are low, the benefits don't need to be huge for it to be a valuable adjunct
to traditional face-to-face research.
For example, I find that though I'm often personally "sold" on a finding,
when it comes to evaluating the cost of implementation, suddenly having only
1/5 or 2/5 users' worth of evidence can raise question marks. I suspect that
having data like "40/50 users tested could complete the given task" / "XX/50
rated it as satisfactory or better" / "XX/50 indicated they understood the
core product concept" / etc. - these kinds of broad findings, however foggy,
would still be worth having.
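
For what it's worth, the extra precision is real but modest. A quick sketch
(my own numbers, using a standard Wilson score interval rather than anything
from this thread) of what 2/5 versus 40/50 buys you:

import math

# Approximate 95% confidence interval for a completion rate of s out of n.
def wilson_interval(s, n, z=1.96):
    p = s / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

for s, n in ((2, 5), (40, 50)):
    low, high = wilson_interval(s, n)
    print("%d/%d completed -> plausibly %.0f%% to %.0f%% of users"
          % (s, n, 100 * low, 100 * high))

So 2/5 is consistent with anything from roughly 12% to 77% of users
completing, while 40/50 narrows that to roughly 67% to 89% - still foggy,
but arguably enough to carry an argument about implementation cost.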

I vaguely recall, in another thread, Jared mentioning that user researchers
at Google sometimes use eye tracking specifically because it's appealing to
the engineers (who hold a lot of political
clout)... i.e. they choose a method that produces evidence that's most
compelling to their audience; as well as using a method that is most
effective at generating insights for them.

From this perspective I also think remote UT may prove useful for people in
certain political situations.

Still, it's different strokes for different folks!

Harry

7 Oct 2009 - 10:51pm
Jared M. Spool
2003

On Oct 7, 2009, at 1:36 PM, Harry wrote:

> I vaguely recall, in another thread, Jared mentioning that user
> researchers
> at Google sometimes use eye tracking specifically because it's
> appealing to
> the engineers (who hold a lot of political
> clout)... i.e. they choose a method that produces evidence that's most
> compelling to their audience; as well as using a method that is most
> effective at generating insights for them.

What I specifically said was that Google's engineers prefer to attend
usability sessions where the eye tracker is employed. I didn't say
that Google used any of the data from the eye tracker in any
meaningful way, just that the little blue dot apparently is attractive
to their engineers. Sorta like good Chinese food.

Jared
