How many participants would you need to test to determine whether an interface component changed a user's behavior? Have you seen this happen?

16 Dec 2011 - 5:55pm

Hi everyone,

I'm curious to get others' thoughts on a project I am working on.

I need to test whether users would change their response depending on how the information is presented.  For example, a value could be presented as 5 buttons, where they can see all of them and click one, OR as a slider with tick marks, where the user selects one.

My questions are: 1. Do you think presenting the information in the two different ways could change what users would select? 2. How many participants would you use to check this theory?

Thanks and Happy holidays.   


17 Dec 2011 - 3:14am

The answer to your question about how many participants you would need to test depends on how confident you want to be in the results.  Considering confidence intervals and margin of error statistics, 10 participants might be enough, but you will still probably have a margin of error in the neighborhood of +/-20%.  It also depends on the results.  If you conclude that approach B resulted in different behavior in 8 of 10 participants, then you would be pretty confident that there is a difference between A and B, even with a 20% margin of error.  If you conclude that approach B resulted in different behavior in only 3 of the 10 participants, then you would have pretty low confidence that there is a difference between A and B.
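
As a rough illustration of the 8-of-10 versus 3-of-10 comparison, here is a minimal sketch that computes a 95% confidence interval for each observed proportion. The choice of the Wilson score interval and the function name `wilson_interval` are assumptions made for this example, not something stated in the reply.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed level).
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# The two outcomes mentioned above: 8 of 10 and 3 of 10 participants.
for successes in (8, 3):
    low, high = wilson_interval(successes, 10)
    print(f"{successes}/10: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval is roughly 49% to 94% for 8 of 10 and roughly 11% to 60% for 3 of 10, so the uncertainty at this sample size can be wider than +/-20%.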

It seems plausible that presenting the information in the two different ways could impact what people select.  But be careful to isolate the causal factors: what the default setting is (if there is one) will have an effect, and how accurate and responsive the slider is could also have an effect (as will the device people are using).  I am also assuming that you are keeping the same order and labels on the values.

18 Dec 2011 - 9:31pm
Dana Chisnell


  Well, it's all about context. What's the behavior you *want* people to engage in here? What do you want their next step to be? Where did they come from? So, think about what the success criteria would be -- that is, what do you want people to do? What's the usability / business goal here?  Without seeing the design and where in it this particular interaction fits, and without knowing what the application does and why users might be motivated to use it, I can't predict whether users would change what they select. 

  When you test it, you can have each person use both designs -- but change the order they use them in. So, every other person should start with the slider version. You probably want 8 people to test this. Though you could do it with 4, I'm assuming there will be other things in the design that you want to observe that may interact with this thing that you want to see, or that might muddy your data (or even create a confound).
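
The alternating order described above can be sketched in a few lines. The participant IDs, condition names, and the function `assign_orders` are hypothetical and only illustrate the idea.

```python
def assign_orders(participants, conditions=("buttons", "slider")):
    """Alternate which design each participant sees first."""
    first, second = conditions
    schedule = {}
    for i, person in enumerate(participants):
        # Even positions start with the first design, odd positions with the second.
        order = (first, second) if i % 2 == 0 else (second, first)
        schedule[person] = order
    return schedule

participants = [f"P{i}" for i in range(1, 9)]  # 8 hypothetical participants
for person, order in assign_orders(participants).items():
    print(person, "->", " then ".join(order))
```

With 8 participants, 4 start with each design, so any effect of seeing one version first is spread evenly across both.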

  You may find in reviewing the data after the test that changing users' behavior had nothing to do with this specific interaction element, but with a combination of factors: where the interaction is in the workflow; the motivation of the user for doing the behavior; what else is on the screen at the same time (or other possible distractions).

Good luck! 


20 Dec 2011 - 3:38am
Andy Polaine

I agree with what the others have said, but there is also a second point here that's important, which is the difference between qualitative and quantitative methods. If you want to find out why people have changed their behaviour (i.e., because of the way the information – actually it's an interface that you are describing, right? – was presented) and what they think of it, then take a small sample (10 is good, but even 6 or 7 would do) and delve deeper. Either ask them to think aloud and describe what they are thinking and doing, or video/screen capture them and get them to record a voice-over immediately afterwards, describing what they were thinking and doing while watching their own video back (this is a better method because the thinking aloud doesn't interfere with the original interaction).

If you just test 10 people, you can get a lot of information here, but it's qualitative. It can give you insights into your participants' understanding, motivations and behaviours, but you can't infer any statistical significance from it.  In this sense, I disagree with or would modify what Ken (lunytnz) said. You can make no judgements whatsoever about a margin of error – 20% +/- is an irrelevant issue with such a small sample size. You can only really generalise statistically if you take a quantitative approach with a larger sample size of people, which is what Google would do. But then this is still not going to be that useful - knowing that, say, 70% of people prefer one way the information is presented vs another doesn't shed any light on why. Maybe it's the layout of slider vs buttons, but maybe it's the size, typography, colour or simply the order you present the two options (which you would ideally randomise anyway).

It depends, as Dana says, on the business case and the environment you are working in and who you have to convince. To generalise – and slightly type-cast – a very engineering-led company is going to want you to try and find a "truth" rather than an insight. They're only going to be convinced if you can give them a solid number that proves your case. Other teams might be happier with the use-case insights and stories that you bring from smaller, qualitative testing. I think qualitative insights give you actionable results – that is, things that designers can actually do something with. Compare the results of "70% of people prefer option A" (quantitative) to "several people found the slider confusing or difficult to 'grab' onto with the mouse, like in this example here [show video]". The latter gives you something to work on and improve.

21 Dec 2011 - 9:28am
Sabrina Mach

Hi Tamella

If you want to find out if design A is better than design B then 4-10 people is probably not enough! 

There is a rule of thumb that 20 participants have a margin of error of 20%, 80 participants have a margin of error of 10%, and 320 participants have a margin of error of 5%. What this means is that if you test with 20 people and you find that 80% of participants succeed, and you then re-run the test with 20 people, your findings can lie anywhere between 60% and 100%. See Jeff Sauro's blog post:

This means for your A/B test with 20 people that, if 80% succeed on design A (confidence interval ranges from 60% to 100%) and only 50% succeed (confidence interval ranges from 30% to 70%) on design B, you will still have no idea which design works better, and you cannot be confident in those findings!
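
The rule of thumb quoted above (20, 80 and 320 participants) can be roughly reproduced with the usual normal-approximation margin of error at the worst case of a 50% success rate. This is a minimal sketch under that assumption; it is not necessarily how the cited blog post derives its figures, and `worst_case_margin` is a name made up for this example.

```python
import math

def worst_case_margin(n, z=1.96):
    """Margin of error for a proportion at p = 0.5 (normal approximation).

    z=1.96 corresponds to roughly 95% confidence (an assumed level).
    """
    return z * math.sqrt(0.25 / n)

for n in (20, 80, 320):
    print(f"n={n}: +/-{worst_case_margin(n):.1%}")
```

This gives about 22%, 11% and 5.5%, close to the 20/10/5 rule of thumb. It also shows why the sample sizes grow so quickly: halving the margin of error takes four times as many participants.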

To be able to confidently conclude that one design works better than another I would suggest you would want at least 100-200 people, per design condition (a total of 200 to 400 participants for the whole test). But it depends on how similar the results are for each test.
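
To see how the required sample size depends on how similar the results are, here is a rough sketch using the standard normal-approximation formula for comparing two independent proportions. The 5% significance level, the 80% power, and the function name `n_per_group` are assumptions chosen for illustration, not figures from this thread.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per design to detect p1 vs p2.

    Uses the two-sided normal-approximation formula for two independent
    proportions; alpha and power are assumed defaults.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_group(0.70, 0.50))  # success rates of 70% vs 50%
print(n_per_group(0.80, 0.50))  # success rates of 80% vs 50%
```

Under these assumptions, telling 70% from 50% takes about 91 participants per design, while the larger gap between 80% and 50% takes about 36. The closer the two designs perform, the more participants are needed.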

We carry out this type of research all the time, doing remote usability testing where it is easy to test with large numbers of people. You can get access to participants cheaply via online panels. We use them for some of our projects, and it only takes 1-2 days to collect all the data you need.

@Andy Polaine
Knowing 70% succeeded on design A and only 50% succeeded in design B holds great insights! You know you can bin design B.

Additionally, in the type of research we carry out, we don't just get the quantitative information, but also information on error rates (where and why people miss a step). This then helps you to optimise design A.

@Tamella, if you want to be sure that one design works better than another, you need large numbers of participants to test with. Otherwise you have no idea if the variance in behaviour is due to the people you sampled or if there is a real difference.

Good luck with your research!


Sabrina Mach 
Director at Webnographer



21 Dec 2011 - 11:14am
Andy Polaine

@Sabrina - you've made a good case for quantitative research. It still depends on the context of Tamella's project and, of course, budget and time. I'm not sure I agree that "Knowing 70% succeeded on design A and only 50% succeeded in design B" means you can bin design B. Not without knowing what we mean by "succeeded" – it's a very task-oriented way of looking at interaction. There are plenty of tasks I can succeed in doing, but the process of it is a teeth-pulling experience, hence the qualitative material from a small sample being more useful. Qualitative material from 200-400 people would be far too much to be useful.

I'm with Steve Krug on this kind of testing – you can do a lot with much, much less and it doesn't become a prohibitive process for all but the largest of projects. Like I said, it depends if you are looking for "truths" (which never really exist) or "insights".

21 Dec 2011 - 1:52pm
Larry Tesler

@Tamella - Why does it matter? Are you hoping to publish in a peer-reviewed psychology research journal? Could the choices that users make affect your client's revenues or profits? Is your client conducting an opinion poll? 

If different information presentations make little difference in user behavior, you'll need a large sample to measure the difference. But if there's such a small difference, quantifying it will only matter if it's an academic study or the financial stakes are huge. And if the financial stakes are huge, there will be better ways to improve results than the design of a choosing widget.

What the choices are also matters. Examples:

  • If the choices reflect users' opinions, then the wording of the question and its answers will make more difference than the widget that the user operates to choose an answer.
  • If the choices are ranges, and the user has a number in mind, the way the choices are presented could well affect what they choose. If there are radio buttons labeled 1-10 and 10-20, users who want to answer "10" will divide their responses between the two. If there's a slider that can stop anywhere, all users who want to answer "10" will drag it to the same spot.
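
The range example above (buttons labeled 1-10 and 10-20) can be checked mechanically. This is a small sketch, assuming the choices are inclusive integer ranges; `overlapping_values` is a hypothetical helper.

```python
def overlapping_values(ranges):
    """Return the integers that fall into more than one inclusive (low, high) range."""
    seen, duplicates = set(), set()
    for low, high in ranges:
        for value in range(low, high + 1):
            if value in seen:
                duplicates.add(value)
            seen.add(value)
    return duplicates

print(overlapping_values([(1, 10), (10, 20)]))  # labels from the example above
print(overlapping_values([(1, 10), (11, 20)]))  # non-overlapping alternative
```

The first call reports 10 as ambiguous, which is the value users would split between the two buttons. The second call reports nothing.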



1 Jan 2012 - 9:33pm

All great things to think about. Thank you for your thoughtful feedback.  This is essentially about polling users about things and getting their feedback on how much they like them.   My initial thought was that it would not matter if it were biased, because it would be biased consistently.  Thanks and happy new year!

