Labs/Ubiquity/Usability/Usability Testing/How To
The following is my attempt to save others time with a workflow and actionable advice on conducting guerrilla usability testing.
Like paper prototyping, this document is purposefully sloppy to encourage harsh critiques. Please iterate, test, and contribute, whatever your level of expertise.
Designing the Test
Traditional psychological testing has a pilot study and a final study. Because usability testing differs in size, recruitment, etc., we designed the format by breaking the study into multiple chunks, analyzing the whole process after completing each chunk, and then deciding what's worth changing.
Instead of a single pilot study the testing is broken into multiple rounds that grow exponentially and become more refined. Each round goes from test to completed deliverable, after which the facilitator (that's you) evaluates the test, the workflow, and the goals of the test and decides what is worth changing and what is not. This prevents unexpected problems from infecting other test sessions, allows for improvements to the workflow, and caps testing to the minimum number of participants.
The study is designed to have three phases: Alpha, Beta, and Gamma. Although presented linearly, jumping around and using Alpha tests during a Gamma phase is a sign of higher intelligence : ) We will walk through an actual study and explain each type of test as we go.
Example study: Video Vs. Written Documentation
Alpha Phase
(1 test with 1-2 participants max)
Alpha tests evaluate:
- the usability of a test
- the technical feasibility of testing the product
- and generate evidence of the worthiness of potential test data.
Spend a bare minimum of time and effort making and producing alpha tests; however, test on actual likely participants. Instead of producing a single basic video and an HTML page and testing, a considerable amount of time was spent on the following example materials: researching the topic, full-blown mockups, 3 videos, team meetings, getting permission, etc.
<video type="vimeo" id="2901416" width="437" height="315" desc="1x Video Original Videos" frame="true" position="center"/>
There is a problem in our first test above, what is it?
The devilish part is that the more familiar you are with the software involved, the harder the problem is to see. Two other programmers, four usability researchers, the project mailing list, and blog readers all failed to see that a video object steals keyboard focus from the browser, intercepts the Ubiquity hotkey (Option+Spacebar), and interprets it as pressing spacebar to pause the movie.
This problem nuked the study for about 6 months. Beyond saving time, imagine having your manager, professor, or client fail this test during your proposal meeting. An alpha study with likely participants would have avoided all of this.
Beta Phase
(Multiple tests with 2-3 participants each)
Beta tests are pilot studies of 2-3 participants that establish:
- that your measurements actually measure what you are testing
- how to efficiently administer the test
- how to efficiently process the material: editing video, encoding errors into spreadsheets, and creating deliverables.
If you have not done a usability study before, fully compile your data and publish your results, whether that be by video posts or by outlining the final report and filling in what data you have. When done, evaluate how to improve the test itself (consent form, video capture software, format, error spreadsheet, etc.) and your post-production workflow (video encoders, video hosting, blogging, etc.).
Publishing the tests now will also encourage buy-in from team members and management, solicit valuable feedback, and generally help your usability cause. If possible, have a stakeholder watch the test and perform the sticky-note exercise. If you are working remotely and using video tagging, assign a video to each stakeholder.
Coworkers may want to fix something, a boss may want to add additional goals. These are all good signs of usability buy-in. While respecting time constraints, try to implement their suggestions and any improvements to the test or post-production work flow.
Gamma Phase
(1 or more tests with 3-6 participants each)
The gamma phase is really just a name for when you stop assessing your workflow and testing suite and just rip through test sessions and post production like a Ninja.
Unless you are testing a complex interface (MS Office or an e-commerce website) or have a demographic quota to fill, you probably don't need to perform any gamma tests. While statistical significance is important, the general rule of thumb is to stop testing when you begin to see patterns emerge. When sessions become predictable, you have probably run into all the walls users can run into. Stop, fix the product, and start a new test.
Test Types
Exploratory
This is the most commonly used format for usability testing. It is more accurately described as pseudo-scientific: interesting results are gained, but there is no variable that changes.
Generally these are run in an "interview" format, where there is an overall goal and possibly sub-tasks, but the user drives the interaction. These are great for early alpha-stage products. They allow developers to see fundamental problems with the product, see what users like and understand, and generally give guidance to the overall development of the product.
Just because exploratory studies are more qualitative doesn't mean that all rules go out the window. Choosing probable end users, keeping the setting as informal as possible, participation from the development team, error statistics, etc. are all still very important in getting valuable information.
Strictly Scientific
This is when two groups of users are given the same task but two different interfaces. For websites this can happen sequentially or concurrently, and it makes up the majority of the usability work that major websites perform.
In a usability lab this type of test isn't much different from exploratory tests other than in scope. Users are more restricted, given only one goal instead of multiple goals. The number of participants doubles, and often grows much more if you are trying to ferret out subtler results.
These tests rely more heavily on error counts than on qualitative observations, so make sure that your alpha and beta phases really polish that process.
Prototype Types
Paper
Paper prototyping is, by far, the most powerful Usability technique. It's best to think of paper prototyping as the step in-between sketching a design and creating a mockup on the computer. It has all the advantages of sketching (cheap, fast to make, easy to change), helps users open up to criticizing the UI, and it's instantly interactive.
Part of the reason paper prototyping isn't used is that people think there must be some magic to it, some step they haven't thought about. The whole thing is really as simple as it sounds: draw an interface and ask people to use it.
<video type="youtube" id="GrV2SZuRPv0" width="437" height="315" desc=" Original Video" frame="true" position="center"/>
The only time that paper prototyping isn't very useful is when the interaction is unrestricted or response times are important. If you are trying to understand how users interact with a command line (unrestricted input) and that interaction depends on very fast speeds (like auto-suggest) it may be hard to get good information.
Unfortunately, if you have to share your designs with a remote team, most people do not give early design prototypes the respect they deserve. Unless you have a touch-screen computer, sending the sketches to remote teams can be inconvenient and under-valued.
- UIE has some great tips on paper prototyping and information on why it's so powerful. [1]
- A great write-up and walkthrough of paper prototyping the hipster PDA. [2]
- And, of course, Wikipedia's article.[3]
Wireframes
Wireframes are usually lightweight mockups meant for testing layout and specifying features. They have varying degrees of functionality and generally are left as bare as possible. These are generally used for web posts and pitching ideas.
In order from fastest to slowest (which also tends to be from least interactive to most interactive):
- Diagram programs (OmniGraffle, Visio)
- Raster graphics editors (Photoshop, Gimp, etc)
- Fireworks, Flash, and other design products that have animation can be used for somewhat interactive prototypes.
- Pencil and other GUI toolkits
Full-blown
Web pages are simple, but add some JS and you get a fully interactive prototype.
On the desktop side, there are too many GUI IDEs to list here. Of course we like Mozilla's Prism, but we STRONGLY caution against using actual implementations for early-stage experimentation. You will get much more done with multiple quick paper or Canvas prototypes before moving on to a full IDE.
Recruiting & Interacting with participants
Recruiting participants should be fun and entertaining, so try and have fun with it : )
Choose a place.
Get the most informal setting you can and don't worry about a usability lab. This is known as in-field testing and it makes people act more naturally. Coffee shops, corporate lunchrooms, or university cafeterias are all great places with plenty of people.
Choose a person.
The best sessions are ones in which no money or goods are offered as compensation. If you are able to just start up a conversation with someone, try recruiting by doing just that. If working with a group, elect the most social person on your team to be the test facilitator.
Most won't be comfortable doing "cold-call" recruiting. That's okay; it's more expensive, but still okay.
However you recruit participants, follow these guidelines and you will get much better interactions:
- Make eye contact and smile.
- Use the person's name when talking to them, "Heather, could you do X for me" or "Oh this is so great Heather. This will really help the team."
- Don't call the user a participant or subject in the study, with your co-workers, or in your reports. The term "participant" brings baggage: users feel that they are being tested, get nervous, and act differently than they would if they were using the product in their everyday lives. Instead, refer to researchers as facilitators and to participants as users or testers.
Recruit your army.
For beta tests, immediate friends and family are off limits; your history and interaction with them skews the test too much. Thankfully, their friends are not. Coworkers, total strangers, and everyone else are fair game as long as they fit your target demographic.
A busy cafe or company lunchroom offers the largest pool of potential testers. If that is not an option, work whatever social networks you have, be that Facebook, emailing, or just asking around.
Donuts, company merchandise, etc are cheap ways of paying your participants. If you do have a large budget, post an ad on Craigslist.
Briefing/Consent
If you are going to collect personal information (be that demographic information, contact information, or a video capture), you must get some sort of recorded evidence that they know what is going to happen with their information.
If you are recording the session, the easiest thing is for the participant to record their consent on camera. The user remains anonymous (their name is not required and can be separated from the video) but still verifiable (if there is a dispute after the fact, you play the recording).
More important than legal consent is moral consent. If you don't think they understand what they are signing up for, don't let them participate. Even if your school or business has a boilerplate legal agreement for them to sign, make sure they read a one- or two-sentence description of what will happen to the data.
In the same vein, copyright can't protect against embarrassing your users. Fair use means that anyone can remix your video and make a parody of your test participant. If you include video in your results, make the video as small as possible, reduce the opacity to 50%, and overlay a product logo for good measure.
Recording the Session
What you use for recording your session doesn't really matter, but a quick list of stuff we have tried is here.
If you are sharing the data publicly (especially personally identifiable information), some sort of consent form is mandatory. An audio or visual consent statement is easier to manage and can't be lost. The standard IANAL disclaimer applies: the following has not been vetted by a real lawyer, but you can see examples from previous testing. This consent process is consistent with that of other usability and psychological researchers.
Taking notes is good, but you will be going over the video again anyway so it's best to just pay attention. Silverback has an excellent feature that allows use of the Apple remote to add bookmarks.
Debriefing
Thank the person and give them information on where to find out more about the project you are working on.
Analysis and Sharing of Data
Spreadsheets are the easiest, fastest, and most flexible way of tracking your data. After a recording session, immediately watch the video and record each error. Doing this immediately cuts down on the tedious work when writing your final report and lets you compare errors from each session right away. For future sessions, you will be able to assign a mistake to a specific category on the spot, instead of trying to unify all the different sessions at the end.
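A plain CSV file works fine as the error spreadsheet. The sketch below shows one way to log errors consistently across sessions; the category names are hypothetical placeholders, since your own categories will emerge from your first sessions.

```python
import csv

# Hypothetical error categories; replace with the ones that emerge
# from your own early sessions.
CATEGORIES = ["discoverability", "terminology", "input", "recovery"]

def log_error(path, session, timestamp, category, note):
    """Append one observed error to the running spreadsheet (CSV)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([session, timestamp, category, note])

# Usage: while re-watching the session video, log each error as you see it.
log_error("errors.csv", "session-03", "04:12", "terminology",
          "did not understand the word 'verb'")
```

Because every session lands in the same file with the same columns, comparing sessions later is a simple sort or filter rather than a manual unification pass.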
Local Development Team
The general rule is that all stakeholders (from programmers, to managers, to quality-assurance people, and even marketers) are required to sit through at least one live session. During that session, have them write down their thoughts and the different errors they see on separate sticky notes. Once the testing is complete, buy some pizza, try grouping the stickies together on a whiteboard, and talk over what everyone saw.
Remote Development Team
Contact is especially important if you are working with a remote development team. Blogging, video podcasts, and streaming data analysis are all very important but very hard to balance. Take the path of least resistance to each one.
An incredible Wordpress skin would be great, but for now just download a free one and move on! Don't provide too much analysis, especially this early on. Be a reporter of what is happening and ask the team for analysis. Good analysis takes a lot of time and also discourages others from analyzing your data and becoming engaged. It's a blog, not a scientific journal.
Make your data accessible by hosting your spreadsheets on Google Docs, for example, but don't get caught up in having an interface that generates graphs. Your data may look ugly; as long as the delivery mechanism looks okay and you put up a sign that says "This data is raw, ugly, and unfinished. I just threw it up here so you can look at it if you want," people won't judge your ability to do your job.
If you have video, begin by speeding the video up by a factor of 1.65 (link to study on library data). This is the upper limit for making everything go faster while keeping it legible. Producing video is something we will eventually touch on (when we have a unified post-production workflow), but Miro has an awesome multi-platform guide to producing and editing video that we couldn't hope to match.
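One way to do the 1.65x speed-up is with ffmpeg's `setpts` and `atempo` filters; `atempo` changes speed without raising pitch, which is what keeps speech legible. The sketch below just builds the command line (the filenames are placeholders, and it assumes ffmpeg is installed):

```python
def speedup_cmd(src, dst, factor=1.65):
    """Build an ffmpeg command that speeds video and audio up by `factor`.

    setpts compresses the video timestamps; atempo (valid for factors
    between 0.5 and 2.0) speeds the audio without changing its pitch.
    """
    filters = f"[0:v]setpts=PTS/{factor}[v];[0:a]atempo={factor}[a]"
    return ["ffmpeg", "-i", src,
            "-filter_complex", filters,
            "-map", "[v]", "-map", "[a]", dst]

# Usage (requires ffmpeg on your PATH):
# import subprocess
# subprocess.run(speedup_cmd("session-03.mp4", "session-03-fast.mp4"),
#                check=True)
```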
Again, disseminate the video as quickly as you can. If you plan on tagging the video post the video immediately after the session without the tag information and repost it again after it's been tagged.
Video Dissemination Options
- Upload to Viddler and assign a programmer to tag at least one session.
- Podcasts
- One major problem is that unless a session is being watched very closely (i.e. not playing in the background while a programmer is trying to get her/his programming done) it's hard to catch the most important small issues. With video programmers can take these with them and (hopefully) watch them when there is less stimulation, like on bus commutes. Mozilla has no official channels for this, however Ourmedia provides free hosting.
- Tag and make clip reels
- This can be very time intensive (easily 1 minute for every second of video) and the data is not portable. There is an active effort to improve both of these negative aspects by incorporating the tagging into the early data collection.
Triangulation
As you log your error data in a spreadsheet, log bugs in the development bug tracker and link to them from your spreadsheets, videos, and final reports.
In your bug reports, links to clip reels, or links that jump to a point in time in the video, promote engagement from the development team. It is probably best to create a script for this, although you could hack the ones we have. Link construction hack
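Such a script can be very small. The sketch below builds a time-jump link using the YouTube-style `t` start-time parameter; this is an assumption about the host's URL scheme, so check your own video host's documentation (Viddler and others differ) before relying on it.

```python
def video_link(base_url, minutes, seconds):
    """Return a link that jumps to a point in time in a hosted video.

    Assumes a YouTube-style `t=<seconds>s` query parameter; other video
    hosts use different URL schemes for start times.
    """
    sep = "&" if "?" in base_url else "?"
    return f"{base_url}{sep}t={minutes * 60 + seconds}s"

# Usage: paste the result into the bug report next to the matching
# row of the error spreadsheet.
link = video_link("https://www.youtube.com/watch?v=GrV2SZuRPv0", 4, 12)
```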
Cross referencing further with user complaints provides additional support.
If you have the manpower, money, and political will, a Metavid server nicely takes care of tagging data, clip reels, disseminating information, and collaborative data analysis all in one swoop.
Final Report
You may or may not be required to make a final report. It is suggested you do: it is the single summary of the usability study that can be linked to and digested. It reflects well on you; it is the one thing you can point to that you did, as opposed to a collection of data sources. And you will likely find yourself building pieces of it while doing your data analysis anyway.
Make sure to do this within "the fold" of how your company operates: an internal wiki, a word-processed report, a presentation, etc.