User research for voice interfaces


Voice is our natural communication tool.

It was the first communication channel we had as human beings and, unlike visual tools, it is our “native” way to communicate.

Technological evolution has been enriching our interactions beyond text and graphics, and voice interfaces are becoming more and more common in our daily lives. Every phone we use has a virtual assistant, more and more homes have an Echo, and no one thinks of a woman’s name anymore when they hear “Alexa”.

Although voice interfaces are not new to the industry (Nielsen Norman Group was already calling them the future of interaction in 2017), the eventful year 2020 was expected to be their market debut. The adoption of this technology and its growing number of users make it well worth considering.

Given all this, it is clear why we should test voice interfaces. The question is: how?

First, it is important to understand the levels of voice interaction and choose the one we want. We can opt for:

  • Voice-early: the interaction is 100% by voice; both the user’s input and the device’s output are spoken. This type of device is less common.
  • Voice-first: part of the device’s output is visual (e.g. Amazon Echo or Echo Show) and part of the interaction (either in the selection or in the response) is tactile.

Advantages and consequences of voice

The key to any design is that users know exactly what to expect from the interface they are interacting with. In voice, it is essential to make this clear at the outset. For this reason, voice interfaces usually introduce themselves and state their function, simply and clearly, whenever they are invoked. They also have a “service vocation”: they are designed to help the user with specific needs. This is why questions such as “How can I help you?” or “What can I do for you?” are so frequent in voice interfaces. Their goal is to identify the user’s problem and lead them through the conversation flow best suited to that need.

If you are thinking of starting a project with this technology, it is important to be aware of its advantages and possible disadvantages.

In certain contexts, voice has many advantages over on-screen interfaces. Some of them are:

  • Natural interaction: if the technology and the interaction flow are well designed, the relationship with the interface is more natural and closer to a conversation between people. This facilitates communication and, above all, makes interaction easier and faster.
  • Greater accessibility: for people with low vision or blindness, voice has been a great ally for decades, allowing them to use mobile devices and computers, or even to “read” through audiobooks. Until voice-first tools were designed, this technology was partly limited by certain interaction steps that required a screen, which are no longer necessary with voice-first. If you want to learn more about the possibilities of this technology for blind people, read the article “Voice guidance in Maps, built for people with impaired vision” by Google.
  • Speed: thanks to IoT technology, voice can quickly control everyday actions such as turning on the TV or the lights, or playing music while you shower. This makes multitasking simple and very fast. Plus, it looks great if you have visitors 😉 (wow effect).

However, this technology also has some drawbacks or limitations to consider, such as:

  • The technology is still immature, so it cannot yet cover all the use cases that could arise in a real conversation.
  • Our short-term memory is limited. In a conversation we tend to pay more attention to the information presented first and lose focus as it goes on. There are techniques to mitigate this, such as asking questions to check that the user has understood or offering a few options at a time instead of listing everything at once. However, for very varied or complex information, a graphical interface is a better choice because it allows faster scanning.
  • Some things are simply easier to communicate graphically or in writing, such as maps, plans, graphs or complex texts. Match the channel to the need and the context, and don’t move everything to voice just because it is trendy.

Research techniques for voice interfaces

As in any other project, the first research step is to check whether the project is suitable for voice and to validate which type of interaction best serves the use case.

Voice is clearly a trend, but we must keep in mind that some projects will work better as voice-first or voice-early, and others much better with graphical or mixed interfaces. Remember that the goal is always to build the most useful and intuitive tool for the user. If you are considering a voice project, testing its feasibility is as essential a step as in any other project.

Once we have established that voice is feasible for the project, what’s next?

Choose the research techniques that best fit the current stage of the project.

Before the design process

Before we start planning our voice interface, we may want to explore how users relate to existing voice interfaces.

In this case it is essential to conduct contextual or ethnographic studies, in which we observe users interacting with the technology in their usual environment to detect, organically, any barriers that arise. When users have little experience with a technology, as is often the case with voice, direct questions or interviews may be less effective than observation for understanding how they deal with it and whether voice makes sense for the project.

During the design process

According to Helen Zipora, one of Spain’s leading voice experts, “you have to start by talking, not painting”. Before designing all the conversation flows and testing with technology, it is important to talk to users, check whether our initial scripts make sense and validate the use cases.

One of the most commonly used techniques in voice research is the Wizard of Oz. In this technique, a user interacts (unknowingly) with a person who acts as if he or she were the voice technology, following a predefined script. It is used before investing time in building a prototype, because it is faster to hold a conversation and adjust the script as it unfolds than to test alternative flows that have already been implemented.

In graphical projects, interfaces let users take different paths through the same website or application. The same is true for voice. That is why it is important to clearly define the conversation flows that may arise, taking into account the types of users who will access the interface.
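The branching paths described above can be made concrete by modeling a conversation flow as a small directed graph of dialogue states. The sketch below is purely illustrative: the ordering assistant, its state names and its transitions are hypothetical assumptions, not taken from any real product. It enumerates every path from the greeting to an end state, which is one way to check that each path has a script ready for a Wizard of Oz or role-play session.

```python
from typing import Dict, List

# Hypothetical conversation flow for an illustrative ordering assistant:
# each dialogue state lists the states the conversation can move to next,
# depending on what the user says.
FLOW: Dict[str, List[str]] = {
    "greeting": ["ask_order", "ask_help"],
    "ask_help": ["ask_order", "goodbye"],
    "ask_order": ["confirm_order", "clarify_order"],
    "clarify_order": ["confirm_order"],
    "confirm_order": ["goodbye"],
    "goodbye": [],
}


def all_paths(flow: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Depth-first enumeration of every path from `start` to a terminal state.

    A terminal state is one with no outgoing transitions. Assumes the flow
    is acyclic; a loop back to an earlier state would recurse forever.
    """
    next_states = flow.get(start, [])
    if not next_states:
        return [[start]]
    return [[start] + rest for nxt in next_states for rest in all_paths(flow, nxt)]


if __name__ == "__main__":
    # Print each path so it can be scripted and tested with users.
    for path in all_paths(FLOW, "greeting"):
        print(" -> ".join(path))
```

With this hypothetical flow the function finds five distinct paths, each of which deserves its own script in a test session. Real flows often loop back (for example, “Sorry, could you repeat that?”), which would require tracking visited states.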

Role play is another common technique for finding the most natural way to hold the conversation. It is similar to the Wizard of Oz in that a team member plays the voice technology and has a conversation with the user. It is very useful for detecting problems in the dialogue and improving conversation flows, and it helps identify friction points quickly, before the prototyping phase.

These techniques can be applied in all phases of the design process, using increasingly complex and complete stimuli, from a simple script to a fully functional voice interface.

After the design process

Once our voice interface is available to our audience, we can track the user experience through different methods. For example, we can collect quantitative data on actual usage, which allows us to analyze user conversations in detail.
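As a minimal sketch of this kind of quantitative analysis, assume conversation logs have been exported as simple records holding a session ID, the intent the assistant recognized for each user turn, and whether the user reached their goal. The record format, the field names and the `fallback` label for “I didn’t understand that” are all hypothetical; real voice platforms export logs in their own formats. The sketch computes three basic indicators: the fallback rate, the completion rate and the most frequent intents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    """One conversation from a hypothetical usage-log export."""
    session_id: str
    intents: List[str]  # intent recognized for each user turn, in order
    completed: bool     # whether the user reached their goal


FALLBACK = "fallback"  # hypothetical label for turns the assistant did not understand


def summarize(sessions: List[Session], top_n: int = 3) -> Dict[str, object]:
    """Compute a few basic usage indicators across all sessions."""
    all_intents = [i for s in sessions for i in s.intents]
    total_turns = len(all_intents)
    fallbacks = all_intents.count(FALLBACK)
    return {
        "sessions": len(sessions),
        # Share of user turns the assistant could not map to any intent
        "fallback_rate": fallbacks / total_turns if total_turns else 0.0,
        # Share of sessions in which the user reached their goal
        "completion_rate": (
            sum(s.completed for s in sessions) / len(sessions) if sessions else 0.0
        ),
        # Most frequent recognized intents, excluding fallbacks
        "top_intents": Counter(i for i in all_intents if i != FALLBACK).most_common(top_n),
    }


if __name__ == "__main__":
    # Invented sample data, for illustration only
    logs = [
        Session("a", ["play_music", "volume_up"], True),
        Session("b", ["fallback", "play_music"], True),
        Session("c", ["fallback", "fallback"], False),
    ]
    print(summarize(logs))
```

A high fallback rate concentrated in particular sessions is a good cue for which conversations to read in detail.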


Voice interfaces are here to stay and are starting to gain market adoption. It is the perfect time to start working with them, always bearing in mind in which projects they can shine and what their limitations are.

Cover photo: Luis Cortés from Unsplash

