When we started work on the Amazon Echo Show design, our first feeling was of recognisable comfort. We’ve been designing voice interactions for over a year and half, but this new device brings a touch screen into the mix and, with it, a whole new set of design challenges and opportunities.

pasted-image-0

In this article I’ll take you through some of the lessons we learnt adapting our voice-first Alexa experience to voice-first-with-screen, and give you the head-start you need to make the most out of your own voice-enabled apps.

I’m Craig Pugsley, Principle Designer in Just Eat’s Product Research team. I’ve been designing touch-screen experiences for 10 years. Last year, we made a little journey into the world of voice-based apps with our original Amazon Echo skill, and it mangled my mind. Having just about got my head around that paradigm shift, Amazon came along with their new Echo Show device, with its 1024px x 600px touch screen, and everything changed again. I started getting flashes of adapting our iOS or android apps to a landscape aspect screen. Designing nice big Fitts-law-observing buttons that could be mashed from across the room. But it very soon became apparent that Amazon have been making some carefully orchestrated decisions about how experiences should be designed for their new ‘voice-first’ devices, and trying to adapt an existing visual experience just wouldn’t cut the mustard.

A Bit of Background

But I’m getting ahead of myself here. Let’s jump back to 2014, when Amazon brought to the US market the world’s first voice-enabled speaker. You could play music, manage calendars, order from the Amazon store, set kitchen timers, check the weather, etc… all with your voice, naturally, as though they were having a conversation with another human. Fast forward to 2017 and you can now pick from hundreds of third-party apps to extend the speaker’s functionality. Many of the big tech names have ‘skills’ for the Echo, including Uber, Sky, The Trainline, Jamie Oliver, Philips Hue and Just Eat.

Since 2014, Amazon have brought a range of Alexa-enabled devices to market, at a multitude of wallet-friendly prices – starting with the £50 Echo Dot (like it’s big brother, but without the nice speaker) up to the new Echo Show at £199 (essentially a standard Echo, but with a touch screen and camera), with screens of all shapes and sizes in-between.

pasted-image-0-2

Why did we get into voice? Our job is to hedge the company’s bets. Just Eat’s mission is to create the world’s greatest food community, and that community is incredibly diverse – from the individual who orders their weekly treat, all the way through to repeat customers using our upwards of thirty thousand restaurants to try something new every night. To be this inclusive, and let our restaurant partners reach the widest possible audience, we need to be available on every platform, everywhere our users are. Just Eat’s core teams are hard at work on the traditional platforms of iOS, Android and Web, so we take longer-shot calculated risks with new technologies, methodologies, business models and platforms. Being a small, rapidly-iterative, user-centred team, our goal is to fail more often than we succeed – and scout a route to interesting new platforms and interactions, without needing to send the whole army off in a new direction.

So, we made a bet on voice. To be honest, it was a fairly low-risk gamble: the smartphone market has stagnated for years, become ripe for new innovation to make the next evolutionary step, and we’ve reached peak iPhone. We have projects looking at VR, AR, big screens, one buttons, distributed ordering, (so many, in fact, that we had to showcase them all in a swanky Shoreditch event last year).

pasted-image-0-1

It was only natural that voice (or, more specifically, conversational user interfaces) would be in that mix. When we were handed an Amazon Echo device under a table in a cafe in London (sometime in early 2016 – several months before the Echo’s UK release) that gave us the route to market we were looking for.

The Next Frontier

From a Design perspective, conversational UIs are clearly the next interaction frontier. They’re the perfect fit for busy people, they don’t suffer from the cognitive load and friction of moving between inconsistently-designed apps’ walled gardens (something I’ve called Beautiful Room Syndrome), and they have a slew of tangential benefits that might not be obvious at first thought. For example, our data seems to suggest that more users interacting with our skill seem to skew older. I find this fascinating! And entirely obvious, when you think about it.

There’s a whole generation of people for whom technology is alien and removed from the kinds of interactions they’re used to. Now, almost out of nowhere, the technologies of deep learning, natural language processing, neural networks, speech recognition and cloud computing power have matured to enable a kind of interaction at once both startlingly new and compelling, whilst being so obvious, inevitable and natural. At last, these people who would have been forced to learn the complexities and vagaries of touchscreen interfaces to engage in the digital world, will be given access using an interface they’ve have been using since childhood.

Amazon clearly recognised the new market they were unlocking. After the Amazon Echo speaker (around £150), they quickly followed up with a range of new devices and price points. Possibly most compelling is the £50 Echo Dot – a device barely larger than a paperback on it’s side, but packing all the same far-field microphone technology allowing it to hear you across the room and all the same Alexa-enabled smarts as it’s more expensive cousins. With the launch of the Echo Show, Amazon have addressed one of the more significant constraints of a voice-only interface: we live in an information age, and sometimes it’s just better to show what the user’s asked for, rather than describe it.

Designing For Alexa

Amazon’s design guidance on their screen-based devices is strong, and shows their obvious strategic push towards voice experiences that are augmented by simple information displays. Designing for the Show will give you all you need to translate your skill to Alexa on Fire tablets and Fire TVs, if and when Amazon enable these devices. It’s an inevitable natural progression of the voice interface, and Amazon have made some strategic design decisions to help make your skill as portable as possible.

For example, you don’t have control over all of those 1024×600 pixels. Instead, you have (at the moment) 6 customisable templates that you can insert content into. Ostensibly, there are two types: lists and blocks of text. Into that, you have four font sizes and a range of basic markup you can specify (bold, italic, etc.). You can also insert inline images (although not animated GIFs – we tried!) and ‘action buttons’ which are controls that will fire the same action as if they user said the command. Each template also contains a logo in the top right, page title and a background image. It’s fair to say the slots you get to fill are fairly limited, but this is deliberate and positive step for the Alexa user experience.

img_0053

[For a more detailed breakdown of how to build an app for Echo Show, take a look at my colleague Andy May’s in-depth article]

One key element is the background image you can display on each screen. You can make your background work really hard, so definitely spend some time exploring concepts. Amazon’s guidance is to use a photo, with a 70% black fill, but I find that too muddy and it felt too dark for our brand. Instead, we used our brand’s signature colours for the background to denote each key stage of our flow. I like how this subliminally suggests where the user is in the flow (e.g. while you’re editing your basket, the background remains blue) and gives a sense of progression.

Top 10 Tips for Designing Voice Interactions

Be Voice First

You have to remember you’re designing an experience that is augmented with a visual display. This one’s probably the hardest to train yourself to think about – we’ve been designing UI-first visual interfaces for so long, that thinking in this voice-first way is going to feel really unnatural for a while. Start by nailing your voice-only flows first, then tactically augment that flow with information using the screen.

The 7ft Test

Amazon provide four font sizes for you to use: small, medium, large and extra large. You have to make sure crucial information is large enough to be read from 7ft away. Remember: users will almost certainly be interacting only with their voice, probably from across the room.

img_5781

Be Context Aware

Your users have chosen to use your Alexa skill over your iOS app. Be mindful of that reason and context. Maybe their hands are busy making something? Maybe they’re dashing through the kitchen on their way out, and just remembered something? Maybe they’re multi-tasking? Maybe they’re an older user who is engaging with your brand for the first time? Use research to figure out how and why your users use your voice skill, and use that insight to design to that context.

Don’t Just Show What’s Said

An obvious one, but worth mentioning in this new world. Your engineers will need to build a view to be shown on the screen for each state of your flow – the Show platform will not show a ‘default’ screen automatically (which, we admit, is kinda weird) and you’ll end up in a situation where you’re showing old content while talking about something at an entirely different stage of the flow. Super confusing to the user. So, we found it was useful to start by building screens that displayed roughly what was being spoken about first, for every state.

img_0645

This will let you, the designer, make sure you’ve nailed your voice experience first, before then cherry-picking what you want to display at each state. You can use the display to show more than you’re saying, and even give additional actions to the user. Remember, like all good UX, less is most definitely more. Use the screen only when you think it would significantly add to the experience. If not, just display a shortened version of what you’re asking to the user. Typically, this could be one or two verb-based words, displayed in large font size.

Be careful with lists

In fact, be careful with how much information you’re saying, period. It’s a good design tip to chunk lists when reading them out (e.g. ‘this, this, this and this. Want to hear five more?’), but when you’ve got a screen, you can subtly adjust what you say to cue the user to look at the screen. You could, for example say ‘this, this, this and these five more’ while showing all eight on the screen.

Consistency

If you’re building a VUI with multiple steps in the flow, make sure you’re consistent in what you’re showing on screen. This is one of the few tips you can carry over from the world of visual UI design. Make sure you have consistent page titles, your background images follow some kind of semantically-relevant pattern (images related to the current task, colours that change based on state, etc…) and that you refer to objects in your system (verbs, nouns) repeatedly in the same way. You can (and should) vary what you say to users – humans expect questions to be asked and information to be presented in slightly different ways each time, so it feels more natural to be asked if they want to continue using synonymous verbs (‘continue’, ‘carry on’, ‘move on’, etc…). This is more engineering and voice design work, but it will make your experience feel incredibly endearing and natural.

Be Wary of Sessions

Remember what your user was doing, and decide whether you want to pick up that flow again next time they interact. If you’re building an e-comm flow, maybe you persist the basket between sessions. If you’re getting directions, remember where the user said they wanted to go from. This advice applies equally to non-screen Alexa devices, but it’s critical on the Show due to the way skills timeout if not interacted with. Users can tap the screen at any time in your flow. Alexa will stop speaking and the user has to say “Alexa” to re-start the conversation. If they don’t, your skill will remain on screen for 30 seconds, before returning to the Show’s home screen. When your user interacts with your skill again, you should handle picking up that state from where they were, in whatever way make sense to your skill. You could ask if they want to resume where they were, or you could figure out how long it was since they last interacted and decide that it’s been a couple of days, so they probably want to start again.

Show the prompt to continue on screen

This one is super-critical on the Echo Show. Best practise suggests that you should have your prompt question (the thing that Alexa will be listening to the answer for) at the end of her speech. But, if the user starts interacting with the screen, Alexa will immediately stop talking, and the user won’t hear the question and won’t know what to say to proceed. You need to decide what’s best for your skill, but we found that putting the prompt question in the page title (and doing it consistently on every page) meant users could safely interrupt to interact with the screen, while still having a clear indication of how to proceed.

img_0646

Worship your copywriter

Another tip relevant to non-screen voice interfaces, but it really takes the nuanced skills of a professional wordsmith to target the same message to be both spoken, written in the companion app card, and displayed on the limited real estate of the Echo Show screen. Make sure you’re good friends with your team’s copywriter. By them beer regularly and keep them close to the development of your voice interface. Encourage them to develop personality and tone of voice style guides specifically for VUIs. They’re as much a core part of your design team as UX or User Researchers. Treat them well.

In terms of user testing, we weren’t able to work with actual customers to test and iterate the designs for the Echo Show, as we routinely do with all our other products, due to the commercial sensitivity around the Echo Show UK release. So, we had to make the best judgements we could, based on the analytics we had and some expert reviewing within the team 😉 That said, we did plenty of internal testing with unsuspecting reception staff and people from other teams – Neisen’s guidance still stands: 5 users can get you 80% of usability issues, and we definitely found UX improvement, even testing with internals. Aside from the Show, we test future concepts in a wizard-of-oz style with one of us dialing in to the test lab and pretending to be Alexa. We get a huge amount of insight without writing a single line of code using this method, but that’s a whole other blog post for another day 😉

So there we go. Armed with these words of wisdom, and your existing voice-first skill, you should be fully equipped to create the next big app for the next big platform. Remember: think differently, this market is very new, look for users outside your traditional demographics and be prepared to keep your skills updated regularly as tech and consumer adoption changes. Good luck!

Craig Pugsley
Bristol, UK – Sept 2017

To find out more about the Just Eat Alexa Skill visit: //www.just-eat.co.uk/alexa

For more information visit on designing for Alexa visit: //developer.amazon.com/designing-for-voice/