
Category : Design


Top 5 Tips for Building Just Eat on Amazon’s Echo Show

Hi, I’m Andy May – Senior Engineer in Just Eat’s Product Research team. I’m going to take you through some top tips for porting your existing Alexa voice-only skill to Amazon’s new Echo Show device, pointing out some of the main challenges we encountered and solved.


Since we started work on the Just Eat Alexa skill back in 2016, we’ve seen adoption of voice interfaces explode. Amazon’s relentless release schedule for Alexa-based devices has fuelled this, but improvements in the foundational tech (AI, deep learning, speech models, cloud computing), coupled with a vibrant third-party skill community, look set to establish Alexa as arguably the leader in voice apps.

From an engineering perspective, adapting our existing code base to support the new Echo Show was incredibly easy. But, as with any new platform, simply porting an existing experience across doesn’t do the capabilities of the new platform justice. I worked closely with my partner-in-crime, Principal Designer Craig Pugsley, to take advantage of what was now possible with a screen and touch input. In fact, Craig has written up his own top tips on exactly that in his companion post, Top 10 Voice Design Tips for the Amazon Echo Show.

To add an Echo Show screen to your voice response, you simply extend the JSON response to include markup that describes the template you want to render on the device. The new template object (Display.RenderTemplate) is added to a directives array in the response.
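For illustration, here is a minimal response of that shape, written as a JavaScript object as it might be returned from a Node handler. The field names follow our reading of the Alexa response and Display interface references. The speech, title, text and token values are made up.

```javascript
// Illustrative Alexa response: spoken output plus one Display.RenderTemplate
// directive for the Echo Show. Text values and the token are made up.
const response = {
  version: "1.0",
  response: {
    outputSpeech: { type: "PlainText", text: "What would you like to order tonight?" },
    directives: [
      {
        type: "Display.RenderTemplate",
        template: {
          type: "BodyTemplate1", // one of the built-in display templates
          token: "welcome-screen", // identifies this template instance
          backButton: "VISIBLE",
          title: "What would you like to order tonight?",
          textContent: {
            primaryText: { type: "RichText", text: "Try saying <b>Indian</b> or <b>Pizza</b>" },
          },
        },
      },
    ],
    shouldEndSession: false,
  },
};

console.log(JSON.stringify(response, null, 2));
```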

For more details on the Alexa response object visit //developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference#response-body-syntax

Sounds simple, doesn’t it? Well, it’s not rocket science, but it does have a few significant challenges that I wish someone had told me about before I started on this adventure. Here are five tips to help you successfully port your voice skill to voice-and-screen.

1. You need to handle device-targeting logic

The first and main gotcha we found was that you cannot send a response including a template to a standard Echo or Dot device. We incorrectly assumed a device that does not support screens would simply ignore the additional objects in the response.

Our own Conversation class, which all Alexa requests and responses go through, is built on top of the Alexa Node SDK. The SDK did not exist when we first launched our skill. We added a quick helper method from the Alexa Cookbook (//github.com/alexa/alexa-cookbook/blob/master/display-directive/listTemplate/index.js#L589) to check whether we are dealing with an Echo Show or a voice-only device.

This method is called before we return our response to ensure we only send RenderTemplates to devices that support them.

Finally, we extended our Response class to accept the new template objects and include them in the response sent to Alexa. The resulting screens are displayed on the Echo Show alongside the spoken response.
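Taken together, the device check and the response change might look like the sketch below. supportsDisplay is adapted from the Cookbook helper linked above; here the event is passed in explicitly. SkillResponse is a hypothetical stand-in for our own Response class, not part of the SDK.

```javascript
// Adapted from the Alexa Cookbook helper: a device can render templates
// only if the request context advertises the Display interface.
function supportsDisplay(event) {
  const device = event && event.context && event.context.System && event.context.System.device;
  return Boolean(device && device.supportedInterfaces && device.supportedInterfaces.Display);
}

// Hypothetical stand-in for our Response class: it accepts a template but
// only emits a RenderTemplate directive when the device has a screen.
class SkillResponse {
  constructor(speech) {
    this.speech = speech;
    this.template = null;
  }

  setTemplate(template) {
    this.template = template;
    return this; // allow chaining
  }

  build(event) {
    const body = {
      outputSpeech: { type: "PlainText", text: this.speech },
      shouldEndSession: false,
    };
    if (this.template && supportsDisplay(event)) {
      body.directives = [{ type: "Display.RenderTemplate", template: this.template }];
    }
    return { version: "1.0", response: body };
  }
}

// Usage: the same response object is safe for both device types.
const showEvent = { context: { System: { device: { supportedInterfaces: { Display: {} } } } } };
const dotEvent = { context: { System: { device: { supportedInterfaces: {} } } } };
const reply = new SkillResponse("Your basket is ready.").setTemplate({ type: "BodyTemplate1", token: "basket" });
console.log(reply.build(showEvent).response.directives); // one directive
console.log(reply.build(dotEvent).response.directives); // undefined
```

Keeping the check inside the builder means individual handlers can always attach a template without worrying about which device they are talking to.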

2. Don’t fight the display templates

There are currently six templates for displaying information on the Echo Show. We decided to keep them all in one file, so the markup and structure of each template is declared only once; we then pass in the data needed to populate it. Object destructuring and template literals, alongside array.map and array.reduce, make generating templates easy. We use Node’s crypto module to generate a unique token for every template we return.


Image of list – mapping basket to template listItems.

Image of basket list  – reducing basket to single string.
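The two steps captioned above (mapping the basket to listItems, and reducing it to a single string) might look something like this. The sketch assumes a hypothetical basket shape of { name, quantity, price }. It uses Node’s built-in crypto module for the template token.

```javascript
const crypto = require("crypto");

// Hypothetical basket shape: price is per item, in pounds.
const basket = [
  { name: "Chicken Tikka Masala", quantity: 1, price: 8.5 },
  { name: "Pilau Rice", quantity: 2, price: 2.75 },
];

// Unique token for every template we return.
const newToken = () => crypto.randomBytes(8).toString("hex");

// array.map: one ListTemplate1 list item per basket line. The item token is
// the dish name, so a touch selection tells us which line was chosen.
const toListItems = (items) =>
  items.map(({ name, quantity, price }) => ({
    token: name,
    textContent: {
      primaryText: { type: "RichText", text: `<b>${quantity} x ${name}</b>` },
      secondaryText: { type: "PlainText", text: `£${(quantity * price).toFixed(2)}` },
    },
  }));

// array.reduce: the whole basket as one RichText string, one line per dish.
const toSummary = (items) =>
  items.reduce(
    (text, { name, quantity }, i) => `${text}${i > 0 ? "<br/>" : ""}${quantity} x ${name}`,
    ""
  );

const listTemplate = {
  type: "ListTemplate1",
  token: newToken(),
  title: "Your basket",
  listItems: toListItems(basket),
};

console.log(toSummary(basket)); // 1 x Chicken Tikka Masala<br/>2 x Pilau Rice
```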

Markup is limited to a few basic HTML-style tags: line breaks, bold, italic, font size, inline images and action links. Action links are really interesting, but their default blue styling has meant we’ve had to avoid using them so far.
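A small sketch of building RichText strings from those tags. The tag names and the four font-size values (2, 3, 5, 7) are our reading of the Display interface reference. The helper names are hypothetical. Escaping &, < and > assumes RichText is parsed as markup.

```javascript
// Hypothetical helpers for the small RichText markup subset.
// Font size values 2, 3, 5 and 7 map to small, medium, large and extra large.
const FONT_SIZES = { small: 2, medium: 3, large: 5, xlarge: 7 };

// Assumption: RichText is parsed as markup, so escape &, < and > in user data.
const escapeText = (s) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

const bold = (s) => `<b>${s}</b>`;
const italic = (s) => `<i>${s}</i>`;
const sized = (size, s) => {
  const n = FONT_SIZES[size];
  if (n === undefined) throw new Error(`Unknown font size: ${size}`);
  return `<font size="${n}">${s}</font>`;
};
const lines = (...parts) => parts.join("<br/>");

// A heading readable from across the room, with a quieter line underneath.
const text = lines(sized("xlarge", bold("Pay now?")), italic(escapeText("Fish & Chips, £7.99")));
console.log(text);
```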

Many of the templates that support images take an array of image objects, but only the first image object is used. We experimented with providing more than one image, either as a fallback or to randomise the image displayed. Because there is no fallback, we need to make a request to our S3 bucket to check that the image exists before including it in the template.
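A minimal sketch of that check. It assumes the images are publicly readable over HTTPS, so a HEAD request is enough; a private bucket would need signed requests instead. headExists and firstAvailableImage are hypothetical names. The global fetch used here requires Node 18 or later.

```javascript
// Hypothetical HEAD check against the image URL (e.g. an S3 object).
// Uses the global fetch available in Node 18 and later.
async function headExists(url) {
  try {
    const res = await fetch(url, { method: "HEAD" });
    return res.ok;
  } catch {
    return false; // network error: treat the image as missing
  }
}

// Only the first image source is used, so pick our own fallback: the first
// candidate URL that actually exists. Returns null if none do, in which case
// the template is sent without an image.
async function firstAvailableImage(urls, description, exists = headExists) {
  for (const url of urls) {
    if (await exists(url)) {
      return { contentDescription: description, sources: [{ url }] };
    }
  }
  return null;
}
```

Passing the existence check in as a parameter keeps the selection logic testable without network access.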

Don’t try to hack these templates into doing things they weren’t designed for. Amazon have deliberately limited each template’s capabilities to give users a consistent experience. Spend your time gently stroking your friendly designer and telling them they’re in a new world now. Set their expectations around the layouts, markup and list objects that are available. Encourage them to read Craig’s post.

3. Take advantage of touch input alongside voice

The Echo Show offers some great new functionality to improve the user experience and make some interactions easier. Users can now make selections and trigger intents by touching the screen or by saying the list item number (“select number 2”).

It is your job to capture and handle both touch and voice selection. When a user selects a list item, your code will receive a new request object of type Display.ElementSelected.

The token attribute you specify when creating the list is passed back in this new request object.
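A sketch of how the two inputs can be normalised to the same value. The request shapes follow the Alexa request format as we understand it. The intent name CuisineIntent and the slot name cuisine are hypothetical.

```javascript
// Returns the cuisine the user chose, whether they touched a list item or
// said the cuisine name; null if the request is neither.
function chosenCuisine(request) {
  if (request.type === "Display.ElementSelected") {
    // The token we set on the list item comes back as request.token.
    return request.token;
  }
  if (request.type === "IntentRequest" && request.intent.name === "CuisineIntent") {
    const slot = request.intent.slots && request.intent.slots.cuisine;
    return slot && slot.value ? slot.value : null;
  }
  return null;
}

const touched = { type: "Display.ElementSelected", token: "Indian" };
const spoken = {
  type: "IntentRequest",
  intent: { name: "CuisineIntent", slots: { cuisine: { name: "cuisine", value: "Indian" } } },
};
console.log(chosenCuisine(touched), chosenCuisine(spoken)); // Indian Indian
```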

If the user selects the list item whose token is ‘Indian’, for example, we receive the value ‘Indian’ and can treat it in the same way as the cuisine slot value. Our state management code knows to wait for either the cuisine intent with a slot value or a Display.ElementSelected request.

Finally, we create a new intent, with its own utterances and a slot, to handle number selection. If the new intent is triggered with a valid number, we simply look up the cuisine in the cuisine array held in state, offsetting the number by one to get the array index.
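A sketch of that lookup. It assumes a hypothetical NumberIntent with a slot named number of Alexa's built-in AMAZON.NUMBER type. The intent and slot names are illustrative only.

```javascript
// Hypothetical: a NumberIntent with an AMAZON.NUMBER slot called "number".
// Slot values arrive as strings, and users count from 1, not 0.
function cuisineFromNumber(request, cuisines) {
  const slot = request.intent.slots && request.intent.slots.number;
  const n = slot ? Number.parseInt(slot.value, 10) : NaN;
  if (!Number.isInteger(n) || n < 1 || n > cuisines.length) {
    return null; // not a valid list position: re-prompt the user
  }
  return cuisines[n - 1];
}

// The cuisine list we displayed, as held in state.
const cuisines = ["Chinese", "Indian", "Pizza"];
const request = {
  type: "IntentRequest",
  intent: { name: "NumberIntent", slots: { number: { name: "number", value: "2" } } },
};
console.log(cuisineFromNumber(request, cuisines)); // Indian
```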

Find out more about touch and voice selection – //developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/display-interface-reference#touch-selection-events

4. Adapt your response based on device

The Echo Show provides lots of opportunities and features. In one part of our Skill we decided to change the flow and responses based on the device type.

When we offer users the opportunity to add popular dishes, it made sense for us to shorten the flow, as we can use the screen in addition to the voice response.

We use the same supportsDisplay method to change the flow of our skill.

We use the same logic when displaying the list of popular dishes. Following Amazon’s recommendations, if the device supports a display we don’t read out all the dishes.
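A minimal sketch of that branch. In the skill, hasDisplay would come from the same supportsDisplay check; the function name and wording here are hypothetical.

```javascript
// Hypothetical: choose the popular-dishes prompt based on the device.
// With a screen, the dishes are listed there and Alexa just points at them;
// on a voice-only device every dish has to be read out.
function popularDishesSpeech(hasDisplay, dishes) {
  if (hasDisplay) {
    return "Here are some popular dishes. Which would you like to add?";
  }
  return `Popular dishes include ${dishes.join(", ")}. Which would you like to add?`;
}

const dishes = ["Chicken Korma", "Garlic Naan", "Onion Bhaji"];
console.log(popularDishesSpeech(true, dishes));
console.log(popularDishesSpeech(false, dishes));
```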

You can find out more about our thinking on designing the user experience for the Echo Show in Craig’s post, Top 10 Voice Design Tips for the Amazon Echo Show.

5. The back button doesn’t work

The back button caused us some problems. When a user touches the back button, the Echo Show displays the previous template. Unfortunately, no callback is sent to your code, which created a huge state-management problem for us.

For example, a user can reach the checkout stage, at which point our state engine expects only two intents, Pay Now or Change Something (excluding back, cancel and stop). If an Echo Show user touched back, the screen would show our Allergy prompt again. The state engine has no way of knowing this has happened, so we could not process the user’s Yes/No intents to move on from the allergy prompt, because it still thinks the user is at checkout.

To add to the problem, the user can actually tap back through multiple templates. Thankfully, you can disable the back button in the template response object.
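In the Display interface, a template’s backButton field takes "VISIBLE" or "HIDDEN". Below is a sketch of a checkout template with it hidden. The token, title and total are illustrative.

```javascript
// The back button is controlled per template. "HIDDEN" removes it, so the
// screen can't drift to a template our state engine has already moved past.
function checkoutTemplate(total) {
  return {
    type: "BodyTemplate1",
    token: "checkout", // hypothetical token
    backButton: "HIDDEN", // the alternative value is "VISIBLE"
    title: "Pay now, or change something?",
    textContent: {
      primaryText: { type: "RichText", text: `<font size="5">Total: £${total.toFixed(2)}</font>` },
    },
  };
}

console.log(JSON.stringify(checkoutTemplate(18.5), null, 2));
```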

To find out more about the Just Eat Alexa Skill visit //www.just-eat.co.uk/alexa

For more information on developing for the Alexa Display interface, visit //developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/display-interface-reference


Top 10 Voice Design Tips for the Amazon Echo Show

When we started work on the Amazon Echo Show design, our first feeling was one of recognisable comfort. We’ve been designing voice interactions for over a year and a half, but this new device brings a touch screen into the mix and, with it, a whole new set of design challenges and opportunities.


In this article I’ll take you through some of the lessons we learnt adapting our voice-first Alexa experience to voice-first-with-screen, and give you the head-start you need to make the most out of your own voice-enabled apps.

I’m Craig Pugsley, Principal Designer in Just Eat’s Product Research team. I’ve been designing touch-screen experiences for 10 years. Last year, we made a little journey into the world of voice-based apps with our original Amazon Echo skill, and it mangled my mind. Just as I’d got my head around that paradigm shift, Amazon came along with their new Echo Show device, with its 1024 x 600px touch screen, and everything changed again. I started getting flashes of adapting our iOS or Android apps to a landscape screen, and of designing nice big buttons, in keeping with Fitts’s law, that could be mashed from across the room. But it very soon became apparent that Amazon have been making some carefully orchestrated decisions about how experiences should be designed for their new ‘voice-first’ devices, and trying to adapt an existing visual experience just wouldn’t cut the mustard.

A Bit of Background

But I’m getting ahead of myself here. Let’s jump back to 2014, when Amazon brought the world’s first voice-enabled speaker to the US market. You could play music, manage calendars, order from the Amazon store, set kitchen timers, check the weather, etc., all with your voice, naturally, as though you were having a conversation with another human. Fast forward to 2017 and you can now pick from hundreds of third-party apps to extend the speaker’s functionality. Many of the big tech names have ‘skills’ for the Echo, including Uber, Sky, The Trainline, Jamie Oliver, Philips Hue and Just Eat.

Since 2014, Amazon have brought a range of Alexa-enabled devices to market at a multitude of wallet-friendly prices, from the £50 Echo Dot (like its big brother, but without the nice speaker) up to the new Echo Show at £199 (essentially a standard Echo, but with a touch screen and camera), with screens of all shapes and sizes in between.


Why did we get into voice? Our job is to hedge the company’s bets. Just Eat’s mission is to create the world’s greatest food community, and that community is incredibly diverse – from the individual who orders their weekly treat, all the way through to repeat customers using our upwards of thirty thousand restaurants to try something new every night. To be this inclusive, and let our restaurant partners reach the widest possible audience, we need to be available on every platform, everywhere our users are. Just Eat’s core teams are hard at work on the traditional platforms of iOS, Android and Web, so we take longer-shot calculated risks with new technologies, methodologies, business models and platforms. Being a small, rapidly-iterative, user-centred team, our goal is to fail more often than we succeed – and scout a route to interesting new platforms and interactions, without needing to send the whole army off in a new direction.

So, we made a bet on voice. To be honest, it was a fairly low-risk gamble: the smartphone market has stagnated for years and is ripe for the next evolutionary step, and we’ve reached peak iPhone. We have projects looking at VR, AR, big screens, one-button ordering and distributed ordering (so many, in fact, that we had to showcase them all at a swanky Shoreditch event last year).


It was only natural that voice (or, more specifically, conversational user interfaces) would be in that mix. When we were handed an Amazon Echo under a table in a cafe in London (sometime in early 2016, several months before the Echo’s UK release), that gave us the route to market we were looking for.

The Next Frontier

From a design perspective, conversational UIs are clearly the next interaction frontier. They’re the perfect fit for busy people, they don’t suffer from the cognitive load and friction of moving between the walled gardens of inconsistently designed apps (something I’ve called Beautiful Room Syndrome), and they have a slew of tangential benefits that might not be obvious at first. For example, our data suggests that users of our skill skew older. I find this fascinating! And entirely obvious, when you think about it.

There’s a whole generation of people for whom technology is alien and removed from the kinds of interactions they’re used to. Now, almost out of nowhere, deep learning, natural language processing, neural networks, speech recognition and cloud computing power have matured enough to enable a kind of interaction that is at once startlingly new and compelling, yet obvious, inevitable and natural. At last, people who would otherwise have been forced to learn the complexities and vagaries of touchscreen interfaces to engage with the digital world will be given access through an interface they’ve been using since childhood.

Amazon clearly recognised the new market they were unlocking. After the Amazon Echo speaker (around £150), they quickly followed up with a range of new devices and price points. Possibly the most compelling is the £50 Echo Dot, a device barely larger than a paperback on its side, but packing the same far-field microphone technology, which lets it hear you across the room, and all the same Alexa-enabled smarts as its more expensive cousins. With the launch of the Echo Show, Amazon have addressed one of the more significant constraints of a voice-only interface: we live in an information age, and sometimes it’s just better to show users what they’ve asked for, rather than describe it.

Designing For Alexa

Amazon’s design guidance on their screen-based devices is strong, and shows their obvious strategic push towards voice experiences that are augmented by simple information displays. Designing for the Show will give you all you need to translate your skill to Alexa on Fire tablets and Fire TVs, if and when Amazon enable these devices. It’s an inevitable natural progression of the voice interface, and Amazon have made some strategic design decisions to help make your skill as portable as possible.

For example, you don’t have control over all of those 1024×600 pixels. Instead, you have (at the moment) six customisable templates that you can insert content into. Ostensibly, there are two types: lists and blocks of text. Within those, you have four font sizes and a range of basic markup you can specify (bold, italic, etc.). You can also insert inline images (although not animated GIFs – we tried!) and ‘action buttons’, controls that fire the same action as if the user had said the command. Each template also contains a logo in the top right, a page title and a background image. It’s fair to say the slots you get to fill are fairly limited, but this is a deliberate and positive step for the Alexa user experience.


[For a more detailed breakdown of how to build an app for Echo Show, take a look at my colleague Andy May’s in-depth article]

One key element is the background image you can display on each screen. You can make your background work really hard, so definitely spend some time exploring concepts. Amazon’s guidance is to use a photo with a 70% black fill, but I found that too muddy, and it felt too dark for our brand. Instead, we used our brand’s signature colours for the background to denote each key stage of our flow. I like how this subliminally suggests where the user is in the flow (e.g. while you’re editing your basket, the background remains blue) and gives a sense of progression.

Top 10 Tips for Designing Voice Interactions

Be Voice First

You have to remember you’re designing an experience that is augmented with a visual display. This one’s probably the hardest to train yourself to think about – we’ve been designing UI-first visual interfaces for so long, that thinking in this voice-first way is going to feel really unnatural for a while. Start by nailing your voice-only flows first, then tactically augment that flow with information using the screen.

The 7ft Test

Amazon provide four font sizes for you to use: small, medium, large and extra large. You have to make sure crucial information is large enough to be read from 7ft away. Remember: users will almost certainly be interacting only with their voice, probably from across the room.


Be Context Aware

Your users have chosen to use your Alexa skill over your iOS app. Be mindful of that reason and context. Maybe their hands are busy making something? Maybe they’re dashing through the kitchen on their way out, and just remembered something? Maybe they’re multi-tasking? Maybe they’re an older user who is engaging with your brand for the first time? Use research to figure out how and why your users use your voice skill, and use that insight to design to that context.

Don’t Just Show What’s Said

An obvious one, but worth mentioning in this new world. Your engineers will need to build a view to be shown on the screen for each state of your flow. The Show platform will not show a ‘default’ screen automatically (which, we admit, is kinda weird), so otherwise you’ll end up showing old content while talking about something at an entirely different stage of the flow. Super confusing to the user. So we found it useful to start by building screens that displayed roughly what was being spoken about, for every state.


This will let you, the designer, make sure you’ve nailed your voice experience first, before then cherry-picking what you want to display at each state. You can use the display to show more than you’re saying, and even give additional actions to the user. Remember, like all good UX, less is most definitely more. Use the screen only when you think it would significantly add to the experience. If not, just display a shortened version of what you’re asking to the user. Typically, this could be one or two verb-based words, displayed in large font size.
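One way to guarantee that every state has a screen is sketched below, with hypothetical state names. Every state starts on a plain screen that mirrors the speech in a large font. Individual screens are then replaced as the design matures.

```javascript
// Hypothetical flow states. Every state gets a screen, because the Show
// keeps displaying the last template it was sent.
const FLOW_STATES = ["CHOOSE_CUISINE", "BASKET", "ALLERGY", "CHECKOUT"];

// Default screen: roughly what Alexa is saying, in a large font.
const mirrorScreen = (state, spoken) => ({
  type: "BodyTemplate1",
  token: state,
  title: spoken,
  textContent: {
    primaryText: { type: "RichText", text: `<font size="7">${spoken}</font>` },
  },
});

// Start every state on the mirror screen, then override the ones worth designing.
const screens = Object.fromEntries(
  FLOW_STATES.map((state) => [state, (spoken) => mirrorScreen(state, spoken)])
);
screens.BASKET = (spoken) => ({ ...mirrorScreen("BASKET", spoken), title: "Your basket" });

function screenFor(state, spoken) {
  const build = screens[state];
  if (!build) throw new Error(`No screen defined for state ${state}`);
  return build(spoken);
}

console.log(screenFor("ALLERGY", "Do you have any allergies?").title); // Do you have any allergies?
```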

Be careful with lists

In fact, be careful with how much information you’re saying, period. It’s a good design tip to chunk lists when reading them out (e.g. ‘this, this, this and this. Want to hear five more?’), but when you’ve got a screen, you can subtly adjust what you say to cue the user to look at the screen. You could, for example say ‘this, this, this and these five more’ while showing all eight on the screen.
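The two ways of reading a list can be generated from the same data. Below is a sketch using a hypothetical listSpeech helper with a default chunk size of three.

```javascript
// Join with commas and a final "and": ["a", "b", "c"] -> "a, b and c".
function naturalJoin(items) {
  if (items.length <= 1) return items.join("");
  return `${items.slice(0, -1).join(", ")} and ${items[items.length - 1]}`;
}

// Hypothetical list-reading helper. Voice-only devices hear one chunk and an
// offer of more; with a screen, every item is shown and the speech cues the
// user to look at it.
function listSpeech(items, hasDisplay, chunk = 3) {
  const spoken = items.slice(0, chunk);
  const rest = items.length - spoken.length;
  if (rest === 0) return `${naturalJoin(spoken)}.`;
  if (hasDisplay) return `${spoken.join(", ")} and these ${rest} more.`;
  return `${naturalJoin(spoken)}. Want to hear ${Math.min(rest, chunk)} more?`;
}

const items = ["A", "B", "C", "D", "E", "F", "G", "H"];
console.log(listSpeech(items, true)); // A, B, C and these 5 more.
console.log(listSpeech(items, false, 4)); // A, B, C and D. Want to hear 4 more?
```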


If you’re building a VUI with multiple steps in the flow, make sure you’re consistent in what you’re showing on screen. This is one of the few tips you can carry over from the world of visual UI design. Make sure you have consistent page titles, your background images follow some kind of semantically-relevant pattern (images related to the current task, colours that change based on state, etc…) and that you refer to objects in your system (verbs, nouns) repeatedly in the same way. You can (and should) vary what you say to users – humans expect questions to be asked and information to be presented in slightly different ways each time, so it feels more natural to be asked if they want to continue using synonymous verbs (‘continue’, ‘carry on’, ‘move on’, etc…). This is more engineering and voice design work, but it will make your experience feel incredibly endearing and natural.

Be Wary of Sessions

Remember what your user was doing, and decide whether you want to pick up that flow again next time they interact. If you’re building an e-comm flow, maybe you persist the basket between sessions. If you’re getting directions, remember where the user said they wanted to go from. This advice applies equally to non-screen Alexa devices, but it’s critical on the Show because of the way skills time out if not interacted with. Users can tap the screen at any time in your flow; when they do, Alexa stops speaking and the user has to say “Alexa” to restart the conversation. If they don’t, your skill will remain on screen for 30 seconds before returning to the Show’s home screen. When your user interacts with your skill again, you should pick up from where they were, in whatever way makes sense for your skill. You could ask if they want to resume where they left off, or you could check how long it’s been since they last interacted and, if it’s been a couple of days, decide they probably want to start again.
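Below is a minimal sketch of the second approach. It assumes the flow state, basket and a timestamp are persisted between sessions, for example in a database keyed on the Alexa user ID. The two-day window and the action names are hypothetical.

```javascript
// Hypothetical resume window: after two days, assume the user wants to start afresh.
const RESUME_WINDOW_MS = 2 * 24 * 60 * 60 * 1000;

// `saved` is whatever we persisted at the end of the last interaction,
// e.g. { state: "BASKET", basket: [...], lastInteraction: <ms timestamp> }.
function onSkillLaunch(saved, now = Date.now()) {
  if (!saved || !saved.state) {
    return { action: "START" };
  }
  if (now - saved.lastInteraction > RESUME_WINDOW_MS) {
    // Too long ago to resume the flow, but keep the persisted basket.
    return { action: "START", basket: saved.basket };
  }
  return { action: "OFFER_RESUME", state: saved.state, basket: saved.basket };
}
```

OFFER_RESUME would then lead to a "Do you want to carry on where you left off?" prompt rather than jumping straight back into the old state.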

Show the prompt to continue on screen

This one is super-critical on the Echo Show. Best practice suggests that you should put your prompt question (the thing Alexa will be listening for the answer to) at the end of her speech. But if the user starts interacting with the screen, Alexa will immediately stop talking, so the user won’t hear the question and won’t know what to say to proceed. You need to decide what’s best for your skill, but we found that putting the prompt question in the page title (and doing it consistently on every page) meant users could safely interrupt to interact with the screen, while still having a clear indication of how to proceed.
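Below is a sketch of a response built that way. It assumes the standard reprompt field is used to repeat the question. promptedResponse is a hypothetical helper, and the Display directive should still only be sent to devices that support it.

```javascript
// Hypothetical helper: the question Alexa is waiting on ends the speech,
// becomes the reprompt, and is also the page title, so it stays visible
// even if a tap on the screen cuts Alexa off mid-sentence.
// (Only attach the directive for devices that support Display.)
function promptedResponse(statement, prompt, template) {
  return {
    outputSpeech: { type: "PlainText", text: `${statement} ${prompt}` },
    reprompt: { outputSpeech: { type: "PlainText", text: prompt } },
    directives: [{ type: "Display.RenderTemplate", template: { ...template, title: prompt } }],
    shouldEndSession: false,
  };
}

const res = promptedResponse("Your basket comes to £18.50.", "Pay now, or change something?", {
  type: "BodyTemplate1",
  token: "checkout",
});
console.log(res.directives[0].template.title); // Pay now, or change something?
```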


Worship your copywriter

This tip applies to non-screen voice interfaces too, but it really takes the nuanced skills of a professional wordsmith to tailor the same message to be spoken, written in the companion app card, and displayed on the limited real estate of the Echo Show screen. Make sure you’re good friends with your team’s copywriter. Buy them beer regularly and keep them close to the development of your voice interface. Encourage them to develop personality and tone-of-voice style guides specifically for VUIs. They’re as much a core part of your design team as UX designers or user researchers. Treat them well.

In terms of user testing, we weren’t able to work with actual customers to test and iterate the designs for the Echo Show, as we routinely do with all our other products, because of the commercial sensitivity around the Echo Show’s UK release. So we had to make the best judgements we could, based on the analytics we had and some expert reviewing within the team 😉 That said, we did plenty of internal testing with unsuspecting reception staff and people from other teams. Nielsen’s guidance still stands (five users can surface 80% of usability issues), and we definitely found UX improvements, even testing with internal staff. Aside from the Show, we test future concepts in a wizard-of-oz style, with one of us dialling in to the test lab and pretending to be Alexa. We get a huge amount of insight without writing a single line of code using this method, but that’s a whole other blog post for another day 😉

So there we go. Armed with these words of wisdom, and your existing voice-first skill, you should be fully equipped to create the next big app for the next big platform. Remember: think differently, this market is very new, look for users outside your traditional demographics and be prepared to keep your skills updated regularly as tech and consumer adoption changes. Good luck!

Craig Pugsley
Bristol, UK – Sept 2017

To find out more about the Just Eat Alexa Skill visit: //www.just-eat.co.uk/alexa

For more information on designing for Alexa, visit: //developer.amazon.com/designing-for-voice/


Beautiful Rooms & Why Smartphones Are Too Dumb

Some time in the future, the age of the smartphone will draw to a close and experiences will become more in tune with the way humans actually live. We need to be thinking about this new wave of interactions at a time when our customers’ attention is at a premium. We need to be augmenting their worlds, not trying to replace them…

I’m Craig Pugsley – a Principal UX Designer in Product Research. Our team’s job is to bring JUST EAT’s world-leading food ordering experience to the places our consumers will be spending their future, using technology that won’t be mainstream for twelve to eighteen months.

It’s a great job – I get to scratch my tech-geek itch every day. Exploring this future-facing tech makes me realise how old the systems and platforms we’re using right now actually are. Sometimes it feels like we’ve become their slaves, contorting the way we want to get something done to match the limitations of their platforms and the narrow worldview of the experiences we’ve designed for them. I think it’s time for change. I think smartphones are dumb… I feel like we’ve been led to believe that ever more capable cameras or better-than-the-eye-can-tell displays make our phones more useful. For the most part, this is marketing nonsense. For the last few years, major smartphone hardware has stagnated – the occasional speed bump here, the odd fingerprint sensor there… But nothing that genuinely makes our phones any smarter. It’s probably fair to say that we’ve reached peak phone hardware.


What we need is a sea-change. Something that gives us real value. Something that recognises we’re probably done with pushing hardware towards ever-more incremental improvements and focuses on something else. Now is the time to get radical with the software.

I was watching some old Steve Jobs presentation videos recently (best not to ask) and came across the seminal launch of the first iPhone. At tech presentation school, this Keynote will be shown in class 101. Apart from general ambient levels of epicness, the one thing that struck me was how Steve referred to the iPhone’s screen as being infinitely malleable to the need – we’re entirely oblivious to it now, but at that time phones came with hardware keyboards. Rows of little buttons with fixed locations and fixed functions. If you shipped the phone but thought of an amazing idea six months down the line, you were screwed.

In his unveiling of the second generation of iPhone, Jobs sells it as being the most malleable phone ever made. “Look!” (he says), “We’ve got all the room on this screen to put whatever buttons you want! Every app can show the buttons that make sense to what you want to do!”. Steve describes a world where we can essentially morph the functionality of a device purely through software.


But we’ve not been doing that. Our software platforms have stagnated just as our hardware has. Arguably, Android is still struggling with basic usability issues; only recently have the worst bloatware offenders stopped crippling devices out of the box. iOS’s icon-based interface hasn’t changed since it came out. Sure, more stuff has been added, but we’re tinkering at the edges, just as we’ve been doing with the hardware. We need something radically different.

One of the biggest problems I find with our current mobile operating systems is that they’re ignorant of the ecosystem they live within. With our apps, we’ve created these odd little spaces, completely oblivious to each other. We force you to come out of one and go in the front door of the next. We force you to think first not about what you want to do, but about the tool you want to use to do it. We’ve created beautiful rooms.

Turning on a smartphone forces you to confront the rows and rows of shiny front doors. “Isn’t our little room lovely” (they cry!) “Look, we’ve decorated everything to look like our brand. Our tables and chairs are lovely and soft. Please come this way, take a seat and press these buttons. Behold our content! I think you’ll find you can’t get this anywhere else… Hey! Don’t leave! Come back!”

“Hello madame. It’s great to see you, come right this way. Banking, you say? You’re in safe hands with us. Please take a seat and use this little pen on a string…”

With a recent iOS update, you’re now allowed to take a piece of content from one room and push it through a little tube into the room next door.

Crippled by the fear of alienating their existing customers, Android and iOS have stagnated. Interestingly, other vendors have made tantalizing movements away from this beautiful-room paradigm into something far more interesting. One of my favorite operating systems of all time, WebOS, shipped with the first Palm Pre.


There was so much to love about both the hardware and software for this phone. It’s one of the tragedies of modern mobile computing that Palm weren’t able to make more of this platform. At the core, the operating system did one central thing really, really well – your services were integrated at a system level. Email, Facebook, Twitter, Flickr, Skype, contacts – all managed by the system in one place. This meant you could use Facebook photos in an email. Make a phone call using Skype to one of your contacts on Yahoo. You still had to think about what beautiful room you needed to go into to find the tools you needed, but now the rooms were more like department stores – clusters of functionality that essentially lived in the same space.

Microsoft took this idea even further with Windows Phone. The start screen on a Windows Phone is a thing of beauty – entirely personal to you, surfacing relevant information, aware of both context and utility. Email not as important to you as Snapchat? No worries, just make the email tile smaller and it’ll report just the number of emails you haven’t seen. Live and die by Twitter? Make the tile huge and it’ll surface messages or retweets directly in the tile itself. Ambient. Aware. Useful.



Sadly, both these operating systems have tiny market shares.

But the one concept they both share is a unification of content. A deliberate, systematic and well executed breaking down of the beautiful room syndrome. They didn’t, however, go quite far enough. For example, in the case of Windows Phone, if I want to contact someone I still need to think about how I’m going to do it. Going into the ‘People Hub’ shows me people (rather than the tools to contact them), but is integrated only with the phone, SMS and email. What happens when the next trendy new communication app comes along and the People Hub isn’t updated to support the new app? Tantalizingly close, but still no cigar.

What we need is a truly open platform. Agnostic of vendors and representing services by their fundamentally useful components. We need a way to easily swap out service providers at any time. In fact, the user shouldn’t know or care. Expose them to the things they want to do (be reminded of an event, send a picture to mum, look up a country’s flag, order tonight’s dinner) and figure out how that’s done automatically. That’s the way around it should be. That’s the way we should be thinking when designing the experiences of the future.


Consider Microsoft’s Hololens, which was recently released to developers outside of Microsoft. We can anticipate an explosion of inventiveness in the experiences created – the Hololens being a unique device leapfrogging the problem of beautiful rooms to augment your existing real-world beautiful rooms with the virtual.


Holographic interface creators will be forced to take into account the ergonomics of your physical world and work harmoniously, contextually, thoughtfully and sparingly within it. Many digital experience designers working today should admit that they rarely consider what their users are doing just before or just after using their app. This forces users to break their flow and adapt their behavior to match the expectations of the app. As users, we’ve become pretty good at rapid task switching, but doing so takes attention and energy away from what’s really important: the real world and the problems we want to solve.

Microsoft may be one of the first to market with Hololens, but VR and AR hardware is coming fast from the likes of HTC, Steam, Facebook and Sony. Two-dimensional interfaces are on the path to extinction, a singular event that can’t come quick enough.