Amazon’s Alexa has started to crawl, aged 6. It matters.

The new Echo Show 10 is a landmark device that asks as many questions of us as we will ask it.

David Kerrigan
Mar 10, 2021

If you ask Alexa her birthday, she’ll tell you it was 6th November 2014, some 3 years after she was conceived. She came into the world with ears (7 microphones) and a mouth (two speakers), somewhat disembodied in a 9.25” tall metal cylinder. Then, in mid-2017, with the Echo Show, she opened her eyes (well, eye: a single 5-megapixel camera) to enable video calls, and she gained a screen that let her show us answers, not just speak them. In late 2019, she learned the amazing trick of identifying objects shown to her in response to the simple question “Alexa, what am I holding?”, a real boon for people with impaired eyesight.

The arrival of the third-generation Echo Show 10 marks another huge milestone in her evolution: now she can move to interact with us better. And so the way we think about technology needs to move too. Just as parents need to increase their vigilance once their children start to move, we need to keep an eye on where Alexa is going.

A small step for Alexa, a giant leap for HCI

The new Echo Show 10 has a 10-inch screen that can swivel silently to face you as you move around the room. With almost unwavering accuracy (as long as the room isn’t too dark), the device keeps its screen visible to you wherever you are. It can turn quickly enough on its axis to keep up with a brisk walk, with up to 350 degrees of movement (positioning and space permitting), and it thankfully has the smarts to stop if it bumps into something. The motor can turn at up to 180 degrees per second but operates more slowly, to avoid mishaps and keep the picture smooth.

The Echo Show 10 can swivel its 10” screen and 13mp camera (Image courtesy Amazon)
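To make the swivel behavior described above a little more concrete, here is a minimal Python sketch of a rate-limited pan controller. It is not Amazon’s firmware and makes no claim about how the device is actually programmed: the function name next_angle, the modelling of the 350-degree range as -175 to +175 degrees, the 90-degrees-per-second cruise speed and the bumped flag are all illustrative assumptions. Only the 350-degree range and the 180-degrees-per-second ceiling come from the figures above.

```python
# Hypothetical sketch only: NOT Amazon's firmware or API.
# Figures taken from the article: ~350 degrees of travel, motor ceiling of 180 deg/s.
MAX_TRAVEL_DEG = 350.0   # usable range, modelled here (assumption) as [-175, +175]
MOTOR_MAX_DPS = 180.0    # hardware ceiling in degrees per second
CRUISE_DPS = 90.0        # assumed, illustrative everyday speed (slower than the ceiling)


def clamp(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def next_angle(current: float, target: float, dt: float,
               speed: float = CRUISE_DPS, bumped: bool = False) -> float:
    """Return the screen angle after moving toward target for dt seconds."""
    if bumped:
        # Stop where we are if an obstacle was hit.
        return current
    half = MAX_TRAVEL_DEG / 2
    target = clamp(target, -half, half)        # never drive past the end stops
    step = min(speed, MOTOR_MAX_DPS) * dt      # largest move allowed this tick
    delta = target - current
    if abs(delta) <= step:
        return target                          # close enough: arrive exactly
    return current + step if delta > 0 else current - step
```

Calling it repeatedly with a small time step, for example ten calls with dt=0.1 toward a target at 120 degrees, moves the screen 9 degrees per call. Limiting each step, rather than jumping straight to the target, is one simple way to keep motion smooth.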

Although Alexa started life in smart speakers, she has quickly taken up residence in hundreds of form factors, shaking off the power cord for the freedom of wearable and mobile devices. But the new Echo Show 10 is different from Alexa in Fitbits, earbuds (Echo Buds), spectacles (Echo Frames), rings (the short-lived Echo Loop) or mobile phones. A wearable Alexa device still offers a static interaction: it goes only where you take it. The Echo Show 10’s swiveling screen, by contrast, imbues her with a new energy. It feels as if she’s more available than ever; there’s a ‘presence’ that wasn’t there before.

How we feel

Before my Echo Show 10 arrived, every review I read used the word “creepy”. The Washington Post called it “invasive” and the WSJ “somewhat creepy”. For the first few hours after I set up my device, I was impressed that, as I walked around my apartment and glanced towards it, it was indeed somehow always facing me. It was definitely a little unnerving, but because I know the device tracks me without compromising my privacy (see below), I didn’t mind. Within a day, I was so un-creeped out that I found myself disapproving of my previous Echo Show, sitting there statically and unhelpfully not bothering to look at me.

Amazon and other companies exploring this space face a daunting public-awareness challenge: admittedly, there are valid security and privacy concerns in the smart device sector that need to be addressed. But with the Echo Show 10, despite appearances, nobody is watching you. The device uses a combination of sound and video to work out which way it needs to point. If you address it, it will try to swivel towards where it heard its name; Echo microphones can identify the origin of a request to within ±30 degrees. The camera and its algorithms then try to pick out a human shape to narrow the focus. To address privacy concerns, all of this processing happens on the device, without sending scenes from your home to Amazon. In fact, as the image below shows, in the interests of both privacy and efficiency, the Echo Show 10’s on-device processing system turns the camera image into hundreds of data points representing shapes, edges, facial landmarks and general coloring, and then deletes the image. If you want to read more technical details, check out this page.

How Echo Show 10 tracks human shapes while protecting privacy — Image courtesy Amazon
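The coarse-to-fine idea in the paragraph above (the microphones give a rough direction within ±30 degrees, and the camera narrows it down to a human shape) can be sketched in a few lines of Python. This is a hypothetical illustration, not Amazon’s algorithm: choose_heading and angular_diff are invented names, and the sketch assumes both sensors report bearings in degrees in the same room-fixed frame. It works only with bearings, never with images, which loosely mirrors the privacy design described above.

```python
# Hypothetical sketch of coarse-to-fine audio/visual direction finding.
# NOT Amazon's implementation; names and frame conventions are assumptions.
from typing import Sequence

MIC_TOLERANCE_DEG = 30.0  # article: microphones locate a request within +/- 30 degrees


def angular_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def choose_heading(mic_bearing: float, person_bearings: Sequence[float]) -> float:
    """Refine a coarse audio bearing using bearings of human shapes seen by the camera."""
    # Keep only detected shapes that fall inside the microphones' uncertainty window.
    candidates = [p for p in person_bearings
                  if abs(angular_diff(p, mic_bearing)) <= MIC_TOLERANCE_DEG]
    if not candidates:
        # No plausible person in view: fall back to the audio estimate.
        return mic_bearing
    # Otherwise point at the candidate closest to where the voice came from.
    return min(candidates, key=lambda p: abs(angular_diff(p, mic_bearing)))
```

For example, a request heard at 40 degrees with shapes detected at 10, 55 and 120 degrees resolves to 55, the detected shape closest to the audio bearing; with no shape inside the window, the sketch simply trusts the microphones.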

We Need to Talk about Alexa

So why all the worry about it being creepy? There’s a physical cover for the camera, and a simple command stops the unit from tracking you. Although I often despair at reviewers’ simplistic and sensationalist use of ‘creepy’ to describe any technology that challenges convention, the instructive point is that the word denotes an emotional response. What is it about the technology that unsettles some people? How can device manufacturers address people’s concerns?

From the Q&A after a recent talk I gave on AI and the future of retail, and from much of the coverage of Amazon’s sensor-laden Go stores, it’s becoming very clear that, outside tech circles, people don’t distinguish between facial recognition and human-shape tracking. Our relationship with technology is evolving fast and, to date, many people lack the understanding they need to process it and to engage constructively in the very important debates we must have about the appropriateness of technology. People thought automated elevators were creepy at first too. Are doors that slide open as we approach them creepy? Is CCTV in stores for security purposes creepy? Is AI making decisions about our creditworthiness creepy? If a smart device like an Echo could save our lives by detecting health problems, is that creepy? We need to better explore what can and should be done with technology in a privacy-respecting way. How do we capture its convenience benefits without losing what’s important to us?

Crawl before you…

The first mainstream domestic smart device that moves is an important milestone in human-computer interaction. More than just waking when it hears its name, the device responds to us by reorienting itself towards us. Something feels different about its ability to follow you: your smart assistant seems more present and less passive. But many of us aren’t ready for technology to become more active. Just because it can, should it?

We were promised ambulant robots in our homes decades ago, but motion is technologically difficult. Our smart assistants can control smart homes, answer questions and do other things that seemingly require huge levels of intelligence, yet they remain stationary, an illustration of Moravec’s paradox: the observation that, contrary to traditional assumptions, high-level reasoning requires very little computation, while sensorimotor skills (like walking) require enormous computational resources. There have, of course, been previous attempts at domestic robots with motion and personality (Jibo, Aido and Kuri are examples if you want to Google them), but the fact that this is a (relatively) affordable, already-shipping product from smart home leader Amazon makes it especially significant, even if not everyone will welcome this development.

The Echo Show 10 is not just about its rotating screen: it’s an early indicator of a new generation of technology that not only responds to us but interacts with us. Computers that can see us. Computers that move. The Show 10 may only swivel around a fixed point, an early and limited attempt at motion, but Alexa is now at the crawling stage. Rumours abound that Amazon Labs are hard at work on making her walk, or at least roll, around the house. It may have taken her 6 years to begin looking around, but I wouldn’t bet against her flying, walking and running before she’s 10. And we need to decide how we feel about that.


Thoughts about technology and society. Author of five books: details at https://david-kerrigan.com