How we built what3words voice input
By Josh Wigmore, Head of Product at what3words
We’re always looking for easier ways for people to use 3 word addresses. After all, one of the key benefits of 3 word addresses over traditional addresses is that they’re short and made of three dictionary words. This makes them very quick to enter into navigation software, ride-hailing apps and delivery forms.
This benefit is even clearer when someone speaks a 3 word address. It’s very easy to say a 3 word address and, as voice tech continues to advance, we knew this would be a key input method that we wanted to offer our users.
We were presented with a somewhat paradoxical challenge, though: virtually all major speech recognition platforms have recently been investing in Natural Language Processing (NLP) technology. Voice technology has become conversational, aware of commonly used phrases and word combinations, and can intelligently interpret what people say, correcting it where needed. For example:
Switch on the lightning in the living room → Switch on the lighting in the living room.
Based on the words ‘switch on’ and ‘living room’, the technology intelligently guesses that you’re unlikely to have meant ‘lightning’ in a house and corrects it to a more likely alternative.
what3words voice input, however, needed to recognise three intentionally random words taken from our wordlists, which contain around 40,000 words in English. The three words that make up a 3 word address are deliberately unrelated to each other, and similar-sounding 3 word addresses are located as far apart geographically as possible to help people spot errors. Both properties are the opposite of what conversational NLP expects, and that is exactly the challenge here.
Fortunately, we have been able to leverage Nuance VoCon Hybrid voice technology, which can be ‘taught’ to select only from a fixed dictionary of words, a constraint our wordlists fit perfectly. Since an entered 3 word address must come from one of our 28+ language wordlists, they make an ideal source for training a speech recognition engine, and have allowed us to achieve average recognition accuracy of over 95% in many of our major dialects.
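The effect of constraining recognition to a fixed vocabulary can be sketched in a few lines of Python. This is purely illustrative, not the VoCon API: we assume the recogniser emits scored three-word hypotheses (the wordlist, hypotheses and confidence values below are all made up), and simply discard any hypothesis containing a word outside the wordlist.

```python
# Illustrative only: a toy fragment of a wordlist. The real English
# wordlist contains around 40,000 words.
WORDLIST = {"filled", "count", "soap", "field", "soup", "mount"}

def constrain_to_wordlist(hypotheses):
    """Keep only hypotheses in which all three words are valid wordlist
    entries, then rank the survivors by their confidence score."""
    valid = [
        (words, conf) for words, conf in hypotheses
        if all(w in WORDLIST for w in words)
    ]
    return sorted(valid, key=lambda h: h[1], reverse=True)

# Hypothetical recogniser output: (three recognised words, confidence).
hypotheses = [
    (("filled", "count", "soap"), 0.91),
    (("field", "count", "soup"), 0.47),
    (("filled", "county", "soap"), 0.85),  # 'county' is not in the wordlist
]

for words, conf in constrain_to_wordlist(hypotheses):
    print(".".join(words), conf)
```

In the real system this constraint is built into the recognition grammar itself rather than applied afterwards, which is what makes the fixed-dictionary approach so effective.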
Next we thought about how a user could best benefit from voice with their 3 word address. The mechanism is very much an input method rather than a search, in that a user needs to know their 3 word address to use it.
So the most compelling reason for users to opt for voice entry is when they are rushing or otherwise unable to use the keyboard – a situation that often arises when driving or hurrying for a taxi. This is why we are primarily exploring voice as an input method for navigation, whether via third-party navigation or ride-hailing apps, or via the in-app navigation we are developing. This will give users an even easier way to navigate with 3 word addresses, without having to leave the app.
We embedded voice entry as a button in the search screen, to reinforce the user understanding that it is an alternative input method to typing. As our focus on this user journey grows, we are exploring how we could give voice entry a more prominent position when the user enters the app, reducing the taps required to navigate to a 3 word address with voice.
The input is captured using the device microphone and passed through the Nuance VoCon Hybrid SDK, which returns JSON; this is fed into optimised post-processing functions in our own what3words SDK. Leveraging our AutoSuggest function, we then suggest and display the top three 3 word addresses we believe may have been spoken, giving extra weight and ranking to locations close to the user. This decision was based on the belief that, more often than not, users will be navigating or taking a ride to a location within 100km of their current position.
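That distance weighting can be sketched as follows. Everything here is hypothetical — the candidate data, the confidence scores and the flat boost for locations within 100km are a simplification for illustration, not the actual AutoSuggest scoring:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def rank_candidates(candidates, user_lat, user_lon, top_n=3):
    """Re-rank recognition candidates by confidence, boosting any whose
    location lies within 100km of the user (illustrative weighting)."""
    def score(c):
        _, conf, lat, lon = c
        boost = 1.5 if haversine_km(user_lat, user_lon, lat, lon) <= 100 else 1.0
        return conf * boost
    return sorted(candidates, key=score, reverse=True)[:top_n]

# Hypothetical candidates: (address, confidence, latitude, longitude).
candidates = [
    ("filled.count.soap", 0.80, 51.52, -0.195),   # near a London user
    ("filled.count.soup", 0.82, -33.87, 151.21),  # Sydney, far away
    ("field.count.soap",  0.55, 51.60, -0.30),    # also near London
]

# User located in central London.
for address, conf, lat, lon in rank_candidates(candidates, 51.5074, -0.1278):
    print(address)
```

Note how the nearby candidates outrank the distant one even when its raw confidence is slightly higher — the behaviour the 100km assumption is designed to produce.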
Once the user selects the correct 3 word address, they see the map screen centred on their result, with tools to further interact with that 3 word address, such as saving it to their account, sharing it with friends or navigating to it with one of their installed third-party apps. To further streamline voice navigation, we plan to let power users predefine their preferred navigation app, allowing them to navigate to a 3 word address immediately after speaking it, without any further interaction in the app.
As a further optimisation, we could allow this entire user journey to begin from the device lock screen, triggered via a Siri or Google Assistant shortcut – perhaps something like ‘Hey Siri, ask what3words to navigate to filled.count.soap’. Due to the conversational speech recognition issues mentioned above, however, we would need to split the speech handled by Siri from that handled by Nuance. This is something we’re continuing to work on, and we are confident that voice is, and will remain, one of the simplest and easiest ways to interact with a 3 word address.