Things aren’t, of course, quite so straightforward. The massive datasets required to train Natural Language Processing (NLP) engines—which help voice-driven interfaces execute search queries with greater accuracy—don’t exist in regional languages.
According to the Annual Digital Indian Language Report published by Reverie, an Indian technology firm enabling Indian languages across various devices, regional language content formed a minuscule 0.1% of the total online content as of 2017. As a result, there aren’t enough web pages for the NLP software to crawl through and extract voice output from.
Agentive Design Tech
An artificial intelligence method named ‘Agentive Design Technology’, used in assistants like Siri and Alexa, allows voice assistants to learn user habits and behavior in order to make better suggestions.
And the problem doesn’t end with maintaining the quality of the response. How the app delivers the response could also pose problems. This is something that Gopal, Reverie’s voice assistant, is facing. Reverie found that just using voice didn’t make for a smooth end-to-end process. The voice assistant also had to understand the assortment, quantity, and product categories.
Take, for instance, buying apples. A simple query like Mujhe seb khareedne hai (order me apples) wouldn’t be enough for Gopal to place an order.
Even if it gets the quantity right by asking Kitna? (how much?), it will still have to choose from a list of apple varieties and price ranges. It might start to list apple varieties of every price. That would considerably slow down the buying process, possibly leaving the user confused as well. It would also require work to get Gopal to identify good deals when a consumer asks for them. “Just quoting the cheapest item won’t do the job,” adds Vivekananda Pani, co-founder and CEO of Reverie.
To address this challenge, Slang Labs and Reverie are training their engines to identify words or phrases rather than translating entire sentences.
For instance, “order me apples” can be broken down into different groups. “Order” falls into an action category, “apples” into a product category, and other groups can cover attributes such as price range and color. To reduce transaction time further, Reverie sees a combination of voice, text and visual components as the optimal approach, says Pani.
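To make the word-and-phrase approach concrete, here is a minimal Python sketch of keyword spotting. It is an illustration only, not how Reverie or Slang Labs actually build their engines: the vocabulary, the slot names, the quantity pattern and the function names `spot_slots` and `missing_slots` are all assumptions made for this example.

```python
import re

# Hypothetical vocabulary: every entry is an illustrative assumption,
# not data from Reverie or Slang Labs. Each known word maps to a
# normalized value inside a slot (category).
SLOT_VOCAB = {
    "action": {"order": "order", "buy": "order", "khareedne": "order"},
    "product": {"apple": "apple", "apples": "apple", "seb": "apple"},
    "color": {"red": "red", "green": "green", "laal": "red"},
}

# Assumed pattern for quantities such as "2 kg" or "3 kilos".
QUANTITY_PATTERN = re.compile(r"(\d+)\s*(?:kg|kilo|kilos)\b")


def spot_slots(utterance):
    """Tag known words with a slot; ignore the rest of the sentence."""
    text = utterance.lower()
    slots = {}
    for token in re.findall(r"\w+", text):
        for slot, vocab in SLOT_VOCAB.items():
            # Keep the first match per slot and skip unknown words.
            if token in vocab and slot not in slots:
                slots[slot] = vocab[token]
    match = QUANTITY_PATTERN.search(text)
    if match:
        slots["quantity"] = f"{match.group(1)} kg"
    return slots


def missing_slots(slots, required=("action", "product", "quantity")):
    """List the slots the assistant would still have to ask about."""
    return [name for name in required if name not in slots]


if __name__ == "__main__":
    for phrase in ("order me apples", "Mujhe seb khareedne hai"):
        found = spot_slots(phrase)
        print(phrase, found, missing_slots(found))
```

Under these assumptions, the English and the Hindi phrasing fill the same two slots without either sentence being translated in full, and the missing quantity slot is what would prompt a follow-up such as Kitna?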
The focus of their efforts—both for Reverie’s Gopal and Slang Labs—has also been narrowed as a result. Both have zeroed in on very specific use cases or internet segments to double down on. “Segments that we are looking at have been (online) travel, fashion, and e-commerce segments, or essentially where there is a transaction to be done,” says Rangarajan.
Taking an example of a service like Swiggy, he points out that Swiggy’s “proprietary data” such as restaurant data, cuisines, dishes and preferences can act as data points to train the NLP engine. “We are coming in as a tech layer by providing all the pieces that are required to recognize a user’s voice for search and discovery in an e-commerce app,” Rangarajan adds.
Walking the talk
Yes, the use cases of vernacular voice tech are not as expansive as one might have hoped. Sarath Naru, managing partner, Ventureast Capital, however, believes their limited focus is the way to go. Ventureast is an investor in Indus OS, a mobile software company that develops operating systems in regional languages, and had backed Rangarajan’s earlier venture Little Eye Labs.
“When it comes to voice assistants in India, I don’t think only one company or stakeholder can make it big. There are plenty of applications to crack, and one single startup or product won’t be able to provide everything. I believe the Indian developers will build [voice recognition] around niche internet use cases, while the big shots like Google and Alexa will build a broader voice assistant for browsing content online,” he says.