After years of raising 3 kids, you would think if I ask to add diapers to the cart, it would know something. But no, it would just go with whatever is the top recommended, or first in a search, or something like that. Nothing using the brand or most recent sizes we purchased.
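To make that concrete, here is a minimal sketch of the kind of heuristic I'd have expected. It is not how Alexa works; the `Purchase` record, the `pick_from_history` function, and the sample data are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Purchase:
    # Hypothetical order-history record; the field names are assumptions.
    category: str
    brand: str
    size: str
    bought_on: date


def pick_from_history(category: str, history: list[Purchase]) -> Optional[Purchase]:
    """Return the most recent purchase in the category, or None if there is none."""
    matches = [p for p in history if p.category == category]
    if not matches:
        return None  # the caller would fall back to ordinary search ranking
    return max(matches, key=lambda p: p.bought_on)


history = [
    Purchase("diapers", "BrandA", "3", date(2023, 1, 5)),
    Purchase("diapers", "BrandA", "4", date(2023, 6, 2)),
    Purchase("wipes", "BrandB", "n/a", date(2023, 6, 2)),
]
choice = pick_from_history("diapers", history)
print(choice.brand, choice.size)  # BrandA 4
```

Only when there is no history for the category would it need to fall back to the top search result.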
There was no serious attempt to drive real commerce. Instead, Alexa became full of recommendation slots that PMs would battle over. "I set that timer for you. Do you want to try the Yoga skill?"
On the other hand, they have taken on messy problems and solved them well, but not using technology, and for no real financial gain. For example, if you ask for the score of the Tigers game, Alexa has to work out which "Tigers" you mean: it has to weigh your own geography against every team of that name, at all levels from worldwide to local and across all sports, that might have had a game of interest. People worked behind the scenes to manage this manually, tracking teams of interest and filling intent slots daily.
The insane lack of basic heuristics in everyday apps for very obvious things, like the ones you mentioned, baffles me. They can come up with huge-scale fuzzy vector search AI suggestion systems for a billion users, but can't think to do stuff like only suggesting things available in your size?
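The size check really is a one-liner. A rough sketch, assuming each suggestion carries a set of in-stock sizes (the `sizes_in_stock` field and `filter_by_size` function are made up for the example):

```python
def filter_by_size(suggestions, user_size):
    """Keep only the items that list the user's size as in stock."""
    return [item for item in suggestions if user_size in item["sizes_in_stock"]]


# Hypothetical catalogue entries; the field names are assumptions.
suggestions = [
    {"name": "Trail shoe", "sizes_in_stock": {"9", "10", "11"}},
    {"name": "Road shoe", "sizes_in_stock": {"7", "8"}},
]
available = filter_by_size(suggestions, "10")
print([item["name"] for item in available])  # ['Trail shoe']
```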
I'm actually working on an app that solves this for a specific use case, tho it isn't in the retail space.
Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??
My question is always whether it’s just the company I’m at or whether it’s how the whole industry is run. The more companies I work at, the more I realize it’s the latter.
On the flip side, it’s easy to take for granted what DOES work when you know how much better it could be. I was sitting at dinner yesterday with a 73-year-old man who couldn't stop talking about how amazing Siri is 'cause it’ll tell him the population of some country.
This only goes so far, trying to use your own head as a simulation or approximation of user experience. Some of us will be building software for people who we will never be in our lifetime.
They're hard in different ways (and ML helped with voice recognition to a degree that PhD linguists struggled to reach for years).
But to your example. OK. Set and create probably mean the same thing in the context of a reminder. Probably add and a few other things too. Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.
> Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.
Yes. If it isn't obvious from the context, it should ask.
What it should not do is demand that you issue all your commands in the format of "${brand 1}, do ${something} with ${brand 2} in ${brand 3}". That's what makes current voice assistants so cringe.
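The "accept the obvious synonyms, ask when something is missing" behaviour could be sketched roughly like this. It is a toy illustration, not how any real assistant parses intents; the `CREATE_VERBS` table, the `parse_reminder` function, and the follow-up wording are all assumptions.

```python
# Hypothetical synonym table: several verbs map to one canonical intent.
CREATE_VERBS = {"set", "create", "add", "make"}


def parse_reminder(utterance: str) -> dict:
    """Map '<verb> reminder <text>' to a canonical intent, asking if text is missing."""
    words = utterance.lower().split()
    if len(words) < 2 or words[0] not in CREATE_VERBS or words[1] != "reminder":
        return {"intent": "unknown"}
    text = " ".join(words[2:])
    result = {"intent": "create_reminder", "text": text}
    if not text:
        # Nothing to remind about yet, so ask instead of failing.
        result["follow_up"] = "What should I remind you about?"
    return result


print(parse_reminder("set reminder buy milk"))
print(parse_reminder("create reminder buy milk") == parse_reminder("set reminder buy milk"))  # True
print(parse_reminder("add reminder"))
```

The same pattern extends to the due date or the target list: treat each as an optional slot and only ask when context doesn't fill it.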
> Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??
They hardly even managed the hard part. What surprises me is that for a year now, the ChatGPT app has been miles ahead of the SOTA in voice assistants in terms of speech-to-text, with whatever model it is they're using, and somehow none of the voice assistants managed to improve. OpenAI could blow them all out of the water today if they assigned a couple of engineers to spend a week integrating their app deeper into the Android intent system - and 90% of that wouldn't be because of GPT-4, but because of a speech-to-text model that doesn't suck donkey balls.
> somehow none of the voice assistants managed to improve.
No one has been working on the old generation of assistants for years now. They all basically came to the conclusion that the architecture that everyone had settled on was a dead end and wouldn't get any better, so they directed their attention elsewhere.
Now Google is working on it again, but just using an LLM for better intent parsing isn't exciting enough to warrant attention, so in classic Google fashion they launched a brand new product (Gemini) that's going to run alongside Assistant for a few years confusing everyone until they yank Assistant (which still will have features that haven't been ported).
Apple seems to be working on improving Siri rather than starting fresh, but it's taken them a while to get it ready because Apple never moves on something fast.
Actually, speech-to-text benefits massively from a good language model. It's impossible to do speech to text if you don't understand the language. The better you understand the language and the context of what is being said, the better you will be at speech-to-text. So it's no surprise whatsoever to anyone that the best-in-class language model would have the best in class speech-to-text.
I think a lot of people underestimate how disconnected simple sound patterns are from human speech. It's hard if not impossible to even recognize word boundaries on a phonogram of regular human speech, even for highly eloquent speakers in formal settings. And many sounds are entirely ambiguous, people rarely understand the exact phonemes they use in practice. For example, most native English speakers pronounce the "peech" part of "speech" more like "beach" than like "peach", if you look at a phonogram [0]. Phonetics is really complicated, and varies far more between languages than people tend to assume.