It’s becoming something of a running gag that any device or object will be made ‘smart’ these days, whether it’s a phone, TV, refrigerator, home thermostat, headphones, or glasses.
Being able to point a camera at something and have AI tell me “that’s a red bicycle” is a cool novelty the first few times, but I already knew that information just by looking at it.
Visual search is already useful. People go through the effort of posting requests to social media or forums asking “what is this thing” or “help me ID these shoes and where I can buy them” or “what kind of spider is this” all the time. They’re not searching for red bicycles, they’re taking pictures of a specific Bianchi model and asking what year it was manufactured. Automating the process and improving the reliability/accuracy of that search will improve day to day life.
And I have strong reservations about the fundamental issues of inference engines being used to generate things (LLMs and diffusers and things like that), but image recognition, speech to text, and translation are areas where these tools excel today.
“they’re taking pictures of a specific Bianchi model and asking what year it was manufactured”
And the answer they get will probably be wrong, or at least wrong often enough that you can’t trust it without looking it up yourself. And even if these things do get good enough, people still won’t be using it frequently enough to want to wear a device on their face to do it, when they can already do it better on their phone.