I feel like it will certainly need to be a hybrid approach for the moment. They’ll get a long way running small, highly specific, tuned models locally on the Neural Engine for a lot of stuff. But at some point, people are going to expect to ask questions and have something in the league of Mistral/Claude/ChatGPT talk back, and that’s just not possible with today’s hardware. Over time I expect more and more of that will get moved locally, and less will hit the “escape hatch” of throwing things to a giant LLM.
If they end up renting rather than building that capability, it says to me that they’re betting on being able to move most of that locally to the phone on a relatively short time horizon, which is interesting.
>Over time I expect more and more of that will get moved locally
I'm not so sure whether this is the direction of travel. The economics are pretty harsh for local, general purpose AI. And battery will always be a limiting factor.
If you're sending a few hundred requests per day to a very powerful AI, then sharing the cost of the inference machinery with others has overwhelming economic advantages. And it will need access to current data anyway, so it can't be completely local.
There will be tasks, such as keyboard autocomplete and a lot of other specialised tasks, where latency or privacy is more important than quality. So yes, I do believe in hybrid. But I think the cloud will always do the heavy lifting when it comes to more general tasks.
I can imagine an alternative scenario where a lot of AI processing happens on Macs and PCs, and mobile devices benefit from that. But many people don't have powerful desktop devices, so I'm not sure Apple can rely on this.
The article is pretty clear that it's going to be a service offered as a secondary option instead of core functionality.
>Apple is preparing new capabilities as part of iOS 18 — the next version of the iPhone operating system — based on its own AI models. But those enhancements will be focused on features that operate on its devices, rather than ones delivered via the cloud. So Apple is seeking a partner to do the heavy lifting of generative AI, including functions for creating images and writing essays based on simple prompts.
I'm still expecting this is the end goal. But I've started to question what inference on device will do to battery life. I suspect it will drain devices pretty quickly, but if anyone has any studies suggesting otherwise, please let me know!
I don't think this excludes doing their own thing. It would be in the Apple playbook to have more than one team working on different approaches internally while still negotiating with Google about a possible partnership.
I can't see how they can deliver 'thoughtful integration' with Apple services when the work is being done on Google's servers, but perhaps the "efficient inference on device" (which could be Apple's biggest product leap since the iPhone) is taking too long and they need a stopgap to relieve the downward pressure on their share price.