Siri was introduced and integrated into iOS and macOS over ten years ago. In that time, little has changed about it. There are articles from 2016 complaining about how far it was behind Google and Cortana (RIP).
In the meantime, LLMs have changed the game when it comes to language understanding, context, and the ability to execute based on natural language instruction. All the tools are here: speech-to-text through libraries like Whisper, and of course the LLMs themselves, which interpret language and context to a revolutionary degree. Text-to-speech is already solved.
So, ignoring the “AI hype” and looking at what these tools actually are and can do, it stands to reason that a better personal assistant could be built by combining these pieces with the existing iOS and macOS APIs, improving Siri’s functionality by an order of magnitude or more.
Apple remains the only big tech company not publicly diving head-first into the LLM race, but it’s relatively well-known that they’re developing one internally. As that was reported last July right around WWDC 2023, it was obviously not ready for an announcement. Besides, that one was all about Vision Pro anyway.
So, that point, coupled with the fact that Apple tends to be last on the trend train but offer a more polished experience, would make WWDC 2024 the perfect time to reveal an “all new” or revamped Siri which could run locally on, say, newer iOS devices or Macs with Apple Silicon. I use SuperWhisper on my M2 MacBook Air and it works amazingly well. It would also be a good carrot to get people to upgrade their phones, iPads, and Macs. Besides, what else would be important at WWDC next year? I don’t see much else on the horizon, other than the real launch of Vision Pro.
Anyhoo, my random unsubstantiated thoughts. Talk among yourselves.
Would be cool. Though in my experience Apple has always been a “wait till it’s good enough” kind of company instead of a “let’s try implementing brand new technologies” one.
Right, that’s why I think they’ll wait until WWDC next year and not do some reveal any earlier.
I still think the technology will be too new for them at that point.
Fair. I suppose it depends on how well it’s coming along in their internal development.
Not with everything though.
Siri will never be versioned. She has always been evolving. This latest major update across their entire product line has greatly improved her abilities and understanding. Today, Oct 13, 2024, I can finally say she’s not complete shit!
Still got a long way to go but she’s been handling my requests much better. Especially when I’m in a room with AirPods in, and an Apple TV, iPad, iPhone, Mac, and HomePods are all listening for her too. She bounces to whatever device and gets it done. Prior iterations would always just fizzle out if I was connected to the Apple TV with headphones.
In any case, outside the iPhone, none of the versioning encroaches on their product naming schemes. There’s no reference to the iPad Pro being the iPad Pro 6. Or the Mac Studio 2. I think they place Siri alongside their physical product line and if there is any major succession, it’ll take on another name. Like MobileMe to iCloud.
Thanks for coming to my TED Talk.
Ok they may not announce it specifically as “Siri 2.0” but I think we can agree that if Siri is powered with an LLM they would tout it as “next-generation” or something like that. My point is that these new technologies lend themselves well to vastly improving Siri, even in its current “evolved” form. Personally, I still think it’s very lacking.
“In the meantime, LLMs have changed the game when it comes to language understanding”
I don’t think this is true at all, nor do I think we’re any closer than we were several years ago. LLMs don’t understand anything at all. Given a prompt, they assemble portions of words into something that is likely to resemble what a desired response might look like, based on whatever corpus of text they’ve been fed.
They do not actually comprehend the question and then answer it.
Siri actually answers questions using a curated knowledge database. If it doesn’t have an answer, it doesn’t pretend to have one. LLMs don’t really have a concept of knowing the answer or not knowing the answer, since they’re not based around a repository of facts in the first place. If they have enough training data to assemble something that looks like a response that answers it, they’ll output that response. Whether it’s true or not isn’t even relevant to how they work.
If I ask Siri a question, I want the response to be reliable, or just tell me it doesn’t know. If I ask it to complete a specific task, it needs to have been programmed for that task anyway, so LLMs don’t add anything there. Either it recognizes (meaning matches keywords in its database of functions) a task it knows how to do or it doesn’t.
It can always gain new functions or new knowledge sources, but none of that involves adding a bullshit generator.
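For what it's worth, the keyword-matching dispatch described above could be sketched roughly like this. This is a minimal Python illustration, not how Siri is actually implemented; the handler names, keyword sets, and fallback message are all made up for the example.

```python
import re
from typing import Callable

# Hypothetical handlers; a real assistant would call platform APIs here.
def set_timer(utterance: str) -> str:
    return "Timer started."

def play_music(utterance: str) -> str:
    return "Playing music."

# Hypothetical "database of functions": keyword sets mapped to handlers.
INTENTS: list[tuple[frozenset[str], Callable[[str], str]]] = [
    (frozenset({"timer", "countdown"}), set_timer),
    (frozenset({"play", "music", "song"}), play_music),
]

FALLBACK = "Sorry, I don't know how to do that."

def handle(utterance: str) -> str:
    """Run the first handler whose keywords appear in the utterance."""
    # Lowercase and split into words, dropping punctuation.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, handler in INTENTS:
        if words & keywords:
            return handler(utterance)
    # No keyword matched: admit it instead of guessing.
    return FALLBACK
```

Calling handle("Set a timer for ten minutes") runs the timer handler, while anything with no matching keyword gets the fallback. Adding a new capability just means adding another entry to INTENTS.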