How Can Apple Make Siri Smarter With Ferret-UI, Its Multimodal Large Language Model?


Reports about Apple working on AI-based features for its voice assistant, Siri, have been doing the rounds on the internet for quite some time. Until now, however, none of them described how Siri would incorporate its new abilities or how users would benefit. A newly published Apple research paper explains how the company could integrate Ferret-UI, its generative AI model trained specifically to interpret mobile app screens, into Siri to broaden its use cases and make it one of the smartest voice assistants.

What Is Ferret-UI?

Although the research paper doesn’t elaborate on the potential applications of Ferret-UI, it gives a fair idea of how Apple envisions the tool helping Siri make sense of images and icons on an iPhone’s screen. For those catching up, Ferret-UI is a multimodal large language model (MLLM), a class of model designed to take in information beyond text, such as images, videos, and audio, and in this context, the iOS user interface.


How Can Ferret-UI Fuel Siri’s Transformation Into An AI-Powered Voice Assistant?

Mobile App Interface Recognition

According to the information published in the research paper, Apple has been training Ferret-UI to recognize and analyze mobile screens. “Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate any resolution on top of Ferret to magnify details and leverage enhanced visual features,” mentions the paper.
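To make the “any resolution” idea a little more concrete, here is a minimal sketch of one way a tall or wide screenshot could be divided into two sub-images so that small icons and text occupy more of each crop. It assumes a simple rule of halving the screen along its longer side (a horizontal cut for portrait screens, a vertical cut for landscape ones). The function name `split_screen` and the example dimensions are hypothetical, and this is an illustration rather than Apple’s actual implementation.

```python
from typing import List, Tuple

# A crop box in pixels: (left, top, right, bottom).
Box = Tuple[int, int, int, int]


def split_screen(width: int, height: int) -> List[Box]:
    """Return two crop boxes that halve a screen along its longer side.

    Assumption for illustration: portrait screens are cut horizontally
    (top and bottom halves), landscape screens vertically (left and right).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    if height >= width:
        # Portrait (or square): top half and bottom half.
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]

    # Landscape: left half and right half.
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]


if __name__ == "__main__":
    # Hypothetical portrait screenshot size, used only as an example.
    for box in split_screen(1170, 2532):
        print(box)
```

Each crop could then be encoded alongside the full screenshot, which is the sense in which the extra sub-images “magnify details” compared with feeding the model a single downscaled image.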

Interaction With Apps

In practice, a Ferret-powered Siri should be able to take commands related to the on-screen content. This could include opening and closing apps, pressing a particular button, navigating around the interface (actions that otherwise require touch input), summarizing text on the screen, and so on. What’s promising is that the research paper claims better results than GPT-4V and other leading UI-focused MLLMs.


Once Siri becomes capable of recognizing on-screen content and interacting with it, iPhone users will be able to perform a multitude of tasks via voice commands. For instance, a Ferret-powered Siri should be able to interact with apps to order food, add items to your shopping list, book flights, search for TV shows on Netflix, and much more. However, there’s one thing we’re concerned about: how clearly users might have to phrase their commands.

In its current state, Siri handles some basic tasks as intended, but in our experience it fails to pick up the right words roughly two out of 10 times. This often happens when trying to use Siri to play specific tracks on Apple Music or open a specific app. However, if it gets an upgrade along the lines of the current research paper, Siri could become one of the most capable voice assistants.


Shikhar Mehrotra
Shikhar Mehrotra is a seasoned technology writer and reviewer with over five years of experience covering consumer tech across India and global markets. At Smartprix, he has authored more than 1,700 articles, including news stories, features, comparisons, and product reviews spanning automobiles, smartphones, chipsets, wearables, laptops, home appliances, and operating systems. Shikhar has reviewed flagship devices such as the iPhone 16, Galaxy S25+, and Sennheiser HD 505 Open-Ear headphones. He also contributes regularly to Smartprix’s growing automotive section.

With a deep understanding of both iOS and Android ecosystems, Shikhar specializes in daily tech news, how-to explainers, product comparisons, and in-depth reviews. His DSLR photography in product reviews is recognized as among the best on the team.

Before joining Smartprix, Shikhar wrote for leading publications including Forbes Advisor India, Republic World, and ScreenRant. He holds a Bachelor of Arts in Journalism and Mass Communication from Amity University, Lucknow.
