Apple has introduced a novel approach to training its AI models that does not involve gathering or duplicating user content from iPhones or Macs.
According to a recent blog post, Apple will continue to use synthetic data (artificially generated data that mimics user behavior) and differential privacy to improve features such as email summaries, without accessing personal emails or messages.
For users who opt in to Apple’s Device Analytics program, the device compares synthetic email-like messages against a small sample of the user’s real content stored locally. The device then identifies the synthetic messages that most closely match that sample and sends information about the chosen match back to Apple. No actual user data leaves the device, and Apple says it receives only aggregated data.
This technique lets Apple improve its models for tasks like text generation without collecting genuine user content. It builds on Apple’s longstanding use of differential privacy, which injects randomized data into broader datasets to protect individual identities. Apple has used this method since 2016 to understand usage patterns in accordance with the company’s privacy policies.
Apple is already leveraging differential privacy to enhance features like Genmoji, where it gathers generalized trends about popular prompts without linking any specific prompt to a particular user or device. Moving forward, Apple intends to apply similar techniques to other Apple Intelligence features, including Image Playground, Image Wand, Memories Creation, and Writing Tools.
In the case of Genmoji, Apple anonymously polls participating devices to ask whether they have seen specific prompt fragments. Each device responds with a noisy signal: some responses reflect actual usage, while others are randomized. According to Apple, this ensures that only widely used terms become visible to the company, and no individual response can be traced back to a user or device.
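The noisy polling described above resembles the classic randomized response technique. The following is a minimal sketch of that idea, not Apple's actual mechanism: the function names, the 50% truth probability, and the simulated 30% usage rate are all assumptions chosen for illustration.

```python
import random

def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true answer with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports: list, p_truth: float) -> float:
    """Undo the noise in aggregate.

    Expected observed rate = p_truth * true_rate + (1 - p_truth) * 0.5,
    so we solve that equation for true_rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

rng = random.Random(0)          # fixed seed so the simulation is repeatable
true_rate = 0.30                # hypothetical share of devices that saw a fragment
p_truth = 0.5                   # hypothetical probability of answering honestly

devices = [rng.random() < true_rate for _ in range(100_000)]
reports = [randomized_response(seen, p_truth, rng) for seen in devices]
estimate = estimate_true_rate(reports, p_truth)
print(f"estimated usage rate: {estimate:.3f}")
```

Any single report is deniable because it may have been a coin flip, yet the aggregate estimate stays close to the real rate when many devices participate. This is why only widely used terms can be detected reliably.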
For tasks like summarizing emails, Apple has devised a new method. The company generates thousands of sample messages and converts them into numerical representations known as "embeddings" that capture language, tone, and topic. User devices then compare these embeddings to locally stored samples, and share only which synthetic message was the closest match, not the actual content.
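The on-device comparison can be pictured as a nearest-neighbor search over embeddings. The sketch below assumes cosine similarity as the distance measure and uses tiny three-dimensional vectors; the article does not specify Apple's metric or embedding size, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_synthetic_index(synthetic, local):
    """Return the index of the synthetic embedding most similar to any local sample."""
    best_index, best_score = -1, -math.inf
    for index, candidate in enumerate(synthetic):
        score = max(cosine_similarity(candidate, sample) for sample in local)
        if score > best_score:
            best_index, best_score = index, score
    return best_index

# Hypothetical embeddings of synthetic messages supplied by the server.
synthetic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical embeddings of the user's local samples; these never leave the device.
local = [[0.1, 0.9, 0.2], [0.0, 0.8, 0.1]]

selected = closest_synthetic_index(synthetic, local)
print(selected)
```

Only the index of the winning synthetic message would be reported, so the local embeddings and the text behind them stay on the device.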
By collecting the most frequently selected synthetic embeddings from participating devices, Apple refines its training data, which in turn lets the system generate more relevant and realistic synthetic emails. This process helps Apple improve its AI outputs for summarization and text generation without compromising user privacy.
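On the server side, the aggregation step amounts to counting which synthetic messages were selected most often. The following sketch assumes each device reports a single index and ignores the privacy noise a real deployment would add; `most_selected` and the sample reports are hypothetical.

```python
from collections import Counter

def most_selected(reports, top_k):
    """Return the top_k synthetic-message indices, ordered by how often they were chosen."""
    return [index for index, _ in Counter(reports).most_common(top_k)]

# Hypothetical reports: each entry is the index one device selected.
reports = [2, 0, 2, 1, 2, 0, 2, 0]

top = most_selected(reports, top_k=2)
print(top)
```

The winning synthetic messages could then seed the next round of generation, which is the refinement loop the article describes.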
The system is rolling out in the beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5. According to Bloomberg’s Mark Gurman, Apple is working through challenges in its AI development, including delayed feature releases and the aftermath of changes in the Siri team.
The efficacy of Apple’s approach in enhancing AI outputs remains to be seen, but it underscores the company’s commitment to balancing user privacy with model performance.