Apple has announced a new feature called Live Text, which will digitize the text in all your photos. This unlocks a slew of handy functions, from turning handwritten notes into emails and messages to searching your camera roll for receipts or recipes you've photographed.

This is certainly not a new feature for smartphones, and we've seen companies like Samsung and Google offer similar tools in the past. But Apple's implementation does look typically smooth. With Live Text, for example, you can tap on the text in any photo in your camera roll or viewfinder and immediately take action from it. You can copy and paste that text, search for it on the web, or, if it's a phone number, call that number.

Apple says the feature is enabled using "deep neural networks" and "on-device intelligence," with the latter being the company's preferred phrasing for machine learning. (It stresses Apple's privacy-heavy approach to AI, which focuses on processing data on-device rather than sending it to the cloud.) In effect, Apple's Live Text feature brings OCR to the camera app.

Live Text works across iPhones, iPads, and Mac computers and supports seven languages: English, Chinese (both simplified and traditional), French, Italian, German, Spanish, and Portuguese. It also integrates with Apple's Spotlight search feature on iOS, allowing you to search your camera roll based on the text in images.

In addition to extracting text from photos, iOS 15 will also allow users to search visually, a feature that sounds exactly the same as Google Lens and that Apple calls Visual Look Up. Apple didn't go into much detail about this feature during its presentation at WWDC, but it said the new tool would recognize "art, books, nature, pets, and landmarks" in photos. We'll have to test it out in person to see exactly how well it performs, but it sounds like Apple is doing much more to apply AI to users' photos and make that information useful.
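Apple hasn't said whether Live Text sits on top of any public API, but developers can already get comparable on-device text recognition through the Vision framework's VNRecognizeTextRequest. Below is a minimal Swift sketch, assuming a recognizeText helper of our own naming and an English-only language list chosen purely for illustration:

```swift
import UIKit
import Vision

// Minimal sketch: on-device text recognition with the Vision framework.
// The function name and language choice are illustrative; this is not Apple's Live Text API.
func recognizeText(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }

    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        // Keep the best candidate string for each detected text region.
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        print(lines.joined(separator: "\n"))
    }
    request.recognitionLevel = .accurate         // favor accuracy over speed
    request.recognitionLanguages = ["en-US"]     // one of the languages Live Text supports

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```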
Apple started using deep learning for face detection in iOS 10. With the release of the Vision framework, developers can now use this technology and many other computer vision algorithms in their apps. We faced significant challenges in developing the framework so that we could preserve user privacy and run efficiently on-device. This article discusses these challenges and describes the face detection algorithm.

Introduction

Apple first released face detection in a public API in the Core Image framework through the CIDetector class. This API was also used internally by Apple apps, such as Photos. The earliest release of CIDetector used a method based on the Viola-Jones detection algorithm. We based subsequent improvements to CIDetector on advances in traditional computer vision.
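For context, here is roughly what using that CIDetector API looks like from an app. The wrapper function is our own illustration; CIDetector and its options are the public Core Image interface:

```swift
import CoreImage

// Minimal sketch: face detection with the original Core Image CIDetector API.
// The wrapper function is illustrative; CIDetector and its option keys are the public API.
func detectFacesWithCoreImage(in image: CIImage) -> [CIFaceFeature] {
    let detector = CIDetector(ofType: CIDetectorTypeFace,
                              context: nil,
                              options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
    let faces = detector?.features(in: image) as? [CIFaceFeature] ?? []
    for face in faces {
        print("Face at \(face.bounds)")   // bounds are in image coordinates
    }
    return faces
}
```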
With the advent of deep learning, and its application to computer vision problems, the state-of-the-art in face detection accuracy took an enormous leap forward. We had to completely rethink our approach so that we could take advantage of this paradigm shift. Compared to traditional computer vision, the learned models in deep learning require orders of magnitude more memory, much more disk storage, and more computational resources.

As capable as today's mobile phones are, the typical high-end mobile phone was not a viable platform for deep-learning vision models. Most of the industry got around this problem by providing deep-learning solutions through a cloud-based API. In a cloud-based solution, images are sent to a server for analysis using deep learning inference to detect faces. Cloud-based services typically use powerful desktop-class GPUs with large amounts of memory available. Very large network models, and potentially ensembles of large models, can run on the server side, allowing clients (which could be mobile phones) to take advantage of large deep learning architectures that would be impractical to run locally.

Apple's iCloud Photo Library is a cloud-based solution for photo and video storage. However, due to Apple's strong commitment to user privacy, we couldn't use iCloud servers for computer vision computations.
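As the abstract above notes, the deep-learning face detector is exposed to third-party developers through the Vision framework. A minimal sketch of how an app might call it follows; the wrapper function is our own, but VNDetectFaceRectanglesRequest is the public API:

```swift
import CoreGraphics
import Vision

// Minimal sketch: on-device face detection through the Vision framework.
// The wrapper function is illustrative; the Vision request types are the public API.
func detectFacesWithVision(in cgImage: CGImage) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized to [0, 1] with the origin at the lower left.
            print("Face at \(face.boundingBox), confidence \(face.confidence)")
        }
    }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```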