Dogtector: dog breed detection app for iOS using YOLOv5 model combined with Metal based object decoder optimized
Dogtector
Project description
Dogtector is dog breed detection app for iOS using YOLOv5 model combined with Metal based object decoder optimized for ultra fast live detection on iOS devices.
Technical Overview
Requirements
Project was developed in Swift 5.5 using XCode 13.1 and was designed for devices running iOS 14 and newer.
- XCode 13.1+
- Swift 5.5+
- iOS 14.0+
Technologies
Vast majority of the project was created in SwiftUI and Combine (UIKit was used only when there was no SwiftUI alternative for desired components or functionality) and all the porting part of YOLOv5 model was handled with CoreML and Metal.
- SwiftUI
- UIKit (when necessary)
- Combine
- CoreML
- Metal
Object decoder
The most important part of the project was porting trained YOLOv5 model to iOS which turned out to be challenging for massive model trained with 417 class objects. The most basic CPU approach for object decoding was not enough as it resutled in 1.7 fps of live detection on devices equipped with latest A15 chips. Finally, the approach that worked sufficiently well for the desired use case was implementing the object decoder in Metal to process the output using GPU.
The created Metal based object decoder has been optimized for all iPhones with iOS 14 and newer taking advantage of the latest GPU enhancements on the newer models keeping support for devices as old as iPhone 6S at the same time. As a result there emerged two strategies for object decoding:
- Fastest full metal based decoder – available only for devices with GPU Family 4 (A11 and newer) taking the full adventages of fast GPU processing
- Slightly slower hybrid based decoder (mostly Metal, partly CPU based) – available for all older devices, as older GPU families seemed to have issues with atomic operations in loops in Metal kernels and do not support non-uniform thread grids
Performance
Finally, the Metal implementations appeared to be fast enough to be used for live detection for the computationally expensive model I intended to use. With comparision to CPU implementation, live detection on iPhone 13 Pro turned out to be approximaly 25x faster with the Metal decoder acheiving ~43 fps of live detection from initial ~1.7 fps on CPU.
Speed of the live detection depends largely on model parameters such as amount of classess and model input size. However, devices with Apple Neural Engine and faster GPUs perform better when compared using the same model.
The table below shows acheived results on various devices for YOLOv5m model trained with 417 object classes and with 288×288 input size:
Device | iPhone 13 Pro | iPhone 12 Pro | iPhone 11 Pro | iPhone X | iPhone 6s |
---|---|---|---|---|---|
Chip | A15 | A14 | A13 | A11 | A9 |
ANE | ✅ (5th gen) |
✅ (4th gen) |
✅ (3rd gen) |
❌ (private) |
❌ |
Apple GPU family | 8 | 7 | 6 | 4 | 3 |
Live detection performance | 30+ fps (throttled*) |
30+ fps (throttled*) |
~15 fps | ~5 fps | ~2.7 fps |
*Camera output has been limited to 30 fps for stability and battery reasons but in raw conditions A15 and A14 acheived ~43 fps and ~35 fps of live detection, respectively.
Build and run
Dotgtector has been build with pure iOS SDK without any external libraries / frameworks so building and running is as simple as opening Dogtector.xcodeproj
file with XCode and running Dogtector
target.
Important note: this repository does not contain the trained model used in the production version of Dogtector so if you would like to build the application with trained model you need to follow the instructions from bringing own model section. However, if you want to just preview how the appliaction works without getting your own model you can download the production version of the application from the App Store.
Bringing own model
Model setup
You can configure the project to work with your model very easily as it was designed to be as flexible as possible for all the models from YOLOv5 family and should work fine regardless of different model parameters like amount of classes or input size.
First, you need trained YOLOv5 model. If you don’t have any you can follow this tutorial to train your own. Keep in mind that weights in Pytorch .pt
format won’t work with CoreML so you need to convert your model to the compatible .mlmodel
format using coremltools
. If you don’t know how, please follow these instructions.
When you have your trained model in CoreML format you just have to drag and drop it to your workspace in XCode. By default the application will seek for Detector.mlmodel
file but if you want to use different name you have to change assetName
value in Dogtector/ObjectDetector/ModelInfo.swift
.
Class information
The last step is updating information about classes in order to display detection annoations properly. This data is stored in Dogtector/ObjectDetector/en.lproj/ObjectInfo.plist
. Object class information can be localized and by default there is also Polish version of the file in Dogtector/ObjectDetector/pl.lproj
directory.
Most important requirements for ObjectInfo.plist
file:
- The data should be reflection of the classes used to train your model
- Order of the items in array and should be exactly the same as in your trained model
- Identifier should be unique
Object format description:
identifier
– [required] – should be unique string, is used to find object image to display on the screen. App will seek for images named{identifier}
for larger resolution preivew and for{identifier}_miniature
thumbnail images for faster rendereing of annoatations. Although it is not colpulsory to add images of each of the classes info the project, this field is also used for internal class object identification purposes so it is required to add it even if you don’t plan to add any images.name
– [required] – string with name of class object displayed to the user.alternativeNames
– [optional] – array of strings with alternative class names displayed to the user. These will be displayed inAlternative names
section on detection information sheet – was designed to display alternative dog breed names.origin
– [optional] – array of strings with origin of class item – was designed to display countries of origin of dog breeds.url
– [optional] – string with URL to more information about class object – was designed to displayMore info
button on information sheet if present.licence
– [optional] – string with information about licence of displayed image.
Whole object format looks like this:
<dict>
<key>identifier</key>
<string>UNIQUE_NAME_IDENTIFIER</string>
<key>name</key>
<string>OBJECT_DISPLAYED_NAME</string>
<key>alternativeNames</key>
<array>
<string>ALTERNATIVE_NAME_1</string>
</array>
<key>origin</key>
<array>
<string>ORIGIN_INFO_1</string>
</array>
<key>url</key>
<string>URL_TO_MORE_INFO</string>
<key>licence</key>
<string>INFO_ABOUT_IMAGE_LICENCE</string>
</dict>
The only required fields are identifier
and name
so the simplest working example for the whole plist
file would be:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>identifier</key>
<string>dog</string>
<key>name</key>
<string>Dog</string>
</dict>
</array>
</plist>