Style Transfer with Metal

Yoki
Aug 24, 2021

Introduction

You can use Style Transfer in Create ML to apply any style to an image or video. In this article, we will process the input from the camera using the Style Transfer model, and then draw it in Metal.

Here is a screenshot of what we will create.

The entire application process is as follows.

The entire code is here.

Create Model

Before we can write code, we need a model.
You can create a model with the Create ML app. The steps are as follows:

  1. Select Style Transfer from the templates.
  2. Set the style image, a validation image (to check how the style is applied during training), and content images (for training), then start training.
  3. Export the .mlmodel file and add it to your project (loading it in code is sketched just below).
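When you add the .mlmodel file, Xcode generates a Swift class named after it. As a sketch of the loading step (the class name DemoStyleTransfer02_usecase_image comes from this article's model; yours will match your file name):

import CoreML

let config = MLModelConfiguration()
guard let styleModel = try? DemoStyleTransfer02_usecase_image(configuration: config) else {
    fatalError("❌ could not load model")
}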

Please take a look at the WWDC video below for a clear explanation.

Build Image and Video Style Transfer models in Create ML — WWDC20 — Videos — Apple Developer

Setup Camera Capture

Let’s walk through the steps in order.
First, configure a capture session to receive input from the camera.

private func setupAndStartCaptureSession() {
    self.captureSession = AVCaptureSession()

    // setup capture session
    self.captureSession.beginConfiguration()
    if self.captureSession.canSetSessionPreset(.photo) {
        self.captureSession.sessionPreset = .photo
    }
    self.captureSession.automaticallyConfiguresCaptureDeviceForWideColor = true
    self.setupInputs()
    self.setupOutput()
    self.captureSession.commitConfiguration()

    self.captureSession.startRunning()
}

private func setupInputs() {
    guard let backCameraDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back) else {
        fatalError("❌ no capture device")
    }

    guard let backCameraInput = try? AVCaptureDeviceInput(device: backCameraDevice) else {
        fatalError("❌ could not create a capture input")
    }

    if !captureSession.canAddInput(backCameraInput) {
        fatalError("❌ could not add capture input to capture session")
    }

    captureSession.addInput(backCameraInput)
}

private func setupOutput() {
    let videoOutput = AVCaptureVideoDataOutput()
    let videoQueue = DispatchQueue(label: "videoQueue", qos: .userInteractive)
    videoOutput.setSampleBufferDelegate(self, queue: videoQueue)

    if captureSession.canAddOutput(videoOutput) {
        captureSession.addOutput(videoOutput)
    } else {
        fatalError("❌ could not add video output")
    }

    videoOutput.connections.first?.videoOrientation = .portrait
}
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // vision request (filled in below)
    }
}

Setup Vision Request

Next, we will use the Style Transfer model to process the input from the camera. Create a request with the Vision framework as follows; the result is returned as a CVPixelBuffer.

(The DemoStyleTransfer02_usecase_image part is the class name generated from your model file.)

let config = MLModelConfiguration()
guard let model = try? VNCoreMLModel(for: DemoStyleTransfer02_usecase_image(configuration: config).model) else { return }

// Create Vision Request
let request = VNCoreMLRequest(model: model) { [weak self] (finishedRequest, error) in
    guard let self = self else { return }
    guard let results = finishedRequest.results as? [VNPixelBufferObservation] else { return }

    guard let observation = results.first else { return }

    let pixelBuffer = observation.pixelBuffer

    // notify render metal
}

Execute the created request using a VNImageRequestHandler.

guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
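These two lines belong in the captureOutput(_:didOutput:from:) delegate method from earlier. Assuming request is stored in a property so it is created once rather than per frame, the callback might look like this:

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // convert the frame to a pixel buffer and run the Style Transfer request on it
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}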

Setup Rendering With Metal

There are several ways to draw in Metal, but this time we will use CIContext.
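The code below assumes an MTKView (mtkView), a Metal command queue (metalCommandQueue), and a CIContext (ciContext) already exist. Their setup is not shown in this article; a minimal sketch might look like this:

import MetalKit

private func setupMetal() {
    guard let device = MTLCreateSystemDefaultDevice() else { return }
    mtkView.device = device
    mtkView.delegate = self
    // draw only when setNeedsDisplay is called with a new frame
    mtkView.isPaused = true
    mtkView.enableSetNeedsDisplay = true
    // allow CIContext to write directly into the drawable's texture
    mtkView.framebufferOnly = false

    metalCommandQueue = device.makeCommandQueue()
    ciContext = CIContext(mtlDevice: device)
}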

When the Vision request returns a result, create a CIImage from it and retain it.

// Create Vision Request
let request = VNCoreMLRequest(model: model) { [weak self] (finishedRequest, error) in
    guard let self = self else { return }
    guard let results = finishedRequest.results as? [VNPixelBufferObservation] else { return }

    guard let observation = results.first else { return }

    let pixelBuffer = observation.pixelBuffer
    var ciImage = CIImage(cvPixelBuffer: pixelBuffer)

    // scale the model output to the drawable's size
    let scaleX = self.outputWidth / ciImage.extent.width
    let scaleY = self.outputHeight / ciImage.extent.height
    ciImage = ciImage.resizeAffine(scaleX: scaleX, scaleY: scaleY)!

    self.currentCIImage = ciImage

    DispatchQueue.main.async {
        self.mtkView.setNeedsDisplay()
    }
}
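resizeAffine is not part of Core Image; it is presumably a small helper in the project. A minimal sketch of such an extension (an assumption; the original implementation may differ):

extension CIImage {
    func resizeAffine(scaleX: CGFloat, scaleY: CGFloat) -> CIImage? {
        // scale the image with an affine transform
        transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
    }
}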

Then, in the draw(in:) method of MTKViewDelegate, draw the image using CIContext's render function.

extension ViewController: MTKViewDelegate {
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        print("\(self.classForCoder)/" + #function)
    }

    func draw(in view: MTKView) {
        guard let commandBuffer = metalCommandQueue.makeCommandBuffer() else {
            return
        }

        guard let ciImage = currentCIImage else {
            return
        }

        guard let currentDrawable = view.currentDrawable else {
            return
        }

        // center the image vertically in the drawable
        let heightOfciImage = ciImage.extent.height
        let heightOfDrawable = view.drawableSize.height
        let yOffsetFromBottom = (heightOfDrawable - heightOfciImage) / 2

        ciContext.render(ciImage,
                         to: currentDrawable.texture,
                         commandBuffer: commandBuffer,
                         bounds: CGRect(origin: CGPoint(x: 0, y: -yOffsetFromBottom), size: view.drawableSize),
                         colorSpace: CGColorSpaceCreateDeviceRGB())

        commandBuffer.present(currentDrawable)
        commandBuffer.commit()
    }
}

Summary

These are the key points of this application: create a Style Transfer model with Create ML, run the camera input through it with Vision, and render the result with Metal via CIContext.
How did you like it? If you have any questions or doubts, feel free to comment!
