Face detection with Core Image on Live Video

Face detection with Core Image on Live Video

In this article I will explain how to do face detection on a live video feed using an iOS 5 device. We will be using Core Image to do the heavy lifting. The code is loosely based on the SquareCam sample code from Apple.

To get started, we need to show the live video of the front facing camera. We use AVFoundation to do this. We start by setting up the AVCaptureSession. We use 640x480 as the capture resolution. Keep in mind that face detection is relatively compute intensive. The less pixels we need to munch, the faster the processing can be done. This is an interactive application, so realtime performance is important. We tell the AVCaptureSession which camera to use as input device.

To show the preview, we create an AVCaptureVideoPreviewLayer and add it to the previewView, that was created in the Xib. Don't forget to call [session startRunning]. That was the easy part.

NSError *error = nil;
AVCaptureSession *session = [[AVCaptureSession alloc] init];
if ([[UIDevice currentDevice] userInterfaceIdiom] == UIUserInterfaceIdiomPhone){
    [session setSessionPreset:AVCaptureSessionPreset640x480];
} else {
    [session setSessionPreset:AVCaptureSessionPresetPhoto];
}
// Select a video device, make an input
AVCaptureDevice *device;
AVCaptureDevicePosition desiredPosition = AVCaptureDevicePositionFront;
// find the front facing camera
for (AVCaptureDevice *d in [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo]) {
	if ([d position] == desiredPosition) {
		device = d;
        self.isUsingFrontFacingCamera = YES;
		break;
	}
}
// fall back to the default camera.
if( nil == device )
{
    self.isUsingFrontFacingCamera = NO;
    device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
}
// get the input device
AVCaptureDeviceInput *deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:device error:&error];
if( !error ) {

    // add the input to the session
    if ( [session canAddInput:deviceInput] ){
        [session addInput:deviceInput];
    }

    self.previewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:session];
    self.previewLayer.backgroundColor = [[UIColor blackColor] CGColor];
    self.previewLayer.videoGravity = AVLayerVideoGravityResizeAspect;

    CALayer *rootLayer = [self.previewView layer];
    [rootLayer setMasksToBounds:YES];
    [self.previewLayer setFrame:[rootLayer bounds]];
    [rootLayer addSublayer:self.previewLayer];
    [session startRunning];

}
session = nil;
if (error) {
	UIAlertView *alertView = [[UIAlertView alloc] initWithTitle:
                        [NSString stringWithFormat:@"Failed with error %d", (int)[error code]]
                                           message:[error localizedDescription]
									      delegate:nil
							     cancelButtonTitle:@"Dismiss"
							     otherButtonTitles:nil];
	[alertView show];
	[self teardownAVCapture];
}

Now for the face detection.

We create the face detector itself in viewDidLoad, and keep a reference to it with a property. We use low accuracy, again for performance reasons.

NSDictionary *detectorOptions = [[NSDictionary alloc] initWithObjectsAndKeys:CIDetectorAccuracyLow, CIDetectorAccuracy, nil];
self.faceDetector = [CIDetector detectorOfType:CIDetectorTypeFace context:nil options:detectorOptions];

 

We access the data captured by the camera by creating an AVCaptureVideoDataOutput, using BGRA as pixelformat. We drop frames we cannot process. To do the actual processing, we create a separate processing queue. This feature works via the delegate method, that gets called for each frame on the processing queue.

// Make a video data output
self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
// we want BGRA, both CoreGraphics and OpenGL work well with 'BGRA'
NSDictionary *rgbOutputSettings = [NSDictionary dictionaryWithObject:
                                   [NSNumber numberWithInt:kCMPixelFormat_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey];
[self.videoDataOutput setVideoSettings:rgbOutputSettings];
[self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES]; // discard if the data output queue is blocked
// create a serial dispatch queue used for the sample buffer delegate
// a serial dispatch queue must be used to guarantee that video frames will be delivered in order
// see the header doc for setSampleBufferDelegate:queue: for more information
self.videoDataOutputQueue = dispatch_queue_create("VideoDataOutputQueue", DISPATCH_QUEUE_SERIAL);
[self.videoDataOutput setSampleBufferDelegate:self queue:self.videoDataOutputQueue];
if ( [session canAddOutput:self.videoDataOutput] ){
    [session addOutput:self.videoDataOutput];
}
// get the output for doing face detection.
[[self.videoDataOutput connectionWithMediaType:AVMediaTypeVideo] setEnabled:YES];

The actual processing happens in the delegate method, that gets called on the background. First the frameBuffer is created, we use all attachments that come with the captured frame for processing.  We add exif information onto the image, because we need to know which side is up. The actual face detection is done in the method [self.facedetector featuresInImage:ciImage options:imageOptions];

- (void)captureOutput:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
	// get the image
	CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
	CFDictionaryRef attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate);
	CIImage *ciImage = [[CIImage alloc] initWithCVPixelBuffer:pixelBuffer
                                                      options:(__bridge NSDictionary *)attachments];
	if (attachments) {
		CFRelease(attachments);
    }

    // make sure your device orientation is not locked.
	UIDeviceOrientation curDeviceOrientation = [[UIDevice currentDevice] orientation];

	NSDictionary *imageOptions = nil;

	imageOptions = [NSDictionary dictionaryWithObject:[self exifOrientation:curDeviceOrientation]
                                               forKey:CIDetectorImageOrientation];

	NSArray *features = [self.faceDetector featuresInImage:ciImage
                                                   options:imageOptions];

    // get the clean aperture
    // the clean aperture is a rectangle that defines the portion of the encoded pixel dimensions
    // that represents image data valid for display.
	CMFormatDescriptionRef fdesc = CMSampleBufferGetFormatDescription(sampleBuffer);
	CGRect cleanAperture = CMVideoFormatDescriptionGetCleanAperture(fdesc, false /*originIsTopLeft == false*/);

	dispatch_async(dispatch_get_main_queue(), ^(void) {
		[self drawFaces:features
            forVideoBox:cleanAperture
            orientation:curDeviceOrientation];
	});
}

The last step is to actually draw something on the screen where the face has been detected. The method drawFaces:forVideoBox:orientation is called on the main thread to do this.

In this method, we will draw an image onto a CALayer in the previewLayer. For each detected face, we will create or reuse a layer. We have to setup the correct size based on the bounds of the detected face. Take into account that the video has been scaled, so we also need to take that factor into account.  Then we position the image onto the layer. The layer in turn needs to be rotated into the right orientation. This is done based on the device orientation.

// called asynchronously as the capture output is capturing sample buffers, this method asks the face detector
// to detect features and for each draw the green border in a layer and set appropriate orientation
- (void)drawFaces:(NSArray *)features
      forVideoBox:(CGRect)clearAperture
      orientation:(UIDeviceOrientation)orientation
{
	NSArray *sublayers = [NSArray arrayWithArray:[self.previewLayer sublayers]];
	NSInteger sublayersCount = [sublayers count], currentSublayer = 0;
	NSInteger featuresCount = [features count], currentFeature = 0;

	[CATransaction begin];
	[CATransaction setValue:(id)kCFBooleanTrue forKey:kCATransactionDisableActions];

	// hide all the face layers
	for ( CALayer *layer in sublayers ) {
		if ( [[layer name] isEqualToString:@"FaceLayer"] )
			[layer setHidden:YES];
	}	

	if ( featuresCount == 0 ) {
		[CATransaction commit];
		return; // early bail.
	}

	CGSize parentFrameSize = [self.previewView frame].size;
	NSString *gravity = [self.previewLayer videoGravity];
	BOOL isMirrored = [self.previewLayer isMirrored];
	CGRect previewBox = [ViewController videoPreviewBoxForGravity:gravity
                                                        frameSize:parentFrameSize
                                                     apertureSize:clearAperture.size];

	for ( CIFaceFeature *ff in features ) {
		// find the correct position for the square layer within the previewLayer
		// the feature box originates in the bottom left of the video frame.
		// (Bottom right if mirroring is turned on)
		CGRect faceRect = [ff bounds];

		// flip preview width and height
		CGFloat temp = faceRect.size.width;
		faceRect.size.width = faceRect.size.height;
		faceRect.size.height = temp;
		temp = faceRect.origin.x;
		faceRect.origin.x = faceRect.origin.y;
		faceRect.origin.y = temp;
		// scale coordinates so they fit in the preview box, which may be scaled
		CGFloat widthScaleBy = previewBox.size.width / clearAperture.size.height;
		CGFloat heightScaleBy = previewBox.size.height / clearAperture.size.width;
		faceRect.size.width *= widthScaleBy;
		faceRect.size.height *= heightScaleBy;
		faceRect.origin.x *= widthScaleBy;
		faceRect.origin.y *= heightScaleBy;

		if ( isMirrored )
			faceRect = CGRectOffset(faceRect, previewBox.origin.x + previewBox.size.width - faceRect.size.width - (faceRect.origin.x * 2), previewBox.origin.y);
		else
			faceRect = CGRectOffset(faceRect, previewBox.origin.x, previewBox.origin.y);

		CALayer *featureLayer = nil;

		// re-use an existing layer if possible
		while ( !featureLayer && (currentSublayer < sublayersCount) ) {
			CALayer *currentLayer = [sublayers objectAtIndex:currentSublayer++];
			if ( [[currentLayer name] isEqualToString:@"FaceLayer"] ) {
				featureLayer = currentLayer;
				[currentLayer setHidden:NO];
			}
		}

		// create a new one if necessary
		if ( !featureLayer ) {
			featureLayer = [[CALayer alloc]init];
			featureLayer.contents = (id)self.borderImage.CGImage;
			[featureLayer setName:@"FaceLayer"];
			[self.previewLayer addSublayer:featureLayer];
			featureLayer = nil;
		}
		[featureLayer setFrame:faceRect];

		switch (orientation) {
			case UIDeviceOrientationPortrait:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(0.))];
				break;
			case UIDeviceOrientationPortraitUpsideDown:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(180.))];
				break;
			case UIDeviceOrientationLandscapeLeft:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(90.))];
				break;
			case UIDeviceOrientationLandscapeRight:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(-90.))];
				break;
			case UIDeviceOrientationFaceUp:
			case UIDeviceOrientationFaceDown:
			default:
				break; // leave the layer in its last known orientation
		}
		currentFeature++;
	}

	[CATransaction commit];
}

There you go. That is the basic principle behind Face Detection in iOS 5. For the nitty gritty details, just have a look at the code on github or download the zip.

There is much more to be explored. Core Image also provides access to the detected location of eyes and mouth. That would be even better to place the mustache correctly. We could also rotate the image, based on the angle of the face on the screen.

Adios!

Any feedback is appreciated in the comments.