Face Detection with Core Image on Live Video

Posted on May 04, 2012

In this article I will explain how to do face detection on a live video feed using an iOS 5 device. We will be using Core Image to do the heavy lifting. The code is loosely based on the SquareCam sample code from Apple.

To get started, we need to show the live video from the front-facing camera. We use AVFoundation to do this. We start by setting up the AVCaptureSession, using 640×480 as the capture resolution. Keep in mind that face detection is relatively compute-intensive: the fewer pixels we need to crunch, the faster the processing can be done, and since this is an interactive application, real-time performance is important. Finally, we tell the AVCaptureSession which camera to use as the input device.

To show the preview, we create an AVCaptureVideoPreviewLayer and add it to the previewView, which was created in the XIB. Don't forget to call [session startRunning]. That was the easy part.

NSError *error = nil;
AVCaptureSession *session = [[AVCaptureSession alloc] init];
if ([[UIDevice currentDevice] userInterfaceIdiom] == UIUserInterfaceIdiomPhone){
    [session setSessionPreset:AVCaptureSessionPreset640x480];
} else {
    [session setSessionPreset:AVCaptureSessionPresetPhoto];
}
// Select a video device, make an input
AVCaptureDevice *device = nil;
AVCaptureDevicePosition desiredPosition = AVCaptureDevicePositionFront;
// find the front facing camera
for (AVCaptureDevice *d in [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo]) {
	if ([d position] == desiredPosition) {
		device = d;
        self.isUsingFrontFacingCamera = YES;
		break;
	}
}
// fall back to the default camera.
if( nil == device )
{
    self.isUsingFrontFacingCamera = NO;
    device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
}
// get the input device
AVCaptureDeviceInput *deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:device error:&error];
if( !error ) {

    // add the input to the session
    if ( [session canAddInput:deviceInput] ){
        [session addInput:deviceInput];
    }

    self.previewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:session];
    self.previewLayer.backgroundColor = [[UIColor blackColor] CGColor];
    self.previewLayer.videoGravity = AVLayerVideoGravityResizeAspect;

    CALayer *rootLayer = [self.previewView layer];
    [rootLayer setMasksToBounds:YES];
    [self.previewLayer setFrame:[rootLayer bounds]];
    [rootLayer addSublayer:self.previewLayer];
    [session startRunning];

}
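// the preview layer retains the session, so we can safely drop this local reference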
session = nil;
if (error) {
    UIAlertView *alertView =
        [[UIAlertView alloc] initWithTitle:[NSString stringWithFormat:@"Failed with error %d", (int)[error code]]
                                   message:[error localizedDescription]
                                  delegate:nil
                         cancelButtonTitle:@"Dismiss"
                         otherButtonTitles:nil];
    [alertView show];
    [self teardownAVCapture];
}
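
The teardownAVCapture method that we call in the error path is not shown here. A minimal sketch, assuming the same properties used throughout this post (previewLayer, plus the videoDataOutput and videoDataOutputQueue we will create in the next step), could look like the following. With the iOS 5 SDK, dispatch queues are not managed by ARC, so the queue is released explicitly.

- (void)teardownAVCapture
{
    // release our references to the video data output and its dispatch queue
    self.videoDataOutput = nil;
    if (self.videoDataOutputQueue) {
        // dispatch objects are not ARC-managed on the iOS 5 SDK
        dispatch_release(self.videoDataOutputQueue);
        self.videoDataOutputQueue = NULL;
    }
    // remove the preview from the screen
    [self.previewLayer removeFromSuperlayer];
    self.previewLayer = nil;
}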

Now for the face detection.

We create the face detector itself in viewDidLoad, and keep a reference to it with a property. We use low accuracy, again for performance reasons.

NSDictionary *detectorOptions = [[NSDictionary alloc] initWithObjectsAndKeys:CIDetectorAccuracyLow, CIDetectorAccuracy, nil];
self.faceDetector = [CIDetector detectorOfType:CIDetectorTypeFace context:nil options:detectorOptions];

 

We access the data captured by the camera by creating an AVCaptureVideoDataOutput, using BGRA as the pixel format. We drop frames that we cannot process in time. To do the actual processing, we create a separate serial dispatch queue. The frames are delivered through the delegate method, which is called for each frame on that processing queue.

// Make a video data output
self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
// we want BGRA, both CoreGraphics and OpenGL work well with 'BGRA'
NSDictionary *rgbOutputSettings = [NSDictionary dictionaryWithObject:
                                   [NSNumber numberWithInt:kCMPixelFormat_32BGRA] forKey:(id)kCVPixelBufferPixelFormatTypeKey];
[self.videoDataOutput setVideoSettings:rgbOutputSettings];
[self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES]; // discard if the data output queue is blocked
// create a serial dispatch queue used for the sample buffer delegate
// a serial dispatch queue must be used to guarantee that video frames will be delivered in order
// see the header doc for setSampleBufferDelegate:queue: for more information
self.videoDataOutputQueue = dispatch_queue_create("VideoDataOutputQueue", DISPATCH_QUEUE_SERIAL);
[self.videoDataOutput setSampleBufferDelegate:self queue:self.videoDataOutputQueue];
if ( [session canAddOutput:self.videoDataOutput] ){
    [session addOutput:self.videoDataOutput];
}
// enable the video connection that we will use for face detection.
[[self.videoDataOutput connectionWithMediaType:AVMediaTypeVideo] setEnabled:YES];

The actual processing happens in the delegate method, which is called on the background queue. First we create a CIImage from the pixel buffer, passing along all attachments that come with the captured frame. We also hand EXIF orientation information to the detector, because it needs to know which side of the image is up. The actual face detection is done by [self.faceDetector featuresInImage:ciImage options:imageOptions].

- (void)captureOutput:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
	// get the image
	CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
	CFDictionaryRef attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate);
	CIImage *ciImage = [[CIImage alloc] initWithCVPixelBuffer:pixelBuffer
                                                      options:(__bridge NSDictionary *)attachments];
	if (attachments) {
		CFRelease(attachments);
    }

    // make sure your device orientation is not locked.
	UIDeviceOrientation curDeviceOrientation = [[UIDevice currentDevice] orientation];

	NSDictionary *imageOptions = nil;

	imageOptions = [NSDictionary dictionaryWithObject:[self exifOrientation:curDeviceOrientation]
                                               forKey:CIDetectorImageOrientation];

	NSArray *features = [self.faceDetector featuresInImage:ciImage
                                                   options:imageOptions];

    // get the clean aperture
    // the clean aperture is a rectangle that defines the portion of the encoded pixel dimensions
    // that represents image data valid for display.
	CMFormatDescriptionRef fdesc = CMSampleBufferGetFormatDescription(sampleBuffer);
	CGRect cleanAperture = CMVideoFormatDescriptionGetCleanAperture(fdesc, false /*originIsTopLeft == false*/);

	dispatch_async(dispatch_get_main_queue(), ^(void) {
		[self drawFaces:features
            forVideoBox:cleanAperture
            orientation:curDeviceOrientation];
	});
}
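
The exifOrientation: helper used above is not shown in the listing. It maps the current device orientation to one of the eight EXIF orientation values that Core Image understands. Here is a sketch of the mapping as it is done in Apple's SquareCam sample (on which this code is based), using the isUsingFrontFacingCamera flag we set while picking the camera; the project on GitHub may differ in detail.

- (NSNumber *)exifOrientation:(UIDeviceOrientation)orientation
{
    // EXIF orientation values; 1 is the default ("0th row at the top, 0th column on the left")
    enum {
        PHOTOS_EXIF_0ROW_TOP_0COL_LEFT     = 1,
        PHOTOS_EXIF_0ROW_BOTTOM_0COL_RIGHT = 3,
        PHOTOS_EXIF_0ROW_RIGHT_0COL_TOP    = 6,
        PHOTOS_EXIF_0ROW_LEFT_0COL_BOTTOM  = 8
    };
    int exifOrientation;

    switch (orientation) {
        case UIDeviceOrientationPortraitUpsideDown: // home button on the top
            exifOrientation = PHOTOS_EXIF_0ROW_LEFT_0COL_BOTTOM;
            break;
        case UIDeviceOrientationLandscapeLeft:      // home button on the right
            exifOrientation = self.isUsingFrontFacingCamera ?
                PHOTOS_EXIF_0ROW_BOTTOM_0COL_RIGHT : PHOTOS_EXIF_0ROW_TOP_0COL_LEFT;
            break;
        case UIDeviceOrientationLandscapeRight:     // home button on the left
            exifOrientation = self.isUsingFrontFacingCamera ?
                PHOTOS_EXIF_0ROW_TOP_0COL_LEFT : PHOTOS_EXIF_0ROW_BOTTOM_0COL_RIGHT;
            break;
        case UIDeviceOrientationPortrait:           // home button on the bottom
        default:
            exifOrientation = PHOTOS_EXIF_0ROW_RIGHT_0COL_TOP;
            break;
    }
    return [NSNumber numberWithInt:exifOrientation];
}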

The last step is to actually draw something on the screen where a face has been detected. The drawFaces:forVideoBox:orientation: method is called on the main thread to do this.

In this method, we draw an image onto a CALayer inside the previewLayer. For each detected face, we create or reuse a layer. We set the layer's size based on the bounds of the detected face, taking into account that the video has been scaled to fit the preview box. Then we position the image on the layer. Finally, the layer is rotated into the right orientation, based on the device orientation.

// called asynchronously on the main queue with the features detected in the current frame;
// for each face, position a layer containing the overlay image and give it the appropriate orientation
- (void)drawFaces:(NSArray *)features
      forVideoBox:(CGRect)clearAperture
      orientation:(UIDeviceOrientation)orientation
{
	NSArray *sublayers = [NSArray arrayWithArray:[self.previewLayer sublayers]];
	NSInteger sublayersCount = [sublayers count], currentSublayer = 0;
	NSInteger featuresCount = [features count], currentFeature = 0;

	[CATransaction begin];
	[CATransaction setValue:(id)kCFBooleanTrue forKey:kCATransactionDisableActions];

	// hide all the face layers
	for ( CALayer *layer in sublayers ) {
		if ( [[layer name] isEqualToString:@"FaceLayer"] )
			[layer setHidden:YES];
	}	

	if ( featuresCount == 0 ) {
		[CATransaction commit];
		return; // early bail.
	}

	CGSize parentFrameSize = [self.previewView frame].size;
	NSString *gravity = [self.previewLayer videoGravity];
	BOOL isMirrored = [self.previewLayer isMirrored];
	CGRect previewBox = [ViewController videoPreviewBoxForGravity:gravity
                                                        frameSize:parentFrameSize
                                                     apertureSize:clearAperture.size];

	for ( CIFaceFeature *ff in features ) {
		// find the correct position for the square layer within the previewLayer
		// the feature box originates in the bottom left of the video frame.
		// (Bottom right if mirroring is turned on)
		CGRect faceRect = [ff bounds];

		// flip preview width and height
		CGFloat temp = faceRect.size.width;
		faceRect.size.width = faceRect.size.height;
		faceRect.size.height = temp;
		temp = faceRect.origin.x;
		faceRect.origin.x = faceRect.origin.y;
		faceRect.origin.y = temp;
		// scale coordinates so they fit in the preview box, which may be scaled
		CGFloat widthScaleBy = previewBox.size.width / clearAperture.size.height;
		CGFloat heightScaleBy = previewBox.size.height / clearAperture.size.width;
		faceRect.size.width *= widthScaleBy;
		faceRect.size.height *= heightScaleBy;
		faceRect.origin.x *= widthScaleBy;
		faceRect.origin.y *= heightScaleBy;

		if ( isMirrored )
			faceRect = CGRectOffset(faceRect, previewBox.origin.x + previewBox.size.width - faceRect.size.width - (faceRect.origin.x * 2), previewBox.origin.y);
		else
			faceRect = CGRectOffset(faceRect, previewBox.origin.x, previewBox.origin.y);

		CALayer *featureLayer = nil;

		// re-use an existing layer if possible
		while ( !featureLayer && (currentSublayer < sublayersCount) ) {
			CALayer *currentLayer = [sublayers objectAtIndex:currentSublayer++];
			if ( [[currentLayer name] isEqualToString:@"FaceLayer"] ) {
				featureLayer = currentLayer;
				[currentLayer setHidden:NO];
			}
		}

		// create a new one if necessary
		if ( !featureLayer ) {
			featureLayer = [[CALayer alloc] init];
			featureLayer.contents = (id)self.borderImage.CGImage;
			[featureLayer setName:@"FaceLayer"];
			[self.previewLayer addSublayer:featureLayer];
			// the previewLayer retains the new layer; keep our reference so the
			// frame and transform below are set on it (do not nil it out here)
		}
		[featureLayer setFrame:faceRect];

		switch (orientation) {
			case UIDeviceOrientationPortrait:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(0.))];
				break;
			case UIDeviceOrientationPortraitUpsideDown:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(180.))];
				break;
			case UIDeviceOrientationLandscapeLeft:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(90.))];
				break;
			case UIDeviceOrientationLandscapeRight:
				[featureLayer setAffineTransform:CGAffineTransformMakeRotation(DegreesToRadians(-90.))];
				break;
			case UIDeviceOrientationFaceUp:
			case UIDeviceOrientationFaceDown:
			default:
				break; // leave the layer in its last known orientation
		}
		currentFeature++;
	}

	[CATransaction commit];
}
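
Two helpers used in drawFaces:forVideoBox:orientation: are not shown in the listing: DegreesToRadians and the videoPreviewBoxForGravity:frameSize:apertureSize: class method, which computes the rectangle that the video actually occupies inside the preview view. Here is a sketch of both; for simplicity it only handles the AVLayerVideoGravityResizeAspect gravity that we set on the preview layer, and a complete version would handle the other gravities as well.

static CGFloat DegreesToRadians(CGFloat degrees) { return degrees * M_PI / 180; }

+ (CGRect)videoPreviewBoxForGravity:(NSString *)gravity
                          frameSize:(CGSize)frameSize
                       apertureSize:(CGSize)apertureSize
{
    // the video frame is rotated 90 degrees relative to the portrait UI,
    // so the aperture height maps to the preview width and vice versa
    CGFloat apertureRatio = apertureSize.height / apertureSize.width;
    CGFloat viewRatio = frameSize.width / frameSize.height;

    CGSize size = frameSize;
    if ([gravity isEqualToString:AVLayerVideoGravityResizeAspect]) {
        if (viewRatio > apertureRatio) {
            // the view is wider than the video: fit to the height
            size.width = apertureSize.height * (frameSize.height / apertureSize.width);
            size.height = frameSize.height;
        } else {
            // the view is taller than the video: fit to the width
            size.width = frameSize.width;
            size.height = apertureSize.width * (frameSize.width / apertureSize.height);
        }
    }

    // center the video box inside the view
    return CGRectMake((frameSize.width - size.width) / 2.0,
                      (frameSize.height - size.height) / 2.0,
                      size.width, size.height);
}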

There you go. That is the basic principle behind face detection in iOS 5. For the nitty-gritty details, have a look at the code on GitHub or download the zip.

There is much more to be explored. Core Image also provides access to the detected locations of the eyes and the mouth, which would make it possible to place the mustache more accurately. We could also rotate the image based on the angle of the face on the screen.
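
The eye and mouth locations are exposed on each CIFaceFeature through hasLeftEyePosition/leftEyePosition, hasRightEyePosition/rightEyePosition and hasMouthPosition/mouthPosition. Here is a small sketch of reading them inside the feature loop; keep in mind that these points are in the same image coordinate space as the face bounds, so they need the same flip/scale/mirror conversion as faceRect before they can be used in the preview layer.

for (CIFaceFeature *ff in features) {
    if (ff.hasLeftEyePosition) {
        NSLog(@"left eye at %@", NSStringFromCGPoint(ff.leftEyePosition));
    }
    if (ff.hasRightEyePosition) {
        NSLog(@"right eye at %@", NSStringFromCGPoint(ff.rightEyePosition));
    }
    if (ff.hasMouthPosition) {
        NSLog(@"mouth at %@", NSStringFromCGPoint(ff.mouthPosition));
    }
}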

Adios!

Any feedback is appreciated in the comments.

19 Comments

  • Andrea says:

    Hi there, we built an app based on the same logic, but greatly improved. You should definitely check out “Stickers” on the App Store. Let me know what you think…
    http://itunes.apple.com/us/app/id475823824

  • JJ says:

    Fantastic contribution!

    I downloaded your code and I'm trying to add a pair of glasses apart from the moustache, based on the eye positions. I cannot get it done since I'm getting really confused with the different reference systems. I have gone through the 'Layer Geometry and Transforms' section in the documentation but I still don't get it.

    I have created another CALayer of 20×10 pixels and set its origin at leftEyePosition.x-10, leftEyePosition.y-5. Then I call:
    [previewLayer addSublayer:sublayer];

    but the location of the glasses does not correspond to the location of the eyes at all.

    Could you give me any hints about how to get this done?

    Thanks in advance.

    • jeroentrappers says:

      You need to convert the coordinates… just like for the face rectangle.

      Flip height and width,
      Flip x for y,
      Rescale for the viewbox,
      And adjust for mirroring.

      // flip preview width and height
      CGFloat temp = faceRect.size.width;
      faceRect.size.width = faceRect.size.height;
      faceRect.size.height = temp;
      temp = faceRect.origin.x;
      faceRect.origin.x = faceRect.origin.y;
      faceRect.origin.y = temp;
      // scale coordinates so they fit in the preview box, which may be scaled
      CGFloat widthScaleBy = previewBox.size.width / clearAperture.size.height;
      CGFloat heightScaleBy = previewBox.size.height / clearAperture.size.width;
      faceRect.size.width *= widthScaleBy;
      faceRect.size.height *= heightScaleBy;
      faceRect.origin.x *= widthScaleBy;
      faceRect.origin.y *= heightScaleBy;

      if ( isMirrored )
          faceRect = CGRectOffset(faceRect, previewBox.origin.x + previewBox.size.width - faceRect.size.width - (faceRect.origin.x * 2), previewBox.origin.y);
      else
          faceRect = CGRectOffset(faceRect, previewBox.origin.x, previewBox.origin.y);

      • Joan says:

        Can you show an example to convert the coordinates of the eyes?

        Fantastic article!!!

        Thanks!!!

      • rais38 says:

        Fantastic Article!

        Can you please give an example of how to convert the coordinates? (Draw the eye position)

  • madhu says:

    Can you please tell me how to save the face after it has been detected, using your code?

  • rais38 says:

    Fantastic article!

    Could you please tell us how to convert the coordinates to draw the eye position?

    Thanks

  • [...] Face detection with Core Image on Live Video [...]

  • Viren says:

    Hi, can you please tell me how to capture the image after face detection is done? I have added this IBAction, but it is not returning any image:

    - (IBAction)takePicture:(id)sender
    {
        AVCaptureConnection *videoConnection = [_stillImageOutput connectionWithMediaType:AVMediaTypeVideo];
        for (AVCaptureConnection *connection in _stillImageOutput.connections)
        {
            for (AVCaptureInputPort *port in [connection inputPorts])
            {
                if ([[port mediaType] isEqual:AVMediaTypeVideo])
                {
                    videoConnection = connection;
                    break;
                }
            }
            if (videoConnection) { break; }
        }

        NSLog(@"about to request a capture from: %@", _stillImageOutput);
        [_stillImageOutput captureStillImageAsynchronouslyFromConnection:videoConnection completionHandler:^(CMSampleBufferRef sampleBuffer, NSError *error)
        {
            CFDictionaryRef exifAttachments = CMGetAttachment(sampleBuffer, kCGImagePropertyExifDictionary, NULL);
            if (exifAttachments)
            {
                // Do something with the attachments.
                NSLog(@"attachments: %@", exifAttachments);
            }
            else
                NSLog(@"no attachments");

            NSData *imageData = [AVCaptureStillImageOutput jpegStillImageNSDataRepresentation:sampleBuffer];
            //UIImage *image = [[UIImage alloc] initWithData:imageData];
            ImageViewController *imageViewController = [[ImageViewController alloc] initWithNibName:@"ImageViewController" bundle:nil];
            imageViewController.image = [[UIImage alloc] initWithData:imageData];

            //self.vImage.image = image;
        }];
    }

  • Harshi says:

    Hello Jeroen,

    Thank you for your article and code. It has been very helpful in my project. I am trying to take the user touch information on the iPad and use this to crop a CIimage.

    In your code you went from CIImage coordinate space to view coordinate space in the “drawFaces” method.

    I am trying to do the opposite, where a user taps on the screen and the CIImage is cropped based on where the user tapped the screen.

    I simply tried reversing the conversion, but this did not work. Please could you point me in the right direction or give me some hints as to how I could do this?

    Thanks,
    Harshi

  • raul says:

    Good afternoon. I cannot get the camera image to rotate correctly along with the iPhone. Could you give me an idea of what is happening?

    Thanks

  • Ricardo says:

    is there any way to identify if it is a boy or a girl?

  • SABARI says:

    Instead of the mustache, how can I add glasses at the eye positions?

  • Yohan says:

    Hello, thanks a lot for this sample! I have a question, related to my lack of knowledge of Xcode: where is the part of the program that runs the detection every x seconds? I assume there is a schedule somewhere, but I can't seem to find it…
    Thanks for your reply ;)

    • It’s the

      - (void)captureOutput:(AVCaptureOutput *)captureOutput
      didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
      fromConnection:(AVCaptureConnection *)connection

      method that gets called automatically many times per second.

  • manuel says:

    Hi!! Great project!! I was just playing a bit with it and I was getting a warning because of this line:

    BOOL isMirrored = [self.previewLayer isMirrored];

    inside the ViewController. The method is deprecated, so you should now do it like this:

    BOOL isMirrored;
    if ([self.previewLayer respondsToSelector:@selector(connection)])
    {
        isMirrored = self.previewLayer.connection.isVideoMirrored;
    }
    else
    {
        isMirrored = self.previewLayer.isMirrored;
    }
