# Python: Andrew Ng deep learning assignment 16 -- face recognition

2022-07-24 16:44:48

# Face recognition - the Happy House

In this assignment, you will learn how to build a face recognition system.

Face recognition problems usually fall into two categories:

• Face verification: for example, at some airports the system scans your passport and then confirms that you (the person carrying the passport) are its rightful owner before letting you through customs. A mobile phone that unlocks using your face is another example. This is a 1:1 matching problem.
• Face recognition: for example, the lecture showed a video of Baidu employees entering the office through face recognition. This is a 1:K matching problem.

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can determine whether two pictures show the same person.

In this assignment, you will:

• Implement the triplet loss function
• Map face images into 128-dimensional encodings
• Use these encodings to perform face verification and face recognition

In this exercise, we will use a pre-trained model that represents ConvNet activations using the "channels first" convention, rather than the "channels last" convention used in previous programming assignments. In other words, a batch of images will have shape $(m, n_C, n_H, n_W)$ instead of $(m, n_H, n_W, n_C)$. Both conventions have a reasonable amount of traction among open-source implementations; there is no uniform standard in deep learning yet.
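To make the two layouts concrete, here is a minimal NumPy sketch that converts a "channels last" batch into the "channels first" layout. The helper name `to_channels_first` and the dummy batch are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def to_channels_first(batch_nhwc):
    """Reorder a batch from (m, n_H, n_W, n_C) to (m, n_C, n_H, n_W)."""
    return np.transpose(batch_nhwc, (0, 3, 1, 2))

# A dummy batch of two 96x96 RGB images in "channels last" layout.
batch_last = np.arange(2 * 96 * 96 * 3, dtype=np.float32).reshape(2, 96, 96, 3)
batch_first = to_channels_first(batch_last)

print(batch_last.shape)   # (2, 96, 96, 3)
print(batch_first.shape)  # (2, 3, 96, 96)
```

Only the axis order changes; the pixel at row 5, column 7, channel 2 of the first image is the same value in both arrays.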

First, let's load the required packages.

from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
K.set_image_data_format('channels_first')
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *

%matplotlib inline

# np.set_printoptions(threshold=np.nan)
import sys
np.set_printoptions(threshold=sys.maxsize)



## 0 Face verification

In face verification, you are given two images and must decide whether they show the same person. The simplest approach is to compare the two images pixel by pixel: if the distance between the raw images is below a chosen threshold, they may be the same person!

Of course, this algorithm performs really poorly, because pixel values change dramatically with lighting, face orientation, even small changes in head position, and other factors.
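A small NumPy sketch illustrates why raw pixel distances are fragile. It uses a synthetic random array in place of a real photo (an assumption made purely for illustration): brightening an image uniformly changes nothing about who is in it, yet it moves the image far away in pixel space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 96x96 RGB photo with pixel values in [0, 0.8].
image = rng.uniform(0.0, 0.8, size=(96, 96, 3))

# The "same photo" under brighter lighting: every pixel shifted by 0.2.
brighter = image + 0.2

# L2 distance between the raw pixel arrays.
pixel_dist = np.linalg.norm(image - brighter)
print(pixel_dist)
```

The distance equals 0.2 times the square root of the number of pixel values, so it grows with image size even though the content is unchanged. A fixed threshold on raw pixels therefore works poorly.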

You will see that, rather than using the raw image, you can learn an encoding $f(img)$ so that an element-wise comparison of encodings gives a more accurate judgment of whether two pictures show the same person.

## 1 Encoding face images into a 128-dimensional vector

### 1.1 Using a ConvNet to compute encodings

The FaceNet model needs a lot of training data and takes a long time to train. So, following common practice in applied deep learning, we load weights that have already been trained. The network architecture follows the Inception model from Szegedy et al. We provide an implementation of the Inception network; you can look at the file inception_blocks.py to see how it is implemented.

The key things you need to know are:

• This network takes $96 \times 96$ RGB images as input. Specifically, it takes a face image (or a batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$.
• It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector.

Run the following cell to create the model for face images.

FRmodel = faceRecoModel(input_shape=(3, 96, 96))

print("Total Params:", FRmodel.count_params())

Total Params: 3743280


By using a fully connected layer of 128 neurons as its last layer, the model ensures that the output is an encoding vector of size 128. You then use these encodings to compare two face images as follows:

By computing the distance between two encodings and comparing it against a threshold, you can determine whether the two pictures represent the same person.

An encoding is a good one if:

• The encodings of two images of the same person are quite similar to each other
• The encodings of two images of different persons are very different

The triplet loss function formalizes this: it tries to "push" the encodings of two images of the same person (Anchor and Positive) closer together, while "pulling" the encodings of two images of different persons (Anchor and Negative) further apart.

In what follows, we refer to the pictures from left to right as: Anchor (A), Positive (P), Negative (N).

### 1.2 Triplet loss

For an image $x$, we denote its encoding by $f(x)$, where $f$ is the function computed by the neural network.

We add a normalization step at the end of the model so that $\lVert f(x) \rVert_2 = 1$ (that is, the encoding vector has norm 1).
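The normalization step can be sketched in NumPy as follows. The function name `l2_normalize` and the small `eps` guard against division by zero are assumptions for illustration, not the code used inside the pre-trained model.

```python
import numpy as np

def l2_normalize(encodings, eps=1e-12):
    """Scale each row of `encodings` to unit L2 norm."""
    norms = np.linalg.norm(encodings, axis=-1, keepdims=True)
    return encodings / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0], [0.5, 0.0]])
unit = l2_normalize(raw)

print(unit)                            # rows point in the same direction as before
print(np.linalg.norm(unit, axis=-1))   # every row now has norm 1
```

With unit-norm encodings, distances depend only on the direction of the vectors, so a single threshold can be shared across all images.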

Training uses triplets of images $(A, P, N)$:

• A is an "Anchor" image: a picture of a person.
• P is a "Positive" image: a picture of the same person as the Anchor image.
• N is a "Negative" image: a picture of a different person from the Anchor image.

These triplets are picked from our training dataset. We write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example.

You want to make sure that an image $A^{(i)}$ of a person is closer to the Positive $P^{(i)}$ than to the Negative $N^{(i)}$ by at least a margin $\alpha$:

$$\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 + \alpha < \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$$

You therefore need to minimize the following "triplet cost":

$$\mathcal{J} = \sum^{m}_{i=1} \Big[ \underbrace{\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2}_\text{(1)} - \underbrace{\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2}_\text{(2)} + \alpha \Big]_+ \tag{3}$$

Here, the notation "$[z]_+$" denotes $\max(z, 0)$.

Notes:

• Term (1) is the squared distance between the anchor "A" and the positive "P" for a given triplet; you want this to be small.
• Term (2) is the squared distance between the anchor "A" and the negative "N" for a given triplet; you want this to be relatively large, so it makes sense to have a minus sign in front of it.
• $\alpha$ is called the margin. It is a hyperparameter that you pick manually. We will use $\alpha = 0.2$.
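As a worked example of formula (3), the following NumPy sketch evaluates the triplet cost on two hand-picked triplets of 2-dimensional encodings. The function name `triplet_cost_np` and the toy vectors are assumptions for illustration (real encodings are 128-dimensional and unit-norm); it is meant as a reference for checking intuition, not as the exercise solution.

```python
import numpy as np

def triplet_cost_np(anchor, positive, negative, alpha=0.2):
    """Triplet cost of formula (3) for arrays of shape (m, d)."""
    pos_dist = np.sum(np.square(anchor - positive), axis=-1)    # term (1)
    neg_dist = np.sum(np.square(anchor - negative), axis=-1)    # term (2)
    per_example = np.maximum(pos_dist - neg_dist + alpha, 0.0)  # [z]_+
    return per_example.sum()

anchor   = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])
negative = np.array([[2.0, 0.0], [1.0, 0.0]])

# Triplet 1: 1 - 4 + 0.2 = -2.8, clipped to 0 (the margin is already satisfied).
# Triplet 2: 2 - 1 + 0.2 =  1.2, which contributes to the cost.
cost = triplet_cost_np(anchor, positive, negative)
print(cost)  # approximately 1.2
```

Only the second triplet violates the margin, so it alone drives the cost; triplets that already satisfy the margin contribute nothing.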

Most implementations also normalize the encoding vectors so that their norm equals 1 (that is, $\lVert f(img) \rVert_2 = 1$).

Exercise: implement the triplet loss defined by formula (3). There are 4 steps:

1. Compute the distance between the encodings of "anchor" and "positive": $\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2$
2. Compute the distance between the encodings of "anchor" and "negative": $\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$
3. Compute the formula for each training example: $\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha$
4. Compute the full formula (3) by taking the maximum with zero and summing over the training examples: $\mathcal{J} = \sum^{m}_{i=1} \big[ \lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha \big]_+$

Useful functions: tf.reduce_sum(), tf.square(), tf.subtract(), tf.add(), tf.maximum().

For steps 1 and 2, you need to sum over the entries of $\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2$ and $\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$, while in step 4 you need to sum over the training examples.

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implement the triplet loss defined by formula (3).

    Arguments:
    y_true -- true labels; required when you define a loss in Keras, but not used here.
    y_pred -- Python list containing three objects:
        anchor -- encodings of the anchor images, of shape (None, 128)
        positive -- encodings of the positive images, of shape (None, 128)
        negative -- encodings of the negative images, of shape (None, 128)
    alpha -- hyperparameter, the margin

    Returns:
    loss -- real number, value of the loss
    """
    # Unpack the anchor, positive and negative encodings
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    # Step 1: squared distance between the anchor and positive encodings (sum over axis=-1)
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)

    # Step 2: squared distance between the anchor and negative encodings (sum over axis=-1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)

    # Step 3: subtract the two distances and add alpha
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)

    # Step 4: take the maximum with zero and sum over the training examples
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))

    return loss

with tf.Session() as test:
    tf.set_random_seed(1)
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))
    loss = triplet_loss(y_true, y_pred)

    print("loss = " + str(loss.eval()))

loss = 528.14307


## 2 Load the trained model

FaceNet is trained by minimizing the triplet loss. But since training requires a lot of data and computation, we will not train it from scratch here. Instead, we load a previously trained model. Use the following cell to load the model; this may take a few minutes to run.

FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])


Example of the distances between the encodings of three people

Now let's use this model to perform face verification and face recognition!

## 3 Applying the model

Back to the Happy House (see the earlier assignment for an introduction to the dataset)! Residents have been living blissfully since you implemented happiness recognition for the house in an earlier assignment.

However, several issues keep coming up: the Happy House became so happy that every happy person in the neighborhood is coming to hang out in your living room. It is getting really crowded, which is having a negative impact on the residents of the house. All these random happy people are also eating all your food.

So, you decide to change the door entry policy and not let random happy people enter anymore, even if they are happy! Instead, you want to build a face verification system that only lets in people from a specified list. To be admitted, each person has to swipe an ID card (identification card) to trigger the face recognition system at the door, which then checks that they are who they claim to be.

### 3.1 Face verification

Let's build a database containing one encoding vector for each person allowed to enter the Happy House. To generate an encoding we use img_to_encoding(image_path, model), which basically runs the forward propagation of the model on the specified image.

Run the following code to build the database (represented as a Python dictionary). This database maps each person's name to the 128-dimensional encoding of their face.

database = {}
database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)



Now, when someone shows up at your front door and swipes their ID card, you can look up their encoding in the database and use it to check whether the person standing at the front door matches the name on the ID.

Exercise: implement the verify() function, which checks whether the picture taken by the front-door camera (image_path) actually shows the person called "identity". You will have to go through the following steps:

1. Compute the encoding of the image from image_path
2. Compute the distance between this encoding and the encoding of the identity image stored in the database
3. Open the door if the distance is less than 0.7; otherwise do not open it.

As presented above, you should use the L2 distance (np.linalg.norm). (Note: in this implementation, compare the L2 distance, not the square of the L2 distance, to the threshold 0.7.)
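To see what the 0.7 threshold means, note that for unit-norm encodings the squared L2 distance is tied to the cosine similarity by $\lVert u - v \rVert_2^2 = 2 - 2\, u \cdot v$. The sketch below checks this identity numerically and relates the two kinds of threshold. The random unit vectors are stand-ins for real encodings, which is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unit_vector(dim=128):
    """Draw a random direction and scale it to unit L2 norm."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

u, v = random_unit_vector(), random_unit_vector()

dist = np.linalg.norm(u - v)         # L2 distance, the quantity compared to 0.7
squared = np.sum(np.square(u - v))   # squared L2 distance, as used in the triplet loss
cosine = float(np.dot(u, v))         # cosine similarity of two unit vectors

print(np.isclose(squared, 2.0 - 2.0 * cosine))  # True
print(np.isclose(dist ** 2, squared))           # True

# A threshold of 0.7 on the L2 distance is a threshold of about 0.49 on the squared distance.
threshold = 0.7
squared_threshold = threshold ** 2
```

Mixing up the two quantities would silently change how strict the door is, which is why the note above insists on the plain L2 distance.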

def verify(image_path, identity, database, model):
    """
    Verify whether the person in the "image_path" image is "identity".

    Arguments:
    image_path -- path to the image taken by the camera
    identity -- string, name of the person whose identity you want to verify
    database -- Python dictionary mapping the names of allowed people to their encodings
    model -- your model instance in Keras

    Returns:
    dist -- distance between the encoding of the camera image and the encoding stored in the database
    is_door_open -- boolean, True if the door should open
    """
    # Step 1: compute the encoding of the image with img_to_encoding() (imported from fr_utils)
    encoding = img_to_encoding(image_path, model)

    # Step 2: compute the distance to the encoding stored in the database
    dist = np.linalg.norm(encoding - database[identity])

    # Step 3: decide whether to open the door
    if dist < 0.7:
        print("Welcome home, " + str(identity) + "!")
        is_door_open = True
    else:
        print("Verification failed: you do not match " + str(identity) + "!")
        is_door_open = False

    return dist, is_door_open


Younes is trying to enter the Happy House, and the camera takes a picture of him ("images/camera_0.jpg"). Let's run your verification algorithm on this picture:

verify("images/camera_0.jpg","younes",database,FRmodel)

Welcome home, younes!

(0.6671406, True)


Benoit, who broke the aquarium last weekend, has been banned from the house and removed from the database. He stole Kian's ID card and came back to the house to try to pass himself off as Kian. The front-door camera took a picture of Benoit ("images/camera_2.jpg"). Let's run the verification algorithm to check whether Benoit can enter.

verify("images/camera_2.jpg", "kian", database, FRmodel)

Verification failed: you do not match kian!

(0.85868865, False)


### 3.2 Face recognition

Your face verification system is mostly working well. But since Kian had his ID card stolen, he couldn't get in when he came back to the house that evening!

To reduce such shenanigans, you want to change your face verification system into a face recognition system. This way, no one has to carry an ID card anymore. An authorized person can just walk up to the house, and the front door will unlock for them!

To do so, you will implement a face recognition system that takes an image as input and determines whether it shows one of the authorized persons. Unlike the previous face verification system, we will no longer get a person's name as another input.

Exercise: implement who_is_it(). You will have to go through the following steps:

1. Compute the target encoding of the image from image_path
2. Find the encoding in the database that has the smallest distance to the target encoding.
• Initialize the min_dist variable to a large enough number (100). It will help you keep track of the encoding closest to the input's encoding.
• Loop over the names and encodings in the database dictionary. To loop, use for (name, db_enc) in database.items().
• Compute the L2 distance between the target "encoding" and the current "encoding" from the database.
• If this distance is less than min_dist, set min_dist to dist and identity to name.

def who_is_it(image_path, database, model):
    """
    Perform face recognition on the image at image_path.

    Arguments:
    image_path -- path to an image
    database -- dictionary mapping names to their encodings
    model -- your model instance in Keras

    Returns:
    min_dist -- the smallest distance between the image's encoding and the encodings in the database
    identity -- string, the name whose encoding is at distance min_dist (None if the database is empty)
    """
    # Step 1: compute the encoding of the image with img_to_encoding() (imported from fr_utils)
    encoding = img_to_encoding(image_path, model)

    # Step 2: find the closest encoding
    # Initialize min_dist to a large enough number (100) and identity to None
    min_dist = 100
    identity = None

    # Loop over the database to find the closest encoding
    for (name, db_enc) in database.items():
        # Compute the L2 distance between the target encoding and the current database encoding
        dist = np.linalg.norm(encoding - db_enc)

        # If this distance is less than min_dist, update min_dist and identity
        if dist < min_dist:
            min_dist = dist
            identity = name

    # Check whether the closest match is close enough to count as a known person
    if min_dist > 0.7:
        print("Sorry, you are not in the database.")
    else:
        print("Name: " + str(identity) + ", distance: " + str(min_dist))

    return min_dist, identity


Younes is at the front door, and the camera takes a picture of him ("images/camera_0.jpg"). Let's see whether your who_is_it() algorithm identifies Younes.

who_is_it("images/camera_0.jpg", database, FRmodel)

Name: younes, distance: 0.6671406

(0.6671406, 'younes')


You can change "camera_0.jpg" (a picture of Younes) to "camera_1.jpg" (a picture of Bertrand) and look at the result.
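The linear scan in who_is_it() can also be written as a vectorized nearest-neighbor search. The sketch below is one possible variant, assuming every database entry is an encoding array of the same size; the function name `closest_identity` and the toy 3-dimensional database are hypothetical and only serve to make the example self-contained. Unlike who_is_it(), it returns None as the name when nobody is within the threshold.

```python
import numpy as np

def closest_identity(encoding, database, threshold=0.7):
    """Return (min_dist, name); name is None if no entry is within threshold."""
    names = list(database.keys())
    # Stack all stored encodings into one (K, d) matrix.
    stored = np.stack([np.ravel(database[name]) for name in names])
    # L2 distance from the query to each of the K stored encodings.
    dists = np.linalg.norm(stored - np.ravel(encoding), axis=1)
    best = int(np.argmin(dists))
    min_dist = float(dists[best])
    identity = names[best] if min_dist <= threshold else None
    return min_dist, identity

# Toy database with made-up names and 3-dimensional encodings.
toy_database = {
    "alice": np.array([[1.0, 0.0, 0.0]]),
    "bob": np.array([[0.0, 1.0, 0.0]]),
}

known = closest_identity(np.array([[0.9, 0.1, 0.0]]), toy_database)
unknown = closest_identity(np.array([[0.0, 0.0, 1.0]]), toy_database)
print(known)    # close to alice's stored encoding
print(unknown)  # far from everyone, so the name is None
```

For a dozen people the loop and the vectorized form are equally fast; the matrix form mainly pays off when K grows large.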

Your Happy House is running well. It only lets in authorized persons, and people no longer need to carry an ID card around!

You now know how a state-of-the-art face recognition system works.

Although we won't implement it here, there are some ways to further improve the algorithm:

• Put more images of each person (under different lighting conditions, taken on different days, and so on) into the database. Then, given a new image, compare the new face to multiple pictures of the person to improve accuracy.
• Crop the images to contain just the face, with less of the "border" region around it. This preprocessing removes some of the irrelevant pixels around the face and also makes the algorithm more robust.
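The first improvement can be sketched as follows: store several encodings per person and score a new face by its smallest distance to any of them. This is a hypothetical extension, not part of the assignment; the function names `min_distance_to_person` and `recognize`, the database layout (each name mapped to a list of encodings), and the toy 2-dimensional vectors are all assumptions.

```python
import numpy as np

def min_distance_to_person(encoding, person_encodings):
    """Smallest L2 distance between `encoding` and any stored encoding of one person."""
    return min(float(np.linalg.norm(encoding - stored)) for stored in person_encodings)

def recognize(encoding, multi_database, threshold=0.7):
    """Return (min_dist, name); name is None if nobody is within threshold."""
    best_name, best_dist = None, float("inf")
    for name, person_encodings in multi_database.items():
        dist = min_distance_to_person(encoding, person_encodings)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return best_dist, None
    return best_dist, best_name

# Toy databases: one picture per person versus two pictures of "alice".
single_database = {"alice": [np.array([1.0, 0.0])], "bob": [np.array([0.0, -1.0])]}
multi_database = {
    "alice": [np.array([1.0, 0.0]), np.array([0.6, 0.8])],
    "bob": [np.array([0.0, -1.0])],
}

query = np.array([0.5, 0.8])
single_result = recognize(query, single_database)
multi_result = recognize(query, multi_database)
print(single_result)  # rejected: the only stored picture of alice is too far away
print(multi_result)   # accepted: the second stored picture of alice is close
```

Taking the minimum over stored pictures is only one possible aggregation; averaging the distances or the encodings are alternatives with different trade-offs.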

What you should remember:

• Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem.
• The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.
• The same encoding can be used for both verification and recognition. Measuring the distance between the encodings of two images lets you determine whether they are pictures of the same person.