High School Student Blog: Project Bridging AI and Facial Recognition
Giving Robots an Infant’s Sense of Vision
After detecting human emotions through audio features in the first part of this mini-series, the natural next step is to teach the robot to detect emotion through image features. For the second part of my three-part mini-series on AI & robotics, I am walking through an experimental Face Emotion Recognition project and exploring its potential.
Here’s a quick overview of the project:
Project Description: Using a Haar cascade classifier to detect faces in a frame and a custom sequential model to classify the emotion of each detected face in real time.
Data Set:
Includes male and female faces
Contains data separated into train and validation sets, and further into angry, disgust, fear, happy, neutral, sad, and surprise (the expected folder layout is sketched below)
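For reference, the data generators used later expect one sub-folder per emotion inside each split, roughly like this (the root path comes from my setup and may differ on yours):
../RawData/facial-expression/images/
    train/
        angry/  disgust/  fear/  happy/  neutral/  sad/  surprise/
    validation/
        angry/  disgust/  fear/  happy/  neutral/  sad/  surprise/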
With that in mind, let's divide the project into two parts:
Model: loading the data and building, training, and testing the model
Detecting faces: Finding regions of interest (ROI) or faces in a frame and real-time emotion detection based on them
MODEL
To prepare the data for our model, we need a train generator and a validation generator that read the images from the data folders. We also rescale the pixel values to the 0-1 range so the images are easier for the network to work with. We can use this code sample:
# Import needed for the data generators
from keras.preprocessing.image import ImageDataGenerator

# Define data generators
train_dir = "../RawData/facial-expression/images/train/"
val_dir = "../RawData/facial-expression/images/validation/"
num_train = 28709
num_val = 7178
batch_size = 64
num_epoch = 50
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
train_dir,
target_size=(48,48),
batch_size=batch_size,
color_mode="grayscale",
class_mode='categorical')
validation_generator = val_datagen.flow_from_directory(
val_dir,
target_size=(48,48),
batch_size=batch_size,
color_mode="grayscale",
class_mode='categorical')
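Keras assigns each class an index based on the alphabetical order of its folder name, which is exactly what the emotion dictionary later in this post assumes. If you want to sanity-check the mapping on your own copy of the data, something like this (not part of the original script) works:
# Confirm that the folder-to-index mapping matches the emotion labels used later
print(train_generator.class_indices)
# e.g. {'angry': 0, 'disgust': 1, 'fear': 2, 'happy': 3, 'neutral': 4, 'sad': 5, 'surprise': 6}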
Now, onto building the model:
# Imports for the model, layers, and optimizer
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.optimizers import Adam

# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48,48,1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
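Before training, it is worth calling model.summary() to confirm the shapes: the 48x48x1 input shrinks to a 4x4x128 feature map after the three pooling layers, which is flattened (2048 values) into the dense layers and ends in 7 softmax outputs, one per emotion.
# Print a layer-by-layer overview of the architecture and parameter counts
model.summary()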
Finally, if you are running this code for the first time, or if the model needs to be retrained, use this code sample to compile and train it:
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.0001, decay=1e-6),
              metrics=['accuracy'])
model_info = model.fit_generator(
train_generator,
steps_per_epoch=num_train // batch_size,
epochs=num_epoch,
validation_data=validation_generator,
validation_steps=num_val // batch_size)
plot_model_history(model_info)
model.save_weights('model.h5')
It uses the Adam optimizer and categorical cross-entropy as its loss function. Finally, it stores the generated weights in a .h5 file so they can be loaded quickly on later runs instead of retraining.
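The plot_model_history helper called above is not shown in this post; a minimal version, assuming it simply plots the accuracy and loss curves from the Keras History object, could look like this:
import matplotlib.pyplot as plt

def plot_model_history(model_history):
    # Plot training vs. validation accuracy and loss over the epochs
    # (on older Keras versions the history keys may be 'acc' / 'val_acc' instead)
    fig, axs = plt.subplots(1, 2, figsize=(15, 5))
    axs[0].plot(model_history.history['accuracy'], label='train')
    axs[0].plot(model_history.history['val_accuracy'], label='validation')
    axs[0].set_title('Model Accuracy')
    axs[0].set_xlabel('Epoch')
    axs[0].legend()
    axs[1].plot(model_history.history['loss'], label='train')
    axs[1].plot(model_history.history['val_loss'], label='validation')
    axs[1].set_title('Model Loss')
    axs[1].set_xlabel('Epoch')
    axs[1].legend()
    plt.show()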
DETECTING FACES
Before we can analyze emotions, we need to start a video source, prepare each frame, and find the faces/regions of interest (ROI) in it. On most runs we also just load the saved weights instead of retraining the model. This code sample does all of that:
print("[*] Loading the model...")
model.load_weights('model.h5')
# prevents openCL usage and unnecessary logging messages
cv2.ocl.setUseOpenCL(False)
# dictionary which assigns each label an emotion (alphabetical order)
emotion_dict = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy", 4: "Neutral", 5: "Sad", 6: "Surprised"}
# start the webcam feed
print("[*] Loading camera...")
cap = cv2.VideoCapture(0)
# Load the Haar cascade used to find faces (the bounding boxes are drawn later)
facecasc = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
while True:
    # Grab a frame from the camera
    ret, frame = cap.read()
    if not ret:
        print("[*] Camera not found")
        break
    # Rotate the frame 180 degrees
    frame = cv2.rotate(frame, cv2.ROTATE_180)
    # Convert to grayscale and detect faces / regions of interest
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = facecasc.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    # Save the latest frame and its grayscale version for reference
    cv2.imwrite("original_fr.png", frame)
    cv2.imwrite("grayscale_fr.png", gray)
Finally, for each face found in the frame, let's crop out the region of interest, run it through the model, and draw a labeled rectangle on the output frame.
    for (x, y, w, h) in faces:
        # Draw a bounding box around the face
        cv2.rectangle(frame, (x, y-50), (x+w, y+h+10), (255, 0, 0), 2)
        # Crop the face, resize it to 48x48, and add the channel and batch dimensions the model expects
        roi_gray = gray[y:y + h, x:x + w]
        cropped_img = np.expand_dims(np.expand_dims(cv2.resize(roi_gray, (48, 48)), -1), 0)
        # Note: the training images were rescaled by 1./255; you may want to rescale cropped_img the same way so inference matches training
        prediction = model.predict(cropped_img)
        maxindex = int(np.argmax(prediction))
        # Label the bounding box with the predicted emotion
        cv2.putText(frame, emotion_dict[maxindex], (x+20, y-60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
        if debug:
            print(emotion_dict[maxindex])
    # cv2.imshow('Video', cv2.resize(frame,(1600,960),interpolation = cv2.INTER_CUBIC))
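The imshow call is commented out in the script above; if you run this on a machine with a display, you would typically uncomment it and add a key check inside the loop plus cleanup after it. A rough sketch (not part of the code tested on Shelbot), with the first lines sitting inside the while loop:
    # Show the annotated frame and allow quitting with 'q'
    cv2.imshow('Video', cv2.resize(frame, (1600, 960), interpolation=cv2.INTER_CUBIC))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close any OpenCV windows once the loop ends
cap.release()
cv2.destroyAllWindows()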
CONCLUSION
Similar to how an infant starts crying if you merely look at them with an angry face, this model lets a robot read and judge our emotions just by looking at us. Moreover, it complements the voice-based emotion recognition from the last part of this mini-series.
This code was tested on a Raspberry Pi 4 powered humanoid, Shelbot (one of my ongoing projects).
The full code for this project can be found here.
ABOUT ME
Github: https://github.com/LakshBhambhani
LinkedIn: https://www.linkedin.com/in/lakshbhambhani/
Laksh Bhambhani is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes curious high school students globally to AI through live online classes. Learn more at https://www.inspiritai.com/.
Originally published on Medium: https://lakshbhambhani.medium.com/giving-robots-an-infants-sense-of-vision-1008f720a669