SecureFace is an application that identifies a person by their face and records a short video when motion is detected. When a person comes within 30 cm of the ultrasonic sensor, the camera is activated and takes photos. The Esp32cam board sends the photos over Wi-Fi to a Raspberry Pi, which runs a recognition function to identify the face and returns the result. The LCD then displays the recognized person's name or an error message. If motion is instead detected by the PIR sensor, a batch of images is sent to the Raspberry Pi server, which creates and stores a video of the suspicious scene.
For this project we used:
- Raspberry Pi (Pi 3 or 4 suggested)
- Esp32cam Ai-Thinker board
- FTDI232 (USB to serial converter)
- PIR sensor (HC-SR501)
- Ultrasonic sensor (HC-SR04)
- LCD Display (1602) with I2C module
If you use a different version of the board, change the camera pin definitions below to match it.
#define PWDN_GPIO_NUM 32
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 0
#define SIOD_GPIO_NUM 26
#define SIOC_GPIO_NUM 27
#define Y9_GPIO_NUM 35
#define Y8_GPIO_NUM 34
#define Y7_GPIO_NUM 39
#define Y6_GPIO_NUM 36
#define Y5_GPIO_NUM 21
#define Y4_GPIO_NUM 19
#define Y3_GPIO_NUM 18
#define Y2_GPIO_NUM 5
#define VSYNC_GPIO_NUM 25
#define HREF_GPIO_NUM 23
#define PCLK_GPIO_NUM 22
The ultrasonic sensor is required only if face recognition is active, in order to detect whether a person is in front of the camera.
The LCD display is optional; if you can't use one, you can read the face recognition response in the serial monitor.
You can change all the pins we used, but with the Esp32cam Ai-Thinker we suggest not using GPIO0 and GPIO16. The former is the clock source for the camera and the latter is used to activate the PSRAM (if your board has one).
You can follow this guide to set the PIR sensor in auto-reset mode and, if you want, to adjust the sensitivity and the output timing. We used the default values. Connect the output pin to GPIO12.
If you choose another GPIO pin, remember to also change the value of the bitmask.
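As a rough illustration of what the bitmask encodes, the sketch below assumes the common convention of one bit per GPIO number (the way the mask is defined in the firmware is not shown here, so this is an assumption). It is written in Python only to show the arithmetic; the firmware itself is C++, and `gpio_bitmask` is a hypothetical helper name.

```python
def gpio_bitmask(*pins):
    """Return a mask with one bit set per GPIO number (assumed convention)."""
    mask = 0
    for pin in pins:
        # The ESP32 numbers its GPIOs from 0 to 39
        if not 0 <= pin <= 39:
            raise ValueError(f"GPIO{pin} is outside the ESP32 range 0-39")
        mask |= 1 << pin
    return mask
```

Under that assumption, `hex(gpio_bitmask(12))` gives `0x1000` for the default PIR pin, and moving the sensor to another pin changes the value accordingly.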
For the ultrasonic sensor, we simply connected the trigger pin to GPIO13 and the echo pin to GPIO15.
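The echo pin of the HC-SR04 stays high for the round-trip time of the ultrasonic pulse, so the distance is half of that time multiplied by the speed of sound. The following sketch shows the conversion and the 30 cm check in Python, purely to illustrate the arithmetic (the firmware is C++). The function names and the constant names are hypothetical, and the speed of sound is the approximate value at about 20 degrees C.

```python
SOUND_SPEED_CM_PER_US = 0.0343  # approximate speed of sound in air, cm per microsecond
PHOTO_TRIGGER_CM = 30           # distance threshold mentioned in the introduction


def echo_to_cm(echo_us):
    """Convert an echo pulse width in microseconds to a distance in cm."""
    # The pulse travels to the obstacle and back, hence the division by 2
    return echo_us * SOUND_SPEED_CM_PER_US / 2


def person_in_front(echo_us, trigger_cm=PHOTO_TRIGGER_CM):
    """Return True when the measured distance is below the trigger distance."""
    distance = echo_to_cm(echo_us)
    # A zero reading is treated as "no echo received"
    return 0 < distance < trigger_cm
```

For example, an echo of 1000 microseconds corresponds to about 17 cm and would trigger the camera, while 3000 microseconds (about 51 cm) would not.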
To use fewer pins (the Esp32cam does not have enough of them), we used the I2C adapter for the display, which needs only two pins: GPIO14 and GPIO2 (SDA and SCL). The address of our display is 0x27, but it could be a different one, typically 0x3F.
We suggest installing the VS Code extension called PlatformIO IDE and importing the ESP-32CAM project folder into it. With this extension, all the libraries and architecture dependencies are imported automatically. Other IDEs can be used instead; all the dependencies and configuration are written in platformio.ini:
platform = espressif32
board = esp32cam
framework = arduino
monitor_speed = 115200
board_build.partitions = min_spiffs.csv
build_flags =
# -DCORE_DEBUG_LEVEL=5 # activate to receive debugging infos
lib_deps =
Alternatively, you can use the PlatformIO CLI.
The Raspberry Pi environment only needs Python and a C/C++ compiler, which are already installed in the Raspbian distro, plus some Python libraries that can be easily installed with pip.
$ sudo apt update && sudo apt upgrade
$ sudo apt install python3-pip
You need to install opencv-headless and face-recognition, but first you need to expand the swapfile size from 100 to 2048 (the CONF_SWAPSIZE value, in MB):
$ sudo nano /etc/dphys-swapfile
Then restart the swapfile service for the change to take effect:
$ sudo systemctl restart dphys-swapfile
Now we can install the dependencies listed in the requirements.txt file:
$ cd SecureFace/facial_req/
$ pip install -r requirements.txt
The installation can take a few hours. Afterwards, restore the swapfile size to 100 by repeating the same two commands.
├── ESP-32CAM
│   ├── platformio.ini                          # platformio configuration file
│   ├── include
│   ├── lib
│   ├── test
│   └── src
│       └── main.cpp                            # script for Esp32-CAM
├── Server
│   ├── test
│   │   └── send_photo.c                        # test server
│   ├── server_video.c                          # script for video
│   └── server_rec.c                            # script for face recognition
└── facial_rec
    ├── encodings.pickle                        # faces train model
    ├── haarcascade_frontalface_default.xml     # frontal face trained model
    ├──                                         # script to recognize faces
    ├──                                         # script for training model
    └── shell.nix                               # configuration file for nix-shell
The code for the board is written in a modular way. You can comment/uncomment the following macros to enable/disable some features.
// Enable Video
#define VIDEO
// Enable Recognition
#define RECOGNITION
// If display is used
#define LCD_DISPLAY
When VIDEO is enabled, the camera starts taking photos for the server that makes the video.
When RECOGNITION is enabled, the camera interrupts the video (if enabled) and starts taking photos for the recognition server, but only when the ultrasonic sensor measures a distance smaller than PHOTO_TRIGGER.
The LCD_DISPLAY macro enables printing on an external display.
Before uploading the code there are some macros that you have to set:
- WIFI_SSID your wifi network name
- WIFI_PSW your wifi password
- HOST_VIDEO the IP or the hostname of the video server
- HOST_PHOTO the IP or the hostname of the recognition server
To upload the code to the board you can use either the VS Code extension or the CLI; in both cases the board is connected through the FTDI232.
In VS Code, click the upload arrow (the third item in the PlatformIO toolbar).
From the terminal, go to the ESP-32CAM directory and run:
$ pio run -t upload
The first step is to train the facial recognition model, which requires a dataset. To create it, make a dataset folder and add one subfolder for each person that the model must recognize. The name of each subfolder is what the recognition script outputs when it recognizes that person, while "Unknown" is the output when the script doesn't recognize anyone. In our project, we used a hundred images per person and obtained acceptable accuracy.
$ cd facial_rec
$ mkdir dataset
$ cd dataset
$ mkdir person1
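Before training, it can help to check that every person folder actually contains images. The helper below is a minimal sketch, not part of the project: the name `count_images` and the list of accepted file extensions are assumptions.

```python
from pathlib import Path

# Assumed image formats; adjust to whatever your dataset contains
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(dataset_dir):
    """Map each person folder name to the number of image files inside it."""
    counts = {}
    for person_dir in sorted(Path(dataset_dir).iterdir()):
        if not person_dir.is_dir():
            continue  # ignore stray files next to the person folders
        counts[person_dir.name] = sum(
            1
            for f in person_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images("dataset")` from the facial_rec folder returns a dictionary such as `{"person1": 100}`, which makes empty or misnamed folders easy to spot.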
Next, run the training script to train the model on the dataset you created.
$ python3
When the training is complete, the file encodings.pickle will be created. This file will be used to recognize faces.
Now you need to compile server_rec.c and server_video.c:
$ cd Server
$ gcc server_rec.c -o server_rec
$ gcc server_video.c -o server_video
The server_rec program opens a socket and waits for a connection from a client. When a connection is established, a child process is created to manage the connection.
// Accept the connection from the client and check the result
connfd = accept(sockfd, (struct sockaddr *)&cli, &len);
if (connfd < 0)
    printf("server accept failed...\n");
else
    printf("server accept the client...\n");
// Start new child process that handles the connection
int fid = fork();
The child process creates a temporary folder where it saves the image files received from the client. The data bytes are read from the socket and saved as images with the save_photos() function. Once the data transmission is complete, the recognition script is run to recognize faces by matching the received images against encodings.pickle.
#define REC_PROGRAM "../facial_rec/"
// Path to the .pickle file returned from facial recognition training
#define PICKLE_FILE "../facial_req/encodings.pickle"
// Arguments to pass to the recognition program
char *argv_recognition[ARG_LEN] = {"python3", REC_PROGRAM, PICKLE_FILE, "-d"};
// main()
if (fid == 0)
{
    // Buffer used to receive photos
    uint8_t buff[MAX];
    // Creation of temporary directory for incoming photos
    char *dir_name = mkdtemp(template);
    // Add directory to face recognition arguments
    argv_recognition[ARG_LEN - 2] = dir_name;
    // Bytes read
    int bytes;
    // Read until EOF or an error
    while ((bytes = read(connfd, buff, MAX)) > 0)
    {
        // Break if the photos are finished
        if (!save_photos(buff, bytes, dir_name))
            break;
    }
    // Overwrite the stdout with the connection fd to send the response
    dup2(connfd, STDOUT_FILENO);
    // Execute the recognition program
    execvp("python3", argv_recognition);
}
The -d flag in the arguments of the Python script enables deletion of the temporary folder after recognition.
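To try the recognition server without the board, a small client can play the role of the Esp32cam. The sketch below is a hypothetical test client, not the project's own: it assumes the server treats the end of the stream as the end of the batch and writes its answer back on the same connection, as the excerpt above suggests. The host and port are placeholders you must set to match your server.

```python
import socket


def send_photos(host, port, photo_paths, timeout=30.0):
    """Send image files back-to-back and return the server's text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for path in photo_paths:
            with open(path, "rb") as f:
                sock.sendall(f.read())
        # Half-close the connection so the server's read() sees EOF
        sock.shutdown(socket.SHUT_WR)
        # Collect the reply until the server closes its side
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode(errors="replace").strip()
```

A call such as `send_photos(host, port, ["face1.jpg", "face2.jpg"])` would then return the name printed by the recognition script, or "Unknown". Recognition on a Raspberry Pi can be slow, so the timeout may need to be raised.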
The server_video program opens a socket and waits for a connection from a client. When a connection is established, a child process is created. This child process calls ffmpeg to create a video from the image bytes received. The child process overwrites stdin with the socket stream so that ffmpeg can take the image bytes from the input stream.
// If child process
if (fid == 0)
{
    // Overwrite the stdin with the connection fd to receive the photos
    dup2(connfd, STDIN_FILENO);
    // Create the filename for the video (snprintf cannot overflow the buffer)
    char filename[20];
    snprintf(filename, sizeof(filename), "video-%d.avi", counter);
    // Execute ffmpeg with images as input
    execlp("ffmpeg", "ffmpeg", "-loglevel", "debug", "-y", "-f", "image2pipe", "-vcodec", "mjpeg", "-r",
           "10", "-i", "-", "-vcodec", "mpeg4", "-qscale", "5", "-r", "10", filename, NULL);
}
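The same ffmpeg pipeline can be exercised locally by piping saved JPEG files into ffmpeg's standard input. This is a minimal sketch under the assumption that ffmpeg is installed and on the PATH; the function names are hypothetical, and the options are the ones from the C call above, minus the debug log level.

```python
import subprocess


def build_ffmpeg_command(filename, fps=10):
    """Build the argument list that turns a piped MJPEG stream into a video."""
    rate = str(fps)
    return [
        "ffmpeg", "-y",
        "-f", "image2pipe", "-vcodec", "mjpeg", "-r", rate,
        "-i", "-",  # read the images from stdin
        "-vcodec", "mpeg4", "-qscale", "5", "-r", rate,
        filename,
    ]


def make_video(jpeg_paths, filename, fps=10):
    """Feed the JPEG files to ffmpeg through stdin and return its exit code."""
    proc = subprocess.Popen(build_ffmpeg_command(filename, fps), stdin=subprocess.PIPE)
    for path in jpeg_paths:
        with open(path, "rb") as f:
            proc.stdin.write(f.read())
    # Closing stdin signals the end of the image stream
    proc.stdin.close()
    return proc.wait()
```

In the server, the socket takes the place of the pipe, so ffmpeg reads the frames directly as the board sends them.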
The recognition script calls the recognition() function on each image in the input folder. The function uses OpenCV to open and resize the image, and the face_recognition library to detect the face bounding boxes and compute matches against the encodings.pickle file. The function returns the name with the largest number of matches, or "Unknown" if there is none.
import cv2
import face_recognition

# `data` holds the known encodings and names loaded from encodings.pickle


def recognition(path):
    # Load the image
    frame = cv2.imread(path)
    # Resize the image
    frame = cv2.resize(frame, (500, 500))
    # OpenCV loads images as BGR, while face_recognition expects RGB
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Detect the face boxes
    boxes = face_recognition.face_locations(rgb)
    # Compute the facial embeddings for each face bounding box
    encodings = face_recognition.face_encodings(rgb, boxes)
    name = "Unknown"  # If face is not recognized, then print Unknown
    # No face was found in the image
    if not encodings:
        return name
    # Compare the first detected face with the known encodings
    matches = face_recognition.compare_faces(data["encodings"], encodings[0])
    # Check to see if we have found a match
    if True in matches:
        # Find the indexes of all matched faces then initialize a
        # dictionary to count the total number of times each face
        # was matched
        matchedIdxs = [i for (i, b) in enumerate(matches) if b]
        counts = {}
        # Loop over the matched indexes and maintain a count for
        # each recognized face
        for i in matchedIdxs:
            name = data["names"][i]
            counts[name] = counts.get(name, 0) + 1
        # Determine the recognized face with the largest number
        # of votes (note: in the event of an unlikely tie Python
        # will select first entry in the dictionary)
        name = max(counts, key=counts.get)
    return name
The script returns the most common name returned by the recognition() function across all images.
for photo in os.listdir(path):
    names.append(recognition(os.path.join(path, photo)))  # call recognition function for each image
# -------------------------------------------------------------
# Count the occurrences of the results
counter = collections.Counter(names)
# Print the most common result
print(counter.most_common(1)[0][0])  # most common name
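The voting step can be tried in isolation. The sketch below is a hypothetical helper (`most_common_name` is not part of the project) that also falls back to "Unknown" when the folder produced no results, a case in which indexing `most_common(1)[0]` would raise an IndexError.

```python
import collections


def most_common_name(names, default="Unknown"):
    """Return the name that appears most often, or `default` for an empty list."""
    if not names:
        return default
    counter = collections.Counter(names)
    # most_common(1) returns a list with one (name, count) pair
    return counter.most_common(1)[0][0]
```

For instance, `most_common_name(["person1", "Unknown", "person1"])` returns `"person1"`, so a single frame where the face was not recognized does not change the final answer.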
Students: Enrico Carnelos - Roberto Lorenzon - Fabio Grotto
Embedded Software for the Internet of Things Course - Professor: Yildrim Kasim Sinan
MIT License unless otherwise specified. See the license file for details.