
Improve the face alignment performance in detect_faces() #1409

Open · wants to merge 1 commit into master
Conversation

huulockt

Tickets

#1244
#1406

What has been done

With this PR, the detect_faces() logic has been modified when alignment is enabled:

  • The original image is passed directly to the detect function, so the detection results are not affected by the alignment flag.
  • Only the face region is aligned, rather than the entire image, which improves alignment speed.
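The face-region-only alignment described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual deepface implementation; the function name, the nearest-neighbour rotation, and the simplified sign conventions are all assumptions:

```python
import numpy as np

def align_face_region(img, facial_area, left_eye, right_eye):
    """Rotate only the cropped face region so the eye line becomes horizontal.

    facial_area: {"x", "y", "w", "h"} in original-image coordinates.
    left_eye / right_eye: (x, y) landmarks in original-image coordinates.
    """
    x, y, w, h = (facial_area[k] for k in ("x", "y", "w", "h"))
    face = img[y:y + h, x:x + w]

    # Angle of the eye line relative to the horizontal axis
    theta = np.arctan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])

    # Nearest-neighbour rotation about the crop centre (a stand-in for
    # cv2.warpAffine; sign conventions simplified for illustration).
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    src_x = np.cos(theta) * (xs - cx) - np.sin(theta) * (ys - cy) + cx
    src_y = np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.clip(np.round(src_x).astype(int), 0, w - 1)
    src_y = np.clip(np.round(src_y).astype(int), 0, h - 1)
    return face[src_y, src_x]
```

The point is that the array being rotated is the small `face` crop, not the full image, so the per-face cost depends only on the crop size.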

Sorry I didn't add a unit test case for #1244 as I promised. I think this bug can only be detected visually, since it’s hard to test automatically. But if an automated test is needed, I’d suggest using template matching algorithms.
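The template-matching idea could be sketched with a plain normalized cross-correlation score. This is a minimal numpy illustration (rather than, say, cv2.matchTemplate); the function name and the 0.9 threshold in the comment are assumptions:

```python
import numpy as np

def normalized_correlation(a, b):
    """Normalized cross-correlation score in [-1, 1] for equal-size patches."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0

# An alignment regression test could then assert that the aligned output of a
# deliberately rotated input correlates highly with a known-good reference crop:
#     assert normalized_correlation(aligned, reference) > 0.9
```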

How to test

make lint && make test

@serengil
Owner

When I run both the current design and your change of the detection module on this image, I get these results:

current design:

opencv detected 5 faces
ssd detected 3 faces
dlib detected 4 faces
mtcnn detected 6 faces
retinaface detected 7 faces
yunet detected 4 faces
yolov8 detected 8 faces
centerface detected 6 faces

your change:

opencv detected 5 faces
ssd detected 4 faces
dlib detected 4 faces
mtcnn detected 5 faces
retinaface detected 6 faces
yunet detected 4 faces
yolov8 detected 7 faces
centerface detected 7 faces

You can also find the detection results in the attached screenshot (2024-12-24 11:01).

As you can see, with your change most of the detectors fail to find faces close to the image boundaries.

I am sharing my test code here:

import matplotlib.pyplot as plt

from deepface import DeepFace

img_path = "dataset/selfie-many-people.jpg"

detector_backends = [
    "opencv",
    "ssd",
    "dlib",
    "mtcnn",
    "retinaface",
    "yunet",
    "yolov8",
    "centerface",
]

for detector_backend in detector_backends:
    face_objs = DeepFace.extract_faces(
        img_path=img_path,
        detector_backend=detector_backend,
        # expand_percentage=0,
    )
    print(f"{detector_backend} detected {len(face_objs)} faces")
    fig = plt.figure(figsize=(10, 10))
    for face_obj in face_objs:
        face = face_obj["face"]
        plt.imshow(face)
        plt.axis("off")
        plt.show()

TL;DR: the current design can detect faces close to the image boundaries, but your change cannot.

@huulockt
Author

Thanks for the feedback! I'm busy with Christmas right now, but I’ll check it carefully soon. For now, here are my thoughts:

  • Each model is pretrained on its own dataset, so there are always some constraints on what it can detect.
  • Lightweight detectors like SSD, YuNet, and CenterFace can struggle with large images. In my design, not adding a border before detection helps, so these models tend to perform better.
  • Detection results mainly depend on the threshold. If we want to make it easier for users, we could run a proper benchmark to suggest optimal thresholds. Otherwise, users would need to find an appropriate threshold themselves for their specific models (as I'm doing).

P/S: Merry Christmas! Hope you enjoy the holiday season. 🎄

@serengil
Owner

Of course, take your time. I hope you understand my concern: while improving runtime performance, I don't want to decrease detection performance. An enhancement should offer the same accuracy or better. Here are my comments:

  • Each model is pretrained on its own dataset, so there are always some constraints on what it can detect -> This is independent of the model, because with the current design retinaface and mtcnn detect more faces. So your change caused this.
  • Lightweight detectors like SSD, YuNet, and CenterFace can struggle with large images. In my design, not adding a border before detection helps, so these models tend to perform better -> right, ssd and centerface outperform the existing design.
  • Detection results mainly depend on the threshold -> again, we used the same threshold for retinaface and mtcnn, but the new design misses some faces. So that must be related to the detection logic you proposed.

@huulockt force-pushed the enhance-aligment-performance branch from 5a0eea1 to 421ef9e on December 25, 2024 at 22:17
@huulockt
Author

Actually, the detection results in my design are the same as the current design when the align flag is turned off. Here’s the first solution I came up with: We can keep the current border-adding step before detection, but combine it with my proposal to only apply alignment to the face region. I implemented this in the last commit. However, when I tested it myself, I couldn’t figure out why mtcnn still returns 5 faces in both my design and the current design. Could you please run the test code again with mtcnn?

Moreover, the above solution doesn’t fully preserve the detection improvements observed with models like ssd, centerface, and yunet(*). I propose adding these models to a skip-border-addition list, and the code would look like this:
if align is True and model_name not in skip_list:

Additionally, for further clarification: in my design, yolov8 can detect all 7 faces, but faces near the border return outer-eye coordinates of (0, 0), which affects alignment. The current design didn't have this problem because of the border, so please let me know if you want to add yolov8 to the skip-list too.

What do you think of these ideas?

(*) For yunet, with a threshold of 0.8, the current design detects 4 faces, while my design detects 6. Perhaps we could consider lowering the threshold for better results.
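The skip-list idea above could look roughly like the following. This is a hypothetical sketch, not deepface API: the names `SKIP_BORDER_MODELS`, `pad_with_border`, and `prepare_detector_input`, and the default border size, are all assumptions:

```python
import numpy as np

# Lightweight detectors that (per the discussion above) do better without padding
SKIP_BORDER_MODELS = {"ssd", "centerface", "yunet"}

def pad_with_border(img, border):
    # Constant-pad all four sides (stand-in for the existing border step)
    return np.pad(img, ((border, border), (border, border), (0, 0)),
                  mode="constant")

def prepare_detector_input(img, model_name, align, border=50):
    """Return (detector_input, offset); offset maps coords back to the original."""
    if align and model_name not in SKIP_BORDER_MODELS:
        return pad_with_border(img, border), border
    return img, 0
```

With this shape, detectors in the skip-list see the raw image, everything else keeps the border, and the returned offset lets detection coordinates be translated back.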

@serengil
Owner

serengil commented Dec 25, 2024

So, we are still adding borders to image. Why should we merge this PR then? I cannot see any reasonable improvements.

@huulockt
Author

The main improvement in my design lies in how the alignment input is chosen, regardless of whether borders are added or not. Currently, the entire image is used as the input for alignment, and this process is repeated n times, once for each detected face.

In my design, only the facial area is used as the input for each alignment operation. Since the entire image is significantly larger than the facial area, this optimization saves a considerable amount of processing time—especially when multiple faces are present in the image.
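A back-of-the-envelope comparison illustrates the saving. The image and face sizes below are made-up numbers, and pixel count is only a rough proxy for the cost of a warp:

```python
# Hypothetical sizes: a 4000x3000 photo containing 7 faces of ~200x200 pixels
H, W = 3000, 4000          # full image
h, w = 200, 200            # typical detected face crop
n_faces = 7

full_image_pixels = n_faces * H * W   # current design: warp the whole image per face
face_crop_pixels = n_faces * h * w    # proposed: warp only each face crop

print(full_image_pixels // face_crop_pixels)  # -> 300
```

Under these assumed sizes, the crop-only approach touches about 300 times fewer pixels per alignment pass, and the gap grows with image size and face count.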

I hope this explanation clarifies the benefits of the proposed changes.
