Compute near duplicates + ROI fields #214
f126730 to d4f80bc
"%s mixin" % fbs.DuplicatesMixin
)

similarity_index.find_duplicates(thresh=threshold)
@brimoor This is what I mean by the embeddings/models/similarity boilerplate. This function is 60 lines of code for what is effectively one line.
Yeah, but this boilerplate does serve some purposes:
- Allow compute_near_duplicates() to use a different default model than compute_similarity()
- Enforce that a pre-existing similarity_index must implement the DuplicatesMixin mixin
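To make those two purposes concrete, here is a toy sketch of the wrapper pattern. DuplicatesMixin, SimilarityIndex, DuplicatesIndex and the default model names below are hypothetical stand-ins, not FiftyOne's actual classes or defaults; only the overall shape (pick a use case-specific default, validate a pre-existing index, then delegate to find_duplicates()) mirrors the discussion.

```python
# Toy sketch of the thin-wrapper pattern discussed above.
# ALL classes and default model names here are HYPOTHETICAL stand-ins,
# not FiftyOne's real implementation.


class DuplicatesMixin:
    """Hypothetical marker: an index that supports duplicate search."""

    def find_duplicates(self, thresh=None):
        raise NotImplementedError


class SimilarityIndex:
    """Hypothetical generic similarity index without duplicate support."""

    def __init__(self, model):
        self.model = model


class DuplicatesIndex(SimilarityIndex, DuplicatesMixin):
    """Hypothetical index that can find duplicates."""

    def find_duplicates(self, thresh=None):
        # A real index would search its embeddings; we just echo the config
        return {"model": self.model, "thresh": thresh}


# Hypothetical defaults: the point is only that they can differ
_SIMILARITY_DEFAULT_MODEL = "model-a"
_DUPLICATES_DEFAULT_MODEL = "model-b"


def compute_similarity(model=None):
    return SimilarityIndex(model or _SIMILARITY_DEFAULT_MODEL)


def compute_near_duplicates(similarity_index=None, model=None, threshold=None):
    if similarity_index is None:
        # Purpose 1: use a duplicates-specific default model
        similarity_index = DuplicatesIndex(model or _DUPLICATES_DEFAULT_MODEL)
    elif not isinstance(similarity_index, DuplicatesMixin):
        # Purpose 2: reject pre-existing indexes that lack the mixin
        raise ValueError(
            "similarity_index must implement the %s mixin" % DuplicatesMixin
        )

    return similarity_index.find_duplicates(thresh=threshold)
```

Calling compute_near_duplicates(threshold=0.1) builds an index with the duplicates-specific default, while passing a plain SimilarityIndex raises a ValueError before any search happens.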
@@ -30,18 +30,21 @@ def compute_leaky_splits(
samples,
splits,
threshold=None,
roi_field=None,
I like this for copy-paste augments and other strange edge cases.
_DEFAULT_MODEL = "mobilenet-v2-imagenet-torch"
_DEFAULT_BATCH_SIZE = None


def compute_similarity(
samples,
patches_field,
roi_field,
What's the difference between roi_field and patches_field from a functional standpoint?
The patches_field argument throughout the Brain instructs the relevant methods to operate on object patches rather than samples. So, for example, compute_similarity(patches_field=) says to generate a similarity index keyed by label ID rather than sample ID.

The roi_field argument means that you're doing something at the sample level, but you want to use a specific ROI in the image rather than the full image to do the analysis. The most common use case here would be if you have a single Detection per image. If you have Detections (multiple objects per image), then roi_field aggregates the per-object embeddings into a single embedding (currently by averaging them) and uses that vector to represent the sample.
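The averaging step described above can be sketched in a few lines. The function name and the list-of-lists layout are hypothetical; real per-object embeddings would come from a model, and this only illustrates element-wise mean aggregation into one per-sample vector.

```python
# Sketch: collapse several per-object embeddings into one per-sample vector
# by element-wise averaging, as the comment above says roi_field currently
# does. The function name and data layout are HYPOTHETICAL.


def aggregate_roi_embeddings(object_embeddings):
    """Average a list of equal-length vectors into a single vector."""
    if not object_embeddings:
        raise ValueError("need at least one object embedding")
    dim = len(object_embeddings[0])
    n = len(object_embeddings)
    return [sum(vec[i] for vec in object_embeddings) / n for i in range(dim)]


# Hypothetical per-object embeddings for one sample with two detections
sample_objects = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(aggregate_roi_embeddings(sample_objects))  # [2.0, 1.0, 1.0]
```

With a single Detection per image the average is just that object's embedding, which is why the single-object case is the most natural fit for roi_field.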
So in that case, passing a single patch per sample to patches_field is equivalent to roi_field? It feels strange to have them as two different arguments.
No, they're not quite equivalent. In that case (one patch per sample), the embeddings in the index will be the same. But if you use roi_field, then the primary key for the index will be sample ID, while if you use patches_field, the primary key will be label ID.

And of course, if there are in fact multiple object patches per sample, then roi_field vs patches_field also differs in that roi_field will have # samples vectors in the index, while patches_field will have # objects vectors.
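The key and count differences can be illustrated with two plain dicts standing in for an index. The dict-based "index", the sample/label IDs and the field names in the data are all hypothetical; the sketch only mirrors the rules stated in the reply: roi_field yields one averaged vector per sample keyed by sample ID, while patches_field yields one vector per object keyed by label ID.

```python
# Sketch contrasting how the two arguments key an index. The dict "index",
# IDs and data layout are HYPOTHETICAL; real indexes live in the Brain.


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_roi_index(samples):
    # roi_field-style: one averaged vector per sample -> # samples vectors
    return {s["id"]: mean([o["embedding"] for o in s["objects"]]) for s in samples}


def build_patches_index(samples):
    # patches_field-style: one vector per object -> # objects vectors
    return {o["label_id"]: o["embedding"] for s in samples for o in s["objects"]}


samples = [
    {"id": "sample1", "objects": [{"label_id": "label1", "embedding": [1.0, 1.0]}]},
    {
        "id": "sample2",
        "objects": [
            {"label_id": "label2", "embedding": [0.0, 2.0]},
            {"label_id": "label3", "embedding": [2.0, 0.0]},
        ],
    },
]

roi_index = build_roi_index(samples)
patches_index = build_patches_index(samples)
print(sorted(roi_index))  # ['sample1', 'sample2']
print(sorted(patches_index))  # ['label1', 'label2', 'label3']
```

For sample1 (one object) both indexes hold the same vector [1.0, 1.0], just under different keys; sample2 contributes one vector to the ROI index but two to the patches index.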
Change log
- Added a fob.compute_near_duplicates() method that provides a use case-centric interface to the existing DuplicatesMixin.find_duplicates() method
- Added roi_field arguments to compute_similarity() and compute_leaky_splits() for consistency with compute_uniqueness() and compute_representativeness()
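For readers unfamiliar with thresholded duplicate search, here is a toy illustration of what a threshold like find_duplicates(thresh=threshold) controls. It assumes Euclidean distance and a greedy "keep the first item, flag later items within thresh of a kept one" policy; both are assumptions for illustration, and this is not FiftyOne's find_duplicates() implementation.

```python
# Toy illustration of thresholded near-duplicate detection.
# ASSUMPTIONS: Euclidean distance and a greedy keep-first policy.
# This is NOT FiftyOne's find_duplicates() implementation.
import math


def find_near_duplicates(embeddings, thresh):
    """Return {duplicate_id: kept_id} for items within thresh of a kept item."""
    kept = []  # (id, vector) pairs retained as representatives
    duplicates = {}
    for item_id, vec in embeddings.items():
        # First kept representative within thresh, if any
        match = next(
            (kid for kid, kvec in kept if math.dist(vec, kvec) <= thresh),
            None,
        )
        if match is None:
            kept.append((item_id, vec))
        else:
            duplicates[item_id] = match
    return duplicates


embeddings = {
    "a": [0.0, 0.0],
    "b": [0.05, 0.0],  # within 0.1 of "a"
    "c": [1.0, 1.0],
}
print(find_near_duplicates(embeddings, thresh=0.1))  # {'b': 'a'}
```

Raising the threshold flags more items as duplicates; with thresh=2.0 both "b" and "c" would be attributed to "a".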
Example near duplicates usage
Example roi_field usage
Setup
Similarity w/ ROI
Near duplicates w/ ROI
Leaky splits w/ ROI