Merge pull request #203 from voxel51/feature/leaky-splits
Feature/leaky splits
jacobsela authored Nov 25, 2024
2 parents cbb806b + 9c51d46 commit 7b0259c
Showing 2 changed files with 530 additions and 0 deletions.
104 changes: 104 additions & 0 deletions fiftyone/brain/__init__.py
Expand Up @@ -708,3 +708,107 @@ def compute_exact_duplicates(
    return fbd.compute_exact_duplicates(
        samples, num_workers, skip_failures, progress
    )


def compute_leaky_splits(
    samples,
    brain_key=None,
    split_views=None,
    split_field=None,
    split_tags=None,
    threshold=0.2,
    similarity_brain_key=None,
    embeddings_field=None,
    model=None,
    model_kwargs=None,
    similarity_backend=None,
    similarity_config_dict=None,
    **kwargs,
):
    """Uses an existing similarity index, or creates one on the fly, to find
    leaks.

    Calling this method only creates the index. You can then call the methods
    exposed on the returned object to perform the following operations:

    -   :meth:`leaks_view <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.leaks_view>`:
        Returns a view of all leaks in the dataset

    -   :meth:`no_leaks_view <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.no_leaks_view>`:
        Returns a subset of the given view without any leaks

    -   :meth:`leaks_for_sample <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.leaks_for_sample>`:
        Returns a view with the leaks corresponding to the given sample

    -   :meth:`tag_leaks <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.tag_leaks>`:
        Tags the leaks in the dataset as leaks

    Args:
        samples: a :class:`fiftyone.core.collections.SampleCollection`. This
            should be the union of the splits provided
        brain_key (None): a brain key under which to store the results of this
            method. If no brain key is provided, the results will not be saved
        split_views (None): a dict of :class:`fiftyone.core.view.DatasetView`
            instances corresponding to the different splits in the dataset.
            Only one of ``split_views``, ``split_field``, and ``split_tags``
            needs to be provided
        split_field (None): the name of a field that holds the split of each
            sample. Each unique value in the field will be treated as a
            split. Only one of ``split_views``, ``split_field``, and
            ``split_tags`` needs to be provided
        split_tags (None): a list of tags corresponding to the different
            splits. Only one of ``split_views``, ``split_field``, and
            ``split_tags`` needs to be provided. The splits should be disjoint
            and their union should equal ``samples``
        threshold (0.2): the threshold to run the algorithm with. Values
            between 0.1 and 0.25 tend to give good results
        similarity_brain_key (None): a brain key for the similarity index.
            If the brain key already exists, the similarity index stored
            under it will be loaded. If the brain key does not exist, a new
            similarity index will be created and saved under this name. The
            similarity index should have been computed on at least the given
            ``samples``; this method may fail if that condition is not met
        embeddings_field (None): the field of embeddings to feed the index.
            This argument's behavior depends on whether a ``model`` is
            provided, as described below.

            If no ``model`` is provided, this argument specifies the field of
            pre-computed embeddings to use.

            If a ``model`` is provided, this argument specifies where to store
            the model's embeddings
        model (None): a :class:`fiftyone.core.models.Model` or the name of a
            model from the
            `FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/index.html>`_
            to use, or that was already used, to generate embeddings. The model
            must expose embeddings (``model.has_embeddings = True``)
        model_kwargs (None): a dictionary of optional keyword arguments to pass
            to the model's ``Config`` when a model name is provided
        similarity_backend (None): the name of the similarity backend to use.
            The supported values are
            ``fiftyone.brain.brain_config.similarity_backends.keys()`` and the
            default is
            ``fiftyone.brain.brain_config.default_similarity_backend``
        similarity_config_dict (None): a dict used to build the similarity
            backend. Arguments passed directly to this method (e.g.
            ``model``) take precedence over the values in the dict

    Returns:
        a :class:`fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex`,
        a :class:`fiftyone.core.view.DatasetView`
    """

    from fiftyone.brain.internal.core.leaky_splits import compute_leaky_splits

    return compute_leaky_splits(
        samples,
        brain_key=brain_key,
        split_views=split_views,
        split_field=split_field,
        split_tags=split_tags,
        threshold=threshold,
        similarity_brain_key=similarity_brain_key,
        embeddings_field=embeddings_field,
        model=model,
        model_kwargs=model_kwargs,
        similarity_backend=similarity_backend,
        similarity_config_dict=similarity_config_dict,
        **kwargs,
    )
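
To illustrate what the ``threshold`` argument controls, here is a toy sketch of threshold-based leak detection across splits. It is not the code added in this commit: it assumes cosine distance as the metric (the docstring does not name one), replaces the similarity index with a brute-force pairwise scan, and uses plain dicts in place of FiftyOne samples. The names ``cosine_distance`` and ``find_leaks`` are hypothetical helpers, not part of ``fiftyone.brain``.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def find_leaks(embeddings, splits, threshold=0.2):
    """Return sorted IDs of samples with a near neighbor in another split.

    embeddings: dict mapping sample ID -> embedding vector
    splits: dict mapping sample ID -> split name (mirrors ``split_field``)
    threshold: maximum distance at which two samples count as a leak
    """
    leaks = set()
    for id_a, id_b in combinations(sorted(embeddings), 2):
        if splits[id_a] == splits[id_b]:
            continue  # only pairs that cross a split boundary can leak
        if cosine_distance(embeddings[id_a], embeddings[id_b]) <= threshold:
            leaks.update((id_a, id_b))
    return sorted(leaks)


# Hypothetical toy data: "a" (train) and "b" (test) are near-duplicates
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.99, 0.05],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
splits = {"a": "train", "b": "test", "c": "train", "d": "test"}

leaks = find_leaks(embeddings, splits, threshold=0.2)
print(leaks)  # ['a', 'b']
```

Raising the threshold flags more cross-split pairs as leaks, and lowering it flags fewer, so the threshold sets how similar two samples in different splits must be before they are treated as a leak.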