Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add audio datasets #598

Merged
merged 33 commits into from
Jul 14, 2024
Merged

Add audio datasets #598

merged 33 commits into from
Jul 14, 2024

Conversation

amanteur
Copy link
Contributor

  • [*] I've checked contribution guide.

This PR introduces support for audio datasets as a new modality, addressing part of the feature request outlined in issue #582. Below are the key additions:

  • Mock Audio Dataset: Added the get_mock_audios_dataset function, which mirrors the functionality of get_mock_images_dataset. This function provides a subset of the VoxCeleb1 dataset, commonly used for voice recognition and verification tasks.

  • Dataset Classes: Introduced new dataset classes to facilitate audio data handling: AudioBaseDataset, AudioLabeledDataset, AudioQueryGalleryDataset, AudioQueryGalleryLabeledDataset
    These classes are analogous to the existing image dataset classes, enabling similar operations for audio data.

  • Audio Visualization: Implemented functions to visualize audio data both as standalone images and within HTML code, including embedded audio playback.

@AlekseySh AlekseySh linked an issue Jun 17, 2024 that may be closed by this pull request
oml/datasets/audios.py Outdated Show resolved Hide resolved
Copy link
Contributor

@AlekseySh AlekseySh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, and thank you for your great work! I like how similar to other modalities these datasets look like :)

ci/requirements_optional.txt Outdated Show resolved Hide resolved
oml/const.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Show resolved Hide resolved
oml/utils/audios.py Outdated Show resolved Hide resolved
oml/utils/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
tests/test_oml/test_datasets/test_datasets.py Show resolved Hide resolved
oml/const.py Outdated Show resolved Hide resolved
@AlekseySh
Copy link
Contributor

@amanteur please make any non meaningful PR so CI triggers on your comments automatically

Copy link
Contributor

@AlekseySh AlekseySh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the comments!

Sometimes you use "Args", but sometimes "Params". Please, follow the existing library's code style. I think CI/CD should catch it when activated on your comments.

oml/const.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Show resolved Hide resolved
oml/datasets/audios.py Show resolved Hide resolved
oml/datasets/audios.py Show resolved Hide resolved
oml/utils/audios.py Outdated Show resolved Hide resolved
@AlekseySh
Copy link
Contributor

@amanteur Hey, if you have any problems with making CI green just let me know :)

oml/const.py Outdated Show resolved Hide resolved
@AlekseySh
Copy link
Contributor

AlekseySh commented Jun 27, 2024

@amanteur As we discussed on call, let's move optional imports into the functions bodies (it will help in the case when we have more than 1 optional library for the current modality, for example, when we add a library for audio augs)

@amanteur
Copy link
Contributor Author

i guess thats it

Copy link
Contributor

@AlekseySh AlekseySh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the previous comments!

A few notes about dataframe:

  1. Please, add the description of the start_time column here: https://github.com/OML-Team/open-metric-learning/blob/main/docs/source/oml/data.rst

  2. I think it may be confusing that all examples if df.csv within the range of (0, 1). So, before checking the docstring I thought that the measure is a fraction of the audio file. Can we replace a few values with ones greater than 1.0?

  3. What if I don't have start_time for some of the records? I suggest to leave some of the records empty in df_with_start_time.csv. And add fillna(0.0) in your parse_start_time() function. We have similar thing for bboxes: some of the images may not have bboxes.

oml/const.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
dataset = AudioBaseDataset(df[PATHS_COLUMN].tolist(), num_channels=num_channels)
for item in dataset:
audio = item[dataset.input_tensors_key]
assert audio.shape[0] <= num_channels, f"Audio channels {audio.shape[0]} exceed specified {num_channels}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why <= but not a != ?

Copy link
Contributor Author

@amanteur amanteur Jul 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this was dumb from me to use num_channels, because there is usually (99%) only two or one channel, and users mostly downmix them into mono audios or leave the same (very rare), so I changed it from num_channels to is_mono.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, let's keep this approach

But is_mono: bool = DEFAULT_IS_MONO sounds like a flag indicating if an audio is in mono format or not.
I suggest convert_to_mono: bool = CONVERT_TO_MONO_DEFAULT which is better semantically from my point of view.

tests/test_oml/test_datasets/test_audio.py Outdated Show resolved Hide resolved
tests/test_oml/test_datasets/test_audio.py Outdated Show resolved Hide resolved
oml/utils/download_mock_dataset.py Outdated Show resolved Hide resolved
oml/utils/download_mock_dataset.py Outdated Show resolved Hide resolved
Copy link
Contributor

@AlekseySh AlekseySh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, almost done :)

oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/utils/audios.py Outdated Show resolved Hide resolved
oml/utils/audios.py Outdated Show resolved Hide resolved
oml/utils/download_mock_dataset.py Outdated Show resolved Hide resolved
oml/utils/download_mock_dataset.py Outdated Show resolved Hide resolved
oml/utils/download_mock_dataset.py Outdated Show resolved Hide resolved
tests/test_oml/test_datasets/test_datasets.py Show resolved Hide resolved
oml/datasets/audios.py Outdated Show resolved Hide resolved
oml/const.py Outdated Show resolved Hide resolved
@AlekseySh AlekseySh merged commit baad3e6 into OML-Team:main Jul 14, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Support audio
3 participants