-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add audio datasets #598
Add audio datasets #598
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, and thank you for your great work! I like how similar to other modalities these datasets look like :)
@amanteur please make any non meaningful PR so CI triggers on your comments automatically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing the comments!
Sometimes you use "Args", but sometimes "Params". Please, follow the existing library's code style. I think CI/CD should catch it when activated on your comments.
@amanteur Hey, if you have any problems with making CI green just let me know :) |
@amanteur As we discussed on call, let's move optional imports into the functions bodies (it will help in the case when we have more than 1 optional library for the current modality, for example, when we add a library for audio augs) |
# Conflicts: # tests/test_oml/test_datasets/test_datasets.py
i guess thats it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing the previous comments!
A few notes about dataframe:
-
Please, add the description of the start_time column here: https://github.com/OML-Team/open-metric-learning/blob/main/docs/source/oml/data.rst
-
I think it may be confusing that all examples if df.csv within the range of (0, 1). So, before checking the docstring I thought that the measure is a fraction of the audio file. Can we replace a few values with ones greater than 1.0?
-
What if I don't have start_time for some of the records? I suggest to leave some of the records empty in
df_with_start_time.csv
. And addfillna(0.0)
in yourparse_start_time()
function. We have similar thing for bboxes: some of the images may not have bboxes.
dataset = AudioBaseDataset(df[PATHS_COLUMN].tolist(), num_channels=num_channels) | ||
for item in dataset: | ||
audio = item[dataset.input_tensors_key] | ||
assert audio.shape[0] <= num_channels, f"Audio channels {audio.shape[0]} exceed specified {num_channels}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why <=
but not a !=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this was dumb from me to use num_channels
, because there is usually (99%) only two or one channel, and users mostly downmix them into mono audios or leave the same (very rare), so I changed it from num_channels
to is_mono
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, let's keep this approach
But is_mono: bool = DEFAULT_IS_MONO
sounds like a flag indicating if an audio is in mono format or not.
I suggest convert_to_mono: bool = CONVERT_TO_MONO_DEFAULT
which is better semantically from my point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, almost done :)
This PR introduces support for audio datasets as a new modality, addressing part of the feature request outlined in issue #582. Below are the key additions:
Mock Audio Dataset: Added the
get_mock_audios_dataset
function, which mirrors the functionality ofget_mock_images_dataset
. This function provides a subset of the VoxCeleb1 dataset, commonly used for voice recognition and verification tasks.Dataset Classes: Introduced new dataset classes to facilitate audio data handling:
AudioBaseDataset, AudioLabeledDataset, AudioQueryGalleryDataset, AudioQueryGalleryLabeledDataset
These classes are analogous to the existing image dataset classes, enabling similar operations for audio data.
Audio Visualization: Implemented functions to visualize audio data both as standalone images and within HTML code, including embedded audio playback.