Add audio datasets #598

amanteur · 2024-06-17T04:30:52Z

[x] I've checked the contribution guide.

This PR introduces support for audio datasets as a new modality, addressing part of the feature request outlined in issue #582. Below are the key additions:

Mock Audio Dataset: Added the get_mock_audios_dataset function, which mirrors get_mock_images_dataset. It provides a subset of the VoxCeleb1 dataset, which is commonly used for speaker recognition and verification tasks.
Dataset Classes: Introduced new dataset classes for handling audio data: AudioBaseDataset, AudioLabeledDataset, AudioQueryGalleryDataset, and AudioQueryGalleryLabeledDataset. These classes are analogous to the existing image dataset classes and enable similar operations on audio data.
Audio Visualization: Implemented functions to visualize audio data both as standalone images and within HTML code, including embedded audio playback.
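
For readers unfamiliar with embedded audio playback in HTML, here is a minimal stdlib-only sketch of the general idea. It is not the code from this PR: the helper names sine_wav_bytes and audio_to_html are hypothetical, and the base64 data URI is just one common way to make an HTML snippet self-contained.

```python
import base64
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz: float = 440.0, duration_s: float = 0.1, sample_rate: int = 8000) -> bytes:
    """Synthesize a short 16-bit mono sine tone and return it as WAV bytes."""
    n_samples = int(duration_s * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_samples)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


def audio_to_html(wav_bytes: bytes) -> str:
    """Embed WAV bytes in an <audio> tag via a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return f'<audio controls src="data:audio/wav;base64,{encoded}"></audio>'


html = audio_to_html(sine_wav_bytes())
```

The resulting string can be pasted into any HTML page or rendered in a notebook; the browser decodes the data URI and shows a player without needing a separate audio file.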

oml/datasets/audios.py

AlekseySh

Hey, and thank you for your great work! I like how similar these datasets look to the ones for other modalities :)

ci/requirements_optional.txt

oml/const.py

oml/datasets/audios.py

oml/utils/audios.py

oml/datasets/audios.py

tests/test_oml/test_datasets/test_datasets.py

oml/const.py

AlekseySh · 2024-06-18T06:30:42Z

@amanteur please make any trivial PR so that CI triggers on your commits automatically

# Conflicts: # ci/requirements_optional.txt

AlekseySh

Thank you for addressing the comments!

Sometimes you use "Args" and sometimes "Params". Please follow the library's existing code style. I think CI/CD should catch this once it is activated on your commits.

oml/const.py

oml/datasets/audios.py

oml/utils/audios.py

tests/test_oml/test_datasets/test_datasets.py

tests/test_oml/test_datasets/test_audio.py

AlekseySh · 2024-06-25T15:28:49Z

@amanteur Hey, if you have any problems making CI green, just let me know :)

oml/const.py

AlekseySh · 2024-06-27T15:44:15Z

@amanteur As we discussed on the call, let's move the optional imports into the function bodies. This will help when we have more than one optional library for the current modality, for example, when we add a library for audio augmentations.
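
A minimal sketch of the pattern being requested, for reference. This is not the code from the PR: the helper require_optional, the loader load_audio, the "audio" extra name, and the choice of torchaudio as the optional backend are all assumptions made for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency at call time, with a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with the '{extra}' extra."
        ) from e


def load_audio(path: str):
    # Hypothetical loader: the optional backend is imported here, not at module level,
    # so importing this module never fails when the backend is not installed.
    torchaudio = require_optional("torchaudio", extra="audio")
    return torchaudio.load(path)
```

Because each function imports only what it needs, a second optional library (say, one for augmentations) can be missing without breaking functions that only load audio.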

# Conflicts: # tests/test_oml/test_datasets/test_datasets.py

amanteur · 2024-06-30T16:16:05Z

I guess that's it

AlekseySh

Thank you for addressing the previous comments!

A few notes about the dataframe:

Please add a description of the start_time column here: https://github.com/OML-Team/open-metric-learning/blob/main/docs/source/oml/data.rst
I think it may be confusing that all the examples in df.csv are within the range (0, 1). Before checking the docstring, I thought the value was a fraction of the audio file. Can we replace a few values with ones greater than 1.0?
What if I don't have start_time for some of the records? I suggest leaving some of the records empty in df_with_start_time.csv and adding fillna(0.0) to your parse_start_time() function. We have a similar thing for bboxes: some of the images may not have bboxes.
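
A minimal pandas sketch of the fillna(0.0) suggestion, under stated assumptions: the signature and body of parse_start_time here are hypothetical (the PR's actual function may differ), and the CSV is a toy stand-in for df_with_start_time.csv with one missing value and one value above 1.0.

```python
import io

import pandas as pd

START_TIME_COLUMN = "start_time"  # column name taken from the discussion above

# Toy stand-in for df_with_start_time.csv: one value above 1.0, one missing value.
csv_text = """path,start_time
a.wav,0.5
b.wav,
c.wav,2.75
"""


def parse_start_time(df: pd.DataFrame) -> pd.Series:
    """Return start times, treating missing values as the start of the file (0.0)."""
    if START_TIME_COLUMN not in df.columns:
        # Hypothetical fallback: no column at all means every record starts at 0.0.
        return pd.Series(0.0, index=df.index, name=START_TIME_COLUMN)
    return df[START_TIME_COLUMN].astype(float).fillna(0.0)


df = pd.read_csv(io.StringIO(csv_text))
start_times = parse_start_time(df)
print(start_times.tolist())
```

Defaulting to 0.0 mirrors the bboxes case mentioned above: a record without the optional field is still usable, it simply uses the whole file from the beginning.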

oml/const.py

oml/datasets/audios.py

AlekseySh · 2024-07-02T10:16:00Z

tests/test_oml/test_datasets/test_audio.py

+    dataset = AudioBaseDataset(df[PATHS_COLUMN].tolist(), num_channels=num_channels)
+    for item in dataset:
+        audio = item[dataset.input_tensors_key]
+        assert audio.shape[0] <= num_channels, f"Audio channels {audio.shape[0]} exceed specified {num_channels}"


why <= and not != ?

amanteur · Jul 7, 2024

Yeah, it was dumb of me to use num_channels. Audio usually (99% of cases) has only one or two channels, and users mostly downmix to mono or, very rarely, leave the channels as they are. So I changed num_channels to is_mono.

AlekseySh · Jul 8, 2024

Okay, let's keep this approach.

But is_mono: bool = DEFAULT_IS_MONO sounds like a flag indicating whether the audio is in mono format.
I suggest convert_to_mono: bool = CONVERT_TO_MONO_DEFAULT, which is semantically clearer from my point of view.
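
To illustrate why the flag reads better as an action, here is a minimal NumPy sketch of what convert_to_mono could control. It is an assumption-laden illustration, not the PR's implementation: prepare_channels is a hypothetical helper, the default value True is a guess, and averaging channels is just one common way to downmix.

```python
import numpy as np

CONVERT_TO_MONO_DEFAULT = True  # name from the suggestion above; the value is an assumption


def prepare_channels(audio: np.ndarray, convert_to_mono: bool = CONVERT_TO_MONO_DEFAULT) -> np.ndarray:
    """Downmix a (channels, samples) array to mono by averaging channels, or return it unchanged."""
    if audio.ndim != 2:
        raise ValueError(f"Expected shape (channels, samples), got {audio.shape}")
    if convert_to_mono and audio.shape[0] > 1:
        # keepdims=True preserves the (1, samples) layout expected downstream.
        return audio.mean(axis=0, keepdims=True)
    return audio


stereo = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]])
mono = prepare_channels(stereo)
assert mono.shape == (1, 3)
```

With a boolean like this, a test can assert an exact channel count (one channel whenever convert_to_mono is set) instead of an upper bound.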

tests/test_oml/test_datasets/test_audio.py

oml/utils/download_mock_dataset.py

AlekseySh

Looks good, almost done :)

oml/datasets/audios.py

oml/utils/audios.py

oml/utils/download_mock_dataset.py

tests/test_oml/test_datasets/test_datasets.py

oml/datasets/audios.py

oml/const.py

amatov-ae added 5 commits June 16, 2024 15:16

Add audio related datasets

c8d229a

Add dataset tests

4ac7bb8

Minor changes

65d18f5

Add HTML visualizer

012a639

Add audio requirements

1c1d40b

AlekseySh assigned amanteur Jun 17, 2024

AlekseySh added the new feature label Jun 17, 2024

AlekseySh linked an issue Jun 17, 2024 that may be closed by this pull request

Support audio #582

Open

Apply linters

fda813f

AlekseySh reviewed Jun 17, 2024

oml/datasets/audios.py (outdated)

AlekseySh requested changes Jun 18, 2024

amanteur added 7 commits June 23, 2024 20:01

Fix datasets and additional minor changes

f765882

Minor changes

2214bf9

Add audio tests

afefc85

Change frame_offsets to start_times

bb54e6a

Add audio requirements

d38a0f3

Merge branch 'main' into audio-processing

5578b79

# Conflicts: # ci/requirements_optional.txt

Add audio pip install tag

f410198

AlekseySh reviewed Jun 23, 2024

tests/test_oml/test_datasets/test_datasets.py (outdated)

tests/test_oml/test_datasets/test_audio.py (outdated)

amanteur added 3 commits June 24, 2024 11:41

Merge branch 'main' into audio-processing

0738f7e

Minor changes

d49cd17

Minor changes

20c29ba

amanteur added 3 commits June 27, 2024 10:59

Isolate audio datasets

74d9bca

Fix tests

72c45cc

Merge branch 'main' into audio-processing

3f6ba6a

Update README.md

ed66fec

AlekseySh reviewed Jun 27, 2024

oml/const.py (outdated)

amanteur added 4 commits June 30, 2024 20:34

Fix tests

3487190

Fix tests & change NDArray to ndarray

a17e610

Merge branch 'main' into audio-processing

78286bd

# Conflicts: # tests/test_oml/test_datasets/test_datasets.py

Update datasets

873dd94

AlekseySh requested changes Jul 2, 2024

amanteur added 5 commits July 7, 2024 23:19

Fix issues

740032e

Minor changes

343bb76

Minor changes

af16a0b

Minor changes

7083d24

Minor changes

91cbb6e

AlekseySh requested changes Jul 8, 2024

amanteur added 3 commits July 13, 2024 11:25

Code polishing

20d860a

Merge branch 'main' into audio-processing

ee7bacc

Minor changes

9dcd25b

AlekseySh requested changes Jul 14, 2024

oml/const.py (outdated)

Update mock dataset's md5

9d8f37b

AlekseySh merged commit baad3e6 into OML-Team:main Jul 14, 2024
8 checks passed
