Hello,
I'm trying to reproduce the fine-tuning result on FSD50K.
I've tried multiple checkpoints but am not able to reach the 0.649 mAP in Table 4 of the paper.
Here are the results I've been able to attain:
Checkpoint music_audioset_epoch_15_esc_90.14.pt Fine-Tuned mAP: 0.499
Checkpoint music_speech_audioset_epoch_15_esc_89.98.pt Fine-Tuned mAP: 0.503
I've also tried the latest checkpoints that use the HTSAT-tiny audio model, with similar results.
Here is my setup, as per the finetune-fsd50k.sh script:
python -m evaluate.eval_linear_probe \
    --save-frequency 50 \
    --save-top-performance 3 \
    --save-most-recent \
    --dataset-type="webdataset" \
    --precision="fp32" \
    --warmup 0 \
    --batch-size=40 \
    --lr=1e-4 \
    --wd=0.1 \
    --epochs=100 \
    --workers=8 \
    --use-bn-sync \
    --freeze-text \
    --amodel HTSAT-base \
    --tmodel roberta \
    --report-to wandb \
    --wandb-notes "10.14-finetune-fsd50k" \
    --datasetnames "FSD50K_webdataset" \
    --datasetinfos train \
    --seed 3407 \
    --datasetpath /home/ubuntu/datasets/processed \
    --logs /home/ubuntu/CLAP/clap_logs \
    --gather-with-grad \
    --lp-loss="bce" \
    --lp-metrics="map" \
    --lp-lr=1e-4 \
    --lp-mlp \
    --class-label-path="/home/ubuntu/CLAP/class_labels/FSD50k_class_labels_indices.json" \
    --openai-model-cache-dir /home/ubuntu/CLAP/.cache \
    --pretrained="/home/ubuntu/CLAP/pretrained" \
    --data-filling "repeatpad" \
    --data-truncating "rand_trunc" \
    --optimizer "adam"
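To rule out a difference in how the metric itself is computed, here is a minimal sketch of macro-averaged mAP for multi-label predictions, which is how I understand the `map` metric. This is an assumption on my part, not the repository's implementation: the function names `average_precision` and `mean_average_precision` are hypothetical, and tied scores are ranked in input order, which may differ slightly from library implementations.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AP for one class: mean of precision@k taken at each positive item,
    with items ranked by descending score (ties keep input order)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # AP is undefined when the class has no positives.
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(
    y_true: List[Sequence[int]], y_score: List[Sequence[float]]
) -> float:
    """Macro mAP: unweighted mean of per-class AP, skipping classes
    that have no positive examples (an assumption of this sketch)."""
    n_classes = len(y_true[0])
    aps = []
    for c in range(n_classes):
        col_true = [row[c] for row in y_true]
        col_score = [row[c] for row in y_score]
        if sum(col_true) > 0:
            aps.append(average_precision(col_true, col_score))
    return sum(aps) / len(aps) if aps else float("nan")


# Toy example: 4 clips, 2 classes (rows are clips, columns are classes).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_map = mean_average_precision(labels, scores)
```

Running the same predictions through a function like this and through the evaluation script would show whether the gap comes from the metric or from training.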