Francesco-Ranieri/music-genre-classification


Intro - Project Idea

Music classification aims to understand the semantics of music through a variety of features. In this project we propose a novel ensemble model for the music genre classification task, which classifies music by genre. The final model combines the predictions of multiple models: a Random Forest and a Convolutional Neural Network. It reaches an accuracy of 87% on the test set.
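
The text above does not say how the two models' predictions are combined. As a minimal sketch, assuming weighted soft voting (averaging the per-genre probabilities of the two models), the combination step could look like the following. The function name `soft_vote` and the `rf_weight` parameter are hypothetical, and plain lists stand in for the arrays a real pipeline would use.

```python
from typing import List, Sequence

# The ten genre folders listed in the raw dataset.
GENRES: List[str] = [
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
]


def soft_vote(
    rf_probs: Sequence[float],
    cnn_probs: Sequence[float],
    rf_weight: float = 0.5,
) -> str:
    """Blend two per-genre probability vectors and return the top genre.

    Assumption: weighted averaging is only one possible ensemble rule;
    the project may combine its models differently.
    """
    if len(rf_probs) != len(GENRES) or len(cnn_probs) != len(GENRES):
        raise ValueError("expected one probability per genre")
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must be between 0 and 1")

    combined = [
        rf_weight * rf + (1.0 - rf_weight) * cnn
        for rf, cnn in zip(rf_probs, cnn_probs)
    ]
    best_index = max(range(len(GENRES)), key=combined.__getitem__)
    return GENRES[best_index]
```

With equal weights, a confident prediction from one model can outvote a weak prediction from the other, which is the usual motivation for soft voting over a simple majority vote.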

0. Project Structure

This project uses the Cookiecutter 🍪 template for the project structure and the Conventional Commits specification to add human- and machine-readable meaning to commit messages.
It is composed of 3 components:

  • Web app module
  • Observability module:
    • Grafana dashboard created with:
      • Prometheus
      • Tempo
      • Loki
  • PyPI package for song feature extraction
Detailed project tree structure
📦music-genre-classification
 ┣ 📂.dvc
 ┣ 📂.github
 ┃ ┗ 📂workflows                                 : project pipelines
 ┃ ┃ ┣ 📜aws_deploy_api.yml                      : backend app aws deploy
 ┃ ┃ ┣ 📜aws_deploy_app.yml                      : frontend app aws deploy
 ┃ ┃ ┣ 📜linter.yml                              : code checks and tests
 ┃ ┃ ┗ 📜release_to_pypi.yml                     : pypi package release
 ┃ ┣ 📜.gitignore
 ┣ 📂data                                        : Hosted Dataset
 ┃ ┣ 📂processed                                 : PROCESSED DATA - DVC hosted
 ┃ ┃
 ┃ ┃ ┣ 📂gtzan_data                              : 1st dataset
 ┃ ┃ ┃ ┣ 📜x_test.pkl                            : test dataset features
 ┃ ┃ ┃ ┣ 📜x_train.pkl                           : train dataset features
 ┃ ┃ ┃ ┣ 📜x_train_split.pkl                     : train subset dataset features
 ┃ ┃ ┃ ┣ 📜x_validation.pkl                      : validation subset dataset features
 ┃ ┃ ┃ ┣ 📜y_test.pkl                            : test dataset labels
 ┃ ┃ ┃ ┣ 📜y_train.pkl                           : train dataset labels
 ┃ ┃ ┃ ┣ 📜y_train_split.pkl                     : train subset dataset labels
 ┃ ┃ ┃ ┗ 📜y_validation.pkl                      : validation subset dataset labels
 ┃ ┃ ┃
 ┃ ┃ ┗ 📂mfcc_data                               : 2nd dataset
 ┃ ┃ ┃ ┣ 📜x_test.pkl                            : ...
 ┃ ┃ ┃ ┣ 📜x_train.pkl                           : ...
 ┃ ┃ ┃ ┣ 📜x_train_split.pkl                     : ...
 ┃ ┃ ┃ ┣ 📜x_validation.pkl                      : ...
 ┃ ┃ ┃ ┣ 📜y_test.pkl                            : ...
 ┃ ┃ ┃ ┣ 📜y_train.pkl                           : ...
 ┃ ┃ ┃ ┣ 📜y_train_split.pkl                     : ...
 ┃ ┃ ┃ ┗ 📜y_validation.pkl                      : ...
 ┃ ┃
 ┃ ┗ 📂raw                                       : RAW DATA - Google Drive hosted
 ┃ ┃ ┗ 📂dataset                                 : 1000 songs, 10 genres
 ┃ ┃ ┃ ┣ 📂genres_original                       : Original .wav songs
 ┃ ┃ ┃ ┃ ┣ 📂blues                               : 100 blues songs
 ┃ ┃ ┃ ┃ ┣ 📂classical                           : 100 classical songs
 ┃ ┃ ┃ ┃ ┣ 📂country                             : 100 country songs
 ┃ ┃ ┃ ┃ ┣ 📂disco                               : ...
 ┃ ┃ ┃ ┃ ┣ 📂hiphop                              : ...
 ┃ ┃ ┃ ┃ ┣ 📂jazz                                : ...
 ┃ ┃ ┃ ┃ ┣ 📂metal                               : ...
 ┃ ┃ ┃ ┃ ┣ 📂pop                                 : ...
 ┃ ┃ ┃ ┃ ┣ 📂reggae                              : ...
 ┃ ┃ ┃ ┃ ┗ 📂rock                                : 100 rock songs
 ┃ ┃ ┃ ┗ 📜features_3_sec.csv                    : Song features
 ┣ 📂notebooks
 ┃ ┣ 📜audio_augmentation.ipynb                  : Song augmentation notebook
 ┃ ┗ 📜feat_extractor.ipynb                      : Song features extractor
 ┣ 📂observability                               : Observability module
 ┃ ┣ 📂grafana
 ┃ ┃ ┣ 📂dashboards
 ┃ ┃ ┃ ┗ 📜dashboards.json                       : Grafana dashboard implementation
 ┃ ┃ ┣ 📜dashboards.yml                          : Grafana config
 ┃ ┃ ┣ 📜data_source.yml                         : Grafana data source
 ┃ ┃ ┗ 📜grafana.ini
 ┃ ┣ 📂prometheus
 ┃ ┃ ┣ 📜alert.yml                               : Prometheus alerts
 ┃ ┃ ┗ 📜prometheus.yml                          : Prometheus config
 ┃ ┗ 📂tempo
 ┃ ┃ ┗ 📜tempo.yml                               : Tempo config
 ┣ 📂reports
 ┃ ┣ 📂figures
 ┃ ┣ 📂history                                   : Pipeline track files
 ┃ ┃ ┣ 📜gtzan_history.json
 ┃ ┃ ┗ 📜mfcc_history.json
 ┃ ┗ 📂tests                                     : Test track files
 ┃ ┃ ┣ 📜deep_checks.json                        : ─┒
 ┃ ┃ ┣ 📜deep_gtzan_checks.html                  :  ┣──> Deepchecks report files
 ┃ ┃ ┗ 📜deep_mfcc_checks.html                   : ─┛
 ┣ 📂src
 ┃ ┣ 📂api                                       : App BE folder
 ┃ ┃ ┣ 📂entities                                : Api models
 ┃ ┃ ┃ ┣ 📜model_allowed_enum.py
 ┃ ┃ ┃ ┣ 📜predict_model_request.py
 ┃ ┃ ┣ 📜api_rest.py                             : Api controller
 ┃ ┃ ┣ 📜music_prediction.py                     : Api services
 ┃ ┃
 ┃ ┣ 📂app                                       : App FE folder
 ┃ ┃ ┣ 📜gradio_app.py                           : App main
 ┃ ┣ 📂data                                      : Data modeling
 ┃ ┃ ┣ 📜data_utils.py
 ┃ ┃ ┣ 📜make_dataset.py
 ┃ ┣ 📂feat_extractor                            : PyPI package used in APP
 ┃ ┃ ┣ 📜feat_extractor.py
 ┃ ┣ 📂models
 ┃ ┃ ┣ 📂classes
 ┃ ┃ ┃ ┣ 📜base_model.py                         : Common Model
 ┃ ┃ ┃ ┣ 📜gtzan_model.py
 ┃ ┃ ┃ ┣ 📜mfcc_model.py
 ┃ ┃ ┣ 📜evaluation.py                           : Model evaluation utils
 ┃ ┃ ┣ 📜model_utils.py                          : Model creation utils
 ┃ ┃ ┣ 📜predict_model.py                        : Pipeline script for testing
 ┃ ┃ ┣ 📜train_model.py                          : Pipeline script for training
 ┃ ┣ 📂visualization
 ┃ ┃ ┣ 📜visualize.py                            : Song feature visualization
 ┃ ┣ 📜pathUtils.py                              : Relative project paths
 ┃ ┣ 📜setup.py
 ┣ 📂tests
 ┃ ┣ 📂api_tests
 ┃ ┃ ┗ 📜test_api.py                             : Unit tests - API
 ┃ ┣ 📂dataset_tests
 ┃ ┃ ┣ 📜test_dataset_integrity.py               : Integrity tests - DATASET
 ┃ ┃ ┗ 📜test_dataset_util.py                    : Unit tests - DATASET
 ┃ ┣ 📂feat_extractor_tests
 ┃ ┃ ┗ 📜test_feat_extractor.py                  : Unit tests - PyPI package
 ┃ ┣ 📂models_tests
 ┃ ┃ ┣ 📜test_behavioral_model.py                : Behavioral Tests - MODEL
 ┃ ┃ ┗ 📜test_model.py                           : Unit tests - MODEL
 ┃ ┣ 📂path_utils_tests
 ┃ ┃ ┗ 📜test_path_utils.py                      : Unit tests - PATH UTILS
 ┃ ┣ 📂resources
 ┃ ┃ ┣ 📂augmented
 ┃ ┃ ┃ ┣ 📂noise
 ┃ ┃ ┃ ┗ 📂shift_time
 ┃ ┃ ┗ 📜hip_hop_test.wav
 ┃ ┣ 📂test_utils
 ┃ ┃ ┣ 📜mock_dataset.py
 ┃ ┃ ┣ 📜utils.py
 ┣ 📜docker-compose.yml                          : docker compose for BE/FE/Observability
 ┣ 📜Dockerfile-be                               : BE docker file
 ┣ 📜Dockerfile-fe                               : FE docker file
 ┣ 📜dvc.yaml                                    : DVC pipeline file
 ┣ 📜params.yaml                                 : DVC pipeline params
 ┣ 📜requirements.txt
 ┣ 📜requirements_be.txt
 ┣ 📜requirements_fe.txt
 ┣ 📜setup.py                                    : Src folder installation

1. Inception

Model card

The music genre classifier is an ensemble model that combines a Random Forest model and a Convolutional Neural Network.

Dataset card

The models described above use the following datasets respectively:

2. Reproducibility

DagsHub

DagsHub is a GitHub-inspired platform, built specifically for data science projects, that lets you host, version, and manage code, data, models, and experiments. DagsHub is free to use for open-source projects.
Music Genre Classification - DagsHub Repository

DVC

DVC is a tool, built on top of Git, for versioning data and tracking data science experiments. In this project, the contents of the data folder are stored and tracked with DVC. The remote storage is the one offered by DagsHub.

PyPI

The Python Package Index (PyPI) is a repository of software for the Python programming language.
PyPI helps you find and install software developed and shared by the Python community. To separate the models module from the GUI app, the shared feature-extraction logic is exported as a PyPI package. This choice not only provides a logical separation but also allows the project to be divided into 3 sub-projects:

  • one for the model module
  • one for the app
  • one for the pypi package

Docker and Compose

Docker is a software platform that allows you to build, test, and deploy applications quickly. It is possible to run the entire project:

  • APP
  • API
  • GRAFANA
    • PROMETHEUS
    • LOKI
    • TEMPO

just by running the command

    docker compose up

or, to rebuild the images before starting them, by adding the --build parameter

    docker compose up --build

Pipelines

DVC not only versions data but also lets you create fully reproducible pipelines. Pipelines are defined using the CLI or by manually editing the dvc.yaml file.
A pipeline of 5 steps has been defined:

  • prepare: download the dataset from a Google Drive source if it does not exist locally, load the GTZAN dataset, and create the MFCC dataset
  • train gtzan: train the model on the train data of the GTZAN dataset
  • train mfcc: train the model on the train data of the MFCC dataset
  • test gtzan: test the model on the test data of the GTZAN dataset
  • test mfcc: test the model on the test data of the MFCC dataset

The pipeline can be configured using the params.yaml file. This file contains configurations for the type of the model. By setting the correct params, it is possible to choose which model should be trained or tested.

MLflow

MLflow is a tool for tracking machine learning experiments and models. It stores the metrics of each experiment, so the developer can compare different models and parameters, and it can also store models and retrieve them when needed. In this project MLflow tracks every experiment together with its parameters and metrics, which can be browsed in a convenient GUI.

3. Environment

Local .env

The .env file is not shared for security reasons, but the environment of this project contains the following variables:

  • MLFLOW_TRACKING_URI
  • MLFLOW_TRACKING_USERNAME
  • MLFLOW_TRACKING_PASSWORD
  • API_URL
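
Since the .env file is not distributed, a missing variable is an easy mistake to make on a fresh checkout. The sketch below shows one way to fail fast at startup; it assumes the variables have already been exported into the process environment, and the helper name `load_settings` is hypothetical rather than part of the project.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the list above.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "API_URL",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing the environment as an argument keeps the function easy to test with a plain dictionary instead of the real process environment.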

GitHub variables and secrets

The GitHub Actions workflows need the environment variables described above to run the different pipelines. To store them safely, the following GitHub secrets have been defined:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • MLFLOW_TRACKING_PASSWORD
  • MLFLOW_TRACKING_URI
  • MLFLOW_TRACKING_USERNAME
  • PYPI_API_TOKEN
  • PYPI_USERNAME

4. GitHub workflows

AWS - deploy API/APP 🕸

  • Trigger: src/api or src/app folder modified
  • Action: aws deploy

Linter 🐍

  • Trigger: every commit
  • Action: code checks (explained in the next section)

PyPI - release Feature Extractor 🌪

  • Trigger: new tag created
  • Action: pypi release

5. Quality assurance

Pylint

This project integrates Pylint, a static code analyser for Python that checks the quality of the source code.
The code has been rated at 8.53/10.

Flake8

This project integrates Flake8, a linter that checks PEP 8 style, pyflakes errors, and cyclomatic complexity.

Pynblint

This project integrates pynblint, which is a static code analyser for Python notebooks.

Tool for code quality

Code formatting is done with autopep8, which automatically formats Python code to conform to the PEP 8 style guide.

$ pip install --upgrade autopep8
$ cd <folder-to-format>
$ autopep8 --in-place --recursive .

Tests

Unit tests

pytest is a Python testing framework. This project uses pytest for unit testing the code.

Deep Checks

Deepchecks Open Source is a Python library for data scientists and ML engineers. The package includes extensive test suites for machine learning models and data, built in a way that's flexible, extendable and editable.
The generated reports are:

Behavioural tests

Behavioural testing is concerned with testing different capabilities of a system by validating its output, without any knowledge of its internal structure. In this project the following behavioural tests have been run:

  • Test with normal music
  • Test with augmented music:
    • add noise to the song
    • change the order of some song parts in different directions

More details in Behavioural Test Reports
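
The two augmentations can be sketched in a few lines. This is an illustrative version only: it assumes the audio is a sequence of float samples, that the noise is Gaussian, and that reordering means a circular time shift. The function names `add_noise` and `shift_time` and their parameters are hypothetical; the project's notebook may implement these steps differently.

```python
import random
from typing import List, Optional, Sequence


def add_noise(
    samples: Sequence[float],
    noise_level: float = 0.005,
    seed: Optional[int] = None,
) -> List[float]:
    """Add zero-mean Gaussian noise to every sample."""
    rng = random.Random(seed)  # a seed makes the augmentation repeatable
    return [sample + rng.gauss(0.0, noise_level) for sample in samples]


def shift_time(samples: Sequence[float], shift: int) -> List[float]:
    """Circularly shift the signal.

    A positive shift moves the audio later in time, a negative shift
    moves it earlier; samples that fall off one end wrap to the other.
    """
    length = len(samples)
    if length == 0:
        return []
    offset = shift % length
    if offset == 0:
        return list(samples)
    return list(samples[-offset:]) + list(samples[:-offset])
```

A behavioural test would then assert that the predicted genre of the augmented clip matches the prediction for the original clip.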

6. API

Both the app and api modules are deployed on AWS as serverless Fargate instances.
The API docs are available here

The REST endpoints exposed are:

  • /
    • happy path, used to test that the server is up
  • /predict/music
    • used to predict the music genre
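
The root endpoint can serve as a simple liveness probe. The sketch below uses only the standard library and assumes the server answers the root path with HTTP 200; the helper name `is_alive` and the injectable `opener` parameter are hypothetical, and the request format of /predict/music is not covered because it is not described here.

```python
import urllib.request
from typing import Callable


def is_alive(
    base_url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 5.0,
) -> bool:
    """Return True if the API root answers with HTTP 200."""
    url = base_url.rstrip("/") + "/"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts, and error statuses all land here.
        return False
```

In practice `base_url` would come from the API_URL environment variable, for example `is_alive(os.environ["API_URL"])`.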

7. APP

The app module is available here. The FE code is built with the gradio Python package, a powerful library for building machine learning front-end applications.
With this snippet:

import gradio

# `predict` is the prediction callback, defined elsewhere in the app module
demo = gradio.Interface(
    fn=predict,
    inputs=gradio.Audio(),
    outputs=gradio.Label(label='Predicted Genre'),
    allow_flagging='never',
    title='Music Genre Classification',
    description='This is a Music Genre Classification model based on a novel ensemble approach'
)

gradio generates this interface: you can drop an audio file and click predict. In a local environment it takes about 10 seconds to predict the genre, but on AWS it takes about 50 seconds because the ECS instance has little RAM (0.5 GB).

8. Monitoring

Application monitoring is the process of monitoring an application's performance, availability, and end-user experience to ensure the application is functioning properly.

Grafana

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. In this project the Grafana dashboards are built on these three components:

  • Traces with Tempo and OpenTelemetry Python SDK
  • Metrics with Prometheus and Prometheus Python Client
  • Logs with Loki


The dashboard implemented is:


It consists of 4 panels:

  • pie chart of the predicted genres
  • graph of the number of predictions
  • indicator of the highest RAM usage
  • log section

Better Uptime

Application monitoring is important not only to track an application's performance but also to identify when and where an abnormality occurred and why it happened. Better Uptime sends a notification when a server is down. In this project the free plan is used, so notifications are available only by email.
These are the monitored servers (API and APP):

Dashboard details:
In case of downtime, this is the email received:

Finally, if you are a programmer, check out these utils!