ML-NLP-LyricsGen-Transformer

AI-lab task by Overfitter

Fine-tuning and prompting a transformer (GPT2) to build a song-lyrics generator.

Group members and contributors:

Tr33Bug gusse-dev CronJorian BFuertsch

Project structure

  1. README.md
  2. 10_DataEngineering.ipynb
    • 11_createDataset.py
  3. 20_GPT2_TrainingLoop.ipynb
    • 21_GPT2_TrainingLoop.py
    • startTraining.sh
  4. 30_ModelEvaluation.ipynb
  5. 40_Prompting.ipynb
    • 41_Prompting.py
  6. 50_GeneratorGUI.py

Main project files

10_DataEngineering.ipynb

Notebook to generate, clean, and analyze the lyrics dataset for the lyrics generator.

  1. generate: to generate the dataset, we use three lists of artists from IMDB and the lyricsgenius library to crawl the song lyrics from the genius.com API (www.genius.com).

  2. clean: clean the .txt files and delete all unnecessary characters such as , (), etc.

  3. analyze: plot graphs, merge the lists of artists, drop short songs and artists with too few songs, and count the most frequently used words.

Finally, the generated datasets are exported to df_rap.csv, df_songs.csv, and df_top.csv.
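The cleaning and analysis steps above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the function names, the regular expressions (including stripping Genius section labels such as "[Chorus]") and the length thresholds are all assumptions.

```python
import re
from collections import Counter

# Hypothetical cleaning rules: drop section labels like "[Chorus]", then remove
# every character that is not a word character, whitespace or an apostrophe
# (apostrophes are kept so contractions such as "It's" survive).
SECTION_LABEL = re.compile(r"\[[^\]]*\]")
UNWANTED_CHARS = re.compile(r"[^\w\s']")


def clean_lyrics(text: str) -> str:
    """Remove section labels and unwanted characters from one song."""
    text = SECTION_LABEL.sub("", text)
    text = UNWANTED_CHARS.sub("", text)
    # Collapse repeated spaces and drop lines that became empty.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def filter_songs(songs, min_words=50, min_songs_per_artist=10):
    """Drop short songs, then drop artists left with too few songs.

    `songs` is a list of (artist, lyrics) tuples; the thresholds are
    illustrative defaults, not the values used in the project.
    """
    long_songs = [(a, l) for a, l in songs if len(l.split()) >= min_words]
    per_artist = Counter(a for a, _ in long_songs)
    return [(a, l) for a, l in long_songs if per_artist[a] >= min_songs_per_artist]


def most_common_words(songs, n=10):
    """Count the n most frequent (lower-cased) words over all songs."""
    counts = Counter(w.lower() for _, lyrics in songs for w in lyrics.split())
    return counts.most_common(n)
```

Filtering songs before counting artists matters: an artist can pass the song-count threshold only with songs that are long enough to be kept.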

11_createDataset.py

Python script that builds the datasets from the folders of crawled lyrics and saves them as df_rap.csv, df_songs.csv, and df_top.csv.
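A minimal sketch of what such a script might do, assuming a hypothetical layout of one folder per artist containing one .txt file per song (`<lyrics_dir>/<artist>/<song>.txt`). The function name `build_dataset` and the CSV columns are illustrative only.

```python
import csv
from pathlib import Path


def build_dataset(lyrics_dir: str, out_csv: str) -> int:
    """Collect <lyrics_dir>/<artist>/<song>.txt files into one CSV file.

    The folder layout and the column names are assumptions for illustration.
    Returns the number of songs written.
    """
    rows = []
    for txt_file in sorted(Path(lyrics_dir).glob("*/*.txt")):
        rows.append({
            "artist": txt_file.parent.name,  # folder name = artist
            "title": txt_file.stem,          # file name without .txt = title
            "lyrics": txt_file.read_text(encoding="utf-8"),
        })
    # newline="" lets the csv module quote multi-line lyrics correctly.
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["artist", "title", "lyrics"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling it once per source folder would give one CSV per dataset, e.g. `build_dataset("lyrics/rap", "datasets/df_rap.csv")`, where the input path is hypothetical.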

20_GPT2_TrainingLoop.ipynb

Notebook to export the test data for evaluation and to train the GPT2 model on the dataset.
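How the notebook splits and formats the data is not described here. One common approach, sketched below under that assumption, holds out a random test split for evaluation and concatenates the training songs into a single text, ending each song with GPT-2's end-of-text token `<|endoftext|>` so the model can learn where a song stops. The function name, split ratio and seed are hypothetical.

```python
import random

EOS = "<|endoftext|>"  # GPT-2's end-of-text token


def split_and_format(lyrics, test_ratio=0.1, seed=42):
    """Shuffle songs, hold out a test split, and build the training text.

    Returns (train_text, test_songs): train_text is every training song
    followed by the end-of-text token; test_songs is a plain list.
    """
    songs = list(lyrics)
    random.Random(seed).shuffle(songs)  # seeded, so the split is reproducible
    n_test = int(len(songs) * test_ratio)
    test, train = songs[:n_test], songs[n_test:]
    train_text = "".join(song + EOS for song in train)
    return train_text, test
```

Seeding a private `random.Random` instance keeps the exported test set identical between runs without touching the global random state.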

21_GPT2_TrainingLoop.py

Python script exported from 20_GPT2_TrainingLoop.ipynb to run the training remotely on the KILab pool PC.

startTraining.sh

Helper script to perform the remote training on the KILab pool PC.

30_ModelEvaluation.ipynb

Notebook to evaluate the training of our models and compare them with the pretrained GPT2. To do so, it loads the training results and the models and computes the BLEU score.
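For reference, the standard single-reference BLEU score multiplies a brevity penalty by the geometric mean of the clipped n-gram precisions (n = 1 to 4). The from-scratch sketch below shows that computation on whitespace tokens. It is not necessarily how the notebook computes BLEU: library implementations (for example NLTK's, which offers smoothing) can return different values, especially for short texts.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        # (Counter & Counter keeps the minimum of both counts).
        overlap = sum((cand_counts & ngrams(ref, n)).values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one zero precision zeroes BLEU
        log_precision_sum += math.log(overlap / total) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum)
```

For example, the candidate "the cat sat on" against the reference "the cat sat on the mat" has perfect n-gram precisions but is penalized for being shorter, scoring exp(-0.5), roughly 0.61.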

40_Prompting.ipynb

Notebook to perform prompting with OpenPrompt on the models.

41_Prompting.py

Python script exported from 40_Prompting.ipynb to train remotely on the KILab pool PC.

50_GeneratorGUI.py

Python script that demonstrates song-lyrics generation through a GUI.


Setup notes

10_DataEngineering.ipynb requires an API token for the genius.com API, stored in a file called geniusToken.txt in the project root (replace TOKEN with your own token):

cd ML-NLP-LyricsGen-Transformer/

touch geniusToken.txt

echo TOKEN > geniusToken.txt
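Inside a notebook, the token can then be read back with a few lines of Python. This helper is a hypothetical sketch, not the notebook's code; it strips the trailing newline that `echo` writes and fails early if the file is empty.

```python
from pathlib import Path


def read_genius_token(path="geniusToken.txt"):
    """Read the genius.com API token written by the commands above."""
    token = Path(path).read_text(encoding="utf-8").strip()  # drop newline/spaces
    if not token:
        raise ValueError(f"{path} is empty; store your genius.com API token in it")
    return token
```

The returned string would then be passed to the lyricsgenius client as its access token, e.g. `lyricsgenius.Genius(read_genius_token())`.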

To use the datasets in the notebooks, run 11_createDataset.py, which creates the CSV files in ./datasets/:

pip install pandas

python 11_createDataset.py