1-line programs for fine-tuning, inference and more
- Papers
- Videos 📽️
- Installation
- Documentation
- ACL-2022 Tutorial
gft contains 4 main functions:
- gft_fit: fit a pretrained model to data (aka fine-tuning)
- gft_predict: apply a model to inputs (aka inference)
- gft_eval: score a model on a split of a dataset
- gft_summary: find good stuff (popular models and datasets), and explain what is in them
These gft functions make use of 4 main arguments (though most arguments supported by the underlying hubs are also accepted):
- data: standard datasets hosted on hubs such as HuggingFace, PaddleNLP, or custom datasets hosted on the local filesystem
- model: standard models hosted on hubs such as HuggingFace, PaddleNLP, or custom models hosted on the local filesystem
- equation: string such as "classify: label ~ text", where classify is a task, and label and text refer to columns in a dataset
- task: classify, classify_tokens, classify_spans, classify_audio, classify_images, regress, text-generation, translation, ASR, fill-mask
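The same equation notation extends to tasks with more than one input column: the left-hand side names the column to be predicted, and the right-hand side lists the input columns. As a hedged sketch (the H:glue,stsb dataset name and its sentence1/sentence2/label column names are assumptions, not taken from the text above; $outdir is a directory of your choosing), a regression could be specified as:

gft_fit --eqn 'regress: label ~ sentence1 + sentence2' \
--model H:bert-base-cased \
--data H:glue,stsb \
--output_dir $outdir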
Here are some simple examples:
emodel=H:bhadresh-savani/roberta-base-emotion
# Summarize a dataset and/or model
gft_summary --data H:dair-ai/emotion
gft_summary --model $emodel
gft_summary --data H:dair-ai/emotion --model $emodel
# find some popular datasets and models that contain "emotion"
gft_summary --data H:__contains__emotion --topn 5
gft_summary --model H:__contains__emotion --topn 5
# make predictions on inputs from stdin
echo 'I love you.' | gft_predict --task classify
# The default model (for the classification task) performs sentiment analysis
# The model, $emodel, outputs emotion classes (as opposed to POSITIVE/NEGATIVE)
echo 'I love you.' | gft_predict --task classify --model $emodel
# some other tasks (beyond classification)
echo 'I love New York.' | gft_predict --task H:token-classification
echo 'I <mask> you.' | gft_predict --task H:fill-mask
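# a hedged sketch of another task from the list above, assuming H:-prefixed task names
# are passed through to the corresponding HuggingFace pipelines (as in the two lines above)
echo 'Once upon a time' | gft_predict --task H:text-generation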
# make predictions on inputs from a split of a standard dataset
gft_predict --eqn 'classify: label ~ text' --model $emodel --data H:dair-ai/emotion --split test
# return a single score (as opposed to a prediction for each input)
gft_eval --eqn 'classify: label ~ text' --model $emodel --data H:dair-ai/emotion --split test
# Input a pre-trained model (bert) and output a post-trained (fine-tuned) model
gft_fit --eqn 'classify: label ~ text' \
--model H:bert-base-cased \
--data H:dair-ai/emotion \
--output_dir $outdir
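The post-trained model written to $outdir can then be used wherever a model argument is expected. A hedged sketch (whether a local checkpoint is referenced by a bare path, as here, or needs a hub-style prefix is an assumption, not shown above):

gft_eval --eqn 'classify: label ~ text' --model $outdir --data H:dair-ai/emotion --split test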
The table below shows a 3-step recipe, which has become standard in the literature on deep nets.
Step | gft Support | Description | Time | Hardware |
---|---|---|---|---|
1 | (out of scope) | Pre-Training | Days/Weeks | Large GPU Cluster |
2 | gft_fit | Fine-Tuning | Hours/Days | 1+ GPUs |
3 | gft_predict | Inference | Seconds/Minutes | 0+ GPUs |
This repo provides support for step 2 (gft_fit) and step 3 (gft_predict). Most gft_fit and gft_predict programs are short (1 line), much shorter than examples such as these, which typically run to a few hundred lines of Python. With gft, users should not need to read or modify any Python code for steps 2 and 3 in the table above.
Step 1, pre-training, is beyond the scope of this work. We recommend starting with models from HuggingFace and PaddleHub/PaddleNLP hubs, as illustrated in the examples above.
If you use gft, please cite:

@inproceedings{church-etal-2022-gentle,
title = "A Gentle Introduction to Deep Nets and Opportunities for the Future",
author = "Church, Kenneth and
Kordoni, Valia and
Marcus, Gary and
Davis, Ernest and
Ma, Yanjun and
Chen, Zeyu",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-tutorials.1",
pages = "1--6",
abstract = "The first half of this tutorial will make deep nets more accessible to a broader audience, following {``}Deep Nets for Poets{''} and {``}A Gentle Introduction to Fine-Tuning.{''} We will also introduce GFT (general fine tuning), a little language for fine tuning deep nets with short (one line) programs that are as easy to code as regression in statistics packages such as R using glm (general linear models). Based on the success of these methods on a number of benchmarks, one might come away with the impression that deep nets are all we need. However, we believe the glass is half-full: while there is much that can be done with deep nets, there is always more to do. The second half of this tutorial will discuss some of these opportunities.",
}
@article{church-etal-2022-gft,
title={Emerging trends: General fine-tuning (gft)},
DOI={10.1017/S1351324922000237},
journal={Natural Language Engineering},
publisher={Cambridge University Press},
author={Church, Kenneth and Cai, Xingyu and Ying, Yibiao and Chen, Zeyu and Xun, Guangxu and Bian, Yuchen},
year={2022},
pages={1--17}
}