Add PPMiniLM class (#1512)
* add ppminilm class

* update copyright

* add ppminilm tokenizer to __init__

* update ppminilm

* remove useless comments, remove ernie

* remove useless readme, remove ernie
LiuChiachi authored Dec 28, 2021
1 parent 806ff6f commit a16c4bc
Showing 17 changed files with 726 additions and 58 deletions.
12 changes: 6 additions & 6 deletions examples/model_compression/pp-minilm/README.md
Original file line number Diff line number Diff line change
@@ -104,16 +104,16 @@ The PP-MiniLM compression scheme is built on task-agnostic knowledge distillation (Task-a

## Importing PP-MiniLM

PP-MiniLM is a 6-layer ERNIE model produced by task-agnostic distillation with `roberta-wwm-ext-large` as the teacher model (i.e., a small Chinese pre-trained model with 6 Transformer encoder layers and a hidden size of 768); its accuracy on 7 CLUE classification tasks surpasses BERT<sub>base</sub>, TinyBERT<sub>6</sub>, UER-py RoBERTa L6-H768, and RBT6.
PP-MiniLM is a small pre-trained model with 6 Transformer encoder layers and a hidden size of 768, produced by task-agnostic distillation with `roberta-wwm-ext-large` as the teacher model; its accuracy on 7 CLUE classification tasks surpasses BERT<sub>base</sub>, TinyBERT<sub>6</sub>, UER-py RoBERTa L6-H768, and RBT6.

PP-MiniLM can be imported as follows:

```python

from paddlenlp.transformers import ErnieModel, ErnieForSequenceClassification
from paddlenlp.transformers import PPMiniLMModel, PPMiniLMForSequenceClassification

model = ErnieModel.from_pretrained('ppminilm-6l-768h')
model = ErnieForSequenceClassification.from_pretrained('ppminilm-6l-768h')  # for classification tasks
model = PPMiniLMModel.from_pretrained('ppminilm-6l-768h')
model = PPMiniLMForSequenceClassification.from_pretrained('ppminilm-6l-768h')  # for classification tasks
```

PP-MiniLM is a 6-layer pre-trained model; after importing it with `from_pretrained`, you can fine-tune it on your own dataset. The following sections describe how to fine-tune the imported PP-MiniLM on downstream task data, then further compress it and deploy it for inference.
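As a rough cross-check of the architecture described above, the config values this commit removes from the ERNIE model registry (6 layers, hidden size 768, 12 heads, intermediate size 3072, vocab size 21128, type vocab size 4, 512 positions) imply roughly 60M parameters. A back-of-envelope sketch, assuming standard BERT-style layer shapes rather than an exact count:

```python
# Back-of-envelope parameter estimate for ppminilm-6l-768h,
# using the config values removed from ernie/modeling.py in this commit.
hidden = 768
layers = 6
intermediate = 3072
vocab = 21128
type_vocab = 4
max_pos = 512

# word + position + token-type embeddings, plus their LayerNorm
embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden

# per encoder layer: Q/K/V/output projections, FFN, and two LayerNorms
attention = 4 * (hidden * hidden + hidden)
ffn = hidden * intermediate + intermediate + intermediate * hidden + hidden
layer = attention + ffn + 2 * 2 * hidden

pooler = hidden * hidden + hidden

total = embeddings + layers * layer + pooler
print(f"{total / 1e6:.1f}M parameters")  # ~59.7M
```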
@@ -193,7 +193,7 @@ sh run_clue.sh CLUEWSC2020 1e-4 32 50 128 0 ppminilm-6l-768h
Assuming the model to be exported is located at `ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32`, run the command below to export the dynamic-graph model to a static-graph model ready for deployment:

```shell
python export_model.py --model_type ernie --model_path ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32 --output_path fine_tuned_infer_model/float
python export_model.py --model_type ppminilm --model_path ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32 --output_path fine_tuned_infer_model/float
cd ..
```

@@ -221,7 +221,7 @@ cd ..
cd pruning
export FT_MODELS=../finetuning/ppminilm-6l-768h/models/CLUEWSC2020/1e-4_32

sh prune.sh CLUEWSC2020 5e-5 16 50 128 0 ${FT_MODELS} 0.75
sh prune.sh CLUEWSC2020 1e-4 32 50 128 0 ${FT_MODELS} 0.75
```
Each argument denotes, in order: the CLUE task name, learning rate, batch size, number of epochs, maximum sequence length, GPU id, the student model path, and the list of post-pruning width ratios. When the run completes, the model is saved under `pruned_models/CLUEWSC2020/0.75/best_model/`
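As a rough illustration of what the 0.75 width ratio means, here is a minimal sketch that assumes the multiplier uniformly scales the number of attention heads and FFN neurons in each layer (a simplification; the actual sub-network selection is handled by PaddleSlim's OFA machinery):

```python
# Hypothetical sketch: effect of a 0.75 width multiplier on one encoder layer
# of ppminilm-6l-768h (12 heads, intermediate size 3072 per its config).
num_attention_heads = 12
intermediate_size = 3072
width_mult = 0.75  # the last argument passed to prune.sh above

kept_heads = int(num_attention_heads * width_mult)
kept_neurons = int(intermediate_size * width_mult)
print(kept_heads, kept_neurons)  # 9 2304
```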

4 changes: 2 additions & 2 deletions examples/model_compression/pp-minilm/data.py
@@ -14,11 +14,11 @@
import numpy as np

from paddle.metric import Metric, Accuracy
from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer
from paddlenlp.transformers import PPMiniLMForSequenceClassification, PPMiniLMTokenizer
from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer

MODEL_CLASSES = {
"ernie": (ErnieForSequenceClassification, ErnieTokenizer),
"ppminilm": (PPMiniLMForSequenceClassification, PPMiniLMTokenizer),
"bert": (BertForSequenceClassification, BertTokenizer)
}

2 changes: 0 additions & 2 deletions examples/model_compression/pp-minilm/finetuning/run_clue.py
@@ -30,8 +30,6 @@

from paddlenlp.datasets import load_dataset
from paddlenlp.data import Stack, Tuple, Pad, Dict
from paddlenlp.transformers import BertForSequenceClassification, BertTokenizer, BertModel
from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer
from paddlenlp.transformers import LinearDecayWithWarmup

sys.path.append("../")
@@ -8,7 +8,7 @@ export CUDA_VISIBLE_DEVICES=$6
export MODEL_PATH=$7

python -u ./run_clue.py \
--model_type ernie \
--model_type ppminilm \
--model_name_or_path ${MODEL_PATH} \
--task_name ${TASK_NAME} \
--max_seq_length ${MAX_SEQ_LEN} \
2 changes: 1 addition & 1 deletion examples/model_compression/pp-minilm/inference/infer.py
@@ -39,7 +39,7 @@ def parse_args():
", ".join(METRIC_CLASSES.keys()), )
parser.add_argument(
"--model_type",
default='ernie',
default='ppminilm',
type=str,
help="Model type selected in the list: " +
", ".join(MODEL_CLASSES.keys()), )
2 changes: 1 addition & 1 deletion examples/model_compression/pp-minilm/pruning/export.sh
@@ -14,7 +14,7 @@

MODEL_PATH=$1
TASK_NAME=$2
python export_model.py --model_type ernie \
python export_model.py --model_type ppminilm \
--model_name_or_path ${MODEL_PATH}/${TASK_NAME}/0.75/best_model \
--sub_model_output_dir ${MODEL_PATH}/${TASK_NAME}/0.75/sub/ \
--static_sub_model ${MODEL_PATH}/${TASK_NAME}/0.75/sub_static/float \
2 changes: 1 addition & 1 deletion examples/model_compression/pp-minilm/pruning/export_all.sh
@@ -17,7 +17,7 @@ MODEL_PATH=pruned_models
for TASK_NAME in AFQMC TNEWS IFLYTEK CMNLI OCNLI CLUEWSC2020 CSL

do
python export_model.py --model_type ernie \
python export_model.py --model_type ppminilm \
--model_name_or_path ${MODEL_PATH}/${TASK_NAME}/0.75/best_model \
--sub_model_output_dir ${MODEL_PATH}/${TASK_NAME}/0.75/sub/ \
--static_sub_model ${MODEL_PATH}/${TASK_NAME}/0.75/sub_static/float \
18 changes: 10 additions & 8 deletions examples/model_compression/pp-minilm/pruning/export_model.py
@@ -15,6 +15,7 @@
import argparse
import logging
import os
import sys
import math
import random
import time
@@ -26,20 +27,21 @@
import paddle.nn as nn
import paddle.nn.functional as F

from paddlenlp.transformers import ErnieModel, ErnieForSequenceClassification, ErnieTokenizer
from paddlenlp.transformers import PPMiniLMModel
from paddlenlp.utils.log import logger
from paddleslim.nas.ofa import OFA, utils
from paddleslim.nas.ofa.convert_super import Convert, supernet
from paddleslim.nas.ofa.layers import BaseBlock

MODEL_CLASSES = {"ernie": (ErnieForSequenceClassification, ErnieTokenizer), }
sys.path.append("../")
from data import MODEL_CLASSES


def ernie_forward(self,
input_ids,
token_type_ids=None,
position_ids=None,
attention_mask=None):
def ppminilm_forward(self,
input_ids,
token_type_ids=None,
position_ids=None,
attention_mask=None):
wtype = self.pooler.dense.fn.weight.dtype if hasattr(
self.pooler.dense, 'fn') else self.pooler.dense.weight.dtype
if attention_mask is None:
@@ -52,7 +54,7 @@ def ernie_forward(self,
return encoded_layer, pooled_output


ErnieModel.forward = ernie_forward
PPMiniLMModel.forward = ppminilm_forward


def parse_args():
31 changes: 16 additions & 15 deletions examples/model_compression/pp-minilm/pruning/prune.py
@@ -31,7 +31,7 @@
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import LinearDecayWithWarmup
from paddlenlp.utils.log import logger
from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer, ErnieModel
from paddlenlp.transformers import PPMiniLMModel

from paddleslim.nas.ofa import OFA, DistillConfig, utils
from paddleslim.nas.ofa.utils import nlp_utils
@@ -194,11 +194,11 @@ def evaluate(model, metric, data_loader, width_mult, student=False):


### monkey patch for bert forward to accept [attention_mask, head_mask] as attention_mask
def ernie_forward(self,
input_ids,
token_type_ids=None,
position_ids=None,
attention_mask=[None, None]):
def ppminilm_forward(self,
input_ids,
token_type_ids=None,
position_ids=None,
attention_mask=[None, None]):
wtype = self.pooler.dense.fn.weight.dtype if hasattr(
self.pooler.dense, 'fn') else self.pooler.dense.weight.dtype
if attention_mask[0] is None:
@@ -211,7 +211,7 @@ def ernie_forward(self,
return encoded_layer, pooled_output


ErnieModel.forward = ernie_forward
PPMiniLMModel.forward = ppminilm_forward
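The assignment above relies on Python's ability to monkey-patch a method on the class object so that every instance picks up the new behavior. A minimal self-contained sketch of the same pattern, using a toy class rather than the real model:

```python
# Toy illustration of the monkey-patching pattern used in prune.py:
# replace a class's method at runtime with a version that accepts an
# extra argument while preserving the old default behavior.
class Model:
    def forward(self, x):
        return x

def patched_forward(self, x, mask=None):
    # fall back to the original behavior when no mask is given
    return x if mask is None else x + mask

Model.forward = patched_forward  # all instances now use the new signature

m = Model()
print(m.forward(1))           # 1
print(m.forward(1, mask=2))   # 3
```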


### reorder weights according head importance and neuron importance
@@ -220,14 +220,15 @@ def reorder_neuron_head(model, head_importance, neuron_importance):
for layer, current_importance in enumerate(neuron_importance):
# reorder heads
idx = paddle.argsort(head_importance[layer], descending=True)
nlp_utils.reorder_head(model.ernie.encoder.layers[layer].self_attn, idx)
nlp_utils.reorder_head(model.ppminilm.encoder.layers[layer].self_attn,
idx)
# reorder neurons
idx = paddle.argsort(
paddle.to_tensor(current_importance), descending=True)
nlp_utils.reorder_neuron(
model.ernie.encoder.layers[layer].linear1.fn, idx, dim=1)
model.ppminilm.encoder.layers[layer].linear1.fn, idx, dim=1)
nlp_utils.reorder_neuron(
model.ernie.encoder.layers[layer].linear2.fn, idx, dim=0)
model.ppminilm.encoder.layers[layer].linear2.fn, idx, dim=0)


def soft_cross_entropy(inp, target):
@@ -305,9 +306,9 @@ def do_train(args):
args.model_name_or_path, num_classes=num_labels)

# Step4: Config about distillation.
mapping_layers = ['ernie.embeddings']
for idx in range(model.ernie.config['num_hidden_layers']):
mapping_layers.append('ernie.encoder.layers.{}'.format(idx))
mapping_layers = ['ppminilm.embeddings']
for idx in range(model.ppminilm.config['num_hidden_layers']):
mapping_layers.append('ppminilm.encoder.layers.{}'.format(idx))

default_distill_config = {
'lambda_distill': 0.1,
@@ -333,8 +334,8 @@
ofa_model.model,
dev_data_loader,
loss_fct=criterion,
num_layers=model.ernie.config['num_hidden_layers'],
num_heads=model.ernie.config['num_attention_heads'])
num_layers=model.ppminilm.config['num_hidden_layers'],
num_heads=model.ppminilm.config['num_attention_heads'])
reorder_neuron_head(ofa_model.model, head_importance, neuron_importance)

if paddle.distributed.get_world_size() > 1:
2 changes: 1 addition & 1 deletion examples/model_compression/pp-minilm/pruning/prune.sh
@@ -21,7 +21,7 @@ export CUDA_VISIBLE_DEVICES=$6
export STUDENT_DIR=$7
export WIDTH_LIST=$8

python -u ./prune.py --model_type ernie \
python -u ./prune.py --model_type ppminilm \
--model_name_or_path ${STUDENT_DIR} \
--task_name $TASK_NAME --max_seq_length ${SEQ_LEN} \
--batch_size ${BATCH_SIZE} \
@@ -25,7 +25,7 @@
import paddleslim
from paddlenlp.data import Stack, Tuple, Pad, Dict
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import ErnieTokenizer
from paddlenlp.transformers import PPMiniLMTokenizer

sys.path.append("../")
from data import convert_example, METRIC_CLASSES, MODEL_CLASSES
@@ -85,7 +85,7 @@ def quant_post(args, batch_size=8, algo='avg'):

train_ds = load_dataset("clue", args.task_name, splits="dev")

tokenizer = ErnieTokenizer.from_pretrained(args.model_name_or_path)
tokenizer = PPMiniLMTokenizer.from_pretrained(args.model_name_or_path)

trans_func = partial(
convert_example,
2 changes: 2 additions & 0 deletions paddlenlp/transformers/__init__.py
@@ -21,6 +21,8 @@
from .bert_japanese.tokenizer import *
from .ernie.modeling import *
from .ernie.tokenizer import *
from .ppminilm.modeling import *
from .ppminilm.tokenizer import *
from .gpt.modeling import *
from .gpt.tokenizer import *
from .roberta.modeling import *
16 changes: 0 additions & 16 deletions paddlenlp/transformers/ernie/modeling.py
@@ -168,20 +168,6 @@ class ErniePretrainedModel(PretrainedModel):
"vocab_size": 30522,
"pad_token_id": 0,
},
"ppminilm-6l-768h": {
"attention_probs_dropout_prob": 0.1,
"intermediate_size": 3072,
"hidden_act": "relu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 6,
"type_vocab_size": 4,
"vocab_size": 21128,
"pad_token_id": 0,
},
}
resource_files_names = {"model_state": "model_state.pdparams"}
pretrained_resource_files_map = {
@@ -196,8 +182,6 @@ class ErniePretrainedModel(PretrainedModel):
"https://bj.bcebos.com/paddlenlp/models/transformers/ernie_v2_base/ernie_v2_eng_base_finetuned_squad.pdparams",
"ernie-2.0-large-en":
"https://bj.bcebos.com/paddlenlp/models/transformers/ernie_v2_large/ernie_v2_eng_large.pdparams",
"ppminilm-6l-768h":
"https://bj.bcebos.com/paddlenlp/models/transformers/ppminilm-6l-768h/ppminilm-6l-768h.pdparams",
}
}
base_model_prefix = "ernie"
2 changes: 0 additions & 2 deletions paddlenlp/transformers/ernie/tokenizer.py
@@ -93,8 +93,6 @@ class ErnieTokenizer(PretrainedTokenizer):
"https://bj.bcebos.com/paddlenlp/models/transformers/ernie-gen-large/vocab.txt",
"ernie-gen-large-430g-en":
"https://bj.bcebos.com/paddlenlp/models/transformers/ernie-gen-large-430g/vocab.txt",
"ppminilm-6l-768h":
"https://bj.bcebos.com/paddlenlp/models/transformers/ppminilm-6l-768h/vocab.txt",
}
}
pretrained_init_configuration = {
