Supervised data augmentation: current data augmentation methods for labeled data give a steady but limited performance boost, because the labeled set is usually small.
Unsupervised data augmentation (UDA):
Apply data augmentation to unlabeled data, since unlabeled data is usually far more plentiful.
Consistency loss: minimize the KL divergence between the predicted distributions on an unlabeled example and an augmented version of it (see the loss sketch below).
Consistency/smoothness enforcing: UDA smooths the input/hidden space so that the model becomes more robust.
Total loss: Supervised loss + Consistency loss
Allows label information to propagate from labeled data to unlabeled data.
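A minimal sketch of how the two losses combine, assuming PyTorch; `model`, `augment`, and the weight `lam` are hypothetical placeholders, not the paper's released implementation:

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, augment, lam=1.0):
    # Supervised cross-entropy on the small labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Consistency loss: KL divergence between the prediction on an unlabeled
    # example and the prediction on its augmented version. The prediction on
    # the original example is treated as a fixed target (no gradient).
    with torch.no_grad():
        p_orig = F.softmax(model(x_unlabeled), dim=-1)
    log_p_aug = F.log_softmax(model(augment(x_unlabeled)), dim=-1)
    consistency_loss = F.kl_div(log_p_aug, p_orig, reduction="batchmean")

    # Total loss = supervised loss + weighted consistency loss.
    return sup_loss + lam * consistency_loss
```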
Training Techniques
Propose Training Signal Annealing (TSA) to prevent overfitting on the small labeled set: gradually release the supervised loss signal during training with log/linear/exp schedules (exp is recommended when labeled data is very limited); see the TSA sketch after this list.
Using targeted data augmentation (e.g., AutoAugment) gives a significant improvement over untargeted data augmentations.
Diverse and valid augmentations that inject targeted inductive biases are key, but there are trade-offs when generating text, e.g., diverse text may not be a valid sentence.
Propose (1) confidence-based masking, (2) entropy minimization, and (3) softmax temperature controlling to sharpen the predictions on unlabeled data (preventing them from becoming over-flat, which would make the consistency loss useless). (1)+(3) is the most effective combination; see the sharpening sketch after this list.
Propose Domain-relevance Data Filtering to address the class-distribution mismatch of out-of-domain unlabeled data: train an in-domain baseline model, run it on the unlabeled data, and keep the examples it is most confident about, distributed equally across classes (see the filtering sketch after this list).
Open question: how to apply it to regression problems?
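A minimal sketch of the TSA threshold and masked supervised loss, assuming PyTorch and K-way classification; the log/linear/exp schedule formulas follow the paper, but the function names and the schedule constant are illustrative:

```python
import math
import torch
import torch.nn.functional as F

def tsa_threshold(step, total_steps, num_classes, schedule="exp"):
    t = step / total_steps
    if schedule == "linear":
        alpha = t
    elif schedule == "exp":          # releases the signal late; suited to very small labeled sets
        alpha = math.exp((t - 1) * 5)
    else:                            # "log": releases the signal early
        alpha = 1 - math.exp(-t * 5)
    return alpha * (1 - 1 / num_classes) + 1 / num_classes

def tsa_supervised_loss(logits, labels, step, total_steps):
    # Mask out labeled examples the model already predicts with probability
    # above the current threshold, so it cannot overfit the small labeled set.
    probs = F.softmax(logits, dim=-1)
    correct_prob = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    threshold = tsa_threshold(step, total_steps, logits.size(-1))
    mask = (correct_prob < threshold).float()
    loss = F.cross_entropy(logits, labels, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```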
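A minimal sketch of confidence-based masking plus softmax temperature sharpening applied to the consistency loss, assuming PyTorch; the temperature and threshold values are illustrative, not the paper's tuned settings:

```python
import torch
import torch.nn.functional as F

def sharpened_consistency_loss(logits_orig, logits_aug, temperature=0.4, conf_threshold=0.8):
    # Sharpen the prediction on the original unlabeled example with a low
    # softmax temperature so it does not stay over-flat, and stop gradients
    # through it so it acts as a fixed target.
    with torch.no_grad():
        target = F.softmax(logits_orig / temperature, dim=-1)
        # Confidence-based masking: only keep examples whose (unsharpened)
        # highest predicted probability exceeds the threshold.
        mask = (F.softmax(logits_orig, dim=-1).max(dim=-1).values > conf_threshold).float()

    log_p_aug = F.log_softmax(logits_aug, dim=-1)
    per_example_kl = F.kl_div(log_p_aug, target, reduction="none").sum(dim=-1)
    return (per_example_kl * mask).sum() / mask.sum().clamp(min=1.0)
```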
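A minimal sketch of domain-relevance data filtering, assuming the in-domain baseline model's predicted class probabilities on the unlabeled set are already collected in a NumPy array; `per_class_budget` is a hypothetical parameter:

```python
import numpy as np

def filter_by_domain_relevance(probs, per_class_budget):
    """probs: (N, K) array of in-domain model predictions on out-of-domain unlabeled data."""
    confidences = probs.max(axis=1)      # confidence of the predicted class
    predictions = probs.argmax(axis=1)   # predicted class per example
    kept = []
    for k in range(probs.shape[1]):
        idx = np.where(predictions == k)[0]
        # Keep the most confident examples of each class so the retained
        # set is roughly balanced across classes.
        top = idx[np.argsort(-confidences[idx])][:per_class_budget]
        kept.extend(top.tolist())
    return sorted(kept)
```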
Results
2.7% error rate (w/ 4,000 labeled examples) on CIFAR-10, nearly matching full-dataset performance.
2.85% error rate (w/ 250 labeled examples) on SVHN, nearly matching full-dataset performance.
4.2% error rate (w/ 20 labeled examples) on IMDb text classification, outperforming the SoTA model trained w/ 25,000 labeled examples.
Improves ImageNet top-1/top-5 accuracy from 55.1%/77.3% to 68.7%/88.5% (w/ 10% of the labeled data).
Improves ImageNet top-1/top-5 accuracy from 78.3%/94.4% to 79.0%/94.5% (w/ the full labeled set + 1.3M extra unlabeled examples).
Notable Related Work
mixup: Beyond Empirical Risk Minimization by MIT & FAIR (ICLR 2018): instead of augmenting from a single data point, mixup interpolates pairs of examples (and their labels) to create augmented data.
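A minimal sketch of mixup, assuming PyTorch tensors and one-hot/soft labels; `alpha` is the Beta-distribution hyperparameter from the mixup paper:

```python
import torch

def mixup(x, y_onehot, alpha=0.2):
    # Sample the interpolation coefficient from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample()
    # Pair each example with a randomly permuted partner and interpolate
    # both the inputs and the (one-hot/soft) labels.
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```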
Metadata