Unsupervised Data Augmentation for Consistency Training #60

howardyclo opened this issue Jul 23, 2019 · 1 comment

TL;DR

  • Supervised data augmentation: Current data augmentation methods for labeled data provide a steady but limited performance boost, since labeled data is usually small.
  • Unsupervised data augmentation (UDA):
    • Apply data augmentation to unlabeled data instead, since unlabeled data is usually far more plentiful.
    • Consistency loss: Minimize the KL divergence between the predicted distributions on an unlabeled example and on an augmented version of that example.
    • Consistency/smoothness enforcing: UDA smooths the input/hidden space so that the model is more robust.
    • Total loss: Supervised loss + Consistency loss.
    • Allows label information to propagate from labeled data to unlabeled data.
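The combined objective above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names (`uda_loss`, `kl_divergence`) and the weight `lam` are my own, and a real implementation would stop gradients through the prediction on the unaugmented example so it acts as a fixed target.

```python
import numpy as np

def softmax(logits):
    """Convert a batch of logits to probability distributions."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) between batches of distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def uda_loss(sup_logits, sup_labels, unsup_logits, aug_logits, lam=1.0):
    """Total loss = supervised cross-entropy + lam * consistency loss.

    The consistency term is the KL divergence between the prediction on an
    unlabeled example and the prediction on its augmented version.
    """
    sup_probs = softmax(sup_logits)
    ce = -np.log(sup_probs[np.arange(len(sup_labels)), sup_labels] + 1e-12).mean()
    consistency = kl_divergence(softmax(unsup_logits), softmax(aug_logits)).mean()
    return ce + lam * consistency
```

When the augmented prediction matches the original one the consistency term vanishes, so the loss reduces to ordinary supervised training; disagreement between the two predictions is what the unlabeled data penalizes.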

Training Techniques

  • Propose training signal (supervised loss) annealing (TSA) to prevent overfitting on small labeled data: gradually release the supervised loss signal during training with a log/linear/exp schedule (exp is recommended when labeled data is very limited).
  • Targeted data augmentation (e.g., AutoAugment) gives a significant improvement over untargeted data augmentations.
  • Diverse and valid augmentations that inject targeted inductive biases are the key, but there are tradeoffs when generating text, e.g., diverse text may not be a valid sentence.
  • Propose (1) confidence-based masking, (2) entropy minimization, and (3) softmax temperature controlling to sharpen the predictions on unlabeled data (preventing them from being over-flat, which would make the consistency loss useless). (1)+(3) is the most effective combination.
  • Propose domain-relevance data filtering to address the class-distribution mismatch of out-of-domain unlabeled data: train an in-domain baseline model, run it on the unlabeled data, and keep the examples the model is most confident about (distributed equally among classes).

    How to apply it to regression problems?

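The TSA schedules and the prediction-sharpening tricks above can be sketched as follows. This is a sketch under the assumption that the schedules follow the paper's form (a warm-up factor alpha_t mapped to a threshold eta_t = alpha_t * (1 - 1/K) + 1/K, with scale factor 5 for the log/exp variants); the function names and default hyperparameters here are illustrative, not the paper's exact configuration.

```python
import numpy as np

def tsa_threshold(step, total_steps, num_classes, schedule="exp"):
    """Training signal annealing threshold eta_t.

    A labeled example contributes to the supervised loss only while the
    model's probability on its correct class is below eta_t, which grows
    from 1/K toward 1 over training.
    """
    t = step / total_steps
    if schedule == "linear":
        alpha = t
    elif schedule == "log":        # releases signal quickly at the start
        alpha = 1.0 - np.exp(-t * 5.0)
    elif schedule == "exp":        # withholds signal longest (small labeled sets)
        alpha = np.exp((t - 1.0) * 5.0)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return alpha * (1.0 - 1.0 / num_classes) + 1.0 / num_classes

def sharpen_and_mask(probs, temperature=0.4, confidence_beta=0.8):
    """Sharpen unlabeled-data predictions and mask low-confidence examples.

    Raising probabilities to the power 1/temperature (then renormalizing)
    is equivalent to a softmax with lowered temperature: the target
    distribution becomes peakier instead of over-flat. Examples whose max
    probability falls below confidence_beta are masked out of the
    consistency loss.
    """
    sharpened = probs ** (1.0 / temperature)
    sharpened = sharpened / sharpened.sum(axis=-1, keepdims=True)
    mask = probs.max(axis=-1) >= confidence_beta
    return sharpened, mask
```

Note how the exp schedule keeps eta_t near 1/K for most of training, matching the recommendation to use it when labeled data is very limited: the supervised signal is held back until the model has learned from the unlabeled consistency objective.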
Results

  • 2.7% error rate (w/ 4,000 labeled examples) on CIFAR-10, nearly matching full-dataset performance.
  • 2.85% error rate (w/ 250 labeled examples) on SVHN, nearly matching full-dataset performance.
  • 4.2% error rate (w/ only 20 labeled examples) on IMDb text classification, outperforming the previous state-of-the-art model trained with 25,000 labeled examples.
  • Improves ImageNet top-1/top-5 accuracy from 55.1%/77.3% to 68.7%/88.5% (w/ 10% of the labeled data).
  • Improves ImageNet top-1/top-5 accuracy from 78.3%/94.4% to 79.0%/94.5% (w/ the full labeled data + 1.3M extra unlabeled examples).

Notable Related Work
