A Recipe for Training Neural Networks #56

howardyclo commented Jun 8, 2019

Spend Time to Understand Data

  1. Understand the distribution and patterns.
  2. Look for data imbalances and biases.
  3. Example questions to ask:
    • Are very local features enough or do we need global context?
    • How much variation is there and what form does it take?
    • What variation is spurious and could be preprocessed out?
    • Does spatial position matter or do we want to average pool it out?
    • How much does detail matter and how far could we afford to downsample the images?
    • How noisy are the labels?
  4. Visualize the statistics and the outliers along any axis (see the sketch after this list).
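
A minimal sketch of what point 4 can look like in practice, assuming the dataset is already loaded as NumPy arrays; the array names, shapes, and the stand-in random data below are illustrative, not from the original notes:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data; replace with your real arrays (names are hypothetical).
images = np.random.rand(1000, 32, 32, 3)
labels = np.random.randint(0, 10, size=1000)

# Label balance: a skewed histogram exposes class imbalance early.
classes, counts = np.unique(labels, return_counts=True)
plt.bar(classes, counts)
plt.xlabel("class"); plt.ylabel("count"); plt.title("Label distribution")
plt.show()

# Sort images along one simple axis (mean brightness) and inspect the
# extremes as candidate outliers or mislabeled examples.
brightness = images.reshape(len(images), -1).mean(axis=1)
order = np.argsort(brightness)
print("darkest image indices:", order[:5])
print("brightest image indices:", order[-5:])
```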

Set up Training/Evaluation and Start from a Simple Model

  • Establish baselines and visualize the train/eval metrics.
  • Fix the random seed and run the code twice to confirm you get identical results.
  • Disable any unnecessary fanciness, e.g., data augmentation.
  • Plot the test loss over the entire test set instead of only over individual batches.
  • Verify that the loss starts at the right value, e.g., -log(1/n_classes) for a softmax classifier (see the sketch after this list).
  • Visualize predictions on a fixed test batch during training to see the "dynamics" of how the model learns (this also exposes learning rates that are too low or too high).
  • Be careful with view vs. transpose/permute: view only reinterprets the memory layout, while transpose/permute reorder dimensions, so confusing them silently scrambles the data.
  • Write simple code that works first and refactor it into a more general version later.
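
A minimal PyTorch sketch of two of these checks: fixing the random seed for reproducibility and verifying that the initial loss of a softmax classifier is close to -log(1/n_classes). The toy model and batch shapes are illustrative assumptions:

```python
import math
import numpy as np
import torch
import torch.nn as nn

def set_seed(seed: int = 0) -> None:
    """Fix the relevant RNGs so two runs produce identical results."""
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)

n_classes = 10
model = nn.Linear(32 * 32 * 3, n_classes)  # toy classifier head
nn.init.zeros_(model.weight)               # zero init -> uniform softmax output
nn.init.zeros_(model.bias)

x = torch.randn(64, 32 * 32 * 3)           # fake batch
y = torch.randint(0, n_classes, (64,))

loss = nn.CrossEntropyLoss()(model(x), y)
expected = -math.log(1.0 / n_classes)
print(f"initial loss {loss.item():.4f}, expected ~{expected:.4f}")
```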

Overfit

  • Follow the most closely related paper and try its simplest architecture that achieves good performance. Do not start customizing things at this early stage.
  • Adam with a learning rate of 3e-4 is a safe default (see the sketch after this list).
  • If you have multiple input signals, plug them into the model one by one and confirm that each gives the performance boost you'd expect.
  • Be careful with learning rate decay (different dataset sizes and problems require different decay schedules). Disable learning rate decay first and tune it later.
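
A minimal sketch of this overfitting stage, assuming a toy MLP and fake data: take one fixed batch, train with Adam at 3e-4, and check that the training loss can be driven toward zero (if it can't, something in the pipeline is broken):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.Adam(model.parameters(), lr=3e-4)  # safe default
loss_fn = nn.CrossEntropyLoss()

# One fixed batch; the goal is simply to memorize it.
x = torch.randn(32, 20)
y = torch.randint(0, 5, (32,))

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```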

Regularize

  • Don't spend a lot of engineering effort squeezing juice out of a small dataset when you could instead be collecting more data.
  • Data augmentation.
  • Creative augmentation: domain randomization, use of simulation, clever hybrids such as inserting (potentially simulated) data into scenes, or even GANs.
  • Pretraining.
  • Stick with supervised learning.
  • Smaller input dimensionality (e.g., feed in smaller images).
  • Decrease the batch size (small batch = stronger regularization).
  • Add dropout (Dropout2d for CNNs).
  • Weight decay (a sketch of dropout and weight decay in PyTorch follows this list).
  • Try a larger model (its early-stopped performance is often better than a smaller model's).
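
A minimal PyTorch sketch of two of these regularizers: Dropout2d inside a small CNN, and weight decay passed to the optimizer. The architecture is an illustrative assumption:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),      # drops whole feature maps, suited to CNNs
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

# Weight decay (L2 regularization) is configured on the optimizer.
opt = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)

x = torch.randn(8, 3, 32, 32)  # fake batch to check shapes
print(model(x).shape)          # torch.Size([8, 10])
```

Note that weight_decay in torch.optim.Adam adds a plain L2 term to the gradients; AdamW decouples the decay from the adaptive update and is often preferred.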

Tuning Hyperparameters

Squeeze Out the Juice
