
Is there data leakage in the maml-omniglot example? #107

Open
SunHaozhe opened this issue May 10, 2021 · 6 comments

SunHaozhe commented May 10, 2021

In the maml-omniglot.py example code, net.train() is used for meta-test phases (link).

Does this not cause data leakage of meta-test data via the running statistics of nn.BatchNorm2d (net contains several nn.BatchNorm2d layers)?
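For context, a minimal standalone sketch (not code from maml-omniglot.py) of why this matters: in train mode, nn.BatchNorm2d both normalizes with the current batch's statistics and folds them into its running buffers, so a forward pass on meta-test data contaminates running_mean / running_var.

```python
import torch
import torch.nn as nn

# Minimal standalone sketch of the concern (not code from maml-omniglot.py):
# in train mode, BatchNorm normalizes with the current batch's statistics AND
# folds those statistics into its running buffers.
bn = nn.BatchNorm2d(3)          # running_mean starts at 0, running_var at 1

meta_test_batch = torch.randn(8, 3, 28, 28) + 5.0   # shifted "meta-test" data

bn.train()
_ = bn(meta_test_batch)

# The running buffers now reflect the meta-test batch (default momentum 0.1,
# so running_mean has moved from 0 toward ~5):
print(bn.running_mean)

bn.eval()
# Any later eval-mode forward pass would normalize with these leaked statistics.
```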


brando90 commented Nov 4, 2021

I think there is.

However, in my experience, without it the model diverges and explodes after the adaptation step (e.g. 5 steps of the inner optimizer):

>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5939, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5940, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5940, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5940, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5941, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5940, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5942, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5940, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5940, grad_fn=<NormBackward1>)
>maml_old (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>>maml_old (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5939, grad_fn=<NormBackward1>)
eval_loss=0.9859228551387786, eval_acc=0.5907692521810531
args.meta_learner.lr_inner=0.01
==== in forward2
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(171440.6875, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(208426.0156, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(17067344., grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(40371.8125, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(1.0911e+11, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(21.3515, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(5.4257e+13, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(128.9109, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(3994.7734, grad_fn=<NormBackward1>)
>maml_new (before inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(9.5937, grad_fn=<NormBackward1>)
>maml_new (after inner adapt): fmodel.model.features.conv1.weight.norm(2)=tensor(1682896., grad_fn=<NormBackward1>)
eval_loss_sanity=nan, eval_acc_santiy=0.20000000298023224

Though these results are on mini-ImageNet, I am 100% sure this is the issue causing it. When I do mdl.train() it goes away...


brando90 commented Nov 4, 2021

@SunHaozhe did you ever fix this...?

I think the only way to fix it (and I don't know if it will work) is to either:

  1. re-train the model but use batch stats (+ MAML), or
  2. make sure you're using the train stats (see the sketch below)...? https://stackoverflow.com/questions/69846779/how-does-one-use-the-mean-and-std-from-training-in-batch-norm
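A rough sketch of what option 2 could look like (the helper name is hypothetical, not from the example): keep the network in train mode for the inner loop, but switch only the BatchNorm layers to eval mode so they normalize with the running statistics accumulated during training instead of the current meta-test batch.

```python
import torch.nn as nn

def use_running_stats(net: nn.Module) -> None:
    # Hypothetical helper: leave the rest of the net in train mode, but make
    # every BatchNorm layer normalize with its stored running statistics
    # instead of the current (meta-test) batch's statistics.
    for m in net.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

# Usage sketch:
# net.train()
# use_running_stats(net)
# ... inner-loop adaptation / meta-test forward passes ...
```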


brando90 commented Nov 5, 2021

Created a pull request: #122


brando90 commented Nov 5, 2021

@SunHaozhe I don't think this is an issue anymore because of this:

this is likely wrong, because during training we want to use batch statistics in meta-learning, since tasks have different distributions. But then how do we retain determinism at inference time? See: https://discuss.pytorch.org/t/how-does-one-use-the-mean-and-std-from-training-in-batch-norm/136029/5

In summary, .train() uses batch statistics. Yes, it updates the running mean with cheated means, but those are never actually used as long as .train() is set: during training the network uses batch statistics anyway. As long as you don't save the model as a checkpoint with these cheated means, it doesn't matter. But your code does become less deterministic.
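A quick standalone check of that claim (a sketch, not code from this repo): in train mode the BatchNorm output depends only on the current batch's statistics, so whatever is stored in the running buffers has no effect on the result until you switch to eval mode or save a checkpoint.

```python
import torch
import torch.nn as nn

# Standalone sketch: in train mode, BatchNorm's output depends only on the
# current batch's statistics, so the contents of the running buffers
# ("cheated" or not) do not change the result.
torch.manual_seed(0)
x = torch.randn(8, 3, 28, 28)

bn = nn.BatchNorm2d(3)
bn.train()
y1 = bn(x).detach()

# Corrupt the running statistics and run the same batch again in train mode.
with torch.no_grad():
    bn.running_mean.fill_(100.0)
    bn.running_var.fill_(100.0)
y2 = bn(x).detach()

print(torch.allclose(y1, y2))   # True: train-mode output ignores running buffers
```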
