Question about TSA attention weight #278

Open
Angericky opened this issue Aug 19, 2024 · 0 comments
Hi, thanks for your nice work!

I'm confused by the initialization of the attention weights. They are all set to zero:

# from mmcv.cnn import constant_init
constant_init(self.attention_weights, val=0., bias=0.)

Why are the weights set to zero? Won't the gradients vanish in this linear layer? Since the weights are all zero, it seems their gradients would also be zero during back-propagation.
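
For reference, here is a quick standalone experiment I used to probe this (a minimal sketch in plain PyTorch with made-up shapes, not the repo's actual module). For a linear layer y = Wx + b, the weight gradient is dL/dW = dL/dy · xᵀ, which depends on the input x rather than on W itself, so zero weights do not by themselves force zero gradients:

```python
# Minimal sketch: does a zero-initialized linear layer still get gradients?
# (Plain PyTorch stand-in for constant_init(self.attention_weights, val=0., bias=0.);
# the 16/4 shapes and the loss below are made up for illustration.)
import torch
import torch.nn as nn

attention_weights = nn.Linear(16, 4)
nn.init.constant_(attention_weights.weight, 0.0)  # same effect as constant_init(val=0.)
nn.init.constant_(attention_weights.bias, 0.0)

x = torch.randn(2, 16)                    # hypothetical query features
logits = attention_weights(x)             # all zeros at the first step
attn = logits.softmax(dim=-1)             # uniform attention over the 4 slots
loss = (attn * torch.randn(2, 4)).sum()   # any loss that depends on attn
loss.backward()

print(attention_weights.weight.grad.abs().sum())  # nonzero in general
```

So the gradient does not appear to be identically zero here, but I'd still like to understand why zero is the chosen initialization for the TSA attention weights.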
