Update models_mamba.py #90

Open · wants to merge 1 commit into main
Conversation

@mdchuc mdchuc commented May 30, 2024

While trying to train Vision Mamba in bidirectional mode within a masked autoencoder network, I ran into NaN losses. Switching from mixed-precision to full-precision training fixed the problem, but it nearly doubled training time. Looking at the code, summing the forward and backward hidden_states/residuals roughly doubles their magnitude compared to the original hidden states (after patch embedding). Dividing the sum by 2 resolved the NaN loss, and mixed-precision training can continue.
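For reference, a minimal sketch of the idea behind the change, assuming the bidirectional block sums the forward and backward scan outputs before the next layer; the function and tensor names below are illustrative, not the actual code in models_mamba.py:

```python
import torch

def combine_bidirectional(hidden_f, hidden_b, residual_f, residual_b):
    # Summing the two scan directions roughly doubles the activation
    # magnitude relative to the post-patch-embedding hidden states;
    # in fp16 this can overflow and surface as NaN losses downstream.
    # Averaging keeps the magnitude at the scale of a single direction.
    hidden = (hidden_f + hidden_b) / 2
    residual = (residual_f + residual_b) / 2
    return hidden, residual

# Quick check: the averaged output never exceeds the larger input's scale
# (shapes here are arbitrary, chosen only for illustration).
h_f = torch.randn(2, 196, 192, dtype=torch.float16)
h_b = torch.randn(2, 196, 192, dtype=torch.float16)
h, r = combine_bidirectional(h_f, h_b, h_f, h_b)
assert h.abs().max() <= torch.maximum(h_f.abs().max(), h_b.abs().max())
```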
