
Fine Tuning The Model #1

Open

shank885 opened this issue Jun 23, 2022 · 3 comments

Comments

@shank885

I am trying to fine-tune the HDN model on the POT dataset. I have preprocessed the POT dataset as instructed. However, the training loss is very jumpy, fluctuating between roughly 1 and 600. While debugging the training flow I found that this happens when the supervised training samples are fed to the model. In addition, the loss for the HMNET model comes out as NaN. Even after training for just one epoch, the model's performance degrades considerably. Is there a bug in the training pipeline? Please assist.

@zhanxinrui
Owner

I'm sorry, I somehow missed this issue. Have you solved the problem since then?
Could you answer the following questions?

  1. Did training on COCO14 and GOT-10K go well in your experiment?
  2. Have you visualized the input patches to make sure the augmentation and the data format are OK?
  3. Is the Similarity branch OK?
     We did try fine-tuning the Similarity branch on POT a while ago and the performance improved, so there may be some potential bugs in HMNET. However, I think the NaN loss is probably caused by a setting being configured incorrectly. You can try to localize which loss term becomes NaN and check whether its labels are set correctly (for example, supervise the corner offsets to be zero and see whether the loss still fluctuates); see the sketch after this list.
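For illustration only (this is not code from the HDN repository), here is a minimal PyTorch-style sketch of how each loss term could be logged separately and flagged for NaN/Inf to localize the offending branch. The term names `similarity_loss` and `corner_offset_loss` are hypothetical placeholders:

```python
import torch

def check_loss_terms(loss_dict):
    """Report each loss term and flag NaN/Inf so the offending branch
    (e.g. similarity vs. corner-offset loss) can be isolated."""
    for name, value in loss_dict.items():
        v = value.detach()
        if torch.isnan(v).any():
            status = "NaN"
        elif torch.isinf(v).any():
            status = "Inf"
        else:
            status = "ok"
        print(f"{name}: {v.item():.4f} [{status}]")

# Hypothetical loss terms from one training step.
losses = {
    "similarity_loss": torch.tensor(0.8),
    "corner_offset_loss": torch.tensor(float("nan")),  # the suspect term
}
check_loss_terms(losses)
```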

@pinguo-huxiaohe

I also encountered NaN losses. In my case it happened because I fed only supervised data to the model without any unsupervised data, so a division by zero occurred when the unsupervised loss was averaged, producing NaN. When fine-tuning on the POT dataset there is presumably no unsupervised data available either, which might be the cause in your case. In my experience these NaN losses had no practical impact on training.
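As a minimal sketch (not the actual HDN code), this shows how such a 0/0 can arise when a loss is averaged over zero unsupervised samples, and one way to guard against it. The function name `masked_mean` and the mask layout are assumptions for illustration:

```python
import torch

def masked_mean(per_sample_loss, mask):
    """Average a per-sample loss over the samples selected by `mask`,
    returning zero instead of NaN (0/0) when no samples are selected."""
    num = mask.sum()
    if num == 0:
        # No unsupervised samples in this batch: contribute nothing
        # rather than dividing by zero.
        return per_sample_loss.sum() * 0.0
    return (per_sample_loss * mask).sum() / num

# A batch that happens to contain only supervised samples:
per_sample_loss = torch.tensor([0.5, 1.2, 0.3])
unsup_mask = torch.tensor([0.0, 0.0, 0.0])  # no unsupervised samples
print(masked_mean(per_sample_loss, unsup_mask))  # tensor(0.)
```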

@pinguo-huxiaohe

Do you have any experience with training from scratch, or with training on top of the improved model provided by the author? I have tried both training from scratch and training from the pre-trained models using the COCO14 and GOT-10K data, but I could not reach the same accuracy as the improved model the author mentions. Do you have any tips or experiences to share?
