Out of memory during eval but not train? #51
Comments
The problem is that all of the encoded predictions are kept in memory, so as more predictions are made, more RAM is needed.
Moreover, as you noticed, the GPU is not used during eval, so you might want to change that too. |
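A minimal sketch of the general idea, assuming the metric can be updated batch by batch (plain accuracy here). The function and argument names are hypothetical, and this is not the scheme from the linked fork. Instead of collecting every prediction and scoring at the end, it scores each batch and keeps only running totals, so memory stays bounded by one batch:

```python
from typing import Callable, Iterable, List, Sequence


def evaluate_streaming(
    batches: Iterable[Sequence[int]],
    references: Iterable[Sequence[int]],
    predict: Callable[[Sequence[int]], List[int]],
) -> float:
    """Compute accuracy batch by batch without storing every prediction."""
    correct = 0
    total = 0
    for inputs, refs in zip(batches, references):
        # Only the current batch's predictions are held at once.
        preds = predict(inputs)
        correct += sum(p == r for p, r in zip(preds, refs))
        total += len(refs)
        # `preds` is rebound on the next iteration, so the previous
        # batch's predictions can be freed instead of piling up.
    return correct / total if total else 0.0


# Usage with a toy "model" that predicts x % 2:
acc = evaluate_streaming([[1, 2], [3, 4]], [[1, 0], [1, 1]], lambda xs: [x % 2 for x in xs])
print(acc)  # 0.75
```

With a real model, `predict` would also be the place to move inputs to the GPU and bring back only the decoded outputs.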
@gianfrancodemarco I have the same problem. Could you please share the detailed modified evaluation script? Thanks! |
@zhenghao977 I've described a scheme you can use to modify the script. Alternatively, the implementation is in my fork of the project (though I don't like to advertise it here...). However, the source code there has been heavily modified. |
@gianfrancodemarco Thanks for the insight. However, I could not find the code you mentioned in their repo. Do you mind telling me where I can find it? |
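This is an automatic vacation reply from QQ Mail.
Hello, I am currently on vacation and cannot reply to your email personally. I will get back to you as soon as possible after my vacation ends.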
|
@thomascong121 Hi! I recently encountered a problem similar to yours. Did you find a way to modify the evaluation script? Thanks. |
@thomascong121 @WayneWong97 our version is here |
@gianfrancodemarco Thanks! I fixed the bug with your scheme. |
@WayneWong97 I have the same problem. Could you please share the detailed evaluation script you modified with the scheme? Thanks! |
@Sunhxxin You can find the scheme in @gianfrancodemarco's link.
|
@WayneWong97 Thanks! |
For me, this happened because I did not set the accumulation steps when launching the program, so the default value of None was used. Setting eval_acc to 1 solved the problem. |
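Why the accumulation setting matters can be sketched with a small simulation. This assumes `eval_acc` behaves like the `eval_accumulation_steps` option of Hugging Face's `TrainingArguments`, which, per its documentation, keeps all prediction tensors on the accelerator until evaluation ends when left unset, and moves them to the CPU every N steps otherwise. The function below is hypothetical and only uses lists to stand in for device and host memory:

```python
from typing import Iterable, List, Optional, Tuple


def gather_predictions(
    batches: Iterable[List[int]], accumulation_steps: Optional[int]
) -> Tuple[List[int], int]:
    """Simulate when outputs are flushed from the device buffer to the host.

    Returns (host_predictions, peak_device_buffer_size).
    """
    device_buffer: List[int] = []  # stands in for tensors kept on the GPU
    host: List[int] = []           # stands in for CPU-side storage
    peak = 0
    for step, batch in enumerate(batches, start=1):
        device_buffer.extend(batch)
        peak = max(peak, len(device_buffer))
        # With None, nothing is flushed until the loop ends.
        if accumulation_steps is not None and step % accumulation_steps == 0:
            host.extend(device_buffer)
            device_buffer.clear()
    # Final flush of whatever is still on the "device".
    host.extend(device_buffer)
    device_buffer.clear()
    return host, peak


batches = [[1, 2], [3, 4], [5, 6]]
print(gather_predictions(batches, None))  # ([1, 2, 3, 4, 5, 6], 6)
print(gather_predictions(batches, 1))     # ([1, 2, 3, 4, 5, 6], 2)
```

Note that a smaller value bounds device memory but still accumulates everything on the host, so it does not by itself address the host RAM growth described in the issue.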
Please try the latest version. It should have fixed the problem. |
Description:
During evaluation, host RAM usage (not CUDA memory) keeps increasing until the program is eventually killed.
Server configuration:
GPU: 2x V100S
RAM: 256 GB
Did you modify any source files in HuggingFace Transformers? What configuration is needed to run the code?