out of memory. #2

Sorry to bother you. When I run your code, I get "RuntimeError: CUDA out of memory". Lowering the batch size and switching to a GPU with more memory did not fix it. What part should I check, or how can I resolve this? Thanks.
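Since lowering the batch size did not resolve the error, the memory pressure likely comes from the per-instance state rather than the batch dimension, which is consistent with the advice in the replies below. As a first debugging step, here is a minimal sketch (generic PyTorch calls, not code from this repository) for checking how much of the card's memory is actually in use right before the failing call:

```python
import torch

# Minimal sketch: report current GPU memory usage (standard PyTorch calls,
# not part of this repository). Call this just before the line that raises
# "CUDA out of memory" to see how close training already is to the limit.
def report_gpu_memory() -> None:
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return
    device = torch.device("cuda")
    allocated = torch.cuda.memory_allocated(device) / 1e9  # memory held by live tensors
    reserved = torch.cuda.memory_reserved(device) / 1e9    # memory held by the caching allocator
    total = torch.cuda.get_device_properties(device).total_memory / 1e9
    print(f"allocated {allocated:.2f} GB | reserved {reserved:.2f} GB | total {total:.2f} GB")

report_gpu_memory()
```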
Comments
You can reduce the training scale, i.e. the number of jobs and machines. Also, I gave the training hyperparameters and hardware configuration in the paper; you can refer to those.
Another thing: I have uploaded pre-trained models, so you can run validation directly for testing.
My guess is that the problem comes from the author converting the adjacency matrix into a sparse representation.
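Whatever the exact cause, the scale of the adjacency matrix is a plausible suspect: for a graph with n operation nodes, a dense adjacency matrix costs O(n^2) memory regardless of how many edges exist. A minimal sketch (generic PyTorch, not the repository's code; the size n = 600 is only an illustrative guess) comparing the footprint of a dense matrix with its sparse COO form:

```python
import torch

# Sketch: memory footprint of a dense adjacency matrix vs. its sparse COO form.
# n = 600 is an illustrative guess (e.g. a 30 x 20 instance with 600 operations);
# the repository's actual graph construction may differ.
n = 600
dense = torch.zeros(n, n)
rows = torch.randint(n, (2000,))
cols = torch.randint(n, (2000,))
dense[rows, cols] = 1.0                          # ~2000 random edges

sparse = dense.to_sparse().coalesce()            # keep only the non-zero entries

dense_mb = dense.element_size() * dense.nelement() / 1e6
sparse_mb = (sparse.indices().element_size() * sparse.indices().nelement()
             + sparse.values().element_size() * sparse.values().nelement()) / 1e6
print(f"dense adjacency:  {dense_mb:.2f} MB")
print(f"sparse adjacency: {sparse_mb:.3f} MB")
```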
A 10×10 instance can be trained on an 8 GB card; 30×20 needs roughly a 32 GB V100.
Where do I change this?
Not sure why, but I rented an A100 and still got out of memory.
Change the configs parameters in Params.
Same here; it still won't run even on an A100.
In Params.py, shrink the instance-size settings on lines 7 and 8 and it will run; I tested it myself and 10×5 needs only about 3 GB.
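For reference, the settings being described are the instance-size parameters. The following is only a hypothetical sketch of what such a block typically looks like; the argument names `n_j` and `n_m` and the defaults are assumptions, so check the actual Params.py in the repository for the real names and line numbers:

```python
import argparse

# Hypothetical sketch of instance-size settings in a Params.py-style file.
# The argument names n_j / n_m and the defaults below are assumptions, not the
# repository's actual values; reduce whichever parameters control the number of
# jobs and machines to shrink GPU memory usage.
parser = argparse.ArgumentParser(description="FJSP DRL training configuration")
parser.add_argument("--n_j", type=int, default=10, help="number of jobs per training instance")
parser.add_argument("--n_m", type=int, default=5, help="number of machines per training instance")
configs = parser.parse_args()
```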
Could you share your code? I keep getting errors while debugging.
hyq4310, add me on WeChat and I'll send it to you.
Solved; just reduce the numbers of jobs and machines.