Cannot run python3 -m vall_e.train yaml=config/test/nar.yml #81
Comments
The advice GPT gave you is correct.
Have you managed to get it running? Even after a lot of back-and-forth debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? No matter what I do, it fails at the step python3 -m vall_e.train yaml=config/test/nar.yml --debug.
Is something missing?
'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.
Encountered the same problem: vall_e.train stopped working. At first glance it looks like a change was made to Microsoft's DeepSpeed code. When Microsoft's module is initialized, it looks for a config object containing the attribute optimizer_name. vall_e uses DeepSpeed and initializes it as part of the Engine class in utils/engines.py, but it does not pass the required config parameter. I am not familiar with this code, but I can see that other classes in utils/engines.py (e.g. the Engines class) do use a config object that probably carries the necessary information. Can anyone help?
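For reference, a minimal sketch of how a DeepSpeed engine is typically handed a config, assuming a current DeepSpeed API; the model and config values here are placeholders, not the project's actual settings. Without such a config, DeepSpeed's internal config object stays None and attribute lookups like optimizer_name fail with the NoneType error above.

import deepspeed
import torch

# Placeholder model for illustration only; the real project wraps its own model here.
model = torch.nn.Linear(10, 10)

# Minimal DeepSpeed config dict; DeepSpeed reads the optimizer name from here.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,  # the config argument that engines.py reportedly did not pass
)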
It runs fine for me. I suspect the dimensions of one.qnt.pt are wrong. Try deleting all the one-related .pt and .txt files and running with only the bundled test.pt and .txt files. If that runs, it confirms Encodec had a problem encoding one.wav; re-encode it and check whether you get a .pt with shape [1, 8, x].
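A minimal sketch of the shape check suggested above, assuming the file sits at data/train/one.qnt.pt (the path that appears in the traceback below):

import torch

# Load the quantized codes and verify they form a 3-D tensor of shape [1, 8, T].
codes = torch.load("data/train/one.qnt.pt")
print("shape:", tuple(codes.shape))
assert codes.dim() == 3 and codes.shape[:2] == (1, 8), \
    "one.qnt.pt should have shape [1, 8, T]; re-encode one.wav if it does not"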
See the discussion here: #87 |
thanks |
thx |
I opened a pull request that deals with this issue. Make sure mpi4py is installed correctly, as I rely on the default initialization of distributed training, which may look for an MPI installation.
!pip install deepspeed==0.8.3 fixed it for me.
Awesome, thanks, that solved the training problem.
|
python3 -m vall_e.train yaml=config/test/nar.yml --debug
Running this command gave the error below. ChatGPT-4 says it might be a problem with the source files but could not give concrete advice, so I can only ask the author.
File "/sam/vall-e/vall_e/utils/trainer.py", line 150, in train
for batch in _make_infinite_epochs(train_dl):
File "/sam/vall-e/vall_e/utils/trainer.py", line 103, in _make_infinite_epochs
yield from dl
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in next
data = self._next_data()
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/sam/vall-e/vall_e/data.py", line 185, in getitem
proms = self.sample_prompts(spkr_name, ignore=path)
File "/sam/vall-e/vall_e/data.py", line 172, in sample_prompts
raise RuntimeError("All tensors in prom_list are zero-dimensional.")
RuntimeError: All tensors in prom_list are zero-dimensional.
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
The script ChatGPT-4 wrote for me:
root@CH-202203180108:/sam/vall-e/data# cat 1.py
import torch
train_qnt = torch.load('/sam/vall-e/data/train/one.qnt.pt')
print("Train qnt shape:", train_qnt.shape)
val_qnt = torch.load('/sam/vall-e/data/val/test.qnt.pt')
print("Val qnt shape:", val_qnt.shape)
root@CH-202203180108:/sam/vall-e/data# python3 1.py
Train qnt shape: torch.Size([3])
Val qnt shape: torch.Size([1, 8, 149])
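If the train tensor really comes back as torch.Size([3]) instead of [1, 8, T], re-encoding one.wav should produce a usable one.qnt.pt. Below is a hedged sketch using the encodec package directly; the input and output paths are assumptions, and if the repo exposes its own quantization helper (e.g. a vall_e.emb.qnt command), that would be the more direct route.

import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Assumed paths; adjust to your layout.
wav_path = "data/train/one.wav"
out_path = "data/train/one.qnt.pt"

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 6 kbps corresponds to 8 codebooks

wav, sr = torchaudio.load(wav_path)
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))        # list of (codes, scale) tuples
codes = torch.cat([c for c, _ in frames], dim=-1)  # expected shape [1, 8, T]

torch.save(codes, out_path)
print("saved", out_path, "with shape", tuple(codes.shape))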
Directory structure of data:
root@CH-202203180108:/sam/vall-e/data# ll
total 24
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ./
drwxr-xr-x 8 root root 4096 Mar 30 23:45 ../
-rw-r--r-- 1 root root 216 Mar 30 21:07 1.py
drwxr-xr-x 2 root root 4096 Mar 28 14:27 test/
drwxr-xr-x 2 root root 4096 Mar 30 23:34 train/
drwxr-xr-x 2 root root 4096 Mar 28 14:55 val/
Contents of the train directory:
root@CH-202203180108:/sam/vall-e/data# ll train/
total 408
drwxr-xr-x 2 root root 4096 Mar 30 23:34 ./
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ../
-rw-r--r-- 1 root root 159 Mar 28 14:53 1.py
-rw-r--r-- 1 root root 37 Mar 28 14:49 one.phn.txt
-rw-r--r-- 1 root root 747 Mar 28 14:54 one.qnt.pt
-rw-r--r-- 1 root root 26 Mar 28 14:38 test.phn.txt
-rw-r--r-- 1 root root 10286 Mar 28 14:38 test.qnt.pt
-rw-r--r-- 1 root root 380750 Mar 30 23:34 test.wav
root@CH-202203180108:/sam/vall-e/data#
I got this error and don't know how to fix it.