
Cannot run python3 -m vall_e.train yaml=config/test/nar.yml #81

Open
samual30000 opened this issue Mar 30, 2023 · 12 comments


@samual30000

python3 -m vall_e.train yaml=config/test/nar.yml --debug

I got an error when running this. ChatGPT-4 said it might be a problem with the source files, but it couldn't give any concrete advice, so I can only ask the author.

trainer.train(
  File "/sam/vall-e/vall_e/utils/trainer.py", line 150, in train
    for batch in _make_infinite_epochs(train_dl):
  File "/sam/vall-e/vall_e/utils/trainer.py", line 103, in _make_infinite_epochs
    yield from dl
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/sam/vall-e/vall_e/data.py", line 185, in __getitem__
    proms = self.sample_prompts(spkr_name, ignore=path)
  File "/sam/vall-e/vall_e/data.py", line 172, in sample_prompts
    raise RuntimeError("All tensors in prom_list are zero-dimensional.")
RuntimeError: All tensors in prom_list are zero-dimensional.

Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
Converted path: /sam/vall-e/data/train/one.qnt.pt -> /sam/vall-e/data/train/one.qnt.pt
Loaded tensor from /sam/vall-e/data/train/one.qnt.pt with shape: torch.Size([])
Added tensor with shape: torch.Size([])
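The crash comes from sample_prompts finding only zero-dimensional tensors for the speaker: the broken one.qnt.pt loads with shape torch.Size([]). A minimal sketch of the kind of guard that surfaces such a file early, written here in pure Python over shape tuples (with torch you would pass tuple(t.shape)); the function name is illustrative, not from the repo:

```python
# Hypothetical guard illustrating why sample_prompts raises: a usable
# EnCodec prompt tensor must have at least one dimension.
def usable_prompts(shapes):
    """Filter out zero-dimensional entries; raise if nothing usable remains.

    `shapes` is a list of tensor shapes as tuples, e.g. tuple(t.shape).
    """
    good = [s for s in shapes if len(s) > 0]
    if not good:
        raise RuntimeError("All tensors in prom_list are zero-dimensional.")
    return good

# A broken one.qnt.pt loads with shape torch.Size([]) -> () here,
# while a valid file would contribute something like (1, 8, 149).
print(usable_prompts([(1, 8, 149), ()]))  # keeps only the valid shape
```

With only zero-dimensional entries in the list, the same RuntimeError as in the traceback is raised.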

The script ChatGPT-4 wrote for me:
root@CH-202203180108:/sam/vall-e/data# cat 1.py

import torch

# Inspect the shapes of the quantized audio files used for training and validation.
train_qnt = torch.load('/sam/vall-e/data/train/one.qnt.pt')
print("Train qnt shape:", train_qnt.shape)

val_qnt = torch.load('/sam/vall-e/data/val/test.qnt.pt')
print("Val qnt shape:", val_qnt.shape)

root@CH-202203180108:/sam/vall-e/data# python3 1.py
Train qnt shape: torch.Size([3])
Val qnt shape: torch.Size([1, 8, 149])

Directory structure of data:
root@CH-202203180108:/sam/vall-e/data# ll
total 24
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ./
drwxr-xr-x 8 root root 4096 Mar 30 23:45 ../
-rw-r--r-- 1 root root 216 Mar 30 21:07 1.py
drwxr-xr-x 2 root root 4096 Mar 28 14:27 test/
drwxr-xr-x 2 root root 4096 Mar 30 23:34 train/
drwxr-xr-x 2 root root 4096 Mar 28 14:55 val/

Files in the train directory:

root@CH-202203180108:/sam/vall-e/data# ll train/
total 408
drwxr-xr-x 2 root root 4096 Mar 30 23:34 ./
drwxr-xr-x 5 root root 4096 Mar 30 21:07 ../
-rw-r--r-- 1 root root 159 Mar 28 14:53 1.py
-rw-r--r-- 1 root root 37 Mar 28 14:49 one.phn.txt
-rw-r--r-- 1 root root 747 Mar 28 14:54 one.qnt.pt
-rw-r--r-- 1 root root 26 Mar 28 14:38 test.phn.txt
-rw-r--r-- 1 root root 10286 Mar 28 14:38 test.qnt.pt
-rw-r--r-- 1 root root 380750 Mar 30 23:34 test.wav
root@CH-202203180108:/sam/vall-e/data#

It errors out and I don't know how to fix it.

@Xiangbj17

GPT's advice is right.
.pt files encoded with EnCodec all have shape [1, 8, time_step].
/sam/vall-e/data/train/one.qnt.pt has only one dimension, which isn't right; check whether something went wrong in your qnt encoding step.
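The expected-shape check described above can be sketched as a small helper (pure Python over shape tuples; the function name is illustrative, not part of the repo):

```python
# Illustrative check: EnCodec-quantized .qnt.pt files should have shape
# [1, 8, time_step] (batch, quantizer levels, frames).
def looks_like_encodec_codes(shape):
    return (
        len(shape) == 3
        and shape[0] == 1
        and shape[1] == 8
        and shape[2] > 0
    )

print(looks_like_encodec_codes((1, 8, 149)))  # the good val file -> True
print(looks_like_encodec_codes((3,)))         # the broken one.qnt.pt -> False
```

Running this over every .qnt.pt in data/train would have flagged one.qnt.pt before training started.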

@samual30000
Author

> GPT's advice is right. .pt files encoded with EnCodec all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which isn't right; check whether something went wrong in your qnt encoding step.

Did you get it running? Even after all the back-and-forth and debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? It just won't run once I get to the python3 -m vall_e.train yaml=config/test/nar.yml --debug step.

@samual30000
Author

> GPT's advice is right. .pt files encoded with EnCodec all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which isn't right; check whether something went wrong in your qnt encoding step.

Is something missing?

@samual30000
Author

'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.

@ilanshib

I encountered the same problem; vall_e.train stopped working. At first look it seems a change was applied to Microsoft's DeepSpeed code: when Microsoft's module is initialized, it looks for a config object that contains the attribute optimizer_name.

vall_e uses DeepSpeed and initializes it as part of the 'Engine' class in utils/engines.py, but it does not pass the required config parameter. I am not familiar with this code, but I could see that other classes in utils/engines.py (e.g. the 'Engines' class) do use a config object that probably has the necessary information.

Can anyone help?

@Xiangbj17

> GPT's advice is right. .pt files encoded with EnCodec all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which isn't right; check whether something went wrong in your qnt encoding step.
>
> Did you get it running? Even after all the back-and-forth and debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? It just won't run once I get to the python3 -m vall_e.train yaml=config/test/nar.yml --debug step.

It runs fine for me. I suspect the dimensions of one.qnt.pt are the problem. Try deleting the one-related .pt and .txt files and run with only the bundled test .pt and .txt to see whether it still errors. If it runs normally, that proves EnCodec had a problem encoding one.wav; re-encode it and check whether you get a .pt with shape [1, 8, x].

@ilanshib

> 'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.

See the discussion here: #87

@samual30000
Author

> 'NoneType' object has no attribute 'optimizer_name'; self._config is a NoneType.
>
> See the discussion here: #87

thanks

@samual30000
Author

> GPT's advice is right. .pt files encoded with EnCodec all have shape [1, 8, time_step]. /sam/vall-e/data/train/one.qnt.pt has only one dimension, which isn't right; check whether something went wrong in your qnt encoding step.
>
> Did you get it running? Even after all the back-and-forth and debugging with GPT-4 I still can't. Is the project missing some training data, or is something else missing? It just won't run once I get to the python3 -m vall_e.train yaml=config/test/nar.yml --debug step.
>
> It runs fine for me. I suspect the dimensions of one.qnt.pt are the problem. Try deleting the one-related .pt and .txt files and run with only the bundled test .pt and .txt to see whether it still errors. If it runs normally, that proves EnCodec had a problem encoding one.wav; re-encode it and check whether you get a .pt with shape [1, 8, x].

thx

@kgasenzer

> I encountered the same problem; vall_e.train stopped working. At first look it seems a change was applied to Microsoft's DeepSpeed code: when Microsoft's module is initialized, it looks for a config object that contains the attribute optimizer_name.
>
> vall_e uses DeepSpeed and initializes it as part of the 'Engine' class in utils/engines.py, but it does not pass the required config parameter. I am not familiar with this code, but I could see that other classes in utils/engines.py (e.g. the 'Engines' class) do use a config object that probably has the necessary information.
>
> Can anyone help?

I opened a pull request that deals with this issue. Make sure mpi4py is installed correctly, as I use the default initialization of distributed training, which may look for MPI.

@samual30000
Author

!pip install deepspeed==0.8.3 made it work.
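Before training, it may help to confirm which DeepSpeed version is actually installed against the pin that worked here. A sketch using the standard library (the parse helper and KNOWN_GOOD constant are illustrative, not from the repo):

```python
# Illustrative version check: parse a "major.minor.patch" string so the
# installed DeepSpeed version can be compared against the known-good pin.
from importlib import metadata

def parse_version(v):
    # Take only the numeric major.minor.patch components.
    return tuple(int(p) for p in v.split(".")[:3])

KNOWN_GOOD = parse_version("0.8.3")

try:
    installed = parse_version(metadata.version("deepspeed"))
    if installed != KNOWN_GOOD:
        print(f"deepspeed {installed} != pinned {KNOWN_GOOD}; "
              "consider `pip install deepspeed==0.8.3`")
except metadata.PackageNotFoundError:
    print("deepspeed is not installed")
```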

@tangzhimiao

Awesome, thx, that solved the train problem.

> !pip install deepspeed==0.8.3 made it work.
