RAM usage keeps increasing during one epoch. #19
Hello, may I ask how the audio signal features are extracted?
@rjc7011855
Hello, how do you get the landmarks, please?
@xz0305

import cv2
import dlib
import numpy as np


class LandmarksExtractor(object):

    def __init__(self, model_path):
        # dlib's frontal face detector and 68-point shape predictor
        self.detector = dlib.get_frontal_face_detector()
        self.predictor = dlib.shape_predictor(model_path)

    def forward(self, image, is_rgb=True):
        if not is_rgb:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        landmarks = self.__predict(image)
        return landmarks

    def __predict(self, image):
        # upsample the image once so smaller faces are detected
        faces = self.detector(image, 1)
        assert len(faces) > 0, 'no face detected'
        face = faces[0]  # use the first detected face
        landmarks = self.predictor(image, face)
        landmarks = self.shape_to_np(landmarks)
        return landmarks

    @staticmethod
    def shape_to_np(shape, dtype=int):
        # convert dlib's full_object_detection to a (68, 2) array of (x, y)
        coords = np.zeros((68, 2), dtype=dtype)
        for i in range(68):
            coords[i] = (shape.part(i).x, shape.part(i).y)
        return coords
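A quick usage sketch (the model file below is dlib's standard 68-point predictor; the frame path is hypothetical):

# Hypothetical usage of the LandmarksExtractor above.
extractor = LandmarksExtractor('shape_predictor_68_face_landmarks.dat')
image = cv2.imread('frame_0001.png')                # OpenCV loads BGR
landmarks = extractor.forward(image, is_rgb=False)  # -> (68, 2) int array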
Thank you very much!
Hello, may I ask how your reproduction results turned out? Could we discuss them?
I trained for some epochs; below are some results. After preprocessing I had 400+ video clips in total. The author only provided the video names for the training set, not for the test set, so I simply split the dataset randomly. Since RAM usage kept increasing during training (see the issue description at the top), after several experiments I ended up using the first 1100 frames of each video (taking every other frame) for training and testing. The videos in difftalk_demo.zip are from the test set, tested on the first 720 consecutive frames. You can see it works to some extent. In later experiments I plan to reduce the number of videos and use all frames of each video: using data where several clips can be cut from the same video, with one clip as the test set and the other clips as the training set, and then training again to see how it goes. There is no validation set during training, only a test set, and the final results are also observed on the test set, so there may be a risk of data leakage. The author probably did the same.
How did you download the HDTF data? The videos I downloaded have no sound.
@xz0305 |
@quqixun |
Thank you for sharing this link. |
Reshape (3000, 16, 29) into (3000, 8, 16, 29). Or you can refer to the code at https://github.com/miu200521358/NeuralVoicePuppetryMMD/blob/master/Audio2ExpressionNet/Training%20Code/data/audio_dataset.py#L85, where there are two ways to generate the sequence.
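For illustration, here is a minimal sketch of one such windowing scheme in NumPy (the function name and the centered window with clamped boundaries are my assumptions; the linked audio_dataset.py may use different offsets):

import numpy as np

def make_audio_sequences(feats, seq_len=8):
    # feats: (N, 16, 29) DeepSpeech features, one window per video frame.
    # Returns (N, seq_len, 16, 29): for each frame, a stack of seq_len
    # neighboring feature windows, with indices clamped at the boundaries.
    n = feats.shape[0]
    out = np.zeros((n, seq_len) + feats.shape[1:], dtype=feats.dtype)
    half = seq_len // 2
    for i in range(n):
        for j in range(seq_len):
            k = min(max(i - half + j, 0), n - 1)  # clamp to [0, n-1]
            out[i, j] = feats[k]
    return out

Clamping at the boundaries gives every frame exactly seq_len windows, which should also avoid the unequal-size stack error mentioned elsewhere in this thread.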
May I ask whether you ran the full evaluation? In my tests, this method does not seem to work well on identities that were not seen in the training set.
Do the numbers of extracted video frames and audio frames correspond? I processed the videos to 25 fps and took the first 1000 frames, so the audio should correspond to 40 s; at a 16 kHz sampling rate it has 2400 frames in total. How should this be handled?
Hi, I have a basic question and I hope you can help me with it. How can we specify the number of epochs in this code? The model only trains for one epoch on my machine.
The audio processing follows AD-NeRF, using DeepSpeech as the audio feature extractor. I did not see RAM usage keep increasing in my experiments; if you can find the cause, feel free to point it out and fix it, and the change can be merged into this project. The results in difftalk_demo.zip look acceptable. In our practical application we also added a post-processing step. Specifically, we used [Real-time intermediate flow estimation for
Hi, could I know whether your downloaded HDTF videos have an audio stream? Could you share the download link? Many thanks.
I am getting [x, 16, 29], where x is the number of frames, after running deepspeech_features.
Thanks, I got your answer in the comment above.
Hi, could I know how to download the dataset? I ran into some issues while downloading it. Thank you very much.
I followed AD-NeRF's processing. Do you also run into RuntimeError: stack expects each tensor to be equal size, but got [4, 16, 29] at entry 0 and [8, 16, 29] at entry 1?
May I ask, in the instructions, the
After preprocessing the HDTF dataset, I got 415 videos.
249 videos (60%) were randomly selected as the training set, and the others (40%) formed the test set (see the split sketch below).
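For reference, a minimal sketch of such a random split (the seed and the placeholder video names are my own assumptions):

import random

random.seed(0)  # any fixed seed, for reproducibility
videos = sorted('video_%03d' % i for i in range(415))  # placeholder names
random.shuffle(videos)
n_train = int(len(videos) * 0.6)  # 415 * 0.6 = 249
train_videos, test_videos = videos[:n_train], videos[n_train:]
print(len(train_videos), len(test_videos))  # 249 166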
The first 1500 frames of each video were extracted for training with a stride of 2.
So I got 277,117 frames in the training set and 179,711 frames in the test set.
My machine has 4 A100 GPUs with 40 GB VRAM each, 377 GB RAM, and 72 GB of swap.
In training, the batch size is set to 16.
During the first epoch, RAM usage kept increasing.
At step 2743, all RAM (and even the swap space) was occupied, and training stopped.
Thus, 2743 * 16 * 4 = 175,552 is the maximum number of frames that can be used for training on my machine, and that does not even take the test set into account.
When I reduced both the training and test sets to 10,000 frames, the training process ran fine.
Questions for @sstzal:
I guess the cause of this problem is that too many logs accumulate in memory during training.
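One way to check this hypothesis is to log the process's resident memory during training (a generic sketch using psutil, not specific to this repo's trainer; the helper name is my own):

import os
import psutil

def log_rss(step, every=100):
    # Print the process's resident memory every `every` steps; if RSS
    # grows roughly linearly with the step count, something is accumulating.
    if step % every == 0:
        rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
        print('step %d: RSS %.2f GB' % (step, rss_gb))

Calling log_rss(step) inside the training loop should show whether memory grows steadily across steps.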