Found black and white video when sampling CogVideoX1.5-5B 10s using code from huggingface. #578

DZY-irene · 2024-12-03T13:24:33Z

System Info

None

Information

The official example scripts
My own modified scripts and tasks

Reproduction

Thank you for your wonderful work!
I'm sampling with VBench's GPT-enhanced prompts, and I've noticed that a few videos occasionally come out entirely black or white, or stay black for most of the clip with objects appearing only in the last second.

prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.

A.person.is.using.computer-0.mp4

prompt: A pristine white bathroom features a sleek, modern sink with a chrome faucet, set against a backdrop of glossy white tiles. The sink's surface is adorned with a neatly folded hand towel and a small potted plant, adding a touch of greenery. Adjacent to the sink, a contemporary toilet with a soft-close lid and a minimalist design stands out. The toilet's clean lines and the subtle sheen of its ceramic surface reflect the ambient light. The scene captures the essence of a serene, well-maintained bathroom, emphasizing cleanliness and modern aesthetics.

a.sink.and.a.toilet-1.mp4

prompt: A sleek black cat with piercing green eyes prowls gracefully through a dimly lit, mysterious alleyway, its fur glistening under the soft glow of a distant streetlamp. The cat pauses, ears perked, as it senses movement, its silhouette casting an elongated shadow on the cobblestone path. It then leaps effortlessly onto a nearby windowsill, where it sits, tail flicking, and gazes intently into the darkness. The scene transitions to a close-up of the cat's face, highlighting its sharp, alert features and the subtle twitch of its whiskers, capturing the essence of its enigmatic and nocturnal nature.

a.black.cat-2.mp4
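The symptom described above (clips that are fully black or white, or stay black until the last second) can be screened for automatically when sampling many seeds. A minimal sketch, assuming each frame can be converted to an HxWxC uint8 NumPy array (for example np.asarray(frame) on the PIL images that the pipeline's .frames[0] returns with the default output type); the function names and brightness thresholds are hypothetical and would need tuning on real outputs:

```python
import numpy as np


def flag_flat_frames(frames, dark_thresh=10.0, bright_thresh=245.0):
    """Return (index, label) pairs for frames that are almost uniformly black or white.

    frames: iterable of HxWxC uint8 images (anything np.asarray accepts, e.g. PIL images).
    The thresholds are hypothetical defaults on the 0-255 scale, not tuned values.
    """
    flagged = []
    for idx, frame in enumerate(frames):
        # Mean pixel intensity over all pixels and channels.
        mean = float(np.asarray(frame, dtype=np.float32).mean())
        if mean <= dark_thresh:
            flagged.append((idx, "black"))
        elif mean >= bright_thresh:
            flagged.append((idx, "white"))
    return flagged


def flat_fraction(frames, **thresholds):
    """Fraction of frames flagged as black or white (0.0 for an empty clip)."""
    frames = list(frames)
    if not frames:
        return 0.0
    return len(flag_flat_frames(frames, **thresholds)) / len(frames)


if __name__ == "__main__":
    # Synthetic 8-frame clip: black for six frames, content only at the end.
    black = np.zeros((4, 4, 3), dtype=np.uint8)
    gray = np.full((4, 4, 3), 128, dtype=np.uint8)
    clip = [black] * 6 + [gray] * 2
    print(flag_flat_frames(clip))
    print(flat_fraction(clip))
```

Mean brightness is a coarse heuristic: a genuinely dark scene (such as the dim alleyway in the cat prompt) could also be flagged, so the thresholds trade missed failures against false alarms.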

Here is my code:

import os
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Short prompts, one per line; used only to name the output files.
prompt_file = 'test_human.txt'
with open(prompt_file, "r") as f:
    prompts = [line.strip() for line in f if line.strip()]

# GPT-enhanced (longer) prompts, one per line; these are what the model is conditioned on.
prompt_file_longer = 'test_human_longer.txt'
with open(prompt_file_longer, "r") as f:
    prompts_longer = [line.strip() for line in f if line.strip()]

# Output directory named after the prompt file, e.g. "test_human".
output_dir = prompt_file.split('/')[-1].split('.')[0]
os.makedirs(output_dir, exist_ok=True)

pipe = CogVideoXPipeline.from_pretrained(
    "PATH",  # placeholder for the CogVideoX1.5-5B model path
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")
# Decode with VAE tiling and slicing to reduce peak GPU memory.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

for i, prompt_l in enumerate(prompts_longer):
    prompt = prompts[i]  # matching short prompt, used for the file name

    # Five samples per prompt, with seeds 42 to 46.
    for num in range(5):
        generator = torch.Generator(device="cuda").manual_seed(42 + num)
        video = pipe(
            prompt=prompt_l,
            height=768,
            width=1360,
            num_videos_per_prompt=1,
            num_inference_steps=50,
            num_frames=81,  # 81 frames play for about 10 s at the fps=8 used below
            guidance_scale=6,
            generator=generator,
        ).frames[0]

        output_path = os.path.join(output_dir, f"{prompt}-{num}.mp4")
        export_to_video(video, output_path, fps=8)
        print(f"Video saved to {output_path}")

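The short prompts (used for file names) are in the attached test_human.txt:
test_human.txt

The longer prompts (used for generation) are in the attached test_human_longer.txt:
test_human_longer.txt

The prompts in these files are the ones most likely to produce a black video.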

Expected behavior

I would like to understand why this happens.

zRzRzRzRzRzRzR · 2024-12-04T02:47:05Z

Hmm, num_frames needs to be changed to 161, with export_to_video(video, output_path, fps=16). Also, have you checked whether 5-second videos come out normally?

DZY-irene · 2024-12-04T05:32:19Z

I want to confirm the settings: for a 5-second video, is it num_frames=81 with export_to_video(video, output_path, fps=16)? And for a 10-second video, num_frames=161 with export_to_video(video, output_path, fps=16)? I noticed that CogVideoX1.5 outputs are all 16 fps, but the Hugging Face demo passes fps=8 to export_to_video.
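The arithmetic behind this question is playback length = num_frames / fps. A small sketch comparing the settings discussed in this thread; the helper names are hypothetical, and frames_for_seconds encodes the pattern fps * seconds + 1, which matches 81 (5 s) and 161 (10 s) at 16 fps but is an assumption inferred from these two values rather than a documented rule:

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length in seconds of num_frames frames exported at fps."""
    return num_frames / fps


def frames_for_seconds(seconds: int, fps: int = 16) -> int:
    """Hypothetical helper: fps * seconds + 1, matching 81 (5 s) and 161 (10 s) at 16 fps.

    This pattern is inferred from the two settings in this thread, not a documented rule.
    """
    return fps * seconds + 1


if __name__ == "__main__":
    # (num_frames, export fps) pairs mentioned in the thread.
    settings = {
        "original script (81 frames, fps=8)": (81, 8),
        "suggested 5 s (81 frames, fps=16)": (81, 16),
        "suggested 10 s (161 frames, fps=16)": (161, 16),
    }
    for name, (num_frames, fps) in settings.items():
        print(f"{name}: {clip_seconds(num_frames, fps):.3f} s")
    print(frames_for_seconds(5), frames_for_seconds(10))
```

By this arithmetic, the original script's 81 frames exported at fps=8 play for about 10 s; if the model's native rate is 16 fps as noted above, those same frames cover only about 5 s of generated motion played at half speed, rather than a true 10-second sample.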

DZY-irene · 2024-12-04T06:11:21Z

prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.
Output with 81 frames at 16 fps (5-second video):

A.person.is.using.computer-5s.mp4

Output with 161 frames at 16 fps (10-second video):

A.person.is.using.computer-10s.mp4

By the way, when sampling 5-second videos with the SAT version, everything works and no black or white videos appear. I suspect the problem may be specific to the diffusers version.

zRzRzRzRzRzRzR self-assigned this Dec 4, 2024
