Black and white videos when sampling 10-second CogVideoX1.5-5B videos with the Hugging Face example code #578

Open
1 of 2 tasks
DZY-irene opened this issue Dec 3, 2024 · 3 comments

@DZY-irene

System Info

None

Information

  • The official example scripts
  • My own modified scripts

Reproduction

Thank you for your wonderful work!
I'm using VBench's GPT-enhanced prompts for sampling, and I've noticed that occasionally a few videos come out completely black or white. In other cases, the video stays black for a long stretch and objects appear only in the last second.

prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.

A.person.is.using.computer-0.mp4

prompt: A pristine white bathroom features a sleek, modern sink with a chrome faucet, set against a backdrop of glossy white tiles. The sink's surface is adorned with a neatly folded hand towel and a small potted plant, adding a touch of greenery. Adjacent to the sink, a contemporary toilet with a soft-close lid and a minimalist design stands out. The toilet's clean lines and the subtle sheen of its ceramic surface reflect the ambient light. The scene captures the essence of a serene, well-maintained bathroom, emphasizing cleanliness and modern aesthetics.

a.sink.and.a.toilet-1.mp4

prompt: A sleek black cat with piercing green eyes prowls gracefully through a dimly lit, mysterious alleyway, its fur glistening under the soft glow of a distant streetlamp. The cat pauses, ears perked, as it senses movement, its silhouette casting an elongated shadow on the cobblestone path. It then leaps effortlessly onto a nearby windowsill, where it sits, tail flicking, and gazes intently into the darkness. The scene transitions to a close-up of the cat's face, highlighting its sharp, alert features and the subtle twitch of its whiskers, capturing the essence of its enigmatic and nocturnal nature.

a.black.cat-2.mp4

Here is my code:

import os
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Short prompts, used only to name the output files.
prompt_file = 'test_human.txt'
with open(prompt_file, "r") as f:
    prompts = [line.strip() for line in f if line.strip()]

# GPT-enhanced prompts actually passed to the pipeline;
# line i here corresponds to line i of test_human.txt.
prompt_file_longer = 'test_human_longer.txt'
with open(prompt_file_longer, "r") as f:
    prompts_longer = [line.strip() for line in f if line.strip()]

# Output directory named after the short-prompt file ("test_human").
output_dir = prompt_file.split('/')[-1].split('.')[0]
os.makedirs(output_dir, exist_ok=True)

pipe = CogVideoXPipeline.from_pretrained(
    "PATH",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")
# Lower VAE decoding memory: tiling splits frames spatially, slicing decodes the batch piece by piece.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

for i, prompt_l in enumerate(prompts_longer):
    prompt = prompts[i]

    # Five samples per prompt, seeds 42 to 46.
    for num in range(5):
        generator = torch.Generator(device="cuda").manual_seed(42 + num)
        video = pipe(
            prompt=prompt_l,
            height=768,
            width=1360,
            num_videos_per_prompt=1,
            num_inference_steps=50,
            num_frames=81,
            guidance_scale=6,
            generator=generator,
        ).frames[0]

        output_path = os.path.join(output_dir, f"{prompt}-{num}.mp4")
        export_to_video(video, output_path, fps=8)
        print(f"Video saved to {output_path}")

The prompt files are attached: test_human.txt and test_human_longer.txt.

The prompts in these files are the ones most likely to produce a black video.
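The script above saves the videos but does not check them. As a hypothetical aid for spotting the failure cases described in this issue (all-black or all-white clips, or clips that stay black until the last second), the sketch below computes the mean luma of each frame and reports the fraction of near-black and near-white frames and where visible content first appears. It assumes frames are RGB PIL images or H x W x 3 uint8 arrays, which is what `pipe(...).frames[0]` is assumed to return with the default output type. The thresholds and the helper names `frame_luma` and `summarize_video` are illustrative assumptions, not part of diffusers or CogVideoX.

```python
import numpy as np

# Illustrative thresholds (assumptions): mean luma on a 0-255 scale.
DARK_THRESHOLD = 10.0
BRIGHT_THRESHOLD = 245.0


def frame_luma(frame) -> float:
    """Mean Rec. 601 luma of one RGB frame (PIL image or H x W x 3 uint8 array)."""
    rgb = np.asarray(frame, dtype=np.float64)[..., :3]
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return float(luma.mean())


def summarize_video(frames) -> dict:
    """Report near-black/near-white frames and the first frame with visible content."""
    lumas = [frame_luma(f) for f in frames]
    if not lumas:
        raise ValueError("video has no frames")
    dark = [v < DARK_THRESHOLD for v in lumas]
    bright = [v > BRIGHT_THRESHOLD for v in lumas]
    # A frame counts as "content" if it is neither near-black nor near-white.
    first_content = next(
        (i for i, (d, b) in enumerate(zip(dark, bright)) if not (d or b)), None
    )
    return {
        "num_frames": len(lumas),
        "dark_fraction": sum(dark) / len(lumas),
        "bright_fraction": sum(bright) / len(lumas),
        "first_content_frame": first_content,  # None if every frame is flat
    }


# Usage with synthetic frames: 70 black frames followed by 11 mid-gray frames,
# mimicking a clip that stays black until near the end.
black = np.zeros((8, 8, 3), dtype=np.uint8)
gray = np.full((8, 8, 3), 128, dtype=np.uint8)
print(summarize_video([black] * 70 + [gray] * 11))
```

In the sampling loop, `summarize_video(video)` could be called right after the pipeline call and logged next to `output_path`, so flagged prompt/seed pairs can be collected without watching every clip. A mean-luma threshold is a crude heuristic: a legitimately dark scene, such as the black-cat alleyway prompt, may also score low, so flagged clips still need a visual check.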

Expected behavior

I would like to understand why this happens.

zRzRzRzRzRzRzR self-assigned this Dec 4, 2024
@zRzRzRzRzRzRzR
Member

Hmm, num_frames needs to be changed to 161, and the video should be exported with export_to_video(video, output_path, fps=16). Also, have you checked whether five-second videos come out normally?

@DZY-irene
Author

To confirm: for a 5-second video, the settings are num_frames=81 and export_to_video(video, output_path, fps=16), and for a 10-second video, num_frames=161 and export_to_video(video, output_path, fps=16)? I found that CogVideoX1.5 outputs are all 16 fps, but the Hugging Face demo passes fps=8 to export_to_video.
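The duration arithmetic behind these settings can be checked quickly. The sketch below assumes, as stated in this thread, that CogVideoX1.5 generates at 16 fps, and that the frame count follows the pattern `seconds * 16 + 1`, which is inferred only from the two values quoted here (81 for 5 s, 161 for 10 s). The helper names are hypothetical. It also shows why the original script's 81 frames exported at fps=8 play for about 10 seconds while containing only about 5 seconds of generated motion.

```python
# Assumption from this thread: CogVideoX1.5 generates video at 16 fps.
GEN_FPS = 16


def frames_for_seconds(seconds: int, fps: int = GEN_FPS) -> int:
    """Frame count for a target duration, using the seconds * fps + 1 pattern
    that matches 81 frames (5 s) and 161 frames (10 s)."""
    return seconds * fps + 1


def playback_seconds(num_frames: int, export_fps: int) -> float:
    """How long the exported MP4 plays for a given frame count and export fps."""
    return num_frames / export_fps


for secs in (5, 10):
    n = frames_for_seconds(secs)
    print(f"{secs} s -> num_frames={n}, plays {playback_seconds(n, GEN_FPS):.2f} s at fps={GEN_FPS}")

# Original script: 81 frames exported at fps=8 play twice as long as at 16 fps,
# i.e. roughly 5 s of generated motion shown at half speed.
print(f"81 frames at fps=8 play {playback_seconds(81, 8):.2f} s")
```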

@DZY-irene
Author

prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.
With 81 frames at 16 fps (5-second output):

A.person.is.using.computer-5s.mp4

With 161 frames at 16 fps (10-second output):

A.person.is.using.computer-10s.mp4

By the way, when I sample 5-second videos with the SAT version, everything works and there are no black or white videos. I suspect the problem may lie in the diffusers version.
