Thank you for your wonderful work!
I'm sampling with VBench's GPT-enhanced prompts, and I've noticed that occasionally a few videos come out entirely black or white. In other cases the video stays black for a long stretch and objects only appear in the last second.
prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.
A.person.is.using.computer-0.mp4
prompt: A pristine white bathroom features a sleek, modern sink with a chrome faucet, set against a backdrop of glossy white tiles. The sink's surface is adorned with a neatly folded hand towel and a small potted plant, adding a touch of greenery. Adjacent to the sink, a contemporary toilet with a soft-close lid and a minimalist design stands out. The toilet's clean lines and the subtle sheen of its ceramic surface reflect the ambient light. The scene captures the essence of a serene, well-maintained bathroom, emphasizing cleanliness and modern aesthetics.
a.sink.and.a.toilet-1.mp4
prompt: A sleek black cat with piercing green eyes prowls gracefully through a dimly lit, mysterious alleyway, its fur glistening under the soft glow of a distant streetlamp. The cat pauses, ears perked, as it senses movement, its silhouette casting an elongated shadow on the cobblestone path. It then leaps effortlessly onto a nearby windowsill, where it sits, tail flicking, and gazes intently into the darkness. The scene transitions to a close-up of the cat's face, highlighting its sharp, alert features and the subtle twitch of its whiskers, capturing the essence of its enigmatic and nocturnal nature.
a.black.cat-2.mp4
Here is my code:
import os

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Short prompts, used for the output filenames
prompt_file = 'test_human.txt'
with open(prompt_file, "r") as f:
    prompts = [line.strip() for line in f if line.strip()]

# GPT-enhanced prompts, used for generation (same order as the short prompts)
prompt_file_longer = 'test_human_longer.txt'
with open(prompt_file_longer, "r") as f:
    prompts_longer = [line.strip() for line in f if line.strip()]

output_dir = prompt_file.split('/')[-1].split('.')[0]
os.makedirs(output_dir, exist_ok=True)

pipe = CogVideoXPipeline.from_pretrained("PATH", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Decode with the VAE in tiles and slices to reduce memory use
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

for i, prompt_l in enumerate(prompts_longer):
    prompt = prompts[i]
    # Five seeds (42 to 46) per prompt
    for num in range(5):
        generator = torch.Generator(device="cuda").manual_seed(42 + num)
        video = pipe(
            prompt=prompt_l,
            height=768,
            width=1360,
            num_videos_per_prompt=1,
            num_inference_steps=50,
            num_frames=81,
            guidance_scale=6,
            generator=generator,
        ).frames[0]
        output_path = os.path.join(output_dir, f"{prompt}-{num}.mp4")
        # fps=8, as in the Hugging Face demo
        export_to_video(video, output_path, fps=8)
        print(f"Video saved to {output_path}")
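To find out which prompt/seed combinations are affected without watching every clip, one option is to scan the decoded frames for near-black or near-white content before exporting. Below is a minimal sketch; the helper names `frame_means` and `classify_video` and the thresholds `dark_thresh=10` / `bright_thresh=245` are hypothetical choices, not part of CogVideoX or diffusers. It assumes frames with 0-255 pixel values, such as the PIL images the pipeline returns with its default output type.

```python
import numpy as np


def frame_means(frames):
    """Mean pixel intensity (0-255 scale) of each frame."""
    # np.asarray accepts PIL images as well as uint8 NumPy arrays.
    return [float(np.asarray(f).astype(np.float32).mean()) for f in frames]


def classify_video(frames, dark_thresh=10.0, bright_thresh=245.0):
    """Count near-black and near-white frames and find the first usable one.

    The thresholds are hypothetical choices, not values from CogVideoX or diffusers.
    """
    means = frame_means(frames)
    dark = [i for i, m in enumerate(means) if m <= dark_thresh]
    bright = [i for i, m in enumerate(means) if m >= bright_thresh]
    usable = [i for i, m in enumerate(means) if dark_thresh < m < bright_thresh]
    return {
        "num_frames": len(means),
        "dark": len(dark),
        "bright": len(bright),
        "first_usable": usable[0] if usable else None,
    }


if __name__ == "__main__":
    # Synthetic 81-frame clip: 75 black frames followed by 6 mid-gray frames,
    # mimicking "black for a long time, content appears only at the end".
    black = np.zeros((8, 8, 3), dtype=np.uint8)
    gray = np.full((8, 8, 3), 128, dtype=np.uint8)
    print(classify_video([black] * 75 + [gray] * 6))
```

Calling `classify_video(video)` right after `pipe(...)` in the loop would make it possible to log which seeds give a high `dark` or `bright` count, or a late `first_usable` index, which matches the "black until the last second" symptom.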
Hmm, num_frames needs to be changed to 161, with export_to_video(video, output_path, fps=16). Also, have you checked whether five-second videos come out normally?
To confirm: for a 5-second video the settings are num_frames=81 and export_to_video(video, output_path, fps=16), and for a 10-second video they are num_frames=161 and export_to_video(video, output_path, fps=16)? I found that CogVideoX1.5 outputs are all at 16 fps, but the Hugging Face demo passes fps=8 to export_to_video.
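The fps passed to `export_to_video` only changes playback speed, not how many frames are generated: playback length is `num_frames / fps`. The sketch below makes that arithmetic explicit. The helper names are hypothetical, and the `fps * seconds + 1` rule in `frames_for_duration` is an assumption inferred from the 81-frame (5 s) and 161-frame (10 s) values in this thread, not from CogVideoX documentation.

```python
def playback_seconds(num_frames: int, fps: int) -> float:
    """Length of the exported clip: frame count divided by the frame rate."""
    return num_frames / fps


def frames_for_duration(seconds: int, fps: int = 16) -> int:
    """Frame count for a target duration, assuming fps * seconds + 1 frames.

    The "+1" rule is inferred from the 81-frame / 161-frame values in this
    thread, not taken from CogVideoX documentation.
    """
    return fps * seconds + 1


if __name__ == "__main__":
    for n in (81, 161):
        for fps in (8, 16):
            print(f"num_frames={n}, fps={fps}: plays for {playback_seconds(n, fps):.2f} s")
```

For example, 81 frames written at fps=8 play for about 10 s, so a clip generated as roughly 5 seconds of motion would be shown at half speed.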
prompt: A focused individual sits at a sleek, modern desk in a dimly lit room, illuminated by the soft glow of a high-resolution computer screen. They wear a cozy, oversized sweater and glasses, reflecting the screen's light. The room is filled with the quiet hum of technology, with a minimalist setup including a mechanical keyboard and a wireless mouse. The person’s fingers dance swiftly across the keys, their face showing intense concentration. Behind them, a bookshelf filled with colorful books and a potted plant adds a touch of warmth to the tech-centric space. The scene captures the blend of human focus and digital interaction.
With 81 frames and 16 fps (5-second video):
A.person.is.using.computer-5s.mp4
With 161 frames and 16 fps (10-second video):
A.person.is.using.computer-10s.mp4
By the way, with the SAT version, 5-second sampling works fine and produces no black or white videos. I suspect the diffusers version may still have a problem.
System Info
None
Information
Reproduction
The prompt files are attached as test_human.txt and test_human_longer.txt.
The prompts in these files are the ones most likely to produce black videos.
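The script indexes `prompts[i]` while iterating over `prompts_longer`, so the two files must contain the same number of non-empty lines in matching order; otherwise filenames and prompts silently drift apart or the loop fails with an `IndexError`. A minimal sketch of a stricter loader, where `load_prompts` and `load_prompt_pairs` are hypothetical helper names:

```python
def load_prompts(path):
    """Read non-empty, stripped lines, as the script in this issue does."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def load_prompt_pairs(short_path, long_path):
    """Pair each short prompt (used for the filename) with its enhanced prompt.

    Raises ValueError when the files hold different numbers of prompts,
    instead of mismatching them or failing later inside the generation loop.
    """
    short, long_ = load_prompts(short_path), load_prompts(long_path)
    if len(short) != len(long_):
        raise ValueError(
            f"{short_path} has {len(short)} prompts but {long_path} has {len(long_)}"
        )
    return list(zip(short, long_))
```

With this, the outer loop could read `for prompt, prompt_l in load_prompt_pairs('test_human.txt', 'test_human_longer.txt'):`, which keeps each filename tied to the enhanced prompt it was generated from.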
Expected behavior
To understand why this happens.