This major version adds many improvements, as well as video generation with diffusion and super-resolution with supervised metrics, including for consistency models.
Features
- adding separate control of vertical and horizontal flips as augmentation (a7a6109)
- aligned crops for super-resolution (8418470)
- allow tf32 on cudnn (367cd91)
- better Canny for cond image with background (c3c7de6)
- consistency models with supervised losses (ed701ad)
- data: random bbox for inpainting (764646d)
- input and output multiple and different channels (6bcd64c)
- load models without strictness (073d57c)
- max number of visualized images from train/test set (24f0e81)
- ml: add option for vid inference (c3f83b7)
- ml: add supervised loss with GANs with aligned datasets (d7f5119)
- ml: added LPIPS supervised loss with GANs (70e8ee4)
- ml: adding example of CM+discriminator (b6b8b64)
- ml: batched prompts for turbo (023dd54)
- ml: Canny can use a range of dropout probabilities (7b4c860)
- ml: canny dropout for vid (06ce7d7)
- ml: CM with added discriminator (10516e0)
- ml: consistency models for pix2pix (cd92712)
- ml: CUT turbo (cdd508f)
- ml: debug args (a11172b)
- ml: debug crop (51c9fd6)
- ml: debug for canny inference (930f3ce)
- ml: debug for canny threshold (dca0bfa)
- ml: debug for vid metrics (7c57471)
- ml: debug inference_vid for canny (17b9a29)
- ml: debug vid for frame limit (ff97c03)
- ml: debug vid metric (ba43725)
- ml: DISTS supervised loss for aligned data (56273ef)
- ml: FID, KID, MSID for multiple test sets and non-8-bit images (74b0e65)
- ml: fix canny range option (c102ee0)
- ml: fix inference regeneration and crop canny (f75196f)
- ml: HDiT for GANs (58bedff)
- ml: HDiT generator (9a95f1f)
- ml: jenkins test inference print (b68ab53)
- ml: L1 or MSE for diffusion multiscale loss (06e3d6a)
- ml: metric fvd for video (6d458a3)
- ml: min-SNR loss weight for diffusion, arXiv:2303.09556 (c802119)
- ml: modif for horse2zebra prompt (b66a954)
- ml: multiple test sets (6db745c)
- ml: option for max_sequence_lenght of video generation (12cfc1b)
- ml: prompt for inference horse2zebra (b8e9929)
- ml: random canny inside batch (70919cd)
- ml: rename dataloader for video generation (98b1315)
- ml: The implementation of UNetVid for generating video with temporal consistency and inference (43b7018)
- ml: keep fill_img_with_canny unchanged with random canny drop (a2ed3fc)
- ml: UNetVid for generating video with bs > 1 (00f11bc)
- ml: vid try autoregressive inference (5b92031)
- multi-prompt local works (b98746a)
- multiprompt (2bffc8b)
- multistep lr scheduler (01c3558)
- train_finetune for finetuning gans/others and removing / adding losses and networks (2f26503)
- unet_vid motion module fine-grained configuration (813e435)
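The min-SNR entry above refers to the loss-weighting strategy of arXiv:2303.09556. As a minimal sketch, assuming a DDPM-style linear beta schedule and the clipping value γ = 5 used in that paper, the per-timestep weight can be computed as below. The helper names are hypothetical and this is not JoliGEN's own implementation.

```python
from typing import List


def linear_alphas_cumprod(num_steps: int, beta_start: float = 1e-4,
                          beta_end: float = 0.02) -> List[float]:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule.

    The linear schedule is an illustrative assumption, not JoliGEN's setting.
    """
    alphas_cumprod = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod


def snr(alpha_bar: float) -> float:
    """SNR of x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(alpha_bar: float, gamma: float = 5.0,
                   prediction: str = "epsilon") -> float:
    """Min-SNR-gamma loss weight for one timestep (hypothetical helper)."""
    s = snr(alpha_bar)
    clipped = min(s, gamma)
    if prediction == "x0":       # network predicts the clean image
        return clipped
    if prediction == "epsilon":  # network predicts the added noise
        return clipped / s
    if prediction == "v":        # network predicts the velocity target v
        return clipped / (s + 1.0)
    raise ValueError(f"unknown prediction type: {prediction}")


if __name__ == "__main__":
    alphas_cumprod = linear_alphas_cumprod(1000)
    for t in (0, 100, 500, 999):
        print(t, round(min_snr_weight(alphas_cumprod[t]), 4))
```

With ε-prediction the weight is 1 wherever SNR ≤ γ and γ/SNR above it, so only the nearly noise-free timesteps are down-weighted and they no longer dominate the training signal.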
Bug Fixes
- aligned dataset, resize domain A only if necessary (4127571)
- allowing for no NCE with cut (9d8ff9b)
- clamp bbox to image size during inference (fc3874d)
- cm at test time (706356b)
- cm with conditioning (0fd2d14)
- consistency model schedule upon resume (88d03f9)
- consistency models with input/output different channels (db61821)
- crash in inference script, errors in documentation (f99dd34)
- cut options at test time (dcd2438)
- D input is G output size with gans (194f42b)
- diff across input/output channels in gans (6845816)
- diff real/fake not needed + cleanup (5cbd1f0)
- diffusion inference for images > 8bits (aefdc38)
- diffusion with input and output of different channel size (cd264de)
- disable hdit flop count (8c449f8)
- fix pytest rootdir (1fe0e80)
- further lowering the input test size of cut-turbo (6914731)
- gan inference script with prompts (cef7681)
- gan metrics reference (d5570b6)
- GAN semantic visual output (d3a5565)
- GAN semantic visual output (e7ee6bd)
- gen_single_image.py for images with channels > 3 (9ad4aaa)
- hdit out_channel (84473fc)
- identity with cut turbo (2538c00)
- inference with images > 8bit and GANs (34e6c96)
- input size of cut-turbo test (2c024c2)
- interpolation size selection for projected discriminators (ef045d0)
- load_image replacement (5af5803)
- loading of ema models (995c5eb)
- lora config saving with multiple gpus (c98617d)
- lower img2img turbo test memory footprint (54a6ab4)
- missing SSIM metric option (8530851)
- ml: multiscale diffusion loss for any input resolution (5c9f997)
- multi-gpu ddp collective mismatch upon resume (471fbbc)
- multi-gpu with frozen base network (1a07342)
- multiple test sets with test.py + SSIM (06762fb)
- option default cut_nce_idt (4c5ec6d)
- palette options at test time (75f7b04)
- parser uses model_type for model level options (76095b5)
- paths are only required for video generation (eb39ec5)
- paths loading prompts file (35d2ef3)
- perceptual loss for cm when input and output channels differ (ca81789)
- potential bug in gen_single_diffusion model path (0cf63fe)
- projected discriminator allows grayscale input (44fb458)
- prompt unaligned loading (e25d4b1)
- rename sketch options in examples (6930d00)
- RGB order for diffusion inpainting (eff8a57)
- rgbn cut lpips supervision (17cfbb2)
- sam for single channel inputs (397f837)
- segformer generator for single channel inputs (1eb6695)
- show full test set output with GANs (31efdcd)
- single dataset (a6266d8)
- supervised loss for aligned GANs, with unit tests (e21ddd3)
- supervised perceptual metrics all with piq and configurable + lambda weight (d77c3c5)
- test image output tensor visuals (19596b2)
- tifffile import (a09b5ed)
- total_iters wrong variable (066dc1b)
- train batch visuals (24adb61)
- typo in semantic threshold test variable (5082c36)
- unet mha output for GANs (075b6c6)
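The fix "clamp bbox to image size during inference" keeps inpainting boxes inside the image. A minimal sketch of such clamping, assuming boxes are given as (xmin, ymin, xmax, ymax) pixel coordinates with exclusive maxima; this convention and the `clamp_bbox` helper are hypothetical and do not reflect JoliGEN's internal bbox format.

```python
from typing import Tuple

# Hypothetical convention: (xmin, ymin, xmax, ymax), with xmax/ymax exclusive.
BBox = Tuple[int, int, int, int]


def clamp_bbox(bbox: BBox, width: int, height: int) -> BBox:
    """Clip a box so that it lies inside a width x height image."""
    xmin, ymin, xmax, ymax = bbox
    xmin = min(max(xmin, 0), width)
    ymin = min(max(ymin, 0), height)
    xmax = min(max(xmax, 0), width)
    ymax = min(max(ymax, 0), height)
    # Keep the box well-formed even if it started entirely outside the image.
    xmax = max(xmax, xmin)
    ymax = max(ymax, ymin)
    return xmin, ymin, xmax, ymax


if __name__ == "__main__":
    print(clamp_bbox((-10, 5, 300, 120), width=256, height=100))  # prints (0, 5, 256, 100)
```

A box lying fully outside the image collapses to zero width or height rather than raising, leaving it to the caller to decide whether to skip it.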
Docker images:
- GPU (CUDA only):
docker pull docker.jolibrain.com/joligen_server:v4.0.0
- All images available from https://docker.jolibrain.com/#!/taglist/joligen_server