This project examines image generative models, focusing on DCGAN and StyleGAN3. The project began by training a DCGAN to generate landscape images, but it quickly became clear that a strong DCGAN requires a significantly larger and more diverse dataset, a carefully designed architecture, advanced techniques to stabilize training between the generator and discriminator, and substantial computational resources. As a result, rather than pursuing the most powerful DCGAN architecture, the project turned to a pretrained StyleGAN3 variant, originally released by NVIDIA and further fine-tuned on landscape images by Justin Pinkney.
More specifically, the model used is StyleGAN3-t LHQ 256, a StyleGAN3-t model further trained on 15 million images of various landscapes at a resolution of 256x256. Examples of images generated by this model before any additional training are shown in the following stacked arrangement of three images:
Further fine-tuning of this model was carried out for 50 epochs using the Landscape Pictures dataset, after which new landscape images were generated with the same seeds; these are displayed below. Comparing the two stacks of images suggests that the new dataset introduced greater color and lighting variation, increased detail, and possibly more geographical diversity, since the landscapes in the second stack feature richer and more complex environments, ranging from detailed mountain terrains to lush, vibrant valleys.
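The per-seed comparison is meaningful because StyleGAN3's generation script derives each latent vector deterministically from the seed (via a per-seed NumPy `RandomState`), so the same seed maps to the same latent before and after fine-tuning. A minimal sketch of that idea; the helper name is hypothetical:

```python
import numpy as np

def latent_for_seed(seed: int, z_dim: int = 512) -> np.ndarray:
    """Hypothetical helper: derive a deterministic latent vector from a seed,
    mirroring how StyleGAN3's generation script seeds NumPy per image."""
    return np.random.RandomState(seed).randn(1, z_dim)

# The same seed always yields the same latent, so images generated from
# identical seeds before and after fine-tuning are directly comparable.
z_before = latent_for_seed(85)
z_after = latent_for_seed(85)
print(np.array_equal(z_before, z_after))  # → True
```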
Additional generated images are available in a GIF at the top of this repository, with the top three images generated by the StyleGAN3-t LHQ 256 and the bottom three by the further trained model. Below is a GIF of images generated by the provided DCGAN architecture:
These images demonstrate the architecture's ability to generate landscape-like visuals, but with noticeable limitations such as lower resolution, simplified color schemes, and less realistic textures compared to the more advanced StyleGAN3 outputs. This difference highlights the benefits of using pre-trained models, especially when training resources are limited.
The Landscape Pictures dataset, a collection of natural landscape photos from Flickr, was used to train both DCGAN and StyleGAN3-t LHQ 256 models. It consists of 4,300 images, representing a variety of landscape types. Details of these categories, including the number of pictures and a brief description of each, are provided in the table below:
Landscape Category | Number of Pictures | Description |
---|---|---|
landscapes | 900 | General landscape pictures |
landscapes_mountain | 900 | Pictures featuring mountain landscapes |
landscapes_desert | 100 | Pictures of desert landscapes |
landscapes_sea | 500 | Sea views and coastal landscapes |
landscapes_beach | 500 | Beach scenes |
landscapes_island | 500 | Pictures of island settings |
landscapes_japan | 900 | Landscapes located in Japan |
To match both the system's capabilities and the models' input requirements, the landscape images, which originally varied in resolution, were uniformly resized to 256x256 pixels.
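As a sketch, this resizing step can be done with Pillow; the directory names below are placeholders, not paths from this repository:

```python
from pathlib import Path
from PIL import Image

TARGET_SIZE = (256, 256)  # resolution expected by both models

def resize_dataset(src: Path, dst: Path, size=TARGET_SIZE) -> int:
    """Resize every .jpg in src to `size`, write it to dst, return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for img_path in src.glob("*.jpg"):
        with Image.open(img_path) as img:
            img.convert("RGB").resize(size, Image.LANCZOS).save(dst / img_path.name)
        count += 1
    return count

# Example (placeholder paths):
# resize_dataset(Path("landscape_pictures"), Path("landscape_pictures_256"))
```

Note that this stretches each image to 256x256 without preserving aspect ratio, matching the uniform resize described above.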
- Clone the repository:

  `git clone https://github.com/Dalageo/GANScapeGenerator.git`

- Navigate to the cloned directory:

  `cd GANScapeGenerator`

- Open `GANScapeGenerator_DCGAN.ipynb` using your preferred Jupyter-compatible environment (e.g., Jupyter Notebook, VS Code, or PyCharm).

- Update the dataset, model, and output directory paths to point to the corresponding locations in your local environment.

- Run the cells sequentially to reproduce the results.
- Visit the StyleGAN3 repository and follow the installation instructions.

- Open `GANScapeGenerator_StyleGAN3.py` using your preferred Python-compatible environment (e.g., VS Code or PyCharm).

- Select a pretrained StyleGAN3 model suitable for your dataset (e.g., from NVIDIA's NGC catalog or fine-tuned StyleGAN3 models on Hugging Face).

- Update the dataset, model, and output directory paths.

- Run the script to reproduce the results.
To train the models on a GPU, you will need to enable GPU support for your operating system and install the required dependencies. You can follow the installation guide provided by PyTorch for detailed instructions.
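Before starting a training run, it can help to confirm that PyTorch actually sees the GPU; a minimal check, assuming PyTorch is installed:

```python
import torch

# Pick the GPU when CUDA support is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training device: {device}")
if device.type == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the CUDA-enabled PyTorch build or the GPU driver is likely missing.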
Firstly, I would like to thank Alec Radford, Luke Metz, and Soumith Chintala for introducing DCGAN in their 2015 paper, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks".
Additionally, special thanks to NVIDIA for providing pretrained StyleGAN3 models for educational and research purposes, as well as Justin Pinkney for making available a StyleGAN3 variant that has been pretrained on the LHQ dataset.
The provided fine-tuned StyleGAN3 model is licensed under the Nvidia Source Code License, the dataset is under CC0 1.0 Universal, while the accompanying documentation is licensed under the AGPL-3.0 license. The AGPL-3.0 license was chosen to promote open collaboration, ensure transparency, and allow others to freely use, modify, and contribute to the work.
Any modifications or improvements must also be shared under the same license, with appropriate acknowledgment.