We design a purely neural architecture that solves Sudoku without being given the rules of Sudoku in any form. This work falls broadly within neuro-symbolic reasoning. The solver is built from a conditional GAN (cGAN), Unsupervised Data Augmentation (UDA), and a Recurrent Relational Network (RRN). Our inductive biases and training framework improve upon the baseline by 30% in accuracy. The report can be found here.
Problem statement can be found here.
- 8x8 Sudoku Board Images:
Note: these are 8x8 Sudoku boards. In a solved board, each row, each column, and each 2x4 block contains 8 unique digits. The digit symbols are not necessarily the digits as humans would recognize them.
- Digit Classifier: To give symbolic input to the Sudoku solver model, we extract all sub-images from the board image and classify each one, producing a symbolic Sudoku board of 8x8 = 64 digits.
Available labelled data:
- NOT using Sudoku rules: we solve the problem from a very limited dataset, letting the neural network learn the constraints of Sudoku and solve the Constraint Satisfaction Problem entirely from data.
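The board constraint described in the note above (rows, columns, and 2x4 blocks each holding 8 unique symbols) can be checked mechanically when evaluating solver outputs. The following is a minimal sketch, not code from this repository: it assumes the solved board is an 8x8 list of integer labels and that "2x4" means blocks of 2 rows by 4 columns (the orientation is an assumption). The function name `is_valid_solution` is hypothetical. The solver itself is never given this check during training.

```python
def is_valid_solution(board, block_rows=2, block_cols=4):
    """Return True if every row, column, and block holds the same n distinct symbols."""
    n = len(board)
    if any(len(row) != n for row in board):
        return False
    symbols = set(board[0])
    if len(symbols) != n:
        return False
    rows = [list(row) for row in board]
    cols = [[board[r][c] for r in range(n)] for c in range(n)]
    # Assumed layout: blocks tile the board, block_rows high and block_cols wide.
    blocks = [
        [board[r][c]
         for r in range(br, br + block_rows)
         for c in range(bc, bc + block_cols)]
        for br in range(0, n, block_rows)
        for bc in range(0, n, block_cols)
    ]
    return all(set(group) == symbols for group in rows + cols + blocks)
```

Because the check compares sets of labels rather than digit values, it works regardless of which visual symbol each label stands for.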
In the following sections, we break the problem into two parts: (1) creating a classifier, and (2) training and improving the RRN to solve Sudoku.
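Before the classifier can be applied, each board image has to be cut into its 64 cell images. A minimal sketch of that extraction step follows, assuming the board image is a NumPy array whose height and width are exact multiples of 8 and whose cells form a uniform grid without borders (both are assumptions; the actual image layout is not described here). The helper name `split_board` is hypothetical.

```python
import numpy as np

def split_board(image, grid=8):
    """Split an (H, W[, C]) board image into grid*grid cell images in row-major order."""
    h, w = image.shape[:2]
    if h % grid or w % grid:
        raise ValueError("image size must be a multiple of the grid size")
    ch, cw = h // grid, w // grid
    extra = image.shape[2:]  # optional channel dimension
    # (grid, ch, grid, cw, ...) -> (grid, grid, ch, cw, ...) -> (grid*grid, ch, cw, ...)
    cells = image.reshape(grid, ch, grid, cw, *extra).swapaxes(1, 2)
    return cells.reshape(grid * grid, ch, cw, *extra)
```

Cell `k` of the result corresponds to board row `k // 8` and column `k % 8`, so the classifier's 64 predictions can be reshaped directly into an 8x8 symbolic board.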
Since labelled digit data is scarce, we first experimented with several clustering methods, but visual inspection showed that the given digits do not cluster properly. Instead, we use Unsupervised Data Augmentation (UDA) to leverage the large amount of unlabelled data (taken from the Sudoku board input-output images). Once pseudo-labels are obtained from UDA, we train a cGAN model to remove the noise in these labels.
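As a rough illustration of the UDA idea, the objective as generally described combines a supervised cross-entropy term on the small labelled set with a consistency term that penalises the classifier when its prediction on an unlabelled image differs from its prediction on an augmented copy. The sketch below is not taken from `uda.py`; it assumes the classifier outputs are available as logit arrays, and the names `uda_loss`, `pseudo_labels`, and the weight `lam` are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def uda_loss(sup_logits, sup_labels, unsup_logits, aug_logits, lam=1.0, eps=1e-12):
    """Supervised cross-entropy plus lam * KL(p(y|x) || p(y|augmented x))."""
    p_sup = softmax(sup_logits)
    ce = -np.log(p_sup[np.arange(len(sup_labels)), sup_labels] + eps).mean()
    p = softmax(unsup_logits)  # prediction on the original unlabelled image
    q = softmax(aug_logits)    # prediction on its augmented copy
    kl = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1).mean()
    return ce + lam * kl

def pseudo_labels(unsup_logits):
    """Assign each unlabelled image its most probable class."""
    return softmax(unsup_logits).argmax(axis=1)
```

In a real training loop the prediction on the original image is usually treated as a fixed target (no gradient flows through it); this NumPy sketch only evaluates the loss value.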
cd ./SemiSupervised_cGAN
mkdir ./temp_saves #for saving the results
python uda.py --unlabelled_datapath <large-unlabelled-dataset-path> --supervised_datapath <small-supervised-dataset-path> --supervised_labels <path-of-labels-of-supervised-dataset> --output_labels <path-of-labelled-image-dataset-given-as-unlabelled-datapath> --output_classifier <path-of-output-classifier-using-UDA-method>
# use saved labels of uda classifier to train GAN
python train_cgan.py --root_path_to_save <directory-to-save-results> --traindatapath <large-unlabelled-dataset-path> --trainlabelspath <path-of-labelled-image-dataset-given-as-unlabelled-datapath> --train_or_gen train --num_epochs 100
# generate 9k images as .npy files, saved as gen9k.npy and target9k.npy
python train_cgan.py --gen_model_pretr <trained-model-path-from-prev-step> --gen9k_path <path-to-generated-images> --target9k_path <path-to-generated-image-labels> --train_or_gen generate
# convert the generated .npy images and the 9k real images to PNG files, save them, and then calculate the FID score
python numpy2images.py --savedir <directory-to-save-results> --numpy_images_file <path-to-generated-images> --num_images 9000
# calculate FID, assuming GPU access; this step requires the pytorch-fid package (pip install pytorch-fid)
python -m pytorch_fid --device "cuda:0" <directory-to-save-results>/generated_images <path-to-directory-having-real-images>
# <path-to-directory-having-real-images> is the directory of real images; the FID score is computed between these and the images generated by our GAN
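For reference, the Fréchet Inception Distance reported in the last step compares Gaussian fits to Inception features of the real and generated image sets. The notation below is ours, not the repository's: $\mu_r, \Sigma_r$ are the mean and covariance of the features of the real images, and $\mu_g, \Sigma_g$ those of the generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated images are statistically closer to the real ones.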
With the symbolic data obtained in the previous section, we can now train the RRN model independently. The symbolic data contains some label noise, which limits the RRN's ability to learn the constraints of Sudoku. Alternatively, we can train the classifier and the RRN in a joint framework, and we observed experimentally that this improves the performance of both. We attribute this to the RRN learning some rules of Sudoku: this information, when passed back to the classifier, helps the classifier improve, which in turn gives the RRN cleaner input, and the two keep refining each other.
run_solver.sh <path_to_train> <path_to_test_query> <path_to_sample_imgs> <path_to_out_csv>
This is similar to the previous part, but here the classifier obtained from UDA is fine-tuned while the RRN is trained. The pretrained classifier and the RRN are trained jointly so that each improves the other.
run_solver.sh <path_to_train> <path_to_test_query> <path_to_sample_imgs> <path_to_out_csv> true
Course assignment for the Deep Learning course (course webpage) taught by Prof. Parag Singla.