This project aims to generate explanation maps that interpret the behavior of two models trained on different datasets. The first model is a very shallow CNN trained on “MNIST-1D”, a small-scale dataset designed for generic one-dimensional array classification. The second model is a VGG network trained on the “HMT” dataset, which is used for histopathological tissue classification.
To carry out the project, two attribution methods are considered: a perturbation-based method, Semantic Input Sampling for Explanation (SISE) [1], and a backpropagation-based method, Integrated Gradients (IG) [2].
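Integrated Gradients [2] attributes a prediction F(x) to each input feature i by accumulating gradients along the straight-line path from a baseline x' to the input x: IG_i(x) = (x_i − x'_i) · ∫₀¹ ∂F(x' + α(x − x'))/∂x_i dα. Below is a minimal sketch of the usual Riemann-sum approximation, under loudly stated assumptions: a toy logistic model `F(x) = sigmoid(w·x + b)` with a hand-derived gradient stands in for the trained CNN or VGG network (whose gradients would come from a framework's automatic differentiation), and the all-zero baseline, the weights, and `steps=200` are illustrative choices, not project requirements.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy stand-in for a class score (assumption): F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model: dF/dx_i = s * (1 - s) * w_i."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum over `steps` points on the path."""
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad_sum = [acc + g for acc, g in zip(grad_sum, model_grad(point, w, b))]
    # Multiply the path-averaged gradient by (input - baseline), feature by feature.
    return [(xi - bi) * acc / steps for xi, bi, acc in zip(x, baseline, grad_sum)]

# Usage with illustrative values (hypothetical weights and input).
x, baseline = [0.5, -1.0, 2.0], [0.0, 0.0, 0.0]
w, b = [1.0, 0.5, -0.25], 0.1
attr = integrated_gradients(x, baseline, w, b)
# Completeness check: sum of attributions vs. F(x) - F(baseline).
print(sum(attr), model(x, w, b) - model(baseline, w, b))
```

The final check reflects the completeness axiom from [2]: the attributions sum, up to approximation error, to F(x) − F(x'). Increasing `steps` tightens the approximation at the cost of more gradient evaluations.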
The following describes the two datasets needed for Project A. Both datasets are included in the Project_A_Supp.zip file uploaded to Quercus.
MNIST-1D dataset
• Paper: Scaling down deep learning: https://arxiv.org/abs/2011.14439
• GitHub link: https://github.com/greydanus/mnist1d
• Description: This dataset is a one-dimensional, low-memory analogue of the popular digit classification dataset MNIST. As in MNIST, the MNIST-1D data are divided into 10 classes, each representing a digit from 0 to 9. Unlike MNIST, each example in the MNIST-1D train/test data is a one-dimensional sequence of points. Each sequence is generated from a 1-D template of the corresponding digit by random padding, random translation, adding Gaussian noise, adding a linear (constant-slope) signal analogous to shear in 2D images, and finally downsampling to 40 data points.
• Availability: Publicly available (for academic purposes).
• Resources needed: CPU
• Data size: 4000 training examples + 1000 test examples (split as provided by the dataset authors).
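The generation steps listed in the description can be sketched in plain Python. This is a simplified, hypothetical illustration of those steps only: the function name `make_example`, the 12-point template, and every numeric parameter are assumptions, and it is not the code from the mnist1d repository, which should be used for the actual data.

```python
import random

def make_example(template, final_len=40, pad_range=(36, 60), max_shift=20,
                 noise_std=0.25, max_slope=0.75, seed=None):
    """Hypothetical MNIST-1D-style augmentation of a 1-D template (illustrative only)."""
    rng = random.Random(seed)

    # 1) Random padding: split a random number of zeros between the two ends.
    total_pad = rng.randint(*pad_range)
    left = rng.randint(0, total_pad)
    x = [0.0] * left + [float(v) for v in template] + [0.0] * (total_pad - left)
    n = len(x)

    # 2) Random translation: circular shift to the right by s positions.
    s = rng.randint(-max_shift, max_shift) % n
    x = x[n - s:] + x[:n - s]

    # 3) Additive Gaussian noise on every point.
    x = [v + rng.gauss(0.0, noise_std) for v in x]

    # 4) Linear signal with a random constant slope (the 1-D analogue of shear).
    slope = rng.uniform(-max_slope, max_slope)
    x = [v + slope * i / (n - 1) for i, v in enumerate(x)]

    # 5) Downsample to final_len (>= 2) points by linear interpolation.
    out = []
    for j in range(final_len):
        pos = j * (n - 1) / (final_len - 1)
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)
        frac = pos - i0
        out.append((1.0 - frac) * x[i0] + frac * x[i1])
    return out

template = [5, 6, 7, 6, 5, 3, 1, 0, 0, 1, 2, 3]  # hypothetical digit template
signal = make_example(template, seed=0)
print(len(signal))  # 40
```

Seeding a private `random.Random` instance keeps each example reproducible without touching global random state.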
HMT dataset
• Paper: Multi-class texture analysis in colorectal cancer histology: https://www.nature.com/articles/srep27988
• Description: This dataset was created to improve the performance of ML-based solutions for histopathological tissue classification. HMT is an equally balanced dataset of images extracted from 10 independent samples of primary colorectal cancer (CRC) tumors, with each image assigned to one of the following 8 classes: (a) tumor epithelium, (b) simple stroma, (c) complex stroma, (d) immune cell conglomerates, (e) debris and mucus, (f) mucosal glands, (g) adipose tissue, (h) background.
• Availability: Publicly available (for academic purposes).
• Resources needed: CPU
• Data size: 4504 train images + 496 test images.
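Because the dataset is equally balanced across eight classes, the stated split sizes imply a quick sanity check: 4504 + 496 = 5000 images, i.e. 625 images per class, with about 9.9% held out for testing. The snippet makes that arithmetic explicit and adds a class-to-index mapping; the names `HMT_CLASSES` and `CLASS_TO_INDEX` and the ordering (following the (a) through (h) listing) are assumptions and may not match the label encoding in the supplied files.

```python
# Hypothetical index mapping for the eight HMT classes, in the (a)-(h) order above;
# the actual dataset files may use different folder names or ordering.
HMT_CLASSES = [
    "tumor epithelium", "simple stroma", "complex stroma",
    "immune cell conglomerates", "debris and mucus",
    "mucosal glands", "adipose tissue", "background",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(HMT_CLASSES)}

# Split sizes as stated in the project description.
n_train, n_test = 4504, 496
n_total = n_train + n_test
per_class = n_total // len(HMT_CLASSES)  # images per class if perfectly balanced
test_fraction = n_test / n_total

print(n_total, per_class, test_fraction)  # 5000 625 0.0992
```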
[1] Sattarzadeh, Sam, Mahesh Sudhakar, Anthony Lem, Shervin Mehryar, Konstantinos N. Plataniotis, Jongseong Jang, Hyunwoo Kim, Yeonjeong Jeong, Sangmin Lee, and Kyunghoon Bae. "Explaining convolutional neural networks through attribution-based input sampling and block-wise feature aggregation." In 35th AAAI Conference on Artificial Intelligence, 2021. https://arxiv.org/abs/2010.00672
[2] Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. "Axiomatic attribution for deep networks." In International Conference on Machine Learning, pp. 3319-3328. PMLR, 2017. https://arxiv.org/abs/1703.01365