Overview of the literature on learning from supervision on the model's explanations. A .bib
file of the papers below can be downloaded here.
Warning: permanent WIP.
Did we miss a relevant paper? Please submit a new entry in the following format:
- **An Artificially-intelligent Means to Escape Discreetly from the Departmental Holiday Party; guide for the socially awkward**
Eve Armstrong; arXiv 2020 [paper](https://arxiv.org/abs/2003.14169)
`Notes: it is a joke; a pretty good joke actually.`
- Online Resources
- Passive Learning
- Interactive Learning
- Reinforcement Learning
- Distillation
- Regularization without Supervision
- Machine Teaching
- Applications
- Related Works
- Resources
## Online Resources

- **Tutorial on Explanations in Interactive Machine Learning** at AAAI-22 website
  `Notes: includes recording.`
## Passive Learning

Approaches that supervise the model's explanations.
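Many entries below share a common recipe: augment the task loss with a penalty that compares a differentiable explanation of the model (for instance, its input gradients) against human annotations of which features matter. The following is a minimal, hypothetical PyTorch sketch of that recipe, not the implementation of any specific paper in the list; `explanation_supervised_loss`, `irrelevant_mask`, and `lambda_expl` are illustrative names.

```python
# Minimal sketch (not any listed paper's method): penalize input-gradient
# attribution mass on features a human annotator marked as irrelevant.
import torch
import torch.nn.functional as F

def explanation_supervised_loss(model, x, y, irrelevant_mask, lambda_expl=1.0):
    """Cross-entropy plus a penalty on input gradients over features the
    annotator flagged as irrelevant (irrelevant_mask[i, j] == 1)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Input-gradient "explanation" of the (summed) log-probabilities.
    grads = torch.autograd.grad(
        F.log_softmax(logits, dim=-1).sum(), x, create_graph=True
    )[0]

    # Attribution that falls on annotated-irrelevant features is penalized.
    expl_loss = (irrelevant_mask * grads).pow(2).sum(dim=-1).mean()
    return task_loss + lambda_expl * expl_loss

if __name__ == "__main__":
    model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2))
    x = torch.randn(8, 10)
    y = torch.randint(0, 2, (8,))
    mask = torch.zeros(8, 10)
    mask[:, :3] = 1.0  # pretend the first three features are known to be irrelevant
    loss = explanation_supervised_loss(model, x, y, mask)
    loss.backward()
    print(float(loss))
```

Individual papers differ mainly in the choice of explainer (input gradients, attention, relevance propagation, rationales, ...) and in how the annotation is encoded.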
- **Rationalizing Neural Predictions**
  Tao Lei, Regina Barzilay, Tommi Jaakkola; EMNLP 2016 paper code
  `Notes: they learn an "explanation module" for text classification from explanatory supervision, namely rationales.`
- **Right for the right reasons: training differentiable models by constraining their explanations**
  Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez; IJCAI 2017 paper code
- **e-SNLI: Natural Language Inference with Natural Language Explanations**
  Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom; NeurIPS 2018 paper code
- **Tell me where to look: Guided attention inference network**
  Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu; CVPR 2018 paper
- **Learning credible models**
  Jiaxuan Wang, Jeeheh Oh, Haozhu Wang, and Jenna Wiens; KDD 2018 paper code
- **Not Using the Car to See the Sidewalk -- Quantifying and Controlling the Effects of Context in Classification and Segmentation**
  Rakshith Shetty, Bernt Schiele, Mario Fritz; CVPR 2019 paper
  `Notes: not exactly about explanations, learns from ground-truth object annotations.`
- **Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded**
  Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, and Devi Parikh; ICCV 2019 pdf
- **Learning credible deep neural networks with rationale regularization**
  Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu; ICDM 2019 paper
- **Deriving Machine Attention from Human Rationales**
  Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay; ACL 2019 paper code
- **TED: Teaching AI to explain its decisions**
  Michael Hind, Dennis Wei, Murray Campbell, Noel Codella, Amit Dhurandhar, Aleksandra Mojsilović, Karthikeyan Ramamurthy, Kush Varshney; AIES 2019 paper
- **Incorporating Priors with Feature Attribution on Text Classification**
  Frederick Liu, Besim Avci; ACL 2019 paper
- **Saliency Learning: Teaching the Model Where to Pay Attention**
  Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli; NAACL 2019 paper
- **Do Human Rationales Improve Machine Explanations?**
  Julia Strout, Ye Zhang, Raymond Mooney; ACL Workshop BlackboxNLP 2019 paper
- **CARE: Class attention to regions of lesion for classification on imbalanced data**
  Jiaxin Zhuang, Jiabin Cai, Ruixuan Wang, Jianguo Zhang, Weishi Zheng; International Conference on Medical Imaging with Deep Learning 2019 paper
- **GradMask: Reduce Overfitting by Regularizing Saliency**
  Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen; International Conference on Medical Imaging with Deep Learning 2019 paper
- **Learning Global Transparent Models Consistent with Local Contrastive Explanations**
  Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam, Amit Dhurandhar; NeurIPS 2020 paper
- **Model Agnostic Multilevel Explanations**
  Karthikeyan Natesan Ramamurthy, Bhanukiran Vinzamuri, Yunfeng Zhang, Amit Dhurandhar; NeurIPS 2020 paper
  `Notes: implicitly learns to generalize across multiple local explanations.`
- **Interpretations are useful: penalizing explanations to align neural networks with prior knowledge**
  Laura Rieger, Chandan Singh, William Murdoch, Bin Yu; ICML 2020 paper code
- **Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting**
  Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph Gonzalez, Marcus Rohrbach; ICLR 2020 paper code
  `Notes: uses saliency guided replay for continual learning.`
- **Learning to Faithfully Rationalize by Construction**
  Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron Wallace; ACL 2020 paper code
- **Reflective-Net: Learning from Explanations**
  Johannes Schneider, Michalis Vlachos; arXiv 2020 paper
- **Learning Interpretable Concept-based Models with Human Feedback**
  Isaac Lage, Finale Doshi-Velez; arXiv 2020 paper
  `Notes: incrementally acquires side-information about per-concept feature dependencies; side-information is per-concept, not per-instance.`
- **Improving performance of deep learning models with axiomatic attribution priors and expected gradients**
  Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee; Nature Machine Intelligence 2021 paper preprint code
- **GLocalX -- From Local to Global Explanations of Black Box AI Models**
  Mattia Setzu, Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti; Artificial Intelligence 2021 page code
  `Notes: converts a set of local explanations to a global explanation / white-box model.`
- **IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography**
  Alina Barnett, Fides Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Lo, Cynthia Rudin; Nature Machine Intelligence 2021 paper code
- **Debiasing Concept-based Explanations with Causal Analysis**
  Mohammad Taha Bahadori, David E. Heckerman; ICLR 2021 paper
- **Teaching with Commentaries**
  Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, and Geoffrey Hinton; ICLR 2021 paper code
- **Saliency is a possible red herring when diagnosing poor generalization**
  Joseph Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen; ICLR 2021 paper code
- **Towards Robust Classification Model by Counterfactual and Invariant Data Generation**
  Chun-Hao Chang, George Alexandru Adam, Anna Goldenberg; CVPR 2021 paper code
- **Global Explanations with Decision Rules: a Co-learning Approach**
  Géraldin Nanfack, Paul Temple, Benoît Frénay; UAI 2021 paper code
- **Explain and Predict, and then Predict Again**
  Zijian Zhang, Koustav Rudra, Avishek Anand; WSDM 2021 paper code
- **Explanation-Based Human Debugging of NLP Models: A Survey**
  Piyawat Lertvittayakumjorn, Francesca Toni; arXiv 2021 paper
- **When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data**
  Peter Hase, Mohit Bansal; arXiv 2021 paper code
- **Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience**
  George Chrysostomou, Nikolaos Aletras; arXiv 2021 paper code
- **Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates**
  Xiaochuang Han, Yulia Tsvetkov; arXiv 2021 paper code
- **Saliency Guided Experience Packing for Replay in Continual Learning**
  Gobinda Saha, Kaushik Roy; arXiv 2021 paper
  `Notes: leverages saliency for experience replay in continual learning.`
- **What to Learn, and How: Toward Effective Learning from Rationales**
  Samuel Carton, Surya Kanoria, Chenhao Tan; arXiv 2021 paper
- **Supervising Model Attention with Human Explanations for Robust Natural Language Inference**
  Joe Stacey, Yonatan Belinkov, Marek Rei; AAAI 2022 paper code
- **Finding and removing Clever Hans: Using explanation methods to debug and improve deep models**
  Christopher Anders, Leander Weber, David Neumann, Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin; Information Fusion 2022 paper code code
- **Toward Learning Human-aligned Cross-domain Robust Models by Countering Misaligned Features**
  Haohan Wang, Zeyi Huang, Hanlin Zhang, Eric P. Xing; UAI 2022 paper code
- **A survey on improving NLP models with human explanations**
  Mareike Hartmann, Daniel Sonntag; arXiv 2022 paper
- **VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives**
  Zhuofan Ying, Peter Hase, and Mohit Bansal; arXiv 2022 paper code
- **Identifying Spurious Correlations and Correcting them with an Explanation-based Learning**
  Misgina Tsighe Hagos, Kathleen Curran, Brian Mac Namee; arXiv 2022 paper
- **Using Explanations to Guide Models**
  Sukrut Rao, Moritz Böhle, Amin Parchami-Araghi, Bernt Schiele; ICCV 2023 paper code
- **Learning with Explanation Constraints**
  Rattana Pukdee, Dylan Sam, Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar; arXiv 2023 paper
- **Spuriosity Didn't Kill the Classifier: Using Invariant Predictions to Harness Spurious Features**
  Cian Eastwood, Shashank Singh, Andrei Liviu Nicolicioiu, Marin Vlastelica, Julius von Kügelgen, Bernhard Schölkopf; arXiv 2023 paper
- **Spurious features everywhere -- large-scale detection of harmful spurious features in ImageNet**
  Yannic Neuhaus, Maximilian Augustin, Valentyn Boreiko, Matthias Hein; ICCV 2023 paper code
- **Targeted Activation Penalties Help CNNs Ignore Spurious Signals**
  Dekai Zhang, Matt Williams, and Francesca Toni; AAAI 2024 paper code
## Interactive Learning

Approaches that combine supervision on the explanations with interactive machine learning:
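The papers in this section differ in who initiates the interaction and what feedback is collected, but a bare-bones round typically looks like the toy loop below. This is a self-contained, hypothetical sketch (assuming PyTorch and a simulated annotator), not the protocol of any specific paper: the learner queries an uncertain instance, the annotator supplies a label and flags a feature the model should not rely on, and the flags feed an explanation penalty like the one sketched in the Passive Learning section.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic data: only feature 3 determines the label, feature 0 is a spurious shortcut.
n, d = 200, 10
X = torch.randn(n, d)
y = (X[:, 3] > 0).long()
X[:, 0] = y.float() + 0.1 * torch.randn(n)

model = torch.nn.Linear(d, 2)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
labeled, given_labels = [], []    # queried indices and the labels the annotator returned
flagged = torch.zeros(d)          # "do not rely on this feature" mask built from feedback

def annotator(i):
    """Simulated user: returns the true label and flags the shortcut feature."""
    return int(y[i]), {0}

for _ in range(20):               # interaction rounds
    # 1. Query the most uncertain not-yet-labeled instance.
    with torch.no_grad():
        probs = F.softmax(model(X), dim=-1)
        uncertainty = -(probs * probs.log()).sum(-1)
        if labeled:
            uncertainty[labeled] = -1.0
    i = int(uncertainty.argmax())

    # 2. In a real system the user would see the prediction and an explanation here;
    #    the simulated annotator just returns the label and the feature it knows is spurious.
    label, irrelevant = annotator(i)
    labeled.append(i)
    given_labels.append(label)
    for j in irrelevant:
        flagged[j] = 1.0

    # 3. Retrain on all feedback so far, penalizing gradients on flagged features.
    for _ in range(50):
        xb = X[labeled].clone().requires_grad_(True)
        yb = torch.tensor(given_labels)
        logits = model(xb)
        grads = torch.autograd.grad(F.log_softmax(logits, -1).sum(), xb, create_graph=True)[0]
        loss = F.cross_entropy(logits, yb) + (flagged * grads).pow(2).sum(-1).mean()
        opt.zero_grad(); loss.backward(); opt.step()

print("weight on the shortcut feature:", model.weight[:, 0].abs().sum().item())
```

Concrete methods replace each piece: the query strategy, the explanation shown to the user, and how the corrective feedback enters the loss.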
- **Principles of Explanatory Debugging to Personalize Interactive Machine Learning**
  Todd Kulesza, Margaret Burnett, Weng-Keen Wong, Simone Stumpf; IUI 2015 paper
- **Explanatory Interactive Machine Learning**
  Stefano Teso, Kristian Kersting; AIES 2019 paper code
  `Notes: introduces explanatory interactive learning, focuses on the active learning setup.`
- **Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets**
  Stefano Teso; IAL Workshop 2019 paper code
  `Notes: explanatory active learning with self-explainable neural networks.`
- **Making deep neural networks right for the right scientific reasons by interacting with their explanations**
  Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Franziska Herbert, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting; Nature Machine Intelligence 2020 paper code
  `Notes: introduces end-to-end explanatory interactive learning, fixes Clever Hans behavior in deep neural nets.`
- **Embedding Human Knowledge into Deep Neural Network via Attention Map**
  Masahiro Mitsuhara, Hiroshi Fukui, Yusuke Sakashita, Takanori Ogata, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi; arXiv 2019 paper
- **One explanation does not fit all**
  Kacper Sokol, Peter Flach; Künstliche Intelligenz 2020 paper
- **FIND: Human-in-the-loop Debugging Deep Text Classifiers**
  Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni; EMNLP 2020 paper
- **Human-driven FOL explanations of deep learning**
  Gabriele Ciravegna, Francesco Giannini, Marco Gori, Marco Maggini, Stefano Melacci; IJCAI 2020 paper
  `Notes: first-order logic.`
- **Cost-effective Interactive Attention Learning with Neural Attention Process**
  Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang; ICML 2020 paper code
  `Notes: attention, interaction.`
- **Soliciting human-in-the-loop user feedback for interactive machine learning reduces user trust and impressions of model accuracy**
  Donald Honeycutt, Mahsan Nourani, Eric Ragan; AAAI Conference on Human Computation and Crowdsourcing 2020 paper
- **ALICE: Active Learning with Contrastive Natural Language Explanations**
  Weixin Liang, James Zou, Zhou Yu; EMNLP 2020 paper
- **Machine Guides, Human Supervises: Interactive Learning with Global Explanations**
  Teodora Popordanoska, Mohit Kumar, Stefano Teso; arXiv 2020 paper code
  `Notes: introduces narrative bias and explanatory guided learning, focuses on human-initiated interaction and global explanations.`
- **Teaching an Active Learner with Contrastive Examples**
  Chaoqi Wang, Adish Singla, Yuxin Chen; NeurIPS 2021 paper
- **Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations**
  Wolfgang Stammer, Patrick Schramowski, and Kristian Kersting; CVPR 2021 paper code
  `Notes: first-order logic, attention.`
- **Right for Better Reasons: Training Differentiable Models by Constraining their Influence Function**
  Xiaoting Shao, Arseny Skryagin, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting; AAAI 2021 paper
- **User Driven Model Adjustment via Boolean Rule Explanations**
  Elizabeth Daly, Massimiliano Mattetti, Öznur Alkan, Rahul Nair; AAAI 2021 paper
- **Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers**
  Bhavya Ghai, Vera Liao, Yunfeng Zhang, Rachel Bellamy, Klaus Mueller; Proc. ACM Hum.-Comput. Interact. 2021 paper
- **Bandits for Learning to Explain from Explanations**
  Freya Behrens, Stefano Teso, Davide Mottin; XAI Workshop 2021 paper code
  `Notes: preliminary.`
- **HILDIF: Interactive Debugging of NLI Models Using Influence Functions**
  Hugo Zylberajch, Piyawat Lertvittayakumjorn, Francesca Toni; InterNLP Workshop 2021 paper code
- **Refining Neural Networks with Compositional Explanations**
  Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren; arXiv 2021 paper code
- **Interactive Label Cleaning with Example-based Explanations**
  Stefano Teso, Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini; NeurIPS 2021 paper code
- **Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems**
  Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan; AAAI 2022 paper
- **Toward a Unified Framework for Debugging Gray-box Models**
  Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini, Stefano Teso; AAAI-22 Workshop on Interactive Machine Learning paper
- **Active Learning by Acquiring Contrastive Examples**
  Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras; EMNLP 2021 paper code
- **Finding and Fixing Spurious Patterns with Explanations**
  Gregory Plumb, Marco Tulio Ribeiro, Ameet Talwalkar; arXiv 2021 paper
- **Interactively Generating Explanations for Transformer Language Models**
  Patrick Schramowski, Felix Friedrich, Christopher Tauchmann, Kristian Kersting; arXiv 2021 paper
- **Interaction with Explanations in the XAINES Project**
  Mareike Hartmann, Ivana Kruijff-Korbayová, Daniel Sonntag; arXiv 2021 paper
- **A Rationale-Centric Framework for Human-in-the-loop Machine Learning**
  Jinghui Lu, Linyi Yang, Brian Mac Namee, Yue Zhang; ACL 2022 paper code
- **A Typology to Explore and Guide Explanatory Interactive Machine Learning**
  Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting; arXiv 2022 paper
- **CAIPI in Practice: Towards Explainable Interactive Medical Image Classification**
  Emanuel Slany, Yannik Ott, Stephan Scheele, Jan Paulus, Ute Schmid; IFIP International Conference on Artificial Intelligence Applications and Innovations 2022 paper
- **Semantic Interactive Learning for Text Classification: A Constructive Approach for Contextual Interactions**
  Sebastian Kiefer, Mareike Hoffmann, Ute Schmid; Machine Learning and Knowledge Extraction 2022 paper
- **Impact of Feedback Type on Explanatory Interactive Learning**
  Misgina Tsighe Hagos, Kathleen Curran, Brian Mac Namee; ISMIS 2022 paper
- **Leveraging Explanations in Interactive Machine Learning: An Overview**
  Stefano Teso, Öznur Alkan, Wolfgang Stammer, Elizabeth Daly; Frontiers in AI 2023 paper preprint
- **Concept-level Debugging of Part-prototype Networks**
  Andrea Bontempelli, Stefano Teso, Fausto Giunchiglia, Andrea Passerini; ICLR 2023 paper code
- **Learning to Intervene on Concept Bottlenecks**
  David Steinmann, Wolfgang Stammer, Felix Friedrich, Kristian Kersting; arXiv 2023 paper
## Reinforcement Learning

- **Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning**
  Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; Human And Machine in-the-Loop Evaluation and Learning Strategies paper
- **Learning from explanations and demonstrations: A pilot study**
  Silvia Tulli, Sebastian Wallkötter, Ana Paiva, Francisco Melo, Mohamed Chetouani; Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence 2020 paper
- **Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation**
  Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; NeurIPS 2021 pdf
## Distillation

- **Model reconstruction from model explanations**
  Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt; FAccT 2019 paper
- **Evaluating Explanations: How much do explanations from the teacher aid students?**
  Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, and William W. Cohen; arXiv 2020 paper
  `Notes: defines the importance of different kinds of explanations by measuring their impact when used as supervision.`
## Regularization without Supervision

Approaches that regularize the model's explanations in an unsupervised manner, often for improved interpretability.
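In contrast to the supervised penalties above, the methods in this section constrain explanations without human annotations. One of the simplest instances of the idea, again a hypothetical PyTorch sketch rather than any listed paper's exact method, is a plain L2 penalty on input gradients, which pushes the model toward smaller, smoother attributions:

```python
# Minimal sketch of unsupervised explanation regularization:
# no human mask, just an L2 penalty on the input-gradient explanation.
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lambda_reg=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    grads = torch.autograd.grad(
        F.log_softmax(logits, dim=-1).sum(), x, create_graph=True
    )[0]
    # Encourage small input gradients everywhere (no supervision involved).
    return task_loss + lambda_reg * grads.pow(2).sum(dim=-1).mean()
```

The entries below swap in other regularizers (tree-structured surrogates, consistency across augmentations, attribution alignment, and so on), but the overall shape of the objective is similar.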
- **Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients**
  Andrew Ross, Finale Doshi-Velez; AAAI 2018 paper
- **Towards robust interpretability with self-explaining neural networks**
  David Alvarez-Melis, Tommi Jaakkola; NeurIPS 2018 paper
- **Beyond sparsity: Tree regularization of deep models for interpretability**
  Mike Wu, Michael Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2018 paper
- **Regional tree regularization for interpretability in deep neural networks**
  Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2020 paper
- **Regularizing black-box models for improved interpretability**
  Gregory Plumb, Maruan Al-Shedivat, Ángel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar; NeurIPS 2020 paper
- **Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias**
  Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram; CVPR 2020 paper code
- **Trustworthy convolutional neural networks: A gradient penalized-based approach**
  Nicholas Halliwell, Freddy Lecue; arXiv 2020 paper
- **Explainable Models with Consistent Interpretations**
  Vipin Pillai, Hamed Pirsiavash; AAAI 2021 paper code
- **Explanation Consistency Training: Facilitating Consistency-based Semi-supervised Learning with Interpretability**
  Tao Han, Wei-Wei Tu, Yu-Feng Li; AAAI 2021 paper
- **Improving Deep Learning Interpretability by Saliency Guided Training**
  Aya Abdelsalam Ismail, Hector Corrada Bravo, Soheil Feizi; NeurIPS 2021 paper code
- **Generating Deep Networks Explanations with Robust Attribution Alignment**
  Guohang Zeng, Yousef Kowsar, Sarah Erfani, James Bailey; ACML 2021 paper
- **Learning by Self-Explaining**
  Wolfgang Stammer, Felix Friedrich, David Steinmann, Hikaru Shindo, Kristian Kersting; arXiv 2023 paper
## Machine Teaching

- **Interpretable Machine Teaching via Feature Feedback**
  Shihan Su, Yuxin Chen, Oisin Mac Aodha, Pietro Perona, Yisong Yue; Workshop on Teaching Machines, Robots, and Humans 2017 paper
- **Teaching Categories to Human Learners with Visual Explanations**
  Oisin Mac Aodha, Shihan Su, Yuxin Chen, Pietro Perona, Yisong Yue; CVPR 2018 paper
  `Notes: this is *inverse* teaching, i.e., machine teaches human.`
## Applications

- **Improving a neural network model by explanation-guided training for glioma classification based on MRI data**
  Frantisek Sefcik, Wanda Benesova; arXiv 2021 paper
  `Notes: based on layer-wise relevance propagation.`
## Related Works

Explanation-based learning, which focuses on logic-based formalisms and learning strategies:

- **Explanation-based generalization: A unifying view**
  Tom Mitchell, Richard Keller, Smadar Kedar-Cabelli; MLJ 1986 paper
- **Explanation-based learning: An alternative view**
  Gerald DeJong, Raymond Mooney; MLJ 1986 paper
- **Explanation-based learning: A survey of programs and perspectives**
  Thomas Ellman; ACM Computing Surveys 1989 paper
- **Probabilistic explanation based learning**
  Angelika Kimmig, Luc De Raedt, Hannu Toivonen; ECML 2007 paper
Injecting invariances / feature constraints into models:

- **Tangent Prop - A formalism for specifying selected invariances in an adaptive network**
  Patrice Simard, Bernard Victorri, Yann Le Cun, John Denker; NeurIPS 1992 paper
  `Notes: injects invariances into a neural net by regularizing its gradient; precursor to learning from gradient-based explanations.`
- **Training invariant support vector machines**
  Dennis DeCoste, Bernhard Schölkopf; MLJ 2002 paper
- **The constrained weight space svm: learning with ranked features**
  Kevin Small, Byron Wallace, Carla Brodley, Thomas Trikalinos; ICML 2011 paper
Dual label-feature feedback:

- **Active learning with feedback on features and instances**
  Hema Raghavan, Omid Madani, Rosie Jones; JMLR 2006 paper
- **An interactive algorithm for asking and incorporating feature feedback into support vector machines**
  Hema Raghavan, James Allan; ACM SIGIR 2007 paper
- **Learning from labeled features using generalized expectation criteria**
  Gregory Druck, Gideon Mann, Andrew McCallum; ACM SIGIR 2008 paper
- **Active learning by labeling features**
  Gregory Druck, Burr Settles, Andrew McCallum; EMNLP 2009 paper
- **A unified approach to active dual supervision for labeling features and examples**
  Josh Attenberg, Prem Melville, Foster Provost; ECML-PKDD 2010 paper
- **Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances**
  Burr Settles; EMNLP 2011 paper
- **Learning from discriminative feature feedback**
  Sanjoy Dasgupta, Akansha Dey, Nicholas Roberts, Sivan Sabato; NeurIPS 2018 paper
- **Robust Learning from Discriminative Feature Feedback**
  Sanjoy Dasgupta, Sivan Sabato; AISTATS 2020 paper
- **Practical Benefits of Feature Feedback Under Distribution Shift**
  Anurag Katakkar, Weiqin Wang, Clay Yoo, Zachary Lipton, Divyansh Kaushik; arXiv 2021 paper
Learning from rationales:

- **Using “annotator rationales” to improve machine learning for text categorization**
  Omar Zaidan, Jason Eisner, Christine Piatko; NAACL 2007 paper
- **Modeling annotators: A generative approach to learning from annotator rationales**
  Omar Zaidan, Jason Eisner; EMNLP 2008 paper
- **Active learning with rationales for text classification**
  Manali Sharma, Di Zhuang, Mustafa Bilgic; NAACL 2015 paper
Counterfactual augmentation:

- **Learning The Difference That Makes A Difference With Counterfactually-Augmented Data**
  Divyansh Kaushik, Eduard Hovy, Zachary Lipton; ICLR 2019 paper code
- **Explaining the Efficacy of Counterfactually Augmented Data**
  Divyansh Kaushik, Amrith Setlur, Eduard H. Hovy, Zachary Lipton; ICLR 2021 paper code
- **An Investigation of the (In)effectiveness of Counterfactually-augmented Data**
  Nitish Joshi, He He; arXiv 2021 paper
Critiquing in recommenders:

- **Critiquing-based recommenders: survey and emerging trends**
  Li Chen, Pearl Pu; User Modeling and User-Adapted Interaction 2012 paper
- **Coactive critiquing: Elicitation of preferences and features**
  Stefano Teso, Paolo Dragone, Andrea Passerini; AAAI 2017 paper
Gray-box models:

- **Concept bottleneck models**
  Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang; ICML 2020 paper
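For context, a gray-box model in this sense exposes an intermediate, human-interpretable layer that supervision and corrections can target directly. Below is a compact, hypothetical PyTorch sketch of a concept bottleneck model in that spirit (jointly trained on concept and label annotations); the actual implementation of Koh et al. differs in architecture and training details, and all names here are illustrative.

```python
# Compact sketch: input -> predicted concepts -> label, so feedback and
# interventions can be applied at the concept layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim, n_concepts, n_classes):
        super().__init__()
        self.concept_net = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_concepts))
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        c_logits = self.concept_net(x)                       # predicted concepts
        y_logits = self.label_net(torch.sigmoid(c_logits))   # label from concepts only
        return c_logits, y_logits

def joint_loss(model, x, c_true, y_true, alpha=1.0):
    # c_true: float tensor of 0/1 concept annotations; y_true: long tensor of labels.
    c_logits, y_logits = model(x)
    return F.cross_entropy(y_logits, y_true) + alpha * F.binary_cross_entropy_with_logits(c_logits, c_true)
```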
## Resources

A selection of general resources on Explainable AI, focusing on overviews, surveys, societal implications, and critiques:
- **Survey and critique of techniques for extracting rules from trained artificial neural networks**
  Robert Andrews, Joachim Diederich, Alan B. Tickle; Knowledge-Based Systems 1995 page
- **Toward harnessing user feedback for machine learning**
  Simone Stumpf, Vidya Rajaram, Lida Li, Margaret Burnett, Thomas Dietterich, Erin Sullivan, Russell Drummond, Jonathan Herlocker; IUI 2007 paper
- **The Mythos of Model Interpretability**
  Zachary Lipton; CACM 2016 paper
- **A survey of methods for explaining black box models**
  Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi; ACM Computing Surveys 2018 paper
- **Sanity checks for saliency maps**
  Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim; NeurIPS 2018 paper code
- **Recognition in terra incognita**
  Sara Beery, Grant Van Horn, Pietro Perona; ECCV 2018 paper
- **Explanation in Artificial Intelligence: Insights from the Social Sciences**
  Tim Miller; Artificial Intelligence 2019 paper
- **Unmasking Clever Hans predictors and assessing what machines really learn**
  Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller; Nature Communications 2019 paper
- **Interpretation of neural networks is fragile**
  Amirata Ghorbani, Abubakar Abid, James Zou; AAAI 2019 paper
- **A Benchmark for Interpretability Methods in Deep Neural Networks**
  Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim; NeurIPS 2019 paper code
- **Is Attention Interpretable?**
  Sofia Serrano, Noah A. Smith; ACL 2019 paper
- **Attention is not Explanation**
  Sarthak Jain, Byron C. Wallace; NAACL 2019 paper
- **Attention is not not Explanation**
  Sarah Wiegreffe, Yuval Pinter; EMNLP-IJCNLP 2019 paper
- **The (un)reliability of saliency methods**
  Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim; Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 2019 paper
- **Explanations can be manipulated and geometry is to blame**
  Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel; NeurIPS 2019 paper
- **Fooling Neural Network Interpretations via Adversarial Model Manipulation**
  Juyeon Heo, Sunghwan Joo, and Taesup Moon; NeurIPS 2019 paper
- **Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead**
  Cynthia Rudin; Nature Machine Intelligence 2019 page
- **The Principles and Limits of Algorithm-in-the-loop Decision Making**
  Ben Green, Yiling Chen; PACM HCI 2019 paper
- **Shortcut learning in deep neural networks**
  Robert Geirhos, Jorn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, Felix Wichmann; Nature Machine Intelligence 2020 page
- **When Explanations Lie: Why Many Modified BP Attributions Fail**
  Leon Sixt, Maximilian Granz, Tim Landgraf; ICML 2020 paper
- **The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?**
  Jasmijn Bastings, Katja Filippova; Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2020 paper
- **Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural Models**
  Christopher Grimsley, Elijah Mayfield, Julia Bursten; Language Resources and Evaluation Conference 2020 paper
- **AI for radiographic COVID-19 detection selects shortcuts over signal**
  Alex DeGrave, Joseph Janizek, Su-In Lee; Nature Machine Intelligence 2021 paper code
- **How Well do Feature Visualizations Support Causal Understanding of CNN Activations?**
  Roland Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas Wallis, Wieland Brendel; arXiv 2021 paper
- **Post hoc explanations may be ineffective for detecting unknown spurious correlation**
  Julius Adebayo, Michael Muelly, Harold Abelson, and Been Kim; ICLR 2022 paper code
- **Where is the Truth? The Risk of Getting Confounded in a Continual World**
  Florian Peter Busch, Roshni Kamath, Rupert Mitchell, Wolfgang Stammer, Kristian Kersting, Martin Mundt
- **Multimodal explanations: Justifying decisions and pointing to the evidence**
  Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach; CVPR 2018 paper
- **Learning Deep Attribution Priors Based On Prior Knowledge**
  Ethan Weinberger, Joseph Janizek, Su-In Lee; NeurIPS 2020 paper
- Crawl & reference work on NLP.
This list is directly inspired by all the awesome awesome lists out there!