Introductory

2020
Emile van Krieken, Erman Acar, and Frank van Harmelen. 2/14/2020. “Analyzing Differentiable Fuzzy Logic Operators”.
In recent years there has been a push to integrate symbolic AI and deep learning, as it is argued that the strengths and weaknesses of these approaches are complementary. One such trend in the literature is the use of weakly supervised learning techniques based on operators from fuzzy logics. These techniques employ prior background knowledge described in logic to benefit the training of a neural network from unlabeled and noisy data. By interpreting logical symbols using neural networks, this background knowledge can be added to regular loss functions used in deep learning to integrate reasoning and learning. In this paper, we analyze how a large collection of logical operators from the fuzzy logic literature behave in a differentiable setting. We find large differences between the formal properties of these operators that are of crucial importance in a differentiable learning setting. We show that many of these operators, including some of the best known, are highly unsuitable for use in a differentiable learning setting. A further finding concerns the treatment of implication in these fuzzy logics, with a strong imbalance between gradients driven by the antecedent and the consequent of the implication. Finally, we empirically show that it is possible to use Differentiable Fuzzy Logics for semi-supervised learning. However, to achieve the most significant performance improvement over a supervised baseline, we have to resort to non-standard combinations of logical operators which perform well in learning, but which no longer satisfy the usual logical laws. We end with a discussion on extensions to large-scale problems.
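To make the mechanism concrete, here is a minimal PyTorch sketch (not the authors' code) of how a logical rule becomes an extra loss term: predicate truth values come from a network, fuzzy operators score the rule, and violations are penalized. The example rule, the tensors standing in for network outputs, and the choice of product t-norm, Reichenbach implication, and mean aggregation are all illustrative; the paper compares many such operator choices.

    import torch

    # Truth values of ground atoms as produced by a neural network
    # (random placeholders here, standing in for network outputs).
    bird  = torch.tensor([0.9, 0.2, 0.8], requires_grad=True)
    small = torch.tensor([0.8, 0.9, 0.1], requires_grad=True)
    flies = torch.tensor([0.7, 0.1, 0.3], requires_grad=True)

    def t_norm(a, b):               # fuzzy conjunction: product t-norm
        return a * b

    def implication(a, c):          # Reichenbach implication: 1 - a + a*c
        return 1.0 - a + a * c

    # Truth of the rule  forall x: bird(x) AND small(x) -> flies(x),
    # aggregated over the batch by the mean (one of several aggregators).
    rule_truth = implication(t_norm(bird, small), flies).mean()

    logic_loss = 1.0 - rule_truth   # penalize violations of the background knowledge
    logic_loss.backward()

    # Antecedent and consequent receive different gradient signals, which is
    # the kind of imbalance the paper analyzes for each operator.
    print(bird.grad, flies.grad)

In a full model this term would simply be added to the ordinary supervised loss computed on the labeled part of the data.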
Martín Abadi and Gordon Plotkin. 2/2020. “A Simple Differentiable Programming Language.” PACMPL, POPL.
Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the mathematical notion of derivative is not always entirely clear—discrepancies can arise, sometimes inadvertently. In order to study automatic differentiation in such programming contexts, we define a small but expressive programming language that includes a construct for reverse-mode differentiation. We give operational and denotational semantics for this language. The operational semantics employs popular implementation techniques, while the denotational semantics employs notions of differentiation familiar from real analysis. We establish that these semantics coincide.
2019
DeepMind. 11/2019. “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model”.
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here, we introduce an algorithm based solely on reinforcement learning, without human data, guidance, or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.
Fei Wang, Daniel Zheng, James Decker, Xilun Wu, Gregory Essertel, and Tiark Rompf. 8/2019. “Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator.” PACMPL, ICFP.
Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests in crucial ways on gradient-descent optimization and the ability to learn parameters of a neural network by backpropagating observed errors. However, neural network architectures are growing increasingly sophisticated and diverse, which motivates an emerging quest for even more general forms of differentiable programming, where arbitrary parameterized computations can be trained by gradient descent. In this paper, we take a fresh look at automatic differentiation (AD) techniques, and especially aim to demystify the reverse-mode form of AD that generalizes backpropagation in neural networks.
We uncover a tight connection between reverse-mode AD and delimited continuations, which permits implementing reverse-mode AD purely via operator overloading and without any auxiliary data structures. We further show how this formulation of AD can be fruitfully combined with multi-stage programming (staging), leading to a highly efficient implementation that combines the performance benefits of deep learning frameworks based on explicit reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch).      
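The following Python sketch illustrates the continuation idea on a toy scale (the paper works in Scala, where the continuations are captured implicitly with shift/reset and combined with staging; here each operator takes its continuation explicitly, and the class and function names are invented for the example).

    class Num:
        """A value together with an accumulated adjoint (gradient)."""
        def __init__(self, x):
            self.x = x      # value
            self.d = 0.0    # adjoint, filled in on the way back

    def add(a, b, k):
        """a + b in continuation-passing style: run the rest of the program (k),
        then push the resulting adjoint back to both operands."""
        c = Num(a.x + b.x)
        k(c)
        a.d += c.d
        b.d += c.d

    def mul(a, b, k):
        c = Num(a.x * b.x)
        k(c)
        a.d += b.x * c.d
        b.d += a.x * c.d

    def grad(f, x0):
        """Differentiate a CPS-style function f at x0."""
        x = Num(x0)
        def top(result):        # the 'reset' boundary: seed the output adjoint
            result.d = 1.0
        f(x, top)
        return x.d

    # f(x) = x * x + x, written in CPS; its derivative is 2*x + 1.
    def f(x, k):
        mul(x, x, lambda t: add(t, x, k))

    print(grad(f, 3.0))   # 7.0

Running the remainder of the computation inside each operator is what lets adjoints be propagated immediately afterwards, with no explicit tape or graph data structure.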
Luc De Raedt, Robin Manhaeve, Sebastijan Dumancic, Thomas Demeester, and Angelika Kimmig. 2019. “Neuro-Symbolic = Neural + Logical + Probabilistic.” In NeSy @ IJCAI.
The overall goal of neuro-symbolic computation is to integrate high-level reasoning with low-level perception. We argue 1) that neuro-symbolic computation should integrate neural networks with the two most prominent methods for reasoning, that is, logic and probability, and 2) that neuro-symbolic integrated methods should have the pure neural, logical and probabilistic methods as special cases. We examine the state-of-the-art with regard to these claims and briefly position our own contribution DeepProbLog in this perspective.
Mukund Raghothaman, Kihong Heo, Xujie Si, and Mayur Naik. 2019. “Synthesizing Datalog Programs using Numerical Relaxation.” IJCAI.
The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems, or performing parameter estimation in complex statistical models.
In this paper, we present Difflog, a technique to extend the logic programming language Datalog to the continuous setting. By attaching real-valued weights to individual rules of a Datalog program, we naturally associate numerical values with individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can now first determine the rule weights which cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum.
We evaluate Difflog on a suite of 34 benchmark problems from recent literature in knowledge discovery, formal verification, and database query-by-example, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.
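As a rough illustration of the relaxation (an informal sketch, not Difflog's implementation), the snippet below evaluates a weighted transitive-closure program, scoring each derived tuple by its best derivation; fitting the weights against labeled output tuples and rounding the optimum back to a discrete program is then the learning step described above. The facts, weights, and max-product scoring are illustrative choices.

    # Hypothetical weighted-Datalog program in the spirit of Difflog:
    #   r1 (w1): path(X,Y) :- edge(X,Y).
    #   r2 (w2): path(X,Z) :- path(X,Y), edge(Y,Z).
    # The value of a derived tuple is the weight of its best derivation
    # (product of the weights of the rules used), computed as a fixpoint.

    edges = {("a", "b"), ("b", "c"), ("c", "d")}
    w1, w2 = 0.9, 0.8      # real-valued rule weights, to be fitted from labels

    def derive(edges, w1, w2):
        val = {}                              # path tuple -> value in [0, 1]
        for (x, y) in edges:                  # base rule r1
            val[(x, y)] = max(val.get((x, y), 0.0), w1)
        changed = True
        while changed:                        # recursive rule r2, to fixpoint
            changed = False
            for (x, y), v in list(val.items()):
                for (y2, z) in edges:
                    if y2 == y and v * w2 > val.get((x, z), 0.0) + 1e-12:
                        val[(x, z)] = v * w2
                        changed = True
        return val

    print(derive(edges, w1, w2))
    # {('a','b'): 0.9, ('b','c'): 0.9, ('c','d'): 0.9,
    #  ('a','c'): 0.72, ('b','d'): 0.72, ('a','d'): 0.576}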
2018
Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. 12/2018. “DeepProbLog: Neural Probabilistic Logic Programming.” NeurIPS.
We introduce DeepProbLog, a probabilistic logic programming language that incorporates deep learning by means of neural predicates. We show how existing inference and learning techniques can be adapted for the new language. Our experiments demonstrate that DeepProbLog supports both symbolic and subsymbolic representations and inference, 1) program induction, 2) probabilistic (logic) programming, and 3) (deep) learning from examples. To the best of our knowledge, this work is the first to propose a framework where general-purpose neural networks and expressive probabilistic-logical modeling and reasoning are integrated in a way that exploits the full expressiveness and strengths of both worlds and can be trained end-to-end based on examples.      
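A small PyTorch sketch of what a neural predicate buys you, modeled on the paper's MNIST-addition task but not using the DeepProbLog API: the logits stand in for a digit classifier's outputs, and the probability of the query addition(Img1, Img2, 7) is the sum over all proofs, i.e. over all digit pairs whose sum is 7, so gradients flow back into the classifier.

    import torch

    logits1 = torch.randn(10, requires_grad=True)   # stand-ins for CNN outputs
    logits2 = torch.randn(10, requires_grad=True)

    def prob_sum_equals(logits1, logits2, target):
        p1 = torch.softmax(logits1, dim=0)          # neural predicate digit(Img1, D1)
        p2 = torch.softmax(logits2, dim=0)          # neural predicate digit(Img2, D2)
        total = 0.0
        for d1 in range(10):                        # enumerate proofs of the query
            d2 = target - d1
            if 0 <= d2 <= 9:
                total = total + p1[d1] * p2[d2]
        return total

    loss = -torch.log(prob_sum_equals(logits1, logits2, target=7))
    loss.backward()                                 # gradients reach the "CNN" outputs
    print(logits1.grad.shape)                       # torch.Size([10])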
Noah D. Goodman and Joshua B. Tenenbaum. 11/13/2018. Probabilistic Models of Cognition. 2nd ed.
This book explores the probabilistic approach to cognitive science, which models learning and reasoning as inference in complex probabilistic models. We examine how a broad range of empirical phenomena, including intuitive physics, concept learning, causal reasoning, social cognition, and language understanding, can be modeled using probabilistic programs (using the WebPPL language).
Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, and Frank Wood. 9/2018. “An Introduction to Probabilistic Programming”.
This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages.
We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs.
In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller.
This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.
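As a taste of the foundational inference algorithms covered in the first part, here is a likelihood-weighting sketch in Python (the model and names are illustrative, not the book's notation): sample statements draw from the prior, observe statements weight the trace by the data likelihood, and posterior expectations are weighted averages over traces.

    import random, math

    # Hypothetical model: bias ~ Uniform(0, 1); observe 9 heads in 10 flips;
    # query the posterior mean of bias.
    def model_and_weight():
        bias = random.random()                    # sample statement
        heads, flips = 9, 10
        # observe statement: weight the trace by the likelihood of the data
        log_w = (heads * math.log(bias + 1e-12)
                 + (flips - heads) * math.log(1 - bias + 1e-12))
        return bias, math.exp(log_w)

    samples = [model_and_weight() for _ in range(100_000)]
    total_w = sum(w for _, w in samples)
    posterior_mean = sum(b * w for b, w in samples) / total_w
    print(round(posterior_mean, 3))               # about 10/12 = 0.833, the Beta(10, 2) mean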
Conal Elliott. 8/2018. “The simple essence of automatic differentiation.” PACMPL, ICFP.
Automatic differentiation (AD) in reverse mode (RAD) is a central component of deep learning and other uses of large-scale optimization. Commonly used RAD algorithms such as backpropagation, however, are complex and stateful, hindering deep understanding, improvement, and parallel execution. This paper develops a simple, generalized AD algorithm calculated from a simple, natural specification. The general algorithm is then specialized by varying the representation of derivatives. In particular, applying well-known constructions to a naive representation yields two RAD algorithms that are far simpler than previously known. In contrast to commonly used RAD implementations, the algorithms defined here involve no graphs, tapes, variables, partial derivatives, or mutation. They are inherently parallel-friendly, correct by construction, and usable directly from an existing programming language with no need for new data types or programming style, thanks to use of an AD-agnostic compiler plugin.      
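The core of the construction can be sketched in a few lines of Python (scalars only; the paper works in Haskell and derives the general, category-theoretic version): a differentiable function maps a point to its value together with its derivative as a linear map, and composition is literally the chain rule. The names below are illustrative.

    import math

    # Represent a differentiable function as: point -> (value, derivative as a
    # linear map). Composition implements D(g.f)(a) = Dg(f a) . Df(a).
    def compose(g, f):
        def h(a):
            b, df = f(a)
            c, dg = g(b)
            return c, (lambda da: dg(df(da)))
        return h

    def d_sqr(a):  return a * a,       (lambda da: 2 * a * da)
    def d_sin(a):  return math.sin(a), (lambda da: math.cos(a) * da)

    h = compose(d_sin, d_sqr)          # h(x) = sin(x * x)
    value, deriv = h(1.5)
    print(value, deriv(1.0))           # sin(2.25) and cos(2.25) * 2 * 1.5

Reverse mode then falls out of choosing a different representation for the linear-map component, which is the specialization step the paper carries out.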
Armando Solar-Lezama. 2018. Introduction to Program Synthesis.
Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. Cambridge, MA: MIT Press.
2017
Finale Doshi-Velez and Been Kim. 3/2017. “Towards A Rigorous Science of Interpretable Machine Learning”.
As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.
Owain Evans, Andreas Stuhlmüller, John Salvatier, and Daniel Filan. 2017. “Modeling Agents with Probabilistic Programs”.

This book describes and implements models of rational agents for (PO)MDPs and Reinforcement Learning. One motivation is to create richer models of human planning, which capture human biases and bounded rationality.

Agents are implemented as differentiable functional programs in a probabilistic programming language based on Javascript. Agents plan by recursively simulating their future selves or by simulating their opponents in multi-agent games. Our agents and environments run directly in the browser and are easy to modify and extend.

The book assumes basic programming experience but is otherwise self-contained. It includes short introductions to “planning as inference”, MDPs, POMDPs, inverse reinforcement learning, hyperbolic discounting, myopic planning, and multi-agent planning.
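A minimal flavor of "planning by recursively simulating one's future self", in Python rather than the book's WebPPL, with a made-up five-state line world and a hard-maximizing agent (the book's agents are softmax-rational and probabilistic, but the recursive structure is the same):

    ACTIONS = (-1, +1)                      # move left or right on states 0..4

    def transition(state, action):
        return max(0, min(4, state + action))

    def reward(state):
        return 1.0 if state == 4 else 0.0   # only the rightmost state pays off

    def value(state, steps_left):
        """Expected utility of behaving optimally from `state` onwards."""
        if steps_left == 0:
            return reward(state)
        return reward(state) + max(
            value(transition(state, a), steps_left - 1) for a in ACTIONS)

    def act(state, steps_left):
        """Choose an action by simulating the future self after each action."""
        return max(ACTIONS,
                   key=lambda a: value(transition(state, a), steps_left - 1))

    print(act(1, 4))   # +1: moving right eventually reaches the rewarding state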

Matko Bosnjak, Tim Rocktäschel, Jason Naradowsky, and Sebastian Riedel. 2017. “Programming with a Differentiable Forth Interpreter.” ICML.
Given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. In this paper, we consider the case of prior procedural knowledge for neural networks, such as knowing how a program should traverse a sequence, but not what local actions should be performed at each step. To this end, we present an end-to-end differentiable interpreter for the programming language Forth which enables programmers to write program sketches with slots that can be filled with behaviour trained from program input-output data. We can optimise this behaviour directly through gradient descent techniques on user-specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex behaviours such as sequence sorting and addition. When connected to outputs of an LSTM and trained jointly, our interpreter achieves state-of-the-art accuracy for end-to-end reasoning about quantities expressed in natural language stories.      
2015
Stephen Muggleton and Dianhuan Lin. 2015. “Meta-Interpretive Learning of Higher-Order Dyadic Datalog: Predicate invention revisited.” IJCAI.
In recent years Predicate Invention has been underexplored within Inductive Logic Programming due to difficulties in formulating efficient search mechanisms. However, a recent paper demonstrated that both predicate invention and the learning of recursion can be efficiently implemented for regular and context-free grammars, by way of abduction with respect to a meta-interpreter. New predicate symbols are introduced as constants representing existentially quantified higher-order variables. In this paper we generalise the approach of Meta-Interpretive Learning (MIL) to that of learning higher-order dyadic datalog programs. We show that with an infinite signature the higher-order dyadic datalog class H²₂ has universal Turing expressivity, though H²₂ is decidable given a finite signature. Additionally we show that Knuth-Bendix ordering of the hypothesis space together with logarithmic clause bounding allows our Dyadic MIL implementation MetagolD to PAC-learn minimal cardinality H²₂ definitions. This result is consistent with our experiments, which indicate that MetagolD efficiently learns compact H²₂ definitions involving predicate invention for robotic strategies and higher-order concepts in the NELL language learning domain.
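To illustrate the central move on a toy scale (this is not Metagol, and there is no predicate invention, recursion, or ordering constraint here), the sketch below instantiates the higher-order chain metarule P(X,Y) :- Q(X,Z), R(Z,Y) against background facts so that it covers the positive examples; the facts and names are made up.

    background = {
        "parent": {("ann", "bob"), ("bob", "cat"), ("bob", "dan")},
    }
    positive = {("grandparent", ("ann", "cat")), ("grandparent", ("ann", "dan"))}

    def chain_covers(q, r, pair):
        """Does P(x, y) :- q(x, z), r(z, y) derive the example pair (x, y)?"""
        x, y = pair
        return any((x, z) in background[q] and (z, y) in background[r]
                   for z in {b for (_, b) in background[q]})

    def learn(target, positive):
        predicates = list(background)
        for q in predicates:              # search second-order substitutions
            for r in predicates:
                if all(chain_covers(q, r, pair)
                       for (t, pair) in positive if t == target):
                    return f"{target}(X,Y) :- {q}(X,Z), {r}(Z,Y)."
        return None

    print(learn("grandparent", positive))
    # grandparent(X,Y) :- parent(X,Z), parent(Z,Y).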
2013
Andreas Stuhlmüller and Noah D. Goodman. 2013. “Reasoning about Reasoning by Nested Conditioning: Modeling Theory of Mind with Probabilistic Programs.” Cognitive Systems Research.
A wide range of human reasoning patterns can be explained as conditioning in probabilistic models; however, conditioning has traditionally been viewed as an operation applied to such models, not represented in such models. We describe how probabilistic programs can explicitly represent conditioning as part of a model. This enables us to describe reasoning about others’ reasoning using nested conditioning. Much of human reasoning is about the beliefs, desires, and intentions of other people; we use probabilistic programs to formalize these inferences in a way that captures the flexibility and inherent uncertainty of reasoning about other agents. We express examples from game theory, artificial intelligence, and linguistics as recursive probabilistic programs and illustrate how this representation language makes it easy to explore new directions in each of these fields. We discuss the algorithmic challenges posed by these kinds of models and describe how Dynamic Programming techniques can help address these challenges.
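A compact Python rendering of nested conditioning (the paper uses Church; the Schelling-style coordination story and the numbers below are illustrative): each agent's choice distribution is defined by conditioning on a simulation of the other agent, who in turn simulates them, down to a base case.

    LOCATIONS = ("popular_bar", "quiet_bar")
    PRIOR = {"popular_bar": 0.6, "quiet_bar": 0.4}   # shared prior preference

    def agent(depth):
        """Distribution over locations for an agent reasoning to `depth`."""
        if depth == 0:
            return dict(PRIOR)
        other = agent(depth - 1)                     # simulate the other agent
        # Condition on meeting: P(loc) is proportional to prior(loc) * P(other at loc)
        joint = {loc: PRIOR[loc] * other[loc] for loc in LOCATIONS}
        z = sum(joint.values())
        return {loc: p / z for loc, p in joint.items()}

    for depth in range(4):
        print(depth, agent(depth))
    # The choices concentrate on the popular bar as the depth of
    # reasoning-about-reasoning increases.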
1980
Randall Davis. 3/1980. “Meta-Rules: Reasoning About Control.” Artificial Intelligence.
How can we insure that knowledge embedded in a program is applied effectively? Traditionally the answer to this question has been sought in different problem solving paradigms and in different approaches to encoding and indexing knowledge. Each of these is useful with a certain variety of problem, but they all share a common problem: they become ineffective in the face of a sufficiently large knowledge base. How then can we make it possible for a system to continue to function in the face of a very large number of plausibly useful chunks of knowledge? In response to this question we propose a framework for viewing issues of knowledge indexing and retrieval, a framework that includes what appears to be a useful perspective on the concept of a strategy. We view strategies as a means of controlling invocation in situations where traditional selection mechanisms become ineffective. We examine ways to effect such control, and describe meta-rules, a means of specifying strategies which offers a number of advantages. We consider at some length how and when it is useful to reason about control, and explore the advantages meta-rules offer for doing this.
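A toy sketch of the distinction the paper draws between object-level rules and meta-rules that reason about their invocation (the rules, tags, and situation below are invented for illustration): the meta-rules never solve the problem themselves, they only prune and reorder the candidate object-level rules before one is fired.

    object_rules = [
        {"name": "take_bus",  "tags": {"cheap"},            "action": "ride the bus"},
        {"name": "walk",      "tags": {"cheap", "outdoor"}, "action": "walk"},
        {"name": "take_taxi", "tags": {"fast"},             "action": "call a taxi"},
    ]

    def metarule_avoid_outdoor(rules, situation):
        """If it is raining, rules tagged 'outdoor' are unlikely to be useful."""
        if situation.get("raining"):
            return [r for r in rules if "outdoor" not in r["tags"]]
        return rules

    def metarule_prefer_fast(rules, situation):
        """If in a hurry, try 'fast' rules before the others."""
        if situation.get("hurry"):
            return sorted(rules, key=lambda r: "fast" not in r["tags"])
        return rules

    def select_rule(rules, metarules, situation):
        for m in metarules:              # reason about control before invoking
            rules = m(rules, situation)
        return rules[0]["action"] if rules else None

    print(select_rule(object_rules,
                      [metarule_avoid_outdoor, metarule_prefer_fast],
                      {"raining": True, "hurry": True}))   # call a taxi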
1978
Richard W. Weyhrauch. 1978. Prolegomena to a Theory of Mechanized Formal Reasoning. Stanford.
This is an informal description of my ideas about using formal logic as a tool for reasoning systems using computers. The theoretical ideas are illustrated by the features of FOL. All of the examples presented have actually run using the FOL system.