Datalog: An Introduction to Logic Programming
filed under: #scheme #programming #computerScience #AI
This blog post is based on a lecture that I gave at SUNY Oneonta in September 2016 titled Some fixed-point theory and its applications to logic programming.
Datalog is a declarative logic programming language that was invented back in the 1970s. Declarative programming refers to a paradigm of programming where the logic of the computation is expressed without describing any control flow. This is vastly different from languages such as C++, Python, or Java, where control flow plays a central part. One caveat, however: Datalog is not Turing-complete, since every Datalog program is guaranteed to halt, a fact we shall prove in this post. Datalog is a good starting point if you're interested in how things work in the logic programming or knowledge representation worlds. After all, it's what I started with when I took a course with Pascal Hitzler in my undergraduate years. In this post, I'll present all the algorithms in Scheme.
Theoretical Background of Partial Orderings
Recall that $(L, \sqsubseteq)$ is a partial order if the relation $\sqsubseteq$ has the following properties:
- For $x \in L$, $x \sqsubseteq x$.
- For $x, y \in L$, if $x \sqsubseteq y$ and $y \sqsubseteq x$, then $x = y$.
- For $x, y, z \in L$, if $x \sqsubseteq y$ and $y \sqsubseteq z$, then $x \sqsubseteq z$.
The partial order $(L, \sqsubseteq)$ is called a complete partial order (cpo) if $L$ has a least element $\perp$ (that is, $\perp \sqsubseteq x$ for all $x \in L$), and every $\omega$-chain $x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \cdots$ has a supremum (least upper bound) in $L$. A subset $D \subseteq L$ is called a directed set if for any $a$ and $b$ in $D$, there exists a $c \in D$ with $a \sqsubseteq c$ and $b \sqsubseteq c$.
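For a concrete, standard example, take the power set of any set $S$ ordered by inclusion $\subseteq$:
$$\bot = \varnothing, \qquad \bigsqcup \{X_1 \subseteq X_2 \subseteq \cdots\} = \bigcup_{i \ge 1} X_i.$$
This is exactly the structure we will put on Herbrand interpretations later in the post.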
Let $L_1$ and $L_2$ be two cpo's. A function $f \colon L_1 \to L_2$ is called Scott-continuous if for every directed subset $D$ of $L_1$ with a supremum $\bigsqcup D$ in $L_1$, we have $\bigsqcup f(D) = f(\bigsqcup D)$. The result we will need is the Kleene fixed-point theorem: if $L$ is a cpo and $f \colon L \to L$ is Scott-continuous, then $f$ has a least fixed point, namely the supremum of the chain $\perp \sqsubseteq f(\perp) \sqsubseteq f(f(\perp)) \sqsubseteq \cdots$. This least fixed point is also the least pre-fixed point of $f$ (a point $x$ with $f(x) \sqsubseteq x$), and the proof is constructive: it builds the least fixed point by iterating $f$ starting from $\perp$, which is exactly the algorithm we'll implement later.
Datalog Languages
A Datalog language $(V,C,R)$ consists of the following:
- A finite set of variables $V$. (Elements are usually denoted by $X$, $Y$, $Z$, $X_1$, $X_2$, etc.)
- A finite non-empty set of constants $C$. (Elements are denoted by $a$, $b$, $c$, etc.)
- A finite non-empty set $R$ of predicate symbols with each predicate having some arity $n > 0$.
This is a little abstract, so let's look at an example. The first Datalog language we will look at is called Graph. Graph has the following:
- $V = \{X, Y, Z\}$,
- $C = \{a, b, c, d\}$, and
- $R = \{\texttt{edge}(2), \texttt{path}(2)\}$.
You can think of $C$ as the set of vertices of some graph, while edge is the relation that tells us if there is an edge between two vertices, and path is a relation that tells us if there is a path between two vertices.
Next, let's see how we can implement a Datalog language in Scheme. First, a language itself will be a list holding three elements: variables, constants, and relations. Second, variables and constants can be implemented as lists of symbols. Lastly, a relation will be a cons pair, (symbol . number): the symbol is the name of the relation, while the number holds the arity of the relation. Finally, let's see what the above example looks like in Scheme.
(define Graph
  '((X Y Z)                   ;; variables
    (a b c d)                 ;; constants
    ((edge . 2) (path . 2)))) ;; relations with their arities
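We'll also want accessors that pull these three pieces back out of a language. Here is one possible minimal set of definitions (the later code only refers to constants by name; the other two names are simply convenient choices):
(define (variables language) (car language))   ;; list of variable symbols
(define (constants language) (cadr language))  ;; list of constant symbols
(define (relations language) (caddr language)) ;; list of (name . arity) pairs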
Datalog Rules and Programs
Now that we have defined a Datalog language, we need a few more definitions before we can do anything else. Let $L$ be a fixed Datalog language that we'll use for the following definitions.
The first definition will be an atom. An atom is of the form $P(V_1, V_2, \dots, V_n)$ where $P$ is a relation with arity $n$, and $V_1, V_2, \dots, V_n \in C \cup V$. A ground atom $P(V_1, V_2, \dots, V_n)$ is an atom with $V_i \in C$ for $i = 1, 2, \dots, n$. Examples of ground atoms would be $\texttt{edge}(a,b)$ or $\texttt{path}(a,b)$. We can encode these two examples easily in Scheme by doing the following:
'(edge a b)
'(path a b)
Next up are Datalog rules. A Datalog rule is a statement of the form (read the $\wedge$ as "and")
$$A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n,$$
where $A$ and $B_1, B_2, \dots, B_n$ are atoms. $A$ is called the head of the rule, and $B_1 \wedge B_2 \wedge \cdots \wedge B_n$ is called the body. A rule with an empty body is called a fact.
We can encode a Datalog rule as a Scheme list whose first element is the head atom and whose remaining elements are the body atoms:
'(A B_1 B_2 ... B_N)
(define (head rule) (car rule)) ;; the atom being deduced
(define (body rule) (cdr rule)) ;; the atoms that must hold for the deduction
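For example, applied to the rule that says an edge gives a path:
scheme> (head '((path X Y) (edge X Y)))
$1 = (path X Y)
scheme> (body '((path X Y) (edge X Y)))
$2 = ((edge X Y))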
Now we can define a Datalog program. A Datalog program is a finite set of Datalog rules. Let's give an example using Graph as our Datalog language.
(define Graph-Program
  `(,Graph ;; Language of program
    ;; Rules
    ((path X Y) (edge X Y))
    ((path X Z) (path X Y) (path Y Z))
    ;; Facts (rules with empty bodies)
    ((edge a b))
    ((edge b c))
    ((edge d a))
    ((edge d d))))
From the code, you can get a feel for how rules are supposed to work. The first two rules define what a path is: there is a path from X to Y whenever there is an edge from X to Y, and paths compose transitively. The remaining rules are facts encoding the edges of a particular directed graph. The next section will explain how to interpret this systematically.
Herbrand Interpretations
A Herbrand interpretation is any set of ground atoms. In this section we'll look at how to find the best interpretation of a Datalog program, which will turn out to be its least Herbrand model.
A substitution $\varphi$ of a rule $R$ is a function that replaces one or more variables with fixed constants. Let's write some Scheme code to give an exact definition.
;; Replace every occurrence of x in the flat list lst with y.
(define (replace lst x y)
  (let loop ((lst lst))
    (cond ((null? lst) '())
          ((eq? (car lst) x) (cons y (loop (cdr lst))))
          (else (cons (car lst) (loop (cdr lst)))))))

;; Replace each element of the list x with the corresponding element of y.
(define (replace-multiple lst x y)
  (let loop ((lst lst) (x x) (y y))
    (if (null? x) lst
        (loop (replace lst (car x) (car y)) (cdr x) (cdr y)))))

;; Apply a substitution to every atom of a rule.
(define (substitute rule variables constants)
  (map (lambda (lst) (replace-multiple lst variables constants)) rule))
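For example, substituting $a$ for $X$ and $b$ for $Y$ in the transitivity rule of Graph-Program:
scheme> (substitute '((path X Z) (path X Y) (path Y Z)) '(X Y) '(a b))
$1 = ((path a Z) (path a b) (path b Z))
Note that $Z$ is left untouched; a substitution need not replace every variable.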
A ground substitution of a rule $R$ is a substitution that replaces all variables with constants. We can now define the ground of a rule $R$ as
$$\mathrm{ground}(R) = \{\varphi(R) \mid \varphi \text{ is a ground substitution of } R\},$$
and for a Datalog program $P$ we write $\mathrm{ground}(P)$ for the union of $\mathrm{ground}(R)$ over all rules $R$ of $P$.
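For instance, in the Graph language, the ground of the rule $\texttt{path}(X,Y) \leftarrow \texttt{edge}(X,Y)$ consists of the $4 \times 4 = 16$ ground rules obtained by letting $X$ and $Y$ range over $\{a,b,c,d\}$, one of which is $\texttt{path}(a,b) \leftarrow \texttt{edge}(a,b)$.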
To implement a procedure for grounding an atom, let's first write a special function that iterates through all $n$-tuples of a set. For example, for the set $\{a,b\}$, the set of $2$-tuples is $\{(a,a), (a,b), (b,a), (b,b)\}$. Let me first show you the Scheme function and then we'll take a look at what it does.
(define (tuples set n)
  (if (= n 0) (lambda () '())
      (let ((current set) (next (tuples set (- n 1))))
        (lambda ()
          (let ((value '()) (tuple (next)))
            (cond ;; Exhausted: reset and signal the end with '().
                  ((null? current) (set! current set) '())
                  ;; Base case: emit one-element tuples from current.
                  ((= n 1) (set! value (car current))
                           (set! current (cdr current)) `(,value))
                  ;; Inner generator exhausted: advance the outer element.
                  ((null? tuple) (set! next (tuples set (- n 1)))
                                 (set! current (cdr current))
                                 (if (null? current) (begin (set! current set) '())
                                     (cons (car current) (next))))
                  ;; Otherwise prepend the current element to the inner tuple.
                  (else (cons (car current) tuple))))))))
So let me help you through the mess of parentheses. The procedure tuples takes as input a set and a natural number $n$. With this input, tuples then returns a procedure that will give a unique $n$-tuple every time it's called. When this procedure has no more $n$-tuples to iterate through, it returns the empty list, and after that it resets itself. Here's an example run in Guile Scheme:
scheme> (define tuple (tuples '(a b) 2))
scheme> (tuple)
$1 = (a a)
scheme> (tuple)
$2 = (a b)
scheme> (tuple)
$3 = (b a)
scheme> (tuple)
$4 = (b b)
scheme> (tuple)
$5 = ()
scheme> (tuple)
$6 = (a a)
This magic comes from the power of closures. If you're not familiar with the concept, you can read about them here.
Luckily, tuples is the only complicated piece of machinery we'll need. Now grounding an atom can be easily translated to code as follows:
;; All ground instances of an atom: substitute every tuple of constants
;; for the variables occurring in the atom.
(define (ground language atom)
  (let* ((variables (get-all-variables language atom))
         (n (length variables))
         (next (tuples (constants language) n)))
    (if (= 0 n) `(,atom)
        (let loop ((tuple (next)))
          (if (null? tuple) '()
              (cons (replace-multiple atom variables tuple) (loop (next))))))))
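The procedure above relies on one helper we haven't defined yet: get-all-variables, which should return the variables of the language that occur in the given atom (or atoms, as we'll need in a moment). Here is a minimal sketch using SRFI-1:
(use-modules (srfi srfi-1)) ;; for any, every, delete-duplicates, lset=

;; Variables of the language that occur in at least one of the given atoms.
(define (get-all-variables language . atoms)
  (filter (lambda (v) (any (lambda (atom) (memq v atom)) atoms))
          (variables language)))
With that sketch in place, grounding behaves as expected:
scheme> (ground Graph '(edge X b))
$1 = ((edge a b) (edge b b) (edge c b) (edge d b))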
A Herbrand interpretation $I$ is called a Herbrand Model of a Datalog program $P$ if for every rule $A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n$ in $\mbox{ground}(P)$ with $\{B_1, B_2, \dots, B_n\} \subseteq I$ we have $A \in I$.
The Herbrand base $B_P$ is the set of all ground atoms; it's the largest Herbrand interpretation. Let $I_P$ be the set of all Herbrand interpretations of $P$. Together with the ordering $\subseteq$, $I_P$ has the structure of a cpo: the least element is $\varnothing$ and suprema are given by unions. Define the $T_P$ operator $T_P \colon I_P \to I_P$ by
$$T_P(I) = \{A \mid A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n \in \mathrm{ground}(P) \text{ and } \{B_1, B_2, \dots, B_n\} \subseteq I\}.$$
We have two facts now:
- $T_P$ is Scott-continuous on the cpo $(I_P, \subseteq)$.
- A Herbrand interpretation $I$ is a Herbrand model of $P$ if and only if $T_P(I) \subseteq I$, that is, if and only if $I$ is a pre-fixed point of $T_P$.
You might ask, how do we find the pre-fixed points of $T_P$? Well, let's go back to the Kleene fixed-point theorem. Its proof gives us an algorithm, which we can transcribe here. First, let's define $T_P \uparrow 0 = \varnothing$, and recursively we define $T_P \uparrow n = T_P(T_P \uparrow (n-1))$. Finally, we define
$$T_P \uparrow \omega = \bigcup_{n \ge 0} T_P \uparrow n.$$
By the Kleene fixed-point theorem, $T_P \uparrow \omega$ is the least fixed point of $T_P$, and hence the least Herbrand model of $P$. Moreover, since the Herbrand base $B_P$ is finite, the increasing chain $T_P \uparrow 0 \subseteq T_P \uparrow 1 \subseteq \cdots$ must stabilize after finitely many steps, which is exactly why every Datalog program is guaranteed to halt.
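Before turning to code, let's run the iteration on Graph-Program by hand:
- $T_P \uparrow 1$ consists of the four $\texttt{edge}$ facts;
- $T_P \uparrow 2$ adds $\texttt{path}(a,b)$, $\texttt{path}(b,c)$, $\texttt{path}(d,a)$, and $\texttt{path}(d,d)$ (from the first rule);
- $T_P \uparrow 3$ adds $\texttt{path}(a,c)$ and $\texttt{path}(d,b)$ (from the transitivity rule);
- $T_P \uparrow 4$ adds $\texttt{path}(d,c)$;
- $T_P \uparrow 5 = T_P \uparrow 4$, so we have reached the least Herbrand model.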
We can encode $T_P$ as follows:
(define (deduce-from-rule language interpretation rule)
  (let* ((variables (apply get-all-variables (cons language (body rule))))
         (next (tuples (constants language) (length variables))))
    (if (null? variables)
        ;; Variable-free body: deduce the head only if the body already
        ;; holds in the interpretation (facts have empty bodies, so their
        ;; heads are always deduced).
        (if (for-all? (body rule) (lambda (x) (in? x interpretation)))
            (ground language (head rule))
            '())
        ;; Otherwise, try every tuple of constants for the body's variables.
        (let loop ((tuple (next)))
          (if (null? tuple) '()
              (let* ((trial (substitute rule variables tuple)))
                (cond ((for-all? (body trial) (lambda (x) (in? x interpretation)))
                       (union (ground language (head trial)) (loop (next))))
                      (else (loop (next))))))))))

;; One application of T_P: deduce from every rule and union the results.
(define (tp program interpretation)
  (let ((language (car program)))
    (apply union (map (lambda (rule)
                        (deduce-from-rule language interpretation rule))
                      (cdr program)))))
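Both procedures also rely on a few small set helpers: in?, union, and for-all?. Here is one possible implementation, again using SRFI-1 (imported above):
;; Is x a member of the given set (a list of ground atoms)?
(define (in? x set) (if (member x set) #t #f))

;; Union of any number of sets, kept free of duplicates.
(define (union . sets) (delete-duplicates (apply append sets)))

;; Does pred hold for every element of lst?
(define (for-all? lst pred) (every pred lst))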
I just have one last note about this algorithm. I reused tuples to implement the $T_P$ operator. I think this makes the algorithm more faithful to the original mathematics. Unfortunately, it sacrifices performance; a better approach would be to use backtracking.
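To wrap up, here is a minimal sketch of a driver that iterates tp all the way to $T_P \uparrow \omega$ (the procedure name is an arbitrary choice):
;; A sketch: compute T_P up to the fixed point by iterating tp from the
;; empty interpretation. Termination follows from the Herbrand base
;; being finite.
(define (least-model program)
  (let loop ((current '()))
    (let ((next (tp program current)))
      (if (lset= equal? current next) ;; same set of ground atoms?
          current
          (loop next)))))
For Graph-Program this stops after a handful of iterations and returns, up to ordering, the four edge facts together with the seven derivable path atoms $\texttt{path}(a,b)$, $\texttt{path}(b,c)$, $\texttt{path}(a,c)$, $\texttt{path}(d,a)$, $\texttt{path}(d,b)$, $\texttt{path}(d,c)$, and $\texttt{path}(d,d)$, matching the hand computation above.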