Datalog: An Introduction to Logic Programming

September 26, 2018
filed under : #scheme #programming #computerScience #AI

This blog post is based off of a lecture that I gave at SUNY Oneonta in September 2016 titled Some fixed-point theory and its applications to logic programming.

Datalog is a declarative logic programming language that was invented back in the 1970s. Declarative programming refers to a paradigm of programming where the logic of the computation is expressed without describing any control flow. This is vastly different from other languages such as C++, Python, or Java, where control flow plays a central part. However, there's a little bit of a misnomer with Datalog -- Datalog is not Turing-complete. This is the case as Datalog programs are always guaranteed to halt, which we shall prove in this post. Datalog is a good starting point if you're interested in how things work in the logic programming or knowledge representation worlds. After all, it's what I started with when I took a course with Pascal Hitzler in my undergraduate years. In this post, I'll present all the algorithms in Scheme.

Theoretical Background of Partial Orderings

Recall that $(L, \sqsubseteq)$ is a partial order if the operator $\sqsubseteq$ has the following properties:

For $x \in L$, $x \sqsubseteq x$.
For $x, y \in L$, if $x \sqsubseteq y$ and $y \sqsubseteq x$, then $x = y$.
For $x, y, z \in L$, if $x \sqsubseteq y$ and $y \sqsubseteq z$, then $x \sqsubseteq z$.

The partial order $(L, \sqsubseteq)$ is called a complete partial order (cpo) if $L$ has a least element $\perp$ (that is for all $x \in L$ then $\perp \sqsubseteq x$), and every $\omega$-chain $\{x_1 \sqsubseteq x_2 \sqsubseteq x_3 \sqsubseteq \cdots\}$ has a supremum (least upper bound) in $L$. A subset $D \subseteq L$ is called a directed set if for any $a$ and $b$ in $L$, there exists a $c \in L$ with $a \sqsubseteq c$ and $b \sqsubseteq c$.

Let $L_1$ and $L_2$ be two cpo's. A function $f \colon L_1 \to L_2$ is called scott-continuous if for every directed subset $D$ of $L_1$ with a supreme $\bigsqcup D$ in $L_1$, we have $\bigsqcup f(D) = f(\bigsqcup D)$

Proposition. Every scott-continuous function is monotone.

Proof. Suppose $x \sqsubseteq y$, then $\bigsqcup \{x, y\} = y$, and so

$$ \bigsqcup f(\{x,y\}) = \bigsqcup\{f(x), f(y)\} = f(\bigsqcup\{x, y\}) = f(y). $$

Therefore, $f(x) \sqsubseteq f(y)$.

Theorem A. Kleene fixed-point theorem If $L$ is a cpo with $f \colon L \to L$ being scott-continuous, then $f$ has a fixed point.

Proof. Consider the following $\omega$-chain $C$

$$ \perp \sqsubseteq f(\perp) \sqsubseteq f^2(\perp) \sqsubseteq f^3(\perp) \cdots. $$

Suppose $p = \bigsqcup C$, then we get

$$ f(p) = f(\bigsqcup C) = \bigsqcup f(C) = \bigsqcup \{f(\perp), f^2(\perp), f^3(\perp), \cdots\} = p. $$

This completes the proof.

Datalog Languages

A Datalog Language $(V,C,R)$ consists of the following:

A finite set of variables $V$. (Elements are usually denoted by $X$, $Y$, $Z$, $X_1$, $X_2$, etc.)
A finite non-empty set of constants $C$. (Elements are denoted by $a$, $b$, $c$, etc.)
A finite non-empty set $R$ of predicate symbols with each predicate having some arity $n > 0$.

This is a little abstract, so let's look at an example. The first datalog language we will look at is called Graph. Graph has the following:

$V = \{X, Y, Z\}$,
$C = \{a, b, c, d\}$, and
$R = \{\texttt{edge}(2), \texttt{path}(2)\}$.

You can think of $C$ being the set of vertices of some graph while edge is the relation that tells us if there is an edge between two vertices, and path is a relation that tells us if there is a path between two vertices.

Next let's see how we can implement a datalog language in Scheme. First we can implement a list holding three elements: variables, constants, and relations. Second, variables and constants can be implemented with just lists of symbols. Lastly, a relation will be a cons pair: (symbol . number). The symbol is the name of the relation, while the number holds the arity of the relation. Finally, let's see what the above example would look like in Scheme.

      
(define Graph
  '((X Y Z)
    (a b c d)
    ((edge . 2) (path . 2))))

Datalog Rules and Programs

Now that we have defined a Datalog language, we need a few more definitions before we can do anything else. Let $L$ be a fixed Datalog language that we'll use for the following definitions.

The first definition will be an atom. An atom is of the form $P(V_1, V_2, \dots, V_n)$ where $P$ is a relation with arity $n$, and $V_1, V_2, \dots, V_n \in C \cup V$. A ground atom $P(V_1, V_2, \dots, V_n)$ is an atom with $V_i \in C$ for $i = 1,2, \dots, n$. Examples of ground atoms would be edge(a,b) or path(a,b). We can encode these two examples easily in Scheme by doing the following:

      
'(edge a b)
'(path a b)

Next up are Datalog rules. A Datalog rule is a statement of the form (read the $\wedge$ as "and")

$$ A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n, $$

where $A, B_1, \dots, B_n$ are atoms. $A$ is referred to as the head, while $B_1 \wedge B_2 \wedge \cdots \wedge B_n$ is called the body. If $n = 0$, for a rule, then it's called a fact.

We can encode the Datalog rule

$$ A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n, $$

in Scheme as the following

      
'(A B_1 B_2 ... B_N)

(define (head rule) (car rule))
(define (body rule) (cdr rule))

Now we can define a Datalog program. A Datalog program is a finite set of datalog rules. Let's give an example using Graph as our Datalog language.


(define Graph-Program
  `(,Graph ;; Language of program
    ((path X Y) (edge X Y))
    ((path X Z) (path X Y) (path Y Z))
    ((edge a b))
    ((edge b c))
    ((edge d a))
    ((edge d d))))

From the code, you can get the feel of how rules are suppose to work. The first two rules seem to define what a path is. The other rules encode information about the structure of a directed graph. The next section will explain how to systematically interpret this.

Herbrand Interpretations

A Herbrand interpretation is any set of ground atoms. In this section we'll look at how to find the best interpretation.

A substitution $\varphi$ of a rule $R$ is a function that replaces one or more variables with fixed constants. Let's write some Scheme code to give an exact definition.

      
(define (replace lst x y)
  (let loop ((lst lst))
    (cond ((null? lst) '())
          ((eq? (car lst) x) (cons y (loop (cdr lst))))
          (else (cons (car lst) (loop (cdr lst)))))))

(define (replace-multiple lst x y)
  (let loop ((lst lst) (x x) (y y))
    (if (null? x) lst
        (loop (replace lst (car x) (car y)) (cdr x) (cdr y)))))

(define (substitute rule variables constants)
  (map (lambda (lst) (replace-multiple lst variables constants)) rule))

A ground substitution of a rule $R$ is a substitution that replaces all variables with constants. We can now define the ground of a rule $R$: as

$$ \mbox{ground}(R) = \{R\varphi \mid \varphi \mbox{ is a ground substitution for } R\}. $$

Also,

$$ \mbox{ground}(P) = \bigcup_{R \in P}\mbox{ground}(R). $$

To implement a procedure for grounding an atom, let's first make a special function that loops through all $n$-tuples of a set. For example for the set $\{a,b\}$, the set of $2$-tuples is $\{(a,a), (a,b), (b,a), (b,b)\}$. Let me first show you the Scheme function and then we'll take a look at what it does.

      
(define (tuples set n)
  (if (= n 0) (lambda () '())
      (let ((current set) (next (tuples set (- n 1))))
        (lambda ()
          (let ((value '()) (tuple (next)))
            (cond ((null? current) (set! current set) '())
                  ((= n 1) (set! value (car current))
                   (set! current (cdr current)) `(,value))
                  ((null? tuple) (set! next (tuples set (- n 1)))
                   (set! current (cdr current))
                   (if (null? current) (begin (set! current set) '())
                       (cons (car current) (next))))
                  (else (cons (car current) tuple))))))))

So let me help you through the mess of parentheses. The procedure tuples takes as input a set and a natural number $n$. With this input, tuples then returns a procedure that will give a unique $n$-tuple every time it's called. When this procedure has no more $n$-tuples to iterate through, it then returns the empty list. After that it resets itself. Here's an example run in Guile Scheme:

      
scheme> (define tuple (tuples '(a b) 2))
scheme> (tuple)
$1 = (a a)
scheme> (tuple)
$2 = (a b)
scheme> (tuple)
$3 = (b a)
scheme> (tuple)
$4 = (b b)
scheme> (tuple)
$5 = ()
scheme> (tuple)
$6 = (a a)

This magic comes from the power of closures. If you're not familiar with the concept, you can read about them here.

Luckily, tuples is the only complicated piece of machinery we'll need. Now grounding an atom can be easily translated to code as follows:

      
(define (ground language atom)
  (let* ((variables (get-all-variables language atom))
         (n (length variables))
         (next (tuples (constants language) n)))
    (if (= 0 n) `(,atom)
        (let loop ((tuple (next)))
          (if (null? tuple) '()
              (cons (replace-multiple atom variables tuple) (loop (next))))))))

A Herbrand interpretation $I$ is called a Herbrand Model of a Datalog program $P$ if for every rule $A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n$ in $\mbox{ground}(P)$ with $\{B_1, B_2, \dots, B_n\} \subseteq I$ we have $A \in I$.

The Herbrand base $B_P$ is the set of all ground atoms. It's the largest Herbrand interpretation. Let $I_P$ be the set of all the Herbrand interpretations on P. $I_P$ along with the set operation $\subseteq$ has a structure of a cpo. Define the $T_P$ operator as $T_P \colon I_P \to I_P$ by

$$ T_P(I) = \{A \in B_P \mid (A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n) \in \mbox{ground}(P) \mbox{ and } B_i \in I\}. $$

We have two facts now

Theorem B. The $T_P$ operator is scott-continuous.

Proof. This follows from the fact that $\bigcup_{i}\{f(X_i)\} = f(\bigcup_i\{X_i\})$.

Theorem C. An interpretation $I$ is called a pre-fixed point of $T_P$ if $T_P(I) \subseteq I$. The Herbrand models are exactly the pre-fixed points of the $T_P$ operator.

Proof. Suppose $I$ is a pre-fixed point of $T_p$, and $R \in \mbox{ground}(P)$ is the rule $A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n$. If $\{B_1, \dots, B_n\} \subseteq I$, then $A \in T_P(I)$. However, since $I$ is a pre-fixed point, $A \in I$, and so $I$ is a Herbrand model. Conversely, suppose $I$ is a Herbrand model. Suppose $A \in T_P(I)$, then there exists $A \leftarrow B_1 \wedge B_2 \wedge \cdots \wedge B_n$ in $\mbox{ground}(P)$ with $\{B_1, \dots, B_n\} \subseteq I$. Thus, $A \in I$, and the proof is complete.

You might ask, how do we find the pre-fixed points of $T_P$? Well let's go back to the Kleene fixed-point theorem. In the proof we presented an algorithm, which we can transcribe here. First, let's define $T_P \uparrow 0 = \varnothing$, and recursively we define $T_P\uparrow n = T_P(T_P \uparrow (n-1))$. Finally, we define

$$ T_P \uparrow \omega = \bigcup_{n \in \mathbb{N}}T_P \uparrow n. $$

Now since $B_P$ is finite, we have that $T_P \uparrow n$ will come across a pre-fixed point with a finite $n$.

We can encode $T_P$ with the last piece of code:

      
(define (deduce-from-rule language interpretation rule)
  (let* ((variables (apply get-all-variables (cons language (body rule))))
         (next (tuples (constants language) (length variables))))
    (if (null? variables) (ground language (head rule))
        (let loop ((tuple (next)))
      (if (null? tuple) '()
          (let* ((trial (substitute rule variables tuple)))
            (cond ((for-all? (body trial) (lambda (x) (in? x interpretation)))
                   (union (ground language (head trial)) (loop (next))))
                  (else (loop (next))))))))))

(define (tp program interpretation)
  (let ((language (car program)))
    (apply union (map (lambda (rule)
                        (deduce-from-rule language interpretation rule))
                      (cdr program)))))

I just have one last note about this algorithm. I reused the tuples to implement the $T_P$ operator. I think this make the algorithm more faithful to the original mathematics. Unfortunately, this sacrifices performance. A better approach would be to use backtracking.

Kyle Bayes

Theoretical Background of Partial Orderings

Datalog Languages

Datalog Rules and Programs

Herbrand Interpretations