Universality and quines

Quines, named by Douglas Hofstadter in honor of Willard Van Orman Quine, are self-replicating programs: programs that output their own source code. That it’s possible to construct a quine at all is actually a consequence of the Turing completeness of a programming language. For example, here’s a quine with a sort of classic structure in Python:[1]

a = ['print("a =", a)', 'for s in a: print(s)']
print("a =", a)
for s in a: print(s)

In this note, I’m going to show how to make a lambda calculus interpreter in lambda calculus (a “universal machine”), how to use it to prove a kind of fixed-point theorem for the lambda calculus, and then how to get a quine from it.

1. The lambda calculus

Alonzo Church came up with a fairly nice and simple calculus that can be used as a basis for both computation and mathematics. In his version, there are three types of symbolic expressions:

variables x, y, z, … drawn from some infinite alphabet;
abstractions (x↦M), where x is a variable and M is a lambda term;[2]
applications (M N), where M and N are lambda terms.

This sort of abstractly models the idea of functions and applying functions to values, but with the oddity that everything is a function. The “semantics” are defined by an equivalence relation on lambda calculus terms. The definitions depend quite a lot on the concept of a free variable, which can be inductively defined as such:

free(x) = {x}
free((x↦M)) = free(M) ∖ {x}
free((M N)) = free(M) ∪ free(N)

The x in (x↦M) is called a bound variable. We also need substitution M[x↦N] that replaces all free instances of x with N:

y[x↦N] = N if y = x, and otherwise y
(x↦M)[x↦N] = (x↦M)
(y↦M)[x↦N] = (y↦M[x↦N]) if y ≠ x and y ∉ free(N)
(M₁ M₂)[x↦N] = (M₁[x↦N] M₂[x↦N])

This is a partial function because of the constraint in the rule for abstractions, but up to the following equivalence relations one may think of it as being a function.
α-renaming:
If (x↦M) is a lambda term and y ∉ free(M), then (x↦M) ≡ (y↦M[x↦y]).
β-reduction:
If ((x↦M) N) is a lambda term, then ((x↦M) N) ≡ M[x↦N].
η-conversion:
If x ∉ free(M), then M ≡ (x↦(M x)).

There’s not really a sense in which a lambda calculus term is evaluated per se, but there are various normal forms one could try to put a particular lambda calculus term in (as functional programming languages do). But reduction strategies have divergent terms, which are lambda calculus terms that do not stabilize while applying the reduction strategy. For example, Ω = ((x↦(x x)) (x↦(x x))) is generally divergent.
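Under an eager evaluation order, the divergence of Ω is easy to witness. A minimal Python sketch (the name omega is mine):

```python
# Python evaluates arguments eagerly, so applying Ω's halves to each other
# recurses without end; the attempt overflows the interpreter's stack.
omega = lambda x: x(x)
try:
    omega(omega)
except RecursionError:
    print("diverged")  # diverged
```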

As a shorthand, we will take application to be left-associative, meaning (L M N) is ((L M) N), and abstraction to be right-associative, meaning (x↦y↦M) is (x↦(y↦M)). One usually thinks of (x↦y↦M) as being a two-argument function.[3] We will also sometimes not write outermost parentheses, and we will use the rule that x↦M N means (x↦(M N)).

1.1. de Bruijn indexes

Symbolic variables are an annoying complexity for lambda calculus because of all the issues with free variables in α-renaming and substitution. One way around this is to use the natural numbers ℕ = {0, 1, 2, …} for variables, where the number represents how many abstractions out that variable is bound from.

Lambda calculus terms when using de Bruijn indexes are

n (a variable, for n ∈ ℕ);
(λM) (an abstraction);
(M N) (an application).

Then, the equivalence relation no longer needs α-renaming. Substitution M[n↦N] is

m[n↦N] = N if m = n, m−1 if m > n, and m if m < n
(λM)[n↦N] = (λ M[n+1↦N+])
(M₁ M₂)[n↦N] = (M₁[n↦N] M₂[n↦N])

The N+ denotes incrementing each unbound variable appearing in N (this can be thought of as the simultaneous application of the substitutions [n↦n+1] for all n ∈ ℕ).

The equivalence relation is generated by two rules.

β-reduction:
((λM) N) ≡ M[0↦N].
η-conversion:
M ≡ (λ(M+ 0)).

For lambda calculus terms with no free variables, it is a purely mechanical process to go back and forth between named variables and de Bruijn indexes. One way to start is to α-rename all the bound variables so that they come from some alphabet x0, x1, x2, …, with each nested abstraction using the next variable in the list; then the index for an occurrence of xi is d−i−1, where d is the abstraction depth at that occurrence. Terms with free variables can be dealt with by imagining that those variables have been bound in some specified order.
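This mechanical process can be sketched in a few lines of Python. The tuple encoding and all names here are my own illustration, not from this note; free variables are assumed not to occur:

```python
# Named terms as tuples: ('var', name), ('abs', name, body), ('app', f, a).
# The de Bruijn index of a variable is its distance to its binder, which is
# exactly its position in the list of enclosing binders, innermost first.

def to_debruijn(term, bound=()):
    """Convert a named term to de Bruijn form; `bound` lists binders, innermost first."""
    kind = term[0]
    if kind == 'var':
        return ('var', bound.index(term[1]))  # d - i - 1, computed directly
    if kind == 'abs':
        return ('abs', to_debruijn(term[2], (term[1],) + bound))
    return ('app', to_debruijn(term[1], bound), to_debruijn(term[2], bound))

# T = x ↦ y ↦ x becomes λλ1:
Tnamed = ('abs', 'x', ('abs', 'y', ('var', 'x')))
print(to_debruijn(Tnamed))  # ('abs', ('abs', ('var', 1)))
```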

I sort of like the following variation of de Bruijn indexes, where variables are written in unary rather than from the infinite alphabet ℕ. Terms are

z (the zeroth variable);
(S M) (increment the free variables of M);
(λM) (an abstraction);
(M N) (an application).

The substitution rules M[n↦N] with n ∈ ℕ can be adapted accordingly.[4] Then, while β-reduction is the same, η-conversion is simply M ≡ (λ(S M z)). However, this is at the cost of needing an S-reduction rule, the details of which I’ll spare you because they’re easy to figure out if you need them.

2. Algebraic data types

Pretty much every data type that you want can be represented as some expression of products and disjoint unions of basic data types, possibly with some recursion. For example, if ∗ represents a singleton type, then the type LL(X) for a linked list of things of type X is recursively defined by LL(X) = ∗ ⊔ (X × LL(X)). These sorts of algebraic expressions to represent types are ADTs (algebraic data types).

In lambda calculus, a value (x, y) of a product type X × Y can be represented by the expression f↦f x y, where the projector functions are πX = p↦p (x↦y↦x) and πY = p↦p (x↦y↦y). The constructor function is x↦y↦f↦f x y. The idea is that a value is something that provides its component values to some supplied function.

A disjoint union X ⊔ Y, on the other hand, is described by its inclusion functions: ιX = x↦(f↦g↦f x) and ιY = y↦(f↦g↦g y), which is a sort of dual to the projectors for the product type. The idea here is that a value is something that selects a function to handle which of the types the value is from.

A famous disjoint union is the two-valued boolean algebra, a disjoint union of two singleton types. In an optimized representation that doesn’t use the values from the singleton types,

T = x↦y↦x
F = x↦y↦y

so if b is a boolean, an if statement looks like (b x y), which is x if b = T and y if b = F.[5]

In the case of products and disjoint unions of more than two types, there is an obvious optimization that one can apply. For example, for X × Y × Z we can use x↦y↦z↦p↦p x y z as a constructor.
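These encodings translate directly into Python lambdas. A sketch (names are mine, not from this note):

```python
# Church-style pairs and booleans as Python lambdas.
pair = lambda x: lambda y: lambda f: f(x)(y)   # constructor for X × Y
fst  = lambda p: p(lambda x: lambda y: x)      # projector π_X
snd  = lambda p: p(lambda x: lambda y: y)      # projector π_Y

T = lambda x: lambda y: x   # true: select the first argument
F = lambda x: lambda y: y   # false: select the second argument

p = pair(3)(4)
print(fst(p), snd(p))       # 3 4
print(T('then')('else'))    # then
```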

2.1. Representing lambda calculus terms

The point of introducing ADTs is that we can use them to represent lambda calculus terms within the lambda calculus itself. While it is true that technically a lambda calculus term is a string, we will be representing it in a tree form. In particular, we will represent the variation on lambda calculus terms from the end of the de Bruijn indexes section.

Let L denote the type for a lambda calculus term, for which we give the following recursive ADT description

L = ∗ ⊔ L ⊔ L ⊔ (L × L)

where the four terms correspond to z, successor, abstraction, and application. In particular, we’ll use these constructors:

zero = a↦b↦c↦d↦a
succ = e↦a↦b↦c↦d↦b e
abs = m↦a↦b↦c↦d↦c m
app = m₁↦m₂↦a↦b↦c↦d↦d m₁ m₂

As an example, the expression for T can be expressed in this type as abs(abs(succ(zero))).

If we have an expression p from L, we can decompose it and convert it into a lambda calculus term. Write I(p) for the corresponding term, the I being for “interpretation.” This I is a function outside the lambda calculus, but we will show the lambda calculus is powerful enough to implement it.

3. An interpreter

In this section we’ll derive a term u such that, when p is an expression representing a lambda calculus term from the type L, u p ≡ I(p). This interpreter is the lambda calculus equivalent of a universal machine: it can take descriptions of lambda terms and convert them into actual lambda terms, just as a universal Turing machine takes descriptions of Turing machines and runs them as actual Turing machines.

At a high level, the term looks like the following, in a pseudo-Haskell notation. We make the decision that free variables in p evaluate to a diverging term. The env variable is a linked list of lambda calculus terms (actual terms, not representations of them).

u p := eval p empty
eval p env := case p of
    zero -> head env
    succ(e) -> eval e (tail env)
    abs(m) -> (x |-> eval m (push x env))
    app(m1, m2) -> (eval m1 env) (eval m2 env)
empty := nil |-> cons |-> nil
push x list := nil |-> cons |-> cons x list
head list := list fail (x |-> sublist |-> x)
tail list := list empty (x |-> sublist |-> sublist)
fail := fail
The way this works is that the eval function keeps track of an environment of all the substitutions from β-reduction. The interpretation of abs(m) is as an actual lambda abstraction that pushes the value of the bound x to the zeroth entry of the environment list. Furthermore, empty is the empty linked list, push is the constructor, and head and tail are convenience destructors, where head fails in the case the environment is an empty list.
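As a sanity check of this evaluator’s logic, here is a sketch in Python using native recursion and a tuple encoding of terms (('z',), ('s', e), ('abs', m), ('app', m1, m2) — an encoding and names I chose for illustration) in place of Church-encoded data:

```python
# eval with an explicit environment, mirroring the pseudo-Haskell above.
def interp(p, env=()):
    kind = p[0]
    if kind == 'z':                   # zero: head of the environment
        return env[0]
    if kind == 's':                   # successor: evaluate under the tail
        return interp(p[1], env[1:])
    if kind == 'abs':                 # abstraction: a closure that pushes its argument
        return lambda x: interp(p[1], (x,) + env)
    return interp(p[1], env)(interp(p[2], env))  # application

Trep = ('abs', ('abs', ('s', ('z',))))  # represents T = x ↦ y ↦ x
print(interp(Trep)('a')('b'))  # a
```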

I’m afraid I’m going to be lazy and not explicitly prove that u p ≡ I(p), but it is a matter of checking the equivalence relations (and the substitution rule). Note that we do not care about evaluation strategies because the point is just that u p is a lambda term in the right equivalence class.

Now, how do we deal with recursion when the lambda calculus does not seem to provide it? The Y combinator is a way to let a term refer to itself, and its defining relation is that Y f ≡ f (Y f). Thus, if a function needs to refer to itself, like in the factorial function

factorial n := if n <= 1
    then 1
    else n * factorial (n - 1)
we can transform it like so
factorial' fact n := if n <= 1
    then 1
    else n * fact (n - 1)
factorial := Y factorial'
where none of the definitions are self-referential. The trick is that
factorial n = Y factorial' n
            = factorial' (Y factorial') n
            = factorial' factorial n
where the equality signs actually mean equivalence, and hence factorial’ is able to refer to something that is equivalent to the factorial expression itself. This Y combinator can actually be given as a lambda calculus term:

Y = f↦(x↦f (x x)) (x↦f (x x))

The defining relation is not too difficult to check.
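The same factorial trick can be run in Python, with one caveat: under eager evaluation the plain Y combinator recurses forever before f is ever called, so the sketch below uses the η-expanded call-by-value variant (often called the Z combinator):

```python
# Z combinator: eta-expanded Y, safe under eager evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# factorial' from the text: takes "itself" as its first argument.
factorial_ = lambda fact: lambda n: 1 if n <= 1 else n * fact(n - 1)
factorial = Z(factorial_)
print(factorial(5))  # 120
```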

Let’s look at the fail function, which we could define as

fail := Y fail'
fail' f := f
hence fail is Y (x↦x) ≡ (x↦(x x)) (x↦(x x)) = Ω, the divergent term from before.

Let’s rewrite the definition for u to be non-self-referential, while also substituting in some of the definitions for linked lists.

u p := (Y eval') p (n |-> c |-> n)
eval' eval p env := case p of
  zero -> env omega (x |-> t |-> x)
  succ(e) -> eval e (env env (x |-> t |-> t))
  abs(m) -> (x |-> eval m (n |-> c |-> c x env))
  app(m1, m2) -> (eval m1 env) (eval m2 env)
omega := (x |-> x x) (x |-> x x)
Y f = (x |-> f (x x)) (x |-> f (x x))
For succ(e), there is a trick: by passing env itself as the first argument, the expression is equivalent to the empty list in the case that env is empty.
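This non-self-referential version transcribes almost literally into Python lambdas. Two strictness concessions in the sketch below are my own: Y is replaced by its η-expanded call-by-value form (Python is eager), and omega is replaced by None so that taking the head of an empty environment returns harmlessly instead of diverging; all other names are mine as well:

```python
# Eta-expanded call-by-value Y.
Zcomb = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# The four constructors for L, as in section 2.1.
zero = lambda a: lambda b: lambda c: lambda d: a
succ = lambda e: lambda a: lambda b: lambda c: lambda d: b(e)
abs_ = lambda m: lambda a: lambda b: lambda c: lambda d: c(m)
app_ = lambda m1: lambda m2: lambda a: lambda b: lambda c: lambda d: d(m1)(m2)

# Church-encoded linked lists for the environment.
empty = lambda n: lambda c: n
push = lambda x: lambda lst: lambda n: lambda c: c(x)(lst)

ev = Zcomb(lambda ev: lambda p: lambda env:
    p(env(None)(lambda x: lambda t: x))                  # zero: head env
     (lambda e: ev(e)(env(env)(lambda x: lambda t: t)))  # succ: tail env (env-as-default trick)
     (lambda m: lambda x: ev(m)(push(x)(env)))           # abs: push the argument
     (lambda m1: lambda m2: ev(m1)(env)(ev(m2)(env))))   # app

u = lambda p: ev(p)(empty)

Trep = abs_(abs_(succ(zero)))  # represents T = x ↦ y ↦ x
print(u(Trep)('a')('b'))  # a
```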

Now, to write this in an impenetrable way as a single lambda term:

u := expr |->
  ((f |-> (x |-> f (x x)) (x |-> f (x x)))
    (eval |-> p |-> env |->
     p (env ((x |-> x x) (x |-> x x)) (x |-> t |-> x))
       (e |-> eval e (env env (x |-> t |-> t)))
       (m |-> x |-> eval m (n |-> c |-> c x env))
       (m1 |-> m2 |-> (eval m1 env) (eval m2 env)))
  ) expr (n |-> c |-> n)

Or, if that was too clear, using de Bruijn indexes:

u = λ ((λ (λ 1 (0 0)) (λ 1 (0 0))) (λλλ 1 (0 ((λ 0 0) (λ 0 0)) (λλ 1)) (λ 3 0 (1 1 (λλ 0))) (λλ 4 1 (λλ 0 2 4)) (λλ (4 1 2) (4 0 2)))) 0 (λλ 1)

The point of this last exercise was just to make it obvious that u can be represented as an expression U. If everything worked out, then in fact u U ≡ u.

If we do not care about the behavior for free variables, we could shorten this term by replacing Ω with something random, like the variable 0 (which at that position refers to env), which would have the effect of making free variables evaluate to the empty list.

It’s also possible to write a universal interpreter that enforces a particular reduction strategy, for example the Krivine machine[6], with a short implementation by Scott Kovach.

4. A fixed-point theorem

Consider a lambda term f with the property that for every lambda expression p from L, f p is equivalent to something in L. That is, f can be thought of as an actual function L→L. We will show that then f has a fixed point with respect to u, namely that there is some expression q from L such that u (f q) ≡ u q. (The argument is adapted from the one presented here.[7])

A key operator in the construction of q is quote : L→L, which takes a lambda expression and produces a lambda expression that, when evaluated, is a term equivalent to that expression. It’s much less complicated to say in symbols: u quote(p) ≡ p. We can (tediously) define this operator inductively:[8]

quote(zero) = Z
quote(succ(e)) = apply(S, quote(e))
quote(abs(m)) = apply(B, quote(m))
quote(app(m₁, m₂)) = apply(apply(A, quote(m₁)), quote(m₂))

where Z, S, B, and A are fixed expressions representing the four constructors, and apply(m₁, m₂) denotes the constructor for applications (written app in the code above).

Hopefully it’s clear how to write down a lambda term that actually implements quote, but it’s similar to the way the case expression in u was implemented.
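For concreteness, here is a sketch of quote over a tuple encoding of L (('z',), ('s', e), ('abs', m), ('app', m1, m2)); the encoding and the names ZREP, SREP, BREP, AREP are my own illustration:

```python
def var(n):  # de Bruijn index n, written in unary as S^n z
    return ('z',) if n == 0 else ('s', var(n - 1))

def absn(body, k):  # wrap body in k abstractions
    for _ in range(k):
        body = ('abs', body)
    return body

# de Bruijn representations of the four constructors from section 2.1:
ZREP = absn(var(3), 4)                                    # a↦b↦c↦d↦a
SREP = absn(('app', var(2), var(4)), 5)                   # e↦a↦b↦c↦d↦b e
BREP = absn(('app', var(1), var(4)), 5)                   # m↦a↦b↦c↦d↦c m
AREP = absn(('app', ('app', var(0), var(5)), var(4)), 6)  # m1↦m2↦a↦b↦c↦d↦d m1 m2

def quote(p):
    """Build an expression that evaluates, under u, to the expression p itself."""
    kind = p[0]
    if kind == 'z':
        return ZREP
    if kind == 's':
        return ('app', SREP, quote(p[1]))
    if kind == 'abs':
        return ('app', BREP, quote(p[1]))
    return ('app', ('app', AREP, quote(p[1])), quote(p[2]))

print(quote(('s', ('z',))) == ('app', SREP, ZREP))  # True
```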

Let s : L→L be defined by s p = apply(p, quote(p)), which creates an expression that represents supplying the representation of a lambda term to itself as an argument.

Let m be the expression for p↦u (f (s p)), and let q = s m.

u (f q) ≡ u (f (s m)) ≡ (p↦u (f (s p))) m ≡ (u m) m ≡ u apply(m, quote(m)) ≡ u (s m) ≡ u q.

Hence, q is indeed a fixed point of f with respect to u.

5. Quines

A quine is an expression q in L such that u q ≡ q.

Quines are the result of applying this fixed-point theorem to the quote operator. By that theorem, there is some q in L such that u quote(q) ≡ u q, but since u quote(q) ≡ q, we have obtained a quine.

What does this quine look like? We had that m was the expression for p↦u quote(apply(p, quote(p))), but this simplifies to p↦apply(p, quote(p)), so we may let m = abs(apply(apply(A, zero), apply(Q, zero))), where A is the lambda expression for apply and Q is a lambda expression for quote. Then, q is apply(m, quote(m)), which expanded a bit is

q = apply(abs(apply(apply(A, zero), apply(Q, zero))), quote(abs(apply(apply(A, zero), apply(Q, zero)))))

It doesn’t seem like it would be too illuminating to expand this out any more, but it is interesting that u was eliminated so easily, suggesting the fixed-point theorem was tremendously overpowered for the purpose of deriving a quine.

Anyway, notice that the way q works is that there is a code part and a data part, and that Q is sort of like a print function. There’s sort of a parallel between the structure of q and the Python program from the beginning.

6. In set theory

As I understand it, one of the ways to study ZFC set theory is by model theory, where a model is a set M along with a relation (∈) ⊆ M × M that satisfies all of the axioms. Propositions of first-order logic can themselves be represented as sets (like how lambda calculus terms can be represented by expressions in L), and there is a truth term that evaluates the truth of a particular proposition against a particular model.

Unfortunately there is no way to prove in ZFC that there is a model of ZFC (that’s Gödel’s second incompleteness theorem).

As an application of the idea in the fixed-point theorem, let’s show that there’s no absolute notion of Truth in ZFC. If you want, that there’s no truth term τ(p) that evaluates whether or not p is a true proposition in the current model.

The idea is that we can use proposition-level functions defined on set representations of propositions. If f : P→P is a proposition-level function on the set P of representations of propositions (i.e., ∀x ∈ P, ∃!y, y ∈ P ∧ f(x, y), where we will write f(x) for this y and leave it up to the reader to transform the first-order statement appropriately), then we can show that there is some statement q ∈ P such that τ(f(q)) ↔ τ(q). Let P₁ be the set of set representations of propositions with one free variable (predicates). We can define a proposition-level function s : P₁→P that feeds a predicate the representation of the predicate. Then, we can define a predicate F(p) = τ(f(s(p))) and thus a proposition q = s(F) ∈ P, the required fixed point.

But, what if f is the function p↦¬p that negates a proposition? Then we have a proposition q such that τ(¬q) ↔ τ(q), so τ can’t possibly be a truth term in actuality. My intuition for what is going wrong is that τ seems like it would need to have an environment to keep track of all of the bound variables, but ZFC prevents you from having sets with arbitrary elements: you always have to specify a domain. Well, the domain would be the model, but you don’t have a way to gain direct access to the current model from within the model. If you did, you’d have Russell’s paradox.

A kind of fun modification to this is to say “well, ok, fine, there’s no τ(p) predicate, but we can still have a universal truth predicate T(p) = ∀M, model(M) → τ(p, M) that checks the truth across all models, since you can write down a predicate to check whether a set is a model of set theory, so what happens now?”[9] Here’s what happens: the kind of argument above gives us a proposition q such that T(q) ↔ T(¬q). Gödel’s completeness theorem says that T(p) is equivalent to p being provable, and so q being provable is equivalent to ¬q being provable. So, if the theory is consistent, neither q nor ¬q is provable (the law of the excluded middle does not apply to provability). The theory is incomplete!

[2] Traditionally, this is (λx.M), hence the name of the calculus. I like the modern notation for function abstraction more, but consider it to be idiosyncratic.
[3] Currying is basically the bijection hom(X × Y, Z) ≅ hom(X, hom(Y, Z)).
[4] 6/12/2020: This substitution rule for S is incorrect. The N would need to have no free variables for the rule as-is. The general rule would be more complicated. This makes me less excited about this variation, but at least the interpreter later on is simple and (likely) correct.
[5] With de Bruijn indexes, this is T = λλ(S z) and F = λλz.
[6] Jean-Louis Krivine, A call-by-name lambda-calculus machine. Higher Order Symbol. Comput. 20 (3) (2007) 199–207. doi: 10.1007/s10990-007-9018-9
[7] However, it’s been pointed out to me that this is essentially an application of Lawvere’s fixed-point theorem.
[8] Oh how I wish I introduced a quasiquotation notation with metasyntactic variables...
[9] Representations of propositions are constructible, so it’s OK using p with respect to different models.