The Todd-Coxeter algorithm

One problem in group theory is to understand a group from a presentation. This is not an easy problem in general, and there are some notable obstructions. For instance, there is no hope to find an algorithm which can tell whether a particular word is the identity since the word problem is undecidable.

However, for finite presentations, those with a finite set of generators and a finite set of relations between them, the Todd-Coxeter algorithm is able to give a list of all elements of a finite group in an (unknown) finite time. In fact, the algorithm is able to do more: enumerate the cosets of a finite-index finitely generated subgroup. Or, more specificially, it is able to find a faithful group action with a given finitely generated stabilizer.

What we will do in this note is describe the Todd-Coxeter algorithm, give a rather short implementation of it, and prove its correctness. But, to simplify the proof of the algorithm, let us first take a diversion through the theory of group actions.

There is an online implementation of the algorithm at Math tools on this site.

1. Classification of group actions

A G-set S for a set S and a group G is a homomorphism ϕ:GBijS — that is, a left group action. Maps between G-sets are functions on the underlying sets which commute with the respective group actions.

An invariant subset S of S is one for which gSS for all gG, and a reducible G-set is one which has a nonempty invariant proper subset. An irreducible G-set is also known as transitive, and is characterized by each pair s1,s2S having some gG with gs1=s2.

If SS is a nonempty proper invariant subset, then SS is as well. This is because if gsS, then g1(gs)S, so sS. The orbit of an element sS is the set Gs, which is an invariant subset and irreducible. Having the same orbit is an equivalence relation, and so S decomposes into a disjoint union of irreducible G-sets, one per equivalence class. For this reason, we will now focus on classifying an irreducible G-set.

A pointed G-set is one with a distinguished element s0, called the basepoint, and maps between pointed G-sets are G-set maps which map the basepoint to the basepoint. A pointed G-set has a corresponding subgroup StabS consisting of all elements gG with gs0=s0. This is actually a functor. For f:SS, the induced map f:StabSStabS is simply an inclusion, since for g stabilizing s0, gf(s0)=f(gs0)=f(s0) so f(s0) is stabilized as well.

Let H be the stabilizer for an irreducible pointed G-set S. Define the G-set map f:G/HS by f(gH)=gs0, where G/H is the set of left cosets of H, with the G-action defined by left multiplication, and with H as the basepoint. This is well-defined since H is the stabilizer of s0, and injective for the same reason. Given sS, let gG be such that gs0=s, and so f(gH)=gs0=s, hence f is surjective. Therefore, f is a G-set isomorphism. Thus, every irreducible pointed G-set is isomorphic to the coset G-set G/H for some subgroup H. This completes the classification.

In case this doesn’t seem recognizable: a corollary is that the cardinality of an orbit is the index of the stabilizer, so the cardinality of an orbit times the order of the stabilizer equals the order of G.

There are still a few questions which should be answered. What is the importance of the basepoint? Suppose s0,s1S, and H0 is the stabilizer of s0, H1 is the stabilizer of s1, and gG is such that gs0=s1. Then for hH1, (g1hg)s0=g1s1=s0, so g1hg is in H0. By symmetry, H0 and H1 are conjugate subgroups of one another.

What are the automorphisms of an irreducible G-set? Let f:G/HG/H be an automorphism (not pointed — if it were pointed, then it would be the identity). Since f(gH)=gf(H) for all gG, there is some aG such that f(gH)=gah, so f is determined by a. Furthermore, for hH, since f(hH) is both aH and haH, we have ah=ha for some hH, which implies a is in the normalizer of H. Conversely, a map f(gH)=gaH is well-defined for aN(H). The only such maps which are the identity are when aH, hence Aut(G/H) is isomorphic to N(H)/H. In particular, when H is a normal subgroup, Aut(G/H) is G/H itself. The converse follows at least for G/H finite, since then H is a stabilizer for all of G/H.

While it is something of a tautology to say that G/H is a G/H-set, it is worth pointing out because whenever a pointed G-set has H as a stabilizer, and H is a normal subgroup, then that G-set can also be regarded as a G/H-set with the same basepoint. In general, if N is normal subgroup which is a subset of the stabilizer, then the G-set may also be regarded as a G/N-set.

To relate this to the theory of covering spaces, there is a universal G-set, namely G as a G-set. Whenever H is a subgroup of G, then the quotient map GG/H is the unique pointed G-set map between them.

A construction which will prove useful is a sort of quotient. Suppose H is a normal subgroup of G and S is a G-set. Then {Hs:sS} is a G-set as well. As an aside: this is related to the induction/restriction adjunction. Restriction of S to an H-set is a functor, and the irreducible H-subsets correspond to the elements of the above quotient. Induction of an H-set S to a G-set is to take G×S with (gh,s) equivalent to (g,hs) and with the G-action defined by multiplication in the left component.

2. Describing a G-set with a Schreier graph

In this section, we will discuss a way to describe a G-set when G is generated by a set X.

A Schreier graph Γ for a pointed G-set S and generating set X is a labeled directed graph with S the vertex set and a g-labeled edge sgs for each gX and sS.

For a given label g, every vertex has exactly one outgoing edge and one incoming edge with that label. This is because the g acts on S bijectively. For this reason, we can describe a path starting from sS using only edge labels and the direction of travel along each edge. For gX, g will denote following the edge leaving a particular vertex, and g1 the following the edge entering that vertex. A path g1gn from a vertex s has each gi or gi1 in X, and we follow the path in right-to-left order, so that (g1gn)s is the vertex at the end of the path.

A relation in G is a word g1gn equaling the identity, and a relator in Γ at sS is a loop starting and ending at s. Every relation corresponds to a relator at each vertex.

The set of relators at a particular vertex, taken as words in G, is the stabilizer of that vertex. This gives a way to see whether a particular graph is the Schreier graph of a G-set with a particular stabilizer: check that each vertex has the correct number of incoming and outgoing edges, check that each relation in G is a relator in the graph, and check that the set of relators at the basepoint equals the stabilizer of the basepoint.

If Γ is a Schreier graph G, and if every element of a normal subgroup N of G is a relator in Γ at s0, then Γ is a Schreier graph of G/N as well.

Suppose G is a free group. The Schreier graph for G as a G-set is a tree, which is a loop-free graph. Given a normal subgroup N of G, the quotient G-set G/N has each element of N form a loop in G/N, and each relator in G/N is an element of N, or in other words a relation in G/N. Suppose N is the normal closure of some set of generators. Each generator is a relator in G/N at every vertex, because conjugation brings a relator from the basepoint to any other vertex. It is sufficient to check each that each generator is a relator at every vertex to see that N is a subset of the stabilizer.

Suppose H is any subgroup of G/N. We may regard it as a subgroup of G instead by finding generators of H. Then, the Schreier graph for (G/N)/H is G/(NH), so the generators of H are relators at the basepoint.

3. The algorithm

With these facts in mind, the Todd-Coxeter algorithm is fairly simple. Given the following data:

there is a group G=XR and a subgroup H=Y, and we may calculate a Schreier graph for G/H by the following process. While we talk about an infinite object, the Schreier graph of a free group, we only ever need to store a portion of a Schreier graph, with the graph outside the frontier assumed to be trees.
  1. Begin with the Schreier graph of the free group generated by X, with vertices labeled by positive integers. Let s0 be the basepoint. All of the vertices are unlabeled and unmarked, but s0 is labeled with 1.
  2. For each generator in Y, follow the path from s0, labeling the unlabeled vertices with fresh integers along the way, and then identify s0 with the end vertex.
  3. For each unlabeled unmarked vertex s:

    1. Label s with a fresh integer.
    2. Follow from s each generator and generator’s inverse, labeling unlabeled neighbors with fresh integers.
    3. For each relation in R, follow the path from s, labeling unlabeled vertices with fresh integers along the way, and then identify s with the end vertex.
    4. Mark s.

To identify two vertices of a Schreier graph, remove the vertex with the greater label, and move all the edges from the removed vertex to the remaining vertex. Then, wherever the there are too many edges of a particular label around that vertex, identify the vertices at the other end of those edges. Since the set of labeled integers at any particular moment is finite, this identification process must eventually terminate. The result is a Schreier graph of a free group.

Notice the following facts about this algorithm:

The last fact means that there is a limiting graph Γ, and the other facts imply that Γ is a Schreier graph for the free group whose stabilizer contains both H and the normal subgroup generated by R.

Thus, the result may be regarded as a G-set. Since the graph is path-connected, it corresponds to an irreducible G-set, and if [G:H] is finite, then the algorithm must terminate: the graph has no more vertices than [G:H], so there is some point where all of the neighbors of those vertices have stabilized. There is a path between the basepoint and any labeled vertex, so at this point such a path remains within the stabilized vertices.

The question remains whether the result actually is isomorphic to G/H itself. Each step of the algorithm creates a new Schreier graph whose stabilizer contains the stabilizer of the previous Schreier graph, and so there is an increasing chain of subgroups, each of which is inside the limiting H. A relator in Γ is a path, and so there is some step of the algorithm such that that path in that step’s Schreier graph is a loop, and so it is an element of that step’s stabilizer, and hence of H.

That is it for correctness.

Just for the sake of some intuition: the way I think about the algorithm is that we are creating a covering space of the wedge sum of circles, one circle per generator of G. We lazily quotient the covering space by the relations of G and the generators of H to produce a covering space whose fiber is in one-to-one correspondence with G/H.

One thing to notice about the algorithm is that there is no need for [G:H] to be finite, other than to guarantee it will terminate. There is still a limit graph Γ, but unfortunately there is no way to tell when the graph has stabilized!

4. Implementation

Now we give a concrete implementation of the algorithm in Python. For simplicity, we will just compute S3/b with the presentation a,b|a3,b2,abab. You can download tc.py.

The main data structures are

idents = []
neighbors = []
to_visit = 0
The idents variable is a list of numbers, with the property that idents[i] <= i. This is a union-find data structure for keeping track of the vertex quotients for the Schreier graph so far. The find function looks up the current-lowest label for a particular labeled vertex:
def find(c):
    c2 = idents[c]
    if c == c2:
        return c
    else:
        c2 = find(c2)
        idents[c] = c2
        return c2

For vertices which have not been removed, the neighbors data structure contains a list of all the neighboring vertices, with None for non-existent neighbors — non-existent neighbors are presumed to be a portion of the free group’s universal Schreier graph, namely a tree. Each entry is another vertex label. Vertex labels may be out of date, so find must be run on them to ensure validity.

The to_visit variable keeps track of which vertices have been marked. Any vertex whose label is less than to_visit is considered marked.

We record the relations and generators in the following structures, the details of which are not that important:

ngens = 2
rels = [
    (1, 0), # a^-1a
    (3, 2), # b^-1b
    (0, 0, 0), #a^3
    (2, 2), # b^2
    (0, 2, 0, 2) # abab
]
hgens = [
    (2,), # b
]
We use 0 and 1 for a and a1, and 2 and 3 for b and b1. For simplicity of the core algorithm, we introduce a1a and b1b as explicit relations.

The identification procedure is called unify.

def unify(c1, c2):
    c1 = find(c1)
    c2 = find(c2)
    if c1 == c2:
        return
    c1, c2 = min(c1, c2), max(c1, c2)
    idents[c2] = c1
    for d in xrange(2*ngens):
        n1 = neighbors[c1][d]
        n2 = neighbors[c2][d]
        if n1 == None:
            neighbors[c1][d] = n2
        elif n2 != None:
            unify(n1, n2)
It takes two vertices, and makes the one with the greater label refer to the lesser one with idents[c2] = c1. Then, it moves all of the neighbors of the deleted vertex by recursively calling unify. In case the neighbor being replaced is non-existent, which is the base case of the recursion, we just record it as a neighbor. There is no need to record this identification on the other end of the edge because the other end is referring to c2, which is now c1.

For following paths, we have the following two functions:

def follow(c, d):
    c = find(c)
    ns = neighbors[c]
    if ns[d] == None:
        ns[d] = new()
    return find(ns[d])

def followp(c, ds):
    c = find(c)
    for d in reversed(ds):
        c = follow(c, d)
    return c
The first takes a vertex and finds the neighbor in the d direction, creating a new vertex in that direction if needed, and the second follows a list of directions to find the end of a path. The follow function creates new neighbors with the following function:
def new():
    c = len(idents)
    idents.append(c)
    neighbors.append((2*ngens)*[None])
    return c
This just creates a fresh label so that idents[c] == c, and initializes the neighbors array to point to non-existent vertices. There is no need for follow or new to record the neighbors on the new vertex, since the relations such as a1a will eventually take care of this.

Finally, there is the core algorithm:

start = new()

for hgen in hgens:
    unify(followp(start, hgen), start)

while to_visit < len(idents):
    c = find(to_visit)
    if c == to_visit:
        for rel in rels:
            unify(followp(c, rel), c)
    to_visit += 1
It creates the start vertex, adds all of the relations for H as relators at this basepoint, and then for each unmarked vertex, adds all of the relations from rels. Notice how unify is being used to turn paths into relators.

After this, the data structures contain the Schreier graph for G/H. This can be interpreted as a permutation representation, for instance using

cosets = [c for i, c in enumerate(idents) if i == c]

perms = [[cosets.index(follow(c, 2*d)) for i, c in enumerate(cosets)]
         for d in range(ngens)]
to enumerate the cosets (which are vertices which have the least label in their equivalency classes), and then taking edges of the Schreier graph for each generator to construct a permutation.

The permutations can be written in cycle notation fairly easily. One way is with

def cycle(perm):
    parts = []
    for i in range(len(perm)):
        part = [str(i+1)]
        k = perm[i]
        while k != i:
            if k < i: break
            part.append(str(k+1))
            k = perm[k]
        else:
            parts.append(" ".join(part))
    return "("+")(".join(parts)+")"

for d in range(ngens):
    print "g%d ="%d, cycle(perms[d])

For these particular relations, the output is

g0 = (1 2 3)
g1 = (1)(2 3)
which means there are three cosets. The first generator acts cyclically on the three cosets, and the second generator merely swaps the non-subgroup cosets. This actually gives a faithful representation of S3.

Replacing H with the trivial subgroup, we instead get

g0 = (1 2 4)(3 5 6)
g1 = (1 3)(2 6)(4 5)
which means S3 has six elements.

5. Applications

There are a number of things one can deduce from the result of the algorithm.

6. References