An introduction to linear algebra

Our story of linear algebra begins with the concept of the vector space. For our discussion, we will let k be some field, for instance the real numbers R or the complex numbers C.

Definition. A vector space V is a set (whose elements are called vectors) with addition of vectors and scalar multiplication of vectors by elements of k. That is,

  1. if v, w ∈ V, then there is an element v+w ∈ V;
  2. if c ∈ k and v ∈ V, then there is an element cv ∈ V;
  3. there is a vector called 0 ∈ V such that v+0=v for all v ∈ V; and
  4. everything respects the associative, commutative, and distributive laws (u+(v+w)=(u+v)+w, v+w=w+v, c(v+w)=cv+cw, and (c+d)v=cv+dv).

Some consequences of this definition are that 0 is unique, and that there are unique additive inverses −v such that v+(−v)=0 (left as exercises).

There are many examples of vector spaces. One example is the noble Rn, that is, the set of all n-tuples (c1, …, cn) with ci ∈ R, where addition and scalar multiplication are componentwise. One can make nice diagrams of vector addition by the so-called “tip-to-tail method”: imagine the vectors as arrows which can be translated in space without rotation, and place the “tail” of one vector at the “tip” of the last; the arrow which starts at the first tail and ends at the last tip is the resulting sum. Scalar multiplication by c then corresponds to scaling the arrow by a factor of c (as it should, at least for integers: v+v=1v+1v=(1+1)v=2v, for instance). When n=1, notice that this is just R as a vector space with R scalars.
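
To make this concrete, here is a small computational sketch of Rn with componentwise operations (plain Python; the helper names add and scale are ours, chosen only for illustration):

    def add(v, w):
        """Componentwise addition of two n-tuples."""
        return tuple(vi + wi for vi, wi in zip(v, w))

    def scale(c, v):
        """Scalar multiplication of an n-tuple by c."""
        return tuple(c * vi for vi in v)

    v = (1.0, 2.0)
    w = (3.0, -1.0)
    print(add(v, w))                 # (4.0, 1.0)
    print(scale(2, v))               # (2.0, 4.0)
    print(add(v, v) == scale(2, v))  # True: v + v = 2v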

A good example that is not Rn is the vector space of polynomials (in one or several variables). Two polynomials may be added, and a polynomial may be multiplied by a scalar. Another good example is the vector space of continuous functions, say R → R. In calculus, one learns that the sum of continuous functions is continuous, as is a constant times a continuous function.

A vector subspace W of a vector space V is a subset of the vectors which is itself a vector space. An example is the set of vectors (x,0) for all x ∈ R inside of R2. One can show that a nonempty subset of a vector space is a vector subspace if and only if it is closed under addition and scalar multiplication. For instance, it follows from this that 0 ∈ W, because 0w=0 for any w ∈ W.

Let us look again at the definition of a vector space, and suppose we have some scalars c1, …, cn ∈ k and some vectors v1, …, vn ∈ V. We may apply rule 2 for each pair to get vectors c1v1, …, cnvn ∈ V, and then we may repeatedly apply rule 1 to get a vector c1v1+⋯+cnvn ∈ V (one may prove that it does not matter in which order these vectors are added by appealing to the associative law). This is a key operation on a collection of vectors, and we call it a linear combination. In fact, the rules are set up precisely so that this concept makes sense.
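
As a small illustration (a sketch in Python assuming numpy; the function name linear_combination is ours), a linear combination is computed by scaling each vector and summing:

    import numpy as np

    def linear_combination(coeffs, vectors):
        """Return c1*v1 + ... + cn*vn for coefficients ci and vectors vi."""
        total = np.zeros(len(vectors[0]), dtype=float)
        for c, v in zip(coeffs, vectors):
            total += c * np.asarray(v, dtype=float)
        return total

    print(linear_combination([3, -2], [(1, 0), (0, 1)]))  # [ 3. -2.]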

1. Linear transformations

At this point, we can start asking questions: which vectors in V are linear combinations of some given collection of vectors in V? Can we find a finite collection such that every vector is a linear combination of those vectors? How unique are representations as a linear combination? Before answering these, however, we will take a detour through the fundamental idea of a linear transformation.

Definition. A linear transformation is a function T: V → W between vector spaces V and W such that

  1. if v, w ∈ V, then T(v+w)=T(v)+T(w); and
  2. if v ∈ V and c ∈ k, then T(cv)=cT(v).
That is, for a linear combination ∑i civi, we have T(∑i civi)=∑i ciT(vi). When W=V, the linear transformation is called a linear operator.

Since T(0)=T(0+0)=T(0)+T(0), it follows that T(0)=0, and so this gives a quick test to check whether a function is not a linear transformation. One example of a function that fails this test is a (nonzero) translation of the plane R2, since it does not send 0 to 0.

An example of a linear transformation is a rotation T around the origin in the plane. Another is V=W being polynomials and T being the derivative. The most basic examples are the identity transformation id x=x (also known as I) and the zero transformation Zx=0.
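
As a quick numerical sanity check (not a proof) that a rotation of the plane is linear, one can test the two defining properties on sample inputs; this sketch assumes numpy, and the function name rotate is ours:

    import numpy as np

    def rotate(theta, v):
        """Rotate the vector v in R2 by the angle theta about the origin."""
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        return R @ np.asarray(v, dtype=float)

    theta, c = 0.7, 4.2
    v, w = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
    print(np.allclose(rotate(theta, v + w), rotate(theta, v) + rotate(theta, w)))  # True
    print(np.allclose(rotate(theta, c * v), c * rotate(theta, v)))                 # True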

There are two important subspaces associated with a linear transformation T: V → W. The first is the kernel (or nullspace) of T, denoted ker T, which is the set of all vectors v ∈ V such that Tv=0. One can check that ker T is a vector subspace of V. The second is the image of T, denoted im T, which is the set of all vectors Tv ∈ W for v ∈ V (that is, the set of all vectors w ∈ W for which there is a v ∈ V such that w=Tv). One can also check that im T is a vector subspace of W.

Recall that a function T: V → W is surjective (or “onto”) if for every w ∈ W there is a v ∈ V such that Tv=w. A clear observation is that a linear transformation T is surjective if and only if im T=W. Recall also that a function T: V → W is injective (or “one-to-one”) if whenever Tv1=Tv2 then v1=v2 (alternatively, by the contrapositive, if whenever v1≠v2 then Tv1≠Tv2). For a linear transformation, notice that this means that whenever T(v1−v2)=0 then v1−v2=0, and, letting v=v1−v2, that whenever Tv=0 then v=0. This amounts to saying that T is injective if and only if ker T={0} (called having a “trivial kernel”).
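
For maps given by matrices, both conditions can be checked numerically through the rank; this is a sketch assuming numpy, with helper names of our own choosing:

    import numpy as np

    def has_trivial_kernel(A):
        """T(x) = A @ x is injective iff rank(A) equals the number of columns."""
        return np.linalg.matrix_rank(A) == A.shape[1]

    def is_surjective(A):
        """T(x) = A @ x is surjective iff rank(A) equals the number of rows."""
        return np.linalg.matrix_rank(A) == A.shape[0]

    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])                     # a map R2 -> R3: injective, not surjective
    print(has_trivial_kernel(A), is_surjective(A))  # True False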

Back to linear combinations. We call a tuple of vectors v=(v1 v2 ⋯ vn) a hypervector. In fact, Vn, the collection of such hypervectors, is itself a vector space. (As an aside: we may imagine kn to be a collection of hypervectors, but we will not do this, for a reason which will become clear shortly.) Given a vector c ∈ kn, we define the product vc to be the linear combination ∑i civi. For instance, if v1=(0,1) and v2=(1,0), with c=(2,1), then

vc=((0,1) (1,0))(2,1)=2(0,1)+1(1,0)=(1,2). An astute reader will recognize this as matrix multiplication with a column vector, if we had written the vectors of v vertically while forgetting to write the inner parentheses.

In fact, if v ∈ Vn is a hypervector, the function T: kn → V defined by T(c)=vc is a linear transformation, as one can easily check. Notice, if we drop parentheses, that Tc=vc, which suggestively has it that T=v. We will speak of v itself as a linear transformation.
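
In coordinates, the product vc is exactly a matrix-vector product once the vi are stacked as columns. A sketch with the example above (assuming numpy):

    import numpy as np

    v = np.column_stack([(0, 1), (1, 0)])  # columns are v1 = (0,1) and v2 = (1,0)
    c = np.array([2, 1])
    print(v @ c)                           # [1 2], i.e. 2*(0,1) + 1*(1,0) = (1,2)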

2. Basis and dimension

Now we are in a position to ask those questions about linear combinations. When does a collection of vectors v1, …, vn ∈ V represent every vector in V as a linear combination? It is when the linear transformation v: kn → V is surjective (i.e., when im v=V). When is a representation as a linear combination unique? It is when v: kn → V is injective (i.e., when ker v is trivial).[1] This leads to a natural question: can one find a collection v for which v is an isomorphism (a linear transformation which is both surjective and injective)? This would give every element of V a unique coordinate vector in kn.

The answer to this question comes from observing two processes. The first is looking at v which are surjective. We call such surjective hypervectors spanning sets[2] of V, since we imagine every element of V is “spanned” by some linear combination of these vectors. The second process is looking at v which are injective. We call the vectors in an injective v linearly independent vectors. The terminology comes about from the concept of linear dependence: if v is not injective, then the kernel is nontrivial, so there is a nonzero c ∈ kn with c1v1+⋯+cnvn=0; if, say, c1≠0, then v1=−(c2/c1)v2−⋯−(cn/c1)vn, and so v1 “depends” on the rest; linearly independent vectors are ones that are not dependent.

We will limit this discussion to vector spaces which have a spanning set with finitely many vectors in it, since otherwise the theory is not so simple. Suppose V is such a vector space. Now it makes sense to speak of a minimal spanning set v, one that has the fewest number of elements in it. There may very well be (infinitely) many minimal spanning sets, but we only care that there is one. Because it is a spanning set, every vector of V may be represented as a linear combination, but it is not yet clear that the representation is unique. Suppose v=(v1 ⋯ vn) is a minimal spanning set: is it an independent set? If there were a dependence, then one of the vectors, say vn, would be in the span of v1, …, vn−1, and then (v1 ⋯ vn−1) would be a spanning set with fewer vectors, contradicting minimality; hence v is indeed an independent set. This means that such vector spaces have an isomorphism kn → V for some n.

What we have not settled is whether there are minimal spanning sets with different numbers of elements. If u=(u1 ⋯ um) and v=(v1 ⋯ vn) are both minimal spanning sets, then there are isomorphisms u: km → V and v: kn → V, and so u−1v: kn → km is an isomorphism.[3] Hence, the question reduces to:

Lemma. If T: km → kn is an isomorphism, then m=n.

Proof. Suppose it is not already the case that m=n, and without loss of generality assume m>n (replacing T with T−1 if necessary). Let e1, …, em be the standard basis vectors of km (with ei=(0, …, 0, 1, 0, …, 0) having a 1 in the ith coordinate), let e′1, …, e′n be those of kn, and construct an n×m matrix (Aij) by choosing the Aij to be the unique coefficients such that Tej=∑i Aij e′i. Since there are more unknowns than equations, by elimination we can solve the system 0=Ac for some nonzero c ∈ km, and since (Te1 ⋯ Tem)c=∑j cjTej=∑j cj(∑i Aij e′i)=∑i(∑j Aij cj)e′i=Ac=0, this gives a linear dependence among the vectors Tej, which in turn gives a linear dependence among the vectors ej since T is injective; but the ej are linearly independent, a contradiction. Therefore, m=n. ∎
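
The elimination step, that a homogeneous system with more unknowns than equations has a nonzero solution, can be illustrated numerically; the sketch below uses an SVD as a stand-in for elimination (assuming numpy; the helper name is ours):

    import numpy as np

    def nonzero_null_vector(A):
        """Return some c != 0 with A @ c close to 0, for A with more columns than rows."""
        _, _, Vt = np.linalg.svd(A)
        return Vt[-1]                    # right singular vector for the smallest singular value

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # n = 2 equations, m = 3 unknowns
    c = nonzero_null_vector(A)
    print(np.allclose(A @ c, 0))         # True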

Theorem. Every vector space V with a finite spanning set has a minimal spanning set, called a basis, and any two bases have the same number of elements, called the dimension of the vector space. That is, there is a hypervector v which is an isomorphism kn → V. Such vector spaces are called finite dimensional.

Proof. This follows from the lemma. ∎

Effectively, what we have said is that, up to isomorphism, there are not very many kinds of finite dimensional vector spaces, and in fact they are all isomorphic to some kn. (“Dimension is all there is.”)

To finish off this discussion of dimension, we need the following key lemma:

Lemma. A vector subspace W of a finite dimensional vector space V is itself finite dimensional.

Proof. Since V is finite dimensional, let us assume it is just kn, and let W be the corresponding subspace under the isomorphism (i.e., if T: V → kn is an isomorphism, we replace W by T(W)={Tw : w ∈ W}). Suppose W had m>n linearly independent vectors w1, …, wm ∈ W. Then the n×m matrix A=(w1 ⋯ wm) represents a system of equations with more unknowns than equations, so again by elimination there is a nonzero c ∈ km such that Ac=0. Hence there is a dependence among the wi, a contradiction, so W has at most n linearly independent vectors. Now let w1, …, wm ∈ W be a maximal set of m≤n linearly independent vectors, and suppose that it does not span W. Then there is a w ∈ W not in the span of these vectors, and w1, …, wm, w is a larger linearly independent set, contradicting maximality. Therefore, W is spanned by a finite set of vectors in V, and so it is finite dimensional. ∎

What this gives us is that a linearly independent set of vectors must contain no more vectors than a spanning set. Thus, we may build a basis by adding vectors one at a time that are not yet in the span of the vectors already present, and the lemma guarantees that this process terminates.
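
The basis-building process just described can be sketched directly; this assumes numpy and uses matrix_rank as the “is it already in the span?” test (the function name greedy_basis is ours):

    import numpy as np

    def greedy_basis(vectors):
        """Keep a vector only if it is not in the span of the ones kept so far."""
        basis = []
        for v in vectors:
            candidate = np.column_stack(basis + [np.asarray(v, dtype=float)])
            if np.linalg.matrix_rank(candidate) > len(basis):
                basis.append(np.asarray(v, dtype=float))
        return basis

    vecs = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0)]
    print(len(greedy_basis(vecs)))  # 2: these vectors span a 2-dimensional subspace of R3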

3. Rank-nullity

What is the relationship between a linear transformation T: V → W, the space V, ker T, and im T? If we think about it, T “kills” the entire subspace ker T, and so, intuitively, it is collapsing dimensions; we may think of the image as consisting of the dimensions of V which remain. This intuition is indeed correct, and it gives us an important theorem in the theory of linear transformations:

Theorem (Rank-nullity). Suppose T: V → W is a linear transformation between finite dimensional vector spaces. Then dim V=dim(ker T)+dim(im T). The dimension of the kernel is called the nullity and the dimension of the image is called the rank.

Proof. Let u be a basis of m elements for ker T, which exists because V is finite dimensional and ker T is a subspace of V. Let v be a basis of n elements for V obtained by adding vectors to u as described above. Then T(c1v1+⋯+cnvn)=T(cm+1vm+1+⋯+cnvn), because the first m vectors in the basis are in the kernel. If this resulting vector in W were 0, then cm+1vm+1+⋯+cnvn ∈ ker T also, and hence would be a linear combination of the vectors in u; since v is a basis, this forces cm+1=⋯=cn=0. Hence the vectors Tvm+1, …, Tvn are linearly independent, and by the computation above they span im T. This proves that the dimension of V is the sum of the dimensions of ker T and im T, namely n=m+(n−m). ∎
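
Here is a small numerical check of the theorem for a matrix map (a sketch assuming numpy and scipy; the matrix is an arbitrary example, not from the text):

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 5))        # T(x) = A @ x, with V = R5 and W = R3
    rank = np.linalg.matrix_rank(A)        # dim(im T)
    nullity = null_space(A).shape[1]       # dim(ker T), computed independently
    print(rank, nullity, rank + nullity == A.shape[1])  # 3 2 True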

There is a related concept underlying this theorem. Given two vector spaces V and W, we may construct a new vector space called the direct sum of V and W, denoted V⊕W, which is the set of all pairs {(v,w) : v ∈ V, w ∈ W}. It is a vector space under componentwise addition and scalar multiplication. We have already seen R2, which is actually R⊕R. Another construction is between vector subspaces U and W of a vector space V, namely the sum U+W={u+w : u ∈ U, w ∈ W}, which one can check is also a vector subspace.

There is a canonical linear transformation T: U⊕W → U+W defined by T(u,w)=u+w. If U+W=V and T is an isomorphism, then we say that V is a direct sum of U and W (it is common in mathematics to speak of things being “the same,” even though they are not actually exactly the same, if there is some isomorphism between them).

Lemma. If U, W are vector subspaces of V such that U+W=V and U∩W={0} (i.e., they only share the zero vector), then V is the direct sum U⊕W.

Proof. Let T: U⊕W → V be the canonical map T(u,w)=u+w. It is surjective because U+W=V. Suppose (u,w) ∈ ker T. Then u+w=0, and so u=−w. This entails that u ∈ W and w ∈ U, since U and W are both closed under multiplication by −1 ∈ k. Since u, w ∈ U∩W, we have u=w=0; hence (0,0) is the only element of ker T, and so T is injective. Therefore, T is an isomorphism. ∎

Lemma. If V is isomorphic to U⊕W, then dim V=dim U+dim W.

Proof. If u is a basis of U and w is a basis of W, then (u1,0), …, (um,0), (0,w1), …, (0,wn) is a basis of U⊕W. This is essentially the proof. ∎

What the rank-nullity theorem is actually saying is that V is isomorphic to ker T⊕im T. In the proof, we effectively constructed a (non-unique) linear transformation S: im T → V such that TS: im T → im T is the identity (something like a partial inverse of T), by setting S(Tvi)=vi on the basis Tvm+1, …, Tvn of im T. With this, the explicit isomorphism V → ker T⊕im T is defined by v ↦ (v−S(T(v)), T(v)).

4. Change of basis and matrices

If we have two bases v and w of a vector space V, how do they relate? In each case, we have an isomorphism kn → V, and so the composition P=w−1v: kn → kn is itself a linear transformation, which takes coordinates according to the first basis and converts them into coordinates according to the second. That is, if c ∈ kn, then w(Pc)=w(w−1vc)=vc, so Pc ∈ kn really is the coordinate vector according to w for the vector vc. This is illustrated by the following diagram, where the top arrow is the linear transformation P and the bottom arrow is the identity transformation id x=x. The diagram is said to commute, in that id(v(c))=w(Pc) for all c ∈ kn (where id(v(c)) is simply v(c)).

          P
    kn ------> kn
    |           |
    v           w
    |           |
    V  ------>  V
          id

Notice that P may be regarded as an n×n square matrix. This is a point of view which is sometimes useful when one works with the vector spaces in terms of coordinates rather than abstract vectors.
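
Concretely, if the two bases are stored as matrices whose columns are the basis vectors, then P can be computed by solving a linear system; a sketch in R2 assuming numpy (the particular bases are made up for illustration):

    import numpy as np

    V_basis = np.array([[1.0, 1.0],
                        [0.0, 1.0]])           # columns: v1, v2
    W_basis = np.array([[2.0, 0.0],
                        [0.0, 1.0]])           # columns: w1, w2
    P = np.linalg.solve(W_basis, V_basis)      # P = w^{-1} v
    c = np.array([3.0, -1.0])                  # coordinates with respect to v
    print(np.allclose(W_basis @ (P @ c), V_basis @ c))  # True: both describe the same vector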

In fact, suppose T: V → W is a linear transformation, with v a basis for the n-dimensional V and w a basis for the m-dimensional W. We have a similar diagram

          A
    kn ------> km
    |           |
    v           w
    |           |
    V  ------>  W
          T

where the bottom arrow is T and the top arrow is the map which makes the diagram commute, namely A=w−1Tv. We may regard A as an m×n matrix, which is called the matrix of the transformation (with respect to these bases). The matrix can vary wildly depending on exactly which bases are chosen for V and W. When studying a particular linear transformation, it pays to find good bases for which the matrix is easy to analyze.

The point of view which must be stressed is this: vector spaces may be coordinatized in many different ways (depending on the basis), and a linear transformation between vector spaces can be pulled back to be a linear transformation on the coordinates themselves, and likewise a linear transformation on coordinates can be pushed forward to a linear transformation of the vector spaces. There is much value in this duality of representation.

5. Eigenvectors and eigenvalues

Consider a linear operator T: V → V. In some cases, we may find non-zero vectors v such that Tv=λv for some (possibly zero) λ ∈ k. When this happens, λ is the eigenvalue for the eigenvector v. The “eigen” refers to the German word for “own” or “innate,” in the sense that these data tell everything about the action of T.

For instance, suppose that V has a basis v of eigenvectors vi with respective eigenvalues λi. Then T(∑i civi)=∑i ciλivi, and so the matrix of T with respect to the basis v is a diagonal matrix D with entries being the eigenvalues. (And, if we wish to find the matrix when V=kn in the standard basis, by a suitable basis change, the matrix becomes vDv−1.)
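
A numerical illustration of this (a sketch assuming numpy; the particular matrix is an arbitrary example with a real eigenbasis):

    import numpy as np

    T = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                      # symmetric, so an eigenbasis exists
    eigvals, v = np.linalg.eig(T)                   # the columns of v are eigenvectors
    D = np.diag(eigvals)
    print(np.allclose(T, v @ D @ np.linalg.inv(v))) # True: T = v D v^{-1}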

We have already seen one example of an eigenvector, namely non-zero vectors in ker T, since Tv=0v for v ∈ ker T. An easy class of examples, from the above discussion, is diagonal matrices. A more interesting example is the differential operator d/dx on differentiable functions R → R. Since (d/dx)e^(λx)=λe^(λx), we see that e^(λx) is an eigenvector with eigenvalue λ.

The analysis of eigenvectors begins with the observation that an eigenvector for an eigenvalue λ is in the kernel of T−λI (which is the linear operator taking a vector v ∈ V to Tv−λv). We define the eigenspace Eλ for an eigenvalue λ to be ker(T−λI). Since this is a kernel, it is clear that it is indeed a vector subspace of V.

If λ≠μ and v ∈ Eλ∩Eμ, then Tv=λv and Tv=μv simultaneously, and so 0=(λ−μ)v, which implies v=0. This means that there are only finitely many eigenvalues for a given linear operator on a finite dimensional V, since Eλ1⊕⋯⊕Eλn is isomorphic to the vector subspace Eλ1+⋯+Eλn of V (and each eigenspace has dimension at least 1, so n≤dim V).

What we do not know yet is whether there even are any eigenvalues. Unfortunately, it is not always the case that there are (for instance, rotation of R2 by π/2 has none),[4] but this failure can only happen when the field is not algebraically closed (a field is algebraically closed if every nonconstant polynomial has a root). One such field is C, and so we will stick with this field for the rest of the section.

To show that a linear operator T on a finite dimensional vector space V has an eigenvalue, let v ∈ V be a nonzero vector. Since V is finite dimensional, there is some n such that v, Tv, T2v, …, Tnv are linearly dependent (where Tnv represents T(Tn−1v), and so on); take n to be minimal with this property. Let ai ∈ C, not all zero, be such that a0v+a1Tv+⋯+anTnv=0; by minimality of n, we have an≠0. Consider the polynomial p(z)=a0+a1z+⋯+anzn. By the fundamental theorem of algebra, p may be factored as p(z)=an(z−r1)⋯(z−rn) for roots ri ∈ C. Then an(T−r1I)(T−r2I)⋯(T−rnI)v=(a0I+a1T+⋯+anTn)v=a0v+a1Tv+⋯+anTnv=0, and since v≠0 and a composition of injective maps is injective, at least one of the linear operators T−riI is not injective; hence there is at least one i such that ker(T−riI) is nontrivial. Therefore, T has at least one eigenvalue ri.[5]

Unfortunately again, while this shows that there is an eigenvalue, we are not guaranteed to find a full set of eigenvalues λ1, …, λn such that Eλ1+⋯+Eλn=V, but we will take what we can get. An example of this is the matrix

    [ 0 1 ]
    [ 0 0 ]

which has the eigenvector (1,0) with eigenvalue 0. However, the above technique applied to v=(a,b) gives Tv=(b,0) and T2v=0, so there is the linear dependence T2v=0; but z2=0 has only 0 as a root, and by rank-nullity the eigenspace E0=ker T has dimension 1, so the eigenvectors do not span the whole space.
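
One can confirm this numerically (a sketch assuming numpy and scipy):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    print(np.linalg.eigvals(A))    # [0. 0.]: the only eigenvalue is 0
    print(null_space(A).shape[1])  # 1: the eigenspace E0 is one-dimensional, so no eigenbasis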

At this point, we could discuss the relationship between determinants and eigenvalues, as well as trace and eigenvalues. Determinant and trace are useful tools for their study.

6. Linear recurrence relations

In this section, we discuss an example of using eigenvectors to analyze a linear recurrence relation and determine a closed formula.

Given a sequence of numbers s=(s0, s1, …), a linear recurrence relation of order k is a rule sn=a1sn−1+a2sn−2+⋯+aksn−k, with the ai constants. The Fibonacci numbers, with fn=fn−1+fn−2, are an example of a linear recurrence relation.

Infinite sequences of numbers form a vector space V, using componentwise addition and scalar multiplication. The zero vector is the sequence (0,0,…). We define two linear operators on infinite sequences. The first is the identity I, where Is=s, and the second is the shift operator S, where Ss=(s1,s2,s3,…). We may combine these to produce, for instance, the difference operator D=S−I, where Ds=(s1−s0, s2−s1, …).

Perhaps surprisingly, there is a nice class of eigenvectors of S. Since S(1, λ, λ2, …)=(λ, λ2, λ3, …)=λ(1, λ, λ2, …), we see that (1, λ, λ2, …) is an eigenvector with eigenvalue λ.[6]

Let us look at the Fibonacci sequence again. Notice that the recurrence is saying that fn−fn−1−fn−2=0, and so S2f−Sf−f=0. That is, f is in the kernel of the operator S2−S−I. If we take a bit of a leap of faith, notice that the polynomial x2−x−1 has two roots, ϕ=(1+√5)/2 and 1−ϕ=(1−√5)/2, and this lets us factor S2−S−I=(S−ϕI)(S−(1−ϕ)I). So, the kernel of S2−S−I includes eigenvectors of S with eigenvalues ϕ and 1−ϕ (since the factorization could have gone in either order). A possibility, then, is that fn=c1ϕn+c2(1−ϕ)n, a sum of two eigenvectors. Let us find c1 and c2 so that f0=0 and f1=1. In this case, 0=f0=c1+c2 and 1=f1=c1ϕ+c2(1−ϕ). Solving this system for c1, c2, we have c1=1/√5 and c2=−1/√5, hence a possible closed form

fn=(ϕn−(1−ϕ)n)/√5. Indeed, we constructed it so that it is in the kernel of S2−S−I, and so fn=fn−1+fn−2, and furthermore f0=0 and f1=1. It must be the closed form for the Fibonacci sequence.
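
A quick check of the closed form against the recurrence (a sketch in plain Python; both function names are ours, and both assume f0=0, f1=1):

    phi = (1 + 5 ** 0.5) / 2

    def fib_closed(n):
        """The closed form derived above (rounded to absorb floating-point error)."""
        return round((phi ** n - (1 - phi) ** n) / 5 ** 0.5)

    def fib_recurrence(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    print(all(fib_closed(n) == fib_recurrence(n) for n in range(30)))  # True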

Since |1−ϕ|<1, |(1−ϕ)n| very quickly approaches zero. The limit of fn/fn−1 as n → ∞ is thus ϕ, the golden ratio.

It should be said that there is another way we could have analyzed this system. By the recurrence relation, we have the following relation:

    [ fn   ]   [ 1 1 ] [ fn−1 ]
    [ fn−1 ] = [ 1 0 ] [ fn−2 ]

Let A be the 2×2 matrix above. Then, if we wish to find fn, we may calculate An(1,0)=An(f1,f0), which equals (fn+1,fn), and take the second coordinate. Now, if there is an eigenbasis v of A (there is), we may diagonalize A in that basis as vDv−1. In this form, the dynamical properties of A may be studied easily by observing that An=(vDv−1)n=vDnv−1. One may check that ϕ and 1−ϕ are the eigenvalues of A, and the eigenbasis follows.
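
As a sketch of this point of view (assuming numpy; the function name is ours), one can compute Fibonacci numbers from the eigendecomposition A=vDv−1, so that An=vDnv−1:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 0.0]])
    eigvals, v = np.linalg.eig(A)       # the eigenvalues are phi and 1 - phi
    v_inv = np.linalg.inv(v)

    def fib_matrix(n):
        """Second coordinate of A^n (1, 0), computed as v D^n v^{-1} (1, 0)."""
        An = v @ np.diag(eigvals ** n) @ v_inv
        return round(float((An @ np.array([1.0, 0.0]))[1].real))

    print([fib_matrix(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]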

[1] Note that if some vector can be written in two different ways vc and vc′, then so can zero, as v0 and v(c−c′), and this transfers to giving a representation of any vector in two different ways, if it has a representation at all.
[2] Caution: a spanning set here is a tuple, not a set. The terminology is historic.
[3] Exercise: prove that if T: V → W is an isomorphism, then so is T−1: W → V.
[4] The next best thing is Jordan normal form, but we will not discuss this here.
[5] From Down with Determinants! by Sheldon Axler, 1995.
[6] There are also generalized eigenvectors, but because we are avoiding Jordan normal form, we will not discuss them.