ANSWERS TO QUESTIONS ASKED BY STUDENTS
in Math 110, Fall 2003, and Math H110, Fall 2008,
taught from Friedberg, Insel and Spence's "Linear Algebra, 4th Edition".
----------------------------------------------------------------------
Regarding the discussion of the plane containing three points A,
B, C (not on a straight line) on p.5 as the set of vectors
A + s(B-A) + t(C-A), you ask how this relates to the description
of planes by equations in the coordinates, ax+by+cz=d, and what
"the advantages and disadvantages" of using each form are.
Well, the relationship between these two ways of describing a plane
is typical of something that occurs throughout mathematics, where
some system of elements can be described "from below" or "from above"
-- namely, either by taking a smaller set of elements and performing
operations on them that build up the whole set, or by starting with
some larger set of elements, and describing the set we are interested
in as consisting of those members of the larger set that satisfy
certain conditions. And a recurring type of problem in mathematics
is to start with an object described in one of these ways, and find a
description of it in the other way: Given a set of elements, and a
process of "building" other elements from them, to find a condition
that precisely characterizes the elements one can get by this process;
or given a condition, to find a way of "constructing" all the elements
that satisfy it.
I could go into a long discussion of how to go back and forth between
the two ways of describing a plane; but I prefer to say: it will
be easier to do when we have learned more of the theory. At this
point, if it interests you, I think you should be able to work it out
yourself. If you try and get stuck, come and discuss it at office
hours.
And rather than considering "advantages and disadvantages" of using
one or the other form, I would say it is most valuable to be aware
that the two forms exist, so that when a problem comes up where one
or the other is more useful, you will be able to make the choice.
----------------------------------------------------------------------
You ask whether physical experiments give us a basis for believing
all 8 of the axioms for a vector space (p.7).
Physical experiments lead us to believe that a reasonable model
of various quantities (such as velocity, displacement, and force)
is given by 3-tuples of real numbers (x,y,z), representing the
components of these quantities. The axioms for a vector space describe
mathematical properties that are possessed by the set of all such
3-tuples -- and that are also possessed by many other families of
mathematical entities, e.g., the set of all polynomials.
Physical experiments certainly don't imply that the above-
mentioned models of the universe are perfect. For example, if the
universe is finite, it clearly can't be exactly modeled by R^3. But
the model R^3 is very accurate for "small" distances (i.e., less than
billions of light-years), and is much simpler than the kinds of models
needed to describe the universe on a "large" scale. But mathematics
(and even the theory of vector spaces) is also used in studying those
models -- just in more complicated ways.
----------------------------------------------------------------------
You ask why (on p.9) the zero polynomial is defined to have degree -1.
The point is a subtle one. Notice first that the definition of degree
that works for all nonzero polynomials, given at the top of p.10,
is meaningless for the zero polynomial. So we have to make a different
choice of how to handle the zero polynomial, and try to make it in a
way that will be most convenient in our reasoning; i.e., will have
the effect that the statements that hold for the degree, so defined,
allow us to do proofs with the fewest digressions to handle special
cases.
To get a hint of what a good choice should be, consider a polynomial
of the form
a x^2 + b x + c .
In "most" cases this has degree 2 -- namely, whenever a is nonzero.
If a = 0, then it has lower degree.
In fact, if a = 0, then in "most cases" it has degree 1 -- namely
whenever b is nonzero. But if b = 0, then it has lower degree.
In this case, in "most cases" it has degree 0 -- namely whenever
c is nonzero. This suggests that when c is also 0, i.e., when
we are looking at the zero polynomial, we should define its degree
to be something lower than 0. The book's choice is to use -1.
There are actually three different choices that different authors
commonly make:
(a) Leave deg(0) undefined.
(b) Let deg(0) = - 1.
(c) Let deg(0) = - infinity.
In contexts where one considers multiplication of polynomials,
definition (c) is best, since it makes the rule deg(fg) =
deg(f) + deg(g) hold; its disadvantage, of course, is that it
requires one to use the symbol "- infinity" which is not an integer,
real number, etc. Choice (a) avoids the whole problem, but means
that in arguments involving degree one must always consider the
zero polynomial as a special case. For the purposes of this book,
the authors' choice, (b), is a good one.
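If you like to experiment, here is a small Python sketch of my own
(not from the text) illustrating choice (c): representing a polynomial
by its list of coefficients [c_0, c_1, ...], with deg(0) = -infinity,
so that deg(fg) = deg(f) + deg(g) holds with no exceptions.

   def deg(p):
       """Degree of a coefficient list; -infinity for the zero polynomial."""
       while p and p[-1] == 0:
           p = p[:-1]                 # drop trailing zero coefficients
       return len(p) - 1 if p else float('-inf')

   def mul(p, q):
       """Product of two polynomials given as coefficient lists."""
       r = [0] * (len(p) + len(q) - 1) if p and q else []
       for i, a in enumerate(p):
           for j, b in enumerate(q):
               r[i + j] += a * b
       return r

   f = [1, 2]                         # the polynomial 1 + 2x, of degree 1
   zero = []                          # the zero polynomial
   assert deg(mul(f, f)) == deg(f) + deg(f) == 2
   assert deg(mul(f, zero)) == deg(f) + deg(zero) == float('-inf')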
----------------------------------------------------------------------
You ask whether there is a convenient symbol for "is a
subspace of" (definition on p.16).
There is no standard one. Since the authors of this book use
sans-serif font for names of vector spaces (and for some related
entities, but not for names of subsets that aren't subspaces),
when you see a relation like "U \subset W" with U and W both
in sans-serif font, you can safely conclude that U is a subspace
of W. But there's no standard equivalent to sans-serif font for
handwritten work, so that doesn't provide a way for you to show this
in your homework. In group theory (a subject introduced in Math 113)
there is a convention that "_<" (i.e., the less-than-or-equal-to sign)
means "is a subgroup of", and some mathematicians have extended this
to other fields, to mean "is a sub--- of" where "---" is whatever
the field deals with. But this extended use is not very common, and
Friedberg, Insel and Spence have not adopted it. So I'm afraid that
in your written work in this course, you'll just have to use the word
"subspace".
----------------------------------------------------------------------
You ask why condition (a) of Theorem 1.3, p.17, is needed, given
the other two conditions.
Because the empty set is a subset of V, and satisfies the other
two conditions, but it is not a subspace of V, since it does not
contain the element 0.
----------------------------------------------------------------------
Both of you asked about the relation between quotient spaces
(Ex. 31, p.23) and modular arithmetic. I had it in my notes to
answer that in class, and came close, but then jumped to the next topic.
What I indicated in class was that for an equivalence relation ~ on
a set S, one forms a new set, S/~; and that if S has some
algebraic structure, one can, for appropriate "~", give S/~ a
structure of the same sort.
Well, in particular, the set of integers has the structure of a ring;
and for any choice of n, if we let ~ be the relation "congruent
mod n", then ~ has exactly n equivalence classes, and the set
Z/~ of these again becomes a ring, called Z/n, "the ring of integers
modulo n".
----------------------------------------------------------------------
You ask why the span of the empty set is defined to be \{0\} (first
definition on p.30).
There are two ways to look at it.
On the one hand, Theorem 1.5 (same page) shows that the span in V of
any set S is the smallest subspace of V which contains S
("smallest" in the sense of being contained in all others). If S is
a nonempty set, then that smallest subspace can be constructed as the
set of all linear combinations of members of S. Whether this works
when S is the empty set depends on whether one wants to consider 0
as a "linear combination" of the empty set of vectors. (As I mentioned
in class, I could give arguments for doing so; but students who were
not convinced by those arguments might then have trouble with what
followed.) In fact, giving the smallest subspace of V which
contains S is the most important property of the span operation,
which we definitely don't want to lose. Hence it is stated as a
definition made "for convenience".
The other way to look at it involves thinking about how we formally
define a sum of n elements. However one does it, the inductive
step is to assume one has given a meaning to a k-term sum
a_1 + ... + a_k, and define a (k+1)-term sum a_1 + ... + a_{k+1} as
(a_1 + ... + a_k) + a_{k+1}. But where do we start the induction?
The naive way is to start with 2-term sums, a_1 + a_2, using the
fact that 2-term addition is given to us. (This means that 2-term
addition is used twice: in the base step and the inductive step.)
Better is to define the sum of one term, a_1, to be a_1 itself,
and use this as one's base step. Best of all, I say, is to define
the sum of 0 terms to be 0, and use this as the base step. In
general, this makes it easier to handle results about summations,
with fewer special cases. In particular, it has the consequence that
one can take a "linear combination of elements of the empty set
within a vector space", namely, the sum of no terms, which is 0. Thus,
if we allow empty sums, we find that the span of the empty set is
\{0\}, without the need for a special definition.
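To make this concrete, here is a sketch of my own in Python of the
inductive definition of n-term sums, with the empty sum as base case:

   def vsum(vectors, zero=0):
       """Sum of a list of vectors; the sum of no terms is the zero vector."""
       if not vectors:
           return zero                # base case: the empty sum is 0
       return vsum(vectors[:-1], zero) + vectors[-1]  # (a_1+...+a_k) + a_{k+1}

   assert vsum([]) == 0    # the one linear combination of elements of
                           # the empty set is 0; so span({}) = {0}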
----------------------------------------------------------------------
Regarding the last part of Example 5, p.31, beginning "On the other
hand ...", you ask how the fact that the diagonal elements are equal
is relevant.
If you start with matrices having equal diagonal elements, and
form a linear combination of them, that linear combination will also
have equal diagonal elements. (Can you verify this?) But not every
member of M_{2x2}(R) has equal diagonal elements, so not every
member of M_{2x2}(R) is in the span of the elements mentioned.
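Here is a quick numerical spot-check of my own in Python, with made-up
matrices; it is evidence, not a substitute for the little algebraic
verification suggested above:

   import random
   A = [[1, 5], [7, 1]]      # equal diagonal entries: A[0][0] == A[1][1]
   B = [[2, 0], [9, 2]]
   for _ in range(5):
       s, t = random.random(), random.random()
       C = [[s*A[i][j] + t*B[i][j] for j in range(2)] for i in range(2)]
       assert C[0][0] == C[1][1]     # the diagonal entries remain equal

The equality holds exactly, since C[0][0] and C[1][1] are both
computed as s*1 + t*2.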
----------------------------------------------------------------------
You ask what the authors mean by "infinite field" on p.35, first
sentence of section 1.5.
They mean a field with infinitely many elements. So the field R
of real numbers, the field Q of rational numbers and the field
C of complex numbers are infinite, while the field Z_2, discussed
at the bottom of p. 555, is finite (having only two elements).
----------------------------------------------------------------------
You ask about question 1(a) on p.40.
If a set of vectors is linearly dependent, then some member of the
set is in the span of the rest, but this may not be so of every
vector in the set. For instance, the subset {(1,0), (2,0), (0,1)}
of R^2 is linearly dependent, because (2,0) is a linear combination
of the other vectors, namely, it equals 2 . (1,0) + 0 . (0,1).
But (0,1) is not a linear combination of the other vectors.
----------------------------------------------------------------------
You ask about the difference between "list" and "set" that I
talked about in lecture, in connection with Theorem 1.8 (p.43).
"List" is not a mathematical term; it is simply a term from
everyday life that I was using, the idea being "one thing written
after another". The corresponding mathematical concept is that
of an ordered n-tuple. So, for instance, (5,10) and (10,5) are
different ordered pairs (pair = 2-tuple), while {5,10} and {10,5}
are the same set. (5,10,10), (5,5,10), and (10,5,5) are different
ordered 3-tuples, but the sets {5,10,10}, {5,5,10}, and {10,5,5} are
the same, and are equal to {5,10}, since a set is determined just by
what elements belong to it, and 5, 10 are the only elements in
these sets.
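Incidentally, Python's tuples and sets model this distinction exactly;
a quick illustration (mine, not the book's):

   assert (5, 10) != (10, 5)                     # different ordered pairs
   assert {5, 10} == {10, 5}                     # the same set
   assert (5, 10, 10) != (5, 5, 10)              # different ordered 3-tuples
   assert {5, 10, 10} == {5, 5, 10} == {5, 10}   # all the same set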
----------------------------------------------------------------------
You ask why the set H constructed in the proof of Theorem 1.10, p.45,
has n - m elements.
The idea of that proof is that we start with the set G, and then,
at each step, throw away an element of G and replace it with an
element of L, in such a way that the new set still spans V. Since
L has m elements, we do this process m times; so m elements
are thrown out of G. Since G has n elements, after throwing m
elements out of it, we are left with n - m elements; the resulting
set is called H in the theorem.
All this is somewhat hard to see in the book's proof, because they do
the proof by induction, and so just show the inductive step, where one
element is thrown away and one new element brought in. Thus, the
fact that the new set has n-m elements is replaced by an inductive
argument, showing that if we assume that when L has m elements,
we can get H to have n-m, then when L has m+1 elements we
can get H to have n-(m+1) (last step).
----------------------------------------------------------------------
You ask why the vector space P_n(F) of Example 10, p.47, has
dimension n+1 and not n.
In asking a question like this, in order for me to help you, you need
to show me how far you have been able to get, and where you've gotten
stuck. If it seems to you that the answer should be n, then you feel
that the book's statement is wrong for all n, and you may as well
start, for concreteness, with a particular case, such as P_1(F),
which the book's statement says is 2-dimensional, and you feel is
1-dimensional. So take such a case, and e-mail me what your
understanding is of what elements belong to P_1(F), and what
1-element set you think is a basis of it, and why.
So please do this now; and try to make a practice, in general, of
making clear, in your Questions of the Day, how far you can get in
answering the questions, and where you hit a problem.
----------------------------------------------------------------------
You ask what is meant on p.47, Example 11, by saying that {1} is a
basis for the field of complex numbers; specifically, what {1} means,
and what its linear combinations are.
{1} means the set having a single element, namely 1. This is an
instance of the notation described in Appendix A, where it was noted
that {1,2,3,4} was the set with exactly 4 elements, 1,2,3 and 4.
Thus, a linear combination of members of {1} simply means an
expression gotten by multiplying its sole element 1 by a scalar;
i.e., products c 1, where c is in the base field, in this
case, C. Since c 1 = c, such elements are all the complex numbers.
----------------------------------------------------------------------
You ask whether the example of vector spaces having different
dimensions over the reals and complexes (p.47, Examples 11 and 12) can
be carried further, giving vector spaces with more than two different
dimensions over different fields.
Yes. But this involves concepts that are introduced in Math 113, so
I shouldn't spend a nontrivial amount of time on them in 110, even H110.
Such examples are all based on having one field contained in
another; e.g., the field of complex numbers contains the field
of real numbers, and is two-dimensional over it. There are no easy
examples other than that one where one of the fields is the real or
complex numbers; but starting instead with the rational numbers, there
are lots of examples. E.g., the field Q of rationals is contained
in the field F_1 of numbers a + b sqrt 2 where a and b are
rationals; and this in turn is contained in the field F_2 of numbers
of the form p + q 4th-root-of-2, where p and q are in F_1.
Then F_1 is 2-dimensional over Q, and F_2 is 2-dimensional
over F_1, hence 4-dimensional over Q. An n-dimensional vector
space over F_2 will be 2n-dimensional over F_1, and 4n-dimensional
over Q.
The factors that have appeared in all these examples have been powers
of 2, but that was just because we chose the simplest cases. The
set of expressions of the form a + b cube-root 2 + c (cube-root 2)^2
forms a field F 3-dimensional over Q, so we get factors of 3 when
comparing dimensions of vector spaces over F and over Q.
----------------------------------------------------------------------
You are right that when we extend a linearly independent set to a basis
(p.48), the vectors added can be chosen "randomly" as long as each one
is not a linear combination of those that came before. Thus, at each
stage, we can use "almost any" vector.
But that "almost" (i.e., the linear independence condition) makes a
big difference, and mustn't be forgotten!
----------------------------------------------------------------------
You ask about the diagram on p.49, and whether it indicates that bases
of V are subsets of general linearly independent sets.
No -- it indicates that the set of all bases of V is a subset of
the set of all linearly independent subsets of V. (For instance,
if V = R^2, then the set of all linearly independent subsets of V
includes such subsets as {(1,0), (0,1)}, {(1,0), (1,1)}, {(1,0)},
{(0,1)}, {(1,1)}, and the empty set. But of these, the set of all
bases includes only {(1,0), (0,1)}, {(1,0), (1,1)}.)
I know that thinking about sets of sets is confusing at first. I hope
this helps.
----------------------------------------------------------------------
You ask about the meaning of "interpolation" (p.51, bottom).
If we have some points on graph paper, and we find a curve that
passes through them, that is called "interpolating", i.e., "supplying
values in between".
----------------------------------------------------------------------
You ask about the product sign, with "k = 1" and "k \not-= i"
written below it and "n" written above it, in the definition of
the Lagrange polynomial f_i(x), p.51.
It means that the product is taken over all values of k from
k = 1 to k = n, except for the value k = i.
A clearer way of writing the same thing would be to put, under
the product sign, the two conditions
1 _< k _< n
k not-= i
and nothing above it.
----------------------------------------------------------------------
You ask about the first display on p.52, concerning Lagrange
polynomials, which says that f_i(c_j) = 1 or 0 depending on
whether i=j.
The idea behind it is simply to look at all n^2 combinations
of applying the n polynomials f_1,...,f_n to the n points
c_1,...,c_n. So the authors could have said "For each i and
each j in {1,...,n}, we can evaluate f_i(c_j)", and given
the formula they did. Instead, however, they focus on how each
function f_i varies with c_j, so rather than talking about i
and j varying independently, they fix i, and let j vary.
But it comes to the same thing. (Fixing f_i and varying its
input makes sense, since they've just defined f_i as a polynomial.)
As to why it is so, look at the formula for f_i on the
preceding page. The factors of the numerator are precisely
the x-c_j with j not-= i (although the authors write k
rather than j in that formula); so the roots are precisely
the c_j with j not-= i.
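If you want to see this numerically, here is a sketch of my own in
Python (with 0-based indexing, and made-up sample points):

   def f(i, x, c):
       """Evaluate the i-th Lagrange polynomial at x, for the points c."""
       prod = 1.0
       for k in range(len(c)):
           if k != i:                 # the product omits the factor k = i
               prod *= (x - c[k]) / (c[i] - c[k])
       return prod

   c = [2.0, 3.0, 5.0]
   for i in range(3):
       for j in range(3):
           assert f(i, c[j], c) == (1.0 if i == j else 0.0)

Each factor is exactly 1 when x = c[i], and one factor is exactly 0
when x = c[j] with j not-= i; so the assertion holds even in
floating-point arithmetic.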
----------------------------------------------------------------------
You ask how the authors get the final display on p.52.
They take the next-to-last display, which shows that g(c_j) = b_j,
and substitute into the display before that, where the b_j appear;
thus, they replace these by the g(c_j).
----------------------------------------------------------------------
You ask about versions of Lagrange interpolation (pp.51-53) for
functions of more than one variable.
Some versions of the concept exist, but the general situation is not
nearly so simple as for one variable.
In the one variable case, we know that a polynomial takes the value
0 at c if and only if it is divisible by x-c, and that a polynomial
is divisible by each of a list of such elements (x-c_0), (x-c_1), etc. (with
c_0, c_1, ... distinct) if and only if it is divisible by their product;
and this allows us to find, for any c_0, ..., c_n, unique
polynomials which are 0 at all but one of these points, and 1 at
that point. But for polynomials in two variables x and y, the
condition for being 0 at a point (c,d) is not a simple divisibility
question, and when one imposes those conditions at several points,
the nature of the resulting condition depends on how the points are
arranged.
The one really simple case is where the points are arranged in a
"rectangle", namely where one is given constants c_0,...,c_m
and d_0, ..., d_n, and wants to specify the values of the polynomial
at the (m+1)(n+1) points (c_i,d_j). Then by considering the
products of m factors (x-c_i) and n factors (y-d_j), one can
get unique polynomials of degree _< m in x and _< n in y with
arbitrary values at those (m+1)(n+1) points.
----------------------------------------------------------------------
You ask, in the proof on p.52 of the fact that the Lagrange polynomials
form a basis of P_n(F), how the a_j are defined.
Since the authors have set out to prove that the polynomials are a
basis, you, the reader, should anticipate what must be done in the proof
by recalling what the statement to be proved means. It means that
those polynomials are linearly independent, and span the space. Linear
independence, in turn, means that if any linear combination of these
polynomials is zero, then the coefficients in the linear expression
are all zero.
So when the authors write "Suppose that \Sigma_{i=0}^n a_i f_i = 0
for some scalars a_0,a_1,...,a_n", you should say to yourself, "Aha,
they are assuming some linear combination of the polynomials is zero,
and they will prove that the coefficients must be 0, so as to
establish linear independence."
I hope this answers that part of your question: They are not "defined",
they are assumed to be any family of scalars occurring in a linear
relation among the polynomials.
You then ask how this proves that the polynomials form a basis.
This is explained in the multi-line block of text in the middle
of the page. The authors note that what they have proved shows that
beta (the set of polynomials named) is linearly independent, and they
then call on Corollary 2 to conclude from linear independence that it
forms a basis. Did you look at what Corollary 2 says, and check whether
one of the statements of the corollary is applicable in this case,
and proves that beta is a basis? If you tried, and had difficulty,
your question should have pointed to what this difficulty was -- what
part(s) of the corollary you decided might be applicable, and what
difficulty you had applying it.
----------------------------------------------------------------------
Regarding the statement of the Maximal Principle on p.59, you
ask, if script-F has a member that contains all other members,
isn't that the maximal member?
Yes. But in many cases of importance, script-F does not have a
member that contains all other members, but nevertheless has maximal
members. That is the situation when script-F is the set of linearly
independent subsets of a nonzero vector space V: No one basis contains
all others, but there are lots of bases, i.e., maximal linearly
independent subsets.
(You wrote "the" maximal member in your question. In the everyday
use of "maximal", there is usually only one, so that "the" is
appropriate; but in the mathematical use, there can be more than
one, so one generally speaks of "a" maximal element.)
----------------------------------------------------------------------
You ask whether the word "family" simply means a set, or something more.
Both!
In section 1.7, the word is used as a synonym for "set". E.g., in
the statement of the Maximal Principle, p.59, "a family of sets"
just means "a set of sets". I suppose the authors say "family" to
avoid repetition of the word "set"; though in mathematical writing,
repetition is not considered bad if the same concept does indeed occur
more than once, as it does here.
But "family" is also used for an _indexed_ collection of elements;
for instance, the 3-tuple (10,100,10) is indexed by the set {1,2,3},
since it has a first component (corresponding to the index 1), namely
10, a second component (corresponding to the index 2), namely 100, and
a third component (corresponding to the index 3), again 10. It is
clearly distinct from the set {10,100,10} = {10,100}.
----------------------------------------------------------------------
You ask about the fact that Theorem 1.12 on p.60 _assumes_ that V
has a generating set, respectively a linearly independent subset, S.
First, you should be clear that this doesn't make the theorem
invalid; the theorem merely asserts that _if_ V has a generating
set S, then S contains a basis of V. If you are clear about
that point, then you are correct in saying that to conclude from
the theorem that _every_ vector space V has a basis, we need to know
that V has a generating set.
But in fact, we do: V is a generating set for itself.
The background in terms of which to think of the theorem is the
following: Suppose one has first figured out how to prove from
the Maximal Principle that every vector space V has a basis --
namely, one applies that principle to the set script-F of all
linearly independent subsets of V. Then, in looking at specific
cases, one realizes that one would like to have bases consisting
of elements with certain properties. One sees that one can make
the same proof work if the set of all elements with those properties
generates V. So one proves Theorem 1.12, which is a generalization
of one's first theorem: that first theorem is the case where S = V,
but by taking other sets S which one may know, in particular cases,
generate one's vector space, one can get bases with various properties
that the original theorem didn't guarantee.
The situation concerning Theorem 1.13, which you also ask about, is
similar: One knows that the empty set is a linearly independent subset
of V, and the statement of the theorem for that set is equivalent to
just saying that V has a basis.
----------------------------------------------------------------------
You note that on p.65, the book provides more "tests to verify
whether a function is a linear transformation", and you ask "Do we
have to prove all four requirements or will just proving one suffice
in showing T is a linear transformation?"
Neither.
Conditions 1 and 3 say "If T is linear then ...". This means that
if you know T is linear, you know that the condition shown will
be satisfied. But knowing that the condition is satisfied will not
show T is linear; i.e., they are of no use in showing T is a linear
transformation! (They can be useful in establishing that some function
T is _not_ a linear transformation: If the conclusion of one of them
does not hold, then T cannot be linear.)
The most straightforward way to show that a transformation is linear
is to show that both conditions (a) and (b) of the _definition_ (same
page) are satisfied.
You can also use condition 2 or 4 of the list you refer to. You can
tell this from the fact that they begin "T is linear if and only if".
Condition 2 is a tiny bit shorter than the definition, and the book
prefers to use it. Condition 4 is more complicated than the definition,
so it is not particularly convenient in proving linearity.
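To make conditions (a) and (b) concrete, here is a numerical
spot-check of my own in Python, with made-up sample vectors. (Such a
check is evidence of linearity, not a proof; but a single failure does
prove non-linearity.)

   T = lambda v: (2*v[0] + v[1], -v[1])       # a linear map R^2 -> R^2
   U = lambda v: (v[0] + 1, v[1])             # a translation: not linear

   def add(x, y): return (x[0] + y[0], x[1] + y[1])
   def scale(c, x): return (c*x[0], c*x[1])

   x, y, c = (1.0, 2.0), (-3.0, 0.5), 4.0
   assert T(add(x, y)) == add(T(x), T(y))     # condition (a): additivity
   assert T(scale(c, x)) == scale(c, T(x))    # condition (b): homogeneity
   assert U(add(x, y)) != add(U(x), U(y))     # U fails (a): not linear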
----------------------------------------------------------------------
You ask how the authors get the next-to-last line in the computation
of Example 10, p.69.
The three matrices in that line are the elements T(1), T(x), T(x^2)
of the preceding line; on this line, the authors simply show the values
of those elements. As an example of how those values are computed, let
me show how the upper left-hand entry, -3, of T(x^2) is found.
The definition of T(f(x)) in that example says that its upper
left-hand entry is f(1) - f(2). In evaluating T(x^2), we have
f(x) = x^2. Hence the upper left-hand entry will be 1^2 - 2^2 =
1 - 4 = -3.
----------------------------------------------------------------------
You ask about the usage "f(x)" in Example 10, p.69, and whether it
wouldn't be more appropriate to write "T(f) = ..." rather than
"T(f(x)) = ...".
It would, if f simply denoted a function; but a polynomial is not
quite the same thing as a polynomial function. This can be seen in
the field Z_2, where x and x^2 determine the same polynomial
function (the identity function of Z_2), but are distinct polynomials.
(I refer briefly to this in the parenthetical third paragraph on p. 8
of the handout on Sets, Logic, etc. As I say there, one goes into it
in detail in Math 113; but I do plan to say a few words about it in
this course.) So "f(x)" does not mean "the value of the function f
at the argument x", but "a certain polynomial constructed from the
symbol (indeterminate) x"; and something like "f(1)" does not mean
the value of f(x) when x=1, but the value obtained by _replacing_
the indeterminate x by the field-element 1.
Even under this interpretation of polynomials, writers have the
option of denoting a polynomial by a single letter such as f.
But f(x) is another option, and it does not involve the difficulty
you suggested, of logically meaning a certain number.
Note also that with polynomials interpreted as above, one can
refer to T(x), T(x^2), etc., while an interpretation as functions
would require setting up symbols for the function x|->x,
the function x|->x^2, etc. before one could write down the
value of T at these functions.
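A quick check of the Z_2 example, representing Z_2 as {0, 1} with
arithmetic mod 2 (my illustration, not the book's):

   for t in (0, 1):
       assert t % 2 == (t * t) % 2   # x and x^2 agree at every point of Z_2
   # yet as polynomials they are distinct: coefficient lists [0,1] vs [0,0,1]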
----------------------------------------------------------------------
You ask about the step in the proof of the Dimension Theorem (p.70)
that gives the last display: "Hence there exist c_1,...,c_k\in F
such that [display]".
Well, in the preceding part of the proof, it was shown that the
linear combination of v_{k+1},...,v_n with coefficients
b_{k+1},...,b_n was in N(T). (That is the preceding display.)
But from the beginning of the proof, we also know that v_1,...,v_k
form a basis of N(T). Hence the element that was shown to be
in N(T) must be a linear combination of v_1,...,v_k. To say
it is a linear combination of these means that there are coefficients
c_1,...,c_k\in F such that our element equals c_1 v_1 + ... + c_k v_k.
----------------------------------------------------------------------
You ask about the statement at the end of the proof of the Dimension
Theorem, p.70, that "this argument also shows that T(v_k+1), T(v_k+2),
... T(v_n) are distinct".
The argument has shown that any relation of the sort displayed in
the middle of the page (first display with a "Sigma") implies that
all the coefficients b_i are zero. Now if we had a relation
T(v_p) = T(v_q), for distinct indices p,q\in {k+1,...,n}, then
taking b_p = 1, b_q = -1, and all other b_i = 0, we would get a
relation of that sort with not all coefficients zero, a contradiction.
----------------------------------------------------------------------
You ask about the fact that a linear transformation is one-to-one if
and only if it sends no nonzero element to zero (Theorem 2.4, p.71).
As you say, this does not correspond to a property of one-to-one
functions in general; it is a consequence of the facts that (a) linear
transformations respect the vector space operations, and (b) elements
of vector spaces have additive inverses. Hence if T(x) = T(y) in a
vector space, then by adding the additive inverse of T(y) to both
sides, we get T(x) - T(y) = 0, and since T respects the vector
space structure, this gives T(x-y) = 0. If x and y are distinct,
then x-y is a nonzero element of the null space.
When you take Math 113, you'll see that the general context in which
this holds is group theory. ("A homomorphism of groups is one-to-one if
and only if it has trivial kernel".) Here is an example of a situation
to which it does not apply: Let N denote the set of natural
numbers, with the operation of addition; thus, N x N is the set
of ordered pairs of natural numbers, and we can define addition of
such pairs componentwise: (a,b) + (a',b') = (a+a',b+b'). Now let
f: N x N -> N be defined by f(a,b) = a+b. Then f is not one-to-one:
Each natural number n has n+1 preimages; e.g., 2 has the three
preimages (2,0), (1,1), (0,2); in particular, the additive identity
element, 0, has just one inverse image, (0,0); so the "kernel" of
the map is trivial.
----------------------------------------------------------------------
You ask where the authors develop the "one-to-one correspondence
between matrices and linear transformations" that they promise in
the introductory paragraph on p.79.
Well, that paragraph is a simplification. What we actually see is that
_for_each_choice_of_an_ordered_basis_for_V_and_an_ordered_basis_for_W_
they give us a one-to-one correspondence between linear transformations
V -> W and m x n matrices (where n = dim V and m = dim W).
So in fact, they give us more than they promise: not just one
one-to-one correspondence, but correspondences tailor-made for the
situation. This is done in the Definition near the bottom of p. 80.
----------------------------------------------------------------------
Regarding ordered bases p.79, you ask, "... is ... the point that the
order remains fixed throughout whatever operations are performed on
that basis? ... Theorem 2.6 (p. 72) seems to imply the latter ..."
As I will note in class, the "order" is really an indexing of the
set by integers 1,...,n. I'm not sure what you mean by saying that
it "remains fixed" under some operations. In Theorem 2.6, the fact
that v_1,...,v_n are mapped to w_1,...,w_n respectively does
not mean that the transformation "fixes the order"; it simply means
that, having labeled our basis of V in a certain way with the
integers 1,...,n, we use these same integers to keep track of the
elements that v_1,...,v_n are to be mapped to. In other words, it
is not a property of the elements, but of the way we are keeping
track of them.
----------------------------------------------------------------------
You ask why ordered bases (p.79) are not introduced for
infinite-dimensional vector spaces.
This is done (though not in our text); but the subject gets a lot
more complicated. As I noted in class, there are many different
ways a countable set can be ordered; so even looking at the three
simplest, one could represent vectors by columns that start at a top
and go downward in infinitely many steps, which would thus be indexed
by the positive integers (since one writes a column with the
lowest-indexed term at the top); or by columns that start at a bottom
and go upward in infinitely many steps, indexed by the negative
integers; or that go both upward and downward in steps, indexed by all
the integers. Going to a more complicated ordering, one could have
columns indexed by the rational numbers ... . And there would be
still different sets of choices if one's vector space were
uncountable-dimensional.
Moreover, under such a representation, not every possible column
corresponds to a vector; only the columns in which all but finitely
many of the entries are zero. On the other hand, in constructing a
linear transformation, one specifies its value arbitrarily at all
members of a basis of the domain space; there's no requirement that
all but finitely many go to zero. Hence a linear transformation
between infinite-dimensional vector spaces is represented by a matrix
in which each column has only finitely many nonzero entries, but there
is no such restriction on rows. (This is called a "column-finite"
matrix.)
The main subject of our text is finite-dimensional vector spaces.
The authors are to be commended for defining and proving things for
arbitrary vector spaces when this doesn't lead to excessive
complications; but it's understandable that when it would, as in the
case of ordered bases, they limit themselves to the case that the
book revolves around.
----------------------------------------------------------------------
You ask whether there is a use for the vector space L(V,W)
of linear transformations from V to W (p.82).
Well, we certainly make use of linear transformations! And
we sometimes add two transformations V -> W, or multiply
such a transformation by a scalar. When we do so, we are
treating the set of linear transformations as a vector space.
So it makes sense to recognize that this is what we are doing.
That way, we can call on general results about vector spaces
when we study linear transformations.
----------------------------------------------------------------------
You ask about the convention of writing functions to the left of
their arguments, and therefore composing them so that UT(x) =
U(T(x)) (implicitly assumed on p.86).
I mentioned a sort of exception in class: while m x n matrices
act on the left on columns of height n (giving as outputs columns
of height m), they also act on the right on rows of length m (giving
as outputs rows of length n). When we discuss section 2.6 (dual
spaces) I will, hopefully, be able to discuss the relationship between
these actions.
I go into detail about these things in
/~gbergman/grad.hndts/left+right.ps
(Do you have a browser tool and/or printer that can handle PostScript?)
It is aimed at grad students, and assumes a little familiarity with
module theory, which is basically linear algebra over rings that are
not necessarily fields. If you have a little such background, you
might find it interesting.
----------------------------------------------------------------------
You ask about the Kronecker delta (p.89).
It is not limited to use in linear algebra. It is simply a
general symbol for the instruction "Output 1 if the two
subscripts are equal, output 0 if they are unequal". This is
handy to have available when the need arises in any branch of
mathematics. Not enormously important; just a handy symbolic
device.
----------------------------------------------------------------------
You ask what is meant at the beginning of the proof of Theorem 2.16,
p.93, where it says "It is left to the reader to show that (AB)C is
defined."
Look at the definition of matrix multiplication on p. 87, and then
at the paragraph that follows, at the top of p. 88. The definition
says what "AB" means only when A is m x n and B is n x p.
Since for an m x n matrix, "n" is the number of columns, and for
an n x p matrix, "n" is the number of rows, the product AB is
defined only when the number of columns of A equals the number
of rows of B. (So, for instance, the product of a 3 x 7 matrix
and a 7 x 2 matrix is defined, but the product of a 3 x 7 matrix
and a 6 x 2 matrix is not.) The comment at the top of p. 88 expands
on this, and describes the number of rows and the number of columns of
the product AB when it is defined.
Now in Theorem 2.16, the hypothesis (i.e., the assumption we start with)
is that A(BC) is defined. This means that the product of B and
C is defined, and that the product of A with this product matrix is
defined. As discussed above, this corresponds to certain equalities
among the numbers of rows and numbers of columns of A, B and C.
The first assertion of the theorem, which the authors leave to the
reader to show, is that these equalities also guarantee that (AB)C
is defined. Using the definition and the comment on p. 88, see whether
you can verify that this is so.
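Once you have done so, you might check your bookkeeping against this
little Python sketch of mine, which tracks sizes only, with made-up
compatible sizes:

   def prod_shape(p, q):
       """Shape of a matrix product, or None if it is not defined."""
       (m, n), (n2, p2) = p, q
       return (m, p2) if n == n2 else None    # cols of left = rows of right

   A, B, C = (2, 3), (3, 4), (4, 5)           # hypothetical sizes
   BC = prod_shape(B, C)
   assert prod_shape(A, BC) is not None       # A(BC) is defined, and then
   AB = prod_shape(A, B)
   assert prod_shape(AB, C) is not None       # (AB)C is defined as well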
----------------------------------------------------------------------
Concerning the "dominance relations" discussed on p.95, you ask
about the exclusion of the diagonal entry in the statement that
A + A^2 has a row in which all entries except the diagonal are
positive.
Mathematicians use "positive" to mean "> 0". To say ">_ 0" one
uses "nonnegative". Every entry of A^2 + A is, of course,
>_ 0, but they are talking about which are actually > 0.
You also ask about their use of the word "stages". They are
using "x dominates y in two stages" to mean "x dominates someone
who dominates y". They are assuming that the reader who has thought
about how A^2 is computed will see that it counts relationships
of this form, and hence will recognize "dominates in two stages"
as natural ad hoc wording to express this.
----------------------------------------------------------------------
You ask, in connection with the concept of invertible matrices, p.100,
whether an n x n matrix of rank n must be invertible.
Yes. I was surprised not to be able to find an explicit statement
of this in the text, but essentially it is the implications
(c) => {(a) and (b)} of Theorem 2.5, p. 71, applied to L_A.
----------------------------------------------------------------------
You ask why the statement of the lemma on p.101 says first that
V is finite-dimensional if and only if W is finite-dimensional,
and then says the dimensions are equal, instead of letting the latter
statement cover both facts.
It is because we have not defined the concept of dimension for an
infinite-dimensional vector space. To do so requires the concept of
the cardinality of an infinite set, developed in Math 135 (and also
touched on in Math 104), but not assumed in other undergraduate courses.
We could, I suppose, call the dimension "infinity" if the space is
infinite-dimensional, but that would sweep under the rug the question
of existence of bases, whether the bases have the same "number" of
elements, etc.; and it would also tend to mislead students into thinking
they could argue about "the dimension" without considering separately
the finite case, where it is defined in a precise way, and the infinite
case, where it is described by that catch-all. So I think the authors
are right, in this treatment that does not discuss cardinalities,
to define the dimension only for finite-dimensional spaces. The result
is that one can only write dim(V) = dim(W) after assuming or asserting
that both spaces are finite dimensional.
----------------------------------------------------------------------
You ask how we know, in proving Theorem 2.17, p.102, that T is onto
and one-to-one.
This is the fact mentioned near the top of the page, between numbered
statements "2" and "3": A function (between any two sets) is
invertible if and only if it is one-to-one and onto. This is a
standard fact about sets and maps. See whether you can prove it; if
not, I can supply details. (But in that case, tell me what part you
have succeeded in proving, so that I don't repeat this.)
----------------------------------------------------------------------
Regarding the Corollary to Thm. 2.20 on p.104, you ask for an
explanation of the proof in greater detail than the book provides.
For learning the material in the course, it is essential that when
you come across an argument where the authors leave some details to
be filled in by the reader, you do your best to fill them in yourself.
Look at the statement of the Theorem; look at the statement of the
Corollary; see what features of the two can be matched with each other,
and what one would need to know to get the remaining features of the
Corollary from those of the Theorem. Then, if you can't see how this
can be done, you should be able to pose a very precise question, "Under
the conditions of the corollary, the theorem tells us such and such
[and sometimes, as in this case: "the author's sketch of the proof adds
such and such a fact"]; but how does this imply so-and-so?"
Hopefully, if you approach things in this way -- actively rather than
passively -- you will be able to resolve such questions yourself in
most cases. (After all, the authors have considered this something
the student should be able to fill in, and this text is not aimed at
an honors course but a regular course.) In some cases, they may
misjudge the difficulty of filling in the details, or you may be
thinking about something the wrong way, and you can't fill in the
details; but then at least you should be able to make clear in your
question how far you have gotten, and what you see is still needed.
----------------------------------------------------------------------
Regarding the concept of dual space (p.119), you ask what field V*
is defined over.
You certainly need to know that to follow this section! Since
V* is defined as L(V,F), a special instance of the class of
vector spaces L(V,W), this depends on understanding what
field the authors make L(V,W) a vector space over. If
you aren't sure of that, you should check back to the definition
of L(V,W). (The field isn't named in the definition of L(V,W),
but it is stated quite explicitly in the preceding sentence.
You can find the definition quickly using the List of Symbols
on the inside cover.)
----------------------------------------------------------------------
In connection with the material of section 2.7 (pp.127-140) you ask
whether we will ever study vector spaces over fields other than R
and C.
In this section, the fields in question are the real and complex
numbers (mostly the latter); but elsewhere, whenever we prove a
result about "a vector space V", this is true whatever field V may
be a vector space over -- the reals, the complex numbers, the
rational numbers, Z_2, etc.! So in everything we are doing, we
are simultaneously studying vector spaces over all fields.
We don't give many examples specific to fields other than R and
C because the study of such fields is part of Math 113, which is
not a prerequisite to this course. This is a result of the fact
that our courses are aimed at many audiences, including students
from some departments that want their students to take linear algebra
but not abstract algebra. However, when you do take 113, you will
see several ways of constructing fields, and the general results
we get in H110 will apply to vector spaces over them.
----------------------------------------------------------------------
You ask whether one can represent differential operators (p.129) by
matrices.
To do this one has to have a finite-dimensional vector space of
functions, of which one knows a basis. If, say, we consider a
finite-dimensional space of polynomials of low degree, the problem
is that the solution to our differential equation will not in general
be a polynomial.
However, there are situations where one might usefully do this.
One of the approaches to certain sorts of nonhomogeneous differential
equations is called the "method of undetermined coefficients".
E.g., in solving (D - I) f(t) = sin t, we know the solution of
the homogeneous equation (D - I) f(t) = 0, so it suffices to find
a single solution of the nonhomogeneous equation, and we guess that
this will be of the form a sin t + b cos t, and write down the
differential equation and solve for the coefficients a and b. This
can be translated into a calculation that represents the operation of
D - I on the space spanned by sin t and cos t by a matrix, and
solves the corresponding matrix equation.
Why can we trust that there is a solution of the form a sin t +
b cos t ? Because the space spanned by sin t and cos t is closed
under D and I, and since it contains no nonzero solutions of
(D - I)f(t) = 0, the operator D - I will be one-to-one on it, hence onto.
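Here is a sketch of my own of that translation, using Python's numpy
(assuming you have it available). With respect to the ordered basis
(sin t, cos t) we have D(sin t) = cos t and D(cos t) = -sin t, so

       D = /0 -1\         D - I = /-1 -1\
           \1  0/ ,               \ 1 -1/ .

   import numpy as np
   M = np.array([[-1.0, -1.0],
                 [ 1.0, -1.0]])        # the matrix of D - I in this basis
   rhs = np.array([1.0, 0.0])          # the coordinates of sin t
   a, b = np.linalg.solve(M, rhs)      # gives a = b = -1/2

So f(t) = -(sin t + cos t)/2 is one solution of (D - I) f(t) = sin t,
as you can check by differentiating.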
----------------------------------------------------------------------
You ask why the identity operator appears in expressions for
differential operators (p.131).
Well, if one has a differential equation of the form
a D^2 y + b D y + c y = 0,
then the first term corresponds to applying what operator to
y ? Obviously, a times the operator D^2. And the second term
is similarly gotten by applying b times the operator D. The
final term is gotten by multiplying y by c; if c were 1
this operation would just be the identity operator I; for general
c, it is c I.
In particular, if we think of our differential operator as p(D)
where (in the above example) p is the polynomial a t^2 + b t + c,
then we can think of the c as c t^0, i.e., c times the
multiplicative identity element of the polynomial ring, and when
we substitute D for t, we use c times the multiplicative
identity element of the ring of linear operators, which is I.
----------------------------------------------------------------------
Regarding the exponential function, introduced on p.132, you ask which
definition of it is more common, the one in terms of an infinite series
or the one as a solution to the differential equation y' = y; and
which came first.
I think the infinite sum is the commonest way to define the exponential
function. I just feel that the differential equation is the phenomenon
that makes that infinite sum important. Historically, I think that
what came first was lim_{n->infinity} (1 + x/n)^n. This is "compound
interest compounded at shorter and shorter intervals", and the limit,
where one's account grows continuously at a rate proportional to its
current value, is equivalent to the solution to the differential
equation.
----------------------------------------------------------------------
In connection with the definition of e^{a+bi} on p.132, you ask
whether there are definitions of what it means to raise an arbitrary
number to a complex exponent.
Yes. The definition used is
a^z = e^{z ln a}.
The complication is that in the complex plane, "ln" is multi-valued.
The function z |-> e^z has 2 pi i as a period, so for every nonzero
complex number a, there are infinitely many complex numbers w that
satisfy e^w = a, differing by integer multiples of 2 pi i. If a
is a positive real number, only one of these choices is real,
and putting that into the above formula we get an interpretation of
a^z that extends the real-variables interpretation of a^x. But for
other values of a, there is no such choice available, so one has
infinitely many candidates for the function "a^z", each of which is as
good as the others. Despite this messiness, each of those functions,
alone, is nicely behaved; e.g., satisfies the exponent laws.
You'll see this in Math 185.
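In the meantime, if you want to see the multi-valuedness concretely,
here is a sketch of mine using Python's cmath, computing branches of
(-1)^{1/2}:

   import cmath
   a, z = -1, 0.5
   for k in (-1, 0, 1):
       w = cmath.log(a) + 2j * cmath.pi * k   # one of the values of ln a
       print(cmath.exp(z * w))                # that branch's value of a^z

The values alternate between i and -i, the two square roots of -1.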
----------------------------------------------------------------------
You ask whether there is a way to prove Theorem 2.33 (p.137) other
than the inductive proof suggested in Exercise 10.
Well, one can rearrange the proof so that the induction is
not so obvious. If we have a nontrivial linear relation among
these functions, choose a k such that e^{c_k t} occurs with
nonzero coefficient, and apply to the relation the operator
(D - c_1 I) ... (D - c_n I) with only the factor D - c_k I left
out. One finds that this annihilates all the terms of the
linear relation except the e^{c_k t} term, and does not annihilate
that, giving a relation C e^{c_k t} = 0 for some nonzero constant C,
a contradiction.
But induction is implicit in figuring out how the product of operators
acts on the linear relation, and it is somewhat messy to write "the
operator (D - c_1 I) ... (D - c_n I) with only the factor D - c_k I
left out".
----------------------------------------------------------------------
You ask how the authors got the second display on p.139,
(D-cI)^n(t^ke^(ct)) = 0.
Well, it's preceded by the word "Hence", which means they are
asserting that it is a consequence of the preceding display.
Did you see that? Did you try applying the preceding display in
evaluating (D-cI)^n(t^ke^(ct)) ?
As I say in the class handout, if you ask a question like this
in my office hours, I can probe and see how far you have gotten,
and where your difficulty was. But that's much more difficult
with e-mail, so when you submit your question, you should say as
clearly as possible how far you got -- what obvious steps you
applied, what you concluded, and where you got stuck. Then I
can see what is tripping you up, and help you past it. Please
let me know how far you got with respect to this question.
----------------------------------------------------------------------
You ask whether there are other vector spaces for which analogs of
the results of section 6.7 (pp.127-140) apply.
Well, on the one hand, this section is given as an application of
linear algebra to a particular problem; the general linear algebra
that is being used here, about the relation between bases, dimension,
and linear transformations, applies to all vector spaces. (In
particular, this is true of the one new result in linear algebra
proved in this section, Lemma 2 on p. 135.)
The specific results proved in this section also have various
generalizations. For instance, if T is any linear operator
on any vector space V, and p(t) is a polynomial which one can
factor p(t) = p_1 (t) p_2 (t), then one has p(T) = p_1 (T) p_2 (T)
and p(T) = p_2 (T) p_1 (T); so the null space of p(T) will contain
the null spaces of p_1(T) and p_2(T). If the sum of these
spaces doesn't happen to equal N(p(T)), we can still study this by
starting with, say, N(p_2(T)), finding a basis, finding a family of
elements of V that p_1(T) maps to the elements of that basis,
and combining these with the elements of a basis of N(p_1(T)),
as we have done in this section.
Finally, the result that a solution to (D - cI)(y) = 0 is
given by y = e^{ct} has a generalization. If instead of
looking at functions f: R -> C, one looks at functions
f: R -> V for any finite-dimensional real vector space V,
then one can take a linear transformation U: V -> V, and
look for functions f: R -> V that are differentiable as
functions of t, and satisfy D f(t) = U f(t); i.e., are in
the null space of D - U. We then find that solutions can
be written f(t) = e^{U t} f(0), where e^U denotes
the linear operator I + U + U^2 /2 + U^3 /6 + ... ; which
can be shown to be a convergent series, though the details
of what this means are outside the scope of a course like
this. (One way of looking at this is that if A is the
matrix of U with respect to some basis, then e^U is the
transformation with matrix I + A + A^2/2 + ... , where the
sum of this series of matrices is evaluated entry-wise.)
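Here is a sketch of mine of the truncated series in Python with numpy
(for serious numerical work one would use a library routine such as
scipy.linalg.expm):

   import numpy as np
   def expm_series(A, terms=30):
       """Approximate e^A by the truncated series I + A + A^2/2! + ..."""
       E = np.eye(len(A))
       term = np.eye(len(A))
       for k in range(1, terms):
           term = term @ A / k         # now term is A^k / k!
           E = E + term
       return E

   U = np.array([[0.0, -1.0],          # with this U, e^{Ut} is rotation
                 [1.0,  0.0]])         # of the plane by the angle t
   f0 = np.array([1.0, 0.0])
   print(expm_series(U * (np.pi/2)) @ f0)   # approximately (0, 1)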
> Linear algebra is often considered a method for proving theorems
> about lots of "objects" that behave similarly. ...
One can say this about algebra generally; or still more generally,
about mathematics.
----------------------------------------------------------------------
Regarding the proof of Theorem 3.4, p.153, you write
> ... I don't quite understand the switch from L_A(F^n) = R(L_A) ...
Tell me what you think each side of that equation means, so that
I can see what misunderstanding is keeping you from seeing that they
are the same.
----------------------------------------------------------------------
You ask why the reduced row echelon form of a matrix is unique, as stated
on p.191, Corollary to Theorem 3.16.
Looking at the theorem, I see that one more condition should be added
to it to make the corollary really follow from it: namely, that
if more than one column is equal to e_i, then the column
referred to as b_j_i is the _first_ of those. Note, then, as a
consequence of the meaning of "reduced row echelon form", that the
column b_j_i will be the first column of B having a nonzero entry
in the i-th row.
It then follows that the columns b_j_i will be precisely those columns
of B that are not linear combinations of those columns that precede
them. Using the theorem, we can conclude from this that a matrix
B is the reduced row echelon form of A if and only if it can be
constructed from A as follows: Let the columns of A that are
_not_ linear combinations of the columns to the left of them be the
j_1-st, the j_2-nd, ..., through the j_r-th. Then these form a basis
of the column space of A. Thus, for every j, the j-th column of A
can be represented uniquely as a linear combination d_1 a_j_1 + ...
+ d_r a_j_r. When this is so, then the j-th column of B
is d_1 e_1 + ... + d_r e_r. This description uniquely specifies B.
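If you want to experiment, the sympy package for Python computes both
the reduced row echelon form and the pivot columns j_1, ..., j_r (a
sketch of mine, with a made-up matrix, and 0-indexed columns):

   from sympy import Matrix
   A = Matrix([[1, 2, 1, 3],
               [2, 4, 0, 2],
               [3, 6, 1, 5]])
   B, pivots = A.rref()
   print(pivots)   # (0, 2): the columns of A that are not linear
                   # combinations of the columns preceding them
   print(B)        # the last column of B is 1 e_1 + 2 e_2, matching the
                   # representation of A's last column as a_0 + 2 a_2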
----------------------------------------------------------------------
You ask about how one proves that a matrix is invertible if and only
if its determinant is nonzero (p.236).
As you indicate, this can be done using the fact that det(AB) =
det(A) det(B). So that is really the "deep" fact about matrices!
The authors give a short proof of that fact in Theorem 4.7 on p. 223
(not assigned). It's based on the fact that one can figure out how
a matrix is affected by applying elementary row operations; those
results, in turn, are proved in Theorems 4.3, 4.5 and 4.6.
Too bad we are supposed to skip all that material in this course!
----------------------------------------------------------------------
You ask whether there is a 0 x 0 matrix, and if so, what its
determinant (pp.232-242) is.
Yes, there is; it has no entries; it represents the unique linear
operator on the 0-dimensional vector space.
Its determinant is 1. This is the value that makes various formulas
relating determinants of matrices of different sizes work. For
instance, the cofactor expansion of a 1 x 1 matrix with entry a says
that its determinant equals a times the determinant of the
0 x 0 matrix, which comes out right if we take the determinant of the
0 x 0 matrix to be 1. And in the result that the determinant of a matrix
with block decomposition
                       /A B\
                       \0 C/ ,
with A k x k, B k x (n-k), and C (n-k) x (n-k), is det(A) det(C),
if we take k = n, so that A is the whole matrix and C is the
0 x 0 matrix, we see that for this formula to work the determinant
of C must likewise be 1.
Finally, since there is only one 0 x 0 matrix, it is, in particular,
the identity 0 x 0 matrix, so as an identity matrix, it must have
determinant 1.
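Here is a little Python sketch of mine making the cofactor-expansion
point concrete: with the determinant of the 0 x 0 matrix taken to be 1
as the base case, the expansion gives the right answer in all sizes.

   def minor(A, i, j):
       return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

   def det(A):
       if len(A) == 0:
           return 1             # the determinant of the 0 x 0 matrix
       return sum((-1)**j * A[0][j] * det(minor(A, 0, j))
                  for j in range(len(A)))

   assert det([]) == 1
   assert det([[7]]) == 7       # = 7 * det(0 x 0 matrix)
   assert det([[1, 2], [3, 4]]) == -2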
----------------------------------------------------------------------
You ask about the relation between the book's definition of
diagonalizability (p.245) and what you learned in Math 54, namely
that it's "just being able to arrange a matrix into a diagonal matrix
with its eigenvalues as the A_{ii} values."
Well, I trust that what you were taught in Math 54 was more precise
than that: What you call "arranging" a matrix A was replacing it
with a matrix Q A Q^{-1} for some invertible matrix Q. But looking
at Theorem 2.23, p. 112, you can see that if A represented a linear
transformation T with respect to one basis, then Q A Q^{-1}
represents T with respect to another basis; so what you did in
Math 54 corresponds precisely to finding a basis with respect to
which T has diagonal form, as in our text.
----------------------------------------------------------------------
You ask whether eigenvectors (p.246) reveal any properties about the
vector spaces they lie in.
Well, an n-dimensional vector space over a field F does not
really have any "properties" other than the integer n.
That is, if V and W are both n-dimensional vector spaces,
then we know that there is an invertible linear transformation
between them, and linear transformations preserve the vector space
structure; so any statement about V that can be expressed in
terms of the vector space structure alone is also true of W,
and vice versa.
So for a vector space alone, there are no distinguishing properties
to be discovered other than the dimension. But if one considers
a vector-space-V-given-with-a-linear-operator-T-on-it, then
such vector-spaces-with-transformation can differ among themselves
according to the way the operator acts, and the eigenvalues do give
distinguishing properties of these entities.
----------------------------------------------------------------------
Concerning linear operators on a real vector space with only complex
eigenvalues (as in my discussion in class of cases like Example 2
on p.247), you ask whether we should view these as "a turning sort
of dilation".
One could. Consider, for simplicity, the operator T determined by
a real n x n matrix A whose eigenvalues in C are nonreal complex
numbers of absolute value 1.
Each eigenvector x\in C^n is mapped by T to a complex scalar
multiple of itself; if we picture the 1-dimensional vector space
over C that x spans as looking like the complex plane, the
action of T corresponds to rotation of that plane. Those eigenspaces
will contain no nonzero vectors with real coordinates, since our
real matrix sends vectors with real coordinates to vectors with
real coordinates, and so cannot act on a vector with real coordinates as
multiplication by a non-real complex number. But for each eigenvector
x as above, the vector x-bar gotten by applying conjugation to
all the coordinates of x is an eigenvector with eigenvalue conjugate
to the eigenvalue of x, and the space that x and x-bar span will
have some elements with real coordinates, namely x + x-bar and
(x - x-bar) / i. On the 2-dimensional real vector space spanned
by these two elements, the "complex rotations" that T produces
on x and on x-bar combine to give a "real rotation" of that
real space.
More generally, considering any non-real complex eigenvalue (not
necessarily of absolute value 1), the same construction gives
rotation combined with dilation.
Going back to the absolute-value-1 case, note that if we coordinatize
the space spanned by x and x-bar using a real basis different from
the one referred to above, the appearance of the "rotation" can be
distorted, making T carry points along ellipses instead of circles.
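If you want to see the construction concretely, here is a sketch in
Python/numpy (my own example, using the 2 x 2 rotation matrix, which
has the two conjugate eigenvalues e^{i theta} and e^{-i theta}):
  import numpy as np

  th = 0.7
  A = np.array([[np.cos(th), -np.sin(th)],
                [np.sin(th),  np.cos(th)]])   # real, no real eigenvalues

  w, V = np.linalg.eig(A)
  print(w)                      # e^{i th} and e^{-i th}: absolute value 1

  x = V[:, 0]                   # eigenvector for e^{i th}
  xbar = x.conj()               # eigenvector for the conjugate eigenvalue
  u = x + xbar                  # these two vectors are (numerically) real,
  v = (x - xbar) / 1j           # and A rotates the real plane they span
  print(np.max(np.abs(u.imag)), np.max(np.abs(v.imag)))   # both ~ 0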
----------------------------------------------------------------------
You ask about the meaning of "collinear" in example 2 on p.247.
"Collinear" means "lying on the same line". Of course, any two points
lie on a common line; so what the authors must mean is that v and T(v)
determine the same line through 0. Thanks for pointing this out; I'll
include it in the letter of corrections etc. that I send to the authors
at the end of the semester.
----------------------------------------------------------------------
You ask, concerning the phrase in the proof of Theorem 5.5, p.261 "By
the induction hypothesis, {v_1, v_2, ... , v_k-1} is linearly
independent," whether they are assuming this, or whether it is the
result of some calculation.
One could describe it either way! Are you familiar with the
method of mathematical induction? (It is generally presented in
freshman calculus.) In mathematical induction, one establishes
a fact for an initial value (in this proof, the value k = 1), and
one then gives an argument showing that if it is true for one value
k-1, it will also be true for the next value, k. From these two
facts, one can conclude that it is true for all k from the initial
value on.
Now in proving "if it is true for one value k-1, it will also be
true for the next value, k", one begins "Assume it is true for k-1".
This is called the "induction hypothesis"; the authors use that phrase
here. One speaks of it as an "assumption"; but the idea behind this
is that we imagine having done k-1 repetitions of the argument that
we are about to describe, and prepare to do the k-th repetition. So
in that sense, it is "the result of a calculation".
In any case, mathematical induction is a technique you need to
thoroughly understand for your upper division courses. If you have
not yet learned it, you should study up on it.
----------------------------------------------------------------------
You ask whether showing that a polynomial splits (p.262) over a field
F is equivalent to saying that the roots can be expressed in radicals.
No. When one speaks of roots being expressed in radicals, one means
radical expressions (expressions involving addition, subtraction,
multiplication, division, and n-th roots) in the coefficients of
the polynomial; and for polynomials of degree > 4, it has been proved
that this is not in general possible; but over a field such as the
complex numbers, it is still true that every polynomial splits.
----------------------------------------------------------------------
You ask about the phrase "(algebraic) multiplicity" in the definition
on p.263.
I imagine that some authors would define the multiplicity of lambda
to be the dimension of E_lambda, and would distinguish the number
the authors are using by the term the "algebraic multiplicity". So I
think that by saying "(algebraic)" they mean to convey that what they
call "multiplicity" some would call "algebraic multiplicity. But that
way of showing it is not that informative to the reader who doesn't
know the background.
----------------------------------------------------------------------
You're right that the last sentence of the Definition on p.264,
in referring to eigenspaces, should specify that they are with
respect to some eigenvalue lambda.
----------------------------------------------------------------------
You ask what is meant by the phrase "consisting of" in the sentence
following the Definition on p.264.
To say that a set consists of certain elements is to say that those
elements are all the members of the set. In other words, "X consists
of all elements x for which P(x) is true" means X = {x : P(x)}.
So it is a general phrase used in set theory, and not special to
linear algebra.
----------------------------------------------------------------------
You ask about the analog of Theorem 5.9 (p.268) for infinite-dimensional
vector spaces.
The authors have only defined diagonalizability for linear operators
on finite-dimensional vector spaces (bottom of p. 245); but one can
easily extend this definition to the infinite-dimensional case, and
say that a linear operator on any vector space is diagonalizable if
and only if the space has a basis of eigenvectors.
However, in the infinite-dimensional case, one doesn't have an analog
of the characteristic polynomial, so one can't speak about its
"splitting", and there is no close analog of statement (a) of
Theorem 5.9.
On the other hand, the analog of statement (b) is true: If T is
diagonalizable in the sense stated above, then if one takes for every
eigenvalue lambda a basis beta_lambda of E_lambda, then the
union of the beta_lambda will be a basis for V.
----------------------------------------------------------------------
You ask why, in the example at the top of p.270, they don't have to
test the eigenvalue lambda_1.
This is because of Theorem 5.7. When the algebraic multiplicity, m,
is 1, then the relation 1 _< dim(E_lambda) _< m of that theorem
forces dim(E_lambda) to be precisely m, which is equivalent to
"condition 2". (Remember that for any linear transformation,
n - rank = dimension of null space.)
----------------------------------------------------------------------
You ask, in connection with the concept of raising matrices to
powers as used in Example 7, p.272, whether it is possible to define
fractional powers of matrices.
It is, and one does, but there are complications. You know, for
instance, that any nonzero complex number has two square roots. One
can see from this that any n x n diagonal matrix over C with n
distinct nonzero eigenvalues will have at least 2^n square roots. If
there are repeated eigenvalues, one can get even more. For instance,
the identity 2 x 2 matrix, I, over the reals has infinitely many
square roots, given by reflections about the infinitely many lines
through the origin in the plane (as well as two others, I and -I).
So the answer is, yes, but you can see why we don't go into it in
this course.
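Here, if you like, is a quick numerical illustration (mine, not the
book's) of those infinitely many square roots of the 2 x 2 identity
matrix:
  import numpy as np

  def reflection(theta):
      # reflection of R^2 about the line through 0 at angle theta/2
      return np.array([[np.cos(theta),  np.sin(theta)],
                       [np.sin(theta), -np.cos(theta)]])

  I = np.eye(2)
  for theta in (0.0, 1.0, 2.5):          # any angle whatever works
      S = reflection(theta)
      print(np.allclose(S @ S, I))       # True: S is a square root of I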
----------------------------------------------------------------------
You ask why our authors give the definition of direct sum on p.275,
when Theorem 5.10 on the next page shows that four other conditions
are equivalent to it.
There are two aspects to such a choice. On the one hand, when several
conditions are equivalent, some authors feel it best to make one of
them the definition of a property, and then prove that property
equivalent to the rest, while other authors prefer to first prove
the conditions equivalent, and then give a definition saying "We will
call an object satisfying these equivalent conditions a --". Each of
these approaches has its advantages: the former gives a more "explicit"
definition; the latter a more "flexible" one. We could say that the
authors have made the former choice. Having chosen to do so, one must
decide which of the conditions to use; and the authors have chosen
the one that is simplest, in that it does not require quantifications
such as "For every vector ...", "For every basis ..." etc..
Another factor that often comes in is which form of a definition can
be generalized conveniently to wider classes of cases that one may
want to consider; and I think that was really the deciding factor here.
Notice that the definition on p. 275 is for an arbitrary vector space,
while Theorem 5.10 only concerns a finite-dimensional space. One
could adapt the theorem to infinite-dimensional spaces, if in
statements (b) and (c) one took the approach of the handout I gave
out on unique expressions and infinite sums; and in statements (d) and
(e) one used unordered bases, and added the condition that the bases
be pairwise disjoint (no two have an element in common). But given
that this text was not written for an honors course, the authors
probably didn't want to get into such details. So they gave the
general definition for arbitrary spaces, but only formulated the
other conditions for finite-dimensional spaces.
----------------------------------------------------------------------
You ask whether there is an essential difference between conditions (d)
and (e) of Theorem 5.10, p.276.
Definitely! Condition (d) is formally much stronger than (e), since (d)
says something is true for _every_ family of ordered bases, while (e)
only says that it is true for _at_least_one_ such family.
So if one wants to prove that a vector space is a direct sum of
certain subspaces, it is easier to prove (e) -- one just has to find
one system of bases with the indicated property. (E.g., we see
that F^4 is the direct sum of \{(a,b,0,0)\} and \{(0,0,c,d)\}
by letting gamma_1 and gamma_2 be the obvious subsets of the
standard basis of F^4.) On the other hand, if one is given or
has proved that a vector space is a direct sum of certain subspaces,
it is most useful to have (d), since one can apply it to any system
of bases.
----------------------------------------------------------------------
Regarding the statement of Theorem 5.10 on p.276, you ask
> ... Is there any value to constructing TFAE type statements? Or
> are TFAE statements just convenient ways of condensing what could
> be several inter-related theorems ... ?
That is a big value! Instead of having a tangle of theorems to
remember, and having to chain them together when one wanted to get
from a certain statement to a certain other statement, one has one
easy-to-remember theorem which can always be applied in one step.
> If one wanted to add a condition to the list, would it suffice to
> prove it true assuming one of the other conditions true, or
> would it be necessary to make it fit into the "chain" ...
It certainly wouldn't be enough to show that it was implied by
one member of the list. E.g., if we list some equivalent conditions
for a matrix to be invertible, and then note that invertibility
implies that the matrix is not the zero matrix, that doesn't mean
that invertibility is equivalent to not being the zero matrix!
But one doesn't have to re-work the proof of the theorem to
add one more condition. It suffices to prove that, as you say,
some condition on the existing list implies it, and _also_ that it
implies some condition on that list (possibly the same condition
as for the other implication, possibly a different one). I hope
you will think this through and see that it is so.
----------------------------------------------------------------------
You ask whether the proof of Theorem 5.10 on p.276 doesn't involve
"circular logic".
The criticism "circular logic" applies when one is trying to show
a statement X is true, and one assumes X in doing so. But here
the authors are not claiming that conditions (a)-(e) are always true!
They are asserting that they are equivalent; i.e., that if any one is
true, then so are the rest. And proving (a)=>(b)=>(c)=>(d)=>(e)=>(a)
does exactly that.
A point to think about here is what one is trying to do when one
proves implications X=>Y=>Z. In some situations, one may simply
be trying to get Z from X, and Y has no function but as a
step along the way. But in other situations, one is interested not
just in the fact that X=>Z, but that if X is true Y will also
be true, and that whenever Y is true (whether or not X is),
Z will also be true. That is the case here. So the chain of
implications (a)=>(b)=>(c)=>(d)=>(e)=>(a) is not intended as a way to
get from (a) to (a), but to prove the 20 implications between distinct
conditions, including such ones as (b)=>(d) and (d)=>(b) (the latter
going by (d)=>(e)=>(a)=>(b)).
----------------------------------------------------------------------
Your question about the last display on p.276 seems to be based on
a misreading of that formula. When you "parse" a formula, first break
it at the relation-symbols such as "\in", "=", "\subset", etc. (unless,
of course, those symbols occur inside another construction, like the
"\in" in the expression "{x | f(x) \in Y}"). Doing so with this
formula, we see that the set W_j \intersect Sigma_{i\neq j} W_i is one
major piece of the formula, and the formula says that -v_j belongs to
this set (not just to W_j !), and this set is equal to {0}. It
follows immediately that -v_j = 0, hence that v_j = 0, as stated.
----------------------------------------------------------------------
You ask, in connection with the discussion of how to take the
limit of the powers of a diagonalizable transition matrix (p.287),
whether there exist nondiagonalizable transition matrices.
There do; for instance
/ 1 1/2 0 \
| 0 1/2 1/2 |
\ 0 0 1/2 /
In the reading after next, in the middle of p. 300, the authors
state a result, Theorem 5.20, which we won't be able to prove till a
later chapter, describing lim A^m in such cases (statements (b)-(e)).
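If you want to confirm that this matrix is a non-diagonalizable
transition matrix, Python's sympy can do it (my own check, not from
the text):
  from sympy import Matrix, Rational

  half = Rational(1, 2)
  A = Matrix([[1, half,    0],
              [0, half, half],
              [0,    0, half]])

  print([sum(A.col(j)) for j in range(3)])  # [1, 1, 1]: columns sum to 1
  print(A.is_diagonalizable())              # False: the eigenspace for
                                            # 1/2 is only 1-dimensional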
----------------------------------------------------------------------
You ask whether, for a linear operator T on an infinite-dimensional
vector space, a T-cyclic subspace (p.313) has to be finite-dimensional.
No. If V is the space of those sequences of elements of F with
all but finitely many terms zero, and T is the right shift operator,
then it is not hard to see that the T-cyclic subspace of V generated
by any nonzero element is infinite-dimensional. (On the other hand,
for T the left shift operator on this space, all T-cyclic subspaces
are finite-dimensional.)
----------------------------------------------------------------------
You ask what is meant by the assertion two lines after the last
display on p.313 that W is the smallest T-invariant subspace
of V containing x.
The authors are using "smallest" to mean "contained in all others".
W is a T-invariant subspace of V containing x, and moreover every
T-invariant subspace of V containing x contains W. (They say
that explicitly in the next sentence, and I also did so in class.)
It is common in mathematics to use "smallest" to mean "contained in
all others", since inclusion relations contain more information than
concepts of "size".
----------------------------------------------------------------------
You ask about the relation among the matrices B_1, B_2, B_3 in the
block decomposition of the matrix A in the last display on p.314.
They are three independent matrices. B_1 is k x k, B_2 is
k x (n-k), and B_3 is (n-k) x (n-k). The first expresses
T(v_1),...,T(v_k) in terms of v_1,..., v_k. The remaining two
give the components of T(v_k+1),...,T(v_n) -- namely, B_2
gives the v_1,..., v_k-components of those elements, while
B_3 gives their v_k+1,..., v_n-components.
If you take _any_ three matrices B_1, B_2, B_3 of the indicated
sizes, and put them together as shown to get an n x n matrix A,
then A will represent a linear operator with respect to which
the subspace W = span({v_1,..., v_k}) is invariant.
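A quick numerical illustration of this (my own; the sizes are
arbitrary), in Python/numpy:
  import numpy as np

  k, n = 2, 5
  rng = np.random.default_rng(1)
  B1 = rng.standard_normal((k, k))
  B2 = rng.standard_normal((k, n - k))
  B3 = rng.standard_normal((n - k, n - k))

  A = np.block([[B1,                   B2],
                [np.zeros((n - k, k)), B3]])

  # A maps W = span(e_1,...,e_k) into itself: for w in W, the last
  # n-k coordinates of A w are zero
  w = np.concatenate([rng.standard_normal(k), np.zeros(n - k)])
  print(np.allclose((A @ w)[k:], 0))   # True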
----------------------------------------------------------------------
You ask about the expression for A shown in the proof of Theorem 5.21,
p.314.
It is unfortunate that the authors write "by Exercise 12", making it
sound as though there is something nontrivial here; it is simply a
straightforward verification.
Given that you know that T_W acts on v_1,...,v_k according to the
matrix B_1, if you ask yourself, "How does T act on v_1,...,v_k ?"
the answer should be clear. And if you then ask yourself, "How is
this reflected in the form of the matrix for T with respect to
v_1,..., v_n ?", that should also be easy. Try this, and if you have
difficulty, write and tell me where you run into a problem.
----------------------------------------------------------------------
Regarding Theorem 5.21, p.314, you ask "if the characteristic
polynomial has no roots in the base field ... then there is no proper
T-invariant subspace of the vector space?"
Assuming that by "proper" you mean "proper nontrivial", the theorem
gives that conclusion if the polynomial has no nonconstant proper
factors (i.e., is irreducible) over the base field, but not merely
if it has no roots. E.g., if the field is R and
the characteristic polynomial is (t^2 + 1)(t^2 + 2), then T can (and
we will eventually see, must) have T-invariant subspaces on which it
acts with characteristic polynomial t^2 + 1, respectively, t^2 + 2.
----------------------------------------------------------------------
You asked where a_0, a_1, ... come from in Theorem 5.22(b), p.315.
Part (a) of the theorem asserts that {v, ... , T^k-1(v)} is
a basis of W. Since by definition T^k(v) is also a member
of W, the ordered set {v, ... , T^k-1(v),T^k(v)} can't be
linearly independent, so we must have a nontrivial linear relation
b_0 v + b_1 T(v) + ... + b_k-1 T^k-1(v) + b_k T^k(v) = 0. Here
b_k must be nonzero, since there is no nontrivial linear relation
among v, ... , T^k-1(v). Dividing by b_k, the above becomes a
linear relation in which T^k(v) has coefficient 1, and we can
write this a_0 v + a_1 T(v) + ... + a_k-1 T^k-1(v) + T^k(v) = 0.
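In concrete terms, the a_i are found by solving a system of linear
equations. A numerical sketch of my own, in Python/numpy, with a random
T and v, for which {v, T(v), T^2(v)} is (with probability 1) a basis
of R^3, so k = 3:
  import numpy as np

  rng = np.random.default_rng(2)
  T = rng.standard_normal((3, 3))
  v = rng.standard_normal(3)
  k = 3

  # columns v, T(v), ..., T^{k-1}(v)
  C = np.column_stack([np.linalg.matrix_power(T, j) @ v for j in range(k)])
  b = np.linalg.solve(C, np.linalg.matrix_power(T, k) @ v)
  a = -b   # then a_0 v + a_1 T(v) + ... + a_{k-1} T^{k-1}(v) + T^k(v) = 0
  print(np.allclose(C @ a + np.linalg.matrix_power(T, k) @ v, 0))  # True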
----------------------------------------------------------------------
I'm not sure what your difficulty is with the illustration of
Theorem 5.24 in p.319. The text introduces the integer m_i as the
dimension of E_lambda_i. It then determines the characteristic
polynomial of T (using Theorem 5.24, applied after an intermediate
step where they determine the characteristic polynomials of the
restrictions of T to those spaces. The order of the sentence is
such that the intermediate step is given earlier in the sentence
than the definition of the m_i; but I hope you see that logically,
that definition precedes the determination of those characteristic
polynomials.) Once we have the characteristic polynomial of T,
we can read the multiplicity of each eigenvalue from it -- since
that multiplicity was defined in terms of that polynomial (Definition
on p. 263). We see that it is m_i = dim(E_lambda_i), getting the result
asserted. "As expected" means "in agreement with Theorem 5.9(a)".
----------------------------------------------------------------------
You ask whether the definition of a direct sum of matrices (p.320)
can be extended to non-square matrices.
Yes. A square matrix, in general, corresponds to a linear
transformation V --> V of a vector-space V, while a non-square
matrix corresponds to a linear transformation V --> W, where V
and W are two vector spaces. One gets a direct sum of two square
matrices by looking at a vector space V that decomposes as a direct
sum of two subspaces, V = V_1 (+) V_2, and a linear transformation
V --> V that acts on each of V_1 and V_2 by mapping it into
itself; hence the analog for non-square matrices would be based on
a linear transformation T: V --> W where the spaces V and W
each decompose as a direct sum, V = V_1 (+) V_2 and W = W_1 (+) W_2,
such that T takes V_1 into W_1 and V_2 into W_2. Then if
we have bases for V and W which are put together from bases for
the above summands, the matrix for T with respect to these bases
will be the "direct sum" of the matrices expressing the restrictions of
T to a map V_1 --> W_1 and a map V_2 --> W_2 in terms of the given
bases of V_1, V_2, W_1 and W_2. The same applies, of course, to
the case where there are more than two summands.
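If I recall correctly, scipy's block_diag accepts non-square blocks,
so it implements exactly this generalization (a small example of mine):
  import numpy as np
  from scipy.linalg import block_diag

  T1 = np.array([[1., 2., 3.]])     # a 1 x 3 block, a map V_1 --> W_1
  T2 = np.array([[4.],
                 [5.]])             # a 2 x 1 block, a map V_2 --> W_2
  print(block_diag(T1, T2))
  # [[1. 2. 3. 0.]
  #  [0. 0. 0. 4.]
  #  [0. 0. 0. 5.]]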
----------------------------------------------------------------------
You ask about the last equality in the last display on p.331,
Example 5, which uses the identity z z-bar = |z|^2.
On p. 558, Definition and following sentence, the absolute value
|z| is defined, and the formula z z-bar = |z|^2 is noted. It's
straightforward to check, by writing z = a+bi and doing the
multiplication.
----------------------------------------------------------------------
Regarding the Frobenius inner product (p.332, top) you ask
> ... Are there other inner products that are commonly used when
> working with matrices? ...
I don't know; but I doubt it. There is a "magic" in the trace operator;
it and its scalar multiples are the only linear functionals on M_nxn(F)
that are always equal for similar matrices, for instance; and it (or
its negative, depending on one's conventions) is the coefficient of
t^n-1 in the characteristic polynomial of a matrix. So the trace
of a product, with the second matrix appropriately transformed to
make the operation conjugate-linear, is a very "canonical" thing to
look at; and there is not likely to be anything else nearly so
natural.
> ... Why is this not defined as the "standard inner product on M_nxn,"
> as they did for F^n?
If something is seen as an important contribution, it is often named
after its originator; so I suppose that this inner product was
introduced by Frobenius, and named after him. When things simply
develop over time through the work of many people, and no name is
attached to them, then textbook authors have to come up with some
way to refer to them, and names like "standard inner product" are used.
----------------------------------------------------------------------
You ask about the third equality in the display in the proof of
Theorem 6.3 on p.342.
Well, I'll give you a hint, with which I hope you can answer it
yourself: The fact that {v_1,...,v_n} is an orthogonal set gives
you the precise values for all but one of the numbers <v_1, v_j>,
<v_2, v_j>, ... , <v_n, v_j>. What are the values, and which is
the one exception? When you have answered this, substitute into
the summation.
If this hint doesn't help, ask again.
----------------------------------------------------------------------
You ask whether the order in which the elements of S are used in
the Gram-Schmidt process (Theorem 6.4, p.344) matters.
If we start with elements in a different order, we will in general
get a different orthogonal basis. But, of course, whichever order
we start with, we will get some orthogonal basis.
The process described has the property that v_k lies in
span({w_1,...,w_k}), and sometimes, with this in mind, it is useful to
choose a particular ordering for S. For instance, if V = P_m(R),
and we choose w_k = x^{k-1}, then the above fact says that each v_k
is a polynomial of degree < k. If instead we take w_k = x^{m-k+1},
then it says that each v_k is a polynomial divisible by x^{m-k+1}.
We can't get both properties at once, but by choosing the order we can
be sure of getting whichever one of them we prefer.
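Here is a bare-bones version of the process in Python/numpy (my own
sketch), to which you can feed the elements of S in either order and
compare the two orthogonal bases you get:
  import numpy as np

  def gram_schmidt(ws):
      # v_k = w_k minus its projections on the previously computed v's
      vs = []
      for w in ws:
          v = w - sum((np.dot(w, u) / np.dot(u, u)) * u for u in vs)
          vs.append(v)
      return vs

  S = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
  print(gram_schmidt(S))          # one orthogonal basis of R^3 ...
  print(gram_schmidt(S[::-1]))    # ... and a different one, from the
                                  # reversed ordering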
----------------------------------------------------------------------
You ask about the step in the proof at the top of p.351, where
the authors say that the displayed inequality being an equality
implies that ||u-x||^2 + ||z||^2 = ||z||^2.
Well, the preceding display is of the sort that I discussed on a
homework sheet a few weeks ago, with a string of equals-signs and
one ">_" sign mixed in. If the left-hand side is equal to the
right-hand side, then where the ">_" sign appears, one cannot have
">", since that would make the left-hand side greater than the
right-hand side. So instead, there must be equality at that step.
And that is what the authors conclude.
----------------------------------------------------------------------
You ask about the justification of the statement at the beginning of the
proof of part (b) of Theorem 6.7, p.352, that "S_1 is clearly a subset
of W^perp".
Since the set {v_1,..., v_n} is orthogonal, each element of S_1 =
{v_k+1,..., v_n} is orthogonal to each element of {v_1,..., v_k},
and hence is orthogonal to the space spanned by the latter set,
which is W; that is, each element of S_1 lies in W^perp.
----------------------------------------------------------------------
You ask what is meant by the phrase "Since g and h both
agree on beta" on p.357, line after the last display.
Notice that g and h are functions. One says that two
functions u and v agree on a set S if u(s) = v(s) for
every s\in S. (The word "both" here is confusing; I would
prefer to drop it.)
----------------------------------------------------------------------
You ask when it is natural to look at transpose maps, and when at
adjoint maps (p.357).
Looking at linear operators, the transpose T^t: W* --> V* is the
natural dual in terms of abstract linear algebra. But when we
consider finite-dimensional spaces that are also given with an inner
product, we find it convenient to use V and W themselves instead of
the dual spaces V* and W*, since every linear functional has the
form < - , y >. So then in place of T^t: W* --> V* it is natural
to use T*: W --> V. The choice between the matrices A^t and A* is
just the computational reflection of this choice.
So adjoints are used in the study of finite-dimensional inner product
spaces, while only dual maps can be used in the case of spaces without
inner product structure. When we have infinite-dimensional inner
product spaces, all transformations have duals, but only some have
adjoints. In that case, if the adjoint exists, it can be a useful
tool in studying how the transformation T behaves with respect to the
inner product.
----------------------------------------------------------------------
You ask why the same symbol * is used for dual spaces and adjoint
operators (p.358, Theorem 6.9).
I hope that what I said in class made it clear that the adjoint
of an operator is closely connected with the dualization construction.
Most algebraists would in fact write T* for the construction that the
authors denote T^t: W* -> V* (p. 121). But in the theory of finite
dimensional inner product spaces, one makes use of the identification
V <--> V* described in Theorem 6.8. So one translates the duality
map into a map W -> V, and uses the symbol "T*" for that map
instead, calling the "raw" duality map T^t.
----------------------------------------------------------------------
You ask, in connection with Theorem 6.10, p.359, whether if beta and
gamma are two orthonormal bases of an inner product space, one has
[T*]_beta ^gamma = ([T]_beta ^gamma)*
No -- but something like it is true, namely
[T*]_beta ^gamma = ([T]_gamma ^beta)*
This may seem strange; but it wouldn't if the authors had noted
that the concept of an adjoint map is defined, more generally, for
linear transformations from one inner product space to another.
Then, since the adjoint goes in the opposite direction to the original
transformation, that would be the only sensible way that adjoints
of matrices could behave.
----------------------------------------------------------------------
Concerning the proof of point (b) on line 2 of p.360 you write
> ... I'm guessing that bringing a constant out of an inner
> product expression requires conjugation.
Yes and no! For the "no", see the fourth line of p. 330, part (b) of
the definition of "inner product". For the "yes", see the fourth
line of p. 333, part (b) of Theorem 6.1.
Do you see the difference between the situation to which the "no"
applies and the situation to which the "yes" applies? Do you see
why the computation on p. 360 falls under the latter case? If not,
ask again!
----------------------------------------------------------------------
In answer to your pro forma question about the proof of the Corollary
on p.362, you wrote that "since rank(A) = n, A is invertible".
That would be right if A were an n x n matrix; but it's an
m x n matrix.
So -- try again! Send me your corrected argument.
----------------------------------------------------------------------
Regarding the comment on p.363, that if the experimenter chooses the
times t_i to sum to zero the computations are greatly simplified,
you ask whether this means he or she must choose negative times.
Yes. He or she would do this by choosing the "zero-point" of his or
her time scale to be the average of the times at which he or she did
the measurements. E.g., if on the original time-scale, the times were
1, 7, 8,
then on the new time-scale, they would be reduced by (1+7+8)/3 = 16/3,
to give
-13/3, 5/3, 8/3,
which do indeed add up to 0.
It would complicate the preparation of the data, but simplify the
matrix computations. Hard to know which way would be better on balance.
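The simplification, concretely, is that after centering the times, the
two columns of the design matrix become orthogonal, so A^t A is
diagonal and the normal equations decouple. A sketch of mine in
Python/numpy, with made-up measured values:
  import numpy as np

  t = np.array([1., 7., 8.])
  y = np.array([2., 5., 9.])        # hypothetical measurements

  tc = t - t.mean()                 # centered times: they sum to 0
  A = np.column_stack([np.ones_like(tc), tc])
  print(A.T @ A)                    # diagonal: the columns are orthogonal

  c, m = np.linalg.solve(A.T @ A, A.T @ y)
  print(c, m)                       # intercept (at the mean time), slope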
----------------------------------------------------------------------
You ask how, in the proof of the Lemma on p.369, the conclusion
"v is orthogonal to the range of T* - lambda-bar I" implies
that the linear operator T* - lambda-bar I is not onto.
Since v is nonzero, it is not orthogonal to all vectors of V.
(For instance, it is not orthogonal to itself.) So as the range
of T* - lambda-bar I is contained in the set of vectors to which
v is orthogonal, that range must be a proper subset of V, which
says that T* - lambda-bar I is not onto.
----------------------------------------------------------------------
You ask why Theorem 6.16, p.372, requires finite dimensionality.
Only for linear operators on finite-dimensional vector spaces
do we have a concept of "characteristic polynomial". And indeed
Example 3 involves an operator on an infinite-dimensional vector space
that does not satisfy any polynomial equation.
----------------------------------------------------------------------
You ask why, as indicated by the answer to Exercise 1(b) on p.374,
operators and their adjoints do not have the same eigenvectors, writing
"I assumed that they would be the same."
Not safe to assume that at all! A linear operator and its adjoint
are very different maps (except, of course, in the case of a
self-adjoint operator). I could give a half hour lecture on how to
"think about" the relation between a linear transformation and its
adjoint, but I don't know whether it would get across to people.
So we should just think of the fact that a linear operator and its
adjoint have the same eigenvalues (over C, conjugate eigenvalues),
and the stronger statement that they have the same (respectively,
conjugate) Jordan canonical forms, as useful facts, but not assume
that any more is true.
----------------------------------------------------------------------
You also ask why, as indicated by the answer to Exercise 1(d) on p.374,
a matrix A is normal if and only if L_A is normal.
Unlike the relation between a linear transformation and its adjoint,
the relation between linear transformations and matrices is a very
close one. For every concept that the textbook defines for linear
transformations, it defines a corresponding concept for matrices,
generally using the same symbols, and for every result that they prove
for one, they generally prove a corresponding result for the other.
They also prove in general that properties of a linear transformation
T match the properties of the matrices [T]_beta, and properties of
matrices A match the properties of the linear transformations L_A.
In cases where they do not explicitly prove a relationship, it
can generally be gotten from results that they do prove. So to see
that normality of A is equivalent to normality of L_A, take the
definition of what it means for L_A to be normal, namely
L_A (L_A)* = (L_A)* L_A, use results that they have proved relating
the adjoint of a matrix A and the adjoint of the induced linear
operator L_A, namely the Corollary on p. 359, then use the result
they likewise prove relating the product of two matrices and the
products of the induced linear operators, ... and you end up with the
equation L_{AA*} = L_{A*A}, which means AA* = A*A, i.e., that A
is normal.
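The final check is easy to carry out numerically (again a toy example
of my own):
  import numpy as np

  A = np.array([[1., -1.],
                [1.,  1.]])         # rotation-plus-dilation: normal,
                                    # though not self-adjoint
  print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A A* = A* A

  B = np.array([[1., 1.],
                [0., 1.]])          # a Jordan block: not normal
  print(np.allclose(B @ B.conj().T, B.conj().T @ B))   # False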
----------------------------------------------------------------------
You ask about the term "isometry" mentioned near the bottom of p.379,
and why it is only referred to in the context of infinite-dimensional
vector spaces.
"Isometry" is a general term for a distance-preserving function, not
limited to linear algebra, but applicable to any context where distance
is defined. Within linear algebra, it is meaningful when discussing
inner product spaces, since ||x-y|| gives a distance function.
I think the point of what they are saying is this: For an
operator on an inner product space, preserving distances implies
1-1-ness, so in the finite-dimensional case, it also implies
invertibility. In the infinite-dimensional case, one has to
distinguish between distance-preserving operators that are
also required to be invertible, and those that are not; so people
in the field refer to the former as "unitary" or "orthogonal"
(depending on F), while for the latter, they use the term "isometry"
(even though in its more general use, that term would not even imply
linearity -- but they consider the subclass of linear operators
that are isometries.)
So the reason it is not used in the finite-dimensional case of the
study of inner product spaces is that in that case, there is no need
for a term distinct from "unitary" or "orthogonal".
----------------------------------------------------------------------
You ask how the last step in the first displayed equation on p.380
is done.
Did you notice the words "since |h(t)|^2 = 1 for all t" right
after that display?
If you didn't, there's a lesson to be learned: in mathematical writing,
calculations are often explained in the immediately preceding or
following text, so you should look to see whether there are such
explanations.
If you did see it, it would have helped me if you'd mentioned that
in your question, and said what you were or were not able to do with
that fact. Looking at the integral in the display, with the equation
"|h(t)|^2 = 1" in mind, one sees that the integrand has a factor
h(t) and another factor which is the complex conjugate of h(t).
Their product is |h(t)|^2, which as noted equals 1, so the
integrand becomes the product of f(t) with its conjugate. The
resulting integral is therefore ||f||^2.
----------------------------------------------------------------------
You ask whether Theorem 6.18, p.380 remains true for a linear
map T: V --> W between different inner product spaces.
Nice question!
If the spaces have the same dimension, it remains true. But if
dim(W) > dim(V), some of the conditions can hold, without the
others holding. For instance, for m < n, the map F^m -> F^n
taking (a_1,...,a_m) to (a_1,...,a_m,0,...,0) satisfies (b) and (e),
but not (c) or (d). It satisfies part of (a), namely T*T = I_V;
but TT* is not equal to I_W.
If we reword conditions (c) and (d) to say "... such that T(beta)
is an ordered orthonormal subset of W", then we again get five
equivalent conditions: the part of condition (a) mentioned above,
condition (b), these modified versions of condition (c) and (d), and
condition (e).
----------------------------------------------------------------------
You ask whether the statement in Theorem 6.18, p.380, that the
properties listed are equivalent means that if one holds, they all hold.
Right!
----------------------------------------------------------------------
You ask about the inner product expression in the first displayed
line on p.381, where the index of summation is "i" in one sum, and
"j" in the next.
As I discuss in the handout on "Sets, Logic and Mathematical Language",
an index of summation is what is called a "dummy variable". The symbol
used for this variable does not make a difference in the value of
the expression. Both of the summations in the inner product expression
mean the same thing: a_1 v_1 + a_2 v_2 + ... + a_n v_n; so writing
it in two ways does not violate the uniqueness of expressions of
elements in terms of a basis.
The authors choose to use different dummy variables in the two
expressions so that they can easily represent the result of expanding
by the distributive laws for inner products (condition (a) on the 3rd
line of p. 330, and condition (a) on the 3rd line of p. 333). That
expansion gives a sum involving all n^2 combinations of the n terms of
the left-hand factor with the n terms of the right-hand factor, so in
this double sum, the terms of the left-hand factor need to be indexed by
a different subscript symbol from the terms of the right-hand factor.
The correct form of this double sum is easier to see if we have already
used those different symbols in the preceding expression. Hence the
authors' choice to write the sums in the inner product as they did.
----------------------------------------------------------------------
You ask how one shows that unitary equivalence of nxn matrices is
an equivalence relation (p.384).
Well, have you written down the things that have to be verified to
show that it is an equivalence relation, and tried to verify them?
If so, which have you had difficulties with, and what were the
difficulties?
----------------------------------------------------------------------
You ask whether rigid motions (p.385) can be defined on complex
inner product spaces.
The concept of distance-preserving transformation can be defined
wherever one has a concept of distance; i.e., in what in Math 104 you
will learn is called a metric space; so this includes complex inner
product spaces. But as I will point out in class, a distance-preserving
transformation on a complex inner product space which sends 0 to 0
need not be linear with respect to the complex vector space structure;
only with respect to the real vector space structure. So though one
can study such maps, when one does, one is really looking at the spaces
as real inner product spaces. So that is the case our book focuses on.
----------------------------------------------------------------------
You ask about Theorem 6.22, p.386, saying "aren't orthogonal operators
always rigid motions?"
Yes, certainly -- they form a subset of the rigid motions. But it is
a proper subset; that is why we need to compose them with another class
of rigid motions, the translations, to get all the rigid motions.
(The construction of all rigid motions from orthogonal operators and
translations is somewhat analogous to the construction of all elements
of a vector space V from the elements of subspaces W_1 and W_2
when V = W_1 (+) W_2, except that composition of maps is involved
in the "rigid motions" case, whereas addition of vectors is used in
the "direct sum of subspaces" case.)
----------------------------------------------------------------------
You ask why, on p.388, second line below second display, they can
say that the line L about which we are reflecting is the eigenspace
of the eigenvalue 1 of the reflection operator.
When we reflect about a line, that line is the set of points not moved
by our reflection. And the set of vectors not moved by a linear
transformation T is the eigenspace of the eigenvalue 1 of T.
(Or the trivial subspace, if 1 is not an eigenvalue; but in this
case we know that there are nonzero points not moved by our reflection,
so 1 is an eigenvalue.)
----------------------------------------------------------------------
You ask whether one can find a linear transformation that represents
a translation, as a rotation is represented by a linear transformation
on p.390.
No. A linear transformation must respect vector space structure,
so in particular, it must send the zero element to the zero element.
A translation (other than the identity) moves the zero element.
That is the reason that one factors a rigid motion into two parts
as described in this section: The translation handles the movement
of the zero element; when the translation is taken care of, one
is left with a rigid motion that fixes the zero element, and that
is a linear transformation.
(There is a very different way in which one _can_ represent a
translation by a linear transformation; but that involves thinking
of the plane, not as the set of all points of R^2, but, via a certain
correspondence, as a subset of the set of lines through the origin
in R^3. I may talk about that if there's a little extra time. But
for the plane as R^2, translations are not linear transformations,
for the reason noted above.)
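For the curious, here is a glimpse of that other representation, in
Python/numpy (my own sketch): one identifies the point (x,y) of the
plane with the vector (x,y,1) in R^3, i.e., with the line it spans,
and a translation then becomes an honest linear map on R^3.
  import numpy as np

  def translation(a, b):
      # 3 x 3 matrix, linear on R^3, which on vectors of the form
      # (x, y, 1) acts by adding (a, b) to (x, y)
      return np.array([[1., 0., a],
                       [0., 1., b],
                       [0., 0., 1.]])

  p = np.array([3., 4., 1.])          # the point (3, 4)
  print(translation(2., -1.) @ p)     # [5. 3. 1.], i.e. the point (5, 3)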
----------------------------------------------------------------------
You ask about the statement on p.393 that a subspace W_1 of a vector
space V can be part of two different direct sum decompositions,
V = W_1 (+) W_2 and V = W_1 (+) W_3, arguing that if v = x + y
(x\in W_1, y\in W_2) and v = x + z (x\in W_1, z\in W_3), then we must
have y = z, from which one can deduce that W_2 = W_3.
Well, the full statements, for any v\in V, are
There exist x\in W_1, y\in W_2 such that v = x + y,
There exist x\in W_1, z\in W_3 such that v = x + z.
But when we use two such statements together, it is not legitimate
to use the same letter x for the element of W_1 introduced in
the first statement and the element of W_1 introduced in the second
statement, when we have no basis for assuming they are the same. So
if we are going to put those statements side by side we must say
something like
There exist x \in W_1, y\in W_2 such that v = x + y,
There exist x'\in W_1, z\in W_3 such that v = x'+ z.
Then on subtracting the two statements, we don't get 0 = y - z,
but 0 = (x-x') + y - z, which does not force W_2 to equal W_3.
An example that I have mentioned in lecture is where V = R^2,
W_1 = span{(1,0)}, W_2 = span{(0,1)}, W_3 = span{(1,1)}. Then,
for instance, the vector (2,1)\in V, when decomposed using the
relation V = W_1 (+) W_2, gives (2,1) = (2,0) + (0,1), while
decomposing it using the relation V = W_1 (+) W_3 we have
(2,1) = (1,0) + (1,1).
----------------------------------------------------------------------
You ask about the relation between normal and self-adjoint operators,
noting that in the Spectral Theorem (p.401) they seem "not much
different".
As I emphasized in class, the normal operators form a very wide class,
which includes the self-adjoint operators (those satisfying T* = T),
the unitary operators (those satisfying T* = T^-1), and many that
are neither (e.g., the product of any unitary operator with any
self-adjoint operator that commutes with it). In fact, Corollary 3
of the Spectral Theorem shows exactly how much more restrictive the
self-adjoint operators are than the normal ones: The normal ones
can have any eigenvalues, while the self-adjoint ones can only have
real eigenvalues.
The reason that normal operators and self-adjoint operators are
mentioned together in the Spectral Theorem is that normal operators
over the field C, and self-adjoint operators over the field R,
share the property of having an orthogonal basis of eigenvectors.
It is precisely because over R an eigenvector must have a _real_
eigenvalue that when working over R, we must use the more restricted
class of self-adjoint operators for the Spectral Theorem to hold.
----------------------------------------------------------------------
Regarding Theorem 6.26 on p.406, you ask whether the uniqueness
of the sigma_i doesn't follow immediately from the description of
these values in terms of the action of T relative to the bases
\{v_1,...,v_n\} and \{u_1,...,u_m\}.
The difficulty is that these bases are not themselves unique. The
statement of the result says that they are orthonormal bases of V
and W respectively, such that the images of the members of the first
basis under T are nonnegative scalar multiples of appropriate members
of the second basis. Those are strong conditions, which in most cases
greatly restrict the possible bases that can satisfy them; but such
bases are still not uniquely determined; so it is nontrivial to prove
that _whatever_ pair of bases one chooses that satisfy those
conditions, one will get the same constants sigma_i.
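You can see both halves of this numerically with numpy's svd (my own
illustration): the sigma_i come out the same however computed, while
the bases admit obvious modifications, e.g., flipping the signs of a
matched pair of basis vectors:
  import numpy as np

  rng = np.random.default_rng(3)
  A = rng.standard_normal((4, 3))

  U, s, Vt = np.linalg.svd(A, full_matrices=False)
  print(s)                           # the sigma_i, in decreasing order

  U2 = U * np.array([-1., 1., 1.])   # flip the first left singular vector
  Vt2 = Vt.copy()
  Vt2[0] *= -1                       # ... and the matching right one
  print(np.allclose(U2 @ np.diag(s) @ Vt2, A))   # True: different bases,
                                                 # same sigma_i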
----------------------------------------------------------------------
You ask about the "alternative method for computing Q" at the bottom of
p.432.
The authors start with the n x 2n matrix (A | I) (i.e., the n x n
matrix A with the n x n identity matrix I put to the right of
it). Then, whenever an elementary matrix E is to act on the
n x n left half by multiplying it --
by E^t on the left and E on the right,
they act on their n x 2n matrix by multiplying by E^t on the left,
and on the right by a matrix we could call diag(E, I), i.e., the
2n x 2n matrix with E as the upper left-hand nxn block, I as
the lower right-hand nxn block, and 0 for the other two nxn
blocks. (You should verify for yourself that these multiplications
have the effect they describe.) Thus, as the successive elementary
matrices E_i are used, the n x 2n matrix (A|I) ends up being
multiplied on the left by --
... E_2^t E_1^t = (E_1 E_2 ...)^t
and on the right by --
diag(E_1, I) diag(E_2, I) ... = diag(E_1 E_2 ... , I).
Using the fact that E_1 E_2 ... = Q, the result is
Q^t (A | I) diag(Q, I) = (Q^t A | Q^t I) diag(Q, I) = (Q^t A Q | Q^t).
So the left-hand half of the resulting n x 2n matrix, i.e., the
diagonal matrix D, is indeed Q^t A Q.
Probably this very matrix-theoretic description of the row and
column operations wasn't needed to answer your question -- just
a clear understanding of what operations they are performing on
the left and right of (A | I).
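Here is the method carried out in Python/numpy on a 2 x 2 example of
mine, where a single elementary matrix suffices, so Q = E_1:
  import numpy as np

  A = np.array([[1., 2.],
                [2., 3.]])           # symmetric
  E1 = np.array([[1., -2.],
                 [0.,  1.]])         # column operation C2 := C2 - 2 C1

  M = np.hstack([A, np.eye(2)])                  # the n x 2n matrix (A | I)
  M = E1.T @ M @ np.block([[E1, np.zeros((2, 2))],
                           [np.zeros((2, 2)), np.eye(2)]])   # diag(E1, I)

  print(M[:, :2])   # the left half: Q^t A Q = diag(1, -1)
  print(M[:, 2:])   # the right half: Q^t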
----------------------------------------------------------------------
You ask about the definition of a Jordan canonical form of a matrix.
Although the authors haven't stated this as an italicized definition,
they do give it in a precise way, on p.483. Note that the Jordan
canonical form is the _first_ matrix on that page; there, each of the
A_i is a square block having the form shown in the _second_ matrix on
that page (which they call a "Jordan canonical block").
In particular, 1x1 blocks are 1x1 matrices (lambda), and when all
of the blocks A_i in the Jordan canonical form are 1x1, it is
diagonal; but not otherwise.
----------------------------------------------------------------------
You ask why we go to the trouble of studying Jordan canonical forms
(pp.483 et seq).
Well, there are many properties of a matrix that depend only on its
similarity type (its equivalence class under the equivalence relation
"is similar to"). If we want to study these properties, it can be
convenient to have a small family of sorts of matrices such that
every matrix is similar to one of them; then we only have to calculate
with those matrices. We will see this in the material added to next
Wednesday's reading (as noted on the current homework sheet), where
the question of whether lim A^n exists can be examined by going to
the Jordan canonical form and calculating whether lim J^n exists,
rather than trying to do this separately for every possible matrix.
This is in addition to what I said when first introducing the subject:
That if we want to know whether two matrices are similar, we can do
this by putting them both into Jordan form, and seeing whether those
Jordan matrices are equal.
----------------------------------------------------------------------
You ask what is meant by the statement at the top of p.484 that the
Jordan canonical form of a linear transformation is "unique up to
the order of the Jordan blocks".
Well, as an example, consider the matrix near the bottom of the
preceding page, determined by a choice of basis {v_1,...,v_8}.
Suppose we rearrange this in the order v_4,v_1,v_2,v_3,v_5,v_6,v_7,v_8.
(To make this easier to work with, we might call the reordered vectors
w_1,...,w_8, so that w_1 = v_4, w_2 = v_1, w_3 = v_2, etc.) If you
express the same linear operator in terms of the new basis, it will
be built up from the same four Jordan blocks, but the first two blocks
(the 3x3 block and the 1x1 block) will now appear in the reverse order.
So, though the resulting matrix will be a Jordan canonical form for the
same transformation, it will be a different Jordan canonical form --
differing from the original in the order of the Jordan blocks. If
we call two Jordan canonical forms that differ only in the order of
the blocks "essentially the same", then the result that the authors
are claiming is that the Jordan canonical form of a matrix is
"essentially unique".
(Incidentally, not every rearrangement of a Jordan canonical basis for
a linear transformation is a Jordan canonical basis. For instance, if
in the above example we used the order v_2,v_1,v_3,v_4,v_5,v_6,v_7,v_8,
then the matrix we would get for the linear operator would again be a
rearrangement of the matrix shown, but that rearrangement would not fit
the definition of a Jordan canonical form.)
----------------------------------------------------------------------
I hope that my discussion of Jordan canonical forms in class cleared
up your questions about the statement "(T-2I)^3(v_i) = 0 for i=1,...,4"
on p.484. I'll summarize here:
(T-2I)(v_1) = 0 because v_1 is an eigenvector with eigenvalue 2.
(T-2I)^2(v_2) = 0 because (T-2I)(v_2) = v_1, and then when we apply
a second T-2I, the preceding observation is applicable.
(T-2I)^3(v_3) = 0 because (T-2I)(v_3) = v_2, and then when we apply
two more T-2I's, the preceding observation is applicable.
(T-2I)(v_4) = 0 because v_4 is an eigenvector with eigenvalue 2.
So each of v_1,...,v_4 is annihilated by (T-2I)^m for some m _< 3,
so a fortiori, all are annihilated by (T-2I)^3.
(Of course, they are also annihilated by (T-2I)^m for m > 3; but
that is an immediate consequence of being annihilated by (T-2I)^3,
so we don't need to state it -- all we need to state is the smallest
power that annihilates these elements.)
The general rule is that if a Jordan matrix has blocks of various
sizes for eigenvalue lambda, then the basis elements associated with
all these blocks will be annihilated by (T-lambda I)^m where m is
the largest of the sizes of these blocks (where I am calling n the
"size" of an nxn block).
----------------------------------------------------------------------
You ask how, in the third from last display on p.485, the authors can
make the operators (T - lambda I)^p and T commute.
They are not claiming that all operators on a vector space commute
(which is false); but are using the fact that T commutes with any
polynomial in T -- because it commutes with every power of itself,
and polynomials in T are linear combinations of powers of T.
----------------------------------------------------------------------
You ask how the authors reach the conclusion (T-2I)^2 (v_3) = 0 in the
middle of p.488.
As they say, they are referring to Example 1, p. 483. Did you go back
to that example and try computing (T-2I)^2 (v_3) ? Remember that the
i-th column of the matrix shows how T(v_i) is expressed in terms of
the other v_j; so it is easy to write down the effect of T on any
of the basis vectors, and from this the effect of T-2I. If in working
the computation you get a different answer, or still don't see how
to proceed, write me again.
It is important for this course to be able to do such computations,
and this particular computation is an important one for understanding
this last topic.
(After you do this computation, I suggest you go to the matrix above
that one on p. 483 --the middle display-- and, calling the corresponding
linear operator U, see what U - lambda I does to the i-th basis
vector, and from this, what (U - lambda I)^j does to that vector.)
----------------------------------------------------------------------
You ask whether Theorem 7.7 (p.490) holds in the infinite dimensional
case.
Well, first one has to decide what the statement in the infinite
dimensional case should be. (If you wanted to know the answer, you
really should have posed a precise statement and asked whether it
was true.) If we take a vector space with basis {x_1, x_2, ... }
indexed by the positive integers, and define a linear transformation
T by
T(x_1) = 0, T(x_{i+1}) = x_i (i>0),
then we see that V = K_0, but V doesn't have a basis consisting
of finite "cycles", in the sense of the Definition on p. 488. So if
we want such a result to be possible, we should extend the definition
of "cycle" to allow infinite cycles having and initial vector but no
end vector. In that case, the basis given in the above example does
consist of a single such infinite cycle.
We can ask whether K_lambda always has a basis consisting of
cycles in this generalized sense, where infinite cycles are allowed.
Even that doesn't seem to be the case. I can construct a counterexample
using an uncountable-dimensional V. I have an idea for a possible
countable-dimensional counterexample, but don't have time to try
to figure out whether it works. I can show you both examples in office
hours if you are interested. (If you do, it would help to let me know
ahead of time that you are coming for this, so that I can get the ideas
straight in my mind in advance.)
----------------------------------------------------------------------
In connection with Corollary 2 on p.491, you ask whether if the
characteristic polynomial of a matrix doesn't split, the matrix
can have a Jordan canonical form.
No. Notice that a matrix in Jordan canonical form is upper triangular,
and that if A is any upper triangular matrix, then its characteristic
polynomial, det(A - tI), will equal the product of the diagonal
entries of A - tI, namely (A_11 - t)...(A_nn - t). So that
polynomial splits. Now similar matrices have the same characteristic
polynomial, so if a matrix has a Jordan canonical form J, it is
similar to the upper triangular matrix J, so its characteristic
polynomial must split.
For matrices whose characteristic polynomial doesn't split, one
has the "rational canonical form", covered in section 7.4, which
we won't read. I think the most helpful way to study such matrices
is not by using the rational canonical form over the given field
F, but by going to a larger field over which their characteristic
polynomial splits, and using the Jordan canonical form there.
----------------------------------------------------------------------
You ask why, in the proof of Theorem 7.9, p.499, U moves each dot in
S_2 r places upward.
The answer lies in the labeling of the dot diagram on the preceding
page. Note that each column is formed by putting its end vector at
the bottom, and letting the vectors above that end vector be the end
vector's images under successively higher powers of T-lambda_i I --
until one reaches a vector which T-lambda_i I sends to zero, which
forms the top of the column.
I hope you can see from this that T-lambda_i I sends each dot
which is not in the top row to the dot one step above it. It follows
that (T-lambda_i I)^r sends each dot not in one of the first
r rows to the dot r steps above it, the statement you asked about.
----------------------------------------------------------------------
You ask about the dot diagram for the matrix of Example 3, p.504.
The authors define a different dot diagram for each eigenvalue of
a linear transformation; so the two diagrams shown on the page are
the diagrams for this matrix for lambda = 2 and for lambda = 4.
One could, if one wanted, define the dot diagram for the whole
matrix as gotten by putting together the separate diagrams; but
then one would have to show which dots belong to which eigenvalue.
Only within the diagram for each eigenvalue could one keep a rule
that longer cycles come before shorter ones. So the diagram for
the matrix you ask about would then be
   eigenvalue:     2    4
                |     |   |
                | o o | o |
                |     | o |
----------------------------------------------------------------------
You write, concerning the exercises on p.509
> I understand how exercise 1(d) is true, that matrices having the same
> Jordan canonical form are similar but I do not see how it is that 1(c)
> `linear operators having the same characteristic polynomial are
> similar' is false. Are these two statements related to each other?...
Yes. Knowing the characteristic polynomial (and assuming it splits),
we in general get finitely many possibilities for the Jordan canonical
form. If two matrices with that characteristic polynomial have the
same Jordan canonical form, they are similar; if they have different
forms, they are not.
(I have several times shown in class the different Jordan forms that
can be associated to the same characteristic polynomial; mostly using
2x2 or 3x3 matrices; this Monday I also showed a 4x4 case. The only
characteristic polynomials to which only a single Jordan form can
correspond are those which split into _distinct_ linear factors;
in that case, the matrix is diagonalizable.)
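Here is the smallest sort of example of this distinction, checked in
Python with the sympy package (the two matrices are my own choice):

  from sympy import Matrix

  A = Matrix([[2, 1],   # a single Jordan block for lambda = 2
              [0, 2]])
  B = Matrix([[2, 0],   # two 1x1 Jordan blocks for lambda = 2
              [0, 2]])

  # the same characteristic polynomial for both:
  print(A.charpoly().as_expr())   # lambda**2 - 4*lambda + 4
  print(B.charpoly().as_expr())   # lambda**2 - 4*lambda + 4

  # but different Jordan canonical forms, so A and B are not similar:
  print(A.jordan_form()[1])       # Matrix([[2, 1], [0, 2]])
  print(B.jordan_form()[1])       # Matrix([[2, 0], [0, 2]])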
----------------------------------------------------------------------
The three of you asked (in two cases "pro forma") why we study the
minimal polynomial of a linear transformation (p.516). It was hard for
me to know what to make of these questions -- whether the reason that
was obvious to me was not sufficient for you, as suggested by answers
to pro forma questions saying that it could be used to prove other
things, or whether what seemed obvious to me wasn't in fact obvious to
others. So I'll take a chance that the latter is the case, and put it
into words.
To understand a linear transformation, we want to know "what it can
do"; in particular, what we can get by applying it repeatedly, i.e.,
taking its powers, and linear combinations of these. To understand
these linear combinations, we want to know when taking further powers
stops leading to larger spaces of linear combinations; i.e., when
some power is itself just a linear combination of smaller powers.
To have a power T^k reduce to a linear combination of T^0, T^1,
... , T^{k-1} is for T to satisfy an equation p(T) = 0 for some
polynomial p(t) of degree k. So we'd like to know when this
happens; and, of course, how it happens: what the first polynomial
p(t) is that makes p(T) = 0.
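In fact, this description translates directly into a procedure: look
for the first power of the matrix A representing T that is linearly
dependent on the lower powers. Here is a minimal sketch in Python
with the sympy package; the function name and details are my own, and
exact arithmetic is assumed:

  from sympy import Matrix, eye, symbols, Poly

  def first_dependence_poly(A, t):
      # find the smallest k with A^k a linear combination of
      # A^0, ..., A^{k-1}, and return the monic p(t) with p(A) = 0
      n = A.rows
      powers = [eye(n)]                    # [A^0]
      while True:
          powers.append(powers[-1]*A)      # append the next power
          # the columns of M are A^0, ..., A^k, each flattened
          M = Matrix.hstack(*[P.reshape(n*n, 1) for P in powers])
          null = M.nullspace()
          if null:                         # first linear dependence
              c = null[0]
              c = c / c[len(c) - 1]        # scale to make p monic
              return Poly(sum(c[i]*t**i for i in range(len(c))), t)

  t = symbols('t')
  A = Matrix([[2, 1, 0],
              [0, 2, 0],
              [0, 0, 2]])
  print(first_dependence_poly(A, t))   # t**2 - 4*t + 4, i.e. (t - 2)^2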
That, in essence, is it. I don't mean to exclude the minimal
polynomial's being useful for proving other things -- rather, when
it answers such a fundamental question about the behavior of the
linear transformation, we can expect that it _will_ lead to other
results. But I don't think we should straitjacket our mathematical
pursuits by tallies of what we can predict in advance that we will
gain from answering a given question. We should try to answer
questions because they are fundamental questions about the objects
in question.
----------------------------------------------------------------------
You write that on p.549, "it states that the empty set is a subset of
all sets, meaning that 0 lies in every set."
No, it definitely does not mean that 0 lies in every set! "0" counts
how many elements are in the empty set, but 0 is not a member of
the empty set. (Similarly, "3" counts how many elements are in the
set {1, 10, 100}, but it is not a member of that set; only 1, 10 and
100 are.) The empty set, by definition, has no members.
----------------------------------------------------------------------
You ask about the meaning of "characteristic" near the bottom of p.555.
A general convention in mathematical writing is that a word being
defined is printed in a special font -- sometimes italic, sometimes,
as in this book, boldface. If you look at the first point in this
paragraph where the word "characteristic" is used, you will see that
it is in boldface; this means that the statement being given is the
definition of the word. It begins "In this case...", which refers to
the beginning of the paragraph ("In an arbitrary field F, it may
happen ..."). The contrary case is handled by the second half of the
sentence, where "characteristic zero" is defined.
As I said in class, you won't be responsible for properties of fields
other than R or C; but if you want to understand this concept,
read that sentence very carefully. If there are words or phrases in
it that you have trouble with, ask about them.
----------------------------------------------------------------------
Regarding the author's remarks near the bottom of p.555 about
unusual properties of fields of nonzero characteristic, you
ask what these are.
Actually, such fields are not as different as the authors' comments
might suggest. If one looks up all the references to "characteristic
of a field" in the index, then except for the page where the
characteristic is defined, they all turn out to be places where the
assumption is made that the characteristic is not 2. This is needed
to argue that if x = -x, then x = 0. (Namely, if x = -x, then
(1+1)x = 0, so if the characteristic is not 2, this makes x = 0.) This
is used in proving things about odd functions, antisymmetric matrices,
etc. But the most important facts of linear algebra remain true
regardless of the characteristic.
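(To see the phenomenon in the smallest case, here is a two-line check
in Python, modeling Z_2 as the integers mod 2:)

  for x in (0, 1):
      print(x, (-x) % 2, ((1 + 1)*x) % 2)
  # prints "0 0 0" and "1 1 0": every x satisfies x = -x, because
  # (1+1)x = 0 -- even though x need not be 0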
----------------------------------------------------------------------
Regarding the appendix on complex numbers, beginning on p.556, you ask
"What, in our world, motivates the use of the complex number system?"
It was probably the lack of motivating examples in our world that
made the theory of complex numbers so slow to develop!
But once it was understood, it was seen to be extremely useful in
studying mathematical questions that could be motivated solely
in terms of the real numbers. For a very simple example, note
that the formulas for trigonometric functions of sums of angles,
multiple angles, etc. are awkward to remember and derive; but if
one uses the definitions sin x = (e^{ix} - e^{-ix})/(2i), cos x =
(e^{ix} + e^{-ix})/2, these formulas are trivial to check, and easy
to derive from scratch.
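Here is a quick verification of that claim in Python with the sympy
package; the names SIN and COS are my own, standing for the
exponential expressions above:

  from sympy import symbols, exp, I, expand

  a, b = symbols('a b')

  SIN = lambda x: (exp(I*x) - exp(-I*x)) / (2*I)
  COS = lambda x: (exp(I*x) + exp(-I*x)) / 2

  # the angle-sum formula is pure algebra with exponentials:
  diff = SIN(a + b) - (SIN(a)*COS(b) + COS(a)*SIN(b))
  print(expand(diff))   # 0 -- no trigonometric identities needed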
If you look in the student store at some Math 185 texts, the first
few pages often discuss the history of the complex numbers.
----------------------------------------------------------------------
You comment that you can't remember seeing complex numbers (p.556)
in any science class you've taken, nor have you seen applications in
any math class.
Are you sure you didn't see them in Math 54 in the study of differential
equations? The behavior of the solutions of a linear differential
equation with real coefficients depends on the roots of the associated
polynomial; complex roots correspond to oscillating solutions, with the
frequency of oscillation depending on the imaginary part of the root,
while the real part determines whether the amplitude decays, grows, or
is stable. If you haven't seen them in science classes, that simply
means that the science classes you have taken so far were tailored for
students who had not seen complex numbers, so that all uses of them
were left out. But you certainly couldn't teach a course in quantum
mechanics without them.
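As a concrete illustration (the differential equation below is a
made-up example of my own), consider y'' + 2y' + 5y = 0:

  import numpy as np

  # the associated polynomial is r^2 + 2r + 5
  print(np.roots([1, 2, 5]))   # [-1.+2.j -1.-2.j]
  # imaginary parts +-2: solutions oscillate like cos(2t), sin(2t);
  # real part -1 < 0: their amplitude decays like e^{-t}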
----------------------------------------------------------------------
You ask about the equation e^{i theta} = cos theta + i sin theta
on p.560, asking "How do I know this is true?"
Well, the book says, in the preceding sentence and the one you point to,
that this equation is a special case of Euler's formula, introduced
on p. 132. So you should have checked out that page to see what it
says there.
On that page, it gives the formula as a definition. So one can't ask
"How do I know this is true?" -- a definition is true by definition.
You can only ask, "Why is this considered a good definition to make?"
There are various justifications one can give. The power series
argument that you gave is one.
Another is to note that e^x is the solution to the differential
equation d/dx f(x) = f(x) such that f(0) = 1. The differential
equation says, intuitively, that if one changes x by a tiny amount
Delta x, then f(x) will change by approximately f(x) Delta x. Now
if we assume that f(x) can be defined for complex x so as to
satisfy this same condition, then on letting Delta x be a small
imaginary number, the change in f(x) should be in a direction in the
complex plane perpendicular to f(x) (in the counterclockwise sense).
In particular, if we start with f(0) and look at f(i theta) as
theta increases, we see that the velocity of the vector f(i theta)
will be perpendicular to that vector, and equal to it in magnitude; so
f(i theta) will move in a circle about the origin. From the condition
f(0) = 1, we deduce that f(i theta) will be given by cos theta +
i sin theta.
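You can even watch this argument happen numerically. Here is a short
Python sketch that follows f along the imaginary axis in many tiny
steps, as described above:

  import cmath

  theta, n = cmath.pi, 100000
  dx = 1j*theta/n       # a tiny imaginary step Delta x
  f = 1.0 + 0j          # f(0) = 1
  for _ in range(n):
      f += f*dx         # f changes by approximately f * Delta x

  print(f)                     # approximately -1
  print(cmath.exp(1j*theta))   # e^{i pi} = cos pi + i sin pi = -1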
----------------------------------------------------------------------
In connection with the result that the field of complex numbers
is algebraically closed (p.560), you ask about other algebraically
closed fields.
At the end of Math 113, you will (hopefully) see the construction
by which, given any field F, and any polynomial p(t) of
positive degree with coefficients in F but not having a zero
in F, you can construct a field K containing F in which
p does have a zero. In the case where F is the field of real
numbers and p the polynomial t^2 + 1, the new field you get
is the complex numbers, and every polynomial of positive degree
has a zero in that field, so that the construction doesn't have
to be repeated any more. But for most fields F (e.g., the
rationals, or Z_2) that does not happen. However, one can repeat
the construction infinitely many (possibly uncountably many) times
so as to eventually handle all the polynomials; and the result
is then an algebraically closed field. This isn't done in Math 113,
but it is in Math 250A.
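If you would like to see the first step of that construction in
miniature, here is a Python sketch for F = Z_2 and
p(t) = t^2 + t + 1, which has no zero in Z_2 (p(0) = p(1) = 1); the
new field K has four elements a + bt:

  # elements of K are pairs (a, b) standing for a + b*t, with a, b
  # in Z_2; since t^2 + t + 1 = 0 in K, t^2 is replaced by t + 1
  def add(x, y):
      return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

  def mul(x, y):
      (a, b), (c, d) = x, y
      # (a + bt)(c + dt) = ac + (ad + bc)t + bd*t^2, with t^2 = t + 1
      return ((a*c + b*d) % 2, (a*d + b*c + b*d) % 2)

  r = (0, 1)   # the element t of K
  print(add(add(mul(r, r), r), (1, 0)))   # (0, 0): p has a zero in K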
----------------------------------------------------------------------