ANSWERS TO QUESTIONS ASKED BY STUDENTS in Math H1B, Fall 2010, Fall 2012, and Fall 2014, taught from the UC Berkeley custom text based on James Stewart's "Calculus, Early Transcendentals". (For the first two years we used a custom text based on Stewart's 6th edition, which had pages numbered as in that edition; but I converted the pagination to that of the 7th edition after the semester. When we get a copy of the 8th edition, I hope to update the pagination to that. Though we covered the material in nonconsecutive order, I have rearranged the answers below in the order of pages referred to, with Stewart's appendices at the end. Incidentally, the page that I consider a question to refer to, which determines its location in this file, is denoted "p.N" where N is the page-number, while any other pages referred to in the answer are denoted "p. N", with a space before the N; this can be of help in pattern-searching for material regarding a particular topic.) ====================================================================== You ask about Zeno's paradox (p.6) which argues that a man can never walk across a room, because he has to walk half way, then half of the remaining half, etc., and this involves infinitely many successive intervals. I don't know who first came up with a satisfactory explanation; but here is my way of looking at it: If the room is, say, one unit in width, and the man is going at the speed of 1 unit per minute, then it is true that the finite distance can be divided, as described, into infinitely many intervals; but the time of 1 minute can be divided in exactly the same way into infinitely many time intervals: half a minute to cross half the room, a quarter of a minute to cross half the remaining distance, etc.; and just as the infinitely many fractional distances add up to 1 unit of distance, so the infinitely many fractions of a minute add up to one minute; so he succeeds in crossing the room in the finite time of one minute. 
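To see numerically that the infinitely many fractions of a minute add up to one minute, one can compute partial sums of the series 1/2 + 1/4 + 1/8 + ... (a quick check of my own, not part of Zeno's argument):

```python
# Partial sums of Zeno's series 1/2 + 1/4 + 1/8 + ...: the fraction
# of the room (and of the minute) used up after n stages.  The sums
# approach 1; the shortfall after n stages is exactly 1/2^n.
def zeno_partial_sum(n):
    return sum(0.5 ** k for k in range(1, n + 1))

for n in (1, 2, 10, 30):
    print(n, zeno_partial_sum(n))
```

No finite stage reaches 1, but the sums come as close to 1 as one likes; that is exactly the sense in which the infinite series of times "adds up to" one minute.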
However, if one makes a tiny modification in the situation, then the man really is blocked from reaching the other side. Suppose we assume that half way across the room, he pauses for a tenth of a second to think "I've gotten half-way across"; and suppose that half-way across the remaining half, he again pauses for a tenth of a second to think "I've gotten half-way across that part", and so on, pausing for a tenth of a second at each of the points Zeno referred to. Even though the first and second and third pauses would add negligible time to the time it took for him to traverse his last segment of distance, by the time he got to his tenth pause, it would take a time comparable to the time he took crossing the tenth interval, and after that, the pauses would greatly outweigh the time spent walking, and with infinitely many tenth-of-a-second pauses, he really would never get across the room. (Though it would take a microscope to see that he hadn't made it.) ---------------------------------------------------------------------- You ask how the epsilon-delta definition of limit (p.110) is different from the definition that tells that a limit is when a function approaches a certain value. "Approaches" is an everyday word which by itself does not have an exact mathematical meaning. But the epsilon-delta definition is very precise; it is something that one can prove is true or prove is not true for a given f, a and L by mathematical reasoning. ---------------------------------------------------------------------- You ask how small the error epsilon in the definition of limit (p.110) has to be. The definition gives a condition that has to hold for _every_ positive real number epsilon. To give you an intuitive understanding of the concept, the author illustrates the calculations by looking at a few specific values of epsilon. 
But handling some specific values does not prove what you want; it just gives you a feel for the problem, and insight into how delta should relate to epsilon. When you are past the "exploration" stage, and ready to write a proof, you don't prove things just for certain epsilon; you must give a proof that works for every positive real number epsilon. ---------------------------------------------------------------------- You ask about applying the criterion for a function f to have limit as x->a equal to L (p.110) to the case where every interval (a-delta, a+delta) has points (other than possibly a) where f is undefined. For simplicity, in his development Stewart is assuming that the set on which f is defined includes a neighborhood of a (possibly with a itself removed). One can consider functions which do not have this property, and then one modifies the definition of limit, to say that for every epsilon there exists a delta such that for all points x of (a-delta,a+delta) (other than a) at which f is defined, one has |f(x) - L| < epsilon. Functions whose domains of definition contain "lots of gaps" are not too important in calculus, so Stewart does not give a definition that covers them. If you go on to Math 104, then you will see functions with very general domains. (If what you had in mind was functions like the square root of x, which is defined on one side of 0 but not the other, Stewart handles these using the concept of "one-sided limit", which is in Wednesday's reading.) ---------------------------------------------------------------------- You ask about Example 2 on p.112. I hope that I made clear in class that this is not "circular reasoning" but just two versions of the same reasoning: a "scratchwork" version which starts with what we want and ends by saying what \delta to use, and a precise version, which starts with the \delta found in the first part, and proves that it indeed works. 
If you feel that the first part has already shown that it works, I won't disagree; I'll simply say that it's best to give a proof that begins by laying out the value of \delta you will use, and then shows that it works. If you still have difficulty with this, let me know. If you are concerned about what a proof you write up should look like: it should look like the second part, labeled "Proof", not like the first part. ---------------------------------------------------------------------- You ask about the way Stewart outlines the proof of limit statements: first "guessing" a value of \delta that should work, then proving it (e.g., in Example 2, p.112). One can talk in terms of "guessing" when one is beginning the subject, but the point of view I emphasized in lecture was that of "planning strategically". The reality can be a mixture of the two. I gave the example of finding the right conditions to use in showing lim f(x)+g(x) = L+M. As a first naive try, we might choose \delta to make |f(x) - L| < \epsilon and |g(x) - M| < \epsilon. But we discover that this only makes |f(x)+g(x) - (L+M)| less than 2\epsilon. So we go back, and make |f(x) - L| and |g(x) - M| < \epsilon/2, and that works. If we try the proof for f(x)g(x) naively, we have to go through more steps. The deviation of f(x) from L doesn't just get added to the deviation of g(x) from M; rather, the first gets multiplied by g(x) and the second by f(x). Since g(x) is around M and f(x) is around L, we might try making |f(x) - L| < \epsilon/|M| and |g(x) - M| < \epsilon/|L|. Several problems with that: first, we have to take account of the two errors being combined together; so we put a "2" in each denominator. Second, as we said, g(x) is "around M", but it isn't generally equal to it; so we handle that by replacing the |M| in the denominator by |M|+1, and making our choice of \delta such that g(x) doesn't deviate from M by more than 1. Third, L or M might be 0, and we can't divide by 0. 
Fortunately, the +1 handles that also. ... I won't go through the details, but we end up with the proof of Law 4, which works when we've finally set things up right. So, as I say: think strategically about what conditions to put on \delta to make the argument work, and then, if your plan covered all the problems, feeding it into the proof will work. If, in trying to prove it, you find that additional conditions are needed, figure out how to work those into your choice of \delta, and try again. (I realize that you were asking about Stewart's examples with specific functions, while my answer was about general laws; but the problem is the same; just less trivial in the case of general laws.) ---------------------------------------------------------------------- You both ask how one chooses a \delta in \delta-\epsilon proofs; e.g., Example 3 on p.113. I can help best if you come to office hours. I'll base my answer on a guess that you might understand the definition of limit better if expressed using words more than symbols. In such terms, lim_{x->a} f(x) = L means that "no matter how close we want to get f(x) to L, we can guarantee this by requiring x to be close enough to a". So in Example 3, if we want to get \sqrt x "close" to 0, how "close" do we have to make x to 0? For instance, if we want to get \sqrt x within .1 of 0, or within .001 of 0 ... ? If you explore these cases, you will find that to get \sqrt x to be within a certain distance of 0, it will suffice to get x less than the square of that distance from 0. 
Now the distance from 0 less than which you want \sqrt x to be is called \epsilon, and the distance such that you hope you can achieve this by keeping x less than that distance from 0 is called \delta; and using those symbols makes it much easier to express what one is trying to achieve: what I have described awkwardly in the beginning of this paragraph can be stated nicely: given any \epsilon > 0, if we take \delta = \epsilon^2, then the condition in the definition of lim_{x->0+} \sqrt x = 0 will be satisfied; so lim_{x->0+} \sqrt x = 0 is proved. ---------------------------------------------------------------------- You both asked about the "C" in Example 4, p.114. What we are trying to do is ensure that x^2 - 9 = (x-3)(x+3) will be "small" (less than \epsilon) by making |x-3| small (less than some \delta). To know how small we need to make the first factor in order to get the product to be small, we have to know how large the second factor might be. If we let x range over all real numbers, then |x+3| could be arbitrarily large; but since the definition of a limit as x -> 3 allows us to restrict x to values near 3, we can avoid that problem. Actually, any choice we make for how close x must be to 3 will impose some bound on how big x+3 gets. So Stewart takes the simplest such choice, "distance < 1 from 3", notes that this will make |x+3| less than 7, and that allows us to complete the argument. He refers to the desired bound on x+3 as "C". One could equally well have said "we will assume x to have distance less than .5 from 3" or "we will assume x to have distance less than 1,000,000 from 3"; all of these would have led to valid proofs of the limit statement, with different values of C. Stewart just made the simplest choice. ---------------------------------------------------------------------- I hope that the answer to your question -- why |x-3| < 1 in Example 4 on p.114 -- was clear from my lecture. 
The key point is that we can *make* it < 1 by taking \delta \leq 1, which we do explicitly when we define \delta = min(1, \epsilon/7). ---------------------------------------------------------------------- You ask why, in the third display of Example 4 on p.114, we "substitute C for x+3". Look more closely! The two sides of the display are not connected by "=", but by "<". Reading the formula correctly, can you see the justification? Ask again if you still have a question about it. ---------------------------------------------------------------------- One of you notes that in Example 4, p.114, Stewart restricts x to a certain interval, and you ask whether there is an "exact" way of proving the limit. Although the way Stewart words it does sound as though restricting to the interval is cutting corners, the justification he gives is logically correct: the definition of a limit allows you to make delta as small as you wish, as long as it is positive; so whatever restrictions on it one would otherwise make, one can add the restriction that it be \leq 1. This means that the values of x for which one has to check the condition |x^2 - 9| < epsilon lie within the interval he names. So the argument he gives is in fact an exact proof. The other one asks whether, instead of restricting x to be within a distance 1 from 3, he could have used 0.5 or other numbers. Certainly! One just has to make some choice, so that we have some restriction on the size of x+3. Once we have made such a choice, we can proceed with the computation Stewart gives. ---------------------------------------------------------------------- You ask whether the Sum Law (pp.114-115) and other laws for limits can be proved simply by substitution. Substitution doesn't always work for limits. For instance, suppose f(x) is the function defined by f(0) = 1, and f(x) = 0 for all x other than 0. 
Then we can't compute lim_{x->0} f(x) by just substituting x=0 into f: f(x) = 0 as we are approaching x=0, so the limit is 0; but 0 is not the value of f(0) itself. The above goes wrong because the function f(x) is not continuous. The statement that the limit of a sum is the sum of the limits is in fact equivalent to saying that the operation of addition is a continuous function. But the operation of addition is a function of two variables, and in this course, we are only studying functions of one variable; we won't even see definitions of limits and continuity for functions of more than one variable in this course; they are introduced in Math 53. Till then, the proof of the limit law for addition (and likewise for multiplication) must be done more or less as shown in the book. (When one does have the concept of continuity of functions of several variables, the proofs that the addition and multiplication functions are continuous are very similar to the proofs in today's reading, but just a little bit simpler, in that they involve variable-symbols x and y, instead of function-symbols f(x) and g(x).) ---------------------------------------------------------------------- Regarding formula [5] in the proof of the Sum Law on p.115, you ask why one has \leq at the last step. The absolute value of the sum of two numbers can be either equal to or less than the sum of their absolute values. E.g., |3 + 7| = |3| + |7|, but |3 + (-7)| < |3| + |-7|. Either case can happen in the situation of the proof of the Sum Law: If both f(x) and g(x) are greater than L and M respectively, or are both less, then f(x)-L and g(x)-M have the same sign, and we get equality. But if one is greater and the other is less, we get strict inequality. ---------------------------------------------------------------------- You ask why Stewart talks about different values \delta_1 and \delta_2 on p.115. It is because we have two different functions, f and g, to which we are applying the definition of limit. 
Since lim_{x->a} f(x) = L, we know that to get f(x) within \epsilon/2 of L, it is enough to get x "close enough" to a; and since lim_{x->a} g(x) = M, to get g(x) within \epsilon/2 of M it is also enough to get x "close enough" to a; but in these statements, "close enough" may be different for f and for g. So we call a value that works for f "\delta_1" and a value that works for g "\delta_2". Then, as Stewart observes in the next step, min(\delta_1,\delta_2) will work for both of them. ---------------------------------------------------------------------- You ask how we find the \delta needed to verify that a function f(x) satisfies lim_{x->a} f(x) = infinity (defined on p.115). That depends on the function f(x), of course. Given a particular function, ask yourself, "Why am I sure that the function is approaching infinity?" Your answer will involve some properties of the function; and hopefully, you can translate those properties into a precise argument showing that given any M, a certain \delta will have the required property. If what you are given is not a particular function but a generic situation, such as that of Exercise 44(a) (p. 118), the idea is still the same; but in place of specific properties that a known function has, you will have to use the properties that the functions named in the situation are assumed to have. ---------------------------------------------------------------------- In connection with Stewart's definition of infinite limits on p.115, you ask how one would define infinite limits as x --> a+ or a-. I hope you can see how to answer that yourself! Just look at how the definitions of lim_{x->a+} f(x) = L and lim_{x->a-} f(x) = L are gotten by modifying the definition of lim_{x->a} f(x) = L, and make the same modifications in the definition of lim_{x->a+} f(x) = infinity. Let me know if you have difficulty with this. 
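As a concrete instance (the function and numbers here are my own illustration, not Stewart's), for lim_{x->0+} 1/x = infinity the modified definition reads: for every M there is a \delta > 0 such that 1/x > M whenever 0 < x < \delta; and the choice \delta = 1/M works. A numerical sketch:

```python
# One-sided infinite limit, illustrated on f(x) = 1/x as x -> 0+:
# for every M there should be a delta > 0 with 1/x > M whenever
# 0 < x < delta.  The choice delta = 1/M works, since then x < 1/M.
def delta_for(M):
    return 1.0 / M

for M in (10.0, 1000.0, 1e6):
    delta = delta_for(M)
    for t in (0.1, 0.5, 0.9):       # sample points in (0, delta)
        x = t * delta
        assert 1.0 / x > M
```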
---------------------------------------------------------------------- You ask about Stewart's use of M in defining limits equal to infinity, on p.115 versus N in defining limits equal to - infinity, on p.116. These are arbitrary choices that he makes in writing these definitions; the choices aren't standard. Whatever letters you choose in writing the definitions will be acceptable, if what you say about them is correct. (M and N are the common choices in such statements, but there is no common practice of using different letters depending on which infinity one is approaching.) ---------------------------------------------------------------------- You ask whether there is an easier way of showing that lim_{x->0} 1/x^2 is infinity than the one given in Example 5 on p.116. In particular, you say the use of "M" confuses you. This example shows how to get the result directly from the definition of a limit being infinite. One has to work directly from definitions to get one's first results. Once one knows how to use the definitions, one can use them to get general results, and prove specific results from these, which is often easier. To understand this proof, you should look back at the definition (on the same page, which introduces the symbol "M") and work on understanding that. Ask for help from me or the GSI if you have difficulty with this. ---------------------------------------------------------------------- Concerning the 6th line on p.116, you ask why choosing a larger M may require a smaller \delta. Well, just think of the statement that lim_{x->0} 1/x^2 = infinity. Instances of this statement are "If we take x small enough, we can get 1/x^2 bigger than 100", and "If we take x small enough, we can get 1/x^2 bigger than 10000". To do the first of these, we need to take |x| < 1/10, i.e., take \delta = 1/10 (or smaller). To get the second, we need to take |x| < 1/100, i.e., take \delta = 1/100 (or smaller). 
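In general (a sketch of my own), given M one can take \delta = 1/\sqrt M; then 0 < |x| < \delta forces x^2 < 1/M, i.e. 1/x^2 > M, which recovers both instances above:

```python
import math

# lim_{x->0} 1/x^2 = infinity: given M, take delta = 1/sqrt(M);
# then 0 < |x| < delta forces x^2 < 1/M, i.e. 1/x^2 > M.
def delta_for(M):
    return 1.0 / math.sqrt(M)

for M in (100.0, 10000.0):
    delta = delta_for(M)        # 1/10 and 1/100, as in the text
    for x in (0.5 * delta, -0.5 * delta, 0.99 * delta):
        assert 1.0 / x ** 2 > M
```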
So the bigger you want to get f(x), the smaller you have to get \delta. The general principle is that to get a strong conclusion, you need a strong condition. When we were dealing with statements of the form lim_{x->a} f(x) = L, both a "strong condition" and a "strong conclusion" referred to values being small (|x-a| and |f(x)-L| respectively). But when we look at infinite limits, or limits at infinity, one or the other of these conditions says that something should be large, while (unless we are considering infinite limits *at* infinity) the other still says that something is small. So they may seem "opposite" in nature; but they are really both cases of "strong conclusion requires strong assumption". ---------------------------------------------------------------------- You ask how to approach exercise 38, p.118. (By the way, as stated in the class handout, your Question of the Day should be on the reading itself. You can ask questions on homework in addition to a question on the reading, but they don't substitute for such a question. I'll count it this time, but please remember in the future.) If you asked about something like this at office hours, I'd give you a first suggestion to think about, see how far you could carry it, and then take you a step further ... . By e-mail, the best I can do is give you a start, and speak briefly about where to go after that. As a first step, think about how you could prove that the limit as t approaches 0 of the Heaviside function is not, say, 17. To show that the statement in the definition of lim_{t->0} H(t) = 17 is false, you want to show that it is not true that for every epsilon there exists a delta such that [the rest of the def.]; in other words, you need to show that for some epsilon there exists no such delta. I think you will find it easy to pick such an epsilon (a value such that H(t) does not stay within that distance of 17 for small t). 
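To make the suggestion concrete (the choice epsilon = 1 here is my own): no delta can work for L = 17, because H(t) = 0 for t < 0, and every interval (-delta, delta) contains such t:

```python
# Heaviside function: H(t) = 0 for t < 0, and H(t) = 1 for t >= 0.
def H(t):
    return 0 if t < 0 else 1

L, eps = 17, 1
for delta in (1.0, 0.01, 1e-9):
    t = -delta / 2                  # a point with 0 < |t| < delta
    assert abs(H(t) - L) >= eps     # H(t) = 0 there, nowhere near 17
```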
Well, if you can do it for 17, can you do it for less egregious values, such as 1/2? What about the values 0 and 1? After you have done this experimentation, can you write down a proof that it can't be done for any value of L? (If you can skip a lot of the experimentation and see your way to the proof without it, that's fine. The experimentation shouldn't go into your write-up of the proof, anyway.) ---------------------------------------------------------------------- You ask about what L to use in the proof that lim_{t->0} H(t) does not exist, in Exercise 38, p.118. If you just used L=1, you would just be showing that H(t) did not have 1 as its limit; that wouldn't show it had no limit. Similarly, if you just used L=0, you would just be showing that 0 was not its limit. So you have to give a proof that shows that for every L, not just 0 or 1, the function does not have limit L. Your argument will in fact have to depend to a certain degree on L; i.e., you will have to argue that if L is in a certain range, then the limit is not L by one computation, while if it is in another range, the limit is not L by a slightly different computation. Note also that your Question of the Day is supposed to be about the reading, not the homework. As I say on the class handout, you are *also* welcome to ask questions about the homework, but this should be *in addition to* your question about the reading. Please bear this in mind for future questions. ---------------------------------------------------------------------- You write > Can limits and continuity be determined for other types of equations, > such as parametric and polar? "Parametric and polar" aren't different kinds of equations -- they are different ways of graphing functions. The concepts of limit and continuity are properties of functions, not of how we graph them. (It is true that on p.119 Stewart describes continuity as meaning that you can graph the function "without removing your pen from the paper". 
But that is just a way of getting the idea across; it is not the definition.) ---------------------------------------------------------------------- Regarding Definition 3 on p.120, you ask about the parenthetical statement about one-sided continuity at endpoints in the case where the interval is open. When Stewart says "an endpoint", he means "an endpoint which belongs to the interval". So, for instance, if the interval is [0,1), then this applies only to the endpoint 0, while if the interval is open, the comment about endpoints doesn't apply at all. Although the parenthetical statement you are referring to, taken by itself, is ambiguous, the preceding sentence makes it clear that he is talking about "every number in the interval", which doesn't include endpoints that are excluded from the interval. ---------------------------------------------------------------------- Regarding Example 4 on p.121, you ask "What is the purpose of moving the lim to inside the square root sign?" The statement that a function f is continuous at a, namely, lim_{x->a} f(x) = f(a), can be thought of as lim_{x->a} f(x) = f(lim_{x->a} x). This is a silly way to write it, since it's easier to write "a" than "lim_{x->a} x". But it gets the point across that continuity means that "limits can pass from outside to inside f", which gives us a hint as to what to do when we want to show a complicated function is continuous: We pass the "lim" from the outside to the inside one step at a time. In Example 4 on p.121, passing it through the square root sign is one of those steps. One can also ask "What is the justification for that step?" Stewart says "(by 11)". It took me some hunting to find what he meant: By law 11 in section 2.3, on p. 101. ---------------------------------------------------------------------- Regarding the proof of Theorem 4 on p.121, you write > I believe that Stuart writes (f+g)(x) to > mean f(x)+g(x). Is that common notation ... ? Yes. 
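The convention is the "pointwise" definition of operations on functions: f+g is the function whose value at each x is f(x)+g(x). In programming terms (a sketch of my own, with made-up names):

```python
# (f+g)(x) means f(x) + g(x): the sum of two functions is the
# function whose value at each point is the sum of their values.
def fn_sum(f, g):
    return lambda x: f(x) + g(x)

square = lambda x: x ** 2
triple = lambda x: 3 * x
h = fn_sum(square, triple)   # h plays the role of f+g
print(h(2))                  # square(2) + triple(2) = 4 + 6 = 10
```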
If you are not familiar with such notation, see the section "Combinations of functions", pp. 39-41. ---------------------------------------------------------------------- In relation to Theorem 7 on p.123, which says that many sorts of functions are continuous everywhere on their domain, you ask whether there are any functions -- other than those defined by different formulas on different sets -- that are not continuous everywhere on their domains. Well, all such functions have something peculiar about their definitions; but here are three examples that aren't blatantly of the sort you refer to. Two are given in Stewart: The "greatest integer" function, [[x]], is defined on p. 105, and given as an example of discontinuity on p. 120. Another is given in Problems Plus number 2 on p. 781. For the third example, recall that one usually defines arcsin x to be the unique y between -\pi/2 and \pi/2 with sin y = x; but someone might want his arcsin function to be non-negative valued, and define it to be the smallest y \geq 0 with sin y = x. This turns out to be discontinuous at 0. (For x = 0, it gives 0, but for x < 0, it gives values > \pi.) ---------------------------------------------------------------------- You asked whether it wasn't obvious that, as stated in Theorem 8, p.125, lim_{x->a} f(g(x)) = f(lim_{x->a} g(x)). I hope that what I said in class clarified this: if as x -> a, the function g(x) approaches some b, and we then apply f to these values g(x) that are approaching b, then if f is continuous, we are fine: the values f(g(x)) will be approaching f(b); but if f is discontinuous, this need not be true. ---------------------------------------------------------------------- Regarding the technique of integration by parts (pp.464-468) you ask whether there is an easy way to decide which part of an integral should be u and which should be v'. In general, no. 
Obviously, you want the v' to be something that you know how to integrate; i.e., such that you can find an appropriate v. Among the various choices for which that is true, you have to look ahead to foresee which choice of u and v' will lead to a product u' v on the right-hand side of the integration by parts formula that you can also integrate. ---------------------------------------------------------------------- You ask whether, parallel to the development of integration by parts from the product rule for differentiation (p.464), there is a technique based on the quotient rule. Actually, the quotient rule is a version of the product rule. If we differentiate u/v = u v^{-1}, then the product rule says the result is u (v^{-1})' + u' v^{-1} = uv'(-1/v^2) + u'/v, which when brought to a common denominator gives (u'v-uv')/v^2. This used the fact that the derivative of v^{-1} is -v'/v^2, and likewise, if we have a function such as uv'/v^2 to integrate, we can say "Aha, v'/v^2 is the negative of the derivative of v^{-1} !", and use ordinary integration by parts, with v^{-1} as one of the functions. ---------------------------------------------------------------------- Regarding example 6 on p.467, you ask Why is u=sin^{n-1} x? Why is v=-cos(x)? First, I hope you saw the word "Let" before those equations. In other words, the author is not saying that we can tell from the integrand sin^n x dx that we must take u=sin^{n-1} x and v=-cos(x). In the technique of integration by parts, we have to start with some factorization of the integrand, and he is saying, "Let's try the simplest factorization, the one into sin^{n-1} x times sin x dx, noting that we can write the latter as d(-cos x)." You then ask what happens to the coefficient 1/n on the right-hand side of the equation to be proved. In this example, think of the left-hand side of that equation as what is given, and suppose we didn't know the right-hand side. 
Starting with the above choices of u and v, we apply integration by parts, as the author shows. We get an equation in which the original integral occurs on both the left and the right-hand sides, with coefficients 1 and -(n-1). As shown on the next-to-last equation of the example, we can combine these into one term on the left, with a coefficient of n. Dividing by that n, we get the desired formula, with "1/n" on the right. ---------------------------------------------------------------------- You ask whether there are reduction formulae for integrals of powers of other trigonometric functions, in addition to those developed for the sine and cosine (Example 6 on p.467, and Exercise 48 on p.469). Yes. On the one hand, to get such a formula for the secant, you can just take the reduction formula for the cosine and put a negative value in for the exponent. This will give an equation expressing the integral of a "smaller negative" power of the cosine in terms of the integral of a "larger negative" power; equivalently, expressing the integral of a smaller power of the secant in terms of the integral of a larger power; but it can be turned around to express the integral of the larger power in terms of the integral of the smaller power. To get a reduction formula for powers of the tangent, one has to start from scratch; but it can be found. The results for tangent, cotangent, secant and cosecant are given as formulas 75, 76, 77 and 78 in the table of integrals (on the sheet I handed out on Wednesday). ---------------------------------------------------------------------- Regarding the first two displayed equations on p.468, you write that "cos^2 x becomes -(n-1)\int sin^n x dx." That is not what is happening -- look more carefully. At the top of the page there is a formula in which the integrand contains "cos^2 x". 
In the next line the author notes that cos^2 x = 1 - sin^2 x; so try substituting 1 - sin^2 x for the cos^2 x in the right-hand side of the formula at the top of the page, multiplying out, and seeing what you get! As I said the first day of class, reading a math text should be a struggle with the text. Not a struggle because the author is against you, but a struggle to make the methods being introduced a part of your thinking. This means doing calculations yourself when the results are not obvious. ---------------------------------------------------------------------- You ask, in connection with the last sentence of Example 6, p.468, how the reduction formula can be used repeatedly. The formula proved expresses the integral of sin^n x in terms of the integral of sin^{n-2} x. If n-2 \geq 2, you can apply the same formula to that new integral, with "n-2" in place of "n". (E.g., if you started with the integral of sin^5 x, then the reduction formula with n=5 expresses this in terms of the integral of sin^3 x; and you can then apply the reduction formula with n=3 to that, and express it in terms of the integral of sin x.) So if you start with any odd integer n, you can reduce the integration successively to the corresponding integrals with exponent n-2, n-4, ..., 3, 1; while if you start with even n, you can again reduce to the cases n-2, n-4, ..., etc.; since these are even, we will end with ..., 2, 0. In either case, the final integral is one we can easily do. ---------------------------------------------------------------------- Regarding Exercise 37 on p.469, you ask how you would know to use substitution if the exercise didn't say so. Well, we haven't had a formula for integrating cos \sqrt x, but we have had a formula for integrating cos t; substitution is a tool that will convert one into the other. (It will also bring in another factor, dx/dt, but we can hope that this won't create an intractable problem.) 
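Carried through (a check of my own, with the limits 0 to 1 chosen just for definiteness): the substitution x = t^2, dx = 2t dt turns the integral of cos \sqrt x into the integral of 2t cos t, and integration by parts then gives the antiderivative 2(t sin t + cos t) with t = \sqrt x. A numerical comparison against an independent Riemann sum:

```python
import math

# Integral of cos(sqrt(x)) from 0 to 1, two ways:
# (a) via the antiderivative 2(t sin t + cos t), t = sqrt(x),
#     obtained by the substitution x = t^2 plus integration by parts;
# (b) via a midpoint Riemann sum, as an independent check.
def antiderivative(x):
    t = math.sqrt(x)
    return 2.0 * (t * math.sin(t) + math.cos(t))

exact = antiderivative(1.0) - antiderivative(0.0)

n = 100000
h = 1.0 / n
riemann = sum(math.cos(math.sqrt((k + 0.5) * h)) * h for k in range(n))

assert abs(exact - riemann) < 1e-6
print(exact)    # about 0.7635
```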
Stewart has told us to do this here because the focus of this section is integration by parts, and the student may not have the technique of substitution in mind. But in general, one has to ask "What tools do I know, and which of them is likely to bring the problem into a form I can solve?" As I said the first day, integration is an example of an "inverse problem", where instead of having straightforward methods that always give the answer, one needs ingenuity to discover a method that will work. After giving various techniques, Stewart will discuss "Strategy for integration" in section 7.5 (reading #5). ---------------------------------------------------------------------- You ask when one would need to use the identity [(sin x)(cos x) = (1/2)(sin 2x)] (p.473) in integration. Well, if one was integrating sin^4 x cos^4 x, it would be convenient to turn this into (sin x cos x)^4 = (1/2 sin 2x)^4 = (1/16) sin^4 2x, before proceeding further. One could use the methods the author describes without this first step; but the step would make things shorter. It can also help when one is trying to decide whether solutions one got by two methods agree. If one solution involves sin x cos x and the other involves sin 2x, then one can convert between these expressions and check. ---------------------------------------------------------------------- You ask whether there are formulas for the integrals of sin^m x cos^n x (p.473) and tan^m x sec^n x (p.474) that work whether m and n are odd or even. Well, for the sine and cosine, once one has learned a little about calculus with complex numbers (numbers x+iy, where i is a square root of -1), one can use the formulas in Exercise 48 on p. A64 (which come from equation 6 on the preceding page); this reduces the problem to integrating exponential functions (which is easy).
But one has to go through the algebra of substituting the formulas from that exercise into the integrand, expanding the result algebraically, and after integrating, converting the answers back to trigonometric form. For the tangent and secant, the same idea, together with the substitution u = e^{ix}, converts the problem into one of integrating rational functions, which we will learn about in reading #4. This is messier than the preceding case, but it also works. We'll discuss the complex interpretation of trig functions referred to above when we get to reading #25; though it will only be a small part of that reading. ---------------------------------------------------------------------- You're right: in point (a) of the box on p.474, where Stewart has k\geq 2, he should have k\geq 1. I'll include that in the letter of corrections I send him at the end of the semester. Thanks! ---------------------------------------------------------------------- You asked about the 1/2 before the answer to Example 8, p.476. It comes from the fact that if we write "I" for the integral we are trying to find, then the preceding equation has the form I = -I + other stuff (though the -I occurs in the middle of the other stuff). So when we solve this, we get 2I = other stuff, or I = 1/2 (other stuff). ---------------------------------------------------------------------- You ask how one gets the trigonometric formulas on p.476. I hoped to sketch how one gets them in class Friday and again today, but didn't have the time. One nice geometric way to see them is to draw a picture with unit vectors in the plane coming out of the origin at angles A-B and A+B. (To make the situation easy to draw, let A be a relatively large angle -- say 40 degrees -- and B a relatively small one, say 10 degrees.) So those two vectors will have coordinates (cos(A-B), sin(A-B)) and (cos(A+B), sin(A+B)). Hence the midpoint of those vectors will have coordinates (1/2 [cos(A-B) + cos(A+B)], 1/2 [sin(A-B) + sin(A+B)]).
But that midpoint will clearly come out of the origin at the average of the angles A-B and A+B, namely, A; and it is not hard to compute that its length will be cos B; hence its coordinates will be (cos A cos B, sin A cos B). Equating this with the previous formula, we get (a) and (c) of the formulas on p. 476. You can easily get (b) by putting \pi/2 - B for B in (a). ---------------------------------------------------------------------- Both of you asked how to derive a trigonometric identity Stewart uses; and though the identities were different, the answer is the same: they can both be derived from identities 12a and 12b on p. A29. Namely, on the one hand, if you take x = y in 12a, then use the formula cos^2 = 1-sin^2 to get rid of the "cos^2" in the resulting equation, and solve that equation for sin^2 x, you get the "half-angle formula" that Stewart uses in Example 4 on p. 472. (But I don't know why he calls it a "half-angle formula".) It also appears as 17b on p. A29. On the other hand, putting -y in place of y in equation 12a, one gets equation 13a; adding these two equations and dividing by 2, one gets equation 2a on p.476 (18a on p. A30); while treating 12b in the same way (but both adding and subtracting), one gets 2b and 2c (18b and 18c). There is an interesting intuitive interpretation of the equation one gets from, say, 2a on p.476 if one writes A = ax, B = bx, and assumes a is large compared to b. Then the right-hand side represents (except for the factor 1/2) the sum of two sine waves of slightly different frequencies. We know that two such sine waves slowly go in and out of phase, so that their sum looks like a sine wave of intermediate frequency (i.e., the sin ax on the left-hand side) whose amplitude slowly rises and falls -- and that rise and fall is what the factor cos bx on the left achieves.
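For anyone who wants a quick machine check of these product-to-sum formulas, here is a little Python sketch (the sample angles are arbitrary choices of mine, nothing from the text):

```python
import math

# spot-check the product-to-sum identities at a few arbitrary angles
for A in (0.3, 1.1, 2.5):
    for B in (0.2, 0.7):
        # sin A cos B = (1/2)[sin(A-B) + sin(A+B)]
        assert abs(math.sin(A) * math.cos(B)
                   - 0.5 * (math.sin(A - B) + math.sin(A + B))) < 1e-12
        # cos A cos B = (1/2)[cos(A-B) + cos(A+B)]
        assert abs(math.cos(A) * math.cos(B)
                   - 0.5 * (math.cos(A - B) + math.cos(A + B))) < 1e-12
print("identities check out")
```

Of course this only tests the identities at finitely many points; the geometric argument above is what proves them for all angles.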
---------------------------------------------------------------------- You ask about Stewart's comment on Example 9, p.476, that \int sin 4x cos 5x dx could be done by integration by parts. Good question! If I give you the hint to try the method of Example 4 in the preceding section (p. 466), do you think you can take it from there? I may add a version of this question, with a little more detail, to the next homework. ---------------------------------------------------------------------- You ask why we are able to use trigonometric functions to solve problems not involving such functions (p.478 et seq.). I'm not sure in what sense you are asking "why". From the point of view of mathematical rigor, note that whenever you have a number x between -1 and 1, you can find some angle \theta such that x = sin \theta, and then do computations involving x based on the properties of the trig functions of \theta. The trigonometric substitutions used in this section are all based on this idea. In the end, once one has done the integration, one substitutes back, and translates one's answer into a function of the original x. If you are asking why trigonometric substitutions should be the "right way to go" for these computations, recall the motivation I gave in class on Wednesday when previewing this topic: That if we want to integrate \sqrt{1-x^2}, and we draw the picture of the area we want to compute, we see that one piece of it is a sector of the unit circle, between the y-axis and the radius to the point (x, \sqrt{1-x^2}). The area of that sector is equal to half the angle \theta it subtends, and that angle satisfies sin \theta = x. This suggests using the substitution x = sin \theta. We try it out, see that it works, and find that the properties that make it useful will apply in general to expressions involving \sqrt{1-x^2}.
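As a numerical illustration of this motivation (the upper limit 1/2 and the crude Riemann-sum routine below are my own choices), one can check that the substitution x = sin \theta really does give the same value, and that both match the antiderivative (\theta + sin \theta cos \theta)/2:

```python
import math

def midpoint_sum(f, a, b, n=200000):
    # crude midpoint Riemann-sum approximation of the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# integral of sqrt(1 - x^2) from 0 to 1/2, done directly...
direct = midpoint_sum(lambda x: math.sqrt(1 - x * x), 0.0, 0.5)
# ...and via x = sin(theta), dx = cos(theta) d(theta), so the integrand becomes cos^2(theta)
sub = midpoint_sum(lambda t: math.cos(t) ** 2, 0.0, math.asin(0.5))
# closed form from the antiderivative (theta + sin theta cos theta)/2
theta = math.asin(0.5)
exact = 0.5 * (theta + math.sin(theta) * math.cos(theta))
print(direct, sub, exact)  # all three agree closely
```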
---------------------------------------------------------------------- You ask why we don't use (-\pi/2, \pi/2) as the interval of definition of the secant function in making trigonometric substitutions, as noted in the table on p.478. Because the secant function is not one-to-one on (-\pi/2, \pi/2) -- look at its graph. In fact, you can see from its graph that there is no interval on which it is both one-to-one, and takes on the whole range of its values, (-infinity, -1] and [1, infinity); so in any given substitution, one will generally want to use either an interval on which it takes on the former set of values, or one on which it takes on the latter. ---------------------------------------------------------------------- You ask why one needs to have an inverse function to g in an integration by reverse substitution, as described on p.478. Well, when one succeeds in integrating f(g(t)) g'(t)dt, one gets a function of t, and one needs to express t as a function of x in order to turn this into a function of the original variable x. If g is many-to-one, one may be able to choose, for each value of x, _some_ value of t that maps to it, but this can get complicated. For simplicity, Stewart considers the case where g has an inverse function h, and we can simply substitute t = h(x). For a function such as sin, one gets the existence of an inverse by restricting its domain and codomain, regarding it as a function from [-\pi/2,\pi/2] to [-1,1]. Then sin^{-1}: [-1,1] -> [-\pi/2,\pi/2] is the inverse function. (As, say, a function with domain [0,2\pi], and/or with codomain the whole real line, it has no inverse.) (For more on functions and inverses, cf. the handout on Sets, Logic, etc., in particular, the last item on p. 2, "f: X --> Y", and the first item on the next page, "f^{-1}".) ---------------------------------------------------------------------- You ask why integration by substitution, as used on pp.478-483, is reasonable.
Well, let's think of an integral involving \sqrt{a^2 - x^2} (where a is positive). That expression is defined only for x in [-a,a], and for each x in that interval, there is a unique \theta in the range [-\pi/2, \pi/2] such that x = a sin theta. So we can think of each x as corresponding to some such \theta, and study our integral by thinking, "for each value of \theta, what does our expression in x and \sqrt{a^2 - x^2} equal; and as \theta changes by a tiny amount, what is the corresponding tiny change in x?" We can then compute the resulting integral, being careful to remember that the values of x are not the same as the values of theta, and the change in x ("dx") is not the same as the change in theta, but, rather, that one is determined by the other in each case. If we were to forget these distinctions, the computation would not be valid; if we keep them in mind, it is. You ask whether there can be a problem with domains. Yes; I sketched in class yesterday a kind of situation where there would be. If we had an integral where x ranged from -1 to 1, and we wanted to make the substitution x = 1/t, then although x = -1 and x = +1 would correspond to t = -1 and t = +1, we could not just write our result as an integral from t = -1 to t = +1. Rather, as x ranges from -1 upward to +1, t goes from -1 down to -infinity, and then from +infinity down to +1. We haven't yet studied integrals over ranges that "go to infinity", but we will in reading #7; and if we handle the range correctly, a substitution of the above sort will work; but not if we blindly integrate "from t = -1 to t = +1". You also worried about the fact that I motivated trigonometric substitutions by a geometric approach you would not have thought of. Don't worry -- if it were something I could have expected students to see for themselves, I wouldn't have presented it in lecture. 
It's good that you should try to come up with ideas for yourself; but since you're taking the class, that means that you know you can't come up with everything that way! ---------------------------------------------------------------------- You ask how one chooses the limits of integration after a change of variables, in an integration like that of Example 2, p.479. If one is letting x = g(t), one has to choose the range of values that t runs over so that x will run over the given range, taking on each value just once. In the problem shown, Stewart has reduced to the case where x runs from 0 to a, so he needs to take a range of values of theta which will make x = a sin theta do this. The most convenient such range is the range from 0 to pi/2; though other ranges, such as 2 pi to (5/2) pi, would also work. ---------------------------------------------------------------------- You ask how the author gets the formula csc \theta = \sqrt{x^2+4}/x using the triangle in the margin at the bottom of p.480. He has defined \theta by the formula x = 2 tan \theta at the beginning of the discussion of the problem. This means that \theta will be the angle of a right triangle in which the ratio of the opposite and adjacent perpendicular sides is x : 2, so he draws such a triangle. The Pythagorean theorem then makes the hypotenuse \sqrt{x^2+4}, so the cosecant is \sqrt{x^2+4}/x. The same result could be gotten without drawing a diagram. We know that sec\theta = \sqrt{1+tan^2 \theta} = \sqrt{1 + (x/2)^2} = \sqrt{1 + x^2/4} = \sqrt{(4 + x^2)/4} = \sqrt{4 + x^2}/2. Now csc\theta = sec\theta/tan\theta, so this is (\sqrt{4+x^2}/2)/(x/2) = \sqrt{4+x^2}/x . ---------------------------------------------------------------------- You ask why a>0 in Example 5, p.481. The case a = 0 would have to be done by different methods, since in that case one can't write x = a sec theta.
The case a < 0 does not logically have to be excluded; the calculation that Stewart does for a > 0 works for any nonzero a. But since a^2 = |a|^2, if the function has a < 0 we can always rewrite it using |a| in place of a, so Stewart chooses the positive value. I suppose his thought is to do so "just in case" the question of which sign a has would complicate some later computation; even though it actually wouldn't. A case where the choice would make a difference is Example 1 on p. 479. One could take x = -3 sin theta, but then the signs of most of the equations that follow are reversed; in particular, the sign in the formula for cot theta in the middle of the page has to be reversed. The final answer would be the same, but the details of the calculation would be different. ---------------------------------------------------------------------- You ask about reversing the trigonometric substitutions in the indefinite integrals of section 7.3, in particular Example 5, p.481. In that example, the substitution was x = a sec \theta, so the inverse substitution is \theta = sec^{-1} x/a. When one substitutes this into an expression like tan \theta, one has to figure out what tan (sec^{-1} x/a) is as a function of x. From the diagram in the left margin, one concludes that this is \sqrt{x^2-a^2} / a. An equivalent way to do this is to express everything as a function of a sec \theta, and then put x everywhere for a sec \theta. For instance, one writes tan \theta = \sqrt{(sec^2 \theta) - 1} = \sqrt{(a sec \theta)^2 - a^2} / a = \sqrt{x^2-a^2} / a. ---------------------------------------------------------------------- You ask why, near the end of solution 1 to Example 5 on p.481, the author rewrites - ln a + C as C_1. Since C can be any number, and ln a is a fixed number, the sum -ln a + C can also be any number; so the answer is put in simpler form if instead of writing the two terms "-ln a + C", we regard their sum as a single arbitrary constant.
In a table of integrals, we would just call this constant "C". But since Stewart has used the symbol "C" already in this derivation, he calls the new constant "C_1". (Mathematicians, among themselves, might say or write something like "renaming -ln a + C as C", and end the calculation with "... + C"; Stewart might do this himself in writing to a fellow mathematician. But since he is writing here to students, he is very careful to avoid that notational shortcut which might lead to confusion.) ---------------------------------------------------------------------- You ask about the step at the end of Solution 1 to Example 5, p.481, where the author says "Writing C_1 = C - ln a ...". The constant of integration denotes "any constant"; so the form Stewart gets at the next-to-last step means "what you get when you take any constant and subtract ln a from it". Obviously, that just comes down to "any constant", so it would be silly to express it in a more complicated way. You probably wouldn't lose points on an exam for not making this simplification; but it is certainly preferable to give one's answers without unnecessary complications. ---------------------------------------------------------------------- You asked about the solution to Example 5 on p.481 based on hyperbolic functions. Those functions are defined and discussed in section 3.11. They aren't nearly as important as the trigonometric functions, and we won't be paying much attention to them; but you can see from the formulas at the bottom of pp. 258 and 259 that they have properties very much like those of the trigonometric functions, which allow them to be used in a similar way in solving differential equations. When we study complex numbers in reading #26, we'll get some insight into the relation between hyperbolic and trigonometric functions. 
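To see concretely how close the parallel between hyperbolic and trigonometric functions is, here is a small Python check (the sample points are arbitrary choices of mine) of two hyperbolic analogues of familiar trigonometric identities:

```python
import math

# check hyperbolic analogues of trigonometric identities at a few sample points
for t in (0.0, 0.5, 1.3, 2.0):
    # cosh^2 t - sinh^2 t = 1  (compare cos^2 t + sin^2 t = 1)
    assert abs(math.cosh(t) ** 2 - math.sinh(t) ** 2 - 1.0) < 1e-9
    # double-"angle" formula: sinh 2t = 2 sinh t cosh t  (compare sin 2t = 2 sin t cos t)
    assert abs(math.sinh(2 * t) - 2 * math.sinh(t) * math.cosh(t)) < 1e-9
print("hyperbolic identities hold")
```

It is these parallel identities that let the substitution x = a cosh t play the same role for \sqrt{x^2 - a^2} that x = a sec \theta does.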
---------------------------------------------------------------------- Regarding Solution 2 to Example 5, p.481, you ask "How is it that \cosh^{-1}(\frac{x}{a}) = \log|x + \sqrt{x^2 - a^2}|?" Write the equation \cosh(y) = \frac{x}{a} using the definition of \cosh(y) in terms of exponentials; multiply by e^y to get rid of exponentials with "negative" exponents; regard the result as a quadratic equation in e^y; solve using the quadratic formula; and take the logarithm to find y. You'll get the above formula, but without the absolute value signs. This is because Stewart has assumed x > 0, so that the expression in question is positive. If x < 0, then one has to use the substitution x = -a cosh t instead of x = a cosh t. ---------------------------------------------------------------------- You ask about the words "proper" and "improper" as used in the top part of p.485. These are words taken from elementary-school arithmetic: A fraction like 9/4 is called an "improper fraction", because its numerator is larger than its denominator. One simplifies it to 2 1/4, the sum of an integer, 2, and a "proper fraction", 1/4. The author is extending the terms to rational functions, calling a rational function P(x)/Q(x) "proper" if P(x) has smaller degree than Q(x), and "improper" otherwise; and he notes that an improper rational function can be reduced to the sum of a polynomial and a proper rational function. These terms probably won't come up again, either in this course or in other math courses. (There is another use of "proper" and "improper" that you will see in the later courses, regarding subsets of a set; that meaning is entirely different.) ---------------------------------------------------------------------- You ask how we know that "long division of polynomials", used on p.485, is valid. If we start with polynomials P(x) and Q(x), which begin a_n x^n + ... and b_m x^m + ... 
, and where n \geq m, then the first "step" in the long division process consists of writing the term (a_n/b_m) x^{n-m} as the beginning of the quotient. Now let us write P(x) = (a_n/b_m) x^{n-m} Q(x) + (P(x)-(a_n/b_m) x^{n-m} Q(x)). Then the right-hand term P(x)-(a_n/b_m) x^{n-m} Q(x) will be a polynomial of lower degree than P(x), since the highest degree terms of P(x) and (a_n/b_m) x^{n-m} Q(x) are equal, and cancel when we subtract. We can now write P(x)-(a_n/b_m) x^{n-m} Q(x) as a polynomial beginning with an x^{n-1} term (which might or might not be zero), and if n-1\geq m, we can repeat this process, subtracting a constant multiple of x^{n-m-1} Q(x) from P(x)-(a_n/b_m) x^{n-m} Q(x) so as to remove the x^{n-1} term. Continuing this process, we end up with an expression P(x) = S(x) Q(x) + R(x), where S(x) and R(x) are polynomials, with R having degree < m. Long division of polynomials is simply a way of writing out this process conveniently. As noted above, the first term that we write in the quotient is (a_n/b_m) x^{n-m}; the subtraction that we then do corresponds to subtracting (a_n/b_m) x^{n-m} Q(x) from P(x), etc. ---------------------------------------------------------------------- You ask why we put x=a for A(x+a) and x=-a for B(x-a) in Example 3 on p.487. This follows the procedure described for the preceding Example, in the "NOTE" between these examples. Did you skip that Note, or not understand it? If you have trouble with it, I'll be glad to help; but you should certainly read it and see whether you can follow it, rather than skipping it and then asking about the same method when it is re-used in the next example. ---------------------------------------------------------------------- You ask about the "repeating/extra" factors in the examples on p.488 et seq. These are introduced in the discussion beginning at the bottom of p. 487 and continuing at the top of p. 488.
I will try to discuss the "why" of it in class, but assuming you had read that introductory paragraph, you shouldn't have been surprised when they appeared in the examples. Likewise, his discussion of Case IV, beginning at the bottom of p. 490, introduces the corresponding phenomenon with quadratic factors, which appears in Examples 7 and 8. Moral: read his discussion of the topics in the book, not just the examples. And if something you don't understand comes up in such a discussion, ask about that. ---------------------------------------------------------------------- You ask why, in the solution to Example 4, p.488, the constant of integration is written K rather than C. This is because the letter "C" has already been used to denote one of the undetermined coefficients (starting in the display before [8]). A constant can be denoted by any letter, but one has to avoid using the same letter to denote two different things. (Unless one makes very clear that one is changing notation. E.g., it is OK if one writes explicitly "From this point on, we will use n to denote what we previously called 2n"; or, in the case at hand, "Since we have found the values we have been denoting A, B, C, we no longer need to use those letters to denote those values, and will feel free to use C for a constant of integration below". But when one is not short of symbols, it is simpler not to go through a change of notation like that, and just to use a different letter.) ---------------------------------------------------------------------- You ask to be shown how to integrate the function shown in the second display on p.489. If you need help with this problem, you should carry the calculation as far as you can using the techniques that the book describes, then tell me what difficulty you encounter, what you need clarified about the next step, etc., and I will do my best to help. So please re-send your question, following the above guidelines. 
---------------------------------------------------------------------- You ask where we get formula [10] on p.489. I showed in class how to get it by trigonometric substitution. (I just did the case a = 1, but the principles of trigonometric substitution shown in the previous section tell you what to do for any a.) Alternatively, you can recall that the derivative of tan^{-1} x is 1/(x^2 + 1). This gives the integral of 1/(x^2 + 1); and by a change of variable, you can reduce the integral of 1/(x^2 + a^2) to that integration. ---------------------------------------------------------------------- You ask why Example 8 on p.491 has three terms, when there are only two distinct factors in the denominator. First note that this is what Stewart has told you would be the case, in formula [11] at the bottom of p. 490: When there is a repeated irreducible factor, one gets a sum of terms with different powers of that factor as their denominators, rather than just a single term. So there should have been no surprise in seeing this in the example he gives. Note also that this is analogous to what happens when the factors are linear rather than quadratic: see formula [7] near the top of p. 488. There are different ways of explaining "why" this happens. The approach that I used in class, for the case of linear factors, was to note that when a linear factor x-a occurs just once, then the rational function goes to infinity as x --> a "like" some function A/(x-a), and that by subtracting A/(x-a) from the function for an appropriate A, one can make the function smooth at x=a; in other words, get a function whose denominator does not have a factor x-a. On the other hand, if the denominator is divisible by (x-a)^r, then one has to do things stepwise: first subtract a term A_r/(x-a)^r that causes the number of factors (x-a) in the denominator to decrease by 1; then subtract a term A_{r-1}/(x-a)^{r-1} that causes the number of factors (x-a) in the denominator to decrease again, and so on.
We then end up with an expression for our rational function as A_r/(x-a)^r + A_{r-1}/(x-a)^{r-1} + ... + A_1/(x-a) plus a rational function r(x) with no x-a in its denominator (and then we start working on the other factors in the denominator of r(x)). This shows why we get such expressions in the case where all divisors of the denominator are linear; and once we see this, it is no surprise that we get similar expressions when there are quadratic factors. ---------------------------------------------------------------------- You ask about the last display in Example 9, p.492, and how the author goes from \int \sqrt{x+4}/x dx to 2\int du + 8\int du/(u^2 - 4). He is using the computation of the preceding display, which has converted \int \sqrt{x+4}/x dx to 2 \int (1+ 4/(u^2 - 4)) du. In the first line of the display you ask about, he carries this one easy step further. It is only on the next line that he applies Formula 6. ---------------------------------------------------------------------- You ask whether the Weierstrass substitution t = tan(x/2), shown in Exercise 59 on p.493, is a good way to evaluate the integrals in section 7.3. Well, it gives a method that will work if all else fails. It expresses the resulting integrals in terms of the tangent of x/2, rather than in terms of trigonometric functions of x. This can be fixed using the formula tan(x/2) = (sin x)/(1+cos x). But when the methods of section 7.3 give an easy solution, I suspect that this substitution will give a much lengthier path to the same answer. ---------------------------------------------------------------------- Concerning example 4(c) on p.496, you write: > The function integrated is 1/(1-cos x). This is done by multiplying > and dividing by 1+cosx. Can we do this even if the integral is a > definite integral from pi/2 to 3pi/2? The value of cos x at pi is -1 > so 1/(1+cosx) doesn't exist at this value. Can we still apply this > method to solve this integral? Good point!
The answer is "yes and no". When one does the integration, one gets a function which is guaranteed to have the right derivative except at the points where the modified integrand is undefined. But since the original integrand was continuous at some of these points, such as pi, and is equal to the modified integrand except where that is undefined, we can expect that its integral should agree with the integral of the original function where it is defined. So we should hope that when we compute the integral, we should be able to "fill in" the values at points like pi to get a differentiable function which is the desired integral. If we continue the integration where Stewart leaves off, we get -cot x - csc x, still undefined at x = pi; but we notice that -cot x goes to +infinity as x -> pi from below, while -csc x goes to -infinity, so there is a hope that their sum will behave reasonably. Expressing cot and csc in terms of sine and cosine, we get -((cos x) + 1)/sin x, where numerator and denominator both vanish at x = pi. How can we simplify that? We would like the fact that (cos x) + 1 goes to 0 at pi to be a result of the fact that some sine or cosine goes to zero at that point, so that we can hope to cancel such a sine or cosine in the denominator. An expression that represents the zero of (cos x) + 1 at pi as the zero of such a function is the half-angle formula, (cos x) + 1 = 2 cos^2 x/2. So we also apply the half-angle formula to the denominator, writing sin x = 2 sin x/2 cos x/2. Then -((cos x) + 1)/sin x becomes -(2 cos^2 x/2)/(2 sin x/2 cos x/2), which simplifies to -cos x/2 / sin x/2 = -cot x/2. We can check that this is an antiderivative of the original function. (When we differentiate it, we get (1/2) csc^2 x/2 = 1/(2 sin^2 x/2). Using the half-angle formula once more turns this into 1/(1-cos x), our original integrand.) 
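Here is a quick numerical sanity check, in Python, that -cot x/2 really does differentiate back to 1/(1-cos x), including at the formerly troublesome point x = pi (the step size and sample points are my own choices):

```python
import math

f = lambda x: 1.0 / (1.0 - math.cos(x))   # original integrand
F = lambda x: -1.0 / math.tan(x / 2.0)    # candidate antiderivative, -cot(x/2)

# check F'(x) = f(x) by central difference at several points, including near
# x = pi, where the intermediate form -cot x - csc x was undefined
h = 1e-6
for x in (math.pi / 2, math.pi - 0.01, math.pi, math.pi + 0.01, 3 * math.pi / 2):
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-4
print("F is an antiderivative of f on (0, 2 pi)")
```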
Since it is in fact continuous at x = pi, the definite integral of the original function from pi/2 to 3pi/2 is the difference between the values of this function at those points; and since it equals the result that Stewart's calculation leads to, -((cos x) + 1)/sin x, at pi/2 and 3pi/2, the difference between the values of that function at those two points is the correct integral. But as you pointed out, we couldn't have known that without finding this alternative expression for it. This development suggests another way of doing the original integration: applying the half-angle formula 1 - cos x = 2 sin^2 x/2 to the denominator of the integrand. And in fact, it gets the same result faster. ---------------------------------------------------------------------- You ask whether there is an alternative approach to \int dx / (1-cos x), given in (c) under heading 4 on p.496. Yes. One can turn the "half-angle" formula sin^2 x = (1-cos 2x)/2 backwards, getting 1-cos x = 2 sin^2 x/2, so that the integral becomes 1/2 \int dx / sin^2 (x/2) = 1/2 \int csc^2 (x/2) dx = -cot x/2. (With the help of the formulas in Exercise 59(b) on p. 493, one can, if one wishes, express cot x/2 = (cos x/2)/(sin x/2) as (1 + cos x)/ sin x, thus expressing this integral in terms of the sine and cosine of x.) ---------------------------------------------------------------------- You ask how one can prove that the antiderivative of a function is not elementary (p.499). Interesting question. I don't know the details of the answer, but hunting around, I find that a key step is the result described in http://en.wikipedia.org/wiki/Liouville's_theorem_(differential_algebra) I haven't seen the proof of that result; and it certainly takes work to show that a given function doesn't satisfy the criterion which that result gives; but at least it shows the kind of method used.
I realize that the article takes for granted some other concepts, such as "differential field", which you haven't seen; so it is a still less complete answer for you than for me. If you have questions about it, ask me at office hours. Anyway, I'm impressed that the result was proved as far back as the early 1800's. ---------------------------------------------------------------------- In connection with the introductory paragraphs of section 7.7 (p.506), you ask how we know that functions which we can't integrate explicitly do have integrals. Theorem 3 on p. 373 shows that any continuous function on a closed interval has an integral. The proof would be hard to give using the material of this course; but you'll certainly see it proved if you take Math 104. In that result, "integral" is defined as a certain sort of limit of partial sums. If you were thinking of "integral" as meaning "antiderivative", then you can get the existence of that by combining the above result with the Fundamental Theorem of Calculus: p. 388. ---------------------------------------------------------------------- You ask what to look for in deciding which method of approximate integration (pp.506-515) to use. The progression in this section of the book is from the naive to the more sophisticated; so if you really wanted to do a computation, the best of the techniques described in the section would be one we will see in Wednesday's reading, "Simpson's rule". On the other hand, the easiest ones to remember, and to apply in a "scratchwork" first-approximation, are the ones in today's reading, the easiest being those at the start. We'll see on Wednesday why some approximation methods work better than others. ---------------------------------------------------------------------- You ask about the relative values of Riemann sums versus the methods of approximate integration described in section 7.7 (pp.506-515). 
The biggest value of Riemann sums is not in practical computations, but in developing the theory of integration, something you will see if you take Math 104. The formal definition of an integral says that \int_a^b f(x) dx exists and is equal to A if the Riemann sums approach A no matter what "sample points" x*_i one takes in the various subintervals, and no matter what subdivisions of [a,b] into equal or unequal subintervals one uses, as long as one lets the lengths of these subintervals approach 0. (See p. 373, Note 4 for divisions into unequal subintervals. Stewart justifies these in terms of a practical application, but they are essential to the general development.) In Math (H)1AB, results we prove about integrals are based on the fact that if f is continuous, integration of f gives an antiderivative of f; so one can prove results such as the formula for change of variables using results on differentiation. But in the general theory, one integrates functions that may not be continuous, and in that case, the integral may not be differentiable; so one has to develop the general theory of integration without relying on the theory of differentiation. Then results like the change-of-variable formula require the general definition of a Riemann integral based on not-necessarily-equal subdivisions. (Because when one changes variable, equal subdivisions generally become unequal.) ---------------------------------------------------------------------- You ask about approximate integration (introduced on p.507) in the case of integrals over infinite domains. Good question!
For this to make sense, the function must go to zero fairly rapidly as x --> infinity (and/or minus-infinity, as the case may be), and it may be possible to show that the integral from a certain point on will be less than some small constant, and then apply one of the methods of this section to the integral over the remaining finite region; and then, adding the two bounds, get an error bound for the difference between the approximation of the integral over the finite region and the exact integral over the whole region. Alternatively, one can make a change of variables that converts the integral over the infinite region into an integral over a finite region. E.g., given the integral of a function as x ranges from minus-infinity to infinity, we might substitute x = tan \theta, and integrate from \theta = -\pi/2 to \pi/2. Integrals over infinite domains, and integrals of functions that go to infinity in places, will be looked at in the reading after next; but Stewart doesn't talk about approximating their values, except in one exercise, number 70 on p. 529. ---------------------------------------------------------------------- I hoped to answer your question in class, but didn't have time. You asked why, as stated in Stewart on p.509, the midpoint rule tends to be more accurate than the trapezoidal rule. To see it visually, consider the case where the function is y = x^2, and the interval [x_{i-1}, x_i] is [-1,1]. The picture one gets in comparing the two rules sits in a box of height 1 and length 2, and because of the way the parabola is curved, you can see that the area under the parabola is less than half the area of the box. (It is, in fact, 2/3, which is 1/3 of the area of the box.) Now the midpoint rule for this picture gives 0, while the trapezoidal rule gives the full area of the box, 2; so the midpoint rule is closer. The picture given on the right-hand side of Figure 5, p. 509, is like this, but shifted in several ways that don't affect the essential point.
(It is shifted horizontally, so that the midpoint is not necessarily zero; it is rescaled horizontally, so that x_i - x_{i-1} need not be 2; it is rescaled vertically (in this case, by a negative factor, so that the curve bends downwards rather than upwards); it is shifted upward, and finally, a linear function is added, so that the slopes of BC and QR need not be 0.) But it is still true that the pink area, representing the error of the midpoint rule, is about half the blue area, the error of the trapezoidal rule. ---------------------------------------------------------------------- You ask why, in Figure 5 on p.509, BC has midpoint P and is tangent to the curve at that point. The point P is taken to be the point on the curve above \bar{x}_i, and Stewart defines BC to be the tangent to the curve at that point, extended to meet the vertical edges of the rectangle. Now \bar{x}_i is defined to be half-way between x_{i-1} and x_i (their midpoint); so P will be the point of BC whose x-coordinate is half-way between the x-coordinates of its endpoints; hence P is the midpoint of BC. The one thing that is not so obvious is why BC is shown as almost parallel to QR. That can be deduced from the fact that for x_i - x_{i-1} small, the curve looks almost like a parabola, as I discussed in class. And for a parabola, you can verify that the two lines will be parallel. (I.e., that the deviations of the height of the curve from BC will be equal at the two ends.) ---------------------------------------------------------------------- Regarding the results on p.510 and p. 514, you ask "Why are error bounds necessary?" This comes down to the question "Do the Midpoint Rule, the Trapezoidal Rule and Simpson's Rule actually work? Always? If not always, then when ...? And in what sense do they `work'? I.e., how close can we be sure that the answers they give come to the real values of the integrals?"
One can't simultaneously answer these questions for all conceivable functions; but the error bounds in the text do answer them for large classes of functions: Functions which are twice (or in the case of Simpson's Rule, four times) differentiable, and where we know bounds on the derivative in question. ---------------------------------------------------------------------- You ask about the relation between the error estimates for the midpoint and trapezoid laws (p.510) and the actual errors in the case of particular functions. The error estimates are the maximum values that the absolute value of the error might achieve, given our knowledge of the second derivative of f. Particular functions for which the absolute value of the second derivative is everywhere \leq a given constant K may have E_T and E_M anywhere between the values shown in the "error bound" formulas and their negatives, including the value 0. If we bring in more information about a particular function, we may, of course, be able to prove that the error actually has a lower value than the one given by those estimates. But we may not have such extra information, or we may have it but it may be a lot of work to see what we can prove from it, or it may lead to still more time-consuming calculations. So, while under some circumstances it may be worth making further calculations based on more information, in this section we are learning some convenient facts provable for every function for which we can bound the second (or, later in the section, the fourth) derivative. ---------------------------------------------------------------------- You ask about the geometric interpretations of K and (b-a)^3 in the error bounds on p.510. If K = 0, then f has zero second derivative, hence constant first derivative, hence is a straight line, and in that case, the midpoint rule and trapezoid rule give the exact integral.
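That K = 0 case is easy to check numerically. Here is a quick sketch of my own (not from Stewart), in Python, applying both rules to a linear function:

```python
# A sketch (my own, not from the text): for the linear function
# f(x) = 3x + 1 on [0, 2], whose second derivative is 0, the midpoint
# and trapezoidal rules reproduce the exact integral, 8, for every n.

def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

f = lambda x: 3 * x + 1
for n in [1, 7, 50]:
    print(n, midpoint(f, 0.0, 2.0, n), trapezoid(f, 0.0, 2.0, n))
# both columns equal 8 up to rounding, whatever n is
```

So when K = 0 the error bounds K(b-a)^3/(12n^2) and K(b-a)^3/(24n^2) are 0, and the rules are in fact exact.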
The larger K is, the more "curved" the graph is, and hence the larger the shaded areas in the diagram at the bottom of p. 509 can be. So the larger K is, the larger the error can be (unless we compensate by making a finer subdivision, i.e., larger n). As for (b-a)^3, note that if we, say, double the length of the interval we are using, but keep n the same, then the base of each rectangle doubles. One can deduce that since the curve looks, locally, like a bit of a parabola, doubling the length and keeping K the same causes the shaded areas in those diagrams to be multiplied by 2^3; so generally speaking, the error will grow with the cube of b-a. Or, to look at it another way, if we want to double b-a but keep the bases of our rectangles the same, we need to double n at the same time. Then our bound on the error in each interval will stay the same, but there will be twice as many intervals, so our bound on the error should double. This means that if we multiply b-a by 2 and compensate by multiplying n by 2, our bound will (merely) be multiplied by 2. Given that n appears to the second power in the denominator, b-a must appear to the 3rd power in the numerator. This second approach shows that in any such error bound, the exponent of b-a should be one more than the exponent of n in the denominator; and the bound for Simpson's Rule on p. 514 indeed follows this pattern. ---------------------------------------------------------------------- You ask whether the error formulas on p.510 also apply to the errors in the left and right endpoint approximations. No. To see this, notice that the midpoint and trapezoidal approximations have zero error for linear functions, f(x)=Ax+B, which is consistent with the formulas given, since such functions have second derivative 0, so that one can take K = 0. But the left and right endpoint approximations have nonzero errors for linear functions, so the error estimate in question can't be correct for them. 
In fact, one can get an error estimate for those two approximations in which K is taken to be an upper bound for the first derivative of f. In this estimate, n will appear to the first power in the denominator, rather than the second power; so as one takes smaller and smaller subdivisions, these approximations improve less quickly than the midpoint and trapezoidal approximations. ---------------------------------------------------------------------- You ask about the different derivatives that occur in the error estimates for the different approximation rules (pp.510 and 514). In my class discussion on Monday, I very roughly sketched why different order derivatives occur in these rules. The midpoint and trapezoidal rules would give exact answers if the curve were a straight line, and the picture that Stewart gave, and that I showed on the board, indicates, roughly, that the error is proportional to the failure to be a straight line, i.e., the failure of the first derivative to be constant, i.e., to the second derivative. But this affects the midpoint and trapezoidal rules in opposite ways, and when one combines them in Simpson's rule, the effects cancel out -- if the curve has some constant second derivative, then Simpson's rule gives the exact answer. So the error in Simpson's rule depends on higher derivatives. As I said in class, one might expect it to depend on the third derivative; but because of the symmetric way the rule works, a function having the sort of symmetry that odd functions show (but with respect to the midpoint of the interval rather than with respect to 0) has no effect on the Simpson's Rule estimate or on the integral itself; and functions with constant third derivative are gotten from functions with constant second derivative by adding a multiple of x^3, an odd function; so the error in Simpson's rule somehow depends on the nonconstancy of the 3rd derivative, which depends on the 4th derivative.
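One can see the cancellation just described in a small computation of my own (Python, not from the text): composite Simpson's rule reproduces the integral of a cubic exactly, even though a cubic's second derivative is not constant.

```python
# Sketch (mine): Simpson's rule is exact for cubics.  For f(x) = x^3 on
# [0, 2] the integral is 4, and Simpson's rule with only n = 2 already
# gives it exactly.

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

val = simpson(lambda x: x ** 3, 0.0, 2.0, 2)
print(val)   # 4.0
```

This is the numerical counterpart of the statement that Simpson's rule only starts to err when the 4th derivative is nonzero.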
---------------------------------------------------------------------- You ask about the "K" in the error estimates for the various approximation rules (pp.510, 514). The idea of the error estimates is that if the function f doesn't vary too "wildly", the results of the approximation formulas will be close to the actual value of the integral. The way the function varies depends on its derivatives; so if we know that one or another derivative never exceeds a certain value, then we can say that the error in the result of applying the approximation formula will not exceed a value computed from this. For instance, in box [3], p. 510, "K" denotes any number that we know |f''| never exceeds in the given interval; if we know that, we get the bounds on the errors shown in the last line of that box. ---------------------------------------------------------------------- You both ask why in the first display on p.512, the B disappears after the first step. Stewart answers this in the left-hand margin, saying "Here we have used Theorem 5.5.7". To find that theorem, turn to section 5.5, and look for the boxed theorem number "[7]". (That takes a bit of looking, but it's on p. 412.) After checking that theorem, do you see the explanation? ---------------------------------------------------------------------- You ask how we know that S_{2n} = (1/3) T_n + (2/3) M_n (p.513, last displayed equation). Write out the formulas for S_{2n}, T_n, and M_n, and see how they are related. The \Delta x in S_{2n} will be different from that in T_n and M_n; so you might use some symbol such as c for (b-a)/2n, and then write the x_1, x_2 etc. of the midpoint rule, the trapezoidal rule, and Simpson's rule in terms of a and c; and see what happens when you write down the formulas for S_{2n} on the one hand, and (1/3) T_n + (2/3) M_n on the other. 
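If you'd like a numerical confirmation before (or after) doing that algebra, here is a small Python sketch of mine checking the identity on one sample function:

```python
# Numerical check (my own sketch) of the identity
# S_{2n} = (1/3) T_n + (2/3) M_n, using f(x) = 1/(1+x^2) on [0, 1]
# with n = 5 (so Simpson's rule uses 10 subintervals).

def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f = lambda x: 1.0 / (1.0 + x * x)
n = 5
lhs = simpson(f, 0.0, 1.0, 2 * n)
rhs = trapezoid(f, 0.0, 1.0, n) / 3 + 2 * midpoint(f, 0.0, 1.0, n) / 3
print(lhs, rhs)   # the two values agree up to rounding
```

Of course, a numerical check on one function is not a proof; the algebra sketched above is.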
---------------------------------------------------------------------- You ask whether there is any way to use Simpson's Rule (p.513) with an odd value of n. For Simpson's Rule itself, n has to be even. But there are other expressions in f(x_0), ..., f(x_n) that give equally good or better approximations, which can be applied with n even or odd. It's harder to discover these other methods than Simpson's rule, which may be why they are not mentioned in calculus texts. If I ever have time to get to the heading "Is Simpson's Rule optimal" that I had on the agenda today, I'll say a little about this. ---------------------------------------------------------------------- You ask whether there is a rule more accurate than Simpson's rule (p.513). Yes; in two ways. On the one hand, the string of coefficients used in Simpson's rule, "1, 4, 2, 4, ..., 4, 2, 4, 1", although relatively easy to come up with, is not really the best possible. There are strings of coefficients that actually yield somewhat better approximations of the integral, but if presented at this level, they would seem to be "pulled out of a hat". On the other hand, Simpson's rule is based on approximating our function by parabolas on successive segments. If one instead approximates it by higher degree curves, one can get estimates that improve, as n increases, still faster than Simpson's Rule does, just as the Simpson's Rule approximation improves faster than the Midpoint and Trapezoid Rules. ---------------------------------------------------------------------- I hope my discussion in lecture clarified the point you asked about. Where Stewart writes "for a \leq x \leq b" in the first sentence of the error bound statements on p. 510 and p.514, this is formally ambiguous between "for all x satisfying a \leq x \leq b" and "for some x satisfying a \leq x \leq b"; but what is meant is "for all" in both cases.
If we merely knew that the second or fourth derivative of a function was \leq K at _some_ point of our interval, this wouldn't give much information on how the function behaved in the interval as a whole, and so wouldn't allow us to get an error bound. Only from knowing that it is \leq K at _all_ x in the interval can we draw conclusions limiting how badly f can "stray" from its expected behavior. So if we took for K the smallest value f'' (or f'''') attained, the bounds could not be true -- we must take the largest value it attains, or more generally (if we can't be sure of the largest value) anything we know is at least the largest value. As Stewart says in the margin on p. 510, smaller values of K give better bounds -- but these must still be taken from among values of K with the property of being larger than f''(x) for _all_ x in the interval. (There in the margin, he does explicitly say "for all".) ---------------------------------------------------------------------- You ask why the error bound for Simpson's Rule (p.514) involves the fourth derivative instead of the second. Simpson's Rule is set up so that it will give exactly the right value if the curve is a parabola; and one can show that it will even give the right value for f(x) a polynomial of degree 3. (This is like the fact that the Midpoint Rule, although set up so that it would give the exact answer for a function that is constant on each segment [x_i, x_{i+1}], turns out to give the right answer for any function whose graph is a straight line on each of those segments.) The polynomials of degree 3 are the functions whose 4th derivatives are 0; so to measure how Simpson's Rule fails to give the right answer for a given function, one looks at how that function fails to have 4th derivatives 0. If that failure can be bounded, by showing that the function has 4th derivative of absolute value everywhere \leq K, then we get a bound on the error in Simpson's Rule. 
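As a concrete illustration (my own example, not Stewart's): for f(x) = e^x on [0, 1] we may take K = e, since |f''''(x)| = e^x \leq e there, and the computed Simpson errors indeed stay under the bound K(b-a)^5/(180 n^4).

```python
import math

# Sketch (mine): comparing the actual Simpson's-rule error for
# f(x) = e^x on [0, 1] (exact integral e - 1) against the error bound
# K (b-a)^5 / (180 n^4) with K = e.

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

exact = math.e - 1
for n in [2, 4, 8]:
    err = abs(simpson(math.exp, 0.0, 1.0, n) - exact)
    bound = math.e / (180 * n ** 4)
    print(n, err, bound)   # in each row, err comes out below bound
```

Notice also how quickly both columns shrink as n doubles: roughly by the factor 2^4 = 16 that the n^4 in the denominator predicts.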
(If, however, one has a function for which a bound on the 2nd derivative is known, but one doesn't have a way of bounding its 4th derivative, one could use the fact that S_{2n} = 1/3 T_n + 2/3 M_n, and estimate the error in S_{2n} using the second-derivative estimates on the errors in T_n and M_n. So from that point of view, the 4th-derivative error bound for S_{2n} isn't the only one that it is possible to use. But the estimate based on the 2nd derivative would not decrease nearly as fast, as n grows, as the estimate based on the 4th derivative does; so the latter is the more useful tool.) ---------------------------------------------------------------------- You ask why the error bounds for the various approximate integration rules given on p.510 and p.514 involve b-a. The presence of b-a means that the larger the interval over which the integral we want to approximate is taken, the larger the error may be, if other things are equal (the number of segments into which we divide it, and the second or fourth derivative of the function.) Another way to look at this is to recall that \Delta x = (b-a)/n. Then the formulas become |E_T| \leq K n (\Delta x)^3 / 12, |E_M| \leq K n (\Delta x)^3 / 24, |E_S| \leq K n (\Delta x)^5 / 180. You might find this form easier to understand. But Stewart gives the form in terms of b-a and n, because in a given situation, b-a will generally be given, and n is what we have to choose, and choose large enough to keep the error below a given value. (Still another way to express these formulas would be in terms of b-a and \Delta x. Those would show how small we have to take \Delta x to get a given accuracy.) ---------------------------------------------------------------------- You ask about terms like "infinite discontinuity", used by Stewart on p.519. As I said in class, this is one point on which I strongly disagree with his usage. 
A point x = a which he describes that way I (and most mathematicians) would describe as a point such that f(x) is unbounded in the neighborhood of x = a. ---------------------------------------------------------------------- You ask "Is it ever necessary to use the precise definition of the limit that we just learned in section 2.4 in improper integrals?" (pp.520-526) Generally speaking, the precise definition is used to prove various properties (such as the limit laws for sums, products, etc.), and these are what we use in other tasks, such as evaluating improper integrals. I can't say "never", but I think it will be rare to have to go back to the definition of limit when working with improper integrals. ---------------------------------------------------------------------- You ask, in the case of an integral from -infinity to +infinity, and its expression as the integral from -infinity to a plus the integral from a to +infinity (p.520), "If one of the integrals diverges, does the whole integral, from negative infinity to positive infinity, diverge?" Right. The whole integral only converges if both parts do. ---------------------------------------------------------------------- You ask about Stewart's statement on p.521 that 1/x has divergent integral from 1 to infinity because it doesn't approach zero "fast enough". Well, for easier understanding, let us think about adding up a series rather than integrating a function. If a man agrees to give you a pound of cheese the first day, half a pound the second day, a third of a pound on the third day, etc., then if he continued this forever, he would give you, in the long run, an infinite amount of cheese. (But it would take a long long time to see a "large" amount of cheese -- I think that to get a mere 10 pounds, you would have to wait something like 34 years. In real life, the amount he would give you would be limited by his and your lifetimes.
And even if you lived forever, after trillions of years the amount he would give you per day would be less than a molecule, so the arrangement would no longer make sense. But in abstract mathematical terms, it is true that the series 1 + 1/2 + ... + 1/n + ... does have limit infinity.) On the other hand, if he started with two pounds of cheese, and gave you half of it (one pound) the first day, half of what was left (half a pound) the second day, and half of what remained on each succeeding day, then even going off to eternity, he would never give you more than the two pounds he started with. The difference is that the quantity of cheese he gives per day in the second scenario tapers off faster than in the first. So to understand in intuitive terms the difference between a convergent series -- or integral -- and one that does not converge, one can speak of "how fast" the function approaches zero. I hope this helps. ---------------------------------------------------------------------- Concerning the comparison of \int (1/x^2) and \int (1/x) on p.521, with the former converging, but not the latter, you write > Stewart offers the explanation that $1/x$ does not approach zero > fast enough to converge. But, given enough time (we are going to > infinity), wouldn't the function approach zero like $1/x^2$ ... ? If we were just interested in whether the function approached zero, this idea would be valid; but we are interested in how the integrals behave, and the "time" it takes to get to a given point comes into the calculation of the integral; the "time" is, roughly, the base of a rectangle whose area is base x height.
If the time it took for one (positive decreasing) function to get down to a certain low value were merely a constant multiple of the time it took another to get there, your argument would still be valid; but, for instance, the time it takes 1/x to get down to 1/4 is twice the time it takes 1/x^2 to do so, the time it takes it to get to 1/16 is four times the time it takes 1/x^2 to do so, the time it takes it to get to 1/64 is 8 times the time it takes 1/x^2 to do so, and so on; and the end result is that the areas involved in the calculation of the integral of 1/x can add up to infinity, while those involved in the calculation of the integral of 1/x^2 don't. ---------------------------------------------------------------------- You quote the third paragraph on p.521 as saying that "1/x^2 is finite but 1/x is not". But this is not what it says! It says that the integral of 1/x^2 (from 1 to infinity) is finite, but the integral of 1/x (over the same range) is not. There's a world of difference! The visual way that Stewart expresses this is to say that the area under one curve (as one goes out to infinity) is finite, while the area of the other (in the same sense) is infinite. Since we can't see that whole infinite stretch of area, it is hard to visualize what the difference is. But calculation shows that it is so. ---------------------------------------------------------------------- You ask how, in the third display of the solution to Example 2 on p.521, Stewart gets from one step to the next. Note the words before that calculation: "... by l'Hospital's Rule we have". Did you learn l'Hospital's Rule in your AP calculus course? If you didn't, or if you are not sure you remember it in full, look up "l'Hospital's Rule" in the index of this text, and review it there. Of course, if you have any questions about how it works, you can e-mail them to me. 
Points to be learned from this: Stewart often explains his calculations, so if you don't understand one, look at the words that precede or follow it, or sometimes (though not this time) words he puts in the margin by the calculation. And if he refers to some topic you are not sure about, use the index. ---------------------------------------------------------------------- You ask whether it is possible for a function that converges when x --> infinity to have an integral which diverges when x --> infinity, and vice versa. Yes. The book gives examples; e.g., 1/x converges to 0 (i.e., approaches the limit 0) as x --> infinity, but as noted on p.521, its integral does not converge. [Perhaps you were using "f(x) converges" to mean "f(x) has convergent integral". That is not how the word is used, but if it is what you meant, you can get an example by taking the derivative of the above example.] The opposite situation is much less common, but it does happen; an example is given in Question "81(a)" to section 7.8, on the latest homework-sheet (but the question is not assigned). Question "81(b)" then challenges you to find an even more extreme sort of example (which you can do by carrying the idea of what happens in 81(a) further). ---------------------------------------------------------------------- You ask whether, in Example 3 on p.522 we couldn't just find the integral by integrating from -t to +t and taking the limit as t --> infinity. We could if we knew that the integral existed! But Exercise 61 on p. 528 (listed in the "interesting/challenging" category in this week's homework sheet) shows that the limit might exist even if the integral does not. So we need to do the two integrations to be sure the integral is defined. ---------------------------------------------------------------------- Regarding Exercise 31 in section 7.8, you write > ... I concluded that since 1/t^3 is different at t=0 from plus > side and negative side, the integral is divergent. ... 
No, that's not the criterion for divergence! Look at part (c) of the Definition on p.523. It doesn't say that the integral is convergent if the integrals from the left and from the right are equal; it says it is convergent if each of them is a convergent integral, and in that case its value is their sum. The reason this example is divergent is that the one-sided integrals are themselves divergent: 1/t^3 blows up too fast as t approaches 0 from either side. ---------------------------------------------------------------------- You ask how we define the integral of a function over a range from a to b if it "blows up" at both ends, a case left out in the definition on p.523. Good question. The idea is exactly the same as in part (c) of the definition on p. 520, which does the same for integrals from -infinity to +infinity: break the integral into two parts, each of which is improper only at one end, and define the integral over the whole interval to be their sum. ---------------------------------------------------------------------- Concerning the warning Stewart gives on p.524(bottom)-525(top) at the end of example 7, you ask why one can't evaluate the definite integral "the ordinary way" in that case. The ordinary method of evaluating the definite integral is based on finding an antiderivative (in this case, ln |x|), and using part 2 of the Fundamental Theorem of Calculus (p. 391) to deduce that the difference of its values between the endpoints equals the definite integral. The Fundamental Theorem of Calculus is stated for continuous functions defined on an interval. 1/x is in Stewart's language discontinuous; in mine it is not defined on the whole interval, and in fact has a pole (singularity) at x = 0. Whichever way one says it, the Fundamental Theorem of Calculus is not applicable to such functions.
Intuitively, if you think of trying to "add up little bits of" f(x), this process does not converge, so it doesn't make sense to say that the result of the process is the difference in the values of ln |x| between the endpoints. ---------------------------------------------------------------------- You ask whether the Comparison Test for improper integrals (p.525) can be inconclusive. Certainly. If we want to know whether a positive-valued function a(x) has convergent integral, and we choose some function f(x) that is greater than it, then if the integral of f(x) is divergent, that leaves it open whether the integral of a(x) converges or diverges. Likewise, if we choose a positive-valued function g(x) that is less than a(x), then if the integral of g(x) is convergent, that leaves it open what the integral of a(x) does. It is only if we can find an f(x) larger than a(x) whose integral converges, or a positive g(x) less than a(x) whose integral diverges, that the Comparison Test helps. ---------------------------------------------------------------------- You ask why, in Example 9 on p.526, the author starts by breaking the integral from 0 to infinity into the sum of the integral from 0 to 1 and the integral from 1 to infinity. Because the comparison test (as he has stated it) requires the assumption that f(x) \leq g(x) for all x in the interval considered. The relation x \leq x^2, and hence its consequence e^{-x^2} \leq e^{-x} (the inequality the comparison actually uses), holds for all x\geq 1, but not for x\in(0,1), so he cannot apply the comparison test to the integral from 0 to infinity, but only to the integral from 1 to infinity. ---------------------------------------------------------------------- > ... you said that integrals that are improper on both sides > should NOT be done like this: lim t->infinity, integral of f(x) from > -t to t. Why is this wrong? Well, if the doubly improper integral does converge, that calculation will give the correct answer.
But it can also give an answer if the doubly improper integral does not converge. For instance, as pointed out in Stewart's exercise 61 on p.528, it gives the value 0 for \int -infinity infinity x dx. If you agree that that answer is nonsense, good. If you think maybe that integral should be considered 0, then note that this answer isn't preserved under change of variables: If we let x = y+1, then that integral becomes \int -infinity +infinity (y+1) dy, and the same method of evaluation for it gives +infinity instead of 0. I'll give another such example in class, when I have time to get back to this unfinished discussion. ---------------------------------------------------------------------- You ask about finding an upper bound on arc length, to go with the lower bound used by Stewart to approximate the length on p.538. Well, if you have a curve that is convex in one direction or the other, and you draw tangents to it at a sequence of points, the first of which is the starting-point of the curve and the last of which is the end-point, and extend each pair of successive tangents till they meet, you get a polygon that is (roughly speaking) circumscribed around the curve; and its length will be an upper bound on the length of the curve. If a curve doesn't have that convexity property, but can be broken up into finitely many pieces that do (e.g., y = x^3 on [-1,1], which is convex upward for positive x, and downward for negative x), then you can bound the lengths of those pieces as above, and add up the results to get a bound on the total length. These cases cover most curves that are easily described. But there are, nonetheless, curves that are "wiggly" at every scale, so that one can't use this method to get upper bounds on their lengths, though you can still use polygonal approximations as in Stewart to bound it from below. So this bound is not as robust as that one. 
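Here is a small Python sketch of my own carrying out both bounds for one convex curve, y = x^2 on [0,1]: chords give the lower bound, and the circumscribed tangent polygon gives the upper bound (using the fact, special to this curve, that the tangents at x_i and x_j meet at the point ((x_i + x_j)/2, x_i x_j)).

```python
import math

# Sketch (mine) of lower and upper bounds for the length of y = x^2
# over [0, 1]: inscribed chords from below, circumscribed tangents
# from above.

f = lambda x: x * x          # the convex curve y = x^2

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

n = 100
xs = [i / n for i in range(n + 1)]
pts = [(x, f(x)) for x in xs]

# inscribed polygon (chords): a lower bound for the arc length
lower = sum(dist(pts[i], pts[i + 1]) for i in range(n))

# circumscribed polygon: for this curve, consecutive tangents at
# x_i and x_j meet at ((x_i + x_j)/2, x_i * x_j)
corners = [pts[0]] + [((xs[i] + xs[i + 1]) / 2, xs[i] * xs[i + 1])
                      for i in range(n)] + [pts[-1]]
upper = sum(dist(corners[i], corners[i + 1]) for i in range(len(corners) - 1))

print(lower, upper)   # the true length, about 1.47894, lies between them
```

As expected for a convex curve, the two bounds squeeze the true length between them, and they approach each other as n grows.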
---------------------------------------------------------------------- You ask whether Stewart's notation |P_{i-1}P_i| for the distance from P_{i-1} to P_i (pp.538-539) is standard. No it isn't, to my knowledge. It is common to use absolute value signs for the magnitude of a vector, so I think Stewart's idea is to use P_{i-1}P_i to mean the vector from the point P_{i-1} to the point P_i. Such notation might be used in physics, but I think they would put an arrow above P_{i-1}P_i to express "vector". Another notation would be |P_i - P_{i-1}|, where P_i - P_{i-1} would denote the vector whose coordinates are gotten by subtracting the coordinates of P_{i-1} from those of P_i. A simpler notation, which you are likely to see in Math 104, is d(P_{i-1}, P_i), where d(--,--) is the "distance function". ---------------------------------------------------------------------- You ask about the top displayed formula on p.539. As the sentence before it says, we get this from the Mean Value Theorem. If you don't remember that theorem from your preceding calculus course, you should have looked it up in the index! Did you? Let me know whether, when you look it up, it answers your question. ---------------------------------------------------------------------- You ask why the last displayed formula before Definition 2 on p.539, and the formula before that, are equal, as Stewart claims, "by the definition of a definite integral". Well, what definition of definite integral have you seen? If the one that you saw doesn't seem to connect with what he says, you should see what definition he gives. Using the index and a bit of searching, one finds the definition on p. 372, in the box at the top of the page and the comments that follow. Do you agree that the result he claims does follow from that definition? If that definition differs greatly from the one you saw in your previous course, let me know what the definition you saw was. 
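To make that definition concrete, here is a sketch of mine (in Python): Riemann sums for \int_0^1 x^2 dx, with a randomly chosen sample point in each subinterval, still approach the exact value 1/3 as the subdivision is refined.

```python
import random

# Illustration (my own, not from the text): Riemann sums for
# \int_0^1 x^2 dx = 1/3 approach 1/3 no matter which sample point is
# chosen in each subinterval -- here a random point in each.

random.seed(0)

def riemann_sum(n):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x_star = (i + random.random()) * h   # arbitrary point in [ih, (i+1)h]
        total += x_star ** 2 * h
    return total

for n in [10, 100, 10000]:
    print(n, riemann_sum(n))
# the printed values approach 1/3 as n grows
```

Replacing `random.random()` by `0.5` (midpoints) or `1` (right endpoints) gives other legitimate choices of sample points, all with the same limit.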
---------------------------------------------------------------------- Regarding the concept of surfaces of rotation on pp.538-543, you ask whether one can do a "double rotation" of a curve, rotating it first about one axis and then about the other. Well, one could; but since the first rotation would give a two-dimensional structure (a surface), the second would turn this into a three-dimensional structure (a solid). The boundaries of this solid would be hard to describe, so the volume would be hard to compute as well. But it's interesting to think about what one would get if one simply took a point, rotated this around the x-axis to get a circle, and rotated that circle around the y-axis to get a surface. (That is not an example of what is done in this reading, because the circle one would rotate around the y-axis would not lie in the x-y-plane, as the curves considered in this reading do.) Can you figure out what the resulting surface would be? A different way one could do two rotations would be in more than 3 dimensions. For instance, in 4 dimensions, calling the coordinates w, x, y and z, one could start with any curve in the x-y-plane, rotate this by performing a rotation on the y-z-coordinates (as one does in this section) getting a surface in x-y-z space, then perform another rotation on the w-x-coordinates, getting a 3-dimensional hypersurface in the whole w-x-y-z space. I don't think it would be hard to compute its volume; but it's outside the scope of this course. ---------------------------------------------------------------------- You ask how Stewart goes from the formula in the 3rd display on p.539, which involves f'(x*_i), to the integral in the 4th display, which involves f'(x). Stewart is using the definition of the integral -- see the boxed definition on p. 372. If you have questions about that definition, let me know.
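The passage from the sum involving f'(x*_i) to the integral can also be watched numerically. Here is a minimal sketch in Python (the curve y = x^{3/2} and the midpoint choice of sample points are my own, purely for illustration): as n grows, the Riemann sum approaches the exact arc length \int_0^1 \sqrt{1 + f'(x)^2} dx = (13\sqrt{13} - 8)/27.

```python
import math

def arc_length_riemann(fp, a, b, n):
    """The Riemann sum  sum_i sqrt(1 + f'(x_i*)^2) Delta x  from
    Stewart's derivation, with midpoints as the sample points x_i*."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx   # sample point in the i-th subinterval
        total += math.sqrt(1 + fp(x_star) ** 2) * dx
    return total

# y = x^(3/2) on [0, 1], so f'(x) = 1.5 sqrt(x); the arc length
# integral here has the closed form (13*sqrt(13) - 8)/27.
exact = (13 * math.sqrt(13) - 8) / 27
approx = arc_length_riemann(lambda x: 1.5 * math.sqrt(x), 0.0, 1.0, 1000)
```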
---------------------------------------------------------------------- You ask why in the arc length formula on p.539, f' is required to be continuous. This is so that we can be sure that the integral in the definition of the arc-length is defined. Discontinuous functions may or may not be integrable -- this is a difficult topic, dealt with starting in Math 104. Continuous functions are always integrable. Curves with discontinuous derivatives may or may not have finite arc-length. ---------------------------------------------------------------------- You ask whether Simpson's rule (suggested on p.541) would be more accurate for estimating arc-lengths than simply adding up the distances of the line-segments in the same subdivision of the interval of definition. I don't know for sure, but my guess is that Simpson's rule would be more accurate. The line-segment computation has a built-in bias -- it gives smaller values than the real distance, because each line-segment is the shortest distance between its endpoints, hence shorter than the segment of the actual curve. Simpson's rule is designed to make two sorts of biases, those of the midpoint and trapezoid rules, cancel each other to a large extent. However, since these are both different from the bias of the line-segment approximation, one would have to do some calculation to answer your question with more certainty. ---------------------------------------------------------------------- You ask regarding Example 3 on p.541, "Does an arc length function exist for the hyperbola xy = 1, or is it always necessary to estimate it?" The function exists -- but I believe it is not an elementary function. Some non-elementary functions can be found in tables. It is also often possible to calculate them accurately by other means, such as power series expansions. Methods of estimation such as those we read about in section 7.7, which Stewart uses in this example, are yet another method.
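The sort of estimate Stewart makes in Example 3 can be reproduced in a few lines of code. This is a sketch in Python; the choice of Simpson's rule with n = 10 subintervals on [1, 2] is mine, for illustration, and the integrand sqrt(1 + 1/x^4) comes from y = 1/x as above.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson's rule with n subintervals (n must be even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def ds_dx(x):
    # For y = 1/x we have y' = -1/x^2, so sqrt(1 + (y')^2) = sqrt(1 + 1/x^4).
    return math.sqrt(1 + x ** -4)

# Arc length of xy = 1 from (1,1) to (2,1/2), estimated numerically.
length = simpson(ds_dx, 1.0, 2.0, 10)
```

The answer comes out a little over 1.13; increasing n changes it only in later decimal places, which is the sense in which a non-elementary arc length function can still be computed to any desired accuracy.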
---------------------------------------------------------------------- I hope what I said in class helped with your question about Figure 7 on p.542. The triangle shown is any right triangle with sides parallel to the x- and y-axes, and hypotenuse tangent to the curve at the point (x,y). From this, I hope you can see that for ds and dx as in that picture, the ratio ds/dx represents the "speed" with which the length of the curve is growing relative to x at that point. Imprecisely, but intuitively, the hypotenuse of that triangle can be thought of as an infinitely small segment of the curve, infinitely magnified. Whether that viewpoint helps you is yours to decide; if not, ignore it! ---------------------------------------------------------------------- You ask about Figure 7 on p.542. Well, although historically, "dx", "dy" etc. meant "infinitely small changes", this was hard to give precise meaning to; so various ways of making it precise were developed. The one which Stewart follows is to let dx denote any nonzero real number, then let dy denote the amount that y would increase if x increased by dx while the function continued to increase at a constant rate, rather than letting its rate of change vary as it actually does. In the picture, this is shown by the red line tangent to the curve: its x-coordinate increases by dx, its slope is constant, equal to the slope that the curve has at the starting point of that line, rather than changing as the slope of the curve does; and the increase in the y-coordinate of that red line is called dy. Finally, its length is called ds, which likewise represents the amount by which the length of the curve would increase if its slope remained constant rather than changing. The Pythagorean Theorem, applied to that picture, gives formula [8]. Something that may not be clear from the picture is that the line labeled "dy" is intended to label the height of the vertical red line.
It is not shown right against that line because then it would run into the label "Delta y". In the above discussion, I've tried to strike a balance between precise and intuitive explanation. I hope it has helped. If Figure 7 still remains a mystery, I hope the rest of what Stewart says makes formulas [5]-[8] reasonable. ---------------------------------------------------------------------- As you say, if we compute the arc length of the function in Example 4, pp.542-543, from x=0, we would get an infinite result, since as x -> 0, the logarithm function approaches -infinity, so x^2 - (1/8) ln x approaches +infinity. This shows that the graphs on p. 543 are badly drawn! Stewart was just interested in values of x \geq 1, but he should have had the graph drawn correctly for all values it shows. Thanks for bringing this to my attention -- I'll include it in the list of comments and corrections I send Stewart at the end of the semester! ---------------------------------------------------------------------- You ask, as Stewart does in his caption to Figure 8, p.543, why the arc length in that figure is shown as negative for x < 1. The arc length has to be measured relative to some starting point. Stewart arbitrarily chooses x = 1 as that starting point, so each point with x > 1 is a certain distance _after_ that point along the curve, and each point with x < 1 is a certain distance _before_ that point along the curve. Hence it is natural to define the arc length to have positive values for the former sort of points, and negative values for the latter. One could, of course, naively say "We define the arclength to mean the distance along the curve from the starting point, so it will be positive on both sides".
But that definition would lead to messy consequences: The arclength would be given by the absolute value of the integral Stewart gives, rather than the integral itself; and when we wanted to change our starting point, this would have to be handled differently in different cases, rather than just by changing the constant of integration. So the definition described above, with positive values to the right of the starting point and negative values to the left, given by the integral without absolute value signs, is more useful. ---------------------------------------------------------------------- You ask why one uses the average radius in [2] on p.546. The formula before [2] can be written A = \pi (r_1 + r_2) l. The author could have left it that way, but he chose instead to write r_1 + r_2 as 2 times (r_1 + r_2)/2. That way, the term (r_1 + r_2)/2 has an easy interpretation, as the average radius, while the 2 times the \pi together give the coefficient we are accustomed to in the formula for the circumference of a circle. So he gets a formula that is easy to understand: 2\pi r l. ---------------------------------------------------------------------- You ask why on p.547, formula [4] contains the square root expression involved in defining arc-length, but formula [7] doesn't. In formula [7], the "ds" represents the differential of arc-length. As defined in equation [7] on p. 542, it already contains the square-root expression that appears in the other formula. (Note the words that precede formula [7] on p.547, "using the notation for arc length given in Section 8.1".) ---------------------------------------------------------------------- You ask about the relation between the "ds"'s one looks at when rotating a curve about the x-axis and about the y-axis, and whether one should use one formula for ds in the first case and the other in the second (pp.547-548). They represent the same thing; intuitively, the length of a "bit" of the curve that one is rotating. 
In both cases, one can compute the surface area using either the formula for ds in terms of dx or the formula in terms of dy. In Example 2 on p. 549, Stewart considers a parabola rotated about the y-axis, and shows that using the two formulas, one gets the same answer. ---------------------------------------------------------------------- You note that the surface area of a sphere, 4 pi r^2 (illustrated by Example 1 on p.548) is the derivative of its volume, (4/3) pi r^3, and you ask whether the areas of other surfaces can be obtained as derivatives of their volumes. In the case you refer to, we are looking at a whole family of spheres, one for each value of r, and as we increase r, the surface "grows" perpendicular to the tangent plane at each point. If we have a family of closed surfaces described in terms of some parameter t, such that as t changes, the surface grows at constant rate 1 in the direction perpendicular to its tangent plane, then the area at any value of t will be the derivative of the volume. But it can be tricky to design such families of surfaces; so one doesn't have a really useful general technique. ---------------------------------------------------------------------- You ask how Simpson's rule can be used to approximate the area of a surface of revolution, as suggested in Exercises 17-20, p.550, when Simpson's rule relates to the area under a curve, which is different from areas of surfaces of revolution. Simpson's rule is applicable to the integral of any function (and the error estimate applies whenever the function is 4-times continuously differentiable). Even though Stewart uses pictures in which the integral represents the area under a curve, in order to make the rule intuitively reasonable, it is not restricted to that case.
Moreover, if one wishes, one can translate the problem of finding the area of a surface of revolution into that of finding the area under a curve: The formula for the area of the surface gotten by rotating y = f(x) around the x-axis is given by \int_a ^b 2\pi f(x) \sqrt{1+f'(x)^2} dx, so it is equal to the area under the curve y = 2\pi f(x) \sqrt{1+f'(x)^2} from x=a to x=b. ---------------------------------------------------------------------- You ask how Stewart handles the units in Example 2, p.554. The key step is substituting 62.5 for \delta. He notes in the margin on the preceding page that in customary units, the weight density of water is 62.5 lb/ft^3. ---------------------------------------------------------------------- You ask what the significance of moments (pp.554-560) is. As I think I said in class, moments don't have one single significance: they are a type of calculation that comes up in various situations. If you have a function of one variable, and you take the integral of that function multiplied by the n-th power of the variable, that is called the n-th moment of the function. More generally, if you have a function of several variables, and you take the multiple integral of that function multiplied by a product of various powers of the various variables, the results one gets are likewise called moments of the function. (You won't see multiple integrals defined in general until Math 53, but Stewart uses in section 8.3 what is essentially the case of a double integral where the function is constant with respect to one variable, so the part of the integration that would come from that variable can just be replaced by the length over which one would integrate times the value of the function at that location.)
Two cases of the first moment (which Stewart just calls "the moment") of a function of one variable that come up involve the torque on a lever, gotten by integrating mass times distance from the fulcrum, and the volume of a solid of rotation, where the distance from the axis of rotation becomes a factor in the integration because when one rotates an object, the distance a piece of it moves is proportional to its distance from the axis of rotation. Another way moments come up is in probability theory (section 8.5, which we are skipping), where the expected value of a variable is the (first) moment of the probability function with respect to that variable. > ... why is the moment about the y-axis the sum of the masses times the x coordinates? Because the distance from the y-axis is the x-coordinate. As I said in class, what Stewart calls "the moment about the y-axis" is better described as "the moment with respect to the x-coordinate". ---------------------------------------------------------------------- You write that you are having trouble understanding why Equation [4] on p.555 can be rewritten as M = sum m_i x_i. He doesn't say it can be! You need to read the words between the equations more carefully. What he says about M = sum m_i x_i (on the line below that equation) is that it "is called the moment of the system about the origin". So what is the relation with equation [4]? That equation shows us that sum m_i x_i is an important player in this situation, and motivates our giving it a symbol and a name. Once we have these, we can rewrite [4] as \bar{x} = M/m, or equivalently, as noted in the next sentence, as m\bar{x} = M. ---------------------------------------------------------------------- You ask regarding the display after the end of Example 3 on p.556, "... is the p supposed to be area density?" Good point! On p. 552, Stewart defines rho to be the density of a fluid; that is, the mass per unit volume. In the bottom paragraph of p.
556, on the other hand, he takes rho to be the "density" of a lamina, but doesn't say what this means. He must mean the mass per unit area. ---------------------------------------------------------------------- You ask about the first boxed formula on p.557. That equality follows from the definition of integration, given in the top box on p. 372. Ask if you have questions about this. ---------------------------------------------------------------------- Regarding the derivation of the equations for the center of mass on p.557, you note that they are based on using midpoints, and ask whether it would be more accurate to use Simpson's rule. If we wanted to compute a center of mass using an approximation by dividing the mass into finitely many strips, then Simpson's rule would be more accurate than the midpoint rule. But both these rules are ways of approximating an integral, and the point of the discussion Stewart gives is to show what integral is being approximated. So once he gets to the integral, the approximation that he used to lead up to it doesn't matter. The best choice is simply the one that gives the quickest, clearest derivation. ---------------------------------------------------------------------- You ask what the "Formula 9" referred to on line 4 of p.559 is. It is the formula at the bottom of the preceding page! In any section, "Formula n" means the formula numbered n in that section, while formulas from other parts of the book are described in ways that specify where in the book they are. ---------------------------------------------------------------------- You ask how one can use Simpson's rule in Exercise 36, p.561, when no formula is given for f(x). You are supposed to use the graph shown to estimate the values of f(x) at the different points needed. Presumably, you are expected to let n = 8, and use the values on the coordinate-lines.
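To make the center-of-mass formulas discussed above concrete, here is a minimal sketch in Python (the semicircular lamina is my own test case, chosen because its centroid, (0, 4r/(3 pi)), is classical): midpoint sums approximate the mass and the two moments, and then x-bar = M_y/m and y-bar = M_x/m.

```python
import math

def centroid(f, a, b, n=10000):
    """Centroid (x-bar, y-bar) of a uniform lamina between y = f(x) >= 0
    and the x-axis, via midpoint sums for
    x-bar = (1/A) int x f(x) dx  and  y-bar = (1/A) int f(x)^2 / 2 dx."""
    dx = (b - a) / n
    area = m_y = m_x = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        y = f(x)
        area += y * dx            # total area (the mass, up to the density factor)
        m_y += x * y * dx         # moment about the y-axis
        m_x += 0.5 * y * y * dx   # moment about the x-axis
    return m_y / area, m_x / area

# Semicircular lamina of radius 1: the centroid should be (0, 4/(3*pi)).
xbar, ybar = centroid(lambda x: math.sqrt(1 - x * x), -1.0, 1.0)
```

Note that the uniform density rho cancels out of both quotients, which is why it does not appear in the code.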
---------------------------------------------------------------------- In connection with the material of section 9.1 (pp.580-584) you ask, "Are all differential equations solvable?" In one sense, namely "Given a differential equation, can we find an elementary function which is a solution to the equation?", the answer is certainly "No", since if the equation has the form y' = f(x), then a solution would be an integral of f, and we know that some elementary functions have nonelementary integrals (pp. 498-499). Another sense of your question is "Given a differential equation, must there exist a function (elementary or not) which is a solution to the equation?". The answer is still "No". But recall that I gave in class an example of an initial-value problem with more than one solution, but pointed out that in that case, the function f giving the differential equation was not differentiable at the relevant point; and I said that there are theorems (at least one case of which you'll see in Math 54) saying that for "reasonable" differential equations y' = f(x,y), an initial value cannot correspond to more than one solution. Those theorems also tell us that for such equations, every initial value does correspond to some solution. So for "reasonable" differential equations, the answer to your question is yes. ---------------------------------------------------------------------- > What exactly is logistic differential equation? It's just a strange name that people have given to a certain differential equation proposed to model population growth; equation 2 on p.581. Why is it called "logistic"? After looking online, I think the following is the explanation. The word "logistic", in addition to its other meanings, used to have the mathematical meaning "related to logarithms and exponentials". The mathematical biologist Verhulst, mentioned on p.581, not only proposed the differential equation for population growth, but found the solution, equation 7 on p. 609.
Since this solution involves an exponential function (though it is not itself such a function), he called it "logistic growth". And since his differential equation leads to "logistic growth", it came to be called the "logistic differential equation". ---------------------------------------------------------------------- You ask why under the logistic equation, the population never exceeds the carrying capacity. This is not quite true in real-world situations. The carrying capacity might change (due to some environmental change), and a population that had been below the carrying capacity might find itself above the carrying capacity; or some circumstances might force a population out of a region that could sustain it and into one with a smaller carrying capacity. In that case, once the population was facing a new constant carrying capacity, the situation would be represented by one of the descending curves above "P=M" in Figure 3 on p.581. But if the population starts below or at the carrying capacity, and the carrying capacity is constant, as assumed on p.581, and the logistic equation truly holds, then the population will never rise above it, since to reach a value above it, it would have to have positive derivative at some point where P > M, which would contradict the logistic equation. (To see this, apply the Mean Value Theorem to an interval from the moment it crosses P=M to a moment when P>M.) ---------------------------------------------------------------------- Regarding Example 2 on p.584, you ask "How does y' = .5(y^2-1) become equal to (2ce^t)/(1-ce^t)^2?" Hopefully, you understand that it is not .5(y^2-1) that becomes (2ce^t)/(1-ce^t)^2. Rather, Stewart is verifying that the functions y = (1+ce^t)/(1-ce^t) satisfy y' = .5(y^2-1): computing y' for such a function gives (2ce^t)/(1-ce^t)^2, and substituting the function for y in .5(y^2-1) gives the same expression, so the two sides of the differential equation agree. (Notice that y' = .5(y^2-1) is a condition on the function y; the expression (2ce^t)/(1-ce^t)^2 is just the common value of its two sides.) Stewart isn't saying here how to discover the solution to this differential equation!
In this section he is teaching us what it means for a function f to be a solution to the equation. Once we understand that, he will show us, in later sections, some methods of finding solutions. ---------------------------------------------------------------------- Regarding Example 2 on p.588, you ask what an "equilibrium solution" is. The first thing to do when the book assumes you know a term and you don't recognize it is to look in the index. If you don't find it there, or have trouble understanding the definition, then ask! ---------------------------------------------------------------------- You ask why Euler's method (p.589) is needed; why the direction field method is not enough. Well, when we draw a direction field, we can only put our little line segments at a limited number of points. Say we put them at every interval of .05, so that there are 20 per unit, and thus 400 per 1 x 1 square. (A lot of work to draw, especially getting the slope of each "just right"!) And suppose we have a perfect hand that can draw a curve that exactly matches the slopes of those segments (which is actually unreasonable to assume). Still, what do we do *between* segments? If we have reached x = 0.35, getting a y-value of 0.72, the slope we use should be somewhere between the slopes of our little segments at (0.35,0.70) and (0.35,0.75), but just what in-between value will it be? One can take a slope 2/5 of the way between the values at those points; but that will only be correct if the function F(x,y) is un-curved between those points. And even if we know the correct slopes at, say x = 0.35 and x = 0.40, what do we use for slopes between those x-values, as we move our pencil from one to the next? Even Euler's method only gives an approximately correct answer. But there, we can take the intervals between the points we use as small as we want, depending only on how much computing time we are willing to use.
And even before computers were available (they certainly weren't in Euler's time!), the computations could be done by accurate arithmetic to many decimal places, rather than depending on judgements of eye and hand. Incidentally, to a mathematician, a "direction field" means something much more abstract than what Stewart describes: a function that associates to *every* point a direction; rather than the approximation one gets by drawing little line segments at regular intervals. A direction field in this abstract sense does exactly determine the solution curves, which neither Euler's method nor the method that Stewart gives us does. But the abstract concept does not give us a way of computing that solution. ---------------------------------------------------------------------- You ask about the term h F(x_{n-1}, y_{n-1}) in the description of Euler's method on p.590. Given that y_{n-1} is our approximation of y at x = x_{n-1}, we use the differential equation y' = F(x,y) to approximate the slope of the curve, getting F(x_{n-1}, y_{n-1}). Since the interval from x_{n-1} to x_n has length h (see the beginning of that paragraph), the amount that we estimate that y changes over that interval, namely, the length of the interval times our approximation of the slope, is h F(x_{n-1}, y_{n-1}). Our approximation for y at the end of the interval is the sum of our approximation at the beginning of the interval and our approximation of the change: y_{n-1} + h F(x_{n-1}, y_{n-1}). We name this y_n. (Incidentally, if most of the pictures in the margin give the impression "Those approximations are way off!", this is because Stewart has taken h large enough so that you can easily see the difference between the curve and the approximation. When one takes h small enough to get a really good approximation, as in the higher curves in Figure 16, then it is harder to see the process.) 
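The bookkeeping in Euler's method as described above is simple enough that the whole procedure fits in a few lines. Here is a sketch in Python (the test equation y' = y with y(0) = 1 is my own choice, since there the exact answer y(1) = e is known):

```python
import math

def euler(F, x0, y0, h, steps):
    """Euler's method for y' = F(x, y): starting from (x0, y0),
    repeatedly set y_n = y_{n-1} + h * F(x_{n-1}, y_{n-1})."""
    x, y = x0, y0
    for _ in range(steps):
        y = y + h * F(x, y)
        x = x + h
    return y

# Approximating y(1) for y' = y, y(0) = 1; the exact value is e.
coarse = euler(lambda x, y: y, 0.0, 1.0, 0.1, 10)     # h = 0.1
fine = euler(lambda x, y: y, 0.0, 1.0, 0.001, 1000)   # h = 0.001
```

Since Euler's method is first order, shrinking h shrinks the error roughly in proportion, which is the trade of computing time for accuracy mentioned above.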
---------------------------------------------------------------------- You ask about the meaning of the display "h(y) dy = g(x) dx" on p.594. Good question! Historically, "dx" and "dy" were symbols for the "infinitesimal" quantities that \Delta x and \Delta y turned into as they approached 0. So differentiation was the process of taking the ratio of these, and integration was the process of summing (infinitely many) such infinitesimals. Then George Berkeley pointed out that this had no logical backing, and mathematicians started defining dy/dx as the limit of the ratios \Delta y/\Delta x, and \int f(x)dx, as the integral of f(x) with respect to x (also defined as a limit), without giving meanings to dy and dx themselves. But the latter were so convenient for thinking about these things that they eventually came up with ways of redefining them: in Stewart, see the first paragraphs of the section "differentials" on p. 253. But having made the definition he gives there, which is compatible with the definition of dy/dx, is it also compatible with the definition of integration, so that equality of h(y) dy and g(x) dx implies equality (up to a constant of integration) of the corresponding integrals? Well, it took me a bit of searching to find where Stewart justifies this, and he doesn't go into much detail; but see p. 408, "The Substitution Rule" and the two paragraphs that follow. ---------------------------------------------------------------------- You ask whether putting equation [1] on p.594 into "differential form" and then integrating both sides is valid. I would say, yes, but to see definitions that justify it, one would have to go to a higher-level course (and in this case, I'm not sure what that course would be -- maybe Math 214.) However, one can justify [2] more easily than Stewart does. The version of "h(y)dy = g(x)dx" that doesn't involve differentials is h(y)y' = g(x). 
Now if h has an antiderivative H, and g has an antiderivative G, then the two sides of the above equation are the derivatives of H(y) and G(x). Since these derivatives are equal, the two functions must differ by a constant, so we get H(y) = G(x) + C, which is, essentially, [2]. ---------------------------------------------------------------------- You ask how Stewart goes from the 3rd-from-last equation on p.594 to the next-to-last equation. Well, the integral of h is some function; let us call it H. (The general integral will have the form H + C for some constant C, but we are just interested in any one integral, which we call H.) Thus, the expression \int h(y) dy means H(y). In this discussion, y is a certain function of x. The left-hand side of the first of the two equations you ask about can now be written d/dx H(y), and the Chain Rule (which Stewart has said he will use) turns this to (d H(y)/dy)(dy/dx), which is the left-hand-side of the second equation. Meanwhile, on the right-hand side, Stewart is applying the Fundamental Theorem of Calculus; if we give the integral of g the name G, he is rewriting what we would call d G(x) / dx as g(x). (At the next step, he will similarly apply the Fundamental Theorem of Calculus to the left side.) ---------------------------------------------------------------------- You ask whether Stewart is using circular reasoning on p.594, where he deduces [2] from [1] and then at the bottom of the page deduces [1] from [2]. No; in that last step, he is *checking* his solution, not deducing a new fact. The reason he wants to check it is that one may be unsure whether expressions like "h(y) dy" which he used in getting [2] from [1] have a clear meaning. 
He has shown us how to do the calculation using such expressions because it is such an elegant and easy-to-remember method; but once he has the solution it gives, he wants to justify it, which he does by checking that [2] implies [1]; i.e., that a function y satisfying [2] will satisfy [1]. > ... should circular logic be avoided all the times? The phrase "circular reasoning" means assuming what one is trying to prove in an attempted proof of it; so in that sense, it is never valid. But there are many things having some resemblance to that which are useful; for instance, talking about what one hopes to prove before beginning a proof; or (as I mention in the handout on induction) proving the n=k+1 case of a statement from the n=k case. ---------------------------------------------------------------------- Regarding Example 1 on p.595, you note that Stewart simplifies the solution by writing K in place of 3C, and you ask, "Since C is an arbitrary constant, can we write C in place of 3C?" Well, sometimes we do say things like "Let us write C for what we previously called 3C". Other times, one chooses to keep one's notation consistent between different parts of a calculation, and uses a different letter. One tries to balance the goals of simplicity, brevity, and clarity. Different people make different choices. ---------------------------------------------------------------------- You ask about the uniqueness theorem Stewart mentions in the margin at the top of p.596. A uniqueness theorem says that under certain conditions, the solution to a problem is unique. The uniqueness theorem Stewart talks about is for a differential equation together with an initial condition; or more generally, together with a condition saying that at a certain x-value, the function has a certain y-value (even if the x-value isn't at the beginning of the interval, as in the case of an initial condition.) The uniqueness theorem doesn't apply to all differential equations. 
For instance, I pointed out last week that the equation y' = 2 \sqrt{y} has solutions that start out following the x-axis, y=0, and then at some arbitrary point x = a start following the curve y = (x-a)^2. This misbehavior is related to the fact that the function \sqrt{y} grows very fast near y = 0. Uniqueness theorems typically have bounds on how the functions involved grow. Such a theorem is applicable to the equation Stewart is looking at here, y' = x^2 y (in which the function x^2 y doesn't grow unreasonably fast). Now since y = 0 is a solution to the equation, we see from the uniqueness theorem that any solution which has y = 0 at one point has to be equal to that solution everywhere; so a solution which is not everywhere 0 can't take on the value 0 at all, which is what Stewart is claiming. To see a uniqueness theorem such as Stewart is referring to, see http://en.wikipedia.org/wiki/Picard-Lindelof_theorem . The bound on how the function grows is the condition immediately following the displayed equations at the beginning: "Suppose f is Lipschitz continuous in y and continuous in t." If you click on "Lipschitz continuous" you'll see what that condition is. ---------------------------------------------------------------------- You ask what uniqueness theorem Stewart is referring to in the top marginal note on p.596. Good question! I'm not sure; probably a theorem saying that if we have a solution to y' = f(x,y) that passes through some point (x_0,y_0), and if f is not too badly behaved, then no other solution can pass through (x_0,y_0). (We saw in class on Monday that 2 \sqrt y was badly behaved at y = 0, since its derivative near y=0 approaches infinity, and that y' = 2 \sqrt y in fact had nonunique solutions with certain initial values.)
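The nonuniqueness for y' = 2 \sqrt{y} can be checked directly, without any theory. Here is a quick sketch in Python (the branch point a = 1 is arbitrary): both the zero function and the function that follows the x-axis up to x = a and then follows y = (x-a)^2 satisfy the equation and take the value 0 at x = a.

```python
import math

def rhs(y):
    # Right-hand side of the differential equation y' = 2 sqrt(y).
    return 2 * math.sqrt(y)

a = 1.0  # arbitrary branch point

def y_zero(x):                   # the solution that stays on the x-axis
    return 0.0

def y_branch(x):                 # follows the axis, then peels off at x = a
    return 0.0 if x <= a else (x - a) ** 2

def y_branch_prime(x):
    return 0.0 if x <= a else 2 * (x - a)

# Both functions solve the equation and share the initial value y(a) = 0,
# so solutions through the point (a, 0) are not unique.
for x in [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]:
    assert abs(y_branch_prime(x) - rhs(y_branch(x))) < 1e-12
assert y_zero(a) == y_branch(a) == 0.0
```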
Now since the constant function y = 0 is a solution to dy/dx = x^2 y that passes through every point (x,0), it follows from that uniqueness that no solution other than that one -- i.e., no solution that ever takes on a nonzero value -- can pass through any point (x,0). This says that every nonzero solution to that equation stays nonzero. But I'll suggest to Stewart that he either try to make that comment more informative, or drop it. ---------------------------------------------------------------------- You ask, regarding the point on the top half of p.597 where Stewart writes +- e^{-3c} e^{-3t} = A e^{-3t}, how the +- sign can disappear. Stewart is letting A denote e^{-3c} if the sign is +, and -e^{-3c} if it is -. So in either case, +- e^{-3c} becomes A. To do this, one has to realize that the sign + or - will be constant in any solution to the differential equation -- the function can't jump between a solution based on a positive and a negative coefficient -- so if we write A = +- e^{-3c}, then A really is a constant. ---------------------------------------------------------------------- You ask whether there is a family of curves depending on a single parameter k, for which it is impossible to find a system of orthogonal trajectories as discussed on p.597. That depends on what sort of messy behavior we allow. (Mathematicians call such messiness "pathology".) If we don't require the curves to be differentiable (e.g., if we let them "wiggle" at arbitrarily small scale), then the concept of orthogonality has no meaning. In a much milder vein, we might have a family of curves that cross each other, such as the parabolas y = (x-c)^2.
Then we wouldn't know which curve through each point to make our orthogonal curves orthogonal to; though this could be handled by noting that around each point with y > 0, the curves that pass by can be divided into two clearly distinct families, so we can get curves that are orthogonal (in such a region) to those in one family, and other curves that are orthogonal to those in the other family. Finally, there can be families of curves that are nice in all the above senses, and which have orthogonal families, but for which the latter family is not given by elementary functions, so that we can't "find" it in the sense of writing down a formula for it. (More than you bargained for when you asked your question, I would guess.) ---------------------------------------------------------------------- Regarding the equation at the beginning of "The Logistic Model" on p.606, you ask, "How is P determined to be small enough to prefer to apply the exponential model over the logistic model?" If we're looking at a situation for which we know the conditions well, the exponential model is applicable when the population is small enough so that competition for resources is not important. If we're looking at data and don't know much about the details of the situation, then we apply the exponential model if it seems to fit those data well. ---------------------------------------------------------------------- You asked why in the logistic equation dP/dt = k P(1 - P/K) (p.607), the second factor has the form 1 - P/K rather than K - P, which would likewise have the effect of making growth decrease to zero when the population approached K. This is so that when P is small, i.e., when the problem of overpopulation doesn't limit the growth, the equation approaches the natural growth equation. Of course, P' = k P (1- P/K) could be rewritten P' = L P (K - P), taking L = k/K. 
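(A trivial numerical spot-check of that last rewriting, with illustrative values of k and K:)

```python
# check that k*P*(1 - P/K) equals (k/K)*P*(K - P) at sample values
k, K = 0.08, 64.0        # illustrative values only
L = k / K
for P in [1.0, 10.0, 32.0, 63.0, 64.0]:
    assert abs(k * P * (1 - P / K) - L * P * (K - P)) < 1e-9
```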
But this L doesn't have a very natural meaning for the population growth, while k does: it is the relative growth rate when the population is small. ---------------------------------------------------------------------- Regarding the paramecium problem on p.610, you ask how we find values like the "k" given there. You ask "Do we just plug in some numbers until we get a good value (line of best fit kind of thing)?" There's a technique for computing "best fit" -- the method of least squares, covered in Math 54 and sometimes again in Math 110. I think Stewart is intentionally vague about how the numbers were found because that sort of computation is not a topic of calculus. ---------------------------------------------------------------------- You ask how, in the experiment described on pp.610-611, Gause could estimate the carrying capacity as 64, when it reached the value 76 early on. It's hard to know what to make of the data we are given about that experiment. One could justify the value 64 on the assumption that when the number of individuals is small, it can sometimes exceed and sometimes fall under the number M that the environment could support on a long-term basis; so that an *average* of the population over a period of time when it had stopped growing would be a reasonable value for the carrying capacity. It's also not clear whether the numbers we are given represent the actual population of paramecia, or the number that he counted in a fixed-size sample; e.g., one droplet of water, put under the microscope each day. If the latter is the case, then if that daily sample constituted, say, 1/100,000 of his paramecium culture, then the actual populations would not be small numbers like 2, 3, ..., 57, but hundreds of millions of paramecia, and the fact that the numbers bob up and down irregularly might just mean that the sample droplet he took each day sometimes contained more and sometimes less of the paramecia in the culture. 
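Returning to the "line of best fit" question above, here is how the idea can be made concrete: for exponential growth P = P(0)e^{kt}, the logarithm ln P depends linearly on t, so one can fit a straight line to the points (t, ln P) by least squares and read off k as its slope. A sketch (the data points here are made up for illustration; they are not Gause's):

```python
import math

# made-up measurements (t, P), roughly following P = 2*e^(0.8t)
data = [(0, 2.1), (1, 4.3), (2, 10.2), (3, 21.5)]

# least-squares fit of the line  ln P = ln P(0) + k*t
ts = [t for t, P in data]
ys = [math.log(P) for t, P in data]
n = len(data)
tbar = sum(ts) / n
ybar = sum(ys) / n
k = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) \
    / sum((t - tbar) ** 2 for t in ts)
lnP0 = ybar - k * tbar
print(f"k ~= {k:.4f},  P(0) ~= {math.exp(lnP0):.2f}")
```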
---------------------------------------------------------------------- Regarding the Paramecia example on pp.610-611, you write > In the beginning of the solution, Stewart comments that biologist G.F. > Gause used the same relative growth rate from the exponential growth > model for paramecia for his logistic growth model. Stewart states that > this is reasonable because the initial population is small compared to > the carrying capacity. Roughly how large can the initial population > be compared to the carrying capacity before this assumption ceases > to be reasonable? Such things depend on the degree of accuracy that one wants in determining one's constants. As I said in class, despite Gause's giving the relative growth rate to 4 decimal places, he could not have been claiming it was accurate to that many places, so we don't know what accuracy he was claiming. Even more important than the ratio of the initial population to the carrying capacity is the ratio of the later populations that are used to estimate the "initial relative growth" to the carrying capacity. E.g., if Gause based his 0.7944 on the values of P(0), P(1), P(2), P(3), as may well be the case, since the upper curve in Figure 4 (p. 611) seems to be nestled nicely among those first four points, then that value of k may be far from appropriate for the logistic curve, since those values extend into a region where the black and red curves of Figure 4 are quite far apart. Perhaps the best value to use for logistic growth would have been larger, leading to a curve that rose faster before flattening out, and fit the data better. On the other hand, I find it suspicious that P(3) sits squarely on the red curve in that graph. I wonder whether Gause chose his coefficients in the logistic equation precisely to make P(0) = 2, P(3) = 16, and K = 64. If that was so, then he didn't really estimate k by applying the exponential model to the initial growth. 
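One can test that suspicion directly: for a logistic curve P(t) = K/(1 + A e^{-kt}), with A = (K - P(0))/P(0), the values K = 64, P(0) = 2 and P(3) = 16 force a particular k. A small computation (my sketch):

```python
import math

K, P0, P3 = 64.0, 2.0, 16.0
A = (K - P0) / P0            # = 31
# P(3) = K/(1 + A*e^{-3k}) = 16  =>  e^{-3k} = (K/P3 - 1)/A = 3/31
k = math.log(A / (K / P3 - 1)) / 3
print(f"k = {k:.5f}")        # about 0.77846, not the 0.7944 Gause used
```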
Hmm, I just pulled out a table of logarithms and found the value of k that would fit the values of P(0), P(3), and 64 in the limit; and it was 0.77846. Close, but not 0.7944. So we really can't tell how he computed his value. ---------------------------------------------------------------------- You're right that most of the models of population growth we've been shown make extinction impossible! The models in the lower half of p.612 would allow it, though. For simplicity, let us drop the 1 - P/M term from both, since we are talking about a situation where P is low, rather than close to the carrying capacity of the environment. Then both curves take the form P' = kP - constant, and if P is low enough so that kP is less than that constant, it will decrease at an ever-growing rate. (And, if we believe the equations, it will become negative and keep getting more negative.) If P is above the critical value, then according to the equations, extinction would never occur; but in reality, changes in the environment could change the values of the constants involved, so that a population that had been above the critical value would suddenly find itself below it, and if things don't change for the better, become extinct. ---------------------------------------------------------------------- In connection with the material on p.616, you write > ... Stewart goes from xy' + y = (xy)' to (xy)' = 2x. > How does he make this connection? Good question! Note that between the two equations that you quote, he says, "and so we can write the equation as"; so we have to find "the equation" he is referring to! This is the equation xy' + y = 2x, in the sentence containing equation (2). When I write him, I will suggest that he display and number that equation, and refer to it by number.
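Carrying that computation through: (xy)' = 2x gives xy = x^2 + C, i.e. y = x + C/x, and one can check numerically that any such y satisfies xy' + y = 2x (a sketch, with an arbitrarily chosen C):

```python
C = 5.0  # arbitrary constant of integration

def y(x):
    # general solution y = x + C/x of the equation xy' + y = 2x
    return x + C / x

def yprime(x, h=1e-6):
    # symmetric difference quotient
    return (y(x + h) - y(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0, 4.0]:
    assert abs(x * yprime(x) + y(x) - 2 * x) < 1e-6
```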
---------------------------------------------------------------------- Regarding the top displayed equation on p.617, you ask why there is a constant of integration C, when the indefinite integral hasn't been evaluated. Well, I guess Stewart is thinking of "\int I(x)Q(x)dx" as denoting any particular choice of antiderivative for I(x)Q(x), and is throwing in the "+ C" to get an expression that will denote all antiderivatives. I don't know whether he has a fixed convention on whether an indefinite integral symbol means "all antiderivatives" or "some antiderivative"; in his tables of integrals, he leaves out the + C from the integral itself, and only shows it on the solution. I'll raise the question in my e-mail to him at the end of the semester. ---------------------------------------------------------------------- You ask whether we should have a constant C in the integrating factor introduced by Stewart on p.617. We don't need one. When one multiplies by an integrating factor, this transforms the equation into an equivalent equation, hence into one that has exactly the same set of solutions. So it's enough to find one integrating factor. Stewart makes this point in the sentence before display [5], saying "We are looking for a particular integrating factor, not the most general one ...". ---------------------------------------------------------------------- You ask how Example 2 on p.618 would change if we dropped the condition x > 0. Well, note that the differential equation x^2 y' + xy = 1 can't have a solution that behaves nicely at x=0, since there the equation becomes 0 = 1. So we can find solutions that are defined for negative x, and solutions that are defined for positive x, but these will not connect with one another. If we want a solution that satisfies y(1) = 2, i.e., that passes through the point (1,2), it has to be a solution defined for positive x. Hence Stewart's condition x>0.
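For the record, solving x^2 y' + xy = 1 with y(1) = 2 along these lines gives y = (ln x + 2)/x for x > 0, and one can check this numerically (my sketch, not Stewart's):

```python
import math

def y(x):
    # solution of x^2*y' + x*y = 1 with y(1) = 2, valid for x > 0
    return (math.log(x) + 2) / x

def yprime(x, h=1e-6):
    # symmetric difference quotient
    return (y(x + h) - y(x - h)) / (2 * h)

assert abs(y(1.0) - 2.0) < 1e-12         # the initial condition
for x in [0.5, 1.0, 3.0]:
    assert abs(x * x * yprime(x) + x * y(x) - 1.0) < 1e-6
```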
If we drop the condition y(1)=2, then we can look separately for solutions defined for positive x and solutions defined for negative x. The difference would come up when we integrate 1/x. The expression for that integral that Stewart usually shows is ln |x|. In this problem, since he had x > 0, he was able to write this as ln x. If we looked for solutions for x < 0, then we could make the integral ln (-x). We could then find such a solution passing through any point (a,b) with negative a. ---------------------------------------------------------------------- You ask about the limits of integration on the integral in the second display on p.619. That display is gotten by making the first display more precise. In the first display, the indefinite integral shown means "any function whose derivative is e^{x^2}". The different functions having that derivative differ by constants. If one adds a constant to one such function, then, since it is multiplied by e^{-x^2} in the formula for y, that formula gets a constant times e^{-x^2} added to it. This just corresponds to choosing a different constant C in that formula; so the set of functions described by the first display doesn't change if one chooses a different indefinite integral. But if we want an answer that doesn't involve a random choice of function with derivative e^{x^2}, we choose a particular such function, namely the definite integral of e^{x^2} from 0 to x. The Fundamental Theorem of Calculus tells us that the result has the desired derivative. We could instead use the definite integral from any fixed starting point a to x; this would differ from the value we have chosen by the integral from a to 0 of e^{x^2}, which is a constant; and for the reasons stated in the preceding paragraph, it would again give the same set of solutions. The choice a = 0 just gives us a particular formula to write down.
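If I reconstruct that example correctly from the formula for y, the equation is y' + 2xy = 1 and the display in question is y = e^{-x^2} \int_0^x e^{t^2} dt + C e^{-x^2}. One can check numerically that the particular solution built from the definite integral does satisfy the equation (a sketch; the midpoint rule stands in for the non-elementary integral):

```python
import math

def integral_exp_t2(x, n=2000):
    # definite integral of e^(t^2) from 0 to x, by the midpoint rule
    h = x / n
    return h * sum(math.exp(((i + 0.5) * h) ** 2) for i in range(n))

def y(x):
    # particular solution built from the definite integral from 0 to x
    return math.exp(-x * x) * integral_exp_t2(x)

def yprime(x, h=1e-5):
    # symmetric difference quotient
    return (y(x + h) - y(x - h)) / (2 * h)

# verify y' + 2xy = 1 at a few sample points
for x in [0.5, 1.0, 1.5]:
    assert abs(yprime(x) + 2 * x * y(x) - 1.0) < 1e-3
```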
---------------------------------------------------------------------- You ask whether the unit of time is significant in the differential equation [7] shown for the electric circuit on p.619. In that differential equation and the discussion that precedes it, the statement that the voltage across the inductor is L(dI/dt) depends on the inductance L being expressed in units compatible with those of the time t and the current I. When time is expressed in seconds, and current in amperes, then the corresponding unit of inductance is the henry. If units that did not match were used, then one would have to put a correcting factor into the formula L(dI/dt). (E.g., if one kept the ampere and henry, but expressed time in milliseconds, one would have to use the formula 1000 L(dI/dt).) ---------------------------------------------------------------------- You ask whether the natural growth function dR/dt = kR for rabbits in the absence of wolves (p.622) is similar to the model of the growth of a rabbit population given by the Fibonacci sequence. Yes, but the formula defining the Fibonacci sequence is a "difference equation" rather than a "differential equation": Instead of describing the "instantaneous" rate of change, it describes the change over a fixed interval of time. If we write it as f_n - f_{n-1} = f_{n-2}, it says that the change in the number of pairs of rabbits as n changes by 1 is given by the number of pairs of rabbits -- and not the present number, but the number two units of time ago. The relation between solving difference equations and solving differential equations is like the difference between summing a series and integrating a function. It is often difficult to find an explicit solution to a difference equation (though one can easily compute any number of terms), just as it can be difficult to find an explicit formula for the sum of a series. 
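For the Fibonacci difference equation itself, an explicit solution does exist, as a combination of n-th powers (the discrete analogue of exponentials): with phi = (1+\sqrt 5)/2 and psi = (1-\sqrt 5)/2, the two roots of x^2 = x + 1, one has f_n = (phi^n - psi^n)/\sqrt 5. A quick check, using the convention f_0 = 0, f_1 = 1 (my sketch):

```python
import math

phi = (1 + math.sqrt(5)) / 2   # the roots of x^2 = x + 1
psi = (1 - math.sqrt(5)) / 2

def fib_closed(n):
    # Binet's formula: a combination of two "exponential" terms
    return (phi ** n - psi ** n) / math.sqrt(5)

# compare with the difference equation f_n = f_{n-1} + f_{n-2}
f = [0, 1]
for n in range(2, 20):
    f.append(f[-1] + f[-2])
for n in range(20):
    assert round(fib_closed(n)) == f[n]
```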
However, linear difference equations with constant coefficients, like linear differential equations with constant coefficients, have solutions which, in general, are given by linear combinations of exponential functions; and this approach leads to an expression for the general term of the Fibonacci sequence in that form. There is a free online calculus text, "Difference Equations to Differential Equations" by Dan Sloughter, http://synechism.org/drupal/de2de/ , which I suppose, based on its title, emphasizes the relationship between the two sorts of equations. (But I haven't gone through it.) ---------------------------------------------------------------------- You ask whether solutions to the Lotka-Volterra equations (p.623) other than the equilibrium solution approach the equilibrium solution, or go around and around endlessly. They go around and around endlessly, since they have to stay on a curve of the sort I showed in class, which does not approach the equilibrium point. On the other hand, modified equations, like that of Exercise 9, may approach the equilibrium solution. As for real-world predator-prey situations -- the ones we hear about do seem to oscillate rather than ending up at equilibrium. But it might be that some pairs of species tend to reach equilibrium, and others tend to oscillate (depending on subtle differences in the genuine equations for their interaction), and that only the oscillating ones get mentioned in books on differential equations. ---------------------------------------------------------------------- Regarding Stewart's comment just before Example 1 on p.623 that it is usually impossible to find explicit formulas for the R and W satisfying the Lotka-Volterra equation, you ask > ... Are there certain type of differential equations which have > been proven to be impossible to solve non-graphically? ... We have already seen that there are elementary functions whose integrals are not elementary functions.
Since an integration problem is a special sort of differential equation, these are examples of differential equations that cannot be "solved", if by solving one means naming the solution as a certain elementary function. Likewise, there are differential equations which are not themselves integration problems, but which can be reduced (using the method of separable equations, or the method described in the handout) to integration problems where the integral is not an elementary function. Doubtless there are also differential equations whose solutions are non-elementary functions which don't arise in this way from integrals. But as with the case of integrals, one can always give the solution to such an equation a name and a symbol, calculate tables of values, and solve other differential equations with the help of that function. ---------------------------------------------------------------------- You ask about Stewart's instruction to "make sketches of R and W as functions of t", which he states as part (e) on p.623, and claims to do on p. 625. That is a topic on which I had already made a note to write to him! The phase trajectory gives no information as to the relative speeds of progress at different points. So all one can really put into these sketches is information as to how high and how low each curve gets, and what point of the cycle each one is in when the other is at a given point of the cycle. The additional information that Figures 2 and 3 imply (e.g., that the wolf population falls more slowly than it rises) certainly can't be deduced from the phase diagram. ---------------------------------------------------------------------- > In point (b) on p.624, Stewart uses the chain rule to get > dW/dR, why did he do that? Is it wrong to just divide dW/dt > by dR/dt to get dW/dR or do we have to use the chain rule?
Well, when we are dealing with situations where there is just one independent variable, expressions like dy/dx behave like fractions, and the chain rule dW/dt = (dW/dR)(dR/dt) can be thought of as cancellation of a numerator and denominator. So it is safe to treat these derivatives symbolically as fractions, and multiply them and divide them as one would multiply and divide fractions -- if one remembers that this only works in the case of 1 independent variable. But when you get to Math 53, you will have a more general kind of differentiation: Given a function of (say) two variables, y = F(w,x), you will learn about taking "the derivative of y with respect to w as x is held constant" and "the derivative of y with respect to x as w is held constant", which will be written "curly-d y / curly-d w" and "curly-d y / curly-d x" (where "curly-d" is a symbol sort of like a backwards 6). In these situations, one can't treat them simply as fractions: the chain rule takes a more complicated (though still elegant) form. So in conclusion, you can "just divide dW/dt by dR/dt" -- if you keep in mind that this only works when there is just one independent variable. ---------------------------------------------------------------------- You ask how one could give versions of Figures 1 and 2 on p.624 that bring in time. Well, in Figure 1, one could make the little line-segments have length proportional to the speed. This simply means that the difference between the R-coordinates at the two ends should be proportional to R', and the difference between the W-coordinates of the ends should be proportional to W'. But this might be difficult in practice, because where the speed is small, the segments might look nearly like dots, while where it was large, they could end up running through one another. In Figure 2, one could put many dots along the curves, at intervals corresponding to some unit of time.
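The dots-at-equal-time-intervals idea is easy to simulate numerically. Here is a sketch using the coefficients from Stewart's equations, dR/dt = .08R - .001RW and dW/dt = -.02W + .00002RW, with starting values chosen for illustration:

```python
def rates(R, W):
    # Lotka-Volterra with the coefficients from Stewart's example
    dR = 0.08 * R - 0.001 * R * W
    dW = -0.02 * W + 0.00002 * R * W
    return dR, dW

def step(R, W, dt):
    # one fourth-order Runge-Kutta step
    k1 = rates(R, W)
    k2 = rates(R + dt/2 * k1[0], W + dt/2 * k1[1])
    k3 = rates(R + dt/2 * k2[0], W + dt/2 * k2[1])
    k4 = rates(R + dt * k3[0], W + dt * k3[1])
    R += dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    W += dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return R, W

# print a "dot" every 25 units of time, as one might mark Figure 2
R, W = 1000.0, 40.0      # illustrative starting populations
dt = 0.1
for n in range(1501):
    if n % 250 == 0:
        print(f"t = {n*dt:5.0f}:  R = {R:8.1f},  W = {W:6.1f}")
    R, W = step(R, W, dt)
```

(The equilibrium values implied by these coefficients are R = .02/.00002 = 1000 and W = .08/.001 = 80, so this starting point is well away from equilibrium and the printed dots trace out one of the surrounding cycles.)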
But since the times that the different curves take to get back to where they started cannot be expected to have convenient ratios, it would not in general be possible to choose the unit of time so that the spacing rule applied everywhere -- one would, I suppose, give each curve a "starting point" (say, its bottom, or top, or leftmost or rightmost point) and stipulate that the distance from the last dot before that point to that point does not necessarily represent the same time-interval as all the other distances do. ---------------------------------------------------------------------- You ask about the time it takes to complete various trajectories of the predator-prey equation in phase space (p.624). One certainly can't tell that from the direction-field alone. As I indicated in class, one could draw a direction-field diagram in which, at each point one had, not just a mark showing the direction, but an arrow whose length represented the speed with which the system moved in that direction; and using this, one could draw curves with widely or narrowly spaced markings indicating the passage of time. > ... do the trajectories further out imply a longer period, or do > they imply the same period because the derivatives will be higher? Near the equilibrium point -- let's call it (R_e,W_e) -- I am fairly sure that the period will approach some nonzero limiting value. This is because the pair (R',W') can be approximated near that point by a linear function of (R-R_e, W-W_e), and the differential equation determined by a linear function rotates the whole plane (around concentric ellipses) with a constant period. As we move out from the equilibrium point, and the linear approximation fails, the period doubtless changes. It's not obvious whether it will increase or decrease.
My guess is that it will increase; because if we take an initial point with W very very low, then R will grow with approximately "natural increase" for a long time, until the initially tiny value of W builds up enough to start bringing R down. ---------------------------------------------------------------------- You ask how there can be an equilibrium point in a phase portrait like that on p.624, when every point shows a slope indicating change in the population. At the point marked by the red dot in the center of the phase portrait, there is no slope: It is the point where the numerator and the denominator of the formula near the top of p. 624 are both 0. So dW/dt and dR/dt are both 0, and the population does not change. The thing to remember is that the phase portrait ignores time. When one is near the equilibrium point, the functions R and W have very slight change per unit time; but the portrait just shows dW/dR, which doesn't reflect the smallness of the change of each function. I mentioned in class that one could refine the technique of drawing a phase portrait by replacing the sloping direction-field symbols by arrows, such that an arrow is longer if dR/dt and dW/dt are larger, and shorter if they are smaller, even when their ratio (and hence the slope of the arrow) remains the same. Then as one looked at the phase portrait, one would see arrows getting shorter and shorter as one got near the equilibrium point, and finally going to zero length at that point. ---------------------------------------------------------------------- You ask why the ratio between minimum and maximum population of rabbits in Figure 3 on p.625 is so much more extreme than the ratio between minimum and maximum population of wolves. Interesting question; I don't know the answer. Mathematically, it probably has to do with the fact that the coefficients that Stewart has given us in the equation for dW/dt on p. 
623, .02 and .00002, are much smaller than the coefficients in the equation for dR/dt, .08 and .001. This may have the effect that the population of wolves grows and shrinks to a smaller degree than that of rabbits. What this corresponds to in the real world is harder to guess. I can imagine that wolves might eat rabbits until the rabbit population is extremely low, and then, when that happens, switch to eating mostly other foods -- mice, berries, etc. But this would mean that they aren't really behaving in accordance with the equations, which seem to be based on the assumption that they need rabbits to live. So it might be that the coefficients in the equations are distorted by people trying to use the equation to model data from a real-world situation that really doesn't fit it. ---------------------------------------------------------------------- You ask whether it might be appropriate to introduce a carrying capacity for wolves into the population equations, as Stewart does for rabbits on p.626. Well, insofar as the idea of carrying capacity is based on limitation of resources, and it seems to be assumed that the resource limiting the population of wolves in this scenario is food, and that rabbits are their main source of food, the Lotka-Volterra equations already involve "carrying capacity". But it doesn't appear in the same form as in section 9.1, where it is essentially a P^2 term on the right-hand side of the equation. Which version models reality better I don't know. Anyway, it might be reasonable to introduce a section-9.1-type carrying capacity term in connection with some other resource. ---------------------------------------------------------------------- You ask why, in studying sequences, we can only define limits as n --> infinity (p.692), and not limits as n approaches other values.
For a function f defined on an interval of the real line, the idea of the definition of lim_{x->a} f(x) is based on looking at the values of f(x) for x arbitrarily close to a (i.e., differing from a by less than any positive real number \delta). But if we instead have a sequence (a_n), and a particular positive integer N, we can't look at the values a_i for i "arbitrarily close to N", because i ranges over the integers, so the difference |N-i| can't be made any smaller than 1. In other words, the set of integers is discrete, so we can't look at one integer "approaching" another. (That idea reminds me of a bit of graffiti I once saw on a bulletin board in Evans Hall: "\sqrt{3} = 2 for large values of 3".) (There are some concepts of integers being "close" occurring in more advanced areas of math, which are very different from the familiar one, and with respect to which one can define the kinds of limits you ask about. For instance, one can fix a prime number p, and consider two integers m and n to be "closer" the larger the power of p that divides m-n. With respect to that concept, called the "p-adic metric", one can indeed talk about a sequence having a limit as n approaches an arbitrary integer. But that topic is far from freshman calculus. One might see a bit of it in Math 115; and more in Math 254AB.) ---------------------------------------------------------------------- You ask why the precise definition of a sequence having limit "infinity" (p.693, Definition 5) does not use absolute values, like definitions of other sorts of limits. The absolute value signs (and the specification that the absolute value be less than some value delta or epsilon) are involved when a finite value is being approached either by the independent variable x or by the dependent variable f(x). 
This is because a finite value can be approached from either side; so to say that x is near a on one side or the other, or that f(x) is near L on one side or the other, one uses the formulas |x-a| < \delta and |f(x)-L| < \epsilon. Infinity, on the other hand, is not approached from two sides: to "approach infinity" just means to become arbitrarily large, which is expressed by inequalities without absolute values, such as x > N and f(x) > M (p. 140, Definition 9) or n > N and a_n > M (p. 693, Definition 5). When one of the two variables (independent and dependent) approaches infinity, and the other approaches a real number, one gets definitions that contain one inequality involving an absolute value, and one without (p. 115, Definition 6; p. 130, Definition 1; p. 692, Definition 2. The same happens when one of the variables is approaching -infinity.) ---------------------------------------------------------------------- You ask why the Squeeze Theorem for Sequences (p.694, first boxed statement) is only stated for limits as n --> infinity, while the Squeeze Theorem for Functions (p. 105) is stated for limits as x approaches an arbitrary a. For a function of a real variable, one can talk about the limit as that variable approaches an arbitrary number or infinity or -infinity. But for a function of an integer-valued variable, there is no concept of letting the variable "approach" an integer n. An integer either equals n, or it differs from n by at least 1; there is no "getting closer and closer". So limits as n approaches +infinity are the only kind that we can look at. (In other areas of mathematics, there are concepts of integers approaching integers. For instance, if one is interested in divisibility by 2 (or some other prime p), one can regard m as "close to" n if n-m is divisible by a large power of 2 (or generally, p); and one can define concepts of limit with respect to this concept of "closeness". You would see these concepts in Math 254; but they're out of the ballpark for Math 1AB.) ---------------------------------------------------------------------- You ask how, in the proof of the Monotonic Sequence Theorem, p.698, one can be sure that \{a_n\} has a least upper bound.
Stewart gives the reason before he makes the assertion -- he says "By the Completeness Axiom". If you had skimmed too quickly, and not noticed his statement of the Completeness Axiom right before the theorem, then seeing that reason given, you should have looked back to see what the axiom was about. When you don't understand something in a math text, it is always a good idea to first look right before what you don't understand. But if it didn't occur to you that the Completeness Axiom might have been given right before the theorem, you should have gone to the index of the text and looked up "Completeness Axiom". It would have told you that the axiom was on p.698, and looking over the page, you would not have had too much trouble finding it. (It is named in bold type where it is stated.) Once you've read the Completeness Axiom, let me know whether you can see how it is applied in the second sentence of the proof of the Monotonic Sequence Theorem. If you still have trouble, try to be precise about what it is -- how much you can see of the connection between the proof and the Axiom, and where you have trouble fitting them together. ---------------------------------------------------------------------- You ask about the distinction between mathematical induction (p.699, note in left column) and deduction, and how often each is used. I would call any kind of precise reasoning "deduction"; so it would include mathematical induction. In nonmathematical usage, "induction" can refer to a non-rigorous kind of reasoning. The Oxford English Dictionary gives, as its 7th meaning of the word, Logic. a. The process of inferring a general law or principle from the observation of particular instances (opposed to DEDUCTION, q.v.). But in mathematics, where mathematical induction is a rigorously valid tool, there isn't a contrast between it and deduction. (The OED's 8th meaning of the word is that of mathematical induction.)
Mathematical induction is only one of many tools of mathematical reasoning, and a somewhat sophisticated one, so it occurs only in a small fraction of the cases of deduction; but it is a powerful tool in those situations where it is needed. There are also many situations where we use mathematical induction although the reasoning is intuitively clear, and we don't think to call it by that formal name. E.g., knowing that the derivative of a polynomial of degree n>0 has degree n-1, we can "see" that the k-th derivative of that polynomial for k \leq n has degree n-k; though to argue this precisely, one would have to use mathematical induction. ---------------------------------------------------------------------- You ask how, in Example 7, pp.707-708, showing that s_{2^n} diverges implies that the whole sequence of partial sums s_n diverges. If the sequence s_n converged, then there would be some L such that, as n --> infinity, the values of s_n became arbitrarily close to L. So the values s_{2^n}, being among these values, would also become arbitrarily close to L. But since they are approaching infinity, they are not becoming arbitrarily close to any fixed real number L. Having been shown the argument in rough form, you ought to be able to translate it into an "epsilon-delta" argument. Can you? ---------------------------------------------------------------------- Regarding Example 8 on p.708, you ask > What is a harmonic series? The phrase "harmonic series" at the beginning of this exercise appears in boldface type. This signals that what is being said is the definition of the term. So the phrase "the harmonic series" means "the series 1 + 1/2 + 1/3 + ... + 1/n + ...". (Depending on the author and the type of writing, definitions may be signaled by boldface or italic type.
Whichever is used, if you see something put in a different font in mathematical writing, and there is no other obvious reason to do so, such as making a contrast or stressing something, it is a good guess that the words are being defined.) > Why is the solution using partial sums when n = 2,4,8 case? Because that is the property of the harmonic series that we are going to use -- that the single term 1/2, and the sum of the next two terms, and the sum of the four terms after that, etc., are all about the same size; so as we go on summing terms, we keep adding in sums of about the same size, so the sums we get increase without bound. > And how did the solver decide to use inequalities to solve the > problem? Proving divergence (of a series of positive terms) is always a question of inequalities -- of showing that the partial sums can get arbitrarily large. ---------------------------------------------------------------------- You ask why Theorem 7, p.709, is true. Stewart answers this in the sentence right after the theorem! Did you read that sentence? If you had difficulty understanding the reasoning of that sentence, then you should have written me about that difficulty. Please e-mail me what your difficulty was! ---------------------------------------------------------------------- You ask about the assumption Stewart makes in the Integral Test (p.716) and the Remainder Estimate (p. 718) that f is continuous. He assumes this so that we can be sure that the integral of f(x) makes sense -- see Theorem 3 on p. 373. In fact, it can be proved that any bounded monotone function is integrable, and using that fact, one can see that the continuity condition can be dropped from those theorems. But since discontinuous monotone functions are not important in a course at this level, while continuous functions (and the slight generalization of these referred to in the theorem on p. 
373) cover most of the functions we look at, Stewart only states the integrability result for that case; hence he has to put the continuity assumption into the two theorems you pointed to. ---------------------------------------------------------------------- You ask about the sentence at the beginning of the new heading on p.718 saying that "any partial sum s_n is an approximation to s ...". That statement doesn't have a precise mathematical meaning; it conveys the intuitive idea that the s_n are the "steps" toward the limit value s. Depending on the series in question and the value of n, the numbers s_n may be very near to s, or quite far. (But we know that for any convergent series, if we take n large enough, then s_n will be as close to s as we wish.) ---------------------------------------------------------------------- You ask why, in the proof of the Integral Test on p.720, the summation of the areas under the curve starts with a_2. The rectangles with x ranging from 1 to 2, and top sides above and below the curve, are used to give upper and lower bounds on the integral from x=1 to 2. Since f(x) is decreasing, it has its largest value in the interval [1,2] at x=1, and its smallest value at x=2. These values are f(1)=a_1 and f(2)=a_2, so those are the heights of those two rectangles, which have base 1. Hence the summation that bounds the integral above begins with a_1, the area of the taller rectangle, while the summation that bounds it below begins with a_2, the area of the shorter rectangle. ---------------------------------------------------------------------- You ask about Stewart's statement at (i) on p.720, that if the integral in question is convergent, then (4) gives an inequality involving a sum from 2 to n. Specifically, you ask why the sum begins at 2 rather than 1. Well, did you look back at formula (4) (the one labeled [4], in red, higher on the page) and see what it says?
And if, on looking at that formula, you were puzzled at why the subscripts begin where they do, did you look at how Stewart obtained that formula? After following the argument back, let me know what you understand, and what step(s), if any, need clarification, and I'll address these. (See the "Note" on the lower part of the back of the class handout, beginning "If in my office hours ...".) ---------------------------------------------------------------------- You ask about applying the Comparison Test (p.722) to a series one or more terms of which have forms like 1/0. If that happens, then what one has is not a series. The definition of a series requires that every term be a real number! Note that the series Stewart writes down always avoid such cases; e.g., p. 712, Ex.66 and p. 721, Ex.22 both start with values of n bigger than 1 to avoid the zero denominators. ---------------------------------------------------------------------- You ask whether, in the Limit Comparison Test (p.724), given (a_n) we can take any sequence for (b_n). (b_n) can be any sequence that satisfies the conditions of the statement, namely that its terms be positive and that the ratios a_n/b_n approach some nonzero limit. The idea is to look for a positive sequence (b_n) that is "similar" to (a_n) in the way it behaves (expressed by the ratios a_n/b_n approaching a nonzero limit) but is simpler in its form (so that one can tell more easily whether it converges or diverges). ---------------------------------------------------------------------- After a few questions on the estimation of sums discussed on pp.725-726, you ask > ... what is the use of sequences and series in general?? ...
Well, it can go in either of two directions: One can have a known mathematical entity and get a useful handle on it by finding a series that represents it; or one can have some mathematical situation leading to sums of terms that get closer and closer to some value, and try to learn about where it is heading by finding a simple expression for that value. In the beginning of this chapter, we focused on the latter idea, e.g., figuring out what 1 + 1/2 + 1/4 + 1/8 + ... was approaching; but we have already seen bits of the former. E.g., by expressing the known rational number 1/7 as the sum of an infinite series, we get the decimal expression 1/7 = .142857142857142857..., which is easy to compute with in our decimal system of notation; and Stewart mentions here and there that series expansions for pi allow one to find its value to great accuracy. Starting with section 11.9, we will be studying representations of functions by "power series", which gives us new information about functions like e^x and sin x, as well as non-elementary functions like the integral of e^{-x^2}. ---------------------------------------------------------------------- You ask about justifying the assumption that a_n is less than b_n for all n greater than some number N in working p.726, Ex.40. Whether one can take something like that for granted depends on the level of the audience for which one is writing (or, when one is a student being graded, the level of the course one is in). At this level, it is best to give the details. The way to do so is to use the definition of the statement that lim a_n/b_n = 0. That definition begins, "for every epsilon there exists an N such that ...". After filling in the "..." in what I have written, do you see what value of epsilon would have the effect that for n > N, a_n < b_n ?
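A numerical sketch of that last point may help. (The sequences a_n = 1/n^2 and b_n = 1/n below are my own illustrative choices, not the ones in Ex.40.) Once the ratio a_n/b_n drops below epsilon = 1, we automatically get a_n < b_n:

```python
# Illustrative (hypothetical) sequences with lim a_n/b_n = 0:
# a_n = 1/n^2 and b_n = 1/n, so a_n/b_n = 1/n.

def a(n):
    return 1 / n**2

def b(n):
    return 1 / n

# Taking epsilon = 1 in the definition of the limit gives an N
# (here N = 1) such that a_n/b_n < 1 for all n > N; since b_n > 0,
# that inequality is the same as a_n < b_n.
N = 1
for n in range(N + 1, 10000):
    assert a(n) / b(n) < 1
    assert a(n) < b(n)
```

The point is not the computation itself, but that epsilon = 1 is the value that turns the limit statement into the desired inequality a_n < b_n for n > N.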
---------------------------------------------------------------------- Noting that the Alternating Series Test (p.727) can be used to show that \sum_{n=1}^{infinity} sin(n\pi)/n converges, you ask whether \sum_{n=1}^{infinity} (sin n)/n converges. The answer is: yes, but we don't have the tools to prove this in Math (H)1B. The proof would use the generalization of the Alternating Series Test that I sketched near the end of class today: That if (b_n) is a decreasing sequence of positive numbers with limit 0, and if (e_n) is a sequence of numbers such that the set of partial sums e_1 + ... + e_n is bounded, then the series \sum b_n e_n converges. (The Alternating Series Test is the case where e_n = (-1)^n.) To answer your question we apply this with b_n = 1/n, and e_n = sin n. It isn't obvious that the set of partial sums sin 1 + ... + sin n is bounded, but when we get to reading #25, we'll have the tools to prove that; I'll put it on the homework sheet then. (I don't know whether I'll make it an assigned problem; that will depend on how many other good problems on Appendix H there are.) With a little ingenuity, one can also get a bound on the sums sin 1 + ... + sin n by geometric reasoning, without the ideas of Appendix H. ---------------------------------------------------------------------- You ask why, in Example 2, p.729, Stewart couldn't immediately say that the series diverged on finding that condition (ii) of the Alternating Series Test failed. In general, if a test "X ==> Y" fails, i.e., if the condition "X" is shown not to be true, one can't conclude that "Y" fails. So, for instance, in the case of the Alternating Series Test, a series which fails to satisfy (i) may or may not converge. However, you are right that a series that fails to satisfy (ii) can never converge, because failing to satisfy (ii) is equivalent to satisfying the condition in the "Test for Divergence". 
But I guess Stewart didn't feel that this would be obvious to the reader, and so gave a separate three lines of argument. ---------------------------------------------------------------------- Regarding Example 2 on p.729, you say > ... we tested the series using the Test for Divergence. I thought > that rule only applied for series with positive terms. ... Check out the statement of that Test! There's nothing in it about "positive terms". Stewart is generally a very careful writer. If he doesn't say in a theorem that the terms must be positive, then that is not assumed. ---------------------------------------------------------------------- You ask whether an alternating series can fail to converge if there are finitely many exceptions to the condition b_{n+1} \leq b_n. No. We know that convergence of a series is not affected by finitely many terms: For instance, \sum_1 ^infinity a_n = a_1 + \sum_2 ^infinity a_n, so if the sum from 2 to infinity converges, so will the sum from 1 to infinity. So if the condition for an alternating series applies from the second term on (or from the nth term on), the series will converge. Stewart states the Alternating Series Test for the case where all the b_n's are decreasing to make it easy to understand and remember. But he takes for granted that you can see that it still applies if there are finitely many exceptions, when he writes, on p.729, in the solution to Example 3, "all that really matters is that the sequence \{b_n\} is eventually decreasing". ---------------------------------------------------------------------- Regarding the error estimate for alternating series given on p.730, you write > ... I am always wondering what is the purpose of finding the size > of the error? If we want to know the sum of a series, and it's not one where we can find the exact answer, then the best we can do is add up a lot of terms and regard the result as an approximation of the sum.
Then we naturally want to know how good an approximation it is -- if we know that it has an error of less than .001, for instance, then we have essentially found the sum to 3 decimal places. Sometimes, even when we do "know" the sum, these error estimates give useful information. For instance, if the sum is pi, or ln 2, then by summing terms of the series, we can find the value of pi or ln 2 to many decimal places. Incidentally, we don't "find" the size of the error; we bound it. If we could find the exact size of the error, then we could find the exact value of the sum, by just adding the error to the partial sum. The best we can generally do is say that the error is less than some value. > Will it be a foundation of something else that we will learn in > the future? Not in this course. Very likely in Math 128, if you take it. ---------------------------------------------------------------------- You ask about the exact value of \Sigma (cos n)/(n^2), the convergence of which is proved in Example 3, p.733. I don't know; I suspect that it can't be expressed in terms of elementary functions. You also say that since the series is neither alternating nor all positive, you don't know how to estimate the sum. If by estimate you mean finding error bounds on the partial sums, that is not hard. Writing a_n = (cos n)/(n^2), note that |a_n| \leq 1/n^2 < 1/(n(n-1)) = 1/(n-1) - 1/n (for n>1). From this it is not hard to verify that |R_N| = |\Sigma_{n>N} a_n| \leq \Sigma_{n>N} [1/(n-1) - 1/n] = 1/N (summed as a telescoping series). ---------------------------------------------------------------------- Regarding the ratio test (p.734) you note that it applies an absolute value to the terms of the series, and ask "how can it account for an alternating series?" Some alternating series do converge by the ratio test; e.g., the series with a_n = (-1/2)^n. 
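To see the contrast numerically, here is a small Python check (my own illustration) of the ratio |a_{n+1}/a_n| for the series with a_n = (-1/2)^n, and, for comparison, for the alternating harmonic series a_n = (-1)^n/n:

```python
# |a_{n+1}/a_n| for two alternating series: one where the ratio test
# succeeds (limit 1/2 < 1) and one where it is inconclusive (limit 1).

def ratio(a, n):
    return abs(a(n + 1) / a(n))

def geometric(n):      # a_n = (-1/2)^n: absolutely convergent
    return (-0.5) ** n

def alt_harmonic(n):   # a_n = (-1)^n / n: converges, but not absolutely
    return (-1) ** n / n

print(ratio(geometric, 100))    # 0.5 (to machine precision): test gives convergence
print(ratio(alt_harmonic, 100)) # about 0.99: the ratios tend to 1, test inconclusive
```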
As you indicate, the absolute value throws away the "alternating" property; so these are among the series that would converge even without that property -- they are absolutely convergent. But if you take an alternating series which is not absolutely convergent, such as the one with a_n = (-1)^n/n, you will find that it falls under the case where "the ratio test is inconclusive". ---------------------------------------------------------------------- You ask about the author's statement on p.736, end of solution to Example 5, that lim_{n->infinity} (1+1/n)^n = e. Notice that after that equation, he writes "(see Equation 3.6.6)". That means the equation numbered [6] in section 3.6. Check it out. Does it answer your question? ---------------------------------------------------------------------- Concerning the fact mentioned on p.737, that a rearrangement of a conditionally convergent series can sum to any real number, you ask whether there are ways of rearranging such a series that will not change its sum. Yes. We know that rearranging just finitely many terms has no effect. It is also not hard to prove other special cases, such as that interchanging each odd-position term and the term after it won't change the sum. But it is very tricky to describe the most general permutation that will not affect the sum; we won't go into that. (One time when I was teaching Math 104, I thought about that question, and got an exact criterion; but it was something far too complicated to give even to a 104 class.) ---------------------------------------------------------------------- You ask how one would write a formula for the series [7] on p.737. The straightforward way would be to say that it is \Sigma a_n, where a_n is defined to be 1/n if n is even and 0 if n is odd. (There are other tricks that one could use, such as writing a_n = (1 + (-1)^n) / 2n. 
But the point I want to make is that if one wants to express the condition that 0 is used for every other term, one can simply say this in a precise way. Once one has learned that, then one can go on to tricks if they are helpful.) ---------------------------------------------------------------------- You ask why the convention on p.741 that (x-a)^0 is 1 when x = a is "valid". Any definition which says what we will mean by a symbol is "valid", as long as we follow it consistently, and reason correctly using it. One can ask whether it is consistent with other definitions we have made; but we have no other definition of x^y that applies to the case x=y=0. Assuming we follow and reason about our definitions consistently, the next question is whether a given definition is useful. In situations like this one, where it is clear what definition we want to use for "most" values of the argument, and we need to decide whether it would be useful to extend this definition to some cases where it is not obvious what our choice should be, we should ask which choice, if any, would make various general statements hold for the new cases as they do for the old. One condition that would be nice, but that we can't make hold for the function x^y when x = y = 0, is continuity as a function of both x and y: If we take x = 0 and let y -> 0+, the limit is 0, while if we take y = 0 and let x -> 0, the limit is 1. On the other hand, if we define positive integer powers of a number x by starting with x^0 = 1, and recursively letting x^{n+1} = x x^n, then this leads to a nice uniform development of the laws of exponents (for nonnegative integer exponents), which requires no exceptions for x = 0. The choice 0^0 = 1 is also convenient in that it allows us to express a polynomial or power series a_0 + a_1 x + a_2 x^2 + ... as Sigma a_n x^n.
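Incidentally, this is also the convention most programming languages build into integer exponentiation, which is what lets a naive "Sigma a_n x^n" evaluation return the constant term at x = 0. A small Python illustration (the polynomial here is an arbitrary example of mine):

```python
# Python adopts the convention 0**0 == 1, so evaluating a polynomial
# term-by-term as Sigma a_n * x**n gives the right value at x = 0:
# the x^0 term contributes a_0 * 0**0 = a_0, and all higher terms vanish.

coeffs = [7, 5, 3]  # arbitrary example: 7 + 5x + 3x^2

def poly(x):
    return sum(a_n * x**n for n, a_n in enumerate(coeffs))

print(0**0)     # 1
print(poly(0))  # 7, the constant term
print(poly(2))  # 7 + 10 + 12 = 29
```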
Note also that in dealing with power series, we never have exponents n that approach 0 via nonzero values (which is the situation in which the discontinuity of 0^y at y = 0 would make trouble), since no integer is closer to 0 than 1; but we can have x approaching 0 via nonzero values (whenever we look at the behavior of our series near 0). So of the two competing definitions of 0^0 that continuity considerations lead to, the choice 1, which makes x^y continuous in x, is more useful than 0^0 = 0, which would make x^y continuous in y. Depending on the area of mathematics, one may find it useful to define 0^0 as 1, or leave it undefined. In the majority of cases, including the study of power series, the definition 0^0 = 1 is by far the most useful. (Incidentally, this does not contradict the statement "0^0 is an indeterminate form". That is not a statement about the value of 0^0; rather, it is a shorthand way of saying that if two functions f(x) and g(x) both approach 0 as x-->a, then this is not enough information to determine lim_{x->a} f(x)^{g(x)}. The reason for this is the fact mentioned above, that no choice of definition for 0^0 makes the function x^y continuous at (0,0).) ---------------------------------------------------------------------- Concerning Stewart's statement on p.741 that in writing power series as \sum a_n (x-a)^n, we make the convention that (x-a)^0 = 1 even when x = a, you ask how it can be justified: "Isn't 0^0 undefined?" Well, whether something is undefined depends on whose definition one follows. Unlike most cases of exponentiation, there is not a completely obvious meaning for 0^0, but I think most mathematicians would agree that the optimal choice -- the definition that gives the most elegant and coherent system -- is to make 0^0 = 1. In particular, this choice is universally agreed on in the context of writing polynomials and power series.
Stewart cautiously says "we have adopted the convention"; but even then, if that is the convention adopted, that's what the symbol means in this course. One likely source of confusion is that "undefined" is sometimes used as shorthand for a statement about limits. It is quite true that if lim_{x->a} f(x) and lim_{x->a} g(x) are both 0, one cannot deduce from this what lim_{x->a} f(x)^{g(x)} is. This is sometimes turned into the slogan "0^0 is undefined", but that is not a statement about 0^0 as an arithmetic operation. Note that in studying power series or polynomials, exponents are always integers, so one never has exponents approaching zero through nonzero values; so the above fact about limits is irrelevant. Evidence pointing to 0^0 = 1 as a reasonable definition is that it allows the following elegant inductive definition of exponentiation: x^0 = 1 for all x, and x^{n+1} = x x^n. (Usually people start the definition of exponentiation with x^1 = x; but under the above definition, x^1 = x can be proved rather than being assumed.) Note that making x^0 the constant function 1 for all x makes the law for differentiating x^n work for n = 1, while leaving it undefined at x=0 would punch a hole in that law. ---------------------------------------------------------------------- You ask about Bessel functions. I don't know much about them myself -- just that they're in the realm of mathematical physics and (advanced) differential equations. Stewart isn't saying that you should know about Bessel functions and their orders, or that we'll learn about them in this course -- in Example 3, p.742, he is simply using this particular function, saying "here is a function of real-world importance ..." to illustrate how to test a power series for radius of convergence. He likewise gives the power series for the Bessel function of order 1 in Exercise 35 to this section. An example that would have led to a computation involving the same idea as Example 3, p. 
742 (how to deal with factorials), but with a simpler computation, would have been the power series for the function e^x. But, ironically, Stewart does not want to give that now, because in section 11.10 we will learn how to find that power series, and he doesn't want to claim here without explanation a result that we will get by computation later on. So instead, he gives us the series for a function that we have not seen before. ---------------------------------------------------------------------- Regarding Example 3 on p.742, where Stewart notes that x^2/4(n+1)^2 --> 0 for all x, you ask "what if x is big enough so that x^2/4(n+1)^2 > 1?" When Stewart writes "--> 0" here, he means "approaches zero as n approaches infinity, with x fixed". Now with x fixed, as n goes to infinity it eventually gets larger than x, larger than 10x, larger than 1000x, etc. So the fraction shown does indeed approach 0. If instead we were talking about the limit as x --> infinity, with n fixed, then the opposite would be true: x^2/4(n+1)^2 would approach infinity. You may well ask, "How do we know that we are talking about a limit as n approaches infinity with x fixed?" The answer lies in the situation we are considering. The concept of a power series involves a set of series, one for each value of x. For each such value of x, we imagine computing the terms of the series using that fixed x, and summing them over n. (Stewart says this in the second sentence of this section (on p. 741): "For each fixed x, the series [1] is a series of constants that we can test for convergence or divergence". I similarly emphasized in class that each value of x gives us a series that we sum, and that we then consider these sums a function of x. But in computing _each_ such sum, we use a fixed value for x. This is just like polynomials: In computing a value of 3x^2 + 5x + 7, we don't use different values of x in the "3x^2" and the "5x".)
So the arrow in "x^2/4(n+1)^2 --> 0" refers to what happens as n --> infinity, with x fixed. Make sense now? ---------------------------------------------------------------------- You ask why the ratio and root tests fail at the endpoints of the interval of convergence of a power series, as stated on p.743. Let's assume, for simplicity, that the series is centered at 0. Now if the power series converges by the root test at a point x, one can deduce that for some constant C and some r with 0 \leq r < 1, the n-th term of the series at x must have absolute value < C r^n. If this is so, let us choose some q\in (r,1). Then we find that at the point (q/r)x, the n-th term of the series will have absolute value < C q^n; so the series will also converge at that point. This shows that an x at which the series converges by the root test cannot have the largest absolute value among the points where the series converges; so such a point is not an endpoint of the interval of convergence. A similar argument shows that a point at which the series diverges by the root test cannot have smallest absolute value among points where the series diverges. The argument for a series centered at an arbitrary point a is similar; the only difference is that instead of multiplying x by q/r, we multiply x-a by that factor; i.e., take a new point whose distance from a is q/r times the distance of x from a. The same statement for the ratio test follows from the fact that if a series converges by the ratio test, it also converges by the root test. Finally, you ask whether this means that at the endpoints of the interval of convergence, the series is always conditionally convergent. No. It can be a series that is absolutely convergent but for which the ratio test fails (e.g., a p-series with p > 1). Or it could be a series which is conditionally convergent. Or it could be a series which is divergent, but for which the ratio test fails. (E.g., the series with terms n^k for any k > 0; or even any k > -1.)
You should be able to give examples of power series behaving in each of these ways at the endpoints of their intervals of convergence! ---------------------------------------------------------------------- You note that in Example 5, p.745 the ratio a_{n+1}/a_n simplifies to (n+1)(x+2)/3n; and you ask how one knows what to do next. In studying the properties of series, the thing one is looking at is how the terms change as n --> infinity. (Everything but n is a "constant" so far as the process of summation is concerned.) Likewise, in using the ratio test, you want to know how the ratio of successive terms changes with n -- whether it approaches some limit you can describe. The two terms depending on n in (n+1)(x+2)/3n are the n+1 in the numerator and the n in the denominator, so one separates them out and sees how they interact. This gives a factor (n+1)/n, which one can see approaches 1, by writing it 1 + 1/n. What is left is the factor (x+2)/3, and one sees that that will be the limit value. ---------------------------------------------------------------------- Regarding the end of Example 5 on p.745, you ask what test for divergence Stewart is using here. Notice that he writes "the Test for Divergence", in capital letters. This shows that "the Test for Divergence" is the name he has given a certain test, and not just a general description. So you can look that name up in the index of the book, and find the test he is referring to. ---------------------------------------------------------------------- You ask why the geometric series in today's reading (e.g., [1] on p.747) start with n=0, while in the earlier development of geometric series, Stewart started with n=1. It's not just about geometric series. In sections 11.1-11.7, Stewart indexed almost all the series he showed starting with n=1, while starting in section 11.8, where he introduced power series, he has begun his series with n=0. 
I assume he started his series in the earlier sections with n=1 because we are used to counting "1, 2, ...", so students would feel it unnatural for the subscripts used in a series to start with 0. But with power series (as with polynomials), we have terms corresponding to different powers of x, with the constant term corresponding to the 0-th power; so it is natural to start with n=0. A given series (and in particular, a geometric series) can be written either way; he has switched his way of writing general series as of section 11.8, and carried the case of geometric series along with this. ---------------------------------------------------------------------- Regarding Example 3, p.748, you ask why we are allowed to move an "x^3" past the \sum sign in a series, but can't move it past the \int sign in an integration. What you can't move past the integral sign is anything that depends on the variable of integration, often denoted x. Similarly, you can't move past the summation sign anything that depends on the variable of summation, typically denoted n. But if you were doing, say, \int xy dy (an integral with respect to y rather than x), where neither x nor y was a function of the other, then from the point of view of that integration, x would be a constant, and you could indeed rewrite the integral as x \int y dy, by the first formula in table [1] on p. 398. Similarly, in the example you ask about, x^3 does not depend on the variable of summation, n (i.e., it represents the same number in every term of the series), so you can move it past the summation sign by rule (i) of Theorem 8 on p. 709. ---------------------------------------------------------------------- Regarding Note 2 on p.749, you ask > ... Does this mean that the whole interval may change when it is > differentiated like from 0 ... if there is a general way to set the error to be less than > something... or will it definitely depend on the form of the series?
In general, it will depend on the form of the series. However, when |x| < R, we know that the absolute values of the terms of the series will always be less than or equal to those of some geometric series with 0 < r < 1, so one can use the fact that the terms will be less than those of that series. But the estimate we get may not be the best estimate there is, and this method won't work at the ends of the interval of convergence, i.e., when |x| = R. By the way, where you write "to set the error to be less than something", the easy way to express this is "to bound the error" or "to get a bound on the error". ---------------------------------------------------------------------- You ask about proving the formula at the bottom of p.753 by induction. Induction is the right way to get a solid proof of the result. Do you see how the induction would work? You can't just say "Assume f^{(k)}(a) = k! c_k" and prove the k+1 case from that -- in the formulas f(a)=c_0, f'(a)=c_1, etc. on that page, each of these formulas isn't deduced from the one before. Rather, it is the formulas [1], [2], [3], [4], each of which is deduced from the one before. So you have to figure out the general formula of which [1], [2], [3], [4] are the cases n=0,...,3, and show how to prove the n=k+1 case from the n=k case; and then deduce f^{(n)}(a) = n! c_n by taking x=a in the n-th case of that formula. ---------------------------------------------------------------------- Regarding the argument leading up to Theorem 5 on p.754, you ask "Why can we assume x = a?" We are assuming formula [1] on p. 753, which states that the given equation is true for all x with |x-a| < R. (Stewart just writes "|x-a| < R" rather than the fuller statement "for all x with |x-a| < R", because by this point, having studied the concept of radius of convergence, we can recognize that this is the kind of condition meant.) It is noted that the consequences [2], [3] and [4] likewise hold for all x in that range.
And a statement true for all x in that range holds in particular for x = a, since a is in the range. That is what Stewart means when he speaks of putting x = a in those formulas. ---------------------------------------------------------------------- You ask about the convention 0! = 1 stated on p.754, just before the Theorem. There are different ways to explain that. One of them is to say that the simplest way of defining n! is not the way you have seen, but to start with 0! = 1, and then say that for every n>0, (*) n! = (n-1)! n. You can check that this will give the values you are familiar with for n = 1, 2, ... . But if we started by defining 0! to be any value C different from 1, then the above rule would not give the values you are familiar with. So to get those values, and have a nice consistent relation between the factorials of different numbers, we need to take 0! = 1. Another is to suppose we know the factorials of all integers greater than 0, and wonder whether we can choose a value for 0! that will relate to them in a natural way. Factorials of successive positive integers are related by formula (*) above, so we try to define 0! to be the value that will make (*) work for n=1; and that gives us 1. Still another way is to note that n! is the number of ways a set of n things can be put in order. For both n=1 and n=0, there is only one order one can use; so both 1! and 0! should be 1. One way or another, the choice 0! = 1 makes (*) and many other mathematical properties of factorials work right for n=0, so that is made the definition. Remember that Stewart has called (*) a "convention". That means a choice that people agree to follow. When a definition that you have used before doesn't give any answer in some case, it is reasonable to make a convention that will define the concept in that case, if the choice one makes has properties that are convenient to reason with. 
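The first explanation above -- start from 0! = 1 and build upward by (*) -- translates directly into a short recursion; Python's own math.factorial follows the same convention (a minimal sketch):

```python
import math

def factorial(n):
    """n! defined by the recursion (*): 0! = 1, and n! = (n-1)! * n for n > 0."""
    if n == 0:
        return 1  # the convention 0! = 1 is the base case
    return factorial(n - 1) * n

values = [factorial(n) for n in range(6)]
print(values)  # [1, 1, 2, 6, 24, 120]
assert all(factorial(n) == math.factorial(n) for n in range(10))
```

Had we started the recursion with any base value other than 1, the familiar values 1, 2, 6, 24, ... would all come out wrong, which is the first argument above in computational form.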
When one makes such a convention, one isn't saying that the old definition gives the new answer; one is agreeing to use an extended definition. ---------------------------------------------------------------------- Concerning Taylor series (p.754) you ask how an infinitely differentiable function could fail to be given by that series -- how it could "grow away from the power series" if all their derivatives are equal. Well, all their derivatives are equal at the point a about which one is taking the Taylor series; but they don't stay the same. Consider the function e^{-1/x^2} (made 0 at x=0) of exercise 74 (p. 766). Its value at 0 is 0; to grow away from this, it must have nonzero derivative as one moves away from 0. But its derivative at 0 is 0, so to get the derivative to have some nonzero value close to 0, it must have a larger value for the second derivative in between. But its second derivative at 0 is also 0, so to get the second derivative to have the above largish nonzero value close to 0, it must have a still larger value for the third derivative in between. And so on. And if you start computing the various derivatives of e^{-1/x^2}, you find exactly this phenomenon: Although they all go to 0 at x=0, they become large at a faster and faster rate very close to 0, as you go to higher and higher derivatives. On the other hand, for most familiar functions, one can prove that the successive derivatives don't grow very fast, so the function is in fact given in a neighborhood of each point by its Taylor series. Incidentally, your intuition that an infinitely differentiable function will be determined by its derivatives at a point turns out to be right (when properly stated) for functions of a complex variable (rather than a real variable). You will see this if you eventually take Math 185. 
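(If you'd like to see this numerically, here is a little Python computation -- my own illustration, nothing from Stewart -- showing that e^{-1/x^2} shrinks faster than any fixed power of x as x approaches 0, which is the numerical shadow of the fact that every Taylor coefficient of this function at 0 is 0, even though the function is not identically zero.)

```python
import math

def f(x):
    # The function of Exercise 74: e^{-1/x^2} for x != 0, defined to be 0 at x = 0.
    return 0.0 if x == 0 else math.exp(-1.0 / x**2)

# The ratio f(x)/x^10 tends to 0 as x -> 0: near 0, f is eventually
# smaller than x^10, or indeed than any power of x.
for x in [0.3, 0.2, 0.1]:
    print(x, f(x), f(x) / x**10)
```

The same experiment with x^20 or x^100 in place of x^10 shows the same collapse of the ratio, just starting at smaller x.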
---------------------------------------------------------------------- Regarding the material on pp.754-758 you ask > What is the difference, or advantage, of writing the sum of the > Taylor series of e^x centered at a=0 versus a=2? If you want to approximate the values of e^x for x near 0, the series centered at a=0 is the useful one; if you want to approximate the values for x near 2, the series centered at a=2 is best. ---------------------------------------------------------------------- You ask about the statement on p.755 that f(x) is the sum of its Taylor series if it is the limit of the T_n(x). Remember that a number is the sum of a series if it is the limit of the partial sums of that series (p. 705, Definition 2). At the bottom of p. 755, the values T_n(x) are defined to be the partial sums of the Taylor series. Putting these two facts together, we get the statement you asked about. ---------------------------------------------------------------------- You ask what happens in the context of Theorem 8 (p.756) for |x - a| not less than R. Stewart's argument really shows that at any point x, f(x) is given by its Taylor series about a if and only if the indicated limit statement holds. He puts in the condition "|x - a| < R" because that is the type of condition that typically determines a region where such a statement holds, and hence the kind of region on which you will typically be applying the theorem. But it is not essential to the theorem. Wherever the limit statement holds, f(x) will equal the sum of its Taylor series. ---------------------------------------------------------------------- Regarding the sentence between the two theorems on p.756, you ask whether we are taking the lim_{n->infinity} R_n (x) = 0 for granted, saying "It would be convenient if this was true" and then proving it in the next part. 
Basically, right; but where you say we prove it "in the next part", the next theorem gives us a tool for proving it, which works for lots of familiar functions. But Exercise 74 (p. 766) shows that it isn't true for all functions. Regarding the second theorem, you ask > Is M a constant? and how is d determined? For any given function f, we hope to find some interval [a-d, a+d] in which |f^{(n+1)}(x)| is bounded by some (not too big) constant M. Then we can apply the theorem. As I said above, being able to do so depends on the function. ---------------------------------------------------------------------- Regarding Theorem 9 on p.756, you ask "What if |f^{(n+1)}(x)|\geq M ?" That theorem says that if f is a function, and a, d and M are constants such that the inequality |f^{(n+1)}(x)| \leq M holds for all x with |x-a| \leq d, then the stated conclusion holds. If f is a function, and a, d and M are constants such that the reverse of the above inequality holds, we could get the reverse of the inequality in the conclusion of the theorem, by roughly the same method I used in class to prove the theorem. But this would seldom be of much use. When one applies the theorem, one's hope is to find a not-too-large M that satisfies the stated conditions; and in fact to get for each n an M_n with that property; and then verify that these make Theorem 8 applicable for appropriate values of x. Getting the reverse inequality might occasionally be of use in showing that a function is not the limit of its Taylor series. ---------------------------------------------------------------------- Regarding the proof of the n=1 case of Taylor's Inequality on p.756, you note > the book says that a < x < a + d... but if |x-d| < d shouldn't > it be a - d < x < a + d ? why are they taking x > a ? The calculations for x > a and for x < a are mirror images of one another. He first does the x > a calculation, then says in the middle of the next page that "similar calculations" handle the case x < a. 
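(Here is a quick Python check of Taylor's Inequality -- my own sketch, not anything in the text -- for f(x) = sin x about a = 0. Every derivative of sin is one of +-sin, +-cos, so we may take M = 1 on any interval, and the inequality says |R_n(x)| \leq |x|^{n+1}/(n+1)!.)

```python
import math

def taylor_sin(x, n):
    # The Taylor polynomial T_n(x) of sin x at a = 0: odd-degree terms up to degree n.
    return sum((-1)**((k - 1)//2) * x**k / math.factorial(k)
               for k in range(1, n + 1, 2))

# With M = 1, Taylor's Inequality bounds the remainder by |x|^{n+1}/(n+1)!.
x, n = 1.3, 5
remainder = abs(math.sin(x) - taylor_sin(x, n))
bound = abs(x)**(n + 1) / math.factorial(n + 1)
print(remainder, bound)   # the remainder should not exceed the bound
```

Increasing n makes both numbers shrink, the bound factorially fast, which is how Theorem 8 then gives convergence of the series to sin x for every x.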
---------------------------------------------------------------------- You note that a few lines after Theorem 9 on p.756 it says that an antiderivative of f'' is f', and that this implies that f'(x)-f'(a)<=M(x-a). You say > I do not really know where this comes from, is there any direct > relationship between the antiderivative that the book mentions in > the past sentence with this equation? Yes! The book says "by Part 2 of the Fundamental Theorem of Calculus". Did you look up (or remember) what Part 2 of the Fundamental Theorem of Calculus says, and ask yourself how it might be applicable to this implication? I am not answering in this way to be difficult, but to point out to you the way you need to read the text in order to do well in a math course. Let me know whether you had already checked out what Part 2 of the Fundamental Theorem of Calculus said, and thought about whether it was relevant, or not; and in either case, how far you get with the connection, and whether you still need help with it. ---------------------------------------------------------------------- You ask whether Taylor's Inequality (p.756) proves that the Taylor series converges to f(x). The M in the hypothesis of Taylor's theorem can depend on n. To prove that the series converges to f(x) for a given f and x, we need to see how this M grows as a function of n. The answer depends on the function f, and the values of a and x. ---------------------------------------------------------------------- You ask how mathematicians come up with non-obvious results like Taylor's Inequality (p.756). Mathematical research is an exciting and difficult endeavor, and everyone has their own ways of coming up with ideas and proofs. The fact that Taylor's inequality is likely to be true is easy to guess: Because the x^0 through x^n terms of the Taylor series are chosen to make T_n(x) have value, first derivative, etc. 
through the n-th derivative matching those of f(x) at x=a, the remainder R_n(x) will have all those derivatives 0 at x=a, so the process by which R_n(x) moves away from 0 can be thought of as one where its (n+1)-st derivative allows its n-th derivative to move away from 0, the n-th derivative allows the (n-1)-st derivative to move away from 0, etc., with the movement of the first derivative away from 0 allowing R_n(x) itself to move away from 0. Now suppose we know a bound M on the absolute value of the (n+1)-st derivative. Intuitively, this process of each derivative allowing the next to move away from 0 should have largest possible effect if the (n+1)-st derivative is everywhere equal to M, or everywhere equal to -M. Taking the former case for simplicity, it's easy to compute, by successive integrations, that in that case, the n-th derivative of R_n will be M(x-a), the (n-1)-st derivative will be M(x-a)^2/2, and so forth, ending up with R_n(x) = M(x-a)^{n+1}/(n+1)!. Since this is what happens in the case where R_n is expected to grow as fast as is allowed by the restriction |R_n^{(n+1)}(x)| \leq M, we can expect that in general, if |R_n^{(n+1)}(x)| \leq M then |R_n(x)| will be \leq that value. But how someone came up with the proof by repeated application of Rolle's Theorem, I can't guess. I myself have two metaphors for math research, as I experience it: (1) Banging your head against a wall until you find a soft spot in the wall. (2) Playing. ---------------------------------------------------------------------- Regarding the computations of Example 2, p.757, you write > ... I am confused about how this proves that the function is equal > to the sum of its Taylor series for all x ... It's because of equation 10 on p. 757. Note the words after that equation, "for every real number x". If the denominator were, say, 5^n, the limit statement would just be true for |x| < 5, but with a factorial in the denominator, it is true for all real numbers. > ... 
and not just a set radius of convergence. ... Recall Theorem 3 on p. 743: a power series may have a finite radius of convergence, or it may converge for all x. Returning to the first part of your question -- did you read the justification carefully, and see the explanation "from Equation 10" preceding the second display of Example 2, p. 757? ---------------------------------------------------------------------- > Is it possible to use a power series to prove or disprove the > rationality of a number? In a roundabout way, yes. Power series lead to the formula [12] for e on p.757. From this we can prove that e is irrational, as follows: Note that if r = a/b is a rational number, where a and b are integers, and we approximate it by some other rational number p/q not exactly equal to a/b, then the error E = a/b - p/q can be written with denominator bq, hence, since it is not zero, its absolute value must be at least 1/(bq). This says that |Eq| is at least 1/b. So for every rational number r, there is a fixed constant c > 0 (namely, 1 over the denominator of r) such that whenever we approximate r by another rational, the error times the denominator of that approximating rational is at least c. On the other hand, if we approximate e by r = 1 + 1 + 1/2! + ... + 1/n!, this is an expression that can be written with denominator n!, and one can show that the error is only slightly more than the next term 1/(n+1)!. Hence the error times the denominator of our approximation is roughly n! . 1/(n+1)! = 1/(n+1). Now as we take n larger and larger, this does not remain \geq any positive constant c. So e cannot be rational. (I've left out the details of why the error is only slightly more than 1/(n+1)!. It's not hard; but to get around that, one can consider, instead, e^{-1} = \sum (-1)^n / n!. Because this is an alternating series of decreasing terms, the error will always be _less_ than the preceding term, and the above argument works without extra details. 
And, of course, knowing that e^{-1} is irrational, we can conclude that e is.) ---------------------------------------------------------------------- You ask how one can prove that the Maclaurin series for e^{2x} found in two ways -- by substituting 2x for x in the Maclaurin series for e^x (Example 3, p.757), and by direct computation using the derivatives of e^{2x} -- must be the same. Well, we know that e^x is given by its Maclaurin series, which means that the equality e^x = \sum x^n/n! holds for all x; hence, substituting 2x for x, we get a power series in x that represents e^{2x}. If we apply Theorem 5 on p. 754 to the case f(x) = e^{2x} and the series found as above, it shows that the coefficients of that series must come from the derivatives of e^{2x} by the formula at the end of that theorem; i.e., that they are the coefficients obtained by direct computation. So the two series must be the same. (In this case, with e^{2x}, it's not particularly hard to get those coefficients directly; but your question could equally be applied to e^{x^2} -- cf. Example 11, p. 762 -- and there it is much harder to do so.) ---------------------------------------------------------------------- You ask why Stewart expands sin x about the particular point pi/3 in Example 7 on p.759. He wants to illustrate the fact that one can use these series, even when the calculation is hairy. The point pi/3 is a natural one, since it represents 60 degrees, and it has a sine and cosine given by simple expressions. If one wants to compute values of a function near a particular point x=a, the power series centered at a is much more useful than series centered at more distant points. It converges more quickly, since the terms (x-a)^n become very small. So for applications one needs to be ready to expand about any a. ---------------------------------------------------------------------- You ask how Stewart gets the formula for tan^{-1} x in the chart on p.762. 
(Incidentally, your question of the day is supposed to specify the page, and item on that page, that you are asking about. I had to hunt around a bit to find what page your question must be referring to.) He got the formula on p. 750, in Example 7. (I'll suggest that he insert here a reference to that example.) ---------------------------------------------------------------------- You ask why, in the chart on p.762, the terms in the series for tan^{-1} x don't have factorials in the denominators, as most of the other series do. Those factorials come in when the series is computed using the formula [7] on p. 754. The power series for tan^{-1} is obtained by a different method in Example 7, p. 750. Of course, it could also be computed using formula [7] on p. 754; but that computation would be messy, and in the end, the factorial in each denominator would interact with another factorial in the numerator, and give the quotient 1/(2n+1). ---------------------------------------------------------------------- You ask whether long division, as in Example 12(b)[13], p.764, is the only way to find the power series for tan x. No. Formula 6 (p. 754) is also applicable to tan x. But there is no simple formula for the n-th derivative of tan x, while there are such formulas for sin x and cos x. Using Formula 6, one can compute as many terms as one wants of the series for tan x; but since there isn't a general formula, it's easier to compute them from the series for sin x and cos x, which do have general formulas. ---------------------------------------------------------------------- Regarding multiplication and division of power series, illustrated in Example 13 p.764, you ask how many terms one should take. Stewart says at the end of the first paragraph that he will "only find the first few terms because the calculations of later terms become tedious ...". Typically, there is no simple formula for the general term, so no "pattern" emerges from further calculation. 
Any number of terms can be computed, and in real life, one would compute however many one wants for one's purposes. Stewart just computes enough to illustrate the method of computation. ---------------------------------------------------------------------- You ask why, as Stewart says on p.769, last sentence of first paragraph, the sequence (T_n(x)) converges more slowly to e^x the farther x is from 0. This is because the terms that are left out of T_n(x), i.e., x^{n+1}/(n+1)! + x^{n+2}/(n+2)! + ..., are larger the bigger x is. More generally, see Taylor's inequality on p. 756, where the larger |x-a| is, the larger (and hence the weaker) the bound on |R_n(x)| is. ---------------------------------------------------------------------- Regarding the discussion of approximating the function e^x on p.769, you ask how we can find an M to use in getting error bounds, given that e^x (and its derivatives, which, as you note, are all e^x) is unbounded on its domain, the real line. We can't find one M that will bound e^x. But for x in any given range, e.g., (-1,1) or (100,1000), there will be upper bounds on those derivatives, depending on the range. So we can get results saying that for x within a given finite range, such-and-such many terms of the Taylor series are enough to approximate e^x to within a given error. ---------------------------------------------------------------------- > P. 769, bottom, part(b) solution of Example 1 ... > ... Why does Stewart point out that the series is not alternating > when x < 8? After the first two terms, the coefficients c_n do alternate in sign. (At the first step, one is differentiating a positive power of x, x^{1/3}, but after that step, one is always differentiating a negative power, e.g., x^{-2/3}, x^{-5/3}, etc., so it always brings in a negative factor.) So one might think that the series becomes alternating after the first term. 
This works for x > 8, since then (x-8)^n is positive; but for x < 8, we have (x-8)^n switching signs at each step, so the products c_n (x-8)^n keep the same signs after the first step, so the series is not alternating. ---------------------------------------------------------------------- Regarding Example 1, p.769, you ask how we know the radius of convergence of the cube root of x around x=8. Actually, Stewart never says that we know the radius of convergence. One can form the Taylor series and ask how closely a given number of its terms approximates a function without knowing that it converges, or, if it does, that the limit is the function. (For an example where, though the Taylor series does converge, it doesn't converge to the given function, recall the function e^{-1/x^2}, whose Taylor series is 0; and this does approximate that function quite closely for very small values of x.) However, we do have the tools to answer your question, and I'm surprised that Stewart doesn't mention this. Since we are thinking of x^{1/3} relative to the point x=8, we can write it as ((x-8) + 8)^{1/3}, and simplify this to 2((x/8 - 1) + 1)^{1/3}. If we make the change of variables y = x/8 - 1, then what we have after the factor 2 is (y+1)^{1/3}, which we can expand by the binomial series. This has radius of convergence 1 as a series in y; hence as a series in x = 8y + 8 it has radius of convergence 8, and interval of convergence (0, 16) possibly with one or both endpoints thrown in. (It does converge at both endpoints, but I won't go into details.) Indeed, in terms of the binomial series, we have 2(y+1)^{1/3} = 2 (1 + (1/3)y + (1/3)(-2/3)y^2/2! + (1/3)(-2/3)(-5/3)y^3/3! + ...) = 2 + (2/3)y - (2/9)y^2 + (10/81) y^3 + ... = 2 + (2/3)(x/8 - 1) - (2/9)(x/8 - 1)^2 + (10/81)(x/8 - 1)^3 + ... = 2 + (1/12)(x - 8) - (1/288)(x - 8)^2 + (5/20736)(x - 8)^3 + ... which matches Stewart's series as far as it goes. 
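(For anyone who wants to check the above computation by machine, here is a short Python sketch -- my own, not from the text -- that builds the partial sums of 2(1+y)^{1/3}, y = x/8 - 1, directly from the binomial coefficients (1/3 choose k), and compares them with the actual cube root.)

```python
def cube_root_series(x, n_terms):
    # Partial sum of x^{1/3} = 2(1+y)^{1/3} with y = x/8 - 1,
    # expanded by the binomial series with exponent 1/3.
    y = x / 8.0 - 1.0
    total, coeff = 0.0, 1.0        # coeff runs through (1/3 choose k)
    for k in range(n_terms):
        total += coeff * y**k
        coeff *= (1.0/3.0 - k) / (k + 1)   # next binomial coefficient
    return 2.0 * total

# For x inside the interval of convergence (0, 16), the partial sums
# approach x^{1/3}:
for n in [2, 4, 6]:
    print(n, cube_root_series(9.0, n))
print(9.0 ** (1.0/3.0))
```

Trying x outside (0, 16), say x = 20, shows the partial sums wandering off instead of settling down, in line with the radius of convergence 8 found above.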
---------------------------------------------------------------------- You ask why, as stated on the last line on p.772, "If v is much smaller than c, then all the terms after the first [in the last formula on the page] are very small when compared with the first term". To say "v is much smaller than c" is to say "v/c is very small", i.e., close to zero; so higher powers of v/c are very small compared with lower powers. (E.g., the 4th power is (v/c)^2 times the 2nd power, so the ratio between them, (v/c)^2, the square of a number very near 0, is very near zero. Putting in moderate-sized coefficients such as 1/2, 3/8 etc. doesn't significantly affect things when v/c is *very* small, e.g., around 10^{-6} in part (b) of this example.) ---------------------------------------------------------------------- You ask how Gauss's "cos \phi = 1" simplification in equations [2] at the bottom of p.773 leads to equation [3] on the next page. When one puts cos \phi = 1, then the expressions under the square root signs in [2] become squares. One can see this for the first of those equations by rearranging the term under the square root sign (after setting cos \phi = 1) as R^2 - 2R(s_0+R) + (s_0+R)^2 = (R - (s_0+R))^2 = (-s_0)^2 = (s_0)^2, or simply by expanding out, and noting that everything cancels but (s_0)^2. Similarly, the expression under the second square root gives (s_i)^2. Hence, the equations in [2] become l_0 = s_0 and l_i = s_i. Now substituting these into [1] we get [3]. (But I don't know how [1] is derived.) ---------------------------------------------------------------------- You ask whether there are common applications of Taylor polynomials in fields other than physics (discussed on pp.772-774). Probably; especially in fields closely related to physics, such as chemistry and astronomy/cosmology. The reason physics is the most obvious place is that it has exact mathematically expressible laws, so one knows when one is replacing these laws by approximations. 
In something like biology, the mathematical models are approximations anyway, so that rather than going from an exact law to an approximation, one simply goes from one model to another. Insofar as chemistry and astronomy are based on physics, they too have exact laws. Still, there may be other areas that I just don't know about. [sent later:] I was just looking online for something else (in relation to another student's question of the day), and I ran into a book where Taylor series approximations are applied to a different field -- finance! See http://books.google.com/books?id=o3w4ilXdIGgC&pg=PA219 ---------------------------------------------------------------------- You ask how Stewart gets equation [3] on p.774. When he replaces cos \phi by 1 in equations [2], the expressions under the square roots become the squares of R - (s_o +R), respectively R - (s_i +R), so those equations simplify to l_o = |R - (s_o + R)| = s_o and l_i = |R - (s_i + R)| = s_i. So in [1], all the denominators l_o and l_i become s_o and s_i, and this gives [3]. ---------------------------------------------------------------------- You ask about the origin of the term "homogeneous" in connection with differential equations, as used on p.1142. I discussed this in class last week. First note that one says that a polynomial in variables w, x, y, z, ... is "homogeneous" if all terms with nonzero coefficients have the same degree; e.g., w^2 - xz is homogeneous because all terms have degree 2. Next, one can talk about a polynomial being homogeneous in some subset of the variables. E.g., if one looks at x^2 - wxyz as a polynomial in all four variables, it is not homogeneous; but if one considers only its dependence on x and y, then all terms have degree 2 in those two variables, so it is homogeneous in x and y. 
Finally, when one considers a linear differential equation P_n(x) y^{(n)} + P_{n-1}(x) y^{(n-1)} +...+ P_1(x)y' + P_0(x) y = G(x), it is most useful to consider it as a polynomial in y, y',..., y^{(n)}, and hence to ask whether it is homogeneous in those variables. All the terms on the left have degree 1 in those variables, while the term on the right has degree 0 in them; so (assuming that the P_m are not all zero, as we must for this to be a differential equation) our equation is homogeneous if and only if the term on the right can be ignored in these considerations, i.e., is zero. ---------------------------------------------------------------------- You ask whether the c_1 and c_2 in Theorem 4 on p.1143 can be taken to be real numbers, or whether we need to use complex numbers. If y_1 and y_2 are real-valued solutions, and we want all real-valued solutions, it suffices to take c_1 and c_2 real. But if we allow complex-valued solutions, then we need to allow c_1 and c_2 to be complex constants. (In particular, in the context of statement [11] on p.1145, if we had somehow discovered the real-valued solutions e^{\alpha x} cos \beta x and e^{\alpha x} sin \beta x directly, then to get all real-valued solutions, it would be enough to combine these using real coefficients, as the solution given there shows. But since the process we used involved starting with the more natural complex-valued solutions e^{(\alpha + i\beta) x} and e^{(\alpha - i\beta) x}, we had to use complex-valued coefficients C_1 and C_2, then derive the expression in terms of e^{\alpha x} cos \beta x and e^{\alpha x} sin \beta x and real coefficients from these.) ---------------------------------------------------------------------- You ask why the key to proving Theorem 3 on p.1149 is showing that if y is a solution to [1], then y - y_p is a solution to the complementary equation. Well, notice that y = y_p + (y-y_p). 
So if y being a solution to [1] makes y-y_p a solution to the complementary equation, then the above displayed equation says that y has the form shown in the theorem. ---------------------------------------------------------------------- You ask whether Theorem 3 on p.1149 also works for n-th order linear equations with constant coefficients. It works for all n-th order linear equations, whether the coefficients are constant or not! (And I would have expected you to have been able to figure this out for yourself -- just try working through the proof that Stewart gives on p. 1149 for the general case, and see whether it works!) ---------------------------------------------------------------------- Regarding the method of undetermined coefficients (pp.1149-1153) you ask whether this can be used when G(x) is some trig function other than sin x or cos x, such as tan x or csc x. Essentially, the answer is no: As I described in class, that method depends on having a finite-dimensional vector space of functions that is taken into itself by the operator d/dx; and the ones that Stewart listed are the only such spaces. (If a space of functions containing tan x is closed under differentiation, it will contain all the derivatives of tan x, and these will yield an infinite-dimensional vector space, showing that the method can't be used.) That said, it is conceivable that there may be ways of using the idea of undetermined coefficients that don't depend on having such a finite-dimensional vector space -- some trick that will give one a space that one is sure will contain tan x, for instance, for some reason other than the one I described. But I don't know any such tricks. ---------------------------------------------------------------------- You ask about the sentence in the middle of p.1151 beginning "If G(x) is a product of the functions of the preceding types...". 
Stewart has talked about three "types" of function G(x): polynomials, functions A e^{rx}, and functions A cos kx + B sin kx. If one now looks at x cos 3x, this is a product of a polynomial of degree 1 and a function "A cos kx + B sin kx" with k=3. So what that sentence means in this case is that you should try something that combines these two sorts of solutions, "polynomials of degree \leq 1" and "functions A cos 3x + B sin 3x", using multiplication. What he shows you is (Ax+B)cos 3x + (Cx+D)sin 3x. Strictly speaking, this is not a product of a function ax+b and a function A cos 3x + B sin 3x; but each of the terms (Ax+B)cos 3x and (Cx+D)sin 3x is such a product, so though his wording is imperfect, I hope the meaning is now clear. ---------------------------------------------------------------------- You ask why, in the method of variation of parameters (p.1153), "we are able to replace constants with an arbitrary function". The constants appear in the solution to the homogeneous equation. We are looking for solutions to a nonhomogeneous equation; so as a "trial balloon" for a solution (which will turn out to work, with a little trickery), Stewart suggests we look at a similar formula with functions instead of constants. I hope to discuss a better way of coming up with that approach in class, when there is time. ---------------------------------------------------------------------- Regarding the development of the method of variation of parameters on pp.1153-1154, you ask why the condition u_1' y_1 + u_2' y_2 = 0 is "valid". I meant to answer this in lecture, but realized, after I left, that I hadn't really. Equation [5] on p. 1154 gives too many ways of representing any function: If a certain function y_p can be represented as y_p(x) = u_1(x) y_1(x) + u_2(x) y_2(x), then for any function v, we could take w_1(x) = u_1(x) + v(x)/y_1(x), w_2(x) = u_2(x) - v(x)/y_2(x), and then we would also have y_p(x) = w_1(x) y_1(x) + w_2(x) y_2(x). 
Thus w_1(x) and w_2(x) would lead to the same y_p as u_1(x) and u_2(x) do. So we need to put some additional restriction on u_1 and u_2 to make them anywhere near unique, so that we can solve for them. Stewart suggests equation [7] as an additional condition that we might put on them. We can't be sure, a priori, that among all the pairs of functions satisfying [5], there will be a pair that also satisfies [7]. But if there is one, then, just because it satisfies [5], it will lead to a solution to our differential equation. And it is plausible that there should be a solution that also satisfies [7], because two equations in two unknowns tend to be right for giving a unique solution. Stewart follows up the consequences of assuming [7], and gets an equation [9] which, together with [7], does indeed uniquely determine u_1' and u_2', allowing one to get u_1 and u_2 by integration. ---------------------------------------------------------------------- You ask why there must exist u_1 and u_2 satisfying equations [7] and [9] on p.1154. In general, two linear equations in two unknowns have a unique solution, so that one can solve for u_1' and u_2' in those equations, and integrate the results to get u_1 and u_2. But the above sentence began with "In general". The situation where it does not hold is if the coefficients in the two equations are linearly dependent: equations ax + by = p, cx + dy = q (where a,b,c,d,p,q are given, and x, y are to be found) don't necessarily have a solution if the pair (a,b) is a scalar multiple of the pair (c,d) or vice versa; equivalently, if the pair (a,c) is a scalar multiple of the pair (b,d) or vice versa. (You should check that these conditions come to the same thing.) In this situation, the latter pairs of coefficients are (y_1(x), a y_1'(x)) and (y_2(x), a y_2'(x)). Suppose the first of these pairs were C times the second for some value x = x_0. 
That would mean that y_1 and C y_2 satisfied the same initial-value conditions at x_0, so they would be the same function, so y_1 and y_2 would not be linearly independent solutions. So in conclusion: if y_1 and y_2 are indeed linearly independent solutions of the complementary equation, then equations [7] and [9] will have a unique solution (u_1',u_2'). ---------------------------------------------------------------------- Regarding Stewart's statement on p.1154, after equation [6], that since u_1 and u_2 are arbitrary functions, we can impose two conditions on them, you ask how they can be arbitrary functions when they are the things that we are trying to solve for. What Stewart means is, roughly speaking, "If we consider two functions (without any restrictions other than those we are about to impose), we can determine them by imposing two conditions on them". That is a very vague statement, and under many possible interpretations it will be wrong, but the idea is right -- "two unknowns will be determined by two conditions" (even when they are unknown functions). Then the point that he makes is that the problem we have set ourselves imposes one condition, but that is not enough to determine the functions (I gave examples of that in class). There will in general be lots of pairs of functions u_1, u_2 that give the same y_p, but we don't know how to find even one of them. So, his idea is, let us choose a second condition, so as to make u_1 and u_2 unique, and then solve those two equations to find them. If we chose the other condition at random, we might end up with a system of equations that would be as difficult to solve as the original equation. But a lucky or inspired choice will make things much easier. As I said, I will, when I have the chance, give what I hope is a more insightful way of discovering this technique. 
You also asked whether this same method could be applied to first-order linear differential equations, treating those as second-order equations with the coefficient a equal to 0. No, because the technique involves working with the two linearly independent solutions of the differential equation, and when a = 0 it has only one. However, if we look at the underlying idea, which was, in the second-order case, to take a general linear combination u_1 y_1 + u_2 y_2 with nonconstant coefficients, then in the first-order case, that says to start with a solution y_1 of the homogeneous equation, and then take an expression u_1 y_1 for our candidate solution to the nonhomogeneous equation. And that is exactly what we were led to do in section 3 of the handout on differential equations. In the first-order case, we have just one unknown function, u_1, and one condition that it should satisfy, so we don't need to introduce another condition "out of a hat"; so there wasn't any mysterious choice to make. (In the opposite direction, in solving a third-order linear differential equation, the method of variation of parameters gives us an expression u_1 y_1 + u_2 y_2 + u_3 y_3, and so we have to supplement our original differential equation with two conditions taken "out of a hat".) ---------------------------------------------------------------------- You ask why the force of gravity doesn't make a difference between the situations of Figure 1 and Figure 2 on p.1156. If we assume an ideal spring -- one that satisfies "restoring force = -kx" without change of k over the full range of values of x through which it moves in our problem -- then a constant force (such as the weight of the mass) will not change the frequency of vibration.
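Here is a quick numerical check of that claim (my own toy numbers, not Stewart's): with m = 2, k = 8 and a constant force A = 6, the function x(t) = A/k + cos(\sqrt{k/m} t) still satisfies m x'' + kx = A, and its frequency \sqrt{k/m} does not involve A at all.

```python
import math

# Toy check that a constant force A only shifts an ideal spring's motion
# by A/k, leaving the frequency sqrt(k/m) unchanged.  (My own numbers.)
m, k, A = 2.0, 8.0, 6.0
omega = math.sqrt(k / m)  # frequency; note that A does not appear

def x(t):
    # a solution of m x'' + k x = A: oscillation about the shifted rest point A/k
    return A / k + math.cos(omega * t)

# verify the differential equation at a sample time, by central differences
t, h = 0.7, 1e-4
x_dd = (x(t + h) - 2 * x(t) + x(t - h)) / h ** 2  # approximate x''(t)
residual = abs(m * x_dd + k * x(t) - A)
print(residual)  # ~0, up to round-off
```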
Stewart makes the difference between the two situations disappear completely by measuring x relative to the equilibrium position of the hanging spring, i.e., the position at which the force of gravity balances the restoring force of the spring, and calling that x = 0. If, instead, we let x = 0 be the position at which the spring would rest if gravity were not affecting it, then the spring equation becomes m d^2 x / dt^2 + kx = A, where A is the weight of the mass. This is a nonhomogeneous differential equation; it is easy to see that a particular solution is the constant function x = A/k, so the general solution is gotten by adding the general solution of the homogeneous equation to that constant; so it has the same frequency as the solution of the homogeneous equation that Stewart gets. In fact, Stewart's "x" is just the above "x" minus A/k. ---------------------------------------------------------------------- You ask about buoyant force in the situation of Figure 3, p.1157. Nice point! When Stewart talks about the "natural length" of the spring on p. 1156, he is taking account of the weight of the object attached at the end of the spring, if that weight is hanging vertically. (So the same spring would have different "natural lengths", i.e., equilibrium positions, in the situations of Figure 1 and Figure 2.) In the situation of Figure 3, the buoyant force would be a constant, which would cancel part of the weight of the mass, and so lead to still a different "natural length". But it wouldn't affect the spring constant or the damping force; so taking x to be the number of units the spring is stretched from that new natural length, the discussion at the bottom of p. 1157 is still valid. ---------------------------------------------------------------------- You ask how on p.1158, case II, Stewart gets -c/2m as the common value of the two roots r_1 and r_2 of equation [3]. The quadratic formula gives the roots as (-c +- \sqrt{c^2-4mk})/(2m).
If the roots are equal, then c^2 - 4mk = 0, and that formula simplifies to -c/2m. (However, using the equation c^2 - 4mk = 0, we can rewrite that value in other ways; e.g., -\sqrt{k/m}.) ---------------------------------------------------------------------- Regarding the discussion of an underdamped vibrating spring on p.1158, you note that > the equation implies it will keep vibrating forever, just at > lower and lower amplitudes. Is this the case, or does something > physically stop the vibration at some point? In real life springs > appear to stop, ... Well, since the decay of the vibration is exponential, it would rather quickly decrease below the point where it could be observed. E.g., if the amplitude decreased by a factor of 10 in one minute, then in 10 minutes it would decrease by a factor of 10,000,000,000. Very soon it would be less than the motion added by random collisions of air molecules, etc.. That said, it may very well be that before that point, various nonlinearities in the accurate description of the mechanics of very small motions would become more significant than the terms of the equation we have been using. For instance, though friction is described as proportional to the velocity, I suspect that for many common interfaces between solids, there is a small constant term, so that if one imposes a force smaller than what is needed to overcome that constant term, the object does not slide. ---------------------------------------------------------------------- You ask about the equation \omega =\sqrt{4mk-c^2}/(2m) near the bottom of p.1158. 
After writing "Here the roots are complex:", Stewart gives you the result of applying the quadratic formula to equation [3]; but instead of writing the result as one big expression divided by 2m, he first writes out the non-square-root part divided by 2m, and then, since the number under the square root sign in the quadratic formula, c^2 - 4mk, is being assumed negative, he writes that square root as \sqrt{4mk-c^2} i. Instead of showing this expression in the formula for r_1 and r_2, he has called the result of dividing \sqrt{4mk-c^2} by the denominator 2m "\omega"; so the formula he gives for the roots involves +-\omega i, and then, on the next line, he says what he means by \omega. (The word "where" signals that the second formula is a qualification of something that precedes.) He has chosen the symbol \omega for \sqrt{4mk-c^2}/(2m) because \omega is commonly used for frequency, expressed in radians-per-unit-time, as also on p. 1156. ---------------------------------------------------------------------- You ask why, in the formulas for electric circuits on p.1160, the capacitance C occurs in the denominator, while resistance and inductance occur as coefficients. One could say that it is arbitrary that we measure capacitors with the number that we use, rather than its reciprocal. If we used that reciprocal, it would go in the numerator where we now put C in the denominator. Likewise if we measured resistors and/or inductances with the inverse of the numbers we now associate with them, then these would go in the denominator rather than the numerator. (The inverse of resistance has a name, "conductance" -- and where the unit of resistance is the ohm, that of conductance is called the "mho". Looking online, I find that the inverse of capacitance also has a name, "elastance"; but it is little used.) 
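To see what the alternative convention would look like, here is the series-circuit equation first in its usual form, and then rewritten with the reciprocal quantity (I use S for the elastance 1/C; the choice of symbol is my own):

```latex
L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + \frac{1}{C}\,Q = E(t)
\qquad\text{versus}\qquad
L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + S\,Q = E(t),
\qquad S = \frac{1}{C}.
```

So measuring capacitors by elastance would make all three circuit elements appear as coefficients; it is only the convention of measuring them by capacitance that puts C in a denominator.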
I would guess that the reason the usual measures are used for resistors, inductances, and capacitors, rather than their reciprocals, is the following: In an electric wire, resistance and inductance are generally negligible. To get significant resistance, one puts in a different, poorer, conductor in place of the copper wire; and to get significant inductance, one puts in a large coil. So the effects of these are measured by how much they change things from the default situation; i.e., how much they push back against the free passage and change of the current. On the other hand, a capacitor represents a break in the electrical circuit, and ordinarily, negligible current can pass a break. But by putting two plates very very close together, one can allow a little current to flow in and out, with balancing charges building up on the two plates. So in this case, it is allowing current through that changes the default situation; and the measure of the capacitor is the extent to which it does this. (It's easy to make a capacitor that lets through very little current; harder to make one that lets through more.) ---------------------------------------------------------------------- Regarding Example 1 on p.1164, you ask > Would it be possible to use power series to solve the equation > $y''+y=0$ without reindexing Equation 3 so that it looks like > Equation 4? Does Equation 3 need to be reindexed to compare > it to Equation 2? The essential thing is to remember that if two power series are equal, then for each n, the coefficients of x^n in those series must be the same. So in Example 1, for each n, you need to find the coefficient of x^n in y'' + y, and set it equal to 0. If you are comfortable eyeballing equation 3 and seeing that the coefficient of x^n in y'' is (n+2)(n+1)c_{n+2}, and concluding that the sum of that with the coefficient of x^n in equation 1, namely c_n, is 0, that's fine.
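If you want to check that this coefficient-matching really works, here is a little computation (my own, not Stewart's) carrying out the recurrence c_{n+2} = -c_n / ((n+2)(n+1)) that results from setting the coefficient of x^n in y'' + y equal to 0; starting from c_0 = 1, c_1 = 0, it reproduces the Taylor coefficients of cos x:

```python
from math import factorial

# Recurrence from matching coefficients of x^n in y'' + y = 0:
# (n+2)(n+1) c_{n+2} + c_n = 0.  Start from c_0 = 1, c_1 = 0.
c = [1.0, 0.0]
for n in range(10):
    c.append(-c[n] / ((n + 2) * (n + 1)))

# Taylor coefficients of cos x for comparison: 1, 0, -1/2!, 0, 1/4!, ...
cos_c = [(-1) ** (n // 2) / factorial(n) if n % 2 == 0 else 0.0
         for n in range(12)]
print(max(abs(a - b) for a, b in zip(c, cos_c)))  # essentially 0
```

(Starting instead from c_0 = 0, c_1 = 1 would give the coefficients of sin x, so the two together give the general solution, just as in the example.)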
The method of re-indexing is simply a convenient reliable way of doing this on paper. ---------------------------------------------------------------------- Regarding the technique of series solutions to differential equations developed on pp.1164-1167, you ask > ... whether there would ever be a different assumption made > than y= summation of c*x^n? ... Yes. For instance, if the coefficients of the equation involved x^{1/2}, it would be reasonable to expand as a series in terms x^{n/2}. In some situations, if one had reason to believe that the solution would have a pole at x = 0, one might use a series like c_{-1} x^{-1} + c_0 + c_1 x + c_2 x^2 + ... . And, for a much more trivial variant of the Sigma c_n x^n expansion, in some situations one might want to expand in powers of x-a for some nonzero a. ---------------------------------------------------------------------- You ask about the blue equation in the left margin of p.1166. The two sides of that equation differ in that the right-hand side includes a summand with n=0, while the left-hand side leaves that out. But in the summand with n=0, the factor n is 0, so that summand is 0; so leaving it out makes no difference. ---------------------------------------------------------------------- Your question of why, in equation [8] on p.1167, Stewart shows the first few terms of the series outside the summations, is an interesting one! There is a convention that mathematicians like, but students find difficult; if Stewart had stated and used this convention, then he would have been able to incorporate those first few terms into the summations; but it might have made the formulas more difficult for the average student. The convention is that when one gives a formula involving a product of k factors, p_k = f_1 f_2 ... f_k, then the k = 0 case, i.e., the product p_0, is understood to be 1. This makes sense, because then each term is related to the next by p_k = f_k p_{k-1}, even when k = 1. 
You have seen special cases of the convention in the formulas x^0 = 1 (since x^k means a product of k factors each equal to x) and 0! = 1 (since k! means 1 . 2 . ... . k). But many students feel that a product of 0 factors should be 0 rather than 1; hence Stewart does not introduce the convention. (0 is what is called the "neutral element" for addition, meaning that x + 0 = x, while the neutral element for multiplication is 1, since x . 1 = x. So where it makes sense to define the sum of no terms to be 0, the analogous choice for the product of no factors is indeed 1.) If Stewart had introduced this convention, that would have brought the term 1/2! x^2 from the first line of [8] and the term x from the second line into the summations. But what about the term 1 of the first line? That takes a little more thought. Notice that successive factors in the numerator of the general term of the summation on that line differ by 4. So if one wants a factor before "3", it would be "-1". If we change the minus-sign before the summation to a plus sign, we can write that numerator as a product of n factors, (-1) . 3 . ... . (4n-5). Then the term -1/2! x^2 fits in naturally as the n=1 case, without calling on the "product of no factors" convention, while the n=0 case would be correctly given as 1 by that convention. (Of course, sometimes it may happen that a few terms of a series are not given by the same rule as the rest. In such cases, they would really have to be written separately. But that's not the situation here.) ---------------------------------------------------------------------- You ask where the 4n-5 in the display beginning "c_{2n} = ..." on p.1167 comes from. It comes from equation [7] on the preceding page, with the index readjusted to give a formula for c_{2n}, which is what we want here. That requires using 2n-2 in place of n in the formula. Then the 2n-1 in the numerator becomes 2(2n-2)-1 = 4n-5.
In [7], the expression with 2n-1 in the numerator is multiplied by c_n; so in this "even coefficient" formula, the 4n-5 is multiplied by the numerator of the preceding even coefficient, the "3 . 7 . 11 . ... .". Stewart probably didn't expect students to think it through this way; the approach he is suggesting seems to be "Compute the first few coefficients, not multiplying them out, but keeping them as products, since that will show the pattern that is developing. We see that the extra term that gets multiplied into the numerator in the successive even coefficients c_{2n} -- for n = 2, a factor of 3, for n = 3, a factor of 7, for n = 4, a factor of 11 -- is increasing by 4 each time, so it must have the form 4n+C, and looking at any one of these cases, we find C=-5, so that's what we write down." ---------------------------------------------------------------------- You ask about Stewart's statement in Note 4 on p.1168 that in the case he is considering, all the even coefficients will be zero. First, do you understand what he means by "even coefficients"? (It's a slightly shorthand way of speaking. You can probably guess correctly what he means, but let me know just to be sure.) Assuming you do understand, go to the formula he found for the general solution of this differential equation on the preceding page, substitute for c_0 and c_1 the values he gives just before the statement about even coefficients, and see what you get as even coefficients when you make this substitution. Let me know the answer you get, or if you run into any difficulty, let me know what it is. ---------------------------------------------------------------------- You ask, regarding the proof of Law 4 on p.A39, "why is |g(x) - M| < epsilon / 2(1+L) ?" I hope that what I said in lecture answered this for you. The inequality that you refer to is not a fact that we deduce.
Rather, the definition of "lim_{x -> a} g(x) = M" tells us that _we_can_find_a_delta_ such that, whenever x is at distance < delta from a, the above inequality holds. In fact, it tells us that we can find a delta which, in the same way, ensures any degree of closeness between g(x) and M that we want. We decide that we want exactly that degree of closeness. Why do we choose that degree of closeness? That is what I was sketching when class ended. The idea is to think strategically, "What degrees of closeness of f(x) to L and of g(x) to M can together put f(x) g(x) within epsilon distance from LM?" That strategic thinking led us to choose epsilon / (2(1+|L|)) as the degree of closeness of g(x) to M that we wanted. ---------------------------------------------------------------------- You ask why we need min{\delta_1,\delta_2,\delta_3} on p.A40, line 3, concluding "... isn't it logical to just choose the lowest \delta?" Choosing the lowest value is exactly what the "min" function does. In a concrete situation with particular f, g etc., we can see which is smallest, and use it. This abstract proof covers all such concrete situations, and the same one will not be smallest in all situations, so we express the choice of the least by "min". ---------------------------------------------------------------------- You ask how, near the top of p.A40, $|L| (\epsilon / (2 (1 + |L|)))$ "is reduced to" $\epsilon / 2$. Stewart does not assert that they are equal: note that the formulas you are comparing are not connected by "=" but by "<". The argument is that since $2(1+|L|) > 2|L|$, when they occur in the denominators (with a positive numerator), we have $numerator / (2(1+|L|)) < numerator / (2|L|)$. After noting this, and the fact that the numerator contains $|L|$, one can, as you suggested, cancel the $|L|$ in the numerator with the $|L|$ in the denominator, getting $\epsilon/2$.
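The bounds chosen in the proof of Law 4 can be spot-checked numerically. The following sketch is entirely my own; here "f" and "g" stand for sample values of f(x) and g(x) that obey the two degrees of closeness the proof imposes, and we confirm that the product then really lands within epsilon of LM:

```python
import random

# Spot-check of the strategy in the proof of Law 4: if
# |f - L| < min(1, eps/(2(1+|M|))) and |g - M| < eps/(2(1+|L|)),
# then |fg - LM| < eps.  (Proof: |fg - LM| <= |f||g-M| + |M||f-L|,
# and |f| <= |L| + 1.)
random.seed(0)
violations = 0
for _ in range(1000):
    L, M = random.uniform(-5, 5), random.uniform(-5, 5)
    eps = random.uniform(0.01, 2.0)
    df = min(1.0, eps / (2 * (1 + abs(M)))) * random.uniform(-0.999, 0.999)
    dg = eps / (2 * (1 + abs(L))) * random.uniform(-0.999, 0.999)
    f, g = L + df, M + dg
    if abs(f * g - L * M) >= eps:
        violations += 1
print(violations)  # 0
```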
---------------------------------------------------------------------- Regarding the third and fourth displayed lines on p.A40, you ask how we know that |L| . \epsilon/2(1+|L|) < \epsilon/2. As I described in class, the "1+" was only put into the denominator so that the fraction would not be undefined if L happens to be 0. If L is nonzero, we have |L| . \epsilon/2(1+|L|) < |L| . \epsilon/2|L| = \epsilon/2. If L=0, then |L| . \epsilon/2(1+|L|) = 0 < \epsilon/2. ---------------------------------------------------------------------- Regarding the proof of Limit Law 5, p.A40, you ask "Why is |g(x)-M| < |M|/2 ?" I can't tell from your question whether you had read the first part of the sentence, saying "... there is a number \delta ... such that whenever 0 < |x-a| < \delta we have". In view of that phrase, what Stewart is saying is not that "|g(x)-M| < |M|/2" is something that automatically holds, but that it is something we can cause to hold, by taking x close enough to a. Given this, your question can be broken in two: "Why do we want to cause |g(x)-M| < |M|/2 to hold?" and "How do we know that we can cause |g(x)-M| < |M|/2 to hold?" Let me know whether you need help with either of these questions. (Hints: The answer to the second question is based on the first part of the sentence. The answer to the first part lies in the remainder of the proof -- what we subsequently use the inequality |g(x)-M| < |M|/2 for.) ---------------------------------------------------------------------- You ask about the step |g(x) - M| < |M|/2 in the proof of Limit Law 5, p.A40. The first thing to understand is that we are not proving that this inequality holds; we are noting that we can _cause_ it to hold, by restricting x to be sufficiently close to a. I hope it's clear to you what Stewart is saying in the sentence leading up to that inequality. Assuming you understand this, your question is "Why do we want to make |g(x) - M| < |M|/2?"
The answer is that we want to make the fraction | (M-g(x)) / Mg(x) | small (namely, < epsilon). Now a fraction will be "small" if and only if the numerator is much "smaller than" the denominator. (E.g., a fraction a/b will have absolute value < 1/10 if and only if the numerator is less than a tenth the size of the denominator in absolute value.) In the above fraction, M is fixed, but the numerator and denominator both depend on g(x), so we want to see whether for x close enough to a, we can make g(x) assume values making the numerator "much smaller than" the denominator. The way we will do this is to make the numerator "very small" without letting the denominator get "very small". So first we keep the denominator from getting arbitrarily small, by noting that g(x) is approaching the nonzero value M; so if we make g(x) close enough to M, it will have absolute value near |M|. Specifically, we can get that absolute value to be at least |M|/2 by making |g(x) - M| < |M|/2 (the inequality you asked about). Then, once we have the fixed lower bound |M|/2 on |g(x)|, and hence the lower bound |M|^2/2 on the absolute value of the denominator of our fraction, we can figure out how small we need to make the numerator to get the whole fraction to have absolute value < epsilon. That's what Stewart achieves on the next page, in the sentence starting with "Also". (Incidentally, we didn't _have_to_ make |g(x)| at least |M|/2 in the above proof. We could have chosen any value <|M|; e.g., we could have made it at least 3|M|/7 by choosing |g(x) - M| < 4|M|/7. But |M|/2 was just the simplest choice of a number strictly between 0 and |M|.) ---------------------------------------------------------------------- You ask how, at the bottom of p.A40, |M|=|M-g(x)+g(x)| \leq |M-g(x)| + |g(x)| < |M|/2 +|g(x)| is obtained from |g(x)-M| < |M|/2 . The relation |g(x)-M| < |M|/2 is applied to get the final "<". Note that |g(x)-M| is the same as |M-g(x)|.
So where the middle term of the long formula above contains |M-g(x)|, this is the same as |g(x)-M|, and hence, by the shorter formula, it can be replaced by |M|/2, with a "<" inserted. The first step in the long formula, on the other hand, is the triangle inequality. Calling M-g(x) "A" and g(x) "B", it is the statement |A+B| \leq |A| + |B|. ---------------------------------------------------------------------- You asked how the assumption f(x) \leq g(x) was used in the proof of Theorem 2 on p.A41. It is used in the very last two lines of the page. Up to that point, the author has proved that if L were > M, then there would be some \delta such that in a certain interval around a, we would have g(x) < f(x). He now uses the assumption that g(x) \geq f(x) to conclude that this is not true (since g(x) \geq f(x), it can't also be < f(x)). So the assumption L > M can't be true; i.e., L \leq M, as desired. ---------------------------------------------------------------------- You ask about Stewart's statement in the proof of Theorem [2] on p.A41 that L - M > 0 "by hypothesis". What he means is "because we have assumed that L > M" (at the beginning of the proof). When I write him at the end of the semester, I'll suggest that he change this wording. ("By hypothesis" usually means "by the assumptions of the theorem".) ---------------------------------------------------------------------- You ask, in connection with the theorem that the inverse of a one-to-one continuous function on an open interval is continuous (p.A42), whether a function on an open interval can have continuous inverse without being continuous and one-to-one. A function has to be one-to-one for its inverse to be well-defined. On the other hand, if f is one-to-one but not continuous, it can still have a continuous inverse, if we understand the domain of the inverse to be the range of f, even if that is not an interval.
For instance, let f, with domain (0,2), be defined to have f(x) = x if x lies in (0,1], and f(x) = x+1 if x lies in (1,2). Then the range of f is the union of the intervals (0,1] and (2,3). On this range, the function f^{-1} takes a point y of (0,1] to y, but takes a point y of (2,3) to y-1. You can check that this is continuous at each point of its domain. Incidentally, in the above example, f was an increasing function. But if we change our definition by making f(x) = 4-x (instead of x+1) for x in (1,2), we find that the function is still one-to-one, and that its inverse is still continuous on the range of f, and that range is still the union of (0,1] and (2,3); but f is no longer everywhere-increasing or everywhere-decreasing. ---------------------------------------------------------------------- You ask about the meaning of the statement on the 5th-from-last line on p.A42 that f "maps" the numbers in one interval onto the numbers in a certain other interval. When one says that a function f maps a point x to a point y, this simply means f(x) = y; "maps" is synonymous with "takes" or "carries". So Stewart is saying that f takes the points of one interval onto the points of the other. A key word is "onto" (which I'm afraid he doesn't explain -- I've now made a note to recommend that in future editions, it be made clear). To say that a function f maps a set X onto a set Y means that not only is the image f(x) of every element x\in X a member of Y, but every member of Y occurs as the image of an element of X. (If we merely know the former statement, then we say that f maps X "into" Y; "onto" expresses both facts. So, for instance, the squaring function carries [-1,1] *into* both [0,1] and [0,2]; it carries [-1,1] *onto* [0,1], but not onto [0,2].) 
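Since "into" vs. "onto" is the crux, here is a tiny computational illustration (mine) of the squaring example above: sampling [-1,1] finely, every square lies in [0,1] (hence also in [0,2]); a value like 0.49 in [0,1] is attained (at x = 0.7); but nothing comes anywhere near 1.5, so [-1,1] is not carried onto [0,2]:

```python
# squares of a fine sample of [-1, 1]
xs = [i / 1000 for i in range(-1000, 1001)]
squares = [x * x for x in xs]

into_01 = all(0 <= s <= 1 for s in squares)        # lands in [0,1], so also in [0,2]
hits_049 = any(abs(s - 0.49) < 1e-9 for s in squares)    # 0.49 = 0.7^2 is attained
hits_near_15 = any(abs(s - 1.5) < 0.4 for s in squares)  # nothing within 0.4 of 1.5
print(into_01, hits_049, hits_near_15)  # True True False
```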
In the sentence in question, the fact that f is increasing forces it to carry (x_0-\epsilon, x_0+\epsilon) into (f(x_0-\epsilon), f(x_0+\epsilon)) -- it can't fall below the starting value or exceed the end-value -- while the Intermediate Value Theorem shows that it will actually map it onto (not merely into) that interval. You also ask how Stewart asserts on the second line of the next page "... that |y - y_0| < \delta, and therefore |f^-1(y) - f^-1(y_0)| < \epsilon, ..." No, the statement in the second line of p.A43 is "*if* |y - y_0| < \delta, *then* |f^-1(y) - f^-1(y_0)| < \epsilon". He is not saying that the former is true and hence the latter is true, but that every y which satisfies the former inequality satisfies the latter. To see that this is true, note that |y - y_0| < \delta means y\in (y_0-\delta, y_0+\delta), and |f^{-1}(y) - f^{-1}(y_0)| < \epsilon states that f^{-1}(y) is in a similar interval. So take these statements about what is in what interval, and compare them with what is proved toward the end of the preceding page. ---------------------------------------------------------------------- You ask about the formula |g(x)-b| < delta_1 in the last display of the proof of Theorem 8 on p.A43, and in particular, why it has delta_1 where you would expect an epsilon. There is nothing sacred about the letters "epsilon" and "delta". It is simply convenient to write delta for our bound on the deviation of the input of a function whose limit behavior we are studying, and epsilon for our bound on the deviation of the output. That convention helps us keep track of what we are doing. But in this proof, the input of f is the output of g, so whichever name we give to the number needed at that point, it can't fit the convention with regard to both f and g.
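To see that chaining at work, here is a concrete instance (my own choice of functions, not Stewart's): take f(y) = y^2, continuous at b = 3, and g(x) = 2x + 1, which approaches 3 as x -> 1. To get |f(g(x)) - f(3)| < epsilon we first find a delta_1 that works for f at b, and then a delta that keeps g(x) within delta_1 of b; so delta_1 plays the role of "epsilon" for g, just as in the proof:

```python
# Chaining tolerances through a composition f(g(x)), with
# f(y) = y^2, g(x) = 2x + 1, a = 1, b = 3.  (My own example.)
eps = 0.01
delta_1 = eps / 7     # works for f near 3: |y^2 - 9| = |y-3||y+3| < 7|y-3| there
delta = delta_1 / 2   # works for g: |g(x) - 3| = 2|x - 1|

for x in [1 + 0.9 * delta, 1 - 0.5 * delta, 1 + delta / 3]:
    y = 2 * x + 1
    assert abs(y - 3) < delta_1        # g lands within delta_1 of b ...
    assert abs(y ** 2 - 9) < eps       # ... hence f(g(x)) within eps of f(b)
print("ok")
```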
In any case, I hope you read the whole sentence, and not just the formula, so that you saw that it was not asserting that "|g(x)-b| < delta_1" in general, but only that there exists a delta such that for all x within delta distance of a, "|g(x)-b| < delta_1" holds. That there exists such a delta follows from the definition of the statement that lim_{x->a} g(x) = b. Namely, since that statement begins "for every epsilon > 0", we can, in particular, apply that statement with the delta_1 we have found in the role of that epsilon. ---------------------------------------------------------------------- You ask two questions about the proof of Theorem 8 on p.A43. Let's start with the second: > .. And why is it necessary to introduce the y variable into the proof? We are looking at the function f(g(x)); so the output of g gets fed into f as its input. In discussing this situation, we need to use the fact that f is continuous at b, which is a statement about the relationship between inputs and outputs of f in general. We could call the input of f "x" in that discussion, but this would be confusing, because when we use g(x) as the input of f, this is different from the "x" that is the input of g. So it is better to use a different letter for any values that occur as inputs of f, in particular, those arising as outputs of g. Now to your first question: > Why is it that if 0 < |x-a| <\delta then |g(x)-b| <\delta_1 ? Stewart is not saying that this is true for any old \delta; if you look at the line before the formula you are asking about, you see that Stewart says "there exists \delta > 0 such that". If, after thinking about the point, you still have a question about it, then ask again, making clear that you understand what it is Stewart is asserting. (And if you did understand that, and had made it clear in your question, then I could have addressed it to begin with. 
One way or the other, there's something to be learned from this -- either that in looking at Stewart's formulas, you need to read them in the context of the sentence containing them, or that in formulating your question of the day, you need to make clear how much you do understand before getting to the point that you don't understand.) ---------------------------------------------------------------------- You ask why, in the final lines of the proof of Theorem 8 on p.A43, we can say that |g(x)-b| < \delta_1 implies |f(g(x)) - f(b)| < \epsilon. This is because \delta_1 was chosen to satisfy the third display in the proof: "if 0 < |y-b| < \delta_1 then |f(y)-f(b)| < \epsilon". Stewart is applying this statement with g(x) in the role of y. However, as I said in class, Stewart appears to be neglecting the condition "0 < |y-b|" in that display. But in this situation, the restriction "0 < |y-b|" can be dropped from that display, because if 0 = |y-b|, i.e., y=b, then f(y)=f(b), so |f(y)-f(b)| = 0 < \epsilon. ---------------------------------------------------------------------- Regarding the proof that tan theta > theta on pp.A43-A44, you ask > After the step |PQ|<|RT|<|RS|, how does Stewart arrive at > L_n<|AD|=tan(theta)? The sum of the terms of the form "|PQ|" is L_n, the sum of the terms of the form "|RT|" is |AD|, and |AD| = tan theta. In the first assertion, you should understand that Stewart's picture just shows the case where n=3, and he has only labeled one of the secants to the circle "PQ"; but in the situation he is talking about, n can be any positive integer, and "PQ" represents any one of the n secants (which include, in his picture, the two labeled AP and QR). So we see that if we add them all up, we get the length L_n of the inscribed "polygon" (string of segments -- in his picture, APQB.)
Similarly, in the situation he is talking about, "RT" represents any one of the n pieces into which the side AD is divided (including, in his picture, those labeled AR and SD); so when we add these up we get |AD|. I hope the equation |AD| = tan theta is clear. ---------------------------------------------------------------------- You both ask about Stewart's statement that to prove Cauchy's Mean Value Theorem we "change" the function h(x) given in equation [4] on p. 286 to the h(x) given on p.A45. What he means is that we take the proof of the ordinary Mean Value Theorem on p. 286, and where that theorem uses the function h(x) defined at the top of that page, we instead use the function h(x) defined on p. A45. He leaves it to you to verify that the same arguments used on p. 286, when applied to the new function, will give the new version of the Mean Value Theorem. Probably "change" was not the best choice of words; it might have been better to say "use ... instead of ...". (Note that the h(x) on p. 286 is a special case of the one on p. A45: it is the case where g(x) = x. Similarly, the old Mean Value Theorem is the case of Cauchy's version where g(x) = x.) ---------------------------------------------------------------------- You ask what the function f(x) - f(a) - ((f(b)-f(a))/(g(b)-g(a)))(g(x) - g(a)) in the proof of Cauchy's Mean Value Theorem on p.A45 represents. It measures the failure of the point (f(x),g(x)) to lie on the line connecting (f(a),g(a)) and (f(b),g(b)). This is easier to see if we replace the variable x by "t", and think of the curve given by the values of f and g as the parametrized curve x = f(t), y = g(t). Then the expression (x - f(a)) - ((f(b)-f(a))/(g(b)-g(a)))(y - g(a)) measures the failure of any point (x,y) to lie on the line described.
Namely, x-f(a) and y-g(a) measure the horizontal and vertical displacements of (x,y) from the point (f(a),g(a)), and the above expression measures the failure of these displacements to be in the same ratio as the displacement of (f(b),g(b)) from (f(a),g(a)). The set where the above function has any constant value is a line parallel to the one connecting (f(a),g(a)) and (f(b),g(b)). When the curve (f(x),g(x)) gets as far as it ever does from that line, on one side or the other, then it will, in general, be tangent to one of those parallel lines, which is why f'/g' will equal the ratio (f(b)-f(a))/(g(b)-g(a)), as stated in Cauchy's Mean Value Theorem. ---------------------------------------------------------------------- You ask why we use the extended functions F and G rather than f and g in the proof of L'Hospital's rule on p.A46. This is because we don't have values for f and g at the endpoint a of the interval we are looking at, so we can't apply Cauchy's Mean Value theorem to them on that interval. But the assumption that f and g approach 0 as x --> a tells us that the "right" way to extend them, to make them continuous, is by taking the values at a to be 0. We could, sloppily, say "Let us extend f and g to equal 0 at x=a", and so continue to use the symbols f and g rather than new symbols F and G. Experienced mathematicians, who know what precise statement that sloppy statement stands for, might well do so. But to develop this material for beginners, Stewart is careful to state things precisely, even though that means introducing new symbols. Incidentally, the difficulty that we overcome by extending f and g to be 0 at a is one that cannot be overcome when we try to prove L'Hospital's rule for functions that approach infinity instead of 0.
There we simply can't apply Cauchy's Mean Value Theorem to an interval [a,x], because there's no way of defining f and g at a that makes them continuous; so we have to use intervals with one end x and the other end much closer to a than x is; and as I sketched in class, and you will see in your homework, this makes for a more complicated proof than the (0,0) case. ---------------------------------------------------------------------- Regarding the proof of L'Hospital's Rule on p.A46 you ask
> In this proof, is there any reason to define F(x) and G(x)
> other than to establish the fact that F(x) = G(x) = 0 and
> simplify the result of Cauchy's Mean Value Theorem?
It does not merely "simplify the result of" Cauchy's Mean Value Theorem -- it makes it possible to apply that theorem to our functions on the interval [a,x]. One might, as I did in lecture, cut corners and say "redefine f and g so that they are both 0 at a", rather than introducing new symbols F and G. It all depends on whether one feels it more important to emphasize the fact that the new functions are "except for one little detail (being undefined or being 0 at a)" the same as the old ones, which led to my choice of using the same symbol, or whether one is more worried that using the same symbol for two (slightly) different things might confuse the reader, which led to Stewart's choice to use different symbols. ---------------------------------------------------------------------- I hope what I said at the end of class answered your question about why on p.A46 we create "new functions" F and G. It's basically a quibble -- because Cauchy's Mean Value Theorem is stated for functions that are defined and continuous at the endpoints of the given interval, we need such functions; and as given, f and g are not necessarily defined at a. One could informally say "Let us extend f and g by defining f(a)=g(a)=0", and most mathematicians in writing for other mathematicians would do so.
But that relies on the reader understanding that we've switched the meaning of the symbol, and in addressing students who are not familiar with precise mathematical reasoning, Stewart wants to avoid having this seemingly imprecise statement, so he introduces new symbols for the modified functions. ---------------------------------------------------------------------- You ask whether a delta-epsilon proof is required for L'Hospital's Rule (p.A46). I think that the reason Stewart does not give one is that he has written his book so that instructors who consider the delta-epsilon definition of a limit "too hard" can have their students skip section 2.4. He then needs to make as much as possible of the rest of the book independent of that section; in particular, he words the proof of L'Hospital's rule so that it does not refer to epsilons and deltas. But this requires him to use the somewhat handwavy wording "if we let x --> a^+, then y --> a^+", and to argue by the displayed equation that follows this, in which the relation between "x" and "y" implicitly comes from the paragraph that precedes. A precise delta-epsilon argument gets rid of the handwaviness. On the other hand, most any mathematician who has worked with limits would know how to translate a proof such as Stewart gives here into a delta-epsilon proof. So one could say that for an experienced mathematician, the difference between a delta-epsilon proof and the proof Stewart gives is not that important. ---------------------------------------------------------------------- Your question concerns the relation between x and y in Stewart's proof of L'Hospital's rule on p.A46. In the calculation, x represents any value "near enough to a" on the given side; in the notation I used in class, any t\in (a,a+\delta), while y represents a value such that f(x)/g(x) = f'(y)/g'(y) which by Cauchy's Mean Value Theorem can be found in (a,x). So as he notes, as x approaches a, the corresponding value of y also approaches a. 
In his displayed calculation, he deduces that since for each x and y so chosen, f(x)/g(x) = f'(y)/g'(y), it follows that lim_{x -> a} f(x)/g(x) = lim_{y -> a} f'(y)/g'(y). Logically, this could also be written lim_{x -> a} f(x)/g(x) = lim_{x -> a} f'(x)/g'(x); but to remind us where this comes from, he is using the letters x and y as he used them in the preceding discussion. The way I did it in class (recall that this was the easy argument I gave, not the hard one!) using epsilons and deltas avoids having to say things like "as x approaches a, y approaches a"; we can simply say "for every epsilon, choose delta such that ...", and show explicitly why |f(x)/g(x) - L| < epsilon for x in an appropriate range. ---------------------------------------------------------------------- You ask how in the middle of p.A46, the limit of f'(y)/g'(y) can be the same as the limit of f'(x)/g'(x), when y is not the same as x. The limit is not something defined in terms of a single value of x or y, but something whose definition is based on considering all x or y in a certain range, and looking at whether the delta-epsilon definition of that limit, as we select our x or y in that range, is true. Stewart's proof is a little handwavy, in that he first considers any particular x, and chooses a y in terms of it; then he talks about limits as these x and y vary (approach a). The version of the proof that I gave in class, using epsilon and delta (the easy proof, for f, g -> 0, not the hard proof for f, g -> infinity), translated Stewart's argument into a precise form that I hope you won't have trouble with. If you do, ask at office hours (or by e-mail if you think you have a question that can be stated and answered briefly). ---------------------------------------------------------------------- You ask about Stewart's argument on p.A46 that "if we let x->a+ then y->a+".
The precise argument, which I gave in class in a hurry, was this: Given \epsilon > 0, we want to find a \delta such that
(1) for x\in (a,a+\delta) we have |f(x)/g(x) - L| < \epsilon.
To do this, we use the fact that lim_{x->a+} f'(x)/g'(x) = L to find \delta such that
(2) for x\in (a,a+\delta) we have |f'(x)/g'(x) - L| < \epsilon.
Now given x\in (a,a+\delta) we know by Cauchy's Mean Value Theorem that there is some y\in (a,x) such that
(3) f(x)/g(x) = f'(y)/g'(y).
Because x\in (a,a+\delta), the interval (a,x) is a subset of (a,a+\delta), so the y we get is also in (a,a+\delta) (this is the step corresponding to Stewart's "if x-->a+ then y --> a+"), so by (2), |f'(y)/g'(y) - L| < \epsilon, and combining with (3) we get |f(x)/g(x) - L| < \epsilon, proving (1), as desired. ---------------------------------------------------------------------- You ask about Stewart's statement, in the last calculation in his proof of l'Hospital's rule for a = infinity on p.A46, that lim_(t-->0+) [f(1/t)/g(1/t)] = lim_(t-->0+) [f'(1/t)(-1/t^2)]/[g'(1/t)(-1/t^2)], "by l'Hospital's Rule for finite a". We think of f(1/t) and g(1/t) as functions of t, and want to see what they do as t --> 0+. L'Hospital's Rule for finite a = 0 says that their ratio will have the same limit as the ratio of their derivatives does, if that limit exists. The derivatives of f(1/t) and g(1/t) are f'(1/t)(-1/t^2) and g'(1/t)(-1/t^2). ---------------------------------------------------------------------- Regarding the proof of L'Hospital's rule (p.A46), you wrote,
> ... you mentioned a case where you can use L'Hospital's rule in some
> case that is not 0/0 or infinity/infinity. How does that work?
The proof of the infinity/infinity case really only uses the fact that g(x) --> infinity. So even if f(x) doesn't approach infinity, one can use the rule. Of course, if g(x) approaches infinity but f(x) doesn't, then the only question is whether f(x)/g(x) approaches 0, or no limit at all.
(Examples with no limit are f(x) = x^2 sin x, and g(x) = x or x^2.) I don't see any examples where L'Hospital's rule could show that such a limit is zero where the answer isn't obvious some other way. But one might run into one. ---------------------------------------------------------------------- To slightly reword your question -- you ask when one can take a step in a proof as "clearly true", and so not in need of a formal argument. That's a good question. As a first approximation, I would say: When both the person giving the proof and the person to whom the proof is addressed understand the situation well enough so that they *could*, if asked, fill in the details of the argument that are being omitted. In the proofs Stewart gives, lots of points could be omitted that Stewart would be able to fill in if asked; but since he is addressing this material to students to whom it is new, he gives much more in the way of detail than he would if addressing, say, more advanced students. (One sort of step that I would think might be omitted even at this level is that going from statement 1 of the theorem at the bottom of p.A46 to statement 2 of that theorem. The second is gotten by taking the contrapositive of the first, writing "diverges" for "does not converge", and renaming the values of x being considered.) One difficulty with what I have said is that in teaching students calculus, we can't go through a whole axiomatic development of the properties of the real numbers; so we do rely on students "knowing" that certain things are true even if they can't justify these. And the boundary between what we teach students how to prove, and what we take for granted that they know, is sometimes fuzzy. (In Math 104, an axiomatic development of the real numbers is given. But even there, it is taken for granted that students can handle logic and simple set theory, though those haven't yet been developed formally for them. This is finally done in Math 125 and 135.
These are not required courses for Math majors, but they are electives.) Anyway, if you point to a step where you think Stewart is giving details that seem unnecessary to you, I can say whether I agree that they are, indeed, pretty straightforward, or whether there is some point that you might be missing. ---------------------------------------------------------------------- You ask whether the results on intervals of convergence that Stewart proves on p.A46-47 could be proved more easily using the Ratio Test. The proof you sketched works if the ratios |c_{n+1}| / |c_n| approach some limit, but not otherwise! The kind of example that I showed in class solving "Exercise 46" for 11.6 (with c_n = 1/2^n for n odd and 1/3^n for n even) shows that this need not happen. ---------------------------------------------------------------------- You ask whether Stewart's proof of l'Hospital's rule also works in the case where the limit L of f'(x)/g'(x) is infinity or - infinity. The calculation in the middle of the page, ending with "= L", leaves out details. If one fills in the details, they would involve "epsilon and delta" when L is finite, and "M and delta" (as in Definition 6, p. 115) when L is infinity or - infinity; but both cases work fairly straightforwardly. ---------------------------------------------------------------------- You comment on Stewart's taking \epsilon = 1 in the proof in today's reading (p.A47, line 2). The reason he can do this is that to prove that the series converges for |x| < |b|, we don't need the full assumption that it converges for x = b, but (as the version of the proof that I gave shows), only the weaker condition that the summands are bounded at x = b. He finds it convenient to use instead the fact that the summands are eventually bounded by 1 (so where I compared the sum at x with a geometric progression C |x/b|^n, he compares a tail of the sum with the corresponding tail of the simpler progression |x/b|^n).
Either way, the idea is that a very weak condition holding at x = b implies a very strong condition (absolute convergence) at all points with |x| < |b|; and he uses the very weak condition in the form "the terms are eventually of absolute value < 1". ---------------------------------------------------------------------- Regarding the proof of the first theorem stated on p.A47, you ask what it means for S to be nonempty. The empty set is the set with no elements; so a set is nonempty if it contains any elements. In particular, S is nonempty because it contains b. You then ask,
> If we can only guarantee that x=b is contained in the set, does
> this one number satisfy the conditions of the Completeness Axiom?
For S to satisfy the hypothesis of the Completeness Axiom, we have to know that it is nonempty and bounded. If we knew that it consisted only of the element b, then, yes, that would make the axiom applicable; but if all we know is that it contains b, that isn't the same as knowing that it consists only of b; so we need more to show that it has a least upper bound; namely, we need to know it is bounded. Stewart establishes that on lines 3 and 4 of the proof. ---------------------------------------------------------------------- You ask why Stewart can begin the proof of the Theorem in the middle of p.A47 by saying "Suppose that neither case 1 nor case 2 is true." Proving a statement of the form "X or Y or Z is true" is equivalent to showing "If neither X nor Y is true, then Z is true", and that's what Stewart is doing. He is approaching the proof this way because (i) and (ii) are relatively simple conditions, so it is easy to see what must follow from their being false, and to deduce that this leads to the more complicated condition (iii). (Another condition equivalent to "X or Y or Z is true" is "X, Y and Z are not all false".
So in proving a different sort of theorem of the form "X or Y or Z is true", one might begin by saying "Suppose X, Y and Z are all false", and showing that from this one can deduce a contradiction.) ---------------------------------------------------------------------- Concerning the proof I gave of the theorem on p.A47 describing intervals of convergence, you asked whether a value |x| that was bigger than one upper bound might be another upper bound, and still a member of S. You may be confused about what "upper bound" means. For instance, if S is the union of the two intervals [1,2] and [3,4], you might be thinking of 2 and 4 as "upper bounds" of S. But that is not what the term means; it means a number that is \geq all members of S. (See paragraph on p. 698 preceding box [12].) So 2 is not an upper bound of the set just described; but 4 is, and so are 4.2, 500, etc.. As in that example, any set S that is bounded above has many upper bounds. If it is nonempty, it will have a least upper bound M; the real numbers > M will be its other upper bounds. But if a number x is > M, it can't belong to S, since M being an upper bound to S makes M \geq all members of S; but by assumption it is not \geq x. (If you did think 2 and 4 were what were called "upper bounds" of the union of [1,2] and [3,4], then you might wonder, "What can one call them?" In fact, the set \{1,2,3,4\} is what is called the "boundary" of the above set; so 2 and 4 are two of the points of that boundary. This is not a concept we will introduce in H1B, but you are liable to see it in some upper division and/or graduate courses: 104, 140, 202A, ...). ---------------------------------------------------------------------- You ask whether the material in Appendix G (pp.A50-A56) would need to be proved through epsilon-delta methods on a test. "epsilon-delta" proofs occur mostly in setting up the foundations of calculus. 
Those foundations include results which have been proved using "epsilon-delta" methods, and which then can be used to prove other results. So the answer is no; what Stewart does here is essentially correct, though as I pointed out in lecture, there are a couple of places that need to be filled in. One is about differentiability of inverse functions, where the proof that I sketched has a key step, changing variables in a limit statement, that would need epsilon-deltas for a full proof. For the other, the proof that since ln 2^n approaches infinity as n does, ln x will approach infinity as x does, I showed an "M, N" proof, which is the equivalent of an "epsilon-delta" proof when x and f(x) are approaching infinity instead of real constants. ---------------------------------------------------------------------- Regarding the proof of Law 1 on p.A51 you ask,
> How can we replace a (a constant) with y (a variable) in the proof?
> Since constants and variables are treated differently when
> differentiating, why can we substitute a constant a for variable y?
Constants and variables are both numbers. When we differentiate or integrate, we fix all but one of the numbers denoted by letters, and consider how the resulting expression varies as we vary that one number, which we call the variable with respect to which we are differentiating or integrating. But in a different calculation we can change what we are varying. Note that in Definition 1 on p. A40, the variable of integration is t, while x is a constant, which could just as well be written a. In the first display on p. A51, x is again a constant from the point of view of the integration; but from the point of view of the differentiation, it is the variable with respect to which we are differentiating. (And this always happens when we are applying the Fundamental Theorem of Calculus.)
> And if we took the ln of a product of two functions ln(g(y) f(x))
> would the same laws still hold up?
Right.
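As a numeric sanity check (my own illustration, not a proof and not from Stewart), one can compute ln directly from Definition 1's integral of dt/t and test the laws of logarithms on sample values; the midpoint-rule helper name ln_approx below is made up for this sketch.

```python
# Compute ln x from the definition ln x = (integral from 1 to x of dt/t),
# using a simple midpoint rule, and check the laws of logarithms numerically.
def ln_approx(x, n=100000):
    h = (x - 1.0) / n
    return sum(h / (1.0 + (k + 0.5) * h) for k in range(n))

x, y = 5.0, 3.0
assert abs(ln_approx(x * y) - (ln_approx(x) + ln_approx(y))) < 1e-6   # Law 1
assert abs(ln_approx(1/y) + ln_approx(y)) < 1e-6                      # ln(1/y) = -ln y
assert abs(ln_approx(x/y) - (ln_approx(x) - ln_approx(y))) < 1e-6     # Law 2
print("laws of logarithms verified numerically")
```

Of course this only tests the laws at particular numbers; the point is that the integral definition, with no prior theory of exponents, already exhibits them.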
---------------------------------------------------------------------- You ask whether one can correctly prove the third law of logarithms on p.A51 by differentiating ln(x^r), noting that it has the same derivative as r ln(x), and verifying that the constant by which ln(x^r) and r ln(x) differ must be 0. Well, differentiating ln(x^r) requires knowing the derivative of x^r. Looking through the early sections of Stewart, I see that he proves the formula for the derivative of x^n where n is a positive integer directly (p. 175), and then on the next page, states the rule for any real number x, saying he will prove it in section 3.6. In that section, he proves it using properties of the logarithm, including the 3rd law of logarithms. (See the first display in the proof on p. 221. Though he writes "n" for the exponent, he has said it represents any real number.) So the proof that you describe could not be used in the context Stewart has set up in this section, where we have put aside everything we had derived by methods that assumed real exponents, etc.. However, the formula for differentiating x^r when r is rational _could_have_ been derived (with more work) using the formula for x^n where n is an integer; and if such a derivation had been given, your proof would be correct. (Needless to say, I won't give on an exam any question where you would have to sort out what was proven how in Math 1A to know whether your proof of something is valid!) [SENT A FEW HOURS LATER:] I see that your proof of the third law of logarithms, which I criticized, follows the hint in Stewart, Exercise 5, p. A57 ! Unfortunately, Stewart doesn't say in this exercise whether, in the "third law of logarithms", x^r is meant to be defined as an integer root of an integer power of x, which is necessarily what is meant in the statement of that law on p. A51, or by the more general equation 13 on p. A54. 
If the "integer root of an integer power" is meant, then, as I said, one can't use the law d/dx x^r = r x^{r-1}, because that was based on Stewart's earlier treatment of exponentiation, which he says at the end of the second paragraph on p. A50 we are not going to use here. On the other hand, if the equation 13 definition is meant, then differentiation is not needed; the result comes easily out of the definition. So either way, I think his hint is not appropriate. I guess I should e-mail the class about this, as a correction to the homework. ---------------------------------------------------------------------- You ask how we know to substitute 1/y for x in proving Law 2 of logarithms, p.A51. Well, it's usually simpler to use something one has proved before than to start from scratch, and at this point we have proved Law 1, describing the logarithm of a product. Now dividing by y is the same as multiplying by 1/y, so we would like to use 1/y as one of the terms in Law 1. That's half the story. The other half is to remember what "1/y" means -- it is the number which, when you multiply it by y, gives 1. So to figure out its logarithm, we first apply Law 1 to the product of 1/y and y. From the resulting equation, we get the law ln(1/y) = - ln y. We then use that in Law 1 with an arbitrary x, and with 1/y in place of y, to find ln x/y. ---------------------------------------------------------------------- You ask about graphing functions on the complex plane (or the Argand plane, as it is named on p.A57). Hard to do. If f is a function C --> C, then using a 3-dimensional graph, one can graph the real part of f(x+iy) as a function of x and y; or the complex part, or the absolute value, etc.; but it would take 4 dimensions to graph the real and complex parts together. I think some of the plaster models in the cabinet in the Common Room, 1015 Evans, represent graphs of the real parts of certain complex functions. (Such models must have been popular around 100 years ago. 
Many math departments have them, but hardly anyone looks at them nowadays.) Something else one can do is restrict the function to a line in the complex plane, say the real axis or the complex axis, and so get a map R --> C, which can also be graphed in 3 dimensions, this time as a curve. Anyway, combining the intuition about how functions R --> R behave that we get using graphs, and the theorems proved in Math 185 about how complex functions behave, one can develop an intuition for such functions, even though one can't graph them entirely satisfactorily. ---------------------------------------------------------------------- You ask whether Stewart's statement, regarding Figure 3 on p.A58, that |z| = \sqrt(a^2 + b^2), should be \sqrt(a^2 - b^2), since (bi)^2 = -b^2. The statement is correct as he gives it. It is a standard definition, so the only challenge one can raise is "Does that function of a+bi have useful properties, that would justify making such a definition?" It does, as is shown in the next few calculations. Something that might have led you to think b^2 should be (bi)^2 is the thought that the vertical line labeled "b" in Figure 3 should be labeled "bi". But the labels Stewart puts on line segments in that figure give their geometric lengths, not the complex number representing the difference between their endpoints. (So the red arrow is labeled with a real number, not with the complex number a+bi.) Taking this into account, I hope the figure now makes sense. ---------------------------------------------------------------------- Regarding the statement on p.A59 that the argument of a complex number is not unique, you ask
> ... Is it just referring to the fact that an angle "a" is equal
> to "a + 2pi*n" where n is any integer?
Essentially, yes. For some purposes, mathematicians do create a kind of entity in which a and a + 2 pi n are actually equal.
But for our present purposes, we have to regard a as a number, and then a + 2 pi n is not the same number as a; it's a different number, though one at which the sine, cosine etc. have the same values that they have at a. A consequence is that "arg" is not a well-defined function; the symbol "arg(z)" means "any one of the infinitely many real numbers theta that make the boxed equation on p. A59 true". This nonuniqueness has important consequences for taking N-th roots. When we divide "arg(z)" by N, the results no longer differ by multiples of 2 pi, so they no longer lead to the same complex number. Rather, as noted on p. A62, every nonzero complex number has N distinct N-th roots. ---------------------------------------------------------------------- You ask why the \theta occurring in the polar form of a complex number (p.A59) is called its "argument". I don't know. Looking in the online Oxford English Dictionary, one of the definitions of "argument" is:
Astr. and Math. The angle, arc, or other mathematical quantity, from which another required quantity may be deduced, or on which its calculation depends.
Its examples show it being used by Chaucer, I think with reference to astronomical calculations. My guess is that this sense split into two: When f is a function, then in the expression f(x) we now call x the "argument of f"; so it is a quantity on which another quantity depends. On the other hand, its use referring to an important angle must have led to the specialized sense in the study of complex numbers that Stewart gives here. (Note that in primitive astronomy, everything that one could measure was an angle in the sky, so the two senses were not that different.) As to what such quantities or angles have to do with the other meanings of the word "argument", the OED gives not a hint.
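Returning to the N distinct N-th roots mentioned a little earlier: they can be computed directly from the polar form, since if z = r e^{i theta} its N-th roots are r^{1/N} e^{i(theta + 2 pi k)/N} for k = 0, ..., N-1. A minimal computational sketch (the function name nth_roots is my own, not Stewart's):

```python
import cmath, math

# The N distinct N-th roots of z, from one choice of arg(z) plus its
# shifts by multiples of 2 pi, each divided by N.
def nth_roots(z, N):
    r, theta = cmath.polar(z)   # modulus and one choice of arg(z)
    return [cmath.rect(r**(1.0/N), (theta + 2*math.pi*k)/N) for k in range(N)]

roots = nth_roots(complex(0, 8), 3)            # the three cube roots of 8i
for w in roots:
    assert abs(w**3 - complex(0, 8)) < 1e-9    # each really is a cube root
print(roots)
```

The three angles differ by 2 pi / 3, so the roots are genuinely distinct points; only after cubing do the angle differences become multiples of 2 pi and collapse back to the same number.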
---------------------------------------------------------------------- You ask how the use of complex numbers (pp.A57-A63) in describing real-world wave motion can be justified. I'm not sure what sort of use of complex numbers you are referring to, but I'll mention several. (a) As a consequence of the equation e^{it} = cos t + i sin t, one gets
cos t = (e^{it} + e^{-it})/2
sin t = (e^{it} - e^{-it})/(2i).
Hence every linear combination a sin t + b cos t (where a and b are real numbers) can be written c e^{it} + d e^{-it} where c and d are conjugate complex numbers. For computations, this expression can be much more convenient than the original expression, since exponentials behave more simply than trig functions with respect to multiplication, differentiation, etc.. As long as we specify that c and d are conjugate, the expression c e^{it} + d e^{-it} will be real-valued, and so can describe a wave function in the real world. (b) The wave equation is linear, so any linear combination of functions that satisfy it satisfies it again. A consequence is that the condition that c and d be conjugate is irrelevant to the mathematical study of the wave equation, and so can be dropped. We then find that the simplest functions in terms of which to write the general function as a linear combination are e^{it} and e^{-it}. So for simplicity, one uses these; and since they behave identically (they are conjugate to one another), one may, for simplicity, use just one. It doesn't represent a "real-world" wave function, but it's easy to work with, and one can get real-world wave functions by taking linear combinations of it and its conjugate. (c) I believe that in quantum mechanics, one posits wave-functions that are genuinely complex-valued.
Since the relation between quantum mechanics and the world we know is mysterious anyway, and I have only a layman's knowledge of the subject, I won't try to guess whether here, complex numbers really do occur in nature, or whether there is a range of possible mathematical formulations of the same not-directly-observable phenomena, in which case the founders of quantum mechanics may have chosen the mathematically simplest, in the absence of any other criterion for preferring one above others ... or what! ---------------------------------------------------------------------- You ask how to prove the second displayed formula on p.A60. Well, z_1/z_2 is the number which, when multiplied by z_2, will give z_1. Equation [1] on that page tells you how to multiply complex numbers expressed in polar form. You should be able to use it to set up the problem "what is the polar form of the complex number which, when multiplied by z_2, gives z_1?" Try it, and let me know whether you can carry it through, or if not, where you have trouble. ---------------------------------------------------------------------- Regarding complex exponentiation, defined on p.A63, you ask about raising numbers c other than e to complex powers. That is simple in the abstract, tricky in reality. The abstract answer is: let "ln c" be a complex number r such that e^r = c, and define c^z = e^{r z}. The complication is that there are infinitely many such complex numbers r. (Even for c = e, there are infinitely many; namely 1 + any integer multiple of 2\pi i, because e^{2\pi i} = 1.) So in complex analysis, one can't speak of "the" exponential function with base c; one has to choose such an r and then study the function determined using that r. ----------------------------------------------------------------------
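The multivaluedness described in that last answer can be seen numerically. Here is a small sketch (the sample values c = 2 and z = i are my own choice): each choice of r with e^r = c gives its own candidate value of c^z.

```python
import cmath, math

# Try to compute "2^i" using several of the infinitely many values of "ln 2":
# r = Log 2 + 2 pi i k, each of which satisfies e^r = 2.
c, z = complex(2, 0), complex(0, 1)

values = []
for k in range(-1, 2):
    r = cmath.log(c) + 2 * math.pi * 1j * k    # one of the many numbers "ln c"
    assert abs(cmath.exp(r) - c) < 1e-9        # each r really satisfies e^r = c
    values.append(cmath.exp(r * z))

# The candidate values of 2^i have genuinely different absolute values
# (successive ones differ by a factor of e^{2 pi}), so no single choice
# is singled out by the definition alone.
print([abs(v) for v in values])
```

This is exactly why one must fix a choice of r (a "branch" of the logarithm) before speaking of the function c^z.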