ANSWERS TO QUESTIONS ASKED BY STUDENTS in Math H1B, Fall 2010, taught from the UC Berkeley custom text based on the 6th edition of James Stewart's "Calculus, Early Transcendentals".

(Page numbers are the same as in the standard version of the above-mentioned text. Though we covered it in nonconsecutive order, I have rearranged the answers below in the order of pages referred to, with Stewart's appendices at the end. Incidentally, the page that I consider a question to refer to, which determines its location in this file, is denoted "p.N" where N is the page-number, while other pages referred to are denoted "p. N", with a space before the N.)
======================================================================
You ask about the way Stewart outlines the proof of limit statements: first "guessing" a value of \delta that should work, then proving it (e.g., in Example 2, p.112). One can talk in terms of "guessing" when one is beginning the subject, but the point of view I emphasized in lecture was that of "planning strategically". The reality can be a mixture of the two.

I gave the example of finding the right conditions to use in showing lim (f(x)+g(x)) = L+M. As a first naive try, we might choose \delta to make |f(x) - L| < \epsilon and |g(x) - M| < \epsilon. But we discover that this only makes |f(x)+g(x) - (L+M)| less than 2\epsilon. So we go back, and make |f(x) - L| and |g(x) - M| < \epsilon/2, and that works.

If we try the proof for f(x)g(x) naively, we have to go through more steps. The deviation of f(x) from L doesn't just get added to the deviation of g(x) from M; rather, the first gets multiplied by g(x) and the second by f(x). Since g(x) is around M and f(x) is around L, we might try making |f(x) - L| < \epsilon/|M| and |g(x) - M| < \epsilon/|L|. There are several problems with that. First, we have to take account of the two errors being combined together; so we put a "2" in each denominator.
Second, as we said, g(x) is "around M", but it isn't generally equal to it; so we handle that by replacing the |M| in the denominator by |M|+1, and making our choice of \delta such that g(x) doesn't deviate from M by more than 1. Third, L or M might be 0, and we can't divide by 0. Fortunately, the +1 handles that also. ... I won't go through the details, but we end up with the proof of Law 4, which works once we've finally set things up right.

So, as I say: think strategically about what conditions to put on \delta to make the argument work; then, if your plan covered all the problems and you feed it into the proof, the proof will work. If, in trying to prove it, you find that additional conditions are needed, figure out how to work those into your choice of \delta, and try again. (I realize that you were asking about Stewart's examples with specific functions, while my answer was about general laws; but the problem is the same, just less trivial in the case of general laws.)
----------------------------------------------------------------------
You note that in Example 4, p.114, Stewart restricts x to a certain interval, and you ask whether there is an "exact" way of proving the limit.

Although the way Stewart words it does sound as though restricting to the interval is cutting corners, the justification he gives is logically correct: the definition of a limit allows you to make \delta as small as you wish, as long as it is positive; so whatever restrictions on it one would otherwise make, one can add the restriction that it be \leq 1. This means that the values of x for which one has to check the condition |x^2 - 9| < \epsilon lie within the interval he names. So the argument he gives is in fact an exact proof.
----------------------------------------------------------------------
You ask why, in the middle display on p.114, we "substitute C for x+3". Look more closely! The two sides of the display are not connected by "=", but by "<".
Reading the formula correctly, can you see the justification? Ask again if you still have a question about it.
----------------------------------------------------------------------
You ask whether there is an easier way of showing that lim_{x->0} 1/x^2 is infinity than the one given in Example 5 on p.116. In particular, you say the use of "M" confuses you.

This example shows how to get the result directly from the definition of a limit being infinite. One has to work directly from definitions to get one's first results. Once one knows how to use the definitions, one can use them to get general results, and prove specific results from these, which is often easier. To understand this proof, you should look back at the definition (on the same page, which introduces the symbol "M") and work on understanding that. Ask for help from me or the GSI if you have difficulty with this.
----------------------------------------------------------------------
Regarding the first proof on p.122, you write

> I believe that Stuart writes (f+g)(x) to
> mean f(x)+g(x). Is that common notation ... ?

Yes. If you are not familiar with such notation, see the section "Combinations of functions", pp. 41-43.
----------------------------------------------------------------------
You ask about the list of functions in Theorem 7 on p.124, all of which are continuous on their domains, and you ask, isn't every function continuous on its domain?

It's true that naturally occurring functions tend to be continuous on their "natural" domains of definition. But one can't make a precise definition of which functions are "natural"; something that looks "unnatural" in one context may be "natural" and even important in another; so "naturally occurring functions are continuous on their natural domains of definition" can't be made a precise statement. However, one does have a precise definition of "continuous".
Examples of functions which are not continuous on their domains of definition (though one might consider them unnatural functions) are given by parts (b), (c) and (d) of Example 2, p. 120. Here is a more complicated example, which one might consider less unnatural than those: Let us start with the function F(x) = x^2 sin (1/x). So far, this is undefined at x=0; but it approaches the limit 0 as x -> 0, so let us make it continuous by defining F(0) = 0. Now one can show that the resulting function F is not only everywhere continuous, it is everywhere differentiable. But its derivative, F'(x), is discontinuous at x = 0, even though that point is in its domain. (If you are interested in working out the details, but run into difficulty at some point, let me know where, and I can try to help.)
----------------------------------------------------------------------
Regarding the technique of integration by parts (pp.453-457), you ask whether there is an easy way to decide which part of an integral should be u and which should be v'.

In general, no. Obviously, you want v' to be something that you know how to integrate; i.e., such that you can find an appropriate v. Among the various choices for which that is true, you have to look ahead to foresee which choice of u and v' will lead to a product u' v on the right-hand side of the integration by parts formula that you can also integrate.
----------------------------------------------------------------------
Regarding Example 6 on p.457, you ask: Why is u=sin^{n-1} x? Why is v=-cos(x)?

First, I hope you saw the word "Let" before those equations. In other words, the author is not saying that we can tell from the integrand sin^n x dx that we must take u=sin^{n-1} x and v=-cos(x). In the technique of integration by parts, we have to start with some factorization of the integrand, and he is saying, "Let's try the simplest factorization, the one into sin^{n-1} x times sin x dx, noting that we can write the latter as d(-cos x)."
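Looking ahead to the formula Example 6 proves, \int sin^n x dx = -(1/n) cos x sin^{n-1} x + ((n-1)/n) \int sin^{n-2} x dx, here is a minimal Python sketch (my own illustration, not from the text) of applying it over and over on the interval [0, pi/2]. There the boundary term -(1/n) cos x sin^{n-1} x vanishes at both ends for n \geq 2, so the definite integrals I_n satisfy I_n = ((n-1)/n) I_{n-2}, with I_0 = pi/2 and I_1 = 1. The function names are hypothetical, and the comparison column uses a plain composite Simpson's rule.

```python
import math


def sin_power_integral(n: int) -> float:
    """Integral of sin(x)**n over [0, pi/2], computed with the reduction formula.

    On [0, pi/2] the boundary term -(1/n) cos x sin^(n-1) x vanishes for n >= 2,
    so I_n = ((n - 1) / n) * I_(n-2), with I_0 = pi/2 and I_1 = 1.
    """
    if n < 0:
        raise ValueError("n must be nonnegative")
    value = math.pi / 2 if n % 2 == 0 else 1.0  # base case I_0 (even n) or I_1 (odd n)
    start = 2 if n % 2 == 0 else 3
    for k in range(start, n + 1, 2):  # climb up: I_0 -> I_2 -> ... or I_1 -> I_3 -> ...
        value *= (k - 1) / k
    return value


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


for n in range(7):
    numeric = simpson(lambda x: math.sin(x) ** n, 0.0, math.pi / 2)
    print(n, sin_power_integral(n), numeric)
```

For instance, n = 5 reduces to n = 3 and then to n = 1, giving (4/5)(2/3)(1) = 8/15, which is exactly the chain of steps described in the answer about using the reduction formula repeatedly.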
You then ask what happens to the coefficient 1/n on the right-hand side of the equation to be proved. In this example, think of the left-hand side of that equation as what is given, and suppose we didn't know the right-hand side. Starting with the above choices of u and v, we apply integration by parts, as the author shows. We get an equation in which the original integral occurs on both the left- and the right-hand sides, with coefficients 1 and -(n-1). As shown in the next-to-last equation of the example, we can combine these into one term on the left, with a coefficient of n. Dividing by that n, we get the desired formula, with "1/n" on the right.
----------------------------------------------------------------------
You ask, in connection with the last sentence of Example 6, p.457, how the reduction formula can be used repeatedly.

The formula proved expresses the integral of sin^n x in terms of the integral of sin^{n-2} x. If n-2 \geq 2, you can apply the same formula to that new integral, with "n-2" in place of "n". (E.g., if you started with the integral of sin^5 x, then the reduction formula with n=5 expresses this in terms of the integral of sin^3 x; and you can then apply the reduction formula with n=3 to that, to express it in terms of the integral of sin x.) So if you start with any odd integer n, you can reduce the integration successively to the corresponding integrals with exponent n-2, n-4, ..., 3, 1; while if you start with even n, you can again reduce to the cases n-2, n-4, ..., etc.; since these are even, we will end with ..., 2, 0. In either case, the final integral is one we can easily do.
----------------------------------------------------------------------
You ask when one would need to use the identity [(sin x)(cos x) = (1/2)(sin 2x)] (p.462) in integration.

Well, if one were integrating sin^4 x cos^4 x, it would be convenient to turn this into (sin x cos x)^4 = ((1/2) sin 2x)^4 = (1/16) sin^4 2x before proceeding further.
One could use the methods the author describes without this first step; but the step would make things shorter. It can also help when one is trying to decide whether solutions one got by two methods agree. If one solution involves sin x cos x and the other involves sin 2x, then one can convert between these expressions and check.
----------------------------------------------------------------------
You ask why integration by substitution, as used on pp.467-472, is reasonable.

Well, let's think of an integral involving \sqrt{a^2 - x^2} (where a is positive). That expression is defined only for x in [-a,a], and for each x in that interval, there is a unique \theta in the range [-\pi/2, \pi/2] such that x = a sin \theta. So we can think of each x as corresponding to some such \theta, and study our integral by thinking, "for each value of \theta, what does our expression in x and \sqrt{a^2 - x^2} equal; and as \theta changes by a tiny amount, what is the corresponding tiny change in x?" We can then compute the resulting integral, being careful to remember that the values of x are not the same as the values of \theta, and the change in x ("dx") is not the same as the change in \theta, but, rather, that one is determined by the other in each case. If we were to forget these distinctions, the computation would not be valid; if we keep them in mind, it is.

You ask whether there can be a problem with domains. Yes; I sketched in class yesterday a kind of situation where there would be. If we had an integral where x ranged from -1 to 1, and we wanted to make the substitution x = 1/t, then although x = -1 and x = +1 would correspond to t = -1 and t = +1, we could not just write our result as an integral from t = -1 to t = +1. Rather, as x ranges from -1 upward to +1, t goes from -1 down to -infinity, and then from +infinity down to +1.
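To see the domain problem concretely, here is a small Python check using an integrand of my own choosing, 1/(1+x^2) on [-1, 1] (an assumption for illustration; it is not an example from the text). Under x = 1/t we have dx = -dt/t^2, so 1/(1+x^2) dx becomes -dt/(1+t^2), with antiderivative -arctan t. Integrating that "from t = -1 to t = +1" gives the wrong sign, while following t from -1 down to -infinity and then from +infinity down to +1 gives the right value, pi/2. The infinite endpoints are handled using the limits of arctan t, which are pi/2 and -pi/2.

```python
import math


def antiderivative_in_t(t: float) -> float:
    """Antiderivative of the transformed integrand -1/(1 + t^2)."""
    return -math.atan(t)


# Direct computation: integral of 1/(1+x^2) from -1 to 1 = arctan(1) - arctan(-1).
direct = math.atan(1.0) - math.atan(-1.0)

# Naive (wrong): pretend t simply runs from -1 to +1.
naive = antiderivative_in_t(1.0) - antiderivative_in_t(-1.0)

# Correct: t runs from -1 down to -infinity, then from +infinity down to +1.
# Limits of -arctan t: pi/2 as t -> -infinity, -pi/2 as t -> +infinity.
at_minus_infinity = math.pi / 2
at_plus_infinity = -math.pi / 2
correct = (at_minus_infinity - antiderivative_in_t(-1.0)) + (
    antiderivative_in_t(1.0) - at_plus_infinity
)

print(direct, naive, correct)
```

The three printed values are pi/2, -pi/2 and pi/2 (up to rounding): each of the two pieces of the correct computation contributes pi/4.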
We haven't yet studied integrals over ranges that "go to infinity", but we will in reading #7; and if we handle the range correctly, a substitution of the above sort will work; but not if we blindly integrate "from t = -1 to t = +1".

You also worried about the fact that I motivated trigonometric substitutions by a geometric approach you would not have thought of. Don't worry: if it were something I could have expected students to see for themselves, I wouldn't have presented it in lecture. It's good that you should try to come up with ideas for yourself; but the fact that you're taking the class means that you know you can't come up with everything that way!
----------------------------------------------------------------------
You ask how one chooses the limits of integration after a change of variables, in an integration like that of Example 2, pp.468-469.

If one is letting x = g(t), one has to choose the range of values that t runs over so that x will run over the given range, taking on each value just once. In the problem shown, Stewart has reduced to the case where x runs from 0 to a, so he needs to take a range of values of \theta which will make x = a sin \theta do this. The most convenient such range is the range from 0 to \pi/2; though other ranges, such as 2\pi to (5/2)\pi, would also work.
----------------------------------------------------------------------
You ask why a>0 in Example 5, p.470.

The case a = 0 would have to be done by different methods, since in that case one can't write x = a sec \theta. The case a < 0 does not logically have to be excluded; the calculation that Stewart does for a > 0 works for any nonzero a. But since a^2 = |a|^2, if the function has a < 0 we can always rewrite it using |a| in place of a, so Stewart chooses the positive value. I suppose his thought is to do so "just in case" the question of which sign a has would complicate some later computation, even though it actually wouldn't.
A case where the choice would make a difference is Example 1 on p. 468. One could take x = -3 sin \theta, but then the signs of most of the equations that follow are reversed; in particular, the sign in the formula for cot \theta in the middle of the page has to be reversed. The final answer would be the same, but the details of the calculation would be different.
----------------------------------------------------------------------
You ask about the step at the end of Example 5, p.470, where the author says "Writing C_1 = C - ln a ...".

The constant of integration denotes "any constant"; so the form Stewart gets at the next-to-last step means "what you get when you take any constant and subtract ln a from it". Obviously, that just comes down to "any constant", so it would be silly to express it in a more complicated way. You probably wouldn't lose points on an exam for not making this simplification; but it is certainly preferable to give one's answers without unnecessary complications.
----------------------------------------------------------------------
You ask about the words "proper" and "improper" as used in the top part of p.474.

These are words taken from elementary-school arithmetic: A fraction like 9/4 is called an "improper fraction", because its numerator is larger than its denominator. One simplifies it to 2 1/4, the sum of an integer, 2, and a "proper fraction", 1/4. The author is extending the terms to rational functions, calling a rational function P(x)/Q(x) "proper" if P(x) has smaller degree than Q(x), and "improper" otherwise; and he notes that an improper rational function can be reduced to the sum of a polynomial and a proper rational function. These terms probably won't come up again, either in this course or in other math courses. (There is another use of "proper" and "improper" that you will see in later courses, regarding subsets of a set; that meaning is entirely different.)
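The reduction of an improper rational function to a polynomial plus a proper rational function is just polynomial long division, the analogue of dividing 9 by 4 to get quotient 2 and remainder 1. Here is a minimal Python sketch, assuming polynomials are given as lists of coefficients from the highest degree down (a representation I chose for illustration; the function name poly_divmod is hypothetical). Exact Fraction arithmetic avoids rounding.

```python
from fractions import Fraction


def poly_divmod(num, den):
    """Divide polynomial num by den (coefficient lists, highest degree first).

    Returns (quotient, remainder) with num = den * quotient + remainder and
    deg(remainder) < deg(den), so remainder/den is a proper rational function.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    if not den or den[0] == 0:
        raise ValueError("divisor must have a nonzero leading coefficient")
    if len(num) < len(den):
        return [Fraction(0)], num  # already proper: quotient 0
    quotient = []
    rem = num[:]
    while len(rem) >= len(den):
        coeff = rem[0] / den[0]  # next coefficient of the quotient
        quotient.append(coeff)
        # Subtract coeff * x^(deg rem - deg den) * den from the remainder.
        for i, d in enumerate(den):
            rem[i] -= coeff * d
        rem.pop(0)  # the leading term is now zero
    return quotient, (rem or [Fraction(0)])


# (x^3 + x) / (x - 1): quotient x^2 + x + 2, remainder 2,
# i.e. (x^3 + x)/(x - 1) = x^2 + x + 2 + 2/(x - 1).
q, r = poly_divmod([1, 0, 1, 0], [1, -1])
print(q, r)
```

The remainder list may keep leading zero coefficients (for example, x^2/(x^2 + 1) gives quotient [1] and remainder [0, -1], i.e. 1 - 1/(x^2 + 1)); that does not affect the degree condition.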
----------------------------------------------------------------------
You ask where we get formula [10] on p.478.

I showed in class how to get it by trigonometric substitution. (I just did the case a = 1, but the principles of trigonometric substitution shown in the previous section tell you what to do for any a.) Alternatively, you can recall that the derivative of tan^{-1} x is 1/(x^2 + 1). This gives the integral of 1/(x^2 + 1); and by a change of variable (x = au), you can reduce the integral of 1/(x^2 + a^2) to that one.
----------------------------------------------------------------------
You ask about the last display in Example 9, p.481, and how the author goes from \int \sqrt{x+4}/x dx to 2\int du + 8\int du/(u^2 - 4).

He is using the computation of the preceding display, which has converted \int \sqrt{x+4}/x dx to 2\int (1 + 4/(u^2 - 4)) du. In the first line of the display you ask about, he carries this one easy step further. It is only on the next line that he applies Formula 6.
----------------------------------------------------------------------
Concerning Example 4(c) on p.485, you write:

> The function integrated is 1/(1-cos x). This is done by multiplying
> and dividing by 1+cosx. Can we do this even if the integral is a
> definite integral from pi/2 to 3pi/2? The value of cosx at pi is -1
> so 1/(1+cosx) doesn't exist at this value. Can we still apply this
> method to solve this integral?

Good point! The answer is "yes and no". When one does the integration, one gets a function which is guaranteed to have the right derivative except at the points where the modified integrand is undefined. But since the original integrand is continuous at some of these points, such as pi, and is equal to the modified integrand except where that is undefined, we can expect the integral of the modified integrand to agree with the integral of the original function wherever the former is defined.
So we should hope that when we compute the integral, we will be able to "fill in" the values at points like pi to get a differentiable function which is the desired integral.

If we continue the integration where Stewart leaves off, we get -cot x - csc x, still undefined at x = pi; but we notice that -cot x goes to +infinity as x -> pi from below, while -csc x goes to -infinity, so there is a hope that their sum will behave reasonably. Expressing cot and csc in terms of sine and cosine, we get -((cos x) + 1)/sin x, where numerator and denominator both vanish at x = pi. How can we simplify that? We would like the fact that (cos x) + 1 goes to 0 at pi to be a result of the fact that some sine or cosine goes to zero at that point, so that we can hope to cancel such a sine or cosine in the denominator. An expression that represents the zero of (cos x) + 1 at pi as the zero of such a function is the half-angle formula, (cos x) + 1 = 2 cos^2 (x/2). So we also apply the half-angle formula to the denominator, writing sin x = 2 sin (x/2) cos (x/2). Then -((cos x) + 1)/sin x becomes -(2 cos^2 (x/2))/(2 sin (x/2) cos (x/2)), which simplifies to -cos (x/2) / sin (x/2) = -cot (x/2). We can check that this is an antiderivative of the original function. (When we differentiate it, we get (1/2) csc^2 (x/2) = 1/(2 sin^2 (x/2)). Using the half-angle formula 1 - cos x = 2 sin^2 (x/2) turns this into 1/(1-cos x), our original integrand.)

Since -cot (x/2) is in fact continuous at x = pi, the definite integral of the original function from pi/2 to 3pi/2 is the difference between the values of this function at those points; and since it equals the result that Stewart's calculation leads to, -((cos x) + 1)/sin x, at pi/2 and 3pi/2, the difference between the values of that function at those two points (namely 1 - (-1) = 2) is the correct integral. But as you pointed out, we couldn't have known that without finding this alternative expression for it.
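As a numerical sanity check of this conclusion, here is a short Python sketch (the helper names are my own) comparing three things: a composite Simpson's rule approximation of the integral of 1/(1 - cos x) from pi/2 to 3pi/2, the difference of the values of -cot(x/2) at the endpoints, and the difference of the values of Stewart's form -((cos x) + 1)/sin x at the endpoints. Simpson's rule only evaluates the original integrand, which is continuous on the whole interval, including x = pi.

```python
import math


def integrand(x):
    return 1.0 / (1.0 - math.cos(x))


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def antiderivative_half_angle(x):
    """-cot(x/2): defined and continuous at x = pi."""
    return -math.cos(x / 2) / math.sin(x / 2)


def antiderivative_stewart(x):
    """-((cos x) + 1)/sin x: undefined at x = pi, fine at the endpoints."""
    return -(math.cos(x) + 1.0) / math.sin(x)


a, b = math.pi / 2, 3 * math.pi / 2
numeric = simpson(integrand, a, b)
via_half_angle = antiderivative_half_angle(b) - antiderivative_half_angle(a)
via_stewart = antiderivative_stewart(b) - antiderivative_stewart(a)
print(numeric, via_half_angle, via_stewart)
```

All three values come out equal to 2 up to rounding; the agreement of the last two is exactly the coincidence at the endpoints described above, which one could not have trusted without the continuous antiderivative.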
The development above suggests another way of doing the original integration: applying the half-angle formula 1 - cos x = 2 sin^2 (x/2) to the denominator of the integrand, which turns the integrand into (1/2) csc^2 (x/2). And in fact, this gets the same result faster.
----------------------------------------------------------------------
You ask what to look for in deciding which method of approximate integration (pp.495-504) to use.

The progression in this section of the book is from the naive to the more sophisticated; so if you really wanted to do a computation, the best of the techniques described in the section would be the one we will see in Wednesday's reading, "Simpson's rule". On the other hand, the easiest ones to remember, and to apply in a "scratchwork" first approximation, are the ones in today's reading, the easiest being those at the start. We'll see on Wednesday why some approximation methods work better than others.
----------------------------------------------------------------------
You ask about the relative values of Riemann sums versus the methods of approximate integration described in section 7.7 (pp.495-504).

The biggest value of Riemann sums is not in practical computations, but in developing the theory of integration, something you will see if you take Math 104. The formal definition of an integral says that \int_a^b f(x) dx exists and is equal to A if the Riemann sums approach A no matter what "sample points" x*_i one takes in the various subintervals, and no matter what subdivisions of [a,b] into equal or unequal subintervals one uses, as long as one lets the lengths of these subintervals approach 0. (See p. 367, Note 4 for divisions into unequal subintervals. Stewart justifies these in terms of a practical application, but they are essential to the general development.)
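To illustrate what the definition asks for, here is a minimal Python sketch (my own, with the arbitrary test function f(x) = x^2 on [0, 1], whose integral is 1/3). It forms Riemann sums over randomly chosen unequal subdivisions, with a randomly chosen sample point x*_i in each subinterval, and reports the largest subinterval length (the "mesh"). Random trials prove nothing, of course; they only show the sums settling down as the mesh shrinks. For this f, with |f'| \leq 2 on [0, 1], the error is in fact at most 2 times the mesh.

```python
import random


def random_riemann_sum(f, a, b, n, rng):
    """Riemann sum over n random (unequal) subintervals with random sample points.

    Returns (sum, mesh), where mesh is the length of the longest subinterval.
    """
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + interior + [b]
    total = 0.0
    mesh = 0.0
    for left, right in zip(points, points[1:]):
        x_star = rng.uniform(left, right)  # any sample point in [x_{i-1}, x_i]
        total += f(x_star) * (right - left)
        mesh = max(mesh, right - left)
    return total, mesh


def square(x):
    return x * x


rng = random.Random(0)  # fixed seed so runs are reproducible
for n in (10, 100, 1000, 10000):
    s, mesh = random_riemann_sum(square, 0.0, 1.0, n, rng)
    print(f"n={n:6d}  mesh={mesh:.5f}  sum={s:.6f}  error={abs(s - 1 / 3):.2e}")
```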
In Math (H)1AB, results we prove about integrals are based on the fact that if f is continuous, integration of f gives an antiderivative of f; so one can prove results such as the formula for change of variables using results on differentiation. But in the general theory, one integrates functions that may not be continuous, and in that case, the integral may not be differentiable; so one has to develop the general theory of integration without relying on the theory of differentiation. Then results like the change-of-variable formula require the general definition of a Riemann integral based on not-necessarily-equal subdivisions (because when one changes variable, equal subdivisions generally become unequal).
----------------------------------------------------------------------
You ask about the "K" in the error estimates for the various approximation rules (pp.499, 503).

The idea of the error estimates is that if the function f doesn't vary too "wildly", the results of the approximation formulas will be close to the actual value of the integral. The way the function varies depends on its derivatives; so if we know that one or another derivative never exceeds a certain value in absolute value, then we can say that the error in the result of applying the approximation formula will not exceed a value computed from this. For instance, in box [3], p.499, "K" denotes any number that we know |f''| never exceeds in the given interval; if we know that, we get the bounds on the errors shown in the last line of that box.
----------------------------------------------------------------------
You both ask why, in the first display on p.501, the B disappears after the first step.

Stewart answers this in the left-hand margin, saying "Here we have used Theorem 5.5.7". To find that theorem, turn to section 5.5, and look for the boxed theorem number "[7]". (That takes a bit of looking, but it's on p. 405.) After checking that theorem, do you see the explanation?
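To make the role of K concrete, here is a small Python sketch of the box [3] bounds, |E_T| \leq K(b-a)^3/(12n^2) for the Trapezoidal Rule and |E_M| \leq K(b-a)^3/(24n^2) for the Midpoint Rule, using an integrand of my choosing: \int_1^2 (1/x) dx = ln 2. On [1, 2], f''(x) = 2/x^3, so |f''(x)| \leq 2 for all x in the interval, and we may take K = 2. The code compares the actual errors with these bounds.

```python
import math


def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)


def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def f(x):
    return 1 / x


a, b = 1.0, 2.0
exact = math.log(2)
K = 2.0  # |f''(x)| = 2/x^3 <= 2 for ALL x in [1, 2]

for n in (5, 10, 20):
    err_T = abs(exact - trapezoid(f, a, b, n))
    err_M = abs(exact - midpoint(f, a, b, n))
    bound_T = K * (b - a) ** 3 / (12 * n ** 2)
    bound_M = K * (b - a) ** 3 / (24 * n ** 2)
    print(f"n={n:2d}  |E_T|={err_T:.6f} <= {bound_T:.6f}   |E_M|={err_M:.6f} <= {bound_M:.6f}")
```

Choosing K = 2 works only because it bounds |f''| at every point of [1, 2]; taking instead the smallest value of |f''| there (1/4, at x = 2) would give "bounds" that the actual errors exceed.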
----------------------------------------------------------------------
I hope my discussion in lecture clarified the point you asked about. Where Stewart writes "for a \leq x \leq b" in the first sentence of the error bound statements on p. 499 and p.503, this is formally ambiguous between "for all x satisfying a \leq x \leq b" and "for some x satisfying a \leq x \leq b"; but what is meant is "for all" in both cases. If we merely knew that |f''| or |f''''| (the absolute value of the second or fourth derivative) was \leq K at _some_ point of our interval, this wouldn't give much information on how the function behaved in the interval as a whole, and so wouldn't allow us to get an error bound. Only from knowing that it is \leq K at _all_ x in the interval can we draw conclusions limiting how badly f can "stray" from its expected behavior. So if we took for K the smallest value |f''| (or |f''''|) attains, the bounds would not in general be true; we must take the largest value it attains, or more generally (if we can't be sure of the largest value) anything we know is at least the largest value. As Stewart says in the margin on p. 499, smaller values of K give better bounds; but these must still be taken from among values of K with the property that |f''(x)| \leq K for _all_ x in the interval. (There in the margin, he does explicitly say "for all".)
----------------------------------------------------------------------
You ask why the error estimate for Simpson's rule (p.503) uses the fourth derivative rather than the second, as the Midpoint and Trapezoid rules do.

If a function has constant second derivative, it will be a quadratic polynomial; and as Stewart points out, for such a function, the errors in the Midpoint and Trapezoid rules are opposite in sign, and in a ratio 1:2. A consequence is that when this is so, a 2:1 weighted average of the results of those two rules, (2M + T)/3, gives the exact integral.
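The claim about quadratics can be checked directly with exact arithmetic. Here is a minimal Python sketch using test functions I picked for illustration, x^2, x^3 and x^4 on [0, 1], with a single subinterval (n = 1). It computes the Midpoint and Trapezoid errors E = (exact) - (approximation) and the weighted average (2M + T)/3. For x^2 the errors come out as 1/12 and -1/6 (ratio 1 : -2) and the average is exact; the average is also exact for x^3 and fails only for x^4, which is what the following paragraph explains.

```python
from fractions import Fraction


def midpoint_1(f, a, b):
    """Midpoint Rule with one subinterval."""
    return (b - a) * f((a + b) / 2)


def trapezoid_1(f, a, b):
    """Trapezoidal Rule with one subinterval."""
    return (b - a) * (f(a) + f(b)) / 2


a, b = Fraction(0), Fraction(1)
cases = {
    "x^2": (lambda x: x**2, Fraction(1, 3)),  # (function, exact integral over [0, 1])
    "x^3": (lambda x: x**3, Fraction(1, 4)),
    "x^4": (lambda x: x**4, Fraction(1, 5)),
}
for name, (f, exact) in cases.items():
    M, T = midpoint_1(f, a, b), trapezoid_1(f, a, b)
    E_M, E_T = exact - M, exact - T
    weighted = (2 * M + T) / 3
    print(name, " E_M =", E_M, " E_T =", E_T, " (2M+T)/3 - exact =", weighted - exact)
```

With one subinterval, (2M + T)/3 works out to ((b - a)/6)(f(a) + 4f((a+b)/2) + f(b)), the familiar weights 1, 4, 1 of Simpson's rule.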
This 2:1 weighted average is Simpson's rule, and the above observation shows that it can only fail to give the exact answer if the second derivative is non-constant, i.e., if the function has nonzero third derivative. Further, it turns out that if the second derivative increases as we go to one side of the midpoint and decreases by the same amount as we go to the other side, the effects cancel because of the symmetry of our rules; so for Simpson's rule to fail to give the exact answer, we have to have the second derivative changing in a non-straight-line manner; in other words, the second derivative of the second derivative (the fourth derivative) has to be nonzero! The error estimate makes this quantitative: if we know a bound on that fourth derivative, we can use it to bound the error in Simpson's rule.
----------------------------------------------------------------------
You ask, in the case of an integral from -infinity to +infinity, and its expression as the integral from -infinity to a plus the integral from a to +infinity (p.509), "If one of the integrals diverges, does the whole integral, from negative infinity to positive infinity, diverge?"

Yes. The whole integral only converges if both parts do.
----------------------------------------------------------------------
You ask how, in the calculation in the middle of p.510, Stewart gets from one step to the next.

Note the words before that calculation: "... by l'Hospital's Rule we have". Did you learn l'Hospital's Rule in your AP calculus course? If you didn't, or if you are not sure you remember it in full, look up "l'Hospital's Rule" in the index of this text, and review it there. Of course, if you have any questions about how it works, you can e-mail them to me.

Points to be learned from this: Stewart often explains his calculations, so if you don't understand one, look at the words that precede or follow it, or sometimes (though not this time) words he puts in the margin by the calculation.
And if he refers to some topic you are not sure about, use the index.
----------------------------------------------------------------------
You ask whether, in Example 3 on p.510, we couldn't just find the integral by integrating from -t to +t and taking the limit as t -> infinity.

We could if we knew that the integral existed! But Exercise 61 on p. 516 (listed in the "interesting/challenging" category in this week's homework sheet) shows that the limit might exist even if the integral does not. So we need to do the two integrations to be sure the integral is defined.
----------------------------------------------------------------------
Concerning the warning Stewart gives on p.513 at the end of Example 7, you ask why one can't evaluate the definite integral "the ordinary way" in that case.

The ordinary method of evaluating the definite integral is based on finding an antiderivative (in this case, ln |x|), and using part 2 of the Fundamental Theorem of Calculus (p. 384) to deduce that the difference of its values between the endpoints equals the definite integral. The Fundamental Theorem of Calculus is stated for continuous functions defined on an interval. In Stewart's language, 1/x is discontinuous there; in mine, it is not defined on the whole interval, and in fact has a pole (singularity) at x = 0. Whichever way one says it, the Fundamental Theorem of Calculus is not applicable to such functions. Intuitively, if you think of trying to "add up little bits of" f(x), this process does not converge, so it doesn't make sense to say that the result of the process is the difference in the values of ln |x| between the endpoints.
----------------------------------------------------------------------
You ask whether Stewart's notation |P_{i-1}P_i| for the distance from P_{i-1} to P_i (pp.525-526) is standard.

No, it isn't, to my knowledge.
It is common to use absolute value signs for the magnitude of a vector, so I think Stewart's idea is to use P_{i-1}P_i to mean the vector from the point P_{i-1} to the point P_i. Such notation might be used in physics, but I think physicists would put an arrow above P_{i-1}P_i to express "vector". Another notation would be |P_i - P_{i-1}|, where P_i - P_{i-1} would denote the vector whose coordinates are gotten by subtracting the coordinates of P_{i-1} from those of P_i. A simpler notation, which you are likely to see in Math 104, is d(P_{i-1}, P_i), where d(--,--) is the "distance function".
----------------------------------------------------------------------
You ask how Stewart goes from the formula in the 3rd display on p.526, which involves f'(x*_i), to the integral in the 4th display, which involves f'(x).

Stewart is using the definition of the integral; see the boxed definition on p. 366. If you have questions about that definition, let me know.
----------------------------------------------------------------------
You ask why, in the arc length formula on p.526, f' is required to be continuous.

This is so that we can be sure that the integral in the definition of the arc length is defined. Discontinuous functions may or may not be integrable; this is a difficult topic, dealt with starting in Math 104. Continuous functions are always integrable. Curves with discontinuous derivatives may or may not have finite arc length.
----------------------------------------------------------------------
You ask whether Simpson's rule (suggested on p.528) would be more accurate for estimating arc lengths than simply adding up the lengths of the line segments in the same subdivision of the interval of definition.

I don't know for sure, but my guess is that Simpson's rule would be more accurate.
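The guess can at least be tested on an example. Here is a Python sketch, taking as my test case the curve y = 1/x for 1 \leq x \leq 2 (part of the hyperbola xy = 1) with n = 10 subintervals. It compares the length of the inscribed polygon with Simpson's rule applied to \int_1^2 \sqrt{1 + f'(x)^2} dx, using a Simpson computation with 2000 subintervals as the reference value. One example settles nothing in general, but on this one Simpson's rule comes out noticeably closer, while the polygon comes out too short, as the bias discussed next predicts.

```python
import math


def f(x):
    return 1.0 / x


def f_prime(x):
    return -1.0 / x**2


def speed(x):
    """Arc-length integrand sqrt(1 + f'(x)^2)."""
    return math.sqrt(1.0 + f_prime(x) ** 2)


def simpson(g, a, b, n):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def polygon_length(f, a, b, n):
    """Total length of the chords joining (x_i, f(x_i)) for equally spaced x_i."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    return sum(math.hypot(x1 - x0, f(x1) - f(x0)) for x0, x1 in zip(xs, xs[1:]))


a, b, n = 1.0, 2.0, 10
reference = simpson(speed, a, b, 2000)  # many subintervals: reference value
poly = polygon_length(f, a, b, n)
simp = simpson(speed, a, b, n)
print(f"reference {reference:.8f}")
print(f"polygon   {poly:.8f}  error {poly - reference:+.2e}")
print(f"Simpson   {simp:.8f}  error {simp - reference:+.2e}")
```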
The line-segment computation has a built-in bias: it gives smaller values than the real length, because each line segment is the shortest path between its endpoints, hence shorter than the corresponding piece of the actual curve. Simpson's rule is designed to make two sorts of biases, those of the midpoint and trapezoid rules, cancel each other to a large extent. However, since these are both different from the bias of the line-segment approximation, one would have to do some calculation to answer your question with more certainty.
----------------------------------------------------------------------
You ask regarding Example 3 on p.528, "Does an arc length function exist for the hyperbola xy = 1, or is always necessary to estimate it?"

The function exists, but I believe it is not an elementary function. Some non-elementary functions can be found in tables. It is also often possible to calculate them accurately by other means, such as power series expansions. Methods of estimation such as those we read about in section 7.7, which Stewart uses in this example, are yet another method.
----------------------------------------------------------------------
Regarding the concept of surfaces of rotation on pp.525-530, you ask whether one can do a "double rotation" of a curve, rotating it first about one axis and then about the other.

Well, one could; but since the first rotation would give a two-dimensional structure (a surface), the second would turn this into a three-dimensional structure (a solid). The boundaries of this solid would be hard to describe, so its volume would be hard to compute as well. But it's interesting to think about what one would get if one simply took a point, rotated this around the x-axis to get a circle, and rotated that circle around the y-axis to get a surface. (That is not an example of what is done in this reading, because the circle one would rotate around the y-axis would not lie in the x-y-plane, as the curves considered in this reading do.)
Can you figure out what the resulting surface would be? A different way one could do two rotations would be in more than 3 dimensions. For instance, in 4 dimensions, calling the coordinates w, x, y and z, one could start with any curve in the x-y-plane, rotate this by performing a rotation on the y-z-coordinates (as one does in this section), getting a surface in x-y-z space, then perform another rotation on the w-x-coordinates, getting a 3-dimensional hypersurface in the whole w-x-y-z space. I don't think it would be hard to compute its volume; but it's outside the scope of this course. ---------------------------------------------------------------------- As you say, if we compute the arc length of the function in Example 4, pp.529-530, from x=0, we would get an infinite result, since as x ->0, the logarithm function approaches -infinity, so x^2 - (1/8) ln x approaches +infinity. This shows that the graphs on p.530 are badly drawn! Stewart was just interested in values \geq 1, but he should have had the graph drawn correctly for all values it shows. Thanks for bringing this to my attention -- I'll include it in the list of comments and corrections I send Stewart at the end of the semester! ---------------------------------------------------------------------- You ask about the relation between the "ds"'s one looks at when rotating a curve about the x-axis and about the y-axis, and whether one should use one formula for ds in the first case and the other in the second (pp.534-535). They represent the same thing; intuitively, the length of a "bit" of the curve that one is rotating. In both cases, one can compute the surface area using either the formula for ds in terms of dx, or the formula in terms of dy. In Example 2 on p. 536, Stewart considers a parabola rotated about the y-axis, and shows that using the two formulas, one gets the same answer.
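Here is a quick numerical check of that agreement, as a Python sketch. I am assuming the arc in Example 2 is y = x^2 from (1,1) to (2,4); if I have misremembered the example, the same idea applies to whatever arc it uses. The radius of rotation is x in both integrals; what changes is whether ds is written as \sqrt{1 + (dy/dx)^2} dx or as \sqrt{1 + (dx/dy)^2} dy. The helper simpson is my own, and both results are compared with (\pi/6)(17\sqrt{17} - 5\sqrt{5}), which is what the substitution u = 1 + 4x^2 gives for the dx integral.

```python
import math


def simpson(g, a, b, n=1000):
    """Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# ds in terms of dx: the curve is y = x^2, so dy/dx = 2x; the radius is x.
area_dx = simpson(lambda x: 2 * math.pi * x * math.sqrt(1 + (2 * x) ** 2), 1.0, 2.0)

# ds in terms of dy: the same curve is x = sqrt(y), so dx/dy = 1/(2 sqrt(y)); the radius is sqrt(y).
area_dy = simpson(lambda y: 2 * math.pi * math.sqrt(y) * math.sqrt(1 + 1 / (4 * y)), 1.0, 4.0)

exact = math.pi / 6 * (17 * math.sqrt(17) - 5 * math.sqrt(5))
print(area_dx, area_dy, exact)
```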
---------------------------------------------------------------------- You note that the surface area of a sphere, 4 pi r^2 (illustrated by Example 1 on p.535) is the derivative of its volume, (4/3) pi r^3, and you ask whether the areas of other surfaces can be obtained as derivatives of their volumes. In the case you refer to, we are looking at a whole family of spheres, one for each value of r, and as we increase r, the surface "grows" perpendicular to the tangent plane at each point. If we have a family of closed surfaces described in terms of some parameter t, such that as t changes, the surface grows at constant rate 1 in the direction perpendicular to its tangent plane, then the area at any value of t will be the derivative of the volume. But it can be tricky to design such families of surfaces; so one doesn't have a really useful general technique. ---------------------------------------------------------------------- You ask how Simpson's rule can be used to approximate the area of a surface of revolution, as suggested in Exercises 17-20, p.537, when Simpson's rule relates to the area under a curve, which is different from areas of surfaces of revolution. Simpson's rule is applicable to the integral of any function (and the error estimate applies whenever the function is four times continuously differentiable). Even though Stewart uses pictures in which the integral represents the area under a curve, in order to make the rule intuitively reasonable, it is not restricted to that case. Moreover, if one wishes, one can translate the problem of finding the area of a surface of revolution into that of finding the area under a curve: The area of the surface gotten by rotating y = f(x) around the x-axis is given by \int_a^b 2\pi f(x) \sqrt{1+f'(x)^2} dx, so it is equal to the area under the curve y = 2\pi f(x) \sqrt{1+f'(x)^2} from x=a to x=b.
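To make this concrete, here is a Python sketch that applies Simpson's rule directly to the integrand 2\pi f(x) \sqrt{1+f'(x)^2}, treating it as "the area under a curve". The choice f(x) = 1/x on [1, 2] is mine, for illustration only, and is not meant to be one of the exercises. For this particular curve the substitution u = x^2 happens to give a closed form, \pi[\sinh^{-1}(u) - \sqrt{1+u^2}/u] evaluated from u = 1 to u = 4, which lets us check the estimates.

```python
import math


def simpson(g, a, b, n):
    """Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x):
    return 1 / x


def fprime(x):
    return -1 / x ** 2


def integrand(x):
    # Height of the "curve" whose area underneath equals the surface area.
    return 2 * math.pi * f(x) * math.sqrt(1 + fprime(x) ** 2)


estimate_10 = simpson(integrand, 1.0, 2.0, 10)
estimate_1000 = simpson(integrand, 1.0, 2.0, 1000)


def antiderivative(u):
    # Antiderivative of the integrand, written in terms of u = x^2.
    return math.pi * (math.asinh(u) - math.sqrt(1 + u * u) / u)


exact = antiderivative(4.0) - antiderivative(1.0)
print(estimate_10, estimate_1000, exact)
```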
---------------------------------------------------------------------- You ask how Stewart handles the units in Example 2, p.541. The key step is substituting 62.5 for \delta. He notes in the margin on the preceding page that in customary units, the weight density of water is 62.5 lb/ft^3. ---------------------------------------------------------------------- You ask what the significance of moments (pp.542-547) is. As I think I said in class, moments don't have one single significance: they are a type of calculation that comes up in various situations. If you have a function of one variable, and you take the integral of that function multiplied by the n-th power of the variable, that is called the n-th moment of the function. More generally, if you have a function of several variables, and you take the multiple integral of that function multiplied by a product of various powers of the various variables, the results one gets are likewise called moments of the function. (You won't see multiple integrals defined in general until Math 53, but Stewart uses in section 8.3 what is essentially the case of a double integral where the function is constant with respect to one variable, so the part of the integration that would come from that variable can just be replaced by the length over which one would integrate times the value of the function at that location.) Two cases of the first moment (which Stewart just calls "the moment") of a function of one variable that come up involve the torque on a lever, gotten by integrating mass times distance from the fulcrum, and the volume of a solid of rotation, where the distance from the axis of rotation becomes a factor in the integration because when one rotates an object, the distance a piece of it moves is proportional to its distance from the axis of rotation.
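As an illustration of these definitions (the region is my own choice, not one of Stewart's examples), here is a Python sketch for the lamina of constant density under y = x^2, 0 \leq x \leq 1. The zeroth moment of f is the area, the first moment of f (weighting by the x-coordinate) is what Stewart calls the moment about the y-axis, and the moment about the x-axis uses the integral of (1/2) f(x)^2, since each thin strip has its center at height f(x)/2. The density cancels when we divide by the mass, so it is left out; the helper names are hypothetical.

```python
def simpson(g, a, b, n=100):
    """Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def moment(f, k, a, b):
    """k-th moment of f on [a, b]: the integral of x**k * f(x)."""
    return simpson(lambda x: x ** k * f(x), a, b)


def f(x):
    return x ** 2


a, b = 0.0, 1.0
area = moment(f, 0, a, b)                          # zeroth moment = area
m_y = moment(f, 1, a, b)                           # weights each strip by its x-coordinate
m_x = simpson(lambda x: 0.5 * f(x) ** 2, a, b)     # weights each strip by its height f(x)/2
x_bar, y_bar = m_y / area, m_x / area
print(f"center of mass: ({x_bar:.6f}, {y_bar:.6f})")
```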
Another way moments come up is in probability theory (section 8.5, which we are skipping), where the expected value of a variable is the moment of the probability function with respect to that variable. > ... why is the moment about the y-axis the sum of the > masses times the x coordinates? Because the distance from the y-axis is the x-coordinate. As I said in class, what Stewart calls "the moment about the y-axis" is better described as "the moment with respect to the x-coordinate". ---------------------------------------------------------------------- You ask regarding the top display on p.544, "... is the p supposed to be area density?" Good point! On p. 539, Stewart defines rho to be the density of a fluid; that is, the mass per unit volume. In the bottom paragraph of p. 543, on the other hand, he takes rho to be the "density" of a lamina, but doesn't say what this means. He must mean the mass per unit area. ---------------------------------------------------------------------- Regarding the derivation of the equations for the center of mass on pp.544-545, you note that they are based on using midpoints, and ask whether it would be more accurate to use Simpson's rule. If we wanted to compute a center of mass using an approximation by dividing the mass into finitely many strips, then Simpson's rule would be more accurate than the midpoint rule. But both these rules are ways of approximating an integral, and the point of the discussion Stewart gives is to show what integral is being approximated. So once he gets to the integral, the approximation that he used to lead up to it doesn't matter. The best choice is simply the one that gives the quickest, clearest derivation. ---------------------------------------------------------------------- You ask how one can use Simpson's rule in Exercise 36, p.549, when no formula is given for f(x). You are supposed to use the graph shown to estimate the values of f(x) at the different points needed. 
Presumably, you are expected to let n = 8, and use the values on the coordinate-lines. ---------------------------------------------------------------------- > What exactly is logistic differential equation? It's just a strange name that people have given to a certain differential equation proposed to model population growth; equation 2 on p.568. Why is it called "logistic"? After looking online, I think the following is the explanation. The word "logistic", in addition to its other meanings, used to have the mathematical meaning "related to logarithms and exponentials". The mathematical biologist Verhulst, mentioned on p.568, not only proposed the differential equation for population growth, but found the solution, equation 7 on p. 595. Since this solution involves an exponential function (though it is not itself such a function), he called it "logistic growth". And since his differential equation leads to "logistic growth", it came to be called the "logistic differential equation". ---------------------------------------------------------------------- In connection with the material of section 9.1 (pp.566-570) you ask, "Are all differential equations solvable?" In one sense, namely "Given a differential equation, can we find an elementary function which is a solution to the equation?", the answer is certainly "No", since if the equation has the form y' = f(x), then a solution would be an integral of f, and we know that some elementary functions have nonelementary integrals (pp. 487-488). Another sense of your question is "Given a differential equation, must there exist a function (elementary or not) which is a solution to the equation?". The answer is still "No". 
But recall that I gave in class an example of an initial-value problem with more than one solution, and pointed out that in that case, the function f giving the differential equation was not differentiable at the relevant point; and I said that there are theorems (at least one case of which you'll see in Math 54) saying that for "reasonable" differential equations y' = f(x,y), an initial value cannot correspond to more than one solution. Those theorems also tell us that for such equations, every initial value does correspond to some solution. So for "reasonable" differential equations, the answer to your question is yes. ---------------------------------------------------------------------- Regarding the example on p.570, you ask "How does y' = .5(y^2-1) become equal to (2ce^t)/(1-ce^t)^2?" Hopefully, you understand that the equation y' = .5(y^2-1) does not "become" some other formula. Rather, Stewart is saying that examples of functions y satisfying y' = .5(y^2-1) are given by the functions y = (1+ce^t)/(1-ce^t). (Notice that the former equation is a formula for y', the latter a formula for y.) The expression (2ce^t)/(1-ce^t)^2 is what one gets by differentiating that formula for y, and also what one gets by substituting it into .5(y^2-1); since the two computations agree, the function satisfies the equation. Stewart isn't saying here how to discover the solution to this differential equation! In this section he is teaching us what it means for a function f to be a solution to the equation. Once we understand that, he will show us, in later sections, some methods of finding solutions. ---------------------------------------------------------------------- Regarding Example 1 on p.581, you note that Stewart simplifies the solution by writing K in place of 3C, and you ask, "Since C is an arbitrary constant, can we write C in place of 3C?" Well, sometimes we do say things like "Let us write C for what we previously called 3C". Other times, one chooses to keep one's notation consistent between different parts of a calculation, and uses a different letter. One tries to balance the goals of simplicity, brevity, and clarity. Different people make different choices.
---------------------------------------------------------------------- Regarding the Paramecia example on pp.596-597, you write >In the beginning of the solution, Stewart comments that biologist G.F. >Gause used the same relative growth rate from the exponential growth >model for paramecia for his logistic growth model. Stewart states that >this is reasonable because the initial population is small compared to >the carrying capacity. Roughly how large can the initial population >be compared to the carrying capacity before this assumption ceases >to be reasonable? Such things depend on the degree of accuracy that one wants in determining one's constants. As I said in class, despite Gause's giving the relative growth rate to 4 decimal places, he could not have been claiming it was accurate to that many places, so we don't know what accuracy he was claiming. Even more important than the ratio of the initial population to the carrying capacity is the ratio of the later populations that are used to estimate the "initial relative growth" to the carrying capacity. E.g., if Gause based his 0.7944 on the values of P(0), P(1), P(2), P(3), as may well be the case, since the black curve in Figure 4 (p. 597) seems to be nestled nicely among those first four points, then that value of k may be far from appropriate for the logistic curve, since those values extend into a region where the black and red curves of Figure 4 are quite far apart. Perhaps the best value to use for logistic growth would have been larger, leading to a curve that rose faster before flattening out, and fit the data better. On the other hand, I find it suspicious that P(3) sits squarely on the red curve in that graph. I wonder whether Gause chose his coefficients in the logistic equation precisely to make P(0) = 2, P(3) = 16, and K = 64. If that was so, then he didn't really estimate k by applying the exponential model to the initial growth.
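That hypothesis can be tested with a short computation, sketched here in Python using the standard solution of the logistic equation, P(t) = K/(1 + A e^{-kt}) with A = (K - P_0)/P_0. The requirement that the curve pass exactly through P(0) = 2 and P(3) = 16 with K = 64 is the guess just described, not something known about Gause's actual procedure. Setting P(3) = 16 gives 1 + 31 e^{-3k} = 4, so e^{-3k} = 3/31 and k = ln(31/3)/3.

```python
import math


def logistic(t, p0, k, K):
    """Solution of dP/dt = k P (1 - P/K) with P(0) = p0."""
    A = (K - p0) / p0
    return K / (1 + A * math.exp(-k * t))


p0, K = 2.0, 64.0

# Force the curve through P(3) = 16: 64 / (1 + 31 e^{-3k}) = 16.
k_fit = math.log(31 / 3) / 3
print(round(k_fit, 5))  # → 0.77846

# For comparison, where Gause's k = 0.7944 puts the curve at t = 3.
print(logistic(3, p0, 0.7944, K))
```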
Hmm, I just pulled out a table of logarithms and found the value of k that would give those values of P(0) and P(3), and the limit 64; and it was 0.77846. Close, but not 0.7944. So we really can't tell how he computed his value. ---------------------------------------------------------------------- In connection with the material on p.602, you write > ... Stewart goes from xy' + y = (xy)' to (xy)' = 2x. > How does he make this connection? Good question! Note that between the two equations that you quote, he says, "and so we can write the equation as"; so we have to find "the equation" he is referring to! This is the equation xy' + y = 2x, in the sentence containing equation (2). When I write him, I will suggest that he display and number that equation, and refer to it by number. ---------------------------------------------------------------------- You ask whether the natural growth equation y' = ky for rabbits in the absence of wolves (p.608) is similar to the model of the growth of a rabbit population given by the Fibonacci sequence. Yes, but the formula defining the Fibonacci sequence is a "difference equation" rather than a "differential equation": Instead of describing the "instantaneous" rate of change, it describes the change over a fixed interval of time. If we write it as f_n - f_{n-1} = f_{n-2}, it says that the change in the number of pairs of rabbits as n changes by 1 is given by the number of pairs of rabbits -- and not the present number, but the number two units of time ago. The relation between solving difference equations and solving differential equations is like the relation between summing a series and integrating a function. It is often difficult to find an explicit solution to a difference equation (though one can easily compute any number of terms), just as it can be difficult to find an explicit formula for the sum of a series.
However, linear difference equations with constant coefficients, like linear differential equations with constant coefficients, have solutions which, in general, are given by linear combinations of exponential functions; and this approach leads to an expression for the general term of the Fibonacci sequence in that form. There is a free online calculus text, "Difference Equations to Differential Equations" by Dan Sloughter, http://synechism.org/drupal/de2de/ , which I suppose, based on its title, emphasizes the relationship between the two sorts of equations. (But I haven't gone through it.) ---------------------------------------------------------------------- Regarding Stewart's comment in the first paragraph of p.609 that it is usually impossible to find explicit formulas for the R and W satisfying the Lotka-Volterra equations, you ask > ... Are there certain type of differential equations which have > been proven to be impossible to solve non-graphically? ... We have already seen that there are elementary functions whose integrals are not elementary functions. Since an integration problem is a special sort of differential equation, these are examples of differential equations that cannot be "solved", if by solving one means naming the solution as a certain elementary function. Likewise, there are differential equations which are not themselves integration problems, but which can be reduced (using the method of separable equations, or the method described in the handout) to integration problems where the integral is not an elementary function. Doubtless there are also differential equations whose solutions are non-elementary functions which don't arise in this way from integrals. But as with the case of integrals, one can always give the solution to such an equation a name and a symbol, calculate tables of values, and solve other differential equations with the help of that function.
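As an example of "calculating tables of values", here is a Python sketch that tabulates a Lotka-Volterra solution with the classical fourth-order Runge-Kutta method (a standard numerical technique, not one from this course). The coefficients k = 0.08, a = 0.001, r = 0.02, b = 0.00002 in dR/dt = kR - aRW, dW/dt = -rW + bRW, and the starting point (R, W) = (1000, 40), are illustrative assumptions; I believe they match Stewart's rabbits-and-wolves example, but nothing depends on that. As a check on the table, the code uses the fact that H(R, W) = bR - r ln R + aW - k ln W is constant along exact solutions (differentiating H along a solution gives 0).

```python
import math

# Illustrative coefficients (an assumption) for
#   dR/dt = k R - a R W,   dW/dt = -r W + b R W.
k, a, r, b = 0.08, 0.001, 0.02, 0.00002


def field(R, W):
    """Right-hand sides (dR/dt, dW/dt) of the predator-prey system."""
    return k * R - a * R * W, -r * W + b * R * W


def rk4_step(R, W, h):
    """Advance (R, W) by one classical fourth-order Runge-Kutta step of size h."""
    k1 = field(R, W)
    k2 = field(R + h / 2 * k1[0], W + h / 2 * k1[1])
    k3 = field(R + h / 2 * k2[0], W + h / 2 * k2[1])
    k4 = field(R + h * k3[0], W + h * k3[1])
    return (R + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            W + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def invariant(R, W):
    """b R - r ln R + a W - k ln W, which is constant along exact solutions."""
    return b * R - r * math.log(R) + a * W - k * math.log(W)


R0, W0, h = 1000.0, 40.0, 0.1
R, W = R0, W0
table = [(0.0, R, W)]
for step in range(1, 2001):          # integrate from t = 0 to t = 200
    R, W = rk4_step(R, W, h)
    if step % 200 == 0:              # record a row every 20 time units
        table.append((step * h, R, W))

for t, R_t, W_t in table:
    print(f"t = {t:6.1f}   R = {R_t:8.1f}   W = {W_t:6.1f}")

# How far the numerical solution has wandered off its exact trajectory.
drift = abs(invariant(R, W) - invariant(R0, W0))
print(f"change in the conserved quantity: {drift:.1e}")
```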
---------------------------------------------------------------------- > In the bottom of p.609, Stewart uses the chain rule to get > dW/dR, why did he do that? Is it wrong to just divide dR/dt > by dR/dt to get dW/dR or do we have to use the chain rule? Well, when we are dealing with situations where there is just one independent variable, expressions like dy/dx behave like fractions, and the chain rule dW/dt = dW/dR dR/dt can be thought of as cancellation of a numerator and denominator. So it is safe to treat these derivatives symbolically as fractions, and multiply them and divide them as one would multiply and divide fractions -- if one remembers that this only works in the case of 1 independent variable. But when you get to Math 53, you will have a more general kind of differentiation: Given a function of (say) two variables, y = F(w,x), you will learn about taking "the derivative of y with respect to w as x is held constant" and "the derivative of y with respect to x as w is held constant", which will be written "curly-d y / curly-d w" and "curly-d y / curly-d x" (where "curly-d" is a symbol sort of like a backwards 6). In these situations, one can't treat them simply as fractions: the chain rule takes a more complicated (though still elegant) form. So in conclusion, you can "just divide" dW/dt by dR/dt -- if you keep in mind that this only works when there is just one independent variable. ---------------------------------------------------------------------- You ask about the time it takes to complete various trajectories of the predator-prey equation in phase space (p.610). One certainly can't tell that from the direction-field alone.
As I indicated in class, one could draw a direction-field diagram in which, at each point one had, not just a mark showing the direction, but an arrow whose length represented the speed with which the system moved in that direction; and using this, one could draw curves with widely or narrowly spaced markings indicating the passage of time. > ... do the trajectories further out imply a longer period, or do > they imply the same period because the derivatives will be higher? Near the equilibrium point -- let's call it (R_e,W_e) -- I am fairly sure that the period will approach some nonzero limiting value. This is because the pair (R',W') can be approximated near that point by a linear function of (R-R_e, W-W_e), and the differential equation determined by a linear function rotates the whole plane (around concentric ellipses) with a constant period. As we move out from the equilibrium point, and the linear approximation fails, the period doubtless changes. It's not obvious whether it will increase or decrease. My guess is that it will increase; because if we take an initial point with W very very low, then R will grow with approximately "natural increase" for a long time, until the initially tiny value of W builds up enough to start bringing R down. ---------------------------------------------------------------------- You ask how there can be an equilibrium point in a phase portrait like that on p.610, when every point shows a slope indicating change in the population. At the point marked by the red dot in the center of the phase portrait, there is no slope: It is the point where the numerator and the denominator of the formula near the top of p.610 are both 0. So dW/dt and dR/dt are both 0, and the population does not change. The thing to remember is that the phase portrait ignores time.
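To see this numerically, the Python sketch below evaluates dR/dt and dW/dt at points sliding in toward the equilibrium (R_e, W_e) = (r/b, k/a) of the system dR/dt = kR - aRW, dW/dt = -rW + bRW. The coefficients are illustrative assumptions (chosen, I believe, to match Stewart's example), and the particular points are my own choice. The "speed" column is the length of the arrow in the arrow-length picture described above; the slope dW/dR is what the ordinary phase portrait shows.

```python
import math

# Illustrative coefficients (an assumption) for
#   dR/dt = k R - a R W,   dW/dt = -r W + b R W.
k, a, r, b = 0.08, 0.001, 0.02, 0.00002
R_e, W_e = r / b, k / a      # equilibrium: both right-hand sides vanish here


def rates(R, W):
    """(dR/dt, dW/dt) at the point (R, W)."""
    return k * R - a * R * W, -r * W + b * R * W


speeds, slopes = [], []
for s in (1.0, 0.5, 0.1, 0.01):
    R, W = R_e + 200 * s, W_e + 20 * s     # points sliding in toward (R_e, W_e)
    dR, dW = rates(R, W)
    speeds.append(math.hypot(dR, dW))      # arrow length: amount of change per unit time
    slopes.append(dW / dR)                 # what the phase portrait shows: dW/dR
    print(f"s = {s:4.2f}   speed = {speeds[-1]:8.4f}   dW/dR = {slopes[-1]:.5f}")
```

The speeds shrink toward 0 while the slopes settle down near a fixed value, which is why the slope marks alone give no hint that the motion nearly stops there.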
When one is near the equilibrium point, the functions R and W have very slight change per unit time; but the portrait just shows dW/dR, which doesn't reflect the smallness of the change of each function. I mentioned in class that one could refine the technique of drawing a phase portrait by replacing the sloping direction-field symbols by arrows, such that an arrow is longer if dR/dt and dW/dt are larger, and shorter if they are smaller, even when their ratio (and hence the slope of the arrow) remains the same. Then as one looked at the phase portrait, one would see arrows getting shorter and shorter as one got near the equilibrium point, and finally going to zero length at that point. ---------------------------------------------------------------------- You ask why the precise definition of a sequence having limit "infinity" (p.678, Definition 5) does not use absolute values, like definitions of other sorts of limits. The absolute value signs (and the specification that the absolute value be less than some value delta or epsilon) are involved when a finite value is being approached either by the independent variable x or by the dependent variable f(x). This is because a finite value can be approached from either side; so to say that x is near a on one side or the other, or that f(x) is near L on one side or the other, one uses the formula |x-a| < delta, or |f(x)-L| < epsilon. But infinity can only be approached from one side, so to say that a variable is "near infinity" one simply says that it is greater than some given number; hence when both variables approach infinity, the definition uses only inequalities without absolute values, such as x > N and f(x) > M (p. 140, Definition 9) or n > N and a_n > M (p. 678, Definition 5). When one of the two variables (independent and dependent) approaches infinity, and the other approaches a real number, one gets definitions that contain one inequality involving an absolute value, and one without (p. 116, Definition 6; p. 131, Definition 1; p. 677, Definition 2). The same happens when one of the variables is approaching -infinity.
---------------------------------------------------------------------- You ask why the Squeeze Theorem for Sequences (p.679, first theorem) is only stated for limits as n --> infinity, while the Squeeze Theorem for Functions (p. 105) is stated for limits as x approaches an arbitrary a. For a function of a real variable, one can talk about the limit as that variable approaches an arbitrary number or infinity or -infinity. But for a function of an integer-valued variable, there is no concept of letting the variable "approach" an integer n. An integer either equals n, or it differs from n by at least 1; there is no "getting closer and closer". So limits as n approaches +infinity are the only kind that we can look at. (In other areas of mathematics, there are concepts of integers approaching integers. For instance, if one is interested in divisibility by 2 (or some other prime p), one can regard m as "close to" n if n-m is divisible by a large power of 2 (or generally, p); and one can define concepts of limit with respect to this concept of "closeness". You would see these concepts in Math 254; but they're out of the ballpark for Math 1AB.) ---------------------------------------------------------------------- You ask about the distinction between mathematical induction (p.683, note in left column) and deduction, and how often each is used. I would call any kind of precise reasoning "deduction"; so it would include mathematical induction. In nonmathematical usage, "induction" can refer to a non-rigorous kind of reasoning. The Oxford English Dictionary gives, as its 7th meaning of the word, Logic. a. The process of inferring a general law or principle from the observation of particular instances (opposed to DEDUCTION, q.v.). But in mathematics, where mathematical induction is a rigorously valid tool, there isn't a contrast between it and deduction. (The OED's 8th meaning of the word is that of mathematical induction.)
Mathematical induction is only one of many tools of mathematical reasoning, and a somewhat sophisticated one, so it occurs only in a small fraction of the cases of deduction; but it is a powerful tool in those situations where it is needed. There are also many situations where the reasoning is intuitively clear and we use mathematical induction without thinking to call it by that formal name. E.g., knowing that the derivative of a polynomial of degree n>0 has degree n-1, we can "see" that the k-th derivative of that polynomial for k\leq n has degree n-k; though to argue this precisely, one would have to use mathematical induction. ---------------------------------------------------------------------- You ask how, in Example 7, pp.691-692, showing that s_{2^n} diverges implies that the whole sequence of partial sums s_n diverges. If the sequence s_n converged, then there would be some L such that, as n --> infinity, the values of s_n became arbitrarily close to L. So the values s_{2^n}, being among these values, would also become arbitrarily close to L. But since they are approaching infinity, they are not becoming arbitrarily close to any fixed real number L. Having been shown the argument in rough form, you ought to be able to translate it into an "epsilon-delta" argument. Can you? ---------------------------------------------------------------------- You ask about Stewart's statement at (i) on p.703, that if the integral in question is convergent, then (4) gives an inequality involving a sum from 2 to n. Specifically, you ask why the sum begins at 2 rather than 1. Well, did you look back at formula (4) (the one labeled [4], in red, higher on the page) and see what it says? And if, on looking at that formula, you were puzzled at why the subscripts begin where they do, did you look at how Stewart obtained that formula?
After following the argument back, let me know what you understand, and what step(s), if any, need clarification, and I'll address these. (See the "Note" on the lower part of the back of the class handout, beginning "If in my office hours ...".) ---------------------------------------------------------------------- You ask about applying the Comparison Test (p.705) to a series one or more terms of which have forms like 1/0. If that happens, then what one has is not a series. The definition of a series requires that every term be a real number! Note that the series Stewart writes down always avoid such cases; e.g., p. 694, Ex.54 and p. 704, Ex.22 both start with "n=2" to avoid the zero denominator at n=1. ---------------------------------------------------------------------- After a few questions on the estimation of sums discussed on p.708, you ask > ... what is the use of sequences and series in general?? ... Well, it can go in either of two directions: One can have a known mathematical entity and get a useful handle on it by finding a series that represents it; or one can have some mathematical situation leading to sums of terms that get closer and closer to some value, and try to learn about where it is heading by finding a simple expression for that value. In the beginning of this chapter, we focused on the latter idea, e.g., figuring out what 1 + 1/2 + 1/4 + 1/8 + ... was approaching; but we have already seen bits of the former. E.g., by expressing the known rational number 1/7 as the sum of an infinite series, we get the decimal expression 1/7 = .142857142857142857..., which is easy to compute with in our decimal system of notation; and Stewart mentions here and there that series expansions for pi allow one to find its value to great accuracy.
Starting with section 11.9, we will be studying representations of functions by "power series", which give us new information on functions like e^x and sin x, as well as non-elementary functions like the integral of e^{-x^2}. ---------------------------------------------------------------------- You ask about justifying the assumption that a_n is less than b_n for all n greater than some number N in working p.709, Ex.40. Whether one can take something like that for granted depends on the level of the audience for which one is writing (or when one is a student being graded, the level of the course one is in). At this level, it is best to give the details. The way to do so is to use the definition of the statement that lim a_n/b_n = 0. That definition begins, "for every epsilon there exists an N such that ...". After filling in the "..." in what I have written, do you see what value of epsilon would have the effect that for n > N, a_n < b_n ? ---------------------------------------------------------------------- Regarding the error estimate for alternating series given on p.712, you write > ... I am always wondering what is the purpose of finding the size > of the error? If we want to know the sum of a series, and it's not one where we can find the exact answer, then the best we can do is add up a lot of terms and regard the result as an approximation of the sum. Then we naturally want to know how good an approximation it is -- if we know that it has an error of less than .001, for instance, then we have essentially found the sum to 3 decimal places. Sometimes, even when we do "know" the sum, these error estimates give useful information. For instance, if the sum is pi, or ln 2, then by summing terms of the series, we can find the value of pi or ln 2 to many decimal places. Incidentally, we don't "find" the size of the error; we bound it.
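For instance, with the alternating harmonic series 1 - 1/2 + 1/3 - ..., whose sum is ln 2, the Alternating Series Estimation Theorem guarantees |ln 2 - s_n| \leq 1/(n+1). The Python sketch below (the function name is my own) compares that bound with the actual error, which we can compute here only because we happen to know the sum independently.

```python
import math


def partial_sum(n):
    """s_n = 1 - 1/2 + 1/3 - ... + (-1)**(n+1)/n for the alternating harmonic series."""
    return sum((-1) ** (i + 1) / i for i in range(1, n + 1))


for n in (10, 100, 1000):
    actual = abs(math.log(2) - partial_sum(n))   # knowable here only because the sum is ln 2
    bound = 1 / (n + 1)                          # the first omitted term, b_{n+1}
    print(f"n = {n:4d}   actual error = {actual:.2e}   bound = {bound:.2e}")
```

In each row the bound holds, and it overstates the actual error by roughly a factor of 2: the bound tells us how many decimal places we can trust, without telling us the error itself.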
If we could find the exact size of the error, then we could find the exact value of the sum, by just adding the error to the partial sum. The best we can generally do is say that the error is less than some value. > Will it be a foundation of something else that we will learn in > the future? Not in this course. Very likely in Math 128, if you take it. ---------------------------------------------------------------------- Regarding Example 2 on pp.711-712, you say > ... we tested the series using the Test for Divergence. I thought > that rule only applied for series with positive terms. ... Check out the statement of that Test! There's nothing in it about "positive terms". Stewart is generally a very careful writer. If he doesn't say in a theorem that the terms must be positive, then that is not assumed. ---------------------------------------------------------------------- Regarding the ratio test (p.716) you note that it applies an absolute value to the terms of the series, and ask "how can it account for an alternating series?" Some alternating series do converge by the ratio test; e.g., the series with a_n = (-1/2)^n. As you indicate, the absolute value throws away the "alternating" property; so these are among the series that would converge even without that property -- they are absolutely convergent. But if you take an alternating series which is not absolutely convergent, such as the one with a_n = (-1)^n/n, you will find that it falls under the case where "the ratio test is inconclusive". ---------------------------------------------------------------------- Concerning the fact mentioned on p.719, that a rearrangement of a conditionally convergent series can sum to any real number, you ask whether there are ways of rearranging such a series that will not change its sum. Yes. We know that rearranging just finitely many terms has no effect. 
It is also not hard to prove other special cases, such as that interchanging each odd-position term and the term after it won't change the sum. But it is very tricky to describe the most general permutation that will not affect the sum; we won't go into that. (One time when I was teaching Math 104, I thought about that question, and got an exact criterion; but it was something far too complicated to give even to a 104 class.) ---------------------------------------------------------------------- You ask why the convention on p.723 that (x-a)^0 is 1 when x = a is "valid". Any definition which says what we will mean by a symbol is "valid", as long as we follow it consistently, and reason correctly using it. One can ask whether it is consistent with other definitions we have made; but we have no other definition of x^y that applies to the case x=y=0. Assuming we follow and reason about our definitions consistently, the next question is whether a given definition is useful. In situations like this one, where it is clear what definition we want to use for "most" values of the argument, and we need to decide whether it would be useful to extend this definition to some cases where it is not obvious what our choice should be, we should ask which choice, if any, would make various general statements hold for the new cases as they do for the old ones. One condition that would be nice, but that we can't make hold for the function x^y when x = y = 0, is continuity as a function of both x and y: If we take x = 0 and let y -> 0+, the limit is 0, while if we take y = 0 and let x -> 0, the limit is 1. On the other hand, if we define nonnegative integer powers of a number x by starting with x^0 = 1, and recursively letting x^{n+1} = x x^n, then this leads to a nice uniform development of the laws of exponents (for nonnegative integer exponents), which requires no exceptions for x = 0.
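Here is a Python sketch of the recursive definition just described (the helper names are hypothetical), together with the two one-sided behaviors that clash at (0, 0). Python's own ** operator happens to follow the convention 0^0 = 1; that is one language's design choice, shown for comparison, not a proof of anything.

```python
def power(x, n):
    """x**n for integer n >= 0, following x^0 = 1 and x^(n+1) = x * x^n."""
    result = 1            # the base case x^0 = 1, with no exception for x = 0
    for _ in range(n):
        result = x * result
    return result


def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1 x + a_2 x^2 + ... as the single sum of a_n * x^n."""
    return sum(a_n * power(x, n) for n, a_n in enumerate(coeffs))


print(power(0, 0), power(0, 3))        # 1 0
print(poly_eval([5, 2, 7], 0))         # 5: only the constant term survives at x = 0
print(0 ** 0, 0.0 ** 0.0)              # 1 1.0

# The two one-sided behaviors that no single value of 0^0 can reconcile:
print([0.0 ** y for y in (0.1, 0.01, 0.001)])   # x = 0, y -> 0+: [0.0, 0.0, 0.0]
print([x ** 0 for x in (0.1, 0.01, 0.001)])     # y = 0, x -> 0:  [1.0, 1.0, 1.0]
```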
The choice 0^0 = 1 is also convenient in that it allows us to express a polynomial or power series a_0 + a_1 x + a_2 x^2 + ... as Sigma a_n x^n. Note also that in dealing with power series, we never have exponents n that approach 0 via nonzero values (which is the situation in which the discontinuity of 0^y at y = 0 would make trouble), since no nonzero integer is closer to 0 than 1; but we can have x approaching 0 via nonzero values (whenever we look at the behavior of our series near 0). So of the two competing definitions of 0^0 that continuity considerations lead to, the choice 1, that makes x^y continuous in x, is more useful than 0^0 = 0, which would make x^y continuous in y. Depending on the area of mathematics, one may find it useful to define 0^0 as 1, or leave it undefined. In the majority of cases, including the study of power series, the definition 0^0 = 1 is by far the most useful. (Incidentally, this does not contradict the statement "0^0 is an indeterminate form". That is not a statement about the value of 0^0; rather, it is a shorthand way of saying that if two functions f(x) and g(x) both approach 0 as x-->a, then this is not enough information to determine lim_{x->a} f(x)^{g(x)}. The reason for this is the fact mentioned above, that no choice of definition for 0^0 makes the function x^y continuous at (0,0).) ---------------------------------------------------------------------- You ask why, on the top line of p.732, Stewart chooses x=0 to determine the constant C. Because x = 0 is the one point where we can see immediately what the sum of the series is! ---------------------------------------------------------------------- Noting that for the approximation of the integral in Example 8, p.732, Stewart uses the Alternating Series Estimation Theorem, you ask > ... if there is a general way to set the error to be less than > something... or will it defenitely depend on the form of the series? In general, it will depend on the form of the series.
However, when |x| < |R|, we know that the absolute values of the terms of the series will always be less than or equal to those of some geometric series with 0 < r < 1, so one can bound the error by the sum of the corresponding tail of that geometric series, which is easy to compute. But the estimate we get may not be the best estimate there is, and this method won't work at the ends of the interval of convergence, i.e., when |x| = |R|. By the way, where you write "to set the error to be less than something", the easy way to express this is "to bound the error", or "to get a bound on the error". ---------------------------------------------------------------------- Regarding the computations of Example 2, pp.738-739, you write > ... I am confused about how this proves that the function is equal > to the sum of its Taylor series for all x ... It's because of equation 10 on p.738. Note the words after that equation, "for every real number x". If the denominator were, say, 5^n, the limit statement would just be true for |x| < 5, but with a factorial in the denominator, it is true for all real numbers. > ... and not just a set radius of convergence. ... Recall Theorem 3 on p.725: a power series may have a finite radius of convergence, or it may converge for all x. Returning to the first part of your question -- did you read the justification carefully, and see the explanation "from Equation 10" in the last sentence of p.738? ---------------------------------------------------------------------- Regarding the proof of the n=1 case of Taylor's Inequality on p.737, you note > the book says that a < x < a + d... but if |x-d| < d shouldn't > it be a - d < x < a + d ? why are they taking x > a ? The calculations for x > a and for x < a are mirror images of one another. He first does the x > a calculation, then says in the middle of the next page that "similar calculations" handle the case x < a.
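As a numerical illustration of Taylor's Inequality together with equation 10, the following sketch (my own choice of example, not one from the text) takes f(x) = e^x centered at a = 0, evaluates at x = 10, and uses M = e^10 as a bound for |f^(n+1)(t)| = e^t on [-10, 10]. The bound M |x|^(n+1)/(n+1)! is huge for small n but goes to 0 as n grows, precisely because x^n/n! -> 0, and the actual error stays below it. For much larger n the true error drops below floating-point rounding error, so the comparison would stop being meaningful; that is why the table stops at n = 40.

```python
import math

def taylor_poly_exp(x, n):
    """Degree-n Taylor polynomial of e^x centered at a = 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def taylor_bound_exp(x, n, d):
    """Taylor's Inequality bound M |x|^(n+1) / (n+1)! for |x| <= d,
    with M = e^d bounding |f^(n+1)(t)| = e^t on [-d, d]."""
    M = math.exp(d)
    return M * abs(x) ** (n + 1) / math.factorial(n + 1)

x = d = 10.0
for n in (5, 10, 20, 30, 40):
    actual = abs(math.exp(x) - taylor_poly_exp(x, n))
    bound = taylor_bound_exp(x, n, d)
    print(f"n = {n:2d}   actual error = {actual:.3e}   bound = {bound:.3e}")
```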
---------------------------------------------------------------------- You note that at the bottom of p.737 it says that an antiderivative of f'' is f', and that this implies that f'(x)-f'(a)<=M(x-a). You say > I do not really know where this comes from, is there any direct > relationship between the antiderivative that the book mentions in > the past sentence with this equation? Yes! The book says "by Part 2 of the Fundamental Theorem of Calculus". Did you look up (or remember) what Part 2 of the Fundamental Theorem of Calculus says, and ask yourself how it might be applicable to this implication? I am not answering in this way to be difficult, but to point out to you the way you need to read the text in order to do well in a math course. Let me know whether or not you had already checked what Part 2 of the Fundamental Theorem of Calculus says and thought about whether it was relevant, and, in either case, how far you get with the connection and whether you still need help with it. ---------------------------------------------------------------------- Regarding the material on pp.735-739, you ask > What is the difference, or advantage, of writing the sum of the > Taylor series of e^x centered at a=0 versus a=2? If you want to approximate the values of e^x for x near 0, the series centered at a=0 is the useful one; if you want to approximate the values for x near 2, the series centered at a=2 is best. ---------------------------------------------------------------------- You ask why Stewart expands sin x about the particular point pi/3 on p.741. He wants to illustrate the fact that one can use these series, even when the calculation is hairy. The point pi/3 is a natural one, since it represents 60 degrees, and it has a sine and cosine given by simple expressions. If one wants to compute values of a function near a particular point x=a, the power series centered at a is much more useful than series centered at more distant points.
It converges more quickly, since for x near a the terms (x-a)^n become very small. So for applications one needs to be ready to expand about any a. ---------------------------------------------------------------------- You ask whether long division, as in Example 12(b), p.745, is the only way to find the power series for tan x. No. Formula 6 (p.735) is also applicable to tan x. But there is no simple formula for the n-th derivative of tan x, while there are such formulas for sin x and cos x. Using Formula 6, one can compute as many terms as one wants of the series for tan x; but since there isn't a general formula, it's easier to compute them from the series for sin x and cos x, which do have general formulas. ---------------------------------------------------------------------- Regarding multiplication and division of power series, illustrated on p.745, you ask how many terms one should take. Stewart says at the end of the first paragraph that he will "only find the first few terms because the calculations of later terms become tedious ...". Typically, there is no simple formula for the general term, so no "pattern" emerges from further calculation. Any number of terms can be computed, and in real life, one would compute however many one wants for one's purposes. Stewart just computes enough to illustrate the method of computation. ---------------------------------------------------------------------- You ask whether there are common applications of Taylor polynomials in fields other than physics (discussed on pp.753-755). Probably; especially in fields closely related to physics, such as chemistry and astronomy/cosmology. The reason physics is the most obvious place is that it has exact mathematically expressible laws, so one knows when one is replacing these laws by approximations. In something like biology, the mathematical models are approximations anyway, so that rather than going from an exact law to an approximation, one simply goes from one model to another.
Insofar as chemistry and astronomy are based on physics, they too have exact laws. Still, there may be other areas that I just don't know about. [sent later:] I was just looking online for something else (in relation to another student's question of the day), and I ran into a book where Taylor series approximations are applied to a different field -- finance! See http://books.google.com/books?id=o3w4ilXdIGgC&pg=PA219 ---------------------------------------------------------------------- You ask about the origin of the term "homogeneous" in connection with differential equations, as used on p.1111. I discussed this in class last week. First note that one says that a polynomial in variables w, x, y, z, ... is "homogeneous" if all terms with nonzero coefficients have the same degree; e.g., w^2 - xz is homogeneous because all terms have degree 2. Next, one can talk about a polynomial being homogeneous in some subset of the variables. E.g., if one looks at x^2 - wxyz as a polynomial in all four variables, it is not homogeneous; but if one considers only its dependence on x and y, then all terms have degree 2 in those two variables, so it is homogeneous in x and y. Finally, when one considers a linear differential equation P_n(x) y^{(n)} + P_{n-1}(x) y^{(n-1)} +...+ P_1(x)y' + P_0(x) y = G(x), it is most useful to consider it as a polynomial in y, y',..., y^{(n)}, and hence to ask whether it is homogeneous in those variables. All the terms on the left have degree 1 in those variables, while the term on the right has degree 0 in them; so (assuming that the P_m are not all zero, as we must for this to be a differential equation) our equation is homogeneous if and only if the term on the right can be ignored in these considerations, i.e., is zero. ---------------------------------------------------------------------- You ask whether Theorem 3 on p.1118 also works for n-th order linear equations with constant coefficients.
It works for all n-th order linear equations, whether the coefficients are constant or not! (And I would have expected you to have been able to figure this out for yourself -- just try working through the proof that Stewart gives on p.1118 for the general case, and see whether it works!) ---------------------------------------------------------------------- Regarding the development of the method of variation of parameters on pp.1122-1123, you ask why the condition u_1' y_1 + u_2' y_2 = 0 is "valid". I meant to answer this in lecture, but realized, after I left, that I hadn't really. Equation [5] on p. 1123 gives too many ways of representing any function: If a certain function y_p can be represented as y_p(x) = u_1(x) y_1(x) + u_2(x) y_2(x), then for any function v, we could take w_1(x) = u_1(x) + v(x)/y_1(x), w_2(x) = u_2(x) - v(x)/y_2(x), and then we would also have y_p(x) = w_1(x) y_1(x) + w_2(x) y_2(x). Thus w_1(x) and w_2(x) would lead to the same y_p as u_1(x) and u_2(x) do. So we need to put some additional restriction on u_1 and u_2 to make them anywhere near unique, so that we can solve for them. Stewart suggests equation [7] as an additional condition that we might put on them. We can't be sure, a priori, that among all the pairs of functions satisfying [5], there will be a pair that also satisfies [7]. But if there is one, then, just because it satisfies [5], it will lead to a solution to our differential equation. And it is plausible that there should be a solution that also satisfies [7], because two equations in two unknowns tend to be the right number for giving a unique solution. Stewart follows up the consequences of assuming [7], and gets an equation [9] which, together with [7], does indeed uniquely determine u_1' and u_2', allowing one to get u_1 and u_2 by integration. ---------------------------------------------------------------------- You ask about buoyant force in the situation of Figure 3, p.1126. Nice point!
When Stewart talks about the "natural length" of the spring on p. 1125, he is taking account of the weight of the object attached at the end of the spring, if that weight is hanging vertically. (So the same spring would have different "natural lengths", i.e., equilibrium positions, in the situations of Figure 1 and Figure 2.) In the situation of Figure 3, the buoyant force would be a constant, which would cancel part of the weight of the mass, and so lead to still a different "natural length". But it wouldn't affect the spring constant or the damping force; so taking x to be the number of units the spring is stretched from that new natural length, the discussion at the bottom of p. 1126 is still valid. ---------------------------------------------------------------------- Regarding the discussion of an underdamped vibrating spring on p.1127, you note that > the equation implies it will keep vibrating forever, just at > lower and lower amplitudes. Is this the case, or does something > physically stop the vibration at some point? In real life springs > appear to stop, ... Well, since the decay of the vibration is exponential, it would rather quickly decrease below the point where it could be observed. E.g., if the amplitude decreased by a factor of 10 in one minute, then in 10 minutes it would decrease by a factor of 10,000,000,000. Very soon it would be less than the motion added by random collisions of air molecules, etc. That said, it may very well be that before that point, various nonlinearities in the accurate description of the mechanics of very small motions would become more significant than the terms of the equation we have been using. For instance, though friction is described as proportional to the velocity, I suspect that for many common interfaces between solids, there is a small constant term, so that if one imposes a force smaller than what is needed to overcome that constant term, the object does not slide.
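To put numbers on how quickly an exponentially damped vibration fades, here is a sketch using the standard underdamped solution of m x'' + c x' + k x = 0, namely x(t) = e^{-(c/2m)t}(c_1 cos wt + c_2 sin wt) with w = sqrt(4mk - c^2)/(2m). The values of m, k and the initial amplitude are hypothetical. The damping constant c is chosen so that the amplitude shrinks by a factor of 10 each minute, as in the example above, and after 10 minutes the envelope is down by a factor of 10^10. Of course, this linear model says nothing about the nonlinear effects just mentioned.

```python
import math

def underdamped_position(t, m, c, k, c1, c2):
    """x(t) = e^{-(c/2m) t} (c1 cos(w t) + c2 sin(w t)), w = sqrt(4mk - c^2) / (2m),
    the general solution of m x'' + c x' + k x = 0 when c^2 < 4mk."""
    if c * c >= 4 * m * k:
        raise ValueError("parameters are not underdamped")
    w = math.sqrt(4 * m * k - c * c) / (2 * m)
    return math.exp(-c * t / (2 * m)) * (c1 * math.cos(w * t) + c2 * math.sin(w * t))

def envelope(t, m, c, amplitude):
    """The decaying amplitude, amplitude * e^{-(c/2m) t}, which bounds |x(t)|
    when amplitude = sqrt(c1^2 + c2^2)."""
    return amplitude * math.exp(-c * t / (2 * m))

m, k = 1.0, 50.0                # hypothetical mass (kg) and spring constant (N/m)
c = 2 * m * math.log(10) / 60   # damping chosen so the envelope drops 10x per minute
A = 0.1                         # hypothetical c1 = A (m), with c2 = 0

for minutes in (0, 1, 2, 5, 10):
    t = 60.0 * minutes
    print(minutes, envelope(t, m, c, A), underdamped_position(t, m, c, k, A, 0.0))
```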
---------------------------------------------------------------------- Regarding Example 1 on p.1133, you ask > Would it be possible to use power series to solve the equation > $y''+y=0$ without reindexing Equation 3 so that it looks like > Equation 4? Does Equation 3 need to be reindexed to compare > it to Equation 2? The essential thing is to remember that if two power series are equal, then for each n, the coefficients of x^n in those series must be the same. So in Example 1, for each n, you need to find the coefficient of x^n in y'' + y, and set it equal to 0. If you are comfortable eyeballing equation 3 and seeing that the coefficient of x^n in y'' is (n+2)(n+1)c_{n+2}, and concluding that the sum of that with the coefficient of x^n in equation 1, namely c_n, is 0, that's fine. The method of re-indexing is simply a convenient reliable way of doing this on paper. ---------------------------------------------------------------------- You ask about the blue equation in the left margin of p.1135. The two sides of that equation differ in that the right-hand side includes a summand with n=0, while the left-hand side leaves that out. But in the summand with n=0, the factor n is 0, so that summand is 0; so leaving it out makes no difference. ---------------------------------------------------------------------- Regarding the technique of series solutions to differential equations developed on pp.1133-1137, you ask > ... whether there would ever be a different assumption made > than y= summation of c*x^n? ... Yes. For instance, if the coefficients of the equation involved x^{1/2}, it would be reasonable to expand as a series in the powers x^{n/2}. In some situations, if one had reason to believe that the solution would have a pole at x = 0, one might use a series like c_{-1} x^{-1} + c_0 + c_1 x + c_2 x^2 + ... . And, for a much more trivial variant of the Sigma c_n x^n expansion, in some situations one might want to expand in powers of x-a for some nonzero a.
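Returning to Example 1, the coefficient-matching step can be carried out mechanically. The sketch below (the function name is mine) uses exact fractions and imposes, for each n, (n+2)(n+1) c_{n+2} + c_n = 0; that is, it sets the coefficient of x^n in y'' + y equal to 0. With c_0 = 1, c_1 = 0 it reproduces the Maclaurin coefficients of cos x, and with c_0 = 0, c_1 = 1 those of sin x.

```python
from fractions import Fraction

def series_coefficients(c0, c1, num_terms):
    """First num_terms coefficients of a power-series solution sum c_n x^n of
    y'' + y = 0. Setting the coefficient of x^n in y'' + y equal to 0 gives
    (n+2)(n+1) c_{n+2} + c_n = 0, i.e. c_{n+2} = -c_n / ((n+2)(n+1))."""
    c = [Fraction(c0), Fraction(c1)]
    for n in range(num_terms - 2):
        c.append(-c[n] / ((n + 2) * (n + 1)))
    return c[:num_terms]

cos_coeffs = series_coefficients(1, 0, 10)   # c_0 = 1, c_1 = 0
sin_coeffs = series_coefficients(0, 1, 10)   # c_0 = 0, c_1 = 1
print([str(q) for q in cos_coeffs])
# ['1', '0', '-1/2', '0', '1/24', '0', '-1/720', '0', '1/40320', '0']
print([str(q) for q in sin_coeffs])
# ['0', '1', '0', '-1/6', '0', '1/120', '0', '-1/5040', '0', '1/362880']
```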
---------------------------------------------------------------------- You ask, regarding the proof of Law 4 on p.A39, "why is |g(x) - M| < epsilon / 2(1+L) ?" I hope that what I said in lecture answered this for you. The inequality that you refer to is not a fact that we deduce. Rather, the definition of "lim_{x -> a} g(x) = M" tells us that _we_can_find_a_delta_ such that, whenever 0 < |x - a| < delta, the above inequality holds. In fact, it tells us that we can find a delta which, in the same way, ensures any degree of closeness between g(x) and M that we want. We decide that we want exactly that degree of closeness. Why do we choose that degree of closeness? That is what I was sketching when class ended. The idea is to think strategically, "What degrees of closeness of f(x) to L and of g(x) to M can together put f(x) g(x) within epsilon distance from LM?" That strategic thinking led us to choose epsilon / (2(1+|L|)) as the degree of closeness of g(x) to M that we wanted. ---------------------------------------------------------------------- You ask how, near the top of p.A40, $|L| (\epsilon / (2 (1 + |L|)))$ "is reduced to" $\epsilon / 2$. Stewart does not assert that they are equal: note that the formulas you are comparing are not connected by "=" but by "<". The argument is that since $2(1+|L|) > 2|L|$, when they occur in the denominators (with a positive numerator), we have $numerator / (2(1+|L|)) < numerator / (2|L|)$. After noting this, and the fact that the numerator contains $|L|$, one can, as you suggested, cancel the $|L|$ in the numerator with the $|L|$ in the denominator, getting $\epsilon/2$. (If $L = 0$, the left-hand side is simply 0, which is certainly $< \epsilon/2$.) ---------------------------------------------------------------------- Regarding the proof of Limit Law 5, p.A40, you ask "Why is |g(x)-M| < |M|/2 ?" I can't tell from your question whether you had read the first part of the sentence, saying "... there is a number \delta ... such that whenever 0 < |x-a| < \delta we have".
In view of that phrase, what Stewart is saying is not that "|g(x)-M| < |M|/2" is something that automatically holds, but that it is something we can cause to hold, by taking x close enough to a. Given this, your question can be broken in two: "Why do we want to cause |g(x)-M| < |M|/2 to hold?" and "How do we know that we can cause |g(x)-M| < |M|/2 to hold?" Let me know whether you need help with either of these questions. (Hints: The answer to the second question is based on the first part of the sentence. The answer to the first question lies in the remainder of the proof -- what we subsequently use the inequality |g(x)-M| < |M|/2 for.) ---------------------------------------------------------------------- You ask about the step |g(x) - M| < |M|/2 in the proof of Limit Law 5, p.A40. The first thing to understand is that we are not proving that this inequality holds; we are noting that we can _cause_ it to hold, by restricting x to be sufficiently close to a. I hope it's clear to you that this is what Stewart is saying in the sentence leading up to that inequality. Assuming you understand this, your question is "Why do we want to make |g(x) - M| < |M|/2?" The answer is that we want to make the fraction | (M-g(x)) / Mg(x) | small (namely, < epsilon). Now a fraction will be "small" if and only if the numerator is much "smaller than" the denominator. (E.g., a fraction a/b will have absolute value < 1/10 if and only if the numerator is less than a tenth the size of the denominator in absolute value.) In the above fraction, M is fixed, but the numerator and denominator both depend on g(x), so we want to see whether for x close enough to a, we can make g(x) assume values making the numerator "much smaller than" the denominator. The way we will do this is to make the numerator "very small" without letting the denominator get "very small".
So first we keep the denominator from getting arbitrarily small, by noting that g(x) is approaching the nonzero value M; so if we make g(x) close enough to M, it will have absolute value near |M|. Specifically, we can get that absolute value to be at least |M|/2 by making |g(x) - M| < |M|/2 (the inequality you asked about). Then, once we have the fixed lower bound |M|/2 on |g(x)|, and hence the lower bound M^2/2 on the absolute value of the denominator Mg(x) of our fraction, we can figure out how small we need to make the numerator to get the whole fraction to have absolute value < epsilon. That's what Stewart achieves on the next page, in the sentence starting with "Also". (Incidentally, we didn't _have_to_ make |g(x)| at least |M|/2 in the above proof. We could have chosen any positive lower bound less than |M|; e.g., we could have made |g(x)| at least 3|M|/7 by choosing |g(x) - M| < 4|M|/7. But |M|/2 was just the simplest choice of a number strictly between 0 and |M|.) ---------------------------------------------------------------------- You ask, in connection with the theorem that the inverse of a one-to-one continuous function on an open interval is continuous (p.A42), whether a function on an open interval can have a continuous inverse without being continuous and one-to-one. A function has to be one-to-one for its inverse to be well-defined. On the other hand, if f is one-to-one but not continuous, it can still have a continuous inverse, if we understand the domain of the inverse to be the range of f, even if that is not an interval. For instance, let f, with domain (0,2), be defined to have f(x) = x if x lies in (0,1], and f(x) = x+1 if x lies in (1,2). Then the range of f is the union of the intervals (0,1] and (2,3). On this range, the function f^{-1} takes a point y of (0,1] to y, but takes a point y of (2,3) to y-1. You can check that this is continuous at each point of its domain. Incidentally, in the above example, f was an increasing function.
But if we change our definition by making f(x) = 4-x (instead of x+1) for x in (1,2), we find that the function is still one-to-one, and that its inverse is still continuous on the range of f, and that range is still the union of (0,1] and (2,3); but f is no longer everywhere-increasing or everywhere-decreasing. ---------------------------------------------------------------------- You ask about the formula |g(x)-b| < delta_1 in the last display of the proof of Theorem 8 on p.A43, and in particular, why it has delta_1 where you would expect an epsilon. There is nothing sacred about the letters "epsilon" and "delta". It is simply convenient to write delta for our bound on the deviation of the input of a function whose limit behavior we are studying, and epsilon for our bound on the deviation of the output. That convention helps us keep track of what we are doing. But in this proof, the input of f is the output of g, so whichever name we give to the number needed at that point, it can't fit the convention with regard to both f and g. In any case, I hope you read the whole sentence, and not just the formula, so that you saw that it was not asserting that "|g(x)-b| < delta_1" in general, but only that there exists a delta such that for all x within delta distance of a, "|g(x)-b| < delta_1" holds. That there exists such a delta follows from the definition of the statement that lim_{x->a} g(x) = b. Namely, since that statement begins "for every epsilon > 0", we can, in particular, apply that statement with the delta_1 we have found in the role of that epsilon. ---------------------------------------------------------------------- You ask two questions about the proof of Theorem 8 on p.A43. Let's start with the second: > .. And why is it necessary to introduce the y variable into the proof? We are looking at the function f(g(x)); so the output of g gets fed into f as its input.
In discussing this situation, we need to use the fact that f is continuous at b, which is a statement about the relationship between inputs and outputs of f in general. We could call the input of f "x" in that discussion, but this would be confusing, because when we use g(x) as the input of f, this is different from the "x" that is the input of g. So it is better to use a different letter for any values that occur as inputs of f, in particular, those arising as outputs of g. Now to your first question: > Why is it that if 0 < |x-a| <\delta then |g(x)-b| <\delta_1 ? Stewart is not saying that this is true for any old \delta; if you look at the line before the formula you are asking about, you see that Stewart says "there exists \delta > 0 such that". If, after thinking about the point, you still have a question about it, then ask again, making clear that you understand what it is Stewart is asserting. (And if you did understand that, and had made it clear in your question, then I could have addressed it to begin with. One way or the other, there's something to be learned from this -- either that in looking at Stewart's formulas, you need to read them in the context of the sentence containing them, or that in formulating your question of the day, you need to make clear how much you do understand before getting to the point that you don't understand.) ---------------------------------------------------------------------- Regarding the proof that tan theta > theta on pp.A43-A44, you ask > After the step |PQ|<|RT|<|RS|, how does Stewart arrive at > L_n<|AD|=tan(theta)? The sum of the terms of the form "|PQ|" is L_n, the sum of the terms of the form "|RT|" is AD, and |AD| = tan theta. 
In the first assertion, you should understand that Stewart's picture just shows the case where n=3, and he has only labeled one of the secants to the circle "PQ"; but in the situation he is talking about, n can be any positive integer, and "PQ" represents any one of the n secants (which include, in his picture, the two labeled AP and QR). So we see that if we add them all up, we get the length L_n of the inscribed "polygon" (string of segments -- in his picture, APQB.) Similarly, in the situation he is talking about, "RT" represents any one of the n pieces into which the side AD is divided (including, in his picture, those labeled AR and SD); so when we add these up we get AD. I hope the equation |AD| = tan theta is clear. ---------------------------------------------------------------------- You both ask about Stewart's statement that to prove Cauchy's Mean Value Theorem we "change" the function h(x) given at the top of p. 283 to the h(x) given on p.A45. What he means is that we take the proof of the ordinary Mean Value Theorem on p. 283, and where that theorem uses the function h(x) defined at the top of that page, we instead use the function h(x) defined on p. A45. He leaves it to you to verify that the same arguments used on p. 283, when applied to the new function, will give the new version of the Mean Value Theorem. Probably "change" was not the best choice of words; it might have been better to say "use ... instead of ...". (Note that the h(x) on p. 283 is a special case of the one on p. A45: it is the case where g(x) = x. Similarly, the old Mean Value Theorem is the case of Cauchy's version where g(x) = x.) ---------------------------------------------------------------------- Your question concerns the relation between x and y in Stewart's proof of L'Hospital's rule on p.A46. 
In the calculation, x represents any value "near enough to a" on the given side (in the notation I used in class, any t\in (a,a+\delta)), while y represents a value such that f(x)/g(x) = f'(y)/g'(y), which by Cauchy's Mean Value Theorem can be found in (a,x). So as he notes, as x approaches a, the corresponding value of y also approaches a. In his displayed calculation, he deduces that since for each x and y so chosen, f(x)/g(x) = f'(y)/g'(y), it follows that lim_{x -> a} f(x)/g(x) = lim_{y -> a} f'(y)/g'(y). Logically, this could also be written lim_{x -> a} f(x)/g(x) = lim_{x -> a} f'(x)/g'(x); but to remind us where this comes from, he uses the letters x and y as in the preceding discussion. The way I did it in class (recall that this was the easy argument I gave, not the hard one!) using epsilons and deltas avoids having to say things like "as x approaches a, y approaches a"; we can simply say "for every epsilon, choose delta such that ...", and show explicitly why |f(x)/g(x) - L| < epsilon for x in an appropriate range. ---------------------------------------------------------------------- You ask whether a delta-epsilon proof is required for L'Hospital's Rule (p.A46). I think that the reason Stewart does not give one is that he has written his book so that instructors who consider the delta-epsilon definition of a limit "too hard" can have their students skip section 2.4. He then needs to make as much as possible of the rest of the book independent of that section; in particular, he words the proof of L'Hospital's rule so that it does not refer to epsilons and deltas. But this requires him to use the somewhat handwavy wording "if we let x --> a^+, then y --> a^+", and to argue by the displayed equation that follows this, in which the relation between "x" and "y" implicitly comes from the paragraph that precedes. A precise delta-epsilon argument gets rid of the handwaviness.
On the other hand, most any mathematician who has worked with limits would know how to translate a proof such as Stewart gives here into a delta-epsilon proof. So one could say that for an experienced mathematician, the difference between a delta-epsilon proof and the proof Stewart gives is not that important. ---------------------------------------------------------------------- Regarding the proof of L'Hospital's Rule on p.A46, you ask > In this proof, is there any reason to define F(x) and G(x) > other than to establish the fact that F(x) = G(x) = 0 and > simplify the result of Cauchy's Mean Value Theorem? It doesn't merely "simplify the result of" Cauchy's Mean Value Theorem -- it makes it possible to apply that theorem to our functions on the interval [a,x]. One might, as I did in lecture, cut corners and say "redefine f and g so that they are both 0 at a", rather than introducing new symbols F and G. It all depends on whether one feels it more important to emphasize the fact that the new functions are "except for one little detail (being undefined or being 0 at a)" the same as the old ones, which led to my choice of using the same symbol, or whether one is more worried that using the same symbol for two (slightly) different things might confuse the reader, which led to Stewart's choice to use different symbols. ---------------------------------------------------------------------- You ask how, in the middle of p.A46, the limit of f'(y)/g'(y) can be the same as the limit of f'(x)/g'(x), when y is not the same as x. The limit is not something defined in terms of a single value of x or y, but something whose definition is based on considering all x or y in a certain range, and checking whether the delta-epsilon condition defining that limit holds as we select our x or y in that range. Stewart's proof is a little handwavy, in that he first considers any particular x, and chooses a y in terms of it; then he talks about limits as these x and y vary (approach a).
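A concrete case may make the relation between x and y less mysterious. Take f(t) = sin t, g(t) = t and a = 0 (my own choice, not an example from Stewart). Cauchy's Mean Value Theorem on [0, x] then gives a y in (0, x) with sin(x)/x = cos(y), and for 0 < x < pi/2 this y is unique and equals arccos(sin(x)/x). The sketch tabulates y as x shrinks: y is squeezed toward 0 along with x (in fact y/x approaches 1/sqrt(3), about 0.577), and f(x)/g(x) = f'(y)/g'(y) approaches 1, the limit L'Hospital's Rule predicts.

```python
import math

def cauchy_point(x):
    """For f(t) = sin t, g(t) = t and a = 0, return the y in (0, x) given by
    Cauchy's Mean Value Theorem, f(x)/g(x) = f'(y)/g'(y), i.e. sin(x)/x = cos(y).
    For 0 < x < pi/2, cos is one-to-one on [0, x], so y = arccos(sin(x)/x)."""
    return math.acos(math.sin(x) / x)

for x in (1.0, 0.5, 0.1, 0.01, 0.001):
    y = cauchy_point(x)
    print(f"x = {x:<6}  y = {y:.6f}  y/x = {y / x:.6f}  f(x)/g(x) = {math.sin(x) / x:.9f}")
```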
The version of the proof that I gave in class, using epsilon and delta (the easy proof, for f, g -> 0, not the hard proof for f, g -> infinity), translated Stewart's argument into a precise form that I hope you won't have trouble with. If you do, ask at office hours (or by e-mail if you think you have a question that can be stated and answered briefly). ---------------------------------------------------------------------- Regarding the proof of the first theorem stated on p.A47, you ask what it means for S to be nonempty. The empty set is the set with no elements; so a set is nonempty if it contains at least one element. In particular, S is nonempty because it contains b. You then ask, > If we can only guarantee that x=b is contained in the set, does > this one number satisfy the conditions of the Completeness Axiom? For S to satisfy the hypothesis of the Completeness Axiom, we have to know that it is nonempty and bounded above. If we knew that it consisted only of the element b, then, yes, that would make the axiom applicable; but if all we know is that it contains b, that isn't the same as knowing that it consists only of b; so we need more to show that it has a least upper bound; namely, we need to know it is bounded above. Stewart establishes that on lines 3 and 4 of the proof. ---------------------------------------------------------------------- Concerning the proof I gave of the theorem on p.A47 describing intervals of convergence, you asked whether a value |x| that was bigger than one upper bound might be another upper bound, and still a member of S. You may be confused about what "upper bound" means. For instance, if S is the union of the two intervals [1,2] and [3,4], you might be thinking of 2 and 4 as "upper bounds" of S. But that is not what the term means; it means a number that is \geq all members of S. (See last paragraph on p. 682.) So 2 is not an upper bound of the set just described; but 4 is, and so are 4.2, 500, etc.
As in that example, any set S that is bounded above has many upper bounds. If it is nonempty, it will have a least upper bound M; the real numbers > M will be its other upper bounds. But if a number x is > M, it can't belong to S, since M being an upper bound of S makes M \geq all members of S, whereas M is not \geq x. (If you did think 2 and 4 were what were called "upper bounds" of the union of [1,2] and [3,4], then you might wonder, "What can one call them?" In fact, the set \{1,2,3,4\} is what is called the "boundary" of the above set; so 2 and 4 are two of the points of that boundary. This is not a concept we will introduce in H1B, but you are likely to see it in some upper division and/or graduate courses: 104, 140, 202A, ...). ---------------------------------------------------------------------- You ask whether one can correctly prove the third law of logarithms on p.A51 by differentiating ln(x^r), noting that it has the same derivative as r ln(x), and verifying that the constant by which ln(x^r) and r ln(x) differ must be 0. Well, differentiating ln(x^r) requires knowing the derivative of x^r. Looking through the early sections of Stewart, I see that he proves the formula for the derivative of x^n where n is a positive integer directly (p. 174), and then, on the next page, states the rule for any real exponent n, saying he will prove it in section 3.6. In that section, he proves it using properties of the logarithm, including the 3rd law of logarithms. (See the first display in the proof on p. 218. Though he writes "n" for the exponent, he has said it represents any real number.) So the proof that you describe could not be used in the context Stewart has set up in this section, where we have put aside everything we had derived by methods that assumed real exponents, etc.
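For reference, here is the argument under discussion written out in full (a sketch; the name h is introduced only for this purpose). The first term of h'(x) uses the power rule d/dx x^r = r x^{r-1}, which is exactly the ingredient whose availability is in question:

```latex
\begin{align*}
h(x)  &= \ln(x^r) - r\ln x, \qquad x > 0,\\
h'(x) &= \frac{1}{x^r}\cdot \underbrace{r x^{r-1}}_{\text{power rule}} - \frac{r}{x}
       = \frac{r}{x} - \frac{r}{x} = 0,\\
h(1)  &= \ln(1^r) - r\ln 1 = 0 - 0 = 0.
\end{align*}
```

Since h' = 0 on the interval (0, infinity), h is constant there, and h(1) = 0 forces ln(x^r) = r ln x for all x > 0. Every step other than the marked one uses only facts established in this appendix.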
However, the formula for differentiating x^r when r is rational _could_have_ been derived (with more work) using the formula for x^n where n is an integer; and if such a derivation had been given, your proof would be correct. (Needless to say, I won't put on an exam any question where you would have to sort out what was proven how in Math 1A to know whether your proof of something is valid!) [SENT A FEW HOURS LATER:] I see that your proof of the third law of logarithms, which I criticized, follows the hint in Stewart, Exercise 5, p. A57! Unfortunately, Stewart doesn't say in this exercise whether, in the "third law of logarithms", x^r is meant to be defined as an integer root of an integer power of x, which is necessarily what is meant in the statement of that law on p. A51, or by the more general equation 13 on p. A54. If the "integer root of an integer power" is meant, then, as I said, one can't use the law d/dx x^r = r x^{r-1}, because that was based on Stewart's earlier treatment of exponentiation, which he says at the end of the second paragraph on p. A50 we are not going to use here. On the other hand, if the equation 13 definition is meant, then differentiation is not needed; the result comes easily out of the definition (if x^r means e^{r ln x}, then ln(x^r) = r ln x at once, since ln is the inverse of exp). So either way, I think his hint is not appropriate. I guess I should e-mail the class about this, as a correction to the homework. ---------------------------------------------------------------------- You ask whether the material in Appendix G (pp.A50-A56) would need to be proved through epsilon-delta methods on a test. "Epsilon-delta" proofs occur mostly in setting up the foundations of calculus. Those foundations include results which have been proved using "epsilon-delta" methods, and which then can be used to prove other results. So the answer is no: what Stewart does here is essentially correct, though, as I pointed out in lecture, there are a couple of places that need to be filled in.
One is about differentiability of inverse functions, where the proof that I sketched has a key step, changing variables in a limit statement, that would need epsilon-deltas for a full proof. The other is the proof that, since ln 2^n approaches infinity as n does, ln x approaches infinity as x does; for this I showed an "M, N" proof, which is the equivalent of an "epsilon-delta" proof when x and f(x) are approaching infinity instead of real constants. ---------------------------------------------------------------------- You ask about graphing functions on the complex plane (or the Argand plane, as it is named on p.A57). Hard to do. If f is a function C --> C, then using a 3-dimensional graph, one can graph the real part of f(x+iy) as a function of x and y; or the imaginary part, or the absolute value, etc.; but it would take 4 dimensions to graph the real and imaginary parts together. I think some of the plaster models in the cabinet in the Common Room, 1015 Evans, represent graphs of the real parts of certain complex functions. (Such models must have been popular around 100 years ago. Many math departments have them, but hardly anyone looks at them nowadays.) Something else one can do is restrict the function to a line in the complex plane, say the real axis or the imaginary axis, and so get a map R --> C, which can also be graphed in 3 dimensions, this time as a curve. Anyway, by combining the intuition that graphs give us about how functions R --> R behave with the theorems proved in Math 185 about how complex functions behave, one can develop an intuition for such functions, even though one can't graph them entirely satisfactorily. ---------------------------------------------------------------------- You ask whether Stewart's statement, regarding Figure 3 on p.A58, that |z| = sqrt(a^2 + b^2), should be sqrt(a^2 - b^2), since (bi)^2 = -b^2. The statement is correct as he gives it.
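A quick Python check that the plus-sign formula is the useful one. This is a sketch: modulus is a hypothetical helper written for this illustration, while abs and complex.conjugate are built into Python. It shows that the definition agrees with Python's own absolute value, that z times its conjugate equals a^2 + b^2, and that the minus-sign version can even be negative.

```python
import math

def modulus(a, b):
    """|a + bi| as defined in the text: sqrt(a^2 + b^2)."""
    return math.sqrt(a**2 + b**2)

z = complex(3, 4)
# The definition agrees with Python's built-in abs() for complex numbers.
print(modulus(3, 4), abs(z))   # 5.0 5.0
# A property that justifies the definition: z times its conjugate is a^2 + b^2.
print(z * z.conjugate())       # (25+0j)
# The proposed a^2 - b^2 fails already for z = i (a = 0, b = 1): it is negative,
# so it has no real square root and cannot be a length.
print(0**2 - 1**2)             # -1
```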
The formula is a standard definition, so the only challenge one can raise is "Does that function of a+bi have useful properties that justify making such a definition?" It does, as is shown in the next few calculations. Something that might have led you to think b^2 should be (bi)^2 is the thought that the vertical line labeled "b" in Figure 3 should be labeled "bi". But the labels Stewart puts on line segments in that figure give their geometric lengths, not the complex number representing the difference between their endpoints. (So the red arrow is labeled with a real number, not with the complex number a+bi.) Taking this into account, I hope the figure now makes sense. ---------------------------------------------------------------------- Regarding the statement on p.A59 that the argument of a complex number is not unique, you ask > ... Is it just refering to the fact that an angle "a" is equal > to "a + 2pi*n" where n is any integer? Essentially, yes. For some purposes, mathematicians do create a kind of entity in which a and a + 2 pi n are actually equal. But for our present purposes, we have to regard a as a number, and then a + 2 pi n is not the same number as a; it's a different number, though one at which the sine, cosine, etc. have the same values that they have at a. A consequence is that "arg" is not a well-defined function; the symbol "arg(z)" means "any one of the infinitely many real numbers theta that make the boxed equation on p.A59 true". This nonuniqueness has important consequences for taking N-th roots. When we divide the different values theta + 2 pi n of "arg(z)" by N, the results differ by multiples of 2 pi/N rather than of 2 pi, so they no longer all lead to the same complex number. Rather, as noted on p. A62, every nonzero complex number has N distinct N-th roots. ---------------------------------------------------------------------- You ask why the \theta occurring in the polar form of a complex number (p.A59) is called its "argument". I don't know.
In the online Oxford English Dictionary, one of the definitions of "argument" is: Astr. and Math. The angle, arc, or other mathematical quantity, from which another required quantity may be deduced, or on which its calculation depends. Their examples show it being used by Chaucer, I think with reference to astronomical calculations. My guess is that this sense split into two: When f is a function, then in the expression f(x) we now call x the "argument of f"; so it is a quantity on which another quantity depends. On the other hand, its use referring to an important angle must have led to the specialized sense in the study of complex numbers that Stewart gives here. (Note that in primitive astronomy, everything that one could measure was an angle in the sky, so the two senses were not that different.) As to what such quantities or angles have to do with the other meanings of the word "argument", the OED gives not a hint. ---------------------------------------------------------------------- You ask how the use of complex numbers (pp.A57-A63) in describing real-world wave motion can be justified. I'm not sure what sort of use of complex numbers you are referring to, but I'll mention several. (a) As a consequence of the equation e^{it} = cos t + i sin t, one gets cos t = (e^{it} + e^{-it})/2 and sin t = (e^{it} - e^{-it})/(2i). Hence every linear combination a sin t + b cos t (where a and b are real numbers) can be written c e^{it} + d e^{-it} where c and d are conjugate complex numbers (namely c = (b - ai)/2 and d = (b + ai)/2). For computations, this expression can be much more convenient than the original expression, since exponentials behave more simply than trig functions with respect to multiplication, differentiation, etc. As long as we specify that c and d are conjugate, the expression c e^{it} + d e^{-it} will be real-valued, and so can describe a wave function in the real world. (b) The wave equation is linear, so any linear combination of functions that satisfy it satisfies it again.
A consequence is that the condition that c and d be conjugate is irrelevant to the mathematical study of the wave equation, and so can be dropped. We then find that the simplest functions in terms of which to write the general function as a linear combination are e^{it} and e^{-it}. So one uses these; and since they behave alike (each is the conjugate of the other), one may, for simplicity, use just one. It doesn't represent a "real-world" wave function, but it's easy to work with, and one can get real-world wave functions by taking linear combinations of it and its conjugate. (c) I believe that in quantum mechanics, one posits wave functions that are genuinely complex-valued. Since the relation between quantum mechanics and the world we know is mysterious anyway, and I have only a layman's knowledge of the subject, I won't try to guess whether complex numbers really do occur in nature here, or whether there is a range of possible mathematical formulations of the same not-directly-observable phenomena, in which case the founders of quantum mechanics may have chosen the mathematically simplest, in the absence of any other criterion for preferring one above the others ... or what! ----------------------------------------------------------------------