ANSWERS TO QUESTIONS ASKED BY STUDENTS in Math H1B, Fall 2010, Fall 2012, and Fall 2014, taught from the UC Berkeley custom text based on James Stewart's "Calculus, Early Transcendentals". (For the first two years we used a custom text based on Stewart's 6th edition, which had pages numbered as in that edition; but I converted the pagination to that of the 7th edition after the semester. When we get a copy of the 8th edition, I hope to update the pagination to that. Though we covered the material in nonconsecutive order, I have rearranged the answers below in the order of pages referred to, with Stewart's appendices at the end. Incidentally, the page that I consider a question to refer to, which determines its location in this file, is denoted "p.N" where N is the page-number, while any other pages referred to in the answer are denoted "p. N", with a space before the N; this can be of help in pattern-searching for material regarding a particular topic.) ====================================================================== You ask about Zeno's paradox (p.6) which argues that a man can never walk across a room, because he has to walk half way, then half of the remaining half, etc., and this involves infinitely many successive intervals. I don't know who first came up with a satisfactory explanation; but here is my way of looking at it: If the room is, say, one unit in width, and the man is going at the speed of 1 unit per minute, then it is true that the finite distance can be divided, as described, into infinitely many intervals; but the time of 1 minute can be divided in exactly the same way into infinitely many time intervals: half a minute to cross half the room, a quarter of a minute to cross half the remaining distance, etc.; and just as the infinitely many fractional distances add up to 1 unit of distance, so the infinitely many fractions of a minute add up to one minute; so he succeeds in crossing the room in the finite time of one minute. 
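To see numerically that the infinitely many fractions of a minute add up to one minute, one can compute partial sums of the series 1/2 + 1/4 + 1/8 + ... (a quick check of my own, not part of Zeno's argument):

```python
# Partial sums of Zeno's series 1/2 + 1/4 + 1/8 + ...: the fraction
# of the room (and of the minute) used up after n stages.  The sums
# approach 1; the shortfall after n stages is exactly 1/2^n.
def zeno_partial_sum(n):
    return sum(0.5 ** k for k in range(1, n + 1))

for n in (1, 2, 10, 30):
    print(n, zeno_partial_sum(n))
```

No finite stage reaches 1, but the sums come as close to 1 as one likes; that is exactly the sense in which the infinite series of times "adds up to" one minute.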
However, if one makes a tiny modification in the situation, then the man really is blocked from reaching the other side. Suppose we assume that half way across the room, he pauses for a tenth of a second to think "I've gotten half-way across"; and suppose that half-way across the remaining half, he again pauses for a tenth of a second to think "I've gotten half-way across that part", and so on, pausing for a tenth of a second at each of the points Zeno referred to. Even though the first and second and third pauses would add negligible time to the time it took for him to traverse his last segment of distance, by the time he got to his tenth pause, it would take a time comparable to the time he took crossing the tenth interval, and after that, the pauses would greatly outweigh the time spent walking, and with infinitely many tenth-of-a-second pauses, he really would never get across the room. (Though it would take a microscope to see that he hadn't made it.) ---------------------------------------------------------------------- You ask how the epsilon-delta definition of limit (p.110) is different from the definition that tells that a limit is when a function approaches a certain value. "Approaches" is an everyday word which by itself does not have an exact mathematical meaning. But the epsilon-delta definition is very precise; it is something that one can prove is true or prove is not true for a given f, a and L by mathematical reasoning. ---------------------------------------------------------------------- You ask how small the error epsilon in the definition of limit (p.110) has to be. The definition gives a condition that has to hold for _every_ positive real number epsilon. To give you an intuitive understanding of the concept, the author illustrates the calculations by looking at a few specific values of epsilon. 
But handling some specific values does not prove what you want; it just gives you a feel for the problem, and insight into how delta should relate to epsilon. When you are past the "exploration" stage, and ready to write a proof, you don't prove things just for certain epsilon; you must give a proof that works for every positive real number epsilon. ---------------------------------------------------------------------- You ask about applying the criterion for a function f to have limit as x->a equal to L (p.110) to the case where every interval (a-delta, a+delta) has points (other than possibly a) where f is undefined. For simplicity, in his development Stewart is assuming that the set on which f is defined includes a neighborhood of a (possibly with a itself removed). One can consider functions which do not have this property, and then one modifies the definition of limit, to say that for every epsilon there exists a delta such that for all points x of (a-delta,a+delta) (other than a) at which f is defined, one has |f(x) - L| < epsilon. Functions whose domains of definition contain "lots of gaps" are not too important in calculus, so Stewart does not give a definition that covers them. If you go on to Math 104, then you will see functions with very general domains. (If what you had in mind was functions like the square root of x, which is defined on one side of 0 but not the other, Stewart handles these using the concept of "one-sided limit", which is in Wednesday's reading.) ---------------------------------------------------------------------- You ask about Example 2 on p.112. I hope that I made clear in class that this is not "circular reasoning" but just two versions of the same reasoning: a "scratchwork" version which starts with what we want and ends by saying what \delta to use, and a precise version, which starts with the \delta found in the first part, and proves that it indeed works. 
If you feel that the first part has already shown that it works, I won't disagree; I'll simply say that it's best to give a proof that begins by laying out the value of \delta you will use, and then shows that it works. If you still have difficulty with this, let me know. If you are concerned about what a proof you write up should look like: it should look like the second part, labeled "Proof", not like the first part. ---------------------------------------------------------------------- You ask about the way Stewart outlines the proof of limit statements: first "guessing" a value of \delta that should work, then proving it (e.g., in Example 2, p.112). One can talk in terms of "guessing" when one is beginning the subject, but the point of view I emphasized in lecture was that of "planning strategically". The reality can be a mixture of the two. I gave the example of finding the right conditions to use in showing lim f(x)+g(x) = L+M. As a first naive try, we might choose \delta to make |f(x) - L| < \epsilon and |g(x) - M| < \epsilon. But we discover that this only makes |f(x)+g(x) - (L+M)| less than 2\epsilon. So we go back, and make |f(x) - L| and |g(x) - M| < \epsilon/2, and that works. If we try the proof for f(x)g(x) naively, we have to go through more steps. The deviation of f(x) from L doesn't just get added to the deviation of g(x) from M; rather, the first gets multiplied by g(x) and the second by f(x). Since g(x) is around M and f(x) is around L, we might try making |f(x) - L| < \epsilon/|M| and |g(x) - M| < \epsilon/|L|. Several problems with that: first, we have to take account of the two errors being combined together; so we put a "2" in each denominator. Second, as we said, g(x) is "around M", but it isn't generally equal to it; so we handle that by replacing the |M| in the denominator by |M|+1, and making our choice of \delta such that g(x) doesn't deviate from M by more than 1. Third, L or M might be 0, and we can't divide by 0. 
Fortunately, the +1 handles that also. ... I won't go through the details, but we end up with the proof of Law 4, which works when we've finally set things up right. So, as I say: think strategically about what conditions to put on \delta to make the argument work, and then, if your plan covered all the problems, feeding it into the proof will work. If, in trying to prove it, you find that additional conditions are needed, figure out how to work those into your choice of \delta, and try again. (I realize that you were asking about Stewart's examples with specific functions, while my answer was about general laws; but the problem is the same; just less trivial in the case of general laws.) ---------------------------------------------------------------------- You both ask how one chooses a \delta in \delta-\epsilon proofs; e.g., Example 3 on p.113. I can help best if you come to office hours. I'll base my answer on a guess that you might understand the definition of limit better if expressed using words more than symbols. In such terms, lim_{x->a} f(x) = L means that "no matter how close we want to get f(x) to L, we can guarantee this by requiring x to be close enough to a". So in Example 3, if we want to get \sqrt x "close" to 0, how "close" do we have to make x to 0? For instance, if we want to get \sqrt x within .1 of 0, or within .001 of 0 ... ? If you explore these cases, you will find that to get \sqrt x to be within a certain distance of 0, it will suffice to get x less than the square of that distance from 0. 
Now the distance from 0 less than which you want \sqrt x to be is called \epsilon, and the distance such that you hope you can achieve this by keeping x less than that distance from 0 is called \delta; and using those symbols makes it much easier to express what one is trying to achieve: what I have described awkwardly in the beginning of this paragraph can be stated nicely: given any \epsilon > 0, if we take \delta = \epsilon^2, then the condition in the definition of lim_{x->0+} \sqrt x = 0 will be satisfied; so lim_{x->0+} \sqrt x = 0 is proved. ---------------------------------------------------------------------- You both asked about the "C" in Example 4, p.114. What we are trying to do is ensure that x^2 - 9 = (x-3)(x+3) will be "small" (less than \epsilon) by making |x-3| small (less than some \delta). To know how small we need to make the first factor in order to get the product to be small, we have to know how large the second factor might be. If we let x range over all real numbers, then |x+3| could be arbitrarily large; but since the definition of a limit as x -> 3 allows us to restrict x to values near 3, we can avoid that problem. Actually, any choice we make for how close x must be to 3 will impose some bound on how big x+3 gets. So Stewart takes the simplest such choice, "distance < 1 from 3", notes that this will make |x+3| less than 7, and that allows us to complete the argument. He refers to the desired bound on x+3 as "C". One could equally well have said "we will assume x to have distance less than .5 from 3" or "we will assume x to have distance less than 1,000,000 from 3"; all of these would have led to valid proofs of the limit statement, with different values of C. Stewart just made the simplest choice. ---------------------------------------------------------------------- I hope that the answer to your question -- why |x-3| < 1 in Example 4 on p.114 -- was clear from my lecture. 
The key point is that we can *make* it < 1 by taking \delta \leq 1, which we do explicitly when we define \delta = min(1, \epsilon/7). ---------------------------------------------------------------------- You ask why, in the third display of Example 4 on p.114, we "substitute C for x+3". Look more closely! The two sides of the display are not connected by "=", but by "<". Reading the formula correctly, can you see the justification? Ask again if you still have a question about it. ---------------------------------------------------------------------- One of you notes that in Example 4, p.114, Stewart restricts x to a certain interval, and you ask whether there is an "exact" way of proving the limit. Although the way Stewart words it does sound as though restricting to the interval is cutting corners, the justification he gives is logically correct: the definition of a limit allows you to make delta as small as you wish, as long as it is positive; so whatever restrictions on it one would otherwise make, one can add the restriction that it be \leq 1. This means that the values of x for which one has to check the condition |x^2 - 9| < epsilon lie within the interval he names. So the argument he gives is in fact an exact proof. The other one asks whether, instead of restricting x to be within a distance 1 from 3, he could have used 0.5 or other numbers. Certainly! One just has to make some choice, so that we have some restriction on the size of x+3. Once we have made such a choice, we can proceed with the computation Stewart gives. ---------------------------------------------------------------------- You ask whether the Sum Law (pp.114-115) and other laws for limits can be proved simply by substitution. Substitution doesn't always work for limits. For instance, suppose f(x) is the function defined by f(0) = 1, and f(x) = 0 for all x other than 0. 
Then we can't compute lim_{x->0} f(x) by just substituting x=0 into f: f(x) = 0 as we are approaching x=0, so the limit is 0; but 0 is not the value of f(0) itself. The above goes wrong because the function f(x) is not continuous. The statement that the limit of a sum is the sum of the limits is in fact equivalent to saying that the operation of addition is a continuous function. But the operation of addition is a function of two variables, and in this course, we are only studying functions of one variable; we won't even see definitions of limits and continuity for functions of more than one variable in this course; they are introduced in Math 53. Till then, the proof of the limit law for addition (and likewise for multiplication) must be done more or less as shown in the book. (When one does have the concept of continuity of functions of several variables, the proofs that the addition and multiplication functions are continuous are very similar to the proofs in today's reading, but just a little bit simpler, in that they involve variable-symbols x and y, instead of function-symbols f(x) and g(x).) ---------------------------------------------------------------------- Regarding formula [5] in the proof of the Sum Law on p.115, you ask why one has \leq at the last step. The absolute value of the sum of two numbers can be either equal to or less than the sum of their absolute values. E.g., |3 + 7| = |3| + |7|, but |3 + (-7)| < |3| + |-7|. Either case can happen in the situation of the proof of the Sum Law: If both f(x) and g(x) are greater than L and M respectively, or are both less, then f(x)-L and g(x)-M have the same sign, and we get equality. But if one is greater and the other is less, we get strict inequality. ---------------------------------------------------------------------- You ask why Stewart talks about different values \delta_1 and \delta_2 on p.115. It is because we have two different functions, f and g, to which we are applying the definition of limit. 
Since lim_{x->a} f(x) = L, we know that to get f(x) within \epsilon/2 of L, it is enough to get x "close enough" to a; and since lim_{x->a} g(x) = M, to get g(x) within \epsilon/2 of M it is also enough to get x "close enough" to a; but in these statements, "close enough" may be different for f and for g. So we call a value that works for f "\delta_1" and a value that works for g "\delta_2". Then, as Stewart observes in the next step, min(\delta_1,\delta_2) will work for both of them. ---------------------------------------------------------------------- You ask how we find the \delta needed to verify that a function f(x) satisfies lim_{x->a} f(x) = infinity (defined on p.115). That depends on the function f(x), of course. Given a particular function, ask yourself, "Why am I sure that the function is approaching infinity?" Your answer will involve some properties of the function; and hopefully, you can translate those properties into a precise argument showing that given any M, a certain \delta will have the required property. If what you are given is not a particular function but a generic situation, such as that of Exercise 44(a) (p. 118), the idea is still the same; but in place of specific properties that a known function has, you will have to use the properties that the functions named in the situation are assumed to have. ---------------------------------------------------------------------- In connection with Stewart's definition of infinite limits on p.115, you ask how one would define infinite limits as x --> a+ or a-. I hope you can see how to answer that yourself! Just look at how the definitions of lim_{x->a+} f(x) = L and lim_{x->a-} f(x) = L are gotten by modifying the definition of lim_{x->a} f(x) = L, and make the same modifications in the definition of lim_{x->a+} f(x) = infinity. Let me know if you have difficulty with this. 
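As a concrete instance (the function and numbers here are my own illustration, not Stewart's), for lim_{x->0+} 1/x = infinity the modified definition reads: for every M there is a \delta > 0 such that 1/x > M whenever 0 < x < \delta; and the choice \delta = 1/M works. A numerical sketch:

```python
# One-sided infinite limit, illustrated on f(x) = 1/x as x -> 0+:
# for every M there should be a delta > 0 with 1/x > M whenever
# 0 < x < delta.  The choice delta = 1/M works, since then x < 1/M.
def delta_for(M):
    return 1.0 / M

for M in (10.0, 1000.0, 1e6):
    delta = delta_for(M)
    for t in (0.1, 0.5, 0.9):       # sample points in (0, delta)
        x = t * delta
        assert 1.0 / x > M
```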
---------------------------------------------------------------------- You ask about Stewart's use of M in defining limits equal to infinity, on p.115 versus N in defining limits equal to - infinity, on p.116. These are arbitrary choices that he makes in writing these definitions; the choices aren't standard. Whatever letters you choose in writing the definitions will be acceptable, if what you say about them is correct. (M and N are the common choices in such statements, but there is no common practice of using different letters depending on which infinity one is approaching.) ---------------------------------------------------------------------- You ask whether there is an easier way of showing that lim_{x->0} 1/x^2 is infinity than the one given in Example 5 on p.116. In particular, you say the use of "M" confuses you. This example shows how to get the result directly from the definition of a limit being infinite. One has to work directly from definitions to get one's first results. Once one knows how to use the definitions, one can use them to get general results, and prove specific results from these, which is often easier. To understand this proof, you should look back at the definition (on the same page, which introduces the symbol "M") and work on understanding that. Ask for help from me or the GSI if you have difficulty with this. ---------------------------------------------------------------------- Concerning the 6th line on p.116, you ask why choosing a larger M may require a smaller \delta. Well, just think of the statement that lim_{x->0} 1/x^2 = infinity. Instances of this statement are "If we take x small enough, we can get 1/x^2 bigger than 100", and "If we take x small enough, we can get 1/x^2 bigger than 10000". To do the first of these, we need to take |x| < 1/10, i.e., take \delta = 1/10 (or smaller). To get the second, we need to take |x| < 1/100, i.e., take \delta = 1/100 (or smaller). 
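In general (a sketch of my own), given M one can take \delta = 1/\sqrt M; then 0 < |x| < \delta forces x^2 < 1/M, i.e. 1/x^2 > M, which recovers both instances above:

```python
import math

# lim_{x->0} 1/x^2 = infinity: given M, take delta = 1/sqrt(M);
# then 0 < |x| < delta forces x^2 < 1/M, i.e. 1/x^2 > M.
def delta_for(M):
    return 1.0 / math.sqrt(M)

for M in (100.0, 10000.0):
    delta = delta_for(M)        # 1/10 and 1/100, as in the text
    for x in (0.5 * delta, -0.5 * delta, 0.99 * delta):
        assert 1.0 / x ** 2 > M
```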
So the bigger you want to get f(x), the smaller you have to get \delta. The general principle is that to get a strong conclusion, you need a strong condition. When we were dealing with statements of the form lim_{x->a} f(x) = L, both a "strong condition" and a "strong conclusion" referred to values being small (|x-a| and |f(x)-L| respectively). But when we look at infinite limits, or limits at infinity, one or the other of these conditions says that something should be large, while (unless we are considering infinite limits *at* infinity) the other still says that something is small. So they may seem "opposite" in nature; but they are really both cases of "strong conclusion requires strong assumption". ---------------------------------------------------------------------- You ask how to approach exercise 38, p.118. (By the way, as stated in the class handout, your Question of the Day should be on the reading itself. You can ask questions on homework in addition to a question on the reading, but they don't substitute for such a question. I'll count it this time, but please remember in the future.) If you asked about something like this at office hours, I'd give you a first suggestion to think about, see how far you could carry it, and then take you a step further ... . By e-mail, the best I can do is give you a start, and speak briefly about where to go after that. As a first step, think about how you could prove that the limit as t approaches 0 of the Heaviside function is not, say, 17. To show that the statement in the definition of lim_{t->0} H(t) = 17 is false, you want to show that it is not true that for every epsilon there exists a delta such that [the rest of the def.]; in other words, you need to show that for some epsilon there exists no such delta. I think you will find it easy to pick such an epsilon (a value such that H(t) does not stay within that distance of 17 for small t). 
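To make the suggestion concrete (the choice epsilon = 1 here is my own): no delta can work for L = 17, because H(t) = 0 for t < 0, and every interval (-delta, delta) contains such t:

```python
# Heaviside function: H(t) = 0 for t < 0, and H(t) = 1 for t >= 0.
def H(t):
    return 0 if t < 0 else 1

L, eps = 17, 1
for delta in (1.0, 0.01, 1e-9):
    t = -delta / 2                  # a point with 0 < |t| < delta
    assert abs(H(t) - L) >= eps     # H(t) = 0 there, nowhere near 17
```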
Well, if you can do it for 17, can you do it for less egregious values, such as 1/2? What about the values 0 and 1? After you have done this experimentation, can you write down a proof that it can't be done for any value of L? (If you can skip a lot of the experimentation and see your way to the proof without it, that's fine. The experimentation shouldn't go into your write-up of the proof, anyway.) ---------------------------------------------------------------------- You ask about what L to use in the proof that lim_{t->0} H(t) does not exist, in Exercise 38, p.118. If you just used L=1, you would just be showing that H(t) did not have 1 as its limit; that wouldn't show it had no limit. Similarly, if you just used L=0, you would just be showing that 0 was not its limit. So you have to give a proof that shows that for every L, not just 0 or 1, the function does not have limit L. Your argument will in fact have to depend to a certain degree on L; i.e., you will have to argue that if L is in a certain range, then the limit is not L by one computation, while if it is in another range, the limit is not L by a slightly different computation. Note also that your Question of the Day is supposed to be about the reading, not the homework. As I say on the class handout, you are *also* welcome to ask questions about the homework, but this should be *in addition to* your question about the reading. Please bear this in mind for future questions. ---------------------------------------------------------------------- You write > Can limits and continuity be determined for other types of equations, > such as parametric and polar? "Parametric and polar" aren't different kinds of equations -- they are different ways of graphing functions. The concepts of limit and continuity are properties of functions, not of how we graph them. (It is true that on p.119 Stewart describes continuity as meaning that you can graph the function "without removing your pen from the paper". 
But that is just a way of getting the idea across; it is not the definition.) ---------------------------------------------------------------------- Regarding Definition 3 on p.120, you ask about the parenthetical statement about one-sided continuity at endpoints in the case where the interval is open. When Stewart says "an endpoint", he means "an endpoint which belongs to the interval". So, for instance, if the interval is [0,1), then this applies only to the endpoint 0, while if the interval is open, the comment about endpoints doesn't apply at all. Although the parenthetical statement you are referring to, taken by itself, is ambiguous, the preceding sentence makes it clear that he is talking about "every number in the interval", which doesn't include endpoints that are excluded from the interval. ---------------------------------------------------------------------- Regarding Example 4 on p.121, you ask "What is the purpose of moving the lim to inside the square root sign?" The statement that a function f is continuous at a, namely, lim_{x->a} f(x) = f(a), can be thought of as lim_{x->a} f(x) = f(lim_{x->a} x). This is a silly way to write it, since it's easier to write "a" than "lim_{x->a} x". But it gets the point across that continuity means that "limits can pass from outside to inside f", which gives us a hint as to what to do when we want to show a complicated function is continuous: We pass the "lim" from the outside to the inside one step at a time. In Example 4 on p.121, passing it through the square root sign is one of those steps. One can also ask "What is the justification for that step?" Stewart says "(by 11)". It took me some hunting to find what he meant: By law 11 in section 2.3, on p. 101. ---------------------------------------------------------------------- Regarding the proof of Theorem 4 on p.121, you write > I believe that Stuart writes (f+g)(x) to > mean f(x)+g(x). Is that common notation ... ? Yes. 
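The convention is the "pointwise" definition of operations on functions: f+g is the function whose value at each x is f(x)+g(x). In programming terms (a sketch of my own, with made-up names):

```python
# (f+g)(x) means f(x) + g(x): the sum of two functions is the
# function whose value at each point is the sum of their values.
def fn_sum(f, g):
    return lambda x: f(x) + g(x)

square = lambda x: x ** 2
triple = lambda x: 3 * x
h = fn_sum(square, triple)   # h plays the role of f+g
print(h(2))                  # square(2) + triple(2) = 4 + 6 = 10
```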
If you are not familiar with such notation, see the section "Combinations of functions", pp. 39-41. ---------------------------------------------------------------------- In relation to Theorem 7 on p.123, which says that many sorts of functions are continuous everywhere on their domain, you ask whether there are any functions -- other than those defined by different formulas on different sets -- that are not continuous everywhere on their domains. Well, all such functions have something peculiar about their definitions; but here are three examples that aren't blatantly of the sort you refer to. Two are given in Stewart: The "greatest integer" function, [[x]], is defined on p. 105, and given as an example of discontinuity on p. 120. Another is given in Problems Plus number 2 on p. 781. For the third example, recall that one usually defines arcsin x to be the unique y between -\pi/2 and \pi/2 with sin y = x; but someone might want his arcsin function to be non-negative valued, and define it to be the smallest y \geq 0 with sin y = x. This turns out to be discontinuous at 0. (For x = 0, it gives 0, but for x < 0, it gives values > \pi.) ---------------------------------------------------------------------- You asked whether it wasn't obvious that, as stated in Theorem 8, p.125, lim_{x->a} f(g(x)) = f(lim_{x->a} g(x)). I hope that what I said in class clarified this: if as x -> a, the function g(x) approaches some b, and we then apply f to these values g(x) that are approaching b, then if f is continuous, we are fine: the values f(g(x)) will be approaching f(b); but if f is discontinuous, this need not be true. ---------------------------------------------------------------------- Regarding the technique of integration by parts (pp.464-468) you ask whether there is an easy way to decide which part of an integral should be u and which should be v'. In general, no. 
Obviously, you want the v' to be something that you know how to integrate; i.e., such that you can find an appropriate v. Among the various choices for which that is true, you have to look ahead to foresee which choice of u and v' will lead to a product u' v on the right-hand side of the integration by parts formula that you can also integrate. ---------------------------------------------------------------------- You ask whether, parallel to the development of integration by parts from the product rule for differentiation (p.464), there is a technique based on the quotient rule. Actually, the quotient rule is a version of the product rule. If we differentiate u/v = u v^{-1}, then the product rule says the result is u (v^{-1})' + u' v^{-1} = uv'(-1/v^2) + u'/v, which when brought to a common denominator gives (u'v-uv')/v^2. This used the fact that the derivative of v^{-1} is -v'/v^2, and likewise, if we have a function such as uv'/v^2 to integrate, we can say "Aha, v'/v^2 is the negative of the derivative of v^{-1} !", and use ordinary integration by parts, with v^{-1} as one of the functions. ---------------------------------------------------------------------- Regarding example 6 on p.467, you ask Why is u=sin^{n-1} x? Why is v=-cos(x)? First, I hope you saw the word "Let" before those equations. In other words, the author is not saying that we can tell from the integrand sin^n x dx that we must take u=sin^{n-1} x and v=-cos(x). In the technique of integration by parts, we have to start with some factorization of the integrand, and he is saying, "Let's try the simplest factorization, the one into sin^{n-1} x times sin x dx, noting that we can write the latter as d(-cos x)." You then ask what happens to the coefficient 1/n on the right-hand side of the equation to be proved. In this example, think of the left-hand side of that equation as what is given, and suppose we didn't know the right-hand side. 
Starting with the above choices of u and v, we apply integration by parts, as the author shows. We get an equation in which the original integral occurs on both the left and the right-hand sides, with coefficients 1 and -(n-1). As shown on the next-to-last equation of the example, we can combine these into one term on the left, with a coefficient of n. Dividing by that n, we get the desired formula, with "1/n" on the right. ---------------------------------------------------------------------- You ask whether there are reduction formulae for integrals of powers of other trigonometric functions, in addition to those developed for the sine and cosine (Example 6 on p.467, and Exercise 48 on p.469). Yes. On the one hand, to get such a formula for the secant, you can just take the reduction formula for the cosine and put a negative value in for the exponent. This will give an equation expressing the integral of a "smaller negative" power of the cosine in terms of the integral of a "larger negative" power; equivalently, expressing the integral of a smaller power of the secant in terms of the integral of a larger power; but it can be turned around to express the integral of the larger power in terms of the integral of the smaller power. To get a reduction formula for powers of the tangent, one has to start from scratch; but it can be found. The results for tangent, cotangent, secant and cosecant are given as formulas 75, 76, 77 and 78 in the table of integrals (on the sheet I handed out on Wednesday). ---------------------------------------------------------------------- Regarding the first two displayed equations on p.468, you write that "cos^2 x becomes -(n-1)\int sin^n x dx." That is not what is happening -- look more carefully. At the top of the page there is a formula in which the integrand contains "cos^2 x". 
In the next line the author notes that cos^2 x = 1 - sin^2 x; so try substituting 1 - sin^2 x for the cos^2 x in the right-hand side of the formula at the top of the page, multiplying out, and seeing what you get! As I said the first day of class, reading a math text should be a struggle with the text. Not a struggle because the author is against you, but a struggle to make the methods being introduced a part of your thinking. This means doing calculations yourself when the results are not obvious. ---------------------------------------------------------------------- You ask, in connection with the last sentence of Example 6, p.468, how the reduction formula can be used repeatedly. The formula proved expresses the integral of sin^n x in terms of the integral of sin^{n-2} x. If n-2 \geq 2, you can apply the same formula to that new integral, with "n-2" in place of "n". (E.g., if you started with the integral of sin^5 x, then the reduction formula with n=5 expresses this in terms of the integral of sin^3 x; and you can then apply the reduction formula with n=3 to that, and express it in terms of the integral of sin x.) So if you start with any odd integer n, you can reduce the integration successively to the corresponding integrals with exponent n-2, n-4, ..., 3, 1; while if you start with even n, you can again reduce to the cases n-2, n-4, ..., etc.; since these are even, we will end with ..., 2, 0. In either case, the final integral is one we can easily do. ---------------------------------------------------------------------- Regarding Exercise 37 on p.469, you ask how you would know to use substitution if the exercise didn't say so. Well, we haven't had a formula for integrating cos \sqrt x, but we have had a formula for integrating cos t; substitution is a tool that will convert one into the other. (It will also bring in another factor, dx/dt, but we can hope that this won't create an intractable problem.) 
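Carried through (a check of my own, with the limits 0 to 1 chosen just for definiteness): the substitution x = t^2, dx = 2t dt turns the integral of cos \sqrt x into the integral of 2t cos t, and integration by parts then gives the antiderivative 2(t sin t + cos t) with t = \sqrt x. A numerical comparison against an independent Riemann sum:

```python
import math

# Integral of cos(sqrt(x)) from 0 to 1, two ways:
# (a) via the antiderivative 2(t sin t + cos t), t = sqrt(x),
#     obtained by the substitution x = t^2 plus integration by parts;
# (b) via a midpoint Riemann sum, as an independent check.
def antiderivative(x):
    t = math.sqrt(x)
    return 2.0 * (t * math.sin(t) + math.cos(t))

exact = antiderivative(1.0) - antiderivative(0.0)

n = 100000
h = 1.0 / n
riemann = sum(math.cos(math.sqrt((k + 0.5) * h)) * h for k in range(n))

assert abs(exact - riemann) < 1e-6
print(exact)    # about 0.7635
```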
Stewart has told us to do this here because the focus of this section is integration by parts, and the student may not have the technique of substitution in mind. But in general, one has to ask "What tools do I know, and which of them is likely to bring the problem into a form I can solve?" As I said the first day, integration is an example of an "inverse problem", where instead of having straightforward methods that always give the answer, one needs ingenuity to discover a method that will work. After giving various techniques, Stewart will discuss "Strategy for integration" in section 7.5 (reading #5). ---------------------------------------------------------------------- You ask when one would need to use the identity [(sin x)(cos x) = (1/2)(sin 2x)] (p.473) in integration. Well, if one was integrating sin^4 x cos^4 x, it would be convenient to turn this into (sin x cos x)^4 = (1/2 sin 2x)^4 = (1/16) sin^4 2x, before proceeding further. One could use the methods the author describes without this first step; but the step would make things shorter. It can also help when one is trying to decide whether solutions one got by two methods agree. If one solution involves sin x cos x and the other involves sin 2x, then one can convert between these expressions and check. ---------------------------------------------------------------------- You ask whether there are formulas for the integrals of sin^m x cos^n x (p.473) and tan^m x sec^n x (p.474) that work whether m and n are odd or even. Well, for the sine and cosine, once one has learned a little about calculus with complex numbers (numbers x+iy, where i is a square root of -1), one can use the formulas in Exercise 48 on p. A64 (which come from equation 6 on the preceding page); this reduces the problem to integrating exponential functions (which is easy).
But one has to go through the algebra of substituting the formulas from that exercise into the integrand, expanding the result algebraically, and after integrating, converting the answers back to trigonometric form. For the tangent and secant, the same idea, together with the substitution u = e^{ix}, converts the problem into one of integrating rational functions, which we will learn about in reading #4. This is messier than the preceding case, but it also works. We'll discuss the complex interpretation of trig functions referred to above when we get to reading #25; though it will only be a small part of that reading. ---------------------------------------------------------------------- You're right: in point (a) of the box on p.474, where Stewart has k\geq 2, he should have k\geq 1. I'll include that in the letter of corrections I send him at the end of the semester. Thanks! ---------------------------------------------------------------------- You asked about the 1/2 before the answer to Example 8, p.476. It comes from the fact that if we write "I" for the integral we are trying to find, then the preceding equation has the form I = -I + other stuff (though the -I occurs in the middle of the other stuff). So when we solve this, we get 2I = other stuff, or I = 1/2 (other stuff). ---------------------------------------------------------------------- You ask how one gets the trigonometric formulas on p.476. I hoped to sketch how one gets them in class Friday and again today, but didn't have the time. One nice geometric way to see them is to draw a picture with unit vectors in the plane coming out of the origin at angles A-B and A+B. (To make the situation easy to draw, let A be a relatively large angle -- say 40 degrees -- and B a relatively small one, say 10 degrees.) So those two vectors will have coordinates (cos(A-B), sin(A-B)) and (cos(A+B), sin(A+B)). Hence the midpoint of those vectors will have coordinates (1/2 [cos(A-B) + cos(A+B)], 1/2 [sin(A-B) + sin(A+B)]).
But that midpoint will clearly come out of the origin at the average of the angles A-B and A+B, namely, A; and it is not hard to compute that its length will be cos B; hence its coordinates will be (cos A cos B, sin A cos B). Equating this with the previous formula, we get (a) and (c) of the formulas on p. 476. You can easily get (b) by putting \pi/2 - B for B in (a). ---------------------------------------------------------------------- Both of you asked how to derive a trigonometric identity Stewart uses; and though the identities were different, the answer is the same: they can both be derived from identities 12a and 12b on p. A29. Namely, on the one hand, if you take x = y in 12a, then use the formula cos^2 = 1-sin^2 to get rid of the "cos^2" in the resulting equation, and solve that equation for sin^2 x, you get the "half-angle formula" that Stewart uses in Example 4 on p. 472. (But I don't know why he calls it a "half-angle formula".) It also appears as 17b on p. A29. On the other hand, putting -y in place of y in equation 12a, one gets equation 13a; adding these two equations and dividing by 2, one gets equation 2a on p.476 (18a on p. A30); while treating 12b in the same way (but both adding and subtracting), one gets 2b and 2c (18b and 18c). There is an interesting intuitive interpretation of the equation one gets from, say, 2a on p.476 if one writes A = ax, B = bx, and assumes a is large compared to b. Then the right-hand side represents (except for the factor 1/2) the sum of two sine waves of slightly different frequencies. We know that two such sine waves slowly go in and out of phase, so that their sum looks like a sine wave of intermediate frequency (i.e., the sin ax on the left-hand side) whose amplitude slowly rises and falls -- and that rise and fall is what the factor cos bx on the left achieves.
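For anyone who wants a quick machine check of these product-to-sum formulas, here is a little Python sketch (the sample angles are arbitrary choices of mine, nothing from the text):

```python
import math

# spot-check the product-to-sum identities at a few arbitrary angles
for A in (0.3, 1.1, 2.5):
    for B in (0.2, 0.7):
        # sin A cos B = (1/2)[sin(A-B) + sin(A+B)]
        assert abs(math.sin(A) * math.cos(B)
                   - 0.5 * (math.sin(A - B) + math.sin(A + B))) < 1e-12
        # cos A cos B = (1/2)[cos(A-B) + cos(A+B)]
        assert abs(math.cos(A) * math.cos(B)
                   - 0.5 * (math.cos(A - B) + math.cos(A + B))) < 1e-12
print("identities check out")
```

Of course this only tests the identities at finitely many points; the geometric argument above is what proves them for all angles.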
---------------------------------------------------------------------- You ask about Stewart's comment on Example 9, p.476, that \int sin 4x cos 5x dx could be done by integration by parts. Good question! If I give you the hint to try the method of Example 4 in the preceding section (p. 466), do you think you can take it from there? I may add a version of this question, with a little more detail, to the next homework. ---------------------------------------------------------------------- You ask why we are able to use trigonometric functions to solve problems not involving such functions (p.478 et seq.). I'm not sure in what sense you are asking "why". From the point of view of mathematical rigor, note that whenever you have a number x between -1 and 1, you can find some angle \theta such that x = sin \theta, and then do computations involving x based on the properties of the trig functions of \theta. The trigonometric substitutions used in this section are all based on this idea. In the end, once one has done the integration, one substitutes back, and translates one's answer into a function of the original x. If you are asking why trigonometric substitutions should be the "right way to go" for these computations, recall the motivation I gave in class on Wednesday when previewing this topic: That if we want to integrate \sqrt{1-x^2}, and we draw the picture of the area we want to compute, we see that one piece of it is a sector of the unit circle, between the y-axis and the radius to the point (x, \sqrt{1-x^2}). The area of that sector is equal to half the angle \theta it subtends, and that angle satisfies sin \theta = x. This suggests using the substitution x = sin \theta. We try it out, see that it works, and find that the properties that make it useful will apply in general to expressions involving \sqrt{1-x^2}.
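As a numerical illustration of this motivation (the upper limit 1/2 and the crude Riemann-sum routine below are my own choices), one can check that the substitution x = sin \theta really does give the same value, and that both match the antiderivative (\theta + sin \theta cos \theta)/2:

```python
import math

def midpoint_sum(f, a, b, n=200000):
    # crude midpoint Riemann-sum approximation of the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# integral of sqrt(1 - x^2) from 0 to 1/2, done directly...
direct = midpoint_sum(lambda x: math.sqrt(1 - x * x), 0.0, 0.5)
# ...and via x = sin(theta), dx = cos(theta) d(theta), so the integrand becomes cos^2(theta)
sub = midpoint_sum(lambda t: math.cos(t) ** 2, 0.0, math.asin(0.5))
# closed form from the antiderivative (theta + sin theta cos theta)/2
theta = math.asin(0.5)
exact = 0.5 * (theta + math.sin(theta) * math.cos(theta))
print(direct, sub, exact)  # all three agree closely
```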
---------------------------------------------------------------------- You ask why we don't use (-\pi/2, \pi/2) as the interval of definition of the secant function in making trigonometric substitutions, as noted in the table on p.478. Because the secant function is not one-to-one on (-\pi/2, \pi/2) -- look at its graph. In fact, you can see from its graph that there is no interval on which it is both one-to-one, and takes on the whole range of its values, (-infinity, -1] and [1, infinity); so in any given substitution, one will generally want to use either an interval on which it takes on the former set of values, or one on which it takes on the latter. ---------------------------------------------------------------------- You ask why one needs to have an inverse function to g in an integration by reverse substitution, as described on p.478. Well, when one succeeds in integrating f(g(t)) g'(t)dt, one gets a function of t, and one needs to express t as a function of x in order to turn this into a function of the original variable x. If g is many-to-one, one may be able to choose, for each value of x, _some_ value of t that maps to it, but this can get complicated. For simplicity, Stewart considers the case where g has an inverse function h, and we can simply substitute t = h(x). For a function such as sin, one gets the existence of an inverse by restricting its domain and codomain, regarding it as a function from [-\pi/2,\pi/2] to [-1,1]. Then sin^{-1}: [-1,1] -> [-\pi/2,\pi/2] is the inverse function. (As, say, a function with domain [0,2\pi], and/or with codomain the whole real line, it has no inverse.) (For more on functions and inverses, cf. the handout on Sets, Logic, etc., in particular, the last item on p. 2, "f: X --> Y", and the first item on the next page, "f^{-1}".) ---------------------------------------------------------------------- You ask why integration by substitution, as used on pp.478-483, is reasonable.
Well, let's think of an integral involving \sqrt{a^2 - x^2} (where a is positive). That expression is defined only for x in [-a,a], and for each x in that interval, there is a unique \theta in the range [-\pi/2, \pi/2] such that x = a sin theta. So we can think of each x as corresponding to some such \theta, and study our integral by thinking, "for each value of \theta, what does our expression in x and \sqrt{a^2 - x^2} equal; and as \theta changes by a tiny amount, what is the corresponding tiny change in x?" We can then compute the resulting integral, being careful to remember that the values of x are not the same as the values of theta, and the change in x ("dx") is not the same as the change in theta, but, rather, that one is determined by the other in each case. If we were to forget these distinctions, the computation would not be valid; if we keep them in mind, it is. You ask whether there can be a problem with domains. Yes; I sketched in class yesterday a kind of situation where there would be. If we had an integral where x ranged from -1 to 1, and we wanted to make the substitution x = 1/t, then although x = -1 and x = +1 would correspond to t = -1 and t = +1, we could not just write our result as an integral from t = -1 to t = +1. Rather, as x ranges from -1 upward to +1, t goes from -1 down to -infinity, and then from +infinity down to +1. We haven't yet studied integrals over ranges that "go to infinity", but we will in reading #7; and if we handle the range correctly, a substitution of the above sort will work; but not if we blindly integrate "from t = -1 to t = +1". You also worried about the fact that I motivated trigonometric substitutions by a geometric approach you would not have thought of. Don't worry -- if it were something I could have expected students to see for themselves, I wouldn't have presented it in lecture. 
It's good that you should try to come up with ideas for yourself; but since you're taking the class, that means that you know you can't come up with everything that way! ---------------------------------------------------------------------- You ask how one chooses the limits of integration after a change of variables, in an integration like that of Example 2, p.479. If one is letting x = g(t), one has to choose the range of values that t runs over so that x will run over the given range, taking on each value just once. In the problem shown, Stewart has reduced to the case where x runs from 0 to a, so he needs to take a range of values of theta which will make x = a sin theta do this. The most convenient such range is the range from 0 to pi/2; though other ranges, such as 2 pi to (5/2) pi, would also work. ---------------------------------------------------------------------- You ask how the author gets the formula csc \theta = \sqrt{x^2+4}/x using the triangle in the margin at the bottom of p.480. He has defined \theta by the formula x = 2 tan \theta at the beginning of the discussion of the problem. This means that \theta will be the angle of a right triangle in which the ratio of the opposite and adjacent perpendicular sides is x : 2, so he draws such a triangle. The Pythagorean theorem then makes the hypotenuse \sqrt{x^2+4}, so the cosecant is \sqrt{x^2+4}/x. The same result could be gotten without drawing a diagram. We know that sec\theta = \sqrt{1+tan^2 \theta} = \sqrt{1 + (x/2)^2} = \sqrt{1 + x^2/4} = \sqrt{(4 + x^2)/4} = \sqrt{4 + x^2}/2. Now csc\theta = sec\theta/tan\theta, so this is (\sqrt{4+x^2}/2)/(x/2) = \sqrt{4+x^2}/x . ---------------------------------------------------------------------- You ask why a>0 in Example 5, p.481. The case a = 0 would have to be done by different methods, since in that case one can't write x = a sec theta.
The case a < 0 does not logically have to be excluded; the calculation that Stewart does for a > 0 works for any nonzero a. But since a^2 = |a|^2, if the function has a < 0 we can always rewrite it using |a| in place of a, so Stewart chooses the positive value. I suppose his thought is to do so "just in case" the question of which sign a has would complicate some later computation; even though it actually wouldn't. A case where the choice would make a difference is Example 1 on p. 479. One could take x = -3 sin theta, but then the signs of most of the equations that follow are reversed; in particular, the sign in the formula for cot theta in the middle of the page has to be reversed. The final answer would be the same, but the details of the calculation would be different. ---------------------------------------------------------------------- You ask about reversing the trigonometric substitutions in the indefinite integrals of section 7.3, in particular Example 5, p.481. In that example, the substitution was x = a sec \theta, so the inverse substitution is \theta = sec^{-1} x/a. When one substitutes this into an expression like tan \theta, one has to figure out what tan (sec^{-1} x/a) is as a function of x. From the diagram in the left margin, one concludes that this is \sqrt{x^2-a^2} / a. An equivalent way to do this is to express everything as a function of a sec \theta, and then put x everywhere for a sec \theta. For instance, one writes tan \theta = \sqrt{(sec^2 \theta) - 1} = \sqrt{(a sec \theta)^2 - a^2} / a = \sqrt{x^2-a^2} / a. ---------------------------------------------------------------------- You ask why, near the end of solution 1 to Example 5 on p.481, the author rewrites - ln a + C as C_1. Since C can be any number, and ln a is a fixed number, the sum -ln a + C can also be any number; so the answer is put in simpler form if instead of writing the two terms "-ln a + C", we regard their sum as a single arbitrary constant.
In a table of integrals, we would just call this constant "C". But since Stewart has used the symbol "C" already in this derivation, he calls the new constant "C_1". (Mathematicians, among themselves, might say or write something like "renaming -ln a + C as C", and end the calculation with "... + C"; Stewart might do this himself in writing to a fellow mathematician. But since he is writing here to students, he is very careful to avoid that notational shortcut which might lead to confusion.) ---------------------------------------------------------------------- You ask about the step at the end of Solution 1 to Example 5, p.481, where the author says "Writing C_1 = C - ln a ...". The constant of integration denotes "any constant"; so the form Stewart gets at the next-to-last step means "what you get when you take any constant and subtract ln a from it". Obviously, that just comes down to "any constant", so it would be silly to express it in a more complicated way. You probably wouldn't lose points on an exam for not making this simplification; but it is certainly preferable to give one's answers without unnecessary complications. ---------------------------------------------------------------------- You asked about the solution to Example 5 on p.481 based on hyperbolic functions. Those functions are defined and discussed in section 3.11. They aren't nearly as important as the trigonometric functions, and we won't be paying much attention to them; but you can see from the formulas at the bottom of pp. 258 and 259 that they have properties very much like those of the trigonometric functions, which allow them to be used in a similar way in solving differential equations. When we study complex numbers in reading #26, we'll get some insight into the relation between hyperbolic and trigonometric functions. 
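To see concretely how close the parallel between hyperbolic and trigonometric functions is, here is a small Python check (the sample points are arbitrary choices of mine) of two hyperbolic analogues of familiar trigonometric identities:

```python
import math

# check hyperbolic analogues of trigonometric identities at a few sample points
for t in (0.0, 0.5, 1.3, 2.0):
    # cosh^2 t - sinh^2 t = 1  (compare cos^2 t + sin^2 t = 1)
    assert abs(math.cosh(t) ** 2 - math.sinh(t) ** 2 - 1.0) < 1e-9
    # double-"angle" formula: sinh 2t = 2 sinh t cosh t  (compare sin 2t = 2 sin t cos t)
    assert abs(math.sinh(2 * t) - 2 * math.sinh(t) * math.cosh(t)) < 1e-9
print("hyperbolic identities hold")
```

It is these parallel identities that let the substitution x = a cosh t play the same role for \sqrt{x^2 - a^2} that x = a sec \theta does.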
---------------------------------------------------------------------- Regarding Solution 2 to Example 5, p.481, you ask "How is it that \cosh^{-1}(\frac{x}{a}) = \log|x + \sqrt{x^2 - a^2}|?" Write the equation \cosh(y) = \frac{x}{a} using the definition of \cosh(y) in terms of exponentials; multiply by e^y to get rid of exponentials with "negative" exponents; regard the result as a quadratic equation in e^y; solve using the quadratic formula; and take the logarithm to find y. You'll get the above formula, but without the absolute value signs. This is because Stewart has assumed x > 0, so that the expression in question is positive. If x < 0, then one has to use the substitution x = -a cosh t instead of x = a cosh t. ---------------------------------------------------------------------- You ask about the words "proper" and "improper" as used in the top part of p.485. These are words taken from elementary-school arithmetic: A fraction like 9/4 is called an "improper fraction", because its numerator is larger than its denominator. One simplifies it to 2 1/4, the sum of an integer, 2, and a "proper fraction", 1/4. The author is extending the terms to rational functions, calling a rational function P(x)/Q(x) "proper" if P(x) has smaller degree than Q(x), and "improper" otherwise; and he notes that an improper rational function can be reduced to the sum of a polynomial and a proper rational function. These terms probably won't come up again, either in this course or in other math courses. (There is another use of "proper" and "improper" that you will see in the later courses, regarding subsets of a set; that meaning is entirely different.) ---------------------------------------------------------------------- You ask how we know that "long division of polynomials", used on p.485, is valid. If we start with polynomials P(x) and Q(x), which begin a_n x^n + ... and b_m x^m + ... 
, and where n \geq m, then the first "step" in the long division process consists of writing the term (a_n/b_m) x^{n-m} as the beginning of the quotient. Now let us write P(x) = (a_n/b_m) x^{n-m} Q(x) + (P(x)-(a_n/b_m) x^{n-m} Q(x)). Then the right-hand term P(x)-(a_n/b_m) x^{n-m} Q(x) will be a polynomial of lower degree than P(x), since the highest degree terms of P(x) and (a_n/b_m) x^{n-m} Q(x) are equal, and cancel when we subtract. We can now write P(x)-(a_n/b_m) x^{n-m} Q(x) as a polynomial beginning with an x^{n-1} term (which might or might not be zero), and if n-1\geq m, we can repeat this process, subtracting a constant multiple of x^{n-m-1} Q(x) from P(x)-(a_n/b_m) x^{n-m} Q(x) so as to remove the x^{n-1} term. Continuing this process, we end up with an expression P(x) = S(x) Q(x) + R(x), where S(x) and R(x) are polynomials, with R having degree < m. Long division of polynomials is simply a way of writing out this process conveniently. As noted above, the first term that we write in the quotient is (a_n/b_m) x^{n-m}; the subtraction that we then do corresponds to subtracting (a_n/b_m) x^{n-m} Q(x) from P(x), etc. ---------------------------------------------------------------------- You ask why we put x=a for A(x+a) and x=-a for B(x-a) in Example 3 on p.487. This follows the procedure described for the preceding Example, in the "NOTE" between these examples. Did you skip that Note, or not understand it? If you have trouble with it, I'll be glad to help; but you should certainly read it and see whether you can follow it, rather than skipping it and then asking about the same method when it is re-used in the next example. ---------------------------------------------------------------------- You ask about the "repeating/extra" factors in the examples on p.488 et seq. These are introduced in the discussion beginning at the bottom of p. 487 and continuing at the top of p. 488.
I will try to discuss the "why" of it in class, but assuming you had read that introductory paragraph, you shouldn't have been surprised when they appeared in the examples. Likewise, his discussion of Case IV, beginning at the bottom of p. 490, introduces the corresponding phenomenon with quadratic factors, which appears in Examples 7 and 8. Moral: read his discussion of the topics in the book, not just the examples. And if something you don't understand comes up in such a discussion, ask about that. ---------------------------------------------------------------------- You ask why, in the solution to Example 4, p.488, the constant of integration is written K rather than C. This is because the letter "C" has already been used to denote one of the undetermined coefficients (starting in the display before [8]). A constant can be denoted by any letter, but one has to avoid using the same letter to denote two different things. (Unless one makes very clear that one is changing notation. E.g., it is OK if one writes explicitly "From this point on, we will use n to denote what we previously called 2n"; or, in the case at hand, "Since we have found the values we have been denoting A, B, C, we no longer need to use those letters to denote those values, and will feel free to use C for a constant of integration below". But when one is not short of symbols, it is simpler not to go through a change of notation like that, and just to use a different letter.) ---------------------------------------------------------------------- You ask to be shown how to integrate the function shown in the second display on p.489. If you need help with this problem, you should carry the calculation as far as you can using the techniques that the book describes, then tell me what difficulty you encounter, what you need clarified about the next step, etc., and I will do my best to help. So please re-send your question, following the above guidelines. 
---------------------------------------------------------------------- You ask where we get formula [10] on p.489. I showed in class how to get it by trigonometric substitution. (I just did the case a = 1, but the principles of trigonometric substitution shown in the previous section tell you what to do for any a.) Alternatively, you can recall that the derivative of tan^{-1} x is 1/(x^2 + 1). This gives the integral of 1/(x^2 + 1); and by a change of variable, you can reduce the integral of 1/(x^2 + a^2) to that integration. ---------------------------------------------------------------------- You ask why Example 8 on p.491 has three terms, when there are only two distinct factors in the denominator. First note that this is what Stewart has told you would be the case, in formula [11] at the bottom of p. 490: When there is a repeated irreducible factor, one gets a sum of terms with different powers of that factor as their denominators, rather than just a single term. So there should have been no surprise in seeing this in the example he gives. Note also that this is analogous to what happens when the factors are linear rather than quadratic: see formula [7] near the top of p. 488. There are different ways of explaining "why" this happens. The approach that I used in class, for the case of linear factors, was to note that when a linear factor x-a occurs just once, then the rational function goes to infinity as x --> a "like" some function A/(x-a), and that by subtracting A/(x-a) from the function for an appropriate A, one can make the function smooth at x=a; in other words, get a function whose denominator does not have a factor x-a. On the other hand, if the denominator is divisible by (x-a)^r, then one has to do things stepwise: first subtract a term A_r/(x-a)^r that causes the number of factors (x-a) in the denominator to decrease by 1; then subtract a term A_{r-1}/(x-a)^{r-1} that causes the number of factors (x-a) in the denominator to decrease again, and so on.
We then end up with an expression for our rational function as A_r/(x-a)^r + A_{r-1}/(x-a)^{r-1} + ... + A_1/(x-a) plus a rational function r(x) with no x-a in its denominator (and then we start working on the other factors in the denominator of r(x)). This shows why we get such expressions in the case where all divisors of the denominator are linear; and once we see this, it is no surprise that we get similar expressions when there are quadratic factors. ---------------------------------------------------------------------- You ask about the last display in Example 9, p.492, and how the author goes from \int \sqrt{x+4}/x dx to 2\int du + 8\int du/(u^2 - 4). He is using the computation of the preceding display, which has converted \int \sqrt{x+4}/x dx to 2 \int (1+ 4/(u^2 - 4)) du. In the first line of the display you ask about, he carries this one easy step further. It is only on the next line that he applies Formula 6. ---------------------------------------------------------------------- You ask whether the Weierstrass substitution t = tan(x/2), shown in Exercise 59 on p.493, is a good way to evaluate the integrals in section 7.3. Well, it gives a method that will work if all else fails. It expresses the resulting integrals in terms of the tangent of x/2, rather than in terms of trigonometric functions of x. This can be fixed using the formula tan(x/2) = (sin x)/(1+cos x). But when the methods of section 7.3 give an easy solution, I suspect that this substitution will give a much lengthier path to the same answer. ---------------------------------------------------------------------- Concerning example 4(c) on p.496, you write: > The function integrated is 1/(1-cos x). This is done by multiplying > and dividing by 1+cosx. Can we do this even if the integral is a > definite integral from pi/2 to 3pi/2? The value of cos x at pi is -1 > so 1/(1+cosx) doesn't exist at this value. Can we still apply this > method to solve this integral? Good point!
The answer is "yes and no". When one does the integration, one gets a function which is guaranteed to have the right derivative except at the points where the modified integrand is undefined. But since the original integrand was continuous at some of these points, such as pi, and is equal to the modified integrand except where that is undefined, we can expect that its integral should agree with the integral of the original function where it is defined. So we should hope that when we compute the integral, we should be able to "fill in" the values at points like pi to get a differentiable function which is the desired integral. If we continue the integration where Stewart leaves off, we get -cot x - csc x, still undefined at x = pi; but we notice that -cot x goes to +infinity as x -> pi from below, while -csc x goes to -infinity, so there is a hope that their sum will behave reasonably. Expressing cot and csc in terms of sine and cosine, we get -((cos x) + 1)/sin x, where numerator and denominator both vanish at x = pi. How can we simplify that? We would like the fact that (cos x) + 1 goes to 0 at pi to be a result of the fact that some sine or cosine goes to zero at that point, so that we can hope to cancel such a sine or cosine in the denominator. An expression that represents the zero of (cos x) + 1 at pi as the zero of such a function is the half-angle formula, (cos x) + 1 = 2 cos^2 x/2. So we also apply the half-angle formula to the denominator, writing sin x = 2 sin x/2 cos x/2. Then -((cos x) + 1)/sin x becomes -(2 cos^2 x/2)/(2 sin x/2 cos x/2), which simplifies to -cos x/2 / sin x/2 = -cot x/2. We can check that this is an antiderivative of the original function. (When we differentiate it, we get (1/2) csc^2 x/2 = 1/(2 sin^2 x/2). Using the half-angle formula once more turns this into 1/(1-cos x), our original integrand.) 
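Here is a quick numerical sanity check, in Python, that -cot x/2 really does differentiate back to 1/(1-cos x), including at the formerly troublesome point x = pi (the step size and sample points are my own choices):

```python
import math

f = lambda x: 1.0 / (1.0 - math.cos(x))   # original integrand
F = lambda x: -1.0 / math.tan(x / 2.0)    # candidate antiderivative, -cot(x/2)

# check F'(x) = f(x) by central difference at several points, including near
# x = pi, where the intermediate form -cot x - csc x was undefined
h = 1e-6
for x in (math.pi / 2, math.pi - 0.01, math.pi, math.pi + 0.01, 3 * math.pi / 2):
    deriv = (F(x + h) - F(x - h)) / (2 * h)
    assert abs(deriv - f(x)) < 1e-4
print("F is an antiderivative of f on (0, 2 pi)")
```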
Since it is in fact continuous at x = pi, the definite integral of the original function from pi/2 to 3pi/2 is the difference between the values of this function at those points; and since it equals the result that Stewart's calculation leads to, -((cos x) + 1)/sin x, at pi/2 and 3pi/2, the difference between the values of that function at those two points is the correct integral. But as you pointed out, we couldn't have known that without finding this alternative expression for it. This development suggests another way of doing the original integration: applying the half-angle formula 1 - cos x = 2 sin^2 x/2 to the denominator of the integrand. And in fact, it gets the same result faster. ---------------------------------------------------------------------- You ask whether there is an alternative approach to \int dx / (1-cos x), given in (c) under heading 4 on p.496. Yes. One can turn the "half-angle" formula sin^2 x = (1-cos 2x)/2 backwards, getting 1-cos x = 2 sin^2 x/2, so that the integral becomes 1/2 \int dx / sin^2 (x/2) = 1/2 \int csc^2 (x/2) dx = -cot x/2. (With the help of the formulas in Exercise 59(b) on p. 493, one can, if one wishes, express cot x/2 = (cos x/2)/(sin x/2) as (1 + cos x)/ sin x, thus expressing this integral in terms of the sine and cosine of x.) ---------------------------------------------------------------------- You ask how one can prove that the antiderivative of a function is not elementary (p.499). Interesting question. I don't know the details of the answer, but hunting around, I find that a key step is the result described in http://en.wikipedia.org/wiki/Liouville's_theorem_(differential_algebra) I haven't seen the proof of that result; and it certainly takes work to show that a given function doesn't satisfy the criterion which that result gives; but at least it shows the kind of method used.
I realize that the article takes for granted some other concepts, such as "differential field", which you haven't seen; so it is a still less complete answer for you than for me. If you have questions about it, ask me at office hours. Anyway, I'm impressed that the result was proved as far back as the early 1800's. ---------------------------------------------------------------------- In connection with the introductory paragraphs of section 7.7 (p.506), you ask how we know that functions which we can't integrate explicitly do have integrals. Theorem 3 on p. 373 shows that any continuous function on a closed interval has an integral. The proof would be hard to give using the material of this course; but you'll certainly see it proved if you take Math 104. In that result, "integral" is defined as a certain sort of limit of partial sums. If you were thinking of "integral" as meaning "antiderivative", then you can get the existence of that by combining the above result with the Fundamental Theorem of Calculus: p. 388. ---------------------------------------------------------------------- You ask what to look for in deciding which method of approximate integration (pp.506-515) to use. The progression in this section of the book is from the naive to the more sophisticated; so if you really wanted to do a computation, the best of the techniques described in the section would be one we will see in Wednesday's reading, "Simpson's rule". On the other hand, the easiest ones to remember, and to apply in a "scratchwork" first-approximation, are the ones in today's reading, the easiest being those at the start. We'll see on Wednesday why some approximation methods work better than others. ---------------------------------------------------------------------- You ask about the relative values of Riemann sums versus the methods of approximate integration described in section 7.7 (pp.506-515). 
The biggest value of Riemann sums is not in practical computations, but in developing the theory of integration, something you will see if you take Math 104. The formal definition of an integral says that \int_a^b f(x) dx exists and is equal to A if the Riemann sums approach A no matter what "sample points" x*_i one takes in the various subintervals, and no matter what subdivisions of [a,b] into equal or unequal subintervals one uses, as long as one lets the lengths of these subintervals approach 0. (See p. 373, Note 4 for divisions into unequal subintervals. Stewart justifies these in terms of a practical application, but they are essential to the general development.) In Math (H)1AB, results we prove about integrals are based on the fact that if f is continuous, integration of f gives an antiderivative of f; so one can prove results such as the formula for change of variables using results on differentiation. But in the general theory, one integrates functions that may not be continuous, and in that case, the integral may not be differentiable; so one has to develop the general theory of integration without relying on the theory of differentiation. Then results like the change-of-variable formula require the general definition of a Riemann integral based on not-necessarily-equal subdivisions. (Because when one changes variable, equal subdivisions generally become unequal.) ---------------------------------------------------------------------- You ask about approximate integration (introduced on p.507) in the case of integrals over infinite domains. Good question!
For this to make sense, the function must go to zero fairly rapidly as x --> infinity (and/or minus-infinity, as the case may be), and it may be possible to show that the integral from a certain point on will be less than some small constant, and then apply one of the methods of this section to the integral over the remaining finite region; and then, adding the two bounds, get an error bound for the difference between the approximation of the integral over the finite region and the exact integral over the whole region. Alternatively, one can make a change of variables that converts the integral over the infinite region into an integral over a finite region. E.g., given the integral of a function as x ranges from minus-infinity to infinity, we might substitute x = tan \theta, and integrate from \theta = -\pi/2 to \pi/2. Integrals over infinite domains, and integrals of functions that go to infinity in places, will be looked at in the reading after next; but Stewart doesn't talk about approximating their values, except in one exercise, number 70 on p. 529. ---------------------------------------------------------------------- I hoped to answer your question in class, but didn't have time. You asked why, as stated in Stewart on p.509, the midpoint rule tends to be more accurate than the trapezoidal rule. To see it visually, consider the case where the function is y = x^2, and the interval [x_{i-1}, x_i] is [-1,1]. The picture one gets in comparing the two rules sits in a box of height 1 and length 2, and because of the way the parabola is curved, you can see that the area under the parabola is less than half the area of the box. (It is, in fact, 2/3, which is 1/3 of the area of the box.) Now the midpoint rule for this picture gives 0, while the trapezoidal rule gives the full area of the box, 2; so the midpoint rule is closer. The picture given on the right-hand side of Figure 5, p. 509, is like this, but shifted in several ways that don't affect the essential point.
(It is shifted horizontally, so that the midpoint is not necessarily zero; it is rescaled horizontally, so that x_i - x_{i-1} need not be 2; it is rescaled vertically (in this case, by a negative factor, so that the curve bends downwards rather than upwards); it is shifted upward, and finally, a linear function is added, so that the slopes of BC and QR need not be 0.) But it is still true that the pink area, representing the error of the midpoint rule, is about half the blue area, the error of the trapezoidal rule. ---------------------------------------------------------------------- You ask why, in Figure 5 on p.509, BC has midpoint P and is tangent to the curve at that point. The point P is taken to be the point on the curve above \bar{x}_i, and Stewart defines BC to be the tangent to the curve at that point, extended to meet the vertical edges of the rectangle. Now \bar{x}_i is defined to be half-way between x_{i-1} and x_i (their midpoint); so P will be the point of BC whose x-coordinate is half-way between the x-coordinates of its endpoints; hence P is the midpoint of BC. The one thing that is not so obvious is why BC is shown as almost parallel to QR. That can be deduced from the fact that for x_i - x_{i-1} small, the curve looks almost like a parabola, as I discussed in class. And for a parabola, you can verify that the two lines will be parallel. (I.e., that the deviations of the height of the curve from BC will be equal at the two ends.) ---------------------------------------------------------------------- Regarding the results on p.510 and p. 514, you ask "Why are error bounds necessary?" This comes down to the question "Do the Midpoint Rule, the Trapezoidal Rule and Simpson's Rule actually work? Always? If not always, then when ...? And in what sense do they `work'? I.e., how close can we be sure that the answers they give come to the real values of the integrals?"
One can't simultaneously answer these questions for all conceivable functions; but the error bounds in the text do answer them for large classes of functions: Functions which are twice (or in the case of Simpson's Rule, four times) differentiable, and where we know bounds on the derivative in question. ---------------------------------------------------------------------- You ask about the relation between the error estimates for the midpoint and trapezoid laws (p.510) and the actual errors in the case of particular functions. The error estimates are the maximum values that the absolute value of the error might achieve, given our knowledge of the second derivative of f. Particular functions for which the absolute value of the second derivative is everywhere \leq a given constant K may have E_T and E_M anywhere between the values shown in the "error bound" formulas and their negatives, including the value 0. If we bring in more information about a particular function, we may, of course, be able to prove that the error actually has a lower value than the one given by those estimates. But we may not have such extra information, or we may have it but it may be a lot of work to see what we can prove from it, or it may lead to still more time-consuming calculations. So, while under some circumstances it may be worth making further calculations based on more information, in this section we are learning some convenient facts provable for every function for which we can bound the second (or, later in the section, the fourth) derivative. ---------------------------------------------------------------------- You ask about the geometric interpretations of K and (b-a)^3 in the error bounds on p.510. If K = 0, then f has zero second derivative, hence constant first derivative, hence is a straight line, and in that case, the midpoint rule and trapezoid rule give the exact integral.
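That K = 0 case is easy to check numerically. Here is a quick sketch of my own (not from Stewart), in Python, applying both rules to a linear function:

```python
# A sketch (my own, not from the text): for the linear function
# f(x) = 3x + 1 on [0, 2], whose second derivative is 0, the midpoint
# and trapezoidal rules reproduce the exact integral, 8, for every n.

def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

f = lambda x: 3 * x + 1
for n in [1, 7, 50]:
    print(n, midpoint(f, 0.0, 2.0, n), trapezoid(f, 0.0, 2.0, n))
# both columns equal 8 up to rounding, whatever n is
```

So when K = 0 the error bounds K(b-a)^3/(12n^2) and K(b-a)^3/(24n^2) are 0, and the rules are in fact exact.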
The larger K is, the more "curved" the graph is, and hence the larger the shaded areas in the diagram at the bottom of p. 509 can be. So the larger K is, the larger the error can be (unless we compensate by making a finer subdivision, i.e., larger n). As for (b-a)^3, note that if we, say, double the length of the interval we are using, but keep n the same, then the base of each rectangle doubles. One can deduce that since the curve looks, locally, like a bit of a parabola, doubling the length and keeping K the same causes the shaded areas in those diagrams to be multiplied by 2^3; so generally speaking, the error will grow with the cube of b-a. Or, to look at it another way, if we want to double b-a but keep the bases of our rectangles the same, we need to double n at the same time. Then our bound on the error in each interval will stay the same, but there will be twice as many intervals, so our bound on the error should double. This means that if we multiply b-a by 2 and compensate by multiplying n by 2, our bound will (merely) be multiplied by 2. Given that n appears to the second power in the denominator, b-a must appear to the 3rd power in the numerator. This second approach shows that in any such error bound, the exponent of b-a should be one more than the exponent of n in the denominator; and the bound for Simpson's Rule on p. 514 indeed follows this pattern. ---------------------------------------------------------------------- You ask whether the error formulas on p.510 also apply to the errors in the left and right endpoint approximations. No. To see this, notice that the midpoint and trapezoidal approximations have zero error for linear functions, f(x)=Ax+B, which is consistent with the formulas given, since such functions have second derivative 0, so that one can take K = 0. But the left and right endpoint approximations have nonzero errors for linear functions, so the error estimate in question can't be correct for them. 
In fact, one can get an error estimate for those two approximations in which K is taken to be an upper bound for the first derivative of f. In this estimate, n will appear to the first power in the denominator, rather than the second power; so as one takes smaller and smaller subdivisions, these approximations improve less quickly than the midpoint and trapezoidal approximations. ---------------------------------------------------------------------- You ask about the different derivatives that occur in the error estimates for the different approximation rules (pp.510 and 514). In my class discussion on Monday, I very roughly sketched why different order derivatives occur in these rules. The midpoint and trapezoidal rules would give exact answers if the curve were a straight line, and the picture that Stewart gave, and that I showed on the board, indicates, roughly, that the error is proportional to the failure to be a straight line, i.e., the failure of the first derivative to be constant, i.e., to the second derivative. But this affects the midpoint and trapezoidal rules in opposite ways, and when one combines them in Simpson's rule, the effects cancel out -- if the curve has some constant second derivative, then Simpson's rule gives the exact answer. So the error in Simpson's rule depends on higher derivatives. As I said in class, one might expect it to depend on the third derivative; but because of the symmetric way the rule works, a function having the sort of symmetry that odd functions show (but with respect to the midpoint of the interval rather than with respect to 0) has no effect on the Simpson's Rule estimate or on the integral itself; and functions with constant third derivative are gotten from functions with constant second derivative by adding a multiple of x^3, an odd function; so the error in Simpson's rule somehow depends on the nonconstancy of the 3rd derivative, which depends on the 4th derivative.
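One can see the cancellation just described in a small computation of my own (Python, not from the text): composite Simpson's rule reproduces the integral of a cubic exactly, even though a cubic's second derivative is not constant.

```python
# Sketch (mine): Simpson's rule is exact for cubics.  For f(x) = x^3 on
# [0, 2] the integral is 4, and Simpson's rule with only n = 2 already
# gives it exactly.

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

val = simpson(lambda x: x ** 3, 0.0, 2.0, 2)
print(val)   # 4.0
```

This is the numerical counterpart of the statement that Simpson's rule only starts to err when the 4th derivative is nonzero.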
---------------------------------------------------------------------- You ask about the "K" in the error estimates for the various approximation rules (pp.510, 514). The idea of the error estimates is that if the function f doesn't vary too "wildly", the results of the approximation formulas will be close to the actual value of the integral. The way the function varies depends on its derivatives; so if we know that one or another derivative never exceeds a certain value, then we can say that the error in the result of applying the approximation formula will not exceed a value computed from this. For instance, in box [3], p. 510, "K" denotes any number that we know |f''| never exceeds in the given interval; if we know that, we get the bounds on the errors shown in the last line of that box. ---------------------------------------------------------------------- You both ask why in the first display on p.512, the B disappears after the first step. Stewart answers this in the left-hand margin, saying "Here we have used Theorem 5.5.7". To find that theorem, turn to section 5.5, and look for the boxed theorem number "[7]". (That takes a bit of looking, but it's on p. 412.) After checking that theorem, do you see the explanation? ---------------------------------------------------------------------- You ask how we know that S_{2n} = (1/3) T_n + (2/3) M_n (p.513, last displayed equation). Write out the formulas for S_{2n}, T_n, and M_n, and see how they are related. The \Delta x in S_{2n} will be different from that in T_n and M_n; so you might use some symbol such as c for (b-a)/2n, and then write the x_1, x_2 etc. of the midpoint rule, the trapezoidal rule, and Simpson's rule in terms of a and c; and see what happens when you write down the formulas for S_{2n} on the one hand, and (1/3) T_n + (2/3) M_n on the other. 
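If you'd like a numerical confirmation before (or after) doing that algebra, here is a small Python sketch of mine checking the identity on one sample function:

```python
# Numerical check (my own sketch) of the identity
# S_{2n} = (1/3) T_n + (2/3) M_n, using f(x) = 1/(1+x^2) on [0, 1]
# with n = 5 (so Simpson's rule uses 10 subintervals).

def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f = lambda x: 1.0 / (1.0 + x * x)
n = 5
lhs = simpson(f, 0.0, 1.0, 2 * n)
rhs = trapezoid(f, 0.0, 1.0, n) / 3 + 2 * midpoint(f, 0.0, 1.0, n) / 3
print(lhs, rhs)   # the two values agree up to rounding
```

Of course, a numerical check on one function is not a proof; the algebra sketched above is.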
---------------------------------------------------------------------- You ask whether there is any way to use Simpson's Rule (p.513) with an odd value of n. For Simpson's Rule itself, n has to be even. But there are other expressions in f(x_0), ..., f(x_n) that give equally good or better approximations, which can be applied with n even or odd. It's harder to discover these other methods than Simpson's rule, which may be why they are not mentioned in calculus texts. If I ever have time to get to the heading "Is Simpson's Rule optimal" that I had on the agenda today, I'll say a little about this. ---------------------------------------------------------------------- You ask whether there is a rule more accurate than Simpson's rule (p.513). Yes; in two ways. On the one hand, the string of coefficients used in Simpson's rule, "1, 4, 2, 4, ..., 4, 2, 4, 1", although relatively easy to come up with, is not really the best possible. There are strings of coefficients that actually yield somewhat better approximations of the integral, but if presented at this level, they would seem to be "pulled out of a hat". On the other hand, Simpson's rule is based on approximating our function by parabolas on successive segments. If one instead approximates it by higher degree curves, one can get estimates that improve, as n increases, still faster than Simpson's Rule does, just as the Simpson's Rule approximation improves faster than the Midpoint and Trapezoid Rules. ---------------------------------------------------------------------- I hope my discussion in lecture clarified the point you asked about. Where Stewart writes "for a \leq x \leq b" in the first sentence of the error bound statements on p. 510 and p.514, this is formally ambiguous between "for all x satisfying a \leq x \leq b" and "for some x satisfying a \leq x \leq b"; but what is meant is "for all" in both cases.
If we merely knew that the second or fourth derivative of a function was \leq K at _some_ point of our interval, this wouldn't give much information on how the function behaved in the interval as a whole, and so wouldn't allow us to get an error bound. Only from knowing that it is \leq K at _all_ x in the interval can we draw conclusions limiting how badly f can "stray" from its expected behavior. So if we took for K the smallest value f'' (or f'''') attained, the bounds could not be true -- we must take the largest value it attains, or more generally (if we can't be sure of the largest value) anything we know is at least the largest value. As Stewart says in the margin on p. 510, smaller values of K give better bounds -- but these must still be taken from among values of K with the property of being larger than f''(x) for _all_ x in the interval. (There in the margin, he does explicitly say "for all".) ---------------------------------------------------------------------- You ask why the error bound for Simpson's Rule (p.514) involves the fourth derivative instead of the second. Simpson's Rule is set up so that it will give exactly the right value if the curve is a parabola; and one can show that it will even give the right value for f(x) a polynomial of degree 3. (This is like the fact that the Midpoint Rule, although set up so that it would give the exact answer for a function that is constant on each segment [x_i, x_{i+1}], turns out to give the right answer for any function whose graph is a straight line on each of those segments.) The polynomials of degree 3 are the functions whose 4th derivatives are 0; so to measure how Simpson's Rule fails to give the right answer for a given function, one looks at how that function fails to have 4th derivatives 0. If that failure can be bounded, by showing that the function has 4th derivative of absolute value everywhere \leq K, then we get a bound on the error in Simpson's Rule. 
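As a concrete illustration (my own example, not Stewart's): for f(x) = e^x on [0, 1] we may take K = e, since |f''''(x)| = e^x \leq e there, and the computed Simpson errors indeed stay under the bound K(b-a)^5/(180 n^4).

```python
import math

# Sketch (mine): comparing the actual Simpson's-rule error for
# f(x) = e^x on [0, 1] (exact integral e - 1) against the error bound
# K (b-a)^5 / (180 n^4) with K = e.

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

exact = math.e - 1
for n in [2, 4, 8]:
    err = abs(simpson(math.exp, 0.0, 1.0, n) - exact)
    bound = math.e / (180 * n ** 4)
    print(n, err, bound)   # in each row, err comes out below bound
```

Notice also how quickly both columns shrink as n doubles: roughly by the factor 2^4 = 16 that the n^4 in the denominator predicts.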
(If, however, one has a function for which a bound on the 2nd derivative is known, but one doesn't have a way of bounding its 4th derivative, one could use the fact that S_{2n} = 1/3 T_n + 2/3 M_n, and estimate the error in S_{2n} using the second-derivative estimates on the errors in T_n and M_n. So from that point of view, the 4th-derivative error bound for S_{2n} isn't the only one that it is possible to use. But the estimate based on the 2nd derivative would not decrease nearly as fast, as n grows, as the estimate based on the 4th derivative does; so the latter is the more useful tool.) ---------------------------------------------------------------------- You ask why the error bounds for the various approximate integration rules given on p.510 and p.514 involve b-a. The presence of b-a means that the larger the interval over which the integral we want to approximate is taken, the larger the error may be, if other things are equal (the number of segments into which we divide it, and the second or fourth derivative of the function.) Another way to look at this is to recall that \Delta x = (b-a)/n. Then the formulas become |E_T| \leq K n (\Delta x)^3 / 12, |E_M| \leq K n (\Delta x)^3 / 24, |E_S| \leq K n (\Delta x)^5 / 180. You might find this form easier to understand. But Stewart gives the form in terms of b-a and n, because in a given situation, b-a will generally be given, and n is what we have to choose, and choose large enough to keep the error below a given value. (Still another way to express these formulas would be in terms of b-a and \Delta x. Those would show how small we have to take \Delta x to get a given accuracy.) ---------------------------------------------------------------------- You ask about terms like "infinite discontinuity", used by Stewart on p.519. As I said in class, this is one point on which I strongly disagree with his usage. 
A point x = a which he describes that way I (and most mathematicians) would describe as a point such that f(x) is unbounded in the neighborhood of x = a. ---------------------------------------------------------------------- You ask "Is it ever necessary to use the precise definition of the limit that we just learned in section 2.4 in improper integrals?" (pp.520-526) Generally speaking, the precise definition is used to prove various properties (such as the limit laws for sums, products, etc.), and these are what we use in other tasks, such as evaluating improper integrals. I can't say "never", but I think it will be rare to have to go back to the definition of limit when working with improper integrals. ---------------------------------------------------------------------- You ask, in the case of an integral from -infinity to +infinity, and its expression as the integral from -infinity to a plus the integral from a to +infinity (p.520), "If one of the integrals diverges, does the whole integral, from negative infinity to positive infinity, diverge?" Right. The whole integral only converges if both parts do. ---------------------------------------------------------------------- You ask about Stewart's statement on p.521 that 1/x has divergent integral from 1 to infinity because it doesn't approach zero "fast enough". Well, for easier understanding, let us think about adding up a series rather than integrating a function. If a man agrees to give you a pound of cheese the first day, half a pound the second day, a third of a pound on the third day, etc., then if he continued this forever, he would give you, in the long run, an infinite amount of cheese. (But it would take a long long time to see a "large" amount of cheese -- I think that to get a mere 10 pounds, you would have to wait something like 34 years. In real life, the amount he would give you would be limited by his and your lifetimes.
And even if you lived forever, after trillions of years the amount he would give you per day would be less than a molecule, so the arrangement would no longer make sense. But in abstract mathematical terms, it is true that the series 1 + 1/2 + ... + 1/n + ... does have limit infinity.) On the other hand, if he started with two pounds of cheese, and gave you half of it (one pound) the first day, half of what was left (half a pound) the second day, and half of what remained on each succeeding day, then even going off to eternity, he would never give you more than the two pounds he started with. The difference is that the quantity of cheese he gives per day in the second scenario tapers off faster than in the first. So to understand in intuitive terms the difference between a convergent series -- or integral -- and one that does not converge, one can speak of "how fast" the function approaches zero. I hope this helps. ---------------------------------------------------------------------- Concerning the comparison of \int (1/x^2) and \int (1/x) on p.521, with the former converging, but not the latter, you write > Stewart offers the explanation that $1/x$ does not approach zero > fast enough to converge. But, given enough time (we are going to > infinity), wouldn't the function approach zero like $1/x^2$ ... ? If we were just interested in whether the function approached zero, this idea would be valid; but we are interested in how the integrals behave, and the "time" it takes to get to a given point comes into the calculation of the integral; the "time" is, roughly, the base of a rectangle whose area is base x height.
If the time it took for one (positive decreasing) function to get down to a certain low value were merely a constant multiple of the time it took another to get there, your argument would still be valid; but, for instance, the time it takes 1/x to get down to 1/4 is twice the time it takes 1/x^2 to do so, the time it takes it to get to 1/16 is four times the time it takes 1/x^2 to do so, the time it takes it to get to 1/64 is 8 times the time it takes 1/x^2 to do so, and so on; and the end result is that the areas involved in the calculation of the integral of 1/x can add up to infinity, while those involved in the calculation of the integral of 1/x^2 don't. ---------------------------------------------------------------------- You quote the third paragraph on p.521 as saying that "1/x^2 is finite but 1/x is not". But this is not what it says! It says that the integral of 1/x^2 (from 1 to infinity) is finite, but the integral of 1/x (over the same range) is not. There's a world of difference! The visual way that Stewart expresses this is to say that the area under one curve (as one goes out to infinity) is finite, while the area of the other (in the same sense) is infinite. Since we can't see that whole infinite stretch of area, it is hard to visualize what the difference is. But calculation shows that it is so. ---------------------------------------------------------------------- You ask how, in the third display of the solution to Example 2 on p.521, Stewart gets from one step to the next. Note the words before that calculation: "... by l'Hospital's Rule we have". Did you learn l'Hospital's Rule in your AP calculus course? If you didn't, or if you are not sure you remember it in full, look up "l'Hospital's Rule" in the index of this text, and review it there. Of course, if you have any questions about how it works, you can e-mail them to me. 
Points to be learned from this: Stewart often explains his calculations, so if you don't understand one, look at the words that precede or follow it, or sometimes (though not this time) words he puts in the margin by the calculation. And if he refers to some topic you are not sure about, use the index. ---------------------------------------------------------------------- You ask whether it is possible for a function that converges when x --> infinity to have an integral which diverges when x --> infinity, and vice versa. Yes. The book gives examples; e.g., 1/x converges to 0 (i.e., approaches the limit 0) as x --> infinity, but as noted on p.521, its integral does not converge. [Perhaps you were using "f(x) converges" to mean "f(x) has convergent integral". That is not how the word is used, but if it is what you meant, you can get an example by taking the derivative of the above example.] The opposite situation is much less common, but it does happen; an example is given in Question "81(a)" to section 7.8, on the latest homework-sheet (but the question is not assigned). Question "81(b)" then challenges you to find an even more extreme sort of example (which you can do by carrying the idea of what happens in 81(a) further). ---------------------------------------------------------------------- You ask whether, in Example 3 on p.522 we couldn't just find the integral by integrating from -t to +t and taking the limit as t --> infinity. We could if we knew that the integral existed! But Exercise 61 on p. 528 (listed in the "interesting/challenging" category in this week's homework sheet) shows that the limit might exist even if the integral does not. So we need to do the two integrations to be sure the integral is defined. ---------------------------------------------------------------------- Regarding Exercise 31 in section 7.8, you write > ... I concluded that since 1/t^3 is different at t=0 from plus > side and negative side, the integral is divergent. ... 
No, that's not the criterion for divergence! Look at part (c) of the Definition on p.523. It doesn't say that the integral is convergent if the integrals from the left and from the right are equal; it says it is convergent if each of them is a convergent integral, and in that case its value is their sum. The reason this example is divergent is that the one-sided integrals are themselves divergent: 1/t^3 blows up too fast as t approaches 0 from either side. ---------------------------------------------------------------------- You ask how we define the integral of a function over a range from a to b if it "blows up" at both ends, a case left out in the definition on p.523. Good question. The idea is exactly the same as in part (c) of the definition on p. 520, which does the same for integrals from -infinity to +infinity: break the integral into two parts, each of which is improper only at one end, and define the integral over the whole interval to be their sum. ---------------------------------------------------------------------- Concerning the warning Stewart gives on p.524(bottom)-525(top) at the end of example 7, you ask why one can't evaluate the definite integral "the ordinary way" in that case. The ordinary method of evaluating the definite integral is based on finding an antiderivative (in this case, ln |x|), and using part 2 of the Fundamental Theorem of Calculus (p. 391) to deduce that the difference of its values between the endpoints equals the definite integral. The Fundamental Theorem of Calculus is stated for continuous functions defined on an interval. 1/x is in Stewart's language discontinuous; in mine it is not defined on the whole interval, and in fact has a pole (singularity) at x = 0. Whichever way one says it, the Fundamental Theorem of Calculus is not applicable to such functions.
Intuitively, if you think of trying to "add up little bits of" f(x), this process does not converge, so it doesn't make sense to say that the result of the process is the difference in the values of ln |x| between the endpoints. ---------------------------------------------------------------------- You ask whether the Comparison Test for improper integrals (p.525) can be inconclusive. Certainly. If we want to know whether a positive-valued function a(x) has convergent integral, and we choose some function f(x) that is greater than it, then if the integral of f(x) is divergent, that leaves it open whether the integral of a(x) converges or diverges. Likewise, if we choose a positive-valued function g(x) that is less than a(x), then if the integral of g(x) is convergent, that leaves it open what the integral of a(x) does. It is only if we can find an f(x) larger than a(x) whose integral converges, or a positive g(x) less than a(x) whose integral diverges, that the Comparison Test helps. ---------------------------------------------------------------------- You ask why, in Example 9 on p.526, the author starts by breaking the integral from 0 to infinity into the sum of the integral from 0 to 1 and the integral from 1 to infinity. Because the comparison test (as he has stated it) requires the assumption that f(x) \leq g(x) for all x in the interval considered. The relation x \leq x^2, and hence its consequence e^{-x^2} \leq e^{-x} (the inequality the comparison actually uses), holds for all x\geq 1, but not for x\in(0,1), so he cannot apply the comparison test to the integral from 0 to infinity, but only to the integral from 1 to infinity. ---------------------------------------------------------------------- > ... you said that integrals that are improper on both sides > should NOT be done like this: lim t->infinity, integral of f(x) from > -t to t. Why is this wrong? Well, if the doubly improper integral does converge, that calculation will give the correct answer.
But it can also give an answer if the doubly improper integral does not converge. For instance, as pointed out in Stewart's exercise 61 on p.528, it gives the value 0 for \int -infinity infinity x dx. If you agree that that answer is nonsense, good. If you think maybe that integral should be considered 0, then note that this answer isn't preserved under change of variables: If we let x = y+1, then that integral becomes \int -infinity +infinity (y+1) dy, and the same method of evaluation for it gives +infinity instead of 0. I'll give another such example in class, when I have time to get back to this unfinished discussion. ---------------------------------------------------------------------- You ask about finding an upper bound on arc length, to go with the lower bound used by Stewart to approximate the length on p.538. Well, if you have a curve that is convex in one direction or the other, and you draw tangents to it at a sequence of points, the first of which is the starting-point of the curve and the last of which is the end-point, and extend each pair of successive tangents till they meet, you get a polygon that is (roughly speaking) circumscribed around the curve; and its length will be an upper bound on the length of the curve. If a curve doesn't have that convexity property, but can be broken up into finitely many pieces that do (e.g., y = x^3 on [-1,1], which is convex upward for positive x, and downward for negative x), then you can bound the lengths of those pieces as above, and add up the results to get a bound on the total length. These cases cover most curves that are easily described. But there are, nonetheless, curves that are "wiggly" at every scale, so that one can't use this method to get upper bounds on their lengths, though you can still use polygonal approximations as in Stewart to bound it from below. So this bound is not as robust as that one. 
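Here is a small Python sketch of my own carrying out both bounds for one convex curve, y = x^2 on [0,1]: chords give the lower bound, and the circumscribed tangent polygon gives the upper bound (using the fact, special to this curve, that the tangents at x_i and x_j meet at the point ((x_i + x_j)/2, x_i x_j)).

```python
import math

# Sketch (mine) of lower and upper bounds for the length of y = x^2
# over [0, 1]: inscribed chords from below, circumscribed tangents
# from above.

f = lambda x: x * x          # the convex curve y = x^2

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

n = 100
xs = [i / n for i in range(n + 1)]
pts = [(x, f(x)) for x in xs]

# inscribed polygon (chords): a lower bound for the arc length
lower = sum(dist(pts[i], pts[i + 1]) for i in range(n))

# circumscribed polygon: for this curve, consecutive tangents at
# x_i and x_j meet at ((x_i + x_j)/2, x_i * x_j)
corners = [pts[0]] + [((xs[i] + xs[i + 1]) / 2, xs[i] * xs[i + 1])
                      for i in range(n)] + [pts[-1]]
upper = sum(dist(corners[i], corners[i + 1]) for i in range(len(corners) - 1))

print(lower, upper)   # the true length, about 1.47894, lies between them
```

As expected for a convex curve, the two bounds squeeze the true length between them, and they approach each other as n grows.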
---------------------------------------------------------------------- You ask whether Stewart's notation |P_{i-1}P_i| for the distance from P_{i-1} to P_i (pp.538-539) is standard. No it isn't, to my knowledge. It is common to use absolute value signs for the magnitude of a vector, so I think Stewart's idea is to use P_{i-1}P_i to mean the vector from the point P_{i-1} to the point P_i. Such notation might be used in physics, but I think they would put an arrow above P_{i-1}P_i to express "vector". Another notation would be |P_i - P_{i-1}|, where P_i - P_{i-1} would denote the vector whose coordinates are gotten by subtracting the coordinates of P_{i-1} from those of P_i. A simpler notation, which you are likely to see in Math 104, is d(P_{i-1}, P_i), where d(--,--) is the "distance function". ---------------------------------------------------------------------- You ask about the top displayed formula on p.539. As the sentence before it says, we get this from the Mean Value Theorem. If you don't remember that theorem from your preceding calculus course, you should have looked it up in the index! Did you? Let me know whether, when you look it up, it answers your question. ---------------------------------------------------------------------- You ask why the last displayed formula before Definition 2 on p.539, and the formula before that, are equal, as Stewart claims, "by the definition of a definite integral". Well, what definition of definite integral have you seen? If the one that you saw doesn't seem to connect with what he says, you should see what definition he gives. Using the index and a bit of searching, one finds the definition on p. 372, in the box at the top of the page and the comments that follow. Do you agree that the result he claims does follow from that definition? If that definition differs greatly from the one you saw in your previous course, let me know what the definition you saw was. 
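To make that definition concrete, here is a sketch of mine (in Python): Riemann sums for \int_0^1 x^2 dx, with a randomly chosen sample point in each subinterval, still approach the exact value 1/3 as the subdivision is refined.

```python
import random

# Illustration (my own, not from the text): Riemann sums for
# \int_0^1 x^2 dx = 1/3 approach 1/3 no matter which sample point is
# chosen in each subinterval -- here a random point in each.

random.seed(0)

def riemann_sum(n):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x_star = (i + random.random()) * h   # arbitrary point in [ih, (i+1)h]
        total += x_star ** 2 * h
    return total

for n in [10, 100, 10000]:
    print(n, riemann_sum(n))
# the printed values approach 1/3 as n grows
```

Replacing `random.random()` by `0.5` (midpoints) or `1` (right endpoints) gives other legitimate choices of sample points, all with the same limit.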
---------------------------------------------------------------------- Regarding the concept of surfaces of rotation on pp.538-543, you ask whether one can do a "double rotation" of a curve, rotating it first about one axis and then about the other. Well, one could; but since the first rotation would give a two-dimensional structure (a surface), the second would turn this into a three-dimensional structure (a solid). The boundaries of this solid would be hard to describe, so the volume would be hard to compute as well. But it's interesting to think about what one would get if one simply took a point, rotated this around the x-axis to get a circle, and rotated that circle around the y-axis to get a surface. (That is not an example of what is done in this reading, because the circle one would rotate around the y-axis would not lie in the x-y-plane, as the curves considered in this reading do.) Can you figure out what the resulting surface would be? A different way one could do two rotations would be in more than 3 dimensions. For instance, in 4 dimensions, calling the coordinates w, x, y and z, one could start with any curve in the x-y-plane, rotate this by performing a rotation on the y-z-coordinates (as one does in this section) getting a surface in x-y-z space, then perform another rotation on the w-x-coordinates, getting a 3-dimensional hypersurface in the whole w-x-y-z space. I don't think it would be hard to compute its volume; but it's outside the scope of this course. ---------------------------------------------------------------------- You ask how Stewart goes from the formula in the 3rd display on p.539, which involves f'(x*_i), to the integral in the 4th display, which involves f'(x). Stewart is using the definition of the integral -- see the boxed definition on p. 372. If you have questions about that definition, let me know.
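The passage from the sum involving f'(x*_i) to the integral can also be watched numerically. Here is a minimal sketch in Python (the curve y = x^{3/2} and the midpoint choice of sample points are my own, purely for illustration): as n grows, the Riemann sum approaches the exact arc length \int_0^1 \sqrt{1 + f'(x)^2} dx = (13\sqrt{13} - 8)/27.

```python
import math

def arc_length_riemann(fp, a, b, n):
    """The Riemann sum  sum_i sqrt(1 + f'(x_i*)^2) Delta x  from
    Stewart's derivation, with midpoints as the sample points x_i*."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx   # sample point in the i-th subinterval
        total += math.sqrt(1 + fp(x_star) ** 2) * dx
    return total

# y = x^(3/2) on [0, 1], so f'(x) = 1.5 sqrt(x); the arc length
# integral here has the closed form (13*sqrt(13) - 8)/27.
exact = (13 * math.sqrt(13) - 8) / 27
approx = arc_length_riemann(lambda x: 1.5 * math.sqrt(x), 0.0, 1.0, 1000)
```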
---------------------------------------------------------------------- You ask why in the arc length formula on p.539, f' is required to be continuous. This is so that we can be sure that the integral in the definition of the arc-length is defined. Discontinuous functions may or may not be integrable -- this is a difficult topic, dealt with starting in Math 104. Continuous functions are always integrable. Curves with discontinuous derivatives may or may not have finite arc-length. ---------------------------------------------------------------------- You ask whether Simpson's rule (suggested on p.541) would be more accurate for estimating arc-lengths than simply adding up the distances of the line-segments in the same subdivision of the interval of definition. I don't know for sure, but my guess is that Simpson's rule would be more accurate. The line-segment computation has a built-in bias -- it gives smaller values than the real distance, because each line-segment is the shortest distance between its endpoints, hence shorter than the segment of the actual curve. Simpson's rule is designed to make two sorts of biases, those of the midpoint and trapezoid rules, cancel each other to a large extent. However, since these are both different from the bias of the line-segment approximation, one would have to do some calculation to answer your question with more certainty. ---------------------------------------------------------------------- You ask regarding Example 3 on p.541, "Does an arc length function exist for the hyperbola xy = 1, or is it always necessary to estimate it?" The function exists -- but I believe it is not an elementary function. Some non-elementary functions can be found in tables. It is also often possible to calculate them accurately by other means, such as power series expansions. Methods of estimation such as those we read about in section 7.7, which Stewart uses in this example, are yet another method.
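The sort of estimate Stewart makes in Example 3 can be reproduced in a few lines of code. This is a sketch in Python; the choice of Simpson's rule with n = 10 subintervals on [1, 2] is mine, for illustration, and the integrand sqrt(1 + 1/x^4) comes from y = 1/x as above.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson's rule with n subintervals (n must be even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3

def ds_dx(x):
    # For y = 1/x we have y' = -1/x^2, so sqrt(1 + (y')^2) = sqrt(1 + 1/x^4).
    return math.sqrt(1 + x ** -4)

# Arc length of xy = 1 from (1,1) to (2,1/2), estimated numerically.
length = simpson(ds_dx, 1.0, 2.0, 10)
```

The answer comes out a little over 1.13; increasing n changes it only in later decimal places, which is the sense in which a non-elementary arc length function can still be computed to any desired accuracy.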
---------------------------------------------------------------------- I hope what I said in class helped with your question about Figure 7 on p.542. The triangle shown is any right triangle with sides parallel to the x- and y-axes, and hypotenuse tangent to the curve at the point (x,y). From this, I hope you can see that for ds and dx as in that picture, the ratio ds/dx represents the "speed" with which the length of the curve is growing relative to x at that point. Imprecisely, but intuitively, the hypotenuse of that triangle can be thought of as an infinitely small segment of the curve, infinitely magnified. Whether that viewpoint helps you is yours to decide; if not, ignore it! ---------------------------------------------------------------------- You ask about Figure 7 on p.542. Well, although historically, "dx", "dy" etc. meant "infinitely small changes", this was hard to give precise meaning to; so various ways of making it precise were developed. The one which Stewart follows is to let dx denote any nonzero real number, then let dy denote the amount that y would increase if x increased by dx while the function continued to increase at a constant rate, rather than letting its rate of change vary as it actually does. In the picture, this is shown by the red line tangent to the curve: its x-coordinate increases by dx, its slope is constant, equal to the slope that the curve has at the starting point of that line, rather than changing as the slope of the curve does; and the increase in the y-coordinate of that red line is called dy. Finally, its length is called ds, which likewise represents the amount by which the length of the curve would increase if its slope remained constant rather than changing. The Pythagorean Theorem, applied to that picture, gives formula [8]. Something that may not be clear from the picture is that the line labeled "dy" is intended to label the height of the vertical red line.
It is not shown right against that line because then it would run into the label "Delta y". In the above discussion, I've tried to strike a balance between precise and intuitive explanation. I hope it has helped. If Figure 7 still remains a mystery, I hope the rest of what Stewart says makes formulas [5]-[8] reasonable. ---------------------------------------------------------------------- As you say, if we compute the arc length of the function in Example 4, pp.542-543, from x=0, we would get an infinite result, since as x -> 0, the logarithm function approaches -infinity, so x^2 - (1/8) ln x approaches +infinity. This shows that the graphs on p. 543 are badly drawn! Stewart was just interested in values of x \geq 1, but he should have had the graph drawn correctly for all values it shows. Thanks for bringing this to my attention -- I'll include it in the list of comments and corrections I send Stewart at the end of the semester! ---------------------------------------------------------------------- You ask, as Stewart does in his caption to Figure 8, p.543, why the arc length in that figure is shown as negative for x < 1. The arc length has to be measured relative to some starting point. Stewart arbitrarily chooses x = 1 as that starting point, so each point with x > 1 is a certain distance _after_ that point along the curve, and each point with x < 1 is a certain distance _before_ that point along the curve. Hence it is natural to define the arc length to have positive values for the former sort of points, and negative values for the latter. One could, of course, naively say "We define the arclength to mean the distance along the curve from the starting point, so it will be positive on both sides".
But that definition would lead to messy consequences: The arclength would be given by the absolute value of the integral Stewart gives, rather than the integral itself; and when we wanted to change our starting point, this would have to be handled differently in different cases, rather than just by changing the constant of integration. So the definition described above, with positive values to the right of the starting point and negative values to the left, given by the integral without absolute value signs, is more useful. ---------------------------------------------------------------------- You ask why one uses the average radius in [2] on p.546. The formula before [2] can be written A = \pi (r_1 + r_2) l. The author could have left it that way, but he chose instead to write r_1 + r_2 as 2 times (r_1 + r_2)/2. That way, the term (r_1 + r_2)/2 has an easy interpretation, as the average radius, while the 2 times the \pi together give the coefficient we are accustomed to in the formula for the circumference of a circle. So he gets a formula that is easy to understand: 2\pi r l. ---------------------------------------------------------------------- You ask why on p.547, formula [4] contains the square root expression involved in defining arc-length, but formula [7] doesn't. In formula [7], the "ds" represents the differential of arc-length. As defined in equation [7] on p. 542, it already contains the square-root expression that appears in the other formula. (Note the words that precede formula [7] on p.547, "using the notation for arc length given in Section 8.1".) ---------------------------------------------------------------------- You ask about the relation between the "ds"'s one looks at when rotating a curve about the x-axis and about the y-axis, and whether one should use one formula for ds in the first case and the other in the second (pp.547-548). They represent the same thing; intuitively, the length of a "bit" of the curve that one is rotating. 
In both cases, one can compute the surface area using either the formula for ds in terms of dx or the formula in terms of dy. In Example 2 on p. 549, Stewart considers a parabola rotated about the y-axis, and shows that using the two formulas, one gets the same answer. ---------------------------------------------------------------------- You note that the surface area of a sphere, 4 pi r^2 (illustrated by Example 1 on p.548) is the derivative of its volume, (4/3) pi r^3, and you ask whether the areas of other surfaces can be obtained as derivatives of their volumes. In the case you refer to, we are looking at a whole family of spheres, one for each value of r, and as we increase r, the surface "grows" perpendicular to the tangent plane at each point. If we have a family of closed surfaces described in terms of some parameter t, such that as t changes, the surface grows at constant rate 1 in the direction perpendicular to its tangent plane, then the area at any value of t will be the derivative of the volume. But it can be tricky to design such families of surfaces; so one doesn't have a really useful general technique. ---------------------------------------------------------------------- You ask how Simpson's rule can be used to approximate the area of a surface of revolution, as suggested in Exercises 17-20, p.550, when Simpson's rule relates to the area under a curve, which is different from areas of surfaces of revolution. Simpson's rule is applicable to the integral of any function (and the error estimate applies whenever the function is 4-times continuously differentiable). Even though Stewart uses pictures in which the integral represents the area under a curve, in order to make the rule intuitively reasonable, it is not restricted to that case.
Moreover, if one wishes, one can translate the problem of finding the area of a surface of revolution into that of finding the area under a curve: The formula for the area of the surface gotten by rotating y = f(x) around the x-axis is given by \int_a ^b 2\pi f(x) \sqrt{1+f'(x)^2} dx, so it is equal to the area under the curve y = 2\pi f(x) \sqrt{1+f'(x)^2} from x=a to x=b. ---------------------------------------------------------------------- You ask how Stewart handles the units in Example 2, p.554. The key step is substituting 62.5 for \delta. He notes in the margin on the preceding page that in customary units, the weight density of water is 62.5 lb/ft^3. ---------------------------------------------------------------------- You ask what the significance of moments (pp.554-560) is. As I think I said in class, moments don't have one single significance: they are a type of calculation that comes up in various situations. If you have a function of one variable, and you take the integral of that function multiplied by the n-th power of the variable, that is called the n-th moment of the function. More generally, if you have a function of several variables, and you take the multiple integral of that function multiplied by a product of various powers of the various variables, the results one gets are likewise called moments of the function. (You won't see multiple integrals defined in general until Math 53, but Stewart uses in section 8.3 what is essentially the case of a double integral where the function is constant with respect to one variable, so the part of the integration that would come from that variable can just be replaced by the length over which one would integrate times the value of the function at that location.)
Two cases of the first moment (which Stewart just calls "the moment") of a function of one variable that come up involve the torque on a lever, gotten by integrating mass times distance from the fulcrum, and the volume of a solid of rotation, where the distance from the axis of rotation becomes a factor in the integration because when one rotates an object, the distance a piece of it moves is proportional to its distance from the axis of rotation. Another way moments come up is in probability theory (section 8.5, which we are skipping), where the expected value of a variable is the (first) moment of the probability function with respect to that variable. > ... why is the moment about the y-axis the sum of the masses times the x coordinates? Because the distance from the y-axis is the x-coordinate. As I said in class, what Stewart calls "the moment about the y-axis" is better described as "the moment with respect to the x-coordinate". ---------------------------------------------------------------------- You write that you are having trouble understanding why Equation [4] on p.555 can be rewritten as M = sum m_i x_i. He doesn't say it can be! You need to read the words between the equations more carefully. What he says about M = sum m_i x_i (on the line below that equation) is that it "is called the moment of the system about the origin". So what is the relation with equation [4]? That equation shows us that sum m_i x_i is an important player in this situation, and motivates our giving it a symbol and a name. Once we have these, we can rewrite [4] as \bar{x} = M/m, or equivalently, as noted in the next sentence, as m\bar{x} = M. ---------------------------------------------------------------------- You ask regarding the display after the end of Example 3 on p.556, "... is the p supposed to be area density?" Good point! On p. 552, Stewart defines rho to be the density of a fluid; that is, the mass per unit volume. In the bottom paragraph of p.
556, on the other hand, he takes rho to be the "density" of a lamina, but doesn't say what this means. He must mean the mass per unit area. ---------------------------------------------------------------------- You ask about the first boxed formula on p.557. That equality follows from the definition of integration, given in the top box on p. 372. Ask if you have questions about this. ---------------------------------------------------------------------- Regarding the derivation of the equations for the center of mass on p.557, you note that they are based on using midpoints, and ask whether it would be more accurate to use Simpson's rule. If we wanted to compute a center of mass using an approximation by dividing the mass into finitely many strips, then Simpson's rule would be more accurate than the midpoint rule. But both these rules are ways of approximating an integral, and the point of the discussion Stewart gives is to show what integral is being approximated. So once he gets to the integral, the approximation that he used to lead up to it doesn't matter. The best choice is simply the one that gives the quickest, clearest derivation. ---------------------------------------------------------------------- You ask what the "Formula 9" referred to on line 4 of p.559 is. It is the formula at the bottom of the preceding page! In any section, "Formula n" means the formula numbered n in that section, while formulas from other parts of the book are described in ways that specify where in the book they are. ---------------------------------------------------------------------- You ask how one can use Simpson's rule in Exercise 36, p.561, when no formula is given for f(x). You are supposed to use the graph shown to estimate the values of f(x) at the different points needed. Presumably, you are expected to let n = 8, and use the values on the coordinate-lines.
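To make the center-of-mass formulas discussed above concrete, here is a minimal sketch in Python (the semicircular lamina is my own test case, chosen because its centroid, (0, 4r/(3 pi)), is classical): midpoint sums approximate the mass and the two moments, and then x-bar = M_y/m and y-bar = M_x/m.

```python
import math

def centroid(f, a, b, n=10000):
    """Centroid (x-bar, y-bar) of a uniform lamina between y = f(x) >= 0
    and the x-axis, via midpoint sums for
    x-bar = (1/A) int x f(x) dx  and  y-bar = (1/A) int f(x)^2 / 2 dx."""
    dx = (b - a) / n
    area = m_y = m_x = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        y = f(x)
        area += y * dx            # total area (the mass, up to the density factor)
        m_y += x * y * dx         # moment about the y-axis
        m_x += 0.5 * y * y * dx   # moment about the x-axis
    return m_y / area, m_x / area

# Semicircular lamina of radius 1: the centroid should be (0, 4/(3*pi)).
xbar, ybar = centroid(lambda x: math.sqrt(1 - x * x), -1.0, 1.0)
```

Note that the uniform density rho cancels out of both quotients, which is why it does not appear in the code.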
---------------------------------------------------------------------- In connection with the material of section 9.1 (pp.580-584) you ask, "Are all differential equations solvable?" In one sense, namely "Given a differential equation, can we find an elementary function which is a solution to the equation?", the answer is certainly "No", since if the equation has the form y' = f(x), then a solution would be an integral of f, and we know that some elementary functions have nonelementary integrals (pp. 498-499). Another sense of your question is "Given a differential equation, must there exist a function (elementary or not) which is a solution to the equation?". The answer is still "No". But recall that I gave in class an example of an initial-value problem with more than one solution, but pointed out that in that case, the function f giving the differential equation was not differentiable at the relevant point; and I said that there are theorems (at least one case of which you'll see in Math 54) saying that for "reasonable" differential equations y' = f(x,y), an initial value cannot correspond to more than one solution. Those theorems also tell us that for such equations, every initial value does correspond to some solution. So for "reasonable" differential equations, the answer to your question is yes. ---------------------------------------------------------------------- > What exactly is logistic differential equation? It's just a strange name that people have given to a certain differential equation proposed to model population growth; equation 2 on p.581. Why is it called "logistic"? After looking online, I think the following is the explanation. The word "logistic", in addition to its other meanings, used to have the mathematical meaning "related to logarithms and exponentials". The mathematical biologist Verhulst, mentioned on p.581, not only proposed the differential equation for population growth, but found the solution, equation 7 on p. 609.
Since this solution involves an exponential function (though it is not itself such a function), he called it "logistic growth". And since his differential equation leads to "logistic growth", it came to be called the "logistic differential equation". ---------------------------------------------------------------------- You ask why under the logistic equation, the population never exceeds the carrying capacity. This is not quite true in real-world situations. The carrying capacity might change (due to some environmental change), and a population that had been below the carrying capacity might find itself above the carrying capacity; or some circumstances might force a population out of a region that could sustain it and into one with a smaller carrying capacity. In that case, once the population was facing a new constant carrying capacity, the situation would be represented by one of the descending curves above "P=M" in Figure 3 on p.581. But if the population starts below or at the carrying capacity, and the carrying capacity is constant, as assumed on p.581, and the logistic equation truly holds, then the population will never rise above it, since to reach a value above it, it would have to have positive derivative at some point where P > M, which would contradict the logistic equation. (To see this, apply the Mean Value Theorem to an interval from the moment it crosses P=M to a moment when P>M.) ---------------------------------------------------------------------- Regarding Example 2 on p.584, you ask "How does y' = .5(y^2-1) become equal to (2ce^t)/(1-ce^t)^2?" Hopefully, you understand that it is not .5(y^2-1) that becomes (2ce^t)/(1-ce^t)^2. Rather, Stewart is verifying that the functions y = (1+ce^t)/(1-ce^t) satisfy y' = .5(y^2-1): computing y' for such a function gives (2ce^t)/(1-ce^t)^2, and substituting the function for y in .5(y^2-1) gives the same expression, so the two sides of the differential equation agree. (Notice that y' = .5(y^2-1) is a condition on the function y; the expression (2ce^t)/(1-ce^t)^2 is just the common value of its two sides.) Stewart isn't saying here how to discover the solution to this differential equation!
In this section he is teaching us what it means for a function f to be a solution to the equation. Once we understand that, he will show us, in later sections, some methods of finding solutions. ---------------------------------------------------------------------- Regarding Example 2 on p.588, you ask what an "equilibrium solution" is. The first thing to do when the book assumes you know a term and you don't recognize it is to look in the index. If you don't find it there, or have trouble understanding the definition, then ask! ---------------------------------------------------------------------- You ask why Euler's method (p.589) is needed; why the direction field method is not enough. Well, when we draw a direction field, we can only put our little line segments at a limited number of points. Say we put them at every interval of .05, so that there are 20 per unit, and thus 400 per 1 x 1 square. (A lot of work to draw, especially getting the slope of each "just right"!) And suppose we have a perfect hand that can draw a curve that exactly matches the slopes of those segments (which is actually unreasonable to assume). Still, what do we do *between* segments? If we have reached x = 0.35, getting a y-value of 0.72, the slope we use should be somewhere between the slopes of our little segments at (0.35,0.70) and (0.35,0.75), but just what in-between value will it be? One can take a slope 2/5 of the way between the values at those points; but that will only be correct if the function F(x,y) is un-curved between those points. And even if we know the correct slopes at, say x = 0.35 and x = 0.40, what do we use for slopes between those x-values, as we move our pencil from one to the next? Even Euler's method only gives an approximately correct answer. But there, we can take the intervals between the points we use as small as we want, depending only on how much computing time we are willing to use.
And even before computers were available (they certainly weren't in Euler's time!), the computations could be done by accurate arithmetic to many decimal places, rather than depending on judgements of eye and hand. Incidentally, to a mathematician, a "direction field" means something much more abstract than what Stewart describes: a function that associates to *every* point a direction; rather than the approximation one gets by drawing little line segments at regular intervals. A direction field in this abstract sense does exactly determine the solution curves, which neither Euler's method nor the method that Stewart gives us does. But the abstract concept does not give us a way of computing that solution. ---------------------------------------------------------------------- You ask about the term h F(x_{n-1}, y_{n-1}) in the description of Euler's method on p.590. Given that y_{n-1} is our approximation of y at x = x_{n-1}, we use the differential equation y' = F(x,y) to approximate the slope of the curve, getting F(x_{n-1}, y_{n-1}). Since the interval from x_{n-1} to x_n has length h (see the beginning of that paragraph), the amount that we estimate that y changes over that interval, namely, the length of the interval times our approximation of the slope, is h F(x_{n-1}, y_{n-1}). Our approximation for y at the end of the interval is the sum of our approximation at the beginning of the interval and our approximation of the change: y_{n-1} + h F(x_{n-1}, y_{n-1}). We name this y_n. (Incidentally, if most of the pictures in the margin give the impression "Those approximations are way off!", this is because Stewart has taken h large enough so that you can easily see the difference between the curve and the approximation. When one takes h small enough to get a really good approximation, as in the higher curves in Figure 16, then it is harder to see the process.) 
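The bookkeeping in Euler's method as described above is simple enough that the whole procedure fits in a few lines. Here is a sketch in Python (the test equation y' = y with y(0) = 1 is my own choice, since there the exact answer y(1) = e is known):

```python
import math

def euler(F, x0, y0, h, steps):
    """Euler's method for y' = F(x, y): starting from (x0, y0),
    repeatedly set y_n = y_{n-1} + h * F(x_{n-1}, y_{n-1})."""
    x, y = x0, y0
    for _ in range(steps):
        y = y + h * F(x, y)
        x = x + h
    return y

# Approximating y(1) for y' = y, y(0) = 1; the exact value is e.
coarse = euler(lambda x, y: y, 0.0, 1.0, 0.1, 10)     # h = 0.1
fine = euler(lambda x, y: y, 0.0, 1.0, 0.001, 1000)   # h = 0.001
```

Since Euler's method is first order, shrinking h shrinks the error roughly in proportion, which is the trade of computing time for accuracy mentioned above.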
---------------------------------------------------------------------- You ask about the meaning of the display "h(y) dy = g(x) dx" on p.594. Good question! Historically, "dx" and "dy" were symbols for the "infinitesimal" quantities that \Delta x and \Delta y turned into as they approached 0. So differentiation was the process of taking the ratio of these, and integration was the process of summing (infinitely many) such infinitesimals. Then George Berkeley pointed out that this had no logical backing, and mathematicians started defining dy/dx as the limit of the ratios \Delta y/\Delta x, and \int f(x)dx, as the integral of f(x) with respect to x (also defined as a limit), without giving meanings to dy and dx themselves. But the latter were so convenient for thinking about these things that they eventually came up with ways of redefining them: in Stewart, see the first paragraphs of the section "differentials" on p. 253. But having made the definition he gives there, which is compatible with the definition of dy/dx, is it also compatible with the definition of integration, so that equality of h(y) dy and g(x) dx implies equality (up to a constant of integration) of the corresponding integrals? Well, it took me a bit of searching to find where Stewart justifies this, and he doesn't go into much detail; but see p. 408, "The Substitution Rule" and the two paragraphs that follow. ---------------------------------------------------------------------- You ask whether putting equation [1] on p.594 into "differential form" and then integrating both sides is valid. I would say, yes, but to see definitions that justify it, one would have to go to a higher-level course (and in this case, I'm not sure what that course would be -- maybe Math 214.) However, one can justify [2] more easily than Stewart does. The version of "h(y)dy = g(x)dx" that doesn't involve differentials is h(y)y' = g(x). 
Now if h has an antiderivative H, and g has an antiderivative G, then the two sides of the above equation are the derivatives of H(y) and G(x). Since these derivatives are equal, the two functions must differ by a constant, so we get H(y) = G(x) + C, which is, essentially, [2]. ---------------------------------------------------------------------- You ask how Stewart goes from the 3rd-from-last equation on p.594 to the next-to-last equation. Well, the integral of h is some function; let us call it H. (The general integral will have the form H + C for some constant C, but we are just interested in any one integral, which we call H.) Thus, the expression \int h(y) dy means H(y). In this discussion, y is a certain function of x. The left-hand side of the first of the two equations you ask about can now be written d/dx H(y), and the Chain Rule (which Stewart has said he will use) turns this to (d H(y)/dy)(dy/dx), which is the left-hand-side of the second equation. Meanwhile, on the right-hand side, Stewart is applying the Fundamental Theorem of Calculus; if we give the integral of g the name G, he is rewriting what we would call d G(x) / dx as g(x). (At the next step, he will similarly apply the Fundamental Theorem of Calculus to the left side.) ---------------------------------------------------------------------- You ask whether Stewart is using circular reasoning on p.594, where he deduces [2] from [1] and then at the bottom of the page deduces [1] from [2]. No; in that last step, he is *checking* his solution, not deducing a new fact. The reason he wants to check it is that one may be unsure whether expressions like "h(y) dy" which he used in getting [2] from [1] have a clear meaning. 
He has shown us how to do the calculation using such expressions because it is such an elegant and easy-to-remember method; but once he has the solution it gives, he wants to justify it, which he does by checking that [2] implies [1]; i.e., that a function y satisfying [2] will satisfy [1]. > ... should circular logic be avoided all the times? The phrase "circular reasoning" means assuming what one is trying to prove in an attempted proof of it; so in that sense, it is never valid. But there are many things having some resemblance to that which are useful; for instance, talking about what one hopes to prove before beginning a proof; or (as I mention in the handout on induction) proving the n=k+1 case of a statement from the n=k case. ---------------------------------------------------------------------- Regarding Example 1 on p.595, you note that Stewart simplifies the solution by writing K in place of 3C, and you ask, "Since C is an arbitrary constant, can we write C in place of 3C?" Well, sometimes we do say things like "Let us write C for what we previously called 3C". Other times, one chooses to keep one's notation consistent between different parts of a calculation, and uses a different letter. One tries to balance the goals of simplicity, brevity, and clarity. Different people make different choices. ---------------------------------------------------------------------- You ask about the uniqueness theorem Stewart mentions in the margin at the top of p.596. A uniqueness theorem says that under certain conditions, the solution to a problem is unique. The uniqueness theorem Stewart talks about is for a differential equation together with an initial condition; or more generally, together with a condition saying that at a certain x-value, the function has a certain y-value (even if the x-value isn't at the beginning of the interval, as in the case of an initial condition.) The uniqueness theorem doesn't apply to all differential equations. 
For instance, I pointed out last week that the equation y' = 2 \sqrt{y} has solutions that start out following the x-axis, y=0, and then at some arbitrary point x = a start following the curve y = (x-a)^2. This misbehavior is related to the fact that the function \sqrt{y} grows very fast near y = 0. Uniqueness theorems typically have bounds on how the functions involved grow. Such a theorem is applicable to the equation Stewart is looking at here, y' = x^2 y (in which the function x^2 y doesn't grow unreasonably fast). Now since y = 0 is a solution to the equation, we see from the uniqueness theorem that any solution which has y = 0 at one point has to be equal to that solution everywhere; so a solution which is not everywhere 0 can't take on the value 0 at all, which is what Stewart is claiming. To see a uniqueness theorem such as Stewart is referring to, see http://en.wikipedia.org/wiki/Picard-Lindelof_theorem . The bound on how the function grows is the condition immediately following the displayed equations at the beginning: "Suppose f is Lipschitz continuous in y and continuous in t." If you click on "Lipschitz continuous" you'll see what that condition is. ---------------------------------------------------------------------- You ask what uniqueness theorem Stewart is referring to in the top marginal note on p.596. Good question! I'm not sure; probably a theorem saying that if we have a solution to y' = f(x,y) that passes through some point (x_0,y_0), and if f is not too badly behaved, then no other solution can pass through (x_0,y_0). (We saw in class on Monday that 2 \sqrt y was badly behaved at y = 0, since its derivative near y=0 approaches infinity, and that y' = 2 \sqrt y in fact had nonunique solutions with certain initial values.)
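The nonuniqueness for y' = 2 \sqrt{y} can be checked directly, without any theory. Here is a quick sketch in Python (the branch point a = 1 is arbitrary): both the zero function and the function that follows the x-axis up to x = a and then follows y = (x-a)^2 satisfy the equation and take the value 0 at x = a.

```python
import math

def rhs(y):
    # Right-hand side of the differential equation y' = 2 sqrt(y).
    return 2 * math.sqrt(y)

a = 1.0  # arbitrary branch point

def y_zero(x):                   # the solution that stays on the x-axis
    return 0.0

def y_branch(x):                 # follows the axis, then peels off at x = a
    return 0.0 if x <= a else (x - a) ** 2

def y_branch_prime(x):
    return 0.0 if x <= a else 2 * (x - a)

# Both functions solve the equation and share the initial value y(a) = 0,
# so solutions through the point (a, 0) are not unique.
for x in [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]:
    assert abs(y_branch_prime(x) - rhs(y_branch(x))) < 1e-12
assert y_zero(a) == y_branch(a) == 0.0
```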
Now since the constant function y = 0 is a solution to dy/dx = x^2 y that passes through every point (x,0), it follows from that uniqueness that no solution other than that one -- i.e., no solution that ever takes on a nonzero value -- can pass through any point (x,0). This says that every nonzero solution to that equation stays nonzero. But I'll suggest to Stewart that he either try to make that comment more informative, or drop it. ---------------------------------------------------------------------- You ask, regarding the point on the top half of p.597 where Stewart writes +- e^{-3c} e^{-3t} = A e^{-3t}, how the +- sign can disappear. Stewart is letting A denote e^{-3c} if the sign is +, and -e^{-3c} if it is -. So in either case, +- e^{-3c} becomes A. To do this, one has to realize that the sign + or - will be constant in any solution to the differential equation -- the function can't jump between a solution based on a positive and a negative coefficient -- so if we write A = +- e^{-3c}, then A really is a constant. ---------------------------------------------------------------------- You ask whether there is a family of curves depending on a single parameter k, for which it is impossible to find a system of orthogonal trajectories as discussed on p.597. That depends on what sort of messy behavior we allow. (Mathematicians call such messiness "pathology".) If we don't require the curves to be differentiable (e.g., if we let them "wiggle" at arbitrarily small scale), then the concept of orthogonality has no meaning. In a much milder vein, we might have a family of curves that cross each other, such as the parabolas y = (x-c)^2.
Then we wouldn't know which curve through each point to make our orthogonal curves orthogonal to; though this could be handled by noting that around each point with y > 0, the curves that pass by can be divided into two clearly distinct families, so we can get curves that are orthogonal (in such a region) to those in one family, and other curves that are orthogonal to those in the other family. Finally, there can be families of curves that are nice in all the above senses, and which have orthogonal families, but for which the latter family is not given by elementary functions, so that we can't "find" it in the sense of writing down a formula for it. (More than you bargained for when you asked your question, I would guess.) ---------------------------------------------------------------------- Regarding the equation at the beginning of "The Logistic Model" on p.606, you ask, "How is P determined to be small enough to prefer to apply the exponential model over the logistic model?" If we're looking at a situation for which we know the conditions well, the exponential model is applicable when the population is small enough so that competition for resources is not important. If we're looking at data and don't know much about the details of the situation, then we apply the exponential model if it seems to fit those data well. ---------------------------------------------------------------------- You asked why in the logistic equation dP/dt = k P(1 - P/K) (p.607), the second factor has the form 1 - P/K rather than K - P, which would likewise have the effect of making growth decrease to zero when the population approached K. This is so that when P is small, i.e., when the problem of overpopulation doesn't limit the growth, the equation approaches the natural growth equation. Of course, P' = k P (1- P/K) could be rewritten P' = L P (K - P), taking L = k/K. 
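(A trivial numerical spot-check of that last rewriting, with illustrative values of k and K:)

```python
# check that k*P*(1 - P/K) equals (k/K)*P*(K - P) at sample values
k, K = 0.08, 64.0        # illustrative values only
L = k / K
for P in [1.0, 10.0, 32.0, 63.0, 64.0]:
    assert abs(k * P * (1 - P / K) - L * P * (K - P)) < 1e-9
```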
But this L doesn't have a very natural meaning for the population growth, while k does: it is the relative growth rate when the population is small. ---------------------------------------------------------------------- Regarding the paramecium problem on p.610, you ask how we find values like the "k" given there. You ask "Do we just plug in some numbers until we get a good value (line of best fit kind of thing)?" There's a technique for computing "best fit" -- the method of least squares, covered in Math 54 and sometimes again in Math 110. I think Stewart is intentionally vague about how the numbers were found because that sort of computation is not a topic of calculus. ---------------------------------------------------------------------- You ask how, in the experiment described on pp.610-611, Gause could estimate the carrying capacity as 64, when it reached the value 76 early on. It's hard to know what to make of the data we are given about that experiment. One could justify the value 64 on the assumption that when the number of individuals is small, it can sometimes exceed and sometimes fall under the number M that the environment could support on a long-term basis; so that an *average* of the population over a period of time when it had stopped growing would be a reasonable value for the carrying capacity. It's also not clear whether the numbers we are given represent the actual population of paramecia, or the number that he counted in a fixed-size sample; e.g., one droplet of water, put under the microscope each day. If the latter is the case, then if that daily sample constituted, say, 1/100,000 of his paramecium culture, then the actual populations would not be small numbers like 2, 3, ..., 57, but hundreds of millions of paramecia, and the fact that the numbers bob up and down irregularly might just mean that the sample droplet he took each day sometimes contained more and sometimes less of the paramecia in the culture. 
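Returning to the "line of best fit" question above, here is how the idea can be made concrete: for exponential growth P = P(0)e^{kt}, the logarithm ln P depends linearly on t, so one can fit a straight line to the points (t, ln P) by least squares and read off k as its slope. A sketch (the data points here are made up for illustration; they are not Gause's):

```python
import math

# made-up measurements (t, P), roughly following P = 2*e^(0.8t)
data = [(0, 2.1), (1, 4.3), (2, 10.2), (3, 21.5)]

# least-squares fit of the line  ln P = ln P(0) + k*t
ts = [t for t, P in data]
ys = [math.log(P) for t, P in data]
n = len(data)
tbar = sum(ts) / n
ybar = sum(ys) / n
k = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) \
    / sum((t - tbar) ** 2 for t in ts)
lnP0 = ybar - k * tbar
print(f"k ~= {k:.4f},  P(0) ~= {math.exp(lnP0):.2f}")
```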
---------------------------------------------------------------------- Regarding the Paramecia example on pp.610-611, you write > In the beginning of the solution, Stewart comments that biologist G.F. > Gause used the same relative growth rate from the exponential growth > model for paramecia for his logistic growth model. Stewart states that > this is reasonable because the initial population is small compared to > the carrying capacity. Roughly how large can the initial population > be compared to the carrying capacity before this assumption ceases > to be reasonable? Such things depend on the degree of accuracy that one wants in determining one's constants. As I said in class, despite Gause's giving the relative growth rate to 4 decimal places, he could not have been claiming it was accurate to that many places, so we don't know what accuracy he was claiming. Even more important than the ratio of the initial population to the carrying capacity is the ratio of the later populations that are used to estimate the "initial relative growth" to the carrying capacity. E.g., if Gause based his 0.7944 on the values of P(0), P(1), P(2), P(3), as may well be the case, since the upper curve in Figure 4 (p. 611) seems to be nestled nicely among those first four points, then that value of k may be far from appropriate for the logistic curve, since those values extend into a region where the black and red curves of Figure 4 are quite far apart. Perhaps the best value to use for logistic growth would have been larger, leading to a curve that rose faster before flattening out, and fit the data better. On the other hand, I find it suspicious that P(3) sits squarely on the red curve in that graph. I wonder whether Gause chose his coefficients in the logistic equation precisely to make P(0) = 2, P(3) = 16, and K = 64. If that was so, then he didn't really estimate k by applying the exponential model to the initial growth. 
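One can test that suspicion directly: for a logistic curve P(t) = K/(1 + A e^{-kt}), with A = (K - P(0))/P(0), the values K = 64, P(0) = 2 and P(3) = 16 force a particular k. A small computation (my sketch):

```python
import math

K, P0, P3 = 64.0, 2.0, 16.0
A = (K - P0) / P0            # = 31
# P(3) = K/(1 + A*e^{-3k}) = 16  =>  e^{-3k} = (K/P3 - 1)/A = 3/31
k = math.log(A / (K / P3 - 1)) / 3
print(f"k = {k:.5f}")        # about 0.77846, not the 0.7944 Gause used
```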
Hmm, I just pulled out a table of logarithms and found the value of k that would fit the values of P(0), P(3), and 64 in the limit; and it was 0.77846. Close, but not 0.7944. So we really can't tell how he computed his value. ---------------------------------------------------------------------- You're right that most of the models of population growth we've been shown make extinction impossible! The models in the lower half of p.612 would allow it, though. For simplicity, let us drop the 1 - P/M term from both, since we are talking about a situation where P is low, rather than close to the carrying capacity of the environment. Then both curves take the form P' = kP - constant, and if P is low enough so that kP is less than that constant, it will decrease at an ever-growing rate. (And, if we believe the equations, it will become negative and keep getting more negative.) If P is above the critical value, then according to the equations, extinction would never occur; but in reality, changes in the environment could change the values of the constants involved, so that a population that had been above the critical value would suddenly find itself below it, and if things don't change for the better, become extinct. ---------------------------------------------------------------------- In connection with the material on p.616, you write > ... Stewart goes from xy' + y = (xy)' to (xy)' = 2x. > How does he make this connection? Good question! Note that between the two equations that you quote, he says, "and so we can write the equation as"; so we have to find "the equation" he is referring to! This is the equation xy' + y = 2x, in the sentence containing equation (2). When I write him, I will suggest that he display and number that equation, and refer to it by number.
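Carrying that computation through: (xy)' = 2x gives xy = x^2 + C, i.e. y = x + C/x, and one can check numerically that any such y satisfies xy' + y = 2x (a sketch, with an arbitrarily chosen C):

```python
C = 5.0  # arbitrary constant of integration

def y(x):
    # general solution y = x + C/x of the equation xy' + y = 2x
    return x + C / x

def yprime(x, h=1e-6):
    # symmetric difference quotient
    return (y(x + h) - y(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0, 4.0]:
    assert abs(x * yprime(x) + y(x) - 2 * x) < 1e-6
```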
---------------------------------------------------------------------- Regarding the top displayed equation on p.617, you ask why there is a constant of integration C, when the indefinite integral hasn't been evaluated. Well, I guess Stewart is thinking of "\int I(x)Q(x)dx" as denoting any particular choice of antiderivative for I(x)Q(x), and is throwing in the "+ C" to get an expression that will denote all antiderivatives. I don't know whether he has a fixed convention on whether an indefinite integral symbol means "all antiderivatives" or "some antiderivative"; in his tables of integrals, he leaves out the + C from the integral itself, and only shows it on the solution. I'll raise the question in my e-mail to him at the end of the semester. ---------------------------------------------------------------------- You ask whether we should have a constant C in the integrating factor introduced by Stewart on p.617. We don't need one. When one multiplies by an integrating factor, this transforms the equation into an equivalent equation, hence into one that has exactly the same set of solutions. So it's enough to find one integrating factor. Stewart makes this point in the sentence before display [5], saying "We are looking for a particular integrating factor, not the most general one ...". ---------------------------------------------------------------------- You ask how Example 2 on p.618 would change if we dropped the condition x > 0. Well, note that the differential equation x^2 y' + xy = 1 can't have a solution that behaves nicely at x=0, since there the equation becomes 0 = 1. So we can find solutions that are defined for negative x, and solutions that are defined for positive x, but these will not connect with one another. If we want a solution that satisfies y(1) = 2, i.e., that passes through the point (1,2), it has to be a solution defined for positive x. Hence Stewart's condition x>0.
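For the record, solving x^2 y' + xy = 1 with y(1) = 2 along these lines gives y = (ln x + 2)/x for x > 0, and one can check this numerically (my sketch, not Stewart's):

```python
import math

def y(x):
    # solution of x^2*y' + x*y = 1 with y(1) = 2, valid for x > 0
    return (math.log(x) + 2) / x

def yprime(x, h=1e-6):
    # symmetric difference quotient
    return (y(x + h) - y(x - h)) / (2 * h)

assert abs(y(1.0) - 2.0) < 1e-12         # the initial condition
for x in [0.5, 1.0, 3.0]:
    assert abs(x * x * yprime(x) + x * y(x) - 1.0) < 1e-6
```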
If we drop the condition y(1)=2, then we can look separately for solutions defined for positive x and solutions defined for negative x. The difference would come up when we integrate 1/x. The expression for that integral that Stewart usually shows is ln |x|. In this problem, since he had x > 0, he was able to write this as ln x. If we looked for solutions for x < 0, then we could make the integral ln (-x). We could then find such a solution passing through any point (a,b) with negative a. ---------------------------------------------------------------------- You ask about the limits of integration on the integral in the second display on p.619. That display is gotten by making the first display more precise. In the first display, the indefinite integral shown means "any function whose derivative is e^{x^2}". The different functions having that derivative differ by constants. If one adds a constant to one such function, then, since it is multiplied by e^{-x^2} in the formula for y, that formula gets a constant times e^{-x^2} added to it. This just corresponds to choosing a different constant C in that formula; so the set of functions described by the first display doesn't change if one chooses a different indefinite integral. But if we want an answer that doesn't involve a random choice of function with derivative e^{x^2}, we choose a particular such function, namely the definite integral of e^{x^2} from 0 to x. The Fundamental Theorem of Calculus tells us that the result has the desired derivative. We could instead use the definite integral from any fixed starting point a to x; this would differ from the value we have chosen by the integral from a to 0 of e^{x^2}, which is a constant; and for the reasons stated in the preceding paragraph, it would again give the same set of solutions. The choice a = 0 just gives us a particular formula to write down.
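If I reconstruct that example correctly from the formula for y, the equation is y' + 2xy = 1 and the display in question is y = e^{-x^2} \int_0^x e^{t^2} dt + C e^{-x^2}. One can check numerically that the particular solution built from the definite integral does satisfy the equation (a sketch; the midpoint rule stands in for the non-elementary integral):

```python
import math

def integral_exp_t2(x, n=2000):
    # definite integral of e^(t^2) from 0 to x, by the midpoint rule
    h = x / n
    return h * sum(math.exp(((i + 0.5) * h) ** 2) for i in range(n))

def y(x):
    # particular solution built from the definite integral from 0 to x
    return math.exp(-x * x) * integral_exp_t2(x)

def yprime(x, h=1e-5):
    # symmetric difference quotient
    return (y(x + h) - y(x - h)) / (2 * h)

# verify y' + 2xy = 1 at a few sample points
for x in [0.5, 1.0, 1.5]:
    assert abs(yprime(x) + 2 * x * y(x) - 1.0) < 1e-3
```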
---------------------------------------------------------------------- You ask whether the unit of time is significant in the differential equation [7] shown for the electric circuit on p.619. In that differential equation and the discussion that precedes it, the statement that the voltage across the inductor is L(dI/dt) depends on the inductance L being expressed in units compatible with those of the time t and the current I. When time is expressed in seconds, and current in amperes, then the corresponding unit of inductance is the henry. If units that did not match were used, then one would have to put a correcting factor into the formula L(dI/dt). (E.g., if one kept the ampere and henry, but expressed time in milliseconds, one would have to use the formula 1000 L(dI/dt).) ---------------------------------------------------------------------- You ask whether the natural growth function dR/dt = kR for rabbits in the absence of wolves (p.622) is similar to the model of the growth of a rabbit population given by the Fibonacci sequence. Yes, but the formula defining the Fibonacci sequence is a "difference equation" rather than a "differential equation": Instead of describing the "instantaneous" rate of change, it describes the change over a fixed interval of time. If we write it as f_n - f_{n-1} = f_{n-2}, it says that the change in the number of pairs of rabbits as n changes by 1 is given by the number of pairs of rabbits -- and not the present number, but the number two units of time ago. The relation between solving difference equations and solving differential equations is like the difference between summing a series and integrating a function. It is often difficult to find an explicit solution to a difference equation (though one can easily compute any number of terms), just as it can be difficult to find an explicit formula for the sum of a series. 
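For the Fibonacci difference equation itself, an explicit solution does exist, as a combination of n-th powers (the discrete analogue of exponentials): with phi = (1+\sqrt 5)/2 and psi = (1-\sqrt 5)/2, the two roots of x^2 = x + 1, one has f_n = (phi^n - psi^n)/\sqrt 5. A quick check, using the convention f_0 = 0, f_1 = 1 (my sketch):

```python
import math

phi = (1 + math.sqrt(5)) / 2   # the roots of x^2 = x + 1
psi = (1 - math.sqrt(5)) / 2

def fib_closed(n):
    # Binet's formula: a combination of two "exponential" terms
    return (phi ** n - psi ** n) / math.sqrt(5)

# compare with the difference equation f_n = f_{n-1} + f_{n-2}
f = [0, 1]
for n in range(2, 20):
    f.append(f[-1] + f[-2])
for n in range(20):
    assert round(fib_closed(n)) == f[n]
```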
However, linear difference equations with constant coefficients, like linear differential equations with constant coefficients, have solutions which, in general, are given by linear combinations of exponential functions; and this approach leads to an expression for the general term of the Fibonacci sequence in that form. There is a free online calculus text, "Difference Equations to Differential Equations" by Dan Sloughter, http://synechism.org/drupal/de2de/ , which I suppose, based on its title, emphasizes the relationship between the two sorts of equations. (But I haven't gone through it.) ---------------------------------------------------------------------- You ask whether solutions to the Lotka-Volterra equations (p.623) other than the equilibrium solution approach the equilibrium solution, or go around and around endlessly. They go around and around endlessly, since they have to stay on a curve of the sort I showed in class, which does not approach the equilibrium point. On the other hand, modified equations, like that of Exercise 9, may approach the equilibrium solution. As for real-world predator-prey situations -- the ones we hear about do seem to oscillate rather than ending up at equilibrium. But it might be that some pairs of species tend to reach equilibrium, and others tend to oscillate (depending on subtle differences in the genuine equations for their interaction), and that only the oscillating ones get mentioned in books on differential equations. ---------------------------------------------------------------------- Regarding Stewart's comment just before Example 1 on p.623 that it is usually impossible to find explicit formulas for the R and W satisfying the Lotka-Volterra equation, you ask > ... Are there certain type of differential equations which have > been proven to be impossible to solve non-graphically? ... We have already seen that there are elementary functions whose integrals are not elementary functions.
Since an integration problem is a special sort of differential equation, these are examples of differential equations that cannot be "solved", if by solving one means naming the solution as a certain elementary function. Likewise, there are differential equations which are not themselves integration problems, but which can be reduced (using the method of separable equations, or the method described in the handout) to integration problems where the integral is not an elementary function. Doubtless there are also differential equations whose solutions are non-elementary functions which don't arise in this way from integrals. But as with the case of integrals, one can always give the solution to such an equation a name and a symbol, calculate tables of values, and solve other differential equations with the help of that function. ---------------------------------------------------------------------- You ask about Stewart's instruction to "make sketches of R and W as functions of t", which he states as part (e) on p.623, and claims to do on p. 625. That is a topic on which I had already made a note to write to him! The phase trajectory gives no information as to the relative speeds of progress at different points. So all one can really put into these sketches is information as to how high and how low each curve gets, and what point of the cycle each one is in when the other is at a given point of the cycle. The additional information that Figures 2 and 3 imply (e.g., that the wolf population falls more slowly than it rises) certainly can't be deduced from the phase diagram. ---------------------------------------------------------------------- > In point (b) on p.624, Stewart uses the chain rule to get > dW/dR, why did he do that? Is it wrong to just divide dW/dt > by dR/dt to get dW/dR or do we have to use the chain rule?
Well, when we are dealing with situations where there is just one independent variable, expressions like dy/dx behave like fractions, and the chain rule dW/dt = (dW/dR)(dR/dt) can be thought of as cancellation of a numerator and denominator. So it is safe to treat these derivatives symbolically as fractions, and multiply them and divide them as one would multiply and divide fractions -- if one remembers that this only works in the case of 1 independent variable. But when you get to Math 53, you will have a more general kind of differentiation: Given a function of (say) two variables, y = F(w,x), you will learn about taking "the derivative of y with respect to w as x is held constant" and "the derivative of y with respect to x as w is held constant", which will be written "curly-d y / curly-d w" and "curly-d y / curly-d x" (where "curly-d" is a symbol sort of like a backwards 6). In these situations, one can't treat them simply as fractions: the chain rule takes a more complicated (though still elegant) form. So in conclusion, you can "just divide dW/dt by dR/dt" -- if you keep in mind that this only works when there is just one independent variable. ---------------------------------------------------------------------- You ask how one could give versions of Figures 1 and 2 on p.624 that bring in time. Well, in Figure 1, one could make the little line-segments have length proportional to the speed. This simply means that the difference between the R-coordinates at the two ends should be proportional to R', and the difference between the W-coordinates of the ends should be proportional to W'. But this might be difficult in practice, because where the speed is small, the segments might look nearly like dots, while where it was large, they could end up running through one another. In Figure 2, one could put many dots along the curves, at intervals corresponding to some unit of time.
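The dots-at-equal-time-intervals idea is easy to simulate numerically. Here is a sketch using the coefficients from Stewart's equations, dR/dt = .08R - .001RW and dW/dt = -.02W + .00002RW, with starting values chosen for illustration:

```python
def rates(R, W):
    # Lotka-Volterra with the coefficients from Stewart's example
    dR = 0.08 * R - 0.001 * R * W
    dW = -0.02 * W + 0.00002 * R * W
    return dR, dW

def step(R, W, dt):
    # one fourth-order Runge-Kutta step
    k1 = rates(R, W)
    k2 = rates(R + dt/2 * k1[0], W + dt/2 * k1[1])
    k3 = rates(R + dt/2 * k2[0], W + dt/2 * k2[1])
    k4 = rates(R + dt * k3[0], W + dt * k3[1])
    R += dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    W += dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return R, W

# print a "dot" every 25 units of time, as one might mark Figure 2
R, W = 1000.0, 40.0      # illustrative starting populations
dt = 0.1
for n in range(1501):
    if n % 250 == 0:
        print(f"t = {n*dt:5.0f}:  R = {R:8.1f},  W = {W:6.1f}")
    R, W = step(R, W, dt)
```

(The equilibrium values implied by these coefficients are R = .02/.00002 = 1000 and W = .08/.001 = 80, so this starting point is well away from equilibrium and the printed dots trace out one of the surrounding cycles.)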
But since the times that the different curves take to get back to where they started cannot be expected to have convenient ratios, it would not in general be possible to choose the unit of time so that the spacing rule applied everywhere -- one would, I suppose, give each curve a "starting point" (say, its bottom, or top, or leftmost or rightmost point) and stipulate that the distance from the last dot before that point to that point does not necessarily represent the same time-interval as all the other distances do. ---------------------------------------------------------------------- You ask about the time it takes to complete various trajectories of the predator-prey equation in phase space (p.624). One certainly can't tell that from the direction-field alone. As I indicated in class, one could draw a direction-field diagram in which, at each point one had, not just a mark showing the direction, but an arrow whose length represented the speed with which the system moved in that direction; and using this, one could draw curves with widely or narrowly spaced markings indicating the passage of time. > ... do the trajectories further out imply a longer period, or do > they imply the same period because the derivatives will be higher? Near the equilibrium point -- let's call it (R_e,W_e) -- I am fairly sure that the period will approach some nonzero limiting value. This is because the pair (R',W') can be approximated near that point by a linear function of (R-R_e, W-W_e), and the differential equation determined by a linear function rotates the whole plane (around concentric ellipses) with a constant period. As we move out from the equilibrium point, and the linear approximation fails, the period doubtless changes. It's not obvious whether it will increase or decrease.
My guess is that it will increase; because if we take an initial point with W very very low, then R will grow with approximately "natural increase" for a long time, until the initially tiny value of W builds up enough to start bringing R down. ---------------------------------------------------------------------- You ask how there can be an equilibrium point in a phase portrait like that on p.624, when every point shows a slope indicating change in the population. At the point marked by the red dot in the center of the phase portrait, there is no slope: It is the point where the numerator and the denominator of the formula near the top of p. 624 are both 0. So dW/dt and dR/dt are both 0, and the population does not change. The thing to remember is that the phase portrait ignores time. When one is near the equilibrium point, the functions R and W have very slight change per unit time; but the portrait just shows dW/dR, which doesn't reflect the smallness of the change of each function. I mentioned in class that one could refine the technique of drawing a phase portrait by replacing the sloping direction-field symbols by arrows, such that an arrow is longer if dR/dt and dW/dt are larger, and shorter if they are smaller, even when their ratio (and hence the slope of the arrow) remains the same. Then as one looked at the phase portrait, one would see arrows getting shorter and shorter as one got near the equilibrium point, and finally going to zero length at that point. ---------------------------------------------------------------------- You ask why the ratio between minimum and maximum population of rabbits in Figure 3 on p.625 is so much more extreme than the ratio between minimum and maximum population of wolves. Interesting question; I don't know the answer. Mathematically, it probably has to do with the fact that the coefficients that Stewart has given us in the equation for dW/dt on p. 
623, .02 and .00002, are much smaller than the coefficients in the equation for dR/dt, .08 and .001. This may have the effect that the population of wolves grows and shrinks to a smaller degree than that of rabbits. What this corresponds to in the real world is harder to guess. I can imagine that wolves might eat rabbits until the rabbit population is extremely low, and then, when that happens, switch to eating mostly other foods -- mice, berries, etc. But this would mean that they aren't really behaving in accordance with the equations, which seem to be based on the assumption that they need rabbits to live. So it might be that the coefficients in the equations are distorted by people trying to use the equation to model data from a real-world situation that really doesn't fit it. ---------------------------------------------------------------------- You ask whether it might be appropriate to introduce a carrying capacity for wolves into the population equations, as Stewart does for rabbits on p.626. Well, insofar as the idea of carrying capacity is based on limitation of resources, and it seems to be assumed that the resource limiting the population of wolves in this scenario is food, and that rabbits are their main source of food, the Lotka-Volterra equations already involve "carrying capacity". But it doesn't appear in the same form as in section 9.1, where it is essentially a P^2 term on the right-hand side of the equation. Which version models reality better I don't know. Anyway, it might be reasonable to introduce a section-9.1-type carrying capacity term in connection with some other resource. ---------------------------------------------------------------------- You ask why, in studying sequences, we can only define limits as n --> infinity (p.692), and not limits as n approaches other values.
For a function f defined on an interval of the real line, the idea of the definition of lim_{x->a} f(x) is based on looking at the values of f(x) for x arbitrarily close to a (i.e., differing from a by less than any positive real number \delta). But if we instead have a sequence (a_n), and a particular positive integer N, we can't look at the values a_i for i "arbitrarily close to N", because i ranges over the integers, so the difference |N-i| can't be made any smaller than 1. In other words, the set of integers is discrete, so we can't look at one integer "approaching" another. (That idea reminds me of a bit of graffiti I once saw on a bulletin board in Evans Hall: "\sqrt{3} = 2 for large values of 3".) (There are some concepts of integers being "close" occurring in more advanced areas of math, which are very different from the familiar one, and with respect to which one can define the kinds of limits you ask about. For instance, one can fix a prime number p, and consider two integers m and n to be "closer" the larger the power of p that divides m-n. With respect to that concept, called the "p-adic metric", one can indeed talk about a sequence having a limit as n approaches an arbitrary integer. But that topic is far from freshman calculus. One might see a bit of it in Math 115; and more in Math 254AB.) ---------------------------------------------------------------------- You ask why the precise definition of a sequence having limit "infinity" (p.693, Definition 5) does not use absolute values, like definitions of other sorts of limits. The absolute value signs (and the specification that the absolute value be less than some value delta or epsilon) are involved when a finite value is being approached either by the independent variable x or by the dependent variable f(x). 
This is because a finite value can be approached from either side; so to say that x is near a on one side or the other, or that f(x) is near L on one side or the other, one uses the formulas |x-a| < \delta and |f(x)-L| < \epsilon. Infinity, on the other hand, is not approached from two sides: to "approach infinity" just means to become arbitrarily large, which is expressed by inequalities without absolute values, such as x > N and f(x) > M (p. 140, Definition 9) or n > N and a_n > M (p. 693, Definition 5). When one of the two variables (independent and dependent) approaches infinity, and the other approaches a real number, one gets definitions that contain one inequality involving an absolute value, and one without (p. 115, Definition 6; p. 130, Definition 1; p. 692, Definition 2. The same happens when one of the variables is approaching -infinity.) ---------------------------------------------------------------------- You ask why the Squeeze Theorem for Sequences (p.694, first boxed statement) is only stated for limits as n --> infinity, while the Squeeze Theorem for Functions (p. 105) is stated for limits as x approaches an arbitrary a. For a function of a real variable, one can talk about the limit as that variable approaches an arbitrary number or infinity or -infinity. But for a function of an integer-valued variable, there is no concept of letting the variable "approach" an integer n. An integer either equals n, or it differs from n by at least 1; there is no "getting closer and closer". So limits as n approaches +infinity are the only kind that we can look at. (In other areas of mathematics, there are concepts of integers approaching integers. For instance, if one is interested in divisibility by 2 (or some other prime p), one can regard m as "close to" n if n-m is divisible by a large power of 2 (or generally, p); and one can define concepts of limit with respect to this concept of "closeness". You would see these concepts in Math 254; but they're out of the ballpark for Math 1AB.) ---------------------------------------------------------------------- You ask how, in the proof of the Monotonic Sequence Theorem, p.698, one can be sure that \{a_n\} has a least upper bound.
Stewart gives the reason before he makes the assertion -- he says "By the Completeness Axiom". If you had skimmed too quickly, and not noticed his statement of the Completeness Axiom right before the theorem, then seeing that reason given, you should have looked back to see what the axiom was about. When you don't understand something in a math text, it is always a good idea to first look right before what you don't understand. But if it didn't occur to you that the Completeness Axiom might have been given right before the theorem, you should have gone to the index of the text and looked up "Completeness Axiom". It would have told you that the axiom was on p.698, and looking over the page, you would not have had too much trouble finding it. (It is named in bold type where it is stated.) Once you've read the Completeness Axiom, let me know whether you can see how it is applied in the second sentence of the proof of the Monotonic Sequence Theorem. If you still have trouble, try to be precise about what it is -- how much you can see of the connection between the proof and the Axiom, and where you have trouble fitting them together. ---------------------------------------------------------------------- You ask about the distinction between mathematical induction (p.699, note in left column) and deduction, and how often each is used. I would call any kind of precise reasoning "deduction"; so it would include mathematical induction. In nonmathematical usage, "induction" can refer to a non-rigorous kind of reasoning. The Oxford English Dictionary gives, as its 7th meaning of the word, Logic. a. The process of inferring a general law or principle from the observation of particular instances (opposed to DEDUCTION, q.v.). But in mathematics, where mathematical induction is a rigorously valid tool, there isn't a contrast between it and deduction. (The OED's 8th meaning of the word is that of mathematical induction.)
Mathematical induction is only one of many tools of mathematical reasoning, and a somewhat sophisticated one, so it occurs only in a small fraction of the cases of deduction; but it is a powerful tool in those situations where it is needed. There are also many situations where we use mathematical induction although the reasoning is intuitively clear, and we don't think to call it by that formal name. E.g., knowing that the derivative of a polynomial of degree n>0 has degree n-1, we can "see" that the k-th derivative of that polynomial for k \leq n has degree n-k; though to argue this precisely, one would have to use mathematical induction. ---------------------------------------------------------------------- You ask how, in Example 7, pp.707-708, showing that s_{2^n} diverges implies that the whole sequence of partial sums s_n diverges. If the sequence s_n converged, then there would be some L such that, as n --> infinity, the values of s_n became arbitrarily close to L. So the values s_{2^n}, being among these values, would also become arbitrarily close to L. But since they are approaching infinity, they are not becoming arbitrarily close to any fixed real number L. Having been shown the argument in rough form, you ought to be able to translate it into an "epsilon-delta" argument. Can you? ---------------------------------------------------------------------- Regarding Example 8 on p.708, you ask > What is a harmonic series? The phrase "harmonic series" at the beginning of this exercise appears in boldface type. This signals that what is being said is the definition of the term. So the phrase "the harmonic series" means "the series 1 + 1/2 + 1/3 + ... + 1/n + ...". (Depending on the author and the type of writing, definitions may be signaled by boldface or italic type.
Whichever is used, if you see something put in a different font in mathematical writing, and there is no other obvious reason to do so, such as making a contrast or stressing something, it is a good guess that the words are being defined.) > Why is the solution using partial sums when n = 2,4,8 case? Because that is the property of the harmonic series that we are going to use -- that the single term 1/2, and the sum of the next two terms, and the sum of the four terms after that, etc., are all about the same size; so as we go on summing terms, we keep adding in sums of about the same size, so the sums we get increase without bound. > And how did the solver decide to use inequalities to solve the > problem? Proving divergence (of a series of positive terms) is always a question of inequalities -- of showing that the partial sums can get arbitrarily large. ---------------------------------------------------------------------- You ask why Theorem 7, p.709, is true. Stewart answers this in the sentence right after the theorem! Did you read that sentence? If you had difficulty understanding the reasoning of that sentence, then you should have written me about that difficulty. Please e-mail me what your difficulty was! ---------------------------------------------------------------------- You ask about the assumption Stewart makes in the Integral Test (p.716) and the Remainder Estimate (p. 718) that f is continuous. He assumes this so that we can be sure that the integral of f(x) makes sense -- see Theorem 3 on p. 373. In fact, it can be proved that any bounded monotone function is integrable, and using that fact, one can see that the continuity condition can be dropped from those theorems. But since discontinuous monotone functions are not important in a course at this level, while continuous functions (and the slight generalization of these referred to in the theorem on p. 
373) cover most of the functions we look at, Stewart only states the integrability result for that case; hence he has to put the continuity assumption into the two theorems you pointed to. ---------------------------------------------------------------------- You ask about the sentence at the beginning of the new heading on p.718 saying that "any partial sum s_n is an approximation to s ...". That statement doesn't have a precise mathematical meaning; it conveys the intuitive idea that the s_n are the "steps" toward the limit value s. Depending on the series in question and the value of n, the numbers s_n may be very near to s, or quite far. (But we know that for any convergent series, if we take n large enough, then s_n will be as close to s as we wish.) ---------------------------------------------------------------------- You ask why, in the proof of the Integral Test on p.720, the summation of the areas under the curve starts with a_2. The rectangles with x ranging from 1 to 2, and top sides above and below the curve, are used to give upper and lower bounds on the integral from x=1 to 2. Since f(x) is decreasing, it has its largest value in the interval [1,2] at x=1, and its smallest value at x=2. These values are f(1)=a_1 and f(2)=a_2, so those are the heights of those two rectangles, which have base 1. Hence the summation that bounds the integral above begins with a_1, the area of the taller rectangle, while the summation that bounds it below begins with a_2, the area of the shorter rectangle. ---------------------------------------------------------------------- You ask about Stewart's statement at (i) on p.720, that if the integral in question is convergent, then (4) gives an inequality involving a sum from 2 to n. Specifically, you ask why the sum begins at 2 rather than 1. Well, did you look back at formula (4) (the one labeled [4], in red, higher on the page) and see what it says?
And if, on looking at that formula, you were puzzled at why the subscripts begin where they do, did you look at how Stewart obtained that formula? After following the argument back, let me know what you understand, and what step(s), if any, need clarification, and I'll address these. (See the "Note" on the lower part of the back of the class handout, beginning "If in my office hours ...".) ---------------------------------------------------------------------- You ask about applying the Comparison Test (p.722) to a series one or more terms of which have forms like 1/0. If that happens, then what one has is not a series. The definition of a series requires that every term be a real number! Note that the series Stewart writes down always avoid such cases; e.g., p. 712, Ex.66 and p. 721, Ex.22 both start with values of n bigger than 1 to avoid the zero denominators. ---------------------------------------------------------------------- You ask whether, in the Limit Comparison Test (p.724), given (a_n) we can take any sequence for (b_n). (b_n) can be any sequence that satisfies the conditions of the statement, namely that its terms be positive and that the ratios a_n/b_n approach some nonzero limit. The idea is to look for a positive sequence (b_n) that is "similar" to (a_n) in the way it behaves (expressed by the ratios a_n/b_n approaching a nonzero limit) but is simpler in its form (so that one can tell more easily whether it converges or diverges). ---------------------------------------------------------------------- After a few questions on the estimation of sums discussed on pp.725-726, you ask > ... what is the use of sequences and series in general?? ...
Well, it can go in either of two directions: One can have a known mathematical entity and get a useful handle on it by finding a series that represents it; or one can have some mathematical situation leading to sums of terms that get closer and closer to some value, and try to learn about where it is heading by finding a simple expression for that value. In the beginning of this chapter, we focused on the latter idea, e.g., figuring out what 1 + 1/2 + 1/4 + 1/8 + ... was approaching; but we have already seen bits of the former. E.g., by expressing the known rational number 1/7 as the sum of an infinite series, we get the decimal expression 1/7 = .142857142857142857..., which is easy to compute with in our decimal system of notation; and Stewart mentions here and there that series expansions for pi allow one to find its value to great accuracy. Starting with section 11.9, we will be studying representations of functions by "power series", which gives us new information about functions like e^x and sin x, as well as non-elementary functions like the integral of e^{-x^2}. ---------------------------------------------------------------------- You ask about justifying the assumption that a_n is less than b_n for all n greater than some number N in working p.726, Ex.40. Whether one can take something like that for granted depends on the level of the audience for which one is writing (or, when one is a student being graded, the level of the course one is in). At this level, it is best to give the details. The way to do so is to use the definition of the statement that lim a_n/b_n = 0. That definition begins, "for every epsilon there exists an N such that ...". After filling in the "..." in what I have written, do you see what value of epsilon would have the effect that for n > N, a_n < b_n ?
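A numerical sketch of that last point may help. (The sequences a_n = 1/n^2 and b_n = 1/n below are my own illustrative choices, not the ones in Ex.40.) Once the ratio a_n/b_n drops below epsilon = 1, we automatically get a_n < b_n:

```python
# Illustrative (hypothetical) sequences with lim a_n/b_n = 0:
# a_n = 1/n^2 and b_n = 1/n, so a_n/b_n = 1/n.

def a(n):
    return 1 / n**2

def b(n):
    return 1 / n

# Taking epsilon = 1 in the definition of the limit gives an N
# (here N = 1) such that a_n/b_n < 1 for all n > N; since b_n > 0,
# that inequality is the same as a_n < b_n.
N = 1
for n in range(N + 1, 10000):
    assert a(n) / b(n) < 1
    assert a(n) < b(n)
```

The point is not the computation itself, but that epsilon = 1 is the value that turns the limit statement into the desired inequality a_n < b_n for n > N.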
---------------------------------------------------------------------- Noting that the Alternating Series Test (p.727) can be used to show that \sum_{n=1}^{infinity} sin(n\pi)/n converges, you ask whether \sum_{n=1}^{infinity} (sin n)/n converges. The answer is: yes, but we don't have the tools to prove this in Math (H)1B. The proof would use the generalization of the Alternating Series Test that I sketched near the end of class today: That if (b_n) is a decreasing sequence of positive numbers with limit 0, and if (e_n) is a sequence of numbers such that the set of partial sums e_1 + ... + e_n is bounded, then the series \sum b_n e_n converges. (The Alternating Series Test is the case where e_n = (-1)^n.) To answer your question we apply this with b_n = 1/n, and e_n = sin n. It isn't obvious that the set of partial sums sin 1 + ... + sin n is bounded, but when we get to reading #25, we'll have the tools to prove that; I'll put it on the homework sheet then. (I don't know whether I'll make it an assigned problem; that will depend on how many other good problems on Appendix H there are.) With a little ingenuity, one can also get a bound on the sums sin 1 + ... + sin n by geometric reasoning, without the ideas of Appendix H. ---------------------------------------------------------------------- You ask why, in Example 2, p.729, Stewart couldn't immediately say that the series diverged on finding that condition (ii) of the Alternating Series Test failed. In general, if a test "X ==> Y" fails, i.e., if the condition "X" is shown not to be true, one can't conclude that "Y" fails. So, for instance, in the case of the Alternating Series Test, a series which fails to satisfy (i) may or may not converge. However, you are right that a series that fails to satisfy (ii) can never converge, because failing to satisfy (ii) is equivalent to satisfying the condition in the "Test for Divergence". 
But I guess Stewart didn't feel that this would be obvious to the reader, and so gave a separate three lines of argument. ---------------------------------------------------------------------- Regarding Example 2 on p.729, you say > ... we tested the series using the Test for Divergence. I thought > that rule only applied for series with positive terms. ... Check out the statement of that Test! There's nothing in it about "positive terms". Stewart is generally a very careful writer. If he doesn't say in a theorem that the terms must be positive, then that is not assumed. ---------------------------------------------------------------------- You ask whether an alternating series can fail to converge if there are finitely many exceptions to the condition b_{n+1} \leq b_n. No. We know that convergence of a series is not affected by finitely many terms: For instance, \sum_1 ^infinity a_n = a_1 + \sum_2 ^infinity a_n, so if the sum from 2 to infinity converges, so will the sum from 1 to infinity. So if the condition for an alternating series applies from the second term on (or from the nth term on), the series will converge. Stewart states the Alternating Series Test for the case where all the b_n's are decreasing to make it easy to understand and remember. But he takes for granted that you can see that it still applies if there are finitely many exceptions, when he writes, on p.729, in the solution to Example 3, "all that really matters is that the sequence \{b_n\} is eventually decreasing". ---------------------------------------------------------------------- Regarding the error estimate for alternating series given on p.730, you write > ... I am always wondering what is the purpose of finding the size > of the error? If we want to know the sum of a series, and it's not one where we can find the exact answer, then the best we can do is add up a lot of terms and regard the result as an approximation of the sum.
Then we naturally want to know how good an approximation it is -- if we know that it has an error of less than .001, for instance, then we have essentially found the sum to 3 decimal places. Sometimes, even when we do "know" the sum, these error estimates give useful information. For instance, if the sum is pi, or ln 2, then by summing terms of the series, we can find the value of pi or ln 2 to many decimal places. Incidentally, we don't "find" the size of the error; we bound it. If we could find the exact size of the error, then we could find the exact value of the sum, by just adding the error to the partial sum. The best we can generally do is say that the error is less than some value. > Will it be a foundation of something else that we will learn in > the future? Not in this course. Very likely in Math 128, if you take it. ---------------------------------------------------------------------- You ask about the exact value of \Sigma (cos n)/(n^2), the convergence of which is proved in Example 3, p.733. I don't know; I suspect that it can't be expressed in terms of elementary functions. You also say that since the series is neither alternating nor all positive, you don't know how to estimate the sum. If by estimate you mean finding error bounds on the partial sums, that is not hard. Writing a_n = (cos n)/(n^2), note that |a_n| \leq 1/n^2 < 1/(n(n-1)) = 1/(n-1) - 1/n (for n>1). From this it is not hard to verify that |R_N| = |\Sigma_{n>N} a_n| \leq \Sigma_{n>N} [1/(n-1) - 1/n] = 1/N (summed as a telescoping series). ---------------------------------------------------------------------- Regarding the ratio test (p.734) you note that it applies an absolute value to the terms of the series, and ask "how can it account for an alternating series?" Some alternating series do converge by the ratio test; e.g., the series with a_n = (-1/2)^n. 
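To see the contrast numerically, here is a small Python check (my own illustration) of the ratio |a_{n+1}/a_n| for the series with a_n = (-1/2)^n, and, for comparison, for the alternating harmonic series a_n = (-1)^n/n:

```python
# |a_{n+1}/a_n| for two alternating series: one where the ratio test
# succeeds (limit 1/2 < 1) and one where it is inconclusive (limit 1).

def ratio(a, n):
    return abs(a(n + 1) / a(n))

def geometric(n):      # a_n = (-1/2)^n: absolutely convergent
    return (-0.5) ** n

def alt_harmonic(n):   # a_n = (-1)^n / n: converges, but not absolutely
    return (-1) ** n / n

print(ratio(geometric, 100))    # 0.5 (to machine precision): test gives convergence
print(ratio(alt_harmonic, 100)) # about 0.99: the ratios tend to 1, test inconclusive
```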
As you indicate, the absolute value throws away the "alternating" property; so these are among the series that would converge even without that property -- they are absolutely convergent. But if you take an alternating series which is not absolutely convergent, such as the one with a_n = (-1)^n/n, you will find that it falls under the case where "the ratio test is inconclusive". ---------------------------------------------------------------------- You ask about the author's statement on p.736, end of solution to Example 5, that lim_{n->infinity} (1+1/n)^n = e. Notice that after that equation, he writes "(see Equation 3.6.6)". That means the equation numbered [6] in section 3.6. Check it out. Does it answer your question? ---------------------------------------------------------------------- Concerning the fact mentioned on p.737, that a rearrangement of a conditionally convergent series can sum to any real number, you ask whether there are ways of rearranging such a series that will not change its sum. Yes. We know that rearranging just finitely many terms has no effect. It is also not hard to prove other special cases, such as that interchanging each odd-position term and the term after it won't change the sum. But it is very tricky to describe the most general permutation that will not affect the sum; we won't go into that. (One time when I was teaching Math 104, I thought about that question, and got an exact criterion; but it was something far too complicated to give even to a 104 class.) ---------------------------------------------------------------------- You ask how one would write a formula for the series [7] on p.737. The straightforward way would be to say that it is \Sigma a_n, where a_n is defined to be 1/n if n is even and 0 if n is odd. (There are other tricks that one could use, such as writing a_n = (1 + (-1)^n) / 2n. 
But the point I want to make is that if one wants to express the condition that 0 is used for every other term, one can simply say this in a precise way. Once one has learned that, then one can go on to tricks if they are helpful.) ---------------------------------------------------------------------- You ask why the convention on p.741 that (x-a)^0 is 1 when x = a is "valid". Any definition which says what we will mean by a symbol is "valid", as long as we follow it consistently, and reason correctly using it. One can ask whether it is consistent with other definitions we have made; but we have no other definition of x^y that applies to the case x=y=0. Assuming we follow and reason about our definitions consistently, the next question is whether a given definition is useful. In situations like this one, where it is clear what definition we want to use for "most" values of the argument, and we need to decide whether it would be useful to extend this definition to some cases where it is not obvious what our choice should be, we should ask which choice, if any, would make various general statements hold for the new cases as they do for the old. One condition that would be nice, but that we can't make hold for the function x^y when x = y = 0, is continuity as a function of both x and y: If we take x = 0 and let y -> 0+, the limit is 0, while if we take y = 0 and let x -> 0, the limit is 1. On the other hand, if we define positive integer powers of a number x by starting with x^0 = 1, and recursively letting x^{n+1} = x x^n, then this leads to a nice uniform development of the laws of exponents (for nonnegative integer exponents), which requires no exceptions for x = 0. The choice 0^0 = 1 is also convenient in that it allows us to express a polynomial or power series a_0 + a_1 x + a_2 x^2 + ... as Sigma a_n x^n.
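Incidentally, this is also the convention most programming languages build into integer exponentiation, which is what lets a naive "Sigma a_n x^n" evaluation return the constant term at x = 0. A small Python illustration (the polynomial here is an arbitrary example of mine):

```python
# Python adopts the convention 0**0 == 1, so evaluating a polynomial
# term-by-term as Sigma a_n * x**n gives the right value at x = 0:
# the x^0 term contributes a_0 * 0**0 = a_0, and all higher terms vanish.

coeffs = [7, 5, 3]  # arbitrary example: 7 + 5x + 3x^2

def poly(x):
    return sum(a_n * x**n for n, a_n in enumerate(coeffs))

print(0**0)     # 1
print(poly(0))  # 7, the constant term
print(poly(2))  # 7 + 10 + 12 = 29
```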
Note also that in dealing with power series, we never have exponents n that approach 0 via nonzero values (which is the situation in which the discontinuity of 0^y at y = 0 would make trouble), since no integer is closer to 0 than 1; but we can have x approaching 0 via nonzero values (whenever we look at the behavior of our series near 0). So of the two competing definitions of 0^0 that continuity considerations lead to, the choice 1, which makes x^y continuous in x, is more useful than 0^0 = 0, which would make x^y continuous in y. Depending on the area of mathematics, one may find it useful to define 0^0 as 1, or leave it undefined. In the majority of cases, including the study of power series, the definition 0^0 = 1 is by far the most useful. (Incidentally, this does not contradict the statement "0^0 is an indeterminate form". That is not a statement about the value of 0^0; rather, it is a shorthand way of saying that if two functions f(x) and g(x) both approach 0 as x-->a, then this is not enough information to determine lim_{x->a} f(x)^{g(x)}. The reason for this is the fact mentioned above, that no choice of definition for 0^0 makes the function x^y continuous at (0,0).) ---------------------------------------------------------------------- Concerning Stewart's statement on p.741 that in writing power series as \sum a_n (x-a)^n, we make the convention that (x-a)^0 = 1 even when x = a, you ask how it can be justified: "Isn't 0^0 undefined?" Well, whether something is undefined depends on whose definition one follows. Unlike most cases of exponentiation, there is not a completely obvious meaning for 0^0, but I think most mathematicians would agree that the optimal choice -- the definition that gives the most elegant and coherent system -- is to make 0^0 = 1. In particular, this choice is universally agreed on in the context of writing polynomials and power series.
Stewart cautiously says "we have adopted the convention"; but even then, if that is the convention adopted, that's what the symbol means in this course. One likely source of confusion is that "undefined" is sometimes used as shorthand for a statement about limits. It is quite true that if lim_{x->a} f(x) and lim_{x->a} g(x) are both 0, one cannot deduce from this what lim_{x->a} f(x)^{g(x)} is. This is sometimes turned into the slogan "0^0 is undefined", but that is not a statement about 0^0 as an arithmetic operation. Note that in studying power series or polynomials, exponents are always integers, so one never has exponents approaching zero through nonzero values; so the above fact about limits is irrelevant. Evidence pointing to 0^0 = 1 as a reasonable definition is that it allows the following elegant inductive definition of exponentiation: x^0 = 1 for all x, and x^{n+1} = x x^n. (Usually people start the definition of exponentiation with x^1 = x; but under the above definition, x^1 = x can be proved rather than being assumed.) Note that making x^0 the constant function 1 for all x makes the law for differentiating x^n work for n = 1, while leaving it undefined at x=0 would punch a hole in that law. ---------------------------------------------------------------------- You ask about Bessel functions. I don't know much about them myself -- just that they're in the realm of mathematical physics and (advanced) differential equations. Stewart isn't saying that you should know about Bessel functions and their orders, or that we'll learn about them in this course -- in Example 3, p.742, he is simply using this particular function, saying "here is a function of real-world importance ..." to illustrate how to test a power series for radius of convergence. He likewise gives the power series for the Bessel function of order 1 in Exercise 35 to this section. An example that would have led to a computation involving the same idea as Example 3, p. 
742 (how to deal with factorials), but with a simpler computation, would have been the power series for the function e^x. But, ironically, Stewart does not want to give that now, because in section 11.10 we will learn how to find that power series, and he doesn't want to claim here without explanation a result that we will get by computation later on. So instead, he gives us the series for a function that we have not seen before. ---------------------------------------------------------------------- Regarding Example 3 on p.742, where Stewart notes that x^2/4(n+1)^2 --> 0 for all x, you ask "what if x is big enough so that x^2/4(n+1)^2 > 1?" When Stewart writes "--> 0" here, he means "approaches zero as n approaches infinity, with x fixed". Now with x fixed, as n goes to infinity it eventually gets larger than x, larger than 10x, larger than 1000x, etc. So the fraction shown does indeed approach 0. If instead we were talking about the limit as x --> infinity, with n fixed, then the opposite would be true: x^2/4(n+1)^2 would approach infinity. You may well ask, "How do we know that we are talking about a limit as n approaches infinity with x fixed?" The answer lies in the situation we are considering. The concept of a power series involves a set of series, one for each value of x. For each such value of x, we imagine computing the terms of the series using that fixed x, and summing them over n. (Stewart says this in the second sentence of this section (on p. 741): "For each fixed x, the series [1] is a series of constants that we can test for convergence or divergence". I similarly emphasized in class that each value of x gives us a series that we sum, and that we then consider these sums a function of x. But in computing _each_ such sum, we use a fixed value for x. This is just like polynomials: In computing a value of 3x^2 + 5x + 7, we don't use different values of x in the "3x^2" and the "5x".)
So the arrow in "x^2/4(n+1)^2 --> 0" refers to what happens as n --> infinity, with x fixed. Make sense now? ---------------------------------------------------------------------- You ask why the ratio and root tests fail at the endpoints of the interval of convergence of a power series, as stated on p.743. Let's assume, for simplicity, that the series is centered at 0. Now if the power series converges by the root test at a point x, one can deduce that for some constant C and some r with 0 \leq r < 1, the n-th term of the series at x must have absolute value < C r^n. If this is so, let us choose some q\in (r,1). Then we find that at the point (q/r)x, the n-th term of the series will have absolute value < C q^n; so the series will also converge at that point. This shows that an x at which the series converges by the root test cannot have the largest absolute value among the points where the series converges; so such a point is not an endpoint of the interval of convergence. A similar argument shows that a point at which the series diverges by the root test cannot have smallest absolute value among points where the series diverges. The argument for a series centered at an arbitrary point a is similar; the only difference is that instead of multiplying x by q/r, we multiply x-a by that factor; i.e., take a new point whose distance from a is q/r times the distance of x from a. The same statement for the ratio test follows from the fact that if a series converges by the ratio test, it also converges by the root test. Finally, you ask whether this means that at the endpoints of the interval of convergence, the series is always conditionally convergent. No. It can be a series that is absolutely convergent but for which the ratio test fails (e.g., a p-series with p > 1). Or it could be a series which is conditionally convergent. Or it could be a series which is divergent, but for which the ratio test fails. (E.g., the series with terms n^k for any k > 0; or even any k > -1.)
You should be able to give examples of power series behaving in each of these ways at the endpoints of their intervals of convergence! ---------------------------------------------------------------------- You note that in Example 5, p.745 the ratio a_{n+1}/a_n simplifies to (n+1)(x+2)/3n; and you ask how one knows what to do next. In studying the properties of series, the thing one is looking at is how the terms change as n --> infinity. (Everything but n is a "constant" so far as the process of summation is concerned.) Likewise, in using the ratio test, you want to know how the ratio of successive terms changes with n -- whether it approaches some limit you can describe. The two terms depending on n in (n+1)(x+2)/3n are the n+1 in the numerator and the n in the denominator, so one separates them out and sees how they interact. This gives a factor (n+1)/n, which one can see approaches 1, by writing it 1 + 1/n. What is left is the factor (x+2)/3, and one sees that that will be the limit value. ---------------------------------------------------------------------- Regarding the end of Example 5 on p.745, you ask what test for divergence Stewart is using here. Notice that he writes "the Test for Divergence", in capital letters. This shows that "the Test for Divergence" is the name he has given a certain test, and not just a general description. So you can look that name up in the index of the book, and find the test he is referring to. ---------------------------------------------------------------------- You ask why the geometric series in today's reading (e.g., [1] on p.747) start with n=0, while in the earlier development of geometric series, Stewart started with n=1. It's not just about geometric series. In sections 11.1-11.7, Stewart indexed almost all the series he showed starting with n=1, while starting in section 11.8, where he introduced power series, he has begun his series with n=0. 
I assume he started his series in the earlier sections with n=1 because we are used to counting "1, 2, ...", so students would feel it unnatural for the subscripts used in a series to start with 0. But with power series (as with polynomials), we have terms corresponding to different powers of x, with the constant term corresponding to the 0-th power; so it is natural to start with n=0. A given series (and in particular, a geometric series) can be written either way; he has switched his way of writing general series as of section 11.8, and carried the case of geometric series along with this. ---------------------------------------------------------------------- Regarding Example 3, p.748, you ask why we are allowed to move an "x^3" past the \sum sign in a series, but can't move it past the \int sign in an integration. What you can't move past the integral sign is anything that depends on the variable of integration, often denoted x. Similarly, you can't move past the summation sign anything that depends on the variable of summation, typically denoted n. But if you were doing, say, \int xy dy (an integral with respect to y rather than x), where neither x nor y was a function of the other, then from the point of view of that integration, x would be a constant, and you could indeed rewrite the integral as x \int y dy, by the first formula in table [1] on p. 398. Similarly, in the example you ask about, x^3 does not depend on the variable of summation, n (i.e., it represents the same number in every term of the series), so you can move it past the summation sign by rule (i) of Theorem 8 on p. 709. ---------------------------------------------------------------------- Regarding Note 2 on p.749, you ask > ... Does this mean that the whole interval may change when it is > differentiated like from 0 ... if there is a general way to set the error to be less than > something... or will it definitely depend on the form of the series?
In general, it will depend on the form of the series. However, when |x| < R, we know that the absolute values of the terms of the series will always be less than or equal to those of some geometric series with 0 < r < 1, so one can use the fact that the terms will be less than those of that series. But the estimate we get may not be the best estimate there is, and this method won't work at the ends of the interval of convergence, i.e., when |x| = R. By the way, where you write "to set the error to be less than something", the easy way to express this is "to bound the error" or "to get a bound on the error". ---------------------------------------------------------------------- You ask about proving the formula at the bottom of p.753 by induction. Induction is the right way to get a solid proof of the result. Do you see how the induction would work? You can't just say "Assume f^{(k)}(a) = k! c_k" and prove the k+1 case from that -- in the formulas f(a)=c_0, f'(a)=c_1, etc. on that page, each of these formulas isn't deduced from the one before. Rather, it is the formulas [1], [2], [3], [4], each of which is deduced from the one before. So you have to figure out the general formula of which [1], [2], [3], [4] are the cases n=0,...,3, and show how to prove the n=k+1 case from the n=k case; and then deduce f^{(n)}(a) = n! c_n by taking x=a in the n-th case of that formula. ---------------------------------------------------------------------- Regarding the argument leading up to Theorem 5 on p.754, you ask "Why can we assume x = a?" We are assuming formula [1] on p. 753, which states that the given equation is true for all x with |x-a| < R. (Stewart just writes "|x-a| < R" rather than the fuller statement "for all x with |x-a| < R", because by this point, having studied the concept of radius of convergence, we can recognize that this is the kind of condition meant.) It is noted that the consequences [2], [3] and [4] likewise hold for all x in that range.
And a statement true for all x in that range holds in particular for x = a, since a is in the range. That is what Stewart means when he speaks of putting x = a in those formulas. ---------------------------------------------------------------------- You ask about the convention 0! = 1 stated on p.754, just before the Theorem. There are different ways to explain that. One of them is to say that the simplest way of defining n! is not the way you have seen, but to start with 0! = 1, and then say that for every n>0, (*) n! = (n-1)! n. You can check that this will give the values you are familiar with for n = 1, 2, ... . But if we started by defining 0! to be any value C different from 1, then the above rule would not give the values you are familiar with. So to get those values, and have a nice consistent relation between the factorials of different numbers, we need to take 0! = 1. Another is to suppose we know the factorials of all integers greater than 0, and wonder whether we can choose a value for 0! that will relate to them in a natural way. Factorials of successive positive integers are related by formula (*) above, so we try to define 0! to be the value that will make (*) work for n=1; and that gives us 1. Still another way is to note that n! is the number of ways a set of n things can be put in order. For both n=1 and n=0, there is only one order one can use; so both 1! and 0! should be 1. One way or another, the choice 0! = 1 makes (*) and many other mathematical properties of factorials work right for n=0, so that is made the definition. Remember that Stewart has called (*) a "convention". That means a choice that people agree to follow. When a definition that you have used before doesn't give any answer in some case, it is reasonable to make a convention that will define the concept in that case, if the choice one makes has properties that are convenient to reason with. 
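The first explanation above -- start from 0! = 1 and build upward by (*) -- translates directly into a short recursion; Python's own math.factorial follows the same convention (a minimal sketch):

```python
import math

def factorial(n):
    """n! defined by the recursion (*): 0! = 1, and n! = (n-1)! * n for n > 0."""
    if n == 0:
        return 1  # the convention 0! = 1 is the base case
    return factorial(n - 1) * n

values = [factorial(n) for n in range(6)]
print(values)  # [1, 1, 2, 6, 24, 120]
assert all(factorial(n) == math.factorial(n) for n in range(10))
```

Had we started the recursion with any base value other than 1, the familiar values 1, 2, 6, 24, ... would all come out wrong, which is the first argument above in computational form.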
When one makes such a convention, one isn't saying that the old definition gives the new answer; one is agreeing to use an extended definition. ---------------------------------------------------------------------- Concerning Taylor series (p.754) you ask how an infinitely differentiable function could fail to be given by that series -- how it could "grow away from the power series" if all their derivatives are equal. Well, all their derivatives are equal at the point a about which one is taking the Taylor series; but they don't stay the same. Consider the function e^{-1/x^2} (made 0 at x=0) of exercise 74 (p. 766). Its value at 0 is 0; to grow away from this, it must have nonzero derivative as one moves away from 0. But its derivative at 0 is 0, so to get the derivative to have some nonzero value close to 0, it must have a larger value for the second derivative in between. But its second derivative at 0 is also 0, so to get the second derivative to have the above largish nonzero value close to 0, it must have a still larger value for the third derivative in between. And so on. And if you start computing the various derivatives of e^{-1/x^2}, you find exactly this phenomenon: Although they all go to 0 at x=0, they become large at a faster and faster rate very close to 0, as you go to higher and higher derivatives. On the other hand, for most familiar functions, one can prove that the successive derivatives don't grow very fast, so the function is in fact given in a neighborhood of each point by its Taylor series. Incidentally, your intuition that an infinitely differentiable function will be determined by its derivatives at a point turns out to be right (when properly stated) for functions of a complex variable (rather than a real variable). You will see this if you eventually take Math 185. 
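(If you'd like to see this numerically, here is a little Python computation -- my own illustration, nothing from Stewart -- showing that e^{-1/x^2} shrinks faster than any fixed power of x as x approaches 0, which is the numerical shadow of the fact that every Taylor coefficient of this function at 0 is 0, even though the function is not identically zero.)

```python
import math

def f(x):
    # The function of Exercise 74: e^{-1/x^2} for x != 0, defined to be 0 at x = 0.
    return 0.0 if x == 0 else math.exp(-1.0 / x**2)

# The ratio f(x)/x^10 tends to 0 as x -> 0: near 0, f is eventually
# smaller than x^10, or indeed than any power of x.
for x in [0.3, 0.2, 0.1]:
    print(x, f(x), f(x) / x**10)
```

The same experiment with x^20 or x^100 in place of x^10 shows the same collapse of the ratio, just starting at smaller x.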
---------------------------------------------------------------------- Regarding the material on pp.754-758 you ask > What is the difference, or advantage, of writing the sum of the > Taylor series of e^x centered at a=0 versus a=2? If you want to approximate the values of e^x for x near 0, the series centered at a=0 is the useful one; if you want to approximate the values for x near 2, the series centered at a=2 is best. ---------------------------------------------------------------------- You ask about the statement on p.755 that f(x) is the sum of its Taylor series if it is the limit of the T_n(x). Remember that a number is the sum of a series if it is the limit of the partial sums of that series (p. 705, Definition 2). At the bottom of p. 755, the values T_n(x) are defined to be the partial sums of the Taylor series. Putting these two facts together, we get the statement you asked about. ---------------------------------------------------------------------- You ask what happens in the context of Theorem 8 (p.756) for |x - a| not less than R. Stewart's argument really shows that at any point x, f(x) is given by its Taylor series about a if and only if the indicated limit statement holds. He puts in the condition "|x - a| < R" because that is the type of condition that typically determines a region where such a statement holds, and hence the kind of region on which you will typically be applying the theorem. But it is not essential to the theorem. Wherever the limit statement holds, f(x) will equal the sum of its Taylor series. ---------------------------------------------------------------------- Regarding the sentence between the two theorems on p.756, you ask whether we are taking the lim_{n->infinity} R_n (x) = 0 for granted, saying "It would be convenient if this was true" and then proving it in the next part. 
Basically, right; but where you say we prove it "in the next part", the next theorem gives us a tool for proving it, which works for lots of familiar functions. But Exercise 74 (p. 766) shows that it isn't true for all functions. Regarding the second theorem, you ask > Is M a constant? and how is d determined? For any given function f, we hope to find some interval [a-d, a+d] in which |f^{(n+1)}(x)| is bounded by some (not too big) constant M. Then we can apply the theorem. As I said above, being able to do so depends on the function. ---------------------------------------------------------------------- Regarding Theorem 9 on p.756, you ask "What if |f^{(n+1)}(x)|\geq M ?" That theorem says that if f is a function, and a, d and M are constants such that the inequality |f^{(n+1)}(x)| \leq M holds for all x with |x-a| \leq d, then the stated conclusion holds. If f is a function, and a, d and M are constants such that the reverse of the above inequality holds, we could get the reverse of the inequality in the conclusion of the theorem, by roughly the same method I used in class to prove the theorem. But this would seldom be of much use. When one applies the theorem, one's hope is to find a not-too-large M that satisfies the stated conditions; and in fact to get for each n an M_n with that property; and then verify that these make Theorem 8 applicable for appropriate values of x. Getting the reverse inequality might occasionally be of use in showing that a function is not the limit of its Taylor series. ---------------------------------------------------------------------- Regarding the proof of the n=1 case of Taylor's Inequality on p.756, you note > the book says that a < x < a + d... but if |x-d| < d shouldn't > it be a - d < x < a + d ? why are they taking x > a ? The calculations for x > a and for x < a are mirror images of one another. He first does the x > a calculation, then says in the middle of the next page that "similar calculations" handle the case x < a. 
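(Here is a quick Python check of Taylor's Inequality -- my own sketch, not anything in the text -- for f(x) = sin x about a = 0. Every derivative of sin is one of +-sin, +-cos, so we may take M = 1 on any interval, and the inequality says |R_n(x)| \leq |x|^{n+1}/(n+1)!.)

```python
import math

def taylor_sin(x, n):
    # The Taylor polynomial T_n(x) of sin x at a = 0: odd-degree terms up to degree n.
    return sum((-1)**((k - 1)//2) * x**k / math.factorial(k)
               for k in range(1, n + 1, 2))

# With M = 1, Taylor's Inequality bounds the remainder by |x|^{n+1}/(n+1)!.
x, n = 1.3, 5
remainder = abs(math.sin(x) - taylor_sin(x, n))
bound = abs(x)**(n + 1) / math.factorial(n + 1)
print(remainder, bound)   # the remainder should not exceed the bound
```

Increasing n makes both numbers shrink, the bound factorially fast, which is how Theorem 8 then gives convergence of the series to sin x for every x.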
---------------------------------------------------------------------- You note that a few lines after Theorem 9 on p.756 it says that an antiderivative of f'' is f', and that this implies that f'(x)-f'(a)<=M(x-a). You say > I do not really know where this comes from, is there any direct > relationship between the antiderivative that the book mentions in > the past sentence with this equation? Yes! The book says "by Part 2 of the Fundamental Theorem of Calculus". Did you look up (or remember) what Part 2 of the Fundamental Theorem of Calculus says, and ask yourself how it might be applicable to this implication? I am not answering in this way to be difficult, but to point out to you the way you need to read the text in order to do well in a math course. Let me know whether you had already checked out what Part 2 of the Fundamental Theorem of Calculus said, and thought about whether it was relevant, or not; and in either case, how far you get with the connection, and whether you still need help with it. ---------------------------------------------------------------------- You ask whether Taylor's Inequality (p.756) proves that the Taylor series converges to f(x). The M in the hypothesis of Taylor's theorem can depend on n. To prove that the series converges to f(x) for a given f and x, we need to see how this M grows as a function of n. The answer depends on the function f, and the values of a and x. ---------------------------------------------------------------------- You ask how mathematicians come up with non-obvious results like Taylor's Inequality (p.756). Mathematical research is an exciting and difficult endeavor, and everyone has their own ways of coming up with ideas and proofs. The fact that Taylor's inequality is likely to be true is easy to guess: Because the x^0 through x^n terms of the Taylor series are chosen to make T_n(x) have value, first derivative, etc. 
through the n-th derivative matching those of f(x) at x=a, the remainder R_n(x) will have all those derivatives 0 at x=a, so the process by which R_n(x) moves away from 0 can be thought of as one where its (n+1)-st derivative allows its n-th derivative to move away from 0, the n-th derivative allows the (n-1)-st derivative to move away from 0, etc., with the movement of the first derivative away from 0 allowing R_n(x) itself to move away from 0. Now suppose we know a bound M on the absolute value of the (n+1)-st derivative. Intuitively, this process of each derivative allowing the next to move away from 0 should have largest possible effect if the (n+1)-st derivative is everywhere equal to M, or everywhere equal to -M. Taking the former case for simplicity, it's easy to compute, by successive integrations, that in that case, the n-th derivative of R_n will be M(x-a), the (n-1)-st derivative will be M(x-a)^2/2, and so forth, ending up with R_n(x) = M(x-a)^{n+1}/(n+1)!. Since this is what happens in the case where R_n is expected to grow as fast as is allowed by the restriction |R_n^{(n+1)}(x)| \leq M, we can expect that in general, if |R_n^{(n+1)}(x)| \leq M then |R_n(x)| will be \leq that value. But how someone came up with the proof by repeated application of Rolle's Theorem, I can't guess. I myself have two metaphors for math research, as I experience it: (1) Banging your head against a wall until you find a soft spot in the wall. (2) Playing. ---------------------------------------------------------------------- Regarding the computations of Example 2, p.757, you write > ... I am confused about how this proves that the function is equal > to the sum of its Taylor series for all x ... It's because of equation 10 on p. 757. Note the words after that equation, "for every real number x". If the denominator were, say, 5^n, the limit statement would just be true for |x| < 5, but with a factorial in the denominator, it is true for all real numbers. > ... 
and not just a set radius of convergence. ... Recall Theorem 3 on p. 743: a power series may have a finite radius of convergence, or it may converge for all x. Returning to the first part of your question -- did you read the justification carefully, and see the explanation "from Equation 10" preceding the second display of Example 2, p. 757? ---------------------------------------------------------------------- > Is it possible to use a power series to prove or disprove the > rationality of a number? In a roundabout way, yes. Power series lead to the formula [12] for e on p.757. From this we can prove that e is irrational, as follows: Note that if r = a/b is a rational number, where a and b are integers, and we approximate it by some other rational number p/q not exactly equal to a/b, then the error E = a/b - p/q can be written with denominator bq, hence, since it is not zero, its absolute value must be at least 1/(bq). This says that |Eq| is at least 1/b. So for every rational number r, there is a fixed constant c > 0 (namely, 1 over the denominator of r) such that whenever we approximate r by another rational, the error times the denominator of that approximating rational is at least c. On the other hand, if we approximate e by r = 1 + 1 + 1/2! + ... + 1/n!, this is an expression that can be written with denominator n!, and one can show that the error is only slightly more than the next term 1/(n+1)!. Hence the error times the denominator of our approximation is roughly n! . 1/(n+1)! = 1/(n+1). Now as we take n larger and larger, this does not remain \geq any positive constant c. So e cannot be rational. (I've left out the details of why the error is only slightly more than 1/(n+1)!. It's not hard; but to get around that, one can consider, instead, e^{-1} = \sum (-1)^n / n!. Because this is an alternating series of decreasing terms, the error will always be _less_ than the preceding term, and the above argument works without extra details. 
And, of course, knowing that e^{-1} is irrational, we can conclude that e is.) ---------------------------------------------------------------------- You ask how one can prove that the Maclaurin series for e^{2x} found in two ways -- by substituting 2x for x in the Maclaurin series for e^x (Example 3, p.757), and by direct computation using the derivatives of e^{2x} -- must be the same. Well, we know that e^x is given by its Maclaurin series, which means that the equality e^x = \sum x^n/n! holds for all x; hence, substituting 2x for x, we get a power series in x that represents e^{2x}. If we apply Theorem 5 on p. 754 to the case f(x) = e^{2x} and the series found as above, it shows that the coefficients of that series must come from the derivatives of e^{2x} by the formula at the end of that theorem; i.e., that they are the coefficients obtained by direct computation. So the two series must be the same. (In this case, with e^{2x}, it's not particularly hard to get those coefficients directly; but your question could equally be applied to e^{x^2} -- cf. Example 11, p. 762 -- and there it is much harder to do so.) ---------------------------------------------------------------------- You ask why Stewart expands sin x about the particular point pi/3 in Example 7 on p.759. He wants to illustrate the fact that one can use these series, even when the calculation is hairy. The point pi/3 is a natural one, since it represents 60 degrees, and it has a sine and cosine given by simple expressions. If one wants to compute values of a function near a particular point x=a, the power series centered at a is much more useful than series centered at more distant points. It converges more quickly, since the terms (x-a)^n become very small. So for applications one needs to be ready to expand about any a. ---------------------------------------------------------------------- You ask how Stewart gets the formula for tan^{-1} x in the chart on p.762. 
(Incidentally, your question of the day is supposed to specify the page, and item on that page, that you are asking about. I had to hunt around a bit to find what page your question must be referring to.) He got the formula on p. 750, in Example 7. (I'll suggest that he insert here a reference to that example.) ---------------------------------------------------------------------- You ask why, in the chart on p.762, the terms in the series for tan^{-1} x don't have factorials in the denominators, as most of the other series do. Those factorials come in when the series is computed using the formula [7] on p. 754. The power series for tan^{-1} is obtained by a different method in Example 7, p. 750. Of course, it could also be computed using formula [7] on p. 754; but that computation would be messy, and in the end, the factorial in each denominator would interact with another factorial in the numerator, and give the quotient 1/(2n+1). ---------------------------------------------------------------------- You ask whether long division, as in Example 12(b)[13], p.764, is the only way to find the power series for tan x. No. Formula 6 (p. 754) is also applicable to tan x. But there is no simple formula for the n-th derivative of tan x, while there are such formulas for sin x and cos x. Using Formula 6, one can compute as many terms as one wants of the series for tan x; but since there isn't a general formula, it's easier to compute them from the series for sin x and cos x, which do have general formulas. ---------------------------------------------------------------------- Regarding multiplication and division of power series, illustrated in Example 13 p.764, you ask how many terms one should take. Stewart says at the end of the first paragraph that he will "only find the first few terms because the calculations of later terms become tedious ...". Typically, there is no simple formula for the general term, so no "pattern" emerges from further calculation. 
Any number of terms can be computed, and in real life, one would compute however many one wants for one's purposes. Stewart just computes enough to illustrate the method of computation. ---------------------------------------------------------------------- You ask why, as Stewart says on p.769, last sentence of first paragraph, the sequence (T_n(x)) converges more slowly to e^x the farther x is from 0. This is because the terms that are left out of T_n(x), i.e., x^{n+1}/(n+1)! + x^{n+2}/(n+2)! + ..., are larger the bigger x is. More generally, see Taylor's inequality on p. 756, where the larger |x-a| is, the larger (and hence the weaker) the bound on |R_n(x)| is. ---------------------------------------------------------------------- Regarding the discussion of approximating the function e^x on p.769, you ask how we can find an M to use in getting error bounds, given that e^x (and its derivatives, which, as you note, are all e^x) is unbounded on its domain, the real line. We can't find one M that will bound e^x. But for x in any given range, e.g., (-1,1) or (100,1000), there will be upper bounds on those derivatives, depending on the range. So we can get results saying that for x within a given finite range, such-and-such many terms of the Taylor series are enough to approximate e^x to within a given error. ---------------------------------------------------------------------- > P. 769, bottom, part(b) solution of Example 1 ... > ... Why does Stewart point out that the series is not alternating > when x < 8? After the first two terms, the coefficients c_n do alternate in sign. (At the first step, one is differentiating a positive power of x, x^{1/3}, but after that step, one is always differentiating a negative power, e.g., x^{-2/3}, x^{-5/3}, etc., so it always brings in a negative factor.) So one might think that the series becomes alternating after the first term. 
This works for x > 8, since then (x-8)^n is positive; but for x < 8, we have (x-8)^n switching signs at each step, so the products c_n (x-8)^n keep the same signs after the first step, so the series is not alternating. ---------------------------------------------------------------------- Regarding Example 1, p.769, you ask how we know the radius of convergence of the cube root of x around x=8. Actually, Stewart never says that we know the radius of convergence. One can form the Taylor series and ask how closely a given number of its terms approximates a function without knowing that it converges, or, if it does, that the limit is the function. (For an example where, though the Taylor series does converge, it doesn't converge to the given function, recall the function e^{-1/x^2}, whose Taylor series is 0; and this does approximate that function quite closely for very small values of x.) However, we do have the tools to answer your question, and I'm surprised that Stewart doesn't mention this. Since we are thinking of x^{1/3} relative to the point x=8, we can write it as ((x-8) + 8)^{1/3}, and simplify this to 2((x/8 - 1) + 1)^{1/3}. If we make the change of variables y = x/8 - 1, then what we have after the factor 2 is (y+1)^{1/3}, which we can expand by the binomial series. This has radius of convergence 1 as a series in y; hence as a series in x = 8y + 8 it has radius of convergence 8, and interval of convergence (0, 16) possibly with one or both endpoints thrown in. (It does converge at both endpoints, but I won't go into details.) Indeed, in terms of the binomial series, we have 2(y+1)^{1/3} = 2 (1 + (1/3)y + (1/3)(-2/3)y^2/2! + (1/3)(-2/3)(-5/3)y^3/3! + ...) = 2 + (2/3)y - (2/9)y^2 + (10/81) y^3 + ... = 2 + (2/3)(x/8 - 1) - (2/9)(x/8 - 1)^2 + (10/81)(x/8 - 1)^3 + ... = 2 + (1/12)(x - 8) - (1/288)(x - 8)^2 + (5/20736)(x - 8)^3 + ... which matches Stewart's series as far as it goes. 
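(For anyone who wants to check the above computation by machine, here is a short Python sketch -- my own, not from the text -- that builds the partial sums of 2(1+y)^{1/3}, y = x/8 - 1, directly from the binomial coefficients (1/3 choose k), and compares them with the actual cube root.)

```python
def cube_root_series(x, n_terms):
    # Partial sum of x^{1/3} = 2(1+y)^{1/3} with y = x/8 - 1,
    # expanded by the binomial series with exponent 1/3.
    y = x / 8.0 - 1.0
    total, coeff = 0.0, 1.0        # coeff runs through (1/3 choose k)
    for k in range(n_terms):
        total += coeff * y**k
        coeff *= (1.0/3.0 - k) / (k + 1)   # next binomial coefficient
    return 2.0 * total

# For x inside the interval of convergence (0, 16), the partial sums
# approach x^{1/3}:
for n in [2, 4, 6]:
    print(n, cube_root_series(9.0, n))
print(9.0 ** (1.0/3.0))
```

Trying x outside (0, 16), say x = 20, shows the partial sums wandering off instead of settling down, in line with the radius of convergence 8 found above.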
---------------------------------------------------------------------- You ask why, as stated on the last line on p.772, "If v is much smaller than c, then all the terms after the first [in the last formula on the page] are very small when compared with the first term". To say "v is much smaller than c" is to say "v/c is very small", i.e., close to zero; so higher powers of v/c are very small compared with lower powers. (E.g., the 4th power is (v/c)^2 times the 2nd power, so the ratio between them, (v/c)^2, the square of a number very near 0, is very near zero. Putting in moderate-sized coefficients such as 1/2, 3/8 etc. doesn't significantly affect things when v/c is *very* small, e.g., around 10^{-6} in part (b) of this example.) ---------------------------------------------------------------------- You ask how Gauss's "cos \phi = 1" simplification in equations [2] at the bottom of p.773 leads to equation [3] on the next page. When one puts cos \phi = 1, then the expressions under the square root signs in [2] become squares. One can see this for the first of those equations by rearranging the term under the square root sign (after setting cos \phi = 1) as R^2 - 2R(s_0+R) + (s_0+R)^2 = (R - (s_0+R))^2 = (-s_0)^2 = (s_0)^2, or simply by expanding out, and noting that everything cancels but (s_0)^2. Similarly, the expression under the second square root gives (s_i)^2. Hence, the equations in [2] become l_0 = s_0 and l_i = s_i. Now substituting these into [1] we get [3]. (But I don't know how [1] is derived.) ---------------------------------------------------------------------- You ask whether there are common applications of Taylor polynomials in fields other than physics (discussed on pp.772-774). Probably; especially in fields closely related to physics, such as chemistry and astronomy/cosmology. The reason physics is the most obvious place is that it has exact mathematically expressible laws, so one knows when one is replacing these laws by approximations. 
In something like biology, the mathematical models are approximations anyway, so that rather than going from an exact law to an approximation, one simply goes from one model to another. Insofar as chemistry and astronomy are based on physics, they too have exact laws. Still, there may be other areas that I just don't know about. [sent later:] I was just looking online for something else (in relation to another student's question of the day), and I ran into a book where Taylor series approximations are applied to a different field -- finance! See http://books.google.com/books?id=o3w4ilXdIGgC&pg=PA219 ---------------------------------------------------------------------- You ask how Stewart gets equation [3] on p.774. When he replaces cos \phi by 1 in equations [2], the expressions under the square roots become the squares of R - (s_o +R), respectively R - (s_i +R), so those equations simplify to l_o = |R - (s_o + R)| = s_o and l_i = |R - (s_i + R)| = s_i. So in [1], all the denominators l_o and l_i become s_o and s_i, and this gives [3]. ---------------------------------------------------------------------- You ask about the origin of the term "homogeneous" in connection with differential equations, as used on p.1142. I discussed this in class last week. First note that one says that a polynomial in variables w, x, y, z, ... is "homogeneous" if all terms with nonzero coefficients have the same degree; e.g., w^2 - xz is homogeneous because all terms have degree 2. Next, one can talk about a polynomial being homogeneous in some subset of the variables. E.g., if one looks at x^2 - wxyz as a polynomial in all four variables, it is not homogeneous; but if one considers only its dependence on x and y, then all terms have degree 2 in those two variables, so it is homogeneous in x and y. 
Finally, when one considers a linear differential equation P_n(x) y^{(n)} + P_{n-1}(x) y^{(n-1)} +...+ P_1(x)y' + P_0(x) y = G(x), it is most useful to consider it as a polynomial in y, y',..., y^{(n)}, and hence to ask whether it is homogeneous in those variables. All the terms on the left have degree 1 in those variables, while the term on the right has degree 0 in them; so (assuming that the P_m are not all zero, as we must for this to be a differential equation) our equation is homogeneous if and only if the term on the right can be ignored in these considerations, i.e., is zero. ---------------------------------------------------------------------- You ask whether the c_1 and c_2 in Theorem 4 on p.1143 can be taken to be real numbers, or whether we need to use complex numbers. If y_1 and y_2 are real-valued solutions, and we want all real-valued solutions, it suffices to take c_1 and c_2 real. But if we allow complex-valued solutions, then we need to allow c_1 and c_2 to be complex constants. (In particular, in the context of statement [11] on p.1145, if we had somehow discovered the real-valued solutions e^{\alpha x} cos \beta x and e^{\alpha x} sin \beta x directly, then to get all real-valued solutions, it would be enough to combine these using real coefficients, as the solution given there shows. But since the process we used involved starting with the more natural complex-valued solutions e^{(\alpha + i\beta) x} and e^{(\alpha - i\beta) x}, we had to use complex-valued coefficients C_1 and C_2, then derive the expression in terms of e^{\alpha x} cos \beta x and e^{\alpha x} sin \beta x and real coefficients from these.) ---------------------------------------------------------------------- You ask why the key to proving Theorem 3 on p.1149 is showing that if y is a solution to [1], then y - y_p is a solution to the complementary equation. Well, notice that y = y_p + (y-y_p). 
So if y being a solution to [1] makes y-y_p a solution to the complementary equation, then the above displayed equation says that y has the form shown in the theorem. ---------------------------------------------------------------------- You ask whether Theorem 3 on p.1149 also works for n-th order linear equations with constant coefficients. It works for all n-th order linear equations, whether the coefficients are constant or not! (And I would have expected you to have been able to figure this out for yourself -- just try working through the proof that Stewart gives on p. 1149 for the general case, and see whether it works!) ---------------------------------------------------------------------- Regarding the method of undetermined coefficients (pp.1149-1153) you ask whether this can be used when G(x) is some trig function other than sin x or cos x, such as tan x or csc x. Essentially, the answer is no: As I described in class, that method depends on having a finite-dimensional vector space of functions that is taken into itself by the operator d/dx; and the ones that Stewart listed are the only such spaces. (If a space of functions containing tan x is closed under differentiation, it will contain all the derivatives of tan x, and these will yield an infinite-dimensional vector space, showing that the method can't be used.) That said, it is conceivable that there may be ways of using the idea of undetermined coefficients that don't depend on having such a finite-dimensional vector space -- some trick that will give one a space that one is sure will contain tan x, for instance, for some reason other than the one I described. But I don't know any such tricks. ---------------------------------------------------------------------- You ask about the sentence in the middle of p.1151 beginning "If G(x) is a product of the functions of the preceding types...". 
Stewart has talked about three "types" of function G(x): polynomials, functions A e^{rx}, and functions A cos kx + B sin kx. If one now looks at x cos 3x, this is a product of a polynomial of degree 1 and a function "A cos kx + B sin kx" with k=3. So what that sentence means in this case is that you should try something that combines these two sorts of solutions, "polynomials of degree \leq 1" and "functions A cos 3x + B sin 3x", using multiplication. What he shows you is (Ax+B)cos 3x + (Cx+D)sin 3x. Strictly speaking, this is not a product of a function ax+b and a function A cos 3x + B sin 3x; but each of the terms (Ax+B)cos 3x and (Cx+D)sin 3x is such a product, so though his wording is imperfect, I hope the meaning is now clear. ---------------------------------------------------------------------- You ask why, in the method of variation of parameters (p.1153), "we are able to replace constants with an arbitrary function". The constants appear in the solution to the homogeneous equation. We are looking for solutions to a nonhomogeneous equation; so as a "trial balloon" for a solution (which will turn out to work, with a little trickery), Stewart suggests we look at a similar formula with functions instead of constants. I hope to discuss a better way of coming up with that approach in class, when there is time. ---------------------------------------------------------------------- Regarding the development of the method of variation of parameters on pp.1153-1154, you ask why the condition u_1' y_1 + u_2' y_2 = 0 is "valid". I meant to answer this in lecture, but realized, after I left, that I hadn't really. Equation [5] on p. 1154 gives too many ways of representing any function: If a certain function y_p can be represented as y_p(x) = u_1(x) y_1(x) + u_2(x) y_2(x), then for any function v, we could take w_1(x) = u_1(x) + v(x)/y_1(x), w_2(x) = u_2(x) - v(x)/y_2(x), and then we would also have y_p(x) = w_1(x) y_1(x) + w_2(x) y_2(x). 
Thus w_1(x) and w_2(x) would lead to the same y_p as u_1(x) and u_2(x) do. So we need to put some additional restriction on u_1 and u_2 to make them anywhere near unique, so that we can solve for them. Stewart suggests equation [7] as an additional condition that we might put on them. We can't be sure, a priori, that among all the pairs of functions satisfying [5], there will be a pair that also satisfies [7]. But if there is one, then, just because it satisfies [5], it will lead to a solution to our differential equation. And it is plausible that there should be a solution that also satisfies [7], because two equations in two unknowns tend to be right for giving a unique solution. Stewart follows up the consequences of assuming [7], and gets an equation [9] which, together with [7], does indeed uniquely determine u_1' and u_2', allowing one to get u_1 and u_2 by integration. ---------------------------------------------------------------------- You ask why there must exist u_1 and u_2 satisfying equations [7] and [9] on p.1154. In general, two linear equations in two unknowns have a unique solution, so that one can solve for u_1' and u_2' in those equations, and integrate the results to get u_1 and u_2. But the above sentence began with "In general". The situation where it does not hold is if the coefficients in the two equations are linearly dependent: equations ax + by = p, cx + dy = q (where a,b,c,d,p,q are given, and x, y are to be found) don't necessarily have a solution if the pair (a,b) is a scalar multiple of the pair (c,d) or vice versa; equivalently, if the pair (a,c) is a scalar multiple of the pair (b,d) or vice versa. (You should check that these conditions come to the same thing.) In this situation, the latter pairs of coefficients are (y_1(x), a y_1'(x)) and (y_2(x), a y_2'(x)). Suppose the first of these pairs were C times the second for some value x = x_0. 
That would mean that y_1 and C y_2 satisfied the same initial-value conditions at x_0, so they would be the same function, so y_1 and y_2 would not be linearly independent solutions. So in conclusion: if y_1 and y_2 are indeed linearly independent solutions of the complementary equation, then equations [7] and [9] will have a unique solution (u_1',u_2'). ---------------------------------------------------------------------- Regarding Stewart's statement on p.1154, after equation [6], that since u_1 and u_2 are arbitrary functions, we can impose two conditions on them, you ask how they can be arbitrary functions when they are the things that we are trying to solve for. What Stewart means is, roughly speaking, "If we consider two functions (without any restrictions other than those we are about to impose), we can determine them by imposing two conditions on them". That is a very vague statement, and under many possible interpretations it will be wrong, but the idea is right -- "two unknowns will be determined by two conditions" (even when they are unknown functions). Then the point that he makes is that the problem we have set ourselves imposes one condition, but that is not enough to determine the functions (I gave examples of that in class). There will in general be lots of pairs of functions u_1, u_2 that give the same y_p, but we don't know how to find even one of them. So, his idea is, let us choose a second condition, so as to make u_1 and u_2 unique, and then solve those two equations to find them. If we chose the other condition at random, we might end up with a system of equations that would be as difficult to solve as the original equation. But a lucky or inspired choice will make things much easier. As I said, I will, when I have the chance, give what I hope is a more insightful way of discovering this technique. 
You also asked whether this same method could be applied to first-order linear differential equations, treating those as second-order equations with the coefficient a equal to 0. No, because the technique involves working with the two linearly independent solutions of the differential equation, and when a = 0 it has only one. However, if we look at the underlying idea, which was, in the second-order case, to take a general linear combination u_1 y_1 + u_2 y_2 with nonconstant coefficients, then in the first-order case, that says to start with a solution y_1 of the homogeneous equation, and then take an expression u_1 y_1 for our candidate solution to the nonhomogeneous equation. And that is exactly what we were led to do in section 3 of the handout on differential equations. In the first-order case, we have just one unknown function, u_1, and one condition that it should satisfy, so we don't need to introduce another condition "out of a hat"; so there wasn't any mysterious choice to make. (In the opposite direction, in solving a third-order linear differential equation, the method of variation of parameters gives us an expression u_1 y_1 + u_2 y_2 + u_3 y_3, and so we have to supplement our original differential equation with two conditions taken "out of a hat".) ---------------------------------------------------------------------- You ask why the force of gravity doesn't make a difference between the situations of Figure 1 and Figure 2 on p.1156. If we assume an ideal spring -- one that satisfies "restoring force = -kx" without change of k over the full range of values of x through which it moves in our problem -- then a constant force (such as the weight of the mass) will not change the frequency of vibration.
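Here is a quick numerical check of that claim (my own toy numbers, not Stewart's): with m = 2, k = 8 and a constant force A = 6, the function x(t) = A/k + cos(\sqrt{k/m} t) still satisfies m x'' + kx = A, and its frequency \sqrt{k/m} does not involve A at all.

```python
import math

# Toy check that a constant force A only shifts an ideal spring's motion
# by A/k, leaving the frequency sqrt(k/m) unchanged.  (My own numbers.)
m, k, A = 2.0, 8.0, 6.0
omega = math.sqrt(k / m)  # frequency; note that A does not appear

def x(t):
    # a solution of m x'' + k x = A: oscillation about the shifted rest point A/k
    return A / k + math.cos(omega * t)

# verify the differential equation at a sample time, by central differences
t, h = 0.7, 1e-4
x_dd = (x(t + h) - 2 * x(t) + x(t - h)) / h ** 2  # approximate x''(t)
residual = abs(m * x_dd + k * x(t) - A)
print(residual)  # ~0, up to round-off
```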
Stewart makes the difference between the two situations disappear completely by measuring x relative to the equilibrium position of the hanging spring, i.e., the position at which the force of gravity balances the restoring force of the spring, and calling that x = 0. If, instead, we let x = 0 be the position at which the spring would rest if gravity were not affecting it, then the spring equation becomes m d^2 x / dt^2 + kx = A, where A is the weight of the mass. This is a nonhomogeneous differential equation; it is easy to see that a particular solution is the constant function x = A/k, so the general solution is gotten by adding the general solution of the homogeneous equation to that constant; so it has the same frequency as the solution of the homogeneous equation that Stewart gets. In fact, Stewart's "x" is just the above "x" minus A/k. ---------------------------------------------------------------------- You ask about buoyant force in the situation of Figure 3, p.1157. Nice point! When Stewart talks about the "natural length" of the spring on p. 1156, he is taking account of the weight of the object attached at the end of the spring, if that weight is hanging vertically. (So the same spring would have different "natural lengths", i.e., equilibrium positions, in the situations of Figure 1 and Figure 2.) In the situation of Figure 3, the buoyant force would be a constant, which would cancel part of the weight of the mass, and so lead to still a different "natural length". But it wouldn't affect the spring constant or the damping force; so taking x to be the number of units the spring is stretched from that new natural length, the discussion at the bottom of p. 1157 is still valid. ---------------------------------------------------------------------- You ask how on p.1158, case II, Stewart gets -c/2m as the common value of the two roots r_1 and r_2 of equation [3]. The quadratic formula gives the roots as (-c +- \sqrt{c^2-4mk})/(2m).
If the roots are equal, then c^2 - 4mk = 0, and that formula simplifies to -c/2m. (However, using the equation c^2 - 4mk = 0, we can rewrite that value in other ways; e.g., -\sqrt{k/m}.) ---------------------------------------------------------------------- Regarding the discussion of an underdamped vibrating spring on p.1158, you note that > the equation implies it will keep vibrating forever, just at > lower and lower amplitudes. Is this the case, or does something > physically stop the vibration at some point? In real life springs > appear to stop, ... Well, since the decay of the vibration is exponential, it would rather quickly decrease below the point where it could be observed. E.g., if the amplitude decreased by a factor of 10 in one minute, then in 10 minutes it would decrease by a factor of 10,000,000,000. Very soon it would be less than the motion added by random collisions of air molecules, etc.. That said, it may very well be that before that point, various nonlinearities in the accurate description of the mechanics of very small motions would become more significant than the terms of the equation we have been using. For instance, though friction is described as proportional to the velocity, I suspect that for many common interfaces between solids, there is a small constant term, so that if one imposes a force smaller than what is needed to overcome that constant term, the object does not slide. ---------------------------------------------------------------------- You ask about the equation \omega =\sqrt{4mk-c^2}/(2m) near the bottom of p.1158. 
After writing "Here the roots are complex:", Stewart gives you the result of applying the quadratic formula to equation [3]; but instead of writing the result as one big expression divided by 2m, he first writes out the non-square-root part divided by 2m, and then, since the number under the square root sign in the quadratic formula, c^2 - 4mk, is being assumed negative, he writes that square root as \sqrt{4mk-c^2} i. Instead of showing this expression in the formula for r_1 and r_2, he has called the result of dividing \sqrt{4mk-c^2} by the denominator 2m "\omega"; so the formula he gives for the roots involves +-\omega i, and then, on the next line, he says what he means by \omega. (The word "where" signals that the second formula is a qualification of something that precedes.) He has chosen the symbol \omega for \sqrt{4mk-c^2}/(2m) because \omega is commonly used for frequency, expressed in radians-per-unit-time, as also on p. 1156. ---------------------------------------------------------------------- You ask why, in the formulas for electric circuits on p.1160, the capacitance C occurs in the denominator, while resistance and inductance occur as coefficients. One could say that it is arbitrary that we measure capacitors with the number that we use, rather than its reciprocal. If we used that reciprocal, it would go in the numerator where we now put C in the denominator. Likewise if we measured resistors and/or inductances with the inverse of the numbers we now associate with them, then these would go in the denominator rather than the numerator. (The inverse of resistance has a name, "conductance" -- and where the unit of resistance is the ohm, that of conductance is called the "mho". Looking online, I find that the inverse of capacitance also has a name, "elastance"; but it is little used.) 
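To see what the alternative convention would look like, here is the series-circuit equation first in its usual form, and then rewritten with the reciprocal quantity (I use S for the elastance 1/C; the choice of symbol is my own):

```latex
L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + \frac{1}{C}\,Q = E(t)
\qquad\text{versus}\qquad
L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + S\,Q = E(t),
\qquad S = \frac{1}{C}.
```

So measuring capacitors by elastance would make all three circuit elements appear as coefficients; it is only the convention of measuring them by capacitance that puts C in a denominator.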
I would guess that the reason the usual measures are used for resistors, inductances, and capacitors, rather than their reciprocals, is the following: In an electric wire, resistance and inductance are generally negligible. To get significant resistance, one puts in a different, poorer, conductor in place of the copper wire; and to get significant inductance, one puts in a large coil. So the effects of these are measured by how much they change things from the default situation; i.e., how much they push back against the free passage and change of the current. On the other hand, a capacitor represents a break in the electrical circuit, and ordinarily, negligible current can pass a break. But by putting two plates very very close together, one can allow a little current to flow in and out, with balancing charges building up on the two plates. So in this case, it is allowing current through that changes the default situation; and the measure of the capacitor is the extent to which it does this. (It's easy to make a capacitor that lets through very little current; harder to make one that lets through more.) ---------------------------------------------------------------------- Regarding Example 1 on p.1164, you ask > Would it be possible to use power series to solve the equation > $y''+y=0$ without reindexing Equation 3 so that it looks like > Equation 4? Does Equation 3 need to be reindexed to compare > it to Equation 2? The essential thing is to remember that if two power series are equal, then for each n, the coefficients of x^n in those series must be the same. So in Example 1, for each n, you need to find the coefficient of x^n in y'' + y, and set it equal to 0. If you are comfortable eyeballing equation 3 and seeing that the coefficient of x^n in y'' is (n+2)(n+1)c_{n+2}, and concluding that the sum of that with the coefficient of x^n in equation 1, namely c_n, is 0, that's fine.
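If you want to check that this coefficient-matching really works, here is a little computation (my own, not Stewart's) carrying out the recurrence c_{n+2} = -c_n / ((n+2)(n+1)) that results from setting the coefficient of x^n in y'' + y equal to 0; starting from c_0 = 1, c_1 = 0, it reproduces the Taylor coefficients of cos x:

```python
from math import factorial

# Recurrence from matching coefficients of x^n in y'' + y = 0:
# (n+2)(n+1) c_{n+2} + c_n = 0.  Start from c_0 = 1, c_1 = 0.
c = [1.0, 0.0]
for n in range(10):
    c.append(-c[n] / ((n + 2) * (n + 1)))

# Taylor coefficients of cos x for comparison: 1, 0, -1/2!, 0, 1/4!, ...
cos_c = [(-1) ** (n // 2) / factorial(n) if n % 2 == 0 else 0.0
         for n in range(12)]
print(max(abs(a - b) for a, b in zip(c, cos_c)))  # essentially 0
```

(Starting instead from c_0 = 0, c_1 = 1 would give the coefficients of sin x, so the two together give the general solution, just as in the example.)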
The method of re-indexing is simply a convenient reliable way of doing this on paper. ---------------------------------------------------------------------- Regarding the technique of series solutions to differential equations developed on pp.1164-1167, you ask > ... whether there would ever be a different assumption made > than y= summation of c*x^n? ... Yes. For instance, if the coefficients of the equation involved x^{1/2}, it would be reasonable to expand as a series in terms x^{n/2}. In some situations, if one had reason to believe that the solution would have a pole at x = 0, one might use a series like c_{-1} x^{-1} + c_0 + c_1 x + c_2 x^2 + ... . And, for a much more trivial variant of the Sigma c_n x^n expansion, in some situations one might want to expand in powers of x-a for some nonzero a. ---------------------------------------------------------------------- You ask about the blue equation in the left margin of p.1166. The two sides of that equation differ in that the right-hand side includes a summand with n=0, while the left-hand side leaves that out. But in the summand with n=0, the factor n is 0, so that summand is 0; so leaving it out makes no difference. ---------------------------------------------------------------------- Your question of why, in equation [8] on p.1167, Stewart shows the first few terms of the series outside the summations, is an interesting one! There is a convention that mathematicians like, but students find difficult; if Stewart had stated and used this convention, then he would have been able to incorporate those first few terms into the summations; but it might have made the formulas more difficult for the average student. The convention is that when one gives a formula involving a product of k factors, p_k = f_1 f_2 ... f_k, then the k = 0 case, i.e., the product p_0, is understood to be 1. This makes sense, because then each term is related to the next by p_k = f_k p_{k-1}, even when k = 1. 
You have seen special cases of the convention in the formulas x^0 = 1 (since x^k means a product of k factors each equal to x) and 0! = 1 (since k! means 1 . 2 . ... . k). But many students feel that a product of 0 factors should be 0 rather than 1; hence Stewart does not introduce the convention. (0 is what is called the "neutral element" for addition, meaning that x + 0 = x, while the neutral element for multiplication is 1, since x . 1 = x. So where it makes sense to define the sum of no terms to be 0, the analogous choice for the product of no factors is indeed 1.) If Stewart had introduced this convention, that would have brought the term 1/2! x^2 from the first line of [8] and the term x from the second line into the summations. But what about the term 1 of the first line? That takes a little more thought. Notice that successive factors in the numerator of the general term of the summation on that line differ by 4. So if one wants a factor before "3", it would be "-1". If we change the minus-sign before the summation to a plus sign, we can write that numerator as a product of n factors, (-1) . 3 . ... . (4n-5). Then the term -1/2! x^2 fits in naturally as the n=1 case, without calling on the "product of no factors" convention, while the n=0 case would be correctly given as 1 by that convention. (Of course, sometimes it may happen that a few terms of a series are not given by the same rule as the rest. In such cases, they would really have to be written separately. But that's not the situation here.) ---------------------------------------------------------------------- You ask where the 4n-5 in the display beginning "c_{2n} = ..." on p.1167 comes from. It comes from equation [7] on the preceding page, with the index readjusted to give a formula for c_{2n}, which is what we want here. That requires using 2n-2 in place of n in the formula. Then the 2n-1 in the numerator becomes 2(2n-2)-1 = 4n-5.
In [7], the expression with 2n-1 in the numerator is multiplied by c_n; so in this "even coefficient" formula, the 4n-5 is multiplied by the numerator of the preceding even coefficient, the "3 . 7 . 11 . ... .". Stewart probably didn't expect students to think it through this way; the approach he is suggesting seems to be "Compute the first few coefficients, not multiplying them out, but keeping them as products, since that will show the pattern that is developing. We see that the extra term that gets multiplied into the numerator in the successive even coefficients c_{2n} -- for n = 2, a factor of 3, for n = 3, a factor of 7, for n = 4, a factor of 11 -- is increasing by 4 each time, so it must have the form 4n+C, and looking at any one of these cases, we find C=-5, so that's what we write down." ---------------------------------------------------------------------- You ask about Stewart's statement in Note 4 on p.1168 that in the case he is considering, all the even coefficients will be zero. First, do you understand what he means by "even coefficients"? (It's a slightly shorthand way of speaking. You can probably guess correctly what he means, but let me know just to be sure.) Assuming you do understand, go to the formula he found for the general solution of this differential equation on the preceding page, substitute for c_0 and c_1 the values he gives just before the statement about even coefficients, and see what you get as even coefficients when you make this substitution. Let me know the answer you get, or if you run into any difficulty, let me know what it is. ---------------------------------------------------------------------- You ask, regarding the proof of Law 4 on p.A39, "why is |g(x) - M| < epsilon / 2(1+L) ?" I hope that what I said in lecture answered this for you. The inequality that you refer to is not a fact that we deduce.
Rather, the definition of "lim_{x -> a} g(x) = M" tells us that _we_can_find_a_delta_ such that, whenever x is at distance < delta from a, the above inequality holds. In fact, it tells us that we can find a delta which, in the same way, ensures any degree of closeness between g(x) and M that we want. We decide that we want exactly that degree of closeness. Why do we choose that degree of closeness? That is what I was sketching when class ended. The idea is to think strategically, "What degrees of closeness of f(x) to L and of g(x) to M can together put f(x) g(x) within epsilon distance from LM?" That strategic thinking led us to choose epsilon / (2(1+|L|)) as the degree of closeness of g(x) to M that we wanted. ---------------------------------------------------------------------- You ask why we need min{\delta_1,\delta_2,\delta_3} on p.A40, line 3, concluding "... isn't it logical to just choose the lowest \delta?" Choosing the lowest value is exactly what the "min" function does. In a concrete situation with particular f, g etc., we can see which is smallest, and use it. This abstract proof covers all such concrete situations, and the same one will not be smallest in all situations, so we express the choice of the least by "min". ---------------------------------------------------------------------- You ask how, near the top of p.A40, $|L| (\epsilon / (2 (1 + |L|)))$ "is reduced to" $\epsilon / 2$. Stewart does not assert that they are equal: note that the formulas you are comparing are not connected by "=" but by "<". The argument is that since $2(1+|L|) > 2|L|$, when they occur in the denominators (with a positive numerator), we have $numerator / (2(1+|L|)) < numerator / (2|L|)$. After noting this, and the fact that the numerator contains $|L|$, one can, as you suggested, cancel the $|L|$ in the numerator with the $|L|$ in the denominator, getting $\epsilon/2$.
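The bounds chosen in the proof of Law 4 can be spot-checked numerically. The following sketch is entirely my own; here "f" and "g" stand for sample values of f(x) and g(x) that obey the two degrees of closeness the proof imposes, and we confirm that the product then really lands within epsilon of LM:

```python
import random

# Spot-check of the strategy in the proof of Law 4: if
# |f - L| < min(1, eps/(2(1+|M|))) and |g - M| < eps/(2(1+|L|)),
# then |fg - LM| < eps.  (Proof: |fg - LM| <= |f||g-M| + |M||f-L|,
# and |f| <= |L| + 1.)
random.seed(0)
violations = 0
for _ in range(1000):
    L, M = random.uniform(-5, 5), random.uniform(-5, 5)
    eps = random.uniform(0.01, 2.0)
    df = min(1.0, eps / (2 * (1 + abs(M)))) * random.uniform(-0.999, 0.999)
    dg = eps / (2 * (1 + abs(L))) * random.uniform(-0.999, 0.999)
    f, g = L + df, M + dg
    if abs(f * g - L * M) >= eps:
        violations += 1
print(violations)  # 0
```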
---------------------------------------------------------------------- Regarding the third and fourth displayed lines on p.A40, you ask how we know that |L| . \epsilon/2(1+|L|) < \epsilon/2. As I described in class, the "1+" was only put into the denominator so that the fraction would not be undefined if L happens to be 0. If L is nonzero, we have |L| . \epsilon/2(1+|L|) < |L| . \epsilon/2|L| = \epsilon/2. If L=0, then |L| . \epsilon/2(1+|L|) = 0 < \epsilon/2. ---------------------------------------------------------------------- Regarding the proof of Limit Law 5, p.A40, you ask "Why is |g(x)-M| < |M|/2 ?" I can't tell from your question whether you had read the first part of the sentence, saying "... there is a number \delta ... such that whenever 0 < |x-a| < \delta we have". In view of that phrase, what Stewart is saying is not that "|g(x)-M| < |M|/2" is something that automatically holds, but that it is something we can cause to hold, by taking x close enough to a. Given this, your question can be broken in two: "Why do we want to cause |g(x)-M| < |M|/2 to hold?" and "How do we know that we can cause |g(x)-M| < |M|/2 to hold?" Let me know whether you need help with either of these questions. (Hints: The answer to the second question is based on the first part of the sentence. The answer to the first part lies in the remainder of the proof -- what we subsequently use the inequality |g(x)-M| < |M|/2 for.) ---------------------------------------------------------------------- You ask about the step |g(x) - M| < |M|/2 in the proof of Limit Law 5, p.A40. The first thing to understand is that we are not proving that this inequality holds; we are noting that we can _cause_ it to hold, by restricting x to be sufficiently close to a. I hope it's clear to you what Stewart is saying in the sentence leading up to that inequality. Assuming you understand this, your question is "Why do we want to make |g(x) - M| < |M|/2?"
The answer is that we want to make the fraction | (M-g(x)) / Mg(x) | small (namely, < epsilon). Now a fraction will be "small" if and only if the numerator is much "smaller than" the denominator. (E.g., a fraction a/b will have absolute value < 1/10 if and only if the numerator is less than a tenth the size of the denominator in absolute value.) In the above fraction, M is fixed, but the numerator and denominator both depend on g(x), so we want to see whether for x close enough to a, we can make g(x) assume values making the numerator "much smaller than" the denominator. The way we will do this is to make the numerator "very small" without letting the denominator get "very small". So first we keep the denominator from getting arbitrarily small, by noting that g(x) is approaching the nonzero value M; so if we make g(x) close enough to M, it will have absolute value near |M|. Specifically, we can get that absolute value to be at least |M|/2 by making |g(x) - M| < |M|/2 (the inequality you asked about). Then, once we have the fixed lower bound |M|/2 on |g(x)|, and hence the lower bound |M|^2/2 on the absolute value of the denominator of our fraction, we can figure out how small we need to make the numerator to get the whole fraction to have absolute value < epsilon. That's what Stewart achieves on the next page, in the sentence starting with "Also". (Incidentally, we didn't _have_to_ make |g(x)| at least |M|/2 in the above proof. We could have chosen any value <|M|; e.g., we could have made it at least 3|M|/7 by choosing |g(x) - M| < 4|M|/7. But |M|/2 was just the simplest choice of a number strictly between 0 and |M|.) ---------------------------------------------------------------------- You ask how, at the bottom of p.A40, |M|=|M-g(x)+g(x)| \leq |M-g(x)| + |g(x)| < |M|/2 +|g(x)| is obtained from |g(x)-M| < |M|/2 . The relation |g(x)-M| < |M|/2 is applied to get the final "<". Note that |g(x)-M| is the same as |M-g(x)|.
So where the middle term of the long formula above contains |M-g(x)|, this is the same as |g(x)-M|, and hence, by the shorter formula, it can be replaced by |M|/2, with a "<" inserted. The first step in the long formula, on the other hand, is the triangle inequality. Calling M-g(x) "A" and g(x) "B", it is the statement |A+B| \leq |A| + |B|. ---------------------------------------------------------------------- You asked how the assumption f(x) \leq g(x) was used in the proof of Theorem 2 on p.A41. It is used in the very last two lines of the page. Up to that point, the author has proved that if L were > M, then there would be some \delta such that in a certain interval around a, we would have g(x) < f(x). He now uses the assumption that g(x) \geq f(x) to conclude that this is not true (since g(x) \geq f(x), it can't also be < f(x)). So the assumption L > M can't be true; i.e., L \leq M, as desired. ---------------------------------------------------------------------- You ask about Stewart's statement in the proof of Theorem [2] on p.A41 that L - M > 0 "by hypothesis". What he means is "because we have assumed that L > M" (at the beginning of the proof). When I write him at the end of the semester, I'll suggest that he change this wording. ("By hypothesis" usually means "by the assumptions of the theorem".) ---------------------------------------------------------------------- You ask, in connection with the theorem that the inverse of a one-to-one continuous function on an open interval is continuous (p.A42), whether a function on an open interval can have continuous inverse without being continuous and one-to-one. A function has to be one-to-one for its inverse to be well-defined. On the other hand, if f is one-to-one but not continuous, it can still have a continuous inverse, if we understand the domain of the inverse to be the range of f, even if that is not an interval.
For instance, let f, with domain (0,2), be defined to have f(x) = x if x lies in (0,1], and f(x) = x+1 if x lies in (1,2). Then the range of f is the union of the intervals (0,1] and (2,3). On this range, the function f^{-1} takes a point y of (0,1] to y, but takes a point y of (2,3) to y-1. You can check that this is continuous at each point of its domain. Incidentally, in the above example, f was an increasing function. But if we change our definition by making f(x) = 4-x (instead of x+1) for x in (1,2), we find that the function is still one-to-one, and that its inverse is still continuous on the range of f, and that range is still the union of (0,1] and (2,3); but f is no longer everywhere-increasing or everywhere-decreasing. ---------------------------------------------------------------------- You ask about the meaning of the statement on the 5th-from-last line on p.A42 that f "maps" the numbers in one interval onto the numbers in a certain other interval. When one says that a function f maps a point x to a point y, this simply means f(x) = y; "maps" is synonymous with "takes" or "carries". So Stewart is saying that f takes the points of one interval onto the points of the other. A key word is "onto" (which I'm afraid he doesn't explain -- I've now made a note to recommend that in future editions, it be made clear). To say that a function f maps a set X onto a set Y means that not only is the image f(x) of every element x\in X a member of Y, but every member of Y occurs as the image of an element of X. (If we merely know the former statement, then we say that f maps X "into" Y; "onto" expresses both facts. So, for instance, the squaring function carries [-1,1] *into* both [0,1] and [0,2]; it carries [-1,1] *onto* [0,1], but not onto [0,2].) 
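Since "into" vs. "onto" is the crux, here is a tiny computational illustration (mine) of the squaring example above: sampling [-1,1] finely, every square lies in [0,1] (hence also in [0,2]); a value like 0.49 in [0,1] is attained (at x = 0.7); but nothing comes anywhere near 1.5, so [-1,1] is not carried onto [0,2]:

```python
# squares of a fine sample of [-1, 1]
xs = [i / 1000 for i in range(-1000, 1001)]
squares = [x * x for x in xs]

into_01 = all(0 <= s <= 1 for s in squares)        # lands in [0,1], so also in [0,2]
hits_049 = any(abs(s - 0.49) < 1e-9 for s in squares)    # 0.49 = 0.7^2 is attained
hits_near_15 = any(abs(s - 1.5) < 0.4 for s in squares)  # nothing within 0.4 of 1.5
print(into_01, hits_049, hits_near_15)  # True True False
```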
In the sentence in question, the fact that f is increasing forces it to carry (x_0-\epsilon, x_0+\epsilon) into (f(x_0-\epsilon), f(x_0+\epsilon)) -- it can't fall below the starting value or exceed the end-value -- while the Intermediate Value Theorem shows that it will actually map it onto (not merely into) that interval. You also ask how Stewart asserts on the second line of the next page "... that |y - y_0| < \delta, and therefore |f^-1(y) - f^-1(y_0)| < \epsilon, ..." No, the statement in the second line of p.A43 is "*if* |y - y_0| < \delta, *then* |f^-1(y) - f^-1(y_0)| < \epsilon". He is not saying that the former is true and hence the latter is true, but that every y which satisfies the former inequality satisfies the latter. To see that this is true, note that |y - y_0| < \delta means y\in (y_0-\delta, y_0+\delta), and |f^{-1}(y) - f^{-1}(y_0)| < \epsilon states that f^{-1}(y) is in a similar interval. So take these statements about what is in what interval, and compare them with what is proved toward the end of the preceding page. ---------------------------------------------------------------------- You ask about the formula |g(x)-b| < delta_1 in the last display of the proof of Theorem 8 on p.A43, and in particular, why it has delta_1 where you would expect an epsilon. There is nothing sacred about the letters "epsilon" and "delta". It is simply convenient to write delta for our bound on the deviation of the input of a function whose limit behavior we are studying, and epsilon for our bound on the deviation of the output. That convention helps us keep track of what we are doing. But in this proof, the input of f is the output of g, so whichever name we give to the number needed at that point, it can't fit the convention with regard to both f and g.
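To see that chaining at work, here is a concrete instance (my own choice of functions, not Stewart's): take f(y) = y^2, continuous at b = 3, and g(x) = 2x + 1, which approaches 3 as x -> 1. To get |f(g(x)) - f(3)| < epsilon we first find a delta_1 that works for f at b, and then a delta that keeps g(x) within delta_1 of b; so delta_1 plays the role of "epsilon" for g, just as in the proof:

```python
# Chaining tolerances through a composition f(g(x)), with
# f(y) = y^2, g(x) = 2x + 1, a = 1, b = 3.  (My own example.)
eps = 0.01
delta_1 = eps / 7     # works for f near 3: |y^2 - 9| = |y-3||y+3| < 7|y-3| there
delta = delta_1 / 2   # works for g: |g(x) - 3| = 2|x - 1|

for x in [1 + 0.9 * delta, 1 - 0.5 * delta, 1 + delta / 3]:
    y = 2 * x + 1
    assert abs(y - 3) < delta_1        # g lands within delta_1 of b ...
    assert abs(y ** 2 - 9) < eps       # ... hence f(g(x)) within eps of f(b)
print("ok")
```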
In any case, I hope you read the whole sentence, and not just the formula, so that you saw that it was not asserting that "|g(x)-b| < delta_1" in general, but only that there exists a delta such that for all x within delta distance of a, "|g(x)-b| < delta_1" holds. That there exists such a delta follows from the definition of the statement that lim_{x->a} g(x) = b. Namely, since that statement begins "for every epsilon > 0", we can, in particular, apply that statement with the delta_1 we have found in the role of that epsilon. ---------------------------------------------------------------------- You ask two questions about the proof of Theorem 8 on p.A43. Let's start with the second: > .. And why is it necessary to introduce the y variable into the proof? We are looking at the function f(g(x)); so the output of g gets fed into f as its input. In discussing this situation, we need to use the fact that f is continuous at b, which is a statement about the relationship between inputs and outputs of f in general. We could call the input of f "x" in that discussion, but this would be confusing, because when we use g(x) as the input of f, this is different from the "x" that is the input of g. So it is better to use a different letter for any values that occur as inputs of f, in particular, those arising as outputs of g. Now to your first question: > Why is it that if 0 < |x-a| <\delta then |g(x)-b| <\delta_1 ? Stewart is not saying that this is true for any old \delta; if you look at the line before the formula you are asking about, you see that Stewart says "there exists \delta > 0 such that". If, after thinking about the point, you still have a question about it, then ask again, making clear that you understand what it is Stewart is asserting. (And if you did understand that, and had made it clear in your question, then I could have addressed it to begin with. 
One way or the other, there's something to be learned from this -- either that in looking at Stewart's formulas, you need to read them in the context of the sentence containing them, or that in formulating your question of the day, you need to make clear how much you do understand before getting to the point that you don't understand.) ---------------------------------------------------------------------- You ask why, in the final lines of the proof of Theorem 8 on p.A43, we can say that |g(x)-b| < \delta_1 implies |f(g(x)) - f(b)| < \epsilon. This is because \delta_1 was chosen to satisfy the third display in the proof: "if 0 < |y-b| < \delta_1 then |f(y)-f(b)| < \epsilon". Stewart is applying this statement with g(x) in the role of y. However, as I said in class, Stewart appears to be neglecting the condition "0 < |y-b|" in that display. But in this situation, the restriction "0 < |y-b|" can be dropped from that display, because if 0 = |y-b|, i.e., y=b, then f(y)=f(b), so |f(y)-f(b)| = 0 < \epsilon. ---------------------------------------------------------------------- Regarding the proof that tan theta > theta on pp.A43-A44, you ask > After the step |PQ|<|RT|<|RS|, how does Stewart arrive at > L_n<|AD|=tan(theta)? The sum of the terms of the form "|PQ|" is L_n, the sum of the terms of the form "|RT|" is |AD|, and |AD| = tan theta. In the first assertion, you should understand that Stewart's picture just shows the case where n=3, and he has only labeled one of the secants to the circle "PQ"; but in the situation he is talking about, n can be any positive integer, and "PQ" represents any one of the n secants (which include, in his picture, the two labeled AP and QR). So we see that if we add them all up, we get the length L_n of the inscribed "polygon" (string of segments -- in his picture, APQB.)
Similarly, in the situation he is talking about, "RT" represents any one of the n pieces into which the side AD is divided (including, in his picture, those labeled AR and SD); so when we add these up we get |AD|. I hope the equation |AD| = tan theta is clear. ---------------------------------------------------------------------- You both ask about Stewart's statement that to prove Cauchy's Mean Value Theorem we "change" the function h(x) given in equation [4] on p. 286 to the h(x) given on p.A45. What he means is that we take the proof of the ordinary Mean Value Theorem on p. 286, and where that theorem uses the function h(x) defined at the top of that page, we instead use the function h(x) defined on p. A45. He leaves it to you to verify that the same arguments used on p. 286, when applied to the new function, will give the new version of the Mean Value Theorem. Probably "change" was not the best choice of words; it might have been better to say "use ... instead of ...". (Note that the h(x) on p. 286 is a special case of the one on p. A45: it is the case where g(x) = x. Similarly, the old Mean Value Theorem is the case of Cauchy's version where g(x) = x.) ---------------------------------------------------------------------- You ask what the function f(x) - f(a) - ((f(b)-f(a))/(g(b)-g(a)))(g(x) - g(a)) in the proof of Cauchy's Mean Value Theorem on p.A45 represents. It measures the failure of the point (f(x),g(x)) to lie on the line connecting (f(a),g(a)) and (f(b),g(b)). This is easier to see if we replace the variable x by "t", and think of the curve given by the values of f and g as the parametrized curve x = f(t), y = g(t). Then the expression (x - f(a)) - ((f(b)-f(a))/(g(b)-g(a)))(y - g(a)) measures the failure of any point (x,y) to lie on the line described.
Namely, x-f(a) and y-g(a) measure the horizontal and vertical displacements of (x,y) from the point (f(a),g(a)), and the above expression measures the failure of these displacements to be in the same ratio as the displacement of (f(b),g(b)) from (f(a),g(a)). The set where the above function has any constant value is a line parallel to the one connecting (f(a),g(a)) and (f(b),g(b)). When the curve (f(x),g(x)) gets as far as it ever does from that line, on one side or the other, then it will, in general, be tangent to one of those parallel lines, which is why f'/g' will equal the ratio (f(b)-f(a))/(g(b)-g(a)), as stated in Cauchy's Mean Value Theorem. ---------------------------------------------------------------------- You ask why we use the extended functions F and G rather than f and g in the proof of L'Hospital's rule on p.A46. This is because we don't have values for f and g at the endpoint a of the interval we are looking at, so we can't apply Cauchy's Mean Value theorem to them on that interval. But the assumption that f and g approach 0 as x --> a tells us that the "right" way to extend them, to make them continuous, is by taking the values at a to be 0. We could, sloppily, say "Let us extend f and g to equal 0 at x=a", and so continue to use the symbols f and g rather than new symbols F and G. Experienced mathematicians, who know what precise statement that sloppy statement stands for, might well do so. But to develop this material for beginners, Stewart is careful to state things precisely, even though that means introducing new symbols. Incidentally, the difficulty that we overcome by extending f and g to be 0 at a is one that cannot be overcome when we try to prove L'Hospital's rule for functions that approach infinity instead of 0.
There we simply can't apply Cauchy's Mean Value Theorem to an interval [a,x], because there's no way of defining f and g at a that makes them continuous; so we have to use intervals with one end x and the other end much closer to a than x is; and as I sketched in class, and you will see in your homework, this makes for a more complicated proof than the (0,0) case. ---------------------------------------------------------------------- Regarding the proof of L'Hospital's Rule on p.A46 you ask
> In this proof, is there any reason to define F(x) and G(x)
> other than to establish the fact that F(x) = G(x) = 0 and
> simplify the result of Cauchy's Mean Value Theorem?
It does not merely "simplify the result of" Cauchy's Mean Value Theorem -- it makes it possible to apply that theorem to our functions on the interval [a,x]. One might, as I did in lecture, cut corners and say "redefine f and g so that they are both 0 at a", rather than introducing new symbols F and G. It all depends on whether one feels it more important to emphasize the fact that the new functions are "except for one little detail (being undefined or being 0 at a)" the same as the old ones, which led to my choice of using the same symbol, or whether one is more worried that using the same symbol for two (slightly) different things might confuse the reader, which led to Stewart's choice to use different symbols. ---------------------------------------------------------------------- I hope what I said at the end of class answered your question about why on p.A46 we create "new functions" F and G. It's basically a quibble -- because Cauchy's Mean Value Theorem is stated for functions that are defined and continuous at the endpoints of the given interval, we need such functions; and as given, f and g are not necessarily defined at a. One could informally say "Let us extend f and g by defining f(a)=g(a)=0", and most mathematicians in writing for other mathematicians would do so.
But that relies on the reader understanding that we've switched the meaning of the symbol, and in addressing students who are not familiar with precise mathematical reasoning, Stewart wants to avoid having this seemingly imprecise statement, so he introduces new symbols for the modified functions. ---------------------------------------------------------------------- You ask whether a delta-epsilon proof is required for L'Hospital's Rule (p.A46). I think that the reason Stewart does not give one is that he has written his book so that instructors who consider the delta-epsilon definition of a limit "too hard" can have their students skip section 2.4. He then needs to make as much as possible of the rest of the book independent of that section; in particular, he words the proof of L'Hospital's rule so that it does not refer to epsilons and deltas. But this requires him to use the somewhat handwavy wording "if we let x --> a^+, then y --> a^+", and to argue by the displayed equation that follows this, in which the relation between "x" and "y" implicitly comes from the paragraph that precedes. A precise delta-epsilon argument gets rid of the handwaviness. On the other hand, most any mathematician who has worked with limits would know how to translate a proof such as Stewart gives here into a delta-epsilon proof. So one could say that for an experienced mathematician, the difference between a delta-epsilon proof and the proof Stewart gives is not that important. ---------------------------------------------------------------------- Your question concerns the relation between x and y in Stewart's proof of L'Hospital's rule on p.A46. In the calculation, x represents any value "near enough to a" on the given side; in the notation I used in class, any t\in (a,a+\delta), while y represents a value such that f(x)/g(x) = f'(y)/g'(y) which by Cauchy's Mean Value Theorem can be found in (a,x). So as he notes, as x approaches a, the corresponding value of y also approaches a. 
In his displayed calculation, he deduces that since for each x and y so chosen, f(x)/g(x) = f'(y)/g'(y), it follows that lim_{x -> a} f(x)/g(x) = lim_{y -> a} f'(y)/g'(y). Logically, this could also be written lim_{x -> a} f(x)/g(x) = lim_{x -> a} f'(x)/g'(x); but to remind us where this comes from, he is using the letters x and y as he used them in the preceding discussion. The way I did it in class (recall that this was the easy argument I gave, not the hard one!) using epsilons and deltas avoids having to say things like "as x approaches a, y approaches a"; we can simply say "for every epsilon, choose delta such that ...", and show explicitly why |f(x)/g(x) - L| < epsilon for x in an appropriate range. ---------------------------------------------------------------------- You ask how in the middle of p.A46, the limit of f'(y)/g'(y) can be the same as the limit of f'(x)/g'(x), when y is not the same as x. The limit is not something defined in terms of a single value of x or y, but something whose definition is based on considering all x or y in a certain range, and looking at whether the delta-epsilon definition of that limit, as we select our x or y in that range, is true. Stewart's proof is a little handwavy, in that he first considers any particular x, and chooses a y in terms of it; then he talks about limits as these x and y vary (approach a). The version of the proof that I gave in class, using epsilon and delta (the easy proof, for f, g -> 0, not the hard proof for f, g -> infinity), translated Stewart's argument into a precise form that I hope you won't have trouble with. If you do, ask at office hours (or by e-mail if you think you have a question that can be stated and answered briefly). ---------------------------------------------------------------------- You ask about Stewart's argument on p.A46 that "if we let x->a+ then y->a+".
The precise argument, which I gave in class in a hurry, was this: Given \epsilon > 0, we want to find a \delta such that
(1) for x\in (a,a+\delta) we have |f(x)/g(x) - L| < \epsilon.
To do this, we use the fact that lim_{x->a+} f'(x)/g'(x) = L to find \delta such that
(2) for x\in (a,a+\delta) we have |f'(x)/g'(x) - L| < \epsilon.
Now given x\in (a,a+\delta) we know by Cauchy's Mean Value Theorem that there is some y\in (a,x) such that
(3) f(x)/g(x) = f'(y)/g'(y).
Because x\in (a,a+\delta), the interval (a,x) is a subset of (a,a+\delta), so the y we get is also in (a,a+\delta) (this is the step corresponding to Stewart's "if x-->a+ then y --> a+"), so by (2), |f'(y)/g'(y) - L| < \epsilon, and combining with (3) we get |f(x)/g(x) - L| < \epsilon, proving (1), as desired. ---------------------------------------------------------------------- You ask about Stewart's statement, in the last calculation in his proof of l'Hospital's rule for a = infinity on p.A46, that lim_(t-->0+) [f(1/t)/g(1/t)] = lim_(t-->0+) [f'(1/t)(-1/t^2)]/[g'(1/t)(-1/t^2)], "by l'Hospital's Rule for finite a". We think of f(1/t) and g(1/t) as functions of t, and want to see what they do as t --> 0+. L'Hospital's Rule for finite a = 0 says that their ratio will have the same limit as the ratio of their derivatives does, if that limit exists. The derivatives of f(1/t) and g(1/t) are f'(1/t)(-1/t^2) and g'(1/t)(-1/t^2). ---------------------------------------------------------------------- Regarding the proof of L'Hospital's rule (p.A46), you wrote,
> ... you mentioned a case where you can use L'Hospital's rule in some
> case that is not 0/0 or infinity/infinity. How does that work?
The proof of the infinity/infinity case really only uses the fact that g(x) --> infinity. So even if f(x) doesn't approach infinity, one can use the rule. Of course, if g(x) approaches infinity but f(x) doesn't, then the only question is whether f(x)/g(x) approaches 0, or no limit at all.
(Examples with no limit are f(x) = x^2 sin x, and g(x) = x or x^2.) I don't see any examples where L'Hospital's rule could show that such a limit is zero where the answer isn't obvious some other way. But one might run into one. ---------------------------------------------------------------------- To slightly reword your question -- you ask when one can take a step in a proof as "clearly true", and so not in need of a formal argument. That's a good question. As a first approximation, I would say: When both the person giving the proof and the person to whom the proof is addressed understand the situation well enough so that they *could*, if asked, fill in the details of the argument that are being omitted. In the proofs Stewart gives, lots of points could be omitted that Stewart would be able to fill in if asked; but since he is addressing this material to students to whom it is new, he gives much more in the way of detail than he would if addressing, say, more advanced students. (One sort of step that I would think might be omitted even at this level is that going from statement 1 of the theorem at the bottom of p.A46 to statement 2 of that theorem. The second is gotten by taking the contrapositive of the first, writing "diverges" for "does not converge", and renaming the values of x being considered.) One difficulty with what I have said is that in teaching students calculus, we can't go through a whole axiomatic development of the properties of the real numbers; so we do rely on students "knowing" that certain things are true even if they can't justify these. And the boundary between what we teach students how to prove, and what we take for granted that they know, is sometimes fuzzy. (In Math 104, an axiomatic development of the real numbers is given. But even there, it is taken for granted that students can handle logic and simple set theory, though those haven't yet been developed formally for them. This is finally done in Math 125 and 135.
These are not required courses for Math majors, but they are electives.) Anyway, if you point to a step where you think Stewart is giving details that seem unnecessary to you, I can say whether I agree that they are, indeed, pretty straightforward, or whether there is some point that you might be missing. ---------------------------------------------------------------------- You ask whether the results on intervals of convergence that Stewart proves on p.A46-47 could be proved more easily using the Ratio Test. The proof you sketched works if the ratios |c_{n+1}| / |c_n| approach some limit, but not otherwise! The kind of example that I showed in class solving "Exercise 46" for 11.6 (with c_n = 1/2^n for n odd and 1/3^n for n even) shows that this need not happen. ---------------------------------------------------------------------- You ask whether Stewart's proof of l'Hospital's rule also works in the case where the limit L of f'(x)/g'(x) is infinity or - infinity. The calculation in the middle of the page, ending with "= L", leaves out details. If one fills in the details, they would involve "epsilon and delta" when L is finite, and "M and delta" (as in Definition 6, p. 115) when L is infinity or - infinity; but both cases work fairly straightforwardly. ---------------------------------------------------------------------- You comment on Stewart's taking \epsilon = 1 in the proof in today's reading (p.A47, line 2). The reason he can do this is that to prove that the series converges for |x| < |b|, we don't need the full assumption that it converges for x = b, but (as the version of the proof that I gave shows), only the weaker condition that the summands are bounded at x = b. He finds it convenient to use instead the fact that the summands are eventually bounded by 1 (so where I compared the sum at x with a geometric progression C |x/b|^n, he compares a tail of the sum with the corresponding tail of the simpler progression |x/b|^n).
Either way, the idea is that a very weak condition holding at x = b implies a very strong condition (absolute convergence) at all points with |x| < |b|; and he uses the very weak condition in the form "the terms are eventually of absolute value < 1". ---------------------------------------------------------------------- Regarding the proof of the first theorem stated on p.A47, you ask what it means for S to be nonempty. The empty set is the set with no elements; so a set is nonempty if it contains any elements. In particular, S is nonempty because it contains b. You then ask,
> If we can only guarantee that x=b is contained in the set, does
> this one number satisfy the conditions of the Completeness Axiom?
For S to satisfy the hypothesis of the Completeness Axiom, we have to know that it is nonempty and bounded. If we knew that it consisted only of the element b, then, yes, that would make the axiom applicable; but if all we know is that it contains b, that isn't the same as knowing that it consists only of b; so we need more to show that it has a least upper bound; namely, we need to know it is bounded. Stewart establishes that on lines 3 and 4 of the proof. ---------------------------------------------------------------------- You ask why Stewart can begin the proof of the Theorem in the middle of p.A47 by saying "Suppose that neither case 1 nor case 2 is true." Proving a statement of the form "X or Y or Z is true" is equivalent to showing "If neither X nor Y is true, then Z is true", and that's what Stewart is doing. He is approaching the proof this way because (i) and (ii) are relatively simple conditions, so it is easy to see what must follow from their being false, and to deduce that this leads to the more complicated condition (iii). (Another condition equivalent to "X or Y or Z is true" is "X, Y and Z are not all false".
So in proving a different sort of theorem of the form "X or Y or Z is true", one might begin by saying "Suppose X, Y and Z are all false", and showing that from this one can deduce a contradiction.) ---------------------------------------------------------------------- Concerning the proof I gave of the theorem on p.A47 describing intervals of convergence, you asked whether a value |x| that was bigger than one upper bound might be another upper bound, and still a member of S. You may be confused about what "upper bound" means. For instance, if S is the union of the two intervals [1,2] and [3,4], you might be thinking of 2 and 4 as "upper bounds" of S. But that is not what the term means; it means a number that is \geq all members of S. (See paragraph on p. 698 preceding box [12].) So 2 is not an upper bound of the set just described; but 4 is, and so are 4.2, 500, etc.. As in that example, any set S that is bounded above has many upper bounds. If it is nonempty, it will have a least upper bound M; the real numbers > M will be its other upper bounds. But if a number x is > M, it can't belong to S, since M being an upper bound to S makes M \geq all members of S; but by assumption it is not \geq x. (If you did think 2 and 4 were what were called "upper bounds" of the union of [1,2] and [3,4], then you might wonder, "What can one call them?" In fact, the set \{1,2,3,4\} is what is called the "boundary" of the above set; so 2 and 4 are two of the points of that boundary. This is not a concept we will introduce in H1B, but you are liable to see it in some upper division and/or graduate courses: 104, 140, 202A, ...). ---------------------------------------------------------------------- You ask whether the material in Appendix G (pp.A50-A56) would need to be proved through epsilon-delta methods on a test. "epsilon-delta" proofs occur mostly in setting up the foundations of calculus. 
Those foundations include results which have been proved using "epsilon-delta" methods, and which then can be used to prove other results. So the answer is no; what Stewart does here is essentially correct, though as I pointed out in lecture, there are a couple of places that need to be filled in. One is about differentiability of inverse functions, where the proof that I sketched has a key step, changing variables in a limit statement, that would need epsilon-deltas for a full proof. For the other, the proof that since ln 2^n approaches infinity as n does, ln x will approach infinity as x does, I showed an "M, N" proof, which is the equivalent of an "epsilon-delta" proof when x and f(x) are approaching infinity instead of real constants. ---------------------------------------------------------------------- Regarding the proof of Law 1 on p.A51 you ask,
> How can we replace a (a constant) with y (a variable) in the proof?
> Since constants and variables are treated differently when
> differentiating, why can we substitute a constant a for variable y?
Constants and variables are both numbers. When we differentiate or integrate, we fix all but one of the numbers denoted by letters, and consider how the resulting expression varies as we vary that one number, which we call the variable with respect to which we are differentiating or integrating. But in a different calculation we can change what we are varying. Note that in Definition 1 on p. A40, the variable of integration is t, while x is a constant, which could just as well be written a. In the first display on p. A51, x is again a constant from the point of view of the integration; but from the point of view of the differentiation, it is the variable with respect to which we are differentiating. (And this always happens when we are applying the Fundamental Theorem of Calculus.)
> And if we took the ln of a product of two functions ln(g(y) f(x))
> would the same laws still hold up?
Right.
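As a numeric sanity check (my own illustration, not a proof and not from Stewart), one can compute ln directly from Definition 1's integral of dt/t and test the laws of logarithms on sample values; the midpoint-rule helper name ln_approx below is made up for this sketch.

```python
# Compute ln x from the definition ln x = (integral from 1 to x of dt/t),
# using a simple midpoint rule, and check the laws of logarithms numerically.
def ln_approx(x, n=100000):
    h = (x - 1.0) / n
    return sum(h / (1.0 + (k + 0.5) * h) for k in range(n))

x, y = 5.0, 3.0
assert abs(ln_approx(x * y) - (ln_approx(x) + ln_approx(y))) < 1e-6   # Law 1
assert abs(ln_approx(1/y) + ln_approx(y)) < 1e-6                      # ln(1/y) = -ln y
assert abs(ln_approx(x/y) - (ln_approx(x) - ln_approx(y))) < 1e-6     # Law 2
print("laws of logarithms verified numerically")
```

Of course this only tests the laws at particular numbers; the point is that the integral definition, with no prior theory of exponents, already exhibits them.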
---------------------------------------------------------------------- You ask whether one can correctly prove the third law of logarithms on p.A51 by differentiating ln(x^r), noting that it has the same derivative as r ln(x), and verifying that the constant by which ln(x^r) and r ln(x) differ must be 0. Well, differentiating ln(x^r) requires knowing the derivative of x^r. Looking through the early sections of Stewart, I see that he proves the formula for the derivative of x^n where n is a positive integer directly (p. 175), and then on the next page, states the rule for any real number x, saying he will prove it in section 3.6. In that section, he proves it using properties of the logarithm, including the 3rd law of logarithms. (See the first display in the proof on p. 221. Though he writes "n" for the exponent, he has said it represents any real number.) So the proof that you describe could not be used in the context Stewart has set up in this section, where we have put aside everything we had derived by methods that assumed real exponents, etc.. However, the formula for differentiating x^r when r is rational _could_have_ been derived (with more work) using the formula for x^n where n is an integer; and if such a derivation had been given, your proof would be correct. (Needless to say, I won't give on an exam any question where you would have to sort out what was proven how in Math 1A to know whether your proof of something is valid!) [SENT A FEW HOURS LATER:] I see that your proof of the third law of logarithms, which I criticized, follows the hint in Stewart, Exercise 5, p. A57 ! Unfortunately, Stewart doesn't say in this exercise whether, in the "third law of logarithms", x^r is meant to be defined as an integer root of an integer power of x, which is necessarily what is meant in the statement of that law on p. A51, or by the more general equation 13 on p. A54. 
If the "integer root of an integer power" is meant, then, as I said, one can't use the law d/dx x^r = r x^{r-1}, because that was based on Stewart's earlier treatment of exponentiation, which he says at the end of the second paragraph on p. A50 we are not going to use here. On the other hand, if the equation 13 definition is meant, then differentiation is not needed; the result comes easily out of the definition. So either way, I think his hint is not appropriate. I guess I should e-mail the class about this, as a correction to the homework. ---------------------------------------------------------------------- You ask how we know to substitute 1/y for x in proving Law 2 of logarithms, p.A51. Well, it's usually simpler to use something one has proved before than to start from scratch, and at this point we have proved Law 1, describing the logarithm of a product. Now dividing by y is the same as multiplying by 1/y, so we would like to use 1/y as one of the terms in Law 1. That's half the story. The other half is to remember what "1/y" means -- it is the number which, when you multiply it by y, gives 1. So to figure out its logarithm, we first apply Law 1 to the product of 1/y and y. From the resulting equation, we get the law ln(1/y) = - ln y. We then use that in Law 1 with an arbitrary x, and with 1/y in place of y, to find ln x/y. ---------------------------------------------------------------------- You ask about graphing functions on the complex plane (or the Argand plane, as it is named on p.A57). Hard to do. If f is a function C --> C, then using a 3-dimensional graph, one can graph the real part of f(x+iy) as a function of x and y; or the complex part, or the absolute value, etc.; but it would take 4 dimensions to graph the real and complex parts together. I think some of the plaster models in the cabinet in the Common Room, 1015 Evans, represent graphs of the real parts of certain complex functions. (Such models must have been popular around 100 years ago. 
Many math departments have them, but hardly anyone looks at them nowadays.) Something else one can do is restrict the function to a line in the complex plane, say the real axis or the complex axis, and so get a map R --> C, which can also be graphed in 3 dimensions, this time as a curve. Anyway, combining the intuition about how functions R --> R behave that we get using graphs, and the theorems proved in Math 185 about how complex functions behave, one can develop an intuition for such functions, even though one can't graph them entirely satisfactorily. ---------------------------------------------------------------------- You ask whether Stewart's statement, regarding Figure 3 on p.A58, that |z| = \sqrt(a^2 + b^2), should be \sqrt(a^2 - b^2), since (bi)^2 = -b^2. The statement is correct as he gives it. It is a standard definition, so the only challenge one can raise is "Does that function of a+bi have useful properties, that would justify making such a definition?" It does, as is shown in the next few calculations. Something that might have led you to think b^2 should be (bi)^2 is the thought that the vertical line labeled "b" in Figure 3 should be labeled "bi". But the labels Stewart puts on line segments in that figure give their geometric lengths, not the complex number representing the difference between their endpoints. (So the red arrow is labeled with a real number, not with the complex number a+bi.) Taking this into account, I hope the figure now makes sense. ---------------------------------------------------------------------- Regarding the statement on p.A59 that the argument of a complex number is not unique, you ask
> ... Is it just referring to the fact that an angle "a" is equal
> to "a + 2pi*n" where n is any integer?
Essentially, yes. For some purposes, mathematicians do create a kind of entity in which a and a + 2 pi n are actually equal.
But for our present purposes, we have to regard a as a number, and then a + 2 pi n is not the same number as a; it's a different number, though one at which the sine, cosine etc. have the same values that they have at a. A consequence is that "arg" is not a well-defined function; the symbol "arg(z)" means "any one of the infinitely many real numbers theta that make the boxed equation on p. A59 true". This nonuniqueness has important consequences for taking N-th roots. When we divide "arg(z)" by N, the results no longer differ by multiples of 2 pi, so they no longer lead to the same complex number. Rather, as noted on p. A62, every nonzero complex number has N distinct N-th roots. ---------------------------------------------------------------------- You ask why the \theta occurring in the polar form of a complex number (p.A59) is called its "argument". I don't know. Looking in the online Oxford English Dictionary, one of the definitions of "argument" is:
Astr. and Math. The angle, arc, or other mathematical quantity, from which another required quantity may be deduced, or on which its calculation depends.
Its examples show it being used by Chaucer, I think with reference to astronomical calculations. My guess is that this sense split into two: When f is a function, then in the expression f(x) we now call x the "argument of f"; so it is a quantity on which another quantity depends. On the other hand, its use referring to an important angle must have led to the specialized sense in the study of complex numbers that Stewart gives here. (Note that in primitive astronomy, everything that one could measure was an angle in the sky, so the two senses were not that different.) As to what such quantities or angles have to do with the other meanings of the word "argument", the OED gives not a hint.
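Returning to the N distinct N-th roots mentioned a little earlier: they can be computed directly from the polar form, since if z = r e^{i theta} its N-th roots are r^{1/N} e^{i(theta + 2 pi k)/N} for k = 0, ..., N-1. A minimal computational sketch (the function name nth_roots is my own, not Stewart's):

```python
import cmath, math

# The N distinct N-th roots of z, from one choice of arg(z) plus its
# shifts by multiples of 2 pi, each divided by N.
def nth_roots(z, N):
    r, theta = cmath.polar(z)   # modulus and one choice of arg(z)
    return [cmath.rect(r**(1.0/N), (theta + 2*math.pi*k)/N) for k in range(N)]

roots = nth_roots(complex(0, 8), 3)            # the three cube roots of 8i
for w in roots:
    assert abs(w**3 - complex(0, 8)) < 1e-9    # each really is a cube root
print(roots)
```

The three angles differ by 2 pi / 3, so the roots are genuinely distinct points; only after cubing do the angle differences become multiples of 2 pi and collapse back to the same number.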
---------------------------------------------------------------------- You ask how the use of complex numbers (pp.A57-A63) in describing real-world wave motion can be justified. I'm not sure what sort of use of complex numbers you are referring to, but I'll mention several. (a) As a consequence of the equation e^{it} = cos t + i sin t, one gets
cos t = (e^{it} + e^{-it})/2
sin t = (e^{it} - e^{-it})/(2i).
Hence every linear combination a sin t + b cos t (where a and b are real numbers) can be written c e^{it} + d e^{-it} where c and d are conjugate complex numbers. For computations, this expression can be much more convenient than the original expression, since exponentials behave more simply than trig functions with respect to multiplication, differentiation, etc.. As long as we specify that c and d are conjugate, the expression c e^{it} + d e^{-it} will be real-valued, and so can describe a wave function in the real world. (b) The wave equation is linear, so any linear combination of functions that satisfy it satisfies it again. A consequence is that the condition that c and d be conjugate is irrelevant to the mathematical study of the wave equation, and so can be dropped. We then find that the simplest functions in terms of which to write the general function as a linear combination are e^{it} and e^{-it}. So for simplicity, one uses these; and since they behave identically (they are conjugate to one another), one may, for simplicity, use just one. It doesn't represent a "real-world" wave function, but it's easy to work with, and one can get real-world wave functions by taking linear combinations of it and its conjugate. (c) I believe that in quantum mechanics, one posits wave-functions that are genuinely complex-valued.
Since the relation between quantum mechanics and the world we know is mysterious anyway, and I have only a layman's knowledge of the subject, I won't try to guess whether here, complex numbers really do occur in nature, or whether there is a range of possible mathematical formulations of the same not-directly-observable phenomena, in which case the founders of quantum mechanics may have chosen the mathematically simplest, in the absence of any other criterion for preferring one above others ... or what! ---------------------------------------------------------------------- You ask how to prove the second displayed formula on p.A60. Well, z_1/z_2 is the number which, when multiplied by z_2, will give z_1. Equation [1] on that page tells you how to multiply complex numbers expressed in polar form. You should be able to use it to set up the problem "what is the polar form of the complex number which, when multiplied by z_2, gives z_1?" Try it, and let me know whether you can carry it through, or if not, where you have trouble. ---------------------------------------------------------------------- Regarding complex exponentiation, defined on p.A63, you ask about raising numbers c other than e to complex powers. That is simple in the abstract, tricky in reality. The abstract answer is: let "ln c" be a complex number r such that e^r = c, and define c^z = e^{r z}. The complication is that there are infinitely many such complex numbers r. (Even for c = e, there are infinitely many; namely 1 + any integer multiple of 2\pi i, because e^{2\pi i} = 1.) So in complex analysis, one can't speak of "the" exponential function with base c; one has to choose such an r and then study the function determined using that r. ----------------------------------------------------------------------
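The multivaluedness described in that last answer can be seen numerically. Here is a small sketch (the sample values c = 2 and z = i are my own choice): each choice of r with e^r = c gives its own candidate value of c^z.

```python
import cmath, math

# Try to compute "2^i" using several of the infinitely many values of "ln 2":
# r = Log 2 + 2 pi i k, each of which satisfies e^r = 2.
c, z = complex(2, 0), complex(0, 1)

values = []
for k in range(-1, 2):
    r = cmath.log(c) + 2 * math.pi * 1j * k    # one of the many numbers "ln c"
    assert abs(cmath.exp(r) - c) < 1e-9        # each r really satisfies e^r = c
    values.append(cmath.exp(r * z))

# The candidate values of 2^i have genuinely different absolute values
# (successive ones differ by a factor of e^{2 pi}), so no single choice
# is singled out by the definition alone.
print([abs(v) for v in values])
```

This is exactly why one must fix a choice of r (a "branch" of the logarithm) before speaking of the function c^z.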