ANSWERS TO QUESTIONS ASKED BY STUDENTS in Math 104 Fall 2003 and Spring 2006 and Math H104, Fall 2006, taught from Rudin's "Principles of Mathematical Analysis", 3rd ed. ---------------------------------------------------------------------- You ask where Rudin gets the expression (3) on p.2. I talked a little about this in class on Wednesday; I guess you weren't there then. Briefly: if p^2 < 2, we want to show that we can increase it a little, without making its square > 2. So numbers whose squares are just a tiny bit < 2 should be increased by just a tiny amount; i.e., the amount by which we increase p should be small if 2 - p^2 is. Using 2 - p^2 itself for that amount turns out not to work; we overshoot. Putting in the denominator p + 2 can be thought of as a correction found by trial and error that works. For a more complicated approach that arrives at the same formula in a "principled" way, you can see Exercise 1.1:1 of the exercise handout https://math.berkeley.edu/~gbergman/ug.hndts/m104_Rudin_exs.pdf . You also ask how Rudin gets the statements "q>p" etc.. from (3). Notice that the formula for q is gotten by subtracting from p a certain fraction (first step of (3)). If p^2 - 2 < 0, you should be able to see that this fraction is negative (can you?), so subtracting it increases p. ---------------------------------------------------------------------- You ask what precisely is meant by the statement in Remark 1.2, p.2, that the set of rational numbers has gaps. This isn't formulated as a precise mathematical statement. The idea is that if you are looking for a number whose square is 2, and you start with smaller numbers and square them, getting values < 2, and you keep increasing your numbers -- you suddenly find that the value is >2, with "2" itself somehow passed by. The squaring function is continuous, so the explanation isn't that it "jumped" past 2. Instead, one thinks of there being a "gap" in Q where a number with square 2 ought to be. A statement about Q that can be made precise, which Rudin deduces from this, is that it does not have the least upper bound property. ---------------------------------------------------------------------- You ask whether Definition 1.5 of "ordered set" on p.3 shouldn't be supplemented by a condition saying "x=y and y=z implies x=z". Good question. Although mathematics deals with many different ordering relations, so that it must be explained what "an order relation" means, the concept of equality is taken to be a basic logical one: x = y means that x and y are the same thing; so the properties of this relation are dealt with in logic, rather than in the discussion of one or another field of mathematics. (Mathematics does deal with other concepts that are _like_ equality, but not exactly the same. These are called "equivalence relations", and one of the conditions for a relation " ~ " on a set to be an equivalence relation is indeed "x ~ y and y ~ z implies x ~ z". Equivalence relations come up in a fundamental way in Math 113 and later courses; they may also be discussed in Math 74 and 110. If you have seen the concept of two integers a and b being congruent mod another integer n, this relation "congruent mod n" is an equivalence relation.) ---------------------------------------------------------------------- You ask what it means to say in Definition 1.6, p.3, that an order < "is defined". It means that there is some particular ordering which we will be working with, and denoting by the symbol "<". 
As to your question of whether a set can have more than one ordering -- yes, definitely. It happens that the structures we will be looking at, Q and R, each have only one ordering which satisfies the definition of an "ordered field". But each has many orderings that satisfy the definition of "ordered set". ---------------------------------------------------------------------- You ask what the need is for the condition "beta \in S" in Definition 1.7, p.3. Well, before a statement like "x _< beta" can make sense, we have to have beta and x together in some ordered set; the relation "_<" is only meaningful in that case. In this definition, the ordered set is S, so we have to assume that beta belongs to it. (Also, if we change the ordered set being considered, the answer to the question "Is E bounded?" changes. For instance, if E is the set {1/2, 2/3, 3/4, 4/5, ...}, and S is the set of real numbers r such that 0 < r < 1, then in S, the set E is not bounded above. But if instead we take S to be the set of all real numbers, then E is bounded. So what Rudin defines could be called, more accurately, the condition "E is bounded above in S".) ---------------------------------------------------------------------- You ask whether the empty set is bounded above (Definition 1.7, p.3). If one considers the empty set as a subset of an ordered set S, then the empty set is bounded if and only if S is _not_ the empty set! The general observation is that a statement "there exists x\in X such that P(x) is true" is always false when X is the empty set, no matter what condition P is (because the empty set doesn't have any x that can satisfy it), while a statement "for all x\in X, P(x) is true" is always true when X is the empty set, no matter what condition P is, because requiring something of "every x\in X" does not require anything in that case. So if S is a nonempty ordered set, every element x\in S satisfies the condition of being an upper bound for the empty subset E: the condition it has to satisfy, "for all y\in E, x >_ y", is true by the second of the two observations above. So E is bounded. But if S is empty, there can't be any x\in S satisfying that condition (because there isn't any x\in S); so then E is not bounded. (In general, Rudin won't talk about bounding the empty set; so if the above is hard to digest, it won't cause a problem for you in the course.) ---------------------------------------------------------------------- Regarding Example 1.9(b), p.4, you write > It is stated that sup E may or may not be an element of E, however > nothing is ever said about inf E and whether or not it can be in E ... What I strongly advise you to do when wondering whether some situation can happen is look for examples! Even though Rudin hasn't defined the real numbers at this point, you know about real numbers from all the other math you have seen, and you know what the standard ordering on that set is, so you can ask whether you can think of a set E of real numbers which has an inf that belongs to E, and another set E' which has an inf that doesn't belong to E'. Looking for examples is a fundamental kind of mathematical thinking; I will expect students to do it in exam questions, in homework problems, etc. in this course. ---------------------------------------------------------------------- You say, regarding Theorem 1.11, p.5, "I don't understand why there must be a greatest lower bound property for a set that has the least upper-bound property." 
Well, you're not supposed to see that it is true by just looking at the _statement_ of the theorem! That's why the theorem has a _proof_. Have you read through the proof? If so, do you have a difficulty with it? Can you follow it up to some point, and then have trouble seeing why the next step is true? If so, ask me about the point at which you run into difficulty. See the paragraph on the back of the course information sheet beginning "Note that if in my office hours ... ". ---------------------------------------------------------------------- Referring to the definition of field on p.5, you ask "How is a field different from a vector space discussed in Linear Algebra?" Fields and vector spaces both have operations of addition that satisfy (A1)-(A5), but while a field has an operation of multiplication, under which two elements x, y of the field are multiplied to give an element xy of the field, in a vector space there is not, generally, an operation of multiplying two elements of the vector space to get another element of the vector space. Instead, there is an operation of multiplying members of the vector space ("vectors") by members of some field ("scalars") to get members of the vector space. So in its definition, a vector space is actually a less basic kind of entity than a field. (In some vector spaces, there are also operations of multiplying two vectors together, to get either a vector or a scalar; e.g., the "cross product" and "dot product" of R^3. But these generally do not satisfy (M2)-(M5). Also, these extra operations are not part of the concept of vector space, but are special features of some vector spaces.) ---------------------------------------------------------------------- You ask why Rudin explicitly says that 1 is not equal to 0 in the definition of a field (p.5). If one did not impose this condition, then the set consisting of just one element, say "z", with operations z+z=z, zz=z, would be a field (with z as both "0" and "1"). But the properties of that structure are so different from those of other structures that satisfy those axioms that this would require us to name it as an exception in lots of theorems! So that definition is not considered a natural one, and the definition of "field" is set up to exclude that object. After all, one wants one's definitions to describe classes of entities that have important properties in common. ---------------------------------------------------------------------- You observe that Rudin proves various results from the axioms on pp.5-6, and ask how one proves these axioms. The concept of "axiom" is something I wish I had had time to talk about in class yesterday. Notice that the axioms on p.5 are all parts of Rudin's "Definition 1.12". In making such a definition, one isn't asserting that those conditions hold for some particular set F with particular operations; one is only agreeing that one will call a set F with operations a field if and only if the operations satisfy those conditions! It is true that when Euclid used the term "axioms", he was saying "these are true facts about points, lines, etc.." The evolution from the use of these terms for facts assumed true to their use for conditions in a definition is part of the development of modern abstract mathematics. 
It means that all results proved from the field axioms are true not only for one object for which one assumed them true, but for every object which satisfies them; and as I mentioned in class, there are many of these, some of which look very different from the system of real numbers. The assertion that these conditions are "true" _does_ come in -- not in the definition, but at the point where one proves that some object _is_ a field. In this course, the real numbers are proved to be a field (in fact, an ordered field with the least upper bound property), but that proof is put off to the end of chapter 1. Moreover, it is based on the assumption that the rational numbers form an ordered field with certain properties, which is assumed, rather than proved, in this course. The construction of Q from Z, and the proof that Q forms a field, are done in Math 113; that Q satisfies the ordering conditions is not hard to prove once that is done, assuming appropriate properties of the order relation on Z. Finally, the construction of Z is done in (I think) Math 135, assuming the axioms of set theory. If one does not want to assume some background facts in a course unless one has proved them, one could set up a series of prerequisites, 135 < 113 < 104 < 185. But this would be unsatisfying for students in programs that want some of the later courses, but don't want to require their students to take the whole sequence... . Incidentally, when I spoke of "the axioms of set theory", this use of the term "axiom" is closer to Euclid's. It is really a sort of hybrid usage, in a sense that one can understand after one has taken advanced courses in set theory. ---------------------------------------------------------------------- You ask what the strings of symbols displayed in Remark 1.13, p.6 are supposed to represent. They are conventional abbreviations. For instance, in the definition of a field, Rudin says what is meant by "-x" for any element x, and says what is meant by "x+y", but he doesn't say what is meant by "x-y". He makes that the first item in the first display of the Remark, and then makes the second display begin with its "translation": x+(-y). So he is saying that "x-y" will be used as an abbreviation for x+(-y). Similarly with the second item, the third item, etc. of those two displays. ---------------------------------------------------------------------- You ask for a hint or proof for Proposition 1.15(d), p.7. Rudin writes that the proof is similar to that of Proposition 1.14. So look at part (d) of that proof, and see whether, when you replace addition by multiplication and additive inverse by multiplicative inverse, you get what you want. (Or if you did try that and couldn't make it work, your question should have said this, and shown what calculation you tried. Cf. the paragraph of the class handout just below the middle of the second page, beginning "Note that if in my office hours ...") ---------------------------------------------------------------------- You ask, regarding statement (c) of Proposition 1.16 on p.7, "Would it be bad form to prove (-x)y = -(xy) by saying (-x)y = (-1x)y = -1(xy) = -(xy) ?" Not bad form, but logically incomplete! We hadn't proved that -x = (-1)x. After we prove the above equation, then we can deduce that fact, by putting "1" in place of "x" and "x" in place of "y" in the equation in question. 
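(A side remark on this: since the identity (-x)y = -(xy) is proved from the field axioms alone, it holds in every field, not just in R -- for instance, in the field of integers mod 5, which you may have met. If you like seeing such things checked by machine, here is a tiny Python sketch of my own; it is purely illustrative, and of course no substitute for a proof from the axioms:

# Check, in the field of integers mod 5, that (-x)y = -(xy) for all x and y.
p = 5
for x in range(p):
    for y in range(p):
        assert ((-x) % p) * y % p == (-(x * y)) % p
print("(-x)y = -(xy) holds for all x, y in the integers mod", p)

The same check runs with 5 replaced by any prime, since the integers mod a prime form a field.)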
---------------------------------------------------------------------- You ask whether Definition 1.17 on p.7 ought not to state more, namely what inequalities hold if x<0 and y<0, etc.. In setting up a mathematical concept, one strives to start out with a minimal set of elegant conditions, and deduce as much as one can from these. The result "x<0, y<0 ==> xy>0" can be deduced from the facts given in Definition 1.17, so it is preferable not to add that as an axiom, but to deduce it. It is the z=0 case of Prop.1.18(c). ---------------------------------------------------------------------- You ask how Rudin gets the implication "y>0 and v_<0 => yv_<0" in the proof of Prop. 1.18(e), p.8. By part (b) of that Proposition, using y in place of x, v in place of y, and 0 in place of z. ---------------------------------------------------------------------- Regarding Theorem 1.19 on p.8, you write "... I thought R covers numbers from -infinity to +infinity. Where does the l.u.b. come from?" As I emphasized in class, the "least upper bound property" does _not_ say that R has a least upper bound! As stated in Definition 1.10, bottom of p.4, to say R has the least upper bound property means that every nonempty subset E of R that is _bounded_above_ has a least upper bound (not that R itself does). Please carefully re-read Definitions 1.8 and 1.10, and note the relation and the difference between them. ---------------------------------------------------------------------- I hope that what I said in class answered your question: To say that R has the least-upper-bound property (p.8, Theorem 1.19) doesn't mean that it has a least upper bound. What it means is defined in Definition 1.10, namely that every nonempty subset of R that is bounded above has a least upper bound. ---------------------------------------------------------------------- You're right that on p.8, Theorem 1.19, the statement "R contains Q as a subfield" is what Rudin translates in the following sentence by saying that the operations of Q are instances of the operations of R; and you are right that the further phrase "the positive rational numbers are positive elements of R" is something extra, not included in the statement "R contains Q as a subfield". However, it is no less important, and Rudin should have included it in the statement of the theorem. It implies that the order relation among elements of Q agrees with their order as elements of R, and this will allow us to write "a < b" without ambiguity when a and b are elements of Q and hence of R, just as the fact that the operations are the same allows us to write "a+b" and "a.b" without ambiguity. ---------------------------------------------------------------------- You ask whether Rudin's assertion (Theorem 1.19, p.8) that there is an ordered field R with the least upper bound property implies that R is the only one. I actually answered that on Wednesday when I previewed Friday's reading. My answer was that in a literal sense, it is not the only one: If we take any set in bijective correspondence with R, then we can give it addition, multiplication, and ordering so that it becomes "an exact copy" of R (in mathematical language, it becomes an ordered field isomorphic to R). Hence it will also have the least upper bound property, but it will not be the same set as R, so it is "another" such ordered field. However, if we exclude such cases by asking whether every ordered field with the least upper bound property is _isomorphic_ to R, the answer is: Yes, but it is not implied by Theorem 1.19. 
The proof is outlined in Exercise 1.4:3 in the exercise-packet. I have given that exercise the difficulty-rating "5", which means "too hard to give as an assigned problem even in an honors course". But you might enjoy looking at it and thinking about it. ---------------------------------------------------------------------- You ask why, in the proof of Theorem 1.20(a), p.9, we don't simply select the largest a\in A that is less than y. Because we don't know, to start with, that there is a largest such a. For instance, it might happen that all members of A are less than y; then there would not be a largest one that is less than y. And indeed, I sketched in lecture yesterday the description of an ordered field in which that happens. So to prove that it doesn't in R, we need to know more about R than that it is an ordered field; we must somehow show that the least upper bound property excludes this possibility; which is what Rudin's argument does. ---------------------------------------------------------------------- You ask about Rudin's statement on p.9, beginning of proof of statement (b), that the archimedean property lets us obtain m_1 such that m_1 > nx. Well, the archimedean property says that given any positive real number x and real number y we can find a positive integer n with nx > y, so we just have to figure out what "x" and "y" this statement is being applied to in this case, and what the "n" is being called. To look at "m_1 > nx" as an instance of such an inequality, we need the left-hand side to equal the product of the integer we are finding with some given real number, so we should take m_1 in the role of "n" and 1 in the role of "x". We then take the given value of nx in the role of "y", and we have the inequality in the desired form. ---------------------------------------------------------------------- You ask about the implication 0 < t < 1 => t^n < t in the proof of Theorem 1.21, p.10. One uses Proposition 1.18(b) to get from "t < 1" the successive inequalities "t^2 < t", "t^3 < t^2" etc., and hence "t^n < ... < t^2 < t". Rudin assumes one can see this. ---------------------------------------------------------------------- You ask about Rudin's statement in the proof of Theorem 1.21, p.10, "Thus t belongs to E, and E is not empty." You say that you thought that when he first said "Let E be the set consisting of all positive real numbers t such that t^n < x", it was already implied that t belongs to E and E was a non-empty set. But looking at that sentence, you can see that it does not refer to _one_ value of t, but to _all_ those values that satisfy the indicated relations. Now just saying "Let E be the set of all such values" doesn't prove that there are any -- that would be a very unfair way of "proving" something! Yet we need to know E is nonempty in order to apply the least upper bound property; so we have to describe some element t that we can show is a positive real number such that t^n < x. Rudin's trick for doing this is to find a positive real number that is both < x and < 1. Then he can argue (by Proposition 1.18(b)) that t^n = t.t. ... .t < 1.t. ... .t < 1.1. ... . t < ... < 1. ... 1.t = t, giving the desired inequality t^n < t. How does one show that there is a positive real number that is both < 1 and < x ? He could have left this up to you to check, and you might have come up with something like "We have x/2 < x, and 1/2 < 1; so if we let t be whichever of x/2 and 1/2 is smaller, then it will have the desired property". 
But instead of leaving it up to you, he gives a quick way of finding an element t with that property: if we take t = x/(1+x), both inequalities are quick to check. As I pointed out above, the first occurrence of the symbol "t" in this proof does not refer to a specific real number, but is used as a dummy variable -- the idea is that as it ranges over all positive real numbers, we test whether each one does or doesn't satisfy the relation t^n < x, and if it does, we count it as a member of the set E; if it doesn't, we leave it out of E. In subsequent occurrences in the proof, "t" refers to different positive real numbers which we likewise look at in terms of whether they satisfy t^n < x, i.e., whether they belong to E. Thus, after showing that one such value does satisfy the given inequality, making E nonempty, Rudin turns to another set of values, those > 1+x, and shows that these do not satisfy the inequality -- and this shows that 1+x is an upper bound for E. Near the bottom of the page, he looks at a third set of positive real numbers t, those that satisfy t >_ y-k. I hope you will re-read the proof with these observations in mind, and that it will be clearer. If you still have questions about it, please ask. ---------------------------------------------------------------------- You ask how, in the proof of Theorem 1.21 (p.10), the inequality b^n - a^n < (b-a) n b^{n-1} follows from the identity b^n - a^n = (b - a)(b^{n-1} + b^{n-2}a + ... + a^{n-1}). In that identity, the second factor has n summands, each of which is of the form b^m a^{n-1-m}. Now since a < b, we have a^{n-1-m} _< b^{n-1-m}, so b^m a^{n-1-m} _< b^m b^{n-1-m} = b^{n-1}. That is, each of the n terms is _< b^{n-1}, so their sum is _< n b^{n-1}. The factor b-a is unchanged. ---------------------------------------------------------------------- You ask how Rudin derives the inequality h < (x-y^n)/(n(y+1)^{n-1}) in the proof of Theorem 1.21, p.10. First note that he doesn't "derive" it in the sense of proving it is true. He says "Choose any h with this property", and he then shows you that any h so chosen will have the property (y+h)^n < x. So the question is "How did he know that choosing h this way would give that inequality?" The idea is that "if you change a number by just a little, then its n-th power will be changed by just a little; in particular, assuming y^n < x, then if you change y by a small enough value h, then (y+h)^n will still be < x". To make precise the statement that "if you change a number by just a little, then its nth power will be changed by just a little", Rudin notes the formula b^n - a^n = (b - a)(b^{n-1} + b^{n-2}a + ... + a^{n-1}). The factor b - a on the left shows that if b is "just a little" bigger than a, then the product will also be "small". But how small it is is affected by the other factor, b^{n-1} + b^{n-2}a + ... + a^{n-1}. To make b^n - a^n less than some value, we have to make b - a less than that value divided by some upper bound for b^{n-1} + b^{n-2}a + ... + a^{n-1}; an upper bound that will do is nb^{n-1} if b > a > 0. (This is the second display in the proof.) We now want to apply this to make (y+h)^n - y^n less than the distance from y^n to x. To do this we choose the difference between y+h and y to be smaller than x - y^n _divided_by_ an upper bound for "nb^{n-1}". If we take h < 1, then n(y+1)^{n-1} will serve as this upper bound. ---------------------------------------------------------------------- You ask whether Theorem 1.20 is used in proving Theorem 1.21 (p.10). 
No, it isn't. ---------------------------------------------------------------------- You ask about the fact that the Corollary on p.11 about nth roots of products uses the formula for nth powers of products. That formula is true in any field, not just R -- it follows from commutativity of multiplication: (alpha beta)^n = alpha beta alpha beta ... alpha beta = alpha alpha ... alpha beta beta ... beta = alpha^n beta^n. (Of course, one has to keep track of how many alphas and betas there are. To make this most precise, one can use induction on n.) ---------------------------------------------------------------------- You ask about Rudin's statement on p.11, heading 1.22, that the existence of the n_0 referred to there depends on the archimedean property. One uses the archimedean property to conclude that there is some n with n > x. (To do this, one applies Theorem 1.20(a) with "1" for the "x" of that statement, and "x" for the "y" of that statement.) On the other hand, by assumption, 0 _< x. Since there are only finitely many integers in the range 0,...,n, and the first of them is _< x while the last is not, one of them, which we can call n_0, will be the last to have n_0 _< x. Thus, n_0 + 1 > x, from which we can conclude that all integers > n_0 are > x. This makes n_0 the largest integer _< x, as Rudin claims. ---------------------------------------------------------------------- Concerning Rudin's sketch of decimal fractions on p.11, you note that after choosing n_0, Rudin assumes that n_0,...,n_{k-1} have been chosen, and shows how to choose n_k; and you ask how n_1,...,n_{k-1} are chosen. What Rudin is describing is a general method, which one applies successively for one value of k after another. When we have chosen n_0, we can truthfully say for k = 1 that "n_0,...,n_{k-1} have been chosen", and so the construction he describes gives n_1. When we have that, we can say for k = 2 that "n_0,...,n_{k-1} have been chosen", and so get n_2; and so on. (This is like an inductive proof, where after proving the "0" case, and showing how, when the 0, ... , k-1 cases are known, the case of k follows, we can conclude that all cases are true. When we do this for a construction rather than a proof, it is not called "induction" but "recursion"; i.e., instead of an "inductive proof" it is a "recursive construction".) ---------------------------------------------------------------------- Regarding the expression of real numbers as decimals discussed on p.11, you ask about the validity of saying that .9999... = 1. Well, the first thing to ask is "Why do we say that .99999... is a valid decimal expression?" If we constructed decimal expressions strictly as Rudin does on that page, then no real number would give that form: real numbers slightly less than 1 might begin with 0.9..., but sooner or later they would have a digit other than 9, since the condition r < 1 implies that for some k, 10^{-k} < 1-r, and this would give a digit other than 9 by the kth digit. On the other hand, the real number 1 comes out as 1.000... under that construction; so ".9999..." would never show up. Alternatively, we could simply say that for every sequence of digits n_1, n_2, ... (each in {0,...,9}), we define ".n_1 n_2 ..." to mean the least upper bound of the set of real numbers n_1/10 + ... + n_k/10^k (k=1,2,...), as Rudin does in the latter half of point 1.22. Then that map is almost 1-1, the exceptions being numbers that can be written as terminating decimals ".n_1 n_2 ... 
n_{k-1} n_k 0 0 0 ..."; these also have the expression ".n_1 n_2 ... n_{k-1} (n_k-1) 9 9 9 ...". The reason these are equal is that the least upper bound of the sequence of finite sums defining the latter expression can be shown to equal the finite sum defining the former. Taking the specific case of 1 and .9999... , this says that the least upper bound of the sums 9/10 + ... + 9/10^k for k=1,2,... is 1. This can be proved by similar considerations to what I noted in the preceding paragraph: if r < 1 were a smaller upper bound, then 1-r would be positive, and we could find a k such that 10^{-k} < 1-r, but this would make 9/10 + ... + 9/10^k > r, contradicting our assumption that r was an upper bound for that set. From this point of view, Rudin's construction at the beginning of 1.22 is just a way of showing that every real number has at least one decimal expansion, but without an assertion that that expansion is unique. ---------------------------------------------------------------------- You ask, regarding Rudin's statement on p.12 that the extended real numbers don't form a field, whether (M5) is the condition that fails. The most serious failures are of (A1) and (M1), since the sum (+infinity) + (-infinity) and the products 0 . (+infinity) and 0 . (-infinity) are undefined. (A5) and, as you note, (M5), also fail. ---------------------------------------------------------------------- You ask whether imaginary numbers (p.12) exist in the real world, and if not, how we can use them to solve real-world problems. I would say no, imaginary numbers don't exist in the real world; but real numbers don't either. Although "two people" are real, I don't think "two" is anything real. Yet we use the number "two" (for instance, when we talk about "two people".) I agree that imaginary numbers have less obvious connections with the real world than real numbers do; but they are still useful. When you took Math 54, I hope you saw how differential equations whose characteristic polynomials had non-real roots could be solved using those roots. Basically, mathematics is the study of abstract structure; and the structure of the complex numbers is a very fundamental one, of great value in studying other structures -- such as that of differential equations over real numbers. ---------------------------------------------------------------------- You ask whether one is ever interested in taking higher roots of -1 than the square root, and whether this leads to further extensions of the field of complex numbers (p.12). The answers are Yes and No respectively! One is interested in such roots, but one doesn't have to go beyond C to find them. In fact, one can find in the field C roots of all polynomials of positive degree over C, including such polynomials as x^4 + 1. The particular case of square roots of all elements of C is exercise 1:R10 (not assigned). Note that this shows that one can find a square root of i, which will be a fourth root of -1. You'll probably see some discussion of "roots of 1 in the complex plane" in your 113 text. The general result about polynomials over C is proved in Math 185. ---------------------------------------------------------------------- You ask, in connection with the construction of the complex numbers on p.12, whether any Euclidean spaces other than R and R^2 can be made into fields. That was a big question in the history of early modern mathematics! Briefly, the answer is no. 
If we assume that "Euclidean spaces" means finite-dimensional vector spaces over R, and that the multiplication is to be linear in each variable (which comes down to assuming that the field contains a 1-dimensional subspace whose field structure is that of R), then the only fields one can get this way are R and C. If one weakens these conditions in various ways, other cases appear. If we impose all conditions in the definition of field except commutativity of multiplication (and assume both distributivity conditions, x(y+z) = xy + xz and (x+y)z = xz + yz, since without commutativity of multiplication they are not equivalent), then we get the concept of a division ring, and for division rings over the real numbers, there is one more case: 4-dimensional Euclidean space can be made a division ring, called the ring of quaternions, discovered by W. R. Hamilton in 1843. These are of moderate interest in mathematics, though not nearly as important as the complex numbers. If one also discards the associative law of multiplication, one gets still one more case, an 8-dimensional structure called the octonions. These play a yet more minor role. Beyond that, it's been proved that no higher-dimensional vector spaces admit any multiplication with reasonable field-like properties. If instead of starting with the reals, one takes as one's base field the rational numbers Q, there are infinitely many ways to make finite-dimensional vector spaces over Q into fields. For instance, the set of numbers p + q sqrt 2 with p and q rational forms a field, whose elements can be represented by ordered pairs (p,q), and hence regarded as forming a 2-dimensional vector space over Q. In this, "sqrt 2" can be replaced by "sqrt n" for any n that is not a square. If one wants to introduce a cube root instead of a square root, one gets a 3-dimensional vector space, etc.; this is the start of the subject of field extensions, touched on at the very end of Math 113, and in greater depth in Math 114 (usually) and Math 250A. If one allows infinite-dimensional vector spaces, one can get examples over R as well. The field of rational functions f(x)/g(x) in an indeterminate x over R can be looked at as such a space. Getting back to the question of why 2 is the only finite n>1 such that the field of real numbers has an n-dimensional extension: this can be deduced from the Fundamental Theorem of Algebra, which says that every polynomial of positive degree over C has a root in C. From this one can deduce that every finite-dimensional extension of R can be embedded in C; but a 2-dimensional vector space can't have a proper subspace of dimension > 1, so every such extension of dimension > 1 must be all of C. The Fundamental Theorem of Algebra is generally proved in Math 185, in Math 114, and in Math 250A. ---------------------------------------------------------------------- You ask why the two simple arithmetic facts stated as Theorem 1.26 (p.13) are called a "theorem". The answer is given in the sentence just below the theorem, "Theorem 1.26 shows ...". In algebraic language, that theorem says that the map a |-> (a,0) gives an embedding of the field of real numbers in the field of complex numbers. ---------------------------------------------------------------------- You ask whether one can use results from elementary geometry to prove statements like Theorem 1.33(e) (the triangle inequality) on p.14 in a course like this. Yes and no -- but mainly no. 
You may know from a geometry course that by the axioms of Euclidean geometry, each side of a triangle is less than or equal to the sum of the other two sides. But in this course, we have defined R^2 as the set of pairs of elements of R, where R is an ordered field with the least upper bound property; and we haven't proved that R^2, so defined, satisfies the axioms of Euclidean geometry. In fact, proving Theorem 1.33(e) is a key step in establishing that fact! That said, there are a few senses in which it is valid to use elementary geometry. One of them is to give intuition -- to help see what we can hope to prove, and to help us picture the results stated and the steps given in a proof. A second would apply if Rudin had put the section "Euclidean spaces" before "The complex numbers" -- namely, if we call results proved in the section "Euclidean spaces" geometric results, then these geometric results can be applied to the case C = R^2, as long as we remember to check what definition in C corresponds to what definition in R^2. Finally, after we have established many basic results on R^2, we may sometimes cut corners and take for granted that we have established enough to make Euclid's axioms valid there (though we won't be talking about those axioms in this course), and so call on some result from geometry later on. But we should not do that when we are establishing the basics. ---------------------------------------------------------------------- You ask why, on p.15, line 4, Rudin says that "(c) follows from the uniqueness assertion of Theorem 1.21." Well, he has just shown that the square of the left-hand-side of (c) is equal to the square of the right-hand side. By definition, both the left- and right-hand sides of (c) are nonnegative, and the uniqueness assertion of Theorem 1.21 says that nonnegative square roots are unique (i.e., that nonnegative real numbers that have the same squares are equal). So the two sides of (c) are equal. I wish you had said in your question how close you had gotten to understanding this point. Did you see that the two sides of the equation that had been proved were the squares of the two sides of (c)? Did you understand that the uniqueness assertion of Theorem 1.21 meant the words "only one" in that Theorem? Did you at least examine what Rudin had proved on this page, to see what relation it had to (c), and/or look at Theorem 1.21, and see whether you understood what was meant by "the uniqueness assertion"? Letting me know these things can help me answer your question most effectively. ---------------------------------------------------------------------- You ask why the b's are shown as conjugates in the Schwarz inequality on p.15. There's a reason, having to do with the version of the inner product concept that is used on complex vector spaces. Since Rudin doesn't define complex vector spaces and their inner product structure, I can't connect it with anything else in the book; I'll simply say that when one needs to apply the Schwarz inequality to complex vector spaces, the coefficients that appear on the a_j are most often conjugates of given complex numbers, so it is most convenient to have the result stated in that form. You also ask how that inequality is used in proving Theorem 1.37(e) on p.16. 
Well, the vectors x and y in Theorem 1.37(e) are k-tuples of real numbers, and the real numbers form a subfield of the complex numbers; so we apply the Schwarz inequality with the coordinates of x and y in place of the a_j and b_j respectively, taking the positive square root of each side. Note that the operation of conjugating b_j has no effect when b_j is a real number. (The conjugate of a + 0i is itself.) So the summation on the left in the inequality really is the dot product. ---------------------------------------------------------------------- Regarding the proof of Theorem 1.35 on p.15, you ask how Theorem 1.31 "leads to the first line" of the computation. That theorem doesn't lead to the first line; it contains computational facts that are used in various lines of the computation. (His phrase "by Theorem 1.31" is a bit confusing that way.) ---------------------------------------------------------------------- You ask how Rudin comes up with the expression \Sigma |B*a_j - C*b_j|^2 to use in the computation in the proof of Theorem 1.35, p.15. Unfortunately, the answer is a lot longer than the proof of the theorem! I'll sketch it here for the case where the a_j and b_j are real rather than arbitrary complex numbers. The complex case can be motivated similarly. (Cf. the relation between exercises 1.7:2 and 1.7:3.) First note that if we write a = (a_1,...,a_n) and b = (b_1,...,b_n), then the numbers we are dealing with are dot products of vectors: A = a.a, B = b.b, C = a.b, and the expression Rudin pulls out of the hat is (Ba - Cb).(Ba - Cb) . To motivate the choice of this element, suppose that V is any vector space over R, and that we have defined an operation V x V --> R which we call "dot product" on V, and which, for all vectors x, y, z and scalars alpha satisfies (i) x.y = y.x, (ii) (x+y).z = x.z + y.z, and (iii) (alpha x).y = alpha(x.y). Suppose we want to know whether it has the very important further property (iv) x.x >_ 0 for all vectors x. Well, to simplify this problem, suppose we take any two vectors a and b, and consider the subspace of V consisting of all vectors of the form x = alpha a + beta b (alpha, beta real numbers). Then using the laws (i)-(iii) we can figure out the dot product of any pair of elements of this subspace if we know the three numbers A = a.a, B = b.b, C = a.b . So we may ask: What relation must these three numbers have with each other in order to make (iv) hold for every x = alpha a + beta b ? With a little thought, one can see that it is enough to consider the case x = a + lambda b for a single scalar lambda. We find that x.x = A + 2 lambda C + lambda^2 B, so we want to know the conditions under which this will be nonnegative for all lambda. Assuming that A and B are nonzero, they must be positive to make appropriate values of this expression nonnegative. Then this quadratic function of lambda will have some minimum, and if that minimum is nonnegative, (iv) will hold. By calculus (which we shouldn't be using until we re-develop it in Chapter 5) we see that this minimum will occur at lambda = - C/B. Substituting this value into the expression, we get a condition which simplifies to A - C^2/B >_ 0, or clearing denominators, AB - C^2 >_0. 
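(If you would like to see this with concrete numbers, here is a small Python computation -- my own illustration, with randomly chosen real vectors, and not part of the proof:

import random

# Two random real vectors, and the numbers A, B, C defined above.
a = [random.uniform(-1, 1) for _ in range(5)]
b = [random.uniform(-1, 1) for _ in range(5)]
dot = lambda u, v: sum(s * t for s, t in zip(u, v))
A, B, C = dot(a, a), dot(b, b), dot(a, b)

lam = -C / B                           # the minimizing value of lambda
print(A + 2 * lam * C + lam ** 2 * B)  # the quadratic at its minimum ...
print(A - C ** 2 / B)                  # ... agrees (up to rounding) with A - C^2/B
print(A * B - C * C >= 0)              # and AB - C^2 >= 0, as the argument predicts

Each time it is run, the first two numbers printed agree up to rounding error, and the last line prints True.)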
Now if we just want to prove that this inequality is necessary for (iv) to hold (without proving that it will also make (iv) hold, and without bothering the reader with the computations by which we came upon this formula), then it is enough to say "(iv) implies that a + (-C/B) b has nonnegative dot product with itself; expand that condition, and you will get the relation A - C^2/B >_ 0." Or better yet, instead of writing down a + (-C/B) b, multiply this out by the scalar B, thus clearing denominators and getting the vector Ba - Cb. Then the condition that its dot product with itself is >_0 implies AB - C^2 >_ 0. Now in our situation, we _know_ that (iv) holds for our dot product, because the dot product of any vector with itself is a sum of squares; so the same calculation using the known relation (iv) and x = Ba - Cb will prove that our dot product satisfies AB >_ C^2, as desired. I guess you can see why Rudin didn't explain where he got that expression from -- it is just too roundabout! On the other hand, he might have given you a different proof that looked a little less unnatural; for instance, the one you will do for homework as Exercise 1.7:2. ---------------------------------------------------------------------- Regarding the proof of Theorem 1.35 on p.15, you ask how the second line of the computation simplifies to the third. That step would have been clearer if Rudin had inserted a line in between them: = B^2 A - B C-bar C - B C C-bar + |C|^2 B. Can you see that the second line simplifies to this? Now using the fact that C C-bar = |C|^2, one can see that the last three terms are all equal except for sign; so they simplify to a single term with a minus sign. ---------------------------------------------------------------------- You ask why in the last display on p.15 (proof of the Schwarz inequality), the final factor in the first line is B\bar{a_j} - \bar{C}\bar{b_j}, rather than \bar{B a_j} - \bar{C b_j} as one might expect from the general formula |z|^2 = z\bar{z} . This is because B is a real number, and the conjugation function satisfies \bar{az} = a\bar{z} for a real; so \bar{B a_j} = B\bar{a_j}. (C, on the other hand, is not in general real, which is why it appears conjugated in that factor.) ---------------------------------------------------------------------- You ask whether for k < m, the zero vector in R^k (Def.1.36, p.16) is the same as the zero vector in R^m. No, because one zero vector is a k-tuple and the other is an m-tuple. However, in some contexts, an author may say "We will identify R^k with the subspace of R^m consisting of all vectors whose last m-k components are zero" (in other words, regard k-tuples (x_1,...,x_k) as equivalent to m-tuples of the form (x_1,...,x_k,0,...,0)). In this situation, all these zero vectors will be treated as equal. ---------------------------------------------------------------------- You ask why, on pp.15-16, Rudin didn't introduce the inner product till after the Cauchy-Schwarz inequality. I also feel that it would have been better to do it that way. But putting together a book is an enormous amount of work -- I've been working for about 35 years on my Math 245 notes, /~gbergman/245, and every time I teach that course and read through what I have there, I see more things that would be better if done a little differently -- and even some just plain errors -- and I make these changes. Often making one change causes other things that aren't changed to be less than optimally arranged. So we should be grateful that Rudin did as much as he did, and sorry that at the age he has reached, further revisions are more than he is up to doing. 
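Since the last several questions have all circled around Theorem 1.35, let me add one more small Python illustration of my own (it proves nothing, but can be reassuring): a numerical check of the Schwarz inequality for a pair of complex vectors.

# Check |sum a_j conj(b_j)|^2 <= (sum |a_j|^2)(sum |b_j|^2) for sample complex vectors.
a = [1 + 2j, -0.5j, 3.0, 2 - 1j]
b = [0.5 - 1j, 2 + 2j, -1.0, 1j]
left = abs(sum(x * y.conjugate() for x, y in zip(a, b))) ** 2
right = sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b)
print(left, right, left <= right)   # the last item printed is True

You can of course vary the vectors; the two sides become equal exactly when one vector is a scalar multiple of the other.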
---------------------------------------------------------------------- You ask about Rudin's statement on p.17 (second line above Step 2) that (II) implies that if p\in\alpha and q\notin\alpha then p < q. Well, it should be clear when you draw a picture, with \alpha a region of the line "going downward", as (II) implies. Then if you choose something in that region and something outside it, the former must be below the latter. To show it formally, consider any p\in\alpha and q\notin\alpha. If p were >_ q, then in view of (II), "p\in\alpha" would imply q\in\alpha, contradicting our assumption. So we must have p < q. ---------------------------------------------------------------------- You say that in Step 3 on p.18, \gamma is being used "interchangeably as a set and a number". The word "number" is ambiguous here: Elements of Q are our old "numbers", while elements of R, which are sets of elements of Q, are our new "numbers". So our new "numbers" are sets of old "numbers". I hope that if you re-read that Step, with this in mind, it will make more sense. ---------------------------------------------------------------------- You ask, regarding Step 3 on p.18, "If gamma is a cut (= set) and gamma = sup A, does this mean that A has several supremums?" No. A supremum of a set A of elements of R means an upper bound of A which is _< all upper bounds of A. All the elements of R are sets, and the concept of "upper bound" is defined in terms of the order-relation "<" on R, as stated in Step 2, p.17. So a least upper bound of A means an element of R, i.e., a _set_, which is >_ all members of A (which are themselves sets) under the ordering stated in Step 2, and _< all such upper bounds. As noted on p.4, a subset of an ordered set can have at most one supremum; i.e., there is at most one set gamma having the properties stated. (By the way, the plural of "supremum" is "suprema".) ---------------------------------------------------------------------- Concerning Step 3, on p.18, where Rudin constructs least upper bounds of cuts, you ask how one gets greatest lower bounds. Well, since least upper bounds are done by taking the union of the given set of cuts, the obvious way to approach greatest lower bounds would be to consider the intersection of that set of cuts. This _almost_ works. Can you figure out the little correction that has to be made to get it to work exactly? ---------------------------------------------------------------------- You ask about the third and 4th lines on p.19, where Rudin says "Let beta be the set of all p with the following property: There exists r > 0 such that -p-r not-in alpha." As you note, this "p" is not referring to a specific number, but is simply part of the definition of the set beta; and you ask "What can we assume about p ?" I see that on the next line, Rudin writes "In other words, some rational number smaller than -p fails to be in alpha". In that phrase, he is not making an assertion; he is simply reinterpreting the preceding line. So the "p" referred to is still not a particular number, but part of the description of beta. (For better clarity, I would have written "In other words, _such_that_ some rational number smaller than -p fails to be in alpha".) Later in this Step, the symbol "p" is used for various rational numbers that Rudin wants to discuss in terms of whether they belong to beta. So one is not to assume the statement "There exists r > 0 such that -p-r not-in alpha" unless Rudin actually says "Let p\in beta". 
For some of the p considered, he does state this; for others, he wants to prove it. ---------------------------------------------------------------------- You ask why, on the 9th line of p.19, Rudin says that if q belongs to alpha then -q does not belong to beta. Well, you have to go back to the definition of beta and see whether -q can satisfy it. For -q to satisfy it means that there exists a positive r with -(-q)-r \notin alpha, i.e., with q-r\notin alpha. But this is impossible by (II) on p.17, since q-r < q. ---------------------------------------------------------------------- You ask about Rudin's choice of t = p + (r/2) on p.19, line 12, in showing that the set beta he is constructing there -- which is intended to be the additive inverse of alpha -- satisfies condition (III) for cuts. If one goes back to the idea that alpha is to be thought of as the set of rationals lower than some real number x, then we see that what we want to construct as its additive inverse should be the set of rationals lower than -x. This can be approximately constructed as the set of negatives of the numbers _not_ in alpha -- except for the difficulty that if x is itself rational, then the set of numbers not in alpha will have a smallest member, x itself, and so their set of negatives will have a largest member, -x, and we need to exclude that. So we really want to take the set of negatives of elements of the complement of alpha excluding the negative of the smallest member of that set, if any. Rudin chooses to describe this in a way that does not require an "if any": he makes it the set of rational numbers p such that you can go down "a bit" from -p and (still) be in the complement of alpha; i.e., such that -p-r \notin alpha for some positive rational r. Now if we need to show that there is something larger than p that is also in this set, we can't be sure that p+r will work, because its negative, -p-r, though in the complement of alpha, might be the smallest member of that complement (if there is one). However, if we take the element whose negative goes "half way" down to -p-r, namely p+(r/2), that negative will be sure to be in the complement of alpha and not be the least element thereof; so p+(r/2) does belong to beta. As you indicate, there is no need to use exactly r/2; we could use cr for any rational number c such that 0 < c < 1. (Or, indeed, Rudin could have said "choose any r' such that 0 < r' < r".) But "r/2" was an easy choice that had the right property. ---------------------------------------------------------------------- You ask why, around the middle of p.19, on the line beginning, "If r\in\alpha and s\in\beta," the author can say "then -s\notin\alpha." Well, the statement s\in\beta means, by the definition of \beta near the top of the page, that there exists r>0 such that -s-r\notin\alpha. If -s were in \alpha, then for any such r the element -s-r, being smaller than -s, would also be in \alpha (by condition (II)), a contradiction. So -s cannot be in \alpha. I hope that the picture of the construction of \beta in my lecture -- by taking the complement of \alpha, turning it upside down, and taking away the top element if any -- also made it clear that anything in \beta is the negative of something not in \alpha. ---------------------------------------------------------------------- Regarding the definition of multiplication in Rudin on pp.19-20, you ask "Do I need to prove that \alpha \beta = \beta \alpha?" 
Yes, it is one of the things that needs to be proved -- first at the top of p.20, when Rudin says "the axioms (M) and (D) of Definition 1.12 hold ... The proofs are so similar that we omit them", and then later on the page, when he says "Having proved (in Step 6) that the axioms (M) hold in R^+, it is now perfectly simple to prove them in R". In both places, since (M2), commutativity, is one of these axioms, it is one of the things that he says is easy to prove, and you should be able to do it. If you're unsure how to do it at the top of p.20, remember that he says he is omitting the proofs because they are "so similar" to the proofs of the axioms for addition. So look at how he proves axiom (A2) for addition of cuts, and see whether the same proof works for multiplication in R^+! Concerning the case in the middle of p.20, he says that having proved the result for R^+, it is perfectly simple to get it for R; so here, note how the multiplication of elements of R is defined in terms of the multiplication of R^+, and see whether commutativity of the latter immediately leads to commutativity of the former. You end, "I actually thought this was the definition." The definition of what? It is part of the definition of a field. Since we are proving that R is a field, we have to show that each condition in the definition of a field is satisfied. ---------------------------------------------------------------------- Both of you ask how we know that Q has the archimedean property, called on by Rudin in the middle of p.19. That means that you haven't entered into your copies of Rudin the notations given in the Errata packet that I gave out on the first day of class. Please do so as soon as possible! There's no reason why you should let yourself be confused by points in Rudin that are corrected in that packet. If you don't have your copy of that packet, get another one from me. ---------------------------------------------------------------------- You ask why we don't alter step 6 (p.19, bottom) to define alpha beta whenever alpha and beta are nonnegative elements of R, instead of only positive elements of R. Interesting idea. In the case where alpha or beta is 0*, the definition of alpha beta given on the bottom of p.19 wouldn't work as stated: It says use all p such that p _< rs for r\in alpha, s\in beta and both positive; but if, say, alpha is 0*, then there are no positive r\in alpha, so alpha beta would be the empty set, which is not a cut, and certainly nothing like what we want (namely, 0*). However, if we modify Rudin's definition, and let alpha beta be the union of the set of negative rationals with the set of products rs with r\in alpha, s\in beta both nonnegative, this would work, since in the case where, say, alpha was 0*, it would give the union of the set of negative rationals and the empty set, which would be the set of negative rationals, namely 0*, as desired. However, the verification that this multiplication satisfies (M) and (D) would involve checking, in every case, whether the possibility that one of the multiplications involves 0* might make things work out differently. It could be interesting to try working it through, and see whether this does raise any complications -- let me know if you try doing so. But I wouldn't try introducing that approach in a course at the undergraduate level. Students regularly get confused as to what statements about the empty set are true, so it would make an already challenging topic still harder to deal with. 
You ask about Rudin's use of _< instead of < in defining the product of two positive cuts (p.19, 3rd from last line), comparing this with other definitions where he uses "<". Well, it is more useful to compare this definition of multiplication of positive cuts with the definition of addition of cuts. There he took elements that were _equal_ to a sum of a member of \alpha and a member of \beta. But using products of such elements wouldn't work for multiplication -- if we used the product of an arbitrary member of \alpha and an arbitrary member of \beta, these would include products of arbitrarily large negative numbers, and so would give us arbitrarily large positive numbers, which wouldn't give the cut we wanted. So he restricts to products of positive members of \alpha and \beta, and this gives the positive members of \alpha\beta nicely -- but not 0 and the negative members. He could fix this by throwing them in separately; instead, he chooses to get them by putting a "_<" into the definition. In contrast, in places where he uses "<" in definitions, this is usually to exclude the possibility that the set construction would contain a largest element. But that isn't a danger in this case. ---------------------------------------------------------------------- You ask about the words "for some choice of r\in\alpha, s\in\beta ..." (next-to-last line on p.19) in the definition of multiplication of cuts. You are interpreting these words as meaning that one makes a choice of r and a choice of s, and then defines \alpha\beta in terms of that choice. This is a fine example of ambiguity in order of quantifiers, which I discuss near the bottom of p.9 of the notes on sets and logic! You are reading Rudin's words as "(for all p, p\in\alpha\beta <=> p _< rs) for some r, s ...", whereas what he means is "for all p, p\in\alpha\beta <=> (p _< rs for some r, s ...)". As I indicate in the notes, the meaning would be clear if all quantifiers were placed before the assertions quantified, so that their order showed which choices were made before which. What he means becomes unambiguous when expressed as "for all p (p\in\alpha\beta <=> for some r, s ..., p _< rs)". ---------------------------------------------------------------------- You ask how 1* will serve as the multiplicative identity (p.20, top), when multiplying a positive rational number by any member of 1* has the effect of decreasing it. Well, ask yourself the analogous question of how 0* can serve as the additive identity, though adding any member of 0* has the effect of decreasing any rational number. If you can figure that out, the same idea will apply to 1*. If you can't, then ask me again about the 0* case (if possible, at office hours) and I'll show you why it works. (If uncertain about the 0* case, try figuring out what 0* + 0* will be -- 0* or something smaller? If it is something smaller, there should be a rational number that belongs to 0* but not to the smaller set 0* + 0*. What rational number might have that property?) ---------------------------------------------------------------------- You ask about Rudin's statement at the top of p.20 that the proof of (M) and (D) is similar to the proof of (A). He does not mean that the new proofs can be gotten by taking the old proofs and just making little changes in them. 
He means that the technique of proof is the same: Write down the identity you want to prove, express each inclusion that must be verified as saying that every member of a given set of rational numbers can be expressed in a certain way in terms of other sets of rational numbers, and do the necessary verifications, as illustrated in the proof of (A). ---------------------------------------------------------------------- You ask where Rudin proves the distributive property for positive real numbers, which he uses in his sketched proof in the middle of p.20 of the corresponding statement for arbitrary real numbers. Look at the top 4 lines of p.20. "(D)" is one of the axioms for which he says the proofs are "so similar" to the ones given earlier that he omits them. So -- check it out! Write down the distributive law for positive real numbers, figure out what has to be proved about elements of Q belonging to various cuts, and see whether you indeed find it straightforward to verify! If you run into a problem, e-mail me or come to office hours, and we'll untangle it. ---------------------------------------------------------------------- You note that Step 8 of Rudin's construction of R (pp.20-21) gives us the rationals, and ask how we get irrational numbers, such as sqrt 2. Well, one way is to use the idea noted on p.1 of the book, that sqrt 2 is the limit of the sequence of decimal approximations, 1, 1.4, 1.41, 1.414, 1.4142, ... . This is an increasing sequence, so sqrt 2 will be the supremum of the sequence, which by Step 3 of this Appendix is the union of the sets 1*, 1.4*, 1.41*, 1.414*, 1.4142*, .... Another way, which is not based on the decimal system, but just on the rational numbers in the abstract, is as indicated on p.2 of the text: Let A be the set of positive rationals p with p^2 < 2. Then sqrt 2 is the supremum of A. Finally, if you think about it, you will see that as a set of rational numbers, sqrt 2 consists precisely of those p\in Q such that either p < 0 or p^2 < 2. ---------------------------------------------------------------------- You ask why, in addition to associating with each rational r a cut r* (p.20), Rudin doesn't do the same for real numbers. The approach of this appendix is to assume that we know that the field of rational numbers exists, but we don't know that there is such a field as "the real numbers, R"; rather, we shall prove from the existence of the field of rationals that there is a field with the properties stated for R in Theorem 1.19. (This appendix is the proof of that theorem.) So we can't "associate to each real number a cut" because, until we finish this appendix, we don't know that there is such a thing as a real number. Once we have finished the proof, we know that there is a field with the stated properties -- namely, the field R of all cuts in Q. We then don't have to associate to each member of R a cut in Q, because by definition each member of R is a cut in Q. I could go on at greater length (and I have done so in class); it's hard to know when to stop, i.e., when I've said enough to clarify the point, or whether I'm taking the right tack. If you are still having trouble with this, write again; or ask in office hours. ---------------------------------------------------------------------- You ask, regarding the choice of t on p.21, line 2, > If we were asked to prove (a) from the bottom of the previous page, > is there any other t than 2t=r+s-p to get the same result?
Instead of taking r' = r-t, s' = s-t you could take r' = r-x, s' = s-y for any positive rational numbers x and y that sum to r+s-p. If you want to make x and y the same, though, then to make them add up to r+s-p, you have to use the t that Rudin gives. ---------------------------------------------------------------------- You ask whether the sentence on p.21, line 6, beginning "If r < s" means "For some r ...". As you note, what is being proved is statement (c) of Step 8 on the preceding page. That statement is (c) r* < s* if and only if r < s. This is understood to be quantified "For all r and s in Q". Hence the same applies to the proof. In other words, the sentence you ask about means "For all r and s in Q, if r < s then r\in s* but r\notin r*; hence r* < s*", where the first phrase "For all ..." covers the whole rest of the sentence, "then ... but ... hence ...". ---------------------------------------------------------------------- You ask what Step 9 of the Appendix (p.21) accomplishes. It gets the final assertion of Theorem 1.19: "R contains Q as a subfield". Namely, it summarizes Step 8 as showing that the set Q* of images q* in R of elements q of Q is a subfield of R isomorphic to Q, and then fudges, saying that we can therefore "identify" Q* with Q, and since Q* is a subfield of R, we can say (to all intents and purposes) that Q is a subfield of R. I talked a little more about this in class today, noting what one can do if one is not satisfied with something isomorphic to Q, but wants Q itself. But if you are satisfied with the identification approach, it completes the proof. ---------------------------------------------------------------------- You ask what Rudin means when he says on p.21 that Q is isomorphic to the ordered field Q*. He means that there is a 1-to-1 correspondence between elements of Q and elements of Q* such that corresponding pairs of elements have corresponding sums, corresponding pairs of elements have corresponding products, and corresponding pairs of elements have corresponding order-relations. These are the conditions called (a), (b), (c) on p.20. In Greek, "iso-" means "same", and "morph-" means "form", which mathematicians interpret as "structure". So saying that Q and Q* are isomorphic ordered fields means that they are ordered fields with identical structures. ---------------------------------------------------------------------- Both of you ask why Rudin doesn't mention irrational numbers in his construction of the reals from the rationals (pp.17-21). Because irrationals and rationals are both represented in the same way -- as cuts in the set of rationals. For instance, "2" is represented by the set of rational numbers less than 2, while the square root of 2 is represented by the set consisting of all rational numbers _< 0, and all positive rational numbers with square < 2. All he has to prove is that the R he constructs is an ordered field with the least upper bound property. Then the fact that it contains a square root of 2 will follow from Theorem 1.21, without explicitly saying (as I did above to answer your question) what it looks like in terms of the construction. One of you also asks "why does he define R to consists of sets of numbers, but not numbers?" Well, after the construction is finished, the members of R _are_ what we call numbers.
But he can't define them in terms of the sorts of numbers that we assume we know of _before_ the construction, namely rational numbers, because there aren't enough rational numbers to define every real number by one of them. But there are enough _sets_ of rationals to do this. ---------------------------------------------------------------------- Regarding the construction of R on pp.17-21, you ask: > ... how do we know that this field R fills in the gaps in Q? > As I understand it, the real problem is proving that there are > irrational numbers between all elements of Q. There are already elements of Q between all elements of Q -- if p and q are in Q, then (p+q)/2 lies between them. That's not the problem; the problem is when there is "a place for an element, with no element in that place". This occurs when there is a bounded set A with no l.u.b.. In that case, the upper bounds of A also form a set B with no g.l.b., and there is "a place for an element" between A and B, but nothing there. In that case, if we enlarge A by throwing in all rational numbers p such that there exists q\in A with p < q, this doesn't change the set of upper bounds of A (because each element p we are adding is bounded above by an element of A, and hence bounded above by every upper bound for A); so A becomes a cut in Q, i.e., an element of R. And it is precisely the element of R needed to fill in the gap we found in Q. To put it briefly: the "gaps" in Q correspond to failures of the l.u.b. property, and since we proved that R has the l.u.b. property, it has no such "gaps". ---------------------------------------------------------------------- You ask about the meaning of the word "isomorphic" just below the middle of p.21. Two ordered fields E and F are called isomorphic (as ordered fields) if there exists a one-to-one and onto map, f: E --> F, which preserves addition (i.e., f(x+y) = f(x) + f(y) for all x, y), and multiplication (similarly) and order (i.e., x > y <=> f(x) > f(y)). You will see many versions of that concept (e.g., for groups, for rings, for fields, and perhaps for vector spaces, though probably not for ordered fields) when you take Math 113. Exercise 1.4:3 in the homework packet shows how to prove the result that Rudin mentions here. (Note that it is rated "d:5", meaning a problem that would be a good extra-credit exercise in H104.) Still you might find it interesting to look over and think about. --------------------------------------------------------------------- You ask in connection with the construction of R on pp.17-21 whether we will see the construction of R using Cauchy sequences of rational numbers. No. It is certainly elegant; but there are a couple of disadvantages. First, it requires the concept of the set of equivalences classes of an equivalence relation; and though this is repeatedly used in algebra, that would be the only use that would have to be made of it in this course, so one would have to go through the work of developing it just for that one use. Secondly, the natural context for the construction is the completion of a metric space; and the definition of a metric space assumes we have the real numbers. So to construct the real numbers that way, we would have to give a version of that construction based on a "metric space whose distance-function takes values in an arbitrary ordered field" just for that case, and then abandon that greater generality for all subsequent uses of the construction. 
Anyway, you can read about the completion of a metric space in exercise 3R24 (p.82), and the exercise after that does come down to the construction you ask about. ---------------------------------------------------------------------- You ask whether, since members of R are defined by cuts (pp.17-21), it is wrong to think of them as points. Not at all! We have used the construction by cuts to show that a structure exists that has the properties that we intuitively believe in for real numbers. But that doesn't mean that the particular construction we have used must oust our useful way of visualizing numbers. ---------------------------------------------------------------------- In reference to Rudin's Step 9, p.21, you ask about > ... using isomorphism to show that two sets, having different types > of elements (for instance rational numbers as elements vs. sets of > rational numbers as elements) can be subsets of each other. ... Well, we aren't saying that they are really subsets of one another. When we say we "identify" Q with Q*, we mean that we will use the same symbols to for elements of Q and the corresponding elements of Q*, because they agree in all the properties we care about: arithmetic operations and order, as proved on the preceding pages. Q has the advantage that we already have symbols for its elements, while Q* has the advantage that it is a subset of R; so to get both advantages, we apply the symbols we were previously using for Q to the new field Q*. > ... My question is, are there limits to this? Is this only used > when absolutely necessary? ... We avoid using it when it could lead to confusion! So if we had a need in what followed in this course to consider statements about relationships between elements of "the original Q" and elements of "the new Q" (i.e., Q*), then we couldn't use the same symbols for both! But instead, we are "leaving the old Q behind", retaining everything we knew about it (and even the mental images we used for it) as facts about (and images of) "the new Q". As I mentioned in class, this is the same thing we did when we constructed C: We noted that R was isomorphic to the subfield of C consisting of complex numbers of the form (a,0) (a\in R), and identified R with that subfield of C. There are alternative ways to handle this problem if you don't like the idea of "making identifications", or "abandoning" the old structures. For instance, if we call the field constructed on pp.17-21 "R_cut", then instead of calling this the real numbers, we could define R = "(R_cut - Q*) \union Q"; i.e., we could remove from R_cut the elements of Q*, and put the elements of Q in their places, and call the result of this surgery "the real numbers, R". Then our original Q (rather than Q*) would be a subfield of R. Of course, it would then take more pages to make the obvious definitions of addition and multiplication and order on this "hybrid" R, and show that it forms an ordered field with least upper bound property. In the end, what matters to a mathematician is the properties of the structures he or she has defined, not which construction was used; and the two different versions of the real numbers obtained as described above have the same properties (are isomorphic). The main caution in doing things like making identifications is to avoid unnecessary confusion. ---------------------------------------------------------------------- You ask about Rudin's use of "if" in definitions such as those on p.24, where other texts you have seen had "if and only if". 
The other texts you are referring to sound atypical -- in mathematical writing, definitions most commonly use the word "if", even though the meaning is logically like "if and only if". It's hard to say what the reason is for this usage -- I think that mathematicians generally use "if and only if" to assert that two statements whose meanings are both known are equivalent, while in a definition, you are not stating a mathematical fact, but saying how a word or phrase will be interpreted. Nevertheless, the implication does go both ways: As you say, if A~J then one calls A countable, _and_ if one calls A countable this means A~J. ---------------------------------------------------------------------- You ask whether, in the definition of one-to-one correspondence on p.25, the sets A and B have to have the same elements. Definitely not! If they had the same elements that would mean A = B. The fact that Rudin introduces a different symbol, A ~ B, should be enough to show that he means something different. Also, the definition itself shows this: It concerns the existence of a 1-1 function from A onto B; and functions aren't required to send sets to themselves. ---------------------------------------------------------------------- In connection with Definition 2.4, p.25, you ask > Can a infinite set have an equivalence relation with a finite set? In response to your use of words, I emphasized in class that this is not the meaning of "an equivalence relation". An equivalence relation means _any_ relation such that, if we write the relation as ~, then the three conditions of reflexivity, symmetry and transitivity hold; _not_ just the relation of having a 1-1 correspondence! (For instance, the relation on positive integers, "m has the same last digit as n", is an equivalence relation because if we write it "m ~ n", then it satisfies the three conditions shown in Rudin. Several examples of equivalence relations come up in Math 113.) You were referring to the specific relation Rudin introduced in Definition 2.3, so what you meant was > Can a infinite set have the same cardinality as a finite set? The answer is no. Definition 2.4(b) says that we call a set A infinite precisely if it is not finite, i.e., does not have the same cardinality as any J_n, while the finite sets are those that do have the same cardinality as some J_n. ---------------------------------------------------------------------- Regarding Definition 2.4, p.25, you write that you don't understand why one would ever need to say a set A is "at most countable" as opposed to simply "countable". Because "at most countable" allows the possibility that A is finite, while "countable" does not. ---------------------------------------------------------------------- Regarding Definition 2.4, p.25, you ask > ... but isn't a set even consisting of 1 element still countable? No! I warned against this confusion in lecture on Wednesday. I explained that different mathematicians differ on whether to use the word "countable" to describe any set that was not uncountable, or only countably _infinite_ sets, and that Rudin was one of those who use it to refer to countably infinite sets. In other words, the image is not that of "counting and eventually finishing", but of "counting forever", using all the whole-number-words. As I said in class today, when you learn a technical meaning of a word, you must not assume it agrees with what the everyday use of the word would suggest. See Definition 2.4(c), p.25 for the precise meaning of "countable".
If you have trouble seeing why that excludes finite A, you should come by office hours to get clear on this. ---------------------------------------------------------------------- Regarding Definition 2.4, p.25, you ask > Q: Is a finite set a countable set? Under the definitions used in Rudin, and hence in this course, No! "Countable" is defined in Def.2.4(c), and a finite set does not have that property. (Under definitions you see in some other texts or courses, "countable" may be used for what Rudin calls "at most countable". The concepts involved are familiar to every mathematician, but every mathematician knows that there are certain variations in usage, so in reading a given author's work, one has to check how that author is using the words.) > How could a finite set be uncountable? It can't. The meaning of "uncountable" is given in Def.2.4(d): "neither finite nor countable". However, as I said, a finite set is not countable -- "uncountable" doesn't mean the same as "not countable". In general, if one sees "un-" added to a word, one can assume it means "not ...". BUT, if the word with "un-" is given its own definition, then that compound word has a separate meaning, which you must learn, and not assume that it simply means "not" plus the original word. ---------------------------------------------------------------------- Concerning the existence of a 1-1 correspondence between the integers and the positive integers Rudin gives at the bottom of p.25, you ask, why can't we make each element of the positive integers correspond to itself in the integers, so that zero and the negative integers don't correspond to anything. We can indeed define such a map -- it shows that there is a 1-1 map from the positive integers to the integers that isn't onto. But that doesn't contradict the property of Definition 2.3. That definition doesn't say that _every_ 1-1 map from A to B must be onto for A ~ B to hold; it says we consider A ~ B to hold if there is _some_ map A --> B that is 1-1 and onto. And that is what Rudin has shown to hold. (By analogy: Note that the statement that a set of real numbers is bounded above says that _some_ real number is an upper bound for the set, not that every real number is. So if we are given the interval [0,1], the fact that 1/2 is not an upper bound for it doesn't make the set unbounded above.) ---------------------------------------------------------------------- You ask how we know that every infinite set, and not just the set of integers, has the same cardinality as one of its proper subsets, as asserted in Remark 2.6, p.26. First one shows that any infinite set S contains a countable subset. Namely, since S is infinite it is nonempty, so we can choose s_1\in S. Since S is infinite it is also not {s_1}, so we can choose an element s_2\in S - {s_1}, and so on; thus we get distinct elements s_1, s_2, ...\in S, so {s_1,s_2,...} is a countable subset of S. Now let us define f: S --> S by letting f(s) = s if s is not a member of {s_1,s_2,...}, and letting f(s_i) = s_{i+1}. (In other words, match {s_1,s_2,...} with a proper subset of itself, and the remaining part of S, if any, with itself.) Then we see that f is a one-to-one map of S onto S - {s_1}, a proper subset of S. Incidentally, from the point of view of rigorous set theory, the construction I just described with infinitely many choices s_1, s_2, ..., requires what you will see called the Axiom of Choice if you take Math 135.
In other undergraduate courses such as this one, we do not formulate that axiom of set theory explicitly, but take it for granted when it corresponds to an "intuitively reasonable" construction. ---------------------------------------------------------------------- You ask whether the phenomenon of infinite sets being equivalent to proper subsets of themselves (p.26, Remark 2.6) doesn't cause some sort of problem. Well, if by "problem" you mean things working contrary to the ideas we've gotten from finite sets -- yes it does. We just have to learn to expect different things when dealing with infinite sets. An example is "Hilbert's Hotel": Hilbert pointed out that if we had a hotel with one room for each natural number, and every room was full, we could still admit a new guest, by moving the person in room N into room N+1 for each N, and putting the new guest in room 0. In fact, he noted, we could admit countably many new guests, by moving the guest in room N to room 2N for each N, and putting the new guests in the odd-numbered rooms. In the early days of set theory, people did run into contradictions, until they learned by experience what forms of reasoning could be validly used for infinite sets. ---------------------------------------------------------------------- You ask what Rudin means when he says on p.26, right after Definition 2.7, that the terms of a sequence need not be distinct. "x and y are distinct" means "x is not the same element as y". So to say that terms of a sequence need not be distinct means that under the definition of "sequence", repetitions are allowed. ---------------------------------------------------------------------- You ask what Rudin means in Definition 2.9 at the bottom of p.26 when he speaks of a subset E_\alpha of \Omega being "associated with" each alpha\in\Omega. It means that we have some rule that for each \alpha\in A gives us the set E_\alpha. In other words, we have a function from A to the subsets of \Omega, written \alpha |-> E_\alpha. ---------------------------------------------------------------------- You ask about the difference between "the union of a countable collection of sets" (p.27, sentence after (4)) and the union of "a sequence of countable sets" (p.29, sentence before (15)). Good question! When you read a phrase like "an [adjective#1] union of [adjective#2] sets", then "adjective#1" describes how many sets are being put together in the union (in other words, the cardinality of the index set A), while "adjective#2" describes some quality that each of these sets has. For instance, if we look at {even integers} \union {odd integers} we can call this a _finite_ union of _countable_ sets -- "finite" meaning that the number of sets listed is finite (we could write this as E_0 \union E_1, with those two symbols denoting the sets of even and odd integers respectively, so the index-set is the finite set {0,1}) while "countable" says that each of these two sets is a countable set. An example of a countable union of finite sets would be {1} \union {1,2} \union {1,2,3} \union ... Of course adjective#2 doesn't have to refer to cardinality; one can speak of "a finite union of bounded sets" etc. etc.. When Rudin refers to the union of a sequence of countable sets, "sequence" means "list indexed by the positive integers", so this is equivalent to "a countable union of countable sets". (4) differs from (15) in that there is no condition stated in (4) on the size of the sets E_m, but in (15), they are all assumed countable. 
(In each case, the index set A is countable, so each union is a countable union.) ---------------------------------------------------------------------- You ask why the intersection labeled "(iii)" near the top of p.28 is empty. Well, it would help me if you said what element(s) you thought were in it. I don't claim that "If you can't name anything in it, it must be empty" is a valid proof of anything. But this is a fairly explicit construction of a set, so if you understand what the symbols mean, you should be able to see what elements it contains. If you don't understand what it means, then the meaning of such intersections, introduced on the preceding page, is what you should be asking about. Please do let me know what you understand the set to contain. ---------------------------------------------------------------------- You ask about Rudin's statement that display (17) on p.29 shows that the elements of display (16) can be arranged in a sequence. Well, I can answer that conceptually or formally. Conceptually, a sequence just means an infinite list -- where one element is listed "first", one is listed "second", etc.. Display (17) is such a list, with x_{11} listed first, x_{21} second, x_{12} third, etc.. Formally, as Rudin says in Definition 2.7, a sequence is a function with domain {1,2,3,...}. Since we denote such a function by listing successively the values it takes on at 1, 2, 3, etc., display (17) represents the function that takes 1 to x_{11}, 2 to x_{21}, 3 to x_{12}, etc.. You also ask, "What are the semicolons doing in there?" Rudin is using them to help the reader's eye see how (17) is formed from (16). The term before the first semicolon is the one that the upper left-hand arrow in (16) passes through; between the first and second semicolons are the terms that the next arrow passes through, etc.. The arrows represent successive legs of our infinite "journey" through the array (16), and the semicolons help us see how our sequence is made up of those legs. ---------------------------------------------------------------------- You ask about the statements on p.29, after display (17), "If any two of the sets E_n have elements in common these will appear more than once in (17). Hence there is a subset T of the set of all positive integers such that S ~ T ...". Rudin's wording is a little misleading. What he means is something like "Hence if we call the nth term of (17) y_n, the map n |-> y_n may not be a one-to-one map from the set of positive integers onto S; to get a one-to-one map we have to skip an integer n whenever y_n is equal to an element that has occurred earlier. Doing so gives us a correspondence, not between S and the positive integers, but between S and a subset T of the positive integers." ---------------------------------------------------------------------- You ask about the sentence beginning "Hence" on p.29, starting two lines below display (17). The "Hence" is poor wording on Rudin's part. What he means is that since there may be repetitions, we need to use a _subset_ of the terms of the sequence, rather than the whole sequence, to get a non-repeating list. If we call the terms of (17) as shown "y_1, y_2, y_3, ...", then T can be taken to consist of those positive integers n such that y_n is not equal to any of the earlier y's. Then the map n |-> y_n is a one-to-one correspondence between T and S.
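To make the skipping of repetitions concrete, here is a small Python sketch (my own, with finite lists standing in for the infinite sets E_1, E_2, ...; the names E, y and T are just chosen to match the discussion above). It walks the array along the diagonals to form the terms y_1, y_2, ..., then keeps only those positions at which a value appears for the first time.

    E = [[1, 2, 3, 4],    # finite stand-in for E_1
         [2, 4, 6, 8],    # finite stand-in for E_2 (overlaps E_1)
         [1, 3, 5, 7]]    # finite stand-in for E_3

    y = []                                      # the terms y_1, y_2, ... of the diagonal list
    for d in range(len(E) + len(E[0]) - 1):     # d runs over the successive diagonals
        for i in range(min(d, len(E) - 1), -1, -1):
            j = d - i
            if j < len(E[i]):
                y.append(E[i][j])

    T = [n for n in range(len(y)) if y[n] not in y[:n]]    # positions of first occurrences
    print([y[n] for n in T])   # each element of the union is listed exactly once
    # (Python counts positions from 0; Rudin's T is the corresponding set of values n+1.)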
---------------------------------------------------------------------- You ask how, at the end of the proof of Theorem 2.12, p.29, E_1 being infinite implies S is countable. We have already shown that S is at most countable, i.e., _countable_or_finite_. The set E_1\subset S being infinite shows that the alternative "S is finite" is not true; so S is countable. ---------------------------------------------------------------------- You ask about the justification for the words "Let E be a countable subset of A" at the start of the proof of Theorem 2.14, p.30. Rudin does not claim that such a set E exists! The outline of the proof is to show that a countable subset E of A, _if_ any exists, cannot be all of A, hence that A is not countable. (Another way to arrange the proof would be to say "Assume by way of contradiction that A was countable," list its elements, and then construct an element not on the list exacly as in the proof given, getting a contradiction.) Actually, one can easily show that any infinite set S does contain a countable subset. [See answer to question about Remark 2.6, p.26, above.] But, as I said, this is not relevant to the proof Rudin is giving here. ---------------------------------------------------------------------- Regarding Cantor's Diagonal Argument in the proof of Theorem 2.14, p.30, you note that this construction produces from a countable set E of elements of A a new element s that is not in E, and you ask "what happens if you added this element s to E and why isn't it valid to include s in E and still have E be a countable set?" If you bring in that new element, getting E \union {s}, this will indeed be countable. But this new set still won't be all of A. The proof is that you can apply the same diagonal argument to the new set, and get yet another element, s'\in A that doesn't belong to it. (When you verify that the new set E \union {s} is countable, this involves arranging its elements in some sequence, with s in some position in that sequence. Then the procedure used in the Cantor diagonal argument will give you a new element, say s', which by construction is guaranteed to differ from s in some digit, as well as differing in one digit or another from each of the elements of E.) Perhaps the fallacy in the way you approached the question is shown in the words "include s in E". We can't consider the set E \union {s} to be the same set as E. Rather, as the preceding paragraph shows, it will be a set that, _like_ E, is countable, so that we can apply the same method to it. In summary, what the Cantor diagonal argument proves is that whatever countable subset E of A you start with, it won't be all of A; so A is not countable. ---------------------------------------------------------------------- You ask how to relate the real numbers constructed in Cantor's Diagonal Process (Theorem 2.14, p.30) to the view of the real numbers as filling gaps in the rationals via least upper bounds. Well, if s_1, s_2, s_3, ... are the successive digits of a binary expression constructed as in Theorem 2.14, then the real number represented by that binary expression is the l.u.b. of the sequence: 0, (s_1)/2, (s_1)/2 + (s_2)/4, (s_1)/2 + (s_2)/4 + (s_3)/8, ... , (s_1)/2 + (s_2)/4 + ... + (s_n)/2^n, ... So here as earlier, "new" real numbers are being obtained as l.u.b.s of "old" ones, in particular, of rationals. ---------------------------------------------------------------------- Concerning the definition of "the ball B" a little below the center of p.31, you ask > ... 
Just for clarity, y cannot be on the edge of the ball right? > Because otherwise |y-x| ... > The q does not have to be the same for every neighborhood correct? Right! ---------------------------------------------------------------------- You ask what Rudin means by "neighborhood" in part (e) of Definition 2.18, p.32. He defines the term in part (a) of that definition! In general, if a textbook-author uses a term or symbol, and you don't know what it means, the first thing to do is look in the immediately preceding paragraphs and see whether it is explained there. If not, then look for it in the index (or list of symbols as the case may be). Finally, if you don't find it in either way, or if the definition given does not seem clear to you, then ask! ---------------------------------------------------------------------- You ask whether a dense subset E of X (Def.2.18(j), p.32) must have the same cardinality as X (i.e., satisfy E ~ X). No. As I pointed out in class, Q is dense in R, but Q is countable, and R is not. ---------------------------------------------------------------------- You ask how useful diagrams are in thinking about concepts such as those introduced on p.32. As you noted, we can't draw one picture that will embrace all possibilities. However, on the one hand, one can use a variety of pictures illustrating various interesting cases, and hope that between these, one will see what kinds of situations can occur -- that was what I did in class. On the other hand, one can use pictures simply to focus one's thoughts, bearing in mind the limited relation the picture has with the situation -- e.g., in putting a point inside a curve, one may simply mean "the point is in the set" and no more. Finally, in thinking about a situation one might say "Well, what I tried to prove could fail if we had such-and-such a situation", and draw a picture that illustrates that situation, even if it doesn't show all the other things that can happen. One can alternatively draw these pictures in one's mind. There is the old joke of the mathematician who comes home, lies down on the couch, and shuts his eyes, and his wife tells the children "Don't disturb Daddy -- he's working". Which reminds me of a similar story I heard recently, from a mathematician whose wife was an artist, who heard his child telling a friend "My mother and my daddy both make pictures, but my mother's are better." ---------------------------------------------------------------------- Regarding the concepts of open and closed set (p.32), you ask why I said in class that the empty set is both open and closed. Well, to say it very briefly, the conditions for a set E to be open or closed are both universal quantifications: The former refers to "every point p of E" and the latter to "every limit point of E". Now a universally quantified statement "for all x of such-and-such a sort, such-and-such is true" is automatically valid if there do not exist any x of the indicated sort. (For instance, if a requirement for graduation is that "all library fines have been paid", then a person who never had any library fines automatically satisfies this requirement.) The empty set has no points, and has no limit points, so the conditions for being open and closed both hold, vacuously. ---------------------------------------------------------------------- You ask about the difference between a neighborhood (p.32) and a ball. In Rudin, "ball" is the word used for "neighborhood" in the special case of the metric spaces R^k.
In modern mathematical usage (as noted in the page "What is Topology?" on the errata sheets), "ball" is the general term for what Rudin calls a "neighborhood", while "neighborhood" means something more general. However, since we are using Rudin in this course, we will follow his usage. ---------------------------------------------------------------------- You ask for examples of dense subsets of metric spaces (p.32, Definition 2.18(j)). For your own benefit it would have been best if you had tried to find examples on your own and told me what examples you had found, and then asked whether there were examples of different sorts, or cases that had some properties that you had not been able to find examples of. However, here are a few: (1) Take any of the examples I showed of sets E that were not closed, and regard the closure of E as a metric space X. Then E is dense in X. (2) The set of rational numbers is dense in the space R of real numbers. Likewise the set of irrational numbers is dense in R. (3) The set of nonintegers is dense in R. (4) In R^2, the set of pairs (x,y) such that neither x nor y is an integer is dense. (To picture this, think of the "grid" in R^2 that is drawn on graph paper as the set of (x,y) such that x is an integer or y is an integer. Then this set is its complement.) ---------------------------------------------------------------------- > ... If a set is bounded in the sense on p.33, does this mean it has > a least upper bound if the set it is imbedded in has the l.u.b > property? One can't talk about the l.u.b. property except for an ordered set; and then the answer depends on what assumptions one makes relating the order structure and the metric space structure. It is easy to give natural examples where the answer is no: Let X be the set of negative real numbers, and E be (-1,0). Then E is bounded in the metric space sense (every point has distance < 1/2 from -1/2), but has no least upper bound in X. The meanings of "bounded" in ordered sets and in metric spaces are really quite different. What they have in common is that for X = R, a set is bounded in the metric space sense if and only if it is bounded above and below in the ordered-set sense. ---------------------------------------------------------------------- You ask how the Corollary on p.33 follows from Theorem 2.20 (p.32). Well, suppose the Corollary were not true; i.e., that a finite set E had a limit point p. Apply the Theorem to that limit point. Do you see that we get a contradiction? This is the kind of deduction that Rudin assumes the reader can make without guidance. Looking at the Corollary, one should say to oneself "Since this is a Corollary to the Theorem, it must somehow follow directly from it. How?". Seeing that both results concern limit points and finiteness/infiniteness, one should be able to make the connection. You also comment that the corollary is not "if and only if". Right. So you should see whether you can find an example of a set without limit points that isn't finite. ---------------------------------------------------------------------- You ask whether a "finite set" (e.g., in the Corollary on p.33) should be understood to be nonempty. No; the empty set is finite. However, authors sometimes make statements about finite sets without stopping to think whether they apply to the empty set, so when you see a statement about a finite set it is worth asking yourself "Does the author really mean to allow the empty set in this case?" 
(The answer for Corollary on p.33 is that it is correct for all finite sets including the empty set.) ---------------------------------------------------------------------- You ask, regarding the blank space in the row for (g) on p.33, how there could be any doubt that the segment (a,b) should be regarded as a subset of R. Note that Example 2.21 begins "Let us consider the following subsets of R^2". So by hook or by crook, we must somehow consider (a,b) as a subset of R^2. What Rudin should have explained is that he is doing so by using the identification of each real number x with the complex number (x,0). The result is that the answer to whether (g) is open should be an unambiguous "No": since he says we are considering subsets of R^2, the question is whether the line-segment (g) is an open subset of the plane R^2; and it is not. ---------------------------------------------------------------------- You ask whether an infinite index, as in the infinite case of Theorem 2.22, p.33, is understood to be countable. Good question. The answer is no. For instance, if A is, say, a line-segment in the plance, then the set of points that are less than 1 unit in distance from at least one point of A can be written "\union_{p\in A} N_1(p)". This is an uncountable union, because A is uncountable. However, when one writes a union like display (4) on p.27, then the bounds "m = 1" and "infinity" are understood by convention to mean that m ranges over all positive integers, which, of course, form a countable set. ---------------------------------------------------------------------- Regarding the proof of Theorem 2.22, pp.33-34, you ask "How can he consider A to be a subset of B or vice versa just on the basis of an element x being a member of both?" The relevant criterion is that for any two sets A and B, we have A \subset B if and only if _every_ element x that is a member of A is also a member of B. In this proof, when he says "If x\in A, ...", the reasoning that follows uses _only_ the assumption that x\in A, and therefore applies to _every_ x\in A; so when he gets the conclusion "x\in B" this shows that every element x that is a member of A is a member of B. The same idea applies to the second half of the proof, with the roles of A and B reversed. This is an important form of mathematical reasoning -- remembering what we have assumed about an element, and concluding that whatever we have proved about it applies to all elements satisfying that assumption. ---------------------------------------------------------------------- Regarding Theorem 2.23 on p.34, you ask > Is it true that the only subsets of a metric space X that are both > open and closed are the set X itself and the empty set? If it's not > true can you provide an example? It's not true. For a "typical" counterexample, let X be the union of the two intervals [0,1] and [2,3]. Then each of those intervals is both open and closed in X. (They are not open in R but that is not the question.) For an example that cuts things closer, let X be R - {0}; then the set of positive reals and the set of negative reals are both open and closed in X. Likewise, if X is the metric space of rational numbers, then for any irrational number r, the set of rationals < r and the set of rationals > r are both open and closed in X. ---------------------------------------------------------------------- Probably based on Theorem 2.24(d) on p.34, you comment > ... 
It seems as though one could take any closed set, union with it a > finite number of arbitrary points outside of the set and still have > a closed set. This does not seem to lend itself to an intuitive > understanding of the meaning "closed," ... It shows that the property of being closed isn't one that involves the behavior of just finite numbers of points. For a set to be closed means that if it can get "closer and closer and closer" to a point (i.e., within epsilon distance of the point for every positive epsilon), then it captures the point. But adding just finitely many points doesn't get you within every positive epsilon of a point if you weren't already. ---------------------------------------------------------------------- You ask how we get display (21) on p.34 from display (20) on the preceding page. By taking the complements of both sides, and letting E_alpha be the complement of F_alpha for each alpha. ---------------------------------------------------------------------- You ask, regarding Definition 2.26, p.35, whether E won't always be contained in E'. No. In Definition 2.18(b), "A point p" means a point of X, not necessarily in E. (Rudin indicates this at the beginning of the definition.) I hope my examples in class yesterday made this clear. Part (d) of Definition 2.18 should also have made you suspect that if you were assuming that a limit point of a set was _always_ a point of that set, something was wrong. ---------------------------------------------------------------------- Regarding Theorem 2.27 on p.35, you ask > ... Is there anything wrong with the book's proof of > 2.27(a) which was changed in the errata? How do we get the statement "The complement of \bar{E} is therefore open" in that proof? It requires that p have a neighborhood which does not intersect \bar{E}, while Rudin just shows it has a neighborhood which does not intersect E. One can show that if N_r(p) is a neighborhood of the latter sort, then N_{r/2}(p) is of the former sort -- but showing that would add several more sentences to the proof. ---------------------------------------------------------------------- You write, regarding Remark 2.29, p.35, "I find Rudin's statement of relative openness to be rather wordy and a bit confusing." I guess you mean the sentence at the bottom of the page, beginning "To be quite explicit ...". The point is that that statement is not a definition, but an expansion of what he has already said, namely that if E \subset Y \subset X, then E can be considered as a subset either of the metric space X or of the metric space Y, and that if we apply Definition 2.18 in each case, we find that E being open as a subset of X and as a subset of Y are not the same thing. The point of the next sentence is to state what one gets by applying Definition 2.18 in the latter case. It is wordy because it is aimed at expanding on something that one can understand briefly from the definition; and if it seems confusing, perhaps this is because you didn't first look back at the definition to see for yourself what being open as a subset of the metric space Y should mean. Anyway, Theorem 2.30 shows that it reduces to a simple criterion. ---------------------------------------------------------------------- Regarding Theorem 2.30, p.36, you write > Q: I'm confused by "open relative" ... > ... will A be "open" or "closed" in an absolute sense? The definitions of "open", "closed", etc. in Definition 2.18 all refer to E as a subset of the metric space X.
If we change what space X we are considering E to be a subset of, the meaning of being "open", "closed" etc.. changes. Remember that in that Definition, "point", "subset" etc. mean "point of X", "subset of X", etc.. For instance, in part (a), N_r(p) means the set of all points q\in X such that d(p,q) < r. ---------------------------------------------------------------------- He then brings 2^n to one side of this equation by itself, by multiplying both sides by 2^n and dividing by r. The result is delta/r >_ 2^n, or equivalently, 2^n _< delta/r. ---------------------------------------------------------------------- You ask whether the statement in the proof of Theorem 2.41, p.40, that every bounded subset E of R^k is a subset of a k-cell, is trivial, or whether you need to prove it. Well, you should certainly be able to prove it if you are asked to do so! The fact that Rudin takes it for granted means that he assumes his readers can see how to prove it. ---------------------------------------------------------------------- You ask about the inequality on p.40, first display. Well, the fact that E is not bounded means that its points do not all lie within a bounded distance from any point q. (See definition of "bounded" on p.32.) In particular, they do not all lie within a bounded distance of the point 0. In other words, for every constant M, we can find a point at distance >_ M from 0. In particular, for each integer n we can find a point at distance >_ n+1 from 0, hence at distance > n from 0. Calling that point x_n, we get the displayed inequality. ---------------------------------------------------------------------- You ask why, on p.40, line after first display, the x_n have no limit point. If p were a limit point of {x_n}, then there would be infinitely many points in N_1(p). These would all have distance _< d(0,p) + 1 from 0 (using the definition of N_1(p) and the triangle inequality). But taking an integer M > d(0,p)+1, we see that the displayed inequality says that no x_n except the finitely many ones with n < M can have distance _< d(0,p) + 1 from 0. ---------------------------------------------------------------------- You ask why, on p.40, two lines before second display, S has no other limit points. The intuition is that the points comprising S are getting very near to x_0, and so can't be simultaneously getting very near any other point p. This can be made precise using the triangle inequality. Can you figure out how? ---------------------------------------------------------------------- You ask, regarding the final computation in the proof of Theorem 2.41, p.40, "How do we know |x_0 - y| - 1/n >_ |x_0 - y|/2 ?" This is a lesson in how to read a book like this. Notice that the next line begins "for all but finitely many n". So Rudin is not saying that all n satisfy this formula but only that all big enough values of n do. Understanding it in this sense, do you see why it is true? ---------------------------------------------------------------------- You ask how, on p.40, Rudin gets the last step of the second display, and what he means by "for all but finitely many n". Whether that inequality holds depends on n. He is saying, not that it holds for all n, but "for all n with only finitely many exceptions". Now look at the two sides of that inequality. Since |x_0 - y| is a fixed number, whether the inequality holds will depend on the value of what is varying, namely n. If we replaced "1/n" by "0", it would clearly hold. You should be able to verify that if instead of replacing 1/n with 0, we get 1/n small enough, it will still hold.
You should in fact be able to figure out how small 1/n has to be for this to hold (in terms of |x_0 - y| of course), and then see that if you make n large enough, 1/n will be as small as required. ---------------------------------------------------------------------- You ask why, at the end of the proof of Theorem 2.41, p.40, we need to show that y is not a limit point of S. To see this, look at the next sentence: "Thus S has no limit point in E". (To see that this follows from y not being a limit point, look at how y was introduced.) Rudin has assumed the condition "E is closed" fails, and gotten a contradiction to condition (c). Thus, he has proved that (c) implies E is closed, which was his goal in this part of the proof. ---------------------------------------------------------------------- You ask about Rudin's statement after Theorem 2.41, p.40, that "in general" (a) does not imply (b) and (c). The theorem refers to sets in R^k, while the preceding half of the sentence you ask about contrasts the behavior in a general metric space. Saying the implication does not hold "in general" means that not all metric spaces satisfy it. If he left out that phrase, it might seem to mean that it is not true in any metric space, which is false, as the theorem shows. ---------------------------------------------------------------------- You ask, regarding Rudin's comment on p.40 that L^2 does not have the property that closed bounded subsets are compact, whether there are other interesting examples with this property. Certainly! I mentioned the rational numbers in class, simply because they had a superficial similarity to R but failed to satisfy Theorem 2.41. In fact, any non-closed subset of R, e.g., the segment (0,1), will also have that property. Still more generally, so will any non-closed subset of any metric space. ---------------------------------------------------------------------- You ask about the statement used in the proof of Theorem 2.43, p.41, that a set having a limit point must be infinite. See Theorem 2.20, p.32. ---------------------------------------------------------------------- You ask about how in the proof of Theorem 2.43, p.41, Rudin's condition (ii), saying that x_n is not in the closure of V_n+1, is consistent with his other conditions. In particular, you ask whether this means "for all n". Yes -- but the key point is that n occurs twice in that statement, as the subscript in x_n, and as part of the subscript of V_n+1; and when a letter appears more than once in a formula, it means the same thing in both places. So Rudin's formula says, for instance, that x_1 is not in the closure of V_2, that x_2 is not in the closure of V_3, that x_3 is not in the closure of V_4, etc.; but it does not say, for instance, that x_5 is not in the closure of V_2. So these conditions are not inconsistent with statement (iii), saying that V_n+1 has nonempty intersection with P. ---------------------------------------------------------------------- Regarding the Corollary on p.41, you ask > ... Are the reals the "smallest" infinite set which is uncountable or > are there sets with lower cardinality that are still uncountable ... It was conjectured early in the 20th century that the cardinality of the reals was the smallest uncountable cardinality; this was called the "Continuum Hypothesis" (because the set of real numbers is called by set-theorists "the continuum"). In the 1960's it was proved that the Continuum Hypothesis is independent of the standard axioms of set theory. 
That is, if we take these axioms, and adjoin either the axiom that the cardinality of the reals is the next cardinality above that of the natural numbers, or the axiom that there are other cardinaleties in between, both of the resulting sets of axioms are consistent. Since there is no "real-world" way of choosing between one and the other, either must be considered a "valid" set theory. However, it is generally felt that the continuum hypothesis leads to somewhat unnatural mathematical consequences, so if one has to make the choice, one chooses the axiom that there are cardinalities between that of countable sets and that of the continuum. The place to learn more about such things is Math 135. ---------------------------------------------------------------------- You ask why the Cantor set is "clearly" compact (page 41). Because it is a closed bounded subset of R ! ---------------------------------------------------------------------- You ask about the argument in the top paragraph of p.42. This is more roundabout than it needs to be. One can simply observe that E_n has no segment of length > 3^{-n}, hence given any segment I = (alpha,beta), if we take n>0 such that 3^{-n} < beta-alpha, I will not be contained in E_n, hence will not be contained in P, giving the desired conclusion that P contains no segment I. ---------------------------------------------------------------------- You ask about the inequality of the second display on p.42, "3^{-m} < (beta - alpha) / 6". Well, the first thing to clarify is its relation to the rest of the sentence. Where Rudin precedes it with a comma and the word "if", it would be better to drop the comma and change "if" to "for all m such that". Second, although the statement is true with denominator 6 as he shows, it is even true with denominator 4. Here is how we prove it. Suppose 3^{-m} < (beta - alpha) / 4, or in other words, beta - alpha > 4 . 3^{-m}. Then if we look at all integer multiples of 3^{-m}, there must be at least four of them in the segment (alpha,beta). Call these j/3^m, (j+1)/3^m, (j+2)/3^m and (j+3)/3^m. One of the three successive integers j, j+1, j+2 must be congruent to 1 mod 3; call it 3k+1. Since not only j/3^m, (j+1)/3^m and (j+2)/3^m are in the segment (alpha,beta), but also (j+3)/3^m, we can say that not only (3k+1)/3^m but also (3k+2)/3^m is in that segment. Hence (alpha,beta) contains the segment ( (3k+1)/3^m, (3k+2)/3^m ), of the form (24), as Rudin claims. ---------------------------------------------------------------------- Where you quote Rudin on p.42 (sentence containing second display) as saying "if 3^-m<(b-a)/6, P contains no segment", you are putting the "if" clause with what comes after it; but it is actually meant to go with what comes before it. I.e., Rudin is saying "every segment (\alpha,\beta) contains a segment of the form (24) if 3^-m<(b-a)/6"; and then the whole sentence says "Since [this is true], P contains no segment". Conceptually, think of a value m being fixed, and look at all the segments (24). As k runs through the integers, these will form a dotted line "... - - - - ...". If m is large, this dotted line will be very fine-grained, and one of the "-"s will lie wholly inside (\alpha,\beta). How big must m be to make this true? Rudin's answer is "big enough to make the second display on this page hold". (But I think that the reasoning I give in the errata is simpler!) ---------------------------------------------------------------------- You ask what the purpose of the Cantor set (pp.41-42) is. 
Well, it gives an example showing that things can happen which people in the early years of the 20th century thought unlikely; Rudin mentions such a property in the last paragraph of the section. Unfortunately, that property concerns the material of Chapter 11, which we don't cover in 104. (It is covered in 105.) For the purposes of this course, let us just say that its purpose is to expand your awareness of what subsets of R can look like. We've seen simple examples of subsets of R, like line-segments, and countable sets of points approaching a limit (with or without that limit point being included in the set). The Cantor set is "something very different". ---------------------------------------------------------------------- You ask about the concept "measure" that Rudin refers to on p.42 at the end of the discussion of the Cantor set. Measure theory is a topic that is begun in Math 105, and developed in depth in Math 202B, not something we'll be able to treat in this course. Roughly speaking the measure of a subset of R^k abstracts the concept of the length of a segment in R^1, the area of a region in R^2, the volume of a region in R^3, etc.. But even in R^1, it gets complicated when the region is not a segment. For instance, the measure of the set of rationals in [0,1] is zero, hence the measure of the complementary subset of [0,1], the irrationals in that interval, is 1. Neither of those sets is closed, so neither is perfect. The Cantor set is an example showing that a nonempty perfect set can have measure 0. Intuitively, the idea is that as we keep cutting out middle thirds, the sum of the lengths of the remaining segments are (2/3), (2/3)^2, ... , (2/3)^n, ... , which approach 0. ---------------------------------------------------------------------- You ask, regarding the definitions of "separated" and "connected" on p.42, whether one can say that "connected" means "not separated", and whether they are opposites or complements of each other. Neither. The definition of "separated" makes it a relation between _two_ sets, A and B; you can think of it as meaning "separated from each other". The concept of "connected" refers to a single set, and means that that one set is not a union of two nonempty separated sets. So "separated" and "connected" refer to different things. It sounds as though you are skimming definitions, rather than studying them carefully. That doesn't work in math. For every definition, you need to think "What does this exclude, and what does it allow?" Sometimes you won't see good examples on your own; in that case, keep these questions in mind as you read further, and the author will generally give you such examples. ---------------------------------------------------------------------- You ask about the phrase "every neighborhood of p contains p_n for all but finitely many n" in Theorem 3.2(a), p.48. It means that if we pick a neighborhood N of p, and look at all possible values of n (i.e., all the natural numbers or all the positive integers, depending on how this sequence is indexed; let's assume positive integers), the set of values of n for which the statement "p_n \in N" fails to hold will be finite -- i.e., the statement will hold with finitely many exceptions. If you take a familiar convergent sequence, such as f(n) = 1/n, with limit in this case 0, and a particular neighborhood of that limit, say (-.001, +.001), then you will find that the set of n such that f(n) is not in that neighborhood is finite; in this case, the set {1,..., 1000}. 
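If you would like to check that count, here is a two-line verification in Python (using exact rational arithmetic, so there is no rounding question at n = 1000; the bound 5000 is just an arbitrary finite search range):

    from fractions import Fraction
    eps = Fraction(1, 1000)      # the neighborhood (-.001, +.001) of the limit 0
    outside = [n for n in range(1, 5001) if not (-eps < Fraction(1, n) < eps)]
    print(outside == list(range(1, 1001)))    # True: exactly n = 1, ..., 1000 fall outside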
In less simple cases, where the sequence approaches its limit in an irregular manner, the set may not consist of consecutive integers; but convergence is equivalent to that set being finite for all neighborhoods. ---------------------------------------------------------------------- You ask, regarding Theorem 3.2 on page 48 "Why is part (b) necessary? If a sequence is getting closer and closer to one point, and the sequence and point exist in a metric space, doesn't that point have to be unique?" That's the right intuition, but not every metric space "looks like" one that you're familiar with, so you have to be able to show that what seems intuitively clear really follows from the definitions of "convergence", "metric space", etc.. ---------------------------------------------------------------------- You ask how the conclusion d(p,p') < epsilon on the 5th line of p.49 implies that d(p,p') = 0. Note the words "Since epsilon was arbitrary"! Rudin has proved d(p,p') < epsilon not just for a particular epsilon, but for _every_ epsilon > 0. So d(p,p') is smaller than every positive number; so it can't be positive itself (otherwise it would be smaller than itself); so it must be 0. Another way he could have worded this proof is: "Assume p not-= p'. Then d(p,p') > 0 ..." and then used "d(p,p')" in place of the epsilon in that proof, and ended up with the conclusion "d(p,p') < d(p,p')", a contradiction, showing that the assumption p not-= p' could not be true. ---------------------------------------------------------------------- You ask why in the proof of Theorem 3.2(d) (p.49, middle of page), Rudin makes d(p_n,p) < 1/n rather than an arbitrary epsilon. If we were to choose a point of E corresponding to every epsilon > 0, this would require uncountably many choices, and we wouldn't be able to put the resulting chosen points in a sequence. But by the archimedean property of the real numbers, for every positive real number epsilon, there is a positive integer n > 1/epsilon, which makes 1/n < epsilon; so that by making each d(p_n,p) < 1/n, we guarantee that for every epsilon there will be an n such that for every m >_ n, d(p_m,p) < epsilon, which is what is needed to make p_n --> p. ---------------------------------------------------------------------- You ask how, in part (d) of the proof of Theorem 3.3, on p.50, we can be sure that the N which makes the inequality hold is > m. The point is that if the inequality holds for all n from some point N on, then it will necessarily hold for all n from any later point on. (E.g., if it holds for all n >_ 100, then it also holds for all n >_ 1000, since {n | n >_ 100} contains {n | n >_ 1000}.) So if we can find some N that makes this work, so will any larger value we choose for N; and we can choose that value to be > m. This is a form of argument that Rudin will use throughout the rest of the book, so you should be sure you understand it, and can see how it works even when it is not explained. ---------------------------------------------------------------------- You ask about the proof of the converse direction of Theorem 3.4(a) (second paragraph of proof, p.51). We need to show that if in the sequence (x_n), the jth coordinate alpha_j,n approaches the jth coordinate alpha_j of x for each j, then the vectors x_n approach the vector x. So given any epsilon, we need to show that for large enough n, |x_n - x| < epsilon.
For each j, we know that for large enough n, |alpha_j,n - alpha_j| becomes as small as we want; and since there are only a finite number of indices j, we can make these differences as small as we want simultaneously. So how small do we need to make them? If we made each of them < epsilon, then by the formula for distance in k-dimensional Euclidean space, (last display on p.16), this would imply that |x_n - x| was < (epsilon^2 + epsilon^2 + ... + epsilon^2)^1/2 = (k epsilon^2)^1/2 = (sqrt k) epsilon. This is bigger than we want by a factor of sqrt k. So, instead, we make each of the |alpha_j,n - alpha_j| less than epsilon/sqrt k; this gives the inequality |x_n - x| < epsilon, as required. In the next-to-last homework sheet ("Homework due September 24"), under the heading "A last (?) bit of advice on studying", I emphasized the importance of reading the text actively. I think that this is a good instance of how that is applicable. A student reading passively and coming on Rudin's statement that for n large enough, |alpha_j,n - alpha_j| < epsilon/sqrt k, will not have any idea why that formula is given. But if one stops and thinks "What do we have to do to get |x_n - x| < epsilon?", and checks, if one is unsure, how the norm of a vector in R^k is defined, one can come up with the formula oneself, as sketched above. ---------------------------------------------------------------------- You ask whether a subsequence, as defined in Definition 3.5, p.51, has to involve infinitely many distinct indices n_k. That definition refers to "a sequence {n_k} of positive integers", and by definition, a "sequence" in Rudin is a function from J, and the conditions n_1 < n_2 < ... prevent any two of the indices from being equal; so indeed, the subsequence has to use infinitely many indices. ---------------------------------------------------------------------- You ask why Rudin vacillates between using k and i as a subscript in Definition 3.5, p.51. Mathematicians most often begin with "i" as an integer subscript (except where they use "n"), and then go from "i" to "j" and then to "k". But in works dealing with the complex numbers, using "i" for an integer is problematic, since it also denotes the imaginary number i. I think that Rudin used "k" when he remembered that "i" could cause problems, and "i" when habit overcame him. ---------------------------------------------------------------------- You ask about the second sentence on p.52. Note that this begins "If E is finite". In this case, since the sequence doesn't have infinitely many different terms, at least one value must be _repeated_ infinitely many times. The meaning of what follows is "call such a value p, and call an infinite list of terms that have that value p_n_1, p_n_2, ... ." ---------------------------------------------------------------------- You ask why on p.52 at the end of the proof of Theorem 3.6(a), Rudin didn't simply cite Theorem 3.2(d). A nice idea! The difficulty is that if one chooses a sequence of elements in E that converges to a point p, those elements might occur in that sequence in a different order from their occurrence in the sequence (p_n), hence the sequence that they formed would not be a subsequence of (p_n). So one would have to prove a result saying that if a sequence converges, then so does any sequence obtained by arranging its terms in a different order. 
That is actually not hard to deduce from Theorem 3.2(a); but after stopping to prove it and show how to use it here, one wouldn't have a shorter proof than the one Rudin gives. ---------------------------------------------------------------------- You ask how, in the proof of Theorem 3.6(b), p.52, Rudin gets from Theorem 2.41 the conclusion that any bounded subset of R^k is contained in a compact set. Any bounded subset of R^k is clearly contained in a k-cell, and Theorem 2.41 shows that any k-cell is compact, giving the asserted conclusion. ---------------------------------------------------------------------- You ask how Rudin gets d(x,q) < 2^{-i} delta in the proof of Theorem 3.7, p.52. He is at the i-th stage of the construction, so the positive integer "i" is given; and clearly 2^{-i} delta is a positive real number. Hence we can form the neighborhood of radius 2^{-i} delta about q, and use the fact that q is a limit point of E* to find an element x of E* in that neighborhood, i.e., with d(q,x) < 2^{-i} delta. Now the fact that x is in E* means that it is a subsequential limit point of E, so there is some subsequence that converges to it, and some term p_n_i of that subsequence (with n_i > n_{i-1}) will be within 2^{-i} delta of x. Using the triangle inequality, we have d(q,p_n_i) _< d(q,x) + d(x,p_n_i) < 2^{-i} delta + 2^{-i} delta = 2 . 2^{-i} delta = 2^{1-i} delta ---------------------------------------------------------------------- You ask how, in the proof of Theorem 3.7, p.52, Rudin comes up with "2^{-i} delta" and "2^{1-i} delta", and how what he proves shows that (p_n_i) converges to q. His goal in the proof is to get a sequence of points of {p_n} that approach q. His method is to take a sequence of points of E* that approach q, and then take a point of {p_n} that is sufficiently near to each of these, and verify that the latter also approach q. His starting with "delta = d(q,p_1)" is really pointless; he could just as well have dropped all the delta's from this proof, and it would have come out cleaner; but this didn't occur to him, and the proof with the delta's isn't wrong; so we can still follow it. He starts by getting a sequence of points of E* approaching q by taking the i-th point to be a point x of E* within distance 2^{-i} delta of q (which is possible for every i, because q was assumed a limit point of E*). To get the points of {p_n} close to these points, he takes p_n_i within that same small distance, 2^{-i} delta, of x. By the triangle inequality, this makes the distance between p_n_i and q less than or equal to 2^{-i} delta + 2^{-i} delta = 2 . 2^{-i} delta = 2^{1-i} delta Since the numbers 2^{-i} delta become less than any epsilon as i -> infinity, their doubles, 2^{1-i} delta, also become less than any epsilon, so d(q,p_n_i) becomes arbitrarily small, so p_n_i does indeed approach q as i approaches infinity. ---------------------------------------------------------------------- You ask whether the statement that (p_n) is a Cauchy sequence (Definition 3.8, p.52) means that the distances d(p_n, p_{n+1}) approach 0. Any Cauchy sequence will have that property, but not all sequences that have that property are Cauchy. 
For instance, if p_n is the square root of n, then d(p_n, p_{n+1}) approaches 0 (because the slope of the graph of the square root function approaches 0) but the sequence is not Cauchy, because, for instance, if we take n and m to be two successive squares (e.g., 5^2 = 25 and 6^2 = 36), then d(p_n, p_m) = 1; so arbitrarily far out in the sequence, there are terms with distance 1 from each other. ---------------------------------------------------------------------- Regarding Definition 3.9 p.52 where Rudin defines diam E as the supremum of the set S of distances d(p,q) (p,q\in E), you ask how we know that this set is bounded. We don't! If it isn't bounded, then it's supremum is +infinity, as discussed on p.12, line 4. ---------------------------------------------------------------------- You ask how Rudin deduces the final display on p.53 from the preceding display. The preceding display shows that for any p,q\in E-bar we have d(p,q) < 2 epsilon + diam E. This shows that 2 epsilon + diam E is an upper bound for the set of such distances d(p,q). Hence diam E-bar, which is defined as the _least_ upper bound of such distances, must be _< that value, which is the assertion of the final display. ---------------------------------------------------------------------- You ask what Rudin means in the proof of Theorem 3.10, p.53, end of proof of (a), where he says that epsilon is arbitrary. He means that he has proved the preceding inequality, not just for one particular positive real number epsilon, but for any positive real number one might choose. Hence it is true for _every_ positive real number epsilon. Do you see that this implies diam E-bar _< diam E ? If not, let me know and I'll supply a precise step; but better if you can do so. ---------------------------------------------------------------------- You write, regarding the proof of theorem 3.10(a), p.53, "How come diam \bar E = diam E if epsilon is arbitrary? It is just the initial expression plus a positive number." The preceding inequality says that diam \bar E is _< the initial expression plus 2 times an _arbitrary_ positive number! That means that _whatever_ positive number you take, diam \barE _< diam E plus twice that number. So for instance, taking epsilon = 1/200, we see that diam \bar E _< diam E + 1/100; taking epsilon = 1/2,000,000, we see that diam \bar E _< diam E + 1/1,000,000, etc.. From this, can you see that diam \bar E cannot be any greater than diam E ? I could state the argument that clinches this (and I think I did in class); but it would be better for you to think for yourself "Does the above argument -- including the final `etc.' -- prove that diam \bar E cannot be any greater than diam E ?" If you're not convinced, or if you find it intuitively convincing but don't see how to make it a precise argument, then ask again and I'll fill in the final step. ---------------------------------------------------------------------- Regarding the concept of complete metric space (Definition 3.12, p.54) you ask > do there exist examples of spaces, where the choice of the metric > determines whether the space is complete? Definitely. In one sense, this is very easy. Just take different spaces with the same cardinality, one of which is complete and the other is not. By transferring the different metrics to the same set, one can consider them the same set, but with different metrics. 
For instance, consider the three countable subsets of R, regarded as metric spaces using the distance-function of R: (1) the set of natural numbers N, (2) the set K = {1/n| n\in N} \cup {0}, (3) the set Q of rational numbers. Here (1) and (2) are closed sets of real numbers, hence complete (though (1) is non-compact while (2) is compact), but (3) is not complete. Now let f: N -> K and g: N --> Q be bijections. Then we can define three metrics on N: d_1(m,n) = |m - n|, the standard metric, d_2(m,n) = |f(m) - f(n)|, d_3(m,n) = |g(m) - g(n)|. Then under d_2, N "looks just like" K, and under d_3, it "looks just like" Q. Hence N is complete under d_1 and d_2, but not under d_3. In the above example, the three metric space structures on N were topologically different: Different sets were open. What is more surprising is that one can get spaces that are "topologically the same", but where one is complete and the other is not. For example suppose that on N we define d_4(m,n) = |1/m - 1/n|. We see that under this metric, every point still has a neighborhood consisting of that point alone; so every subset is open. But the space is no longer complete, since the sequence 1, 2, 3, ... is Cauchy under d_4, but doesn't converge. (Similarly, using the bijection between R and (-1, 1) given by f(x) = x / sqrt(x^2+1), we get a metric on R that determines the usual topology -- because the inverse of that function is also continuous -- but which is not complete, since (-1, 1) is not.) ---------------------------------------------------------------------- You ask whether, in Definition 3.15, p.55 (of a sequence approaching + infinity), the condition "s_n >_ M" shouldn't be "s_n > M". It doesn't make any difference! If for every M there is a point in the sequence beyond which all terms are >_ M, then for every M there is also a point in the sequence beyond which all terms are > M, and vice versa. (You should make sure you see why this is so! If you still don't, then ask again.) ---------------------------------------------------------------------- You write that in the proof of Theorem 3.17, p.56, Rudin seems to assume "that the set E is nonempty if it has an upper bound." Not quite. As you note, the empty set does have upper bounds. In fact, _every_ real number is an upper bound of the empty set; hence the empty set does not have a _least_ upper bound in R. If we go to the extended reals, then -infinity is also an upper bound on the empty set, and since it is the least extended real, it is the least upper bound. So this explains Rudin's observation that if the least upper bound is real (in particular, not -infinity), then E must be nonempty. ---------------------------------------------------------------------- I hope my argument in class showing that the set E in Rudin's Theorem 3.17 (p.56) is nonempty also cleared up your question about the third paragraph of the proof of that theorem. Briefly: If s_n does not have a subsequence approaching +infinity, then it is bounded above; if the whole sequence also does not approach -infinity, then for some M it is not true that it eventually stays below M, and this means that it has a subsequence bounded below by M; hence that subsequence is bounded both above and below, and has a subsequence approaching a real number. In contrapositive form, this means that if it has no subsequence approaching a real number or +infinity, the whole sequence approaches -infinity.
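Incidentally, if it helps to see the set E of subsequential limits in a concrete case, here is a small numerical sketch in Python; the sequence is my own example, not one of Rudin's:

  # s_n = (-1)^n (1 + 1/n) has exactly two subsequential limits, -1 and +1,
  # so here E = {-1, +1}, with lim sup s_n = 1 and lim inf s_n = -1.
  s = [(-1)**n * (1 + 1/n) for n in range(1, 100001)]
  tail = s[50000:]                # terms with large n
  print(max(tail), min(tail))     # approximately +1 and -1

(The sup and inf of a late "tail" of the sequence only approximate s* and s_* from outside, but for this sequence they are already very close.)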
---------------------------------------------------------------------- You ask about the proof of statement (b) of Theorem 3.17 in the third-from-last paragraph of p.56. Suppose infinitely many elements of s_n are >_ x. If the set of those s_n is not bounded above, then it has +infinity as a subsequential limit point, and that gives the desired element of E. Alternatively, suppose it has an upper bound B. Then the elements s_n which are >_ x form a sequence (a subsequence of (s_n)) in the interval [x,B]. Since [x,B] is compact, this sequence has a subsequential limit point in that set, which likewise is a member of E that is >_ x. ---------------------------------------------------------------------- You ask how the situation "x > s*" of Theorem 3.17(b), p.56 can occur, if s* is the supremum of the set E. In that statement, "x" is not a member of E. By "If x > s*" he means "for every real number x > s*". ---------------------------------------------------------------------- You ask, regarding the results on p.57, what "lim sup" and "lim inf" mean. Rudin defines these on p.56, on the line before Theorem 3.17, as the "s^*" and "s_*" of the preceding paragraph. So what this means is that if you are given any sequence of real numbers (s_n), and you want to find its lim sup, you get this by going through the construction of that paragraph -- i.e., taking the set of all extended real numbers which subsequences of (s_n) approach, and taking the sup of that set. This extended real number is what is called lim sup s_n. (lim inf s_n is defined similarly as the inf of that set.) Since your question referred to p.57, I wonder whether you missed entirely the fact that these symbols were defined on p.56. If so, I will repeat what I have said to the class before: When you hit a phrase or symbol you are not familiar with, use the index, respectively the "List of Special Symbols" on p.337. That list is arranged by page of occurrence, so finding this symbol on p.57, you should look for the first (main) page-number on that list that is >_ 57, and work backwards. If you do so, you will immediately come to "lim inf" and "lim sup", shown as defined on p.56, indicating that that is where to look. (If you had seen that it was defined on p.56, but you didn't understand the definition, I trust that you would have said this.) ---------------------------------------------------------------------- Regarding Theorem 3.19, p.57, you say that "If s_n is decreasing more quickly than t_n but s_1 > t_1, then the upper limit of s_n will be greater than the upper limit of t_n." This looks as though you are confusing "lim sup" with "sup". In the situation you refer to, where (s_n) and (t_n) are decreasing sequences, it is true that sup {s_n} = s_1 and sup {t_n} = t_1, so sup {s_n} > sup {t_n}. But in this situation, the set E in the definition of lim sup s_n is very different from the set {s_n}; so lim sup s_n will not be s_1. Examine Definition 3.16 again, and if you still have difficulties, ask again. ---------------------------------------------------------------------- You ask how to apply the proof of Theorem 3.20(e) on p.58 to the case where x is negative. Rudin's argument is certainly laconic. Anyway -- when we test for whether lim x^n = 0, we look at |x^n - 0| = |x^n| = |x|^n, so once we know that this eventually gets less than any epsilon>0 if x is positive, we know that the same will be true for -x, since x and -x have the same absolute value. Good point, though. 
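If you want a quick numerical confirmation of that last point (my own illustration), note that the same threshold works for x and for -x:

  # the test quantity |x^n - 0| = |x|^n is the same for x and for -x
  x, eps = 0.9, 1e-6
  n = 1
  while abs((-x)**n) >= eps:      # the loop would run the same number of times with x in place of -x
      n += 1
  print(n, abs(x**n), abs((-x)**n))   # same n, identical absolute values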
---------------------------------------------------------------------- You ask, in connection with Theorem 3.23 on p.60, about the possibility of regarding series such as sin(x) (I guess you mean \sum sin n ?) as approaching 0 even though the summands don't go to zero, because they have average value 0. Well, the simplest case of this sort to look at is (+1) + (-1) + (+1) + (-1) + (+1) + (-1) + ... The partial sums are +1, 0, +1, 0, +1, 0, ... . They don't approach a limit in the usual sense; but people have studied ways of associating a number to such series. The number they would associate to the above series is 1/2, the average values of the partial sums; see exercise 3R14, p.80. ---------------------------------------------------------------------- You note that no proof is shown for Theorem 3.24, p.60. The justification is implicit in the sentence before the theorem: The result follows from Theorem 3.14. This is fairly common in mathematical writing: One can state something formally without then saying "Proof ..." if comments one makes before it constitute a proof, or make clear how the proof goes. ---------------------------------------------------------------------- You also ask why Rudin uses "2" in Theorem 3.27, p.61, when the same argument would work for any m >_ 2. Basically, because just about any interesting consequence one can get out of the result for general m one can get out of the case m = 2. The result is introduced as a tool, and the tool with m = 2 works as well as the tool with general m. The proof with general m looks a tiny bit messier, because while the number of integers i with 2^k < i _< 2^{k+1} is 2^k, the number with m^k < i _< m^{k+1} is (m-1) m^k. Moreover, if one really looks for the most general result of this form, one shouldn't restrict attention to geometric series. Exercises 3.7:3 and 3.7:4 in the exercise packet look at the most general sequences with this property. ---------------------------------------------------------------------- You ask whether there is a result analogous to Theorem 3.27, p.61, for increasing series of negative terms. Certainly! If (a_n) is an increasing series of negative terms, then (-a_n) is a decreasing series of positive terms, and we can apply Theorem 3.27 to it, and get the corresponding statement for (a_n). ---------------------------------------------------------------------- You ask about the various uses of "log n" -- as meaning "log_e n" in Rudin, p.62 bottom, "log_10 n" in tables of logarithms, and "log_2 n" in computer science. Basically, "log n" is shorthand for "log_c n", where c is whatever constant comes up naturally in the area in question, and this constant is 10 in the traditional use of logarithms for computation, e in mathematics, and 2 in CS. It's true that "ln" was introduced as a special abbreviation for "log_e", and many mathematicians are happy to use it; on the other hand, "log" matches the word that one says, so it has its attraction as well. ---------------------------------------------------------------------- You ask about Rudin's deduction on the third line of p.63 that 1/n log n is decreasing. What he means is 1/(n log n). Actually, he should be stating this for 1/(n (log n)^p) for all positive real numbers p, since that is what he is studying here (see (10) on preceding page). But both cases are clear once one understands what is meant. 
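Since the series (10) keeps coming up, here is a small numerical illustration of its p = 1 and p = 2 cases, for whatever it is worth (my own sketch; remember that the divergence in the p = 1 case is only like log log N, far too slow to "see" directly -- the point is just that the two partial sums behave differently):

  import math
  s1 = s2 = 0.0
  for n in range(2, 10**6):
      s1 += 1/(n*math.log(n))       # p = 1: diverges, but only like log log N
      s2 += 1/(n*math.log(n)**2)    # p = 2: converges
  print(s1, s2)   # raising the cutoff keeps nudging s1 upward; s2 is already near its limit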
---------------------------------------------------------------------- You ask how one verifies Rudin's assertions on p.63 about expressions (12) and (13), which involve log log n in their denominators. In particular, you note that on applying Theorem 3.27 to (12), one gets (1/log 2) Sigma 1/(k log(k log 2)) and ask what one can do with that. Good question! The intuition is that log (k log 2) is very much like log k, and if we had log k in its place, we could apply Theorem 3.29 with p = 1. To get a precise argument, we can use a very crude estimate: k log 2 _< k^2 for k > 1. Hence, as smaller terms have larger multiplicative inverses, the above sum is >_ (1/log 2) Sigma 1/(k log(k^2)) = (1/log 2) Sigma 1/(2 k log k) = (1/(2 log 2)) Sigma 1/(k log k), which diverges by Theorem 3.29 (the p = 1 case of (10)). On the other hand, to show that (13) converges, we use a lower bound for k log 2, namely k log 2 >_ sqrt k (true for all k >_ 3, since then sqrt k >_ 1/log 2), so that log(k log 2) >_ (1/2) log k; from this we easily deduce that the series which Theorem 3.27 obtains from (13) is, term by term, _< a constant times the p = 2 case of (10), hence converges. ---------------------------------------------------------------------- You ask about Rudin's comment that "there is no boundary with all convergent series on one side and all divergent series on another", on p.63. It is indeed very vague, and you don't have to worry about it, but if you're curious, I explain what he means in Exercise 3.7:1 in the exercise packet, and show how to give a simpler proof than the one in the exercises Rudin gives for it. ---------------------------------------------------------------------- You ask about the relation between the definition of e that Rudin gives on p.63 and the definition that you previously saw: The real number such that the integral of 1/t from 1 to e is 1. I hope that in the same course where you saw that, you were told that the function f(x) = integral_1 ^x 1/t dt is written ln x, and were shown that it has an inverse function, called exp(x), given by the power series Sigma x^n/n!. Substituting x = 1 in that definition, we get exp(1) = Sigma 1/n!. Since exp is the inverse function to ln, this shows that Sigma 1/n! (Rudin's definition of e) is the number which when substituted into ln gives 1, i.e., the number e as you saw it defined. Rudin will also develop the relation between ln x and exp x, in the section of Chapter 8 beginning on p.178. Math 104 only goes through Chapter 7, but you could probably read the beginning of Chapter 8 without too much difficulty now, simply ignoring the details regarding "uniform convergence" and such concepts, which come from Chapter 7. ---------------------------------------------------------------------- You ask about Rudin's assertion "1 + 1 + 1/2 + ... + 1/2^{n-1} < 3" near the top of p.64. The expression shown is 1 + a partial sum of a geometric progression, and he showed how to sum a geometric progression on p.61. ---------------------------------------------------------------------- You ask how the inequality (15) on p.64 follows from the preceding inequality. In the preceding inequality, the left-hand side, s_m, depends on m, while the right-hand side does not; so let us denote it C. The principle being used is that if every term of a sequence is _< a constant C, and if the sequence approaches a limit, then that limit is also _< C.
Although Rudin hasn't stated this principle explicitly, it is easy to deduce from the definition of limit and the properties of an ordered field; or, if one wants a shortcut, one can get it from Theorem 3.19, taking for the t_n in that theorem the constant sequence with all terms C, and recalling that if a sequence of real numbers converges, its lim sup and lim inf are both equal to its limit. ---------------------------------------------------------------------- You ask about the end of the first (two-line) display on p.65, where Rudin goes from an expression with an infinite series to "= 1/(n!n"). The infinite series is a geometric progression, so he is applying Theorem 3.26 (p.61) with x = 1/(n+1). ---------------------------------------------------------------------- You ask why q!e is an integer in the proof that e is irrational on page 65. The author is proving this by contradiction, on the assumption that e is the rational number p/q, with p and q integers. Since q! is a multiple of q, the product q!(p/q) is an integer. ---------------------------------------------------------------------- You ask what Rudin means by the statement that e is not an algebraic number, on p.65. An algebraic number is one that is a zero of a polynomial with rational coefficients. (So the square root of 2 is an example of a number that is algebraic, though not rational.) You'll learn more about these if you take Math 113. ---------------------------------------------------------------------- You ask about the relation between p, n and N at the bottom of p.66 and the top of p.67. In the bank of formulas at the bottom of the page, Rudin begins by getting an inequality for |a_{N+1}|, from that he gets one for |a_{N+2}|, and then he puts "...". One should recognize "..." as the symbol for "continuing in this way", and so the line |a_{N+p}| represents what one gets at the p-th step, for any positive integer p. Since the next page begins "That is", the inequality shown there ought to be a translation of the preceding inequality; so we conclude that he is re-naming "N+p" as "n". This makes the resulting inequality hold for all n > N, which is what we need to apply the comparison test as he does. ---------------------------------------------------------------------- You ask what the rules are defining the two series in Example 3.35, p.67. In series (a), the terms are alternately the negative powers of 2 and the negative powers of 3; i.e., the n=2k-1 term is 1/2^k, and the n=2k term is 1/3^k. Series (b) is gotten by taking the series of negative powers of 2, grouping the terms in pairs (the n=1 and n=2 terms, then the n=3 and n=4 terms, and so on) and reversing the order in each pair. Thus, the n=2k-1 term is 1/2^{2k}, and the n=2k term is 1/2^{2k-1}. ---------------------------------------------------------------------- You ask why Rudin computes the lim inf in the second line of calculations under Example 3.35(a), p.67. It isn't strictly needed for the root test; he is helping the reader get an understanding of how the nth root of the nth term of this series behaves by computing both its lim inf and its lim sup. The lim sup, which is needed for the root test, is computed on the next line. ---------------------------------------------------------------------- Regarding the second "lim inf" in Example 3.35, p.67 you ask how we know that (1/3^n)^{1/2n} converges to 1/sqrt 3. Law of exponents! 
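In case that is too terse: for every n we have (1/3^n)^{1/(2n)} = 3^{-n/(2n)} = 3^{-1/2} = 1/sqrt 3, so the sequence whose limit is being computed is in fact constant, and its limit is 1/sqrt 3.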
---------------------------------------------------------------------- You note that Rudin says on p.68 that neither the root nor the ratio test is subtle with regard to divergence, and you ask whether there are more effective tests. Sure -- the comparison test, Theorem 3.25. All the later tests are based on it; they describe special cases of the comparison test which have hypotheses that can be put in elegant form. But when those special cases fail, one can always go back to the comparison test. Recall that in class, I also described the proof of Theorem 3.27 in terms of a comparison, even though Rudin doesn't express it that way. ---------------------------------------------------------------------- You ask how, on p.68, Rudin goes from the 4th-from last display, "c_{N+p} _< ..." to the 3rd-from-last display, "c_n _< ...". He is letting n = N+p. Of course, with N fixed, not every positive integer has the form N+p for a nonnegative integer p, only those that are >_ N do; but the latter inequality is marked "n >_ N", so that fits. Note also that where the indices in the former inequality are N and p, those in the latter are N and n; so he must have eliminated p. Solving "n = N+p" for p, we get p = n - N. Substituting that for p in the former inequality, and using the laws of exponents, one gets exactly the latter inequality. ---------------------------------------------------------------------- You ask about the part of the proof of Theorem 3.37, p.68, after the words "Multiplying these inequalities". Well, the inequalities are given for k = 0,...,p-1. If we multiply them together, we get on the left-hand side the product of the p terms c_{N+1} ... c_{N+p}, while on the right we have beta^p times the product of the p terms c_{N} ... c_{N+p-1}. Those two products have c_{N+1} ... c_{N+p-1} in common; if we cancel that then just c_{N+p} remains on the left, and beta^p c_N on the right, giving the next formula Rudin shows. The next line is obtained by re-naming N+p as n (so that p becomes n-N). The next line is gotten by taking the nth root, and the line after that by applying Theorem 3.20(b). ---------------------------------------------------------------------- You ask about the last step in the proof of Theorem 3.37, on p.69. The general fact being used is: If x and y are real numbers, such that for every real number r > y one has x _< r, then it follows that x _< y. To see that it indeed follows, note that if x _< y were not true, then we would have x>y. Now take r = (x+y)/2. We would have x > r > y, contradicting the assumption beginning "for every r > y ..." above. The fact is being used with x = the left-hand side of the bottom line of p.68, y = alpha, and the numbers beta of the proof in the role of "r". ---------------------------------------------------------------------- Regarding power series in Theorem 3.39, p.69 you ask, if alpha is very small, so that z is allowed to be large, can't z^n grow so fast that it keeps the series from converging. No -- if alpha is small, that means, roughly, that the coefficients c_n of the powers of z get small like a geometric progression with a small ratio. This balances the fact that the powers of z are getting large like a geometric progression with a large ratio. As long as |z| < 1/alpha, we have alpha |z| < 1, which means that the smallness of the coefficients outweighs the largeness of the powers of z, and the series as a whole behaves like (or more accurately, is bounded above by the terms of) a geometric series with ratio < 1. 
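If you want the comparison spelled out (this is just the root-test argument of Theorem 3.39 rewritten as a comparison, in my own words): suppose alpha = lim sup |c_n|^{1/n} and |z| < 1/alpha, and pick a number beta with alpha |z| < beta < 1. Then for all sufficiently large n we have |c_n|^{1/n} |z| < beta, i.e., |c_n z^n| < beta^n; so from some point on the terms of the power series are dominated in absolute value by the terms of the convergent geometric series Sigma beta^n, and Theorem 3.25 gives convergence.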
---------------------------------------------------------------------- You ask how one gets R = +infinity in the last 2 lines of p.69. As I said in class, the ratio test shows that the series shown converges for all complex numbers z; hence by Theorem 3.39, R must be +infinity. So this actually determines the value of the lim sup in that theorem. If you want a direct proof, note that n! >_ [n/2]^{[n/2]}, where [n/2] denotes the integer part of n/2, since when we expand the factorial, there are at least [n/2] terms >_ [n/2]. Now substitute this into the lim sup that we need to evaluate, and simplify using the laws of exponents. ---------------------------------------------------------------------- You ask about using the ratio test to determine the radius of convergence, referring to example 3.40(b) at the bottom of p.69. In general, this can't be done. Rudin proves in Theorem 3.39 that a formula based on the root test always gives the radius of convergence, but he doesn't prove such a result for the ratio test because no such result is true. Using Theorem 3.37, one can see that 1/lim sup |c_n+1 / c_n| will be _< the radius of convergence. Hence, when that value is infinite, as in the example at the bottom of p.69, the radius of convergence must be infinite. But when that value is finite, it doesn't determine the radius of convergence. ---------------------------------------------------------------------- Concerning the first "_<" in the display at the top of p.71, you ask how we know that A_q b_q - A_{p-1} b_{p-1} is >_ 0. We don't. In general, these terms will be complex numbers, for which ">_ 0" is meaningless. But we don't need to. Rudin is using the triangle inequality, p.14 Theorem 1.33(e), which is true for any complex numbers z and w. (And from which one gets the corresponding many-term inequality, 1R:12, p.23.) ---------------------------------------------------------------------- You ask about the inequality in the second line of the first display on p.71, and in particular, how b_n - b_{n+1} >_ 0 is being used. By the triangle inequality, we can bound the summation in the first line by the sum of the absolute values of the terms. The absolute values of the A_n are bounded by M. The relations "b_n - b_{n+1} >_ 0" show that the absolute values of the factors b_n - b_{n+1} are just those factors themselves. This gives the second line. The fact that the terms "b_n - b_{n+1}" can be written without absolute value signs means that we can add them up, cancelling most of the terms, so that the summation from p to q-1 simplifies to b_p - b_q. Adding the two additional terms b_p + b_q we get 2 b_p, which is what the whole expression after the "M" simplifies to on the second line. (The absolute value signs around that expression are not needed, by the way.) ---------------------------------------------------------------------- You ask about the =-sign near the end of the first display on p.71. When you add up the terms in the summation, the summand -b_{n+1} in the n-th term cancels the summand b_{n+1} in the n+1-st term, so the Sigma-expression comes to b_p - b_q. The two terms after that sum add b_q + b_p to this, and the result is 2 b_p. The M on the outside of the absolute value sign turns this to 2 M b_p. ---------------------------------------------------------------------- You ask about the meaning of "alternating series", p.71, sentence after Theorem 3.43. 
I said in class that the term described series satisfying (a)-(c) of that theorem, not just condition (b); but I see that Rudin says the term refers to condition (b) only. So strictly speaking, my statement was wrong; but nevertheless, a series is said to converge "by the alternating series test" if it satisfies (a)-(c) of that theorem. If it satisfies (b) alone, that doesn't guarantee that it converges. ---------------------------------------------------------------------- You ask how to tell whether a series as in Theorem 3.44, p.71, converges at z = 1. Well, we studied series of positive terms that decrease to 0, and found tests on pp.61-62 for whether these converged, and saw on pp.62-63 that more and more subtle distinctions can occur, which require repeated applications of Theorem 3.27 to detect. Since the z = 1 case of Theorem 3.44 concerns just such a series, Sigma c_n, the answer to "When does it converge?" is "See pp.61-63". What's amazing is that we can say that for any other point z on the circle of radius 1, these series do converge, without reference to such tests. ---------------------------------------------------------------------- You ask whether, in view of Rudin's comment on p.72 that the comparison test etc. are really tests for absolute convergence, one can add "absolutely" after "converges" in Theorems 3.25, 3.33 and 3.34. Right! ---------------------------------------------------------------------- You write that the proof of Theorem 3.47, p.72, "seems circular". What it actually does is prove one result from another result that is very similar to it; so if one doesn't notice the difference between the two results, it may well seem circular. The result it proves concerns convergence and the limit-value of a sum of two _series_. The result it uses is Theorem 3.3(a), about convergence and the limit-value of a sum of two _sequences_. One goes from one result to the other using the fact that the sum of a _series_ is defined to mean the limit of its _sequence_ of partial sums. ---------------------------------------------------------------------- You ask how, in the displayed calculation on p.75 beginning "|gamma_n| _<", we know that |beta_{N+1} a_{n-N-1} + ... + beta_n a_0| _< epsilon alpha. On the line before the calculation, Rudin chooses N so that all the beta's in this sum have absolute value _< epsilon; so each summand beta_{N+k} a_{n-N-k} has absolute value _< epsilon |a_{n-N-k}|; so by the triangle inequality, their sum has absolute value _< epsilon (|a_{n-N-1}| + ... + |a_0|). Also, by the definition of alpha (the preceding display in Rudin), the sum |a_{n-N-1}| + ... + |a_0| is _< alpha. So the above expression is _< epsilon alpha, as Rudin states. ---------------------------------------------------------------------- You ask about the two steps on p.75 beginning "Keeping N fixed". To see the first step, note that if we keep N fixed but increase n, then we can regard beta_0, ..., beta_N as constant factors which are being multiplied by a sequence of terms, a_n, ... , a_{n-N}, which each go to 0 as n --> infinity. Hence the whole expression |beta_0 a_n + ... + beta_N a_{n-N}| approaches 0. So for large enough n it is less than any positive real number epsilon', so the inequality shows that for any positive epsilon', if n is large enough, then |gamma_n| _< epsilon' + epsilon alpha. This easily implies that lim sup |gamma_n| _< epsilon alpha.
Finally, the "epsilon is arbitrary" step is the observation that what we have shown is that for _every_ epsilon > 0, lim sup |gamma_n| _< epsilon alpha. But by taking epsilon small enough, we can make epsilon alpha less than any positive real number, so lim sup |gamma_n| is less than any positive real number, so it is 0. ---------------------------------------------------------------------- You ask what the rearranged series (23) on p.76 converges to. Figuring that out is outside the scope of this course! Probably some expression involving logarithms and pi. Though there are some series that are easy to sum, others require advanced techniques. But proving that a series does converge, without determining what it converges to, is relatively easy in many cases; so that is what we are studying at this point. ---------------------------------------------------------------------- You ask how, on p.77, lines 3-5, both series being convergent would contradict the hypothesis. It would have helped if you had said whether you did or didn't see _what hypothesis_ Rudin is referring to. If you did see that, then what you must need are details about how the displayed equation would imply convergence; if you didn't, then what you need is to know what the hypothesis referred to is. Well, I'll answer the latter question: The hypothesis in question is that Sigma a_n does _not_ converge absolutely (first sentence of the statement of the theorem); in other words, that the right-hand side of the first displayed equation on p.77 does not converge. Does this solve your problem? ---------------------------------------------------------------------- You ask, regarding the last sentence of the proof of Theorem 4.2, p.84 "How does (delta_n) satisfy (6)? Why does (delta_n) not satisfy (5)?" He doesn't say that (delta_n) is the sequence with these properties; he says "taking delta_n = 1/n we thus find a sequence" with those properties. In the previous sentence he has shown that for each delta > 0 there is a point x with certain properties. So when he specifies "delta_n = 1/n", the idea is that for each such delta_n we should take an x_n as in that sentence, and that the sequence (x_n) will have the specified properties. Moral: If the author obtains a fact in the course of a proof, and it is not one of the statements in the theorem, you should note what it is, think about how it is related to what he is trying to prove, and expect him to use it at some later point of the proof. In this case, that applies to the fact noted in the sentence preceding the one you asked about. ---------------------------------------------------------------------- Regarding Definition 4.3, p.85, you ask whether there are other examples of functions to a field that is a metric space. Well, there are obvious examples, such as polynomial functions Q --> Q, where Q is both a metric space and a field. But there are also examples that look very different. In Exercise 2.2:14 you saw the n-adic metric on Z. When n is a prime p, the p-adic metric can be extended in a natural way to the field Q of rationals, and this can be "completed" with respect to that metric to give what is called "the field of p-adic rationals", denoted Q_p. (Cf. Exercises 3R24 and 3R25, not assigned. The answer to the latter is that the completion of Q with respect to its ordinary metric is R. So the construction of Q_p from Q under the p-adic metric is analogous to the construction of R from Q under the standard metric.)
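If it helps to make that metric concrete, here is a minimal Python sketch of one standard normalization of the p-adic distance on the integers, namely d_p(m,n) = p^{-v} where p^v is the largest power of p dividing m - n (the normalization used in the exercise may differ):

  def padic_dist(m, n, p):
      # 0 for equal integers; otherwise p^(-v), where p^v is the largest power
      # of p dividing m - n
      if m == n:
          return 0.0
      d, v = abs(m - n), 0
      while d % p == 0:
          d //= p
          v += 1
      return p ** (-v)

  print(padic_dist(1, 1 + 3**5, 3))   # 3^(-5): these two integers are 3-adically close
  print(padic_dist(7, 8, 3))          # 1: consecutive integers are 3-adically far apart

Two integers are close in this metric when their difference is divisible by a high power of p, which is what makes the completion Q_p so different from R.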
The study of this area is called "p-adic analysis", and is an important tool in modern number theory. But I have never studied it. ---------------------------------------------------------------------- Concerning Theorem 4.4(c) p.85, about the quotient f/g of two real-valued functions, you note that, as Rudin says, this is not defined at points where g = 0, and you ask "is f/g still defined as a function, even though it's not defined at those points?" If g = 0 at some points of E, then f/g is not a function on E. If we let E' = {x\in E : g(x) not-= 0}, then f/g is a function on E'. ---------------------------------------------------------------------- You ask why the remark following Theorem 4.4, p.85, doesn't mention quotients, and you suggest that they might be defined coordinate-wise on R^k. Well, notice first that products also aren't defined coordinatewise on R^k. Only the dot product, a scalar-valued function, is defined. The idea is that the operations we consider on R^k are those that have geometric meaning, independent of the coordinates we choose. It happens that coordinatewise addition, (x_1,...,x_k) + (y_1,...,y_k) = (x_1 + y_1, ..., x_k + y_k), has a geometric meaning, described by the parallelogram rule for adding vectors, and occurring in physics as the way forces add. But if we defined (x_1,...,x_k) (y_1,...,y_k) = (x_1 y_1, ..., x_k y_k), this would depend entirely on the choice of coordinates. E.g., the vectors (1,0) and (0,1) in R^2 would have product (0,0), yet if we used a coordinate system rotated 45 degrees from the standard one, they would take the form (1/sqrt 2, 1/sqrt 2) and (- 1/sqrt 2, 1/sqrt 2), which would have product (-1/2, 1/2), which is not the expression for (0,0) in those rotated coordinates. R^k is being used in this course as a geometric object, so componentwise multiplication and division are unnatural, and are not discussed. (Rudin also uses R^2 as the complex numbers, and for this, a different multiplication and division are used.) In some algebraic contexts, on the other hand, R^k indeed denotes the ring of k-tuples of elements of R under componentwise addition _and_ multiplication, and in such contexts, it does make sense to divide by elements of R^k all of whose coordinates are nonzero. ---------------------------------------------------------------------- Regarding Definition 4.5, p.85, which tells what it means for a function f defined on a subset E of a metric space X to be continuous at a point p of E, you ask whether we consider f to be discontinuous at all points not in E. No. The concepts of "continuous/discontinuous" at p are only defined for points p of the domain E. Intuitively, "f is continuous at p" means that the limit as x --> p of f(x) "hits the target" f(p), while "f is discontinuous at p" means it "misses the target". But for points not in E, there is no target. In some calculus texts, e.g., the one used here in Math 1A/1B/53, x = 0 is called a point of discontinuity of the function 1/x. But this is poor usage, which Rudin fortunately does not follow. ---------------------------------------------------------------------- You ask about the meaning of "p" at the bottom of p.85 and the top of p.86. On the first line of that definition, beginning "Suppose ...", Rudin specifies "p \in E"; so it is a point of E. A point of E may or may not be a limit point of E. Theorem 4.6 relates the ideas of Def.4.5, where p must be a point of E but need not be a limit point, and Def. 4.1, where p must be a limit point, but need not be in E.
So for that theorem, p must be both in E and a limit point. But in Def.4.5, it need not be a limit point, only a point of E. ---------------------------------------------------------------------- You ask why Rudin only considers neighborhoods V in the second paragraph of his proof on p.87 of Theorem 4.8, when the statement of the theorem refers to "every open set V". First, there's nothing logically wrong with this. The statement to be proved in that paragraph is that the condition on open sets V implies continuity; and that condition includes in particular the case where the open set V is a neighborhood; so if we can prove from this latter case that the function is continuous, we will have proved that the condition on all open sets V implies continuity. So the only objection one could have is that this argument proves continuity from an assumption that is apparently weaker than the open set condition, and that this seems to contradict the other half of the proof, which shows that continuity implies the condition for all open set V. The resolution of this paradox is that the condition that the inverse image of every neighborhood V is open is not really weaker than the condition that the inverse image of every open set V is open. If you go back to the definition of open set, you will find that a set is open if and only if it is a union of neighborhoods; and from this it is easy to see that if the inverse image under some function of every neighborhood is open, the same must be true of the inverse image of every open set. (Note that all this isn't needed to justify Rudin's argument. As explained above, this is indeed valid as it stands.) ---------------------------------------------------------------------- With respect to Theorem 4.10 on p.87, you write > ... I don't think Rudin has ever mentioned it before, but is he > assuming familiarity with the definition of a dot product in > R^K or C^K? When you're not sure about such things, you should look in the index and/or the list of symbols. The index has nothing under "dot". It does have "inner product" and "scalar product", and under "product" it has subheadings "inner" and "scalar", but perhaps you were not familiar with these as terms for the dot product. However, if you go to the list of symbols on p.337 and run down them from the beginning, you quickly come to "x . y inner product". This, like the index-references mentioned above, refers you to p.16. The definition there is the sentence containing the next-to-last display on that page. ---------------------------------------------------------------------- You ask what Rudin means on the last line of p.87 when he says "f + g is a mapping" and "f . g is a real function". "Mapping" and "function" are synonyms; there's no difference in what those mean. The difference he is emphasizing is in their codomains. After "is a mapping" you left out the words "into R^k", while he modifies "function" by "real", which means "into R". So the point he is making is that while in both cases one _starts_ with functions f and g into R^k, addition gives another function into R^k, while the dot product gives a function to R. ---------------------------------------------------------------------- You ask whether the subscript "n_1...n_k" on the "c" in display (10), p.88 denotes the product of n_1 ... n_k, or is simply a label for the coefficient of the monomial x_1 ^n_1 ... x_k ^n_k. A good question. It is the latter. 
An unfortunate feature of common mathematical notation is to string subscripts together without punctuation. I prefer to write such a subscript as "n_1, ..., n_k"; but general usage is to leave out such commas. ---------------------------------------------------------------------- You ask how Rudin gets from (12) to (13) on p.89. He applies f to both sides of the inclusion. Now clearly when one has an inclusion A \subset B, then applying any function f one also gets f(A) \subset f(B), so applying f to (12) one first gets f(X) \subset f(f^-1(V_alpha_1) \union ... \union f^-1(V_alpha_k)). Also, applying f to sets respects unions of sets, so the above becomes f(X) \subset f(f^-1(V_alpha_1)) \union ... \union f(f^-1(V_alpha_k)). Finally, apply to each of the above "f(f^-1 ... )" expressions the statement he mentions, that f(f^-1(E)) is contained in E. We thus see that the right-hand side above is contained in V_alpha_1 \union ... \union V_alpha_k, so that f(X) \subset V_alpha_1 \union ... \union V_alpha_k, giving (13). ---------------------------------------------------------------------- Regarding the Note after the proof of Theorem 4.14, p.89, you ask "why is not f(f^-1(E)) = E?" and "What did the inverse image f^-1(E) mean again?" For the second question, use the index of the book! (This is also discussed, at greater length, in the handout on Sets and Logic, p.3.) As to why f(f^-1(E)) is not just E, this happens when the range of f doesn't contain all of E. ---------------------------------------------------------------------- You ask about Rudin's use of boldface and italic symbols (e.g., Theorem 4.15 vs. Theorem 4.14 on p.89). He uses boldface x for a point (x_1,...,x_k) of R^k, where x_1,...,x_k, written with italic x, are its components, which are real numbers. He likewise uses boldface f for a function with values in R^k. (As described in Theorem 4.10, p.87, this also has "component" functions f_1,...,f_k.) Whenever a metric space is unspecified, he uses italic symbols for its elements and for functions into it. So his use of boldface is a special usage restricted to R^k, which gives him the convenience of having ready-made symbols, italic x_1,...,x_k, for the components of each k-tuple x in that space. ---------------------------------------------------------------------- Regarding Theorem 4.15, p.89, you're right that the same conclusion would be true with R^k replaced by any metric space. ---------------------------------------------------------------------- You ask, regarding Definition 4.18, p.90, > Can you give us an example of a function that is continuous > everywhere but not uniformly continuous? Rudin gives such examples in the proof of Theorem 4.20, pp.91-92. ---------------------------------------------------------------------- You ask, regarding (16) on p.91, > I don't understand why d_x(p.q) < phi(p) > implies d_y(f(p),f(q)) When we talk about right side and left side limits are we still > allowing positive and negative infinity to be limits ... No, Rudin never allows +-infinity as limits! He allows the symbols --> +infinity and --> -infinity, but never calls those values limits (except in the section-title on p.97 in the next reading. But even there, in the section itself he never writes "lim".) That term and symbol are reserved for points in the metric space itself. ---------------------------------------------------------------------- You ask, regarding Example 4.27, p.94, whether the fact that "there are irrational numbers next to every rational number" doesn't guarantee that f(x+) and f(x-) exist and are equal to each other. No.
I'm not sure why you think it does. I suspect that you are misinterpreting the symbols f(x+) and f(x-). If you are thinking of them as meaning "the result of applying f to the numbers just before and after x", that is wrong -- there are no numbers just before and after x. For if y were some number less than x that we thought might be "the last number before x", then by taking z = (x+y)/2, we would get a number between x and y, showing that y wasn't the last one after all. Rather, f(x+) and f(x-) mean what Rudin defined them to mean, and if you review that definition, you will see that for the function of Example 4.27 there are no numbers which have the properties asked for. If you have trouble with this point, ask again. ---------------------------------------------------------------------- You ask why we want display (27), "A-epsilon < f(x-delta)_< A", on p.96, and how we get it. Well, the purpose would have been clearer if Rudin had given an epsilon-delta definition of f(x-) instead of the one that he did. But it is not hard to show the condition defining f(x-) in Definition 4.25 equivalent to the statement that for every epsilon > 0 there exists a delta > 0 such that for all t\in (x-delta, x) one has d(f(t),q) < epsilon. In the context of (27), the inequality we want is thus d(f(t),A) < epsilon; so it will suffice to show, for t in an appropriate segment, that f(t)\in (A-epsilon,A]. Since f is increasing, that will follow if we can get f(x-delta) to lie in that range. We can do this by Definition 1.8: A is the least upper bound for f on (a,x), and A-epsilon is less than A, so A-epsilon is not an upper bound for f on that range, so we can find a value of t in that range at which f is > A-epsilon. We call that value "x-delta"; and we have (27). ---------------------------------------------------------------------- You ask about Rudin's remark following the Corollary on p.96. When he says that the corollary implies the statement about countability, he does not mean that it implies it in an obvious way -- he means that with the help of another result, Exercise 17 (which he points us to), one can deduce the fact from the Corollary. That Exercise says that every function has at most countably many discontinuities of the first kind. Since every discontinuity is of either the first or second kind, and this Corollary says monotone functions have none of the second kind, the statement that there are at most countably many of the first kind means that there are at most countably many altogether. ---------------------------------------------------------------------- You ask how the definitions Rudin gives of "neighborhoods of +infinity and -infinity" on p.98 are related to the concept of neighborhood we had in Chapter 2. They complement those definitions! Taking the definitions in Chapter 2, which, when the metric space X is the real line, give us the concept of a neighborhood of a real number, together with these new definitions, we have the concept of a neighborhood of any point of the extended real line; and this allows us to extend the concept of limit, which we had defined only for metric spaces, to also apply to the extended real line (which is not a metric space, because distances from either of the infinite points to other points are not defined). Even though this concept of neighborhood does not fit the definition of that concept for metric spaces, it does fall under the concept of topological space, sketched in the addenda/errata.
By defining the concept of neighborhood for every point of the extended real line, it makes the extended real line a topological space (in fact, a compact topological space). ---------------------------------------------------------------------- You ask why, for a real-valued function f on a subset of R, Rudin allows himself to use "f(x)-->L as x--> a" when L or a is an extended real number (Definition 4.33, p.98), but restricts "lim_{x->a} f(x) = L" to the case where they are a genuine real numbers. Well, in such situations, there are three cases: (i) What is being approached by x or f(x) is a real number. (ii) What is being approached is +- infinity. (iii) They aren't approaching anything (e.g., sin 1/x doesn't approach anything as x -> 0). Case (i) is "better" in many ways than case (ii): It fits into the scheme of limits in metric spaces (the extended real line is not a metric space); we know that when one number "approaches" another in that sense, it eventually gets within every epsilon of it, etc.. On the other hand, the things that (i) and (ii) have in common, and in contrast to (iii), are sufficiently important that one wants to be able to talk about them together when one can, without separating into cases. In order to "have it both ways", Rudin chooses to let some of his notation (namely "f(x)-->L as x--> a") apply to both cases, while other notation ("lim_{x->a} f(x) = L") is restricted to the "good" case, (i), so that if he uses the latter notation, one knows that one is in that case. Whether it is a good idea to let notation take the place of always saying explicitly what case one is referring to, I don't know. I think that if one likes case (ii) as much as case (i), one tends to feel "Why not allow the same notation in both?", whereas if one feels that case (i) is the case of main interest, and case (ii) is something that one only brings in sometimes because one has to, then restricting part of one's notation to case (i) seems more natural. ---------------------------------------------------------------------- You ask about Rudin's restriction of statements (b), (c), (d) of Theorem 4.34, p.98 to the case where the right members are defined. In cases where they are undefined, such as "infinity - infinity" the behavior of the left-hand side can't be determined by knowing the right-hand side. Often it is possible to determine it from other information, as in the situation you suggest, where one considers (f+g)(t) with g = -f, and f --> +infinity. In that case, of course, the limit is 0; but one doesn't know this from the fact that the right-hand side is "infinity - infinity", but from looking at the functions and seeing that f + g = 0. If instead of g = -f, one chose g = -2f, then we would again have "infinity - infinity", but this time f + g --> -infinity. This theorem only concerns the possibility of determining the limits of f+g, fg, etc. from the limits of f and g as members of the set of extended real numbers, and that can't be done in cases such as infinity - infinity. It doesn't go into other methods of determining those limits. ---------------------------------------------------------------------- You ask what sort of use one-sided derivatives, which Rudin alludes to in the 4th paragraph of p.104, can have. The starting point is simply to note that they give us a way of expressing in precise form properties functions may have even if they don't have full derivatives. 
For instance, we see that the function f(x) = |x| rises to the right from x=0 with slope 1, while on the left it has slope -1. This can be expressed by saying that at x=0, though it has no derivative, it has right-hand derivative 1 and left-hand derivative -1. And one can prove rules for one-sided derivatives of sums, products, etc.. Then one can look at, say, real-world applications, such as the pressure as a function of the vertical dimension in the neighborhood of an air-water interface like the top of a lake, and use the theory of 1-sided derivatives to study them; or one can consider mathematical applications, such as the study of the properties of the Fourier coefficients of a function which one does not assume to be differentiable, but does assume to have 1-sided derivatives at appropriate points. ---------------------------------------------------------------------- You ask how the proof of Theorem 5.2 (p.104) establishes the theorem. Cutting out the steps that show _why_ the result is true, the statement obtained in the proof is "As t --> x, f(t) - f(x) --> 0". This is equivalent to "lim_{t-->x} f(t) = f(x)", i.e., the statement that f is continuous at x, as stated in the theorem. (In reading a display like the one in this proof, a key question to ask yourself is "Are all the connecting symbols =-signs?" If they are, the line proves two things equal; if there were one or more "_<" signs mixed in with the =-signs, it would be proving that the left-hand side was _< the right-hand side; if, as in this case, there is a "-->" amid the =-signs, then it is showing that the left-hand side approaches the right-hand side. When one sees in this way what a display is establishing, this can help you understand its function in proving the result in question.) ---------------------------------------------------------------------- In connection with Rudin's statement in the paragraph on p.104 just before Theorem 5.3, that it is easy to construct continuous functions that fail to be differentiable at isolated points, you point out that an interval doesn't have isolated points. One can either look at Rudin's statement as a colloquial rather than a technical use of "isolated", or one can take the word in the technical sense, but understand this to mean that if E is the set of points where the easily constructed functions f he is referring to are non-differentiable, all points of E are isolated points of E (though they are not isolated points of the whole interval on which f is defined). ---------------------------------------------------------------------- You ask how one gets Theorem 5.3(c) (p.104) from the display at the top of p.105 using, as Rudin says, Theorems 4.4 and 5.2. To answer that, you have to ask yourself what the relation between that display and the formula in (c) is. The formula in (c) is for (f/g)'. By the definition of derivative, this is the limit as t --> x of ((f/g)(t) - (f/g)(x)) / (t-x). Now at the top of p.105, Rudin has just defined h = f/g, so the left-hand side of the big display is exactly what we want to take the limit of as t --> x. So you should look at the right hand side, and ask "What does it do as t-->x?" You should be able to see what the various parts of it approach, and get a formula for the limit. But how to justify the calculations involved? That is where you use Theorems 4.4 and 5.2. 
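If it helps to see the formula "in action" numerically, here is a small illustration of my own (in Python; the particular functions are ones I chose, not anything in Rudin's text): for f(x) = x^3 and g(x) = 1 + x^2, the formula in (c) predicts (f/g)'(1) = (f'(1)g(1) - f(1)g'(1))/g(1)^2 = (3*2 - 1*2)/4 = 1, and the difference quotients of h = f/g at x = 1 do approach that value.

  # f(x) = x^3, g(x) = 1 + x^2; Theorem 5.3(c) predicts (f/g)'(1) = 1.
  f = lambda x: x**3
  g = lambda x: 1 + x**2
  h = lambda x: f(x) / g(x)            # h = f/g, as at the top of p.105

  x = 1.0
  predicted = (3*x**2 * g(x) - f(x) * 2*x) / g(x)**2   # (f'g - fg')/g^2 = 1.0
  for t in (1.1, 1.01, 1.001, 1.0001):
      print(t, (h(t) - h(x)) / (t - x), predicted)     # quotient --> predicted

Of course, a computation like this only checks one example; the point of the theorem is that Theorems 4.4 and 5.2 justify the passage to the limit in general.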
---------------------------------------------------------------------- You ask about the first displayed formula on p.105, used in the computation of the derivative of f(x)/g(x). To get it, start off by writing out (f(t)/g(t) - f(x)/g(x))/(t-x) as 1/(g(t)g(x)) times an expression with denominator (t-x). The problem is that in the numerator of that expression, as in the expression I put on the board yesterday for the calculation of the derivative of f(x)g(x), the change between "t" and "x" happens simultaneously in the "f" and "g" terms, so that you do not have a convenient factor "f(t)-f(x)" or "g(t)-g(x)" to which to apply the definitions of f' or g'. In that case, we had f(t)g(t) - f(x)g(x); in this case we have f(t)g(x) - f(x)g(t). The idea of the cure is the same in both cases: subtract and add a term that makes just one of the two changes: in that case this was f(x)g(t); in this case it is f(x)g(x). Then we get f(t)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(t); and factoring g(x) out of the first two terms and f(x) out of the last two, we get the expression in Rudin. Do the calculations and check that this works. ---------------------------------------------------------------------- You ask how Rudin gets displays (4) and (5) on p.105. I hope the discussion I gave in class helped; but I followed a somewhat different course there, so I will summarize what Rudin does. Remember that (f(t)-f(x))/(t-x) is the expression whose limit, as t --> x, defines f'(x). Now (f(t)-f(x))/(t-x) is a function defined for all t in the domain of f, other than x itself, and by what I just said, this function approaches f'(x) as t-->x, so it is natural to write it as f'(x) plus a remainder function (expressing the degree to which it fails to actually equal f'(x)); so calling this remainder term u(t), we get (f(t)-f(x))/(t-x) = f'(x) + u(t); and the fact that (f(t)-f(x))/(t-x) approaches f'(x) as t --> x means that u(t) approaches 0 as t--> x. Now if we clear denominators in the above display, we get exactly Rudin's display (4). And the important thing is not that we can write that equation for some function u(t) -- that is trivial -- but that the u(t) that makes this equation hold approaches 0 as t --> x. The same applies to the derivation of (5). But as I say in the errata, to complete the proof, we need a little more than the statements Rudin makes; we need to say that these relations also hold when the factors t-x and s-y are 0, with u(t) and v(s) continuous at the values in question. Moreover, as I showed in class, we can do things more simply than Rudin does. ---------------------------------------------------------------------- You ask, regarding display (7) on p.106, "Is f(x) = x sin(1/x) equal to 0 at x = 0 ?" No. x sin(1/x) is only defined for the values of x such that one can substitute those values into the first factor and the second factor, get numbers, and multiply them together to get the product. Since one can't substitute x = 0 into sin(1/x), the function x sin(1/x) is not defined at x = 0. However, because x sin(1/x) approaches the limit 0 as x --> 0, it is natural to look at the function which agrees with x sin(1/x) at nonzero values, and has the value 0 at 0; this is the unique continuous extension of x sin(1/x) to the domain R. And when mathematicians are talking or writing informally, they may sometimes refer to that function as "x sin(1/x)" if they are confident that the person they are talking to understands that this is shorthand for "the unique continuous extension of x sin(1/x) ...".
Such usage is referred to by mathematicians as "abuse of language" or "abuse of notation". It is not considered bad; but it should only be used when the speaker and reader both understand that what is meant is not exactly what is being said. For this reason, a good author tries to avoid it in undergraduate texts. But in advanced work, one often sees statements beginning "By abuse of language, we shall say ..." or "By abuse of notation we shall write ...", where an author sets down conventions he or she will follow that don't follow the precise meanings of some words or symbols. ---------------------------------------------------------------------- You ask how one gets f'(x) >_ 0 from f(t)-f(x) / t-x >_ 0 in the proof of Theorem 5.8, p.107. Well, f'(x) is defined as the limit as t-->x of f(t)-f(x) / t-x . Do you see how to prove that if a function is nonnegative on a set, and has a limit at a point in the closure of the set, then the value of that limit is also nonnegative? You should be able to deduce this easily from the definition of limit. ---------------------------------------------------------------------- You ask about Rudin's version of the Mean Value Theorem, namely 5.9 on p.107, and in particular, the step "If h(t) > h(a)" at the bottom of the page. I hope my discussion of the theorem helped you with the picture of what is going on. As for that step, recall that we had reduced the situation to showing that a certain function h had derivative 0 somewhere in (a,b). Now we can get this from Theorem 5.8 if we know h has a local maximum or minimum somewhere in (a,b). Since h is continuous on [a,b], which is compact, it has a local maximum and a local minimum on that interval -- but how do we know this doesn't happen only at an endpoint a and/or b, where a maximum or minimum doesn't imply the derivative is 0? For this, we use the fact that h(a) = h(b) (display (12) a few lines earlier). If h is constant, we are done, while if it is not constant, it must somewhere assume a value greater than or less than the common value of h(a) and h(b). In the former case, its maximum must be at a point other than a or b, and in the latter case, its minimum must be at a point other than a or b. That is what Rudin is saying at the place you ask about. ---------------------------------------------------------------------- You ask what Rudin means by saying just before Theorem 5.10 (p.108) that it is a "special case" of Theorem 5.9 (p.107). A "special case", in mathematical language, just means a case subject to some restrictions that are not part of the general statement. Rudin considers Theorem 5.10 as the case of Theorem 5.9 where g = x. ---------------------------------------------------------------------- You ask why Theorem 5.11, p.108, isn't stated in "if and only if" form. It should be! ---------------------------------------------------------------------- You ask whether there is something wrong with the relation between the inequalities g(t_1) < g(a) and g(t_2) < g(b) in the proof of Theorem 5.12, p.108. No. The assumptions on f' make g(t) have negative slope at the left-hand end of its domain [a,b], and positive slope at the right-hand end. If you draw a picture, you will see that this causes values of g(t) _near_ both ends of that interval to be less than the value _at_ those ends. (Of course, after drawing a picture to convince yourself, you should be able to write down the computations using the definition of the derivative to establish that result.) 
This is important to the proof: it allows us to conclude that the minimum value of g does not occur at an endpoint, so that we can apply Theorem 5.8. ---------------------------------------------------------------------- You ask about the details of the proof of the Corollary of Theorem 5.12, p.109. What one can in fact show is that if a function g has the property that Rudin proves for f' in the theorem, and that if g(x-) exists, then g(x) = g(x-); and similarly if g(x+) exists, then g(x) = g(x+). It is easy to see that these facts imply the nonexistence of simple discontinuities; and once they are stated in this form, I hope you can see that they are easy to prove. ---------------------------------------------------------------------- The five of you all ask how the inequality "< r" in display (18) on p.109 leads to the inequality "_< r" in display (19). The inequality in (19) that you ask about is gotten from (18), as the line between them says, by letting x --> a. Thus, f(y)/g(y) is the limit as x --> a of the left-hand side of (18). If E is a subset of a metric space X, the limit of a sequence in E or a function taking values in E (if this limit exists) does not have to lie in E; it will lie in the closure of E. In this case, E = {s | s < r}, and its closure is {s | s _< r}. The result that the limit of a sequence in E lies in the closure of E is Theorem 3.2(e) (the part of the theorem added in the addenda sheets). For an example where a function which goes through values < r takes on a limit which is not < r, let E = [0,1), let f(x) = x, and consider lim_{x->1} f(x). All the values of f(x) for x\in E are < 1, but the limit is 1, which is _< 1 but not < 1. ---------------------------------------------------------------------- Regarding L'Hopital's Rule (p.109), you ask whether there is any generalization to functions of several variables, using partial derivatives in place of f'(x) and g'(x). I don't see any natural result of that sort. Consider, for instance, the functions y and y-x^2 on the plane. They are both nonzero in the region y > x^2, and their first partial derivatives approach the same values as (x,y) --> (0,0) (namely, 1 in the y-direction and 0 in the x-direction). But their ratio y/(y-x^2) behaves very messily as (x,y) -> (0,0). E.g., along the curve y = x^2 - x^3 it has the value (x^2 - x^3)/(-x^3) = 1 - 1/x, which is unbounded. ---------------------------------------------------------------------- You ask about the second paragraph of the section on Derivatives of Higher order, p.110, and in particular, the use of "x" and "t" on the first line. He is talking about f^n(x) existing. Since f^n is defined as the derivative of f^(n-1), for that derivative to be defined at x, one must be able to differentiate f^(n-1) at x. But to differentiate a function at a point x, one must look at values of the variable near x, and form the difference quotient (1) (p.103). So Rudin must assume f^(n-1) defined in an interval around x. He uses t to denote a variable ranging over that interval. (We can't use the same letter x to denote the point at which we want to evaluate f^n, and the points around it which we use in that evaluation.) ---------------------------------------------------------------------- You ask about the phrase "one-sided neighborhood" in the last paragraph of the section "Derivatives of higher order" on p.110. 
Well, if f is defined on an interval [a,b], then a neighborhood of a in that interval means the set of points of that interval of distance < epsilon from a i.e., (assuming epsilon < b-a), the set [a,a+epsilon). Likewise, a neighborhood of b in this interval has the form (b-epsilon, b]. Since these are not neighborhoods of those points within R, because they only extend to one side of the point, Rudin calls them "one-sided neighborhoods" here. ---------------------------------------------------------------------- You ask whether in the conclusion (24) (p.111) of Taylor's Theorem, one can bring the right-hand term into the summation P(beta) as the "k=n" summand. No. The kth term in the summation defining P(beta) involves the kth derivative of f at the point alpha, while the final term in (24) involves the nth derivative of f at some point x\in (alpha,beta) -- not at alpha ! (If that term were of the same form as the others, this would say that f was a polynomial function; but there are many nonpolynomial functions with higher derivatives, e.g., sin x.) ---------------------------------------------------------------------- Regarding the sentence before (28) on p.111, you suggest that P(\alpha) is undefined, since the k=0 term contains the expression 0^0. But 0^0 _is_ defined. It has the value 1. Indeed, x^0 has the value 1 for every x ! You probably saw the statement "0^0 is undefined" in pre-calculus or calculus, but not as the definition of what it means to raise a certain number to a certain power. Rather, it is shorthand for "0^0 is an indeterminate form"; meaning that if two quantities, f(t) and g(t) both approach 0, then one can't tell from this information what f(t)^g(t) approaches; equivalently, that the function x^y on R^2 does not approach any limit as (x,y) --> (0,0). This means that there is not an "obvious" choice based on limit-behavior for how 0^0 should be defined. However, various other considerations lead to the definition "0^0 = 1". One of these considerations is that one wants to be able to write a polynomial a_n x^n + ... + a_1 x + a_0 as Sigma a_k x^k, with the k = 0 term representing a_0, as Rudin does. ---------------------------------------------------------------------- You ask about the phrase "the limit in (2) is taken with respect to the norm" on p.112, lines 2-3. A limit of a function between metric spaces is defined in terms of the metrics of the two spaces. What Rudin is saying is that the metric in R^k is the function d(x,y) = |x - y|, where "| |" denotes the norm in R^k, so in the definition of derivative, we have to use that norm to define the limit. Hence to prove that this limit is the vector gotten by taking the derivatives of the components, we have to relate the norm of a difference in R^k with the differences of the coordinates. This was done on the top of p.88 in proving Theorem 4.10, and, in fact, the result desired here can be proved in exactly the same way as that theorem. ---------------------------------------------------------------------- You ask why, as Rudin says on p.112, the mean value theorem doesn't work for complex functions. I'm not sure what you mean. There are three ways one might interpret your question, but each of those you can answer for yourself: (i) "Why doesn't the proof we gave in the real case work for complex functions?" Look at the proof, and try to make it work for complex functions, and you'll see. (ii) "How do we know the theorem is not true for complex functions?" From Example 5.17. 
(iii) "How does the intuitive picture of why the mean value theorem is true fail for complex functions?" Do a sketch of the function described in Example 5.17, and see what happens. Assuming your question meant one of these three things, if you tried the approach I mentioned under that heading, and were puzzled by some point, I'd be glad to help clarify it. ---------------------------------------------------------------------- You ask about the formulas used in Example 5.17 on p.112. As Rudin tells, you, e^ix is defined as cos x + i sin x. You can differentiate this with respect to x using (29) on p.111, and check that the derivative equals i times the original function (just as the derivative of e^ax equals a e^ax for a real number a). That's all you have to know about this to follow the example. The fact that the absolute values of the derivative is 1 follows from High School facts about sines and cosines. The theory behind all this -- why e^ix is defined this way (or better, how to get a self-contained definition of e^ix, and define sine and cosine in terms of it) belongs to Math 185. ---------------------------------------------------------------------- You ask how, at the bottom of p.112, Rudin goes from |e^it| = 1 to lim_{x->0} f(x)/g(x) = 1. Well, if you write down f(x)/g(x) and simplify, you get 1/(1 + x e^{i/x^2}). You want to be able to say that the denominator approaches 1 because the "x" makes the second summand go to 0. But to say this, you need to know that the factor by which x is multiplied, e^{i/x^2}, doesn't get large. By the formula he quotes, that factor always has absolute value 1, giving you what you need. As is frequently the case, Rudin expects you to do the straightforward calculation, and he supplies the fact you will need at some key step. ---------------------------------------------------------------------- You ask why Theorem 5.10 doesn't hold in the context of Example 5.17, p.112. Theorem 5.10 is a theorem about real-valued functions. The example is of a complex-valued function. (Where Rudin says "Theorem 5.10 fails to hold in this case", I would prefer to say something like "The analog of Theorem 5.10 for complex-valued functions fails to hold"; since, strictly speaking, it is not Theorem 5.10 that fails, but the modified statement. However, Rudin's wording is an easily understood shorthand, as long as the reader keeps in mind what the example just given shows.) ---------------------------------------------------------------------- You ask why display (37) on p.113 has curly braces. There's no good reason. Mathematical typographers used to like to vary the kinds of braces used when one set of them contained another; e.g., see the top display on p.105, or the second line of the top display on p.65. Here we don't have that excuse, but I suppose it was felt that different sorts of braces all denoted the same thing. We can't really know whether it was Rudin or the typesetter who decided to use curly braces there. ---------------------------------------------------------------------- You ask whether, in display (41), p.113, the f's should be boldface, and how that result is obtained from Theorem 5.10. The f's are not meant to be boldface. Rudin has just said that there is a consequence of Theorem 5.10 which remains true for vector-valued functions. In (41) he is stating this consequence (still for real-valued functions); then in Theorem 5.19 he will show that it remains true for vector-valued functions. 
Display (41) is obtained from Theorem 5.10 by taking the absolute value of both sides, and then noting that the right-hand side refers to the value of |f'(x)| for _some_ x\in(a,b), and will therefore be _< the supremum of |f'(x)| over _all_ x\in(a,b). ---------------------------------------------------------------------- Regarding Chapter 5 (pp.103-114), you ask why Rudin avoids telling us that the derivative has a geometric interpretation. I think the best reason is that he assumes we already know this; that this is not the first time we have seen the concept. The point of the book is to set our mathematical understanding of these topics on a solid foundation, building the reasoning up from scratch. But since our intuitive understanding is not something based on precise reasoning, there's no need to try to build it up from scratch too. Another thing that may contribute is that mathematicians differ a lot in the extent to which they try to convey their intuitions, their sense of why we approach problems as we do, etc.. I am at one extreme in my tendency to try to do so. At the opposite extreme is the idea that all that matters is getting the precise definitions and proofs across. I'm not really sure why people take that view. Perhaps (i) because it takes enough time to go through the details, and we don't have time to spare on the intuitions (and certainly my emphasis on explaining what's behind what we do takes away from time that most instructors spend on proofs), or perhaps (ii) from some feeling that one's intuitions are one's private business, and there's no need to impose our own way of looking at things on someone else, or (iii) from an assumption that anyone who is bright enough to be a mathematician will see the ideas behind the formalism (if so, then the assumption is that they will see things the same way, the opposite of (ii)), or perhaps (iv) that not having had experience in trying to communicate their informal thought processes, it rarely occurs to them to try to do so. I don't really know how much each of these factors contributes. I've never discussed with other mathematicians why they don't try to communicate these things more than they do. Finally, I will say that to interpret the derivative _only_ as the slope of a graph is excessively narrow. The derivative describes the rate of change of one quantity relative to another. When we graph a function, its derivative shows up as slope, but that slope is not the whole "meaning" of the derivative. There are other ways to think than visually. (And even thinking visually, there are obviously other ways to picture a rate of change than a static graph.) ---------------------------------------------------------------------- You ask whether, using Theorem 5.19 (p.113) in place of the Mean Value Theorem, we could get a result for vector-valued function that bears the same relationship to Taylor's Theorem that Theorem 5.19 does to the Mean Value Theorem. To get it starting from Theorem 5.19 might be difficult, because the proof of Taylor's Theorem uses the exact equality g(beta)=0 (p.111, line after (28)) to get an exact equality g'(x_1) = 0, from which we get another exact equality g''(x_2)=0, and so forth, and even if we were only interested in a bound at the last step, it is hard to see how to replace those exact equalities in the proof by bounds. However, we could get a result such as you suggest from Taylor's Theorem in exactly the way we get Theorem 5.19 from the Mean Value Theorem. 
Namely, if we form a vector-valued function P(t) from a vector-valued function f(t) by formula (23) (bottom of p.110), then we can take a vector of unit length u such that |f(beta) - P(beta)| = u . (f(beta) - P(beta)). We then apply Taylor's Theorem to the real-valued function u . f(t). It is easy to check that the polynomial that takes the role of P(t) for this real-valued function is exactly u . P(t), so Taylor's Theorem gives us u . f(beta) = u . P(beta) + u . f^(n)(x)(beta-alpha)^n/n!. Bringing the P-term over to the left, using the assumption |f(beta) - P(beta)| = u . (f(beta) - P(beta)), and the fact that |u.f^(n)(x)| _< |u| |f^(n)(x)| = |f^(n)(x)|, we conclude |f(beta) - P(beta)| _< |f^(n)(x)(beta-alpha)^n /n!| for some x\in (alpha,beta), which is the result you asked for. ---------------------------------------------------------------------- Regarding the sentence "If the upper and lower integrals are equal ..." on p.121, you write > I can't seem to accept the fact that the upper and lower integrals > are ever going to equal over some random partion "P." For each partition, we have an upper and lower _sum_, U(P,f,alpha) and L(P,f,alpha). In general, there won't be any partition for which these sums are equal. (All the less likely that they will be equal for "a random partition" as you say). But the definition of the upper and lower integrals does not say that they are numbers U(P,f,alpha) and L(P,f,alpha); it says that they are, respectively, the infimum of the former numbers, and the supremum of the latter as P ranges over _all_ partitions. This infimum and supremum, i.e., this greatest lower bound and least upper bound, are, intuitively, the values that these sums _approach_ from above and below as one takes finer and finer partitions P. I hope I made clear in class the importance of the phrase following displays (5) and (6) on p.122, "the inf and sup again being taken over all partitions". This means, for instance, that in interpreting the right-hand side of (5), one looks at all partitions P (an uncountable set), from each of them one gets a number (namely, U(P,f,alpha)), and the "inf" referred to is the infimum of the set of all these numbers. You also ask whether Rudin actually proves that these upper and lower integrals are equal in Theorem 6.6. No. First of all, they aren't equal for all functions f; i.e., not all f are integrable. For the proof that certain large classes of functions are integrable, one has to be patient. He is first developing general properties of upper and lower sums and integrals that he will use; then in Theorems 6.8 and 6.9 he will use these properties and prove general conditions for integrability. ---------------------------------------------------------------------- You ask how we can get from the definition of Riemann integral on p.121 to results such as that the integral from a to b of x^2 is the difference between the values of x^3 /3 at b and a. With patience. The purpose of this course is to understand the mathematical underpinnings of the concepts of calculus, rather than jump to the computational methods (which are taught in Math 1AB). We will spend the next few readings on the general properties of the concept of integration we have defined, and finally, in Theorem 6.21, prove a result from which the example you asked about, and examples like it, follow.
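In the meantime, if you want to see the definition itself at work on that example, here is a little computation of my own (in Python; nothing like this is in Rudin): take f(x) = x^2 on [0,1] with alpha(x) = x, and compute the upper and lower sums for the partition of [0,1] into n equal pieces. Both values in the output approach 1/3, i.e., the value 1^3/3 - 0^3/3 that Theorem 6.21 will eventually justify.

  # Upper and lower sums for f(x) = x^2 on [0,1], n equal subintervals.
  # Since f is increasing on [0,1], its sup on [x_{i-1}, x_i] is f(x_i)
  # and its inf is f(x_{i-1}).
  def upper_lower(n):
      xs = [i / n for i in range(n + 1)]
      f = lambda x: x * x
      U = sum(f(xs[i]) * (xs[i] - xs[i-1]) for i in range(1, n + 1))
      L = sum(f(xs[i-1]) * (xs[i] - xs[i-1]) for i in range(1, n + 1))
      return U, L

  for n in (10, 100, 1000):
      print(n, upper_lower(n))     # both entries approach 1/3

Of course, such a computation is not a proof that the inf of the upper sums and the sup of the lower sums are both 1/3; that is what the theorems of this chapter give us the tools to establish.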
---------------------------------------------------------------------- You ask about the statement at the top of p.122, "since alpha(a) and alpha(b) are finite, it follows that alpha is bounded on [a,b]." The statement "alpha(a) and alpha(b) are finite" simply reflects the fact that "function" refers to real-valued function. Extended reals are never meant unless this is indicated explicitly (e.g., in Theorem 5.13, where he writes "-infinity _< a < ...", showing that a may be -infinity). ---------------------------------------------------------------------- You comment that in the definition of the Stieltjes integral on p.122, the terms M_i Delta alpha_i look like "multiplying height by height" rather than the "base times height" one would expect in a measure of area. Well, of course if you graph the curve y = alpha(x), then the values of alpha will appear as heights. But the idea of this integral is, rather, that the numbers Delta alpha_i = alpha(x_i) - alpha(x_{i-1}) are to be thought of as generalizations of Delta x_i = x_i - x_{i-1}; in other words, the Delta alpha_i is "Delta x_i with some weighting factor thrown in." ---------------------------------------------------------------------- You ask whether displays (5) and (6) on p.122 are for the same partition P. Neither of them is for a single partition P ! (5) is the infimum of U(P,f,alpha) over _all_ partitions P, and (6) is likewise the supremum of L(P,f,alpha) over all partitions. Remember that U(P,f,alpha) can be pictured as the area of a certain union of "bars" whose tops are above the graph of f, and L(P,f,alpha) as the area of a union of bars whose tops are below the graph. In general, no one U(P,f,alpha) will equal the integral; they all overshoot, but by taking the inf of all of them, we get what we hope will be the integral, and we call it the "upper integral". Similarly, the L(P,f,alpha) in general all undershoot, but we take the supremum of all of them and call it the lower integral. ---------------------------------------------------------------------- You ask about the meaning of "the bar next to a and b" in expressions like (5) and (6) on p.122. Its being "next to a and b" is poor typesetting. It should be thought of as a bar above or below the integral sign, standing for "upper integral", respectively "lower integral". (If I were doing the typesetting, I would move the b down in (5) and move the a up in (6), so they wouldn't be next to the bars.) ---------------------------------------------------------------------- You ask about the definition of refinement on p.123, saying that you don't understand intuitively what exactly a refinement is. "Refinement" is not an isolated concept. What one talks about is a refinement of a _partition_, so understanding the concept of refinement depends on understanding the concept of partition: What partitions are, and what they are used for. And the concept of partition is not an isolated concept: It depends on what they are used for, namely in defining upper and lower sums. So to see what refinements are about, you need to think about upper and lower sums, and how they vary when the partition used is changed. Have you been attending class? I put pictures on the board Wednesday illustrating this. I talked about how we could prove the relation L(P_1,f,alpha) _< U(P_2,f,alpha) for different partitions P_1 and P_2 of the same interval, and I showed that the idea that would work would be to compare each of them with the partition that contained both all the points of P_1 and all the points of P_2.
Since the method was then to note that it is easy to relate upper or lower sums with respect to two partitions when one of the partitions contained all the points of the other (and perhaps more), we wanted a name for such a partition, and we would call it a "refinement" of the other. ---------------------------------------------------------------------- Regarding the proof of Theorem 6.5, p.124, you ask "how do we divide the set of all partitions of [a, b] into two subsets P_1 and P_2?" We don't! Just as a statement that something is true for all real numbers x and y doesn't mean we divide real numbers into x's and y's, but that we consider all pairs of real numbers, whether the same or different, so here we are considering simultaneously all choices of two partitions P_1 and P_2, both those where they are the same and those where they are different. ---------------------------------------------------------------------- You ask why, at the bottom of p.124, one can choose P_1 and P_2 so as to make (14) and (15) hold. This follows from the definitions made in (5) and (6), p.122, and the fact that when the integral exists, it is by definition equal to both the upper and lower integrals. ---------------------------------------------------------------------- You ask about the claim of uniform continuity of f in the proof of Theorem 6.8, p.125. Remember where uniform continuity first showed up? It is a strengthening of continuity, which we saw that every continuous function on a compact metric space satisfies. And a closed interval [a,b] is compact. It's used not just in the proof of Theorem 6.8, but also in those of Theorems 6.10 and 6.11. ---------------------------------------------------------------------- You write > p126. Display (17) M_i - m_i <= eta > > Q. Why do we include the equality here, given Display (16)? The same > question occurred in the proof for Theorem 6.10 (p.127) ... And it also occurred in the proof of L'Hospital's rule, p.109 where several people asked why the "<" in equation (18) became the "_<" in (19). I answered that at the time, and then in writing the second Midterm I included Question 2(b), which in essence asked whether a sequence of points s_n, each of which satisfied 0 < s_n < 1, could approach a limit that did not satisfy those "<" relations. All but two people got that question essentially right, including you; so you ought to be able to apply the same understanding here. ---------------------------------------------------------------------- You ask about Rudin's choice of a partition satisfying Delta alpha_i = (alpha(b) - alpha(a))/n in the proof of Theorem 6.9, p.126. He doesn't have to choose that partition in particular; he just needs a partition in which the Delta alpha_i are all small. (Remember that in my discussion of the result in class, the proof was based on making the little rectangles whose total area we want to make small have uniformly small widths Delta alpha_i, and have heights whose sum is bounded, namely by f(b) - f(a).) He chooses to do this in the most simple-minded way: by dividing the range of possible values, [alpha(a), alpha(b)] into n equal parts, and (at the end of the proof) letting n become large. In any case, the key idea is that once we have chosen what values between alpha(a) and alpha(b) the alpha(x_i) should have, we can find x_i at which alpha indeed has these values, by the Intermediate Value Theorem (Theorem 4.23). 
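Here is a small numerical illustration of my own of that idea (in Python; the particular f and alpha are just ones I chose for the purpose): take the increasing function f(x) = x and the continuous increasing function alpha(x) = x^2 on [0,1], and choose the partition points x_i = sqrt(i/n), so that the values alpha(x_i) = i/n are equally spaced, i.e., Delta alpha_i = (alpha(1) - alpha(0))/n. Since f is increasing, M_i - m_i = f(x_i) - f(x_{i-1}), and U - L collapses to (f(1) - f(0))(alpha(1) - alpha(0))/n, which goes to 0 as n grows.

  from math import sqrt

  def U_minus_L(n):
      xs = [sqrt(i / n) for i in range(n + 1)]   # chosen so alpha(x_i) = i/n
      f = lambda x: x
      alpha = lambda x: x * x
      # f increasing, so M_i - m_i = f(x_i) - f(x_{i-1}) on each subinterval
      return sum((f(xs[i]) - f(xs[i-1])) * (alpha(xs[i]) - alpha(xs[i-1]))
                 for i in range(1, n + 1))

  for n in (10, 100, 1000):
      print(n, U_minus_L(n))                     # prints (approximately) 1/n

In Rudin's proof one can't simply write down x_i = sqrt(i/n), since alpha is an arbitrary continuous increasing function; that is exactly where the Intermediate Value Theorem comes in.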
---------------------------------------------------------------------- You ask, regarding the proof of Thm 6.9 on page 126, > How does rudin get from the sum of f(x_i)-f(x_i-1) to [f(b)-f(a)]? Well, tell me what you think you get if you add up the terms f(x_i)-f(x_i-1) . If the answer isn't clear, try a case such as n=3, so that you can write out the terms explicitly. ---------------------------------------------------------------------- You ask about the first display on p.127: U(P,f,alpha) - L(P,f,alpha) _< [alpha(b)-alpha(a)] epsilon + 2M epsilon. Well, notice that U(P,f,alpha)-L(P,f,alpha) is a sum of terms (M_i - m_i) Delta alpha_i. For those summands that do not correspond to intervals [u_j,v_j], the factor M_i - m_i is < epsilon; moreover, the terms Delta alpha_i summed over all i give alpha(b)-alpha(a), so summed over this subset of values, they sum to _< alpha(b)-alpha(a); so this subset of the summands adds up to _< [alpha(b)-alpha(a)] epsilon. For those summands that _do_ correspond to intervals [u_j,v_j], the numbers Delta alpha_i add up to _< epsilon. (See preceding page, next-to-last sentence, where those intervals were chosen.) Also, as noted on p.127, the differences M_i - m_i are _< 2M. So the sum of these terms is _< 2M epsilon. Thus, all the terms together come to _< [alpha(b)-alpha(a)] epsilon + 2M epsilon. ---------------------------------------------------------------------- You asked about the purpose of the delta in the proof of Theorem 6.10, which first appears on p.127 line 3. I described that proof in lecture as "divide and conquer" -- we want to make U(...)-L(...), which we picture as a union of rectangles, small, and to do this, we make sure that most of the rectangles have small height, while the few of them that surround points of discontinuity of f have small width. By making the Delta x_i (for all the intervals except those around the points of discontinuity) less than delta, we get the M_i - m_i i.e., the heights of the corresponding rectangles, less than epsilon. ---------------------------------------------------------------------- Regarding the proof of Theorem 6.10, pp.126-127, you ask whether the idea is simply to integrate "as usual but omitting the intervals containing the discontinuities". Well, from the point of view of the proof, it is more correct to say that the discontinuities are treated differently, not omitted. As I showed in class, we set up our partition so that in the rectangles whose areas we want to sum to < epsilon, those around the discontinuities have small width (and bounded height, and there are only fixed number of them) while all the rest have small height (and the sum of their widths is bounded). It is true that if in computing the upper and lower sums, we simply omitted the rectangles containing the discontinuities, these sums would approach the correct integral; but approaching the computation that way would mean changing our definition of integral, which we don't want to do. Also, though one can talk about omitting certain intervals when computing the upper and lower sums, this doesn't make as much sense when computing the integral itself, since we pass from the sums to the integral by letting all our intervals be chopped into smaller and smaller pieces; so in the limit, there are no nonzero "intervals containing the discontinuities". So in summary, what you said has a germ of the right idea behind it, but can't be used as an exact statement of what we are doing in the proof. 
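Here is a crude numerical sketch of my own of that "divide and conquer" idea (in Python, with alpha(x) = x; the function and the numbers are just ones I made up for illustration): f below is increasing, with a single jump of size 1 at x = 1/2. The partition has mesh h away from the jump, plus the two points 1/2 - w and 1/2 + w surrounding it. The subintervals away from the jump contribute at most about h times the total rise of f, while the narrow subintervals around the jump contribute at most their total width 2w times the rise across them; so U - L is small when h and w are.

  def f(x):
      return x if x < 0.5 else x + 1.0     # increasing, jump of size 1 at 1/2

  def U_minus_L(xs):
      # f is increasing, so on [x_{i-1}, x_i] its sup is f(x_i) and its
      # inf is f(x_{i-1}); hence U - L = sum of (f(x_i) - f(x_{i-1})) Delta x_i.
      return sum((f(v) - f(u)) * (v - u) for u, v in zip(xs, xs[1:]))

  n, w = 20, 0.01                          # mesh h = 1/20 away from the jump
  xs = sorted(set([i / n for i in range(n + 1)] + [0.5 - w, 0.5 + w]))
  print(U_minus_L(xs))                     # about 0.06 here; shrinks with h and w

Making the mesh and the width of the interval around the jump smaller makes this as small as we like, which is what criterion (13) of Theorem 6.6 asks for.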
---------------------------------------------------------------------- You ask how we know, at the start of the proof of Theorem 6.11, p.127, that phi is uniformly continuous. By Theorem 4.19. ---------------------------------------------------------------------- You ask, regarding the second line of the proof of Theorem 6.11, p.127, why Rudin can assume delta < epsilon. Uniform continuity allows him to take some delta with the property stated in the last part of the sentence. But if delta has that property, so does every smaller positive value; so if delta is not < epsilon, we can replace it by epsilon, and the property will be retained. In other words, start with a delta_0 that has the property which follows from uniform continuity, and then let delta = min(delta_0, epsilon). (Incidentally, the version of the proof that I gave in class didn't need to assume delta < epsilon, because in (18) I put delta epsilon on the right, instead of delta^2; which made later computations simpler.) ---------------------------------------------------------------------- You ask about the assumptions on f_1, f_2 in Theorem 6.12(b), p.128. Rudin should have stated that they are both in R(alpha). ---------------------------------------------------------------------- You ask whether, in getting display (20) on p.128, Rudin is using a result saying that (for infima taken over the same interval) inf(f_1) + inf(f_2) <= inf(f_1 + f_2). Yes. This principle wasn't stated because it follows directly from the definition of inf: Every value of f_1 is >_ inf(f_1), and every value of f_2 is >_ inf(f_2); hence every value of f_1 + f_2, being the sum of something >_ inf(f_1) with something >_ inf(f_2), is >_ inf(f_1) + inf(f_2). This makes inf(f_1+f_2) >_ inf(f_1)+inf(f_2). ---------------------------------------------------------------------- You ask why the final and initial "_<" signs in display (20) on p.128 are not "=" signs. Let's consider the very simplest case -- suppose the function alpha(x) is just x, and the interval has length 1, and the partition P just consists of the two points x_0 = a, x_1 = b, and the functions f_1 and f_2 actually assume maximum and minimum values (e.g., they're continuous functions on [a,b].) Then, the questions you are asking come down to "Isn't the maximum value of f_1 + f_2 on [a,b] just the sum of the maximum values of f_1 and of f_2?" The answer is no -- not unless f_1 and f_2 take on their maximum values at the same point; and the same applies, of course, to minima. You should play with examples of where they don't and see that the maximum or minimum of the sum is not in general the sum of the maxima or minima, but that it is related to it by the inequality used in (20), and you should convince yourself that this inequality always holds. For example, consider the minima of the functions (x+1)^2 and (x-1)^2 on the real line; or the maxima of the functions x and -x on [-1,+1]. In the case where we don't have maxima and minima, but suprema and infima, the problem is, similarly, that though the two functions each get near their suprema, the points where f_1 comes within epsilon of its supremum may be different from the points where f_2 comes within epsilon of its supremum. ---------------------------------------------------------------------- You ask about the inequality U(P,f_j,alpha) < integral f_j d alpha + epsilon of the second display on p.129. This comes from a combination of the bottom display on p.128 and the last inequality in the proof of Theorem 6.7 on p.125. 
The latter inequality shows that the integral lies between the upper and lower sums; the former shows that these are < epsilon apart; hence the integral lies within distance epsilon of each of them. ---------------------------------------------------------------------- You ask how, on p.129, Rudin goes from the display before (21) to (21) itself. The general principle that he is using is that if we have real numbers x and y, and we know that x < y + epsilon for every positive epsilon, then x _< y. You should be able to prove this from material in Chapter 1. Something that often confuses students is the switch from "<" to "_<". But we can't preserve the "<". For instance, 1 < 1+epsilon for every positive epsilon, but it is not true that 1 < 1 ! ---------------------------------------------------------------------- You ask about the step after (21) on p.129. What Rudin means is that if we do the same calculation that gives (21), but with -f_1, -f_2 and their sum -f in place of f_1, f_2 and their sum f, we get integral -f d alpha _< integral -f_1 d alpha + integral -f_2 d alpha. Pulling out the "-" signs, and then multiplying the inequality by -1, which changes the _< to >_, we get integral f d alpha >_ integral f_1 d alpha + integral f_2 d alpha as required. ---------------------------------------------------------------------- You ask about Rudin's argument "replace f_1 and f_2 with -f_1 and -f_2" at the end of the proof of Theorem 6.12, p.129. You're right that something additional is needed to justify it. That additional thing is the second statement of part (a), with c = -1. To prove that statement, one notes that U(P,-f,alpha) = - L(P,f,alpha) and L(P,-f,alpha) = - U(P,f,alpha), so when (13) holds for f, it also holds for -f, and the integral of -f is the negative of the integral of f. ---------------------------------------------------------------------- You ask about the last sentence in the proof of Theorem 6.15, p.130. Where Rudin says "x_2 -> s" he should say "x_1 and x_2 -> s". To fill in the argument: Continuity of f at s means that for every epsilon we can find a delta so that in a delta-neighborhood of s, f(x) stays within epsilon of f(s). So if we take a partition P such that x_1 and x_2 are both within delta of s, then all values of f(x) for x\in[x_1,x_2] will be within epsilon of f(s), so in particular, their supremum and infimum will be within epsilon of f(s); can you take it from there? ---------------------------------------------------------------------- Regarding Theorem 6.16 on p.130, you ask whether we are assuming f is bounded on [a,b]. We don't have to assume this -- it follows from a result in Chapter 4. Can you locate that result? ---------------------------------------------------------------------- You ask how Rudin gets the inequality alpha_2(b)-alpha_2(a) < epsilon on the line between (24) and (25), p.130. This comes from the choice of N (first display in the proof), and the definitions of \alpha_2(a) and \alpha_2(b). ---------------------------------------------------------------------- You ask about the relation between alpha as "the weighting" in our integrals, and the alpha' in Theorem 6.17, p.131. Although the choice of alpha determines how the contributions of f(x) for different values of x are weighted, the alpha itself is not the weighting factor -- i.e., it is not true that if alpha(x) is large for a certain value of x, then the corresponding values of f(x) are counted with greater weight.
Rather, it is the _change_ in alpha that determines the weighting -- shown by the factors "Delta alpha_i" in the formulas for upper and lower sums, and symbolized by the "d alpha" in the symbol for the integral. Since alpha'(x) describes how fast alpha(x) is changing relative to the change in x, it is natural that when we go from integrating with respect to alpha to integrating with respect to x, a factor of alpha' must be inserted to get the same integral. One could say that the weighting with respect to the change in alpha is "implicit" in the definition of integral f d alpha, and is made "explicit" in the formula integral f alpha' dx. ---------------------------------------------------------------------- You ask how Rudin gets from (30) on p.131 to the display after it. Well, (30) implies that Sigma f(s_i) Delta alpha_i _< (Sigma f(s_i) alpha'(s_i) Delta x_i) + M epsilon . (I.e., since the two summations differ by _< M epsilon, one of them is at most the other + M epsilon.) Now in the summation on the second line above, each term f(s_i) alpha'(s_i) is _< the supremum of f(x) alpha'(x) over [x_{i-1}, x_i]; and when we replace these terms by those suprema, the summation becomes precisely the definition of U(P, f alpha'); so the second line is _< U(P, f alpha') + M epsilon . Combining these two inequalities, we get the asserted inequality. ---------------------------------------------------------------------- You ask about the statement "... (28) remains true ... Hence (31) also remains true ..." at the top of p.132. That is so because (31) was deduced from (28). That is, Rudin has shown that if P is a partition such that (28) holds, then (31) also holds. Now if P' is a refinement of P, (28) will hold with P' in place of P, hence (31) will also hold with P' in place of P. ---------------------------------------------------------------------- You ask how the statement at the top of p.132 about (31) remaining true if P is replaced by a refinement is needed in getting the first display. Remember that the two upper integrals in that display are defined as _infima_ of upper sums. If we merely knew (31) for _some_ partition, we would know that _some_ upper sum in the definition of one integral is near to _some_ upper sum in the definition of the other; but this would not show that the infimum of all uppers sums of one sort was near to the infimum of all upper sums of the other. However, we know that by going to refinements P* of P, we can get upper sums (of each sort) that are as near as we want to those respective infima. (We do this by taking a common refinement P* of our partition P, any partition P' for which (13) holds for the first integral, and any partition P" for which (13) holds for the second integral.) So those infima must themselves be within epsilon of each other. ---------------------------------------------------------------------- You ask about Rudin's use of the term "pure step function" (p.132, third line of Remark 6.18). To me, a step function means a function that jumps at finitely many points, and is constant everywhere else. So (22), which he calls a pure step function, is to me a little beyond the border of that concept, though still close to it. Anyway, it isn't important to settle on the exact definition; it isn't an important concept in the course; just a source of an occasional interesting example. ---------------------------------------------------------------------- You ask about the physical example on p.132, second paragraph of Remark 6.18. 
The concept of the moment of inertia of a body arises when one considers rotating the body about some axis (in this case, an axis through x=0 and perpendicular to the wire). This takes energy, and comparing two bodies of the same mass, it will take more energy to rotate (at the same number of degrees or radians or rotations per unit time) the body whose mass is distributed farther from the axis than the body whose mass is concentrated closer to the axis, since the speed at which a particle of the rotating body moves is proportional to the distance from the axis, and the energy is proportional to the square of the speed. If you ask about this at office hours, I can discuss it in more detail. ---------------------------------------------------------------------- I hope what I said in class helped with your question as to why Rudin approaches the proof of Theorem 6.20, p.133, as he does, though I only focused on the final half: proving that F' = f. Except in cases where unusual tricks are needed, the idea of finding a proof is always to ask "How does what we are given relate to what we want to prove?" It is definitely a topic which a math major should think about. But aside from my lectures, I think the best way I can help with this is if you come to office hours with a result where you don't see why the proof proceeds as it does, and I ask questions, seeing where you already have the right understanding, and where I need to point something out, or ask a leading question that points you to some approach you aren't seeing. Of course, if you submit your question in advance by e-mail, then I will try to work more of what you need into the lecture. ---------------------------------------------------------------------- You ask about Rudin's translation of F(y) - F(x) to the integral from x to y of f(t)dt in the next-to-last display on p.133, saying that he is using the fundamental theorem of calculus before he has proved it. Although this is the same equation "F(y) - F(x) = \integral_x ^y f(t)dt" that one has in the statement of the fundamental theorem of calculus, it is not an instance of that theorem here! The fundamental theorem of calculus says (ignoring technical details) that that equation holds when f is the derivative of F. But "f is the derivative of F" is not something we are given in Theorem 6.20 -- quite to the contrary, it is something we are trying to prove! What we are given about the relationship between F and f is that F(x) = \integral_a ^x f(t)dt. For F so defined, the equation in question is just an instance of Theorem 6.12(c) -- which is exactly the statement Rudin quotes to justify it. ---------------------------------------------------------------------- You ask how, on p.134, line before 3rd display, Rudin makes use of Theorem 6.12(d) to get the final "< epsilon". By the first display on that page, the absolute value of the integrand is < epsilon for all u in the range of integration. So we can apply Theorem 6.12(d) (with that integrand for "f" and the identity function for "alpha"). Thus, the absolute value of the integral is < (t-s) epsilon; so on division by t-s we get a value < epsilon. ---------------------------------------------------------------------- You ask how the "f(x_0)" in the calculation at the top of p.134 ends up with a "t-s" in the denominator. I hope what I showed in class made the answer clear: One multiplies and divides f(x_0) by t-s, getting f(x_0) = (t-s)f(x_0) / (t-s).
The numerator of this expression can be looked at as the integral of the constant f(x_0) from s to t; so the result is "integral_s ^t f(x_0) dx / (t-s)". ---------------------------------------------------------------------- You ask whether Rudin's statement near the beginning of the proof of Theorem 6.21, p.134, that the mean value theorem furnishes points t_i \in [x_{i-1}, x_i] should be "\in (x_{i-1}, x_i)". The mean value theorem does say that there will be such points in (x_{i-1}, x_i); but since (x_{i-1}, x_i) \subset [x_{i-1}, x_i], this immediately implies t_i \in [x_{i-1}, x_i]. Rudin frequently refers, not precisely to what a theorem says, but to some immediate consequence, which he expects the reader to see follows. (Often he has a reason for wanting the consequence rather than the original result; here he could just as well have said "\in (x_{i-1}, x_i)".) ---------------------------------------------------------------------- You ask about the relation between a 3-term version of Integration by Parts that you learned, and the 4-term version Rudin gives in Theorem 6.22, p.134. I don't know exactly what the 3-term version you learned looked like. Perhaps it had a term F(x)G(x)|_a ^b (i.e., the difference between the values of F(x)G(x) for x=a and x=b). If so, that one term, when expressed explicitly, becomes two terms in Rudin's form. Or it might have been a version for indefinite integrals, which one could get by replacing all occurrences of "b" in Rudin's expression by a variable, say u, and then regarding the definite integrals with upper limit of integration that variable as indefinite integrals. In that case, the F(b)G(b) in Rudin's equation would become F(u)G(u), but the F(a)G(a) would be discarded because it is constant, and constant summands are ignored when looking at indefinite integrals. Either way, it is essentially the same result as Rudin's. ---------------------------------------------------------------------- You ask whether the formula for Lambda(gamma) in Theorem 6.27, p.137, combined with the fundamental theorem of calculus, implies that a closed curve has length 0. No! The fundamental theorem of calculus does show that the integral of gamma' is zero -- that integral is the vector from the initial point to the final point of the curve. But in defining Lambda(gamma), one integrates the absolute value, |gamma'|, and this is not the derivative of a function which has the same value at the two ends. ---------------------------------------------------------------------- You ask how Rudin gets the inequality |gamma'(t)| _< |gamma'(x_i)| + epsilon in the middle of p.137. He gets this using the preceding inequality, |gamma'(s) - gamma'(t)| < epsilon. Rudin has substituted s = x_i (you should see what the assumptions on s and t in the above inequality are, and verify that they hold for s and x_i in the present context, so that this substitution is valid), and then used the observation that if two vectors are at distance < epsilon from each other (as the above inequality says), then their absolute values will differ by < epsilon, so in particular, the absolute value of one cannot exceed the absolute value of the other by more than epsilon. I spoke above of "the observation that if two vectors are at distance < epsilon from each other then their absolute values differ by < epsilon" to give you the intuition. To get the formal proof, apply Theorem 1.33(e) with z = gamma'(x_i), and w = gamma'(t) - gamma'(x_i), remembering to use the relation |gamma'(x_i) - gamma'(t)| < epsilon at the end.
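Written out, that last application of the triangle inequality (with z and w as just described, so that z + w = gamma'(t)) is the one-line chain

  |\gamma'(t)| = |z + w| \le |z| + |w| = |\gamma'(x_i)| + |\gamma'(t) - \gamma'(x_i)| < |\gamma'(x_i)| + \epsilon,

the final step being the relation |gamma'(x_i) - gamma'(t)| < epsilon quoted above.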
---------------------------------------------------------------------- You ask about the first step in the 4-line display on p.137. Rudin takes |gamma'(x_i)| Delta x_i = |gamma'(x_i) Delta x_i| and writes the value inside the absolute value signs as the integral of the constant vector gamma'(x_i) over the interval from x_{i-1} to x_i. Then, in preparation for his next step, he adds to and subtracts from that constant vector the function gamma'(t). ---------------------------------------------------------------------- You ask how in the 4-line bank of computations on p.137 Rudin gets from the next-to-last line to the last line. (I wish I could feel confident that what I did in class made it clear, but I'm afraid I did not handle that computation very clearly. So --) The first integral on the next-to-last line gives precisely gamma(x_i) - gamma(x_{i-1}); so the first summands are equal. The second term on the next-to-last line is the integral over an interval of length Delta x_i of an integrand of absolute value everywhere < epsilon, so the integral has absolute value _< epsilon Delta x_i. Adding this to the epsilon Delta x_i already in the next-to-last term, we get the 2 epsilon Delta x_i of the last term. ---------------------------------------------------------------------- You ask whether Lambda(gamma) can still be computed by the formula of Theorem 6.27 (p.137) if gamma' has discontinuities. Right -- as long as it's Riemann integrable. But the proof would have been still messier -- the same sort of "divide and conquer" approach used in the proofs of Theorems 6.10 and 6.11; one would consider separately intervals where gamma' varies by more than epsilon and intervals where it doesn't; and Rudin evidently decided to spare us yet another of those. Also, the cases going beyond continuous gamma' that come up most naturally are those where it has points where the derivative is undefined -- "corners", for instance -- and the definition of Riemann integral requires that the integrand be everywhere defined, so the generality gained would not have been that helpful. ---------------------------------------------------------------------- In your question about Rudin's Example 7.4, p.145, you say that cos 10 pi = .853. That is the cosine of 10 pi degrees! In pure mathematics, the trigonometric functions are always understood to take their inputs in radians. pi radians = 180 degrees, so 10 pi radians = 1800 degrees = 5 x 360 degrees, which has cosine equal to 1. ---------------------------------------------------------------------- You ask why Rudin's statement that m! x is an integer just before display (8) on p.145 is restricted to the case m >_ q. For m! x to be an integer, the denominator q of x must cancel with some factor of m!. We know that for m >_ q, m! is divisible by q, but in general we don't know this for smaller m. ---------------------------------------------------------------------- You ask how Rudin gets display (11) on p.146 from Theorem 3.20(d). That part of Theorem 3.20 says that as n --> infinity, any constant power of n divided by the nth power of a real number >1 approaches 0. But to divide by the nth power of a real number >1 is the same as to multiply by the nth power of a positive real number < 1, and the "1-x^2" in (10) has that form. ---------------------------------------------------------------------- You ask what the M_n are in Theorem 7.10, p.148. Any sequence of nonnegative real numbers. 
A more precise statement of the theorem would be that if for n = 1,2,..., f_n are functions and M_n are nonnegative real numbers such that |f_n(x)| _< M_n for all x\in E, then the stated conclusion holds. ---------------------------------------------------------------------- You ask about Rudin's use of a three-term triangle inequality in (19), p.149. The three-term triangle inequality says |p+q+r| _< |p| + |q| + |r|. This is easy to see from the two-term triangle inequality, by applying that result first to |p + (q+r)|, and then applying it again to the term |q+r| that one gets; so it is taken as clear (though Rudin did make an exercise of the n-term triangle inequality in Chapter 1, namely 1:R12). Here it is used, as you note, with p, q and r having the forms x-b, b-c, c-a; so it takes the form |x-a| _< |x-b| + |b-c| + |c-a|. In that form, it may not be quite so obvious how to prove it; but it clearly follows from the "|p+q+r|" case, whose proof is clear. ---------------------------------------------------------------------- You ask where the proof of Theorem 7.11 (p.149) needs uniform convergence. It is needed in two places: In the second sentence, N is chosen so that any m and n >_ N satisfy (18) simultaneously for all t. Once this is done, one can say that for any such m and n, by taking t close enough to x we can get f_m(t) as close as we want to A_m and f_n(t) as close as we want to A_n, and still know that f_n(t) and f_m(t) will satisfy (18). If we had only pointwise convergence, we wouldn't be able to start this process: To get f_m(t) and f_n(t) close to each other we would need to choose in advance a specific t, but once we had chosen m and n for some such specific t, we couldn't vary t to be sure that f_m(t) and f_n(t) got close to A_m and A_n, without risking losing the relation saying they were close to each other. We use uniform convergence again in choosing n so that (20) holds; we would get the same sort of problem there if we just had pointwise convergence. ---------------------------------------------------------------------- You ask whether the epsilons in (18) and the next display on p.149 are the same. Yes, they are. Although I got the second of those displays in class by using in place of (18) the corresponding inequality with epsilon/3 in place of epsilon, Rudin's method is equally valid. To put his reasoning in a form closer to mine, one could start with (18), then observe that for any epsilon', one can take t close enough to x so that |f_m(t) - A_m| and |f_n(t) - A_n| are both _< epsilon'. Combining with (18), this gives |A_n - A_m| _< epsilon + 2 epsilon'. But this argument shows this is true for all epsilon' > 0, whence we must have |A_n - A_m| _< epsilon. ---------------------------------------------------------------------- You ask whether display (19) on p.149 is gotten by the triangle inequality. Right. Specifically, one needs two applications of that inequality: First one notes that |f(t)-A| _< |f(t)-A_n| + |A_n-A|, and then that |f(t)-A_n| _< |f(t)-f_n(t)| + |f_n(t)-A_n|. (Or one can do it in the reverse order, first inserting f_n(t) between f(t) and A, and then A_n between f_n(t) and A.) This corresponds to the observation I showed diagrammatically on the board in class, by drawing three lines connecting f(t) to A, which I pointed out each corresponded to a length of at most epsilon/3. 
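If you like to see such an argument with concrete numbers, here is a toy instance of (19) in Python (my own made-up example, not Rudin's): take E = (0,1), x = 0, f_n(t) = t + 1/n, so that f(t) = t, A_n = lim_{t->0} f_n(t) = 1/n, and A = 0.

    # Checks the three-term estimate (19):
    #   |f(t) - A| <= |f(t) - f_n(t)| + |f_n(t) - A_n| + |A_n - A|
    # for a few sample values of n and t.
    def f_n(n, t): return t + 1.0 / n
    def f(t): return t
    def A_n(n): return 1.0 / n
    A = 0.0

    for n in (5, 50, 500):
        for t in (0.3, 0.01, 0.0004):
            lhs = abs(f(t) - A)
            rhs = abs(f(t) - f_n(n, t)) + abs(f_n(n, t) - A_n(n)) + abs(A_n(n) - A)
            assert lhs <= rhs + 1e-12
            print(n, t, lhs, rhs)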
----------------------------------------------------------------------
You ask whether on p.149 Rudin needs to choose a neighborhood to get (22) to hold, rather than just choosing "n large enough" as he did to get (20) and (21). Yes, he does. He got (20) and (21) by using the facts that f_n --> f and A_n --> A as n --> oo. But he also needs to use the fact that f_n(t) --> A_n as t --> x (the theorem is certainly not true without this); and this kind of limit-concept is equivalent to a statement about neighborhoods, not to one about taking n large.
----------------------------------------------------------------------
You ask how the proof of Theorem 7.13 on p.150 uses Theorem 4.8 to show that K_n is closed. Actually, this is more immediate from the Corollary to Theorem 4.8, applied to K_n regarded as g_n ^{-1}([\epsilon, infinity)).
----------------------------------------------------------------------
You asked why the proof of Theorem 7.13, p.150, isn't finished once one gets g_n(x) --> 0. Because that only shows that f_n --> f pointwise; it does not show uniform convergence. I.e., given epsilon, we know that for each point x there is an N such that for n >_ N, f_n(x) - f(x) < epsilon, but we don't know that there is one N that works simultaneously for all points x. That is what the magic of compactness gives us. When we have an N such that K_N (and hence all K_n for n>_N) is empty, that means that the set of x for which our inequality fails is empty for all n >_ N, so that N works for all points x.
----------------------------------------------------------------------
You ask why the example f_n(x) = 1/(nx+1) (p.150, first display) would not continue to have the properties Rudin indicates if we threw 0 and 1 into the domain, making the domain compact. Throwing in 1 makes no difference, but if we throw in 0, then lim_{n->infinity} f_n(0) = 1, while the value of the limit function is 0 everywhere else, so that function is not continuous, hence no longer satisfies condition (b) of Theorem 7.13. In fact, while Rudin gives this sequence of functions on (0,1) as an example showing the need for compactness in the theorem, the same sequence of functions on [0,1] shows the need for continuity of f in condition (b) thereof. By Theorem 7.12, the convergence on [0,1] is not uniform.
----------------------------------------------------------------------
You ask how Rudin goes from the next-to-last display on p.150 to the last display. Well, the next-to-last display shows us that |h(x)| _< ||f|| + ||g|| for all x\in X. That makes ||f|| + ||g|| an upper bound for the values h(x) (x\in X), so it is >_ their _least_ upper bound; i.e., we get ||h|| _< ||f|| + ||g||; and inserting the definition of h we have the last display.
----------------------------------------------------------------------
You ask how we get the triangle inequality for the metric space \cal C(R) from the inequality ||f+g|| _< ||f|| + ||g|| at the bottom of p.150. Look at p.17, last step of the proof of Theorem 1.37, where Rudin indicates how to get the triangle inequality for R^k, namely, condition (f) of that theorem, from the inequality corresponding to the one you note, namely condition (e). (The equivalence of those two inequalities is so basic that the condition ||f+g|| _< ||f|| + ||g|| in the general definition of "norm" -- which Rudin doesn't give us here, since we just look at the "sup norm" -- is also called "the triangle inequality".)
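If a numerical check makes the step from the norm inequality to the triangle inequality more concrete, here is a small Python sketch (my own example; the sup norm is only approximated on a finite grid, and the three functions are made up):

    import math

    xs = [i / 1000.0 for i in range(1001)]      # a grid standing in for X = [0,1]

    def sup_norm(F):                            # approximates ||F|| = sup_x |F(x)|
        return max(abs(F(x)) for x in xs)

    def d(F, G):                                # the metric d(F,G) = ||F - G||
        return sup_norm(lambda x: F(x) - G(x))

    f = math.sin
    g = math.cos
    def h(x): return x * x

    # writing f - h = (f - g) + (g - h) and using ||u + v|| <= ||u|| + ||v||
    # gives d(f,h) <= d(f,g) + d(g,h):
    assert d(f, h) <= d(f, g) + d(g, h) + 1e-12
    print(d(f, h), d(f, g) + d(g, h))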
---------------------------------------------------------------------- You ask what it means to say a sequence converges "with respect to the metric on \cal C(X)" in the italicized statement near the top of p.151. When Rudin defined what it meant for a sequence in a metric space to converge, in Definition 3.1, p.47, the definition made use of the metric (= distance function) d, but d wasn't named explicitly in the phrase "{p_n} converges", because it was viewed as "given". However, in a situation where we take something that was just seen as a set (e.g., \cal C(X)), and define a metric on it, this creates a concept of convergence where we didn't have one before, so to speak explicitly of the concept of convergence arising from this metric, we speak of a sequence converging "with respect to the metric". So when you say "Does this just mean that {f_n} converges in \cal C(X)", the answer is "Yes, it means {f_n} converges in \cal C(X), made a metric space using the indicated metric, d(f,g) = ||f - g||." You also say it seems strange that one should talk about them converging in \cal C(X), when they live in X. They are functions on X, but members of \cal C(X); no contradiction here. ---------------------------------------------------------------------- You ask about the "< 1" in the last sentence of the first paragraph of the proof of Theorem 7.15, p.151. Rudin is applying the statement that the functions f_n approach f uniformly, but using it in a very weak way: Although one can get all f_n after some point to be within epsilon of f for any epsilon > 0, here he just needs one such f_n, and just one "epsilon". It doesn't matter what the value of epsilon is, so he takes it to be 1. All he has left to prove is that f is bounded, and the relation |f_n(x)-f(x)| _< 1, plus the boundedness of f_n, are enough for that. ---------------------------------------------------------------------- You ask why it suffices to prove Theorem 7.16 (p.151) for real f_n, as stated in the first line of the proof. This is because C is regarded as R^2, and integration of an R^k-valued function is defined using its real components (Def. 6.23, p.135). Likewise, limits of vector-valued functions are given by componentwise limits. Rudin doesn't say this very explicitly, but the proof of Theorem 4.10 shows the method that can be used. ---------------------------------------------------------------------- You ask why on p.151 Rudin uses the values epsilon_n defined in (24), instead of choosing an arbitrary value epsilon > 0 and noting that for n large enough, the supremum in (24) will be less than epsilon. He could have done it that way, and it would have fit better with the way he does other proofs. On the other hand, what he does here is perfectly good; so by being a little inconsistent in his methods, he has exposed you to a slightly different way of setting up an argument. So it's hard to say whether he "should" have done it differently. If I were editing this proof, perhaps I would use the present structure, but rename what he calls "epsilon_n" as "M_n", following the notation of Theorem 7.9. ---------------------------------------------------------------------- You ask how the Corollary on p.152 follows from Theorem 7.16, p.151. Remember that the sum of a series means the limit of the sequence of partial sums. So if we write g_n(x) = Sigma_i=1 ^n f_i(x), then we can apply Theorem 7.16 to the sequence of g_n and their limit, f. One then expresses the integral of g_n as the sum of the integrals of f_1 , ... 
, f_n (Theorem 6.12(a)), and notes that the right-hand sum in the conclusion of the Corollary is also defined as a limit of partial sums.
----------------------------------------------------------------------
You ask how equation (30) on p.153 is obtained from (29). As Rudin says, he applies the Mean Value Theorem to f_n - f_m. This tells us that the difference between the values of that function at x and at t will be given by a certain expression involving the derivative of f_n - f_m at a certain point s between x and t. If you write down the equation expressing this fact, you will see that the left hand side is precisely the expression in absolute-value signs on the left of (30). The right-hand side will be an expression involving f'_n(s) - f'_m(s). Putting this expression in absolute value signs, and applying (29), you should get the middle step of (30), which it is easy to see is _< the right-hand term of (30).
----------------------------------------------------------------------
You ask whether the inequality below (30) on p.153 is some sneaky application of the triangle inequality. Well, verifying that it follows from the triangle inequality requires no cunning: The term inside the left-hand absolute value signs is the sum of the terms inside the other two, so it is a straightforward case of the triangle inequality. Finding it, if you had to prove this on your own, would take a little ingenuity, but not too much if you keep in mind what you are trying to do, namely to show that f_m - f_n gets small when m and n get large. We know one point where this is true by hypothesis, namely x_0, so we may regard the expression f_m(x_0) - f_n(x_0) as "something small". We also know by (30) that the difference between this and f_m(t) - f_n(t) is "something small", for any t. So voila, we add the first small thing to the small difference and conclude that their sum, f_m(t) - f_n(t), is small. That tells us what terms to use in the triangle inequality.
----------------------------------------------------------------------
You ask about the "much shorter proof" mentioned at the bottom of p.153. Assuming each f'_n continuous, let g_n(x) = "the integral of f'_n"; precisely, the function Integral_a ^x f'_n(t) dt. By Theorem 6.20, this function will have the same derivative as f_n, so f_n - g_n will be a constant, say c_n. By Theorem 7.16, the functions g_n converge to the integral of lim_n f'_n, and from the proof of that theorem one can verify that the convergence is uniform. Moreover, from the assumption that {f_n(x_0)} converges one can prove that the constants c_n converge, so the functions f_n = g_n + c_n converge uniformly to the integral of lim_n f'_n plus lim_n c_n, which is a function with derivative lim_n f'_n.
----------------------------------------------------------------------
You ask how equations (34), phi(x) = |x|, and (35), phi(x+2) = phi(x) on p.154 are consistent with each other. Equation (34) is not assumed to hold for all x, only for x\in[-1,1]. When so understood, it does not contradict (35). (So if you want to know, say, phi(1.7), you cannot use (34) directly because 1.7\notin [-1,1]. But you can use (35) to tell you that phi(1.7) = phi(-.3), and then use (34) to tell you that phi(-.3) = .3. Hence phi(1.7) = .3.)
----------------------------------------------------------------------
You ask where the equation |gamma_m| = 4^m just before the last display on p.154 comes from. It follows from the choice made in the line after (38).
This implies that phi(4^m x) and phi(4^m (x+delta_m)) lie on a straight-line segment in the definition of phi (with no bends between them); hence their values differ by exactly 4^m |delta_m|, hence when one applies the definition (39) one gets |gamma_m| = 4^m.
----------------------------------------------------------------------
You ask how the last display in the proof of Theorem 7.18, p.154, shows that f is not differentiable at x. The fraction between the absolute-value signs is simply the value of |(f(t)-f(x))/(t-x)| when t = x + delta_m. As m-->infinity, delta_m approaches 0, so x + delta_m approaches x. So if (f(t)-f(x))/(t-x) were approaching some real number r as t-->x, these values of that fraction would have to approach |r| as m --> infinity; but as Rudin shows, they instead approach infinity.
----------------------------------------------------------------------
You ask what Rudin means by "finite-valued" in the third line of p.155. He simply wants to emphasize that the function phi takes values in the real numbers, not the extended real numbers. It would have been better if he had said "real-valued".
----------------------------------------------------------------------
You ask how Rudin gets the last display on p.155. It's a standard result that for any two distinct positive integers m and n, the functions sin mx and sin nx are "orthogonal" on the interval [0,2pi], i.e., that the integral of their product is zero. Hence when one expands the square in the integrand in that display, the mixed terms have integral zero, and one just gets the sum of the integrals of two "sin^2" functions, which is easy to evaluate. The orthogonality referred to above can be proved by trigonometric identities (there's an identity expressing a product of sines as a linear combination of cosines); another way, if you are familiar with the elementary properties of complex exponentials, is to use the formula sin x = (e^ix - e^-ix)/2i. Then (sin mx)(sin nx) can be expanded as a linear combination of four exponential functions, none of which is constant, so each of them, as x moves over the range [0,2pi], goes a nonzero number of times around the unit circle in C, and has integral 0. Another way to see this is to pair each of those exponentials off with the exponential having the negative of the same exponent. Each such sum gives us a constant times a sine or a constant times a cosine, and these each integrate to zero.
----------------------------------------------------------------------
You ask why Rudin considers the functions of Example 7.21 (p.156) only on [0,1], though they have the same properties on the whole real line. It is because some results that are true on compact sets are not true on arbitrary metric spaces. He noted immediately after the proof of Theorem 7.13 an example showing that the statement fails on a non-compact set. So here he wants to show you that _even_ on a compact set, existence of a uniformly convergent subsequence can fail. This is preparation for Theorem 7.25, which will show that the condition of equicontinuity, together with compactness of the given space, do imply existence of a uniformly convergent subsequence.
----------------------------------------------------------------------
You ask about the meaning of a "family \cal F" of functions in the definition of equicontinuity, Def.7.22, p.156. As Rudin is using the term "family", it simply means "set".
So the definition says that for every epsilon, there is a delta that works simultaneously for all the different functions f in the set; i.e., that makes |f(x)-f(y)| < epsilon for all choices of x and y with d(x,y) < delta.
----------------------------------------------------------------------
You ask why Theorem 7.23 is put right before Theorem 7.24 (pp.156-157) -- whether this is because it is needed for the proof of the latter. No, they are both placed where they are because of their relation to Theorem 7.25. Theorem 7.23 will be used in proving Theorem 7.25, while Theorems 7.24 and 7.25 are partial converses to each other, both concerning the relationship between equicontinuity and uniform convergence.
----------------------------------------------------------------------
You ask why the fact, noted parenthetically on p.157 three lines above 7.24, that the first n-1 terms of the sequence S may not belong to S_n, does not interfere with the conclusion that S converges at x_n. The convergence of a sequence is not affected by modifying finitely many terms at the beginning. Think about a convergent sequence, such as 1, .1, .01, .001, ... . Now suppose we change a few terms at the beginning, getting 3, 17, .1, .01, .001, .0001, ... . Do you see that the resulting sequence still approaches zero; that the "3, 17" have no effect on this? Now can you turn this into a general proof? So as Rudin observes, since {f_n,k (x_n)} converges, the subsequence consisting of the members of that sequence that belong to {f_k,k (x_n)}, being a subsequence of a convergent sequence, also converges, and hence the sequence S itself, which consists of those terms with possibly finitely many modifications at the beginning, also converges.
----------------------------------------------------------------------
You ask about the difference between Rudin's taking sup(M_i) on p.158, and the "sup" that you suggested a few days ago, which I said could not be used because the supremum might not be finite. The difference is that there are only finitely many M_i, and a finite set of real numbers always has a finite supremum, namely the largest of those numbers. (You ask whether the difference is because here the space is compact. Not directly. But compactness, expressed as a finite-covering property, allows one to choose the finite set of points p_1,...,p_r, in terms of which the finitely many M_i are chosen. So indirectly, yes.)
----------------------------------------------------------------------
You ask about an application of the uniformly convergent subsequences given by Theorem 7.25 (p.158). Exercise 7:R25 (not assigned) is such an application, showing existence of solutions to certain differential equations. I had hoped to talk about it today; if I have time, I'll do so Friday; if not, then probably on one of the days after the final reading. The situation is such that by methods of "successive approximation" one gets functions that come closer and closer to satisfying the differential equation, yet, contrary to what one expects of such methods, don't necessarily come closer and closer to each other, so they don't necessarily form a convergent sequence of functions. But by Theorem 7.25, the sequence of approximate solutions has a uniformly convergent subsequence, and one proves that the limit of this subsequence is a genuine solution to the equation.
----------------------------------------------------------------------
You ask whether Theorem 7.26 (p.159) would work equally well on open intervals. No.
For instance, the function f(x) = 1/x on the interval (0,1) cannot be represented as a uniform limit of polynomials, because f(x) is unbounded, and a uniform limit of bounded functions is bounded.
----------------------------------------------------------------------
Regarding Weierstrass's Theorem (p.159) you ask

> ... what if the domain is a compact space in the complex plane ... ?

We'll see in the very last reading that the Stone-Weierstrass Theorem needs an extra condition for complex-valued functions; in particular, the uniform closure of the algebra of polynomials in a single complex variable z on, say, the unit square of the complex plane is far from consisting of all continuous functions. Functions belonging to that closure will in fact be infinitely differentiable on the interior of that square. That is something you will see proved in Math 185! However, if one considers the algebra of all real polynomial functions in two real variables x and y on the unit square, that will have all continuous real valued functions as uniform closure. And even if one takes all polynomials with complex coefficients in x and y, that will have all continuous complex valued functions as uniform closure -- that algebra satisfies the extra condition to be introduced in the final theorem of this course.
----------------------------------------------------------------------
Regarding Weierstrass's Theorem, p.159, you ask:

> If complex functions are part of real analysis, then what is
> Math 185 about ...

Math 185, and generally speaking, Complex Analysis, concerns the very surprising properties of functions f from an open set in the complex plane to the complex numbers that are differentiable, in the sense that lim_{t->x} (f(t)-f(x))/(t-x) exists. (Since the real numbers don't form an open subset of the complex plane, differentiable functions of a real variable don't fall under this description.) Until we touch on this, just about anything we prove about complex-valued functions is just a consequence of facts about real-valued functions, and so belongs under real analysis.
----------------------------------------------------------------------
You ask about the first equality of the multi-line display on p.160. As you note, P_n(x) on the left corresponds to the integral of f(x+t) Q_n(t) dt on the right, so the question is whether one can justify f(x) being the integral from -1 to 1 of f(x) Q_n(t) dt. And indeed, one can! The integral is with respect to t, so in this integration, f(x) is a constant and can be pulled out, giving us f(x) times the integral from -1 to 1 of Q_n(t) dt. And, as stated in (48) (with a different variable of integration), that integral is 1.
----------------------------------------------------------------------
You ask about the relation between the sense given to the word "algebra" in Definition 7.28 (p.161) and other meanings of the term that you have seen. These meanings are essentially the same -- one has a system with operations of addition, internal multiplication, and multiplication by some external "scalars". In Rudin, the "scalars" are real or complex numbers; in the concept of R-algebra that you have seen, they are members of a commutative ring (of which the real and complex numbers are special cases).
Certain conditions (such as commutativity and associativity of addition, and distributivity for internal and scalar multiplication) are always assumed when one speaks of an algebra; others, such as commutativity and/or associativity of internal multiplication, may or may not be assumed. You ask why one doesn't have more specific terms. One does. One can speak of R-algebras (R a commutative ring) or k-algebras (k a field); one can speak of commutative algebras, not-necessarily-commutative associative algebras, and nonassociative algebras such as Lie algebras. In a context where only one sort of algebra is being considered, an author may conveniently use "algebra" to refer to that concept; so Rudin here uses "algebra" to mean "subalgebra of the commutative C-algebra of all complex functions on a set E, under pointwise operations". There is also a much more general sense of "algebra" than these: In the area of math called "universal algebra" (or "general algebra"), all of the sorts of objects considered in the area of algebra -- groups, rings, lattices, vector spaces, etc. -- and in general, all structures consisting of a set with a family of specified operations on it -- are called "algebras". The ambiguity between this sense of "algebra" and the more specific sense of the preceding paragraph is a real problem; but we don't have an alternative term for either concept, so it looks as though we will live with it for a long time to come.
----------------------------------------------------------------------
You ask about the definition of the uniform closure of a set of functions in Definition 7.28, p.161, versus the general definition of closure in Definition 2.26, where the latter expresses the closure as a union of two kinds of points, and the former uses a single criterion. The approach to closure given by Definition 2.26 is one of the things I like least about Rudin's book. If you look back at Definition 2.18(b) where "limit point" is defined, Rudin specifies that every neighborhood of p should contain a point of E _other_than__p__itself_. If one combines this with Definition 2.26 and thinks things through, one sees that a point p lies in the closure of E if and only if every neighborhood of p contains a point of E (whether equal to p or not). I would consider that as the natural concept, and for the occasional situation where one wants to refer to what Rudin calls limit points of E, I would say that p is a limit point of E if and only if p lies in the closure of E - {p}. Starting from the above definition of closure (or criterion for closure, if one takes Definition 2.26 as the definition), one finds that a point of a metric space X lies in the closure of a subset E if and only if there is a sequence of points of E having that point as limit. The equivalence is easily seen from Theorem 3.2(d) and (e) (where (e) is the extra part of that theorem given in the Errata sheets). And this result reconciles the definition of uniform closure with our previous concept of closure in a metric space.
----------------------------------------------------------------------
Both of you ask, regarding Definition 7.28, p.161, whether the uniform closure of an algebra of functions must contain that algebra. Certainly! If f\in A, then we can form a sequence (f_n) taking all f_n to be f, and this sequence will approach f uniformly, so f will belong to the uniform closure.
----------------------------------------------------------------------
You ask for an example of an algebra that is not uniformly closed (Def.
7.28, p.161). The algebra of all polynomial functions on an interval [a,b]! Theorem 7.26 shows that its closure is the algebra of all continuous functions on [a,b], but it itself does not contain all continuous functions, so it is not closed.
----------------------------------------------------------------------
You ask about the last line of p.161, where Rudin applies Theorem 2.27 to the algebras occurring in the proof of Theorem 7.29; you note that that theorem was stated for metric spaces. Rudin is being a little sloppy in not saying what sort of set the functions of Theorem 7.29 are defined on. If they are defined on a metric space and if they are continuous, then they belong to the metric space C(X) of Definition 7.14, p.150, justifying his argument. In fact, arbitrary bounded complex-valued functions on an arbitrary set E can be made into a metric space in exactly the same way, and uniform convergence is again convergence in the uniform metric, so using this observation, his assertion can be justified. This construction C(E) can be reduced to the construction C(X) that he gives by making E a metric space in which every pair of distinct points has d(x,y) = 1. With respect to that metric, every function is continuous, so C(E) is indeed just the set of all bounded functions. The only difficulty is pedagogic -- that argument involves putting on E a particular metric, and if we want to apply the result in a situation where a different metric already exists, this can be confusing. But there is no logical difficulty; mathematicians can and do consider different metrics on the same space, and study the relation between functions continuous in one metric and functions continuous in the other. We just haven't been doing it in this course. So in conclusion: The statement at the bottom of p.161 can be justified, but Rudin ought to say more.
----------------------------------------------------------------------
You ask whether the statement "A separates points" defined in the first paragraph of p.162 means that there is one function f such that for every pair x_1, x_2 of distinct points, f(x_1) is distinct from f(x_2), or just that for each such pair of points there is a function f which separates them. It means that for each pair of points there is a function f which separates them. This follows from Rudin's wording: "... to every pair of distinct points x_1, x_2 \in E there corresponds a function f ...". (When we look at families of functions on subsets of the line, it often happens that there is one function that does it for all pairs of points; but if we look at continuous real functions on more complicated spaces -- e.g., on a 2-dimensional region in R^2, or even on the circle {(x,y) | x^2 + y^2 = 1}, we find that no one such function can separate all pairs of points; yet the family of all such functions does.)
----------------------------------------------------------------------
You ask how Rudin comes up with the functions u and v defined in the proof of Theorem 7.31, p.162, and why they satisfy the equations he shows. First, perhaps the way he writes those functions is not entirely clear. Having fixed two points x_1 and x_2, and chosen g, h, k \in A as in the first display of the proof, he gives equations defining u and v; in those definitions, whenever "g", "h" and "k" appear without an argument, they denote those functions, while when they have an argument x_1 or x_2, they denote the constants that one gets when one substitutes x_1 or x_2 respectively into those functions.
So the definitions he gives actually mean that for all y\in E, u(y) = g(y)k(y) - g(x_1)k(y) and v(y) = g(y)h(y) - g(x_2)h(y). His goal in making these definitions was to get a function u that would be zero at x_1 but not x_2, and a function v that would be zero at x_2 but not x_1. One way of looking at how one could come up with those formulas is to imagine that first one said, "Oh, since g has different values at x_1 and x_2, we can just subtract from it the constant g(x_1), and we'll have a function that is 0 at x_1 but not x_2." Then we remember that our algebra A may not contain the constants, so that idea may not give a function in A. But aha -- every member of A _times_ any constant is a member of A, so if we take the above first attempt, namely g - g(x_1), and multiply it by any k\in A, we will get a member of A, and clearly it will still be zero at x_1. But will it still be nonzero at x_2? It will if k is; and by the "nowhere vanishing" assumption on A, we can choose k so that it is indeed nonzero at x_2.
----------------------------------------------------------------------
Both of you ask whether uniform limits such as are considered in Theorem 7.32, p.162, can give discontinuous as well as continuous functions. If we start with an algebra A of continuous functions, then anything obtainable from functions in A by uniform limits will also be continuous. That was one of the early results in this chapter, Theorem 7.12, p.150. Of course, if we started with an algebra A containing some discontinuous functions, we would also get discontinuous functions among our limits.
----------------------------------------------------------------------
You ask about the relation between continuity of h_y, and display (56) on p.164. The nicest way to get the connection is from Theorem 4.8. In applying that theorem here, we can take Y = R, the function referred to in the theorem to be h_y(t) - f(t), the open set V to be (-epsilon, +infinity), and J_y to be the inverse image of that set under that function. Alternatively, we can apply Definition 4.1, again to h_y(t) - f(t), conclude that that function is within epsilon of 0 in some neighborhood of y, hence in particular, is > -epsilon in that neighborhood.
----------------------------------------------------------------------
You ask about the use of "adjoint" in the term "self-adjoint algebra" on p.165. I don't know the earliest history of the use of the term, but I can sketch the general use of which this case is an instance. Suppose we have a real or complex finite-dimensional inner product space V, in the sense of Math 110. A basic property of such a space is that every linear functional f on V, i.e., every linear function f from V to the base field, R or C, is given by the inner product with a fixed element of V. I.e., given f, there exists a unique x_f\in V such that for all y\in V, f(y) = (y, x_f) (or y . x_f if that was the notation you saw.) Now given a linear map among such inner product spaces, T: V --> W, we know that for every linear functional f on W, the composite map f o T will be a linear functional on V; and this construction acts as a linear map from the space of linear functionals on W to the space of linear functionals on V. That is true for any vector spaces; but because these are inner product spaces, linear functionals on them correspond to elements, as in the preceding paragraph. Thus for any linear map T: V --> W we get a linear map T^t: W --> V, called the "adjoint map".
It is characterized by the equation (T x, y) = (x, T^t y) for x\in V, y\in W. (In terms of orthonormal bases, it is given by the conjugate transpose of the matrix of the original map. This is the "adjoint" you saw in Math 110.) In particular, if T is a linear map from a finite-dimensional inner product space V to itself, called a "linear operator" on V, then its adjoint is another linear operator on V; and one calls an algebra A of linear operators on V "self-adjoint" if for every T\in A, we also have T^t\in A. Now a large part of modern analysis consists of the study of infinite-dimensional vector spaces having properties like those of finite-dimensional inner product spaces. The nicest class of these, called "Hilbert spaces", are characterized by the condition that the _continuous_ linear functionals are precisely the functionals given as in the first paragraph above by inner products with fixed elements. So one of the favorite kinds of objects for analysts to study are algebras of continuous operators on Hilbert spaces, and a powerful property to assume for such an algebra is self-adjointness. Now recall that while real inner product spaces satisfy the condition that the inner product is linear in each variable, on a complex inner product space, the inner product is linear in the first variable and conjugate-linear in the second. One finds as a consequence that in an algebra of operators on a complex Hilbert space, the adjoint operation will be conjugate-linear. Finally, continuous functions on a compact metric space, such as we have been studying in Rudin, can be regarded as operators on appropriately constructed Hilbert spaces. (If we restrict to the finite-dimensional case, this corresponds to looking at n-tuples of complex numbers (z_1,...,z_n) as defining diagonal matrices, with z_1,..., z_n on the diagonal; a small numerical sketch of this correspondence is given after the last answer below.) An algebra of such functions corresponds to an algebra of such operators; and the latter will be self-adjoint in the sense sketched above if and only if the former is self-adjoint in the sense defined in Rudin!
----------------------------------------------------------------------
You ask about the justification for the equation "f = lambda g" in the sixth line of the proof of Theorem 7.33, p.165. That equation is taken to define f. The first sentence of the paragraph refers to any f\in A, and the conclusion of that sentence is applied to two particular functions f later in the paragraph: in the second sentence, to any f satisfying f(x_1) = 1 and f(x_0) = 0; in this sentence, to the function defined by f = lambda g. I can see how you might have assumed that "f" denoted the same function as in the preceding sentence. The best I can say on how one would realize that it did not is to pay attention to the "rhythm" of the proof: At the end of the preceding sentence, the verification that A_R separates points is over with, so the notation that went with that verification should (probably) be forgotten.
----------------------------------------------------------------------
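Following up the answer above on "self-adjoint": here is the small numerical sketch promised there (my own illustration, not something from Rudin or from a Math 110 text). It checks, for a made-up diagonal operator T on C^3 given by a triple (z_1, z_2, z_3), that the conjugate-transpose operator satisfies the characterizing equation (Tx, y) = (x, T^t y); so conjugating the triple (the "function") corresponds to taking the adjoint of the operator.

    def inner(x, y):
        # the standard inner product on C^n: linear in the first variable,
        # conjugate-linear in the second
        return sum(a * b.conjugate() for a, b in zip(x, y))

    z = [2 + 1j, -3j, 0.5 - 0.5j]      # a made-up "function" on a 3-point set

    def T(x):                          # the diagonal operator determined by z
        return [zi * xi for zi, xi in zip(z, x)]

    def T_adj(y):                      # its conjugate transpose: the diagonal operator for conj(z)
        return [zi.conjugate() * yi for zi, yi in zip(z, y)]

    x = [1 + 2j, 3 - 1j, -2 + 0j]
    y = [0 + 1j, 1 + 1j, 4 - 3j]
    # the characterizing equation (Tx, y) = (x, T^t y):
    assert abs(inner(T(x), y) - inner(x, T_adj(y))) < 1e-12
    print(inner(T(x), y), inner(x, T_adj(y)))
----------------------------------------------------------------------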