---------------------------------------------------------------------- You ask how one gets the equation |b||q|=|a| in the 4th paragraph of p.4. Note that that equation is followed by the words "for some q". Note also that the sentence is referring to the situation where b|a. Now "b|a" means that a = bq for some q; and if we take absolute values, we get the equation you asked about. Moral: look at the context. ---------------------------------------------------------------------- You ask whether the Well-Ordering Principle (p.4) can be proved. Within the context of Set Theory (Math 135), when one gives the construction of the set of natural numbers, one can show that it satisfies the Well-Ordering Principle. The question of how the number system is constructed set-theoretically is outside the scope of Math 113, so in this course we have to take the Well-Ordering Principle as an axiom. However, as I mentioned in class yesterday, the Well-Ordering Principle is equivalent to the principle of Mathematical Induction; so if you accept Mathematical Induction as an axiom, you can prove the Well-Ordering Principle from it! (The idea is that if S is a subset of the natural numbers which does not have a least element, then one proves by induction that for all n, n is not a member of S, which says that S is empty. This gives the Well-Ordering Principle in contrapositive form.) ---------------------------------------------------------------------- You ask what is meant by a lower bound on a set (p.4, bottom). This is an element that is less than or equal to all members of the set. (It is only meaningful for sets that belong to some ordered system, such as the set of real numbers. In contrast, there is no concept of one complex number being "less than" another, so we can't talk about a set of complex numbers having a lower bound.) ---------------------------------------------------------------------- You ask how one knows to begin the proof of Theorem 1.1.3, p.6, by taking the set R = {a-bq : q\in Z}. By thinking about what the "idea" of division-with-remainder is. The idea is to pull out of a as big a multiple of b as one can; equivalently, to pull out a multiple of b that will leave the smallest possible remainder. The possible remainders are the elements a-bq that are >_ 0, so we take the set R of all elements a-bq, look at the subset R^+ of its nonnegative elements, and apply the well-ordering principle. ---------------------------------------------------------------------- You ask how, in the proof of Theorem 1.1.4, p.7, we can show that I is contained in bZ when we don't know what I consists of. Well, whatever I consists of, we choose b _in_terms_of_that_set_ I, as described in the last sentence of the top paragraph on that page. The argument of the third paragraph shows that if we choose b in that way, then I can't contain anything outside of bZ. I hope that when you re-read the proof, this will make sense. If it still feels unsatisfactory, I suggest you try to put together a set I that is closed under + and - for which the theorem is false. Seeing how an attempt to do this falls through will help show you why the result is true. ---------------------------------------------------------------------- You ask, regarding the definition of the greatest common divisor on p.7, how to prove that condition (ii) is satisfied. A definition simply says what meaning we will give to a word or phrase, so a definition doesn't have to be proved!
If you are used to a different definition of "greatest common divisor", namely the largest of the integers that divide both a and b, then the book's definition shows that the meaning we will be giving the phrase in this course is _not_ the same as the one you are used to; and just as when you switch from one book to another, you have to adapt to different uses of symbols, so in using this book, you will have to adapt to a different meaning being given to this phrase. Of course, it is natural to ask whether the different definitions (the one you are used to and the one Beachy and Blair give) might really describe the same thing; and the answer turns out to be yes! For, once they prove (on the next page) that any two integers (not both zero) have a "greatest common divisor" d in their sense, it follows that if we let d' be the greatest common divisor in the sense you are used to, then d' must be a divisor of d (by the condition d is assumed to satisfy), hence must be _< d, but it will also be >_ d (by the condition d' is assumed to satisfy), so d = d'. As I commented in class, though both definitions are (as this shows) equivalent for integers, the concept of "greatest common divisor" is also useful for other entities (such as polynomials), for which the book's concept is useful but the "familiar" one is meaningless. This is the justification for their introducing their unfamiliar definition. ---------------------------------------------------------------------- You ask about the reason for the statement at the end of the second paragraph on p.9, "if b > 0 and b|a, then (a,b) = b". Have you checked the definition of (a,b), and tried to figure out whether under the stated assumptions, b will satisfy that definition? If so, let me know how far you got in the check, and where you got stuck. ---------------------------------------------------------------------- You ask why the equation (a,b) = (b,r_1) = (r_1,r_2) = ... (the last display on p.9) holds. Each step is an application of the relation "(a,b) = (b,r)" proved in the third paragraph on the page. The first "=" is the case of that relation where the given a and b are used for the a and b of that relation; the second uses b in the role of the "a" of that relation and r_1 in the role of the "b". (You should check that r_2 then fits the role of the "r" of that relation.) And so on. ---------------------------------------------------------------------- I hope my discussion of the proof of the Fundamental Theorem of Arithmetic in class made clear the point you asked about: The contradiction at the end of the proof (p.19) is between the assumption that a is the _smallest_ integer > 1 having nonunique factorization (first sentence of last paragraph of proof) and the conclusion that the _smaller_ integer s also has nonunique factorization. ---------------------------------------------------------------------- You ask how Proposition 1.2.10 implies that gcd(a,b)*lcm[a,b] = ab, as claimed on p.22. Well, on seeing that claim, there are some obvious things you should have done: Since the proposition gives formulas for the factorizations of gcd(a,b) and lcm[a,b], and since what the authors claim is a formula for their product, you should have multiplied the formulas for gcd(a,b) and lcm[a,b] together and compared them with the factorization of ab. If the exponents on the corresponding primes did not then look the same to you, you should have asked yourself "Can I think of a reason why they must be equal?", and if not, then raised this as a question.
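(If, after thinking it through, you want a quick numerical sanity check of the claim itself -- this is an illustration of my own, not anything in the book -- a few lines of Python will compare gcd(a,b) lcm[a,b] with ab for some sample pairs, computing the l.c.m. independently of the gcd rather than by the very formula we are trying to check:

    from math import gcd

    def lcm(a, b):
        # smallest positive integer divisible by both a and b,
        # found by direct search so that the test is not circular
        m = max(a, b)
        while m % a != 0 or m % b != 0:
            m += 1
        return m

    for a, b in [(12, 18), (4, 9), (60, 84)]:
        print(a, b, gcd(a, b) * lcm(a, b) == a * b)   # True in each case

Of course, such a check is no substitute for seeing why the exponents on the two sides must agree.)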
One can't read a math text passively; one has to use one's mind throughout. When you get stuck -- when you hit something that you can't overcome -- that is what you should ask about. And when you do so, you should say how far you got, so that the hurdle that I help you over is the one that is stopping you, and not one that you got over yourself but didn't mention. (If, in following this approach, you get over all the hurdles in the reading by yourself, then all the better -- submit a "pro forma" question briefly noting one of those difficulties and its solution.) ---------------------------------------------------------------------- You ask how Proposition 1.2.10 shows that gcd(a,b) lcm[a,b] = ab, as asserted on p.22. Your analysis of the situation starts right -- you note that gcd(a,b) lcm[a,b] = p_1^{delta_1 + mu_1} ... p_n^{delta_n + mu_n}. The next step is to write down what ab is, namely ab = p_1^{alpha_1 + beta_1} ... p_n^{alpha_n + beta_n}. So the question of whether the asserted equality holds comes down to whether for each i, we have delta_i + mu_i = alpha_i + beta_i. To answer this, go back to how delta_i and mu_i are defined in terms of alpha_i and beta_i, and see whether this equality must hold. If you aren't sure about that, ask again! ---------------------------------------------------------------------- You ask about the sentence "you may divide both sides of a congruence by an integer a only if (a,n) = 1" near the top of p.27. I think that what the authors mean is "we only know (from the preceding proposition) that doing this gives correct results in the case where (a,n) = 1". They then show by example that in (some) cases where (a,n) is not 1, it does not give correct results. They are not actually claiming to show that for every pair of integers a and n with n > 0 and (a,n) not equal to 1, there is an example where it fails. But in fact, for every such a and n there is such an example. You might like to look for a proof of this yourself. (But it is not a fact you are responsible for.) ---------------------------------------------------------------------- You ask why, as stated on the bottom of p.30, x=ay+bz is a solution to the pair of congruences x==a(mod n), x==b(mod m), writing "I don't feel too confident about solving congruences in general,...". This isn't a question of your being able to _solve_ congruences, but of your being able to _check_ whether something is a solution. Looking at the element ay+bz, you need to check what its value mod n is, and what its value mod m is. To do the first, look at the preceding displayed equations on that page, and note that mod n, the "y" of ay+bz is 1, and the "z" is 0; so you can substitute in and find x == a.1 + b.0 = a (mod n). (It is valid to substitute in this way, by Proposition 1.3.3.) Similarly, using the other two displayed congruences, you find that x == b (mod m). After checking that this works, you then have a tool for solving pairs of congruences x==a(mod n), x==b(mod m); namely, if you can find y and z as in the display shown, then ay+bz will be the desired solution. (And the proof of the next theorem, on p.31, shows you how to find such y and z.) ---------------------------------------------------------------------- You ask, in the proof of Theorem 1.3.6, p.31, how the equation x = arm + bsn follows from rm + sn = 1. It doesn't follow -- if it did, the authors wouldn't precede it by the words "we let" -- they would say "we deduce" or "we see"!
They are not saying that equation has to hold; they are saying "let's try this out and see whether it works". And they tell you why it is that expression rather than another that they are trying out: because of the discussion of "the preceding paragraph"; i.e., the last paragraph of p.31. Did you read that paragraph and understand it? ---------------------------------------------------------------------- You ask whether Z_n, defined on p.35, is closed under addition and subtraction. The phrase "closed under addition", properly used, is only meaningful when we are talking about a subset S of a larger system T such that the sum a+b is defined for all members a, b of T. Then we say that S is "closed under" addition if for all a, b\in S, the element a+b of T is also a member of S. But Z_n is not constructed as a subset of some known set on which addition is defined, so it is not meaningful to speak of it as being "closed under" the operation. One can ask whether there is a reasonable way to define addition on that set, and this is answered by Proposition 1.4.2, p.37. (And everything I said about addition goes for subtraction as well.) ---------------------------------------------------------------------- You ask how to think about the symbol [-a]_n on p.37 (in the display "Additive inverses", near the bottom of the page), asking whether you should take it as meaning [n-a]_n. If this is a problem for you, that probably means that when you see [a]_n, you are thinking that a is one of 0, 1, ..., n-1. Although the authors do note in the last display on p.35 that every element of Z_n can be written [a]_n for some a in {0,...,n-1}, Definition 1.4.1 earlier on the page defines [a]_n for every integer a. When a is < 0 or > n-1, then the set [a]_n given by that definition is equal to one of the sets in that list. But convenient though that list is, you should bear in mind that when one writes [a]_n, the a can be any integer. ---------------------------------------------------------------------- You ask how the statement that an element with a multiplicative inverse is not a divisor of zero (p.38, just before Proposition 1.4.5) is compatible with the equation [1]_2 . [2]_2 = [0]_2 in Z_2, where [1]_2 has a multiplicative inverse. Note that the definition of "divisor of zero" refers to a "nonzero congruence class [b]_n". But [2]_2 is not a nonzero congruence class, since [2]_2 = [0]_2. ---------------------------------------------------------------------- Concerning the proof of Proposition 1.4.5(b), p.39, you ask why the authors can say that there exist k, b with n = kb and a = bd. They give the reason at the beginning of the sentence, where they write "Since d|n and d|b". (Do you see why d|n and d|b ? If not, ask yourself "What is d ?") ---------------------------------------------------------------------- You ask about the proof of Proposition 1.4.10, p.41, "using the formula ([a][b])^{-1} = [b]^{-1}[a]^{-1}" that the authors refer to. Given units [a] and [b], check that [b]^{-1}[a]^{-1} satisfies the definition of an inverse to [a][b]. You will see that it does so; this shows that [a][b] is invertible, which is what the proposition claims. ---------------------------------------------------------------------- You ask how the proof of Euler's theorem (p.41) uses Proposition 1.3.3(b). The authors mean the second statement of Proposition 1.3.3(b), "if ac == ad (mod n) and (a,n) = 1, then c == d (mod n)".
This is here being used in contrapositive form: We know that the elements a_i, a_j are _not_ congruent (mod n), and that (a,n) = 1, hence the products a a_i and a a_j are _not_ congruent; i.e., as the authors say on the last line of p.41, the congruence classes of these products are _distinct_. Although the reference to Proposition 1.3.3(b) does not say this as explicitly as it might, you ought to have been able to deduce what was meant: The authors say "because (a,n) = 1", and of the two sentences of Proposition 1.3.3(b), it is the second that assumes that condition, so it must be that second sentence that they are using. That sentence relates a congruence between products and a congruence between the unmultiplied terms; and comparing this with what is going on in the proof of Euler's theorem, you should see that the same sort of relationship is being considered there, but with the condition of _not_ being congruent (belonging to _distinct_ congruence classes) being the issue; so the statement from the Proposition should be used in contrapositive form. ---------------------------------------------------------------------- You ask about the argument showing (a+b)^p == a^p + b^p (mod p) on the lower half of p.42, and the role of the binomial coefficients. The first display of that argument, the long equation beginning "(a+b)^p = ...", is the Binomial Theorem. The coefficients of the successive terms a^p, a^{p-1} b, a^{p-2}b^2, ... , b^p are called "binomial coefficients"; the general formula for the coefficient of a^{p-k} b^k is shown by the second display of that argument: p!/(k!(p-k)!). The authors point out that this is divisible by p whenever k is neither 0 nor p. Now as I emphasized in class, when one considers congruences modulo p, being divisible by p is "being 0"; so when we look at the expansion of (a+b)^p modulo p, every summand is zero except those with k = 0 and k = p. So we can throw the zero terms away, and are left with the two summands, a^p and b^p; i.e., (a+b)^p == a^p + b^p (mod p). ---------------------------------------------------------------------- You ask about the expression for the complex numbers near the bottom of p.49, "C = {a+bi | a,b \in R and i^2 = -1}". It's poor notation on their part. I would write "C = {a+bi | a,b \in R} where i^2 = -1". In other words, i is not ranging over all entities whose square is -1; rather, it is a fixed element having that property. The quickest way to construct such a "C" from R is, as you say, as the set of ordered pairs with appropriate definitions of sum and product. In that case, every element has the form a+bi with "a" meaning (a,0) and "i" meaning (0,1). But there are other ways of modeling the properties that C should have, as we will see later in the course; and choosing one of them is not relevant at this point; they just want to refer to C as a familiar object for illustrating the concepts to be introduced. So giving a generic description is appropriate here, except for the matter of notation I mentioned to begin with. ---------------------------------------------------------------------- You ask about the difference between "codomain" and "image", as defined on p.50. Well, suppose we define a function f: Z --> Z by f(x) = x^2. Then the codomain, by definition, is Z, while the image is the subset of square integers. The difference between them is the result of the fact that this function is not "onto".
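(If a concrete computation would help fix the distinction in your mind, here is a little sketch of my own, not from the book, looking at that function on a finite sample of inputs:

    # f: Z --> Z given by f(x) = x^2.  The codomain is all of Z;
    # the image is only the set of perfect squares.
    def f(x):
        return x * x

    sample_inputs = range(-5, 6)
    image_sample = {f(x) for x in sample_inputs}
    print(sorted(image_sample))   # [0, 1, 4, 9, 16, 25]
    print(3 in image_sample)      # False: 3 lies in the codomain Z, but not in the image

Every output lands in the codomain Z, but most elements of Z are never hit; that is exactly what the failure of "onto" means.)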
---------------------------------------------------------------------- You ask whether the authors' description of what it means for a function to be "well-defined" on p.52 is the same as "one-to-one". No. "One-to-one" means that two different inputs can't correspond to the same output. What they are saying "well-defined" means is approximately that one input can't correspond to two different outputs. Notice the difference! "Well-defined" is something that every function, to be a function, has to be. "One-to-one" characterizes a certain special subclass of the functions. (Their definition of "well-defined" also includes the condition that there can't be inputs with no output, and that the rule giving the output for each input be unambiguous; above I emphasized the aspect of the definition that I think you were confusing with "one-to-one".) ---------------------------------------------------------------------- You ask for concrete examples illustrating Proposition 2.1.3 on p.55. If you are ready for an upper-division math course, then you should be ready to construct concrete examples of your own for basic facts stated in abstract form. As the beginning of the authors' proof says, the statement concerns four sets S, T, U, V and three functions connecting them in the indicated way -- any four sets and any three functions with the stated domains and codomains. So you should be able to take some four sets that you are comfortable with (they don't have to be different; e.g., they could all be Z) and any three functions with f: S-->T, g: T-->U, etc., and think about what the statement means for that case. That doesn't mean you should never ask for examples. In some situations examples satisfying the conditions that are referred to may be tricky to find. But you should see for yourself first; and there is nothing tricky in this case. ---------------------------------------------------------------------- Regarding Definition 2.1.4 on p.55, you ask how to prove a function is onto. The definition is that a certain statement holds "for each element y\in T". So a proof that the condition holds should generally begin "Let y be an element of T"; then you need to show that the conclusion holds, i.e., that under that assumption, there will exist an x\in S with f(x) = y. How you will prove that will depend on what you know about f, and the methods of the situation where you are working. For instance, in a problem about continuous functions on the real line, you might use the Intermediate Value Theorem to find x with the desired property. In a problem about finite sets, you might use a counting argument. ---------------------------------------------------------------------- You ask whether the definition of a function f being one-to-one (p.55) can be turned around to say "f is 1-1 if a=b implies f(a)=f(b)". No; rather, _every_ function f satisfies "a=b implies f(a)=f(b)"; that is implicit in the definition of a function. You should take some familiar examples of non-one-to-one functions, such as f(x) = x^2 (a function Z-->Z) or f(x) = [x]_5 (a function Z-->Z_5), and check for yourself that they all satisfy "a=b implies f(a)=f(b)". ---------------------------------------------------------------------- You ask whether the result of Proposition 2.1.8, p.58, might be true for some, even though not for all, infinite sets.
The last sentence of section 2.1, on p.59, observes that this is not true: that the failure of the result of that proposition "characterizes" infinite sets, i.e., that not only is every set for which it fails infinite, but it fails for every infinite set. However, if, instead of restricting only the sets, one also restricts the functions in certain ways, then there are more cases where the proposition holds. For instance, if S and T are two finite-dimensional vector spaces of the same _dimension_, then even though they are infinite as sets, a basic result of linear algebra says that a _linear_map_ f: S --> T is 1-1 if and only if it is onto. ---------------------------------------------------------------------- You ask whether, for ~ an equivalence relation on S, and a an element of S, the set [a] of Definition 2.2.2, p.63, is a member of S. No, it is a subset of S. ---------------------------------------------------------------------- You ask about the comment after Definition 2.2.2, p.63, "but once the factor set has been constructed we no longer picture ...". This is not a mathematical statement, but just a piece of advice on how to think of equivalence classes; it is advice that I also gave in lecture in connection with the elements of Z_n. Roughly speaking, all we will be using about S/~ is that elements of S that are related by ~ give the same element of S/~, while elements of S that are not related by ~ give different elements. It "clutters" one's thoughts to always picture the elements of S/~ as sets; it is more helpful to think of them as single objects (which happen to have been constructed as sets). ---------------------------------------------------------------------- You ask, concerning the definition of partition on p.66, whether a set has infinitely many partitions. An infinite set does; a finite set has only finitely many. For instance, a 4-element set S has 15 partitions: one consisting of just one set, S itself; four in which it is broken into a 1-element set and a 3-element set; three in which it is broken into two 2-element sets, six in which it is broken into a two-element set and two one-element sets, and one in which it is broken into four one-element sets. (By the way, if P is a partition of S, beware of calling the sets comprising P "partitions of S". They are the members of the partition P, and are subsets of S. A partition of an infinite set may have finitely or infinitely many members.) ---------------------------------------------------------------------- You ask why, as stated in Example 2.2.7, p.66, the set of circles centered at the origin forms a partition of the plane. Well, you have to check whether the conditions in the definition of "partition" are all satisfied. If S is the plane, and P is the set of those circles, is it true that (i) P consists of nonempty subsets of S (i.e., that each such circle is a nonempty subset of S)? And that (ii) each member of S belongs to exactly one member of P? This is really a combination of two conditions: (ii.a) each member of S belongs to a member of P. (ii.b) no member of S belongs to more than one member of P. So you should check whether you can verify that P satisfies (i), (ii.a) and (ii.b). If you have difficulty with some of them, let me know how far you have gotten on them, and where you are unclear. ---------------------------------------------------------------------- You ask about the assertion in the second paragraph of p.67, that x ~ a. 
Remember that in this part of the argument, we are assuming a partition given, and defining "~" to mean "belong to the same member of the partition". Now the second sentence in this paragraph assumes that "P_alpha ... contains some element a ..."; i.e., a\in P_alpha. The next sentence says "let x\in P_alpha". So these two assumptions mean that a and x are in the same member, P_alpha, of the given partition; which is what "x ~ a" means here. ---------------------------------------------------------------------- Regarding the 3rd line of Example 2.2.8 on p.67, you ask > When they say that x belongs to a unique equivalence class, are they > saying that the function pi only has one output for each input > (one-to-one)? Having only one output for each input is not the same as being one-to-one -- it is part of the definition of "function". (One-to-one means having at most one input that gives any output.) In that sentence, "belongs to a unique equivalence class" means "belongs to an equivalence class, and not to more than one equivalence class". Because there is one and only one, it makes sense to speak of "the" equivalence class to which x belongs, and hence one can define pi to take x to that class. (Actually, however, that sentence of the authors' is a little off the mark. The definition of [x] given on p.63 would make sense for any relation "~", not just an equivalence relation; in particular, it would make sense whether or not these sets [x] had the property proved in Proposition 2.2.3; so their definition of "pi" would also make sense. If instead they defined pi to take x to "the equivalence class of ~ to which x belongs", then that sentence would be needed. Proposition 2.2.3 shows that for an equivalence relation, these two definitions mean the same thing.) ---------------------------------------------------------------------- You ask about the term "natural projection" in Definition 2.2.6, p.68. "Natural" is a word that mathematicians use to describe some map or element that arises from the nature of a situation, instead of being arbitrarily chosen. If the authors had called it the "natural map", that would have been justified, since in general there will exist lots of maps S --> S/~. For instance, if S is finite, with n elements, and S/~ has m elements, there are m^n maps from the first set to the second; among these, the "natural" one is the one arising from the way S/~ is constructed from S. But since they are using the word "projection", which doesn't have a more general meaning, the word "natural" is not really needed. ---------------------------------------------------------------------- Regarding permutations (Def.2.3.1, p.72), you ask what use they are in everyday life. I don't have a real answer to that. I see permutations as useful in studying basic concepts of mathematics: symmetries of various sorts of structures. Beachy and Blair begin each chapter by trying to paint a picture of how the subject of the chapter will be useful in later work, but I think they are generally not successful: it is only after one sees the applications that one can appreciate what they said. I think the best way to appreciate a piece of mathematics is to look at it for its own sake at the start, and come to see its place in the larger picture as one learns more. I have mentioned in class, more than once, one "real-life" example of permutations: As describing substitution codes (and what happens when you apply one after another). 
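(For anyone who would like to play with that example, here is a tiny sketch of my own -- not anything from the book -- treating two substitution codes as permutations of the 26 letters and composing them:

    import string

    letters = string.ascii_lowercase

    # two "substitution codes", i.e., permutations of the alphabet:
    # sigma shifts each letter forward by 3; tau reverses the alphabet
    sigma = {x: letters[(i + 3) % 26] for i, x in enumerate(letters)}
    tau   = {x: letters[25 - i] for i, x in enumerate(letters)}

    # encoding first with tau and then with sigma amounts to applying
    # the composite permutation sigma o tau, itself a substitution code
    composite = {x: sigma[tau[x]] for x in letters}

    def encode(word, perm):
        return "".join(perm[x] for x in word)

    print(encode("cipher", composite))
    print(encode(encode("cipher", tau), sigma))   # the same string

The moral is just that applying one substitution code after another gives yet another substitution code -- i.e., the composite of two permutations is a permutation.)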
One can find other little examples -- in certain sorts of English Country Dances, there are complicated sequences where one dancer exchanges places with another, and then another pair of dancers exchange places, and so on; and one who has studied permutations will smile on learning to do such a dance, and think about how transpositions are being composed to give some interesting permutation of the set of dancers. But these little examples are relatively unimportant; the important thing is the bigger context of group theory, for which permutation groups form a starting point. ---------------------------------------------------------------------- You ask about the switch between the first and second two-row expressions for sigma in the first paragraph of p.74. In the first expression, the successive columns of sigma mean "sigma takes 1 to 3", "sigma takes 2 to 1", etc. The meaning is the same whatever order one makes those statements in; so in the second expression, they have put those same statements in a different order, "sigma takes 1 to 3", "sigma takes 3 to 4", etc., so as to show successively the "path" beginning with 1. Taken together, the first 4 columns then say that sigma takes "1 to 3, that to 4, that to 2, and that back to 1"; and they then set up an abbreviation, (1,3,4,2), to express this in compact form. ---------------------------------------------------------------------- Regarding the computation of a product of permutations on p.76, you say "The product doesn't seem to be explicitly defined". Did you look for "product" in the index of the book? And under "product" for the subheading, "of permutations"? It is understandable that people will miss a definition from time to time. But you should use the index to track down what you missed. ---------------------------------------------------------------------- You ask about the correctness of the first display on p.77. It is correct. At the very first step, they are not using any information about how tau acts; they are just using the assumption i = a_j made on the preceding line. (That assumption is one of 3 possible alternatives. Later in the proof, they go to the second alternative, namely where i is one of the b's, and finally to the third alternative, that it is neither one of the a's nor one of the b's.) At the next step in the display, they use the fact that a_j is _not_ one of the elements that tau moves. ---------------------------------------------------------------------- You ask what is meant by the statement in Theorem 2.3.5, p.77, that "the cycles ... that appear in the product are unique". If the same permutation is written in two different ways as a product of disjoint cycles, the expressions can be different in that the cycles appear in different orders; e.g., the same permutation can be written as (1,3,5)(4,2,6,8)(9,10,11), as (4,2,6,8)(9,10,11)(1,3,5), and in four other ways, using different arrangements of the same three cycles. But it can't be written using _different_ cycles; e.g., that same permutation is _not_ equal to (4,2,6,10)(9,8,11)(1,3,5). So the set of cycles that appear in its expression, the set { (1,3,5), (4,2,6,8), (9,10,11) }, is unique. ----------------------------------------------------------------------
> P. 77, Theorem 2.3.5
> unimportant
>
> I was wondering, how would one prove that a permutation can be
> expressed as a composition of disjoint cycles uniquely up to the
> ordering?
Well, the proof the authors give for the Theorem is essentially a translation into words of what happens when we look at the "picture" of a permutation, and see that it separates into "pieces" which are disjoint cycles. You don't say what you are not satisfied with in the proof; I suppose it is the last sentence, which leaves the uniqueness as an exercise. (If so, you should have said this!) So you want to think about why there is no more than one way to decompose the picture into disjoint cycles, and then translate this into words. Can you? ---------------------------------------------------------------------- You ask how one forms the cycle beginning with b in the proof of Theorem 2.3.5 on p.78. The authors have shown how to get the cycle beginning with 1, and how to get the cycle beginning with a. Those two use exactly the same method, and they are saying to use the same method once more beginning with b. If you have difficulties understanding the method used with 1 and a, let me know about that. Or if you see something about the way the cases of 1 and a are handled that you don't see how to apply to b, let me know what that is. You also ask how they "exhaust" S. "Exhaust" means "use up"; they are saying that they handle 1; then, if the cycle containing 1 is not all of S, they handle a; then, if the cycles containing 1 and a are not all of S, they handle b; and so they can go on handling more and more elements, until they reach a point where the elements in the cycles they have formed are all the elements of S, i.e., they don't have to go further because they have exhausted S. ---------------------------------------------------------------------- Concerning the argument in the proof of Prop.2.3.8 on p.80 saying that if (a_1,...,a_m)^j (b_1,...,b_r)^j = 1, then each of (a_1,...,a_m)^j and (b_1,...,b_r)^j equals 1, you ask whether the general principle is that if a product of two cycles is 1, then each of them is 1. That statement is true when, as here, the two cycles are disjoint, but it wouldn't help here, for though (a_1,...,a_m) and (b_1,...,b_r) are cycles, (a_1,...,a_m)^j and (b_1,...,b_r)^j may not be. (E.g., (1,2,3,4) is a cycle, but write out (1,2,3,4)^2 and see whether it is a cycle.) Also, that fact doesn't generalize to 3 or more cycles, as is required for the general case of the proof. Rather, the key point is the one mentioned at the end of the sentence in question in the book: (a_1,...,a_m)^j leaves each b_i fixed, and (b_1,...,b_r)^j leaves each a_i fixed. For, knowing that, suppose we compute the action of (a_1,...,a_m)^j (b_1,...,b_r)^j on the elements a_i. We see that they are all unaffected by (b_1,...,b_r)^j, by the above observation; on the other hand, they will all be unaffected by (a_1,...,a_m)^j if and only if (a_1,...,a_m)^j = 1. Similarly, if we compute the action of (a_1,...,a_m)^j (b_1,...,b_r)^j on any b_i, we see that this is first sent by (b_1,...,b_r)^j to one of the b's, and that that element is unchanged by (a_1,...,a_m)^j; so these elements are all unaffected by the composite if and only if (b_1,...,b_r)^j is the identity. Thus, for (a_1,...,a_m)^j (b_1,...,b_r)^j to be the identity, which leaves everything unchanged, both factors must be the identity, as claimed. If one wants to extract from this a general principle, it is that given two permutations sigma and tau, such that no element is moved by both sigma and tau, we will have sigma tau = 1 if and only if sigma = 1 and tau = 1.
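(Here is a small computational illustration of that principle -- my own sketch, not part of the book's argument -- with permutations of {1,...,8} stored as dictionaries:

    def compose(s, t):
        # the permutation "first apply t, then s", i.e., s o t
        return {i: s[t[i]] for i in t}

    identity = {i: i for i in range(1, 9)}

    # sigma moves only 1,2,3 and tau moves only 4,5,6,7: disjoint supports
    sigma = {**identity, 1: 2, 2: 3, 3: 1}
    tau = {**identity, 4: 5, 5: 6, 6: 7, 7: 4}

    print(compose(sigma, tau) == identity)                      # False
    print(compose(sigma, compose(sigma, sigma)) == identity)    # True: sigma^3 = 1

Because the supports are disjoint, compose(sigma, tau) fixes every element only if sigma and tau each do, i.e., only if both are the identity.)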
Intuitively, the idea of the sets of elements moved by sigma and tau being disjoint is that it makes it impossible for nontrivial actions of sigma and tau on the same element to "cancel each other". ---------------------------------------------------------------------- You ask about the proof of the general case of Proposition 2.3.8, p.80. I think the reason that Beachy and Blair don't show the general case is that the indexing needed for the symbols is more likely to confuse than to enlighten most students. Suppose sigma is a permutation of {1,...,n}, which is a product of disjoint cycles tau_1 ... tau_k. We need names for the lengths of these cycles; say m_1,..., m_k respectively. Then we need names for the elements moved in the cycles. The elements moved in the i'th cycle need to be numbered with subscripts 1,...,m_i, but they also need a subscript i to distinguish them as elements moved in that cycle rather than other cycles; so we need a notation like a_{i,1}, a_{i,2}, ... , a_{i,m_i} for them. We can then write "sigma = (a_{1,1},...,a_{1,m_1}) (a_{2,1},...,a_{2,m_2}) ... (a_{k,1},...,a_{k,m_k})". Now when we apply sigma to a_{i,j}, it will go to a_{i,j+1}, and a second application of sigma will take it to a_{i,j+2} etc. -- until these numbers reach m_i, after which point it moves back to a_{i,1} and starts over from there. To describe this properly, it is better if our index "j" is not an integer, but a member of Z_{m_i}. So let us go back and rewrite sigma as (a_{1,[1]},...,a_{1,[m_1]}) (a_{2,[1]},...,a_{2,[m_2]}) ... (a_{k,[1]},...,a_{k,[m_k]}), where in a_{i,[j]} we understand [j] to be short for [j]_{m_i}. Then we can say correctly that sigma takes a_{i,[j]} to a_{i,[j+1]}, and more generally that sigma^N takes a_{i,[j]} to a_{i,[j+N]} (I write the exponent as N rather than k, since k is already in use as the number of cycles). Now we ask for what N sigma^N will be the identity permutation; i.e., will fix all these elements a_{i,[j]}. For each i and j, it will send a_{i,[j]} to a_{i,[j+N]}, which is the same as a_{i,[j]} if and only if [j]_{m_i} = [j+N]_{m_i}, which will be true if and only if m_i | N. So sigma^N will fix all these elements if and only if for all i we have m_i | N; i.e., if and only if N is a common multiple of the numbers m_i. So the order of sigma, i.e., the smallest positive N such that this happens, is the smallest positive common multiple of the m_i, that is, their l.c.m., which is what the Proposition asserts. So -- was this helpful, or more confusing? ---------------------------------------------------------------------- You ask how we know that, as asserted in the 3rd line of p.83, "a must appear in at least one other transposition". This is answered in the second half of the sentence: "... since otherwise rho_1 rho_2 ... rho_k (a) = b, a contradiction". Did you not notice that explanation? Or did you not see how a's not appearing in at least one other transposition would lead to that conclusion? If the latter was the problem, did you try thinking through how rho_1 rho_2 ... rho_k would act on a if a did not appear in any of the rho's but rho_1 ? If so, did you see anything about the action of rho_1 rho_2 ... rho_k in this situation? As I say on the class handout, you need to give such details in your written questions, since I can't question you and get answers from you about such points as soon as you ask the question, the way I could at office hours. Please let me know, so that I can clarify the point for you.
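(If it would help you to experiment with this, here is a little sketch of my own -- hypothetical names, not the book's notation or argument -- for applying a product of transpositions to a single element, so that you can watch what rho_1 rho_2 ... rho_k does to a when a appears in no transposition other than rho_1 = (a,b):

    def apply_transpositions(transpositions, x):
        # apply the product rho_1 rho_2 ... rho_k to x; the rightmost
        # factor acts first, so we run through the list in reverse
        for (u, v) in reversed(transpositions):
            if x == u:
                x = v
            elif x == v:
                x = u
        return x

    # here a = 1 appears only in rho_1 = (1,2); the later factors fix 1
    rhos = [(1, 2), (3, 4), (2, 5), (4, 3)]
    print(apply_transpositions(rhos, 1))   # 2, i.e., the product sends a to b

A product that is supposed to be the identity would have to send 1 back to 1, not to 2; that is the contradiction the authors are pointing to.)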
----------------------------------------------------------------------
Regarding p.83, line 4, you write:
> The book states: "We observe that a must appear in at least one other
> transposition, say [rho]_i, with i>1, since otherwise
> [rho]_i***[rho]_k(a)=b, a contradiction." ... why ... does the
> product of those transpositions equaling b lead to a contradiction?
I hope what I said in class clarified this. "rho_i ... rho_k(a)" is not a product; rather, rho_i ... rho_k is a product, and "rho_i ... rho_k(a)" is the element of {1,...,n} that one gets when one applies this product-permutation to a\in {1,...,n}. By assumption, our product-permutation is the identity. So when applied to a it should give a, not b. ---------------------------------------------------------------------- You ask about the statement in the next-to-last sentence of the proof of Theorem 2.3.11 on p.83 that a certain equation "contradicts the choice of rho_1,...,rho_k." The "choice of rho_1,...,rho_k" that they are referring to was made in the last sentence of the first paragraph on this page, where the authors say that among all expressions for the identity with certain properties, "we assume rho_1 rho_2 ... rho_k has the fewest number of a's." This says that among all such expressions, we should choose one, rho_1 rho_2 ... rho_k, with that property. Then, at the point you ask about, they show that from rho_1 rho_2 ... rho_k we can construct another expression with a smaller number of a's. This contradicts the assumption made when rho_1 rho_2 ... rho_k was chosen, showing that the assumption this was based on (that 1 could be written as the product of an odd number of transpositions) must be false. This is a common method in proofs by contradiction: One chooses something to make some value as small or as large as possible, then gets a value which is smaller or larger, then says "this contradicts our choice of ...", and concludes that the assumption that allowed one to make such a choice must have been false. ---------------------------------------------------------------------- You ask what the symbol S x S means in the definition of "binary operation", p.89. This is stated in the first sentence of that definition: They write "... the set S x S of all ordered pairs of elements of S ...". That notation is also discussed in my handout on sets and logic: the last item on p.3 is the symbol "x". ---------------------------------------------------------------------- Regarding the commutative law, defined on p.91, you ask "How typical is it" that that law holds for a binary operation. Depends on what you're working with. Let's just focus attention on group operations (ignoring structures with binary operations that don't satisfy the group laws). If someone studies abelian groups, then for that person the answer is "It always holds". If someone studies groups S_n, the answer is "It never holds except in the trivial cases n=1 and n=2" ... . ---------------------------------------------------------------------- Regarding the definition of a group on p.91, you ask whether the implications "a.e = a => e.a = a" and "a.b = e => b.a = e" always hold. If the elements named are elements of a structure that is already assumed to be a group, then it is not hard to prove that those implications do hold. But in a set having a binary operation -- even an associative binary operation -- they need not.
For instance, if S is the set of all maps (not just permutations) from the natural numbers to the natural numbers, and a is defined by a(n) = max(n-1,0) (i.e., "subtract 1 if n is positive, but send 0 to 0"), while b is defined by b(n) = n+1, then you will find that a.b is the identity map of N, but b.a is not. In this example, I've assumed that the "e" of your question means the identity map. We obviously can't assume that in getting a counterexample to the first implication you give; but if in the example I've just indicated we let e denote b.a, then you will find that a.e = a but e.a is not a. ---------------------------------------------------------------------- Regarding the definition of a group on p.91, you ask "Is it ok to think of groups as generalized versions of fields or vector spaces?" (which you saw in Math 110). Only if you take "generalized" in a very, very, broad sense! Fields, for instance, are defined by two operations, + and ., while groups are defined by just one; fields have the distributive law relating those operations, while groups have no analog of that. Sort of like saying "Can you think of oranges as generalized apples?" Yes if you really understand the difference; no if it leads you to lose sight of that difference. ---------------------------------------------------------------------- Regarding the two definitions of "group" on pp.91 and 92, you ask whether, given a set with a binary operation, we must check for closure. If you know that what you are given _is_ a binary operation on the set, then by definition of "binary operation", the set must be closed under it, so no additional check is needed. However, if you are given some function of two variables and are asked whether it makes the given set a group, then (if this is not obvious) you need to check whether it really does define a binary operation on the set, i.e., whether the set is closed under it. ---------------------------------------------------------------------- You ask whether non-integer real exponents are allowed in the laws of exponents (p.92). Certainly not in the context of this book! The authors only define a^n for integers n; and in a general group, one can't do more than this. If one is working with certain groups of numbers, then depending on the situation, one can extend the definitions to various degrees. For instance, one can define exponentiation by any real exponent as a map from positive reals to positive reals, and prove that the laws of exponents hold for this operation; but if one tries to define a^{1/2} for a _negative_ real number a, one can't get a value in the reals. If one allows complex values, one runs into difficulties with multi-valued-ness, which becomes an interesting topic in Math 185, but which we can't go into here. ---------------------------------------------------------------------- You ask about the step "de = d(bc)" in the second display on p.96. This is because c was assumed a solution to bx = e (just before that display). ---------------------------------------------------------------------- You ask about the definition of a group G being abelian, on p.96. When one says "a group", one understands a set G _given_with_ a binary operation. The authors give the definition of "abelian" under the assumption that the operation is written multiplicatively, but this definition is understood to apply to any group, whether the operation is written as "multiplication", as "*", as "+", or whatever. 
So a group with operation denoted "%" will be called abelian if a%b = b%a for all a and b. If we have two different operations on a set, say "*" and "%", each of which makes the set a group, then those are two different groups, and whether one is abelian and whether the other is abelian are independent. To show this situation, we would have to go back to the more formal notation of denoting a group by an ordered pair. So if (G,*) and (G,%) are two groups that happen to have the same underlying set G, but different operations * and %, then one of them might be abelian while the other is not. ---------------------------------------------------------------------- Regarding the concept of group (pp.88-100) you ask > ... when a group is mentioned in the text, how can we tell what > operation is applied? The definition of each group states both the set of elements, and the operation. So, the authors have told you that S_n means the set of permutations of {1,...,n} _made_a_group_using_the_operation_of_ _composition_. Similarly, "R" means the set of real numbers made a group using the operation of addition, while "R^x" means the set of nonzero real numbers made a group using the operation of multiplication. On the other hand, if they say "Let G be a group", that means any set given with some group operation, which can be anything satisfying the axioms, and will be written multiplicatively or additively as they decide. ---------------------------------------------------------------------- You ask about the identity element in the group GL_2(R), p.100. As the authors indicate in the proof of Prop.3.1.12, this is the identity matrix "I". Logically, they should have said this earlier. On the other hand, "identity element" is defined in Definition 3.1.4, so knowing basic properties of matrices one should be able to see which matrix satisfies that definition. ---------------------------------------------------------------------- You ask about the phrase "the group axioms" in the first line of section 3.2, p.102. As you guessed, they mean the conditions in Definition 3.1.4. ---------------------------------------------------------------------- Concerning the concept of subgroup (Def.3.2.1, p.103) you ask about the usage which you have seen in other texts, of the "less-than-or-equal" sign for "is a subgroup of". That is commonly used by group theorists, but not so common among other mathematicians, even when they use group theory. Beachy and Blair probably avoided introducing it because it would be likely to cause confusion. I recommend against using it in homework; I don't know whether the grader is familiar with it. ---------------------------------------------------------------------- You ask whether 0 belongs to the group R^+ defined at the bottom of p.103, and whether it has an inverse. The definition given says "... positive real numbers ...", and 0 is not considered positive. One classifies real numbers as positive, negative, and zero. (When one wants to refer to the set of real numbers that are >_ 0, i.e., positive or zero, one says "nonnegative".) Under the operation of multiplication, which is being used here, 0 does not have an inverse. (Under the operation of addition, on the other hand, it is its own inverse.) ---------------------------------------------------------------------- You ask whether the reason for the assumption "nonempty" in Corollary 3.2.3, p.105, is because the empty set does not have an identity element. Essentially, yes. 
To put it another way, the empty set is not a group, but it satisfies the other condition of the Corollary; so if we left "nonempty" out of the statement, the Corollary would be false. When the authors say "it is crucial to verify ...", I think they mean that students using the corollary need to watch out that they don't overlook saying why some set H they are applying it to is nonempty. You also ask "If there are no elements, is it necessary to have an identity element ... ?" If an object is to fit our definition of a group, it has to have an identity element. Perhaps you mean "Would it be OK if we used a different concept of "group", which allowed the empty set, as long as the conditions forced all _nonempty_ examples to have identity elements?" That has been thought of. The resulting theory would not be too different from the existing theory of groups. But the theory would be a little messier, without having any real advantages; so the standard theory, requiring every group to have an identity element, is preferable. ----------------------------------------------------------------------
Regarding the second paragraph of the proof of Cor.3.2.3, p.105, you ask
> Firstly, is it correct to think that e=a*a^-1 belongs to H "by
> assumption" because we're letting b=a and we know a*b^-1 is in H?
Right.
> Secondly, why do the facts that e and a belong in H imply that a^-1
> belongs in H?
They say why: Because a^-1 can be expressed as e a^-1. Knowing that, we again use the assumption that ab^-1 \in H for a, b\in H. Since that, and nonemptiness, are the only assumptions we have in proving this direction, you can expect it to be used in essentially every step. In this case, we put e in the role of the "a" of that assumption and a in the role of the "b". In trying to guess what might have kept you from seeing this, I wonder whether it was the idea that "a" and "b" in that assumption represented fixed elements. But note that the statement was "for all a, b in H", not "for some a, b in H". This means that whatever two elements of H we look at, the element gotten by multiplying one by the inverse of the other is also in H. That is something one has to be conscious of in general in mathematics: When a statement is asserted "for all" members of a certain set, it applies to every case, whether the letters occurring are as in the statement or not. ---------------------------------------------------------------------- You write that the set S defined in the last paragraph on p.109 does not seem to satisfy the conditions to be a subgroup; that a . a^{n-1} = a^n does not belong to S. It does belong to S. By the assumption of (c), o(a) = n, so a^n = e \in S. ---------------------------------------------------------------------- In relation to the discussion of groups of orders 1-6 on p.115, you ask whether there is a formula for the number of groups of order n. There are a lot of general facts; for instance, if n is a prime, as mentioned in class, there is (up to isomorphism) only one such group. If n is a product of two primes, n = pq with p < q, then one can show that if q == 1 (mod p), there are two groups of order n, the cyclic group and a nonabelian group, while if this congruence does not hold, there is only the cyclic group. There are two groups of order p^2, namely Z_{p^2} and Z_p x Z_p (both abelian), while there are 5 of order p^3, three abelian and two nonabelian. The results about pq and p^3 I generally prove when I teach Math 250A.
The more complicated the prime factorization of a number, the more difficult it is to analyse in general the number of groups of that order. You can find a tabulation of results through n = 2000 at http://www.math.rwth-aachen.de/~Hans-Ulrich.Besche/number.html ---------------------------------------------------------------------- You ask how the associative law comes into the study of the possible form of the multiplication table of a group, as on pp.115-116. Though in analysing such questions, we phrased the discussion in terms of cyclic subgroups, identity elements, etc., the associative law was also being used constantly, without being mentioned. For instance, in order to say that the subgroup generated by a has the form {e, a, a^2, a^3, ...}, we have to be able to make sense of expressions like "a^2" and "a^3". If our operation were not associative, a(aa) and (aa)a could be two different elements, so we wouldn't necessarily have _one_ element that we could call a^3. (Similarly, there could be up to 5 elements that could be called a^4.) Recall also that we used the cancellation properties to show that no element could appear twice in a row or column. But the proof of those cancellation properties uses the associative law. For instance, if ua = va, and we multiply this equation on the right by a^{-1}, getting (ua)a^{-1} = (va)a^{-1}, we then need the associative law to transform this equation into u(aa^{-1}) = v(aa^{-1}) so that we can change the aa^{-1} into e. Likewise, in analysing S_3, we used the associative law in proving that a^i b^j = a^i' b^j' => a^i = a^i' and b^j = b^j'; and later, after getting the formula ba = a^2 b, we used the associative law to make statements such as that multiplying this on the right by b gave the formula ba . b = a^2. ---------------------------------------------------------------------- You ask about the proof of Prop.3.3.2, p.117, in particular, how the authors got from "h_2 ^{-1} k_1 h_2 = k_3" to "k_1 h_2 = h_2 k_3". They do this by multiplying on the left by h_2, to cancel the h_2 ^{-1} on the first term. In class, I worked backwards. Translating what I did into the book's notation, I said that we wanted a product of the form k_1 h_2 to be expressible as a member of H K; assuming it used the same element of H as in the original expression, this meant it should have the form h_2 k_3 for some k_3\in K. So dividing on the left by h_2, the equation k_1 h_2 = h_2 k_3 that we wanted became h_2 ^{-1} k_1 h_2 = k_3. This explained why the book needed the hypothesis it used. When you have difficulty seeing where a computational step comes from, you should compare the equations with those that the book says they come from. Hopefully, if you had written down
   h_2 ^{-1} k_1 h_2 = k_3
   k_1 h_2 = h_2 k_3
and asked yourself how they were related, you would have seen that the second was obtained from the first by multiplying on the left by h_2. ----------------------------------------------------------------------
> On p.118, I don't understand what 3.3.4(a) is claiming.
That if we take the set G_1 x G_2 as described in Def.3.3.3, and define "multiplication" of elements of this set by the formula displayed in 3.3.4(a), then this set with this operation will satisfy the definition of a group (p.91). ---------------------------------------------------------------------- You ask about the relation between direct product of groups (p.118) and Cartesian product.
Cartesian product is a construction for sets -- it gives us the set of ordered pairs (which, I admit, is really all the authors describe in Def.3.3.3). But to define a group, one also needs an operation, and when one defines the operation componentwise (as in Prop.3.3.4) one gets what is called the direct product group. ---------------------------------------------------------------------- You ask about the reason for the condition 0 not-= 1 in the definition of "field" on p.120. If 0 = 1, then multiplying that equation by any a\in F, we would get 0.a = 1.a, i.e., 0 = a; so every element of F would equal 0, i.e., F would be a 1-element set. The behavior of such a structure would be very different from that of fields with 0 not-= 1. (For instance, vector spaces F^m and F^n would be isomorphic, so one would not have a well-defined concept of the dimension of a vector space.) Hence this case is excluded in the definition to make the concept a mathematically useful one. ---------------------------------------------------------------------- You ask whether forming a product of "elements of S and their inverses", the explanation of "word" given in Definition 3.3.8, p.122, won't always give the identity element. What may be confusing you is the use of the pronoun "their" in that definition. The "they" that the authors have in mind is the set of all members of S. So what they mean is that one takes some elements of S and some inverses of elements of S, and forms the product. The inverses used don't have to be the inverses of the elements used, and even when an element and its inverse are both used, they don't have to be next to each other. So they won't, in general, cancel out to give the identity element. Even though the authors' use of the word "their" could be misleading, there was a way that you should have seen that it didn't mean what you thought. The definition was introduced by a paragraph, "We next investigate ...", which gave an example: If a, b, c\in S, then they pointed out that "products such as a^{-1} a^{-1} bab^{-1}c ... must belong to H". They then said that they would show below that the collection of "all such products" formed the subgroup generated by S. So you should have expected that what they would be talking about next would be products like "a^{-1} a^{-1} bab^{-1}c", and seen that this is, in fact, what they meant by "words". Moral: Don't skip the connecting paragraphs between the definitions, results, etc.! Read them, and expect them to be relevant to what the authors do next. ---------------------------------------------------------------------- You ask what kinds of properties of a group carry over to all isomorphic groups (p.126). Good question. The answer is, all properties that can be expressed purely in terms of the group-theoretic structure. An example of a statement that is _not_ preserved is "G is a subset of Z". (For instance, the groups Z and {n/2 | n\in Z} are isomorphic, but the first satisfies that condition, while the second doesn't.) But statements like "G is abelian", "G is infinite", "G has infinitely many elements of order 2", "G has no nonidentity elements of finite order", etc. are all preserved by isomorphism. Logicians can make this sort of thing precise, by setting up an appropriate "language for groups", and a precise description of how it is to be interpreted, and saying "Every property expressible in this language is preserved by isomorphism".
Mathematical logic is covered in Math 125, and we can't develop it here; so we will simply learn how to recognize properties that are preserved by isomorphism, and how to prove this true in any given case. ---------------------------------------------------------------------- You ask why, in the first paragraph of p.129, when the authors note that a^n = e => phi(a)^n = e, they can't just say that this means a and phi(a) have the same order. The numbers n such that a^n = e are the multiples of the order of a; so the implication that they prove shows that the set of multiples of the order of a _is_contained_in_ (but not necessarily equal to) the set of multiples of the order of phi(a). This means that the order of a is a multiple of the order of phi(a), but doesn't show that they are equal; to get that, one has to show the reverse inclusion, which they do using the map phi^{-1}, which goes the other way. In the next reading, we will see the concept of a homomorphism between groups; and given a homomorphism phi: G_1 -> G_2, the same argument as in this proof shows that the order of phi(a) is a divisor of the order of a; but homomorphisms do not have to have inverses, and the orders of a and phi(a) do not have to be the same. (For instance, the map Z_6 to Z_3 sending [k]_6 to [k]_3 for all k is a homomorphism, and taking k=1, one finds that [1]_6 has order 6, while its image under phi, [1]_3, has order 3.) ---------------------------------------------------------------------- You ask why every element of Z_2 x Z_2 not equal to the identity must have order 2, as stated in Example 3.4.4, p.130. Did you try combining a nonidentity element of Z_2 x Z_2 with itself repeatedly under the operation of that group, and seeing how many times you had to do so to get the identity? If you understand the meaning of the order of an element, and didn't see why elements of this group had order 2, that is what you should have done to test out that statement. If you do so, and don't come immediately to the conclusion that the order of the element is 2, then show me the computation you did, and I will point out where it needs to be corrected. Once you have tried one element, you could try each of the other two nonidentity elements and conclude that all three have order 2. Or you may see the reason that it worked for the first element, and realize that it will have to work the same way for the other two. (And maybe even realize how you could have seen this before doing the computation.) ---------------------------------------------------------------------- You ask about the second sentence of the proof of Prop.3.4.4, p.132. "The identity of G_1" and "the identity of G_2" mean the identity elements of those groups, generally denoted e. The authors have proved that an isomorphism carries the identity of its domain group to the identity of its codomain group, and their argument really just used the equation phi(ab) = phi(a)phi(b), so that fact is true here too. (I didn't stop to think of it before looking at your question, but of course they should have pointed this out, since after stating it for isomorphisms, it's not fair to assume it in this more general context.) So, as phi(e) = e, and phi is one-to-one, phi can't send anything else to e; which is what that sentence asserts.
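----------------------------------------------------------------------
A computational footnote to the two questions above about orders of elements (this is mine, not from the book; the helper additive_order and the small loop are just for illustration). If you like, you can check such order statements by machine; here is a minimal Python sketch that computes the additive order of [1] in Z_6 and Z_3, and the order of each nonidentity element of Z_2 x Z_2:

    # order of an element a of Z_n under addition: least positive k with k*a = 0
    def additive_order(a, n):
        k, s = 1, a % n
        while s != 0:
            s = (s + a) % n
            k += 1
        return k

    print(additive_order(1, 6))   # order of [1]_6 in Z_6: prints 6
    print(additive_order(1, 3))   # order of its image [1]_3 in Z_3: prints 3

    # every nonidentity element of Z_2 x Z_2 has order 2:
    for a in range(2):
        for b in range(2):
            if (a, b) != (0, 0):
                k, s = 1, (a, b)
                while s != (0, 0):
                    s = ((s[0] + a) % 2, (s[1] + b) % 2)
                    k += 1
                print((a, b), "has order", k)   # each prints order 2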
---------------------------------------------------------------------- You ask on what basis the authors say, in the second sentence of the proof of Proposition 3.4.5 on p.132, that phi is well defined. The concept of something being "well-defined" is one of the topics discussed in the last section of the handout on Sets, Logic, and Mathematical Language. After making sure that you understand that, look at the definition of phi given in the first sentence of the proof of Proposition 3.4.5, and (a) determine what condition needs to be verified for that definition to give a well-defined function. Then see what facts the authors point out in the beginning of the second sentence of that proof, and (b) check whether you see why these statements are equivalent to the answer you got for (a), and (c) check whether you understand why those statements are true. ---------------------------------------------------------------------- You ask whether the analog of Prop.3.4.5 on p.132, with Z^x_mn, Z^x_m, and Z^x_n, in place of Z_mn, Z_m, and Z_n, is also true. Yes. The proof is essentially the same, but where the proof of Prop.3.4.5 implicitly uses the left-hand equation of Prop.1.4.2 (p.37) to show that phi respects the group operation, the proof of the statement about Z^x_mn would use the right-hand equation of that Proposition. ---------------------------------------------------------------------- You ask how the authors get "r=0" in the next-to-last line of the proof of Theorem 3.5.1, p.135. They have shown that a^r \in H. But m is the smallest _positive_ integer such that a to that power is in H, and r < m, so r can't be positive. But it is >_ 0, so it must be 0. You also ask whether there is another way to prove this result. As I indicated in class, what I think is the nicest way to prove it is by applying Theorem 1.1.4 to the set of k such that a^k\in H. In fact, the proof they give is really a repetition of the proof of Theorem 1.1.4. I think that the reason they do this is that some instructors may wish to skip chapters 1 and 2, assuming that the general material in them is already familiar to their students, and that results (like that one) which aren't already familiar can be worked in on the way. ---------------------------------------------------------------------- You ask whether the converse of Theorem 3.5.2(a), p.135 holds, namely that if G =~ Z, then G is infinite. Yes. An isomorphism is, in particular, a 1-to-1 correspondence, and if a set is in 1-to-1 correspondence with an infinite set, then it is infinite. ---------------------------------------------------------------------- You ask how, in the proof of Proposition 3.5.3, p.136, we know that a^n = e (beginning of next-to-last line on page). To answer this, you should ask yourself, "What are the a and n in that equation?" Are they things that have been chosen in the course of the proof, or things named in the statement of the Proposition, for instance? And in either case, what was assumed about them? I hope that after checking this, you will be able to figure out why a^n = e holds. If not, then let me know and I will point out the reason. (If you already checked what these elements were before sending your question, but did not see why the equation holds, you should have mentioned this, so that I would know that telling you to do so wasn't enough. On the other hand, if you didn't check what the elements in question were before sending your question -- then I hope you will realize in the future that you need to!)
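----------------------------------------------------------------------
A follow-up (mine, not the book's) to the question above about the analog of Prop.3.4.5 for Z^x_mn: if you want to see the isomorphism concretely for one pair of relatively prime moduli, here is a quick Python check; the choice m = 4, n = 9 and the variable names are just for illustration, and the map being tested is [k]_mn |-> ([k]_m, [k]_n) restricted to units:

    from math import gcd

    m, n = 4, 9                       # relatively prime moduli
    units_mn = {k for k in range(m*n) if gcd(k, m*n) == 1}
    pairs = {(a, b) for a in range(m) if gcd(a, m) == 1
                    for b in range(n) if gcd(b, n) == 1}

    image = {(k % m, k % n) for k in units_mn}
    print(image == pairs)                # True: the map is onto the set of pairs of units
    print(len(image) == len(units_mn))   # True: and it is one-to-one

    # the map also respects multiplication -- the step where Prop.1.4.2 comes in:
    print(all(((j*k) % m, (j*k) % n) ==
              (((j % m)*(k % m)) % m, ((j % n)*(k % n)) % n)
              for j in units_mn for k in units_mn))   # True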
---------------------------------------------------------------------- You ask about the statement at the bottom of p.136 that the order of a^d is n/d. Remember that o(x), the order of x, is the least positive integer r such that x^r = e. If d is a divisor of n = o(a), and x = a^d, then for any r, x^r = a^{dr}; and the smallest positive value of r that makes this the identity is n/d, which makes the exponent dr equal to n. ---------------------------------------------------------------------- You ask why, as stated in the last paragraph on p.138, the number of generators of Z_n is the number of integers less than and relatively prime to n. This is by Corollary 3.5.4(a) (p.137), translated from multiplicative to additive notation, so that "a^k" is replaced by "k[1]", i.e., [k]. ---------------------------------------------------------------------- I don't know whether what I said in class about the proof of Corollary 3.5.6, p.139 was very helpful, since by the time I reached that point in lecture, I was feeling time pressure very badly, and what I said may not have been clear. You said you got lost at the point where the authors introduce the element b. The idea is that if we want to count the generators of Z_{p_1^{alpha_1}} x ... x Z_{p_m^{alpha_m}}, we let b be any element of that group, figure out a criterion for it to be a generator, and count how many elements satisfy that criterion. An element b of that m-fold direct product is an m-tuple of elements ([b_1], ..., [b_m]), where each [b_k] is a member of Z_{p_k^{alpha_k}}. I hope you can pick it up from there. ---------------------------------------------------------------------- You ask why, as stated in the last paragraph of the proof of Corollary 3.5.6, p.139, the group Z_{p^alpha} has p^{alpha-1} elements that are multiples of p. The elements of Z_{p^alpha} correspond to the integers n that satisfy 0 _< n < p^alpha. Those that are multiples of p are of the form p m, where 0 _< m < (p^alpha)/p, so there are (p^alpha)/p of them; i.e., p^{alpha-1}. Another way to see this is to note that they form the subgroup <[p]>, and by Prop.3.5.3, translated into additive notation, the order of this subgroup will be "n/d" where the "n" is p^alpha, and the "d=gcd(m,n)" is gcd(p,p^alpha) = p; so the order is (p^alpha)/p = p^{alpha-1}. ---------------------------------------------------------------------- You ask whether there is a relationship between the exponent of a group G (p.139, Def. 3.5.7) and the orders of its elements. Certainly! You should recall what we have proved about which integers N satisfy a^N = e; the answer is stated in terms of the order of a. From that you should be able to see what property an integer N must have to satisfy a^N = e for all elements a of G, and which will be the smallest integer N having that property. ---------------------------------------------------------------------- You ask about the statement "G is cyclic if and only if there exists an element of order |G|" used in the last sentence before the exercises on p.140. This applies to finite groups G. If an element a of G has order |G|, that means that the subgroup <a> has |G| elements, i.e., it has as many elements as G does, so it is all of G, making G cyclic. ---------------------------------------------------------------------- You ask about the symbol "o(a)" on p.140. The first thing you should do when you don't know the meaning of a symbol is to look in the Index of Symbols, pp.471-473.
It is arranged by the page on which the symbol first appears; so working back from the page where you encountered it, you very quickly find its definition. You likewise ask about the term "exponent of G" on p.140. Looking in the index, you will find that it is defined on the page before. Sometimes a symbol or a definition may be missing from the index of symbols or the index; or sometimes the authors may unthinkingly use a word or symbol in a different way from the way in which they have defined it. So in cases where the symbol index, or the index, does not have an item, or gives an interpretation that does not make sense to you, it is appropriate to ask about the meaning. But always check those indices first; usually they will have what you need. ---------------------------------------------------------------------- You ask whether some member of the group of rigid motions of the triangle shown on p.145 can flip it so that the vertex 1 is at the top. There is a motion of the plane that does this, but it isn't a member of the group of rigid motions of the triangle. That group is defined as the set of rigid motions that take points of the triangle to points of the same triangle. The motion you are asking about takes the triangle to a different triangle (i.e., a triangle comprising different points of the plane). ---------------------------------------------------------------------- You ask about the equation |S| = 2n in the next-to-last paragraph of p.146. They've listed the members of S as 2n expressions at the beginning of the paragraph; and they've noted that these expressions represent distinct elements; therefore the set has 2n elements. By the way, they've already shown, in the paragraph above the diagram, that the group G of rigid motions of the n-gon has order 2n; so they can now say that the set S of 2n elements of G must be all of G; that is what they conclude on the same line. ---------------------------------------------------------------------- You ask why the authors think it worth saying on p.149 that if sigma(i) > sigma(j), then x_sigma(i) - x_sigma(j) = -(x_sigma(j) - x_sigma(i)). The point is that the terms appearing in the definition of Delta_n are the differences x_i - x_j, where the subscripts have the relation i < j. So if sigma(i) > sigma(j), then x_sigma(i) - x_sigma(j) is _not_ one of those terms. But they are noting that, in this case, it is the _negative_ of one of those terms. Since for every pair i < j, the factor x_sigma(i) - x_sigma(j) is thus either one of the terms of Delta_n or the negative of one, applying sigma can change Delta_n at most by a sign. ---------------------------------------------------------------------- > p.162 example 3.7.12 > unimportant > > "by proposition 3.7.9 if we find a particular solution x_0 with > Ax_0=b, the solutions to the nonhomogeneous equation Ax=b consists > of all vectors of the form x_0 + k where k is any solution of the > homogeneous equation Ax=0" > > It's not quite clear to me how they are using prop 3.7.9 > as the proposition says "(3) a = b+k for some k exists in K > I guess I'm confused in the translation of some to any The implication (1) => (3) of Proposition 3.7.9 can be restated: "If b satisfies phi(b) = c, and a is another element of G_1, then a also satisfies phi(a) = c if and only if there exists k\in ker(phi) such that a = kb." Here the "some k" of statement (3) is expressed by "there exists" in the restatement. But that same statement implies that for _every_ k\in ker(phi), the element a = kb satisfies the indicated equation; and that corresponds to the "any" in example 3.7.12.
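An additive toy illustration of that "some"/"any" point (mine, not the book's; the map phi and the numbers are just chosen for the example): take phi: Z -> Z_6 given by phi(a) = a mod 6, and c = [4]. One particular solution of phi(a) = c is x_0 = 4, and ker(phi) = 6Z; the claim is that x_0 + k is again a solution for any k in the kernel.

    x0 = 4
    some_kernel_elements = [6*k for k in range(-5, 6)]   # a sample of ker(phi) = 6Z
    # for ANY k in ker(phi), x_0 + k is again a solution of phi(a) = [4]:
    print(all((x0 + k) % 6 == 4 for k in some_kernel_elements))   # True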
The general observation is that when we pull a "there-exists" out of the left side of an implication, it turns into a "for-all"; i.e., a statement ((there-exists x) P(x)) => Q is equivalent to (for-all x) (P(x) => Q). ---------------------------------------------------------------------- You ask why from the fact that we are dealing with an equivalence relation we know that "aH\intersect bH not-=\emptyset" implies that aH = bH, as stated in the sentence on p.165 before Def.3.8.3. If aH\intersect bH is nonempty, that means some element c belongs to both aH and bH. But by Prop.2.2.3, p.65, c belongs to exactly one of the equivalence classes, so aH and bH are the same equivalence class. ---------------------------------------------------------------------- You ask about the statement in the 3rd-from-last line of the 3rd paragraph of p.166, that "an element in aH determines the same coset". That's a case of Prop.3.8.1: If b is any element in aH (i.e., condition (3) of that proposition), then condition (1) of the proposition says that the coset determined by b, namely bH, is the same coset (the one we started with), aH. This is an instance of a general property of equivalence relations, Prop.2.2.3, p.65. If b belongs to the equivalence class of a, then since that proposition says that it belongs to just one equivalence class (with respect to the given equivalence relation), the equivalence class it determines must be the same as the equivalence class of a. ---------------------------------------------------------------------- You ask about the line just before Example 3.8.3 (p.167), in which the authors say that in an abelian group written additively, Prop.3.8.1 shows that a+H = b+H if and only if a-b\in H. To check this, you needed to translate the Proposition into the notation used in such a group, writing "+" instead of multiplication, and "-" in place of the inverse notation. Please show me what you got when you tried that translation, so that I can help you get past whatever blocked you from seeing that this implied the statement that the authors gave. (The proposition says four conditions are equivalent, while the assertion you ask about just concerns two conditions, so after translating the proposition, you needed to look for the two conditions most relevant to those in that statement.) ---------------------------------------------------------------------- Regarding the last line of p.167, "Since G is abelian, the right cosets are precisely the same as the left cosets", you ask "Are all abelian groups normal?" There is no concept of a "normal group"! If you look at the definition of "normal", it refers to a _subgroup_ H being normal in a group G. So the concept is relative -- a group H may be normal in one group G_1 that contains it, but not normal in another group G_2 that contains it. So you need to re-think what you mean by your question. Let me know your re-stated question. ---------------------------------------------------------------------- You ask how the authors go from (ab)^-1cd to (b^-1d)(d^-1a^-1cd) in the last line of the proof of Prop 3.8.4, p.169. Simplify each of those expressions (using the formula for the inverse of a product in the first, and dropping parentheses in the second) and compare the results! ---------------------------------------------------------------------- You ask about the verification of the formula eN aN = aN in the proof of Theorem 3.8.5, p.169.
It is, as you suggested, simply an application of the definition of the multiplication of G/N, i.e., the first display on that page. The same applies to the other calculations in that paragraph. ---------------------------------------------------------------------- You ask whether the results that allow us to make G/N into a group (p.169) work in some cases with a non-normal subgroup H in place of the normal subgroup N. No. One can show that if H is not normal, then the rule used to define the group structure in the normal case never gives a well-defined function. However, one can define a different kind of structure on the set G/H of left cosets of H -- an operation of G on that set. We'll come to that in a few weeks. ---------------------------------------------------------------------- You ask about the phrase "set theoretic product" in Prop.3.8.3(3) p.171. Thanks for pointing out -- it's not good usage on the authors' part, and I'll mention it to them. What they mean is the product in the sense of Definition 3.3.1, p.117. Since that definition applies to any two subsets S and T of G, they refer to it here as the "set-theoretic" product. But properly, "set theoretic" means "in the sense of set theory"; since this product uses the group multiplication of G, that term is not appropriate. ---------------------------------------------------------------------- Regarding the sentence before Def. 3.8.10, p.173 saying that phi "is either one-to-one or trivial" you ask, "If the function is trivial, then it would still be a group homomorphism?" The key question is whether you understand what the authors mean by the "trivial" map. They haven't defined it explicitly, but they've defined related uses of the word "trivial", and you should be able to guess from the reasoning here what they mean by it here. Tell me what you understand them to mean by it (and why), and check the definition of "homomorphism" to see whether the map in question fits that definition. ---------------------------------------------------------------------- You ask how the authors come up with the set {a+b sqrt 2 | a,b \in Q} in Example 4.1.1, p.180. The idea is to think of what happens when you take all rational numbers and the number sqrt 2, and start performing the operations of addition, subtraction, multiplication and division, to get more and more numbers. I hope it is clear that using addition and multiplication repeatedly, one gets at least that set of numbers. If one figures out what happens using further additions and multiplications, one sees that the result can always be simplified back to that form. (If that isn't clear to you, try an example! Note, by the way, that a rational number itself is the special case of a+b sqrt 2 where b is 0, so getting a rational result doesn't carry one outside the set; likewise the set contains the numbers b sqrt 2.) One easily sees that the set is also closed under subtraction; the book shows that it is closed under division. So it is exactly the desired set. In general, when the book discusses an example, they expect you to experiment for yourself if appropriate, either on paper or in your head, and see what you find. ---------------------------------------------------------------------- You ask about the assertion on p.180, ex 4.1.1 that {a + b sqrt 2 | a, b\in Q} is the smallest subfield of the complex numbers containing Q and sqrt 2. You need to ask, "What needs to be verified to show that the field described is the smallest one containing the indicated elements?" 
which in turn comes down to the question of what that statement means. It means that the set described is a field (more precisely, is a subfield of the complex numbers) containing those elements, and is contained in every field containing those elements. In proving this, the authors devote most of their work to the hard part: Showing that the set they describe forms a subfield. That it contains the given elements is clear. The condition you ask about, which means that every subfield containing those elements contains that set, is implicit in the discussion at the beginning of the example, where the authors get all the elements of the set by field operations from the given elements. If all the elements of the set arise from the given elements by the field operations, they must lie in every field containing those elements. ---------------------------------------------------------------------- You ask about the difference between thinking of x as an "indeterminate" and as an "unknown" (p.184). Well, if we call it an "unknown", that suggests that it is some member of F, and we just don't know which. But if we interpreted it that way, then we would have to say things such as x^5 = x in Z_5[x], because that equation holds for each element of Z_5. Instead, we want to build a system F[x] in which two expressions f(x) and g(x) are not considered the same unless the coefficients of each power of x are the same in f(x) as in g(x). (As I said in class, an important reason for doing this is so that we will be able to substitute for x not only elements of the given field F, but also elements of new fields that we build containing it, as the complex numbers were built over the real numbers.) "Indeterminate" is a somewhat arbitrary term that was set up to express this idea. Don't think of it as meaning "something that isn't determined", as the word appears to mean; just think of it as a name we use for the "x" in a polynomial f(x). ---------------------------------------------------------------------- Regarding the point made on p.184 and in class, that we regard polynomials as distinct if they are distinct as expressions, rather than as functions, you write > ... I was wondering whether it is assumed that all expressions are > written in the standard notation before considered for equality. For > example, is x + 15x^2 - x + x^2 considered to be the same expression > as 16x^2? Good point! As Definition 4.1.4 shows, a standard arrangement is assumed; and as noted in the second paragraph of the next page, there even has to be a qualification to using that standard arrangement; for instance, x + 2 and 0x^2 + x + 2 are both in the standard form, but are considered the same polynomial by the criterion of that paragraph. Another way of treating these questions would be to allow all sorts of expressions, but to consider two expressions equal if and only if their equality could be derived as a consequence of the identities on p.186. (In formal terms, one would take the set of all expressions modulo the equivalence relation induced by repeated applications of those identities.) One still would want a convenient way to tell whether two expressions were equivalent, and one would prove a theorem saying that any two expressions could be reduced to a standard form, and that two expressions were equal if and only if the standard forms were the same.
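Here, by the way, is a little sketch (mine, not the book's) of the kind of reduction to standard form being described; the representation of an expression as a list of (coefficient, exponent) terms, and the helper standard_form, are just my own choices for illustration:

    def standard_form(terms):
        """Collect the terms of a sum of c*x^e pieces, dropping zero coefficients."""
        coeffs = {}
        for c, e in terms:
            coeffs[e] = coeffs.get(e, 0) + c
        return sorted(((c, e) for e, c in coeffs.items() if c != 0),
                      key=lambda t: -t[1])

    # x + 15x^2 - x + x^2  and  16x^2  reduce to the same standard form:
    print(standard_form([(1, 1), (15, 2), (-1, 1), (1, 2)]))  # [(16, 2)]
    print(standard_form([(16, 2)]))                           # [(16, 2)]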
The name for such a standard form is a "normal form"; I discuss these ideas, for groups rather than polynomials, in Chapters 1-2 of my Universal Algebra course notes, /~gbergman/245 . ---------------------------------------------------------------------- You ask why the zero polynomial isn't counted as having degree zero like other polynomials a_0 (p.185, top). This is for the same reason that, say, the polynomial x isn't counted as having degree 2 like other polynomials a_2 x^2 + a_1 x + a_0: To make the degree function behave properly under addition and multiplication of polynomials, we have to define it as the largest n such that x^n has _nonzero_ coefficient, and there isn't any such n for the zero polynomial. Various conventions are used to handle that case: One may leave it undefined, as the book does, or make it -infinity, as they mention some authors do, or make it -1, as you say your 110 text does. I'll try to say more about this in lecture. Nevertheless, the authors ought to have defined "constant polynomial" so as to include 0! I'll write them about that. Thanks for pointing it out. ---------------------------------------------------------------------- You ask what is meant by the phrase "the coefficient field" in the first paragraph of p.185. When we consider the set F[x] of polynomials with coefficients in the field F, then F is called the coefficient field. ---------------------------------------------------------------------- Regarding Example 4.1.4, p.185, you ask whether two different polynomials f(x) and g(x) over an infinite field F can define the same polynomial function. No. Since f(x) and g(x) are different polynomials, f(x)-g(x) is a nonzero polynomial, so it can only have finitely many roots. (Because it can only be divisible by finitely many polynomials of the form x-r.) So not every element of the infinite field F can be a root of f(x) - g(x); and taking a c which is not a root, we have f(c) - g(c) nonzero, so the polynomial functions defined by f and g differ at c. ---------------------------------------------------------------------- You ask why, as stated in the top paragraph of p.185, -infinity should be chosen as the degree of the zero polynomial. Because the zero polynomial has no terms in x^0, x^1, x^2, etc., it ought to have degree less than every natural number. As you note, this would allow any negative integer. But the advantage of -infinity is that it makes the equation deg(fg) = deg(f) + deg(g) hold even when f or g is 0. Another way of looking at this is that polynomials are just a subsystem of a larger system of expressions, in which x can have both positive and negative exponents; expressions like x^2 + 5 + x^-1 and x^-1 + 2x^-2. (These are known to mathematicians as "Laurent polynomials".) One can define "degree" on such expressions, still meaning the greatest n such that there is an x^n term (so the two expressions just noted would have degrees 2 and -1 respectively). Now if we look at the expression 0 in that system, we see that it should have degree < 0 because it doesn't have any nonnegative-exponent terms, it should have degree < -1 because it also doesn't have any x^-1 terms, etc. etc.; i.e., it should have degree less than every negative integer; -infinity will work. Note also that the authors don't say they will define 0 to have degree -infinity; they just say that it "is often assigned -infinity as a degree." I.e., some authors make that convention.
I really wish our authors did so in this text, but they don't; they just comment sometimes that if that convention were made, some calculation would be simpler. ---------------------------------------------------------------------- Regarding the construction of the complex numbers (p.445), you ask whether, after passing from the field of real numbers to the field of complex numbers, one can go still further and get a larger interesting extension field. I answer that at length in my "answers to questions of the day" file from Math 104; you might want to look at it. The file is /~gbergman/ug.hndts/06x2+03F_104_q+a.txt . If you use a browser like Firefox that allows you to search for a phrase in a file, search for "construction of the complex numbers", which occurs in the first line of that answer. If you can merely search for single words, look for "complex", and then find the second answer after the one where that first occurs. If, finally, you can go to a specific line-number (though I don't know how to do that), go to line 592 in that file. If you can't do any of these easily, let me know and I'll copy that reply for you. ---------------------------------------------------------------------- You ask how, in the proof of Theorem 4.1.9, p.188, one uses Proposition 4.1.5 to get the fact that if q(x) - q_1(x) is nonzero, then (q(x) - q_1(x))(x - c) has degree >_ 1. To help you, I need to know how far you got on your own. Did you see that Proposition 4.1.5 is about the degree of a product, and that the assertion you are asking about is also about the degree of a product? If so, did you try to apply the proposition to that product? If so, what did applying the proposition give you? As I say in the class handout, if you were asking the question at office hours, I could query you to see where you had gotten on your own and what needed to be clarified. But it's much harder to do that by e-mail, so it's best if you explain how far you've been able to get in your question. ---------------------------------------------------------------------- You ask whether Theorem 4.2.1, p.193, applies to "infinite polynomials". The expressions you are talking about are not called polynomials; they are called power series. (See the definition of a polynomial, Def.4.1.4, which specifies that the terms range from a constant term to an x^m term for some positive integer m.) The algebraic properties of power series are quite different from those of polynomials. One can prove a division algorithm for these -- based on the lowest, rather than the highest degree term with nonzero coefficient. But we will not be studying power series in this course. ---------------------------------------------------------------------- You ask about the first step of Exercise 4.2.2, p.194, where the authors transform 6x^4 - 2x^3 + x^2 + 5x - 18 to 6x^4 + 5x^3 + x^2 + 5x + 3. Remember that the authors have said they are using a simplified notation. If what they do is not clear in that simplified notation, you need to rewrite things using the original notation for Z_7. In that notation, the original expression for f(x) becomes [6]_7 x^4 + [-2]_7 x^3 + [1]_7 x^2 + [5]_7 x + [-18]_7. Now recall from Chapter 1 that each element of Z_n is equal to one of [0]_n, [1]_n, ..., [n-1]_n. Two of the coefficients in the above expression aren't in that form, namely [-2]_7 and [-18]_7. So the authors rewrite them in that form. Is the way they do so now clear?
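(If you want to check such coefficient rewriting by machine, here is a one-line Python check -- mine, just for illustration: each coefficient is replaced by its remainder on division by 7.

    coeffs = [6, -2, 1, 5, -18]        # 6x^4 - 2x^3 + x^2 + 5x - 18
    print([c % 7 for c in coeffs])     # [6, 5, 1, 5, 3], i.e. 6x^4 + 5x^3 + x^2 + 5x + 3

Python's % operator already returns the representative in {0, ..., 6}, which is why -2 comes out as 5 and -18 as 3.)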
---------------------------------------------------------------------- You ask about the "multiply and divide by 2" step on p.195 at the end of Example 4.2.2. At the beginning of the exercise, the authors pointed out that since it is easier to divide by a monic polynomial than a non-monic one, they would divide not by g(x) = 2x + 4 but by G(x) = g(x)/2 = x + 2, and compensate for this at the end. (They didn't introduce that capital-letter notation; I am doing it to clarify what is going on.) So they got a result f(x) = Q(x)G(x) + r(x). They now want to express f(x) as a multiple-with-remainder of g(x) rather than of G(x); which is easy because every multiple of G(x) is also a multiple of g(x) = 2G(x), and vice versa. Computationally, the polynomial Q(x)G(x) is equal to (Q(x)/2)(2G(x)); so letting q(x) = Q(x)/2, one gets f(x) = g(x)q(x) + r(x), as desired. All that is left is noting that 2G(x) is indeed g(x) = 2x+4, and calculating Q(x)/2. It wasn't clear from your question whether the reasoning, which I've indicated above, or the calculation of Q(x)/2, was what you needed explained. If it was the latter, show me what you get, or what difficulty you run into, when you try to calculate Q(x)/2, and I will see what needs to be clarified. ---------------------------------------------------------------------- You ask whether the conclusion of Theorem 4.2.2, p.195, simply means that I consists of "the set of polynomials that are equal to two polynomials multiplied together". No; you have to notice the quantification of the factors "q(x)" and "d(x)" in the description of that set. "d(x)" is fixed: as the preceding sentence says, it is a nonzero polynomial in I of minimal degree. On the other hand, q(x) ranges over all elements of F[x]. So I consists precisely of all multiples of d(x). The authors mention that this result parallels Theorem 1.1.4; I suggest going back to that and making sure you understand that theorem. ---------------------------------------------------------------------- You ask about the book's statement at the bottom of p.198 that x^2 + x + 1 is irreducible as a polynomial over Z_2. First remember that when they speak of x^2 + x + 1 as a polynomial over Z_2, they really mean the polynomial [1]_2 x^2 + [1]_2 x + [1]_2 in Z_2[x]. As to how one shows that it is irreducible: The only polynomials of degree 1 over Z_2 are [1]_2 x + [0]_2 and [1]_2 x + [1]_2; so if [1]_2 x^2 + [1]_2 x + [1]_2 factored in Z_2[x], it would have to be a product of two of these. Neither of the factors can be [1]_2 x + [0]_2, since that would imply that the constant term of the product was zero; so the only possibility is ([1]_2 x + [1]_2)^2. But that gives [1]_2 x^2 + [0]_2 x + [1]_2, which is not the same as [1]_2 x^2 + [1]_2 x + [1]_2. So there is no factorization. (All of this could have been written more simply by dropping "[ ]_2"s. I included them just in case the authors' omitting them was the source of your confusion.) ---------------------------------------------------------------------- You ask what is meant on p.199 by the statement that "The proof of Thm 1.2.7 can be carried over to polynomials by using irreducible polynomials in place of prime numbers and by using the degree of a polynomial in place of the absolute value of a number."
They mean that to get a proof of Theorem 4.2.9, one should read the development of Theorem 1.2.7 (starting with Lemma 1.2.6, though they don't say that), substituting "F[x]" everywhere for "Z" and "polynomial" everywhere for "integer", and replacing reference to one positive integer being larger or smaller than another, or having larger or smaller absolute value, with statements that the degree of one polynomial be larger or smaller than the degree of another. Also, though they don't say this explicitly, "positive integer" should be replaced by "monic polynomial". ---------------------------------------------------------------------- You ask how the fact that p(x) doesn't divide p'(x) is used in the line after the 3rd display on p.200. The preceding display shows that p(x) divides a(x)p'(x). To conclude that it in fact divides a(x), one uses Lemma 4.2.8 and the fact that it doesn't divide p'(x). ---------------------------------------------------------------------- You ask how the authors conclude in the line after the third display on p.200 that p(x) doesn't divide p'(x), suggesting that a constant polynomial p(x) would be irreducible by Def.4.2.6, and would give a counterexample. But if you look more closely at Def.4.2.6, you will see that it begins "A nonconstant polynomial ...". The constants are excluded from the definition of "irreducible polynomial" because they don't have the relevant behavior; e.g., the number of times a nonzero polynomial f(x) is divisible by an irreducible polynomial is always bounded; the number of times one can divide f(x) by a constant is not. As I said in class, p(x) does not divide p'(x) because the latter has smaller degree (and is nonzero since constants are excluded). ---------------------------------------------------------------------- You ask about the use of the symbol at the bottom of p.203. When you're unsure about the meaning of a symbol, you should always first look in the index of symbols on pp. 471-473! It is arranged according to the page on which the symbol is introduced, so you should start from the page where you found the symbol you are wondering about, and work back. If you look for "< ... >" starting from p.203 and working back, you will find for p.203 itself the symbol F[x]/<p(x)>, but going up just two lines, it shows that <g(x)> is the set of polynomials divisible by g(x), and refers you to p. 187. So (as I discussed in class yesterday), F[x]/<p(x)> means what you get if you start with F[x], and form in it the set of cosets of the additive subgroup <p(x)>. In the future, I hope you will remember to check the index of symbols first. If after doing so you still have a question, then you can pose it as your question of the day. ---------------------------------------------------------------------- You ask whether, in Proposition 4.3.4, p.204, any of the polynomials a(x), b(x), c(x), d(x) can be zero. Certainly! ---------------------------------------------------------------------- You ask about the statement in Example 4.3.1, p.205, that every congruence class in R[x]/<x^2+1> "can be represented by a ... polynomial of the form a+bx". When the authors say that a congruence class is "represented by" an element, they mean that the element is a representative of the congruence class, i.e., a member of the class.
(In general, we don't call a member of a set a "representative"; but in the case of a congruence class, once we know one element of the class, it determines the whole class; i.e., every element of the class gives the full information on what the class is, so it is sensible to call it a "representative".) The term was defined on p. 35 for congruence classes of integers; it is used similarly for congruence classes of polynomials. As the authors say, their assertion is a consequence of Proposition 4.3.3. Do you see that that proposition, in the case p(x) = x^2+1, gives this assertion? ---------------------------------------------------------------------- You ask about the statement on p.206, line before Theorem 4.3.6, that the inverse [a(x)]^-1 can be found using the Euclidean algorithm. The authors are discussing the situation where gcd(a(x),p(x)) = 1. They have noted that this implies that 1 can be represented as a linear combination of a(x) and p(x), and observe how to get an inverse to [a(x)] from such a representation. So the problem comes down to how to get that representation. This is sketched in Example 4.2.3, where they indicate how the Euclidean algorithm can be used to find a representation for the gcd of any two polynomials as a linear combination of those polynomials. (That example is brief, because the method has been discussed in much more detail for gcd's of integers, which work in the same way.) ---------------------------------------------------------------------- > proof of thm 4.3.6 p.206 > > ... why can't p(x) be reducible and still gcd(a(x), p(x)) =1? For p(x) reducible, gcd(a(x),p(x))=1 will hold for _some_ a(x), but not in general for _all_ a(x) subject to the conditions stated in the sentence you quote. As I said in lecture, if p(x) is reducible, it will have a factor of smaller positive degree; and if we take for a(x) this factor, assumed monic, then gcd(a(x), p(x)) will be a(x), not 1. If you look at the definition of "field", and the property of fields that we are proving here, you will see that for that property to hold, the statement must be true of _all_ such a(x). ---------------------------------------------------------------------- You ask about the statement on p.208, second paragraph, that we "identify" F with the corresponding subfield of E. I discuss the mathematical use of "to identify" near the top of p. 11 of the handout on Sets, Logic and Mathematical Language that I gave out at the beginning of the Semester. Reread the first two paragraphs about that word (the short third paragraph doesn't apply here), and let me know whether it helps. You might still have questions; if so, let me know; but at least start with that background. ---------------------------------------------------------------------- You ask why [1+x] + [1+x] = [0] in Table 4.3.1, p.209. By definition, [1+x] + [1+x] = [1+x+1+x] = [2+2x], so the question is whether 2+2x lies in the same congruence class as 0; in other words, whether it is divisible by x^2+x+1. Well, since we're in Z_2[x], the coefficients are elements of Z_2, and in Z_2, 2 = 0, so 2+2x = 0+0x = 0, so it is certainly divisible by x^2+x+1. To put it another way, it is in the same equivalence class as 0 because it is 0! (It is only in the second table that congruence mod x^2+x+1 comes in in a nontrivial way.) ---------------------------------------------------------------------- You ask about the statement "It follows that r | a_0 s^n and s | a_n r^n", at the end of the proof of Prop.4.4.1, p.211.
If you look at the preceding line, you will see that all the summands except the last one, a_0 s^n, are automatically divisible by r. Since the sum equals 0, that last summand must also be divisible by r. (It is the negative of the sum of the others.) Similar reasoning shows that the first summand, a_n r^n, must be divisible by s. ---------------------------------------------------------------------- You ask about the implication in Example 4.4.2, p.212, that "a further analysis of the proof" of Theorem 4.1.9 is needed to conclude that in that theorem, if f(x)\in Z[x], then q(x)\in Z[x] as well. The hypothesis of that theorem, and of Chapter 4 generally, is that F is a field. Not every result proved in that chapter is true if one puts another ring, such as Z, in place of a field. For instance, the Division Algorithm fails: one can't divide x by 2x+1 and get a quotient and remainder in Z[x]. Just looking at the statements of the theorems, one can't tell whether they really depend on F being a field or not. So one must, as the authors say, examine their proofs. In this case, the proof goes over with no changes with F replaced by any commutative ring, and the authors might have said "an examination of the proof" rather than the intimidating "further analysis". ---------------------------------------------------------------------- You ask why the authors choose n=1 in the case they work out in the second paragraph of Example 4.4.2, p.212. If you substitute n=0, you will find that the condition of the first paragraph of that example is exactly the conclusion of Prop.4.4.1 (but restricted to roots c that are integers). So in fact, the method of this example is a way of generalizing that proposition, where instead of considering the constant term of f (its value at 0) one considers its value at other points. It is nice to do one's calculations with small numbers, and to get small numbers one chooses values of n near to 0; for instance, 1. ---------------------------------------------------------------------- You ask why gcd(a_1,a_2,a_3) = gcd(gcd(a_1,a_2),a_3) (p.213, second line). There are various ways to see this. One is to note that an integer divides all of a_1, a_2, and a_3 if and only if it divides gcd(a_1,a_2) and a_3 (because dividing both a_1 and a_2 is equivalent to dividing gcd(a_1,a_2)), so the "greatest" integer that divides a_1, a_2 and a_3 will be the "greatest" integer that divides gcd(a_1,a_2) and a_3, giving the asserted equality. Another is to note that the gcd of a set of integers is the integer in whose factorization each prime p has the minimum of the powers to which it appears in the given integers; and verify the statement by comparing the powers to which primes appear on the two sides. ---------------------------------------------------------------------- Regarding Definition 4.4.8, p.216, you ask > Are they defining the complex nth roots as those that are not in R? No! The real numbers are a subfield of the complex numbers, so all real roots are also complex roots. Complex roots means all roots in C, real or not. ---------------------------------------------------------------------- You ask why, as stated on p.217, a polynomial f(x) of odd degree over the real numbers must have at least one real root. This is shown by looking at the polynomial function R --> R determined by f. As the variable approaches +infinity, the function will behave like its highest-degree term, going to +infinity or -infinity depending on whether the sign of that term is + or -.
As the variable approaches -infinity, the function will go to infinity in the opposite direction (because odd exponents n satisfy (-c)^n = -c^n). Using the Intermediate Value Theorem, one concludes that somewhere in between, the function takes on the value 0, i.e., has a root. ---------------------------------------------------------------------- You ask whether a commutative ring (p.225) is necessarily a group as well. The short answer is "yes". The more precise answer is that a commutative ring is a structure (R, +, .) consisting of a set R and two operations + and . satisfying certain conditions, while a group is a structure consisting of a set with one operation satisfying certain conditions; and when (R, +, .) is a commutative ring, then (R, +) is a group. ---------------------------------------------------------------------- You ask whether, following the analogy of groups, with one operation, and rings (p.225, Def. 5.1.2) with two operations, one can define objects with three or more operations. One can define such objects, but the problem is what sort of axioms to impose to make them have useful structure. Note that a commutative ring is not just a set with two abelian group structures. Both the ring operations are associative and commutative with identity element, but only one of them, addition, is a group operation because only that one has the property that every element has an inverse. Equally important, the two operations are connected by the distributive law, saying that multiplication distributes over addition, (a+b)c = ac + bc; but _not_ by a law saying that addition distributes over multiplication: (ab)+c is not assumed equal to (a+c)(b+c). If one had three operations, which ones should one assume to have inverses, and which ones should one assume to distribute with respect to which others? One can try out various combinations, but most of them will either give trivial results -- e.g., with the axioms forcing there to be only one element -- or uninteresting results -- e.g., when one considers two or more operations with no distributive laws between them. There are many other structures than groups and rings that mathematicians study -- semigroups, lattices, Boolean algebras, Lie algebras, ... but these are not found by simply trying different numbers of operations, but by discovering systems of operations and identities that describe interesting mathematically natural structures. ---------------------------------------------------------------------- Concerning the comment immediately following Definition 5.1.2', p.226, that we could allow 1=0, by allowing the zero ring R={0}, you ask whether there are any other rings than the zero ring in which 1 = 0. No, the zero ring is the only ring with this property. As the authors note in point (d) on the next page, for every element a of a commutative ring R one has a.0 = 0. And by definition, one also has a.1 = a. So if 1=0, every element a\in R satisfies a = a.1 = a.0 = 0 . So in such an R, 0 is the only element. ---------------------------------------------------------------------- You write that the definition of multiplication of polynomials in Example 5.1.2, p.228 seems very unintuitive. As I said in class on Tuesday when previewing this reading, this description of T is being built as a "model" of what people had come up with from experience in multiplying polynomials regarded as functions or expressions. That definition of multiplication is given in the last two displays on p.185.
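If it helps to see the rule in action, here is a minimal sketch of the sequence model in Python (my own notation, not the book's): a polynomial is a tuple of coefficients, lowest degree first, and the k-th coefficient of a product is the sum of a_i b_j over all i+j = k, exactly as in those displays.

    def poly_mult(a, b):
        """Multiply two polynomials given as coefficient tuples, lowest degree first."""
        prod = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                prod[i + j] += ai * bj
        return tuple(prod)

    # (1 + 2x)(3 + x + x^2) = 3 + 7x + 3x^2 + 2x^3:
    print(poly_mult((1, 2), (3, 1, 1)))    # (3, 7, 3, 2)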
This abstract model using sequences ("infinite tuples") has the advantage that one doesn't have to worry whether a polynomial is a particular set of symbols written with ink, or what; a sequence is a well-defined mathematical concept. But for the motivation of the definition, one has to go back to the "rough" ideas that it was abstracted from. (The same is true of the complex numbers.) ---------------------------------------------------------------------- You ask about the relation between the terms "coefficient ring" on p.228 and "commutative ring" (p.225). "Coefficient ring" is a term used when one is talking about polynomials: When one constructs a polynomial ring R[x] (or R[y] etc.), the ring R that one started with, from which one takes the elements that will be coefficients of powers of the indeterminate, is called the "coefficient ring". In the context of this course, that ring will always be a commutative ring; so "coefficient ring" is a term used to refer to a commutative ring R relative to its polynomial ring R[x]. ---------------------------------------------------------------------- You ask about the step three lines from the bottom of p.230, "1/(m+n sqrt 2) \in Z[sqrt 2] if and only if m/(m^2-2n^2) and n/(m^2-2n^2) are integers". The authors ought to have explained that! The key is to calculate 1/(m+n sqrt 2) by multiplying numerator and denominator by m - n sqrt 2. One gets an expression of the form a + b sqrt 2, where a and b are rational numbers; so this belongs to Z[sqrt 2] if and only if those rational numbers are integers. You should try out the computation; it's quick. ---------------------------------------------------------------------- You ask about the relation between the use of "integral" in "integral domain" (p.232) and in "integral roots". Historically, they come from the same use of the word: "integral" meant "which is an integer". The simple case is that of integral roots, roots that are integers. As to the origin of "integral domain", I think that goes back to looking at the integers as a subset of the field of rationals. As people started studying other fields F, they also looked at subrings R of F that were not fields, and thought of these as having the same relationship to F that Z has to Q. Seeing them as sets parallel to the set of integers, they named them "integral domains". That phrase actually seems to have been used before "ring" was used in its mathematical sense. Once people had the general concept of ring, they sorted out which rings could be subrings of fields: those without zero-divisors. In terms of the present-day meaning, you should think of these two uses of "integral" as unrelated (and certainly as unrelated to the use in calculus, which shares with these uses only the original sense of "integral", namely "whole".) ---------------------------------------------------------------------- You ask what the point is of the concept of integral domain (p.232). Note that the conditions in Definition 5.1.7 are statements that are true in "ordinary arithmetic", and that we are used to taking for granted. So rings for which they are not true are "exotic" until we get used to them, and certain arguments that we would like to give based on our past experience do not hold for those rings. The rings for which those conditions are true, on the other hand, so that we _can_ apply such arguments, are called "integral domains".
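One concrete family to experiment with is the rings Z_n. Here is a small brute-force check (mine, not the book's; the helper name is just for illustration) of which of these have zero divisors, i.e., nonzero a, b with ab = 0 in Z_n:

    def has_zero_divisors(n):
        return any((a * b) % n == 0
                   for a in range(1, n) for b in range(1, n))

    print([n for n in range(2, 13) if not has_zero_divisors(n)])
    # [2, 3, 5, 7, 11] -- exactly the prime moduli give integral domains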
You need to get familiar with both the "exotic" sorts of rings and with rings that have the familiar property expressed by that display; in other words, both rings that are and that are not integral domains. I suggest that you think of examples of rings you know, and classify them into those that satisfy the definition of integral domain, and those that don't. As I said in class, the phrase "integral domain" has a complicated history; you shouldn't worry about dissecting it, but just learn it as an undivided whole. ---------------------------------------------------------------------- You ask about the relationship between ring homomorphisms (p.238) and linear transformations. Given any sort of algebraic object, a "homomorphism" of that sort of object is defined to be a map of underlying sets which respects the operations. In elementary developments, one usually leaves out of the definition of homomorphisms minor conditions that can be deduced from the more major ones. For instance, the conditions that a group homomorphism respect the inverse operation and the identity element are left out because any map between groups which respects the multiplication will respect these other two parts of the structure; so to include those in the definition would be redundant. But in setting up a general theory, one realizes that the general concept of a homomorphism should be a map respecting all the structure specified for the type of object in question, and that statements that some of these conditions imply others are specific to particular sorts of algebraic object, and should merely be noted as lemmas in the theory of that sort of object. So under this general approach, the condition of respecting the identity element belongs in the definitions of both group and ring; and it happens that in the case of groups one can prove a lemma saying that a map will be a homomorphism if one merely knows that it respects multiplication, while in the case of rings, one finds that a map which merely respects addition and multiplication need not be a homomorphism. In some fields of mathematics, homomorphisms have special names that were coined before the general term was introduced: In particular, for vector spaces, the term used is "linear transformation", though "homomorphism of vector spaces" would be equally understood. (In the subject of representations of groups on vector spaces, a homomorphism between two vector spaces on which a group G acts by linear automorphisms has the curious name "intertwining operator", though this, too, just means "homomorphism".) ---------------------------------------------------------------------- You ask about the use of "pi" for the "natural projection" of a ring R onto a factor-ring R/I (pp.240, 254, etc.). The authors are using that letter because they use Greek letters for homomorphisms, and "pi" is the Greek letter corresponding to the initial of "projection". No connection with the number pi ! ---------------------------------------------------------------------- You ask whether, in the line after the second display on p.241, "phi(1)" should be "phi_u(1)". Right! I'll point this out to the authors. Thanks for noting it! ---------------------------------------------------------------------- You ask about the difference between the condition "phi(a)=e" in the definition of the kernel of a group homomorphism, and "phi(a)=0" in the definition of the kernel of a ring homomorphism (p.242); and whether one can use the former notation instead of the latter.
Groups can be written either additively or multiplicatively, and their identity elements can be denoted by 0, 1, e, or sometimes other letters. When one wants a notation that is understood to cover all cases, one denotes the operation multiplicatively and usually calls the identity element e. Hence the definition of the kernel of a group homomorphism was stated using that letter, but is understood to be applicable to cases where other symbols are used. For rings, on the other hand, we have two operations, with a certain asymmetric relation between them. One of these is always denoted "+", and its identity element "0", while the other is always written multiplicatively, and its identity element is usually denoted "1" (although in this case there are exceptions, such as rings of matrices, where the multiplicative identity is called "I"). The kernel of a ring homomorphism phi is defined to be the set of elements which are mapped by phi to the _additive_ identity element. Hence in writing down that definition, one must write "... = 0", not "... = 1" or "... = e", to show that we mean the identity element of the operation "+", not of the multiplication. ---------------------------------------------------------------------- You ask whether the kernel of a homomorphism R --> S (p.242) is a subring of R. It is not a subring under the definitions we are using in this course, since (unless it is all of R) it does not contain the multiplicative identity element of R. In fact, the main reason that so many undergraduate textbooks do not require an identity element in their definitions of rings and subrings is so that under their definitions, ideals (which is what kernels of homomorphisms are) will be subrings, thus strengthening the analogy between factor-groups of groups by normal subgroups, and factor-rings of rings by ideals. But that one benefit is outweighed by a lot of disadvantages of leaving "1" out of the definition of rings and subrings; so I think our authors' choice (which is also the preference of most workers in ring theory) is the better one. ---------------------------------------------------------------------- You asked about the various uses of "/" we have been seeing with rings (namely, F[x]/<p(x)> in Def.4.3.2, p.203; R/ker(phi) in Theorem 5.2.5, p.244, and R/I in Theorem 5.3.5, p.253). I hope that what I said in class made it clear that these are all the same usage. On the one hand, in each case we take the additive group of the ring, and form the factor-group (Def. 3.8.6, p.169) of that group by the subgroup in question, getting a new additive group (so all these instances of "/" come from the use of "/" in group theory), and we then make that additive group into a ring by defining an appropriate multiplication. On the other hand, the constructions F[x]/<p(x)> and R/ker(phi) are special cases of the construction R/I, since <p(x)> and ker(phi) are ideals of the rings F[x] and R respectively. So basically, we have one construction "G/N", for groups, and a construction "R/I" for rings that starts with that group-theoretic construction, and adds more structure to it. ---------------------------------------------------------------------- You ask about the statement after the display in the middle of p.246 that "If deg(pi-hat(f(x)))=deg(f(x)), then this gives a nontrivial factorization of pi-hat(f(x)) in Z_n[x]". The display gives a factorization of pi-hat(f(x)); the question is whether it is a nontrivial factorization, i.e., a factorization into polynomials of smaller degree.
The polynomials pi-hat(g(x)) and pi-hat(h(x)) on the right-hand side have degrees less than or equal to the degrees of g(x) and h(x) respectively, which are less than the degree of f(x) (since f(x)=g(x)h(x) was assumed a nontrivial factorization of f(x)). So if the left-hand side has the same degree as f(x), we have the desired inequality of degrees. That is what the authors are pointing out. ---------------------------------------------------------------------- You ask about the relationship between the last two displays before Example 5.2.13 on p.247. You need to read the words of the book carefully. Although the last of those displays is introduced by "It then follows ...", they are not saying that it follows from the preceding display. That preceding display is given as the justification for the assertion two lines above it, that "an element (a_1,...,a_n) ... is a unit if and only if each component a_i is a unit ...". (The fact that that display is a justification of this can be seen from the fact that it is introduced by the words "This can be shown by observing ...".) So when, after that display, the authors say "It then follows easily ...", they mean that the final display follows easily from the statement they have just justified, that (a_1,...,a_n) is a unit if and only if each a_i is a unit. Does the argument make sense now? ---------------------------------------------------------------------- You ask what, precisely, an n-tuple (used in the definition of direct sum, p.247) is. The short answer is: a list of length n. I assume you're familiar with the phrase "ordered pair" for a "length-2 list", (a,b). Similarly, a length-3 list (a,b,c) used to be, and is still sometimes, called an "ordered triple", and a list of length 4 an "ordered quadruple". For higher numbers, the forms ended in "-tuple" (e.g., quintuple), from which mathematicians generalized to "n-tuple", and then extended this usage backwards, so that "triple" is now usually replaced by "3-tuple", "quintuple" by "5-tuple", etc.. Set-theorists, who don't want to introduce more "primitive notions" than are needed, introduce a way of regarding these "lists" in terms of other mathematical notions: Namely, they define an n-tuple (a_1,...,a_n) to be a function on the set {1,...,n}; so its value at 1 is called a_1, etc.. (Actually, for their own reasons, they prefer to use {0,...,n-1} and write (a_0,...,a_{n-1}); but if a tuple is written (a_1,...,a_n), as most mathematicians prefer to do, then set-theorists would agree that it is a function on {1,...,n}.) To complicate things further, they define "function" to be a certain sort of set of ordered pairs; so this means that they have to start with a notion of pair before they can construct that of n-tuple. So set-theorists have two notions of "ordered pair": one that they define in a different way, and one that is the n=2 case of their definition of n-tuple. But you don't have to know any of these details! The idea that an ordered n-tuple is an "ordered list", written in parentheses, is all you need. If you want to see some of the additional details, you can take Math 135. ---------------------------------------------------------------------- You ask about the relation between direct sums as in Def.5.2.9, p.247, and direct sums of vector spaces. Whether one is dealing with rings, abelian groups, or vector spaces, there is a distinction between "internal direct sums" and "external direct sums" which our text does not go into (but which I sketched in class for groups).
If one takes objects (groups etc.) X_1, ..., X_n, and forms from them an object consisting of ordered n-tuples of elements, with operations defined componentwise, this is an "external" direct sum of the X_i. But one may also start with a known object X, and find certain subsets X_i such that every element of X can be written uniquely as a sum of elements, one from each X_i; and such that each X_i can be given a structure of the same sort (ring, abelian group, vector space), so that this association of each element of X with an n-tuple of elements of the X_i constitutes an isomorphism between X and the (external) direct sum of those structures. Then X is called the "internal direct sum" of the X_i. The conditions for this to hold were found for groups in Theorem 7.1.3, p.319, although for groups written non-additively the authors speak of "direct products" rather than "direct sums"; and they don't actually introduce the terminology of "internal" and "external", and they only consider there the case n=2. But I hope that result bridges for you the gap between what you saw in 110 and what the authors are defining here. ---------------------------------------------------------------------- You ask how the statement on p.247 that (a_1,...,a_n) is a unit if and only if each a_i is a unit implies that R^x is isomorphic to the direct product of the groups R^x_i. Well, strictly speaking, it proves more than that the groups are isomorphic: It shows that they are equal. I.e., it shows that an element (a_1,...,a_n) belongs to R^x if and only if it belongs to the product-set written on the right in the last display; and if you look at the definition of the group operations on the two sides, you will see that they are the same. I hope this helps. ---------------------------------------------------------------------- You ask whether the characteristic (p.248, Def. 5.2.10) of an integral domain should have to be zero, since if n . 1 = 0, with 1 not equal to 0, the factor n must equal 0. When one writes "n . 1" in the definition of characteristic, the "n" is a member of Z, while the "1" is a member of R; so we are not dealing with a product of two elements of R, which is the situation that the definition of integral domain applies to. (Actually, for any ring R and any positive integer n, if we write 1_R for the multiplicative identity of R, and n_R for the result of adding 1_R to itself n times, then for any element x\in R, the element "n . x" can easily be shown to be the product n_R x within the ring R. Now if n is a multiple of the characteristic of R, then n_R = 0; so the fact that n . x will also equal zero becomes, in the ring R, 0 x = 0, which is not a violation of the possibility of R being an integral domain.) ---------------------------------------------------------------------- Regarding the reference on p.248, paragraph before Prop.5.2.11, you say you aren't sure what results from Chapter 3 the authors are referring to. It is the display on p.97 (the additive-notation analogs of the laws of exponents). In the proof of Proposition 5.2.11, the authors verify that these conditions imply that phi is a ring homomorphism. ---------------------------------------------------------------------- You ask about the meaning of the phrase "integral domain" in Proposition 5.2.11, p.248. When you run into a term whose meaning you aren't sure of, the first thing to do is look in the index. 
If you can't find the term there, or if the page that it leads you to does not clear up your uncertainty, then you should ask about it. If you check for "integral domain" in the index, you will see that it has nothing to do with integers. (There is a historical connection with integers, which I sketch briefly in the answer to a student's question that I have put on the "answers" web page; but the present-day meaning is simply the one defined in the book.) ---------------------------------------------------------------------- You ask whether subrings of principal ideal domains (p.252) are also principal ideal domains. Nope. They are integral domains, of course; and the book hasn't yet pointed out examples of integral domains that aren't principal ideal domains, so the authors can't address the question. However, an example of an integral domain that isn't a principal ideal domain is Z[x]. For instance, the ideal of all polynomials whose constant terms are even can be shown not to be principal. But Z[x] is a subring of Q[x], which is a principal ideal domain by Example 5.3.2. ---------------------------------------------------------------------- You ask why the factor group R/I referred to on p.253 line 3 is abelian. If phi: G --> H is a group homomorphism, and G is abelian, then its image, phi(G), is also abelian. You should be able to prove this quickly from the definition of homomorphism. Now a factor-group G/N is a homomorphic image of G (under the "natural projection" pi); so a factor-group of an abelian group G is abelian. This applies, in particular, when G is the additive group of a ring R. ---------------------------------------------------------------------- Sorry I didn't have time to talk in class about the part of the proof of Prop.5.3.7 (p.254) that you asked about, where the authors say "We must show that this correspondence preserves ideals". What they mean is that since we know that the map described gives a bijection between additive subgroups J of R containing I and additive subgroups K of R/I, to get the result claimed it suffices to show that for additive subgroups J and K that correspond in that way, J is an ideal of R if and only if K is an ideal of R/I. Since an ideal is an additive group that is closed under multiplication by elements of the ring in question, all one has to show is that if J is closed under such multiplication, then so is K, and vice versa. That is the computation they give; let me know if you find any difficulty with it. ---------------------------------------------------------------------- You ask about the use of "x^2 + I = -1 + 2x + I" in getting the first display in Example 5.3.3, p.255. Where the authors say "we can use the formula x^2 + I = -1 + 2x + I to simplify products", what they mean is that they can use that formula if needed. For the first display, as you observe, it is not needed; grouping terms is enough. For the second display it is needed. ---------------------------------------------------------------------- You ask about the equation x^2 + I = -1 + 2x + I in Example 5.3.3, p.255. Remember that a + I = b + I is equivalent to a-b \in I. So what the authors are doing here is finding a "nice" polynomial which differs from x^2 by a member of I = <x^2-2x+1>, i.e., by a multiple of x^2-2x+1.
Here "nice" means "having smaller degree than x^2-2x+1", and the fact that one can do this comes from the division algorithm for polynomials, applied with f(x) = x^2 (the polynomial we want to simplify mod I) and g(x) = x^2-2x+1 (the polynomial we want to modify f(x) by some multiple of). You should check that the division algorithm applied to these two polynomials gives remainder -1 + 2x. (The "obvious" way to write the remainder would be 2x-1, but the authors are emphasizing the correspondence a+bi <-> a+bx + I, so they write the constant term first and the x-term after it.) ---------------------------------------------------------------------- You ask about the statement in the last paragraph of p.256 that {0} is an ideal of Z that is prime but not maximal. The preceding sentence shows that {0} is prime. The fact that it is not maximal follows from the fact that there are larger proper ideals, namely nZ for any n > 1. (You ask whether the fact that it doesn't contain 1 shows it is not maximal. No; because when we speak of a "maximal ideal", we mean an ideal maximal among _proper_ ideals; so in that context, all the ideals in question, maximal and non-maximal, don't contain 1. The only ideal containing 1 is the "improper ideal", the whole ring.) ---------------------------------------------------------------------- You ask about the statement at the end of Example 5.3.7, p.258, that if p(x) is irreducible, then F[x]/<p(x)> is a field. We know this in two ways: On the one hand, from Prop.5.3.9(a) on the preceding page, since Theorem 5.3.10 (also on that page) shows that <p(x)> will then be maximal. And on the other hand, by Theorem 4.3.6. ---------------------------------------------------------------------- You ask how we know, in the 4th-from-last line of Example 5.3.9, p.258, that phi_u(F[x]) is a field. It is, as you guessed, because it is isomorphic to F[x]/ker(phi_u), which is a field by Proposition 5.3.9(a). (Any ring that is isomorphic to a field is a field.) ---------------------------------------------------------------------- Regarding the quotient-field construction (Theorem 5.4.4, p.264) you ask, "Are there any examples that are not like fractions and their operations?" In the context of this course, no: Theorem 5.4.6 shows that any field F containing an isomorphic image of D contains a field isomorphic to Q(D), and by construction, the structure of Q(D) is "like fractions and their operations". Of course, F may contain elements not in the image of Q(D) as well, just as the field of real numbers has elements other than rationals; but I assume that things that "don't come from D" aren't what you're asking about. One situation where your question has more interesting answers is in noncommutative ring theory. The noncommutative analog of a field is called a "skew field" (or "division algebra"). The case that was studied earliest was the one where a construction "like fractions and their operations" did work. (Though one has to distinguish between fractions of the form a b^-1 and those of the form b^-1 a. Some rings allow one sort, some allow the other, and some allow both.) But there are rings that can be embedded in skew fields where no such expression is possible -- e.g., where an element x y^-1 z can't be converted into a form a b^-1 or b^-1 a; where an element u v^-1 w + x y^-1 z can't be "brought to a common denominator"; where an element such as (u v^-1 w + x y^-1 z)^-1 can't be expressed without using inverses-of-expressions-containing-inverses.
I have a paper on the subject; it requires much more background than Math 113 to read, but if you want to take a look at it, it is at /~gbergman/papers/sfd_frm_mtrd.ps . There are also some cases where commutative fields can be constructed in ways that don't look like fractions, even if they are isomorphic to the construction by fractions -- e.g., the construction of the field of rational numbers as repeating decimal expressions. ---------------------------------------------------------------------- You ask about the statement on p.264 line 3, "We are given that ab' = ba' and cd' = dc'". This is the translation of the first sentence of the proof (on the preceding page), "Let [a,b] = [a',b'] and [c,d] = [c',d'] ...". ---------------------------------------------------------------------- Concerning the expression of the given homomorphism theta in Theorem 5.4.6, p.265, as the composite of two homomorphisms, phi and theta-hat, you ask why one would want to go from D to F by two homomorphisms instead of one. The central point of that theorem is the _existence_ of theta-hat; it says that if you have a way of mapping D to F by a homomorphism (namely, theta), this leads to a way of mapping the larger object Q(D) to F as well (namely theta-hat). The "factorization" of theta expressed by writing it as theta-hat phi then shows the relationship between these two maps: theta-hat behaves in a "known" way on the most easily described elements of Q(D), namely, those that come from D via phi. ---------------------------------------------------------------------- You ask about the difference between Theorem 5.4.6 (p.265) and Corollary 5.4.7 (p.266). There are two differences. One is that in the Corollary, the homomorphism D --> F is assumed to be an inclusion, so that one can write elements of F in terms of elements of D without having to use the symbol phi. More important is the assumption that every element of F has the form ab^-1, which does not correspond to any assumption in the original theorem. This means that the resulting map Q(D) --> F is actually onto. Since it is always one-to-one (as in the Theorem), this makes it an isomorphism in the case of the Corollary. ---------------------------------------------------------------------- You ask how one concludes that theta-hat is onto, in Cor.5.4.7, p.266. Check out the definition of theta-hat in this case. You will see that it comes to theta-hat([a,b]) = ab^-1. (If you don't, let me know.) So if every element of F has the form ab^-1, then every such element is in the image of theta-hat; i.e., theta-hat is onto. ---------------------------------------------------------------------- Regarding the terms "base field" and "subfield" mentioned near the end of the second paragraph of section 6.1, p.270, you ask whether there is any difference between them. Just a difference in point of view. When one is considering one or more extensions of a given field K, and looking at what polynomials elements of those extension fields satisfy in K[x], what degrees such elements have over K, what dimensions those fields have over K, etc., one calls K "the base field". One may note in such discussion that one of those extension fields E is a subfield of another of them, F, but one doesn't call E "the base field" unless one changes one's point of view and starts looking at minimal polynomials of members of F in E[x]. 
(So, for instance, if one considers Q(sqrt 2) and Q(sqrt 2, sqrt 3) as extensions of Q, one may note that the former is a subfield of the latter, but as long as one is looking at polynomials that the elements of these fields satisfy over Q, one is considering Q "the base field".) ---------------------------------------------------------------------- You ask about the reasons for the last two sentences before Def.6.1.1, p.271, relating primeness, irreducibility, and maximality in rings K[x]. These points were discussed on p.256, second paragraph before Def.5.3.8. Of course, the words "prime ideal" and "maximal ideal" weren't used in that paragraph, because that paragraph was motivating the definition of those terms. But the conditions defining those terms were used. (See also Prop 5.3.9 on the next page.) ---------------------------------------------------------------------- You ask about the difference between u "satisfying" f(x) (p.271, Def. 6.1.1) and being a "root" or "zero" of p(x). No difference. I would consider the latter terms better, since what is "satisfied" is really the equation "f( ) = 0". The book uses "root"; I don't think it uses "zero", though that is another term often used. ---------------------------------------------------------------------- You ask whether I can outline the proofs that pi and e are transcendental, mentioned on p.271. Nope! The proofs that I have seen are those in Ian Stewart's "Galois Theory" (chapter 6 in the second edition, chapter 24 in the third edition. If you try looking them up and have the choice, I recommend the second edition as a better book, generally, than the third.) They involve writing down some messy integrals and performing complicated computations, but give no hint what the idea behind those computations is; and without such an idea, I can't wrap my mind around a long proof. ---------------------------------------------------------------------- Regarding the proof of Prop.6.1.2 on p.271 you ask "Why does the fact that I is prime ideal imply that there exists only one monic polynomial p(x) with minimal degree and p(u) = 0?" The authors don't deduce that from the fact that I is prime -- any nonzero ideal I in K[x] has a unique monic generator; i.e., among all the elements of least degree, only one is monic. I thought the book made this clear, but looking back now, I can only find it stated before this in the last sentence of Exercise 5.3.2, p.252, which is a somewhat inconspicuous spot. In any case, I've pointed it out several times in class. You should see whether the reason is clear to you when you think about it. If not, ask again. ---------------------------------------------------------------------- You ask about the relationship between the polynomials p(x) in Prop.6.1.2 and Def.6.1.3 on p.271. The definition gives a name to the polynomial described in the proposition. The reason the definition doesn't refer to irreducibility or the "p(x)|f(x)" condition is that these are not required to determine the polynomial. As the third sentence of the proposition shows, the polynomial is uniquely determined by the "minimal degree having u as a root" condition. So this is all that belongs in the definition, and the proposition gives the additional facts. ---------------------------------------------------------------------- You ask why there are two different terms describing K(u_1,...,u_n) (p.273, Def.6.1.4). It's not too unusual to have different ways of saying the same thing. 
"To generate the extension field of K" is what the elements u_1,...,u_n are said to do; "to adjoin u_1,...,u_n to K" is what the mathematician studying that extension is said to do; so using each of those verbs, one gets a way of talking about the extension. However, the authors' wording, making these into two separate sentences, is awkward, and I will include in my letter to them at the end of the semester a suggestion that they shorten it. ---------------------------------------------------------------------- You ask about the difference between the expression a_0+a_1u+...+a_nu^n on p.273, 3rd-from-last paragraph, and the expression a_0+a_1u+....+a_n-1u^n-1 on on p.274, middle. Since the difference concerns the relation between the degree of the polynomial and the integer n, you have to check what "n" means in each case. On p.273, there is no "n" in the context, so "all elements of the form a_0+a_1u+...+a_nu^n" means all such expressions as n ranges over _all_ natural numbers. (Although there was an "n" in the paragraph preceding the proposition, that has no connection with this n. That was the number of elements being adjoined, while as the sentence before the proposition indicates, they are beginning the study of that situation by just looking at the case where one element is adjoined; so that "n" has disappeared.) On p.274, on the other hand, at the beginning of the middle paragraph, the authors let the minimal polynomial of u be p(x) = c_0+c_1x+...+c_nx^n; so n is the degree of that minimal polynomial; and the division algorithm allows us to reduce every polynomial in u to one of smaller degree (or 0), i.e., to the form a_0+a_1u+....+a_n-1u^n-1. See Prop.4.3.3, p.204, and the condition "deg(r(x)) < deg(p(x))" there. ---------------------------------------------------------------------- You ask about the expressions a_0+a_1 u+...+a_m u^m and a_0+a_1 u+....+a_n-1 u^n-1 in the middle paragraph on p.274. The authors first note that any expression a_0+a_1 u+...+a_m u^m can be reduced to a linear combination of 1, u, ..., u^n-1 using the relation p(u) = 0. This can be described, as you do, in terms of solving p(u) = 0 for u^n and substituting this repeatedly into the highest-degree term of the given polynomial. However, it's more convenient to look at it in terms of a process we have already studied, the division algorithm for polynomials, Theorem 4.2.1, p.193. Using that, we can write the given expression as a multiple of p(u) plus an expression of degree < n; and since p(u) = 0, the former summand vanishes, and we are left with the latter. The authors are certainly not saying that if one starts with a_0+a_1 u+...+a_m u^m and performs this process, one will be left with a_0+a_1 u+....+a_n-1 u^n-1. Rather, the idea is "Think of all expressions of the form a_0+a_1 u+...+a_m u^m that represent our given element. By the above argument, one of these will have degree < n; so choosing that to be our expression, we can assume m = n-1; and then our element is a_0+a_1 u+....+a_n-1 u^n-1". ---------------------------------------------------------------------- You ask about the converse to the statement of Proposition 6.2.1, p.277, "If F is an extension field of K, then F is a vector space over K." Well, the wording of that proposition simplifies things a bit. 
What it really means is, "If F is an extension of a field K, and we ignore the operation of multiplying members of F by other members of F except when the latter elements are members of K, then the structure on F that we are left with is a structure of vector space over K." Or briefly, given any extension field F of K, we can strip away part of its structure, and be left with a structure of vector space over K on the set F. So put that way, there isn't an obvious converse. One might ask, "Given a vector space F over a field K, must there always be a way of extending its vector space structure to a structure of extension field over K?" To that, the answer is no. For instance, because the only irreducible monic polynomials over the field of complex numbers are the polynomials x-c, one can't construct any extensions containing elements algebraic of degree > 1, from which one can deduce by the methods of section 6.2 that C has no algebraic extensions F with [F:C] > 1. So a 2- or 3-dimensional vector space over C, for instance, can't be made into an extension field. ---------------------------------------------------------------------- You ask whether one can prove directly the implication (2)=>(3) of Prop.6.2.4, p.277, namely that if u\in F belongs to a finite extension of K, then K(u) is a finite extension of K. Certainly. Suppose u\in F belongs to a finite extension E of K. By definition, K(u) is the _smallest_ extension of K in F containing u, so it is contained in E; so being contained in a finite dimensional subspace E of F, it itself must be finite dimensional. (I see that in the authors' very brief development of the theory of dimensions of vector-spaces in appendix A.7, they don't explicitly say that a subspace of a finite dimensional vector space is finite dimensional. But that is easily deduced from the first conclusion of Corollary A.7.9. If you want more details as to how, let me know.) ---------------------------------------------------------------------- You ask why Theorem 6.2.5 (p.278) doesn't apply the concept of the degree of an extension to infinite extensions as well. Because this course doesn't assume familiarity with the set theory needed to talk about cardinalities of infinite sets. From your Class Questionnaire, I see that you have taken Math 104; but most students in the class haven't; and even 104 doesn't talk about multiplying infinite cardinalities, as would be required to make sense of the formula [F:K] = [F:E] [E:K] when the extensions are infinite. That comes in Math 135. The formula is true for arbitrary extensions, but we don't have the concepts to state it. ---------------------------------------------------------------------- You ask why, in examples like 6.2.2, p.279, the roots of a polynomial f(x)\in K[x] don't have to lie in K. It's true that in Def.4.1.10, p.189, we defined a root of a polynomial f(x)\in K[x] to be an element of K. But that was before we saw the idea of one field being contained in another. Now that we have that concept, whenever K is a subfield of F, a polynomial f(x) over K can also be looked at as a polynomial over F, and we can apply Def.4.1.10 to it from that point of view, and so ask what roots it has in F. So for instance, we know that x^2 + 1 has no roots in R, but has two roots in C. Much of what we have been doing from section 4.3 on is studying the situation of a polynomial over a subfield having roots in an extension field.
The authors actually revised Def.4.1.10 to include this case, in the first paragraph of Example 5.2.11, p.245. ---------------------------------------------------------------------- You ask about the statement in Example 6.4.3, p.290, "Then the field is identified with the cosets of the form a + ". Where they say "identified with the cosets", it would be better if they said "identified with the set of cosets". In any case, note that they also say "See Example 4.3.2 and Corollary 4.4.10". The relevant one of those two is Example 4.3.2; and in fact, the point they are referring to is made in the paragraph after that Example. Try re-reading Example 4.3.2 (p.207), and the paragraph after it, and see whether it clears up what they mean. If not, let me know. ---------------------------------------------------------------------- You ask why, when E is an extension field containing a splitting field F of f(x) over K, one says that f(x) "splits" over E, as noted in the paragraph after Definition 6.4.1, p.289. The idea of this use of the word "to split" is "to break up into a product of linear factors"; and if E contains F, then the factorization of f(x) into linear factors in F[x] is also a factorization in the larger ring E[x], so f(x) indeed splits over E. The phrase "splitting field of f(x) over K" can be thought of as short for "minimal extension of K over which f(x) splits". Fields that contain such a splitting field still have the property that f(x) splits over them, but they themselves aren't splitting fields because they aren't minimal for that property. ---------------------------------------------------------------------- You ask whether the polynomial f(x) of Def.6.4.1, p.289 can be a minimal polynomial. Assuming n>1, f(x) cannot be a minimal polynomial _over_F_, since it is not irreducible over F. But it can be a minimal polynomial _over_K_. For instance, in Example 6.4.1, x^2 + 1 is the minimal polynomial of i over Q; but after we have formed its splitting field and gotten the field Q(i), the minimal polynomial of i over that field is just x - i. ---------------------------------------------------------------------- You ask about the last paragraph of Example 6.4.2, p.290. In computing the degree of the splitting field of x^3 - 2 over Q, they are using Theorem 6.2.5 (p. 278). Can you see which fields they are taking for the "F", "K" and "E" of that theorem in this discussion? Do you see what values they are asserting [F:E] and [E:K] have? Do you see how they get those values? If so, then by applying that theorem to those values you will get the value 6 as they assert. If you don't see the answers to all of the above questions, or don't get the result I have stated, let me know where you run into trouble. ---------------------------------------------------------------------- You ask what is meant by saying that splitting fields are "unique up to isomorphism" (p.291, paragraph before Lemma 6.4.3, and p. 293, paragraph before Theorem 6.4.5). It means that splitting fields (of the same polynomial) are all isomorphic to each other. This sense of "up to" is one of the bits of mathematical language discussed in the last section of the Handout on Sets and Logic. It is likely to come up in your other courses as well, so I suggest that you look it over. ---------------------------------------------------------------------- You ask how, in the statement of Lemma 6.4.3, p.291, theta can take the polynomial p(x) to a polynomial, when it is a homomorphism between fields. Good point. 
Where the authors write "under theta" in the middle line, they mean "under applying theta to the coefficients." A formal way to look at this is in terms of Prop.5.2.7, p. 244, with K in the role of "R" and L[x] in the role of "S". Since theta takes K into L, which is contained in L[x], it can be regarded as a homomorphism K --> L[x]. Then in the language of that proposition, we have a homomorphism theta-hat_x, K[x] --> L[x], which takes coefficients in K to L[x] (giving values in the subring L) via theta, and "substitutes x for x", i.e., keeps x unchanged. This is the map that is really being applied to p(x) to get q(x). ---------------------------------------------------------------------- You ask whether the prime subfield of F, defined on p.295, is unique. Yes. Whenever something is defined as the "smallest" set having some property, then it is unique, because this means that it has the property and is contained in all others that have the property; so if X and Y each satisfy that description, then each is contained in the other; so they are the same. (There are some cases similar to the one just described in which one has to put on a proviso. For instance, when one speaks of a "least common multiple" of two elements of an integral domain, this means a common multiple which is a divisor of all other common multiples; and if two elements are divisors of each other, that doesn't exactly make them equal: it makes them what, in the next reading, we will learn to call "associates". So in general integral domains, the l.c.m. of two elements, if it exists, is "unique up to associates" rather than literally unique. However, when, as in the definition of prime subfield, we are dealing with set-inclusion rather than any more subtle relation, we can say that a smallest set with some property, if it exists, is truly unique.) ---------------------------------------------------------------------- You ask how one knows in Example 6.5.2, p.298, that 0 + <x^2+x+1> and 1 + <x^2+x+1> form the prime subfield. From the proof of Prop.6.5.1 on p. 295 (or the proof of Corollary 5.4.8 on p. 266) one sees that the prime subfield of a finite field F (or more generally, of a field of characteristic > 0) consists precisely of the elements n.1, where n\in Z, and "1" denotes the unit element of F. (For F a field of characteristic 0, it is instead the isomorphic copy of Q contained in F; again see Corollary 5.4.8). In this example, where F is GF(2^2) = Z_2[x]/<x^2+x+1>, the characteristic is 2, so the prime field will consist of the elements 0 and 1 of the field; their descriptions as cosets are 0 + <x^2+x+1> and 1 + <x^2+x+1>. ---------------------------------------------------------------------- You ask whether the converse of Example 7.1.3 on p.318 holds: that if Z_n / m Z_n =~ Z_m, then m|n. To answer this, note that m Z_n is a cyclic subgroup of Z_n, generated by [m]_n. The book determined the order of any such element [m]_n. This will give the order of the group it generates, m Z_n; and knowing the orders of Z_n and m Z_n, you can find the order of the factor group Z_n / m Z_n, and see whether it is the same as the order of Z_m. Since they are both cyclic groups, they will be isomorphic if and only if they have the same order. So you can work this out, and see whether this happens if and only if m|n. If you try this out but run into difficulty, you can show me how far you've gotten, and I can help you finish.
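(If you would like a quick numerical sanity check while you work that out, here is a minimal sketch in plain Python -- not a proof, and the helper names are just made up for the illustration. It computes the order of [m]_n by brute force, then the order of the factor group, and compares with the order of Z_m, using the fact mentioned above that two finite cyclic groups are isomorphic if and only if they have the same order:

    def order_of_class(m, n):
        # order of [m]_n in the additive group Z_n: smallest k > 0 with k*m = 0 mod n
        k = 1
        while (k * m) % n != 0:
            k += 1
        return k

    def quotient_iso_to_Z_m(n, m):
        size_mZn = order_of_class(m, n)   # |m Z_n|, the subgroup generated by [m]_n
        size_quotient = n // size_mZn     # |Z_n / m Z_n|
        return size_quotient == m         # cyclic groups: isomorphic iff equal orders

    for n in range(1, 40):
        for m in range(1, 40):
            assert quotient_iso_to_Z_m(n, m) == (n % m == 0)
    print("Z_n / m Z_n =~ Z_m exactly when m divides n, for all n, m < 40")

The assertion never fails in this range, which is consistent with the answer you should reach when you do the computation with the order formula from the book.)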
---------------------------------------------------------------------- You ask about the relationship between Theorem 7.1.3 (p.319), the direct sums you saw in Math 110, and Cartesian products. The distinctions in terminology are not completely sharp, but in general, "Cartesian product" is used for sets, while when one takes a Cartesian product of the underlying sets of two groups and puts a group structure on it as in Definition 3.3.3, p.118, one calls this the "direct product" of the groups; and if the groups are abelian and written additively, one may alternatively refer to it as their "direct sum". Three complications: First of all, the Cartesian product of the underlying sets of two groups can sometimes be given other group structures, which go by names such as "semidirect product" and "twisted product". We won't consider these in Math 113, but that fact is why we don't just use the same term "Cartesian product" for the set-construction and group-construction shown in Beachy and Blair. (We do use the same symbol for both constructions, but note that it appears twice in the index of symbols: first as a symbol defined on p.50, and then as a symbol defined on p.118, and is given the two different names indicated above.) Secondly, there are related, but logically distinct concepts of "external direct product" and "internal direct product". In the first, we take two groups G_1 and G_2 and construct from them a new group G_1 x G_2. In the second, we take a group G with two subgroups G_1 and G_2 (or as they are called in Theorem 7.1.3, H and K), and prove that its internal structure is like that of the (external) direct product of those subgroups; this is what Theorem 7.1.3 is about, though our book doesn't introduce the name "internal direct product". I'm not sure what definition of the direct sum of vector spaces you saw in Math 110; the "external" vs. "internal" distinction applies there, too. The fact that you say you never saw Cartesian products in that connection may mean that you only saw "internal" direct sums of vector spaces. Finally, although what I have said above goes over from direct sums or products of two groups or vector spaces to direct sums or products of any finite number, when one comes to infinite families, there is a distinction between "direct sum" and "direct product". Taking the case of a countable family of abelian groups A_0, A_1, A_2, ... , their direct product \Pi_i A_i is defined to be the set of all sequences (a_0, a_1, a_2, ...) with each a_i\in A_i, while their direct sum (+)_i A_i (where "(+)" is my way of representing a circle with a "+" inside it in e-mail) is defined to be the subgroup of \Pi_i A_i consisting of those elements such that all but finitely many of the a_i are zero (equivalently, which have only terms "0" from some point on). In the case of nonabelian groups, where the term "sum" is not used, this can be called the "restricted direct product". The above is probably a lot more than you wanted to know. But if you have further questions about it, let me know. ---------------------------------------------------------------------- You ask why, as stated in the third line of p.320, one has a x a^-1 = e => x = e. Given the equation a x a^-1 = e, try solving for x (which one does by multiplying both sides of the equation on the left by a^-1 and on the right by a). If you don't see that this gives the desired result, show me your computations. 
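(Spelled out, the computation that the previous sentence describes is just this short chain:

    a x a^{-1} = e
      \implies a^{-1} (a x a^{-1}) a = a^{-1} e a
      \implies (a^{-1} a)\, x\, (a^{-1} a) = a^{-1} a
      \implies x = e.

Nothing beyond associativity and the defining properties of inverses and the identity is used.)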
---------------------------------------------------------------------- In connection with the definition of Aut(G) and Inn(G) on p.320, you ask how one computes them. This course doesn't put a big emphasis on computation. In general, the method of computing Aut(G) must depend on what we know about the group. When the group has a reasonably easy-to-grasp form, the idea is to look at a generating set for the group, and see what elements of the group those generators might be mapped to so as to preserve their group-theoretic properties. For each possible choice of how the set of generators is mapped, one then tries to extend the map from the generators to all words in the generators (i.e., all members of the group) using the equations phi(xy) = phi(x) phi(y) and phi(x^-1) = phi(x)^-1. If one finds that this yields a well-defined map, and that map is one-to-one and onto, then one has an automorphism; if one doesn't, then the case one is looking at is eliminated. This gives the set of elements of Aut(G); one then computes how the automorphisms one has found are composed, getting the group operation. The simplest two examples are worked out in Examples 7.1.5 and 7.1.6. To compute Inn(G), one can use Proposition 7.1.8. ---------------------------------------------------------------------- You ask how the proof of Proposition 7.1.8, p.321, gets the final conclusion G/Z(G) =~ Inn(G). By Theorem 3.8.9! ---------------------------------------------------------------------- You ask how one proves that an isomorphism between cyclic groups maps generators to generators (p.321, Example 7.1.5). I can't tell from your question whether you tried the obvious -- namely, assume that a is a generator of a group G, let phi: G --> H be an isomorphism, and try to prove that phi(a) is a generator of H, by writing out what it means to say that a is a generator of G (definition, p.107), and deducing that phi(a) satisfies that same condition in H. If you did, how far did you get, and what difficulty stopped you? ---------------------------------------------------------------------- You ask about the difference between centralizer and normalizer, both defined on p.325. You ought to do more for yourself before you give something as your question of the day. If the definitions look roughly alike, then compare them more closely, word for word: They both begin "For any" -- but is what comes after that the same? If not, is it just a difference in how something is phrased, or does it make a real difference? If, after seeing how the definitions differ, it seems to you that the difference is inessential, then it is reasonable to pose this in your question -- saying why you think the difference is inessential, and asking whether you've missed something. But you have to do the first part of the work, so that I get a reasonable question to respond to. ---------------------------------------------------------------------- You ask about the first sentence of the proof of Prop.7.2.5, p.325, where the authors go from (b^-1 a)x(b^-1 a)^-1 = x to b^-1 a \in C(x). This is by the definition of C(x) (top of the same page). ---------------------------------------------------------------------- You ask for an example where a group G acts on a set S and the stabilizer G_x (Def. 7.3.3, p.332) of some x\in S is not just the set {e} in G.
If you look at the four examples of actions on p.331, you will see that the first and last have the property that the stabilizer of every element is bigger than {e}, assuming |S|>2 in the first example and n>1 in the last. You also ask whether a stabilizer G_x can be infinite. Yes, this will be so in the first example on that page if S is an infinite set, and in the last example if n>1 and F is an infinite field (such as R). ---------------------------------------------------------------------- You ask about the difference between the subset of S fixed by G and the stabilizer G_x of an element of S in G (p.332). If you look at how the definitions start out, you will see that one of those sets is a subset of S, while the other is a subset of G. So they can't be the same! You should then think about those two definitions, and see _which_ subset of S is represented by S^G, and _which_ subset of G is represented by G_x. ---------------------------------------------------------------------- You ask about the statement on p.333, Example 7.3.6, that the orbit Gg is the conjugacy class of g, contrasting this with the definition, which would make that orbit {ag | a\in G}. That definition was stated for a group action under which the result of a\in G acting on x\in S was denoted "ax". But in the group action discussed in Example 7.3.6, G is the "S", and the result of a\in G acting on g\in G is denoted i_a(g) or a g a^-1. So the definition of the orbit becomes {aga^-1 | a\in G}, which is indeed the conjugacy class. I agree that it is confusing that in Definition 7.3.1, the authors set up "multiplicative" notation for group actions, without even talking about other notations, and hence write "Gx" for the orbit of an element, and then, in this example, they introduce an action that can't be written multiplicatively because multiplication already means something else for elements of G, but they still write "Gg" for the orbit of g under this action! ---------------------------------------------------------------------- You ask about the statement at the beginning of the second paragraph of Example 7.3.8, p.334, that because the orbit of each coset xH is all of S, therefore S^G is empty. Look back at the definitions of "orbit" and "S^G" on p.332, and see whether you can verify that an element x\in S belongs to S^G if and only if its orbit consists of _only_one_ element. Now in this example, S has more than one element, so if the orbit of an element is all of S, that element can't be in S^G. Since this is true of every element of S, nothing is in S^G. Incidentally, the property that S has more than one element is a consequence of the assumption in the first line that H is a proper subset of G. They put that assumption into this example only so that they can say that S^G is empty; everything else they do in this example is correct without that assumption. ---------------------------------------------------------------------- You ask about the relationship between Prop.7.3.4(b) (p.334) and Prop.7.2.5 (p.325). The earlier result is a special case of the later one: for the S of Prop.7.3.4 one takes the set G, and for the action of G on S one takes the action of G on itself by conjugation. ---------------------------------------------------------------------- You ask why the function described in the proof of Prop.7.3.4(b), p.334, is "clearly" onto. Did you write down explicitly what the function is, and what its domain and codomain are?
The function maps the set of left cosets of G_x to the orbit Gx of x in S, by sending each coset a G_x to the element ax. Since the codomain, Gx, consists, by definition, of all elements ax (a\in G), we see that every such element is the image, under the function in question, of a coset, namely a G_x. The statement that every element of the codomain is the image of some element under the function is what it means for a function to be onto. ---------------------------------------------------------------------- You ask why Proposition 7.3.5(b) follows from Proposition 7.3.4 (p.334). When one has a one-to-one correspondence between two finite sets, as in Proposition 7.3.4(b), then they have the same number of elements; and the number of cosets of a subgroup H of G is the index [G:H]. (That's the definition of [G:H].) ---------------------------------------------------------------------- You ask why |S^G| is the number of orbits with just one element (p.335, proof of Theorem 7.3.6). Look back at the definitions of "orbit" and "S^G" on p.332, and see whether you can verify that an element x\in S belongs to S^G if and only if its orbit consists of _only_one_ element. If you have difficulty with that verification, let me know. You also ask what 1-element orbits look like. Well, as an example, if S consists of the points of the square, and G is the group D_4, acting on the square in the natural way, then there is one element that isn't moved by any member of G, namely the center-point of the square; so the set consisting of just that point is the only one-element orbit here. For another example, if we let any group G act on itself by conjugation, as in Example 7.3.6, then the elements g whose orbits just have 1 element are those that are not moved by conjugation by any element a; i.e., those that commute with all elements a; i.e., the elements of Z(G). ---------------------------------------------------------------------- Regarding the sentence after the proof of Theorem 7.3.6, p.335, you ask how they reduce the class equation to that theorem, together with Example 7.3.5. Where they say "Example 7.3.5", they mean "Example 7.3.6". (They make the same mistake in Example 7.3.7 (p.333), where they begin "This example is closely related to Example 7.3.5", but mean 7.3.6; so they probably changed the numbering of that example late in the process of revising the book.) Thanks for pointing this out -- I'll include it in the list of errata I send them at the end of the semester! ---------------------------------------------------------------------- You ask how the "alpha+1" condition in the definition of Sylow p-subgroup, p.338, is used in what follows. The statement that p^alpha divides |G| but p^(alpha+1) does not is equivalent to saying that p^alpha is the _largest_ power of p that divides |G|. In other words, it says that when we write |G| = m p^alpha, the factor m is not divisible by p. This is used in proving Lemma 7.4.3, where they say in the next-to-last line "by assumption, [G:P] is not divisible by p", and in the third line of the proof of Theorem 7.4.4, where they say "Since |Q| = p^alpha, p is not a divisor of [G:Q]". (Remember that for finite groups, one has [G:H] = |G|/|H|. So if G has order m p^alpha and the subgroup P has order p^alpha, then [G:P]=m.) ---------------------------------------------------------------------- You ask about the conclusion that k|m at the bottom of p.339. They have proved that k = [G:N(P)]; since P is a subgroup of N(P), this makes k a divisor of [G:P].
(Explicitly: [G:N(P)] = [G:P]/[N(P):P].) Now [G:P] = |G|/|P| = (m p^alpha)/(p^alpha) = m. So k|m. I'm sorry I couldn't give any details on the proofs in section 7.4 in lecture -- I had squeezed too much into that reading! The best thing would be for you to go through it as well as you can by yourself, making notes on what needs to be clarified, and come to office hours, where I can help with those points. ---------------------------------------------------------------------- You ask what is meant by a "maximal" p-subgroup in Theorem 7.4.4(a) (p.339). It means a p-subgroup that is not contained in any larger p-subgroup. In general, when we have a set X of mathematical entities with some sort of order-relation (in this case, ordering by inclusion), we say a member x of that set is "maximal" if there is no element y of X that is properly larger than x. Maximal elements don't have to exist; e.g., the infinite family of sets { {0}, {0,1}, {0,1,2}, ... } has no maximal element; likewise, the set of integers (under the ordering by ">") has no maximal element. But any _finite_ nonempty family with an order-relation will have maximal elements -- one, or possibly more than one. For instance, the set of 2-subgroups of S_3 has three maximal elements, the three 2-Sylow subgroups <(1,2)>, <(1,3)> and <(2,3)>. ---------------------------------------------------------------------- You say that it looks as though the conditions defining a Euclidean domain (p.409) were taken out of the blue. But aren't they really the conditions that Z and F[x] have in common that allow us to prove the division algorithm in both cases, and get the Euclidean algorithm from it? The authors' motivation of the concept on p. 408 is a little hand-wavy; but that is the point they are making. ---------------------------------------------------------------------- You ask why, in Example 9.1.4, p.410, the condition delta(m+ni) = 1 implies m+ni = +- 1 or +- i. They're talking about the function delta defined in Example 9.1.3 on the preceding page. When you substitute that definition into the equation above, does it become clear? ---------------------------------------------------------------------- Regarding the definition of "associate" on p.410, you ask about conditions for elements of Z (+) Z to be associates. Well, I would hope you would say how far you had gotten, and where you had run into difficulty in trying to answer the question. The definition of "associates" is based on the definition of unit; so did you try to figure out what the units of Z (+) Z were? Did you succeed ... ? Having twice put you through the "please reply" thing regarding your questions of the day, I won't do it this time; but the point is that the way to learn these concepts is to try to apply them; and if you ask about them in a question of the day, you should show me how far you had gotten and what difficulty you had at that point, so that I can see what you need to have explained. If in trying to answer your question you do run into difficulty, I would be glad to help -- if you show me at what step help is needed! ---------------------------------------------------------------------- You ask why the greatest common divisor of a set of elements (Def.9.1.4, p.411) is not in general unique. The authors answer this question in the paragraph after that definition: Because any associate of a greatest common divisor is also a greatest common divisor.
If you are unclear why this is so, you should take any example of a greatest common divisor of two elements, and any associate thereof, and see that it also satisfies the condition defining the greatest common divisor of those elements. Since your question refers to Euclidean domains, I suppose you are asking about the last sentence of that same paragraph. This contrasts general Euclidean domains with the particular cases of Z and F[x] referred to in the preceding sentences. The point the authors make is that for those particular rings, extra conditions were added to the definition of "g.c.d.", which had the convenient effect of picking one element out of all the elements that fit the general condition. But those were special tricks used in those two situations. In the general context of a Euclidean domain (and the still more general context of an integral domain) there is no reason to expect some special trick for every case; so one uses Definition 9.1.4 without extra conditions. And seeing how it is used, one realizes that the extra conditions weren't really relevant to the concept of g.c.d.; they just had the convenience of allowing one to say "unique" instead of "unique up to associates". ---------------------------------------------------------------------- I hope I satisfactorily answered in class your first question about the proof of Proposition 9.1.8, p.412: "If p does not divide a, how do we know that the only common divisors of p and a are units?" Briefly: The assumption that p is irreducible means that the only divisors of p that are not divisible by p are units. ---------------------------------------------------------------------- You ask how in the last line of the proof of Lemma 9.1.11, p.413, they get the conclusion "I_n = I = I_m for all n > m". From the definition of I we know that I_m is contained in I; and we have just proved that I is contained in I_m, so they are equal. Moreover, for n > m, we know from our assumptions that I_m is contained in I_n which is contained in I; so knowing that the first and last of these three terms are equal, it follows that the term I_n which lies between them is also equal to both of them. ---------------------------------------------------------------------- You ask about the relation between primitive polynomials (Def.9.2.3, p.416) and irreducible polynomials. Primitive polynomials need not be irreducible. For instance, x^2 and x^2 - 1 are primitive, but not irreducible. The question of whether all irreducible polynomials are primitive is trickier. In Def. 4.2.6, p. 198, the authors define a polynomial _over_a_field_ to be irreducible if it can't be written as a product of polynomials of smaller degree. In that context, it is true that, as you say, a polynomial of degree 1 is always irreducible. For instance, 2x + 4 is irreducible _as_a_polynomial_over_the_field_ Q. But if we consider 2x + 4 as a polynomial over Z, that definition is not applicable because Z is not a field; we have to apply Definition 9.1.7 (p. 412) in the ring Z[x], and by that definition 2x+4 is not irreducible, since it can be factored as 2(x+2). And, in fact, one finds that an irreducible polynomial of positive degree must be primitive. However, if we consider polynomials of degree 0, then this is not so: For any irreducible element p of D, if we regard p as a member of D[x], then it is irreducible but not primitive. 
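(A tiny computational illustration of the "content" idea behind this -- a minimal sketch in plain Python, with coefficient lists written constant term first, and the helper name made up just for the sketch:

    from functools import reduce
    from math import gcd

    def content(coeffs):
        # gcd of the integer coefficients of a polynomial in Z[x]
        return reduce(gcd, (abs(c) for c in coeffs))

    print(content([4, 2]))       # 2x + 4 has content 2: not primitive, since 2x + 4 = 2(x + 2),
                                 # even though 2x + 4 is irreducible as a polynomial over Q
    print(content([-1, 0, 1]))   # x^2 - 1 has content 1: primitive, but not irreducible,
                                 # since x^2 - 1 = (x - 1)(x + 1)

So, as the answer above says, "primitive" and "irreducible" are genuinely independent conditions once one works in Z[x] rather than over a field.)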
---------------------------------------------------------------------- You ask about the meaning of the last sentence before the short final paragraph in the proof of Theorem 9.2.5, p.416. That sentence is not really needed for the proof. The authors have shown, in what precedes, that the coefficient of x^{s+t} is not divisible by p. This sentence observes that this conclusion is not true of x^r for any r < s+t; so that x^{s+t} is the smallest power of x whose coefficient is not divisible by p. The idea is, roughly, that as one looks at the coefficients of powers of x, coming down to x^{s+t} from above, the number of terms b_i c_j in the computation of that coefficient that are not necessarily divisible by p gets smaller and smaller, until at x^{s+t} one can say that the number which are not divisible by p is exactly 1, allowing one to conclude that the sum of all the b_i c_j terms is not divisible by p. Once one falls below x^{s+t}, the number of terms that may not be divisible by p is 0, so the sum is divisible by p. ---------------------------------------------------------------------- You ask why the authors use the notation a/b in denoting elements of quotient fields in Lemma 9.2.6, p.417, instead of the notation [a,b]. Their point in introducing the notation [a,b] for the equivalence class of the ordered pair (a,b) in the construction of the quotient field Q(D) was to have a symbol for such elements that would not lead us to imagine that we could do arithmetic with them in the way we are used to, until we had proved that such equivalence classes actually formed a field containing D and in which [a,b] was equal to a b^{-1}, from which those familiar properties follow. Once this was done, there was no danger in replacing "[a,b]" with the notation a/b; rather, the latter notation is useful, in that it reminds us of the properties these elements have. At the bottom of p. 265 they note that they will use the notation a/b from then on. ---------------------------------------------------------------------- You ask about the last sentence of the middle paragraph of the proof of Theorem 9.2.8, p.418. The fact that f*(x) is primitive is, as you guessed, a key point: When we factor f*(x) into polynomials of lower degrees, and then factor these into irreducibles, none of those irreducibles are in D, since an irreducible factor in D would be a factor of f*(x) in D, contradicting the statement that f*(x) is primitive. So we get factors in D only from d, and primitive irreducible factors from f*(x), and since f(x) = d f*(x), when we put these factorizations together, we get a factorization of f(x). ----------------------------------------------------------------------
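(In case a small numerical check of the coefficient argument in the answer about Theorem 9.2.5 above is helpful, here is a minimal sketch in plain Python; the coefficient lists are written constant term first, and all the names are made up just for the illustration. Writing b(x) and c(x) for the two factors, with coefficients b_i and c_j as in that answer, s and t are the smallest indices at which p fails to divide the coefficients; the single point being checked is that the coefficient of x^{s+t} in the product is then not divisible by p, because b_s c_t is the only term in it that p need not divide:

    def poly_mult(b, c):
        # multiply two polynomials given as coefficient lists, constant term first
        prod = [0] * (len(b) + len(c) - 1)
        for i, bi in enumerate(b):
            for j, cj in enumerate(c):
                prod[i + j] += bi * cj
        return prod

    p = 3
    b = [6, 3, 5, 1]   # 6 + 3x + 5x^2 + x^3 : first coefficient not divisible by 3 is at s = 2
    c = [9, 2, 7]      # 9 + 2x + 7x^2      : first coefficient not divisible by 3 is at t = 1
    s = next(i for i, x in enumerate(b) if x % p != 0)
    t = next(j for j, x in enumerate(c) if x % p != 0)
    coeff = poly_mult(b, c)[s + t]   # = 3*7 + 5*2 + 1*9 = 40; only the term 5*2 = b_s c_t
                                     #   is not divisible by 3
    print(s, t, coeff % p)           # prints: 2 1 1  -- so the coefficient is not divisible by p

This is only that single-coefficient observation, of course, not the rest of the proof of Theorem 9.2.5.)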