Burnside’s counting lemma via characters

I just noticed that there is a very conceptually transparent proof of Burnside’s counting lemma via character theory! Here’s a formulation of the statement. (I have heard it is not really due to Burnside; it is usually attributed to Frobenius, and sometimes to Cauchy.)

Burnside’s counting lemma: If a finite group G acts on a finite set X, the number of orbits for the action is equal to the average number of fixed points of the group elements.

The proof I’ve seen before is a very pretty elementary counting argument. You count the cardinality of the set \{(g,x): gx=x\} two different ways. Indexing over G first, you get

\sum_{g\in G} \left| X^g \right|,

where X^g is the set of fixed points of g in X. Indexing over X first instead, organizing the sum into orbits, and invoking the orbit-stabilizer theorem, you get

\sum_{x\in X}\left|G_x\right| = \sum_{\theta\in \mathcal{O}} |\theta|\left|G_x\right| = |\mathcal{O}||G|,

where \mathcal{O} is the set of orbits, and the x in the middle expression is any element of \theta. Equating the two counts and dividing by |G|, you get the claimed result

|\mathcal{O}| = \frac{1}{|G|}\sum_{g\in G} \left|X^g\right|.

I know I’m late to the party with this, but today was the first time I noticed that the thing on the right looks character-theoretic. I wondered if the result could be understood in those terms. Of course the answer is yes, and the argument is maybe even more straightforward to remember (if less elementary) than the above. The characters do all the work.

Proof by characters: Let V be the permutation representation of G on a complex vector space with basis X, and let \chi be its character. A vector in this representation is fixed iff it has constant coefficients across each orbit; thus the dimension of the fixed-point subspace V^G is equal to the number of orbits |\mathcal{O}|.

On the other hand, the dimension of V^G can also be calculated as the multiplicity of the trivial character of G (call it \psi) in \chi, which is the inner product \langle \psi,\chi\rangle. Meanwhile, \chi(g) is exactly \left| X^g\right| since V is a permutation representation with permutation basis X. Thus,

\langle \psi,\chi\rangle = \frac{1}{|G|} \sum_{g\in G} \overline{1}\cdot \left|X^g\right| = \frac{1}{|G|}\sum_{g\in G} \left|X^g\right|.

So, equating, you get

|\mathcal{O}| = \dim V^G = \frac{1}{|G|}\sum_{g\in G} \left|X^g\right|,

as was to be shown!
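The formula is easy to sanity-check computationally. Here is a quick check in Python (the group, set, and parameters are my own illustrative choices, not from the text): the cyclic group of rotations C_4 acting on 2-colorings of a necklace with 4 beads. The orbit count, i.e. the number of distinct necklaces, equals the average number of fixed points.

```python
from itertools import product

n, colors = 4, 2
X = list(product(range(colors), repeat=n))                 # all colorings of n beads
group = [lambda t, k=k: t[k:] + t[:k] for k in range(n)]   # the n rotations (all of C_n)

# Count orbits directly: since `group` is the full group,
# the orbit of x is exactly {g(x) for g in group}.
seen, orbits = set(), 0
for x in X:
    if x not in seen:
        orbits += 1
        seen.update(g(x) for g in group)

# Total (hence average) number of fixed points over the group.
total_fixed = sum(sum(1 for x in X if g(x) == x) for g in group)
assert total_fixed % len(group) == 0
assert orbits == total_fixed // len(group)   # Burnside: 6 binary necklaces of length 4
print(orbits)
```

The identity fixes all 16 colorings, the two quarter-turns fix the 2 constant colorings each, and the half-turn fixes the 4 colorings of period 2, giving (16+2+4+2)/4 = 6.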

Cohen’s theorem on noetherian rings

I recently inherited a copy of Nagata’s book Local Rings, and I’ve already learned a new theorem!

Theorem of Cohen: A commutative ring is noetherian if and only if all its prime ideals are finitely generated.

This is cool because if you, like me, have ever been sad about the fact that noetherianity is not a local property of a ring, this is a sort of substitute. (NB: the axiom of choice, in the form of Zorn’s lemma, is involved.)

There is a naive reason you might hope the theorem is true, which is not the real reason, but it’s sort of the right idea: you might hope that because any non-finitely-generated (henceforth, “non-f.g.”) ideal is contained in a maximal ideal, non-f.g.ness must show up at a maximal ideal.

This is wrong, and actually, the statement of the theorem becomes false if you reduce the quantification from all primes to just maximals. At the bottom of this post I’ll give an example to show this, due to Cory Colbert. The problem is that a non-f.g. ideal can live inside an f.g. ideal. (Dumb example: take your favorite non-f.g. ideal inside your favorite non-noetherian ring. It’s inside the unit ideal. For a more interesting case, see Cory’s example below.) Thus, the family of non-f.g. ideals is not upward-closed, and there’s no reason for a failure of noetherianity to show up at a maximal ideal.

However, it’s approximately the right idea. Non-f.g.ness does not propagate upward because f.g.ness doesn’t simply propagate downward. But what’s true is that f.g.ness does propagate downward to an ideal from two ideals containing it that relate to it in a specific way.

Lemma: If I is an ideal in a commutative ring, and a is an element, and I+(a) and (I:a) are both finitely generated, then I is finitely generated.

Proof: Choose a finite set of generators for I+(a), say f_1,\dots,f_m, and write each one as the sum of an element of I and a multiple of a:

f_i = g_i + ar_i,

with each g_i\in I and each r_i just some ring element. Also choose a finite set of generators h_1,\dots,h_n for (I:a). Then all of g_1,\dots,g_m,ah_1,\dots,ah_n lie in I, and we will show that they generate I. Let x\in I be arbitrary. Then there is a representation

x = \sum p_if_i = \sum \left(p_ig_i + p_iar_i\right)

for x, where the p_i are some ring elements. The key observation is that

x - \sum p_ig_i = \sum p_iar_i

is in I, since both terms on the left are in I, and it follows that \sum p_ir_i is in (I:a)! Thus, there is a representation

\sum p_ir_i = \sum q_jh_j,

where the q_j are some ring elements. Then we have

x = \sum p_ig_i + \sum q_j(ah_j),

and this is the promised expression of x in terms of the proffered generators g_1,\dots,g_m, ah_1,\dots,ah_n. This completes the proof.
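To see the lemma in action in a concrete setting, here is a small numerical check in \mathbb{Z} (my own illustrative sketch, not from the text). In \mathbb{Z} every ideal is principal, say I=(n); then I+(a)=(\gcd(n,a)) and (I:a)=(n/\gcd(n,a)). Following the proof, we write the generator d=\gcd(n,a) of I+(a) as d = un + va (so g = un\in I and r = v), take h = n/d as the generator of (I:a), and verify that g and ah together generate I:

```python
from math import gcd

def egcd(a, b):
    """Extended Euclid: returns (d, u, v) with d = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    d, u, v = egcd(b, a % b)
    return d, v, u - (a // b) * v

for n, a in [(12, 8), (30, 42), (7, 5), (100, 10), (9, 81)]:
    d, u, v = egcd(n, a)           # d generates I + (a), and d = u*n + v*a
    assert u * n + v * a == d
    g = u * n                      # the "element of I" in the decomposition d = g + a*v
    h = n // d                     # generator of (I : a)
    # The lemma's conclusion: g and a*h generate I = (n).
    assert gcd(abs(g), abs(a * h)) == n
print("lemma verified in Z")
```

Of course every ideal of \mathbb{Z} is already f.g., so this only illustrates the mechanics of the generating set produced by the proof, not the substance of the lemma.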

With this lemma in hand, the proof of Cohen’s theorem can be briefly summarized as follows. Consider the family of non-f.g. ideals in a ring. It’s not upward closed, so it won’t necessarily reach the maximal ideals themselves, but if it is nonempty, it will have maximal members (this is where Zorn comes in), and what the lemma does (a sort of weaker substitute for upward-closure) is to prove that these maximal members must be prime. Thus non-f.g.ness must show up on a prime if it shows up at all. Here’s the precise argument:

Proof of Cohen’s Theorem: Obviously, a noetherian ring can have only f.g. primes. The converse is the substantive direction. We will (equivalently) prove the contrapositive: a non-noetherian ring must have a non-f.g. prime. So let A be a non-noetherian ring, and let \mathcal{F} be the family of non-f.g. ideals of A, which is nonempty by assumption. Note that given an ascending chain in \mathcal{F}, the union yields an upper bound; we see this as follows. If the union of an ascending chain of ideals in \mathcal{F} were not in \mathcal{F}, i.e. if it were f.g. as an ideal, then the (finite list of) generators would all be found in some finite collection of members of the chain, and then the greatest of these members would contain all the generators and thus equal the whole union. This is a contradiction because the members of the chain were presumed to be in \mathcal{F} whereas the union was presumed not to be. It follows that the union of an ascending chain of non-f.g. ideals is non-f.g. Thus \mathcal{F} satisfies the hypotheses of Zorn’s lemma, so it has a maximal element. Call it I.

We claim I is prime. Indeed, suppose ab\in I, but a\notin I. By the former supposition, b\in (I:a). By the latter supposition, I+(a) contains I properly. Since I is maximal in \mathcal{F}, this proper containment implies that I+(a) is f.g. Since I is not f.g. while I+(a) is, the Lemma implies that (I:a) is not f.g. This means that it cannot contain I properly, again by maximality of I in \mathcal{F}. We conclude I = (I:a). But recalling that b\in(I:a), we can now conclude that b\in I! This proves that I is prime, so there is a non-f.g. prime, completing the proof.

I promised I’d tell you about Cory’s example showing that the statement cannot be quantified only over maximal ideals. Let k be any field. The construction all takes place inside the field of rational functions over k in two variables. Let R = k[x,y]_{(x,y)}. Let S = R[y/x,y/x^2,\dots]. Note that (x)S = (x,y)S = (x,y,y/x,y/x^2,\dots)S, because every y/x^i is a multiple of x in S. It follows that S/(x)S = k, so that (x)S is a maximal ideal in S. Finally, consider the localization J = S_{(x)S}.

This is the desired ring. It is local, and its unique maximal ideal is principal (generated by x), so certainly f.g. On the other hand, it is non-noetherian: the nonzero element y is contained in the ideal (x^j) = (x)^j for all j, since y = (y/x^j)x^j; but in a noetherian local ring, the intersection of the powers of the maximal ideal is zero, by the Krull intersection theorem.

Why x^p-a doesn’t factor unless it has a root

I’ve heard the result in the title referred to as “classical,” but I’m actually not sure where to find a proof. It came up in conversation with a collaborator last week, which is why I’m thinking about it. It’s a problem in Michael Artin’s Algebra text, somewhere in the Galois theory chapter, so I have my own argument, which I’m sharing here. I have no idea how it’s usually proven. The same way? Or might there be a proof that handles the characteristic of the field in a uniform way?

Let k be a field of arbitrary characteristic. Let a be an arbitrary nonzero element of k, and let f=x^p-a\in k[x], where p is some prime number.

Theorem: Either f is irreducible over k or it has a root in k.

Proof: Let \Omega be an algebraic closure of k. In both cases below, the plan is to show that if f is reducible over k, then it has a root in k.

Case 1: \text{char}\, k = p.

In this case, f=(x-\alpha)^p, where \alpha is a pth root of a in \Omega. Suppose there exists a nontrivial factorization of f,

f = gh,

with g,h\in k[x]. Then, normalizing g,h to be monic, we have

g=(x-\alpha)^m, h=(x-\alpha)^n,

with m,n\geq 1 and m+n=p. This implies \gcd(m,n)=1, since any common divisor of m and n divides p but is less than p. Also, since g,h\in k[x], the constant terms (-\alpha)^m,(-\alpha)^n lie in k, and hence so do \alpha^m and \alpha^n. Since m,n are relatively prime, there exist integers j,\ell for which jm + \ell n = 1, and then we can conclude (\alpha^m)^j(\alpha^n)^\ell = \alpha^{jm+\ell n} = \alpha\in k. Thus f has a root in k.

Case 2: \text{char}\, k \neq p.

In this case, f has distinct roots in the algebraic closure \Omega, since the derivative f' = px^{p-1} is relatively prime with f, as p is invertible in k. For a similar reason, the pth roots of unity in \Omega are distinct.

Let L\subset\Omega be a splitting field for f. Let \zeta be a primitive pth root of unity in \Omega, and let \alpha be some root of f in L. Note that \zeta^j\alpha is also a root of f for j=1,\dots,p-1, and is thus in L, so that L contains \zeta^j\alpha / \alpha = \zeta^j; thus L contains k(\zeta). As all roots of f have the form \zeta^j\alpha, we have L=k(\zeta)(\alpha).

Now L/k is a Galois extension. Let \Gamma = \text{Gal}(L/k), and let G be the subgroup \text{Gal}(L/k(\zeta)). If G is nontrivial, let \tau be a nontrivial element. We have \tau(\alpha) = \zeta^j\alpha for some j\in \{1,\dots,p-1\}, as \tau must act nontrivially on \alpha since \alpha generates L over k(\zeta). Also, \tau acts trivially on \zeta by definition of G. Thus for any m\in \mathbb{N}, we have

\tau^m(\alpha) = \zeta^{jm}\alpha.

Since p is prime and j\not\equiv 0, jm traverses all residue classes mod p as m varies, and thus \tau^m(\alpha) = \zeta^{jm}\alpha ranges over every root of f as m varies. Thus the cyclic subgroup \langle \tau\rangle \subset G acts transitively on the roots of f; since f is separable, this transitivity implies f is irreducible over k(\zeta), and therefore over the smaller field k.

Thus, if f is reducible over k, it must be that G is trivial, and therefore that L=k(\zeta). It follows that \Gamma = \text{Gal}(k(\zeta)/k). An element \tau of \Gamma is therefore determined by its action on \zeta, which it must send to some \zeta^j, j\in \{1,\dots,p-1\}; since composition of such maps corresponds to multiplication of the exponents, the map \tau\mapsto j realizes \Gamma as a subgroup of \mathbb{F}_p^\times. Since the latter group is cyclic (as a finite subgroup of the multiplicative group of a field), it follows that \Gamma is cyclic. Let \sigma be a generator.

Now we have \sigma(\alpha) = \zeta^b\alpha for some b\in \{0,\dots,p-1\}, and \sigma(\zeta) = \zeta^m for some m\in \{1,\dots,p-1\}. We will show that f has a root in k.

If m=1, then \zeta is fixed by \sigma, and thus the action of \Gamma = \langle \sigma\rangle on L=k(\zeta) is trivial, in which case L=L^\Gamma=k, and f has a root in k (in fact, it splits!).

On the other hand, if m\neq 1, then the equation (m-1)j+b = 0, regarded as occurring in \mathbb{F}_p, has a solution for j, since m-1 is invertible mod p. For this specific j, we have

\sigma(\zeta^j\alpha) = (\zeta^m)^j(\zeta^b\alpha) = \zeta^{mj+b}\alpha = \zeta^j\alpha,

since exponents of \zeta reduce mod p. We conclude that \zeta^j\alpha is fixed by \langle\sigma\rangle = \Gamma. Since this is the entire Galois group of L/k, we conclude that \zeta^j\alpha\in k. Thus f has a root in k. QED.
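The theorem is fun to verify computationally. Below is a brute-force check over a finite field (my own illustrative sketch; the prime 5, the field \mathbb{F}_{11}, and the helper functions are choices of mine, not from the text). For each a\in\mathbb{F}_{11}^\times, we confirm that x^5-a is reducible over \mathbb{F}_{11} exactly when it has a root there. Reducibility is tested by searching for a monic factor of degree at most 2, which must exist whenever a degree-5 polynomial factors.

```python
from itertools import product

def poly_rem(f, g, q):
    """Remainder of f modulo monic g over F_q; coefficients listed low-to-high."""
    r = [c % q for c in f]
    dg = len(g) - 1
    while True:
        while r and r[-1] == 0:
            r.pop()
        if len(r) - 1 < dg:
            return r               # [] means g divides f exactly
        c, shift = r[-1], len(r) - 1 - dg
        for i, gc in enumerate(g):
            r[shift + i] = (r[shift + i] - c * gc) % q

def has_root(f, q):
    return any(sum(c * pow(t, i, q) for i, c in enumerate(f)) % q == 0
               for t in range(q))

def is_reducible(f, q):
    """A degree-d polynomial factors iff it has a monic factor of degree <= d//2."""
    deg = len(f) - 1
    for d in range(1, deg // 2 + 1):
        for coeffs in product(range(q), repeat=d):
            if poly_rem(f, list(coeffs) + [1], q) == []:
                return True
    return False

p, q = 5, 11                        # x^5 - a over F_11
for a in range(1, q):
    f = [(-a) % q] + [0] * (p - 1) + [1]
    assert has_root(f, q) == is_reducible(f, q)   # reducible iff it has a root
```

The fifth powers in \mathbb{F}_{11}^\times are \{1,10\}, so for instance x^5-2 has no root there and the check confirms it is irreducible.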

Descent and Artin’s lemma

I was reading Popov and Vinberg’s book on invariant theory when they asserted a basic lemma (without proof) that I didn’t recognize. It was this (p. 155):

Lemma 2.4: Suppose K is an extension of the field k, and G is a group of k-automorphisms of K. Suppose V is a (not necessarily finite-dimensional) vector space over k, and W is a subspace of K\otimes_k V that is stable under the natural action of G (induced by the action on the first factor). Then W is generated by invariant vectors.

I asked for a proof on Math.SE. Alex Youcis pointed me to these notes of J. Milne (see section c). They’re phrased in slightly different language, but they do the job. The heart of the matter is the following lemma:

Lemma 16.6 (this is Milne’s numbering, though I’m not being completely faithful to his statement or notation): Suppose K/k is a field extension such that G = \text{Aut}_k\,(K) satisfies K^G=k. (Note this is a stronger set of assumptions about K,k,G than the above.) Let V be a k-vector space. If W\subset K\otimes_kV is a nontrivial K-subspace that is stable under the action of G, then it contains a nontrivial G-invariant vector. (NB: the G-invariant vectors are precisely those that lie in the image of the natural k-linear map V\rightarrow K\otimes_kV given by v\mapsto 1\otimes v.)

The proof of this lemma is as follows. Choose a basis \{e_i\}_{i\in I} for V, where I is some indexing set. Then any element of K\otimes_k V is a finite sum of the form \sum x_i\otimes e_i, with x_i\in K\setminus\{0\} for each i in the (finite) sum. Since W is nontrivial, we can choose a w\in W such that the number of terms in this representation w=\sum x_i\otimes e_i is minimal among nonzero elements of W, and since W is a K-vector space, we can scale this w by an element of K so that one of the coefficients in the sum is 1; thus we can without loss of generality assume w = 1\otimes e_1 + \sum x_i\otimes e_i (where we have labeled e_1 as the basis vector corresponding to the coefficient 1), with the number of nonzero terms minimal. Since W is G-stable, for any g\in G we then have gw = 1\otimes e_1 + \sum gx_i\otimes e_i \in W, and therefore the difference gw - w = \sum (gx_i - x_i)\otimes e_i is in W as well. Note this sum has fewer nonzero terms than the sum representing w, so we conclude by w's minimality that this new sum is zero, and thus that gx_i - x_i = 0 for each i, since the e_i's are linearly independent over k and thus the 1\otimes e_i's are linearly independent over K.

We can conclude that gx_i=x_i for each x_i and for each g\in G. It is immediate that w = 1\otimes e_1+\sum x_i\otimes e_i is G-invariant, proving the existence of a nontrivial G-invariant vector. Since k = K^G, the x_i's all lie in k, and this, together with similar logic as above (linear independence of the 1\otimes e_i's over K), also lets us recognize that the G-invariant vectors are exactly those of the form \sum x_i\otimes e_i with all x_i's in k, i.e. those that are images of some v\in V under the natural k-linear map V\rightarrow K\otimes_k V, as per the parenthetical statement in the lemma. QED.
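The mechanism of the proof can be watched in miniature. Here is a toy instance, my own construction rather than Milne's: K=\mathbb{C}, k=\mathbb{R}, G=\{\text{id}, \text{conjugation}\} (so K^G=k), and V=\mathbb{R}^2. If a G-stable subspace W of \mathbb{C}^2 contains the normalized vector w=(1,i), then the difference gw-w\in W has strictly smaller support; rescaling it by a unit of K produces a G-invariant vector in W. (In the lemma's argument, minimality of w instead forces this difference to vanish.)

```python
conj = lambda v: tuple(c.conjugate() for c in v)   # the nontrivial element of G

w = (1 + 0j, 1j)                  # normalized: coefficient 1 on e_1
gw = conj(w)                      # lies in W by G-stability: (1, -1j)
diff = tuple(a - b for a, b in zip(gw, w))    # (0, -2j): strictly fewer nonzero terms
inv = tuple(c / diff[1] for c in diff)        # rescale by a unit of K: (0, 1)

assert conj(inv) == inv           # fixed by G: a G-invariant vector, i.e. in 1 ⊗ V
assert inv == (0j, 1 + 0j)
```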

I recognized the proof of Lemma 16.6 as very similar to the proof I learned many years ago of Artin’s lemma, one of the building blocks in Artin’s development of the fundamental theorem of Galois theory. Artin’s lemma is the statement that if K is a field, G is a finite group of automorphisms of K, and k = K^G, then [K:k]\leq |G|. I spent some time yesterday morning thinking through this relationship, and I realized that the proof of Artin’s lemma can be reformulated in an elegant way to use Lemma 16.6. I wanted to record that here.

Proof of Artin’s lemma using Lemma 16.6:

Consider any finite collection x_1,\dots,x_n of elements of K that are linearly independent over k. Our goal is to show that n\leq |G|. We will do this by constructing a certain K-linear map

\varphi: K^n\rightarrow K^{|G|}

and showing it is injective. The map \varphi is defined by K-linear extension of the following map on the standard basis e_1,\dots,e_n for K^n:

\varphi(e_i) = (gx_i)_{g\in G}.

The right side here is an element of K^{|G|} with entries indexed by elements of G.

We are going to apply Lemma 16.6 with V=k^n. In this situation, K^n is K\otimes_kV. We will take \text{ker}\,\varphi as W. (To invoke the lemma we will have to show it is G-stable.) The assumption that the x_i's are linearly independent over k is then going to imply that there is no nonzero G-invariant vector in W, from which Lemma 16.6 will imply that W itself is zero, i.e. that \varphi is injective, as desired. Here are the details:

First, we claim W =\text{ker}\,\varphi is G-stable. We see this as follows. Let

w = \sum_{i=1}^n y_ie_i

(where each y_i lies in K) be an arbitrary element of W, i.e. assume that

\varphi(w) = \sum_{i=1}^n y_i(gx_i)_{g\in G} = 0.

Looking coordinate-by-coordinate, this equation states that \sum_{i=1}^n y_igx_i = 0 for each g\in G. Then, if \gamma\in G is arbitrary, we have

\sum_{i=1}^n (\gamma y_i)(\gamma g x_i) = 0

as well. Since as g traverses G, \gamma g does as well, these equations (for each g\in G) can be arranged into the vector equation

\sum_{i=1}^n (\gamma y_i)(g x_i)_{g\in G}=0.

The left side is precisely \varphi(\gamma w). Thus for any w\in W= \text{ker}\;\varphi, \gamma w\in W=\text{ker}\;\varphi as well, i.e. W is G-stable.

Thus we can invoke Lemma 16.6: if W is nontrivial, then it contains a nontrivial G-invariant element w^\star, which in view of the parenthetical note at the end of the lemma, has the form w^\star = \sum_{i=1}^n y_ie_i with each y_i\in k and some y_i nonzero. But then \varphi(w^\star)=0 is the statement that

\sum_{i=1}^n y_i(gx_i)_{g\in G}=0.

Extracting the coordinate where g=\text{id.} from this vector equation, we get

\sum_{i=1}^n y_ix_i = 0.

But since each y_i\in k and some y_i is nonzero, this is a nontrivial linear relation among the x_i's over k, which is impossible since we presumed them to be linearly independent. We conclude W=\text{ker}\;\varphi is trivial, i.e. \varphi is injective, and it follows that n\leq |G|. This completes the proof of Artin’s lemma.
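For a concrete feel for the map \varphi, here is a toy example of my own: take K=\mathbb{Q}(i), k=\mathbb{Q}, G=\{\text{id},\text{conjugation}\}, and the k-linearly independent elements x_1=1, x_2=i. Then \varphi is represented by the matrix whose ith row is (gx_i)_{g\in G}, and its determinant is nonzero, so \varphi is injective and n=2\leq |G|=2, consistent with Artin’s lemma.

```python
G = [lambda z: z, lambda z: z.conjugate()]   # id and complex conjugation
xs = [1 + 0j, 1j]                            # x_1 = 1, x_2 = i: independent over Q

# Matrix of phi: row i is (g x_i) for g in G.
M = [[g(x) for g in G] for x in xs]          # [[1, 1], [i, -i]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]  # = -2i
assert det != 0                              # phi is injective, so 2 <= |G| = 2
```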
