Shareddiagonalizability of real symmetric matrices.sagewsOpen in CoCalc
Author: Dan Drake
Views : 11

Real symmetric matrices are diagonalizable

Here's my explanation for the diagonalizability of real symmetric matrices. There are lots of proofs and ways to show it, but many of them use facts that my students (in a typical sophomore linear algebra course) don't know, such as the details of Jordan Canonical Form, or seem to use too much machinery. What I have here requires these ideas:

  • eigenvalues and eigenvectors
  • eigenvalues of a real symmetric matrix are all real
  • inner product machinery, including ideas of orthogonal complement of a subspace
  • idea of an invariant subspace

I've looked in several linear algebra books and couldn't find any of them that address this issue at this level; all of them are (understandably) bringing in ideas of JCF, Schur's theorem/triangular form, or do things like have multiple matrix representations (of different sizes!) of a linear transformation -- one for the entire vector space, another for the transformation as it acts on an invariant subspace, which seems unnecessarily complicated.

My audience here is people who teach linear algebra, although this document right now is a mish-mash of what I would write for instructors and what I would write for students.

The idea that makes our recursion step work is this lemma.

Linear operators have eigenvectors

Lemma: Let LL be a linear operator on RnR^n and VV a nontrivial subspace of RnR^n. If VV is invariant under LL, then LL has an eigenvector in VV.

The proof is the standard determinant-free proof of the existence of eigenvectors from Axler's Down With Determinants: choose any nonzero vector vVv \in V. Denote the dimension of VV with kk; then, since VV is invariant under LL, the set {v,L(v),L2(v),,Lk(v)}\{v, L(v), L^2(v), \ldots, L^k(v)\} consists of k+1k+1 vectors in a kk-dimensional space and is therefore linearly dependent. That means there is a nontrivial collection of coefficients so that a0v+a1L(v)++akLk(v)=0. a_0 v + a_1 L(v) + \cdots +a_k L^k(v) = \vec{0}. It might be the case that ak=0a_k = 0; let jj be the largest index so that aj0a_j \not = 0. The polynomial i=0jaiti\sum_{i=0}^j a_i t^i factors (over the complex numbers!) into linear factors: 0=i=0jaiti=c(tr1)(tr2)(trj) 0 = \sum_{i=0}^j a_i t^i = c(t - r_1)(t - r_2)\cdots (t-r_j) Then (I find this to be a bit of a leap, although it's plausible enough...a typical engineering student will likely find the switch from a polynomial in tt to a polynomial in LL to be a bit weird, though) the same equation must be true when we use LL: 0=(i=0jaiLi)v=(c(Lr1I)(Lr2I)(LrjI))v \vec{0} = \left(\sum_{i=0}^j a_i L^i\right)v = \left(c(L - r_1 I)(L - r_2 I)\cdots (L-r_j I)\right)v One of the LriLL - r_i L factors must send a nonzero vector to zero...that is, LL has an eigenvector with eigenvalue rir_i.

A bit of cleanup

One last detail: above, rir_i could in principle be complex. But if you prove separately that real symmetric matrices only have real eigenvalues, you can then argue that rir_i is actually real.

Diagonalizability

The idea here is: find an eigenvector, show that the orthogonal complement is invariant, recurse.

Let LL be a real symmetric matrix / symmetric linear operator on RnR^n.

  1. Base case: V1=RnV_1 = R^n is invariant under LL, so LL has an eigenvector v1v_1. Let V2V_2 be the orthogonal complement of (the subspace spanned by) v1v_1.

  2. V2V_2 is invariant under LL: this is where we really need the symmetry of LL. Say wV2w \in V_2. Is L(w)L(w) also in V2V_2? Yes: (L(w),v1)=(w,L(v1))=(w,λ1v1)=λ10=0. (L(w), v_1) = (w, L(v_1)) = (w, \lambda_1 v_1) = \lambda_1 \cdot 0 = 0. Since L(w)L(w) is orthogonal to v1v_1, it is in V2V_2.

  3. Apply the above lemma: LL has an eigenvector v2v_2 in V2V_2. That eigenvector is orthogonal to v1v_1, of course.

  4. Recurse: let V3V2V_3 \subseteq V_2 be the orthogonal complement of v2v_2. Is V3V_3 invariant under LL? Well, certainly L(V3)L(V_3) is in V2V_2, since V2V_2 is invariant, but you can use the above argument to show that L(w)L(w) is orthogonal to v2v_2 for any wV3w \in V_3, so V3V_3 is invariant and once again LL has an eigenvector v3v_3 in V3V_3, which must be orthogonal to v2v_2 and v1v_1.

  5. The recursion bottoms out when you get to VnV_n, which must be 1-dimensional. You find an eigenvector there, and then you have nn orthogonal eigenvectors...which must be linearly independent. Ah! You have an orthogonal basis of eigenvectors. Normalize all the lengths to 11 and you have a lovely orthonormal basis of eigenvectors.

See? That was easy. Beyond the usual stuff to understand diagonalizability, you need some basic inner product machinery (which can be phrased entirely in the language of dot products and xTyx^T y stuff) and the idea of an invariant subspace, which is only epsilon away from the idea of diagonalizability anyway. I don't know why my textbook (Kolman & Hill) punts on this.

#example matrix # example of invariant subspace -- both 1-1 and not # nonexample of invariant subspace # pick a vector, find the polynomial, find the eigenvector # find orthogonal complement, pick a vector again, recurse
P = random_matrix(QQ, 3, 3, algorithm='unimodular') #P = matrix([[3,1,4],[8,3,7],[7,3,3]]).transpose() P
[ 1 4 2] [ 2 9 4] [ 3 13 7]
A = matrix([[2,1,0],[0,2,0],[0,0,-1]]) B = P * A * P.inverse() B
[ 134 -288 -27] [ 54 -116 -11] [ 81 -175 -15]

Claim: the subspace of vectors of the form x=a(314)+b(837) \vec{x} = a \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix} + b \begin{pmatrix} 8 \\ 3 \\ 7 \end{pmatrix} for some aa and bb is invariant under this matrix.

Let's compute.

a, b = var('a b') v1 = vector([3,1,4]) v2 = vector([8,3,7]) show(matrix(B * (a * v1 + b * v2)).T)
(6a+19b2a+7b8a+18b)\displaystyle \left(\begin{array}{r} 6 \, a + 19 \, b \\ 2 \, a + 7 \, b \\ 8 \, a + 18 \, b \end{array}\right)

...which equals (2a+b)(314)+2b(837). (2a + b) \begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix} + 2b \begin{pmatrix} 8 \\ 3 \\ 7 \end{pmatrix}. (Don't worry about how I got that, just make sure you understand that BxB \vec{x} is still in the subspace, which means that subspace is invariant under BB.

Now let's find an eigenvector. The above proof says we can take any nonzero vector x\vec{x} in the space (which is 2-dimensional) and find a linear dependence relation among x,Bx,B2x\vec{x}, B\vec{x}, B^2\vec{x}:

x = 2 * v1 - 5 * v2 show([matrix(B^i * x).T for i in [0,1,2]])
[(341327)\displaystyle \left(\begin{array}{r} -34 \\ -13 \\ -27 \end{array}\right), (833174)\displaystyle \left(\begin{array}{r} -83 \\ -31 \\ -74 \end{array}\right), (19672188)\displaystyle \left(\begin{array}{r} -196 \\ -72 \\ -188 \end{array}\right)]

We want solutions to the corresponding homogeneous system, so put those columns into a matrix and row reduce:

M = matrix([B^i * x for i in [0,1,2]]).transpose() show(M) show(M.rref())
(34831961331722774188)\displaystyle \left(\begin{array}{rrr} -34 & -83 & -196 \\ -13 & -31 & -72 \\ -27 & -74 & -188 \end{array}\right)
(104014000)\displaystyle \left(\begin{array}{rrr} 1 & 0 & -4 \\ 0 & 1 & 4 \\ 0 & 0 & 0 \end{array}\right)

So, 4x4Bx+B2x=04 \vec{x} - 4 B \vec{x} + B^2 \vec{x} = \vec{0}:

4 * x - 4 * B * x + B^2 * x
(0, 0, 0)

Let's factor the corresponding polynomial 44tt24 - 4t - t^2. We can do this by hand, but let's make Sage do it.

t = var('t') poly = 4 - 4*t + t^2 prod((t - r)^m for r, m in poly.roots())
(t - 2)^2

Now let's start multiplying to find an eigenvector:

(B - 2*identity_matrix(3))*x
(-15, -5, -20)

That's not zero, keep going.

(B - 2*identity_matrix(3)) * (B - 2*identity_matrix(3))*x
(0, 0, 0)

So (B2I)x(B - 2I)\vec{x} is an eigenvector in this invariant subspace:

(B - 2*identity_matrix(3))*x
(-15, -5, -20)

A subspace not invariant under BB

Let's see that the span of (8,3,7)T(8,3,7)^T and (7,3,3)T(7,3,3)^T is not an invariant subspace for this linear transformation.

v3 = vector([7,3,3]) show(B * (a * v2 + b * v3))
(19a7b,7a3b,18a3b)\displaystyle \left(19 \, a - 7 \, b,\,7 \, a - 3 \, b,\,18 \, a - 3 \, b\right)

Is there a linear combination of v2v_2 and v3v_3 that equals the above vector?

matrix(SR, [v2, v3]).transpose().solve_right(B * (a * v2 + b * v3))
Error in lines 1-1 Traceback (most recent call last): File "/projects/e51fbbc4-453b-4f00-a470-028a1136adeb/.sagemathcloud/sage_server.py", line 879, in execute exec compile(block+'\n', '', 'single') in namespace, locals File "", line 1, in <module> File "sage/matrix/matrix2.pyx", line 358, in sage.matrix.matrix2.Matrix.solve_right (build/cythonized/sage/matrix/matrix2.c:6204) X = self._solve_right_general(C, check=check) File "sage/matrix/matrix2.pyx", line 484, in sage.matrix.matrix2.Matrix._solve_right_general (build/cythonized/sage/matrix/matrix2.c:7544) raise ValueError, "matrix equation has no solutions" ValueError: matrix equation has no solutions

So...no. For some special choices of aa and bb, there are solutions, but not always. So that subspace is not invariant.

An example of the algorithm

Above I just demonstrated invariant subspaces, which can work for any linear transformation or matrix, not necessarily real symmetric ones. Let's now work through the algorithm described above.

C = random_matrix(QQ, 3, 3) A = C * C.transpose() A
[ 5 1 -1] [ 1 2 1] [-1 1 2]
A.eigenvalues()
[3, 0.550510257216822?, 5.449489742783178?]
x = vector([1,0,0]) matrix([A^i * x for i in [0, 1, 2, 3]]).transpose().rref()
[ 1 0 -3 -18] [ 0 1 6 33] [ 0 0 0 0]
t = var('t') poly = 3 - 6*t + t^2 factors = [(A - r * identity_matrix(3))^m for r, m in poly.roots()] show(factors)
[(6+21116111161)\displaystyle \left(\begin{array}{rrr} \sqrt{6} + 2 & 1 & -1 \\ 1 & \sqrt{6} - 1 & 1 \\ -1 & 1 & \sqrt{6} - 1 \end{array}\right), (6+21116111161)\displaystyle \left(\begin{array}{rrr} -\sqrt{6} + 2 & 1 & -1 \\ 1 & -\sqrt{6} - 1 & 1 \\ -1 & 1 & -\sqrt{6} - 1 \end{array}\right)]
factors[1]*x
(-sqrt(6) + 2, 1, -1)
(factors[0]*factors[1]*x).n()
(1.11022302462516e-15, 0.000000000000000, 0.000000000000000)

So (-sqrt(6) + 2, 1, -1) is an eigenvector for AA. We need to work in the orthogonal complement of that vector, and take a nonzero vector there to find another eigenvector. Note that (0,1,1) is orthogonal to that vector; let's use that.

evec1 = factors[1] * vector([1,0,0]) x = vector([0,1,1]) print evec1.dot_product(x) matrix([A^i * x for i in [0, 1, 2]]).transpose().rref()
0 [1 3 9] [0 0 0] [0 0 0]

This one is so simple we don't even need to factor the polynomial: (3,1,0)(-3,1,0) is in the kernel of that matrix, which means 3x+Ax=0 -3 \vec{x} + A \vec{x} = \vec{0} or... Ax=3x. A \vec{x} = 3 \vec{x}. Hey, our guess is actually an eigenvector!

evec2 = vector([0,1,1])

Now we need a vector orthogonal to the first two eigenvectors. The orthogonal complement must be 1-dimensional, so any nonzero vector in there will be an eigenvector. Let's put the 2 known eigenvectors into a matrix as rows and find the nullspace of that matrix -- any vector there is orthogonal to both eigenvectors.

matrix([evec1, evec2]).rref()
[ 1 0 2/(sqrt(6) - 2)] [ 0 1 1]

So the vector (2/(62),1,1)(-2/(\sqrt{6} - 2), -1, 1) is in the kernel / is an eigenvector:

evec3 = vector([-2/(sqrt(6) - 2), -1, 1])

Now let's check that we have an orthogonal basis. I'll put the vectors into a matrix and multiply by the transpose, which does all the dot products. We should get a diagonal matrix.

P = matrix([evec1, evec2, evec3]) P * P.transpose()
[ (sqrt(6) - 2)^2 + 2 0 0] [ 0 2 0] [ 0 0 4/(sqrt(6) - 2)^2 + 2]

Now let's normalize the vectors to get an orthonormal basis.

P = matrix([evec1 / evec1.norm(), evec2 / evec2.norm(), evec3 / evec3.norm()]).apply_map(lambda x: x.full_simplify()) (P * P.transpose()).apply_map(lambda x: x.full_simplify())
[1 0 0] [0 1 0] [0 0 1]

Now we'll use PP to diagonalize BB:

D = (P * A * P.inverse()).apply_map(lambda x: x.full_simplify()) D
[(1079*sqrt(6) - 2643)/(198*sqrt(6) - 485) 0 0] [ 0 3 0] [ 0 0 3*(2*sqrt(6) - 5)/(11*sqrt(6) - 27)]
D.change_ring(RR)
[0.550510257513477 0.000000000000000 0.000000000000000] [0.000000000000000 3.00000000000000 0.000000000000000] [0.000000000000000 0.000000000000000 5.44948974278288]
e = A.eigenvalues()[1] solve(SR(e.minpoly().subs(x=t)) == 0, t)
[t == -sqrt(6) + 3, t == sqrt(6) + 3]
(D[0,0] - (3 - sqrt(6))).factor()
0