Project: Math 340 spring 2015

Path: diagonalizability of real symmetric matrices.sagews

Views: ¹⁹⁶

Real symmetric matrices are diagonalizable

Here's my explanation for the diagonalizability of real symmetric matrices. There are lots of proofs and ways to show it, but many of them use facts that my students (in a typical sophomore linear algebra course) don't know, such as the details of Jordan Canonical Form, or seem to use too much machinery. What I have here requires these ideas:

eigenvalues and eigenvectors
eigenvalues of a real symmetric matrix are all real
inner product machinery, including ideas of orthogonal complement of a subspace
idea of an invariant subspace

I've looked in several linear algebra books and couldn't find any of them that address this issue at this level; all of them are (understandably) bringing in ideas of JCF, Schur's theorem/triangular form, or do things like have multiple matrix representations (of different sizes!) of a linear transformation -- one for the entire vector space, another for the transformation as it acts on an invariant subspace, which seems unnecessarily complicated.

My audience here is people who teach linear algebra, although this document right now is a mish-mash of what I would write for instructors and what I would write for students.

The idea that makes our recursion step work is this lemma.

Linear operators have eigenvectors

Lemma: Let $L$ be a linear operator on $R^n$ and $V$ a nontrivial subspace of $R^n$ . If $V$ is invariant under $L$ , then $L$ has an eigenvector in $V$ .

The proof is the standard determinant-free proof of the existence of eigenvectors from Axler's Down With Determinants: choose any nonzero vector $v \in V$ . Denote the dimension of $V$ with $k$ ; then, since $V$ is invariant under $L$ , the set $\{v, L(v), L^2(v), \ldots, L^k(v)\}$ consists of $k+1$ vectors in a $k$ -dimensional space and is therefore linearly dependent. That means there is a nontrivial collection of coefficients so that [ a_0 v + a_1 L(v) + \cdots +a_k L^k(v) = \vec{0}. ] It might be the case that $a_k = 0$ ; let $j$ be the largest index so that $a_j \not = 0$ . The polynomial $\sum_{i=0}^j a_i t^i$ factors (over the complex numbers!) into linear factors: [ 0 = \sum_{i=0}^j a_i t^i = c(t - r_1)(t - r_2)\cdots (t-r_j) ] Then (I find this to be a bit of a leap, although it's plausible enough...a typical engineering student will likely find the switch from a polynomial in $t$ to a polynomial in $L$ to be a bit weird, though) the same equation must be true when we use $L$ : [ \vec{0} = \left(\sum_{i=0}^j a_i L^i\right)v = \left(c(L - r_1 I)(L - r_2 I)\cdots (L-r_j I)\right)v ] One of the $L - r_i L$ factors must send a nonzero vector to zero...that is, $L$ has an eigenvector with eigenvalue $r_i$ .

A bit of cleanup

One last detail: above, $r_i$ could in principle be complex. But if you prove separately that real symmetric matrices only have real eigenvalues, you can then argue that $r_i$ is actually real.

Diagonalizability

The idea here is: find an eigenvector, show that the orthogonal complement is invariant, recurse.

Let $L$ be a real symmetric matrix / symmetric linear operator on $R^n$ .

Base case: $V_1 = R^n$ is invariant under $L$ , so $L$ has an eigenvector $v_1$ . Let $V_2$ be the orthogonal complement of (the subspace spanned by) $v_1$ .
$V_2$ is invariant under $L$ : this is where we really need the symmetry of $L$ . Say $w \in V_2$ . Is $L(w)$ also in $V_2$ ? Yes: [ (L(w), v_1) = (w, L(v_1)) = (w, \lambda_1 v_1) = \lambda_1 \cdot 0 = 0. ] Since $L(w)$ is orthogonal to $v_1$ , it is in $V_2$ .
Apply the above lemma: $L$ has an eigenvector $v_2$ in $V_2$ . That eigenvector is orthogonal to $v_1$ , of course.
Recurse: let $V_3 \subseteq V_2$ be the orthogonal complement of $v_2$ . Is $V_3$ invariant under $L$ ? Well, certainly $L(V_3)$ is in $V_2$ , since $V_2$ is invariant, but you can use the above argument to show that $L(w)$ is orthogonal to $v_2$ for any $w \in V_3$ , so $V_3$ is invariant and once again $L$ has an eigenvector $v_3$ in $V_3$ , which must be orthogonal to $v_2$ and $v_1$ .
The recursion bottoms out when you get to $V_n$ , which must be 1-dimensional. You find an eigenvector there, and then you have $n$ orthogonal eigenvectors...which must be linearly independent. Ah! You have an orthogonal basis of eigenvectors. Normalize all the lengths to $1$ and you have a lovely orthonormal basis of eigenvectors.

See? That was easy. Beyond the usual stuff to understand diagonalizability, you need some basic inner product machinery (which can be phrased entirely in the language of dot products and $x^T y$ stuff) and the idea of an invariant subspace, which is only epsilon away from the idea of diagonalizability anyway. I don't know why my textbook (Kolman & Hill) punts on this.

#example matrix

# example of invariant subspace -- both 1-1 and not

# nonexample of invariant subspace

# pick a vector, find the polynomial, find the eigenvector

# find orthogonal complement, pick a vector again, recurse

P = random_matrix(QQ, 3, 3, algorithm='unimodular')
#P = matrix([[3,1,4],[8,3,7],[7,3,3]]).transpose()
P

[ 1  4  2]
[ 2  9  4]
[ 3 13  7]

A = matrix([[2,1,0],[0,2,0],[0,0,-1]])
B = P * A * P.inverse()
B

[ 134 -288  -27]
[  54 -116  -11]
[  81 -175  -15]

Claim: the subspace of vectors of the form [ \vec{x} = a $\begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}$ + b $\begin{pmatrix} 8 \\ 3 \\ 7 \end{pmatrix}$ ] for some $a$ and $b$ is invariant under this matrix.

Let's compute.

a, b = var('a b')
v1 = vector([3,1,4])
v2 = vector([8,3,7])
show(matrix(B * (a * v1 + b * v2)).T)

\displaystyle \left(\begin{array}{r} 6 \, a + 19 \, b \\ 2 \, a + 7 \, b \\ 8 \, a + 18 \, b \end{array}\right)

...which equals [ (2a + b) $\begin{pmatrix} 3 \\ 1 \\ 4 \end{pmatrix}$ + 2b $\begin{pmatrix} 8 \\ 3 \\ 7 \end{pmatrix}$ . ] (Don't worry about how I got that, just make sure you understand that $B \vec{x}$ is still in the subspace, which means that subspace is invariant under $B$ .

Now let's find an eigenvector. The above proof says we can take any nonzero vector $\vec{x}$ in the space (which is 2-dimensional) and find a linear dependence relation among $\vec{x}, B\vec{x}, B^2\vec{x}$ :

x = 2 * v1 - 5 * v2
show([matrix(B^i * x).T for i in [0,1,2]])

[

\displaystyle \left(\begin{array}{r} -34 \\ -13 \\ -27 \end{array}\right)

\displaystyle \left(\begin{array}{r} -83 \\ -31 \\ -74 \end{array}\right)

\displaystyle \left(\begin{array}{r} -196 \\ -72 \\ -188 \end{array}\right)

]

We want solutions to the corresponding homogeneous system, so put those columns into a matrix and row reduce:

M = matrix([B^i * x for i in [0,1,2]]).transpose()
show(M)
show(M.rref())

\displaystyle \left(\begin{array}{rrr} -34 & -83 & -196 \\ -13 & -31 & -72 \\ -27 & -74 & -188 \end{array}\right)

\displaystyle \left(\begin{array}{rrr} 1 & 0 & -4 \\ 0 & 1 & 4 \\ 0 & 0 & 0 \end{array}\right)

So, $4 \vec{x} - 4 B \vec{x} + B^2 \vec{x} = \vec{0}$ :

4 * x - 4 * B * x + B^2 * x

(0, 0, 0)

Let's factor the corresponding polynomial $4 - 4t - t^2$ . We can do this by hand, but let's make Sage do it.

t = var('t')
poly = 4 - 4*t + t^2
prod((t - r)^m for r, m in poly.roots())

(t - 2)^2

Now let's start multiplying to find an eigenvector:

(B - 2*identity_matrix(3))*x

(-15, -5, -20)

That's not zero, keep going.

(B  - 2*identity_matrix(3)) * (B  - 2*identity_matrix(3))*x

(0, 0, 0)

So $(B - 2I)\vec{x}$ is an eigenvector in this invariant subspace:

(B - 2*identity_matrix(3))*x

(-15, -5, -20)

A subspace not invariant under $B$

Let's see that the span of $(8,3,7)^T$ and $(7,3,3)^T$ is not an invariant subspace for this linear transformation.

v3 = vector([7,3,3])
show(B * (a * v2 + b * v3))

\displaystyle \left(19 \, a - 7 \, b,\,7 \, a - 3 \, b,\,18 \, a - 3 \, b\right)

Is there a linear combination of $v_2$ and $v_3$ that equals the above vector?

matrix(SR, [v2, v3]).transpose().solve_right(B * (a * v2 + b * v3))

Error in lines 1-1
Traceback (most recent call last):
  File "/projects/e51fbbc4-453b-4f00-a470-028a1136adeb/.sagemathcloud/sage_server.py", line 879, in execute
    exec compile(block+'\n', '', 'single') in namespace, locals
  File "", line 1, in <module>
  File "sage/matrix/matrix2.pyx", line 358, in sage.matrix.matrix2.Matrix.solve_right (build/cythonized/sage/matrix/matrix2.c:6204)
    X = self._solve_right_general(C, check=check)
  File "sage/matrix/matrix2.pyx", line 484, in sage.matrix.matrix2.Matrix._solve_right_general (build/cythonized/sage/matrix/matrix2.c:7544)
    raise ValueError, "matrix equation has no solutions"
ValueError: matrix equation has no solutions

So...no. For some special choices of $a$ and $b$ , there are solutions, but not always. So that subspace is not invariant.

An example of the algorithm

Above I just demonstrated invariant subspaces, which can work for any linear transformation or matrix, not necessarily real symmetric ones. Let's now work through the algorithm described above.

C = random_matrix(QQ, 3, 3)
A = C * C.transpose()
A

[ 5  1 -1]
[ 1  2  1]
[-1  1  2]

A.eigenvalues()

[3, 0.550510257216822?, 5.449489742783178?]

x = vector([1,0,0])
matrix([A^i * x for i in [0, 1, 2, 3]]).transpose().rref()

[  1   0  -3 -18]
[  0   1   6  33]
[  0   0   0   0]

t = var('t')
poly = 3 - 6*t + t^2
factors = [(A - r * identity_matrix(3))^m for r, m in poly.roots()]
show(factors)

[

\displaystyle \left(\begin{array}{rrr} \sqrt{6} + 2 & 1 & -1 \\ 1 & \sqrt{6} - 1 & 1 \\ -1 & 1 & \sqrt{6} - 1 \end{array}\right)

\displaystyle \left(\begin{array}{rrr} -\sqrt{6} + 2 & 1 & -1 \\ 1 & -\sqrt{6} - 1 & 1 \\ -1 & 1 & -\sqrt{6} - 1 \end{array}\right)

]

factors[1]*x

(-sqrt(6) + 2, 1, -1)

(factors[0]*factors[1]*x).n()

(1.11022302462516e-15, 0.000000000000000, 0.000000000000000)

So (-sqrt(6) + 2, 1, -1) is an eigenvector for $A$ . We need to work in the orthogonal complement of that vector, and take a nonzero vector there to find another eigenvector. Note that (0,1,1) is orthogonal to that vector; let's use that.

evec1 = factors[1] * vector([1,0,0])
x = vector([0,1,1])
print evec1.dot_product(x)
matrix([A^i * x for i in [0, 1, 2]]).transpose().rref()

0
[1 3 9]
[0 0 0]
[0 0 0]

This one is so simple we don't even need to factor the polynomial: $(-3,1,0)$ is in the kernel of that matrix, which means [ -3 \vec{x} + A \vec{x} = \vec{0} ] or... [ A \vec{x} = 3 \vec{x}. ] Hey, our guess is actually an eigenvector!

evec2 = vector([0,1,1])

Now we need a vector orthogonal to the first two eigenvectors. The orthogonal complement must be 1-dimensional, so any nonzero vector in there will be an eigenvector. Let's put the 2 known eigenvectors into a matrix as rows and find the nullspace of that matrix -- any vector there is orthogonal to both eigenvectors.

matrix([evec1, evec2]).rref()

[              1               0 2/(sqrt(6) - 2)]
[              0               1               1]

So the vector $(-2/(\sqrt{6} - 2), -1, 1)$ is in the kernel / is an eigenvector:

evec3 = vector([-2/(sqrt(6) - 2), -1, 1])

Now let's check that we have an orthogonal basis. I'll put the vectors into a matrix and multiply by the transpose, which does all the dot products. We should get a diagonal matrix.

P = matrix([evec1, evec2, evec3])
P * P.transpose()

[  (sqrt(6) - 2)^2 + 2                     0                     0]
[                    0                     2                     0]
[                    0                     0 4/(sqrt(6) - 2)^2 + 2]

Now let's normalize the vectors to get an orthonormal basis.

P = matrix([evec1 / evec1.norm(), evec2 / evec2.norm(), evec3 / evec3.norm()]).apply_map(lambda x: x.full_simplify())
(P * P.transpose()).apply_map(lambda x: x.full_simplify())

[1 0 0]
[0 1 0]
[0 0 1]

Now we'll use $P$ to diagonalize $B$ :

D = (P * A * P.inverse()).apply_map(lambda x: x.full_simplify())
D

[(1079*sqrt(6) - 2643)/(198*sqrt(6) - 485)                                         0                                         0]
[                                        0                                         3                                         0]
[                                        0                                         0       3*(2*sqrt(6) - 5)/(11*sqrt(6) - 27)]

D.change_ring(RR)

[0.550510257513477 0.000000000000000 0.000000000000000]
[0.000000000000000  3.00000000000000 0.000000000000000]
[0.000000000000000 0.000000000000000  5.44948974278288]

e = A.eigenvalues()[1]
solve(SR(e.minpoly().subs(x=t)) == 0, t)

[t == -sqrt(6) + 3, t == sqrt(6) + 3]

(D[0,0] - (3 - sqrt(6))).factor()

0