Definition.
A set \(\{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\) of vectors in \(\mathbb R^n\)
is called an orthogonal set if \(\overrightarrow{v_i} \cdot \overrightarrow{v_j}=0\) for all distinct
\(i,j=1,2,\ldots,k\). Also \(\{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\) is called an
orthonormal set if it is an orthogonal set of unit vectors.
Example.
Let \(\overrightarrow{v_1}=\left[\begin{array}{r}2\\0\\-1\end{array} \right]\),
\(\overrightarrow{v_2}=\left[\begin{array}{r}0\\2\\0\end{array} \right]\), and
\(\overrightarrow{v_3}=\left[\begin{array}{r}1\\0\\2\end{array} \right]\).
Verify that \(\overrightarrow{v_1}\cdot \overrightarrow{v_2}=0,\; \overrightarrow{v_1}\cdot \overrightarrow{v_3}=0,\; \overrightarrow{v_2}\cdot \overrightarrow{v_3}=0\).
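Explicitly,
\[\overrightarrow{v_1}\cdot \overrightarrow{v_2}=(2)(0)+(0)(2)+(-1)(0)=0,\quad
\overrightarrow{v_1}\cdot \overrightarrow{v_3}=(2)(1)+(0)(0)+(-1)(2)=0,\quad
\overrightarrow{v_2}\cdot \overrightarrow{v_3}=(0)(1)+(2)(0)+(0)(2)=0.\]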
Then
\(\{\overrightarrow{v_1},\overrightarrow{v_2},\overrightarrow{v_3}\}\) is an orthogonal set in \(\mathbb R^3\)
but it is not orthonormal, since for example \(\left\lVert\overrightarrow{v_1}\right\rVert=\sqrt{5}\neq 1\).
Dividing each vector by its norm yields an orthonormal set:
\[\left\lbrace \frac{\overrightarrow{v_1}}{\left\lVert\overrightarrow{v_1}\right\rVert},
\frac{\overrightarrow{v_2}}{\left\lVert\overrightarrow{v_2}\right\rVert},
\frac{\overrightarrow{v_3}}{\left\lVert\overrightarrow{v_3}\right\rVert} \right\rbrace
=\left\lbrace \frac{1}{\sqrt{5}} \left[\begin{array}{r}2\\0\\-1\end{array} \right],
\frac{1}{2}\left[\begin{array}{r}0\\2\\0\end{array} \right],
\frac{1}{\sqrt{5}}\left[\begin{array}{r}1\\0\\2\end{array} \right] \right\rbrace.\]
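These claims are easy to confirm numerically. Below is a minimal NumPy sketch (the variable names are ours, not part of the text) that checks the pairwise dot products and that normalizing the vectors yields an orthonormal set.

```python
import numpy as np

v1 = np.array([2.0, 0.0, -1.0])
v2 = np.array([0.0, 2.0, 0.0])
v3 = np.array([1.0, 0.0, 2.0])

# All pairwise dot products vanish: an orthogonal set.
for a, b in [(v1, v2), (v1, v3), (v2, v3)]:
    assert np.isclose(a @ b, 0.0)

# After dividing each vector by its norm, the matrix Q whose
# columns are the normalized vectors satisfies Q^T Q = I.
Q = np.column_stack([v / np.linalg.norm(v) for v in (v1, v2, v3)])
assert np.allclose(Q.T @ Q, np.eye(3))
```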
Theorem.
If \(\{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\) is an orthogonal set of nonzero vectors
in \(\mathbb R^n\), then \(\{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\) is
linearly independent and consequently it forms a basis of \(\operatorname{Span} \{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\).
Proof.
Let \(c_1\overrightarrow{v_1}+c_2\overrightarrow{v_2}+\cdots+c_k\overrightarrow{v_k}=\overrightarrow{0}\)
for some scalars \(c_1,c_2,\ldots,c_k\).
Then
\[\begin{array}{rrl}
&\overrightarrow{0} \cdot \overrightarrow{v_1} & =(c_1\overrightarrow{v_1}+c_2\overrightarrow{v_2}+\cdots+c_k\overrightarrow{v_k}) \cdot \overrightarrow{v_1}\\
\implies & 0 & =c_1(\overrightarrow{v_1}\cdot \overrightarrow{v_1})+c_2(\overrightarrow{v_2}\cdot \overrightarrow{v_1})+\cdots+c_k(\overrightarrow{v_k}\cdot \overrightarrow{v_1}) \\
\implies & 0 & =c_1\left\lVert\overrightarrow{v_1}\right\rVert^2+0+\cdots+0\\
\implies & c_1& =0 \left(\text{since } \left\lVert\overrightarrow{v_1}\right\rVert \neq 0 \text{ as }\overrightarrow{v_1}\neq \overrightarrow{0} \right).
\end{array}\]
Similarly, taking the dot product with \(\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\) in turn gives \(c_2=c_3=\cdots=c_k=0\). Thus \(\{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\)
is linearly independent and consequently it forms a basis of \(\operatorname{Span} \{\overrightarrow{v_1},\overrightarrow{v_2},\ldots,\overrightarrow{v_k}\}\).
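As a numerical illustration of the theorem (a sketch, reusing the orthogonal set from the example above), the matrix with these vectors as columns has full column rank, confirming linear independence.

```python
import numpy as np

# Columns are the orthogonal, nonzero vectors v1, v2, v3 from the example.
A = np.column_stack([[2, 0, -1], [0, 2, 0], [1, 0, 2]])
# Full column rank <=> the columns are linearly independent.
assert np.linalg.matrix_rank(A) == 3
```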
Definition.
Let \(W\) be a subspace of \(\mathbb R^n\). An orthogonal basis of \(W\) is a basis of \(W\) that is
an orthogonal set. Similarly an orthonormal basis of \(W\) is a basis of \(W\) that is an orthonormal set.
Example.
Let \(\overrightarrow{v_1}=\left[\begin{array}{r}2\\0\\-1\end{array} \right]\),
\(\overrightarrow{v_2}=\left[\begin{array}{r}0\\2\\0\end{array} \right]\), and
\(\overrightarrow{v_3}=\left[\begin{array}{r}1\\0\\2\end{array} \right]\). Then
\(\{\overrightarrow{v_1},\overrightarrow{v_2},\overrightarrow{v_3}\}\) is an orthogonal set of three nonzero vectors in \(\mathbb R^3\), hence linearly independent
by the preceding theorem, and therefore an orthogonal basis of \(\mathbb R^3\).
Theorem.
Let \(W\) be a subspace of \(\mathbb R^n\) and \(\{\overrightarrow{w_1},\overrightarrow{w_2},\ldots,\overrightarrow{w_k}\}\)
be an orthogonal basis of \(W\). If \(\overrightarrow{v}\in W\),
then
\[\overrightarrow{v}=\frac{\overrightarrow{v}\cdot \overrightarrow{w_1}}{\overrightarrow{w_1} \cdot \overrightarrow{w_1}}\overrightarrow{w_1}+
\frac{\overrightarrow{v}\cdot \overrightarrow{w_2}}{\overrightarrow{w_2} \cdot \overrightarrow{w_2}}\overrightarrow{w_2}+\cdots+
\frac{\overrightarrow{v}\cdot \overrightarrow{w_k}}{\overrightarrow{w_k} \cdot \overrightarrow{w_k}}\overrightarrow{w_k}.\]
Proof.
Let \(\overrightarrow{v}\in W=\operatorname{Span} \{\overrightarrow{w_1},\overrightarrow{w_2},\ldots,\overrightarrow{w_k}\}\).
Then \(\overrightarrow{v}=c_1\overrightarrow{w_1}+c_2\overrightarrow{w_2}+\cdots+c_k\overrightarrow{w_k}\) for some
scalars \(c_1,c_2,\ldots,c_k\). Then
\[\begin{array}{rrl}
&\overrightarrow{v} \cdot \overrightarrow{w_1} & =(c_1\overrightarrow{w_1}+c_2\overrightarrow{w_2}+\cdots+c_k\overrightarrow{w_k}) \cdot \overrightarrow{w_1} \\
\implies & \overrightarrow{v} \cdot \overrightarrow{w_1} & =c_1(\overrightarrow{w_1}\cdot \overrightarrow{w_1})+c_2(\overrightarrow{w_2}\cdot \overrightarrow{w_1})+\cdots+c_k(\overrightarrow{w_k}\cdot \overrightarrow{w_1}) \\
\implies &\overrightarrow{v} \cdot \overrightarrow{w_1} & =c_1(\overrightarrow{w_1}\cdot \overrightarrow{w_1})+0+\cdots+0\\
\implies & c_1& =\displaystyle\frac{\overrightarrow{v} \cdot \overrightarrow{w_1}}{\overrightarrow{w_1}\cdot \overrightarrow{w_1}} \;\;
\left(\text{since } \overrightarrow{w_1}\cdot \overrightarrow{w_1}=\left\lVert\overrightarrow{w_1}\right\rVert^2 \neq 0 \text{ as }\overrightarrow{w_1}\neq \overrightarrow{0} \right).
\end{array}\]
Similarly, taking the dot product with \(\overrightarrow{w_i}\) shows that \(c_i =\displaystyle\frac{\overrightarrow{v} \cdot \overrightarrow{w_i}}{\overrightarrow{w_i}\cdot \overrightarrow{w_i}}\)
for \(i=2,3,\ldots,k\).
Example.
Let \(\overrightarrow{v_1}=\left[\begin{array}{r}2\\0\\-1\end{array} \right]\),
\(\overrightarrow{v_2}=\left[\begin{array}{r}0\\2\\0\end{array} \right]\), and
\(\overrightarrow{v_3}=\left[\begin{array}{r}1\\0\\2\end{array} \right]\).
Write \(\overrightarrow{v}=\left[\begin{array}{r}-1\\4\\3\end{array} \right]\) as a (necessarily unique) linear combination of
\(\overrightarrow{v_1},\overrightarrow{v_2},\overrightarrow{v_3}\), which form an orthogonal basis of \(\mathbb R^3\) by the preceding example.
Solution.
\[\begin{align*}
\overrightarrow{v}=\left[\begin{array}{r}-1\\4\\3\end{array} \right] &=\frac{\overrightarrow{v}\cdot \overrightarrow{v_1}}{\overrightarrow{v_1} \cdot \overrightarrow{v_1}}\overrightarrow{v_1}
+\frac{\overrightarrow{v}\cdot \overrightarrow{v_2}}{\overrightarrow{v_2} \cdot \overrightarrow{v_2}}\overrightarrow{v_2}
+\frac{\overrightarrow{v}\cdot \overrightarrow{v_3}}{\overrightarrow{v_3} \cdot \overrightarrow{v_3}}\overrightarrow{v_3}\\
&=\frac{-5}{5}\overrightarrow{v_1}+\frac{8}{4}\overrightarrow{v_2}+\frac{5}{5}\overrightarrow{v_3}\\
&=-\overrightarrow{v_1}+2\overrightarrow{v_2}+\overrightarrow{v_3}.
\end{align*}\]
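The same coefficients can be computed mechanically from the formula \(c_i=\frac{\overrightarrow{v}\cdot \overrightarrow{v_i}}{\overrightarrow{v_i}\cdot \overrightarrow{v_i}}\); a minimal NumPy sketch:

```python
import numpy as np

v1 = np.array([2.0, 0.0, -1.0])
v2 = np.array([0.0, 2.0, 0.0])
v3 = np.array([1.0, 0.0, 2.0])
v = np.array([-1.0, 4.0, 3.0])

# c_i = (v . w_i) / (w_i . w_i) for an orthogonal basis {w_i}.
coeffs = [(v @ w) / (w @ w) for w in (v1, v2, v3)]
print(coeffs)  # [-1.0, 2.0, 1.0]

# Reconstructing v from the coefficients recovers the original vector.
assert np.allclose(coeffs[0] * v1 + coeffs[1] * v2 + coeffs[2] * v3, v)
```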
Theorem.
An \(m\times n\) real matrix \(U\) has orthonormal columns if and only if \(U^TU=I_n\).
Proof.
Let \(U=[\overrightarrow{u_1}\:\overrightarrow{u_2}\:\cdots\:\overrightarrow{u_n}]\) be an \(m\times n\) real matrix.
Then
\[U^TU= \left[\begin{array}{c}\overrightarrow{u_1}^T\\\overrightarrow{u_2}^T\\ \vdots\\\overrightarrow{u_n}^T \end{array} \right] [\overrightarrow{u_1}\:\overrightarrow{u_2}\:\cdots\overrightarrow{u_n}]
=\left[\begin{array}{cccc}
\overrightarrow{u_1}\cdot \overrightarrow{u_1} &\overrightarrow{u_1}\cdot \overrightarrow{u_2}&\cdots &\overrightarrow{u_1}\cdot \overrightarrow{u_n}\\
\overrightarrow{u_2}\cdot \overrightarrow{u_1} &\overrightarrow{u_2}\cdot \overrightarrow{u_2}&\cdots &\overrightarrow{u_2}\cdot \overrightarrow{u_n}\\
\vdots&\vdots&\ddots &\vdots\\
\overrightarrow{u_n}\cdot \overrightarrow{u_1} &\overrightarrow{u_n}\cdot \overrightarrow{u_2}&\cdots &\overrightarrow{u_n}\cdot \overrightarrow{u_n}\\ \end{array}\right].\]
Since the \((i,j)\) entry of \(U^TU\) is \(\overrightarrow{u_i}\cdot \overrightarrow{u_j}\), we have \(U^TU=I_n\) if and only if
\(\overrightarrow{u_i}\cdot \overrightarrow{u_j}=0\) for all \(i\neq j\) and \(\overrightarrow{u_i}\cdot \overrightarrow{u_i}=1\) for all \(i\),
i.e., if and only if the columns of \(U\) are orthonormal.
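Note that the identity \(U^TU=[\overrightarrow{u_i}\cdot \overrightarrow{u_j}]\) holds for any real matrix. A quick sketch using the (orthogonal but not orthonormal) columns from the first example, where \(U^TU\) comes out diagonal but is not the identity:

```python
import numpy as np

U = np.column_stack([[2.0, 0.0, -1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 2.0]])
G = U.T @ U
# Entry (i, j) of U^T U is the dot product of columns i and j.
for i in range(3):
    for j in range(3):
        assert np.isclose(G[i, j], U[:, i] @ U[:, j])
print(G)  # diag(5, 4, 5): orthogonal columns, but not orthonormal
```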
Definition.
A square real matrix \(U\) is called an orthogonal matrix if \(U\) has orthonormal columns, equivalently if
\(U^TU=I\).
Theorem.
The following are equivalent for an \(n\times n\) real matrix \(U\).
(a) \(U\) is an orthogonal matrix.
(b) \(U\) has orthonormal columns.
(c) \(U^TU=I_n\).
(d) \(UU^T=I_n\).
(e) \(U\) has orthonormal rows.
(f) \(U^{-1}=U^T\).
Example.
\(U=\left[\begin{array}{rrr}
\frac{2}{\sqrt{5}}&0&\frac{1}{\sqrt{5}}\\0&1&0\\\frac{-1}{\sqrt{5}}&0&\frac{2}{\sqrt{5}}\end{array}\right]\)
is an orthogonal matrix and
\(U^{-1}=U^T=\left[\begin{array}{rrr}
\frac{2}{\sqrt{5}}&0&\frac{-1}{\sqrt{5}}\\0&1&0\\\frac{1}{\sqrt{5}}&0&\frac{2}{\sqrt{5}}\end{array}\right]\).
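A numerical check of several of the equivalent conditions for this \(U\) (a sketch):

```python
import numpy as np

s = 1 / np.sqrt(5)
U = np.array([[2 * s, 0, s],
              [0, 1, 0],
              [-s, 0, 2 * s]])

I = np.eye(3)
assert np.allclose(U.T @ U, I)             # orthonormal columns
assert np.allclose(U @ U.T, I)             # orthonormal rows
assert np.allclose(np.linalg.inv(U), U.T)  # U^{-1} = U^T
```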
Theorem.
Let \(U\) be an \(m\times n\) real matrix with orthonormal columns. Then
(a) \((U\overrightarrow{x}) \cdot (U\overrightarrow{y})=\overrightarrow{x} \cdot \overrightarrow{y}\)
for all \(\overrightarrow{x},\overrightarrow{y} \in \mathbb R^n\).
(b) \((U\overrightarrow{x}) \cdot (U\overrightarrow{y})=0\) if and only if
\(\overrightarrow{x} \cdot \overrightarrow{y}=0\) for all \(\overrightarrow{x},\overrightarrow{y} \in \mathbb R^n\) (i.e.,
the map \(\overrightarrow{x} \mapsto U\overrightarrow{x}\) preserves orthogonality between vectors).
(c) \(\left\lVert U\overrightarrow{x}\right\rVert=\left\lVert\overrightarrow{x}\right\rVert\)
for all \(\overrightarrow{x}\in \mathbb R^n\) (i.e.,
the map \(\overrightarrow{x} \mapsto U\overrightarrow{x}\) preserves the length of vectors).
Proof.
Since the \(m\times n\) real matrix \(U\) has orthonormal columns, \(U^TU=I_n\) by the preceding theorem. Then
\((U\overrightarrow{x}) \cdot (U\overrightarrow{y})=(U\overrightarrow{x})^T (U\overrightarrow{y})
=\overrightarrow{x}^TU^TU\overrightarrow{y}=\overrightarrow{x}^TI_n\overrightarrow{y}
=\overrightarrow{x} \cdot \overrightarrow{y}\)
for all \(\overrightarrow{x},\overrightarrow{y} \in \mathbb R^n\), which proves (a). Part (b) follows immediately from (a), and setting
\(\overrightarrow{y}=\overrightarrow{x}\) in (a) gives \(\left\lVert U\overrightarrow{x}\right\rVert^2=\left\lVert\overrightarrow{x}\right\rVert^2\), which proves (c).
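A sketch that checks parts (a) and (c) on random vectors, using the orthogonal matrix from the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
s = 1 / np.sqrt(5)
U = np.array([[2 * s, 0, s], [0, 1, 0], [-s, 0, 2 * s]])

x, y = rng.standard_normal(3), rng.standard_normal(3)
# (Ux) . (Uy) = x . y, so dot products (and hence lengths) are preserved.
assert np.isclose((U @ x) @ (U @ y), x @ y)
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
```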
Corollary.
An \(n\times n\) real matrix \(U\) is orthogonal if and only if \(\left\lVert U\overrightarrow{x}\right\rVert
=\left\lVert\overrightarrow{x}\right\rVert\) for all \(\overrightarrow{x}\in \mathbb R^n\).
Proof.
Let \(U\) be an \(n\times n\) real matrix.
(\(\implies\)) It follows from (c) of the preceding theorem.
(\(\Longleftarrow\)) Suppose \(\left\lVert U\overrightarrow{x}\right\rVert=\left\lVert\overrightarrow{x}\right\rVert\)
for all
\(\overrightarrow{x}\in \mathbb R^n\). Let \(U^TU=[a_{ij}]\). Since \(U^TU\) is symmetric, \(a_{ij}=a_{ji}\).
For \(i=1,2,\ldots,n\), \(a_{ii}=(U\overrightarrow{e_i})^T(U\overrightarrow{e_i})
=\left\lVert U\overrightarrow{e_i}\right\rVert^2=\left\lVert\overrightarrow{e_i}\right\rVert^2=1\).
For \(i\neq j\), using \(a_{ii}=a_{jj}=1\) and \(a_{ij}=a_{ji}\),
\[\begin{array}{rrl}
& a_{ii}-a_{ji}-a_{ij}+a_{jj}&=(\overrightarrow{e_i}-\overrightarrow{e_j})^T(U^TU)(\overrightarrow{e_i}-\overrightarrow{e_j})
=(U(\overrightarrow{e_i}-\overrightarrow{e_j}))^T(U(\overrightarrow{e_i}-\overrightarrow{e_j}))\\
\implies & 2-2a_{ij}&=\left\lVert U(\overrightarrow{e_i}-\overrightarrow{e_j})\right\rVert^2
=\left\lVert\overrightarrow{e_i}-\overrightarrow{e_j}\right\rVert^2=2\\
\implies & a_{ij}&=0.
\end{array}\]
Thus \(U^TU=I_n\) and \(U\) is orthogonal.
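The polarization computation in the proof can be replayed numerically: the entries of \(U^TU\) are recovered from the norms \(\left\lVert U\overrightarrow{e_i}\right\rVert\) and \(\left\lVert U(\overrightarrow{e_i}-\overrightarrow{e_j})\right\rVert\) alone. A sketch using a \(2\times 2\) rotation matrix, a standard example of a norm-preserving map:

```python
import numpy as np

t = 0.7  # any angle; rotations preserve lengths
U = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

e1, e2 = np.eye(2)
# As in the proof: a_ii = ||U e_i||^2 and, for i != j,
# a_ij = (a_ii + a_jj - ||U(e_i - e_j)||^2) / 2.
a11 = np.linalg.norm(U @ e1) ** 2
a22 = np.linalg.norm(U @ e2) ** 2
a12 = (a11 + a22 - np.linalg.norm(U @ (e1 - e2)) ** 2) / 2
assert np.allclose([a11, a22, a12], [1.0, 1.0, 0.0])  # hence U^T U = I
```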