
Eigenvalues: Power Method

    


Let \(A\) be an \(n\times n\) matrix. If \(A\overrightarrow{x}=\lambda \overrightarrow{x}\) for some nonzero vector \(\overrightarrow{x}\) and some scalar \(\lambda\), then \(\lambda\) is an eigenvalue of \(A\) and \(\overrightarrow{x}\) is an eigenvector of \(A\) corresponding to \(\lambda\).

Example. Consider \(A=\left[\begin{array}{rr}1&2\\0&3\end{array} \right],\;\lambda=3,\; \overrightarrow{v}=\left[\begin{array}{r}1\\1\end{array} \right],\; \overrightarrow{u}=\left[\begin{array}{r}-2\\1\end{array} \right]\).
Since \(A\overrightarrow{v} =\left[\begin{array}{rr}1&2\\0&3\end{array} \right]\left[\begin{array}{r}1\\1\end{array} \right] =\left[\begin{array}{r}3\\3\end{array} \right] =3\left[\begin{array}{r}1\\1\end{array} \right] =\lambda\overrightarrow{v}\), \(3\) is an eigenvalue of \(A\) and \(\overrightarrow{v}\) is an eigenvector of \(A\) corresponding to the eigenvalue \(3\).
Since \(A\overrightarrow{u} =\left[\begin{array}{rr}1&2\\0&3\end{array} \right]\left[\begin{array}{r}-2\\1\end{array} \right] =\left[\begin{array}{r}0\\3\end{array} \right] \neq \lambda\left[\begin{array}{r}-2\\1\end{array} \right] =\lambda\overrightarrow{u}\) for all scalars \(\lambda\), \(\overrightarrow{u}\) is not an eigenvector of \(A\).
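The two checks above can be reproduced in a few lines of Python (a minimal sketch; the `matvec` helper is ours, written out by hand for a \(2\times 2\) matrix):

```python
# Verify that v is an eigenvector of A (for eigenvalue 3) and u is not.
def matvec(A, x):
    """Product of a 2x2 matrix A (given as a list of rows) with a vector x."""
    return [A[0][0]*x[0] + A[0][1]*x[1],
            A[1][0]*x[0] + A[1][1]*x[1]]

A = [[1, 2], [0, 3]]
v = [1, 1]
u = [-2, 1]

print(matvec(A, v))  # [3, 3] = 3*v, so v is an eigenvector for eigenvalue 3
print(matvec(A, u))  # [0, 3], not a scalar multiple of u, so u is not an eigenvector
```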

The characteristic polynomial of \(A\) is \(\det(A-\lambda I)\) and its roots are the eigenvalues of \(A\). The eigenvectors of \(A\) corresponding to the eigenvalue \(\lambda\) are nonzero solutions of \((A-\lambda I)\overrightarrow{x}=\overrightarrow{O}\). The eigenspace of \(A\) corresponding to \(\lambda\) is \[\text{NS} (A-\lambda I)=\{\overrightarrow{x}\;|\;(A-\lambda I)\overrightarrow{x}=\overrightarrow{O}\}.\] The dominant eigenvalue of \(A\) is the eigenvalue of \(A\) with the largest modulus. The spectral radius \(\rho(A)\) of \(A\) is the modulus of the dominant eigenvalue of \(A\). For example, if \(3\) and \(-4\) are the eigenvalues of \(A\), then \(-4\) is the dominant eigenvalue of \(A\) and \(\rho(A)=4\).
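For a \(2\times 2\) matrix the characteristic polynomial is \(\lambda^2-\operatorname{tr}(A)\lambda+\det(A)\), so the eigenvalues and the spectral radius follow directly from the quadratic formula. A sketch in Python (the function names are ours):

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of det(A - lambda*I) = lambda^2 - tr(A)*lambda + det(A)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    disc = cmath.sqrt(tr*tr - 4*det)   # may be complex for a real matrix
    return (tr + disc) / 2, (tr - disc) / 2

def spectral_radius(A):
    """Modulus of the dominant eigenvalue."""
    return max(abs(lam) for lam in eigenvalues_2x2(A))

A = [[1, 2], [0, 3]]
print(eigenvalues_2x2(A))  # 3 and 1: the diagonal entries of this triangular matrix
print(spectral_radius(A))  # 3.0
```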


Assume \(A\) has \(n\) linearly independent eigenvectors \(\overrightarrow{v_1},\ldots,\overrightarrow{v_n}\) with corresponding eigenvalues \(\lambda_1,\ldots,\lambda_n\) satisfying \(|\lambda_1|>|\lambda_2|\geq\cdots\geq|\lambda_n|\). The Power Method constructs a sequence \(\{\overrightarrow{x}^{(k)}\}\) defined by \[\overrightarrow{x}^{(k)}=\frac{A^k\overrightarrow{x}^{(0)}}{||A^k\overrightarrow{x}^{(0)}||},\; k\geq 1,\] with an initial approximation \(\overrightarrow{x}^{(0)}\), to approximate a unit eigenvector corresponding to the dominant eigenvalue \(\lambda_1\):
Since \(\overrightarrow{v_1},\ldots,\overrightarrow{v_n}\) are \(n\) linearly independent eigenvectors, they span \(\mathbb R^n\) and \[\overrightarrow{x}^{(0)}=c_1\overrightarrow{v_1}+c_2\overrightarrow{v_2}+\cdots+c_n\overrightarrow{v_n},\] for some scalars \(c_1,c_2,\ldots,c_n\). We choose \(\overrightarrow{x}^{(0)}\) such that \(c_1\neq 0\). Then \[\overrightarrow{x}^{(k)}=\frac{A^k\overrightarrow{x}^{(0)}}{||A^k\overrightarrow{x}^{(0)}||} \to \pm\frac{\overrightarrow{v_1}}{||\overrightarrow{v_1}||}\text{ as } k\to \infty.\]

\[\begin{align*} & \overrightarrow{x}^{(0)}=c_1\overrightarrow{v_1}+c_2\overrightarrow{v_2}+\cdots+c_n\overrightarrow{v_n}\\ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \implies & A\overrightarrow{x}^{(0)}=c_1A\overrightarrow{v_1}+c_2A\overrightarrow{v_2}+\cdots+c_nA\overrightarrow{v_n}\\ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \implies & A\overrightarrow{x}^{(0)}=c_1\lambda_1\overrightarrow{v_1}+c_2\lambda_2\overrightarrow{v_2}+\cdots+c_n\lambda_n\overrightarrow{v_n}\\ %%%%%%%%%%%%%%%%%%%%%%%%%%%%% \implies & A^2\overrightarrow{x}^{(0)}=c_1\lambda_1A\overrightarrow{v_1}+c_2\lambda_2A\overrightarrow{v_2}+\cdots+c_n\lambda_nA\overrightarrow{v_n}\\ %%%%%%%%%%%%%%%%%%%%%%%%%%% \implies & A^2\overrightarrow{x}^{(0)}=c_1\lambda_1^2\overrightarrow{v_1}+c_2\lambda_2^2\overrightarrow{v_2}+\cdots+c_n\lambda_n^2\overrightarrow{v_n}\\ &\cdots\\ %%%%%%%%%%%%%%%%%%%%%%%%%%% \implies & A^k\overrightarrow{x}^{(0)}=c_1\lambda_1^k\overrightarrow{v_1}+c_2\lambda_2^k\overrightarrow{v_2}+\cdots+c_n\lambda_n^k\overrightarrow{v_n}\\ %%%%%%%%%%%%%%%%%%%%%%%%% \implies & A^k\overrightarrow{x}^{(0)}=\lambda_1^k\left(c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right) %\implies & \frac{A^k\overrightarrow{x}^{(0)}}{\lambda_1^k}=\left(c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right)\to c_1\overrightarrow{v_1} \text{ as } k\to \infty. 
\end{align*}\] \[\begin{align*} \overrightarrow{x}^{(k)}=\frac{A^k\overrightarrow{x}^{(0)}}{||A^k\overrightarrow{x}^{(0)}||} &=\frac{\lambda_1^k\left(c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right)}{\left\Vert\lambda_1^k\left(c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right)\right\Vert }\\ %%%%%%%%%%%%%%%%%%%%%%%%%%%%% &=\frac{\lambda_1^k\left(c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right)}{|\lambda_1^k|\left\Vert\left(c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right)\right\Vert }\\ %%%%%%%%%%%%%%%%%%%%%%%%%% &=\pm\frac{c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}}{\left\Vert c_1\overrightarrow{v_1}+c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\overrightarrow{v_2}+\cdots+c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\overrightarrow{v_n}\right\Vert }\\ %%%%%%%%%%%%%%%%%%%%%% &\longrightarrow \pm\frac{c_1\overrightarrow{v_1}}{||c_1\overrightarrow{v_1}||} =\pm\frac{c_1\overrightarrow{v_1}}{|c_1|\cdot||\overrightarrow{v_1}||}=\pm\frac{\overrightarrow{v_1}}{||\overrightarrow{v_1}||}\text{ as } k\to \infty. \end{align*}\]

Note that \[A\overrightarrow{v_1}=\lambda_1\overrightarrow{v_1} \implies \overrightarrow{v_1}^T A\overrightarrow{v_1}=\lambda_1\overrightarrow{v_1}^T\overrightarrow{v_1} \implies \lambda_1=\frac{\overrightarrow{v_1}^T A\overrightarrow{v_1}}{\overrightarrow{v_1}^T \overrightarrow{v_1}}.\] If \(\overrightarrow{x}^{(k)}\approx\overrightarrow{v_1}\), then \[\lambda_1\approx \lambda_1^{(k)}:=\frac{(\overrightarrow{x}^{(k)})^T A\overrightarrow{x}^{(k)}}{(\overrightarrow{x}^{(k)})^T \overrightarrow{x}^{(k)}}.\]
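Putting the pieces together, the method repeatedly multiplies by \(A\), rescales, and finishes with the Rayleigh quotient above. A minimal pure-Python sketch, using the \(\infty\)-norm for scaling as in the worked example below (`power_method` is our name):

```python
def power_method(A, x, iters):
    """Power method with infinity-norm scaling.
    Returns (estimate of lambda_1, estimate of a dominant eigenvector)."""
    n = len(A)
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = A x
        m = max(y, key=abs)                                            # entry of largest modulus
        x = [yi / m for yi in y]                                       # scale so ||x||_inf = 1
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient: lambda_1 ~ (x^T A x) / (x^T x)
    lam = sum(xi * axi for xi, axi in zip(x, Ax)) / sum(xi * xi for xi in x)
    return lam, x

lam, v = power_method([[1, 2], [2, 1]], [2, 3], 20)
print(lam)  # close to 3.0
print(v)    # close to [1.0, 1.0]
```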

Remark.

  1. The power method does not work for \( \overrightarrow{x}^{(0)}=c_1\overrightarrow{v_1}+c_2\overrightarrow{v_2}+\cdots+c_n\overrightarrow{v_n}\) when \(c_1=0\).

  2. The rate of convergence of \(\{\overrightarrow{x}^{(k)}\}\) is faster when \(\displaystyle \left\vert\frac{\lambda_2}{\lambda_1}\right\vert\) is smaller, i.e., when \(|\lambda_2|\ll|\lambda_1|\).


Example. Perform three iterations of the power method with \(\overrightarrow{x}^{(0)}=[2,3]^T\) to approximate the dominant eigenvalue of \(A=\left[\begin{array}{rr}1&2\\2&1\end{array}\right]\) and a corresponding eigenvector.
Solution (values chopped to two decimal places). \[\begin{align*} A\overrightarrow{x}^{(0)}=\left[\begin{array}{r}8\\7\end{array}\right] &\implies \overrightarrow{x}^{(1)}=\frac{A\overrightarrow{x}^{(0)}}{||A\overrightarrow{x}^{(0)}||_{\infty}}=\frac{1}{8} \left[\begin{array}{r}8\\7\end{array}\right]=\left[\begin{array}{r}1\\0.87\end{array}\right]\\ A\overrightarrow{x}^{(1)}=\left[\begin{array}{r}2.74\\2.87\end{array}\right] &\implies \overrightarrow{x}^{(2)}=\frac{A\overrightarrow{x}^{(1)}}{||A\overrightarrow{x}^{(1)}||_{\infty}}=\frac{1}{2.87} \left[\begin{array}{r}2.74\\2.87\end{array}\right]=\left[\begin{array}{r}0.95\\1\end{array}\right]\\ A\overrightarrow{x}^{(2)}=\left[\begin{array}{r}2.95\\2.90\end{array}\right] &\implies \overrightarrow{x}^{(3)}=\frac{A\overrightarrow{x}^{(2)}}{||A\overrightarrow{x}^{(2)}||_{\infty}}=\frac{1}{2.95} \left[\begin{array}{r}2.95\\2.90\end{array}\right]=\left[\begin{array}{r}1\\0.98\end{array}\right]\\ A\overrightarrow{x}^{(3)}=\left[\begin{array}{r}2.96\\2.98\end{array}\right] & \end{align*}\] \[\begin{align*} \lambda_1^{(1)}= \frac{(\overrightarrow{x}^{(1)})^T A\overrightarrow{x}^{(1)}}{(\overrightarrow{x}^{(1)})^T \overrightarrow{x}^{(1)}} & \approx 5.23/1.75= 2.98\\ \lambda_1^{(2)}= \frac{(\overrightarrow{x}^{(2)})^T A\overrightarrow{x}^{(2)}}{(\overrightarrow{x}^{(2)})^T \overrightarrow{x}^{(2)}} &\approx 5.70/1.90 =3.00\\ \lambda_1^{(3)}= \frac{(\overrightarrow{x}^{(3)})^T A\overrightarrow{x}^{(3)}}{(\overrightarrow{x}^{(3)})^T \overrightarrow{x}^{(3)}} &\approx 5.88/1.96 =3.00 \end{align*}\] The actual values are \(\lambda_1=3\) and \(\overrightarrow{v_1}=\left[\begin{array}{r}1\\1\end{array}\right]\). Note that if \(\overrightarrow{x}^{(0)}=[1,-1]^T\), which is \(\overrightarrow{v_2}\) (so \(c_1=0\)), then the power method fails. Since \(\displaystyle\left\vert\frac{\lambda_2}{\lambda_1}\right\vert=\frac{1}{3}\) is small, the convergence is fast.
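The failure when \(c_1=0\) can be seen numerically. Starting from \(\overrightarrow{v_2}=[1,-1]^T\), every iterate stays in the eigenspace of \(\lambda_2=-1\), and the Rayleigh quotient returns \(-1\) rather than the dominant eigenvalue \(3\) (a sketch in pure Python; the intermediate values here happen to stay exactly representable in floating point):

```python
# Power iteration started at v2 = [1, -1], i.e. with c1 = 0.
A = [[1, 2], [2, 1]]
x = [1.0, -1.0]
for _ in range(50):
    y = [A[0][0]*x[0] + A[0][1]*x[1],
         A[1][0]*x[0] + A[1][1]*x[1]]
    m = max(y, key=abs)              # infinity-norm scaling
    x = [y[0] / m, y[1] / m]
# Rayleigh quotient of the limiting vector:
Ax = [A[0][0]*x[0] + A[0][1]*x[1],
      A[1][0]*x[0] + A[1][1]*x[1]]
lam = (x[0]*Ax[0] + x[1]*Ax[1]) / (x[0]*x[0] + x[1]*x[1])
print(lam)  # -1.0, the non-dominant eigenvalue
```

In floating-point practice, rounding errors often inject a tiny \(\overrightarrow{v_1}\) component that eventually pulls the iteration toward \(\lambda_1\); in this particular run the arithmetic stays exact, so that never happens.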

Applications:
The power method is used to compute Google PageRank and to generate "who to follow" recommendations on Twitter.
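As a toy illustration of the PageRank application: the ranking vector is the dominant eigenvector (for eigenvalue \(1\)) of the "Google matrix" \(G = dM + \frac{1-d}{n}\mathbf{1}\mathbf{1}^T\), and it is found by exactly this power iteration. A three-page sketch (the link structure and damping factor \(d=0.85\) below are made up for illustration):

```python
d = 0.85                      # damping factor
# Column-stochastic link matrix for 3 pages: M[i][j] = 1/outdeg(j) if page j links to page i.
# Page 0 links to pages 1 and 2; page 1 links to page 2; page 2 links to page 0.
M = [[0,   0,   1.0],
     [0.5, 0,   0  ],
     [0.5, 1.0, 0  ]]
n = 3
r = [1 / n] * n               # start from the uniform vector
for _ in range(100):
    r = [d * sum(M[i][j] * r[j] for j in range(n)) + (1 - d) / n
         for i in range(n)]
print(r)  # ranks sum to 1; page 2 (the most linked-to page) ranks highest
```

Each update \(r \leftarrow dMr + \frac{1-d}{n}\mathbf{1}\) is one power-method step applied to \(G\); since \(G\) is column-stochastic, its dominant eigenvalue is \(1\) and no rescaling is needed.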

