Moore–Penrose pseudoinverse

In mathematics, and in particular linear algebra, a pseudoinverse $A^+$ of a matrix $A$ is a generalization of the inverse matrix.^[1] The most widely known type of matrix pseudoinverse is the Moore–Penrose inverse,^[2]^[3]^[4]^[5] which was independently described by E. H. Moore^[6] in 1920, Arne Bjerhammar^[7] in 1951, and Roger Penrose^[8] in 1955. Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. When referring to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose inverse. The term generalized inverse is sometimes used as a synonym for pseudoinverse.

A common use of the pseudoinverse is to compute a "best fit" (least squares) solution to a system of linear equations that lacks a unique solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.

The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.

Notation

In the following discussion, the following conventions are adopted.

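$\mathbb{K}$ will denote one of the fields of real or complex numbers, denoted $\mathbb{R}$, $\mathbb{C}$, respectively. The vector space of $m \times n$ matrices over $\mathbb{K}$ is denoted by $\mathbb{K}^{m \times n}$.
For $A \in \mathbb{K}^{m \times n}$, $A^T$ and $A^*$ denote the transpose and Hermitian transpose (also called conjugate transpose) respectively. If $\mathbb{K} = \mathbb{R}$, then $A^* = A^T$.
For $A \in \mathbb{K}^{m \times n}$, $\operatorname{ran}(A)$ denotes the range (image) of $A$ (the space spanned by the column vectors of $A$) and $\ker(A)$ denotes the kernel (null space) of $A$.
Finally, for any positive integer $n$, $I_n \in \mathbb{K}^{n \times n}$ denotes the $n \times n$ identity matrix.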

Definition

For $A \in \mathbb{K}^{m \times n}$, a pseudoinverse of $A$ is defined as a matrix $A^+ \in \mathbb{K}^{n \times m}$ satisfying all of the following four criteria, known as the Moore–Penrose conditions:^[8]^[9]

$AA^+A = A$ ($AA^+$ need not be the general identity matrix, but it maps all column vectors of $A$ to themselves);
$A^+AA^+ = A^+$ ($A^+$ is a weak inverse for the multiplicative semigroup);
$(AA^+)^* = AA^+$ ($AA^+$ is Hermitian);
$(A^+A)^* = A^+A$ ($A^+A$ is also Hermitian).

$A^+$ exists for any matrix $A$, but, when the latter has full rank (that is, the rank of $A$ is $\min\{m, n\}$), then $A^+$ can be expressed as a simple algebraic formula.

In particular, when $A$ has linearly independent columns (and thus the matrix $A^*A$ is invertible), $A^+$ can be computed as

$$A^+ = (A^*A)^{-1}A^*.$$

This particular pseudoinverse constitutes a left inverse, since, in this case, $A^+A = I$.

When $A$ has linearly independent rows (the matrix $AA^*$ is invertible), $A^+$ can be computed as

$$A^+ = A^*(AA^*)^{-1}.$$

This is a right inverse, as $AA^+ = I$.
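The four Moore–Penrose conditions can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `satisfies_moore_penrose` and the sample matrix are illustrative choices, not part of any library.

```python
import numpy as np

def satisfies_moore_penrose(A, X, tol=1e-10):
    """Return True if X satisfies all four Moore-Penrose conditions for A."""
    AX = A @ X
    XA = X @ A
    return bool(
        np.allclose(AX @ A, A, atol=tol)            # A X A = A
        and np.allclose(XA @ X, X, atol=tol)        # X A X = X
        and np.allclose(AX.conj().T, AX, atol=tol)  # A X is Hermitian
        and np.allclose(XA.conj().T, XA, atol=tol)  # X A is Hermitian
    )

# Tall matrix with linearly independent columns, so (A* A)^{-1} A* applies.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
left_inverse = np.linalg.inv(A.conj().T @ A) @ A.conj().T

print(satisfies_moore_penrose(A, left_inverse))
print(np.allclose(left_inverse @ A, np.eye(2)))
```

Forming the inverse of $A^*A$ explicitly is done here only to mirror the formula; it is not generally the numerically preferred route.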

Properties

Existence and uniqueness

The pseudoinverse exists and is unique: for any matrix $A$, there is precisely one matrix $A^+$ that satisfies the four properties of the definition.^[9]

A matrix satisfying the first condition of the definition is known as a generalized inverse. If the matrix also satisfies the second condition, it is called a generalized reflexive inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Basic properties

If $A$ has real entries, then so does $A^+$.
If $A$ is invertible, its pseudoinverse is its inverse. That is, $A^+ = A^{-1}$.^[10]
The pseudoinverse of a zero matrix is its transpose.
The pseudoinverse of the pseudoinverse is the original matrix: $(A^+)^+ = A$.^[10]
Pseudoinversion commutes with transposition, conjugation, and taking the conjugate transpose:^[10] $(A^T)^+ = (A^+)^T$, $(\overline{A})^+ = \overline{A^+}$, $(A^*)^+ = (A^+)^*$.
The pseudoinverse of a scalar multiple of $A$ is the reciprocal multiple of $A^+$: $(\alpha A)^+ = \alpha^{-1}A^+$ for $\alpha \neq 0$.

Identities

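The following identities can be used to cancel certain subexpressions or expand expressions involving pseudoinverses. Proofs for these properties can be found in the proofs subpage.

$A^+ = A^+A^{+*}A^*$
$A^+ = A^*A^{+*}A^+$
$A = A^{+*}A^*A$
$A = AA^*A^{+*}$
$A^* = A^*AA^+$
$A^* = A^+AA^*$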

Reduction to Hermitian case

The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences:

$A^+ = (A^*A)^+A^*$
$A^+ = A^*(AA^*)^+$

as $A^*A$ and $AA^*$ are Hermitian.

Products

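If $A \in \mathbb{K}^{m \times n}$, $B \in \mathbb{K}^{n \times p}$, and if

$A$ has orthonormal columns (that is, $A^*A = I_n$), or
$B$ has orthonormal rows (that is, $BB^* = I_n$), or
$A$ has all columns linearly independent (full column rank) and $B$ has all rows linearly independent (full row rank), or
$B = A^*$ (that is, $B$ is the conjugate transpose of $A$),

then

$$(AB)^+ = B^+A^+.$$

The last property yields the equivalences

$(AA^*)^+ = A^{+*}A^+$
$(A^*A)^+ = A^+A^{+*}$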

Projectors

$P = AA^+$ and $Q = A^+A$ are orthogonal projection operators, that is, they are Hermitian ($P = P^*$, $Q = Q^*$) and idempotent ($P^2 = P$ and $Q^2 = Q$). The following hold:

$PA = AQ = A$ and $A^+P = QA^+ = A^+$
$P$ is the orthogonal projector onto the range of $A$ (which equals the orthogonal complement of the kernel of $A^*$).
$Q$ is the orthogonal projector onto the range of $A^*$ (which equals the orthogonal complement of the kernel of $A$).
$(I - Q) = (I - A^+A)$ is the orthogonal projector onto the kernel of $A$.
$(I - P) = (I - AA^+)$ is the orthogonal projector onto the kernel of $A^*$.^[9]

The last two properties imply the following identities:

$A(I - A^+A) = (I - AA^+)A = 0$
$A^*(I - AA^+) = (I - A^+A)A^* = 0$

Another property is the following: if $A \in \mathbb{K}^{n \times n}$ is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix $B \in \mathbb{K}^{m \times n}$ the following equation holds:^[11]

$$A(BA)^+ = (BA)^+$$

This can be proven by defining matrices $C = BA$, $D = A(BA)^+$, and checking that $D$ is indeed a pseudoinverse for $C$ by verifying that the defining properties of the pseudoinverse hold, when $A$ is Hermitian and idempotent.

From the last property it follows that, if $A \in \mathbb{K}^{n \times n}$ is Hermitian and idempotent, for any matrix $B \in \mathbb{K}^{n \times m}$

$$(AB)^+A = (AB)^+$$

Finally, if $A$ is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself, that is, $A^+ = A$.
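The projector properties of $AA^+$ and $A^+A$ can be illustrated on a small rank-deficient matrix. This is a sketch assuming NumPy; the matrix, the explicit `rcond` cutoff and the helper `is_orthogonal_projector` are illustrative assumptions.

```python
import numpy as np

# Rank-2 matrix: the third column is the sum of the first two, the last row is zero.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [0.0, 0.0, 0.0]])
A_pinv = np.linalg.pinv(A, rcond=1e-12)

P = A @ A_pinv   # projector onto the range of A
Q = A_pinv @ A   # projector onto the range of A*

def is_orthogonal_projector(M, tol=1e-10):
    """Hermitian and idempotent, up to a numerical tolerance."""
    return bool(np.allclose(M, M.conj().T, atol=tol)
                and np.allclose(M @ M, M, atol=tol))

print(is_orthogonal_projector(P), is_orthogonal_projector(Q))
# I - Q projects onto the kernel of A, so A (I - Q) should vanish.
print(np.allclose(A @ (np.eye(3) - Q), 0.0, atol=1e-10))
```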

Geometric construction

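If we view the matrix as a linear map $A : \mathbb{K}^n \to \mathbb{K}^m$ over a field $\mathbb{K}$, then $A^+ : \mathbb{K}^m \to \mathbb{K}^n$ can be decomposed as follows. We write $\oplus$ for the direct sum, $\perp$ for the orthogonal complement, $\ker$ for the kernel of a map, and $\operatorname{ran}$ for the image of a map. Notice that $\mathbb{K}^n = (\ker A)^\perp \oplus \ker A$ and $\mathbb{K}^m = \operatorname{ran} A \oplus (\operatorname{ran} A)^\perp$. The restriction $A : (\ker A)^\perp \to \operatorname{ran} A$ is then an isomorphism. These imply that $A^+$ is defined on $\operatorname{ran} A$ to be the inverse of this isomorphism, and on $(\operatorname{ran} A)^\perp$ to be zero.

In other words: To find $A^+b$ for given $b$ in $\mathbb{K}^m$, first project $b$ orthogonally onto the range of $A$, finding a point $p(b)$ in the range. Then form $A^{-1}(\{p(b)\})$, that is, find those vectors in $\mathbb{K}^n$ that $A$ sends to $p(b)$. This will be an affine subspace of $\mathbb{K}^n$ parallel to the kernel of $A$. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer $A^+b$ we are looking for. It can be found by taking an arbitrary member of $A^{-1}(\{p(b)\})$ and projecting it orthogonally onto the orthogonal complement of the kernel of $A$.

This description is closely related to the minimum norm solution to a linear system.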

Subspaces

$\ker(A^+) = \ker(A^*)$
$\operatorname{ran}(A^+) = \operatorname{ran}(A^*)$

Limit relations

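The pseudoinverse can be written as either of the following limits:

$$A^+ = \lim_{\delta \searrow 0} (A^*A + \delta I)^{-1}A^* = \lim_{\delta \searrow 0} A^*(AA^* + \delta I)^{-1}$$

(see Tikhonov regularization). These limits exist even if $(AA^*)^{-1}$ or $(A^*A)^{-1}$ do not exist.^[9] (p. 263)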
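The first limit can be observed numerically on a singular matrix. The sketch below assumes NumPy; `tikhonov_pinv` is a hypothetical helper name, and the matrix is chosen so that $A^*A$ is not invertible while its pseudoinverse is known in closed form.

```python
import numpy as np

def tikhonov_pinv(A, delta):
    """Regularized inverse (A* A + delta I)^{-1} A*, defined for every delta > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + delta * np.eye(n), A.conj().T)

# A* A = [[2, 0], [0, 0]] is singular, yet the limit as delta -> 0 exists.
A = np.array([[1.0, 0.0], [1.0, 0.0]])
exact = np.array([[0.5, 0.5], [0.0, 0.0]])   # pseudoinverse of A

errors = [np.linalg.norm(tikhonov_pinv(A, d) - exact) for d in (1e-2, 1e-4, 1e-6)]
print(errors)   # shrinks roughly in proportion to delta
```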

Continuity

In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence $(A_n)$ converges to the matrix $A$ (in the maximum norm or Frobenius norm, say), then $(A_n)^+$ need not converge to $A^+$. However, if all the matrices $A_n$ have the same rank as $A$, then $(A_n)^+$ will converge to $A^+$.^[12]
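A standard illustration of the discontinuity uses diagonal matrices whose second entry tends to zero. This is a sketch assuming NumPy; the helper `gaps` is an illustrative name.

```python
import numpy as np

A_limit = np.diag([1.0, 0.0])   # rank 1

def gaps(n):
    """Distance of A_n = diag(1, 1/n) from its limit, before and after pseudoinversion."""
    A_n = np.diag([1.0, 1.0 / n])   # rank 2 for every n
    gap_in_A = np.linalg.norm(A_n - A_limit)
    gap_in_pinv = np.linalg.norm(np.linalg.pinv(A_n) - np.linalg.pinv(A_limit))
    return gap_in_A, gap_in_pinv

for n in (10, 1000, 100000):
    print(n, gaps(n))
```

The matrices approach the limit while their pseudoinverses move away from its pseudoinverse, because the rank drops from 2 to 1 in the limit.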

Derivative

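The derivative of a real valued pseudoinverse matrix which has constant rank at a point $x$ may be calculated in terms of the derivative of the original matrix:^[13]

$$\frac{\mathrm{d}}{\mathrm{d}x}A^+(x) = -A^+\left(\frac{\mathrm{d}}{\mathrm{d}x}A\right)A^+ + A^+A^{+T}\left(\frac{\mathrm{d}}{\mathrm{d}x}A^T\right)\left(I - AA^+\right) + \left(I - A^+A\right)\left(\frac{\mathrm{d}}{\mathrm{d}x}A^T\right)A^{+T}A^+$$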
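The derivative formula of Golub and Pereyra, $-A^+A'A^+ + A^+A^{+T}A'^T(I - AA^+) + (I - A^+A)A'^TA^{+T}A^+$ with $A' = \mathrm{d}A/\mathrm{d}x$, can be compared against a central finite difference. The sketch assumes NumPy and a hypothetical one-parameter family $A(x) = A_0 + xA_1$ with full column rank near $x = 0$.

```python
import numpy as np

def pinv_derivative(A, dA):
    """Derivative of A^+ for real A of locally constant rank, given dA = dA/dx."""
    Ap = np.linalg.pinv(A)
    m, n = A.shape
    return (-Ap @ dA @ Ap
            + Ap @ Ap.T @ dA.T @ (np.eye(m) - A @ Ap)
            + (np.eye(n) - Ap @ A) @ dA.T @ Ap.T @ Ap)

# A(x) = A0 + x * A1; A0 has full column rank, so the rank is constant near x = 0.
A0 = np.array([[1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 3.0],
               [1.0, 1.0, 1.0]])
A1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, -1.0]])

h = 1e-6
finite_difference = (np.linalg.pinv(A0 + h * A1) - np.linalg.pinv(A0 - h * A1)) / (2 * h)
analytic = pinv_derivative(A0, A1)
print(np.max(np.abs(finite_difference - analytic)))
```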

Examples

Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.

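For $A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, the pseudoinverse is $A^+ = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$. (Generally, the pseudoinverse of a zero matrix is its transpose.) The uniqueness of this pseudoinverse can be seen from the requirement $A^+ = A^+AA^+$, since multiplication by a zero matrix would always produce a zero matrix.

For $A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$, the pseudoinverse is $A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix}$. Indeed, $AA^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}$, and thus $AA^+A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} = A$. Similarly, $A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, and thus $A^+AA^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix} = A^+$.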

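For $A = \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix}$, $A^+ = \begin{pmatrix} \frac{1}{5} & \frac{2}{5} \\ 0 & 0 \end{pmatrix}$. (The denominators are $5 = 1^2 + 2^2$.)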

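For $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, the pseudoinverse is $A^+ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix}$. Note that for this matrix, the left inverse exists and thus equals $A^+$; indeed, $A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.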

Special cases

Scalars

It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar $x$ is zero if $x$ is zero and the reciprocal of $x$ otherwise:

$$x^+ = \begin{cases} 0, & \text{if } x = 0; \\ x^{-1}, & \text{otherwise}. \end{cases}$$

Vectors

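The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

$$x^+ = \begin{cases} 0^T, & \text{if } x = 0; \\ \dfrac{x^*}{x^*x}, & \text{otherwise}. \end{cases}$$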

Linearly independent columns

If the columns of $A$ are linearly independent (so that $m \ge n$), then $A^*A$ is invertible. In this case, an explicit formula is:^[1]

$$A^+ = (A^*A)^{-1}A^*.$$

It follows that $A^+$ is then a left inverse of $A$: $A^+A = I_n$.

Linearly independent rows

If the rows of $A$ are linearly independent (so that $m \le n$), then $AA^*$ is invertible. In this case, an explicit formula is:

$$A^+ = A^*(AA^*)^{-1}.$$

It follows that $A^+$ is a right inverse of $A$: $AA^+ = I_m$.

Orthonormal columns or rows

This is a special case of either full column rank or full row rank (treated above). If $A$ has orthonormal columns ($A^*A = I_n$) or orthonormal rows ($AA^* = I_m$), then:

$$A^+ = A^*.$$

Orthogonal projection matrices

If $A$ is an orthogonal projection matrix, that is, $A = A^*$ and $A^2 = A$, then the pseudoinverse trivially coincides with the matrix itself:

$$A^+ = A.$$

Circulant matrices

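For a circulant matrix $C$, the singular value decomposition is given by the Fourier transform, that is, the singular values are the Fourier coefficients. Let $\mathcal{F}$ be the Discrete Fourier Transform (DFT) matrix, then^[14]

$C = \mathcal{F} \cdot \Sigma \cdot \mathcal{F}^*$
$C^+ = \mathcal{F} \cdot \Sigma^+ \cdot \mathcal{F}^*$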
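For a circulant matrix the pseudoinverse can therefore be obtained with fast Fourier transforms instead of a general SVD. The following sketch assumes NumPy and a circulant matrix described by its first column; the helper names `circulant` and `circulant_pinv` and the relative cutoff `tol` are illustrative assumptions.

```python
import numpy as np

def circulant(c):
    """Circulant matrix whose column j is the first column c shifted cyclically by j."""
    c = np.asarray(c)
    return np.column_stack([np.roll(c, j) for j in range(c.size)])

def circulant_pinv(c, tol=1e-12):
    """Pseudoinverse of circulant(c), computed from the Fourier coefficients of c."""
    c = np.asarray(c, dtype=float)
    coefficients = np.fft.fft(c)
    magnitudes = np.abs(coefficients)
    keep = magnitudes > tol * magnitudes.max()
    inverted = np.zeros(c.size, dtype=complex)
    inverted[keep] = 1.0 / coefficients[keep]   # invert nonzero coefficients only
    # The pseudoinverse is again circulant; this is its first column.
    return circulant(np.fft.ifft(inverted))

c = [2.0, 1.0, 0.0, 1.0]   # Fourier coefficients 4, 2, 0, 2: a singular matrix
print(np.allclose(circulant_pinv(c), np.linalg.pinv(circulant(c), rcond=1e-10)))
```

The result has a complex dtype; for real input the imaginary parts are rounding noise.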

Construction

Rank decomposition

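Let $r \le \min(m, n)$ denote the rank of $A \in \mathbb{K}^{m \times n}$. Then $A$ can be (rank) decomposed as $A = BC$ where $B \in \mathbb{K}^{m \times r}$ and $C \in \mathbb{K}^{r \times n}$ are of rank $r$. Then $A^+ = C^+B^+ = C^*(CC^*)^{-1}(B^*B)^{-1}B^*$.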
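The rank-decomposition formula $A^+ = C^*(CC^*)^{-1}(B^*B)^{-1}B^*$ translates directly into code. This is a minimal sketch assuming NumPy, with a hypothetical rank-1 example whose factors are given rather than computed.

```python
import numpy as np

def pinv_from_rank_factors(B, C):
    """A^+ = C* (C C*)^{-1} (B* B)^{-1} B* for A = B C with B, C of full rank r."""
    Bh, Ch = B.conj().T, C.conj().T
    return Ch @ np.linalg.inv(C @ Ch) @ np.linalg.inv(Bh @ B) @ Bh

# Rank-1 example: B is 3 x 1 and C is 1 x 2, so A = B C is 3 x 2 of rank 1.
B = np.array([[1.0], [2.0], [2.0]])
C = np.array([[3.0, 4.0]])
A = B @ C
A_pinv = pinv_from_rank_factors(B, C)

print(np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-12)))
```

Here $B^*B = 9$ and $CC^* = 25$, so the result is $C^*B^*/225$.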

The QR method

For $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$, computing the product $AA^*$ or $A^*A$ and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the QR decomposition of $A$ may be used instead.

Consider the case when $A$ is of full column rank, so that $A^+ = (A^*A)^{-1}A^*$. Then the Cholesky decomposition $A^*A = R^*R$, where $R$ is an upper triangular matrix, may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides,

$$A^+ = (A^*A)^{-1}A^* \quad\Leftrightarrow\quad (A^*A)A^+ = A^* \quad\Leftrightarrow\quad R^*RA^+ = A^*$$

which may be solved by forward substitution followed by back substitution.

The Cholesky decomposition may be computed without forming $A^*A$ explicitly, by alternatively using the QR decomposition of $A = QR$, where $Q$ has orthonormal columns, $Q^*Q = I$, and $R$ is upper triangular. Then

$$A^*A = (QR)^*(QR) = R^*Q^*QR = R^*R,$$

so $R$ is the Cholesky factor of $A^*A$.

The case of full row rank is treated similarly by using the formula $A^+ = A^*(AA^*)^{-1}$ and using a similar argument, swapping the roles of $A$ and $A^*$.
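For full column rank, substituting $A = QR$ into $R^*RA^+ = A^*$ gives $RA^+ = Q^*$, so a single triangular solve suffices. The sketch below assumes NumPy; `pinv_full_column_rank_qr` is an illustrative name, and the general-purpose `np.linalg.solve` stands in for a dedicated back-substitution routine.

```python
import numpy as np

def pinv_full_column_rank_qr(A):
    """Pseudoinverse of a full-column-rank matrix from its reduced QR factorization."""
    # Reduced mode: Q is m x n with orthonormal columns, R is n x n upper triangular.
    Q, R = np.linalg.qr(A)
    # Solve R X = Q* for X = A^+. np.linalg.solve does not exploit the
    # triangular structure of R; a triangular solver would be cheaper.
    return np.linalg.solve(R, Q.conj().T)

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A_pinv = pinv_full_column_rank_qr(A)
print(A_pinv)
```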

Singular value decomposition (SVD)

A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition.^[1]^[9]^[15] If $A = U\Sigma V^*$ is the singular value decomposition of $A$, then $A^+ = V\Sigma^+U^*$. For a rectangular diagonal matrix such as $\Sigma$, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place, and then transposing the matrix. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB or GNU Octave function pinv, the tolerance is taken to be $t = \varepsilon \cdot \max(m, n) \cdot \max(\Sigma)$, where $\varepsilon$ is the machine epsilon. The NumPy function pinv exposes the cutoff as a parameter (rcond), and its default need not match this formula.

The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the-art implementation (such as that of LAPACK) is used.

The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix $A$ has a singular value 0 (a diagonal entry of the matrix $\Sigma$ above), then modifying $A$ slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.
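The SVD recipe, including the tolerance $t = \varepsilon \cdot \max(m, n) \cdot \max(\Sigma)$, can be sketched as follows. This assumes NumPy; `pinv_svd` is an illustrative helper, not a library function.

```python
import numpy as np

def pinv_svd(A, tol=None):
    """Pseudoinverse via the thin SVD, A^+ = V Sigma^+ U*."""
    A = np.asarray(A)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    if tol is None:
        # Tolerance quoted in the text: eps * max(m, n) * largest singular value.
        largest = s.max() if s.size else 0.0
        tol = np.finfo(s.dtype).eps * max(A.shape) * largest
    s_inv = np.zeros_like(s)
    nonzero = s > tol
    s_inv[nonzero] = 1.0 / s[nonzero]   # reciprocals of the retained singular values
    # Scaling the columns of V by s_inv applies Sigma^+ without forming it.
    return (Vh.conj().T * s_inv) @ U.conj().T

A = np.array([[1.0, 0.0], [2.0, 0.0]])
print(pinv_svd(A))
```

Singular values at or below the tolerance are treated as exact zeros, which is what makes the result well defined for rank-deficient input.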

Block matrices

Optimized approaches exist for calculating the pseudoinverse of block structured matrices.

The iterative method of Ben-Israel and Cohen

Another method for computing the pseudoinverse (cf. Drazin inverse) uses the recursion

$$A_{i+1} = 2A_i - A_iAA_i,$$

which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of $A$ if it is started with an appropriate $A_0$ satisfying $A_0A = (A_0A)^*$. The choice $A_0 = \alpha A^*$ (where $0 < \alpha < 2/\sigma_1^2(A)$, with $\sigma_1(A)$ denoting the largest singular value of $A$)^[16] has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before $A_i$ enters the region of quadratic convergence.^[17] However, if started with $A_0$ already close to the Moore–Penrose inverse and $A_0A = (A_0A)^*$, for example $A_0 := (A^*A + \delta I)^{-1}A^*$, convergence is fast (quadratic).
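The recursion $A_{i+1} = 2A_i - A_iAA_i$ with the starting value $A_0 = \alpha A^*$ can be sketched in a few lines. The code assumes NumPy and real input, picks $\alpha = 1/\sigma_1^2$ (one admissible value in the stated interval) and uses a fixed iteration count; the example is a well-conditioned full-column-rank matrix, and behaviour for rank-deficient or ill-conditioned input is not examined here.

```python
import numpy as np

def pinv_hyperpower(A, iterations=20):
    """Iterate X <- 2X - X A X starting from X = alpha * A^T (real matrices)."""
    A = np.asarray(A, dtype=float)
    sigma_max = np.linalg.norm(A, 2)      # largest singular value of A
    if sigma_max == 0.0:
        return A.T.copy()                 # pseudoinverse of a zero matrix
    X = A.T / sigma_max**2                # alpha = 1 / sigma_1^2 < 2 / sigma_1^2
    for _ in range(iterations):
        X = 2.0 * X - X @ A @ X
    return X

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(np.max(np.abs(pinv_hyperpower(A) - np.linalg.pinv(A))))
```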

Updating the pseudoinverse

For the cases where $A$ has full row or column rank, and the inverse of the correlation matrix ($AA^*$ for $A$ with full row rank or $A^*A$ for full column rank) is already known, the pseudoinverse for matrices related to $A$ can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.^[18]^[19]

Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.^[20]^[21]
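As one concrete instance of such an update, appending a row $a^T$ to a real full-column-rank matrix $A$ changes the correlation matrix to $A^TA + aa^T$, whose inverse follows from the Sherman–Morrison formula. The sketch assumes NumPy and real data; `append_row_update` is a hypothetical helper name.

```python
import numpy as np

def append_row_update(G, a):
    """Inverse of (A^T A + a a^T) given the symmetric matrix G = (A^T A)^{-1}."""
    Ga = G @ a
    # Sherman-Morrison: (M + a a^T)^{-1} = G - (G a)(G a)^T / (1 + a^T G a).
    return G - np.outer(Ga, Ga) / (1.0 + a @ Ga)

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
G = np.linalg.inv(A.T @ A)            # assumed to be known already

a = np.array([1.0, 2.0])              # new row to append
A_new = np.vstack([A, a])
G_new = append_row_update(G, a)
A_new_pinv = G_new @ A_new.T          # (A_new^T A_new)^{-1} A_new^T

print(np.allclose(A_new_pinv, np.linalg.pinv(A_new)))
```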

Software libraries

The Python package NumPy provides a pseudoinverse calculation through its functions matrix.I and linalg.pinv; its pinv uses the SVD-based algorithm. SciPy adds a function scipy.linalg.pinv, which originally used a least-squares solver; recent SciPy releases compute it from the SVD instead. High-quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

The MASS package for R provides a calculation of the Moore–Penrose inverse through the ginv function.^[22] The ginv function calculates a pseudoinverse using the singular value decomposition provided by the svd function in the base R package. An alternative is to employ the pinv function available in the pracma package.

The Octave programming language provides a pseudoinverse through the standard package function pinv and the pseudo_inverse() method.

Applications

Linear least-squares

The pseudoinverse provides a least squares solution to a system of linear equations.^[23] For $A \in \mathbb{K}^{m \times n}$, given a system of linear equations

$$Ax = b,$$

in general, a vector $x$ that solves the system may not exist, or if one does exist, it may not be unique. The pseudoinverse solves the "least-squares" problem as follows:

$\forall x \in \mathbb{K}^n$, we have $\|Ax - b\|_2 \ge \|Az - b\|_2$ where $z = A^+b$ and $\|\cdot\|_2$ denotes the Euclidean norm. This weak inequality holds with equality if and only if $x = A^+b + (I - A^+A)w$ for some vector $w$; this provides an infinitude of minimizing solutions unless $A$ has full column rank, in which case $(I - A^+A)$ is a zero matrix.^[24] The solution with minimum Euclidean norm is $z$.^[24]

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let $B \in \mathbb{K}^{m \times p}$.

$\forall X \in \mathbb{K}^{n \times p}$, we have $\|AX - B\|_F \ge \|AZ - B\|_F$ where $Z = A^+B$ and $\|\cdot\|_F$ denotes the Frobenius norm.
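A small line-fitting problem illustrates the least-squares property of $z = A^+b$. The sketch assumes NumPy; the data points are made up for illustration, and `np.linalg.lstsq` is used only as an independent cross-check.

```python
import numpy as np

# Overdetermined, inconsistent system: fit y = c0 + c1 * t to three points.
t = np.array([0.0, 1.0, 2.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.0, 0.0, 2.0])

z = np.linalg.pinv(A) @ b                        # z = A^+ b
residual_z = np.linalg.norm(A @ z - b)

x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]   # reference least-squares solver
print(z, x_lstsq, residual_z)
```

Any other candidate $x$ gives a residual at least as large as `residual_z`.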

Obtaining all solutions of a linear system

If the linear system

$$Ax = b$$

has any solutions, they are all given by^[25]

$$x = A^+b + \left[I - A^+A\right]w$$

for arbitrary vector $w$. Solution(s) exist if and only if $AA^+b = b$.^[25] If the latter holds, then the solution is unique if and only if $A$ has full column rank, in which case $[I - A^+A]$ is a zero matrix. If solutions exist but $A$ does not have full column rank, then we have an indeterminate system, all of whose infinitude of solutions are given by this last equation.
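The solvability test $AA^+b = b$ and the parametrization $x = A^+b + (I - A^+A)w$ can be sketched as below. NumPy is assumed; the helper names, the explicit `rcond` cutoff and the example systems are illustrative.

```python
import numpy as np

def is_consistent(A, b, rcond=1e-12):
    """Solvability test: A x = b has a solution if and only if A A^+ b = b."""
    return bool(np.allclose(A @ np.linalg.pinv(A, rcond=rcond) @ b, b))

def general_solution(A, b, w, rcond=1e-12):
    """x = A^+ b + (I - A^+ A) w, a solution of A x = b for every w if one exists."""
    A_pinv = np.linalg.pinv(A, rcond=rcond)
    n = A.shape[1]
    return A_pinv @ b + (np.eye(n) - A_pinv @ A) @ w

A = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # 2 equations, 3 unknowns
b = np.array([2.0, 3.0])

print(is_consistent(A, b))
for w in (np.zeros(3), np.array([1.0, -5.0, 2.0])):
    x = general_solution(A, b, w)
    print(x, np.allclose(A @ x, b))
```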

Minimum norm solution to a linear system

For linear systems $Ax = b$ with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm $\|x\|_2$ among all solutions.

If $Ax = b$ is satisfiable, the vector $z = A^+b$ is a solution, and satisfies $\|z\|_2 \le \|x\|_2$ for all solutions.

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let $B \in \mathbb{K}^{m \times p}$.

If $AX = B$ is satisfiable, the matrix $Z = A^+B$ is a solution, and satisfies $\|Z\|_F \le \|X\|_F$ for all solutions.
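The minimum-norm property with several right-hand sides can be seen on a small under-determined system. The sketch assumes NumPy; the matrices are made up for illustration.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # full row rank, 3 unknowns
B = np.array([[2.0, 0.0], [3.0, 1.0]])             # two right-hand sides

Z = np.linalg.pinv(A) @ B                          # Z = A^+ B

# Another exact solution of A X = B, found by inspection.
X_other = np.array([[2.0, 0.0], [0.0, 0.0], [3.0, 1.0]])

print(np.allclose(A @ Z, B), np.allclose(A @ X_other, B))
print(np.linalg.norm(Z, "fro"), np.linalg.norm(X_other, "fro"))
```

Both matrices solve the system, and the Frobenius norm of `Z` is the smaller of the two.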

Condition number

Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix:

$$\operatorname{cond}(A) = \|A\| \, \|A^+\|.$$

A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of $A$ can lead to huge errors in the entries of the solution.^[26]
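With the spectral norm, $\|A\|\,\|A^+\|$ is the ratio of the largest to the smallest nonzero singular value. A minimal sketch, assuming NumPy; `pinv_condition_number` is an illustrative helper name.

```python
import numpy as np

def pinv_condition_number(A, norm_ord=2):
    """cond(A) = ||A|| * ||A^+|| for the chosen matrix norm."""
    return np.linalg.norm(A, norm_ord) * np.linalg.norm(np.linalg.pinv(A), norm_ord)

# Rectangular example with singular values sqrt(2) and 1.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(pinv_condition_number(A), np.linalg.cond(A))
```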

Generalizations

In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators A : H1 → H2 between two Hilbert spaces H1 and H2, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.^[26] Those that do are precisely the ones whose range is closed in H2.

In abstract algebra, a Moore–Penrose inverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.

References

[1] Ben-Israel, Adi; Greville, Thomas N.E. (2003). Generalized inverses: Theory and applications (2nd ed.). New York, NY: Springer. doi:10.1007/b97366. ISBN 978-0-387-00293-4.
[2] Ben-Israel & Greville (2003), p. 7.
[3] Campbell, S. L.; Meyer, Jr., C. D. (1991). Generalized Inverses of Linear Transformations. Dover. ISBN 978-0-486-66693-8. p. 10.
[4] Nakamura, Yoshihiko (1991). Advanced Robotics: Redundancy and Optimization. Addison-Wesley. ISBN 978-0201151985. p. 42.
[5] Rao, C. Radhakrishna; Mitra, Sujit Kumar (1971). Generalized Inverse of Matrices and its Applications. New York: John Wiley & Sons. p. 240. ISBN 978-0-471-70821-6. pp. 50–51.
[6] Moore, E. H. (1920). "On the reciprocal of the general algebraic matrix". Bulletin of the American Mathematical Society. 26 (9): 394–95. doi:10.1090/S0002-9904-1920-03322-7.
[7] Bjerhammar, Arne (1951). "Application of calculus of matrices to method of least squares; with special references to geodetic calculations". Trans. Roy. Inst. Tech. Stockholm. 49.
[8] Penrose, Roger (1955). "A generalized inverse for matrices". Proceedings of the Cambridge Philosophical Society. 51 (3): 406–13. doi:10.1017/S0305004100030401.
[9] Golub, Gene H.; Charles F. Van Loan (1996). Matrix computations (3rd ed.). Baltimore: Johns Hopkins. pp. 257–258. ISBN 978-0-8018-5414-9.
[10] Stoer, Josef; Bulirsch, Roland (2002). Introduction to Numerical Analysis (3rd ed.). Berlin, New York: Springer-Verlag. ISBN 978-0-387-95452-3.
[11] Maciejewski, Anthony A.; Klein, Charles A. (1985). "Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments". International Journal of Robotics Research. 4 (3): 109–117. doi:10.1177/027836498500400308.
[12] Rakočević, Vladimir (1997). "On continuity of the Moore–Penrose and Drazin inverses" (PDF). Matematički Vesnik. 49: 163–72.
[13] Golub, G. H.; Pereyra, V. (April 1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate". SIAM Journal on Numerical Analysis. 10 (2): 413–32. doi:10.1137/0710036. JSTOR 2156365.
[14] Stallings, W. T.; Boullion, T. L. (1972). "The Pseudoinverse of an r-Circulant Matrix". Proceedings of the American Mathematical Society. 34 (2): 385–88. doi:10.2307/2038377. JSTOR 2038377.
[15] Linear Systems & Pseudo-Inverse. websites.uwlax.edu.
[16] Ben-Israel, Adi; Cohen, Dan (1966). "On Iterative Computation of Generalized Inverses and Associated Projections". SIAM Journal on Numerical Analysis. 3 (3): 410–19. doi:10.1137/0703035. JSTOR 2949637.
[17] Söderström, Torsten; Stewart, G. W. (1974). "On the Numerical Properties of an Iterative Method for Computing the Moore–Penrose Generalized Inverse". SIAM Journal on Numerical Analysis. 11 (1): 61–74. doi:10.1137/0711008. JSTOR 2156431.
[18] Gramß, Tino (1992). Worterkennung mit einem künstlichen neuronalen Netzwerk (PhD dissertation). Georg-August-Universität zu Göttingen. OCLC 841706164.
[19] Emtiyaz, Mohammad (February 27, 2008). "Updating Inverse of a Matrix When a Column is Added/Removed" (PDF). emtiyaz.github.io.
[20] Meyer, Jr., Carl D. (1973). "Generalized inverses and ranks of block matrices". SIAM J. Appl. Math. 25 (4): 597–602. doi:10.1137/0125057.
