Matrix Calculus

In the main part of this page, we express results in terms of differentials rather than derivatives for two reasons: they avoid notational disagreements, and they cope easily with the complex case. In most cases, however, the differentials have been written in the form dY: = dY/dX dX: so that the corresponding derivative may be easily extracted.
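Here is a small illustration of extracting a derivative from a differential. The example is chosen here rather than taken from the text. For the scalar f(X) = tr(AX), with A q#p constant and X p#q, the differential is df = tr(A dX) = (A^T:)^T dX:, so df/dX is the row vector (A^T:)^T. The pure-Python sketch below assumes that ':' stacks columns (column-major order). It compares this formula with central finite differences; the helper names are illustrative.

```python
# Sketch: extract df/dX from the differential of f(X) = tr(A X).
# Matrices are lists of rows; vec() implements ':' by stacking columns.

def vec(M):
    """Stack the columns of M into one list (the ':' operator)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def trace_AX(A, X):
    """tr(A X) for A (q x p) and X (p x q)."""
    q, p = len(A), len(X)
    return sum(A[i][k] * X[k][i] for i in range(q) for k in range(p))

def numeric_derivative(A, X, h=1e-6):
    """Row vector df/dX: entry n is the central-difference slope w.r.t. element n of X:."""
    p, q = len(X), len(X[0])
    grad = []
    for j in range(q):          # column-major order, matching vec()
        for i in range(p):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            grad.append((trace_AX(A, Xp) - trace_AX(A, Xm)) / (2 * h))
    return grad

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]           # q x p = 2 x 3
X = [[0.5, -1.0],
     [2.0, 0.0],
     [1.5, 3.0]]                # p x q = 3 x 2

analytic = vec(transpose(A))    # df/dX = (A^T:)^T, read off from df = tr(A dX)
numeric = numeric_derivative(A, X)
print(analytic)                 # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(all(abs(a - b) < 1e-6 for a, b in zip(analytic, numeric)))  # True
```

Because f is linear in X, the finite differences agree with the extracted derivative up to rounding error.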

Derivatives with respect to a real matrix

If X is p#q and Y is m#n, then dY: = dY/dX dX: where the derivative dY/dX is a large mn#pq matrix. If X and/or Y are column vectors or scalars, then the vectorization operator : has no effect and may be omitted. dY/dX is also called the Jacobian Matrix of Y: with respect to X:, and, when mn = pq so that this matrix is square, det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration: Integral(f(Y) dY:) = Integral(f(Y(X)) |det(dY/dX)| dX:).
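For a concrete instance of the large derivative matrix, consider Y = AX with A m#p constant and X p#q. This example is chosen for illustration. Then dY = A dX. The standard identity (AXB): = (B^T ⊗ A) X: gives dY: = (I ⊗ A) dX:, so dY/dX = I_q ⊗ A, an mq#pq matrix.

The pure-Python sketch below makes the following assumptions:
- ':' is taken as column-major vectorization.
- The Jacobian is built column by column from central differences and compared with the Kronecker product.
- In the square case m = p, it checks that det(dY/dX) = det(A)^q.

The helper names are illustrative.

```python
# Sketch: the Jacobian dY/dX of Y = A X, with ':' as column-major vectorization.

def vec(M):
    """Stack the columns of M (the ':' operator)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker product A ⊗ B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small M)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def jacobian(A, X, h=1e-6):
    """Numerical dY/dX: column n is the central difference of Y: w.r.t. element n of X:."""
    p, q = len(X), len(X[0])
    cols = []
    for j in range(q):
        for i in range(p):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            yp, ym = vec(matmul(A, Xp)), vec(matmul(A, Xm))
            cols.append([(u - v) / (2 * h) for u, v in zip(yp, ym)])
    return [list(r) for r in zip(*cols)]   # turn the list of columns into rows

A = [[2.0, 1.0],
     [0.5, 3.0]]            # m x p = 2 x 2
X = [[1.0, -2.0],
     [0.0, 4.0]]            # p x q = 2 x 2
q = len(X[0])
I_q = [[1.0 if r == c else 0.0 for c in range(q)] for r in range(q)]

J_numeric = jacobian(A, X)
J_exact = kron(I_q, A)      # dY/dX = I_q ⊗ A
print(all(abs(a - b) < 1e-6 for ra, rb in zip(J_numeric, J_exact) for a, b in zip(ra, rb)))  # True
print(det(J_exact), det(A) ** q)   # the two values agree
```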

Although they do not generalise so well, other authors use alternative notations for the cases when X and Y are both vectors or when one is a scalar. In particular:

Derivatives with respect to a complex matrix

If X is complex, then dY: = dY/dX dX: holds for arbitrary dX if and only if Y(X) is an analytic function. This normally implies that Y(X) does not depend explicitly on $X^C$ or $X^H$.
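To see why analyticity matters, compare the difference quotient (f(x+h) − f(x))/h along a real step h = ε and along an imaginary step h = jε. Two scalar functions are used, both chosen for illustration:
- The analytic f(x) = x^2. Its quotient tends to 2x in every direction.
- The non-analytic f(x) = x x^C = |x|^2, which depends explicitly on x^C. Its two quotients approach x^C + x and x^C − x, so no single df/dx exists.

```python
# Sketch: direction-dependence of the difference quotient for a non-analytic function.

def quotient(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h for a complex step h."""
    return (f(x + h) - f(x)) / h

def analytic(x):
    return x * x                      # f(x) = x^2, no dependence on conj(x)

def non_analytic(x):
    return x * x.conjugate()          # f(x) = x x^C = |x|^2

x = 1.0 + 2.0j
eps = 1e-7
for name, f in [("x^2", analytic), ("|x|^2", non_analytic)]:
    along_real = quotient(f, x, eps)          # step h = eps
    along_imag = quotient(f, x, 1j * eps)     # step h = j*eps
    print(name, along_real, along_imag)
# x^2:   both quotients are close to 2x = 2+4j
# |x|^2: about 2 (= x^C + x) along the real axis, about -4j (= x^C - x) along the imaginary axis
```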

Even for non-analytic functions, we can treat X and $X^C$ (with $X^H = (X^C)^T$) as distinct variables and write uniquely

$$dY{:} = \frac{\partial Y}{\partial X}\,dX{:} + \frac{\partial Y}{\partial X^C}\,dX^C{:}$$

provided that Y is analytic with respect to X and $X^C$ individually (or, equivalently, with respect to the real and imaginary parts of X, $X^R$ and $X^I$, individually). $\partial Y/\partial X$ is the Generalized Complex Derivative and $\partial Y/\partial X^C$ is the Complex Conjugate Derivative [R.4, R.9]; their properties are studied in Wirtinger Calculus.
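The decomposition can be checked numerically on a non-analytic scalar example, f(x) = x^2 x^C, chosen for illustration. Treating x and x^C as independent gives ∂f/∂x = 2 x x^C and ∂f/∂x^C = x^2. The sketch below takes these hand-derived formulas as given. It verifies that (∂f/∂x) dx + (∂f/∂x^C) dx^C matches f(x+dx) − f(x) to first order for steps dx in several directions.

```python
# Sketch: first-order Wirtinger expansion df ≈ (∂f/∂x) dx + (∂f/∂x^C) dx^C
# for the illustrative non-analytic function f(x) = x^2 x^C.
import cmath

def f(x):
    return x * x * x.conjugate()

def d_dx(x):
    return 2 * x * x.conjugate()      # ∂f/∂x, treating x^C as a constant

def d_dxc(x):
    return x * x                      # ∂f/∂x^C, treating x as a constant

x = 0.8 - 1.3j
for angle in (0.0, 0.7, 2.0, 4.5):    # several step directions
    dx = 1e-6 * cmath.exp(1j * angle)
    exact = f(x + dx) - f(x)
    linear = d_dx(x) * dx + d_dxc(x) * dx.conjugate()
    # The residual is second order in |dx|, so it is far smaller than |dx| itself.
    print(angle, abs(exact - linear))
```

Dropping the dx^C term makes the residual first order in |dx| and direction-dependent, which is the failure shown for non-analytic functions above.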

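We define the generalized derivatives in terms of partial derivatives with respect to $X^R$ and $X^I$, where $X = X^R + jX^I$:

$$\frac{\partial Y}{\partial X} = \tfrac{1}{2}\left(\frac{\partial Y}{\partial X^R} - j\,\frac{\partial Y}{\partial X^I}\right), \qquad \frac{\partial Y}{\partial X^C} = \tfrac{1}{2}\left(\frac{\partial Y}{\partial X^R} + j\,\frac{\partial Y}{\partial X^I}\right)$$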
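Here is a sketch that evaluates the generalized derivatives numerically for a scalar x = x^R + j x^I. It takes central differences along the real and imaginary axes and combines them using the standard Wirtinger forms, which are written out in the code comments. The helper name `wirtinger` and the two test functions are illustrative. For the analytic f = x^3, the conjugate derivative comes out as zero, as expected.

```python
# Sketch: generalized (Wirtinger) derivatives from real partial derivatives,
#   ∂f/∂x   = (∂f/∂x^R - j ∂f/∂x^I) / 2
#   ∂f/∂x^C = (∂f/∂x^R + j ∂f/∂x^I) / 2
# for a scalar complex x = x^R + j x^I, using central differences.

def wirtinger(f, x, h=1e-6):
    """Return (∂f/∂x, ∂f/∂x^C) estimated from partials along the real and imaginary axes."""
    df_dr = (f(x + h) - f(x - h)) / (2 * h)            # ∂f/∂x^R
    df_di = (f(x + 1j * h) - f(x - 1j * h)) / (2 * h)  # ∂f/∂x^I
    return (df_dr - 1j * df_di) / 2, (df_dr + 1j * df_di) / 2

x = 0.8 - 1.3j

# Non-analytic example: f = x^2 x^C, expected ∂f/∂x = 2 x x^C and ∂f/∂x^C = x^2.
d, dc = wirtinger(lambda z: z * z * z.conjugate(), x)
print(abs(d - 2 * x * x.conjugate()), abs(dc - x * x))   # both tiny

# Analytic example: f = x^3, expected ∂f/∂x = 3 x^2 and ∂f/∂x^C = 0.
d, dc = wirtinger(lambda z: z ** 3, x)
print(abs(d - 3 * x * x), abs(dc))                       # both tiny
```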

We have the following relationships for both analytic and non-analytic functions Y(X):

Complex Constrained Minimization

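Suppose f(X) is a scalar real function of a complex matrix (or vector), X, and G(X) is a complex-valued matrix (or vector or scalar) function of X. To minimize f(X) subject to G(X)=0, we use a complex matrix K of Lagrange multipliers, of the same size as G, and minimize $f(X) + \operatorname{tr}(K^H G(X)) + \operatorname{tr}(K^T G(X)^C)$ subject to G(X)=0. The two trace terms are complex conjugates of each other, so this expression is real. Hence we solve

$$\frac{\partial f}{\partial X} + \frac{\partial\,\operatorname{tr}(K^H G)}{\partial X} + \frac{\partial\,\operatorname{tr}(K^T G^C)}{\partial X} = 0^T$$

subject to G(X)=0. If G(X) is a vector, g(X), with multiplier vector k, this becomes

$$\frac{\partial f}{\partial X} + k^H\,\frac{\partial g}{\partial X} + k^T\,\frac{\partial g^C}{\partial X} = 0^T.$$

If G(X) is a scalar, g(X), with multiplier k, this becomes

$$\frac{\partial f}{\partial X} + k^C\,\frac{\partial g}{\partial X} + k\,\frac{\partial g^C}{\partial X} = 0^T.$$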
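Here is a worked instance of the scalar-constraint condition, with f and g chosen for illustration: minimize f(x) = x^H x subject to g(x) = a^H x − b = 0 for a complex vector x.

The derivatives are:
- Since f = Σ x_i x_i^C, ∂f/∂x = x^H.
- ∂g/∂x = a^H.
- Because g^C = a^T x^C − b^C does not depend on x, ∂g^C/∂x = 0.

The condition becomes x^H + k^C a^H = 0^T, so x = −k a. The constraint then gives k = −b/(a^H a) and x = b a/(a^H a), the familiar minimum-norm solution.

The pure-Python sketch checks the constraint and the stationarity condition. It also confirms that random feasible alternatives have a larger norm. The values of a and b are arbitrary test data.

```python
# Sketch: minimize x^H x subject to a^H x = b via the scalar complex Lagrange condition.
import random

def inner(u, v):
    """u^H v for complex lists u, v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

a = [1 + 2j, -0.5 + 0j, 3 - 1j]      # illustrative constraint vector
b = 2 - 1j                           # illustrative right-hand side

aa = inner(a, a).real                # a^H a, real and positive
k = -b / aa                          # Lagrange multiplier from the constraint
x_opt = [-k * ai for ai in a]        # stationary point x = -k a

# Constraint residual |a^H x - b| and largest element of x^H + k^C a^H.
print(abs(inner(a, x_opt) - b))
print(max(abs(xi.conjugate() + k.conjugate() * ai.conjugate()) for xi, ai in zip(x_opt, a)))

# Any other feasible point x_opt + w with a^H w = 0 has a larger norm.
random.seed(0)
for _ in range(5):
    v = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in a]
    c = inner(a, v) / aa
    w = [vi - c * ai for vi, ai in zip(v, a)]          # project v onto {w : a^H w = 0}
    x_other = [xi + wi for xi, wi in zip(x_opt, w)]
    assert abs(inner(a, x_other) - b) < 1e-9           # still feasible
    assert inner(x_other, x_other).real >= inner(x_opt, x_opt).real
```

Since x_opt is proportional to a and w is orthogonal to a, the squared norm of x_opt + w is ||x_opt||^2 + ||w||^2, which is why the final check always passes.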

Complex Gradient Vector