Matrix Calculus

In the main part of this page we express results in terms of differentials rather than derivatives for two reasons: they avoid notational disagreements and they cope easily with the complex case. In most cases, however, the differentials have been written in the form dY: = dY/dX dX: so that the corresponding derivative may be easily extracted.

Derivatives with respect to a real matrix

If X is p#q and Y is m#n, then dY: = dY/dX dX: where the derivative dY/dX is a large mn#pq matrix. If X and/or Y are column vectors or scalars, then the vectorization operator : has no effect and may be omitted. dY/dX is also called the Jacobian Matrix of Y: with respect to X: and, when it is square (mn = pq), det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration, where its absolute value is used: Integral(f(Y) dY:) = Integral(f(Y(X)) |det(dY/dX)| dX:).
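
As a minimal sketch of the definition above, the following NumPy snippet checks dY: = dY/dX dX: for the linear map Y = AXB. It relies on the standard identity (AXB): = (B^T kron A) X:, with : taken as column-stacking; the names `vec`, `A`, `B`, `J` and the sizes are illustrative choices, not part of this page.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (the ':' operator)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, p, q, n = 2, 3, 4, 5
A = rng.standard_normal((m, p))
B = rng.standard_normal((q, n))
X = rng.standard_normal((p, q))    # X is p#q
dX = rng.standard_normal((p, q))   # Y is linear in X, so any step size is exact

# For Y = A X B the derivative dY/dX is the mn#pq matrix B^T kron A.
J = np.kron(B.T, A)

dY = A @ (X + dX) @ B - A @ X @ B  # Y is m#n
jacobian_matches = np.allclose(vec(dY), J @ vec(dX))
print(J.shape, jacobian_matches)
```

Column-major stacking (`order="F"`) matters here: with row-major flattening the Kronecker factors would appear in the opposite order.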

Although they do not generalise so well, other authors use alternative notations for the cases when X and Y are both vectors or when one is a scalar. In particular:

For a scalar y and vector x, dy/dx is sometimes written as a column vector rather than a row vector.
For vectors y and x, dy/dx is sometimes transposed from the above definition, or else is written dy/dx^T to emphasise the correspondence between the columns of the derivative and those of x^T.
For a matrix Y with scalar x, and a scalar y with matrix X, dY/dx and dy/dX are often written as matrices rather than, as here, a column vector and row vector respectively. The matrix form may be converted to the form used here by appending : or :^T respectively.
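
The conversion in the last item can be sketched numerically. The example assumes the scalar function y = tr(AX), whose derivative is commonly written in matrix form as A^T (the same shape as X); appending :^T turns it into the row vector used here. The function and variable names are illustrative only.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (the ':' operator)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
p, q = 3, 4
A = rng.standard_normal((q, p))
X = rng.standard_normal((p, q))
dX = rng.standard_normal((p, q))

G = A.T                  # matrix-form derivative of y = tr(A X), shape p#q
row = vec(G)[None, :]    # G:^T, the 1#pq row vector dy/dX used on this page

# y is linear in X, so the differential is exact for any dX.
dy = np.trace(A @ (X + dX)) - np.trace(A @ X)
row_form_matches = np.allclose(dy, (row @ vec(dX))[0])
print(row.shape, row_form_matches)
```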

Derivatives with respect to a complex matrix

If X is complex, then dY: = dY/dX dX: holds in general only if Y(X) is an analytic function. This normally implies that Y(X) does not depend explicitly on X^C or X^H.

Even for non-analytic functions we can treat X and X^C (with X^H = (X^C)^T) as distinct variables and write uniquely dY: = ∂Y/∂X dX: + ∂Y/∂X^C dX^C: provided that Y is analytic with respect to X and X^C individually (or equivalently with respect to X_R and X_I individually). ∂Y/∂X is the Generalized Complex Derivative and ∂Y/∂X^C is the Complex Conjugate Derivative [R.4, R.9]; their properties are studied in Wirtinger Calculus.
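
A small numerical sketch may help here. It assumes the scalar, non-analytic function f(z) = z z^C = |z|^2, for which treating z and z^C as distinct variables gives ∂f/∂z = z^C and ∂f/∂z^C = z. The specific values of `z`, `dz` and `h` are arbitrary choices for illustration.

```python
import numpy as np

def f(z):
    return z * np.conj(z)    # |z|^2: depends explicitly on z^C, so not analytic

z = 0.7 - 1.2j
dz = 1e-6 * (0.3 + 0.5j)

dfdz = np.conj(z)            # derivative w.r.t. z with z^C held fixed
dfdzc = z                    # derivative w.r.t. z^C with z held fixed

# df = (df/dz) dz + (df/dz^C) dz^C holds to first order in dz.
actual = f(z + dz) - f(z)
predicted = dfdz * dz + dfdzc * np.conj(dz)
first_order_ok = abs(actual - predicted) <= 2 * abs(dz) ** 2

# No single derivative df/dz exists: the difference quotient depends on direction.
h = 1e-6
ratio_real = (f(z + h) - f(z)) / h
ratio_imag = (f(z + 1j * h) - f(z)) / (1j * h)
direction_dependent = not np.isclose(ratio_real, ratio_imag, atol=1e-3)
print(first_order_ok, direction_dependent)
```

The two difference quotients disagree (roughly 2 Re(z) against 2j Im(z) in this case), which is why the single-derivative form dY: = dY/dX dX: cannot be used for such functions.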

We define the generalized derivatives in terms of partial derivatives with respect to X_R and X_I:
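
In the usual Wirtinger convention these definitions take the following form. This is a sketch under the assumption that X = X_R + jX_I with j the imaginary unit; the symbol j and the factor of one half are the standard choices, and are consistent with the differential expansion in terms of dX: and dX^C: given above.

```latex
\frac{\partial Y}{\partial X}
  = \frac{1}{2}\left(\frac{\partial Y}{\partial X_R} - j\,\frac{\partial Y}{\partial X_I}\right),
\qquad
\frac{\partial Y}{\partial X^C}
  = \frac{1}{2}\left(\frac{\partial Y}{\partial X_R} + j\,\frac{\partial Y}{\partial X_I}\right)
```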

We have the following relationships for both analytic and non-analytic functions Y(X):
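
Identities of this kind can be sketched as below, assuming the usual Wirtinger convention in which ∂/∂X is one half of (∂/∂X_R - j ∂/∂X_I) and ∂/∂X^C is one half of (∂/∂X_R + j ∂/∂X_I), with j the imaginary unit. They follow by adding, subtracting and conjugating those two definitions, and are not necessarily the complete list.

```latex
\frac{\partial Y}{\partial X_R}
  = \frac{\partial Y}{\partial X} + \frac{\partial Y}{\partial X^C},
\qquad
\frac{\partial Y}{\partial X_I}
  = j\left(\frac{\partial Y}{\partial X} - \frac{\partial Y}{\partial X^C}\right),
\qquad
\frac{\partial Y^C}{\partial X}
  = \left(\frac{\partial Y}{\partial X^C}\right)^C
```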

Complex Constrained Minimization

Suppose f(X) is a scalar real function of a complex matrix (or vector), X, and G(X) is a complex-valued matrix (or vector or scalar) function of X. To minimize f(X) subject to G(X) = 0, we use complex Lagrange multipliers and minimize f(X) + tr(K^H G(X)) + tr(K^T G(X)^C) subject to G(X) = 0. Hence we solve ∂f/∂X + ∂tr(K^H G)/∂X + ∂tr(K^T G^C)/∂X = 0^T subject to G(X) = 0. If g(X) is a vector, this becomes ∂f/∂X + k^H ∂g/∂X + k^T ∂g^C/∂X = 0^T. If g(X) is a scalar, this becomes ∂f/∂X + k^C ∂g/∂X + k ∂g^C/∂X = 0^T.
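
As a worked sketch of the scalar-constraint case, consider minimizing f(x) = x^H x subject to g(x) = a^H x - 1 = 0; this problem is an illustrative assumption, not taken from this page. Here ∂f/∂x = x^H, ∂g/∂x = a^H and ∂g^C/∂x = 0^T, so the stationarity condition reduces to x^H + k^C a^H = 0^T, giving x = -k a and, from the constraint, k = -1/(a^H a). The snippet checks this solution numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

aHa = np.vdot(a, a).real          # a^H a (np.vdot conjugates its first argument)
k = -1.0 / aHa                    # Lagrange multiplier from the constraint
x = -k * a                        # candidate minimizer, equal to a / (a^H a)

constraint_residual = abs(np.vdot(a, x) - 1.0)
# Stationarity x^H + k^C a^H = 0^T, written out elementwise.
stationary = np.allclose(np.conj(x) + np.conj(k) * np.conj(a), 0)

def feasible_step(rng):
    """Random perturbation d with a^H d = 0, so x + d still satisfies g = 0."""
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return d - a * (np.vdot(a, d) / aHa)

f_min = np.vdot(x, x).real
never_lower = all(
    np.vdot(x + d, x + d).real >= f_min - 1e-12
    for d in (feasible_step(rng) for _ in range(100))
)
print(constraint_residual, stationary, never_lower)
```

Random feasible perturbations never reduce f below its value at x, which is consistent with x being the constrained minimum.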

Complex Gradient Vector

If f(X) is a real function of a complex matrix (or vector), X, the complex gradient vector is taken to be grad(f(X)) = 2 (∂f/∂X)^H, the normalisation consistent with the properties listed below.

grad(f(X)) is zero at an extreme value of f.
grad(f(X)) points in the direction of steepest slope of f(X).
The magnitude of the steepest slope is equal to |grad(f(X))|. Specifically, if g(X) = grad(f(X)), then lim_{a->0} a^-1 (f(X + a g(X)) - f(X)) = |g(X)|^2.
grad(f(X)) is normal to the surface f(X) = constant, which means that it can be used for gradient ascent/descent algorithms.
If f(X) = y^H y, then grad(f(X)) = 2 (∂y/∂X)^H y + 2 (∂y/∂X^C)^T y^C.
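
The slope property and the last formula can be checked with a short sketch. It assumes the least-squares residual y = Ax - b with a vector x, so that ∂y/∂x = A, ∂y/∂x^C = 0 and the formula gives grad(f) = 2 A^H y. The matrix sizes, the step `a` and the descent step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)

def f(x):
    y = A @ x - b
    return np.vdot(y, y).real          # f = y^H y, real-valued

def grad(x):
    # With y = A x - b: dy/dx = A and dy/dx^C = 0, so grad(f) = 2 A^H y.
    return 2 * A.conj().T @ (A @ x - b)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
g = grad(x)

# Slope along g: (f(x + a g) - f(x)) / a should approach |g|^2 as a -> 0.
a = 1e-7
slope = (f(x + a * g) - f(x)) / a
slope_ok = np.isclose(slope, np.vdot(g, g).real, rtol=1e-4)

# One gradient-descent step with a conservative step size reduces f.
step = 0.5 / np.linalg.norm(A, 2) ** 2
descent_decreases = f(x - step * g) < f(x)
print(slope_ok, descent_decreases)
```

The step size 0.5 / ||A||^2 (spectral norm) is chosen only because it guarantees a decrease for this quadratic f; other functions would need their own step-size rule.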