# Widely linear system of equations, revisited

It seems I just cannot stop talking about widely linear equations. Sorry for being to monotonous. My colleague Dominik recently made me aware of the fact that it is possible to solve the widely linear system of equations

$$\ma{A} \cdot \ma{x} + \ma{B} \cdot \ma{x}^* = \ma{c}$$

that I already visited in an earlier blog post in a much more convenient form. I was suggesting to stack the real and the imaginary parts of both $\ma{x}$ and $\ma{c}$ into long vectors which then transforms the entire system of $N$ complex-valued equations into a system of $2N$ real-valued equations, using real- and imaginary parts of $\ma{A}$ and $\ma{B}$ to construct the coefficients.

It is possible to completely avoid that. Simply conjugate the original equation, this gives you

$$\ma{A}^* \cdot \ma{x}^* + \ma{B}^* \cdot \ma{x} = \ma{c}^*.$$

Combining both equations in one we can write

$$\begin{bmatrix} \ma{A} & \ma{B} \\ \ma{B}^* & \ma{A}^* \end{bmatrix} \cdot \begin{bmatrix} \ma{x} \\ \ma{x}^* \end{bmatrix} = \begin{bmatrix} \ma{c} \\ \ma{c}^* \end{bmatrix}.$$

Therefore, provided the inverse exists, we have

$$\begin{bmatrix} \ma{x} \\ \ma{x}^* \end{bmatrix} = \begin{bmatrix} \ma{A} & \ma{B} \\ \ma{B}^* & \ma{A}^* \end{bmatrix}^{-1} \cdot \begin{bmatrix} \ma{c} \\ \ma{c}^* \end{bmatrix}.$$

This is of course redundant, since we only need to compute $\ma{x}$ (once) and not $\ma{x}$ and $\ma{x}^*$. Instead of simply ditching the last $N$ rows, we can avoid computing them altogether, by applying the rules for inverting two by two block matrices. The final solution then becomes

$$\ma{x} = \Big(\ma{A} – \ma{B} (\ma{A}^*)^{-1} \ma{B}^*\Big)^{-1} \cdot \ma{c} \; – \ma{A}^{-1} \ma{B} \Big(\ma{A}^* – \ma{B}^* \ma{A}^{-1} \ma{B}\Big)^{-1} \cdot \ma{c}^*.$$

So simple. Why didn’t I see it earlier?

# Widely Linear Least Squares

Together with my colleague Jens I recently stumbled across a widely linear estimation problem. More precisely, we were trying to estimate a vector $\ma{x} \in \compl^{N}$ such that $\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$ was as close as possible to a given $\ma{b} \in \compl^{M}$ for a given matrix $\ma{C} \in \compl^{M \times 2N}$. This is not a linear Least Squares problem (due to the non-linear complex conjugation operator), however, it is linear in real and imaginary parts of $\ma{x}$. Hence, we call it widely linear.

How to solve such problems? Before we can calculate a solution, we need an appropriate cost function. Let’s assume for now that $M\geq 2N$ so that we would expect the system of equations to be overdetermined. A proper cost function for $\ma{x}$ would then be

$$\ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix} – \ma{b} \right\|_2^2,$$

i.e., find the $\ma{x}$ that minimizes the Euclidean distance between $\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$ and $\ma{b}$. How to solve it? I was somehow reminded of a blog post I made earlier on solving widely linear systems of equations and argued that the same solution is applicable. You simply break $\ma{C}$ into $\ma{C} = \begin{bmatrix}\ma{A} & \ma{B}\end{bmatrix}$ where $\ma{A}, \ma{B} \in \compl^{M \times N}$ contain the first and the last $N$ columns of $\ma{C}$. Then you rewrite the whole system into a real-valued system of equations following the technique outlined in the earlier blog post. This gives you

$$\ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \begin{bmatrix} \ma{A}_{\rm R} + \ma{B}_{\rm R} & -\ma{A}_{\rm I} + \ma{B}_{\rm I} \\ \ma{A}_{\rm I} + \ma{B}_{\rm I} & \ma{A}_{\rm R} – \ma{B}_{\rm R} \end{bmatrix} \cdot \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix} – \begin{bmatrix}\ma{b}_{\rm R}\\ \ma{b}_{\rm I}\end{bmatrix} \right\|_2^2,$$

where $\ma{A} = \ma{A}_{\rm R}+\jmath \ma{A}_{\rm I}$,$\ma{B} = \ma{B}_{\rm R}+\jmath \ma{B}_{\rm I}$, $\ma{b} = \ma{b}_{\rm R}+\jmath \ma{b}_{\rm I}$, and $\ma{x} = \ma{x}_{\rm R}+\jmath \ma{x}_{\rm I}$. The Least Squares solution to this optimization problem is simple: it’s a regular real-valued system of equations with $2M$ equations and $2N$ unknowns so applying the pseudo-inverse of the block matrix provides the Least Squares solution in $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$. Also, the real-valued stacking we have applied did not change the norm since for any complex vector $\ma{x}$ we have $\left\|\ma{x}\right\|_2^2 = \left\|\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}\right\|_2^2$.

So it’s valid and it works. But it’s not very elegant. We have to introduce a lot of new quantities and write down the big block matrix. Is there a more elegant way? One that avoids stacking real and imaginary parts of $\ma{C}$ and $\ma{b}$?

To find it, we started rewriting $\begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$. Since $\ma{x} = \ma{x}_{\rm R}+\jmath \ma{x}_{\rm I}$ and $\ma{x}^* = \ma{x}_{\rm R}-\jmath \ma{x}_{\rm I}$ we can write

$$\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix} = \underbrace{\ma{C} \cdot \begin{bmatrix} \ma{I}_N & \jmath \ma{I}_N \\ \ma{I}_N & – \jmath \ma{I}_N \end{bmatrix}}_{\ma{\tilde{C}}} \cdot \begin{bmatrix}\ma{x}_{\rm R} \\ \ma{x}_{\rm I}\end{bmatrix}.$$

(As a side note: if we do decide to break $\ma{C}$ into $\begin{bmatrix}\ma{A} & \ma{B}\end{bmatrix}$ as above then we have $\ma{\tilde{C}} = \begin{bmatrix}\ma{A}+\ma{B} & \jmath \ma{A } – \jmath \ma{B}\end{bmatrix}$.)

Our optimization problem then takes the nice and simple form

$$\ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \ma{\tilde{C}} \cdot \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix} – \ma{b} \right\|_2^2.$$

One may be tempted to say that this shows the solution: simply apply the pseudo-inverse of $\ma{\tilde{C}}$. Unfortunately, it is not quite as simple. If we do this, the solution is in general a complex-valued vector. However, $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$ must be real-valued by construction. So how do we solve this optimization problem?

Well, we have a cost function $J(\ma{y}) = \left\|\ma{\tilde{C}} \cdot \ma{y} – \ma{b}\right\|_2^2$ that we want to minimize subject to the constraint that $\ma{y}$ is real-valued or equivalently $\ma{y} = \ma{y}^*$ or equivalently $\ma{y}-\ma{y}^* = \ma{0}$. For me this calls for the method of Lagrange multipliers! The Lagrangian for this constrained optimization problem then looks like this:

$$L(\ma{y},\ma{\lambda}) = \left\|\ma{\tilde{C}} \cdot \ma{y} – \ma{b}\right\|_2^2 + 2 \cdot {\rm Re}\{\ma{\lambda}^{\rm H} \cdot (\ma{y}-\ma{y}^*)\}.$$

I would like to spare the details of the derivation since it’s quite straight-forward: compute the gradients of $L$, set to zero, and solve for $\ma{y}$. The solution we obtain then provides the optimal $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$ directly. Jens did the calculations (thanks!) and got this result:

$$\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = \ma{\tilde{C}}^+ \cdot \ma{b} \; – \jmath \cdot \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\},$$

where $\ma{G} = \ma{\tilde{C}}^{\rm H}\ma{\tilde{C}}$ and it was assumed that $\ma{G}$ and  ${\rm Re}\left\{\ma{G}^{-1} \right\}$ are both invertible. One could also write it as

$$\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = {\rm Re}\{\ma{\tilde{C}}^+ \cdot \ma{b}\} \; + \left[ \ma{I} – \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} \right]\cdot \jmath \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\}.$$

This one is (sort of) instructive. If $\ma{G}$ is real then ${\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} = \ma{G}$ and hence the whole second part vanishes. In this case, the optimal solution is to take simply the real part of the LS solution (note that this happens if $\ma{\tilde{C}}$ is also real-valued in which case it would be more efficient to drop the imaginary part of $\ma{b}$ and then solve the real-valued system of equations directly). Otherwise, we need an additional term that depends on the imaginary part of the LS solution, somehow “projected” through this funny matrix $\ma{I} – \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1}$.

Although I kind of like this second solution better since it avoids the real-valued stacking and is partially instructive where real and imaginary parts go, it has at least one major drawback. For the first solution, we’ve reduced the problem onto a $2M$ by $2N$ system of real-valued linear equations. Applying the pseudo-inverse there we know exactly that it gives is the least squares solution if the equivalent $\ma{b}$ is not in the column space of the equivalent $\ma{C}$ (“no exact solution”) and it gives us the minimum norm solution if the equivalent $\ma{C}$ has a non-empty kernel, i.e, not full column rank (“infinitely many solutions”). Most importantly, it can do both at the same time.

This does not carry over to our complex solution. We derived it for the overdetermined Least Squares case, assuming full column rank. We can do a similar derivation for the case where the existence of infinitely many exact solutions is assumed and we seek the one with the smallest norm. However, the result is different and it is not easy to come up with one solution that accomplishes both at the same time. For those that are curious, the second case yields the solution

$$\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = {\rm Re}\{\ma{\tilde{C}}^+ \cdot \ma{b}\} \; + \left[ \ma{I} – \ma{P} \cdot {\rm Re}\left\{\ma{P} \right\}^{-1} \right]\cdot \jmath \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\},$$

where $\ma{P} = \ma{I} – \ma{\tilde{C}}^{\rm H} \cdot \left(\ma{\tilde{C}} \cdot \ma{\tilde{C}}^{\rm H}\right)^{-1} \ma{\tilde{C}}$.

# Fun with statistics – transformations of random variables part 2

I recently posted on how to find the distribution of functions of random variables, i.e., the distribution of $Y=g(X)$, where $X$ is a random variable with known distribution and $y=g(x)$ is some function.

As a natural extension of this concept we may ask ourselves what happens if we have two random variables involved. Let us start with one function of two random variables, i.e., given $X$ and $Y$ and knowing their joint PDF $f_{X,Y}(x,y)$ or their joint CDF $F_{X,Y}(x,y) = {\rm Pr}[X \leq x, Y \leq y]$ we would like to calculate the distribution of $Z = g(X,Y)$ where $z=g(x,y)$ is a function with two arguments, e.g., $z=x+y$.

Again, there are multiple ways of addressing this problem. A natural way would be to calculate the CDF of $Z$ directly, i.e., $F_Z(z) = {\rm Pr}[Z \leq z] = {\rm Pr}[g(X,Y) \leq z]$. In other words, we need to compute the probability of the event that relates to all realization of $X$ and $Y$ which satisfy $g(X,Y) \leq z$. This is easily done by integrating the joint PDF $f_{X,Y}(x,y)$ over all points in the set ${\mathcal D}_z$ which contains all points $(x,y)$ for which $g(x,y) \leq z$. Written out, we have

$$F_Z(z) = {\rm Pr}[Z \leq z] = {\rm Pr}[g(X,Y) \leq z] = \iint_{{\mathcal D}_z} f_{X,Y}(x,y) {\rm d}x {\rm d} y$$

Whether or not this approach is easy to follow depends on two things: (1) how easy it is to parametrize the set ${\mathcal D}_z$ and (2) how easy it is to integrate the joint PDF over ${\mathcal D}_z$.

Let us make an example considering the simple function $g(x,y) = x+y$. Then ${\mathcal D}_z$ contains all points $(x,y)$ for which $x+y \leq z$, i.e., $y\leq z-x$ or $x \leq z-y$. Geometrically, this is the set of points that are on the lower-left of a line with slope -1 and offset $z$, i.e., a line passing through $(z,0)$ and $(0,z)$. The integral over this set is relatively simple, as we can directly write it as

$$\displaystyle F_Z(z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-y} f_{X,Y}(x,y) {\rm d}x {\rm d} y = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y) {\rm d}y {\rm d} x$$.

Another example is $g(x,y) = \max(x,y)$. Since $\max(x,y) \leq z \Leftrightarrow ((x \leq z) \;\mbox{and} \; (y \leq z))$ we can argue

$$F_Z(z) = {\rm Pr}[\max(X,Y) \leq z] = \int_{-\infty}^z \int_{-\infty}^z f_{X,Y}(x,y) {\rm d}x {\rm d} y$$.

Geometrically, ${\mathcal D}_z$ contains all points on the “lower left” of the point $(z,z)$, i.e., the intersection of the half-planes below $y=z$ and left of $x=z$.

The second extension is to consider two functions of two random variables. Say we are given the distribution of $X$ and $Y$ via their joint PDF, we would like to find the joint PDF of $Z=g(X,Y)$ and $W=h(X,Y)$. There is a closed-form expresion for it as a direct extension of the closed-form expression for the PDF of one function of one random variable. It reads as

$$f_{Z,W}(z,w) = \sum_{i=1}^N \frac{1}{|{\rm det} \ma{J}(x_i,y_i)|} f_{X,Y}(x_i,y_i)$$,

where $(x_i,y_i)$ are all solutions to the system of equations $z=g(x,y)$, $w=h(x,y)$ in $x$ and $y$. Here, $\ma{J}$ is the Jacobian matrix given by

$$\ma{J} = \left[ \begin{array}{cc} \frac{\partial g}{x} & \frac{\partial g}{y} \\ \frac{\partial h}{x} & \frac{\partial h}{y} \end{array}\right]$$.

Moreover, the term ${\rm det} \ma{J}(x_i,y_i)$ means that we first compute the determinant of the Jacobian matrix (in terms of $x$ and $y$) and then insert $x_i(z,w)$ and $y_i(z,w)$.

Example? How about the joint distribution of $X+Y$ and $X-Y$? In this case, solving for $z=x+y$ and $w=x-y$ for $x$ and $y$ is simple, we have one solution given by $x_1 = (z+w)/2$ and $y_1 = (z-w)/2$. The Jacobian matrix is given by

$$\ma{J} = \left[ \begin{array}{cc} 1& 1 \\ 1 & -1 \end{array}\right]$$
and hence its determinant is $-2$ everywhere. This gives the solution for $f_{Z,W}(z,w)$ in the form

$f_{Z,W}(z,w) = \frac{1}{2} f_{X,Y}((z+w)/2,(z-w)/2)$.

As in the 1-D case, this direct solution depends heavily on our ability to solve the given functions for $x$ and $y$, which may be tedious for complicated functions.

Interestingly, the first case where we considered one function of one random variable can be solved also via this approach, simply by creating another “auxiliary” variable, and then marginalizing over it. So once we have $Z=g(X,Y)$ we make up another $W=h(X,Y)$, choosing it such that the remaining calculations are simple. For instance, for $g(x,y) = x+y$ we may choose $h(x,y) = y$. Then, the Jacobian matrix becomes

$$\ma{J} = \left[ \begin{array}{cc} 1& 1 \\ 0 & 1 \end{array}\right]$$

with determinant one. Moreover, we have $x_1 = z-w$ and $y_1 = w$. Therefore, we get

$$f_{Z,W}(z,w) = f_{X,Y}(z-w,w)$$.

The final step is marginalizing out the auxiliary $W$ which gives

$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X,Y}(z-w,w) {\rm d}w.$$

Looks much like a convolution integral, doesn’t it? In fact, if $X$ and $Y$ are statistically independent, we can write $f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ and hence we obtain

$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X}(z-w)\cdot f_Y(w) {\rm d}w = f_X(x) * f_Y(y),$$

where $*$ denotes convolution. This shows very easily that the PDF of the sum of two random variables is the convolution of their PDFs, if they are statistically independent.

# Widely linear systems of equations

I would strongly assume that this must exist already somewhere but I couldn’t find the solution so I thought it would be interesting to post it here. The closest to this I could find is widely linear estimation (e.g., Picinbono, TSP, 1995), but it’s not quite the same.

Consider the following widely linear system of equations:

$$\ma{A} \cdot \ma{x} + \ma{B} \cdot \ma{x}^* = \ma{c},$$

where $\ma{A},\ma{B} \in \compl^{N \times N}$ are square invertible matrices and $\ma{x}, \ma{c} \in \compl^{N \times 1}$ vectors of corresponding dimension. We would like to find the vector $\ma{x}$ which satisfies this equation given $\ma{A}$, $\ma{B}$, and $\ma{c}$, if it exists.

This is a system of equations but it is not linear in $\ma{x}$. It is widely linear though and this implies that it is linear in  ${\rm Re}\{\ma{x}\}$ and in ${\rm Im}\{\ma{x}\}$. Therefore, it can easily be rewritten as a set of linear equations by introducing the real and imaginary parts of all quantities, i.e., $\ma{A} = \ma{A}_{\rm R} + \jmath \ma{A}_{\rm I}$, $\ma{B} = \ma{B}_{\rm R} + \jmath \ma{B}_{\rm I}$, $\ma{c} = \ma{c}_{\rm R} + \jmath \ma{c}_{\rm I}$, and $\ma{x} = \ma{x}_{\rm R} + \jmath \ma{x}_{\rm I}$. Inserting this into the widely linear system of equations and separating real and imaginary parts we obtain

$$\left( \left[ \ma{A}_{\rm R} + \ma{B}_{\rm R} \right] \cdot \ma{x}_{\rm R} + \left[ -\ma{A}_{\rm I} + \ma{B}_{\rm I} \right] \cdot \ma{x}_{\rm I}\right) + \jmath \cdot \left( \left[ \ma{A}_{\rm I} + \ma{B}_{\rm I} \right] \cdot \ma{x}_{\rm R} + \left[ \ma{A}_{\rm R} – \ma{B}_{\rm R} \right] \cdot \ma{x}_{\rm I}\right) = \ma{c}_{\rm R} + \jmath \ma{c}_{\rm I}.$$

As both sides of the equations are complex numbers, they are equal only if the real parts are equal and the imaginary parts are equal. Hence we have two real-valued systems of equation, which we can write in one larger system:

$$\begin{bmatrix} \ma{A}_{\rm R} + \ma{B}_{\rm R} & -\ma{A}_{\rm I} + \ma{B}_{\rm I} \\ \ma{A}_{\rm I} + \ma{B}_{\rm I} & \ma{A}_{\rm R} – \ma{B}_{\rm R} \end{bmatrix} \cdot \begin{bmatrix} \ma{x}_{\rm R} \\ \ma{x}_{\rm I} \end{bmatrix} = \begin{bmatrix} \ma{c}_{\rm R} \\ \ma{c}_{\rm I} \end{bmatrix}.$$
$$\ma{\tilde{C}} \cdot \begin{bmatrix} \ma{x}_{\rm R} \\ \ma{x}_{\rm I} \end{bmatrix} = \begin{bmatrix} \ma{c}_{\rm R} \\ \ma{c}_{\rm I} \end{bmatrix}.$$

Consequently we have exactly one solution in $\ma{x}$ if and only if the block matrix $\ma{\tilde{C}}$ is non-singular, which implies additional conditions on $\ma{A}$ and $\ma{B}$. For instance, a sufficient (but not necessary) condition is that $\ma{A}_{\rm R} + \ma{B}_{\rm R}$ and its Schur complement $\ma{A}_{\rm R} – \ma{B}_{\rm R} – (\ma{A}_{\rm I} + \ma{B}_{\rm I}) \cdot (\ma{A}_{\rm R} + \ma{B}_{\rm R})^{-1} \cdot (-\ma{A}_{\rm I} + \ma{B}_{\rm I})$ are both invertible.

*Update* We found a simpler, solution, please read the follow-up blog post on the closed-form solution that avoids real-valued stacking.

# … and a simpler proof for it

A much simpler proof for the more generic algebraic rule on manipulating quadratic forms I posted recently just became apparent to me. All you need to do is to first show that

$${\rm trace}\{\ma{A}^{\rm H} \cdot \ma{B}\} = {\rm vec}\{\ma{A}\}^{\rm H} \cdot {\rm vec}\{\ma{B}\},$$

which is fairly easy because both represent a short-hand notation for $\sum_k\sum_\ell\sum_m a_{\ell,k}^* b_{\ell,m}$. Once you have this rule set you simply break ${\rm trace}\left\{\ma{A} \cdot \ma{X} \cdot \ma{R} \cdot \ma{X}^{\rm H} \cdot \ma{B}^{\rm H}\right\}$ into ${\rm vec}\left\{\left(\ma{A} \cdot \ma{X}\right)^{\rm H}\right\}^{\rm H} \cdot {\rm vec}\left\{\ma{R} \cdot \ma{X}^{\rm H} \cdot \ma{B}^{\rm H}\right\}$ and apply the rules for generic linear forms twice.