Together with my colleague Jens I recently stumbled across a widely linear estimation problem. More precisely, we were trying to estimate a vector $\ma{x} \in \compl^{N}$ such that $\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$ was as close as possible to a given $\ma{b} \in \compl^{M}$ for a given matrix $\ma{C} \in \compl^{M \times 2N}$. This is not a linear Least Squares problem (due to the non-linear complex conjugation operator), however, it is linear in real and imaginary parts of $\ma{x}$. Hence, we call it widely linear.
How to solve such problems? Before we can calculate a solution, we need an appropriate cost function. Let’s assume for now that $M\geq 2N$ so that we would expect the system of equations to be overdetermined. A proper cost function for $\ma{x}$ would then be
$$ \ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix} – \ma{b} \right\|_2^2,$$
i.e., find the $\ma{x}$ that minimizes the Euclidean distance between $\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$ and $\ma{b}$. How to solve it? I was somehow reminded of a blog post I made earlier on solving widely linear systems of equations and argued that the same solution is applicable. You simply break $\ma{C}$ into $\ma{C} = \begin{bmatrix}\ma{A} & \ma{B}\end{bmatrix}$ where $\ma{A}, \ma{B} \in \compl^{M \times N}$ contain the first and the last $N$ columns of $\ma{C}$. Then you rewrite the whole system into a real-valued system of equations following the technique outlined in the earlier blog post. This gives you
$$ \ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \begin{bmatrix}
\ma{A}_{\rm R} + \ma{B}_{\rm R} & -\ma{A}_{\rm I} + \ma{B}_{\rm I} \\
\ma{A}_{\rm I} + \ma{B}_{\rm I} & \ma{A}_{\rm R} – \ma{B}_{\rm R}
\end{bmatrix} \cdot \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix} – \begin{bmatrix}\ma{b}_{\rm R}\\ \ma{b}_{\rm I}\end{bmatrix} \right\|_2^2,$$
where $\ma{A} = \ma{A}_{\rm R}+\jmath \ma{A}_{\rm I}$,$\ma{B} = \ma{B}_{\rm R}+\jmath \ma{B}_{\rm I}$, $\ma{b} = \ma{b}_{\rm R}+\jmath \ma{b}_{\rm I}$, and $\ma{x} = \ma{x}_{\rm R}+\jmath \ma{x}_{\rm I}$. The Least Squares solution to this optimization problem is simple: it’s a regular real-valued system of equations with $2M$ equations and $2N$ unknowns so applying the pseudo-inverse of the block matrix provides the Least Squares solution in $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$. Also, the real-valued stacking we have applied did not change the norm since for any complex vector $\ma{x}$ we have $\left\|\ma{x}\right\|_2^2 = \left\|\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}\right\|_2^2$.
So it’s valid and it works. But it’s not very elegant. We have to introduce a lot of new quantities and write down the big block matrix. Is there a more elegant way? One that avoids stacking real and imaginary parts of $\ma{C}$ and $\ma{b}$?
To find it, we started rewriting $\begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$. Since $\ma{x} = \ma{x}_{\rm R}+\jmath \ma{x}_{\rm I}$ and $\ma{x}^* = \ma{x}_{\rm R}-\jmath \ma{x}_{\rm I}$ we can write
$$ \ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix} = \underbrace{\ma{C} \cdot \begin{bmatrix} \ma{I}_N & \jmath \ma{I}_N \\ \ma{I}_N & – \jmath \ma{I}_N \end{bmatrix}}_{\ma{\tilde{C}}} \cdot \begin{bmatrix}\ma{x}_{\rm R} \\ \ma{x}_{\rm I}\end{bmatrix}.$$
(As a side note: if we do decide to break $\ma{C}$ into $\begin{bmatrix}\ma{A} & \ma{B}\end{bmatrix}$ as above then we have $\ma{\tilde{C}} = \begin{bmatrix}\ma{A}+\ma{B} & \jmath \ma{A } – \jmath \ma{B}\end{bmatrix}$.)
Our optimization problem then takes the nice and simple form
$$ \ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \ma{\tilde{C}} \cdot \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix} – \ma{b} \right\|_2^2.$$
One may be tempted to say that this shows the solution: simply apply the pseudo-inverse of $\ma{\tilde{C}}$. Unfortunately, it is not quite as simple. If we do this, the solution is in general a complex-valued vector. However, $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$ must be real-valued by construction. So how do we solve this optimization problem?
Well, we have a cost function $J(\ma{y}) = \left\|\ma{\tilde{C}} \cdot \ma{y} – \ma{b}\right\|_2^2$ that we want to minimize subject to the constraint that $\ma{y}$ is real-valued or equivalently $\ma{y} = \ma{y}^*$ or equivalently $\ma{y}-\ma{y}^* = \ma{0}$. For me this calls for the method of Lagrange multipliers! The Lagrangian for this constrained optimization problem then looks like this:
$$L(\ma{y},\ma{\lambda}) = \left\|\ma{\tilde{C}} \cdot \ma{y} – \ma{b}\right\|_2^2 + 2 \cdot {\rm Re}\{\ma{\lambda}^{\rm H} \cdot (\ma{y}-\ma{y}^*)\}.$$
I would like to spare the details of the derivation since it’s quite straight-forward: compute the gradients of $L$, set to zero, and solve for $\ma{y}$. The solution we obtain then provides the optimal $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$ directly. Jens did the calculations (thanks!) and got this result:
$$ \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = \ma{\tilde{C}}^+ \cdot \ma{b} \; – \jmath \cdot \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\},$$
where $\ma{G} = \ma{\tilde{C}}^{\rm H}\ma{\tilde{C}}$ and it was assumed that $\ma{G}$ and ${\rm Re}\left\{\ma{G}^{-1} \right\}$ are both invertible. One could also write it as
$$ \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = {\rm Re}\{\ma{\tilde{C}}^+ \cdot \ma{b}\} \; + \left[ \ma{I} – \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} \right]\cdot \jmath \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\}.$$
This one is (sort of) instructive. If $\ma{G}$ is real then ${\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} = \ma{G}$ and hence the whole second part vanishes. In this case, the optimal solution is to take simply the real part of the LS solution (note that this happens if $\ma{\tilde{C}}$ is also real-valued in which case it would be more efficient to drop the imaginary part of $\ma{b}$ and then solve the real-valued system of equations directly). Otherwise, we need an additional term that depends on the imaginary part of the LS solution, somehow “projected” through this funny matrix $\ma{I} – \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1}$.
Although I kind of like this second solution better since it avoids the real-valued stacking and is partially instructive where real and imaginary parts go, it has at least one major drawback. For the first solution, we’ve reduced the problem onto a $2M$ by $2N$ system of real-valued linear equations. Applying the pseudo-inverse there we know exactly that it gives is the least squares solution if the equivalent $\ma{b}$ is not in the column space of the equivalent $\ma{C}$ (“no exact solution”) and it gives us the minimum norm solution if the equivalent $\ma{C}$ has a non-empty kernel, i.e, not full column rank (“infinitely many solutions”). Most importantly, it can do both at the same time.
This does not carry over to our complex solution. We derived it for the overdetermined Least Squares case, assuming full column rank. We can do a similar derivation for the case where the existence of infinitely many exact solutions is assumed and we seek the one with the smallest norm. However, the result is different and it is not easy to come up with one solution that accomplishes both at the same time. For those that are curious, the second case yields the solution
$$ \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = {\rm Re}\{\ma{\tilde{C}}^+ \cdot \ma{b}\} \; + \left[ \ma{I} – \ma{P} \cdot {\rm Re}\left\{\ma{P} \right\}^{-1} \right]\cdot \jmath \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\},$$
where $\ma{P} = \ma{I} – \ma{\tilde{C}}^{\rm H} \cdot \left(\ma{\tilde{C}} \cdot \ma{\tilde{C}}^{\rm H}\right)^{-1} \ma{\tilde{C}}$.