Extended Trigonometric Pythagorean Identities: The Proof

I recently posted on Extended Trigonometric Pythagorean Identities and a “supercharged” version of them, which allows us to simplify certain sums of shifted sine functions raised to even integer powers. In particular, the claim was that

$$\sum_{n=0}^{N-1} \sin^2\left(x+n\frac{\pi}{N}\right) = \frac{N}{2}$$

or, more generally, for any positive integer $k$ and $N\geq k+1$:

$$\sum_{n=0}^{N-1} \sin^{2k}\left(x+n\frac{\pi}{N}\right) =
N \frac{(2k)!}{(k!)^2 2^{2k}} = \frac{N}{\sqrt{\pi}} \frac{\Gamma(k+1/2)}{\Gamma(k+1)} $$
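
Before going through the proof, here is a quick numerical sanity check. This is just a sketch of mine using numpy; the values of $k$, $N$ and $x$ are picked arbitrarily.

```python
# Sanity check (sketch): sum_{n=0}^{N-1} sin^{2k}(x + n*pi/N) == N * C(2k,k) / 4^k
import numpy as np
from math import comb

rng = np.random.default_rng(0)
for k in (1, 2, 3, 5):
    N = int(k + 1 + rng.integers(0, 5))    # any N >= k+1 should work
    x = rng.uniform(0, 2 * np.pi)          # the starting point x is arbitrary
    n = np.arange(N)
    lhs = np.sum(np.sin(x + n * np.pi / N) ** (2 * k))
    rhs = N * comb(2 * k, k) / 4 ** k      # = N * (2k)! / ((k!)^2 * 2^(2k))
    print(k, N, np.isclose(lhs, rhs))      # -> True for every k
```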

However, in the initial TPI post, I struggled a bit with the proof. It took a while to realize that it is actually quite simple using (a) the geometric series, i.e.,

$$ \sum_{n=0}^{N-1} q^n = \frac{1-q^N}{1-q}$$

for any $q \in \compl$ with $q \neq 1$, and (b) the fact that $\sin(x) = \frac{1}{2\jmath}\left({\rm e}^{\jmath x} - {\rm e}^{-\jmath x}\right)$. Let us use these two facts to prove the following Lemma:

Lemma 1: For any $M \in \mathbb{Z}$, we have

$$ \sum_{n=0}^{N-1}{\rm e}^{\jmath \frac{2\pi}{N} \cdot M \cdot n} =
\begin{cases}
N & M = k \cdot N, \, k\in \mathbb{Z} \\
0 & {\rm otherwise}.
\end{cases}$$

Proof: The proof is simple once we realize that the sum is in fact a geometric series with $q={\rm e}^{\jmath \frac{2\pi}{N} \cdot M}$. Obviously, if $M$ is an integer multiple of $N$ we have $q=1$ and the sum is equal to $N$. Otherwise, by the above identity, the series is equal to

$$\frac{1-{\rm e}^{\jmath 2\pi \cdot M}}{1-{\rm e}^{\jmath \frac{2\pi}{N} \cdot M}}
= \frac{{\rm e}^{\jmath \pi \cdot M}}{{\rm e}^{\jmath \frac{\pi}{N} \cdot M}}\cdot
\frac{{\rm e}^{-\jmath \pi \cdot M}-{\rm e}^{\jmath \pi \cdot M}}{{\rm e}^{-\jmath \frac{\pi}{N} \cdot M}-{\rm e}^{\jmath \frac{\pi}{N} \cdot M}}= {\rm e}^{\jmath \frac{\pi}{N}M(N-1)} \cdot \frac{\sin(\pi M)}{\sin(\frac{\pi}{N}M)} = 0,
$$
since the numerator is zero (and the denominator is not).

Piece of cake.
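
A quick numerical check of Lemma 1 never hurts. Here is a small numpy sketch of mine (the values of $N$ and $M$ are arbitrary):

```python
# Lemma 1 (sketch): the sum is N when M is a multiple of N, and 0 otherwise.
import numpy as np

N = 7
n = np.arange(N)
for M in (-3, -1, 0, 2, 7, 14):
    s = np.sum(np.exp(1j * 2 * np.pi / N * M * n))
    expected = N if M % N == 0 else 0
    print(M, np.round(s, 12), "expected:", expected)
```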

Now, we can proceed to prove the TPI for $2k=2$:

\begin{align}
\sum_{n=0}^{N-1} \sin^2\left(x+n\frac{\pi}{N}\right)
& = \sum_{n=0}^{N-1}  \left(\frac{1}{2\jmath} {\rm e}^{\jmath(x+n\frac{\pi}{N})}- \frac{1}{2\jmath} {\rm e}^{-\jmath(x+n\frac{\pi}{N})}\right)^2\\
& = -\frac{1}{4}\sum_{n=0}^{N-1} {\rm e}^{2\jmath(x+n\frac{\pi}{N})} + {\rm e}^{-2\jmath(x+n\frac{\pi}{N})} - 2 {\rm e}^{\jmath(x+n\frac{\pi}{N})-\jmath(x+n\frac{\pi}{N})} \\
& = -\frac{1}{4} {\rm e}^{2\jmath x} \sum_{n=0}^{N-1} {\rm e}^{\jmath 2n\frac{\pi}{N}}
-\frac{1}{4} {\rm e}^{-2\jmath x} \sum_{n=0}^{N-1} {\rm e}^{-\jmath 2n\frac{\pi}{N}}
-\frac{1}{4} \sum_{n=0}^{N-1} (-2) \\
& = -\frac{1}{4} \cdot 0 -\frac{1}{4}\cdot 0 -\frac{1}{4}\cdot(-2N) = \frac{N}{2}
\end{align}

where we have used Lemma 1 for $M=1$ and $M=-1$ (for the Lemma to give zero, $M=\pm 1$ must not be a multiple of $N$, which requires $N\geq 2$). Isn’t that simple? I wonder why I didn’t see it earlier.

Even better, this technique allows us to extend the proof to other values of $k$. Let’s try $2k=4$:

\begin{align}
\sum_{n=0}^{N-1} \sin^4\left(x+n\frac{\pi}{N}\right)
& = \sum_{n=0}^{N-1}  \left(\frac{1}{2\jmath} {\rm e}^{\jmath(x+n\frac{\pi}{N})}- \frac{1}{2\jmath} {\rm e}^{-\jmath(x+n\frac{\pi}{N})}\right)^4\\
& = \frac{1}{16} \sum_{n=0}^{N-1} {\rm e}^{4 \jmath(x+n\frac{\pi}{N})}
-4 {\rm e}^{3\jmath(x+n\frac{\pi}{N}) -\jmath(x+n\frac{\pi}{N})}
+6 {\rm e}^{2\jmath(x+n\frac{\pi}{N}) -2 \jmath(x+n\frac{\pi}{N})} \\ &
-4 {\rm e}^{\jmath(x+n\frac{\pi}{N}) -3\jmath(x+n\frac{\pi}{N})}
+ {\rm e}^{-4 \jmath(x+n\frac{\pi}{N})} \\
& = \frac{1}{16}{\rm e}^{4 \jmath x} \sum_{n=0}^{N-1} {\rm e}^{4 \jmath n\frac{\pi}{N}}
- \frac{4}{16}{\rm e}^{2\jmath x} \sum_{n=0}^{N-1}{\rm e}^{2 \jmath n\frac{\pi}{N}}
+ \frac{6}{16}\sum_{n=0}^{N-1} {\rm e}^{0} \\ &
- \frac{4}{16}{\rm e}^{-2\jmath x} \sum_{n=0}^{N-1}{\rm e}^{-2 \jmath n\frac{\pi}{N}}
+ \frac{1}{16}{\rm e}^{-4 \jmath x} \sum_{n=0}^{N-1} {\rm e}^{-4 \jmath n\frac{\pi}{N}} \\
& = \frac{1}{16} \cdot 0 - \frac{4}{16} \cdot 0 + \frac{6}{16} \cdot N - \frac{4}{16} \cdot 0 + \frac{1}{16} \cdot 0 = \frac{3}{8} N,
\end{align}

where this time we have used Lemma 1 for $M=2, 1, -1, -2$ and thus need $N\geq 3$. This already shows the pattern: the binomial expansion creates mostly terms with vanishing sums, except for the “middle” term where the exponents cancel. The coefficient in front of this term is $\frac{1}{2^{2k}}$ (from the $\frac{1}{2\jmath}$ that comes with expanding the sine; the sign $(-1)^k$ of $(2\jmath)^{-2k}$ cancels against the sign of the middle binomial term) times ${2k \choose k}$ (from the binomial expansion). This explains where the constant $N \cdot \frac{(2k)!}{(k!)^2 2^{2k}}$ comes from.

Formally, we have

\begin{align}
\sum_{n=0}^{N-1} \sin^{2k}\left(x+n\frac{\pi}{N}\right) & =
\sum_{n=0}^{N-1} \left( \frac{1}{2\jmath}{\rm e}^{\jmath(x+n\frac{\pi}{N})}
- \frac{1}{2\jmath}{\rm e}^{-\jmath(x+n\frac{\pi}{N})} \right)^{2k} \\
& =
\sum_{n=0}^{N-1} \frac{1}{(2\jmath)^{2k}} \sum_{\ell = 0}^{2k} {2k \choose \ell} (-1)^\ell
{\rm e}^{(2k-\ell) \jmath (x+n\frac{\pi}{N})}{\rm e}^{-\ell \jmath (x+n\frac{\pi}{N})} \\
& =
\sum_{n=0}^{N-1} \frac{(-1)^k}{2^{2k}} \sum_{\ell = 0}^{2k} (-1)^\ell {2k \choose \ell}
{\rm e}^{2(k-\ell) \jmath (x+n\frac{\pi}{N})}\\
& = \frac{(-1)^k}{2^{2k}} \sum_{\ell = 0}^{2k} (-1)^\ell {2k \choose \ell}
{\rm e}^{2(k-\ell) \jmath x} \sum_{n=0}^{N-1}
{\rm e}^{2(k-\ell) \jmath n\frac{\pi}{N}} \\
& = \frac{1}{2^{2k}} {2k \choose k} N,
\end{align}

where in the last step all terms $\ell \neq k$ vanish due to Lemma 1, applied for $M=k, k-1, \ldots, 1, -1, \ldots, -k+1, -k$; the remaining middle term $\ell=k$ contributes $(-1)^k {2k \choose k} N$, and its sign cancels against the $(-1)^k$ in front of the sum. For the Lemma to apply, none of these $M$ may be a multiple of $N$, which requires $N\geq k+1$.

Et voilà.

Trigonometric Pythagorean Identity, supercharged

You know how they say good things always come back? Well, I recently stumbled over something that reminded me a lot of a post I had made about generalizations of the Trigonometric Pythagorean Identity. In short, the well-known identity $\sin^2(x)+\cos^2(x)=1$ can be generalized to a sum of $N\geq 2$ terms that are uniformly shifted copies of the sine function, which yields

$$\sum_{n=0}^{N-1} \sin^2\left(x+n\frac{\pi}{N}\right) = \frac{N}{2}$$

Well, I now came across a sum of fourth powers of shifted sine functions and much to my initial surprise, these admit very similar simplifications. In fact, it works for any integer power! Look at what I found:

$$\sum_{n=0}^{N-1} \sin^{2k}\left(x+n\frac{\pi}{N}\right) =
N \frac{(2k)!}{(k!)^2 2^{2k}} = \frac{N}{\sqrt{\pi}} \frac{\Gamma(k+1/2)}{\Gamma(k+1)}, \quad k \in \mathbb{N}
$$

for $N\geq k+1$. Isn’t this fascinating? No matter to which even power we raise the shifted sines, their sum is always a constant of the form $c_k \cdot N$, and these constants $c_k$ can be computed analytically.

Here are some examples: sum of squares ($k=1$): $c_1 = 1/2$; sum of fourth powers ($k=2$): $c_2=3/8$; $k=3$: $c_3 = 5/16$; $k=4$: $c_4 = 35/128$; and so on. Moreover, I think I know how to prove even the “supercharged” version of the TPI for any $k$. I’ll write about it in another blog post.

*Update*: And here is the proof!

*Update2*: Just another small addition: The coefficients $c_k$ satisfy an interesting recurrence relation since you can compute $c_k$ as

$$c_k = \frac{2k-1}{2k} c_{k-1}$$

with $c_0 = 1$. This makes clear what structure they actually have: $c_1 = \frac{1}{2}$, $c_2 = \frac{1 \cdot 3}{2 \cdot 4}$, $c_3 = \frac{1 \cdot 3 \cdot 5}{2\cdot 4\cdot 6}$, and so on. Each $c_k$ is equal to the product of the first $k$ odd numbers divided by the product of the first $k$ even numbers. If you like, you can write them with double factorials via

$$c_k = \frac{(2k-1)!!}{(2k)!!}.$$
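
If you want to convince yourself that the closed form, the recurrence, and the double-factorial form all describe the same numbers, here is a small sketch of mine using exact rational arithmetic:

```python
# Sketch: compare c_k = C(2k,k)/4^k, the recurrence c_k = (2k-1)/(2k) c_{k-1},
# and the double-factorial form (2k-1)!!/(2k)!! using exact fractions.
from math import comb
from fractions import Fraction

c_rec = Fraction(1)                               # c_0 = 1
for k in range(1, 6):
    c_closed = Fraction(comb(2 * k, k), 4 ** k)   # binom(2k, k) / 2^(2k)
    c_rec *= Fraction(2 * k - 1, 2 * k)           # one recurrence step
    odd = even = 1
    for i in range(1, k + 1):
        odd *= 2 * i - 1                          # (2k-1)!! = 1*3*...*(2k-1)
        even *= 2 * i                             # (2k)!!   = 2*4*...*(2k)
    print(k, c_closed, c_closed == c_rec == Fraction(odd, even))
```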

They are closely related to Wallis’ integrals $W_n$. Maybe this is not too surprising, since the latter are defined as

$$W_n = \int_{0}^{\frac{\pi}{2}} \cos^n(x) {\rm d}x$$

and satisfy

$$W_{2n} = \frac{(2n-1)!!}{(2n)!!} \frac{\pi}{2}.$$

So what the generalized TPI above shows is that the average of the $N$ equispaced samples of $\sin^{2k}$ equals the average of $\cos^{2k}$ over a quarter period, i.e., the normalized Wallis integral $\frac{2}{\pi}W_{2k}$, no matter where we start the sum. Kind of cool I think.
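
Here is a small numerical sketch of mine (using numpy and scipy's quad) that compares the discrete average against the normalized Wallis integral for an arbitrary starting point $x$:

```python
# Sketch: the average of the N equispaced samples of sin^{2k} equals
# (2/pi) * W_{2k}, the average of cos^{2k} over [0, pi/2], for any x.
import numpy as np
from scipy.integrate import quad

k, N, x = 3, 8, 0.37                       # arbitrary choices with N >= k+1
n = np.arange(N)
discrete_avg = np.mean(np.sin(x + n * np.pi / N) ** (2 * k))
W_2k, _ = quad(lambda t: np.cos(t) ** (2 * k), 0, np.pi / 2)   # Wallis integral
print(discrete_avg, 2 / np.pi * W_2k)      # both equal c_3 = 5/16 = 0.3125
```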

Trigonometric Pythagorean Identity, extended

Here is another curious and funny identity that has popped up many times and that I never quite got around to proving:

$$ \sum_{n=0}^{N-1} \sin^2\left( x+ \frac{\pi}{N} n \right) = \frac{N}{2}, \quad \forall N \in \mathbb{N}, \, N \geq 2$$

The Trigonometric Pythagorean Identity

What’s the connection to good old Pythagoras you ask? Well, let us look at it for $N=2$:

$$\sin^2 \left( x\right) + \sin^2\left( x + \frac{\pi}{2}\right) = 1$$

Since we can think of the cosine as a shifted version of the sine, where the shift is $\pi/2$, i.e., $\cos(x) = \sin(x+\pi/2)$, the last identity reads in fact as:

$$\sin^2 \left( x\right) + \cos^2\left( x \right) = 1$$

This may look familiar; in fact, it is often referred to as the Trigonometric Pythagorean Identity (I’ll call it TPI for short). Here is a graphical illustration: on the left-hand side you see the sine and the cosine function, on the right-hand side their squares and the sum of the squares. They add up to a constant.

[Figures: sin_cos, sin_cos_square]

Analytical arguments aside, what is the link to Pythagoras’ Theorem? Well, there is a nice geometric illustration of this. Consider a right triangle with one of its angles $\theta$, the sides being $a$ (the adjacent), $b$ (the opposite) and $c$ (the hypotenuse). From Pythagoras’ Theorem we know that $a^2 + b^2 = c^2$. At the same time, we know from the definition of the trigonometric functions that the sine and the cosine are the ratios of the opposite and the adjacent side, respectively, to the hypotenuse. In equations:

$$ \sin(\theta) = \frac{b}{c} \\ \cos(\theta) = \frac{a}{c}$$

Squaring and summing gives:

$$ \sin^2(\theta) + \cos^2(\theta) = \frac{b^2}{c^2} + \frac{a^2}{c^2} = \frac{a^2+b^2}{c^2} = 1$$

which explains the link to Pythagoras (a more complete version of the argument that also works for obtuse angles can be found on Wikipedia).

Extended TPI for N>2

Okay, that was very elementary and maybe even boring stuff. The cool thing is that this works in a much more general setting. I have not found a real name for the identity above, but you could see it as a straightforward generalization of the TPI. Instead of considering a squared sine and one shifted copy (i.e., summing the shifts 0 and $\pi/2$), you can consider a sum of $N$ shifted copies that still add up to a constant, provided that the shifts are chosen uniformly (i.e., all integer multiples of $\pi/N$).

Here is a graphical illustration for $N=3$ and $N=4$:

[Figures: sin_cos_square_N3, sin_cos_square_N4]

Cool stuff, right? Well I think so. However, the actual question of this blog post is, how do we prove this?

Proof idea 1: Decomposing into interleaved sums

This is the most straightforward and I like it a lot for its elegance. It tries to reduce an arbitrary $N$ to elementary values of $N$ that have been proven once. It works very nicely for even $N$, where we can reduce to $N=2$. The drawback is that we have to do all other $N$ separately.

Let us start with $N=4$ for simplicity. For $N=4$ the sum looks like this:

$$S = \sin^2\left(x\right) + \sin^2\left(x + \frac{\pi}{4}\right) + \sin^2\left(x + \frac{\pi}{2}\right) + \sin^2\left(x + \frac{3\pi}{4}\right)$$

Actually, the first and the third term look familiar. They represent the case $N=2$ which we assume proven. Let us rearrange the sum by exchanging second and third term:

$$S = \underbrace{\sin^2\left(x\right)  + \sin^2\left(x + \frac{\pi}{2}\right)}_{f_1(x)} + \underbrace{\sin^2\left(x + \frac{\pi}{4}\right) + \sin^2\left(x + \frac{3\pi}{4}\right)}_{f_2(x)}$$

Now, for $f_1(x)$ we know that $f_1(x) = 1, \; \forall x$ from the TPI. The key is to realize that $f_2(x)$ is actually a shifted copy of $f_1(x)$, i.e., $f_2(x) = f_1(x + \pi/4)$, since both arguments just get an “extra” $\pi/4$ added to them (remember that $\frac{3\pi}{4}$ is nothing but $\pi/2 + \pi/4$).

Now it’s easy, we have:

$$ S = f_1(x) + f_2(x) = f_1(x) + f_1(x+\pi/4) = 1 + 1 = 2,$$

since $f_1(x) = 1, \; \forall x$.

 

Does this work for any even $N$? Sure! We always have $f_1(x)$ in there (the terms $n=0$ and $n=N/2$), and all remaining terms pair up into shifted copies of it: the terms $n$ and $n+N/2$ together give $f_1\left(x+\frac{\pi}{N}n\right)$. Mathematically:

\begin{align} & \sum_{n=0}^{N-1} \sin^2\left( x+ \frac{\pi}{N} n \right) \\
= & \sum_{n=0}^{N/2-1} \sin^2\left( x+ \frac{\pi}{N} n \right) + \sin^2\left( x+ \frac{\pi}{2} + \frac{\pi}{N} n \right) \\
= & \sum_{n=0}^{N/2-1} \sin^2\left( x+ \frac{\pi}{N} n \right) + \cos^2\left( x+ \frac{\pi}{N} n \right) \\
= & \sum_{n=0}^{N/2-1} f_1\left(x + \frac{\pi}{N} n\right) \\
= & \sum_{n=0}^{N/2-1} 1 = \frac{N}{2}. \end{align}

Piece of cake.
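
Here is a tiny numpy sketch of mine illustrating the pairing for an even $N$ (the values of $N$ and $x$ are arbitrary): each pair of terms $n$ and $n+N/2$ contributes exactly 1.

```python
# Sketch: for even N, terms n and n + N/2 combine to sin^2 + cos^2 = 1,
# so the N/2 pairs sum to N/2.
import numpy as np

N, x = 10, 1.234                           # any even N and any x
n = np.arange(N // 2)
pairs = (np.sin(x + n * np.pi / N) ** 2
         + np.sin(x + np.pi / 2 + n * np.pi / N) ** 2)
print(pairs)                               # every pair equals 1
print(pairs.sum())                         # N/2 = 5
```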

This is nice but here is the trouble: for odd $N$ it’s not so easy. It already breaks down for $N=3$ since $f_1(x)$ does not even appear there. Even if you prove it for $N=3$, you can use this to prove it for all integer multiples of $3$ but it will not work for $N=5$. In other words, you would have to do this for every prime number separately. Not very elegant.

In essence, in case you were thinking of induction here, this will not work. Increasing $N$ by one changes all terms in the sum so you cannot go from $N$ to $N+1$ (only from $N$ to $2N$).

So, to prove it for arbitrary $N$ we need something else. What is there?

Proof idea 2: Power series

Wikipedia suggests that the original TPI can be proven using the definition of sine and cosine via their power series, i.e.,

\begin{align}
\sin(x) & = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!} x^{2n+1} \\
\cos(x) & = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!} x^{2n}
\end{align}

Based on these, you can find the power series for $\sin^2$ and $\cos^2$, then  sum them and find out that $n=0$ gives $0 + 1 = 1$ and all $n>0$ give $(1-1)^{2n} = 0$.
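
I have not carried this out by hand, but sympy happily confirms the claim for the original TPI, at least up to a finite order (a sketch of mine):

```python
# Sketch: the Taylor coefficients of sin^2 + cos^2 vanish beyond the constant 1.
import sympy as sp

x = sp.symbols('x')
print(sp.series(sp.sin(x) ** 2 + sp.cos(x) ** 2, x, 0, 12))   # -> 1 + O(x**12)
```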

Oookay, to be honest I haven’t tried this approach for the extended TPI. I can imagine it would work, but it would surely be a lot of work and far from an elegant proof. In high school, we referred to these as the “crowbar-type” solutions.

It’s one of the typical points where your favorite textbook might simply say: the details of the proof are left as an exercise to the reader. Go ahead and give it a shot if you like… 😉

Proof idea 3: Differentiating

As usual, writing things down makes the whole thing much clearer and, more often than not, gives new ideas. I just thought about differentiating the whole sum. If we can show that the derivative is zero everywhere, we at least know that the sum is constant. We still have to calculate this constant for a complete answer, of course.

To calculate the derivative, let us look at $\sin^2(x)$ for a second. Its derivative satisfies a cute relation:

$$\frac{{\rm d} \sin^2(x)}{{\rm d} x} = 2 \sin(x) \cos(x) = \sin(2x).$$

Why cute? Well, look at it for a second. If I differentiate $x^2$ I get $2x$: the 2 comes down from the exponent to the front. If I differentiate $\sin^2(x)$ I get $\sin(2x)$: the 2 goes inside the sine function. But please be careful here, this is not a general rule! It is a pure coincidence, like a little joke made by nature. In fact, it neither works for higher powers nor does it work for the cosine, which satisfies

$$\frac{{\rm d} \cos^2(x)}{{\rm d} x} = -2 \sin(x) \cos(x) = -\sin(2x).$$

Now let us turn to our sum $S$. We get

$$ \frac{{\rm d} S}{{\rm d} x} = \sum_{n=0}^{N-1} \sin\left( 2x+ \frac{2\pi}{N} n \right),$$

which we claim should be zero everywhere. This is an interesting one: if you overlap a sine with all its copies shifted by the uniform offsets $\frac{2\pi}{N}n$, $n=0,\ldots,N-1$, the net result is zero. This is something I’ve seen many times before, especially in its relation to complex numbers.
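
A quick numpy check (my own sketch, with arbitrary $N$ and test points) confirms this numerically:

```python
# Sketch: summing a sine over N uniform shifts covering a full period gives zero.
import numpy as np

N = 5
x = np.linspace(0, 2 * np.pi, 7)                        # a few test points
n = np.arange(N)[:, None]
print(np.sin(2 * x + 2 * np.pi / N * n).sum(axis=0))    # ~ 0 everywhere
```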

Here is one geometrical interpretation. From Euler’s identity, we have

$${\rm e}^{\jmath x} = \cos(x) + \jmath \sin(x).$$

Therefore, if we write the following sum

$$ \sum_{n=0}^{N-1} {\rm e}^{\jmath (x+\frac{2\pi}{N}n)}
= \sum_{n=0}^{N-1} \cos\left(x+\frac{2\pi}{N}n\right) + \jmath \sum_{n=0}^{N-1} \sin\left(x+\frac{2\pi}{N}n\right),$$

then our desired sum appears in the imaginary part (except for a factor of 2 on the $x$, but this does not matter since the argument applies for any $x$, so it can be rescaled). I am claiming the whole (complex) sum is zero, which shows that our desired sum is zero. Why is that so? Let’s factor out ${\rm e}^{\jmath x}$:

$$ \sum_{n=0}^{N-1} {\rm e}^{\jmath (x+\frac{2\pi}{N}n)} =  {\rm e}^{\jmath x} \underbrace{\sum_{n=0}^{N-1} {\rm e}^{\jmath \frac{2\pi}{N}n}}_z.$$

Now, what is $z$? Imagine the complex plane. Every term in the sum for $z$ is a point on the unit circle, spaced evenly. You can imagine the sum like this: take arrows pointing to all of these points on the unit circle and add them all together. What net direction remains when we have gone evenly in all the directions there are? Well, it cannot be anything other than zero. Here is a visualization of it:

[Figure: illustration_vectors_sum_zero]

The whole thing is completely symmetric so there cannot be any preferred direction. We must have $z=0$, which proves the whole thing.

Of course this is, while being illustrative, a little hand-wavy. There are many more ways to prove $z=0$ rigorously. I’m not sure which is the most concise and elegant. Right now, I keep thinking about DFTs here, since ${\rm e}^{\jmath \frac{2\pi}{N}n}$ is actually the second row of the $N \times N$ DFT matrix, which makes $z=0$ equivalent to showing that the first and the second row of a DFT matrix are mutually orthogonal. At the same time, ${\rm e}^{\jmath \frac{2\pi}{N}n}$ is the DFT of the sequence $\delta[n-1] = [0,1,0,\ldots,0]$, in which case $z=0$ follows from the fact that the sum of the DFT coefficients is always $N$ times the first value of the sequence, which is zero here. But that’s just what comes to my mind first, I guess there are easier ways to prove this.

So there you have it: if we differentiate the sum, it’s easy to show that it must be constant. However, this is not the complete proof of the extended TPI yet; we still have to compute the constant. I’m not sure that is easy to do for an arbitrary $N$, is it?

To conclude, so far

So, what do we have? A quite elegant proof for even $N$ that does not generalize to odd $N$, a possibly tedious proof idea based on power series, and a sort-of elegant proof that the sum is constant for arbitrary $N$, which still lacks the computation of this constant.

Now I’m curious: do you have better ideas to prove it? Have you come across this or similar identities? Do you know if it has a name and where it has been proven first?

 

*Update*: I have found a generalization to sine functions raised to any even power $2k$ and also an elegant proof, which also proves the TPI discussed in this post. Check it out if you like.

Widely linear system of equations, revisited

It seems I just cannot stop talking about widely linear equations. Sorry for being so monotonous. My colleague Dominik recently made me aware of the fact that it is possible to solve the widely linear system of equations

$$ \ma{A} \cdot \ma{x} + \ma{B} \cdot \ma{x}^* = \ma{c}$$

that I already visited in an earlier blog post in a much more convenient way. Back then, I suggested stacking the real and the imaginary parts of both $\ma{x}$ and $\ma{c}$ into long vectors, which transforms the entire system of $N$ complex-valued equations into a system of $2N$ real-valued equations, using the real and imaginary parts of $\ma{A}$ and $\ma{B}$ to construct the coefficients.

It is possible to avoid that completely. Simply conjugate the original equation, which gives you

$$ \ma{A}^* \cdot \ma{x}^* + \ma{B}^* \cdot \ma{x} = \ma{c}^*.$$

Combining both equations in one we can write

$$ \begin{bmatrix} \ma{A} & \ma{B} \\ \ma{B}^* & \ma{A}^* \end{bmatrix} \cdot \begin{bmatrix} \ma{x}  \\ \ma{x}^*  \end{bmatrix} = \begin{bmatrix} \ma{c}  \\ \ma{c}^*  \end{bmatrix}. $$

Therefore, provided the inverse exists, we have

$$ \begin{bmatrix} \ma{x}  \\ \ma{x}^*  \end{bmatrix} = \begin{bmatrix} \ma{A} & \ma{B} \\ \ma{B}^* & \ma{A}^* \end{bmatrix}^{-1} \cdot  \begin{bmatrix} \ma{c}  \\ \ma{c}^*  \end{bmatrix}.$$

This is of course redundant, since we only need to compute $\ma{x}$ (once) and not $\ma{x}$ and $\ma{x}^*$. Instead of simply ditching the last $N$ rows, we can avoid computing them altogether, by applying the rules for inverting two by two block matrices. The final solution then becomes

$$ \ma{x} = \Big(\ma{A} - \ma{B} (\ma{A}^*)^{-1} \ma{B}^*\Big)^{-1} \cdot \ma{c} \; - \ma{A}^{-1} \ma{B} \Big(\ma{A}^* - \ma{B}^* \ma{A}^{-1} \ma{B}\Big)^{-1} \cdot \ma{c}^*.$$

So simple. Why didn’t I see it earlier?
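
For the record, here is a small numpy sketch of mine that solves a random widely linear system once via the stacked complex system and once via the block-inverse formula above, and checks that both agree (the matrix size and random seed are of course arbitrary):

```python
# Sketch: solve A x + B x* = c via the stacked system and via the block formula.
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
c = rng.normal(size=N) + 1j * rng.normal(size=N)

# stacked system [[A, B], [B*, A*]] [x; x*] = [c; c*]
M = np.block([[A, B], [B.conj(), A.conj()]])
x_stacked = np.linalg.solve(M, np.concatenate([c, c.conj()]))[:N]

# block-inverse formula from above
Ai = np.linalg.inv(A)
x_formula = (np.linalg.solve(A - B @ np.linalg.inv(A.conj()) @ B.conj(), c)
             - Ai @ B @ np.linalg.solve(A.conj() - B.conj() @ Ai @ B, c.conj()))

print(np.allclose(x_stacked, x_formula))                        # True
print(np.linalg.norm(A @ x_stacked + B @ x_stacked.conj() - c)) # ~ 0
```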

Widely Linear Least Squares

Together with my colleague Jens I recently stumbled across a widely linear estimation problem. More precisely, we were trying to estimate a vector $\ma{x} \in \compl^{N}$ such that $\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$ was as close as possible to a given $\ma{b} \in \compl^{M}$ for a given matrix $\ma{C} \in \compl^{M \times 2N}$. This is not a linear Least Squares problem (due to the non-linear complex conjugation operator); however, it is linear in the real and imaginary parts of $\ma{x}$. Hence, we call it widely linear.

How to solve such problems? Before we can calculate a solution, we need an appropriate cost function. Let’s assume for now that $M\geq 2N$ so that we would expect the system of equations to be overdetermined. A proper cost function for $\ma{x}$ would then be

$$ \ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix} – \ma{b} \right\|_2^2,$$

i.e., find the $\ma{x}$ that minimizes the Euclidean distance between $\ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$ and $\ma{b}$. How to solve it? I was somehow reminded of a blog post I made earlier on solving widely linear systems of equations and argued that the same solution is applicable. You simply break $\ma{C}$ into $\ma{C} = \begin{bmatrix}\ma{A} & \ma{B}\end{bmatrix}$ where $\ma{A}, \ma{B} \in \compl^{M \times N}$ contain the first and the last $N$ columns of $\ma{C}$. Then you rewrite the whole system into a real-valued system of equations following the technique outlined in the earlier blog post. This gives you

$$ \ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \begin{bmatrix}
\ma{A}_{\rm R} + \ma{B}_{\rm R} & -\ma{A}_{\rm I} + \ma{B}_{\rm I}  \\
\ma{A}_{\rm I} + \ma{B}_{\rm I} & \ma{A}_{\rm R} - \ma{B}_{\rm R}
\end{bmatrix} \cdot \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix} – \begin{bmatrix}\ma{b}_{\rm R}\\ \ma{b}_{\rm I}\end{bmatrix} \right\|_2^2,$$

where $\ma{A} = \ma{A}_{\rm R}+\jmath \ma{A}_{\rm I}$, $\ma{B} = \ma{B}_{\rm R}+\jmath \ma{B}_{\rm I}$, $\ma{b} = \ma{b}_{\rm R}+\jmath \ma{b}_{\rm I}$, and $\ma{x} = \ma{x}_{\rm R}+\jmath \ma{x}_{\rm I}$. The Least Squares solution to this optimization problem is simple: it’s a regular real-valued system of equations with $2M$ equations and $2N$ unknowns, so applying the pseudo-inverse of the block matrix provides the Least Squares solution in $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$. Also, the real-valued stacking we have applied does not change the norm, since for any complex vector $\ma{x}$ we have $\left\|\ma{x}\right\|_2^2 = \left\|\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}\right\|_2^2$.
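
As a reference, here is a small numpy sketch of mine that implements exactly this real-valued stacking and solves it with an ordinary least squares call (problem dimensions and data are arbitrary):

```python
# Sketch: widely linear LS via the 2M x 2N real-valued stacking.
import numpy as np

rng = np.random.default_rng(2)
M, N = 10, 3
A = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
B = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))
b = rng.normal(size=M) + 1j * rng.normal(size=M)

block = np.block([[A.real + B.real, -A.imag + B.imag],
                  [A.imag + B.imag,  A.real - B.real]])
rhs = np.concatenate([b.real, b.imag])
sol, *_ = np.linalg.lstsq(block, rhs, rcond=None)
x = sol[:N] + 1j * sol[N:]
print(np.linalg.norm(A @ x + B @ x.conj() - b))   # the residual being minimized
```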

So it’s valid and it works. But it’s not very elegant. We have to introduce a lot of new quantities and write down the big block matrix. Is there a more elegant way? One that avoids stacking real and imaginary parts of $\ma{C}$ and $\ma{b}$?

To find it, we started rewriting $\begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix}$. Since $\ma{x} = \ma{x}_{\rm R}+\jmath \ma{x}_{\rm I}$ and $\ma{x}^* = \ma{x}_{\rm R}-\jmath \ma{x}_{\rm I}$ we can write

$$ \ma{C} \cdot \begin{bmatrix}\ma{x}\\ \ma{x}^*\end{bmatrix} = \underbrace{\ma{C} \cdot \begin{bmatrix} \ma{I}_N & \jmath \ma{I}_N \\ \ma{I}_N & – \jmath \ma{I}_N \end{bmatrix}}_{\ma{\tilde{C}}} \cdot \begin{bmatrix}\ma{x}_{\rm R} \\ \ma{x}_{\rm I}\end{bmatrix}.$$

(As a side note: if we do decide to break $\ma{C}$ into $\begin{bmatrix}\ma{A} & \ma{B}\end{bmatrix}$ as above, then we have $\ma{\tilde{C}} = \begin{bmatrix}\ma{A}+\ma{B} & \jmath \ma{A} - \jmath \ma{B}\end{bmatrix}$.)

Our optimization problem then takes the nice and simple form

$$ \ma{x}_{\rm opt} = \arg \min_{\ma{x}} \left\| \ma{\tilde{C}} \cdot \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix} – \ma{b} \right\|_2^2.$$

One may be tempted to say that this shows the solution: simply apply the pseudo-inverse of $\ma{\tilde{C}}$. Unfortunately, it is not quite as simple. If we do this, the solution is in general a complex-valued vector. However, $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$ must be real-valued by construction. So how do we solve this optimization problem?

Well, we have a cost function $J(\ma{y}) = \left\|\ma{\tilde{C}} \cdot \ma{y} - \ma{b}\right\|_2^2$ that we want to minimize subject to the constraint that $\ma{y}$ is real-valued or equivalently $\ma{y} = \ma{y}^*$ or equivalently $\ma{y}-\ma{y}^* = \ma{0}$. For me this calls for the method of Lagrange multipliers! The Lagrangian for this constrained optimization problem then looks like this:

$$L(\ma{y},\ma{\lambda}) = \left\|\ma{\tilde{C}} \cdot \ma{y} - \ma{b}\right\|_2^2 + 2 \cdot {\rm Re}\{\ma{\lambda}^{\rm H} \cdot (\ma{y}-\ma{y}^*)\}.$$

I will spare you the details of the derivation since it is quite straightforward: compute the gradients of $L$, set them to zero, and solve for $\ma{y}$. The solution we obtain then provides the optimal $\begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}$ directly. Jens did the calculations (thanks!) and got this result:

$$ \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = \ma{\tilde{C}}^+ \cdot \ma{b} \; - \jmath \cdot \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\},$$

where $\ma{G} = \ma{\tilde{C}}^{\rm H}\ma{\tilde{C}}$ and it was assumed that $\ma{G}$ and  ${\rm Re}\left\{\ma{G}^{-1} \right\}$ are both invertible. One could also write it as

$$ \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = {\rm Re}\{\ma{\tilde{C}}^+ \cdot \ma{b}\} \; + \left[ \ma{I} - \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} \right]\cdot \jmath \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\}.$$

This one is (sort of) instructive. If $\ma{G}$ is real, then ${\rm Re}\left\{\ma{G}^{-1} \right\}^{-1} = \ma{G}$ and hence the whole second part vanishes. In this case, the optimal solution is simply to take the real part of the LS solution (note that this happens if $\ma{\tilde{C}}$ is real-valued, in which case it would be more efficient to drop the imaginary part of $\ma{b}$ and then solve the real-valued system of equations directly). Otherwise, we need an additional term that depends on the imaginary part of the LS solution, somehow “projected” through this funny matrix $\ma{I} - \ma{G}^{-1} \cdot {\rm Re}\left\{\ma{G}^{-1} \right\}^{-1}$.
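
To gain some confidence in the formula, here is a numpy sketch of mine that compares it against the real-valued stacked least squares solution from above for a random overdetermined problem (dimensions and seed are again arbitrary):

```python
# Sketch: compare the closed-form solution with the real-valued stacked LS.
import numpy as np

rng = np.random.default_rng(3)
M, N = 12, 3
C = rng.normal(size=(M, 2 * N)) + 1j * rng.normal(size=(M, 2 * N))
b = rng.normal(size=M) + 1j * rng.normal(size=M)

# reference: the real-valued stacking from the beginning of the post
A, B = C[:, :N], C[:, N:]
block = np.block([[A.real + B.real, -A.imag + B.imag],
                  [A.imag + B.imag,  A.real - B.real]])
ref, *_ = np.linalg.lstsq(block, np.concatenate([b.real, b.imag]), rcond=None)

# the closed form with Ctilde = C @ [[I, jI], [I, -jI]]
I = np.eye(N)
Ct = C @ np.block([[I, 1j * I], [I, -1j * I]])
p = np.linalg.pinv(Ct) @ b                  # unconstrained LS solution
G = Ct.conj().T @ Ct
y = p - 1j * np.linalg.inv(G) @ np.linalg.inv(np.real(np.linalg.inv(G))) @ p.imag
print(np.allclose(y.imag, 0), np.allclose(ref, y.real))   # real-valued and matching
```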

Although I kind of like this second solution better, since it avoids the real-valued stacking and is partially instructive about where the real and imaginary parts go, it has at least one major drawback. For the first solution, we have reduced the problem to a $2M \times 2N$ system of real-valued linear equations. Applying the pseudo-inverse there, we know exactly what it gives us: the least squares solution if the equivalent $\ma{b}$ is not in the column space of the equivalent $\ma{C}$ (“no exact solution”), and the minimum norm solution if the equivalent $\ma{C}$ has a non-trivial kernel, i.e., does not have full column rank (“infinitely many solutions”). Most importantly, it can do both at the same time.

This does not carry over to our complex solution. We derived it for the overdetermined Least Squares case, assuming full column rank. We can do a similar derivation for the case where the existence of infinitely many exact solutions is assumed and we seek the one with the smallest norm. However, the result is different, and it is not easy to come up with one solution that accomplishes both at the same time. For those who are curious, the second case yields the solution

$$ \begin{bmatrix}\ma{x}_{\rm R}\\ \ma{x}_{\rm I}\end{bmatrix}_{\rm opt} = {\rm Re}\{\ma{\tilde{C}}^+ \cdot \ma{b}\} \; + \left[ \ma{I} - \ma{P} \cdot {\rm Re}\left\{\ma{P} \right\}^{-1} \right]\cdot \jmath \cdot {\rm Im}\{\ma{\tilde{C}}^+ \cdot \ma{b}\},$$

where $\ma{P} = \ma{I} - \ma{\tilde{C}}^{\rm H} \cdot \left(\ma{\tilde{C}} \cdot \ma{\tilde{C}}^{\rm H}\right)^{-1} \ma{\tilde{C}}$.