Fun with statistics – transformations of random variables part 2

I recently posted on how to find the distribution of functions of random variables, i.e., the distribution of $Y=g(X)$, where $X$ is a random variable with known distribution and $y=g(x)$ is some function.

 

As a natural extension of this concept we may ask ourselves what happens if we have two random variables involved. Let us start with one function of two random variables, i.e., given $X$ and $Y$ and knowing their joint PDF $f_{X,Y}(x,y)$ or their joint CDF $F_{X,Y}(x,y) = {\rm Pr}[X \leq x, Y \leq y]$ we would like to calculate the distribution of $Z = g(X,Y)$ where $z=g(x,y)$ is a function with two arguments, e.g., $z=x+y$.

Again, there are multiple ways of addressing this problem. A natural way would be to calculate the CDF of $Z$ directly, i.e., $F_Z(z) = {\rm Pr}[Z \leq z] = {\rm Pr}[g(X,Y) \leq z]$. In other words, we need to compute the probability of the event comprising all realizations of $X$ and $Y$ which satisfy $g(X,Y) \leq z$. This is easily done by integrating the joint PDF $f_{X,Y}(x,y)$ over the set ${\mathcal D}_z$ which contains all points $(x,y)$ for which $g(x,y) \leq z$. Written out, we have

$$F_Z(z) = {\rm Pr}[Z \leq z] = {\rm Pr}[g(X,Y) \leq z] = \iint_{{\mathcal D}_z} f_{X,Y}(x,y) {\rm d}x {\rm d} y$$

Whether or not this approach is easy to follow depends on two things: (1) how easy it is to parametrize the set ${\mathcal D}_z$ and (2) how easy it is to integrate the joint PDF over ${\mathcal D}_z$.

Let us consider an example with the simple function $g(x,y) = x+y$. Then ${\mathcal D}_z$ contains all points $(x,y)$ for which $x+y \leq z$, i.e., $y\leq z-x$ or $x \leq z-y$. Geometrically, this is the set of points to the lower left of a line with slope $-1$ and offset $z$, i.e., a line passing through $(z,0)$ and $(0,z)$. The integral over this set is relatively simple, as we can directly write it as

$$\displaystyle F_Z(z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-y} f_{X,Y}(x,y) {\rm d}x {\rm d} y = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y) {\rm d}y {\rm d} x$$.
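
As a quick sanity check, here is a minimal Monte Carlo sketch in Python/NumPy. It assumes, purely for illustration, that $X$ and $Y$ are independent standard normal variables, in which case $Z = X+Y \sim {\mathcal N}(0,2)$ and the closed-form answer is $F_Z(z) = \Phi(z/\sqrt{2})$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # samples of X (assumed standard normal)
y = rng.standard_normal(1_000_000)   # samples of Y, independent of X

z = 0.7
# Empirical estimate of F_Z(z) = Pr[X + Y <= z], i.e., the probability mass of D_z
F_empirical = np.mean(x + y <= z)
# Closed form for this particular choice: X + Y ~ N(0, 2), hence F_Z(z) = Phi(z / sqrt(2))
F_closed = norm.cdf(z / np.sqrt(2))
print(F_empirical, F_closed)   # the two numbers should agree to about three decimals
```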

Another example is $g(x,y) = \max(x,y)$. Since $\max(x,y) \leq z \Leftrightarrow ((x \leq  z) \;\mbox{and} \; (y \leq z))$ we can argue

$$F_Z(z) = {\rm Pr}[\max(X,Y) \leq z] = \int_{-\infty}^z \int_{-\infty}^z f_{X,Y}(x,y) {\rm d}x {\rm d} y$$.

Geometrically, ${\mathcal D}_z$ contains all points on the “lower left” of the point $(z,z)$, i.e., the intersection of the half-planes below $y=z$ and left of $x=z$.
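
As a minimal numerical check, assume for illustration that $X$ and $Y$ are independent and uniform on $[0,1]$; then the integral over ${\mathcal D}_z$ factors into $F_X(z)\cdot F_Y(z) = z^2$ for $0 \leq z \leq 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=1_000_000)   # X ~ uniform(0, 1), assumed for illustration
y = rng.uniform(size=1_000_000)   # Y ~ uniform(0, 1), independent of X

for z in (0.2, 0.5, 0.9):
    # Empirical Pr[max(X, Y) <= z], i.e., both X <= z and Y <= z
    F_empirical = np.mean(np.maximum(x, y) <= z)
    # Integrating the joint PDF over the square (-inf, z] x (-inf, z] gives z^2 here
    print(z, F_empirical, z**2)
```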

 

The second extension is to consider two functions of two random variables. Say we are given the distribution of $X$ and $Y$ via their joint PDF; we would like to find the joint PDF of $Z=g(X,Y)$ and $W=h(X,Y)$. There is a closed-form expression for it as a direct extension of the closed-form expression for the PDF of one function of one random variable. It reads as

$$f_{Z,W}(z,w) = \sum_{i=1}^N \frac{1}{|{\rm det} \ma{J}(x_i,y_i)|} f_{X,Y}(x_i,y_i)$$,

where $(x_i,y_i)$ are all solutions to the system of equations $z=g(x,y)$, $w=h(x,y)$ in $x$ and $y$. Here, $\ma{J}$ is the Jacobian matrix given by

$$ \ma{J} = \left[ \begin{array}{cc}
\frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \\
\frac{\partial h}{\partial x} & \frac{\partial h}{\partial y}
\end{array}\right]$$.

Moreover, the term ${\rm det} \ma{J}(x_i,y_i)$ means that we first compute the determinant of the Jacobian matrix (in terms of $x$ and $y$) and then insert $x_i(z,w)$ and $y_i(z,w)$.

 

Example? How about the joint distribution of $X+Y$ and $X-Y$? In this case, solving $z=x+y$ and $w=x-y$ for $x$ and $y$ is simple: we have one solution, given by $x_1 = (z+w)/2$ and $y_1 = (z-w)/2$. The Jacobian matrix is given by

$$ \ma{J} = \left[ \begin{array}{cc}
1& 1 \\ 1 & -1
\end{array}\right]$$
and hence its determinant is $-2$ everywhere. This gives the solution for $f_{Z,W}(z,w)$ in the form

$$f_{Z,W}(z,w) = \frac{1}{2} f_{X,Y}((z+w)/2,(z-w)/2)$$.

As in the 1-D case, this direct solution depends heavily on our ability to solve the given functions for $x$ and $y$, which may be tedious for complicated functions.
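
To see the formula in action, here is a small numerical sketch assuming $X$ and $Y$ are independent standard normals. In that case $Z=X+Y$ and $W=X-Y$ are independent ${\mathcal N}(0,2)$ variables, so the output of the formula can be compared against the product of the two Gaussian marginals:

```python
import numpy as np
from scipy.stats import norm

def f_XY(x, y):
    # Joint PDF of two independent standard normal variables (illustrative assumption)
    return norm.pdf(x) * norm.pdf(y)

def f_ZW(z, w):
    # Formula from above: single solution x1 = (z+w)/2, y1 = (z-w)/2, |det J| = 2
    return 0.5 * f_XY((z + w) / 2, (z - w) / 2)

# For independent standard normals, Z = X+Y and W = X-Y are independent N(0, 2),
# so the joint PDF must factor into the two N(0, 2) marginals.
z, w = 0.8, -1.3
print(f_ZW(z, w))
print(norm.pdf(z, scale=np.sqrt(2)) * norm.pdf(w, scale=np.sqrt(2)))   # same value
```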

 

Interestingly, the first case, where we considered a single function of two random variables, can also be solved via this approach, simply by creating another “auxiliary” variable and then marginalizing over it. So once we have $Z=g(X,Y)$ we make up another $W=h(X,Y)$, choosing it such that the remaining calculations are simple. For instance, for $g(x,y) = x+y$ we may choose $h(x,y) = y$. Then, the Jacobian matrix becomes

$$ \ma{J} = \left[ \begin{array}{cc}
1& 1 \\ 0 & 1
\end{array}\right]$$

with determinant one. Moreover, we have $x_1 = z-w$ and $y_1 = w$. Therefore, we get

$$f_{Z,W}(z,w) = f_{X,Y}(z-w,w)$$.

The final step is marginalizing out the auxiliary $W$ which gives

$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X,Y}(z-w,w) {\rm d}w.$$

Looks much like a convolution integral, doesn’t it? In fact, if $X$ and $Y$ are statistically independent, we can write $f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ and hence we obtain

$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X}(z-w)\cdot f_Y(w) \,{\rm d}w = (f_X * f_Y)(z),$$

where $*$ denotes convolution. This shows very easily that the PDF of the sum of two random variables is the convolution of their PDFs, if they are statistically independent.
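
As a quick illustration, assume two independent uniform random variables on $[0,1]$, whose sum has the well-known triangular PDF on $[0,2]$; the convolution can then be checked numerically against a histogram of samples:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
samples = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)   # samples of X + Y

# Numerical convolution of the two uniform PDFs on a common grid
dx = 0.001
u = np.arange(0.0, 1.0, dx)
f_X = np.ones_like(u)                  # uniform(0, 1) PDF
f_Y = np.ones_like(u)
f_Z = np.convolve(f_X, f_Y) * dx       # PDF of the sum, supported on [0, 2]
z = np.arange(len(f_Z)) * dx

plt.hist(samples, bins=200, density=True, alpha=0.5, label="histogram of X + Y")
plt.plot(z, f_Z, label="numerical convolution of the PDFs")
plt.legend()
plt.show()   # both curves trace out the same triangle
```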

Widely linear systems of equations

I would strongly assume that this must already exist somewhere, but I couldn’t find the solution, so I thought it would be interesting to post it here. The closest I could find is widely linear estimation (e.g., Picinbono, TSP, 1995), but it’s not quite the same.

Consider the following widely linear system of equations:

$$\ma{A} \cdot \ma{x} + \ma{B} \cdot \ma{x}^* = \ma{c},$$

where $\ma{A},\ma{B} \in \compl^{N \times N}$ are square invertible matrices and $\ma{x}, \ma{c} \in \compl^{N \times 1}$ vectors of corresponding dimension. We would like to find the vector $\ma{x}$ which satisfies this equation given $\ma{A}$, $\ma{B}$, and $\ma{c}$, if it exists.

This is a system of equations but it is not linear in $\ma{x}$. It is widely linear though and this implies that it is linear in  ${\rm Re}\{\ma{x}\}$ and in ${\rm Im}\{\ma{x}\}$. Therefore, it can easily be rewritten as a set of linear equations by introducing the real and imaginary parts of all quantities, i.e., $\ma{A} = \ma{A}_{\rm R} + \jmath \ma{A}_{\rm I}$, $\ma{B} = \ma{B}_{\rm R} + \jmath \ma{B}_{\rm I}$, $\ma{c} = \ma{c}_{\rm R} + \jmath \ma{c}_{\rm I}$, and $\ma{x} = \ma{x}_{\rm R} + \jmath \ma{x}_{\rm I}$. Inserting this into the widely linear system of equations and separating real and imaginary parts we obtain

$$\left( \left[ \ma{A}_{\rm R} + \ma{B}_{\rm R} \right] \cdot \ma{x}_{\rm R} + \left[ -\ma{A}_{\rm I} + \ma{B}_{\rm I} \right] \cdot \ma{x}_{\rm I}\right) + \jmath \cdot \left( \left[ \ma{A}_{\rm I} + \ma{B}_{\rm I} \right] \cdot \ma{x}_{\rm R} + \left[ \ma{A}_{\rm R} - \ma{B}_{\rm R} \right] \cdot \ma{x}_{\rm I}\right) = \ma{c}_{\rm R} + \jmath \ma{c}_{\rm I}.$$

As both sides of the equation are complex vectors, they are equal only if their real parts are equal and their imaginary parts are equal. Hence we have two real-valued systems of equations, which we can write as one larger system:

$$ \begin{bmatrix}
\ma{A}_{\rm R} + \ma{B}_{\rm R} & -\ma{A}_{\rm I} + \ma{B}_{\rm I}  \\
\ma{A}_{\rm I} + \ma{B}_{\rm I} & \ma{A}_{\rm R} - \ma{B}_{\rm R}
\end{bmatrix}
\cdot
\begin{bmatrix} \ma{x}_{\rm R} \\ \ma{x}_{\rm I} \end{bmatrix}
= \begin{bmatrix} \ma{c}_{\rm R} \\ \ma{c}_{\rm I} \end{bmatrix}.$$

Abbreviating the real-valued block matrix on the left-hand side as $\ma{\tilde{C}}$, this reads

$$ \ma{\tilde{C}} \cdot
\begin{bmatrix} \ma{x}_{\rm R} \\ \ma{x}_{\rm I} \end{bmatrix}
= \begin{bmatrix} \ma{c}_{\rm R} \\ \ma{c}_{\rm I} \end{bmatrix}.$$

Consequently we have exactly one solution in $\ma{x}$ if and only if the block matrix $\ma{\tilde{C}}$ is non-singular, which implies additional conditions on $\ma{A}$ and $\ma{B}$. For instance, a sufficient (but not necessary) condition is that $\ma{A}_{\rm R} + \ma{B}_{\rm R}$ and its Schur complement $\ma{A}_{\rm R} - \ma{B}_{\rm R} - (\ma{A}_{\rm I} + \ma{B}_{\rm I}) \cdot (\ma{A}_{\rm R} + \ma{B}_{\rm R})^{-1} \cdot (-\ma{A}_{\rm I} + \ma{B}_{\rm I})$ are both invertible.
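
In case it is useful, here is a minimal NumPy sketch of this real-valued stacking approach (the helper name `solve_widely_linear` is just made up for this example):

```python
import numpy as np

def solve_widely_linear(A, B, c):
    """Solve A @ x + B @ conj(x) = c by stacking real and imaginary parts."""
    AR, AI = A.real, A.imag
    BR, BI = B.real, B.imag
    # Real-valued block matrix derived above
    C = np.block([[AR + BR, -AI + BI],
                  [AI + BI,  AR - BR]])
    rhs = np.concatenate([c.real, c.imag])
    xs = np.linalg.solve(C, rhs)   # raises LinAlgError if the block matrix is singular
    n = A.shape[0]
    return xs[:n] + 1j * xs[n:]

# Quick check with random data
rng = np.random.default_rng(3)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = solve_widely_linear(A, B, c)
print(np.allclose(A @ x + B @ np.conj(x), c))   # True whenever the block matrix is non-singular
```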

 

*Update* We found a simpler solution; please read the follow-up blog post on the closed-form solution that avoids the real-valued stacking.

… and a simpler proof for it

A much simpler proof for the more generic algebraic rule on manipulating quadratic forms I posted recently just became apparent to me. All you need to do is to first show that

$${\rm trace}\{\ma{A}^{\rm H} \cdot \ma{B}\} = {\rm vec}\{\ma{A}\}^{\rm H} \cdot {\rm vec}\{\ma{B}\},$$

which is fairly easy because both represent a short-hand notation for $\sum_k\sum_\ell a_{\ell,k}^* b_{\ell,k}$. Once you have established this rule, you simply break ${\rm trace}\left\{\ma{A} \cdot \ma{X} \cdot \ma{R} \cdot \ma{X}^{\rm H} \cdot \ma{B}^{\rm H}\right\}$ into ${\rm vec}\left\{\left(\ma{A} \cdot \ma{X}\right)^{\rm H}\right\}^{\rm H} \cdot {\rm vec}\left\{\ma{R} \cdot \ma{X}^{\rm H} \cdot \ma{B}^{\rm H}\right\}$ and apply the rules for generic linear forms twice.
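
A quick numerical check of the trace/vec rule with random complex matrices (note that ${\rm vec}\{\cdot\}$ stacks the columns, which corresponds to Fortran-order flattening in NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
B = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

lhs = np.trace(A.conj().T @ B)                                # trace{A^H B}
rhs = A.flatten(order="F").conj() @ B.flatten(order="F")      # vec{A}^H vec{B}
print(np.allclose(lhs, rhs))   # True
```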

Even more algebra fun

I just realized that there is an even more general version of the rules for quadratic forms I posted recently:

$${\rm trace}\left(\ma{A} \cdot \ma{X} \cdot \ma{R} \cdot \ma{X}^{\rm H} \cdot \ma{B}^{\rm H}\right) = {\rm vec}\left(\ma{X}\right)^{\rm H} \cdot \left( \ma{R}^{\rm T} \otimes \ma{B}^{\rm H} \cdot \ma{A} \right) \cdot {\rm vec}\left(\ma{X}\right) $$
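
Here is a quick numerical sanity check of this identity with random complex matrices (again, ${\rm vec}\{\cdot\}$ is column-major flattening; note that $\ma{R}^{\rm T}$ is a plain transpose, not a Hermitian one):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p = 3, 4, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
X = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
R = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))

lhs = np.trace(A @ X @ R @ X.conj().T @ B.conj().T)
vecX = X.flatten(order="F")                                   # column-stacking vec{X}
rhs = vecX.conj() @ np.kron(R.T, B.conj().T @ A) @ vecX
print(np.allclose(lhs, rhs))   # True
```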

Enjoy!

Fun with statistics – transformations of random variables part 1

Here is another tool I learned about and have found very useful ever since: transformations of random variables. The basic idea is this: given a random variable $X$ whose distribution we know, i.e., either its probability density function (PDF) $f_X(x)$ or its cumulative distribution function (CDF) $F_X(x)$, find the distribution of a new random variable $Y$ formed as $Y=g(X)$, where $y = g(x)$ is a function from $\real$ to $\real$. For example, we may wonder what the distribution of a squared normal random variable is, or of the logarithm of a uniform random variable. Tasks like this appear surprisingly often in engineering. Communications are a good example: your signal of interest is a random process containing random desired information as well as random undesired perturbations like noise. To find an optimal detector for your information of interest you need to know exactly how it is distributed. Now what happens if your receiver is best described by some nonlinear function? How do the known (or assumed) distributions of your desired signal and your perturbation change?

Fortunately, the basic procedure of answering the above problem is strikingly simple. There are two basic approaches: one based on the CDF and one based on the PDF. Let us start with the CDF. To find the CDF of $Y$ we need to think what it is: the probability that our random variable $Y$ is less than or equal to some given $y$. Inserting $Y=g(X)$ we can try to solve this for the CDF of $X$:

$$F_Y(y) = {\rm Pr}[Y \leq y] = {\rm Pr}[g(X) \leq y] = {\rm Pr}[X \leq g^{-1}(y)] = F_X(g^{-1}(y))$$.

Really? So simple? Well, not quite. The above step only works if $g(x) = y$ is bijective, i.e., uniquely invertible, and monotonically increasing. The tricky step is of course rewriting the condition $g(X) \leq y$ in terms of a condition on $X$. Here, care needs to be taken. For example, say $g(x) = a\cdot x + b$. Then $g(X) \leq y$ becomes $X \leq (y-b)/a$ only for $a>0$; for $a<0$ we have $X \geq (y-b)/a$. Of course this still admits a simple solution in terms of the CDF, since ${\rm Pr}[X \geq (y-b)/a] = 1-{\rm Pr}[X \leq (y-b)/a] + {\rm Pr}[X=(y-b)/a]$, which becomes $1-F_X((y-b)/a)$ if $X$ is continuous ($F_X(x)$ is differentiable).
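
A tiny numerical check of the $a<0$ case, assuming for illustration a standard normal $X$ and the made-up values $a=-2$, $b=1$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.standard_normal(1_000_000)
a, b = -2.0, 1.0        # a < 0, so the inequality flips
y = 0.5

F_empirical = np.mean(a * x + b <= y)          # Monte Carlo estimate of F_Y(y)
F_formula = 1.0 - norm.cdf((y - b) / a)        # 1 - F_X((y-b)/a), valid for a < 0
print(F_empirical, F_formula)                  # should agree to about three decimals
```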

For other functions there may be multiple solutions. For instance, consider $g(x) = x^2$. This gives $F_Y(y) = {\rm Pr}[X^2 \leq y] = {\rm Pr}[-\sqrt{y} \leq X \leq +\sqrt{y}] = F_X(\sqrt{y}) - F_X(-\sqrt{y})$.
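
As a sanity check, if $X$ is standard normal then $Y=X^2$ is chi-square distributed with one degree of freedom, so the expression above can be compared against the known chi-square CDF (a minimal SciPy sketch):

```python
from scipy.stats import norm, chi2

for y in (0.5, 1.0, 2.5):
    F_formula = norm.cdf(y**0.5) - norm.cdf(-y**0.5)   # F_X(sqrt(y)) - F_X(-sqrt(y))
    print(y, F_formula, chi2.cdf(y, df=1))             # the two columns agree
```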

 

What if we want to work with the PDF instead of the CDF? Sometimes the CDF is not available or very complicated to work with. In this case, we can find a similar process for PDFs. It is easily derived from the previous approach on the CDF by taking its derivative and using the chain rule twice. We find that the PDF of $Y$ can be directly computed as

$$f_Y(y) = \sum_{i=1}^N \frac{1}{|g'(x_i(y))|} f_X(x_i(y)),$$

where $x_i(y)$, $i=1, 2, \ldots, N$, are all solutions of $y=g(x)$ in terms of $x$. Furthermore, $g'(x)$ is the first derivative of $g(x)$ with respect to $x$. Example: consider $y=g(x) = x^2$. It has two solutions in $x$: $x_1(y) =+\sqrt{y}$ and $x_2(y) = -\sqrt{y}$. We find $g'(x) = 2x$ and inserting $x_i(y)$ we have $g'(x_i(y)) = \pm 2 \sqrt{y}$. Therefore, the final solution for $f_Y(y)$ becomes

$$f_Y(y) = \frac{1}{|2\sqrt{y}|}f_X(\sqrt{y}) + \frac{1}{|-2\sqrt{y}|} f_X(-\sqrt{y})= \frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y})+f_X(-\sqrt{y})\right).$$
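
And the corresponding check for the PDF formula, again assuming a standard normal $X$, so that $Y=X^2$ is chi-square with one degree of freedom:

```python
from scipy.stats import norm, chi2

for y in (0.5, 1.0, 2.5):
    f_formula = (norm.pdf(y**0.5) + norm.pdf(-y**0.5)) / (2 * y**0.5)
    print(y, f_formula, chi2.pdf(y, df=1))   # the two columns agree
```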

 

Of course, the entire process has some limitations, the main one being that we need to be able to solve $g(x) = y$ for $x$. This becomes tedious if there are infinitely many solutions, as for instance for $g(x) = \sin(x)$, and for many functions it becomes impossible altogether. Try $g(x) = {\rm sinc}(x) = \sin(x)/x$, for instance: even the number of solutions $N$ depends on $y$, and finding the $x_i(y)$ explicitly is not really possible.

Still, for a wide range of problems this is quite a handy tool.