{"id":96,"date":"2012-01-14T15:29:48","date_gmt":"2012-01-14T15:29:48","guid":{"rendered":"http:\/\/florian-roemer.de\/blog\/?p=96"},"modified":"2012-01-30T08:10:49","modified_gmt":"2012-01-30T08:10:49","slug":"fun-with-statistics-transformations-of-random-variables-part-2","status":"publish","type":"post","link":"https:\/\/florian-roemer.de\/blog\/fun-with-statistics-transformations-of-random-variables-part-2\/","title":{"rendered":"Fun with statistics &#8211; transformations of random variables part 2"},"content":{"rendered":"<p>I <a title=\"Fun with statistics \u2013 transformations of random variables part 1\" href=\"http:\/\/florian-roemer.de\/blog\/?p=62\">recently posted <\/a>on how to find the distribution of functions of random variables, i.e., the distribution of $Y=g(X)$, where $X$ is a random variable with known distribution and $y=g(x)$ is some function.<\/p>\n<p>&nbsp;<\/p>\n<p>As a natural extension of this concept we may ask ourselves what happens if we have two random variables involved. Let us start with <strong>one function of two random variables<\/strong>, i.e., given $X$ and $Y$ and knowing their joint PDF $f_{X,Y}(x,y)$ or their joint CDF $F_{X,Y}(x,y) = {\\rm Pr}[X \\leq x, Y \\leq y]$ we would like to calculate the distribution of $Z = g(X,Y)$ where $z=g(x,y)$ is a function with two arguments, e.g., $z=x+y$.<\/p>\n<p>Again, there are multiple ways of addressing this problem. A natural way would be to calculate the CDF of $Z$ directly, i.e., $F_Z(z) = {\\rm Pr}[Z \\leq z] = {\\rm Pr}[g(X,Y) \\leq z]$. In other words, we need to compute the probability of the event that relates to all realization of $X$ and $Y$ which satisfy $g(X,Y) \\leq z$. This is easily done by integrating the joint PDF $f_{X,Y}(x,y)$ over all points in the set ${\\mathcal D}_z$ which contains all points $(x,y)$ for which $g(x,y) \\leq z$. 
Written out, we have<\/p>\n<p>$$F_Z(z) = {\rm Pr}[Z \leq z] = {\rm Pr}[g(X,Y) \leq z] = \iint_{{\mathcal D}_z} f_{X,Y}(x,y) {\rm d}x {\rm d} y$$<\/p>\n<p>Whether or not this approach is easy to carry out depends on two things: (1) how easy it is to parametrize the set ${\mathcal D}_z$ and (2) how easy it is to integrate the joint PDF over ${\mathcal D}_z$.<\/p>\n<p>As an example, consider the simple function $g(x,y) = x+y$. Then ${\mathcal D}_z$ contains all points $(x,y)$ for which $x+y \leq z$, i.e., $y\leq z-x$ or $x \leq z-y$. Geometrically, this is the set of points on the lower-left of a line with slope $-1$ and offset $z$, i.e., a line passing through $(z,0)$ and $(0,z)$. The integral over this set is relatively simple, as we can directly write it as<\/p>\n<p>$$\displaystyle F_Z(z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-y} f_{X,Y}(x,y) {\rm d}x {\rm d} y = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y) {\rm d}y {\rm d} x$$.<\/p>\n<p>Another example is $g(x,y) = \max(x,y)$. Since $\max(x,y) \leq z \Leftrightarrow ((x \leq z) \;\mbox{and} \; (y \leq z))$ we can argue<\/p>\n<p>$$F_Z(z) = {\rm Pr}[\max(X,Y) \leq z] = \int_{-\infty}^z \int_{-\infty}^z f_{X,Y}(x,y) {\rm d}x {\rm d} y$$.<\/p>\n<p>Geometrically, ${\mathcal D}_z$ contains all points on the &#8220;lower left&#8221; of the point $(z,z)$, i.e., the intersection of the half-planes below $y=z$ and left of $x=z$.<\/p>\n<p>&nbsp;<\/p>\n<p>The second extension is to consider <strong>two functions of two random variables<\/strong>. Say we are given the distribution of $X$ and $Y$ via their joint PDF and we would like to find the joint PDF of $Z=g(X,Y)$ and $W=h(X,Y)$. There is a closed-form expression for it as a direct extension of the closed-form expression for the PDF of one function of one random variable. 
It reads as<\/p>\n<p>$$f_{Z,W}(z,w) = \sum_{i=1}^N \frac{1}{|{\rm det} \ma{J}(x_i,y_i)|} f_{X,Y}(x_i,y_i)$$,<\/p>\n<p>where $(x_i,y_i)$ are all solutions to the system of equations $z=g(x,y)$, $w=h(x,y)$ in $x$ and $y$. Here, $\ma{J}$ is the Jacobian matrix given by<\/p>\n<p>$$ \ma{J} = \left[ \begin{array}{cc}<br \/>\n\frac{\partial g}{\partial x} &amp; \frac{\partial g}{\partial y} \\<br \/>\n\frac{\partial h}{\partial x} &amp; \frac{\partial h}{\partial y}<br \/>\n\end{array}\right]$$.<\/p>\n<p>Moreover, the term ${\rm det} \ma{J}(x_i,y_i)$ means that we first compute the determinant of the Jacobian matrix (in terms of $x$ and $y$) and then insert $x_i(z,w)$ and $y_i(z,w)$.<\/p>\n<p>&nbsp;<\/p>\n<p>Example? How about the joint distribution of $X+Y$ and $X-Y$? In this case, solving $z=x+y$ and $w=x-y$ for $x$ and $y$ is simple: we have one solution given by $x_1 = (z+w)\/2$ and $y_1 = (z-w)\/2$. The Jacobian matrix is given by<\/p>\n<p>$$ \ma{J} = \left[ \begin{array}{cc}<br \/>\n1 &amp; 1 \\ 1 &amp; -1<br \/>\n\end{array}\right]$$<br \/>\nand hence its determinant is $-2$ everywhere. This gives the solution for $f_{Z,W}(z,w)$ in the form<\/p>\n<p>$f_{Z,W}(z,w) = \frac{1}{2} f_{X,Y}((z+w)\/2,(z-w)\/2)$.<\/p>\n<p>As in the 1-D case, this direct solution depends heavily on our ability to solve the given functions for $x$ and $y$, which may be tedious for complicated functions.<\/p>\n<p>&nbsp;<\/p>\n<p>Interestingly, the first case, where we considered one function of two random variables, can also be solved via this approach, simply by creating another &#8220;auxiliary&#8221; variable and then marginalizing over it. So once we have $Z=g(X,Y)$ we make up another $W=h(X,Y)$, choosing it such that the remaining calculations are simple. For instance, for $g(x,y) = x+y$ we may choose $h(x,y) = y$. Then, the Jacobian matrix becomes<\/p>\n<p>$$ \ma{J} = \left[ \begin{array}{cc}<br \/>\n1 &amp; 1 \\ 0 &amp; 1<br \/>\n\end{array}\right]$$<\/p>\n<p>with determinant one. 
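The $(Z,W)=(X+Y, X-Y)$ example above can also be checked numerically. The sketch below assumes independent standard normal $X$ and $Y$ (my choice; the post keeps $f_{X,Y}$ generic) and compares the closed-form density $\frac{1}{2} f_{X,Y}((z+w)/2,(z-w)/2)$ against a crude cell-counting Monte Carlo estimate.

```python
# Check of the change-of-variables result for Z = X + Y, W = X - Y:
# f_{Z,W}(z, w) = (1/2) f_{X,Y}((z+w)/2, (z-w)/2), since |det J| = 2.
# Assumption (mine, not from the post): X, Y independent N(0, 1).
import math
import random

def f_XY(x, y):
    # Joint PDF of two independent standard normals.
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)

def f_ZW_formula(z, w):
    # Closed-form joint PDF from the Jacobian formula, with |det J| = 2.
    return 0.5 * f_XY((z + w) / 2, (z - w) / 2)

def f_ZW_mc(z, w, n=500_000, delta=0.25, seed=2):
    # Estimate the density as Pr[(Z, W) in a small square cell] / cell area.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        if z <= x + y < z + delta and w <= x - y < w + delta:
            hits += 1
    return hits / (n * delta * delta)

print(f_ZW_formula(0.0, 0.0), f_ZW_mc(0.0, 0.0))  # should agree closely
```

At $(z,w)=(0,0)$ the closed form gives $1/(4\pi) \approx 0.0796$, and the sampled estimate lands close to it; shrinking `delta` (at the cost of more samples) tightens the agreement.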
Moreover, we have $x_1 = z-w$ and $y_1 = w$. Therefore, we get<\/p>\n<p>$$f_{Z,W}(z,w) = f_{X,Y}(z-w,w)$$.<\/p>\n<p>The final step is marginalizing out the auxiliary $W$, which gives<\/p>\n<p>$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X,Y}(z-w,w) {\rm d}w.$$<\/p>\n<p>Looks much like a convolution integral, doesn&#8217;t it? In fact, if $X$ and $Y$ are statistically independent, we can write $f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ and hence we obtain<\/p>\n<p>$$f_Z(z) = \int_{-\infty}^{+\infty} f_{X}(z-w)\cdot f_Y(w) {\rm d}w = (f_X * f_Y)(z),$$<\/p>\n<p>where $*$ denotes convolution. This shows very directly that the PDF of the sum of two statistically independent random variables is the convolution of their PDFs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently posted on how to find the distribution of functions of random variables, i.e., the distribution of $Y=g(X)$, where $X$ is a random variable with known distribution and $y=g(x)$ is some function. &nbsp; As a natural extension of this concept we may ask ourselves what happens if we have two random variables involved. 
Let [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[9],"_links":{"self":[{"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/posts\/96"}],"collection":[{"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/comments?post=96"}],"version-history":[{"count":18,"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/posts\/96\/revisions"}],"predecessor-version":[{"id":111,"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/posts\/96\/revisions\/111"}],"wp:attachment":[{"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/media?parent=96"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/categories?post=96"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/florian-roemer.de\/blog\/wp-json\/wp\/v2\/tags?post=96"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}