Conjoint Probability

1. Discrete Random Variables

1.1 Definition

Let $X, Y$ be two discrete random variables with finite codomains, and let $U=\{x_1,\dots,x_n\}$ and $V=\{y_1,\dots,y_m\}$ be the sets of possible values of $X$ and $Y$ respectively. We define the random variable $Z=(X, Y)$, called a joint random variable, by the law:
$$\forall (x,y)\in U\times V,\quad \mathcal{P}(Z=(x,y))=\mathcal{P}(X=x \wedge Y=y)$$

We can express the joint probabilities compactly in the following matrix form:
$$M=\begin{pmatrix} p_{1,1}&\dots&p_{1,m}\\ p_{2,1}&\dots&p_{2,m}\\ \vdots &\ddots&\vdots\\ p_{n,1}&\dots&p_{n,m} \end{pmatrix}$$

with $\forall i\in\{1,\dots,n\},\ \forall j\in\{1,\dots,m\},\quad p_{i,j}=\mathcal{P}(X=x_i \wedge Y=y_j)$
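To make the matrix concrete, here is a minimal sketch in Python; the value sets and probabilities below are invented for illustration. It tabulates a joint law as the matrix $M$ and reads off $p_{i,j}=\mathcal{P}(X=x_i \wedge Y=y_j)$:

```python
# Hypothetical joint law of Z = (X, Y): X takes values in U = {0, 1},
# Y takes values in V = {0, 1, 2}.  Row i corresponds to x_i, column j
# to y_j, and M[i][j] = P(X = x_i and Y = y_j).
U = [0, 1]
V = [0, 1, 2]
M = [
    [0.10, 0.20, 0.10],   # P(X=0, Y=0), P(X=0, Y=1), P(X=0, Y=2)
    [0.30, 0.15, 0.15],   # P(X=1, Y=0), P(X=1, Y=1), P(X=1, Y=2)
]

def p_joint(x, y):
    """Look up P(X = x and Y = y) in the joint distribution matrix."""
    return M[U.index(x)][V.index(y)]

print(p_joint(1, 0))               # 0.3
print(sum(sum(row) for row in M))  # 1.0, up to floating-point rounding
```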
1.2 Independence

If $X$ and $Y$ are independent, then:
$$p_{i,j}=\mathcal{P}(X=x_i)\cdot \mathcal{P}(Y=y_j)$$

1.3 Joint Distribution Matrix

Let $M$ be a matrix with non-negative entries.
$M$ is said to be a joint distribution matrix if its entries sum to one:
$$\sum_{i,j}M_{i,j}=1$$

1.4 Event Probability

Let $A\subseteq U\times V$.
We have:
$$\mathcal{P}((X,Y)\in A)=\sum_{(x,y)\in A}\mathcal{P}(X=x\wedge Y=y)$$

1.5 Marginal Distribution

1.5.1 Marginal Distribution of $X$

The marginal distribution of $X$ is the probability distribution of $X$ determined from $(X, Y)$:
$$\forall x\in U,\quad \mathcal{P}(X=x)=\sum_{y\in V}\mathcal{P}(X=x\wedge Y=y)$$

1.5.2 Marginal Distribution of $Y$

The marginal distribution of $Y$ is the probability distribution of $Y$ determined from $(X, Y)$:
$$\forall y\in V,\quad \mathcal{P}(Y=y)=\sum_{x\in U}\mathcal{P}(X=x\wedge Y=y)$$

1.6 Conditional Distribution

1.6.1 Conditional Distribution of $X$ knowing $Y=y$

$$\forall x\in U,\quad \mathcal{P}(X[Y=y]=x)=\mathcal{P}(X_{\mid Y=y}=x)=\mathcal{P}(X=x \mid Y=y)=\frac{\mathcal{P}(X=x \wedge Y=y)}{\mathcal{P}(Y=y)}$$

1.6.2 Conditional Distribution of $Y$ knowing $X=x$

$$\forall y\in V,\quad \mathcal{P}(Y[X=x]=y)=\mathcal{P}(Y_{\mid X=x}=y)=\mathcal{P}(Y=y \mid X=x)=\frac{\mathcal{P}(X=x \wedge Y=y)}{\mathcal{P}(X=x)}$$

2. Continuous Real Random Variables

2.1 Definition

Let $X, Y$ be two continuous random variables with a joint probability density function $f_{X,Y}$. We define the random variable $Z=(X, Y)$, called a joint random variable, by the law:
$$\forall (x,y)\in \mathbb{R}^2,\quad F_Z(x,y)=\mathcal{P}(X \le x \wedge Y\le y)=\int_{-\infty}^x\int_{-\infty}^y f_{X,Y}(u,v)\,\mathrm{d}v\,\mathrm{d}u=\iint_{\mathopen]-\infty,x\mathclose]\times \mathopen]-\infty,y\mathclose]}f_{X,Y}(u,v)\,\mathrm{d}u\,\mathrm{d}v$$

2.2 Joint Distribution Function

A function $h\in\mathscr{L}^1(\mathbb{R}^2)$ is said to be a joint distribution function if:
- $h$ is non-negative: $h\ge 0$
- The integral of $h$ is $1$:
$$\lVert h \rVert_1=\iint_{\mathbb{R}^2}h(u)\,\mathrm{d}u=\int_{\mathbb{R}}\int_{\mathbb{R}}h(x,y)\,\mathrm{d}y\,\mathrm{d}x=1$$

2.4 Event Probability

Let $A\in\mathcal{B}(\mathbb{R}^2)$ be a Borel set.
We have:
$$\mathcal{P}((X,Y)\in A)=\iint_{A}h(u)\,\mathrm{d}u$$

where $h=f_{X,Y}$ is the joint density of $(X, Y)$.

2.5 Marginal Distribution

2.5.1 Marginal Distribution of $X$

The marginal distribution of $X$ is the probability distribution of $X$ determined from $(X, Y)$:
$$\forall x\in \mathbb{R},\quad f_X(x)=\int_{\mathbb{R}}f_{X,Y}(x,y)\,\mathrm{d}y$$

2.5.2 Marginal Distribution of $Y$

The marginal distribution of $Y$ is the probability distribution of $Y$ determined from $(X, Y)$:
$$\forall y\in \mathbb{R},\quad f_Y(y)=\int_{\mathbb{R}}f_{X,Y}(x,y)\,\mathrm{d}x$$

2.6 Conditional Distribution

2.6.1 Conditional Distribution of $X$ knowing $Y=y$

It is the distribution of $X$ given the knowledge that $Y=y$; it is defined as:
$$\forall x\in \mathbb{R},\quad f_{X[Y=y]}(x)=f_{X_{\mid Y=y}}(x)=\frac{f_{X,Y}(x,y)}{f_Y(y)}$$

2.6.2 Conditional Distribution of $Y$ knowing $X=x$

It is the distribution of $Y$ given the knowledge that $X=x$; it is defined as:
$$\forall y\in \mathbb{R},\quad f_{Y[X=x]}(y)=f_{Y_{\mid X=x}}(y)=\frac{f_{X,Y}(x,y)}{f_X(x)}$$

3. Conditional Expectation

3.1 Definition

Let $X$ and $Y$ be two random variables.
The conditional expectation of $Y$ given $X=x$, denoted $\mathbb{E}[Y\mid X=x]$, is the expected value of $Y$ with the additional information that $X=x$. It is equal to:
$$\forall x,\quad \mathbb{E}[Y\mid X=x]=\mathbb{E}[Y_{\mid X=x}]$$

3.2 As a Random Variable

By introducing the function $\varphi$ defined as follows:
$$\begin{align*} \varphi:\ &\mathbb{R}\rightarrow \mathbb{R}\\ &x\mapsto \mathbb{E}[Y\mid X=x] \end{align*}$$

we define the conditional expectation of $Y$ given $X$, denoted $\mathbb{E}[Y\mid X]$, as:
$$\mathbb{E}[Y\mid X]=\varphi(X)$$

To calculate its distribution, see Function on a random variable.
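For a discrete pair, $\varphi$ can be computed directly from a joint distribution matrix: each row is normalized into the conditional law of $Y$ given $X=x$, and its mean is $\mathbb{E}[Y\mid X=x]$. A minimal Python sketch, with invented values:

```python
# Hypothetical discrete joint law: rows are values of X, columns values of Y.
U = [0, 1]            # values of X
V = [1, 2, 3]         # values of Y
M = [
    [0.10, 0.20, 0.10],   # P(X=0, Y=y_j)
    [0.30, 0.20, 0.10],   # P(X=1, Y=y_j)
]

def phi(x):
    """phi(x) = E[Y | X = x]: mean of the conditional law of Y given X = x."""
    i = U.index(x)
    p_x = sum(M[i])                                  # marginal P(X = x)
    return sum(y * M[i][j] / p_x for j, y in enumerate(V))

print(phi(0))   # (1*0.1 + 2*0.2 + 3*0.1) / 0.4 = 2.0
print(phi(1))   # (1*0.3 + 2*0.2 + 3*0.1) / 0.6 ≈ 1.67
```

The random variable $\mathbb{E}[Y\mid X]=\varphi(X)$ then takes the value $\varphi(0)$ with probability $\mathcal{P}(X=0)=0.4$ and $\varphi(1)$ with probability $0.6$.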
3.3 Law of Total Expectation

Let $Y, X$ be two random variables.
We have the following:
$$\mathbb{E}[Y]=\mathbb{E}\left[\mathbb{E}[Y\mid X]\right]=\mathbb{E}_X\left[\mathbb{E}_Y[Y\mid X]\right]$$

To avoid confusion, we write:
- $\mathbb{E}_Y$ to emphasise that the expectation is calculated against $Y$
- $\mathbb{E}_X$ to emphasise that the expectation is calculated against $X$

4. Conditional Variance

4.1 Definition

Let $X$ and $Y$ be two random variables.
The conditional variance of $Y$ given $X=x$, denoted $\mathbb{V}[Y\mid X=x]$, is the variance of $Y$ with the additional information that $X=x$. It is equal to:
$$\forall x,\quad \mathbb{V}[Y\mid X=x]=\mathbb{V}[Y_{\mid X=x}]$$

4.2 As a Random Variable

By introducing the function $\varphi$ defined as follows:
$$\begin{align*} \varphi:\ &\mathbb{R}\rightarrow \mathbb{R}\\ &x\mapsto \mathbb{V}[Y\mid X=x] \end{align*}$$

we define the conditional variance of $Y$ given $X$, denoted $\mathbb{V}[Y\mid X]$, as:
$$\mathbb{V}[Y\mid X]=\varphi(X)$$

To calculate its distribution, see Function on a random variable.
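As with the conditional expectation, $\varphi(x)=\mathbb{V}[Y\mid X=x]$ can be computed row by row from a discrete joint matrix, here via the identity $\mathbb{V}[Y_{\mid X=x}]=\mathbb{E}[Y_{\mid X=x}^2]-\mathbb{E}[Y_{\mid X=x}]^2$. A hypothetical Python sketch (values invented):

```python
# Hypothetical discrete joint law: rows are values of X, columns values of Y.
U = [0, 1]            # values of X
V = [1, 2, 3]         # values of Y
M = [
    [0.10, 0.20, 0.10],   # P(X=0, Y=y_j)
    [0.30, 0.20, 0.10],   # P(X=1, Y=y_j)
]

def cond_var(x):
    """V[Y | X = x] computed as E[Y^2 | X = x] - E[Y | X = x]^2."""
    i = U.index(x)
    p_x = sum(M[i])                                   # marginal P(X = x)
    cond = [M[i][j] / p_x for j in range(len(V))]     # conditional law of Y
    m1 = sum(y * p for y, p in zip(V, cond))          # E[Y | X = x]
    m2 = sum(y * y * p for y, p in zip(V, cond))      # E[Y^2 | X = x]
    return m2 - m1 * m1

print(cond_var(0))   # E[Y|X=0] = 2.0, E[Y^2|X=0] = 4.5, so variance 0.5
```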
4.3 Law of Total Variance

Let $Y, X$ be two random variables.
We have the following:
$$\mathbb{V}[Y]=\mathbb{V}\left[\mathbb{E}[Y\mid X]\right]+\mathbb{E}\left[\mathbb{V}[Y\mid X]\right]=\mathbb{V}_X\left[\mathbb{E}_Y[Y\mid X]\right]+\mathbb{E}_X\left[\mathbb{V}_Y[Y\mid X]\right]$$

To avoid confusion, we write:
- $\mathbb{E}_Y,\mathbb{V}_Y$ to emphasise that the expectation and variance are calculated respectively against $Y$
- $\mathbb{E}_X,\mathbb{V}_X$ to emphasise that the expectation and variance are calculated respectively against $X$
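As a sanity check, the law of total variance (and, along the way, the law of total expectation) can be verified numerically on a small discrete joint law. A hypothetical Python sketch (values invented); both sides agree up to floating-point rounding:

```python
# Hypothetical discrete joint law: rows are values of X, columns values of Y.
xs = [0, 1]
ys = [1, 2, 3]
M = [
    [0.10, 0.20, 0.10],   # P(X=0, Y=y_j)
    [0.30, 0.20, 0.10],   # P(X=1, Y=y_j)
]

p_x = [sum(row) for row in M]                               # marginal law of X
p_y = [sum(M[i][j] for i in range(2)) for j in range(3)]    # marginal law of Y

def mean(values, probs):
    return sum(v * p for v, p in zip(values, probs))

def var(values, probs):
    m = mean(values, probs)
    return sum((v - m) ** 2 * p for v, p in zip(values, probs))

# Conditional laws of Y given X = x_i, and the derived conditional moments.
cond = [[M[i][j] / p_x[i] for j in range(3)] for i in range(2)]
e_y_given_x = [mean(ys, cond[i]) for i in range(2)]   # E[Y | X = x_i]
v_y_given_x = [var(ys, cond[i]) for i in range(2)]    # V[Y | X = x_i]

lhs = var(ys, p_y)                                       # V[Y]
rhs = var(e_y_given_x, p_x) + mean(v_y_given_x, p_x)     # V[E[Y|X]] + E[V[Y|X]]
print(lhs, rhs)   # the two values coincide
```

Replacing `var` by `mean` on both sides gives the law of total expectation: `mean(e_y_given_x, p_x)` equals `mean(ys, p_y)`.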