# Discrete Distributions

## 1. Uniform Distribution

### 1.1 Definition

A discrete random variable $X$ is said to follow the uniform distribution $\mathcal{D}(a,b)$ if:
$$\forall k\in\{a,\dots,b\},\quad \mathcal{P}(X=k)=\frac{1}{b-a+1}$$

### 1.2 Significance

### 1.3 Moments

#### 1.3.1 Non-central Moments

$$\forall n\in\mathbb{N},\quad \mathbb{E}[X^n]=\frac{1}{b-a+1}\sum_{k=a}^b k^n$$

In particular, the expected value $\mathbb{E}[X]$ is:
$$\boxed{\mathbb{E}[X]=\frac{1}{b-a+1}\sum_{k=a}^b k=\frac{(b-a+1)(a+b)}{2(b-a+1)}=\frac{a+b}{2}}$$

#### 1.3.2 Central Moments

$$\forall n\in\mathbb{N},\quad \mathbb{E}\left[(X-\mathbb{E}[X])^n\right]=\frac{1}{b-a+1}\sum_{k=a}^b\left(k-\frac{a+b}{2}\right)^n$$

To get an exact expression for the variance, we start by computing the following sum:
$$
\begin{align*}
\sum_{k=a}^b(k+1)^3 &= \sum_{k=a}^b \left(k^3+3k^2+3k+1\right)\\
\implies \sum_{k=a}^b 3k^2 &= (b+1)^3-a^3-\sum_{k=a}^b(3k+1)\\
3\sum_{k=a}^b k^2 &= (b+1)^3-a^3-\frac{3}{2}(a+b)(b-a+1)-(b-a+1)\\
\implies 6\sum_{k=a}^b k^2 &= (b-a+1)\left(2a^2+2(1+b)a+2(b+1)^2-3(a+b)-2\right)\\
&= (b-a+1)(2a^2+2a+2ab+2b^2+4b+2-3a-3b-2)\\
&= (b-a+1)(2a^2-a+2ab+2b^2+b)\\
\implies \sum_{k=a}^b k^2 &= \frac{1}{6}(b-a+1)(2a^2-a+2ab+2b^2+b)
\end{align*}
$$

(The first implication uses the telescoping identity $\sum_{k=a}^b\left[(k+1)^3-k^3\right]=(b+1)^3-a^3$.)

From that, we can directly calculate the variance $\mathbb{V}[X]$ as follows:
$$
\begin{align*}
\mathbb{V}[X]&=\mathbb{E}[X^2]-\mathbb{E}[X]^2\\
&=\frac{1}{b-a+1}\sum_{k=a}^bk^2-\frac{(a+b)^2}{4}\\
&=\frac{2a^2-a+2ab+2b^2+b}{6}-\frac{(a+b)^2}{4}\\
&=\frac{4a^2-2a+4ab+4b^2+2b-3a^2-6ab-3b^2}{12}\\
&=\frac{a^2-2a+b^2+2b-2ab}{12}\\
&=\frac{(b-a)^2+2(b-a)}{12}\\
&=\frac{(b-a)^2+2(b-a)+1-1}{12}\\
&=\frac{(b-a+1)^2-1}{12}
\end{align*}
$$
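As a quick numerical sanity check, the two results above can be compared with the mean and variance computed directly from the probability mass function. The following is a minimal Python sketch; the bounds `a = 3`, `b = 17` are arbitrary choices.

```python
# Check E[X] = (a+b)/2 and V[X] = ((b-a+1)^2 - 1)/12 for the
# discrete uniform distribution on {a, ..., b}.
a, b = 3, 17                      # arbitrary bounds for the check
support = range(a, b + 1)
n = b - a + 1                     # number of equally likely values

mean = sum(support) / n
var = sum((k - mean) ** 2 for k in support) / n

assert abs(mean - (a + b) / 2) < 1e-9
assert abs(var - ((b - a + 1) ** 2 - 1) / 12) < 1e-9
print(mean, var)                  # 10.0 and 18.666...
```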
## 2. Bernoulli Distribution

### 2.1 Definition

A discrete random variable $X$ is said to follow the Bernoulli distribution $\mathcal{B}(p)$ if:

$$\begin{cases} \mathcal{P}(X=1)&=p \\ \mathcal{P}(X=0)&=1-p \end{cases}$$

### 2.2 Significance

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value $1$ with probability $p$ and the value $0$ with probability $q=1-p$.
Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question.

Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability $p$ and failure/no/false/zero with probability $q$.

It can be used to represent a (possibly biased) coin toss where $1$ and $0$ would represent "heads" and "tails", respectively, and $p$ would be the probability of the coin landing on heads (or vice versa, where $1$ would represent tails and $p$ would be the probability of tails).

In particular, unfair coins would have $p\neq \frac{1}{2}$.
### 2.3 Moments

#### 2.3.1 Idempotence

A Bernoulli random variable is idempotent: since $X$ takes only the values $0$ and $1$,
$$\forall n\in\mathbb{N}^*,\quad X^n=X$$

#### 2.3.2 Non-central Moments

$$
\begin{align*}
\forall n \in\mathbb{N}^*,\quad \mathbb{E}[X^n]&=\mathbb{E}[X]\\
&=\mathcal{P}(X=1)\\
&=p
\end{align*}
$$

#### 2.3.3 Central Moments

$$
\begin{align*}
\forall n \in\mathbb{N}^*,\quad \mathbb{E}\left[(X-\mathbb{E}[X])^n\right]&=(1-p)^n\,\mathcal{P}(X=1)+(-p)^n\,\mathcal{P}(X=0)\\
&=p(1-p)^n+(-p)^n(1-p)\\
&=p(1-p)\left((1-p)^{n-1}-(-p)^{n-1}\right)
\end{align*}
$$

In particular, the variance $\mathbb{V}[X]$ is equal to:
$$\boxed{\mathbb{V}[X]=p(1-p)}$$
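The closed form for the central moments can be checked directly against the two-outcome definition; a small sketch with an arbitrary choice of $p$.

```python
# Check E[(X - E[X])^n] = p(1-p) * ((1-p)^(n-1) - (-p)^(n-1))
# against the direct two-outcome computation for X ~ Bernoulli(p).
p = 0.3                                   # arbitrary parameter for the check

for n in range(1, 9):
    direct = (1 - p) ** n * p + (-p) ** n * (1 - p)   # outcomes X = 1 and X = 0
    closed = p * (1 - p) * ((1 - p) ** (n - 1) - (-p) ** (n - 1))
    assert abs(direct - closed) < 1e-12

print("variance:", p * (1 - p))           # the n = 2 case
```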
### 2.4 Product of Bernoulli distributions

Let $n\in\mathbb{N}$

Let $p_1,\dots,p_n\in[0,1]$

Let $X_1\sim \mathcal{B}(p_1),\dots,X_n\sim \mathcal{B}(p_n)$

The random variable $P=\prod_{i=1}^nX_i$ follows a Bernoulli distribution $\mathcal{B}(p)$ with:

$$p=\mathcal{P}\left(\bigwedge_{i=1}^n X_i=1\right)$$

If the random variables are independent, then:
$$p=\prod_{i=1}^n p_i$$

#### 2.4.1 Example 1

Let $X_1\sim \mathcal{B}(0.5)$

Let $X_2\sim \mathcal{B}(0.7)$

Let $X_3 \sim \mathcal{B}(0.3)$

We will assume that $X_1,X_2,X_3$ are independent.

Let $P=X_1X_2X_3$

The probability mass function of $P$ is:
$$\mathcal{P}(P=k)=\begin{cases} 0.105 & k=1 \\ 0.895 & k=0 \end{cases}$$

#### 2.4.2 Example 2

Let $X_1\sim \mathcal{B}(0.5)$

Let $X_2\sim \mathcal{B}(0.7)$

Let $X_3$ be the random variable defined to be $1$ if $X_1=X_2$ and $0$ otherwise.

We will assume that $X_1$ and $X_2$ are independent (note that $X_3$ is not independent of them, so the product formula above does not apply).

Let $P=X_1X_2X_3$

$$\mathcal{P}(X_1 = 1 \wedge X_2=1 \wedge X_3=1) =\mathcal{P}(X_1 = 1 \wedge X_2=1 \wedge X_1=X_2)=\mathcal{P}(X_1 = 1 \wedge X_2=1)=0.35$$

So $P\sim \mathcal{B}(0.35)$
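Both examples can be reproduced by enumerating the joint outcomes; a minimal sketch, where Example 1 treats $X_3$ as an independent $\mathcal{B}(0.3)$ variable and Example 2 replaces it with the indicator of $X_1=X_2$.

```python
from itertools import product as cartesian

p1, p2, p3 = 0.5, 0.7, 0.3

def bernoulli_pmf(p, x):
    """P(X = x) for X ~ Bernoulli(p)."""
    return p if x == 1 else 1 - p

# Example 1: X1, X2, X3 independent, P = X1 * X2 * X3.
p_example1 = sum(
    bernoulli_pmf(p1, x1) * bernoulli_pmf(p2, x2) * bernoulli_pmf(p3, x3)
    for x1, x2, x3 in cartesian([0, 1], repeat=3)
    if x1 * x2 * x3 == 1
)

# Example 2: X3 = 1 iff X1 == X2, so only X1 and X2 are free (and independent).
p_example2 = sum(
    bernoulli_pmf(p1, x1) * bernoulli_pmf(p2, x2)
    for x1, x2 in cartesian([0, 1], repeat=2)
    if x1 * x2 * (1 if x1 == x2 else 0) == 1
)

print(p_example1, p_example2)   # ~0.105 and 0.35
assert abs(p_example1 - 0.105) < 1e-12 and abs(p_example2 - 0.35) < 1e-12
```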
### 2.5 Binary Function on Bernoulli distributions

Let $n\in\mathbb{N}$

Let $p_1,\dots,p_n\in[0,1]$

Let $X_1\sim \mathcal{B}(p_1),\dots,X_n\sim \mathcal{B}(p_n)$

Let $F:\{0,1\}^n\rightarrow \{0,1\}$ be a binary function.

Then the random variable $Y=F(X_1,\dots,X_n)$ follows a Bernoulli distribution $\mathcal{B}(p)$ with:

$$p=\mathcal{P}\left((X_1,\dots,X_n)\in F^{-1}(1)\right)$$

If the random variables are independent, then:

$$p=\sum_{U\in F^{-1}(1)}\prod_{i=1}^n\mathcal{P}(X_i=U_i)$$

#### 2.5.1 Example 1

Let $X_1\sim \mathcal{B}(p_1=0.8)$
Let $X_2 \sim \mathcal{B}(p_2=0.6)$

Let $X_3 \sim \mathcal{B}(p_3=0.5)$

We will assume $X_1,X_2,X_3$ are independent.

Let $F$ be the binary function defined by the following truth table:

| $x_1$ | $x_2$ | $x_3$ | $F(x_1,x_2,x_3)$ |
|:-:|:-:|:-:|:-:|
| 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

Let $Y=F(X_1,X_2,X_3)$

We have:
$$
\begin{align*}
\mathcal{P}(Y=1)&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge X_3=0) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=0) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=1)+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge X_3=1)\\
&= \bar{p}_1\bar{p}_2\bar{p}_3+\bar{p}_1p_2\bar{p}_3+\bar{p}_1p_2p_3+p_1p_2p_3 \\
&= \bar{p}_1\bar{p}_3(\bar{p}_2+p_2)+p_2p_3(p_1+\bar{p}_1)\\
&= \bar{p}_1\bar{p}_3+p_2p_3 \\
&= 0.4\\
\mathcal{P}(Y=0)&=0.6
\end{align*}
$$

where $\bar{p}_i=1-p_i$. So $Y\sim \mathcal{B}(0.4)$
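The value $0.4$ also falls out of the general formula $p=\sum_{U\in F^{-1}(1)}\prod_i \mathcal{P}(X_i=U_i)$ by listing the rows of the truth table where $F=1$; a small sketch.

```python
from math import prod

p = (0.8, 0.6, 0.5)                                            # (p1, p2, p3)
F_inverse_of_1 = [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 1)]  # rows of the table with F = 1

def joint(U):
    """P(X1 = U[0], X2 = U[1], X3 = U[2]) under independence."""
    return prod(pi if u == 1 else 1 - pi for pi, u in zip(p, U))

p_Y = sum(joint(U) for U in F_inverse_of_1)
print(p_Y)                                                     # 0.4
```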
#### 2.5.2 Example 2

We will define $X_1,X_2$ and $F$ as in the first example.

Let $X_3=X_1\vee X_2$ be the random variable equal to $1$ if $X_1=1$ or $X_2=1$, and $0$ otherwise (so $X_3$ is not independent of $X_1,X_2$, and the formula for independent variables does not apply).

Let $Y=F(X_1,X_2,X_3)$

$$
\begin{align*}
\mathcal{P}(Y=1)&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge X_3=0) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=0) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge X_3=1)+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge X_3=1)\\
&= \mathcal{P}(X_1 =0 \wedge X_2=0\wedge (X_1\vee X_2)=0) +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge (X_1\vee X_2)=0) \\
&\quad +\mathcal{P}(X_1 =0 \wedge X_2=1\wedge (X_1\vee X_2)=1)+\mathcal{P}(X_1 =1 \wedge X_2=1\wedge (X_1\vee X_2)=1)\\
&=\mathcal{P}(X_1 =0 \wedge X_2=0) + 0 +\mathcal{P}(X_1 =0 \wedge X_2=1)+\mathcal{P}(X_1 =1 \wedge X_2=1)\\
&=\bar{p}_1\bar{p}_2+\bar{p}_1p_2+p_1p_2\\
&=\bar{p}_1+p_1p_2\\
&=0.68
\end{align*}
$$

(Given $X_1=0,X_2=0$ the constraint $(X_1\vee X_2)=0$ always holds; given $X_1=0,X_2=1$ the constraint $(X_1\vee X_2)=0$ is impossible; the remaining two constraints always hold.)

So $Y\sim\mathcal{B}(0.68)$

### 2.6 Conditioning a Bernoulli distribution

Let $X$ be a Bernoulli random variable and let $\mathcal{A}$ be an event with $\mathcal{P}(\mathcal{A})>0$.

The random variable $Y$ defined by $\mathcal{P}(Y=k)=\mathcal{P}(X=k\mid \mathcal{A})$ follows the Bernoulli distribution $\mathcal{B}(p)$ with:
$$p=\mathcal{P}(X=1\mid \mathcal{A})$$
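As an illustration of the conditioning rule, here is a minimal sketch in which $X=X_1\sim\mathcal{B}(0.8)$ and the event is $\mathcal{A}=\{X_1+X_2\ge 1\}$ with $X_2\sim\mathcal{B}(0.6)$ independent of $X_1$; both choices are arbitrary.

```python
from itertools import product as cartesian

p1, p2 = 0.8, 0.6   # arbitrary parameters for the illustration

def joint(x1, x2):
    """P(X1 = x1, X2 = x2) under independence."""
    return (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)

# Conditioning event A = {X1 + X2 >= 1}.
p_A = sum(joint(x1, x2) for x1, x2 in cartesian([0, 1], repeat=2) if x1 + x2 >= 1)
p_X1_and_A = sum(joint(x1, x2) for x1, x2 in cartesian([0, 1], repeat=2)
                 if x1 == 1 and x1 + x2 >= 1)

p = p_X1_and_A / p_A   # parameter of the conditioned Bernoulli variable Y
print(p)               # P(X1 = 1 | A) = 0.8 / 0.92 ~ 0.8696
```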
## 3. Binomial Distribution

### 3.1 Definition

A random variable $X$ is said to follow the binomial distribution $\mathcal{B}(n,p)$ with parameters $n\in\mathbb{N}$ and $p\in[0,1]$ if:

$$\exists X_1,\dots,X_n \sim \mathcal{B}(p)\ \text{i.i.d. such that}\quad X=\sum_{i=1}^nX_i$$

### 3.2 Significance

In probability theory and statistics, the binomial distribution with parameters $n$ and $p$ is the discrete probability distribution of the number of successes in a sequence of $n$ independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability $p$) or failure (with probability $q=1-p$).
### 3.3 Probability mass function

Let $S_k$ be the set of subsets of $I=\{1,\dots,n\}$ of size $k$.

The number of such sets is:

$$\lvert S_k \rvert = {n \choose k}$$

With that, the probability mass function is:
$$
\begin{align*}
\forall k\in\{0,\dots,n\},\quad \mathcal{P}(X=k)&= \sum_{A\in S_k}\mathcal{P}\left(\bigwedge_{s\in A} X_s=1 \ \text{and} \ \bigwedge_{s\in I \setminus A} X_s=0 \right) \\
&= \sum_{A\in S_k}\prod_{s\in A}\mathcal{P}(X_s=1) \times \prod_{s\in I\setminus A}\mathcal{P}(X_s=0) \quad \text{thanks to independence} \\
&=\sum_{A\in S_k}\prod_{s\in A}p \times \prod_{s\in I\setminus A}(1-p) \\
&= \sum_{A\in S_k}p^{\lvert A\rvert} \times (1-p)^{n-\lvert A \rvert}\\
&= \sum_{A\in S_k}p^{k} \times (1-p)^{n-k} \\
&= \lvert S_k \rvert\, p^{k} \times (1-p)^{n-k} \\
&= {n \choose k}p^k(1-p)^{n-k}
\end{align*}
$$
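The closed-form mass function can be compared with the empirical distribution of a sum of independent Bernoulli variables, which is exactly how the binomial distribution was defined above; a minimal simulation sketch with arbitrary $n$ and $p$.

```python
import math
import random

n, p, trials = 8, 0.35, 200_000   # arbitrary parameters for the check
random.seed(0)

def binom_pmf(k):
    """P(X = k) = C(n, k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Empirical distribution of X = X_1 + ... + X_n with X_i i.i.d. Bernoulli(p).
counts = [0] * (n + 1)
for _ in range(trials):
    x = sum(1 for _ in range(n) if random.random() < p)
    counts[x] += 1

for k in range(n + 1):
    print(k, round(counts[k] / trials, 4), round(binom_pmf(k), 4))
```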
### 3.4 Moments

#### 3.4.1 Raw Moments

The expected value can be calculated directly from the definition, by linearity of expectation:

$$\boxed{\mathbb{E}[X]=\sum_{k=1}^n\mathbb{E}[X_k]=np}$$

For higher-order moments:
$$\forall m\in\mathbb{N}^*,\quad \mathbb{E}[X^m]= \sum_{k=1}^n{n \choose k}k^mp^k(1-p)^{n-k}$$
#### 3.4.2 Central Moments

The variance can be calculated directly from the definition, using the independence of the $X_k$:

$$\boxed{\mathbb{V}[X]=\sum_{k=1}^n\mathbb{V}[X_k]=np(1-p)}$$

For higher-order central moments:
$$\forall m\in\mathbb{N}^*,\quad \mathbb{E}\left[\left(X-\mathbb{E}[X]\right)^m \right]= \sum_{k=0}^n{n \choose k}(k-np)^mp^k(1-p)^{n-k}$$

(Note that, unlike for the raw moments, the $k=0$ term does not vanish here, so the sum starts at $k=0$.)
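A small numerical check of this sum: for $m=2$ it reproduces $np(1-p)$, and for $m=3$ it matches $np(1-p)(1-2p)$, the standard third central moment of the binomial distribution (a known result, not derived in the text).

```python
import math

n, p = 11, 0.4   # arbitrary parameters for the check
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def central_moment(m):
    """E[(X - np)^m] computed from the probability mass function."""
    return sum(pmf[k] * (k - n * p) ** m for k in range(n + 1))

assert abs(central_moment(1)) < 1e-9                              # first central moment is 0
assert abs(central_moment(2) - n * p * (1 - p)) < 1e-9            # variance np(1-p)
assert abs(central_moment(3) - n * p * (1 - p) * (1 - 2 * p)) < 1e-9
print(central_moment(2), central_moment(3))
```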
## 4. Geometric Distribution

### 4.1 Definition

A random variable $X$ is said to follow the geometric distribution $\mathcal{G}(p)$ if:

$$\exists X_1,X_2,\dots \sim \mathcal{B}(p)\ \text{i.i.d. such that}\quad X=\min\{n\in\mathbb{N}^*\ :\ X_n=1\}$$

### 4.2 Significance

In probability theory and statistics, the geometric distribution is the probability distribution of the number $X$ of Bernoulli trials needed to get one success.
The geometric distribution gives the probability that the first occurrence of success requires $k$ independent trials, each with success probability $p$.
### 4.3 Probability mass function

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathcal{P}(X=n) &= \mathcal{P}\left(\min\{k\in\mathbb{N}^*\ :\ X_k=1\}=n\right) \\
&=\mathcal{P}\left(\bigwedge_{k=1}^{n-1} X_k =0 \ \text{and} \ X_n=1\right) \\
&=\mathcal{P}(X_n=1)\times \prod_{k=1}^{n-1}\mathcal{P}(X_k=0) \\
&=p(1-p)^{n-1}
\end{align*}
$$
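The mass function $p(1-p)^{n-1}$ can be checked by simulating the index of the first success in a stream of Bernoulli trials; a minimal sketch with arbitrary parameters.

```python
import random

p, trials = 0.3, 200_000          # arbitrary parameters for the check
random.seed(1)

def first_success():
    """Index of the first 1 in a stream of i.i.d. Bernoulli(p) trials."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

samples = [first_success() for _ in range(trials)]
for n in range(1, 6):
    empirical = samples.count(n) / trials
    exact = p * (1 - p) ** (n - 1)
    print(n, round(empirical, 4), round(exact, 4))
```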
### 4.4 Moments

#### 4.4.1 Prelude

For $n\in\mathbb{N}$, let $\varphi_n$ be defined on the open interval $(-1,1)$ (where the series converges) by:

$$
\begin{align*}
\varphi_{n}:\ &(-1,1)\rightarrow \mathbb{R}\\
&x\mapsto \sum_{m\in\mathbb{N}}m^nx^m
\end{align*}
$$

This function will be a helper for calculating $\mathbb{E}[X^n]$.
In fact, $\varphi_n$ is differentiable and, for $x\neq 0$:
$$\varphi'_n(x)=\sum_{m\in\mathbb{N}^*}m^{n+1}x^{m-1}=\frac{1}{x}\varphi_{n+1}(x)$$

Which implies:
$$\forall n\in\mathbb{N},\quad \varphi_{n+1}=x\,\varphi'_n$$

And we have the following:
$$\varphi_0=\sum_{m\in\mathbb{N}}x^m=\frac{1}{1-x}$$

#### 4.4.2 Raw Moments

For $p\in(0,1)$:

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathbb{E}[X^n]&=\sum_{m\in\mathbb{N}^*}m^np(1-p)^{m-1} \\
&= \frac{p}{1-p}\sum_{m\in\mathbb{N}^*}m^n(1-p)^{m} \\
&= \frac{p}{1-p}\varphi_n(1-p)
\end{align*}
$$

With that, we can calculate the expected value $\mathbb{E}[X]$ as:
$$
\begin{align*}
\forall x\in(-1,1),\quad \varphi_1(x)&=x\varphi_0'(x)\\
&=\frac{x}{(1-x)^2}\\
\mathbb{E}[X]&=\frac{p}{1-p}\varphi_1(1-p) \\
&=\frac{p}{1-p}\cdot \frac{1-p}{p^2}\\
&=\frac{1}{p}
\end{align*}
$$

#### 4.4.3 Variance

The variance $\mathbb{V}[X]$ can be calculated as:
$$
\begin{align*}
\forall x\in(-1,1),\quad \varphi_2(x)&=x\varphi_1'(x)\\
&=x\cdot \left(\frac{x}{(1-x)^2}\right)'\\
&=x\cdot \frac{(1-x)^2+2(1-x)x}{(1-x)^4}\\
&=x\cdot \frac{1-x+2x}{(1-x)^3}\\
&=\frac{x(x+1)}{(1-x)^3}\\
\mathbb{V}[X]&=\mathbb{E}[X^2]-\mathbb{E}[X]^2\\
&=\frac{p}{1-p}\cdot \varphi_2(1-p)-\frac{1}{p^2}\\
&=\frac{p}{1-p}\cdot \frac{(1-p)(2-p)}{p^3}-\frac{1}{p^2}\\
&=\frac{2-p-1}{p^2}\\
&=\frac{1-p}{p^2}
\end{align*}
$$
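The recursion $\varphi_{n+1}=x\varphi_n'$ can also be carried out symbolically, which reproduces $\mathbb{E}[X]=1/p$ and $\mathbb{V}[X]=(1-p)/p^2$ without differentiating by hand; a small sketch assuming the sympy package is available.

```python
import sympy as sp

x, p = sp.symbols("x p", positive=True)

# phi_0(x) = 1/(1-x), phi_{n+1}(x) = x * phi_n'(x)
phi = [1 / (1 - x)]
for _ in range(2):
    phi.append(sp.simplify(x * sp.diff(phi[-1], x)))

def raw_moment(n):
    """E[X^n] = p/(1-p) * phi_n(1-p) for X ~ Geometric(p)."""
    return sp.simplify(p / (1 - p) * phi[n].subs(x, 1 - p))

mean = raw_moment(1)
variance = sp.simplify(raw_moment(2) - mean**2)
print(mean, variance)   # 1/p and (1 - p)/p**2, up to sympy's formatting
```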
## 5. Negative Binomial Distribution

### 5.1 Definition

A random variable $X$ is said to follow the negative binomial distribution $\mathcal{NB}(r,p)$ with parameters $r\in\mathbb{N}^*$ and $p\in[0,1]$ if:

$$\exists X_1,X_2,\dots \sim \mathcal{B}(p)\ \text{i.i.d. such that}\quad X=\min\left\{n\in\mathbb{N}^*\ :\ \sum_{k=1}^nX_k=r\right\}$$

### 5.2 Significance

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of trials in a sequence of independent and identically distributed Bernoulli trials needed for a specified (non-random) number of successes $r$ to occur.
For example, we can define rolling a $6$ on a die as a success, and rolling any other number as a failure, and ask how many rolls are needed to get the third success $(r=3)$. In such a case, the probability distribution of the number of needed trials will be a negative binomial distribution.
### 5.3 Probability mass function

Let $S_{n,k}$ be the set of subsets of $I_n=\{1,\dots,n\}$ of size $k$.
The event $\{X=n\}$ means that trial $n$ is a success and exactly $r-1$ of the first $n-1$ trials are successes; summing over the possible positions $A\in S_{n-1,r-1}$ of those first $r-1$ successes gives:

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathcal{P}(X=n) &= \mathcal{P}\left(\min\left\{m\in\mathbb{N}^*\ :\ \sum_{k=1}^mX_k=r\right\}=n\right) \\
&= \sum_{A\in S_{n-1,r-1}} \mathcal{P}\left(\bigwedge_{s\in A}X_s=1 \ \text{and} \ X_n=1 \ \text{and} \bigwedge_{s\in I_{n-1}\setminus A}X_s=0\right) \\
&= \sum_{A\in S_{n-1,r-1}} \mathcal{P}(X_n=1)\prod_{s\in A}\mathcal{P}(X_s=1) \cdot \prod_{s\in I_{n-1}\setminus A} \mathcal{P}(X_s=0) \\
&= \sum_{A\in S_{n-1,r-1}} p^{\lvert A \rvert +1} (1-p)^{n-1-\lvert A \rvert} \\
&= \sum_{A\in S_{n-1,r-1}} p^{r} (1-p)^{n-r} \\
&= \lvert S_{n-1,r-1}\rvert \cdot p^{r} (1-p)^{n-r} \\
&= {n-1 \choose r-1}p^r(1-p)^{n-r}
\end{align*}
$$

### 5.4 Moments

#### 5.4.1 Raw Moments

Let $p\in(0,1]$

For $r\in\mathbb{N}^*$, let $X_r\sim \mathcal{NB}(r,p)$

$$
\begin{align*}
\forall n\in\mathbb{N}^*,\quad \mathbb{E}[X_r^n]&=\sum_{m\ge r}{m-1 \choose r-1}m^np^r(1-p)^{m-r}\\
&=\sum_{m\ge r}\frac{(m-1)!}{(m-r)!(r-1)!}m^np^r(1-p)^{m-r}\\
&=\sum_{m\ge r}\frac{m!}{(m-r)!r!}\,r\,m^{n-1}p^r(1-p)^{m-r}\\
&=\sum_{m\ge r}{m \choose r}r\,m^{n-1}p^r(1-p)^{m-r}\\
&=\frac{r}{p}\sum_{m\ge r+1}{m-1\choose r}(m-1)^{n-1}p^{r+1}(1-p)^{m-1-r}\\
&=\frac{r}{p}\sum_{m\ge r+1}{m-1\choose r}\sum_{s=0}^{n-1}{n-1\choose s}(-1)^{n-1-s}m^sp^{r+1}(1-p)^{m-1-r}\\
&=\frac{r}{p}\sum_{s=0}^{n-1}(-1)^{n-1-s}{n-1\choose s}\sum_{m\ge r+1}{m-1\choose r}m^sp^{r+1}(1-p)^{m-1-r}\\
&=\frac{r}{p}\sum_{s=0}^{n-1}(-1)^{n-1-s}{n-1\choose s}\mathbb{E}[X^s_{r+1}]
\end{align*}
$$

(The fifth equality shifts the summation index $m\mapsto m+1$, the sixth expands $(m-1)^{n-1}$ with the binomial theorem, and the last inner sum is exactly the raw moment of $X_{r+1}\sim\mathcal{NB}(r+1,p)$.)

In particular, the expected value is:
$$\boxed{\mathbb{E}[X_r]=\frac{r}{p}\mathbb{E}[X^0_{r+1}]=\frac{r}{p}}$$

#### 5.4.2 Central Moments

We will start with the variance.
$$
\begin{align*}
\mathbb{E}[X_r^2]&=\frac{r}{p}\left(-\mathbb{E}[X_{r+1}^0]+\mathbb{E}[X_{r+1}]\right)\\
&=\frac{r}{p}\cdot \left(\frac{r+1}{p}-1\right)\\
&=\frac{r(r+1-p)}{p^2}\\
\implies \mathbb{V}[X_r]&=\mathbb{E}[X_r^2]-\mathbb{E}[X_r]^2\\
&=\frac{r(r+1-p)}{p^2}-\frac{r^2}{p^2}\\
&=r\,\frac{1-p}{p^2}
\end{align*}
$$
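Truncating the series far enough into the tail gives a quick numerical confirmation of the mass function, the mean $r/p$ and the variance $r(1-p)/p^2$; a minimal sketch with arbitrary $r$ and $p$.

```python
import math

r, p = 3, 0.4                     # arbitrary parameters for the check
N = 400                           # truncation point; the tail beyond it is negligible here

pmf = {n: math.comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r) for n in range(r, N + 1)}

total = sum(pmf.values())                              # should be ~ 1
mean = sum(n * q for n, q in pmf.items())
var = sum((n - mean) ** 2 * q for n, q in pmf.items())

print(round(total, 6))                                 # ~ 1.0
print(round(mean, 6), r / p)                           # ~ 7.5 vs 7.5
print(round(var, 6), r * (1 - p) / p**2)               # ~ 11.25 vs 11.25
```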
### 5.5 Relation to the Geometric Distribution

The geometric distribution is a special case of the negative binomial distribution. In fact:
$$\boxed{\forall p\in [0,1],\quad \mathcal{G}(p)=\mathcal{NB}(1,p)}$$