Using the Beta distribution

fastai
Published

October 18, 2020

We use Beta distributions to estimate whether our coin is fair.

The example comes from the book Data Science from Scratch.

Libraries and helper functions

import math as m

import numpy as np
import altair as alt
import pandas as pd

def B(alpha: float, beta: float) -> float:
    "A normalizing constant so that the total probability is 1"
    return m.gamma(alpha) * m.gamma(beta) / m.gamma(alpha + beta)

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    # Treat the density as zero outside the open interval (0, 1)
    if x <= 0 or x >= 1:
        return 0

    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B(alpha, beta)
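
As a quick sanity check of the normalizing constant, the density should integrate to roughly 1 over the interval from 0 to 1. The following is a minimal sketch, not part of the book's code: `area_under_pdf` is a hypothetical helper that approximates the integral with a midpoint Riemann sum, and the definitions of `B` and `beta_pdf` are repeated so the block runs on its own.

```python
import math as m

def B(alpha: float, beta: float) -> float:
    """Normalizing constant of the Beta density."""
    return m.gamma(alpha) * m.gamma(beta) / m.gamma(alpha + beta)

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    if x <= 0 or x >= 1:
        return 0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B(alpha, beta)

def area_under_pdf(alpha: float, beta: float, steps: int = 10_000) -> float:
    """Hypothetical helper: midpoint Riemann sum of beta_pdf over (0, 1)."""
    width = 1 / steps
    return sum(beta_pdf((i + 0.5) * width, alpha, beta) for i in range(steps)) * width

# Each area should be very close to 1, whatever the parameters are
for alpha, beta in [(1, 1), (4, 8), (20, 20)]:
    print(alpha, beta, round(area_under_pdf(alpha, beta), 4))
```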

Example 1

  1. We do not want to make assumptions beforehand, so we choose both \(\alpha\) and \(\beta\) to be 1, which gives the uniform prior \(B(1, 1)\)
  2. We flip the coin 10 times and get 3 heads
  3. With 3 heads and 7 tails, our posterior distribution becomes \(B(4, 8)\), centered around 0.33
  4. Because the prior treated every probability as equally likely, our best guess stays close to the observed proportion of heads
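
The step from the prior to the posterior in this example follows the usual Beta update for coin flips: add the number of heads to \(\alpha\) and the number of tails to \(\beta\). A minimal sketch, where `update` and `beta_mean` are hypothetical helper names chosen for illustration and "centered around" is read as the mean \(\alpha / (\alpha + \beta)\):

```python
from typing import Tuple

def update(alpha: float, beta: float, heads: int, tails: int) -> Tuple[float, float]:
    """Hypothetical helper: posterior parameters after observing the flips."""
    return alpha + heads, beta + tails

def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta distribution with the given parameters."""
    return alpha / (alpha + beta)

# Uniform prior, then 3 heads and 7 tails out of 10 flips
posterior = update(1, 1, heads=3, tails=7)
print(posterior)                        # (4, 8)
print(round(beta_mean(*posterior), 2))  # 0.33
```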

Example 2

  1. We have a strong assumption that the coin is fair, so we choose \(B(20, 20)\) as the prior
  2. Again, we got 3 heads out of 10
  3. Our posterior distribution is \(B(23, 27)\) centered around 0.46
  4. It suggests that the coin is slightly biased toward tails
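
One way to put a number on "slightly biased toward tails" is to ask how much of the posterior's probability mass lies below 0.5. The sketch below is an illustration rather than part of the original analysis: `prob_below` is a hypothetical helper that integrates the density numerically with a midpoint Riemann sum, and `B` and `beta_pdf` are repeated so the block runs on its own.

```python
import math as m

def B(alpha: float, beta: float) -> float:
    """Normalizing constant of the Beta density."""
    return m.gamma(alpha) * m.gamma(beta) / m.gamma(alpha + beta)

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    if x <= 0 or x >= 1:
        return 0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B(alpha, beta)

def prob_below(threshold: float, alpha: float, beta: float, steps: int = 10_000) -> float:
    """Hypothetical helper: approximate P(p < threshold) by a midpoint Riemann sum."""
    width = threshold / steps
    return sum(beta_pdf((i + 0.5) * width, alpha, beta) for i in range(steps)) * width

# Posterior from Example 2: how likely is it that heads come up less than half the time?
p_tails_biased = prob_below(0.5, 23, 27)
print(f"P(p < 0.5) under B(23, 27): {p_tails_biased:.2f}")
```

Under these assumptions the probability comes out at roughly 0.7, which fits the wording "slightly": the posterior leans toward tails but still leaves plenty of room for a fair coin.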

Example 3

  1. We believe that the coin is biased and lands heads 75% of the time, so we choose \(B(30, 10)\)
  2. Again, we got 3 heads out of 10
  3. Our posterior distribution is \(B(33, 17)\) centered around 0.66
  4. It suggests that the coin is biased toward heads, although less strongly than we believed
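
The three examples differ only in how strongly the prior pulls against the same 10 flips. One way to see this is to write the posterior mean as a weighted average of the prior mean and the observed proportion of heads, where the prior counts as \(\alpha + \beta\) imaginary flips. A minimal sketch, with `posterior_mean_as_blend` as a hypothetical helper name:

```python
from typing import Tuple

def posterior_mean_as_blend(alpha: float, beta: float, heads: int, flips: int) -> Tuple[float, float]:
    """Hypothetical helper: return (weight of the prior, posterior mean)."""
    prior_strength = alpha + beta  # the prior acts like this many earlier flips
    prior_weight = prior_strength / (prior_strength + flips)
    prior_mean = alpha / prior_strength
    observed = heads / flips
    blended = prior_weight * prior_mean + (1 - prior_weight) * observed
    return prior_weight, blended

# The same data (3 heads out of 10) combined with the three priors from the examples
for alpha, beta in [(1, 1), (20, 20), (30, 10)]:
    weight, mean = posterior_mean_as_blend(alpha, beta, heads=3, flips=10)
    print(f"prior B({alpha}, {beta}): prior weight {weight:.2f}, posterior mean {mean:.2f}")
```

With the uniform prior the data dominates, while the two strong priors each carry most of the weight, which is why 10 flips move them only part of the way toward the observed 0.3.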

# Priors and posteriors from the three examples
Beta_combinations = [(1, 1), (4, 8), (20, 20), (23, 27), (30, 10), (33, 17)]

# Build one DataFrame per (alpha, beta) pair and concatenate them once at the end
frames = []
for alpha, beta in Beta_combinations:
    df_B = pd.DataFrame({'x': np.arange(0.01, 1, .01)})
    df_B['y'] = df_B['x'].apply(lambda x: beta_pdf(x, alpha, beta))
    df_B['Beta'] = f'({alpha}, {beta})'
    frames.append(df_B)

df = pd.concat(frames, ignore_index=True)

alt.Chart(df).mark_line().encode(
    alt.X('x:Q'), alt.Y('y:Q'), alt.Color('Beta'), tooltip=['x', 'y', 'Beta'], strokeDash='Beta'
).properties(width=600, title='Beta distributions')