The sampling distribution, with probability function \(p(y|\theta)\), gives the probability of observable data y given unobserved (and typically unobservable) model parameters \(\theta\).

(The sampling distribution probability function \(p(y|\theta)\) is called the likelihood function when viewed as a function of \(\theta\) for fixed y.)
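For instance (a minimal sketch with an illustrative binomial model, not one specified above), the same R function plays both roles:

# Sampling distribution: p(y | theta) over possible y for fixed theta = 0.3.
dbinom(0:10, size = 10, prob = 0.3)

# Likelihood: the same function with y = 4 fixed and theta varying.
sapply(c(0.2, 0.4, 0.6), function(theta) dbinom(4, size = 10, prob = theta))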

The prior distribution, with probability function \(p(\theta)\), models the probability of the parameters \(\theta\).

The full joint distribution over parameters and data has a probability function given by the chain rule,

\(p(y,\theta) = p(y|\theta) \times p(\theta)\)
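As a concrete sketch, assuming an illustrative binomial sampling distribution with a Beta(2, 2) prior (neither choice is fixed by the text):

# Joint probability function p(y, theta) = p(y | theta) * p(theta).
joint <- function(y, theta, n = 10) {
  dbinom(y, size = n, prob = theta) * dbeta(theta, 2, 2)
}
joint(y = 4, theta = 0.4)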

The posterior distribution gives the probability of parameters \(\theta\) given the observed data y. The posterior probability function \(p(\theta|y)\) is derived from the sampling and prior distributions via Bayes’s rule,

\(p(\theta|y)\)

\({} = \frac{\displaystyle p(y,\theta)}{\displaystyle p(y)}\) by the definition of conditional probability,

\({} = \frac{\displaystyle p(y|\theta) \times p(\theta)}{\displaystyle p(y)}\) by the chain rule,

\({} = \frac{\displaystyle p(y|\theta) \times p(\theta)}{\displaystyle \int_{\Theta} p(y,\theta') \, d\theta'}\) by the law of total probability,

\({} = \frac{\displaystyle p(y|\theta) \times p(\theta)}{\displaystyle \int_{\Theta} p(y|\theta') \times p(\theta') \, d\theta'}\) by the chain rule, and

\({} \propto p(y|\theta) \times p(\theta)\) because y is constant.
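The denominator \(p(y)\) is a constant in \(\theta\) and can be approximated numerically. A minimal sketch, continuing the illustrative binomial/Beta(2, 2) model, that evaluates the posterior on a grid and checks it against the known conjugate answer:

# Grid approximation of p(theta | y) for y = 4 successes in n = 10 trials.
y <- 4; n <- 10
theta_grid <- seq(0, 1, length.out = 1000)
unnorm <- dbinom(y, n, theta_grid) * dbeta(theta_grid, 2, 2)  # p(y|theta) p(theta)
p_y <- sum(unnorm) * (theta_grid[2] - theta_grid[1])          # approximates the integral
posterior <- unnorm / p_y
# Should be near zero: agreement with the exact Beta(y + 2, n - y + 2) posterior.
max(abs(posterior - dbeta(theta_grid, y + 2, n - y + 2)))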

The posterior predictive distribution \(p(\tilde{y}|y)\) for new data \(\tilde{y}\) given observed data y is the average of the sampling distribution over the parameters, with each parameter value weighted in proportion to its posterior probability,

\(p(\tilde{y}|y) = \int_{\Theta} \, p(\tilde{y}|\theta) \times p(\theta|y) \, d\theta\)
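In practice this integral is usually approximated by simulation: draw \(\theta^{(i)}\) from the posterior, then draw \(\tilde{y}^{(i)}\) from the sampling distribution given \(\theta^{(i)}\). A sketch using the illustrative conjugate binomial/Beta(2, 2) model from above:

# Monte Carlo approximation of the posterior predictive p(y_tilde | y).
y <- 4; n <- 10; n_sims <- 100000
theta_draws <- rbeta(n_sims, y + 2, n - y + 2)           # draws from p(theta | y)
y_tilde <- rbinom(n_sims, size = n, prob = theta_draws)  # draws from p(y_tilde | theta)
table(y_tilde) / n_sims  # approximate posterior predictive probabilities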

The key feature is that predictive inference incorporates the uncertainty in the posterior parameter estimate. In particular, the posterior predictive distribution is an overdispersed variant of the sampling distribution; the extra dispersion arises from integrating over the posterior.
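Continuing the simulation sketch above (same illustrative model and draws), the overdispersion can be checked directly: the posterior predictive variance exceeds the sampling variance at the plugged-in posterior mean.

var(y_tilde)                     # posterior predictive variance, about 3.9
theta_hat <- mean(theta_draws)   # posterior mean of theta
n * theta_hat * (1 - theta_hat)  # plug-in binomial variance, about 2.4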

## Example

Bayes’s rule applied to a diagnostic test: a disease with 0.5% prevalence and a test with 99% sensitivity and 99% specificity. Even after a positive test, the probability of disease is only about 33%.

p_H1 <- 0.005    # prior: has disease
p_H2 <- 0.995    # prior: no disease

p_D1_H1 <- 0.99  # test positive given disease (sensitivity)
p_D2_H1 <- 0.01  # test negative given disease

p_D1_H2 <- 0.01  # test positive given no disease
p_D2_H2 <- 0.99  # test negative given no disease (specificity)

# Posterior probability of disease given a positive test, by Bayes’s rule.
p_H1_D1 <- p_H1 * p_D1_H1 / (p_H1 * p_D1_H1 + p_H2 * p_D1_H2)
p_H1_D1
## [1] 0.3322148

# The same calculation with the numbers plugged in directly.
0.99 * 0.005 / (0.99 * 0.005 + 0.01 * 0.995)
## [1] 0.3322148

## Example: Fisher Exact Comparison without JAGS

A Bayesian version of the two-proportion comparison behind Fisher’s exact test, done by direct simulation rather than with JAGS: under uniform Beta(1, 1) priors, each proportion has a Beta(y + 1, (n - y) + 1) posterior, which is easy to sample.

# DATA
n1 <- 52  # men
y1 <- 9   # left-handed men
n2 <- 48  # women
y2 <- 4   # left-handed women

# SIMULATION
n_sims <- 10000  # number of posterior draws
theta1 <- rbeta(n_sims, y1 + 1, (n1 - y1) + 1)  # posterior for men
theta2 <- rbeta(n_sims, y2 + 1, (n2 - y2) + 1)  # posterior for women
theta_diff <- theta1 - theta2                   # simulated differences

# OUTPUT
quantiles <- quantile(theta_diff, c(0.005, 0.025, 0.5, 0.975, 0.995))
round(quantiles, digits = 2)
##  0.5%  2.5%   50% 97.5% 99.5% 
## -0.09 -0.04  0.09  0.22  0.27
print("Probability higher % men than women lefties:")
## [1] "Probability higher % men than women lefties:"
print(mean(theta1>theta2))
## [1] 0.9029
print("Probability higher % men than women lefties:")
## [1] "Probability higher % men than women lefties:"
print(mean(theta1>theta2))
## [1] 0.9029
plot(density(theta_diff),
     xlab = "theta1 - theta2",
     ylab = "p(theta1 - theta2 | y, n)",
     main = "Posterior Simulation of Male - Female Lefties",
     ylim = c(0, 8),
     frame.plot = FALSE, cex.lab = 1.5, lwd = 3, yaxt = "n")
abline(v = quantiles[2], col = "blue")  # 2.5% posterior quantile
abline(v = quantiles[4], col = "blue")  # 97.5% posterior quantile