
Bayesian Parameter Estimation

Theoretical Context

In the Bayesian method of parameter estimation, the unknown parameter $\theta$ is treated as a random variable. The parameter $\theta$ is assigned a prior distribution $f_\Theta(\theta)$, which encodes what we know about the parameter before observing the data. For a given value $\Theta=\theta$, the data have a probability density function $f_{X|\Theta}(x|\theta)$. In this notation, read $X|\Theta$ as (the probability density of) $X$ given $\Theta$. The joint distribution of $X$ and $\Theta$ can then be written as:

\begin{equation*}
f_{X,\Theta}(x,\theta)=f_{X|\Theta}(x|\theta)\;f_\Theta(\theta).
\end{equation*}

Joint Distribution: the probability density function $f_{A,B}(a,b)$ is defined such that $f_{A,B}(a,b)\;da\;db$ gives the probability that $a\leq A\leq a+da \wedge b\leq B\leq b+db$.

The marginal distribution of the data is obtained by integrating the joint distribution over the parameter space:

\begin{equation*}
f_X(x)=\int d\theta\;f_{X,\Theta}(x,\theta)=\int d\theta\;f_{X|\Theta}(x|\theta)\;f_\Theta(\theta).
\end{equation*}

However, note that we can also write the joint distribution as $f_{X,\Theta}(x,\theta)=f_{\Theta|X}(\theta|x)\;f_X(x)$, so that the distribution of $\Theta$ given the data $X$ is given by:

\begin{equation*}
f_{\Theta|X}(\theta|x)=\frac{f_{X,\Theta}(x,\theta)}{f_X(x)}=\frac{f_{X|\Theta}(x|\theta)\;f_\Theta(\theta)}{\int d\theta\;f_{X|\Theta}(x|\theta)\;f_\Theta(\theta)}.
\end{equation*}

This is also called the posterior distribution, representing what we know about $\Theta$ given the data $X$. The denominator serves as a normalization constant; the information is contained in the numerator:

\begin{equation*}
\mathrm{posterior\;density}\propto\mathrm{likelihood}\times\mathrm{prior\;density}.
\end{equation*}

To construct the Bayesian estimate $\hat{\theta}_{\mathrm{Bay}}$, we can extract either the average or the most probable value from the posterior distribution: which of the two we use is a matter of choice. As a measure of the error in the estimate, we can take the standard deviation of the posterior distribution.
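As a sketch of this recipe, the posterior can be approximated by discretizing the parameter on a grid: multiply the likelihood by the prior pointwise, normalize, and read off the mean and standard deviation. The Gaussian toy data, the flat prior, and the grid bounds below are hypothetical choices, used only for illustration:

```python
import math

def grid_posterior(data, log_likelihood, log_prior, grid):
    """Approximate the posterior on a grid of parameter values.

    posterior(theta) is proportional to likelihood(data | theta) * prior(theta);
    the grid sum plays the role of the normalization integral.
    """
    # Work in log space to avoid underflow for larger samples.
    logs = [log_likelihood(data, t) + log_prior(t) for t in grid]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    z = sum(weights)
    post = [w / z for w in weights]                # normalized posterior
    mean = sum(t * p for t, p in zip(grid, post))  # posterior mean
    var = sum((t - mean) ** 2 * p for t, p in zip(grid, post))
    return post, mean, math.sqrt(var)

# Hypothetical example: Gaussian data with known unit variance and
# unknown mean theta, with a flat prior on [-5, 5].
data = [0.8, 1.1, 0.9, 1.3, 1.0]

def log_lik(xs, theta):
    return sum(-0.5 * (x - theta) ** 2 for x in xs)

grid = [-5 + 10 * i / 2000 for i in range(2001)]
post, mean, sd = grid_posterior(data, log_lik, lambda t: 0.0, grid)
```

Because the flat prior contributes only a constant, it drops out of the normalized posterior; here the posterior mean lands on the sample average and the standard deviation on $1/\sqrt{n}$, as expected for Gaussian data with known unit variance.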


Example

Let us consider the exponential distribution again, so that $f_{X_i|\Lambda}(x_i|\lambda)=\lambda e^{-\lambda x_i}$. The conditional distribution $f_{X|\Lambda}(x|\lambda)$ of the full sample is then given by:

\begin{equation*}
f_{X|\Lambda}(x|\lambda)=\prod_{i=1}^n\lambda e^{-\lambda x_i}=(\lambda e^{-\lambda\bar{x}_n})^n,
\end{equation*}

where $\bar{x}_n$ is the sample average for the sample of size $n$.
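Since $\sum_{i=1}^n x_i=n\bar{x}_n$, the product of the $n$ densities collapses into the expression above. A quick numerical check, with a made-up sample and rate:

```python
import math

# Hypothetical sample and rate, used only to check the algebra above.
sample = [0.5, 1.2, 0.3, 2.0, 0.7]
lam = 1.5
n = len(sample)
xbar = sum(sample) / n

# Product of the individual exponential densities ...
product_form = math.prod(lam * math.exp(-lam * x) for x in sample)
# ... equals the collapsed form (lam * exp(-lam * xbar))**n.
collapsed_form = (lam * math.exp(-lam * xbar)) ** n
```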

There is no formal prescription for what the prior distribution should look like. Usually, we take a conservative approach that is as unbiased as possible, so that the data can speak for themselves. To this end, we take a flat prior, that is, a prior that is constant on an interval in which we expect the true parameter to lie. Assuming the parameter falls inside the interval $\lambda\in[\lambda_\mathrm{min},\lambda_\mathrm{max}]$, we get:

\begin{equation*}
f_\Lambda(\lambda)=\frac{1}{\lambda_\mathrm{max}-\lambda_\mathrm{min}},\quad\quad\quad \lambda_\mathrm{min}\leq\lambda\leq\lambda_\mathrm{max},
\end{equation*}

and zero otherwise. Of course, choosing the interval to be very narrow makes even a flat prior biased. We will take the most conservative approach: we only assume that the parameter is positive and take the interval $\lambda\in[0,\infty)$. Strictly speaking, a flat prior on an infinite interval is improper (it cannot be normalized), but the resulting posterior is still a proper distribution. The posterior is then:

\begin{equation*}
f_{\Lambda|X}(\lambda|x)=\frac{(\lambda e^{-\lambda \bar{x}_n})^n}{I(x)},\quad\quad\quad I(x)\equiv \int_{0}^{\infty}d\lambda \; (\lambda e^{-\lambda \bar{x}_n})^n,
\end{equation*}

where I(x)I(x) is a normalization factor depending on the data. Notice that since the prior is flat, it cancelled out of the posterior: the only traces it left are the integration bounds on the normalization integral. The normalization integral can be written in terms of the gamma function as:

\begin{equation*}
I(x)=\frac{1}{(\bar{x}_nn)^{n+1}}\int_0^\infty dy\; y^n e^{-y} = \frac{\Gamma(n+1)}{(\bar{x}_nn)^{n+1}}.
\end{equation*}
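The substitution used here is $y=n\bar{x}_n\lambda$. The closed form can be sanity-checked against a direct numerical integration; the values of $n$ and $\bar{x}_n$ below are made up, and the infinite upper limit is truncated where the integrand has become negligible:

```python
import math

# Hypothetical sample size and sample mean.
n, xbar = 5, 0.8

# Closed form: I(x) = Gamma(n+1) / (n * xbar)**(n+1).
closed_form = math.gamma(n + 1) / (n * xbar) ** (n + 1)

def integrand(lam):
    return (lam * math.exp(-lam * xbar)) ** n

# Composite trapezoid rule on [0, 50]; the integrand peaks at
# lam = 1/xbar and is vanishingly small long before lam = 50.
upper, steps = 50.0, 200_000
h = upper / steps
numeric = h * (0.5 * (integrand(0.0) + integrand(upper))
               + sum(integrand(i * h) for i in range(1, steps)))
```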

Therefore, we find the posterior to be:

\begin{equation*}
f_{\Lambda|X}(\lambda|x)=\frac{(\bar{x}_nn)^{n+1}}{\Gamma(n+1)}(\lambda e^{-\lambda \bar{x}_n})^n,
\end{equation*}

which can be recognized as a Gamma distribution $f(\lambda;\alpha,\beta)=\beta^\alpha\lambda^{\alpha-1} e^{-\beta\lambda}/\Gamma(\alpha)$ with parameters $\alpha=n+1$ and $\beta=n\bar{x}_n$. From the posterior, we take the expectation value as the estimate and the standard deviation as the error in the estimate:

\begin{equation*}
\hat{\lambda}_{\mathrm{Bay}}=\mathbb{E}[\lambda]=\frac{\alpha}{\beta}=\frac{n+1}{n\bar{x}_n},\quad\quad\quad\sigma^2_{\hat{\lambda}}=\mathrm{Var}[\lambda]=\frac{\alpha}{\beta^2}=\frac{n+1}{n^2\bar{x}_n^2}.
\end{equation*}
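A quick simulation illustrates these formulas; the true rate, sample size, and seed below are arbitrary choices:

```python
import math
import random

# Hypothetical setup: exponential sample with true rate 2.0.
random.seed(0)
true_rate, n = 2.0, 1000
sample = [random.expovariate(true_rate) for _ in range(n)]
xbar = sum(sample) / n

lam_bayes = (n + 1) / (n * xbar)             # posterior mean
sigma_bayes = math.sqrt(n + 1) / (n * xbar)  # posterior standard deviation
lam_ml = 1 / xbar                            # maximum-likelihood estimate

# The two estimates differ only by the factor (n+1)/n, which tends to 1
# as the sample size grows.
```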

In the limit of large sample size $n\to\infty$, the estimator becomes $\hat{\lambda}_{\mathrm{Bay}}=1/\bar{x}_n$, equivalent to the Method of Moments and Maximum Likelihood estimators; the same holds for the error estimates. Below, we plot the posterior obtained for a sample of size $n=1000$, together with the estimate and its error.

Figure: posterior distribution for the estimation of the parameter $\lambda$.
