Theoretical Context
In data analysis, we have a data sample drawn from a distribution with unknown parameters, and we wish to estimate these parameters from the sample. A well-known method for doing this is the so-called method of moments.
Let the distribution of a random variable X be given by f_X(x|\theta_i), where \theta_i are the parameters that specify the distribution. The k-th moment of the distribution is given by:
\begin{equation*} \mu_k=\mathbb{E}[X^k]=\int dx\;x^k\;f_X(x|\theta_i). \end{equation*}
If \{X_1,X_2,\dots, X_n\} are independent, identically distributed random variables drawn from this distribution, the moments of the distribution may be estimated by the sample moments \hat{\mu}_k:
\begin{equation*} \hat{\mu}_k=\frac{1}{n}\sum_{i=1}^n X_i^k. \end{equation*}
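As a brief numerical aside, the sample moments can be computed directly from the data. The snippet below is a minimal sketch using NumPy; the data, the underlying distribution, and the helper name sample_moment are illustrative choices, not part of the text above.
\begin{verbatim}
import numpy as np

def sample_moment(x, k):
    # k-th sample moment: (1/n) * sum_i x_i^k
    x = np.asarray(x, dtype=float)
    return np.mean(x**k)

# Illustrative data; the exponential distribution and its scale are arbitrary choices
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)

mu1_hat = sample_moment(x, 1)   # estimates E[X]
mu2_hat = sample_moment(x, 2)   # estimates E[X^2]
print(mu1_hat, mu2_hat)
\end{verbatim}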
The method of moments estimates the parameters \theta_i by finding expressions for them in terms of the (lowest-order) moments and substituting the sample moments to obtain the estimated parameters. In concrete terms, the procedure consists of the following steps (a code sketch of the full procedure follows the list):
- Calculate the lowest-order moments in terms of the parameters \theta_i. Typically, the number of moments required is equal to the number of parameters. That is, for k parameters, find:
\begin{equation*} \mu_k=g_k(\theta_1,\dots,\theta_k). \end{equation*}
- Invert these expressions to find the \theta_i in terms of the moments:
\begin{equation*} \theta_i = h_i(\mu_1,\dots,\mu_k). \end{equation*}
- Find the estimates by inserting the sample moments:
\begin{equation*} \hat{\theta}_i=h_i(\hat{\mu}_1,\dots,\hat{\mu}_k). \end{equation*}
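To make the three steps concrete in code, here is a minimal sketch for a normal distribution with parameters \mu and \sigma^2 (a hypothetical example chosen because the inversion in step 2 is simple; the function name and the simulated data are illustrative only). For the normal distribution, \mu_1=\mu and \mu_2=\sigma^2+\mu^2, so \mu=\mu_1 and \sigma^2=\mu_2-\mu_1^2.
\begin{verbatim}
import numpy as np

def method_of_moments_normal(x):
    # Step 1: mu_1 = mu,  mu_2 = sigma^2 + mu^2   (moments in terms of parameters)
    # Step 2: mu = mu_1,  sigma^2 = mu_2 - mu_1^2 (parameters in terms of moments)
    # Step 3: substitute the sample moments
    x = np.asarray(x, dtype=float)
    m1 = np.mean(x)        # first sample moment
    m2 = np.mean(x**2)     # second sample moment
    mu_hat = m1
    sigma2_hat = m2 - m1**2
    return mu_hat, sigma2_hat

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=10_000)
print(method_of_moments_normal(x))   # close to (3.0, 2.25)
\end{verbatim}
Note that the method of moments estimate of \sigma^2 obtained this way coincides with the (biased) sample variance.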
Sampling Distribution
Note that there is some intrinsic uncertainty in the method of moments estimate, since it relies on the sample moments. Hence, if we repeat the experiment (draw a different sample from the distribution), we will obtain slightly different sample moments and, consequently, slightly different parameter estimates.
Consider a single parameter \theta with estimate \hat{\theta}. To quantify the uncertainty in the estimate, we consider its sampling distribution: the distribution of the estimates that would be obtained from many repeated samples. The variance of the sampling distribution provides a measure of the error in the estimate. In some cases, the sampling distribution can be derived analytically; otherwise we have to resort to approximate methods.
Example
Consider the first moment of the exponential distribution:
\begin{equation*} \mu_1=\mathbb{E}[X]=\int_0^\infty dx\;x\;f_X(x|\beta)=\int_0^\infty dx\;\frac{x}{\beta}\; e^{-x/\beta}=\beta, \end{equation*}
whose variance is \sigma^2_X=\beta^2. The method of moments estimate is then \hat{\beta}=\hat{\mu}_1=\bar{X}_n. Since \hat{\beta}=\bar{X}_n, the sampling distribution of \hat{\beta} is simply the distribution of the sample mean. For sufficiently large sample sizes n, the Central Limit Theorem dictates that \bar{X}_n is approximately normally distributed around the true value \beta with variance \sigma^2_{\hat{\beta}}=\sigma_X^2/n=\beta^2/n. In other words, the pdf of the sampling distribution is given by:
\begin{equation*} f(\hat{\beta}|\beta,n) = f_\mathrm{N}(\beta,\beta^2/n), \end{equation*}
where f_\mathrm{N}(\mu,\sigma^2) denotes a normal distribution with mean \mu and variance \sigma^2. The result \sigma_{\hat{\beta}}=\beta/\sqrt{n} is of little practical use on its own, since the true parameter is unknown. We can, however, approximate the standard error of the estimate by replacing the true parameter with the estimate itself. This yields the estimated standard error s_{\hat{\beta}}=\hat{\beta}/\sqrt{n}.
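This can be checked with a small simulation. The sketch below (with a true \beta, sample size n, and number of repetitions chosen arbitrarily for illustration) draws many samples, computes \hat{\beta}=\bar{X}_n for each, and compares the spread of the estimates to \beta/\sqrt{n}.
\begin{verbatim}
import numpy as np

beta, n, n_experiments = 2.0, 100, 10_000   # illustrative choices
rng = np.random.default_rng(2)

# Repeat the "experiment" many times: each row is one sample of size n
samples = rng.exponential(scale=beta, size=(n_experiments, n))
beta_hat = samples.mean(axis=1)             # method of moments estimate per sample

print(beta_hat.mean())          # close to the true beta
print(beta_hat.std())           # close to beta / sqrt(n) = 0.2
print(beta / np.sqrt(n))        # theoretical standard error
\end{verbatim}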
Consistent Estimators
Casting the above example in more general terms, we can (typically) write the standard error of the estimator and the estimated standard error as:
\begin{equation*} \sigma_{\hat{\theta}}=\sigma(\theta)/\sqrt{n},\quad\quad\quad s_{\hat{\theta}}=\sigma(\hat{\theta})/\sqrt{n}. \end{equation*}
The assumption that s_{\hat{\theta}}\simeq \sigma_{\hat{\theta}} is only valid for consistent estimators. Formally, for a consistent estimator \hat{\theta} we have:
\begin{equation*} \lim_{n\to\infty}\frac{s_{\hat{\theta}}}{\sigma_{\hat{\theta}}}=1, \end{equation*}
provided that \sigma(\theta) is a continuous function.
Consistent Estimator: Let \hat{\theta}(n) be an estimate of the parameter \theta, depending on the sample size n. Then \hat{\theta}(n) is consistent if it converges in probability to \theta, i.e. \lim_{n\to\infty} \mathbb{P}(|\hat{\theta}(n)-\theta|>\epsilon)=0\;\forall \epsilon>0.
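As a hedged numerical illustration of this definition, consider again the exponential example: one can estimate \mathbb{P}(|\hat{\beta}-\beta|>\epsilon) by repeated sampling and observe that it shrinks as n grows (the values of \beta, \epsilon, and the sample sizes below are arbitrary choices).
\begin{verbatim}
import numpy as np

beta, eps, n_experiments = 2.0, 0.1, 1_000   # illustrative choices
rng = np.random.default_rng(4)

for n in (10, 100, 1_000, 10_000):
    samples = rng.exponential(scale=beta, size=(n_experiments, n))
    beta_hat = samples.mean(axis=1)                  # one estimate per repeated sample
    prob = np.mean(np.abs(beta_hat - beta) > eps)    # empirical P(|beta_hat - beta| > eps)
    print(n, prob)                                   # shrinks towards 0 as n grows
\end{verbatim}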
Now, we argue that the method of moments estimates are consistent. The law of large numbers implies that the sample moments \hat{\mu}_k converge in probability to the true moments \mu_k. If the functions h_i are continuous, the estimates therefore converge in probability to the true parameters as the sample moments converge to the true moments. That is, the estimates are consistent and, provided that \sigma(\theta) is continuous, we have:
\begin{equation*} \lim_{n\to\infty} \sigma(\hat{\theta})=\sigma(\theta)\quad\quad\to\quad\quad \lim_{n\to \infty}s_{\hat{\theta}}=\sigma_{\hat{\theta}}. \end{equation*}
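A minimal sketch of this convergence, once more using the exponential example with an arbitrarily chosen \beta: as n grows, \hat{\beta} approaches \beta and the ratio s_{\hat{\beta}}/\sigma_{\hat{\beta}} approaches one.
\begin{verbatim}
import numpy as np

beta = 2.0                                   # illustrative true parameter
rng = np.random.default_rng(3)

for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.exponential(scale=beta, size=n)
    beta_hat = x.mean()                      # method of moments estimate of beta
    s = beta_hat / np.sqrt(n)                # estimated standard error
    sigma = beta / np.sqrt(n)                # true standard error
    print(n, beta_hat, s / sigma)            # beta_hat -> beta and s/sigma -> 1
\end{verbatim}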