In nonparametric statistics, density estimation and regression are two classical problems. Interestingly, the value of the underlying density affects the variance of a nonparametric estimator in opposite directions in the two problems.

Density Estimation

We first consider the density estimation problem. For simplicity, we consider the \(d=1\) (univariate) case. Let \(X_1,\cdots,X_n\) be a random sample from an unknown density \(p\). The goal of density estimation is to use this random sample to estimate the underlying density function \(p\).

Here we consider the kernel density estimator (KDE): \[ \widehat{p}_n(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x-X_i}{h}\right), \] where \(K\) is a smoothing kernel function such as a Gaussian and \(h\) is the smoothing bandwidth which controls the amount of smoothing.
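
To make the formula concrete, here is a minimal NumPy sketch of the KDE. The function name `kde`, the Gaussian kernel, and the sample and bandwidth choices are illustrative assumptions, not prescriptions from the text above:

```python
import numpy as np

def kde(x, data, h):
    """Kernel density estimate at point(s) x with a Gaussian kernel K."""
    x = np.atleast_1d(x)
    u = (x[:, None] - data[None, :]) / h           # (x - X_i) / h for every pair
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    return K.mean(axis=1) / h                      # (1 / (n h)) * sum_i K(...)

rng = np.random.default_rng(0)
sample = rng.normal(size=500)        # n = 500 draws from N(0, 1)
grid = np.linspace(-3, 3, 7)
print(kde(grid, sample, h=0.3))      # roughly the N(0, 1) density on the grid
```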

Now we consider the variance of the KDE. It is well-known that the variance of the KDE is \[ {\sf Var}\left(\widehat{p}_n(x)\right) = \frac{p(x)\int K^2(u)du}{nh} + o\left(\frac{1}{nh}\right). \] Thus, a higher density implies a higher variance.
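
We can check this formula by a quick Monte Carlo experiment. This is a sketch under illustrative assumptions (evaluation points, bandwidth, and repetition count are arbitrary); for the Gaussian kernel, \(\int K^2(u)du = 1/(2\sqrt{\pi})\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, reps = 500, 0.3, 2000
points = np.array([0.0, 2.0])    # high-density (x=0) vs low-density (x=2) under N(0, 1)
est = np.empty((reps, 2))
for r in range(reps):
    data = rng.normal(size=n)
    u = (points[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    est[r] = K.mean(axis=1) / h

p = np.exp(-0.5 * points**2) / np.sqrt(2 * np.pi)  # true density p(x)
print(est.var(axis=0))                   # empirical variance of the KDE
print(p / (2 * np.sqrt(np.pi) * n * h))  # theory: p(x) * int K^2 / (n h)
```

Up to the \(o(1/(nh))\) term, the empirical variance at \(x=0\) should be roughly seven times the one at \(x=2\), matching the ratio \(p(0)/p(2)\).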

Regression

Now we turn to the regression problem. A general setting for regression is that we observe IID pairs of random variables \[ (X_1,Y_1),\cdots,(X_n,Y_n), \] and the conditional mean \(\mathbb{E}(Y_i|X_i=x) = m(x)\) is called the regression function. The goal of regression analysis is to estimate the function \(m(x)\).

Here we consider the kernel regression (also known as Nadaraya-Watson regression), which uses the estimator \[ \widehat{m}_n(x) = \frac{\sum_{i=1}^n Y_i K\left(\frac{x-X_i}{h}\right)}{\sum_{i=1}^nK\left(\frac{x-X_i}{h}\right)}, \] where \(K\) and \(h\) are the kernel function and the smoothing bandwidth, as in the KDE.
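
Here is a minimal NumPy sketch of this estimator. The function name `nw_regression`, the regression function \(m(x)=\sin(x)\), and the noise level are illustrative assumptions:

```python
import numpy as np

def nw_regression(x, X, Y, h):
    """Nadaraya-Watson estimate of m(x) = E[Y | X = x] with a Gaussian kernel."""
    x = np.atleast_1d(x)
    u = (x[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)   # kernel weights; the normalizing constant cancels in the ratio
    return (K * Y[None, :]).sum(axis=1) / K.sum(axis=1)

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=500)
Y = np.sin(X) + rng.normal(scale=0.2, size=500)   # m(x) = sin(x), sigma(x) = 0.2
grid = np.linspace(-1.5, 1.5, 5)
print(nw_regression(grid, X, Y, h=0.2))           # should track sin(grid)
```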

Now let the density of the covariate \(X\) be \(f(x)\). Then it is also well-known that the variance of the estimator \(\widehat{m}_n(x)\) is \[ {\sf Var} \left(\widehat{m}_n(x)\right) = \frac{\sigma^2(x)}{f(x)\cdot nh}\int K^2(u)du + o\left(\frac{1}{nh}\right), \] where \(\sigma^2(x) = {\sf Var}(Y|X=x)\). Namely, the variance is smaller when the density of the covariate is higher.
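
To see this effect in simulation, we can compare the variance of \(\widehat{m}_n(x)\) at a point where the covariate density is high versus one where it is low. This is a sketch under illustrative assumptions: \(X \sim N(0,1)\), \(m(x)=\sin(x)\), and a constant conditional standard deviation \(\sigma(x)=0.2\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, reps = 500, 0.2, 2000
points = np.array([0.0, 2.0])   # f(0) is high, f(2) is low for X ~ N(0, 1)
est = np.empty((reps, 2))
for r in range(reps):
    X = rng.normal(size=n)                          # covariate density f = N(0, 1)
    Y = np.sin(X) + rng.normal(scale=0.2, size=n)   # sigma^2(x) = 0.04 everywhere
    u = (points[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)
    est[r] = (K * Y[None, :]).sum(axis=1) / K.sum(axis=1)

print(est.var(axis=0))   # variance at x = 0 (dense) is smaller than at x = 2 (sparse)
```

Since \(\sigma^2(x)\) is constant here, the variance ratio between the two points is driven entirely by \(1/f(x)\): the estimate is noisier where covariate data are scarce.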

Comparison

Here we see a difference between density estimation and regression. In density estimation, a higher density area corresponds to a higher local variance of the estimator. However, in regression, a higher density area of the covariate corresponds to a lower local variance of the estimator.

Although the two problems (density estimation and regression) are very similar, the covariates actually play very different roles in the two scenarios. In density estimation, the density of the covariate is itself the target of estimation. In regression, the density of the covariate merely describes the amount of data around a given point; the target is the conditional expectation of the response \(Y\) rather than the covariate \(X\).

In density estimation, when the density is higher, we have more data points around a given point \(x\), but the difficulty of the problem also increases: the number of observations falling near \(x\) is a binomial count whose variance grows with \(p(x)\). For the regression problem, on the other hand, the difficulty is governed by the conditional variance \({\sf Var}(Y|X=x)\), which does not involve the covariate density. Thus, when we have more data points around a given point \(x\), we only benefit from the increase in effective sample size.

Copyright © 2016 Yen-Chi Chen. All rights reserved.