.. currentmodule:: statsmodels.nonparametric
.. _nonparametric:

Nonparametric Methods :mod:`nonparametric`
============================================

This section collects various methods in nonparametric statistics. This
includes kernel density estimation for univariate and multivariate data,
kernel regression and locally weighted scatterplot smoothing (lowess).

``sandbox.nonparametric`` contains additional functions that are still work in
progress or do not have unit tests yet. We plan to add nonparametric density
estimators there, especially ones based on kernels or orthogonal polynomials,
as well as smoothers and tools for nonparametric models and methods in other
parts of statsmodels.

The kernel density estimation (KDE) functionality is split between univariate
and multivariate estimation, which are implemented in quite different ways.

Univariate estimation (as provided by ``KDEUnivariate``) uses the fast Fourier
transform (FFT), which makes it quite fast. It should therefore be preferred
for continuous, univariate data when speed is important. It supports several
kernels, although the FFT path is available only for the Gaussian kernel;
bandwidth estimation is done only by a rule of thumb (Scott or Silverman).
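
The rule-of-thumb bandwidths and the resulting Gaussian density estimate can be
sketched in plain Python. This is a minimal sketch, not the ``KDEUnivariate``
implementation: it evaluates the kernel sum directly at every grid point
instead of binning the data and using the FFT. The helper names
``rule_of_thumb_bw`` and ``kde_gaussian_direct`` are hypothetical, and the
constants 1.059 (Scott) and 0.9 (Silverman) applied to
min(standard deviation, IQR / 1.349) are the common normal-reference forms of
these rules.

```python
import math
import statistics


def rule_of_thumb_bw(data, rule="silverman"):
    """Hypothetical helper: normal-reference bandwidth for 1-D data."""
    n = len(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr_sd = (q3 - q1) / 1.349  # IQR rescaled to estimate a normal sd
    # Use the smaller (more robust) spread estimate when the IQR is positive.
    spread = min(std, iqr_sd) if iqr_sd > 0 else std
    factor = {"scott": 1.059, "silverman": 0.9}[rule]
    return factor * spread * n ** (-0.2)


def kde_gaussian_direct(data, points, bw):
    """Hypothetical helper: direct O(n * m) Gaussian KDE evaluation."""
    n = len(data)
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in data)
        for x in points
    ]


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2]
h = rule_of_thumb_bw(sample, rule="silverman")
grid = [-2.0 + 0.5 * i for i in range(11)]  # -2.0, -1.5, ..., 3.0
density = kde_gaussian_direct(sample, grid, h)
```

With the library itself, a comparable estimate would come from
``KDEUnivariate(sample).fit(kernel="gau", bw="silverman")``, whose ``density``
attribute holds the values on its ``support`` grid.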

Multivariate estimation (as provided by ``KDEMultivariate``) uses product
kernels. It supports least squares and maximum likelihood cross-validation
for bandwidth estimation, as well as estimation for mixed continuous, ordered
and unordered data. The default kernels (Gaussian for continuous, Wang-Ryzin
for ordered and Aitchison-Aitken for unordered variables) cannot currently be
changed, however. Direct estimation of the conditional density
(:math:`P(X | Y) = P(X, Y) / P(Y)`) is supported by
``KDEMultivariateConditional``.
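
A product kernel multiplies one univariate kernel per variable and averages
the products over the observations. The sketch below assumes fixed,
user-supplied bandwidths (no cross-validation) and evaluates a mixed-data
density with a Gaussian kernel for a continuous variable, a Wang-Ryzin kernel
for an ordered variable and an Aitchison-Aitken kernel for an unordered one.
It then forms a conditional density as the ratio of a joint and a marginal
estimate. All function names are hypothetical; in statsmodels the variable
types are declared through the ``var_type`` string (``'c'``, ``'o'``,
``'u'``).

```python
import math


def gaussian(x, xi, h):
    # Continuous variable: scaled Gaussian kernel (integrates to 1 over x).
    return math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def wang_ryzin(x, xi, lam):
    # Ordered variable: weight decays geometrically with the distance |x - xi|.
    if x == xi:
        return 1.0 - lam
    return 0.5 * (1.0 - lam) * lam ** abs(x - xi)


def aitchison_aitken(x, xi, lam, num_levels):
    # Unordered variable: mass 1 - lam on a match, lam spread over the rest.
    if x == xi:
        return 1.0 - lam
    return lam / (num_levels - 1)


def product_kde(point, data, kernels):
    """Average over observations of the product of per-variable kernels."""
    total = 0.0
    for row in data:
        weight = 1.0
        for k, (x, xi) in enumerate(zip(point, row)):
            weight *= kernels[k](x, xi)
        total += weight
    return total / len(data)


# Each row: (continuous value, ordered level 0..2, unordered category 0..2).
data = [(0.5, 0, 1), (1.1, 1, 2), (0.9, 1, 1), (2.0, 2, 0), (1.4, 2, 1)]
kernels = [
    lambda x, xi: gaussian(x, xi, h=0.4),
    lambda x, xi: wang_ryzin(x, xi, lam=0.3),
    lambda x, xi: aitchison_aitken(x, xi, lam=0.2, num_levels=3),
]
joint = product_kde((1.0, 1, 1), data, kernels)

# Conditional density f(x | y) = f(x, y) / f(y), where x is the continuous
# variable and y collects the two discrete variables.
marginal = product_kde((1, 1), [row[1:] for row in data], kernels[1:])
conditional = joint / marginal
```

Because the continuous kernel integrates to one, integrating the joint
estimate over ``x`` recovers the marginal, so the ratio is a proper density
in ``x``.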

``KDEMultivariate`` can do univariate estimation as well, but is up to two
orders of magnitude slower than ``KDEUnivariate``.
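
The maximum likelihood cross-validation mentioned above scores each candidate
bandwidth by the leave-one-out log-likelihood, which requires a pass over all
pairs of observations per candidate. The following sketch for a univariate
Gaussian kernel illustrates the criterion only: the candidate grid and the
function names are hypothetical, and a grid search is used here purely for
simplicity.

```python
import math


def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    n = len(data)
    const = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(data):
        # Density at xi estimated from all observations except xi itself.
        s = sum(
            math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for j, xj in enumerate(data)
            if j != i
        )
        if s == 0.0:
            return -math.inf  # xi is isolated at this bandwidth
        total += math.log(const * s)
    return total


def ml_cv_bandwidth(data, candidates):
    # Pick the candidate with the largest leave-one-out log-likelihood.
    return max(candidates, key=lambda h: loo_log_likelihood(data, h))


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2, 2.4, 3.1]
grid = [0.1 * k for k in range(1, 31)]  # hypothetical candidates 0.1 .. 3.0
best_h = ml_cv_bandwidth(sample, grid)
```

In statsmodels the same criterion is requested with
``KDEMultivariate(data=[sample], var_type="c", bw="cv_ml")``, and the selected
bandwidth is stored in its ``bw`` attribute.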

Kernel regression (as provided by ``KernelReg``) is based on the same product
kernel approach as ``KDEMultivariate`` and therefore has the same set of
features (mixed data, cross-validated bandwidth estimation, kernels).
Censored regression is provided by ``KernelCensoredReg``.
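
Kernel regression estimates the conditional mean :math:`E[y | x]` as a
kernel-weighted average of the responses. A minimal sketch of the local
constant (Nadaraya-Watson) estimator with one continuous regressor and a
fixed bandwidth follows. It is not the ``KernelReg`` implementation, which
also offers a local linear estimator and cross-validated bandwidths;
``nadaraya_watson`` is a hypothetical name.

```python
import math


def nadaraya_watson(x_obs, y_obs, x_new, h):
    """Local constant kernel regression with a Gaussian kernel."""
    fitted = []
    for x0 in x_new:
        # Gaussian weights; the normalizing constant cancels in the ratio.
        w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x_obs]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y_obs)) / sum(w))
    return fitted


x = [0.1 * i for i in range(41)]  # 0.0, 0.1, ..., 4.0
y = [math.sin(xi) for xi in x]    # noise-free curve, for illustration only
fit = nadaraya_watson(x, y, [1.0, 2.0, 3.0], h=0.2)
```

Each fitted value is a convex combination of the observed responses. In
``KernelReg`` this local constant estimator corresponds to ``reg_type="lc"``,
while ``reg_type="ll"`` selects the local linear one.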

Note that code for semi-parametric partial linear models and single index
models, based on ``KernelReg``, can be found in the sandbox.

.. module:: statsmodels.nonparametric
   :synopsis: Nonparametric estimation of densities and curves

The public functions and classes are:

.. currentmodule:: statsmodels.nonparametric.smoothers_lowess
.. autosummary::
   :toctree: generated/

   lowess

.. currentmodule:: statsmodels.nonparametric.kde
.. autosummary::
   :toctree: generated/

   KDEUnivariate

.. currentmodule:: statsmodels.nonparametric.kernel_density
.. autosummary::
   :toctree: generated/

   KDEMultivariate
   KDEMultivariateConditional
   EstimatorSettings

.. currentmodule:: statsmodels.nonparametric.kernel_regression
.. autosummary::
   :toctree: generated/

   KernelReg
   KernelCensoredReg

Helper functions for kernel bandwidths:

.. currentmodule:: statsmodels.nonparametric.bandwidths
.. autosummary::
   :toctree: generated/

   bw_scott
   bw_silverman
   select_bandwidth

Examples of nonlinear functions are available in
:mod:`statsmodels.nonparametric.dgp_examples`.

Asymmetric kernels, such as the beta kernel for the unit interval and the gamma
kernel for positive-valued random variables, avoid problems at the boundary of
the support of the distribution.

Statsmodels has preliminary support for estimating the density and the
cumulative distribution function using kernels for the unit interval (the
beta kernels) or for the positive real line (all other kernels).
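
For data on the unit interval, a beta kernel replaces the symmetric kernel by
a beta density whose shape depends on the evaluation point ``x``, so no weight
falls outside [0, 1]. The sketch below computes the estimator
f(x) = (1/n) Σ Beta(X_i; x/b + 1, (1 - x)/b + 1) with the standard library
only. It assumes a user-chosen bandwidth ``b`` and data strictly inside
(0, 1); ``beta_pdf`` and ``beta_kernel_pdf`` are hypothetical names, and the
code is not presented as the ``kernel_pdf_beta`` implementation.

```python
import math


def beta_pdf(t, a, b):
    # Beta(a, b) density for 0 < t < 1, computed on the log scale.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))


def beta_kernel_pdf(x, sample, bw):
    """Beta kernel density estimate at x in [0, 1]; bw is user supplied."""
    a = x / bw + 1.0
    b = (1.0 - x) / bw + 1.0
    return sum(beta_pdf(xi, a, b) for xi in sample) / len(sample)


sample = [0.05, 0.12, 0.2, 0.31, 0.45, 0.5, 0.62, 0.8]
grid = [i / 100 for i in range(101)]  # includes both boundaries 0 and 1
density = [beta_kernel_pdf(x, sample, bw=0.05) for x in grid]
```

Since the kernel itself is a density on [0, 1], the estimate can be evaluated
at the boundaries without the reflection or truncation a Gaussian kernel
would need.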

Several of the kernels for the positive real line assume that the density at
the zero boundary is zero. The gamma kernel also allows the case of a positive
or unbounded density at the zero boundary.

There is currently no default bandwidth and no support for bandwidth
selection; the user has to provide the bandwidth.
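
The gamma kernel estimator follows the same idea on the positive real line:
f(x) = (1/n) Σ Gamma(X_i; shape = x/b + 1, scale = b), where ``b`` is the
bandwidth. Because no bandwidth is chosen automatically, the sketch takes
``bw`` as an argument. This is a minimal standard-library sketch of that form
of the gamma kernel with hypothetical function names; other variants listed
below, such as ``kernel_pdf_gamma2``, are not shown.

```python
import math


def gamma_pdf(t, shape, scale):
    # Gamma(shape, scale) density for t > 0, computed on the log scale.
    log_pdf = (
        (shape - 1.0) * math.log(t)
        - t / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)


def gamma_kernel_pdf(x, sample, bw):
    """Gamma kernel density estimate at x >= 0; bw must be supplied."""
    shape = x / bw + 1.0
    return sum(gamma_pdf(xi, shape, bw) for xi in sample) / len(sample)


sample = [0.02, 0.1, 0.15, 0.3, 0.4, 0.7, 1.1, 1.8, 2.5]  # positive data
grid = [0.05 * i for i in range(81)]  # 0.0 to 4.0, including the boundary
density = [gamma_kernel_pdf(x, sample, bw=0.1) for x in grid]
```

At ``x = 0`` the shape is 1, so each kernel is an exponential density that is
positive at zero; this is how the gamma kernel can represent a positive density
at the boundary.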

The functions to compute the kernel density and the kernel cdf are:

.. currentmodule:: statsmodels.nonparametric.kernels_asymmetric
.. autosummary::
   :toctree: generated/

   pdf_kernel_asym
   cdf_kernel_asym

The available kernel functions for pdf and cdf are:

.. autosummary::
   :toctree: generated/

   kernel_pdf_beta
   kernel_pdf_beta2
   kernel_pdf_bs
   kernel_pdf_gamma
   kernel_pdf_gamma2
   kernel_pdf_invgamma
   kernel_pdf_invgauss
   kernel_pdf_lognorm
   kernel_pdf_recipinvgauss
   kernel_pdf_weibull
   kernel_cdf_beta
   kernel_cdf_beta2
   kernel_cdf_bs
   kernel_cdf_gamma
   kernel_cdf_gamma2
   kernel_cdf_invgamma
   kernel_cdf_invgauss
   kernel_cdf_lognorm
   kernel_cdf_recipinvgauss
   kernel_cdf_weibull

``sandbox.nonparametric`` also contains additional, insufficiently tested
classes for testing functional form and for semi-linear and single index
models.