docs/numpy_ml.nonparametric.rst
Nonparametric models ####################
.. raw:: html
<h2>K-Nearest Neighbors</h2>The k-nearest neighbors_ (KNN) model is a nonparametric supervised learning
approach that can be applied to classification or regression problems. In a
classification context, the KNN model assigns a class label for a new datapoint
by taking a majority vote amongst the labels for the k closest points
("neighbors") in the training data. Similarly, in a regression context, the KNN
model predicts the target value associated with a new datapoint by taking the
average of the targets associated with the k closes points in the training
data.
.. _k-nearest neighbors: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Models
~numpy_ml.nonparametric.KNN.. raw:: html
<h2>Gaussian Process Regression</h2>A Gaussian process_ defines a prior distribution over functions mapping
:math:X \rightarrow \mathbb{R}, where X can be any finite (or
infinite!)-dimensional set.
Let :math:f(x_k) be the random variable corresponding to
the value of a function f at a point :math:x_k \in X. Define a random
variable :math:z = [f(x_1), \ldots, f(x_N)] for any finite set of points
:math:\{x_1, \ldots, x_N\} \subset X. If f is distributed according to a
Gaussian Process, it is the case that
.. math::
z \sim \mathcal{N}(\mu, K)
for
.. math::
\mu &= [\text{mean}(x_1), \ldots, \text{mean}(x_N)] \\
K_{ij} &= \text{kernel}(x_i, x_j)
where mean is the mean function (in Gaussian process regression it is common
to define mean(x) = 0), and kernel is a :doc:kernel <numpy_ml.utils.kernels> / covariance function that determines the general
shape of the GP prior over functions, p(f).
In Gaussian process regression_ (AKA simple Kriging [2]_ [3]_), a Gaussian
process is used as a prior on functions and is combined with the Gaussian
likelihood from the linear model via Bayes' rule to compute a posterior over
functions f:
.. math::
y \mid X, f &\sim \mathcal{N}( [f(x_1), \ldots, f(x_n)], \alpha I ) \\
f \mid X &\sim \text{GP}(0, K)
Due to the conjugacy of the Gaussian Process prior with the regression model's Gaussian likelihood, the posterior will also be Gaussian and can be computed in closed form.
.. _Gaussian process: https://en.wikipedia.org/wiki/Gaussian_process
.. _Gaussian process regression: https://en.wikipedia.org/wiki/Kriging
Models
~numpy_ml.nonparametric.GPRegressionReferences
.. [1] Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA. .. [2] Krige, D. G., (1951). "A statistical approach to some mine valuations and allied problems at the Witwatersrand", Master's thesis of the University of Witwatersrand. .. [3] Matheron, G., (1963). "Principles of geostatistics", Economic Geology, 58, 1246-1266.
.. raw:: html
<h2>Kernel Regression</h2>Kernel regression is another nonparametric approach to nonlinear regression.
Like the Gaussian Process regression approach (or, more generally, all
regression models), kernel regression attempts to learn a function f which
captures the conditional expectation of some targets y given the data
X, under the assumption that
.. math:: y_i = f(x_i) + \epsilon_i \ \ \ \ \text{where } \mathbb{E}[\epsilon | \mathbf{x}] = \mathbb{E}[\epsilon] = 0
Unlike the Gaussian Process regression approach, however, kernel regression
does not place a prior over f. Instead, it models :math:f = \mathbb{E}[y | X] = \int_y \frac{p(X, y)}{p(X)} y \ \text{d}y using a :doc:kernel function <numpy_ml.utils.kernels>, k, to estimate the smoothed data probabilities.
For example, the :class:Nadaraya-Watson <numpy_ml.nonparametric.KernelRegression>
estimator [4]_ [5]_ uses the following probability estimates:
.. math:: \hat{p}(X) &= \prod_{i=1}^N \hat{p}(x_i) = \prod_{i=1}^N \sum_{j=1}^N \frac{k(x_i - x_j)}{N} \ \hat{p}(X, y) & \prod_{i=1}^N \hat{p}(x_i, y_i) = \prod_{i=1}^N \sum_{j=1}^N \frac{k(x_i - x_j) k(y_i - y_j)}{N}
Models
~numpy_ml.nonparametric.KernelRegressionReferences
.. [4] Nadaraya, E. A. (1964). "On estimating regression". Theory of Probability and Its Applications, 9 (1), 141-2. .. [5] Watson, G. S. (1964). "Smooth regression analysis". Sankhyā: The Indian Journal of Statistics, Series A. 26 (4), 359–372.
.. raw:: html
<h2>See Also</h2>The :doc:trees <numpy_ml.trees> module contains other classic nonparametric
approaches, including :doc:decision trees <numpy_ml.trees.dt>,
:doc:random forests <numpy_ml.trees.rf>, and :doc:gradient boosted decision trees <numpy_ml.trees.gbdt>.
.. toctree:: :maxdepth: 2 :hidden:
numpy_ml.nonparametric.knn numpy_ml.nonparametric.gp numpy_ml.nonparametric.kernel_regression