.. _data_reduction:

Unsupervised dimensionality reduction
=====================================
If your number of features is high, it may be useful to reduce it with an
unsupervised step prior to supervised steps. Many of the
:ref:`unsupervised-learning` methods implement a ``transform`` method that
can be used to reduce the dimensionality. Below we discuss two specific
examples of this pattern that are heavily used.
.. topic:: Pipelining

   The unsupervised data reduction and the supervised estimator can be
   chained in one step. See :ref:`pipeline`.
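
A minimal sketch of such a chain (the digits dataset, the number of
components, and the choice of classifier are illustrative, not prescribed
here):

```python
# Chain an unsupervised reducer and a supervised estimator in one Pipeline.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)      # 64 pixel features per sample
pipe = make_pipeline(PCA(n_components=16),
                     LogisticRegression(max_iter=1000))
pipe.fit(X, y)                           # PCA reduces X before the classifier sees it
print(pipe.score(X, y))
```

Calling ``fit`` on the pipeline runs ``fit_transform`` on the reducer and
passes the reduced data to the classifier, so the two steps stay coupled
during cross-validation and grid search.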
.. currentmodule:: sklearn
PCA: principal component analysis
---------------------------------

:class:`decomposition.PCA` looks for a combination of features that
capture well the variance of the original features. See :ref:`decompositions`.
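
For instance (a sketch; the dataset and target dimensionality are arbitrary
illustrative choices):

```python
# Use PCA's transform method to reduce 64 features to 8 components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=8).fit(X)
X_reduced = pca.transform(X)
print(X_reduced.shape)                        # (n_samples, 8)
print(pca.explained_variance_ratio_.sum())    # fraction of variance kept
```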
.. rubric:: Examples
* :ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`

Random projections
------------------

The module :mod:`~sklearn.random_projection` provides several tools for data
reduction by random projections. See the relevant section of the
documentation: :ref:`random_projection`.
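
A sketch of one such tool (the data is synthetic and the target
dimensionality is fixed by hand rather than derived from the
Johnson-Lindenstrauss bound):

```python
# Project 10000-dimensional data onto 500 random Gaussian directions.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(0)
X = rng.rand(100, 10000)                 # 100 samples, many features
transformer = GaussianRandomProjection(n_components=500, random_state=0)
X_new = transformer.fit_transform(X)
print(X_new.shape)                       # (100, 500)
```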
.. rubric:: Examples
* :ref:`sphx_glr_auto_examples_miscellaneous_plot_johnson_lindenstrauss_bound.py`

Feature agglomeration
---------------------

:class:`cluster.FeatureAgglomeration` applies
:ref:`hierarchical_clustering` to group together features that behave
similarly.
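
As a sketch (the dataset and the number of clusters are illustrative):

```python
# Agglomerate the 64 pixel features of the digits into 8 feature clusters;
# each output feature pools the inputs assigned to one cluster.
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
agglo = FeatureAgglomeration(n_clusters=8)
X_reduced = agglo.fit_transform(X)
print(X_reduced.shape)       # (n_samples, 8)
print(agglo.labels_[:10])    # cluster assignment of the first 10 features
```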
.. rubric:: Examples
* :ref:`sphx_glr_auto_examples_cluster_plot_feature_agglomeration_vs_univariate_selection.py`
* :ref:`sphx_glr_auto_examples_cluster_plot_digits_agglomeration.py`

.. topic:: Feature scaling

   Note that if features have very different scaling or statistical
   properties, :class:`cluster.FeatureAgglomeration` may not be able to
   capture the links between related features. Using a
   :class:`preprocessing.StandardScaler` can be useful in these settings.
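
A minimal sketch of this combination, using synthetic data whose feature
scales differ by several orders of magnitude (the data and cluster count are
illustrative assumptions):

```python
# Standardize heterogeneous features before agglomerating them, so the
# clustering is not dominated by the features with the largest scales.
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(200, 20) * np.logspace(0, 4, 20)   # scales from 1 to 10**4
pipe = make_pipeline(StandardScaler(), FeatureAgglomeration(n_clusters=5))
X_reduced = pipe.fit_transform(X)
print(X_reduced.shape)                          # (200, 5)
```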