doc/whats_new/v1.6.rst
.. include:: _contributors.rst
.. currentmodule:: sklearn
.. _release_notes_1_6:
For a short description of the main highlights of the release, please refer to
:ref:sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_6_0.py.
.. include:: changelog_legend.inc
.. towncrier release notes start
.. _changes_1_6_1:
January 2025
The tags.input_tags.sparse flag was corrected for a majority of estimators.
By :user:Antoine Baker <antoinebaker> :pr:30187
_more_tags, _get_tags, and _safe_tags now raise a
:class:DeprecationWarning instead of a :class:FutureWarning, so that only
developers, and not end users, are notified.
By :user:Guillaume Lemaitre <glemaitre> in :pr:30573
sklearn.metrics
Loïc Estève <lesteve> :pr:30454
sklearn.model_selection
:func:~model_selection.cross_validate, :func:~model_selection.cross_val_predict,
and :func:~model_selection.cross_val_score now accept params=None when metadata
routing is enabled. By Adrin Jalali_ :pr:30451
sklearn.tree
log2 instead of ln for building trees to maintain behavior of previous
versions. By Thomas Fan_ :pr:30557
sklearn.utils
|Enhancement| :func:utils.estimator_checks.check_estimator_sparse_tag ensures that
the estimator tag input_tags.sparse is consistent with its fit
method (accepting sparse input X or raising the appropriate error).
By :user:Antoine Baker <antoinebaker> :pr:30187
|Fix| Raise a DeprecationWarning when there is no concrete implementation of __sklearn_tags__
in the MRO of the estimator. We request to inherit from BaseEstimator that
implements __sklearn_tags__.
By :user:Guillaume Lemaitre <glemaitre> :pr:30516
.. _changes_1_6_0:
December 2024
|Enhancement| __sklearn_tags__ was introduced for setting tags in estimators.
More details in :ref:estimator_tags.
By :user:Thomas Fan <thomasjpfan> and :user:Adrin Jalali <adrinjalali> :pr:29677
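
As a rough illustration of the tag mechanism described above, the sketch below overrides __sklearn_tags__ in a custom estimator. It assumes scikit-learn 1.6 or later; SparseFriendlyTransformer is a hypothetical class invented for this example, not part of the library.

.. code-block:: python

    from sklearn.base import BaseEstimator, TransformerMixin


    class SparseFriendlyTransformer(TransformerMixin, BaseEstimator):
        """Hypothetical transformer, used only to illustrate tag overrides."""

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X

        def __sklearn_tags__(self):
            # Start from the inherited defaults, then override what differs.
            tags = super().__sklearn_tags__()
            tags.input_tags.sparse = True
            return tags


    tags = SparseFriendlyTransformer().__sklearn_tags__()
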
|Enhancement| Scikit-learn classes and functions can now be used with only a single
import sklearn statement. For example, import sklearn; sklearn.svm.SVC() now works.
By :user:Thomas Fan <thomasjpfan> :pr:29793
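
A minimal sketch of the behaviour described above, assuming scikit-learn 1.6 or later is installed. The submodule is resolved on attribute access, so no explicit import of sklearn.svm is needed.

.. code-block:: python

    import sklearn

    # sklearn.svm is loaded on first attribute access.
    clf = sklearn.svm.SVC()
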
|Fix| Classes :class:metrics.ConfusionMatrixDisplay,
:class:metrics.RocCurveDisplay, :class:calibration.CalibrationDisplay,
:class:metrics.PrecisionRecallDisplay, :class:metrics.PredictionErrorDisplay and
:class:inspection.PartialDependenceDisplay now properly handle Matplotlib aliases
for style parameters (e.g., c and color, ls and linestyle, etc).
By :user:Joseph Barbier <JosephBARBIERDARNAL> :pr:30023
|API| :func:utils.validation.validate_data is introduced and replaces previously
private base.BaseEstimator._validate_data method. This is intended for third party
estimator developers, who should use this function in most cases instead of
:func:utils.check_array and :func:utils.check_X_y.
By :user:Adrin Jalali <adrinjalali> :pr:29696
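
One way a third-party estimator might call validate_data is sketched below. It assumes scikit-learn 1.6 or later; MeanCenterer is a hypothetical estimator written for this example only.

.. code-block:: python

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.utils.validation import check_is_fitted, validate_data


    class MeanCenterer(TransformerMixin, BaseEstimator):
        """Hypothetical third-party transformer used for illustration."""

        def fit(self, X, y=None):
            # Validates X and records n_features_in_ on the estimator.
            X = validate_data(self, X)
            self.mean_ = X.mean(axis=0)
            return self

        def transform(self, X):
            check_is_fitted(self)
            # reset=False checks X against the feature count seen in fit.
            X = validate_data(self, X, reset=False)
            return X - self.mean_


    X = np.array([[1.0, 2.0], [3.0, 6.0]])
    centered = MeanCenterer().fit_transform(X)
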
Additional estimators and functions have been updated to include support for all
Array API <https://data-apis.org/array-api/latest/>_ compliant inputs.
See :ref:array_api for more details.
|Feature| :class:model_selection.GridSearchCV,
:class:model_selection.RandomizedSearchCV,
:class:model_selection.HalvingGridSearchCV and
:class:model_selection.HalvingRandomSearchCV now support Array API
compatible inputs when their base estimators do.
By :user:Tim Head <betatim> and :user:Olivier Grisel <ogrisel> :pr:27096
|Feature| :func:sklearn.metrics.f1_score now supports Array API compatible
inputs.
By :user:Omar Salman <OmarManzoor> :pr:27369
|Feature| :class:preprocessing.LabelEncoder now supports Array API compatible inputs.
By :user:Omar Salman <OmarManzoor> :pr:27381
|Feature| :func:sklearn.metrics.mean_absolute_error now supports Array API compatible
inputs.
By :user:Edoardo Abati <EdAbati> :pr:27736
|Feature| :func:sklearn.metrics.mean_tweedie_deviance now supports Array API
compatible inputs.
By :user:Thomas Li <lithomas1> :pr:28106
|Feature| :func:sklearn.metrics.pairwise.cosine_similarity now supports Array API
compatible inputs.
By :user:Edoardo Abati <EdAbati> :pr:29014
|Feature| :func:sklearn.metrics.pairwise.paired_cosine_distances now supports Array
API compatible inputs.
By :user:Edoardo Abati <EdAbati> :pr:29112
|Feature| :func:sklearn.metrics.cluster.entropy now supports Array API compatible
inputs.
By :user:Yaroslav Korobko <Tialo> :pr:29141
|Feature| :func:sklearn.metrics.mean_squared_error now supports Array API compatible
inputs.
By :user:Yaroslav Korobko <Tialo> :pr:29142
|Feature| :func:sklearn.metrics.pairwise.additive_chi2_kernel now supports Array API
compatible inputs.
By :user:Yaroslav Korobko <Tialo> :pr:29144
|Feature| :func:sklearn.metrics.d2_tweedie_score now supports Array API compatible
inputs.
By :user:Emily Chen <EmilyXinyi> :pr:29207
|Feature| :func:sklearn.metrics.max_error now supports Array API compatible inputs.
By :user:Edoardo Abati <EdAbati> :pr:29212
|Feature| :func:sklearn.metrics.mean_poisson_deviance now supports Array API
compatible inputs.
By :user:Emily Chen <EmilyXinyi> :pr:29227
|Feature| :func:sklearn.metrics.mean_gamma_deviance now supports Array API compatible
inputs.
By :user:Emily Chen <EmilyXinyi> :pr:29239
|Feature| :func:sklearn.metrics.pairwise.cosine_distances now supports Array API
compatible inputs.
By :user:Emily Chen <EmilyXinyi> :pr:29265
|Feature| :func:sklearn.metrics.pairwise.chi2_kernel now supports Array API
compatible inputs.
By :user:Yaroslav Korobko <Tialo> :pr:29267
|Feature| :func:sklearn.metrics.mean_absolute_percentage_error now supports Array API
compatible inputs.
By :user:Emily Chen <EmilyXinyi> :pr:29300
|Feature| :func:sklearn.metrics.pairwise.paired_euclidean_distances now supports
Array API compatible inputs.
By :user:Emily Chen <EmilyXinyi> :pr:29389
|Feature| :func:sklearn.metrics.pairwise.euclidean_distances and
:func:sklearn.metrics.pairwise.rbf_kernel now support Array API compatible
inputs.
By :user:Omar Salman <OmarManzoor> :pr:29433
|Feature| :func:sklearn.metrics.pairwise.linear_kernel,
:func:sklearn.metrics.pairwise.sigmoid_kernel, and
:func:sklearn.metrics.pairwise.polynomial_kernel now support Array API
compatible inputs.
By :user:Omar Salman <OmarManzoor> :pr:29475
|Feature| :func:sklearn.metrics.mean_squared_log_error and
:func:sklearn.metrics.root_mean_squared_log_error
now support Array API compatible inputs.
By :user:Virgil Chan <virchan> :pr:29709
|Feature| :class:preprocessing.MinMaxScaler with clip=True now supports Array API
compatible inputs.
By :user:Shreekant Nandiyawar <Shree7676> :pr:29751
Support for the soon to be deprecated cupy.array_api module has been
removed in favor of directly supporting the top level cupy module, possibly
via the array_api_compat.cupy compatibility wrapper.
By :user:Olivier Grisel <ogrisel> :pr:29639
Refer to the :ref:Metadata Routing User Guide <metadata_routing> for
more details.
|Feature| :class:semi_supervised.SelfTrainingClassifier
now supports metadata routing. The fit method now accepts **fit_params
which are passed to the underlying estimators via their fit methods.
In addition, the
:meth:~semi_supervised.SelfTrainingClassifier.predict,
:meth:~semi_supervised.SelfTrainingClassifier.predict_proba,
:meth:~semi_supervised.SelfTrainingClassifier.predict_log_proba,
:meth:~semi_supervised.SelfTrainingClassifier.score
and :meth:~semi_supervised.SelfTrainingClassifier.decision_function
methods also accept **params which are
passed to the underlying estimators via their respective methods.
By :user:Adam Li <adam2392> :pr:28494
|Feature| :class:ensemble.StackingClassifier and
:class:ensemble.StackingRegressor now support metadata routing and pass
**fit_params to the underlying estimators via their fit methods.
By :user:Stefanie Senger <StefanieSenger> :pr:28701
|Feature| :func:model_selection.learning_curve now supports metadata routing for the
fit method of its estimator and for its underlying CV splitter and scorer.
By :user:Stefanie Senger <StefanieSenger> :pr:28975
|Feature| :class:compose.TransformedTargetRegressor now supports metadata
routing in its :meth:~compose.TransformedTargetRegressor.fit and
:meth:~compose.TransformedTargetRegressor.predict methods and routes the
corresponding params to the underlying regressor.
By :user:Omar Salman <OmarManzoor> :pr:29136
|Feature| :class:feature_selection.SequentialFeatureSelector now supports
metadata routing in its fit method and passes the corresponding params to
the :func:model_selection.cross_val_score function.
By :user:Omar Salman <OmarManzoor> :pr:29260
|Feature| :func:model_selection.permutation_test_score now supports metadata routing
for the fit method of its estimator and for its underlying CV splitter and scorer.
By :user:Adam Li <adam2392> :pr:29266
|Feature| :class:feature_selection.RFE and :class:feature_selection.RFECV
now support metadata routing.
By :user:Omar Salman <OmarManzoor> :pr:29312
|Feature| :func:model_selection.validation_curve now supports metadata routing for
the fit method of its estimator and for its underlying CV splitter and scorer.
By :user:Stefanie Senger <StefanieSenger> :pr:29329
|Fix| Metadata is routed correctly to grouped CV splitters via
:class:linear_model.RidgeCV and :class:linear_model.RidgeClassifierCV and
UnsetMetadataPassedError is fixed for :class:linear_model.RidgeClassifierCV with
default scoring.
By :user:Stefanie Senger <StefanieSenger> :pr:29634
|Fix| Many method arguments which shouldn't be included in the routing mechanism are
now excluded and the set_{method}_request methods are not generated for them.
By Adrin Jalali_ :pr:29920
Due to limited maintainer resources and a small number of users, official PyPy
support has been dropped. Some parts of scikit-learn may still work, but PyPy is
no longer tested in the scikit-learn Continuous Integration.
By :user:Loïc Estève <lesteve> :pr:29128
From scikit-learn 1.6 onwards, support for building with setuptools has been
removed. Meson is the only supported way to build scikit-learn.
By :user:Loïc Estève <lesteve> :pr:29400
scikit-learn has preliminary support for free-threaded CPython; in particular, free-threaded wheels are available for all of our supported platforms.
Free-threaded (also known as nogil) CPython 3.13 is an experimental version of CPython 3.13 which aims at enabling efficient multi-threaded use cases by removing the Global Interpreter Lock (GIL).
For more details about free-threaded CPython see py-free-threading doc <https://py-free-threading.github.io>_,
in particular how to install a free-threaded CPython <https://py-free-threading.github.io/installing_cpython/>_
and Ecosystem compatibility tracking <https://py-free-threading.github.io/tracking/>_.
Feel free to try free-threaded on your use case and report any issues!
By :user:Loïc Estève <lesteve> and many other people in the wider Scientific
Python and CPython ecosystem, for example :user:Nathan Goldbaum <ngoldbaum>,
:user:Ralf Gommers <rgommers>, :user:Edgar Andrés Margffoy Tuay <andfoy>. :pr:30360
sklearn.base
|Enhancement| Added a function :func:base.is_clusterer which determines whether a given
estimator is of category clusterer.
By :user:Christian Veenhuis <ChVeen> :pr:28936
|API| Passing a class object to :func:~sklearn.base.is_classifier,
:func:~sklearn.base.is_regressor, and
:func:~sklearn.base.is_outlier_detector is now deprecated. Pass an instance
instead.
By Adrin Jalali_ :pr:30122
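
The two entries above can be illustrated with a short sketch, assuming scikit-learn 1.6 or later. The estimator type helpers are called on instances, since passing the class itself is deprecated.

.. code-block:: python

    from sklearn.base import is_classifier, is_clusterer
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    # Pass instances: passing the class itself is deprecated as of 1.6.
    kmeans_is_clusterer = is_clusterer(KMeans())
    logreg_is_clusterer = is_clusterer(LogisticRegression())
    logreg_is_classifier = is_classifier(LogisticRegression())
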
sklearn.calibration
cv="prefit" is deprecated for :class:~sklearn.calibration.CalibratedClassifierCV.
Use :class:~sklearn.frozen.FrozenEstimator instead, as
CalibratedClassifierCV(FrozenEstimator(estimator)).
By Adrin Jalali_ :pr:30171
sklearn.cluster
The copy parameter of :class:cluster.Birch was deprecated in 1.6 and will be
removed in 1.8. It has no effect as the estimator does not perform in-place operations
on the input data.
By :user:Yao Xiao <Charlie-XIAO> :pr:29124
sklearn.compose
The verbose_feature_names_out parameter of :class:sklearn.compose.ColumnTransformer
now accepts a format string or a callable to generate feature names.
By :user:Marc Bresson <MarcBresson> :pr:28934
sklearn.covariance
:class:covariance.MinCovDet fitting is now slightly faster.
By :user:Antony Lee <anntzer> :pr:29835
sklearn.cross_decomposition
:class:cross_decomposition.PLSRegression properly raises an error when
n_components is larger than n_samples.
By :user:Thomas Fan <thomasjpfan> :pr:29710
sklearn.datasets
:func:datasets.fetch_file allows downloading an arbitrary data file
from the web. It handles local caching, integrity checks with SHA256 digests
and automatic retries in case of HTTP errors.
By :user:Olivier Grisel <ogrisel> :pr:29354
sklearn.decomposition
|Enhancement| :class:~sklearn.decomposition.LatentDirichletAllocation now has a
normalize parameter in
:meth:~sklearn.decomposition.LatentDirichletAllocation.transform and
:meth:~sklearn.decomposition.LatentDirichletAllocation.fit_transform
methods to control whether the document topic distribution is normalized.
By Adrin Jalali_ :pr:30097
|Fix| :class:~sklearn.decomposition.IncrementalPCA
will now only raise a ValueError when the number of samples in the
input data to partial_fit is less than the number of components
on the first call to partial_fit. Subsequent calls to partial_fit
no longer face this restriction.
By :user:Thomas Gessey-Jones <ThomasGesseyJonesPX> :pr:30224
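
The relaxed restriction on partial_fit can be sketched as follows, assuming scikit-learn 1.6 or later. The batch sizes and random data are arbitrary choices for illustration.

.. code-block:: python

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.default_rng(0)
    ipca = IncrementalPCA(n_components=3)

    # The first batch must contain at least n_components samples.
    ipca.partial_fit(rng.normal(size=(10, 5)))
    # A later, smaller batch (2 samples for 3 components) is accepted.
    ipca.partial_fit(rng.normal(size=(2, 5)))
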
sklearn.discriminant_analysis
:class:discriminant_analysis.QuadraticDiscriminantAnalysis
now raises a LinAlgWarning in case of collinear variables. These warnings
can be silenced using the reg_param parameter.
By :user:Alihan Zihna <azihna> :pr:19731
sklearn.ensemble
|Feature| :class:ensemble.ExtraTreesClassifier and
:class:ensemble.ExtraTreesRegressor now support missing values in the data matrix
X. Missing values are handled by randomly moving all of the samples with missing
values to the left or right child node as the tree is traversed.
By :user:Adam Li <adam2392> :pr:28268
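
A small sketch of fitting on data that contains NaN, assuming scikit-learn 1.6 or later. The toy arrays are made up for the example.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier

    # Toy data with missing entries in both features.
    X = np.array([[0.0, np.nan], [1.0, 0.5], [np.nan, 1.0], [3.0, 2.0]])
    y = np.array([0, 0, 1, 1])

    clf = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)
    pred = clf.predict(X)
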
|Efficiency| Small runtime improvement of fitting
:class:ensemble.HistGradientBoostingClassifier and
:class:ensemble.HistGradientBoostingRegressor by parallelizing the initial search
for bin thresholds.
By :user:Christian Lorentzen <lorentzenchr> :pr:28064
|Efficiency| :class:ensemble.IsolationForest now runs parallel jobs
during :term:predict offering a speedup of up to 2-4x on sample sizes
larger than 2000 using joblib.
By :user:Adam Li <adam2392> and :user:Sérgio Pereira <sergiormpereira> :pr:28622
|Enhancement| The verbosity of :class:ensemble.HistGradientBoostingClassifier
and :class:ensemble.HistGradientBoostingRegressor now has more granular control:
verbose = 1 prints only summary messages, while verbose >= 2 prints the full
information as before.
By :user:Christian Lorentzen <lorentzenchr> :pr:28179
|API| The parameter algorithm of :class:ensemble.AdaBoostClassifier is deprecated
and will be removed in 1.8.
By :user:Jérémie du Boisberranger <jeremiedbb> :pr:29997
sklearn.feature_extraction
:class:feature_extraction.text.TfidfVectorizer now correctly preserves the
dtype of idf_ based on the input data.
By :user:Guillaume Lemaitre <glemaitre> :pr:30022
sklearn.frozen
:class:~sklearn.frozen.FrozenEstimator is introduced, which allows
freezing an estimator. Calling .fit on it has no effect, and
clone(frozenestimator) returns the same estimator instead of an unfitted clone.
By Adrin Jalali_ :pr:29705
sklearn.impute
|Fix| :class:impute.KNNImputer excludes samples with nan distances when
computing the mean value for uniform weights.
By :user:Xuefeng Xu <xuefeng-xu> :pr:29135
|Fix| When min_value and max_value are array-like and some features are dropped due to
keep_empty_features=False, :class:impute.IterativeImputer no longer raises an
error and now indexes correctly.
By :user:Guntitat Sawadwuthikul <gunsodo> :pr:29451
|Fix| Fixed :class:impute.IterativeImputer to make sure that it does not skip
the iterative process when keep_empty_features is set to True.
By :user:Arif Qodari <arifqodari> :pr:29779
|API| A warning is now raised in :class:impute.SimpleImputer when
keep_empty_features=False and strategy="constant". In this case empty features are
not dropped; this behaviour will change in 1.8.
By :user:Arthur Courselle <ArthurCourselle> and :user:Simon Riou <simon-riou> :pr:29950
sklearn.linear_model
|Enhancement| The solver="newton-cholesky" in
:class:linear_model.LogisticRegression and
:class:linear_model.LogisticRegressionCV is extended to support the full
multinomial loss in a multiclass setting.
By :user:Christian Lorentzen <lorentzenchr> :pr:28840
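
For illustration, the solver can be used directly on a multiclass problem. This is a minimal sketch on the iris dataset, which was chosen here only because it has three classes.

.. code-block:: python

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)  # three classes, four features
    clf = LogisticRegression(solver="newton-cholesky").fit(X, y)
    proba = clf.predict_proba(X[:2])
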
|Fix| In :class:linear_model.Ridge and :class:linear_model.RidgeCV, after fit,
the coef_ attribute is now of shape (n_features,) like other linear models.
By :user:Maxwell Liu <MaxwellLZH>, Guillaume Lemaitre, and Adrin Jalali :pr:19746
|Fix| :class:linear_model.LogisticRegressionCV corrects sample weight handling
for the calculation of test scores.
By :user:Shruti Nath <snath-xoc> :pr:29419
|Fix| :class:linear_model.LassoCV and :class:linear_model.ElasticNetCV now
take sample weights into account to define the search grid for the internally tuned
alpha hyper-parameter.
By :user:John Hopfensperger <s-banach> and :user:Shruti Nath <snath-xoc> :pr:29442
|Fix| :class:linear_model.LogisticRegression, :class:linear_model.PoissonRegressor,
:class:linear_model.GammaRegressor, :class:linear_model.TweedieRegressor
now take sample weights into account to decide when to fall back to solver='lbfgs'
whenever solver='newton-cholesky' becomes numerically unstable.
By :user:Antoine Baker <antoinebaker> :pr:29818
|Fix| :class:linear_model.RidgeCV now properly uses predictions on the same scale as
the target seen during fit. These predictions are stored in cv_results_ when
scoring != None. Previously, the predictions were rescaled by the square root of the
sample weights and offset by the mean of the target, leading to an incorrect estimate
of the score.
By :user:Guillaume Lemaitre <glemaitre>,
:user:Jérôme Dockes <jeromedockes> and
:user:Hanmin Qin <qinhanmin2014> :pr:29842
|Fix| :class:linear_model.RidgeCV now properly supports custom multioutput scorers
by letting the scorer manage the multioutput averaging. Previously, the predictions
and true targets were both squeezed to a 1D array before computing the error.
By :user:Guillaume Lemaitre <glemaitre> :pr:29884
|Fix| :class:linear_model.LinearRegression now sets the cond parameter when
calling the scipy.linalg.lstsq solver on dense input data. This ensures
more numerically robust results on rank-deficient data. In particular, it
empirically fixes the expected equivalence property between fitting with
reweighted or with repeated data points.
By :user:Antoine Baker <antoinebaker> :pr:30040
|Fix| :class:linear_model.LogisticRegression and other linear models that
accept solver="newton-cholesky" now report the correct number of iterations
when they fall back to the "lbfgs" solver because of a rank deficient
Hessian matrix.
By :user:Olivier Grisel <ogrisel> :pr:30100
|Fix| :class:~sklearn.linear_model.SGDOneClassSVM now correctly inherits from
:class:~sklearn.base.OutlierMixin and the tags are correctly set.
By :user:Guillaume Lemaitre <glemaitre> :pr:30227
|API| Deprecates copy_X in :class:linear_model.TheilSenRegressor as the parameter
has no effect. copy_X will be removed in 1.8.
By :user:Adam Li <adam2392> :pr:29105
sklearn.manifold
:func:manifold.locally_linear_embedding and
:class:manifold.LocallyLinearEmbedding now allocate the memory of sparse matrices
more efficiently in the Hessian, Modified and LTSA methods.
By :user:Giorgio Angelotti <giorgioangel> :pr:28096
sklearn.metrics
|Efficiency| :func:sklearn.metrics.classification_report is now faster by caching
classification labels.
By :user:Adrin Jalali <adrinjalali> :pr:29738
|Enhancement| :meth:metrics.RocCurveDisplay.from_estimator,
:meth:metrics.RocCurveDisplay.from_predictions,
:meth:metrics.PrecisionRecallDisplay.from_estimator, and
:meth:metrics.PrecisionRecallDisplay.from_predictions now accept a new keyword
despine to remove the top and right spines of the plot in order to make it clearer.
By :user:Yao Xiao <Charlie-XIAO> :pr:26367
|Enhancement| :func:sklearn.metrics.check_scoring now accepts raise_exc to specify
whether to raise an exception if a subset of the scorers in multimetric scoring fails
or to return an error code.
By :user:Stefanie Senger <StefanieSenger> :pr:28992
|Fix| :func:metrics.roc_auc_score will now correctly return np.nan and
warn the user if only one class is present in the labels.
By :user:Hleb Levitski <glevv> and :user:Janez Demšar <janezd> :pr:27412, :pr:30013
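
The single-class case can be checked with a short sketch, assuming scikit-learn 1.6 or later. The labels and scores below are made up, and the warning is captured so that it can be inspected.

.. code-block:: python

    import warnings

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = [1, 1, 1]  # only one class is present
    y_score = [0.2, 0.7, 0.9]

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        score = roc_auc_score(y_true, y_score)

    score_is_nan = bool(np.isnan(score))
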
|Fix| The functions :func:metrics.mean_squared_log_error and
:func:metrics.root_mean_squared_log_error now check whether the inputs are within
the correct domain for the function :math:y=\log(1+x), rather than
:math:y=\log(x). The functions :func:metrics.mean_absolute_error,
:func:metrics.mean_absolute_percentage_error, :func:metrics.mean_squared_error
and :func:metrics.root_mean_squared_error now explicitly check whether a scalar
will be returned when multioutput=uniform_average.
By :user:Virgil Chan <virchan> :pr:29709
|API| The force_all_finite parameter of functions
:func:metrics.pairwise.check_pairwise_arrays and :func:metrics.pairwise_distances
is renamed into ensure_all_finite. force_all_finite will be removed in 1.8.
By :user:Jérémie du Boisberranger <jeremiedbb> :pr:29404
|API| scoring="neg_max_error" should be used instead of scoring="max_error"
which is now deprecated.
By :user:Farid "Freddie" Taba <artificialfintelligence> :pr:29462
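
A minimal sketch of the replacement scorer name, assuming scikit-learn 1.6 or later. The synthetic regression data is an arbitrary choice; scores are negated errors, so values closer to zero are better.

.. code-block:: python

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=60, n_features=3, noise=5.0, random_state=0)
    scores = cross_val_score(
        LinearRegression(), X, y, cv=3, scoring="neg_max_error"
    )
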
|API| The default value of the response_method parameter of
:func:metrics.make_scorer will change from None to "predict" and None will be
removed in 1.8. In the meantime, None is equivalent to "predict".
By :user:Jérémie du Boisberranger <jeremiedbb> :pr:30001
sklearn.model_selection
|Enhancement| :class:~model_selection.GroupKFold now has the ability to shuffle groups into
different folds when shuffle=True.
By :user:Zachary Vealey <zvealey> :pr:28519
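
The shuffling option can be sketched as below, assuming scikit-learn 1.6 or later. The group layout is invented for the example; groups still never straddle the train and test sides of a split.

.. code-block:: python

    import numpy as np
    from sklearn.model_selection import GroupKFold

    X = np.arange(12).reshape(-1, 1)
    groups = np.repeat([0, 1, 2, 3], 3)  # four groups of three samples

    cv = GroupKFold(n_splits=2, shuffle=True, random_state=0)
    splits = list(cv.split(X, groups=groups))
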
|Enhancement| There is no need to call fit on a
:class:~sklearn.model_selection.FixedThresholdClassifier if the underlying
estimator is already fitted.
By :user:Adrin Jalali <adrinjalali> :pr:30172
|Fix| Improved the error message raised when :func:model_selection.RepeatedStratifiedKFold.split
is called without a y argument.
By :user:Anurag Varma <Anurag-Varma> :pr:29402
sklearn.neighbors
|Enhancement| :class:neighbors.NearestNeighbors,
:class:neighbors.KNeighborsClassifier,
:class:neighbors.KNeighborsRegressor,
:class:neighbors.RadiusNeighborsClassifier,
:class:neighbors.RadiusNeighborsRegressor,
:class:neighbors.KNeighborsTransformer,
:class:neighbors.RadiusNeighborsTransformer, and
:class:neighbors.LocalOutlierFactor
now work with metric="nan_euclidean", supporting nan inputs.
By :user:Carlo Lemos <vitaliset>, Guillaume Lemaitre, and Adrin Jalali :pr:25330
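
A small sketch of nearest-neighbor classification on inputs containing NaN, assuming scikit-learn 1.6 or later. The toy data is made up and is arranged so that every pair of points shares at least one observed feature.

.. code-block:: python

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[0.0, np.nan], [0.2, 0.1], [5.0, 5.1], [5.2, np.nan]])
    y = np.array([0, 0, 1, 1])

    knn = KNeighborsClassifier(n_neighbors=1, metric="nan_euclidean").fit(X, y)
    # Distances are computed on the coordinates observed in both points.
    pred = knn.predict(np.array([[0.1, np.nan], [4.8, np.nan]]))
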
|Enhancement| Add :meth:neighbors.NearestCentroid.decision_function,
:meth:neighbors.NearestCentroid.predict_proba and
:meth:neighbors.NearestCentroid.predict_log_proba
to the :class:neighbors.NearestCentroid estimator class.
Support the case when X is sparse and shrinking_threshold
is not None in :class:neighbors.NearestCentroid.
By :user:Matthew Ning <NoPenguinsLand> :pr:26689
|Enhancement| Make predict, predict_proba, and score of
:class:neighbors.KNeighborsClassifier and
:class:neighbors.RadiusNeighborsClassifier accept X=None as input. In this case
predictions for all training set points are returned, and points are not counted
among their own neighbors.
By :user:Dmitry Kobak <dkobak> :pr:30047
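
Passing None gives leave-one-out style predictions on the training set, which can be sketched as follows. This assumes scikit-learn 1.6 or later; the one-dimensional toy data is invented for the example.

.. code-block:: python

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
    y = np.array([0, 0, 0, 1, 1, 1])

    knn = KNeighborsClassifier(n_neighbors=2).fit(X, y)
    # Each training point is classified from its neighbors,
    # excluding the point itself.
    loo_pred = knn.predict(None)
    loo_accuracy = knn.score(None, y)
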
|Fix| :class:neighbors.LocalOutlierFactor raises a warning in the fit method
when duplicate values in the training data lead to inaccurate outlier detection.
By :user:Henrique Caroço <HenriqueProj> :pr:28773
sklearn.neural_network
:class:neural_network.MLPRegressor no longer crashes when the model
diverges and early_stopping is enabled.
By :user:Marc Bresson <MarcBresson> :pr:29773
sklearn.pipeline
|MajorFeature| :class:pipeline.Pipeline can now transform metadata up to the step requiring the
metadata, which can be set using the transform_input parameter.
By Adrin Jalali_ :pr:28901
|Enhancement| :class:pipeline.Pipeline now warns about not being fitted before calling methods
that require the pipeline to be fitted. This warning will become an error in 1.8.
By Adrin Jalali_ :pr:29868
|Fix| Fixed an issue with tags and estimator type of :class:~sklearn.pipeline.Pipeline
when pipeline is empty. This allows the HTML representation of an empty
pipeline to be rendered correctly.
By :user:Gennaro Daniele Acciaro <gdacciaro> :pr:30203
sklearn.preprocessing
|Enhancement| Added warn option to handle_unknown parameter in
:class:preprocessing.OneHotEncoder.
By :user:Hleb Levitski <glevv> :pr:28637
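
The warn option might be used as sketched below, assuming scikit-learn 1.6 or later. The category names are made up, and the warning is captured so that it can be inspected.

.. code-block:: python

    import warnings

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    enc = OneHotEncoder(handle_unknown="warn", sparse_output=False)
    enc.fit(np.array([["cat"], ["dog"]]))

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # "bird" was not seen during fit.
        encoded = enc.transform(np.array([["cat"], ["bird"]]))
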
|Enhancement| The HTML representation of :class:preprocessing.FunctionTransformer
will show the function name in the label.
By :user:Yao Xiao <Charlie-XIAO> :pr:29158
|Fix| :class:preprocessing.PowerTransformer now uses scipy.special.inv_boxcox
to output nan if the input of BoxCox's inverse is invalid.
By :user:Xuefeng Xu <xuefeng-xu> :pr:27875
sklearn.semi_supervised
:class:semi_supervised.SelfTrainingClassifier
deprecated the base_estimator parameter in favor of estimator.
By :user:Adam Li <adam2392> :pr:28494
sklearn.tree
|Feature| :class:tree.ExtraTreeClassifier and :class:tree.ExtraTreeRegressor now
support missing values in the data matrix X. Missing values are handled by
randomly moving all of the samples with missing values to the left or right child
node as the tree is traversed.
By :user:Adam Li <adam2392> and :user:Loïc Estève <lesteve> :pr:27966, :pr:30318
|Fix| Escape double quotes for labels and feature names when exporting trees to Graphviz
format.
By :user:Santiago M. Mola <smola>. :pr:17575
sklearn.utils
|Enhancement| :func:utils.check_array now accepts ensure_non_negative
to check for negative values in the passed array, until now only available through
calling :func:utils.check_non_negative.
By :user:Tamara Atanasoska <tamaraatanasoska> :pr:29540
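
A minimal sketch of the new argument, assuming scikit-learn 1.6 or later. The arrays are arbitrary; the second call is expected to be rejected because it contains a negative value.

.. code-block:: python

    import numpy as np
    from sklearn.utils import check_array

    ok = check_array(np.array([[0.0, 1.0], [2.0, 3.0]]), ensure_non_negative=True)

    try:
        check_array(np.array([[0.0, -1.0]]), ensure_non_negative=True)
        rejected = False
    except ValueError:
        rejected = True
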
|Enhancement| :func:~sklearn.utils.estimator_checks.check_estimator and
:func:~sklearn.utils.estimator_checks.parametrize_with_checks now check and fail if
the classifier has the tags.classifier_tags.multi_class = False tag but does not
fail on multi-class data.
By Adrin Jalali_ :pr:29874
|Enhancement| :func:utils.validation.check_is_fitted now passes on stateless
estimators. An estimator can indicate it's stateless by setting the requires_fit
tag to False. See :ref:estimator_tags for more information.
By :user:Adrin Jalali <adrinjalali> :pr:29880
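
The stateless case can be sketched as follows, assuming scikit-learn 1.6 or later. Doubler is a hypothetical transformer written for this example only.

.. code-block:: python

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.utils.validation import check_is_fitted


    class Doubler(TransformerMixin, BaseEstimator):
        """Hypothetical stateless transformer: nothing is learned in fit."""

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return [2 * value for value in X]

        def __sklearn_tags__(self):
            tags = super().__sklearn_tags__()
            tags.requires_fit = False  # mark the estimator as stateless
            return tags


    doubler = Doubler()
    check_is_fitted(doubler)  # does not raise although fit was never called
    doubled = doubler.transform([1, 2, 3])
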
|Enhancement| Changes to :func:~utils.estimator_checks.check_estimator and
:func:~utils.estimator_checks.parametrize_with_checks.
:func:~utils.estimator_checks.check_estimator introduces new arguments:
on_skip, on_fail, and callback to control the behavior of the check
runner. Refer to the API documentation for more details.
generate_only=True is deprecated in
:func:~utils.estimator_checks.check_estimator. Use
:func:~utils.estimator_checks.estimator_checks_generator instead.
The _xfail_checks estimator tag is now removed. To indicate which tests are
expected to fail, pass a dictionary to
:func:~utils.estimator_checks.check_estimator as the expected_failed_checks
parameter. Similarly, the expected_failed_checks parameter in
:func:~utils.estimator_checks.parametrize_with_checks can be used, which is a
callable returning a dictionary of the form::

    {
        "check_name": "reason to mark this check as xfail",
    }

By Adrin Jalali_ :pr:30149
|Fix| :func:utils.estimator_checks.parametrize_with_checks and
:func:utils.estimator_checks.check_estimator now support estimators that
have set_output called on them.
By :user:Adrin Jalali <adrinjalali> :pr:29869
|API| The force_all_finite parameter of functions :func:utils.check_array,
:func:utils.check_X_y, :func:utils.as_float_array is renamed into
ensure_all_finite. force_all_finite will be removed in 1.8.
By :user:Jérémie du Boisberranger <jeremiedbb> :pr:29404
|API| utils.estimator_checks.check_sample_weights_invariance
is replaced by
utils.estimator_checks.check_sample_weight_equivalence_on_dense_data,
which uses integer (including zero) weights, and
utils.estimator_checks.check_sample_weight_equivalence_on_sparse_data,
which does the same on sparse data.
By :user:Antoine Baker <antoinebaker> :pr:29818, :pr:30137
|API| Using _estimator_type to set the estimator type is deprecated. Inherit from
:class:~sklearn.base.ClassifierMixin, :class:~sklearn.base.RegressorMixin,
:class:~sklearn.base.TransformerMixin, or :class:~sklearn.base.OutlierMixin
instead. Alternatively, you can set estimator_type in :class:~sklearn.utils.Tags
in the __sklearn_tags__ method.
By Adrin Jalali_ :pr:30122
.. rubric:: Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.5, including:
Aaron Schumacher, Abdulaziz Aloqeely, abhi-jha, Acciaro Gennaro Daniele, Adam J. Stewart, Adam Li, Adeel Hassan, Adeyemi Biola, Aditi Juneja, Adrin Jalali, Aisha, Akanksha Mhadolkar, Akihiro Kuno, Alberto Torres, alexqiao, Alihan Zihna, Aniruddha Saha, antoinebaker, Antony Lee, Anurag Varma, Arif Qodari, Arthur Courselle, ArthurDbrn, Arturo Amor, Aswathavicky, Audrey Flanders, aurelienmorgan, Austin, awwwyan, AyGeeEm, a.zy.lee, baggiponte, BlazeStorm001, bme-git, Boney Patel, brdav, Brigitta Sipőcz, Cailean Carter, Camille Troillard, Carlo Lemos, Christian Lorentzen, Christian Veenhuis, Christine P. Chai, claudio, Conrad Stevens, datarollhexasphericon, Davide Chicco, David Matthew Cherney, Dea María Léon, Deepak Saldanha, Deepyaman Datta, dependabot[bot], dinga92, Dmitry Kobak, Domenico, Drew Craeton, dymil, Edoardo Abati, EmilyXinyi, Eric Larson, Evelyn, fabianhenning, Farid "Freddie" Taba, Gael Varoquaux, Giorgio Angelotti, Hleb Levitski, Guillaume Lemaitre, Guntitat Sawadwuthikul, Haesun Park, Hanjun Kim, Henrique Caroço, hhchen1105, Hugo Boulenger, Ilya Komarov, Inessa Pawson, Ivan Pan, Ivan Wiryadi, Jaimin Chauhan, Jakob Bull, James Lamb, Janez Demšar, Jérémie du Boisberranger, Jérôme Dockès, Jirair Aroyan, João Morais, Joe Cainey, Joel Nothman, John Enblom, JorgeCardenas, Joseph Barbier, jpienaar-tuks, Julian Chan, K.Bharat Reddy, Kevin Doshi, Lars, Loic Esteve, Lucas Colley, Lucy Liu, lunovian, Marc Bresson, Marco Edward Gorelli, Marco Maggi, Marco Wolsza, Maren Westermann, MarieS-WiMLDS, Martin Helm, Mathew Shen, mathurinm, Matthew Feickert, Maxwell Liu, Meekail Zain, Michael Dawson, Miguel Cárdenas, m-maggi, mrastgoo, Natalia Mokeeva, Nathan Goldbaum, Nathan Orgera, nbrown-ScottLogic, Nikita Chistyakov, Nithish Bolleddula, Noam Keidar, NoPenguinsLand, Norbert Preining, notPlancha, Olivier Grisel, Omar Salman, ParsifalXu, Piotr, Priyank Shroff, Priyansh Gupta, Quentin Barthélemy, Rachit23110261, Rahil Parikh, raisadz, Rajath, renaissance0ne, Reshama Shaikh, 
Roberto Rosati, Robert Pollak, rwelsch427, Santiago Castro, Santiago M. Mola, scikit-learn-bot, sean moiselle, SHREEKANT VITTHAL NANDIYAWAR, Shruti Nath, Søren Bredlund Caspersen, Stefanie Senger, Stefano Gaspari, Steffen Schneider, Štěpán Sršeň, Sylvain Combettes, Tamara, Thomas, Thomas Gessey-Jones, Thomas J. Fan, Thomas Li, ThorbenMaa, Tialo, Tim Head, Tuhin Sharma, Tushar Parimi, Umberto Fasci, UV, vedpawar2254, Velislav Babatchev, Victoria Shevchenko, viktor765, Vince Carey, Virgil Chan, Wang Jiayi, Xiao Yuan, Xuefeng Xu, Yao Xiao, yareyaredesuyo, Zachary Vealey, Ziad Amerr