doc/whats_new/v1.3.rst
.. include:: _contributors.rst
.. currentmodule:: sklearn
.. _release_notes_1_3:
For a short description of the main highlights of the release, please refer to
:ref:sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_3_0.py.
.. include:: changelog_legend.inc
.. _changes_1_3_2:
October 2023
:mod:sklearn.datasets
.......................
|Fix| All dataset fetchers now accept data_home as any object that implements
the :class:os.PathLike interface, for instance, :class:pathlib.Path.
:pr:27468 by :user:Yao Xiao <Charlie-XIAO>.
:mod:sklearn.decomposition
............................
|Fix| Fixes a bug in :class:decomposition.KernelPCA by forcing the output of
the internal :class:preprocessing.KernelCenterer to be a default array. When the
arpack solver is used, it expects an array with a dtype attribute.
:pr:27583 by :user:Guillaume Lemaitre <glemaitre>.
:mod:sklearn.metrics
......................
|Fix| Fixes a bug for metrics using zero_division=np.nan
(e.g. :func:~metrics.precision_score) within a parallel loop
(e.g. :func:~model_selection.cross_val_score) where the singleton for np.nan
will be different in the sub-processes.
:pr:27573 by :user:Guillaume Lemaitre <glemaitre>.
:mod:sklearn.tree
...................
:pr:27580 by :user:Loïc Estève <lesteve>.
.. _changes_1_3_1:
September 2023
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
|Fix| Ridge models with solver='sparse_cg' may have slightly different
results with scipy>=1.12, because of an underlying change in the scipy solver
(see scipy#18488 <https://github.com/scipy/scipy/pull/18488>_ for more
details).
:pr:26814 by :user:Loïc Estève <lesteve>.
|Fix| The set_output API correctly works with list input. :pr:27044 by
Thomas Fan_.
:mod:sklearn.calibration
..........................
|Fix| :class:calibration.CalibratedClassifierCV can now handle models that
produce large prediction scores. Before it was numerically unstable.
:pr:26913 by :user:Omar Salman <OmarManzoor>.
:mod:sklearn.cluster
......................
|Fix| :class:cluster.BisectingKMeans could crash when predicting on data
with a different scale than the data used to fit the model.
:pr:27167 by Olivier Grisel_.
|Fix| :class:cluster.BisectingKMeans now works with data that has a single feature.
:pr:27243 by :user:Jérémie du Boisberranger <jeremiedbb>.
:mod:sklearn.cross_decomposition
..................................
|Fix| :class:cross_decomposition.PLSRegression now automatically ravels the output
of predict if fitted with one dimensional y.
:pr:26602 by :user:Yao Xiao <Charlie-XIAO>.
:mod:sklearn.ensemble
.......................
|Fix| Fix a bug in :class:ensemble.AdaBoostClassifier with algorithm="SAMME"
where the decision function of each weak learner should be symmetric (i.e.
the scores should sum to zero for a sample).
:pr:26521 by :user:Guillaume Lemaitre <glemaitre>.
:mod:sklearn.feature_selection
................................
|Fix| :func:feature_selection.mutual_info_regression now correctly computes the
result when X is of integer dtype. :pr:26748 by :user:Yao Xiao <Charlie-XIAO>.
:mod:sklearn.impute
.....................
|Fix| :class:impute.KNNImputer now correctly adds a missing indicator column in
transform when add_indicator is set to True and missing values are observed
during fit. :pr:26600 by :user:Shreesha Kumar Bhat <Shreesha3112>.
:mod:sklearn.metrics
......................
|Fix| Scorers used with :func:metrics.get_scorer properly handle
multilabel-indicator matrices.
:pr:27002 by :user:Guillaume Lemaitre <glemaitre>.
:mod:sklearn.mixture
......................
|Fix| The initialization of :class:mixture.GaussianMixture from user-provided
precisions_init for covariance_type of full or tied was not correct,
and has been fixed.
:pr:26416 by :user:Yang Tao <mchikyt3>.
:mod:sklearn.neighbors
........................
|Fix| :meth:neighbors.KNeighborsClassifier.predict no longer raises an
exception for pandas.DataFrame input.
:pr:26772 by :user:Jérémie du Boisberranger <jeremiedbb>.
|Fix| Reintroduce sklearn.neighbors.BallTree.valid_metrics and
sklearn.neighbors.KDTree.valid_metrics as public class attributes.
:pr:26754 by :user:Julien Jerphanion <jjerphan>.
|Fix| :class:sklearn.model_selection.HalvingRandomSearchCV no longer raises
when the input to the param_distributions parameter is a list of dicts.
:pr:26893 by :user:Stefanie Senger <StefanieSenger>.
|Fix| Neighbors based estimators now correctly work when metric="minkowski" and the
metric parameter p is in the range 0 < p < 1, regardless of the dtype of X.
:pr:26760 by :user:Shreesha Kumar Bhat <Shreesha3112>.
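The reintroduced valid_metrics attributes listed in this section can be inspected directly on the tree classes. A minimal sketch, assuming scikit-learn 1.3.1 or later is installed:

```python
from sklearn.neighbors import BallTree, KDTree

# valid_metrics is read on the class itself; no tree has to be built first.
ball_tree_metrics = sorted(BallTree.valid_metrics)
kd_tree_metrics = sorted(KDTree.valid_metrics)

print(ball_tree_metrics)
print(kd_tree_metrics)
```

The exact contents of both lists depend on the installed version, so the sketch prints them rather than assuming a fixed set.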
:mod:sklearn.preprocessing
............................
|Fix| :class:preprocessing.LabelEncoder correctly accepts y as a keyword
argument. :pr:26940 by Thomas Fan_.
|Fix| :class:preprocessing.OneHotEncoder shows a more informative error message
when sparse_output=True and the output is configured to be pandas.
:pr:26931 by Thomas Fan_.
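As an illustration of the LabelEncoder fix in this section, the following sketch passes y by keyword. The label values are made up for the example, and scikit-learn 1.3.1 or later is assumed:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
# y is given as a keyword argument instead of positionally.
codes = encoder.fit_transform(y=["paris", "tokyo", "paris"])

print(encoder.classes_)  # sorted unique labels seen during fit
print(codes)             # integer code of each input label
```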
:mod:sklearn.tree
...................
|Fix| :func:tree.plot_tree now accepts class_names=True as documented.
:pr:26903 by :user:Thomas Roehr <2maz>.
|Fix| The feature_names parameter of :func:tree.plot_tree now accepts any kind of
array-like instead of just a list. :pr:27292 by :user:Rahil Parikh <rprkh>.
.. _changes_1_3:
June 2023
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
|Enhancement| :meth:multiclass.OutputCodeClassifier.predict now uses a more
efficient pairwise distance reduction. As a consequence, the tie-breaking
strategy is different and thus the predicted labels may be different.
:pr:25196 by :user:Guillaume Lemaitre <glemaitre>.
|Enhancement| The fit_transform method of :class:decomposition.DictionaryLearning
is more efficient but may produce different results than in previous versions when
transform_algorithm is not the same as fit_algorithm and the number of iterations
is small. :pr:24871 by :user:Omar Salman <OmarManzoor>.
|Enhancement| The sample_weight parameter now will be used in centroids
initialization for :class:cluster.KMeans, :class:cluster.BisectingKMeans
and :class:cluster.MiniBatchKMeans.
This change will break backward compatibility, since numbers generated
from same random seeds will be different.
:pr:25752 by :user:Hleb Levitski <glevv>,
:user:Jérémie du Boisberranger <jeremiedbb>,
:user:Guillaume Lemaitre <glemaitre>.
|Fix| Treat more consistently small values in the W and H matrices during the
fit and transform steps of :class:decomposition.NMF and
:class:decomposition.MiniBatchNMF which can produce different results than previous
versions. :pr:25438 by :user:Yotam Avidar-Constantini <yotamcons>.
|Fix| :class:decomposition.KernelPCA may produce different results through
inverse_transform if gamma is None. Now it will be chosen correctly as
1/n_features of the data that it is fitted on, while previously it might be
incorrectly chosen as 1/n_features of the data passed to inverse_transform.
A new attribute gamma_ is provided for revealing the actual value of gamma
used each time the kernel is called.
:pr:26337 by :user:Yao Xiao <Charlie-XIAO>.
|Enhancement| :class:model_selection.LearningCurveDisplay displays both the
train and test curves by default. You can set score_type="test" to keep the
past behaviour.
:pr:25120 by :user:Guillaume Lemaitre <glemaitre>.
|Fix| :class:model_selection.ValidationCurveDisplay now accepts passing a
list to the param_range parameter.
:pr:27311 by :user:Arturo Amor <ArturoAmorQ>.
|Enhancement| The get_feature_names_out method of the following classes now
raises a NotFittedError if the instance is not fitted. This ensures the error is
consistent in all estimators with the get_feature_names_out method.
- :class:impute.MissingIndicator
- :class:feature_extraction.DictVectorizer
- :class:feature_extraction.text.TfidfTransformer
- :class:feature_selection.GenericUnivariateSelect
- :class:feature_selection.RFE
- :class:feature_selection.RFECV
- :class:feature_selection.SelectFdr
- :class:feature_selection.SelectFpr
- :class:feature_selection.SelectFromModel
- :class:feature_selection.SelectFwe
- :class:feature_selection.SelectKBest
- :class:feature_selection.SelectPercentile
- :class:feature_selection.SequentialFeatureSelector
- :class:feature_selection.VarianceThreshold
- :class:kernel_approximation.AdditiveChi2Sampler
- :class:impute.IterativeImputer
- :class:impute.KNNImputer
- :class:impute.SimpleImputer
- :class:isotonic.IsotonicRegression
- :class:preprocessing.Binarizer
- :class:preprocessing.KBinsDiscretizer
- :class:preprocessing.MaxAbsScaler
- :class:preprocessing.MinMaxScaler
- :class:preprocessing.Normalizer
- :class:preprocessing.OrdinalEncoder
- :class:preprocessing.PowerTransformer
- :class:preprocessing.QuantileTransformer
- :class:preprocessing.RobustScaler
- :class:preprocessing.SplineTransformer
- :class:preprocessing.StandardScaler
- :class:random_projection.GaussianRandomProjection
- :class:random_projection.SparseRandomProjection

The NotFittedError displays an informative message asking to fit the instance
with the appropriate arguments.
:pr:25294, :pr:25308, :pr:25291, :pr:25367, :pr:25402
by :user:John Pangas <jpangas>, :user:Rahil Parikh <rprkh>,
and :user:Alex Buzenet <albuzenet>.
|Enhancement| Added a multi-threaded Cython routine to compute squared Euclidean
distances (sometimes followed by a fused reduction operation) for a pair of
datasets consisting of a sparse CSR matrix and a dense NumPy array.
This can improve the performance of the following functions and estimators:

- :func:sklearn.metrics.pairwise_distances_argmin
- :func:sklearn.metrics.pairwise_distances_argmin_min
- :class:sklearn.cluster.AffinityPropagation
- :class:sklearn.cluster.Birch
- :class:sklearn.cluster.MeanShift
- :class:sklearn.cluster.OPTICS
- :class:sklearn.cluster.SpectralClustering
- :func:sklearn.feature_selection.mutual_info_regression
- :class:sklearn.neighbors.KNeighborsClassifier
- :class:sklearn.neighbors.KNeighborsRegressor
- :class:sklearn.neighbors.RadiusNeighborsClassifier
- :class:sklearn.neighbors.RadiusNeighborsRegressor
- :class:sklearn.neighbors.LocalOutlierFactor
- :class:sklearn.neighbors.NearestNeighbors
- :class:sklearn.manifold.Isomap
- :class:sklearn.manifold.LocallyLinearEmbedding
- :class:sklearn.manifold.TSNE
- :func:sklearn.manifold.trustworthiness
- :class:sklearn.semi_supervised.LabelPropagation
- :class:sklearn.semi_supervised.LabelSpreading

A typical example of this performance improvement happens when passing a sparse
CSR matrix to the predict or transform method of estimators that rely on
a dense NumPy representation to store their fitted parameters (or the reverse).
For instance, :meth:sklearn.neighbors.NearestNeighbors.kneighbors is now up
to 2 times faster for this case on commonly available laptops.
:pr:25044 by :user:Julien Jerphanion <jjerphan>.
|Enhancement| All estimators that internally rely on OpenMP multi-threading
(via Cython) now use a number of threads equal to the number of physical
(instead of logical) cores by default. In the past, we observed that using as
many threads as logical cores on SMT hosts could sometimes cause severe
performance problems depending on the algorithms and the shape of the data.
Note that it is still possible to manually adjust the number of threads used
by OpenMP as documented in :ref:parallelism.
:pr:26082 by :user:Jérémie du Boisberranger <jeremiedbb> and
:user:Olivier Grisel <ogrisel>.
|MajorFeature| :ref:Metadata routing <metadata_routing>'s related base
methods are included in this release. This feature is only available via the
enable_metadata_routing feature flag which can be enabled using
:func:sklearn.set_config and :func:sklearn.config_context. For now this
feature is mostly useful for third party developers to prepare their code
base for metadata routing, and we strongly recommend that they also hide it
behind the same feature flag, rather than having it enabled by default.
:pr:24027 by Adrin Jalali_, :user:Benjamin Bossan <BenjaminBossan>, and
:user:Omar Salman <OmarManzoor>.
sklearn
.........
|Feature| Added a new option skip_parameter_validation to the function
:func:sklearn.set_config and the context manager :func:sklearn.config_context that
allows skipping the validation of the parameters passed to the estimators and public
functions. This can be useful to speed up the code but should be used with care
because it can lead to unexpected behaviors or raise obscure error messages when
setting invalid parameters.
:pr:25815 by :user:Jérémie du Boisberranger <jeremiedbb>.
:mod:sklearn.base
...................
|Feature| A __sklearn_clone__ protocol is now available to override the
default behavior of :func:base.clone. :pr:24568 by Thomas Fan_.
|Fix| :class:base.TransformerMixin now keeps a namedtuple's class
if transform returns a namedtuple. :pr:26121 by Thomas Fan_.
:mod:sklearn.calibration
..........................
|Fix| :class:calibration.CalibratedClassifierCV now does not enforce sample
alignment on fit_params. :pr:25805 by Adrin Jalali_.
:mod:sklearn.cluster
......................
|MajorFeature| Added :class:cluster.HDBSCAN, a modern hierarchical density-based
clustering algorithm. Similarly to :class:cluster.OPTICS, it can be seen as a
generalization of :class:cluster.DBSCAN by allowing for hierarchical instead of flat
clustering, however it varies in its approach from :class:cluster.OPTICS. This
algorithm is very robust with respect to its hyperparameters' values and can
be used on a wide variety of data without much, if any, tuning.
This implementation is an adaptation from the original implementation of HDBSCAN in
scikit-learn-contrib/hdbscan <https://github.com/scikit-learn-contrib/hdbscan>_,
by :user:Leland McInnes <lmcinnes> et al.
:pr:26385 by :user:Meekail Zain <micky774>.
|Enhancement| The sample_weight parameter now will be used in centroids
initialization for :class:cluster.KMeans, :class:cluster.BisectingKMeans
and :class:cluster.MiniBatchKMeans.
This change will break backward compatibility, since numbers generated
from same random seeds will be different.
:pr:25752 by :user:Hleb Levitski <glevv>,
:user:Jérémie du Boisberranger <jeremiedbb>,
:user:Guillaume Lemaitre <glemaitre>.
|Fix| :class:cluster.KMeans, :class:cluster.MiniBatchKMeans and
:func:cluster.k_means now correctly handle the combination of n_init="auto"
and init being an array-like, running one initialization in that case.
:pr:26657 by :user:Binesh Bannerjee <bnsh>.
|API| The sample_weight parameter in predict for
:meth:cluster.KMeans.predict and :meth:cluster.MiniBatchKMeans.predict
is now deprecated and will be removed in v1.5.
:pr:25251 by :user:Hleb Levitski <glevv>.
|API| The Xred argument in :meth:cluster.FeatureAgglomeration.inverse_transform
is renamed to Xt. The old name Xred is deprecated and will be removed in v1.5.
:pr:26503 by Adrin Jalali_.
:mod:sklearn.compose
......................
|Fix| :class:compose.ColumnTransformer raises an informative error when the individual
transformers of ColumnTransformer output pandas dataframes with indexes that are
not consistent with each other and the output is configured to be pandas.
:pr:26286 by Thomas Fan_.
|Fix| :class:compose.ColumnTransformer correctly sets the output of the
remainder when set_output is called. :pr:26323 by Thomas Fan_.
:mod:sklearn.covariance
.........................
|Fix| Allows alpha=0 in :class:covariance.GraphicalLasso to be
consistent with :func:covariance.graphical_lasso.
:pr:26033 by :user:Genesis Valencia <genvalen>.
|Fix| :func:covariance.empirical_covariance now gives an informative
error message when input is not appropriate.
:pr:26108 by :user:Quentin Barthélemy <qbarthelemy>.
|API| Deprecates cov_init in :func:covariance.graphical_lasso in 1.3 since
the parameter has no effect. It will be removed in 1.5.
:pr:26033 by :user:Genesis Valencia <genvalen>.
|API| Adds costs_ fitted attribute in :class:covariance.GraphicalLasso and
:class:covariance.GraphicalLassoCV.
:pr:26033 by :user:Genesis Valencia <genvalen>.
|API| Adds covariance parameter in :class:covariance.GraphicalLasso.
:pr:26033 by :user:Genesis Valencia <genvalen>.
|API| Adds eps parameter in :class:covariance.GraphicalLasso,
:func:covariance.graphical_lasso, and :class:covariance.GraphicalLassoCV.
:pr:26033 by :user:Genesis Valencia <genvalen>.
:mod:sklearn.datasets
.......................
|Enhancement| Allows overriding the parameters used to open the ARFF file using
the parameter read_csv_kwargs in :func:datasets.fetch_openml when using the
pandas parser.
:pr:26433 by :user:Guillaume Lemaitre <glemaitre>.
|Fix| :func:datasets.fetch_openml returns improved data types when
as_frame=True and parser="liac-arff". :pr:26386 by Thomas Fan_.
|Fix| Following the ARFF specs, only the marker "?" is now considered as a missing
value when opening ARFF files fetched using :func:datasets.fetch_openml when using
the pandas parser. The parameter read_csv_kwargs allows overriding this behaviour.
:pr:26551 by :user:Guillaume Lemaitre <glemaitre>.
|Fix| :func:datasets.fetch_openml will consistently use np.nan as missing marker
with both parsers "pandas" and "liac-arff".
:pr:26579 by :user:Guillaume Lemaitre <glemaitre>.
|API| The data_transposed argument of :func:datasets.make_sparse_coded_signal
is deprecated and will be removed in v1.5.
:pr:25784 by :user:Jérémie du Boisberranger <jeremiedbb>.
:mod:sklearn.decomposition
............................
|Efficiency| :class:decomposition.MiniBatchDictionaryLearning and
:class:decomposition.MiniBatchSparsePCA are now faster for small batch sizes by
avoiding duplicate validations.
:pr:25490 by :user:Jérémie du Boisberranger <jeremiedbb>.
|Enhancement| :class:decomposition.DictionaryLearning now accepts the parameter
callback for consistency with the function :func:decomposition.dict_learning.
:pr:24871 by :user:Omar Salman <OmarManzoor>.
|Fix| Treat more consistently small values in the W and H matrices during the
fit and transform steps of :class:decomposition.NMF and
:class:decomposition.MiniBatchNMF which can produce different results than previous
versions. :pr:25438 by :user:Yotam Avidar-Constantini <yotamcons>.
|API| The W argument in :meth:decomposition.NMF.inverse_transform and
:meth:decomposition.MiniBatchNMF.inverse_transform is renamed to Xt. The old
name W is deprecated and will be removed in v1.5. :pr:26503 by Adrin Jalali_.
:mod:sklearn.discriminant_analysis
....................................
|Enhancement| :class:discriminant_analysis.LinearDiscriminantAnalysis now
supports PyTorch <https://pytorch.org/>__. See
:ref:array_api for more details. :pr:25956 by Thomas Fan_.
:mod:sklearn.ensemble
.......................
|Feature| :class:ensemble.HistGradientBoostingRegressor now supports
the Gamma deviance loss via loss="gamma".
Using the Gamma deviance as loss function comes in handy for modelling skewed,
strictly positive targets.
:pr:22409 by :user:Christian Lorentzen <lorentzenchr>.
|Feature| Compute a custom out-of-bag score by passing a callable to
:class:ensemble.RandomForestClassifier, :class:ensemble.RandomForestRegressor,
:class:ensemble.ExtraTreesClassifier and :class:ensemble.ExtraTreesRegressor.
:pr:25177 by Tim Head_.
|Feature| :class:ensemble.GradientBoostingClassifier now exposes
out-of-bag scores via the oob_scores_ or oob_score_ attributes.
:pr:24882 by :user:Ashwin Mathur <awinml>.
|Efficiency| :class:ensemble.IsolationForest predict time is now faster
(typically by a factor of 8 or more). Internally, the estimator now precomputes
decision path lengths per tree at fit time. It is therefore not possible
to load an estimator trained with scikit-learn 1.2 to make it predict with
scikit-learn 1.3: retraining with scikit-learn 1.3 is required.
:pr:25186 by :user:Felipe Breve Siola <fsiola>.
|Efficiency| :class:ensemble.RandomForestClassifier and
:class:ensemble.RandomForestRegressor with warm_start=True now only
recompute out-of-bag scores when n_estimators has actually increased
in subsequent fit calls.
:pr:26318 by :user:Joshua Choo Yun Keat <choo8>.
|Enhancement| :class:ensemble.BaggingClassifier and
:class:ensemble.BaggingRegressor expose the allow_nan tag from the
underlying estimator. :pr:25506 by Thomas Fan_.
|Fix| :meth:ensemble.RandomForestClassifier.fit sets max_samples = 1
when max_samples is a float and round(n_samples * max_samples) < 1.
:pr:25601 by :user:Jan Fidor <JanFidor>.
|Fix| :meth:ensemble.IsolationForest.fit no longer warns about missing
feature names when called with contamination not "auto" on a pandas
dataframe.
:pr:25931 by :user:Yao Xiao <Charlie-XIAO>.
|Fix| :class:ensemble.HistGradientBoostingRegressor and
:class:ensemble.HistGradientBoostingClassifier treat negative values for
categorical features consistently as missing values, following LightGBM's and
pandas' conventions.
:pr:25629 by Thomas Fan_.
|Fix| Fix deprecation of base_estimator in :class:ensemble.AdaBoostClassifier
and :class:ensemble.AdaBoostRegressor that was introduced in :pr:23819.
:pr:26242 by :user:Marko Toplak <markotoplak>.
:mod:sklearn.exceptions
.........................
|Feature| Added :class:exceptions.InconsistentVersionWarning which is raised
when a scikit-learn estimator is unpickled with a scikit-learn version that is
inconsistent with the scikit-learn version the estimator was pickled with.
:pr:25297 by Thomas Fan_.
:mod:sklearn.feature_extraction
.................................
|API| :class:feature_extraction.image.PatchExtractor now follows the
transformer API of scikit-learn. This class is defined as a stateless transformer
meaning that it is not required to call fit before calling transform.
Parameter validation only happens at fit time.
:pr:24230 by :user:Guillaume Lemaitre <glemaitre>.
:mod:sklearn.feature_selection
................................
|Enhancement| All selectors in :mod:sklearn.feature_selection will preserve
a DataFrame's dtype when transformed. :pr:25102 by Thomas Fan_.
|Fix| :class:feature_selection.SequentialFeatureSelector's cv parameter
now supports generators. :pr:25973 by :user:Yao Xiao <Charlie-XIAO>.
:mod:sklearn.impute
.....................
|Enhancement| Added the parameter fill_value to :class:impute.IterativeImputer.
:pr:25232 by :user:Thijs van Weezel <ValueInvestorThijs>.
|Fix| :class:impute.IterativeImputer now correctly preserves the Pandas
Index when set_config(transform_output="pandas") is used. :pr:26454 by Thomas Fan_.
:mod:sklearn.inspection
.........................
|Enhancement| Added support for sample_weight in
:func:inspection.partial_dependence and
:meth:inspection.PartialDependenceDisplay.from_estimator. This allows for
weighted averaging when aggregating for each value of the grid we are making the
inspection on. The option is only available when method is set to brute.
:pr:25209 and :pr:26644 by :user:Carlo Lemos <vitaliset>.
|API| :func:inspection.partial_dependence returns a :class:utils.Bunch with
new key: grid_values. The values key is deprecated in favor of grid_values
and the values key will be removed in 1.5.
:pr:21809 and :pr:25732 by Thomas Fan_.
:mod:sklearn.kernel_approximation
...................................
:class:kernel_approximation.AdditiveChi2Sampler is now stateless.
The sample_interval_ attribute is deprecated and will be removed in 1.5.
:pr:25190 by :user:Vincent Maladière <Vincent-Maladiere>.
:mod:sklearn.linear_model
...........................
|Efficiency| Avoid data scaling when sample_weight=None and other
unnecessary data copies and unexpected dense to sparse data conversion in
:class:linear_model.LinearRegression.
:pr:26207 by :user:Olivier Grisel <ogrisel>.
|Enhancement| :class:linear_model.SGDClassifier,
:class:linear_model.SGDRegressor and :class:linear_model.SGDOneClassSVM
now preserve dtype for numpy.float32.
:pr:25587 by :user:Omar Salman <OmarManzoor>.
|Enhancement| The n_iter_ attribute has been included in
:class:linear_model.ARDRegression to expose the actual number of iterations
required to reach the stopping criterion.
:pr:25697 by :user:John Pangas <jpangas>.
|Fix| Use a more robust criterion to detect convergence of
:class:linear_model.LogisticRegression with penalty="l1" and solver="liblinear"
on linearly separable problems.
:pr:25214 by Tom Dupre la Tour_.
|Fix| Fix a crash when calling fit on
:class:linear_model.LogisticRegression with solver="newton-cholesky" and
max_iter=0 which failed to inspect the state of the model prior to the
first parameter update.
:pr:26653 by :user:Olivier Grisel <ogrisel>.
|API| Deprecates n_iter in favor of max_iter in
:class:linear_model.BayesianRidge and :class:linear_model.ARDRegression.
n_iter will be removed in scikit-learn 1.5. This change makes those
estimators consistent with the rest of estimators.
:pr:25697 by :user:John Pangas <jpangas>.
:mod:sklearn.manifold
.......................
|Fix| :class:manifold.Isomap now correctly preserves the Pandas
Index when set_config(transform_output="pandas") is used. :pr:26454 by Thomas Fan_.
:mod:sklearn.metrics
......................
|Feature| Adds zero_division=np.nan to multiple classification metrics:
:func:metrics.precision_score, :func:metrics.recall_score,
:func:metrics.f1_score, :func:metrics.fbeta_score,
:func:metrics.precision_recall_fscore_support,
:func:metrics.classification_report. When zero_division=np.nan and there is a
zero division, the metric is undefined and is excluded from averaging. When not used
for averages, the value returned is np.nan.
:pr:25531 by :user:Marc Torrellas Socastro <marctorsoc>.
|Feature| :func:metrics.average_precision_score now supports the
multiclass case.
:pr:17388 by :user:Geoffrey Bolmier <gbolmier> and
:pr:24769 by :user:Ashwin Mathur <awinml>.
|Efficiency| The computation of the expected mutual information in
:func:metrics.adjusted_mutual_info_score is now faster when the number of
unique labels is large and its memory usage is reduced in general.
:pr:25713 by :user:Kshitij Mathur <Kshitij68>,
:user:Guillaume Lemaitre <glemaitre>, :user:Omar Salman <OmarManzoor> and
:user:Jérémie du Boisberranger <jeremiedbb>.
|Enhancement| :func:metrics.silhouette_samples now accepts a sparse
matrix of pairwise distances between samples, or a feature array.
:pr:18723 by :user:Sahil Gupta <sahilgupta2105> and
:pr:24677 by :user:Ashwin Mathur <awinml>.
|Enhancement| A new parameter drop_intermediate was added to
:func:metrics.precision_recall_curve,
:func:metrics.PrecisionRecallDisplay.from_estimator,
:func:metrics.PrecisionRecallDisplay.from_predictions,
which drops some suboptimal thresholds to create lighter precision-recall
curves.
:pr:24668 by :user:dberenbaum.
|Enhancement| :meth:metrics.RocCurveDisplay.from_estimator and
:meth:metrics.RocCurveDisplay.from_predictions now accept two new keywords,
plot_chance_level and chance_level_kw to plot the baseline chance
level. This line is exposed in the chance_level_ attribute.
:pr:25987 by :user:Yao Xiao <Charlie-XIAO>.
|Enhancement| :meth:metrics.PrecisionRecallDisplay.from_estimator and
:meth:metrics.PrecisionRecallDisplay.from_predictions now accept two new
keywords, plot_chance_level and chance_level_kw to plot the baseline
chance level. This line is exposed in the chance_level_ attribute.
:pr:26019 by :user:Yao Xiao <Charlie-XIAO>.
|Fix| :func:metrics.pairwise.manhattan_distances now supports readonly sparse datasets.
:pr:25432 by :user:Julien Jerphanion <jjerphan>.
|Fix| Fixed :func:metrics.classification_report so that empty input will return
np.nan. Previously, "macro avg" and "weighted avg" would return
e.g. f1-score=np.nan and f1-score=0.0 respectively, which was inconsistent. Now,
they both return np.nan.
:pr:25531 by :user:Marc Torrellas Socastro <marctorsoc>.
|Fix| :func:metrics.ndcg_score now gives a meaningful error message for input of
length 1.
:pr:25672 by :user:Lene Preuss <lene> and :user:Wei-Chun Chu <wcchu>.
|Fix| :func:metrics.log_loss raises a warning if the values of the parameter
y_pred are not normalized, instead of actually normalizing them in the metric.
Starting from 1.5 this will raise an error.
:pr:25299 by :user:Omar Salman <OmarManzoor>.
|Fix| In :func:metrics.roc_curve, use the threshold value np.inf instead of
arbitrary max(y_score) + 1. This threshold is associated with the ROC curve point
tpr=0 and fpr=0.
:pr:26194 by :user:Guillaume Lemaitre <glemaitre>.
|Fix| The 'matching' metric has been removed when using SciPy>=1.9
to be consistent with scipy.spatial.distance which does not support
'matching' anymore.
:pr:26264 by :user:Barata T. Onggo <magnusbarata>.
|API| The eps parameter of the :func:metrics.log_loss has been deprecated and
will be removed in 1.5. :pr:25299 by :user:Omar Salman <OmarManzoor>.
:mod:sklearn.gaussian_process
...............................
:class:gaussian_process.GaussianProcessRegressor has a new argument
n_targets, which is used to decide the number of outputs when sampling
from the prior distributions. :pr:23099 by :user:Zhehao Liu <MaxwellLZH>.
:mod:sklearn.mixture
......................
|Efficiency| :class:mixture.GaussianMixture is more efficient now and will bypass
unnecessary initialization if the weights, means, and precisions are
given by users.
:pr:26021 by :user:Jiawei Zhang <jiawei-zhang-a>.
:mod:sklearn.model_selection
..............................
|MajorFeature| Added the class :class:model_selection.ValidationCurveDisplay
that allows easy plotting of validation curves obtained by the function
:func:model_selection.validation_curve.
:pr:25120 by :user:Guillaume Lemaitre <glemaitre>.
|API| The parameter log_scale in the method plot of the class
:class:model_selection.LearningCurveDisplay has been deprecated in 1.3 and
will be removed in 1.5. The default scale can be overridden by setting it
directly on the ax object and will be set automatically from the spacing
of the data points otherwise.
:pr:25120 by :user:Guillaume Lemaitre <glemaitre>.
|Enhancement| :func:model_selection.cross_validate accepts a new parameter
return_indices to return the train-test indices of each cv split.
:pr:25659 by :user:Guillaume Lemaitre <glemaitre>.
:mod:sklearn.multioutput
..........................
|Fix| :func:getattr on :meth:multioutput.MultiOutputRegressor.partial_fit
and :meth:multioutput.MultiOutputClassifier.partial_fit now correctly raise
an AttributeError if done before calling fit. :pr:26333 by Adrin Jalali_.
:mod:sklearn.naive_bayes
..........................
|Fix| :class:naive_bayes.GaussianNB no longer raises a ZeroDivisionError
when the provided sample_weight reduces the problem to a single class in fit.
:pr:24140 by :user:Jonathan Ohayon <Johayon> and :user:Chiara Marmo <cmarmo>.
:mod:sklearn.neighbors
........................
|Enhancement| The performance of :meth:neighbors.KNeighborsClassifier.predict
and of :meth:neighbors.KNeighborsClassifier.predict_proba has been improved
when n_neighbors is large and algorithm="brute" with non Euclidean metrics.
:pr:24076 by :user:Meekail Zain <micky774>, :user:Julien Jerphanion <jjerphan>.
|Fix| Remove support for KulsinskiDistance in :class:neighbors.BallTree. This
dissimilarity is not a metric and cannot be supported by the BallTree.
:pr:25417 by :user:Guillaume Lemaitre <glemaitre>.
|API| The support for metrics other than euclidean and manhattan and for
callables in :class:neighbors.NearestNeighbors is deprecated and will be removed in
version 1.5. :pr:24083 by :user:Valentin Laurent <Valentin-Laurent>.
:mod:sklearn.neural_network
.............................
|Fix| :class:neural_network.MLPRegressor and :class:neural_network.MLPClassifier
report the right n_iter_ when warm_start=True. It corresponds to the number
of iterations performed on the current call to fit instead of the total number
of iterations performed since the initialization of the estimator.
:pr:25443 by :user:Marvin Krawutschke <Marvvxi>.
:mod:sklearn.pipeline
.......................
|Feature| :class:pipeline.FeatureUnion can now use indexing notation (e.g.
feature_union["scalar"]) to access transformers by name. :pr:25093 by
Thomas Fan_.
|Feature| :class:pipeline.FeatureUnion can now access the
feature_names_in_ attribute if the X value seen during .fit has a
columns attribute and all columns are strings, e.g. when X is a
pandas.DataFrame.
:pr:25220 by :user:Ian Thompson <it176131>.
|Fix| :meth:pipeline.Pipeline.fit_transform now raises an AttributeError
if the last step of the pipeline does not support fit_transform.
:pr:26325 by Adrin Jalali_.
:mod:sklearn.preprocessing
............................
|MajorFeature| Introduces :class:preprocessing.TargetEncoder which is a
categorical encoding based on target mean conditioned on the value of the
category. :pr:25334 by Thomas Fan_.
|Feature| :class:preprocessing.OrdinalEncoder now supports grouping
infrequent categories into a single feature. Grouping infrequent categories
is enabled by specifying how to select infrequent categories with
min_frequency or max_categories. :pr:25677 by Thomas Fan_.
|Enhancement| :class:preprocessing.PolynomialFeatures now calculates the
number of expanded terms a priori when dealing with sparse CSR matrices
in order to optimize the choice of dtype for indices and indptr. It
can now output CSR matrices with np.int32 indices/indptr components
when there are few enough elements, and will automatically use np.int64
for sufficiently large matrices.
:pr:20524 by :user:niuk-a <niuk-a> and
:pr:23731 by :user:Meekail Zain <micky774>.
|Enhancement| A new parameter sparse_output was added to
:class:preprocessing.SplineTransformer, available as of SciPy 1.8. If
sparse_output=True, :class:preprocessing.SplineTransformer returns a sparse
CSR matrix. :pr:24145 by :user:Christian Lorentzen <lorentzenchr>.
|Enhancement| Adds a feature_name_combiner parameter to
:class:preprocessing.OneHotEncoder. This specifies a custom callable to
create feature names to be returned by
:meth:preprocessing.OneHotEncoder.get_feature_names_out. The callable
combines input arguments (input_feature, category) to a string.
:pr:22506 by :user:Mario Kostelac <mariokostelac>.
|Enhancement| Added support for sample_weight in
:class:preprocessing.KBinsDiscretizer. This allows specifying the parameter
sample_weight for each sample to be used while fitting. The option is only
available when strategy is set to quantile or kmeans.
:pr:24935 by :user:Seladus <seladus>, :user:Guillaume Lemaitre <glemaitre>, and
:user:Dea María Léon <deamarialeon>, :pr:25257 by :user:Hleb Levitski <glevv>.
|Enhancement| Subsampling through the subsample parameter can now be used in
:class:preprocessing.KBinsDiscretizer regardless of the strategy used.
:pr:26424 by :user:Jérémie du Boisberranger <jeremiedbb>.
|Fix| :class:preprocessing.PowerTransformer now correctly preserves the Pandas
Index when set_config(transform_output="pandas") is used. :pr:26454 by Thomas Fan_.
|Fix| :class:preprocessing.PowerTransformer now correctly raises an error when
using method="box-cox" on data with a constant np.nan column.
:pr:26400 by :user:Yao Xiao <Charlie-XIAO>.
|Fix| :class:preprocessing.PowerTransformer with method="yeo-johnson" now leaves
constant features unchanged instead of transforming with an arbitrary value for
the lambdas_ fitted parameter.
:pr:26566 by :user:Jérémie du Boisberranger <jeremiedbb>.
|API| The default value of the subsample parameter of
:class:preprocessing.KBinsDiscretizer will change from None to 200_000 in
version 1.5 when strategy="kmeans" or strategy="uniform".
:pr:26424 by :user:Jérémie du Boisberranger <jeremiedbb>.
:mod:sklearn.svm
..................
The dual parameter now accepts the auto option for
:class:svm.LinearSVC and :class:svm.LinearSVR.
:pr:26093 by :user:Hleb Levitski <glevv>.
:mod:sklearn.tree
...................
|MajorFeature| :class:tree.DecisionTreeRegressor and
:class:tree.DecisionTreeClassifier support missing values when
splitter='best' and criterion is gini, entropy, or log_loss
for classification, or squared_error, friedman_mse, or poisson
for regression. :pr:23595, :pr:26376 by Thomas Fan_.
|Enhancement| Adds a class_names parameter to
:func:tree.export_text. This allows specifying a name for each target class,
given in ascending numerical order.
:pr:25387 by :user:William M <Akbeeh> and :user:crispinlogan <crispinlogan>.
|Fix| :func:tree.export_graphviz and :func:tree.export_text now accept
feature_names and class_names as array-like rather than lists.
:pr:26289 by :user:Yao Xiao <Charlie-XIAO>.
:mod:sklearn.utils
....................
|Fix| Fixes :func:utils.check_array to properly convert pandas
extension arrays. :pr:25813 and :pr:26106 by Thomas Fan_.
|Fix| :func:utils.check_array now supports pandas DataFrames with
extension arrays and object dtypes by returning an ndarray with object dtype.
:pr:25814 by Thomas Fan_.
|API| utils.estimator_checks.check_transformers_unfitted_stateless has been
introduced to ensure stateless transformers don't raise NotFittedError
during transform with no prior call to fit or fit_transform.
:pr:25190 by :user:Vincent Maladière <Vincent-Maladiere>.
|API| A FutureWarning is now raised when instantiating a class which inherits from
a deprecated base class (i.e. decorated by :class:utils.deprecated) and which
overrides the __init__ method.
:pr:25733 by :user:Brigitta Sipőcz <bsipocz> and
:user:Jérémie du Boisberranger <jeremiedbb>.
:mod:sklearn.semi_supervised
..............................
semi_supervised.LabelSpreading.fit and
:meth:semi_supervised.LabelPropagation.fit now accept sparse metrics.
:pr:19664 by :user:Kaushik Amar Das <cozek>.
Miscellaneous
.............
EnvironmentError, IOError and
WindowsError.
:pr:26466 by :user:Dimitri Papadopoulos Orfanos <DimitriPapadopoulos>.
.. rubric:: Code and documentation contributors
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 1.2, including:
2357juan, Abhishek Singh Kushwah, Adam Handke, Adam Kania, Adam Li, adienes, Admir Demiraj, adoublet, Adrin Jalali, A.H.Mansouri, Ahmedbgh, Ala-Na, Alex Buzenet, AlexL, Ali H. El-Kassas, amay, András Simon, André Pedersen, Andrew Wang, Ankur Singh, annegnx, Ansam Zedan, Anthony22-dev, Artur Hermano, Arturo Amor, as-90, ashah002, Ashish Dutt, Ashwin Mathur, AymericBasset, Azaria Gebremichael, Barata Tripramudya Onggo, Benedek Harsanyi, Benjamin Bossan, Bharat Raghunathan, Binesh Bannerjee, Boris Feld, Brendan Lu, Brevin Kunde, cache-missing, Camille Troillard, Carla J, carlo, Carlo Lemos, c-git, Changyao Chen, Chiara Marmo, Christian Lorentzen, Christian Veenhuis, Christine P. Chai, crispinlogan, Da-Lan, DanGonite57, Dave Berenbaum, davidblnc, david-cortes, Dayne, Dea María Léon, Denis, Dimitri Papadopoulos Orfanos, Dimitris Litsidis, Dmitry Nesterov, Dominic Fox, Dominik Prodinger, Edern, Ekaterina Butyugina, Elabonga Atuo, Emir, farhan khan, Felipe Siola, futurewarning, Gael Varoquaux, genvalen, Hleb Levitski, Guillaume Lemaitre, gunesbayir, Haesun Park, hujiahong726, i-aki-y, Ian Thompson, Ido M, Ily, Irene, Jack McIvor, jakirkham, James Dean, JanFidor, Jarrod Millman, JB Mountford, Jérémie du Boisberranger, Jessicakk0711, Jiawei Zhang, Joey Ortiz, JohnathanPi, John Pangas, Joshua Choo Yun Keat, Joshua Hedlund, JuliaSchoepp, Julien Jerphanion, jygerardy, ka00ri, Kaushik Amar Das, Kento Nozawa, Kian Eliasi, Kilian Kluge, Lene Preuss, Linus, Logan Thomas, Loic Esteve, Louis Fouquet, Lucy Liu, Madhura Jayaratne, Marc Torrellas Socastro, Maren Westermann, Mario Kostelac, Mark Harfouche, Marko Toplak, Marvin Krawutschke, Masanori Kanazu, mathurinm, Matt Haberland, Max Halford, maximeSaur, Maxwell Liu, m. bou, mdarii, Meekail Zain, Mikhail Iljin, murezzda, Nawazish Alam, Nicola Fanelli, Nightwalkx, Nikolay Petrov, Nishu Choudhary, NNLNR, npache, Olivier Grisel, Omar Salman, ouss1508, PAB, Pandata, partev, Peter Piontek, Phil, pnucci, Pooja M, Pooja Subramaniam, precondition, Quentin Barthélemy, Rafal Wojdyla, Raghuveer Bhat, Rahil Parikh, Ralf Gommers, ram vikram singh, Rushil Desai, Sadra Barikbin, SANJAI_3, Sashka Warner, Scott Gigante, Scott Gustafson, searchforpassion, Seoeun Hong, Shady el Gewily, Shiva chauhan, Shogo Hida, Shreesha Kumar Bhat, sonnivs, Sortofamudkip, Stanislav (Stanley) Modrak, Stefanie Senger, Steven Van Vaerenbergh, Tabea Kossen, Théophile Baranger, Thijs van Weezel, Thomas A Caswell, Thomas Germer, Thomas J. Fan, Tim Head, Tim P, Tom Dupré la Tour, tomiock, tspeng, Valentin Laurent, Veghit, VIGNESH D, Vijeth Moudgalya, Vinayak Mehta, Vincent M, Vincent-violet, Vyom Pathak, William M, windiana42, Xiao Yuan, Yao Xiao, Yaroslav Halchenko, Yotam Avidar-Constantini, Yuchen Zhou, Yusuf Raji, zeeshan lone