.. include:: _contributors.rst

.. currentmodule:: sklearn

.. _release_notes_0_24:

============
Version 0.24
============

For a short description of the main highlights of the release, please refer to :ref:sphx_glr_auto_examples_release_highlights_plot_release_highlights_0_24_0.py.

.. include:: changelog_legend.inc

.. _changes_0_24_2:

Version 0.24.2

April 2021

Changelog

:mod:sklearn.compose ......................

  • |Fix| compose.ColumnTransformer.get_feature_names does not call get_feature_names on transformers with an empty column selection. :pr:19579 by Thomas Fan_.

:mod:sklearn.cross_decomposition ..................................

  • |Fix| Fixed a regression in :class:cross_decomposition.CCA. :pr:19646 by Thomas Fan_.

  • |Fix| :class:cross_decomposition.PLSRegression raises warning for constant y residuals instead of a StopIteration error. :pr:19922 by Thomas Fan_.

:mod:sklearn.decomposition ............................

  • |Fix| Fixed a bug in :class:decomposition.KernelPCA's inverse_transform. :pr:19732 by :user:Kei Ishikawa <kstoneriv3>.

:mod:sklearn.ensemble .......................

  • |Fix| Fixed a bug in :class:ensemble.HistGradientBoostingRegressor fit with sample_weight parameter and least_absolute_deviation loss function. :pr:19407 by :user:Vadim Ushtanit <vadim-ushtanit>.

:mod:sklearn.feature_extraction .................................

  • |Fix| Fixed a bug to support multiple strings for a category when sparse=False in :class:feature_extraction.DictVectorizer. :pr:19982 by :user:Guillaume Lemaitre <glemaitre>.

:mod:sklearn.gaussian_process ...............................

  • |Fix| Avoid explicitly forming inverse covariance matrix in :class:gaussian_process.GaussianProcessRegressor when set to output standard deviation. With certain covariance matrices this inverse is unstable to compute explicitly. Calling Cholesky solver mitigates this issue in computation. :pr:19939 by :user:Ian Halvic <iwhalvic>.

  • |Fix| Avoid division by zero when scaling a constant target in :class:gaussian_process.GaussianProcessRegressor. The problem was caused by a standard deviation equal to 0. Such a case is now detected and the standard deviation is set to 1, which avoids the division by zero and thus the presence of NaN values in the normalized target. :pr:19703 by :user:sobkevich, :user:Boris Villazón-Terrazas <boricles> and :user:Alexandr Fonari <afonari>.

:mod:sklearn.linear_model ...........................

  • |Fix| Fixed a bug in :class:linear_model.LogisticRegression: the sample_weight object is no longer modified. :pr:19182 by :user:Yosuke KOBAYASHI <m7142yosuke>.

:mod:sklearn.metrics ......................

  • |Fix| :func:metrics.top_k_accuracy_score now supports multiclass problems where only two classes appear in y_true and all the classes are specified in labels. :pr:19721 by :user:Joris Clement <flyingdutchman23>.

:mod:sklearn.model_selection ..............................

  • |Fix| :class:model_selection.RandomizedSearchCV and :class:model_selection.GridSearchCV now correctly show the score for single metrics and verbose > 2. :pr:19659 by Thomas Fan_.

  • |Fix| Some values in the cv_results_ attribute of :class:model_selection.HalvingRandomSearchCV and :class:model_selection.HalvingGridSearchCV were not properly converted to numpy arrays. :pr:19211 by Nicolas Hug_.

  • |Fix| The fit method of the successive halving parameter search (:class:model_selection.HalvingGridSearchCV, and :class:model_selection.HalvingRandomSearchCV) now correctly handles the groups parameter. :pr:19847 by :user:Xiaoyu Chai <xiaoyuchai>.

:mod:sklearn.multioutput ..........................

  • |Fix| :class:multioutput.MultiOutputRegressor now works with estimators that dynamically define predict during fitting, such as :class:ensemble.StackingRegressor. :pr:19308 by Thomas Fan_.

:mod:sklearn.preprocessing ............................

  • |Fix| Validate the constructor parameter handle_unknown in :class:preprocessing.OrdinalEncoder to only allow for 'error' and 'use_encoded_value' strategies. :pr:19234 by :user:Guillaume Lemaitre <glemaitre>.

  • |Fix| Fix encoder categories having dtype='S' in :class:preprocessing.OneHotEncoder and :class:preprocessing.OrdinalEncoder. :pr:19727 by :user:Andrew Delong <andrewdelong>.

  • |Fix| :meth:preprocessing.OrdinalEncoder.transform correctly handles unknown values for string dtypes. :pr:19888 by Thomas Fan_.

  • |Fix| :meth:preprocessing.OneHotEncoder.fit no longer alters the drop parameter. :pr:19924 by Thomas Fan_.

:mod:sklearn.semi_supervised ..............................

  • |Fix| Avoid NaN during label propagation in :class:~sklearn.semi_supervised.LabelPropagation. :pr:19271 by :user:Zhaowei Wang <ThuWangzw>.

:mod:sklearn.tree ...................

  • |Fix| Fix a bug in fit of tree.BaseDecisionTree that caused segmentation faults under certain conditions. fit now deep copies the Criterion object to prevent shared concurrent accesses. :pr:19580 by :user:Samuel Brice <samdbrice> and :user:Alex Adamson <aadamson> and :user:Wil Yegelwel <wyegelwel>.

:mod:sklearn.utils ....................

  • |Fix| Better contains the CSS provided by :func:utils.estimator_html_repr by giving CSS ids to the html representation. :pr:19417 by Thomas Fan_.

.. _changes_0_24_1:

Version 0.24.1

January 2021

Packaging

The 0.24.0 scikit-learn wheels did not work with macOS <10.15 due to libomp. The version of libomp used to build the wheels was too recent for older macOS versions. This issue has been fixed for the 0.24.1 scikit-learn wheels. Scikit-learn wheels published on PyPI.org now officially support macOS 10.13 and later.

Changelog

:mod:sklearn.metrics ......................

  • |Fix| Fix numerical stability bug that could happen in :func:metrics.adjusted_mutual_info_score and :func:metrics.mutual_info_score with NumPy 1.20+. :pr:19179 by Thomas Fan_.

:mod:sklearn.semi_supervised ..............................

  • |Fix| :class:semi_supervised.SelfTrainingClassifier now accepts meta-estimators (e.g. :class:ensemble.StackingClassifier). The validation of this estimator is done on the fitted estimator, once we know the existence of the method predict_proba. :pr:19126 by :user:Guillaume Lemaitre <glemaitre>.

.. _changes_0_24:

Version 0.24.0

December 2020

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

  • |Fix| :class:decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data when the kernel has small positive eigenvalues.

  • |Fix| :class:decomposition.TruncatedSVD becomes deterministic by exposing a random_state parameter.

  • |Fix| :class:linear_model.Perceptron when penalty='elasticnet'.

  • |Fix| Change in the random sampling procedures for the center initialization of :class:cluster.KMeans.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

:mod:sklearn.base ...................

  • |Fix| :meth:base.BaseEstimator.get_params now will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None. :pr:17448 by :user:Juan Carlos Alfaro Jiménez <alfaro96>.
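
The get_params change can be illustrated with a minimal sketch. Both estimator classes below are hypothetical and exist only for this example; the first one deliberately omits the instance attribute so that get_params has nothing to retrieve.

```python
from sklearn.base import BaseEstimator


class BrokenEstimator(BaseEstimator):
    """Hypothetical estimator that forgets to store its ``alpha`` parameter."""

    def __init__(self, alpha=1.0):
        # Deliberately missing: self.alpha = alpha
        pass


class WellFormedEstimator(BaseEstimator):
    """Hypothetical estimator that stores every constructor parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


# Parameters are read back from the instance attributes.
params = WellFormedEstimator(alpha=0.5).get_params()

# With the attribute missing, get_params raises instead of returning None.
try:
    BrokenEstimator().get_params()
    raised = False
except AttributeError:
    raised = True
```

Storing each constructor argument under an attribute of the same name keeps get_params, set_params and cloning working as expected.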

:mod:sklearn.calibration ..........................

  • |Efficiency| :class:calibration.CalibratedClassifierCV.fit now supports parallelization via joblib.Parallel using argument n_jobs. :pr:17107 by :user:Julien Jerphanion <jjerphan>.

  • |Enhancement| Allow :class:calibration.CalibratedClassifierCV to be used with a prefit :class:pipeline.Pipeline where the data X is not array-like, a sparse matrix or a dataframe at the start. :pr:17546 by :user:Lucy Liu <lucyleeow>.

  • |Enhancement| Add ensemble parameter to :class:calibration.CalibratedClassifierCV, which enables implementation of calibration via an ensemble of calibrators (current method) or just one calibrator using all the data (similar to the built-in feature of :mod:sklearn.svm estimators with the probability=True parameter). :pr:17856 by :user:Lucy Liu <lucyleeow> and :user:Andrea Esuli <aesuli>.
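
As a rough sketch of the ensemble parameter, the snippet below fits the same calibrated model twice on synthetic data. The dataset, the choice of LogisticRegression and cv=3 are assumptions made for illustration; the base classifier is passed positionally so the example does not depend on its keyword name.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# ensemble=True: one (classifier, calibrator) pair is kept per CV fold.
ensemble_model = CalibratedClassifierCV(LogisticRegression(), cv=3, ensemble=True)
ensemble_model.fit(X, y)

# ensemble=False: a single calibrator is fitted on cross-validated predictions.
single_model = CalibratedClassifierCV(LogisticRegression(), cv=3, ensemble=False)
single_model.fit(X, y)

n_ensemble = len(ensemble_model.calibrated_classifiers_)
n_single = len(single_model.calibrated_classifiers_)
proba = single_model.predict_proba(X[:5])
```

With ensemble=False only one calibrated classifier is stored, which keeps prediction cheaper at the cost of losing the averaging over folds.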

:mod:sklearn.cluster ......................

  • |Enhancement| :class:cluster.AgglomerativeClustering has a new parameter compute_distances. When set to True, distances between clusters are computed and stored in the distances_ attribute even when the parameter distance_threshold is not used. This new parameter is useful to produce dendrogram visualizations, but introduces a computational and memory overhead. :pr:17984 by :user:Michael Riedmann <mriedmann>, :user:Emilie Delattre <EmilieDel>, and :user:Francesco Casalegno <FrancescoCasalegno>.

  • |Enhancement| :class:cluster.SpectralClustering and :func:cluster.spectral_clustering have a new keyword argument verbose. When set to True, additional messages will be displayed which can aid with debugging. :pr:18052 by :user:Sean O. Stalley <sstalley>.

  • |Enhancement| Added :func:cluster.kmeans_plusplus as a public function. Initialization by KMeans++ can now be called separately to generate initial cluster centroids. :pr:17937 by :user:g-walsh.

  • |API| :class:cluster.MiniBatchKMeans attributes, counts_ and init_size_, are deprecated and will be removed in 1.1 (renaming of 0.26). :pr:17864 by :user:Jérémie du Boisberranger <jeremiedbb>.
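
A minimal sketch of calling the KMeans++ seeding on its own, assuming a tiny hand-made dataset with three loose groups (the data and random_state are illustrative only):

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

# Six 2-D points forming three loose groups (illustrative data).
X = np.array(
    [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [10.0, 0.0], [9.8, 0.3]]
)

# Returns the chosen seed points and their row indices in X.
centers, indices = kmeans_plusplus(X, n_clusters=3, random_state=0)
```

The returned centers are rows of X, so they can be inspected directly or passed as the init array of a KMeans estimator.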

:mod:sklearn.compose ......................

  • |Fix| :class:compose.ColumnTransformer will skip transformers whose column selector is a list of bools that are all False. :pr:17616 by Thomas Fan_.

  • |Fix| :class:compose.ColumnTransformer now displays the remainder in the diagram display. :pr:18167 by Thomas Fan_.

  • |Fix| :class:compose.ColumnTransformer enforces strict count and order of column names between fit and transform by raising an error instead of a warning, following the deprecation cycle. :pr:18256 by :user:Madhura Jayaratne <madhuracj>.

:mod:sklearn.covariance .........................

  • |API| Deprecates cv_alphas_ in favor of cv_results_['alphas'] and grid_scores_ in favor of split scores in cv_results_ in :class:covariance.GraphicalLassoCV. cv_alphas_ and grid_scores_ will be removed in version 1.1 (renaming of 0.26). :pr:16392 by Thomas Fan_.

:mod:sklearn.cross_decomposition ..................................

  • |Fix| Fixed a bug in :class:cross_decomposition.PLSSVD which would sometimes return components in the reversed order of importance. :pr:17095 by Nicolas Hug_.

  • |Fix| Fixed a bug in :class:cross_decomposition.PLSSVD, :class:cross_decomposition.CCA, and :class:cross_decomposition.PLSCanonical, which would lead to incorrect predictions for est.transform(Y) when the training data is single-target. :pr:17095 by Nicolas Hug_.

  • |Fix| Increases the stability of :class:cross_decomposition.CCA :pr:18746 by Thomas Fan_.

  • |API| The bounds of the n_components parameter are now restricted:

    • into [1, min(n_samples, n_features, n_targets)], for :class:cross_decomposition.PLSSVD, :class:cross_decomposition.CCA, and :class:cross_decomposition.PLSCanonical.
    • into [1, n_features] for :class:cross_decomposition.PLSRegression.

    An error will be raised in 1.1 (renaming of 0.26). :pr:17095 by Nicolas Hug_.

  • |API| For :class:cross_decomposition.PLSSVD, :class:cross_decomposition.CCA, and :class:cross_decomposition.PLSCanonical, the x_scores_ and y_scores_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). They can be retrieved by calling transform on the training data. The norm_y_weights attribute will also be removed. :pr:17095 by Nicolas Hug_.

  • |API| For :class:cross_decomposition.PLSRegression, :class:cross_decomposition.PLSCanonical, :class:cross_decomposition.CCA, and :class:cross_decomposition.PLSSVD, the x_mean_, y_mean_, x_std_, and y_std_ attributes were deprecated and will be removed in 1.1 (renaming of 0.26). :pr:18768 by :user:Maren Westermann <marenwestermann>.

  • |Fix| :class:decomposition.TruncatedSVD becomes deterministic by using the random_state. It controls the weights' initialization of the underlying ARPACK solver. :pr:18302 by :user:Gaurav Desai <gauravkdesai> and :user:Ivan Panico <FollowKenny>.

:mod:sklearn.datasets .......................

  • |Feature| :func:datasets.fetch_openml now validates md5 checksum of arff files downloaded or cached to ensure data integrity. :pr:14800 by :user:Shashank Singh <shashanksingh28> and Joel Nothman_.

  • |Enhancement| :func:datasets.fetch_openml now allows argument as_frame to be 'auto', which tries to convert returned data to pandas DataFrame unless data is sparse. :pr:17396 by :user:Jiaxiang <fujiaxiang>.

  • |Enhancement| :func:datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object's data and frame members are pandas DataFrames, and the target member is a pandas Series. :pr:17491 by :user:Alex Liang <tianchuliang>.

  • |Enhancement| :func:datasets.fetch_kddcup99 now supports the optional argument as_frame; when it is set to True, the returned Bunch object's data and frame members are pandas DataFrames, and the target member is a pandas Series. :pr:18280 by :user:Alex Liang <tianchuliang> and Guillaume Lemaitre_.

  • |Enhancement| :func:datasets.fetch_20newsgroups_vectorized now supports loading as a pandas DataFrame by setting as_frame=True. :pr:17499 by :user:Brigitta Sipőcz <bsipocz> and Guillaume Lemaitre_.

  • |API| The default value of as_frame in :func:datasets.fetch_openml is changed from False to 'auto'. :pr:17610 by :user:Jiaxiang <fujiaxiang>.

:mod:sklearn.decomposition ............................

  • |API| For :class:decomposition.NMF, the init value used when init=None and n_components <= min(n_samples, n_features) will be changed from 'nndsvd' to 'nndsvda' in 1.1 (renaming of 0.26). :pr:18525 by :user:Chiara Marmo <cmarmo>.

  • |Enhancement| :class:decomposition.FactorAnalysis now supports the optional argument rotation, which can take the value None, 'varimax' or 'quartimax'. :pr:11064 by :user:Jona Sassenhagen <jona-sassenhagen>.

  • |Enhancement| :class:decomposition.NMF now supports the optional parameter regularization, which can take the values None, 'components', 'transformation' or 'both', in accordance with :func:decomposition.non_negative_factorization. :pr:17414 by :user:Bharat Raghunathan <bharatr21>.

  • |Fix| :class:decomposition.KernelPCA behaviour is now more consistent between 32-bit and 64-bit data input when the kernel has small positive eigenvalues. Small positive eigenvalues were not correctly discarded for 32-bit data. :pr:18149 by :user:Sylvain Marié <smarie>.

  • |Fix| Fix :class:decomposition.SparseCoder such that it follows scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 1.1 (renaming of 0.26). This attribute was redundant with the dictionary attribute and constructor parameter. :pr:17679 by :user:Xavier Dupré <sdpython>.

  • |Fix| :meth:decomposition.TruncatedSVD.fit_transform consistently returns the same as :meth:decomposition.TruncatedSVD.fit followed by :meth:decomposition.TruncatedSVD.transform. :pr:18528 by :user:Albert Villanova del Moral <albertvillanova> and :user:Ruifeng Zheng <zhengruifeng>.

:mod:sklearn.discriminant_analysis ....................................

  • |Enhancement| :class:discriminant_analysis.LinearDiscriminantAnalysis can now use custom covariance estimate by setting the covariance_estimator parameter. :pr:14446 by :user:Hugo Richard <hugorichard>.

:mod:sklearn.ensemble .......................

  • |MajorFeature| :class:ensemble.HistGradientBoostingRegressor and :class:ensemble.HistGradientBoostingClassifier now have native support for categorical features with the categorical_features parameter. :pr:18394 by Nicolas Hug_ and Thomas Fan_.

  • |Feature| :class:ensemble.HistGradientBoostingRegressor and :class:ensemble.HistGradientBoostingClassifier now support the method staged_predict, which allows monitoring of each stage. :pr:16985 by :user:Hao Chun Chang <haochunchang>.

  • |Efficiency| Break cyclic references in the tree nodes used internally in :class:ensemble.HistGradientBoostingRegressor and :class:ensemble.HistGradientBoostingClassifier to allow for the timely garbage collection of large intermediate data structures and to improve memory usage in fit. :pr:18334 by Olivier Grisel_, Nicolas Hug_, Thomas Fan_ and Andreas Müller_.

  • |Efficiency| Histogram initialization is now done in parallel in :class:ensemble.HistGradientBoostingRegressor and :class:ensemble.HistGradientBoostingClassifier, which results in a speed improvement for problems that build a lot of nodes on multicore machines. :pr:18341 by Olivier Grisel_, Nicolas Hug_, Thomas Fan_, and :user:Egor Smirnov <SmirnovEgorRu>.

  • |Fix| Fixed a bug in :class:ensemble.HistGradientBoostingRegressor and :class:ensemble.HistGradientBoostingClassifier which can now accept data with uint8 dtype in predict. :pr:18410 by Nicolas Hug_.

  • |API| The attribute n_classes_ is now deprecated in :class:ensemble.GradientBoostingRegressor and returns 1. :pr:17702 by :user:Simona Maggio <simonamaggio>.

  • |API| Mean absolute error ('mae') is now deprecated for the parameter criterion in :class:ensemble.GradientBoostingRegressor and :class:ensemble.GradientBoostingClassifier. :pr:18326 by :user:Madhura Jayaratne <madhuracj>.

:mod:sklearn.exceptions .........................

  • |API| exceptions.ChangedBehaviorWarning and exceptions.NonBLASDotWarning are deprecated and will be removed in 1.1 (renaming of 0.26). :pr:17804 by Adrin Jalali_.

:mod:sklearn.feature_extraction .................................

  • |Enhancement| :class:feature_extraction.DictVectorizer accepts multiple values for one categorical feature. :pr:17367 by :user:Peng Yu <yupbank> and :user:Chiara Marmo <cmarmo>.

  • |Fix| :class:feature_extraction.text.CountVectorizer raises an error if a custom token pattern which captures more than one group is provided. :pr:15427 by :user:Gangesh Gudmalwar <ggangesh> and :user:Erin R Hoffman <hoffm386>.

:mod:sklearn.feature_selection ................................

  • |Feature| Added :class:feature_selection.SequentialFeatureSelector which implements forward and backward sequential feature selection. :pr:6545 by Sebastian Raschka_ and :pr:17159 by Nicolas Hug_.

  • |Feature| A new parameter importance_getter was added to :class:feature_selection.RFE, :class:feature_selection.RFECV and :class:feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. :pr:15361 by :user:Venkatachalam N <venkyyuvy>.

  • |Efficiency| Reduce memory footprint in :func:feature_selection.mutual_info_classif and :func:feature_selection.mutual_info_regression by calling :class:neighbors.KDTree for counting nearest neighbors. :pr:17878 by :user:Noel Rogers <noelano>.

  • |Enhancement| :class:feature_selection.RFE supports the option for the number of n_features_to_select to be given as a float representing the percentage of features to select. :pr:17090 by :user:Lisa Schwetlick <lschwetlick> and :user:Marija Vlajic Wheeler <marijavlajic>.
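
A short sketch of forward sequential selection with the new SequentialFeatureSelector. The iris dataset, the k-nearest-neighbours classifier and the choice of two features are assumptions made for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Greedily add one feature at a time, scoring candidates by cross-validation.
selector = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
    cv=5,
)
selector.fit(X, y)

mask = selector.get_support()      # boolean mask over the original features
X_reduced = selector.transform(X)  # keeps only the selected columns
```

Setting direction="backward" starts from all features and removes them one at a time instead.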

:mod:sklearn.gaussian_process ...............................

  • |Enhancement| A new method gaussian_process.kernel._check_bounds_params is called after fitting a Gaussian Process and raises a ConvergenceWarning if the bounds of the hyperparameters are too tight. :issue:12638 by :user:Sylvain Lannuzel <SylvainLan>.

:mod:sklearn.impute .....................

  • |Feature| :class:impute.SimpleImputer now supports a list of strings when strategy='most_frequent' or strategy='constant'. :pr:17526 by :user:Ayako YAGI <yagi-3> and :user:Juan Carlos Alfaro Jiménez <alfaro96>.

  • |Feature| Added method :meth:impute.SimpleImputer.inverse_transform to revert imputed data to original when instantiated with add_indicator=True. :pr:17612 by :user:Srimukh Sripada <d3b0unce>.

  • |Fix| Replace the default values in :class:impute.IterativeImputer of the min_value and max_value parameters with -np.inf and np.inf, respectively, instead of None. However, the behaviour of the class does not change since None was defaulting to these values already. :pr:16493 by :user:Darshan N <DarshanGowda0>.

  • |Fix| :class:impute.IterativeImputer will not attempt to set the estimator's random_state attribute, allowing it to be used with more external classes. :pr:15636 by :user:David Cortes <david-cortes>.

  • |Efficiency| :class:impute.SimpleImputer is now faster with object dtype arrays when strategy='most_frequent'. :pr:18987 by :user:David Katz <DavidKatz-il>.
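
The round trip enabled by SimpleImputer.inverse_transform can be sketched as follows, assuming a small array with two missing entries and the mean strategy (both chosen only for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])

# add_indicator=True appends one binary "was missing" column per affected feature;
# inverse_transform relies on those columns to know which entries were imputed.
imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_imputed = imputer.fit_transform(X)

# Imputed entries are turned back into missing values.
X_restored = imputer.inverse_transform(X_imputed)
```

Here X_imputed has four columns (two features plus two indicators), while X_restored has the original two columns with NaN back in place.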

:mod:sklearn.inspection .........................

  • |Feature| :func:inspection.partial_dependence and inspection.plot_partial_dependence now support calculating and plotting Individual Conditional Expectation (ICE) curves controlled by the kind parameter. :pr:16619 by :user:Madhura Jayaratne <madhuracj>.

  • |Feature| Add sample_weight parameter to :func:inspection.permutation_importance. :pr:16906 by :user:Roei Kahny <RoeiKa>.

  • |API| Positional arguments are deprecated in :meth:inspection.PartialDependenceDisplay.plot and will error in 1.1 (renaming of 0.26). :pr:18293 by Thomas Fan_.

:mod:sklearn.isotonic .......................

  • |Feature| Expose fitted attributes X_thresholds_ and y_thresholds_ that hold the de-duplicated interpolation thresholds of an :class:isotonic.IsotonicRegression instance for model inspection purpose. :pr:16289 by :user:Masashi Kishimoto <kishimoto-banana> and :user:Olivier Grisel <ogrisel>.

  • |Enhancement| :class:isotonic.IsotonicRegression now accepts 2d array with 1 feature as input array. :pr:17379 by :user:Jiaxiang <fujiaxiang>.

  • |Fix| Add tolerance when determining duplicate X values to prevent inf values from being predicted by :class:isotonic.IsotonicRegression. :pr:18639 by :user:Lucy Liu <lucyleeow>.

:mod:sklearn.kernel_approximation ...................................

  • |Feature| Added class :class:kernel_approximation.PolynomialCountSketch which implements the Tensor Sketch algorithm for polynomial kernel feature map approximation. :pr:13003 by :user:Daniel López Sánchez <lopeLH>.

  • |Efficiency| :class:kernel_approximation.Nystroem now supports parallelization via joblib.Parallel using argument n_jobs. :pr:18545 by :user:Laurenz Reitsam <LaurenzReitsam>.

:mod:sklearn.linear_model ...........................

  • |Feature| :class:linear_model.LinearRegression now forces coefficients to be all positive when positive is set to True. :pr:17578 by :user:Joseph Knox <jknox13>, :user:Nelle Varoquaux <NelleV> and :user:Chiara Marmo <cmarmo>.

  • |Enhancement| :class:linear_model.RidgeCV now supports finding an optimal regularization value alpha for each target separately by setting alpha_per_target=True. This is only supported when using the default efficient leave-one-out cross-validation scheme cv=None. :pr:6624 by :user:Marijn van Vliet <wmvanvliet>.

  • |Fix| Fixes bug in :class:linear_model.TheilSenRegressor where predict and score would fail when fit_intercept=False and there was one feature during fitting. :pr:18121 by Thomas Fan_.

  • |Fix| Fixes bug in :class:linear_model.ARDRegression where predict was raising an error when normalize=True and return_std=True because X_offset_ and X_scale_ were undefined. :pr:18607 by :user:fhaselbeck <fhaselbeck>.

  • |Fix| Added the missing l1_ratio parameter in :class:linear_model.Perceptron, to be used when penalty='elasticnet'. This changes the default from 0 to 0.15. :pr:18622 by :user:Haesun Park <rickiepark>.
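
To illustrate the positive option of LinearRegression, the sketch below fits ordinary and non-negative least squares on synthetic data whose true second coefficient is negative. The data-generating process is an assumption made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
# True coefficients are (2, -1), plus a little noise.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=100)

ols = LinearRegression().fit(X, y)                # unconstrained fit
nnls = LinearRegression(positive=True).fit(X, y)  # coefficients constrained to be >= 0
```

The unconstrained fit recovers a negative second coefficient, whereas the constrained fit cannot go below zero for it.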

:mod:sklearn.manifold .......................

  • |Efficiency| Fixed :issue:10493. Improve Local Linear Embedding (LLE) that raised MemoryError exception when used with large inputs. :pr:17997 by :user:Bertrand Maisonneuve <bmaisonn>.

  • |Enhancement| Add square_distances parameter to :class:manifold.TSNE, which provides backward compatibility during deprecation of legacy squaring behavior. Distances will be squared by default in 1.1 (renaming of 0.26), and this parameter will be removed in 1.3. :pr:17662 by :user:Joshua Newton <joshuacwnewton>.

  • |Fix| :class:manifold.MDS now correctly sets its _pairwise attribute. :pr:18278 by Thomas Fan_.

:mod:sklearn.metrics ......................

  • |Feature| Added :func:metrics.cluster.pair_confusion_matrix implementing the confusion matrix arising from pairs of elements from two clusterings. :pr:17412 by :user:Uwe F Mayer <ufmayer>.

  • |Feature| Added a new metric, :func:metrics.top_k_accuracy_score. It is a generalization of :func:metrics.accuracy_score: a prediction is considered correct as long as the true label is associated with one of the k highest predicted scores. :func:metrics.accuracy_score is the special case of k = 1. :pr:16625 by :user:Geoffrey Bolmier <gbolmier>.

  • |Feature| Added :func:metrics.det_curve to compute Detection Error Tradeoff curve classification metric. :pr:10591 by :user:Jeremy Karnowski <jkarnows> and :user:Daniel Mohns <dmohns>.

  • |Feature| Added metrics.plot_det_curve and :class:metrics.DetCurveDisplay to ease the plot of DET curves. :pr:18176 by :user:Guillaume Lemaitre <glemaitre>.

  • |Feature| Added :func:metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. :issue:10708 fixed with the PR :pr:15007 by :user:Ashutosh Hathidara <ashutosh1919>. The scorer and some practical test cases were taken from PR :pr:10711 by :user:Mohamed Ali Jamaoui <mohamed-ali>.

  • |Feature| Added :func:metrics.rand_score implementing the (unadjusted) Rand index. :pr:17412 by :user:Uwe F Mayer <ufmayer>.

  • |Feature| metrics.plot_confusion_matrix now supports making colorbar optional in the matplotlib plot by setting colorbar=False. :pr:17192 by :user:Avi Gupta <avigupta2612>.

  • |Enhancement| Add sample_weight parameter to :func:metrics.median_absolute_error. :pr:17225 by :user:Lucy Liu <lucyleeow>.

  • |Enhancement| Add pos_label parameter in metrics.plot_precision_recall_curve in order to specify the positive class to be used when computing the precision and recall statistics. :pr:17569 by :user:Guillaume Lemaitre <glemaitre>.

  • |Enhancement| Add pos_label parameter in metrics.plot_roc_curve in order to specify the positive class to be used when computing the roc auc statistics. :pr:17651 by :user:Clara Matos <claramatos>.

  • |Fix| Fixed a bug in :func:metrics.classification_report which was raising AttributeError when called with output_dict=True for 0-length values. :pr:17777 by :user:Shubhanshu Mishra <napsternxg>.

  • |Fix| Fixed a bug in :func:metrics.jaccard_score which recommended the zero_division parameter when called with no true or predicted samples. :pr:17826 by :user:Richard Decal <crypdick> and :user:Joseph Willard <josephwillard>.

  • |Fix| Fixed a bug in :func:metrics.hinge_loss where an error occurred when y_true was missing some labels that were provided explicitly in the labels parameter. :pr:17935 by :user:Cary Goltermann <Ultramann>.

  • |Fix| Fix scorers that accept a pos_label parameter and compute their metrics from values returned by decision_function or predict_proba. Previously, they would return erroneous values when pos_label was not corresponding to classifier.classes_[1]. This is especially important when training classifiers directly with string labeled target classes. :pr:18114 by :user:Guillaume Lemaitre <glemaitre>.

  • |Fix| Fixed bug in metrics.plot_confusion_matrix where error occurs when y_true contains labels that were not previously seen by the classifier while the labels and display_labels parameters are set to None. :pr:18405 by :user:Thomas J. Fan <thomasjpfan> and :user:Yakov Pchelintsev <kyouma>.
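
To make the top-k accuracy definition concrete, here is a dependency-free sketch of the computation. The helper top_k_accuracy is a hypothetical name written for this example; it is not scikit-learn's implementation and it does not treat ties specially.

```python
def top_k_accuracy(y_true, y_score, k):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    hits = 0
    for label, scores in zip(y_true, y_score):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(y_true)


y_true = [0, 1, 2, 2]
y_score = [
    [0.5, 0.2, 0.2],
    [0.3, 0.4, 0.2],
    [0.2, 0.4, 0.3],
    [0.7, 0.2, 0.1],
]

top1 = top_k_accuracy(y_true, y_score, k=1)  # same as plain accuracy
top2 = top_k_accuracy(y_true, y_score, k=2)
```

With k=1 only the highest-scored class counts, so two of the four samples are correct; with k=2 the third sample is also counted as correct.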

:mod:sklearn.model_selection ..............................

  • |MajorFeature| Added (experimental) parameter search estimators :class:model_selection.HalvingRandomSearchCV and :class:model_selection.HalvingGridSearchCV which implement Successive Halving, and can be used as drop-in replacements for :class:model_selection.RandomizedSearchCV and :class:model_selection.GridSearchCV. :pr:13900 by Nicolas Hug_, Joel Nothman_ and Andreas Müller_.

  • |Feature| :class:model_selection.RandomizedSearchCV and :class:model_selection.GridSearchCV now have the method score_samples. :pr:17478 by :user:Teon Brooks <teonbrooks> and :user:Mohamed Maskani <maskani-moh>.

  • |Enhancement| :class:model_selection.TimeSeriesSplit has two new keyword arguments test_size and gap. test_size allows the out-of-sample time series length to be fixed for all folds. gap removes a fixed number of samples between the train and test set on each fold. :pr:13204 by :user:Kyle Kosic <kykosic>.

  • |Enhancement| :func:model_selection.permutation_test_score and :func:model_selection.validation_curve now accept fit_params to pass additional estimator parameters. :pr:18527 by :user:Gaurav Dhingra <gxyd>, :user:Julien Jerphanion <jjerphan> and :user:Amanda Dsouza <amy12xx>.

  • |Enhancement| :func:model_selection.cross_val_score, :func:model_selection.cross_validate, :class:model_selection.GridSearchCV, and :class:model_selection.RandomizedSearchCV now allow the estimator to fail scoring and replace the score with error_score. If error_score="raise", the error will be raised. :pr:18343 by Guillaume Lemaitre_ and :user:Devi Sandeep <dsandeep0138>.

  • |Enhancement| :func:model_selection.learning_curve now accepts fit_params to pass additional estimator parameters. :pr:18595 by :user:Amanda Dsouza <amy12xx>.

  • |Fix| Fixed the len of :class:model_selection.ParameterSampler when all distributions are lists and n_iter is more than the number of unique parameter combinations. :pr:18222 by Nicolas Hug_.

  • |Fix| A fix to raise a warning when one or more CV splits of :class:model_selection.GridSearchCV and :class:model_selection.RandomizedSearchCV result in non-finite scores. :pr:18266 by :user:Subrat Sahu <subrat93>, :user:Nirvan <Nirvan101> and :user:Arthur Book <ArthurBook>.

  • |Enhancement| :class:model_selection.GridSearchCV, :class:model_selection.RandomizedSearchCV and :func:model_selection.cross_validate support scoring being a callable returning a dictionary of multiple metric names/values association. :pr:15126 by Thomas Fan_.
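
The effect of the test_size and gap arguments of TimeSeriesSplit can be seen on a toy series of ten samples. The sizes below are arbitrary values picked for the sketch.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # ten consecutive time steps

# Every fold tests on exactly 2 samples and skips 1 sample after the training set.
splitter = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
splits = [(train.tolist(), test.tolist()) for train, test in splitter.split(X)]

for train, test in splits:
    print("train:", train, "test:", test)
```

Each training set ends one sample before a gap, so the sample immediately preceding the test window is never used for fitting.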

:mod:sklearn.multiclass .........................

  • |Enhancement| :class:multiclass.OneVsOneClassifier now accepts inputs with missing values. Hence, estimators which can handle missing values (for example a pipeline with an imputation step) can be used as an estimator for multiclass wrappers. :pr:17987 by :user:Venkatachalam N <venkyyuvy>.

  • |Fix| A fix to allow :class:multiclass.OutputCodeClassifier to accept sparse input data in its fit and predict methods. The check for validity of the input is now delegated to the base estimator. :pr:17233 by :user:Zolisa Bleki <zoj613>.

:mod:sklearn.multioutput ..........................

  • |Enhancement| :class:multioutput.MultiOutputClassifier and :class:multioutput.MultiOutputRegressor now accept inputs with missing values. Hence, estimators which can handle missing values (for example a pipeline with an imputation step, or HistGradientBoosting estimators) can be used as an estimator for multioutput wrappers. :pr:17987 by :user:Venkatachalam N <venkyyuvy>.

  • |Fix| A fix to accept tuples for the order parameter in :class:multioutput.ClassifierChain. :pr:18124 by :user:Gus Brocchini <boldloop> and :user:Amanda Dsouza <amy12xx>.

:mod:sklearn.naive_bayes ..........................

  • |Enhancement| Adds a parameter min_categories to :class:naive_bayes.CategoricalNB that allows a minimum number of categories per feature to be specified. This allows categories unseen during training to be accounted for. :pr:16326 by :user:George Armstrong <gwarmstrong>.

  • |API| The attributes coef_ and intercept_ are now deprecated in :class:naive_bayes.MultinomialNB, :class:naive_bayes.ComplementNB, :class:naive_bayes.BernoulliNB and :class:naive_bayes.CategoricalNB, and will be removed in v1.1 (renaming of 0.26). :pr:17427 by :user:Juan Carlos Alfaro Jiménez <alfaro96>.
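
As a rough illustration of the min_categories parameter mentioned above, the sketch below fits :class:naive_bayes.CategoricalNB on a feature that only takes the values 0 and 1 during training, while reserving room for a third category. The toy data are an assumption made for this example.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Training data only contains categories 0 and 1 for the single feature.
X_train = np.array([[0], [1], [0], [1]])
y_train = np.array([0, 1, 0, 1])

# Declare that the feature has at least 3 categories (0, 1 and 2).
clf = CategoricalNB(min_categories=3).fit(X_train, y_train)

# Category 2 was never observed in training but can still be scored,
# relying on the smoothed probability reserved for it.
proba = clf.predict_proba(np.array([[2]]))
print(clf.n_categories_, proba)
```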

:mod:sklearn.neighbors ........................

  • |Efficiency| Speed up seuclidean, wminkowski, mahalanobis and haversine metrics in neighbors.DistanceMetric by avoiding unexpected GIL acquisition in Cython when setting n_jobs>1 in :class:neighbors.KNeighborsClassifier, :class:neighbors.KNeighborsRegressor, :class:neighbors.RadiusNeighborsClassifier, :class:neighbors.RadiusNeighborsRegressor, :func:metrics.pairwise_distances and by validating data out of loops. :pr:17038 by :user:Wenbo Zhao <webber26232>.

  • |Efficiency| neighbors.NeighborsBase benefits from an improved algorithm = 'auto' heuristic. In addition to the previous set of rules, brute is now selected when the number of features exceeds 15, on the assumption that the intrinsic dimensionality of the data is too high for tree-based methods. :pr:17148 by :user:Geoffrey Bolmier <gbolmier>.

  • |Fix| neighbors.BinaryTree will raise a ValueError when fitting on a data array whose points have different dimensions. :pr:18691 by :user:Chiara Marmo <cmarmo>.

  • |Fix| :class:neighbors.NearestCentroid with a numerical shrink_threshold will raise a ValueError when fitting on data with all constant features. :pr:18370 by :user:Trevor Waite <trewaite>.

  • |Fix| In methods radius_neighbors and radius_neighbors_graph of :class:neighbors.NearestNeighbors, :class:neighbors.RadiusNeighborsClassifier, :class:neighbors.RadiusNeighborsRegressor, and :class:neighbors.RadiusNeighborsTransformer, using sort_results=True now correctly sorts the results even when fitting with the "brute" algorithm. :pr:18612 by Tom Dupre la Tour_.
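
The sort_results behaviour from the last entry can be checked with a small sketch like the one below. The one-dimensional toy data and the radius are assumptions chosen so that the neighbor order is easy to verify by hand.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Points are deliberately stored out of distance order relative to the query.
X = np.array([[0.0], [3.0], [1.0], [2.0]])
query = np.array([[0.0]])

nn = NearestNeighbors(radius=2.5, algorithm="brute").fit(X)

# With sort_results=True the neighbors come back ordered by distance.
dist, ind = nn.radius_neighbors(query, sort_results=True)
print(dist[0], ind[0])
```

The point at 3.0 lies outside the radius, so only the points at 0.0, 1.0 and 2.0 are returned, nearest first.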

:mod:sklearn.neural_network .............................

  • |Efficiency| Neural net training and prediction are now a little faster. :pr:17603, :pr:17604, :pr:17606, :pr:17608, :pr:17609, :pr:17633, :pr:17661, :pr:17932 by :user:Alex Henrie <alexhenrie>.

  • |Enhancement| Avoid converting float32 input to float64 in :class:neural_network.BernoulliRBM. :pr:16352 by :user:Arthur Imbert <Henley13>.

  • |Enhancement| Support 32-bit computations in :class:neural_network.MLPClassifier and :class:neural_network.MLPRegressor. :pr:17759 by :user:Srimukh Sripada <d3b0unce>.

  • |Fix| Fix method :meth:neural_network.MLPClassifier.fit not iterating to max_iter if warm started. :pr:18269 by :user:Norbert Preining <norbusan> and :user:Guillaume Lemaitre <glemaitre>.

:mod:sklearn.pipeline .......................

  • |Enhancement| References to transformers passed through transformer_weights to :class:pipeline.FeatureUnion that aren't present in transformer_list will raise a ValueError. :pr:17876 by :user:Cary Goltermann <Ultramann>.

  • |Fix| A slice of a :class:pipeline.Pipeline now inherits the parameters of the original pipeline (memory and verbose). :pr:18429 by :user:Albert Villanova del Moral <albertvillanova> and :user:Paweł Biernat <pwl>.
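
A minimal sketch of the slicing behaviour described above, assuming a two-step pipeline built only for illustration (the step names scale and clf are arbitrary):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(
    [("scale", StandardScaler()), ("clf", LogisticRegression())],
    verbose=True,
)

# Slicing returns a new Pipeline holding the selected steps;
# it is expected to carry over the verbose (and memory) settings.
head = pipe[:1]
print(type(head).__name__, [name for name, _ in head.steps], head.verbose)
```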

:mod:sklearn.preprocessing ............................

  • |Feature| :class:preprocessing.OneHotEncoder now supports missing values by treating them as a category. :pr:17317 by Thomas Fan_.

  • |Feature| Add a new handle_unknown parameter with a use_encoded_value option, along with a new unknown_value parameter, to :class:preprocessing.OrdinalEncoder to allow unknown categories during transform and set the encoded value of the unknown categories. :pr:17406 by :user:Felix Wick <FelixWick> and :pr:18406 by Nicolas Hug_.

  • |Feature| Add clip parameter to :class:preprocessing.MinMaxScaler, which clips the transformed values of test data to feature_range. :pr:17833 by :user:Yashika Sharma <yashika51>.

  • |Feature| Add sample_weight parameter to :class:preprocessing.StandardScaler. Allows setting individual weights for each sample. :pr:18510 and :pr:18447 and :pr:16066 and :pr:18682 by :user:Maria Telenczuk <maikia> and :user:Albert Villanova <albertvillanova> and :user:panpiort8 and :user:Alex Gramfort <agramfort>.

  • |Enhancement| Verbose output of :class:model_selection.GridSearchCV has been improved for readability. :pr:16935 by :user:Raghav Rajagopalan <raghavrv> and :user:Chiara Marmo <cmarmo>.

  • |Enhancement| Add unit_variance to :class:preprocessing.RobustScaler, which scales output data such that normally distributed features have a variance of 1. :pr:17193 by :user:Lucy Liu <lucyleeow> and :user:Mabel Villalba <mabelvj>.

  • |Enhancement| Add dtype parameter to :class:preprocessing.KBinsDiscretizer. :pr:16335 by :user:Arthur Imbert <Henley13>.

  • |Fix| Raise error on :meth:sklearn.preprocessing.OneHotEncoder.inverse_transform when handle_unknown='error' and drop=None for samples encoded as all zeros. :pr:14982 by :user:Kevin Winata <kwinata>.
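
Two of the features listed in this section, the use_encoded_value option of :class:preprocessing.OrdinalEncoder and the clip parameter of :class:preprocessing.MinMaxScaler, can be sketched together as below. The category names and numeric values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Unknown categories seen at transform time are mapped to unknown_value.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(np.array([["cat"], ["dog"]], dtype=object))
encoded = encoder.transform(np.array([["dog"], ["bird"]], dtype=object))
print(encoded)  # "dog" keeps its learned code, "bird" becomes -1

# With clip=True, transformed values are clipped to feature_range
# (the default range is (0, 1)).
scaler = MinMaxScaler(clip=True).fit(np.array([[0.0], [10.0]]))
scaled = scaler.transform(np.array([[-5.0], [5.0], [20.0]]))
print(scaled)  # values outside the training range are clipped to 0 or 1
```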

:mod:sklearn.semi_supervised ..............................

  • |MajorFeature| Added :class:semi_supervised.SelfTrainingClassifier, a meta-classifier that allows any supervised classifier to function as a semi-supervised classifier that can learn from unlabeled data. :issue:11682 by :user:Oliver Rausch <orausch> and :user:Patrice Becker <pr0duktiv>.

  • |Fix| Fix incorrect encoding when using unicode string dtypes in :class:preprocessing.OneHotEncoder and :class:preprocessing.OrdinalEncoder. :pr:15763 by Thomas Fan_.
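
For the :class:semi_supervised.SelfTrainingClassifier entry in this section, a minimal usage sketch follows. The synthetic dataset, the fraction of hidden labels and the confidence threshold are assumptions; unlabeled samples are marked with -1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Hide roughly 70% of the labels; -1 marks an unlabeled sample.
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.7] = -1

# Wrap any classifier exposing predict_proba; confident predictions on
# unlabeled samples are added as pseudo-labels over successive iterations.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X, y_partial)
predictions = model.predict(X)
print(predictions[:10])
```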

:mod:sklearn.svm ..................

  • |Enhancement| Invoke the SciPy BLAS API for the SVM kernel function in fit, predict and related methods of :class:svm.SVC, :class:svm.NuSVC, :class:svm.SVR, :class:svm.NuSVR, :class:svm.OneClassSVM. :pr:16530 by :user:Shuhua Fan <jim0421>.

:mod:sklearn.tree ...................

  • |Feature| :class:tree.DecisionTreeRegressor now supports the new splitting criterion 'poisson' useful for modeling count data. :pr:17386 by :user:Christian Lorentzen <lorentzenchr>.

  • |Enhancement| :func:tree.plot_tree now uses colors from the matplotlib configuration settings. :pr:17187 by Andreas Müller_.

  • |API| The parameter X_idx_sorted is now deprecated in :meth:tree.DecisionTreeClassifier.fit and :meth:tree.DecisionTreeRegressor.fit, and has no effect. :pr:17614 by :user:Juan Carlos Alfaro Jiménez <alfaro96>.
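
The 'poisson' splitting criterion mentioned above could be tried on count data along these lines. The synthetic targets are an assumption, drawn from a Poisson distribution whose rate depends on a single feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-negative count targets, for illustration only.
rng = np.random.RandomState(0)
X = rng.uniform(0, 2, size=(500, 1))
y = rng.poisson(lam=np.exp(X[:, 0]))

tree = DecisionTreeRegressor(criterion="poisson", max_depth=3, random_state=0)
tree.fit(X, y)

# Each leaf predicts the mean of its training targets, so predictions
# stay non-negative for count data.
pred = tree.predict(X)
print(pred[:5])
```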

:mod:sklearn.utils ....................

  • |Enhancement| Add check_methods_sample_order_invariance to :func:~utils.estimator_checks.check_estimator, which checks that estimator methods are invariant if applied to the same dataset with different sample order. :pr:17598 by :user:Jason Ngo <ngojason9>.

  • |Enhancement| Add support for weights in utils.sparse_func.incr_mean_variance_axis. By :user:Maria Telenczuk <maikia> and :user:Alex Gramfort <agramfort>.

  • |Fix| Raise ValueError with clear error message in :func:utils.check_array for sparse DataFrames with mixed types. :pr:17992 by :user:Thomas J. Fan <thomasjpfan> and :user:Alex Shacked <alexshacked>.

  • |Fix| Allow serialized tree based models to be unpickled on a machine with different endianness. :pr:17644 by :user:Qi Zhang <qzhang90>.

  • |Fix| Raise a proper error when axis=1 and the dimensions do not match in utils.sparse_func.incr_mean_variance_axis. By :user:Alex Gramfort <agramfort>.

Miscellaneous .............

  • |Enhancement| Calls to repr are now faster when print_changed_only=True, especially with meta-estimators. :pr:18508 by :user:Nathan C. <Xethan>.

.. rubric:: Code and documentation contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.23, including:

Abo7atm, Adam Spannbauer, Adrin Jalali, adrinjalali, Agamemnon Krasoulis, Akshay Deodhar, Albert Villanova del Moral, Alessandro Gentile, Alex Henrie, Alex Itkes, Alex Liang, Alexander Lenail, alexandracraciun, Alexandre Gramfort, alexshacked, Allan D Butler, Amanda Dsouza, amy12xx, Anand Tiwari, Anderson Nelson, Andreas Mueller, Ankit Choraria, Archana Subramaniyan, Arthur Imbert, Ashutosh Hathidara, Ashutosh Kushwaha, Atsushi Nukariya, Aura Munoz, AutoViz and Auto_ViML, Avi Gupta, Avinash Anakal, Ayako YAGI, barankarakus, barberogaston, beatrizsmg, Ben Mainye, Benjamin Bossan, Benjamin Pedigo, Bharat Raghunathan, Bhavika Devnani, Biprateep Dey, bmaisonn, Bo Chang, Boris Villazón-Terrazas, brigi, Brigitta Sipőcz, Bruno Charron, Byron Smith, Cary Goltermann, Cat Chenal, CeeThinwa, chaitanyamogal, Charles Patel, Chiara Marmo, Christian Kastner, Christian Lorentzen, Christoph Deil, Christos Aridas, Clara Matos, clmbst, Coelhudo, crispinlogan, Cristina Mulas, Daniel López, Daniel Mohns, darioka, Darshan N, david-cortes, Declan O'Neill, Deeksha Madan, Elizabeth DuPre, Eric Fiegel, Eric Larson, Erich Schubert, Erin Khoo, Erin R Hoffman, eschibli, Felix Wick, fhaselbeck, Forrest Koch, Francesco Casalegno, Frans Larsson, Gael Varoquaux, Gaurav Desai, Gaurav Sheni, genvalen, Geoffrey Bolmier, George Armstrong, George Kiragu, Gesa Stupperich, Ghislain Antony Vaillant, Gim Seng, Gordon Walsh, Gregory R. 
Lee, Guillaume Chevalier, Guillaume Lemaitre, Haesun Park, Hannah Bohle, Hao Chun Chang, Harry Scholes, Harsh Soni, Henry, Hirofumi Suzuki, Hitesh Somani, Hoda1394, Hugo Le Moine, hugorichard, indecisiveuser, Isuru Fernando, Ivan Wiryadi, j0rd1smit, Jaehyun Ahn, Jake Tae, James Hoctor, Jan Vesely, Jeevan Anand Anne, JeroenPeterBos, JHayes, Jiaxiang, Jie Zheng, Jigna Panchal, jim0421, Jin Li, Joaquin Vanschoren, Joel Nothman, Jona Sassenhagen, Jonathan, Jorge Gorbe Moya, Joseph Lucas, Joshua Newton, Juan Carlos Alfaro Jiménez, Julien Jerphanion, Justin Huber, Jérémie du Boisberranger, Kartik Chugh, Katarina Slama, kaylani2, Kendrick Cetina, Kenny Huynh, Kevin Markham, Kevin Winata, Kiril Isakov, kishimoto, Koki Nishihara, Krum Arnaudov, Kyle Kosic, Lauren Oldja, Laurenz Reitsam, Lisa Schwetlick, Louis Douge, Louis Guitton, Lucy Liu, Madhura Jayaratne, maikia, Manimaran, Manuel López-Ibáñez, Maren Westermann, Maria Telenczuk, Mariam-ke, Marijn van Vliet, Markus Löning, Martin Scheubrein, Martina G. Vilas, Martina Megasari, Mateusz Górski, mathschy, mathurinm, Matthias Bussonnier, Max Del Giudice, Michael, Milan Straka, Muoki Caleb, N. Haiat, Nadia Tahiri, Ph. D, Naoki Hamada, Neil Botelho, Nicolas Hug, Nils Werner, noelano, Norbert Preining, oj_lappi, Oleh Kozynets, Olivier Grisel, Pankaj Jindal, Pardeep Singh, Parthiv Chigurupati, Patrice Becker, Pete Green, pgithubs, Poorna Kumar, Prabakaran Kumaresshan, Probinette4, pspachtholz, pwalchessen, Qi Zhang, rachel fischoff, Rachit Toshniwal, Rafey Iqbal Rahman, Rahul Jakhar, Ram Rachum, RamyaNP, rauwuckl, Ravi Kiran Boggavarapu, Ray Bell, Reshama Shaikh, Richard Decal, Rishi Advani, Rithvik Rao, Rob Romijnders, roei, Romain Tavenard, Roman Yurchak, Ruby Werman, Ryotaro Tsukada, sadak, Saket Khandelwal, Sam, Sam Ezebunandu, Sam Kimbinyi, Sarah Brown, Saurabh Jain, Sean O. 
Stalley, Sergio, Shail Shah, Shane Keller, Shao Yang Hong, Shashank Singh, Shooter23, Shubhanshu Mishra, simonamaggio, Soledad Galli, Srimukh Sripada, Stephan Steinfurt, subrat93, Sunitha Selvan, Swier, Sylvain Marié, SylvainLan, t-kusanagi2, Teon L Brooks, Terence Honles, Thijs van den Berg, Thomas J Fan, Thomas J. Fan, Thomas S Benjamin, Thomas9292, Thorben Jensen, tijanajovanovic, Timo Kaufmann, tnwei, Tom Dupré la Tour, Trevor Waite, ufmayer, Umberto Lupo, Venkatachalam N, Vikas Pandey, Vinicius Rios Fuck, Violeta, watchtheblur, Wenbo Zhao, willpeppo, xavier dupré, Xethan, Xue Qianming, xun-tang, yagi-3, Yakov Pchelintsev, Yashika Sharma, Yi-Yan Ge, Yue Wu, Yutaro Ikeda, Zaccharie Ramzi, zoj613, Zhao Feng.