============
Version 0.14
============

.. _changes_0_14:

Version 0.14

August 7, 2013

Changelog

Missing values with sparse and dense matrices can be imputed with the transformer preprocessing.Imputer by Nicolas Trésegnie_.
The core implementation of decision trees has been rewritten from scratch, allowing for faster tree induction and lower memory consumption in all tree-based estimators. By Gilles Louppe_.
Added :class:ensemble.AdaBoostClassifier and :class:ensemble.AdaBoostRegressor, by Noel Dawe_ and Gilles Louppe_. See the :ref:AdaBoost <adaboost> section of the user guide for details and examples.
Added grid_search.RandomizedSearchCV and grid_search.ParameterSampler for randomized hyperparameter optimization. By Andreas Müller_.
Added :ref:biclustering <biclustering> algorithms (sklearn.cluster.bicluster.SpectralCoclustering and sklearn.cluster.bicluster.SpectralBiclustering), data generation methods (:func:sklearn.datasets.make_biclusters and :func:sklearn.datasets.make_checkerboard), and scoring metrics (:func:sklearn.metrics.consensus_score). By Kemal Eren_.
Added :ref:Restricted Boltzmann Machines<rbm> (:class:neural_network.BernoulliRBM). By Yann Dauphin_.
Python 3 support by :user:Justin Vincent <justinvf>, Lars Buitinck, :user:Subhodeep Moitra <smoitra87> and Olivier Grisel. All tests now pass under Python 3.3.
Ability to pass one penalty (alpha value) per target in :class:linear_model.Ridge, by @eickenberg and Mathieu Blondel_.
Fixed sklearn.linear_model.stochastic_gradient.py L2 regularization issue (minor practical significance). By :user:Norbert Crombach <norbert> and Mathieu Blondel_.
Added an interactive version of Andreas Müller's Machine Learning Cheat Sheet (for scikit-learn) <https://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html> to the documentation. See :ref:Choosing the right estimator <ml_map>. By Jaques Grobler_.
grid_search.GridSearchCV and cross_validation.cross_val_score now support the use of advanced scoring functions such as area under the ROC curve and f-beta scores. See :ref:scoring_parameter for details. By Andreas Müller_ and Lars Buitinck_. Passing a function from :mod:sklearn.metrics as score_func is deprecated.
Multi-label classification output is now supported by :func:metrics.accuracy_score, :func:metrics.zero_one_loss, :func:metrics.f1_score, :func:metrics.fbeta_score, :func:metrics.classification_report, :func:metrics.precision_score and :func:metrics.recall_score by Arnaud Joly_.
Two new metrics, :func:metrics.hamming_loss and metrics.jaccard_similarity_score, have been added with multi-label support by Arnaud Joly_.
Speed and memory usage improvements in :class:feature_extraction.text.CountVectorizer and :class:feature_extraction.text.TfidfVectorizer, by Jochen Wersdörfer and Roman Sinayev.
The min_df parameter in :class:feature_extraction.text.CountVectorizer and :class:feature_extraction.text.TfidfVectorizer, which used to be 2, has been reset to 1 to avoid unpleasant surprises (empty vocabularies) for novice users who try it out on tiny document collections. A value of at least 2 is still recommended for practical use.
:class:svm.LinearSVC, :class:linear_model.SGDClassifier and :class:linear_model.SGDRegressor now have a sparsify method that converts their coef_ into a sparse matrix, meaning stored models trained using these estimators can be made much more compact.
:class:linear_model.SGDClassifier now produces multiclass probability estimates when trained under log loss or modified Huber loss.
Hyperlinks to documentation in example code on the website by :user:Martin Luessi <mluessi>.
Fixed bug in :class:preprocessing.MinMaxScaler causing incorrect scaling of the features for non-default feature_range settings. By Andreas Müller_.
max_features in :class:tree.DecisionTreeClassifier, :class:tree.DecisionTreeRegressor and all derived ensemble estimators now supports percentage values. By Gilles Louppe_.
Performance improvements in :class:isotonic.IsotonicRegression by Nelle Varoquaux_.
:func:metrics.accuracy_score has an option normalize to return the fraction or the number of correctly classified samples by Arnaud Joly_.
Added :func:metrics.log_loss that computes log loss, aka cross-entropy loss. By Jochen Wersdörfer and Lars Buitinck_.
A bug that caused :class:ensemble.AdaBoostClassifier to output incorrect probabilities has been fixed.
Feature selectors now share a mixin providing consistent transform, inverse_transform and get_support methods. By Joel Nothman_.
A fitted grid_search.GridSearchCV or grid_search.RandomizedSearchCV can now generally be pickled. By Joel Nothman_.
Refactored and vectorized implementation of :func:metrics.roc_curve and :func:metrics.precision_recall_curve. By Joel Nothman_.
The new estimator :class:sklearn.decomposition.TruncatedSVD performs dimensionality reduction using SVD on sparse matrices, and can be used for latent semantic analysis (LSA). By Lars Buitinck_.
Added self-contained example of out-of-core learning on text data :ref:sphx_glr_auto_examples_applications_plot_out_of_core_classification.py. By :user:Eustache Diemert <oddskool>.
The default number of components for sklearn.decomposition.RandomizedPCA is now correctly documented to be n_features. This was the default behavior, so programs using it will continue to work as they did.
:class:sklearn.cluster.KMeans now fits several orders of magnitude faster on sparse data (the speedup depends on the sparsity). By Lars Buitinck_.
Reduce memory footprint of FastICA by Denis Engemann_ and Alexandre Gramfort_.
Verbose output in sklearn.ensemble.gradient_boosting now uses a column format and prints progress with decreasing frequency. It also shows the remaining time. By Peter Prettenhofer_.
sklearn.ensemble.gradient_boosting provides out-of-bag improvement oob_improvement_ rather than the OOB score for model selection. An example that shows how to use OOB estimates to select the number of trees was added. By Peter Prettenhofer_.
Most metrics now support string labels for multiclass classification by Arnaud Joly_ and Lars Buitinck_.
New OrthogonalMatchingPursuitCV class by Alexandre Gramfort_ and Vlad Niculae_.
Fixed a bug in sklearn.covariance.GraphLassoCV: the 'alphas' parameter now works as expected when given a list of values. By Philippe Gervais.
Fixed an important bug in sklearn.covariance.GraphLassoCV that prevented all folds provided by a CV object from being used (only the first 3 were used). When providing a CV object, execution time may thus increase significantly compared to the previous version (but results are correct now). By Philippe Gervais.
cross_validation.cross_val_score and the grid_search module are now tested with multi-output data by Arnaud Joly_.
:func:datasets.make_multilabel_classification can now return the output in label indicator multilabel format by Arnaud Joly_.
K-nearest neighbors, :class:neighbors.KNeighborsRegressor and :class:neighbors.KNeighborsClassifier, and radius neighbors, :class:neighbors.RadiusNeighborsRegressor and :class:neighbors.RadiusNeighborsClassifier, now support multioutput data by Arnaud Joly_.
Random state in LibSVM-based estimators (:class:svm.SVC, :class:svm.NuSVC, :class:svm.OneClassSVM, :class:svm.SVR, :class:svm.NuSVR) can now be controlled. This is useful to ensure consistency in the probability estimates for the classifiers trained with probability=True. By Vlad Niculae_.
Out-of-core learning support for discrete naive Bayes classifiers :class:sklearn.naive_bayes.MultinomialNB and :class:sklearn.naive_bayes.BernoulliNB by adding the partial_fit method by Olivier Grisel_.
New website design and navigation by Gilles Louppe, Nelle Varoquaux, Vincent Michel and Andreas Müller_.
Improved documentation on :ref:multi-class, multi-label and multi-output classification <multiclass> by Yannick Schwartz_ and Arnaud Joly_.
Better input and error handling in the :mod:sklearn.metrics module by Arnaud Joly_ and Joel Nothman_.
Speed optimization of the hmm module by :user:Mikhail Korobov <kmike>
Significant speed improvements for :class:sklearn.cluster.DBSCAN by cleverless <https://github.com/cleverless>_
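
For readers unfamiliar with the quantity behind the :func:metrics.log_loss entry above, the following is a minimal pure-Python sketch of binary log loss (cross-entropy). The helper name ``binary_log_loss`` and the clipping constant ``eps`` are assumptions made for illustration; this is not the library's implementation, and it covers only the two-class case.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels given predicted P(y = 1).

    Hypothetical helper for illustration only; not scikit-learn code.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident, correct predictions give a small loss.
loss = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
print(loss)
```

The loss grows without bound as a confident prediction turns out to be wrong, which is why the probabilities are clipped before taking logarithms.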

API changes summary

The auc_score was renamed :func:metrics.roc_auc_score.
Testing scikit-learn with sklearn.test() is deprecated. Use nosetests sklearn from the command line.
Feature importances in :class:tree.DecisionTreeClassifier, :class:tree.DecisionTreeRegressor and all derived ensemble estimators are now computed on the fly when accessing the feature_importances_ attribute. Setting compute_importances=True is no longer required. By Gilles Louppe_.
:func:linear_model.lasso_path and :func:linear_model.enet_path can return their results in the same format as that of :func:linear_model.lars_path. This is done by setting the return_models parameter to False. By Jaques Grobler_ and Alexandre Gramfort_.
grid_search.IterGrid was renamed to grid_search.ParameterGrid.
Fixed bug in KFold causing imperfect class balance in some cases. By Alexandre Gramfort_ and Tadej Janež.
:class:sklearn.neighbors.BallTree has been refactored, and a :class:sklearn.neighbors.KDTree has been added which shares the same interface. The Ball Tree now works with a wide variety of distance metrics. Both classes have many new methods, including single-tree and dual-tree queries, breadth-first and depth-first searching, and more advanced queries such as kernel density estimation and 2-point correlation functions. By Jake Vanderplas_
Support for scipy.spatial.cKDTree within neighbors queries has been removed, and the functionality replaced with the new :class:sklearn.neighbors.KDTree class.
:class:sklearn.neighbors.KernelDensity has been added, which performs efficient kernel density estimation with a variety of kernels.
:class:sklearn.decomposition.KernelPCA now always returns output with n_components components, unless the new parameter remove_zero_eig is set to True. This new behavior is consistent with the way kernel PCA was always documented; previously, the removal of components with zero eigenvalues was tacitly performed on all data.
gcv_mode="auto" no longer tries to perform SVD on a densified sparse matrix in :class:sklearn.linear_model.RidgeCV.
Sparse matrix support in sklearn.decomposition.RandomizedPCA is now deprecated in favor of the new TruncatedSVD.
cross_validation.KFold and cross_validation.StratifiedKFold now enforce n_folds >= 2; otherwise a ValueError is raised. By Olivier Grisel_.
:func:datasets.load_files's charset and charset_errors parameters were renamed encoding and decode_errors.
Attribute oob_score_ in :class:sklearn.ensemble.GradientBoostingRegressor and :class:sklearn.ensemble.GradientBoostingClassifier is deprecated and has been replaced by oob_improvement_.
Attributes in OrthogonalMatchingPursuit have been deprecated (copy_X, Gram, ...) and precompute_gram renamed precompute for consistency. See #2224.
:class:sklearn.preprocessing.StandardScaler now converts integer input to float, and raises a warning. Previously it rounded for dense integer input.
:class:sklearn.multiclass.OneVsRestClassifier now has a decision_function method. This will return the distance of each sample from the decision boundary for each class, as long as the underlying estimators implement the decision_function method. By Kyle Kastner_.
Better input validation, warning on unexpected shapes for y.
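
As a reminder of what the renamed grid_search.ParameterGrid (formerly grid_search.IterGrid) represents, the sketch below enumerates every combination of values in a parameter grid using only the standard library. The helper ``iter_param_grid`` is hypothetical and written for illustration; it is not the library's implementation and makes no claim about the iteration order scikit-learn uses.

```python
from itertools import product


def iter_param_grid(param_grid):
    """Yield one dict per combination of the values in ``param_grid``.

    Hypothetical helper for illustration only; not scikit-learn code.
    """
    keys = sorted(param_grid)  # fixed key order for reproducible iteration
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


grid = {"C": [1, 10], "kernel": ["linear", "rbf"]}
combos = list(iter_param_grid(grid))
for params in combos:
    print(params)
```

A grid with two values for each of two parameters yields four candidate settings, one per element of the Cartesian product.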

People

List of contributors for release 0.14 by number of commits.

277 Gilles Louppe
245 Lars Buitinck
187 Andreas Mueller
124 Arnaud Joly
112 Jaques Grobler
109 Gael Varoquaux
107 Olivier Grisel
102 Noel Dawe
99 Kemal Eren
79 Joel Nothman
75 Jake VanderPlas
73 Nelle Varoquaux
71 Vlad Niculae
65 Peter Prettenhofer
64 Alexandre Gramfort
54 Mathieu Blondel
38 Nicolas Trésegnie
35 eustache
27 Denis Engemann
25 Yann N. Dauphin
19 Justin Vincent
17 Robert Layton
15 Doug Coleman
14 Michael Eickenberg
13 Robert Marchman
11 Fabian Pedregosa
11 Philippe Gervais
10 Jim Holmström
10 Tadej Janež
10 syhw
9 Mikhail Korobov
9 Steven De Gryze
8 sergeyf
7 Ben Root
7 Hrishikesh Huilgolkar
6 Kyle Kastner
6 Martin Luessi
6 Rob Speer
5 Federico Vaggi
5 Raul Garreta
5 Rob Zinkov
4 Ken Geis
3 A. Flaxman
3 Denton Cockburn
3 Dougal Sutherland
3 Ian Ozsvald
3 Johannes Schönberger
3 Robert McGibbon
3 Roman Sinayev
3 Szabo Roland
2 Diego Molla
2 Imran Haque
2 Jochen Wersdörfer
2 Sergey Karayev
2 Yannick Schwartz
2 jamestwebber
1 Abhijeet Kolhe
1 Alexander Fabisch
1 Bastiaan van den Berg
1 Benjamin Peterson
1 Daniel Velkov
1 Fazlul Shahriar
1 Felix Brockherde
1 Félix-Antoine Fortin
1 Harikrishnan S
1 Jack Hale
1 JakeMick
1 James McDermott
1 John Benediktsson
1 John Zwinck
1 Joshua Vredevoogd
1 Justin Pati
1 Kevin Hughes
1 Kyle Kelley
1 Matthias Ekman
1 Miroslav Shubernetskiy
1 Naoki Orii
1 Norbert Crombach
1 Rafael Cunha de Almeida
1 Rolando Espinoza La fuente
1 Seamus Abshere
1 Sergey Feldman
1 Sergio Medina
1 Stefano Lattarini
1 Steve Koch
1 Sturla Molden
1 Thomas Jarosch
1 Yaroslav Halchenko