.. include:: _contributors.rst
.. currentmodule:: sklearn
.. _changes_0_13_1:

Version 0.13.1
==============

**February 23, 2013**

The 0.13.1 release only fixes some bugs and does not add any new
functionality.

Changelog
---------
- Fixed a testing error caused by the function
  ``cross_validation.train_test_split`` being interpreted as a test
  by `Yaroslav Halchenko`_.

- Fixed a bug in the reassignment of small clusters in
  :class:`cluster.MiniBatchKMeans` by `Gael Varoquaux`_.

- Fixed default value of ``gamma`` in :class:`decomposition.KernelPCA`
  by `Lars Buitinck`_.

- Updated joblib to ``0.7.0d`` by `Gael Varoquaux`_.

- Fixed scaling of the deviance in
  :class:`ensemble.GradientBoostingClassifier` by `Peter Prettenhofer`_.

- Better tie-breaking in :class:`multiclass.OneVsOneClassifier`
  by `Andreas Müller`_.

- Other small improvements to tests and documentation.
People
------

List of contributors for release 0.13.1 by number of commits.

- `Lars Buitinck`_
- `Andreas Müller`_
- `Gael Varoquaux`_
- `Peter Prettenhofer`_
- `Gilles Louppe`_
- `Mathieu Blondel`_
- `Nelle Varoquaux`_
- `Vlad Niculae`_
- `Yaroslav Halchenko`_

.. _changes_0_13:
Version 0.13
============

**January 21, 2013**

New Estimator Classes
---------------------
- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`, two
  data-independent predictors by `Mathieu Blondel`_. Useful to
  sanity-check your estimators. See :ref:`dummy_estimators` in the user
  guide. Multioutput support added by `Arnaud Joly`_.
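A minimal sketch of such a sanity check (the toy data here is made up for illustration):

```python
from sklearn.dummy import DummyClassifier

# Toy data (made up for illustration); DummyClassifier ignores the
# input features and predicts from the class distribution of y alone.
X = [[0], [1], [2], [3]]
y = [1, 1, 1, 0]

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = clf.predict([[5], [6]])  # always the majority class, 1
```

Comparing a real estimator against such a baseline quickly reveals whether it learns anything beyond the class distribution.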
- :class:`decomposition.FactorAnalysis`, a transformer implementing the
  classical factor analysis, by `Christian Osendorfer`_ and
  `Alexandre Gramfort`_. See :ref:`FA` in the user guide.

- :class:`feature_extraction.FeatureHasher`, a transformer implementing
  the "hashing trick" for fast, low-memory feature extraction from
  string fields by `Lars Buitinck`_, and
  :class:`feature_extraction.text.HashingVectorizer` for text documents
  by `Olivier Grisel`_. See :ref:`feature_hashing` and
  :ref:`hashing_vectorizer` for the documentation and sample usage.
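A rough sketch of the hashing trick (``n_features`` is deliberately tiny here for illustration):

```python
from sklearn.feature_extraction import FeatureHasher

# Map string-keyed features straight to a fixed number of columns via
# a hash function, so no vocabulary has to be kept in memory.
hasher = FeatureHasher(n_features=8, input_type="dict")
X = hasher.transform([{"cat": 1, "dog": 2}, {"cat": 3}])
# X is a sparse matrix with shape (2, 8)
```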
- :class:`pipeline.FeatureUnion`, a transformer that concatenates
  results of several other transformers by `Andreas Müller`_. See
  :ref:`feature_union` in the user guide.
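A small sketch of concatenating two transformers (the data and component choices are made up for illustration):

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import FeatureUnion

# Made-up data; the union concatenates one PCA component with the
# single best feature according to an ANOVA F-test.
X = [[0., 1., 2.], [2., 3., 4.], [4., 5., 6.], [6., 7., 8.]]
y = [0, 0, 1, 1]

union = FeatureUnion([("pca", PCA(n_components=1)),
                      ("kbest", SelectKBest(f_classif, k=1))])
Xt = union.fit_transform(X, y)  # shape (4, 2)
```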
- :class:`random_projection.GaussianRandomProjection`,
  :class:`random_projection.SparseRandomProjection` and the function
  :func:`random_projection.johnson_lindenstrauss_min_dim`. The first two
  are transformers implementing Gaussian and sparse random projection
  matrices by `Olivier Grisel`_ and `Arnaud Joly`_.
  See :ref:`random_projection` in the user guide.
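A sketch of both pieces together (dimensions and ``eps`` are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

# The Johnson-Lindenstrauss lemma gives a data-independent lower bound
# on the number of components needed to preserve pairwise distances
# within a relative error of eps.
n_comp = johnson_lindenstrauss_min_dim(n_samples=10000, eps=0.3)

# Project made-up high-dimensional data down to 20 components.
X = np.random.RandomState(0).rand(50, 1000)
Xt = GaussianRandomProjection(n_components=20,
                              random_state=0).fit_transform(X)
```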
- :class:`kernel_approximation.Nystroem`, a transformer for
  approximating arbitrary kernels by `Andreas Müller`_. See
  :ref:`nystroem_kernel_approx` in the user guide.
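A minimal sketch, using an RBF kernel on made-up data (``gamma`` and ``n_components`` are illustrative choices):

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem

# Approximate an RBF kernel feature map from a 10-sample subset; a
# linear model fit on X_features approximates the kernelized model.
X = np.random.RandomState(0).rand(30, 5)
feature_map = Nystroem(kernel="rbf", gamma=0.5, n_components=10,
                       random_state=0)
X_features = feature_map.fit_transform(X)  # shape (30, 10)
```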
- :class:`preprocessing.OneHotEncoder`, a transformer that computes
  binary encodings of categorical features by `Andreas Müller`_. See
  :ref:`preprocessing_categorical_features` in the user guide.
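A small sketch of the encoding (integer-coded categories, as in the 0.13 API; later releases also accept strings):

```python
from sklearn.preprocessing import OneHotEncoder

# Each of the three integer categories gets its own binary column.
enc = OneHotEncoder()
X = enc.fit_transform([[0], [1], [2], [1]])  # sparse output
dense = X.toarray()
# dense == [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```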
- :class:`linear_model.PassiveAggressiveClassifier` and
  :class:`linear_model.PassiveAggressiveRegressor`, predictors
  implementing an efficient stochastic optimization for linear models
  by `Rob Zinkov`_ and `Mathieu Blondel`_. See
  :ref:`passive_aggressive` in the user guide.

- :class:`ensemble.RandomTreesEmbedding`, a transformer for creating
  high-dimensional sparse representations using ensembles of totally
  random trees by `Andreas Müller`_. See :ref:`random_trees_embedding`
  in the user guide.

- :class:`manifold.SpectralEmbedding` and function
  :func:`manifold.spectral_embedding`, implementing the "Laplacian
  eigenmaps" transformation for non-linear dimensionality reduction by
  Wei Li. See :ref:`spectral_embedding` in the user guide.

- :class:`isotonic.IsotonicRegression` by `Fabian Pedregosa`_,
  `Alexandre Gramfort`_ and `Nelle Varoquaux`_.
Changelog
---------

- :func:`metrics.zero_one_loss` (formerly ``metrics.zero_one``) now has
  an option for normalized output that reports the fraction of
  misclassifications, rather than the raw number of misclassifications.
  By Kyle Beauchamp.
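A quick sketch of the two output modes on made-up labels:

```python
from sklearn.metrics import zero_one_loss

y_true = [0, 1, 1, 1]
y_pred = [0, 0, 1, 1]

# One of four predictions is wrong.
frac = zero_one_loss(y_true, y_pred)                    # 0.25
count = zero_one_loss(y_true, y_pred, normalize=False)  # 1
```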
- :class:`tree.DecisionTreeClassifier` and all derived ensemble models
  now support sample weighting, by `Noel Dawe`_ and `Gilles Louppe`_.

- Speedup improvement when using bootstrap samples in forests of
  randomized trees, by `Peter Prettenhofer`_ and `Gilles Louppe`_.

- Partial dependence plots for :ref:`gradient_boosting` in
  ``ensemble.partial_dependence.partial_dependence`` by
  `Peter Prettenhofer`_. See
  :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py`
  for an example.

- The table of contents on the website has now been made expandable by
  `Jaques Grobler`_.

- :class:`feature_selection.SelectPercentile` now breaks ties
  deterministically instead of returning all equally ranked features.

- :class:`feature_selection.SelectKBest` and
  :class:`feature_selection.SelectPercentile` are more numerically
  stable since they use scores, rather than p-values, to rank results.
  This means that they might sometimes select different features than
  they did previously.
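A minimal sketch of score-based univariate selection (the dataset and ``k`` are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Keep the 2 features with the highest chi-squared scores.
X, y = load_iris(return_X_y=True)
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)  # shape (150, 2)
```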
- Ridge regression and ridge classification fitting with the
  ``sparse_cg`` solver no longer has quadratic memory complexity, by
  `Lars Buitinck`_ and `Fabian Pedregosa`_.

- Ridge regression and ridge classification now support a new fast
  solver called ``lsqr``, by `Mathieu Blondel`_.

- Speed up of :func:`metrics.precision_recall_curve` by Conrad Lee.

- Added support for reading/writing svmlight files with pairwise
  preference attribute (``qid`` in svmlight file format) in
  :func:`datasets.dump_svmlight_file` and
  :func:`datasets.load_svmlight_file` by `Fabian Pedregosa`_.

- Faster and more robust :func:`metrics.confusion_matrix` and
  :ref:`clustering_evaluation` by Wei Li.

- ``cross_validation.cross_val_score`` now works with precomputed
  kernels and affinity matrices, by `Andreas Müller`_.

- LARS algorithm made more numerically stable with heuristics to drop
  regressors that are too correlated as well as to stop the path when
  numerical noise becomes predominant, by `Gael Varoquaux`_.
- New kernel :func:`metrics.pairwise.chi2_kernel` by `Andreas Müller`_,
  often used in computer vision applications.
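A small sketch on made-up histogram-like data (``gamma`` is an illustrative choice):

```python
from sklearn.metrics.pairwise import chi2_kernel

# The chi-squared kernel expects non-negative features, e.g. histograms.
X = [[0.5, 0.5], [1.0, 0.0]]
K = chi2_kernel(X, gamma=1.0)
# K has shape (2, 2); the diagonal is 1, since the chi2 distance of a
# sample to itself is zero.
```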
- Fixed a long-standing bug in :class:`naive_bayes.BernoulliNB` by
  Shaun Jackman.

- Implemented ``predict_proba`` in
  :class:`multiclass.OneVsRestClassifier`, by Andrew Winterman.

- Improved consistency in gradient boosting: the estimators
  :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier` use the estimator
  :class:`tree.DecisionTreeRegressor` instead of the
  ``tree._tree.Tree`` data structure, by `Arnaud Joly`_.

- Fixed a floating point exception in the :ref:`decision trees <tree>`
  module, by Seberg.

- Fixed :func:`metrics.roc_curve` failing when ``y_true`` has only one
  class, by Wei Li.
- Added the :func:`metrics.mean_absolute_error` function, which
  computes the mean absolute error. The :func:`metrics.mean_squared_error`,
  :func:`metrics.mean_absolute_error` and :func:`metrics.r2_score`
  metrics support multioutput, by `Arnaud Joly`_.
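A quick sketch of the multioutput behavior on made-up targets:

```python
from sklearn.metrics import mean_absolute_error

# Two outputs per sample; the per-entry absolute errors are
# 0, 1, 0, 0, so the uniformly averaged error is 0.25.
y_true = [[0, 1], [1, 1]]
y_pred = [[0, 0], [1, 1]]
mae = mean_absolute_error(y_true, y_pred)  # 0.25
```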
- Fixed ``class_weight`` support in :class:`svm.LinearSVC` and
  :class:`linear_model.LogisticRegression` by `Andreas Müller`_. The
  meaning of ``class_weight`` was reversed: in earlier releases, a
  higher weight erroneously meant fewer positives for a given class.

- Improved narrative documentation and consistency in
  :mod:`sklearn.metrics` for regression and classification metrics,
  by `Arnaud Joly`_.

- Fixed a bug in :class:`sklearn.svm.SVC` when using CSR matrices with
  unsorted indices, by Xinfan Meng and `Andreas Müller`_.

- :class:`cluster.MiniBatchKMeans`: added random reassignment of
  cluster centers with few observations attached to them, by
  `Gael Varoquaux`_.
API changes summary
-------------------

- Renamed all occurrences of ``n_atoms`` to ``n_components`` for
  consistency. This applies to
  :class:`decomposition.DictionaryLearning`,
  :class:`decomposition.MiniBatchDictionaryLearning`,
  :func:`decomposition.dict_learning` and
  :func:`decomposition.dict_learning_online`.

- Renamed all occurrences of ``max_iters`` to ``max_iter`` for
  consistency. This applies to ``semi_supervised.LabelPropagation`` and
  ``semi_supervised.label_propagation.LabelSpreading``.

- Renamed all occurrences of ``learn_rate`` to ``learning_rate`` for
  consistency in ``ensemble.BaseGradientBoosting`` and
  :class:`ensemble.GradientBoostingRegressor`.
- The module ``sklearn.linear_model.sparse`` is gone. Sparse matrix
  support was already integrated into the "regular" linear models.

- ``sklearn.metrics.mean_square_error``, which incorrectly returned the
  accumulated error, was removed. Use
  :func:`metrics.mean_squared_error` instead.

- Passing ``class_weight`` parameters to ``fit`` methods is no longer
  supported. Pass them to estimator constructors instead.

- GMMs no longer have ``decode`` and ``rvs`` methods. Use the
  ``score``, ``predict`` or ``sample`` methods instead.

- The ``solver`` fit option in ridge regression and classification is
  now deprecated and will be removed in v0.14. Use the constructor
  option instead.

- ``feature_extraction.text.DictVectorizer`` now returns sparse
  matrices in the CSR format, instead of COO.

- Renamed ``k`` in ``cross_validation.KFold`` and
  ``cross_validation.StratifiedKFold`` to ``n_folds``; renamed
  ``n_bootstraps`` to ``n_iter`` in ``cross_validation.Bootstrap``.

- Renamed all occurrences of ``n_iterations`` to ``n_iter`` for
  consistency. This applies to ``cross_validation.ShuffleSplit``,
  ``cross_validation.StratifiedShuffleSplit``,
  :func:`utils.extmath.randomized_range_finder` and
  :func:`utils.extmath.randomized_svd`.
- Replaced ``rho`` in :class:`linear_model.ElasticNet` and
  :class:`linear_model.SGDClassifier` by ``l1_ratio``. The ``rho``
  parameter had different meanings; ``l1_ratio`` was introduced to
  avoid confusion. It has the same meaning as previously ``rho`` in
  :class:`linear_model.ElasticNet` and ``(1-rho)`` in
  :class:`linear_model.SGDClassifier`.

- :class:`linear_model.LassoLars` and :class:`linear_model.Lars` now
  store a list of paths in the case of multiple targets, rather than
  an array of paths.

- The attribute ``gmm`` of ``hmm.GMMHMM`` was renamed to ``gmm_``
  to adhere more strictly to the API.

- ``cluster.spectral_embedding`` was moved to
  :func:`manifold.spectral_embedding`.

- Renamed ``eig_tol`` in :func:`manifold.spectral_embedding` and
  :class:`cluster.SpectralClustering` to ``eigen_tol``.

- Renamed ``mode`` in :func:`manifold.spectral_embedding` and
  :class:`cluster.SpectralClustering` to ``eigen_solver``.
- ``classes_`` and ``n_classes_`` attributes of
  :class:`tree.DecisionTreeClassifier` and all derived ensemble models
  are now flat in case of single output problems and nested in case of
  multi-output problems.

- The ``estimators_`` attribute of
  :class:`ensemble.GradientBoostingRegressor` and
  :class:`ensemble.GradientBoostingClassifier` is now an array of
  :class:`tree.DecisionTreeRegressor`.

- Renamed ``chunk_size`` to ``batch_size`` in
  :class:`decomposition.MiniBatchDictionaryLearning` and
  :class:`decomposition.MiniBatchSparsePCA` for consistency.

- :class:`svm.SVC` and :class:`svm.NuSVC` now provide a ``classes_``
  attribute and support arbitrary dtypes for labels ``y``. Also, the
  dtype returned by ``predict`` now reflects the dtype of ``y`` during
  ``fit`` (used to be ``np.float``).
- Changed default ``test_size`` in ``cross_validation.train_test_split``
  to None; added the possibility to infer ``test_size`` from
  ``train_size`` in ``cross_validation.ShuffleSplit`` and
  ``cross_validation.StratifiedShuffleSplit``.
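A quick sketch of an explicit ``test_size`` (note: in current scikit-learn releases this function lives in ``sklearn.model_selection``, which the example below uses):

```python
from sklearn.model_selection import train_test_split

# With test_size=0.3, 10 samples split into 7 train and 3 test; when
# both test_size and train_size are None, a default fraction is used.
X = list(range(10))
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)
```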
- Renamed the function ``sklearn.metrics.zero_one`` to
  :func:`sklearn.metrics.zero_one_loss`. Be aware that the default
  behavior of ``zero_one_loss`` is different from ``zero_one``:
  ``normalize=False`` is changed to ``normalize=True``.

- Renamed the function ``metrics.zero_one_score`` to
  :func:`metrics.accuracy_score`.
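The two renamed metrics are complements of each other, as this small sketch on made-up labels shows:

```python
from sklearn.metrics import accuracy_score, zero_one_loss

y_true = [0, 1, 1, 1]
y_pred = [0, 0, 1, 1]

# accuracy_score is 1 minus the normalized zero-one loss.
acc = accuracy_score(y_true, y_pred)   # 0.75
loss = zero_one_loss(y_true, y_pred)   # 0.25
```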
- :func:`datasets.make_circles` now has the same number of inner and
  outer points.

- In the Naive Bayes classifiers, the ``class_prior`` parameter was
  moved from ``fit`` to ``__init__``.
People
------

List of contributors for release 0.13 by number of commits.

- `Andreas Müller`_
- `Arnaud Joly`_
- `Peter Prettenhofer`_
- `Gael Varoquaux`_
- `Mathieu Blondel`_
- `Lars Buitinck`_
- `Olivier Grisel`_
- `Vlad Niculae`_
- `Gilles Louppe`_
- `Jaques Grobler`_
- `Alexandre Gramfort`_
- `Rob Zinkov`_
- `Fabian Pedregosa`_
- `Christian Osendorfer`_
- `Daniel Nouri`_
- `Virgile Fritsch <VirgileFritsch>`_
- `Satrajit Ghosh`_
- `James Bergstra`_
- `Jake Vanderplas`_
- `Robert Layton`_
- `Alexandre Passos`_