docs/source/release/version0.9.rst
:orphan:
statsmodels is using github to store the updated documentation which is available under https://www.statsmodels.org/stable for the last release, and https://www.statsmodels.org/devel/ for the development version.
Warning
API stability is not guaranteed for new features, although even in this case changes will be made in a backwards compatible way if possible. The stability of a new feature depends on how much time it was already in statsmodels main and how much usage it has already seen. If there are specific known problems or limitations, then they are mentioned in the docstrings.
The list of pull requests for this release can be found on github https://github.com/statsmodels/statsmodels/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+milestone%3A0.9 (The list does not include some pull request that were merged before the 0.8 release but not included in 0.8.)
The following lists the main new features of statsmodels 0.9. In addition, release 0.9 includes bug fixes, refactorings and improvements in many areas.
base
discrete
new count models (Evgeny Zhurko GSOC, Josef Perktold)
discrete optimization improvements #3921, #3928 (Josef Perktold)
extend discrete margin when extra params, NegativeBinomial #3811 (Josef Perktold)
duration
genmod
graphics
imputation
iolib
multivariate
regression
stats
tools
tsa
maintenance
bug-wrong
A new issue label `type-bug-wrong` indicates bugs that cause that incorrect
numbers are returned without warnings.
(Regular bugs are mostly usability bugs or bugs that raise an exception for
unsupported use cases.)
see https://github.com/statsmodels/statsmodels/issues?q=is%3Aissue+label%3Atype-bug-wrong+is%3Aclosed+milestone%3A0.9
- scale in GLM fit_constrained, #4193 fixed in #4195
cov_params and bse were incorrect if scale is estimated as in Gaussian.
(This did not affect families with scale=1 such as Poisson)
- incorrect `pearson_chi2` with binomial counts, #3612 fixed as part of #3692
- null_deviance and llnull in GLMResults were wrong if exposure was used and
when offset was used with Binomial counts.
- GLM Binomial in the non-binary count case used incorrect endog in recreating
models which is
used by fit_regularized and fit_constrained #4599.
- GLM observed hessian was incorrectly computed if non-canonical link is used,
fixed in #4620
This fix improves convergence with gradient optimization and removes a usually
numerically small error in cov_params.
- discrete predict with offset or exposure, #3569 fixed in #3696
If either offset or exposure are not None but exog is None, then offset and
exposure arguments in predict were ignored.
- discrete margins had wrong dummy and count effect if constant is prepended,
#3695 fixed in #3696
- OLS outlier test, wrong index if order is True, #3971 fixed in #4385
- tsa coint ignored the autolag keyword, #3966 fixed in #4492
This is a backwards incompatible change in default, instead of fixed maxlag
it defaults now to 'aic' lag selection. The default autolag is now the same
as the adfuller default.
- wrong confidence interval in contingency table summary, #3822 fixed in #3830
This only affected the summary and not the corresponding attribute.
- incorrect results in summary_col if regressor_order is used,
#3767 fixed in #4271
Description of selected new feature
-----------------------------------
The following provides more information about a selected set of new features.
Vector Error Correction Model (VECM)
The VECM framework developed during GSOC 2016 by Aleksandar Karakas adds support for non-stationary cointegrated VAR processes to statsmodels. Currently, the following topics are implemented
New methods have been added also to the existing VAR model, and VAR has now limited support for user provided explanatory variables.
New count models have been added as part of GSOC 2017 by Evgeny Zhurko. Additional models that are not yet finished will be added for the next release.
The new models are:
Generalized linear mixed models
Limited support for GLIMMIX models is now included in the genmod
module. Binomial and Poisson models with independent random effects
can be fit using Bayesian methods (Laplace and mean field
approximations to the posterior).
Multiple imputation
~~~~~~~~~~~~~~~~~~~
Multiple imputation using a multivariate Gaussian model is now
included in the imputation module. The model is fit via Gibbs
sampling from the joint posterior of the mean vector, covariance
matrix, and missing data values. A convenience function for fitting a
model to the multiply imputed data sets and combining the results is
provided. This is an alternative to the existing MICE (Multiple
Imputation via Chained Equations) procedures.
Exponential smoothing models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Exponential smoothing models are now available (introduced in #4176 by
Terence L van Zyl). These models are conceptually simple, decomposing a time
series into level, trend, and seasonal components that are constructed from
weighted averages of past observations. Nonetheless, they produce forecasts
that are competitive with more advanced models and which may be easier to
interpret.
Available models include:
- Simple exponential smoothing
- Holt's method
- Holt-Winters exponential smoothing
Improved time series index support
Handling of indexes for time series models has been overhauled (#3272) to take advantage of recent improvements in Pandas and to shift to Pandas much of the special case handling (especially for date indexes) that had previously been done in statsmodels. Benefits include more consistent behavior, a reduced number of bugs from corner cases, and a reduction in the maintenance burden.
Although an effort was made to maintain backwards compatibility with this change, it is possible that some undocumented corner cases that previously worked will now raise warnings or exceptions.
State space models
The state space model infrastructure has been rewritten and improved (#2845).
New features include:
- Kalman smoother rewritten in Cython for substantial performance improvements
- Simulation smoother (Durbin and Koopman, 2002)
- Fast simulation of time series for any state space model
- Univariate Kalman filtering and smoothing (Koopman and Durbin, 2000)
- Collapsed Kalman filtering and smoothing (Jungbacker and Koopman, 2014)
- Optional computation of the lag-one state autocovariance
- Use of the Scipy BLAS functions for Cython interface if available
(`scipy.linalg.cython_blas` for Scipy >= 0.16)
These features yield new features and improve performance for the existing
state space models (`SARIMAX`, `UnobservedComopnents`, `DynamicFactor`, and
`VARMAX`), and they also make Bayesian estimation by Gibbs-sampling possible.
**Warning**: this will be the last version that includes the original state
space code and supports Scipy < 0.16. The next release will only include the
new state space code.
Unobserved components models: frequency-domain seasonals
Unobserved components models now support modeling seasonal factors from a frequency-domain perspective with user-specified period and harmonics (introduced in #4250 by Jordan Yoder). This not only allows for multiple seasonal effects, but also allows the representation of seasonal components with fewer unobserved states. This can improve computational performance and, since it allows for a more parsimonious model, may also improve the out-of-sample performance of the model.
see github issues for a list of bug fixes included in this release https://github.com/statsmodels/statsmodels/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+milestone%3A0.9+label%3Atype-bug https://github.com/statsmodels/statsmodels/pulls?q=is%3Apr+is%3Amerged+milestone%3A0.9+label%3Atype-bug-wrong
Refitting elastic net regularized models using the refit=True
option now returns the unregularized parameters for the coefficients
selected by the regularized fitter, as documented. #4213
In MixedLM, a bug that produced exceptions when calling
random_effects_cov on models with variance components has been
fixed.
DynamicVAR and DynamicPanelVAR is deprecated and will be removed in a future version. It used rolling OLS from pandas which has been removed in pandas.
In MixedLM, names for the random effects variance and covariance parameters have changed from, e.g. G RE to G Var or G x F Cov. This impacts summary output, and also may require modifications to user code that extracted these parameters from the fitted results object by name.
In MixedLM, the names for the random effects realizations for variance components have been changed. When using formulas, the random effect realizations are named using the column names produced by Patsy when parsing the formula.
Besides receiving contributions for new and improved features and for bugfixes, important contributions to general maintenance for this release came from
and the general maintainer and code reviewer
Additionally, many users contributed by participation in github issues and providing feedback.
Thanks to all of the contributors for the 0.9 release (based on git log):
.. note::
* Aleksandar Karakas
* Alex Fortin
* Alexander Belopolsky
* Brock Mendel
* Chad Fulton
* ChadFulton
* Christian Lorentzen
* Dave Willmer
* Dror Atariah
* Evgeny Zhurko
* Gerard Brunick
* Greg Mosby
* Jacob Kimmel
* Jamie Morton
* Jarvis Miller
* Jasmine Mou
* Jeroen Van Goey
* Jim Correia
* Joon Ro
* Jordan Yoder
* Jorge C. Leitao
* Josef Perktold
* Joses W. Ho
* José Lopez
* Joshua Engelman
* Juan Escamilla
* Justin Bois
* Kerby Shedden
* Kernc
* Kevin Sheppard
* Leland Bybee
* Maxim Uvarov
* Michael Kaminsky
* Mosky Liu
* Natasha Watkins
* Nick DeRobertis
* Niels Wouda
* Pamphile ROY
* Peter Quackenbush
* Quentin Andre
* Richard Höchenberger
* Rob Klooster
* Roman Ring
* Scott Tsai
* Soren Fuglede Jorgensen
* Tom Augspurger
* Tommy Odland
* Tony Jiang
* Yichuan Liu
* ftemme
* hugovk
* kiwirob
* malickf
* tvanzyl
* weizhongg
* zveryansky
These lists of names are automatically generated based on git log, and may not be complete.