Release v0.5.1 (#236)
* Flake8

* Explicit error when y is an empty list in pg.ttest

#222

* Add keyword arguments in homoscedasticity function

#218

* Bugfix rm_anova and mixed_anova changed the dtypes of categorical columns + added observed=True to all groupby

#224

* Update version number in init and setup

* Use np.isclose for test_pearson == 1

#195

* Coverage for try..except scipy fallback

* Fix set_option for pandas 1.4

* Upgraded dependencies for seaborn and statsmodels

* Added Jarque-Bera test in pg.normality

#216

* Coverage scipy import error

* Use pd.concat instead of frame.append to avoid FutureWarning

* Remove add_categories(inplace=True) to avoid FutureWarning

* GH Discussions instead of Gitter

* Minor doc fix
raphaelvallat authored Feb 20, 2022
1 parent c9a5e41 commit a3e2ba6
Showing 16 changed files with 121 additions and 136 deletions.
7 changes: 1 addition & 6 deletions README.rst
@@ -23,8 +23,6 @@
.. image:: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244/status.svg
:target: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244

.. image:: https://badges.gitter.im/owner/repo.png
:target: https://gitter.im/pingouin-stats/Lobby

----------------

@@ -70,10 +68,7 @@ Documentation
Chat
====

If you have questions, please ask them in the public `Gitter chat <https://gitter.im/pingouin-stats/Lobby>`_

.. image:: https://badges.gitter.im/owner/repo.png
:target: https://gitter.im/pingouin-stats/Lobby
If you have questions, please ask them in `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_.

Installation
============
19 changes: 15 additions & 4 deletions docs/changelog.rst
@@ -8,13 +8,24 @@ What's new

*************

v0.6.0.dev
----------
v0.5.1 (February 2022)
----------------------

This is a minor release, with several bugfixes and improvements. This release is compatible with SciPy 1.8 and Pandas 1.4.

**Bugfixes**

a. Added support for SciPy 1.8 and Pandas 1.4. `PR 234 <https://github.com/raphaelvallat/pingouin/pull/234>`_.
b. Fixed bug where :py:func:`pingouin.rm_anova` and :py:func:`pingouin.mixed_anova` changed the dtypes of categorical columns in-place (`issue 224 <https://github.com/raphaelvallat/pingouin/issues/224>`_).

**Enhancements**

a. Faster implementation of :py:func:`pingouin.gzscore`, adding all options available in zscore: axis, ddof and nan_policy. Warning: this function is deprecated and will be removed in pingouin 0.7.0 (use scipy.stats.gzscore instead). See `pull request 210 <https://github.com/raphaelvallat/pingouin/pull/210>`_.
b. Replace use of statsmodels' studentized range distribution functions with SciPy's more accurate `scipy.stats.studentized_range`. See `pull request 229 <https://github.com/raphaelvallat/pingouin/pull/229>`_.
a. Faster implementation of :py:func:`pingouin.gzscore`, adding all options available in zscore: axis, ddof and nan_policy. Warning: this function is deprecated and will be removed in pingouin 0.7.0 (use :py:func:`scipy.stats.gzscore` instead). `PR 210 <https://github.com/raphaelvallat/pingouin/pull/210>`_.
b. Replace use of statsmodels' studentized range distribution functions with SciPy's more accurate :py:func:`scipy.stats.studentized_range`. `PR 229 <https://github.com/raphaelvallat/pingouin/pull/229>`_.
c. Add support for optional keyword arguments in the :py:func:`pingouin.homoscedasticity` function (`issue 218 <https://github.com/raphaelvallat/pingouin/issues/218>`_).
d. Add support for the Jarque-Bera test in :py:func:`pingouin.normality` (`issue 216 <https://github.com/raphaelvallat/pingouin/issues/216>`_).

Lastly, we have also deprecated the Gitter forum in favor of `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_. Please use Discussions to ask questions, share ideas / tips and engage with the Pingouin community!

*************

5 changes: 1 addition & 4 deletions docs/index.rst
@@ -21,9 +21,6 @@
.. image:: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244/status.svg
:target: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244

.. image:: https://badges.gitter.im/owner/repo.png
:target: https://gitter.im/pingouin-stats/Lobby


----------------

@@ -108,7 +105,7 @@ Whenever a new release is out there, you can upgrade your version by typing the
Quick start
===========

* If you have *questions*, please ask them in the public `Gitter chat <https://gitter.im/pingouin-stats/Lobby>`_.
* If you have *questions*, please ask them in `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_.

* If you want to *report a bug*, please open an issue on the `GitHub repository <https://github.com/raphaelvallat/pingouin>`_.

2 changes: 1 addition & 1 deletion pingouin/__init__.py
@@ -20,7 +20,7 @@
from .config import *

# Current version
__version__ = "0.5.0"
__version__ = "0.5.1"

# Warn if a newer version of Pingouin is available
from outdated import warn_if_outdated
52 changes: 33 additions & 19 deletions pingouin/distribution.py
@@ -86,9 +86,11 @@ def normality(data, dv=None, group=None, method="shapiro", alpha=.05):
Grouping variable (only when ``data`` is a long-format dataframe).
method : str
Normality test. `'shapiro'` (default) performs the Shapiro-Wilk test
using :py:func:`scipy.stats.shapiro`, and `'normaltest'` performs the
omnibus test of normality using :py:func:`scipy.stats.normaltest`.
The latter is more appropriate for large samples.
using :py:func:`scipy.stats.shapiro`, `'normaltest'` performs the
omnibus test of normality using :py:func:`scipy.stats.normaltest`, `'jarque_bera'` performs
the Jarque-Bera test using :py:func:`scipy.stats.jarque_bera`.
The Omnibus and Jarque-Bera tests are more suitable than the Shapiro test for
large samples.
alpha : float
Significance level.
@@ -194,9 +196,16 @@ def normality(data, dv=None, group=None, method="shapiro", alpha=.05):
W pval normal
Pre 0.967718 0.478773 True
Post 0.940728 0.095157 True
5. Same but using the Jarque-Bera test
>>> pg.normality(data, dv='Performance', group='Time', method="jarque_bera")
W pval normal
Pre 0.304021 0.858979 True
Post 1.265656 0.531088 True
"""
assert isinstance(data, (pd.DataFrame, pd.Series, list, np.ndarray))
assert method in ['shapiro', 'normaltest']
assert method in ['shapiro', 'normaltest', 'jarque_bera']
if isinstance(data, pd.Series):
data = data.to_frame()
col_names = ['W', 'pval']
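As a rough illustration of what the new `'jarque_bera'` option computes, the sketch below calls :py:func:`scipy.stats.jarque_bera` directly on synthetic data. The sample, the seed and the variable names are assumptions made for this example; they are not part of the commit. In Pingouin itself the statistic is reported in the `W` column, as shown in the docstring example above.

```python
import numpy as np
from scipy import stats

# Synthetic sample (assumption): 500 draws from a standard normal distribution.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=500)

alpha = 0.05  # significance level, same default as pg.normality

# The Jarque-Bera test compares sample skewness and kurtosis to those of a normal law.
statistic, pval = stats.jarque_bera(x)

# Same decision rule as the `normal` column: fail to reject normality when p > alpha.
is_normal = bool(pval > alpha)
print(f"JB = {statistic:.3f}, p = {pval:.3f}, normal = {is_normal}")
```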
@@ -227,14 +236,13 @@ def normality(data, dv=None, group=None, method="shapiro", alpha=.05):
grp = data.groupby(group, observed=True, sort=False)
cols = grp.groups.keys()
for _, tmp in grp:
stats = stats.append(normality(tmp[dv].to_numpy(),
method=method,
alpha=alpha))
st_grp = normality(tmp[dv].to_numpy(), method=method, alpha=alpha)
stats = pd.concat([stats, st_grp], axis=0, ignore_index=True)
stats.index = cols
return _postprocess_dataframe(stats)
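The change from `DataFrame.append` to `pd.concat` can be sketched outside Pingouin as follows. This is a minimal variant, not the commit's code: it collects the per-group frames in a list and concatenates once, rather than concatenating inside the loop as the diff does. The `summarize` helper and the toy `groups` data are hypothetical.

```python
import pandas as pd


def summarize(values, label):
    """Hypothetical per-group summary returning a one-row DataFrame."""
    return pd.DataFrame({"mean": [sum(values) / len(values)], "n": [len(values)]}, index=[label])


# Toy data (assumption): two groups of different sizes.
groups = {"Pre": [1.0, 2.0, 3.0], "Post": [2.0, 4.0, 6.0, 8.0]}

# Build one frame per group, then concatenate once instead of calling frame.append.
frames = [summarize(values, label) for label, values in groups.items()]
stats = pd.concat(frames, axis=0, ignore_index=True)

# Restore the group labels as index, as the diff does with `stats.index = cols`.
stats.index = list(groups)
print(stats)
```

Concatenating once at the end avoids repeatedly copying the accumulated frame, which is a design choice of this sketch rather than something the commit claims.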


def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05, **kwargs):
"""Test equality of variance.
Parameters
@@ -253,6 +261,8 @@ def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
The former is more robust to departure from normality.
alpha : float
Significance level.
**kwargs : optional
Optional argument(s) passed to the lower-level :py:func:`scipy.stats.levene` function.
Returns
-------
@@ -339,7 +349,13 @@ def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
W pval equal_var
levene 1.173518 0.310707 True
3. Bartlett test using a list of iterables
3. Same but using a mean center
>>> pg.homoscedasticity(data_long, dv="value", group="variable", center="mean")
W pval equal_var
levene 1.572239 0.209303 True
4. Bartlett test using a list of iterables
>>> data = [[4, 8, 9, 20, 14], np.array([5, 8, 15, 45, 12])]
>>> pg.homoscedasticity(data, method="bartlett", alpha=.05)
@@ -356,30 +372,28 @@ def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
# Get numeric data only
numdata = data._get_numeric_data()
assert numdata.shape[1] > 1, 'Data must have at least two columns.'
statistic, p = func(*numdata.to_numpy().T)
statistic, p = func(*numdata.to_numpy().T, **kwargs)
else:
# Long-format
assert group in data.columns
assert dv in data.columns
grp = data.groupby(group, observed=True)[dv]
assert grp.ngroups > 1, 'Data must have at least two columns.'
statistic, p = func(*grp.apply(list))
statistic, p = func(*grp.apply(list), **kwargs)
elif isinstance(data, list):
# Check that list contains other list or np.ndarray
assert all(isinstance(el, (list, np.ndarray)) for el in data)
assert len(data) > 1, 'Data must have at least two iterables.'
statistic, p = func(*data)
statistic, p = func(*data, **kwargs)
else:
# Data is a dict
assert all(isinstance(el, (list, np.ndarray)) for el in data.values())
assert len(data) > 1, 'Data must have at least two iterables.'
statistic, p = func(*data.values())
statistic, p = func(*data.values(), **kwargs)

equal_var = True if p > alpha else False
stat_name = 'W' if method.lower() == 'levene' else 'T'

stats = pd.DataFrame({stat_name: statistic, 'pval': p,
'equal_var': equal_var}, index=[method])
stats = pd.DataFrame({stat_name: statistic, 'pval': p, 'equal_var': equal_var}, index=[method])

return _postprocess_dataframe(stats)
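The keyword forwarding added above can be illustrated with a small standalone wrapper around :py:func:`scipy.stats.levene`. The `equal_variance` function is a hypothetical stand-in for illustration, not Pingouin's implementation; the two samples are taken from the Bartlett docstring example in this diff.

```python
from scipy import stats


def equal_variance(samples, alpha=0.05, **kwargs):
    """Hypothetical wrapper: forward extra keywords (e.g. center) to scipy.stats.levene."""
    statistic, p = stats.levene(*samples, **kwargs)
    return statistic, p, bool(p > alpha)


a = [4, 8, 9, 20, 14]
b = [5, 8, 15, 45, 12]

# SciPy's default centers on the median.
w_median, p_median, equal_median = equal_variance([a, b])

# Passing center="mean" through **kwargs selects the mean-centered variant.
w_mean, p_mean, equal_mean = equal_variance([a, b], center="mean")

print(w_median, p_median, equal_median)
print(w_mean, p_mean, equal_mean)
```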

@@ -463,7 +477,7 @@ def _check_multilevel_rm(data, func='epsilon'):
# We end up with a one-way design. It is similar to applying
# a paired T-test to gain scores instead of using repeated measures
# on two time points. Here we have computed the gain scores.
data = data.groupby(level=1, axis=1).diff(axis=1).dropna(axis=1)
data = data.groupby(level=1, axis=1, observed=True).diff(axis=1).dropna(axis=1)
data = data.droplevel(level=0, axis=1)
else:
# Both factors have more than 2 levels -- differ from R / JASP
@@ -498,8 +512,8 @@ def _long_to_wide_rm(data, dv=None, within=None, subject=None):
# Keep all relevant columns and reset index
data = data[_fl([subject, within, dv])]
# Convert to wide-format + collapse to the mean
data = pd.pivot_table(data, index=subject, values=dv, columns=within,
aggfunc='mean', dropna=True, observed=True)
data = pd.pivot_table(
data, index=subject, values=dv, columns=within, aggfunc='mean', dropna=True, observed=True)
return data
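The effect of `observed=True` on categorical grouping columns, which the commit message says was added to all groupby calls, can be seen in a small example. The toy dataframe and its unused `Followup` category are assumptions made for illustration.

```python
import pandas as pd

# Toy long-format data (assumption): "Time" is categorical with one unused level.
df = pd.DataFrame({
    "Subject": [1, 1, 2, 2],
    "Time": pd.Categorical(["Pre", "Post", "Pre", "Post"],
                           categories=["Pre", "Post", "Followup"]),
    "Score": [1.0, 2.0, 3.0, 4.0],
})

# observed=True keeps only the category levels that actually occur in the data.
observed = df.groupby("Time", observed=True)["Score"].mean()

# observed=False also returns the unused "Followup" level, filled with NaN.
all_levels = df.groupby("Time", observed=False)["Score"].mean()

# Same idea for the long-to-wide conversion shown in the diff.
wide = pd.pivot_table(df, index="Subject", values="Score", columns="Time",
                      aggfunc="mean", dropna=True, observed=True)

print(observed)
print(all_levels)
print(wide)
```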


27 changes: 7 additions & 20 deletions pingouin/pairwise.py
@@ -173,8 +173,8 @@ def pairwise_ttests(data=None, dv=None, between=None, within=None, subject=None,
>>> import pandas as pd
>>> import pingouin as pg
>>> pd.set_option('expand_frame_repr', False)
>>> pd.set_option('max_columns', 20)
>>> pd.set_option('display.expand_frame_repr', False)
>>> pd.set_option('display.max_columns', 20)
>>> df = pg.read_dataset('mixed_anova.csv')
>>> pg.pairwise_ttests(dv='Scores', between='Group', data=df).round(3)
Contrast A B Paired Parametric T dof alternative p-unc BF10 hedges
@@ -426,11 +426,12 @@ def pairwise_ttests(data=None, dv=None, between=None, within=None, subject=None,
else:
tmp = data
# Recursive call to pairwise_ttests
stats = stats.append(pairwise_ttests(
pt = pairwise_ttests(
dv=dv, between=fbt[i], within=fwt[i], subject=subject, data=tmp,
parametric=parametric, marginal=marginal, alpha=alpha, alternative=alternative,
padjust=padjust, effsize=effsize, correction=correction, nan_policy=nan_policy,
return_desc=return_desc), ignore_index=True, sort=False)
return_desc=return_desc)
stats = pd.concat([stats, pt], axis=0, ignore_index=True, sort=False)

# Then compute the interaction between the factors
if interaction:
@@ -608,12 +609,6 @@ def pairwise_tukey(data=None, dv=None, between=None, effsize='hedges'):
:math:`Q(\\sqrt2|t_i|, r, N - r)` where :math:`r` is the total number of
groups and :math:`N` is the total sample size.
.. warning:: Versions of Pingouin below 0.3.10 used a wrong algorithm for
the studentized range approximation [2]_, which resulted in (slightly)
incorrect p-values. Please make sure you're using the
LATEST VERSION of Pingouin, and always DOUBLE CHECK your results with
another statistical software.
References
----------
.. [1] Tukey, John W. "Comparing individual means in the analysis of
@@ -635,7 +630,6 @@ def pairwise_tukey(data=None, dv=None, between=None, effsize='hedges'):
1 Adelie Gentoo 3700.662 5076.016 -1375.354 56.148 -24.495 0.000 -2.967
2 Chinstrap Gentoo 3733.088 5076.016 -1342.928 69.857 -19.224 0.000 -2.894
"""

# First compute the ANOVA
# For max precision, make sure rounding is disabled
old_options = options.copy()
@@ -760,12 +754,6 @@ def pairwise_gameshowell(data=None, dv=None, between=None, effsize='hedges'):
The p-values are then approximated using the Studentized range distribution
:math:`Q(\\sqrt2|t_i|, r, v_i)`.
.. warning:: Versions of Pingouin below 0.3.10 used a wrong algorithm for
the studentized range approximation [2]_, which resulted in (slightly)
incorrect p-values. Please make sure you're using the
LATEST VERSION of Pingouin, and always DOUBLE CHECK your results with
another statistical software.
References
----------
.. [1] Games, Paul A., and John F. Howell. "Pairwise multiple comparison
@@ -789,7 +777,6 @@ def pairwise_gameshowell(data=None, dv=None, between=None, effsize='hedges'):
1 Adelie Gentoo 3700.662 5076.016 -1375.354 58.811 -23.386 249.643 0.00 -2.833
2 Chinstrap Gentoo 3733.088 5076.016 -1342.928 65.103 -20.628 170.404 0.00 -3.105
"""

# Check the dataframe
_check_dataframe(dv=dv, between=between, effects='between', data=data)

@@ -956,8 +943,8 @@ def pairwise_corr(data, columns=None, covar=None, alternative='two-sided',
>>> import pandas as pd
>>> import pingouin as pg
>>> pd.set_option('expand_frame_repr', False)
>>> pd.set_option('max_columns', 20)
>>> pd.set_option('display.expand_frame_repr', False)
>>> pd.set_option('display.max_columns', 20)
>>> data = pg.read_dataset('pairwise_corr').iloc[:, 1:]
>>> pg.pairwise_corr(data, method='spearman', alternative='greater', padjust='bonf').round(3)
X Y method alternative n r CI95% p-unc p-corr p-adjust power
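The docstring changes in this file replace abbreviated pandas option names with fully qualified ones, which the commit message describes as a fix for pandas 1.4. This is presumably because a short pattern such as `max_columns` can match more than one option. The sketch below uses the full `display.*` names. Using `pd.option_context` is a choice made for this example so that the settings are restored afterwards; the commit itself keeps `pd.set_option`.

```python
import pandas as pd

# Remember the current setting so we can check it is restored afterwards.
before = pd.get_option("display.max_columns")

# Fully qualified option names are unambiguous across pandas versions.
with pd.option_context("display.expand_frame_repr", False,
                       "display.max_columns", 20):
    inside = pd.get_option("display.max_columns")
    wrap = pd.get_option("display.expand_frame_repr")

after = pd.get_option("display.max_columns")
print(inside, wrap, after == before)
```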