Release v0.5.1 (#236)

* Flake8
* Explicit error when y is an empty list in pg.ttest #222
* Add keyword arguments in homoscedasticity function #218
* Bugfix: rm_anova and mixed_anova changed the dtypes of categorical columns; added observed=True to all groupby calls #224
* Update version number in init and setup
* Use np.isclose for test_pearson == 1 #195
* Coverage for try..except scipy fallback
* Fix set_option for pandas 1.4
* Upgraded dependencies for seaborn and statsmodels
* Added Jarque-Bera test in pg.normality #216
* Coverage for scipy import error
* Use pd.concat instead of frame.append to avoid FutureWarning
* Remove add_categories(inplace=True) to avoid FutureWarning
* GH Discussions instead of Gitter
* Minor doc fix
raphaelvallat · Feb 20, 2022 · a3e2ba6
1 parent c9a5e41
Showing 16 changed files with 121 additions and 136 deletions.
diff --git a/README.rst b/README.rst
@@ -23,8 +23,6 @@
 .. image:: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244/status.svg
     :target: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244
 
-.. image:: https://badges.gitter.im/owner/repo.png
-    :target: https://gitter.im/pingouin-stats/Lobby
 
 ----------------
 
@@ -70,10 +68,7 @@ Documentation
 Chat
 ====
 
-If you have questions, please ask them in the public `Gitter chat <https://gitter.im/pingouin-stats/Lobby>`_
-
-.. image:: https://badges.gitter.im/owner/repo.png
-    :target: https://gitter.im/pingouin-stats/Lobby
+If you have questions, please ask them in `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_.
 
 Installation
 ============

diff --git a/docs/changelog.rst b/docs/changelog.rst
@@ -8,13 +8,24 @@ What's new
 
 *************
 
-v0.6.0.dev
-----------
+v0.5.1 (February 2022)
+----------------------
+
+This is a minor release, with several bugfixes and improvements. This release is compatible with SciPy 1.8 and Pandas 1.4.
+
+**Bugfixes**
+
+a. Added support for SciPy 1.8 and Pandas 1.4. `PR 234 <https://github.com/raphaelvallat/pingouin/pull/234>`_.
+b. Fixed a bug where :py:func:`pingouin.rm_anova` and :py:func:`pingouin.mixed_anova` changed the dtypes of categorical columns in-place (`issue 224 <https://github.com/raphaelvallat/pingouin/issues/224>`_).
 
 **Enhancements**
 
-a. Faster implementation of :py:func:`pingouin.gzscore`, adding all options available in zscore: axis, ddof and nan_policy. Warning: this functions is deprecated and will be removed in pingouin 0.7.0 (use scipy.stats.gzscore instead). See `pull request 210 <https://github.com/raphaelvallat/pingouin/pull/210>`_.
-b. Replace use of statsmodels' studentized range distribution functions with more SciPy's more accurate `scipy.stats.studentized_range`. See `pull request 229 <https://github.com/raphaelvallat/pingouin/pull/229>`_.
+a. Faster implementation of :py:func:`pingouin.gzscore`, adding all the options available in :py:func:`scipy.stats.zscore`: axis, ddof and nan_policy. Warning: this function is deprecated and will be removed in pingouin 0.7.0 (use :py:func:`scipy.stats.gzscore` instead). `PR 210 <https://github.com/raphaelvallat/pingouin/pull/210>`_.
+b. Replace use of statsmodels' studentized range distribution functions with SciPy's more accurate :py:func:`scipy.stats.studentized_range`. `PR 229 <https://github.com/raphaelvallat/pingouin/pull/229>`_.
+c. Add support for optional keyword arguments in the :py:func:`pingouin.homoscedasticity` function (`issue 218 <https://github.com/raphaelvallat/pingouin/issues/218>`_).
+d. Add support for the Jarque-Bera test in :py:func:`pingouin.normality` (`issue 216 <https://github.com/raphaelvallat/pingouin/issues/216>`_).
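Item a. points users to `scipy.stats.gzscore`. As a rough migration sketch (assuming SciPy >= 1.8, the release that added `gzscore`), the geometric z-score is simply the ordinary z-score of the log-transformed data, and it accepts the same `axis` / `ddof` / `nan_policy` keywords:

```python
import numpy as np
from scipy import stats

# Strictly positive data: the geometric z-score works on log-values.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

# SciPy >= 1.8: same axis / ddof / nan_policy keywords as stats.zscore.
gz = stats.gzscore(x, ddof=0)

# By definition, this matches the z-score of log(x).
gz_manual = stats.zscore(np.log(x), ddof=0)

print(np.allclose(gz, gz_manual))  # True
print(np.round(gz, 3))
```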
+
+Lastly, we have also deprecated the Gitter forum in favor of `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_. Please use Discussions to ask questions, share ideas / tips and engage with the Pingouin community!
 
 *************
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -21,9 +21,6 @@
 .. image:: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244/status.svg
     :target: http://joss.theoj.org/papers/d2254e6d8e8478da192148e4cfbe4244
 
-.. image:: https://badges.gitter.im/owner/repo.png
-    :target: https://gitter.im/pingouin-stats/Lobby
-
 
 ----------------
 
@@ -108,7 +105,7 @@ Whenever a new release is out there, you can upgrade your version by typing the
 Quick start
 ===========
 
-* If you have *questions*, please ask them in the public `Gitter chat <https://gitter.im/pingouin-stats/Lobby>`_.
+* If you have *questions*, please ask them in `GitHub Discussions <https://github.com/raphaelvallat/pingouin/discussions>`_.
 
 * If you want to *report a bug*, please open an issue on the `GitHub repository <https://github.com/raphaelvallat/pingouin>`_.
 

diff --git a/pingouin/__init__.py b/pingouin/__init__.py
@@ -20,7 +20,7 @@
 from .config import *
 
 # Current version
-__version__ = "0.5.0"
+__version__ = "0.5.1"
 
 # Warn if a newer version of Pingouin is available
 from outdated import warn_if_outdated

diff --git a/pingouin/distribution.py b/pingouin/distribution.py
@@ -86,9 +86,11 @@ def normality(data, dv=None, group=None, method="shapiro", alpha=.05):
         Grouping variable (only when ``data`` is a long-format dataframe).
     method : str
         Normality test. `'shapiro'` (default) performs the Shapiro-Wilk test
-        using :py:func:`scipy.stats.shapiro`, and `'normaltest'` performs the
-        omnibus test of normality using :py:func:`scipy.stats.normaltest`.
-        The latter is more appropriate for large samples.
+        using :py:func:`scipy.stats.shapiro`, `'normaltest'` performs the
+        omnibus test of normality using :py:func:`scipy.stats.normaltest`,
+        and `'jarque_bera'` performs the Jarque-Bera test using
+        :py:func:`scipy.stats.jarque_bera`. The omnibus and Jarque-Bera
+        tests are more suitable than the Shapiro-Wilk test for large samples.
     alpha : float
         Significance level.
 
@@ -194,9 +196,16 @@ def normality(data, dv=None, group=None, method="shapiro", alpha=.05):
                  W      pval  normal
     Pre   0.967718  0.478773    True
     Post  0.940728  0.095157    True
+
+    5. Same but using the Jarque-Bera test
+
+    >>> pg.normality(data, dv='Performance', group='Time', method="jarque_bera")
+                W      pval   normal
+    Pre   0.304021  0.858979    True
+    Post  1.265656  0.531088    True
     """
     assert isinstance(data, (pd.DataFrame, pd.Series, list, np.ndarray))
-    assert method in ['shapiro', 'normaltest']
+    assert method in ['shapiro', 'normaltest', 'jarque_bera']
     if isinstance(data, pd.Series):
         data = data.to_frame()
     col_names = ['W', 'pval']
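Each of the three accepted `method` values maps onto a SciPy function that returns a (statistic, p-value) pair. A minimal sketch of that dispatch, using a hypothetical `normality_sketch` helper (not Pingouin's actual implementation) that, like the `col_names = ['W', 'pval']` line above, labels the statistic `W` whichever test is used:

```python
import numpy as np
import pandas as pd
from scipy import stats

def normality_sketch(x, method="shapiro", alpha=0.05):
    """Hypothetical helper mirroring the W / pval / normal layout of pg.normality."""
    tests = {
        "shapiro": stats.shapiro,          # Shapiro-Wilk
        "normaltest": stats.normaltest,    # D'Agostino-Pearson omnibus test
        "jarque_bera": stats.jarque_bera,  # Jarque-Bera
    }
    assert method in tests, f"Unknown method: {method}"
    statistic, pval = tests[method](x)
    return pd.DataFrame({"W": [statistic], "pval": [pval], "normal": [pval > alpha]})

# Made-up, normally distributed sample.
rng = np.random.default_rng(42)
x = rng.normal(loc=0, scale=1, size=200)

for method in ["shapiro", "normaltest", "jarque_bera"]:
    print(method)
    print(normality_sketch(x, method=method))
```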
@@ -227,14 +236,13 @@ def normality(data, dv=None, group=None, method="shapiro", alpha=.05):
             grp = data.groupby(group, observed=True, sort=False)
             cols = grp.groups.keys()
             for _, tmp in grp:
-                stats = stats.append(normality(tmp[dv].to_numpy(),
-                                               method=method,
-                                               alpha=alpha))
+                st_grp = normality(tmp[dv].to_numpy(), method=method, alpha=alpha)
+                stats = pd.concat([stats, st_grp], axis=0, ignore_index=True)
             stats.index = cols
     return _postprocess_dataframe(stats)
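`DataFrame.append` was deprecated in pandas 1.4 (and removed in pandas 2.0), hence the switch to `pd.concat` above. A minimal sketch of the same per-group loop on made-up data; it collects the pieces in a list and concatenates once, a common variant of the incremental `pd.concat` call used in the diff that avoids re-copying the accumulated frame on every iteration:

```python
import pandas as pd

# Made-up long-format data with one grouping column.
df = pd.DataFrame({
    "group": ["Pre", "Pre", "Pre", "Post", "Post", "Post"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

pieces = []
# sort=False keeps the groups in order of first appearance ("Pre", then "Post").
for name, tmp in df.groupby("group", observed=True, sort=False):
    # One single-row frame per group, indexed by the group label.
    pieces.append(pd.DataFrame({"mean": [tmp["value"].mean()],
                                "sd": [tmp["value"].std()]}, index=[name]))

# Replaces the deprecated `stats = stats.append(piece)` pattern.
summary = pd.concat(pieces, axis=0)
print(summary)
```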
 
 
-def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
+def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05, **kwargs):
     """Test equality of variance.
 
     Parameters
@@ -253,6 +261,8 @@ def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
         The former is more robust to departure from normality.
     alpha : float
         Significance level.
+    **kwargs : optional
+        Optional keyword argument(s) passed to the lower-level :py:func:`scipy.stats.levene` function.
 
     Returns
     -------
@@ -339,7 +349,13 @@ def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
                    W      pval  equal_var
     levene  1.173518  0.310707       True
 
-    3. Bartlett test using a list of iterables
+    3. Same but using the mean (instead of the median) as the center
+
+    >>> pg.homoscedasticity(data_long, dv="value", group="variable", center="mean")
+                   W      pval  equal_var
+    levene  1.572239  0.209303       True
+
+    4. Bartlett test using a list of iterables
 
     >>> data = [[4, 8, 9, 20, 14], np.array([5, 8, 15, 45, 12])]
     >>> pg.homoscedasticity(data, method="bartlett", alpha=.05)
@@ -356,30 +372,28 @@ def homoscedasticity(data, dv=None, group=None, method="levene", alpha=.05):
             # Get numeric data only
             numdata = data._get_numeric_data()
             assert numdata.shape[1] > 1, 'Data must have at least two columns.'
-            statistic, p = func(*numdata.to_numpy().T)
+            statistic, p = func(*numdata.to_numpy().T, **kwargs)
         else:
             # Long-format
             assert group in data.columns
             assert dv in data.columns
             grp = data.groupby(group, observed=True)[dv]
             assert grp.ngroups > 1, 'Data must have at least two columns.'
-            statistic, p = func(*grp.apply(list))
+            statistic, p = func(*grp.apply(list), **kwargs)
     elif isinstance(data, list):
         # Check that list contains other list or np.ndarray
         assert all(isinstance(el, (list, np.ndarray)) for el in data)
         assert len(data) > 1, 'Data must have at least two iterables.'
-        statistic, p = func(*data)
+        statistic, p = func(*data, **kwargs)
     else:
         # Data is a dict
         assert all(isinstance(el, (list, np.ndarray)) for el in data.values())
         assert len(data) > 1, 'Data must have at least two iterables.'
-        statistic, p = func(*data.values())
+        statistic, p = func(*data.values(), **kwargs)
 
     equal_var = True if p > alpha else False
     stat_name = 'W' if method.lower() == 'levene' else 'T'
-
-    stats = pd.DataFrame({stat_name: statistic, 'pval': p,
-                          'equal_var': equal_var}, index=[method])
+    stats = pd.DataFrame({stat_name: statistic, 'pval': p, 'equal_var': equal_var}, index=[method])
 
     return _postprocess_dataframe(stats)
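The new `**kwargs` are forwarded unchanged to the underlying SciPy test. A minimal sketch of that forwarding, using a hypothetical `homoscedasticity_sketch` helper and made-up groups. `scipy.stats.levene` centers on the median by default (the Brown-Forsythe variant), so `center="mean"` gives the original Levene test; `scipy.stats.bartlett` is called here without extra keywords:

```python
import pandas as pd
from scipy import stats

def homoscedasticity_sketch(groups, method="levene", alpha=0.05, **kwargs):
    """Hypothetical helper: forward extra keyword arguments to the SciPy test."""
    func = stats.levene if method == "levene" else stats.bartlett
    statistic, pval = func(*groups, **kwargs)
    stat_name = "W" if method == "levene" else "T"
    return pd.DataFrame({stat_name: statistic, "pval": pval,
                         "equal_var": pval > alpha}, index=[method])

# Made-up samples, one list per group.
groups = [[4, 8, 9, 20, 14], [5, 8, 15, 45, 12], [2, 3, 6, 7, 9]]

print(homoscedasticity_sketch(groups))                     # median-centered (default)
print(homoscedasticity_sketch(groups, center="mean"))      # kwargs forwarded to levene
print(homoscedasticity_sketch(groups, method="bartlett"))  # no extra keywords
```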
 
@@ -463,7 +477,7 @@ def _check_multilevel_rm(data, func='epsilon'):
             # We end up with a one-way design. It is similar to applying
             # a paired T-test to gain scores instead of using repeated measures
             # on two time points. Here we have computed the gain scores.
-            data = data.groupby(level=1, axis=1).diff(axis=1).dropna(axis=1)
+            data = data.groupby(level=1, axis=1, observed=True).diff(axis=1).dropna(axis=1)
             data = data.droplevel(level=0, axis=1)
         else:
             # Both factors have more than 2 levels -- differ from R / JASP
@@ -498,8 +512,8 @@ def _long_to_wide_rm(data, dv=None, within=None, subject=None):
     # Keep all relevant columns and reset index
     data = data[_fl([subject, within, dv])]
     # Convert to wide-format + collapse to the mean
-    data = pd.pivot_table(data, index=subject, values=dv, columns=within,
-                          aggfunc='mean', dropna=True, observed=True)
+    data = pd.pivot_table(
+        data, index=subject, values=dv, columns=within, aggfunc='mean', dropna=True, observed=True)
     return data
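Several `groupby` and `pivot_table` calls in this diff now pass `observed=True`. For a categorical grouper, that flag decides whether unused categories show up in the result. A small sketch on made-up data:

```python
import pandas as pd

# Made-up data: the "Post" category is declared but has no rows.
df = pd.DataFrame({
    "Time": pd.Categorical(["Pre", "Pre", "Mid", "Mid"],
                           categories=["Pre", "Mid", "Post"]),
    "Score": [1.0, 2.0, 3.0, 4.0],
})

# observed=False: every category appears, the empty "Post" level gives NaN.
all_levels = df.groupby("Time", observed=False)["Score"].mean()

# observed=True: only the levels that actually occur in the data.
seen_levels = df.groupby("Time", observed=True)["Score"].mean()

print(all_levels)
print(seen_levels)
```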
 
 

diff --git a/pingouin/pairwise.py b/pingouin/pairwise.py
@@ -173,8 +173,8 @@ def pairwise_ttests(data=None, dv=None, between=None, within=None, subject=None,
 
     >>> import pandas as pd
     >>> import pingouin as pg
-    >>> pd.set_option('expand_frame_repr', False)
-    >>> pd.set_option('max_columns', 20)
+    >>> pd.set_option('display.expand_frame_repr', False)
+    >>> pd.set_option('display.max_columns', 20)
     >>> df = pg.read_dataset('mixed_anova.csv')
     >>> pg.pairwise_ttests(dv='Scores', between='Group', data=df).round(3)
       Contrast        A           B  Paired  Parametric     T    dof alternative  p-unc   BF10  hedges
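The docstring examples now spell out the fully qualified `display.*` option names. pandas resolves a short key such as `'max_columns'` by matching it against every registered option and raises an `OptionError` when more than one option matches; pandas 1.4 added `styler.render.max_columns`, which is presumably why the short form broke. A sketch using the full names with `pd.option_context`, which also restores the previous values on exit:

```python
import pandas as pd

before = pd.get_option("display.expand_frame_repr")

# Fully qualified keys cannot match more than one registered option.
with pd.option_context("display.expand_frame_repr", False,
                       "display.max_columns", 20):
    inside = (pd.get_option("display.expand_frame_repr"),
              pd.get_option("display.max_columns"))

# Leaving the context restores the original settings.
after = pd.get_option("display.expand_frame_repr")
print(inside, before == after)
```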
@@ -426,11 +426,12 @@ def pairwise_ttests(data=None, dv=None, between=None, within=None, subject=None,
             else:
                 tmp = data
             # Recursive call to pairwise_ttests
-            stats = stats.append(pairwise_ttests(
+            pt = pairwise_ttests(
                 dv=dv, between=fbt[i], within=fwt[i], subject=subject, data=tmp,
                 parametric=parametric, marginal=marginal, alpha=alpha, alternative=alternative,
                 padjust=padjust, effsize=effsize, correction=correction, nan_policy=nan_policy,
-                return_desc=return_desc), ignore_index=True, sort=False)
+                return_desc=return_desc)
+            stats = pd.concat([stats, pt], axis=0, ignore_index=True, sort=False)
 
         # Then compute the interaction between the factors
         if interaction:
@@ -608,12 +609,6 @@ def pairwise_tukey(data=None, dv=None, between=None, effsize='hedges'):
     :math:`Q(\\sqrt2|t_i|, r, N - r)` where :math:`r` is the total number of
     groups and :math:`N` is the total sample size.
 
-    .. warning:: Versions of Pingouin below 0.3.10 used a wrong algorithm for
-        the studentized range approximation [2]_, which resulted in (slightly)
-        incorrect p-values. Please make sure you're using the
-        LATEST VERSION of Pingouin, and always DOUBLE CHECK your results with
-        another statistical software.
-
     References
     ----------
     .. [1] Tukey, John W. "Comparing individual means in the analysis of
@@ -635,7 +630,6 @@ def pairwise_tukey(data=None, dv=None, between=None, effsize='hedges'):
     1     Adelie     Gentoo  3700.662  5076.016 -1375.354  56.148 -24.495    0.000  -2.967
     2  Chinstrap     Gentoo  3733.088  5076.016 -1342.928  69.857 -19.224    0.000  -2.894
     """
-
     # First compute the ANOVA
     # For max precision, make sure rounding is disabled
     old_options = options.copy()
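The Tukey docstring approximates each p-value as `Q(sqrt(2)|t_i|, r, N - r)`, which this release computes with `scipy.stats.studentized_range` instead of statsmodels. A sketch of that formula with a hypothetical `tukey_pvalue` helper and made-up inputs (assuming SciPy >= 1.7, which introduced `studentized_range`). With only two groups, the studentized range of the two means is sqrt(2) times the absolute t statistic, so the result should match the two-sided t-test, a convenient sanity check:

```python
import numpy as np
from scipy import stats

def tukey_pvalue(t, r, n_total):
    """Hypothetical helper: P(Q > sqrt(2)|t|) with r groups and N - r df."""
    return stats.studentized_range.sf(np.sqrt(2) * np.abs(t), r, n_total - r)

# Made-up contrast: t = 2.5 with r = 3 groups and N = 30 observations.
p3 = tukey_pvalue(2.5, r=3, n_total=30)

# Sanity check with r = 2: Tukey's test reduces to the two-sided t-test.
p2 = tukey_pvalue(2.5, r=2, n_total=30)
p_t = 2 * stats.t.sf(2.5, df=28)
print(round(p3, 4), round(p2, 4), round(p_t, 4))
```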
@@ -760,12 +754,6 @@ def pairwise_gameshowell(data=None, dv=None, between=None, effsize='hedges'):
     The p-values are then approximated using the Studentized range distribution
     :math:`Q(\\sqrt2|t_i|, r, v_i)`.
 
-    .. warning:: Versions of Pingouin below 0.3.10 used a wrong algorithm for
-        the studentized range approximation [2]_, which resulted in (slightly)
-        incorrect p-values. Please make sure you're using the
-        LATEST VERSION of Pingouin, and always DOUBLE CHECK your results with
-        another statistical software.
-
     References
     ----------
     .. [1] Games, Paul A., and John F. Howell. "Pairwise multiple comparison
@@ -789,7 +777,6 @@ def pairwise_gameshowell(data=None, dv=None, between=None, effsize='hedges'):
     1     Adelie     Gentoo  3700.662  5076.016 -1375.354  58.811 -23.386  249.643  0.00  -2.833
     2  Chinstrap     Gentoo  3733.088  5076.016 -1342.928  65.103 -20.628  170.404  0.00  -3.105
     """
-
     # Check the dataframe
     _check_dataframe(dv=dv, between=between, effects='between', data=data)
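Games-Howell uses the same distribution, `Q(sqrt(2)|t_i|, r, v_i)`, but with per-pair Welch-Satterthwaite degrees of freedom `v_i` in place of `N - r`. A sketch of that docstring formula for a single pair, with a hypothetical `gameshowell_pvalue` helper and made-up data (again assuming SciPy >= 1.7); with `r = 2` it should agree with SciPy's Welch t-test:

```python
import numpy as np
from scipy import stats

def gameshowell_pvalue(x, y, r):
    """Hypothetical helper: Welch's t, its df, and Q(sqrt(2)|t|, r, v)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    vx, vy = x.var(ddof=1) / x.size, y.var(ddof=1) / y.size
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom for this particular pair.
    v = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    return t, v, stats.studentized_range.sf(np.sqrt(2) * abs(t), r, v)

# Made-up samples with unequal sizes and variances.
a = [3.1, 2.8, 3.6, 3.3, 2.9, 3.4]
b = [4.2, 3.9, 5.1, 4.4, 3.7, 4.8, 4.0]

t, v, p = gameshowell_pvalue(a, b, r=2)
welch = stats.ttest_ind(a, b, equal_var=False)
print(round(t, 3), round(v, 2), p, welch.pvalue)
```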
 
@@ -956,8 +943,8 @@ def pairwise_corr(data, columns=None, covar=None, alternative='two-sided',
 
     >>> import pandas as pd
     >>> import pingouin as pg
-    >>> pd.set_option('expand_frame_repr', False)
-    >>> pd.set_option('max_columns', 20)
+    >>> pd.set_option('display.expand_frame_repr', False)
+    >>> pd.set_option('display.max_columns', 20)
     >>> data = pg.read_dataset('pairwise_corr').iloc[:, 1:]
     >>> pg.pairwise_corr(data, method='spearman', alternative='greater', padjust='bonf').round(3)
                    X                  Y    method alternative    n      r         CI95%  p-unc  p-corr p-adjust  power