Regression results part2 #28

Merged
70 changes: 70 additions & 0 deletions gneiss/_summary.py
@@ -8,6 +8,7 @@
# The full license is in the file COPYING.txt, distributed with this software.
# ----------------------------------------------------------------------------
import pandas as pd
from skbio.stats.composition import ilr_inv


class RegressionResults():
@@ -57,3 +58,72 @@ def __init__(self, stat_results,
        # calculate the overall coefficient of determination (i.e. R2)
        sst = sse + ssr
        self.r2 = 1 - sse / sst

    def _check_projection(self, project):
Contributor

Are you planning to use this check in other methods? If not, why not inline it in `coefficients`? It looks like a really simple check that only happens within `coefficients`.

Collaborator Author

The main reason is that several methods use this check, namely `residuals` and `predict`. Those are being added in #30.

Contributor

OK, thanks. Just out of curiosity, would it make sense to make `project` a member of the object? That is, set `self.project = True/False` somewhere in the code and then just call `self.coefficients()` and `self._check_projection()`. This could simplify the code, but it would mean that the parameter can't change between calls to `self.residuals`, `self.predict`, etc.

Contributor

OK, @mortonjt and I discussed this offline and it makes sense to keep it as is. Basically, this converts the results to one of two formats based on the projection.
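
The trade-off discussed in this thread can be sketched in a few lines. The `Results` class below is a hypothetical stand-in, not the gneiss `RegressionResults` class, and its return values are placeholders. It only illustrates the design that was kept: `project` stays a per-call argument, and one private guard is shared by every public method that accepts it.

```python
class Results:
    """Hypothetical stand-in for RegressionResults, for illustration only."""

    def __init__(self, coef, basis=None, feature_names=None):
        self.coef = coef
        self.basis = basis
        self.feature_names = feature_names

    def _check_projection(self, project):
        # Shared guard: each public method that accepts `project` calls this
        # first, so the validation lives in one place.
        if project and self.basis is None:
            raise ValueError("Cannot perform projection into Aitchison "
                             "simplex if `basis` is not specified.")
        if project and self.feature_names is None:
            raise ValueError("Cannot perform projection into Aitchison "
                             "simplex if `feature_names` is not specified.")

    def coefficients(self, project=False):
        self._check_projection(project)
        # A real implementation would convert balances to proportions here.
        return ("proportions" if project else "balances", self.coef)

    def residuals(self, project=False):
        self._check_projection(project)
        # Placeholder return; the real residuals are out of scope here.
        return ("proportions" if project else "balances", None)


res = Results([0.5, 1.2], basis=[[1, -1]], feature_names=["Z1", "Z2"])
kind_a, _ = res.coefficients(project=False)
kind_b, _ = res.residuals(project=True)
```

Because `project` is passed per call, the same object can return balances from one method and proportions from another, which a fixed `self.project` attribute would not allow.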

        """
        Parameters
        ----------
        project : bool
            Specifies if a projection into the Aitchison simplex can be
            performed.

        Raises
        ------
        ValueError:
            Cannot perform projection into Aitchison simplex if `basis`
            is not specified.
        ValueError:
            Cannot perform projection into Aitchison simplex
            if `feature_names` is not specified.
        """
        if self.basis is None and project:
            raise ValueError("Cannot perform projection into Aitchison simplex"
                             " if `basis` is not specified.")

        if self.feature_names is None and project:
            raise ValueError("Cannot perform projection into Aitchison simplex"
                             " if `feature_names` is not specified.")

    def coefficients(self, project=False):
        """ Returns coefficients from fit.

        Parameters
        ----------
        project : bool, optional
            Specifies if coefficients should be projected back into
            the Aitchison simplex. If false, the coefficients will be
            represented as balances (default: False).

        Returns
        -------
        pd.DataFrame
            A table of values where columns are coefficients, and the index
            is either balances or proportions, depending on the value of
            `project`.

        Raises
        ------
        ValueError:
            Cannot perform projection into Aitchison simplex if `basis`
            is not specified.
        ValueError:
            Cannot perform projection into Aitchison simplex
            if `feature_names` is not specified.
        """
        self._check_projection(project)
        coef = pd.DataFrame()

        for r in self.results:
            c = r.params
            c.name = r.model.endog_names
            coef = coef.append(c)

        if project:
            # `check=True` due to type issue resolved here
            # https://github.com/biocore/scikit-bio/pull/1396
            c = ilr_inv(coef.values.T, basis=self.basis, check=False).T
Contributor

The comment says `check=True` but the code passes `check=False`. Which one is it?

Collaborator Author

This is the reason why I added this comment.

Although this has been resolved in scikit-bio, the fix is only in the development version, which would mean that gneiss would have to depend on the development version of scikit-bio.

I was thinking about raising an issue and making the fix at the next release of scikit-bio. What do you think?

Contributor

That fix has been merged in skbio, and you could use the dev version of skbio (in travis, setup.py, etc.) until there is a release. Will this work?

Collaborator Author

OK. Then this should be fine.
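
For readers unfamiliar with the projection this hunk performs, here is a minimal NumPy-only sketch of an inverse ilr transform. It is not the scikit-bio implementation: `helmert_basis` and `ilr_inverse` are hypothetical helpers, and the basis built here is a Helmert-style contrast matrix chosen for illustration, which may differ in sign or ordering from the one `_gram_schmidt_basis` returns. The balance values are taken from the tests in this PR.

```python
import numpy as np


def helmert_basis(d):
    """Orthonormal basis of the clr plane, shape (d - 1, d). Illustrative."""
    basis = np.zeros((d - 1, d))
    for i in range(1, d):
        basis[i - 1, :i] = 1.0 / i      # first i parts share equal weight
        basis[i - 1, i] = -1.0          # contrasted against part i
        basis[i - 1] *= np.sqrt(i / (i + 1.0))  # scale the row to unit length
    return basis


def ilr_inverse(balances, basis):
    """Map balances (n, d - 1) to proportions (n, d) whose rows sum to 1."""
    clr = np.atleast_2d(balances) @ basis   # back to clr coordinates
    expd = np.exp(clr)
    return expd / expd.sum(axis=1, keepdims=True)  # closure


basis = helmert_basis(3)
balances = np.array([[1.447368, -0.052632],
                     [0.539474, 1.289474]])
props = ilr_inverse(balances, basis)
```

Each row of `props` is a composition over three parts, which is why the projected table in `coefficients` is indexed by `feature_names` rather than by balances.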

            c = pd.DataFrame(c, index=self.feature_names,
                             columns=coef.columns)
            return c
Contributor

what about:

return pd.DataFrame(c, index=self.feature_names, columns=coef.columns)

?

Collaborator Author

👍

        else:
            return coef
49 changes: 49 additions & 0 deletions gneiss/tests/test_summary.py
@@ -7,11 +7,13 @@
#
# The full license is in the file COPYING.txt, distributed with this software.
# ----------------------------------------------------------------------------
import numpy as np
import pandas as pd
import pandas.util.testing as pdt
import statsmodels.formula.api as smf
import unittest
from gneiss._summary import RegressionResults
from skbio.stats.composition import _gram_schmidt_basis, ilr_inv


class TestRegressionResults(unittest.TestCase):
@@ -55,5 +57,52 @@ def test_regression_results_pvalues(self):
                               check_exact=False,
                               check_less_precise=True)

    def test_check_projection(self):
        feature_names = ['Z1', 'Z2', 'Z3']
        basis = _gram_schmidt_basis(3)
        res = RegressionResults(self.results, basis=basis,
                                feature_names=feature_names)

        feature_names = ['Z1', 'Z2', 'Z3']
        basis = _gram_schmidt_basis(3)

        # Test if feature_names is checked for
        res = RegressionResults(self.results, basis=basis)
        with self.assertRaises(ValueError):
            res._check_projection(True)

        # Test if basis is checked for
        res = RegressionResults(self.results, feature_names=feature_names)
        with self.assertRaises(ValueError):
            res._check_projection(True)

    def test_regression_results_coefficient(self):
        exp_coef = pd.DataFrame({'Intercept': [1.447368, -0.052632],
                                 'X': [0.539474, 1.289474]},
                                index=['Y1', 'Y2'])
        res = RegressionResults(self.results)
        pdt.assert_frame_equal(res.coefficients(), exp_coef,
                               check_exact=False,
                               check_less_precise=True)

    def test_regression_results_coefficient_projection(self):
        exp_coef = pd.DataFrame(
            {'Intercept': ilr_inv(np.array([[1.447368, -0.052632]])),
             'X': ilr_inv(np.array([[0.539474, 1.289474]]))},
            index=['Z1', 'Z2', 'Z3'])
        feature_names = ['Z1', 'Z2', 'Z3']
        basis = _gram_schmidt_basis(3)
        res = RegressionResults(self.results, basis=basis,
                                feature_names=feature_names)

        pdt.assert_frame_equal(res.coefficients(project=True), exp_coef,
                               check_exact=False,
                               check_less_precise=True)

    def test_regression_results_coefficient_project_error(self):
        res = RegressionResults(self.results)
        with self.assertRaises(ValueError):
            res.coefficients(project=True)

if __name__ == "__main__":
    unittest.main()
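
The error tests above assert only the exception type. Since Python joins adjacent string literals without inserting a space, a message split across two lines is easy to get subtly wrong, and a test on the message text can catch that. The sketch below is an assumption-laden illustration: `check_projection` is a hypothetical free-function version of the guard, not the method in this PR, and the test class name is made up.

```python
import unittest


def check_projection(basis, feature_names, project):
    """Hypothetical free-function version of the guard under review."""
    if basis is None and project:
        raise ValueError("Cannot perform projection into Aitchison simplex"
                         " if `basis` is not specified.")
    if feature_names is None and project:
        raise ValueError("Cannot perform projection into Aitchison simplex"
                         " if `feature_names` is not specified.")


class TestCheckProjectionMessage(unittest.TestCase):
    def test_missing_basis(self):
        # The pattern spans the join between the two string literals.
        with self.assertRaisesRegex(ValueError, "simplex if `basis`"):
            check_projection(None, ['Z1', 'Z2', 'Z3'], True)

    def test_missing_feature_names(self):
        with self.assertRaisesRegex(ValueError, "simplex if `feature_names`"):
            check_projection([[1.0, -1.0]], None, True)

    def test_no_projection_requested(self):
        # With project=False nothing is validated, so no error is raised.
        self.assertIsNone(check_projection(None, None, False))
```

`assertRaisesRegex` searches the message with a regular expression, so a short fragment that straddles the line break is enough to pin down the spacing.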