Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dataframe_regression fixture #35

Merged
merged 3 commits into from
Oct 19, 2020
Merged

Conversation

648trindade
Copy link
Contributor

@648trindade 648trindade commented Oct 12, 2020

Fix #21

Hi guys!

Most of the changes here start from the fact that numerical regression was actually using pandas as backend.

Main changes:

  • Created DataFrameRegressionFixture for dataframe_regression
  • NumericRegressionFixture now inherits most of its behavior from DataFrameRegressionFixture: check method converts python dicts to pandas DataFrames before calling DataFrameRegressionFixture.check
  • I've added some type checking to the inner data types, the following numpy data types are not allowed:
    • timedelta (m)
    • datetime (M)
    • objects (O)
    • zero-terminated bytes (S, a)
    • unicode strings (U)
    • raw data (V)
  • For most tests, I just copied num_regression tests and made sure that a pd.DataFrame was being created from the input dict
  • For dataframe_regression there are no tests for filling asymmetric arrays with values, since pandas natively do not allow them
  • I had to change 1 num_regression test since it was checking types with a string array, and string arrays are no longer allowed (makes sense IMHO)

@nicoddemus
Copy link
Member

nicoddemus commented Oct 13, 2020

Thanks @648trindade!

Can you please also update the CHANGELOG? Something like:

2.1.0 (UNRELEASED)
------------------

* `#35 <https://github.com/ESSS/pytest-regressions/pull/35>`__: New ``dataframe_regression`` fixture to check pandas DataFrames directly.

dataframe_regression.check(pd.DataFrame.from_dict({"data1": data2}))


def test_n_dimensions(dataframe_regression, no_regen):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I understand this is a consistent message, I find it a bit misleading, because in practice, we want to tell that n-dimensional arrays are not supported. I ask because on num_regression, the message is Only 1D arrays are supported on num_data_regression fixture.

Perhaps, at least, tell the user what he did wrong. Something like this:
Only numeric data is supported on dataframe_regression fixture: Column "C" has unsupported data type "Object".

What do you think?

Copy link
Contributor Author

@648trindade 648trindade Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up mixing things up in this test, sorry. Taking a second look at this I came to the fact that pandas do not accepts numpy n-dimensional arrays within dataframe columns.
So the assertion for 1D arrays inside DataFrameRegressionFixture will never fail. I think I can remove that assertion and rename this test as @tadeu suggested

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Copy link
Member

@tadeu tadeu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in overall, I've just commented some minor details, thanks for the awesome contribution :)

@@ -232,18 +72,6 @@ def check(
except ModuleNotFoundError:
raise ModuleNotFoundError(import_error_message("Pandas"))

import functools

__tracebackhide__ = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that __tracebackhide__ = True should be kept

'P': Pa_to_bar(P),
}
),
data_index=positions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation seems wrong here, there's no data_index in dataframe_regression.check


def test_usage_workflow(testdir, monkeypatch):
"""
:type testdir: _pytest.pytester.TmpTestdir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea for an issue later, we could use proper type annotations and add mypy support in precommit/CI


import sys

monkeypatch.setattr(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you just made this test based on the other tests, but I'm curious about why there's this injection of test stuff into sys (seems weird), perhaps @tarcisiofischer or @nicoddemus know about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea

"500 1.20000000000000018 1.10000000000000009 0.10000000000000009",
]
)
# prints used to debug #3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they still necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing it

dataframe_regression.check(pd.DataFrame.from_dict({"data1": data2}))


def test_n_dimensions(dataframe_regression, no_regen):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_n_dimensionstest_non_numeric_data?

@@ -160,7 +160,7 @@ def test_different_data_types(num_regression, no_regen):
# Smoke test: Should not raise any exception
num_regression.check({"data1": data1})

data2 = np.array(["a"] * 10)
data2 = np.array([True] * 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change? Perhaps instead of changing, it could be added as a parameter via parametrize (test both "a" and True).

Copy link
Contributor Author

@648trindade 648trindade Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python strings have 'object' dtype inside numpy/pandas. Objects are no longer accepted since we can not test it numerically (or can we? Maybe I'm missing something).
But boolean type is numerical-like, then the dataframe is not rejected before the actual check, and it returns the expected error message

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, but perhaps we could extend it to support simple "value-type" objects such as strings, for an improved usage and for making user lives easier? They would not compare numerically, just for exact matches in these cases. (Could be another PR/issue though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it! Rolled back the num_regression test that I changed to use string arrays again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

def test_different_data_types(dataframe_regression, no_regen):
data1 = np.ones(10)
# Smoke test: Should not raise any exception
dataframe_regression.check(pd.DataFrame.from_dict({"data1": data1}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this line be a separate test (if it doesn't already exist?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to the same CSV file than the below check checks, right? Looks like a kind of 'sanity check'.
But as common cases are well treated by the test_common_cases test function, I will remove it.

domdfcoding added a commit to domdfcoding/pytest-regressions-stubs that referenced this pull request Oct 14, 2020
@tadeu
Copy link
Member

tadeu commented Oct 19, 2020

@648trindade, this looks good to be merged. Do you have plans to add more things to it?

@648trindade
Copy link
Contributor Author

648trindade commented Oct 19, 2020

@648trindade, this looks good to be merged. Do you have plans to add more things to it?

@tadeu Nope, I think it is enough.

@nicoddemus nicoddemus merged commit 899c32d into ESSS:master Oct 19, 2020
domdfcoding added a commit to domdfcoding/pytest-regressions-stubs that referenced this pull request Dec 22, 2020
domdfcoding added a commit to domdfcoding/pytest-regressions-stubs that referenced this pull request Dec 22, 2020
* Reflect changes from ESSS/pytest-regressions#35

* Run stubtest as part of mypy testenv.

* Updated config files.

* Linting.

* Linting

* Updated files with 'repo_helper'. (#2)

Co-authored-by: repo-helper[bot] <74742576+repo-helper[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Reflect changes from ESSS/pytest-regressions#35

* Linting.

* Don't upload coverage.

* Updated config files.

* Linting.

* Updated config files.

Co-authored-by: repo-helper[bot] <74742576+repo-helper[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for pandas DataFrame
4 participants