Add some hypothesis test functions #315

mdahlin · 2025-01-02T16:24:22Z

Adds functions for

Mann-Whitney U test (per https://users.rust-lang.org/t/mann-whitney-u-test/95005)
One Way ANOVA F-test
One Sample t-test

The functions are generally based on their scipy counterparts. I tried to align with the existing formatting and setup from the fisher test, but wanted to make sure I was on the right track (in terms of structure, level of documentation, desire for this capability, etc.) before thinking about adding more tests.

YeungOnion · 2025-01-12T00:24:49Z

Sorry that I've taken a bit to reply here.

So far, these are great. We have mentioned the idea of a nan policy with regard to analytical functions (as opposed to empirical functions), and I think following the scipy approach is good because of developer expectations.

We don't yet have enough hypothesis tests to get a sense of a uniform API for them, and having the policy as an argument is useful for establishing one.
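To make the "policy as an argument" idea concrete, here is a minimal sketch. The names `NaNPolicy` and `apply_nan_policy` are hypothetical and are not taken from this PR; the variants loosely mirror scipy's `nan_policy` options (`propagate`, `omit`, `raise`).

```rust
/// Hypothetical policy enum, loosely mirroring scipy's `nan_policy` options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NaNPolicy {
    /// Let NaN flow through the computation (the result will typically be NaN).
    Propagate,
    /// Drop NaN values before computing.
    Omit,
    /// Refuse to compute if any NaN is present.
    Error,
}

/// Hypothetical helper: prepares a sample according to the chosen policy.
pub fn apply_nan_policy(data: &[f64], policy: NaNPolicy) -> Result<Vec<f64>, &'static str> {
    let has_nan = data.iter().any(|x| x.is_nan());
    match policy {
        NaNPolicy::Propagate => Ok(data.to_vec()),
        NaNPolicy::Omit => Ok(data.iter().copied().filter(|x| !x.is_nan()).collect()),
        NaNPolicy::Error if has_nan => Err("sample contains NaN"),
        NaNPolicy::Error => Ok(data.to_vec()),
    }
}
```

A test function could then take the policy as its last argument and call a helper like this on each sample before computing the statistic.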

I think the direction you're going in is good and I would approve this once I look into why the nightly-dependent workflow in the CI won't compile. I'm open to you continuing on this PR or opening a dependent PR.

However, regarding licensing, to what degree would you say you referred to the scipy source? I don't wish to complicate the license we distribute with, nor do I want to use a license that's not typical for crates on crates.io.

mdahlin · 2025-01-12T15:01:06Z

Hey, thanks for the response and feedback.

Regarding the nightly piece: I hit the same error locally. A day or so later I updated nightly and everything worked just fine, so it seems the issue was specific to that nightly build.

In terms of how much I "referred to the scipy source", it's been a pretty loose reference for the most part, but I'll provide some relevant links in case you want to form your own opinion.

one-way ANOVA F-test

relevant part of the implementation in this PR:

statrs/src/stats_tests/f_oneway.rs

Line 114 in 100e726

let n = n_i.iter().sum::<usize>();
scipy: https://github.com/scipy/scipy/blob/92d2a8592782ee19a1161d0bf3fc2241ba78bb63/scipy/stats/_stats_py.py#L4173

My conclusion: the commonality with scipy is mainly just the function signature, as I took the logic from a statsdirect page.
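For reference, the textbook one-way ANOVA F statistic can be sketched as below. This is an illustration written from the standard between-group / within-group sum-of-squares formula, not the code in `f_oneway.rs`; the function name `f_statistic` and its `Option` return are assumptions made for the example.

```rust
/// Hypothetical helper: F statistic for a one-way ANOVA.
/// Returns None for fewer than two groups, an empty group,
/// or zero within-group degrees of freedom.
pub fn f_statistic(groups: &[&[f64]]) -> Option<f64> {
    let k = groups.len();
    if k < 2 || groups.iter().any(|g| g.is_empty()) {
        return None;
    }
    let n_i: Vec<usize> = groups.iter().map(|g| g.len()).collect();
    let n = n_i.iter().sum::<usize>();
    if n <= k {
        return None;
    }
    let grand_mean = groups.iter().flat_map(|g| g.iter()).sum::<f64>() / n as f64;

    let mut ss_between = 0.0;
    let mut ss_within = 0.0;
    for g in groups {
        let mean = g.iter().sum::<f64>() / g.len() as f64;
        // Between-group: squared distance of the group mean from the grand mean.
        ss_between += g.len() as f64 * (mean - grand_mean).powi(2);
        // Within-group: squared distance of each value from its group mean.
        ss_within += g.iter().map(|x| (x - mean).powi(2)).sum::<f64>();
    }
    let ms_between = ss_between / (k - 1) as f64;
    let ms_within = ss_within / (n - k) as f64;
    // Note: if every group is constant, ms_within is 0 and the ratio is inf or NaN.
    Some(ms_between / ms_within)
}
```

The p-value would then come from the upper tail of an F distribution with `k - 1` and `n - k` degrees of freedom, which is omitted here.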

One Sample t-test

this PR:

statrs/src/stats_tests/ttest_onesample.rs

Line 78 in 100e726

let samplemean = a.iter().sum::<f64>() / (n as f64);
scipy: https://github.com/scipy/scipy/blob/6e246d0b54dd55dc69232a0caae6772228a7ac25/scipy/stats/_stats_py.py#L6092

My conclusion: again, mainly just the function signature, as I used the logic from this jpm page.
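For reference, the standard one-sample t statistic is the sample mean minus the hypothesized mean, divided by the standard error. The sketch below is written from that formula and is not the code in `ttest_onesample.rs`; the name `t_statistic`, the `popmean` parameter, and the `Option` return are assumptions for illustration.

```rust
/// Hypothetical helper: t statistic for the null hypothesis
/// "the population mean equals `popmean`".
/// Returns None when there are fewer than two observations.
pub fn t_statistic(a: &[f64], popmean: f64) -> Option<f64> {
    let n = a.len();
    if n < 2 {
        return None;
    }
    let samplemean = a.iter().sum::<f64>() / (n as f64);
    // Unbiased sample variance (divides by n - 1).
    let var = a.iter().map(|x| (x - samplemean).powi(2)).sum::<f64>() / ((n - 1) as f64);
    // Standard error of the mean.
    let std_err = (var / n as f64).sqrt();
    // Note: a constant sample gives std_err == 0, so the result is inf or NaN.
    Some((samplemean - popmean) / std_err)
}
```

The p-value would come from a Student's t distribution with `n - 1` degrees of freedom, which is left out of this sketch.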

Mann Whitney U

Here we'd probably want to look separately at the two main pieces of logic, namely the two different methods for calculating the test's statistic.

Exact

this PR:

statrs/src/stats_tests/mannwhitneyu.rs

Line 156 in 100e726

fn calc_mwu_exact_pvalue(u: f64, n1: usize, n2: usize) -> f64 {
scipy: https://github.com/scipy/scipy/blob/v1.15.0/scipy/stats/_mannwhitneyu.py#L83

My conclusion: These are very different from each other. The scipy version is doing a lot of 2-d array stuff and matrix operations that I didn't get into in my implementation.
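To illustrate what an exact calculation involves, here is one generic way to get the null distribution of U without ties: count how many ways `n1` of the pooled ranks can add up to each rank sum. This is neither the approach in this PR nor scipy's; the names `exact_u_pmf` and `exact_upper_tail` are hypothetical.

```rust
/// Hypothetical helper: null distribution of U (no ties) for sample sizes n1 and n2.
/// Entry `u` of the result is P(U = u) for u in 0..=n1*n2.
pub fn exact_u_pmf(n1: usize, n2: usize) -> Vec<f64> {
    let n = n1 + n2;
    let max_sum = n * (n + 1) / 2;
    // ways[k][s]: number of k-element subsets of the ranks seen so far with sum s.
    // Counts are stored as f64, which is exact only up to 2^53.
    let mut ways = vec![vec![0.0_f64; max_sum + 1]; n1 + 1];
    ways[0][0] = 1.0;
    for r in 1..=n {
        // Iterate k downward so each rank is used at most once per subset.
        for k in (1..=n1.min(r)).rev() {
            for s in (r..=max_sum).rev() {
                let add = ways[k - 1][s - r];
                ways[k][s] += add;
            }
        }
    }
    // U equals the rank sum of the first sample minus its minimum possible value.
    let min_sum = n1 * (n1 + 1) / 2;
    let counts = &ways[n1][min_sum..=min_sum + n1 * n2];
    let total: f64 = counts.iter().sum();
    counts.iter().map(|c| c / total).collect()
}

/// Hypothetical helper: P(U >= u) under the null hypothesis.
pub fn exact_upper_tail(u: usize, n1: usize, n2: usize) -> f64 {
    exact_u_pmf(n1, n2).iter().skip(u).sum()
}
```

The table grows with `n1 * (n1 + n2)^2`, so a sketch like this is only practical for small samples.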

Asymptotic

this PR:

statrs/src/stats_tests/mannwhitneyu.rs

Line 126 in 100e726

fn calc_mwu_asymptotic_pvalue(
scipy: https://github.com/scipy/scipy/blob/92d2a8592782ee19a1161d0bf3fc2241ba78bb63/scipy/stats/_mannwhitneyu.py#L149

My conclusion: This is probably the only case worth your review/thoughts. The scipy version is ~10 LOC and the version in this PR is basically a 1:1 copy of those lines. There isn't much room for an alternative implementation here, but I'm happy to rewrite it to avoid any potential issues. Also, for reference, the scipy function references this section in the Mann Whitney U wiki article.
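For context on why there is little room for variation, the normal approximation reduces to a mean, a tie-corrected variance, and a z-score. The sketch below is written from the textbook formula, assuming `u` is the statistic used for an upper-tail probability; the name `mwu_z_score` and the `tie_counts` parameter are hypothetical and are not the signature of `calc_mwu_asymptotic_pvalue`.

```rust
/// Hypothetical helper: z-score for the normal approximation to U.
/// `tie_counts` holds the size of each group of tied values in the pooled sample
/// (groups of size 1 contribute nothing, so they may be omitted).
pub fn mwu_z_score(
    u: f64,
    n1: usize,
    n2: usize,
    tie_counts: &[usize],
    continuity: bool,
) -> Option<f64> {
    let (n1, n2) = (n1 as f64, n2 as f64);
    let n = n1 + n2;
    if n < 2.0 {
        return None;
    }
    // Mean of U under the null hypothesis.
    let mu = n1 * n2 / 2.0;
    // Tie correction: sum of t^3 - t over each group of t tied values.
    let tie_term: f64 = tie_counts
        .iter()
        .map(|&t| {
            let t = t as f64;
            t.powi(3) - t
        })
        .sum();
    let variance = n1 * n2 / 12.0 * ((n + 1.0) - tie_term / (n * (n - 1.0)));
    if variance <= 0.0 {
        // An empty sample or all values tied: the approximation is undefined.
        return None;
    }
    let mut numerator = u - mu;
    if continuity {
        // Continuity correction for an upper-tail probability.
        numerator -= 0.5;
    }
    Some(numerator / variance.sqrt())
}
```

The p-value would then be read from the standard normal survival function, which this sketch leaves out.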

mdahlin added 3 commits January 1, 2025 21:35

feat(stats_tests): implement f_oneway

057b610

feat(stats_tests): implement ttest_onesample

9a7185a

feat(stats_tests): implement mannwhitneyu

100e726
