Description
The Polars correlation function pl.corr accepts a ddof argument that sets the degrees of freedom used when calculating the correlation coefficient. However, the Pearson correlation coefficient does not depend on ddof. It is the covariance divided by the product of the two standard deviations, and the numerator and the denominator carry the same 1/(n - ddof) scaling factor, which cancels. The ddof argument is therefore unnecessary (compare numpy.corrcoef, where bias and ddof are deprecated because they have no effect).
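To make the cancellation explicit, here is the derivation for n paired observations with sample means \bar{x} and \bar{y}, writing ddof as \delta (notation chosen for this sketch, assuming \delta < n):

```latex
\operatorname{cov}_\delta(x, y) = \frac{1}{n - \delta} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
s_{x,\delta} = \sqrt{\frac{1}{n - \delta} \sum_{i=1}^{n} (x_i - \bar{x})^2}

r = \frac{\operatorname{cov}_\delta(x, y)}{s_{x,\delta}\, s_{y,\delta}}
  = \frac{\frac{1}{n - \delta} \sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
         {\frac{1}{n - \delta} \sqrt{\sum_{i} (x_i - \bar{x})^2} \sqrt{\sum_{i} (y_i - \bar{y})^2}}
  = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i} (x_i - \bar{x})^2} \sqrt{\sum_{i} (y_i - \bar{y})^2}}
```

The final expression contains no \delta, so every valid ddof yields the same r.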
Example to Reproduce
import polars as pl
# Create a sample DataFrame
df = pl.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [2.1, 2.5, 2.9, 3.6, 3.8, 4.5, 5.1, 5.3, 5.8, 6.3]
})
# Compute correlation with different ddof values
corr_ddof_0 = df.select(pl.corr("x", "y", ddof=0)).item()
corr_ddof_1 = df.select(pl.corr("x", "y", ddof=1)).item()
# Output the correlation values
print(f"Correlation between 'x' and 'y' with ddof=0: {corr_ddof_0}")
print(f"Correlation between 'x' and 'y' with ddof=1: {corr_ddof_1}")
Expected Behavior
Both calls should return the same value (up to floating-point rounding), because ddof does not affect the correlation coefficient.
Expected Output:
Correlation between 'x' and 'y' with ddof=0: 0.9971627582526871
Correlation between 'x' and 'y' with ddof=1: 0.997162758252687
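The two printed values differ only in the last digit, which is floating-point rounding. A minimal pure-Python check, independent of Polars, computes the Pearson coefficient as covariance over the product of standard deviations with the same ddof throughout. The helper name pearson is hypothetical, used only for this illustration:

```python
import math

# Data from the reproduction example above
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.5, 2.9, 3.6, 3.8, 4.5, 5.1, 5.3, 5.8, 6.3]


def pearson(xs, ys, ddof):
    """Hypothetical helper: cov / (std_x * std_y), all scaled by 1/(n - ddof)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - ddof)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / (n - ddof))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / (n - ddof))
    return cov / (sx * sy)


r0 = pearson(x, y, ddof=0)
r1 = pearson(x, y, ddof=1)
print(r0, r1)
# The 1/(n - ddof) factors cancel, so the results agree up to rounding error
print(math.isclose(r0, r1, rel_tol=1e-12))
```

Any ddof smaller than n gives the same result, which mirrors the Polars output shown above.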
Suggested Changes
Remove the ddof argument from the pl.corr function.
If the ddof argument is kept for backward compatibility, document that it has no effect on the correlation result.
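One way to keep the argument for compatibility is the approach numpy.corrcoef takes with its bias and ddof parameters: accept the argument, ignore it, and emit a DeprecationWarning. The sketch below illustrates that idea in plain Python. The function pearson_corr is hypothetical and is not the Polars API:

```python
import math
import warnings


def pearson_corr(xs, ys, ddof=None):
    """Hypothetical correlation helper that keeps ddof only for backward
    compatibility. The value is ignored because it cancels out of r."""
    if ddof is not None:
        warnings.warn(
            "ddof has no effect on the correlation coefficient and is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    # No 1/(n - ddof) factor: it would appear in numerator and denominator alike
    return sxy / math.sqrt(sxx * syy)
```

Existing calls that pass ddof keep working and return the same value, but callers are nudged to drop the argument.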
Environment
OS: Windows 10
Python Version: 3.10.13
Polars Version: 0.20.22