Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added diff() function to DataFrame #414

Merged
merged 8 commits into from
Mar 9, 2022

Conversation

NeonSpork
Copy link
Contributor

Added a diff() function to mimic the functionality of the equivalent function in pandas as best as possible.

Can calculate difference between rows and columns both forwards and backwards, in addition to difference in relation to another dataframe or series.

Added the function to the DataFrame interface.

Added unit testing with as much coverage as I could. The error that is thrown if the axis is set as something other than 0 or 1 follows the convension from a similar function, but as far as I can tell TypeScript won't let you set axis to anything besides 0 or 1, so it is untestable.

The implementation is made using the built in tensorflow functionality for pretty much everything except nudging the data to compare, and I believe this is one of the faster ways to achieve this!

Should be about as near full coverage as possible
Calculate difference between rows, columns, series or another dataframe
@NeonSpork
Copy link
Contributor Author

Resolves #383 partially, there's no pct_change as of yet, but figuring out how to perform the calculations effectively and manipulate the dataframes took the most amount of time. I will try to implement a similar function for pct_change() as well 😃

@risenW risenW self-requested a review March 8, 2022 07:25
Copy link
Member

@risenW risenW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far @NeonSpork ! and thanks for taking out time to contribute. I dropped some general suggestions, if you need any input from me, don't hesitate to ping me.

src/danfojs-base/core/frame.ts Outdated Show resolved Hide resolved
src/danfojs-base/core/frame.ts Show resolved Hide resolved
src/danfojs-base/core/frame.ts Outdated Show resolved Hide resolved
src/danfojs-base/core/frame.ts Outdated Show resolved Hide resolved
src/danfojs-base/core/frame.ts Outdated Show resolved Hide resolved
@NeonSpork
Copy link
Contributor Author

NeonSpork commented Mar 8, 2022

Another comment. I have implemented the pct_change() as well, but am struggling with testing.

My expected return is for example 0.8 but JavaScript thinks it should be 0.80000000000012345 to provide a random example. Is there an easy way around this? @risenW

Edit: Since it breaks testing I'm doing pct_change() in a separate branch. I can make a separate pull request for that in a bit, so you can see more hands on!

refactor: Make series array generation more succinct
@risenW
Copy link
Member

risenW commented Mar 8, 2022

Another comment. I have implemented the pct_change() as well, but am struggling with testing.

My expected return is for example 0.8 but JavaScript thinks it should be 0.80000000000012345 to provide a random example. Is there an easy way around this? @risenW

Edit: Since it breaks testing I'm doing pct_change() in a separate branch. I can make a separate pull request for that in a bit, so you can see more hands on!

Yeah, I have had that issue with float values as well. You can try to truncate/floor the value.

And yes, doing pct_change in a separate PR is better.

@NeonSpork
Copy link
Contributor Author

We can cross that bridge in another PR! 👍 I'm trying to refactor and remove some duplication, should hopefully not take too long!

@NeonSpork
Copy link
Contributor Author

Alright! I think I've resolved the issues you pointed out @risenW ?

The code has become more succinct with less duplication, and tests are passing!

@risenW
Copy link
Member

risenW commented Mar 8, 2022

@NeonSpork Everything looks good here. One last thing before I merge, can you also add test for the browser version. Just copy over what you currently have to the browser test folder, and then use the namespace dfd.DataFrame or dfd.Series to create Dataframe and Series. You can see examples in the danfojs-browser/tests/ folder

@NeonSpork
Copy link
Contributor Author

Sounds great! I forgot about the browser tests, oops! Was just happy to see green checks 👍

I'll sort that out in both my PRs in a short bit!

@risenW
Copy link
Member

risenW commented Mar 9, 2022

LGTM ✅✅ Thanks for the awesome work @NeonSpork

@risenW risenW merged commit 746c4cc into javascriptdata:dev Mar 9, 2022
@NeonSpork
Copy link
Contributor Author

LGTM ✅✅ Thanks for the awesome work @NeonSpork

My pleasure man! I'm looking forward to contributing more in the future 🚀

@NeonSpork NeonSpork deleted the diff-port branch March 9, 2022 07:52
@Coelmount
Copy link

@NeonSpork @risenW - this feature is exactly what I am searching for, however, I cannot find it in the docs. Could you maybe point me to the documentation?

@NeonSpork
Copy link
Contributor Author

@NeonSpork @risenW - this feature is exactly what I am searching for, however, I cannot find it in the docs. Could you maybe point me to the documentation?

Hi! Glad you've found my contribution useful! When I wrote that contribution I also added pages to the docs, but for some reason they're not on the official documentation web page. Not sure why that is the case.

Either way the docs pages I added are still in the docs repo, and the examples and explanations for the diff() function can be found here:
https://github.com/javascriptdata/danfojs-doc/blob/master/api-reference%2Fdataframe%2Fdanfo.dataframe.diff.md

Maybe @risenW has some insight into why it's not on the main docs? Maybe the main index page doesn't include links to them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants