Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FileFinancialStorage #1250

Closed
wants to merge 5 commits into from
Closed

Conversation

Chaoyingz
Copy link
Contributor

@Chaoyingz Chaoyingz commented Aug 10, 2022

Description

See #1241.

Changes:

  • Pfeature can be used independently.
  • When writing PIT data, the data is sorted as described by the PIT documentation.
  • Support to update PIT data.
  • Read data by index slicing of the period list.
  • Add the FinancialInterval class makes extending Interval easier.
  • Modify the dump_pit script to make the behavior of reading and writing pit data more uniform.

Motivation and Context

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
    image

  2. Your own tests:
    image

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

Notes

related PR #1000

@you-n-g
Copy link
Collaborator

you-n-g commented Aug 12, 2022

Thanks for your contribution!
Please check the errors in the CI

@Chaoyingz
Copy link
Contributor Author

Chaoyingz commented Aug 12, 2022

@you-n-g thank you for your reply.
I tried to use an empty docker image locally to simulate the CI process (directly clone the code of this branch and then create the environment according to the CI process), and the test can be passed. Do you know any way to debug this CI errors?

@Chaoyingz
Copy link
Contributor Author

@you-n-g CI bug fixed.

from .data import PITD # pylint: disable=C0415

return PITD.period_feature(instrument, str(self), start_index, end_index, cur_time, period)
return PITD.financial(instrument, str(self), start_index, end_index, freq)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think financial is a better name?
Will period_feature be a more general name (financial data is an instance of period_feature)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of storage(FinancialStorage) and the name here are to be consistent with the name of the directory where PIT data is currently stored(financial)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think renaming both of them to pit or period_feature will make this feature more general?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PIT is not limited to storing financial data, period_feature looks easier to understand.

@@ -153,8 +166,8 @@ def test_expr(self):
2019-07-15 0.000000 0.000000 0.047369 0.094737 0.047369
2019-07-16 0.000000 0.000000 0.047369 0.094737 0.047369
2019-07-17 0.000000 0.000000 0.047369 0.094737 0.047369
2019-07-18 0.175322 0.175322 0.135029 0.094737 0.135029
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why this should be 0.087661 instead of 0.135029?
According to the content of data here.

The mean of last two quarters (201901, 201902) on 2019-07-18 should be the average of (0.094737 + 0.175322) /2 = 0.1350295.
Which is not the same as the changed version you committed.
Thanks

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the last two data to calculate the average:
(1.75322 + 0) / 2 = 0.087661

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mean($$roewa_q, 2) operator does not declare an observation point, why use 1.75322 instead of 0?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mean($$roewa_q, 2) means calculate the average of the latest value of the last two quarters at the current observation time point.
The last two quarters are 201901 and 201902. Its latest value is 0.094737 and 0.175322 respectively.

@Chaoyingz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mean($$roewa_q, 2) means calculate the average of the latest value of the last two quarters at the current observation time point. The last two quarters are 201901 and 201902. Its latest value is 0.094737 and 0.175322 respectively.

@Chaoyingz

I don't think the Mean operator has so much meaning, it just calculates the last two values of the given df (it doesn't matter the quarter).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Chaoyingz
Yes, it just calculates the last two values of a given df.
But each row in the df indicates the value in a specific quarter in the PIT quarter data.

There are a lot of typical use cases. For example, the average XXX of the last 4 quarters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a lot of typical use cases does not mean that we need to change the purpose of the Mean operator. I think it is possible to write an operator specifically to handle this use case. And it should be difficult to implement this function in the current PitStorage.

@you-n-g
Copy link
Collaborator

you-n-g commented Aug 29, 2022

Sorry for the late response.
I'm a little confused by the latest test. Could you please give some explanation about it?
Thanks.

@Chaoyingz

@you-n-g you-n-g self-assigned this Aug 29, 2022
@Chaoyingz Chaoyingz requested a review from you-n-g August 29, 2022 10:02
Copy link
Collaborator

@you-n-g you-n-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Chaoyingz I have replied to your comments :)

@Chaoyingz Chaoyingz requested a review from you-n-g August 30, 2022 10:43
Copy link
Collaborator

@you-n-g you-n-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Chaoyingz I have replied to your comments just now.

@Chaoyingz Chaoyingz requested a review from you-n-g September 2, 2022 09:43
@github-actions
Copy link

github-actions bot commented Feb 8, 2023

This PR is stale because it has been open for a year with no activity. Remove the stale label or comment on the PR otherwise this will be closed in 5 days

@github-actions github-actions bot added the stale label Feb 8, 2023
@github-actions github-actions bot closed this Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants