Add FileFinancialStorage #1250
Thanks for your contribution!
@you-n-g thank you for your reply.
@you-n-g CI bug fixed.
from .data import PITD  # pylint: disable=C0415

- return PITD.period_feature(instrument, str(self), start_index, end_index, cur_time, period)
+ return PITD.financial(instrument, str(self), start_index, end_index, freq)
Why do you think `financial` is a better name?
Would `period_feature` be a more general name (financial data is an instance of `period_feature`)?
The name of the storage (`FinancialStorage`) and the name here are meant to be consistent with the name of the directory where PIT data is currently stored (`financial`).
Do you think renaming both of them to `pit` or `period_feature` will make this feature more general?
PIT is not limited to storing financial data; `period_feature` looks easier to understand.
@@ -153,8 +166,8 @@ def test_expr(self):
2019-07-15 0.000000 0.000000 0.047369 0.094737 0.047369
2019-07-16 0.000000 0.000000 0.047369 0.094737 0.047369
2019-07-17 0.000000 0.000000 0.047369 0.094737 0.047369
2019-07-18 0.175322 0.175322 0.135029 0.094737 0.135029
Could you please explain why this should be `0.087661` instead of `0.135029`?
According to the data here, the mean of the last two quarters (201901, 201902) on 2019-07-18 should be (0.094737 + 0.175322) / 2 = 0.1350295,
which is not the same as the changed version you committed.
Thanks
I used the last two data points to calculate the average:
(0.175322 + 0) / 2 = 0.087661
The `Mean($$roewa_q, 2)` operator does not declare an observation point, so why use 0.175322 instead of 0?
`Mean($$roewa_q, 2)` means: calculate the average of the latest values of the last two quarters at the current observation time point.
The last two quarters are `201901` and `201902`. Their latest values are `0.094737` and `0.175322`, respectively.
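The two readings of `Mean($$roewa_q, 2)` discussed above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in this thread; the variable names and the ordering of the "last two rows" are assumptions, and nothing here calls Qlib itself.

```python
# Values quoted in the review thread for the observation date 2019-07-18.
# Reading 1: latest value of each of the last two quarters.
latest_by_quarter = {201901: 0.094737, 201902: 0.175322}
# Reading 2: the last two raw rows used in the changed test (order assumed).
last_two_rows = [0.175322, 0.0]


def mean(values):
    """Plain arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


quarter_mean = mean(latest_by_quarter.values())  # (0.094737 + 0.175322) / 2
row_mean = mean(last_two_rows)                   # (0.175322 + 0) / 2

print(round(quarter_mean, 7))  # 0.1350295
print(round(row_mean, 6))      # 0.087661
```

The first result matches the original expected value in the test (`0.135029` after rounding to six decimals), the second matches the value in the changed test.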
> `Mean($$roewa_q, 2)` means: calculate the average of the latest values of the last two quarters at the current observation time point. The last two quarters are `201901` and `201902`. Their latest values are `0.094737` and `0.175322`, respectively.

I don't think the `Mean` operator has so much meaning; it just calculates the mean of the last two values of the given df (regardless of the quarter).
@Chaoyingz
Yes, it just calculates the last two values of a given df.
But each row in the df indicates the value in a specific quarter in the PIT quarter data.
There are a lot of typical use cases. For example, the average XXX of the last 4 quarters.
Having a lot of typical use cases does not mean that we need to change the purpose of the `Mean` operator. I think it is possible to write an operator specifically to handle this use case. And it would be difficult to implement this function in the current `PitStorage`.
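For illustration, a dedicated quarter-aware operator of the kind suggested here might look roughly like the following. This is a hypothetical sketch, not Qlib code: the name `period_mean`, the `(publication_date, period, value)` record layout, and the publication date of the 201901 row are all assumptions made for the example.

```python
from datetime import date

# Each record is (publication_date, period, value). This layout is an
# assumption for the sketch, not Qlib's on-disk PIT format.
records = [
    (date(2019, 4, 30), 201901, 0.094737),  # publication date assumed
    (date(2019, 7, 18), 201902, 0.175322),
]


def period_mean(records, cur_date, n):
    """Average the latest visible value of the last n periods as of cur_date."""
    latest = {}
    # Sort by publication date so later revisions overwrite earlier ones.
    for pub_date, period, value in sorted(records):
        if pub_date <= cur_date:
            latest[period] = value
    periods = sorted(latest)[-n:]
    if not periods:
        return float("nan")
    return sum(latest[p] for p in periods) / len(periods)


print(period_mean(records, date(2019, 7, 17), 2))  # only 201901 is visible
print(period_mean(records, date(2019, 7, 18), 2))  # both quarters are visible
```

Note that this sketch averages only the quarters that are visible at `cur_date`; how missing quarters should be treated (skipped, or counted as 0 as the test table appears to do) is a design choice that a real operator would need to settle.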
Sorry for the late response.
@Chaoyingz I have replied to your comments :)
@Chaoyingz I have replied to your comments just now.
This PR is stale because it has been open for a year with no activity. Remove the stale label or comment on the PR; otherwise this will be closed in 5 days.
Description
See #1241.
Changes:
- `Pfeature` can be used independently.
- `FinancialInterval` class makes extending `Interval` easier.
- `dump_pit` script to make the behavior of reading and writing pit data more uniform.
Motivation and Context
How Has This Been Tested?
`pytest qlib/tests/test_all_pipeline.py` under upper directory of `qlib`.
Screenshots of Test Results (if appropriate):
Pipeline test:

Your own tests:

Types of changes
Notes
related PR #1000