Optimize the pit collector script #982

Merged: 5 commits, Mar 18, 2022


Chaoyingz
Contributor

Description

Optimize the pit collector script

Motivation and Context

I made a few changes to the PIT collector script:

  • Split the download script into two parts: download data and normalize data
  • Split the _get_data_from_baostock method into smaller pieces to improve readability
  • Adjust README.md and test_pit.py accordingly

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

PIT test results:

[screenshot]

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

@SunsetWolf
Collaborator

Following the PIT test results image you provided, I ran the test myself and got some error messages. Could you help me fix them?
[screenshot]

@Chaoyingz
Contributor Author

Hi @SunsetWolf, at line 1166 of qlib/data/data.py:

try:
    return DatasetD.dataset(
        instruments, fields, start_time, end_time, freq, disk_cache, inst_processors=inst_processors
    )
except TypeError:
    return DatasetD.dataset(instruments, fields, start_time, end_time, freq, inst_processors=inst_processors)

This code catches the TypeError exception, but the screenshot you provided does not show the exception itself. Can you provide more information?
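
To illustrate how a try/except fallback of this shape behaves, here is a minimal, self-contained sketch. The three provider functions and the features wrapper are hypothetical stand-ins written for this example; they are not Qlib APIs and do not reproduce the DatasetD implementation.

```python
def dataset_new(instruments, fields, disk_cache=0, inst_processors=None):
    """Hypothetical provider that accepts a positional disk_cache argument."""
    return ("new", instruments, fields, disk_cache)


def dataset_old(instruments, fields, inst_processors=None):
    """Hypothetical provider without a disk_cache argument."""
    return ("old", instruments, fields)


def dataset_broken(instruments, fields, disk_cache=0, inst_processors=None):
    """Hypothetical provider whose body raises TypeError on every call."""
    raise TypeError("raised inside the provider body")


def features(provider, instruments, fields, disk_cache=1):
    # Same shape as the snippet above: try the call that passes disk_cache,
    # and retry without it if that call raises TypeError.
    try:
        return provider(instruments, fields, disk_cache, inst_processors=[])
    except TypeError:
        return provider(instruments, fields, inst_processors=[])


# The newer signature succeeds on the first attempt.
result_new = features(dataset_new, ["SH600519"], ["$close"])

# The older signature raises TypeError on the first attempt (inst_processors
# receives two values), so the fallback call is used instead.
result_old = features(dataset_old, ["SH600519"], ["$close"])

# A TypeError raised inside the provider is also caught once; it only surfaces
# when the retry raises it again.
try:
    features(dataset_broken, ["SH600519"], ["$close"])
    surfaced = False
except TypeError:
    surfaced = True
```

In this sketch, a traceback that reaches the user can only come from the second call, because any TypeError from the first call is swallowed by the except clause.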

@SunsetWolf
Collaborator

Here is more detailed information about the error I received. I hope it helps in resolving this issue. Thanks!
[screenshot]

@Chaoyingz
Contributor Author

Running this test requires initializing the calendar and instrument data.
Please execute the following commands before running the test (you can find them in the comments in test_pit.py):

  1. python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

  2. python scripts/data_collector/pit/collector.py download_data --source_dir ~/.qlib/stock_data/source/pit --start 2000-01-01 --end 2020-01-01 --interval quarterly --symbol_regex "^(600519|000725).*"

  3. python scripts/data_collector/pit/collector.py normalize_data --interval quarterly --source_dir ~/.qlib/stock_data/source/pit --normalize_dir ~/.qlib/stock_data/source/pit_normalized

  4. python scripts/dump_pit.py dump --csv_path ~/.qlib/stock_data/source/pit_normalized --qlib_dir ~/.qlib/qlib_data/cn_data --interval quarterly
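
The four steps above can also be chained from a small driver script, which stops at the first failing step. The following is a minimal sketch, assuming it is run from the root of a Qlib checkout; build_commands and run_all are hypothetical helper names introduced for this example, while the script paths and flags are copied from the commands above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(source_dir, normalize_dir, qlib_dir):
    """Return the four steps above as argument lists, in execution order."""
    collector = "scripts/data_collector/pit/collector.py"
    return [
        # 1. Fetch the base calendar and instrument data.
        [sys.executable, "scripts/get_data.py", "qlib_data",
         "--target_dir", qlib_dir, "--region", "cn"],
        # 2. Download raw PIT data for the two test symbols.
        [sys.executable, collector, "download_data",
         "--source_dir", source_dir,
         "--start", "2000-01-01", "--end", "2020-01-01",
         "--interval", "quarterly",
         "--symbol_regex", "^(600519|000725).*"],
        # 3. Normalize the downloaded files.
        [sys.executable, collector, "normalize_data",
         "--interval", "quarterly",
         "--source_dir", source_dir,
         "--normalize_dir", normalize_dir],
        # 4. Dump the normalized CSV files into the Qlib data directory.
        [sys.executable, "scripts/dump_pit.py", "dump",
         "--csv_path", normalize_dir,
         "--qlib_dir", qlib_dir,
         "--interval", "quarterly"],
    ]


def run_all(commands):
    """Run each command in order; check=True raises on a non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# "~" is not expanded without a shell, so the paths are built from Path.home().
home = Path.home()
commands = build_commands(
    source_dir=str(home / ".qlib/stock_data/source/pit"),
    normalize_dir=str(home / ".qlib/stock_data/source/pit_normalized"),
    qlib_dir=str(home / ".qlib/qlib_data/cn_data"),
)
# To execute the steps, call: run_all(commands)
```

Passing argument lists instead of a shell string keeps the regular expression intact without extra quoting.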

@Chaoyingz Chaoyingz requested a review from you-n-g March 18, 2022 07:55
@Chaoyingz Chaoyingz requested a review from you-n-g March 18, 2022 08:57
@Chaoyingz Chaoyingz requested a review from you-n-g March 18, 2022 10:32
@you-n-g
Collaborator

you-n-g commented Mar 18, 2022

Everything looks much simpler and cleaner than Qlib's initial version!
It looks great!
Thanks!

@you-n-g you-n-g merged commit 8efc8b9 into microsoft:main Mar 18, 2022
@Chaoyingz Chaoyingz deleted the pit-collector-optimize branch March 22, 2022 03:38
@you-n-g you-n-g added the enhancement New feature or request label Apr 24, 2022
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* Optimize the pit collector script

* Add copyright notice to collector.py

* Remove unnecessary parameters for test_pit.py

* Update test_pit.py

* Update test_pit.py