Refactor doctest #30210

Merged
merged 5 commits into from
Apr 15, 2024

Collaborator

@ydshieh ydshieh commented Apr 12, 2024

What does this PR do?

Main changes:

  • Run doctest in multiple independent jobs
    • motivation: to prevent GPU OOM (or other leftover environment state) in one test from affecting subsequent tests
    • this is done by using a map between directories and their files
    • The files in docs/source/en/model_doc or docs/source/en/tasks are NOT grouped together with other files in the
      same directory: the objective is to run doctest against them in independent GitHub Actions jobs.
  • Use the same trick as in Split daily CI using 2 level matrix #28773 to allow GitHub Actions to generate more than 256 jobs in a matrix
    • (as now we have many more (smaller) jobs instead of a single big job)
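
The directory-to-files map described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `group_doctest_files`, the constant `UNGROUPED_DIRS`, and the example file list are hypothetical, and the exact matching rule used by the real script is an assumption.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Directories whose files each get their own job (taken from the PR description).
UNGROUPED_DIRS = ("docs/source/en/model_doc", "docs/source/en/tasks")


def group_doctest_files(files):
    """Map each job key (a directory, or a single file) to the files it runs.

    Hypothetical helper: assumes paths are relative and use "/" as separator.
    """
    groups = defaultdict(list)
    for file in sorted(files):
        parent = PurePosixPath(file).parent.as_posix()
        # Files directly inside the special directories are keyed by their own path,
        # so each one ends up in a separate job.
        key = file if parent in UNGROUPED_DIRS else parent
        groups[key].append(file)
    return dict(groups)


# Illustrative file list, not the real output of the test fetcher.
example_files = [
    "docs/source/en/model_doc/bert.md",
    "docs/source/en/model_doc/gpt2.md",
    "docs/source/en/quicktour.md",
    "docs/source/en/index.md",
    "src/transformers/models/bert/modeling_bert.py",
]
job_map = group_doctest_files(example_files)
```

Each key of `job_map` then corresponds to one independent job, so a failure or OOM in one group cannot leak into another.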

A file doc_test_results is saved and uploaded as an artifact, so the results can always be inspected afterwards.

An example run page: see here
A full run page: see here

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh requested a review from amyeroberts April 12, 2024 09:15
Collaborator Author

A new workflow file that contains the doctest job definition (logic copied from the existing file doctests.yml).

Collaborator Author

Move the doctest job definition to the new workflow file. This workflow now only gets the list of files to run doctest against and defines how they are run (using a GitHub Actions matrix).

Collaborator Author

Nothing really changed: previously we had only one artifact, but now we have multiple artifacts. The change here just takes this into account by aggregating the information (test results) across multiple reports.
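
As a rough illustration of aggregating results across several reports, here is a minimal sketch. The report structure (`n_success`, `failures`) and the name `aggregate_reports` are hypothetical and chosen only for this example; the actual format of the doctest artifacts is not shown in this conversation.

```python
def aggregate_reports(reports):
    """Merge per-job result dictionaries into one summary.

    Hypothetical structure: each report has an "n_success" count and a
    "failures" list of test names. The real artifact format may differ.
    """
    summary = {"n_success": 0, "n_failures": 0, "failures": []}
    for job_name, report in sorted(reports.items()):
        summary["n_success"] += report.get("n_success", 0)
        failed = report.get("failures", [])
        summary["n_failures"] += len(failed)
        # Keep track of which job each failure came from.
        summary["failures"].extend((job_name, test) for test in failed)
    return summary


# Illustrative data only.
reports = {
    "docs/source/en": {"n_success": 12, "failures": []},
    "docs/source/en/model_doc/bert.md": {"n_success": 3, "failures": ["bert.md::example"]},
}
summary = aggregate_reports(reports)
```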

Comment on lines +15 to +32
"""
This script is used to get the files against which we will run doc testing.
This uses `tests_fetcher.get_all_doctest_files` then groups the test files by their directory paths.

The files in `docs/source/en/model_doc` or `docs/source/en/tasks` are **NOT** grouped together with other files in the
same directory: the objective is to run doctest against them in independent GitHub Actions jobs.

Assume we are under `transformers` root directory:
To get a map (dictionary) between directory (or file) paths and the corresponding files
```bash
python utils/split_doctest_jobs.py
```
or to get a list of lists of directory (or file) paths
```bash
python utils/split_doctest_jobs.py --only_return_keys --num_splits 4
```
(this is used to allow GitHub Actions to generate more than 256 jobs using matrix)
"""
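
The second mode in the docstring returns a list of lists so that an outer matrix can iterate over the splits and an inner matrix over the keys in each split, keeping every single matrix under the 256-job limit. A minimal sketch of such a split is below; `split_keys` is a hypothetical name, and splitting into contiguous, near-equal chunks is an assumption about how the real script distributes the keys.

```python
def split_keys(keys, num_splits):
    """Split `keys` into `num_splits` contiguous lists whose sizes differ by at most one."""
    base, extra = divmod(len(keys), num_splits)
    chunks, start = [], 0
    for i in range(num_splits):
        # The first `extra` chunks take one additional element.
        end = start + base + (1 if i < extra else 0)
        chunks.append(keys[start:end])
        start = end
    return chunks


# Illustrative keys standing in for directory (or file) paths.
keys = [f"job_{i}" for i in range(10)]
chunks = split_keys(keys, 4)
```

With `--num_splits 4`, each of the four outer matrix entries would receive one of these lists and expand it into its own inner matrix.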
Collaborator Author

check here

Comment on lines +507 to +508
# change to use "/" as path separator
test_files_to_run = ["/".join(Path(x).parts) for x in test_files_to_run]
Collaborator Author

Make every path use / as the separator, so that this also works on Windows.
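
To see the effect of this normalization on any operating system, the sketch below applies the same `"/".join(... .parts)` idea using the pure path classes from `pathlib`. The helper name `to_forward_slashes` and the example path are hypothetical, and the sketch assumes relative paths.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_forward_slashes(path, flavour=PurePosixPath):
    """Join the components of a relative path with "/" regardless of the input separator."""
    return "/".join(flavour(path).parts)


# Parse the same relative path with Windows and POSIX rules.
windows_style = to_forward_slashes(r"docs\source\en\index.md", PureWindowsPath)
posix_style = to_forward_slashes("docs/source/en/index.md", PurePosixPath)
```

Both calls produce the same string, which is what allows keys computed on Windows to match those computed on Linux. For relative paths, `as_posix()` gives the same result.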

@ydshieh ydshieh requested a review from LysandreJik April 12, 2024 12:51
@ydshieh ydshieh mentioned this pull request Apr 12, 2024
Member

@LysandreJik LysandreJik left a comment

Looks clean! Thanks @ydshieh!

Collaborator Author

ydshieh commented Apr 15, 2024

Merge now :-)

@ydshieh ydshieh merged commit b6b6daf into main Apr 15, 2024
8 checks passed
@ydshieh ydshieh deleted the refactor_doctest branch April 15, 2024 11:20