-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop using a web server for the API docs generation #578
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big improvement and quick turnaround, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work!
scripts/lib/downloadImages.ts
Outdated
await closeWebServer(); | ||
} | ||
await pMap(images, async (img) => { | ||
const imgName = img.src.split("/").pop() || ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with the || ""
? That seems like it would be a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used because the pop()
function returns string | undefined
. We will never have an undefined because the names of the images are always defined, and we don't work with empty arrays, so I changed to use !
at the end to tell the compiler it will be a string.
const imgName = img.src.split("/").pop()!;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent - that expresses the intent much better. Good change!
scripts/lib/downloadImages.ts
Outdated
return; | ||
} | ||
|
||
await $`cp ${originalImagesFolderPath}/${imgName} public/${img.dest}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use the builtin mechanism to copy files: https://www.geeksforgeeks.org/node-js-fspromises-copyfile-method/. It avoids overhead from spawning a bunch of new distinct processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better, thanks! TIL
Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
Closes #511. Example: ``` $ npm run check-pages-render Checked 10 / 71 pages Checked 20 / 71 pages Checked 30 / 71 pages Checked 40 / 71 pages Checked 50 / 71 pages Checked 60 / 71 pages Checked 70 / 71 pages ✅ All pages render without crashing ``` ## Only checks non-API docs by default This script is quite slow, at least on my M1 because the Docker images is built with x86. So, to avoid CI slowing down too much, we only check non-API pages in PR builds. Our nightly cron job checks everything else. ## Does not auto-start Docker We no longer have the file `webServer.ts` thanks to #578, which was a great improvement. Rather than adding back somewhat complex code for us to auto-start the server—and then to periodically ping if it's ready or time out—we expect the user to start up the server. That's acceptable since usually people will rely on CI to run this check. It's too slow for people to be frequently running locally. --------- Co-authored-by: Frank Harkins <frankharkins@hotmail.co.uk>
Closes #511. Example: ``` $ npm run check-pages-render Checked 10 / 71 pages Checked 20 / 71 pages Checked 30 / 71 pages Checked 40 / 71 pages Checked 50 / 71 pages Checked 60 / 71 pages Checked 70 / 71 pages ✅ All pages render without crashing ``` ## Only checks non-API docs by default This script is quite slow, at least on my M1 because the Docker images is built with x86. So, to avoid CI slowing down too much, we only check non-API pages in PR builds. Our nightly cron job checks everything else. ## Does not auto-start Docker We no longer have the file `webServer.ts` thanks to #578, which was a great improvement. Rather than adding back somewhat complex code for us to auto-start the server—and then to periodically ping if it's ready or time out—we expect the user to start up the server. That's acceptable since usually people will rely on CI to run this check. It's too slow for people to be frequently running locally. --------- Co-authored-by: Frank Harkins <frankharkins@hotmail.co.uk>
### Summary This PR is part of Qiskit#564 and changes the `updateApiDocs.ts` script to stop using a web server and a web crawler for the API docs generation. Instead, the script uses the HTML in the artifact zip file to convert it into markdown and copies the images from the same zip to the respective version folder without downloading it from any web server. ### Details #### Convert HTML to markdown After removing the web crawler used to download the HTML files that the script translated into markdown, we need to take into account HTML files that are used as a redirect, like `stubs/qiskit.utils.name_args.html` for Qiskit v0.45: ```html <html><head><meta http-equiv="refresh" content="0; url=https://qiskit.org/documentation/apidoc/utils.html#qiskit.utils.name_args"></head></html> ``` These files were not downloaded using the web crawler and therefore, not processed by the `sphinxHtmlToMarkdown` function. Now, the script will try to convert those files into markdown as well, ending up in an empty markdown file we need to remove. That is translated into this conditional in `updateApisDocs.ts`: ```ts if (result.markdown == "") { continue; } ``` #### Save images As for the images, we don't need the web server because we can find them in the folder called `_images` inside the artifact zip file. The script will copy all the images to the correct API version images folder. Moreover, the script only saves the images corresponding to the release notes for the current APIs (not using the historical argument). This change will allow us to remove unnecessary duplicate images we are currently downloading. #### Bug fix In addition to that change, the PR fixes an underlying issue with the old method. We were only downloading images when they were not present in the API images folder by checking if we already had a file with the same name. New versions of the same API have new images with the same name as the old ones, and because of that, we were never saving the new ones. All the historical versions will be regenerated in a follow-up. Commands used: ```bash npm run gen-api -- -p qiskit -v 0.45.0 -a https://github.com/Qiskit/qiskit/actions/runs/6744953436/artifacts/1026798160 npm run gen-api -- -p qiskit-ibm-provider -v 0.7.3 -a https://github.com/Qiskit/qiskit-ibm-provider/actions/runs/7301486985/artifacts/1131430696 npm run gen-api -- -p qiskit-ibm-runtime -v 0.17.0 -a https://github.com/Qiskit/qiskit-ibm-runtime/suites/18863019852/artifacts/1100724937 ``` Closes Qiskit#564 --------- Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
Closes Qiskit#511. Example: ``` $ npm run check-pages-render Checked 10 / 71 pages Checked 20 / 71 pages Checked 30 / 71 pages Checked 40 / 71 pages Checked 50 / 71 pages Checked 60 / 71 pages Checked 70 / 71 pages ✅ All pages render without crashing ``` ## Only checks non-API docs by default This script is quite slow, at least on my M1 because the Docker images is built with x86. So, to avoid CI slowing down too much, we only check non-API pages in PR builds. Our nightly cron job checks everything else. ## Does not auto-start Docker We no longer have the file `webServer.ts` thanks to Qiskit#578, which was a great improvement. Rather than adding back somewhat complex code for us to auto-start the server—and then to periodically ping if it's ready or time out—we expect the user to start up the server. That's acceptable since usually people will rely on CI to run this check. It's too slow for people to be frequently running locally. --------- Co-authored-by: Frank Harkins <frankharkins@hotmail.co.uk>
Summary
This PR is part of #564 and changes the
updateApiDocs.ts
script to stop using a web server and a web crawler for the API docs generation. Instead, the script uses the HTML in the artifact zip file to convert it into markdown and copies the images from the same zip to the respective version folder without downloading it from any web server.Details
Convert HTML to markdown
After removing the web crawler used to download the HTML files that the script translated into markdown, we need to take into account HTML files that are used as a redirect, like
stubs/qiskit.utils.name_args.html
for Qiskit v0.45:These files were not downloaded using the web crawler and therefore, not processed by the
sphinxHtmlToMarkdown
function. Now, the script will try to convert those files into markdown as well, ending up in an empty markdown file we need to remove. That is translated into this conditional inupdateApisDocs.ts
:Save images
As for the images, we don't need the web server because we can find them in the folder called
_images
inside the artifact zip file. The script will copy all the images to the correct API version images folder. Moreover, the script only saves the images corresponding to the release notes for the current APIs (not using the historical argument). This change will allow us to remove unnecessary duplicate images we are currently downloading.Bug fix
In addition to that change, the PR fixes an underlying issue with the old method. We were only downloading images when they were not present in the API images folder by checking if we already had a file with the same name. New versions of the same API have new images with the same name as the old ones, and because of that, we were never saving the new ones.
All the historical versions will be regenerated in a follow-up.
Commands used:
Closes #564