Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using a web server for the API docs generation #578

Merged
merged 6 commits into from
Jan 3, 2024

Conversation

arnaucasau
Copy link
Collaborator

@arnaucasau arnaucasau commented Jan 3, 2024

Summary

This PR is part of #564 and changes the updateApiDocs.ts script to stop using a web server and a web crawler for the API docs generation. Instead, the script uses the HTML in the artifact zip file to convert it into markdown and copies the images from the same zip to the respective version folder without downloading it from any web server.

Details

Convert HTML to markdown

After removing the web crawler used to download the HTML files that the script translated into markdown, we need to take into account HTML files that are used as a redirect, like stubs/qiskit.utils.name_args.html for Qiskit v0.45:

<html><head><meta http-equiv="refresh" content="0; url=https://qiskit.org/documentation/apidoc/utils.html#qiskit.utils.name_args"></head></html>

These files were not downloaded using the web crawler and therefore, not processed by the sphinxHtmlToMarkdown function. Now, the script will try to convert those files into markdown as well, ending up in an empty markdown file we need to remove. That is translated into this conditional in updateApisDocs.ts:

if (result.markdown == "") {
  continue;
}

Save images

As for the images, we don't need the web server because we can find them in the folder called _images inside the artifact zip file. The script will copy all the images to the correct API version images folder. Moreover, the script only saves the images corresponding to the release notes for the current APIs (not using the historical argument). This change will allow us to remove unnecessary duplicate images we are currently downloading.

Bug fix

In addition to that change, the PR fixes an underlying issue with the old method. We were only downloading images when they were not present in the API images folder by checking if we already had a file with the same name. New versions of the same API have new images with the same name as the old ones, and because of that, we were never saving the new ones.

All the historical versions will be regenerated in a follow-up.

Commands used:

npm run gen-api -- -p qiskit -v 0.45.0 -a https://github.com/Qiskit/qiskit/actions/runs/6744953436/artifacts/1026798160
npm run gen-api -- -p qiskit-ibm-provider -v 0.7.3 -a https://github.com/Qiskit/qiskit-ibm-provider/actions/runs/7301486985/artifacts/1131430696
npm run gen-api -- -p qiskit-ibm-runtime -v 0.17.0 -a https://github.com/Qiskit/qiskit-ibm-runtime/suites/18863019852/artifacts/1100724937

Closes #564

Copy link
Member

@frankharkins frankharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big improvement and quick turnaround, thanks!

Copy link
Collaborator

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

await closeWebServer();
}
await pMap(images, async (img) => {
const imgName = img.src.split("/").pop() || "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the || ""? That seems like it would be a bug.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used because the pop() function returns string | undefined. We will never have an undefined because the names of the images are always defined, and we don't work with empty arrays, so I changed to use ! at the end to tell the compiler it will be a string.

const imgName = img.src.split("/").pop()!;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent - that expresses the intent much better. Good change!

return;
}

await $`cp ${originalImagesFolderPath}/${imgName} public/${img.dest}`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use the builtin mechanism to copy files: https://www.geeksforgeeks.org/node-js-fspromises-copyfile-method/. It avoids overhead from spawning a bunch of new distinct processes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, thanks! TIL

arnaucasau and others added 2 commits January 3, 2024 13:58
Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
@arnaucasau arnaucasau added this pull request to the merge queue Jan 3, 2024
Merged via the queue into Qiskit:main with commit d06ed65 Jan 3, 2024
3 checks passed
@arnaucasau arnaucasau deleted the fix-images branch January 3, 2024 13:17
github-merge-queue bot pushed a commit that referenced this pull request Jan 5, 2024
Closes #511. Example:

```
$ npm run check-pages-render

Checked 10 / 71 pages
Checked 20 / 71 pages
Checked 30 / 71 pages
Checked 40 / 71 pages
Checked 50 / 71 pages
Checked 60 / 71 pages
Checked 70 / 71 pages
✅ All pages render without crashing
```

## Only checks non-API docs by default

This script is quite slow, at least on my M1 because the Docker images
is built with x86.

So, to avoid CI slowing down too much, we only check non-API pages in PR
builds. Our nightly cron job checks everything else.

## Does not auto-start Docker

We no longer have the file `webServer.ts` thanks to
#578, which was a great
improvement.

Rather than adding back somewhat complex code for us to auto-start the
server—and then to periodically ping if it's ready or time out—we expect
the user to start up the server. That's acceptable since usually people
will rely on CI to run this check. It's too slow for people to be
frequently running locally.

---------

Co-authored-by: Frank Harkins <frankharkins@hotmail.co.uk>
github-actions bot pushed a commit that referenced this pull request Jan 9, 2024
Closes #511. Example:

```
$ npm run check-pages-render

Checked 10 / 71 pages
Checked 20 / 71 pages
Checked 30 / 71 pages
Checked 40 / 71 pages
Checked 50 / 71 pages
Checked 60 / 71 pages
Checked 70 / 71 pages
✅ All pages render without crashing
```

## Only checks non-API docs by default

This script is quite slow, at least on my M1 because the Docker images
is built with x86.

So, to avoid CI slowing down too much, we only check non-API pages in PR
builds. Our nightly cron job checks everything else.

## Does not auto-start Docker

We no longer have the file `webServer.ts` thanks to
#578, which was a great
improvement.

Rather than adding back somewhat complex code for us to auto-start the
server—and then to periodically ping if it's ready or time out—we expect
the user to start up the server. That's acceptable since usually people
will rely on CI to run this check. It's too slow for people to be
frequently running locally.

---------

Co-authored-by: Frank Harkins <frankharkins@hotmail.co.uk>
frankharkins pushed a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
### Summary

This PR is part of Qiskit#564 and changes the `updateApiDocs.ts` script to
stop using a web server and a web crawler for the API docs generation.
Instead, the script uses the HTML in the artifact zip file to convert it
into markdown and copies the images from the same zip to the respective
version folder without downloading it from any web server.

### Details

#### Convert HTML to markdown

After removing the web crawler used to download the HTML files that the
script translated into markdown, we need to take into account HTML files
that are used as a redirect, like `stubs/qiskit.utils.name_args.html`
for Qiskit v0.45:

```html
<html><head><meta http-equiv="refresh" content="0; url=https://qiskit.org/documentation/apidoc/utils.html#qiskit.utils.name_args"></head></html>
```

These files were not downloaded using the web crawler and therefore, not
processed by the `sphinxHtmlToMarkdown` function. Now, the script will
try to convert those files into markdown as well, ending up in an empty
markdown file we need to remove. That is translated into this
conditional in `updateApisDocs.ts`:

```ts
if (result.markdown == "") {
  continue;
}
```

#### Save images

As for the images, we don't need the web server because we can find them
in the folder called `_images` inside the artifact zip file. The script
will copy all the images to the correct API version images folder.
Moreover, the script only saves the images corresponding to the release
notes for the current APIs (not using the historical argument). This
change will allow us to remove unnecessary duplicate images we are
currently downloading.

#### Bug fix

In addition to that change, the PR fixes an underlying issue with the
old method. We were only downloading images when they were not present
in the API images folder by checking if we already had a file with the
same name. New versions of the same API have new images with the same
name as the old ones, and because of that, we were never saving the new
ones.

All the historical versions will be regenerated in a follow-up.

Commands used:
```bash
npm run gen-api -- -p qiskit -v 0.45.0 -a https://github.com/Qiskit/qiskit/actions/runs/6744953436/artifacts/1026798160
npm run gen-api -- -p qiskit-ibm-provider -v 0.7.3 -a https://github.com/Qiskit/qiskit-ibm-provider/actions/runs/7301486985/artifacts/1131430696
npm run gen-api -- -p qiskit-ibm-runtime -v 0.17.0 -a https://github.com/Qiskit/qiskit-ibm-runtime/suites/18863019852/artifacts/1100724937
```

Closes Qiskit#564

---------

Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
frankharkins added a commit to frankharkins/documentation that referenced this pull request Jul 22, 2024
Closes Qiskit#511. Example:

```
$ npm run check-pages-render

Checked 10 / 71 pages
Checked 20 / 71 pages
Checked 30 / 71 pages
Checked 40 / 71 pages
Checked 50 / 71 pages
Checked 60 / 71 pages
Checked 70 / 71 pages
✅ All pages render without crashing
```

## Only checks non-API docs by default

This script is quite slow, at least on my M1 because the Docker images
is built with x86.

So, to avoid CI slowing down too much, we only check non-API pages in PR
builds. Our nightly cron job checks everything else.

## Does not auto-start Docker

We no longer have the file `webServer.ts` thanks to
Qiskit#578, which was a great
improvement.

Rather than adding back somewhat complex code for us to auto-start the
server—and then to periodically ping if it's ready or time out—we expect
the user to start up the server. That's acceptable since usually people
will rely on CI to run this check. It's too slow for people to be
frequently running locally.

---------

Co-authored-by: Frank Harkins <frankharkins@hotmail.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't use server during API generation
3 participants