Stop using a web server for the API docs generation (Qiskit#578)
### Summary

This PR is part of Qiskit#564. It changes the `updateApiDocs.ts` script
so that API docs generation no longer uses a web server and a web
crawler. Instead, the script converts the HTML in the artifact zip file
into markdown and copies the images from the same zip to the respective
version folder, without downloading them from any web server.
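
As a rough illustration of reading the HTML straight from the extracted artifact, the sketch below walks a local folder and collects the `.html` files that belong to the API docs. The function name `listApiHtmlFiles` and the prefix list are assumptions made for this example (the prefixes mirror the ones the removed crawler followed); the real script imports `globby` and may select files differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical prefix list, mirroring the URL prefixes the old crawler followed.
const API_PREFIXES = ["apidocs", "apidoc", "stubs", "release_notes"];

// Recursively collect `.html` files under `root` whose relative path starts
// with one of the API prefixes. Returned paths use "/" and are sorted.
function listApiHtmlFiles(root: string): string[] {
  const found: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
        continue;
      }
      const rel = path.relative(root, full).split(path.sep).join("/");
      if (rel.endsWith(".html") && API_PREFIXES.some((p) => rel.startsWith(p))) {
        found.push(rel);
      }
    }
  };
  walk(root);
  return found.sort();
}
```

No HTTP request is involved: every file comes from the local folder the artifact was extracted to.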

### Details

#### Convert HTML to markdown

After removing the web crawler used to download the HTML files that the
script translated into markdown, we need to take into account HTML files
that are used as a redirect, like `stubs/qiskit.utils.name_args.html`
for Qiskit v0.45:

```html
<html><head><meta http-equiv="refresh" content="0; url=https://qiskit.org/documentation/apidoc/utils.html#qiskit.utils.name_args"></head></html>
```

The web crawler never downloaded these files, so the
`sphinxHtmlToMarkdown` function never processed them. Now the script
tries to convert them as well, which produces empty markdown that must
not be written to disk. This conditional in `updateApiDocs.ts` skips
those files:

```ts
if (result.markdown == "") {
  continue;
}
```
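
For context, a redirect stub can also be recognized directly from its HTML. The sketch below is an alternative illustration, not what this PR does (the PR only checks for empty markdown); the helper name `redirectTarget` is hypothetical and the regular expressions only cover simple `<meta http-equiv="refresh">` tags like the one shown above.

```typescript
// Return the URL a meta-refresh stub points to, or undefined when the page
// has no <meta http-equiv="refresh"> tag. Hypothetical helper for illustration.
function redirectTarget(html: string): string | undefined {
  const meta = /<meta\b[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i.exec(html);
  if (meta === null) return undefined;
  // The content attribute looks like "0; url=https://example.org/page.html".
  const content = /content\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(meta[0] ?? "");
  const value = content?.[1] ?? content?.[2];
  const url = value === undefined ? null : /url\s*=\s*(.+)$/i.exec(value);
  return url?.[1]?.trim();
}
```

Checking the converted markdown, as the PR does, is simpler and also covers any other HTML file that converts to nothing.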

#### Save images

The images no longer require the web server either: they are in the
`_images` folder inside the artifact zip file, and the script copies
them to the images folder of the corresponding API version. Release
notes images are saved only for the current API docs (that is, when the
historical argument is not used), which lets us remove the duplicate
images we are currently downloading.
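
A minimal sketch of that copy step follows. It is not the code of `saveImages`, which this page does not show: the function name `copyArtifactImages`, its parameters, and the rule that release notes images have `release_notes` in their file name are assumptions for illustration.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Copy every file from `<artifact>/_images` into `destDir`. Release notes
// images (assumed to contain "release_notes" in their name) are skipped
// for historical versions. Returns the sorted names of the copied files.
function copyArtifactImages(imagesDir: string, destDir: string, historical: boolean): string[] {
  fs.mkdirSync(destDir, { recursive: true });
  const copied: string[] = [];
  for (const entry of fs.readdirSync(imagesDir, { withFileTypes: true })) {
    if (!entry.isFile()) continue;
    if (historical && entry.name.includes("release_notes")) continue;
    fs.copyFileSync(path.join(imagesDir, entry.name), path.join(destDir, entry.name));
    copied.push(entry.name);
  }
  return copied.sort();
}
```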

#### Bug fix

The PR also fixes an underlying bug in the old method. We only
downloaded an image when the API images folder did not already contain a
file with the same name. New versions of the same API ship new images
under the same names as the old ones, so the new images were never
saved.
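
The difference between the two behaviours can be sketched as follows. Both functions are simplified stand-ins with hypothetical names, not the removed `downloadImages` code or the new `saveImages` code.

```typescript
import * as fs from "node:fs";

// Old behaviour (simplified): skip the image when a file with the same
// name already exists, so a stale image blocks the update.
function saveIfMissing(src: string, dest: string): boolean {
  if (fs.existsSync(dest)) return false;
  fs.copyFileSync(src, dest);
  return true;
}

// New behaviour (simplified): always copy. copyFileSync overwrites an
// existing destination by default.
function saveAlways(src: string, dest: string): void {
  fs.copyFileSync(src, dest);
}
```

With an outdated `circuit-1.png` already in the destination, `saveIfMissing` returns `false` and leaves the old file in place, while `saveAlways` replaces it.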

All the historical versions will be regenerated in a follow-up.

Commands used:
```bash
npm run gen-api -- -p qiskit -v 0.45.0 -a https://github.com/Qiskit/qiskit/actions/runs/6744953436/artifacts/1026798160
npm run gen-api -- -p qiskit-ibm-provider -v 0.7.3 -a https://github.com/Qiskit/qiskit-ibm-provider/actions/runs/7301486985/artifacts/1131430696
npm run gen-api -- -p qiskit-ibm-runtime -v 0.17.0 -a https://github.com/Qiskit/qiskit-ibm-runtime/suites/18863019852/artifacts/1100724937
```

Closes Qiskit#564

---------

Co-authored-by: Eric Arellano <14852634+Eric-Arellano@users.noreply.github.com>
arnaucasau and Eric-Arellano authored Jan 3, 2024
1 parent 26acc62 commit ba9f3de
Showing 142 changed files with 63 additions and 320 deletions.
35 changes: 0 additions & 35 deletions package-lock.json

1 change: 0 additions & 1 deletion package.json
@@ -37,7 +37,6 @@
     "mdast": "^3.0.0",
     "mkdirp": "^3.0.1",
     "p-map": "^6.0.0",
-    "p-queue": "^7.4.1",
     "prettier": "^3.0.3",
     "rehype-parse": "^8.0.0",
     "rehype-remark": "^9.1.2",
Binary file modified public/images/api/qiskit/circuit-1.png
Binary file modified public/images/api/qiskit/circuit-2.png
Binary file modified public/images/api/qiskit/circuit-3.png
Binary file modified public/images/api/qiskit/circuit-4.png
Binary file modified public/images/api/qiskit/circuit-5.png
Binary file modified public/images/api/qiskit/circuit_library-1.png
Binary file modified public/images/api/qiskit/converters-1.png
Binary file modified public/images/api/qiskit/providers_fake_provider-1_00.png
Binary file modified public/images/api/qiskit/providers_fake_provider-1_01.png
Binary file modified public/images/api/qiskit/providers_fake_provider-1_02.png
Binary file modified public/images/api/qiskit/pulse-1.png
Binary file modified public/images/api/qiskit/pulse-2.png
Binary file modified public/images/api/qiskit/pulse-3.png
Binary file modified public/images/api/qiskit/pulse-4.png
Binary file modified public/images/api/qiskit/pulse-5.png
Binary file modified public/images/api/qiskit/pulse-6.png
Binary file modified public/images/api/qiskit/pulse-7.png
Binary file modified public/images/api/qiskit/pulse-8.png
Binary file modified public/images/api/qiskit/pulse-9.png
Binary file modified public/images/api/qiskit/qasm3-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-ControlledGate-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-ControlledGate-2.png
Binary file modified public/images/api/qiskit/qiskit-circuit-InstructionSet-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-Operation-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-Parameter-1_00.png
Binary file modified public/images/api/qiskit/qiskit-circuit-Parameter-1_01.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-2.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-3_00.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-3_01.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-4_00.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-4_01.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-5.png
Binary file modified public/images/api/qiskit/qiskit-circuit-QuantumCircuit-6.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-AND-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-AND-2.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-GMS-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-GR-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-GRX-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-GRY-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-GRZ-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-GraphState-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-IQP-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-IQP-2.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-InnerProduct-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-MCMTVChain-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-OR-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-OR-2.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-QFT-1.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-QFT-2.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-QFT-3.png
Binary file modified public/images/api/qiskit/qiskit-circuit-library-XOR-1.png
Binary file modified public/images/api/qiskit/qiskit-quantum_info-Statevector-1.png
Binary file modified public/images/api/qiskit/qiskit-visualization-pulse_drawer-1.png
Binary file modified public/images/api/qiskit/qiskit-visualization-pulse_drawer-2.png
Binary file modified public/images/api/qiskit/qiskit-visualization-pulse_drawer-3.png
Binary file modified public/images/api/qiskit/transpiler-10.png
Binary file modified public/images/api/qiskit/transpiler-11.png
Binary file modified public/images/api/qiskit/transpiler-12.png
Binary file modified public/images/api/qiskit/transpiler-13.png
Binary file modified public/images/api/qiskit/transpiler-14.png
Binary file modified public/images/api/qiskit/transpiler-15.png
Binary file modified public/images/api/qiskit/transpiler-16.png
Binary file modified public/images/api/qiskit/transpiler-17.png
Binary file modified public/images/api/qiskit/transpiler-4.png
Binary file modified public/images/api/qiskit/transpiler-5.png
Binary file modified public/images/api/qiskit/transpiler-6.png
Binary file modified public/images/api/qiskit/transpiler-7.png
Binary file modified public/images/api/qiskit/transpiler-8.png
Binary file modified public/images/api/qiskit/transpiler-9.png
Binary file modified public/images/api/qiskit/visualization-1.png
Binary file modified public/images/api/qiskit/visualization-2.png
Binary file modified public/images/api/qiskit/visualization-3.png
Binary file modified public/images/api/qiskit/visualization-4.png
Binary file modified public/images/api/qiskit/visualization-5.png
Binary file modified public/images/api/qiskit/visualization-6.png
2 changes: 1 addition & 1 deletion scripts/commands/convertApiDocsToHistorical.ts
@@ -140,7 +140,7 @@ async function copyImages(pkgName: string, versionWithoutPatch: string) {
   const imageDirSource = `${getRoot()}/public/images/api/${pkgName}/`;
   const imageDirDest = `${getRoot()}/public/images/api/${pkgName}/${versionWithoutPatch}`;
   await mkdirp(imageDirDest);
-  await $`find ${imageDirSource}/* -maxdepth 0 -type f | xargs -I {} cp -a {} ${imageDirDest}`;
+  await $`find ${imageDirSource}/* -maxdepth 0 -type f | grep -v "release_notes" | xargs -I {} cp -a {} ${imageDirDest}`;
 }

 async function updateLinksFile(
90 changes: 17 additions & 73 deletions scripts/commands/updateApiDocs.ts
@@ -13,14 +13,13 @@
 import { $ } from "zx";
 import { zxMain } from "../lib/zx";
 import { pathExists, getRoot } from "../lib/fs";
-import { readFile, writeFile, readdir } from "fs/promises";
+import { readFile, writeFile } from "fs/promises";
 import { globby } from "globby";
 import { join, parse, relative } from "path";
 import { sphinxHtmlToMarkdown } from "../lib/sphinx/sphinxHtmlToMarkdown";
 import { uniq, uniqBy } from "lodash";
 import { mkdirp } from "mkdirp";
-import { WebCrawler } from "../lib/WebCrawler";
-import { downloadImages } from "../lib/downloadImages";
+import { saveImages } from "../lib/saveImages";
 import { generateToc } from "../lib/sphinx/generateToc";
 import { SphinxToMdResult } from "../lib/sphinx/SphinxToMdResult";
 import { mergeClassMembers } from "../lib/sphinx/mergeClassMembers";
@@ -35,7 +34,6 @@ import { hideBin } from "yargs/helpers";
 import { Pkg, PkgInfo, Link } from "../lib/sharedTypes";
 import transformLinks from "transform-markdown-links";
 import { downloadCIArtifact } from "../lib/downloadArtifacts";
-import { startWebServer, closeWebServer } from "../lib/webServer";
 import {
   findLegacyReleaseNotes,
   addNewReleaseNotes,
@@ -172,7 +170,7 @@ zxMain(async () => {
   if (await pathExists(destination)) {
     console.log(`Skip downloading sources for ${pkg.name}:${pkg.version}`);
   } else {
-    await downloadApiSources(pkg, artifactUrl, destination);
+    await downloadCIArtifact(pkg.name, artifactUrl, destination);
   }

   const baseSourceUrl = `https://github.com/${pkg.githubSlug}/tree/${pkg.versionWithoutPatch}/`;
@@ -191,7 +189,12 @@ zxMain(async () => {
   console.log(
     `Convert sphinx html to markdown for ${pkg.name}:${pkg.versionWithoutPatch}`,
   );
-  await convertHtmlToMarkdown(destination, outputDir, baseSourceUrl, pkg);
+  await convertHtmlToMarkdown(
+    `${destination}/artifact`,
+    outputDir,
+    baseSourceUrl,
+    pkg,
+  );
 });

 /**
@@ -208,44 +211,6 @@ async function rmFilesInFolder(
   await $`find ${dir}/* -maxdepth 0 -type f | xargs rm -f {}`;
 }

-async function saveHtml(options: {
-  baseUrl: string;
-  initialUrl: string;
-  destination: string;
-}): Promise<void> {
-  const { baseUrl, destination, initialUrl } = options;
-  let successCount = 0;
-  let errorCount = 0;
-  const crawler = new WebCrawler({
-    initialUrl: initialUrl,
-    followUrl(url) {
-      return (
-        url.startsWith(`${baseUrl}/apidocs`) ||
-        url.startsWith(`${baseUrl}/apidoc`) ||
-        url.startsWith(`${baseUrl}/stubs`) ||
-        url.startsWith(`${baseUrl}/release_notes`)
-      );
-    },
-    async onSuccess(url: string, content: string) {
-      successCount++;
-      const relativePath = url.substring(`${baseUrl}/`.length);
-      const destinationPath = `${destination}/${relativePath}`;
-      const { dir } = parse(destinationPath);
-      await mkdirp(dir); // TODO track the folders already created
-      await writeFile(destinationPath, content);
-    },
-    async onError(url: string, error: unknown) {
-      errorCount++;
-      console.error(`Error ${url}`, error);
-    },
-  });
-  await crawler.run();
-  console.log(`Download summary from ${baseUrl}`, {
-    success: successCount,
-    error: errorCount,
-  });
-}
-
 async function convertHtmlToMarkdown(
   htmlPath: string,
   markdownPath: string,
@@ -279,6 +244,12 @@ async function convertHtmlToMarkdown(
       releaseNotesTitle: `${pkg.title} ${pkg.versionWithoutPatch} release notes`,
     });

+    // Avoid creating an empty markdown file for HTML files without content
+    // (e.g. HTML redirects)
+    if (result.markdown == "") {
+      continue;
+    }
+
     const { dir, name } = parse(`${markdownPath}/${file}`);
     let url = `/${relative(`${getRoot()}/docs`, dir)}/${name}`;

@@ -368,37 +339,10 @@ async function convertHtmlToMarkdown(
     JSON.stringify(pkg_json, null, 2) + "\n",
   );

-  console.log("Downloading images");
-  await downloadImages(
-    allImages.map((img) => ({
-      src: img.src,
-      dest: `${getRoot()}/public${img.dest}`,
-    })),
-    `${htmlPath}/artifact`,
-  );
+  console.log("Saving images");
+  await saveImages(allImages, `${htmlPath}/_images`, pkg);
 }

 function urlToPath(url: string) {
   return `${getRoot()}/docs${url}.md`;
 }
-
-/**
- * Uses a local web server to download the HTML files from a specific CI artifact
- */
-async function downloadApiSources(
-  pkg: Pkg,
-  artifactUrl: string,
-  destination: string,
-) {
-  await startWebServer(`${destination}/artifact`);
-  try {
-    await downloadCIArtifact(pkg.name, artifactUrl, destination);
-    await saveHtml({
-      baseUrl: pkg.baseUrl,
-      initialUrl: pkg.initialUrl,
-      destination,
-    });
-  } finally {
-    await closeWebServer();
-  }
-}
117 changes: 0 additions & 117 deletions scripts/lib/WebCrawler.ts

This file was deleted.

58 changes: 0 additions & 58 deletions scripts/lib/downloadImages.ts

This file was deleted.
