Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: Include links to source code in documentation #22405

Closed
wants to merge 9 commits into from

Conversation

rubys
Copy link
Member

@rubys rubys commented Aug 19, 2018

first proof of concept

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels Aug 19, 2018
@rubys rubys added the wip Issues and PRs that are still a work in progress. label Aug 19, 2018
@rubys
Copy link
Member Author

rubys commented Aug 19, 2018

This is a first "proof of concept" step towards implementing #22393. It scans lib/console.js and produces JSON output identifying where each of the exported names are defined.

The basic idea is that out/doc/api/console.html could depend on this JSON, which in turn could depend on lib/console.js, so every time the source file changes, the documentation would be updated. Presumably tools/doc/html.js would read the JSON and match it against headers, and when it found a match, would insert a hyptertext link.

A few things to note from just this first step:

  • It is entirely OK for this process to extract more than is publicly documented. Example: [kGetInspectOptions].
  • At least initially, it should be OK for this process to not find something that is publicly documented. Examples: Inspector only methods, tools/doc/html.js simply wouldn't add a source link in such cases. Over time as we approach 100% we can revisit this.
  • It is impossible to predict at build time what the commit hash that has yet to be committed will be. This argues for the Rust approach of syntax highlighting and publishing the source along with the documentation, at least during development time. Perhaps it might make sense to switch this at release publishing time.
    .

@richardlau
Copy link
Member

It is impossible to predict at build time what the commit hash that has yet to be committed will be.

It should be possible for releases (docs are built at the same time). Probably for nightlies too.

@refack
Copy link
Contributor

refack commented Aug 19, 2018

It is impossible to predict at build time what the commit hash that has yet to be committed will be.

It should be possible for releases (docs are built at the same time). Probably for nightlies too.

Indeed, the commit hash is known and fixed by the time the docs are rendered. Since what we store in git are doc "sources", and only render them as part of the release process. They are even tagged so we can construct a URI apriori (https://github.com/nodejs/node/blob/v10.9.0/lib/console.js#L32).
So it's just a matter of passing it as argument. They are even

@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

It certainly can be made to work for releases and nightlies. What should be done for builds done by developers on their own machines?

@refack
Copy link
Contributor

refack commented Aug 20, 2018

git rev-parse HEAD. But as I mentioned sometime ago, that's a less interesting edge case (people rendering the docs locally for self consumption).

reminder to self. find a way to put that into the binary as well

@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

people rendering the docs locally for self consumption

Note that this use case would include people who are verifying their changes before pushing them.

@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

@rubys rubys removed the wip Issues and PRs that are still a work in progress. label Aug 20, 2018
@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

I think it would be worthwhile for this code to land at this point in order to (1) debug the build processes, (2) get wider review of the appearance and usability of the output produced, (3) get feedback on the developer experience.

Known issues:

  1. Identifying exported definitions and matching those definitions to the documentaiton will require changes to one or more of:

    1. tools/doc/apilinks.js parsing logic
    2. api definition sources in lib/*.js
    3. markdown sources in doc/api/*.md
      A concrete example: if you look at https://nodejs.org/api/buffer.html, you will
      see that instance methods are prefixed by buf.. Should this be changed to
      bufer? Should the parser just know that buffer is special?
  2. Local builds will, of course, pick up uncommitted changes. The github links
    added to the generated documentation will point to the the last committed
    changes, which may mean that line numbers are off, and may even represent a
    commit that hasn't been pushed.

  3. Technically, changing any library source will make the documentation
    stale, but I feel that rebuilding the full set of documentation after every
    source change is a big suboptimal, so I made this an order only
    prerequisite
    .

  4. I haven't added this to the windows build just yet. It wouldn't be hard to do, just hasn't been done yet.

Most of all, landing this would enable mutliple people to contribute to completing this effort.

@rubys
Copy link
Member Author

rubys commented Aug 20, 2018

Test failure is unrelated:

16:02:24 not ok 1940 parallel/test-trace-events-fs-sync
16:02:24   ---
16:02:24   duration_ms: 2.24
16:02:24   severity: fail
16:02:24   exitcode: 1
16:02:24   stack: |-
16:02:24     assert.js:84
16:02:24       throw new AssertionError(obj);
16:02:24       ^
16:02:24     
16:02:24     AssertionError [ERR_ASSERTION]: fs.sync.lchown:
16:02:24     { status: null,
16:02:24       signal: 'SIGSEGV',
16:02:24       output: [ null, '', '' ],
16:02:24       pid: 56249,
16:02:24       stdout: '',
16:02:24       stderr: '' }
16:02:24         at Object.<anonymous> (/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/parallel/test-trace-events-fs-sync.js:139:10)
16:02:24         at Module._compile (internal/modules/cjs/loader.js:689:30)
16:02:24         at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
16:02:24         at Module.load (internal/modules/cjs/loader.js:599:32)
16:02:24         at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
16:02:24         at Function.Module._load (internal/modules/cjs/loader.js:530:3)
16:02:24         at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
16:02:24         at startup (internal/bootstrap/node.js:257:19)
16:02:24         at bootstrapNodeJSCore (internal/bootstrap/node.js:634:3)
16:02:24   ...

@rubys
Copy link
Member Author

rubys commented Aug 21, 2018

Flaky test: #21038

@rubys rubys requested a review from vsemozhetbyt August 21, 2018 20:26
@refack
Copy link
Contributor

refack commented Aug 21, 2018

@rubys
Copy link
Member Author

rubys commented Aug 21, 2018

Demonstration available

@refack Thanks! Thoughts?

@refack
Copy link
Contributor

refack commented Aug 22, 2018

  1. My instinct is to go ith "convention over configuration, that is have the docs as homogenous as possible (for example use ClassName.toLower() for instance name that is the prefix for methods), so that we can write a small set to inferals.
    When that fails, we can put markup in the .md files and/or the .js files. This will require apilinks.js to produce a list of "misses".
  2. Not much we can do about that without some git contortionism (try to find the line in the upstream/HEAD revision).
  3. Seems perfectly file to me.
  4. Personally I don't see that as a blocker. It might get resolved by a change I'm planning for the Makefile...

/CC @nodejs/website @nodejs/website-redesign let's get some feedback from the website folk.

const repo = (child_process.execSync('git remote get-url origin').toString()
.match(/(\w+\/\w+)\.git\r?\n?$/) || ['', 'nodejs/node'])[1];
let hash = child_process.execSync('git log -1 --pretty=%H').toString().trim();
const repo = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should get this from the environment with a default of https://github.com/nodejs/node/blob/, similar to

node/Makefile

Lines 606 to 609 in 36468ca

# Google Analytics ID used for tracking API docs page views, empty
# DOCS_ANALYTICS means no tracking scripts will be included in the
# generated .html files
DOCS_ANALYTICS ?=

For example on my PC I don't have an origin remote, I have upstream instead. On the CI use use the ssh protocol so: remote.origin.url=git@github.com:nodejs/node.git

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with an environment variable, and I'm fine with a default to nodejs/node when all else fails, and I'm fine with looking at upstream if origin is not found... that being said, I continue to assert that enabling a developer to validate their own work before creating a pull request is an important use case.

It looks like git rev-parse --abbrev-ref --symbolic-full-name @{u} will show the correct remote associated with the current branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that being said, I continue to assert that enabling a developer to validate their own work before creating a pull request is an important use case.

Sure, but I'm willing to assume the most common case, then allow for the developer to manually override it. Case in point our form to start a CI job. It has defaults, and allows the submitter to override them. IMHO trying to deduce configuration on end user's machine is just a world of hurt.

It looks like git rev-parse --abbrev-ref --symbolic-full-name @{u} will show the correct remote associated with the current branch.

The trick is that we need to read files in this non checked out ref. Maybe we can stream them out, and only those who are diff from the current HEAD.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain world of hurt?

I found an alternate set of github commands that does the job, perhaps more clearly:

LOCAL_BRANCH=`git name-rev --name-only HEAD`
TRACKING_REMOTE=`git config branch.$LOCAL_BRANCH.remote`
REMOTE_URL=`git config remote.$TRACKING_REMOTE.url`

This (plus fallback to nodejs/node when any of the above commands fail) should work for the common case (CI, release). It should also work when a developer is working on branch in their own private fork.

Why the insistence on inconveniencing the developer?

@Trott
Copy link
Member

Trott commented Aug 22, 2018

@refack
Copy link
Contributor

refack commented Aug 22, 2018

Please explain world of hurt?

an endless wormhole of edge cases. For example as I mentioned in our CI (which also does the release) git config -l is:

remote.origin.url=git@github.com:nodejs/node.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
user.name=Dummy
user.email=dummy@dummy.com
branch.master.remote=origin
branch.master.merge=refs/heads/master

so the only url is git@github.com:nodejs/node.git which still requires mangling.
On that note, simply parsing git remote -v can get us close enough with just one fork:

> git remote -v
origin  git@github.com:nodejs/node.git (fetch)
origin  git@github.com:nodejs/node.git (push)

or

refack  https://github.com/refack/node.git (fetch)
refack  https://github.com/refack/node.git (push)
upstream        https://github.com/nodejs/node.git (fetch)
upstream        https://github.com/nodejs/node.git (push)

On top of that we want to maintain a tarball build that work on code not in a git workspace.


Why the insistence on inconveniencing the developer?

I don't want to inconvenience the developer. I just don't want to over optimize this and end up with a bunch of unmaintainable code.

Thinking about this some more IMHO the main thing a dev would want to verify is that the docs detected the correct file/line-number, so maybe having an intermediary step that also generate an api_local.json with the local FS paths and line numbers of what we actually find might be a very useful debug tool.
That or generate apilinks.json with file:// URIs (although that won't jump to the selected line, we could generate a bunch of local markedup code just for that).
Everything else seems like a too rough approximation, with too many assumptions.

@rubys
Copy link
Member Author

rubys commented Aug 22, 2018

so the only url is git@github.com:nodejs/node.git

The current code contains the following line:

 .match(/(\w+\/\w+)\.git\r?\n?$/) || ['', 'nodejs/node'])[1];

By looking only at the end of the URL, this code effortlessly handles both git and https urls. And defaults to nodejs/node if no match is found. I still don't see the problem. The entire script runs in sub-second time, and requires no network accesses, so the number of forks should not be an issue.

Net: for all the cases you have described, the code (particularly when I apply the changes mentioned above) will "just work". And if anything goes wrong, it falls back to nodejs/node which is exactly what you are advocating.

@refack
Copy link
Contributor

refack commented Aug 22, 2018

I'm just voicing my 2c. I won't block this in any way, as I tend to yield to the author on personal style or cost/benefit decisions.

How about generating an api_local.json file for match debugging? (Not instead of, but in addition to). If we print those to stdout (as file:lineno) my ide will actually generate links for those.

@rubys
Copy link
Member Author

rubys commented Aug 22, 2018

I won't block this in any way

Cool, thanks! At the moment, it is blocked by lack of reviews. The documentation team seems to be a bit inactive. I'll wait a day or so, and then start asking for RSLGTM reviews.

How about generating an api_local.json file for match debugging?

Can you point me at the documentation for such? It sounds like a great idea, as the data sources are "dirty" and the more users we can find for the the filtered data results, the more likely that data will be maintained and accurate.

@vsemozhetbyt
Copy link
Contributor

I will try to review at weekend but do not wait for me if the PR will have enough approvals. The most part of this PR is out of my sure competence anyway)


// parse source
const source = fs.readFileSync(file, 'utf8');
const ast = acorn.parse(source, { ecmaVersion: 10 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding locations: true will annotate the nodes with line & column location. So no need for lineends.
e.g:

> const acorn = require('internal/deps/acorn/dist/acorn');
> const source = fs.readFileSync(file, 'utf8');
> const ast = acorn.parse('lib/buffer.js', { ecmaVersion: 10, locations: true })
> console.log(ast.body[2].loc)
SourceLocation {
  start: Position { line: 42, column: 0 },
  end: Position { line: 42, column: 21 } }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or const acorn = require('../../deps/acorn');

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That gets me:

SyntaxError: Unexpected token (127:12)
    at Parser.pp$4.raise (/Users/rubys/git/node/deps/acorn/dist/acorn.js:2649:13)
    at Parser.pp.unexpected (/Users/rubys/git/node/deps/acorn/dist/acorn.js:637:8)
    at Parser.pp.expect (/Users/rubys/git/node/deps/acorn/dist/acorn.js:631:26)
    at Parser.pp$1.parseTryStatement (/Users/rubys/git/node/deps/acorn/dist/acorn.js:983:10)
    at Parser.pp$1.parseStatement (/Users/rubys/git/node/deps/acorn/dist/acorn.js:768:32)
    at Parser.pp$1.parseBlock (/Users/rubys/git/node/deps/acorn/dist/acorn.js:1077:23)
    at Parser.pp$1.parseStatement (/Users/rubys/git/node/deps/acorn/dist/acorn.js:775:34)
    at Parser.pp$1.parseIfStatement (/Users/rubys/git/node/deps/acorn/dist/acorn.js:902:26)
    at Parser.pp$1.parseStatement (/Users/rubys/git/node/deps/acorn/dist/acorn.js:764:31)
    at Parser.pp$1.parseBlock (/Users/rubys/git/node/deps/acorn/dist/acorn.js:1077:23)
make[1]: *** [out/apilinks.json] Error 1

This is while parsing lib/assert.js

dep/acorn is at version 5.2.1, and needs to be at 5.6.0 or later to support this relatively new JavaScript feature.

How did you get past this?

targos pushed a commit that referenced this pull request Sep 3, 2018
Imported from the tarball published on npm
(https://registry.npmjs.org/acorn/-/acorn-5.7.2.tgz).

Update to emcaScript version 10 in order to get support for
binding-less catch statements.

Also needed to parse node.js lib API in #22405.

PR-URL: #22488
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: John-David Dalton <john.david.dalton@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Bryan English <bryan@bryanenglish.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
targos pushed a commit that referenced this pull request Sep 3, 2018
Parse source code using acorn; extracting exports.  When producing
documentation, match exports to headers.  When a match is found, add a [src]
link.

This first commit handles simple exported classes and functions, and does so
without requiring any changes to the source code or markdown.  Subsequent
commits will attempt to match more headers, and some of these changes are
likely to require changes to the source code and/or markdown.

PR-URL: #22405
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
targos pushed a commit that referenced this pull request Sep 6, 2018
Imported from the tarball published on npm
(https://registry.npmjs.org/acorn/-/acorn-5.7.2.tgz).

Update to emcaScript version 10 in order to get support for
binding-less catch statements.

Also needed to parse node.js lib API in #22405.

PR-URL: #22488
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: John-David Dalton <john.david.dalton@gmail.com>
Reviewed-By: Tiancheng "Timothy" Gu <timothygu99@gmail.com>
Reviewed-By: Bryan English <bryan@bryanenglish.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
targos pushed a commit that referenced this pull request Sep 6, 2018
Parse source code using acorn; extracting exports.  When producing
documentation, match exports to headers.  When a match is found, add a [src]
link.

This first commit handles simple exported classes and functions, and does so
without requiring any changes to the source code or markdown.  Subsequent
commits will attempt to match more headers, and some of these changes are
likely to require changes to the source code and/or markdown.

PR-URL: #22405
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
MylesBorins added a commit to MylesBorins/build that referenced this pull request Jan 29, 2019
rvagg pushed a commit to nodejs/build that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants