Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: readdir optionally returning type information #22020

Closed
wants to merge 10 commits into from

Conversation

bengl
Copy link
Member

@bengl bengl commented Jul 29, 2018

readdir and readdirSync now have a "withFileTypes" option, which, when enabled,
provides an array of DirectoryEntry objects, similar to Stats bjects,
which have the filename and the type information.

Ref: #15699

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Jul 29, 2018
@bengl bengl added the fs Issues and PRs related to the fs subsystem / file system. label Jul 29, 2018
src/node_file.cc Outdated
name_idx = 0;
}

name_argv[name_idx++] = Integer::New(env->isolate(), ent.type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know at one time V8 would optimize the case for arrays where each element was of the same type. Perhaps this might perform better if we had a separate array for the type values? That would also allow us to avoid having to i \ 2 inside the JS loop, which is slower than just referencing i.

doc/api/fs.md Outdated
@@ -2304,9 +2385,10 @@ changes:
* `path` {string|Buffer|URL}
* `options` {string|Object}
* `encoding` {string} **Default:** `'utf8'`
* `withTypes` {boolean} **Default:** `false`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Expand this to withFileTypes, since types can refer to a bunch of different things in a programming context

doc/api/fs.md Outdated
@@ -283,6 +283,87 @@ synchronous use libuv's threadpool, which can have surprising and negative
performance implications for some applications. See the
[`UV_THREADPOOL_SIZE`][] documentation for more information.

## Class: fs.DirectoryEntry

This comment was marked as resolved.

src/node_file.cc Outdated
FSReqBase* req_wrap = FSReqBase::from_req(req);
FSReqAfterScope after(req_wrap, req);

if (after.Proceed()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe do the reverse condition and return, to save one level of indentation for the rest of the function?

src/node_file.cc Outdated

if (name_idx >= arraysize(name_argv)) {
fn->Call(env->context(), names, name_idx, name_argv)
.ToLocalChecked();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is what we do for existing code, but do you mind moving to proper error-checking here? (i.e. return if the Call result is empty like you do below)

doc/api/fs.md Outdated
option set to `true`, the resulting array is filled with `fs.DirectoryEntry`
objects, rather than strings or `Buffers`.

### dirent.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Missing YAML?
  2. It seems this property should be the last one, ABC-wise.

doc/api/fs.md Outdated

* Returns: {boolean}

Returns `true` if the `fs.DirectoryEntry` object describes a file system directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exceeds 80 characters length.

doc/api/fs.md Outdated

* Returns: {boolean}

Returns `true` if the `fs.DirectoryEntry` object describes a first-in-first-out (FIFO)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exceeds 80 characters length.

@bengl
Copy link
Member Author

bengl commented Jul 30, 2018

@addaleax @vsemozhetbyt I added a squash commit which I think addresses your comments. PTAL.

@mscdex What would be better as a return value: an array of arrays like [[name1, name2, ...], [type1, type2, ...]] or an object containing both arrays? Or an array of { name, type } objects?

@Trott Trott added the semver-minor PRs that contain new features and should be released in the next minor version. label Jul 30, 2018
@bengl
Copy link
Member Author

bengl commented Jul 30, 2018

@Trott
Copy link
Member

Trott commented Jul 30, 2018

@nodejs/fs

@Trott
Copy link
Member

Trott commented Jul 30, 2018

Since documentation additions are significant: @nodejs/documentation

@devsnek
Copy link
Member

devsnek commented Jul 30, 2018

why don't we just add scandir and scandirSync so we don't have to deal with polymorphic return types and weird options

also +1 to { name, type }, the whole stats object thing is terrible

@mscdex
Copy link
Contributor

mscdex commented Jul 30, 2018

What would be better as a return value: an array of arrays like [[name1, name2, ...], [type1, type2, ...]] or an object containing both arrays? Or an array of { name, type } objects?

I would guess it's still faster to create objects in JS land, but what I was suggesting is to pass back two separate arrays to req.oncomplete(). You could even reuse one of the arrays for the return value:

req.oncomplete = (err, names, fileTypes) => {
  if (err) {
    callback(err);
    return;
  }
  const len = names.length;
  for (var i = 0; i < len; ++i)
    names[i] = new DirectoryEntry(names[i], fileTypes[i]);
  callback(null, names);
};

I'm not sure what V8 does optimization-wise when reusing the array like that, if it re-optimizes after the loop finishes or what, but that's the general idea I had in mind.

@ronkorving
Copy link
Contributor

ronkorving commented Jul 30, 2018

I'm definitely with @devsnek on making these separate API. Having one API return completely different things based on options is .. a bit too much. Different input format causing a different output format... At that point, I think it has just become a different function. Also, the name scandir as @devsnek suggests feels very natural to me.

@bengl
Copy link
Member Author

bengl commented Jul 30, 2018

@mscdex Right, that would make sense for the async case, but for the sync case, since a single value needs to be returned from the binding function, something holding both arrays is needed.

@devsnek

why don't we just add scandir and scandirSync so we don't have to deal with polymorphic return types and weird options

I had originally planned on doing it that way (as suggested in the linked issue), but most functions on fs, including readdir/readdirSync define themselves in terms of UNIX man pages. In there, scandir is defined as "scan a directory for matching entries", wheras what's happening here is far closer to readdir. I'd rather reserve scandir for when something that closer matches it is implemented. (Note: I'm not necessarily a big fan of having such a POSIX-centric fs module, but it's what we have, and my intention was to not stray too far from it.)

also +1 to { name, type }, the whole stats object thing is terrible

I did the Stats-esque DirectoryEntry object deliberately to be consistent with the stat functions, since any use case of this new feature was previously worked around by calling stat on each entry returned by readdir/readdirSync. Also, the type is a number, and I didn't think it was best to either convert it on the fly to something readable or force the comparison onto the user.

All that being said: If there's consensus around scandir/scandirSync, bare-object-dirents, or any other significant changes to the API I've presented here, I'm happy to make those changes.

@mscdex
Copy link
Contributor

mscdex commented Jul 30, 2018

but for the sync case, since a single value needs to be returned from the binding function, something holding both arrays is needed.

A 2-D array would probably work for that case.

Perhaps an even better solution that would work for both scenarios would be to create the zero-length arrays in JS land and pass them (via the context object) to C++ land and fill them in there. Then do the same process as I showed in the code snippet earlier.

@bengl
Copy link
Member Author

bengl commented Jul 30, 2018

@mscdex Since the async version seems to be all set up to handle both callbacks and promises in C++, it seemed easier to just have a 2-D array in all cases, which I've done in the latest squash commit.


And another CI run...

CI: https://ci.nodejs.org/job/node-test-pull-request/16073/

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just some nits that do not block the PR. I only roughly looked at the C++ part.
I have no strong opinion about it being a new API or not.

if (testMethod === method) {
assert.strictEqual(dirent[testMethod](), true);
} else {
assert.strictEqual(dirent[testMethod](), false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: assert.strictEqual(dirent[testMethod](), testMethod === method);

const len = names.length;
for (var i = 0; i < len; i++) {
names[i] = new DirectoryEntry(names[i], types[i]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this part into fs/utils.js? It is done three times and could use the abstraction.

lib/fs.js Outdated
return result;
if (!options.withFileTypes) {
return result;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super tiny nit: the else could be removed to reduce indentation. The same in other places.

@bengl
Copy link
Member Author

bengl commented Jul 30, 2018

I'm not sure what the AIX and SmartOS errors are all about. It seems like isFile isn't working on those systems? That's pretty weird. Maybe the values for the types are not in the set defined by libuv?

@bengl
Copy link
Member Author

bengl commented Jul 31, 2018

Here's another CI run, hopefully fixing AIX and SmartOS...

CI: https://ci.nodejs.org/job/node-test-pull-request/16088/

@bengl
Copy link
Member Author

bengl commented Jul 31, 2018

One more CI, hopefully passing this time...

CI: https://ci.nodejs.org/job/node-test-pull-request/16089/

@Trott
Copy link
Member

Trott commented Aug 1, 2018

@bengl
Copy link
Member Author

bengl commented Aug 1, 2018

Since this introduces a binding-layer breaking change....

CITGM: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/1480/

@bengl
Copy link
Member Author

bengl commented Aug 1, 2018

Rebased to deal with FSReqWrap -> FSReqCallback from @maclover7.

New CI: https://ci.nodejs.org/job/node-test-pull-request/16131/

@bengl
Copy link
Member Author

bengl commented Aug 1, 2018

@targos targos added backport-requested-v10.x and removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. backport-requested-v10.x labels Aug 19, 2018
targos pushed a commit that referenced this pull request Aug 19, 2018
readdir and readdirSync now have a "withFileTypes" option, which, when
enabled, provides an array of DirectoryEntry objects, similar to Stats
objects, which have the filename and the type information.

Refs: #15699

PR-URL: #22020
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: John-David Dalton <john.david.dalton@gmail.com>
targos pushed a commit that referenced this pull request Sep 3, 2018
readdir and readdirSync now have a "withFileTypes" option, which, when
enabled, provides an array of DirectoryEntry objects, similar to Stats
objects, which have the filename and the type information.

Refs: #15699

PR-URL: #22020
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Roman Reiss <me@silverwind.io>
Reviewed-By: John-David Dalton <john.david.dalton@gmail.com>
targos added a commit that referenced this pull request Sep 5, 2018
Notable changes:

* child_process:
  * `TypedArray` and `DataView` values are now accepted as input by
    `execFileSync` and `spawnSync`. #22409
* coverage:
  * Native V8 code coverage information can now be output to disk by setting the
    environment variable `NODE_V8_COVERAGE` to a directory. #22527
* deps:
  * The bundled npm was upgraded to version 6.4.1. #22591
    * Changelogs:
      [6.3.0-next.0](https://github.com/npm/cli/releases/tag/v6.3.0-next.0)
      [6.3.0](https://github.com/npm/cli/releases/tag/v6.3.0)
      [6.4.0](https://github.com/npm/cli/releases/tag/v6.4.0)
      [6.4.1](https://github.com/npm/cli/releases/tag/v6.4.1)
* fs:
  * The methods `fs.read`, `fs.readSync`, `fs.write`, `fs.writeSync`,
    `fs.writeFile` and `fs.writeFileSync` now all accept `TypedArray` and
    `DataView` objects. #22150
  * A new boolean option, `withFileTypes`, can be passed to to `fs.readdir` and
    `fs.readdirSync`. If set to true, the methods return an array of directory
    entries. These are objects that can be used to determine the type of each
    entry and filter them based on that without calling `fs.stat`. #22020
* http2:
  * The `http2` module is no longer experimental. #22466
* os:
  * Added two new methods: `os.getPriority` and `os.setPriority`, allowing to
    manipulate the scheduling priority of processes. #22394
* process:
  * Added `process.allowedNodeEnvironmentFlags`. This object can be used to
    programmatically validate and list flags that are allowed in the
    `NODE_OPTIONS` environment variable. #19335
* src:
  * Deprecated option variables in public C++ API. #22392
  * Refactored options parsing. #22392
* vm:
  * Added `vm.compileFunction`, a method to create new JavaScript functions from
    a source body, with options similar to those of the other `vm` methods. #21571
* Added new collaborators:
  * [lundibundi](https://github.com/lundibundi) - Denys Otrishko

PR-URL: #22716
targos added a commit that referenced this pull request Sep 6, 2018
Notable changes:

* child_process:
  * `TypedArray` and `DataView` values are now accepted as input by
    `execFileSync` and `spawnSync`. #22409
* coverage:
  * Native V8 code coverage information can now be output to disk by setting the
    environment variable `NODE_V8_COVERAGE` to a directory. #22527
* deps:
  * The bundled npm was upgraded to version 6.4.1. #22591
    * Changelogs:
      [6.3.0-next.0](https://github.com/npm/cli/releases/tag/v6.3.0-next.0)
      [6.3.0](https://github.com/npm/cli/releases/tag/v6.3.0)
      [6.4.0](https://github.com/npm/cli/releases/tag/v6.4.0)
      [6.4.1](https://github.com/npm/cli/releases/tag/v6.4.1)
* fs:
  * The methods `fs.read`, `fs.readSync`, `fs.write`, `fs.writeSync`,
    `fs.writeFile` and `fs.writeFileSync` now all accept `TypedArray` and
    `DataView` objects. #22150
  * A new boolean option, `withFileTypes`, can be passed to to `fs.readdir` and
    `fs.readdirSync`. If set to true, the methods return an array of directory
    entries. These are objects that can be used to determine the type of each
    entry and filter them based on that without calling `fs.stat`. #22020
* http2:
  * The `http2` module is no longer experimental. #22466
* os:
  * Added two new methods: `os.getPriority` and `os.setPriority`, allowing to
    manipulate the scheduling priority of processes. #22407
* process:
  * Added `process.allowedNodeEnvironmentFlags`. This object can be used to
    programmatically validate and list flags that are allowed in the
    `NODE_OPTIONS` environment variable. #19335
* src:
  * Deprecated option variables in public C++ API. #22515
  * Refactored options parsing. #22392
* vm:
  * Added `vm.compileFunction`, a method to create new JavaScript functions from
    a source body, with options similar to those of the other `vm` methods. #21571
* Added new collaborators:
  * [lundibundi](https://github.com/lundibundi) - Denys Otrishko

PR-URL: #22716
targos added a commit that referenced this pull request Sep 6, 2018
Notable changes:

* child_process:
  * `TypedArray` and `DataView` values are now accepted as input by
    `execFileSync` and `spawnSync`. #22409
* coverage:
  * Native V8 code coverage information can now be output to disk by setting the
    environment variable `NODE_V8_COVERAGE` to a directory. #22527
* deps:
  * The bundled npm was upgraded to version 6.4.1. #22591
    * Changelogs:
      [6.3.0-next.0](https://github.com/npm/cli/releases/tag/v6.3.0-next.0)
      [6.3.0](https://github.com/npm/cli/releases/tag/v6.3.0)
      [6.4.0](https://github.com/npm/cli/releases/tag/v6.4.0)
      [6.4.1](https://github.com/npm/cli/releases/tag/v6.4.1)
* fs:
  * The methods `fs.read`, `fs.readSync`, `fs.write`, `fs.writeSync`,
    `fs.writeFile` and `fs.writeFileSync` now all accept `TypedArray` and
    `DataView` objects. #22150
  * A new boolean option, `withFileTypes`, can be passed to to `fs.readdir` and
    `fs.readdirSync`. If set to true, the methods return an array of directory
    entries. These are objects that can be used to determine the type of each
    entry and filter them based on that without calling `fs.stat`. #22020
* http2:
  * The `http2` module is no longer experimental. #22466
* os:
  * Added two new methods: `os.getPriority` and `os.setPriority`, allowing to
    manipulate the scheduling priority of processes. #22407
* process:
  * Added `process.allowedNodeEnvironmentFlags`. This object can be used to
    programmatically validate and list flags that are allowed in the
    `NODE_OPTIONS` environment variable. #19335
* src:
  * Deprecated option variables in public C++ API. #22515
  * Refactored options parsing. #22392
* vm:
  * Added `vm.compileFunction`, a method to create new JavaScript functions from
    a source body, with options similar to those of the other `vm` methods. #21571
* Added new collaborators:
  * [lundibundi](https://github.com/lundibundi) - Denys Otrishko

PR-URL: #22716
targos added a commit that referenced this pull request Sep 6, 2018
Notable changes:

* child_process:
  * `TypedArray` and `DataView` values are now accepted as input by
    `execFileSync` and `spawnSync`. #22409
* coverage:
  * Native V8 code coverage information can now be output to disk by setting the
    environment variable `NODE_V8_COVERAGE` to a directory. #22527
* deps:
  * The bundled npm was upgraded to version 6.4.1. #22591
    * Changelogs:
      [6.3.0-next.0](https://github.com/npm/cli/releases/tag/v6.3.0-next.0)
      [6.3.0](https://github.com/npm/cli/releases/tag/v6.3.0)
      [6.4.0](https://github.com/npm/cli/releases/tag/v6.4.0)
      [6.4.1](https://github.com/npm/cli/releases/tag/v6.4.1)
* fs:
  * The methods `fs.read`, `fs.readSync`, `fs.write`, `fs.writeSync`,
    `fs.writeFile` and `fs.writeFileSync` now all accept `TypedArray` and
    `DataView` objects. #22150
  * A new boolean option, `withFileTypes`, can be passed to to `fs.readdir` and
    `fs.readdirSync`. If set to true, the methods return an array of directory
    entries. These are objects that can be used to determine the type of each
    entry and filter them based on that without calling `fs.stat`. #22020
* http2:
  * The `http2` module is no longer experimental. #22466
* os:
  * Added two new methods: `os.getPriority` and `os.setPriority`, allowing to
    manipulate the scheduling priority of processes. #22407
* process:
  * Added `process.allowedNodeEnvironmentFlags`. This object can be used to
    programmatically validate and list flags that are allowed in the
    `NODE_OPTIONS` environment variable. #19335
* src:
  * Deprecated option variables in public C++ API. #22515
  * Refactored options parsing. #22392
* vm:
  * Added `vm.compileFunction`, a method to create new JavaScript functions from
    a source body, with options similar to those of the other `vm` methods. #21571
* Added new collaborators:
  * [lundibundi](https://github.com/lundibundi) - Denys Otrishko

PR-URL: #22716
@silverwind
Copy link
Contributor

If possible, I'd like to see this backported to 8.x.

const type = types[i];
if (type === UV_DIRENT_UNKNOWN) {
const name = names[i];
const stats = lazyLoadFs().statSync(pathModule.resolve(path, name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that was asked before (I didn't read through all comments): why does this use resolve instead of join? At this point name is not expected to be an absolute path.

const type = types[i];
if (type === UV_DIRENT_UNKNOWN) {
const name = names[i];
const stats = lazyLoadFs().statSync(pathModule.resolve(path, name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this use statSync instead of lstatSync? This makes it rather inconsistent because in one case a Dirent may be reported as symbolic link and in case readdir gives type === UV_DIRENT_UNKNOWN symlinks are resolved and the type of the symlinked file is used instead.

andersk added a commit to andersk/TypeScript that referenced this pull request Nov 22, 2019
This makes walking large directory trees much more efficient on Node
10.10 or later.

See:
https://lwn.net/Articles/606995/
https://www.python.org/dev/peps/pep-0471/
nodejs/node#22020
https://nodejs.org/en/blog/release/v10.10.0/

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
andersk added a commit to andersk/TypeScript that referenced this pull request Jan 13, 2020
This makes walking large directory trees much more efficient on Node
10.10 or later.

See:
https://lwn.net/Articles/606995/
https://www.python.org/dev/peps/pep-0471/
nodejs/node#22020
https://nodejs.org/en/blog/release/v10.10.0/

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
amcasey pushed a commit to microsoft/TypeScript that referenced this pull request Jan 15, 2020
Kingwl pushed a commit to Kingwl/TypeScript that referenced this pull request Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. lib / src Issues and PRs related to general changes in the lib or src directory. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.