A proposal to add fs.scandir method to FS module #15699
Comments
I think if we're going to introduce |
For context: libuv/libuv#416 - stalled, and itself a continuation of an older, also stalled PR.
Speaking from the wild (20+ years of syseng experience): large directories are a recurring ops problem. I use it as an interview question because I have seen it multiple times, as recently as 2013. Even at companies like Disney and Amazon, there are developers who see no problem with using a directory as a flat key-value store, or with creating empty files and never removing them. Eventually an operator like me has to deal with hundreds of millions of files in a single directory. In the past I've used Perl for this, but I fell out of love with Perl years ago. The usual tools are often useless here precisely because (as I found out with strace) they stat each entry. While stat itself isn't very expensive, it drives the memory and CPU cost of a scan higher than it needs to be when the file name alone carries enough information to make decisions. (While writing this, though, I discovered that `ls -1` uses mmap and does not perform any stat()s.) If/when y'all decide to tackle this issue, I would ask that you also offer `fs.*dir*` so that I can do precisely what I want to:

```coffeescript
fs = require 'fs'
path = require 'path'

# Delete every non-directory entry in problemDir whose name starts with prefix,
# reading one directory entry at a time instead of loading the whole listing.
# fs.opendir / dir.read() is the per-entry API that later shipped in Node.js 12.12.
cleanUp = (problemDir, prefix) ->
  new Promise (resolve, reject) ->
    fs.opendir problemDir, (err, handle) ->
      return reject err if err
      do next = ->
        handle.read (err, entry) ->
          switch
            when err then reject err
            when not entry
              # null entry: directory exhausted, so release the handle
              handle.close (err) ->
                if err then reject err else resolve()
            when entry.isDirectory() then next()
            when (fileName = entry.name).startsWith prefix
              fullPath = path.resolve problemDir, fileName
              fs.unlink fullPath, (err) ->
                if err then reject err else next()
            else next()

cleanUp 'incoming', 'system.system'
  .then -> console.log 'Completed'
  .catch (err) -> console.log err
```
That said, this kind of feature probably has a small audience and third-party |
Another not mutually exclusive option is to add a new type of stream that emits |
Not essential, but it would be nice to have an option for a recursive scan. I guess it wouldn’t follow symlinks, though, to avoid getting caught in a loop.
This would be great to have. I made a very simple parallel find implementation to see whether I could use Node's async nature to keep a higher queue depth. I'm not sure, but I really think the lack of access to dirents (specifically, the inability to see that an entry is a directory) is what keeps Node from trouncing find. I went looking into libuv and found that it does expose the dirent type, etc.; it's just Node's readdir that is limiting. Then I found this thread. Beyond that, access to things like inodes, extents, etc. would open up Node's viability for a wide array of system tools. I'd expect 95% of use cases to be recursing directory trees, but I wanted to call out that scandir offers a lot of other helpful systems functionality. Proposal: add an option to fs.readdir to get full directory entries. Scandir is useful for recursing, but there are still genuinely useful things that can be done with readdir(3) that libuv permits but Node doesn't. I'd love to see dirent results available from Node's readdir!
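A minimal TypeScript sketch of the recursive scan discussed in the two comments above, assuming a Node.js version where `fs.promises.readdir` supports the `withFileTypes` option (the feature that landed in #22020). The function name `walk` is hypothetical. Because each `Dirent` already carries the entry type, directories are found without a per-entry `stat`, and symlinks are never followed: `Dirent#isDirectory()` is false for a symlink even when it points at a directory.

```typescript
import { promises as fsp } from "fs";
import * as path from "path";

// Recursively collect the paths of all non-directory entries under `root`.
// Sibling subdirectories are scanned concurrently via Promise.all, which keeps
// several readdir requests in flight at once (the "queue depth" idea above).
async function walk(root: string): Promise<string[]> {
  // withFileTypes: true yields fs.Dirent objects (name + type) instead of strings.
  const entries = await fsp.readdir(root, { withFileTypes: true });
  const nested = await Promise.all(
    entries.map(async (entry): Promise<string[]> => {
      const full = path.join(root, entry.name);
      // Symlinks report isDirectory() === false, so they are listed, not followed.
      return entry.isDirectory() ? walk(full) : [full];
    }),
  );
  return ([] as string[]).concat(...nested);
}
```

The concurrency here is unbounded, so on very large trees it may run into `EMFILE`; a real tool would probably cap the number of in-flight `readdir` calls.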
readdir and readdirSync now have a "withFileTypes" option, which, when enabled, provides an array of DirectoryEntry objects, similar to Stats objects, which have the filename and the type information. Ref: nodejs#15699
readdir and readdirSync now have a "withFileTypes" option, which, when enabled, provides an array of DirectoryEntry objects, similar to Stats objects, which have the filename and the type information. Refs: #15699 PR-URL: #22020 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Roman Reiss <me@silverwind.io> Reviewed-By: John-David Dalton <john.david.dalton@gmail.com>
This was fixed in #22020.
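As a rough illustration of what #22020 enables, here is a TypeScript sketch (the names `kindOf`, `kindOfStats`, and `listKinds` are hypothetical) that classifies directory entries from the `fs.Dirent` objects returned by `readdirSync(dir, { withFileTypes: true })`. It falls back to `lstat` only for an entry whose `Dirent` reports no known type, which corresponds to the `DT_UNKNOWN` case mentioned in the original proposal; on file systems that fill in `d_type`, the fallback is not expected to run.

```typescript
import * as fs from "fs";
import * as path from "path";

type Kind = "file" | "directory" | "symlink" | "other";

// Read the type straight from the Dirent; undefined means no type was reported.
function kindOf(d: fs.Dirent): Kind | undefined {
  if (d.isFile()) return "file";
  if (d.isDirectory()) return "directory";
  if (d.isSymbolicLink()) return "symlink";
  if (d.isBlockDevice() || d.isCharacterDevice() || d.isFIFO() || d.isSocket()) return "other";
  return undefined;
}

// Same classification from fs.Stats, used only on the fallback path.
function kindOfStats(st: fs.Stats): Kind {
  if (st.isFile()) return "file";
  if (st.isDirectory()) return "directory";
  if (st.isSymbolicLink()) return "symlink";
  return "other";
}

function listKinds(dir: string): { kinds: Map<string, Kind>; statCalls: number } {
  const kinds = new Map<string, Kind>();
  let statCalls = 0;
  for (const d of fs.readdirSync(dir, { withFileTypes: true })) {
    let kind = kindOf(d);
    if (kind === undefined) {
      // Fallback for the DT_UNKNOWN case: one lstat for this entry only.
      kind = kindOfStats(fs.lstatSync(path.join(dir, d.name)));
      statCalls++;
    }
    kinds.set(d.name, kind);
  }
  return { kinds, statCalls };
}
```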
Problem
Currently, any interaction with files and directories in the File System works as follows:
The problem here is that we have to call the File System a second time, because we don't know whether the entry in front of us is a directory, a file, or a symlink.
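For illustration, a minimal TypeScript sketch of the two-step pattern described above (the function name `classifyWithStat` is hypothetical): `readdirSync` returns names only, so identifying each entry takes one extra `lstat` call, i.e. N + 1 file-system calls for N entries.

```typescript
import * as fs from "fs";
import * as path from "path";

// Classify every entry of `dir` the way code had to when readdir exposed only names:
// one readdir for the listing, then one lstat per entry to learn its type.
function classifyWithStat(dir: string): { kinds: Map<string, string>; calls: number } {
  const kinds = new Map<string, string>();
  const names = fs.readdirSync(dir); // call #1: names only, no type information
  let calls = 1;
  for (const name of names) {
    // lstat (not stat) so a symlink is reported as a symlink, not as its target.
    const st = fs.lstatSync(path.join(dir, name));
    calls++;
    const kind = st.isDirectory()
      ? "directory"
      : st.isSymbolicLink()
        ? "symlink"
        : st.isFile()
          ? "file"
          : "other";
    kinds.set(name, kind);
  }
  return { kinds, calls };
}
```

For a directory with three entries this makes four calls; returning `d_type` alongside `d_name` would make the per-entry `lstat` unnecessary in the common case.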
But we can halve the number of File System calls by adding an `fs.scandir` method that returns `d_name` and `d_type`. This information is returned from `uv_dirent_t` (`scandir`) in libuv. For example, this is implemented in Luvit and pyuv (which also use libuv).
Motivation
- `String` → `Object` (`d_name` + `d_type`):
  - `fs.readdir`: converts the `Object` to a `String`, discarding `d_type`
  - `fs.scandir`: returns it as is
- Fewer FS calls for `node-glob` (and for every package that uses `fs.readdir` to traverse directories) in most cases (`fs.stat` is still needed when `d_type` is `DT_UNKNOWN` on older file systems)
Proposed solution
Add `fs.scandir` and `fs.scandirSync` methods to the standard File System module.
Where `entries` is an array of objects:
- `name` {String} – filename (`d_name` in libuv)
- `type` {Number} – `fs.constants.S_*` (`d_type` in libuv)
Final words
For now I have solved this problem with a C++ addon, but... I don't really speak C++, or only very badly (I try 😅), and an addon has to be compiled when the package is installed, which requires extra setup from the end user (see https://github.com/nodejs/node-gyp#on-windows).
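Since `readdir` gained the `withFileTypes` option (#22020), the entry shape proposed above can be approximated in plain TypeScript without a native addon. This is a sketch, not an existing Node.js API: `scandirSync`, `scandir`, `ScandirEntry`, and `direntType` are hypothetical names, and `type` is mapped onto the POSIX `fs.constants.S_IF*` values (0 when no type is known). Some of those constants may be undefined on Windows.

```typescript
import * as fs from "fs";

// Hypothetical entry shape from the proposal: d_name plus d_type.
interface ScandirEntry {
  name: string; // d_name in libuv
  type: number; // an fs.constants.S_IF* value (d_type in libuv), or 0 if unknown
}

// Translate the type predicates of an fs.Dirent into the matching S_IF* constant.
function direntType(d: fs.Dirent): number {
  const c = fs.constants;
  if (d.isFile()) return c.S_IFREG;
  if (d.isDirectory()) return c.S_IFDIR;
  if (d.isSymbolicLink()) return c.S_IFLNK;
  if (d.isFIFO()) return c.S_IFIFO;
  if (d.isSocket()) return c.S_IFSOCK;
  if (d.isCharacterDevice()) return c.S_IFCHR;
  if (d.isBlockDevice()) return c.S_IFBLK;
  return 0;
}

// Synchronous variant: one readdir call, no per-entry stat.
function scandirSync(dir: string): ScandirEntry[] {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    .map((d) => ({ name: d.name, type: direntType(d) }));
}

// Promise-based variant of the same idea.
async function scandir(dir: string): Promise<ScandirEntry[]> {
  const dirents = await fs.promises.readdir(dir, { withFileTypes: true });
  return dirents.map((d) => ({ name: d.name, type: direntType(d) }));
}
```

Callers can then compare `entry.type === fs.constants.S_IFDIR` directly, which is the check the proposal aims to make cheap.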