Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dirent::d_ino #65

Open
yamt opened this issue Oct 20, 2022 · 23 comments
Open

dirent::d_ino #65

yamt opened this issue Oct 20, 2022 · 23 comments

Comments

@yamt
Copy link

yamt commented Oct 20, 2022

depends on the underlying platforms and/or filesystems, it can be very expensive to provide dirent::d_ino.
i'd suggest to make it optional.
eg. allow a wasi implementation to say it unknown in a way similar to filetype.unknown.
how do you think?

cf. https://github.com/bytecodealliance/wasm-micro-runtime/blob/b49fb8a66840d484e775936399456fb6b85f98c6/core/iwasm/libraries/libc-wasi/sandboxed-system-primitives/src/posix.c#L2101-L2105

@sunfishcode
Copy link
Member

sunfishcode commented Oct 20, 2022

The ability to compare two files' inode numbers is a common way to test whether two file descriptors refer to the same file. If we make d_ino optional, that common idiom wouldn't work.

On what platforms is computing d_ino expensive?

@sunfishcode sunfishcode transferred this issue from WebAssembly/WASI Oct 20, 2022
@yamt
Copy link
Author

yamt commented Oct 20, 2022

The ability to compare two files' inode numbers is a common way to test whether two file descriptors refer to the same file. If we make d_ino optional, that common idiom wouldn't work.

On what platforms is computing d_ino expensive?

eg. nuttx doesn't have it in struct dirent.
although it has st_ino field in struct stat, the majority of its filesystems doesn't provide it.
https://github.com/apache/incubator-nuttx/blob/81ff16c675296d4d411fe1a2c1eaf74f52fa33cc/include/dirent.h#L103-L115
https://github.com/apache/incubator-nuttx/blob/81ff16c675296d4d411fe1a2c1eaf74f52fa33cc/include/sys/stat.h#L146

eg. littlefs doesn't seem to have handy on-disk numbers appropriate for the purpose.
https://github.com/littlefs-project/littlefs/blob/40dba4a556e0d81dfbe64301a6aa4e18ceca896c/lfs.h#L268-L281

@sunfishcode
Copy link
Member

Ok, so then it seems to question is: should we make wasi-filesystem less useful for many applications, in order to make it more useful for people that want to use it on nuttx and littlefs?

@yamt
Copy link
Author

yamt commented Oct 25, 2022

my suggestion is merely to define an official way for apps to know the lack of the certain underlying feature. (file number)
right now, some wasi implementations are using d_ino = 0 for the purpose.
for example, we can just make it official. or maybe we can introduce another special number like -1. or maybe we can introduce pathconf-like api or something along the line.
i don't think it makes "wasi-filesystem less useful for many applications".

@sunfishcode
Copy link
Member

Comparing inode numbers is a common Unix idiom. We can define a new API or a new sentinel convention, but application code ported to WASI from other platforms wouldn't know to use it.

Nuttx advertises itself as POSIX compatible, but in the code it has this comment.

What should we do?

@yamt
Copy link
Author

yamt commented Oct 26, 2022

but application code ported to WASI from other platforms wouldn't know to use it.

sure, but it won't make the situation any worse.

some more data points:

  • uvwasi uses d_ino=0 as the underlying uv fs api doesn't provide it.
  • windows FindNextFileW doesn't seem to provide d_ino equivalent.

@sunfishcode
Copy link
Member

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

@yamt
Copy link
Author

yamt commented Oct 27, 2022

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

where can i find it?

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

does it mean three calls per entry? ie. get the handle, GetFileInformationByHandle, close the handle.

in any case, i'm sure that the readdir usage without checking d_ino is far more common than the programming idiom.

@sunfishcode
Copy link
Member

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

where can i find it?

The repository is at https://github.com/WebAssembly/wasi-testsuite/. As I said, we recently started work on it so there are no tests there yet, but this is the path forward.

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

does it mean three calls per entry? ie. get the handle, GetFileInformationByHandle, close the handle.

in any case, i'm sure that the readdir usage without checking d_ino is far more common than the programming idiom.

There are users that use d_ino, even though I also expect you're right that many more don't.

Here's an idea: What if we said it's ok for readdir to omit an inode value, and we teach wasi-libc's readdir to do an extra stat call to get the inode, if d_ino doesn't have it?

That way, wasi-libc users would continue to get a working d_ino field. But we could then offer special options for users to opt out of the extra stat call if they don't need d_ino.

@yamt
Copy link
Author

yamt commented Nov 17, 2022

The WASI Subgroup recently started work on an official testsuite. Once this is ready, we can add a test ensuring that two different files have differing inode numbers. Then, any implementation that wants to pass the testsuite will need to implement a d_ino field.

where can i find it?

The repository is at https://github.com/WebAssembly/wasi-testsuite/. As I said, we recently started work on it so there are no tests there yet, but this is the path forward.

thank you for the link.

Windows' GetFileInformationByHandle returns a struct containing nFileIndexHigh/nFileIndexLow, which work for this purpose. The one known caveat is ReFS, which requires 128-bit IDs, though that doesn't appear to be widely used, and it raises a similar question: is the existence of ReFS enough to motivate breaking this common programming idiom for the rest of the users?

does it mean three calls per entry? ie. get the handle, GetFileInformationByHandle, close the handle.
in any case, i'm sure that the readdir usage without checking d_ino is far more common than the programming idiom.

There are users that use d_ino, even though I also expect you're right that many more don't.

Here's an idea: What if we said it's ok for readdir to omit an inode value, and we teach wasi-libc's readdir to do an extra stat call to get the inode, if d_ino doesn't have it?

That way, wasi-libc users would continue to get a working d_ino field. But we could then offer special options for users to opt out of the extra stat call if they don't need d_ino.

while it would work for platforms which doesn't provide d_ino equivalent,
it merely makes readdir more expensive in case the underlying filesystems w/o suitable on-disk numbers.
i'm afraid that the latter might be more common reason to omit d_ino.

probably readdir can tell libc one of the following:
a. d_ino is here
b. omitted, but you can use stat to get it
c. no d_ino available on this filesystem
but i'm feeling a bit over-engineering now.

sunfishcode added a commit to WebAssembly/wasi-libc that referenced this issue Nov 24, 2022
… value.

On some systems, `fd_readdir` may not implement the `d_ino` field and
may set it to zero. When this happens, have wasi-libc call `fstatat` to
get the inode number.

See the discussion in
WebAssembly/wasi-filesystem#65 for details.
@sunfishcode
Copy link
Member

I've now posted WebAssembly/wasi-libc#345 to implement the wasi-libc change discussed here.

sunfishcode added a commit to WebAssembly/wasi-libc that referenced this issue Nov 30, 2022
… value.

On some systems, `fd_readdir` may not implement the `d_ino` field and
may set it to zero. When this happens, have wasi-libc call `fstatat` to
get the inode number.

See the discussion in
WebAssembly/wasi-filesystem#65 for details.
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this issue Nov 30, 2022
… value.

On some systems, `fd_readdir` may not implement the `d_ino` field and
may set it to zero. When this happens, have wasi-libc call `fstatat` to
get the inode number.

See the discussion in
WebAssembly/wasi-filesystem#65 for details.
sunfishcode added a commit that referenced this issue Nov 30, 2022
`readdir` returning bytes has been a source of complexity for both libc
and engine implementors, with subtle issues about alignment and partial
records.

Rename `dirent` to `dir-entry` for readability, and revamp it:
 - Make the name field a proper `string` rather than being represented
   by trailing bytes in the buffer.
 - Make the inode field `optional`, so that we can make it optional
   as discussed in #65 without special-casing zero, which is reportedly
   a valid inode number on [some filesystems].

And remove the `rewind` parameter, which isn't needed when we return
a stream, as users wanting to start at the beginning can just request
a new stream.

[some filesystems]: https://sourceware.org/pipermail/libc-alpha/2022-September/142059.html
sunfishcode added a commit that referenced this issue Dec 2, 2022
`readdir` returning bytes has been a source of complexity for both libc
and engine implementors, with subtle issues about alignment and partial
records.

Rename `dirent` to `dir-entry` for readability, and revamp it:
 - Make the name field a proper `string` rather than being represented
   by trailing bytes in the buffer.
 - Make the inode field `optional`, so that we can make it optional
   as discussed in #65 without special-casing zero, which is reportedly
   a valid inode number on [some filesystems].

And remove the `rewind` parameter, which isn't needed when we return
a stream, as users wanting to start at the beginning can just request
a new stream.

[some filesystems]: https://sourceware.org/pipermail/libc-alpha/2022-September/142059.html
@sunfishcode
Copy link
Member

while it would work for platforms which doesn't provide d_ino equivalent, it merely makes readdir more expensive in case the underlying filesystems w/o suitable on-disk numbers. i'm afraid that the latter might be more common reason to omit d_ino.

probably readdir can tell libc one of the following: a. d_ino is here b. omitted, but you can use stat to get it c. no d_ino available on this filesystem but i'm feeling a bit over-engineering now.

Unlike d_ino (which is in XSI, an optional part of POSIX) st_ino is required by POSIX, so I don't think supporting filesystems with no way to provide st_ino are things we can support.

Other than that, I think the basic design you described here is what I've proposed; d_ino can be present or absent.

sunfishcode added a commit to WebAssembly/wasi-libc that referenced this issue Dec 2, 2022
… value. (#345)

* If `fd_readdir` returns a zero inode, call `fstatat` to get the inode value.

On some systems, `fd_readdir` may not implement the `d_ino` field and
may set it to zero. When this happens, have wasi-libc call `fstatat` to
get the inode number.

See the discussion in
WebAssembly/wasi-filesystem#65 for details.

* Update the `d_type` field too, in case it changes.
sunfishcode added a commit that referenced this issue Dec 7, 2022
`readdir` returning bytes has been a source of complexity for both libc
and engine implementors, with subtle issues about alignment and partial
records.

Rename `dirent` to `dir-entry` for readability, and revamp it:
 - Make the name field a proper `string` rather than being represented
   by trailing bytes in the buffer.
 - Make the inode field `optional`, so that we can make it optional
   as discussed in #65 without special-casing zero, which is reportedly
   a valid inode number on [some filesystems].

And remove the `rewind` parameter, which isn't needed when we return
a stream, as users wanting to start at the beginning can just request
a new stream.

[some filesystems]: https://sourceware.org/pipermail/libc-alpha/2022-September/142059.html
@jerbob92
Copy link

jerbob92 commented Dec 8, 2022

It feels like this makes readdir needlessly expensive. For example, in Wazero we use the Go FS for our WASI FS implementation, meaning that it's a flexible FS backend for our users. Most real FS backends provide us with inodes, so we can pass that into WASI, but memory and virtual FS systems do not have inodes.

It would be nice if we could set d_ino to something to say that the FS is never going to return an inode, and that it then not calls the fstatat (maybe -1?)

There's also a bit of a flaw in the reasoning behind adding this, yes, inode is commonly used to check if a file is the same. But this should always happens in combination with st_dev, because st_ino is not unique across multiple devices. Howerver, WASI does not provide for st_dev. Should we somehow combine st_dev and st_ino?

@sunfishcode
Copy link
Member

It feels like this makes readdir needlessly expensive. For example, in Wazero we use the Go FS for our WASI FS implementation, meaning that it's a flexible FS backend for our users. Most real FS backends provide us with inodes, so we can pass that into WASI, but memory and virtual FS systems do not have inodes.

The resolution here should make readdir at the WASI level less expensive on these kinds of filesystems, while making readdir at the C/POSIX abstraction level approximately the same cost as it was before.

If you're saying that we shouldn't pay the cost at the C/POSIX abstraction level, then I'd ask the same question I asked above: is efficiency at the C/POSIX level on these specialized filesystems worth teaching all existing application code written at the C/POSIX abstraction level that st_dev+st_ino isn't sufficient to identify a file?

There's also a bit of a flaw in the reasoning behind adding this, yes, inode is commonly used to check if a file is the same. But this should always happens in combination with st_dev, because st_ino is not unique across multiple devices. Howerver, WASI does not provide for st_dev. Should we somehow combine st_dev and st_ino?

WASI does provide for st_dev today; it's the dev field in path_filestat_get and fd_filestat_get at the WASI level, and in stat, fstatat, and fstat in wasi-libc at the POSIX level.

@jerbob92
Copy link

jerbob92 commented Dec 8, 2022

If you're saying that we shouldn't pay the cost at the C/POSIX abstraction level, then I'd ask the same question I asked above: is efficiency at the C/POSIX level on these specialized filesystems worth teaching all existing application code written at the C/POSIX abstraction level that st_dev+st_ino isn't sufficient to identify a file?

Not saying that, just looking for a solution where we do not have to do a fstatat caused by the PR when we already know that that is also not going to provide an inode (ever).

WASI does provide for st_dev today; it's the dev field in path_filestat_get and fd_filestat_get at the WASI level, and in stat, fstatat, and fstat in wasi-libc at the POSIX level.

Huh, indeed. I was rather talking about dirent though. But it looks like POSIX also doesn't provide for that. 🤔
In that case I'm wondering if there are actually people using the readdir result directly to do this comparison, since it doesn't contain the dev, or whether they are then also doing a stat after the readdir.

I'm mostly feeling that wasi-libc has to assume that the runtime will fill in d_ino when it has the inode, putting in the fstatat logic in wasi-libc feels like moving the responsibility. The runtime can also just do that internally if they really want to return the d_ino.

@sunfishcode
Copy link
Member

Not saying that, just looking for a solution where we do not have to do a fstatat caused by the PR when we already know that that is also not going to provide an inode (ever).

That raises a different question than this issue started with.

POSIX says "The st_ino and st_dev fields taken together uniquely identify the file within the system." citation. Consequently, it's impossible to implement POSIX on an implementation where statat doesn't provide unique st_ino values, as you describe.

Should we tell all programs that they can't rely on this POSIX guarantee? Or shall we consider those implementations that do that non-conforming?

WASI does provide for st_dev today; it's the dev field in path_filestat_get and fd_filestat_get at the WASI level, and in stat, fstatat, and fstat in wasi-libc at the POSIX level.

Huh, indeed. I was rather talking about dirent though. But it looks like POSIX also doesn't provide for that. thinking In that case I'm wondering if there are actually people using the readdir result directly to do this comparison, since it doesn't contain the dev, or whether they are then also doing a stat after the readdir.

What people do is read the st_dev from the directory they're reading from, since it'll be the same for all the contents of that directory (even if something is mounted on top of something—readdir shows the inode for the thing underneath the mount).

I'm mostly feeling that wasi-libc has to assume that the runtime will fill in d_ino when it has the inode, putting in the fstatat logic in wasi-libc feels like moving the responsibility. The runtime can also just do that internally if they really want to return the d_ino.

The problem is, people writing programs at the POSIX level won't be aware of this "zero means unavailable" behavior. POSIX-abiding code that sees two open files with the same inode number will assume they're the same, and misbehave. Do we just say "sorry applications, we know POSIX said you could do that, but we can't have you doing that; fix your code", or do we say "sorry engine implementors, you can't use filesystems where fstatat doesn't have an inode"?

@jerbob92
Copy link

jerbob92 commented Dec 8, 2022

Should we tell all programs that they can't rely on this POSIX guarantee? Or shall we consider those implementations that do that non-conforming?

I see that as the problem of the implementation indeed, not wasi-libc's. If documentation behavior is that d_ino is optional and that compilers should do fstatat in case it's 0 then it would be a different story.

What people do is read the st_dev from the directory they're reading from, since it'll be the same for all the contents of that directory (even if something is mounted on top of something—readdir shows the inode for the thing underneath the mount).

Make sense, thanks!

The problem is, people writing programs at the POSIX level won't be aware of this "zero means unavailable" behavior. POSIX-abiding code that sees two open files with the same inode number will assume they're the same, and misbehave. Do we just say "sorry applications, we know POSIX said you could do that, but we can't have you doing that; fix your code", or do we say "sorry engine implementors, you can't use filesystems where fstatat doesn't have an inode"?

I think the latter indeed. If WASI says that d_ino/ino/dev is required, then it's completely the problem of the runtime to make sure it's filled, if the FS in the runtime doesn't support it then it should probably not be used for WASI. I don't think wasi-libc should implement some shady workaround to make that work in some cases. In my opinion @yamt should just do an extra stat in their fd_readdir implementation to get the inode.

@sunfishcode
Copy link
Member

sunfishcode commented Dec 9, 2022

@jerbob92 I'm not familiar with Go FS, but I talked to someone who knows more about it, and they suggested that for some implementations of io/fs.FS (like the posix-ish os.dirFS, but not embed.FS), you can extract an inode by calling the io/fs.FileInfo.Sys() method, fallibly type-asserting the returned interface to *syscall.Stat_t, and readig its Ino field. Do you think that would work in Wazero?

@codefromthecrypt
Copy link

some files are virtual, ex backed by byte buffers and whatnot. others indeed use embed.FS, so we can only use *syscall.Stat_t on real files. As wasm is supposed to be virtualization with security properties etc, it would seem typical that files are not always going to be real files, and even if they were you wouldn't necessarily want to leak the real inodes.

@codefromthecrypt
Copy link

my 2p is that this is both an expensive requirement, and not a very portable one wrt how files are supplied in wasm. It
assumes direct plumbing to real files, which makes sense (except perf) when that's the case. It doesn't map well to virtualized files and virtualized files are in use today in wasm when someone uses a binary as a library (ex fake a file input from a buffer) or serve content from embedded data, etc. What exactly to do when this is the case seems worthy of a primer, as even though wazero folks noticed this I would doubt we can assume all wasi runtimes always use real files.

@codefromthecrypt
Copy link

fyi in benchmarks, using real files measures an order of magnitude slower for a small directory listing vs embedded FS. We'll implement what's needed, defensively, but generally speaking the more things that bind very strictly, the harder the wasi+wasm promise of portability will be to keep.

One way to at least know what will break without very experienced people guessing is more tripwire tests in the wasi-libc repo or at least the webassembly org. Making a call that helps compiler portability but hurts runtime in the process may be still the path to take, but at least the context will be known more before making a decision and releasing it.

my 2p

@sunfishcode
Copy link
Member

The virtualization use case is an interesting point here.

The dev+ino scheme creates two problems here: The device number part assumes that devices exist in a global namespace, which conflicts with the modularity goals of the component model. And the inode part assumes that filesystems have inodes. It's theoretically possible to implement inodes on top of a filesystem that doesn't have them, by maintaining a hash table at runtime of seen files, but that adds a fair amount of complexity and doesn't work if the filesystem supports renaming.

So what if we go the other direction, and remove device and inode numbers?

This would mean the POSIX-required st_dev and st_ino fields would not be provided by wasi-libc, and nor with the d_ino field. To support the common use cases, we could add two new functions, is-same-file and is-same-mountpoint for testing whether two files are the same, or on the same mount point. They wouldn't be drop-in replacements, but many use cases should be able to port to them without great difficulty.

And as an added bonus, this would hide more implementation details that might be leaked in the device and inode numbers.

So this might make sense. But it will mean that codebases that use st_dev/st_ino/d_ino would break when we update to the new APIs which do this, until someone ports them to the new functions.

Thoughts?

sunfishcode added a commit that referenced this issue Jan 13, 2023
As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide two functions, `is-same-file` and
`is-same-mountpoint`, for explicitly testing whether two handles are the
same file or are on the same mountpoint, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: #65 (comment)
@sunfishcode
Copy link
Member

I've now submitted #81 to propose removing the device and inode numbers.

sunfishcode added a commit that referenced this issue Jan 18, 2023
As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide two functions, `is-same-file` and
`is-same-mountpoint`, for explicitly testing whether two handles are the
same file or are on the same mountpoint, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: #65 (comment)
john-sharratt pushed a commit to john-sharratt/wasix-libc that referenced this issue Mar 6, 2023
… value. (WebAssembly#345)

* If `fd_readdir` returns a zero inode, call `fstatat` to get the inode value.

On some systems, `fd_readdir` may not implement the `d_ino` field and
may set it to zero. When this happens, have wasi-libc call `fstatat` to
get the inode number.

See the discussion in
WebAssembly/wasi-filesystem#65 for details.

* Update the `d_type` field too, in case it changes.
sunfishcode added a commit that referenced this issue May 15, 2023
As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide some functions, `is-same-object`,
`metadata-hash`, and `metadata-hash-at`, for explicitly testing whether two
handles are the same file or have the same metadata, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: #65 (comment)
sunfishcode added a commit to sunfishcode/wasi-filesystem that referenced this issue Jul 20, 2023
As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide some functions, `is-same-object`,
`metadata-hash`, and `metadata-hash-at`, for explicitly testing whether two
handles are the same file or have the same metadata, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: WebAssembly#65 (comment)
sunfishcode added a commit to sunfishcode/wasi-filesystem that referenced this issue Jul 20, 2023
As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide some functions, `is-same-object`,
`metadata-hash`, and `metadata-hash-at`, for explicitly testing whether two
handles are the same file or have the same metadata, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: WebAssembly#65 (comment)
sunfishcode added a commit that referenced this issue Jul 20, 2023
* Remove the device and inode numbers from the API.

As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide some functions, `is-same-object`,
`metadata-hash`, and `metadata-hash-at`, for explicitly testing whether two
handles are the same file or have the same metadata, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: #65 (comment)

* Remove the device and inode numbers from the API.

As discussed [here], remove the fields which correspond to `st_dev`, `st_ino`,
and `d_ino` in POSIX from the stat and directory entry structs.

 - Device numbers assume the existence of a global device number space,
   which creates implicit relationships between otherwise unrelated
   components.

 - Not all filesystem implementations have these numbers. And some that
   do have these numbers require extra implementation cost to retrieve them.

 - These numbers leak potentially sensitive or identifying information from the
   underlying filesystem implementation.

In their place, provide some functions, `is-same-object`,
`metadata-hash`, and `metadata-hash-at`, for explicitly testing whether two
handles are the same file or have the same metadata, respectively. This doesn't
cover all possible use cases for device and inode numbers, but we can
add more functions as need arises.

[here]: #65 (comment)

* Explicitly document that the hash can contain a secret value.

* Use a named `record` type instead of a `tuple` for hash values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants