Reading files from within const eval #44

Open
joshtriplett opened this issue Jun 24, 2020 · 7 comments

@joshtriplett
Member

(Filing this at @oli-obk's request, based on discussions on Zulip.)

We should be able to read files within const eval. This is equivalent to doing include_bytes! (which we already support at compile time), and should use the same machinery for "rebuild if this changes".

@oli-obk suggested that this could work by tagging specific portions of low-level file I/O as something akin to lang items, and then implementing that low-level file I/O by invoking the machinery that powers include_bytes!.

@oli-obk
Contributor

oli-obk commented Jun 25, 2020

> by invoking the machinery that powers include_bytes!.

It's not yet possible to share machinery between include_bytes! and a hypothetical file-reading mechanism for const eval, because include_bytes! runs at a time when no queries are available yet. In order for const eval to be able to read from files, we need to add a query that goes from PathBuf to &'tcx [u8] by reading the entire file in one go and interning it. This means that if you have a

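const fn foo() -> String {
    // Hypothetical: const fn cannot perform file I/O today.
    let mut contents = String::new();
    File::open("foo.txt").unwrap().read_to_string(&mut contents).unwrap();
    contents
}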

you will get different results when calling this function from different crates, because each crate will resolve foo.txt to a different file (to this_crate_root/foo.txt, so right next to Cargo.toml).

I don't think we need to take care to ignore the target directory, because include_bytes can also read from that. Since we're caching all read files via the query system, we won't ever get a situation where the files differ between each read.

Two important things we need to take care of:

  1. Prevent any non-relative paths or relative paths that go outside the crate root. We can have different schemes to do that. One variant would be to treat the crate root as the file system root.
  2. Ensure the query is never cached in the incremental cache. Even if none of its inputs or input queries change, we need to reread the entire file in order to check whether the file changed.
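
The two requirements above, together with the "read the whole file once and intern it" query, can be modelled outside the compiler. The following is only a sketch under stated assumptions: `sandboxed_path` and `FileQuery` are hypothetical names, the cache is a plain `HashMap` rather than rustc's query system, and symlinks are not considered.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};
use std::sync::Arc;

/// Resolve `rel` against `crate_root`, rejecting absolute paths and any
/// relative path that would climb above the crate root.
/// (Hypothetical helper; symlinks are not resolved or checked.)
fn sandboxed_path(crate_root: &Path, rel: &Path) -> Option<PathBuf> {
    let mut depth: usize = 0;
    let mut out = crate_root.to_path_buf();
    for comp in rel.components() {
        match comp {
            Component::Normal(part) => {
                out.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                if depth == 0 {
                    return None; // would escape the crate root
                }
                out.pop();
                depth -= 1;
            }
            // Absolute paths (root or Windows prefix) are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

/// Hypothetical stand-in for the query: each path is read in full at most
/// once per session, and every later request sees the same bytes.
#[derive(Default)]
struct FileQuery {
    interned: HashMap<PathBuf, Arc<[u8]>>,
}

impl FileQuery {
    fn read(&mut self, crate_root: &Path, rel: &Path) -> io::Result<Arc<[u8]>> {
        let path = sandboxed_path(crate_root, rel).ok_or_else(|| {
            io::Error::new(io::ErrorKind::PermissionDenied, "path escapes the crate root")
        })?;
        if let Some(bytes) = self.interned.get(&path) {
            return Ok(Arc::clone(bytes));
        }
        // Read the entire file in one go and keep a shared copy of it.
        let bytes: Arc<[u8]> = fs::read(&path)?.into();
        self.interned.insert(path, Arc::clone(&bytes));
        Ok(bytes)
    }
}
```

Because the cache lives only for one `FileQuery` value (one "session"), a fresh session always rereads the file, which mirrors the requirement that the result is never carried over through the incremental cache. Passing the crate root explicitly also makes it visible why the same relative path yields different contents in different crates.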

@RalfJung
Member

> In order for const eval to be able to read from files, we need to add a query that goes from (CrateNum, PathBuf) to &'tcx [u8] by reading the entire file in one go and interning it. This means that if you have a

That would take care of dynamically computing the filename. However, @joshtriplett also asked to be able to read parts of a file without reading the whole file, which this would not do.

> Prevent any non-relative paths or relative paths that go outside the crate root. We can have different schemes to do that. One variant would be to treat the crate root as the file system root.

Why that? Does include_bytes! do that? Certainly build.rs and proc macros can read anywhere on the file system... (IMO they should be sandboxed but that's a long and separate discussion.^^)

> Two important things we need to take care of:

  1. Ensure that the query system never recomputes this query. Currently, for many queries dropping the cache is okay because the queries are all pure functions that can be recomputed any time. Reading from a file might be the first query to not have that property, so the entire query system needs to cooperate (and future changes in the query system need to take this into account).

@oli-obk
Contributor

oli-obk commented Jun 25, 2020

> Why that? Does include_bytes! do that? Certainly build.rs and proc macros can read anywhere on the file system... (IMO they should be sandboxed but that's a long and separate discussion.^^)

Because... we can do it? No need to start out a new feature without sandboxing if we can sandbox it from the start.

> Ensure that the query system never recomputes this query. Currently, for many queries dropping the cache is okay because the queries are all pure functions that can be recomputed any time. Reading from a file might be the first query to not have that property, so the entire query system needs to cooperate (and future changes in the query system need to take this into account).

That's actually fine, because if recomputing the query causes the result to differ, it will poison all queries that called this query and force them to get recomputed, too. So you don't get any behaviour where if you have a dataflow from such a file to an array length, the array length changes within the same compilation.
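
As a rough illustration of that poisoning behaviour, here is a heavily simplified model, not rustc's actual query machinery: a downstream result remembers a fingerprint of the file bytes it was computed from and is recomputed whenever a reread produces a different fingerprint. The names `fingerprint`, `Dependent`, `compute_dependent`, and `revalidate` are all made up for this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint of the file contents (an ordinary hash, for illustration).
fn fingerprint(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// A cached downstream result (say, an array length derived from the file)
/// together with the fingerprint of the input it was computed from.
struct Dependent {
    input_fingerprint: u64,
    array_len: usize,
}

/// Hypothetical downstream query: the "array length" is just the file size.
fn compute_dependent(file: &[u8]) -> Dependent {
    Dependent {
        input_fingerprint: fingerprint(file),
        array_len: file.len(),
    }
}

/// Reuse the cached value only if the reread file has the same fingerprint;
/// otherwise the cached value is poisoned and recomputed from the new bytes.
/// The boolean reports whether a recomputation happened.
fn revalidate(cached: Dependent, reread: &[u8]) -> (Dependent, bool) {
    if cached.input_fingerprint == fingerprint(reread) {
        (cached, false)
    } else {
        (compute_dependent(reread), true)
    }
}
```

In this model a stale `array_len` can never be observed alongside new file contents, because the only way to obtain a `Dependent` after a reread is through `revalidate`.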

@joshtriplett
Member Author

joshtriplett commented Jun 27, 2020 via email

@septatrix

> Calling a const function at compile-time will always yield the same result as calling it at runtime, even when called multiple times.
> ~ https://doc.rust-lang.org/reference/const_eval.html#const-functions

How could this still hold with arbitrary access to files whose content can change at any point?

@ayebear

ayebear commented May 5, 2024

> How could this still hold with arbitrary access to files whose content can change at any point?

Why does const fn have to be deterministic if proc macros don't have to be? Couldn't we just accept that results could be different at compile time because of files/floats, and then consistently apply that to all of the compile-time concepts? Or just make these rules warnings instead of errors.

@RalfJung
Member

RalfJung commented May 5, 2024

Proc macros are like code that generates .rs files. They are executed once upfront and then produce the input for the rest of compilation.

Constants can be evaluated multiple times. E.g. when you have something like

impl<T, U> Trait for Type<T, U> {
  const C: Type = ...;
}

The value of C can only be computed once T and U are known. But when two different crates independently compute C for the same T and U, it is crucial that they get the same result.
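
A compilable variant of that pattern may make the point concrete. This is only a sketch: the `usize` constant, the `size_of`-based initializer, and the `crate_a`/`crate_b` functions (stand-ins for two separate crates) are assumptions added for illustration.

```rust
use std::mem::size_of;

trait Trait {
    const C: usize;
}

#[allow(dead_code)]
struct Type<T, U>(T, U);

impl<T, U> Trait for Type<T, U> {
    // Can only be evaluated once concrete `T` and `U` are chosen.
    const C: usize = size_of::<T>() + size_of::<U>();
}

// Stand-ins for two crates that each instantiate and evaluate `C`
// independently for the same `T` and `U`.
fn crate_a() -> usize {
    <Type<u8, u32> as Trait>::C
}

fn crate_b() -> usize {
    <Type<u8, u32> as Trait>::C
}
```

Here both evaluations necessarily agree because the initializer is a pure function of `T` and `U`. If the initializer could read a file, the two evaluation sites could observe different contents and disagree on the value of `C`.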
