Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: single cache file #54

Merged
merged 24 commits into from
Aug 26, 2024
Merged

Conversation

dsherret
Copy link
Member

@dsherret dsherret commented Aug 10, 2024

This is for Deno 2. See denoland/deno#17707 for details.

Format:

<content>\n// denoCacheMetadata=<metadata><EOF>

File format considerations:

  1. We can reuse the original vector we read the file from by storing the content first and then truncating the vector to the content.
  2. The file has a comment on the last line. This makes it still work if loaded as a TS/JS file. Obviously it will not work if this is something like Wasm and someone loads that, but when deno reads this file it removes the comment.
  3. The metadata is json. This makes it easier to add properties in the future.
  4. It's probably best for perf to read this file in one shot to reduce fs calls (especially on network drives), so reading the whole file and extracting out the metadata is ok.
  5. We still want users to easily read the code in this file if they inspect it (so it doesn't use binary)

@dsherret dsherret marked this pull request as ready for review August 10, 2024 23:11
content: &[u8],
metadata: &SerializedCachedUrlMetadata,
) -> std::io::Result<()> {
let serialized_metadata = serde_json::to_vec(&metadata).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use serde_json::to_writer, directly into the result buffer here. Saves one allocation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to do an estimate of the capacity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we're serializing this for the time lol. I'll change this to a single number.

{
    "secs_since_epoch": 1720706015,
    "nanos_since_epoch": 697076300
  }

fn read_content_and_metadata<TMetadata: DeserializeOwned>(
file_bytes: &[u8],
) -> Option<(&[u8], TMetadata)> {
let (file_bytes, metadata_bytes) = split_content_metadata(file_bytes)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print a debug!() if we fail to parse the metadata, or fail to parse the json?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok because we have tests here and in the CLI that ensure the cache is working. Maybe if it becomes an issue?

@dsherret dsherret requested a review from lucacasonato August 22, 2024 15:50
.time_now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only storing seconds is ok?

@dsherret dsherret mentioned this pull request Aug 23, 2024
1 task
Copy link
Member

@lucacasonato lucacasonato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dsherret dsherret merged commit ed6ea63 into denoland:main Aug 26, 2024
5 checks passed
@dsherret dsherret deleted the perf_single_cache_file branch August 26, 2024 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants