Normalize paths when creating StorePath #2850

Open
wants to merge 2 commits into
base: main

d-v-b
Contributor

@d-v-b d-v-b commented Feb 19, 2025

In main we don't normalize paths when constructing an instance of StorePath, which means it's possible to create groups with names like a//b. This PR fixes this behavior by ensuring that the path attribute of StorePath is normalized with normalize_path in StorePath.__init__.
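
A minimal sketch of the idea, for illustration only. `normalize_path_sketch` and `StorePathSketch` are hypothetical stand-ins, not the actual `normalize_path` or `StorePath` from this codebase, and the real implementations may handle edge cases differently. The sketch assumes that normalizing means collapsing repeated `/` separators and stripping leading and trailing ones.

```python
def normalize_path_sketch(path: str) -> str:
    """Hypothetical normalizer: collapse repeated "/" and strip outer "/"."""
    # Splitting on "/" and dropping empty segments turns "a//b" into "a/b"
    # and "/a/b/" into "a/b".
    return "/".join(segment for segment in path.split("/") if segment)


class StorePathSketch:
    """Hypothetical stand-in for a (store, path) pair."""

    def __init__(self, store: object, path: str = "") -> None:
        self.store = store
        # Normalizing in __init__ means every instance holds a canonical path,
        # so a name like "a//b" can never reach the store.
        self.path = normalize_path_sketch(path)


p = StorePathSketch(store={}, path="a//b")
print(p.path)  # a/b
```

With normalization in the constructor, two paths that differ only in redundant separators end up pointing at the same node.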

There is potentially a problem here if different store classes have different rules about path names. This argues against the existence of a separate StorePath class, and in favor of allowing Store instances to manage their current path.

@github-actions github-actions bot added the "needs release notes" label Feb 19, 2025
@dcherian
Contributor

There is potentially a problem here if different store classes have different rules about path names.

The spec controls the characters that can be used, and the "separator" is configurable, so I'm finding it hard to see what other axis of flexibility exists.

import hypothesis.strategies as st

# From https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#node-names
# 1. must not be the empty string ("")
# 2. must not include the character "/"
# 3. must not be a string composed only of period characters, e.g. "." or ".."
# 4. must not start with the reserved prefix "__"
zarr_key_chars = st.sampled_from(
    ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
)
node_names = st.text(zarr_key_chars, min_size=1).filter(
    # set(t) != {"."} rejects any all-period name ("...", too), per rule 3
    lambda t: set(t) != {"."} and not t.startswith("__")
)
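
For reference, the four rules listed in the comments above can also be written as a plain predicate. This is only a sketch: `is_valid_node_name` is a hypothetical helper, not an existing function in the library, and it checks just the four quoted rules, not the character set.

```python
def is_valid_node_name(name: str) -> bool:
    """Hypothetical check of the four node-name rules quoted above."""
    if name == "":
        return False  # rule 1: must not be the empty string
    if "/" in name:
        return False  # rule 2: must not include "/"
    if set(name) == {"."}:
        return False  # rule 3: must not consist only of periods
    if name.startswith("__"):
        return False  # rule 4: must not start with the reserved prefix "__"
    return True


for candidate in ["a", "a.b", "", "a/b", "..", "__hidden"]:
    print(repr(candidate), is_valid_node_name(candidate))
```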

@d-v-b
Contributor Author

d-v-b commented Feb 20, 2025

There is potentially a problem here if different store classes have different rules about path names.

The spec controls the characters that can be used, and the "separator" is configurable, so I'm finding it hard to see what other axis of flexibility exists.

import hypothesis.strategies as st

# From https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#node-names
# 1. must not be the empty string ("")
# 2. must not include the character "/"
# 3. must not be a string composed only of period characters, e.g. "." or ".."
# 4. must not start with the reserved prefix "__"
zarr_key_chars = st.sampled_from(
    ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
)
node_names = st.text(zarr_key_chars, min_size=1).filter(
    # set(t) != {"."} rejects any all-period name ("...", too), per rule 3
    lambda t: set(t) != {"."} and not t.startswith("__")
)

Different key-value storage backends can set different rules about which keys are allowed. For example, S3 and OSX might have different rules about file name length. It's not terrible if we end up parsing paths twice, but if there's a bug in one parsing path or the other, it would be annoying to have to look in two different classes to find it. I guess my broader complaint is that StorePath is a store with one extra property. Any logic we add to StorePath is logic that should exist on the store class itself, so putting it on StorePath is duplicative. But we can live with it for now.
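
To make the alternative concrete, here is a rough sketch of a store that owns its current path and applies its own key rules in one place. Everything in it is hypothetical: `PathAwareStoreSketch`, `ShortKeyStoreSketch`, `parse_path`, `with_path`, and the `max_key_length` values are made up for illustration and do not correspond to any existing store class or backend limit.

```python
class PathAwareStoreSketch:
    """Hypothetical store that manages its own current path."""

    # Assumed limit, for illustration only; real backends differ.
    max_key_length = 1024

    def __init__(self, path: str = "") -> None:
        self.path = self.parse_path(path)

    @classmethod
    def parse_path(cls, path: str) -> str:
        # The single place where this store normalizes and validates paths.
        normalized = "/".join(s for s in path.split("/") if s)
        if len(normalized) > cls.max_key_length:
            raise ValueError(
                f"path {normalized!r} exceeds {cls.max_key_length} characters"
            )
        return normalized

    def with_path(self, path: str) -> "PathAwareStoreSketch":
        # Return a new store of the same class, rooted at a child path.
        return type(self)(f"{self.path}/{path}")


class ShortKeyStoreSketch(PathAwareStoreSketch):
    """Hypothetical backend with a much stricter key-length rule."""

    max_key_length = 8


root = PathAwareStoreSketch("a")
print(root.with_path("b//c").path)  # a/b/c
```

In this arrangement a subclass changes the rules by overriding one attribute or one method, and there is only one parsing path to inspect when a path bug shows up.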
