-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate StrictModeConfig output structure #6114
Conversation
📝 WalkthroughWalkthroughThe changes update the strict mode configuration handling across the API and related code. In the OpenAPI specification, several schema names have been updated to use an "Output" suffix, with adjustments made to properties such as rate limits and maximum counts. In the gRPC conversion module, new implementations for converting between the original strict mode configuration types and their newly named output types have been added, including handling related to sparse and multivector settings. These modifications are also propagated to the collection module, where the strict mode configuration field in the collection configuration struct is updated to use the new output type, and conversion logic is adjusted accordingly. New structures for handling strict mode configurations, specifically for sparse and multivector data, have also been introduced, with conversion traits implemented to ensure proper data mapping. Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
⏰ Context from checks skipped due to timeout of 90000ms (12)
🔇 Additional comments (16)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/api/src/grpc/conversions.rs (1)
2004-2047
: ConvertingStrictModeConfigOutput
into RPC-levelStrictModeConfig
.Downcasting from
usize
tou32
or fromf64
tof32
may lead to minor precision or range constraints. If there is a possibility of very large numbers or high precision, consider documenting or validating the permissible ranges.lib/segment/src/types.rs (3)
736-742
: Consider reintroducing validation
If you want to ensure no invalid values are exposed (e.g., minimum size constraints), you could add a validation attribute here (similar to the originalStrictModeSparse
).
810-816
: Revisit optional validation
If you want to enforce a minimum number of vectors for consistency, consider adding a validation check in this output struct as well.
953-1078
: Enhance consistency with other output structs
You might consider adding#[schemars(deny_unknown_fields)]
here, as done in the sparse and multivector output structs, to maintain uniform schema rules for strict output objects. Otherwise, everything else looks fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/redoc/master/openapi.json
(9 hunks)lib/api/src/grpc/conversions.rs
(4 hunks)lib/collection/src/operations/conversions.rs
(2 hunks)lib/collection/src/operations/types.rs
(3 hunks)lib/segment/src/types.rs
(4 hunks)
🔇 Additional comments (25)
lib/collection/src/operations/conversions.rs (2)
24-24
: Import update to support the new StrictModeConfigOutput typeThe import statement has been updated to include
StrictModeConfigOutput
instead of the previousStrictModeConfig
, supporting the restructuring needed for Python REST client compatibility.
2096-2096
: Updated strict mode config conversion to use new output typeThis change correctly implements the PR objective of restructuring the
StrictModeConfig
used withinCollectionInfo
. By replacing the earlierStrictModeConfig::from
withStrictModeConfigOutput::from
, the code now uses the new output structure that allows the Python REST client to recognize these types as capable of accepting additional keys during deserialization.This implementation follows the same pattern that was previously used for
OptimizerConfig
andOptimizerConfigDiff
to maintain backward compatibility between Python client versions.docs/redoc/master/openapi.json (8)
6707-6707
: Schema reference updated to use the new output type.The reference in
CollectionConfig
has been updated to use the newStrictModeConfigOutput
schema instead of the originalStrictModeConfig
. This is in line with the PR objective to restructure theStrictModeConfig
for better handling by the Python REST client.
7256-7385
: New schemaStrictModeConfigOutput
created for backward compatibility.This new schema structure mirrors the original
StrictModeConfig
but with adjusted minimum values to provide more flexibility for client deserialization. The changes to minimum values (from 1 to 0) in several properties allow the fields to accept a wider range of values, which helps maintain backward compatibility between different client versions.Key changes:
read_rate_limit
: minimum changed from 1 to 0write_rate_limit
: minimum changed from 1 to 0max_points_count
: minimum changed from 1 to 0These changes align with similar approaches previously implemented for
OptimizerConfig
andOptimizerConfigDiff
as mentioned in the PR objectives.
7386-7392
: Renamed schema toStrictModeMultivectorConfigOutput
with updated reference.This change is part of the overall restructuring pattern, creating dedicated output schemas with references to their corresponding output components, in this case
StrictModeMultivectorOutput
.
7392-7403
: New schemaStrictModeMultivectorOutput
replacesStrictModeMultivector
.Similar to the main
StrictModeConfigOutput
change, this specialized multivector configuration output schema has been created with a relaxed minimum value formax_vectors
(changed from 1 to 0), providing more flexibility for client deserialization.
7404-7410
: Renamed schema toStrictModeSparseConfigOutput
with updated reference.This follows the consistent pattern of creating dedicated output schemas with references to their corresponding output components, in this case
StrictModeSparseOutput
.
7410-7421
: New schemaStrictModeSparseOutput
replacesStrictModeSparse
.This specialized sparse configuration output schema has been created with a relaxed minimum value for
max_length
(changed from 1 to 0), maintaining the consistent pattern of providing more flexibility in the output schemas.
9601-9766
: OriginalStrictModeConfig
schema maintained for backward compatibility.The original
StrictModeConfig
schema is retained alongside the new output schemas. This dual schema approach is a good practice for maintaining backward compatibility:
- Input validation can still use the stricter requirements of the original schema
- Output serialization can use the more flexible Output version
- Existing client code can continue to work with the original schema
The original schema maintains the stricter minimum values (1 instead of 0) for:
read_rate_limit
write_rate_limit
max_points_count
max_vectors
(inStrictModeMultivector
)max_length
(inStrictModeSparse
)
9839-9847
:UpdateCollection
correctly references the originalStrictModeConfig
.For update operations, the reference to the original
StrictModeConfig
schema is maintained. This is appropriate since updates should still adhere to the stricter validation rules.lib/collection/src/operations/types.rs (3)
34-35
: Import usage looks consistent.The addition of
StrictModeConfigOutput
in these import lines aligns with the subsequent changes in this file where the new type is leveraged. No issues found.
201-201
: Renaming the field to useStrictModeConfigOutput
.Switching to
Option<StrictModeConfigOutput>
instead ofOption<StrictModeConfig>
is logically consistent with the updated conversion logic. Ensure callers properly handle any backward compatibility concerns if needed.
223-223
: Correct mapping forstrict_mode_config
.Referencing the
StrictModeConfigOutput
through.map(StrictModeConfigOutput::from)
properly aligns with the updated struct. This ensures the new schema is consistently applied.lib/api/src/grpc/conversions.rs (6)
1-1
: Additional imports introduced.The inclusion of
BTreeMap
andHashSet
indicates usage in the newly added conversion functions. No immediate issues identified.
1985-2003
: Conversion fromStrictModeSparseConfigOutput
toStrictModeSparseConfig
.This mapping is straightforward, translating
max_length
into au64
field. Confirm that the final type accommodates all use cases, especially regarding potential overflow or architecture differences forusize
.
2049-2095
: Reverse conversion fromStrictModeConfig
toStrictModeConfigOutput
.The code neatly mirrors the prior conversion. The approach is consistent, ensuring that the output aligns well with the new strict mode schema.
2096-2111
: ConvertingStrictModeMultivectorConfig
toStrictModeMultivectorConfigOutput
.The mapping from
u64
tousize
formax_vectors
is logically consistent across the codebase. No further issues noted.
2131-2148
: TranslatingStrictModeMultivectorConfigOutput
back intoStrictModeMultivectorConfig
.This is the symmetrical conversion for multivector fields. The iteration over the config looks good, and key-value pairs are mapped correctly.
2150-2164
: GeneratingStrictModeSparseConfigOutput
fromStrictModeSparseConfig
.The conversion loop and assignment to the
BTreeMap
are clear. Everything appears consistent with the rest of the strict mode transformations.lib/segment/src/types.rs (6)
729-735
: Looks good for flattened output structure
No concerns here; the overall approach of flattening looks correct.
743-755
: Straightforward conversion logic
Implementation correctly iterates over each entry and converts to the output type without side effects.
756-762
: Clear and concise transformation
TheFrom
implementation neatly copies the relevant field. No issues identified.
790-796
: Same pattern, no concerns
Mirrors the sparse config output structure exactly, providing a consistent approach.
797-809
: Conversion logic appears correct
Each key-value pair is mapped nicely to the new output.
817-823
: Simple and correct
The direct conversion fromStrictModeMultivector
toStrictModeMultivectorOutput
is properly handled.
c24575c
to
237a972
Compare
@@ -229,7 +229,7 @@ impl From<CollectionConfigInternal> for CollectionConfig { | |||
optimizer_config, | |||
wal_config: Some(wal_config), | |||
quantization_config, | |||
strict_mode_config, | |||
strict_mode_config: strict_mode_config.map(StrictModeConfigOutput::from), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CollectionConfigInternal
is used in collection telemetry, and, therefore, should either be splitted into Output structure, or contain StrictModeConfigOutput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
lib/api/src/grpc/conversions.rs (5)
2004-2047
: Watch out for potential truncation when castingusize
tou32
orf32
.Several fields (e.g.,
max_query_limit
,max_timeout
,search_max_oversampling
) are converted fromOption<usize>
into narrower types. If these fields can exceedu32::MAX
or require precise float handling, consider clamping or adding logic to avoid unexpected overflow or truncation.
2049-2093
: Be cautious with cross-architecture differences amid integer widening.Here, the code converts
Option<u32>
and others intoOption<usize>
. On typical platforms, this is fine, but keep in mind that ifusize
is smaller or larger than expected (rare architectures), values could wrap. Adding boundary checks or documentation might help future maintainers.
2096-2110
: Check for safe bounds when convertingu64
tousize
.The loop inserts
max_vectors
by casting it fromOption<u64>
toOption<usize>
. While standard 64-bit systems typically handle this seamlessly, a robust check or clamp for extremely large values can provide safer failover in edge cases.
2150-2164
: Ensure consistency handlingmax_length
across expansions/conversions.This final block casts from
Option<u64>
toOption<usize>
, mirroring the earlier pattern. Consider a consistent approach across all conversions—like clamps, error returns, or thorough documentation—to handle unusual or invalid data safely.
1985-2002
:❓ Verification inconclusive
Consider potential integer conversion edge cases when casting
max_length
fromusize
tou64
.Although this conversion is valid on common 64-bit architectures, be mindful that extremely large
usize
values or a hypothetical 128-bit platform scenario could cause overflow. A quick check or clamp might be prudent if untrusted or large inputs are expected.Below is a script you can use to locate all references to
StrictModeSparseConfigOutput
across the codebase to confirm usage and ensure safe handling of large values:
🏁 Script executed:
#!/bin/bash # Description: Verify usage of StrictModeSparseConfigOutput to check for potential large or untrusted inputs rg -A 10 'StrictModeSparseConfigOutput'Length of output: 7132
Action Required: Revisit the casting of
max_length
for potential overflow risks.The current code performs a direct cast from
usize
tou64
while mapping the configuration:max_length: config.max_length.map(|i| i as u64)This is safe on standard 64-bit architectures. However, as noted in the OpenAPI schema (e.g., in
docs/redoc/master/openapi.json
), if an unexpectedly large value is fed—or if running on a hypothetical 128-bit system—the conversion might overflow. To mitigate any risks when dealing with potentially untrusted or large inputs, consider inserting a check or clamp that ensuresmax_length
doesn’t exceed the maximum value representable byu64
.
- File:
lib/api/src/grpc/conversions.rs
(lines 1985–2002)- Suggestion: Add bounds-checking or clamping before casting, and ensure that the reverse conversion (casting back to
usize
) remains safe and consistent with API expectations.lib/segment/src/types.rs (4)
730-736
: Validate the no-op anonymization logic.This implementation copies the
config
field verbatim instead of actually removing any identifying data. If the intent is to truly anonymize fields, consider masking or removing sensitive values.
745-751
: Check intended anonymization policy.This method returns the same value for
max_length
, making the anonymization logic effectively a pass-through. Verify if this is the expected functionality when “anonymizing.”
790-795
: No-op anonymization concern.Similar to the sparse output, this leaves
max_vectors
as-is. Confirm whether retaining the original value meets your anonymization requirements.
986-1009
: Superficial anonymization pass.All fields (besides nested configs) are directly copied instead of being masked—confirm if you need deeper anonymization for sensitive fields.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/redoc/master/openapi.json
(9 hunks)lib/api/src/grpc/conversions.rs
(4 hunks)lib/collection/src/operations/conversions.rs
(2 hunks)lib/collection/src/operations/types.rs
(3 hunks)lib/segment/src/types.rs
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/collection/src/operations/conversions.rs
- lib/collection/src/operations/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (22)
docs/redoc/master/openapi.json (8)
6707-6708
: Schema reference update aligns with PR objective.The change from
StrictModeConfig
toStrictModeConfigOutput
in the CollectionConfig schema is consistent with the PR objective of restructuring the StrictModeConfig to allow Python clients to accept additional keys during deserialization.
7256-7385
: New StrictModeConfigOutput schema appropriately supports the backward compatibility goal.This new schema definition replaces the previous StrictModeConfig in the public API. The structure remains the same but with more lenient minimum values (changed from 1 to 0) for rate limits and counts, which provides better flexibility for client applications, particularly the Python REST client mentioned in the PR objectives.
7324-7325
: Appropriate adjustment of minimum values to improve compatibility.Changing the minimum values from 1 to 0 for
read_rate_limit
,write_rate_limit
, andmax_points_count
allows for more flexible configuration options. This modification is particularly important for maintaining backward compatibility with Python client version 1.13.2 as mentioned in the PR objectives.Also applies to: 7331-7332, 7345-7346
7386-7421
: Well-structured nested output schemas for multivector and sparse configurations.The new output schemas for multivector and sparse configurations maintain the same structure as their non-output counterparts but with adjusted minimum values. This approach is consistent with similar changes made for
OptimizerConfig
andOptimizerConfigDiff
as mentioned in the PR objectives, ensuring a consistent pattern throughout the API.
7399-7400
: Appropriate relaxation of nested schema constraints.Changing the minimum values from 1 to 0 for
max_vectors
andmax_length
in the nested schemas provides more flexibility while maintaining type safety. This approach allows for configurations where these values might not be explicitly set or could be set to 0 to indicate "no limit" or "use default value".Also applies to: 7417-7418
9601-9766
: Original StrictModeConfig schema retained for compatibility.Maintaining the original
StrictModeConfig
schema alongside the newStrictModeConfigOutput
is a good practice for ensuring backward compatibility. This dual-schema approach allows the system to handle both incoming (client to server) and outgoing (server to client) data structures appropriately, which is crucial when evolving APIs that need to maintain compatibility with existing clients.
7279-7280
: Updated field descriptions improve clarity.The descriptions for
unindexed_filtering_retrieve
andunindexed_filtering_update
have been refined to be more precise and informative, which enhances the documentation quality of the API.Also applies to: 7284-7285
7366-7383
: Properly updated references to nested schemas.The schema references have been consistently updated to point to the new
...Output
schema variants, ensuring coherence throughout the API structure. This attention to detail prevents reference errors and maintains the logical structure of the related schemas.Also applies to: 7377-7383
lib/api/src/grpc/conversions.rs (1)
2131-2148
: Add tests for largemax_vectors
values.This conversion shifts
Option<usize>
toOption<u64>
. Although functionally correct on common systems, coverage for extreme or invalid values helps ensure no hidden bugs arise if inputs exceed typical 64-bit ranges or if negative values accidentally appear.Would you like a sample unit test added or an extended verification script?
lib/segment/src/types.rs (13)
738-743
: Confirm handling of additional fields.Using
#[schemars(deny_unknown_fields)]
together with#[serde(flatten)]
may restrict rather than permit unknown top-level fields. Ensure this behavior aligns with the PR's goal of allowing extra keys in the JSON.
753-758
: Struct usability check.
StrictModeSparseOutput
mirrorsStrictModeSparse
with an optionalmax_length
. This is consistent with the new output design.
760-771
: Iterative conversion logic looks good.The loop properly transforms each sparse config entry into the new output form, preserving keys without errors.
773-778
: Straightforward field mapping.The
From
conversion forStrictModeSparse
intoStrictModeSparseOutput
is concise and correct.
815-820
: Consistent pattern with sparse logic.The multivector config’s
anonymize()
follows the same approach, recursing into inner fields.
823-829
: Validate unknown fields approach.Check if
#[schemars(deny_unknown_fields)]
is truly what you need here, given the PR’s stated aim of handling extra keys.
830-841
: Mapping logic confirmed.The loop-based copy from the old
StrictModeMultivectorConfig
to the new output struct looks correct.
843-849
: Straightforward new output struct.
StrictModeMultivectorOutput
is aligned withStrictModeMultivector
, preserving backward compatibility as intended.
850-856
: Correct field forwarding.The
From
implementation forStrictModeMultivector
copies themax_vectors
field directly, maintaining consistency across conversions.
874-874
: Minor documentation improvement.Changing “eg.” to “e.g.” clarifies the comment for retrieval. No further issues here.
878-878
: Minor documentation improvement.Similarly clarifies doc comments for updates. Looks fine.
1011-1089
: Extensive new output struct.
StrictModeConfigOutput
is thorough and matches the fields ofStrictModeConfig
. Ensure regular updates of doc comments to reflect the new usage.
1091-1135
: ComprehensiveFrom
conversion.This mapping cleanly handles nested structures, providing a solid boundary between the input config and the new output struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra inputs are not permitted
I'd argue that pydantic should ignore unknown field instead.
Working on fixing the associated Python client PR before merging this one to make sure it does not break anything. I am currently chasing down an infinite recursion in the |
I am merging this PR because I have been able to show that:
|
Separate the whole struct tree of
StrictModeConfig
used in theCollectionInfo
in order to enable the Python REST client to to tag those types as accepting extra keys during deserialization.This is required to keep the backward compatibility between the Python client 1.13.2 and the upcoming 1.14.0 version.
We have used a similar pattern in the past for
OptimizerConfig
andOptimizerConfigDiff
for the same issue.As a final test I validated that the new output classes are tagged properly in the client generation