
Separate StrictModeConfig output structure #6114

Merged: 5 commits merged into dev from strict-mode-config-output on Mar 6, 2025

@agourlay (Member) commented Mar 5, 2025

Separate the whole struct tree of StrictModeConfig used in CollectionInfo so that the Python REST client can tag those types as accepting extra keys during deserialization.

This is required to keep backward compatibility between Python client 1.13.2 and the upcoming 1.14.0 version.

We have used a similar pattern in the past for OptimizerConfig and OptimizerConfigDiff for the same issue.

As a final test, I validated that the new output classes are tagged properly in the client generation.
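The input/output separation described above can be sketched as a pair of Rust structs: an input type that enforces the minimum of 1 used by the input schema, and an output type without that constraint, built from the input via From. This is a minimal, std-only sketch under stated assumptions: the real qdrant types have many more fields and use derive-based serde/schemars/validation, the two fields and the hand-written validate method here are hypothetical simplifications, and the Python-side tagging happens in the client generator, which is not shown.

```rust
/// Input type: what clients send when creating or updating a collection.
/// Hypothetical simplification of the real StrictModeConfig.
#[derive(Debug, Clone, PartialEq)]
pub struct StrictModeConfig {
    pub enabled: Option<bool>,
    pub read_rate_limit: Option<usize>,
}

impl StrictModeConfig {
    /// Rejects a rate limit of 0, mirroring the `minimum: 1` of the input schema.
    pub fn validate(&self) -> Result<(), String> {
        match self.read_rate_limit {
            Some(0) => Err("read_rate_limit must be at least 1".to_string()),
            _ => Ok(()),
        }
    }
}

/// Output type: what the server returns inside CollectionInfo.
/// It carries no validation, and being a distinct type lets the client
/// generator treat it differently from the input type.
#[derive(Debug, Clone, PartialEq)]
pub struct StrictModeConfigOutput {
    pub enabled: Option<bool>,
    pub read_rate_limit: Option<usize>,
}

impl From<StrictModeConfig> for StrictModeConfigOutput {
    fn from(config: StrictModeConfig) -> Self {
        // Destructuring makes a newly added input field a compile error here
        // until it is also handled in the output type.
        let StrictModeConfig { enabled, read_rate_limit } = config;
        StrictModeConfigOutput { enabled, read_rate_limit }
    }
}
```

Usage: validate a StrictModeConfig on the way in, and convert it with StrictModeConfigOutput::from only when building a response.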

coderabbitai bot commented Mar 5, 2025

📝 Walkthrough

The changes update strict mode configuration handling across the API and related code. In the OpenAPI specification, the schemas used in responses get an "Output" suffix, and their minimum values for rate limits and maximum counts are relaxed from 1 to 0. In the gRPC conversion module, new implementations convert between the original strict mode configuration types and the new output types, including the sparse and multivector settings. The collection module is updated accordingly: the strict mode configuration field in the collection configuration struct now uses the new output type, and the conversion logic is adjusted to match. New output structures for the sparse and multivector strict mode settings are introduced, with conversion traits implemented to map data between the two representations.

Possibly related PRs

  • Simplify StrictModeConfig hashing #6112: touches the same StrictModeConfig struct for which this PR introduces the StrictModeConfigOutput counterpart and its conversions; #6112 simplifies the hashing implementation of the original struct.
  • Remove validation on CollectionInfo response type #6107: also changes the strict_mode_config field of CollectionConfig (its validation attributes), the same field this PR switches to StrictModeConfigOutput.
  • Anonymize ShardKey in ReplicaSetTelemetry #6115: also touches StrictModeConfig and its anonymization, overlapping with the strict mode schema changes in this PR.

Suggested reviewers

  • timvisee

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 237a972 and 9d52667.

📒 Files selected for processing (4)
  • docs/redoc/master/openapi.json (12 hunks)
  • lib/collection/src/collection/mod.rs (2 hunks)
  • lib/collection/src/config.rs (1 hunks)
  • lib/collection/src/telemetry.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
🔇 Additional comments (16)
lib/collection/src/config.rs (2)

207-208: JsonSchema derive has been removed from CollectionConfigInternal.

CollectionConfigInternal no longer needs the JsonSchema derive, because the telemetry output now exposes the separate CollectionConfigTelemetry type instead (see telemetry.rs below). This keeps the internal struct out of the generated schema, so only the output types, which the Python REST client can tag as accepting extra keys, are exposed to clients.


220-220: Field uses StrictModeConfig instead of StrictModeConfigOutput.

This field still uses the internal StrictModeConfig type, which is appropriate since this is an internal representation. The conversion to StrictModeConfigOutput will happen later when generating the API responses, as implemented in the telemetry module.

lib/collection/src/telemetry.rs (5)

5-5: Import updated to use StrictModeConfigOutput.

The import has been correctly updated to use StrictModeConfigOutput instead of StrictModeConfig, which aligns with the PR objectives to restructure the strict mode configuration.


19-19: Field type updated to use new CollectionConfigTelemetry.

The config field in CollectionTelemetry is now using the new CollectionConfigTelemetry type instead of CollectionConfigInternal, creating a proper separation between internal and output/API representations.


52-64: Anonymize implementation updated for CollectionConfigTelemetry.

The Anonymize implementation has been correctly updated to work with the new CollectionConfigTelemetry type, ensuring that telemetry data is properly anonymized before being sent externally.


86-98: New CollectionConfigTelemetry struct with StrictModeConfigOutput.

This new struct is a key component of the restructuring, serving as an output-specific representation that uses StrictModeConfigOutput. This allows the Python client to recognize and accept additional fields during deserialization.


100-121: Conversion implementation from internal to output config.

The From trait implementation enables seamless conversion from CollectionConfigInternal to CollectionConfigTelemetry, including the conversion of StrictModeConfig to StrictModeConfigOutput. This is crucial for maintaining backward compatibility.

Line 117 specifically handles the conversion of the strict_mode_config field, ensuring it's properly transformed to the output type while maintaining the Option wrapper.
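The conversion described in the last two comments boils down to mapping the optional strict-mode field through From while copying the rest of the config. A minimal sketch, assuming simplified stand-ins for all four types; shard_number is a hypothetical placeholder for the other config fields (the real CollectionConfigInternal also holds params, HNSW, optimizer, WAL and quantization configs).

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct StrictModeConfig {
    pub max_query_limit: Option<usize>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StrictModeConfigOutput {
    pub max_query_limit: Option<usize>,
}

impl From<StrictModeConfig> for StrictModeConfigOutput {
    fn from(config: StrictModeConfig) -> Self {
        StrictModeConfigOutput { max_query_limit: config.max_query_limit }
    }
}

/// Internal representation used inside the collection (simplified).
#[derive(Debug, Clone)]
pub struct CollectionConfigInternal {
    pub shard_number: u32, // stand-in for the remaining config fields
    pub strict_mode_config: Option<StrictModeConfig>,
}

/// Output representation exposed through telemetry (simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct CollectionConfigTelemetry {
    pub shard_number: u32,
    pub strict_mode_config: Option<StrictModeConfigOutput>,
}

impl From<CollectionConfigInternal> for CollectionConfigTelemetry {
    fn from(config: CollectionConfigInternal) -> Self {
        let CollectionConfigInternal { shard_number, strict_mode_config } = config;
        CollectionConfigTelemetry {
            shard_number,
            // Option::map leaves None untouched and converts the Some payload.
            strict_mode_config: strict_mode_config.map(StrictModeConfigOutput::from),
        }
    }
}
```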

lib/collection/src/collection/mod.rs (2)

55-55: Import updated to include CollectionConfigTelemetry.

The import statement has been correctly updated to include the new CollectionConfigTelemetry type, enabling its use in the module.


788-788: Telemetry data now uses CollectionConfigTelemetry.

The get_telemetry_data method now correctly uses CollectionConfigTelemetry::from to convert the internal config to the telemetry config. This ensures that when telemetry data is sent externally, it uses the output-specific config format that allows for backward compatibility with the Python client.

docs/redoc/master/openapi.json (7)

6707-6707: Updated schema reference to use new output structure.

The StrictModeConfig reference has been changed to StrictModeConfigOutput in the CollectionConfig schema. This change is aligned with the PR's objective to restructure StrictModeConfig for better compatibility with the Python REST client.


7256-7385: Renamed StrictModeConfig to StrictModeConfigOutput with relaxed constraints.

The renamed StrictModeConfigOutput schema includes several key changes:

  1. Changed minimum values for read_rate_limit and write_rate_limit from 1 to 0
  2. Changed minimum value for max_points_count from 1 to 0
  3. Updated property descriptions for unindexed filtering fields

Input is still validated against the original StrictModeConfig (see below), so these relaxed minimums only affect how responses are described; they do not widen the set of configurations the server accepts.


7366-7367: Updated nested configuration references to use Output suffix.

The references to nested configuration schemas have been updated to use their Output-suffixed variants:

  • StrictModeMultivectorConfigStrictModeMultivectorConfigOutput
  • StrictModeSparseConfigStrictModeSparseConfigOutput

This is consistent with the overall approach of restructuring these configurations for better client compatibility.

Also applies to: 7377-7378


7386-7403: New StrictModeMultivectorConfigOutput and StrictModeMultivectorOutput schemas.

The new schema for multivector configuration is structured to allow for more flexibility:

  1. StrictModeMultivectorConfigOutput as a wrapper for the configuration map
  2. StrictModeMultivectorOutput for individual vector configurations
  3. Relaxed minimum value for max_vectors, from 1 to 0

This approach follows the pattern used in OptimizerConfig, as mentioned in the PR description.


7404-7421: New StrictModeSparseConfigOutput and StrictModeSparseOutput schemas.

The new schema for sparse vector configuration follows the same pattern:

  1. StrictModeSparseConfigOutput as a wrapper for the configuration map
  2. StrictModeSparseOutput for individual sparse vector settings
  3. Relaxed minimum value for max_length, from 1 to 0

This change completes the restructuring of the strict mode configuration schemas.


9601-9766: Original StrictModeConfig schema maintained alongside new Output version.

The original StrictModeConfig schema is maintained in the API, allowing for backward compatibility. Key differences from the new Output version include:

  1. Stricter minimum values (1 instead of 0) for read_rate_limit, write_rate_limit, and max_points_count
  2. References to the original nested config types instead of the new Output types

Keeping both versions means updates are still validated against the stricter input schema, while responses are described by the Output versions.


11479-11488: Updated schema reference in CollectionConfigTelemetry.

The strict_mode_config property in the CollectionConfigTelemetry schema has been updated to use the new StrictModeConfigOutput schema, ensuring consistent usage of the new output structure throughout the API.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/api/src/grpc/conversions.rs (1)

2004-2047: Converting StrictModeConfigOutput into RPC-level StrictModeConfig.

Downcasting from usize to u32 or from f64 to f32 may lead to minor precision or range constraints. If there is a possibility of very large numbers or high precision, consider documenting or validating the permissible ranges.
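To make the concern concrete: a plain `as` cast silently truncates integers and turns out-of-range floats into infinity, whereas `u32::try_from` reports values that do not fit. A std-only sketch of saturating alternatives; the helper names are hypothetical and not part of the qdrant codebase, and whether to clamp, reject, or merely document the range is a design choice the review leaves open.

```rust
use std::convert::TryFrom;

/// Narrow a usize to u32, clamping values that do not fit
/// instead of truncating them as `value as u32` would.
pub fn usize_to_u32_saturating(value: usize) -> u32 {
    u32::try_from(value).unwrap_or(u32::MAX)
}

/// Narrow an f64 to f32, clamping to the finite f32 range instead of
/// letting `as` produce infinity. In-range values are still rounded to
/// the nearest f32, and NaN stays NaN.
pub fn f64_to_f32_clamped(value: f64) -> f32 {
    value.clamp(f32::MIN as f64, f32::MAX as f64) as f32
}
```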

lib/segment/src/types.rs (3)

736-742: Consider reintroducing validation
If you want to ensure no invalid values are exposed (e.g., minimum size constraints), you could add a validation attribute here (similar to the original StrictModeSparse).


810-816: Revisit optional validation
If you want to enforce a minimum number of vectors for consistency, consider adding a validation check in this output struct as well.


953-1078: Enhance consistency with other output structs
You might consider adding #[schemars(deny_unknown_fields)] here, as done in the sparse and multivector output structs, to maintain uniform schema rules for strict output objects. Otherwise, everything looks fine.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f73b0c1 and c24575c.

📒 Files selected for processing (5)
  • docs/redoc/master/openapi.json (9 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/collection/src/operations/conversions.rs (2 hunks)
  • lib/collection/src/operations/types.rs (3 hunks)
  • lib/segment/src/types.rs (4 hunks)
🔇 Additional comments (25)
lib/collection/src/operations/conversions.rs (2)

24-24: Import update to support the new StrictModeConfigOutput type

The import statement has been updated to include StrictModeConfigOutput instead of the previous StrictModeConfig, supporting the restructuring needed for Python REST client compatibility.


2096-2096: Updated strict mode config conversion to use new output type

This change correctly implements the PR objective of restructuring the StrictModeConfig used within CollectionInfo. By replacing the earlier StrictModeConfig::from with StrictModeConfigOutput::from, the code now uses the new output structure that allows the Python REST client to recognize these types as capable of accepting additional keys during deserialization.

This implementation follows the same pattern that was previously used for OptimizerConfig and OptimizerConfigDiff to maintain backward compatibility between Python client versions.

docs/redoc/master/openapi.json (8)

6707-6707: Schema reference updated to use the new output type.

The reference in CollectionConfig has been updated to use the new StrictModeConfigOutput schema instead of the original StrictModeConfig. This is in line with the PR objective to restructure the StrictModeConfig for better handling by the Python REST client.


7256-7385: New schema StrictModeConfigOutput created for backward compatibility.

This new schema mirrors the structure of the original StrictModeConfig; the visible difference is that several minimum values are relaxed from 1 to 0. Backward compatibility between client versions comes from this being a separate output type, which the Python client can tag as accepting extra keys during deserialization.

Key changes:

  • read_rate_limit: minimum changed from 1 to 0
  • write_rate_limit: minimum changed from 1 to 0
  • max_points_count: minimum changed from 1 to 0

These changes align with similar approaches previously implemented for OptimizerConfig and OptimizerConfigDiff as mentioned in the PR objectives.


7386-7392: Renamed schema to StrictModeMultivectorConfigOutput with updated reference.

This change is part of the overall restructuring pattern, creating dedicated output schemas with references to their corresponding output components, in this case StrictModeMultivectorOutput.


7392-7403: New schema StrictModeMultivectorOutput replaces StrictModeMultivector.

Similar to the main StrictModeConfigOutput change, this specialized multivector configuration output schema has been created with a relaxed minimum value for max_vectors (changed from 1 to 0), providing more flexibility for client deserialization.


7404-7410: Renamed schema to StrictModeSparseConfigOutput with updated reference.

This follows the consistent pattern of creating dedicated output schemas with references to their corresponding output components, in this case StrictModeSparseOutput.


7410-7421: New schema StrictModeSparseOutput replaces StrictModeSparse.

This specialized sparse configuration output schema has been created with a relaxed minimum value for max_length (changed from 1 to 0), maintaining the consistent pattern of providing more flexibility in the output schemas.


9601-9766: Original StrictModeConfig schema maintained for backward compatibility.

The original StrictModeConfig schema is retained alongside the new output schemas. This dual schema approach is a good practice for maintaining backward compatibility:

  1. Input validation can still use the stricter requirements of the original schema
  2. Output serialization can use the more flexible Output version
  3. Existing client code can continue to work with the original schema

The original schema maintains the stricter minimum values (1 instead of 0) for:

  • read_rate_limit
  • write_rate_limit
  • max_points_count
  • max_vectors (in StrictModeMultivector)
  • max_length (in StrictModeSparse)

9839-9847: UpdateCollection correctly references the original StrictModeConfig.

For update operations, the reference to the original StrictModeConfig schema is maintained. This is appropriate since updates should still adhere to the stricter validation rules.

lib/collection/src/operations/types.rs (3)

34-35: Import usage looks consistent.

The addition of StrictModeConfigOutput in these import lines aligns with the subsequent changes in this file where the new type is leveraged. No issues found.


201-201: Changing the field type to StrictModeConfigOutput.

Switching from Option<StrictModeConfig> to Option<StrictModeConfigOutput> is logically consistent with the updated conversion logic. Ensure callers properly handle any backward compatibility concerns if needed.


223-223: Correct mapping for strict_mode_config.

Referencing the StrictModeConfigOutput through .map(StrictModeConfigOutput::from) properly aligns with the updated struct. This ensures the new schema is consistently applied.

lib/api/src/grpc/conversions.rs (6)

1-1: Additional imports introduced.

The inclusion of BTreeMap and HashSet indicates usage in the newly added conversion functions. No immediate issues identified.


1985-2003: Conversion from StrictModeSparseConfigOutput to StrictModeSparseConfig.

This mapping is straightforward, translating max_length into a u64 field. Confirm that the final type accommodates all use cases, especially regarding potential overflow or architecture differences for usize.


2049-2095: Reverse conversion from StrictModeConfig to StrictModeConfigOutput.

The code neatly mirrors the prior conversion. The approach is consistent, ensuring that the output aligns well with the new strict mode schema.


2096-2111: Converting StrictModeMultivectorConfig to StrictModeMultivectorConfigOutput.

The mapping from u64 to usize for max_vectors is logically consistent across the codebase. No further issues noted.


2131-2148: Translating StrictModeMultivectorConfigOutput back into StrictModeMultivectorConfig.

This is the symmetrical conversion for multivector fields. The iteration over the config looks good, and key-value pairs are mapped correctly.


2150-2164: Generating StrictModeSparseConfigOutput from StrictModeSparseConfig.

The conversion loop and assignment to the BTreeMap are clear. Everything appears consistent with the rest of the strict mode transformations.
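The per-vector conversions in this file share one shape: iterate over the map from vector name to settings and convert each value. A std-only sketch, assuming (hypothetically) a HashMap on the internal side and a BTreeMap with a u64 max_length on the wire side; the real types, field sets, and map kinds may differ. Collecting into a BTreeMap also yields a deterministic key order.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical internal per-vector setting.
#[derive(Debug, Clone, PartialEq)]
pub struct StrictModeSparse {
    pub max_length: Option<usize>,
}

/// Hypothetical wire-level per-vector setting.
#[derive(Debug, Clone, PartialEq)]
pub struct StrictModeSparseWire {
    pub max_length: Option<u64>,
}

/// Convert every entry, keeping the vector name as the key.
pub fn to_wire(
    config: &HashMap<String, StrictModeSparse>,
) -> BTreeMap<String, StrictModeSparseWire> {
    config
        .iter()
        .map(|(name, sparse)| {
            let wire = StrictModeSparseWire {
                // usize -> u64 is lossless on 32-bit and 64-bit targets.
                max_length: sparse.max_length.map(|len| len as u64),
            };
            (name.clone(), wire)
        })
        .collect()
}
```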

lib/segment/src/types.rs (6)

729-735: Looks good for flattened output structure
No concerns here; the overall approach of flattening looks correct.


743-755: Straightforward conversion logic
Implementation correctly iterates over each entry and converts to the output type without side effects.


756-762: Clear and concise transformation
The From implementation neatly copies the relevant field. No issues identified.


790-796: Same pattern, no concerns
Mirrors the sparse config output structure exactly, providing a consistent approach.


797-809: Conversion logic appears correct
Each key-value pair is mapped nicely to the new output.


817-823: Simple and correct
The direct conversion from StrictModeMultivector to StrictModeMultivectorOutput is properly handled.

@generall generall force-pushed the strict-mode-config-output branch from c24575c to 237a972 on March 5, 2025 19:21
@@ -229,7 +229,7 @@ impl From<CollectionConfigInternal> for CollectionConfig {
             optimizer_config,
             wal_config: Some(wal_config),
             quantization_config,
-            strict_mode_config,
+            strict_mode_config: strict_mode_config.map(StrictModeConfigOutput::from),
Member

CollectionConfigInternal is used in collection telemetry and therefore should either be split into an Output structure or contain StrictModeConfigOutput.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (9)
lib/api/src/grpc/conversions.rs (5)

2004-2047: Watch out for potential truncation when casting usize to u32 or f64 to f32.

Several fields (e.g., max_query_limit, max_timeout, search_max_oversampling) are converted into narrower types (usize to u32, f64 to f32). If these fields can exceed u32::MAX or need full f64 precision, consider clamping or adding logic to avoid unexpected overflow or truncation.


2049-2093: Be cautious with platform differences when widening integers.

Here, the code converts Option<u32> and other fields into Option<usize>. This is lossless wherever usize is at least 32 bits wide; values could only be truncated on unusual targets with a narrower usize. Adding boundary checks or documentation might help future maintainers.


2096-2110: Check for safe bounds when converting u64 to usize.

The loop inserts max_vectors by casting it from Option<u64> to Option<usize>. This is lossless on 64-bit systems, but on 32-bit targets values above usize::MAX would be truncated; a check or clamp for extremely large values would make the conversion fail safely instead.


2150-2164: Handle max_length consistently across conversions.

This final block casts from Option<u64> to Option<usize>, mirroring the earlier pattern. Consider a consistent approach across all conversions (clamping, returning errors, or thorough documentation) to handle unusual or invalid data safely.


1985-2002:

❓ Verification inconclusive

Consider potential integer conversion edge cases when casting max_length from usize to u64.

This conversion is lossless on 32-bit and 64-bit architectures, where usize is at most 64 bits wide; overflow is only possible on a hypothetical platform with a wider usize. A quick check or clamp might still be prudent if untrusted or large inputs are expected.

Below is a script you can use to locate all references to StrictModeSparseConfigOutput across the codebase to confirm usage and ensure safe handling of large values:


🏁 Script executed:

#!/bin/bash
# Description: Verify usage of StrictModeSparseConfigOutput to check for potential large or untrusted inputs
rg -A 10 'StrictModeSparseConfigOutput'

Length of output: 7132


Action Required: Revisit the casting of max_length for potential overflow risks.

The current code performs a direct cast from usize to u64 while mapping the configuration:

max_length: config.max_length.map(|i| i as u64)

This is lossless on standard 32-bit and 64-bit architectures; the conversion could only overflow on a hypothetical platform where usize is wider than 64 bits. To rule this out explicitly, consider a check or clamp that ensures max_length does not exceed the maximum value representable by u64.

  • File: lib/api/src/grpc/conversions.rs (lines 1985–2002)
  • Suggestion: Add bounds-checking or clamping before casting, and ensure that the reverse conversion (casting back to usize) remains safe and consistent with API expectations.
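A checked variant of the cast, along the lines suggested above: `u64::try_from` and `usize::try_from` return an error instead of wrapping, and `Option::transpose` turns the mapped `Option<Result<_, _>>` into a `Result<Option<_>, _>`. A minimal sketch; the function names and the String error type are hypothetical.

```rust
use std::convert::TryFrom;

/// Checked usize -> u64 for the outgoing direction.
pub fn max_length_to_wire(max_length: Option<usize>) -> Result<Option<u64>, String> {
    max_length
        .map(|len| {
            u64::try_from(len).map_err(|_| format!("max_length {len} does not fit in u64"))
        })
        .transpose()
}

/// Checked u64 -> usize for the incoming direction.
/// Unlike the outgoing direction, this can actually fail on 32-bit targets.
pub fn max_length_from_wire(max_length: Option<u64>) -> Result<Option<usize>, String> {
    max_length
        .map(|len| {
            usize::try_from(len).map_err(|_| format!("max_length {len} does not fit in usize"))
        })
        .transpose()
}
```

Returning an error rather than clamping surfaces bad input to the caller; clamping would be the alternative if the conversion must never fail.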
lib/segment/src/types.rs (4)

730-736: Validate the no-op anonymization logic.

This implementation copies the config field verbatim instead of actually removing any identifying data. If the intent is to truly anonymize fields, consider masking or removing sensitive values.


745-751: Check intended anonymization policy.

This method returns the same value for max_length, making the anonymization logic effectively a pass-through. Verify if this is the expected functionality when “anonymizing.”


790-795: No-op anonymization concern.

Similar to the sparse output, this leaves max_vectors as-is. Confirm whether retaining the original value meets your anonymization requirements.


986-1009: Superficial anonymization pass.

All fields (besides nested configs) are directly copied instead of being masked—confirm if you need deeper anonymization for sensitive fields.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c24575c and 237a972.

📒 Files selected for processing (5)
  • docs/redoc/master/openapi.json (9 hunks)
  • lib/api/src/grpc/conversions.rs (4 hunks)
  • lib/collection/src/operations/conversions.rs (2 hunks)
  • lib/collection/src/operations/types.rs (3 hunks)
  • lib/segment/src/types.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/collection/src/operations/conversions.rs
  • lib/collection/src/operations/types.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (22)
docs/redoc/master/openapi.json (8)

6707-6708: Schema reference update aligns with PR objective.

The change from StrictModeConfig to StrictModeConfigOutput in the CollectionConfig schema is consistent with the PR objective of restructuring the StrictModeConfig to allow Python clients to accept additional keys during deserialization.


7256-7385: New StrictModeConfigOutput schema appropriately supports the backward compatibility goal.

This new schema replaces the previous StrictModeConfig in the CollectionConfig response, while the original schema remains in use for input. The structure is unchanged apart from more lenient minimum values (changed from 1 to 0) for rate limits and counts; as a separate output type, it supports the backward compatibility goal for the Python REST client mentioned in the PR objectives.


7324-7325: Adjustment of minimum values in the output schema.

Changing the minimum values from 1 to 0 for read_rate_limit, write_rate_limit, and max_points_count only affects the response schema; input is still validated against the original StrictModeConfig. Backward compatibility with Python client 1.13.2 comes from having separate output types that the client can tag as accepting extra keys, not from the relaxed minimums themselves.

Also applies to: 7331-7332, 7345-7346


7386-7421: Well-structured nested output schemas for multivector and sparse configurations.

The new output schemas for multivector and sparse configurations maintain the same structure as their non-output counterparts but with adjusted minimum values. This approach is consistent with similar changes made for OptimizerConfig and OptimizerConfigDiff as mentioned in the PR objectives, ensuring a consistent pattern throughout the API.


7399-7400: Appropriate relaxation of nested schema constraints.

Changing the minimum values from 1 to 0 for max_vectors and max_length in the nested output schemas only affects how responses are described. Updates are still validated against the original StrictModeConfig, so the set of values the server accepts is unchanged.

Also applies to: 7417-7418


9601-9766: Original StrictModeConfig schema retained for compatibility.

Maintaining the original StrictModeConfig schema alongside the new StrictModeConfigOutput is a good practice for ensuring backward compatibility. This dual-schema approach allows the system to handle both incoming (client to server) and outgoing (server to client) data structures appropriately, which is crucial when evolving APIs that need to maintain compatibility with existing clients.


7279-7280: Updated field descriptions improve clarity.

The descriptions for unindexed_filtering_retrieve and unindexed_filtering_update have been refined to be more precise and informative, which enhances the documentation quality of the API.

Also applies to: 7284-7285


7366-7383: Properly updated references to nested schemas.

The schema references have been consistently updated to point to the new ...Output schema variants, ensuring coherence throughout the API structure. This attention to detail prevents reference errors and maintains the logical structure of the related schemas.

Also applies to: 7377-7383

lib/api/src/grpc/conversions.rs (1)

2131-2148: Add tests for large max_vectors values.

This conversion turns Option<usize> into Option<u64>. It is functionally correct on common systems, but tests with extreme values (such as usize::MAX, and u64::MAX in the reverse direction) would help ensure no hidden truncation bugs arise on platforms with a narrower usize.

Would you like a sample unit test added or an extended verification script?

lib/segment/src/types.rs (13)

738-743: Confirm handling of additional fields.

Using #[schemars(deny_unknown_fields)] together with #[serde(flatten)] may restrict rather than permit unknown top-level fields. Ensure this behavior aligns with the PR's goal of allowing extra keys in the JSON.


753-758: Struct usability check.

StrictModeSparseOutput mirrors StrictModeSparse with an optional max_length. This is consistent with the new output design.


760-771: Iterative conversion logic looks good.

The loop properly transforms each sparse config entry into the new output form, preserving keys without errors.


773-778: Straightforward field mapping.

The From conversion for StrictModeSparse into StrictModeSparseOutput is concise and correct.


815-820: Consistent pattern with sparse logic.

The multivector config’s anonymize() follows the same approach, recursing into inner fields.


823-829: Validate unknown fields approach.

Check if #[schemars(deny_unknown_fields)] is truly what you need here, given the PR’s stated aim of handling extra keys.


830-841: Mapping logic confirmed.

The loop-based copy from the old StrictModeMultivectorConfig to the new output struct looks correct.


843-849: Straightforward new output struct.

StrictModeMultivectorOutput is aligned with StrictModeMultivector, preserving backward compatibility as intended.


850-856: Correct field forwarding.

The From implementation for StrictModeMultivector copies the max_vectors field directly, maintaining consistency across conversions.


874-874: Minor documentation improvement.

Changing “eg.” to “e.g.” clarifies the comment for retrieval. No further issues here.


878-878: Minor documentation improvement.

Similarly clarifies doc comments for updates. Looks fine.


1011-1089: Extensive new output struct.

StrictModeConfigOutput is thorough and matches the fields of StrictModeConfig. Keep its doc comments in sync with StrictModeConfig as fields are added or changed.


1091-1135: Comprehensive From conversion.

This mapping cleanly handles nested structures, providing a solid boundary between the input config and the new output struct.

@timvisee (Member) left a comment

Extra inputs are not permitted

I'd argue that pydantic should ignore unknown fields instead.

@agourlay (Member, Author) commented Mar 6, 2025

Working on fixing the associated Python client PR before merging this one to make sure it does not break anything.

I am currently chasing down an infinite recursion in the inspection cache consistency GA step.
qdrant/qdrant-client#916 (comment)

@agourlay (Member, Author) commented Mar 6, 2025

I am merging this PR because I have been able to show that:

  • the infinite recursion is due to the new Expression type used in scoring and NOT the new Strict Mode Config output types
  • the infinite recursion can be fixed anyway (still needs Georges' validation) Regen REST model from DEV qdrant-client#919

@agourlay agourlay merged commit 293e2e9 into dev Mar 6, 2025
17 checks passed
@agourlay agourlay deleted the strict-mode-config-output branch March 6, 2025 14:36
3 participants