-
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change to readonly collection and cleanup #75
Conversation
WalkthroughThe pull request introduces modifications primarily to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (6)
src/Abstractions/src/IVectorCollection.cs (1)
66-68
: Add XML documentation for consistency.All other methods in this interface have XML documentation. Please add documentation for
SearchByMetadata
following the same pattern.+ /// <summary> + /// Searches for records based on metadata filters. + /// </summary> + /// <param name="filters">The metadata filters to apply to the search.</param> + /// <param name="cancellationToken">The cancellation token.</param> + /// <returns>A task representing the asynchronous operation. The task result contains the matching vectors.</returns> Task<IReadOnlyList<Vector>> SearchByMetadata( Dictionary<string, object> filters, CancellationToken cancellationToken = default);src/SemanticKernel/src/SemanticKernelMemoryStoreCollection.cs (1)
Line range hint
17-19
: Consider using JSON serialization for metadata.The current metadata serialization using string concatenation with "#" and "&" delimiters is fragile and could break if metadata values contain these characters. Additionally, this approach requires custom parsing logic in GetAsync.
Consider using JSON serialization instead:
- //TODO: review way to map metadata - if (item.Metadata != null) - metadata = string.Join("#", item.Metadata.Select(kv => kv.Key + "&" + kv.Value)); + if (item.Metadata != null) + metadata = System.Text.Json.JsonSerializer.Serialize(item.Metadata);And in GetAsync:
- if (record?.Metadata?.AdditionalMetadata != null) - metadata = record.Metadata.AdditionalMetadata - .Split('#') - .Select(part => part.Split('&')) - .ToDictionary(split => split[0], split => (object)split[1]); + if (record?.Metadata?.AdditionalMetadata != null) + metadata = System.Text.Json.JsonSerializer.Deserialize<Dictionary<string, object>>( + record.Metadata.AdditionalMetadata);src/Elasticsearch/src/ElasticsearchVectorCollection.cs (1)
Line range hint
107-111
: Add parameter validation for filters.While the return type change to
IReadOnlyList<Vector>
is good, consider adding parameter validation consistent with other methods in the class.Apply this diff to add parameter validation:
Task<IReadOnlyList<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken) { + filters = filters ?? throw new ArgumentNullException(nameof(filters)); throw new NotImplementedException(); }
src/InMemory/src/InMemoryVectorCollection.cs (1)
Line range hint
95-119
: Optimize performance and simplify implementation.The current implementation has some inefficiencies and can be simplified:
Task.Run
is unnecessary here as the operation is already running on the thread pool and doesn't block any UI thread.- The null check for vector.Metadata can be simplified using the null-conditional operator.
Consider this more efficient implementation:
- public async Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken = default) + public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken = default) { filters = filters ?? throw new ArgumentNullException(nameof(filters)); - var filteredVectors = await Task.Run(() => _vectors.Values.Where(vector => + var filteredVectors = _vectors.Values.Where(vector => { // Check if all filters match foreach (var filter in filters) { - object? metadataValue = null; - if (vector.Metadata != null && !vector.Metadata.TryGetValue(filter.Key, out metadataValue) || metadataValue == null) + if (!vector.Metadata?.TryGetValue(filter.Key, out var metadataValue) == true || metadataValue == null) { return false; } - else if (!metadataValue.Equals(filter.Value)) // Convert metadata value to string and compare + if (!metadataValue.Equals(filter.Value)) { return false; } } return true; - }).ToList(), cancellationToken).ConfigureAwait(false); + }).ToList(); - return filteredVectors; + return Task.FromResult<IReadOnlyList<Vector>>(filteredVectors); }This implementation:
- Removes unnecessary async/await and Task.Run
- Simplifies the null checks
- Maintains the immutability contract with IReadOnlyList
src/Sqlite/src/SqLiteVectorCollection.cs (1)
Line range hint
192-241
: Consider optimizing the SQL query construction.While the current implementation is functionally correct, there are a few potential improvements:
- Consider using a StringBuilder for query construction to improve performance when dealing with multiple filters
- The WHERE clause could be constructed more efficiently by pre-allocating the parameter names
Here's a suggested optimization:
- var query = $"SELECT id, vector, document FROM {Name}"; - - var whereClauses = new List<string>(); - int paramIndex = 0; - - foreach (var filter in filters) - { - // Validate filter.Key to prevent injection in JSON path - if (string.IsNullOrWhiteSpace(filter.Key) || !IsValidJsonKey(filter.Key)) - throw new ArgumentException($"Invalid filter key: {filter.Key}", nameof(filters)); - - var paramName = "@param" + paramIndex++; - whereClauses.Add($"json_extract(document, '$.Metadata.{filter.Key}') = {paramName}"); - command.Parameters.AddWithValue(paramName, filter.Value); - } - query += " WHERE " + string.Join(" AND ", whereClauses); + var queryBuilder = new StringBuilder($"SELECT id, vector, document FROM {Name} WHERE "); + var whereClauses = filters.Select((kvp, i) => + { + if (string.IsNullOrWhiteSpace(kvp.Key) || !IsValidJsonKey(kvp.Key)) + throw new ArgumentException($"Invalid filter key: {kvp.Key}", nameof(filters)); + + var paramName = $"@param{i}"; + command.Parameters.AddWithValue(paramName, kvp.Value); + return $"json_extract(document, '$.Metadata.{kvp.Key}') = {paramName}"; + }); + + queryBuilder.Append(string.Join(" AND ", whereClauses)); + command.CommandText = queryBuilder.ToString();This optimization:
- Uses StringBuilder for better performance with multiple filters
- Reduces allocations by using string interpolation only when necessary
- Makes the code more concise and easier to follow
src/Mongo/src/MongoVectorCollection.cs (1)
Line range hint
45-77
: Validaterequest.Embeddings
to prevent potential exceptionsCurrently, the method accesses
request.Embeddings.First()
without checking ifEmbeddings
is null or empty. This could lead to anInvalidOperationException
. Add a check to ensureEmbeddings
is not null and contains at least one element.Apply this change:
request = request ?? throw new ArgumentNullException(nameof(request)); + if (request.Embeddings == null || !request.Embeddings.Any()) + throw new ArgumentException("Embeddings must contain at least one embedding.", nameof(request)); settings ??= new VectorSearchSettings();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
- src/Abstractions/src/IVectorCollection.cs (1 hunks)
- src/Abstractions/src/PublicAPI.Unshipped.txt (2 hunks)
- src/Abstractions/src/VectorCollection.cs (2 hunks)
- src/Chroma/src/ChromaVectorCollection.cs (1 hunks)
- src/Elasticsearch/src/ElasticsearchVectorCollection.cs (1 hunks)
- src/InMemory/src/InMemoryVectorCollection.cs (1 hunks)
- src/Mongo/src/MongoVectorCollection.cs (2 hunks)
- src/OpenSearch/src/OpenSearchVectorCollection.cs (1 hunks)
- src/Postgres/src/PostgresVectorCollection.cs (1 hunks)
- src/SemanticKernel/src/SemanticKernelMemoryStoreCollection.cs (1 hunks)
- src/Sqlite/src/SqLiteVectorCollection.cs (1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Abstractions/src/PublicAPI.Unshipped.txt
[duplication] ~17-~17: Possible typo: you repeated a word
Context: ...g!, object!>! filters, System.Threading.CancellationToken cancellationToken = default(System.Threading.Cancellation...(ENGLISH_WORD_REPEAT_RULE)
🔇 Additional comments (13)
src/Abstractions/src/VectorCollection.cs (1)
1-1
: LGTM!The using directive is appropriately placed and necessary for the Regex functionality.
src/Abstractions/src/IVectorCollection.cs (1)
66-68
: LGTM! Good improvement to immutability.The change from
Task<List<Vector>>
toTask<IReadOnlyList<Vector>>
is a positive improvement that:
- Enforces immutability of the returned collection
- Aligns with the interface's existing pattern of returning read-only collections
- Follows best practices for API design
src/SemanticKernel/src/SemanticKernelMemoryStoreCollection.cs (1)
79-82
:⚠️ Potential issueImplementation required for SearchByMetadata method.
The method is currently throwing
NotImplementedException
while other vector collections have implemented this functionality. This could lead to runtime errors if the method is called.Let's examine how other collections have implemented this method:
Would you like me to propose an implementation based on other collections' approaches?
src/Elasticsearch/src/ElasticsearchVectorCollection.cs (1)
Line range hint
13-106
: Verify implementation timeline for async methods.All async methods contain commented-out implementation code but throw
NotImplementedException
. While this is acceptable during development, it could cause runtime issues if deployed to production.Would you like me to create a GitHub issue to track the implementation of these methods? This would help ensure these don't get overlooked and provide a clear timeline for completion.
✅ Verification successful
Implementation status is expected - no immediate action required
The unimplemented methods are part of a new Elasticsearch implementation that's currently under development. This is evidenced by:
- The git history shows this is a very recent addition with only 3 commits
- The implementation is part of a larger interface refactoring effort (changing to readonly collections)
- Other vector collection implementations (MongoDB, SQLite, etc.) have working implementations
- The commented code shows the planned implementation approach
The
NotImplementedException
throws are appropriate in this development stage as:
- The class is not yet being used in production (no direct usage found outside its own module)
- The implementation pattern follows a clear, staged development approach
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if there are any direct usages of these unimplemented methods # that might cause runtime exceptions echo "Searching for direct method calls..." rg -l "GetAsync|DeleteAsync|AddAsync|SearchAsync|IsEmptyAsync" --type cs \ --glob '!**/ElasticsearchVectorCollection.cs'Length of output: 792
Script:
#!/bin/bash # Let's check if this is part of a larger refactoring effort by examining the interface echo "Checking IVectorCollection interface..." cat src/Abstractions/src/IVectorCollection.cs # Let's also check if there are any tests that might be affected echo -e "\nChecking for related test files..." rg -l "ElasticsearchVectorCollection" --type cs # Let's see if this is a new implementation echo -e "\nChecking git history for this file..." git log --oneline -- src/Elasticsearch/src/ElasticsearchVectorCollection.csLength of output: 3918
src/InMemory/src/InMemoryVectorCollection.cs (1)
Line range hint
95-119
: Consider thread-safety improvements for concurrent operations.While
ConcurrentDictionary
provides thread-safe operations, the LINQ query might observe partial updates if the collection is modified during enumeration. For stronger consistency guarantees, consider taking a snapshot of the values at the start of the operation.Consider this approach for better thread safety:
public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken = default) { filters = filters ?? throw new ArgumentNullException(nameof(filters)); + // Take a snapshot of values for consistent enumeration + var snapshot = _vectors.Values.ToArray(); - var filteredVectors = _vectors.Values.Where(vector => + var filteredVectors = snapshot.Where(vector =>src/Postgres/src/PostgresVectorCollection.cs (3)
148-153
: LGTM! Good optimization with withEmbeddings parameter.The explicit
withEmbeddings: false
parameter optimizes the query by not fetching unused embedding data.
134-145
: 🛠️ Refactor suggestionVerify the implementation of IsValidJsonKey method.
The validation logic is good, but we need to ensure that
IsValidJsonKey
correctly validates JSON property names according to the JSON specification (RFC 8259).#!/bin/bash # Search for IsValidJsonKey implementation ast-grep --pattern 'IsValidJsonKey($_) { $$$ }'Consider adding XML documentation for the allowed characters to help API consumers understand the validation rules:
+ /// <summary> + /// Validates if the given key is a valid JSON property name. + /// </summary> + /// <param name="key">The key to validate.</param> + /// <returns>True if the key is valid; otherwise, false.</returns> + /// <remarks> + /// Valid JSON property names must not contain control characters or whitespace. + /// </remarks> private bool IsValidJsonKey(string key)
127-128
: Verify consumers of this API for breaking changes.The change from
Task<List<Vector>>
toTask<IReadOnlyList<Vector>>
improves immutability but could break existing code that modifies the returned list.✅ Verification successful
The previous command had syntax issues. Let's search for method usages and interface implementations to understand the impact better. The
-A 2
flag will show 2 lines after the match to see how the result is used.
Return type change is safe to proceed
The change from
Task<List<Vector>>
toTask<IReadOnlyList<Vector>>
is part of a coordinated interface change across the codebase. Evidence shows:
- The
IVectorCollection
interface inAbstractions
definesSearchByMetadata
withIReadOnlyList<Vector>
return type- All implementations (Postgres, SQLite, MongoDB, InMemory, etc.) consistently use
IReadOnlyList<Vector>
- Integration tests show no direct list modifications, only count verification
- The change is documented in
PublicAPI.Unshipped.txt
indicating proper API tracking🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for direct usage of SearchByMetadata where the result is modified rg -l "SearchByMetadata.*\.(Add|Remove|Clear|Sort)"Length of output: 224
Script:
#!/bin/bash # Search for usages of SearchByMetadata method rg -A 2 "SearchByMetadata\(" # Also search for interface implementation references rg -l "IVectorCollection|PostgresVectorCollection"Length of output: 5220
src/Sqlite/src/SqLiteVectorCollection.cs (2)
192-193
: LGTM! Good change to IReadOnlyList.The change from
Task<List<Vector>>
toTask<IReadOnlyList<Vector>>
is a good improvement that:
- Enforces immutability of the returned collection
- Aligns with interface changes across other vector collection implementations
- Maintains backward compatibility since List implements IReadOnlyList
Line range hint
192-241
: Verify IsValidJsonKey implementation.The code uses
IsValidJsonKey
which was moved to the baseVectorCollection
class. Let's verify its implementation and accessibility.✅ Verification successful
IsValidJsonKey implementation is properly inherited and used correctly
The verification confirms:
IsValidJsonKey
is implemented in the baseVectorCollection
class (src/Abstractions/src/VectorCollection.cs) as a protected static methodSqLiteVectorCollection
properly inherits fromVectorCollection
and has access to this method- The implementation validates that the key contains only letters, numbers, and underscores using regex
^\w+$
- The usage in
SqLiteVectorCollection
is correct, combining it with a null/whitespace check before using the key in SQL queries🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify IsValidJsonKey implementation in VectorCollection base class # Test: Search for IsValidJsonKey implementation echo "Searching for IsValidJsonKey implementation..." ast-grep --pattern 'class VectorCollection { $$$ protected static bool IsValidJsonKey($_) { $$$ } $$$ }' # Test: Verify no local implementation exists echo "Verifying no local implementation exists..." rg -p "IsValidJsonKey.*\{" src/Sqlite/src/SqLiteVectorCollection.csLength of output: 388
Script:
#!/bin/bash # Let's try a different approach to find the IsValidJsonKey implementation # Search for the base class file first echo "Searching for VectorCollection class file..." fd "VectorCollection.cs" --type f # Search for IsValidJsonKey in all .cs files echo "Searching for IsValidJsonKey implementation across all files..." rg -p "IsValidJsonKey" --type cs -A 5 # Check inheritance of SQLiteVectorCollection echo "Checking SQLiteVectorCollection class definition..." rg -p "class SqLiteVectorCollection" --type cs -A 2Length of output: 2272
src/Abstractions/src/PublicAPI.Unshipped.txt (2)
17-17
: Breaking change: Consider migration guidance for SearchByMetadata.The return type change from
Task<List<Vector>>
toTask<IReadOnlyList<Vector>>
is a breaking change that improves immutability. While this aligns with the PR objective, consumers might need to update their code.Let's check for direct List usage in the codebase:
Consider adding migration guidance in the changelog or documentation to help users adapt to this breaking change. Common migration patterns include:
- Using
.ToList()
if mutation is required- Updating consumer code to work with IReadOnlyList interface
🧰 Tools
🪛 LanguageTool
[duplication] ~17-~17: Possible typo: you repeated a word
Context: ...g!, object!>! filters, System.Threading.CancellationToken cancellationToken = default(System.Threading.Cancellation...(ENGLISH_WORD_REPEAT_RULE)
93-93
: Review the accessibility of IsValidJsonKey method.The addition of
IsValidJsonKey
as a public static method onVectorCollection
expands the public API surface.Let's check its current usage pattern:
Consider whether this utility method should be:
- Internal/protected if it's only meant for internal validation
- Moved to a dedicated validation utility class if it's truly meant for public consumption
- Documented with XML comments to clarify its purpose and validation rules
src/Mongo/src/MongoVectorCollection.cs (1)
Line range hint
79-105
: Method signature update and implementation are appropriateChanging the return type to
IReadOnlyList<Vector>
enhances immutability and aligns with the codebase updates. The method correctly constructs the filter and retrieves the results.
protected static bool IsValidJsonKey(string input) | ||
{ | ||
// Only allow letters, numbers, and underscores | ||
return Regex.IsMatch(input, @"^\w+$"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider enhancing the JSON key validation.
While the implementation is correct, consider these improvements for robustness:
- Add null check
- Add length limits
- Add XML documentation
- Make the regex pattern a constant
Here's a suggested implementation:
+ private const string JsonKeyPattern = @"^\w+$";
+
+ /// <summary>
+ /// Validates if the input string is a valid JSON key.
+ /// </summary>
+ /// <param name="input">The string to validate.</param>
+ /// <returns>True if the input consists only of letters, numbers, and underscores, and is not empty.</returns>
+ /// <exception cref="ArgumentNullException">Thrown when input is null.</exception>
protected static bool IsValidJsonKey(string input)
{
+ ArgumentNullException.ThrowIfNull(input);
+
+ if (input.Length == 0 || input.Length > 256) // adjust max length as needed
+ return false;
+
// Only allow letters, numbers, and underscores
- return Regex.IsMatch(input, @"^\w+$");
+ return Regex.IsMatch(input, JsonKeyPattern);
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
protected static bool IsValidJsonKey(string input) | |
{ | |
// Only allow letters, numbers, and underscores | |
return Regex.IsMatch(input, @"^\w+$"); | |
} | |
private const string JsonKeyPattern = @"^\w+$"; | |
/// <summary> | |
/// Validates if the input string is a valid JSON key. | |
/// </summary> | |
/// <param name="input">The string to validate.</param> | |
/// <returns>True if the input consists only of letters, numbers, and underscores, and is not empty.</returns> | |
/// <exception cref="ArgumentNullException">Thrown when input is null.</exception> | |
protected static bool IsValidJsonKey(string input) | |
{ | |
ArgumentNullException.ThrowIfNull(input); | |
if (input.Length == 0 || input.Length > 256) // adjust max length as needed | |
return false; | |
// Only allow letters, numbers, and underscores | |
return Regex.IsMatch(input, JsonKeyPattern); | |
} |
Task<IReadOnlyList<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken) | ||
{ | ||
throw new NotImplementedException(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation needed for SearchByMetadata
.
While the method signature has been updated to match the interface changes, throwing NotImplementedException
is not production-ready. This method needs to be implemented to maintain consistency with other IVectorCollection
implementations.
Consider implementing the method using OpenSearch's query capabilities. Here's a suggested implementation structure:
- Task<IReadOnlyList<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
- {
- throw new NotImplementedException();
- }
+ async Task<IReadOnlyList<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
+ {
+ var response = await client.SearchAsync<VectorRecord>(s => s
+ .Index(Name)
+ .Query(q => BuildMetadataQuery(q, filters)), cancellationToken)
+ .ConfigureAwait(false);
+
+ if (!response.IsValid)
+ {
+ throw new InvalidOperationException($"Failed to search collection '{Name}' by metadata. DebugInformation: {response.DebugInformation}");
+ }
+
+ return response.Documents
+ .Where(vectorRecord => !string.IsNullOrWhiteSpace(vectorRecord.Text))
+ .Select(vectorRecord => new Vector
+ {
+ Text = vectorRecord.Text ?? string.Empty,
+ Id = vectorRecord.Id,
+ Metadata = vectorRecord.Metadata?.ToDictionary(x => x.Key, x => x.Value),
+ Embedding = vectorRecord.Vector,
+ })
+ .ToList()
+ .AsReadOnly();
+ }
+
+ private static QueryContainer BuildMetadataQuery(QueryContainerDescriptor<VectorRecord> q, Dictionary<string, object> filters)
+ {
+ // Build a bool query combining all metadata filters
+ return q.Bool(b => b.Must(
+ filters.Select(kvp =>
+ q.Term(t => t.Field($"metadata.{kvp.Key}").Value(kvp.Value))
+ )
+ ));
+ }
Committable suggestion was skipped due to low confidence.
public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken) | ||
{ | ||
throw new NotSupportedException("Chroma doesn't support collection metadata"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding documentation and removing async overhead.
The signature change to IReadOnlyList<Vector>
looks good. However, since this method immediately throws an exception, consider these improvements:
+ /// <summary>
+ /// Not supported in ChromaDB as it doesn't support collection metadata.
+ /// </summary>
+ /// <exception cref="NotSupportedException">Always thrown as this operation is not supported.</exception>
- public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
+ public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken = default)
{
- throw new NotSupportedException("Chroma doesn't support collection metadata");
+ return Task.FromException<IReadOnlyList<Vector>>(
+ new NotSupportedException("Chroma doesn't support collection metadata"));
}
This change:
- Adds XML documentation to clearly communicate the limitation
- Makes cancellationToken parameter optional for consistency
- Uses Task.FromException for better async/await performance
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken) | |
{ | |
throw new NotSupportedException("Chroma doesn't support collection metadata"); | |
} | |
/// <summary> | |
/// Not supported in ChromaDB as it doesn't support collection metadata. | |
/// </summary> | |
/// <exception cref="NotSupportedException">Always thrown as this operation is not supported.</exception> | |
public Task<IReadOnlyList<Vector>> SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken = default) | |
{ | |
return Task.FromException<IReadOnlyList<Vector>>( | |
new NotSupportedException("Chroma doesn't support collection metadata")); | |
} |
/// <inheritdoc /> | ||
public async Task<bool> DeleteAsync(IEnumerable<string> ids, CancellationToken cancellationToken = default) | ||
{ | ||
var filter = Builders<Vector>.Filter.In(i => i.Id, ids); | ||
var result = await _mongoCollection.DeleteManyAsync(filter, cancellationToken).ConfigureAwait(false); | ||
return result.IsAcknowledged; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a null check for ids
to prevent potential ArgumentNullException
Currently, the method does not check if ids
is null
. To prevent a possible ArgumentNullException
, consider adding a null check at the beginning of the method.
Apply this change:
+ if (ids == null) throw new ArgumentNullException(nameof(ids));
var filter = Builders<Vector>.Filter.In(i => i.Id, ids);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// <inheritdoc /> | |
public async Task<bool> DeleteAsync(IEnumerable<string> ids, CancellationToken cancellationToken = default) | |
{ | |
var filter = Builders<Vector>.Filter.In(i => i.Id, ids); | |
var result = await _mongoCollection.DeleteManyAsync(filter, cancellationToken).ConfigureAwait(false); | |
return result.IsAcknowledged; | |
} | |
/// <inheritdoc /> | |
public async Task<bool> DeleteAsync(IEnumerable<string> ids, CancellationToken cancellationToken = default) | |
{ | |
if (ids == null) throw new ArgumentNullException(nameof(ids)); | |
var filter = Builders<Vector>.Filter.In(i => i.Id, ids); | |
var result = await _mongoCollection.DeleteManyAsync(filter, cancellationToken).ConfigureAwait(false); | |
return result.IsAcknowledged; | |
} |
/// <inheritdoc /> | ||
public async Task<Vector?> GetAsync(string id, CancellationToken cancellationToken = default) | ||
{ | ||
var filter = Builders<Vector>.Filter.Eq(i => i.Id, id); | ||
var result = await _mongoCollection.FindAsync(filter, cancellationToken: cancellationToken).ConfigureAwait(false); | ||
return result.FirstOrDefault(cancellationToken: cancellationToken); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a null check for id
to ensure robustness
The method does not check if id
is null
. Adding a null check will prevent potential exceptions and improve the method's reliability.
Apply this change:
+ if (id == null) throw new ArgumentNullException(nameof(id));
var filter = Builders<Vector>.Filter.Eq(i => i.Id, id);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// <inheritdoc /> | |
public async Task<Vector?> GetAsync(string id, CancellationToken cancellationToken = default) | |
{ | |
var filter = Builders<Vector>.Filter.Eq(i => i.Id, id); | |
var result = await _mongoCollection.FindAsync(filter, cancellationToken: cancellationToken).ConfigureAwait(false); | |
return result.FirstOrDefault(cancellationToken: cancellationToken); | |
/// <inheritdoc /> | |
public async Task<Vector?> GetAsync(string id, CancellationToken cancellationToken = default) | |
{ | |
if (id == null) throw new ArgumentNullException(nameof(id)); | |
var filter = Builders<Vector>.Filter.Eq(i => i.Id, id); | |
var result = await _mongoCollection.FindAsync(filter, cancellationToken: cancellationToken).ConfigureAwait(false); | |
return result.FirstOrDefault(cancellationToken: cancellationToken); |
/// <inheritdoc /> | ||
public async Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default) | ||
{ | ||
return await _mongoCollection.EstimatedDocumentCountAsync(cancellationToken: cancellationToken).ConfigureAwait(false) == 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use CountDocumentsAsync
instead of EstimatedDocumentCountAsync
for accurate emptiness check
EstimatedDocumentCountAsync
may return an approximate count and might not be reliable for checking if the collection is empty. Consider using CountDocumentsAsync
for an accurate result.
Apply this change:
- return await _mongoCollection.EstimatedDocumentCountAsync(cancellationToken: cancellationToken).ConfigureAwait(false) == 0;
+ return await _mongoCollection.CountDocumentsAsync(FilterDefinition<Vector>.Empty, cancellationToken: cancellationToken).ConfigureAwait(false) == 0;
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// <inheritdoc /> | |
public async Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default) | |
{ | |
return await _mongoCollection.EstimatedDocumentCountAsync(cancellationToken: cancellationToken).ConfigureAwait(false) == 0; | |
} | |
/// <inheritdoc /> | |
public async Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default) | |
{ | |
return await _mongoCollection.CountDocumentsAsync(FilterDefinition<Vector>.Empty, cancellationToken: cancellationToken).ConfigureAwait(false) == 0; | |
} |
/// <inheritdoc /> | ||
public async Task<IReadOnlyCollection<string>> AddAsync(IReadOnlyCollection<Vector> items, CancellationToken cancellationToken = default) | ||
{ | ||
await _mongoCollection.InsertManyAsync(items, cancellationToken: cancellationToken).ConfigureAwait(false); | ||
return items.Select(i => i.Id).ToList(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider returning a read-only collection to enforce immutability
Although the method returns IReadOnlyCollection<string>
, it currently returns a modifiable List<string>
. To ensure that the collection cannot be modified by the caller, consider returning a read-only collection using AsReadOnly()
.
Apply this change:
return items.Select(i => i.Id).ToList()
+ .AsReadOnly();
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// <inheritdoc /> | |
public async Task<IReadOnlyCollection<string>> AddAsync(IReadOnlyCollection<Vector> items, CancellationToken cancellationToken = default) | |
{ | |
await _mongoCollection.InsertManyAsync(items, cancellationToken: cancellationToken).ConfigureAwait(false); | |
return items.Select(i => i.Id).ToList(); | |
} | |
/// <inheritdoc /> | |
public async Task<IReadOnlyCollection<string>> AddAsync(IReadOnlyCollection<Vector> items, CancellationToken cancellationToken = default) | |
{ | |
await _mongoCollection.InsertManyAsync(items, cancellationToken: cancellationToken).ConfigureAwait(false); | |
return items.Select(i => i.Id).ToList() | |
.AsReadOnly(); | |
} |
Summary by CodeRabbit
Release Notes
New Features
MongoVectorCollection
class for adding, deleting, retrieving, and searching vectors in MongoDB.Improvements
SearchByMetadata
method across multiple collections toIReadOnlyList<Vector>
, enhancing immutability and type safety.IsValidJsonKey
method in theVectorCollection
class.Bug Fixes
SearchByMetadata
method by validating filter keys inPostgresVectorCollection
.