Add SearchByMetadata method to SqlLiteVectorCollection #66

Merged
merged 2 commits into from
Oct 20, 2024
10 changes: 10 additions & 0 deletions src/Abstractions/src/IVectorCollection.cs
@@ -57,6 +57,16 @@
VectorSearchSettings? settings = default,
CancellationToken cancellationToken = default);

/// <summary>
/// Searches for records based on a metadata filter from a specific collection.
/// </summary>
/// <param name="filters">The filters to apply to the search request.</param>
/// <param name="cancellationToken">The cancellation token.</param>
/// <returns>A task representing the asynchronous operation. The task result contains the search response.</returns>
Task<List<Vector>> SearchByMetadata(
Dictionary<string, object> filters,
CancellationToken cancellationToken = default);

/// <summary>
/// Checks if the collection is empty.
/// </summary>
5 changes: 5 additions & 0 deletions src/Chroma/src/ChromaVectorCollection.cs
@@ -166,6 +166,11 @@ private static IDictionary<string, object> DeserializeMetadata(MemoryRecordMetad
return JsonSerializer.Deserialize(metadata.AdditionalMetadata, SourceGenerationContext.Default.IDictionaryStringObject)
?? new Dictionary<string, object>();
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotSupportedException("SemanticKernel doesn't support collection metadata");
}
}

[JsonSourceGenerationOptions(Converters = [typeof(ObjectAsPrimitiveConverter)])]
5 changes: 5 additions & 0 deletions src/Elasticsearch/src/ElasticsearchVectorCollection.cs
@@ -103,4 +103,9 @@ public Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default)
{
throw new NotImplementedException();
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotImplementedException();
}
}
5 changes: 5 additions & 0 deletions src/InMemory/src/InMemoryVectorCollection.cs
@@ -91,4 +91,9 @@ public Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default)
{
return Task.FromResult(_vectors.GetValueOrDefault(id));
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotImplementedException();
}
}
5 changes: 5 additions & 0 deletions src/Mongo/src/MongoVectorCollection.cs
@@ -70,4 +70,9 @@ public async Task<VectorSearchResponse> SearchAsync(VectorSearchRequest request,
.ToArray(),
};
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotImplementedException();
}
}
5 changes: 5 additions & 0 deletions src/OpenSearch/src/OpenSearchVectorCollection.cs
@@ -133,4 +133,9 @@ public Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default)
{
throw new NotImplementedException();
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotImplementedException();
}
Comment on lines +137 to +140
@coderabbitai coderabbitai bot Oct 20, 2024

💡 Codebase verification

Missing TODO comments in SearchByMetadata methods

The SearchByMetadata method in IVectorCollection implementations across multiple vector collections lacks TODO comments. Adding these comments is essential to guide future development and ensure consistent implementation.

Affected files:

  • src/Chroma/src/ChromaVectorCollection.cs
  • src/SemanticKernel/src/SemanticKernelMemoryStoreCollection.cs
  • src/Postgres/src/PostgresVectorCollection.cs
  • src/OpenSearch/src/OpenSearchVectorCollection.cs
  • src/Mongo/src/MongoVectorCollection.cs
  • src/InMemory/src/InMemoryVectorCollection.cs
  • src/Elasticsearch/src/ElasticsearchVectorCollection.cs

Please add TODO comments to each SearchByMetadata method to indicate pending implementations.

🔗 Analysis chain

LGTM! Consider adding a TODO comment for future implementation.

The addition of the SearchByMetadata method aligns with the PR objectives and follows good practices for asynchronous methods. The NotImplementedException is expected at this stage.

Consider adding a TODO comment to remind about future implementation:

 Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
 {
+    // TODO: Implement SearchByMetadata for OpenSearch
     throw new NotImplementedException();
 }

Let's verify if this change is consistent across other vector collection implementations:


🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if SearchByMetadata method is added consistently across all vector collection implementations

# Test: Search for SearchByMetadata method in all *VectorCollection.cs files
rg -U "Task<List<Vector>>.*SearchByMetadata.*\n.*\n.*throw new NotImplementedException\(\);" --type cs

Length of output: 2386


Script:

#!/bin/bash
# Description: Verify that each SearchByMetadata method includes a TODO comment.

# Search for SearchByMetadata methods and check for TODO comments within them
rg -U "Task<List<Vector>> IVectorCollection\.SearchByMetadata\(Dictionary<string, object> filters, CancellationToken cancellationToken\)\s*\{\s*// TODO" --type cs

Length of output: 164


Script:

#!/bin/bash
# Description: Verify that each SearchByMetadata method includes a TODO comment within its body.

# Search for SearchByMetadata methods and check for TODO comments within their implementations
rg -U -A5 "Task<List<Vector>> IVectorCollection\.SearchByMetadata\(Dictionary<string, object> filters, CancellationToken cancellationToken\)\s*\{\s*(// TODO|/\* TODO)" --type cs

Length of output: 179


Script:

#!/bin/bash
# Description: Verify that each SearchByMetadata method includes a TODO comment within its body, allowing for flexible comment placement and formatting.

# Search for SearchByMetadata methods and check for TODO comments anywhere within their implementations
rg -U -A10 "Task<List<Vector>> IVectorCollection\.SearchByMetadata\(Dictionary<string, object> filters, CancellationToken cancellationToken\)\s*\{\s*(?:(?!\}).)*TODO" --type cs

Length of output: 675


Script:

#!/bin/bash
# Description: Verify that each SearchByMetadata method includes a TODO comment within its body.

# Search for SearchByMetadata methods
rg -U "Task<List<Vector>> IVectorCollection\.SearchByMetadata\(Dictionary<string, object> filters, CancellationToken cancellationToken\)\s*\{" --type cs -n | while IFS=: read -r file line; do
    # Extract the lines following the method declaration up to the closing brace
    end_line=$(rg -U -B0 -A20 "Task<List<Vector>> IVectorCollection\.SearchByMetadata\(Dictionary<string, object> filters, CancellationToken cancellationToken\)\s*\{" "$file" | grep -n '}' | head -n1 | cut -d: -f1)
    if [ -n "$end_line" ]; then
        # Calculate the number of lines to extract
        lines_to_extract=$((end_line - line))
        # Extract the relevant lines
        method_body=$(sed -n "${line},+${lines_to_extract}p" "$file")
        # Check for TODO comment
        echo "$method_body" | grep -q "TODO"
        if [ $? -ne 0 ]; then
            echo "$file: Missing TODO in SearchByMetadata method at line $line"
        fi
    else
        echo "$file: Could not determine the end of SearchByMetadata method starting at line $line"
    fi
done

Length of output: 834

Contributor Author

We don't need todo comments here - the NotImplementedException makes it clear that the work is TODO!


@robalexclark, understood. I will keep that in mind for future reviews.


✏️ Learnings added
Learnt from: robalexclark
PR: tryAGI/LangChain.Databases#66
File: src/OpenSearch/src/OpenSearchVectorCollection.cs:137-140
Timestamp: 2024-10-20T10:22:36.855Z
Learning: In this codebase, when a method throws a `NotImplementedException`, adding a TODO comment is unnecessary because the exception clearly indicates pending implementation.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

}
6 changes: 6 additions & 0 deletions src/Postgres/src/PostgresVectorCollection.cs
@@ -1,3 +1,4 @@

namespace LangChain.Databases.Postgres;

/// <summary>
@@ -123,4 +124,9 @@ public Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default)
{
throw new NotImplementedException();
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotImplementedException();
}
Comment on lines +128 to +131
@coderabbitai coderabbitai bot Oct 20, 2024

💡 Codebase verification

Inconsistent Implementation of SearchByMetadata Across Vector Collections

The SearchByMetadata method is implemented in SqLiteVectorCollection.cs but remains unimplemented in other vector collection classes like PostgresVectorCollection.cs, OpenSearchVectorCollection.cs, etc. This inconsistency can lead to confusion and potential runtime issues for users expecting uniform functionality across different vector collections.

Consider standardizing the implementation of SearchByMetadata across all vector collections or documenting the reasons for the current discrepancies to ensure clarity for future development and usage.

🔗 Analysis chain

Consider the impact of unimplemented SearchByMetadata on users.

While adding the SearchByMetadata method aligns with the interface requirements, its current unimplemented state might impact users who expect this functionality to work with PostgresVectorCollection.

Consider the following suggestions to mitigate potential issues:

  1. Update the class documentation to clearly state which methods are not yet implemented.
  2. If possible, provide a timeline for when this functionality will be implemented.
  3. Consider implementing a basic version of the method that searches through the metadata stored in the Postgres database, even if it's not optimized for performance initially.

To ensure consistency across the codebase, let's check if other vector collections have implemented this method:

This will help us understand if this is a common pattern across different vector collections or if PostgresVectorCollection is an exception.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for SearchByMetadata implementations in other vector collections
rg -A 5 "SearchByMetadata\(" --glob "src/**/*VectorCollection.cs"

Length of output: 3572

Contributor Author

I'm sure I can do some of these, such as InMemory and PostgreSQL, but I don't have access to, or knowledge of, the other databases.


@robalexclark, understood! Implementing SearchByMetadata for the databases you have access to, such as InMemory and PostgreSQL, makes sense. Leaving the other databases unimplemented for now is perfectly acceptable, and future contributors can add those implementations when they have the necessary knowledge.
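
The InMemory case mentioned above could plausibly be handled with a plain LINQ filter. The following is a minimal sketch rather than the project's implementation: `SketchVector` and `InMemoryMetadataSearch` are hypothetical names standing in for the library's types, values are compared with `object.Equals`, and all filters are combined with AND to mirror the SQLite version in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's Vector type; only the members used here.
public sealed class SketchVector
{
    public string Id { get; init; } = string.Empty;
    public string Text { get; init; } = string.Empty;
    public IDictionary<string, object>? Metadata { get; init; }
}

public static class InMemoryMetadataSearch
{
    // Returns every vector whose metadata contains all filter pairs (logical AND).
    // Vectors without metadata never match a non-empty filter set.
    public static List<SketchVector> Filter(
        IEnumerable<SketchVector> vectors,
        IReadOnlyDictionary<string, object> filters)
    {
        ArgumentNullException.ThrowIfNull(vectors);
        ArgumentNullException.ThrowIfNull(filters);

        return vectors
            .Where(v => v.Metadata is { } metadata && filters.All(f =>
                metadata.TryGetValue(f.Key, out var actual) && Equals(actual, f.Value)))
            .ToList();
    }
}
```

Because the comparison runs on boxed values, an `int` filter value will not match a stored `long`; a real implementation would probably need to normalise numeric types first.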

}
5 changes: 5 additions & 0 deletions src/SemanticKernel/src/SemanticKernelMemoryStoreCollection.cs
@@ -75,4 +75,9 @@ public async Task<VectorSearchResponse> SearchAsync(VectorSearchRequest request,
.ToListAsync(cancellationToken).ConfigureAwait(false);
return new VectorSearchResponse { Items = results.Select(x => new Vector { Text = x.Item1.Metadata.ExternalSourceName }).ToList() };
}

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
throw new NotImplementedException();
}
Comment on lines +79 to +82
@coderabbitai coderabbitai bot Oct 20, 2024

🛠️ Refactor suggestion

⚠️ Potential issue

Consider enhancing the SearchByMetadata method implementation

The explicit interface implementation of IVectorCollection.SearchByMetadata is correct. However, there are a few points to consider:

  1. Add a comment explaining why the method is not implemented and if there are any future plans for implementation. This will help other developers understand the current state and future intentions.

  2. Consider a more graceful approach instead of throwing a NotImplementedException. For example, you could return an empty list or log a warning. This would allow the code to continue execution in scenarios where metadata search is not critical.

  3. Update the PublicAPI.txt documentation to reflect this new method, as mentioned in the PR objectives.

Here's a suggested implementation:

Task<List<Vector>> IVectorCollection.SearchByMetadata(Dictionary<string, object> filters, CancellationToken cancellationToken)
{
    // TODO: Implement metadata search functionality
    // Currently, metadata search is not supported for SemanticKernelMemoryStore
    // Consider implementing this in the future if SemanticKernel adds support for metadata filtering
    Console.WriteLine("Warning: SearchByMetadata is not implemented for SemanticKernelMemoryStoreCollection");
    return Task.FromResult(new List<Vector>());
}

This approach provides more context, logs a warning, and returns an empty list, allowing the calling code to continue execution.

Contributor Author

Throwing a NotImplementedException makes it clear that the method is not currently supported, and future developers can implement it if they wish.


@robalexclark, understood. Throwing a NotImplementedException makes it clear that the method is not currently supported.
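
For callers that work across several backends, one way to live with this decision is to treat the exception as a capability signal. The sketch below is illustrative only: `IMetadataSearchable`, `UnimplementedStore`, and `MetadataSearchCaller` are hypothetical names, and the interface is a trimmed-down stand-in for `IVectorCollection` that returns ids instead of `Vector` objects.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, trimmed-down version of the interface method added in this PR.
public interface IMetadataSearchable
{
    Task<List<string>> SearchByMetadata(
        Dictionary<string, object> filters,
        CancellationToken cancellationToken = default);
}

// Mimics the backends in this PR that do not implement the method yet.
public sealed class UnimplementedStore : IMetadataSearchable
{
    public Task<List<string>> SearchByMetadata(
        Dictionary<string, object> filters,
        CancellationToken cancellationToken = default)
        => throw new NotImplementedException();
}

public static class MetadataSearchCaller
{
    // Returns null when the backend has no metadata search, so that
    // "unsupported" stays distinguishable from "supported, but no matches".
    public static async Task<List<string>?> TrySearchAsync(
        IMetadataSearchable store,
        Dictionary<string, object> filters,
        CancellationToken cancellationToken = default)
    {
        try
        {
            return await store.SearchByMetadata(filters, cancellationToken).ConfigureAwait(false);
        }
        catch (Exception ex) when (ex is NotImplementedException or NotSupportedException)
        {
            return null;
        }
    }
}
```

Returning an empty list from the backend itself, as suggested earlier in this thread, would make the two cases indistinguishable to the caller, which is the practical argument for keeping the exception.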


✏️ Learnings added
Learnt from: robalexclark
PR: tryAGI/LangChain.Databases#66
File: src/SemanticKernel/src/SemanticKernelMemoryStoreCollection.cs:79-82
Timestamp: 2024-10-20T10:05:41.534Z
Learning: In the LangChain.Databases project, it's acceptable to throw `NotImplementedException` in methods that are not yet implemented, as it clearly indicates to future developers that the method is not currently supported.

}
196 changes: 137 additions & 59 deletions src/Sqlite/src/SqLiteVectorCollection.cs
@@ -1,6 +1,7 @@
using Microsoft.Data.Sqlite;
using System.Globalization;
using System.Text.Json;
using System.Text.RegularExpressions;

namespace LangChain.Databases.Sqlite;

@@ -37,49 +38,56 @@ private static string SerializeVector(float[] vector)

private async Task InsertDocument(string id, float[] vector, Vector document)
{
    using (var insertCommand = _connection.CreateCommand())
    {
        string query = $"INSERT INTO {Name} (id, vector, document) VALUES (@id, @vector, @document)";
        insertCommand.CommandText = query;
        insertCommand.Parameters.AddWithValue("@id", id);
        insertCommand.Parameters.AddWithValue("@vector", SerializeVector(vector));
        insertCommand.Parameters.AddWithValue("@document", SerializeDocument(document));
        await insertCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
    }
}

private async Task DeleteDocument(string id)
{
    using (var deleteCommand = _connection.CreateCommand())
    {
        string query = $"DELETE FROM {Name} WHERE id=@id";
        deleteCommand.CommandText = query;
        deleteCommand.Parameters.AddWithValue("@id", id);
        await deleteCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
    }
}

private async Task<List<(Vector, float)>> SearchByVector(float[] vector, int k)
{
    using (var searchCommand = _connection.CreateCommand())
    {
        string query = $"SELECT id, vector, document, distance(vector, @vector) d FROM {Name} ORDER BY d LIMIT @k";
        searchCommand.CommandText = query;
        searchCommand.Parameters.AddWithValue("@vector", SerializeVector(vector));
        searchCommand.Parameters.AddWithValue("@k", k);
        var res = new List<(Vector, float)>();

        using (var reader = await searchCommand.ExecuteReaderAsync().ConfigureAwait(false))
        {
            while (await reader.ReadAsync().ConfigureAwait(false))
            {
                var id = reader.GetString(0);
                var vec = await reader.GetFieldValueAsync<string>(1).ConfigureAwait(false);
                var doc = await reader.GetFieldValueAsync<string>(2).ConfigureAwait(false);
                var docDeserialized = JsonSerializer.Deserialize(doc, SourceGenerationContext.Default.Vector) ?? new Vector
                {
                    Text = string.Empty,
                };
                var distance = reader.GetFloat(3);
                res.Add((docDeserialized, distance));
            }

            return res;
        }
    }
}

/// <inheritdoc />
@@ -109,41 +117,48 @@ public async Task<IReadOnlyCollection<string>> AddAsync(
/// <inheritdoc />
public async Task<Vector?> GetAsync(string id, CancellationToken cancellationToken = default)
{
    using (var command = _connection.CreateCommand())
    {
        var query = $"SELECT vector, document FROM {Name} WHERE id=@id";
        command.CommandText = query;
        command.Parameters.AddWithValue("@id", id);

        using (var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false))
        {
            if (!await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
            {
                return null;
            }

            var vec = await reader.GetFieldValueAsync<string>(0, cancellationToken).ConfigureAwait(false);
            var doc = await reader.GetFieldValueAsync<string>(1, cancellationToken).ConfigureAwait(false);
            var docDeserialized = JsonSerializer.Deserialize(doc, SourceGenerationContext.Default.Vector) ?? new Vector
            {
                Text = string.Empty,
            };

            return new Vector
            {
                Id = id,
                Text = docDeserialized.Text,
                Metadata = docDeserialized.Metadata,
                Embedding = JsonSerializer.Deserialize(vec, SourceGenerationContext.Default.SingleArray),
            };
        }
    }
}

/// <inheritdoc />
public async Task<bool> IsEmptyAsync(CancellationToken cancellationToken = default)
{
    using (var command = _connection.CreateCommand())
    {
        var query = $"SELECT COUNT(*) FROM {Name}";
        command.CommandText = query;
        var count = await command.ExecuteScalarAsync(cancellationToken).ConfigureAwait(false);

        return count == null || Convert.ToInt32(count, CultureInfo.InvariantCulture) == 0;
    }
}

/// <inheritdoc />
@@ -180,4 +195,67 @@ public async Task<VectorSearchResponse> SearchAsync(
}).ToArray(),
};
}

/// <inheritdoc />
public async Task<List<Vector>> SearchByMetadata(
Dictionary<string, object> filters,
CancellationToken cancellationToken = default)
{
filters = filters ?? throw new ArgumentNullException(nameof(filters));

using (var command = _connection.CreateCommand())
{
var query = $"SELECT id, vector, document FROM {Name}";

var whereClauses = new List<string>();
int paramIndex = 0;

foreach (var filter in filters)
{
// Validate filter.Key to prevent injection in JSON path
if (string.IsNullOrWhiteSpace(filter.Key) || !IsValidJsonKey(filter.Key))
throw new ArgumentException($"Invalid filter key: {filter.Key}", nameof(filters));

var paramName = "@param" + paramIndex++;
whereClauses.Add($"json_extract(document, '$.Metadata.{filter.Key}') = {paramName}");
command.Parameters.AddWithValue(paramName, filter.Value);
}
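// Append WHERE only when at least one filter was supplied; an empty dictionary would otherwise produce invalid SQL.
if (whereClauses.Count > 0)
{
query += " WHERE " + string.Join(" AND ", whereClauses);
}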

command.CommandText = query;
var res = new List<Vector>();

using (var reader = await command.ExecuteReaderAsync(cancellationToken).ConfigureAwait(false))
{
while (await reader.ReadAsync(cancellationToken).ConfigureAwait(false))
{
var id = await reader.GetFieldValueAsync<string>(0, cancellationToken).ConfigureAwait(false);
var vec = await reader.GetFieldValueAsync<string>(1, cancellationToken).ConfigureAwait(false);
var doc = await reader.GetFieldValueAsync<string>(2, cancellationToken).ConfigureAwait(false);
var docDeserialized = JsonSerializer.Deserialize(doc, SourceGenerationContext.Default.Vector) ?? new Vector
{
Text = string.Empty,
};

var vector = new Vector
{
Id = id,
Text = docDeserialized.Text,
Metadata = docDeserialized.Metadata,
Embedding = JsonSerializer.Deserialize(vec, SourceGenerationContext.Default.SingleArray),
};

res.Add(vector);
}

return res;
}
}
}

Comment on lines +200 to +254

🛠️ Refactor suggestion

Ensure comprehensive validation for filter.Key in SearchByMetadata

The method SearchByMetadata uses IsValidJsonKey to validate filter.Key, restricting it to letters, numbers, and underscores. While this enhances security by preventing SQL injection attacks, it may limit users who have metadata keys with other valid characters (e.g., hyphens, spaces).

Consider the following options:

  • Option 1: Extend the regular expression in IsValidJsonKey to include other valid characters present in your metadata keys.

    -return Regex.IsMatch(input, @"^\w+$");
    +return Regex.IsMatch(input, @"^[\w\- ]+$"); // Allows letters, numbers, underscores, hyphens, and spaces
  • Option 2: Clearly document the restrictions on metadata keys so that users are aware of the allowed characters.

Ensure that any changes maintain security while providing flexibility for valid metadata keys.
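
A third route, not proposed in the review itself, would be to stop interpolating the key into the SQL text and pass the whole JSON path as a bound parameter instead. The sketch below only builds the path string. `JsonPathBuilder` is a hypothetical helper, and rejecting quote and backslash characters is an assumption made here to avoid depending on how a given SQLite version treats escapes inside quoted path labels.

```csharp
using System;

public static class JsonPathBuilder
{
    // Builds a SQLite JSON path such as $.Metadata."my key",
    // intended to be passed to json_extract as a parameter value.
    public static string ForMetadataKey(string key)
    {
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new ArgumentException("Metadata key must not be empty.", nameof(key));
        }

        // Assumption: refuse characters that would need escaping inside a quoted label.
        if (key.Contains('"') || key.Contains('\\'))
        {
            throw new ArgumentException($"Unsupported character in metadata key: {key}", nameof(key));
        }

        return "$.Metadata.\"" + key + "\"";
    }
}
```

With Microsoft.Data.Sqlite the result would then be bound like any other value, for example `command.Parameters.AddWithValue("@path0", JsonPathBuilder.ForMetadataKey(filter.Key))` together with a clause of the form `json_extract(document, @path0) = @param0`. Since the key never becomes part of the SQL text, hyphens and spaces no longer need to be excluded for injection reasons.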

Committable suggestion was skipped due to low confidence.

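private static bool IsValidJsonKey(string input)
{
    // Only allow word characters (letters, digits, and underscores).
    // In .NET, \w also matches non-ASCII letters and digits unless RegexOptions.ECMAScript is set.
    return Regex.IsMatch(input, @"^\w+$");
}
}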
Comment on lines +256 to +260

🛠️ Refactor suggestion

Reconsider the regular expression in IsValidJsonKey

The current regular expression ^\w+$ restricts keys to letters, numbers, and underscores. This may exclude valid JSON keys that contain other characters.

Depending on your use case, update the regex to include other permissible characters:

-return Regex.IsMatch(input, @"^\w+$");
+return Regex.IsMatch(input, @"^[\w\- ]+$"); // Allows underscores, hyphens, and spaces

Alternatively, if your metadata keys may contain any characters, you might need a different approach to safely include them in the JSON path without risking SQL injection.
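
To make the trade-off concrete, the following sketch compares the pattern from the diff with the widened one suggested above, plus an ASCII-only variant. The class and method names are hypothetical; only the two regular expressions come from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyValidationSketch
{
    // Same pattern as IsValidJsonKey in the diff.
    public static bool IsWordKey(string input) =>
        Regex.IsMatch(input, @"^\w+$");

    // ASCII-only variant: RegexOptions.ECMAScript narrows \w to [a-zA-Z0-9_].
    public static bool IsAsciiWordKey(string input) =>
        Regex.IsMatch(input, @"^\w+$", RegexOptions.ECMAScript);

    // The widened pattern proposed in the review: also allows hyphens and spaces.
    public static bool IsRelaxedKey(string input) =>
        Regex.IsMatch(input, @"^[\w\- ]+$");
}
```

Under the default options, `IsWordKey` accepts `page_1` and non-ASCII words such as `ключ` but rejects `my-key`, whereas `IsRelaxedKey` accepts `my-key 2` and still rejects keys containing a single quote. Note also that `$` in .NET matches before a trailing newline by default, so a stricter validator may prefer `\z` as the end anchor.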

Committable suggestion was skipped due to low confidence.