Search Fancy Batching (Azure#15750)
Adds SearchIndexingBufferedSender to index search documents with intelligent
batching, automatic flushing, and retries for failed indexing actions.

Fixes Azure#11161.
tg-msft authored and suhas92 committed Oct 12, 2020
1 parent e657ec4 commit a2f6cc5
Showing 107 changed files with 5,215,360 additions and 16 deletions.
13 changes: 7 additions & 6 deletions sdk/search/Azure.Search.Documents/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,19 @@
# Release History

## 11.2.0-beta.1 (2020-10-06)

### Fixed

- Support deserializing null values during deserialization of skills ([#15108](https://github.com/Azure/azure-sdk-for-net/issues/15108))
- Fixed issues preventing mocking clients or initializing all models.
## 11.2.0-beta.1 (2020-10-09)

### Added

- Add `SearchIndexingBufferedSender<T>` to make indexing lots of documents fast and easy.
- Add support to `FieldBuilder` to define search fields for `Microsoft.Spatial` types without an explicit assembly dependency.
- Add support to `SearchFilter` to encode geometric types from `Microsoft.Spatial` without an explicit assembly dependency.
- Add `IndexingParameters.IndexingParametersConfiguration` property to define well-known properties supported by Azure Cognitive Search.

### Fixed

- Support deserializing null values during deserialization of skills ([#15108](https://github.com/Azure/azure-sdk-for-net/issues/15108))
- Fixed issues preventing mocking clients or initializing all models.

## 11.1.1 (2020-08-18)

### Fixed
@@ -32,6 +32,7 @@ public SearchClient(System.Uri endpoint, string indexName, Azure.AzureKeyCredent
public virtual string ServiceName { get { throw null; } }
public virtual Azure.Response<Azure.Search.Documents.Models.AutocompleteResults> Autocomplete(string searchText, string suggesterName, Azure.Search.Documents.AutocompleteOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual System.Threading.Tasks.Task<Azure.Response<Azure.Search.Documents.Models.AutocompleteResults>> AutocompleteAsync(string searchText, string suggesterName, Azure.Search.Documents.AutocompleteOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual Azure.Search.Documents.SearchIndexingBufferedSender<T> CreateIndexingBufferedSender<T>(Azure.Search.Documents.SearchIndexingBufferedSenderOptions<T> options = null) { throw null; }
public virtual Azure.Response<Azure.Search.Documents.Models.IndexDocumentsResult> DeleteDocuments(string keyName, System.Collections.Generic.IEnumerable<string> keyValues, Azure.Search.Documents.IndexDocumentsOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual System.Threading.Tasks.Task<Azure.Response<Azure.Search.Documents.Models.IndexDocumentsResult>> DeleteDocumentsAsync(string keyName, System.Collections.Generic.IEnumerable<string> keyValues, Azure.Search.Documents.IndexDocumentsOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual System.Threading.Tasks.Task<Azure.Response<Azure.Search.Documents.Models.IndexDocumentsResult>> DeleteDocumentsAsync<T>(System.Collections.Generic.IEnumerable<T> documents, Azure.Search.Documents.IndexDocumentsOptions options = null, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
@@ -68,6 +69,45 @@ public static partial class SearchFilter
public static string Create(System.FormattableString filter) { throw null; }
public static string Create(System.FormattableString filter, System.IFormatProvider formatProvider) { throw null; }
}
public static partial class SearchIndexingBufferedSenderExtensions
{
public static void DeleteDocuments(this Azure.Search.Documents.SearchIndexingBufferedSender<Azure.Search.Documents.Models.SearchDocument> indexer, string keyFieldName, System.Collections.Generic.IEnumerable<string> documentKeys, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public static System.Threading.Tasks.Task DeleteDocumentsAsync(this Azure.Search.Documents.SearchIndexingBufferedSender<Azure.Search.Documents.Models.SearchDocument> indexer, string keyFieldName, System.Collections.Generic.IEnumerable<string> documentKeys, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
}
public partial class SearchIndexingBufferedSenderOptions<T>
{
public SearchIndexingBufferedSenderOptions() { }
public bool AutoFlush { get { throw null; } set { } }
public System.TimeSpan? AutoFlushInterval { get { throw null; } set { } }
public System.Threading.CancellationToken FlushCancellationToken { get { throw null; } set { } }
public System.Func<T, string> KeyFieldAccessor { get { throw null; } set { } }
}
public partial class SearchIndexingBufferedSender<T> : System.IAsyncDisposable, System.IDisposable
{
protected SearchIndexingBufferedSender() { }
public virtual System.Uri Endpoint { get { throw null; } }
public virtual string IndexName { get { throw null; } }
public virtual string ServiceName { get { throw null; } }
public event System.Func<Azure.Search.Documents.Models.IndexDocumentsAction<T>, System.Threading.CancellationToken, System.Threading.Tasks.Task> ActionAddedAsync { add { } remove { } }
public event System.Func<Azure.Search.Documents.Models.IndexDocumentsAction<T>, Azure.Search.Documents.Models.IndexingResult, System.Threading.CancellationToken, System.Threading.Tasks.Task> ActionCompletedAsync { add { } remove { } }
public event System.Func<Azure.Search.Documents.Models.IndexDocumentsAction<T>, Azure.Search.Documents.Models.IndexingResult, System.Exception, System.Threading.CancellationToken, System.Threading.Tasks.Task> ActionFailedAsync { add { } remove { } }
public event System.Func<Azure.Search.Documents.Models.IndexDocumentsAction<T>, System.Threading.CancellationToken, System.Threading.Tasks.Task> ActionSentAsync { add { } remove { } }
public virtual void DeleteDocuments(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public virtual System.Threading.Tasks.Task DeleteDocumentsAsync(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
~SearchIndexingBufferedSender() { }
public void Flush(System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public System.Threading.Tasks.Task FlushAsync(System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual void IndexDocuments(Azure.Search.Documents.Models.IndexDocumentsBatch<T> batch, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public virtual System.Threading.Tasks.Task IndexDocumentsAsync(Azure.Search.Documents.Models.IndexDocumentsBatch<T> batch, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual void MergeDocuments(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public virtual System.Threading.Tasks.Task MergeDocumentsAsync(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public virtual void MergeOrUploadDocuments(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public virtual System.Threading.Tasks.Task MergeOrUploadDocumentsAsync(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
System.Threading.Tasks.ValueTask System.IAsyncDisposable.DisposeAsync() { throw null; }
void System.IDisposable.Dispose() { }
public virtual void UploadDocuments(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { }
public virtual System.Threading.Tasks.Task UploadDocumentsAsync(System.Collections.Generic.IEnumerable<T> documents, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
}
public partial class SearchOptions
{
public SearchOptions() { }
1 change: 1 addition & 0 deletions sdk/search/Azure.Search.Documents/samples/README.md
@@ -15,3 +15,4 @@ description: Samples for the Azure.Search.Documents client library
- Perform [service level operations](https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/search/Azure.Search.Documents/samples/Sample02_Service.md).
- Perform [index level operations](https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/search/Azure.Search.Documents/samples/Sample03_Index.md).
- Use [`[FieldBuilderIgnore]`](https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/search/Azure.Search.Documents/samples/Sample04_FieldBuilderIgnore.md) to add fields for unsupported properties using `FieldBuilder`.
- Learn about different ways to [index documents](https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/search/Azure.Search.Documents/samples/Sample05_IndexingDocuments.md).
@@ -0,0 +1,141 @@
# Azure.Search.Documents Samples - Indexing Documents

If we want to build a search experience for a product catalog, we need to get
all of that data into Azure Cognitive Search. If your data lives in Azure
Cosmos DB, Azure SQL Database, or Azure Blob Storage, you can set up an indexer
to do that for you. But if your data lives elsewhere, this sample will show you
how to move it into a search index.

Let's start with a simple model type:

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_LegacyProduct
public class Product
{
    [SimpleField(IsKey = true)]
    public string Id { get; set; }

    [SearchableField(IsFilterable = true)]
    public string Name { get; set; }

    [SimpleField(IsSortable = true)]
    public double Price { get; set; }

    public override string ToString() =>
        $"{Id}: {Name} for {Price:C}";
}
```

Let's generate a sample catalog of products using Microsoft's naming conventions
from the mid-2000s because that's old enough to be vintage now:

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_GenerateCatalog
public IEnumerable<Product> GenerateCatalog(int count = 1000)
{
    // Adapted from https://weblogs.asp.net/dfindley/Microsoft-Product-Name-Generator
    var prefixes = new[] { null, "Visual", "Compact", "Embedded", "Expression" };
    var products = new[] { null, "Windows", "Office", "SQL", "FoxPro", "BizTalk" };
    var terms = new[] { "Web", "Robotics", "Network", "Testing", "Project", "Small Business", "Team", "Management", "Graphic", "Presentation", "Communication", "Workflow", "Ajax", "XML", "Content", "Source Control" };
    var type = new[] { null, "Client", "Workstation", "Server", "System", "Console", "Shell", "Designer" };
    var suffix = new[] { null, "Express", "Standard", "Professional", "Enterprise", "Ultimate", "Foundation", ".NET", "Framework" };
    var components = new[] { prefixes, products, terms, type, suffix };

    var random = new Random();
    string RandomElement(string[] values) => values[(int)(random.NextDouble() * values.Length)];
    double RandomPrice() => (random.Next(2, 20) * 100.0) / 2.0 - .01;

    for (int i = 1; i <= count; i++)
    {
        yield return new Product
        {
            Id = i.ToString(),
            Name = string.Join(" ", components.Select(RandomElement).Where(n => n != null)),
            Price = RandomPrice()
        };
    }
}
```

That'll output classic software titles like "Visual Office Management Console
Enterprise" for the artisanal price of only $149.99.

We need to get that data into the service so let's create a `SearchIndexClient`:

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_CreateIndex_Connect
Uri endpoint = new Uri(Environment.GetEnvironmentVariable("SEARCH_ENDPOINT"));
string key = Environment.GetEnvironmentVariable("SEARCH_API_KEY");

// Create a client for manipulating search indexes
AzureKeyCredential credential = new AzureKeyCredential(key);
SearchIndexClient indexClient = new SearchIndexClient(endpoint, credential);
```

We'll use `FieldBuilder` to do the heavy lifting and create our search index:

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_CreateIndex_Create
// Create the search index (index names must be lowercase)
string indexName = "products";
await indexClient.CreateIndexAsync(
    new SearchIndex(indexName)
    {
        Fields = new FieldBuilder().Build(typeof(Product))
    });
```

And finally we can get a `SearchClient` to the index we just created:

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_CreateIndex_Client
SearchClient searchClient = indexClient.GetSearchClient(indexName);
```

## Simple indexing

The lowest-level way to manage documents in your index is to use the
`IndexDocuments` method or one of its convenience wrappers, such as
`UploadDocuments` and `DeleteDocuments`.

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_SimpleIndexing1
IEnumerable<Product> products = GenerateCatalog(count: 1000);
await searchClient.UploadDocumentsAsync(products);
```

We can quickly check that the document count matches our expectations:

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_SimpleIndexing2
Assert.AreEqual(1000, (int)await searchClient.GetDocumentCountAsync());
```

We do need to be careful to understand the limits of this API though. For
example, trying to upload all of our products in one shot via

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_SimpleIndexing3
IEnumerable<Product> all = GenerateCatalog(count: 100000);
await searchClient.UploadDocumentsAsync(all);
```

results in a `RequestFailedException` with status code `400` because of "too
many indexing actions found in the request...". We would also need to check the
response, because the service can return `207` for a well-formed request that
contains partial failures. Most of those failures are permanent errors, but
some, like `409`, `422`, and `503`, can be retried.
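
One way to stay under the request limit by hand is to split the catalog into
smaller batches before uploading. The sketch below is a minimal, hypothetical
helper: the `IndexingBatches` class, its `Split` and `IsRetriable` methods, and
the 1,000-action batch size are assumptions made for illustration and are not
part of Azure.Search.Documents. Check the service limits that apply to you
before relying on that number.

```C#
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Azure.Search.Documents.
public static class IndexingBatches
{
    // Assumed per-request action limit; verify against your service limits.
    public const int AssumedMaxActionsPerBatch = 1000;

    // Lazily split a sequence into lists of at most `size` items.
    // Because this is an iterator, the argument checks run on first enumeration.
    public static IEnumerable<List<T>> Split<T>(
        IEnumerable<T> source,
        int size = AssumedMaxActionsPerBatch)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var batch = new List<T>();
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>();
            }
        }

        // Emit whatever is left over as a final, smaller batch.
        if (batch.Count > 0) yield return batch;
    }

    // The status codes this sample describes as worth trying again.
    public static bool IsRetriable(int status) =>
        status == 409 || status == 422 || status == 503;
}
```

Each batch returned by `Split` could then be passed to
`searchClient.UploadDocumentsAsync(batch)` in a `foreach` loop, with
`IsRetriable` deciding which failed documents are worth resubmitting. That
bookkeeping is the part that gets tedious to write by hand.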

## SearchIndexingBufferedSender

The easiest way to get data into a search index is using
`SearchIndexingBufferedSender`. Let's try sending a massive amount of data
again.

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_BufferedSender1
await using SearchIndexingBufferedSender<Product> indexer =
    searchClient.CreateIndexingBufferedSender<Product>();
await indexer.UploadDocumentsAsync(GenerateCatalog(count: 100000));
```

If we checked the count immediately, it might not yet reflect all 100,000
documents. The buffered sender splits the indexing actions into batches,
submits them sequentially, retries failures, and so on. Call `FlushAsync` to
wait until everything has been sent to the service.

```C# Snippet:Azure_Search_Documents_Tests_Samples_Sample05_IndexingDocuments_BufferedSender2
await indexer.FlushAsync();
Assert.AreEqual(100000, (int)await searchClient.GetDocumentCountAsync());
```
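
To make the batching, sequential submission, and retrying described above a
little more concrete, here is a toy model of the idea. It is only a sketch under
stated assumptions: `TinyBufferedSender<T>`, its send delegate, and the default
batch size and attempt count are hypothetical and are not how
`SearchIndexingBufferedSender<T>` is implemented. The real sender can also
flush automatically and reports progress through events such as
`ActionFailedAsync`.

```C#
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical illustration only; not the library's implementation.
public sealed class TinyBufferedSender<T>
{
    private readonly Queue<(T Item, int Attempts)> _pending =
        new Queue<(T Item, int Attempts)>();
    private readonly Func<IReadOnlyList<T>, Task<IReadOnlyList<bool>>> _sendBatch;
    private readonly int _batchSize;
    private readonly int _maxAttempts;

    // sendBatch returns one success flag per item, in the same order.
    public TinyBufferedSender(
        Func<IReadOnlyList<T>, Task<IReadOnlyList<bool>>> sendBatch,
        int batchSize = 1000,
        int maxAttempts = 3)
    {
        _sendBatch = sendBatch ?? throw new ArgumentNullException(nameof(sendBatch));
        _batchSize = batchSize > 0 ? batchSize : throw new ArgumentOutOfRangeException(nameof(batchSize));
        _maxAttempts = maxAttempts > 0 ? maxAttempts : throw new ArgumentOutOfRangeException(nameof(maxAttempts));
    }

    // Items that still failed after the last allowed attempt.
    public List<T> Failed { get; } = new List<T>();

    // Buffer items; nothing is sent until FlushAsync runs.
    public void Add(IEnumerable<T> items)
    {
        foreach (T item in items) _pending.Enqueue((item, 0));
    }

    // Drain the buffer one batch at a time, sequentially.
    public async Task FlushAsync()
    {
        while (_pending.Count > 0)
        {
            var batch = new List<(T Item, int Attempts)>();
            while (batch.Count < _batchSize && _pending.Count > 0)
            {
                batch.Add(_pending.Dequeue());
            }

            var items = new List<T>(batch.Count);
            foreach (var entry in batch) items.Add(entry.Item);

            IReadOnlyList<bool> succeeded = await _sendBatch(items);

            // Requeue failures until they run out of attempts.
            for (int i = 0; i < batch.Count; i++)
            {
                if (succeeded[i]) continue;
                int attempts = batch[i].Attempts + 1;
                if (attempts < _maxAttempts) _pending.Enqueue((batch[i].Item, attempts));
                else Failed.Add(batch[i].Item);
            }
        }
    }
}
```

The toy version retries every failure, whereas a real sender would only retry
the failures that can succeed on a later attempt. It also shows why the
document count lags behind the call that adds documents: adding only fills the
buffer, and the work happens when the buffer is drained.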
@@ -37,5 +37,6 @@
<Import Project="$(MSBuildThisFileDirectory)..\..\..\core\Azure.Core\src\Azure.Core.props" />
<ItemGroup>
<PackageReference Include="System.Text.Json" />
<PackageReference Include="System.Threading.Channels" />
</ItemGroup>
</Project>