# Add C# notebook sample
Update docs
dluc committed Aug 21, 2023
1 parent 32db9b5 commit 6501b10
Showing 8 changed files with 144 additions and 67 deletions.
29 changes: 19 additions & 10 deletions README.md
@@ -1,6 +1,6 @@
# Semantic Memory

-**Semantic Memory** is an open-source library and [service](dotnet/Service)
+**Semantic Memory** is an open-source library and [service](dotnet/Service/README.md)
specialized in the efficient indexing of datasets through custom continuous data
pipelines.

@@ -210,7 +210,7 @@ to **start the Semantic Memory Service**:
> }
> ```
-You can find a [full example here](samples/dotnet-WebClient/).
+You can find a [full example here](samples/002-dotnet-WebClient/README.md).
## Custom memory ingestion pipelines
@@ -254,11 +254,20 @@ running the service locally with OpenAPI enabled.
-# Examples and Tools
-1. [Using the web service](samples/dotnet-WebClient)
-2. [Importing files without the service (serverless ingestion)](samples/dotnet-Serverless)
-3. [Upload files and get answers from command line with curl](samples/curl)
-4. [Writing a custom pipeline handler](samples/dotnet-CustomHandler)
-5. [Importing files with custom steps](samples/dotnet-ServerlessCustomPipeline)
-6. [Extracting text from documents](samples/dotnet-ExtractTextFromDocs)
-7. [Curl script to upload files](tools/upload-file.sh)
-8. [Script to start RabbitMQ for development tasks](tools/run-rabbitmq.sh)
+## Examples
+1. [Collection of Jupyter notebooks with various tests](samples/000-notebooks)
+2. [Importing files and asking questions without running the service (serverless mode)](samples/001-dotnet-Serverless)
+3. [Using the Semantic Memory web service](samples/002-dotnet-WebClient)
+4. [How to upload files from command line with curl](samples/003-curl-calling-webservice)
+5. [Processing files with custom steps](samples/004-dotnet-ServerlessCustomPipeline)
+6. [Writing a custom pipeline handler](samples/006-dotnet-CustomHandler)
+## Tools
+1. [Curl script to upload files](tools/upload-file.sh)
+2. [Curl script to ask questions](tools/ask.sh)
+3. [Curl script to search documents](tools/search.sh)
+4. [Script to start Qdrant for development tasks](tools/run-qdrant.sh)
+5. [Script to start RabbitMQ for development tasks](tools/run-rabbitmq.sh)
9 changes: 9 additions & 0 deletions SemanticMemory.sln
@@ -47,6 +47,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "tools", "tools", "{CA49F1A1
tools\README.md = tools\README.md
tools\ask.sh = tools\ask.sh
tools\search.sh = tools\search.sh
+tools\run-qdrant.sh = tools\run-qdrant.sh
EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ClientLib", "dotnet\ClientLib\ClientLib.csproj", "{8A9FA587-7EBA-4D43-BE47-38D798B1C74C}"
@@ -72,6 +73,13 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "nuget", "nuget", "{02B67859
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "010-using-core-nuget", "samples\010-using-core-nuget\010-using-core-nuget.csproj", "{17F42A38-46CB-4471-AF93-3AE6981B7278}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "000-notebooks", "000-notebooks", "{C93FCED9-808A-4B03-9B4C-0DBAE46D5BDD}"
ProjectSection(SolutionItems) = preProject
samples\000-notebooks\001-upload-and-ask.ipynb = samples\000-notebooks\001-upload-and-ask.ipynb
samples\000-notebooks\appsettings.json = samples\000-notebooks\appsettings.json
samples\000-notebooks\sample-SK-Readme.pdf = samples\000-notebooks\sample-SK-Readme.pdf
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -89,6 +97,7 @@ Global
{87DEAE8D-138C-4FDD-B4C9-11C3A7817E8F} = {6EF76FD8-4C35-4370-8539-5DDF45357A50}
{02B67859-5B16-46E3-9CB4-916887275649} = {87DEAE8D-138C-4FDD-B4C9-11C3A7817E8F}
{17F42A38-46CB-4471-AF93-3AE6981B7278} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
+{C93FCED9-808A-4B03-9B4C-0DBAE46D5BDD} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{8A9FA587-7EBA-4D43-BE47-38D798B1C74C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
67 changes: 53 additions & 14 deletions dotnet/Service/README.md
@@ -1,3 +1,5 @@
+# Semantic Memory as a Service
+
This folder contains **Semantic Memory Service**, used to manage memory
settings, ingest data and query for answers.

@@ -9,6 +11,11 @@ The service is composed of two main components:
If you need to deploy and scale the web service and the pipeline handlers
separately, you can configure the service to enable/disable them.

+Once the service is up and running, you can use the **Semantic Memory web
+client** or simply interact with the Web API. The API schema is available
+at http://127.0.0.1:9001/swagger/index.html when running the service locally
+with **OpenAPI** enabled.

# ⚙️ Configuration

To quickly set up the service, run the following command and follow the
@@ -48,28 +55,60 @@ To run the Semantic Memory service:
> ### On Windows:
>
> ```shell
-> cd dotnet/Service
+> cd dotnet\Service
> run.cmd
> ```
-The `run.sh`/`run.cmd` scripts internally use the `ASPNETCORE_ENVIRONMENT` env var,
-so the code will use the settings stored in `appsettings.Development.json`.
+The `run.sh`/`run.cmd` scripts internally use the `ASPNETCORE_ENVIRONMENT`
+env var, so the code will use the settings stored in `appsettings.Development.json`.
# ⚙️ Dependencies
The service depends on five main components:
-* **Content storage**: this is where content like files, chats, emails are saved
-  and transformed when uploaded. Currently, the solution supports local
+* **Content storage**: this is where content like files, chats, emails are
+  saved and transformed when uploaded. Currently, the solution supports local
filesystem and Azure Blobs.
-* **Vector storage**: service used to persist embeddings. Currently, the solution
-  support Azure Cognitive Search and Qdrant. Soon we'll add support for more vector DBs.
-* **Data ingestion orchestration**: this can run in memory and in the same
-  process, e.g. when working with small files, or run as a service, in which
-  case it requires persistent queues like Azure Queues or RabbitMQ.
-  To use RabbitMQ locally, install docker and launch RabbitMQ with:
-      docker run -it --rm --name rabbitmq \
-      -p 5672:5672 -e RABBITMQ_DEFAULT_USER=user -e RABBITMQ_DEFAULT_PASS=password \
-      rabbitmq:3
+* **Embedding generator**: all the documents uploaded are automatically
+  partitioned (aka "chunked") and indexed for vector search, generating
+  several embedding vectors for each file. We recommend using the
+  [OpenAI ADA v2](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)
+  model, though you can easily plug in any embedding generator if needed.
+* **Text generator** (aka LLM): during document ingestion and when asking
+  questions, the service requires an LLM to execute prompts, e.g. to
+  generate synthetic data, and to generate answers. The service has
+  been tested with OpenAI
+  [GPT3.5 and GPT4](https://platform.openai.com/docs/models/overview),
+  which we recommend. The number of tokens available is also an important
+  factor affecting summarization and answer generation, so you might
+  get better results with 16k and 32k models.
+* **Vector storage**: service used to persist embeddings. Currently, the
+  service supports **Azure Cognitive Search** and **Qdrant**. Soon we'll add
+  support for more vector DBs.
+  > To use Qdrant locally, install docker and launch Qdrant with:
+  >
+  >     docker run -it --rm --name qdrant -p 6333:6333 qdrant/qdrant
+  > or simply use the `run-qdrant.sh` script from the `tools` folder.
+* **Data ingestion orchestration**: this can run in memory and in the same
+  process, e.g. when working with small files, or run as a service, in which
+  case it requires persistent queues like **Azure Queues** or **RabbitMQ**
+  (corelib also includes a basic, experimental file-based queue that might be
+  good enough for tests and demos).
+  When running as a service, we recommend persistent queues for reliability and
+  horizontal scaling.
+  > To use RabbitMQ locally, install docker and launch RabbitMQ with:
+  >
+  >     docker run -it --rm --name rabbitmq \
+  >       -p 5672:5672 -e RABBITMQ_DEFAULT_USER=user -e RABBITMQ_DEFAULT_PASS=password \
+  >       rabbitmq:3
+  > or simply use the `run-rabbitmq.sh` script from the `tools` folder.
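For illustration only, the following block is an editor's sketch, not part of this commit: it shows how the components above could be exercised in-process (serverless style). Only `AskAsync`, `answer.Result`, and `answer.RelevantSources` appear elsewhere in this commit; the `MemoryServerlessClient` type, the `ImportFileAsync` call, and the property name in the loop are assumptions for illustration, with configuration expected in `appsettings.Development.json`.

```csharp
// Hypothetical sketch of serverless (in-process) usage; not confirmed API.
// The namespaces below appear in this commit's notebook; the client type
// and import call are assumptions for illustration only.
using Microsoft.SemanticMemory.Core.AppBuilders;
using Microsoft.SemanticMemory.Core.AI.OpenAI;

// ASSUMED: an in-process client wiring content storage, embedding generator,
// text generator and vector storage from appsettings.Development.json.
var memory = new MemoryServerlessClient();

// ASSUMED: upload a PDF; the file is partitioned ("chunked"), embedded with
// the configured embedding generator, and indexed in the vector storage.
await memory.ImportFileAsync("sample-SK-Readme.pdf", documentId: "doc001");

// Shown in this commit's notebook: ask a question, print the answer and
// the sources used to generate it.
var answer = await memory.AskAsync("What's Semantic Kernel?");
Console.WriteLine(answer.Result);

foreach (var source in answer.RelevantSources)
{
    Console.WriteLine($" - {source.SourceName}"); // ASSUMED property name
}
```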
40 changes: 29 additions & 11 deletions samples/000-notebooks/001-upload-and-ask.ipynb
@@ -24,7 +24,7 @@
},
"outputs": [],
"source": [
"#r \"nuget: Microsoft.SemanticMemory.Core, 0.0.230818.3-preview\"\n",
"#r \"nuget: Microsoft.SemanticMemory.Core, 0.0.230820.3-preview\"\n",
"\n",
"using Microsoft.SemanticMemory.Core.AI.OpenAI;\n",
"using Microsoft.SemanticMemory.Core.AppBuilders;\n",
@@ -75,7 +75,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -104,7 +104,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -127,7 +127,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -136,15 +136,23 @@
"kernelName": "csharp"
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Question: What's Semantic Kernel?\n",
"\n",
"Answer: Semantic Kernel is a lightweight SDK (Software Development Kit) developed by Microsoft. It enables the integration of AI Large Language Models (LLMs) with conventional programming languages. The SDK combines natural language semantic functions, traditional code native functions, and embeddings-based memory to enhance applications with AI capabilities. Semantic Kernel supports prompt templating, function chaining, vectorized memory, and intelligent planning capabilities. It encapsulates several design patterns from the latest AI research, allowing developers to infuse their applications with features like prompt chaining, summarization, zero/few-shot learning, contextual memory, embeddings, semantic indexing, planning, retrieval-augmented generation, and accessing external knowledge stores. Semantic Kernel is available for use with C# and Python programming languages. It is an open-source project, and developers are encouraged to contribute to its development through GitHub discussions, opening issues, sending pull requests, and joining the Discord community.\n"
]
}
],
"source": [
"var question = \"What's Semantic Kernel?\";\n",
"\n",
"Console.WriteLine($\"\\n\\nQuestion: {question}\");\n",
"\n",
"var answer = await memory.AskAsync(question);\n",
"\n",
"Console.WriteLine($\"\\nAnswer: {answer.Result}\");"
"Console.WriteLine($\"Question: {question}\\n\\nAnswer: {answer.Result}\");"
]
},
{
@@ -156,7 +164,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -165,9 +173,19 @@
"kernelName": "csharp"
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sources:\n",
"\n",
" - sample-SK-Readme.pdf - doc001/cf309c5bc07142dfb400a4856ef91b79 [Monday, August 21, 2023]\n"
]
}
],
"source": [
"Console.WriteLine(\"\\n\\n Sources:\\n\");\n",
"Console.WriteLine(\"Sources:\\n\");\n",
"\n",
"foreach (var x in answer.RelevantSources)\n",
"{\n",
22 changes: 1 addition & 21 deletions samples/002-dotnet-WebClient/README.md
@@ -26,26 +26,6 @@ while (!await memory.IsDocumentReadyAsync(documentId: "doc012"))
string answer = await memory.AskAsync("What's Semantic Kernel?");
```

-# Prepare the example
-
-Before running the code, from the folder run this command:
-
-```csharp
-dotnet run setup
-```
-
-The app will ask a few questions about your configuration, storing the
-required information in `appsettings.Development.json`. This file is used when
-the env var `ASPNETCORE_ENVIRONMENT` is set to `Development`. Look at the
-comments in `appsettings.json` for details and more advanced options.
-
-You can run the command again later to edit the file, or edit it manually for
-advanced configurations.
-
-You can find more details about the configuration options in `appsettings.json`,
-and more info about .NET configurations at
-https://learn.microsoft.com/aspnet/core/fundamentals/configuration
# Run the example

-To run the example, depending on your platform, execute either `run.sh` or `run.cmd`.
+To run the example, depending on your platform, execute either `run.sh` or `run.cmd`, or just `dotnet run`.
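For context, an editor's sketch, not part of the diff: the snippet quoted above can be extended into a complete upload-and-ask flow against a locally running service. The polling loop, `IsDocumentReadyAsync`, and `AskAsync` come from the sample quoted above; `MemoryWebClient`, its constructor argument, and `ImportDocumentAsync` are assumptions for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Sketch only: assumes the service is listening on the default local endpoint.
// "MemoryWebClient" and "ImportDocumentAsync" are assumed names, not confirmed here.
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

// ASSUMED: upload a document, tagging it with the id polled below.
await memory.ImportDocumentAsync("file1.pdf", documentId: "doc012");

// From the sample above: wait until the ingestion pipeline has processed the file.
while (!await memory.IsDocumentReadyAsync(documentId: "doc012"))
{
    await Task.Delay(TimeSpan.FromSeconds(1));
}

// From the sample above: ask a question grounded on the uploaded document.
string answer = await memory.AskAsync("What's Semantic Kernel?");
Console.WriteLine(answer);
```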
2 changes: 1 addition & 1 deletion samples/010-using-core-nuget/010-using-core-nuget.csproj
@@ -8,7 +8,7 @@
</PropertyGroup>

<ItemGroup>
<PackageReference Include="Microsoft.SemanticMemory.Core" Version="0.0.230818.3-preview" />
<PackageReference Include="Microsoft.SemanticMemory.Core" Version="0.0.230820.3-preview" />
</ItemGroup>

<ItemGroup>
12 changes: 6 additions & 6 deletions samples/README.md
@@ -2,9 +2,9 @@

Some examples about how to use Semantic Memory.

-1. [Using the web service](dotnet-WebClient)
-2. [Importing files without the service (serverless ingestion)](dotnet-Serverless)
-3. [How to upload files from command line with curl](curl)
-4. [Writing a custom pipeline handler](dotnet-CustomHandler)
-5. [Importing files with custom steps](dotnet-ServerlessCustomPipeline)
-6. [Extracting text from documents](dotnet-ExtractTextFromDocs)
+1. [Collection of Jupyter notebooks with various tests](000-notebooks)
+2. [Importing files and asking questions without running the service (serverless mode)](001-dotnet-Serverless)
+3. [Using the Semantic Memory web service](002-dotnet-WebClient)
+4. [How to upload files from command line with curl](003-curl-calling-webservice)
+5. [Processing files with custom steps](004-dotnet-ServerlessCustomPipeline)
+6. [Writing a custom pipeline handler](006-dotnet-CustomHandler)
30 changes: 26 additions & 4 deletions tools/README.md
@@ -8,15 +8,37 @@ Instructions:
./upload-file.sh -h
```

-Example:
+# ask.sh

+Simple client for asking questions about your documents from the command line.

+Instructions:

```bash
-./upload-file.sh -f test.pdf -s http://127.0.0.1:9001/upload -u curlUser -c curlDataCollection -i curlExample01
+./ask.sh -h
```

+# search.sh

+Simple client for searching your indexed documents from the command line.

+Instructions:

+```bash
+./search.sh -h
+```

+# run-qdrant.sh

+Script to start Qdrant using Docker, for local development/debugging.

+Qdrant is used to store and search vectors, as an alternative to
+[Azure Cognitive Search](https://azure.microsoft.com/products/ai-services/cognitive-search).

# run-rabbitmq.sh

Script to start RabbitMQ using Docker, for local development/debugging.

-RabbitMQ is used to provides queues for the asynchronous pipelines, as an alternative
-to Azure Queues.
+RabbitMQ is used to provide queues for the asynchronous pipelines,
+as an alternative to
+[Azure Queues](https://learn.microsoft.com/azure/storage/queues/storage-queues-introduction).
