# Add C# notebook sample
Update docs
dluc committed Aug 21, 2023
1 parent 32db9b5 commit 6501b10
Showing 8 changed files with 144 additions and 67 deletions.
29 changes: 19 additions & 10 deletions README.md
@@ -1,6 +1,6 @@
# Semantic Memory

-**Semantic Memory** is an open-source library and [service](dotnet/Service)
+**Semantic Memory** is an open-source library and [service](dotnet/Service/README.md)
specialized in the efficient indexing of datasets through custom continuous data
pipelines.

@@ -210,7 +210,7 @@ to **start the Semantic Memory Service**:
> }
> ```
-You can find a [full example here](samples/dotnet-WebClient/).
+You can find a [full example here](samples/002-dotnet-WebClient/README.md).
## Custom memory ingestion pipelines
@@ -254,11 +254,20 @@ running the service locally with OpenAPI enabled.
-# Examples and Tools
-1. [Using the web service](samples/dotnet-WebClient)
-2. [Importing files without the service (serverless ingestion)](samples/dotnet-Serverless)
-3. [Upload files and get answers from command line with curl](samples/curl)
-4. [Writing a custom pipeline handler](samples/dotnet-CustomHandler)
-5. [Importing files with custom steps](samples/dotnet-ServerlessCustomPipeline)
-6. [Extracting text from documents](samples/dotnet-ExtractTextFromDocs)
-7. [Curl script to upload files](tools/upload-file.sh)
-8. [Script to start RabbitMQ for development tasks](tools/run-rabbitmq.sh)
+## Examples
+1. [Collection of Jupyter notebooks with various tests](samples/000-notebooks)
+2. [Importing files and asking questions without running the service (serverless mode)](samples/001-dotnet-Serverless)
+3. [Using the Semantic Memory web service](samples/002-dotnet-WebClient)
+4. [How to upload files from command line with curl](samples/003-curl-calling-webservice)
+5. [Processing files with custom steps](samples/004-dotnet-ServerlessCustomPipeline)
+6. [Writing a custom pipeline handler](samples/006-dotnet-CustomHandler)
+## Tools
+1. [Curl script to upload files](tools/upload-file.sh)
+2. [Curl script to ask questions](tools/ask.sh)
+3. [Curl script to search documents](tools/search.sh)
+4. [Script to start Qdrant for development tasks](tools/run-qdrant.sh)
+5. [Script to start RabbitMQ for development tasks](tools/run-rabbitmq.sh)
9 changes: 9 additions & 0 deletions SemanticMemory.sln
@@ -47,6 +47,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "tools", "tools", "{CA49F1A1
tools\README.md = tools\README.md
tools\ask.sh = tools\ask.sh
tools\search.sh = tools\search.sh
+tools\run-qdrant.sh = tools\run-qdrant.sh
EndProjectSection
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "ClientLib", "dotnet\ClientLib\ClientLib.csproj", "{8A9FA587-7EBA-4D43-BE47-38D798B1C74C}"
@@ -72,6 +73,13 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "nuget", "nuget", "{02B67859
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "010-using-core-nuget", "samples\010-using-core-nuget\010-using-core-nuget.csproj", "{17F42A38-46CB-4471-AF93-3AE6981B7278}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "000-notebooks", "000-notebooks", "{C93FCED9-808A-4B03-9B4C-0DBAE46D5BDD}"
ProjectSection(SolutionItems) = preProject
samples\000-notebooks\001-upload-and-ask.ipynb = samples\000-notebooks\001-upload-and-ask.ipynb
samples\000-notebooks\appsettings.json = samples\000-notebooks\appsettings.json
samples\000-notebooks\sample-SK-Readme.pdf = samples\000-notebooks\sample-SK-Readme.pdf
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -89,6 +97,7 @@ Global
{87DEAE8D-138C-4FDD-B4C9-11C3A7817E8F} = {6EF76FD8-4C35-4370-8539-5DDF45357A50}
{02B67859-5B16-46E3-9CB4-916887275649} = {87DEAE8D-138C-4FDD-B4C9-11C3A7817E8F}
{17F42A38-46CB-4471-AF93-3AE6981B7278} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
+{C93FCED9-808A-4B03-9B4C-0DBAE46D5BDD} = {0A43C65C-6007-4BB4-B3FE-8D439FC91841}
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{8A9FA587-7EBA-4D43-BE47-38D798B1C74C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
67 changes: 53 additions & 14 deletions dotnet/Service/README.md
@@ -1,3 +1,5 @@
+# Semantic Memory as a Service
+
This folder contains **Semantic Memory Service**, used to manage memory
settings, ingest data and query for answers.

@@ -9,6 +11,11 @@ The service is composed of two main components:
If you need to deploy and scale the web service and the pipeline handlers
separately, you can configure the service to enable/disable them.

+Once the service is up and running, you can use the **Semantic Memory web
+client** or simply interact with the Web API. The API schema is available
+at http://127.0.0.1:9001/swagger/index.html when running the service locally
+with **OpenAPI** enabled.

# ⚙️ Configuration

To quickly set up the service, run the following command and follow the
@@ -48,28 +55,60 @@ To run the Semantic Memory service:
> ### On Windows:
>
> ```shell
-> cd dotnet/Service
+> cd dotnet\Service
> run.cmd
> ```
-The `run.sh`/`run.cmd` scripts internally use the `ASPNETCORE_ENVIRONMENT` env var,
-so the code will use the settings stored in `appsettings.Development.json`.
+The `run.sh`/`run.cmd` scripts internally use the `ASPNETCORE_ENVIRONMENT`
+env var, so the code will use the settings stored in `appsettings.Development.json`.
# ⚙️ Dependencies
The service depends on five main components:
-* **Content storage**: this is where content like files, chats, emails are saved
-  and transformed when uploaded. Currently, the solution supports local
+* **Content storage**: this is where content like files, chats, emails are
+  saved and transformed when uploaded. Currently, the solution supports local
filesystem and Azure Blobs.
-* **Vector storage**: service used to persist embeddings. Currently, the solution
-  support Azure Cognitive Search and Qdrant. Soon we'll add support for more vector DBs.
-* **Data ingestion orchestration**: this can run in memory and in the same
-  process, e.g. when working with small files, or run as a service, in which
-  case it requires persistent queues like Azure Queues or RabbitMQ.
-  To use RabbitMQ locally, install docker and launch RabbitMQ with:
-      docker run -it --rm --name rabbitmq \
-      -p 5672:5672 -e RABBITMQ_DEFAULT_USER=user -e RABBITMQ_DEFAULT_PASS=password \
-      rabbitmq:3
+* **Embedding generator**: all the documents uploaded are automatically
+  partitioned (aka "chunked") and indexed for vector search, generating
+  several embedding vectors for each file. We recommend using the
+  [OpenAI ADA v2](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)
+  model, though you can easily plug in any embedding generator if needed.
+* **Text generator** (aka LLM): during document ingestion and when asking
+  questions, the service requires an LLM to execute prompts, e.g. to
+  generate synthetic data, and to generate answers. The service has
+  been tested with OpenAI
+  [GPT3.5 and GPT4](https://platform.openai.com/docs/models/overview),
+  which we recommend. The number of tokens available is also an important
+  factor affecting summarization and answer generation, so you might
+  get better results with 16k and 32k models.
+* **Vector storage**: service used to persist embeddings. Currently, the
+  service supports **Azure Cognitive Search** and **Qdrant**. Soon we'll add
+  support for more vector DBs.
+  > To use Qdrant locally, install docker and launch Qdrant with:
+  >
+  >     docker run -it --rm --name qdrant -p 6333:6333 qdrant/qdrant
+  > or simply use the `run-qdrant.sh` script from the `tools` folder.
+* **Data ingestion orchestration**: this can run in memory and in the same
+  process, e.g. when working with small files, or run as a service, in which
+  case it requires persistent queues like **Azure Queues** or **RabbitMQ**
+  (corelib also includes a basic, experimental file-based queue that might be
+  good enough for tests and demos).
+  When running as a service, we recommend persistent queues for reliability and
+  horizontal scaling.
+  > To use RabbitMQ locally, install docker and launch RabbitMQ with:
+  >
+  >     docker run -it --rm --name rabbitmq \
+  >       -p 5672:5672 -e RABBITMQ_DEFAULT_USER=user -e RABBITMQ_DEFAULT_PASS=password \
+  >       rabbitmq:3
+  > or simply use the `run-rabbitmq.sh` script from the `tools` folder.
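For illustration only, the following block is an editor's sketch, not part of this commit: it shows how the components above could be exercised in-process (serverless style). Only `AskAsync`, `answer.Result`, and `answer.RelevantSources` appear elsewhere in this commit; the `MemoryServerlessClient` type, the `ImportFileAsync` call, and the property name in the loop are assumptions for illustration, with configuration expected in `appsettings.Development.json`.

```csharp
// Hypothetical sketch of serverless (in-process) usage; not confirmed API.
// The namespaces below appear in this commit's notebook; the client type
// and import call are assumptions for illustration only.
using Microsoft.SemanticMemory.Core.AppBuilders;
using Microsoft.SemanticMemory.Core.AI.OpenAI;

// ASSUMED: an in-process client wiring content storage, embedding generator,
// text generator and vector storage from appsettings.Development.json.
var memory = new MemoryServerlessClient();

// ASSUMED: upload a PDF; the file is partitioned ("chunked"), embedded with
// the configured embedding generator, and indexed in the vector storage.
await memory.ImportFileAsync("sample-SK-Readme.pdf", documentId: "doc001");

// Shown in this commit's notebook: ask a question, print the answer and
// the sources used to generate it.
var answer = await memory.AskAsync("What's Semantic Kernel?");
Console.WriteLine(answer.Result);

foreach (var source in answer.RelevantSources)
{
    Console.WriteLine($" - {source.SourceName}"); // ASSUMED property name
}
```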
40 changes: 29 additions & 11 deletions samples/000-notebooks/001-upload-and-ask.ipynb
@@ -24,7 +24,7 @@
},
"outputs": [],
"source": [
"#r \"nuget: Microsoft.SemanticMemory.Core, 0.0.230818.3-preview\"\n",
"#r \"nuget: Microsoft.SemanticMemory.Core, 0.0.230820.3-preview\"\n",
"\n",
"using Microsoft.SemanticMemory.Core.AI.OpenAI;\n",
"using Microsoft.SemanticMemory.Core.AppBuilders;\n",
@@ -75,7 +75,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -104,7 +104,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -127,7 +127,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -136,15 +136,23 @@
"kernelName": "csharp"
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Question: What's Semantic Kernel?\n",
"\n",
"Answer: Semantic Kernel is a lightweight SDK (Software Development Kit) developed by Microsoft. It enables the integration of AI Large Language Models (LLMs) with conventional programming languages. The SDK combines natural language semantic functions, traditional code native functions, and embeddings-based memory to enhance applications with AI capabilities. Semantic Kernel supports prompt templating, function chaining, vectorized memory, and intelligent planning capabilities. It encapsulates several design patterns from the latest AI research, allowing developers to infuse their applications with features like prompt chaining, summarization, zero/few-shot learning, contextual memory, embeddings, semantic indexing, planning, retrieval-augmented generation, and accessing external knowledge stores. Semantic Kernel is available for use with C# and Python programming languages. It is an open-source project, and developers are encouraged to contribute to its development through GitHub discussions, opening issues, sending pull requests, and joining the Discord community.\n"
]
}
],
"source": [
"var question = \"What's Semantic Kernel?\";\n",
"\n",
"Console.WriteLine($\"\\n\\nQuestion: {question}\");\n",
"\n",
"var answer = await memory.AskAsync(question);\n",
"\n",
"Console.WriteLine($\"\\nAnswer: {answer.Result}\");"
"Console.WriteLine($\"Question: {question}\\n\\nAnswer: {answer.Result}\");"
]
},
{
@@ -156,7 +164,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
@@ -165,9 +173,19 @@
"kernelName": "csharp"
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sources:\n",
"\n",
" - sample-SK-Readme.pdf - doc001/cf309c5bc07142dfb400a4856ef91b79 [Monday, August 21, 2023]\n"
]
}
],
"source": [
"Console.WriteLine(\"\\n\\n Sources:\\n\");\n",
"Console.WriteLine(\"Sources:\\n\");\n",
"\n",
"foreach (var x in answer.RelevantSources)\n",
"{\n",
22 changes: 1 addition & 21 deletions samples/002-dotnet-WebClient/README.md
@@ -26,26 +26,6 @@ while (!await memory.IsDocumentReadyAsync(documentId: "doc012"))
string answer = await memory.AskAsync("What's Semantic Kernel?");
```

-# Prepare the example
-
-Before running the code, from the folder run this command:
-
-```csharp
-dotnet run setup
-```
-
-The app will ask a few questions about your configuration, storing the
-required information in `appsettings.Development.json`. This file is used when
-the env var `ASPNETCORE_ENVIRONMENT` is set to `Development`. Look at the
-comments in `appsettings.json` for details and more advanced options.
-
-You can run the command again later to edit the file, or edit it manually for
-advanced configurations.
-
-You can find more details about the configuration options in `appsettings.json`,
-and more info about .NET configurations at
-https://learn.microsoft.com/aspnet/core/fundamentals/configuration
# Run the example

-To run the example, depending on your platform, execute either `run.sh` or `run.cmd`.
+To run the example, depending on your platform, execute either `run.sh` or `run.cmd`, or just `dotnet run`.
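For context, an editor's sketch, not part of the diff: the snippet quoted above can be extended into a complete upload-and-ask flow against a locally running service. The polling loop, `IsDocumentReadyAsync`, and `AskAsync` come from the sample quoted above; `MemoryWebClient`, its constructor argument, and `ImportDocumentAsync` are assumptions for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Sketch only: assumes the service is listening on the default local endpoint.
// "MemoryWebClient" and "ImportDocumentAsync" are assumed names, not confirmed here.
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

// ASSUMED: upload a document, tagging it with the id polled below.
await memory.ImportDocumentAsync("file1.pdf", documentId: "doc012");

// From the sample above: wait until the ingestion pipeline has processed the file.
while (!await memory.IsDocumentReadyAsync(documentId: "doc012"))
{
    await Task.Delay(TimeSpan.FromSeconds(1));
}

// From the sample above: ask a question grounded on the uploaded document.
string answer = await memory.AskAsync("What's Semantic Kernel?");
Console.WriteLine(answer);
```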
2 changes: 1 addition & 1 deletion samples/010-using-core-nuget/010-using-core-nuget.csproj
@@ -8,7 +8,7 @@
</PropertyGroup>

<ItemGroup>
<PackageReference Include="Microsoft.SemanticMemory.Core" Version="0.0.230818.3-preview" />
<PackageReference Include="Microsoft.SemanticMemory.Core" Version="0.0.230820.3-preview" />
</ItemGroup>

<ItemGroup>
12 changes: 6 additions & 6 deletions samples/README.md
@@ -2,9 +2,9 @@

Some examples about how to use Semantic Memory.

-1. [Using the web service](dotnet-WebClient)
-2. [Importing files without the service (serverless ingestion)](dotnet-Serverless)
-3. [How to upload files from command line with curl](curl)
-4. [Writing a custom pipeline handler](dotnet-CustomHandler)
-5. [Importing files with custom steps](dotnet-ServerlessCustomPipeline)
-6. [Extracting text from documents](dotnet-ExtractTextFromDocs)
+1. [Collection of Jupyter notebooks with various tests](000-notebooks)
+2. [Importing files and asking questions without running the service (serverless mode)](001-dotnet-Serverless)
+3. [Using the Semantic Memory web service](002-dotnet-WebClient)
+4. [How to upload files from command line with curl](003-curl-calling-webservice)
+5. [Processing files with custom steps](004-dotnet-ServerlessCustomPipeline)
+6. [Writing a custom pipeline handler](006-dotnet-CustomHandler)
30 changes: 26 additions & 4 deletions tools/README.md
@@ -8,15 +8,37 @@ Instructions:
./upload-file.sh -h
```

-Example:
+# ask.sh

+Simple client for asking questions about your documents from the command line.

+Instructions:

```bash
-./upload-file.sh -f test.pdf -s http://127.0.0.1:9001/upload -u curlUser -c curlDataCollection -i curlExample01
+./ask.sh -h
```

+# search.sh

+Simple client for searching your indexed documents from the command line.

+Instructions:

+```bash
+./search.sh -h
+```

+# run-qdrant.sh

+Script to start Qdrant using Docker, for local development/debugging.

+Qdrant is used to store and search vectors, as an alternative to
+[Azure Cognitive Search](https://azure.microsoft.com/products/ai-services/cognitive-search).

# run-rabbitmq.sh

Script to start RabbitMQ using Docker, for local development/debugging.

-RabbitMQ is used to provides queues for the asynchronous pipelines, as an alternative
-to Azure Queues.
+RabbitMQ is used to provide queues for the asynchronous pipelines,
+as an alternative to
+[Azure Queues](https://learn.microsoft.com/azure/storage/queues/storage-queues-introduction).
