pathway-labs/azure-openai-real-time-data-app

Data streaming for real-time enterprise AI apps

This repository demonstrates how to build real-time generative AI applications using Azure Event Hubs + Azure OpenAI + Pathway's LLM App + Streamlit.

Motivation

A real-time AI app needs real-time data to answer user queries with the most up-to-date information or to act quickly and autonomously. To reduce cost and infrastructure complexity, you can build a real-time data pipeline with Azure Event Hubs, Pathway, and Azure OpenAI. This integrated system combines Pathway for data processing, LLMs such as GPT for text analytics, and Streamlit for data visualization.

This combination lets businesses build and deploy enterprise AI applications that present up-to-date, context-aware data visually.

Example scenario: Customer support and sentiment analysis dashboard

Background

Consider a multinational corporation that wants to improve its customer support by analyzing customer feedback and inquiries in real time. It aims to understand common issues, track customer sentiment, and identify areas for improvement in its products and services. To achieve this, it needs a system that can process large data streams, analyze text for insights, and present these insights in an accessible way.

Implementation

  1. Azure Event Hubs & Kafka: Real-Time Data Streaming and Processing

    Azure Event Hubs collects real-time data from various sources, such as customer feedback forms, support chat logs, and social media mentions. This data is then streamed into a Kafka cluster for further processing.

  2. Large Language Models (LLMs) like GPT from Azure OpenAI: Text Analysis and Sentiment Detection

    The text data from Kafka is fed into an LLM for natural language processing using Pathway. This model performs sentiment analysis, key phrase extraction, and feedback categorization (e.g., identifying common issues or topics).

  3. Pathway: Real-Time Data Pipeline

    Pathway connects to the data streams from Azure Event Hubs and preprocesses, transforms, or joins them. The LLM App then brings real-time context to the AI app through real-time vector indexing, semantic search, and retrieval. The text content of each event is sent through the LLM App to the Azure OpenAI embedding API to compute embeddings, and the resulting vector representations are indexed.

    Using the LLM App, the company can gain deep insights from unstructured text data and understand the sentiment and nuances of customer feedback.

  4. Streamlit: Interactive Dashboard for Visualization

    Streamlit is used to create an interactive web dashboard that visualizes the insights derived from customer feedback. This dashboard can show real-time metrics such as overall sentiment trends and common topics in customer feedback, and can even alert the team to emerging issues (see the example implementation of alerting to enhance this project).
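
To make the indexing and retrieval idea in step 3 more concrete, here is a minimal, standard-library-only sketch of a vector index with cosine-similarity search. It is not how Pathway or the LLM App implements indexing. The class name `InMemoryVectorIndex`, the event IDs, and the three-dimensional toy vectors are all hypothetical stand-ins for embeddings that would come from the Azure OpenAI embedding API.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction, so treat it as unrelated.
    return dot / (norm_a * norm_b)


class InMemoryVectorIndex:
    """Hypothetical toy index: maps an event ID to (embedding, text)."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[List[float], str]] = {}

    def upsert(self, event_id: str, embedding: List[float], text: str) -> None:
        # Re-inserting an existing ID overwrites it, so updated events replace old ones.
        self._items[event_id] = (embedding, text)

    def search(self, query_embedding: List[float], k: int = 3) -> List[Tuple[float, str]]:
        # Score every stored item against the query and keep the k best.
        scored = [
            (cosine_similarity(query_embedding, embedding), text)
            for embedding, text in self._items.values()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy embeddings (assumed values); real ones would come from an embedding model.
index = InMemoryVectorIndex()
index.upsert("evt-1", [0.9, 0.1, 0.0], "The checkout page keeps timing out.")
index.upsert("evt-2", [0.1, 0.9, 0.0], "Support resolved my issue quickly, thanks!")
index.upsert("evt-3", [0.8, 0.2, 0.1], "Payment failed twice during checkout.")

top_matches = index.search([1.0, 0.0, 0.0], k=2)
for score, text in top_matches:
    print(f"{score:.3f}  {text}")
```

In a streaming setting the `upsert` call would run for every incoming event, so a query always sees the latest feedback. A linear scan like this only suits small examples.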

Overview of the Azure services the sample project uses

| Service | Purpose |
| --- | --- |
| Azure AI Services | To use Azure OpenAI GPT model and embeddings. |
| Azure Event Hubs | To stream real-time events from various data sources. |
| Azure Container Apps | Hosts our containerized applications (backend and frontend) with features like auto-scaling and load balancing. |
| Azure Container Registry | Stores our Docker container images in a managed, private registry. |
| Azure Log Analytics | Collects and analyzes telemetry and logs for insights into application performance and diagnostics. |
| Azure Monitor | Provides comprehensive monitoring of our applications, infrastructure, and network. |

Azure infrastructure with the main components

Azure Infrastructure Diagram

One-click running app demo

Follow the link to see the running UI app in Azure:

Customer support and sentiment analysis dashboard

It builds a real-time dashboard based on an example prompt we provided to analyze the data.

Set up the project

To set up the project, follow these steps:

  1. Make sure you have an Azure account with the required settings specified in the Prerequisites section.
  2. Choose one of these environments to open the project:
    1. GitHub Codespaces.
    2. VS Code Dev Containers.
    3. Local environment.
  3. Follow the deploy from scratch or deploy with existing Azure resources guide.

Prerequisites

Azure account requirements: To run and deploy the example project, you'll need:

  • Azure account. If you're new to Azure, get an Azure account for free and you'll get some free Azure credits to get started.
  • Azure subscription with access enabled for the Azure OpenAI service. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.

Open in GitHub Codespaces

Follow these steps to open the project in a Codespace:

  1. Click here to open in GitHub Codespaces

  2. Next, deploy from scratch or deploy with existing Azure resources.

Open in Dev Container

  1. Click here to open in Dev Container

  2. Next, deploy from scratch or deploy with existing Azure resources.

Local environment

First, install the required tools:

  • Azure Developer CLI
  • Python 3.9, 3.10, or 3.11
    • Important: Ensure you can run python --version from the console. On Ubuntu, you might need to run sudo apt install python-is-python3 to link python to python3.
  • Git
  • Install WSL - For Windows users only.
  • PowerShell 7+ (pwsh) - For Windows users only.
    • Important: Ensure you can run pwsh.exe from a PowerShell terminal. If this fails, you likely need to upgrade PowerShell.
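
As a quick sanity check for the Python requirement above, the following sketch reports whether the interpreter behind `python` is one of the listed versions. The helper name `is_supported_python` is hypothetical and not part of the project.

```python
import sys
from typing import Tuple

# Versions listed in the tool requirements above.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported_python(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) is one of the listed Python versions."""
    return version in SUPPORTED_VERSIONS


current = (sys.version_info.major, sys.version_info.minor)
if is_supported_python(current):
    print(f"Python {current[0]}.{current[1]} is in the supported list.")
else:
    print(f"Python {current[0]}.{current[1]} is not in the supported list (3.9, 3.10, 3.11).")
```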

Then bring down the project code:

  1. Open a terminal.
  2. Run azd auth login and log in using your Azure account credentials.
  3. Run azd init -t https://github.com/pathway-labs/azure-openai-real-time-data-app. This command will initialize a git repository and you do not need to clone this repository.
  4. When the project initializes, you are prompted to enter a new environment name (AZURE_ENV_NAME). Read more in manage environment variables. You can use any name, for example pathway. Outputs from infrastructure provisioning are automatically stored as environment variables in an .env file, located at .azure/pathway/.env in the project folder.
  5. Then, follow the deploying from scratch guide.

Run the project locally

  1. Open the project in GitHub Codespaces, VS Code Dev Containers, or Local environment.
  2. Deploy from scratch or deploy with existing Azure resources.
  3. Copy the .env file located at .azure/<environment name>/.env to a new .env file in the project root folder, where the README.md file is.
  4. Install the required packages:
pip install --upgrade -r requirements_dev.txt
  5. Navigate to the app/frontend folder: cd app/frontend.
  6. Run the UI app with the streamlit run app.py command. The frontend app automatically uses the backend API deployed in Azure.
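
The copy of the .env file described in the steps above can also be scripted. This is a small sketch under the assumption that it runs against the project root. The function name `copy_azd_env` is hypothetical, and the environment name is whatever you entered for AZURE_ENV_NAME.

```python
import shutil
from pathlib import Path


def copy_azd_env(project_root: Path, environment_name: str) -> Path:
    """Copy .azure/<environment name>/.env to <project root>/.env and return the new path."""
    source = project_root / ".azure" / environment_name / ".env"
    if not source.is_file():
        raise FileNotFoundError(f"No azd environment file found at {source}")
    target = project_root / ".env"
    shutil.copyfile(source, target)  # Overwrites an existing root .env file.
    return target


# Example usage from the project root (the environment name "pathway" is an assumption):
# copy_azd_env(Path.cwd(), "pathway")
```

Note that the copy overwrites any .env file already present in the project root, so back it up first if it holds local changes.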

Deploy from scratch

If you don't have any pre-existing Azure services and want to start from a fresh deployment, execute the following commands.

  1. Open a terminal.
  2. Run azd up - This will provision Azure resources and deploy the sample project to those resources. We're using Bicep, a language that simplifies the definition of ARM templates and configuring Azure resources.
  3. Leave EVENT_HUBS_NAMESPACE_CONNECTION_STRING and AZURE_OPENAI_API_KEY empty for now. You will set them after the first successful deployment.

Deployment step 1

Deployment step 2

After the application has been successfully deployed you will see URLs for both backend and frontend apps printed to the console.

Deployment step 3

NOTE: It may take 5-10 minutes for the application to be fully deployed.

  4. After the first deployment, set the EVENT_HUBS_NAMESPACE_CONNECTION_STRING and AZURE_OPENAI_API_KEY environment variables by running the commands below. See how to retrieve the Azure OpenAI API key and the Event Hubs connection string. You can also set these values manually in the Azure portal, in which case you can skip the azd deploy step that follows.
azd env set AZURE_OPENAI_API_KEY {Azure OpenAI API Key}

azd env set EVENT_HUBS_NAMESPACE_CONNECTION_STRING {Azure Event Hubs Namespace Connection String}
  5. Run azd deploy to update these values in the Azure Container App. The Pathway LLM App backend uses these environment variables. Other variables are filled in automatically.
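
A mistyped connection string is an easy mistake to make at this step. The sketch below checks a value before you pass it to azd env set, assuming the commonly documented namespace-level format of semicolon-separated `Endpoint`, `SharedAccessKeyName`, and `SharedAccessKey` pairs. The function name and the sample value are hypothetical, and the key shown is a placeholder rather than a real secret.

```python
from typing import Dict

# Keys assumed to be present in a namespace-level connection string.
REQUIRED_KEYS = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey")


def parse_connection_string(connection_string: str) -> Dict[str, str]:
    """Split a connection string into key/value pairs and check the expected keys."""
    parts: Dict[str, str] = {}
    for segment in connection_string.strip().split(";"):
        if not segment:
            continue  # Tolerate a trailing semicolon.
        # Split on the first "=" only: base64 keys often end with "=" padding.
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"Segment without '=': {segment!r}")
        parts[key.strip()] = value.strip()

    missing = [key for key in REQUIRED_KEYS if not parts.get(key)]
    if missing:
        raise ValueError(f"Missing keys: {', '.join(missing)}")
    if not parts["Endpoint"].startswith("sb://"):
        raise ValueError("Endpoint is expected to start with 'sb://'")
    return parts


# Placeholder value for illustration only.
sample = (
    "Endpoint=sb://example-namespace.servicebus.windows.net/;"
    "SharedAccessKeyName=RootManageSharedAccessKey;"
    "SharedAccessKey=abc123=="
)
parsed = parse_connection_string(sample)
print(parsed["Endpoint"])
```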

Deployment step 4

  6. Follow the link generated in the terminal for the frontend app in Azure Container Apps and start using the app. The app ingests data from Azure Event Hubs. Learn how to send events using the Azure Event Hubs Data Generator.
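
If you want test data to send, the following sketch builds a few JSON feedback events using only the standard library. The field names (`id`, `source`, `text`, `created_at`) and the sample texts are assumptions for illustration, not the schema the sample backend expects, and the sketch does not send anything to Event Hubs.

```python
import json
import random
from datetime import datetime, timezone
from typing import List

# Hypothetical sample values; the sources mirror the scenario described earlier.
SOURCES = ["feedback_form", "support_chat", "social_media"]
SAMPLE_FEEDBACK = [
    "The checkout page keeps timing out.",
    "Support resolved my issue quickly, thanks!",
    "The new release is much faster.",
    "I was charged twice for one order.",
]


def make_feedback_events(count: int, seed: int = 0) -> List[str]:
    """Return `count` JSON-encoded feedback events with an assumed schema."""
    rng = random.Random(seed)  # Seeded so repeated runs pick the same texts.
    events = []
    for i in range(count):
        payload = {
            "id": f"feedback-{i}",
            "source": rng.choice(SOURCES),
            "text": rng.choice(SAMPLE_FEEDBACK),
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        events.append(json.dumps(payload))
    return events


for event in make_feedback_events(3):
    print(event)
```

Each printed line can be pasted into a data generator or passed to whichever Event Hubs client you use.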

Customer support and sentiment analysis dashboard

Deploy with existing Azure resources

If you already have existing Azure resources, you can re-use those by setting azd environment values.

Existing Azure resource group

  1. Run azd env set AZURE_RESOURCE_GROUP {Name of existing resource group}
  2. Run azd env set AZURE_LOCATION {Location of existing resource group}

Existing Azure OpenAI resource

  1. Run azd env set AZURE_OPENAI_SERVICE {Name of existing OpenAI service}
  2. Run azd env set AZURE_OPENAI_RESOURCE_GROUP {Name of existing resource group that OpenAI service is provisioned to}
  3. Run azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT {Name of existing ChatGPT deployment}. Only needed if your ChatGPT deployment is not the default 'chat'.
  4. Run azd env set AZURE_OPENAI_EMB_DEPLOYMENT {Name of existing GPT embedding deployment}. Only needed if your embedding deployment is not the default 'embedding'.

When you run azd up afterwards and are prompted to select a value for openAiResourceGroupLocation, make sure to select the same location as the existing OpenAI resource group.
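
The two optional deployment settings above fall back to the defaults 'chat' and 'embedding'. The sketch below only illustrates that fallback rule. The function name `resolve_deployments` is hypothetical, and the project's own templates may apply the defaults differently.

```python
from typing import Dict, Mapping

# Defaults named in the steps above.
DEFAULT_DEPLOYMENTS = {
    "AZURE_OPENAI_CHATGPT_DEPLOYMENT": "chat",
    "AZURE_OPENAI_EMB_DEPLOYMENT": "embedding",
}


def resolve_deployments(env: Mapping[str, str]) -> Dict[str, str]:
    """Return deployment names, using the default when a value is unset or empty."""
    return {
        name: env.get(name) or default
        for name, default in DEFAULT_DEPLOYMENTS.items()
    }


# Only the chat deployment is overridden here; the embedding one uses its default.
resolved = resolve_deployments({"AZURE_OPENAI_CHATGPT_DEPLOYMENT": "my-gpt-deployment"})
print(resolved)
```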