# Burr Serving with BentoML

This repository shows how to deploy a Burr Application with BentoML.

## Overview

Burr and BentoML help you build the application and serving layers of your system.

### Application layer

Burr lets you create applications that are easy to understand and debug, with a clear path to production. It supports synchronous, asynchronous, and streaming actions. Persistence and durability, hooks, and telemetry are built in.
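
For a sense of Burr's programming model, here is a minimal, hypothetical sketch of a two-action application. The action names, state fields, and placeholder logic are illustrative only and are not taken from the example in this repository.

```python
from burr.core import ApplicationBuilder, State, action

# Hypothetical actions; the real example lives in web_page_qna/.
@action(reads=[], writes=["question"])
def receive_question(state: State, question: str) -> State:
    return state.update(question=question)

@action(reads=["question"], writes=["answer"])
def answer_question(state: State) -> State:
    # Placeholder logic; a real application would call an LLM here.
    return state.update(answer=f"You asked: {state['question']}")

app = (
    ApplicationBuilder()
    .with_actions(receive_question, answer_question)
    .with_transitions(("receive_question", "answer_question"))
    .with_entrypoint("receive_question")
    .build()
)

last_action, result, state = app.run(
    halt_after=["answer_question"],
    inputs={"question": "What does this page say?"},
)
print(state["answer"])
```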

### Serving layer

BentoML is a specialized tool to package, deploy, and manage AI services. Get the most out of your system by configuring resource requirements (CPU, GPU, RAM, workers, concurrency, etc.), autoscaling, and adaptive request batching. It also automatically generates synchronous and asynchronous clients for your service.
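
As a rough sketch of what the serving layer can look like, the snippet below defines a hypothetical BentoML service with a single `answer` endpoint. The service name, endpoint, and resource values are assumptions for illustration and do not mirror the example in this repository.

```python
import bentoml

# Hypothetical service; resource and timeout values are illustrative only.
@bentoml.service(
    resources={"cpu": "2"},
    traffic={"timeout": 60},
)
class QuestionAnswering:
    @bentoml.api
    def answer(self, question: str) -> str:
        # A real service would run a Burr application (or other logic) here.
        return f"Echoing your question: {question}"
```

A service defined this way can be run locally with the `bentoml serve` CLI before deploying it.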

## Directory Content

- `web_page_qna/`: an introductory example that deploys a Burr Application with BentoML, using LLMs to answer questions about a web page.

## Community

Join the BentoML developer community on Slack for more support and discussions!

Join the Burr Discord server for help, questions, and feature requests.