I was recently invited to the early preview program for Chrome's Built-in AI (the Prompt API). Built-in AI is exploratory work on what may become a cross-browser standard for embedded AI. It runs Gemini Nano on-device; the model is bundled into your web browser, and LLM generation happens in your local browser environment.
This repo is extremely WIP: demonstration prototype code that investigates ways the Prompt API could be used. It will ONLY work in Chrome browsers with Gemini Nano installed. It does NOT work on mobile and is almost entirely theoretical.
Read the full blog post here.
- Speed: Immediate on-device execution with no network latency.
- Cost: Affordable, crowdsourced LLM usage.
- Usability: Easy to use with simple API calls.
Accessing the Prompt API is as simple as these two lines of code:
```javascript
const session = await window.ai.createTextSession();
const result = await session.prompt(
  `Tyingshoelaces.com are writing a really cool blog about you. What do you think about that then?`
);
```
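Before creating a session, the early preview also exposes an availability check; here is a minimal sketch of defensive usage, assuming the preview's `canCreateTextSession()` and `destroy()` calls behave as documented in the preview:

```javascript
// Check whether Gemini Nano is available before creating a session.
// Assumes the early-preview API shape (window.ai), which is exploratory
// and subject to change as the Prompt API evolves.
async function promptIfAvailable(text) {
  if (!window.ai) {
    throw new Error("Built-in AI is not exposed in this browser.");
  }
  // In the early preview this returns "readily", "after-download", or "no".
  const availability = await window.ai.canCreateTextSession();
  if (availability === "no") {
    throw new Error("Gemini Nano is not available on this device.");
  }
  const session = await window.ai.createTextSession();
  try {
    return await session.prompt(text);
  } finally {
    session.destroy(); // free the single session slot when done
  }
}
```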
Although we are restricted to a single session (no concurrency), performance for complex, long-form text generation was good. Here are some test results:
Execution Time 1: 0h 0m 3s 47ms
Execution Time 2: 0h 0m 3s 870ms
Execution Time 3: 0h 0m 2s 355ms
Execution Time 4: 0h 0m 3s 176ms
Execution Time 5: 0h 0m 7s 103ms
Average Session Execution Time: 0h 0m 3s 910.2ms
The average execution time across 5 chained requests to the built-in AI is between 3 and 4 seconds per complete request for long text generation prompts. This is more than acceptable for most use cases.
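For context, timings like these can be gathered with plain wall-clock measurement; a sketch of how the chained requests might be benchmarked (the prompt text, run count, and output format are illustrative):

```javascript
// Time a sequence of chained prompts against a single session.
// Illustrative sketch only; not the exact harness used for the numbers above.
const session = await window.ai.createTextSession();
const durations = [];

for (let i = 1; i <= 5; i++) {
  const start = performance.now();
  await session.prompt(`Write a long-form paragraph about shoelaces. (run ${i})`);
  const elapsed = performance.now() - start;
  durations.push(elapsed);
  console.log(`Execution Time ${i}: ${(elapsed / 1000).toFixed(3)}s`);
}

const average = durations.reduce((a, b) => a + b, 0) / durations.length;
console.log(`Average Session Execution Time: ${(average / 1000).toFixed(3)}s`);
```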
- Scale: Every LLM request is handled via an experimental browser API, which helps in decentralizing LLM distribution.
- Preloaded Models: Similar to WebLLM, but with preloaded models bundled into browsers.
- Easy and Cost-efficient: Fast, free (or paid for by the consumer), and really easy to use.
- Experimentation Only: The API is designed for experimentation, not production.
- Non-Responsiveness: Occasional unresponsiveness, likely caused by firing multiple asynchronous requests at the single available session.
- Smaller Model: Generalist nature leads to less polished output.
- Private Model: May be useful for internal, non-public systems.
- Task-specific Models: Future architectures might use multiple small, highly tuned, task-oriented LLMs.
The current focus on ever-larger context windows may not be the most efficient way to scale Generative AI. Instead, composing small, precise, task-tuned pieces can form something larger and more efficient.
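As a thought experiment, that composition might look like routing one small on-device model through task-specific instructions rather than one giant context; the task registry and routing rule below are entirely hypothetical:

```javascript
// Hypothetical router: one small on-device model, steered per task by a
// short task-specific instruction prefix instead of a huge shared context.
const TASKS = {
  sentiment: "Classify the sentiment of the following text as positive, negative, or neutral.",
  summarize: "Summarize the following text in one sentence.",
  reply: "Draft a short, friendly reply to the following message.",
};

async function runTask(task, input) {
  const instruction = TASKS[task];
  if (!instruction) throw new Error(`Unknown task: ${task}`);
  const session = await window.ai.createTextSession();
  try {
    // Keep each prompt small and precise; the "context" is just the task.
    return await session.prompt(`${instruction}\n\n${input}`);
  } finally {
    session.destroy();
  }
}
```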
- Social Cues and Sentiment Analysis
  - Interaction time
  - Browsing behavior
  - Source referral data
- Behavior Cues and User Input Interpretation
  - Conversation initiation
  - Tone of user input
- User Context
  - Age and gender demographics
  - User identity
- Site Context
  - Trending products and site data
Using well-prepared data to inform the LLM can provide more targeted and useful user interactions.
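As an illustration, signals like the ones listed above could be serialized straight into the prompt; here is a hedged sketch in which every signal name and value is hypothetical and would be gathered by the site itself, not by the Prompt API:

```javascript
// Hypothetical example of enriching a prompt with pre-gathered context.
// None of these signals come from the Prompt API; the site collects them
// and serializes them into the prompt text.
const userContext = {
  interactionTimeSeconds: 42,      // social cue: time on page
  referral: "search",              // source referral data
  tone: "curious",                 // interpreted tone of user input
  trendingProducts: ["waxed laces", "no-tie laces"], // site context
};

async function contextualPrompt(userMessage) {
  const session = await window.ai.createTextSession();
  const prompt = [
    "You are a shop assistant. Use the visitor context to tailor your answer.",
    `Visitor context: ${JSON.stringify(userContext)}`,
    `Visitor message: ${userMessage}`,
  ].join("\n");
  const reply = await session.prompt(prompt);
  session.destroy();
  return reply;
}
```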
We are still in the early stages of LLMs, and significant advancements are expected. Moving LLMs to the browser can revolutionize how we use and experiment with AI, making it cheaper and more accessible. Building efficient and nuanced infrastructure will massively improve the quality of output, regardless of model size or algorithm quality.
I built a demo where you can experience a browser-controlled voice interaction (a minimal sketch follows the list):
- Talk into the browser using the WebSpeechRecognitionAPI.
- Prompt API for interpretation.
- Responses from Gemini in browser.
- Pseudo-code for informing the AI model.
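A minimal sketch of that loop, assuming the early-preview `window.ai` interface and Chrome's (possibly `webkit`-prefixed) SpeechRecognition; error handling is omitted:

```javascript
// Voice in, Gemini Nano out: listen via the Web Speech API, then interpret
// the transcript with the Prompt API and speak the answer back.
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = "en-US";

recognition.onresult = async (event) => {
  const transcript = event.results[0][0].transcript;
  console.log("Heard:", transcript);

  const session = await window.ai.createTextSession();
  const answer = await session.prompt(transcript);
  session.destroy();

  // Speak the response back using the Web Speech synthesis API.
  speechSynthesis.speak(new SpeechSynthesisUtterance(answer));
};

recognition.start(); // requires a user gesture and microphone permission
```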
- PromptAPI: ★★★★★ - New benchmarks in speed and cost.
- WebSpeechRecognitionAPI: ★★★★☆ - Noticeably different from GPT-4o, but great for cheap requests and demos.
- Blog: Blog Post
Edward Ejb503, Tying Shoelaces Blog