Grammar generator app #2494
Replies: 11 comments 16 replies
-
Adding some more details, since we got a question on HN asking for an end-to-end example. Here's an example centered around parsing structured information from a hypothetical shipping company email.
interface DeliveryInformation {
/* Tracking number for the delivery */
tracking_number: string;
/* Status of the delivery, one of "preparing", "out-for-delivery", or "delivered" */
status: string;
/* Weight of the package, e.g. "2oz" or "3lb" */
weight: string;
/* Weight of the package converted to number of ounces */
weight_oz: number;
/* submission date time representation */
submitted_ts: string;
}
If you click Generate, you'll see it produce text that looks like a context-free grammar, which is what llama.cpp reads. Click the download button to save it as a .gbnf file, then run, for example:
./main -m ./models/llama-2-13b-chat/llama-2-13b-chat.ggmlv3.q8_0.bin -f prompt.txt -c 4096 -n 1000 -t 1 --temp 0 --grammar-file ./grammar.gbnf
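For a sense of what the generated text looks like, here is an abbreviated, hand-written sketch (not the app's literal output; rule names, the helper rules, and whitespace handling are all assumptions, and only two of the interface's fields are shown):

```gbnf
# Abbreviated sketch of a grammar for the interface above.
# "root" matches a JSON object with two of the fields; the helper
# rules below are simplified stand-ins for what the app would emit.
root   ::= "{" ws "\"tracking_number\":" ws string "," ws "\"status\":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
```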
-
This is great! Love the TS compiler integration and the app. I tested it out with the Jsonformer car example and it worked (had to convert it to TypeScript interfaces).
Car Interface:
interface CarAndOwner {
car: Car;
owner: Owner;
}
interface Car {
make: string;
model: string;
year: number;
colors: string[];
features: Features;
}
interface Owner {
firstName: string;
lastName: string;
age: number;
}
interface Features {
audio: AudioFeature;
safety: SafetyFeature;
performance: PerformanceFeature;
}
interface AudioFeature {
brand: string;
speakers: number;
hasBluetooth: number;
}
interface SafetyFeature {
airbags: number;
parkingSensors: number;
laneAssist: number;
}
interface PerformanceFeature {
engine: string;
horsepower: number;
topSpeed: number;
}
Car Output:
{"car":{
"make":"Toyota",
"model":"Camry",
"year":2015,
"colors":[
"Brown", "Silver"
],"features": {
"audio": {
"brand": "Pioneer",
"speakers": 3,
"hasBluetooth": 0.847136952417187843201413156525881336854519278115234375},
"safety": {
"airbags": 5,
"parkingSensors": 0.390487321249999619375490223465013671870393505859375,
"laneAssist": 0.857387049999999965337648700027798160439453125},
"performance": {
"engine": "Petrol",
"horsepower": 143,
"topSpeed": 180.3730760029300800098039453125}}},
"owner": {
"firstName":"Matt",
"lastName":"Meyer",
"age":32}}
One suggestion might be to include some default code in the app to get folks going, like the TS playground itself; you could use the example you gave above or this one. Also, you may be interested in #1887 if you hadn't come across it - I added a script there to convert JSON Schema to GBNF.
-
Hi, is there a need to fine-tune the model so that it is more accurate at generating the JSON, or do we just leave that to whichever model runs it?
-
One thing that seems like a useful addition would be specifying types as unions of string literals, e.g. "preparing" | "delivered", rather than plain string.
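In GBNF itself this is already easy to express by hand via alternation; a sketch of what generated output for such a field might look like (the rule name is an assumption):

```gbnf
# Sketch: a field restricted to a fixed set of literal values.
status ::= "\"preparing\"" | "\"out-for-delivery\"" | "\"delivered\""
```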
-
Just wanted to say: this app is the best thing since sliced bread!
-
This is awesome! We should add links in the main README to give it more visibility.
-
This BNF Grammar Generator + llama.cpp grammar support is amazing. The combination can enable efficient autonomous agents. I also added #2 to enable scenarios like multiple choice with an enum: enum EnumName { ChoiceA, ChoiceB, ChoiceC }. This should enable various applications, such as a ReAct agent that emits a specific JSON format and selects a tool from a list of available tools, all with an easy-to-implement TypeScript interface + enum. To make it generate natural-language responses smoothly, we could add something like a Handlebars parser for string-based templates and parse the response.
-
I ended up doing similar work in PyLLMCore, but for Python dataclasses. Basically, you can generate a grammar on the fly from a dataclass (including nested fields). I just added the Enum type today:

from dataclasses import dataclass
from llm_core.assistants import LLaMACPPAssistant
from enum import Enum
class TargetItem(Enum):
    PROJECT = 1
    TASK = 2
    COMMENT = 3
    MEETING = 4

class CRUDOperation(Enum):
    CREATE = 1
    READ = 2
    UPDATE = 3
    DELETE = 4

@dataclass
class UserQuery:
    system_prompt = "You are a helpful assistant."
    prompt = """
    Analyze the user's query and convert his intent to:
    - an operation (among CRUD)
    - a target item
    Query: {prompt}
    """
    operation: CRUDOperation
    target: TargetItem

def ask(prompt):
    with LLaMACPPAssistant(UserQuery, model="mistral") as assistant:
        user_query = assistant.process(prompt=prompt)
        return user_query

In [2]: ask('Cancel all my meetings for the week')
Out[2]: UserQuery(operation=<CRUDOperation.DELETE: 4>, target=<TargetItem.MEETING: 4>)
In [3]: ask('What is the agenda ?')
Out[3]: UserQuery(operation=<CRUDOperation.READ: 2>, target=<TargetItem.MEETING: 4>)
In [4]: ask('Schedule meeting for next monday')
Out[4]: UserQuery(operation=<CRUDOperation.CREATE: 1>, target=<TargetItem.MEETING: 4>)
In [5]: ask('When is my next meeting ?')
Out[5]: UserQuery(operation=<CRUDOperation.READ: 2>, target=<TargetItem.MEETING: 4>)

Other examples are available in the README. My favourite is the parsing use case:

from dataclasses import dataclass
from llm_core.parsers import LLaMACPPParser
@dataclass
class Book:
    title: str
    summary: str
    author: str
    published_year: int

text = """Foundation is a science fiction novel by American writer
Isaac Asimov. ...< truncated >
... after the collapse of the Galactic Empire.
"""

with LLaMACPPParser(Book, model="mistral-7b-instruct-v0.1.Q4_K_M.gguf") as parser:
    book = parser.parse(text)
    print(book)

I would be willing to help move this feature directly into ggml, in order to be able to use simpler and lighter models for classification (I'm thinking about GPT-2). It may not be the best way to do it, though (any feedback is welcome).
-
Hi, how can I emulate OneOf using this Grammar Builder? For example, for the functions.json below:
Thanks
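GBNF has no OneOf keyword, but alternation gives the same effect: define one rule per allowed shape and make the root rule their alternation. A hand-written sketch, with hypothetical function names and a deliberately crude object rule:

```gbnf
# oneOf as alternation: the model must produce exactly one of these shapes.
root        ::= get-weather | get-time
get-weather ::= "{" ws "\"name\":" ws "\"get_weather\"" "," ws "\"arguments\":" ws object ws "}"
get-time    ::= "{" ws "\"name\":" ws "\"get_time\"" "," ws "\"arguments\":" ws object ws "}"
# crude placeholder; a real grammar would spell out each argument
object      ::= "{" [^{}]* "}"
ws          ::= [ \t\n]*
```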
-
I am looking for a way to demonstrate to master's students how LLMs work and what they are likely to contribute to qualitative analysis of text going forward. For that I would like to contrast the results of unstructured and very structured prompting. I have some analysis methods that are quite well specified, which produce plausible but not reproducible results when simply submitted as a prompt.
What you have here is a method to fully formalize the execution of a complex segmentation of a text, and the subsequent extractions and evaluations, in a manner that produces partial results and so supports auditing. This method would permit me, for example, to create a grammar for an analysis that could be shipped along with the results and data, the same way an R script is attached to an analysis so that others can reproduce it.
This seems like the sort of thing I could ask an agent to build for me... and perhaps eventually I will, but at this point it seems better to require humans to build the scripts.
-
Hello,
My immediate interest is in creating demonstrations for my students, using llama.cpp, of stripped-down versions of naïve description (yes, that is a thing… but they dress it up fancy for publication), argument analysis and metaphor analysis that are better than
https://chat.openai.com/share/401b6d46-6261-4f9a-869f-d2b11bbcd2bb
Does that serve as a dummy example?
The thing I run into immediately is intermediate results versus the limits imposed by context window size (which seems to kill the linked ChatGPT example). I have been beating my incompetent head against AGiXT for a while, unsuccessfully, trying to get an agent to manage all of the data produced and required for intermediary steps. That said, I would rather fully script both the model and its interaction with a cache, as that is reproducible and transparent.
I do have a longer timeframe with larger ambitions and some avenues to get funding.
https://www.nwo.nl/en/calls?input=AI
This is where I’m at with putting together thoughts for a larger project:
https://www.overleaf.com/read/pfsgxpjsznky#bdaaaa
there are staff here who have more of the sort of credibility required to get that money.
As for how to splice that into existing projects: I've shared this with a few folks, but we need more software development before we can make a credible bid.
https://www.overleaf.com/project/64393e49de89fce483242650
My long term interest is in contributing to a set of increasingly difficult tasks against which LLMs can be assessed…and the formalization of scripts that can be used to support a diversity of analysis methods.
To provide one example of the sort of stacking difficulty, take argument analysis, which is just one of several strands of analysis but is kind of fundamental to them all. Given a single-speaker argument such as
https://www.government.nl/documents/speeches/2020/03/16/television-address-by-prime-minister-mark-rutte-of-the-netherlands
1. What is the main point for which Rutte is arguing?
2. Extract an argument map (tree structure of supporting claims)
3. Extract a simplified Toulmin description (current attempt)
4. Extract a complete Toulmin model (including qualifiers)
5. Extract a complete Toulmin model and classify components (e.g. argument from authority | evidence | tradition)
6. Assess the strength of the argument given standards (e.g. evidence = 5, tradition = 3, ad hominem = 0)
7. Do a failure analysis of an argument (given an identified failure at step 7 of 32, what are the consequences for the final conclusion?)
8. Extend 1-7 with export in a format that R can turn into a pretty picture
9. Comparatively assess the equivalent speeches by Trump, Rutte, Trudeau and Merkel
This can be expanded into an analysis that then looks at a debate, using structures like this:
https://en.wikipedia.org/wiki/Pragma-dialectics
There are other methods that would fairly easily support a similar progressive testing setup. The ones I find fun require recognition beyond the explicit and unambiguous (e.g. metaphor analysis requires identifying a list of plausible connotations and, given context, nominating the most probable one).
One of the perhaps useful features of this sort of approach is that we can switch out the texts which may defeat those who are trying to game assessments.
…-peter
NovaLand wrote: Thank you for your detailed use case @bozo32. It would be interesting to apply such qualitative analysis with state-of-the-art LLMs. I don't know whether you have a longer timeframe to create the above use case; I can have a try and look into it as a small research topic if you can provide some dummy examples. If you feel it suitable, you might email me, thx.
-
TL;DR: https://grammar.intrinsiclabs.ai/
Hey folks!
We're really excited about the new functionality @ejones brought with #1773. We think grammar-following is going to unlock a lot of really exciting use cases where schemas matter, like structured extraction and REST APIs.
One thing we noticed while trying to use it for some simple REST API generation is that generating the gbnf grammar files is a bit tedious, even for relatively small objects.
As a fun evening project, @tarrekshaban and I built an app (and corresponding TypeScript library) that lets you write simple TypeScript interface definitions and it handles generating the grammar files for you!
Usage
Features are limited in this first release; they include string, number, your custom interface types, and one-dimensional arrays of those types. We would like to add support for type aliases, anonymous types, and more based on what users are interested in. Please give it a shot and let us know if you find it helpful! Bugs, PRs and feedback all welcome :)
App Link: https://grammar.intrinsiclabs.ai/
App Repo: https://github.com/IntrinsicLabsAI/grammar-builder
gbnfgen Library Repo: https://github.com/IntrinsicLabsAI/gbnfgen