[JS] Upgrade to 0.9.3 is causing significant increase in hallucinations #1360
Comments
Thanks for the report. I'm investigating.
cc @mbleigh
I have a PR: #1365

```ts
ai.defineFormat(
  {
    name: 'myJson',
    format: 'json',
  },
  (schema) => {
    let instructions: string | undefined;
    if (schema) {
      instructions = `Output should be in JSON format and conform to the following schema:
\`\`\`
${JSON.stringify(schema)}
\`\`\`
`;
    }
    return {
      parseChunk: (chunk) => {
        return extractJson(chunk.accumulatedText);
      },
      parseMessage: (message) => {
        return extractJson(message.text);
      },
      instructions,
    };
  }
);

const MenuItemSchema = z.object({
  name: z.string(),
  description: z.string(),
  calories: z.number(),
  allergens: z.array(z.string()),
});

export const menuSuggestionFlow = ai.defineFlow(
  {
    name: 'menuSuggestionFlow',
    outputSchema: MenuItemSchema.nullable(),
  },
  async () => {
    const response = await ai.generate({
      prompt: 'Invent a menu item for a pirate themed restaurant.',
      output: { format: 'myJson', schema: MenuItemSchema },
    });
    return response.output;
  }
);
```
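For anyone trying to run the snippet standalone, the surrounding setup would look something like this (a sketch, not from the PR: import paths are the typical ones for a Genkit JS project with the Vertex AI plugin, and the model string is an assumption — double-check `extractJson` against your installed version):

```ts
// Assumed setup for the snippet above (sketch only).
import { genkit, z } from 'genkit';
import { extractJson } from 'genkit/extract';
import { vertexAI } from '@genkit-ai/vertexai';

const ai = genkit({
  plugins: [vertexAI()],
  model: 'vertexai/gemini-1.5-pro',
});
```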
this is an interesting path, I think the reported difference was between 0.9.1 and 0.9.3
d'oh... I should pay better attention... let me diff
Interesting, the diff is only in `format:`. However, the Gemini JSON mode condition is in genkit/js/plugins/googleai/src/gemini.ts, line 605 (commit 39c86af).
yeah, verified that there's no diff between 0.9.1 -> 0.9.3 at the Gemini model call level...
I'm not sure that it's the same issue, but I isolated the changes I noticed down to a repeatable case in #1368. Filed as a separate issue so as not to derail this thread. @tekkeon are you able to provide any more context for your snippets? I think an example of a generation showing the hallucination / strange output would be very helpful. If you're using the Developer UI, you can export a trace from a bad generation under the trace details tab, which would also help us debug if you have a sharable trace.
Thanks for the quick replies here. @pavelgj @i2amsam I've taken a screenshot of the diff of the redacted rendered prompts. I'll take a look at the traces as well, though I suspect there will be at least some IP in there that I wouldn't want shared publicly. I can share it with you privately, though, if that will help.
@pavelgj Not sure if it's relevant, but the lines you referred to were changed 2 weeks ago, adding in the constrained generation support.
Hmmmm, I'm a little out of my wheelhouse here, but @pavelgj wouldn't we expect to have a …
Oh, and @tekkeon are you using the Vertex version or the Gemini API version?
Oh, great question - we're using the Vertex version.
A change in 0.9 is that Gemini models now use constrained generation by default (that's the `constrained` option you're seeing take effect). Can you try adding `constrained: false` to your output config to see if it changes the behavior?
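Something like this (a sketch, assuming the 0.9-era `output.constrained` flag and reusing MenuItemSchema from the snippet above):

```ts
// Sketch: same generation as above, with constrained generation disabled.
// With constrained: false, schema instructions are injected into the prompt
// instead of being enforced at the model level.
const response = await ai.generate({
  prompt: 'Invent a menu item for a pirate themed restaurant.',
  output: {
    format: 'json',
    schema: MenuItemSchema,
    constrained: false,
  },
});
console.log(response.output);
```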
@mbleigh I think tekkeon's config doesn't have the …
Hmm, no, that shouldn't be the problem. I'm a little stumped as to what might be causing the regression... if there's any way to get a minimal reproduction that exhibits the behavior, it would help a lot.
I'm gonna try a couple of these ideas and see if that gets us any clues.

If these don't reveal any new insights, I'll see if I can come up with a minimal reproduction. It's obviously tough to do, as I'm guessing hallucination is a lot less likely with less complex prompts, so I'd have to come up with a complex enough prompt to start triggering issues. Alternatively, @mbleigh is there a secure channel where I can send you the specific example prompt and results we're seeing between the two versions? It won't contain real data from production systems, just fake data run through our actual systems and prompts.
This feature is now released in 0.9.4; you should be able to define a custom JSON formatter to control and evaluate various flag combinations:

```ts
ai.defineFormat(
  { name: 'customJson' },
  (schema) => {
    let instructions: string | undefined;
    if (schema) {
      instructions = `Output should be in JSON format and conform to the following schema:
\`\`\`
${JSON.stringify(schema)}
\`\`\`
`;
    }
    return {
      parseChunk: (chunk) => extractJson(chunk.accumulatedText),
      parseMessage: (message) => extractJson(message.text),
      instructions,
    };
  }
);

const { output } = await ai.generate({
  prompt: 'Invent a menu item for a pirate themed restaurant.',
  output: { format: 'customJson', schema: MenuItemSchema },
});
```
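If it helps, a rough way to A/B the two formats (a sketch; it assumes MenuItemSchema and the 'customJson' format defined above are both registered):

```ts
// Sketch: run the same prompt through the built-in 'json' format and the
// custom 'customJson' format, to compare outputs side by side.
for (const format of ['json', 'customJson']) {
  const { output } = await ai.generate({
    prompt: 'Invent a menu item for a pirate themed restaurant.',
    output: { format, schema: MenuItemSchema },
  });
  console.log(`[${format}]`, JSON.stringify(output, null, 2));
}
```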
@tekkeon you can email me at bleigh (at) google.com with the specific repro if needed 🙂
Okay, so I have some interesting results here. After upgrading to 0.9.3, if I go into the built version in node_modules and edit gemini.js by hand, the outputs change noticeably. @mbleigh I'm going to send you the inputs and outputs I get from the above test before and after making the edits to gemini.js. Hopefully it can help!
I'm hoping #1612 will improve this.
Describe the bug
After upgrading to 0.9.3, we're noticing that the output from Gemini (we're specifically using Gemini 1.5 Pro 002) has numerous problems in our system. We're getting hallucinated text and overall very strange outputs that make our application unusable. When we downgrade back to 0.9.1, everything works properly again.
We looked at the rendered prompts and noticed the only difference was that `format` was now specified as `json`.

Snippet from 0.9.1:

Snippet from 0.9.3:
We suspect this change to default to JSON mode may be causing the issue.
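For reference, our understanding is that JSON mode at the Gemini request level corresponds to setting the response MIME type; a rough sketch of the difference we suspect (field names are from the public Gemini API, not taken from the Genkit plugin source):

```ts
// Rough sketch of the suspected request-level difference (assumed, not
// confirmed against the plugin source).
// 0.9.1-style: free-form text response; JSON is only requested via the prompt.
const before = { generationConfig: {} };
// 0.9.3-style: JSON mode on, constraining decoding to valid JSON.
const after = {
  generationConfig: { responseMimeType: 'application/json' },
};
```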
To Reproduce
Upgrade from 0.9.1 to 0.9.3 and run some prompts with complex outputs.
Expected behavior
We expected AI output to remain consistent with what our system had been producing.
Runtime (please complete the following information):
Node version