Bug: The output content is different #8585
Comments
Can you confirm that adding …
I tried to add this parameter, but it could not produce any output because of "logical batch size for prompt processing (must be >= 32 to use BLAS)".
What about …
Maybe related to #8593, a problem with seeding for sampling.
I’m afraid that might not be the case, because I’ve already set the seed to 1 in common.h.
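Worth noting alongside the seed discussion: with the temperature at 0, sampling degenerates to a plain argmax over the logits, so the RNG seed should have no effect at all. A minimal sketch of what greedy selection does (a hypothetical standalone helper, not the repository's actual sampler):

```cpp
// Hypothetical sketch: greedy (temperature == 0) decoding just takes the
// argmax over the vocabulary logits. No random state is consulted, so the
// seed cannot influence which token is picked.
int sample_greedy(const float * logits, int n_vocab) {
    int best = 0;
    for (int i = 1; i < n_vocab; ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

If two runs still diverge under greedy decoding, the difference must come from the logits themselves (e.g. batching or other numerical effects), not from the sampler's seed.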
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
First, I set the seed to 1 and the temperature to 0 to ensure that the LLM always outputs the same content for the same input. For example, using llama3-8b, when I input
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nhello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
the output is
"9906: Hello1070: there0: !358: I2846: 'm264: a11190: helpful18328: assistant11: ,1618: here311: to7945: assist499: you449: with904: any4860: questions477: or9256: tasks499: you1253: may617: have13: .358: I2846: 'm264: a6500: computer2068: program6319: designed311: to3619: understand323: and6013: respond311: to5933: natural4221: language11: ,779: so499: you649: can6369: chat449: with757: me1120: just1093: like499: you1053: would449: with264: a4333: friend382: .
40: I649: can1520: help499: you449: with264: a7029: wide2134: range315: of2574: things11: ,1778: such439: as1473: :
9: *22559: Answer287: ing4860: questions389: on5370: various13650: topics11: ,505: from8198: science323: and3925: history311: to16924: entertainment323: and7829: culture198:
9: *81200: Providing17931: definitions323: and41941: explanations369: for4339: words323: and32847: phrases198:
9: *67118: Offering18726: suggestions323: and19075: recommendations369: for6603: books11: ,9698: movies11: ,4731: music11: ,323: and810: more198:
9: *2755: Ass11330: isting449: with4221: language14807: translation323: and32528: grammar27358: correction198:
9: *97554: Generating6848: ideas323: and87881: brainstorm287: ing10105: solutions311: to5435: problems198:
9: *1628: And1790: much810: more2268: !
4516: So11: ,1148: what596: 's389: on701: your4059: mind30: ?3234: Do499: you617: have264: a3230: specific3488: question477: or8712: topic499: you4265: 'd1093: like311: to4358: discuss30: ?358: I2846: 'm682: all25212: ears0: !128009: [end of text]"
I printed both the sampled token id and the corresponding text for each token. Then I appended the first token id of the output to the end of the input token sequence, i.e. embd_inp.push_back(9906), and the output I got was
1070: there0: !358: I2846: 'm459: an15592: AI18328: assistant11: ,6319: designed311: to1520: help499: you449: with264: a7029: wide2134: range315: of9256: tasks323: and4860: questions13: .358: I2846: 'm264: a5780: machine6975: learning1646: model11: ,16572: trained389: on264: a13057: vast3392: amount315: of1495: text828: data11: ,902: which20682: enables757: me311: to3619: understand323: and6013: respond311: to5933: natural4221: language11374: inputs382: .
40: I649: can7945: assist499: you449: with4395: everything505: from4689: general6677: knowledge323: and74032: trivia311: to810: more3230: specific13650: topics1093: like8198: science11: ,3925: history11: ,323: and5557: technology13: .358: I649: can1101: also1520: help499: you449: with4221: language14228: -related9256: tasks1778: such439: as4221: language14807: translation11: ,1495: text29385: summar2065: ization11: ,323: and1524: even4477: writing18726: suggestions382: .
40: I2846: 'm1618: here311: to1520: help499: you304: in904: any1648: way358: I649: can11: ,779: so2733: feel1949: free311: to2610: ask757: me4205: anything430: that596: 's389: on701: your4059: mind13: .3639: What596: 's389: on701: your4059: mind3432: today30: ?128009: [end of text].
Obviously, the two outputs are not the same. However, because of the causal attention mask, I would expect the KV cache produced by the two computations to be identical for the shared prefix, so the outputs should match. Why are the results different? Is there something I didn’t set correctly, or is there a bug somewhere in the code?
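To make the experiment concrete, here is a rough sketch of the two-run comparison against the llama.cpp C API roughly as it looked around this build (e02b597). The model path and prompt are placeholders, the chat template is omitted, and several of these signatures (llama_batch_get_one in particular) have changed in later versions, so treat this as an illustration of the setup rather than drop-in code:

```cpp
// Sketch: decode a prompt, greedily pick the first output token t0, then
// check that decoding (prompt + t0) from scratch yields the same next token.
// With a causal mask both runs attend to the same prefix, so in exact
// arithmetic the greedy picks must agree.
#include "llama.h"
#include <cstdio>
#include <cstring>
#include <vector>

// Greedy pick over the logits of the last decoded token.
static llama_token greedy(llama_context * ctx, int n_vocab) {
    const float * logits = llama_get_logits(ctx);
    llama_token best = 0;
    for (int i = 1; i < n_vocab; ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

int main() {
    llama_backend_init();

    llama_model * model = llama_load_model_from_file(
        "llama3-8b.gguf", llama_model_default_params()); // placeholder path

    // Two independent contexts, so run B starts from an empty KV cache.
    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx_a = llama_new_context_with_model(model, cparams);
    llama_context * ctx_b = llama_new_context_with_model(model, cparams);

    const char * text = "hello, who are you?"; // chat template omitted
    std::vector<llama_token> inp(256);
    const int n = llama_tokenize(model, text, (int) strlen(text),
                                 inp.data(), (int) inp.size(),
                                 /*add_special=*/true, /*parse_special=*/true);
    inp.resize(n);

    const int n_vocab = llama_n_vocab(model);

    // Run A: decode the prompt, greedily take the first output token ...
    llama_decode(ctx_a, llama_batch_get_one(inp.data(), (int) inp.size(), 0, 0));
    llama_token t0 = greedy(ctx_a, n_vocab); // 9906 ("Hello") in the report

    // ... then feed t0 back in and pick the second output token.
    llama_decode(ctx_a, llama_batch_get_one(&t0, 1, (llama_pos) inp.size(), 0));
    llama_token a_next = greedy(ctx_a, n_vocab);

    // Run B: decode (prompt + t0) from scratch, i.e. embd_inp.push_back(9906).
    std::vector<llama_token> inp_b = inp;
    inp_b.push_back(t0);
    llama_decode(ctx_b, llama_batch_get_one(inp_b.data(), (int) inp_b.size(), 0, 0));
    llama_token b_next = greedy(ctx_b, n_vocab);

    printf("run A next: %d, run B next: %d\n", a_next, b_next); // expected equal

    llama_free(ctx_a);
    llama_free(ctx_b);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

In exact arithmetic the two runs must agree. In practice, decoding the prefix in one large batch versus generating token by token can take different kernel paths with different floating-point reduction orders, so the logits can differ by tiny amounts; when two candidate tokens are near-tied, even a greedy argmax can flip, which is one plausible (though unconfirmed) explanation for the divergence described above.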
Name and Version
llama-cli, version e02b597, built with CMake (Windows 11)
What operating system are you seeing the problem on?
No response
Relevant log output
No response