
Bug: The output content is different #8585

Closed

yancaoweidaode opened this issue Jul 19, 2024 · 7 comments
Labels
bug-unconfirmed, low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches), stale

Comments

yancaoweidaode (Contributor) commented Jul 19, 2024

What happened?

First, I set the seed to 1 and the temperature to 0 to ensure that the LLM always produces the same output for the same input. For example, using llama3-8b, when I input

"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nhello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
the output is

”9906: Hello1070: there0: !358: I2846: 'm264: a11190: helpful18328: assistant11: ,1618: here311: to7945: assist499: you449: with904: any4860: questions477: or9256: tasks499: you1253: may617: have13: .358: I2846: 'm264: a6500: computer2068: program6319: designed311: to3619: understand323: and6013: respond311: to5933: natural4221: language11: ,779: so499: you649: can6369: chat449: with757: me1120: just1093: like499: you1053: would449: with264: a4333: friend382: .

40: I649: can1520: help499: you449: with264: a7029: wide2134: range315: of2574: things11: ,1778: such439: as1473: :

9: *22559: Answer287: ing4860: questions389: on5370: various13650: topics11: ,505: from8198: science323: and3925: history311: to16924: entertainment323: and7829: culture198:
9: *81200: Providing17931: definitions323: and41941: explanations369: for4339: words323: and32847: phrases198:
9: *67118: Offering18726: suggestions323: and19075: recommendations369: for6603: books11: ,9698: movies11: ,4731: music11: ,323: and810: more198:
9: *2755: Ass11330: isting449: with4221: language14807: translation323: and32528: grammar27358: correction198:
9: *97554: Generating6848: ideas323: and87881: brainstorm287: ing10105: solutions311: to5435: problems198:
9: *1628: And1790: much810: more2268: !

4516: So11: ,1148: what596: 's389: on701: your4059: mind30: ?3234: Do499: you617: have264: a3230: specific3488: question477: or8712: topic499: you4265: 'd1093: like311: to4358: discuss30: ?358: I2846: 'm682: all25212: ears0: !128009: [end of text]. "

I have printed out both the sampled token ids and the corresponding characters. Then I appended the first token id of the output to the end of the input token sequence, i.e. embd_inp.push_back(9906), and the output I get is
1070: there0: !358: I2846: 'm459: an15592: AI18328: assistant11: ,6319: designed311: to1520: help499: you449: with264: a7029: wide2134: range315: of9256: tasks323: and4860: questions13: .358: I2846: 'm264: a5780: machine6975: learning1646: model11: ,16572: trained389: on264: a13057: vast3392: amount315: of1495: text828: data11: ,902: which20682: enables757: me311: to3619: understand323: and6013: respond311: to5933: natural4221: language11374: inputs382: .

40: I649: can7945: assist499: you449: with4395: everything505: from4689: general6677: knowledge323: and74032: trivia311: to810: more3230: specific13650: topics1093: like8198: science11: ,3925: history11: ,323: and5557: technology13: .358: I649: can1101: also1520: help499: you449: with4221: language14228: -related9256: tasks1778: such439: as4221: language14807: translation11: ,1495: text29385: summar2065: ization11: ,323: and1524: even4477: writing18726: suggestions382: .

40: I2846: 'm1618: here311: to1520: help499: you304: in904: any1648: way358: I649: can11: ,779: so2733: feel1949: free311: to2610: ask757: me4205: anything430: that596: 's389: on701: your4059: mind13: .3639: What596: 's389: on701: your4059: mind3432: today30: ?128009: [end of text].

Obviously, the two outputs are not the same. However, I think that because of the causal attention mask, the KV cache produced by the two runs should be identical, so the continuations should match. Why are the outputs different? Is there something I didn't set correctly, or is there a bug somewhere in the code?
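For reference, with temp set to 0 the sampling step reduces to a greedy argmax over the logits, so the expectation of identical output is reasonable only if the logits themselves come out bit-identical in both runs. A minimal sketch of that selection step (illustrative only, not llama.cpp's actual sampler code):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// With temp == 0, sampling degenerates to picking the token with the largest
// logit, so two runs that produce bit-identical logits must pick the same id.
static int sample_greedy(const std::vector<float> & logits) {
    return (int) std::distance(logits.begin(),
                               std::max_element(logits.begin(), logits.end()));
}

int main() {
    // Toy logits for a 5-token vocabulary (illustrative values only).
    const std::vector<float> logits = { -1.2f, 0.3f, 2.7f, 0.3f, -0.5f };
    printf("greedy token id: %d\n", sample_greedy(logits)); // prints 2
    return 0;
}
```

If the logits differ even slightly between the two runs, a near-tie between two candidate tokens can resolve differently, and the continuations then diverge quickly.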

Name and Version

llama-cli, version e02b597, built with CMake (Windows 11)

What operating system are you seeing the problem on?

No response

Relevant log output

No response

yancaoweidaode added the bug-unconfirmed and low severity labels Jul 19, 2024
ggerganov (Owner) commented

Can you confirm that with -b 1 the results are the same?

yancaoweidaode (Contributor, Author) commented

Can you confirm that with -b 1 the results are the same?

I tried adding this parameter, but it couldn't produce the output because of "logical batch size for prompt processing (must be >=32 to use BLAS)".

ggerganov (Owner) commented

What about -ub 1?

ngxson (Collaborator) commented Jul 21, 2024

Maybe related to #8593, a problem with seeding for sampling.

yancaoweidaode (Contributor, Author) commented

-ub 1
OK, I've tried it, and with -ub 1 the outputs of the two attempts are the same. However, it's strange that for the same prompt the outputs differ between n_ubatch=1 and n_ubatch=512; I can't figure out where the problem might be.
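One plausible explanation, not confirmed in this thread, is that the micro-batch size changes how the matrix multiplications are tiled and how their partial sums are accumulated. Floating-point addition is not associative, so the logits computed with n_ubatch=1 and n_ubatch=512 can differ by a tiny amount, which is enough to flip a greedy (temp=0) choice when two tokens are nearly tied. A self-contained sketch of the underlying effect (toy numbers, not llama.cpp code):

```cpp
#include <cstdio>
#include <vector>

// Summing the same values with a different grouping (mimicking a different
// batch/tile size in a matmul) generally gives a slightly different float result.
int main() {
    std::vector<float> v(1000000);
    for (size_t i = 0; i < v.size(); ++i) {
        v[i] = 1.0f + 1e-4f * (float)((int)(i % 7) - 3);
    }

    // Order 1: one running sum (like processing one token at a time).
    float seq = 0.0f;
    for (float x : v) seq += x;

    // Order 2: partial sums over blocks of 512, then combined (like a larger batch).
    float blk = 0.0f;
    for (size_t i = 0; i < v.size(); i += 512) {
        float partial = 0.0f;
        for (size_t j = i; j < i + 512 && j < v.size(); ++j) partial += v[j];
        blk += partial;
    }

    printf("sequential: %.3f\nblocked:    %.3f\ndifference: %.6g\n", seq, blk, seq - blk);
    return 0;
}
```

Since the downstream argmax is exact, even a one-ULP difference in a logit can change which token wins, after which the two generations diverge.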

yancaoweidaode (Contributor, Author) commented

Maybe related to #8593, a problem with seeding for sampling.

I’m afraid that might not be the case, because I’ve already set the seed to 1 in common.h
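For reference, what this refers to is hard-coding the default seed in the shared parameter struct rather than passing it on the command line, roughly along these lines (a sketch only, since the exact layout of gpt_params in common.h changes between versions):

```cpp
// common/common.h (sketch, not the exact upstream definition)
#include <cstdint>

struct gpt_params {
    uint32_t seed = 1;   // hard-coded to 1 instead of the library default / --seed flag
    // ... many other generation and model parameters ...
};
```

Note that with temp set to 0 the greedy choice does not involve the RNG at all, which would be consistent with the seed not being the cause here.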

github-actions bot added the stale label Aug 22, 2024

github-actions bot commented Sep 5, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Sep 5, 2024