Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

when trying to make deals, if a bad miner causes an Error, all subsequent deal attempts fail even with healthy miners, until cli is killed. #8095

Closed
9 of 18 tasks
smagdali opened this issue Feb 15, 2022 · 3 comments · Fixed by #8155

Comments

@smagdali
Copy link
Member

smagdali commented Feb 15, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

lotus version 1.15.0-rc1+mainnet+git.3031b4d98

Describe the Bug

If, when trying to make a deal, you encounter a miner that throws an Error, that Error state persists until you kill the client and restart.

I tried this with many miners and combos (inc f0838467 f023467 f01240 f01234 f08403 f01278 f0773157) before identifying f010479 as the culprit that would poison all subsequent deal attempts.

#7778 looks like it might be adjacent.

Seems like the error state is not being cleared correctly when starting the loop again.

Logging Information

2022-02-15T11:02:11.421Z        WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/handler.go:279  error in RPC call to 'Filecoin.ClientQueryAsk': bls signature failed to verify # BAD MINER
2022-02-15T11:02:53.701Z        WARN    rpc     go-jsonrpc@v0.1.4-0.20210217175800-45ea43ac2bec/handler.go:279  error in RPC call to 'Filecoin.ClientQueryAsk': bls signature failed to verify # GOOD MINER

# New Deal, GOOD MINER
2022/02/15 11:03:55 storess: Get /1
2022/02/15 11:03:55 storess: Get /2
2022-02-15T11:03:57.068Z        INFO    storagemarket_impl      clientstates/client_states.go:157       sending data for a deal bafyreifx6wns3cxfw3cv7xwo6ntgymbej5i6ftgpeco2sqs6y5oonn327q
2022-02-15T11:03:57.070Z        INFO    dt-impl impl/impl.go:189        open push channel to 12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f with base cid bafykbzacea6ns5swfl4rroe4sbhot3lw4vyarycejwmwpwfcxuku2sbb62qbm
2022-02-15T11:03:57.096Z        INFO    dt_graphsync    graphsync/graphsync.go:480      12D3KooWAchWy2cgE9KvH18k1S8HEVQRUpvTggVJh3HRPa8xwKT8-12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f-1644856356955663059, GS pull request queued in response to our push, req={root:{str:p <vV*NvDMgآMH!} selector:0xc02db53820 priority:0 id:45 extensions:map[fil/data-transfer:[131 244 246 134 0 245 244 27 22 211 180 36 78 226 170 211 246 96] fil/data-transfer/1.1:[163 100 73 115 82 113 244 103 82 101 113 117 101 115 116 246 104 82 101 115 112 111 110 115 101 166 100 84 121 112 101 0 100 65 99 112 116 245 100 80 97 117 115 244 102 88 102 101 114 73 68 27 22 211 180 36 78 226 170 211 100 86 82 101 115 246 100 86 84 121 112 96] graphsync/dedup-by-key:[120 139 100 97 116 97 45 116 114 97 110 115 102 101 114 45 49 50 68 51 75 111 111 87 65 99 104 87 121 50 99 103 69 57 75 118 72 49 56 107 49 83 56 72 69 86 81 82 85 112 118 84 103 103 86 74 104 51 72 82 80 97 56 120 119 75 84 56 45 49 50 68 51 75 111 111 87 78 72 119 109 119 78 82 107 77 69 80 54 86 113 68 67 112 106 83 90 107 113 114 105 112 111 74 103 78 55 101 87 114 117 118 88 88 113 67 50 107 71 57 102 45 49 54 52 52 56 53 54 51 53 54 57 53 53 54 54 51 48 53 57]] isCancel:false isUpdate:false}
2022-02-15T11:03:57.099Z        INFO    graphsync       responsemanager/querypreparer.go:29     Processing request hooks for request: 45
2022-02-15T11:03:57.101Z        INFO    dt-impl impl/events.go:188      channel 12D3KooWAchWy2cgE9KvH18k1S8HEVQRUpvTggVJh3HRPa8xwKT8-12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f-1644856356955663059: received new response, accepting channel
2022-02-15T11:03:57.103Z        INFO    dt_graphsync    graphsync/graphsync.go:985      incoming graphsync request      {"peer": "12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f", "graphsync request id": 45, "data transfer channel id": "12D3KooWAchWy2cgE9KvH18k1S8HEVQRUpvTggVJh3HRPa8xwKT8-12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f-1644856356955663059"}
2022-02-15T11:04:16.010Z        INFO    dt-impl impl/events.go:270      channel 12D3KooWAchWy2cgE9KvH18k1S8HEVQRUpvTggVJh3HRPa8xwKT8-12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f-1644856356955663059: transfer initiated by local node is complete
2022-02-15T11:04:16.044Z        INFO    dt-impl impl/events.go:205      channel 12D3KooWAchWy2cgE9KvH18k1S8HEVQRUpvTggVJh3HRPa8xwKT8-12D3KooWNHwmwNRkMEP6VqDCpjSZkqripoJgN7eWruvXXqC2kG9f-1644856356955663059: received complete response, completing channel

Repo Steps

It's possible that 'f010479' has recovered by the time you get to this. At time of writing this was 100% reproducible.

  1. Try to make a deal with 'f010479' and then 'f01240' or any other good miners.

As soon as you've errored once, all further deals fail.

Test case:

'bad' miner - f010479 - throws a "Error: failed to query ask: bls signature failed to verify" when queried for a deal.
good miner - f01240

1. Good miner offers me a deal:

Data CID (from lotus client import): bafykbzacea6ns5swfl4rroe4sbhot3lw4vyarycejwmwpwfcxuku2sbb62qbm
.. calculating data size
Deal duration (days): 300

Make this a verified deal? (yes/no): yes
Miner Addresses (f0.. f0..), none to find: f01240                                                        
.. querying miner asks
-----
Proposing from f1zk5oufzpzeokprrtwv7lsdkr5jl7fg3jkp5kplq
	Balance: 0 FIL

Piece size: 512MiB (Payload size: 408MiB)
Duration: 7200h0m0s
Total price: ~0 FIL (0 FIL per epoch)
Verified: true

Accept (yes/no): yes
.. executing
Deal (f01240) CID: bafyreibzjpjjcf3keakna7i2ei66cujxt6yogzg2swwodmvkipalwjoazy

2. Bad miner causes Error, and then Error persists for all miners until client is killed

Data CID (from lotus client import): bafykbzacea6ns5swfl4rroe4sbhot3lw4vyarycejwmwpwfcxuku2sbb62qbm
.. calculating data size
Deal duration (days): 300

Make this a verified deal? (yes/no): yes
Miner Addresses (f0.. f0..), none to find: f010479
.. querying miner asks
Error: failed to query ask: bls signature failed to verify
Miner Addresses (f0.. f0..), none to find: f01240
.. querying miner asks
Error: failed to query ask: bls signature failed to verify     # THIS SHOULD BE FINE
Miner Addresses (f0.. f0..), none to find: ^CError: reading miner address: EOF
ERROR: context canceled #Give up 
@smagdali smagdali changed the title when trying to make deals, if a bad miner causes an Error, all subsequent deal attempts fail until cli is killed, even with healthy miners. when trying to make deals, if a bad miner causes an Error, all subsequent deal attempts fail even with healthy miners, until cli is killed. Feb 15, 2022
@Reiers
Copy link

Reiers commented Feb 15, 2022

Hi @smagdali

Thanks for the report. I have added correct labels.
I noticed that you are running a old Lotus version (and the build is .dirty) there are a lot of improvements in the latest release.
Could you try to upgrade Lotus, and see if you are able to reproduce the issue?
I will leave it open and add to the board if the issue persist after the upgrade. Thanks !

Link to latest Lotus tags, here.

@smagdali
Copy link
Member Author

Ah. Now I have to go and figure out where in the docs/tutorial I was following had the old link. (but will update and retest )

@smagdali
Copy link
Member Author

Confirmed that this is still a bug with

lotus --version
lotus version 1.15.0-rc1+mainnet+git.3031b4d98

magik6k pushed a commit that referenced this issue Mar 25, 2022
* Update client.go

Bug fixed : if hitting return instead of filling with a valid value, the CLI crashed (line 796)
Change return err by continue to ask a valid value again

* Fix #8100

Bug fixed : if hitting return instead of filling with a valid value, the CLI crashed (line 796)
Change return err by continue to ask a valid value again

* lint reported errors on CircleCI

Commit to fix lint reported errors during tests on CircleCI
(remove blank lines x2)

* lint reported errors on CircleCI

Commit to fix lint reported errors during tests on CircleCI
(remove trailling whitespace)

* Fix #8095
Clear the list of miner addresses and successfull get-asked before asking for new ones
Previous behavior : if get-ask failed for miner(s), the faulty miner will be retried each time,
so you have to stop the command and start again to change this faulty miner (instead of removing from the list on the new attempt)

* Clear the list of miner addresses and successfull get-asked before asking for new ones
Previous behavior : if get-ask failed for miner(s), the faulty miner will be retried each time,
so you have to stop the command and start again to change this faulty miner (instead of removing from the list on the new attempt)
@TippyFlitsUK TippyFlitsUK removed the need/analysis Hint: Needs Analysis label Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants